Duplicate values in a single row in dataframe

Submitted by 蓝咒 on 2019-12-04 04:30:34

Question


df <- data.frame(label = c("a","b","c"),
                 val=c("x","b","c"),
                 val1=c("z","b","d"))

   label val val1
1     a   x    z
2     b   b    b
3     c   c    d

I want to find the duplicate values in each row. The 1st row has no duplicates; in the 2nd row, "b" is duplicated; in the 3rd row, "c" is duplicated. How can I find these duplicates in R?

I also need to replace the duplicate elements with NA.


Answer 1:


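Use duplicated() with apply(). Note that apply() over rows returns one column per row of df, so the result is transposed relative to df: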

apply(df,1,duplicated)
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE  TRUE FALSE

To replace the duplicates with NA, transpose the mask back with t() and use it as a logical index:

df[t(apply(df, 1, duplicated))] <- NA
df
  label  val val1
1     a    x    z
2     b <NA> <NA>
3     c <NA>    d
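
The logical mask shows where the duplicates sit, but the question also asks which values are duplicated. A minimal sketch of one way to list them per row is below. The name dups is only illustrative, and stringsAsFactors = FALSE is passed explicitly (an assumption, not part of the original code) so the columns are character regardless of R version.

```r
df <- data.frame(label = c("a", "b", "c"),
                 val   = c("x", "b", "c"),
                 val1  = c("z", "b", "d"),
                 stringsAsFactors = FALSE)

# For each row, collect the values that occur more than once.
# lapply() is used instead of apply() so the result is always a list,
# even when every row happens to have the same number of duplicates.
dups <- lapply(seq_len(nrow(df)), function(i) {
  x <- unlist(df[i, ], use.names = FALSE)  # row i as a character vector
  unique(x[duplicated(x)])                 # each duplicated value once
})

dups
```

Here dups[[1]] is empty, while dups[[2]] and dups[[3]] hold "b" and "c" respectively.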



Answer 2:


Here are a couple of options.

Using base R apply(), we replace the duplicated values in each row with NA:

df[] <- t(apply(df, 1, function(x) replace(x, duplicated(x), NA)))

df
#  label  val val1
#1     a    x    z
#2     b <NA> <NA>
#3     c <NA>    d

Alternatively, using dplyr and tidyr: create a new column holding the row_number() of the data frame, gather() the data into long format, group_by() each row, replace the duplicated values with NA, and spread() the result back into wide format.

library(dplyr)
library(tidyr)

df %>%
  mutate(row = row_number()) %>%
  gather(key, value, -row) %>%
  group_by(row) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>%
  spread(key, value) %>%
  ungroup %>%
  select(-row)

# A tibble: 3 x 3
#  label val   val1 
#  <chr> <chr> <chr>
#1 a     x     z    
#2 b     NA    NA   
#3 c     NA    d    
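
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same pipeline could be sketched with the newer functions as follows. This is a sketch rather than part of the original answer; it assumes a recent dplyr and tidyr are installed, and the name res and the explicit stringsAsFactors = FALSE are illustrative additions.

```r
library(dplyr)
library(tidyr)

df <- data.frame(label = c("a", "b", "c"),
                 val   = c("x", "b", "c"),
                 val1  = c("z", "b", "d"),
                 stringsAsFactors = FALSE)

res <- df %>%
  mutate(row = row_number()) %>%                            # remember the original row
  pivot_longer(-row, names_to = "key", values_to = "value") %>%
  group_by(row) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>% # blank out repeats within each row
  ungroup() %>%
  pivot_wider(names_from = key, values_from = value) %>%    # back to wide format
  select(-row)

res
```

pivot_longer() keeps the columns in their original order within each row, so duplicated() flags the same cells as in the gather() version.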


Source: https://stackoverflow.com/questions/52996273/duplicate-values-in-a-single-row-in-dataframe
