Question
df <- data.frame(label = c("a", "b", "c"),
                 val   = c("x", "b", "c"),
                 val1  = c("z", "b", "d"))
  label val val1
1     a   x    z
2     b   b    b
3     c   c    d
I want to find the duplicate values in each row. In the 1st row there is no duplicate; in the 2nd row, "b" is duplicated; in the 3rd row, "c" is duplicated. How can I find these duplicates in R?
I also need to replace the duplicate elements with NA.
Answer 1:
Use duplicated with apply, which flags every value that has already appeared earlier in the same row:
apply(df, 1, duplicated)
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE  TRUE  TRUE
[3,] FALSE  TRUE FALSE
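apply returns one column per row of df, so transpose the result with t() before using it as a logical mask to replace the duplicates with NA:
df[t(apply(df, 1, duplicated))] <- NA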
df
  label  val val1
1     a    x    z
2     b <NA> <NA>
3     c <NA>    d
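The question also asks how to find the duplicates, not only how to blank them out. The following is a minimal sketch of one way to list the duplicated values per row; the names m, dup_values and rows_with_dups are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the columns are character regardless of R version.

```r
# Rebuild the original data frame (before any NA replacement).
df <- data.frame(label = c("a", "b", "c"),
                 val   = c("x", "b", "c"),
                 val1  = c("z", "b", "d"),
                 stringsAsFactors = FALSE)

# Work on a character matrix so each row can be treated as one vector.
m <- as.matrix(df)

# For each row, collect the distinct values that occur more than once.
# lapply is used so the result is always a list, even if every row
# happens to have the same number of duplicates.
dup_values <- lapply(seq_len(nrow(m)), function(i) {
  x <- m[i, ]
  unique(x[duplicated(x)])
})

# Indices of the rows that contain at least one duplicate.
rows_with_dups <- which(lengths(dup_values) > 0)
```

Here dup_values holds one character vector per row (empty where the row has no duplicate), and rows_with_dups can be used to subset df to the affected rows.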
Answer 2:
Here are a couple of options.
Using base R apply, we replace the duplicated values with NA for each row:
df[] <- t(apply(df, 1, function(x) replace(x, duplicated(x), NA)))
df
#  label  val val1
#1     a    x    z
#2     b <NA> <NA>
#3     c <NA>    d
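One caveat worth keeping in mind with the apply-based approach: apply first converts the data frame to a matrix, so if the columns have mixed types, everything is coerced to character and stays that way after the assignment. The sketch below illustrates this with a hypothetical data frame, mixed, that is not part of the original question.

```r
# Hypothetical example with one character and one numeric column.
mixed <- data.frame(label = c("a", "b"),
                    n     = c(1, 2),
                    stringsAsFactors = FALSE)

# Record the column type before the replacement.
was_numeric <- is.numeric(mixed$n)

# Same row-wise replacement as above; apply() works on a character matrix here.
mixed[] <- t(apply(mixed, 1, function(x) replace(x, duplicated(x), NA)))

# Record the column type after the replacement.
now_character <- is.character(mixed$n)
```

In the question's data all columns are strings, so this does not change the result there; it only matters when numeric columns are mixed in.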
Another alternative, using dplyr and tidyr, is to first create a new column holding the row_number() of the data frame, gather it to long format, group_by each row, replace the duplicated values with NA, and spread it back into wide format.
library(dplyr)
library(tidyr)
df %>%
  mutate(row = row_number()) %>%
  gather(key, value, -row) %>%
  group_by(row) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>%
  spread(key, value) %>%
  ungroup() %>%
  select(-row)
# A tibble: 3 x 3
#  label val   val1
#  <chr> <chr> <chr>
#1 a     x     z
#2 b     NA    NA
#3 c     NA    d
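In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. The following is a sketch of the same pipeline with those functions, assuming a recent tidyr is installed; the name result is only for illustration, and this is a swapped-in variant rather than the code from the original answer.

```r
library(dplyr)
library(tidyr)

# Original data, with character columns made explicit.
df <- data.frame(label = c("a", "b", "c"),
                 val   = c("x", "b", "c"),
                 val1  = c("z", "b", "d"),
                 stringsAsFactors = FALSE)

result <- df %>%
  mutate(row = row_number()) %>%                               # remember the original row
  pivot_longer(-row, names_to = "key", values_to = "value") %>% # wide -> long
  group_by(row) %>%
  mutate(value = replace(value, duplicated(value), NA)) %>%    # blank repeats within a row
  ungroup() %>%
  pivot_wider(names_from = key, values_from = value) %>%       # long -> wide
  select(-row)
```

The steps map one-to-one onto the gather/spread version, so the result should have the same columns with the same duplicates replaced by NA.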
Source: https://stackoverflow.com/questions/52996273/duplicate-values-in-a-single-row-in-dataframe