R - find all unique values among subsets of a data frame

帅比萌擦擦* submitted on 2019-12-01 23:34:14

First, find the count of each element in df$data_values:

 x <- sapply(df$data_values, function(v) sum(df$data_values == v))

> x
 [1] 1 2 2 2 1 2 2 2 1 1

Now extract the rows whose value occurs exactly once:

> df[x==1,]
   data_subsets data_values
1             A           1
5             A           5
9             B           6
10            B           7
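A self-contained sketch of the counting approach, for anyone who wants to run it. The data frame below is hypothetical: it was reconstructed only so that it reproduces the counts printed above (subset A holds 1 to 5, subset B holds 2, 3, 4, 6, 7), and your real data may differ. It also swaps the sapply() loop, which compares every value against the whole column, for a single table() tally followed by a lookup. The counts are the same.

```r
# Hypothetical example data, chosen only to match the counts shown above
df <- data.frame(
  data_subsets = rep(c("A", "B"), each = 5),
  data_values  = c(1, 2, 3, 4, 5, 2, 3, 4, 6, 7),
  stringsAsFactors = FALSE
)

# Tally every value once, then look up each row's value (as character,
# which is how table() names its cells) to get that row's count
counts <- table(df$data_values)
x <- as.vector(counts[as.character(df$data_values)])

# Keep only rows whose value occurs exactly once in the whole column
df[x == 1, ]
```

The result is the same four rows (A 1, A 5, B 6, B 7), without the quadratic number of comparisons.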

Note that your expected output above is missing "A 5": the value 5 occurs only once (there is no "B 5"), so it belongs in the result.

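You had the right idea with duplicated(). By default it flags only the second and later occurrences of a value, so the first occurrence of a repeated value still looks non-duplicated. The trick is to combine the fromLast = FALSE and fromLast = TRUE options: a row is truly unique only if it is flagged in neither direction.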

!duplicated(df$data_values,fromLast = FALSE)&!duplicated(df$data_values,fromLast = TRUE)
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE

Indexing your data.frame with this vector gives:

df[!duplicated(df$data_values,fromLast = FALSE)&!duplicated(df$data_values,fromLast = TRUE),]
   data_subsets data_values
1             A           1
5             A           5
9             B           6
10            B           7
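To see why both directions are needed, here is a small sketch on a plain vector (the vector v is made up for illustration). duplicated() with fromLast = FALSE marks every occurrence after the first; with fromLast = TRUE it marks every occurrence before the last. Only values that neither pass marks occur exactly once.

```r
v <- c(1, 2, 3, 2, 4)

forward  <- duplicated(v)                   # FALSE FALSE FALSE TRUE FALSE
backward <- duplicated(v, fromLast = TRUE)  # FALSE TRUE FALSE FALSE FALSE

# A value is unique only if neither pass flags it
keep <- !forward & !backward
v[keep]                                     # 1 3 4
```

Either pass alone would keep one copy of 2; combining them drops both copies.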

A variant of P Lapointe's answer would be

u <- unique(df)
u[! u$data_values %in% u[duplicated(u$data_values), ]$data_values, ]

The unique() call handles the possibility (not present in your test data) that some rows are exact duplicates and you want to keep such a row once, provided its data_values does not also appear under a different data_subsets (or a different value of another column).
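A small sketch of that behaviour, using a hypothetical data frame that does contain an exact duplicate row ("A 1" twice) plus a value shared between subsets (2). It computes unique(df) once and indexes that result, so the duplicated row survives exactly once while the shared value is dropped.

```r
# Hypothetical data: "A 1" is an exact duplicate row,
# and value 2 appears under both subsets
df <- data.frame(
  data_subsets = c("A", "A", "A", "B", "B"),
  data_values  = c(1, 1, 2, 2, 6),
  stringsAsFactors = FALSE
)

u <- unique(df)                                      # collapses the two "A 1" rows
shared <- u$data_values[duplicated(u$data_values)]   # values on more than one distinct row: 2
u[!u$data_values %in% shared, ]                      # keeps "A 1" once and "B 6"
```

By contrast, the count-based and duplicated()-based approaches applied directly to df would drop "A 1", because the value 1 occurs twice there.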
