Filtering a dataframe showing only duplicates

╄→尐↘猪︶ㄣ submitted on 2019-11-29 16:59:48

Try

d[!duplicated(d),]

and

d[duplicated(d),]

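where d is your data frame. The first expression keeps the first occurrence of every row (so each distinct row appears once); the second returns only the repeated occurrences.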
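To make the difference concrete, here is a small sketch on a six-row data frame reconstructed from the example output printed further down in this thread. The exact input is an assumption, since the question's data is not reproduced here. Note that duplicated(d) compares whole rows, so only one row counts as a repeat:

```r
# Assumed sample data, reconstructed from the output shown later in the thread
d <- data.frame(V1 = c("A", "B", "A", "C", "D", "D"),
                V2 = c(1, 1, 1, 2, 3, 4),
                stringsAsFactors = FALSE)

# Whole-row comparison: a row is flagged only if an identical row appeared earlier
first_seen <- d[!duplicated(d), ]   # rows 1, 2, 4, 5, 6
repeats    <- d[duplicated(d), ]    # row 3 only (the second "A 1")
```

Rows 5 and 6 share V1 == "D" but differ in V2, so neither is flagged here; that is why the later answers test a single column instead of the whole row.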

=== UPDATE === If duplicates should be determined by the first column only, and every row with a duplicated value (including its first occurrence) should go into a separate data frame, you could do:

library(gdata)
d[duplicated2(d$V1, bothWays = TRUE), ]
d[!duplicated2(d$V1, bothWays = TRUE), ]

If only base R is desired, then:

bm <- duplicated(d$V1) | duplicated(d$V1, fromLast = TRUE)
d[bm, ]
d[!bm, ]

You can use duplicated, but bear in mind that duplicated returns TRUE only from the second occurrence of a value onward, i.e.

> duplicated(c("A", "A", "A"))
[1] FALSE  TRUE  TRUE 

does not return TRUE for the first "A". If you want to catch all values of "A" you can e.g. use

duplicated(c("A", "A", "A")) | duplicated(c("A", "A", "A"), fromLast = TRUE)
# [1] TRUE TRUE TRUE

You can then separate your data using

## Index of the duplicated values:
indDuplicatedVec <- duplicated(d[,1]) | duplicated(d[,1], fromLast = TRUE)

myDuplicates <- d[indDuplicatedVec, ]
myUniques <- d[!indDuplicatedVec, ]

> myDuplicates
#V1 V2
#1  A  1
#3  A  1
#5  D  3
#6  D  4

> myUniques
#V1 V2
#2  B  1
#4  C  2
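The split above can be wrapped in a small reusable function. This is only a sketch: split_by_duplicates is a hypothetical helper name, not something from base R, and the sample data is reconstructed from the printed output, so treat it as an assumption.

```r
# Hypothetical helper (not part of base R): split a data frame by whether
# the value in column `col` occurs more than once
split_by_duplicates <- function(data, col) {
  key <- data[[col]]
  # TRUE for every occurrence of a repeated value, including the first
  is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  list(duplicates = data[is_dup, , drop = FALSE],
       uniques    = data[!is_dup, , drop = FALSE])
}

# Assumed sample data, reconstructed from the output shown above
d <- data.frame(V1 = c("A", "B", "A", "C", "D", "D"),
                V2 = c(1, 1, 1, 2, 3, 4),
                stringsAsFactors = FALSE)

res <- split_by_duplicates(d, "V1")
res$duplicates   # rows 1, 3, 5, 6
res$uniques      # rows 2, 4
```

Using drop = FALSE keeps the result a data frame even if the input has a single column.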

Considering df as your input, you can use dplyr and try:

library(dplyr)
df %>% group_by(V1) %>% filter(n() > 1)

for the duplicates

and

df %>% group_by(V1) %>% filter(n() == 1)

for the unique entries.
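The idea behind both pipelines is to count the rows in each V1 group and filter on that count. For comparison, here is a base R sketch of the same logic using ave() in place of group_by() and n(); this is a swapped-in technique, not the dplyr code itself, and the sample df is an assumption reconstructed from the output printed earlier in the thread.

```r
# Assumed sample data, reconstructed from the output shown earlier
df <- data.frame(V1 = c("A", "B", "A", "C", "D", "D"),
                 V2 = c(1, 1, 1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# ave() returns, for every row, the size of the V1 group that row belongs to
group_size <- ave(seq_along(df$V1), df$V1, FUN = length)

dups <- df[group_size > 1, ]    # same rows as filter(n() > 1)
uniq <- df[group_size == 1, ]   # same rows as filter(n() == 1)
```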

We can use data.table

library(data.table)
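# setDT() converts df to a data.table by reference;
# .N is the number of rows in each V1 group, so only groups with more than one row are kept
setDT(df)[, .SD[.N > 1], by = V1]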
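The answer above only returns the duplicates. A minimal sketch of both halves of the split follows, assuming the data.table package is installed and using sample data reconstructed from the output printed earlier in the thread. It uses as.data.table() rather than setDT() so that the original df is not modified in place.

```r
library(data.table)

# Assumed sample data, reconstructed from the output shown earlier
df <- data.frame(V1 = c("A", "B", "A", "C", "D", "D"),
                 V2 = c(1, 1, 1, 2, 3, 4),
                 stringsAsFactors = FALSE)

dt <- as.data.table(df)   # a copy; df itself stays a plain data frame

# .N is the number of rows in the current V1 group
dups <- dt[, .SD[.N > 1], by = V1]    # groups A and D
uniq <- dt[, .SD[.N == 1], by = V1]   # groups B and C
```

The grouped result lists V1 first, followed by the remaining columns, and drops the original row names.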