Display duplicate records in data.frame and omit single ones

百般思念 提交于 2019-11-26 21:44:43

问题


I have been struggling with how to select ONLY duplicated rows of data.frame in R. For Instance, my data.frame is:

age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
Names=c("John","John","John", "Harry", "Paul", "Paul", "Paul", "Khan", "Khan", "Khan", "Sam", "Joe")
village <- data.frame(Names, age, height)

 Names age height
 John  18   76.1
 John  19   77.0
 John  20   78.1
 Harry  21   78.2
 Paul  22   78.8
 Paul  23   79.7
 Paul  24   79.9
 Khan  25   81.1
 Khan  26   81.2
 Khan  27   81.8
 Sam  28   82.8
 Joe  29   83.5

I want to see the result as following:

Names age height
John  18   76.1
John  19   77.0
John  20   78.1
Paul  22   78.8
Paul  23   79.7
Paul  24   79.9
Khan  25   81.1
Khan  26   81.2
Khan  27   81.8

Thanks for your time...


回答1:


A solution using duplicated twice:

village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]


   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8

An alternative solution with by:

village[unlist(by(seq(nrow(village)), village$Names, 
                  function(x) if(length(x)-1) x)), ]



回答2:


village[ duplicated(village),]



回答3:


I find @Sven's answer using duplicated the "tidiest", but you can also do this many other ways. Here are two more:

  1. Use table() and subset by matching the names where the tabulation is > 1 with the names present in the first column:

    village[village$Names %in% names(which(table(village$Names) > 1)), ]
    
  2. Use ave() to "tabulate" in a little different manner, but subset in the same way:

    village[with(village, ave(as.numeric(Names), Names, FUN = length) > 1), ]
    



回答4:


I came up with a solution using nested sapply:

> village_dups = 
village[unique(unlist(which(sapply(sapply(village$Names,function(x) 
which(village$Names==x)),function(y) length(y)) > 1))),]
> village_dups
   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8


来源:https://stackoverflow.com/questions/14274306/display-duplicate-records-in-data-frame-and-omit-single-ones

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!