Display duplicate records in data.frame and omit single ones

点点圈 submitted on 2019-11-28 00:33:48

A solution that calls duplicated twice, once scanning from the top and once from the bottom:

village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]


   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8
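
To make the two duplicated calls easier to follow, here is a minimal sketch that rebuilds the data and looks at each logical vector separately. Rows 1-3 and 5-10 are taken from the output above; row 4 is not shown anywhere in the thread, so the values used for it ("Mark", 21, 78.2) are a hypothetical placeholder, and the names `later`, `earlier`, `keep` and `village_dups` are introduced only for illustration.

```r
# Rebuild the example data. Rows 1-3 and 5-10 are copied from the printed output;
# row 4 ("Mark", 21, 78.2) is a HYPOTHETICAL stand-in for the omitted single record.
village <- data.frame(
  Names  = c("John", "John", "John", "Mark", "Paul", "Paul", "Paul",
             "Khan", "Khan", "Khan"),
  age    = c(18, 19, 20, 21, 22, 23, 24, 25, 26, 27),
  height = c(76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8)
)

# duplicated() flags only the second and later occurrences of each name.
later <- duplicated(village$Names)

# With fromLast = TRUE the scan starts at the bottom, so every occurrence
# except the last one of each name is flagged.
earlier <- duplicated(village$Names, fromLast = TRUE)

# A row belongs to a repeated name if either scan flags it.
keep <- later | earlier
village_dups <- village[keep, ]
print(village_dups)
```

Neither call flags a name that occurs only once, which is why combining them with `|` drops row 4 and keeps everything else.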

An alternative solution with by:

village[unlist(by(seq_len(nrow(village)), village$Names,
                  function(x) if (length(x) > 1) x)), ]
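
Another answer applies duplicated to the whole data frame. This compares entire rows and flags only the second and later copies, so it returns no rows here, where only Names repeats:

village[duplicated(village), ]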
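
To see how the whole-data-frame call differs from the column-wise one, here is a minimal sketch on a hypothetical three-row data frame `df` (not part of the original data; all variable names are illustrative):

```r
# Hypothetical toy data: "John" appears twice, but no two rows are identical.
df <- data.frame(
  Names = c("John", "John", "Paul"),
  age   = c(18, 19, 22)
)

# Whole-row comparison: no row repeats in full, so nothing is flagged.
whole_row <- duplicated(df)

# Column-wise comparison: only the second "John" is flagged.
by_name <- duplicated(df$Names)

# Scanning from both ends flags every row whose name repeats.
both_ends <- duplicated(df$Names) | duplicated(df$Names, fromLast = TRUE)

print(df[whole_row, ])  # zero rows
print(df[both_ends, ])  # both "John" rows
```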

I find @Sven's answer using duplicated the "tidiest", but you can also do this many other ways. Here are two more:

  1. Use table() and subset by matching the names where the tabulation is > 1 with the names present in the first column:

    village[village$Names %in% names(which(table(village$Names) > 1)), ]
    
  2. Use ave() to "tabulate" in a little different manner, but subset in the same way:

    village[with(village, ave(seq_along(Names), Names, FUN = length) > 1), ]
    
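Both approaches count how often each name occurs and keep the rows whose count exceeds 1. A minimal sketch of the intermediate values, using a hypothetical character vector `x` in place of village$Names (the variable names are illustrative only):

```r
# Hypothetical stand-in for village$Names.
x <- c("a", "b", "a", "c", "b", "a")

# table() gives one count per distinct value.
counts <- table(x)

# Names of the values that occur more than once.
repeated <- names(which(counts > 1))

# ave() returns the group size at every position, aligned with x.
sizes <- ave(seq_along(x), x, FUN = length)

# Both routes select the same elements.
via_table <- x[x %in% repeated]
via_ave   <- x[sizes > 1]
print(via_table)
print(via_ave)
```

The table() route needs a second matching step with %in%, while ave() yields a vector that is already aligned with the rows and can be used for subsetting directly.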

I came up with a solution using nested sapply:

> village_dups <- village[unique(unlist(which(
+   sapply(sapply(village$Names, function(x) which(village$Names == x)),
+          function(y) length(y)) > 1))), ]
> village_dups
   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8