I have been struggling with how to select ONLY the duplicated rows of a data.frame in R. For instance, my data.frame is:
age=18:29
height=c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
Names=c("John","John","John", "Harry", "Paul", "Paul", "Paul", "Khan", "Khan", "Khan", "Sam", "Joe")
village <- data.frame(Names, age, height)
Names age height
John 18 76.1
John 19 77.0
John 20 78.1
Harry 21 78.2
Paul 22 78.8
Paul 23 79.7
Paul 24 79.9
Khan 25 81.1
Khan 26 81.2
Khan 27 81.8
Sam 28 82.8
Joe 29 83.5
I want the result to look like this, keeping every row whose name occurs more than once:
Names age height
John 18 76.1
John 19 77.0
John 20 78.1
Paul 22 78.8
Paul 23 79.7
Paul 24 79.9
Khan 25 81.1
Khan 26 81.2
Khan 27 81.8
Thanks for your time...
A solution using duplicated twice:
village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]
Names age height
1 John 18 76.1
2 John 19 77.0
3 John 20 78.1
5 Paul 22 78.8
6 Paul 23 79.7
7 Paul 24 79.9
8 Khan 25 81.1
9 Khan 26 81.2
10 Khan 27 81.8
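To see why two calls are needed, here is a minimal sketch that looks at each logical vector separately. The variable names forward, backward and keep are just illustrative and are not part of the answer above.

```r
# Rebuild just the Names column from the question.
Names <- c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
           "Khan", "Khan", "Khan", "Sam", "Joe")

# Flags every repeat of a name except its first occurrence.
forward <- duplicated(Names)

# Flags every repeat of a name except its last occurrence.
backward <- duplicated(Names, fromLast = TRUE)

# A name that occurs more than once is flagged by at least one of the two
# scans, while a name that occurs once is flagged by neither.
keep <- forward | backward

which(keep)
```

Using only duplicated(Names) would drop the first John, Paul and Khan rows, which is why the two scans are combined with |.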
An alternative solution with by:
village[unlist(by(seq(nrow(village)), village$Names,
function(x) if(length(x)-1) x)), ]
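The idea behind the by() call is to group the row numbers by name and keep only the groups with more than one member. As a rough sketch of the same idea, here it is with split() and Filter() swapped in for by(), so it is not the exact code above. Note that the groups come back ordered by name, so sort() is needed if you want the original row order.

```r
# Rebuild the data.frame from the question.
age <- 18:29
height <- c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
Names <- c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
           "Khan", "Khan", "Khan", "Sam", "Joe")
village <- data.frame(Names, age, height)

# Group the row numbers by name; the list is ordered alphabetically by name.
groups <- split(seq_len(nrow(village)), village$Names)

# Keep only the groups that contain more than one row number.
repeated <- Filter(function(x) length(x) > 1, groups)

# Flatten to a plain vector of row numbers (grouped by name, not by position).
rows <- unname(unlist(repeated))

# sort() restores the original row order before subsetting.
village[sort(rows), ]
```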
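village[duplicated(village), ]
Note that this compares entire rows rather than just Names, so it returns no rows for this data: every combination of Names, age and height is unique.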
I find @Sven's answer using duplicated the "tidiest", but you can also do this in many other ways. Here are two more:
Use table() and subset by matching the names where the tabulation is > 1 with the names present in the first column:
village[village$Names %in% names(which(table(village$Names) > 1)), ]
Use ave() to "tabulate" in a little different manner, but subset in the same way:
village[with(village, ave(as.numeric(Names), Names, FUN = length) > 1), ]
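Here is a small sketch that spells out the intermediate steps of both approaches. One assumption to flag: since R 4.0.0, data.frame() keeps strings as character by default, so as.numeric(Names) would give NA with a coercion warning. The sketch therefore counts over seq_along() instead, which is my substitution and not what the answer above wrote. The names counts, repeated and group_size are illustrative only.

```r
# Rebuild the data.frame from the question.
age <- 18:29
height <- c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5)
Names <- c("John", "John", "John", "Harry", "Paul", "Paul", "Paul",
           "Khan", "Khan", "Khan", "Sam", "Joe")
village <- data.frame(Names, age, height)

# table() approach: count each name, then keep the names counted more than once.
counts <- table(village$Names)
repeated <- names(which(counts > 1))
by_table <- village[village$Names %in% repeated, ]

# ave() approach: give every row the size of its own name group.
# seq_along() only supplies a numeric vector of the right length to count over.
group_size <- ave(seq_along(village$Names), village$Names, FUN = length)
by_ave <- village[group_size > 1, ]

# Both subsets select the same rows.
identical(rownames(by_table), rownames(by_ave))
```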
I came up with a solution using nested sapply:
> village_dups =
village[unique(unlist(which(sapply(sapply(village$Names,function(x)
which(village$Names==x)),function(y) length(y)) > 1))),]
> village_dups
Names age height
1 John 18 76.1
2 John 19 77.0
3 John 20 78.1
5 Paul 22 78.8
6 Paul 23 79.7
7 Paul 24 79.9
8 Khan 25 81.1
9 Khan 26 81.2
10 Khan 27 81.8
Source: https://stackoverflow.com/questions/14274306/display-duplicate-records-in-data-frame-and-omit-single-ones