Display duplicate records in data.frame and omit single ones

刺人心 2020-12-06 15:57

I have been struggling to select ONLY the duplicated rows of a data.frame in R. For instance, my data.frame is:

age=18:29
height=c(76.1,77,78.1,78.2,78.8         


        
4 answers
  • 2020-12-06 15:59

    A solution using duplicated twice:

    village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]
    
    
       Names age height
    1   John  18   76.1
    2   John  19   77.0
    3   John  20   78.1
    5   Paul  22   78.8
    6   Paul  23   79.7
    7   Paul  24   79.9
    8   Khan  25   81.1
    9   Khan  26   81.2
    10  Khan  27   81.8
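
    To see why two calls are needed, here is a minimal sketch that rebuilds the first ten rows of village from the output above. The name in row 4 ("Mary") is a placeholder, because the data in the question is cut off; only that row's height (78.2) comes from the question. duplicated() flags every occurrence after the first, and fromLast = TRUE flags every occurrence before the last, so their union covers all rows of a repeated name:

    ```r
    # Sample data: the name in row 4 is a hypothetical placeholder.
    village <- data.frame(
      Names  = c("John", "John", "John", "Mary", "Paul",
                 "Paul", "Paul", "Khan", "Khan", "Khan"),
      age    = 18:27,
      height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8)
    )

    fwd <- duplicated(village$Names)                   # TRUE from the 2nd occurrence on
    bwd <- duplicated(village$Names, fromLast = TRUE)  # TRUE for all but the last occurrence

    dups <- village[fwd | bwd, ]  # every row whose name occurs more than once
    ```

    Either call alone would drop one row per name (the first or the last), which is why the two are combined with |.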
    

    An alternative solution with by:

    village[unlist(by(seq(nrow(village)), village$Names, 
                      function(x) if(length(x)-1) x)), ]
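
    One thing to check with a grouped approach is row order: indices collected group by group come back in the order of the groups, which need not match the original row order. The sketch below illustrates this with split() rather than by() (a substitution made here for clarity, not the method above) and sorts the indices afterwards. The sample data is an assumption; the name in row 4 is a placeholder.

    ```r
    # Sample data: the name in row 4 is a hypothetical placeholder.
    village <- data.frame(
      Names  = c("John", "John", "John", "Mary", "Paul",
                 "Paul", "Paul", "Khan", "Khan", "Khan"),
      age    = 18:27,
      height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8)
    )

    # Row indices per name; groups are ordered by sorted name, not by first appearance.
    groups <- split(seq_len(nrow(village)), village$Names)

    # Keep only the groups with more than one row.
    repeated <- Filter(function(i) length(i) > 1, groups)

    idx <- unlist(repeated, use.names = FALSE)  # grouped order: John, Khan, Paul
    village_dups <- village[sort(idx), ]        # sort() restores the original row order
    ```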
    
  • 2020-12-06 16:12

    I came up with a solution using nested sapply:

    > village_dups = 
    village[unique(unlist(which(sapply(sapply(village$Names,function(x) 
    which(village$Names==x)),function(y) length(y)) > 1))),]
    > village_dups
       Names age height
    1   John  18   76.1
    2   John  19   77.0
    3   John  20   78.1
    5   Paul  22   78.8
    6   Paul  23   79.7
    7   Paul  24   79.9
    8   Khan  25   81.1
    9   Khan  26   81.2
    10  Khan  27   81.8
    
  • 2020-12-06 16:14
    village[duplicated(village), ]
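
    Note that duplicated() applied to a whole data frame compares complete rows and flags only the second and later copies. On data like that shown in the other answers, where age differs in every row, it selects nothing. A minimal sketch on assumed sample data (the name in row 4 is a placeholder):

    ```r
    # Sample data: the name in row 4 is a hypothetical placeholder.
    village <- data.frame(
      Names  = c("John", "John", "John", "Mary", "Paul",
                 "Paul", "Paul", "Khan", "Khan", "Khan"),
      age    = 18:27,
      height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8)
    )

    # Whole-row comparison: no row repeats an earlier row in every column.
    whole_row <- village[duplicated(village), ]
    nrow(whole_row)  # 0

    # Comparing the Names column only still omits the first row of each group.
    by_name <- village[duplicated(village$Names), ]
    nrow(by_name)    # 6
    ```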
    
  • 2020-12-06 16:16

    I find @Sven's answer using duplicated the "tidiest", but there are many other ways to do this. Here are two more:

    1. Use table() and subset by matching the names where the tabulation is > 1 with the names present in the first column:

      village[village$Names %in% names(which(table(village$Names) > 1)), ]
      
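    2. Use ave() to "tabulate" in a slightly different manner, but subset in the same way. Passing seq_along(Names) as the value vector avoids calling as.numeric() on Names, which yields NAs and a warning when Names is a character column:

      village[with(village, ave(seq_along(Names), Names, FUN = length) > 1), ]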
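
    Both approaches come down to counting rows per name and keeping the names with a count above 1. Here is a small sketch on assumed sample data (the name in row 4 is a placeholder); the ave() call uses seq_along(Names) as its value vector so that it behaves the same for factor and character columns:

    ```r
    # Sample data: the name in row 4 is a hypothetical placeholder.
    village <- data.frame(
      Names  = c("John", "John", "John", "Mary", "Paul",
                 "Paul", "Paul", "Khan", "Khan", "Khan"),
      age    = 18:27,
      height = c(76.1, 77, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8)
    )

    # table(): one count per distinct name, then match names back to the rows.
    counts <- table(village$Names)
    via_table <- village[village$Names %in% names(which(counts > 1)), ]

    # ave(): the group size repeated on every row of that group.
    n_per_row <- with(village, ave(seq_along(Names), Names, FUN = length))
    via_ave <- village[n_per_row > 1, ]
    ```

    table() gives one count per name and needs a lookup with %in%, while ave() returns a vector aligned with the rows, so it can be used directly as a filter.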
      