Display duplicate records in data.frame and omit single ones

I have been struggling with how to select ONLY the duplicated rows of a data.frame in R. For instance, my data.frame is:

age=18:29
height=c(76.1,77,78.1,78.2,78.8


                      
4 Answers

梦谈多话
2020-12-06 15:59

A solution using duplicated twice:

village[duplicated(village$Names) | duplicated(village$Names, fromLast = TRUE), ]


   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8
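
To see why both calls are needed, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the data from the question). duplicated() flags only the second and later occurrences of a value, so the second pass with fromLast = TRUE is what picks up the first occurrence of each repeated name:

```r
# Hypothetical toy data, not the data frame from the question
toy <- data.frame(
  Names = c("Ann", "Ann", "Bob", "Cid", "Cid", "Cid"),
  age   = 18:23,
  stringsAsFactors = FALSE
)

forward  <- duplicated(toy$Names)                   # 2nd and later occurrences
backward <- duplicated(toy$Names, fromLast = TRUE)  # all but the last occurrence

# A row belongs to a repeated name if either pass flags it
dup_rows <- toy[forward | backward, ]
print(dup_rows)
```

Only the row for "Bob", the single name that appears once, is left out.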


An alternative solution with by:

village[unlist(by(seq(nrow(village)), village$Names, 
                  function(x) if(length(x)-1) x)), ]
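
The condition if(length(x)-1) relies on a group of length 1 giving 0, which R treats as FALSE, so single-row groups return NULL and vanish in unlist(). A slightly more explicit sketch of the same idea, on hypothetical toy data (the names toy and idx are illustrative only):

```r
# Hypothetical toy data, not the data frame from the question
toy <- data.frame(
  Names = c("Ann", "Ann", "Bob", "Cid", "Cid", "Cid"),
  age   = 18:23,
  stringsAsFactors = FALSE
)

# For each name, return its row indices only when there is more than one;
# groups of size one yield NULL and are dropped by unlist()
idx <- unlist(by(seq_len(nrow(toy)), toy$Names,
                 function(x) if (length(x) > 1) x))

dup_rows <- toy[idx, ]
print(dup_rows)
```

Because by() processes the groups in sorted order of the names, the rows come back grouped by name rather than necessarily in their original order.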

                                                                        
                                                        
            
            
              
                
清酒与你
2020-12-06 16:12

I came up with a solution using nested sapply:

> village_dups = village[unique(unlist(which(
+     sapply(sapply(village$Names, function(x) which(village$Names == x)),
+            function(y) length(y)) > 1))), ]
> village_dups
   Names age height
1   John  18   76.1
2   John  19   77.0
3   John  20   78.1
5   Paul  22   78.8
6   Paul  23   79.7
7   Paul  24   79.9
8   Khan  25   81.1
9   Khan  26   81.2
10  Khan  27   81.8

                                                                        
                                                        
            
            
              
                
我在风中等你
2020-12-06 16:14

village[duplicated(village), ]
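
Note that duplicated() applied to a whole data frame compares complete rows and flags only the second and later copies. Judging by the sample output shown in this thread, rows that share a name differ in age and height, so this call would likely select nothing there. A small sketch of the difference, using hypothetical toy data (toy and its values are made up):

```r
# Hypothetical toy data: names repeat, but no two rows are identical
toy <- data.frame(
  Names = c("Ann", "Ann", "Bob", "Cid", "Cid", "Cid"),
  age   = 18:23,
  stringsAsFactors = FALSE
)

# Whole-row comparison: no row is an exact copy of an earlier one,
# so whole_row has 0 rows here
whole_row <- toy[duplicated(toy), ]
print(nrow(whole_row))

# Comparing on the Names column only, in both directions,
# keeps every row whose name occurs more than once
by_name <- toy[duplicated(toy$Names) | duplicated(toy$Names, fromLast = TRUE), ]
print(nrow(by_name))
```

The whole-row form is the right tool only when exact copies of entire records are what you want to find.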

                                                                        
                                                        
            
            
              
                
忘了有多久
2020-12-06 16:16

I find @Sven's answer using duplicated the "tidiest", but you can also do this in many other ways. Here are two more:


Use table() and subset by matching the names where the tabulation is > 1 with the names present in the first column:

village[village$Names %in% names(which(table(village$Names) > 1)), ]

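Use ave() to "tabulate" in a slightly different manner, but subset in the same way. Passing seq_along(Names) rather than as.numeric(Names) avoids a coercion warning when Names is a character vector instead of a factor:

village[with(village, ave(seq_along(Names), Names, FUN = length) > 1), ]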
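
Both variants can be checked against the duplicated() approach. The sketch below does so on hypothetical toy data (the data frame toy and its values are made up for illustration). It gives ave() the row positions from seq_along() as its first argument; those values are only placeholders that get overwritten by the group sizes:

```r
# Hypothetical toy data, not the data frame from the question
toy <- data.frame(
  Names = c("Ann", "Ann", "Bob", "Cid", "Cid", "Cid"),
  age   = 18:23,
  stringsAsFactors = FALSE
)

# table(): count each name, keep the names seen more than once
counts    <- table(toy$Names)
dup_names <- names(which(counts > 1))
via_table <- toy[toy$Names %in% dup_names, ]

# ave(): attach the group size to every row, keep rows with size > 1
group_size <- ave(seq_along(toy$Names), toy$Names, FUN = length)
via_ave    <- toy[group_size > 1, ]

# duplicated() in both directions, for comparison
via_dup <- toy[duplicated(toy$Names) | duplicated(toy$Names, fromLast = TRUE), ]

# All three selections should contain the same rows
print(identical(via_table, via_ave) && identical(via_ave, via_dup))
```

All three build a logical index over the rows, so the selected rows keep their original order.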


                                                                        
                                                        
            
            
              
                