Using R: delete rows when a value is repeated fewer than 3 times


I have a data frame with 10 rows and 3 columns:

    a   b c
1   1 201 1
2   2 202 1
3   3 203 1
4   4 204 1
5   5 205 4
6   6 206 5
7   7 207 4
8   8 208 4
9   9 209 8
10 1


                      
4 answers

忘了有多久
2020-12-16 22:43
                                                                       
Here is a solution using ave:

Data[ave(Data$c, Data$c, FUN = length) > 2, ]


Or, using ave with subset:

subset(Data, ave(c, c, FUN = length) > 2)
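To see why this works, it may help to look at the vector that ave returns before it is compared with 2. The sketch below assumes the ten-row data frame from the question, with the last row taken from the read.table example in another answer in this thread; the names counts and kept are only for illustration.

```r
# Sample data (assumed: last row as given in the read.table answer)
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# ave() applies FUN within each group of c and returns a vector
# with one entry per row, so it lines up with the data frame
counts <- ave(Data$c, Data$c, FUN = length)
print(counts)
# 4 4 4 4 3 2 3 3 1 2

# Keep rows whose c value occurs more than twice
kept <- Data[counts > 2, ]
print(kept$a)
# 1 2 3 4 5 7 8
```

Because the counts are already aligned row by row, no merge or lookup step is needed before subsetting.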

                                                                        
                                                        
            
            
              
                
盖世英雄少女心
2020-12-16 22:53
                                                                       
Correct me if I'm wrong, but it seems like you want all the rows where the value in column c occurs more than twice. "Repeated" makes me think the values need to occur consecutively, which is what rle is for, but in that case you would keep only rows 1-4.

That said, the code below finds the rows where the value in column c occurs more than 2 times.  I'm sure this can be done more elegantly, but it works.

lines <-
"a   b c
1 201 1
2 202 1
3 203 1
4 204 1
5 205 4
6 206 5
7 207 4
8 208 4
9 209 8
10 210 5"
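# Read the sample data from the string above
Data <- read.table(text = lines, header = TRUE)
# Count how often each value of c occurs
cVals <- data.frame(table(Data$c))
# Flag rows whose c value occurs more than twice, then keep them
Rows <- Data$c %in% cVals[cVals$Freq > 2, 1]
Data[Rows, ]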
#  a   b c
#1 1 201 1
#2 2 202 1
#3 3 203 1
#4 4 204 1
#5 5 205 4
#7 7 207 4
#8 8 208 4
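For the other reading, where "repeated" means consecutive occurrences, a minimal sketch with rle could look like the following. It assumes the same sample data; runs, run_len and consecutive are illustrative names, not anything from the question.

```r
# Same sample data as above
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# rle() summarises consecutive runs of equal values
runs <- rle(Data$c)
print(runs$lengths)
# 4 1 1 2 1 1

# Expand each run length back to one entry per row
run_len <- rep(runs$lengths, times = runs$lengths)

# Keep rows that belong to a run of more than 2 equal values
consecutive <- Data[run_len > 2, ]
print(consecutive$a)
# 1 2 3 4
```

Under this interpretation the three 4s are dropped, because they never appear more than twice in a row.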

                                                                        
                                                        
            
            
              
                
名媛妹妹
2020-12-16 22:59
            
                                                                       
Using unsplit is probably the easiest way to project a grouped aggregate (in this case using table to get counts, but see tapply for the general case) out to the original data.

subset(Data, with(Data, unsplit(table(c), c)) >= 3)


Equivalently, and closer to Erik's answer:

Data[unsplit(table(Data$c), Data$c) >= 3, ]
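As a rough illustration of the general case mentioned above, here is a sketch that uses tapply instead of table to compute a per-group mean of b and then projects it back onto the rows with unsplit. The choice of mean, and the names group_mean, mean_b and above, are assumptions made for the example only.

```r
# Same sample data as in the question (last row assumed from the read.table answer)
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# One aggregate per group: mean of b for each value of c
group_mean <- tapply(Data$b, Data$c, mean)

# unsplit() spreads the per-group values back out, one per original row
mean_b <- unsplit(group_mean, Data$c)

# Example use: rows whose b is above the mean of their own group
above <- Data[Data$b > mean_b, ]
print(above$a)
# 3 4 7 8 10
```

The count-based filter is the same pattern with table(c) in place of the tapply call.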

                                                                        
                                                        
            
            
              
                
一整个雨季
2020-12-16 23:04
            
                                                                       
Building on Joshua's answer: 

Data[Data$c %in% names(which(table(Data$c) > 2)), ]
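If this is needed in more than one place, the one-liner can be unpacked into steps and wrapped in a small helper. This is only a sketch: keep_frequent, its arguments and the default min_n = 3 are hypothetical names chosen here, and it assumes the same sample data as the question.

```r
# Same sample data as in the question (last row assumed from the read.table answer)
Data <- data.frame(a = 1:10, b = 201:210,
                   c = c(1, 1, 1, 1, 4, 5, 4, 4, 8, 5))

# Hypothetical helper: keep rows whose value in `col` occurs at least `min_n` times
keep_frequent <- function(df, col, min_n = 3) {
  tab <- table(df[[col]])                  # count per distinct value
  frequent <- names(tab)[tab >= min_n]     # values that occur often enough
  # %in% compares the column with the (character) names of the table
  df[df[[col]] %in% frequent, , drop = FALSE]
}

result <- keep_frequent(Data, "c")
print(result$a)
# 1 2 3 4 5 7 8
```

Note that the match goes through the character names of the table, which is fine for integers and strings but may be fragile for non-integer numeric values.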

                                                                        
                                                        
            
            
              
                