When should I use “which” for subsetting?

后端未结

关注

 3  672

It is a toy example.

 iris %>% 
  group_by(Species) %>% 
  summarise(max = Sepal.Width[Sepal.Length == max(Sepal.Length)])

 # A tibble: 3 x 2
  Specie


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  自闭症患者        
                
              
                            
                2020-12-19 14:29
              
            
            
                                                                       
The which removes the NA elements.  If we need to get the same behavior as which where there are NAsuse another condition along with==`

iris %>% 
  group_by(Species) %>% 
  summarise(max = Sepal.Width[Sepal.Length == max(Sepal.Length, na.rm = TRUE) & 
                                   !is.na(Sepal.Length)])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-19 14:32
              
            
            
                                                                       

Since this question is specifically about subsetting, I thought I would
illustrate some of the performance benefits of using which() over a
logical subset brought up in the linked question.

When you want to extract the entire subset, there is not much difference in
processing speed, but using which() needs to allocate less memory. However,if you only want a part of the subset (e.g. to showcase some strange
findings), which() has a significant speed and memory advantage due to
being able to avoid subsetting a data frame twice by subsetting the result of
which() instead.

Here are the benchmarks:

df <- ggplot2::diamonds; dim(df)
#> [1] 53940    10
mu <- mean(df$price)

bench::press(
  n = c(sum(df$price > mu), 10),
  {
    i <- seq_len(n)
    bench::mark(
      logical = df[df$price > mu, ][i, ],
      which_1 = df[which(df$price > mu), ][i, ],
      which_2 = df[which(df$price > mu)[i], ]
    )
  }
)
#> Running with:
#>       n
#> 1 19657
#> 2    10
#> # A tibble: 6 x 11
#>   expression     n      min     mean   median      max `itr/sec` mem_alloc
#>   <chr>      <dbl> <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 logical    19657    1.5ms   1.81ms   1.71ms   3.39ms      553.     5.5MB
#> 2 which_1    19657   1.41ms   1.61ms   1.56ms   2.41ms      620.    2.89MB
#> 3 which_2    19657 826.56us 934.72us 910.88us   1.41ms     1070.    1.76MB
#> 4 logical       10 893.12us   1.06ms   1.02ms   1.93ms      941.    4.21MB
#> 5 which_1       10  814.4us 944.81us 908.16us   1.78ms     1058.    1.69MB
#> 6 which_2       10 230.72us 264.45us 249.28us   1.08ms     3781.  498.34KB
#> # ... with 3 more variables: n_gc <dbl>, n_itr <int>, total_time <bch:tm>


Created on 2018-08-19 by the reprex package (v0.2.0).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  礼貌的吻别        
                
              
                            
                2020-12-19 14:37
              
            
            
                                                                       
When "==" ends up with NA. Try (1:2)[which(c(TRUE, NA))] v.s. (1:2)[c(TRUE, NA)].

If NA is not removed, indexing by NA gives NA (see ?Extract). However, this removal cannot be done by na.omit, as otherwise you may get positions of TRUE potentially wrong. A safe way is to replace NA by FALSE then do indexing. But why not just use which?
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复