Apply a function to dataframe subsetted by all possible combinations of categorical variables

野的像风 2021-01-20 03:30

An example dataframe with categorical variables catA, catB, and catC. Obs is some observed value.

catA <- rep(factor(c("a","b","c")), length.out=10

4 answers

不要未来只要你来 (original poster) 2021-01-20 03:59
              
This isn't the cleanest solution, but I think it gets close to what you want.

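getAllSubs <- function(df, lookup, fun) {
  # Apply `fun` to the subset of `df` described by each row of `lookup`.
  out <- lapply(seq_len(nrow(lookup)), function(i) {
    df_new <- df
    # Filter only on the columns that are not NA in this lookup row;
    # a row of all NAs leaves the full data frame untouched.
    for (j in colnames(lookup)[!is.na(unlist(lookup[i, ]))]) {
      df_new <- df_new[df_new[, j] == lookup[i, j], , drop = FALSE]
    }
    fun(df_new)
  })

  # Scalar results become a vector; longer results are stacked row-wise.
  if (all(lengths(out) == 1)) {
    out <- unlist(out)
  } else {
    out <- do.call("rbind", out)
  }

  final <- cbind(lookup, out)
  final[is.na(final)] <- NA  # is.na() is also TRUE for NaN, so NaN becomes NA
  final
}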


As it is currently written, you have to construct the lookup table beforehand, but you could just as easily move that construction into the function itself. I added a few lines at the end so that it can accommodate outputs of different lengths and so that NaNs are turned into NAs, just because that seemed to create a cleaner output. Note that when every column in a lookup row is NA, the function is applied to the entire original data frame.
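
The answer does not show how the lookup table is built, so here is a minimal sketch of one way it might be done. It assumes the table should hold every combination of each variable's observed values plus NA, where NA means "do not filter on this column". The helper name make_lookup and the toy data are hypothetical, not from the original post.

```r
# Hypothetical helper (not from the original answer): one row per
# combination of values, with NA standing for "any value".
make_lookup <- function(df, cols) {
  # For each column, take NA plus its sorted unique values (as character).
  level_sets <- lapply(df[cols], function(x) {
    c(NA, as.character(sort(unique(x))))
  })
  # expand.grid varies the first column fastest.
  expand.grid(level_sets, stringsAsFactors = FALSE)
}

# Toy data; the column names mirror the question, the values are made up.
dat <- data.frame(
  catA = rep(c("a", "b", "c"), length.out = 12),
  catB = rep(1:4, length.out = 12),
  catC = rep(c("d", "e"), length.out = 12),
  obs  = seq_len(12)
)

allsubs <- make_lookup(dat, c("catA", "catB", "catC"))
nrow(allsubs)  # (3 + 1) * (4 + 1) * (2 + 1) = 60 rows
```

Because the lookup values are stored as character here, the `==` comparison inside getAllSubs relies on R coercing the data column to character, which is fine for factors and small integers but is an assumption worth checking for other column types.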

dat_out <- getAllSubs(dat, allsubs, function(x) mean(x$obs, na.rm = TRUE))

head(dat_out,20)

   catA catB catC      out
1  <NA> <NA> <NA> 47.25446
2     a <NA> <NA> 51.54226
3     b <NA> <NA> 46.45352
4     c <NA> <NA> 43.63767
5  <NA>    1 <NA> 47.23872
6     a    1 <NA> 66.59281
7     b    1 <NA> 32.03513
8     c    1 <NA> 40.66896
9  <NA>    2 <NA> 45.16588
10    a    2 <NA> 50.59323
11    b    2 <NA> 51.02013
12    c    2 <NA> 33.15251
13 <NA>    3 <NA> 51.67809
14    a    3 <NA> 48.13645
15    b    3 <NA> 57.92084
16    c    3 <NA> 49.27710
17 <NA>    4 <NA> 44.93515
18    a    4 <NA> 40.36266
19    b    4 <NA> 44.26717
20    c    4 <NA> 50.74718
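
The NaN-to-NA cleanup in getAllSubs matters when a combination matches no rows at all. The small sketch below illustrates why, using made-up values rather than the data from the question: the mean of an empty vector is NaN, and since is.na() is also TRUE for NaN, assigning NA through an is.na() mask replaces both.

```r
# A subset with no matching rows has an empty obs column.
empty_obs <- numeric(0)
m <- mean(empty_obs, na.rm = TRUE)
is.nan(m)  # TRUE: the mean of zero values is NaN
is.na(m)   # TRUE: is.na() also flags NaN

# The same masking step used at the end of getAllSubs, on made-up results.
res <- data.frame(out = c(1.5, NaN, NA))
res[is.na(res)] <- NA
is.nan(res$out)  # FALSE FALSE FALSE: the NaN is now a plain NA
```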
