Apply a function to dataframe subsetted by all possible combinations of categorical variables

野的像风 2021-01-20 03:30

An example data frame with categorical variables catA, catB, and catC; obs is some observed value.

catA <- rep(factor(c("a","b","c")), length.out=10

4 Answers
  •  不要未来只要你来
    2021-01-20 03:59

    This isn't the cleanest solution, but I think it gets close to what you want.

    getAllSubs <- function(df, lookup, fun) {

      # Apply `fun` to the subset of `df` described by each row of `lookup`
      out <- lapply(seq_len(nrow(lookup)), function(i) {

        df_new <- df

        # Columns that are NA in this lookup row are left unfiltered
        active <- colnames(lookup)[!is.na(unlist(lookup[i, ]))]

        for (j in active) {
          df_new <- df_new[df_new[, j] == lookup[i, j], ]
        }
        fun(df_new)
      })

      # Scalar results become a vector; longer results are stacked row-wise
      if (all(lengths(out) == 1)) {
        out <- unlist(out)
      } else {
        out <- do.call("rbind", out)
      }

      final <- cbind(lookup, out)
      final[is.na(final)] <- NA  # turn NaN (e.g. mean of an empty subset) into NA
      final
    }

    

    As written, you have to construct the lookup table beforehand, but you could just as easily move that construction into the function itself. I added a few lines at the end so that it can accommodate outputs of different lengths and so that NaNs are turned into NAs, which seemed to give cleaner output. When every column in a lookup row is NA, the function is applied to the entire original data frame.
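The lookup table itself is not shown in this answer. One way it could be built, as a minimal sketch: take each categorical column's distinct values, add NA to mean "do not filter on this column", and cross them with `expand.grid`. The data below is made up (the question's own definition is cut off), and `cat_cols` and `level_sets` are hypothetical names; only `allsubs` matches the name passed to `getAllSubs` in this answer.

```r
# Hypothetical example data: sizes and levels are assumptions, not the
# question's original values.
set.seed(1)
dat <- data.frame(
  catA = rep(c("a", "b", "c"), length.out = 24),
  catB = rep(1:4, length.out = 24),
  catC = rep(c("d", "e"), length.out = 24),
  obs  = runif(24, 0, 100),
  stringsAsFactors = FALSE
)

cat_cols <- c("catA", "catB", "catC")

# For each categorical column: NA ("do not filter") plus every observed value.
level_sets <- lapply(dat[cat_cols], function(x) c(NA, sort(unique(x))))

# One row per combination; the first column varies fastest.
allsubs <- expand.grid(level_sets, stringsAsFactors = FALSE)

head(allsubs)
```

Keeping the columns as character or integer rather than factor avoids comparing factors with different level sets inside the filtering loop. Some combinations may match no rows at all, which is where the NaN handling comes in.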

    dat_out <- getAllSubs(dat, allsubs, function(x) mean(x$obs, na.rm = TRUE))
    
    head(dat_out,20)
    
       catA catB catC      out
    1  <NA> <NA> <NA> 47.25446
    2     a <NA> <NA> 51.54226
    3     b <NA> <NA> 46.45352
    4     c <NA> <NA> 43.63767
    5  <NA>    1 <NA> 47.23872
    6     a    1 <NA> 66.59281
    7     b    1 <NA> 32.03513
    8     c    1 <NA> 40.66896
    9  <NA>    2 <NA> 45.16588
    10    a    2 <NA> 50.59323
    11    b    2 <NA> 51.02013
    12    c    2 <NA> 33.15251
    13 <NA>    3 <NA> 51.67809
    14    a    3 <NA> 48.13645
    15    b    3 <NA> 57.92084
    16    c    3 <NA> 49.27710
    17 <NA>    4 <NA> 44.93515
    18    a    4 <NA> 40.36266
    19    b    4 <NA> 44.26717
    20    c    4 <NA> 50.74718
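The answer mentions that the lookup construction could be moved into the function. A rough sketch of that variant follows; `applyAllSubs`, `cat_cols`, and the `toy` data are hypothetical names for illustration, not part of the answer's code, and this version assumes `fun` returns a single number.

```r
# Sketch only: builds the lookup internally, then applies `fun` to each subset.
applyAllSubs <- function(df, cat_cols, fun) {
  # NA means "do not filter on this column"; values are compared as character.
  level_sets <- lapply(df[cat_cols],
                       function(x) c(NA, sort(unique(as.character(x)))))
  lookup <- expand.grid(level_sets, stringsAsFactors = FALSE)

  out <- vapply(seq_len(nrow(lookup)), function(i) {
    keep <- rep(TRUE, nrow(df))
    for (j in cat_cols) {
      val <- lookup[i, j]
      # %in% treats NA in the data as a non-match instead of propagating NA
      if (!is.na(val)) keep <- keep & as.character(df[[j]]) %in% val
    }
    fun(df[keep, , drop = FALSE])
  }, numeric(1))

  out[is.nan(out)] <- NA  # e.g. mean of an empty subset
  cbind(lookup, out)
}

# Made-up data to exercise the function
toy <- data.frame(g = c("a", "a", "b"), h = c(1, 2, 1), obs = c(10, 20, 30))
res <- applyAllSubs(toy, c("g", "h"), function(x) mean(x$obs))
res
```

Using a logical mask instead of repeatedly subsetting the data frame keeps each iteration to a single subset operation, though for large data a grouped approach may scale better.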
    
