Apply a function to dataframe subsetted by all possible combinations of categorical variables

野的像风 2021-01-20 03:30

An example data frame with categorical variables catA, catB, and catC; obs is an observed value.

catA <- rep(factor(c("a","b","c")), length.out=10


        
4 Answers
  •  日久生厌
    2021-01-20 04:12

    An alternative approach: one function to get all combinations of the variables, and another to apply a function over all the resulting subsets. The combinations function was borrowed from another post...

    ## return all combinations of the elements of dat, of every size from 1 up to n
    multicombn <- function(dat, n) {
        unlist(lapply(seq_len(n), function(x) combn(dat, x, simplify=FALSE)),
               recursive=FALSE)
    }
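
    To see what multicombn returns for the three variables used here, the snippet below repeats the definition so that it runs on its own. It is only an illustration; the comments give the values I would expect from combn's ordering.

```r
# Definition repeated from the answer so this snippet is self-contained.
multicombn <- function(dat, n) {
  unlist(lapply(seq_len(n), function(x) combn(dat, x, simplify = FALSE)),
         recursive = FALSE)
}

vars <- c("catA", "catB", "catC")
combos <- multicombn(vars, length(vars))

length(combos)    # 3 singles + 3 pairs + 1 triple = 7
combos[[1]]       # "catA"
combos[[7]]       # "catA" "catB" "catC"
rev(combos)[[2]]  # "catB" "catC"
```

    Reversing the list puts the full three-variable combination first, followed by the pair catB and catC.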
    

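    For allsubs, vars has the form c("catA","catB","catC") and out.name is the name to give the result column, e.g. "mean". func needs to be written in the form that ddply expects, i.e. a function that takes one subset of the data frame and returns the summary value: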

    func=function(x) mean(x$obs, na.rm=TRUE)
    
    library(plyr)
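    allsubs <- function(indat, vars, func=NULL, out.name=NULL) {
        results <- data.frame()
        ## largest combination first, so all grouping columns exist from the start
        nvars <- rev(multicombn(vars, length(vars)))
        for(i in seq_along(nvars)) {
            ## rbind.fill pads grouping columns absent from a subset with NA
            results <-
                rbind.fill(results, ddply(indat, nvars[[i]], func))
        }
        ## the summary column sits right after the grouping columns
        if(!is.null(out.name)) names(results)[length(vars)+1] <- out.name
        results
    }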
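
    For readers without plyr, here is a rough base-R sketch of the same idea. It is not the answer's own method: aggregate() is swapped in for ddply(), and grouping columns missing from a subset are padded with NA by hand rather than by rbind.fill. The data frame toy, its two grouping variables, and its values are made up for illustration.

```r
# Hypothetical toy data; the question's own data frame is not reproduced here.
set.seed(1)
toy <- data.frame(
  catA = rep(c("a", "b", "c"), length.out = 12),
  catB = rep(1:4, length.out = 12),
  obs  = runif(12, 0, 100)
)

# All non-empty combinations of the grouping variables, largest first.
vars <- c("catA", "catB")
combos <- rev(unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
))

# aggregate() stands in for ddply(): one mean per subset that occurs in toy.
pieces <- lapply(combos, function(v) {
  out <- aggregate(toy["obs"], by = toy[v], FUN = mean, na.rm = TRUE)
  # Add the grouping columns this combination left out, filled with NA.
  for (m in setdiff(vars, v)) out[[m]] <- NA
  out[c(vars, "obs")]
})

res <- do.call(rbind, pieces)
names(res)[names(res) == "obs"] <- "mean"
res
```

    As in the plyr version, only groups that actually occur in the data are returned, so the NAs appear in the grouping columns and never in the mean column.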
    

    One difference between this answer and shwaund's: this one does not return rows for empty subsets, so there are no NAs in the results column.

    allsubs(dat, c("catA","catB","catC"), func, out.name="mean")
    > head(allsubs(dat, vars, func, out.name = "mean"),20)
       catA catB catC     mean
    1     a    1    d 56.65909
    2     a    2    d 54.98116
    3     a    3    d 37.52655
    4     a    4    d 58.29034
    5     b    1    e 52.88945
    6     b    2    e 50.43122
    7     b    3    e 52.57115
    8     b    4    e 59.45348
    9     c    1    f 52.41637
    10    c    2    f 34.58122
    11    c    3    f 46.80256
    12    c    4    f 51.58668
    13 <NA>    1    d 56.65909
    14 <NA>    1    e 52.88945
    15 <NA>    1    f 52.41637
    16 <NA>    2    d 54.98116
    17 <NA>    2    e 50.43122
    18 <NA>    2    f 34.58122
    19 <NA>    3    d 37.52655
    20 <NA>    3    e 52.57115
    
