An example data frame with categorical variables catA, catB, and catC; obs is some observed value.
catA <- rep(factor(c("a","b","c")), length.out=10
An alternative approach: one function gets all combinations of the variables, and another applies a function over all subsets. The combinations function was stolen from another post...
## return all combinations of vector up to maximum length n
multicombn <- function(dat, n) {
  unlist(lapply(seq_len(n), function(x) combn(dat, x, simplify=FALSE)), recursive=FALSE)
}
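To make the return value concrete, here is a small sketch that calls multicombn on the three variable names used in this answer. The `combos` and `labels` names are only for illustration.

```r
# Same helper as above: all combinations of up to n elements.
multicombn <- function(dat, n) {
  unlist(lapply(seq_len(n), function(x) combn(dat, x, simplify = FALSE)),
         recursive = FALSE)
}

combos <- multicombn(c("catA", "catB", "catC"), 3)

# 3 singles + 3 pairs + 1 triple
length(combos)
# [1] 7

# Collapse each combination to a label to see the order they come back in.
labels <- vapply(combos, paste, character(1), collapse = "+")
labels
# [1] "catA"           "catB"           "catC"           "catA+catB"
# [5] "catA+catC"      "catB+catC"      "catA+catB+catC"
```

Each list element is a character vector, which is a form ddply accepts for its grouping variables.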
For allsubs, vars is a character vector of the form c("catA","catB","catC"), and out.name (e.g. "mean") names the results column.
func needs to be written in the form that ddply would take, e.g.
func=function(x) mean(x$obs, na.rm=TRUE)
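A function in this form receives one piece of the data frame (the rows of a single group) and returns a summary value. A quick way to check it before handing it to ddply is to call it directly on a small piece. The `piece` data frame below is made up for illustration.

```r
# Summary function in the form ddply expects: takes a data frame piece.
func <- function(x) mean(x$obs, na.rm = TRUE)

# Hypothetical piece standing in for the rows of one group.
piece <- data.frame(obs = c(1, NA, 3))

func(piece)
# [1] 2
```

The NA is dropped by na.rm=TRUE, so the mean is taken over 1 and 3.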
library(plyr)
allsubs <- function(indat, vars, func=NULL, out.name=NULL) {
  results <- data.frame()
  ## every combination of vars, largest subset first
  nvars <- rev(multicombn(vars, length(vars)))
  for(i in seq_along(nvars)) {
    results <-
      rbind.fill(results, ddply(indat, nvars[[i]], func))
  }
  ## the summary column comes right after the grouping columns
  if(!is.null(out.name)) names(results)[length(vars)+1] <- out.name
  results
}
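For anyone who wants to see what the loop does without installing plyr, the per-subset step can be approximated in base R with aggregate(). This is only a sketch of the same idea, not the function above: the `toy` data frame and its values are made up, and the results are kept as a list rather than stacked with rbind.fill.

```r
# Hypothetical toy data; column names mirror the answer, values are invented.
toy <- data.frame(
  catA = c("a", "a", "b", "b"),
  catB = c(1, 2, 1, 2),
  obs  = c(10, 20, 30, 50)
)

# Same helper as in the answer: all combinations of up to n elements.
multicombn <- function(dat, n) {
  unlist(lapply(seq_len(n), function(x) combn(dat, x, simplify = FALSE)),
         recursive = FALSE)
}

# Base-R stand-in for the ddply call: one aggregate() per variable subset.
subset_means <- lapply(multicombn(c("catA", "catB"), 2), function(v) {
  aggregate(toy["obs"], by = toy[v], FUN = mean, na.rm = TRUE)
})

# Means grouped by catA alone
subset_means[[1]]
#   catA obs
# 1    a  15
# 2    b  40
```

Like ddply with its defaults, this only produces rows for combinations that actually occur in the data.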
One difference between this answer and shwaund's: this one does not return rows for empty subsets, so there are no NAs in the results column.
allsubs(dat, c("catA","catB","catC"), func, out.name="mean")
> head(allsubs(dat, vars, func, out.name = "mean"),20)
   catA catB catC     mean
1     a    1    d 56.65909
2     a    2    d 54.98116
3     a    3    d 37.52655
4     a    4    d 58.29034
5     b    1    e 52.88945
6     b    2    e 50.43122
7     b    3    e 52.57115
8     b    4    e 59.45348
9     c    1    f 52.41637
10    c    2    f 34.58122
11    c    3    f 46.80256
12    c    4    f 51.58668
13 <NA>    1    d 56.65909
14 <NA>    1    e 52.88945
15 <NA>    1    f 52.41637
16 <NA>    2    d 54.98116
17 <NA>    2    e 50.43122
18 <NA>    2    f 34.58122
19 <NA>    3    d 37.52655
20 <NA>    3    e 52.57115