An example data frame with categorical variables catA, catB, and catC; obs is some observed value.
catA <- rep(factor(c("a","b","c")), length.out=10
This isn't the cleanest solution, but I think it gets close to what you want.
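getAllSubs <- function(df, lookup, fun) {
  # Apply `fun` to the subset of `df` described by each row of `lookup`.
  out <- lapply(seq_len(nrow(lookup)), function(i) {
    df_new <- df
    # Filter only on the columns that are not NA in this lookup row;
    # a row that is entirely NA leaves the full data frame untouched.
    for (j in colnames(lookup)[!is.na(unlist(lookup[i, ]))]) {
      df_new <- df_new[df_new[, j] == lookup[i, j], ]
    }
    fun(df_new)
  })
  # Scalar results collapse to a vector; longer results are stacked row-wise.
  if (all(lengths(out) == 1)) {
    out <- unlist(out)
  } else {
    out <- do.call("rbind", out)
  }
  final <- cbind(lookup, out)
  # is.na() is also TRUE for NaN, so this converts NaN to NA.
  final[is.na(final)] <- NA
  final
}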
As written, you have to construct the lookup table beforehand, but you could just as easily move that construction into the function itself. I added a few lines at the end so that it can accommodate outputs of different lengths and so that NaNs are turned into NAs, just because that seemed to create cleaner output. When every column in a lookup row is NA, the function is applied to the entire original data frame.
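For reference, here is one way the lookup table might be built. This is only a sketch: the helper name makeLookup and the toy values in dat are made up for illustration, and it assumes NA in a lookup column means "do not filter on this column", as in the function above.

```r
# Toy data frame; the values are invented for illustration.
dat <- data.frame(
  catA = factor(c("a", "b", "c", "a", "b", "c")),
  catB = factor(c(1, 1, 2, 2, 1, 2)),
  obs  = c(10, 20, 30, 40, 50, 60)
)

# Hypothetical helper: for each grouping column, offer NA (no filter)
# plus every observed value, then form all combinations of those choices.
makeLookup <- function(df, cols) {
  choices <- lapply(df[cols], function(x) c(NA, as.character(unique(x))))
  expand.grid(choices, stringsAsFactors = FALSE)
}

allsubs <- makeLookup(dat, c("catA", "catB"))
nrow(allsubs)  # (3 + 1) * (2 + 1) = 12 rows; the first row is all NA
```

Calling makeLookup inside getAllSubs would remove the need to pass the table in, at the cost of always computing every combination.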
dat_out <- getAllSubs(dat, allsubs, function(x) mean(x$obs, na.rm = TRUE))
head(dat_out,20)
   catA catB catC      out
1  <NA> <NA> <NA> 47.25446
2     a <NA> <NA> 51.54226
3     b <NA> <NA> 46.45352
4     c <NA> <NA> 43.63767
5  <NA>    1 <NA> 47.23872
6     a    1 <NA> 66.59281
7     b    1 <NA> 32.03513
8     c    1 <NA> 40.66896
9  <NA>    2 <NA> 45.16588
10    a    2 <NA> 50.59323
11    b    2 <NA> 51.02013
12    c    2 <NA> 33.15251
13 <NA>    3 <NA> 51.67809
14    a    3 <NA> 48.13645
15    b    3 <NA> 57.92084
16    c    3 <NA> 49.27710
17 <NA>    4 <NA> 44.93515
18    a    4 <NA> 40.36266
19    b    4 <NA> 44.26717
20    c    4 <NA> 50.74718
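The same function should also handle a summary that returns more than one value per subset, which is what the rbind branch is for. A small sketch, with getAllSubs repeated from above so it runs on its own; the toy data and the summary function are assumptions chosen so that one combination has no matching rows:

```r
# getAllSubs as given in the answer, repeated so this snippet is self-contained.
getAllSubs <- function(df, lookup, fun) {
  out <- lapply(1:nrow(lookup), function(i) {
    df_new <- df
    for (j in colnames(lookup)[which(!is.na(unlist(lookup[i, ])))]) {
      df_new <- df_new[df_new[, j] == lookup[i, j], ]
    }
    fun(df_new)
  })
  if (mean(sapply(out, length) == 1) == 1) {
    out <- unlist(out)
  } else {
    out <- do.call("rbind", out)
  }
  final <- cbind(lookup, out)
  final[is.na(final)] <- NA
  final
}

# Toy data (made up): there is no row with catA == "b" and catB == "x".
dat <- data.frame(
  catA = c("a", "a", "b"),
  catB = c("x", "y", "y"),
  obs  = c(1, 3, 20),
  stringsAsFactors = FALSE
)
allsubs <- expand.grid(catA = c(NA, "a", "b"), catB = c(NA, "x"),
                       stringsAsFactors = FALSE)

# Two values per subset, so the results are stacked into two output columns.
res <- getAllSubs(dat, allsubs, function(x) c(mean = mean(x$obs), n = nrow(x)))
res
```

Here the output has a mean and an n column instead of a single out column, and the empty (b, x) subset ends up with an NA mean rather than NaN.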