Aggregate and Weighted Mean for multiple columns in R

断了今生、忘了曾经 提交于 2020-01-04 06:50:33

问题


The question is basically the samt as this: Aggregate and Weighted Mean in R.

But i want it to compute it on several columns, using data.table, as I have millions of rows. So something like this:

set.seed(42)   # fix seed so that you get the same results
dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                                    tax=rnorm(20),tax2=rnorm(20), assets=1e7+1e7*runif(20), assets2=1e6+1e7*runif(20))

DT <- data.table(dat)

I can compute the weighted mean on one column, assets, like this:

DT[,list(wret = weighted.mean(tax,assets)),by=assetclass]

But how to do it on both assets and assets2?
What if there are several columns, like col=c("assets1", "assets2", "assets3", ... )? And is it also possible to do it for tax, tax1...


回答1:


So you can do it for several columns of weights

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                  tax=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=3:4]
#    assetclass      assets       asets2
# 1:          D -0.14179882 -0.003717957
# 2:          B  0.61146928  0.523913589
# 3:          E -0.28037796 -0.147677384
# 4:          C -0.09658125 -0.010338894
# 5:          A  0.74954460  0.750190947

or you can exclude the non-weight columns from .SD:

DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=-(1:2)]

Here is a variant using matrix multiplication:

DT[, as.list(crossprod(as.matrix(.SD), tax)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]

The matrix multiplication can do it also for several columns tax1, tax2, ...

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                 tax1=rnorm(20), tax2=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, as.list(crossprod(as.matrix(.SD), tax1)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), tax2)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), cbind(tax1, tax2))/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]


来源:https://stackoverflow.com/questions/42993956/aggregate-and-weighted-mean-for-multiple-columns-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!