Aggregate and Weighted Mean in R

迷失自我 2020-12-06 06:04

I'm trying to calculate asset-weighted returns by asset class. For the life of me, I can't figure out how to do it using the aggregate command.

My data frame look

4 answers
  • 2020-12-06 06:14

    This is also easily done with aggregate. It helps to remember alternate equations for a weighted mean.

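    # Weighted mean per class = sum(return * assets) / sum(assets)
    dat$rw <- dat$return * dat$assets                        # store in dat so the formula finds it
    dat1 <- aggregate(rw ~ assetclass, data = dat, sum)      # numerator per class
    datw <- aggregate(assets ~ assetclass, data = dat, sum)  # denominator per class
    # Both results are sorted by assetclass, so the rows line up
    dat1$weighted.return <- dat1$rw / datw$assets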
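
    The identity behind this approach can be checked on a small hand-made data frame. The sketch below uses invented numbers (the `toy` data frame is an illustration, not the asker's data) and compares the ratio-of-sums form with base R's weighted.mean applied one class at a time.

    ```r
    # Toy data, invented for illustration only
    toy <- data.frame(assetclass = c("A", "A", "B", "B"),
                      return     = c(0.10, 0.20, 0.05, -0.05),
                      assets     = c(100, 300, 200, 200))

    # Numerator and denominator of the weighted mean, per class
    toy$rw <- toy$return * toy$assets
    num <- aggregate(rw ~ assetclass, data = toy, sum)
    den <- aggregate(assets ~ assetclass, data = toy, sum)
    res <- data.frame(assetclass = num$assetclass,
                      weighted.return = num$rw / den$assets)

    # Same quantity via weighted.mean(), one class at a time
    check <- sapply(split(toy, toy$assetclass),
                    function(x) weighted.mean(x$return, x$assets))

    res                                                # A is 0.175, B is 0
    all.equal(res$weighted.return, unname(check))      # TRUE
    ```

    For class A the arithmetic is (0.10 * 100 + 0.20 * 300) / 400 = 0.175, which is what both routes return.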
    
  • 2020-12-06 06:20

    For starters, w=(dat$return, dat$assets)) is a syntax error.

    And plyr makes this a little easier:

    > set.seed(42)   # fix seed so that you get the same results
    > dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
    +                   return=rnorm(20), assets=1e7+1e7*runif(20))
    > library(plyr)
    > ddply(dat, .(assetclass),   # so by asset class invoke following function
    +       function(x) data.frame(wret=weighted.mean(x$return, x$assets)))
      assetclass     wret
    1          A -2.27292
    2          B -0.19969
    3          C  0.46448
    4          D -0.71354
    5          E  0.55354
    > 
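
    For readers without plyr, the same split-apply-combine step can be sketched in base R. This is an equivalent written for illustration, not part of the answer above: `wret_by_class` is a hypothetical helper name, and it assumes `dat` has the columns `assetclass`, `return`, and `assets` as in the example.

    ```r
    # Hypothetical helper: asset-weighted return per asset class, base R only
    wret_by_class <- function(dat) {
      pieces <- split(dat, dat$assetclass)            # one data frame per class
      wret <- vapply(pieces,
                     function(x) weighted.mean(x$return, x$assets),
                     numeric(1))                      # one number per class
      data.frame(assetclass = names(wret), wret = unname(wret))
    }

    set.seed(42)
    dat <- data.frame(assetclass = sample(LETTERS[1:5], 20, replace = TRUE),
                      return = rnorm(20), assets = 1e7 + 1e7 * runif(20))
    wret_by_class(dat)
    ```

    The printed values depend on the random draw (and on the R version's sampling algorithm), so no expected output is shown here.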
    
  • 2020-12-06 06:27

    A data.table solution, which will be faster than plyr:

    library(data.table)
    DT <- data.table(dat)
    DT[,list(wret = weighted.mean(return,assets)),by=assetclass]
    ##    assetclass        wret
    ## 1:          A -0.05445455
    ## 2:          E -0.56614312
    ## 3:          D -0.43007547
    ## 4:          B  0.69799701
    ## 5:          C  0.08850954
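
    The rows above come back in order of first appearance of each class. If sorted groups are preferred, data.table's `keyby` argument can be used in place of `by`. The sketch below assumes the same column names and regenerates example data so that it runs on its own.

    ```r
    library(data.table)

    set.seed(42)
    dat <- data.frame(assetclass = sample(LETTERS[1:5], 20, replace = TRUE),
                      return = rnorm(20), assets = 1e7 + 1e7 * runif(20))

    DT <- as.data.table(dat)   # copy of dat as a data.table

    # keyby groups like by, but also sorts the result by assetclass
    DT[, .(wret = weighted.mean(return, assets)), keyby = assetclass]
    ```

    `.()` is data.table's shorthand for `list()`, so the call is otherwise the same as the one in the answer.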
    
  • 2020-12-06 06:27

    The recently released collapse package provides a fast solution to this and similar problems (weighted median, mode, etc.). It offers a full set of Fast Statistical Functions that perform grouped and weighted computations internally in C++:

    library(collapse)
    dat <- data.frame(assetclass = sample(LETTERS[1:5], 20, replace = TRUE), 
                      return = rnorm(20), assets = 1e7+1e7*runif(20))
    
    # collap() with fmean, which supports weights. By default the weights are
    # also aggregated (summed); keep.w = FALSE drops that column.
    collap(dat, return ~ assetclass, fmean, w = ~ assets, keep.w = FALSE)
    ##   assetclass     return
    ## 1          A -0.4667822
    ## 2          B  0.5417719
    ## 3          C -0.8810705
    ## 4          D  0.6301396
    ## 5          E  0.3101673
    
    # Can also use a dplyr-like workflow: (use keep.w = FALSE to omit sum.assets)
    library(magrittr)
    dat %>% fgroup_by(assetclass) %>% fmean(assets)
    ##   assetclass sum.assets     return
    ## 1          A   80683025 -0.4667822
    ## 2          B   27411156  0.5417719
    ## 3          C   22627377 -0.8810705
    ## 4          D  146355734  0.6301396
    ## 5          E   25463042  0.3101673
    
    # Or simply a direct computation yielding a vector:
    dat %$% fmean(return, assetclass, assets)
    ##          A          B          C          D          E 
    ## -0.4667822  0.5417719 -0.8810705  0.6301396  0.3101673 
    