Weighted average value in the presence of NA values

谁说胖子不能爱 提交于 2019-12-11 23:45:49

问题


Here's a very simple example of what I'm dealing with:

data_stack <- data.table(CompA_value = c(10,20,30,40), CompB_value = c(60,70,80,80), CompC_value = c(NA, NA, NA, 100), CompA_weight = c(0.2, 0.3,0.4,0.4), CompB_weight = c(0.8,0.7,0.6,0.4), CompC_weight = c(NA, NA, NA,0.2))

   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight
1:          10          60          NA          0.2          0.8           NA
2:          20          70          NA          0.3          0.7           NA
3:          30          80          NA          0.4          0.6           NA
4:          40          80         100          0.4          0.4          0.2

What I want to do is calculate the weighted average of CompA through C, for each row. However, notice that CompC has NAs for rows 1-3. What I would like is for rows 1-3 to have the weighted average of CompA and CompB, but once CompC becomes active, I'd like to have it automatically included in the calculation.

As it stands, I've done something like this:

> data_stack[, Weighted_average := CompA_value*CompA_weight + CompB_value*CompB_weight + CompC_value * CompC_weight]
> data_stack
   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1:          10          60          NA          0.2          0.8           NA               NA
2:          20          70          NA          0.3          0.7           NA               NA
3:          30          80          NA          0.4          0.6           NA               NA
4:          40          80         100          0.4          0.4          0.2               68

But my "Weighted_average" column obviously won't give me weights for the first 1-3 rows.

What I want is:

 data_stack[, Weighted_average := c((10*0.2 + 60*0.8),(20*0.3 + 70*0.7),(30*0.4 + 80*0.6),(40*0.4 + 80*0.4 + 100*0.2))]
 data_stack
   CompA_value CompB_value CompC_value CompA_weight   CompB_weight CompC_weight   Weighted_average
1:          10          60          NA          0.2          0.8           NA               50
2:          20          70          NA          0.3          0.7           NA               55
3:          30          80          NA          0.4          0.6           NA               60
4:          40          80         100          0.4          0.4          0.2               68

So, note how the first three rows are just the weighted average of A and B, but once C becomes available, it is also included into the calculation.

So I'd like to find out how to write some code which picks up whether there is an NA value, if so, skips it, but if not includes it in the calculation.

I've got a considerably bigger data table so doing it manually is out of the question!

Regards.


回答1:


Here you go:

data_stack$Weighted_average = apply(data_stack,1,function(x){
  y = c(x["CompA_value"]*x["CompA_weight"],
        x["CompB_value"]*x["CompB_weight"],
        x["CompC_value"]*x["CompC_weight"])
  return(sum(y,na.rm = T))
})

Result:

> data_stack
  CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1          10          60          NA          0.2          0.8           NA               50
2          20          70          NA          0.3          0.7           NA               55
3          30          80          NA          0.4          0.6           NA               60
4          40          80         100          0.4          0.4          0.2               68

The function creates a vector with value*weight for each column. Then returns the sum ignoring the NA values. This means that this will ignore NA values in any column.



来源:https://stackoverflow.com/questions/40541172/weighted-average-value-in-the-presence-of-na-values

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!