Getting a weighted average in R when joining two tables

Posted by 雨燕双飞 on 2019-12-24 19:07:45

Question


I'm just going to apologize in advance for anything confusing and/or dumb about this question. I am completely new to R but because of larger project restrictions, I am currently forced to use it for this task.

Right now I have two tables that I would like to join, RMS1 and RMS2. RMS1 is larger, and I only want to carry over matching columns from RMS2 (left join). For the most part, RMS1 and RMS2 are separate data sets with a unique ID for every entry, but there are a few overlapping IDs between the two tables, and in that case, I would like to get a weighted average of the columns they share in common when I do a join.

For example, I have columns (ID, sev1, freq1, score1, count1) in both tables, and if there are two of the same IDs in both tables, the counts will be different, so I want a new table with the weighted average of sev1, freq1, and score1 based on the counts.

I found this old question which I could probably make work for me, but since I would need to do this calculation 13*3 times and I do not have any experience with vectors in R, I thought I would ask and see if there was a more efficient way to get what I want.

Basically, at the end of the day, I am looking to make a new table with all the exact same columns as RMS1, but with sev1, freq1, score1, etc. being weighted averages, if necessary.

EDITS: My bad, looks like I want a full join. Doesn't really matter in the context of this question though, I'm assuming I can tweak the kind of join later, I just need to know how to do the weighted average. I guess to make it more clear, I'll write out a simplified table example:

RMS1:   id      freq1   sev1    score1  count1
        W123    1       5       3       40
        F456    2       2       4       55
        Y789    0       3       6       25

RMS2:   id      freq1   sev1    score1  count1
        S012    3       3       6       25
        Y789    3       0       3       50

Joined: id      freq1   sev1    score1
        W123    1       5       3
        F456    2       2       4
        Y789    2*      1*      4*
        S012    3       3       6

So the starred values are the weighted averages of id Y789 (weighted on the counts) because it appears in both RMS tables. Otherwise I just take the raw values from either table. Hope this helps. Again, new to all this, sorry for bad formatting.
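The starred values can be checked by hand with base R's weighted.mean(). This is only a minimal sketch of the arithmetic, using the Y789 numbers from the two tables above; the variable name counts is introduced here for illustration.

```r
# Values and counts for id Y789: first element from RMS1, second from RMS2.
counts <- c(25, 50)

freq1  <- weighted.mean(c(0, 3), w = counts)  # (0*25 + 3*50) / (25 + 50)
sev1   <- weighted.mean(c(3, 0), w = counts)  # (3*25 + 0*50) / (25 + 50)
score1 <- weighted.mean(c(6, 3), w = counts)  # (6*25 + 3*50) / (25 + 50)

print(c(freq1 = freq1, sev1 = sev1, score1 = score1))
```

Ids that appear in only one table have a single row, so their weighted average is just the raw value.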


Answer 1:


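A solution using dplyr. We can combine the two data frames by rows and then calculate the weighted mean for each id. Because an id that appears in only one table has a single row, its weighted mean is simply its original value. The final as.data.frame() is not required if you are happy to work with a tibble.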

library(dplyr)

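# Stack both tables, then collapse each id to a single row.
# across() needs dplyr >= 1.0.0; funs() has been deprecated since dplyr 0.8.0.
Joined <- bind_rows(RMS1, RMS2) %>%
  group_by(id) %>%
  # Weight every column except count1 by the matching count1 values
  summarise(across(-count1, ~ weighted.mean(.x, count1))) %>%
  as.data.frame()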
Joined
#     id freq1 sev1 score1
# 1 F456     2    2      4
# 2 S012     3    3      6
# 3 W123     1    5      3
# 4 Y789     2    1      4

DATA

RMS1 <- read.table(text = "id  freq1 sev1 score1 count1
        W123    1   5   3   40
        F456    2   2   4   55
        Y789    0   3   6   25",
                   header = TRUE, stringsAsFactors = FALSE)

RMS2 <- read.table(text = "id  freq1 sev1 score1 count1
        S012    3   3   6   25
        Y789    3   0   3   50",
                   header = TRUE, stringsAsFactors = FALSE)
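If installing dplyr is not an option, the same idea (stack the rows, then aggregate per id) can probably be sketched in base R. This is an illustration rather than a tested drop-in: it assumes every shared column is weighted by count1, as in the example data, and the names combined, value_cols and rows are introduced here.

```r
RMS1 <- read.table(text = "id freq1 sev1 score1 count1
W123 1 5 3 40
F456 2 2 4 55
Y789 0 3 6 25", header = TRUE, stringsAsFactors = FALSE)

RMS2 <- read.table(text = "id freq1 sev1 score1 count1
S012 3 3 6 25
Y789 3 0 3 50", header = TRUE, stringsAsFactors = FALSE)

# Stack the two tables; both have identical column names.
combined <- rbind(RMS1, RMS2)

# Assumption: every column other than id and count1 is a value to average.
value_cols <- setdiff(names(combined), c("id", "count1"))

# One single-row data frame per id, holding the count1-weighted means.
rows <- lapply(split(combined, combined$id), function(g) {
  means <- lapply(g[value_cols], weighted.mean, w = g$count1)
  data.frame(id = g$id[1], means, stringsAsFactors = FALSE)
})

Joined <- do.call(rbind, rows)
rownames(Joined) <- NULL
print(Joined)
```

With the example data this should reproduce the Joined table from the dplyr version, with rows ordered by id.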


Source: https://stackoverflow.com/questions/49041280/getting-a-weighted-average-in-r-when-joining-two-tables
