Question
I'm just going to apologize in advance for anything confusing and/or dumb about this question. I am completely new to R, but because of larger project restrictions, I am currently forced to use it for this task.
Right now I have two tables, RMS1 and RMS2, that I would like to join. RMS1 is larger, and I only want to carry over matching columns from RMS2 (left join). For the most part, RMS1 and RMS2 are separate data sets with a unique ID for every entry, but a few IDs overlap between the two tables, and in that case I would like the join to produce a weighted average of the columns they share.
For example, I have columns (ID, sev1, freq1, score1, count1) in both tables. If the same ID appears in both tables, its counts will differ, so I want a new table with the weighted average of sev1, freq1, and score1, weighted by the counts.
I found this old question which I could probably make work for me, but since I would need to do this calculation 13*3 times and I do not have any experience with vectors in R, I thought I would ask and see if there was a more efficient way to get what I want.
Basically, at the end of the day, I am looking to make a new table with all the exact same columns as RMS1, but with sev1, freq1, score1, etc. being weighted averages, if necessary.
EDIT: My bad, it looks like I want a full join. That doesn't really matter in the context of this question, though; I'm assuming I can tweak the kind of join later. I just need to know how to do the weighted average. To make it clearer, here is a simplified example:
RMS1:
id    freq1  sev1  score1  count1
W123  1      5     3       40
F456  2      2     4       55
Y789  0      3     6       25

RMS2:
id    freq1  sev1  score1  count1
S012  3      3     6       25
Y789  3      0     3       50

Joined:
id    freq1  sev1  score1
W123  1      5     3
F456  2      2     4
Y789  2*     1*    4*
S012  3      3     6
So the starred values are the weighted averages for id Y789 (weighted by the counts), because it appears in both RMS tables. Otherwise I just take the raw values from whichever table the id appears in. Hope this helps. Again, I'm new to all this, so sorry for the bad formatting.
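As a quick check of the starred row, the count-weighted means for Y789 can be computed directly with base R's weighted.mean(). The vectors below are simply the Y789 rows copied from RMS1 and RMS2; the variable names are only for illustration.

```r
# Y789 rows from RMS1 (first element) and RMS2 (second element)
freq1  <- c(0, 3)
sev1   <- c(3, 0)
score1 <- c(6, 3)
count1 <- c(25, 50)   # weights

# weighted.mean(x, w) computes sum(x * w) / sum(w)
y789 <- c(
  freq1  = weighted.mean(freq1, count1),   # (0*25 + 3*50) / 75 = 2
  sev1   = weighted.mean(sev1, count1),    # (3*25 + 0*50) / 75 = 1
  score1 = weighted.mean(score1, count1)   # (6*25 + 3*50) / 75 = 4
)
print(y789)
```

This reproduces the starred values 2, 1 and 4 from the Joined table.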
Answer 1:
A solution using dplyr: combine the two data frames by rows, then calculate the weighted mean for each id. The final as.data.frame() is not required if you are fine working with the tibble.
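library(dplyr)
# Stack both tables, group the rows by id, and replace every column except
# count1 with its count1-weighted mean (ids found in only one table keep their values)
Joined <- bind_rows(RMS1, RMS2) %>%
  group_by(id) %>%
  summarise_at(vars(-count1), funs(weighted.mean(., count1))) %>%
  as.data.frame()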
Joined
# id freq1 sev1 score1
# 1 F456 2 2 4
# 2 S012 3 3 6
# 3 W123 1 5 3
# 4 Y789 2 1 4
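In current dplyr releases funs() is deprecated and the _at() verbs such as summarise_at() are superseded by across(). A sketch of the same computation written with across(), assuming dplyr 1.0.0 or later, is below; the two data frames are rebuilt inline with data.frame() so the block runs on its own.

```r
library(dplyr)

RMS1 <- data.frame(id = c("W123", "F456", "Y789"),
                   freq1 = c(1, 2, 0), sev1 = c(5, 2, 3),
                   score1 = c(3, 4, 6), count1 = c(40, 55, 25),
                   stringsAsFactors = FALSE)
RMS2 <- data.frame(id = c("S012", "Y789"),
                   freq1 = c(3, 3), sev1 = c(3, 0),
                   score1 = c(6, 3), count1 = c(25, 50),
                   stringsAsFactors = FALSE)

# Stack the rows of both tables, then collapse each id to one row.
# across() skips the grouping column (id); -count1 excludes the weights,
# and count1 inside the lambda refers to the current group's counts.
Joined <- bind_rows(RMS1, RMS2) %>%
  group_by(id) %>%
  summarise(across(-count1, ~ weighted.mean(.x, count1))) %>%
  as.data.frame()

Joined
```

Because the weighted mean of a single row is just that row's value, ids that appear in only one table pass through unchanged, so this behaves like the full join described in the question's edit.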
DATA
RMS1 <- read.table(text = "id freq1 sev1 score1 count1
W123 1 5 3 40
F456 2 2 4 55
Y789 0 3 6 25",
header = TRUE, stringsAsFactors = FALSE)
RMS2 <- read.table(text = "id freq1 sev1 score1 count1
S012 3 3 6 25
Y789 3 0 3 50",
header = TRUE, stringsAsFactors = FALSE)
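The question mentions repeating this calculation 13*3 times, which suggests the real tables may hold blocks of columns such as freqK, sevK, scoreK and countK for K = 1..13. Assuming each block should be weighted by its own countK (an assumption; the answer above weights everything by count1), a base-R sketch that loops over the suffixes is below. The helper weighted_join() and the block-2 values in the toy data are hypothetical, made up purely for illustration; block 1 matches the question.

```r
# Hypothetical helper: stack two tables and collapse duplicate ids, weighting
# each block of columns (freqK, sevK, scoreK) by that block's own countK.
weighted_join <- function(a, b, n_blocks) {
  combined <- rbind(a, b)                       # columns matched by name
  groups <- split(combined, combined$id)        # one data frame per id
  rows <- lapply(groups, function(g) {
    out <- data.frame(id = g$id[1], stringsAsFactors = FALSE)
    for (k in seq_len(n_blocks)) {
      w <- g[[paste0("count", k)]]              # weights for block k
      for (v in c("freq", "sev", "score")) {
        col <- paste0(v, k)
        out[[col]] <- weighted.mean(g[[col]], w)
      }
    }
    out
  })
  result <- do.call(rbind, rows)
  rownames(result) <- NULL
  result
}

# Toy data with two blocks; block 2 values are invented for illustration
RMS1 <- data.frame(id = c("W123", "F456", "Y789"),
                   freq1 = c(1, 2, 0), sev1 = c(5, 2, 3), score1 = c(3, 4, 6),
                   count1 = c(40, 55, 25),
                   freq2 = c(1, 1, 4), sev2 = c(2, 2, 2), score2 = c(0, 5, 1),
                   count2 = c(10, 10, 10),
                   stringsAsFactors = FALSE)
RMS2 <- data.frame(id = c("S012", "Y789"),
                   freq1 = c(3, 3), sev1 = c(3, 0), score1 = c(6, 3),
                   count1 = c(25, 50),
                   freq2 = c(2, 1), sev2 = c(1, 5), score2 = c(2, 4),
                   count2 = c(5, 30),
                   stringsAsFactors = FALSE)

Joined <- weighted_join(RMS1, RMS2, n_blocks = 2)
Joined
```

For the real data, n_blocks = 13 would cover all 13 blocks, provided the column names follow the freqK/sevK/scoreK/countK pattern assumed here.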
Source: https://stackoverflow.com/questions/49041280/getting-a-weighted-average-in-r-when-joining-two-tables