Question
Note: this question is a copy of this one, but with different wording and a suggestion to use data.table instead of dplyr.
I have two datasets that contain scores for different patients at multiple measurement moments, like so:
dt1 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient3"),
"Days" = c(0,10,25,340,100,538),
"Score" = c(NA,2,3,99,5,6),
stringsAsFactors = FALSE)
dt2 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient2","patient3"),
"Days" = c(0,10,25,353,100,150,503),
"Score" = c(1,10,3,4,5,7,6),
stringsAsFactors = FALSE)
> dt1
        ID Days Score
1 patient1    0    NA
2 patient1   10     2
3 patient1   25     3
4 patient1  340    99
5 patient2  100     5
6 patient3  538     6
> dt2
        ID Days Score
1 patient1    0     1
2 patient1   10    10
3 patient1   25     3
4 patient1  353     4
5 patient2  100     5
6 patient2  150     7
7 patient3  503     6
Column Days is the time measurement. I want to join both datasets based on ID and Days if the value for Days is within threshold <- 30. There are five conditions:
- Consecutive days that are within the threshold of each other inside the same data frame (rows 1 and 2) are not merged.
- In some cases, up to four values for the Days variable exist in the same data frame and thus should not be merged with each other. One of these values may still fall within the threshold of a value in the other data frame, and those two will have to be merged (row 4).
- Data that does not fall within the threshold should not be merged, but should not be discarded either (see rows 7 and 8 of the example output).
- If there is no corresponding value for Days in either of the datasets, NA should be filled in.
- The data frames are not of equal length!
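The threshold rule behind these conditions can be sketched as a small predicate. This is only an illustration: within_threshold is a hypothetical helper name that does not appear in the question, and treating a gap of exactly 30 days as "within" the threshold is an assumption.

```r
threshold <- 30

# Hypothetical helper: TRUE when two measurement days are close enough
# to be considered the same measuring moment (boundary assumed inclusive).
within_threshold <- function(days_x, days_y, limit = threshold) {
  abs(days_x - days_y) <= limit
}

within_threshold(340, 353)  # gap of 13 days: candidate for merging
within_threshold(538, 503)  # gap of 35 days: kept as two separate rows
```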
I suspect a data.table rolling join can give me the answer, but I can't seem to figure it out. The expected output is as follows:
library(data.table)
setDT(dt1)
setDT(dt2)
setkey(dt1, ID, Days)  # ?
setkey(dt2, ID, Days)  # ?
# ** do the join **
> dt_joined
        ID Days Score.x Score.y
1 patient1    0      NA       1
2 patient1   10       2      10
3 patient1   25       3       3
4 patient1  353      99       4  <<- merged (days 340 -> 353)
5 patient2  100       5       5
6 patient2  150      NA       7  <<- new row added in dt2
7 patient3  503      NA       6
8 patient3  538       6      NA  <<- same score as row 7 but not within threshold
Any help would be greatly appreciated. A data.table solution is not mandatory.
Answer 1:
A data.table answer has been given here by user Uwe:
https://stackoverflow.com/a/62321710/12079387
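For readers who cannot follow the link, here is a minimal sketch of one way to approach the problem with a rolling join. It is not Uwe's answer, only an illustration under stated assumptions: the merged row takes its Days value from dt2 (as in row 4 of the expected output), a gap of exactly 30 days counts as within the threshold, and the intermediate names near, matched, only1 and only2 are made up for this example.

```r
library(data.table)

dt1 <- data.table(
  ID    = c("patient1","patient1","patient1","patient1","patient2","patient3"),
  Days  = c(0, 10, 25, 340, 100, 538),
  Score = c(NA, 2, 3, 99, 5, 6))
dt2 <- data.table(
  ID    = c("patient1","patient1","patient1","patient1","patient2","patient2","patient3"),
  Days  = c(0, 10, 25, 353, 100, 150, 503),
  Score = c(1, 10, 3, 4, 5, 7, 6))
threshold <- 30

# For every dt1 row, roll to the nearest dt2 row of the same patient.
near <- dt2[dt1,
            .(ID = i.ID, Days1 = i.Days, Days2 = x.Days,
              Score.x = i.Score, Score.y = x.Score),
            on = .(ID, Days), roll = "nearest"]

# Keep only pairs whose gap is within the threshold (assumed inclusive).
near[, gap := abs(Days1 - Days2)]
near <- near[gap <= threshold]

# If several dt1 rows rolled to the same dt2 row, keep only the closest one.
near <- near[near[, .I[which.min(gap)], by = .(ID, Days2)]$V1]

# Merged rows take Days from dt2 (assumption based on the expected output).
matched <- near[, .(ID, Days = Days2, Score.x, Score.y)]

# Rows of either table that were not merged are kept, with NA for the other score.
only1 <- dt1[!near, on = .(ID, Days = Days1)][, .(ID, Days, Score.x = Score)]
only2 <- dt2[!near, on = .(ID, Days = Days2)][, .(ID, Days, Score.y = Score)]

dt_joined <- rbind(matched, only1, only2, fill = TRUE)
setorder(dt_joined, ID, Days)
dt_joined
```

On the sample data this gives the eight rows of the expected output in the question. Note that the matching is done from the point of view of dt1, so with denser data a different pairing strategy may be preferable.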
Source: https://stackoverflow.com/questions/62211431/r-rolling-join-two-data-tables-with-error-margin-on-join