R rolling join two data.tables with error margin on join

Submitted by 落爺英雄遲暮 on 2020-06-25 06:33:28

Question


Note: this question is a copy of this one but with different wording, and a suggestion for data.table instead of dplyr

I have two datasets that contain scores for different patients on multiple measuring moments like so:

dt1 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient3"),
                  "Days" = c(0,10,25,340,100,538),
                  "Score" = c(NA,2,3,99,5,6), 
                  stringsAsFactors = FALSE)
dt2 <- data.frame("ID" = c("patient1","patient1","patient1","patient1","patient2","patient2","patient3"),
                  "Days" = c(0,10,25,353,100,150,503),
                  "Score" = c(1,10,3,4,5,7,6), 
                  stringsAsFactors = FALSE)

> dt1
        ID Days Score
1 patient1    0    NA
2 patient1   10     2
3 patient1   25     3
4 patient1  340    99
5 patient2  100     5
6 patient3  538     6

> dt2
        ID Days Score
1 patient1    0     1
2 patient1   10    10
3 patient1   25     3
4 patient1  353     4
5 patient2  100     5
6 patient2  150     7
7 patient3  503     6

Column Days is the time measurement. I want to join both datasets on ID and Days, treating two Days values as a match when they differ by no more than threshold <- 30. There are five conditions:

  • Consecutive days that are within the threshold from within the same df (rows 1 and 2) are not merged.
  • In some cases, up to four values for the Days variable exist in the same dataframe and thus should not be merged. It might be the case that one of these values does have a counterpart within the threshold in the other dataframe, and these will have to be merged (row 4).
  • Data that does not fall within the threshold should not be merged, but should not be discarded either (see example output rows 7 and 8).
  • If there is no corresponding value for Days in either of the data sets, NA should be filled in.
  • The dataframes are not of equal length!

I suspect a data.table rolling join can give me the answer but I can't seem to figure it out. The expected output is as follows:

library(data.table)
setDT(dt1)
setDT(dt2)
setkey(dt1, ID, Days)  # ?
setkey(dt2, ID, Days)  # ?

# ** do the join **

> dt_joined

        ID Days Score.x Score.y
1 patient1    0      NA       1
2 patient1   10       2      10
3 patient1   25       3       3
4 patient1  353      99       4   <<- merged (days 340 -> 353)
5 patient2  100       5       5
6 patient2  150      NA       7   <<- new row added in dt2
7 patient3  503      NA       6   
8 patient3  538       6      NA   <<- same score as row 7 but not within threshold

Any help would be greatly appreciated. A data.table solution is not mandatory.


Answer 1:


A data.table answer has been given here by user Uwe:

https://stackoverflow.com/a/62321710/12079387
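
The linked answer is not reproduced here. As a rough illustration only (this is not Uwe's code), one way to approach the problem is a rolling join with roll = "nearest", followed by a check against the threshold and an anti-join to recover the unmatched rows of dt2. The helper names a, b, near, used, from_a and from_b are made up for this sketch. It assumes that no two rows of dt1 are closest to the same row of dt2 within the threshold; if that can happen, both would be paired with that row and extra de-duplication would be needed.

```r
library(data.table)

threshold <- 30

dt1 <- data.table(ID    = c("patient1","patient1","patient1","patient1","patient2","patient3"),
                  Days  = c(0,10,25,340,100,538),
                  Score = c(NA,2,3,99,5,6))
dt2 <- data.table(ID    = c("patient1","patient1","patient1","patient1","patient2","patient2","patient3"),
                  Days  = c(0,10,25,353,100,150,503),
                  Score = c(1,10,3,4,5,7,6))

# Work on copies with unambiguous column names. Days is duplicated because
# a rolling join keeps only the Days values of the table passed as i.
a <- dt1[, .(ID, Days, Days.x = Days, Score.x = Score)]
b <- dt2[, .(ID, Days, Days.y = Days, Score.y = Score)]

# For every row of a, look up the row of b with the same ID and the closest Days.
near <- b[a, on = .(ID, Days), roll = "nearest"]

# Accept the pairing only if the two Days values are within the threshold.
near[, ok := !is.na(Days.y) & abs(Days.x - Days.y) <= threshold]

# Remember which rows of b were used by an accepted pairing.
used <- near[ok == TRUE, .(ID, Days = Days.y)]

# Rejected pairings keep the dt1 row on its own, with no dt2 score.
near[ok == FALSE, `:=`(Days.y = Days.x, Score.y = NA_real_)]
from_a <- near[, .(ID, Days = Days.y, Score.x, Score.y)]

# Rows of b that were never used are added back with no dt1 score.
from_b <- b[!used, on = .(ID, Days)]
from_b[, Score.x := NA_real_]
from_b <- from_b[, .(ID, Days, Score.x, Score.y)]

dt_joined <- rbind(from_a, from_b)
setorder(dt_joined, ID, Days)
print(dt_joined)
```

Matched rows take the Days value from dt2 (353 rather than 340), which is what the expected output in the question does; swapping Days.y for Days.x in from_a would keep the dt1 value instead.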



Source: https://stackoverflow.com/questions/62211431/r-rolling-join-two-data-tables-with-error-margin-on-join
