问题
I have two dataframes in R:
df1:
ID Date Discharged
1 2014-08-04
2 2014-12-10
3 2015-01-01
df2:
ID Check-in-Date
1 2013-01-02
1 2014-08-11
2 2014-12-14
2 2015-05-01
3 2012-05-06
3 2015-01-05
I need to compare df1 with df2 based on ID and see which person checked in for another appointment within 7 days of being discharged. How would I accomplish this since df2 has duplicate IDs?
I'd like to create a new column in df1 with 1 if the person checked in and 0 if they didn't. I also need a new column with the check-in-date if they checked in with 7 days, or blank if they didn't.
NEW df1:
ID Date Discharged Within_7days 7-day Checkin
1 2014-08-04 1 2014-08-11
2 2014-12-10 1 2014-12-14
3 2015-01-01 1
回答1:
Your last example should have a Within_7days value of 1 for ID of 3 according to your data.
The package lubridate
is great for working with times. Also, I use dplyr
for manipulating datasets.
# Load Libraries
library(dplyr)
library(lubridate)
# Recreate dataframes
df1 <- data.frame(ID = 1:3, Date.Discharged = c("2014-08-04","2014-12-10","2015-01-01"))
df2 <- data.frame(ID = c(1,2,2,3), CheckDate = c("2014-08-11","2014-12-14","2015-05-01","2015-01-05"))
# Make the characters in Date.Discharged and CheckDate into Date objects:
df1$Date.Discharged <- as.Date(df1$Date.Discharged)
df2$CheckDate <- as.Date(df2$CheckDate)
I assume that you only care about the minimum value for each ID for the check-in Date. That way if you select the minimum value, you get rid of any problems with multiple entries with the same ID.
df2 <- df2 %>% group_by(ID) %>% summarize(CheckDate = min(CheckDate))
# Now join the 2 dataframes together
df1 <- left_join(df1,df2) # this joins by ID by default
Finally, we make the column with Y/N
df1 <- df1 %>% mutate(within_7days = (CheckDate <= Date.Discharged + days(7))*1)
df1
ID Date.Discharged CheckDate within_7days
1 1 2014-08-04 2014-08-11 1
2 2 2014-12-10 2014-12-14 1
3 3 2015-01-01 2015-01-05 1
EDIT:
This is slightly inefficient, but will handle the case where you have values in the 2nd dataset that occur before the discharge date:
We can ignore the following line above:
df2 <- df2 %>% group_by(ID) %>% summarize(CheckDate = min(CheckDate))
and instead go straight to the join step and modify it as follows:
df1 <- left_join(df1,df2) %>% filter(CheckDate > Date.Discharged)
This will add all of the CheckDates that take place each Date Discharged for the same ID. This is again assuming that the ID's in DF1 are unique.
来源:https://stackoverflow.com/questions/36826986/r-compare-dates-in-two-dataframes-and-isolate-rows-that-match-within-a-certain