问题
My question seems simple and I hope it is.
I have a dataframe that has the date of diagnosis of a disease, a binary variable indicating which drug the patient was on (or exposed and unexposed group), a start and stop date for the drug, and an overall stop date.
ID Diag_date Treatment End.date Drug.start drug.end
1 NA 0 15/03/2002 01/01/2002 01/02/2002
1 NA 1 15/03/2002 01/02/2002 01/03/2002
1 NA 0 15/03/2002 01/03/2002 NA
2 01/04/2002 1 01/05/2002 01/01/2015 01/02/2002
2 01/04/2002 0 01/05/2002 01/02/2002 01/03/2002
2 01/04/2002 0 01/05/2002 01/03/2002 NA
As you can see the date of diagnosis is non time-varying, but the drug start and stop dates are.
Preferably I want an answer to two questions:
1.) How do i transfer the overall End.date
to the final drug.end
for each ID
?
2.) How do I create a binary column that shows if the diagnosis date occurs in the interval between Drug.start
and Drug.end
?
I wish my final data to look like the following:
ID Diag_date Treatment End.Date Drug.start Drug.end Event
1 NA 0 15/03/2002 01/01/2002 01/02/2002 0
1 NA 1 15/03/2002 01/02/2002 01/03/2002 0
1 NA 0 15/03/2002 01/03/2002 15/03/2002 0
2 01/04/2002 1 01/05/2002 01/01/2015 01/02/2002 0
2 01/04/2002 0 01/05/2002 01/02/2002 01/03/2002 0
2 01/04/2002 0 01/05/2002 01/03/2002 01/05/2002 1
Not everyone has a diagnosis date because not everyone in the sample had the disease. The code I wrote is the following:
for (i in 1:nrow(df)) {
if ((df$Diag_date[i] >= df$Drug.start[i]) && ( df$Diag_date[i] <= df$Drug.stop[i])) {
df$Event[i] <- 1
} else {
df$Event[i] <- 0
}
}
the error i get when I run this code is:
missing value where TRUE/FALSE needed
Any help would be much appreciated.
回答1:
You can try
library(dplyr)
df1 %>%
mutate_each(funs(as.Date(., '%d/%m/%Y')), matches('start|end|date')) %>%
mutate(drug.end= as.Date(ifelse(is.na(drug.end), End.date,
drug.end),origin='1970-01-01'),
Event= as.integer((Diag_date >= Drug.start & Diag_date<=drug.end) &
!is.na(Diag_date))) #%>%
#mutate_each(funs(format(., '%d/%m/%Y')), matches('start|end|date'))
# ID Diag_date Treatment End.date Drug.start drug.end Event
#1 1 <NA> 0 2002-03-15 2002-01-01 2002-02-01 0
#2 1 <NA> 1 2002-03-15 2002-02-01 2002-03-01 0
#3 1 <NA> 0 2002-03-15 2002-03-01 2002-03-15 0
#4 2 2002-04-01 1 2002-05-01 2015-01-01 2002-02-01 0
#5 2 2002-04-01 0 2002-05-01 2002-02-01 2002-03-01 0
#6 2 2002-04-01 0 2002-05-01 2002-03-01 2002-05-01 1
As @David Arenburg mentioned, it is better to keep the 'date' columns as the 'Date' class. If you need it in the 'character' format, just uncomment the last line and run it.
NOTE: Removed the group_by
as it was not needed
回答2:
Her's a possible data.table
equivalent
library(data.table)
# Converting to dates
Dates <- names(df)[c(2, 4:6)]
setDT(df)[, (Dates) := lapply(.SD, as.IDate, format = "%d/%m/%Y"), .SDcols = Dates]
# First question
df[is.na(drug.end), drug.end := End.date]
# Second question
df[Diag_date >= Drug.start & Diag_date <= drug.end, Event := 1L]
回答3:
Akrun's answer suffices for the problem in hand. Suggesting a more straight forward code.
A <- read.table("clipboard", header = T)
Dates <- c("Diag_date", "End.date", "Drug.start", "drug.end")
A[,Dates] <- lapply(A[,Dates],function(x) as.Date(x, format = "%d/%m/%Y"))
A$drug.end[is.na(A$drug.end)] <- as.character(A$End.date[is.na(A$drug.end)])
A$Event <- as.numeric((A$Diag_date >= A$Drug.start & A$Diag_date<=A$drug.end) & !is.na(A$Diag_date))
来源:https://stackoverflow.com/questions/31262707/how-to-show-an-event-happened-between-two-dates-in-r