I'm trying to enrich one dataset (adherence) based on subsets from another (lsr). For each individual row in adherence, I want to calculate (as a third column) the medicati
I am not sure why your function is slow (I think you could drop the ifelse call), but I would suggest joining the two tables so that you operate on a single table, which should be faster:
plouf <- lsr[adherence, on = "ID", allow.cartesian=TRUE]
plouf[,year := as.Date(year)]
bob <- rbindlist(lapply(unique(adherence$year),function(x){
plouf <- lsr[adherence[year == x], on = "ID"]
plouf[,year := as.Date(year)]
plouf[year >= eksd & year < ENDDATE,list(sum = sum(as.numeric(ENDDATE-as.Date(year))), year = year), by = ID]
}))
bob
ID sum year
1: 1 64 2013-02-01
2: 3 63 2013-02-01
You can then join the result back to adherence:
adherence <- setDT(adherence)
adherence[,year := as.Date(year)]
bob[adherence, on = .(ID,year)]
ID sum year
1: 1 NA 2013-01-01
2: 2 NA 2013-01-01
3: 3 NA 2013-01-01
4: 1 64 2013-02-01
5: 2 NA 2013-02-01
6: 3 63 2013-02-01
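For anyone who wants to run the idea end to end, here is a minimal sketch of the single-join variant (no per-year loop, grouping by ID and year instead). The two tables are hypothetical: the real lsr and adherence are not shown in this thread, so the values below are made up and simply chosen so that the totals come out as 64 and 63.

```r
library(data.table)

# Hypothetical inputs; the real lsr and adherence tables are assumptions here.
lsr <- data.table(
  ID      = c(1L, 2L, 3L),
  eksd    = as.Date(c("2013-01-15", "2012-06-01", "2013-01-20")),
  ENDDATE = as.Date(c("2013-04-06", "2012-09-01", "2013-04-05"))
)
adherence <- data.table(
  ID   = rep(1:3, times = 2),
  year = as.Date(rep(c("2013-01-01", "2013-02-01"), each = 3))
)

# Attach every prescription with the same ID to each adherence row.
joined <- lsr[adherence, on = "ID", allow.cartesian = TRUE]

# Keep prescriptions that are active on the reference date,
# then sum the days remaining until ENDDATE per ID and date.
res <- joined[year >= eksd & year < ENDDATE,
              .(sum = sum(as.numeric(ENDDATE - year))),
              by = .(ID, year)]

# Join back so that rows without an active prescription get NA.
out <- res[adherence, on = .(ID, year)]
```

Whether this or the per-year loop is preferable probably depends on how large the joined table gets; the loop keeps each intermediate join smaller.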
To read your data, use the fread() function, which is fast on large files.
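A small sketch of what that could look like. The file and its column layout are assumptions (a temporary CSV is written first so the example is self-contained); only fread() itself comes from the advice above.

```r
library(data.table)

# Hypothetical CSV standing in for the real lsr file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,eksd,ENDDATE",
             "1,2013-01-15,2013-04-06",
             "3,2013-01-20,2013-04-05"), tmp)

# fread() returns a data.table directly, so no setDT() is needed afterwards.
lsr <- fread(tmp)

# Make sure the date columns are of class Date before comparing them.
date_cols <- c("eksd", "ENDDATE")
lsr[, (date_cols) := lapply(.SD, as.Date), .SDcols = date_cols]
```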