Data.table: how to get the blazingly fast subsets it promises and apply to a second data.table

Asked by 难免孤独, 2021-01-15 00:55

I'm trying to enrich one dataset (adherence) based on subsets from another (lsr). For each individual row in adherence, I want to calculate (as a third column) the medicati

2 answers

Answer by 南方客, 2021-01-15 01:30

I am not sure why your function is slow (I think you could drop the ifelse call), but I would suggest a merge (a data.table join) instead, which should be faster and lets you work on one table only:

plouf <- lsr[adherence, on = "ID", allow.cartesian = TRUE]
plouf[, year := as.Date(year)]
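# For each distinct year in adherence, join to lsr and sum the remaining days
bob <- rbindlist(lapply(unique(adherence$year), function(x) {
  # join the adherence rows for this year to the lsr rows with the same ID
  plouf <- lsr[adherence[year == x], on = "ID"]
  plouf[, year := as.Date(year)]
  # keep rows where the date falls in [eksd, ENDDATE), then sum the days left until ENDDATE
  plouf[year >= eksd & year < ENDDATE, list(sum = sum(as.numeric(ENDDATE - as.Date(year))), year = year), by = ID]
}))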
bob

   ID sum       year
1:  1  64 2013-02-01
2:  3  63 2013-02-01

You can then merge the result back into adherence:

adherence <- setDT(adherence)
adherence[,year := as.Date(year)]
bob[adherence, on = .(ID,year)]
   ID sum       year
1:  1  NA 2013-01-01
2:  2  NA 2013-01-01
3:  3  NA 2013-01-01
4:  1  64 2013-02-01
5:  2  NA 2013-02-01
6:  3  63 2013-02-01
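
The per-year lapply() loop can probably be avoided, because a single join already pairs every adherence row with the lsr rows for the same ID. The following is a minimal sketch of that idea, not a tested drop-in replacement. It assumes toy data invented here: only the column names ID, eksd, ENDDATE and year come from the code above, and the values are hypothetical.

```r
library(data.table)

# Hypothetical toy data (values invented for illustration only)
lsr <- data.table(
  ID      = c(1L, 3L),
  eksd    = as.Date(c("2013-01-15", "2013-01-20")),
  ENDDATE = as.Date(c("2013-04-06", "2013-04-05"))
)
adherence <- data.table(
  ID   = rep(1:3, times = 2),
  year = as.Date(rep(c("2013-01-01", "2013-02-01"), each = 3))
)

# One join: every adherence row is paired with the lsr rows sharing its ID.
# allow.cartesian = TRUE permits results larger than either input table.
joined <- lsr[adherence, on = "ID", allow.cartesian = TRUE]

# Keep rows whose date lies in [eksd, ENDDATE) and sum the days left until ENDDATE.
# Rows without an lsr match have NA dates and are dropped by the filter.
res <- joined[year >= eksd & year < ENDDATE,
              .(sum = sum(as.numeric(ENDDATE - year))),
              by = .(ID, year)]

# Attach the sums back to adherence; rows without a match get NA.
out <- res[adherence, on = .(ID, year)]
print(out)
```

Grouping by both ID and year gives one result row per adherence row, so the final join cannot multiply rows. Whether this beats the loop on real data depends on how many lsr rows each ID has, so it is worth timing both.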

To read your data, use fread(), which is fast on large files.
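
As a small illustration of fread(), the sketch below writes a hypothetical CSV to a temporary path and reads it back. The file and its contents are assumptions made so the example is self-contained; with real data you would pass your own file path.

```r
library(data.table)

# Hypothetical file, written to a temp path so the sketch runs on its own
path <- tempfile(fileext = ".csv")
fwrite(data.table(ID = 1:3,
                  year = c("2013-01-01", "2013-02-01", "2013-02-01")),
       path)

# fread() returns a data.table directly, so no setDT() call is needed
adherence <- fread(path)

# Make sure the date column has class Date before joining on it
adherence[, year := as.Date(year)]
```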
