Data.table: how to get the blazingly fast subsets it promises and apply to a second data.table

难免孤独 2021-01-15 00:55

I'm trying to enrich one dataset (adherence) based on subsets from another (lsr). For each individual row in adherence, I want to calculate (as a third column) the medicati

2 answers
  •  南方客 (original poster)
    2021-01-15 01:30

    I'm not sure why your function is slow (I think you could drop the ifelse call), but I would suggest using a join (merge) instead, which should be faster and lets you work on a single table:

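    # Join every adherence row to all lsr rows that share its ID
    plouf <- lsr[adherence, on = "ID", allow.cartesian = TRUE]
    plouf[, year := as.Date(year)]  # as.Date, not as.date (R is case-sensitive)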
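    # For each distinct date in adherence: join, keep the lsr rows whose
    # [eksd, ENDDATE) interval contains that date, and sum the remaining days per ID
    bob <- rbindlist(lapply(unique(adherence$year), function(x) {
      plouf <- lsr[adherence[year == x], on = "ID"]
      plouf[, year := as.Date(year)]
      plouf[year >= eksd & year < ENDDATE,
            list(sum = sum(as.numeric(ENDDATE - as.Date(year))), year = year),
            by = ID]
    }))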
    bob
    
       ID sum       year
    1:  1  64 2013-02-01
    2:  3  63 2013-02-01
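
The loop over dates can probably be avoided by doing the join once and grouping on both ID and year. This is only a minimal sketch on hypothetical toy data: the column names (ID, eksd, ENDDATE, year) follow the code above, but the values are invented, so the numbers will not match the output shown here.

```r
library(data.table)

# Hypothetical toy data: column names follow the answer, values are invented.
lsr <- data.table(
  ID      = c(1L, 1L, 3L),
  eksd    = as.Date(c("2013-01-15", "2013-01-20", "2013-01-20")),
  ENDDATE = as.Date(c("2013-03-01", "2013-02-11", "2013-03-10"))
)
adherence <- data.table(
  ID   = rep(1:3, times = 2),
  year = as.Date(rep(c("2013-01-01", "2013-02-01"), each = 3))
)

# One join for all dates; IDs with no match in lsr (here ID 2) get NA dates.
joined <- lsr[adherence, on = "ID", allow.cartesian = TRUE]

# Keep the rows whose [eksd, ENDDATE) interval contains the date, then sum
# the remaining days per ID and date. Rows with NA dates fail the filter.
bob <- joined[year >= eksd & year < ENDDATE,
              .(sum = sum(as.numeric(ENDDATE - year))),
              by = .(ID, year)]
bob
```

Grouping by .(ID, year) gives one row per ID and date, which is the shape needed for joining back to adherence.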
    

    You can then join the result back to adherence:

    adherence <- setDT(adherence)
    adherence[,year := as.Date(year)]
    bob[adherence, on = .(ID,year)]
       ID sum       year
    1:  1  NA 2013-01-01
    2:  2  NA 2013-01-01
    3:  3  NA 2013-01-01
    4:  1  64 2013-02-01
    5:  2  NA 2013-02-01
    6:  3  63 2013-02-01
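
If the goal is to add the result to adherence as a third column rather than build a new table, an update join with := may be a better fit. A minimal sketch, assuming adherence has the six rows shown above and bob holds the two rows from the earlier output (both tables are rebuilt by hand here so the snippet runs on its own):

```r
library(data.table)

# Tables rebuilt by hand from the printed output above (an assumption
# about the real data, which is not shown in the thread).
adherence <- data.table(
  ID   = rep(1:3, times = 2),
  year = as.Date(rep(c("2013-01-01", "2013-02-01"), each = 3))
)
bob <- data.table(
  ID   = c(1L, 3L),
  sum  = c(64, 63),
  year = as.Date(c("2013-02-01", "2013-02-01"))
)

# Update join: for each row of bob, find the matching adherence rows on
# (ID, year) and write bob's sum into them by reference.
# Unmatched adherence rows get NA in the new column.
adherence[bob, on = .(ID, year), sum := i.sum]
adherence
```

Because := modifies adherence in place, no copy of the table is made and the row order stays as it was.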
    

    To read your data, use fread(), which is fast on large files.
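
A minimal sketch of reading with fread(), assuming the data sits in a CSV file with ID, eksd and ENDDATE columns. The file here is a temporary one written just for the example, and its contents are invented:

```r
library(data.table)

# Write a small hypothetical CSV so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,eksd,ENDDATE",
             "1,2013-01-15,2013-03-01",
             "3,2013-01-20,2013-03-10"), tmp)

# fread() detects the separator and column types and returns a data.table.
lsr <- fread(tmp)

# Depending on the data.table version, date columns may come back as
# character or as IDate, so convert explicitly before comparing dates.
date_cols <- c("eksd", "ENDDATE")
lsr[, (date_cols) := lapply(.SD, as.Date), .SDcols = date_cols]

unlink(tmp)
lsr
```

Since fread() already returns a data.table, no setDT() call is needed afterwards.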
