Fastest way to fill in missing dates in a data.table

攒了一身酷 2020-12-05 05:58

I am loading a data.table from a CSV file that has date, orders, amount, and other fields.

The input file occasionally does not have data for all dates. For exa

3 answers
  •  执念已碎
    2020-12-05 06:12

    Here is how to fill in the gaps within each subgroup:

    library(data.table)

    # a toy dataset with gaps in the time series
    dt <- as.data.table(read.csv(textConnection('"group","date","x"
    "a","2017-01-01",1
    "a","2017-02-01",2
    "a","2017-05-01",3
    "b","2017-02-01",4
    "b","2017-04-01",5')))
    dt[,date := as.Date(date)]
    
    # the desired dates by group: one row per month between each group's first and last date
    indx <- dt[, .(date = seq(min(date), max(date), by = "month")), by = group]
    
    # key both tables and join them; roll=TRUE carries the last observed x
    # forward into the dates that are missing from dt
    setkey(dt, group, date)
    setkey(indx, group, date)
    dt[indx, roll = TRUE]
    
    #>    group       date x
    #> 1:     a 2017-01-01 1
    #> 2:     a 2017-02-01 2
    #> 3:     a 2017-03-01 2
    #> 4:     a 2017-04-01 2
    #> 5:     a 2017-05-01 3
    #> 6:     b 2017-02-01 4
    #> 7:     b 2017-03-01 4
    #> 8:     b 2017-04-01 5
    
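The rolling join above repeats the previous value in each gap. If a missing date should instead count as "no activity" (an assumption about the data, not something the question states), a plain join against the same date index leaves `NA` in the gaps, which can then be set to 0. A minimal sketch on the same toy data, where `filled` is a name chosen here for illustration:

```r
library(data.table)

# Same toy data as in the answer above, built directly instead of parsed from text
dt <- data.table(
  group = c("a", "a", "a", "b", "b"),
  date  = as.Date(c("2017-01-01", "2017-02-01", "2017-05-01",
                    "2017-02-01", "2017-04-01")),
  x     = c(1, 2, 3, 4, 5)
)

# Full monthly calendar per group, from each group's first to last date
indx <- dt[, .(date = seq(min(date), max(date), by = "month")), by = group]

# Plain (non-rolling) join: dates missing from dt come back with x = NA
filled <- dt[indx, on = .(group, date)]

# Assumption: a missing date means zero, so replace NA with 0 by reference
filled[is.na(x), x := 0]
```

Using `on =` avoids having to key either table first. For group "a" this gives x = 1, 2, 0, 0, 3 and for group "b" x = 4, 0, 5.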
