Fastest way for filling-in missing dates for data.table

攒了一身酷 2020-12-05 05:58

I am loading a data.table from a CSV file that has date, orders, amount, etc. fields.

The input file occasionally does not have data for all dates. For exa

3 answers
  • 2020-12-05 06:12

    Here is how you fill in the gaps within each subgroup:

    library(data.table)

    # a toy dataset with gaps in the time series
    dt <- as.data.table(read.csv(textConnection('"group","date","x"
    "a","2017-01-01",1
    "a","2017-02-01",2
    "a","2017-05-01",3
    "b","2017-02-01",4
    "b","2017-04-01",5')))
    dt[,date := as.Date(date)]
    
    # the desired dates by group
    indx <- dt[,.(date=seq(min(date),max(date),"months")),group]
    
    # key the tables and join them using a rolling join
    setkey(dt,group,date)
    setkey(indx,group,date)
    dt[indx,roll=TRUE]
    
    #>    group       date x
    #> 1:     a 2017-01-01 1
    #> 2:     a 2017-02-01 2
    #> 3:     a 2017-03-01 2
    #> 4:     a 2017-04-01 2
    #> 5:     a 2017-05-01 3
    #> 6:     b 2017-02-01 4
    #> 7:     b 2017-03-01 4
    #> 8:     b 2017-04-01 5
    
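To see what the rolling join contributes, it can help to compare it with a plain join on the same toy data. The following is a minimal sketch, assuming a data.table version that supports `on=` (1.9.6 or later); the names `plain` and `rolled` are introduced here for illustration and are not part of the answer above.

```r
library(data.table)

# toy data mirroring the answer above: a monthly series with gaps, per group
dt <- data.table(
  group = c("a", "a", "a", "b", "b"),
  date  = as.Date(c("2017-01-01", "2017-02-01", "2017-05-01",
                    "2017-02-01", "2017-04-01")),
  x     = c(1, 2, 3, 4, 5)
)

# one row per month between each group's first and last date
indx <- dt[, .(date = seq(min(date), max(date), by = "month")), by = group]

# plain join: months that are absent from dt come back with x = NA
plain <- dt[indx, on = .(group, date)]

# rolling join: the last observation is carried forward into the gaps
rolled <- dt[indx, on = .(group, date), roll = TRUE]
```

Using `on=` avoids having to `setkey()` both tables first; the roll is applied to the last column listed in `on`, here `date`.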
  • 2020-12-05 06:31

    Not sure if it's the fastest, but it'll work if there are no NAs in the data:

    # just in case these aren't Dates. 
    NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
    # all desired dates.
    alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
    # merge
    dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
    # now carry forward last observation (alternatively, set NA's to 0)
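    library(xts)  # attaches zoo, which provides na.locf()
    dt <- na.locf(dt)  # assign the result; na.locf() does not modify dt in place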
    
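The alternative mentioned in the comment above, setting the NAs to 0 instead of carrying values forward, might look like the sketch below. It assumes data.table 1.12.4 or later for `setnafill()` and uses a made-up stand-in table called `orders`, since the real `NADayWiseOrders` is not shown; the values are illustrative only.

```r
library(data.table)

# toy stand-in for NADayWiseOrders (values are made up for illustration)
orders <- data.table(
  date   = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04")),
  orders = c(50, 3, 1),
  amount = c(2272.55, 64.04, 18.81)
)

# every day between the first and last observed date
alldates <- data.table(date = seq(min(orders$date), max(orders$date), by = "day"))

# the outer merge introduces NA rows for the missing days
filled <- merge(orders, alldates, by = "date", all = TRUE)

# replace NA with 0 in every non-date column, by reference (numeric columns only)
value_cols <- setdiff(names(filled), "date")
setnafill(filled, type = "const", fill = 0, cols = value_cols)
```

Zero-filling suits counts such as orders, where a missing day plausibly means nothing happened; carrying the last value forward suits levels such as prices or balances.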
  • 2020-12-05 06:36

    The idiomatic data.table way (using rolling joins) is this:

    setkey(NADayWiseOrders, date)
    all_dates <- seq(from = as.Date("2013-01-01"), 
                       to = as.Date("2013-01-07"), 
                       by = "days")
    
    NADayWiseOrders[J(all_dates), roll=Inf]
    #>          date orders  amount guests
    #> 1: 2013-01-01     50 2272.55    149
    #> 2: 2013-01-02      3   64.04      4
    #> 3: 2013-01-03      3   64.04      4
    #> 4: 2013-01-04      1   18.81      0
    #> 5: 2013-01-05      2   77.62      0
    #> 6: 2013-01-06      2   77.62      0
    #> 7: 2013-01-07      2   35.82      2
    
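Since `NADayWiseOrders` itself is not shown, here is a self-contained sketch of the same keyed rolling join on a made-up table. It also derives the date range from the data rather than hard-coding it, which is an assumption about what is usually wanted; the table `orders` and its values are hypothetical.

```r
library(data.table)

# toy stand-in for NADayWiseOrders; 2013-01-03 is deliberately missing
orders <- data.table(
  date   = as.Date(c("2013-01-01", "2013-01-02", "2013-01-04")),
  orders = c(50L, 3L, 1L),
  amount = c(2272.55, 64.04, 18.81)
)
setkey(orders, date)

# take the range from the data instead of typing the endpoints by hand
all_dates <- seq(min(orders$date), max(orders$date), by = "day")

# roll = Inf carries the last observation forward onto the missing dates
filled <- orders[J(all_dates), roll = Inf]
```

Here the row for 2013-01-03 repeats the values observed on 2013-01-02.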