I am loading a data.table from CSV file that has date, orders, amount etc. fields.
The input file occasionally does not have data for all dates. For exa
Here is how you fill in the gaps within subgroup
# a toy dataset with gaps in the time series
dt <- as.data.table(read.csv(textConnection('"group","date","x"
"a","2017-01-01",1
"a","2017-02-01",2
"a","2017-05-01",3
"b","2017-02-01",4
"b","2017-04-01",5')))
dt[,date := as.Date(date)]
# the desired dates by group
indx <- dt[,.(date=seq(min(date),max(date),"months")),group]
# key the tables and join them using a rolling join
setkey(dt,group,date)
setkey(indx,group,date)
dt[indx,roll=TRUE]
#> group date x
#> 1: a 2017-01-01 1
#> 2: a 2017-02-01 2
#> 3: a 2017-03-01 2
#> 4: a 2017-04-01 2
#> 5: a 2017-05-01 3
#> 6: b 2017-02-01 4
#> 7: b 2017-03-01 4
#> 8: b 2017-04-01 5
Not sure if it's the fastest, but it'll work if there are no NAs in the data:
# just in case these aren't Dates.
NADayWiseOrders$date <- as.Date(NADayWiseOrders$date)
# all desired dates.
alldates <- data.table(date=seq.Date(min(NADayWiseOrders$date), max(NADayWiseOrders$date), by="day"))
# merge
dt <- merge(NADayWiseOrders, alldates, by="date", all=TRUE)
# now carry forward last observation (alternatively, set NA's to 0)
require(xts)
na.locf(dt)
The idiomatic data.table way (using rolling joins) is this:
setkey(NADayWiseOrders, date)
all_dates <- seq(from = as.Date("2013-01-01"),
to = as.Date("2013-01-07"),
by = "days")
NADayWiseOrders[J(all_dates), roll=Inf]
date orders amount guests
1: 2013-01-01 50 2272.55 149
2: 2013-01-02 3 64.04 4
3: 2013-01-03 3 64.04 4
4: 2013-01-04 1 18.81 0
5: 2013-01-05 2 77.62 0
6: 2013-01-06 2 77.62 0
7: 2013-01-07 2 35.82 2