I have started using data.table for a large population model. So far, I have been impressed because using the data.table structure decreases my simulation run t
Struggling with column names is a strong indicator that the wide format is probably not the best choice for the given problem. Therefore, I suggest to do the computations in long form and to reshape the result from long to wide format, finally.
nYears = 10
params = data.table(Site = paste("Site", 1:3),
growthRate = c(1.1, 1.2, 1.3),
pop = c(10, 12, 13))
long <- params[CJ(Site = Site, Year = 0:nYears), on = "Site"][
, growth := cumprod(shift(growthRate, fill = 1)), by = Site][
, pop := pop * growth][]
dcast(long, Site + growthRate ~ sprintf("popYears%02i", Year), value.var = "pop")
Site growthRate popYears 0 popYears 1 popYears 2 popYears 3 popYears 4 popYears 5 popYears 6 popYears 7 popYears 8 popYears 9 popYears10 1: Site 1 1.1 10 11.0 12.10 13.310 14.6410 16.10510 17.71561 19.48717 21.43589 23.57948 25.93742 2: Site 2 1.2 12 14.4 17.28 20.736 24.8832 29.85984 35.83181 42.99817 51.59780 61.91736 74.30084 3: Site 3 1.3 13 16.9 21.97 28.561 37.1293 48.26809 62.74852 81.57307 106.04499 137.85849 179.21604
First, the parameters are expanded to cover 11 years (including year 0) using the cross join function CJ() and a subsequent right join on Site:
params[CJ(Site = Site, Year = 0:nYears), on = "Site"]
Site growthRate pop Year 1: Site 1 1.1 10 0 2: Site 1 1.1 10 1 3: Site 1 1.1 10 2 4: Site 1 1.1 10 3 5: Site 1 1.1 10 4 6: Site 1 1.1 10 5 7: Site 1 1.1 10 6 8: Site 1 1.1 10 7 9: Site 1 1.1 10 8 10: Site 1 1.1 10 9 11: Site 1 1.1 10 10 12: Site 2 1.2 12 0 13: Site 2 1.2 12 1 14: Site 2 1.2 12 2 15: Site 2 1.2 12 3 16: Site 2 1.2 12 4 17: Site 2 1.2 12 5 18: Site 2 1.2 12 6 19: Site 2 1.2 12 7 20: Site 2 1.2 12 8 21: Site 2 1.2 12 9 22: Site 2 1.2 12 10 23: Site 3 1.3 13 0 24: Site 3 1.3 13 1 25: Site 3 1.3 13 2 26: Site 3 1.3 13 3 27: Site 3 1.3 13 4 28: Site 3 1.3 13 5 29: Site 3 1.3 13 6 30: Site 3 1.3 13 7 31: Site 3 1.3 13 8 32: Site 3 1.3 13 9 33: Site 3 1.3 13 10 Site growthRate pop Year
Then the growth is computed from the shifted growth rates using the cumulative product function cumprod() separately for each Site. The shift is required to skip the initial year for each Site. Then the population is computed by multiplying with the intial population.
Finally, the data.table is reshaped from long to wide format using dcast(). The column headers are created on-the-fly using sprintf() to ensure the correct order of columns.
## Start with 1st three columns of example data
dt <- exampleTable[,1:3]
## Run for 1st five years
nYears <- 5
for(ii in seq_len(nYears)-1) {
y0 <- as.symbol(paste0("popYears", ii))
y1 <- paste0("popYears", ii+1)
dt[, (y1) := eval(y0)*growthRate]
}
## Check that it worked
dt
# Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1 1.1 10 11.0 12.10 13.310 14.6410 16.10510
#2: Site 2 1.2 12 14.4 17.28 20.736 24.8832 29.85984
#3: Site 3 1.3 13 16.9 21.97 28.561 37.1293 48.26809
Edit:
Because the possibility of speeding this up using set() keeps coming up in the comments, I'll throw this additional option out there.
nYears <- 5
## Things that only need to be calculated once can be taken out of the loop
r <- dt[["growthRate"]]
yy <- paste0("popYears", seq_len(nYears+1)-1)
## A loop using set() and data.table's nice compact syntax
for(ii in seq_len(nYears)) {
set(dt, , yy[ii+1], r*dt[[yy[ii]]])
}
## Check results
dt
# Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1 1.1 10 11.0 12.10 13.310 14.6410 16.10510
#2: Site 2 1.2 12 14.4 17.28 20.736 24.8832 29.85984
#3: Site 3 1.3 13 16.9 21.97 28.561 37.1293 48.26809