Using := in data.table with paste()

后端未结


I have started using data.table for a large population model. So far, I have been impressed because using the data.table structure decreases my simulation run t


2 Answers

孤城傲影

2020-12-19 02:39

Struggling with column names is a strong indicator that the wide format is probably not the best choice for the given problem. I therefore suggest doing the computations in long format and reshaping the result from long to wide format only at the end.

nYears = 10
params = data.table(Site = paste("Site", 1:3),
                    growthRate = c(1.1, 1.2, 1.3), 
                    pop = c(10, 12, 13))
long <- params[CJ(Site = Site, Year = 0:nYears), on = "Site"][
  , growth := cumprod(shift(growthRate, fill = 1)), by = Site][
    , pop := pop * growth][]
dcast(long, Site + growthRate ~ sprintf("popYears%02i", Year), value.var = "pop")

     Site growthRate popYears00 popYears01 popYears02 popYears03 popYears04 popYears05 popYears06 popYears07 popYears08 popYears09 popYears10
1: Site 1        1.1         10       11.0      12.10     13.310    14.6410   16.10510   17.71561   19.48717   21.43589   23.57948   25.93742
2: Site 2        1.2         12       14.4      17.28     20.736    24.8832   29.85984   35.83181   42.99817   51.59780   61.91736   74.30084
3: Site 3        1.3         13       16.9      21.97     28.561    37.1293   48.26809   62.74852   81.57307  106.04499  137.85849  179.21604

Explanation

First, the parameters are expanded to cover 11 years (including year 0) using the cross join function CJ() and a subsequent right join on Site:

params[CJ(Site = Site, Year = 0:nYears), on = "Site"]

       Site growthRate pop Year
 1: Site 1        1.1  10    0
 2: Site 1        1.1  10    1
 3: Site 1        1.1  10    2
 4: Site 1        1.1  10    3
 5: Site 1        1.1  10    4
 6: Site 1        1.1  10    5
 7: Site 1        1.1  10    6
 8: Site 1        1.1  10    7
 9: Site 1        1.1  10    8
10: Site 1        1.1  10    9
11: Site 1        1.1  10   10
12: Site 2        1.2  12    0
13: Site 2        1.2  12    1
14: Site 2        1.2  12    2
15: Site 2        1.2  12    3
16: Site 2        1.2  12    4
17: Site 2        1.2  12    5
18: Site 2        1.2  12    6
19: Site 2        1.2  12    7
20: Site 2        1.2  12    8
21: Site 2        1.2  12    9
22: Site 2        1.2  12   10
23: Site 3        1.3  13    0
24: Site 3        1.3  13    1
25: Site 3        1.3  13    2
26: Site 3        1.3  13    3
27: Site 3        1.3  13    4
28: Site 3        1.3  13    5
29: Site 3        1.3  13    6
30: Site 3        1.3  13    7
31: Site 3        1.3  13    8
32: Site 3        1.3  13    9
33: Site 3        1.3  13   10
      Site growthRate pop Year

Then the cumulative growth factor is computed separately for each Site by applying the cumulative product function cumprod() to the shifted growth rates. The shift (with fill = 1) is required so that the initial year of each Site gets a factor of 1 and growth only starts in year 1. The population for each year is then the initial population multiplied by this growth factor.
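To make the shift-then-accumulate step concrete, here is a minimal sketch for a single site. It assumes the Site 1 values from the example (growth rate 1.1, initial population 10) and only three simulated years; the variable names are illustrative and not part of the answer's code.

```r
library(data.table)

# Illustrative values for one site: Year 0 plus three simulated years
growthRate <- rep(1.1, 4)
pop0 <- 10

# Lag the rates by one position so that Year 0 gets a factor of 1
shifted <- shift(growthRate, fill = 1)

# Cumulative growth factor for each year
growth <- cumprod(shifted)

# Population per year: initial population times cumulative growth
popByYear <- pop0 * growth
print(popByYear)
```

The values should agree with the first four population columns for Site 1 in the result table (10, 11, 12.1, 13.31).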

Finally, the data.table is reshaped from long to wide format using dcast(). The column headers are created on the fly with sprintf(), which pads the year to a fixed width so that the columns sort in the correct order.
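The reason for the fixed-width year can be seen with base R alone. The sketch below compares header names built with paste0() against names built with sprintf(); the vector names are illustrative, and method = "radix" is used only to make the sort independent of the locale.

```r
# Headers without padding, as paste0() would produce them
unpadded <- paste0("popYears", 0:10)

# Headers with the year zero-padded to two digits
padded <- sprintf("popYears%02i", 0:10)

# Character sorting compares digit by digit, so "popYears10"
# lands between "popYears1" and "popYears2"
sortedUnpadded <- sort(unpadded, method = "radix")
print(sortedUnpadded[1:4])

# With padding, alphabetical order equals numerical order
sortedPadded <- sort(padded, method = "radix")
print(identical(sortedPadded, padded))
```

Since dcast() arranges the new columns by the sorted values of the right-hand side of the formula, the padded form keeps year 10 after year 9.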


忘掉有多难

2020-12-19 02:45

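## Start with the first three columns of the example data
## (Site, growthRate, popYears0); exampleTable comes from the question
dt <- exampleTable[, 1:3]

## Run for the first five years
nYears <- 5
for (ii in seq_len(nYears) - 1) {
    ## Existing column, as a symbol so it can be evaluated inside j
    y0 <- as.symbol(paste0("popYears", ii))
    ## Name of the new column to create
    y1 <- paste0("popYears", ii + 1)
    ## Parentheses make := use the value of y1 as the column name
    dt[, (y1) := eval(y0) * growthRate]
}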

## Check that it worked
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809

Edit:

Because the possibility of speeding this up using set() keeps coming up in the comments, here is that additional option.

nYears <- 5

## Things that only need to be calculated once can be taken out of the loop
r <- dt[["growthRate"]]
yy <- paste0("popYears", seq_len(nYears+1)-1)

## A loop using set() and data.table's nice compact syntax
for(ii in seq_len(nYears)) {
    set(dt, , yy[ii+1], r*dt[[yy[ii]]])
}

## Check results
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
