Using := in data.table with paste()

轻奢々 2020-12-19 02:05

I have started using data.table for a large population model. So far, I have been impressed because using the data.table structure decreases my simulation run t

2 answers
  • 2020-12-19 02:39

    Struggling with column names is a strong indicator that the wide format is probably not the best choice for the given problem. Therefore, I suggest doing the computations in long format and reshaping the result from long to wide format at the end.

    library(data.table)

    nYears <- 10
    params <- data.table(Site = paste("Site", 1:3),
                         growthRate = c(1.1, 1.2, 1.3),
                         pop = c(10, 12, 13))
    # Expand to one row per Site and Year, then compute cumulative growth per Site
    long <- params[CJ(Site = Site, Year = 0:nYears), on = "Site"][
      , growth := cumprod(shift(growthRate, fill = 1)), by = Site][
        , pop := pop * growth][]
    # Reshape to wide; zero-padded years keep the columns in order
    dcast(long, Site + growthRate ~ sprintf("popYears%02i", Year), value.var = "pop")
    
         Site growthRate popYears00 popYears01 popYears02 popYears03 popYears04 popYears05 popYears06 popYears07 popYears08 popYears09 popYears10
    1: Site 1        1.1         10       11.0      12.10     13.310    14.6410   16.10510   17.71561   19.48717   21.43589   23.57948   25.93742
    2: Site 2        1.2         12       14.4      17.28     20.736    24.8832   29.85984   35.83181   42.99817   51.59780   61.91736   74.30084
    3: Site 3        1.3         13       16.9      21.97     28.561    37.1293   48.26809   62.74852   81.57307  106.04499  137.85849  179.21604
    

    Explanation

    First, the parameters are expanded to cover 11 years (including year 0) using the cross join function CJ() and a subsequent right join on Site:

    params[CJ(Site = Site, Year = 0:nYears), on = "Site"]
    
           Site growthRate pop Year
     1: Site 1        1.1  10    0
     2: Site 1        1.1  10    1
     3: Site 1        1.1  10    2
     4: Site 1        1.1  10    3
     5: Site 1        1.1  10    4
     6: Site 1        1.1  10    5
     7: Site 1        1.1  10    6
     8: Site 1        1.1  10    7
     9: Site 1        1.1  10    8
    10: Site 1        1.1  10    9
    11: Site 1        1.1  10   10
    12: Site 2        1.2  12    0
    13: Site 2        1.2  12    1
    14: Site 2        1.2  12    2
    15: Site 2        1.2  12    3
    16: Site 2        1.2  12    4
    17: Site 2        1.2  12    5
    18: Site 2        1.2  12    6
    19: Site 2        1.2  12    7
    20: Site 2        1.2  12    8
    21: Site 2        1.2  12    9
    22: Site 2        1.2  12   10
    23: Site 3        1.3  13    0
    24: Site 3        1.3  13    1
    25: Site 3        1.3  13    2
    26: Site 3        1.3  13    3
    27: Site 3        1.3  13    4
    28: Site 3        1.3  13    5
    29: Site 3        1.3  13    6
    30: Site 3        1.3  13    7
    31: Site 3        1.3  13    8
    32: Site 3        1.3  13    9
    33: Site 3        1.3  13   10
          Site growthRate pop Year
    

    Then the cumulative growth is computed from the shifted growth rates using the cumulative product function cumprod(), separately for each Site. The shift is required so that no growth is applied in the initial year of each Site (the first factor is filled with 1). The population is then obtained by multiplying the growth by the initial population.
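
    The effect of shift() followed by cumprod() can be seen on a single site in isolation. This is only a minimal sketch: the vector names rate, shifted, growth and pop0 are made up for illustration, and the numbers are those of Site 1 for years 0 to 3.

    ```r
    library(data.table)

    # Growth rate of one hypothetical site, repeated for years 0..3
    rate <- rep(1.1, 4)

    # shift() moves the values down by one position;
    # fill = 1 means "no growth applied yet" in year 0
    shifted <- shift(rate, fill = 1)    # 1.0 1.1 1.1 1.1

    # cumprod() turns the per-year factors into cumulative growth factors
    growth <- cumprod(shifted)          # 1.000 1.100 1.210 1.331

    # Multiplying by the initial population gives the population per year
    pop0 <- 10
    pop0 * growth                       # 10.00 11.00 12.10 13.31
    ```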

    Finally, the data.table is reshaped from long to wide format using dcast(). The column headers are created on-the-fly using sprintf() to ensure the correct order of columns.
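
    Why the zero padding matters can be sketched without dcast() at all, assuming the new column names end up ordered as character strings. The variable names below are illustrative only.

    ```r
    years <- 0:10

    # Without padding, "popYears10" sorts before "popYears2"
    unpadded <- sort(paste0("popYears", years), method = "radix")
    unpadded[1:4]   # "popYears0" "popYears1" "popYears10" "popYears2"

    # With %02i every year has two digits, so text order equals numeric order
    padded <- sort(sprintf("popYears%02i", years), method = "radix")
    padded[1:4]     # "popYears00" "popYears01" "popYears02" "popYears03"
    ```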

  • 2020-12-19 02:45
    ## Start with 1st three columns of example data
    dt <- exampleTable[,1:3]
    
    ## Run for 1st five years
    nYears <- 5
    for(ii in seq_len(nYears)-1) {
        y0 <- as.symbol(paste0("popYears", ii))
        y1 <- paste0("popYears", ii+1)
        dt[, (y1) := eval(y0)*growthRate]
    }
    
    ## Check that it worked
    dt
    #     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
    #1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
    #2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
    #3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
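
    The two ingredients of the loop above are a column name built with paste0() on the left of := and a lookup of another column by name on the right. A minimal sketch of one step, using a small made-up table and get() in place of as.symbol()/eval() (an alternative I am assuming here, not what the loop above uses):

    ```r
    library(data.table)

    # Small made-up table, not the original exampleTable
    dt <- data.table(growthRate = c(1.1, 1.2), popYears0 = c(10, 12))

    old <- paste0("popYears", 0)   # name of the existing column
    new <- paste0("popYears", 1)   # name of the column to create

    # Parentheses tell := to use the value of `new` as the column name;
    # get() looks up the existing column by its name
    dt[, (new) := get(old) * growthRate]
    names(dt)   # "growthRate" "popYears0" "popYears1"
    ```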
    

    Edit:

    Because the possibility of speeding this up using set() keeps coming up in the comments, I'll throw this additional option out there.

    nYears <- 5
    
    ## Things that only need to be calculated once can be taken out of the loop
    r <- dt[["growthRate"]]
    yy <- paste0("popYears", seq_len(nYears+1)-1)
    
    ## A loop using set() and data.table's nice compact syntax
    for(ii in seq_len(nYears)) {
        set(dt, , yy[ii+1], r*dt[[yy[ii]]])
    }
    
    ## Check results
    dt
    #     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
    #1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
    #2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
    #3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
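
    Since exampleTable is not shown here, the set() loop can be tried on a stand-in table. The sketch below assumes the first three columns look like the output above (Site, growthRate, popYears0) and checks the last year against the closed form popYears0 * growthRate^5.

    ```r
    library(data.table)

    # Assumed stand-in for the first three columns of exampleTable
    dt <- data.table(Site = paste("Site", 1:3),
                     growthRate = c(1.1, 1.2, 1.3),
                     popYears0 = c(10, 12, 13))

    nYears <- 5
    r  <- dt[["growthRate"]]
    yy <- paste0("popYears", 0:nYears)

    # Each pass adds the next year's column by reference
    for (ii in seq_len(nYears)) {
      set(dt, j = yy[ii + 1], value = r * dt[[yy[ii]]])
    }

    # Closed-form check for the final year
    check <- all.equal(dt[["popYears5"]], dt[["popYears0"]] * r^5)
    check
    ```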
    