Using := in data.table with paste()

轻奢々
轻奢々 2020-12-19 02:05

I have started using data.table for a large population model. So far, I have been impressed because using the data.table structure decreases my simulation run time…

2 Answers
  •  孤城傲影
    2020-12-19 02:39

    Struggling with column names is a strong indicator that the wide format is probably not the best choice for the given problem. Therefore, I suggest doing the computations in long format and, as a final step, reshaping the result from long to wide format.

    library(data.table)
    nYears = 10
    params = data.table(Site = paste("Site", 1:3),
                        growthRate = c(1.1, 1.2, 1.3), 
                        pop = c(10, 12, 13))
    long <- params[CJ(Site = Site, Year = 0:nYears), on = "Site"][
      , growth := cumprod(shift(growthRate, fill = 1)), by = Site][
        , pop := pop * growth][]
    dcast(long, Site + growthRate ~ sprintf("popYears%02i", Year), value.var = "pop")
    
         Site growthRate popYears00 popYears01 popYears02 popYears03 popYears04 popYears05 popYears06 popYears07 popYears08 popYears09 popYears10
    1: Site 1        1.1         10       11.0      12.10     13.310    14.6410   16.10510   17.71561   19.48717   21.43589   23.57948   25.93742
    2: Site 2        1.2         12       14.4      17.28     20.736    24.8832   29.85984   35.83181   42.99817   51.59780   61.91736   74.30084
    3: Site 3        1.3         13       16.9      21.97     28.561    37.1293   48.26809   62.74852   81.57307  106.04499  137.85849  179.21604
    

    Explanation

    First, the parameters are expanded to cover 11 years (including year 0) using the cross join function CJ() and a subsequent right join on Site:

    params[CJ(Site = Site, Year = 0:nYears), on = "Site"]
    
           Site growthRate pop Year
     1: Site 1        1.1  10    0
     2: Site 1        1.1  10    1
     3: Site 1        1.1  10    2
     4: Site 1        1.1  10    3
     5: Site 1        1.1  10    4
     6: Site 1        1.1  10    5
     7: Site 1        1.1  10    6
     8: Site 1        1.1  10    7
     9: Site 1        1.1  10    8
    10: Site 1        1.1  10    9
    11: Site 1        1.1  10   10
    12: Site 2        1.2  12    0
    13: Site 2        1.2  12    1
    14: Site 2        1.2  12    2
    15: Site 2        1.2  12    3
    16: Site 2        1.2  12    4
    17: Site 2        1.2  12    5
    18: Site 2        1.2  12    6
    19: Site 2        1.2  12    7
    20: Site 2        1.2  12    8
    21: Site 2        1.2  12    9
    22: Site 2        1.2  12   10
    23: Site 3        1.3  13    0
    24: Site 3        1.3  13    1
    25: Site 3        1.3  13    2
    26: Site 3        1.3  13    3
    27: Site 3        1.3  13    4
    28: Site 3        1.3  13    5
    29: Site 3        1.3  13    6
    30: Site 3        1.3  13    7
    31: Site 3        1.3  13    8
    32: Site 3        1.3  13    9
    33: Site 3        1.3  13   10
          Site growthRate pop Year
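
    To see the two steps separately, here is a minimal sketch on a smaller grid (3 sites × 3 years instead of 11 years). The names sites, grid and expanded are only illustrative. CJ() alone produces every Site/Year combination; the subsequent join on Site then attaches growthRate and pop from params to each row.

    ```r
    library(data.table)

    sites <- paste("Site", 1:3)
    params <- data.table(Site = sites,
                         growthRate = c(1.1, 1.2, 1.3),
                         pop = c(10, 12, 13))

    # Cross join: every combination of Site and Year (3 x 3 = 9 rows),
    # sorted by Site, then Year
    grid <- CJ(Site = sites, Year = 0:2)
    print(grid)

    # Right join on Site: each row of grid picks up growthRate and pop
    expanded <- params[grid, on = "Site"]
    print(expanded)
    ```

    With 0:nYears instead of 0:2 this reproduces the 33-row table shown above.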
    

    Then the cumulative growth factor is computed separately for each Site as the cumulative product (cumprod()) of the shifted growth rates. The shift, with fill = 1, gives year 0 a growth factor of 1, so growth only starts after the initial year. The population for each year is then obtained by multiplying the initial population by this cumulative growth factor.
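
    A small sketch of why the shift is needed, using only Site 1 of the example (growth rate 1.1, initial population 10). The variable names year, rate, naive and pop are illustrative. Without the shift, year 0 would already include one year of growth; with it, the result matches the closed form pop0 * rate^Year for a constant rate.

    ```r
    library(data.table)

    # Site 1 of the example: one growth rate per year for Year = 0:10
    year <- 0:10
    rate <- rep(1.1, length(year))

    # Without the shift, year 0 already includes one year of growth (11 instead of 10)
    naive <- 10 * cumprod(rate)

    # shift() lags the rates by one position (type = "lag" is the default);
    # fill = 1 gives year 0 a neutral growth factor
    growth <- cumprod(shift(rate, fill = 1))
    pop <- 10 * growth

    print(rbind(year, naive, pop))
    ```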

    Finally, the data.table is reshaped from long to wide format using dcast(). The column headers are created on the fly with sprintf(); zero-padding the year numbers (%02i) ensures that the columns appear in the correct order, e.g., popYears09 before popYears10.
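
    A quick base-R sketch of the ordering issue, assuming (as I understand it) that dcast() arranges the new columns by the sorted values of the right-hand side. The names years, unpadded and padded are illustrative. Sorting the unpadded names lexically puts popYears10 between popYears1 and popYears2; the zero-padded names sort in numeric order.

    ```r
    years <- 0:10

    # Without padding, "popYears10" sorts between "popYears1" and "popYears2"
    unpadded <- sprintf("popYears%i", years)

    # With %02i the numbers are zero-padded to two digits,
    # so lexical order matches numeric order
    padded <- sprintf("popYears%02i", years)

    print(sort(unpadded))
    print(sort(padded))
    ```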
