I am new to R and I am trying to build a frequency/severity simulation. Everything is working fine except that it takes about 10 min to do 10,000 simulations for each of 700 locat
We can pad each list element with NAs at the end so that all elements have the same length, and then rbind them:
out <- do.call(rbind, lapply(obs, `length<-`, max(lengths(obs))))
as.data.frame(out) # if we need a data.frame as output
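To see what the padding step does, here is a minimal sketch on a tiny ragged list. The name obs_small and its values are made up for illustration and are not from the question; only base R is used.

```r
# Hypothetical ragged list standing in for obs (values are made up)
obs_small <- list(c(1.5, 2.5), numeric(0), c(3, 4, 5))

n <- max(lengths(obs_small))                 # longest element: 3
padded <- lapply(obs_small, `length<-`, n)   # extend each element to length n, padding with NA
out_small <- do.call(rbind, padded)          # one row per list element

out_small
```

Setting the length of a vector to something larger than its current length appends NAs, which is why every element ends up with the same number of entries and rbind can stack them into a matrix.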
Or, using the tidyverse:
library(tidyverse)
obs %>%
set_names(seq_along(.)) %>%
stack %>%
group_by(ind) %>%
mutate(Col = paste0("Col", row_number())) %>%
spread(Col, values)
Everything is working fine except that it takes [too long] to do [numsim] simulations
If your real application uses rnorm or similar, you can make a single call to it:
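library(data.table)   # for as.data.table()
set.seed(1223)
numsim = 3e5
# rN.D() and rX.D() are the frequency and severity samplers (defined as in the benchmark below)
freqs = rN.D(numsim)              # number of losses in each simulation
maxlen = max(freqs)
m = matrix(, maxlen, numsim)      # all-NA matrix, one column per simulation
m[row(m) <= freqs[col(m)]] <- rX.D(sum(freqs))   # a single call draws every severity
res = as.data.table(t(m))         # transpose so that each simulation is a row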
I am filling the data the "wrong way" (with each simulation in a column instead of a row) and then transposing, because R fills matrix values in column-major order: a logical-matrix assignment runs down the first column, then the second, and so on.
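A small sketch of the same fill on made-up numbers may make the column-major point concrete. The counts and draws below are hypothetical stand-ins for rN.D() and rX.D(sum(freqs)), chosen so the result is easy to check by eye.

```r
# Hypothetical inputs (not from the question)
freqs <- c(2L, 0L, 3L)           # number of losses in each of 3 simulations
draws <- c(10, 20, 30, 40, 50)   # stand-in for rX.D(sum(freqs))

m <- matrix(NA_real_, nrow = max(freqs), ncol = length(freqs))
keep <- row(m) <= freqs[col(m)]  # TRUE where the row index is within that column's count
m[keep] <- draws                 # values are consumed down column 1, then column 2, ...

res <- t(m)                      # one row per simulation
res
```

Because the draws are consumed column by column, the first two values land in simulation 1, none in simulation 2, and the last three in simulation 3. Filling by row directly would interleave draws across simulations instead.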
If you need to use lapply, here's a benchmark for the final step:
set.seed(1223)
library(dplyr); library(tidyr); library(purrr)
library(data.table)
numsim = 3e5
rN.D <- function(numsim) rpois(numsim, 4)
rX.D <- function(numsim) rnorm(numsim, mean = 5, sd = 4)
freqs <- rN.D(numsim)
obs <- lapply(freqs, function(x) rX.D(x))
system.time({
tidyres = obs %>%
set_names(seq_along(.)) %>%
stack %>%
group_by(ind) %>%
mutate(Col = paste0("Col", row_number())) %>%
spread(Col, values)
})
# user system elapsed
# 16.56 0.31 16.88
system.time({
out <- do.call(rbind, lapply(obs, `length<-`, max(lengths(obs))))
bres = as.data.frame(out)
})
# user system elapsed
# 0.50 0.05 0.55
system.time(
dtres <- setDT(transpose(obs))
)
# user system elapsed
# 0.03 0.01 0.05
The last approach is by far the fastest of the three (the other two are from @akrun's answer).
Comment: I would recommend using either data.table or the tidyverse, not both. Mixing and matching gets messy very quickly. When I was setting up this example, I saw that purrr has its own transpose function, so if you load the packages in a different order, code like this can give different results without warning.
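If both packages do end up loaded, one way to guard against the masking problem is to qualify the call with its namespace. This is a sketch on a made-up list (obs_small is hypothetical), assuming data.table is installed:

```r
# Hypothetical ragged list (values are made up)
obs_small <- list(c(1, 2), 3, c(4, 5, 6))

# Namespace-qualified calls do not depend on the order in which packages were attached
dt <- data.table::setDT(data.table::transpose(obs_small))
dt
```

With the data.table:: prefix, the result is the same whether or not purrr was attached later, at the cost of slightly more verbose code.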