I have a big performance problem in R. I wrote a function that iterates over a data.frame
object. It simply adds a new column to a data.frame
and a
If you are using for loops, you are most likely coding R as if it were C or Java or something else. R code that is properly vectorised is extremely fast.
Take for example these simple bits of code that generate a vector of 100,000 integers in sequence:
The first code example is how one would code a loop using a traditional coding paradigm. It takes 28 seconds to complete:
system.time({
a <- NULL                     # starts empty, so the vector has to grow on every iteration
for (i in 1:1e5) a[i] <- i
})
user system elapsed
28.36 0.07 28.61
You can get an almost 100-times improvement by the simple action of pre-allocating memory:
system.time({
a <- rep(1, 1e5)              # pre-allocate all 1e5 elements up front
for (i in 1:1e5) a[i] <- i
})
user system elapsed
0.30 0.00 0.29
But with the base R colon operator (:), which is vectorised, the same operation is virtually instantaneous:
system.time(a <- 1:1e5)
user system elapsed
0 0 0
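The same idea should carry over to adding a column to a data.frame. Since the original function is not shown, this is only a minimal sketch under assumptions: the data frame, the column names x, y and z, and the two helper functions are all hypothetical, and the new column is taken to be a simple element-wise sum.

```r
# Hypothetical data: the column names x, y and z are illustrative only.
n <- 1e5
df <- data.frame(x = seq_len(n), y = rev(seq_len(n)))

# Loop version: fills the new column one row at a time.
add_column_loop <- function(d) {
  d$z <- integer(nrow(d))            # pre-allocate the new column
  for (i in seq_len(nrow(d))) {
    d$z[i] <- d$x[i] + d$y[i]
  }
  d
}

# Vectorised version: one operation over the whole columns.
add_column_vec <- function(d) {
  d$z <- d$x + d$y
  d
}

# Time the vectorised version on the full data frame.
system.time(res_vec <- add_column_vec(df))
```

Both functions return the same column; running add_column_loop on a small subset such as df[1:100, ] is a quick way to check that before comparing timings on the full data.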