I'm trying to write from a loop to a data frame in R, for example a loop like this:
for (i in 1:20) {
  print(c(i+i, i*i, i/1))
}
and to write
For loops have side effects, so the usual way of doing this is to create an empty data frame before the loop and then add to it on each iteration. You can instantiate it to the correct size and then assign your values to the i'th row on each iteration, or else add to it and reassign the whole thing using rbind().
The former approach will have better performance for large datasets.
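As a minimal sketch of the preallocation approach described above, assuming three numeric columns and 20 iterations as in the question (the column names a, b and c are hypothetical, since the question does not name them):

```r
# Preallocate a data frame with the final number of rows.
# The column names a, b, c are made up for illustration.
n <- 20
d <- data.frame(a = numeric(n), b = numeric(n), c = numeric(n))

for (i in seq_len(n)) {
  # Assign the three values to the i'th row. A list is used so that
  # the same pattern also works when the columns have different types.
  d[i, ] <- list(i + i, i * i, i / 1)
}

head(d, 3)
```

Because the data frame already has its final size, each iteration only overwrites one row instead of copying and growing the whole object.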
If all your values have the same type and you know the number of rows, you can use a matrix in the following way (this will be very fast):
d <- matrix(nrow=20, ncol=3)
for (i in 1:20) { d[i,] <- c(i+i, i*i, i/1)}
If you need a data frame, you can use rbind (as another answer suggests), or functions from package plyr like this:
library(plyr)
ldply(1:20, function(i)c(i+i, i*i, i/1))
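If the matrix loop is fast enough and a data frame is only needed at the end, one base R option is to convert once after the loop with as.data.frame(). This is a sketch rather than a recommendation; the column names twice, square and same are hypothetical:

```r
# Fill a preallocated matrix, as in the loop above.
d <- matrix(nrow = 20, ncol = 3)
for (i in 1:20) {
  d[i, ] <- c(i + i, i * i, i / 1)
}

# Convert to a data frame in a single step after the loop.
df <- as.data.frame(d)
names(df) <- c("twice", "square", "same")  # hypothetical names

str(df)
```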
You could use rbind:
d <- data.frame()
for (i in 1:20) {d <- rbind(d,c(i+i, i*i, i/1))}
Another way would be
do.call("rbind", sapply(1:20, FUN = function(i) c(i+i,i*i,i/1), simplify = FALSE))
     [,1] [,2] [,3]
[1,]    2    1    1
[2,]    4    4    2
[3,]    6    9    3
[4,]    8   16    4
[5,]   10   25    5
[6,]   12   36    6
If you don't specify simplify = FALSE, you have to transpose the result using t(). This can be tedious for large structures.
This solution is especially handy if you have a data set on the large side and/or you need to repeat this many many times.
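To illustrate the transpose route mentioned above: with the default simplify = TRUE, sapply() returns a matrix with one column per iteration, so it has to be turned with t() to get one row per iteration. A small sketch, where the helper name f is only for illustration:

```r
# Hypothetical helper that wraps the per-iteration computation.
f <- function(i) c(i + i, i * i, i / 1)

# Default simplify = TRUE: the result is a 3 x 20 matrix,
# with one column per iteration.
m <- sapply(1:20, f)
dim(m)

# Transposing gives one row per iteration.
via_t <- t(m)

# The simplify = FALSE route builds the same 20 x 3 layout directly.
via_rbind <- do.call("rbind", sapply(1:20, FUN = f, simplify = FALSE))

all(via_t == via_rbind)
```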
Here are some timings of the solutions in this thread:
> system.time(do.call("rbind", sapply(1:20000, FUN = function(i) c(i+i,i*i,i/1), simplify = FALSE)))
user system elapsed
0.05 0.00 0.05
> system.time(ldply(1:20000, function(i)c(i+i, i*i, i/1)))
user system elapsed
0.14 0.00 0.14
> system.time({d <- matrix(nrow=20000, ncol=3)
+ for (i in 1:20000) { d[i,] <- c(i+i, i*i, i/1)}})
user system elapsed
0.10 0.00 0.09
> system.time(ldply(1:20000, function(i)c(i+i, i*i, i/1)))
user system elapsed
62.88 0.00 62.99