I know that loops are slow in R and that I should try to do things in a vectorised manner instead. But why? Why are loops slow and apply fast?
The only answer to the question posed is that loops are not slow if what you need to do is iterate over a set of data performing some function, and that function or operation is not vectorised. A for() loop will, in general, be as quick as apply(), but possibly a little slower than an lapply() call. The last point is well covered on SO, for example in this Answer, and applies if the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop.
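To make the comparison concrete, here is a minimal sketch of the same non-vectorised operation written three ways. The matrix m and the function f are made up purely for illustration; the point is only that all three forms do the same per-row work and return the same result.

```r
# Hypothetical data: 200 rows, 50 columns of random numbers.
set.seed(1)
m <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

# Hypothetical per-row operation, used only as an example.
f <- function(x) max(x) - min(x)

# 1. for() loop filling storage allocated up front
res_for <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  res_for[i] <- f(m[i, ])
}

# 2. apply() over rows (MARGIN = 1)
res_apply <- apply(m, 1, f)

# 3. vapply() over row indices, declaring that each result is one number
res_vapply <- vapply(seq_len(nrow(m)), function(i) f(m[i, ]), numeric(1))

# All three give the same answer.
stopifnot(identical(res_for, res_apply), identical(res_for, res_vapply))
```

Wrapping each variant in system.time() (or a benchmarking package, if you have one installed) lets you check the relative speeds on your own machine rather than taking them on trust.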
Many people think for() loops are slow because they, the users, are writing bad code. In general (though there are several exceptions), if you need to expand/grow an object, that too will involve copying, so you incur the overhead of both copying and growing the object. This is not restricted to loops, but if you copy/grow at each iteration of a loop, the loop is of course going to be slow because you are incurring many copy/grow operations.
The general idiom for using for() loops in R is to allocate the storage you require before the loop starts and then fill in the object thus allocated. If you follow that idiom, loops will not be slow. This is what apply() manages for you, but it is just hidden from view.
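The difference between growing and preallocating can be sketched as follows. The function names grow() and prealloc() and the squaring operation are hypothetical, chosen only to illustrate the idiom (squaring is itself vectorised, so this is not how you would compute squares in practice).

```r
n <- 1e4

# Anti-pattern: grow the result on every iteration.
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)  # builds a new, longer vector each time
  }
  out
}

# Idiom: allocate the storage once, then fill it in.
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Same result either way; only the cost differs.
stopifnot(identical(grow(n), prealloc(n)))
```

Comparing system.time(grow(n)) with system.time(prealloc(n)) for increasing n should show the gap widening as the copy/grow operations pile up.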
Of course, if a vectorised function exists for the operation you are implementing with the for() loop, use it instead of the loop. Likewise, don't use apply() etc. if a vectorised function exists (e.g. apply(foo, 2, mean) is better performed via colMeans(foo)).
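As a small illustration of that last example, the sketch below builds a made-up matrix foo and checks that the two calls agree. The data are random and exist only for demonstration.

```r
# Hypothetical data: 50 rows, 4 columns.
set.seed(42)
foo <- matrix(rnorm(200), nrow = 50, ncol = 4)

via_apply    <- apply(foo, 2, mean)  # calls mean() once per column
via_colMeans <- colMeans(foo)        # single vectorised call

# Compare with a tolerance, since the two may differ by floating-point rounding.
stopifnot(isTRUE(all.equal(via_apply, via_colMeans)))
```

all.equal() is used rather than identical() because the two functions need not accumulate their sums in exactly the same way, so the last few bits of the results can differ.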