Loops inefficiency in R

暗喜 2021-01-01 02:59

Good morning,

I have been developing in R for a few months, and because I analyze large datasets I have to make sure that my code's execution time does not get too long.

2 answers
  •  佛祖请我去吃肉
    2021-01-01 03:37

    Just a couple of comments. A for loop is roughly as fast as apply and its variants; the real speed-ups come when you vectorise your function as much as possible, that is, when the looping happens in low-level compiled code rather than at the R level (apply just hides the for loop). I'm not sure this is the best example, but consider the following:

    > n <- 1e06
    > sinI <- rep(NA,n)
    > system.time(for(i in 1:n) sinI[i] <- sin(i))
       user  system elapsed 
      3.316   0.000   3.358 
    > system.time(sinI <- sapply(1:n,sin))
       user  system elapsed 
      5.217   0.016   5.311 
    > system.time(sinI <- unlist(lapply(1:n,sin),
    +       recursive = FALSE, use.names = FALSE))
       user  system elapsed 
      1.284   0.012   1.303 
    > system.time(sinI <- sin(1:n))
       user  system elapsed 
      0.056   0.000   0.057 
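
    As a further sketch of what vectorising means in practice, assume a hypothetical task: replace negative values with zero and square the rest. The names x, clamp_square_loop and clamp_square_vec are illustrative and not part of the original answer.

```r
set.seed(1)
x <- rnorm(1e5)

# Loop version: one R-level iteration and one assignment per element.
clamp_square_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- if (x[i] < 0) 0 else x[i]^2
  }
  out
}

# Vectorised version: pmax() and ^ iterate over the whole vector in compiled code.
clamp_square_vec <- function(x) pmax(x, 0)^2

res_loop <- clamp_square_loop(x)
res_vec  <- clamp_square_vec(x)

# Both versions should agree; wrap either call in system.time() to compare speed.
stopifnot(isTRUE(all.equal(res_loop, res_vec)))
```

    Both functions compute the same result; the vectorised one simply hands the element-by-element work to R's internal routines.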
    

    In one of the comments, Marek points out that the time-consuming part of the for loop above is actually the [<- assignment rather than the loop itself: the same loop without the assignment, for(i in 1:n) sin(i), accounts for only a small share of the time.
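
    To check the cost of the assignment on your own machine, a minimal sketch is to time the loop with and without it. The smaller n and the names t_assign, t_no_assign and timings are assumptions for illustration; the numbers depend on the R version and hardware, so none are shown here.

```r
n <- 1e5
sinI <- rep(NA_real_, n)

# Loop that computes sin(i) and stores it: pays for `[<-` on every iteration.
t_assign <- system.time(for (i in 1:n) sinI[i] <- sin(i))

# Same loop with the result discarded: only loop overhead and sin() remain.
t_no_assign <- system.time(for (i in 1:n) sin(i))

# Elapsed seconds side by side; the difference approximates the cost of `[<-`.
timings <- c(with_assignment    = t_assign[["elapsed"]],
             without_assignment = t_no_assign[["elapsed"]])
print(timings)
```

    On recent R versions the gap may be smaller than in the timings quoted above, so it is worth measuring rather than assuming.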
    

    Bottlenecks that can't readily be vectorised can be rewritten in C or Fortran, compiled with R CMD SHLIB, and then called through .Call, .C or .Fortran.
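
    The compile-and-call route can be sketched from within R itself. This is only an illustration under stated assumptions: it needs a working C toolchain for R CMD SHLIB, and the C routine sin_seq, the file names and the variables below are hypothetical. It uses the .C interface because that needs no R-specific headers.

```r
# Hypothetical C routine: fill `out` with sin(1), ..., sin(n).
c_src <- '
#include <math.h>

/* .C passes every argument as a pointer. */
void sin_seq(int *n, double *out) {
    for (int i = 0; i < *n; i++) {
        out[i] = sin((double)(i + 1));
    }
}
'

# Write the source to a temporary directory and compile it with R CMD SHLIB.
build_dir <- tempfile("sinseq")
dir.create(build_dir)
src_file <- file.path(build_dir, "sin_seq.c")
writeLines(c_src, src_file)
lib_file <- file.path(build_dir, paste0("sin_seq", .Platform$dynlib.ext))
status <- system2(file.path(R.home("bin"), "R"),
                  c("CMD", "SHLIB", "-o", shQuote(lib_file), shQuote(src_file)))
stopifnot(status == 0)

# Load the shared library and call the routine; .C returns a list of its arguments.
dyn.load(lib_file)
n <- 1000L
res <- .C("sin_seq", n = n, out = double(n))
sin_c <- res$out

# The compiled loop should match the vectorised R result.
stopifnot(isTRUE(all.equal(sin_c, sin(1:n))))
```

    For a function this simple sin(1:n) is already the better choice; the pattern pays off only for loops with no vectorised equivalent.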

    For more about loop optimisation in R, see the article "How Can I Avoid This Loop or Make It Faster?" in R News.
