apply() is slow - how to make it faster or what are my alternatives?

面向向阳花 2020-12-14 03:33

I have a fairly large data frame, about 10 million rows. It has columns x and y, and what I want is to compute

hypot <- funct         


        
3 answers
  •  感动是毒
    2020-12-14 04:05

    What about with(my_data, sqrt(x^2 + y^2))? It operates on whole columns at once instead of calling a function for every row.

    set.seed(101)
    d <- data.frame(x=runif(1e5),y=runif(1e5))
    
    library(rbenchmark)
    

    Two different per-row functions, the second of which takes advantage of vectorization:

    hypot <- function(x) sqrt(x[1]^2+x[2]^2)
    hypot2 <- function(x) sqrt(sum(x^2))
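
    To make it clearer what each function receives, here is a minimal sketch on a made-up three-row data frame (`small` is hypothetical and not part of the benchmark). apply() converts the data frame to a matrix and then calls the function once per row, passing that row as a numeric vector, so both functions should give the same answer as the column-wise expression:

    ```r
    # Hypothetical toy data, chosen so the results are whole numbers.
    small <- data.frame(x = c(3, 5, 8), y = c(4, 12, 15))

    hypot  <- function(x) sqrt(x[1]^2 + x[2]^2)  # indexes the two elements explicitly
    hypot2 <- function(x) sqrt(sum(x^2))         # squares the whole row vector, then sums

    # Row-wise: one R-level function call per row.
    r1 <- unname(apply(small, 1, hypot))
    r2 <- unname(apply(small, 1, hypot2))

    # Column-wise: a single call that works on entire columns.
    r3 <- with(small, sqrt(x^2 + y^2))

    print(r1)
    print(r2)
    print(r3)
    ```

    The per-row call overhead is negligible for three rows, but it is paid once per row, which is what the benchmark below measures.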
    

    Try compiling these too:

    library(compiler)
    chypot <- cmpfun(hypot)
    chypot2 <- cmpfun(hypot2)
    
    benchmark(sqrt(d[,1]^2+d[,2]^2),
              with(d,sqrt(x^2+y^2)),
              apply(d,1,hypot),
              apply(d,1,hypot2),
              apply(d,1,chypot),
              apply(d,1,chypot2),
              replications=50)
    

    Results:

                           test replications elapsed relative user.self sys.self
    5       apply(d, 1, chypot)           50  61.147  244.588    60.480    0.172
    6      apply(d, 1, chypot2)           50  33.971  135.884    33.658    0.172
    3        apply(d, 1, hypot)           50  63.920  255.680    63.308    0.364
    4       apply(d, 1, hypot2)           50  36.657  146.628    36.218    0.260
    1 sqrt(d[, 1]^2 + d[, 2]^2)           50   0.265    1.060     0.124    0.144
    2  with(d, sqrt(x^2 + y^2))           50   0.250    1.000     0.100    0.144
    

    As expected, the with() solution and the column-indexing solution à la Tyler Rinker are essentially identical in speed; hypot2 is nearly twice as fast as the original hypot (but still about 150 times slower than the vectorized solutions). As already pointed out by the OP, compilation doesn't help very much: the compiled versions are less than 10% faster here.
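
    If rbenchmark is not installed, a rough check is possible with base R alone. The sketch below is an illustration rather than a rerun of the benchmark: it uses a smaller data frame (1e4 rows, an arbitrary choice to keep it quick), times each approach once with system.time(), and confirms that the results agree. The sqrt(rowSums(d^2)) line is an extra alternative not mentioned above; it is the column-wise counterpart of hypot2 and also works with more than two columns.

    ```r
    set.seed(101)
    d <- data.frame(x = runif(1e4), y = runif(1e4))  # smaller than the 1e5 rows used above

    hypot2 <- function(x) sqrt(sum(x^2))

    # Time each approach once; single timings are noisy, so treat them as rough.
    t_apply <- system.time(a <- apply(d, 1, hypot2))
    t_vec   <- system.time(v <- with(d, sqrt(x^2 + y^2)))
    t_rows  <- system.time(r <- sqrt(rowSums(d^2)))

    # All three should agree up to floating-point rounding.
    stopifnot(isTRUE(all.equal(unname(a), v)),
              isTRUE(all.equal(unname(r), v)))

    # Elapsed seconds per approach.
    print(c(apply      = t_apply[["elapsed"]],
            vectorized = t_vec[["elapsed"]],
            rowSums    = t_rows[["elapsed"]]))
    ```

    Exact timings depend on the machine and R version, so no expected output is shown.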
