Is R's apply family more than syntactic sugar?

旧时难觅i 2020-11-21 22:14

...regarding execution time and/or memory.

If this is not true, prove it with a code snippet. Note that speedup by vectorization does not count. The speedup must c

5 answers
  •  醉梦人生
    2020-11-21 22:34

    When applying a function over subsets of a vector, tapply can be noticeably faster than a for loop. Example:

    df <- data.frame(id = rep(letters[1:10], 100000),
                     value = rnorm(1000000))
    
    # Group sums via tapply
    f1 <- function(x)
      tapply(x$value, x$id, sum)
    
    # The same group sums via an explicit for loop over the unique ids
    f2 <- function(x){
      res <- 0
      for(i in seq_along(l <- unique(x$id)))
        res[i] <- sum(x$value[x$id == l[i]])
      names(res) <- l
      res
    }
    
    library(microbenchmark)
    
    > microbenchmark(f1(df), f2(df), times=100)
    Unit: milliseconds
       expr      min       lq   median       uq      max neval
     f1(df) 28.02612 28.28589 28.46822 29.20458 32.54656   100
     f2(df) 38.02241 41.42277 41.80008 42.05954 45.94273   100
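
    The timings above only mean something if both functions compute the same thing. The following is a minimal sketch of that check, not part of the original answer: the names small, ids, by_tapply and by_loop are made up for illustration, and the data set is deliberately tiny. It also preallocates the loop's result vector rather than growing it from res <- 0.

```r
# Small reproducible data set (sizes are illustrative, not the benchmark sizes)
set.seed(1)
small <- data.frame(id = rep(letters[1:10], 100),
                    value = rnorm(1000),
                    stringsAsFactors = FALSE)

# Group sums via tapply, as in f1
by_tapply <- tapply(small$value, small$id, sum)

# Group sums via a for loop, with a preallocated result vector
ids <- unique(small$id)
by_loop <- numeric(length(ids))
for (i in seq_along(ids))
  by_loop[i] <- sum(small$value[small$id == ids[i]])
names(by_loop) <- ids

# tapply orders groups by factor level, unique() by first appearance,
# so line the results up by name before comparing
same <- all.equal(as.vector(by_tapply[ids]), by_loop,
                  check.attributes = FALSE)
print(same)
```

    Matching by name matters in general: here both orderings happen to coincide, but with ids that first appear out of alphabetical order a positional comparison could fail even though the sums agree.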
    

    apply, however, in most situations doesn't provide any speed increase, and in some cases can even be a lot slower:

    mat <- matrix(rnorm(1000000), nrow=1000)
    
    # Column sums via apply
    f3 <- function(x)
      apply(x, 2, sum)
    
    # Column sums via an explicit for loop
    f4 <- function(x){
      res <- 0
      for(i in 1:ncol(x))
        res[i] <- sum(x[,i])
      res
    }
    
    > microbenchmark(f3(mat), f4(mat), times=100)
    Unit: milliseconds
        expr      min       lq   median       uq      max neval
     f3(mat) 14.87594 15.44183 15.87897 17.93040 19.14975   100
     f4(mat) 12.01614 12.19718 12.40003 15.00919 40.59100   100
    

    But for these situations we've got colSums and rowSums:

    f5 <- function(x)
      colSums(x)
    
    > microbenchmark(f5(mat), times=100)
    Unit: milliseconds
        expr      min       lq   median       uq      max neval
     f5(mat) 1.362388 1.405203 1.413702 1.434388 1.992909   100
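
    As a sanity check on the comparison, here is a small sketch (an illustration under assumed names m, via_apply, via_loop and via_colSums, not code from the original answer) confirming that the three approaches return the same column sums, so the difference between them is speed only:

```r
# Small matrix on purpose; the benchmark above used a 1000 x 1000 one
set.seed(42)
m <- matrix(rnorm(200), nrow = 20)

# apply over columns, as in f3
via_apply <- apply(m, 2, sum)

# Explicit loop, as in f4, but with a preallocated result vector
via_loop <- numeric(ncol(m))
for (i in seq_len(ncol(m)))
  via_loop[i] <- sum(m[, i])

# Dedicated vectorized function, as in f5
via_colSums <- colSums(m)

# all.equal compares within a numeric tolerance
print(all.equal(via_apply, via_loop))
print(all.equal(via_apply, via_colSums))
```

    all.equal is used instead of identical because the implementations are not guaranteed to accumulate floating-point sums in exactly the same way.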
    
