Getting same output as cut() using speedier hist() or findInterval()?

执笔经年 2021-01-27 00:11

I read this article, http://www.r-bloggers.com/comparing-hist-and-cut-r-functions/, and found hist() to be about 4 times faster than cut() on my PC. My sc

2 answers
  •  清歌不尽
    2021-01-27 01:02

    Here is an implementation based on your findInterval() suggestion; it is 5-6 times faster than the standard cut():

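    # Assumes sorted breaks that span all of x. findInterval() bins as [a, b)
    # by default, so values exactly on a break land one bin above cut()'s.
    cut2 <- function(x, breaks) {
      # Labels in cut()'s default "(a,b]" style, built from adjacent breaks
      labels <- paste0("(", breaks[-length(breaks)], ",", breaks[-1L], "]")
      # findInterval() gives each value's bin index, which selects its label
      return(factor(labels[findInterval(x, breaks)], levels=labels))
    }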
    
    library(microbenchmark)
    
    set.seed(1)
    data <- rnorm(1e4, mean=0, sd=1)
    
    # my_breaks is assumed to be the break vector defined in the question
    microbenchmark(cut.default(data, my_breaks), cut2(data, my_breaks))
    
    # Unit: microseconds
    #                         expr      min        lq    median        uq      max neval
    # cut.default(data, my_breaks) 3011.932 3031.1705 3046.5245 3075.3085 4119.147   100
    #        cut2(data, my_breaks)  453.761  459.8045  464.0755  469.4605 1462.020   100
    
    identical(cut(data, my_breaks), cut2(data, my_breaks))
    # TRUE
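
    If values can fall exactly on a break or outside the range of the breaks, a more defensive variant might look like the sketch below. It is not part of the benchmark above and has not been timed. The name cut3 is hypothetical, and the sketch assumes R >= 3.3.0 (for the left.open argument of findInterval()) and a numeric vector of breaks, not a single bin count.

    ```r
    # Hypothetical helper (not from the original answer): a right-closed,
    # range-safe variant of cut2().
    cut3 <- function(x, breaks) {
      breaks <- sort(breaks)
      n <- length(breaks)
      # Plain paste0() labels. cut() formats breaks to 3 significant digits by
      # default, so the labels match cut() only when the breaks print the same.
      labels <- paste0("(", breaks[-n], ",", breaks[-1L], "]")
      # left.open = TRUE makes every interval (a, b], the side cut() closes
      # by default.
      idx <- findInterval(x, breaks, left.open = TRUE)
      # 0 means x <= min(breaks) and n means x > max(breaks); cut() returns NA
      # for both, so do the same here instead of silently dropping elements.
      idx[idx == 0L | idx == n] <- NA_integer_
      factor(labels[idx], levels = labels)
    }

    # Small check on values that sit on and outside the breaks
    x <- c(-1, 0, 0.5, 1, 2, 3, 4)
    b <- 0:3
    stopifnot(identical(as.character(cut3(x, b)), as.character(cut(x, b))))
    ```

    The extra indexing step costs a little time, so it is worth re-running microbenchmark() on your own data before preferring it over cut2().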
    
