Determine if data frame is empty

-上瘾入骨i 2020-12-10 18:51

I have a data frame and I would like to test really fast if it is empty or not. I know that there are either no rows or there are integers (no missing values). So far, I hav

1 Answer
  • 2020-12-10 19:50

    Since most of the objects you test aren't likely to be empty, you should be more concerned about how your functions perform on a non-empty data.frame. You should also byte-compile them to get a sense of how they would perform in a package.

    library(microbenchmark)
    library(compiler)
    
    fa <- cmpfun({function(){
      nrow(df) > 0L
    }})
    
    fb <- cmpfun({function(){
      any(dim(df)[1L])
    }})
    
    fc <- cmpfun({function(){
      dim(df)[1L] != 0L
    }})
    
    fd <- cmpfun({function() {
      any(.subset2(df, 1L)[1L])
    }})
    
    fe <- cmpfun({function() {
      any(.subset2(df, 1L))
    }})
    
    ff <- cmpfun({function() {
      length(.subset2(df, 1L)) > 0L
    }})
    
    fg <- cmpfun({function() {
      as.logical(length(.subset2(df, 1L)))
    }})
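
Before timing the candidates, it may be worth checking that the seven expressions actually agree. The sketch below rewrites each body to take the data frame as an argument and evaluates it on an empty frame and on a small all-zero frame. The names checks, df_empty, df_zeros and results are made up for this illustration and are not part of the benchmark.

```r
# Hypothetical test inputs: an empty frame and a 3-row frame of zeros.
df_empty <- data.frame(a = integer(0), b = integer(0), c = integer(0))
df_zeros <- data.frame(a = integer(3), b = integer(3), c = integer(3))

# Same expressions as fa..fg, but taking the data frame as an argument.
checks <- list(
  fa = function(df) nrow(df) > 0L,
  fb = function(df) any(dim(df)[1L]),
  fc = function(df) dim(df)[1L] != 0L,
  fd = function(df) any(.subset2(df, 1L)[1L]),
  fe = function(df) any(.subset2(df, 1L)),
  ff = function(df) length(.subset2(df, 1L)) > 0L,
  fg = function(df) as.logical(length(.subset2(df, 1L)))
)

# One column per candidate, one row per input.
# TRUE is meant to signal "has rows".
results <- sapply(checks, function(f) {
  c(empty = f(df_empty), zeros = f(df_zeros))
})
print(results)
```

Under these assumptions, fd and fe depend on the stored values rather than on the row count. fd gives NA on the empty frame and FALSE when the first value is zero, and fe gives FALSE when the whole first column is zero. Both are only safe if the integers are known to be non-zero. The other five return FALSE for the empty frame and TRUE for the non-empty one.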
    

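    The test on an empty data.frame shows all methods are fast in absolute terms, although the .subset2-based versions (fd to fg) have medians around 1 microsecond versus roughly 6 microseconds for the nrow/dim-based ones (fa to fc).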

    df <- data.frame(a = integer(0), b = integer(0), c = integer(0))
    microbenchmark(fa(), fb(), fc(), fd(), fe(), ff(), fg(), times = 1000)
    
    # Unit: nanoseconds
    #  expr  min     lq median     uq   max neval
    #  fa() 5685 5969.0 6165.0 6608.5 20515  1000
    #  fb() 6147 6443.0 6651.0 7214.0 18117  1000
    #  fc() 5726 5984.0 6152.0 6457.5 38404  1000
    #  fd() 1210 1411.0 1573.0 1764.5  4933  1000
    #  fe()  635  871.0 1003.0 1105.5 10225  1000
    #  ff()  513  727.5  861.5  941.0  5691  1000
    #  fg()  681  868.5  981.5 1080.0  2982  1000
    

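    The test on a non-empty data.frame shows that fe() is a really bad performer (median about 4.5 ms, because any() has to scan the entire one-million-element column, which here is all zeros), while the rest stay in the low microseconds.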

    df <- data.frame(a = integer(1e6), b = integer(1e6), c = integer(1e6))
    microbenchmark(fa(), fb(), fc(), fd(), fe(), ff(), fg(), times = 1000)
    
    # Unit: nanoseconds
    #  expr     min      lq    median        uq      max neval
    #  fa()    6569    7142    8782.0   12364.5    46749  1000
    #  fb()    7034    7682    9334.5   18334.0    53172  1000
    #  fc()    6539    7110    8453.5   20585.5    49912  1000
    #  fd()    1171    1585    2507.5    5021.5    17641  1000
    #  fe() 4340209 4413042 4460973.5 5468688.5 26045766  1000
    #  ff()     637     984    1489.0    3646.5    14212  1000
    #  fg()     767    1161    2401.0    4078.5   236958  1000
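
If the check is meant to go into a package, one possible way to wrap the length(.subset2(df, 1L)) form is a helper that takes the data frame as an argument instead of reading a global df. The name is_empty_df and the fallback for zero-column frames are assumptions made for this sketch, not something taken from the benchmark above.

```r
# Hypothetical helper: TRUE when the data frame has no rows.
is_empty_df <- function(x) {
  stopifnot(is.data.frame(x))
  # Assumption: with zero columns there is no first column to inspect,
  # so fall back to nrow().
  if (length(x) == 0L) {
    return(nrow(x) == 0L)
  }
  # The length of the first column equals the number of rows.
  length(.subset2(x, 1L)) == 0L
}

is_empty_df(data.frame(a = integer(0), b = integer(0)))  # TRUE
is_empty_df(data.frame(a = 0L, b = 0L))                  # FALSE
is_empty_df(data.frame())                                # TRUE
```

Unlike the any()-based variants, this depends only on the column length, so a column of zeros is still reported as non-empty. The argument check and the zero-column branch add some overhead compared with the bare expression timed above.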
    