Fastest way to replace NAs in a large data.table

走了就别回头了 · asked 2020-11-22 17:10

I have a large data.table, with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as possible.

10 Answers
  •  甜味超标
    answered 2020-11-22 17:31

    Here is a solution using NAToUnknown from the gdata package. I used Andrie's code to create a huge data.table, and included timing comparisons against his solution.
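
    The helpers `create_dt` and `remove_na` come from Andrie's answer, which is not quoted here. A plausible reconstruction, so the benchmark below can be run standalone (the exact bodies are an assumption, not Andrie's original code):

    ```r
    library(data.table)

    # Hypothetical sketch of create_dt: an nrow x ncol data.table
    # in which roughly propNA of all values are NA.
    create_dt <- function(nrow, ncol, propNA) {
      n <- nrow * ncol
      v <- runif(n)
      v[sample.int(n, floor(propNA * n))] <- NA   # scatter NAs at random
      data.table(matrix(v, ncol = ncol))
    }

    # Hypothetical sketch of remove_na: a copying, column-by-column
    # replacement (consistent with the slower timings reported below).
    remove_na <- function(dt) {
      dt[, lapply(.SD, function(x) { x[is.na(x)] <- 0; x })]
    }
    ```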

    # CREATE DATA TABLE
    dt1 = create_dt(2e5, 200, 0.1)
    
    # FUNCTIONS TO SET NA TO ZERO   
    f_gdata  = function(dt, un = 0) gdata::NAToUnknown(dt, un)
    f_Andrie = function(dt) remove_na(dt)
    
    # COMPARE SOLUTIONS AND TIMES
    system.time(a_gdata  <- f_gdata(dt1))
    
    user  system elapsed 
    4.224   2.962   7.388 
    
    system.time(a_andrie <- f_Andrie(dt1))
    
     user  system elapsed 
    4.635   4.730  20.060 
    
    identical(a_gdata, a_andrie)  
    
    TRUE
    
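    For comparison, the approach usually recommended in later data.table versions is a loop over columns using set(), which assigns by reference and avoids copying the table. A minimal sketch (the function name f_set is mine, not from this thread):

    ```r
    library(data.table)

    # Replace NAs with 0 in place: set() updates each column by
    # reference, touching only the rows that are actually NA.
    f_set <- function(dt) {
      for (j in seq_len(ncol(dt))) {
        set(dt, which(is.na(dt[[j]])), j, 0)
      }
      dt
    }
    ```

    Because set() modifies dt in place, it needs no assignment of the result and typically runs far faster than whole-table replacement on wide tables.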
