I have a large data.table with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as possible.
Here is a solution using NAToUnknown from the gdata package. I used the helper functions from Andrie's solution to create a large data.table, and I have included a timing comparison against Andrie's approach.
# CREATE DATA TABLE
dt1 = create_dt(2e5, 200, 0.1)
# FUNCTIONS TO SET NA TO ZERO
f_gdata = function(dt, un = 0) gdata::NAToUnknown(dt, un)
f_Andrie = function(dt) remove_na(dt)
# COMPARE SOLUTIONS AND TIMES
system.time(a_gdata <- f_gdata(dt1))
user system elapsed
4.224 2.962 7.388
system.time(a_andrie <- f_Andrie(dt1))
user system elapsed
4.635 4.730 20.060
identical(a_gdata, a_andrie)
TRUE
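The code above calls create_dt and remove_na, which come from Andrie's answer and are not defined here. For readers who want to reproduce the comparison, the following is a minimal sketch of what such helpers might look like. The bodies are an assumption reconstructed from how they are used above (a table of random numbers with a given proportion of NAs, and a replacement that goes through a numeric matrix), not a verbatim copy of Andrie's code.

```r
library(data.table)

# Assumed helper: build an nrow x ncol data.table of random numbers
# in which a proportion propNA of the cells is set to NA.
create_dt <- function(nrow = 5, ncol = 5, propNA = 0.5) {
  n <- nrow * ncol
  v <- runif(n)
  v[sample(seq_len(n), round(propNA * n))] <- NA
  as.data.table(matrix(v, ncol = ncol))
}

# Assumed helper: convert to a numeric matrix, overwrite the NAs with 0,
# and convert back. This copies the data rather than modifying it in place.
remove_na <- function(x) {
  dm <- data.matrix(x)
  dm[is.na(dm)] <- 0
  as.data.table(dm)
}

# Small-scale check before running the 2e5 x 200 benchmark
set.seed(1)
dt_small  <- create_dt(1000, 10, 0.1)
res_small <- remove_na(dt_small)
```

Checking a small table first makes it cheap to confirm that no NA remains and that the dimensions are unchanged before timing the full-size case.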