I have a large data.table, with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as possible.
For the sake of completeness, another way to replace NAs with 0 is to use
f_rep <- function(dt) {
  # is.na(dt) returns a logical matrix, so this matrix-style
  # assignment overwrites every NA position with 0
  dt[is.na(dt)] <- 0
  return(dt)
}
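As a quick sanity check (a toy example of my own, not part of the benchmark below):

library(data.table)

small <- data.table(a = c(1, NA, 3), b = c(NA, 2, NA))
f_rep(small)
# every NA is replaced by 0: column a becomes 1, 0, 3 and column b becomes 0, 2, 0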
To compare results and times, I have included all approaches mentioned so far.
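The helper create_dt comes from the question and is not repeated here; as a rough sketch (names and details assumed), it builds an nrow x ncol data.table of random values in which the given proportion of entries is NA:

library(data.table)

# assumed sketch of the question's helper, not its exact definition
create_dt <- function(nrow = 5, ncol = 5, prop_na = 0.5) {
  v <- runif(nrow * ncol)                                               # random values
  v[sample(seq_len(nrow * ncol), floor(prop_na * nrow * ncol))] <- NA   # knock out a share of them
  data.table(matrix(v, ncol = ncol))
}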
set.seed(1)
dt1 <- create_dt(2e5, 200, 0.1)
dt2 <- copy(dt1)  # explicit copies, so the by-reference approaches
dt3 <- copy(dt1)  # (f_dowle2, f_dowle3) do not also modify dt1
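The remaining functions (f_gdata, f_andrie, f_dowle2, f_dowle3, f_unknown) are taken from the question and the other answers. For reference, the by-reference idea behind f_dowle3 is roughly the following sketch built on data.table::set() (the exact definition is in that answer and may differ):

library(data.table)

# sketch of a set()-based, by-reference NA replacement (assumed shape of f_dowle3)
f_dowle3_sketch <- function(dt) {
  for (j in seq_len(ncol(dt))) {
    set(dt, i = which(is.na(dt[[j]])), j = j, value = 0)  # overwrite NA rows of column j in place
  }
}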
system.time(res1 <- f_gdata(dt1))
user system elapsed
3.62 0.22 3.84
system.time(res2 <- f_andrie(dt1))
user system elapsed
2.95 0.33 3.28
system.time(f_dowle2(dt2))
user system elapsed
0.78 0.00 0.78
system.time(f_dowle3(dt3))
user system elapsed
0.17 0.00 0.17
system.time(res3 <- f_unknown(dt1))
user system elapsed
6.71 0.84 7.55
system.time(res4 <- f_rep(dt1))
user system elapsed
0.32 0.00 0.32
identical(res1, res2) & identical(res2, res3) & identical(res3, res4) & identical(res4, dt2) & identical(dt2, dt3)
[1] TRUE
So the new approach is slightly slower than f_dowle3 but faster than all the other approaches. But to be honest, this goes against my intuition of data.table syntax, and I have no idea why it works. Can anybody enlighten me?