Fastest way to replace NAs in a large data.table

走了就别回头了 2020-11-22 17:10

I have a large data.table with many missing values scattered throughout its ~200k rows and 200 columns. I would like to recode those NA values to zeros as efficiently as

10 answers
  •  礼貌的吻别
    2020-11-22 17:40

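    library(data.table)
    
    # Each column mixes numbers and strings, so c() coerces it to character
    DT = data.table(a=c(1,"A",NA),b=c(4,NA,"B"))
    
    DT
    #     a  b
    # 1:  1  4
    # 2:  A NA
    # 3: NA  B
    
    # Returns a new table with every NA replaced by 0; DT itself is not modified
    DT[,lapply(.SD,function(x){ifelse(is.na(x),0,x)})]
    #    a b
    # 1: 1 4
    # 2: A 0
    # 3: 0 B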
    

    Just for reference: this is slower than the gdata or data.matrix approaches, but it uses only the data.table package and can handle non-numeric entries.
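Since the question is about speed, here is a rough sketch of an in-place alternative that avoids copying the table: loop over the columns and overwrite only the NA cells with data.table's `set()`. The helper name `replace_na_inplace` and the sample table are made up for illustration, and the sketch assumes all columns are numeric, unlike the mixed example above.

```r
library(data.table)

# Hypothetical helper (not part of data.table): overwrite NAs by reference.
# Assumes every column is numeric, so the replacement value needs no coercion.
replace_na_inplace <- function(DT, value = 0) {
  for (j in seq_len(ncol(DT))) {
    # which(is.na(...)) yields the row indices of the NAs in column j;
    # set() assigns into just those cells without copying the table
    set(DT, i = which(is.na(DT[[j]])), j = j, value = value)
  }
  invisible(DT)
}

# Small numeric sample table, for illustration only
DT <- data.table(a = c(1, 2, NA), b = c(4, NA, 6))
replace_na_inplace(DT)
```

Because `set()` modifies `DT` by reference, nothing needs to be assigned back. If a copy of the original is still needed, take one with `copy(DT)` before calling the helper.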
