Replace NAs with mean of the same column of a data.table

不思量自难忘° 2020-12-09 17:42

I want to replace the NAs in a column of a data.table with the mean of that same column. I tried the following, but it does not work.

ww <- data.ta         


        
6 Answers
  • 2020-12-09 17:55

    na.aggregate in the zoo package replaces NAs with the mean of the non-NAs in the same column:

    library(zoo)
    
    ww[, Sepal.Length := na.aggregate(Sepal.Length)]
    
  • 2020-12-09 17:57

    In base R:

    ww$Sepal.Length[is.na(ww$Sepal.Length)] <- mean(ww$Sepal.Length, na.rm = TRUE)
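
The same base R idea can be applied to several columns at once. The following is only a sketch: `impute_mean` is a hypothetical helper name, not part of base R, and it assumes a plain data.frame whose numeric columns should each be filled with their own mean.

```r
# Hypothetical helper: replace NAs in every numeric column
# with the mean of the non-NA values in that column.
impute_mean <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) && anyNA(x)) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      df[[col]] <- x
    }
  }
  df
}

# Example data: first five rows of iris with two values blanked out
df <- iris[1:5, ]
df$Sepal.Length[1] <- NA
df$Petal.Width[2] <- NA

filled <- impute_mean(df)
```

Non-numeric columns such as Species are left untouched, and the input is not modified; the filled copy is returned.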
    
  • 2020-12-09 18:01

    While the zoo answer is pretty nice, it requires a new dependency.
    Using just data.table, you could do the following.

    library(data.table)
    
    # prepare data
    ww = data.table(iris[1:5,])
    ww[1, Sepal.Length := NA]
    
    # solution
    ww[, Sepal.Length.mean := mean(Sepal.Length, na.rm = TRUE) # calculate mean
       ][is.na(Sepal.Length), Sepal.Length := Sepal.Length.mean # replace NA with mean
         ][, Sepal.Length.mean := NULL # remove mean col
           ][] # just prints
    

    While it may look bulky compared to the zoo one-liner, it is efficient because every step updates by reference with :=. It is also easy to adapt to replace NAs with the mean by group, simply by using the by argument in data.table.
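
A minimal sketch of the by-group variant mentioned above, assuming the grouping column is Species. The temporary column name `grp_mean` and the toy data are made up for illustration.

```r
library(data.table)

# Toy data: three rows each from two species, one NA per group
dt <- data.table(iris[c(1:3, 51:53), ])
dt[c(1, 4), Sepal.Length := NA]

# Compute the mean within each species, fill the NAs from it,
# then drop the helper column; every step updates by reference
dt[, grp_mean := mean(Sepal.Length, na.rm = TRUE), by = Species]
dt[is.na(Sepal.Length), Sepal.Length := grp_mean]
dt[, grp_mean := NULL]
```

The only change from the ungrouped version is the `by = Species` in the first step; the NA in each group is filled from its own group's mean rather than the overall one.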

  • 2020-12-09 18:05

    Your attempt subsetted the table first, selecting

    > ww[is.na(Sepal.Length)]
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1:           NA         3.5          1.4         0.2  setosa
    

    so any further operation can only 'see' this row, i.e. Sepal.Length inside j contains just that one NA.

    The data.table solution you want is below - it looks at the whole table and replaces the NAs with the means using an ifelse.

    ww[, Sepal.Length := ifelse(is.na(Sepal.Length), mean(Sepal.Length, na.rm = TRUE), Sepal.Length)]
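
To make the scoping effect concrete, here is a small sketch that rebuilds the example table (first five rows of iris with one NA, as in the earlier answer) and compares the mean computed with and without a filter in i. The variable names `mean_in_subset` and `mean_overall` are only illustrative.

```r
library(data.table)

ww <- data.table(iris[1:5, ])
ww[1, Sepal.Length := NA]

# Filter in i first: j only sees the NA row, so after na.rm
# there is nothing left to average and the result is NaN
mean_in_subset <- ww[is.na(Sepal.Length), mean(Sepal.Length, na.rm = TRUE)]

# No filter in i: j sees the whole column
mean_overall <- ww[, mean(Sepal.Length, na.rm = TRUE)]
```

This is why assigning `mean(Sepal.Length, na.rm = TRUE)` inside a subset that contains only the NA rows cannot produce the column mean.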
    
  • 2020-12-09 18:06

    It is not taking the mean of the entire Sepal.Length column, only of the one row that you have selected.

    Rather use:

    ww[is.na(Sepal.Length), Sepal.Length := mean(ww$Sepal.Length, na.rm = TRUE)]
    
  • 2020-12-09 18:14

    tidyr has a built-in function, replace_na, that you can use for this:

    library(tidyr)
    # replace_na returns a modified copy, so assign the result back
    ww <- ww %>% replace_na(list(Sepal.Length = mean(.$Sepal.Length, na.rm = TRUE)))
    