How to fill NA with median?

野性不改 2020-12-05 08:44

Example data:

set.seed(1)
df <- data.frame(years=sort(rep(2005:2010, 12)), 
                 months=1:12, 
                 value=c(rnorm(60),NA,NA,NA,NA,         


        
6 Answers
  •  自闭症患者
    2020-12-05 09:01

    You want to use the is.na function to test for missing values:

    df$value[is.na(df$value)] <- median(df$value, na.rm=TRUE)
    

    This says: for every position where df$value is NA, replace it with the value on the right-hand side. You need na.rm=TRUE, or else the median function will return NA.
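
    To see why na.rm=TRUE matters, here is a minimal sketch on a made-up vector. The names x and med are only for illustration and are not part of the question's data:

    ```r
    # Hypothetical toy vector, not the question's data
    x <- c(1, 5, NA, 3, NA)

    median(x)                        # NA: missing values propagate
    med <- median(x, na.rm = TRUE)   # 3: median of the non-missing values 1, 3, 5

    # is.na(x) is a logical index, so only the NA positions are overwritten
    x[is.na(x)] <- med
    x                                # 1 5 3 3 3
    ```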

    To do this month by month, there are many choices, but I think plyr has the simplest syntax:

    library(plyr)
    ddply(df, 
          .(months), 
          transform, 
          value=ifelse(is.na(value), median(value, na.rm=TRUE), value))
    

    You can also use data.table. This is an especially good choice if your data is large:

    library(data.table)
    DT <- data.table(df)
    setkey(DT, months)
    
    DT[,value := ifelse(is.na(value), median(value, na.rm=TRUE), value), by=months]
    

    There are many other ways, but these are two!
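
    As one example of those other ways, a base R sketch with ave() might look like the following. This is not from the approaches above; it is an alternative that needs no extra packages, and the toy data frame is made up for illustration:

    ```r
    # Hypothetical toy data: two months, one missing value in each
    toy <- data.frame(months = c(1, 1, 1, 2, 2, 2),
                      value  = c(10, 20, NA, 1, NA, 3))

    # ave() applies FUN to the values within each group of months and
    # returns a vector in the original row order
    toy$value <- ave(toy$value, toy$months,
                     FUN = function(v) replace(v, is.na(v), median(v, na.rm = TRUE)))

    toy$value   # 10 20 15 1 2 3
    ```

    Note that if every value in a month is NA, the median of that group is itself NA, so those rows stay missing.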
