How to fill NA with median?

野性不改 2020-12-05 08:44

Example data:

set.seed(1)
df <- data.frame(years=sort(rep(2005:2010, 12)), 
                 months=1:12, 
                 value=c(rnorm(60),NA,NA,NA,NA,         

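The simplest reading of the title needs only base R. Here is a minimal sketch that uses a small hypothetical vector `x` and data frame `d`, not the question's `df`. `median(..., na.rm = TRUE)` ignores missing values, and `ave()` applies the same fill separately within each month.

```r
# Hypothetical example vector (not the question's data)
x <- c(4, NA, 1, 3, NA)

# Replace every NA with the median of the non-missing values (here 3)
x[is.na(x)] <- median(x, na.rm = TRUE)
# x is now c(4, 3, 1, 3, 3)

# Hypothetical long-format data: one row per (years, months) pair
d <- data.frame(years  = rep(2005:2007, each = 2),
                months = rep(1:2, times = 3),
                value  = c(1, 10, 3, NA, NA, 20))

# Within each month, replace NA with that month's median over ALL years
d$filled <- ave(d$value, d$months,
                FUN = function(v) { v[is.na(v)] <- median(v, na.rm = TRUE); v })
# d$filled is c(1, 10, 3, 15, 2, 20)
```

This uses every year of a month, including later ones. The answer below instead restricts the median to the current and earlier years.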

        
6 answers
  •  谎友^ (original poster)
    2020-12-05 08:53

    Here's the most robust solution I can think of. It ensures the years are ordered correctly and, for each NA, computes the median of the same month over the current and all previous years, so it still works when several years have missing values.

    # first, reshape your data so it is years by months:
    library(reshape2)
    tmp <- dcast(years ~ months, data=df, value.var="value")  # convert data to years x months
    tmp <- tmp[order(tmp$years),]          # order years
    # now calculate the running median on each month
    library(caTools)
    # function to replace NA with the median of the current and all earlier years:
    # a right-aligned window of width length(x) covers x[1:i] at position i
    tmpfun <- function(x) {
      ifelse(is.na(x), runquantile(x, k=length(x), probs=0.5, align="right"), x)
    }
    # apply tmpfun to each column and convert back to data.frame
    # (as.data.frame() turns the month column names "1".."12" into "X1".."X12")
    tmpmed <- as.data.frame(lapply(tmp, tmpfun))
    # reshape back to long and convert 'months' back to integer
    res <- melt(tmpmed, "years", variable.name="months")
    res$months <- as.integer(gsub("^X","",res$months))
    res <- res[order(res$years, res$months),]  # restore the original years/months order
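The same idea can also be sketched in base R without reshaping, assuming each (years, months) pair occurs only once. The hypothetical helper `fill_prev_median()` replaces each NA with the median of the non-missing values of that month up to and including the current year. This is intended to mirror what `tmpfun` gets from a right-aligned `runquantile()` window of width `length(x)`. It is a sketch on made-up data and has not been checked against the question's `df`.

```r
# Hypothetical helper: x holds one month's values ordered by ascending year.
# Each NA at position i becomes median(x[1:i], na.rm = TRUE); it stays NA
# if no earlier non-missing value exists.
fill_prev_median <- function(x) {
  out <- x
  for (i in which(is.na(x))) {
    # read from the original x so earlier fills never feed into later medians
    out[i] <- median(x[seq_len(i)], na.rm = TRUE)
  }
  out
}

# Hypothetical long-format data
d <- data.frame(years  = rep(2005:2008, each = 2),
                months = rep(1:2, times = 4),
                value  = c(5, 1, 3, NA, NA, 7, 4, NA))

d <- d[order(d$years, d$months), ]  # years must be ascending within each month
d$value <- ave(d$value, d$months, FUN = fill_prev_median)
# d$value is c(5, 1, 3, 1, 4, 7, 4, 4)
```

`ave()` keeps the row order within each month group, so sorting by year first is what makes "earlier" mean earlier years.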
    
