mean-before-after imputation in R

渐次进展 2021-01-21 11:03

I'm new to R. How can I impute a missing value using the mean of the values immediately before and after the missing data point?

Example:

using the mean from the upper and lo

4 Answers
  • 2021-01-21 11:23

    This would be a basic manual approach you can take:

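    age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
    # Positions of the missing values
    na_idx <- which(is.na(age))
    # Replace each NA with the mean of the values just before and after it
    age[na_idx] <- rowMeans(cbind(age[na_idx - 1], age[na_idx + 1]))
    age
    # [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0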
    

    Or, since you seem to have a single column data.frame:

    mydf <- data.frame(age = c(52, 27, NA, 23, 39, 32, NA, 33, 43))
    
    mydf[is.na(mydf$age), ] <- rowMeans(
      cbind(mydf$age[which(is.na(mydf$age))-1],
            mydf$age[which(is.na(mydf$age))+1]))
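
    The manual approach above assumes every NA is isolated and sits strictly inside the vector. An NA in the first or last position has no neighbor on one side, and two NAs in a row would average in another NA. A minimal sketch of a guarded version follows; impute_adjacent_mean is a hypothetical helper name, not something from the original answer, and it simply leaves such cases as NA.

    ```r
    # Hypothetical helper: fill NAs with the mean of the two adjacent values.
    impute_adjacent_mean <- function(x) {
      idx <- which(is.na(x))
      # Keep only NAs that have both a left and a right neighbour.
      idx <- idx[idx > 1 & idx < length(x)]
      # Mean of the neighbours; stays NA if either neighbour is itself NA.
      x[idx] <- (x[idx - 1] + x[idx + 1]) / 2
      x
    }

    age <- c(NA, 52, 27, NA, 23, 39, NA, NA, 33)
    impute_adjacent_mean(age)
    # [1] NA 52 27 25 23 39 NA NA 33
    ```

    Leaving the edge and consecutive cases as NA is a design choice for this sketch: it makes them visible so they can be handled separately.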
    
  • 2021-01-21 11:28

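    Here is a solution using na.locf from the zoo package, which replaces each NA with the most recent non-NA value before it or, with fromLast = TRUE, the next non-NA value after it. Averaging the two fills gives the mean of the neighbors on either side.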

    library(zoo)
    x <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
    0.5*(na.locf(x, fromLast=TRUE) + na.locf(x))
    # [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
    

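    The advantage of this approach shows when there is more than one consecutive NA: every NA in the run is replaced by the mean of the nearest non-NA values on either side.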

    x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
    0.5*(na.locf(x, fromLast=TRUE) + na.locf(x))
    # [1] 52 27 25 23 39 36 36 33 43
    

    EDIT: the rev argument is deprecated, so I replaced it with fromLast.
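
    For readers who prefer to avoid the zoo dependency, the same neighbor averaging can be sketched in base R with stats::approx. This is an illustrative sketch rather than part of the original answer: fill_neighbor_mean is a hypothetical helper name, and it relies on method = "constant" with f = 0.5, which returns the midpoint of the known values to the left and right of each point. Leading and trailing NAs are left as NA.

    ```r
    # Hypothetical helper (not from zoo): average the nearest non-NA
    # neighbours on both sides of each NA, using base R only.
    fill_neighbor_mean <- function(x) {
      pos <- seq_along(x)
      ok  <- !is.na(x)
      # method = "constant" with f = 0.5 returns 0.5 * left + 0.5 * right
      # between known points, and the known value itself at a known point.
      approx(pos[ok], x[ok], xout = pos, method = "constant", f = 0.5)$y
    }

    x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
    fill_neighbor_mean(x)
    # [1] 52 27 25 23 39 36 36 33 43
    ```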

  • 2021-01-21 11:29

    Just another way:

    age <- c(52, 27, NA, 23, 39, 32, NA, 33, 43)
    age[is.na(age)] <- apply(sapply(which(is.na(age)), "+", c(-1, 1)), 2, 
                             function(x) mean(age[x]))
    age
    ## [1] 52.0 27.0 25.0 23.0 39.0 32.0 32.5 33.0 43.0
    
  • 2021-01-21 11:34

    You are looking for moving average imputation. You can use the na.ma function from the imputeTS package for this (the function is named na_ma in imputeTS 3.0 and later).

    library(imputeTS)
    x <- c(52, 27, NA, 23, 39, NA, NA, 33, 43)
    na.ma(x, k=1, weighting = "simple")
    

    [1] 52.00000 27.00000 25.00000 23.00000 39.00000 31.66667 38.33333 33.00000 43.00000

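    For an isolated NA this produces exactly the required result: position 3 becomes 25, the mean of 27 and 23. The k parameter specifies how many neighbors on each side are taken into account for the calculation. For the two consecutive NAs the k = 1 window holds fewer than two non-NA values, so na.ma widens it, which is why positions 6 and 7 come out as 31.67 and 38.33.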
