How to replace NA with mean by subset in R (impute with plyr?)

南笙酒味 submitted on 2019-11-26 17:21:14

This isn't my own technique; I saw it on the boards a while back:

dat <- read.table(text = "id    taxa        length  width
101   collembola  2.1     0.9
102   mite        0.9     0.7
103   mite        1.1     0.8
104   collembola  NA      NA
105   collembola  1.5     0.5
106   mite        NA      NA", header=TRUE)


library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
     width = impute.mean(width))

dat2[order(dat2$id), ]  # plyr orders by group, so we have to reorder by id
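To see what impute.mean does on its own, here is a small check on a bare vector. The input values are made up for illustration. It also shows one caveat worth knowing: if a group contains nothing but NA, the mean is NaN, so the gaps become NaN rather than being filled.

```r
# Same helper as above, repeated so this snippet runs on its own
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Illustrative vector (not from the question's data)
x <- c(2, NA, 4)
filled <- impute.mean(x)   # the NA becomes mean(c(2, 4)), i.e. 3
print(filled)

# Caveat: an all-NA input has no observed values to average,
# so mean(..., na.rm = TRUE) is NaN and the NAs turn into NaN
all_missing <- impute.mean(c(NA_real_, NA_real_))
print(all_missing)
```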

Edit: A non-plyr approach with a for loop:

for (i in which(sapply(dat, is.numeric))) {
    for (j in which(is.na(dat[, i]))) {
        dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i],  na.rm = TRUE)
    }
}
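If you'd rather avoid the nested loop, base R's ave() can apply the same helper per group and keeps the original row order. This is a sketch of an alternative, not part of the answer above; the column names are taken from the example data.

```r
# Rebuild the example data so this snippet runs on its own
dat <- read.table(text = "id    taxa        length  width
101   collembola  2.1     0.9
102   mite        0.9     0.7
103   mite        1.1     0.8
104   collembola  NA      NA
105   collembola  1.5     0.5
106   mite        NA      NA", header = TRUE)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave() splits the column by taxa, applies impute.mean to each piece,
# and returns the results in the original row order
for (col in c("length", "width")) {
    dat[[col]] <- ave(dat[[col]], dat$taxa, FUN = impute.mean)
}

dat
```

Since ave() does not reorder rows, no reordering by id is needed afterwards.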

Edit: Many moons later, here is a data.table and dplyr approach:

data.table

library(data.table)
setDT(dat)

dat[, length := impute.mean(length), by = taxa][,
    width := impute.mean(width), by = taxa]

dplyr

library(dplyr)

dat %>%
    group_by(taxa) %>%
    mutate(
        length = impute.mean(length),
        width = impute.mean(width)  
    )
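With more than a couple of columns, listing each one inside mutate() gets repetitive. A possible variation, assuming dplyr 1.0.0 or later (where across() is available), applies the helper to several columns at once. The name imputed is just a placeholder for the result.

```r
library(dplyr)

# Rebuild the example data so this snippet runs on its own
dat <- read.table(text = "id    taxa        length  width
101   collembola  2.1     0.9
102   mite        0.9     0.7
103   mite        1.1     0.8
104   collembola  NA      NA
105   collembola  1.5     0.5
106   mite        NA      NA", header = TRUE)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# across() applies impute.mean to each listed column within each taxa group
imputed <- dat %>%
    group_by(taxa) %>%
    mutate(across(c(length, width), impute.mean)) %>%
    ungroup()   # drop the grouping so later steps are not grouped by accident

imputed
```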

Before answering this, I want to say that I am a beginner in R, so please let me know if you think my answer is wrong.

Code:

DF[is.na(DF$length), "length"] <- mean(DF$length, na.rm = TRUE)

Apply the same for width. Note that this fills in the overall column mean, not the mean within each taxa group.

DF stands for the name of the data.frame.

Thanks, Parthi

Expanding on @Tyler Rinker's solution, suppose features holds the names of the columns to impute; in this case, features <- c('length', 'width'). With data.table, the solution becomes:

library(data.table)
setDT(dat)

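features <- c("length", "width")  # names of the columns to impute

# impute.mean is the helper defined in the first answer
dat[, (features) := lapply(.SD, impute.mean), by = taxa, .SDcols = features]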