Question:

I have a dataframe with the lengths and widths of various arthropods from the guts of salamanders. Because some guts had thousands of certain prey items, I only measured a subset of each prey type. I now want to replace each unmeasured individual with the mean length and width for that prey. I want to keep the dataframe and just add imputed columns (length2, width2). The main reason is that each row also has columns with data on the date and location the salamander was collected. I could fill in the NA with a random selection of the measured individuals but for the sake of argument let's assume I just want to replace each NA with the mean.

For example imagine I have a dataframe that looks something like:

id    taxa        length  width
101   collembola  2.1     0.9
102   mite        0.9     0.7
103   mite        1.1     0.8
104   collembola  NA      NA
105   collembola  1.5     0.5
106   mite        NA      NA

In reality I have more columns, about 25 different taxa, and ~30,000 prey items in total. It seems like the plyr package might be ideal for this, but I just can't figure out how to do it. I'm not very R or programming savvy, but I'm trying to learn.

Not that I know what I'm doing but I'll try to create a small dataset to play with if it helps.

exampleDF

Here are a few things I've tried (that haven't worked):

# mean imputation to recode NA in length and width with means (could do random imputation but unnecessary here)
mean.imp

Another attempt:

imp.mean

Any suggestions using plyr or not?

Answer 1:

This is not my own technique; I saw it on the boards a while back:

dat
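The code for this answer is cut off above, and the later snippets call a helper named impute.mean whose definition did not survive. As a minimal sketch, assuming impute.mean simply replaces the NA entries of a vector with the mean of its observed values, and rebuilding the question's six-row example as dat, the per-taxon imputation could look like this in base R. Note that ave() is swapped in here for illustration; it is not necessarily the plyr call the answer originally used.

```r
# Assumed helper: replace NA entries of x with the mean of the observed values.
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# The six-row example from the question.
dat <- data.frame(
  id     = 101:106,
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

# ave() applies FUN within each taxon and returns the results in the
# original row order, so date/location columns would stay aligned.
dat$length2 <- ave(dat$length, dat$taxa, FUN = impute.mean)
dat$width2  <- ave(dat$width,  dat$taxa, FUN = impute.mean)

dat
```

Writing to new columns (length2, width2) keeps the original measurements intact, which matches what the question asks for.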

Edit: A non-plyr approach with a for loop:

for (i in which(sapply(dat, is.numeric))) {     for (j in which(is.na(dat[, i]))) {         dat[j, i]
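The loop above is cut off mid-assignment. A minimal sketch of what such a loop could look like follows, assuming each NA should receive the mean of the observed values for the same taxon. The right-hand side of the assignment is an assumption, the columns are picked by name (so the numeric id column is not touched), and dat is rebuilt from the question's example.

```r
# The six-row example from the question.
dat <- data.frame(
  id     = 101:106,
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

# Keep an untouched copy so every mean is computed from measured values only.
observed <- dat

for (col in c("length", "width")) {
  # Visit each row where this column is missing.
  for (row in which(is.na(observed[[col]]))) {
    same_taxon <- observed$taxa == observed$taxa[row]
    # Assumed completion: mean of the measured values for that taxon.
    dat[row, col] <- mean(observed[same_taxon, col], na.rm = TRUE)
  }
}

dat
```

With ~30,000 rows a double loop like this is likely to be slower than a grouped, vectorised approach, but it makes the logic explicit.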

Edit: Many moons later, here is a data.table and dplyr approach:

data.table

library(data.table)
setDT(dat)

dat[, length := impute.mean(length), by = taxa][,
    width := impute.mean(width), by = taxa]

dplyr

library(dplyr)

dat %>%
    group_by(taxa) %>%
    mutate(
        length = impute.mean(length),
        width = impute.mean(width)
    )

Answer 2:

Before answering this, I want to say that I am a beginner in R, so please let me know if you think my answer is wrong.

Code:

DF[is.na(DF$length), "length"]

Then apply the same to width.

DF stands for the name of the data.frame.
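The assignment in this answer is cut off. Assuming it fills the NA entries with the mean of the whole column, the result ignores taxa and so differs from the per-taxon mean the question asks for. The sketch below contrasts the two on the question's example data; the right-hand side of the assignment and the names overall and by_taxon are assumptions made for illustration.

```r
# The six-row example from the question, named DF as in this answer.
DF <- data.frame(
  id     = 101:106,
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

# Assumed completion: fill with the overall column mean, regardless of taxon.
overall <- DF$length
overall[is.na(overall)] <- mean(DF$length, na.rm = TRUE)

# Per-taxon mean, for comparison.
by_taxon <- ave(
  DF$length, DF$taxa,
  FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
)

# Side-by-side view of the two imputations.
data.frame(taxa = DF$taxa, length = DF$length, overall, by_taxon)
```

In this example the overall mean gives both missing rows the same length, while the per-taxon version gives the collembola and the mite different values.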

Thanks, Parthi

Answer 3:

Expanding on @Tyler Rinker's solution, suppose features are the columns to impute. In this case features . Then using data.table the solution becomes:

library(data.table)
setDT(dat)

dat[, (features) := lapply(.SD, impute.mean), by = taxa, .SDcols = features]
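The same idea of imputing a whole set of columns at once can be sketched in base R, without data.table. This assumes features is a character vector naming the columns to impute (taken here to be length and width, since its definition is cut off above) and that impute.mean replaces NA with the mean of the observed values; both are assumptions, not the answer's confirmed code.

```r
# Assumed helper: replace NA entries of x with the mean of the observed values.
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# The six-row example from the question.
dat <- data.frame(
  id     = 101:106,
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

# Assumed: the columns to impute.
features <- c("length", "width")

# Impute every listed column within taxa; like the data.table version,
# this overwrites the original columns in place.
dat[features] <- lapply(
  dat[features],
  function(x) ave(x, dat$taxa, FUN = impute.mean)
)

dat
```

Adding another measured column only requires extending features, which is the main appeal of this form when there are many columns.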

Source: How to replace NA with mean by subset in R (impute with plyr?)
