Question
If I have a data frame df:
df <- data.frame(x = 1:20, y = c(1:10, rep(NA, 10)), z = c(rep(NA, 5), 1:15))
I know that to replace the NAs in a given column with that column's mean, we can use:
df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)
What I am looking for is a single command that does this for all columns at once, instead of repeating it for every column.
Suspecting that I need sapply with an anonymous function, I tried the following, but it clearly does not work:
sapply(df,function(x) df[is.na(df$x)]=mean(df$x,na.rm=T))
Any suggestions would be great. I searched previous posts but could not find a similar problem being addressed.
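For reference, the sapply attempt above has two problems: inside the anonymous function, df$x always refers to the column literally named x rather than to the argument x, and assigning to df inside a function only modifies a local copy, so the df outside is never changed. A minimal base-R sketch that avoids both (no zoo or data.table needed) follows. It assumes every column is numeric, and fill_mean is a hypothetical helper name introduced only for illustration.

```r
# Example data from the question
df <- data.frame(x = 1:20, y = c(1:10, rep(NA, 10)), z = c(rep(NA, 5), 1:15))

# Hypothetical helper: replace the NAs in one numeric vector with
# the mean of its non-NA values, then return the modified vector
fill_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# lapply() returns a list of filled columns; assigning it to df[]
# keeps df a data.frame with its original column names
df[] <- lapply(df, fill_mean)
df
```

Assigning to df[] rather than df is the part that matters: df <- lapply(df, fill_mean) would replace the data frame with a plain list.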
Answer 1:
We can use na.aggregate from the zoo package, which replaces each NA with the mean of the non-NA values. One option is to apply na.aggregate to each column separately with lapply. With data.table, first convert the data.frame to a data.table in place (setDT(df)), then loop over the columns and apply na.aggregate:
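library(zoo)          # provides na.aggregate()
library(data.table)   # provides setDT() and :=
# setDT() converts df to a data.table by reference; lapply(.SD, na.aggregate)
# fills every column, and the trailing [] prints the result
setDT(df)[, names(df) := lapply(.SD, na.aggregate)][]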
#      x    y  z
#  1:  1  1.0  8
#  2:  2  2.0  8
#  3:  3  3.0  8
#  4:  4  4.0  8
#  5:  5  5.0  8
#  6:  6  6.0  1
#  7:  7  7.0  2
#  8:  8  8.0  3
#  9:  9  9.0  4
# 10: 10 10.0  5
# 11: 11  5.5  6
# 12: 12  5.5  7
# 13: 13  5.5  8
# 14: 14  5.5  9
# 15: 15  5.5 10
# 16: 16  5.5 11
# 17: 17  5.5 12
# 18: 18  5.5 13
# 19: 19  5.5 14
# 20: 20  5.5 15
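As a quick check of the printed result, the filled values are simply the means of each column's observed entries: the non-NA values of y are 1 to 10 and those of z are 1 to 15, so

```latex
\bar{y} = \frac{1}{10}\sum_{i=1}^{10} i = \frac{55}{10} = 5.5,
\qquad
\bar{z} = \frac{1}{15}\sum_{i=1}^{15} i = \frac{120}{15} = 8
```

which matches the 5.5 in rows 11 to 20 of y and the 8 in rows 1 to 5 of z. Column x has no NAs, so it is unchanged.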
Alternatively, we can apply na.aggregate directly to the whole data frame:
na.aggregate(df)
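If you would rather not depend on zoo, here is a base-R sketch of the same whole-data-frame fill, assuming every column is numeric (colMeans fails otherwise). It builds one logical NA mask for the entire data frame; replacement through a logical matrix fills the selected cells column by column, so each column's mean is repeated once per NA in that column. The names na_mask and col_means are illustrative, not part of any package.

```r
df <- data.frame(x = 1:20, y = c(1:10, rep(NA, 10)), z = c(rep(NA, 5), 1:15))

# Logical matrix with the same dimensions as df, TRUE wherever df has an NA
na_mask <- is.na(df)

# Mean of the non-NA values in each column (all columns must be numeric)
col_means <- colMeans(df, na.rm = TRUE)

# Logical-matrix assignment fills NA cells column by column, so repeat
# each column's mean once for every NA that column contains
df[na_mask] <- rep(col_means, times = colSums(na_mask))
df
```

The rep(..., times = colSums(na_mask)) step is what lines the values up with the cells: a column with no NAs contributes nothing, and the total length equals the number of NA cells.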
Source: https://stackoverflow.com/questions/35194239/fill-in-mean-values-for-na-in-every-column-of-a-data-frame