Find and replace missing values with row mean

Submitted by 那年仲夏 on 2019-12-18 04:12:50

Question


I have a data frame with NAs, and I want to replace each NA with the mean of its row:

c1 = c(1,2,3,NA)
c2 = c(3,1,NA,3)
c3 = c(2,1,3,1)

df = data.frame(c1,c2,c3)

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3 NA  3
4 NA  3  1

so that the result is:

> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1

Answer 1:


Very similar to @baptiste's answer

> ind <- which(is.na(df), arr.ind=TRUE)
> df[ind] <- rowMeans(df,  na.rm = TRUE)[ind[,1]]
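To see why this works, here is a minimal sketch on the question's data. The name row_means is only illustrative. which(..., arr.ind=TRUE) returns one (row, col) pair per NA, and indexing the row means by the row number of each pair lines every mean up with the cell it should fill.

```r
# Data from the question
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# One (row, col) pair per missing cell
ind <- which(is.na(df), arr.ind = TRUE)

# Mean of each row, ignoring NAs (row_means is an illustrative name)
row_means <- rowMeans(df, na.rm = TRUE)

# ind[, 1] is the row number of each NA, so this selects the matching mean
df[ind] <- row_means[ind[, 1]]
df
```

Note that a row consisting entirely of NAs has no values to average, so rowMeans returns NaN for it and the cells in that row are filled with NaN.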



Answer 2:


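I think this works, as long as the row means are indexed by the row of each NA. (Taking the means of only the incomplete rows returns them in row order, while the NAs are found in column order, so on the example data the two values end up swapped.)

df[is.na(df)] <- rowMeans(df, na.rm=TRUE)[which(is.na(df), arr.ind=TRUE)[, 1]]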



Answer 3:


Using apply (note the returned object is a matrix):

t( apply( df , 1 , function(x) { x[ is.na(x) ] = mean( x , na.rm = TRUE ); x } ) )
     c1 c2 c3
[1,]  1  3  2
[2,]  2  1  1
[3,]  3  3  3
[4,]  2  3  1

We use an anonymous function to change the value of each NA in each row to the mean of that row. The only advantage is that you don't have to do any more typing if the number of rows increases. It is not particularly fast computationally, but it is easy to reason about, and you won't notice the speed difference unless you have hundreds of thousands of rows.
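If you would rather keep a data frame than get a matrix back, one possible sketch is to assign the result into df[], which keeps the data frame's structure and column names. The helper name fill_row_mean is hypothetical; it is just the anonymous function above given a name.

```r
# Data from the question
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Hypothetical helper: replace the NAs in one row with that row's mean
fill_row_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# apply() returns the rows as columns, so transpose back;
# assigning into df[] keeps df a data frame
df[] <- t(apply(df, 1, fill_row_mean))
df
```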




Answer 4:


My solution is

rwmns = rowMeans(df,na.rm=TRUE)
df$c1[is.na(df$c1)] = rwmns[is.na(df$c1)]
df$c2[is.na(df$c2)] = rwmns[is.na(df$c2)]
df$c3[is.na(df$c3)] = rwmns[is.na(df$c3)]
> df
  c1 c2 c3
1  1  3  2
2  2  1  1
3  3  3  3
4  2  3  1

Is there a more elegant way, especially when someone has many columns?
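For many columns, the same idea can be written as a loop over the column names, so nothing has to be typed per column. This is only a sketch of that generalization; the loop variable col and the helper vector miss are illustrative names.

```r
# Data from the question
df <- data.frame(c1 = c(1, 2, 3, NA),
                 c2 = c(3, 1, NA, 3),
                 c3 = c(2, 1, 3, 1))

# Row means computed once, before any value is filled in
rwmns <- rowMeans(df, na.rm = TRUE)

# Repeat the per-column replacement for every column
for (col in names(df)) {
  miss <- is.na(df[[col]])          # which rows are missing in this column
  df[[col]][miss] <- rwmns[miss]    # fill them with the mean of their row
}
df
```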




Answer 5:


Another option is na.aggregate from the zoo package, applied after transposing the dataset:

library(zoo)
df[] <- t(na.aggregate(t(df)))
df
#  c1 c2 c3
#1  1  3  2
#2  2  1  1
#3  3  3  3
#4  2  3  1


Source: https://stackoverflow.com/questions/17812641/find-and-replace-missing-values-with-row-mean
