Imputation with column medians in R

Posted by 亡梦爱人 on 2020-01-25 21:40:12

Question


If I have a vector, for example

vec <- c(3,4,5,NA)

I can replace the NA with the median value of the other values in the vector with the following code:

vec[which(is.na(vec))] <- median(vec, na.rm = TRUE)

However, if I have a matrix containing NAs, applying the same code to each column with apply does not give me back a matrix; it returns only the median of each column.

mat <- matrix(c(1,NA,3,5,6,7,NA,3,4,NA,2,8), ncol = 3)
apply(mat, 2, function(x) x[which(is.na(x))] <- median(x, na.rm = TRUE))

#[1] 3 6 4

How can I get the matrix back with the NAs replaced by column medians? This question is similar to "Replace NA values by row means", but I can't adapt any of its solutions to my case.


Answer 1:


Adding return(x) as the last line of the function passed to apply solves it: the function then returns the modified column instead of the value of the assignment.

> apply(mat, 2, function(x){
    x[which(is.na(x))] <- median(x, na.rm = TRUE)
    return(x)
  })
     [,1] [,2] [,3]
[1,]    1    6    4
[2,]    3    7    4
[3,]    3    6    2
[4,]    5    3    8
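
To see why the original call returned only the medians: in R an assignment expression evaluates (invisibly) to its right-hand side, and a function returns the value of its last expression. The sketch below contrasts the two versions on the matrix from the question; the names f_medians, f_columns, res_medians and res_columns are made up for illustration.

```r
mat <- matrix(c(1, NA, 3, 5, 6, 7, NA, 3, 4, NA, 2, 8), ncol = 3)

# The last (and only) expression is an assignment, so the function's value
# is the right-hand side: the median, not the modified column.
f_medians <- function(x) x[is.na(x)] <- median(x, na.rm = TRUE)

# Ending the function with x returns the modified column instead.
f_columns <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

res_medians <- apply(mat, 2, f_medians)  # one number per column
res_columns <- apply(mat, 2, f_columns)  # a 4 x 3 matrix with NAs filled in
```

Writing x on the last line is equivalent to return(x) here; which one to use is a matter of style.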



Answer 2:


The zoo package has a convenient function, na.aggregate, that replaces NA elements with the result of a specified FUN (the mean by default).

library(zoo)
apply(mat, 2, FUN = function(x) na.aggregate(x, FUN = median))
#      [,1] [,2] [,3]
#[1,]    1    6    4
#[2,]    3    7    4
#[3,]    3    6    2
#[4,]    5    3    8

Or, as @G.Grothendieck commented, na.aggregate can be applied directly to the matrix:

na.aggregate(mat, FUN = median)
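
For comparison, the same imputation can probably be done in base R without zoo by indexing the NA positions directly. This is only a sketch and not part of either answer; col_meds, na_pos and imputed are illustrative names. Note that a column consisting entirely of NAs would keep its NAs, because its median is itself NA.

```r
mat <- matrix(c(1, NA, 3, 5, 6, 7, NA, 3, 4, NA, 2, 8), ncol = 3)

# Median of each column, ignoring NAs.
col_meds <- apply(mat, 2, median, na.rm = TRUE)

# Row/column positions of every NA, as a two-column index matrix.
na_pos <- which(is.na(mat), arr.ind = TRUE)

# Fill each NA with the median of the column it sits in.
imputed <- mat
imputed[na_pos] <- col_meds[na_pos[, "col"]]
```

Because nothing is looped over beyond the initial median calculation, this form also makes it easy to keep na_pos around and see afterwards which cells were imputed.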


Source: https://stackoverflow.com/questions/39862778/imputation-with-column-medians-in-r
