Replacing NA's in each column of matrix with the median of that column

Submitted by 只谈情不闲聊 on 2019-12-04 03:56:52

Question


I am trying to replace the NAs in each column of a matrix with the median of that column. When I try to use lapply or sapply I get an error, but the code works when I use a for-loop or change one column at a time. What am I doing wrong?

Example:

set.seed(1928)
mat <- matrix(rnorm(100*110), ncol = 110)
mat[sample(1:length(mat), 700, replace = FALSE)] <- NA
mat1 <- mat2 <- mat

mat1 <- lapply(mat1,
  function(n) {
     mat1[is.na(mat1[,n]),n] <- median(mat1[,n], na.rm = TRUE)
  }
)   

for (n in 1:ncol(mat2)) {
  mat2[is.na(mat2[,n]),n] <- median(mat2[,n], na.rm = TRUE)
}

Answer 1:


I would suggest vectorizing this with the matrixStats package instead of computing a median per column in a loop (sapply is also a loop, in the sense that it evaluates a function on each iteration).

First, create an index of the NA positions:

indx <- which(is.na(mat), arr.ind = TRUE)

Then replace the NAs with the precalculated column medians, using the column part of the index to pick the right median for each cell:

mat[indx] <- matrixStats::colMedians(mat, na.rm = TRUE)[indx[, 2]]
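To see how the index lines up with the medians, here is a minimal sketch on a small matrix. The matrix `m`, its values, and the name `meds` are illustrative and not from the question. It also swaps in base R's `apply(m, 2, median, na.rm = TRUE)` for `matrixStats::colMedians`, purely so it runs without extra packages.

```r
# Small illustrative matrix (hypothetical data) with one NA per column
m <- matrix(c(1, NA, 3,
              10, 20, NA), ncol = 2)

# Row/column position of every NA, one row of indx per missing cell
indx <- which(is.na(m), arr.ind = TRUE)

# Column medians computed once, ignoring NAs
# (base R stand-in for matrixStats::colMedians)
meds <- apply(m, 2, median, na.rm = TRUE)

# indx[, 2] is the column number of each NA, so meds[indx[, 2]]
# supplies the matching column median for every missing cell
m[indx] <- meds[indx[, 2]]
```

Because `indx` is a two-column matrix, `m[indx]` addresses individual cells rather than whole rows or columns.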



Answer 2:


You can use sweep:

sweep(mat, MARGIN = 2, 
      STATS = apply(mat, 2, median, na.rm=TRUE),
      FUN =  function(x,s) ifelse(is.na(x), s, x)
    )

EDIT: You can also drop in STATS=matrixStats::colMedians(mat, na.rm=TRUE) for a little more performance.
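Note that sweep returns a new matrix and leaves `mat` untouched, so the result has to be assigned. The sketch below rebuilds the question's example data and checks the sweep result against the question's for-loop; the names `filled` and `ref` are illustrative.

```r
# Rebuild the example data from the question
set.seed(1928)
mat <- matrix(rnorm(100 * 110), ncol = 110)
mat[sample(1:length(mat), 700, replace = FALSE)] <- NA

# sweep() does not modify mat in place, so keep the returned matrix
filled <- sweep(mat, MARGIN = 2,
                STATS = apply(mat, 2, median, na.rm = TRUE),
                FUN = function(x, s) ifelse(is.na(x), s, x))

# Reference result using the for-loop from the question
ref <- mat
for (n in seq_len(ncol(ref))) {
  ref[is.na(ref[, n]), n] <- median(ref[, n], na.rm = TRUE)
}

# TRUE if both approaches fill the NAs with the same values
same <- isTRUE(all.equal(filled, ref))
```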




Answer 3:


lapply loops over the elements of a list; given a matrix, it visits each cell rather than each column. Did you mean to loop over the columns?

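matx <- sapply(seq_len(ncol(mat1)), function(n) {
  col <- mat1[, n]                              # work on a copy of column n
  col[is.na(col)] <- median(col, na.rm = TRUE)  # fill its NAs with its median
  col                                           # return the column, not the assigned value
})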

though that's essentially just doing what your loop example does, so it is not necessarily any faster.
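As a minimal sketch of why the question's lapply call cannot work as written, the snippet below uses a small hypothetical matrix `m` (not from the question). It illustrates three separate points: lapply iterates over cells, an assignment inside a function only changes a local copy, and a function ending in an assignment returns the assigned value.

```r
# Small illustrative matrix (hypothetical data)
m <- matrix(c(1, NA, 3,
              10, 20, NA), ncol = 2)

# 1. lapply() treats a matrix as a plain vector: one call per cell
n_calls <- length(lapply(m, identity))

# 2. and 3. Even when looping over column numbers, the assignment
# happens on a local copy of m, and the function returns the
# right-hand side of the assignment (the median), not the matrix
res <- lapply(seq_len(ncol(m)), function(n) {
  m[is.na(m[, n]), n] <- median(m[, n], na.rm = TRUE)
})

# The global m still contains its NAs
still_has_na <- anyNA(m)
```

Here `n_calls` is 6 (one per cell of the 3 x 2 matrix), `res` holds only the two column medians, and `still_has_na` is TRUE.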




Answer 4:


It may be easier to convert to a data.frame and let vapply loop over its columns; vapply then returns the result as a matrix again:

vapply(as.data.frame(mat1), function(x)
   replace(x, is.na(x), median(x,na.rm=TRUE)), FUN.VALUE=numeric(nrow(mat1)) 
)
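One side effect worth knowing: `as.data.frame` gives an unnamed matrix default column names (`V1`, `V2`, ...), and vapply carries them into the result. A minimal sketch on a small hypothetical matrix `m` (the names `out` and `plain` are illustrative):

```r
# Small illustrative matrix (hypothetical data), no dimnames
m <- matrix(c(1, NA, 3,
              10, 20, NA), ncol = 2)

# Same approach as above: fill each column's NAs with that column's median
out <- vapply(as.data.frame(m), function(x)
  replace(x, is.na(x), median(x, na.rm = TRUE)),
  FUN.VALUE = numeric(nrow(m)))

# out is a 3 x 2 matrix again, but now has column names V1 and V2;
# unname() drops them if the original had none
plain <- unname(out)
```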


Source: https://stackoverflow.com/questions/34865789/replacing-nas-in-each-column-of-matrix-with-the-median-of-that-column
