How to conditionally replace values with NA across multiple columns

北城余情 submitted on 2021-02-05 08:45:21

Question


I would like to replace outliers in each column of a dataframe with NA.

If, for example, we define outliers as any value more than 3 standard deviations from the mean, I can achieve this per variable with the code below.

Rather than specifying each column individually, I'd like to perform the same operation on all columns of df in one call. Any pointers on how to do this?

Thanks!

library(dplyr)
data("iris")
df <- iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  head(10)

# add a clear outlier to each variable
df[1, 1:3] <- 99

# replace values above 3 SDs with NA (shown here for one column only)
df_cleaned <- df %>%
  mutate(Sepal.Length = replace(Sepal.Length,
                                Sepal.Length > abs(3 * sd(Sepal.Length, na.rm = TRUE)),
                                NA))
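Note that the snippet above compares each raw value with 3 * sd, not with its distance from the mean. Below is a minimal sketch of the mean-centered rule described in the text. The helper name `na_outliers` and the switch to the full 150-row iris data are assumptions made for illustration only. The larger data set is used because, in a sample of n values, no point can lie more than (n - 1) / sqrt(n) sample standard deviations from the mean (about 2.85 for n = 10), so a mean-centered 3-SD rule could never flag anything in the 10-row example.

```r
# Hypothetical helper: set values more than k SDs away from the column mean to NA
na_outliers <- function(x, k = 3) {
  center <- mean(x, na.rm = TRUE)
  spread <- sd(x, na.rm = TRUE)
  replace(x, abs(x - center) > k * spread, NA)
}

data("iris")
df_full <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
df_full[1, ] <- 99  # same artificial outlier, now among 150 rows

# apply the helper to every column while keeping the data frame structure
df_full_cleaned <- df_full
df_full_cleaned[] <- lapply(df_full, na_outliers)
head(df_full_cleaned, 3)
```

Because a single extreme value inflates both the mean and the SD, a robust alternative such as a median-based rule may be worth considering for small samples.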

Answer 1:


You need to use mutate_all(), i.e.

library(dplyr)

df %>% 
 mutate_all(funs(replace(., . > (abs(3 * sd(., na.rm = TRUE))), NA)))
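In newer dplyr releases, funs() is deprecated and mutate_all() has been superseded by across(). Here is a sketch of the same operation, assuming dplyr 1.0.0 or later and the same threshold as above:

```r
library(dplyr)

# rebuild the example data from the question
df <- iris %>%
  select(Sepal.Length, Sepal.Width, Petal.Length) %>%
  head(10)
df[1, 1:3] <- 99

# apply the replacement to every column; .x stands for the current column
df_cleaned <- df %>%
  mutate(across(everything(),
                ~ replace(.x, .x > abs(3 * sd(.x, na.rm = TRUE)), NA)))
```

Swapping everything() for where(is.numeric) should restrict the operation to numeric columns if the data frame also holds factors or strings.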



Answer 2:


Another option is base R

df[] <- lapply(df, function(x) replace(x, x > abs(3 * sd(x, na.rm = TRUE)), NA))

or with colSds from matrixStats

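library(matrixStats)
m <- as.matrix(df)
# col(m) gives each cell's column index, so values are compared with their own column's cutoff
df[m > abs(3 * colSds(m, na.rm = TRUE))[col(m)]] <- NA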
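The approaches above assume that every column of df is numeric. As a hedged sketch, if the data frame also contains non-numeric columns (such as Species in the full iris data), the base R version can be limited to the numeric ones. The names `df_all` and `num_cols` are illustrative assumptions, not part of the original answers.

```r
data("iris")
df_all <- head(iris, 10)   # keeps the factor column Species
df_all[1, 1:4] <- 99       # artificial outlier in each numeric column

# logical vector marking which columns are numeric
num_cols <- vapply(df_all, is.numeric, logical(1))

# replace only within numeric columns; Species is left untouched
df_all[num_cols] <- lapply(df_all[num_cols], function(x) {
  replace(x, x > abs(3 * sd(x, na.rm = TRUE)), NA)
})
```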


Source: https://stackoverflow.com/questions/55745379/how-to-conditionally-replace-values-with-na-across-multiple-columns
