Remove outliers from data frame in R?

Submitted by 别来无恙 on 2021-01-29 10:01:32

Question


I am trying to remove outliers from my data. In my case, the outliers are the values that lie far from the rest of the data when plotted on a boxplot. After removing them, I will save the data to a new file and run a prediction model to see how the results differ from those on the original data.

I followed a tutorial and adapted it to remove outliers from my data. The tutorial uses a boxplot to identify the outliers.

It works fine when I run it on a column that has outliers, but it raises an error when I run it on a column that doesn't have any. How can I fix this error?

Here is the code:

outlier_rem <- Data_combined #data-frame with 25 var, few have outliers

#removing outliers from the column

outliers <- boxplot(outlier_rem$var1, plot=FALSE)$out
#print(outliers)
ol <- outlier_rem[-which(outlier_rem$var1 %in% outliers),]

dim(ol)
# [1]  0 25
boxplot(ol)

Produces the error:

no non-missing arguments to min; returning Inf
no non-missing arguments to max; returning -Inf
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) : 
  need finite 'ylim' values

Answer 1:


The following works:

# Sample data based on mtcars and one additional row
df <- rbind(mtcars[, 1:3], c(100, 6, 300))

# Identify outliers        
outliers <- boxplot(df$mpg, plot = FALSE)$out
#[1]  33.9 100.0

# Remove outliers
df[!(df$mpg %in% outliers), ]
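
The same logical-index approach can be wrapped in a small function so that it can be applied to any column, whether or not that column has outliers. This is only a minimal sketch: remove_outliers and its arguments data and column are hypothetical names chosen for illustration, not part of the original answer.

```r
# Hypothetical helper (illustrative name, not from the original answer):
# drop the rows whose value in `column` is flagged as an outlier by boxplot().
remove_outliers <- function(data, column) {
  out <- boxplot(data[[column]], plot = FALSE)$out
  # Logical indexing keeps every row when `out` is empty
  data[!(data[[column]] %in% out), , drop = FALSE]
}

# Same sample data as above: mtcars plus one extreme row
df <- rbind(mtcars[, 1:3], c(100, 6, 300))

cleaned_df     <- remove_outliers(df, "mpg")      # column with outliers
cleaned_mtcars <- remove_outliers(mtcars, "mpg")  # column without outliers

nrow(cleaned_df)      # two rows fewer than nrow(df)
nrow(cleaned_mtcars)  # same as nrow(mtcars)
```

Because the index is a logical vector with one entry per row, an empty set of outliers simply yields all TRUE and nothing is dropped.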

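Your method fails because, when there are no outliers, which(mtcars$mpg %in% numeric(0)) returns integer(0). Negating integer(0) still gives integer(0), and indexing rows with integer(0) selects no rows at all, so you end up with a zero-row data.frame, which is exactly what you see from dim. Calling boxplot on that empty data frame then raises the error.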

outliers <- boxplot(mtcars$mpg, plot = FALSE)$out
outliers
#numeric(0)

Compare

which(mtcars$mpg %in% outliers)
#integer(0)

with

df$mpg %in% outliers
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
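
The difference between the two index types can also be seen on a plain vector. The following is a small sketch with made-up values (x, idx and the other names are illustrative only); it also shows one possible guard for code that has to keep using which().

```r
x <- c(10, 20, 30)
no_outliers <- numeric(0)

# Negative indexing with an empty index: -integer(0) is still integer(0),
# so nothing is selected instead of everything being kept.
idx <- which(x %in% no_outliers)
dropped_all <- x[-idx]
length(dropped_all)   # 0

# Logical indexing: all FALSE negated is all TRUE, so everything is kept.
kept_all <- x[!(x %in% no_outliers)]
length(kept_all)      # 3

# If which() must be used, guard against the empty case explicitly.
guarded <- if (length(idx) > 0) x[-idx] else x
length(guarded)       # 3
```

The logical form is usually the simpler choice, since it needs no special case for an empty match.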

There exists a nice post here on SO that elaborates on this point.



Source: https://stackoverflow.com/questions/54782522/remove-outliers-from-data-frame-in-r
