Remove outliers in R the easy way?

六眼飞鱼酱① submitted on 2019-12-02 03:27:37

Question


I am currently trying to remove outliers in R in a very easy way. I know you can write your own functions for this, but I would like some input on this simple code and why it does not seem to work.

outliers <- boxplot(okt$pris)$out

okt_no_out <- okt[-c(outliers),]

boxplot(okt_no_out$pris)

So in the first row I create a vector with the outliers, and in the second I create a new data frame omitting the values in that vector. But when I check the new data frame, only about 400 of the 750 outliers were removed.

So the vector outliers contains roughly 750 values, but doing this only removes about half of them.

So, my simple question: I might be stupid, but shouldn't these simple lines of code remove the outliers in a very convenient way?

//Peter


Answer 1:


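boxplot(...)$out returns the values of the outliers, not their positions. So okt[-c(outliers),] treats those values as row numbers: it drops whichever rows happen to sit at those positions, whether or not they are outliers, and silently ignores any value larger than the number of rows. That would explain why the row count shrinks by fewer rows than there are outliers.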
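To make the difference between values and positions concrete, here is a minimal sketch on a made-up vector. The numbers are hypothetical and not taken from the question's okt data; boxplot.stats() is used only because it computes the same outliers as boxplot() without drawing a plot.

```r
# Hypothetical toy data: 3 and 40 are the obvious outliers.
x <- c(10, 11, 12, 11, 10, 12, 11, 3, 10, 11, 40)

# Outlier VALUES, as boxplot(x)$out would report them.
out <- boxplot.stats(x)$out

# Wrong: the values are used as positions. Position 3 is dropped
# (an ordinary point) and position 40 does not exist, so it is ignored.
by_position <- x[-out]

# Right: drop the elements whose value is among the outliers.
by_value <- x[!x %in% out]

print(by_position)
print(by_value)
```

In this sketch the first result still contains both outliers, while the second one does not.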

What you can do is use the stats component of the boxplot's return value to get the ends of the lower and upper whiskers, and then filter your dataset using those values. See the example below:

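# test data
testdata <- iris$Sepal.Width

# return the boxplot object (this also draws the plot)
b <- boxplot(testdata)

# the whisker ends are the first and fifth entries of the stats output
lowerwhisker <- b$stats[1]
upperwhisker <- b$stats[5]

# keep everything within the whiskers; the whisker ends are themselves
# non-outlying data points, so use >= and <= rather than > and <
testdata <- testdata[testdata >= lowerwhisker & testdata <= upperwhisker]

# replot
b <- boxplot(testdata)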
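The same idea can be applied to a data frame such as the one in the question. The sketch below is only an illustration: the okt data frame here is an invented stand-in (only the column name pris comes from the question), and the id column is hypothetical.

```r
# Hypothetical stand-in for the question's data frame; values are invented.
okt <- data.frame(
  id   = 1:12,
  pris = c(100, 105, 98, 102, 101, 99, 103, 500, 97, 104, 2, 100)
)

# Same statistics as boxplot(okt$pris), but without drawing a plot.
s <- boxplot.stats(okt$pris)

# Option 1: drop the rows whose pris is one of the flagged outlier values.
okt_no_out <- okt[!okt$pris %in% s$out, ]

# Option 2: keep the rows whose pris lies within the whisker range,
# including the whisker ends themselves.
okt_in_range <- okt[okt$pris >= s$stats[1] & okt$pris <= s$stats[5], ]

print(nrow(okt))
print(nrow(okt_no_out))
```

Both options should select the same rows when pris has no missing values. If pris contains NA, the second option returns NA rows for them, so those would need to be handled separately.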


Source: https://stackoverflow.com/questions/53201016/remove-outliers-in-r-very-easy
