Identifying the outliers in a data set in R

Posted by 末鹿安然 on 2019-12-19 04:44:14

Question


I have a data set and know how to get the five-number summary using the summary command. Now I need the instances above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR. Since summary only gives me the numbers, how do I return the instances from the data set that lie above or below those bounds?


Answer 1:


You can get this using boxplot. If your variable is x,

OutVals = boxplot(x)$out
which(x %in% OutVals)

If you do not want the plot to be drawn, you can use

OutVals = boxplot(x, plot=FALSE)$out
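
As a small illustration of the two snippets above, here is a sketch on a made-up vector (the data and the names out_vals and out_idx are assumptions for the example, not from the question). Note that boxplot computes its quartiles as hinges (as fivenum does), so on some data its bounds can differ slightly from ones built with quantile().

```r
# Hypothetical sample data; 50 sits far from the other values
x <- c(2, 3, 4, 5, 6, 7, 8, 50)

# Values flagged by the boxplot rule, without drawing the plot
out_vals <- boxplot(x, plot = FALSE)$out

# Positions of those values in x
out_idx <- which(x %in% out_vals)

print(out_vals)
print(out_idx)
```

Because the match is done by value with %in%, every element of x equal to a flagged value is returned, which is normally what is wanted here.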



Answer 2:


If your data set is x, you can get the first and third quartiles using

summary(x)[["1st Qu."]]

and

summary(x)[["3rd Qu."]]

Then you compute the bounds Q1 - 1.5 * IQR and Q3 + 1.5 * IQR from those two quartiles and compare x against them to get the values you want.
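
The comparison step could look roughly like the following sketch. The sample vector and the names q1, q3, lower, upper and is_out are assumptions made up for the example; x is assumed to be a plain numeric vector without missing values.

```r
# Hypothetical sample data
x <- c(2, 3, 4, 5, 6, 7, 8, 50)

# Quartiles as reported by summary()
q1 <- summary(x)[["1st Qu."]]
q3 <- summary(x)[["3rd Qu."]]

# Interquartile range and the two bounds
iqr   <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Logical mask of the instances outside the bounds
is_out <- x < lower | x > upper

print(x[is_out])      # the outlying values
print(which(is_out))  # their positions in x
```

If x is a column of a data frame, the same logical mask can be used to select the matching rows.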




Answer 3:


You can refer to the function remove_outliers from another answer. It sets every value outside the 1.5 * IQR bounds to NA, which covers what you want.

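remove_outliers <- function(x, na.rm = TRUE, ...) {
  # First and third quartiles of x
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  # Distance of the bounds from the quartiles: 1.5 times the IQR
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  # Replace values beyond either bound with NA
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}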
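
Since remove_outliers blanks the outliers rather than returning them, one possible way to recover the instances is to look for positions that became NA but were not NA in the input. This is only a sketch: the sample vector and the names cleaned and out_idx are assumptions for the example, and the function is repeated so the block runs on its own.

```r
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Hypothetical sample data, including one genuinely missing value
x <- c(2, 3, 4, 5, 6, 7, 8, 50, NA)

cleaned <- remove_outliers(x)

# Outliers are the entries that are NA after cleaning but were present before
out_idx <- which(is.na(cleaned) & !is.na(x))

print(out_idx)     # positions of the outliers
print(x[out_idx])  # the outlying values themselves
```

Checking !is.na(x) keeps values that were already missing from being reported as outliers.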


Source: https://stackoverflow.com/questions/44089894/identifying-the-outliers-in-a-data-set-in-r
