Question
I have a data set and know how to get the five-number summary using the summary command. Now I need the instances that lie above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR. Since summary only gives me numbers, how do I return the instances from the data set that lie above or below those thresholds?
Answer 1:
You can get this using boxplot. If your variable is x,
OutVals = boxplot(x)$out
which(x %in% OutVals)
If you are annoyed by the plot, you could use
OutVals = boxplot(x, plot=FALSE)$out
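As a small worked example of the approach above, here is a sketch on made-up data. The vector x and the variable names are assumptions for illustration only. It uses boxplot.stats, which computes the same statistics without drawing anything. Note that the boxplot functions use hinges, which are versions of the quartiles, so on some data the cutoffs can differ slightly from those based on quantile.

```r
# Made-up sample data: 50 is far away from the rest
x <- c(2, 3, 4, 5, 6, 7, 8, 50)

# Values beyond the whiskers (1.5 times the hinge spread by default)
out_vals <- boxplot.stats(x)$out

# Positions of those values in the original vector
out_idx <- which(x %in% out_vals)

print(out_vals)  # 50
print(out_idx)   # 8
```

If x is a column of a data frame, the same indices can be used to pull out the full rows.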
Answer 2:
If your dataset is x, you can get those numbers using
summary(x)[["1st Qu."]]
and
summary(x)[["3rd Qu."]]
Then you compute the cutoffs Q1 - 1.5 * IQR and Q3 + 1.5 * IQR from those numbers (IQR is Q3 - Q1) and compare x against them to get the instances you want.
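The comparison step could look something like the following sketch. The sample vector and the names lower, upper and is_out are assumptions made for illustration, not part of the original answer.

```r
# Made-up sample data
x <- c(2, 3, 4, 5, 6, 7, 8, 50)

# Quartiles taken from summary(), as in the answer
s  <- summary(x)
q1 <- s[["1st Qu."]]
q3 <- s[["3rd Qu."]]

# Cutoffs: 1.5 times the interquartile range beyond each quartile
iqr   <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Logical mask of the instances outside the cutoffs
is_out <- x < lower | x > upper

print(x[is_out])      # the outlying values: 50
print(which(is_out))  # their positions: 8
```

Keeping the logical mask around is convenient because it gives either the values or their positions, depending on what is needed.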
Answer 3:
You can refer to the function remove_outliers in this answer. It applies exactly the rule you describe and replaces every value outside the cutoffs with NA.
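remove_outliers <- function(x, na.rm = TRUE, ...) {
  # First and third quartiles
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  # Distance of the cutoffs from the quartiles: 1.5 times the IQR
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  # Replace values outside the cutoffs with NA
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}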
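Since the question asks for the outlying instances themselves rather than a cleaned vector, a variant that returns their positions may be closer to what is needed. The sketch below uses the same rule; find_outliers is a hypothetical helper name, and the data frame df is made up for illustration.

```r
# Hypothetical helper (not from the linked answer): same rule as
# remove_outliers, but returns the positions of the outliers
find_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  which(x < (qnt[1] - H) | x > (qnt[2] + H))
}

# Made-up data set with one extreme value
df <- data.frame(id = 1:8, value = c(2, 3, 4, 5, 6, 7, 8, 50))

# Rows of the data set whose value is an outlier
idx <- find_outliers(df$value)
outlier_rows <- df[idx, ]
print(outlier_rows)
```

Because which() drops NA comparisons, missing values in the column are never reported as outliers.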
Source: https://stackoverflow.com/questions/44089894/identifying-the-outliers-in-a-data-set-in-r