Labeling outliers on boxplot in R

Deadly 提交于 2019-12-03 16:12:04
42-

In the example given it's a bit boring because they are all the same row. but here is the code:

bxpdat <- boxplot(vv)
text(bxpdat$group,                                              # the x locations 
     bxpdat$out,                                                # the y values
     rownames(vv)[which(vv == bxpdat$out, arr.ind=TRUE)[, 1]],  # the labels
     pos = 4)  

This picks the rownames that have values equal to the "out" list (i.e., the outliers) in the result of boxplot. Boxplot calls and returns the values from boxplot.stats. Take a look at:

 str(bxpdat)
user4168562

There is a simple way. Note that b in Boxplot in following lines is a capital letter.

library(car)

Boxplot(y ~ x, id.method="y")
Tony Knights

Or alternatively, you could use the "Boxplot" function from the {car} package which labels outliers for you.

See the following link: https://CRAN.R-project.org/package=car

@DWin's solution works very well for a single boxplot, but will fail for anything with duplicate values, like the dataset I have created:

#Create data
set.seed(1)
basenums <- c(1,2,3,4,8,15,30)
vv=matrix(c(basenums, sample(basenums), 1-basenums, 
          c(0, 29, 30, 31, 32, 33, 60)),nrow=7,ncol=4,byrow=F)
dimnames(vv)=list(c("one","two","three","four","five","six","seven"), 1:4)

On this dataset, @DWin's solution gives:

Which is false, because in the 4th example, it is not possible for the minimum and maximum to be in the same row.

This solution is monstrous (and I hope can be simplified), but effective.

#Reshape data
vv_dat <- as.data.frame(vv)
vv_dat$row <- row.names(vv_dat)
library(reshape2)
new_vv <- melt(vv_dat, id.vars="row")

#Get boxplot data
bxpdat <- as.data.frame(boxplot(value~variable, data=new_vv)[c("out", "group")])

#Get matches with boxplot data
text_guide <- do.call(rbind, apply(bxpdat, 1, 
    function(x) new_vv[new_vv$value==x[1]&new_vv$variable==x[2], ]))

#Add labels
with(text_guide, text(x=as.numeric(variable)+0.2, y=value, labels=row))

@sebastian-c This is a slight modification of DWin solution that seem to work with more generality

bx1<-boxplot(pb,las=2,cex.axis=.8)
if(length(bx1$out)!=0){
  ## get the row of each outlier
  out.rows<-sapply(1:length(bx1$out),function(i) which(vv[,bx1$group[i]]==bx1$out[i]))
  text(bx1$group,bx1$out,
     rownames(vv)[out.rows],
     pos=4
  )
}

Or you can simply run the code from this blog post:

source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r") # Load the function
set.seed(6484)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20,T)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)

(which handles multiple outliers which are close to one another)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!