Labeling outliers on boxplot in R

断了今生、忘了曾经 提交于 2019-12-05 00:15:32

问题


I would like to plot each column of a matrix as a boxplot and then label the outliers in each boxplot as the row name they belong to in the matrix. To use an example:

vv=matrix(c(1,2,3,4,8,15,30),nrow=7,ncol=4,byrow=F)
rownames(vv)=c("one","two","three","four","five","six","seven")
boxplot(vv)

I would like to label the outlier in each plot (in this case 30) as the row name it belongs to, so in this case 30 belongs to row 7. Is there an easy way to do this? I have seen similar questions to this asked but none seemed to have worked the way I want it to.


回答1:


In the example given it's a bit boring because they are all the same row. but here is the code:

bxpdat <- boxplot(vv)
text(bxpdat$group,                                              # the x locations 
     bxpdat$out,                                                # the y values
     rownames(vv)[which(vv == bxpdat$out, arr.ind=TRUE)[, 1]],  # the labels
     pos = 4)  

This picks the rownames that have values equal to the "out" list (i.e., the outliers) in the result of boxplot. Boxplot calls and returns the values from boxplot.stats. Take a look at:

 str(bxpdat)



回答2:


There is a simple way. Note that b in Boxplot in following lines is a capital letter.

library(car)

Boxplot(y ~ x, id.method="y")



回答3:


Or alternatively, you could use the "Boxplot" function from the {car} package which labels outliers for you.

See the following link: https://CRAN.R-project.org/package=car




回答4:


@DWin's solution works very well for a single boxplot, but will fail for anything with duplicate values, like the dataset I have created:

#Create data
set.seed(1)
basenums <- c(1,2,3,4,8,15,30)
vv=matrix(c(basenums, sample(basenums), 1-basenums, 
          c(0, 29, 30, 31, 32, 33, 60)),nrow=7,ncol=4,byrow=F)
dimnames(vv)=list(c("one","two","three","four","five","six","seven"), 1:4)

On this dataset, @DWin's solution gives:

enter image description here

Which is false, because in the 4th example, it is not possible for the minimum and maximum to be in the same row.

This solution is monstrous (and I hope can be simplified), but effective.

#Reshape data
vv_dat <- as.data.frame(vv)
vv_dat$row <- row.names(vv_dat)
library(reshape2)
new_vv <- melt(vv_dat, id.vars="row")

#Get boxplot data
bxpdat <- as.data.frame(boxplot(value~variable, data=new_vv)[c("out", "group")])

#Get matches with boxplot data
text_guide <- do.call(rbind, apply(bxpdat, 1, 
    function(x) new_vv[new_vv$value==x[1]&new_vv$variable==x[2], ]))

#Add labels
with(text_guide, text(x=as.numeric(variable)+0.2, y=value, labels=row))

enter image description here




回答5:


@sebastian-c This is a slight modification of DWin solution that seem to work with more generality

bx1<-boxplot(pb,las=2,cex.axis=.8)
if(length(bx1$out)!=0){
  ## get the row of each outlier
  out.rows<-sapply(1:length(bx1$out),function(i) which(vv[,bx1$group[i]]==bx1$out[i]))
  text(bx1$group,bx1$out,
     rownames(vv)[out.rows],
     pos=4
  )
}



回答6:


Or you can simply run the code from this blog post:

source("https://raw.githubusercontent.com/talgalili/R-code-snippets/master/boxplot.with.outlier.label.r") # Load the function
set.seed(6484)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20,T)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)

(which handles multiple outliers which are close to one another)

enter image description here



来源:https://stackoverflow.com/questions/15181086/labeling-outliers-on-boxplot-in-r

工具导航Map