Plot multiple boxplot in one graph

梦想与她 提交于 2019-11-26 03:22:30

问题


I saved my data in as a .csv file with 12 columns. Columns two through 11 (labeled F1, F2, ..., F11) are features. Column one contains the label of these features either good or bad.

I would like to plot a boxplot of all these 11 features against the label, but separate by good or bad. My code so far is:

qplot(Label, F1, data=testData, geom = \"boxplot\", fill=Label, 
          binwidth=0.5, main=\"Test\") + xlab(\"Label\") + ylab(\"Features\")

However, this only shows F1 against the label.

My question is: How to show F2, F3, ..., F11 against the label in one graph with some dodge position? I have normalized the features so they are in the same scale within [0 1] range.

The test data can be found here. I have drawn something by hand to explain the problem (see below).

\"hand-drawn


回答1:


You should get your data in a specific format by melting your data (see below for how melted data looks like) before you plot. Otherwise, what you have done seems to be okay.

require(reshape2)
df <- read.csv("TestData.csv", header=T)
# melting by "Label". `melt is from the reshape2 package. 
# do ?melt to see what other things it can do (you will surely need it)
df.m <- melt(df, id.var = "Label")
> df.m # pasting some rows of the melted data.frame

#     Label variable      value
# 1    Good       F1 0.64778924
# 2    Good       F1 0.54608791
# 3    Good       F1 0.46134200
# 4    Good       F1 0.79421221
# 5    Good       F1 0.56919951
# 6    Good       F1 0.73568570
# 7    Good       F1 0.65094207
# 8    Good       F1 0.45749702
# 9    Good       F1 0.80861929
# 10   Good       F1 0.67310067
# 11   Good       F1 0.68781739
# 12   Good       F1 0.47009455
# 13   Good       F1 0.95859182
# 14   Good       F1 1.00000000
# 15   Good       F1 0.46908343
# 16    Bad       F1 0.57875528
# 17    Bad       F1 0.28938046
# 18    Bad       F1 0.68511766

require(ggplot2)
ggplot(data = df.m, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Label))

Edit: I realise that you might need to facet. Here's an implementation of that as well:

p <- ggplot(data = df.m, aes(x=variable, y=value)) + 
             geom_boxplot(aes(fill=Label))
p + facet_wrap( ~ variable, scales="free")

Edit 2: How to add x-labels, y-labels, title, change legend heading, add a jitter?

p <- ggplot(data = df.m, aes(x=variable, y=value)) 
p <- p + geom_boxplot(aes(fill=Label))
p <- p + geom_jitter()
p <- p + facet_wrap( ~ variable, scales="free")
p <- p + xlab("x-axis") + ylab("y-axis") + ggtitle("Title")
p <- p + guides(fill=guide_legend(title="Legend_Title"))
p 

Edit 3: How to align geom_point() points to the center of box-plot? It could be done using position_dodge. This should work.

require(ggplot2)
p <- ggplot(data = df.m, aes(x=variable, y=value)) 
p <- p + geom_boxplot(aes(fill = Label))
# if you want color for points replace group with colour=Label
p <- p + geom_point(aes(y=value, group=Label), position = position_dodge(width=0.75))
p <- p + facet_wrap( ~ variable, scales="free")
p <- p + xlab("x-axis") + ylab("y-axis") + ggtitle("Title")
p <- p + guides(fill=guide_legend(title="Legend_Title"))
p 




回答2:


Since you don't mention a plot package , I propose here using Lattice version( I think there is more ggplot2 answers than lattice ones, at least since I am here in SO).

 ## reshaping the data( similar to the other answer)
 library(reshape2)
 dat.m <- melt(TestData,id.vars='Label')
 library(lattice)
 bwplot(value~Label |variable,    ## see the powerful conditional formula 
        data=dat.m,
        between=list(y=1),
        main="Bad or Good")




回答3:


Using base graphics, we can use at = to control box position , combined with boxwex = for the width of the boxes. The 1st boxplot statement creates a blank plot. Then add the 2 traces in the following two statements.

Note that in the following, we use df[,-1] to exclude the 1st (id) column from the values to plot. With different data frames, it may be necessary to change this to subset for whichever columns contain the data you want to plot.

boxplot(df[,-1], boxfill = NA, border = NA) #invisible boxes - only axes and plot area
boxplot(df[df$id=="Good", -1], xaxt = "n", add = TRUE, boxfill="red", 
  boxwex=0.25, at = 1:ncol(df[,-1]) - 0.15) #shift these left by -0.15
boxplot(df[df$id=="Bad", -1], xaxt = "n", add = TRUE, boxfill="blue", 
  boxwex=0.25, at = 1:ncol(df[,-1]) + 0.15) #shift to the right by +0.15

Some dummy data:

df <- data.frame(
  id = c(rep("Good",200), rep("Bad", 200)),
  F1 = c(rnorm(200,10,2), rnorm(200,8,1)),
  F2 = c(rnorm(200,7,1),  rnorm(200,6,1)),
  F3 = c(rnorm(200,6,2),  rnorm(200,9,3)),
  F4 = c(rnorm(200,12,3), rnorm(200,8,2)))



回答4:


ggplot version of the lattice plot:

library(reshape2)
library(ggplot2)
df <- read.csv("TestData.csv", header=T)
df.m <- melt(df, id.var = "Label")

ggplot(data = df.m, aes(x=Label, y=value)) + 
         geom_boxplot() + facet_wrap(~variable,ncol = 4)

Plot:




回答5:


I know this is a bit of an older question, but it is one I had as well, and while the accepted answers work, there is a way to do something similar without using additional packages like ggplot or lattice. It isn't quite as nice in that the boxplots overlap rather than showing side by side but:

boxplot(data1[,1:4])
boxplot(data2[,1:4],add=TRUE,border="red")

This puts in two sets of boxplots, with the second having an outline (no fill) in red, and also puts the outliers in red. The nice thing is, it works for two different dataframes rather than trying to reshape them. Quick and dirty way.




回答6:


In base R a formula interface with interactions (:) can be used to achieve this.

df <- read.csv("~/Desktop/TestData.csv")
df <- data.frame(stack(df[,-1]), Label=df$Label) # reshape to long format

boxplot(values ~ Label:ind, data=df, col=c("red", "limegreen"), las=2)



来源:https://stackoverflow.com/questions/14604439/plot-multiple-boxplot-in-one-graph

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!