I'm trying to use position_dodge in ggplot2 to obtain boxplots of two different signals (ind) sharing the same categories (cat). When there is a category with data for one
After some trial and error, I came up with a workaround that gives (more or less) the outcome I was looking for:
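library(ggplot2)

# The extra ('B', 0, 'x') row is a placeholder so that category B keeps a slot for x
data <- data.frame(
  cat=c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values=c(3,2,1,4,NA,NA,4,5,6,7,8,9, 0),
  ind=c('x','x','x','x','x','x','y','y','y','y','y','y','x'))
p <- ggplot() +
  scale_colour_hue(guide='none') +
  geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
               position=position_dodge(width=.60),
               data=data,
               outlier.size = 1.2,
               na.rm=TRUE) +
  # White line along y = 0 drawn over the flat placeholder box to hide it.
  # On ggplot2 >= 3.4.0 this argument is called linewidth (size still works, with a warning).
  geom_line(aes(x=x, y=y),
            data=data.frame(x=c(0,3),y=rep(0,2)),
            size = 1,
            col='white')
print(p)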
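To see why one white line is enough to hide the placeholder, the computed box statistics can be inspected with `layer_data()`. This is a minimal sketch, assuming a ggplot2 version that provides `layer_data()`; the names `stats` and `placeholder` are just illustrative. The (B, x) box is built from the single value 0, so all five of its statistics are 0 and it is drawn as a flat line on y = 0.

```r
library(ggplot2)

# Same data as the workaround above; the last row ('B', 0, 'x') is the placeholder
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values = c(3, 2, 1, 4, NA, NA, 4, 5, 6, 7, 8, 9, 0),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y','x'),
  stringsAsFactors = FALSE
)

p <- ggplot(data, aes(x = as.factor(cat), y = values, fill = ind)) +
  geom_boxplot(position = position_dodge(width = 0.60), na.rm = TRUE)

# One row per drawn box, holding the computed five-number summary
stats <- layer_data(p, 1)

# The placeholder is the only box whose median is 0
placeholder <- stats[stats$middle == 0, c("ymin", "lower", "middle", "upper", "ymax")]
print(placeholder)
```

The other medians are 2.5, 5.5 and 8.5, which is why filtering on a zero median isolates the placeholder here.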
Some people recommended faceting, but faceting doesn't give me the effect I'm looking for. The final graph I was after is shown below:
If you look closely, the white major grid line at y = 0 is thicker than the other grid lines. That thicker line is the geom_line with size = 1, drawn over the placeholder boxplot to hide it.
I wish we could combine different geom objects more seamlessly. I reported this as a bug on Hadley's GitHub, but Hadley said this is how position_dodge behaves by design. I guess I'm using ggplot2 in a non-standard way, and workarounds are the way to go for these kinds of issues. Anyway, I hope this helps some of the R folks push ggplot's great functionality a little further.
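Later ggplot2 releases (3.0.0 onward, as far as I know) added a `preserve` argument to the dodge positions, which handles the box-width part of this problem without a placeholder row. This is a sketch assuming such a version: `position_dodge2(preserve = "single")` sizes every box as if each category held both groups, so the lone y box in B is no wider than the others. In my understanding `position_dodge2()` centres that lone box on B rather than putting it in y's slot, so this is not a pixel-for-pixel match for the graph described above.

```r
library(ggplot2)

# Original data, without the placeholder row
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
  values = c(3, 2, 1, 4, NA, NA, 4, 5, 6, 7, 8, 9),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'),
  stringsAsFactors = FALSE
)

p2 <- ggplot(data, aes(x = as.factor(cat), y = values, fill = ind)) +
  # preserve = "single": every box gets the width of one dodged slot
  geom_boxplot(position = position_dodge2(preserve = "single"), na.rm = TRUE)
print(p2)

# Drawn boxes and their widths (the all-NA B/x group is dropped)
boxes <- layer_data(p2, 1)
box_widths <- boxes$xmax - boxes$xmin
print(box_widths)
```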
I got the idea to use faceting from one of Hadley's comments on his GitHub site, so credit goes to Hadley, the maintainer of the ggplot2 package!
See if this is what you wanted. To learn more about setting the whiskers and other options in this plot, check this help page in the ggplot2 package:
?stat_boxplot
library(ggplot2)

data<-data.frame(cat=c('A','A','A','A','B','B','A','A','A','A','B','B'),
values=c(3,2,1,4,NA,NA,4,5,6,7,8,9),
ind=c('x','x','x','x','x','x','y','y','y','y','y','y'))
# Category on the x axis; one panel per signal (ind)
p <- ggplot(data = data, aes(factor(cat), values))
p + stat_boxplot(geom="boxplot", position = "dodge", width = 0.60, na.rm = TRUE) + facet_grid(.~ind)
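As a pointer into ?stat_boxplot: whisker length is controlled by its `coef` argument, a multiple of the interquartile range (1.5 by default). This is a sketch on the same faceted plot; the value 3 is only illustrative (an assumption, not a recommendation), and with this small data set no point lies outside the default whiskers, so the drawn plot looks the same either way. The name `p_whiskers` is hypothetical.

```r
library(ggplot2)

data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
  values = c(3, 2, 1, 4, NA, NA, 4, 5, 6, 7, 8, 9),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'),
  stringsAsFactors = FALSE
)

p <- ggplot(data = data, aes(factor(cat), values))
# coef: whiskers reach the most extreme points within coef * IQR of the box
p_whiskers <- p +
  stat_boxplot(geom = "boxplot", position = "dodge", width = 0.60,
               coef = 3, na.rm = TRUE) +
  facet_grid(. ~ ind)
print(p_whiskers)

# Whisker ends per box and panel
whiskers <- layer_data(p_whiskers, 1)[, c("PANEL", "x", "ymin", "ymax")]
print(whiskers)
```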
To add colors to your plot (which in my opinion is redundant, since you are already faceting the plot on the "ind" variable), try this:
p <- ggplot(data, aes(factor(cat), values, fill = ind))
p + stat_boxplot(geom="boxplot", position = "dodge", width = 0.60, na.rm = TRUE) + facet_grid(.~ind)
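If you keep the fill but agree it repeats what the facet strips already say, the legend can be dropped. A minimal sketch using `theme(legend.position = "none")`, which as far as I know works across ggplot2 versions; the name `p_nolegend` is hypothetical.

```r
library(ggplot2)

data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
  values = c(3, 2, 1, 4, NA, NA, 4, 5, 6, 7, 8, 9),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'),
  stringsAsFactors = FALSE
)

p <- ggplot(data, aes(factor(cat), values, fill = ind))
p_nolegend <- p +
  stat_boxplot(geom = "boxplot", position = "dodge", width = 0.60, na.rm = TRUE) +
  facet_grid(. ~ ind) +
  # The facet strips already label x and y, so the fill legend is hidden
  theme(legend.position = "none")
print(p_nolegend)
```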
HTH!
Group x of category B has no values, so you can add the row "B", 0, "x", which essentially indicates that there is no distribution of "values" for x in B: its median and other percentiles are all zero.
data<-data.frame(cat=c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
values=c(3,2,1,4,NA,NA,4,5,6,7,8,9,0),
ind=c('x','x','x','x','x','x','y','y','y','y','y','y','x'))
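With more categories, the placeholder rows can be generated instead of typed by hand. This is a base-R sketch, assuming you start from the original 12-row data (without the manual "B", 0, "x" row); the names `present`, `all_combos` and `empty_combos` are just illustrative.

```r
# Original data, before adding the placeholder by hand
data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B'),
  values = c(3, 2, 1, 4, NA, NA, 4, 5, 6, 7, 8, 9),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y'),
  stringsAsFactors = FALSE
)

# cat/ind pairs that have at least one non-NA value
present <- unique(data[!is.na(data$values), c("cat", "ind")])

# Every possible cat/ind pair
all_combos <- expand.grid(cat = unique(data$cat), ind = unique(data$ind),
                          stringsAsFactors = FALSE)

# Pairs with no usable values need a placeholder
is_present <- paste(all_combos$cat, all_combos$ind) %in%
  paste(present$cat, present$ind)
empty_combos <- all_combos[!is_present, ]

# Append one row with value 0 for each empty pair
if (nrow(empty_combos) > 0) {
  data <- rbind(data, data.frame(cat = empty_combos$cat, values = 0,
                                 ind = empty_combos$ind,
                                 stringsAsFactors = FALSE))
}
print(tail(data, 1))
```

For this data the only empty pair is (B, x), so the result matches the 13-row data frame above.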
Also, you don't have to add position parameters here: geom_boxplot dodges boxes that share an x position by default, so with x as a factor and fill = ind, the boxes are automatically placed side by side.
library(ggplot2)

print(ggplot() +
  scale_colour_hue(guide='none') +
  geom_boxplot(aes(x=as.factor(cat), y=values, fill=ind),
               data=data,
               outlier.size = 1.2,
               na.rm=TRUE))
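One way to confirm the automatic dodging is to leave out `position` and look at the box centres with `layer_data()`. In current ggplot2 the default position for `geom_boxplot()` is "dodge2" (older releases used "dodge", as far as I recall); either way, the two boxes in category A end up at different x positions. A sketch, assuming the 13-row data with the placeholder; `p_default` and `a_centres` are hypothetical names.

```r
library(ggplot2)

data <- data.frame(
  cat    = c('A','A','A','A','B','B','A','A','A','A','B','B','B'),
  values = c(3, 2, 1, 4, NA, NA, 4, 5, 6, 7, 8, 9, 0),
  ind    = c('x','x','x','x','x','x','y','y','y','y','y','y','x'),
  stringsAsFactors = FALSE
)

p_default <- ggplot(data, aes(x = as.factor(cat), y = values, fill = ind)) +
  geom_boxplot(na.rm = TRUE)   # no position argument given

boxes <- layer_data(p_default, 1)
# Category A sits at x = 1; its two boxes are shifted to either side of it
a_centres <- boxes$x[boxes$x < 1.5]
print(a_centres)
```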