We all love robust measures like medians and interquartile ranges, but let's face it: in many fields, boxplots almost never show up in published articles, while means and sta
ggplot produces aesthetically pleasing graphs, but I don't have the gumption to try and publish any ggplot output yet.
Until that day comes, here is how I have been making the aforementioned graphs. I use the 'gplots' package to draw the standard error bars (from values I've already calculated). Note that this code handles two or more factors for each class/category. That requires passing the data in as a matrix and setting the beside=TRUE argument of the barplot2 function so that the bars are drawn side by side rather than stacked.
# Create the data (means) matrix
# Using the matrix accommodates two or more factors for each class
data.m <- matrix(c(75,34,19, 39,90,41), nrow = 2, ncol=3, byrow=TRUE,
dimnames = list(c("Factor 1", "Factor 2"),
c("Class A", "Class B", "Class C")))
# Create the standard error matrix
error.m <- matrix(c(12,10,7, 4,7,3), nrow = 2, ncol = 3, byrow=TRUE)
# Join the data and s.e. matrices into a data frame
data.fr <- data.frame(data.m, error.m)
# load library {gplots}
library(gplots)
# Plot the bar graph, with standard errors
with(data.fr,
     barplot2(data.m, beside=TRUE, axes=TRUE, las=1, ylim = c(0,120),
              main=" ", sub=" ", col=c("gray20",0),
              xlab="Class", ylab="Total amount (Mean +/- s.e.)",
              plot.ci=TRUE, ci.u=data.m+error.m, ci.l=data.m-error.m, ci.lty=1))
# Now, give it a legend:
legend("topright", c("Factor 1", "Factor 2"), fill=c("gray20",0),box.lty=0)
It is pretty plain-Jane, aesthetically, but seems to be what most journals/old professors want to see.
I'd post the graph produced by these example data, but this is my first post on the site. Sorry. One should be able to copy-paste the whole thing (after installing the "gplots" package) without problem.
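The means and standard errors in the example above were typed in by hand. As a rough sketch of how the two matrices could be computed from raw data instead, the following uses base R's tapply on the built-in warpbreaks dataset. The dataset and the names groups, means.m and se.m are my own choices for illustration, not part of the original answer.

```r
# Hypothetical example: warpbreaks ships with R and has two factors,
# wool (2 levels) and tension (3 levels), plus a numeric response, breaks.
groups <- list(warpbreaks$wool, warpbreaks$tension)

# Cell means: a 2 x 3 matrix, rows = wool, columns = tension
means.m <- tapply(warpbreaks$breaks, groups, mean)

# Standard error of the mean for each cell: sd / sqrt(n)
se.m <- tapply(warpbreaks$breaks, groups,
               function(x) sd(x) / sqrt(length(x)))

print(round(means.m, 1))
print(round(se.m, 1))
```

Matrices built this way should be usable in place of data.m and error.m in the barplot2 call, with ylim adjusted to the new range.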
The first plot was just covered in a blog post on imachordata.com. (hat tip to David Smith on blog.revolution-computing.com) You can also read the related documentation from Hadley on ggplot2.
Here's the example code:
library(ggplot2)
library(plyr) # provides ddply
data(mpg)
#create a data frame with averages and standard deviations
hwy.avg<-ddply(mpg, c("class", "year"), function(df)
return(c(hwy.avg=mean(df$hwy), hwy.sd=sd(df$hwy))))
#create the barplot component
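#(geom_col plots the precomputed means as-is; qplot with geom="bar" fails on recent ggplot2 versions)
avg.plot<-ggplot(hwy.avg, aes(class, hwy.avg, fill=factor(year)))+geom_col(position="dodge")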
#first, define the width of the dodge
dodge <- position_dodge(width=0.9)
#now add the error bars to the plot
avg.plot+geom_linerange(aes(ymax=hwy.avg+hwy.sd, ymin=hwy.avg-hwy.sd), position=dodge)+theme_bw()
It ends up looking like this:
Means and their standard errors can be computed automatically by ggplot2. I would recommend using the default pointranges instead of dynamite plots. You might have to provide the position manually. Here is how:
ggplot(mtcars, aes(factor(cyl), hp, color = factor(am))) +
  stat_summary(position = position_dodge(0.5))
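When stat_summary is called without a summary function, ggplot2 falls back to mean_se(), which is the mean plus and minus one standard error. To see roughly which numbers end up in the plot, here is a base R sketch that computes the same three values per group. The helper se and the data frame summ are names I made up for this illustration.

```r
# Standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# Mean, lower and upper bound of hp for each cyl/am combination
summ <- aggregate(hp ~ cyl + am, data = mtcars,
                  FUN = function(x) c(y    = mean(x),
                                      ymin = mean(x) - se(x),
                                      ymax = mean(x) + se(x)))

# aggregate() stores the three values in a matrix column; flatten it
summ <- do.call(data.frame, summ)
names(summ) <- c("cyl", "am", "y", "ymin", "ymax")
print(summ)
```

Each row corresponds to one pointrange in the plot: y is the point, and ymin and ymax are the ends of the line.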
This question is almost 2 years old now, but as a new R user in an experimental field, this was a big question for me, and this page is prominent in Google results. I just discovered an answer I like better than the current set, so I thought I'd add it.
The sciplot package makes the task super easy. It gets the job done in a single command:
#only necessary to get the mpg dataset from ggplot2 for direct comparison
library(ggplot2)
data(mpg)
#the bargraph.CI function with a couple of parameters to match the ggplot example
#see also lineplot.CI in the same package
library(sciplot)
bargraph.CI(
  class, #categorical factor for the x-axis
  hwy, #numerical DV for the y-axis
  year, #grouping factor
  data=mpg, #look the variables up in mpg instead of using attach()
  legend=TRUE,
  x.leg=19,
  ylab="Highway MPG",
  xlab="Class")
This produces a very workable graph with mostly default options. Note that the error bars are standard errors by default, but the ci.fun parameter takes a function, so they can be anything you want!
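As an illustration of swapping in a different error-bar function, here is a sketch that draws t-based 95% confidence intervals instead of standard errors. The helper name ci95 is made up for this example; the only requirement I'm relying on is that ci.fun receives the values of one group and returns the lower and upper bounds, as the default does. The plotting call is wrapped in a check so the snippet still runs if sciplot or ggplot2 is not installed.

```r
# Hypothetical helper: 95% confidence interval for the mean,
# based on the t distribution. Returns c(lower, upper).
ci95 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  half <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  c(m - half, m + half)
}

# Quick check on a small vector
print(ci95(c(1, 2, 3, 4, 5)))

# Same plot as above, but with 95% CIs as error bars
if (requireNamespace("sciplot", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  sciplot::bargraph.CI(class, hwy, year, data = ggplot2::mpg,
                       ci.fun = ci95, legend = TRUE,
                       ylab = "Highway MPG", xlab = "Class")
}
```

Whichever interval you choose, say so in the axis label or caption, since bars showing standard errors and bars showing confidence intervals look identical on the page.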
Coming a little late to the game, but this solution might be useful for future users. It uses the diamonds data.frame that ships with ggplot2 and takes advantage of stat_summary along with three (super short) custom functions.
library(ggplot2)
# create functions to get the standard error and the lower and upper bounds of the error bars
# (named se rather than stderr to avoid masking base::stderr)
se <- function(x){sqrt(var(x,na.rm=TRUE)/length(na.omit(x)))}
lowse <- function(x){mean(x,na.rm=TRUE)-se(x)}
highse <- function(x){mean(x,na.rm=TRUE)+se(x)}
# create a ggplot
ggplot(diamonds,aes(cut,price,fill=color))+
# first layer is barplot with means
# (fun, fun.min and fun.max replace fun.y, fun.ymin and fun.ymax as of ggplot2 3.3.0)
stat_summary(fun=mean, geom="bar", position="dodge", colour='white')+
# second layer overlays the error bars using the functions defined above
# (linewidth replaces size for lines as of ggplot2 3.4.0)
stat_summary(fun=mean, fun.min=lowse, fun.max=highse, geom="errorbar", position="dodge", colour='black', linewidth=.5)