Boxplot schmoxplot: How to plot means and standard errors conditioned by a factor in R?

孤独总比滥情好 2020-12-13 15:57

We all love robust measures like medians and interquartile ranges, but let's face it: in many fields, boxplots almost never show up in published articles, while means and sta

5 Answers
  • 2020-12-13 16:13

    ggplot produces aesthetically pleasing graphs, but I don't have the gumption to try and publish any ggplot output yet.

    Until the day comes, here is how I have been making the aforementioned graphs. I use the gplots package to draw the standard error bars from means and standard errors I have already calculated. Note that this code handles two or more factors for each class/category: the data go in as a matrix, and the beside=TRUE argument to barplot2 keeps the bars from being stacked.

    # Create the data (means) matrix
    # Using the matrix accommodates two or more factors for each class
    
    data.m <- matrix(c(75,34,19, 39,90,41), nrow = 2, ncol=3, byrow=TRUE,
                   dimnames = list(c("Factor 1", "Factor 2"),
                                    c("Class A", "Class B", "Class C")))
    
    # Create the standard error matrix
    
    error.m <- matrix(c(12,10,7, 4,7,3), nrow = 2, ncol = 3, byrow=TRUE)
    
    # load library {gplots}
    
    library(gplots)
    
    # Plot the bar graph, with standard errors
    # (barplot2 takes the matrices directly, so no data frame is needed)
    
    barplot2(data.m, beside=TRUE, axes=TRUE, las=1, ylim = c(0,120),
             main=" ", sub=" ", col=c("gray20",0),
             xlab="Class", ylab="Total amount (Mean +/- s.e.)",
             plot.ci=TRUE, ci.u=data.m+error.m, ci.l=data.m-error.m, ci.lty=1)
    
    # Now, give it a legend:
    
    legend("topright", c("Factor 1", "Factor 2"), fill=c("gray20",0),box.lty=0)
    

    It is pretty plain-Jane, aesthetically, but seems to be what most journals/old professors want to see.

    I'd post the graph produced by these example data, but this is my first post on the site. Sorry. One should be able to copy-paste the whole thing (after installing the "gplots" package) without problem.
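
    The example above starts from means and standard errors that were calculated beforehand. As a minimal sketch of that earlier step, the two matrices could be built from raw data with tapply. The built-in warpbreaks data set and the helper name se are assumptions chosen for illustration; they are not part of the original answer.

    ```r
    # Hypothetical helper: standard error of the mean, ignoring missing values
    se <- function(x) {
      x <- x[!is.na(x)]
      sd(x) / sqrt(length(x))
    }

    # warpbreaks ships with base R: breaks (numeric), wool (2 levels), tension (3 levels)
    # Rows play the role of the "factors", columns the role of the "classes"
    data.m  <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))
    error.m <- with(warpbreaks, tapply(breaks, list(wool, tension), se))

    print(data.m)
    print(error.m)
    ```

    Both results are 2 x 3 matrices with matching dimnames, so they can be passed to barplot2 in place of the hand-typed data.m and error.m.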

  • 2020-12-13 16:22

    The first plot was just covered in a blog post on imachordata.com (hat tip to David Smith on blog.revolution-computing.com). You can also read the related documentation from Hadley on ggplot2.

    Here's the example code:

    library(ggplot2)
    library(plyr)   # provides ddply()
    data(mpg)
    
    #create a data frame with averages and standard deviations
    hwy.avg <- ddply(mpg, c("class", "year"), function(df)
      c(hwy.avg=mean(df$hwy), hwy.sd=sd(df$hwy)))
    
    #first, define the width of the dodge
    dodge <- position_dodge(width=0.9)
    
    #create the barplot component (geom_col uses the precomputed means as bar heights)
    avg.plot <- ggplot(hwy.avg, aes(class, hwy.avg, fill=factor(year))) +
      geom_col(position=dodge)
    
    #now add the error bars to the plot
    avg.plot+geom_linerange(aes(ymax=hwy.avg+hwy.sd, ymin=hwy.avg-hwy.sd), position=dodge)+theme_bw()
    

    It ends up looking like this:

  • 2020-12-13 16:23

    ggplot2 can compute means and their standard errors automatically: when no summary function is supplied, stat_summary falls back to mean_se(). I would recommend using the default pointranges instead of dynamite plots. You might have to set the dodge position manually. Here is how:

    ggplot(mtcars, aes(factor(cyl), hp, color = factor(am))) +
      stat_summary(position = position_dodge(0.5))
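
    To see the numbers behind such a plot, here is a minimal base R sketch of the same kind of summary: the mean of hp plus or minus one standard error for each cyl and am combination. The helper name se and the column names lower and upper are assumptions for illustration only.

    ```r
    # Hypothetical helper: standard error of the mean, sd / sqrt(n)
    se <- function(x) sd(x) / sqrt(length(x))

    # Mean and SE of hp for every cyl x am combination in mtcars
    summ <- aggregate(hp ~ cyl + am, data = mtcars,
                      FUN = function(x) c(mean = mean(x), se = se(x)))

    # aggregate() returns hp as a matrix column; flatten it into hp.mean and hp.se
    summ <- do.call(data.frame, summ)

    # Bounds of the error bars: mean minus and plus one standard error
    summ$lower <- summ$hp.mean - summ$hp.se
    summ$upper <- summ$hp.mean + summ$hp.se

    print(summ)
    ```

    A table like this is also handy when the values have to be reported in the text alongside the figure.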
    

  • 2020-12-13 16:29

    This question is almost 2 years old now, but as a new R user in an experimental field, this was a big question for me, and this page is prominent in Google results. I just discovered an answer I like better than the current set, so I thought I'd add it.

    The sciplot package makes the task super easy. It gets the job done in a single command:

    #only necessary to get the MPG dataset from ggplot for direct comparison
    library(ggplot2)
    data(mpg)
    
    #the bargraph.CI function with a couple of parameters to match the ggplot example
    #see also lineplot.CI in the same package
    library(sciplot)
    bargraph.CI(
      class,  #categorical factor for the x-axis
      hwy,    #numerical DV for the y-axis
      year,   #grouping factor
      data=mpg,  #look the variables up in mpg instead of attach()ing it
      legend=TRUE,
      x.leg=19,
      ylab="Highway MPG",
      xlab="Class")
    

    This produces a very workable graph with mostly default options. Note that the error bars are standard errors by default, but the ci.fun parameter takes a function, so they can be anything you want!
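
    As a hedged sketch of swapping the error-bar function, the block below passes a custom function through ci.fun so that the bars show the mean plus or minus one standard deviation. The helper name sd_interval and the choice of the built-in mtcars data are assumptions for illustration; the function only has to return the lower and upper bound as a length-two vector.

    ```r
    # Hypothetical helper: mean +/- one standard deviation, returned as c(lower, upper)
    sd_interval <- function(x) {
      m <- mean(x, na.rm = TRUE)
      s <- sd(x, na.rm = TRUE)
      c(m - s, m + s)
    }

    # Plot only if sciplot is installed; mtcars ships with base R
    if (requireNamespace("sciplot", quietly = TRUE)) {
      sciplot::bargraph.CI(
        x.factor = mtcars$cyl,   # categorical factor for the x-axis
        response = mtcars$hp,    # numerical DV for the y-axis
        group    = mtcars$am,    # grouping factor
        ci.fun   = sd_interval,  # replaces the default standard-error bars
        legend   = TRUE,
        xlab = "Cylinders", ylab = "Horsepower")
    }
    ```

    Any function with the same return shape, such as one based on a t-distribution confidence interval, could be dropped in the same way.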

  • 2020-12-13 16:31

    Coming a little late to the game, but this solution might be useful for future users. It uses the diamonds data frame that ships with ggplot2 and takes advantage of stat_summary along with three (super short) custom functions.

    library(ggplot2)
    
    # create functions to get the lower and upper bounds of the error bars
    # (std.err avoids masking base R's stderr(); NAs are dropped consistently)
    std.err <- function(x){sqrt(var(x,na.rm=TRUE)/length(na.omit(x)))}
    lowse <- function(x){mean(x,na.rm=TRUE)-std.err(x)}
    highse <- function(x){mean(x,na.rm=TRUE)+std.err(x)}
    
    # create a ggplot
    ggplot(diamonds,aes(cut,price,fill=color))+
      # first layer is barplot with means
      # (fun, fun.min and fun.max replaced fun.y, fun.ymin and fun.ymax in ggplot2 3.3.0)
      stat_summary(fun=mean, geom="bar", position="dodge", colour='white')+
      # second layer overlays the error bars using the functions defined above
      # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
      stat_summary(fun=mean, fun.min=lowse, fun.max=highse, geom="errorbar", position="dodge",color = 'black', linewidth=.5)
    
