How to better create stacked bar graphs with multiple variables from ggplot2?

野趣味 2020-12-15 11:24

I often have to make stacked barplots to compare variables, and because I do all my stats in R, I prefer to do all my graphics in R with ggplot2. I would like to learn how t

5 answers
  • 2020-12-15 11:44

    Your second problem can be solved with melt and cast from the reshape package.

    After you've converted the columns of your data.frame (called your.df here) to factors, you can use something like:

    install.packages("reshape")
    library(reshape)
    library(ggplot2)
    
    x <- melt(your.df, c()) ## Assume you have some kind of data.frame of all factors
    x <- na.omit(x) ## Be careful, sometimes removing NA can mess with your frequency calculations
    
    x <- cast(x, variable + value ~., length)
    colnames(x) <- c("variable","value","freq")
    ## Presto!
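    ## Current ggplot2 needs stat = "identity" for precomputed counts; labels = replaces the old formatter = argument
    ggplot(x, aes(variable, freq, fill = value)) + geom_bar(stat = "identity", position = "fill") + coord_flip() + scale_y_continuous("", labels = scales::percent)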
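The same reshape-then-count idea can be sketched without the reshape package. This is only an illustration in base R plus ggplot2: the `survey` data frame and its columns `q1` and `q2` are made up for the example, and `geom_col()` assumes a reasonably recent ggplot2.

```r
library(ggplot2)

# Hypothetical survey data: two questions, answers stored as factors
survey <- data.frame(
  q1 = factor(c("Yes", "No", "Yes", "Yes")),
  q2 = factor(c("No", "No", "Yes", NA))
)

# Stack the columns into long format: one row per (variable, value) answer
long <- data.frame(
  variable = rep(names(survey), each = nrow(survey)),
  value = unlist(lapply(survey, as.character), use.names = FALSE)
)
long <- na.omit(long)  # drops the missing answer, which changes the totals

# Count how often each answer occurs within each question
counts <- as.data.frame(
  table(variable = long$variable, value = long$value),
  responseName = "freq"
)

# position = "fill" rescales every bar to the same total height
p <- ggplot(counts, aes(variable, freq, fill = value)) +
  geom_col(position = "fill") +
  coord_flip() +
  scale_y_continuous("", labels = scales::percent)
```

Printing `p` draws the chart; `counts` holds one row per question and answer combination.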
    

    As an aside, I like to use grep to pull in columns from a messy import. For example:

    x <- your.df[, grep("^int_", names(your.df))] ## pulls all columns starting with "int_"
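A small sketch of why the pattern should be anchored and matched against the column names. The data frame below is hypothetical; its `points` column contains "int" in the middle of the name, so an unanchored pattern such as "int." would pick it up too.

```r
# Hypothetical messy import: only the int_* columns are wanted
your.df <- data.frame(
  id = 1:3,
  int_sports = c(1, 2, 3),
  int_music = c(3, 2, 1),
  points = c(9, 8, 7)
)

# "^" anchors the match at the start of the name; names() supplies the column names
int_cols <- grep("^int_", names(your.df))

# drop = FALSE keeps a data.frame even if only one column matches
x <- your.df[, int_cols, drop = FALSE]
```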
    

    And factoring is easier when you don't have to type c(' ', ...) a million times.

    for (i in seq_len(ncol(df))) {
    ## use a separate loop index so the data is not overwritten
    df[, i] <- factor(df[, i], labels = strsplit('
    Very Interested
    Somewhat Interested
    Not Very Interested
    Not At All interested
    NA
    NA
    NA
    NA
    NA
    NA
    ', '\n')[[1]][-1])
    }
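The multi-line string trick can be tried on its own. In this sketch the `coded` data frame and its 1 to 4 answer codes are assumptions made for illustration; `lapply()` stands in for the explicit loop.

```r
# Split a multi-line string into one label per line;
# [-1] drops the empty string produced by the leading newline
labels <- strsplit('
Very Interested
Somewhat Interested
Not Very Interested
Not At All Interested
', '\n')[[1]][-1]

# Hypothetical coded answers, 1 = most interested ... 4 = least interested
coded <- data.frame(int_sports = c(1, 4, 2), int_music = c(3, 3, 1))

# Apply the same levels and labels to every column
coded[] <- lapply(coded, factor, levels = 1:4, labels = labels)
```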
    
  • 2020-12-15 11:53

    You don't need prop.table or counts to make 100% stacked bars. You just need + geom_bar(position = "fill"); position = "stack" stacks the raw counts, while "fill" rescales every bar to 100%.
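A minimal sketch of a 100% stacked bar chart on the built-in mtcars data (the choice of `cyl` and `gear` is arbitrary). Note that it uses position = "fill", which rescales each bar so its segments sum to 1.

```r
library(ggplot2)

# One bar per cylinder count, split by number of gears
p <- ggplot(mtcars, aes(factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +
  scale_y_continuous("", labels = scales::percent)

# layer_data() exposes the computed bar geometry for inspection
bars <- layer_data(p, 1)
```

Every bar in `bars` tops out at 1, which the axis shows as 100%.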

  • 2020-12-15 12:01

    For percentages instead of ..count.., try:

    ggplot(mtcars, aes(factor(cyl), prop.table(..count..) * 100)) + geom_bar()
    

    However, since it's not a good idea to shove a function call into aes(), you can write a custom function that turns ..count.. into percentages, rounds them to n decimals, etc.

    You labeled this post with plyr, but I don't see any plyr in action here, and I bet that one ddply() can do the job. Online plyr documentation should suffice.
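One way the ddply() route might look, as a sketch only: it assumes the plyr package is installed and uses mtcars in place of the asker's data. The percentages are computed before plotting, so nothing has to be calculated inside aes().

```r
library(plyr)
library(ggplot2)

# Count the rows in each cyl group
counts <- ddply(mtcars, "cyl", summarise, n = length(cyl))

# Convert the counts to percentages of the total; round() could be applied for display
counts$pct <- 100 * counts$n / sum(counts$n)

# Plot the precomputed percentages directly
p <- ggplot(counts, aes(factor(cyl), pct)) +
  geom_col() +
  labs(x = "cyl", y = "percent")
```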

  • 2020-12-15 12:06

    If I am understanding you correctly, to fix the axis labeling problem make the following change:

    # p<-ggplot(Interest, aes(Interest2, ..count..))
    p<-ggplot(Interest, aes(Interest2, ..density..))
    

    As for the second one, I think you would be better off working with the reshape package. You can use it to aggregate data into groups very easily.

    In reference to aL3xa's comment below...

    library(ggplot2)
    r<-rnorm(1000)
    d<-as.data.frame(cbind(r,1:1000))
    ggplot(d,aes(r,..density..))+geom_histogram() ## in current ggplot2, ..density.. comes from geom_histogram(), not geom_bar()
    

    Returns...

    [Image: http://www.drewconway.com/zia/wp-content/uploads/2010/04/density.png]

    The bins are now densities...

  • 2020-12-15 12:10

    Your first question: Would this help?

    geom_bar(aes(y=..count../sum(..count..)))
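For context, here is that layer inside a complete plot, sketched on mtcars. It is written with after_stat(), the newer spelling of the ..count.. notation, which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Each bar's height is its share of all rows rather than a raw count
p <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous("", labels = scales::percent)

# The computed bar heights, for inspection
bars <- layer_data(p, 1)
```

The heights in `bars$y` are proportions of the whole data set, so they add up to 1.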
    

    Your second question: could you use reorder to sort the bars? Something like:

    aes(reorder(Interest, Value, mean), Value)
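To see what reorder does to the bar order, here is a sketch on made-up data. The `scores` data frame is hypothetical, and stat_summary(fun = mean, ...) assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical ratings: two values per interest
scores <- data.frame(
  Interest = c("Sports", "Music", "Art", "Sports", "Music", "Art"),
  Value = c(3, 1, 2, 5, 1, 2)
)

# reorder() sorts the factor levels by the mean Value of each group, ascending
ordered_interest <- reorder(scores$Interest, scores$Value, mean)

# Bars appear in level order, showing the mean Value per interest
p <- ggplot(scores, aes(reorder(Interest, Value, mean), Value)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(x = "Interest")
```

Here the group means are 1 for Music, 2 for Art and 4 for Sports, so the levels come out in that order.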
    

    (just back from a seven hour drive - am tired - but I guess it should work)
