Stacked histogram from already summarized counts using ggplot2

别跟我提以往 2021-01-02 06:29

I would like some help coloring a ggplot2 histogram generated from already-summarized count data.

The data are something like counts of # males and # females living

3 Answers
  •  离开以前
    2021-01-02 07:14

    Here's a hack using ggplot_build. The idea is to first get your old/original plot:

    p <- ggplot(data = X, aes(x=C)) + geom_histogram()
    

    stored in p. Then use ggplot_build(p)$data[[1]] to extract the plot data, specifically the columns xmin and xmax (to reuse the histogram's breaks and bin widths) and the count column (to scale each proportion by the bin's count). Here's the code:

    # get old plot
    p <- ggplot(data = X, aes(x=C)) + geom_histogram()
    # get data of old plot: cols = count, xmin and xmax
    d <- ggplot_build(p)$data[[1]][c("count", "xmin", "xmax")]
    # add an id column for ddply
    d$id <- seq(nrow(d))
    

    How do we generate the data now? Here is what I understand from your post. Take, for example, the first bar in your plot: it has a count of 2 and extends from xmin = 147 to xmax = 156.8. Checking X for these values:

    X[X$C >= 147 & X$C <= 156.8, ] # count = 2 as shown below
    #    C1 C2   C
    # 19 91 63 154
    # 75 86 70 156
    

    Here, I compute (91+86)/(154+156) * (count=2) = 1.141935 and (63+70)/(154+156) * (count=2) = 0.8580645 as the two normalised values for this bar. The same calculation is repeated for every bar.
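
    The arithmetic for that first bar can be checked on its own in base R. This is only a sketch: the two rows are copied from the output above, and the names rows, bin_count and heights are made up for illustration.

    ```r
    # The two rows of X that fall into the first bin (values copied from above).
    rows <- data.frame(C1 = c(91, 86), C2 = c(63, 70), C = c(154, 156))
    bin_count <- 2  # height of the first bar in the original histogram

    # Split the bar's count between C1 and C2 in proportion to their sums.
    s <- colSums(rows)
    heights <- s[c("C1", "C2")] / s[["C"]] * bin_count
    print(heights)
    ```

    The two pieces add up to the bar's original count, which is why the stacked bars keep the same total height as the plain histogram.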

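    library(plyr)
    # for each bin, split its count between C1 and C2 in proportion to their sums
    dd <- ddply(d, .(id), function(bin) {
        rows <- X[X$C >= bin$xmin & X$C <= bin$xmax, ]
        # empty bin: both parts are zero
        if (nrow(rows) == 0) return(data.frame(C1 = 0, C2 = 0))
        s <- colSums(rows)
        data.frame(C1 = s[["C1"]] / s[["C"]] * bin$count,
                   C2 = s[["C2"]] / s[["C"]] * bin$count)
    })
    
    # then it's just normal plotting
    library(reshape2)
    dd <- melt(dd, id.vars = "id")
    ggplot(data = dd, aes(x = id, y = value)) + 
          geom_bar(aes(fill = variable), stat = "identity", group = 1)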
    

    And this is the original plot:

    [image: original ggplot2 plot]

    And this is what I get:

    [image: stacked ggplot2 histogram plot]

    Edit: If you also want the breaks on the x-axis to be right, take the corresponding x coordinates from the old plot and use them here instead of id:

    p <- ggplot(data = X, aes(x = C)) + geom_histogram()
    # this time also keep x, the centre of each bin
    d <- ggplot_build(p)$data[[1]][c("count", "x", "xmin", "xmax")]
    d$id <- seq(nrow(d))
    
    library(plyr)
    dd <- ddply(d, .(id), function(bin) {
        rows <- X[X$C >= bin$xmin & X$C <= bin$xmax, ]
        if (nrow(rows) == 0) return(data.frame(x = bin$x, C1 = 0, C2 = 0))
        s <- colSums(rows)
        data.frame(x = bin$x,
                   C1 = s[["C1"]] / s[["C"]] * bin$count,
                   C2 = s[["C2"]] / s[["C"]] * bin$count)
    })
    
    library(reshape2)
    dd.m <- melt(dd, id.vars = "x", measure.vars = c("C1", "C2"))
    ggplot(data = dd.m, aes(x = x, y = value)) + 
          geom_bar(aes(fill = variable), stat = "identity", group = 1)
    

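
    One caveat about the filter X$C >= xmin & X$C <= xmax: it is closed on both ends, so a value that happens to sit exactly on an edge shared by two bins would be picked up by both. The base R sketch below shows one way to assign each value to exactly one bin using findInterval. It is a swapped-in technique rather than part of the hack above, and the data and the breaks vector are hypothetical; in practice the edges would come from the xmin/xmax columns.

    ```r
    # Hypothetical data shaped like X in the question: C = C1 + C2.
    X <- data.frame(C1 = c(91, 86, 80, 75), C2 = c(63, 70, 80, 95))
    X$C <- X$C1 + X$C2  # 154, 156, 160, 170

    # Hypothetical bin edges (stand-ins for the xmin/xmax values).
    breaks <- c(150, 160, 170)

    # The closed-on-both-ends filter counts 160 in both bins.
    naive_total <- sum(X$C >= 150 & X$C <= 160) + sum(X$C >= 160 & X$C <= 170)

    # findInterval puts each value in exactly one bin; rightmost.closed = TRUE
    # keeps a value equal to the last edge in the final bin.
    X$bin <- findInterval(X$C, breaks, rightmost.closed = TRUE)

    # Per-bin sums, per-bin counts, then the same proportional split as above.
    sums <- aggregate(cbind(C1, C2, C) ~ bin, data = X, FUN = sum)
    sums$n <- as.vector(table(X$bin)[as.character(sums$bin)])
    sums$h1 <- sums$C1 / sums$C * sums$n
    sums$h2 <- sums$C2 / sums$C * sums$n
    print(sums)
    ```

    With this toy data naive_total exceeds nrow(X), while the findInterval version accounts for every row once. Whether the edge case matters depends on whether any of your values land exactly on a break.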
