ggplot2: How to combine histogram, rug plot, and logistic regression prediction in a single graph

╄→гoц情女王★ 提交于 2019-12-12 08:13:22

问题


I am trying to plot combined graphs for logistic regressions as the function logi.hist.plot but I would like to do it using ggplot2 (aesthetic reasons).

The problem is that only one of the histograms should have the scale_y_reverse().

Is there any way to specify this in a single plot (see code below) or to overlap the two histograms by using coordinates that can be passed to the previous plot?

ggplot(dat) + 
    geom_point(aes(x=ind, y=dep)) + 
    stat_smooth(aes(x=ind, y=dep), method=glm, method.args=list(family="binomial"), se=FALSE) + 
    geom_histogram(data=dat[dat$dep==0,], aes(x=ind)) +
    geom_histogram(data=dat[dat$dep==1,], aes(x=ind)) ## + scale_y_reverse()

This final plot is what I have been trying to achieve:


回答1:


You haven't provided a reproducible example, so here's an example with fake data.

We use geom_segment to create the "bars" for the histogram and also to create the rug plots. Adjust the size parameter to change the "bar" widths in the histogram. In the example below, the bar heights are equal to the percentage of values within a given x range. If you want to change the absolute heights of the bars, just multiply n/sum(n) by a scaling factor when you create the h data frame of histogram counts.

To generate histogram counts for the plot, we pre-summarize the data to create the histogram values. Note the ifelse statement in the mutate function, which adjusts the values of pct in order to get the upward and downward bars in the plot, depending on whether y is 0 or 1, respectively. You can do this in the plot code itself, but then you need two separate calls to geom_segment.

library(dplyr)

# Fake data
set.seed(1926)
dat = data.frame(y = sample(0:1, 1000, replace=TRUE))
dat$x1 = rnorm(1000, 5, 2) * (dat$y+1)

# Summarise data to create histogram counts
h = dat %>% group_by(y) %>%
  mutate(breaks = cut(x1, breaks=seq(-2,20,0.5), labels=seq(-1.75,20,0.5), 
                      include.lowest=TRUE),
         breaks = as.numeric(as.character(breaks))) %>%
  group_by(y, breaks) %>% 
  summarise(n = n()) %>%
  mutate(pct = ifelse(y==0, n/sum(n), 1 - n/sum(n))) 

ggplot() +
  geom_segment(data=h, size=4, show.legend=FALSE,
               aes(x=breaks, xend=breaks, y=y, yend=pct, colour=factor(y))) +
  geom_segment(dat=dat[dat$y==0,], aes(x=x1, xend=x1, y=0, yend=-0.02), size=0.2, colour="grey30") +
  geom_segment(dat=dat[dat$y==1,], aes(x=x1, xend=x1, y=1, yend=1.02), size=0.2, colour="grey30") +
  geom_line(data=data.frame(x=seq(-2,20,0.1), 
                            y=predict(glm(y ~ x1, family="binomial", data=dat), 
                                      newdata=data.frame(x1=seq(-2,20,0.1)),
                                      type="response")), 
            aes(x,y), colour="grey50", lwd=1) +
  scale_y_continuous(limits=c(-0.02,1.02)) +
  scale_x_continuous(limits=c(-1,20)) +
  theme_bw(base_size=12)



来源:https://stackoverflow.com/questions/35366499/ggplot2-how-to-combine-histogram-rug-plot-and-logistic-regression-prediction

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!