Normalize then plot two histograms together in R

Submitted by 坚强是说给别人听的谎言 on 2019-12-03 00:14:41

ggplot2 makes it relatively straightforward to plot normalized histograms of groups with unequal size. Here's an example with fake data:

library(ggplot2)

# Fake data (two normal distributions)
set.seed(20)
dat1 = data.frame(x=rnorm(1000, 100, 10), group="A")
dat2 = data.frame(x=rnorm(2000, 120, 20), group="B")
dat = rbind(dat1, dat2)

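# Raw counts: bar heights depend on group size (B has twice as many points as A)
ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Unnormalized")

# Densities: each group's histogram is scaled to have a total area of 1
ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=after_stat(density)), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Normalized")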
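To see what y=density does without drawing anything, here is a base R sketch using hist(plot = FALSE) on the same fake data. It assumes ggplot2's density is the same quantity as base R's (count divided by group size times bin width). The breaks are computed from the data rather than taken as seq(0,200,5), because hist() stops with an error if any value falls outside the breaks.

```r
# Reproduce the fake data from above: two groups of unequal size
set.seed(20)
a <- rnorm(1000, 100, 10)
b <- rnorm(2000, 120, 20)

binwidth <- 5
# Common breaks chosen to span every observation (hist() errors otherwise)
all_x <- c(a, b)
breaks <- seq(floor(min(all_x) / binwidth) * binwidth,
              ceiling(max(all_x) / binwidth) * binwidth,
              by = binwidth)

ha <- hist(a, breaks = breaks, plot = FALSE)
hb <- hist(b, breaks = breaks, plot = FALSE)

# Raw counts scale with group size
count_totals <- c(A = sum(ha$counts), B = sum(hb$counts))
print(count_totals)  # A = 1000, B = 2000

# density = count / (group size * bin width), so each histogram has area 1
areas <- c(A = sum(ha$density * diff(breaks)),
           B = sum(hb$density * diff(breaks)))
print(areas)
```

Because each group is divided by its own size, the two normalized histograms can be compared directly even though B has twice as many observations.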

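If you want overlaid density plots, you can do that as well. The adjust argument multiplies the automatically chosen bandwidth, so values below 1 give a less smoothed curve. Density estimates are already normalized by default: each group's curve has a total area of 1.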

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_density(alpha=0.4, lwd=0.8, adjust=0.5) 
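As a base R analogue (not ggplot2 itself), stats::density() takes the same adjust argument. This sketch checks that adjust = 0.5 halves the default bandwidth (bw.nrd0) and that the estimated curve integrates to roughly 1, using a simple trapezoid rule; the helper name area() is just for illustration.

```r
set.seed(20)
x <- rnorm(1000, 100, 10)  # group A from the fake data above

# adjust multiplies the automatically chosen bandwidth (bw.nrd0 by default)
d_default <- density(x)
d_half <- density(x, adjust = 0.5)
print(c(default = d_default$bw, adjusted = d_half$bw))

# Hypothetical helper: trapezoid-rule integral of an estimated density curve
area <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

# Close to 1 for both bandwidths: smoothing changes the shape, not the area
print(c(default = area(d_default), adjusted = area(d_half)))
```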

UPDATE: In answer to your comment, the following code should do it. In (..density..)/sum(..density..), the sum runs over the bars of both groups, so the total over the two histograms adds up to one and each individual group's bars add up to 0.5. So you have to multiply by 2 for each group to be individually normalized to 1. In general, you have to multiply by n, where n is the number of groups. Each bar then shows the fraction of that group's observations falling in the bin, which is why the y axis can be labeled as a percentage. This seems kind of kludgy, and there may be a more elegant approach.

library(scales) # For percent_format()

# 2 = number of groups; each group's bars then sum to 1 (i.e. 100%)
ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=after_stat(2*density/sum(density))), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  scale_y_continuous(labels=percent_format())
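The factor-of-2 argument can be checked numerically. This base R sketch (using hist() rather than ggplot2, so it assumes ggplot2's density matches base R's count / (n * binwidth)) divides each group's densities by their sum over both groups, then multiplies by the number of groups.

```r
# Reproduce the answer's fake data: two groups of unequal size
set.seed(20)
dat <- rbind(
  data.frame(x = rnorm(1000, 100, 10), group = "A"),
  data.frame(x = rnorm(2000, 120, 20), group = "B")
)

binwidth <- 5
# Common breaks that are guaranteed to span every observation
breaks <- seq(floor(min(dat$x) / binwidth) * binwidth,
              ceiling(max(dat$x) / binwidth) * binwidth,
              by = binwidth)

# Per-group densities: count / (group size * binwidth)
dens <- lapply(split(dat$x, dat$group),
               function(v) hist(v, breaks = breaks, plot = FALSE)$density)

# Dividing by the sum over BOTH groups: each group's bars add up to 1/2
total <- sum(unlist(dens))
half <- sapply(dens, function(d) sum(d / total))
print(half)

# Multiplying by the number of groups restores a total of 1 per group
n_groups <- length(dens)
scaled <- sapply(dens, function(d) sum(n_groups * d / total))
print(scaled)

# Same per-group totals without the n_groups factor:
# density * binwidth is the within-group proportion in each bin
prop <- sapply(dens, function(d) sum(d * binwidth))
print(prop)
```

The last step suggests one possibly less kludgy option: multiplying density by the bin width gives per-group proportions directly, with no dependence on how many groups there are. In ggplot2 this would presumably be aes(y = after_stat(density * width)), assuming your version exposes stat_bin's computed width variable; that translation is a suggestion, not something tested here.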
