问题
I'm looking to plot the following histograms:
library(palmerpenguins)
library(tidyverse)
penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram() + 
  facet_wrap(~species)
For each histogram, I would like to add a Normal Distribution to each histogram with each species mean and standard deviation.
Of course I'm aware that I could compute the group specific mean and SD before embarking on the ggplot command, but I wonder whether there is a smarter/faster way to do this.
I have tried:
penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram() + 
  facet_wrap(~species) + 
  stat_function(fun = dnorm)
But this only gives me a thin line at the bottom:
Any ideas? Thanks!
Edit I guess what I'm trying to recreate is this simple command from Stata:
hist bill_length_mm, by(species) normal
which gives me this:
I understand that there are some suggestions here: using stat_function and facet_wrap together in ggplot2 in R
But I'm specifically looking for a short answer that does not require me creating a separate function.
回答1:
A while I ago I sort of automated this drawing of theoretical densities with a function that I put in the ggh4x package I wrote, which you might find convenient. You would just have to make sure that the histogram and theoretical density are at the same scale (for example counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(binwidth = 1) + 
  stat_theodensity(aes(y = after_stat(count))) +
  facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.
penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(binwidth = 2) + 
  stat_theodensity(aes(y = after_stat(count)*2)) +
  facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).

Created on 2021-01-27 by the reprex package (v0.3.0)
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density))) + 
  stat_theodensity() +
  facet_wrap(~species)
回答2:
While the ggh4x package is the way to go in this case, a more generalizable approach is with tapply and the use of the PANEL variable which is added to the data when a facet is applied.
penguins %>% 
  ggplot(aes(x=bill_length_mm, fill = species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) + 
  facet_wrap(~species) + 
  geom_line(aes(y = dnorm(bill_length_mm,
                          mean = tapply(bill_length_mm, species, mean, na.rm = TRUE)[PANEL],
                          sd = tapply(bill_length_mm, species, sd, na.rm = TRUE)[PANEL])))
来源:https://stackoverflow.com/questions/65924407/ggplot-add-normal-distribution-while-using-facet-wrap