How to sample from an upside-down bell curve

Question

I can generate numbers from a uniform distribution using the code below:

runif(1,min=10,max=20)

How can I sample randomly generated numbers that fall more frequently near the minimum and maximum boundaries? (Aka an "upside-down bell curve")

Answer 1:

Well, a bell curve is usually Gaussian, meaning it has no minimum or maximum. You could try a Beta distribution and map it to the desired interval, along these lines:

min <- 1
max <- 20
q <- min + (max-min)*rbeta(10000, 0.5, 0.5)

As @Gregor-reinstateMonica noted, the Beta distribution is bounded on both ends, on [0, 1], so it can easily be mapped onto any bounded interval by scaling and shifting. It has two parameters and is symmetric when they are equal. With both parameters above 1 it looks like a bell; with both below 1 it becomes the inverse bell you're looking for. You can experiment with values other than 0.5 and see how the shape changes. Setting both parameters to 1 makes it uniform.

Answer 2:
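
To make the scale-and-shift idea concrete, here is a minimal sketch that wraps it in a function and checks that more draws land near the boundaries than in the middle. The helper name `r_ushape` and its arguments `lo`, `hi`, and `shape` are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical helper: U-shaped draws on [lo, hi] from a symmetric Beta.
# shape < 1 gives the inverse bell, shape = 1 is uniform, shape > 1 is bell-like.
r_ushape <- function(n, lo, hi, shape = 0.5) {
  stopifnot(lo < hi, shape > 0)
  lo + (hi - lo) * rbeta(n, shape, shape)
}

set.seed(42)
x <- r_ushape(10000, lo = 10, hi = 20)

# Share of draws in the outer 10% at each end versus the middle 20%
outer_frac  <- mean(x < 11 | x > 19)
middle_frac <- mean(x > 14 & x < 16)
```

Both regions cover 20% of the interval, so for a uniform sample the two fractions would be roughly equal; with shape 0.5 the outer fraction should come out clearly larger.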

Sampling from a beta distribution is a good idea. Another way is to sample a number of uniform numbers and then take the minimum or maximum of them.

According to the theory of order statistics, the cumulative distribution function for the maximum is F(x)^n, where F is the cdf from which the sample is taken and n is the number of samples, and the cdf for the minimum is 1 - (1 - F(x))^n. For a uniform distribution on [0, 1], the cdf is a straight line from 0 to 1, i.e., F(x) = x, and therefore the cdf of the maximum is x^n and the cdf of the minimum is 1 - (1 - x)^n. As n increases, these become more and more curved, with most of the mass of the maximum close to the upper end and most of the mass of the minimum close to the lower end.

A web search for "order statistics" will turn up some resources.
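
The minimum alone piles up at one end and the maximum at the other, so one way to get mass at both ends is to return the minimum or the maximum with equal probability. The sketch below does that; the function name `r_extreme`, the 50/50 mixing, and the argument names are assumptions added here, not something the answer specifies.

```r
# Hypothetical helper: for each draw, take n uniforms on [0, 1] and return
# either their minimum or their maximum (chosen by a fair coin flip),
# then scale and shift the result to [lo, hi].
r_extreme <- function(size, lo, hi, n = 5) {
  stopifnot(lo < hi, n >= 1)
  u <- matrix(runif(size * n), nrow = size, ncol = n)
  u_min <- apply(u, 1, min)
  u_max <- apply(u, 1, max)
  pick_max <- runif(size) < 0.5
  lo + (hi - lo) * ifelse(pick_max, u_max, u_min)
}

set.seed(42)
y <- r_extreme(10000, lo = 10, hi = 20, n = 5)

# Share of draws in the outer 10% at each end versus the middle 20%
outer_frac_y  <- mean(y < 11 | y > 19)
middle_frac_y <- mean(y > 14 & y < 16)
```

On [0, 1] the mixture has density (n/2) * (x^(n-1) + (1-x)^(n-1)), which is flat for n = 2 and U-shaped for n > 2, so larger n pushes more mass toward the boundaries.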

Answer 3:

If you don't care about decimal places, a hacky way would be to generate a large sample of normally distributed data points using rnorm(), count the number of times each rounded value appears (n), and then subtract n from the maximum value of n (max(n)) to get inverse counts.

You can then use the inverse count to make a new vector (that you can sample from), i.e.:


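library(tidyverse)

# Draw a large normal sample (mean 100, sd 15)
x <- rnorm(100000, 100, 15)

# Count each rounded value, then invert the counts
x_tib <- round(x) %>%
  tibble(x = .) %>%
  count(x) %>%
  mutate(new_n = max(n) - n)

# Repeat each value by its inverted count; this is the vector to sample from
new_x <- rep(x_tib$x, x_tib$new_n)

# Histogram of the inverted distribution
qplot(new_x, binwidth = 1)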
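
The same idea can be sketched in base R without building the repeated vector, by passing the inverted counts to `sample()` as weights. This is an alternative formulation, not the answer's own code, and the variable names are assumptions. Note that the range of values comes from the tails of the normal sample rather than from a fixed minimum and maximum.

```r
set.seed(1)

# Rounded normal sample and the count of each distinct value
x_round <- round(rnorm(100000, 100, 15))
counts  <- table(x_round)
values  <- as.numeric(names(counts))

# Inverted counts: the most frequent value gets weight 0
weights <- as.numeric(max(counts) - counts)

# sample() rescales the weights internally, so they need not sum to 1
draws <- sample(values, size = 1000, replace = TRUE, prob = weights)
```

Because the most common rounded value has weight 0, it never appears in `draws`, while values far from the mean are drawn most often.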

Source: https://stackoverflow.com/questions/59886064/how-to-sample-from-an-upside-down-bell-curve

Tags

statistics