I want to specify the probability density function of a distribution and then draw N random numbers from that distribution in Python. How do I go about doing that?
In general, you need the inverse cumulative distribution function (inverse CDF) of your distribution. Once you have it, generating random numbers from the distribution is simple:
    import random

    # icdf must be defined elsewhere: it maps a value in [0, 1) to a sample
    def sample(n):
        return [icdf(random.random()) for _ in range(n)]
Or, if you use NumPy:
    import numpy as np

    # icdf must be defined elsewhere and must accept a NumPy array
    def sample(n):
        return icdf(np.random.random(n))
In both cases icdf is the inverse cumulative distribution function which accepts a value between 0 and 1 and outputs the corresponding value from the distribution.
To illustrate the nature of icdf, we'll take a simple uniform distribution between values 10 and 12 as an example:
- The probability density function (PDF) is 0.5 between 10 and 12, and zero elsewhere.
- The cumulative distribution function (CDF) is 0 below 10 (no samples below 10), 1 above 12 (no samples above 12), and increases linearly in between (it is the integral of the PDF).
- The inverse cumulative distribution function is only defined between 0 and 1. At 0 it is 10, at 1 it is 12, and it changes linearly in between.
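For this uniform example the inverse CDF can be written down directly, since a straight line through (0, 10) and (1, 12) is 10 + 2u. A minimal sketch, reusing the `icdf` and `sample` names from the snippets above:

```python
import random

def icdf(u):
    """Inverse CDF of the uniform distribution on [10, 12].

    Maps u in [0, 1] linearly onto [10, 12].
    """
    return 10.0 + 2.0 * u

def sample(n):
    # Feed uniform random numbers from [0, 1) through the inverse CDF.
    return [icdf(random.random()) for _ in range(n)]

values = sample(1000)  # 1000 numbers, all in [10, 12)
```

In practice you would use `random.uniform(10, 12)` for this particular distribution; the point here is only to show what `icdf` looks like.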
Of course, the difficult part is obtaining the inverse cumulative distribution function. How you get it depends on your distribution: sometimes there is an analytical expression, and sometimes you have to resort to interpolation. Numerical methods can help here, since numerical integration of the PDF gives the CDF, and interpolation can then be used to invert it.
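The numerical route could look roughly like the sketch below. The helper name `make_icdf`, the grid size, and the example PDF (f(x) = x/2 on [0, 2]) are illustrative assumptions, not part of the original answer. The sketch also assumes the PDF is positive inside the interval, so that the tabulated CDF is strictly increasing and can be inverted with `np.interp`.

```python
import numpy as np

def make_icdf(pdf, lo, hi, num=2001):
    """Build an approximate inverse CDF for `pdf` on [lo, hi].

    Hypothetical helper: assumes pdf accepts a NumPy array and is
    positive on (lo, hi). The PDF does not need to be normalized.
    """
    x = np.linspace(lo, hi, num)
    p = pdf(x)
    # Cumulative trapezoidal integration of the PDF gives the CDF on the grid.
    steps = 0.5 * (p[1:] + p[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]  # normalize so the CDF ends at exactly 1

    def icdf(u):
        # Invert by interpolating with the roles of x and CDF swapped.
        return np.interp(u, cdf, x)

    return icdf

# Example PDF (an assumption for illustration): f(x) = x / 2 on [0, 2].
icdf = make_icdf(lambda x: x / 2.0, 0.0, 2.0)

rng = np.random.default_rng(0)
samples = icdf(rng.random(10000))
```

For this example PDF the exact inverse CDF is 2·sqrt(u), so the result can be checked against a known answer. A finer grid (larger `num`) reduces the interpolation error at the cost of memory.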