Representing continuous probability distributions

天命终不由人 2021-01-30 14:27

I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I

10 answers
  •  谎友^ (OP)
    2021-01-30 15:24

    No need for histograms or symbolic computation: everything can be done at the language level in closed form, if the right point of view is taken.

    [I shall use the terms "measure" and "distribution" interchangeably. Also, my Haskell is rusty, so please forgive any imprecision on that front.]

    Probability distributions are really codata.

    Let mu be a probability measure. The only thing you can do with a measure is integrate a test function against it (this is one possible mathematical definition of "measure"). Note that this is what you will eventually do anyway: for instance, integrating the identity function gives the mean:

    mean :: Measure -> Double
    mean mu = mu id
    

    Another example, the variance:

    variance :: Measure -> Double
    variance mu = (mu $ \x -> x ^ 2) - (mean mu) ^ 2
    

    Another example, which computes P(X < x) for X distributed according to mu:

    cdf :: Measure -> Double -> Double
    cdf mu x = mu $ \z -> if z < x then 1 else 0
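
    To make the duality concrete, here is a minimal runnable sketch of the three definitions above. It assumes the type synonym Measure = (Double -> Double) -> Double described below, and adds a hypothetical helper discrete (not part of the original answer) that builds a measure from weighted atoms, so that mean, variance and cdf can be checked on a fair coin taking the values 0 and 1.

```haskell
-- A measure is known only through its integrals against test functions.
type Measure = (Double -> Double) -> Double

-- Hypothetical helper: a finitely supported measure from (point, weight) pairs.
discrete :: [(Double, Double)] -> Measure
discrete atoms f = sum [w * f x | (x, w) <- atoms]

-- Integrating against the identity gives the mean.
mean :: Measure -> Double
mean mu = mu id

-- E[X^2] - (E[X])^2
variance :: Measure -> Double
variance mu = mu (\x -> x * x) - mean mu ^ 2

-- P(X < x): integrate the indicator of (-infinity, x).
cdf :: Measure -> Double -> Double
cdf mu x = mu (\z -> if z < x then 1 else 0)

-- Hypothetical example: a fair coin with mass 1/2 at 0 and at 1.
coin :: Measure
coin = discrete [(0, 0.5), (1, 0.5)]
```

    In GHCi, mean coin, variance coin and cdf coin 0.5 evaluate to 0.5, 0.25 and 0.5 respectively.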
    

    This suggests an approach by duality.

    The type Measure therefore denotes the type (Double -> Double) -> Double, i.e. type Measure = (Double -> Double) -> Double. This allows you to model the results of MC simulation, numerical or symbolic quadrature against a PDF, etc. For instance, the function

    empirical :: [Double] -> Measure
    -- average of f over a non-empty sample: each point gets weight 1/n
    empirical xs f = sum (map f xs) / fromIntegral (length xs)
    

    returns the integral of f against the empirical measure of a sample obtained, e.g., by MC sampling. Similarly,

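    from_pdf :: (Double -> Double) -> Measure
    -- my_favorite_quadrature_method is a placeholder for any numerical
    -- integrator computing the integral of rho(x) * f(x) over x
    from_pdf rho f = my_favorite_quadrature_method rho f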
    

    constructs a measure from a (regular) density.
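
    Both constructors can be made concrete as follows. This is only a sketch: the empirical measure is normalised by the sample size, and, as an assumption standing in for my_favorite_quadrature_method, the hypothetical from_pdf_on uses a plain midpoint rule with n cells on a finite interval [lo, hi], outside of which the density is assumed to be negligible.

```haskell
type Measure = (Double -> Double) -> Double

-- Empirical measure of a non-empty sample: each point has weight 1/n.
empirical :: [Double] -> Measure
empirical xs f = sum (map f xs) / fromIntegral (length xs)

-- Hypothetical quadrature-based constructor: midpoint rule with n cells
-- on [lo, hi]; assumes rho is negligible outside that interval.
from_pdf_on :: Double -> Double -> Int -> (Double -> Double) -> Measure
from_pdf_on lo hi n rho f = h * sum [rho x * f x | x <- midpoints]
  where
    h = (hi - lo) / fromIntegral n
    midpoints = [lo + (fromIntegral i + 0.5) * h | i <- [0 .. n - 1]]

-- Standard normal density.
gauss :: Double -> Double
gauss x = exp (-x * x / 2) / sqrt (2 * pi)

-- N(0, 1), approximated on [-10, 10] with 2000 cells.
standardNormal :: Measure
standardNormal = from_pdf_on (-10) 10 2000 gauss
```

    For instance, empirical [1, 2, 3, 4] id gives 2.5, while standardNormal (const 1) and standardNormal (\x -> x * x) are both 1 up to quadrature error. Any other integrator with the same shape could be swapped in for the midpoint rule.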

    Now, the good news. If mu and nu are two measures, the convolution mu ** nu is given by:

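    (**) :: Measure -> Measure -> Measure  -- clashes with Prelude's (**): use import Prelude hiding ((**))
    (mu ** nu) f = nu $ \y -> mu $ \x -> f (x + y)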
    

    So, given two measures, you can integrate against their convolution, that is, against the law of X + Y for independent X ~ mu and Y ~ nu.
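
    A runnable version of the convolution, with two assumptions: it is named convolve here (a hypothetical name) to sidestep the clash with Prelude's (**), and the test measure is a hypothetical fair 0/1 coin built with a discrete helper. Convolving two such coins should give the law of their sum, with masses 1/4, 1/2 and 1/4 at 0, 1 and 2.

```haskell
type Measure = (Double -> Double) -> Double

-- Hypothetical helper: finitely supported measure from (point, weight) pairs.
discrete :: [(Double, Double)] -> Measure
discrete atoms f = sum [w * f x | (x, w) <- atoms]

-- Law of X + Y for independent X ~ mu and Y ~ nu:
-- integrate f (x + y) over x against mu, then over y against nu.
convolve :: Measure -> Measure -> Measure
convolve mu nu f = nu (\y -> mu (\x -> f (x + y)))

coin :: Measure
coin = discrete [(0, 0.5), (1, 0.5)]

-- Law of the number of heads in two independent tosses.
twoCoins :: Measure
twoCoins = convolve coin coin
```

    Here twoCoins id evaluates to 1.0 (the mean number of heads), and integrating the indicator of s > 1.5 gives 0.25.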

    Also, given a random variable X of law mu, the law of a * X is given by:

    rescale :: Double -> Measure -> Measure
    rescale a mu f = mu $ \x -> f(a * x)
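
    A quick sanity check of rescale, sketched on a hypothetical two-point measure: the mean of a * X should be a times the mean of X, and its variance a^2 times the variance of X.

```haskell
type Measure = (Double -> Double) -> Double

-- Hypothetical helper: finitely supported measure from (point, weight) pairs.
discrete :: [(Double, Double)] -> Measure
discrete atoms f = sum [w * f x | (x, w) <- atoms]

mean :: Measure -> Double
mean mu = mu id

variance :: Measure -> Double
variance mu = mu (\x -> x * x) - mean mu ^ 2

-- Law of a * X when X has law mu.
rescale :: Double -> Measure -> Measure
rescale a mu f = mu (\x -> f (a * x))

-- Hypothetical example: a fair coin with values 0 and 1.
coin :: Measure
coin = discrete [(0, 0.5), (1, 0.5)]
```

    With a = 3, mean (rescale 3 coin) is 1.5 and variance (rescale 3 coin) is 2.25, i.e. 9 times variance coin.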
    

    More generally, the distribution of phi(X) is given by the image (pushforward) measure phi_* mu, which in our framework reads:

    apply :: (Double -> Double) -> Measure -> Measure
    apply phi mu f = mu $ f . phi
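
    The pushforward can be checked in the same way. In this sketch the helper discrete and the measure die3 are hypothetical: pushing the uniform measure on {-1, 0, 1} through phi(x) = x^2 should give mass 1/3 at 0 and 2/3 at 1. It also shows that rescale a is just the special case apply (a *).

```haskell
type Measure = (Double -> Double) -> Double

-- Hypothetical helper: finitely supported measure from (point, weight) pairs.
discrete :: [(Double, Double)] -> Measure
discrete atoms f = sum [w * f x | (x, w) <- atoms]

-- Image measure phi_* mu: integrate f . phi against mu.
apply :: (Double -> Double) -> Measure -> Measure
apply phi mu f = mu (f . phi)

-- rescale is the special case phi = (a *).
rescale :: Double -> Measure -> Measure
rescale a = apply (a *)

-- Hypothetical example: uniform measure on {-1, 0, 1}.
die3 :: Measure
die3 = discrete [(-1, 1 / 3), (0, 1 / 3), (1, 1 / 3)]

-- Law of X^2 when X ~ die3.
squared :: Measure
squared = apply (\x -> x * x) die3
```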
    

    So now you can easily work out an embedded language for measures. There is much more to do here, particularly with respect to sample spaces other than the real line, dependencies between random variables, and conditioning, but I hope you get the point.

    In particular, the pushforward is functorial:

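    -- generalises the Measure synonym above to an arbitrary sample space a
    newtype Measure a = Measure { integrate :: (a -> Double) -> Double }
    instance Functor Measure where
        -- pushforward: integrating f against fmap phi mu = integrating f . phi against mu
        fmap phi (Measure mu) = Measure $ \f -> mu (f . phi)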
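
    A self-contained sketch of the generalised newtype and its Functor instance, exercised on a sample space other than the real line. The field name integrate, the helper discrete and the measure biased are assumptions for illustration, not part of the original answer.

```haskell
-- A measure on a sample space a, known through its integrals.
newtype Measure a = Measure { integrate :: (a -> Double) -> Double }

-- Pushforward: integrating f against fmap phi mu integrates f . phi against mu.
instance Functor Measure where
  fmap phi (Measure mu) = Measure (\f -> mu (f . phi))

-- Hypothetical helper: finitely supported measure from (point, weight) pairs.
discrete :: [(a, Double)] -> Measure a
discrete atoms = Measure (\f -> sum [w * f x | (x, w) <- atoms])

-- Hypothetical biased coin on Bool: True with probability 0.25.
biased :: Measure Bool
biased = discrete [(True, 0.25), (False, 0.75)]

-- Push the Bool-valued coin to the reals: True -> 1, False -> 0.
indicator :: Measure Double
indicator = fmap (\b -> if b then 1 else 0) biased
```

    Here integrate indicator id is 0.25, and fmap id and fmap composition behave as the functor laws require when integrated against test functions.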
    

    It is a monad too (exercise; hint: this looks very much like the continuation monad. What is return? What is the analog of call/cc?).
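
    One possible answer to the first part of the exercise, offered as a sketch rather than the author's intended solution: return x is the Dirac mass at x, and bind integrates f against k x, then integrates the result over x against mu, which is exactly the continuation-monad shape with result type Double. The helper discrete and the measure coin are hypothetical.

```haskell
import Control.Monad (ap, liftM)

newtype Measure a = Measure { integrate :: (a -> Double) -> Double }

instance Functor Measure where
  fmap = liftM

instance Applicative Measure where
  -- pure x is the Dirac mass at x: integrating f against it gives f x.
  pure x = Measure (\f -> f x)
  (<*>) = ap

instance Monad Measure where
  -- Integrate f against k x, then integrate that over x against mu.
  Measure mu >>= k = Measure (\f -> mu (\x -> integrate (k x) f))

-- Hypothetical helper: finitely supported measure from (point, weight) pairs.
discrete :: [(a, Double)] -> Measure a
discrete atoms = Measure (\f -> sum [w * f x | (x, w) <- atoms])

coin :: Measure Double
coin = discrete [(0, 0.5), (1, 0.5)]

-- Law of the sum of two independent coins, written in do-notation.
twoCoins :: Measure Double
twoCoins = do
  x <- coin
  y <- coin
  return (x + y)
```

    Written this way, independent sums come out of do-notation, which recovers the convolution defined earlier.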

    Also, combined with a differential geometry framework, this could probably be turned into something that computes Bayesian posterior distributions automatically.

    At the end of the day, you can write stuff like

    m = mean $ apply cos ((from_pdf gauss) ** (empirical samples))
    

    to compute the mean of cos(X + Y), where X, independent of Y, has density gauss, and Y has been sampled by an MC method whose results are in samples.
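
    Putting the pieces together, here is a self-contained sketch of that last line under stated assumptions: gauss is taken to be the standard normal density, from_pdf is a midpoint rule on [-10, 10] standing in for my_favorite_quadrature_method, the convolution is named convolve (hypothetical) to avoid Prelude's (**), and samples is a hypothetical list standing in for the MC output. Since X ~ N(0, 1) is independent of Y, E[cos(X + Y)] = e^{-1/2} E[cos Y], which gives an independent check.

```haskell
type Measure = (Double -> Double) -> Double

mean :: Measure -> Double
mean mu = mu id

-- Empirical measure of a non-empty sample (weight 1/n per point).
empirical :: [Double] -> Measure
empirical xs f = sum (map f xs) / fromIntegral (length xs)

-- Assumed stand-in for from_pdf: midpoint rule with 2000 cells on [-10, 10].
from_pdf :: (Double -> Double) -> Measure
from_pdf rho f = h * sum [rho x * f x | x <- midpoints]
  where
    n = 2000 :: Int
    lo, hi, h :: Double
    lo = -10
    hi = 10
    h = (hi - lo) / fromIntegral n
    midpoints = [lo + (fromIntegral i + 0.5) * h | i <- [0 .. n - 1]]

-- Law of X + Y for independent X ~ mu, Y ~ nu.
convolve :: Measure -> Measure -> Measure
convolve mu nu f = nu (\y -> mu (\x -> f (x + y)))

-- Pushforward phi_* mu.
apply :: (Double -> Double) -> Measure -> Measure
apply phi mu f = mu (f . phi)

-- Standard normal density.
gauss :: Double -> Double
gauss x = exp (-x * x / 2) / sqrt (2 * pi)

-- Hypothetical stand-in for the MC output.
samples :: [Double]
samples = [0.3, -1.2, 2.0, 0.7]

-- Mean of cos(X + Y).
m :: Double
m = mean $ apply cos (convolve (from_pdf gauss) (empirical samples))
```

    Evaluating m should agree with exp (-0.5) times the average of cos over samples up to quadrature error.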
