PyMC3 - Differences in ways observations are passed to model -> difference in results?

Question

I'm trying to understand whether there is any meaningful difference between the ways of passing data into a model: either aggregated or as single trials. (Note that this question only makes sense for certain distributions, e.g. Binomial.)

I'm predicting p for a yes/no trial using a simple model with a Binomial distribution.

What is the difference in the computation/results of the following models (if any)?

To illustrate what I mean, I chose the two extremes: passing in one trial at a time (which reduces to a Bernoulli) or passing in the sum over the entire series of trials. I am also interested in the cases in between these extremes.

# imports and constants
import numpy as np
import scipy.stats
import pymc3 as pm

p_true = 0.1
N = 3000
observed = scipy.stats.bernoulli.rvs(p_true, size=N)

Model 1: combining all observations into a single data point

with pm.Model() as binomial_model1:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', N, p, observed=np.sum(observed))
    trace1 = pm.sample(40000)

Model 2: using each observation individually

with pm.Model() as binomial_model2:
    p = pm.Uniform('p', lower=0, upper=1)
    observations = pm.Binomial('observations', 1, p, observed=observed)
    trace2 = pm.sample(40000)

There isn't any noticeable difference in the traces or posteriors in this case. I tried to dig into the PyMC3 source code to see how the observations are processed, but couldn't find the right part.
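One way to check this without reading PyMC3's internals is to compare the log-likelihoods directly. The sketch below uses only SciPy (not PyMC3) and a hypothetical helper, `chunked_log_lik`, that sums the 0/1 observations into chunks of a given size (1 = individual trials, N = one aggregated count, 100 = an in-between case) and evaluates the Binomial log-likelihood of those counts. The fixed `random_state=0` is an assumption for reproducibility. If the gap between two aggregation levels is the same for every value of p, multiplying by the prior and normalising gives the same posterior for both.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

p_true, N = 0.1, 3000
observed = stats.bernoulli.rvs(p_true, size=N, random_state=0)

def chunked_log_lik(p, data, chunk_size):
    """Binomial log-likelihood of `data` after summing it into chunks of `chunk_size` trials.

    Hypothetical helper; assumes len(data) is a multiple of chunk_size.
    """
    counts = np.asarray(data).reshape(-1, chunk_size).sum(axis=1)
    return stats.binom.logpmf(counts, chunk_size, p).sum()

p_grid = np.array([0.05, 0.1, 0.2, 0.5])
individual = np.array([chunked_log_lik(p, observed, 1) for p in p_grid])

offsets = {}
for chunk_size in (1, 100, N):
    ll = np.array([chunked_log_lik(p, observed, chunk_size) for p in p_grid])
    # Gap to the per-trial log-likelihood at each p; it should not depend on p.
    offsets[chunk_size] = ll - individual
    print(chunk_size, np.ptp(offsets[chunk_size]))

# For full aggregation the constant gap should be log C(N, k).
k = observed.sum()
log_binom_coef = gammaln(N + 1) - gammaln(k + 1) - gammaln(N - k + 1)
print(offsets[N][0], log_binom_coef)
```

The spread of each offset across the grid is zero up to floating-point error, so the aggregation level only adds a p-independent constant (a sum of log binomial coefficients) to the log-likelihood.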

Possible expected answers:

- PyMC3 aggregates the observations under the hood for Binomial anyway, so there is no difference
- the resulting posterior surface (which is explored during sampling) is identical in each case, so there is no meaningful/statistical difference between the two models
- there are differences in the resulting statistics because of this and that...

Answer 1:

This is an interesting example! Your second suggestion is correct: you can actually work out the posterior analytically, and it will be distributed according to

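Beta(1 + sum(observed), 1 + N - sum(observed))

in either case. The Uniform(0, 1) prior is a Beta(1, 1), and the two likelihoods differ only by a constant factor (the binomial coefficient) that does not depend on p, so it cancels when the posterior is normalised.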
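A short derivation, writing y_i for the individual 0/1 observations and k for their sum, i.e. sum(observed) (notation introduced here for illustration):

```latex
% Prior: p ~ Uniform(0, 1) = Beta(1, 1);  k = \sum_{i=1}^{N} y_i
\begin{aligned}
\text{Model 1:}\quad L_1(p) &= \binom{N}{k}\, p^{k} (1-p)^{N-k} \\
\text{Model 2:}\quad L_2(p) &= \prod_{i=1}^{N} p^{y_i} (1-p)^{1-y_i} = p^{k} (1-p)^{N-k} \\
\pi(p \mid y) &\propto 1 \cdot p^{k} (1-p)^{N-k} \quad \text{(both models)} \\
\Rightarrow\quad p \mid y &\sim \mathrm{Beta}(1 + k,\; 1 + N - k)
\end{aligned}
```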

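The difference between the two modelling approaches would show up in posterior predictive checks, for example with pm.sample_ppc (renamed pm.sample_posterior_predictive in later PyMC3 releases): for each posterior draw of p, the first model produces a single count distributed as Binomial(N, p), while the second produces N separate draws of Binomial(1, p), i.e. Bernoulli(p).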
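To make that shape difference concrete without fitting either model, the sketch below mimics a posterior predictive check in plain NumPy. Assumptions: hypothetical data of k = 300 successes in N = 3000 trials, posterior draws of p taken from the exact Beta(1 + k, 1 + N - k) posterior instead of an MCMC trace, and NumPy's `Generator` API in place of PyMC3's sampler.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 3000, 300   # hypothetical data: 300 successes in 3000 trials
S = 1000           # number of posterior draws of p

# Exact posterior under the Uniform(0, 1) prior: Beta(1 + k, 1 + N - k).
p_draws = rng.beta(1 + k, 1 + N - k, size=S)

# Model 1 style: one aggregated count per posterior draw.
ppc_aggregated = rng.binomial(N, p_draws)                          # shape (S,)

# Model 2 style: N individual 0/1 outcomes per posterior draw.
ppc_individual = rng.binomial(1, p_draws[:, None], size=(S, N))    # shape (S, N)

print(ppc_aggregated.shape, ppc_individual.shape)
# Row sums of the per-trial draws should match the aggregated counts in distribution.
print(ppc_aggregated.mean(), ppc_individual.sum(axis=1).mean())
print(ppc_aggregated.std(), ppc_individual.sum(axis=1).std())
```

Summing each row of the per-trial predictions recovers counts with the same distribution as the aggregated predictions; the two models differ only in the granularity of what they predict.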

Source: https://stackoverflow.com/questions/46952953/pymc3-differences-in-ways-observations-are-passed-to-model-difference-in-re

Tags: python, bayesian, pymc, pymc3