Updating model on PyMC3 with new observed data

浪子不回头ぞ 提交于 2020-12-04 09:01:59

问题


I have measured the diameter of 80 fruits last year, and after checking what is the best distribution of the values, I've created a PyMC3 model

with Model() as diam_model:
    mu = Normal('mu',mu=57,sd=5.42)
    sigma = Uniform('sigma',0,10)

after, as far as I understand, I've "trained" the model with my prior data (the 80 values)

with diam_model:
    dist = Normal('dist',mu=mu,sd=sigma, observed=prior_data.values)

with diam_model:
    samples=fit().sample(1000)

then I used the plot_posteriorof the samples, returning also the mean and HPD.

My idea is to measure again this year using Bayesian update to reduce the sample size. How can I add single values, and update the posterior, expecting that the HPD gets smaller and smaller?


回答1:


Kernel Density Estimated Updated Priors

Using the other answer suggested as a duplicate, one could extract approximate versions of the priors using the code from this Jupyter notebook.

First round

I'll assume we have data from the first round of sampling, which we can impose the mean 57.0 and standard deviation of 5.42.

import numpy as np
import pymc3 as pm
from sklearn.preprocessing import scale
from scipy import stats

# generate data forced to match distribution indicated
Y0 = 57.0 + scale(np.random.normal(size=80))*5.42

with pm.Model() as m0:
    # let's place an informed, but broad prior on the mean
    mu = pm.Normal('mu', mu=50, sd=10)
    sigma = pm.Uniform('sigma', 0, 10)
    
    y = pm.Normal('y', mu=mu, sd=sigma, observed=Y0)
    
    trace0 = pm.sample(5000, tune=5000)

Extracting new priors from posterior

We can then use the results of this model to extract KDE posteriors on the parameters with the following code from the referenced notebook:

def from_posterior(param, samples, k=100):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, k)
    y = stats.gaussian_kde(samples)(x)
    
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)

Second round

Now, if we have more data we can run a new model using the KDE updated priors:

Y1 = np.random.normal(loc=57, scale=5.42, size=100)

with pm.Model() as m1:
    mu = from_posterior('mu', trace0['mu'])
    sigma = from_posterior('sigma', trace0['sigma'])
    
    y = pm.Normal('y', mu=mu, sd=sigma, observed=Y1)
    
    trace1 = pm.sample(5000, tune=5000)

And similarly, one could use this trace to extract updated posterior estimates for future rounds of incoming data.

Caveat Emptor

The above methodology yields approximations to true updated priors and would be most useful in cases where conjugate priors are not possible. It should also be noted that I'm not sure the extent to which such KDE-base approximations introduce errors and how they propagate in a model when used repeatedly. It's a neat trick but one should be cautious about putting this into production without further validation of its robustness.

In particular, I would be very concerned about situations where the posterior distributions have strong correlation structures. The code provided here generates a "prior" distribution using only the marginals of each latent variable. This seems fine for simple models like this, and admittedly the initial priors also lack correlations, so it may not be a huge issue here. Generally, however, summarizing to marginals involves discarding information about how variables are related, and in other contexts this could be rather significant. For example, the default parameterization of a Beta distribution always leads to correlated parameters in the posterior and thus the above technique would inappropriate. Instead, one would need to infer a multi-dimensional KDE that incorporates all the latent variables.


Conjugate Model

However, in your particular case the expected distribution is Gaussian and those distributions have established closed-form conjugate models, i.e., principled solutions rather than approximations. I strongly recommend working through Kevin Murphy's Conjugate Bayesian analysis of the Gaussian distribution.

Normal-Inverse Gamma Model

The Normal-Inverse Gamma model estimates both the mean and the variance of an observed normal random variable. The mean is modeled with a normal prior; the variance with an inverse gamma. This model uses four parameters for the prior:

mu_0  = prior mean
nu    = number of observations used to estimate the mean
alpha = half the number of obs used to estimate variance
beta  = half the sum of squared deviations

Given your initial model, we could use the values

mu_0  = 57.0
nu    = 80
alpha = 40
beta  = alpha*5.42**2

You can then plot the log-likelihood of the prior as follows:

# points to compute likelihood at
mu_grid, sd_grid = np.meshgrid(np.linspace(47, 67, 101), 
                               np.linspace(4, 8, 101))

# normal ~ N(X | mu_0, sigma/sqrt(nu))
logN = stats.norm.logpdf(x=mu_grid, loc=mu_0, scale=sd_grid/np.sqrt(nu))

# inv-gamma ~ IG(sigma^2 | alpha, beta)
logIG = stats.invgamma.logpdf(x=sd_grid**2, a=alpha, scale=beta)

# full log-likelihood
logNIG = logN + logIG

# actually, we'll plot the -log(-log(likelihood)) to get nicer contour
plt.figure(figsize=(8,8))
plt.contourf(mu_grid, sd_grid, -np.log(-logNIG))
plt.xlabel("$\mu$")
plt.ylabel("$\sigma$")
plt.show()

enter image description here

Updating parameters

Given new data, Y1, one updates the parameters as follows:

# precompute some helpful values
n = Y1.shape[0]
mu_y = Y1.mean()

# updated NIG parameters
mu_n = (nu*mu_0 + n*mu_y)/(nu + n)
nu_n = nu + n
alpha_n = alpha + n/2
beta_n = beta + 0.5*np.square(Y1 - mu_y).sum() + 0.5*(n*nu/nu_n)*(mu_y - mu_0)**2

For the sake of illustrating change in the model, let's generate some data from a slightly different distribution and then plot the resulting posterior log-likelihood:

np.random.seed(53211277)
Y1 = np.random.normal(loc=62, scale=7.0, size=20)

which yields

enter image description here

Here, the 20 observations are not enough to completely move to the new location and scale I provided, but both parameters appear shifted in that direction.



来源:https://stackoverflow.com/questions/53211277/updating-model-on-pymc3-with-new-observed-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!