Calculating WAIC for models with multiple likelihood functions with pymc3

对着背影说爱祢 提交于 2021-01-29 11:45:01

问题


I try to predict the outcome of soccer games based on the number of goals scored and I use the following model:

with pm.Model() as model:
  # global model parameters
   h = pm.Normal('h', mu = mu, tau = tau)
   sd_a = pm.Gamma('sd_a', .1, .1) 
   sd_d = pm.Gamma('sd_d', .1, .1) 
   alpha = pm.Normal('alpha', mu=mu, tau = tau)

  # team-specific model parameters
   a_s = pm.Normal("a_s", mu=0, sd=sd_a, shape=n)
   d_s = pm.Normal("d_s", mu=0, sd=sd_d, shape=n)

   atts = pm.Deterministic('atts', a_s - tt.mean(a_s))
   defs = pm.Deterministic('defs', d_s - tt.mean(d_s))
   h_theta = tt.exp(alpha + h + atts[h_t] + defs[a_t])
   a_theta = tt.exp(alpha + atts[a_t] + defs[h_t])

  # likelihood of observed data
   h_goals = pm.Poisson('h_goals', mu=h_theta, observed=observed_h_goals)
   a_goals = pm.Poisson('a_goals', mu=a_theta, observed=observed_a_goals)

When I sample the model, the trace plots look fine.

Afterward when I want to calculate the WAIC:

waic = pm.waic(trace, model)

I get the following error:


----> 1 waic = pm.waic(trace, model)

~\Anaconda3\envs\env\lib\site-packages\pymc3\stats_init_.py in wrapped(*args, **kwargs)
22 )
23 kwargs[new] = kwargs.pop(old)
—> 24 return func(*args, **kwargs)
25
26 return wrapped

~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats.py in waic(data, pointwise, scale)
1176 “”"
1177 inference_data = convert_to_inference_data(data)
-> 1178 log_likelihood = _get_log_likelihood(inference_data)
1179 scale = rcParams[“stats.ic_scale”] if scale is None else scale.lower()
1180

~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats_utils.py in get_log_likelihood(idata, var_name)
403 var_names.remove(“lp”)
404 if len(var_names) > 1:
–> 405 raise TypeError(
406 “Found several log likelihood arrays {}, var_name cannot be None”.format(var_names)
407 )

TypeError: Found several log likelihood arrays [‘h_goals’, ‘a_goals’], var_name cannot be None

Is there any way to calculate WAIC and compare models when I have two likelihood functions in pymc3? (1: the goals scored by the home 2: the goals scored by the away team)


回答1:


It is possible but requires defining what are you interested in predicting, it can be the result of the match, or could be the number of goals scored by either team (not the aggregate, each match would then provide 2 results to predict).

A complete and detailed answer is available at PyMC discourse.

Here I transcribe the case where the quantity of interest is the result of the match as a summary. ArviZ will automatically retrieve 2 pointwise log likelihood arrays, which we have to combine somehow (e.g. add, concatenate, groupby...) to get a single array. The tricky part is knowing which operation corresponds to each quantity, which has to be assessed on a per model basis. In this particular example, the predictive accuracy of a match result can be calculated in the following way:

dims = {
    "home_points": ["match"],
    "away_points": ["match"],
}
idata = az.from_pymc3(trace, dims=dims, model=model)

Setting the match dim is important to tell xarray how to align the pointwise log likelihood arrays, otherwise they would not be broadcasted and aligned in the desired way.

idata.sample_stats["log_likelihood"] = (
    idata.log_likelihood.home_points + idata.log_likelihood.away_points
)
az.waic(idata)
# Output
# Computed from 3000 by 60 log-likelihood matrix
#
#           Estimate       SE
# elpd_waic  -551.28    37.96
# p_waic       46.16        -
#
# There has been a warning during the calculation. Please check the results.

Note that ArviZ>=0.7.0 is required.



来源:https://stackoverflow.com/questions/61125228/calculating-waic-for-models-with-multiple-likelihood-functions-with-pymc3

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!