arviz_stats.compare
- arviz_stats.compare(compare_dict, method='stacking', var_name=None)
Compare models based on their expected log pointwise predictive density (ELPD).
The ELPD is estimated by Pareto smoothed importance sampling leave-one-out cross-validation, the same method used by arviz_stats.loo. The method is described in [1] and [2]. By default, the weights are estimated using "stacking" as described in [3].
- Parameters:
- compare_dict: dict of {str: DataTree or ELPDData}
A dictionary mapping model names to xr.DataTree or ELPDData objects.
- method: str, optional
Method used to estimate the weights for each model. Available options are:
‘stacking’ : stacking of predictive distributions.
‘BB-pseudo-BMA’ : pseudo-Bayesian model averaging using Akaike-type weighting. The weights are stabilized using the Bayesian bootstrap.
‘pseudo-BMA’ : pseudo-Bayesian model averaging using Akaike-type weighting, without Bayesian bootstrap stabilization (not recommended).
For more information, see https://arxiv.org/abs/1704.02030
- var_name: str, optional
If the data contain more than one observed variable, the name of the variable to use as the basis for comparison.
- Returns:
A DataFrame, ordered from best to worst model (measured by the ELPD). The index reflects the key with which the models are passed to this function. The columns are:
- rank:
The rank order of the models. 0 is the best.
- elpd:
ELPD estimated using either PSIS-LOO-CV (elpd_loo) or WAIC (elpd_waic). Higher ELPD indicates higher out-of-sample predictive fit (“better” model).
- pIC:
Estimated effective number of parameters.
- elpd_diff:
The difference in ELPD between two models. If more than two models are compared, the difference is computed relative to the top-ranked model, which always has an elpd_diff of 0.
- weight:
Relative weight for each model. This can be loosely interpreted as the probability of each model (among the compared models) given the data. By default, the uncertainty in the weight estimates is accounted for using the Bayesian bootstrap.
- SE:
Standard error of the ELPD estimate. If method is ‘BB-pseudo-BMA’, these values are estimated using the Bayesian bootstrap.
- dSE:
Standard error of the difference in ELPD between each model and the top-ranked model. It is always 0 for the top-ranked model.
- warning:
A value of 1 indicates that the computation of the ELPD may not be reliable. This could be an indication of WAIC/LOO starting to fail; see http://arxiv.org/abs/1507.04544 for details.
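The elpd_diff and dSE columns can be illustrated with a minimal sketch, assuming the standard pointwise formulas from [1]: the difference is the sum of pointwise ELPD differences, and its standard error is the square root of n times their sample variance. This is not the library's code. The helper name elpd_difference and the input values are hypothetical, the choice of the n - 1 variance denominator is an assumption, and the sign convention (top-ranked minus other) may differ from the one used in the returned table.

```python
import math
import statistics


def elpd_difference(pointwise_best, pointwise_other):
    """Return (elpd_diff, dse) of a model relative to the top-ranked one.

    Hypothetical helper for illustration; not part of arviz_stats.
    """
    if len(pointwise_best) != len(pointwise_other):
        raise ValueError("both models must be evaluated on the same observations")
    # Pointwise differences, taken as top-ranked minus other (assumed convention)
    diffs = [b - o for b, o in zip(pointwise_best, pointwise_other)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # Standard error of the summed difference: sqrt(n * Var(diffs)),
    # using the sample variance (n - 1 denominator) as an assumption
    dse = math.sqrt(n * statistics.variance(diffs))
    return elpd_diff, dse


# Hypothetical pointwise ELPD values for four observations (illustration only)
best = [-1.0, -1.25, -0.75, -1.5]
other = [-1.25, -1.25, -1.0, -2.0]
diff, dse = elpd_difference(best, other)
print(f"elpd_diff = {diff:.3f}, dSE = {dse:.3f}")
```

Comparing a model with itself gives a difference and standard error of exactly 0, which matches the description of the top-ranked row.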
See also
loo : Compute the ELPD using the Pareto smoothed importance sampling leave-one-out cross-validation method.
arviz_plots.plot_compare : Summary plot for model comparison.
References
[1] Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5) (2017). https://doi.org/10.1007/s11222-016-9696-4 arXiv preprint https://arxiv.org/abs/1507.04544
[2] Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72) (2024). https://jmlr.org/papers/v25/19-556.html arXiv preprint https://arxiv.org/abs/1507.02646
[3] Yao et al. Using stacking to average Bayesian predictive distributions. Bayesian Analysis, 13(3) (2018). https://doi.org/10.1214/17-BA1091 arXiv preprint https://arxiv.org/abs/1704.02030
Examples
Compare the centered and non-centered models of the eight schools problem:
In [1]: from arviz_stats import compare
   ...: from arviz_base import load_arviz_data
   ...: data1 = load_arviz_data("non_centered_eight")
   ...: data2 = load_arviz_data("centered_eight")
   ...: compare_dict = {"non centered": data1, "centered": data2}
   ...: compare(compare_dict)
   ...:
Out[1]:
              rank       elpd         p  ...        se       dse  warning
non centered     0 -30.716361  0.902646  ...  1.333201  0.000000     True
centered         1 -30.781004  0.945475  ...  1.347355  0.061164    False

[2 rows x 8 columns]
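To give a feel for the Akaike-type weighting behind the ‘pseudo-BMA’ option, the following is a minimal sketch, assuming the plain formula from [3] in which each weight is proportional to exp(elpd). It is not the library's implementation, it omits the Bayesian bootstrap used by ‘BB-pseudo-BMA’, and the helper name pseudo_bma_weights is hypothetical. The ELPD values are copied from the output above.

```python
import math


def pseudo_bma_weights(elpds):
    """Akaike-type weights: w_k proportional to exp(elpd_k).

    Hypothetical helper for illustration; not part of arviz_stats.
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # the shift cancels out in the normalisation.
    best = max(elpds)
    unnormalised = [math.exp(e - best) for e in elpds]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# ELPD values copied from the example output above
elpds = {"non centered": -30.716361, "centered": -30.781004}
weights = dict(zip(elpds, pseudo_bma_weights(list(elpds.values()))))
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Because the two ELPD estimates differ by less than 0.1, the resulting weights are close to equal, which is consistent with the small dse reported in the table.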