Comparing Spike and Slab Priors for Bayesian Variable Selection

  • Gertraud Malsiner-Walli Johannes Kepler Universität Linz, Austria
  • Helga Wagner Johannes Kepler Universität Linz, Austria

Abstract

An important task in building regression models is deciding which regressors to include in the final model. In a Bayesian approach, variable selection can be performed using mixture priors with a spike and a slab component for the effects subject to selection. Because the spike is concentrated at zero, variable selection is based on the posterior probability of assigning the corresponding regression effect to the slab component. These posterior inclusion probabilities can be estimated by MCMC sampling. In this paper, we compare MCMC implementations for several spike and slab priors with regard to posterior inclusion probabilities and sampling efficiency on simulated data. Further, we investigate posterior inclusion probabilities analytically for different slabs in two simple settings. Variable selection with spike and slab priors is illustrated on a data set of psychiatric patients, where the goal is to identify covariates affecting metabolism.
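
To make the idea of posterior inclusion probabilities concrete, the following is a minimal sketch of one possible Gibbs sampler, assuming a narrow normal spike and a wide normal slab in the style of stochastic search variable selection. It is not necessarily any of the implementations compared in the paper. The function name `ssvs_gibbs`, the variances `v0` and `v1`, the fixed prior inclusion weight `w`, and the improper prior on the error variance are all illustrative assumptions.

```python
import numpy as np


def ssvs_gibbs(X, y, n_iter=2000, burn_in=500, v0=1e-3, v1=10.0, w=0.5, seed=0):
    """Estimate posterior inclusion probabilities with a simple Gibbs sampler.

    Assumed prior (illustrative, not taken from the paper):
      beta_j | delta_j ~ N(0, v1) if delta_j = 1 (slab), N(0, v0) otherwise (spike)
      delta_j ~ Bernoulli(w),  p(sigma2) proportional to 1 / sigma2
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    delta = np.ones(p, dtype=bool)   # start with every effect in the slab
    sigma2 = 1.0
    slab_counts = np.zeros(p)

    for it in range(n_iter):
        # Step 1: draw beta from its multivariate normal full conditional.
        prior_prec = np.where(delta, 1.0 / v1, 1.0 / v0)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(prior_prec))
        cov = (cov + cov.T) / 2.0    # remove numerical asymmetry
        mean = cov @ Xty / sigma2
        beta = mean + np.linalg.cholesky(cov) @ rng.standard_normal(p)

        # Step 2: draw each indicator given beta_j (computed on the log scale).
        log_slab = np.log(w) - 0.5 * np.log(v1) - 0.5 * beta**2 / v1
        log_spike = np.log1p(-w) - 0.5 * np.log(v0) - 0.5 * beta**2 / v0
        prob_slab = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        delta = rng.random(p) < prob_slab

        # Step 3: draw the error variance from its inverse gamma full conditional.
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (resid @ resid))

        if it >= burn_in:
            slab_counts += delta

    # Fraction of retained draws in which each effect was assigned to the slab.
    return slab_counts / (n_iter - burn_in)
```

Calling `ssvs_gibbs(X, y)` with an `n x p` design matrix `X` and a response vector `y` returns one estimated inclusion probability per regressor. Under this sketch, effects with large true coefficients should get values near one, while the values for null effects depend strongly on the choice of `v0`, `v1`, and `w`.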

Published
2016-02-24
How to Cite
Malsiner-Walli, G., & Wagner, H. (2016). Comparing Spike and Slab Priors for Bayesian Variable Selection. Austrian Journal of Statistics, 40(4), 241–264. https://doi.org/10.17713/ajs.v40i4.215
Section
Articles