Abstract
During the past decade, shrinkage priors have received much attention in the Bayesian analysis of high-dimensional data. This paper establishes posterior consistency for high-dimensional linear regression with a class of shrinkage priors that have a heavy, flat tail and allocate a sufficiently large probability mass in a very small neighborhood of zero. While remaining computationally efficient in posterior simulations, such a shrinkage prior can achieve a nearly optimal posterior contraction rate and the same variable selection consistency as the spike-and-slab prior. Our numerical results show that, when posterior consistency holds, Bayesian methods can yield much better variable selection results than regularization methods such as LASSO and SCAD. This paper also establishes a Bernstein-von Mises (BvM)-type result, which leads to a convenient way to quantify the uncertainty of regression coefficient estimates.
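For intuition, a prior in this class can be sketched as follows; the display below is an illustrative example under assumed notation (the scale sequence \tau_n and the mass condition are schematic, not the paper's exact conditions). A Student-t prior with degrees of freedom \nu and a vanishing scale \tau_n places almost all of its mass in a shrinking neighborhood of zero, playing the role of the "spike", while its polynomially decaying tail stays heavy and flat, playing the role of the "slab":

\[
  \pi(\beta_j) \;=\; \frac{\Gamma\!\bigl(\tfrac{\nu+1}{2}\bigr)}
       {\Gamma\!\bigl(\tfrac{\nu}{2}\bigr)\sqrt{\nu\pi}\,\tau_n}
  \left(1 + \frac{\beta_j^2}{\nu\,\tau_n^2}\right)^{-\frac{\nu+1}{2}},
  \qquad \tau_n \to 0 ,
\]

so that \pi(|\beta_j| \le a_n) \to 1 for a suitable sequence a_n \to 0, while \pi(\beta_j) decays only polynomially in |\beta_j|. The heavy tail is what keeps large coefficients from being over-shrunk, and the concentration near zero is what suppresses the noise coordinates.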
Acknowledgements
Qifan Song was supported by National Science Foundation of USA (Grant No. DMS-1811812). Faming Liang was supported by National Science Foundation of USA (Grant No. DMS-2015498) and National Institutes of Health of USA (Grant Nos. R01GM117597 and R01GM126089). The authors thank the referees for their constructive comments on the paper.
Cite this article
Song, Q., Liang, F. Nearly optimal Bayesian shrinkage for high-dimensional regression. Sci. China Math. 66, 409–442 (2023). https://doi.org/10.1007/s11425-020-1912-6
Keywords
- Bayesian variable selection
- absolutely continuous shrinkage prior
- heavy tail
- posterior consistency
- high-dimensional inference