Nearly optimal Bayesian shrinkage for high-dimensional regression

Abstract

During the past decade, shrinkage priors have received much attention in the Bayesian analysis of high-dimensional data. This paper establishes posterior consistency for high-dimensional linear regression with a class of shrinkage priors that have a heavy, flat tail and allocate a sufficiently large probability mass to a very small neighborhood of zero. While remaining efficient in posterior simulation, such a shrinkage prior can achieve a nearly optimal posterior contraction rate and the same variable selection consistency as the spike-and-slab prior. Our numerical results show that, under posterior consistency, Bayesian methods can yield much better variable selection results than regularization methods such as the LASSO and SCAD. This paper also establishes a Bernstein-von Mises (BvM)-type result, which leads to a convenient way of quantifying the uncertainty of regression coefficient estimates.
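To make the two defining properties of such a prior concrete, the display below is a schematic of the kind of conditions typically imposed in this literature; the notation (the density \(g\), the threshold \(a_n\), the signal range \(E_n\), and the constants \(u\) and \(c\)) is illustrative rather than the paper's exact statement. The coefficients receive an independent product prior \(\pi(\boldsymbol{\beta}) = \prod_{j=1}^{p} g(\beta_j)\), where \(g\) (i) places all but a polynomially small fraction of its mass in a tiny interval around zero and (ii) stays bounded below, i.e., flat, over the range of plausible signal values:

\[
\int_{|x| > a_n} g(x)\,dx \;\le\; p^{-(1+u)} \ \text{ for some } u > 0,
\qquad
\inf_{|x| \le E_n} g(x) \;\ge\; p^{-c} \ \text{ for some } c > 0.
\]

Condition (i) mimics the spike of a spike-and-slab prior and suppresses the \(p - s\) truly zero coefficients, while condition (ii) plays the role of the slab and avoids over-shrinking the \(s\) nonzero ones. Under conditions of this type, the posterior contracts around an \(s\)-sparse truth at the rate

\[
\epsilon_n \;\asymp\; \sqrt{\frac{s \log p}{n}},
\]

which is nearly minimax optimal (the exact minimax rate replaces \(\log p\) by \(\log(p/s)\)).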



Acknowledgements

Qifan Song was supported by National Science Foundation of USA (Grant No. DMS-1811812). Faming Liang was supported by National Science Foundation of USA (Grant No. DMS-2015498) and National Institutes of Health of USA (Grant Nos. R01GM117597 and R01GM126089). The authors thank the referees for their constructive comments on the paper.

Author information

Corresponding author

Correspondence to Qifan Song.

About this article

Cite this article

Song, Q., Liang, F. Nearly optimal Bayesian shrinkage for high-dimensional regression. Sci. China Math. 66, 409–442 (2023). https://doi.org/10.1007/s11425-020-1912-6
