A Unified Bayesian Framework for Bi-overlapping-Clustering Multi-omics Data via Sparse Matrix Factorization

Statistics in Biosciences

Abstract

Advances in modern sequencing techniques have generated an unprecedented amount of multi-omics data, which provide great opportunities to quantitatively explore functional genomes from different but complementary perspectives. However, distinct modalities and sequencing technologies generate diverse types of data, which greatly complicates statistical modeling because each data type requires its own optimized method. In this paper, we propose a unified Bayesian nonparametric matrix factorization framework that infers overlapping bi-clusters for multi-omics data. The proposed method adaptively discretizes different types of observations into common latent states, on which cluster structures are built hierarchically, and, being nonparametric, it automatically determines the number of clusters. We demonstrate the utility of the proposed method using simulation studies and applications to four datasets: a single-cell RNA-sequencing dataset, a combined single-cell RNA-sequencing and single-cell ATAC-sequencing dataset, a bulk RNA-sequencing dataset, and a DNA methylation dataset. These applications reveal several findings that are consistent with the biological literature.
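
To make the phrase "overlapping bi-clusters" concrete, the following toy sketch may help. It is not the authors' model or code: it only illustrates, under simplifying assumptions, how two binary membership matrices (one for samples, one for features) define bi-clusters that are allowed to overlap. The names `A`, `B`, `coverage_counts`, and the toy dimensions are hypothetical and chosen for illustration only.

```python
import numpy as np

def coverage_counts(row_membership, col_membership):
    """Count how many bi-clusters cover each (sample, feature) cell.

    row_membership: (n, K) binary matrix, entry 1 if a sample is in bi-cluster k.
    col_membership: (p, K) binary matrix, entry 1 if a feature is in bi-cluster k.
    Both are illustrative inputs, not quantities estimated by any model here.
    """
    return row_membership @ col_membership.T

# Toy example: 4 samples, 5 features, K = 2 bi-clusters (assumed values).
A = np.array([[1, 0],
              [1, 1],   # sample 1 belongs to both bi-clusters
              [0, 1],
              [0, 0]])  # sample 3 belongs to none
B = np.array([[1, 0],
              [1, 0],
              [1, 1],   # feature 2 belongs to both bi-clusters
              [0, 1],
              [0, 0]])  # feature 4 belongs to none

counts = coverage_counts(A, B)
covered = counts > 0   # cells inside at least one bi-cluster
overlap = counts > 1   # cells where bi-clusters overlap

print(counts)
```

In this sketch a cell with a count above 1 lies in the overlap of two bi-clusters, and rows or columns of zeros correspond to samples or features assigned to no bi-cluster. The abstract's method additionally infers the number of bi-clusters and maps observations to latent states, neither of which this toy example attempts.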

Acknowledgements

Yang Ni is partially supported by the National Science Foundation, NSF DMS-1918851 and NSF DMS-2112943. Robert S. Chapkin is partially supported by the Allen Endowed Chair in Nutrition & Chronic Disease Prevention, and the National Institutes of Health (Grant Nos. R01-ES025713, R01-CA202697, R35-CA197707, and T32-CA090301). Kejun He is partially supported by the National Natural Science Foundation of China under Grant 11801560.

Author information

Corresponding authors

Correspondence to Kejun He or Yang Ni.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 236 KB)

Cite this article

Zhou, F., He, K., Cai, J.J. et al. A Unified Bayesian Framework for Bi-overlapping-Clustering Multi-omics Data via Sparse Matrix Factorization. Stat Biosci 15, 669–691 (2023). https://doi.org/10.1007/s12561-022-09350-w

  • DOI: https://doi.org/10.1007/s12561-022-09350-w
