Abstract
Much of a cell's activity is organized as a network of interacting modules: sets of genes coregulated to respond to different conditions. We present a probabilistic method for identifying regulatory modules from gene expression data. Our procedure identifies modules of coregulated genes, their regulators and the conditions under which regulation occurs, generating testable hypotheses in the form 'regulator X regulates module Y under conditions W'. We applied the method to a Saccharomyces cerevisiae expression data set, showing its ability to identify functionally coherent modules and their correct regulators. We present microarray experiments supporting three novel predictions, suggesting regulatory roles for previously uncharacterized proteins.
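The abstract's hypothesis form, 'regulator X regulates module Y under conditions W', can be illustrated with a toy scoring step. What follows is a minimal sketch and not the authors' procedure. It assumes that a module's expression is summarized by its per-condition mean. Each candidate regulator is then scored by how much a single threshold split on that regulator's expression raises the Gaussian log-likelihood of the module profile. The function names, the `min_leaf` parameter, the variance floor and the regulators `A`/`B` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_loglik(values):
    """Maximum-likelihood Gaussian log-likelihood of a list of expression values."""
    n = len(values)
    mu = sum(values) / n
    ss = sum((v - mu) ** 2 for v in values)
    var = max(ss / n, 1e-6)  # hypothetical variance floor: avoids log(0) on constant data
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def module_profile(module_genes):
    """Assumption: summarize a module by averaging its genes in each condition."""
    n_conditions = len(module_genes[0])
    return [sum(g[c] for g in module_genes) / len(module_genes)
            for c in range(n_conditions)]


def best_regulator_split(module_genes, regulators, min_leaf=2):
    """Find the single regulator/threshold split that best explains the module.

    Returns (regulator, threshold, log-likelihood gain, conditions where the
    regulator is above the threshold), or None if no split leaves at least
    min_leaf conditions on each side.
    """
    profile = module_profile(module_genes)
    base = gaussian_loglik(profile)
    best = None
    for name, reg_expr in regulators.items():
        for t in sorted(set(reg_expr)):
            high = [c for c, x in enumerate(reg_expr) if x > t]
            low = [c for c, x in enumerate(reg_expr) if x <= t]
            if len(high) < min_leaf or len(low) < min_leaf:
                continue
            # Gain of modeling the two condition groups with separate Gaussians.
            gain = (gaussian_loglik([profile[c] for c in low])
                    + gaussian_loglik([profile[c] for c in high]) - base)
            if best is None or gain > best[2]:
                best = (name, t, gain, high)
    return best


# Toy data: 6 conditions, 2 module genes, 2 hypothetical candidate regulators.
regulators = {"A": [0.1, 0.2, 0.3, 2.0, 2.1, 2.2], "B": [1, 0, 1, 0, 1, 0]}
module = [[-1.0, -1.1, -0.9, 1.0, 1.1, 0.9],
          [-1.2, -0.9, -0.9, 1.2, 0.9, 0.9]]
name, threshold, gain, conditions = best_regulator_split(module, regulators)
print(name, threshold, conditions)  # A 0.3 [3, 4, 5]
```

The returned high-expression conditions (here 3, 4 and 5) play the role of 'conditions W'. A full method would also need to reassign genes to modules and build richer, multi-level regulation programs. This single split only demonstrates the scoring idea.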
Acknowledgements
We thank L. Garwin, M. Scott, G. Simchen and L. Stryer for their useful comments on earlier versions of this manuscript, and A. Kaushal, T. Pham, A. Tanay and R. Yelensky for technical help with software and visualization. E.S., D.K. and N.F. were supported by a National Science Foundation grant under the Information Technology Research program. E.S. was also supported by a Stanford Graduate Fellowship. M.S. was supported by the Stanford University School of Medicine Dean's Fellowship. A.R. was supported by the Colton Foundation. D.P. was supported by an Eshkol Fellowship. N.F. was also supported by an Alon Fellowship, by the Harry & Abe Sherman Senior Lectureship in Computer Science and by the Israeli Ministry of Science.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Segal, E., Shapira, M., Regev, A. et al. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34, 166–176 (2003). https://doi.org/10.1038/ng1165
DOI: https://doi.org/10.1038/ng1165