Abstract
Development requires the establishment of precise patterns of gene expression, which are primarily controlled by transcription factors binding to cis-regulatory modules. Although transcription factor occupancy can now be identified at genome-wide scales, decoding this regulatory landscape remains a daunting challenge. Here we used a novel approach to predict spatio-temporal cis-regulatory activity based only on in vivo transcription factor binding and enhancer activity data. We generated a high-resolution atlas of cis-regulatory modules describing their temporal and combinatorial occupancy during Drosophila mesoderm development. The binding profiles of cis-regulatory modules with characterized expression were used to train support vector machines to predict five spatio-temporal expression patterns. In vivo transgenic reporter assays demonstrate the high accuracy of these predictions and reveal an unanticipated plasticity in transcription factor binding leading to similar expression. This data-driven approach does not require previous knowledge of transcription factor sequence affinity, function or expression, making it widely applicable.
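As a rough illustration of the classification step described above, the sketch below trains one linear support vector machine per expression class on a toy matrix of binding scores, then assigns a new cis-regulatory module to the class with the highest decision value. It is not the authors' implementation. The feature names, binding values, class labels, hyperparameters and the plain subgradient-descent solver are all assumptions made for this example, and only two of the five classes are represented.

```python
# Toy sketch only: one-vs-rest linear SVMs on invented binding profiles.
# Feature names, scores, labels and hyperparameters are all hypothetical.

# Each column is a hypothetical (transcription factor, time window) pair.
FEATURES = ["TF1_early", "TF2_late", "TF3_late"]

# Rows are CRMs with invented binding scores; labels are placeholder classes.
TRAIN_X = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.0, 0.2],
    [0.0, 1.0, 0.9],
    [0.1, 0.8, 1.0],
    [0.0, 0.9, 0.8],
]
TRAIN_Y = ["pattern_A", "pattern_A", "pattern_A",
           "pattern_B", "pattern_B", "pattern_B"]


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on L2-regularised hinge loss; y in {-1, +1}."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge term is active
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per class (that class versus all others)."""
    models = {}
    for cls in sorted(set(labels)):
        y = [1 if lab == cls else -1 for lab in labels]
        models[cls] = train_linear_svm(X, y)
    return models


def predict(models, x):
    """Return the class whose SVM gives the largest decision value."""
    return max(models, key=lambda cls: dot(models[cls][0], x) + models[cls][1])


models = train_one_vs_rest(TRAIN_X, TRAIN_Y)
print(predict(models, [0.95, 0.05, 0.00]))
print(predict(models, [0.05, 0.90, 0.90]))
```

In real use, an established SVM library, a kernel chosen by cross-validation and measured binding data would replace the hand-rolled solver and invented values shown here.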
Accession codes
Primary accessions
ArrayExpress
Data deposits
All ChIP data are available in ArrayExpress under accession numbers E-TABM-648, E-TABM-649, E-TABM-650, E-TABM-651 and E-TABM-652, and the array design under A-AFFY-53. The CRM coordinates and transcription factor occupancy are available at http://furlonglab.embl.de/.
Acknowledgements
We are grateful to M. Leptin for providing an independent assessment of the expression patterns driven by tested CRMs. We thank H. Gustafson for fly work, J. de Graaf for array hybridizations, S. Müller for embryo injections, and R. Bourgon for sharing code on signal peak identification. We thank all members of the Furlong laboratory for discussions and comments on the manuscript. This work was supported by a grant to E.E.M.F. and by a fellowship to R.P.Z. from the Human Frontiers Science Program.
Author Contributions
M.B. performed ChIP experiments. R.P.Z., E.E.M.F. and C.G. generated CAD. R.P.Z. performed transgenic reporter experiments including in situ hybridizations and imaging. C.G. performed ChIP data analysis and motif analysis. J.G. devised the statistical and SVM analyses. E.E.M.F., R.P.Z., C.G. and J.G. formulated the hypotheses, designed experiments and wrote the manuscript.
Supplementary information
Supplementary Information
This file contains Supplementary Methods, Supplementary Tables 1-3 and 12 (for Supplementary Tables 4-11 see separate files s2-s9), Supplementary Figures 1-15 with Legends and Supplementary References. (PDF 14712 kb)
Supplementary Table 4
This file contains CAD entries in tabular text format. The source, name and coordinates of each CAD entry are given with anatomy ontology terms and cross-references to FlyBase, PubMed, REDfly and FLDb (Furlong Db). An 'NR' key next to the source name indicates that the entry was modified during the CAD building process; in these cases, references to the original entries are available in the cross-references (using REDfly and FLDb references). The file also contains embedded formatting comments. The associated CAD archive contains the various CAD input files as well as CAD in GFF format. (TXT 103 kb)
Supplementary Table 5
This file contains the CRM Atlas in tabular format. The file provides the ID, location and binding events for each CRM Atlas entry. (TXT 482 kb)
Supplementary Table 6
This file contains the regions reported by TileMap before cut-off selection. (TXT 7430 kb)
Supplementary Table 7
This file contains the TileMap regions used to build the CRM Atlas, together with peak position and height. (TXT 1122 kb)
Supplementary Table 8
This file contains the training set for the support vector machine. (TXT 29 kb)
Supplementary Table 9
This file contains the support vector machine predictions. (TXT 1922 kb)
Supplementary Table 10
This file contains the initial and optimized position weight matrices. (TXT 1 kb)
Supplementary Table 11
This file contains the cloned CRMs. (TXT 2 kb)
Supplementary Data
This file contains Supplementary File 1, which was added on 25 Mar 2010. (ZIP 9 kb)
Cite this article
Zinzen, R., Girardot, C., Gagneur, J. et al. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70 (2009). https://doi.org/10.1038/nature08531
DOI: https://doi.org/10.1038/nature08531