Abstract
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
Acknowledgements
We thank all contributors to the Bioconductor and R projects. Bioconductor is supported by the National Human Genome Research Institute of the US National Institutes of Health (U41HG004059 to M.M.). Additional support is from the US National Science Foundation (1247813 to M.M.) and the European Commission FP7 project RADIANT (to W.H.). A. Bruce provided graphics support for Figure 2.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Huber, W., Carey, V., Gentleman, R. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115–121 (2015). https://doi.org/10.1038/nmeth.3252
DOI: https://doi.org/10.1038/nmeth.3252