The document discusses the need for increased genomic sampling of bacterial and archaeal diversity based on the tree of life. It summarizes the Genomic Encyclopedia of Bacteria and Archaea (GEBA) pilot project, which selected 200 organisms from diverse phylogenetic lineages for genome sequencing. The results showed that sequencing genomes from underrepresented lineages led to novel gene, protein, and structural discoveries that improved genome annotation and metagenomic analysis. The document argues for expanding GEBA-style systematic sequencing to better represent microbial diversity.
Eisen.Geba.Jgi2009b
1. GEBA: A Genomic Encyclopedia of Bacteria and Archaea. Jonathan A. Eisen, JGI User Meeting 2009
2. “Nothing in biology makes sense except in the light of evolution.” T. Dobzhansky (1973)
7. At least 40 phyla of bacteria (as of 2002; based on Hugenholtz, 2002). Tree labels: Acidobacteria, Bacteroides, Fibrobacteres, Gemmimonas, Verrucomicrobia, Planctomycetes, Chloroflexi, Proteobacteria, Chlorobi, Firmicutes, Fusobacteria, Actinobacteria, Cyanobacteria, Chlamydia, Spirochaetes, Deinococcus-Thermus, Aquificae, Thermotogae, TM6, OS-K, Termite Group, OP8, Marine Group A, WS3, OP9, NKB19, OP3, OP10, TM7, OP1, OP11, Nitrospira, Synergistes, Deferribacteres, Thermodesulfobacteria, Chrysiogenetes, Thermomicrobia, Dictyoglomus, Coprothermobacter.
8. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. As of 2002; based on Hugenholtz, 2002. [Figure: same phylum tree and labels as slide 7.]
9. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. Some other phyla are only sparsely sampled. As of 2002; based on Hugenholtz, 2002. [Figure: same phylum tree and labels as slide 7.]
10. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. Some other phyla are only sparsely sampled. Same trend in Archaea. As of 2002; based on Hugenholtz, 2002. [Figure: same phylum tree and labels as slide 7.]
11. Need for Tree Guidance Well Established: Common approach within some eukaryotic groups. Many small projects funded to fill in some bacterial or archaeal gaps. Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature.
12. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. Some other phyla are only sparsely sampled. Solution I: sequence more phyla. NSF-funded Tree of Life Project: a genome from each of eight phyla (Eisen, Ward, Badger, Wu, Wu, et al.). [Figure: same phylum tree and labels as slide 7.]
13. Bacterial aTOL Project aims: Improve resolution of deep branches in the bacterial tree. Launch biological studies of these phyla and discover functional novelty. Leverage data for interpreting environmental surveys.
16. Within Phyla Diversity Immense: Each phylum represents billions of years of evolution. Some have hundreds of major lineages. New lineages are being discovered all the time. Most branches within most phyla have few or no genomes.
18. Additional Impetus for Tree Guided Projects: Suggestion to sequence all bacteria and archaea in Bergey’s Manual (Stevens et al.). Success in sequencing genomes from across the tree in animals. Multiple government reports suggest a more systematic approach to sequencing is needed.
19. At least 100 phyla of bacteria. Genome sequences are mostly from three phyla. Most phyla with cultured species are sparsely sampled. Lineages with no cultured taxa even more poorly sampled. Solution: use tree to really fill gaps. [Figure: same phylum tree and labels as slide 7; legend: well sampled phyla.]
21. GEBA Pilot Project Overview: Select 200 organisms using tree. Develop high throughput pipeline for strain growth and DNA preparation. Sequence and finish 100. Annotate, analyze, release data. Assess benefits of tree guided sequencing.
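The slide does not say how organisms were chosen "using tree". One common way to make such a choice concrete is to pick, one at a time, the candidate that adds the most new branch length to what is already sequenced. The sketch below is illustrative only: the toy tree, the names `parent`, `greedy_select`, and `path_to_root`, and the greedy rule itself are assumptions, not the GEBA procedure.

```python
def path_to_root(node, parent):
    """Yield each node on the path from `node` up to, but not including, the root."""
    while node in parent:
        yield node
        node = parent[node][0]


def greedy_select(parent, candidates, already_sequenced, k):
    """Pick up to k candidate leaves, each time taking the leaf that adds the
    most branch length not yet covered by sequenced or chosen leaves.

    parent maps node -> (parent_node, branch_length); the root has no entry.
    """
    covered = set()
    for leaf in already_sequenced:
        covered.update(path_to_root(leaf, parent))

    chosen = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        def gain(leaf):
            # Branch length on this leaf's path that is not yet covered.
            return sum(parent[n][1] for n in path_to_root(leaf, parent)
                       if n not in covered)

        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        chosen.append(best)
        covered.update(path_to_root(best, parent))
        remaining.remove(best)
    return chosen


# Hypothetical toy tree: clades X and Y plus a long isolated branch d.
toy_parent = {
    "X": ("root", 1.0), "Y": ("root", 1.0),
    "a": ("X", 0.1), "b": ("X", 0.2),
    "c": ("Y", 0.5), "d": ("root", 3.0),
}
picks = greedy_select(toy_parent, candidates=["b", "c", "d"],
                      already_sequenced=["a"], k=2)
print(picks)
```

In this toy case the close relative of the already sequenced genome (`b`) is passed over in favor of the two leaves on unsampled branches, which is the intuition behind tree-guided selection.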
31. GEBA Biggest Challenge: Getting DNA. Getting quality DNA is biggest bottleneck. Solution: Beg, Borrow and Steal. DSMZ offered to do for free. ATCC is doing a small number for a fee. In discussions with PCC and other collections.
33. Microorganisms. Quantification gel of the genomic DNA isolated from Conexibacter woesei (DSM 14684T). Conexibacter woesei (DSM 14684T) was taken from the German Collection of Microorganisms and Cell Cultures (DSMZ). The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262). The genomic DNA was 10-250 kb in size as determined by Pulsed Field Gel Electrophoresis (PFGE). The bulk of DNA had a size of 50-250 kb (see attached PFGE image). The DNA concentration is 500 ng/µl as estimated from the gel. Spectrophotometric measurements yielded a DNA concentration of 450 µg/ml; 300 µl of genomic DNA are shipped (150 µg). Gel lanes: Lane 1: c(λ-Marker) = 15 ng; Lane 2: c(λ-Marker) = 30 ng; Lane 3: c(λ-Marker) = 50 ng; Lane 4: DNA Molecular Weight Marker II (Roche 236250); Lane 5: DSM 13279, Collinsella stercoris; Lane 6: DSM 43043, Intrasporangium calvum; Lane 7: DSM 18053, Dyadobacter fermentans; Lane 8: DSM 20476, Slackia heliotrinireducens; Lane 9: DSM 18081, Patulibacter minatonensis; Lane 10: DSM 14684, Conexibacter woesei; Lane 11: DSM 11002, Dethiosulfovibrio peptidovorans; Lane 12: DSM 11551, Halogeometricum borinquense; Lane 13: DNA Molecular Weight Marker II (Roche 236250); Lane 14: c(λ-Marker) = 125 ng; Lane 15: c(λ-Marker) = 250 ng; Lane 16: c(λ-Marker) = 500 ng.
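As a quick check on the shipped amount, mass is concentration times volume. The helper name `shipped_mass_ug` is made up for this illustration. Note that 450 µg/ml is the same as 450 ng/µl.

```python
def shipped_mass_ug(concentration_ng_per_ul, volume_ul):
    """Return DNA mass in micrograms for a concentration in ng/µl and a volume in µl."""
    return concentration_ng_per_ul * volume_ul / 1000.0


gel_estimate_ug = shipped_mass_ug(500, 300)      # gel-based concentration
spectro_estimate_ug = shipped_mass_ug(450, 300)  # spectrophotometric concentration
print(gel_estimate_ug, spectro_estimate_ug)
```

The quoted 150 µg corresponds to the gel-based estimate; the spectrophotometric reading would give 135 µg for the same 300 µl.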
35. Current Status: >100 in progress. GEBA 56 (focus of first paper). 34 finished genomes. 55 submitted to GenBank. Released to IMG-GEBA page and JGI-FTP site. All data is completely open for anyone to use.
38. GEBA Pilot IV: Assess Benefits of GEBA56. All genomes have some value. But what, if any, is the benefit of tree-guided sequencing over other selection methods?
39. Why Increase Taxonomic Coverage II? Gene discovery. Annotation, functional prediction. Metagenomic analysis. Mechanisms of diversification. Species phylogeny and classification.
41. Value of diverse genomes I: Gene discovery. Premise: New genomes frequently contain genetic novelty. Phylogenetic diversity of a genome should be correlated to novelty. Caveat: Does lateral gene transfer wipe out contribution of phylogenetic diversity to novelty?
42. Protein Family Rarefaction Curves: Take data set of multiple complete genomes. Identify all protein families using MCL. Plot # of genomes vs. # of protein families.
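The three steps on this slide can be sketched as follows. The sketch assumes the clustering step (MCL on the slide) has already been run and that its result is available as a mapping from genome name to the set of protein family IDs found in that genome. The function name, the toy data, and the choice to average over random genome orderings are assumptions made for illustration, not details taken from the talk.

```python
import random
from statistics import mean


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding 1..N genomes.

    genome_families maps genome name -> set of protein family IDs.
    The genome order is shuffled n_permutations times and the cumulative
    family count at each position is averaged over the shuffles.
    """
    rng = random.Random(seed)
    genomes = list(genome_families)
    counts_by_position = [[] for _ in genomes]
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for position, genome in enumerate(genomes):
            seen |= genome_families[genome]
            counts_by_position[position].append(len(seen))
    return [mean(counts) for counts in counts_by_position]


# Hypothetical toy input standing in for real clustering output.
toy_families = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam2", "fam3", "fam4"},
    "genome_C": {"fam5", "fam6"},
}
curve = rarefaction_curve(toy_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting `n_genomes` against `n_families` gives the rarefaction curve. A curve that keeps climbing steeply means each added genome still contributes many new families, while a curve that flattens means the sampled group is close to saturated.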
44. [Figure: number of proteins (total gene number) plotted against genome number. Series: S. agalactiae, Enterobacteriaceae, Actinobacteria, Bacteria from GEBA project.]
45. Novelty 2 - Structural Novelty: Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu). Structural modeling suggests many are structurally novel too (D'haeseleer). 372 being crystallized by the PSI (Kerfeld).
52. Value of 100 diverse genomes II: Annotation. Premise: Increased phylogenetic coverage should improve our ability to annotate genes in other (e.g., reference/model) genomes.
53. Annotation Improves: Conversion of hypotheticals into conserved hypotheticals. Linking distantly related members of protein families. Non-homology functional prediction methods.
57. Value of 100 diverse genomes III: Metagenomics. Premise: Increased sampling of diverse genomes should improve many aspects of metagenomic analysis. To test: Annotation, Binning.
70. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. Most phyla with cultured species are sparsely sampled. Lineages with no cultured taxa even more poorly sampled. [Figure: same phylum tree and labels as slide 7; legend: well sampled phyla, poorly sampled, no cultured taxa.]
71. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. Some other phyla are only sparsely sampled. Same trend in Viruses. As of 2002; based on Hugenholtz, 2002. [Figure: same phylum tree and labels as slide 7.]
72. At least 40 phyla of bacteria. Genome sequences are mostly from three phyla. Some other phyla are only sparsely sampled. Same trend in Microbial Eukaryotes. As of 2002; based on Hugenholtz, 2002. [Figure: same phylum tree and labels as slide 7.]
73. Tree based on Hugenholtz (2002) with some modifications. Need experimental studies from across the tree too. [Figure: same phylum tree and labels as slide 7; scale bar 0.1.]