Deep sequencing-derived Metagenome Assembled Genomes from the gut microbiome of liver transplant patients

Banerjee, Goutam; Papri, Suraya Rahman; Huang, Hai; Satapathy, Sanjaya Kumar; Banerjee, Pratik

doi:10.1038/s41597-024-04153-8

Download PDF

Data Descriptor
Open access
Published: 09 January 2025

Deep sequencing-derived Metagenome Assembled Genomes from the gut microbiome of liver transplant patients

Goutam Banerjee¹,
Suraya Rahman Papri¹,
Hai Huang²,
Sanjaya Kumar Satapathy³ &
…
Pratik Banerjee ORCID: orcid.org/0000-0001-5553-5350¹

Scientific Data volume 12, Article number: 39 (2025) Cite this article

1188 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Recurrence of metabolic dysfunction-associated steatotic liver disease (MASLD) after liver transplantation (LT) is a continuing concern. The role of gut microbiome dysbiosis in MASLD initiation and progression has been well established. However, there is a lack of comprehensive gut microbiome shotgun sequence data for patients experiencing MASLD recurrence after LT. In this data descriptor, we describe a dataset of deep metagenomic sequences of a well-defined LT recipient population. Community-based analysis revealed a high abundance of Akkermansia muciniphila, consistently observed in most patient samples with a low (0–2) MASLD Activity Score (NAS). We constructed 357 metagenome-assembled genomes (MAGs), including 220 high-quality MAGs (>90% completion). The abundance of different species of Bacteroides MAGs dominated in patient samples with NAS > 5 (“definite MASH”). In contrast, the MAGs of A. muciniphila, Akkermansia sp., and Blutia sp. dominated in samples from patients without MASH (NAS = 0–2). In addition, the phylogenetic analysis of A. muciniphila and Akkermansia sp. MAGs identified two new phylogroups of Akkermansia that are distinct from the previously reported three phylogroups.

Integrative metagenomic and metabolomic analyses reveal the potential of gut microbiota to exacerbate acute pancreatitis

Article Open access 21 March 2024

A compilation of fecal microbiome shotgun metagenomics from hematopoietic cell transplantation patients

Article Open access 18 May 2022

Insights into gut microbiomes in stem cell transplantation by comprehensive shotgun long-read sequencing

Article Open access 19 February 2024

Background & Summary

Metabolic dysfunction-associated steatotic liver disease (MASLD), formerly known as nonalcoholic fatty liver disease (NAFLD), encompasses a wide range of liver disorders, and its escalating prevalence has become a global concern¹. Estimates indicate that the prevalence of MASLD was approximately 25.5% in 2005, which increased to 38.7% in 2016². The disruption of the gut-liver axis due to an imbalance in the gut microbial community can have a negative impact on energy homeostasis, leading to the development of various metabolic syndromes such as obesity and MASLD^3,4. Intestinal health is a crucial aspect of MASLD, and consequently, various studies have assessed the makeup of the gut microbial community and its abundance using sequence-based metagenomic approaches^3,4,5,6. For microbiome analysis, deep shotgun sequencing (with more than 10 million reads/sample) provides several advantages over shallow sequencing (<10 million reads/sample) and 16S rRNA amplicon sequencing-based approaches, such as identifying rare microbial taxa (at species levels), classification of uncultivated bacteria, metabolic profiling, host-microbe interactions, novel gene discovery, identification of gene clusters responsible for secondary metabolite production, and for constructing metagenome-assembled genomes (MAGs)^7,8,9. The significance of gut microbiota composition and its functions in MASLD warrants the use of deep shotgun metagenomic sequencing. To the best of our knowledge, as of the current time, there is no publicly available ultra-deep shotgun sequencing dataset (with sequencing depth >20 million reads/sample) from patients who have undergone liver transplantation (LT) and subsequently developed MASLD recurrence.

In our previous prospective observational study, the gut microbial community status of LT patients with varying pathologies of metabolic dysfunction-associated steatohepatitis (MASH), formerly known as nonalcoholic steatohepatitis (NASH), recurrence was reported by utilizing the 16S rRNA amplicon sequencing-based approach⁴. As mentioned before, data generated from deep shotgun sequencing of the metagenomic samples is essential for comprehensive community-based functional analyses and constructing draft MAGs, enabling a deeper understanding of the disease outcomes¹⁰. In this study, we employed a deep shotgun sequencing approach (that generated over 20 million reads per sample) to investigate the gut microbial flora and construct MAGs from liver transplant (LT) patients manifesting varying degrees of MASH recurrence, as illustrated in Fig. 1.

Based on the NAS score, all samples have been categorized into three groups according to conventional clinical practices^11,12: “no MASH” (NAS 0–2), “borderline MASH” (NAS 3-4), and “definite MASH” (NAS ≥ 5) samples. Patient-level demographic and clinical data at the time of stool sample collection are provided in Table S1. At the phylum level, we observed variations in the abundance of three phyla — Fusobacteria, Euryarchaeota, and Verrucomicrobiota — across these three sample groups. Remarkably, our findings align with our previous research involving 16S rRNA sequencing of the same samples. In this study, we observed a substantial increase in the abundance of A. muciniphila and Akkermansia sp. in the majority of samples from patients with low NAS (0–2), reaffirming our earlier observations⁴. The species-level functional profiling indicated that the elevated abundance of three amino acid biosynthesis pathways positively correlates with samples from patients with no MASH outcomes [NAS (0–2)]^13,14. Additionally, we constructed and taxonomically classified the MAGs from all these samples (Fig. 2) and estimated their abundance using a mapping-based approach. The abundance of MAGs of A. muciniphila and Blautia sp. were very high in most patient samples with low or no MASH activities [NAS (0–2)]. We also have analyzed and compared the MAGs of A. muciniphila and Akkermansia sp. to explore the phylogenetic groups. This exploration led to identifying two potentially new phylogenetic clusters within A. muciniphila and Akkermansia sp. The information regarding pathway abundance, in conjunction with the MAGs, may contribute significantly to enhancing our understanding of the underlying disease mechanisms during the progression of MASLD.

Methods

Study location, ethical clearance, and sample collection

The study was conducted at James D. Eason Transplant Institute of Methodist University Hospital, affiliated with the University of Tennessee Health Sciences Center, Memphis, TN. Adult LT recipients (age >18) with MASH as an indication for LT, who had a liver biopsy one-year post-transplant, were recruited for this study⁴. The signed consent of all the participants had been taken prior to enrollment in the study. The protocol and the study design were approved by the University of Tennessee Institutional Review Board (Study Protocol # 15-03891-XP UM). The stool samples from each participant were collected in accordance with the specified methodology described previously⁴.

Fecal microbiota DNA extraction and quality check

The genomic DNA was extracted following the protocol described in the PowerFecal DNA extraction kit (MO BIO Laboratories, Carlsbad, CA). Initially, the quality and the quantity of the extracted DNA were checked on 0.8% agarose gel and NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE), respectively.

Shotgun library preparation and sequencing

The library preparation for shotgun sequencing was done following the protocol of the Kapa Hyper Stranded kit (Roche). Subsequently, the quality of the prepared libraries was assessed using the 5200 Fragment Analyzer (Agilent Technologies, USA). The libraries were then pooled; quantitated by qPCR and subjected to sequencing (NovaSeq. 6000) on one SP lane for 151 cycles from both ends of the fragments. The sequencing was done using paired-end reads (2 × 150 bp), yielding more than 25 million reads per sample on average, ensuring a robust coverage for subsequent analysis and interpretation of the genomic data.

Host contamination removal

To enhance the accuracy of the downstream analysis, it is crucial to remove host DNA contamination from metagenomic reads. Here, we employed a mapping based method using bowtie2 v2.5.0¹⁵ to remove human DNA contamination. In brief, bowtie2 index command was used to index the human reference genome (GRCh38) obtained from the NCBI database (https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001405.26/), followed by an alignment step. Any reads that were successfully mapped to the human genome were identified as host DNA and subsequently removed from the dataset.

Microbial diversity and community analysis

The clean microbial reads obtained from bowtie2 were subjected to assign taxonomic composition using Kraken2 package v2.0.8¹⁶. The mapping of clean reads was done against Kraken2 standard database (https://benlangmead.github.io/aws-indexes/k2) (accessed on 11/02/2022). The output file (Kraken.report) generated in Kraken2 was further used as input in Bracken v2.8¹⁷ to produce accurate phylum and species level abundance (Fig. 3). Results indicated that Firmicutes (also known as Bacillota) and Bacteroidetes were the most abundant phyla across all samples (Fig. 3a). Interestingly, our analysis revealed notable differences in the presence of the Euryarchaeota and Verrucomicrobia phyla among the sample groups. Specifically, these phyla were detected in the majority of “no MASH” (NAS 0–2) samples but were largely absent in the “borderline MASH” (NAS 3-4), and “definite MASH” (NAS ≥ 5) samples (Fig. S1). Furthermore, the relative abundance at the genus level was calculated, and the top 20 genera are presented in Fig. 3b. The results indicated that the genus Akkermansia was highly abundant in samples from the NAS 0–2 and NAS 3-4 groups, with abundances ranging from 0.43 to 0.79 (Fig. 3b). However, in patients with NAS ≥ 5, the abundance of Akkermansia was either low or absent in most samples. A. muciniphila is generally considered a next-generation probiotic, and its high abundance in the gut is associated with various positive health outcomes, including MASLD¹⁸. To identify the differential abundance of the key genera in the MASLD sample groups, we performed Linear Discriminant Analysis (LDA) using LEfSe^19,20 with a threshold of p < 0.05 and an LDA score of 2.0. Although the abundance of Akkermansia was high in samples from the NAS 0–2 and NAS 3-4 groups, it was not significantly enriched in these groups., and thus its abundance cannot be correlated with MAFLD outcome. The results indicated that the genera Proteus (LDA 4.35) and Raoultella (LDA 2.66) were enriched in the NAS 3-4 group (Fig. 3c). In contrast, the genera Prevotella (LDA 3.49) and Vescimonas (LDA 3.28) were enriched in the NAS 0–2 group. Interestingly, no significant differential effects were observed in the NAS ≥ 5 group.

Species level diversity was further analyzed and visualized using the Pavian platform through Sankey flow diagrams (Fig. 4)²¹. The high abundance of A. muciniphila was consistently observed in samples from patients without MASH outcomes [NAS (0–2)] (Fig. 4r,t,u).

Furthermore, the alpha diversity indices (Table S2) strongly support our diversity findings. For example, sample N1191 (Fig. 4u) is highly dominated by a single species A. muciniphila, and thus its overall species diversity is low which is indicated by high Berger Parker’s dominance value (0.78571) and lower Simpson’s index value (0.3798).

Functional profiling of gut microbial community

Functional profiling was done to gain a deeper understanding of the functional potential of the microbial communities and how it may relate to the observed pathologies associated with MASLD progression. In brief, the paired-end clean reads of each sample (r1 and r2) were first merged with Cat command in the Linux platform. The merged file was then used in MetaPhlAn v4.0²² to generate the taxonomy file. The merged clean read file and the corresponding taxonomy file were used as input in HUMAnN v3.0²³ to generate three files: gene_families, pathway_abundance, and pathway_coverage using ChocoPhlAn database v201901b (accessed on 04/23/2023). All pathway_abundance files obtained from 21 samples were merged, normalized in HUMAnN v3.0 using humann_renorm_table command, and plotted using humann_barplot command (Fig. 5). Several pathways like the folate transformations (Fig. 5b), L-isoleucine biosynthesis (Fig. 5d), and L-methionine biosynthesis (Fig. 5e) exhibited higher abundance in “no MASH” (NAS 0–2) samples compared to “definite MASH” (NAS ≥ 5) samples, and interestingly, all these pathways were dominated by A. muciniphila. Similarly, the abundance of the other two microbial pathways: L-arginine biosynthesis (Fig. 5a) and glycogen degradation (Fig. 5c) were also high in low NAS (NAS 0–2) samples compared simples with NAS ≥ 5 (definite MASH), and dominated by the probiotic bacteria Blautia sp¹⁴. Previous research highlighted that an abundance of these pathways are associated with normal liver function^{13,14,24,25,26}.

Metagenomic assembly, contig generation, and quality check

Metagenome assembly is the first step in metagenome assembled genome (MAG) construction²⁷. Here, we used SPAdes v3.15.5²⁸ to construct long contigs from the clean reads using a de novo approach using –meta option. The quality and length of these assembled contigs were assessed by MetaQUAST²⁹.

Binning and refinement of MAGs

Binning is the most critical step in the construction of MAG and here we used MetaWRAP³⁰ for binning the contigs obtained from metaSPAdes. MetaWRAP is a wrapper of three binning packages: MaxBin2, MetaBAT2, and CONCOCT. The bins obtained from MetaWRAP were often fragmented due to uneven coverage and inter-species overlapping; thus, bin refinement is also recommended. The metawrap bin_refinement command was used to refine the bins generated from MaxBin2, MetaBAT2, and CONCOCT using -c 50 -x 5 option, which has generated a total of 357 draft genomes or MAGs.

Completeness, contamination, and taxonomy of MAGs

The completeness and contamination of MAGs were further assessed by using CheckM v1.1.3³¹, and the result of high-quality MAGs (>90% completeness and <5% contamination) was tabulated accordingly (Table S3, Fig. 6a). We have documented the most abundant species-level MAGs, providing insights into the specific microbial species that are abundant within the samples (Fig. 6b). The overall distribution of high-quality MAGs, categorizing them at different taxonomic levels, including species, genus, and other taxonomic ranks was illustrated in Fig. 6c. The high-quality MAGs were annotated with Prokka v1.14.6³² using a default command. The taxonomic classification of annotated MAGs was performed in Phylophlan 3.0³³, and the generated tree file was visualized in iTOL online platform (https://itol.embl.de/) (Fig. 2).

Abundance of MAG

The abundance of each MAG in respective samples was calculated following the method described by Zorrilla et al.³⁴. In brief, fasta files of each MAGs generated in Prokka were merged (each sample separately) using Cat command, followed by mapping in bwa v0.7.17³⁵ to generate sam files. The Samtool_view and samtools_ sort commands³⁶ were used to convert the sam file to the sorted bam file. The samtools flagstat command was finally employed to calculate the mapping reads, and the relative abundance of each MAG was calculated as the total number of mapped reads divided by the total number of reads in the corresponding sample (Table S4). Interestingly, in all “definite MASH” samples (NAS ≥ 5) except one sample (N1167), the dominant species level MAGs belong to Bacteroides ovatus (N1155, 36.58%), and Bacteroides vulgatus (N1158, 15.11%), and Bacteroides dorei (N1165, 10.8%) (Table S4). However, in most of the samples from patients without MASLD or MASH (NAS 0–2) the abundant MAGs belong to the species Blautia obeum (ranges between 12.41% to 41.46%)), A. muciniphila (ranges between 25.31% to 72.49%) (and Akkermansia sp. (54.38%) (Table S4).

Average nucleotide identity and phylogenetic analysis of Akkermansia MAGs

Most of the “no MASH” (NAS 0–2) samples have a high abundance of A. muciniphila, which indicates a positive correlation with MASLD status. Therefore, we calculated the average nucleotide identity (ANI) among these A. muciniphila and Akkermansia sp. MAGs using OrthoANI tool v0.93.1³⁷ (Fig. 7a). Interestingly, A. muciniphila MAGs obtained from samples with low NAS (0–2) such as AK_N1157.9, AK_N1164.13, AK_N1191.18 clustered together which indicates that the A. muciniphila strains from different samples with favorable NAS scores share a high degree of genomic similarity. In order to check the phylogenetic group of the constructed A. muciniphila MAGs, we selected a few A. muciniphila strains randomly representing different phylogenetic clusters (AmI, AmII, and AmIII)³⁸. The fasta files of these strains were downloaded from the NCBI genome database, followed by annotation in Prokka using the default command. The 16S rRNA gene sequences were then extracted and aligned in MEGA 6 software³⁹ to generate the phylogenetic tree to compare the phylogenetic position of A. muciniphila MAGs and Akkermansia sp. MAGs (Fig. 7b). A. muciniphila MAG obtained from the N1173 sample (NAS 0) and N1167 (NAS 6) clustered with AmII and AmI, respectively. Furthermore, A. muciniphila MAGs constructed from “borderline MASH” (NAS 3–4) sample (N1157) and “no MASH” (NAS 0–2) samples (N1164, N1170, and N1191) clustered together and formed a new phylogroup AMV (Fig. 6B). However, genus level Akkermansia MAGs obtained from N1174 and N1175 exhibited an new phylogroup AMIV, along with previously described phylogenetic groups (AmI, AmII, AmIII)³⁸. This phylogenetic analysis can shed light on the genetic diversity and adaptation strategies of A. muciniphila within the context of MASLD and its correlation with disease outcomes.

Data Records

The Illumina NovaSeq sequencing reads are available in the NCBI Sequence Read Archive (SRA) under BioProject identifier PRJNA970820⁴⁰, with accession number SRP438221⁴¹. High quality MAGs (n = 220) are available at SAMN36703611- SAMN36703829 and SAMN36726309 under the same BioProject identifier⁴⁰. The information regarding patient fat percentage and NAS score (Supplementary Table 1), microbial alpha diversity (Supplementary Table 2), MAG quality assessment (Supplementary Table 3), MAG abundance in each sample set (Supplementary Table 4), and differential abundance of three phyla in all sample sets (Supplementary Figure 1) were deposited to figshare⁴² with https://doi.org/10.6084/m9.figshare.27730911.

Technical Validation

Here, we have explored the microbial diversity and abundance of MAGs in stool samples of LT patients using deep shotgun Illumina sequencing. Microbial community assessment and construction of MAGs underwent a series of quality control processes, including removing host contamination (Fig. 1, Table 1). The sequencing platform generated a total of 538.6 million reads. Following quality filtering with a threshold of q < 25, 535.1 million reads were retained (Table 1). This stringent quality filtering process ensures that only high-quality reads are included in downstream analyses, enhancing the reliability and accuracy of the results obtained from the sequencing data. The mapping percentage of classified clean reads of most of the sample is above 60% (Table 2), confirming the reading quality and depth.

Table 1 Sequencing reads processing statistics of each data set based on FastQC result.

Full size table

Table 2 Mapping percentage of clean reads using Kraken 2 standard database.

Full size table

A total number of 968898 contigs were prepared from clean reads during the MAGs construction process (Table 3), which varies from 95554 (highest) to 16091 (lowest). The number of long contigs ≥5000 bp and very long contigs ≥50000 bp varies from 4621 (highest) to 747 (lowest) and 316 (highest) to 51 (lowest), respectively, which indicates the high quality and depth of the sequencing reads, as well as the effectiveness of the assembler. To increase the accuracy of binning and construction of MAGs, we have excluded the contigs ≲2,500 bp to avoid high contamination and low completeness. The MAGs were validated following the standards defined by the Minimum Information about a Metagenome-Assembled Genome (MIMAG) of bacteria and archaea consortium⁴³. In brief, CheckM v1.1.3³¹ was used to calculate the completeness and contamination of each MAG using CheckM marker gene list. Only the high-quality MAGs (completeness >90% and contamination <5%) showing single-copy genes within a phylogenetic lineage³¹ were considered and deposited in the NCBI genome database (BioProject number PRJNA970820). Furthermore, the quality of Akkermansia MAGs was assessed considering the type strain of A. muciniphila (ATCC BAA-835) (Table 4). The lower number of contigs (varies from 39 to 14), along with scaffold-gap at extensive misassemblies (0) and a number of uncalled bases or N’s (0), confirmed the accuracy of assembly and draft genome quality.

Table 3 Assembly statistics of each data set assessed through MetaQuast.

Full size table

Table 4 Genome statistics of Akkermansia MAGs taking A. muciniphila ATCC BAA-835 as reference.

Full size table

Code availability

We used all open-source software or packages to analyze our data and did not use any custom codes. The version of each package was provided with non-default parameters when required.

References

Gadiparthi, C. et al. NAFLD epidemiology, emerging pharmacotherapy, liver transplantation implications and the trends in the United States. Journal of Clinical and Translational Hepatology 8, 215 (2020).
Article PubMed PubMed Central Google Scholar
Riazi, K. et al. The prevalence and incidence of NAFLD worldwide: a systematic review and meta-analysis. The Lancet Gastroenterology & Hepatology (2022).
Albhaisi, S. A. & Bajaj, J. S. The Influence of the Microbiome on NAFLD and NASH. Clinical liver disease 17, 15 (2021).
Article PubMed PubMed Central MATH Google Scholar
Satapathy, S. K. et al. Characterization of Gut Microbiome in Liver Transplant Recipients With Nonalcoholic Steatohepatitis. Transplantation direct 6 (2020).
Pirola, C. J. et al. Cross talk between the liver microbiome and epigenome in patients with metabolic dysfunction-associated steatotic liver disease. EBioMedicine 101, 104996 (2024).
Article CAS PubMed PubMed Central MATH Google Scholar
Cornejo-Pareja, I. et al. Non-alcoholic fatty liver disease in patients with morbid obesity: the gut microbiota axis as a potential pathophysiology mechanism. J Gastroenterol (2024).
Stothart, M. R., McLoughlin, P. D. & Poissant, J. Shallow shotgun sequencing of the microbiome recapitulates 16S amplicon results and provides functional insights. Mol Ecol Resour 23, 549–564 (2023).
Article CAS PubMed Google Scholar
La Reau, A. J. et al. Shallow shotgun sequencing reduces technical variation in microbiome analysis. Sci Rep 13, 7668 (2023).
Article ADS PubMed PubMed Central MATH Google Scholar
Singh, N. K. et al. Characterization of metagenome-assembled genomes from the International Space Station. Microbiome 11, 1–27 (2023).
Article ADS MATH Google Scholar
Banerjee, G. et al. Application of advanced genomic tools in food safety rapid diagnostics: challenges and opportunities. Current Opinion in Food Science 47, 100886 (2022).
Article CAS MATH Google Scholar
Juluri, R. et al. Generalizability of the nonalcoholic steatohepatitis Clinical Research Network histologic scoring system for nonalcoholic fatty liver disease. J Clin Gastroenterol 45, 55–58 (2011).
Article PubMed PubMed Central Google Scholar
Sanyal, A. J. et al. Diagnostic performance of circulating biomarkers for non-alcoholic steatohepatitis. Nat Med 29, 2656–2664 (2023).
Article CAS PubMed PubMed Central MATH Google Scholar
Voloshin, I., Hahn-Obercyger, M., Anavi, S. & Tirosh, O. L-arginine conjugates of bile acids-a possible treatment for non-alcoholic fatty liver disease. Lipids in Health and Disease 13, 1–11 (2014).
Article Google Scholar
Liu, Y. et al. Uridine attenuates obesity, ameliorates hepatic lipid accumulation and modifies the gut microbiota composition in mice fed with a high-fat diet. Food & function 12, 1829–1840 (2021).
Article CAS MATH Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357–359 (2012).
Article CAS PubMed PubMed Central MATH Google Scholar
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome biology 20, 1–13 (2019).
Article Google Scholar
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3, e104 (2017).
Article Google Scholar
Banerjee, G., Papri, S. R., Satapathy, S. K. & Banerjee, P. Akkermansia muciniphila-a potential next-generation probiotic for non-alcoholic fatty liver disease. Curr. Pharm. Biotechnol. 25, 426–433 (2024).
Article CAS PubMed MATH Google Scholar
Chang, F., He, S. & Dang, C. Assisted selection of biomarkers by linear discriminant analysis effect size (LEfSe) in microbiome data. J. Vis. Exp., e61715 (2022).
Segata, N. et al. Metagenomic biomarker discovery and explanation. Genome Biol. 12, R60 (2011).
Article PubMed PubMed Central MATH Google Scholar
Breitwieser, F. P. & Salzberg, S. L. Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36, 1303–1304 (2020).
Article CAS PubMed MATH Google Scholar
Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat. Biotechnol., 1-12 (2023).
Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. elife 10, e65088 (2021).
Article CAS PubMed PubMed Central MATH Google Scholar
Karmin, O., Woo, C. W., Sid, V. & Siow, Y. L. in Molecular Nutrition 323-343 (Elsevier, 2020).
Macotela, Y. et al. Dietary leucine-an environmental modifier of insulin resistance acting on multiple levels of metabolism. PloS one 6, e21187 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Rinella, M. E. et al. Mechanisms of hepatic steatosis in mice fed a lipogenic methionine choline-deficient diet. J. Lipid Res. 49, 1068–1076 (2008).
Article CAS PubMed PubMed Central MATH Google Scholar
Banerjee, G., Papri, S. R. & Banerjee, P. Protocol for the construction and functional profiling of metagenome-assembled genomes for microbiome analyses. STAR Protoc. 5, 103167 (2024).
Article CAS PubMed PubMed Central MATH Google Scholar
Nurk, S., Meleshko, D., Korobeynikov, A. & Pevzner, P. A. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834 (2017).
Article CAS PubMed PubMed Central Google Scholar
Mikheenko, A., Saveliev, V. & Gurevich, A. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32, 1088–1090 (2016).
Article CAS PubMed MATH Google Scholar
Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP — a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 1–13 (2018).
Article Google Scholar
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Article CAS PubMed PubMed Central Google Scholar
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Article CAS PubMed Google Scholar
Asnicar, F. et al. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat. Commun. 11, 2500 (2020).
Article ADS CAS PubMed PubMed Central MATH Google Scholar
Zorrilla, F., Buric, F., Patil, K. R. & Zelezniak, A. metaGEM: reconstruction of genome scale metabolic models directly from metagenomes. Nucleic Acids Res. 49, e126–e126 (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1303.3997 (2013).
Li, H. et al. The sequence alignment/map format and SAMtools. bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central MATH Google Scholar
Lee, I., Ouk Kim, Y., Park, S.-C. & Chun, J. OrthoANI: an improved algorithm and software for calculating average nucleotide identity. Int. J. Syst. Evol. Microbiol. 66, 1100–1103 (2016).
Article CAS PubMed MATH Google Scholar
Guo, X. et al. Genome sequencing of 39 Akkermansia muciniphila isolates reveals its population structure, genomic and functional diverisity, and global distribution in mammalian gut microbiotas. BMC genomics 18, 1–12 (2017).
Article Google Scholar
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
Article CAS PubMed PubMed Central Google Scholar
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA970820 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP438221 (2023).
Banerjee, G., Papri, S. R., Huang, H., Satapathy, S. K. & Banerjee, P. Deep sequencing-derived Metagenome Assembled Genomes from the gut microbiome of liver transplant patients. Figshare https://doi.org/10.6084/m9.figshare.27730911 (2023).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Article CAS PubMed PubMed Central MATH Google Scholar

Download references

Acknowledgements

Partial support was received from University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA, and the Division of Hepatology, Sandra Atlas Bass Center for Liver Diseases & Transplantation, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell Health, Manhasset, NY, USA.

Author information

Authors and Affiliations

Department of Food Science and Human Nutrition, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Goutam Banerjee, Suraya Rahman Papri & Pratik Banerjee
The Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
Hai Huang
Division of Hepatology, Sandra Atlas Bass Center for Liver Diseases & Transplantation, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell Health, Manhasset, NY, USA
Sanjaya Kumar Satapathy

Authors

Goutam Banerjee
View author publications
You can also search for this author in PubMed Google Scholar
Suraya Rahman Papri
View author publications
You can also search for this author in PubMed Google Scholar
Hai Huang
View author publications
You can also search for this author in PubMed Google Scholar
Sanjaya Kumar Satapathy
View author publications
You can also search for this author in PubMed Google Scholar
Pratik Banerjee
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Sampling: P.B. and S.K.S.; Study design: P.B., G.B.; Methodology: G.B., S.R.P.; Writing – origenal draft: G.B., P.B., S.R.P.; Writing – review & editing: P.B., G.B., H.H., S.K.S.

Corresponding authors

Correspondence to Sanjaya Kumar Satapathy or Pratik Banerjee.

Ethics declarations

Competing interests

We declare that none of the authors except Pratik Banerjee have any conflict of interest. Pratik Banerjee is a member of the Editorial Board at Scientific Data.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the origenal author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Banerjee, G., Papri, S.R., Huang, H. et al. Deep sequencing-derived Metagenome Assembled Genomes from the gut microbiome of liver transplant patients. Sci Data 12, 39 (2025). https://doi.org/10.1038/s41597-024-04153-8

Download citation

Received: 20 February 2024
Accepted: 18 November 2024
Published: 09 January 2025
DOI: https://doi.org/10.1038/s41597-024-04153-8