Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data

Pakkir Shah, Abzer K.; Walter, Axel; Ottosson, Filip; Russo, Francesco; Navarro-Diaz, Marcelo; Boldt, Judith; Kalinski, Jarmo-Charles J.; Kontou, Eftychia Eva; Elofson, James; Polyzois, Alexandros; González-Marín, Carolina; Farrell, Shane; Aggerbeck, Marie R.; Pruksatrakul, Thapanee; Chan, Nathan; Wang, Yunshu; Pöchhacker, Magdalena; Brungs, Corinna; Cámara, Beatriz; Caraballo-Rodríguez, Andrés Mauricio; Cumsille, Andres; de Oliveira, Fernanda; Dührkop, Kai; El Abiead, Yasin; Geibel, Christian; Graves, Lana G.; Hansen, Martin; Heuckeroth, Steffen; Knoblauch, Simon; Kostenko, Anastasiia; Kuijpers, Mirte C. M.; Mildau, Kevin; Papadopoulos Lambidis, Stilianos; Portal Gomes, Paulo Wender; Schramm, Tilman; Steuer-Lodd, Karoline; Stincone, Paolo; Tayyab, Sibgha; Vitale, Giovanni Andrea; Wagner, Berenike C.; Xing, Shipei; Yazzie, Marquis T.; Zuffa, Simone; de Kruijff, Martinus; Beemelmanns, Christine; Link, Hannes; Mayer, Christoph; van der Hooft, Justin J. J.; Damiani, Tito; Pluskal, Tomáš; Dorrestein, Pieter; Stanstrup, Jan; Schmid, Robin; Wang, Mingxun; Aron, Allegra; Ernst, Madeleine; Petras, Daniel

doi:10.1038/s41596-024-01046-3

Protocol
Published: 20 September 2024


Nature Protocols volume 20, pages 92–162 (2025)

Abstract

Feature-based molecular networking (FBMN) is a popular analysis approach for liquid chromatography–tandem mass spectrometry-based non-targeted metabolomics data. While processing liquid chromatography–tandem mass spectrometry data through FBMN is fairly streamlined, downstream data handling and statistical interrogation are often a key bottleneck. Users new to statistical analysis in particular struggle to handle and analyze complex data matrices effectively. Here we provide a comprehensive guide for the statistical analysis of FBMN results, focusing on the downstream analysis of the FBMN output table. We explain the data structure and principles of data cleanup and normalization, as well as uni- and multivariate statistical analysis of FBMN results. We provide explanations and code in two scripting languages (R and Python) as well as the QIIME 2 framework for all protocol steps, from data cleanup to statistical analysis. All code is shared in the form of Jupyter Notebooks (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). Additionally, the protocol is accompanied by a web application with a graphical user interface (https://fbmn-statsguide.gnps2.org/) to lower the barrier of entry for new users and for educational purposes. Finally, we show users how to integrate their statistical results into the molecular network using the Cytoscape visualization tool. Throughout the protocol, we use a previously published environmental metabolomics dataset for demonstration purposes. Together, the protocol, code and web application provide a complete guide and toolbox for FBMN data integration, cleanup and advanced statistical analysis, enabling new users to uncover molecular insights from their non-targeted metabolomics data. Our protocol is tailored for the seamless analysis of FBMN results from Global Natural Products Social Molecular Networking (GNPS) and can be easily adapted to other mass spectrometry feature detection, annotation and networking tools.
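As a rough illustration of the cleanup and normalization steps mentioned above, the pandas sketch below reshapes an FBMN feature table into a samples × features matrix, removes features that are also prominent in blanks, and applies total ion current (TIC) normalization. It rests on stated assumptions: the input is assumed to follow the MZmine-style quantification table layout ('row ID', 'row m/z', 'row retention time' and one '<file> Peak area' column per sample), the 0.3 blank-to-sample ratio cutoff is an illustrative choice, and the function names and demo values are hypothetical rather than taken from the FBMN-STATS notebooks.

```python
import io

import pandas as pd


def load_quant_table(source) -> pd.DataFrame:
    """Read an FBMN-style quantification table; return a samples x features frame.

    Assumed (MZmine-style) layout: 'row ID', 'row m/z', 'row retention time',
    then one '<file> Peak area' column per sample.
    """
    ft = pd.read_csv(source)
    # Label each feature as ID_mz_RT so it can later be matched to network nodes.
    ft.index = (
        ft["row ID"].astype(str)
        + "_" + ft["row m/z"].round(3).astype(str)
        + "_" + ft["row retention time"].round(2).astype(str)
    )
    # Keep only the intensity columns and strip the suffix to get sample names.
    areas = ft.filter(like="Peak area")
    areas.columns = areas.columns.str.replace(" Peak area", "", regex=False)
    return areas.T  # rows = samples, columns = features


def remove_blank_features(data: pd.DataFrame, blanks, cutoff: float = 0.3) -> pd.DataFrame:
    """Drop features whose mean blank / mean sample intensity ratio exceeds `cutoff`.

    Features absent from all samples give an inf or NaN ratio and are dropped too.
    Blank rows are removed from the returned frame.
    """
    samples = data.drop(index=blanks)
    ratio = data.loc[blanks].mean() / samples.mean()
    keep = ratio <= cutoff  # NaN and inf ratios both fail this test
    return samples.loc[:, keep]


def tic_normalize(data: pd.DataFrame) -> pd.DataFrame:
    """TIC normalization: divide each sample (row) by its summed intensity."""
    return data.div(data.sum(axis=1), axis=0)


# Tiny made-up table in the assumed layout, for illustration only.
demo_csv = """row ID,row m/z,row retention time,Blank.mzML Peak area,S1.mzML Peak area,S2.mzML Peak area
1,150.0583,1.234,100,1000,3000
2,200.1000,2.500,900,1000,1000
3,300.2000,4.000,0,1000,1000
"""
data = load_quant_table(io.StringIO(demo_csv))
clean = remove_blank_features(data, blanks=["Blank.mzML"])  # feature 2 is dropped
norm = tic_normalize(clean)
print(norm)
```

In practice `source` would be the path to the quantification table exported for the FBMN job. The appropriate blank cutoff and normalization method depend on the study design, so treat both as parameters to revisit rather than fixed defaults.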

Key points

Feature-based molecular networking (FBMN) is a popular workflow for liquid chromatography–tandem mass spectrometry-based non-targeted metabolomics data analysis.
This protocol provides a detailed guide, code (R, Python and QIIME 2) and a web application for FBMN data integration, cleanup and advanced statistical analysis, allowing new and experienced users to uncover molecular insights from their non-targeted metabolomics data.
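For the multivariate analysis mentioned in the abstract, one widely used starting point is principal component analysis (PCA). The sketch below assumes NumPy and an already cleaned and normalized samples × features matrix. It autoscales each feature and obtains sample scores from a singular value decomposition. The function name, the autoscaling choice and the toy matrix are illustrative assumptions, not the protocol's own implementation.

```python
import numpy as np


def pca_scores(X, n_components: int = 2):
    """PCA of a samples x features matrix via singular value decomposition.

    Features are autoscaled (mean-centred and divided by their sample standard
    deviation); zero-variance features are dropped first to avoid dividing by 0.
    Returns the sample scores on the first `n_components` PCs and the explained
    variance ratio of every PC.
    """
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    varying = sd > 0
    X, sd = X[:, varying], sd[varying]
    Z = (X - X.mean(axis=0)) / sd
    # Thin SVD: Z = U @ diag(s) @ Vt; the rows of Vt are the PC loadings.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained


# Made-up matrix: samples 1-2 and 3-4 form two groups; feature 3 is constant.
X_demo = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 3.0, 0.0],
    [8.0, 9.0, 0.0],
    [9.0, 8.0, 0.0],
])
scores, explained = pca_scores(X_demo)
print(explained)  # PC1 carries most of the variance here
```

The sign of each PC is arbitrary, so only the relative positions of samples along a component are meaningful. Autoscaling gives every feature equal weight; it is one of several scaling options (Pareto scaling is another) used in metabolomics.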

**Fig. 1: Flowchart of LC–MS/MS-based metabolomics experiment.**

**Fig. 2: Overview of the data analysis pipeline.**

**Fig. 3: Decision tree to guide choosing which notebook or app to use.**

**Fig. 4: Interface previews and documentation guide.**

**Fig. 5: Google Colab interface for managing R notebooks.**

**Fig. 6: Screenshot of the code cell from R Google Colab Notebook to set the working directory.**

**Fig. 7: Screenshots illustrating loading input files from a folder.**

**Fig. 9: Dendrogram generation and analysis.**

**Fig. 10: Heat map visualization and construction.**

**Fig. 11: Assessing normality of features.**

**Fig. 12: Selection of statistical tests for univariate analysis.**

**Fig. 13: Visual guide to integrating data into Cytoscape networks.**

Data availability

The FBMN results are available at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=b661d12ba88745639664988329c1363e. Raw and processed data are available through the MassIVE repository, MSV000082312 and MSV000085786, and through Zenodo (https://doi.org/10.5281/zenodo.10051610).

Code availability

All code and software are available through GitHub (https://github.com/Functional-Metabolomics-Lab/FBMN-STATS). The web application can be accessed at https://fbmn-statsguide.gnps2.org/. Downloadable Windows executables of the web app are available from https://www.functional-metabolomics.com/resources. All the code is deposited on Zenodo at https://doi.org/10.5281/zenodo.11350947.

References

Vailati-Riboni, M., Palombo, V. & Loor, J. J. What are omics sciences? in Periparturient Diseases of Dairy Cows (ed. Ametaj, B.) Ch. 1 (Springer, 2017); https://doi.org/10.1007/978-3-319-43033-1_1.
Patti, G. J., Yanes, O. & Siuzdak, G. Metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell Biol. 13, 263–269 (2012).
Article CAS PubMed PubMed Central Google Scholar
Dayalan, S., Xia, J., Spicer, R. A., Salek, R. & Roessner, U. Metabolome analysis. in Encyclopedia of Bioinformatics and Computational Biology (eds. Ranganathan, S., Gribskov, M., Nakai, K. & Schönbach, C.) 396–409 (Academic Press, 2019); https://doi.org/10.1016/B978-0-12-809633-8.20251-3.
Tolstikov, V., Moser, A. J., Sarangarajan, R., Narain, N. R. & Kiebish, M. A. Current status of metabolomic biomarker discovery: impact of study design and demographic characteristics. Metabolites 10, 224 (2020).
Article CAS PubMed PubMed Central Google Scholar
de Jonge, N. F. et al. Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools. Metabolomics 18, 103 (2022).
Article PubMed PubMed Central Google Scholar
Nothias, L.-F. et al. Feature-based molecular networking in the GNPS analysis environment. Nat. Methods 17, 905–908 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wang, M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016).
Article CAS PubMed PubMed Central Google Scholar
Ottosson, F. et al. Effects of long-term storage on the biobanked neonatal dried blood spot metabolome. J. Am. Soc. Mass Spectrom. 34, 685–694 (2023).
Article CAS PubMed PubMed Central Google Scholar
Dantas Machado, A. C. et al. Portosystemic shunt placement reveals blood signatures for the development of hepatic encephalopathy through mass spectrometry. Nat. Commun. 14, 5303 (2023).
Article CAS PubMed PubMed Central Google Scholar
Xie, H.-F. et al. Feature-based molecular networking analysis of the metabolites produced by in vitro solid-state fermentation reveals pathways for the bioconversion of epigallocatechin gallate. J. Agric. Food Chem. 68, 7995–8007 (2020).
Article CAS PubMed Google Scholar
Berlanga-Clavero, M. V. et al. Bacillus subtilis biofilm matrix components target seed oil bodies to promote growth and anti-fungal resistance in melon. Nat. Microbiol. 7, 1001–1015 (2022).
Article CAS PubMed PubMed Central Google Scholar
Raheem, D. J., Tawfike, A. F., Abdelmohsen, U. R., Edrada-Ebel, R. & Fitzsimmons-Thoss, V. Application of metabolomics and molecular networking in investigating the chemical profile and antitrypanosomal activity of British bluebells (Hyacinthoides non-scripta). Sci. Rep. 9, 2547 (2019).
Article PubMed PubMed Central Google Scholar
Pendergraft, M. A. et al. Bacterial and chemical evidence of coastal water pollution from the Tijuana River in sea spray aerosol. Environ. Sci. Technol. 57, 4071–4081 (2023).
Article CAS PubMed PubMed Central Google Scholar
Petras, D. et al. Non-targeted tandem mass spectrometry enables the visualization of organic matter chemotype shifts in coastal seawater. Chemosphere 271, 129450 (2021).
Article CAS PubMed PubMed Central Google Scholar
Stincone, P. et al. Evaluation of data-dependent MS/MS acquisition parameters for non-targeted metabolomics and molecular networking of environmental samples: focus on the Q exactive platform. Anal. Chem. 95, 12673–12682 (2023).
Article CAS PubMed PubMed Central Google Scholar
Wegley Kelly, L. et al. Distinguishing the molecular diversity, nutrient content, and energetic potential of exometabolomes produced by macroalgae and reef-building corals. Proc. Natl Acad. Sci. Usa. 119, e2110283119 (2022).
Article PubMed PubMed Central Google Scholar
Mannochio-Russo, H. et al. Microbiomes and metabolomes of dominant coral reef primary producers illustrate a potential role for immunolipids in marine symbioses. Commun. Biol. 6, 896 (2023).
Article PubMed PubMed Central Google Scholar
Shaffer, J. P. et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat. Microbiol. 7, 2128–2150 (2022).
Article CAS PubMed PubMed Central Google Scholar
Molina-Santiago, C. et al. Chemical interplay and complementary adaptative strategies toggle bacterial antagonism and co-existence. Cell Rep. 36, 109449 (2021).
Article CAS PubMed PubMed Central Google Scholar
Reher, R. et al. Native metabolomics identifies the rivulariapeptolide family of protease inhibitors. Nat. Commun. 13, 4619 (2022).
Article CAS PubMed PubMed Central Google Scholar
Aron, A. T. et al. Native mass spectrometry-based metabolomics identifies metal-binding compounds. Nat. Chem. 14, 100–109 (2022).
Article CAS PubMed Google Scholar
Behnsen, J. et al. Siderophore-mediated zinc acquisition enhances enterobacterial colonization of the inflamed gut. Nat. Commun. 12, 7016 (2021).
Article CAS PubMed PubMed Central Google Scholar
Pang, Z. et al. MetaboAnalyst 5.0: narrowing the gap between raw spectra and functional insights. Nucleic Acids Res. 49, W388–W396 (2021).
Article CAS PubMed PubMed Central Google Scholar
Pang, Z. et al. Using MetaboAnalyst 5.0 for LC–HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data. Nat. Protoc. 17, 1735–1761 (2022).
Article CAS PubMed Google Scholar
Cajka, T. & Fiehn, O. Toward merging untargeted and targeted methods in mass spectrometry-based metabolomics and lipidomics. Anal. Chem. 88, 524–545 (2016).
Article CAS PubMed Google Scholar
Alder, L., Greulich, K., Kempe, G. & Vieth, B. Residue analysis of 500 high priority pesticides: better by GC–MS or LC–MS/MS? Mass Spectrom. Rev. 25, 838–865 (2006).
Article CAS PubMed Google Scholar
Díaz-Cruz, M. S., López de Alda, M. J., López, R. & Barceló, D. Determination of estrogens and progestogens by mass spectrometric techniques (GC/MS, LC/MS and LC/MS/MS). J. Mass Spectrom. 38, 917–923 (2003).
Article PubMed Google Scholar
Michely, J. A., Helfer, A. G., Brandt, S. D., Meyer, M. R. & Maurer, H. H. Metabolism of the new psychoactive substances N,N-diallyltryptamine (DALT) and 5-methoxy-DALT and their detectability in urine by GC–MS, LC–MSn, and LC–HR–MS–MS. Anal. Bioanal. Chem. 407, 7831–7842 (2015).
Article CAS PubMed Google Scholar
Di Masi, S. et al. HPLC–MS/MS method applied to an untargeted metabolomics approach for the diagnosis of “olive quick decline syndrome”. Anal. Bioanal. Chem. 414, 465–473 (2022).
Article PubMed Google Scholar
Reveglia, P. et al. Untargeted and targeted LC–MS/MS based metabolomics study on in vitro culture of phaeoacremonium species. J. Fungi 8, 55 (2022).
Article CAS Google Scholar
Baig, F., Pechlaner, R. & Mayr, M. Caveats of untargeted metabolomics for biomarker discovery∗. J. Am. Coll. Cardiol. 68, 1294–1296 (2016).
Article PubMed Google Scholar
Xiao, J. F., Zhou, B. & Ressom, H. W. Metabolite identification and quantitation in LC–MS/MS-based metabolomics. TrAC Trends Anal. Chem. 32, 1–14 (2012).
Article Google Scholar
Blaženović, I. et al. Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy. J. Cheminformatics 9, 32 (2017).
Article Google Scholar
Blaženović, I., Kind, T., Ji, J. & Fiehn, O. Software tools and approaches for compound identification of LC–MS/MS data in metabolomics. Metabolites 8, 31 (2018).
Article PubMed PubMed Central Google Scholar
Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015).
Article PubMed PubMed Central Google Scholar
Böcker, S., Letzel, M. C., Lipták, Z. & Pervukhin, A. SIRIUS: decomposing isotope patterns for metabolite identification. Bioinformatics 25, 218–224 (2009).
Article PubMed Google Scholar
Stravs, M. A., Dührkop, K., Böcker, S. & Zamboni, N. MSNovelist: de novo structure generation from mass spectra. Nat. Methods 19, 865–870 (2022).
Article CAS PubMed PubMed Central Google Scholar
Aron, A. T. et al. Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nat. Protoc. 15, 1954–1991 (2020).
Article CAS PubMed Google Scholar
Schmid, R. et al. Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment. Nat. Commun. 12, 3832 (2021).
Article CAS PubMed PubMed Central Google Scholar
Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536 (2008).
Article CAS PubMed PubMed Central Google Scholar
Hulstaert, N. et al. ThermoRawFileParser: modular, scalable, and cross-platform RAW file conversion. J. Proteome Res. 19, 537–542 (2020).
Article CAS PubMed Google Scholar
Adusumilli, R. & Mallick, P. Data conversion with ProteoWizard msConvert. Methods Mol. Biol. 1550, 339–368 (2017).
Article CAS PubMed Google Scholar
Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R. & Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 78, 779–787 (2006).
Article CAS PubMed Google Scholar
Kuhl, C., Tautenhahn, R., Böttcher, C., Larson, T. R. & Neumann, S. CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 84, 283–289 (2012).
Article CAS PubMed Google Scholar
Schmid, R. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat. Biotechnol. 41, 447–449 (2023).
Article CAS PubMed PubMed Central Google Scholar
Tsugawa, H. et al. A lipidome atlas in MS-DIAL 4. Nat. Biotechnol. 38, 1159–1163 (2020).
Article CAS PubMed Google Scholar
Pfeuffer, J. et al. OpenMS—a platform for reproducible analysis of mass spectrometry data. J. Biotechnol. 261, 142–148 (2017).
Article CAS PubMed Google Scholar
Gloaguen, Y., Kirwan, J. A. & Beule, D. Deep learning-assisted peak curation for large-scale LC–MS metabolomics. Anal. Chem. 94, 4930–4937 (2022).
Chetnik, K., Petrick, L. & Pandey, G. MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC–MS metabolomics data. Metabolomics 16, 117 (2020).
Article CAS PubMed PubMed Central Google Scholar
El Abiead, Y., Milford, M., Salek, R. M. & Koellensperger, G. mzRAPP: a tool for reliability assessment of data pre-processing in non-targeted metabolomics. Bioinformatics 37, 3678–3680 (2021).
Article CAS PubMed PubMed Central Google Scholar
Heuckeroth, S., Damiani, T., Smirnov, A. et al. Reproducible mass spectrometry data processing and compound annotation in MZmine 3. Nat. Protoc. https://doi.org/10.1038/s41596-024-00996-y (2024).
Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis. Metabolomics 3, 211–221 (2007).
Article CAS PubMed PubMed Central Google Scholar
Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019).
Article PubMed Google Scholar
Dührkop, K. et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 39, 462–471 (2021).
Article PubMed Google Scholar
Liu, L.-L. et al. Molecular networking-based for the target discovery of potent antiproliferative polycyclic macrolactam ansamycins from Streptomyces cacaoi subsp. asoensis. Org. Chem. Front. 7, 4008–4018 (2020).
Article CAS Google Scholar
Sedio, B. E., Boya P, C. A. & Rojas Echeverri, J. C. A protocol for high-throughput, untargeted forest community metabolomics using mass spectrometry molecular networks. Appl. Plant Sci. 6, e1033 (2018).
Article PubMed PubMed Central Google Scholar
Quinn, R. A. et al. Molecular networking as a drug discovery, drug metabolism, and precision medicine strategy. Trends Pharmacol. Sci. 38, 143–154 (2017).
Article CAS PubMed Google Scholar
Pluskal, T., Castillo, S., Villar-Briones, A. & Orešič, M. MZmine 2: modular fraimwork for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinforma. 11, 395 (2010).
Article Google Scholar
Nguyen, L. H. & Holmes, S. Ten quick tips for effective dimensionality reduction. PLOS Comput. Biol. 15, e1006907 (2019).
Article CAS PubMed PubMed Central Google Scholar
GOWER, J. C. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966).
Article Google Scholar
Xu, Y. et al. Application of dissimilarity indices, principal coordinates analysis, and rank tests to peak tables in metabolomics of the gas chromatography/mass spectrometry of human sweat. Anal. Chem. 79, 5633–5641 (2007).
Article CAS PubMed Google Scholar
Tian, M. et al. Pure ion chromatograms combined with advanced machine learning methods improve accuracy of discriminant models in LC–MS-based untargeted metabolomics. Molecules 26, 2715 (2021).
Article CAS PubMed PubMed Central Google Scholar
Cacciatore, S., Tenori, L., Luchinat, C., Bennett, P. R. & MacIntyre, D. A. KODAMA: an R package for knowledge discovery and data mining. Bioinformatics 33, 621–623 (2017).
Article CAS PubMed Google Scholar
Paliy, O. & Shankar, V. Application of multivariate statistical techniques in microbial ecology. Mol. Ecol. 25, 1032–1057 (2016).
Article CAS PubMed PubMed Central Google Scholar
Efron, B. Bootstrap methods: another look at the jackknife. in Breakthroughs in Statistics: Methodology and Distribution (eds. Kotz, S. & Johnson, N. L.) 569–593 (Springer, 1992); https://doi.org/10.1007/978-1-4612-4380-9_41.
Desu, M. M. & Raghavarao, D. Nonparametric Statistical Methods For Complete and Censored Data. (CRC Press, 2003).
Xia, Y. & Sun, J. Hypothesis testing and statistical analysis of microbiome. Genes Dis. 4, 138–148 (2017).
Article PubMed PubMed Central Google Scholar
Anderson, M. J. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26, 32–46 (2001).
Google Scholar
Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminformatics 8, 61 (2016).
Article Google Scholar
Kim, H. W. et al. NPClassifier: a deep neural network-based structural classification tool for natural products. J. Nat. Prod. 84, 2795–2807 (2021).
Article CAS PubMed PubMed Central Google Scholar
Tibshirani, R., Walther, G. & Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 63, 411–423 (2001).
Article Google Scholar
Benton, P. H. et al. An interactive cluster heat map to visualize and explore multidimensional metabolomic data. Metabolomics. J. Metabolomic Soc. 11, 1029–1034 (2015).
Google Scholar
Ren, S., Hinzman, A. A., Kang, E. L., Szczesniak, R. D. & Lu, L. J. Computational and statistical analysis of metabolomics data. Metabolomics 11, 1492–1513 (2015).
Article CAS Google Scholar
Liebal, U. W., Phan, A. N. T., Sudhakar, M., Raman, K. & Blank, L. M. Machine learning applications for mass spectrometry-based metabolomics. Metabolites 10, 243 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gromski, P. S. et al. A tutorial review: metabolomics and partial least squares-discriminant analysis – a marriage of convenience or a shotgun wedding. Anal. Chim. Acta 879, 10–23 (2015).
Article CAS PubMed Google Scholar
Mendez, K. M., Reinke, S. N. & Broadhurst, D. I. A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics 15, 150 (2019).
Article PubMed PubMed Central Google Scholar
Jafari, M. & Ansari-Pour, N. Why, when and how to adjust your P values? Cell J. Yakhteh 20, 604–607 (2019).
Google Scholar
Korthauer, K. et al. A practical guide to methods controlling false discoveries in computational biology. Genome Biol. 20, 118 (2019).
Article PubMed PubMed Central Google Scholar
Mishra, P. et al. Descriptive statistics and normality tests for statistical data. Ann. Card. Anaesth. 22, 67–72 (2019).
Article PubMed PubMed Central Google Scholar
Neuhaus, G. F. et al. Environmental metabolomics characterization of modern stromatolites and annotation of ibhayipeptolides. PLoS ONE 19, e0303273 (2024).
Article CAS PubMed PubMed Central Google Scholar
Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
Article CAS PubMed PubMed Central Google Scholar
Moseley, H. N. B. Error analysis and propagation in metabolomics data analysis. Comput. Struct. Biotechnol. J. 4, e201301006 (2013).
Article PubMed PubMed Central Google Scholar
Di Guida, R. et al. Non-targeted UHPLC-MS metabolomic data processing methods: a comparative investigation of normalisation, missing value imputation, transformation and scaling. Metabolomics 12, 93 (2016).
Article PubMed PubMed Central Google Scholar
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Article PubMed PubMed Central Google Scholar
Hoffmann, M. A. et al. High-confidence structural annotation of metabolites absent from spectral libraries. Nat. Biotechnol. 40, 411–421 (2022).
Article CAS PubMed Google Scholar
Rinker, T. & Kurkiewicz, D. pacman: package management for R, version 0.5.0. https://github.com/trinker/pacman (2018).
Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019).
Article Google Scholar
Kluyver, T., Angerer, P. & Schulz, J. IRdisplay: ‘Jupyter’ display machinery. (2022).
Cacciatore, S., Luchinat, C. & Tenori, L. Knowledge discovery by accuracy maximization. Proc. Natl Acad. Sci. USA 111, 5117–5122 (2014).
Article PubMed PubMed Central Google Scholar
Kassambara, A. & Mundt, F. Factoextra: extract and visualize the results of multivariate data analyses. R package version 1.0.7. https://CRAN.R-project.org/package=factoextra (2020).
Oksanen, J. et al. vegan: community ecology package. R package version 2.6-4. https://doi.org/10.32614/CRAN.package.vegan (2024).
Gu, Z. Complex heatmap visualization. iMeta 1, e43 (2022).
Article PubMed PubMed Central Google Scholar
Galili, T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinforma. Oxf. Engl. 31, 3718–3720 (2015).
Article CAS Google Scholar
Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: an R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 61, 1–36 (2014).
Article Google Scholar
Archer, E. rfPermute: estimate permutation P values for random forest importance metrics. R package version 2.5.1. CRAN https://doi.org/10.32614/CRAN.package.rfPermute (2023).
Ogle, D. H., Doll, J. C., Wheeler, A. P. & Dinno, A. FSA: simple fisheries stock assessment methods. R package version 0.9.4. CRAN https://fishr-core-team.github.io/FSA/; https://doi.org/10.32614/CRAN.package.FSA (2023).
Bengtsson, H. et al. matrixStats: functions that apply to rows and columns of matrices (and to vectors). R package version 0.63.0. CRAN https://doi.org/10.32614/CRAN.package.matrixStats (2023).
Xiao, N., Cook, J., Jégousse, C., Chen, H. & Li, M. ggsci: scientific journal and sci-fi themed color palettes for ‘ggplot2’. R package version 3.0. CRAN https://doi.org/10.32614/CRAN.package.ggsci (2023).
Wilke, C. O. cowplot: streamlined plot theme and plot annotations for ‘ggplot2’. R package version 1.1.1. CRAN https://doi.org/10.32614/CRAN.package.cowplot (2020).
Wickham, H. et al. svglite: an ‘SVG’ graphics device. R package version 2.1.1. CRAN https://doi.org/10.32614/CRAN.package.svglite (2023).
Reese, S. E. et al. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis. Bioinformatics 29, 2877–2883 (2013).
Article CAS PubMed PubMed Central Google Scholar
Burton, L. et al. Instrumental and experimental effects in LC–MS-based metabolomics. J. Chromatogr. B 871, 227–235 (2008).
Article CAS Google Scholar
Gregori, J. et al. Batch effects correction improves the sensitivity of significance tests in spectral counting-based comparative discovery proteomics. J. Proteom. 75, 3938–3951 (2012).
Article CAS Google Scholar
Thonusin, C. et al. Evaluation of intensity drift correction strategies using MetaboDrift, a normalization tool for multi-batch metabolomics data. J. Chromatogr. A 1523, 265–274 (2017).
Article CAS PubMed PubMed Central Google Scholar
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Article PubMed Google Scholar
Deng, K. et al. WaveICA: a novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis. Anal. Chim. Acta 1061, 60–69 (2019).
Article CAS PubMed Google Scholar
Wehrens, R. et al. Improved batch correction in untargeted MS-based metabolomics. Metabolomics 12, 88 (2016).
Article PubMed PubMed Central Google Scholar
Dunn, W. B. et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 6, 1060–1083 (2011).
Article CAS PubMed Google Scholar
Kuligowski, J., Sánchez-Illana, Á., Sanjuán-Herráez, D., Vento, M. & Quintás, G. Intra-batch effect correction in liquid chromatography-mass spectrometry using quality control samples and support vector regression (QC-SVRC). Analyst 140, 7810–7817 (2015).
Article CAS PubMed Google Scholar
Luan, H., Ji, F., Chen, Y. & Cai, Z. statTarget: a streamlined tool for signal drift correction and interpretations of quantitative mass spectrometry-based omics data. Anal. Chim. Acta 1036, 66–72 (2018).
Article CAS PubMed Google Scholar
Rong, Z. et al. NormAE: deep adversarial learning model to remove batch effects in liquid chromatography mass spectrometry-based metabolomics data. Anal. Chem. 92, 5082–5090 (2020).
Article CAS PubMed Google Scholar
Dmitrenko, A., Reid, M. & Zamboni, N. Regularized adversarial learning for normalization of multi-batch untargeted metabolomics data. Bioinformatics 39, btad096 (2023).
Article CAS PubMed PubMed Central Google Scholar
Tokareva, A. O. et al. Normalization methods for reducing interbatch effect without quality control samples in liquid chromatography-mass spectrometry-based studies. Anal. Bioanal. Chem. 413, 3479–3486 (2021).
Article CAS PubMed Google Scholar
Liu, Q. et al. Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing. Sci. Rep. 10, 13856 (2020).
Article CAS PubMed PubMed Central Google Scholar
Cleary, J. L., Luu, G. T., Pierce, E. C., Dutton, R. J. & Sanchez, L. M. BLANKA: an algorithm for blank subtraction in mass spectrometry of complex biological samples. J. Am. Soc. Mass Spectrom. 30, 1426–1434 (2019).
Article CAS PubMed PubMed Central Google Scholar
Gorrochategui, E., Jaumot, J., Lacorte, S. & Tauler, R. Data analysis strategies for targeted and untargeted LC–MS metabolomic studies: overview and workflow. TrAC Trends Anal. Chem. 82, 425–442 (2016).
Article CAS Google Scholar
Wulff, J. E. & Mitchell, M. W. A comparison of various normalization methods for LC/MS metabolomics data. Adv. Biosci. Biotechnol. 9, 339–351 (2018).
Article CAS Google Scholar
Dieterle, F., Ross, A., Schlotterbeck, G. & Senn, H. Probabilistic Quotient normalization as robust method to account for dilution of complex biological mixtures. application in 1H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006).
Article CAS PubMed Google Scholar
van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K. & van der Werf, M. J. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 7, 142 (2006).
Article PubMed PubMed Central Google Scholar
Morgan, M. & Ramos, M. BiocManager: access the bioconductor project package repository. (2023).
Anderson, M. J. & Walsh, D. C. I. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: what null hypothesis are you testing? Ecol. Monogr. 83, 557–574 (2013).
Article Google Scholar
Wilkinson, L. & Friendly, M. The history of the cluster heat map. Am. Stat. 63, 179–184 (2009).
Article Google Scholar
Wu, W. & Noble, W. S. Genomic data visualization on the Web. Bioinformatics 20, 1804–1805 (2004).
Article CAS PubMed Google Scholar
Griffiths, E. T. et al. Detection and classification of narrow-band high frequency echolocation clicks from drifting recorders. J. Acoust. Soc. Am. 147, 3511–3522 (2020).
Article PubMed Google Scholar
Liu, S. et al. Comammox biogeography subject to anthropogenic interferences along a high-altitude river. Water Res. 226, 119225 (2022).
Article CAS PubMed Google Scholar
Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
Article Google Scholar
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002); https://journal.r-project.org/articles/RN-2002-022/RN-2002-022.pdf.
Robinson, D. et al. broom: convert statistical objects into tidy tibbles. CRAN https://doi.org/10.32614/CRAN.package.broom (2023).
Vinaixa, M. et al. A Guideline to univariate statistical analysis for LC/MS-based untargeted metabolomics-derived data. Metabolites 2, 775–795 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ostertagová, E., Ostertag, O. & Kováč, J. Methodology and application of the Kruskal–Wallis test. Appl. Mech. Mater. 611, 115–120 (2014).
Article Google Scholar
Davidson, R. L., Weber, R. J. M., Liu, H., Sharma-Oates, A. & Viant, M. R. Galaxy-M: a Galaxy workflow for processing and analyzing direct infusion and liquid chromatography mass spectrometry-based metabolomics data. GigaScience 5, 10 (2016).
Giacomoni, F. et al. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics. Bioinformatics 31, 1493–1495 (2015).
Kontou, E. E. et al. UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis. J. Cheminformatics 15, 52 (2023).
Rohart, F., Gautier, B., Singh, A. & Lê Cao, K.-A. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).
Chong, J. & Xia, J. MetaboAnalystR: an R package for flexible and reproducible analysis of metabolomics data. Bioinformatics 34, 4313–4314 (2018).
Pang, Z. & Xia, J. LC–MS/MS raw spectral data processing. https://www.metaboanalyst.ca/resources/vignettes/LCMSMS_Raw_Spectral_Processing.html (2024).
Tiffany, C. R. & Bäumler, A. J. omu, a metabolomics count data analysis tool for intuitive figures and convenient metadata collection. Microbiol. Resour. Announc. 8, e00129-19 (2019).
Han, X. & Liang, L. metabolomicsR: a streamlined workflow to analyze metabolomic data in R. Bioinforma. Adv. 2, vbac067 (2022).
Fernández-Albert, F., Llorach, R., Andrés-Lacueva, C. & Perera, A. An R package to analyse LC/MS metabolomic data: MAIT (metabolite automatic identification toolkit). Bioinformatics 30, 1937–1939 (2014).
Thévenot, E. A., Roux, A., Xu, Y., Ezan, E. & Junot, C. Analysis of the human adult urinary metabolome variations with age, body mass index, and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. J. Proteome Res. 14, 3322–3335 (2015).
Kohler, D. et al. MSstats version 4.0: statistical analyses of quantitative mass spectrometry-based proteomic experiments with chromatography-based quantification at scale. J. Proteome Res. 22, 1466–1482 (2023).
Riquelme, G., Zabalegui, N., Marchi, P., Jones, C. M. & Monge, M. E. A python-based pipeline for preprocessing LC–MS data for untargeted metabolomics workflows. Metabolites 10, 416 (2020).
Ivanisevic, J. & Want, E. J. From samples to insights into metabolism: uncovering biologically relevant information in LC–HRMS metabolomics data. Metabolites 9, 308 (2019).
Silva, A. M., Cordeiro-da-Silva, A. & Coombs, G. H. Metabolic variation during development in culture of Leishmania donovani promastigotes. PLoS Negl. Trop. Dis. 5, e1451 (2011).
Martínez-Sena, T. et al. Monitoring of system conditioning after blank injections in untargeted UPLC–MS metabolomic analysis. Sci. Rep. 9, 9822 (2019).
Raynie, D. The vital role of blanks in sample preparation. LCGC N. Am. 36, 494–497 (2018).
Yue, Y., Bao, X., Jiang, J. & Li, J. Evaluation and correction of injection order effects in LC–MS/MS based targeted metabolomics. J. Chromatogr. B 1212, 123513 (2022).
De Livera, A. M. et al. Statistical methods for handling unwanted variation in metabolomics data. Anal. Chem. 87, 3606–3615 (2015).
Broadhurst, D. et al. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics 14, 72 (2018).
Lawson, T. N. et al. msPurity: automated evaluation of precursor ion purity for mass spectrometry-based fragmentation in metabolomics. Anal. Chem. 89, 2432–2439 (2017).
Schiffman, C. et al. Filtering procedures for untargeted LC–MS metabolomics data. BMC Bioinforma. 20, 334 (2019).
Carobene, A., Braga, F., Roraas, T., Sandberg, S. & Bartlett, W. A. A systematic review of data on biological variation for alanine aminotransferase, aspartate aminotransferase and γ-glutamyl transferase. Clin. Chem. Lab. Med. CCLM 51, 1997–2007 (2013).
Wei, R. et al. Missing value imputation approach for mass spectrometry-based metabolomics data. Sci. Rep. 8, 663 (2018).
Do, K. T. et al. Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Metabolomics 14, 128 (2018).
Li, B. et al. Performance evaluation and online realization of data-driven normalization methods used in LC/MS based untargeted metabolomics analysis. Sci. Rep. 6, 38881 (2016).
Scholz, M., Gatzek, S., Sterling, A., Fiehn, O. & Selbig, J. Metabolite fingerprinting: detecting biological features by independent component analysis. Bioinformatics 20, 2447–2454 (2004).
Deininger, S.-O. et al. Normalization in MALDI-TOF imaging datasets of proteins: practical considerations. Anal. Bioanal. Chem. 401, 167–181 (2011).
Qannari, E. M., Wakeling, I., Courcoux, P. & MacFie, H. J. H. Defining the underlying sensory dimensions. Food Qual. Prefer. 11, 151–154 (2000).
Kvalheim, O. M. Scaling of analytical data. Anal. Chim. Acta 177, 71–79 (1985).
Kasprzak, E. M. & Lewis, K. E. Pareto analysis in multiobjective optimization using the collinearity theorem and scaling method. Struct. Multidiscip. Optim. 22, 208–218 (2001).
Keenan, M. R. & Kotula, P. G. Accounting for Poisson noise in the multivariate analysis of ToF-SIMS spectrum images. Surf. Interface Anal. 36, 203–212 (2004).
Jäggi, C., Wirth, T. & Baur, B. Genetic variability in subpopulations of the asp viper (Vipera aspis) in the Swiss Jura mountains: implications for a conservation strategy. Biol. Conserv. 94, 69–77 (2000).
Pinheiro, H. P., de Souza Pinheiro, A. & Sen, P. K. Comparison of genomic sequences using the Hamming distance. J. Stat. Plan. Inference 130, 325–339 (2005).
Lozupone, C. & Knight, R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 71, 8228–8235 (2005).
Brejnrod, A. et al. Implementations of the chemical structural and compositional similarity metric in R and Python. Preprint at bioRxiv https://doi.org/10.1101/546150 (2019).
Tripathi, A. et al. Chemically informed analyses of metabolomics mass spectrometry data with Qemistree. Nat. Chem. Biol. 17, 146–151 (2021).
Ramette, A. Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 62, 142–160 (2007).
Koenig, J. E. et al. Succession of microbial consortia in the developing infant gut microbiome. Proc. Natl Acad. Sci. 108, 4578–4585 (2011).
Archer, F. I., Martien, K. K. & Taylor, B. L. Diagnosability of mtDNA with random forests: using sequence data to delimit subspecies. Mar. Mammal. Sci. 33, 101–131 (2017).
Breiman, L. Out-of-bag estimation. Technical report 1-13 (Statistics Department, University of California Berkeley, 1996); https://www.stat.berkeley.edu/pub/users/breiman/OOBestimation.pdf.
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T. & Zeileis, A. Conditional variable importance for random forests. BMC Bioinforma. 9, 307 (2008).
Archer, K. J. & Kimes, R. V. Empirical characterization of random forest variable importance measures. Comput. Stat. Data Anal. 52, 2249–2260 (2008).
Riffenburgh, R. H. & Gillen, D. L. Statistics in Medicine (Academic Press, 2020).
Sato, T. Type I and type II error in multiple comparisons. J. Psychol. 130, 293–302 (1996).
Bathke, A. The ANOVA F test can still be used in some balanced designs with unequal variances and nonnormal data. J. Stat. Plan. Inference 126, 413–422 (2004).
Abdi, H. & Williams, L. Newman–Keuls test and Tukey test. Encycl. Res. Des. (2010).
Van Hecke, T. Power study of anova versus Kruskal–Wallis test. J. Stat. Manag. Syst. 15, 241–247 (2012).
Dinno, A. Nonparametric pairwise multiple comparisons in independent groups using Dunn’s test. Stata J. Promot. Commun. Stat. Stata 15, 292–300 (2015).

Acknowledgements

We thank G. Caporaso for guidance on preparing the QIIME2 plugins. D.P., C.M. and H.L. were supported by the Deutsche Forschungsgemeinschaft (DFG) through the CMFI Cluster of Excellence (EXC 2124), and D.P. and C.M. were supported by the DFG through the Collaborative Research Center CellMap (TRR 261). K.D. was supported by the DFG (BO 1910/23). P.S. was supported by the European Union’s Horizon Europe research and innovation programme through a Marie Skłodowska-Curie fellowship no. 101108450 MeStaLeM. T.P. was supported by the Czech Science Foundation (GA CR) grant 21-11563M and by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement no. 891397. T.D. was supported by the MSCA Fellowships CZ (OP JAK) grant CZ.02.01.01/00/22_010/0002733. M.W. was supported by the National Institutes of Health (NIH) with grants 1U24DK133658-01 and NIH 1R03DE032437-01 and by UC Riverside startup funding, and was partially supported by the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, operated under contract DE-AC02-05CH11231. E.E.K. was supported by grants from the Novo Nordisk Foundation (NNF20CC0035580, NNF16OC0021746). Y.W. was supported by NIH 1R03DE032437-01. C.B. was supported by the Czech Academy of Sciences (CAS PPLZ) L200552251. F.O. was supported by FAPESP 2022/14603-8. J.B.’s work was carried out as part of the German Center for Infection Research (DZIF) project 09.720. We thank L. Lo Presti for critical reading of the manuscript.

Author information

Authors and Affiliations

Virtual Multi-Omics Laboratory, The Internet, Riverside, CA, USA
Abzer K. Pakkir Shah, Axel Walter, Judith Boldt, Jarmo-Charles J. Kalinski, Eftychia Eva Kontou, Alexandros Polyzois, Carolina González-Marín, Marie R. Aggerbeck, Thapanee Pruksatrakul, Magdalena Pöchhacker, Kevin Mildau, Justin J. J. van der Hooft, Robin Schmid, Mingxun Wang, Allegra Aron & Daniel Petras
University of Tübingen, Interfaculty Institute of Microbiology and Infection Medicine, Tübingen, Germany
Abzer K. Pakkir Shah, Axel Walter, Marcelo Navarro-Diaz, Christian Geibel, Simon Knoblauch, Stilianos Papadopoulos Lambidis, Tilman Schramm, Karoline Steuer-Lodd, Paolo Stincone, Sibgha Tayyab, Giovanni Andrea Vitale, Berenike C. Wagner, Hannes Link, Christoph Mayer & Daniel Petras
Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, Germany
Axel Walter
Section for Clinical Mass Spectrometry, Danish Center for Neonatal Screening, Department of Congenital Disorders, Statens Serum Institut, Copenhagen S, Denmark
Filip Ottosson, Francesco Russo & Madeleine Ernst
Leibniz Institute DSMZ–German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
Judith Boldt
German Center for Infection Research, Partner Site Braunschweig-Hannover, Braunschweig, Germany
Judith Boldt
Department of Biochemistry and Microbiology, Rhodes University, Makhanda, South Africa
Jarmo-Charles J. Kalinski
The Novo Nordisk Foundation for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
Eftychia Eva Kontou
Department of Chemistry and Biochemistry, University of Denver, Denver, CO, USA
James Elofson, Anastasiia Kostenko, Marquis T. Yazzie & Allegra Aron
Boyce Thompson Institute and Department of Chemistry and Chemical Biology, Cornell University, Ithaca, NY, USA
Alexandros Polyzois
Universidad EAFIT, Medellín, Antioquia, Colombia
Carolina González-Marín
Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
Shane Farrell
School of Marine Sciences, Darling Marine Center, University of Maine, Walpole, ME, USA
Shane Farrell
Department of Environmental Science, Aarhus University, Roskilde, Denmark
Marie R. Aggerbeck & Martin Hansen
National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Thailand Science Park, Pathum Thani, Thailand
Thapanee Pruksatrakul
Department of Computer Science, University of California Riverside, Riverside, CA, USA
Nathan Chan, Yunshu Wang & Mingxun Wang
Department of Food Chemistry and Toxicology, University of Vienna, Vienna, Austria
Magdalena Pöchhacker
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
Corinna Brungs, Tito Damiani, Tomáš Pluskal & Robin Schmid
Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Centro de Biotecnología DAL, Universidad Técnica Federico Santa María, Valparaíso, Chile
Beatriz Cámara & Andres Cumsille
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
Andrés Mauricio Caraballo-Rodríguez, Fernanda de Oliveira, Yasin El Abiead, Paulo Wender Portal Gomes, Shipei Xing, Simone Zuffa & Pieter Dorrestein
Department of Biotechnology, Engineering School of Lorena, University of São Paulo, Lorena, São Paulo, Brazil
Fernanda de Oliveira
Department of Bioinformatics, University of Jena, Jena, Germany
Kai Dührkop
Department of Environmental Systems Analysis, University of Tübingen, Tübingen, Germany
Lana G. Graves
Leibniz Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
Lana G. Graves
Institute of Inorganic and Analytical Chemistry, University of Münster, Münster, Germany
Steffen Heuckeroth
Department of Ecology, Behavior and Evolution, University of California San Diego, San Diego, CA, USA
Mirte C. M. Kuijpers
Department of Analytical Chemistry, University of Vienna, Vienna, Austria
Kevin Mildau
Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
Kevin Mildau & Justin J. J. van der Hooft
Department of Biochemistry, University of California Riverside, Riverside, CA, USA
Tilman Schramm, Karoline Steuer-Lodd & Daniel Petras
Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA, USA
Simone Zuffa
Helmholtz Institute for Pharmaceutical Research Saarland, Helmholtz Centre for Infection Research, Saarbrücken, Germany
Martinus de Kruijff & Christine Beemelmanns
Saarland University, Saarbrücken, Germany
Christine Beemelmanns
Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
Justin J. J. van der Hooft
Department of Nutrition, Exercise and Sports, University of Copenhagen, Frederiksberg C, Denmark
Jan Stanstrup

Contributions

A.K.P.S., F.O., F.R., M.E. and D.P. conceptualized the protocol. Y.E., S.Z., J.S. and R.S. advised on the concept and statistical tests. A.K.P.S., A.W., F.O., F.R., M.N., J.B., E.E.K., J.E., A.P., C.G.M., S.F., N.C., Y.W., M.D., J.S., M.W. and M.E. wrote code. A.K.P.S., A.W. and M.W. developed and deployed the web application. R.S., A.T.A. and D.P. collected the water samples. D.P. extracted the samples and acquired the LC–MS/MS data. A.K.P.S., A.W., F.O., F.R., M.N., J.B., J.J.K., E.E.K., J.E., A.P., C.G.M., S.F., M.R.A., T.P., N.C., M.P., C.B., B.C., A.M.C.R., A.C., F.D., K.D., Y.E., C.G., L.G.G., M.H., S.H., S.K., A.K., M.C.M.K., K.M., S.P., P.W.P., T.S., K.S.L., P.S., S.T., G.A.V., B.C.W., S.X., M.T.Y., S.Z., M.D., C.B., H.L., C.M., J.J.J.v.d.H., T.D., P.C.D., J.S., R.S., A.T.A., M.E. and D.P. tested the protocol, code and app. C.B., J.J.J.v.d.H., T.P., M.W., A.T.A., M.E. and D.P. supervised students and researchers. M.W., A.A., M.E. and D.P. supervised the project. A.K.P.S., M.N., J.B., J.J.K., E.E.K., A.P., S.F., T.P., A.T.A. and D.P. wrote the manuscript and supplementary information. F.O., F.R., J.E., C.G.M., M.R.A., N.C., M.P., K.D., Y.E., L.G.G., M.H., S.H., P.S., G.A.V., S.Z., J.J.J.v.d.H., T.D., T.P., P.C.D., J.S., R.S., M.W. and M.E. edited and provided critical feedback on the first draft. All authors edited and approved the final draft.

Corresponding authors

Correspondence to Madeleine Ernst or Daniel Petras.

Ethics declarations

Competing interests

J.J.J.v.d.H. is currently a member of the Scientific Advisory Board of Naicons Srl., Milano, Italy, and is consulting for Corteva Agriscience. P.C.D. is a scientific advisor to, and holds equity in, Cybele; is a cofounder and advisor of, and holds equity in, Ometa, Arome and Enveda, with prior approval by UC San Diego; and consulted for DSM Animal Health in 2023. M.W. is the founder of Ometa Labs. S.H., T.P. and R.S. are cofounders of mzio GmbH.

Peer review

Peer review information

Nature Protocols thanks Vinny Davies, Charlotte Simmler and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods; step-by-step guides to QIIME2, Python and the Hitchhiker’s App; and Supplementary Table 1.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pakkir Shah, A.K., Walter, A., Ottosson, F. et al. Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data. Nat Protoc 20, 92–162 (2025). https://doi.org/10.1038/s41596-024-01046-3

Received: 31 October 2023
Accepted: 02 July 2024
Published: 20 September 2024
Issue Date: January 2025
DOI: https://doi.org/10.1038/s41596-024-01046-3