A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems.

MBio
Begum D Topcuoglu, Patrick D. Schloss

Abstract

Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with a...
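The evaluation concern raised in the abstract can be made concrete with a small sketch. The Python below is not the authors' pipeline. It is a minimal, standard-library-only illustration of two pieces that a consistent evaluation usually needs: a held-out split that preserves the case/control ratio, and a threshold-free performance metric. The metric (AUROC), the 20% test fraction, the seed, and the function names `auroc` and `stratified_split` are all assumptions made for illustration, since the abstract as shown names none of them. Only the class counts (261 controls, 229 cases) come from the text.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_split(
    labels: Sequence[int], test_fraction: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), sampling each class separately so the
    case/control ratio is roughly the same in both partitions."""
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Class counts from the abstract: 261 controls (0) and 229 cases (1).
labels = [0] * 261 + [1] * 229

# Hypothetical settings: 20% held out, fixed seed for reproducibility.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, rng=random.Random(0))

# Toy scores (not real model output) to show how the metric is called.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Repeating the split with different seeds and reporting the spread of held-out scores, rather than a single number, is one way to see how much an estimate depends on which samples happened to land in the test set.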



Software Mentioned

R
OptiClust
Python
Hyperband
XGBoost
VSEARCH
tidyverse
caret
MATLAB
NMDS
