Massive metagenomic data analysis using abundance-based machine learning

Biology Direct
Zachary N Harris, Tae-Hyuk Ahn

Abstract

Metagenomics applies modern genomic techniques to study the members of a microbial community directly in their natural environment, and it is widely used to survey the microbial communities that live in diverse ecosystems. To understand the metagenomic profile of one of the densest interaction spaces for millions of people, the public transit system, the MetaSUB International Consortium has collected and sequenced metagenomes from subways in cities across the world. In collaboration with CAMDA, MetaSUB has made the metagenomic samples from these cities available for an open data analysis challenge that includes, but is not limited to, the identification of unknown samples. To distinguish the metagenomic profiles of different cities and to predict unknown samples precisely from those profiles, two approaches based on machine learning techniques are proposed: one is a read-based method that builds a taxonomic profile of each sample and uses it for prediction, and the other is a reduced representation assembly-based method. Among the machine learning techniques tested, random forest showed promising results as a suitable classifier for both approaches.
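As a rough illustration of what "abundance-based" features could look like in the read-based approach, the sketch below turns per-read taxon labels into fixed-length relative-abundance vectors, one per sample. It is not the authors' pipeline: the function names, the sample names, and the toy taxon labels are all hypothetical, and the abstract does not say which profiler or normalization was used.

```python
from collections import Counter


def relative_abundance(read_taxa, taxa_order):
    """Convert the taxon label of each read into a relative-abundance vector.

    read_taxa  : list of taxon labels, one per classified read (hypothetical input)
    taxa_order : fixed list of taxa that defines the feature columns
    """
    counts = Counter(read_taxa)
    total = sum(counts[t] for t in taxa_order)
    if total == 0:
        # No classified reads for this sample: return an all-zero profile.
        return [0.0] * len(taxa_order)
    return [counts[t] / total for t in taxa_order]


def build_feature_matrix(samples):
    """Build one abundance vector per sample over the union of observed taxa."""
    taxa_order = sorted({t for reads in samples.values() for t in reads})
    matrix = {
        name: relative_abundance(reads, taxa_order)
        for name, reads in samples.items()
    }
    return taxa_order, matrix


# Toy, made-up data: two samples with four classified reads each.
samples = {
    "sample_a": ["Pseudomonas", "Pseudomonas", "Bacillus", "Cutibacterium"],
    "sample_b": ["Bacillus", "Bacillus", "Bacillus", "Cutibacterium"],
}
taxa, features = build_feature_matrix(samples)
```

Vectors of this kind, labeled by city, are the sort of input a classifier such as a random forest could be trained on; the classifier itself is left out here to keep the sketch dependency-free.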

Methods Mentioned

profilers
RNA-seq

Software Mentioned

BBMap
MEGAN
BBDuk
Megahit
custom script
StrainPhlAn
Learn R - package
Ray Meta
R
MetAML
