RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles

BMC Bioinformatics
Ozkan U Nalbantoglu ... Khalid Sayood

Abstract

Computational analysis of metagenomes requires the taxonomic assignment of the genome contigs assembled from DNA reads of environmental samples. Because microbiomes are so diverse, the length of the assemblies obtained can vary from a few hundred bp to a few hundred Kbp. Current taxonomic classification algorithms provide accurate classification only for long contigs, or for short fragments from organisms that have close relatives with annotated genomes. These are significant limitations for metagenome analysis because of the complexity of microbiomes and the paucity of existing annotated genomes. We propose a robust taxonomic classification method, RAIphy, that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively without these limitations. We have tested RAIphy with synthetic metagenomics data ranging from 100 bp to 50 Kbp. For sequence reads of 100 bp to 1000 bp, the sensitivity of RAIphy ranges from 38% to 81%, outperforming the currently popular composition-based methods for reads in this range. Comparison with computationally more intensive sequence similarity methods shows that RAIphy performs competitively while being significantly faster. The s...
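The abstract names two ingredients: a composition-based similarity score between a fragment and a taxon model, and iterative refinement of those models. The following Python sketch illustrates that general pattern only. It is not RAIphy's actual metric: the log-odds score against a uniform background, the k-mer length `K`, the pseudocount, and all function names (`log_odds_profile`, `score`, `classify`, `refine`) are assumptions introduced here for illustration.

```python
import math
from collections import Counter

K = 3  # k-mer length; an illustrative choice, not a value taken from the paper


def kmer_counts(seq, k=K):
    """Count overlapping k-mers in a DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def log_odds_profile(seqs, k=K, pseudo=1.0):
    """Build a per-k-mer log-odds table against a uniform background.

    This is a stand-in score (an assumption), NOT the paper's
    relative abundance index. Returns (table, default_for_unseen_kmers).
    """
    counts = Counter()
    for s in seqs:
        counts.update(kmer_counts(s, k))
    total = sum(counts.values()) + pseudo * 4 ** k
    background = 1.0 / 4 ** k
    table = {km: math.log(((c + pseudo) / total) / background)
             for km, c in counts.items()}
    default = math.log((pseudo / total) / background)
    return table, default


def score(fragment, profile, k=K):
    """Sum the profile's log-odds over every k-mer in the fragment."""
    table, default = profile
    return sum(n * table.get(km, default)
               for km, n in kmer_counts(fragment, k).items())


def classify(fragment, profiles):
    """Assign the fragment to the taxon whose profile scores it highest."""
    return max(profiles, key=lambda taxon: score(fragment, profiles[taxon]))


def refine(fragments, seeds, rounds=3):
    """Alternate between classifying fragments and rebuilding profiles.

    `seeds` maps a taxon name to a list of known sequences; each round
    rebuilds every profile from its seeds plus the fragments currently
    assigned to it, stopping early once the labels no longer change.
    """
    profiles = {t: log_odds_profile(seqs) for t, seqs in seeds.items()}
    labels = []
    for _ in range(rounds):
        new_labels = [classify(f, profiles) for f in fragments]
        if new_labels == labels:
            break
        labels = new_labels
        profiles = {
            t: log_odds_profile(
                seeds[t] + [f for f, lab in zip(fragments, labels) if lab == t])
            for t in seeds
        }
    return labels


# Toy usage with made-up sequences (not data from the paper).
seeds = {
    "AT_rich": ["ATATATATTTAAATAT" * 4],
    "GC_rich": ["GCGCGGCCGCGCCGGC" * 4],
}
fragments = ["ATTATAATATTA", "GCCGGCGCGCCG"]
labels = refine(fragments, seeds)
```

In this toy run each fragment's k-mers occur only in one seed genome, so the assignment is stable after the first round; with real metagenomic reads the refinement step would matter most when the seed genomes are only distant relatives of the sampled organisms.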

Related Concepts

Phylogeny
Computer Programs and Programming
Sequence Determinations, DNA
Metagenomics
Classification
DNA
Environment
Genome
Analysis
probe gene fragment
