Accurate genome relative abundance estimation based on shotgun metagenomic reads

PLoS One
Li C Xia … Fengzhu Sun

Abstract

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants thereof, and often yield biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) that explicitly models read assignment ambiguities, genome size biases and read distributions along the genomes. The maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using Mixture Model theory (GRAMMy). GRAMMy has been shown to give accurate and robust estimates on both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (at least 0.5% abundant in at least 50% of the datasets) in the human gut samples. By providing a new reference-based strategy for metagenomic sample comparisons, GRAMMy substantially improves on previous studies; for example, it corrects the overestimated abundance of Bacteroides species in human gut samples. GRAMMy can be used flexibly with many rea…
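The mixture-model estimation described in the abstract can be illustrated with a small expectation-maximization (EM) loop. This is a minimal sketch under simplifying assumptions, not the GRAMMy implementation: each read is assumed to be already aligned to a set of candidate reference genomes, a read is assumed to be drawn uniformly along its source genome (so its likelihood under genome j is 1/L_j), and reads from genomes outside the reference set are ignored. The function names `em_read_proportions`, `genome_relative_abundance` and `frequent_species` are hypothetical. The EM loop estimates read-level mixing proportions; dividing each proportion by its genome length and renormalizing turns them into genome relative abundances, which is how the sketch accounts for genome size bias. The last helper applies the abstract's frequent-species criterion (at least 0.5% abundance in at least 50% of datasets).

```python
from typing import Dict, List, Sequence


def em_read_proportions(
    hits: Sequence[Sequence[int]],
    lengths: Sequence[float],
    max_iter: int = 500,
    tol: float = 1e-12,
) -> List[float]:
    """EM estimate of read-level mixing proportions pi_j (hypothetical sketch).

    hits[i] lists the genome indices that read i aligns to (must be non-empty).
    Assumption: a read is drawn uniformly along its source genome, so its
    likelihood under genome j is 1 / lengths[j] if it aligns to j, else 0.
    """
    m, n = len(lengths), len(hits)
    pi = [1.0 / m] * m  # start from uniform proportions
    for _ in range(max_iter):
        expected = [0.0] * m
        for genomes in hits:
            # E-step: posterior probability that this read came from genome j
            weights = {j: pi[j] / lengths[j] for j in set(genomes)}
            total = sum(weights.values())
            for j, w in weights.items():
                expected[j] += w / total
        # M-step: proportions are the average posterior read assignments
        new_pi = [e / n for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def genome_relative_abundance(pi: Sequence[float], lengths: Sequence[float]) -> List[float]:
    """Divide read proportions by genome length and renormalize (size-bias correction)."""
    scaled = [p / length for p, length in zip(pi, lengths)]
    total = sum(scaled)
    return [s / total for s in scaled]


def frequent_species(
    abundances: Dict[str, Sequence[float]],
    min_abundance: float = 0.005,
    min_fraction: float = 0.5,
) -> List[str]:
    """Species at least min_abundance abundant in at least min_fraction of datasets."""
    return sorted(
        name
        for name, values in abundances.items()
        if sum(v >= min_abundance for v in values) >= min_fraction * len(values)
    )


if __name__ == "__main__":
    lengths = [1000.0, 2000.0]
    # 3 reads unique to genome 0, 3 unique to genome 1, 4 aligning to both
    hits = [[0]] * 3 + [[1]] * 3 + [[0, 1]] * 4
    pi = em_read_proportions(hits, lengths)
    print([round(p, 4) for p in pi])  # [0.6, 0.4]
    print([round(a, 4) for a in genome_relative_abundance(pi, lengths)])  # [0.75, 0.25]
```

In this toy example the ambiguous reads are split by EM rather than counted twice or discarded, giving read proportions of about (0.6, 0.4). The length correction then lowers the share of the longer genome to about 0.25, since a longer genome contributes more reads per cell and its read share therefore overstates its abundance.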
