Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models

Nature Methods
Arthur Brady, Steven L Salzberg

Abstract

Metagenomics projects collect DNA from uncharacterized environments that may contain thousands of species per sample. A major challenge in metagenomic analysis is the phylogenetic classification of raw sequence reads into groups representing the same or similar taxa, a prerequisite for genome assembly and for analyzing the biological diversity of a sample. New sequencing technologies have made metagenomics both easier, by making sequencing faster, and more difficult, by producing shorter reads than previous technologies. Classifying reads as short as 100 base pairs has until now been relatively inaccurate, requiring researchers to use older, long-read technologies. We present Phymm, a classifier for metagenomic data that has been trained on 539 complete, curated genomes and can accurately classify reads as short as 100 base pairs, a substantial improvement over previous composition-based classification methods. We also describe how combining Phymm with sequence alignment algorithms improves accuracy.
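To make the idea of composition-based classification with interpolated Markov models concrete, here is a minimal Python sketch. It is not Phymm's implementation: the class name `ToyIMM`, the helper `classify`, the maximum order, the pseudocount, and the count-based weighting constant are all assumptions chosen for illustration, and the two "genomes" are synthetic strings. The sketch trains one model per reference sequence, blends nucleotide predictions from contexts of increasing length, and assigns a read to the model under which it is most probable.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ToyIMM:
    """Toy interpolated Markov model over DNA (illustrative assumption only)."""

    def __init__(self, max_order=3, pseudocount=1.0, weight_scale=10.0):
        self.max_order = max_order
        self.pseudocount = pseudocount
        self.weight_scale = weight_scale  # hypothetical smoothing constant
        # counts[k][context][base]: times `base` followed the k-mer `context`
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def train(self, sequence):
        seq = sequence.upper()
        for i, base in enumerate(seq):
            if base not in ALPHABET:
                continue
            for k in range(min(self.max_order, i) + 1):
                context = seq[i - k:i]
                if any(c not in ALPHABET for c in context):
                    break  # longer contexts would also be ambiguous
                self.counts[k][context][base] += 1

    def prob(self, context, base):
        """P(base | context), blended from order 0 up to max_order."""
        table = self.counts[0].get("", {})
        total = sum(table.values())
        p = (table.get(base, 0) + self.pseudocount) / (total + 4 * self.pseudocount)
        for k in range(1, min(self.max_order, len(context)) + 1):
            table = self.counts[k].get(context[-k:])
            if not table:
                break  # unseen context: keep the lower-order estimate
            total = sum(table.values())
            # Contexts seen more often get more weight (simplified scheme).
            weight = total / (total + self.weight_scale)
            p = weight * table.get(base, 0) / total + (1.0 - weight) * p
        return p

    def log_likelihood(self, read):
        read = read.upper()
        score = 0.0
        for i, base in enumerate(read):
            if base in ALPHABET:
                context = read[max(0, i - self.max_order):i]
                score += math.log(self.prob(context, base))
        return score


def classify(read, models):
    """Return the label of the model that scores the read highest."""
    return max(models, key=lambda name: models[name].log_likelihood(read))


def build_demo_models():
    """Train two toy models on synthetic AT-rich and GC-rich sequences."""
    training = {
        "at_rich": "ATTATAAATATTTAATAT" * 50,
        "gc_rich": "GCCGCGGGCGCCCGGCGC" * 50,
    }
    models = {}
    for name, sequence in training.items():
        model = ToyIMM()
        model.train(sequence)
        models[name] = model
    return models


demo_models = build_demo_models()
print(classify("TTATAAATAT", demo_models))  # at_rich
print(classify("GCGGGCGCC", demo_models))   # gc_rich
```

In this sketch, higher-order contexts dominate only when they were observed often in training, and the model falls back to shorter contexts otherwise. That fallback is the general motivation for interpolation when reads are short and training counts are sparse.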

Related Concepts

Knowledge Representation (Computer)
DNA, Double-Stranded
Hydrogen-Ion Concentration
Markov Chains
Mining
Phylogeny
Soil Microbiology
Determination, Sequence Homology
Genomics
Classification
