Mixture models for analysis of the taxonomic composition of metagenomes

Peter Meinicke, Thomas Lingner


Inferring the taxonomic profile of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in metagenomics. Because existing methods for taxonomic profiling of metagenomes all assign fragmentary sequences to phylogenetic categories, their accuracy depends largely on fragment length. This dependence complicates comparative analysis of data that originate from different sequencing platforms or result from different preprocessing pipelines. Here we introduce a new method for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Our results indicate that the mixture-based profiles compare well with taxonomic profiles obtained with other methods. In contrast to the existing methods, however, our approach shows nearly constant profiling accuracy across read lengths, and it operates at an unrivaled speed. A platform-independent implementation of the mixture modeling approach is available as a MATLAB/Octave toolbox at http://gobics.de/peter/taxy. In addition, a prototype implementation within an easy-to-use interactive tool for Windows can be downloaded.
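To make the idea of mixture modeling of an oligonucleotide distribution more concrete, the following Python sketch treats a sample's k-mer frequency vector as a weighted sum of reference k-mer distributions and estimates the weights with simple EM-style updates. It is an illustration under stated assumptions, not the toolbox's implementation: the function names `kmer_distribution` and `estimate_weights`, the choice k = 2, the two synthetic "genomes", and the EM estimator are all hypothetical choices made for this example.

```python
from itertools import product

import numpy as np

K = 2  # oligonucleotide length; a hypothetical choice for a tiny demo


def kmer_distribution(seq, k=K):
    """Return the relative frequency of every DNA k-mer in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing N or other symbols
            counts[j] += 1
    return counts / counts.sum()


def estimate_weights(sample, references, n_iter=500):
    """Estimate mixture weights w so that sample is close to w @ references.

    Uses EM updates for mixture proportions with fixed components
    (an assumed estimator for this sketch).
    """
    n_refs = references.shape[0]
    w = np.full(n_refs, 1.0 / n_refs)  # start from uniform weights
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        mix = w @ references + eps  # current model distribution
        w = w * (references @ (sample / mix))
    return w / w.sum()


# Synthetic demo: two made-up "genomes" with different base composition.
rng = np.random.default_rng(0)
genome_a = "".join(rng.choice(list("ACGT"), size=2000, p=[0.4, 0.1, 0.1, 0.4]))
genome_b = "".join(rng.choice(list("ACGT"), size=2000, p=[0.1, 0.4, 0.4, 0.1]))
refs = np.vstack([kmer_distribution(genome_a), kmer_distribution(genome_b)])

# A sample whose k-mer distribution is a 70/30 mixture of the two references.
sample = 0.7 * refs[0] + 0.3 * refs[1]
weights = estimate_weights(sample, refs)
print(weights)
```

Because the estimate uses only the pooled k-mer distribution of the sample, no individual read has to be assigned to a taxon, which is consistent with the abstract's claim that accuracy is largely independent of read length. In this synthetic setting the estimated weights should come out close to the 0.7 and 0.3 used to build the sample.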





