Distribution-based clustering: using ecology to refine the operational taxonomic unit

Applied and Environmental Microbiology
Sarah P Preheim, Eric J Alm


16S rRNA sequencing, commonly used to survey microbial communities, begins by grouping individual reads into operational taxonomic units (OTUs). There are two major challenges in calling OTUs: identifying bacterial population boundaries and differentiating true diversity from sequencing errors. Current approaches to identifying taxonomic groups or eliminating sequencing errors rely on sequence data alone, but both of these activities could be informed by the distribution of sequences across samples. Here, we show that using the distribution of sequences across samples can help identify population boundaries even in noisy sequence data. The logic underlying our approach is that bacteria in different populations will often be highly correlated in their abundance across different samples. Conversely, 16S rRNA sequences derived from the same population, whether slightly different copies in the same organism, variation of the 16S rRNA gene within a population, or sequences generated randomly in error, will have the same underlying distribution across sampled environments. We present a simple OTU-calling algorithm (distribution-based clustering) that uses both genetic distance and the distribution of sequences across samples and demo...
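
The idea of combining genetic distance with the distribution of sequences across samples can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and does not specify the statistical test or the cutoffs. The sketch assumes pre-aligned sequences of equal length, uses the Jensen-Shannon divergence as a stand-in measure of how differently two sequences are distributed across samples, and uses hypothetical thresholds (`max_dist`, `max_jsd`) and hypothetical function names chosen only for illustration.

```python
import math


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def js_divergence(counts_p, counts_q):
    """Jensen-Shannon divergence (log base 2, range 0..1) between two count vectors.

    Assumes each vector has a non-zero total.
    """
    total_p, total_q = sum(counts_p), sum(counts_q)
    p = [c / total_p for c in counts_p]
    q = [c / total_q for c in counts_q]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster(seqs, counts, max_dist=0.1, max_jsd=0.05):
    """Greedy clustering sketch (thresholds are illustrative assumptions).

    Sequences are visited from most to least abundant. A sequence joins an
    existing OTU only if it is genetically close to the OTU representative
    AND has a similar distribution across samples; otherwise it founds a
    new OTU.
    """
    order = sorted(seqs, key=lambda s: -sum(counts[s]))
    otus = {}  # representative id -> list of member ids
    for sid in order:
        for rep in otus:
            close = hamming_fraction(seqs[sid], seqs[rep]) <= max_dist
            same_distribution = js_divergence(counts[sid], counts[rep]) <= max_jsd
            if close and same_distribution:
                otus[rep].append(sid)
                break
        else:
            otus[sid] = [sid]
    return otus


# Toy data: B and C each differ from A by one base, but only B shares
# A's distribution across the four samples.
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGTACGAAC"}
counts = {"A": [100, 50, 0, 0], "B": [10, 5, 0, 0], "C": [0, 0, 40, 60]}
result = cluster(seqs, counts)
print(result)
```

In this toy input, B is merged with A, as would be expected for an error or within-population variant, while C stays a separate OTU despite being equally close in sequence, because it occurs in different samples. A sequence-only method using the same distance cutoff would have grouped all three.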



Related Concepts

Kinetic Polymerase Chain Reaction
Oligonucleotide Primers
Sequence Determinations, DNA
Base Pairing
RNA, Ribosomal, 16S
Bacterial 16S RNA
