Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering

Xiaolin Hao, Ting Chen


With the advancement of next-generation sequencing technology, it is now possible to study samples obtained directly from the environment. In particular, 16S rRNA gene sequences are frequently used to profile the diversity of organisms in a sample. However, such studies still struggle to determine both the number of operational taxonomic units (OTUs) and their relative abundance in a sample. To address these challenges, we propose an unsupervised Bayesian clustering method termed Clustering 16S rRNA for OTU Prediction (CROP). CROP finds clusters based on the natural organization of the data, without the hard cut-off threshold (3% or 5%) required by hierarchical clustering methods. By applying our method to several datasets, we demonstrate that CROP is robust against sequencing errors and that it produces more accurate results than conventional hierarchical clustering methods. CROP is implemented in C++ and supported on Linux and MS Windows; the source code is freely available.
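To make the hard cut-off issue concrete, the following is a minimal sketch of the conventional threshold-based approach that the abstract contrasts with CROP. It is not CROP and not the authors' code. It assumes toy reads that are already aligned to equal length, uses the fraction of mismatched positions as the dissimilarity, and groups reads by single linkage. The names `dissimilarity`, `single_linkage_otus`, `mutate` and `reads` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_otus(seqs: list[str], cutoff: float) -> list[list[int]]:
    """Group read indices into OTUs: any pair within `cutoff` is merged (single linkage)."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if dissimilarity(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def mutate(seq: str, positions: list[int]) -> str:
    """Return a copy of `seq` with a different base at each listed position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Toy data (an assumption, not from the paper): three 100-base reads.
base = "ACGT" * 25
reads = [base, mutate(base, [0]), mutate(base, [10, 20, 30, 40])]

print(single_linkage_otus(reads, cutoff=0.03))  # third read is 4% away: separate OTU
print(single_linkage_otus(reads, cutoff=0.05))  # same reads collapse into one OTU
```

On this toy input the number of predicted OTUs changes from two to one when the cut-off moves from 3% to 5%, which is the kind of threshold sensitivity that a method without a hard cut-off is meant to avoid.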




