A novel abundance-based algorithm for binning metagenomic sequences using l-tuples

Journal of Computational Biology : a Journal of Computational Molecular Cell Biology
Yu-Wei Wu, Yuzhen Ye

Abstract

Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify the sequences in a metagenomic dataset into different bins (i.e., species), based on various DNA composition patterns (e.g., the tetramer frequencies) of various genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because of the substantial variation of DNA composition patterns within a single genome. We developed a novel approach (AbundanceBin) for metagenomics binning by utilizing the different abundances of species living in the same environment. AbundanceBin is an application of the Lander-Waterman model to metagenomics, which is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimations of species abundances, as well as their genome sizes, two important parameters for characterizing a micr...
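
The abundance-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the AbundanceBin implementation: it assumes, for illustration only, that the number of times an l-tuple occurs across the reads is Poisson-distributed with a rate that depends on the abundance of its source species, and it fits a plain Poisson mixture by expectation-maximization. The function names (`count_ltuples`, `fit_poisson_mixture`, `assign_read`), the fixed number of bins, and the initial rates are all hypothetical, and the sketch ignores the fact that l-tuples with zero occurrences are never observed.

```python
import math
from collections import Counter


def count_ltuples(reads, l):
    """Count every length-l substring (l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def poisson_logpmf(k, lam):
    """Log of the Poisson probability of observing k events at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(values, init_rates, iters=50):
    """EM for a Poisson mixture (illustrative assumption, one rate per bin).

    Returns (rates, weights) after a fixed number of iterations.
    """
    rates = list(init_rates)
    weights = [1.0 / len(rates)] * len(rates)
    for _ in range(iters):
        # E-step: posterior probability of each bin for each count.
        resp = []
        for k in values:
            logs = [math.log(w) + poisson_logpmf(k, lam)
                    for w, lam in zip(weights, rates)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate each bin's rate and mixing weight.
        for j in range(len(rates)):
            mass = sum(r[j] for r in resp)
            if mass > 0:
                mean = sum(r[j] * k for r, k in zip(resp, values)) / mass
                rates[j] = max(mean, 1e-9)
            weights[j] = max(mass / len(values), 1e-12)
    return rates, weights


def assign_read(read, l, counts, rates, weights):
    """Pick the bin with the highest summed log-posterior over the read's l-tuples."""
    scores = [0.0] * len(rates)
    for i in range(len(read) - l + 1):
        k = counts[read[i:i + l]]
        if k == 0:
            continue  # l-tuple absent from the count table
        for j, (w, lam) in enumerate(zip(weights, rates)):
            scores[j] += math.log(w) + poisson_logpmf(k, lam)
    return max(range(len(rates)), key=lambda j: scores[j])
```

Under these assumptions, each fitted rate plays the role of a per-bin abundance estimate, and a read is placed in the bin whose rate best explains how often its l-tuples occur in the whole dataset.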

Methods Mentioned

454 sequencing
PCA

Related Concepts

DNA, Double-Stranded
Genome, Bacterial
Sequence Determinations, DNA
Online Mendelian Inheritance In Man
Metagenomics
DNA
Environment
Genome
Operator Regions, Genetic
Laboratory Culture
