Improving contig binning of metagenomic data using $d_2^S$ oligonucleotide frequency dissimilarity

BMC Bioinformatics
Ying Wang, …, Fengzhu Sun

Abstract

Metagenomics sequencing provides deep insights into microbial communities. To investigate their taxonomic structure, binning assembled contigs into discrete clusters is critical. Many binning algorithms have been developed, but their performance is not always satisfactory, especially for complex microbial communities, calling for further development. According to previous studies, relative sequence compositions are similar across different regions of the same genome, but they differ between distinct genomes. Generally, current tools have used the normalized frequency of k-tuples directly, but this represents an absolute, not relative, sequence composition. Therefore, we attempted to model contigs using relative k-tuple composition, followed by measuring dissimilarity between contigs using $d_2^S$. The $d_2^S$ dissimilarity was designed to measure the dissimilarity between two long sequences or Next-Generation Sequencing data with the Markov models of the background genomes. This method was effective in revealing group and gradient relationships between genomes, metagenomes and metatranscriptomes. With many binning tools available, we do not try to bin contigs from scratch. Instead, we developed $d_2^S$Bin to ...
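The abstract describes comparing contigs by relative k-tuple composition, i.e. k-tuple counts centered by a Markov background model, and scoring them with $d_2^S$. A minimal Python sketch follows, assuming the standard definition from the $d_2^S$ literature: with centered counts $\tilde X_w = X_w - E[X_w]$ and $\tilde Y_w = Y_w - E[Y_w]$ over all words $w$ of length $k$, let $D_2^S = \sum_w \tilde X_w \tilde Y_w / \sqrt{\tilde X_w^2 + \tilde Y_w^2}$ and $d_2^S = \frac{1}{2}\left(1 - D_2^S \big/ \left(\sqrt{\sum_w \tilde X_w^2/\sqrt{\tilde X_w^2+\tilde Y_w^2}}\,\sqrt{\sum_w \tilde Y_w^2/\sqrt{\tilde X_w^2+\tilde Y_w^2}}\right)\right)$. The helper names (`word_counts`, `rel_freq`, `centered_counts`, `d2s`), the plug-in estimate of each sequence's own order-$r$ Markov background, and the defaults `k=4`, `r=1` are illustrative assumptions, not necessarily the settings used by $d_2^S$Bin; the sketch also does not implement $d_2^S$Bin's step of adjusting contigs among bins.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def word_counts(seq: str, length: int) -> Counter:
    """Count overlapping words of `length`, skipping any word with a non-ACGT symbol."""
    counts = Counter()
    for i in range(len(seq) - length + 1):
        word = seq[i:i + length]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def rel_freq(seq: str, length: int) -> dict:
    """Relative word frequencies; length 0 yields {"": 1.0}, i.e. an order-0 background."""
    counts = word_counts(seq, length)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def centered_counts(seq: str, k: int, r: int) -> dict:
    """Observed minus expected k-tuple counts (hypothetical helper).

    The expectation comes from an order-r Markov model fitted to `seq` itself.
    """
    assert 0 <= r < k, "Markov order must satisfy 0 <= r < k"
    seq = seq.upper()
    obs = word_counts(seq, k)
    n_words = sum(obs.values())
    f_hi = rel_freq(seq, r + 1)  # (r+1)-mer frequencies
    f_lo = rel_freq(seq, r)      # r-mer frequencies
    centered = {}
    for w in map("".join, product(ALPHABET, repeat=k)):
        # p_w = prod_i f(w_i..w_{i+r}) / prod_{i>=2} f(w_i..w_{i+r-1})
        num = 1.0
        for i in range(k - r):
            num *= f_hi.get(w[i:i + r + 1], 0.0)
        p_w = 0.0
        if num > 0.0:
            den = 1.0
            for i in range(1, k - r):
                # This r-mer prefixes an (r+1)-mer counted above, so its frequency is > 0.
                den *= f_lo[w[i:i + r]]
            p_w = num / den
        centered[w] = obs[w] - n_words * p_w
    return centered


def d2s(seq_a: str, seq_b: str, k: int = 4, r: int = 1) -> float:
    """d2S dissimilarity in [0, 1]; 0 means identical relative k-tuple composition."""
    xa = centered_counts(seq_a, k, r)
    xb = centered_counts(seq_b, k, r)
    cross = norm_a = norm_b = 0.0
    for w in xa:
        x, y = xa[w], xb[w]
        s = sqrt(x * x + y * y)
        if s == 0.0:
            continue  # both centered counts are zero: the word contributes nothing
        cross += x * y / s
        norm_a += x * x / s
        norm_b += y * y / s
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a sequence matches its background model exactly; d2S is undefined")
    return 0.5 * (1.0 - cross / (sqrt(norm_a) * sqrt(norm_b)))
```

By the Cauchy-Schwarz inequality the ratio in the last line lies in $[-1, 1]$, so the score stays in $[0, 1]$, and a sequence compared with itself scores 0 (up to rounding). Calling `d2s(contig_a, contig_b)` on two contig strings returns their dissimilarity; a binning refinement would compare each contig against bin-level compositions, which this sketch leaves out.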
