Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

Scientific Reports
Hsin-Hung Lin, Yu-Chieh Liao

Abstract

Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in genome recovery is to agglomerate, or 'bin', sequences assembled from metagenomic reads into individual groups. Reference-independent metagenomic binning enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages from one or multiple samples, in order to visualize metagenomes and identify reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools, including CONCOCT, GroopM, MaxBin and MetaBAT, on both synthetic and real human gut communities with small sample sizes (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we show how the metagenome visualization in MyCC aids in reconstructing genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/.
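The abstract does not say how MyCC encodes genomic signatures or coverages. The sketch below shows one way to build a per-contig feature vector. It assumes canonical tetranucleotide frequencies as the signature, which is a common choice in composition-based binning but is not confirmed here as MyCC's setting. It also assumes that per-sample coverages are log-transformed and appended. The function names `signature` and `feature_vector` are hypothetical and are not part of MyCC.

```python
import math
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 4) -> list:
    """All k-mers, each collapsed with its reverse complement (smaller one kept)."""
    return sorted({min("".join(t), revcomp("".join(t))) for t in product("ACGT", repeat=k)})


# Precomputed index for the default k = 4 (136 canonical tetranucleotides).
KMERS_4 = canonical_kmers(4)
INDEX_4 = {km: i for i, km in enumerate(KMERS_4)}


def signature(seq: str) -> list:
    """Hypothetical genomic signature: canonical 4-mer frequencies with a pseudocount of 1."""
    counts = [0] * len(KMERS_4)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
            counts[INDEX_4[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    # The pseudocount keeps every frequency nonzero; the result still sums to 1.
    return [(c + 1) / (total + len(counts)) for c in counts]


def feature_vector(seq: str, coverages: list) -> list:
    """Hypothetical contig feature: signature followed by log(coverage + 1) per sample."""
    return signature(seq) + [math.log(c + 1.0) for c in coverages]
```

Vectors like these would then be passed to a clustering step. This sketch covers only feature construction. It does not model MyCC's clustering, marker-gene handling or visualization. Counting canonical k-mers ensures that a contig and its reverse complement receive the same signature, which matters because assemblers may output either strand.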
