Spherical: an iterative workflow for assembling metagenomic datasets

BMC Bioinformatics
Thomas C A Hitch, Christopher J Creevey


The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deep sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or in some cases even the majority, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered about the minor taxa in a microbiome, and their perceived importance. We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assembly using the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets by assembling random subsets of the whole in a "divide and conquer" manner. We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes, and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the number of reads utilised in the assembly by up to 109% co...
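The iterative idea described in the abstract can be illustrated with a minimal sketch. This is not Spherical's actual code: the function names, parameters, and stopping rule are assumptions made for illustration, and `toy_assemble` and `toy_maps_to` are hypothetical stand-ins for a real assembler and a real read aligner. The sketch also assumes that a read counts as "utilised" once it maps to a contig from an earlier round.

```python
import random
from typing import Callable, List, Sequence, Tuple


def iterative_assembly(
    reads: Sequence[str],
    assemble: Callable[[List[str]], List[str]],
    maps_to: Callable[[str, str], bool],
    subset_fraction: float = 1.0,
    max_rounds: int = 10,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Assemble in rounds, feeding each round only the reads not yet utilised.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    remaining = list(reads)
    contigs: List[str] = []
    for _ in range(max_rounds):
        if not remaining:
            break
        # "Divide and conquer": assemble a random subset of the unused reads.
        k = max(1, int(len(remaining) * subset_fraction))
        subset = rng.sample(remaining, k)
        new_contigs = assemble(subset)
        if not new_contigs:
            break  # nothing new assembled, so stop iterating
        contigs.extend(new_contigs)
        # Keep only reads that map to none of this round's contigs.
        remaining = [
            r for r in remaining
            if not any(maps_to(r, c) for c in new_contigs)
        ]
    return contigs, remaining


def toy_assemble(subset: List[str]) -> List[str]:
    """Hypothetical stand-in for an assembler: returns the longest read."""
    return [max(subset, key=len)] if subset else []


def toy_maps_to(read: str, contig: str) -> bool:
    """Hypothetical stand-in for an aligner: exact substring match."""
    return read in contig


toy_reads = ["ACGTACGT", "ACGT", "CGTA", "TTTTGGGG", "TTGG", "GGGG"]
contigs, unused = iterative_assembly(toy_reads, toy_assemble, toy_maps_to)
```

In a real setting the two callables would wrap an external assembler and a read mapper. Lowering `subset_fraction` trades more rounds for smaller inputs per round.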

