Is the whole greater than the sum of its parts? De novo assembly strategies for bacterial genomes based on paired-end sequencing

BMC Genomics
Ting-Wen Chen ... Petrus Tang

Abstract

Whole genome sequence construction is becoming increasingly feasible because of advances in next-generation sequencing (NGS), including increasing throughput and read length. By simply overlapping paired-end reads, we can obtain longer reads with higher accuracy, which can facilitate the assembly process. However, the influences of different library sizes and assembly methods on paired-end sequencing-based de novo assembly remain poorly understood. We used 250 bp Illumina MiSeq paired-end reads of different library sizes, generated from genomic DNA from Escherichia coli DH1 and Streptococcus parasanguinis FW213, to compare the assembly results of different library sizes and assembly approaches. Our data indicate that overlapping paired-end reads can increase read accuracy but sometimes causes insertions or deletions. Regarding genome assembly, merged reads outcompete the original paired-end reads only when coverage depth is low, and larger libraries tend to yield better assembly results. These results imply that distance information is the most critical factor during assembly. Our results also indicate that when depth is sufficiently high, assembly from subsets can sometimes produce better results. In summary, this study provides syste...
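To make the idea of "overlapping paired-end reads" concrete, here is a minimal sketch in Python. It is not the algorithm used by FLASH, FastqJoin, or the authors; the function names (`reverse_complement`, `merge_pair`) and the parameters `min_overlap` and `max_mismatch_frac` are hypothetical, chosen only for illustration. It assumes the two reads come from opposite ends of one fragment and that the fragment is shorter than the combined read length, so the read ends overlap.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Merge a read pair into one longer read, or return None.

    Illustrative assumptions: the longest acceptable overlap wins, and
    base qualities are ignored. Real merging tools are more careful.
    """
    rc = reverse_complement(rev)  # bring the reverse read onto the forward strand
    # Try candidate overlap lengths from longest to shortest.
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail = fwd[-olen:]  # end of the forward read
        head = rc[:olen]    # start of the reverse-complemented reverse read
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep the forward read and append the non-overlapping remainder.
            return fwd + rc[olen:]
    return None  # no acceptable overlap: keep the pair unmerged


# Usage: a 28 bp fragment sequenced with 20 bp reads from each end.
fragment = "ACGTTGCAAGGCTTACCGATGCATTAGC"
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
merged = merge_pair(read1, read2)
```

In repetitive sequence, more than one overlap length can pass the mismatch threshold, and picking the wrong one yields a merged read that is too short or too long. That is one plausible way for merging to introduce insertions or deletions, although the abstract does not state the cause. Merging also collapses the pair into a single read, which discards the distance information between the two ends.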

Methods Mentioned

454 sequencing

Software Mentioned

BWA
FLASH
FastqJoin
Moleculo
CLC
MEM
seqtk
MinION
Perl script
PacBio
