Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing

PloS One
James A Stapleton … Timothy A Whitehead

Abstract

Next-generation DNA sequencing has revolutionized the study of biology. However, the short read lengths of the dominant instruments complicate assembly of complex genomes and haplotype phasing of mixtures of similar sequences. Here we demonstrate a method to reconstruct the sequences of individual nucleic acid molecules up to 11.6 kilobases in length from short (150-bp) reads. We show that our method can construct 99.97%-accurate synthetic reads from bacterial, plant, and animal genomic samples, full-length mRNA sequences from human cancer cell lines, and individual HIV env gene variants from a mixture. The preparation of multiple samples can be multiplexed into a single tube, further reducing effort and cost relative to competing approaches. Our approach generates sequencing libraries in three days from less than one microgram of DNA in a single-tube format without custom equipment or specialized expertise.
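The sketch below converts a per-base accuracy into per-read error expectations, which gives a sense of what the 99.97% figure means for a long synthetic read. It assumes that errors are independent and spread uniformly along each molecule. The abstract does not make that claim, so treat the result as an illustration only. The helper name `error_profile` is hypothetical.

```python
import math


def error_profile(accuracy: float, length_bp: int) -> dict:
    """Translate a per-base accuracy into per-read error figures.

    Assumption (not from the source): errors are independent and uniformly
    distributed along the read, which is a simplification.
    """
    p_err = 1.0 - accuracy                        # per-base error probability
    return {
        "phred_q": -10.0 * math.log10(p_err),     # Phred-scaled quality
        "expected_errors": p_err * length_bp,     # mean number of errors per read
        "p_error_free": accuracy ** length_bp,    # probability of a perfect read
    }


if __name__ == "__main__":
    # Figures quoted in the abstract: 99.97% accuracy, molecules up to 11.6 kb.
    profile = error_profile(0.9997, 11_600)
    for key, value in profile.items():
        print(f"{key}: {value:.3f}")
```

Under this simple model, a maximal-length 11.6-kb read carries about 3.5 errors on average. That corresponds to a Phred-equivalent quality of roughly Q35, and only about 3% of such reads would be entirely error-free. Real errors need not be independent or uniform, so these numbers are only an order-of-magnitude guide.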

Methods Mentioned

PCR
whole-genome shotgun
RNA-seq
electrophoresis

Software Mentioned

TopHat
FLASH
SAMtools
SPAdes assembler
Integrative Genomics Viewer
SSPACE
custom Perl script
BWA
EMBOSS
