Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation

Genome Research
Sergey Koren, Adam M Phillippy

Abstract

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that canno...
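The contig NG50 quoted in the abstract is a standard continuity metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the estimated genome size. The following is a minimal sketch of that calculation, not code taken from Canu. The function name `ng50` and the toy contig lengths are assumptions made for illustration.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly (hypothetical helper, not part of Canu).

    NG50 is the length L such that contigs of length >= L sum to at least
    half of the estimated genome size. Returns None when the contigs
    together cover less than half of genome_size.
    """
    target = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


# Toy example with made-up contig lengths and genome size.
contigs = [40, 30, 20, 10]
print(ng50(contigs, genome_size=120))  # 30
```

Unlike N50, which is computed against the total assembly length, NG50 uses the estimated genome size as its denominator, so results stay comparable between assemblies of different total size.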
