Dashing: Fast and Accurate Genomic Distances with HyperLogLog

BioRxiv : the Preprint Server for Biology
Daniel N Baker, Benjamin Langmead

Abstract

Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
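The core idea is that a HyperLogLog (HLL) sketch of a genome's k-mer set supports set unions losslessly. The union sketch is the register-wise maximum, so similarity can be estimated from sketch cardinalities alone. Below is a minimal Python sketch of that idea under stated assumptions. It is not Dashing's implementation. The abstract refers to cardinality estimators specialized for unions and intersections. This example swaps in the classic Flajolet et al. HLL estimator with linear-counting correction, and estimates the intersection by simple inclusion-exclusion. The hash function (BLAKE2b), the defaults `k=21` and `p=12`, and all function names are hypothetical choices for illustration. Reverse-complement (canonical) k-mers are ignored. `mash_distance` shows the standard Mash conversion from a Jaccard estimate to a k-mer distance.

```python
import hashlib
import math
import random


def kmers(seq: str, k: int):
    """Yield every length-k substring (k-mer) of seq (no canonicalization)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(item: str) -> int:
    """Deterministic 64-bit hash; BLAKE2b is an illustrative choice."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    """Classic HyperLogLog with 2**p registers (alpha formula needs p >= 7)."""

    def __init__(self, p: int = 12):
        if not 7 <= p <= 18:
            raise ValueError("p must be in [7, 18]")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = hash64(item)
        q = 64 - self.p
        idx = h >> q                        # top p bits choose a register
        rest = h & ((1 << q) - 1)           # remaining q bits
        rank = q - rest.bit_length() + 1    # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        """Sketch of A union B: register-wise maximum (exact for unions)."""
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small sets
        return raw


def sketch(seq: str, k: int = 21, p: int = 12) -> HyperLogLog:
    """Insert every k-mer of seq into a fresh HLL."""
    hll = HyperLogLog(p)
    for kmer in kmers(seq, k):
        hll.add(kmer)
    return hll


def jaccard(a: HyperLogLog, b: HyperLogLog) -> float:
    """Estimate |A & B| / |A | B| using inclusion-exclusion on cardinalities."""
    union = a.union(b).cardinality()
    inter = max(a.cardinality() + b.cardinality() - union, 0.0)
    return inter / union if union else 0.0


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) ln(2J / (1 + J)); capped at 1 when J = 0."""
    if j <= 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    rng = random.Random(0)
    a = "".join(rng.choice("ACGT") for _ in range(20000))
    b = a[:10000] + "".join(rng.choice("ACGT") for _ in range(10000))
    j = jaccard(sketch(a), sketch(b))
    print(f"Jaccard ~ {j:.3f}, Mash distance ~ {mash_distance(j, 21):.4f}")
```

In the usage example, the two random sequences share their first 10,000 bases. With k = 21 the true k-mer Jaccard index is about 9980/29980, or roughly 1/3. The printed estimate should land near that value, within the few-percent error expected of a 4096-register sketch. Inclusion-exclusion is the simplest way to get an intersection from HLLs, but its error grows when the overlap is small relative to the union. That weakness is the motivation for the specialized estimators the abstract mentions.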
