Jun 7, 2016

Efficient cardinality estimation for k-mers in large DNA sequencing data sets

BioRxiv : the Preprint Server for Biology
Luiz Carlos Irber Junior, C. Titus Brown

Abstract

We present an open implementation of the HyperLogLog cardinality estimation sketch for counting fixed-length substrings of DNA strings (k-mers). The HyperLogLog sketch implementation is in C++ with a Python interface, and is distributed as part of the khmer software package. khmer is freely available from https://github.com/dib-lab/khmer under a BSD License. The features presented here are included in version 1.4 and later.
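To make the idea concrete, the following is a minimal pure-Python sketch of HyperLogLog applied to k-mers. It is not khmer's implementation or API: the class name `TinyHLL`, the helper `kmers`, the SHA-1 based 64-bit hash, and the default of 2^12 registers are all assumptions chosen for illustration. It applies only the standard small-range (linear counting) correction and treats a k-mer and its reverse complement as distinct.

```python
import hashlib
import math


def kmers(sequence, k):
    """Yield every length-k substring of a DNA string."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


class TinyHLL:
    """Toy HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        # The alpha formula below assumes at least 128 registers (p >= 7).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash taken from SHA-1; an arbitrary choice for this sketch.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)               # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        # Harmonic-mean estimator over all registers.
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def estimate_distinct_kmers(sequence, k, p=12):
    """Estimate the number of distinct k-mers in one DNA string."""
    sketch = TinyHLL(p)
    for kmer in kmers(sequence, k):
        sketch.add(kmer)
    return sketch.estimate()
```

With 2^12 registers the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. Memory use is fixed by the number of registers and does not grow with the number of k-mers seen, which is the property that makes the sketch attractive for large sequencing data sets.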


