Apr 16, 2020

REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets

BioRxiv : the Preprint Server for Biology
Maria Botcharova, Simon F Farmer


Motivation: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets.

Results: We used REINDEER to index the abundances of sequences within 2,585 human RNA-seq experiments in 45 hours using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ~4 billion distinct k-mers across 2,585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph (DBG) of each dataset, then conceptually merges those DBGs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances.

Availability: https://github.com/kamimrcht/REINDEER

Contact: camille.marchet@univ-lille.fr
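To make the idea of recording k-mer abundances across datasets concrete, here is a minimal toy sketch in Python. It is not REINDEER's implementation: it uses a plain dictionary rather than compacted de Bruijn graphs or monotigs, and it would not scale to billions of k-mers. The function names (`build_index`, `query`) and the choice of canonical k-mers are assumptions made for illustration, and reads are assumed to contain only uppercase A, C, G and T.

```python
from collections import Counter

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string (uppercase ACGT only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def canonical_kmers(seq, k):
    """Yield each k-mer of seq in canonical form (the lexicographically
    smaller of the k-mer and its reverse complement)."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, revcomp(kmer))


def build_index(datasets, k):
    """Map every canonical k-mer to a list of counts, one per dataset.

    datasets is a list of datasets; each dataset is a list of reads.
    """
    index = {}
    for d, reads in enumerate(datasets):
        counts = Counter(km for read in reads for km in canonical_kmers(read, k))
        for kmer, count in counts.items():
            # A k-mer first seen here starts with zero counts everywhere.
            index.setdefault(kmer, [0] * len(datasets))[d] = count
    return index


def query(index, seq, k, n_datasets):
    """Return, for each k-mer of seq, its abundance in every dataset.

    A k-mer absent from the index gets an all-zero vector, so the same
    call answers both abundance and presence/absence questions.
    """
    absent = [0] * n_datasets
    return [index.get(kmer, absent) for kmer in canonical_kmers(seq, k)]


if __name__ == "__main__":
    datasets = [["ACGTA", "ACGTA"], ["CGTAC"]]
    index = build_index(datasets, k=3)
    for vector in query(index, "ACGTA", k=3, n_datasets=len(datasets)):
        print(vector)
```

In this toy layout every k-mer stores its own count vector. The abstract's monotigs group k-mers of similar abundances, which suggests that one record could serve a whole group instead; the sketch does not attempt that.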
