Mar 25, 2020

Accelerating Maximal-Exact-Match Seeding with Enumerated Radix Trees

BioRxiv : the Preprint Server for Biology
Arun Subramaniyan, R. Das


Motivation: Read alignment is a time-consuming step in genome sequence analysis. In the read alignment software BWA-MEM and its recently published faster version, BWA-MEM2, the seeding step is a major bottleneck; for instance, it contributes 38% of the overall execution time in BWA-MEM2 when aligning single-end whole human genome reads from the Platinum Genomes dataset. This is because both BWA-MEM and BWA-MEM2 use a compressed index structure called the FMD-Index, which results in high memory bandwidth requirements for seeding, primarily due to its character-by-character processing of reads.

Results: We propose a memory bandwidth-aware data structure for maximal-exact-match seeding called the Enumerated Radix Tree (ERT). ERT trades off memory capacity to improve seeding performance (~60 GB index for the human genome). Together with optimizations to the seeding algorithm and mate-rescue step, ERT, when integrated into BWA-MEM2, speeds up overall read alignment by 1.28x and provides up to 2.1x higher seeding performance while guaranteeing output identical to that of the original software. Furthermore, we prototype an FPGA implementation of ERT on the Amazon EC2 F1 cloud and observe 1.6x higher seeding throughput over a 48-thread optimized CPU-ERT implem...
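The trade-off described in the abstract, spending memory so that several read characters are resolved per index access instead of one, can be illustrated with a toy sketch. This is not the ERT data structure, whose layout the abstract does not describe; it swaps in a plain k-mer dictionary plus direct comparison against the reference. The function names `build_kmer_index` and `longest_match_from`, the parameter `k`, and the example sequences are all hypothetical, and the match is only extended to the right, so it is not a full maximal-exact-match search.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the positions where it starts.

    Assumption for illustration: a plain dictionary stands in for an
    enumerated multi-character index. Memory grows with the number of
    distinct k-mers, which is the capacity cost of this approach.
    """
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def longest_match_from(read: str, start: int, reference: str,
                       index: dict[str, list[int]], k: int) -> tuple[int, list[int]]:
    """Return (length, reference positions) of the longest exact match
    of read[start:] that is at least k characters long.

    The first k characters are resolved with a single index lookup;
    only the remaining characters are compared one by one.
    """
    seed = read[start:start + k]
    if len(seed) < k or seed not in index:
        return 0, []
    best_len, best_pos = 0, []
    for ref_pos in index[seed]:
        length = k
        # Extend to the right while read and reference still agree.
        while (start + length < len(read)
               and ref_pos + length < len(reference)
               and read[start + length] == reference[ref_pos + length]):
            length += 1
        if length > best_len:
            best_len, best_pos = length, [ref_pos]
        elif length == best_len:
            best_pos.append(ref_pos)
    return best_len, best_pos
```

With the hypothetical reference `"ACGTACGTTAGC"` and `k = 3`, calling `longest_match_from("CGTTAGG", 0, reference, index, 3)` resolves the seed `CGT` in one lookup and then extends the occurrence at reference position 5 to a match of length 6.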
