Jun 15, 2016

A natural encoding of genetic variation in a Burrows-Wheeler Transform to enable mapping and genome inference

BioRxiv : the Preprint Server for Biology
Sorina Maciuca, Zamin Iqbal

Abstract

We show how positional markers can be used to encode genetic variation within a Burrows-Wheeler Transform (BWT), and use this to construct a generalisation of the traditional 'reference genome', incorporating known variation within a species. Our goal is to support the inference of the closest mosaic of previously known sequences to the genome(s) under analysis. Our scheme results in an increased alphabet size, and by using a wavelet tree encoding of the BWT we reduce the performance impact on rank operations. We give a specialised form of the backward search that allows variation-aware exact matching. We implement this, and demonstrate the cost of constructing an index of the whole human genome with 8 million genetic variants is 25GB of RAM. We also show that inferring a closer reference can close large kilobase-scale coverage gaps in P. falciparum.
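To make the backward search mentioned in the abstract concrete, here is a minimal sketch of the standard exact-matching backward search over a BWT of a single linear sequence. It is not the variation-aware search the authors describe, and it does not use positional markers or a wavelet tree: rank is computed by naive counting, which is only reasonable for toy inputs. All function names are hypothetical and chosen for illustration.

```python
from collections import Counter


def bwt_and_suffix_array(text):
    """Return (BWT, suffix array) of text + '$' using naive suffix sorting.

    Assumption: '$' does not occur in text and sorts before every other
    symbol (true for uppercase DNA letters in ASCII).
    """
    text = text + "$"
    # Naive O(n^2 log n) suffix sort; fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT symbol is the character preceding each sorted suffix
    # (text[-1] wraps around to the sentinel when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


def build_c_table(bwt):
    """C[sym] = number of symbols in the text strictly smaller than sym."""
    counts = Counter(bwt)
    c, total = {}, 0
    for sym in sorted(counts):
        c[sym] = total
        total += counts[sym]
    return c


def rank(bwt, sym, i):
    """Number of occurrences of sym in bwt[0:i] (naive linear scan)."""
    return bwt[:i].count(sym)


def backward_search(bwt, c, pattern):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, len(bwt)
    # Process the pattern right to left, narrowing the interval each step.
    for sym in reversed(pattern):
        if sym not in c:
            return 0, 0
        lo = c[sym] + rank(bwt, sym, lo)
        hi = c[sym] + rank(bwt, sym, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_occurrences(text, pattern):
    """Return sorted start positions of pattern in text."""
    bwt, sa = bwt_and_suffix_array(text)
    c = build_c_table(bwt)
    lo, hi = backward_search(bwt, c, pattern)
    return sorted(sa[lo:hi])
```

For example, `find_occurrences("GATTACAGATTACA", "TTA")` returns the start positions of both copies of `TTA`. In a real index the `rank` call would be answered by a succinct structure such as a wavelet tree, which is the role the abstract assigns to it once the alphabet grows beyond the four nucleotides.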


