Fast haplotype matching in very large cohorts using the Li and Stephens model

BioRxiv : the Preprint Server for Biology
Gerton Lunter

Abstract

The Li and Stephens model, which approximates the coalescent describing the pattern of variation in a population, underpins a range of key tools and results in genetics. Although highly efficient compared to the coalescent, standard implementations of this model still cannot deal with the very large reference cohorts that are starting to become available, and practical implementations use heuristics to achieve reasonable runtimes. Here I describe a new, exact algorithm (fastLS) that implements the Li and Stephens model and achieves runtimes independent of the size of the reference cohort. Key to achieving this runtime is the use of the Burrows-Wheeler transform, allowing the algorithm to efficiently identify partial haplotype matches across a cohort. I show that the proposed data structure is very similar to, and generalizes, Durbin's positional Burrows-Wheeler transform.
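
To make the scaling problem concrete, the following is a minimal sketch of a standard Viterbi-style implementation of the Li and Stephens model, not the fastLS algorithm described above. It assumes biallelic sites coded as 0/1 and a simplified cost formulation in which each mismatch and each recombination (switch of copied haplotype) adds a fixed penalty; the function name `ls_viterbi_cost` and the parameters `mismatch_cost` and `recomb_cost` are hypothetical and chosen for illustration only. The work per site is proportional to the number of reference haplotypes, which is the dependence on cohort size that the abstract says fastLS removes.

```python
def ls_viterbi_cost(reference, query, mismatch_cost=1.0, recomb_cost=2.0):
    """Minimum-cost copying path of `query` through `reference` haplotypes.

    reference: list of equal-length 0/1 allele sequences, one per haplotype
    query:     non-empty 0/1 allele sequence of the same length
    Returns (total_cost, path), where path[i] is the index of the
    reference haplotype copied at site i.
    """
    n_haps = len(reference)
    n_sites = len(query)

    # cost[h] = best cost of any path that ends on haplotype h at the current site
    cost = [mismatch_cost * (reference[h][0] != query[0]) for h in range(n_haps)]
    back = []  # back[i][h] = haplotype copied at site i, given h is copied at site i + 1

    for site in range(1, n_sites):
        # The cheapest way to switch into any haplotype starts from the current best one
        best_prev = min(range(n_haps), key=cost.__getitem__)
        switch = cost[best_prev] + recomb_cost

        new_cost, pointers = [], []
        for h in range(n_haps):  # O(number of haplotypes) work at every site
            if cost[h] <= switch:
                base, ptr = cost[h], h           # keep copying from h
            else:
                base, ptr = switch, best_prev    # recombine into h
            new_cost.append(base + mismatch_cost * (reference[h][site] != query[site]))
            pointers.append(ptr)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final state
    h = min(range(n_haps), key=cost.__getitem__)
    total = cost[h]
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    path.reverse()
    return total, path


reference = [[0, 0, 0, 0], [1, 1, 1, 1]]
query = [0, 0, 1, 1]
total, path = ls_viterbi_cost(reference, query, mismatch_cost=1.0, recomb_cost=1.5)
print(total, path)
```

In this toy example a single recombination between the second and third site is cheaper than accepting two mismatches, so the path switches from the first haplotype to the second. The total runtime of this baseline is proportional to the number of sites times the number of reference haplotypes.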

