DOI: 10.1101/477398 | Nov 24, 2018 | Paper

Extending long-range phasing and haplotype library imputation algorithms to very large and heterogeneous datasets

BioRxiv : the Preprint Server for Biology
Daniel Money, John M Hickey

Abstract

Background: This paper describes the latest improvements to the long-range phasing and haplotype library imputation algorithms, which enable them to phase both datasets with one million individuals and datasets genotyped using different sets of single nucleotide polymorphisms (SNPs). Previous publicly available implementations of long-range phasing could not phase large datasets because of the computational cost of defining surrogate parents by exhaustive all-against-all searches. Further, neither long-range phasing nor haplotype library imputation was designed to handle large amounts of missing data, which is inherent when using multiple SNP arrays.

Methods: Here, we developed methods that avoid the need for all-against-all searches by performing long-range phasing on subsets of individuals and then combining the results. We also extended the long-range phasing and haplotype library imputation algorithms to enable them to use different sets of markers, including missing values, when determining surrogate parents and identifying haplotypes. We implemented and tested these extensions in an updated version of our phasing software, AlphaPhase.

Results: A simulated dataset with one million individuals genotyped with the same set ...
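The two ideas in the Methods paragraph (searching for surrogate parents within subsets rather than all-against-all, and tolerating missing genotypes) can be illustrated with a minimal Python sketch. This is not AlphaPhase's implementation. It assumes the common opposing-homozygote criterion for surrogate parents, genotypes coded 0/1/2, and a missing code of 9; the function names, the `max_rate` and `min_shared` thresholds, and the merging of per-subset results into one dictionary are all hypothetical choices made for illustration.

```python
import numpy as np

MISSING = 9  # assumed missing-genotype code for this sketch


def opposing_homozygote_rate(a, b, missing=MISSING):
    """Rate of opposing homozygotes (0 vs 2) over markers genotyped in BOTH
    individuals. Returns (rate, n_shared); rate is None if n_shared == 0."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    shared = (a != missing) & (b != missing)  # skip markers missing in either
    n_shared = int(shared.sum())
    if n_shared == 0:
        return None, 0
    opposing = shared & (np.abs(a - b) == 2)
    return opposing.sum() / n_shared, n_shared


def surrogates_within_subsets(genotypes, subsets, max_rate=0.0, min_shared=1):
    """Compare individuals only within each subset, then collect the pairs
    found across subsets in one dictionary (individual -> set of surrogates)."""
    surrogates = {i: set() for i in range(len(genotypes))}
    for subset in subsets:
        for pos, i in enumerate(subset):
            for j in subset[pos + 1:]:
                rate, n_shared = opposing_homozygote_rate(genotypes[i], genotypes[j])
                if n_shared >= min_shared and rate <= max_rate:
                    surrogates[i].add(j)
                    surrogates[j].add(i)
    return surrogates


# Toy data: 4 individuals x 5 markers, with 9 marking untyped markers,
# as would happen when individuals are genotyped on different SNP arrays.
genotypes = np.array([
    [0, 1, 2, 9, 0],
    [0, 1, 2, 2, 9],
    [2, 1, 0, 0, 0],
    [0, 9, 2, 1, 0],
])
by_subset = surrogates_within_subsets(genotypes, subsets=[[0, 1], [2, 3]])
exhaustive = surrogates_within_subsets(genotypes, subsets=[[0, 1, 2, 3]])
```

With subsets of size s, the number of pairwise comparisons grows as roughly n·s/2 instead of n²/2, at the cost of missing surrogate pairs that fall in different subsets (here, individual 3 is found as a surrogate of 0 and 1 only in the exhaustive search).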

Related Concepts

Alleles
Biological Markers
Breeding
Chromosomes
Computer Software
Genetic Research
Genetic Loci
Simulation
Single Nucleotide Polymorphism
