DOI: 10.1101/504480
Dec 21, 2018
Paper

A family-based phasing algorithm for sequence data

BioRxiv: the Preprint Server for Biology
Mara Battagin, John M Hickey

Abstract

This paper describes a family-based phasing algorithm for variable-coverage sequence data that first minimises phasing errors and then maximises the proportion of alleles phased. This algorithm is one of the essential tools that underpin an overall strategy for generating highly accurate sequence data on whole populations at low cost. The algorithm is called AlphaFamSeq. It uses sequence data on the focal individual and at least two generations of ancestors to phase alleles. In the first step, AlphaFamSeq calculates allele probabilities using iterative peeling. In subsequent steps, the alleles are phased using heuristics that derive information from the sequence data of parents, grandparents and progeny and, if available, from other families in the pedigree. AlphaFamSeq was tested on a range of simulated data sets. AlphaFamSeq gives low phasing error rates and, if there is sufficient sequence information and haplotype sharing amongst individuals, it can give a high yield of correctly phased alleles. The allele threshold had a large effect and window size had a small effect on performance. When all individuals in a single family were sequenced at different coverages, the highest correctly phased alleles reached 90% of the possib...
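To make the "minimise errors first, then maximise yield" idea concrete, here is a minimal single-locus sketch of threshold-based calling followed by Mendelian phasing in a parent-offspring trio. It is not AlphaFamSeq's implementation: the function names (`call_genotype`, `transmitted_allele`, `phase_child`), the use of a threshold on genotype probabilities, and the rule of leaving conflicting loci unphased are all assumptions made for illustration. Iterative peeling, window-based heuristics and information from grandparents or other families are not modelled.

```python
from typing import Optional, Sequence, Tuple

Allele = Optional[int]  # 0 = reference, 1 = alternative, None = unphased


def call_genotype(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the allele dosage (0, 1 or 2) whose probability reaches the
    threshold, or None when no genotype is certain enough (hypothetical rule)."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def transmitted_allele(parent_genotype: Optional[int]) -> Allele:
    """A homozygous parent can only transmit one allele; a heterozygous or
    uncalled parent is uninformative at this locus."""
    if parent_genotype == 0:
        return 0
    if parent_genotype == 2:
        return 1
    return None


def phase_child(child: Optional[int], sire: Optional[int],
                dam: Optional[int]) -> Tuple[Allele, Allele]:
    """Return (paternal allele, maternal allele) for the child.
    Any Mendelian inconsistency leaves the locus unphased, trading yield
    for a lower error rate (an assumption of this sketch)."""
    pat = transmitted_allele(sire)
    mat = transmitted_allele(dam)
    if child is None:
        return pat, mat  # rely on homozygous parents only
    if child in (0, 2):
        allele = child // 2
        if (pat is not None and pat != allele) or (mat is not None and mat != allele):
            return None, None  # parents contradict the child's call
        return allele, allele
    # Heterozygous child: one known parental allele fixes the other.
    if pat is not None and mat is not None:
        return (pat, mat) if pat != mat else (None, None)
    if pat is not None:
        return pat, 1 - pat
    if mat is not None:
        return 1 - mat, mat
    return None, None
```

For example, `phase_child(1, 2, None)` returns `(1, 0)`: the sire is homozygous for the alternative allele, so the heterozygous child's paternal allele must be 1 and its maternal allele 0. With two heterozygous parents, `phase_child(1, 1, 1)` returns `(None, None)`, which is the kind of locus where additional pedigree information would be needed.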

Related Concepts

Alleles
Biological Markers
Focal
Size
Population Group
