An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination

Genetics
Joshua S Paul, Yun S Song

Abstract

The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence...
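
To make the idea of an HMM-based CSD with linear running time concrete, here is a minimal Python sketch. It is not the sequentially Markov CSD developed in the article: it is a deliberately simplified "copying" HMM in which the hidden state at each locus is only the index of the observed haplotype being copied. The function name `copying_csd`, the parameters `switch_prob` and `mut_prob`, their default values, and the restriction to biallelic (0/1) loci are all assumptions made for illustration. The point it shows is how a transition structure of the form "stay, or jump uniformly" lets each forward step run in time proportional to the number of haplotypes, so the whole computation is linear in both loci and haplotypes.

```python
import numpy as np

def copying_csd(new_hap, panel, switch_prob=0.05, mut_prob=0.01):
    """Probability of `new_hap` given `panel` under a toy copying HMM.

    Illustrative assumptions (not taken from the article):
      - loci are biallelic, coded 0/1;
      - the hidden state is the index of the panel haplotype being copied;
      - between adjacent loci the copied haplotype is re-drawn uniformly
        with probability `switch_prob`, otherwise it is kept;
      - the copied allele is flipped with probability `mut_prob`.
    """
    panel = np.asarray(panel)
    new_hap = np.asarray(new_hap)
    n, num_loci = panel.shape

    def emission(locus):
        # P(observed allele | copied allele) for every panel haplotype.
        match = panel[:, locus] == new_hap[locus]
        return np.where(match, 1.0 - mut_prob, mut_prob)

    # Uniform initial distribution over which haplotype is copied.
    forward = emission(0) / n
    for locus in range(1, num_loci):
        # The "stay or jump uniformly" transition needs only the sum of the
        # previous forward vector, so this step costs O(n), not O(n^2).
        total = forward.sum()
        forward = emission(locus) * (
            (1.0 - switch_prob) * forward + switch_prob * total / n
        )
    return float(forward.sum())
```

For example, `copying_csd([0, 1, 1], [[0, 0, 1], [0, 1, 1], [1, 1, 0]])` returns the probability of the new haplotype given three observed ones under these toy assumptions. For long sequences the forward vector would need to be rescaled at each locus (or the recursion carried out in log space) to avoid numerical underflow; that is omitted here for brevity.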
