A principled approach to deriving approximate conditional sampling distributions in population genetics models with recombination

Joshua S Paul, Yun S Song


The multilocus conditional sampling distribution (CSD) describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. The CSD has a wide range of applications in both computational biology and population genomics analysis, including phasing genotype data into haplotype data, imputing missing data, estimating recombination rates, inferring local ancestry in admixed populations, and importance sampling of coalescent genealogies. Unfortunately, the true CSD under the coalescent with recombination is not known, so approximations, formulated as hidden Markov models, have been proposed in the past. These approximations have led to a number of useful statistical tools, but it is important to recognize that they were not derived from, though they were certainly motivated by, principles underlying the coalescent process. The goal of this article is to develop a principled approach to derive improved CSDs directly from the underlying population genetics model. Our approach is based on the diffusion process approximation, and the resulting mathematical expressions admit intuitive genealogical interpretations, which we utilize to introduce further approxim...
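For readers unfamiliar with the hidden Markov model approximations to the CSD mentioned above, the following is a minimal Python sketch of one well-known example, a haplotype copying HMM in the style of Li and Stephens. It is not the method developed in this article. The function name `li_stephens_csd`, the per-interval recombination rates `rho_per_interval` (assumed to already include inter-site distances), and the copying-error parameterization are illustrative assumptions. The hidden state at each site is which observed haplotype the new sequence is "copying"; recombination switches the copied haplotype, and mutation appears as a copying error.

```python
import math


def li_stephens_csd(haplotypes, new_hap, rho_per_interval, theta):
    """Approximate P(new_hap | haplotypes) with a Li-and-Stephens-style copying HMM.

    Hypothetical illustration, not the article's method.
    haplotypes: list of n observed haplotypes, each a list of 0/1 alleles at L sites.
    new_hap: the additionally sampled haplotype (list of 0/1 alleles, length L).
    rho_per_interval: list of L-1 scaled recombination rates between adjacent sites
        (assumed to already include the physical distance between the sites).
    theta: scaled mutation rate used to set the copying-error probability.
    """
    n = len(haplotypes)
    num_sites = len(new_hap)
    # Assumed biallelic copying-error probability of the form theta / (2 (n + theta)).
    err = 0.5 * theta / (n + theta)

    def emit(k, site):
        # Probability of the new allele given that haplotype k is being copied.
        return 1.0 - err if haplotypes[k][site] == new_hap[site] else err

    # Forward variables at the first site: the copied haplotype is chosen uniformly.
    alpha = [emit(k, 0) / n for k in range(n)]
    for site in range(1, num_sites):
        # Probability that no recombination occurs in the interval.
        stay = math.exp(-rho_per_interval[site - 1] / n)
        total = sum(alpha)
        # Either keep copying haplotype k, or recombine and jump to a uniformly
        # chosen haplotype (possibly k again).
        alpha = [
            emit(k, site) * (stay * alpha[k] + (1.0 - stay) * total / n)
            for k in range(n)
        ]
    return sum(alpha)
```

Because a switch lands on a uniformly chosen haplotype, the sum over states is computed once per site, so the forward pass costs O(nL) rather than O(n²L). For long sequences the forward variables underflow, so a practical version would rescale them at each site or work in log space.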


May 30, 2012·Bioinformatics·Joshua S Paul, Yun S Song