Estimating K in Genetic Mixture Models

BioRxiv: the Preprint Server for Biology
Robert Verity, Richard A Nichols


A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model. However, the evidence in favour of a particular value of K cannot usually be computed exactly, and instead programs such as STRUCTURE make use of simple heuristic estimators to approximate this quantity. We show, using simulated data sets small enough that the true evidence can be computed exactly, that these simple heuristics often fail to estimate the true evidence, and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology, we demonstrate the effectiveness of this approach using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are orders of magnitude more accurate and precise than those based on simple heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Our solution is implemented fo...
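
As a rough illustration of the thermodynamic integration idea named in the abstract, the sketch below estimates a log evidence through the identity log Z = ∫₀¹ E_β[log L] dβ, where the expectation is taken under the "power posterior" proportional to L^β times the prior. It is not the authors' implementation and does not use a genetic mixture model. It assumes a toy model (y_i ~ N(mu, 1) with prior mu ~ N(0, tau2)) chosen because the power posterior is Gaussian and can be sampled directly, which stands in for the MCMC a real mixture model would need, and because the exact evidence is available in closed form for comparison. All function names, the rung spacing, and the default settings are hypothetical.

```python
import math
import random


def simulate_data(n=20, true_mu=1.0, seed=0):
    """Draw n observations from N(true_mu, 1) (toy data, assumed for illustration)."""
    rng = random.Random(seed)
    return [rng.gauss(true_mu, 1.0) for _ in range(n)]


def exact_log_evidence(y, tau2=1.0):
    """Closed-form log evidence for y_i ~ N(mu, 1) with prior mu ~ N(0, tau2)."""
    n = len(y)
    s = sum(y)
    ss = sum(v * v for v in y)
    # Marginally y ~ N(0, I + tau2 * 11^T); determinant and inverse follow
    # from the matrix determinant lemma and the Sherman-Morrison formula.
    quad = ss - tau2 * s * s / (1.0 + n * tau2)
    return -0.5 * (n * math.log(2.0 * math.pi) + math.log(1.0 + n * tau2) + quad)


def ti_log_evidence(y, rungs=40, draws=2000, power=3.0, tau2=1.0, seed=1):
    """Thermodynamic integration estimate of the log evidence for the toy model."""
    rng = random.Random(seed)
    n = len(y)
    s = sum(y)
    ss = sum(v * v for v in y)
    const = -0.5 * n * math.log(2.0 * math.pi)

    def log_lik(mu):
        # log L(mu) written in terms of the sufficient statistics s and ss.
        return const - 0.5 * (ss - 2.0 * mu * s + n * mu * mu)

    # Rungs beta in [0, 1], packed near 0 where E_beta[log L] changes fastest.
    betas = [(i / rungs) ** power for i in range(rungs + 1)]
    expected = []
    for beta in betas:
        # Power posterior L^beta * prior is Gaussian for this conjugate model.
        var = 1.0 / (1.0 / tau2 + beta * n)
        mean = beta * s * var
        sd = math.sqrt(var)
        total = sum(log_lik(rng.gauss(mean, sd)) for _ in range(draws))
        expected.append(total / draws)

    # Trapezoid rule over beta approximates the integral of E_beta[log L].
    return sum(
        0.5 * (betas[i + 1] - betas[i]) * (expected[i + 1] + expected[i])
        for i in range(rungs)
    )


if __name__ == "__main__":
    y = simulate_data()
    print("exact log evidence:", exact_log_evidence(y))
    print("TI estimate:       ", ti_log_evidence(y))
```

In a setting like the one the abstract describes, the direct Gaussian draws would be replaced by MCMC samples from each power posterior, and the estimate would be computed once per candidate value of K so that the resulting evidences can be compared.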