DOI: 10.1101/454355 | Oct 26, 2018 | Paper

Fast Hierarchical Bayesian Analysis of Population Structure

BioRxiv : the Preprint Server for Biology
Gerry Tonkin-Hill, Jukka Corander

Abstract

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet Process Mixture (DPM) model for clustering multilocus genotype data. Our efficient model-based clustering approach can cluster datasets 10-100 times larger than existing model-based methods can handle, which we demonstrate by analysing an alignment of over 110,000 HIV-1 pol gene sequences. We also provide a method for rapidly partitioning an existing hierarchy so as to maximise the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while generally being significantly faster. The method is freely available under an open source MIT licence as an easy-to-use R package at https://github.com/gtonkinhill/fastbaps.
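To make the idea of scoring a partition by marginal likelihood concrete, the following is a minimal Python sketch, not the fastbaps implementation. It assumes each cluster and each alignment site has an independent multinomial allele distribution with a symmetric Dirichlet prior, and it omits the prior over partitions that a full DPM model would add. The function name `log_marginal_likelihood`, the `alpha` parameter, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import lgamma


def log_marginal_likelihood(sequences, labels, alphabet="ACGT", alpha=1.0):
    """Log marginal likelihood of a partition of aligned sequences.

    Assumption: every (cluster, site) pair is an independent
    Dirichlet-multinomial with a symmetric Dirichlet(alpha) prior.
    The prior over partitions is deliberately left out.
    """
    n_sites = len(sequences[0])
    k = len(alphabet)
    total = 0.0
    for cluster in set(labels):
        members = [s for s, lab in zip(sequences, labels) if lab == cluster]
        for site in range(n_sites):
            # Allele counts at this site within this cluster.
            counts = Counter(seq[site] for seq in members)
            n = sum(counts[a] for a in alphabet)
            # Closed-form integral of the multinomial over the Dirichlet prior.
            total += lgamma(k * alpha) - lgamma(k * alpha + n)
            for a in alphabet:
                total += lgamma(alpha + counts[a]) - lgamma(alpha)
    return total


# Toy alignment with two clearly distinct groups (hypothetical data).
seqs = ["AAAA", "AAAA", "CCCC", "CCCC"]
two_clusters = log_marginal_likelihood(seqs, [0, 0, 1, 1])
one_cluster = log_marginal_likelihood(seqs, [0, 0, 0, 0])
print(two_clusters > one_cluster)
```

Under these assumptions the two-cluster partition scores higher than the single-cluster one, which is the kind of comparison a marginal-likelihood criterion relies on when deciding where to cut a hierarchy.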
