Advances in the recovery of haplotypes from the metagenome

BioRxiv : the Preprint Server for Biology
Samuel M Nicholls, Amanda Clare

Abstract

High-throughput DNA sequencing has enabled us to look beyond consensus reference sequences to the variation observed in sequences within organisms: their haplotypes. Recovery, or assembly, of haplotypes has proved computationally difficult, and there exist many probabilistic heuristics that attempt to recover the original haplotypes for a single organism of known ploidy. However, existing approaches make simplifications or assumptions that are easily violated when investigating sequence variation within a metagenome. We propose the "metahaplome" as the set of haplotypes for any particular genomic region of interest within a metagenomic data set and present Hansel and Gretel, a data structure and algorithm that together provide a proof-of-concept framework for the recovery of true haplotypes from a metagenomic data set. The algorithm performs incremental haplotype recovery using smoothed Naive Bayes, a simple, efficient and effective method. Hansel and Gretel offer several advantages over existing solutions: the framework is capable of recovering haplotypes from metagenomes, does not require a priori knowledge about the input data, makes no assumptions regarding the distribution of alleles at variant sites, is robust to error, an...
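
To make the idea of incremental recovery with smoothed Naive Bayes concrete, the following is a minimal sketch and not the Hansel and Gretel implementation. It assumes that reads have already been reduced to the alleles they show at numbered variant sites, that sites are visited left to right, and that add-one (Laplace) smoothing is used. The function names (count_evidence, next_allele, recover_haplotype) and the alpha parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

NUCLEOTIDES = "ACGT"


def count_evidence(reads):
    """Count single-site and pairwise allele observations.

    reads: list of dicts mapping variant-site index -> observed allele
    (an assumed input format, not taken from the paper).
    """
    site_counts = defaultdict(int)   # (site, allele) -> number of reads
    pair_counts = defaultdict(int)   # (i, allele_i, j, allele_j), i < j
    for read in reads:
        sites = sorted(read)
        for a, i in enumerate(sites):
            site_counts[(i, read[i])] += 1
            for j in sites[a + 1:]:
                pair_counts[(i, read[i], j, read[j])] += 1
    return site_counts, pair_counts


def next_allele(site_counts, pair_counts, path, site, alpha=1.0):
    """Pick the allele at `site` with the highest smoothed Naive Bayes score.

    path: dict of earlier site -> allele already chosen (all sites < site).
    """
    k = len(NUCLEOTIDES)
    site_total = sum(site_counts.get((site, x), 0) for x in NUCLEOTIDES)
    best, best_score = None, -1.0
    for cand in NUCLEOTIDES:
        # Smoothed prior: how often `cand` is seen at this site.
        score = (site_counts.get((site, cand), 0) + alpha) / (site_total + alpha * k)
        for i, allele in path.items():
            # Smoothed likelihood: P(allele at earlier site i | cand at site).
            joint = pair_counts.get((i, allele, site, cand), 0)
            total = sum(pair_counts.get((i, x, site, cand), 0) for x in NUCLEOTIDES)
            score *= (joint + alpha) / (total + alpha * k)
        if score > best_score:
            best, best_score = cand, score
    return best


def recover_haplotype(reads, n_sites, start_allele):
    """Extend one haplotype a site at a time from a chosen first allele."""
    site_counts, pair_counts = count_evidence(reads)
    path = {0: start_allele}
    for site in range(1, n_sites):
        path[site] = next_allele(site_counts, pair_counts, path, site)
    return "".join(path[s] for s in range(n_sites))
```

With toy reads drawn from two haplotypes over three sites, calling recover_haplotype once per starting allele extends each starting allele along the reads that support it. The smoothing term keeps allele pairs that were never observed together from forcing a score of zero, which is one way a method of this kind can tolerate sequencing error.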

Related Concepts

Alleles
Genes
Genome
Site
Evaluation
Structure
Single Nucleotide Polymorphism
Molecular Assembly/Self Assembly
Metagenome
High-Throughput DNA Sequencing
