DOI: 10.1101/460915 | Nov 4, 2018 | Paper

Fast and accurate shared segment detection and relatedness estimation in un-phased genetic data using TRUFFLE

BioRxiv : the Preprint Server for Biology
Apostolos Dimitromanolakis, Lei Sun


Relationship estimation and shared segment detection between individuals are important aspects of disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles for the identification and visualization of identity-by-descent (IBD) segments using un-phased data. By skipping the haplotype phasing step and relying instead on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups that occur as a consequence of genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. Consistent with expectation, we identified three pairs related as second cousins or closer across different populations, while commonly used methods identified over 15,000 such pairs. Similarly, within populations, we identified far fewer related pairs. Benchmarked against methods relying on phased data, TRUFFLE has a favorable accuracy profile but is drastically faster. We also identified specific local genomic re...
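
To make the idea of working on un-phased data concrete, here is a minimal Python sketch. It is not TRUFFLE's algorithm, whose details the abstract does not give. It assumes genotypes are coded as alternate-allele counts (0, 1, 2), and it uses the standard observation that two individuals who share a haplotype IBD cannot be opposite homozygotes (0 vs 2) at any site in that region, barring genotyping error. The function names `ibs_compatible_runs` and `merge_runs` and the `max_gap` parameter are hypothetical. The merge step is a crude stand-in for an error model: it rejoins runs that were split by a few isolated opposite-homozygote sites.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int]  # inclusive (start, end) marker indices


def ibs_compatible_runs(g1: Sequence[int], g2: Sequence[int]) -> List[Run]:
    """Find maximal runs of markers with no opposite homozygotes.

    g1, g2: unphased genotypes coded as alternate-allele counts (0, 1, 2).
    A site with |g1 - g2| == 2 shares no allele and breaks the run.
    """
    n = min(len(g1), len(g2))
    runs: List[Run] = []
    start = None
    for i in range(n):
        if abs(g1[i] - g2[i]) == 2:  # opposite homozygotes: run ends here
            if start is not None:
                runs.append((start, i - 1))
                start = None
        elif start is None:
            start = i
    if start is not None:
        runs.append((start, n - 1))
    return runs


def merge_runs(runs: Sequence[Run], max_gap: int = 1) -> List[Run]:
    """Rejoin consecutive runs separated by at most max_gap markers.

    Hypothetical stand-in for an error model: an isolated breaking site
    is treated as a possible genotyping error, not a true segment end.
    """
    merged: List[Run] = []
    for s, e in runs:
        if merged and s - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    a = [0, 1, 2, 2, 0, 1, 1, 2, 0]
    b = [0, 1, 2, 0, 0, 1, 2, 2, 2]
    runs = ibs_compatible_runs(a, b)
    print(runs)
    print(merge_runs(runs, max_gap=1))
```

A real detector would work in genetic distance (centimorgans) rather than marker counts and would use allele frequencies to separate true IBD from chance sharing. The sketch only shows why phase information is not needed for this kind of test.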
