geck: trio-based comparative benchmarking of variant calls

Bioinformatics
Péter Kómár, Deniz Kural

Abstract

Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes were previously determined with high confidence. An alternative approach to benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information and enable benchmarking of less-studied variants and diverse populations. We introduce a statistical mixture model for comparing two variant calling pipelines from the genotype data they produce after running on the individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty. The Python library geck and usage examples are available at https://github.com/sbg/geck under the GNU General Public License v3. Supplementary data are available at Bioinformatics online.
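
To make the notion of a Mendelian violation concrete, the following is a minimal sketch, not the geck API and not the mixture model itself. It assumes autosomal, biallelic sites with genotypes encoded as alternate-allele counts (0, 1, 2), and it ignores de novo mutations. All function names are hypothetical and chosen for illustration only.

```python
from itertools import product


def transmissible_alleles(genotype):
    """Alleles (0 = ref, 1 = alt) that a parent with the given
    alternate-allele count (0, 1 or 2) can pass to a child."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child's alternate-allele count can be formed by
    taking exactly one allele from each parent."""
    return any(
        a + b == child
        for a, b in product(transmissible_alleles(father),
                            transmissible_alleles(mother))
    )


def violation_rate(trio_calls):
    """Fraction of (father, mother, child) genotype triples that
    violate Mendelian inheritance."""
    trio_calls = list(trio_calls)
    violations = sum(not is_mendelian_consistent(*t) for t in trio_calls)
    return violations / len(trio_calls)


# Enumerate all 3**3 = 27 possible genotype trios at a biallelic site.
consistent_trios = [
    t for t in product(range(3), repeat=3) if is_mendelian_consistent(*t)
]
```

Under these assumptions, 15 of the 27 possible genotype trios are consistent with Mendelian inheritance and 12 are violations. A raw violation rate such as this one cannot say which family member was miscalled or which pipeline made the error, which is why a statistical model over trio genotypes is needed.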


