Towards pan-genome read alignment to improve variation calling

BMC Genomics
Daniel Valenzuela, Veli Mäkinen

Abstract

A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, which comprise more than 15,000 whole genomes and more than 126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation. We propose a new unified framework for variant calling with short-read data that uses a representation of human genetic variation: a pan-genomic reference. We provide a modular pipeline that can be incorporated into existing sequencing data analysis workflows. Our tool is open source and available online at https://gitlab.com/dvalenzu/PanVC. Our experiments show that replacing a standard human reference with a pan-genomic one improves both single-nucleotide variant calling accuracy and short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.

Methods Mentioned

genotyping

Software Mentioned

GATK
PanVC
GATK HaplotypeCaller
IndelRealigner
Picard
CHIC Aligner
BWA
GATK RealignerTargetCreator
