Abstract
Identifying insertion/deletion polymorphisms (INDELs) with high confidence in short-read sequencing data has been intrinsically challenging. Here we report an approach that improves INDEL calling accuracy by using a machine learning algorithm to combine call sets generated by three independent methods, leveraging the strengths of each individual pipeline. Using this approach, we generated a consensus exome INDEL call set from a large dataset produced by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. This consensus call set comprises 7,210 INDELs from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. We further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
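The abstract does not name the machine learning algorithm or the features it uses, so the following is only a minimal Python sketch of two supporting ideas: turning three call sets into per-site caller-support features that a classifier could consume, and computing an FDR from validated calls. The site key `(chrom, pos, ref, alt)`, the caller labels, and both function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical site key: (chromosome, position, reference allele, alternate allele).
Site = Tuple[str, int, str, str]


def support_features(call_sets: Dict[str, Set[Site]]) -> Dict[Site, Tuple[int, ...]]:
    """Map every site in the union of the call sets to a 0/1 vector.

    Each vector entry says whether the corresponding caller (in sorted
    name order) reported the site. A classifier could take such vectors,
    plus other annotations, as input; the paper's actual features are
    not specified in the abstract.
    """
    callers = sorted(call_sets)
    union: Set[Site] = set().union(*call_sets.values())
    return {
        site: tuple(int(site in call_sets[name]) for name in callers)
        for site in union
    }


def false_discovery_rate(true_positives: int, false_positives: int) -> float:
    """FDR = FP / (FP + TP), computed over calls that were validated."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no validated calls to compute an FDR from")
    return false_positives / total


# Toy data, invented for illustration only.
toy_calls = {
    "caller_a": {("1", 100, "AT", "A"), ("2", 250, "G", "GCC")},
    "caller_b": {("1", 100, "AT", "A")},
    "caller_c": {("1", 100, "AT", "A"), ("2", 250, "G", "GCC"), ("3", 75, "C", "CA")},
}
features = support_features(toy_calls)
```

In this toy example, a site reported by all three callers gets the vector `(1, 1, 1)`, while a site seen by only one caller gets a single `1`. An FDR of about 7.0% would correspond to roughly 7 false calls in every 100 validated calls.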