The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes

BMC Genomics
Danny Challis, Fuli Yu

Abstract

Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Using this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. This consensus call set features 7,210 INDELs from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. We further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
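
To make the idea of combining call sets concrete, the following is a minimal sketch only. The abstract does not describe the machine learning model itself, so a simple support-count vote is swapped in here as a stand-in, and the names `Indel`, `consensus_calls`, `false_discovery_rate`, and the `min_support` threshold are hypothetical. The FDR helper assumes a set of experimentally validated calls is available, which is also an assumption.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class Indel(NamedTuple):
    """One INDEL call, keyed by position and alleles (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_calls(call_sets: Iterable[set[Indel]], min_support: int = 2) -> set[Indel]:
    """Keep INDELs reported by at least `min_support` of the input call sets.

    This is a plain vote, NOT the machine learning combination the abstract
    refers to; it only illustrates merging calls from independent pipelines.
    """
    support = Counter(indel for calls in call_sets for indel in calls)
    return {indel for indel, n in support.items() if n >= min_support}


def false_discovery_rate(calls: set[Indel], validated_true: set[Indel]) -> float:
    """Fraction of calls not confirmed by validation (false positives / all calls)."""
    if not calls:
        return 0.0
    return len(calls - validated_true) / len(calls)


# Toy data, invented for illustration only.
a = Indel("1", 1000, "AT", "A")
b = Indel("2", 2000, "G", "GCC")
c = Indel("3", 3000, "TAA", "T")

pipeline_1 = {a, b}
pipeline_2 = {a, c}
pipeline_3 = {a, b}

consensus = consensus_calls([pipeline_1, pipeline_2, pipeline_3])
fdr = false_discovery_rate(consensus, validated_true={a})
```

A vote like this trades sensitivity for specificity through a single threshold; a trained classifier, as the abstract describes, could instead weigh each pipeline's evidence per site.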

Datasets Mentioned

NA19238
NA10851

Methods Mentioned

exome sequencing
PCR
exome capture
genotyping
454 sequencing

Software Mentioned

1000G
BLAT
vcflib
SOLiD
UnifiedGenotyper
INDEL
CrossMatch
BWA aligner
Integrative Genomics Viewer (IGV)
FreeBayes
