Abstract
Single nucleotide variant (SNV) detection procedures are increasingly used to analyze the recent abundance of high-throughput DNA sequencing data, on both single-sample and multiple-sample datasets. Building on previously published work on the single-sample SNV caller genotype model selection (GeMS), we introduce a multiple-sample version of GeMS (MultiGeMS). Unlike other popular multiple-sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple-sample SNV calling and uses high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple-sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision in low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://gith...