Jul 12, 2016

Fast genotyping of known SNPs through approximate k-mer matching

BioRxiv : the Preprint Server for Biology
Ariya Shajii, Bonnie Berger


Motivation: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterization of known genetic traits and causative disease variants within an individual, as well as the initial stage of many ancestral and population genomic pipelines (e.g. GWAS).

Results: We introduce LAVA (Lightweight Assignment of Variant Alleles), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while usi...
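To make the idea of genotyping by approximate k-mer matching concrete, here is a minimal Python sketch. It is not LAVA's implementation: the function names (`build_index`, `count_alleles`, `call_genotype`), the Hamming-distance-1 neighbor search, the rule of discarding ambiguous k-mers, and the `min_fraction` threshold are all illustrative assumptions, and reverse-complement reads are ignored for brevity. The abstract uses k = 32; a toy run would use a much smaller k.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference, snps, k):
    """Map each k-mer overlapping a known SNP to {(position, allele)}.

    snps: dict of 0-based position -> (ref_allele, alt_allele).
    Hypothetical helper; only k-mers that cover a SNP are indexed.
    """
    index = defaultdict(set)
    for pos, alleles in snps.items():
        start = max(0, pos - k + 1)
        end = min(len(reference), pos + k)
        for allele in alleles:
            window = reference[start:pos] + allele + reference[pos + 1:end]
            for kmer in kmers(window, k):
                index[kmer].add((pos, allele))
    return index


def neighbors(kmer):
    """Yield all k-mers at Hamming distance exactly 1 from kmer."""
    for i, base in enumerate(kmer):
        for sub in "ACGT":
            if sub != base:
                yield kmer[:i] + sub + kmer[i + 1:]


def lookup(kmer, index):
    """Return index hits, preferring an exact match over 1-mismatch matches."""
    if kmer in index:
        return index[kmer]
    hits = set()
    for cand in neighbors(kmer):
        hits |= index.get(cand, set())
    return hits


def count_alleles(reads, index, k):
    """Count allele support per SNP, using only unambiguous k-mers."""
    counts = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            hits = lookup(kmer, index)
            if len(hits) == 1:  # assumption: skip k-mers matching >1 allele/locus
                pos, allele = next(iter(hits))
                counts[pos][allele] += 1
    return counts


def call_genotype(allele_counts, min_fraction=0.2):
    """Naive call: report alleles supported by >= min_fraction of k-mers.

    The threshold is an arbitrary illustrative choice, not taken from the paper.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return None
    return tuple(sorted(a for a, c in allele_counts.items()
                        if c / total >= min_fraction))
```

In this sketch no read is ever aligned: each read is cut into k-mers, and each k-mer is looked up directly in a dictionary built from the known SNP loci. A k-mer carrying a single sequencing error away from the SNP can still vote for an allele through its 1-mismatch neighbors, which is the sense of "approximate" assumed here.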

Mentioned in this Paper

Computer Software
Severe Acute Respiratory Syndrome
Genome-Wide Association Study
Single Nucleotide Polymorphism Database
EPILEPSY, IDIOPATHIC GENERALIZED, SUSCEPTIBILITY TO, HAPLOTYPE (dbSNP rs674351, dbSNP rs584087, dbSNP rs585344, dbSNP rs608781, dbSNP rs642698, dbSNP rs674210, dbSNP rs645088, dbSNP rs649224, dbSNP rs654136)
Sequence Analysis
