Oct 25, 2018

SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology

BioRxiv: the Preprint Server for Biology
Simon R Harris

Abstract

Genome sequencing is revolutionising infectious disease epidemiology, providing a huge step forward in sensitivity and specificity over more traditional molecular typing techniques. However, the complexity of genome data often means that its analysis and interpretation require high-performance compute infrastructure and dedicated bioinformatics support. Furthermore, current methods have limitations that can differ between analyses and are often opaque to the user, and their reliance on multiple external dependencies makes reproducibility difficult. Here I introduce SKA, a toolkit for analysis of genome sequence data from closely-related, small, haploid genomes. SKA uses split kmers to rapidly identify variation between genome sequences, making it possible to analyse hundreds of genomes on a standard home computer. Tests on publicly available simulated and real-life data show that SKA is both faster and more efficient than the gold standard methods used today while retaining similar levels of accuracy for epidemiological purposes. SKA can take raw read data or genome assemblies as input and can calculate pairwise distances, create single linkage clusters, and align genomes either to a reference genome or using a reference-free approach. SK...
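The abstract names three operations: finding variation with split kmers, computing pairwise distances, and single linkage clustering. The following is a minimal Python sketch of how these could fit together. It is not SKA's implementation. It assumes that a split kmer is a pair of flanking kmers around a single middle base (the abstract does not define the term), it reads one strand only, and the function names `split_kmers`, `snp_distance` and `single_linkage` are hypothetical.

```python
from itertools import combinations


def split_kmers(seq, k=5):
    """Map each (left flank, right flank) pair to the middle base between them.

    Assumption: a split kmer is two k-base flanks around one middle base.
    Flank pairs seen with more than one middle base in the same sequence
    are dropped as ambiguous. Only the forward strand is read.
    """
    seq = seq.upper()
    width = 2 * k + 1
    found = {}
    ambiguous = set()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set("ACGT"):
            continue  # skip windows containing N or other non-ACGT symbols
        key = (window[:k], window[k + 1:])
        middle = window[k]
        if key in found and found[key] != middle:
            ambiguous.add(key)
        found[key] = middle
    return {key: mid for key, mid in found.items() if key not in ambiguous}


def snp_distance(a, b):
    """Count shared flank pairs whose middle bases differ."""
    return sum(1 for key in a.keys() & b.keys() if a[key] != b[key])


def single_linkage(profiles, threshold):
    """Group samples linked by any chain of pairs within `threshold` SNPs."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in combinations(profiles, 2):
        if snp_distance(profiles[x], profiles[y]) <= threshold:
            parent[find(x)] = find(y)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Made-up toy sequences, not real genome data.
    ref = "ACGTTGCAAGCTTAGGCTAACGGATCCTGAGTCATGCCAT"
    var = ref[:20] + "T" + ref[21:]  # one substitution at position 20
    profiles = {"ref": split_kmers(ref), "var": split_kmers(var)}
    print(snp_distance(profiles["ref"], profiles["var"]))
    print(single_linkage(profiles, threshold=1))
```

In this sketch a substitution is detected only by the window centred on it, since every other window covering that base carries the change in a flank and so no longer shares a key with the other sample. The distance counts differences at shared flank pairs only, so two unrelated sequences with no shared split kmers would score 0; a practical tool would also need to track how many split kmers two samples share.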

Mentioned in this Paper

Environmental Infrastructure
Genome
Molecular Probe Techniques
Bio-Informatics
Haploid Cell
Genomics
Comparative Genomic Analysis
Epidemiology
Analysis
SKA1
