Mar 2, 2015

BamHash: a checksum program for verifying the integrity of sequence data

bioRxiv: the Preprint Server for Biology
Arna Óskarsdóttir, Páll Melsted

Abstract

Summary: Large resequencing projects require a significant amount of storage for raw sequences as well as alignment files. Because the raw sequences are redundant once the alignment has been generated, it is possible to keep only the alignment files. We present BamHash, a checksum-based method to ensure that the read pairs in FASTQ files exactly match the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the stored files and to discover any discrepancies. It can therefore be used to determine whether the FASTQ files holding the raw sequencing reads can be deleted after alignment without loss of data.

Availability and Implementation: The software is implemented in C++, GPL licensed, and available at https://github.com/DecodeGenetics/BamHash

Contact: pmelsted{at}hi.is
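
To illustrate how a checksum can be made independent of read order, the following is a minimal Python sketch and not BamHash's actual implementation, which the abstract does not describe. It assumes each read is reduced to a (name, sequence, quality) triple, hashes each triple with SHA-256, and adds the first 64 bits of every digest modulo 2^64. Because addition is commutative, the total does not depend on the order in which reads are visited. The function names, the choice of SHA-256, and the sample reads are all hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits


def record_hash(name, seq, qual):
    """Hash one read (hypothetical record layout) to a 64-bit integer."""
    payload = "\n".join((name, seq, qual)).encode("ascii")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "little")


def order_independent_checksum(records):
    """Return (sum of per-read hashes mod 2**64, number of reads)."""
    total = 0
    count = 0
    for name, seq, qual in records:
        total = (total + record_hash(name, seq, qual)) & MASK64
        count += 1
    return total, count


# Made-up reads standing in for the contents of a FASTQ file.
fastq_reads = [
    ("read1/1", "ACGTACGT", "IIIIIIII"),
    ("read1/2", "TTGACCAA", "IIIIHHHH"),
    ("read2/1", "GGGTTTAA", "HHHHIIII"),
]

# The same reads in a different order, as they might appear after alignment.
bam_reads = list(reversed(fastq_reads))

# One base changed in the first read, simulating a discrepancy.
corrupted_reads = [("read1/1", "ACGTACGA", "IIIIIIII")] + fastq_reads[1:]

same_content = order_independent_checksum(fastq_reads) == order_independent_checksum(bam_reads)
detects_change = order_independent_checksum(fastq_reads) != order_independent_checksum(corrupted_reads)
```

In this sketch, `same_content` compares the original and the reordered read sets, and `detects_change` compares the original against the set with one altered base. Returning the read count alongside the sum means a missing or duplicated read also changes the result.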
