Sep 13, 2015

BamHash: a checksum program for verifying the integrity of sequence data

Bioinformatics
Arna Óskarsdóttir, Páll Melsted

Abstract

Large resequencing projects require a significant amount of storage for raw sequences, as well as alignment files. Because the raw sequences are redundant once the alignment has been generated, it is possible to keep only the alignment files. We present BamHash, a checksum-based method to ensure that the read pairs in FASTQ files match exactly the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the stored files and discover any discrepancies. Thus, BamHash can be used to determine whether it is safe to delete the FASTQ files storing raw sequencing reads after alignment, without loss of data. The software is implemented in C++, GPL licensed, and available at https://github.com/DecodeGenetics/BamHash. Contact: pmelsted@hi.is.
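The idea of a checksum that does not depend on read order can be illustrated with a small sketch. This is not BamHash's actual implementation. It assumes one plausible construction: hash each read record (name, sequence, quality) independently and add the hashes modulo 2^64, so that the total is the same for any ordering of the records. The function names, the tab-joined record encoding, and the choice of MD5 truncated to 64 bits are all assumptions made for illustration.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits


def record_hash(name, seq, qual):
    """Hash one read record to a 64-bit integer (hypothetical encoding)."""
    payload = f"{name}\t{seq}\t{qual}".encode("ascii")
    digest = hashlib.md5(payload).digest()
    return int.from_bytes(digest[:8], "little")


def order_independent_checksum(records):
    """Sum per-record hashes modulo 2**64.

    Addition is commutative, so the result does not depend on
    the order in which the records are visited.
    """
    total = 0
    for name, seq, qual in records:
        total = (total + record_hash(name, seq, qual)) & MASK64
    return total


# Toy records standing in for reads parsed from a FASTQ or BAM file.
reads = [
    ("read1/1", "ACGT", "IIII"),
    ("read1/2", "TTGA", "IIIH"),
    ("read2/1", "GGCA", "HHII"),
]
shuffled = [reads[2], reads[0], reads[1]]
corrupted = reads[:2] + [("read2/1", "GGCT", "HHII")]  # one base changed

same_despite_order = (
    order_independent_checksum(reads) == order_independent_checksum(shuffled)
)
detects_change = (
    order_independent_checksum(reads) != order_independent_checksum(corrupted)
)
```

In this sketch, matching checksums for the FASTQ-derived and BAM-derived record sets would suggest that both hold the same reads, while a mismatch would indicate a discrepancy. A real tool would also need to recover the original read orientation from the BAM file before hashing, which is omitted here.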
