NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types

Nucleic Acids Research
Sejoon Lee … Peter J Park

Abstract

In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. The software is available at https://github.com/parklab/NGSCheckMate.
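The idea of comparing allele read fractions at known SNPs can be illustrated with a minimal sketch. This is not NGSCheckMate's code or interface: the function names, the toy read counts, and the choice of Pearson correlation as the similarity metric are all assumptions made for illustration, and the depth-dependent model described in the abstract is not reproduced here.

```python
from math import sqrt


def allele_fractions(counts, min_depth=1):
    """Map SNP id -> alternate-allele read fraction.

    `counts` maps a SNP id to a (ref_reads, alt_reads) tuple.
    Sites with fewer than `min_depth` total reads are skipped.
    (Hypothetical helper, not part of NGSCheckMate.)
    """
    fractions = {}
    for snp, (ref, alt) in counts.items():
        depth = ref + alt
        if depth >= min_depth:
            fractions[snp] = alt / depth
    return fractions


def pearson(xs, ys):
    """Pearson correlation, or None if it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def vaf_similarity(counts_a, counts_b, min_depth=1):
    """Correlate allele fractions over SNPs covered in both samples.

    Assumption: correlation stands in for the similarity metric;
    a real tool would also adjust its decision threshold for depth.
    """
    fa = allele_fractions(counts_a, min_depth)
    fb = allele_fractions(counts_b, min_depth)
    shared = sorted(fa.keys() & fb.keys())
    return pearson([fa[s] for s in shared], [fb[s] for s in shared])


# Toy read counts (ref, alt) at four made-up SNP ids.
tumor = {"rs1": (10, 0), "rs2": (5, 5), "rs3": (0, 8), "rs4": (6, 6)}
matched_normal = {"rs1": (20, 0), "rs2": (9, 11), "rs3": (0, 15), "rs4": (7, 8)}
unrelated = {"rs1": (0, 12), "rs2": (10, 0), "rs3": (6, 6), "rs4": (0, 9)}

print(vaf_similarity(tumor, matched_normal))  # close to 1 for the same individual
print(vaf_similarity(tumor, unrelated))       # much lower for a different individual
```

In this toy setting a fixed cutoff would separate the two pairs, but at low sequencing depth allele fractions become noisy, which is why the abstract stresses modeling how the similarity metric behaves as a function of depth.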



