svclassify: a method to establish benchmark structural variant calls

BMC Genomics
Hemang Parikh ... Marc Salit

Abstract

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Because SV callers output highly discordant results, we developed methods that combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs as likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files, which may come from many high-throughput sequencing technologies, and then builds a one-class model from these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions, as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-S...
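The idea of a one-class model built from per-candidate annotations can be illustrated with a minimal sketch. This is not the svclassify implementation: the annotation names, the example values, the root-mean-square z-score, and the threshold of 3.0 are all assumptions chosen for illustration. The model is fit only on candidates believed to be true SVs, and a new candidate is labeled by how far its annotations fall from that training set.

```python
from math import sqrt
from statistics import mean, pstdev


def fit_one_class(training_rows):
    """Learn a per-annotation center and spread from likely-true SVs only."""
    columns = list(zip(*training_rows))
    centers = [mean(col) for col in columns]
    # Fall back to 1.0 when an annotation has zero spread, so division is defined.
    scales = [pstdev(col) or 1.0 for col in columns]
    return centers, scales


def novelty_score(row, model):
    """Root-mean-square z-score; small values resemble the training SVs."""
    centers, scales = model
    squared = [((x - c) / s) ** 2 for x, c, s in zip(row, centers, scales)]
    return sqrt(sum(squared) / len(squared))


def classify(row, model, threshold=3.0):
    """Label a candidate using a hypothetical cutoff on the novelty score."""
    if novelty_score(row, model) <= threshold:
        return "likely_sv"
    return "likely_non_sv"


# Hypothetical annotations per candidate deletion:
# (coverage ratio inside the call, discordant-pair fraction, soft-clip fraction)
training = [
    (0.50, 0.30, 0.20),
    (0.55, 0.28, 0.22),
    (0.45, 0.33, 0.18),
    (0.52, 0.31, 0.21),
]
model = fit_one_class(training)

print(classify((0.51, 0.30, 0.20), model))  # resembles the training set
print(classify((1.00, 0.01, 0.00), model))  # normal coverage, no supporting reads
```

A one-class formulation needs only examples of one class (here, likely-true SVs) for training, which suits a setting where no trusted set of false SV calls is available.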

Datasets Mentioned

NA12878
PRJEB3381
SAMEA1573618
SAMEA1573615
SAMEA1573616
NA12892

Methods Mentioned

PCR
MDS

Software Mentioned

Platinum
Spiral Genetics
GIAB
MetaSV
Spiral
BreakSeq
GeT
Perl
BreakDancer
Integrative Genomics Viewer (IGV)
