A recurrence-based approach for validating structural variation using long-read sequencing technology

GigaScience
Xuefang Zhao, Ryan E Mills

Abstract

Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, few approaches exist for evaluating their results. This gap matters because the accurate identification of structural variation remains an important unsolved problem in genomics. New sequencing technologies that generate longer reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this way require large computational resources to assemble the sequences, as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report high overall accuracy across different levels of sequence depth. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations, and that it provides additional features concerning breakpoint precision and predicted genotype. We furth...

Related Concepts

In Silico
Receiver Operating Characteristic
Reproducibility of Results
Sequence Determinations, DNA
Computational Molecular Biology
Genomics
Genomic Structural Variation
DNA Copy Number Changes
High-Throughput Nucleotide Sequencing
Genome
