Improve homology search sensitivity of PacBio data by correcting frameshifts

Bioinformatics
Nan Du, Yanni Sun

Abstract

Single-molecule, real-time (SMRT) sequencing, developed by Pacific Biosciences (PacBio), produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assemblies, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data have a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts, which can leave only short alignments with marginal scores. As a result, it is hard to distinguish true alignments from random ones, and this ambiguity introduces errors into structural and functional annotation. Existing frameshift correction tools are designed for data with much lower error rates and are not optimized for PacBio data. As an increasing number of groups use SMRT sequencing, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against charac...
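To make the frameshift problem concrete, the following minimal Python sketch translates a short, made-up coding sequence before and after a single inserted base. It is only an illustration of why indel errors disrupt protein-level alignment, not Frame-Pro's algorithm; the sequence, the insertion position, and the helper name `translate` are assumptions chosen for the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq in the given forward frame (0, 1, or 2); a trailing partial codon is ignored."""
    return "".join(
        CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3)
    )


# Hypothetical error-free coding sequence (illustrative only).
gene = "ATGGCCAAGCTGACCGAA"

# Simulate one insertion error, the dominant error type described above.
noisy = gene[:5] + "A" + gene[5:]

print(translate(gene))            # MAKLTE
print(translate(noisy))           # MAQADR: residues after the insertion no longer match
print(translate(noisy, frame=1))  # WHKLTE: the downstream residues reappear in another frame
```

A single inserted base leaves the protein intact only up to the error; the remainder is recoverable only in a different reading frame. With many indels per read, the true translation is scattered across frames in short pieces, which is consistent with the short, low-scoring alignments the abstract describes.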
