Figaro: a novel statistical method for vector sequence removal.

Bioinformatics
James Robert White, Mihai Pop

Abstract

Sequences produced by automated Sanger sequencing machines frequently contain fragments of the cloning vector on their ends. Software tools currently available for identifying and removing the vector sequence require knowledge of the vector sequence, the specific splice sites and any adapter sequences used in the experiment, information that is often omitted from public databases. Furthermore, the clipping coordinates themselves are missing or incorrectly reported. As an example, of the approximately 1.24 billion shotgun sequences deposited in the NCBI Trace Archive, as many as approximately 735 million (approximately 60%) lack vector clipping information. Correct clipping information is essential to scientists attempting to validate, improve and even finish the increasingly large number of genomes released at a 'draft' quality level. We present here Figaro, a novel software tool for identifying and removing the vector from raw sequence data without prior knowledge of the vector sequence. The vector sequence is automatically inferred by analyzing the frequency of occurrence of short oligonucleotides using Poisson statistics. We show that Figaro achieves 99.98% sensitivity when tested on approximately 1.5 million shotgun reads from Dros...
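The idea of inferring vector sequence from oligonucleotide frequencies can be illustrated with a minimal sketch. This is not Figaro's implementation: the abstract does not give the algorithm's details, so the function names, the k-mer length, the 30-base prefix window, the pseudocount background model and the significance threshold below are all assumptions chosen for illustration. The sketch counts k-mers near the start of each read, estimates each k-mer's background rate from the remainder of the reads, and flags k-mers whose prefix count is improbably high under a Poisson model.

```python
import math
import random
from collections import Counter


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def count_kmers(seqs, k):
    """Count every overlapping k-mer in an iterable of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_kmers(reads, k=8, prefix_len=30, alpha=1e-6):
    """Flag k-mers that are unexpectedly frequent in read prefixes.

    Assumed model (illustrative only): the background rate of a k-mer
    is estimated from the read bodies with a pseudocount of 1, and the
    prefix count is tested against a Poisson with that expected value.
    """
    prefix_counts = count_kmers((r[:prefix_len] for r in reads), k)
    body_counts = count_kmers((r[prefix_len:] for r in reads), k)
    n_prefix = sum(prefix_counts.values())
    n_body = sum(body_counts.values())

    flagged = {}
    for kmer, observed in prefix_counts.items():
        rate = (body_counts[kmer] + 1) / (n_body + 4 ** k)
        expected = rate * n_prefix
        p_value = poisson_tail(observed, expected)
        if p_value < alpha:
            flagged[kmer] = p_value
    return flagged


def simulate_reads(vector, n_reads=200, insert_len=200, seed=0):
    """Hypothetical test data: a fixed vector prefix plus a random insert."""
    rng = random.Random(seed)
    return [
        vector + "".join(rng.choice("ACGT") for _ in range(insert_len))
        for _ in range(n_reads)
    ]


if __name__ == "__main__":
    vector = "ACGTTGCAAGGCTTAACCGG"   # made-up sequence, not a real vector
    flagged = overrepresented_kmers(simulate_reads(vector))
    print(sorted(flagged, key=flagged.get)[:5])
```

On simulated reads that share a fixed prefix, every k-mer of that prefix is flagged, along with some k-mers that straddle the vector/insert junction. A real tool would need a further step to turn flagged k-mers into per-read clipping coordinates, which this sketch does not attempt.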


