Improving eukaryotic genome annotation using single molecule mRNA sequencing

BMC Genomics
Vincent Magrini … Makedonka Mitreva

Abstract

The advantages of Pacific Biosciences (PacBio) single-molecule real-time (SMRT) technology include long reads, low systematic bias, and high consensus read accuracy. Here we use these attributes to improve the genome annotation of the parasitic hookworm Ancylostoma ceylanicum using PacBio RNA-Seq. We sequenced 192,888 circular consensus sequences (CCS) derived from cDNAs generated using the Clontech SMARTer system. These SMARTer-SMRT libraries were normalized and size-selected, providing a robust population of expressed structural genes for subsequent genome annotation. Compared with annotation based on conventional sequencing-by-synthesis alone, annotation incorporating the PacBio mRNA sequences identified 1609 (9.2%) new genes, extended the length of 3965 (26.7%) genes, and increased the total genomic exon length by 1.9 Mb (12.4%). Representation of non-coding sequence (primarily UTRs, owing to oligo(dT)-primed reverse transcription) improved most markedly: its total length increased fifteen-fold through gains in both the length and the number of UTR exons. In addition, the UTR data provided by these CCS allowed the identification of a novel SL2 splice leader sequence for A. ceylanicum and an inc…
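The abstract measures annotation improvement as three tallies: new genes, lengthened genes, and added exon length. The following is a minimal, hypothetical sketch of how such tallies could be computed when comparing a baseline annotation with a revised one. It assumes gene models have already been parsed into per-gene exon intervals in 1-based inclusive coordinates; the `Gene` class, the `compare` function, and the overlap-based matching rule are illustrative assumptions, not the authors' pipeline, whose exact criteria are not given in this abstract.

```python
# Hypothetical sketch (not the authors' pipeline): tally how a revised gene
# annotation differs from a baseline one, using simple overlap rules.
from dataclasses import dataclass
from itertools import groupby

# (contig, start, end); 1-based inclusive coordinates, as in GFF3 (assumed).
Interval = tuple[str, int, int]


@dataclass
class Gene:
    gene_id: str
    exons: list[Interval]  # assumed non-empty and all on one contig

    def span(self) -> Interval:
        """Genomic extent of the gene: lowest exon start to highest exon end."""
        contig = self.exons[0][0]
        return (contig,
                min(start for _, start, _ in self.exons),
                max(end for _, _, end in self.exons))


def length(iv: Interval) -> int:
    return iv[2] - iv[1] + 1


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def covered_bases(intervals: list[Interval]) -> int:
    """Count genomic bases covered by the intervals, merging overlaps."""
    total = 0
    for _, group in groupby(sorted(intervals), key=lambda iv: iv[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)  # overlapping exon: extend run
        if cur_end is not None:
            total += cur_end - cur_start + 1
    return total


def compare(old: list[Gene], new: list[Gene]) -> dict[str, int]:
    """Count new genes, lengthened genes and net change in exon coverage."""
    new_genes = extended = 0
    for gene in new:
        span = gene.span()
        # Baseline genes at the same locus (quadratic scan; fine for a sketch).
        hits = [g for g in old if overlaps(g.span(), span)]
        if not hits:
            new_genes += 1        # no baseline gene overlaps this model
        elif length(span) > max(length(g.span()) for g in hits):
            extended += 1         # model spans more sequence than any match
    exon_gain = (covered_bases([e for g in new for e in g.exons])
                 - covered_bases([e for g in old for e in g.exons]))
    return {"new_genes": new_genes,
            "extended_genes": extended,
            "exon_bp_gained": exon_gain}


# Toy data: g1 is extended at its left end; g2 is an entirely new locus.
old = [Gene("g1", [("chr1", 100, 200), ("chr1", 300, 400)])]
new = [Gene("g1", [("chr1", 50, 200), ("chr1", 300, 400)]),
       Gene("g2", [("chr1", 1000, 1100)])]
print(compare(old, new))
# {'new_genes': 1, 'extended_genes': 1, 'exon_bp_gained': 151}
```

The matching rule is deliberately crude: it ignores strand, does not distinguish merged or split loci, and scans every baseline gene for each new model. A real comparison would parse GFF3 records, match on strand, and use an interval index instead of a linear scan.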


Citations

Aug 10, 2019·Frontiers in Genetics·Nam V Hoang … Robert J Henry
Nov 17, 2020·PLoS Neglected Tropical Diseases·Nicolas J Wheeler … Mostafa Zamanian
Nov 24, 2019·Zoology : Analysis of Complex Systems, ZACS·Ira Cooke … Consortium of Australian Academy of Science Boden Research Conference Participants


Datasets Mentioned

PRJNA72583

Methods Mentioned

Illumina sequencing
cDNA library
PCR
RNA-Seq
whole genome shotgun

Software Mentioned

HTSeq
PacBio
in-house Perl script
BLASTX
RPSBLAST
RNAmmer
Orig
SMRT
BLAST
SMARTer
