Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi

BMC Bioinformatics
Jens Keilwagen, Jan Grau

Abstract

Genome annotation is of key importance in many research questions. The identification of protein-coding genes is often based on transcriptome sequencing data, ab-initio or homology-based prediction. Recently, it was demonstrated that intron position conservation improves homology-based gene prediction, and that experimental data improves ab-initio gene prediction. Here, we present an extension of the gene prediction program GeMoMa that utilizes amino acid sequence conservation, intron position conservation and optionally RNA-seq data for homology-based gene prediction. We show on published benchmark data for plants, animals and fungi that GeMoMa performs better than the gene prediction programs BRAKER1, MAKER2, and CodingQuarry, and purely RNA-seq-based pipelines for transcript identification. In addition, we demonstrate that using multiple reference organisms may help to further improve the performance of GeMoMa. Finally, we apply GeMoMa to four nematode species and to the recently published barley reference genome, indicating that current annotations of protein-coding genes may be refined using GeMoMa predictions. GeMoMa might be of great utility for annotating newly sequenced genomes but also for finding homologs of a specifi...
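
As a rough illustration of how RNA-seq data can back up a homology-based prediction, the sketch below computes the fraction of a predicted transcript's introns that are also seen as split-read introns in RNA-seq mappings. It is a toy example under stated assumptions, not GeMoMa's implementation: the function name intron_support, the (chromosome, strand, start, end) tuple layout, and the coordinates are all hypothetical.

```python
def intron_support(predicted_introns, rnaseq_introns):
    """Return the fraction of predicted introns that exactly match an
    intron observed in RNA-seq split-read mappings.

    Both arguments are iterables of (chrom, strand, start, end) tuples.
    This layout is an assumption made for the example only.
    """
    predicted = set(predicted_introns)
    if not predicted:
        # Single-exon prediction: there are no introns to check.
        return None
    supported = predicted & set(rnaseq_introns)
    return len(supported) / len(predicted)


# Hypothetical data: one predicted transcript with two introns,
# only the first of which is observed in the RNA-seq mappings.
prediction = [("chr1", "+", 1200, 1350), ("chr1", "+", 1500, 1720)]
rnaseq = {("chr1", "+", 1200, 1350), ("chr1", "+", 3000, 3100)}

support = intron_support(prediction, rnaseq)
```

A score like this could be used to rank or filter candidate gene models, with single-exon predictions handled separately because they have no introns to confirm.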

Datasets Mentioned

SRP071745

Methods Mentioned

RNA-seq
homology-based prediction

Software Mentioned

Maker2
StringTie
TransDecoder
BRAKER1
Galaxy
Hisat2
Cufflinks
CompareTranscripts
GAF
