CPPred: coding potential prediction based on the global description of RNA sequence

Nucleic Acids Research
Xiaoxue Tong, Shiyong Liu


Rapid and accurate methods for distinguishing coding RNAs from ncRNAs are critical for analyzing the thousands of novel transcripts generated in recent years by next-generation sequencing technology. The previously developed methods CPAT, CPC2 and PLEK distinguish coding RNAs from ncRNAs very well, but perform poorly at distinguishing small coding RNAs from small ncRNAs. Here we report CPPred (coding potential prediction), an approach based on an SVM classifier and multiple sequence features, including novel RNA features encoded by the global description. Owing to these novel features, CPPred distinguishes not only coding RNAs from ncRNAs but also small coding RNAs from small ncRNAs better than the state-of-the-art methods. A recent study proposed 1335 novel human coding RNAs from a large number of RNA-seq datasets. However, CPPred predicts only 119 of these transcripts as coding RNAs. In fact, almost all of the proposed novel coding RNAs (91.1%) are classified as ncRNAs, which is consistent with previous reports. Remarkably, we also show that the global description features T2, C0 and GC play an important role in the prediction of coding potential.
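To make the "global description" features more concrete, the following is a minimal sketch of a composition-transition-distribution (CTD) style encoding of a nucleotide sequence. It is not the CPPred implementation. The function name `ctd_features`, the pair labels, the quantile indexing, and the reading of T2 as the relative position of the 50% occurrence of T, C0 as the relative position of the first C, and GC as the G/C transition frequency are all assumptions made for illustration, since the abstract does not define these names.

```python
# Illustrative CTD-style descriptors; names and conventions are assumptions,
# not taken from the CPPred source code.
PAIRS = ("AT", "AG", "AC", "TG", "TC", "GC")  # hypothetical pair labels
BASES = "ATGC"


def ctd_features(seq: str) -> dict:
    """Return composition, transition and distribution descriptors of seq."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    feats = {}

    # Composition: fraction of the sequence made up of each base.
    for b in BASES:
        feats[b] = seq.count(b) / n

    # Transition: fraction of adjacent positions that switch between the
    # two bases of a pair, in either order (e.g. GC counts both G>C and C>G).
    for pair in PAIRS:
        switches = sum(1 for p, q in zip(seq, seq[1:]) if {p, q} == set(pair))
        feats[pair] = switches / (n - 1)

    # Distribution: relative position (1-based index / length) of the first,
    # 25%, 50%, 75% and last occurrence of each base; 0.0 if the base is absent.
    for b in BASES:
        positions = [i + 1 for i, c in enumerate(seq) if c == b]
        for k, frac in enumerate((0.0, 0.25, 0.5, 0.75, 1.0)):
            if positions:
                idx = max(int(len(positions) * frac) - 1, 0)
                feats[f"{b}{k}"] = positions[idx] / n
            else:
                feats[f"{b}{k}"] = 0.0
    return feats
```

Under these assumptions, `ctd_features("ATGC")` yields 30 values (4 composition, 6 transition, 20 distribution), and entries such as `feats["T2"]`, `feats["C0"]` and `feats["GC"]` could be passed to a classifier alongside other sequence features.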



