A novel riboswitch classification based on imbalanced sequences achieved by machine learning

PLoS Computational Biology
S. S. Beyene, Ming Chen

Abstract

A riboswitch, a regulatory segment of mRNA (50-250 nt in length), has two main domains: the aptamer and the expression platform. One of the main challenges in riboswitch classification is imbalanced data, a circumstance in which one group has very few sequence records compared with the others. Such imbalance leads a classifier to ignore the minority group and emphasize the majority ones, which results in skewed classification. To be in accord with recent riboswitch classification work, we considered sixteen riboswitch families that contain imbalanced sequences. The sequences were split into training and test sets using a newly developed pipeline. From the 5460 k-mers produced (k from 1 to 6), 156 features were selected using the CfsSubsetEval evaluator and the BestFirst search method in WEKA 3.8. Statistical testing showed that results differed significantly between the balanced and imbalanced sequences (p < 0.05). In addition, each algorithm showed a significant difference in sensitivity, specificity, accuracy, and macro F-score when applied to the two groups (p < 0.05). Several k-mers clustered from the heat map were found to have biological functions and motifs at different positions, such as interior loops, terminal loops and hel...
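The figure of 5460 k-mers follows from the four-letter nucleotide alphabet: 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 possible k-mers for k from 1 to 6. The sketch below illustrates one way such a feature vector could be built. It is not the authors' pipeline: the function names, the RNA alphabet `ACGU`, the use of overlapping windows, and the per-k normalization are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumption: sequences use the RNA alphabet


def all_kmers(k_min=1, k_max=6):
    """Enumerate every possible k-mer over ALPHABET for k in [k_min, k_max]."""
    return [
        "".join(letters)
        for k in range(k_min, k_max + 1)
        for letters in product(ALPHABET, repeat=k)
    ]


def kmer_frequencies(seq, k_min=1, k_max=6):
    """Return one relative frequency per possible k-mer.

    Hypothetical helper: counts overlapping windows and divides each
    count by the number of windows of that length. Windows containing
    characters outside ALPHABET (e.g. N) are not represented in the
    output vector.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-encoded input
    counts = Counter()
    windows = {}
    for k in range(k_min, k_max + 1):
        windows[k] = max(len(seq) - k + 1, 0)
        for i in range(windows[k]):
            counts[seq[i:i + k]] += 1
    return [
        counts[kmer] / windows[len(kmer)] if windows[len(kmer)] else 0.0
        for kmer in all_kmers(k_min, k_max)
    ]


if __name__ == "__main__":
    print(len(all_kmers()))  # size of the full k-mer feature space
    vector = kmer_frequencies("ACGUACGU")
    print(vector[:4])  # frequencies of A, C, G, U
```

A vector of this kind would then be passed to a feature selector, which in the abstract reduces the 5460 candidates to 156 features.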

Software Mentioned

RibEx
imblearn
WEKA
Riboswitch finder
RiboSW
RiboD
GraphPad Prism
SMOTE
CfsSubsetEval
GridSearchCV
