SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

BMC Bioinformatics · Jun 30, 2007
Iain Melvin, ..., Christina Leslie

Abstract

Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches achieve state-of-the-art performance on the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, only a limited number of software tools and systems for SVM-based protein classification are available to the bioinformatics community. We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at http://svm-fold.c2b2.columbia.edu. Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence...
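
To make the phrase "histogram of inexact matching k-mer frequencies" concrete, the following is a minimal sketch of that kind of feature map. It is not the SVM-Fold implementation: the profile kernel derives its inexact-match neighborhoods from sequence profiles, whereas this sketch assumes the simpler rule that a k-mer matches any k-mer within a fixed number of substitutions (a mismatch-style neighborhood). The function names, the default k, and the mismatch threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mismatch_neighbors(kmer, max_mismatches=1, alphabet=ALPHABET):
    """Return every k-mer within `max_mismatches` substitutions of `kmer`.

    Assumption: a plain substitution count stands in for the
    profile-based neighborhood used by the actual profile kernel.
    """
    neighbors = {kmer}
    for m in range(1, max_mismatches + 1):
        for positions in combinations(range(len(kmer)), m):
            for letters in product(alphabet, repeat=m):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                neighbors.add("".join(candidate))
    return neighbors


def kmer_histogram(sequence, k=3, max_mismatches=1):
    """Count, for each k-mer feature, how many windows of `sequence` match it inexactly."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        # Each window contributes 1 to every feature in its neighborhood.
        counts.update(mismatch_neighbors(sequence[i:i + k], max_mismatches))
    return counts


def kmer_kernel(seq_a, seq_b, k=3, max_mismatches=1):
    """Kernel value: inner product of the two sparse k-mer histograms."""
    hist_a = kmer_histogram(seq_a, k, max_mismatches)
    hist_b = kmer_histogram(seq_b, k, max_mismatches)
    if len(hist_a) > len(hist_b):
        hist_a, hist_b = hist_b, hist_a  # iterate over the smaller histogram
    return sum(count * hist_b[feature] for feature, count in hist_a.items())
```

A kernel matrix built this way over a set of sequences could be passed to any SVM solver that accepts precomputed kernels. Enumerating neighborhoods explicitly, as done here, grows quickly with k and the mismatch threshold, so it is only practical for small values.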

Related Concepts

Knowledge Representation (Computer)
Pattern Recognition System
Gene Products, Protein
Computer Programs and Programming
Discriminant Analysis
Determination, Sequence Homology
Homologous Sequences, Amino Acid
Protein Folding, Globular
Sequence Analysis, Protein
