Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifi...
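To make the idea of counting shared fixed-length patterns "allowing for mutations" concrete, the following is a minimal brute-force sketch in Python. It is not the mismatch tree computation described above: it simply enumerates every pattern within a given number of substitutions of each substring, which is only practical for short patterns. The parameter names `k` (pattern length) and `m` (maximum number of mismatches), the 20-letter alphabet constant, and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, product

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """Return all strings that differ from `kmer` in at most `m` positions."""
    neighbors = {kmer}
    for n_mut in range(1, m + 1):
        for positions in combinations(range(len(kmer)), n_mut):
            for letters in product(alphabet, repeat=n_mut):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                # The set removes duplicates, including candidates that
                # re-use the original letter at a chosen position.
                neighbors.add("".join(candidate))
    return neighbors


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Sparse feature vector: for each length-k pattern, the number of
    length-k substrings of `seq` lying within `m` mismatches of it."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        features.update(mismatch_neighborhood(seq[i:i + k], m, alphabet))
    return features


def mismatch_kernel(x, y, k, m, alphabet=AMINO_ACIDS):
    """Inner product of the two sparse mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    if len(fx) > len(fy):
        fx, fy = fy, fx  # iterate over the smaller vector
    return sum(count * fy[pattern] for pattern, count in fx.items())


if __name__ == "__main__":
    # With m = 0 only exact pattern matches count; with m = 1 the two
    # sequences share every pattern of the form "AC?".
    print(mismatch_kernel("ACD", "ACE", k=3, m=0))
    print(mismatch_kernel("ACD", "ACE", k=3, m=1))
```

With `m = 0` the sketch reduces to counting exactly shared patterns, so "ACD" and "ACE" score 0; with `m = 1` they score 20, one for each pattern "AC?" that lies within one substitution of both. A matrix of such values could be passed to an SVM implementation that accepts precomputed kernels, although the cost of this enumeration grows quickly with `k` and `m`.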