PMID: 11331239 · May 2, 2001 · Paper

The utility of different representations of protein sequence for predicting functional class

Bioinformatics
R D King, L Dehaspe

Abstract

Data Mining Prediction (DMP) is a novel approach to predicting protein functional class from sequence. DMP works even in the absence of a homologous protein of known function. We investigate the utility of different ways of representing protein sequence in DMP (residue frequencies, phylogeny, predicted structure), using the Escherichia coli genome as a model. With every type of representation, DMP learnt prediction rules that were more accurate than default at every level of function. The most effective representation was phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most general level of function; 69% accuracy and 7% coverage at the most detailed). We also tested different methods for combining predictions from the different types of representation. These improved both the accuracy and coverage of predictions: for example, 40% of all unassigned ORFs could be predicted at an estimated accuracy of 60%, and 5% of unassigned ORFs could be predicted at an estimated accuracy of 86%.
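The trade-off between accuracy and coverage reported in the abstract can be illustrated with a small sketch. The abstract does not say which combination methods were tested, so the code below assumes one simple scheme: for each ORF, keep the prediction with the highest estimated rule accuracy across representations. All names (`Prediction`, `combine_by_max_confidence`, `accuracy_and_coverage`), the ORF identifiers, the class labels, and the confidence values are hypothetical toy data, not results from the paper.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Prediction(NamedTuple):
    orf: str                # ORF identifier (hypothetical)
    functional_class: str   # predicted functional class (hypothetical label)
    confidence: float       # estimated accuracy of the rule that fired


def combine_by_max_confidence(
    sources: List[List[Prediction]],
) -> Dict[str, Prediction]:
    """Assumed combination scheme: per ORF, keep the most confident prediction."""
    best: Dict[str, Prediction] = {}
    for predictions in sources:
        for p in predictions:
            current = best.get(p.orf)
            if current is None or p.confidence > current.confidence:
                best[p.orf] = p
    return best


def accuracy_and_coverage(
    combined: Dict[str, Prediction],
    truth: Dict[str, str],
    all_orfs: List[str],
    threshold: float,
) -> Tuple[Optional[float], float]:
    """Accuracy over predictions at or above the threshold, and the
    fraction of all ORFs that receive such a prediction (coverage)."""
    kept = [p for p in combined.values() if p.confidence >= threshold]
    coverage = len(kept) / len(all_orfs)
    if not kept:
        return None, coverage  # accuracy is undefined with no predictions
    correct = sum(1 for p in kept if truth.get(p.orf) == p.functional_class)
    return correct / len(kept), coverage


# Toy predictions from three representations (values are made up).
phylogeny = [Prediction("orf1", "metabolism", 0.9),
             Prediction("orf2", "transport", 0.6)]
structure = [Prediction("orf1", "transport", 0.5),
             Prediction("orf3", "metabolism", 0.7)]
frequency = [Prediction("orf2", "metabolism", 0.8)]

all_orfs = ["orf1", "orf2", "orf3", "orf4"]
truth = {"orf1": "metabolism", "orf2": "transport",
         "orf3": "metabolism", "orf4": "transport"}

combined = combine_by_max_confidence([phylogeny, structure, frequency])
for threshold in (0.0, 0.85):
    acc, cov = accuracy_and_coverage(combined, truth, all_orfs, threshold)
    print(f"threshold={threshold}: accuracy={acc}, coverage={cov}")
```

On this toy data, raising the confidence threshold keeps fewer predictions (lower coverage) but the ones kept are more often correct, which is the same kind of trade-off as the 40%-at-60% versus 5%-at-86% figures quoted above.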

