Exploiting and integrating rich features for biological literature classification.

BMC Bioinformatics
Hongning Wang, Xiaoyan Zhu

Abstract

Effective features play an important role in automated text classification, which in turn facilitates access to large-scale data. In the biosciences, biological structures and terminology are described by a large number of features, and domain-dependent features would significantly improve classification performance. The major issue studied in this paper is how to select and integrate different types of features to improve the classification of biological literature. To classify biological literature efficiently, we propose a novel feature value schema, TF*ML; a set of features ranging from the lower-level, domain-independent "string feature" to the higher-level, domain-dependent "semantic template feature"; and proper integrations among these features. Compared with our previous approaches, performance improves by 11.5% in AUC and 8.8% in F-Score, and exceeds the best performance achieved in BioCreAtIvE 2006. Different types of features have different discriminative capabilities in literature classification; proper integration of domain-independent and domain-dependent features would significantly improve performance and overcome over-fitting to the data distribution.
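
The abstract reports its gains in AUC and F-Score without defining either metric. As a reminder of what the two numbers measure for a binary document classifier, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the default beta of 1, and the toy labels and scores are assumptions made purely for illustration.

```python
from typing import Sequence


def f_score(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-score of the positive class (label 1) from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive document
    receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = relevant document, 0 = irrelevant.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

toy_auc = auc(labels, scores)
toy_f1 = f_score(labels, predictions)
```

AUC depends only on how the classifier ranks documents, while the F-score depends on the decision threshold, which is one reason the two are commonly reported together.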

Software Mentioned

ONBIRES
ABNER
KeyBT
BioCreAtIvE
XEROX
