Feb 13, 2014

Literature mining of genetic variants for curation: quantifying the importance of supplementary material

Database : the Journal of Biological Databases and Curation
Antonio Jimeno Yepes, Karin Verspoor

Abstract

A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously consid...Continue Reading

Mentioned in this Paper

Genome
Somatic Mutation
Neoplasms
Computer Programs and Programming
Genomics
Sporotrichosis
Intestines
Genome, Human
Medical Subject Headings
Mutation Abnormality

Related Feeds

Cancer Genomics (Keystone)

Cancer genomics approaches employ high-throughput technologies to identify the complete catalog of somatic alterations that characterize the genome, transcriptome and epigenome of cohorts of tumor samples. Discover the latest research using such technologies in this feed.

Cancer Genomics

Cancer genomics employ high-throughput technologies to identify the complete catalog of somatic alterations that characterize the genome, transcriptome and epigenome of cohorts of tumor samples. Discover the latest research here.

Bioinformatics in Biomedicine

Bioinformatics in biomedicine incorporates computer science, biology, chemistry, medicine, mathematics and statistics. Discover the latest research on bioinformatics in biomedicine here.