A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

Journal of the American Medical Informatics Association : JAMIA
Tasnia Tahsin, ..., Graciela Gonzalez

Abstract

The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping it to latitude/longitude coordinates using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitude of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. We found the precision, recall, and F-measure of our system for linking GenBank records to the latitude/longitude of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordin...
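The three reported scores can be checked against one another with a few lines of arithmetic. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract
reported_precision = 0.832
reported_recall = 0.967

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.894
```

Under that assumption the computed value matches the reported 0.894 to three decimal places.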
