Abstract
The metadata describing the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping it to latitude/longitude coordinates using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitude of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. The precision, recall, and F-measure of our system for linking GenBank records to the latitude/longitude of their LOIH were 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordin...
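The three evaluation figures reported above can be checked against one another. The short sketch below assumes the reported F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is the balanced F1 variant.
    """
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
precision, recall = 0.832, 0.967
f1 = f_measure(precision, recall)
print(round(f1, 3))  # prints 0.894
```

Under that assumption, the computed value agrees with the reported F-measure of 0.894 to three decimal places.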