Abstract
Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suite...