Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

Journal of Medical Internet Research
Albert Park, Wanda Pratt

Abstract

The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond this mismatch in text type, using existing NLP tools poses further challenges: technologies, source vocabularies, and characteristics of text change constantly. These continuously evolving challenges call for low-cost, systematic assessment. However, manual annotation, the most widely accepted evaluation method in NLP, requires tremendous effort and time. The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. ...

Software Mentioned

SPECIALIST
MedPost SKR (Semantic Knowledge Representation)
CancerConnect
Medical Language Extraction and Encoding System (MedLEE)
UMLS
cTAKES
Stanford POS Tagger
Stanford Parser
