Apr 14, 2020

sureLDA: A Multi-Disease Automated Phenotyping Method for the Electronic Health Record

BioRxiv : the Preprint Server for Biology
Yuri Ahuja, T. Cai


Objective: A major bottleneck hindering utilization of electronic health record (EHR) data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though ICD codes are often used as surrogates for true labels in this setting, these sometimes suffer from poor specificity. We propose a fully automated topic modeling algorithm to simultaneously annotate multiple phenotypes.

Methods: sureLDA is a label-free multidimensional phenotyping method. It first uses the PheNorm algorithm to initialize probabilities based on two surrogate features for each target phenotype, and then leverages these probabilities to constrain the Latent Dirichlet Allocation (LDA) topic model to generate phenotype-specific topics. Finally, it combines phenotype-feature counts with surrogates via clustering ensemble to yield final phenotype probabilities.

Results: sureLDA achieves reliably high accuracy and precision across a range of simulated and real-world phenotypes. Its performance is robust to phenotype prevalence and relative informativeness ...
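The three steps named under Methods can be illustrated with a toy sketch. This is not the authors' implementation: `surrogate_prior` is a crude stand-in for PheNorm (a logistic squash of utilization-adjusted log counts), `guided_lda` is a generic collapsed Gibbs sampler in which the surrogate-derived scores scale each patient's Dirichlet topic prior, and `combine` replaces the clustering ensemble with a plain average. All function names, parameters, and default values are hypothetical.

```python
import numpy as np


def surrogate_prior(surrogate_counts, utilization):
    """Stand-in for PheNorm (assumption): squash utilization-adjusted
    log surrogate counts into (0, 1), one column per phenotype."""
    z = np.log1p(surrogate_counts) - np.log1p(utilization)[:, None]
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)
    return 1.0 / (1.0 + np.exp(-z))


def guided_lda(counts, prior, n_iter=50, alpha=1.0, beta=0.1, seed=0):
    """Collapsed Gibbs LDA with one topic per phenotype; the per-patient
    topic prior is alpha * prior[d, k] (a generic way to constrain topics)."""
    rng = np.random.default_rng(seed)
    n_docs, n_feats = counts.shape
    n_topics = prior.shape[1]
    # Expand the integer count matrix into one (patient, feature) pair per token.
    docs, feats = np.nonzero(counts)
    reps = counts[docs, feats]
    docs, feats = np.repeat(docs, reps), np.repeat(feats, reps)
    doc_prior = alpha * prior + 1e-3  # small floor keeps every topic reachable
    z = rng.integers(n_topics, size=docs.size)
    n_dk = np.zeros((n_docs, n_topics))   # patient-by-topic token counts
    n_kw = np.zeros((n_topics, n_feats))  # topic-by-feature token counts
    np.add.at(n_dk, (docs, z), 1)
    np.add.at(n_kw, (z, feats), 1)
    n_k = n_kw.sum(axis=1)
    for _ in range(n_iter):
        for i in range(docs.size):
            d, w, k = docs[i], feats[i], z[i]
            # Remove the token's current assignment before resampling it.
            n_dk[d, k] -= 1
            n_kw[k, w] -= 1
            n_k[k] -= 1
            p = (n_dk[d] + doc_prior[d]) * (n_kw[:, w] + beta) / (n_k + n_feats * beta)
            k = rng.choice(n_topics, p=p / p.sum())
            z[i] = k
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    return n_dk


def combine(n_dk, prior):
    """Stand-in for the clustering ensemble (assumption): average the
    patient's topic share with the surrogate-derived prior."""
    topic_share = n_dk / (n_dk.sum(axis=1, keepdims=True) + 1e-8)
    return 0.5 * (topic_share + prior)
```

Here `counts` would be a patients-by-features integer matrix of EHR feature counts, `surrogate_counts` a patients-by-phenotypes matrix of surrogate counts, and `utilization` a per-patient measure of healthcare use such as total note count. The output of `combine` is a per-phenotype score in [0, 1], not a calibrated probability.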
