Mar 23, 2019

High-throughput Multimodal Automated Phenotyping (MAP) with Application to PheWAS

BioRxiv : the Preprint Server for Biology
Katherine P. Liaowith the VA Million Veteran Program


Objective Electronic health records (EHR) linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Method We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the UMLS. Aggregated ICD and NLP counts along with healthcare utilization were jointly analyzed by fitting an ensemble of latent mixture models. The MAP algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying subjects with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort PheWAS for two SNPs with known associations. Results The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembl...Continue Reading

  • References
  • Citations


  • We're still populating references for this paper, please check back later.
  • References
  • Citations


  • This paper may not have been cited yet.

Mentioned in this Paper

Electronic Health Records
Unified Medical Language System
Translational Research
Bnk protein, Drosophila
Implantable Defibrillator
Single Nucleotide Polymorphism

About this Paper

Related Feeds

BioRxiv & MedRxiv Preprints

BioRxiv and MedRxiv are the preprint servers for biology and health sciences respectively, operated by Cold Spring Harbor Laboratory. Here are the latest preprint articles (which are not peer-reviewed) from BioRxiv and MedRxiv.