A machine learning-based framework to identify type 2 diabetes through electronic health records

International Journal of Medical Informatics
Tao Zheng, You Chen


Discovering diverse genotype-phenotype associations linked to Type 2 Diabetes Mellitus (T2DM) via genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) requires identifying more cases (T2DM subjects) and controls (subjects without T2DM), for example from Electronic Health Records (EHR). However, existing expert-based identification algorithms often suffer from low recall and can miss a large number of valuable samples under conservative filtering standards. The goal of this work is to develop, as a pilot study, a semi-automated machine learning framework that relaxes the filtering criteria to improve recall while keeping the false positive rate low. We propose a data-informed framework for identifying subjects with and without T2DM from EHR via feature engineering and machine learning. We evaluate and contrast the identification performance of widely used machine learning models within our framework, including k-Nearest-Neighbors, Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine and Logistic Regression. Our framework was applied to 300 patient samples (161 cases, 60 controls and 79 unconfirmed subjects), randomly selected from 23,281 diabetes related cohort...
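The trade-off described in the abstract is between recall (the share of true T2DM cases that are identified) and false positive rate (the share of non-T2DM subjects wrongly flagged as cases). The following is a minimal sketch of how those two quantities are computed from labels. The function name `recall_and_fpr` and the toy label lists are illustrative assumptions, not data or code from the paper.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false positive rate) for binary labels.

    Labels are 1 for a T2DM case and 0 for a control.
    """
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # case correctly identified
        elif truth == 1 and pred == 0:
            fn += 1  # case missed
        elif truth == 0 and pred == 1:
            fp += 1  # control wrongly flagged as a case
        else:
            tn += 1  # control correctly rejected
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical labels: four true cases followed by two controls.
y_true = [1, 1, 1, 1, 0, 0]
conservative = [1, 0, 0, 1, 0, 0]  # strict filter: misses half the cases
liberal = [1, 1, 1, 1, 1, 0]       # loose filter: finds all cases, one false alarm

print(recall_and_fpr(y_true, conservative))
print(recall_and_fpr(y_true, liberal))
```

In this toy setting the conservative filter has no false positives but misses cases, while the liberal filter recovers every case at the cost of a false alarm. The stated aim of the framework is to move toward the higher recall without paying that cost.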



Related Concepts

Machine Learning
Bayesian Prediction
Diabetes Mellitus, Non-Insulin-Dependent
Pilot Projects
Logistic Regression
Genome-Wide Association Study
Electronic Health Records
Support Vector Machines
