Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis

PloS One
Shang-Ming Zhou, Sinead Brophy


1) To use a data-driven method to examine clinical codes (risk factors) of a medical condition in primary care electronic health records (EHRs) that can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis (RA) using primary care EHRs. This study linked routine primary and secondary care EHRs in Wales, UK. A machine-learning-based scheme was used to identify patients with RA from primary care EHRs via the following steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset between disease cases and non-disease controls (disease/non-disease status based on the secondary care diagnosis); ii) reduction of predictors/associated variables using a Random Forest method; iii) induction of decision rules from a decision tree model. The proposed method was then extensively validated on an independent dataset, and its performance was compared with that of two existing deterministic algorithms for RA which had been developed using expert clinical knowledge. Primary care EHRs were available for 2,238,360 patients over the age of 16 and of these 20,667 were also linked in the secondary c...
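Step i) above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds (`min_ratio`, `min_case_freq`), the smoothing floor for codes absent from controls, and the single-letter codes are all hypothetical choices made for illustration. The abstract only states that relative frequencies of Read codes were compared between cases and controls.

```python
from collections import Counter


def code_relative_frequencies(patients):
    """Return, for each code, the fraction of patients whose record contains it."""
    counts = Counter()
    for codes in patients:
        counts.update(set(codes))  # count each code at most once per patient
    n = len(patients)
    return {code: c / n for code, c in counts.items()}


def select_candidate_codes(cases, controls, min_ratio=2.0, min_case_freq=0.05):
    """Keep codes that are markedly more frequent in cases than in controls.

    min_ratio and min_case_freq are illustrative thresholds (assumptions),
    not values taken from the study.
    """
    case_freq = code_relative_frequencies(cases)
    control_freq = code_relative_frequencies(controls)
    # Assumed smoothing: avoid division by zero for codes never seen in controls.
    floor = 1.0 / (len(controls) + 1)
    selected = {}
    for code, f_case in case_freq.items():
        if f_case < min_case_freq:
            continue
        ratio = f_case / max(control_freq.get(code, 0.0), floor)
        if ratio >= min_ratio:
            selected[code] = ratio
    # Strongest case/control contrast first.
    return dict(sorted(selected.items(), key=lambda kv: kv[1], reverse=True))


# Toy data: each patient is a set of placeholder codes (not real Read codes).
cases = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A"}]
controls = [{"C"}, {"C", "B"}, {"C"}, {"D"}, {"C"}, {"D"}, {"C"}, {"C"}]
selected = select_candidate_codes(cases, controls)
print(selected)
```

In this toy example, code "C" is dropped because it is more common among controls than cases. The surviving codes would then feed steps ii) and iii), for which a library implementation of Random Forests and decision trees would typically be used.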



Related Concepts

Machine Learning
Rheumatoid Arthritis
Primary Health Care
Antirheumatic Drugs, Disease-Modifying
Electronic Health Records
Arthritis, Psoriatic
Electron Microscopy, Diagnostic
Learning and Learning Problems
