DOI: 10.1101/499392 | Dec 21, 2018 | Paper

A Modeling Framework for Exploring Sampling and Observation Process Biases in Genome and Phenome-wide Association Studies using Electronic Health Records

BioRxiv : the Preprint Server for Biology
Lauren J. Beesley, Bhramar Mukherjee


Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges of non-probability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. In general, the extent of the bias that may be introduced by ignoring these factors is not well-characterized. In this paper, we develop a statistical framework for characterizing the bias expected in association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given results obtained from standard naive analysis methods. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed approximations. We apply our approach to study bias in genetic association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine.
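To give a feel for the kind of bias the abstract refers to, the sketch below computes the log odds ratio a naive analysis would target when disease status is misclassified. It is an illustration under simplifying assumptions, not the authors' framework or their online tool: it assumes a single binary exposure, a logistic model for true disease status, misclassification that does not depend on the exposure, and no selection bias. The function name `naive_log_odds_ratio` and its parameters are hypothetical.

```python
import math


def expit(x):
    """Inverse logit: map a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def naive_log_odds_ratio(beta0, beta1, sensitivity, specificity=1.0):
    """Log odds ratio for the OBSERVED phenotype, exposed vs. unexposed.

    Assumed (hypothetical) setup: true disease D follows
    P(D=1 | G) = expit(beta0 + beta1 * G) for a binary exposure G, and the
    observed phenotype D* is a misclassified version of D with the given
    sensitivity and specificity, independent of G.
    """
    p0 = expit(beta0)           # true disease probability when G = 0
    p1 = expit(beta0 + beta1)   # true disease probability when G = 1

    # Probability of an observed positive = true positives + false positives.
    q0 = sensitivity * p0 + (1.0 - specificity) * (1.0 - p0)
    q1 = sensitivity * p1 + (1.0 - specificity) * (1.0 - p1)

    return math.log(q1 / (1.0 - q1)) - math.log(q0 / (1.0 - q0))


# Sensitivity-analysis style sweep: how does the naive target change as the
# assumed sensitivity drops, holding the true effect fixed at beta1 = 0.5?
for s in (1.0, 0.8, 0.6, 0.4):
    print(f"sensitivity={s:.1f}  naive log OR={naive_log_odds_ratio(-2.0, 0.5, s):.3f}")
```

Under these assumptions the naive log odds ratio equals the true value when sensitivity and specificity are both 1 and is pulled toward zero otherwise. A sensitivity analysis in this spirit would run the calculation in reverse: fix the naive estimate and ask which true effect sizes are consistent with it across a range of plausible sensitivity and specificity values.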
