DOI: 10.1101/487975
Dec 6, 2018
Paper

EpiSmokEr: A robust classifier to determine smoking status from DNA methylation data

BioRxiv : the Preprint Server for Biology
Sailalitha Bollepalli, Simon Anders


Self-reported smoking status is prone to misclassification due to under-reporting, while biomarkers like cotinine can only measure recent exposure. Smoking strongly influences DNA methylation, with current, former and never smokers exhibiting different methylation profiles. Recently, two approaches were proposed to calculate scores based on smoking-responsive DNA methylation loci to serve as reliable indicators of long-term exposure and potential biomarkers to estimate smoking behaviour. However, these two methodologies cannot be directly used to classify individuals with unknown smoking habits. To advance the practical applicability of the smoking-associated methylation signals, we used a machine learning methodology to train a classifier for smoking status prediction. We show the prediction performance of our classifier on three independent whole-blood test datasets, demonstrating its robustness and global applicability. Furthermore, we examine the reasons for biologically meaningful misclassifications through comprehensive phenotypic evaluation. Additionally, we provide the community with an R package, EpiSmokEr, facilitating the implementation of our classifier to predict smoking status in future studies.

