SNP selection in genome-wide association studies via penalized support vector machine with MAX test

Computational and Mathematical Methods in Medicine
Jinseog Kim, Sin-Ho Jung


One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs). Such a model can be used for diagnostic and prognostic purposes and for a better understanding of the relationship between the disease and the SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, investigators often ignore the genetic models of the SNPs, and the resulting final model loses efficiency in predicting the clinical outcome. To overcome this problem, we propose a two-stage method in which the genetic model of each SNP is first identified using the MAX test and a prediction model is then fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare their performance under various penalty functions. The results from simulations and real GWAS data analysis show that the proposed method performs better, in terms of prediction power and selectivity, than prediction methods that ignore the genetic models.
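The first stage described above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1, or 2 copies of the risk allele, that the outcome is coded 1 for cases and 0 for controls, and that the three candidate genetic models are recessive, additive, and dominant, each scored with a Cochran-Armitage trend statistic. The function names (`trend_z`, `max_test_model`, `recode`) and the toy data are hypothetical and are not taken from the paper. The sketch only selects the model with the largest absolute statistic; it does not compute a MAX test p-value, which would require the joint distribution of the three statistics.

```python
import math

# Assumed scores for genotypes carrying 0, 1, 2 copies of the risk allele.
MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(genotypes, labels, scores):
    """Cochran-Armitage trend statistic Z for one SNP under the given scores."""
    cases = [0, 0, 0]
    controls = [0, 0, 0]
    for g, y in zip(genotypes, labels):
        if y == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    totals = [r + s for r, s in zip(cases, controls)]
    # Numerator: score-weighted difference between case and control counts.
    t = sum(x * (n_controls * r - n_cases * s)
            for x, r, s in zip(scores, cases, controls))
    # Variance of the numerator under the null hypothesis of no association.
    sum_x2 = sum(x * x * m for x, m in zip(scores, totals))
    sum_x = sum(x * m for x, m in zip(scores, totals))
    var = (n_cases * n_controls / n) * (n * sum_x2 - sum_x ** 2)
    if var <= 0:
        return 0.0  # SNP is uninformative under this scoring
    return t / math.sqrt(var)


def max_test_model(genotypes, labels):
    """Return the model with the largest |Z| and all three statistics."""
    z = {m: trend_z(genotypes, labels, s) for m, s in MODEL_SCORES.items()}
    best = max(z, key=lambda m: abs(z[m]))
    return best, z


def recode(genotypes, model):
    """Recode 0/1/2 genotypes with the scores of the selected model."""
    scores = MODEL_SCORES[model]
    return [scores[g] for g in genotypes]


# Toy SNP: cases have genotype counts (5, 5, 10), controls (10, 10, 0).
genotypes = [0] * 5 + [1] * 5 + [2] * 10 + [0] * 10 + [1] * 10
labels = [1] * 20 + [0] * 20
best, z = max_test_model(genotypes, labels)
recoded = recode(genotypes, best)
print(best, {m: round(v, 3) for m, v in z.items()})
```

In the second stage, the recoded columns for all SNPs would form the design matrix of a penalized SVM. One readily available option, not necessarily among the penalties the authors compared, is scikit-learn's `LinearSVC(penalty="l1", dual=False)`.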



Related Concepts

Genome-Wide Association Study
Support Vector Machines
Logistic Regression
Genetic Polymorphism
Single Nucleotide Polymorphism
