Incorporating prior knowledge of predictors into penalized classifiers with multiple penalty terms

Feng Tai, Wei Pan


Many methods have been proposed for sample (e.g. tumor) classification with microarray gene expression data. However, almost all of them ignore existing biological knowledge and treat all the genes equally a priori. Because some genes have been identified by previous studies as having biological functions or being involved in pathways related to the outcome (e.g. cancer), incorporating this type of prior knowledge into a classifier can potentially improve both the predictive performance and the interpretability of the resulting model. We propose a simple and general framework to incorporate such prior knowledge into building a penalized classifier. As two concrete examples, we apply the idea to two penalized classifiers, nearest shrunken centroids (also called PAM) and penalized partial least squares (PPLS). Instead of treating all the genes equally a priori as in standard penalized methods, we group the genes according to their functional associations based on existing biological knowledge or data, and adopt group-specific penalty terms and penalization parameters. Simulated and real data examples demonstrate that, if prior knowledge on gene grouping is indeed informative, our new methods perfo...
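The abstract does not give formulas, so the following is only a minimal sketch of the general idea of group-specific penalization applied to a nearest-shrunken-centroid classifier, not the authors' implementation. It assumes soft-thresholding of standardized centroid differences with one threshold per gene group. The function names (`soft_threshold`, `fit_grouped_centroids`, `predict`), the `deltas` mapping, and the simplified standardization (pooled within-class standard deviation plus its median, without PAM's class-size factor) are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values with |d| <= delta become 0."""
    return np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)


def fit_grouped_centroids(X, y, groups, deltas):
    """Shrunken class centroids with a separate threshold per gene group.

    X      : (n_samples, n_genes) expression matrix
    y      : (n_samples,) class labels
    groups : (n_genes,) group id of each gene (assumed to come from prior knowledge)
    deltas : dict mapping group id -> shrinkage threshold (assumed tuning parameters)
    """
    classes = np.unique(y)
    overall = X.mean(axis=0)
    centroids = np.stack([X[y == k].mean(axis=0) for k in classes])

    # Pooled within-class standard deviation per gene, offset by its median
    # to stabilize genes with very small variance (a simplification).
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (len(y) - len(classes)))
    s = s + np.median(s)

    # Standardized differences, shrunk with the threshold of each gene's group.
    d = (centroids - overall) / s
    delta_per_gene = np.array([deltas[g] for g in groups])
    shrunken = overall + s * soft_threshold(d, delta_per_gene)
    return classes, shrunken, s


def predict(X, classes, shrunken, s):
    """Assign each sample to the class with the nearest shrunken centroid."""
    dist = (((X[:, None, :] - shrunken[None, :, :]) / s) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


# Toy usage: genes 0-4 form an "informative" group, genes 5-19 a "noise" group.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 20))
X[y == 1, :5] += 2.0
groups = np.array([0] * 5 + [1] * 15)
classes, shrunken, s = fit_grouped_centroids(X, y, groups, {0: 0.1, 1: 100.0})
predictions = predict(X, classes, shrunken, s)
```

A small threshold for a group believed to be relevant and a large one for the remaining genes means the latter are shrunk to the overall mean and drop out of the classifier. In practice the thresholds would be chosen by cross-validation rather than fixed by hand.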



Related Concepts

Biochemical Pathway
Gene Expression Regulation, Neoplastic
Gene Expression
Rietveld Refinement
Regression Analysis
Two-Parameter Models
Mammary Neoplasms, Human
cDNA Microarrays
Malignant Neoplasms
Computational Molecular Biology
