Variable selection and validation in multivariate modelling

Bioinformatics
Lin Shi … Carl Brunius

Abstract

Validation of variable selection and predictive performance is crucial in the construction of robust multivariate models that generalize well, minimize overfitting and facilitate interpretation of results. Inappropriate variable selection instead leads to selection bias, thereby increasing the risk of model overfitting and false positive discoveries. Although several algorithms exist to identify a minimal set of the most informative variables (i.e. the minimal-optimal problem), few can select all variables related to the research question (i.e. the all-relevant problem). Robust algorithms that combine identification of both minimal-optimal and all-relevant variables with proper cross-validation are urgently needed. We developed the MUVR algorithm to improve predictive performance and minimize overfitting and false positives in multivariate analysis. In the MUVR algorithm, minimal variable selection is achieved by performing recursive variable elimination within a repeated double cross-validation (rdCV) procedure. The algorithm supports partial least squares and random forest modelling, and simultaneously identifies minimal-optimal and all-relevant variable sets for regression, classification and multilevel analyses. Using three authentic omic...
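The core idea of recursive variable elimination inside repeated double cross-validation can be illustrated with a minimal sketch. This is not the MUVR implementation: PLS and random forest are swapped for a plain nearest-centroid classifier, and variable importance is assumed to be the standardized absolute difference of class means. The function names (`rdcv`, `eliminate`, `elimination_path`, etc.), the keep-ratio of 0.75, the fold counts and the tolerance `tol` are all hypothetical choices for illustration. The "min"/"max" rule (smallest and largest variable-set sizes whose pooled inner error lies within `tol` of the best) is only a simplified analogue of the minimal-optimal and all-relevant sets, and the sketch returns set sizes rather than the variables themselves.

```python
import random
from statistics import mean, pstdev


def importance(X, y, cols):
    """Stand-in for PLS/RF importance (assumption): standardized absolute
    difference of class means for each candidate column."""
    scores = {}
    for j in cols:
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        scores[j] = abs(mean(a) - mean(b)) / (pstdev(a + b) or 1.0)
    return scores


def fit_predict(Xtr, ytr, Xte, cols):
    """Nearest-centroid classifier restricted to the selected columns."""
    cent = {}
    for c in (0, 1):
        rows = [r for r, lab in zip(Xtr, ytr) if lab == c]
        cent[c] = [mean(r[j] for r in rows) for j in cols]
    preds = []
    for r in Xte:
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(cols, cent[c])) for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def folds(idx, y, k, rng):
    """Stratified random split of the indices in idx into k folds."""
    out = [[] for _ in range(k)]
    for c in (0, 1):
        members = [i for i in idx if y[i] == c]
        rng.shuffle(members)
        for n, i in enumerate(members):
            out[n % k].append(i)
    return out


def elimination_path(n_vars, ratio):
    """Variable-set sizes visited when a fraction `ratio` is kept per step."""
    sizes = [n_vars]
    while sizes[-1] > 1:
        sizes.append(max(1, int(sizes[-1] * ratio)))
    return sizes


def eliminate(Xtr, ytr, sizes, stop):
    """Recursive elimination: re-rank the surviving columns on the training
    data at every step and yield (size, columns) down to `stop` variables."""
    cols = list(range(len(Xtr[0])))
    for s in sizes:
        if s < stop:
            break
        imp = importance(Xtr, ytr, cols)
        cols = sorted(cols, key=imp.get, reverse=True)[:s]
        yield s, cols


def rdcv(X, y, reps=3, outer=4, inner=3, ratio=0.75, tol=0.05, seed=1):
    """Repeated double CV with recursive variable elimination (sketch)."""
    rng = random.Random(seed)
    n, n_vars = len(X), len(X[0])
    sizes = elimination_path(n_vars, ratio)
    total = {s: 0 for s in sizes}  # pooled inner misclassifications per size
    wrong = 0                      # outer (test-fold) misclassifications
    for _ in range(reps):
        for test in folds(range(n), y, outer, rng):
            train = [i for i in range(n) if i not in test]
            # Inner loop: only outer-training samples decide how many variables to keep.
            err = {s: 0 for s in sizes}
            for itest in folds(train, y, inner, rng):
                itrain = [i for i in train if i not in itest]
                Xi, yi = [X[i] for i in itrain], [y[i] for i in itrain]
                for s, cols in eliminate(Xi, yi, sizes, 1):
                    pred = fit_predict(Xi, yi, [X[i] for i in itest], cols)
                    err[s] += sum(pr != y[i] for pr, i in zip(pred, itest))
            best = min(sizes, key=lambda s: (err[s], s))  # fewest errors, then fewest variables
            # Refit on the whole outer-training set, then score the untouched outer test fold.
            Xo, yo = [X[i] for i in train], [y[i] for i in train]
            cols = list(eliminate(Xo, yo, sizes, best))[-1][1]
            pred = fit_predict(Xo, yo, [X[i] for i in test], cols)
            wrong += sum(pr != y[i] for pr, i in zip(pred, test))
            for s in sizes:
                total[s] += err[s]
    # Smallest / largest size within `tol` of the best pooled inner error
    # (a simplified analogue of minimal-optimal / all-relevant).
    lo, hi = min(total.values()), max(total.values())
    ok = [s for s in sizes if total[s] <= lo + tol * (hi - lo)]
    return {"min": min(ok), "max": max(ok), "error_rate": wrong / (reps * n)}


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0] * 20 + [1] * 20
    # Synthetic data: 20 variables, only the first 3 differ between classes.
    X = [[rng.gauss(3.0 if (lab == 1 and j < 3) else 0.0, 1.0) for j in range(20)] for lab in y]
    print(rdcv(X, y))
```

Because the outer test fold never influences which variables are kept, `error_rate` avoids the selection bias described above; selecting variables on all samples first and cross-validating afterwards would typically give an optimistic estimate.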
