A general approach to simultaneous model fitting and variable elimination in response models for biological data with many more variables than observations.

BMC Bioinformatics
Harri T Kiiveri

Abstract

With the advent of high-throughput biotechnology data acquisition platforms such as microarrays, SNP chips and mass spectrometers, data sets with many more variables than observations are now routinely being collected. Finding relationships between response variables of interest and variables in such data sets is an important problem akin to finding needles in a haystack. Whilst methods for a number of response types have been developed, a general approach has been lacking. The major contribution of this paper is to present a unified methodology which allows many common (statistical) response models to be fitted to such data sets. The class of models includes virtually any model with a linear predictor in it, for example (but not limited to) multiclass logistic regression (classification), generalised linear models (regression) and survival models. A fast algorithm for finding sparse, well-fitting models is presented. The ideas are illustrated on real data sets with numbers of variables ranging from thousands to millions. R code implementing the ideas is available for download. The method described in this paper enables existing work on response models when there are fewer variables than observations to be leveraged to the situa...
