TRANSPOSABLE REGULARIZED COVARIANCE MODELS WITH AN APPLICATION TO MISSING DATA IMPUTATION

The Annals of Applied Statistics
Genevera I Allen, Robert Tibshirani

Abstract

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically, this data matrix is transposable, meaning that either the rows, the columns, or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. By placing additive penalties on the inverse covariance matrices of the rows and columns, these so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and non-singular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. We present theoretical results exploiting the structure of our transposable models that allow these models and imputation methods to be applied to high-dimensional data. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
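
To make the idea of regularized-covariance imputation concrete, the following is a minimal sketch of the simpler multivariate case only (rows as observations, columns as features). It is not the authors' algorithm: the function name `impute_mvn_ridge`, the `ridge` and `n_iter` parameters, and the use of a plain ridge term added to the sample covariance are all assumptions made for illustration, standing in for the penalties on the inverse covariance described in the abstract. The sketch also fills missing entries with their conditional means and omits the conditional covariance correction a full EM step would carry.

```python
import numpy as np


def impute_mvn_ridge(X, ridge=1.0, n_iter=50):
    """Fill NaNs in an n x p array by iterated conditional-mean imputation.

    Hypothetical helper: a ridge-regularized covariance stands in for the
    penalized inverse-covariance estimates discussed in the abstract.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, p = X.shape

    # Initialize missing entries with the observed column means.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    rows_with_gaps = np.flatnonzero(missing.any(axis=1))
    for _ in range(n_iter):
        # Re-estimate the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        # The ridge term keeps the estimate non-singular even when p > n.
        cov = centered.T @ centered / n + ridge * np.eye(p)

        for i in rows_with_gaps:
            m = missing[i]
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the mean.
                filled[i, m] = mu[m]
                continue
            # Conditional mean of the missing block given the observed block:
            # mu_m + cov_mo @ inv(cov_oo) @ (x_o - mu_o)
            coef = np.linalg.solve(cov[np.ix_(o, o)], filled[i, o] - mu[o])
            filled[i, m] = mu[m] + cov[np.ix_(m, o)] @ coef
    return filled
```

Calling `impute_mvn_ridge(X, ridge=0.1)` on an array with `np.nan` entries returns a completed array in which observed values are left unchanged. A transposable version would, under the model in the abstract, also estimate a covariance among the rows and condition on both; that extension is not attempted here.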


