Exploring patterns enriched in a dataset with contrastive principal component analysis

Nature Communications
Abubakar Abid, James Zou

Abstract

Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
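The abstract does not spell out the computation, so the following is only a minimal sketch under a stated assumption: that the contrastive directions are the top eigenvectors of the target covariance minus a multiple of the background covariance. The function name `contrastive_pca`, the contrast strength `alpha`, and the synthetic data are all illustrative and are not taken from the published implementation.

```python
import numpy as np


def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Sketch of contrastive PCA (assumed formulation, not the authors' code).

    Finds directions with high variance in `target` and low variance in
    `background` by eigendecomposing cov(target) - alpha * cov(background).
    """
    # Center each dataset on its own mean.
    target_centered = target - target.mean(axis=0)
    background_centered = background - background.mean(axis=0)

    # Feature-by-feature covariance matrices (rows are samples).
    cov_target = np.cov(target_centered, rowvar=False)
    cov_background = np.cov(background_centered, rowvar=False)

    # The contrast matrix is symmetric, so eigh applies.
    contrast = cov_target - alpha * cov_background
    eigvals, eigvecs = np.linalg.eigh(contrast)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, order]
    return target_centered @ directions, directions


# Synthetic illustration: feature 0 carries large variance shared by both
# datasets, while feature 1 carries a two-group signal only in the target.
rng = np.random.default_rng(0)
scale = np.array([10.0, 1.0, 1.0])
background = rng.normal(size=(500, 3)) * scale
target = rng.normal(size=(500, 3)) * scale
target[:250, 1] += 4.0
target[250:, 1] -= 4.0

projected, directions = contrastive_pca(
    target, background, alpha=2.0, n_components=1
)

# For comparison, ordinary PCA on the target alone.
pca_vals, pca_vecs = np.linalg.eigh(np.cov(target, rowvar=False))
pca_top = pca_vecs[:, np.argmax(pca_vals)]
```

In this toy setup, ordinary PCA on the target is dominated by the shared high-variance feature, whereas the contrastive direction should align with the feature that separates the two target subgroups. In practice `alpha` would need to be explored over a range of values rather than fixed.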

Methods Mentioned

PCA
RNA-Seq

Software Mentioned

QUADRO
GitHub
SNE
cPCA
