Abstract
Visualization and exploration of high-dimensional data are ubiquitous challenges across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available and can be used for exploratory data analysis in many applications where PCA is currently used.
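The abstract does not spell out the objective. This sketch assumes one common way to write contrastive PCA: take the leading eigenvectors of C_X - αC_Y, where C_X and C_Y are the covariance matrices of the target and background datasets and α ≥ 0 is a contrast parameter. With α = 0 this reduces to ordinary PCA on the target. The function `cpca`, its parameters, and the synthetic data below are hypothetical illustrations, not the published implementation.

```python
import numpy as np


def cpca(target, background, alpha=1.0, n_components=2):
    """Project `target` onto its top contrastive principal components.

    Hypothetical sketch: directions are the leading eigenvectors of
    C_target - alpha * C_background (sample covariances of centered data).
    Returns (projected_target, components).
    """
    target = np.asarray(target, dtype=float)
    background = np.asarray(background, dtype=float)
    # Center each dataset on its own feature means.
    t = target - target.mean(axis=0)
    b = background - background.mean(axis=0)
    # Sample covariance matrices, shape (n_features, n_features).
    c_t = t.T @ t / (t.shape[0] - 1)
    c_b = b.T @ b / (b.shape[0] - 1)
    # The contrast matrix is symmetric, so eigh applies; it returns
    # eigenvalues in ascending order, so we reverse to take the largest.
    eigvals, eigvecs = np.linalg.eigh(c_t - alpha * c_b)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    return t @ components, components


# Synthetic example: feature 0 has large variance in BOTH datasets,
# while feature 1 has extra variance only in the target.
rng = np.random.default_rng(0)
target = rng.normal(size=(2000, 3)) * [10.0, 5.0, 1.0]
background = rng.normal(size=(2000, 3)) * [10.0, 1.0, 1.0]

proj, comps = cpca(target, background, alpha=1.0)      # contrastive
_, pca_comps = cpca(target, background, alpha=0.0)     # plain PCA
```

In this toy setup, plain PCA's first direction should align with the shared high-variance feature 0, whereas the contrastive direction should align with feature 1, the variation enriched in the target relative to the background.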