Statistical significance of cluster membership for unsupervised evaluation of cell identities

Neo Christopher Chung


Single-cell RNA-sequencing (scRNA-seq) allows us to dissect transcriptional heterogeneity arising from cellular types, spatio-temporal contexts and environmental stimuli. Transcriptional heterogeneity may reflect phenotypes and molecular signatures that are often unmeasured or unknown a priori. Cell identities of samples derived from heterogeneous subpopulations are therefore determined by clustering scRNA-seq data, and these cell identities are then used in downstream analyses. How can we examine whether cell identities are accurately inferred? Unlike external measurements or labels for single cells, clustering-based cell identities are derived from the same data they are later used to analyze, which can result in spurious signals and false discoveries. We introduce non-parametric methods to evaluate cell identities by testing cluster memberships in an unsupervised manner. Diverse simulation studies demonstrate the accuracy of the jackstraw test for cluster membership. We propose a posterior probability that a cell should be included in its clustering-based subpopulation. Posterior inclusion probabilities (PIPs) for cluster memberships can be used to select and visualize samples relevant to subpopulations. The proposed methods are applied to three scRNA-seq datasets. First, a mixture of Jurkat and 293T cell lin...
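The abstract does not spell out the test procedure, so the following is only a minimal sketch of how a resampling-based test of cluster membership could look. It makes several assumptions that are not stated in the text above: k-means is used as the clustering algorithm, the squared Pearson correlation between a cell and its assigned cluster center stands in for the membership statistic, and null statistics are obtained by shuffling the gene values of a few cells and re-clustering. All function and parameter names (`jackstraw_kmeans_sketch`, `s`, `n_iter`) are hypothetical. The sketch stops at empirical p-values and does not compute PIPs.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sq_corr(a, b):
    """Squared Pearson correlation (assumed stand-in for the membership statistic)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    if saa == 0 or sbb == 0:
        return 0.0
    return sab * sab / (saa * sbb)


def assign(cells, centers):
    """Label each cell with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(c, centers[j]))
            for c in cells]


def kmeans(cells, k, iters=25):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [list(cells[0])]
    while len(centers) < k:
        far = max(cells, key=lambda c: min(sq_dist(c, ctr) for ctr in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = assign(cells, centers)
        for j in range(k):
            members = [c for c, lab in zip(cells, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(cells, centers)


def jackstraw_kmeans_sketch(cells, k, s=2, n_iter=50, seed=0):
    """Return (labels, p_values); cells is a list of per-cell gene vectors."""
    rng = random.Random(seed)
    centers, labels = kmeans(cells, k)
    observed = [sq_corr(c, centers[lab]) for c, lab in zip(cells, labels)]

    null_stats = []
    for _ in range(n_iter):
        perturbed = [list(c) for c in cells]
        chosen = rng.sample(range(len(cells)), s)
        for i in chosen:
            rng.shuffle(perturbed[i])  # destroy this cell's gene structure
        null_centers, null_labels = kmeans(perturbed, k)
        for i in chosen:
            null_stats.append(sq_corr(perturbed[i], null_centers[null_labels[i]]))

    # Empirical p-value: share of null statistics at least as large as observed.
    total = len(null_stats)
    p_values = [(1 + sum(ns >= obs for ns in null_stats)) / (1 + total)
                for obs in observed]
    return labels, p_values
```

Calling `jackstraw_kmeans_sketch(cells, k=2)` on a list of equal-length expression vectors returns one cluster label and one p-value per cell. Keeping `s` small relative to the number of cells is a design choice in this sketch: the shuffled cells should barely disturb the re-estimated cluster centers, so that the null statistics are compared against nearly the same clustering as the observed ones.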




