A Tool for Visualization and Analysis of Single-Cell RNA-Seq Data Based on Text Mining

Frontiers in Genetics
Gennaro Gambardella, Diego di Bernardo


Thanks to innovative sample-preparation and sequencing technologies, gene expression can now be measured in thousands of individual cells in a single experiment. State-of-the-art pipelines for single-cell RNA-sequencing (scRNA-seq) data, however, still rely on computational methods developed for traditional bulk RNA-sequencing data, and so do not account for the peculiarities of single-cell data, such as sparseness and zero-inflated counts. Here, we present a ready-to-use pipeline named gf-icf (gene frequency-inverse cell frequency) for normalization of raw counts, feature selection, and dimensionality reduction of scRNA-seq data, in support of visualization and subsequent analyses. Our work is based on a data transformation model named term frequency-inverse document frequency (TF-IDF), which has been used extensively in text mining, a field where extremely sparse and zero-inflated data are common. Using benchmark scRNA-seq datasets, we show that the gf-icf pipeline outperforms existing state-of-the-art methods, yielding clearer visualizations and better separation of distinct cell types.
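To make the text-mining analogy concrete, the sketch below applies a textbook TF-IDF transform to a genes-by-cells count matrix, treating each cell as a "document" and each gene as a "term". It is an illustration under stated assumptions, not the gf-icf implementation itself: the function name `gf_icf_sketch`, the smoothed logarithmic weighting, and the final per-cell L2 normalization are all choices made here for the example, and the published pipeline may differ in each of them.

```python
import numpy as np


def gf_icf_sketch(counts):
    """Textbook TF-IDF applied to a genes x cells matrix of raw counts.

    Hypothetical helper for illustration only; the weighting scheme is an
    assumption, not taken from the gf-icf paper.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]

    # Gene frequency: each count divided by the total counts of its cell.
    cell_totals = counts.sum(axis=0, keepdims=True)
    cell_totals[cell_totals == 0] = 1.0  # avoid division by zero for empty cells
    gf = counts / cell_totals

    # Inverse cell frequency: genes detected in few cells get a larger weight.
    # The +1 smoothing terms are an assumed (common) TF-IDF variant.
    cells_expressing = (counts > 0).sum(axis=1, keepdims=True)
    icf = np.log((n_cells + 1) / (cells_expressing + 1)) + 1.0

    scores = gf * icf

    # Rescale each cell (column) to unit Euclidean length.
    norms = np.linalg.norm(scores, axis=0, keepdims=True)
    norms[norms == 0] = 1.0
    return scores / norms


# Toy data: 3 genes x 3 cells. Gene 0 is detected everywhere, gene 1 only in cell 0.
toy_counts = np.array([[5, 5, 5],
                       [5, 0, 0],
                       [0, 3, 0]])
toy_scores = gf_icf_sketch(toy_counts)
```

In the toy matrix, genes 0 and 1 have the same raw count in the first cell, but the gene detected in only one cell receives the higher score there, which is the behavior that makes TF-IDF-style weighting attractive for sparse data. Zero counts remain zero, so the sparsity of the input is preserved.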




