Modeling and analysis of RNA-seq data: a review from a statistical perspective

Quantitative Biology
Wei Vivian Li, Jingyi Jessica Li


Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples, and they have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance. Statistical and computational methods for analyzing RNA-seq data have advanced significantly in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and perform differently under different scenarios. This review discusses and compares multiple commonly used statistical models with respect to their assumptions, in the hope of helping users select appropriate methods and assisting developers in future method development.
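The abstract notes that methods answering the same question can rest on different model assumptions. One widely used contrast, given here as an illustration and not as a summary of the review's own analysis, concerns gene-level read counts across replicates. A Poisson model forces the variance to equal the mean μ. A negative binomial model adds a dispersion parameter φ, so that Var = μ + φμ², which accommodates the extra biological variability between replicates. The minimal sketch below assumes this NB2 parameterization. It estimates φ for a single gene by the method of moments. The function names are hypothetical, and real tools use more elaborate estimators, such as shrinkage across genes.

```python
from statistics import mean, variance
from typing import Sequence


def poisson_variance(mu: float) -> float:
    """Poisson assumption: the variance equals the mean."""
    return mu


def nb_variance(mu: float, phi: float) -> float:
    """Negative binomial (NB2) assumption: Var = mu + phi * mu^2."""
    return mu + phi * mu ** 2


def moment_dispersion(counts: Sequence[int]) -> float:
    """Method-of-moments estimate of phi from one gene's replicate counts.

    Solves sample_var = m + phi * m^2 for phi. The estimate is clipped at 0
    when the counts show no overdispersion relative to a Poisson model.
    Requires at least two replicates (statistics.variance needs n >= 2).
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance with an (n - 1) denominator
    if m == 0:
        return 0.0
    return max((s2 - m) / m ** 2, 0.0)


if __name__ == "__main__":
    # Hypothetical toy counts for one gene across four replicates.
    toy_counts = [10, 20, 30, 40]
    phi_hat = moment_dispersion(toy_counts)
    mu_hat = mean(toy_counts)
    print("Poisson variance:", poisson_variance(mu_hat))
    print("NB variance:", nb_variance(mu_hat, phi_hat))
    print("sample variance:", variance(toy_counts))
```

By construction, the NB variance at the moment estimate reproduces the sample variance whenever that variance exceeds the mean. The Poisson variance stays at the mean. This is the overdispersion that the Poisson assumption cannot capture.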


