Bias detection and correction in RNA-Sequencing data

BMC Bioinformatics
Wei Zheng, Hongyu Zhao


High-throughput sequencing technology provides unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and the ability to identify novel transcripts. Moreover, for genes with multiple isoforms, the expression of each isoform may be estimated from RNA-Seq data. Despite these advantages, recent work revealed that base-level read counts from RNA-Seq data may not be randomly distributed and can be affected by local nucleotide composition. It was not clear, though, how the base-level read count bias may affect gene-level expression estimates. In this paper, using five published RNA-Seq data sets from different biological sources and with different data preprocessing schemes, we showed that commonly used estimates of gene expression levels from RNA-Seq data, such as reads per kilobase of gene length per million reads (RPKM), are biased in terms of gene length, GC content, and dinucleotide frequencies. We directly examined the biases at the gene level and proposed a simple approach based on a generalized additive model to correct different sources of bias simultaneously. Compared to previously proposed base level...
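The RPKM measure named in the abstract can be written out in a few lines. The sketch below is only an illustration under stated assumptions: the gene names, read counts, and lengths are hypothetical toy values, and the `linear_residuals` helper is a deliberately simplified stand-in (a straight-line fit of log RPKM on log gene length) rather than the generalized additive model the authors propose, which handles gene length, GC content, and dinucleotide frequencies together.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def linear_residuals(x, y):
    """Residuals of an ordinary least-squares line y ~ a + b*x.

    Simplified stand-in for a smooth bias term; NOT the paper's GAM.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data: gene -> (read count, gene length in bp)
genes = {"geneA": (500, 1000), "geneB": (2000, 4000), "geneC": (300, 1500)}
total_reads = sum(count for count, _ in genes.values())

# Gene-level RPKM values
values = {name: rpkm(c, length, total_reads) for name, (c, length) in genes.items()}

# Remove the linear trend of log2 RPKM against log2 gene length
log_rpkm = [math.log2(values[name]) for name in genes]
log_len = [math.log2(length) for _, length in genes.values()]
adjusted = linear_residuals(log_len, log_rpkm)
```

Here `adjusted` holds the log-scale expression values after the fitted length trend is subtracted. A real analysis would fit smooth terms for several covariates at once, as the abstract describes, and would use far more than three genes.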





Related Concepts

Meta Analysis (Statistical Procedure)
Sequence Determinations, RNA
MRNA Differential Display
Gene Expression
Malignant Neoplasm of Stomach
