Impact of RNA-seq attributes on false positive rates in differential expression analysis of de novo assembled transcriptomes

BMC Research Notes
Emmanuel González, Simon Joly

Abstract

High-throughput RNA sequencing studies are becoming increasingly popular, and differential expression studies represent an important downstream analysis that often follows de novo transcriptome assembly. Although a lot of attention has been given to bioinformatics tools for differential gene expression, little has yet been given to the impact of the sequence data used in these pipelines. We tested how using reads that differ in length and pairing attributes from those used to assemble a de novo transcriptome could affect differential expression (DE) results. To investigate this, we created artificial datasets from the long paired-end RNA-seq datasets initially used to build the assembly. All datasets were compared via DE analyses and, because all samples come from the same sequencing run, DE of genes or isoforms can be interpreted as false positives resulting from sequence attributes. Although the false positive rate for differential gene expression does not seem to be strongly affected by sequencing strategy (max. of 3.5%), it could reach 12.2% or 28.1% for differential isoform expression depending on the pipeline used. The effect of paired-end vs. single-end strategy was found to have a much greater imp...
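The design described in the abstract can be illustrated with a minimal sketch. It assumes reads are held in memory as plain sequence strings, and every function name, toy sequence, and count below is hypothetical and chosen only for illustration; none of it is taken from the authors' pipeline or results. The sketch derives shorter paired-end and single-end datasets from long paired-end reads, then expresses DE calls as a false positive rate, which is valid only because all datasets originate from the same sequencing run.

```python
from typing import List, Tuple

# A read pair is modeled as (mate1_sequence, mate2_sequence). Real FASTQ
# records would also carry identifiers and quality strings (omitted here).
ReadPair = Tuple[str, str]


def truncate_pairs(pairs: List[ReadPair], length: int) -> List[ReadPair]:
    """Keep the pairing but shorten both mates to at most `length` bases."""
    return [(m1[:length], m2[:length]) for m1, m2 in pairs]


def to_single_end(pairs: List[ReadPair], length: int) -> List[str]:
    """Drop the second mate and shorten the first to at most `length` bases."""
    return [m1[:length] for m1, _ in pairs]


def false_positive_rate(n_called_de: int, n_tested: int) -> float:
    """Percentage of tested genes or isoforms called DE when none should differ."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return 100.0 * n_called_de / n_tested


# Toy input: two hypothetical 10-base read pairs.
pairs = [("ACGTACGTAC", "TTGCAAGGCC"), ("GGGTTTAAAC", "CCATGGATCC")]

short_paired = truncate_pairs(pairs, 5)   # shorter reads, pairing kept
short_single = to_single_end(pairs, 5)    # shorter reads, pairing dropped

# Hypothetical counts: 20 features called DE out of 1000 tested.
fpr = false_positive_rate(20, 1000)
```

In practice each derived dataset would be mapped back to the assembly and passed to a DE tool; the sketch covers only the dataset derivation and the final rate calculation.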



