SimSeq: a nonparametric approach to simulation of RNA-sequence datasets

Sam Benidt, Dan Nettleton


RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method. We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. We conduct simulation experiments based on the negative binomial distribution and our proposed nonparametric simulation algorithm. We compare performance between the two simulation experiments over a small subset of statistical methods for RNA-seq analysis available in the literature. We use as a benchmark the ability of a method to control the false discovery rate. Not surprisingly, methods based on parametric modeling assumptions seem to perform better with respect to false discovery rate control when data are simulated from parametric models rather than using our more realistic nonparametric simulation strategy.
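To make the idea of data-based simulation concrete, the following is a minimal Python sketch. It is not the authors' implementation. It assumes a source count matrix with two treatment groups and a user-chosen set of genes to treat as differentially expressed (DE). Simulated units are built by sampling source columns without replacement, so the counts of all non-DE genes in a simulated unit come from a single real unit. DE genes in the second simulated group take their counts from the other source group. The names `simulate_counts` and `false_discovery_proportion`, and the swapping scheme itself, are assumptions made for illustration.

```python
import random


def simulate_counts(source, groups, n_per_group, de_genes, seed=0):
    """Hypothetical, simplified data-based simulator (illustration only).

    source: list of gene rows, each a list of read counts per source unit.
    groups: 0/1 treatment label for each source unit (column).
    n_per_group: number of simulated units in each simulated group.
    de_genes: indices of genes to treat as differentially expressed.
    """
    rng = random.Random(seed)
    cols0 = [j for j, g in enumerate(groups) if g == 0]
    cols1 = [j for j, g in enumerate(groups) if g == 1]
    if len(cols0) < 2 * n_per_group or len(cols1) < n_per_group:
        raise ValueError("source dataset has too few units per group")

    # Sample real units without replacement so no unit is reused.
    pick0 = rng.sample(cols0, 2 * n_per_group)
    first_cols = pick0[:n_per_group]    # simulated group one
    null_cols = pick0[n_per_group:]     # simulated group two, non-DE genes
    alt_cols = rng.sample(cols1, n_per_group)  # simulated group two, DE genes

    simulated = []
    for i, row in enumerate(source):
        second_cols = alt_cols if i in de_genes else null_cols
        simulated.append([row[j] for j in first_cols] +
                         [row[j] for j in second_cols])
    return simulated


def false_discovery_proportion(rejected, de_genes):
    """Fraction of rejected genes that are not truly DE (0.0 if none rejected)."""
    rejected = set(rejected)
    if not rejected:
        return 0.0
    return len(rejected - set(de_genes)) / len(rejected)
```

Because the DE status of every gene is known by construction, the list of genes declared significant by any analysis method can be passed to `false_discovery_proportion`. Averaging that proportion over many simulated datasets gives an empirical estimate of the false discovery rate, which is the benchmark the abstract describes.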



