Corset: enabling differential gene expression analysis for de novo assembled transcriptomes

Genome Biology
Nadia M Davidson, Alicia Oshlack


Next-generation sequencing has made it possible to perform differential gene expression studies in non-model organisms. For these studies, the need for a reference genome is circumvented by performing de novo assembly on the RNA-seq data. However, transcriptome assembly produces a multitude of contigs, which must be clustered into genes prior to differential gene expression detection. Here we present Corset, a method that hierarchically clusters contigs using shared reads and expression, then summarizes read counts to clusters, ready for statistical testing. Using a range of metrics, we demonstrate that Corset outperforms alternative methods. Corset is available from
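The idea of clustering contigs by shared reads and then summarizing counts per cluster can be illustrated with a small sketch. This is not Corset's implementation: the distance formula (one minus the fraction of shared reads relative to the smaller contig), the merge rule, the `threshold` value, and all function names are assumptions chosen for illustration, and the expression-based part of the method is left out entirely.

```python
from itertools import combinations


def shared_read_distance(reads_a, reads_b):
    """Assumed distance: 0 when the smaller read set is fully shared, 1 when nothing is shared."""
    shared = len(reads_a & reads_b)
    return 1.0 - shared / min(len(reads_a), len(reads_b))


def cluster_contigs(contig_reads, threshold=0.7):
    """Greedy agglomerative clustering of contigs by shared reads.

    contig_reads maps a contig name to the set of read IDs aligned to it.
    threshold is an arbitrary illustrative cut-off, not a Corset default.
    """
    # Each cluster is a pair: (set of contig names, set of read IDs).
    # Contigs with no reads are skipped to avoid dividing by zero.
    clusters = [({name}, set(reads)) for name, reads in contig_reads.items() if reads]
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), dist = min(
            (
                ((i, j), shared_read_distance(clusters[i][1], clusters[j][1]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist >= threshold:
            break  # nothing left that is close enough to merge
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def cluster_counts(clusters):
    """Summarize each cluster as the number of distinct reads it contains."""
    return {tuple(sorted(contigs)): len(reads) for contigs, reads in clusters}


# Toy input: contig1 and contig2 share two reads; contig3 shares none.
contig_reads = {
    "contig1": {"r1", "r2", "r3", "r4"},
    "contig2": {"r3", "r4", "r5"},
    "contig3": {"r6", "r7"},
}
counts = cluster_counts(cluster_contigs(contig_reads))
print(counts)
```

In this toy input, contig1 and contig2 are merged because they share most of the smaller contig's reads, while contig3 stays on its own. A per-sample version would keep one such count per cluster per sample, giving a table suitable for count-based statistical testing.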




