Grouper: graph-based clustering and annotation for improved de novo transcriptome analysis

Bioinformatics
Laraib Malik, Rob Patro

Abstract

De novo transcriptome analysis using RNA-seq offers a promising means to study gene expression in non-model organisms. Yet the difficulty of transcriptome assembly means that the contigs provided by the assembler often represent a fractured and incomplete view of the transcriptome, which complicates downstream analysis. We introduce Grouper, a new method for clustering contigs from de novo assemblies that are likely to belong to the same transcripts and genes; these groups can subsequently be analyzed more robustly. When provided with access to the genome of a related organism, Grouper can transfer annotations to the de novo assembly, further improving the clustering. On de novo assemblies from four different species, we show that Grouper accurately clusters a larger number of contigs than the existing state-of-the-art method. The Grouper pipeline maps more than 10% more reads to the contigs, leading to accurate downstream differential expression analyses. In the presence of a closely related annotated genome, the labeling module can efficiently transfer annotations to the contigs and use this information to further improve clustering. Overall, Grouper provides a complete and efficient pipeline for proc...
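The abstract describes Grouper's clustering only at a high level. As a rough illustration of the general idea of graph-based contig clustering, the sketch below links two contigs when at least `min_shared` reads map to both of them and reports each connected component as one group. The function name `cluster_contigs`, the `read_hits` input (a mapping from read ID to the contigs that read maps to), and the threshold are all hypothetical; this is a simplified stand-in, not Grouper's actual clustering or edge-filtering procedure.

```python
from collections import defaultdict
from itertools import combinations


def cluster_contigs(read_hits, min_shared=2):
    """Hypothetical sketch: cluster contigs as connected components of a graph
    whose edges join contigs that share at least `min_shared` reads."""
    # Count how many reads map to each unordered pair of contigs.
    shared = defaultdict(int)
    contigs = set()
    for hits in read_hits.values():
        hits = sorted(set(hits))
        contigs.update(hits)
        for a, b in combinations(hits, 2):
            shared[(a, b)] += 1

    # Union-find (with path halving) to merge contigs joined by a kept edge.
    parent = {c: c for c in contigs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:  # drop weakly supported edges
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    # Collect components; sort for deterministic output.
    groups = defaultdict(set)
    for c in contigs:
        groups[find(c)].add(c)
    return sorted(sorted(g) for g in groups.values())


read_hits = {"r1": ["c1", "c2"], "r2": ["c1", "c2"], "r3": ["c2", "c3"], "r4": ["c4"]}
print(cluster_contigs(read_hits))  # [['c1', 'c2'], ['c3'], ['c4']]
```

Here `c2` and `c3` share only one read, so with the default threshold they stay in separate groups; lowering `min_shared` to 1 would merge `c1`, `c2`, and `c3`. Raising the threshold trades fewer spurious merges for more fragmented groups.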
