DOI: 10.1101/477869 | Nov 29, 2018 | Paper

CoCo: RNA-seq Read Assignment Correction for Nested Genes and Multimapped Reads

BioRxiv: the Preprint Server for Biology
Gabrielle Deschamps-Francoeur, Michelle S Scott

Abstract

Motivation: Next-generation sequencing techniques have revolutionized the study of RNA expression by permitting whole-transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage.

Results: Here we present CoCo, a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes, and it distributes multimapped reads proportionally between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA, as validated by PCR and bedgraph comparisons.

Availability: The CoCo software is an open-source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco.
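To make the idea of proportionally distributing multimapped reads concrete, the following is a minimal sketch and not CoCo's actual implementation. It assumes that each multimapped read is split among its candidate genes in proportion to their uniquely mapped read counts, with an even split when no candidate has unique reads. The function name, the gene names, and the fallback rule are all hypothetical; the abstract does not specify how the proportions are computed.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_groups):
    """Split each multimapped read among its candidate genes.

    unique_counts: dict mapping gene -> number of uniquely mapped reads.
    multimapped_groups: iterable of tuples, one per multimapped read,
        listing the genes that read aligns to.
    Assumption (not taken from the source): weights are the unique counts.
    """
    totals = defaultdict(float)
    for gene, count in unique_counts.items():
        totals[gene] += count

    for genes in multimapped_groups:
        weights = [unique_counts.get(g, 0) for g in genes]
        weight_sum = sum(weights)
        for gene, weight in zip(genes, weights):
            # Fall back to an even split when no candidate has unique reads.
            if weight_sum > 0:
                share = weight / weight_sum
            else:
                share = 1 / len(genes)
            totals[gene] += share

    return dict(totals)


# Hypothetical example: two copies of a repeated gene share four reads.
unique = {"GENE_COPY_A": 30, "GENE_COPY_B": 10}
shared_reads = [("GENE_COPY_A", "GENE_COPY_B")] * 4
result = distribute_multimapped(unique, shared_reads)
```

In this example the copy with three times as many unique reads receives three quarters of each shared read, so no aligned read is discarded and the total read count is preserved.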
