EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering

BMC Bioinformatics
Soohyun Lee, …, Peter J Park

Abstract

RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consist of tens of millions of short reads sequenced from many different transcripts. However, because of sequence similarity among genes and among isoforms of the same gene, the transcript of origin of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to trade accuracy against computational cost.

We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they map and finds maximum likelihood estimates using a joint Poisson model for each optimal set of transcript segments. The method uses nearly all mapped reads, including those that map to multiple genes. With efficient transcriptome indexing based on modified suffix arrays, EMSAR keeps CPU time and memory use low while achieving accuracy comparable to the best existing methods.

In summary, EMSAR quantifies transcripts from RNA-seq data with high accuracy and low computational cost. It is available at https://github.com/parklab/emsar.
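The following minimal Python sketch illustrates the grouping and joint Poisson model described above. It assumes reads have already been aligned, so each read is represented by the list of transcripts it maps to. It also assumes that the effective length of every segment in every transcript has already been computed. That length is the number of read start positions from which a read would map to exactly that transcript set; here the lengths are hypothetical toy values. Each read group s is modeled as n_s ~ Poisson(Σ_t ℓ_{s,t} θ_t), where θ_t is a per-position read rate for transcript t, and the rates are fitted with the standard multiplicative EM update for Poisson linear models. The names `group_reads` and `poisson_em`, the toy transcripts, and the use of EM as the optimizer are illustrative assumptions. This is not EMSAR's code, and it does not reproduce EMSAR's segmentation, reclustering, or suffix-array index.

```python
from collections import Counter


def group_reads(read_hits):
    """Collapse reads into groups keyed by the set of transcripts each read maps to.

    read_hits: iterable of per-read transcript-ID lists; unmapped reads
    (empty lists) are dropped.
    """
    return Counter(frozenset(hits) for hits in read_hits if hits)


def poisson_em(counts, seg_len, max_iter=10_000, tol=1e-12):
    """Maximum likelihood per-position read rates under a joint Poisson model.

    counts:  {segment_key: number of reads observed in that read group}
    seg_len: {segment_key: {transcript: effective length, i.e. the number of
              read start positions in that transcript whose reads map to
              exactly the transcripts in segment_key}}; lengths must be > 0.

    Model (an assumption of this sketch): n_s ~ Poisson(mu_s), with
    mu_s = sum_t seg_len[s][t] * theta[t]. The multiplicative EM update
    below never decreases sum_s (n_s * log(mu_s) - mu_s).
    """
    transcripts = sorted({t for lens in seg_len.values() for t in lens})
    # Total effective length ("exposure") of each transcript over all segments.
    exposure = dict.fromkeys(transcripts, 0.0)
    for lens in seg_len.values():
        for t, length in lens.items():
            exposure[t] += length

    # Start every transcript at the pooled average rate.
    start = sum(counts.values()) / sum(exposure.values())
    theta = dict.fromkeys(transcripts, start)

    for _ in range(max_iter):
        # E-step: split each group's reads among its transcripts in
        # proportion to their expected contribution seg_len * theta.
        assigned = dict.fromkeys(transcripts, 0.0)
        for s, lens in seg_len.items():
            n = counts.get(s, 0)
            if n == 0:
                continue  # zero-count groups enter only through exposure
            mu = sum(length * theta[t] for t, length in lens.items())
            for t, length in lens.items():
                assigned[t] += n * length * theta[t] / mu
        # M-step: rate = reads assigned to t / total exposure of t.
        new_theta = {t: assigned[t] / exposure[t] for t in transcripts}
        change = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if change < tol:
            break
    return theta


if __name__ == "__main__":
    # Hypothetical toy data: transcripts A and B share a 200-position segment.
    reads = [["A"]] * 100 + [["B"]] * 100 + [["A", "B"]] * 600
    seg_len = {
        frozenset({"A"}): {"A": 100},
        frozenset({"B"}): {"B": 50},
        frozenset({"A", "B"}): {"A": 200, "B": 200},
    }
    theta = poisson_em(group_reads(reads), seg_len)
    print({t: round(v, 6) for t, v in theta.items()})  # {'A': 1.0, 'B': 2.0}
```

The toy counts equal their expected values under θ_A = 1 and θ_B = 2, so the fit recovers those rates. Under this simplified model, θ_t is proportional to the number of transcript molecules, so θ_t / Σθ gives relative abundances. Transcripts that never share a segment do not interact in the likelihood, so the problem could be split into independent groups and solved separately. That is one natural reading of the reclustering step, but the sketch simply fits everything at once.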

Methods Mentioned

RNA-seq

Software Mentioned

IsoEM
Ensembl
eXpress
RSEM
FLUX
RefSeq
NEUMA
Cufflinks
EUMA
