Seq-ing improved gene expression estimates from microarrays using machine learning

BMC Bioinformatics
Paul K Korir … Cathal Seoighe

Abstract

Quantifying gene expression by RNA-Seq has several advantages over microarrays, including greater dynamic range and gene expression estimates on an absolute, rather than a relative, scale. Nevertheless, microarrays remain in widespread use, as demonstrated by the ever-growing number of samples deposited in public repositories. We propose a novel approach to microarray analysis that attains many of the advantages of RNA-Seq. This method, called Machine Learning of Transcript Expression (MaLTE), leverages samples for which both microarray and RNA-Seq data are available, using a Random Forest to learn the relationship between the fluorescence intensities of sets of microarray probes and RNA-Seq transcript expression estimates. We trained MaLTE on data from the Genotype-Tissue Expression (GTEx) project, consisting of Affymetrix gene arrays and RNA-Seq from over 700 samples across a broad range of human tissues. This approach can be used to accurately estimate absolute expression levels from microarray data, at both the gene and transcript levels, which has not previously been possible. This methodology will facilitate re-analysis of archived microarray data and broaden the utility of the vast quantities of microarray data still being generated.
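
The abstract describes MaLTE only at a high level: paired samples, microarray probe intensities as inputs, RNA-Seq transcript estimates as targets, and a Random Forest mapping one to the other. The following is a minimal sketch of that idea in plain Python, not the MaLTE implementation. It assumes one forest per transcript whose features are the (normalized, log-scale) intensities of the probes mapped to that transcript; the helper names `RandomForest`, `train_transcript_models`, `predict_expression` and `probes_for`, and all default parameters, are hypothetical.

```python
import random


def _mean(ys):
    return sum(ys) / len(ys)


def _sse(ys):
    # Sum of squared deviations from the mean: the impurity of a regression node.
    m = _mean(ys)
    return sum((v - m) ** 2 for v in ys)


def _best_split(X, y, feats, min_leaf):
    # Among the candidate probes, find the threshold that minimises child-node SSE.
    best = None  # (sse_after_split, feature_index, threshold)
    for f in feats:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def _grow(X, y, rng, max_features, min_leaf, depth):
    # Leaves store the mean RNA-Seq value of the training samples that reach them.
    if depth == 0 or len(y) < 2 * min_leaf:
        return _mean(y)
    # Each node considers a random subset of probes (the "random" in Random Forest).
    feats = rng.sample(range(len(X[0])), max_features)
    split = _best_split(X, y, feats, min_leaf)
    if split is None or split[0] >= _sse(y):
        return _mean(y)
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]

    def child(idx):
        return _grow([X[i] for i in idx], [y[i] for i in idx],
                     rng, max_features, min_leaf, depth - 1)

    return (f, t, child(left), child(right))


def _predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are floats.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Minimal bagged regression forest with random feature subsets (a sketch)."""

    def __init__(self, n_trees=100, max_features=None, min_leaf=2, max_depth=10, seed=0):
        self.n_trees, self.max_features = n_trees, max_features
        self.min_leaf, self.max_depth, self.seed = min_leaf, max_depth, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        # Default mirrors R randomForest's regression mtry: max(floor(p / 3), 1).
        m = min(self.max_features or max(p // 3, 1), p)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample of samples
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    rng, m, self.min_leaf, self.max_depth))
        return self

    def predict(self, X):
        # Average the per-tree predictions for each sample.
        return [_mean([_predict_tree(t, x) for t in self.trees]) for x in X]


def train_transcript_models(intensity, rnaseq, probes_for, **forest_kw):
    # intensity: one dict per paired sample, probe_id -> log-scale intensity
    # rnaseq: transcript_id -> RNA-Seq estimates in the same sample order
    # probes_for: transcript_id -> probe_ids assumed to target that transcript
    models = {}
    for tx, targets in rnaseq.items():
        X = [[sample[p] for p in probes_for[tx]] for sample in intensity]
        models[tx] = RandomForest(**forest_kw).fit(X, targets)
    return models


def predict_expression(models, probes_for, sample):
    # Estimate RNA-Seq-scale expression for a microarray-only sample.
    return {tx: m.predict([[sample[p] for p in probes_for[tx]]])[0]
            for tx, m in models.items()}
```

Training would use only the samples profiled on both platforms; an archived array with no RNA-Seq counterpart would then be passed to `predict_expression`. The forest is hand-rolled only to keep the sketch dependency-free; a maintained implementation such as R's randomForest package is the practical choice.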

Citations

Sep 3, 2016·BMC Bioinformatics·Barbara F F Huang, Paul C Boutros
Feb 8, 2019·BMC Bioinformatics·Nicolas Borisov … Anton Buzdin

Datasets Mentioned

GSE45878
GSE25030
GSE19480
GSE7851

Methods Mentioned

RNA-Seq
chips
gene array
Rosetta Stone

Software Mentioned

Ensembl
Affymetrix Power Tools (APT)
CellMix
MaLTE
Random Forest
BEDTools
Cuffdiff
R
