Apr 27, 2020

Matrix factorization recovers consistent regulatory signals from disparate datasets

BioRxiv : the Preprint Server for Biology
A. V. SastryBernhard O Palsson


The availability of gene expression data has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We also show that echoes of this structure remain in the proteome, accelerating biological discovery through multi-omics analysis. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.

  • References
  • Citations


  • We're still populating references for this paper, please check back later.
  • References
  • Citations


  • This paper may not have been cited yet.

Mentioned in this Paper

Severe Acute Respiratory Syndrome
Genome-Wide Association Study
Protein Binding
Bloom Syndrome
Saccharomyces cerevisiae allergenic extract
Ligand Binding Domain
Transcription Factor Binding

Related Feeds

BioRxiv & MedRxiv Preprints

BioRxiv and MedRxiv are the preprint servers for biology and health sciences respectively, operated by Cold Spring Harbor Laboratory. Here are the latest preprint articles (which are not peer-reviewed) from BioRxiv and MedRxiv.