Mar 26, 2015

Large-Scale Search of Transcriptomic Read Sets with Sequence Bloom Trees

BioRxiv : the Preprint Server for Biology
Brad Solomon, Carleton Kingsford

Abstract

Enormous databases of short-read RNA-seq experiments such as the NIH Sequence Read Archive (SRA) are now available. However, these collections remain difficult to use because they cannot be searched for a particular expressed sequence. A natural question is which of these experiments contain sequences that indicate the expression of a particular sequence such as a gene isoform, lncRNA, or uORF. At present, this is a computationally demanding question at the scale of these databases. We introduce an indexing scheme, the Sequence Bloom Tree (SBT), to support sequence-based querying of terabase-scale collections of thousands of short-read sequencing experiments. We apply SBT to the problem of finding conditions under which query transcripts are expressed. Our experiments are conducted on a set of 2652 publicly available RNA-seq experiments contained in the NIH SRA for breast, blood, and brain tissues, comprising 5 terabytes of sequence. SBTs of this size can be queried for a 1000 nt sequence in 19 minutes using less than 300 MB of RAM, over 100 times faster than standard usage of SRA-BLAST and 119 times faster than STAR. SBTs allow for fast identification of experiments with expressed novel isoforms, even if the...
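The abstract names the Sequence Bloom Tree but does not describe its layout. As a rough illustration only, the sketch below assumes the commonly described design: each leaf holds a Bloom filter over the k-mers of one experiment, each internal node holds the bitwise OR of its children, and a query descends only into nodes that contain at least a fraction theta of the query k-mers. The names `K`, `M`, `THETA`, `leaf`, `build`, and `query` are hypothetical, the single SHA-256-based hash and the simple pairwise tree construction are simplifications, and none of this is the authors' implementation.

```python
import hashlib

K = 5          # k-mer length (assumed; kept tiny for illustration)
M = 1 << 16    # bits per Bloom filter (assumed)
THETA = 0.8    # fraction of query k-mers that must be present (assumed)


def kmers(seq, k=K):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_index(kmer, m=M):
    """Map a k-mer to one bit position using a single hash function."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big") % m


class Node:
    def __init__(self, bits, name=None, left=None, right=None):
        self.bits = bits      # Bloom filter stored as a Python int bit vector
        self.name = name      # experiment name for leaves, None otherwise
        self.left = left
        self.right = right


def leaf(name, reads):
    """Build a leaf whose filter contains every k-mer of the given reads."""
    bits = 0
    for read in reads:
        for km in kmers(read):
            bits |= 1 << bit_index(km)
    return Node(bits, name=name)


def build(leaves):
    """Pair nodes level by level; each parent is the OR of its children."""
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            nxt.append(Node(a.bits | b.bits, left=a, right=b))
        if len(level) % 2:
            nxt.append(level[-1])  # carry an unpaired node up unchanged
        level = nxt
    return level[0]


def query(node, query_kmers, theta=THETA):
    """Return names of leaves holding at least theta of the query k-mers."""
    if not query_kmers:
        return []
    hits = sum(1 for km in query_kmers if (node.bits >> bit_index(km)) & 1)
    if hits < theta * len(query_kmers):
        return []  # prune: no leaf below this node can reach the threshold
    if node.name is not None:
        return [node.name]
    return query(node.left, query_kmers, theta) + query(node.right, query_kmers, theta)


# Toy data: three made-up "experiments", one read each.
root = build([
    leaf("A", ["ACGTACGTTAGCCGATAGGCTA"]),
    leaf("B", ["TTTTGGGGCCCCAAAATTGGCCAA"]),
    leaf("C", ["GATTACAGATTACACATTAGGAC"]),
])
print(query(root, kmers("TAGCCGATAGG")))
```

Because a parent filter is the union of its children, a failed threshold test at a node rules out every experiment beneath it, which is what lets a query skip most of the collection. Bloom filters never produce false negatives, so an experiment that truly contains the query k-mers is always reported; false positives remain possible and become less likely as `M` grows.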
