LinearSampling: Linear-Time Stochastic Sampling of RNA Secondary Structure with Applications to SARS-CoV-2.

BioRxiv : the Preprint Server for Biology
He ZhangLiang Huang


Many RNAs fold into multiple structures at equilibrium. The classical stochastic sampling algorithm can sample secondary structures according to their probabilities in the Boltzmann ensemble, and is widely used, e.g., for accessibility prediction. However, the current sampling algorithm, consisting of a bottom-up partition function phase followed by a top-down sampling phase, suffers from three limitations: (a) the formulation and implementation of the sampling phase are unnecessarily complicated; (b) much redundant work is repeatedly performed in the sampling phase; (c) the partition function runtime scales cubically with the sequence length. These issues prevent it from being used for full-length viral genomes such as SARS-CoV-2. To address these problems, we first present a hypergraph framework under which the sampling algorithm can be greatly simplified. We then present three sampling algorithms under this framework of which two eliminate redundant work in the sampling phase. Finally, we present LinearSampling, an end-to-end linear-time sampling algorithm that is orders of magnitude faster than the standard algorithm. For instance, LinearSampling is 111 times faster (48s vs. 1.5h) than Vienna RNAsubopt on the longest sequen...Continue Reading

Methods Mentioned


Related Concepts

Related Feeds

BioRxiv & MedRxiv Preprints

BioRxiv and MedRxiv are the preprint servers for biology and health sciences respectively, operated by Cold Spring Harbor Laboratory. Here are the latest preprint articles (which are not peer-reviewed) from BioRxiv and MedRxiv.