DOI: 10.1101/517987
Jan 11, 2019
Paper

A Note on Computing Interval Overlap Statistics

BioRxiv : the Preprint Server for Biology
Shahab Sarmashghi, Vineet Bafna


We consider the following problem: Let I and I_f describe collections of n and m non-overlapping intervals, respectively, on a line segment of finite length. Suppose that k of the m intervals of I_f are intersected by one or more intervals in I. Under the null hypothesis that the intervals in I are randomly arranged with respect to I_f, what is the significance of this overlap? This is a natural abstraction of statistical questions that are ubiquitous in the post-genomic era. The interval collections represent annotations that reveal structural or functional regions of the genome, and overlap statistics can provide insight into the correlation between different structural and functional regions. However, the statistics of interval overlaps have not been systematically explored. In this manuscript, we formulate a statistical significance problem that considers the length and structure of the intervals. We describe a combinatorial algorithm for a constrained interval overlap problem that can accurately compute very small p-values. We also propose a fast approximate method for problems consisting of a very large number of intervals. These methods are all implemented in a tool, ISTAT. We applied ISTAT to simulated interval data to obtain precise estimates ...
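To make the overlap statistic k and the null hypothesis in the abstract concrete, here is a minimal Python sketch. It is a naive Monte Carlo baseline and not the combinatorial algorithm or the fast approximation implemented in ISTAT, neither of which is described in this excerpt. The function names, the half-open `(start, end)` interval convention, and the placement scheme (shuffle the interval order, then spread the free space as uniformly drawn gaps) are all assumptions made for illustration.

```python
import random
from bisect import bisect_left


def count_hits(query, reference):
    """Return k: how many reference intervals are intersected by any query interval.

    Intervals are half-open (start, end) tuples; query must be non-overlapping.
    """
    q = sorted(query)
    starts = [s for s, _ in q]
    hits = 0
    for r_start, r_end in reference:
        # Index of the last query interval that starts before r_end.
        i = bisect_left(starts, r_end) - 1
        # Ends increase with starts for non-overlapping intervals, so it is
        # enough to check this single candidate.
        if i >= 0 and q[i][1] > r_start:
            hits += 1
    return hits


def random_arrangement(lengths, total_length, rng):
    """Place intervals of the given lengths at random, without overlap,
    on [0, total_length] (assumed null model, not taken from the paper)."""
    free = total_length - sum(lengths)
    if free < 0:
        raise ValueError("intervals do not fit on the segment")
    order = list(lengths)
    rng.shuffle(order)
    # Sorted offsets into the free space; earlier interval lengths shift later ones.
    offsets = sorted(rng.uniform(0, free) for _ in order)
    placed, shift = [], 0
    for off, length in zip(offsets, order):
        placed.append((off + shift, off + shift + length))
        shift += length
    return placed


def empirical_p_value(query, reference, total_length, trials=1000, seed=0):
    """Estimate P(K >= k_obs) by re-arranging the query intervals at random."""
    rng = random.Random(seed)
    lengths = [e - s for s, e in query]
    k_obs = count_hits(query, reference)
    extreme = sum(
        count_hits(random_arrangement(lengths, total_length, rng), reference) >= k_obs
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly positive.
    return k_obs, (extreme + 1) / (trials + 1)
```

For example, `count_hits([(0, 2), (5, 6)], [(1, 3), (4, 5), (5.5, 7)])` gives k = 2, because `(4, 5)` only touches `(5, 6)` at an endpoint under the half-open convention. A sampling estimate of this kind cannot resolve p-values smaller than 1/(trials + 1), which is one reason an exact combinatorial computation is attractive when very small p-values matter.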

