## Abstract

We consider the following problem: Let I and I_f describe collections of n and m non-overlapping intervals, respectively, on a line segment of finite length. Suppose that k of the m intervals of I_f are intersected by one or more intervals in I. Under the null hypothesis that the intervals in I are randomly arranged with respect to I_f, what is the significance of this overlap? This is a natural abstraction of statistical questions that are ubiquitous in the post-genomic era. The interval collections represent annotations that reveal structural or functional regions of the genome, and overlap statistics can provide insight into the correlation between different structural and functional regions. However, the statistics of interval overlaps have not been systematically explored. In this manuscript, we formulate a statistical significance problem that considers the length and structure of intervals. We describe a combinatorial algorithm for a constrained interval overlap problem that can accurately compute very small p-values. We also propose a fast approximate method to handle problems consisting of a very large number of intervals. These methods are all implemented in a tool, ISTAT. We applied ISTAT to simulated interval data to obtain precise estimates ...
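
To make the problem statement concrete, the following is a minimal sketch of the overlap statistic k and a naive Monte Carlo estimate of its significance. It is not the combinatorial algorithm or the approximate method described in this abstract, and it is not ISTAT's interface. The function names (`count_hits`, `random_arrangement`, `monte_carlo_p_value`), the half-open interval convention, and the particular null model (interval lengths preserved, order shuffled, free space redistributed uniformly) are all assumptions made for illustration.

```python
import random
from bisect import bisect_left


def count_hits(query, reference):
    """Return k: the number of reference intervals hit by any query interval.

    Both inputs are sorted lists of non-overlapping half-open (start, end)
    pairs. Half-open is an assumption: intervals that only touch do not count.
    """
    starts = [s for s, _ in query]
    hits = 0
    for r_start, r_end in reference:
        # Last query interval that starts before the reference interval ends.
        # Query intervals are sorted and disjoint, so it also has the largest
        # end among all candidates.
        i = bisect_left(starts, r_end) - 1
        if i >= 0 and query[i][1] > r_start:
            hits += 1
    return hits


def random_arrangement(lengths, total, rng):
    """Place intervals of the given lengths on [0, total) without overlap.

    Assumed null model: shuffle the order, then split the free space into
    gaps using sorted uniform offsets.
    """
    lengths = list(lengths)
    rng.shuffle(lengths)
    free = total - sum(lengths)
    offsets = sorted(rng.uniform(0, free) for _ in lengths)
    placed, used = [], 0
    for off, length in zip(offsets, lengths):
        start = off + used
        placed.append((start, start + length))
        used += length
    return placed


def monte_carlo_p_value(query, reference, total, trials=1000, seed=0):
    """Estimate P(k_random >= k_observed) by repeated random arrangement."""
    rng = random.Random(seed)
    observed = count_hits(query, reference)
    lengths = [end - start for start, end in query]
    extreme = sum(
        count_hits(random_arrangement(lengths, total, rng), reference) >= observed
        for _ in range(trials)
    )
    # Add-one estimator keeps the estimate strictly positive.
    return observed, (extreme + 1) / (trials + 1)


if __name__ == "__main__":
    query = [(0, 5), (10, 12), (20, 30)]
    reference = [(4, 6), (7, 9), (12, 15), (25, 26)]
    k, p = monte_carlo_p_value(query, reference, total=40)
    print(k, p)
```

A sampling estimate of this kind cannot resolve p-values much below 1/(trials + 1), so very small p-values require either an impractical number of trials or an exact computation.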