RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections

bioRxiv: the Preprint Server for Biology
Jaime A Castro-Mondragon, Jacques van Helden

Abstract

Transcription factor (TF) databases contain large numbers of motifs from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods has stimulated the production of new collections with ever-increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions of the same motifs, because available tools are not suited to automatically identifying and exploring biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq peaks) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar transcription factor binding motifs (TFBMs) into multiple trees and automatically creates non-redundant collections of motifs. Features unique to matrix-clustering are its dynamic visualisation of aligned TFBMs and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools and highlights biologically relevant variations of similar motifs. By clustering 24 entire databases (>7,500 motifs), we show that matrix-clustering c...
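The general idea of grouping similar motifs into several trees and keeping one motif per tree can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not matrix-clustering's implementation: motifs are position probability matrices with columns in A, C, G, T order; pairwise similarity is the best Pearson correlation over ungapped alignments of a motif or its reverse complement; clusters come from average-linkage agglomeration that stops below an arbitrary threshold (0.8), which yields a forest rather than a single tree; and each cluster is represented by its medoid. The functions `similarity`, `cluster`, `representatives` and `from_consensus`, as well as the `threshold` and `min_overlap` values, are hypothetical and are not the tool's API or defaults.

```python
from itertools import combinations
from statistics import mean

# Assumption: a motif is a list of columns, each holding probabilities for A, C, G, T.


def revcomp(motif):
    # Reverse the column order; reversing each [A, C, G, T] column gives
    # [T, G, C, A], i.e. the complementary base probabilities.
    return [col[::-1] for col in reversed(motif)]


def pearson(xs, ys):
    # Pearson correlation; returns 0.0 when either vector has zero variance.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def similarity(a, b, min_overlap=4):
    # Hypothetical metric: best correlation over all ungapped offsets of b
    # (and of its reverse complement) against a, over >= min_overlap columns.
    best = -1.0
    for bb in (b, revcomp(b)):
        for offset in range(-(len(bb) - 1), len(a)):
            # Column i of a is aligned with column i - offset of bb.
            pairs = [(a[i], bb[i - offset]) for i in range(len(a))
                     if 0 <= i - offset < len(bb)]
            if len(pairs) < min_overlap:
                continue
            xs = [v for col_a, _ in pairs for v in col_a]
            ys = [v for _, col_b in pairs for v in col_b]
            best = max(best, pearson(xs, ys))
    return best


def cluster(motifs, threshold=0.8):
    # Average-linkage agglomeration that stops once the best inter-cluster
    # similarity falls below threshold, leaving a forest: one tree
    # (nested tuples of motif indices) per cluster.
    sim = {}
    for i, j in combinations(range(len(motifs)), 2):
        sim[i, j] = sim[j, i] = similarity(motifs[i], motifs[j])
    clusters = [[i] for i in range(len(motifs))]
    trees = list(range(len(motifs)))
    while len(clusters) > 1:
        best_score, best_pair = None, None
        for p, q in combinations(range(len(clusters)), 2):
            score = mean(sim[i, j] for i in clusters[p] for j in clusters[q])
            if best_score is None or score > best_score:
                best_score, best_pair = score, (p, q)
        if best_score < threshold:
            break
        p, q = best_pair  # p < q, so deleting index q keeps index p valid
        clusters[p] += clusters[q]
        trees[p] = (trees[p], trees[q])
        del clusters[q], trees[q]
    return clusters, trees, sim


def representatives(clusters, sim):
    # Non-redundant collection: from each cluster keep the member with the
    # highest mean similarity to the other members (its medoid).
    reps = []
    for members in clusters:
        if len(members) == 1:
            reps.append(members[0])
        else:
            reps.append(max(members, key=lambda i: mean(
                sim[i, j] for j in members if j != i)))
    return reps


def from_consensus(seq, p=0.85):
    # Hypothetical helper: turn a consensus string into a probability matrix.
    q = (1 - p) / 3
    return [[p if base == b else q for b in "ACGT"] for base in seq]


if __name__ == "__main__":
    motifs = [from_consensus(s) for s in
              ("CACGTGAC", "TCACGTGA", "GATAAGAT", "ATCTTATC")]
    clusters, trees, sim = cluster(motifs)
    print(clusters, trees, representatives(clusters, sim))
```

In this toy input, the two E-box-like consensus motifs (the second is a one-column shift of the first) fall into one tree, and the GATA-like motif groups with its own reverse complement, leaving one representative per tree. Stopping the agglomeration at a similarity threshold, instead of building one tree over all motifs, is what lets a single run produce several independent trees.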
