An algorithm to infer similarity among cell types and organisms by examining the most expressed sequences

Genetics and Molecular Research : GMR
S A P Pinto, J M Ortega

Abstract

Following sequence alignment, clustering algorithms are among the most utilized techniques in gene expression data analysis. Clustering gene expression patterns allows researchers to determine which gene expression patterns are alike and most likely to participate in the same biological process being investigated. Gene expression data also allow the clustering of whole samples of data, which makes it possible to find which samples are similar and, consequently, which sampled biological conditions are alike. Here, a novel similarity measure calculation and the resulting rank-based clustering algorithm are presented. The clustering was applied in 418 gene expression samples from 13 data series spanning three model organisms: Homo sapiens, Mus musculus, and Arabidopsis thaliana. The initial results are striking: more than 91% of the samples were clustered as expected. The MESs (most expressed sequences) approach outperformed some of the most used clustering algorithms applied to this kind of data such as hierarchical clustering and K-means. The clustering performance suggests that the new similarity measure is an alternative to the traditional correlation/distance measures typically used in clustering algorithms.

Related Concepts

Disease Clustering
Arabidopsis thaliana <plant>
MRNA Differential Display
Gene Expression
House mice
Research Personnel
RNA, Guide
Arabidopsis thaliana <plant>
Patterns
Analysis

Trending Feeds

COVID-19

Coronaviruses encompass a large family of viruses that cause the common cold as well as more serious diseases, such as the ongoing outbreak of coronavirus disease 2019 (COVID-19; formally known as 2019-nCoV). Coronaviruses can spread from animals to humans; symptoms include fever, cough, shortness of breath, and breathing difficulties; in more severe cases, infection can lead to death. This feed covers recent research on COVID-19.

Chronic Fatigue Syndrome

Chronic fatigue syndrome is a disease characterized by unexplained disabling fatigue; the pathology of which is incompletely understood. Discover the latest research on chronic fatigue syndrome here.

Systemic Juvenile Idiopathic Arthritis

Systemic juvenile idiopathic arthritis is a rare rheumatic disease that affects children. Symptoms include joint pain, but also fevers and skin rashes. Here is the latest on this disease.

Chromatin Regulation and Circadian Clocks

The circadian clock plays an important role in regulating transcriptional dynamics through changes in chromatin folding and remodelling. Discover the latest research on Chromatin Regulation and Circadian Clocks here.

Central Pontine Myelinolysis

Central Pontine Myelinolysis is a neurologic disorder caused most frequently by rapid correction of hyponatremia and is characterized by demyelination that affects the central portion of the base of the pons. Here is the latest research on this disease.

Myocardial Stunning

Myocardial stunning is a mechanical dysfunction that persists after reperfusion of previously ischemic tissue in the absence of irreversible damage including myocardial necrosis. Here is the latest research.

Pontocerebellar Hypoplasia

Pontocerebellar hypoplasias are a group of neurodegenerative autosomal recessive disorders with prenatal onset, atrophy or hypoplasia of the cerebellum, hypoplasia of the ventral pons, microcephaly, variable neocortical atrophy and severe mental and motor impairments. Here is the latest research on pontocerebellar hypoplasia.

Cell Atlas Along the Gut-Brain Axis

Profiling cells along the gut-brain axis at the single cell level will provide unique information for each cell type, a three-dimensional map of how cell types work together to form tissues, and insights into how changes in the map underlie health and disease of the GI system and its crosstalk with the brain. Disocver the latest research on single cell analysis of the gut-brain axis here.

Chronic Traumatic Encephalopathy

Chronic Traumatic Encephalopathy (CTE) is a progressive degenerative disease that occurs in individuals that suffer repetitive brain trauma. Discover the latest research on traumatic encephalopathy here.