CLAG: an unsupervised non hierarchical clustering algorithm handling biological data

BMC Bioinformatics
Linda Dib, Alessandra Carbone

Abstract

Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others belong to several clusters. Under these conditions, hierarchical agglomerative clustering is not appropriate. Moreover, when too little is known about the dataset, as is often the case, supervised classification is not appropriate either. CLAG (for CLusters AGgregation) is an unsupervised, non-hierarchical clustering algorithm designed to cluster a wide variety of biological data and to output a clustered matrix together with numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require every data point to be clustered, and it converges to the same result at each run. Its simplicity and speed allow it to run on reasonably large datasets. CLAG can be used to investigate the cluster structure present in a biological dataset and to identify its underlying graph. It proved more informative and accurate than several known clustering methods, such as hierarchical agglomera...
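The abstract does not describe CLAG's procedure, so the following is not CLAG. It is only a minimal sketch of two properties the abstract names: a deterministic result at each run, and data points that may be left unclustered. The function name `threshold_clusters`, the `threshold` parameter, and the toy correlation matrix are all assumptions made for illustration. The sketch links items whose pairwise correlation reaches the threshold and reports the connected components of the resulting graph. It assumes a symmetric matrix and does not model points that belong to several clusters.

```python
from typing import Dict, List, Sequence


def threshold_clusters(
    corr: Sequence[Sequence[float]], threshold: float
) -> Dict[str, list]:
    """Group items whose pairwise correlation reaches `threshold`.

    Hypothetical helper for illustration only; assumes `corr` is symmetric.
    Items with no sufficiently correlated partner are left unclustered.
    """
    n = len(corr)
    # Build an undirected graph: i and j are linked when corr[i][j] >= threshold.
    neighbours: List[List[int]] = [
        [j for j in range(n) if j != i and corr[i][j] >= threshold]
        for i in range(n)
    ]
    seen = [False] * n
    clusters: List[List[int]] = []
    unclustered: List[int] = []
    # Items are visited in index order, so repeated runs give the same output.
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        if not neighbours[start]:
            unclustered.append(start)  # no partner: not forced into a cluster
            continue
        component: List[int] = []
        stack = [start]
        while stack:  # depth-first walk over the connected component
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append(nxt)
        clusters.append(sorted(component))
    return {"clusters": clusters, "unclustered": unclustered}


# Toy symmetric correlation matrix (made-up values).
toy_corr = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.1, 0.3, 1.0],
]
result = threshold_clusters(toy_corr, threshold=0.8)
print(result)
```

With the toy matrix, items 0 and 1 form one cluster and items 2 and 3 stay unclustered. A real analysis would need a principled choice of threshold and a cluster-strength score, neither of which this sketch attempts.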


