Abstract
Hierarchical cluster analysis (HCA) is a widely used classification technique in many areas of science. Applications usually yield a single dendrogram from an HCA run over a given data set, using a grouping algorithm and a similarity measure. However, even when these parameters are fixed, ties in proximity (i.e. two clusters equidistant from a third one) may produce several different dendrograms with different clustering patterns (different classifications). This situation is usually disregarded and conclusions are based on a single result, which raises questions about whether clusters persist across all the resulting dendrograms; this happens, for example, when HCA is used to group molecular descriptors in order to select the less similar ones in QSAR studies. Representing dendrograms in graph-theoretical terms allowed us to introduce four measures of cluster frequency in a canonical way and to use them to calculate cluster frequencies over the set of all possible dendrograms, taking all ties in proximity into account. A toy example of well-separated clusters was used, as well as a set of 1666 molecular descriptors calculated for a group of molecules having hepatotoxic activity, to show how our functions may be used f...
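The effect of ties in proximity can be illustrated with a minimal sketch. It is not one of the four frequency measures introduced in the paper. It assumes complete linkage, integer distances between points on a line (so that ties can be detected by exact equality), and a simplified view of a dendrogram as the set of clusters formed by its merges. The names `complete_linkage`, `all_dendrograms` and `cluster_frequencies` are hypothetical. The code branches on every tied minimum, collects the distinct dendrograms, and reports the fraction of them that contain each cluster.

```python
from collections import Counter
from itertools import combinations


def complete_linkage(a, b, dist):
    """Distance between clusters a and b: the largest pairwise distance."""
    return max(dist[frozenset((i, j))] for i in a for j in b)


def all_dendrograms(clusters, dist, merged=frozenset()):
    """Yield the set of merged clusters for every tie-breaking choice.

    Assumption: a dendrogram is identified with the set of clusters it
    contains; merge heights are ignored.
    """
    if len(clusters) == 1:
        yield merged
        return
    pairs = list(combinations(clusters, 2))
    d = {p: complete_linkage(p[0], p[1], dist) for p in pairs}
    best = min(d.values())
    for a, b in pairs:
        if d[(a, b)] == best:  # every tied pair opens its own branch
            new = a | b
            rest = [c for c in clusters if c not in (a, b)]
            yield from all_dendrograms(rest + [new], dist, merged | {new})


def cluster_frequencies(points):
    """Return (number of distinct dendrograms, {cluster: fraction containing it})."""
    n = len(points)
    dist = {
        frozenset((i, j)): abs(points[i] - points[j])
        for i, j in combinations(range(n), 2)
    }
    leaves = [frozenset([i]) for i in range(n)]
    # Different merge orders can lead to the same dendrogram, so deduplicate.
    dendrograms = set(all_dendrograms(leaves, dist))
    counts = Counter(c for dendro in dendrograms for c in dendro)
    return len(dendrograms), {c: counts[c] / len(dendrograms) for c in counts}
```

For the toy points `[0, 1, 2, 10, 11]`, the tie between the pairs at distance 1 gives two distinct dendrograms: the clusters `{0, 1}` and `{1, 2}` each appear in half of them, while `{3, 4}` and `{0, 1, 2}` appear in both. Exhaustive branching grows quickly with the number of ties, so this sketch is only practical for small examples.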