Dear Philip,

A) Dendrograms as trees

Here I assume that "dendrogram" means the result of a hierarchical clustering method. The problem of evaluating cluster quality or cluster validity has not yet been solved entirely, and in my personal opinion it is very unlikely that a "general solution" to assessing cluster quality exists. However, you might want to look into the work of Laura Mather (JASIS, 2000), where she describes a linear-algebra measure of cluster quality. Be aware that her measure relies on a specific property of the document-term matrix being clustered; see also my comment in JASIST (2001). I have implemented her method in Java and published its application at a bioinformatics conference last year (MGED3).

B) Dendrograms as ontologies

Here I assume that "dendrogram" means a manually or automatically generated ontology in which the links between the nodes (in other words, the edges between vertices) may belong to different semantic types. This is the task of "comparing ontologies". Here the matter gets even more complicated, because you have to build, e.g., semantic distances, which is itself a topic still lacking a generally valid solution. I have found the recent work by Maedche and Staab instructive, and perhaps you might want to use it as a starting point for your further explorations.

Regards,
Luca Toldo
[log in to unmask]

References
Mather LA (2000) JASIS 51(7):602-613
Maedche A, Staab S (2001) http://citeseer.nj.nec.com/439130.html
Toldo L (2001) JASIST 52(7):601-602
Toldo L (2001) MGED3 http://www.dnachip.org/mged3/mged3_abstracts_v4.pdf

-----------------------------------------------------------------

From: [log in to unmask]@LISTS.SUNYSB.EDU, 09.03.2002 15:37:12
To: [log in to unmask]
Subject: Re: question: comparing dendrograms

No, having smaller clustering levels does not mean the clusters are "better" (if it did, then everyone would only use the single-link method).
It also does not make much sense to use any other "internal" criterion, such as whether you get the number of clusters you expect or how distinct the clusters look in the dendrogram. The criterion has to be either how well the dendrogram summarizes the information in the original data or dissimilarity matrix, or (in a simulation) how well it reflects known relationships.

One notion of "better" is that the distance relationships among the objects implied by the dendrogram ("ultrametric distances" or "cophenetic values") are closely related to the values in the dissimilarity matrix being clustered. There has been some controversy about this in the past, but I still find the cophenetic correlation (or some index in a similar spirit) to be the most useful measure of how well a dendrogram fits a dissimilarity matrix -- especially when you want to take the actual clustering levels into account. I suspect my response may generate some reactions from others! A classic reference is the book by Sneath and Sokal, but this approach is described in other publications as well.

> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> [mailto:[log in to unmask]] On Behalf Of Herriton
> Sent: Friday, March 08, 2002 10:57 AM
> To: [log in to unmask]
> Subject: question: comparing dendrograms
>
> I have a question on how to compare dendrograms:
>
> I have two dendrograms built from two different sets of data. They both
> show very similar clustering, say, clusters A, B, and C. The difference
> lies in the distance between the members of each cluster. Say, in
> cluster A, the distance between the members in one dendrogram is on
> average larger.
>
> I used Euclidean distance and Ward linkage.
>
> Is one dendrogram "better" than the other? That is, if the distance
> between the members of a cluster is on average shorter in one
> dendrogram, does that mean the data used for this dendrogram cluster
> better?
>
> I'm not very familiar with clustering algorithms in general. Pointers
> and comments are much appreciated.
>
> Philip
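[A sketch, not part of the original thread: the cophenetic correlation recommended in the reply above can be computed with scipy. The data sets below are made up for illustration -- two hypothetical samples with the same three-cluster structure but different within-cluster spread, clustered with Euclidean distance and Ward linkage as in Philip's question.]

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

def cophenetic_fit(data):
    """Cophenetic correlation between a Ward dendrogram of `data`
    and the Euclidean dissimilarity matrix it was built from."""
    d = pdist(data, metric="euclidean")   # condensed dissimilarity matrix
    Z = linkage(d, method="ward")         # hierarchical clustering (Ward)
    c, _ = cophenet(Z, d)                 # correlate ultrametric vs. original distances
    return c

# Hypothetical data: same cluster centers, tighter clusters in data_a.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data_a = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
data_b = np.vstack([c + rng.normal(scale=2.0, size=(20, 2)) for c in centers])

print("cophenetic correlation, data A:", round(cophenetic_fit(data_a), 3))
print("cophenetic correlation, data B:", round(cophenetic_fit(data_b), 3))
# A higher correlation means the dendrogram's implied (cophenetic) distances
# track the observed dissimilarities more faithfully -- a sense of "better"
# that does not reduce to smaller clustering levels.
```

Note that this compares each dendrogram against its own dissimilarity matrix, which is the fit criterion described in the reply; it does not claim that shorter within-cluster distances by themselves make one dendrogram better.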