No, having smaller clustering levels does not mean the clusters are "better"
(if it did then everyone would only use the single-link method). It also
does not make much sense to use any other 'internal' criteria such as
whether you get the number of clusters you expect or how distinct the
clusters look in the dendrogram.

The criterion has to either be how well the dendrogram summarizes the
information in the original data or dissimilarity matrix or (in a
simulation) how well it reflects known relationships. One concept of
"better" is that the distance relationships among the objects as implied by
the dendrogram ("ultrametric distances" or "cophenetic values") are closely
related to the values in the dissimilarity matrix that you are clustering.
There has been some controversy about this in the past but I still find the
cophenetic correlation (or some index in a similar spirit) to be the most
useful measure of how well a dendrogram fits a dissimilarity matrix --
especially when you want to take the actual clustering levels into account.
I suspect my response may generate some reactions from others!
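
As a rough illustration of the idea above, here is a minimal sketch in Python.
It assumes NumPy and SciPy are available; the helper name
cophenetic_correlation and the random two-group data are made up for the
example and are not from the original question. It clusters the same
dissimilarity matrix with several linkage methods and reports how closely the
cophenetic values implied by each dendrogram track the original
dissimilarities.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist


def cophenetic_correlation(data, method="average"):
    """Correlation between the original dissimilarities and the
    cophenetic (ultrametric) distances implied by the dendrogram.

    `data` is an (n_objects, n_variables) array. Hypothetical helper,
    written only for this illustration.
    """
    # Condensed Euclidean dissimilarity matrix (upper triangle as a vector).
    dissim = pdist(data, metric="euclidean")
    # Hierarchical clustering of that dissimilarity matrix.
    tree = linkage(dissim, method=method)
    # cophenet returns the correlation and the cophenetic distances.
    coph_corr, _coph_dists = cophenet(tree, dissim)
    return float(coph_corr)


# Assumed toy data: two loose groups of 10 objects in 3 dimensions.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(10, 3)),
    rng.normal(loc=5.0, scale=1.0, size=(10, 3)),
])

for method in ("single", "average", "ward"):
    print(method, round(cophenetic_correlation(data, method), 3))
```

A higher value means the dendrogram distorts the dissimilarity matrix less.
Note that the comparison is between each dendrogram and its own dissimilarity
matrix, so the absolute size of the clustering levels does not by itself
enter into the judgement.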

A 'classic' reference would be the book by Sneath and Sokal but this
approach is described in other publications.

> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> [mailto:[log in to unmask]]On Behalf Of Herriton
> Sent: Friday, March 08, 2002 10:57 AM
> To: [log in to unmask]
> Subject: question: comparing dendrograms
>
>
> I have a question on how to compare dendrograms:
>
> I have two dendrograms using two different sets of data.  They both have
> very similar clustering, say, clusters A, B, and C.  The
> difference lies in
> the distance between the members of each cluster.  Say, in cluster A, the
> distance between the members in one dendrogram is on average larger.
>
> I used Euclidean distance and Ward linkage.
>
> Is one dendrogram "better" than the other?  That is, if the
> distance between
> the members of a cluster is on average shorter in one dendrogram,
> does that
> mean the data used for this dendrogram cluster better?
>
> I'm not very familiar with clustering algorithms in general.  Pointers and
> comments are much appreciated.
>
> Philip
>

