One commonly used approach is to calculate, for each cut, the ratio of the
between-cluster to the within-cluster mean square (the F ratio of a one-way
ANOVA) and its p-value, and then use the cut with the smallest p-value.
This is at best a useful heuristic, however. Probably the best approach is
to choose the clustering that is most interpretable substantively, or to
use a technique such as K-means, which is designed to find a simple
clustering (or partition) in the first place! (For a fixed K, K-means
minimizes the within-cluster sum of squares, which is equivalent to
maximizing the F ratio discussed above, so you may want to use the relative
p-values as well as interpretability and other criteria to choose the
appropriate value of K.)
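
To make the heuristic concrete, here is a minimal sketch in plain Python.
Everything in it is an assumption for illustration: the toy data, the naive
single-linkage routine, and the helper names agglomerate and f_ratio are
hypothetical, and for multivariate points the sums of squares are simply
pooled over coordinates. It ranks the cuts by the F ratio alone; to rank by
p-value instead, each F would be referred to an F distribution with
(K - 1, N - K) degrees of freedom, for example with scipy.stats.f.sf.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Naive single-linkage agglomerative clustering (illustrative only).

    Returns a dict mapping K to the partition (a list of K lists of point
    indices) obtained by cutting the tree so that K clusters remain.
    """
    clusters = [[i] for i in range(len(points))]
    cuts = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # Pick the two clusters whose closest members are nearest each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
        cuts[len(clusters)] = [c[:] for c in clusters]
    return cuts


def f_ratio(points, clusters):
    """Between-to-within mean-square ratio, pooled over coordinates.

    Requires 2 <= K < N and a nonzero within-cluster sum of squares.
    """
    n, k, dim = len(points), len(clusters), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters:
        centroid = [
            sum(points[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        between += len(members) * dist(centroid, grand) ** 2
        within += sum(dist(points[i], centroid) ** 2 for i in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three tight, well-separated groups in the plane.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8),
    (10.0, 0.0), (10.1, 0.2), (9.9, 0.1),
]

cuts = agglomerate(points)
# Score each candidate cut and keep the one with the largest F ratio.
scores = {k: f_ratio(points, cuts[k]) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Because the degrees of freedom change with K, the cut with the largest F
need not be the one with the smallest p-value, so treat the ranking above
only as a rough stand-in for the p-value comparison.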
Doug Carroll
At 12:49 PM 5/5/2004 0500, you wrote:
>Dear all,
>
>I am now working on hierarchical clustering methods and am
>confused about the following problem.
>
>As you know, to form clusters from the hierarchical tree generated from
>the pairwise distances between the elements, we have to set a threshold value
>at which to cut the tree horizontally, so that the vertical links intersecting
>this horizontal cut define the final clusters.
>
>However, I have not found a very robust criterion for choosing the
>optimal number of clusters, or for calculating this threshold value, that gives
>good clustering results across different pairwise distance (similarity) measures.
>
>Does anyone have any pointers on this problem, or recommended papers
>or methods?
>
>Thanks for your help.
>
>Fred
