One approach that's often used is to treat each cut of the tree as the
grouping factor in a one-way ANOVA: calculate the between-to-within mean
square ratio (the F ratio) and its p-value for each cut, and then use
the cut with the smallest p-value. This is at best a useful heuristic,
however. Probably the best approach is to choose the clustering that is
most interpretable substantively, or to use a technique such as K-means
that is designed to find a simple clustering (or partition) in the first
place! (For a given K, K-means explicitly seeks the partition into K
clusters that maximizes the F ratio discussed above, so you may want to
use the relative p-values as well as interpretability and other criteria
to choose the appropriate value of K.)
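
For what it's worth, here is a rough sketch of the F-ratio heuristic in
Python, assuming NumPy and SciPy are available. The function names
(anova_f, scan_cuts), the average-linkage default and the simulated data
are illustrative assumptions, not anything from the original question.
For data with more than one variable the sketch pools the sums of squares
over coordinates, which is one possible convention among several. Because
the clusters are chosen from the same data, the p-values are only useful
for comparing cuts with each other, not as significance tests.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import f as f_dist


def anova_f(X, labels):
    """Between/within mean-square ratio and its nominal p-value.

    Assumption: for p-dimensional data the sums of squares are pooled
    over coordinates, giving p*(K-1) and p*(N-K) degrees of freedom.
    With p = 1 this reduces to the usual one-way ANOVA F test.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    groups = np.unique(labels)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    grand_mean = X.mean(axis=0)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        members = X[labels == g]
        center = members.mean(axis=0)
        ss_between += len(members) * np.sum((center - grand_mean) ** 2)
        ss_within += np.sum((members - center) ** 2)

    df_between = p * (k - 1)
    df_within = p * (n - k)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    # Upper-tail probability; may underflow to 0 for very well-separated clusters.
    p_value = f_dist.sf(f_ratio, df_between, df_within)
    return f_ratio, p_value


def scan_cuts(X, ks, method="average"):
    """Cut one hierarchical tree at each K in ks and report (K, F, p)."""
    Z = linkage(X, method=method)  # Euclidean distances by default
    results = []
    for k in ks:
        # "maxclust" picks the cut height that yields at most k clusters.
        labels = fcluster(Z, t=k, criterion="maxclust")
        f_ratio, p_value = anova_f(X, labels)
        results.append((k, f_ratio, p_value))
    return results


if __name__ == "__main__":
    # Simulated data: three groups of 20 points in two dimensions.
    rng = np.random.default_rng(0)
    centers = ((0, 0), (6, 0), (0, 6))
    X = np.vstack([rng.normal(loc=c, scale=1.0, size=(20, 2)) for c in centers])
    for k, f_ratio, p_value in scan_cuts(X, range(2, 7)):
        print(f"K={k}  F={f_ratio:.1f}  p={p_value:.3g}")
```

One would then look at the K with the smallest p-value as a candidate,
and weigh it against interpretability as discussed above.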
Doug Carroll
At 12:49 PM 5/5/2004 -0500, you wrote:
Dear all,
I am now working on hierarchical clustering methods, and am
confused about the following problem.
As you know, to form clusters from the hierarchical tree generated by
the pairwise distances between the elements, we have to set a threshold value
to cut the tree horizontally, such that the vertical links intersecting
this horizontal critical value will be the final clusters.
However, I have not found a very robust criterion for choosing the
optimal number of clusters, or for calculating this threshold value so that the
clustering results are good across different pairwise distance (similarity) measures.
Does anyone have suggestions on this problem, or recommended papers
or methods?
Thanks for your help.
Fred