One approach that's often used is to calculate the between- to within-cluster
mean square ratio (as in a one-way ANOVA), compute the F ratio and
p-value for each cut, and then use the cut with the smallest
p-value. This is at best a useful heuristic, however; probably the best
approach is to choose the clustering that is most interpretable
substantively, or to use a technique such as K-means that is designed to find
a simple clustering (or partition) in the first place! (K-means explicitly
maximizes the F ratio discussed above for a partitioning into K clusters,
so you may want to use the relative p-values, as well as interpretability
and other criteria, to choose the appropriate value of K.)
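The F-ratio heuristic above can be sketched in code. This is a minimal
illustration only, assuming Python with numpy and scipy are available: the
`pseudo_f` helper and the made-up blob data are mine, not from any standard
package, and the scoring is the between/within mean-square ratio applied to
each cut of a Ward tree.

```python
# Sketch of the F-ratio heuristic: cut a hierarchical tree into K clusters
# for several K, score each partition by the between- to within-cluster
# mean square ratio (a pseudo-F), and compare.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def pseudo_f(X, labels):
    """Between/within mean-square ratio (pseudo-F) for one partition of X."""
    n, k = len(X), len(np.unique(labels))
    grand = X.mean(axis=0)
    # Between-cluster sum of squares: size-weighted distance of each
    # cluster mean from the grand mean.
    between = sum(np.sum(labels == c) *
                  np.sum((X[labels == c].mean(axis=0) - grand) ** 2)
                  for c in np.unique(labels))
    # Within-cluster sum of squares: squared deviations from cluster means.
    within = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                 for c in np.unique(labels))
    return (between / (k - 1)) / (within / (n - k))

rng = np.random.default_rng(0)
# Made-up data: three well-separated blobs, so K = 3 should score highest.
X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in (0, 5, 10)])
Z = linkage(X, method="ward")
for k in (2, 3, 4, 5):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, pseudo_f(X, labels))
```

Note this is the same quantity (up to the degrees-of-freedom scaling) that
K-means is implicitly driving up, which is why the relative values across K,
rather than any single p-value, are the useful part.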
Doug Carroll
At 12:49 PM 5/5/2004 -0500, you wrote:
>Dear all,
>
>I am now working on hierarchical clustering methods, and am
>confused about the following problem.
>
>As you know, to form clusters from the hierarchical tree generated from
>the pairwise distances between the elements, we have to set a threshold
>value to cut the tree horizontally, such that the vertical links intersecting
>this horizontal cut define the final clusters.
>
>However, I have not found a very robust criterion for choosing the
>optimal number of clusters, or for calculating this threshold value so that
>the clustering results are good across different pairwise distance
>(similarity) measures.
>
>So does anyone have some thoughts on this problem, or recommended papers
>or methods?
>
>Thanks for your help.
>
>Fred
>
>
>
######################################################################
# J. Douglas Carroll, Board of Governors Professor of Management and #
# Psychology, Rutgers University, Graduate School of Management,     #
# Marketing Dept., MEC125, 111 Washington Street, Newark, New Jersey #
# 07102-3027. Tel.: (973) 353-5814, Fax: (973) 353-5376.             #
# Home: 14 Forest Drive, Warren, New Jersey 07059-5802.              #
# Home Phone: (908) 753-6441 or 753-1620, Home Fax: (908) 757-1086.  #
# Email: [log in to unmask]                                          #
######################################################################
