One commonly used approach is to compute, for each candidate cut, the
ratio of the between-cluster to the within-cluster mean square (as in a
one-way ANOVA), treat it as an F ratio, obtain its p-value, and then use
the cut with the smallest p-value. This is at best a useful heuristic,
however. Probably the best approach is to choose the clustering that is
most interpretable substantively, or to use a technique such as K-means
that is designed to find a simple clustering (or partition) in the first
place! (For a given K, K-means seeks the partition that maximizes the F
ratio discussed above, so you may want to use the relative p-values as
well as interpretability and other criteria to choose the appropriate
value of K.)
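
To make the F-ratio heuristic concrete, here is a minimal sketch in
Python (standard library only). Everything in it is an assumption made
for illustration: the data are made up and one-dimensional, the cuts
come from single linkage (which in one dimension amounts to splitting
the sorted values at the K-1 largest gaps), and the names f_ratio,
f_sf and single_linkage_cut are hypothetical helpers, not part of any
package. The p-values serve only to rank the cuts against each other.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
             + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail p-value of an F(d1, d2) distribution at f."""
    if f <= 0.0:
        return 1.0
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def f_ratio(values, labels):
    """Between/within mean-square ratio and its degrees of freedom."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k

def single_linkage_cut(sorted_values, k):
    """Labels for k clusters: split sorted 1-D data at the k-1 largest gaps."""
    order = sorted(range(1, len(sorted_values)),
                   key=lambda i: sorted_values[i] - sorted_values[i - 1],
                   reverse=True)
    breaks = set(order[:k - 1])
    labels, current = [], 0
    for i in range(len(sorted_values)):
        if i in breaks:
            current += 1
        labels.append(current)
    return labels

# Made-up, already sorted one-dimensional data.
values = [1.0, 1.2, 1.5, 1.9, 5.0, 5.3, 5.4, 5.9, 9.1, 9.4, 9.8, 10.4]

results = {}
for k in range(2, 6):
    f, d1, d2 = f_ratio(values, single_linkage_cut(values, k))
    results[k] = (f, f_sf(f, d1, d2))
    print(f"K={k}  F={f:.2f}  p={results[k][1]:.3g}")

# The heuristic: keep the cut with the smallest p-value.
best_k = min(results, key=lambda k: results[k][1])
print("smallest p-value at K =", best_k)
```

With real, multivariate data the cuts would come from whatever
hierarchical clustering routine you already use, and f_sf could be
replaced by a library routine such as scipy.stats.f.sf if SciPy is
available; only the f_ratio step is specific to the heuristic.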
Doug Carroll
At 12:49 PM 5/5/2004 -0500, you wrote:
>Dear all,
>
>I am now working on hierarchical clustering methods, and am
>confused about the following problem.
>
>As you know, to form clusters from the hierarchical tree generated from
>the pairwise distances between the elements, we have to set a threshold
>value and cut the tree horizontally, so that the vertical links
>intersecting this horizontal cut define the final clusters.
>
>However, I have not found a very robust criterion for choosing the
>optimal number of clusters, or for calculating this threshold value, that
>gives good clustering results across different pairwise distance
>(similarity) measures.
>
>Does anyone have pointers on this problem, or recommended papers
>or methods?
>
>Thanks for your help.
>
>Fred
>
>
>
######################################################################
# J. Douglas Carroll, Board of Governors Professor of Management and #
#Psychology, Rutgers University, Graduate School of Management, #
#Marketing Dept., MEC125, 111 Washington Street, Newark, New Jersey #
#07102-3027. Tel.: (973) 353-5814, Fax: (973) 353-5376. #
# Home: 14 Forest Drive, Warren, New Jersey 07059-5802. #
# Home Phone: (908) 753-6441 or 753-1620, Home Fax: (908) 757-1086. #
# E-mail: [log in to unmask] #
######################################################################