The issue of determining the number of clusters is less of a problem when
clustering is done using maximum likelihood via latent class modeling. See
for example http://www.statisticalinnovations.com/articles/lcclurev.pdf,
which appeared as chapter 3 in Hagenaars and McCutcheon (eds), Applied
Latent Class Analysis, Cambridge U. Press, 2002.
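[Editor's note: for readers without Latent GOLD or similar software, the maximum-likelihood idea can be sketched with a Gaussian mixture model, a continuous-data analogue of a latent class model. This is an illustration with made-up data, not latent class analysis proper, which works with categorical indicators.]

```python
# Maximum-likelihood model-based clustering via a Gaussian mixture --
# a continuous-data analogue of the latent class models discussed above.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated 1-D "classes" (synthetic data for illustration)
x = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(8.0, 1.0, 200)]).reshape(-1, 1)

# Fit a two-component mixture by maximum likelihood (EM)
gm = GaussianMixture(n_components=2, random_state=0).fit(x)
print(sorted(gm.means_.ravel()))   # component means should land near 0 and 8
```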
----- Original Message -----
From: "Art Kendall" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Sunday, May 09, 2004 5:36 PM
Subject: Re: [Help] Criteria for choosing # of clusters in hierarchical
> Just as there are a number of stopping rules in factor analysis which
> can narrow down the number of solutions to try to interpret, the
> various stopping rules in cluster analysis can narrow down the number of
> solutions to try to interpret.
> If you would like some idea of what number of clusters would be
> appropriate, take a look at the TWOSTEP procedure in SPSS. However, this
> approach does not have any necessary relation between the clusterings
> for different numbers of clusters the way hierarchical solutions do
> (i.e., the three-cluster solution is NOT the four-cluster solution with
> 2 of the groups combined).
> "The procedure produces information criteria (AIC or BIC) by numbers of
> clusters in the solution, cluster frequencies for the final clustering,
> and descriptive
> statistics by cluster for the final clustering."
> You can specify whether you want the software to "automatically" pick a
> number of clusters, "automatically" pick a number of clusters up to some
> maximum number, or to find a fixed number of clusters. You can specify
> the Bayesian Information Criterion (BIC) or the Akaike Information
> Criterion (AIC) to be used as the criterion in the automatic choice of
> the number of clusters.
> You can use categorical or continuous variables or both.
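
[Editor's note: the "automatic" choice TWOSTEP makes can be mimicked outside SPSS by fitting mixture models for a range of cluster counts and taking the BIC (or AIC) minimum. The sketch below uses scikit-learn on made-up data; it is the same information-criterion idea, not SPSS TwoStep itself.]

```python
# Mimic an "automatic" choice of the number of clusters by fitting
# mixture models for k = 1..6 and taking the BIC minimum.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
centers = [(0, 0), (10, 0), (0, 10)]          # three true clusters
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])

bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(best_k)  # BIC should bottom out at the true number of clusters, 3
```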
> Hope this helps.
> [log in to unmask]
> Social Research Consultants
> University Park, MD USA
> (301) 864-5570
> Fred wrote:
> > Dear all,
> > I am now working on hierarchical clustering methods, and am
> > confused about the following problem.
> > As you know, to form clusters from the hierarchical tree generated by
> > the pairwise distances between the elements, we have to set a threshold
> > value to cut the tree horizontally, such that the vertical links
> > intersecting this horizontal cut define the final clusters.
> > However, I have not found a very robust criterion for choosing the
> > optimal number of clusters, or for calculating this threshold value,
> > so that the clustering results are good across different pairwise
> > distance (similarity) measures.
> > Does anyone have any thoughts on this problem, or recommended papers
> > or methods?
> > Thanks for your help.
> > Fred
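
[Editor's note: the threshold cut Fred describes can be sketched with SciPy's hierarchical clustering routines. The data below are made up for illustration; `fcluster` with `criterion='distance'` performs exactly the horizontal cut he asks about, though it does not by itself answer the question of how to choose the threshold.]

```python
# Cutting a hierarchical tree at a height threshold: every vertical link
# crossing the cut defines one final cluster (criterion='distance').
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
centers = [(0, 0), (10, 0), (0, 10)]          # three well-separated groups
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])

Z = linkage(X, method='single')               # tree from pairwise distances
labels = fcluster(Z, t=5.0, criterion='distance')  # cut the tree at height 5
print(len(set(labels)))  # number of clusters below this threshold: 3
```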