Subject: | |
From: | |
Reply To: | Classification, clustering, and phylogeny estimation |
Date: | Sun, 9 May 2004 17:36:52 -0400 |
Content-Type: | text/plain |
Just as a number of stopping rules in factor analysis can narrow down
the number of solutions to try to interpret, the various stopping rules
in cluster analysis can narrow down the number of cluster solutions to
try to interpret.
If you would like to get some idea of what number of clusters would be
appropriate, take a look at the TWOSTEP procedure in SPSS. However, this
approach does not impose any necessary relation between the clusterings
for different numbers of clusters the way hierarchical solutions do
(i.e., the 3-cluster solution is NOT the 4-cluster solution with 2
of the groups combined).
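To make the nesting property of hierarchical solutions concrete, here is
a minimal sketch in Python using SciPy (not SPSS). The data are made up,
the average-linkage choice is arbitrary, and is_nested is a hypothetical
helper written only for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree


def is_nested(fine, coarse):
    """True if every cluster in `fine` lies inside a single cluster of `coarse`."""
    return all(len(set(coarse[fine == c])) == 1 for c in np.unique(fine))


# Made-up data: 60 points in 2 dimensions (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))

# Build one tree, then cut it at two different numbers of clusters.
Z = linkage(X, method="average")
labels = cut_tree(Z, n_clusters=[3, 4])  # one column per requested cut
three, four = labels[:, 0], labels[:, 1]

# For a hierarchical tree, the 3-cluster cut is the 4-cluster cut
# with two of its groups merged.
print(is_nested(four, three))
```

A partitioning or model-based method run separately for 3 and for 4
clusters carries no such guarantee, which is the point made above.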
"The procedure produces information criteria (AIC or BIC) by numbers of
clusters in the solution, cluster frequencies for the final clustering,
and descriptive statistics by cluster for the final clustering."
You can specify whether you want the software to "automatically" pick a
number of clusters, "automatically" pick a number of clusters up to some
maximum number, or find a fixed number of clusters. You can specify
the Bayesian Information Criterion (BIC) or the Akaike Information
Criterion (AIC) as the criterion for the automatic choice of the number
of clusters.
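For readers without SPSS, the general idea of picking the number of
clusters by an information criterion, up to some maximum, can be
sketched in Python. This is NOT the TWOSTEP algorithm: it swaps in
scikit-learn's Gaussian mixture model as a stand-in, and the data, the
choose_k helper, and the max_k value are all assumptions made up for the
illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Made-up data: three well-separated groups of 100 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])


def choose_k(X, max_k=6, criterion="bic"):
    """Fit mixtures with 1..max_k components; return the k with the lowest score."""
    scores = {}
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
        # Lower is better for both criteria.
        scores[k] = gm.bic(X) if criterion == "bic" else gm.aic(X)
    best_k = min(scores, key=scores.get)
    return best_k, scores


best_k, scores = choose_k(X)
print(best_k)
for k, s in scores.items():
    print(k, round(s, 1))
```

Passing criterion="aic" switches the criterion. AIC penalizes extra
parameters less heavily than BIC, so it will tend to favor solutions
with at least as many clusters.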
You can use categorical or continuous variables or both.
Hope this helps.
Art
Social Research Consultants
University Park, MD USA
(301) 864-5570
Fred wrote:
> Dear all,
>
> I am now working on hierarchical clustering methods, and am
> confused about the following problem.
>
> As you know, to form clusters from the hierarchical tree generated from
> the pairwise distances between the elements, we have to set a threshold
> value to cut the tree horizontally, such that the vertical links
> intersecting this horizontal critical value define the final clusters.
>
> However, I have not found a very robust criterion for choosing the
> optimal number of clusters, or for calculating this threshold value, that
> gives good clustering results across different pairwise distance
> (similarity) measures.
>
> Does anyone have any pointers on this problem, or recommended papers
> or methods?
>
> Thanks for your help.
>
> Fred
>
>
>