CLASS-L Archives

October 2002


"Arthur J. Kendall" <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Wed, 23 Oct 2002 16:31:07 -0400
In the mid-1970s, while at the statistical research division of the US Census
Bureau, I refined an approach to this question that I had developed earlier.
I called the approach "core clusters".  To this day I do not believe that there
is one best all-around, all-purpose method of cluster analysis.
This approach assumes that you want to classify all cases rather than finding
"pure types", that there is no interest in the hierarchical nature of some
methods (you select a single slice of the tree), that consensus of methods is a
rough indicator of validity,  and that more compact clusters are preferred.
This is easily done in SPSS but could be done in other software.

Select 7 (or so) distinct combinations of cluster analysis method and
similarity measure.
For each method, save the cluster membership of each case as a variable in
your data file.
Crosstab the memberships.  Identify groups of cases that are placed together by
5 or more methods.  These are "cores".
Create a variable representing which core cluster each case belongs to, with a
residual value representing "unclustered".
(If you are interested in "pure types" you could stop here.)
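
For readers outside SPSS, the core-finding step above might be sketched in
Python as follows. This is only one reading of "placed together by 5 or more
methods": two cases are linked when at least min_agree methods put them in the
same cluster, and each linked group becomes a core. The function name, the
min_size parameter, the linking rule, and the toy data are all assumptions made
for illustration, not part of the original procedure.

```python
from itertools import combinations

UNCLUSTERED = 0  # residual code for cases that belong to no core


def core_clusters(memberships, min_agree=5, min_size=2):
    """Assign each case to a core cluster or to UNCLUSTERED.

    memberships: one list of cluster labels per method, all the same length.
    Assumption: cases are linked when at least min_agree methods agree, and
    linked groups (connected components) of min_size or more form a core.
    """
    n_cases = len(memberships[0])
    parent = list(range(n_cases))  # union-find forest over cases

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # The "crosstab": count, for each pair, how many methods co-cluster them.
    for a, b in combinations(range(n_cases), 2):
        agree = sum(1 for labels in memberships if labels[a] == labels[b])
        if agree >= min_agree:
            parent[find(a)] = find(b)

    groups = {}
    for case in range(n_cases):
        groups.setdefault(find(case), []).append(case)

    core = [UNCLUSTERED] * n_cases
    next_id = 1
    for members in groups.values():
        if len(members) >= min_size:
            for case in members:
                core[case] = next_id
            next_id += 1
    return core


# Toy data: 7 methods (rows) by 6 cases (columns); labels are arbitrary per method.
example = [
    [1, 1, 1, 2, 2, 3],
    [1, 1, 1, 2, 2, 2],
    [2, 2, 2, 1, 1, 3],
    [1, 1, 1, 2, 2, 1],
    [1, 1, 1, 3, 3, 2],
    [1, 1, 1, 2, 3, 3],
    [1, 1, 1, 2, 1, 2],
]
print(core_clusters(example))  # [1, 1, 1, 2, 2, 0]
```

Note that linking by pairs can chain cases together that never share a cluster
directly; a stricter rule (for example, requiring every pair in a core to
agree) would give smaller cores.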
Iteratively, until you don't get much reassignment of classified cases:
    Do a discriminant function analysis, saving the probability of membership
    from the classification phase and the probability of a member of the
    assigned group being so far from the centroid of the group.
    Interpret the profiles of the core clusters after the discriminant phase of
    each iteration to see if they make sense.
    Recode cases that have low probabilities (for the nearest group or for
    distance from the centroid) to "unclassified".

 A few iterations should start showing consistent results. Cases that won't fit
in well may have special characteristics.
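
The iteration described above could be outlined roughly like this in Python.
The discriminant analysis itself is not implemented here: classify is a
hypothetical callable standing in for whatever discriminant routine you use,
and it is assumed to return, for every case, the assigned group, the
probability of membership, and the probability of a group member being that far
from the centroid. The cutoffs 0.8 and 0.05 and the stopping rule are
placeholders of my own choosing, not recommended values.

```python
UNCLASSIFIED = 0  # residual code, as for the cores


def recode_weak_cases(assigned, p_member, p_typical,
                      min_member=0.8, min_typical=0.05):
    """Send cases with a low probability on either measure to UNCLASSIFIED."""
    return [
        group if pm >= min_member and pt >= min_typical else UNCLASSIFIED
        for group, pm, pt in zip(assigned, p_member, p_typical)
    ]


def refine_cores(labels, classify, max_iter=10, max_changes=0,
                 min_member=0.8, min_typical=0.05):
    """Repeat classify-and-recode until few labels change.

    classify(labels) is a hypothetical stand-in for the discriminant step; it
    must return (assigned, p_member, p_typical), each with one value per case.
    Returns the final labels and the number of iterations run.
    """
    iteration = 0
    for iteration in range(1, max_iter + 1):
        assigned, p_member, p_typical = classify(labels)
        new_labels = recode_weak_cases(assigned, p_member, p_typical,
                                       min_member, min_typical)
        # Count every case whose label differs from the previous pass.
        changes = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels
        if changes <= max_changes:
            break
    return labels, iteration
```

The interpretive step (looking at the cluster profiles after each pass to see
whether they make sense) is deliberately left out, since it is a judgment call
rather than a computation.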

In those applications, we clustered social characteristics of counties and
validated the results by mapping the clusters. You may mean "spatial" in a
different sense.  If so, please describe your research application in greater
detail.

Hope this helps.

[log in to unmask]
Social Research Consultants
University Park, MD USA

Henry Bulley wrote:

> Hi all,
> I am working on different classification methods based on ecological
> (spatial) variables. I would be glad if any of you could direct me to
> literature on comparing the different classification methods as to which
> one is better.
>   Henry