CLASS-L Archives

March 2004


From: Art Kendall <[log in to unmask]>
Reply-To: Classification, clustering, and phylogeny estimation
Date: Fri, 19 Mar 2004 10:02:33 -0500

Welcome to the list.  Clustering can be a very useful way to explore
data.  Even if it does not help with your particular research questions,
trying out clustering will provide you with some potentially valuable tools.

These are some of the questions that would help in formulating a response.

What are the variables that you are using?  Are they simple variables or
factor scores that are orthogonal or . . .?  Is this pre-existing
data or did you select the variables because you thought there would be
distinct types?

Why these 100 children? Is this a sample of children in general or were
the children selected according to some criteria?

What did you do to interpret the different clusters?
Do you have other variables that could be used in a Discriminant
Function analysis to see how the 3 or 2 groups differentiate?
Did you use the obtained groups in a DFA with the original variables to
see the "territorial maps" in discriminant function space?  How well did
the clusters stay together during the classification phase of the DFA?
Were the centers of the groups well separated, with particular cases spread out?
Did you eyeball the probabilities of membership in the group?  The
probability that a member of a group would be that far from the centroid?
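
A rough sketch of the "do the clusters stay together" check, in plain
Python.  Note that this is not a DFA: it swaps in a simpler
nearest-centroid reclassification, which only asks whether each case is
closer to its own cluster centroid than to any other.  The function names
and the toy data are made up for illustration, and it assumes the
variables are already on comparable scales.

```python
import math


def centroids(points, labels):
    """Return the mean vector of each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for p, g in zip(points, labels):
        acc = sums.setdefault(g, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[g] = counts.get(g, 0) + 1
    return {g: [s / counts[g] for s in acc] for g, acc in sums.items()}


def euclid(p, q):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def reclassify(points, labels):
    """For each case return (assigned label, nearest-centroid label,
    distance to the centroid of its assigned cluster)."""
    cents = centroids(points, labels)
    rows = []
    for p, g in zip(points, labels):
        dists = {k: euclid(p, c) for k, c in cents.items()}
        nearest = min(dists, key=dists.get)
        rows.append((g, nearest, dists[g]))
    return rows


def agreement(rows):
    """Share of cases whose nearest centroid is their own cluster's."""
    return sum(1 for g, nearest, _ in rows if g == nearest) / len(rows)


if __name__ == "__main__":
    # Hypothetical toy data: 5 cases on 4 variables, labels from a clustering.
    points = [(1, 1, 1, 1), (1, 2, 1, 2), (8, 8, 8, 8), (9, 8, 9, 8), (6, 6, 6, 6)]
    labels = ["A", "A", "B", "B", "A"]
    rows = reclassify(points, labels)
    for case, (g, nearest, d) in enumerate(rows, start=1):
        flag = "" if g == nearest else "  <- closer to another centroid"
        print(f"case {case}: cluster {g}, nearest {nearest}, dist {d:.2f}{flag}")
    print("agreement:", agreement(rows))
```

Cases that get flagged, or that sit unusually far from their own centroid,
are the ones worth eyeballing.  A real DFA would also take the
within-group covariances into account and give membership probabilities,
which this sketch does not.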

Is there a reason to expect sharp boundaries between the types?

If you have a mix of categorical and continuous variables did you try
the TWOSTEP procedure in SPSS?

When you looked at the profile were the 4 variables the same ones the
clustering was performed on? Why was it surprising on the fifth variable?

Oops, have to sign off.  My ten o'clock client just arrived.  Look
forward to seeing your response.

Hope this helps.

[log in to unmask]
Social Research Consultants
University Park, MD  USA

ALBA Sandra, Assistant Statistician wrote:

>I have performed a cluster analysis on a medical dataset consisting of 100 children measured on 4 variables.
>The dendrograms suggested there were three groups, so I did a k-means clustering with k=3. I didn't set the initial centroids of the k-means to the centres from the hierarchical clustering, and the two types of clustering did not repeat the same partition. Arnold's test for clusters proved to be non-significant. YET, I managed to find two groups of children who had a very different profile on the 4 variables clustered and a similar response on a 5th variable, which was very surprising.
>Now, I understand I haven't identified 3 groups of very different children, everything so far suggests there are no sharply differing groups. I cannot make any inferences from my sample, obviously. But could I say I have found some sort of multivariate thresholds on the basis of the matrix of distances, which allow me to gain a certain insight into the data?  Or is it just all a big fluke, not worth the paper it's written on?!
>I welcome any comments/suggestions. I am only new to the topic (and the list), but I am keen to learn!
>Thanks for your time so far
>Sandra Alba
>University Medicine - Level 7
>Derriford Hospital
>PL6 8DH