Hi,

I have performed a cluster analysis on a medical dataset consisting of 100 children measured on 4 variables.

The dendrograms suggested there were three groups, so I ran a k-means clustering with k=3. I did not set the initial k-means centroids to the centres of the hierarchical clusters, and the two methods did not produce the same partition. Arnold's test for clusters proved to be non-significant. YET, I managed to find two groups of children who had very different profiles on the 4 clustering variables and a similar response on a 5th variable, which was very surprising.
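To make the procedure concrete, here is a minimal sketch of the comparison described above, assuming Python with NumPy and SciPy. Everything in it is illustrative: the data are random placeholders (not the real measurements), Ward linkage is an assumed choice, and the helper name crosstab is hypothetical. It cuts a hierarchical tree into 3 groups, runs k-means once seeded with the hierarchical cluster centres and once from randomly chosen cases, and cross-tabulates each k-means partition against the hierarchical one.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
# Placeholder data: 100 cases on 4 variables (NOT the real dataset).
X = rng.normal(size=(100, 4))

# Hierarchical clustering; Ward linkage is an assumption made for this sketch.
Z = linkage(X, method="ward")
# Cut the tree into 3 groups; fcluster labels start at 1, so shift to 0..2.
hier_labels = fcluster(Z, t=3, criterion="maxclust") - 1

# Centres (mean vectors) of the three hierarchical clusters.
centres = np.vstack([X[hier_labels == k].mean(axis=0) for k in range(3)])

# k-means started from the hierarchical cluster centres.
_, km_seeded = kmeans2(X, centres, minit="matrix")

# k-means started from 3 randomly chosen cases instead.
random_start = X[rng.choice(len(X), size=3, replace=False)]
_, km_random = kmeans2(X, random_start, minit="matrix")

def crosstab(a, b, k=3):
    """Count how many cases fall in each (label in a, label in b) pair."""
    table = np.zeros((k, k), dtype=int)
    for i, j in zip(a, b):
        table[i, j] += 1
    return table

table_seeded = crosstab(hier_labels, km_seeded)
table_random = crosstab(hier_labels, km_random)
print("hierarchical vs seeded k-means:\n", table_seeded)
print("hierarchical vs random-start k-means:\n", table_random)
```

If the two methods agreed, each row of a table would have essentially one non-zero cell; counts spread across several cells in a row indicate that the partitions differ.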

Now, I understand I haven't identified 3 groups of very different children; everything so far suggests there are no sharply differing groups. I cannot make any inferences from my sample, obviously. But could I say I have found some sort of multivariate thresholds on the basis of the matrix of distances, which allow me to gain a certain insight into the data? Or is it all just a big fluke, not worth the paper it's written on?!

I welcome any comments/suggestions. I am new to the topic (and the list), but I am keen to learn!

Thanks for your time so far

Sandra

Sandra Alba
University Medicine - Level 7
Derriford Hospital
Plymouth
PL6 8DH
UK