CLASS-L Archives

September 2011

CLASS-L@LISTS.SUNYSB.EDU

Sender: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Date: Thu, 22 Sep 2011 11:14:15 +0100
Reply-To: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
In-Reply-To: <01e301cc78ca$9adc2f70$d0948e50$@net>
From: Christian Hennig <[log in to unmask]>

Dear Matthew,

well, basically it is up to you to decide which features are most important 
for your application.
Clusters with bad cohesion and separation may well still be useful, but 
whether they are depends on what you want to use them for.

If *you* think that cohesion and separation are required here, that's bad 
news. But if you don't, why worry about them?
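
For reference, the average silhouette width mentioned in the question combines exactly these two aspects: for each point i, a(i) is its mean distance to the other members of its own cluster (cohesion), b(i) is its smallest mean distance to the members of another cluster (separation), and s(i) = (b(i) - a(i)) / max(a(i), b(i)). Below is a minimal pure-Python sketch of that definition, assuming Euclidean distance and integer cluster labels; the function names are illustrative and not taken from any particular package or from the analysis discussed here.

```python
import math
from typing import List, Sequence


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_values(points: List[Sequence[float]], labels: List[int]) -> List[float]:
    """Return s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every point i.

    a(i): mean distance from i to the other members of its own cluster (cohesion).
    b(i): smallest mean distance from i to the members of another cluster (separation).
    A point that is alone in its cluster gets s(i) = 0 by convention.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    values = []
    for i, p in enumerate(points):
        own = labels[i]
        # Accumulate distances from point i to every cluster, skipping i itself.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += euclidean(p, q)
                counts[labels[j]] += 1
        if counts[own] == 0:  # singleton cluster
            values.append(0.0)
            continue
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def average_silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean of s(i) over all points, i.e. the average silhouette width."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


# Two tight, well-separated pairs of points.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(average_silhouette(pts, [0, 0, 1, 1]), 3))
```

Values close to 1 indicate compact, well-separated clusters; values near 0 mean points lie between clusters, and negative values mean points are on average closer to another cluster than to their own.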

You have all the data analytic information in place and I don't have 
objections to your interpretation of it (one may do more, but whether 
this makes sense again depends on what the clusters are used for and how 
they should be interpreted). But the responsibility to decide what is
required in your area and in this situation is yours.

I guess you want to know something like whether your clusters are "real".
Well, that's a question to which there is no proper answer, because it 
crucially depends on what is meant by a "real" cluster, and this again 
depends on your field.

Christian

On Wed, 21 Sep 2011, Matthew Pirritano wrote:

> All,
>
> I'm a first-time poster. I have data on coping strategies used by couples
> undergoing infertility treatment. I have created clusters of the coping
> strategies, keeping male and female scores separate. There are 4 coping
> scores per partner, based on composite scores of 4 subscales
> (active-avoidance, active-confronting, passive-avoidance, meaning-based),
> so I have 8 variables in my cluster analysis. I started with hierarchical
> clustering using Ward's method and squared Euclidean distance. I then used
> those cluster centers as the starting centers for a k-means cluster
> analysis. Based on the dendrogram from the hierarchical analysis and the
> clinical interpretability of the k-means solutions, I arrived at a
> 5-cluster solution. These clusters are good predictors of a number of
> outcome variables, such as stress, and the predictions are well in line
> with theory and previous research. That's the external validity.
>
> I then tried to validate the clusters using the average silhouette. I've
> tested all solutions between 2 and 12 clusters, and my average silhouette
> is never greater than .4. I've tried different clustering methods and
> different distance measures, with the same results. The highest average
> silhouette I get is when I multiply the men's and women's scores, but that
> was only for a 2-cluster solution. I've seen this multiplication done
> before, but I'm not sure how to interpret the resulting scores. Any ideas?
>
> So, is it still possible that I could discuss the original 5-cluster
> solution despite not finding good separation and cohesion with the average
> silhouette? Is all lost, or is there a way to save the situation?
>
> Any help is much appreciated. Please let me know if you need more info or if
> I've violated any list protocol.
>
> Thanks
>
> Matt
>
> ----------------------------------------------
> CLASS-L list.
> Instructions: http://www.classification-society.org/csna/lists.html#class-l
>
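
The procedure described in the question (Ward's method on squared Euclidean distances, then k-means started from the Ward cluster centers) can be sketched in plain Python as below. This is a naive illustration under stated assumptions, not the software used in the original analysis: Ward's merge cost is written in its centroid form, the O(n^3) merge loop only suits small data sets, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points: List[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def ward(points: List[Sequence[float]], k: int) -> List[int]:
    """Naive agglomerative clustering with Ward's criterion, cut at k clusters.

    Merging clusters A and B raises the within-cluster sum of squares by
    |A||B| / (|A| + |B|) * ||c_A - c_B||^2; each step merges the cheapest pair.
    """
    clusters = [[i] for i in range(len(points))]
    cents = [tuple(p) for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sq_dist(cents[a], cents[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        cents[a] = centroid([points[i] for i in clusters[a]])
        del clusters[b], cents[b]  # b > a, so index a stays valid
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def kmeans(points, centers, max_iter: int = 100):
    """Lloyd's k-means started from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels, centers


def ward_then_kmeans(points, k: int):
    """Seed k-means with the centroids of the k-cluster Ward solution."""
    init = ward(points, k)
    seeds = [centroid([p for p, lab in zip(points, init) if lab == c]) for c in range(k)]
    return kmeans(points, seeds)


data = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0), (9.2, 0.3)]
labels, centers = ward_then_kmeans(data, 3)
print(labels)
```

Scanning k = 2, ..., 12, as in the question, would amount to calling ward_then_kmeans once per k and comparing a validity index across the resulting partitions.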

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[log in to unmask], www.homepages.ucl.ac.uk/~ucakche
