I look at such problems a little differently. The initial clusters may have good clinical interpretability because they correspond to compact regions of the space of your variables. The tests for distinctness may give poor results simply because the clusters are not well separated. That is quite different from whether the clusters "make sense", although it is nice if meaningful clusters are also well separated. For example, if one had a simple continuum of points in space, one could cut it in half and make sense of each "cluster", but the two halves would not be expected to pass any test for distinctness. With such data any clustering may be artificial, and one might consider some sort of multidimensional scaling solution rather than forcing the data into clusters.
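
The continuum example can be tried on synthetic data. The following is only a minimal sketch under stated assumptions, not an analysis of the coping data: it draws a single 8-dimensional Gaussian cloud (8 only because the original question has 8 variables), cuts it in half on the first coordinate, and compares its average silhouette with that of two groups whose centers are placed far apart. The function name `average_silhouette`, the sample size, the seed, and the separation of 10 units are all illustrative choices.

```python
import math
import random

def average_silhouette(points, labels):
    """Mean silhouette width over all points, using Euclidean distance.

    Assumes at least two clusters and at least two points per cluster.
    """
    clusters = sorted(set(labels))
    widths = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster (p itself excluded).
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d)
        a = mean_dist[labels[i]]                      # cohesion: own cluster
        b = min(v for c, v in mean_dist.items()       # separation: nearest other cluster
                if c != labels[i])
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

rng = random.Random(0)   # fixed seed so the run is repeatable
DIM = 8                  # illustrative: same number of variables as in the question
N = 200                  # illustrative sample size

# A continuum: one Gaussian cloud, cut in half on the first coordinate.
continuum = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
continuum_labels = [0 if p[0] < 0 else 1 for p in continuum]

# Two well-separated groups: same spread, centers 10 units apart on the first coordinate.
separated = []
separated_labels = []
for k in range(N):
    label = k % 2
    point = [rng.gauss(0, 1) for _ in range(DIM)]
    point[0] += -5.0 if label == 0 else 5.0
    separated.append(point)
    separated_labels.append(label)

sil_continuum = average_silhouette(continuum, continuum_labels)
sil_separated = average_silhouette(separated, separated_labels)
print(f"continuum cut in half: {sil_continuum:.2f}")
print(f"well-separated groups: {sil_separated:.2f}")
```

In this setup the halved continuum should score far lower than the separated groups, even though each half is a perfectly describable region (low versus high on the first variable). A low average silhouette therefore says the groups are not separated; it does not by itself say the grouping is uninterpretable.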

Jim

-------------------
F. James Rohlf, John S. Toll Professor
Dept. Ecology & Evolution, Stony Brook University


> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation [mailto:CLASS-
> [log in to unmask]] On Behalf Of Matthew Pirritano
> Sent: Thursday, September 22, 2011 10:21 AM
> To: [log in to unmask]
> Subject: Re: good external validity for clusters but bad cohesion and separation
> 
> Christian,
> 
> Thanks so much for your thoughtful response. My application would be to
> ultimately provide cutoff values on these coping scales so that clinicians could
> tell what clusters couples likely fall into, whether they are using a combination
> of adaptive or maladaptive strategies, and counsel them accordingly.
> 
> Thanks again
> Matt
> 
> On Sep 22, 2011, at 3:14 AM, Christian Hennig <[log in to unmask]>
> wrote:
> 
> > Dear Matthew,
> >
> > well, basically it is up to you to decide what the most important features are
> > for your application.
> > Clusters with bad cohesion and separation may well still be useful, but whether
> > they are depends on what you want to use them for.
> >
> > If *you* think that cohesion and separation are required here, that's bad news.
> But if you don't, why worry about them?
> >
> > You have all the data analytic information in place and I don't have
> > objections against your interpretation of it (one may do more but whether this
> makes sense again depends on what the clusters are used for and how they
> should be interpreted). But the responsibility to decide what is required in your
> area and in this situation is yours.
> >
> > I guess you want to know something like whether your clusters are "real".
> > Well, that's a question to which there is no proper answer, because it crucially
> > depends on what is meant by a "real" cluster, and this again depends on your field.
> >
> > Christian
> >
> > On Wed, 21 Sep 2011, Matthew Pirritano wrote:
> >
> >> All,
> >>
> >>
> >>
> >> I'm a first time poster. I have data on coping strategies used by
> >> couples undergoing infertility treatment. I have created clusters of
> >> the coping strategies keeping male and female scores separate. There
> >> are 4 coping scores, based on composite scores of 4 subscales
> >> (active-avoidance, active-confronting, passive-avoidance,
> >> meaning-based). So I have 8 variables in my cluster analysis. I've
> >> started with Hierarchical clustering using Ward's method and squared
> >> Euclidean distance. I then used those cluster centers as the starting
> >> centers for a k-means cluster analysis. Based on my dendrogram from
> >> the hierarchical analysis and the clinical interpretability of the
> >> k-means solutions I arrived at a 5-cluster solution. These clusters
> >> predict a number of outcome variables well, such as stress. These
> >> predictions are well in line with theory and previous research. That's the
> >> external validity.
> >>
> >>
> >>
> >> I then went to validate the clusters using the average silhouette.
> >> I've tested all solutions between 2 and 12 clusters and my average
> >> silhouette is never greater than .4. I've tried different clustering
> >> methods and different distance measures, with the same results. The
> >> highest average silhouette I get is when I multiply men's and women's
> >> scores. I've seen this done before, but I'm not sure how to interpret
> >> the resulting scores. Any ideas? And that solution was only for 2 clusters.
> >>
> >>
> >>
> >> So, is it still possible that I could discuss the original 5
> >> cluster solution despite not finding good separation and cohesion
> >> with the average silhouette? Is all lost, or is there a way to save the
> >> situation?
> >>
> >>
> >>
> >> Any help is much appreciated. Please let me know if you need more
> >> info or if I've violated any list protocol.
> >>
> >>
> >>
> >> Thanks
> >>
> >> Matt
> >>
> >>
> >>
> >>
> >>
> >>
> >> ----------------------------------------------
> >> CLASS-L list.
> >> Instructions:
> >> http://www.classification-society.org/csna/lists.html#class-l
> >>
> >
> > *** --- ***
> > Christian Hennig
> > University College London, Department of Statistical Science Gower
> > St., London WC1E 6BT, phone +44 207 679 1698 [log in to unmask],
> > www.homepages.ucl.ac.uk/~ucakche