This sounds like a very interesting question both methodologically and substantively.

How many cases did you start with? Too many to do hierarchical clustering on all cases?
I have not used silhouette coefficients per se, but I have used an approach based on the same kind of information, the distance of a case from the centroids: the probabilities of membership in each cluster from the classification phase of a discriminant function analysis. It would be interesting to hear whether list members have found silhouette coefficients useful for ballparking the number of clusters to retain.
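
For anyone on the list who wants to see what goes into the average silhouette, here is a minimal sketch in plain Python. It uses the usual definition s(i) = (b - a) / max(a, b), where a is a case's mean distance to the other cases in its own cluster and b is its mean distance to the nearest other cluster. Euclidean distance, the function name, and the toy data are my assumptions for illustration, not anything taken from Matt's analysis.

```python
import math

def average_silhouette(points, labels):
    """Average silhouette over all cases, using Euclidean distance (an assumption)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

# Toy data (hypothetical): two tight, well separated pairs of cases.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette(toy_points, [0, 0, 1, 1])  # sensible grouping
bad = average_silhouette(toy_points, [0, 1, 0, 1])   # grouping that cuts across the pairs
```

With the sensible grouping the average is close to 1, and with the grouping that cuts across the pairs it is negative, which is the sense in which the statistic summarizes cohesion and separation.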

Since the mid-70s my habit has been to use several methods of clustering and find sets of cases that are placed together by several methods. I called these core clusters. Since k-means and, to some extent, TWOSTEP are sensitive to the order of cases within the file, some of the runs would be based on a random re-sequencing of the file. Most clustering methods save variables that hold the cluster assignments. I then use several of the variables saved in the classification phase of discriminant function analysis (DFA). These are the predicted cluster that DFA would assign a case to, the scores of each case on the discriminant functions, and the probability that a case would be so far from the centroid of the cluster.
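
A rough sketch of the "placed together by several methods" step, in plain Python. It takes the strictest reading, that cases belong to a core cluster only if every run puts them together, which is an assumption on my part; a looser rule (together in most runs) is equally defensible. The function name, the min_size parameter, and the example labels are hypothetical.

```python
from collections import defaultdict

def core_clusters(assignments, min_size=2):
    """Group cases that are placed together in every clustering run.

    assignments: one list of cluster labels per run (method or re-sequencing),
    all in the same case order. Labels need not match across runs.
    """
    n_cases = len(assignments[0])
    if any(len(run) != n_cases for run in assignments):
        raise ValueError("every run must label the same cases")

    groups = defaultdict(list)
    for case in range(n_cases):
        # Cases share a signature only if they share a cluster in every run.
        signature = tuple(run[case] for run in assignments)
        groups[signature].append(case)

    cores = [members for members in groups.values() if len(members) >= min_size]
    return sorted(cores, key=lambda members: members[0])

# Hypothetical saved cluster-membership variables from three runs on 7 cases.
runs = [
    [1, 1, 1, 2, 2, 2, 3],
    [5, 5, 5, 7, 7, 5, 7],
    ["a", "a", "b", "c", "c", "c", "c"],
]
cores = core_clusters(runs)
```

Here cases 0 and 1 form one core cluster and cases 3 and 4 another; cases 2, 5 and 6 are left out because at least one run separates them from everyone else.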

I would then create a new membership variable that gives cases that are close to more than one cluster a value that is treated as ungrouped in the discriminant analysis. I would repeat this until the results subjectively stabilized.

Once I have the core clusters I use profile graphs and by-products of the DFA to give a final interpretation of the clusters. Profile graphs are very like parallel coordinate plots when the data is aggregated to the clusters.

The relation of the new cluster-membership variable to variables that were not used in the clustering can be a form of external validation.

A caveat. You use the term 'composite'. There may be complications if the data is ipsative/compositional. Getting up to speed on compositional data has been on my to-do list since I retired, BUT... I have a feeling that there are strong analogies between compositional and ipsative data; possibly these are different names for the same thing. If that is what you have, someone else on the list may be more able to address that question.


It might also be interesting to cluster the score profiles for each person separately and explore, e.g., by crosstab, whether members of a couple would be assigned to the same cluster.
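
The crosstab idea can be sketched as follows, assuming all persons were clustered together on their own four scores so that the cluster numbers mean the same thing for both partners. The function name and the example assignments are made up for illustration.

```python
from collections import Counter

def partner_crosstab(male_clusters, female_clusters):
    """Cross-tabulate partners' clusters; lists are in the same couple order."""
    if len(male_clusters) != len(female_clusters) or not male_clusters:
        raise ValueError("need one cluster label per partner for each couple")
    # Cell counts keyed by (male cluster, female cluster).
    table = Counter(zip(male_clusters, female_clusters))
    # Couples on the diagonal: both partners fall in the same cluster.
    same = sum(count for (m, f), count in table.items() if m == f)
    return table, same / len(male_clusters)

# Hypothetical cluster assignments for six couples.
male = [1, 1, 2, 3, 2, 1]
female = [1, 2, 2, 3, 1, 1]
table, agreement = partner_crosstab(male, female)
```

The proportion on the diagonal is only a raw agreement rate; it would need to be compared with what chance alone would give for the observed marginals.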

Art Kendall
Social Research Consultants

On 9/21/2011 9:54 PM, Matthew Pirritano wrote:

All,

 

I’m a first time poster. I have data on coping strategies used by couples undergoing infertility treatment. I have created clusters of the coping strategies keeping male and female scores separate. There are 4 coping scores, based on composite scores of 4 subscales (active-avoidance, active-confronting, passive-avoidance, meaning-based). So I have 8 variables in my cluster analysis. I’ve started with Hierarchical clustering using Ward’s method and squared Euclidean distance. I then used those cluster centers as the starting centers for a k-means cluster analysis. Based on my dendrogram from the hierarchical analysis and the clinical interpretability of the k-means solutions I arrived at a 5 cluster solution. These clusters predict well a number of outcome variables, such as stress. These predictions are well in line with theory and previous research. That’s the external validity.

 

I then went to validate the clusters using the average silhouette. I’ve tested all solutions between 2 and 12 clusters and my average silhouette is never greater than .4. I’ve tried different clustering methods and different distance measures, with the same results. The highest average silhouette I get is when I multiply men and women’s scores. I’ve seen this done before, but I’m not sure how to interpret the resulting scores. Any ideas? And that solution was only for 2 clusters.

 

So, is it still possible that I could discuss the original 5 cluster solution despite not finding good separation and cohesion with the average silhouette? Is all lost, or is there a way to save the situation?

 

Any help is much appreciated. Please let me know if you need more info or if I’ve violated any list protocol.

 

Thanks

Matt

 

 

---------------------------------------------- CLASS-L list. Instructions: http://www.classification-society.org/csna/lists.html#class-l