Thoughts on statistical comparisons of two (or more) clusterings.

My PhD research project will require comparisons of clusterings and comparisons of relational networks, so I am very interested in your query. I am new to this area and have only tentative thoughts, which amount to an attempt to conceptualise the questions that need to be addressed so that I may have some idea where to look for a suitable set of answers. I have made some progress in moving from some of the questions towards potential answers, but the process is tentative at this stage.

Chapters 19 (Procrustean Procedures) and 20 (Three-way Procrustean Models) in Borg and Groenen (1997) provide a useful discussion of techniques for rotating, rescaling, and comparing multidimensional scaling outcomes with similar or dissimilar dimensionality. I suspect that much of that discussion is applicable to the comparison of clusterings. However, for a complete, single-stage application of the methods described by Borg and Groenen, the locations of the individual points in the clusters, rather than the patterns of the clusters, would need to be compared.

It seems as though it would be possible to compare the centroids of (non-hierarchical) clusters using these methods. Such a treatment would discount or ignore the characteristics of the clusters themselves. In particular, characteristics such as density, dispersion, and the intra-cluster relations of the elements of each cluster would not be analyzed. It would seem inappropriate to apply the Procrustean transformations at the individual cluster level to compare similar clusters in different clusterings. However, some of the comparison techniques that would normally follow Procrustean transformations may permit analysis of the relational distribution of elements of individual clusters that have similar elements in different clusterings.

In short, at this stage, I suspect a two-stage process using centroids and individual clusters may be useful.
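To make the two-stage idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions, not a method taken from Borg and Groenen: the points are 2-D, the clusters in the two clusterings have already been matched by label, and only translation, uniform scaling and rotation (no reflection) are fitted. The function names (`procrustes_2d`, `two_stage_compare`) and the toy data are hypothetical. Stage one fits the centroids of one clustering onto the other; stage two compares the dispersion of each matched pair of clusters after applying the stage-one scale.

```python
import math

def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def dispersion(points):
    """Root-mean-square distance of the points from their centroid."""
    cx, cy = centroid(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def procrustes_2d(target, source):
    """Fit source onto target by translation, uniform scaling and rotation.

    2-D only, no reflection. Returns (fitted points, residual sum of squares, scale).
    """
    tc, sc = centroid(target), centroid(source)
    t = [(x - tc[0], y - tc[1]) for x, y in target]   # centred target
    s = [(x - sc[0], y - sc[1]) for x, y in source]   # centred source
    # Closed-form least-squares rotation angle for the 2-D case.
    a = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(s, t))
    b = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(s, t))
    theta = math.atan2(b, a)
    ss = sum(sx ** 2 + sy ** 2 for sx, sy in s)       # assumed to be non-zero
    scale = math.hypot(a, b) / ss
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    fitted = [(scale * (sx * cos_t - sy * sin_t) + tc[0],
               scale * (sx * sin_t + sy * cos_t) + tc[1]) for sx, sy in s]
    residual = sum((fx - tx) ** 2 + (fy - ty) ** 2
                   for (fx, fy), (tx, ty) in zip(fitted, target))
    return fitted, residual, scale

def two_stage_compare(clustering_a, clustering_b):
    """Stage 1: Procrustes fit of centroids. Stage 2: per-cluster dispersion ratios."""
    labels = sorted(clustering_a)                     # assumes identical, matched labels
    cents_a = [centroid(clustering_a[k]) for k in labels]
    cents_b = [centroid(clustering_b[k]) for k in labels]
    _, residual, scale = procrustes_2d(cents_a, cents_b)
    ratios = {k: scale * dispersion(clustering_b[k]) / dispersion(clustering_a[k])
              for k in labels}
    return residual, ratios

# Toy data (hypothetical): the second clustering is the first one rotated by
# 90 degrees, doubled in size and shifted.
clustering_a = {
    "c1": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "c2": [(5.0, 5.0), (6.0, 5.0), (5.0, 7.0)],
    "c3": [(9.0, 0.0), (10.0, 1.0), (11.0, 0.0)],
}
clustering_b = {k: [(-2.0 * y + 3.0, 2.0 * x - 1.0) for x, y in pts]
                for k, pts in clustering_a.items()}

residual, ratios = two_stage_compare(clustering_a, clustering_b)
print(residual, ratios)
```

For the toy data the centroid residual should be close to zero and every dispersion ratio close to one, since the second clustering is just a rotated, rescaled and shifted copy of the first. With real data, a large residual would point to differences in the overall pattern of clusters, and ratios far from one to differences within individual clusters.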
If the clusterings represent multiple samplings across some environmental facet or dimensional property of the data-generating entities, it should be possible to use the binary results of the two-stage comparisons to determine whether trends exist at either level and whether the trends, if they exist, covary.

However, I would not be surprised if other respondents to your question suggest a more elegant and suitable approach.

Reference:
Borg, I. & Groenen, P. (1997). Modern Multidimensional Scaling: Theory and Applications. New York: Springer.

Jack Keegan
PhD Candidate
Queensland University of Technology
25/2/03

> Hello.
>
> I'm trying to find a way to compare two clusterings (two results
> of a clustering algorithm).
>
> Is there any algorithm (or better yet - working software package
> for S-Plus/R/whatever) ?
>
> Thanks a lot...
>
> P.S. Comparing the results visually is not an option - too many
> records...