Let me add an additional bit of advice. In real-world problems, data often
does not line up in neat balls or clusters. Quite often the algorithms fail
to aggregate groups which we feel should have been grouped together, or
they join groups which we feel should have been split. That is because all
of these algorithms require some sort of 'vigilance parameter', such as a
predetermined number of clusters or a criterion for splitting and joining,
based on some notion of 'how big is big?', 'how far is far?', etc. The
choice of this parameter makes the clustering problem somewhat subjective.

Therefore, to assess algorithms, I recommend that you start with 2- or
3-dimensional data sets which you can scatter-plot, so that you can
independently verify how well the algorithms performed. Move to higher
dimensions once you've satisfied yourself that the algorithm is working
correctly on the clusters you can perceive with your own eyes.

HTH,
John Day, Staff Scientist
Computer Science Innovations, Inc
Melbourne, FL
http://www.csi.cc

At 10:36 AM 1/3/02 -0600, you wrote:
>On Thu, 3 Jan 2002, George Feretzakis wrote:
>
> > In order to compare clustering methods, I need an artificial data
> > set which is formed in true clusters. I would therefore greatly
> > appreciate anyone sending me a data set like this, or information
> > on where I could find data sets of this kind, and whether there is
> > any simpler algorithm for generating artificial data.
>
>Three suggestions for you. First, an algorithm developed by Glenn Milligan:
>
>@Article{Milligan:1985,
>  author = {Glenn W. Milligan},
>  title = {An Algorithm for Generating Artificial Test Clusters},
>  journal = {Psychometrika},
>  year = 1985,
>  volume = 50,
>  number = 1,
>  pages = {123--127},
>  month = {March}
>}
>
>A C++ implementation of Milligan's algorithm has been written by Dan
>Pape, based on an implementation in C that I wrote.
>You can get it here:
>
>http://clusutils.sourceforge.net/
>
>Compiling it on a Unix or Linux system will be easy using recent
>versions of the standard GNU development tools. We haven't heard much
>from people who have compiled it for other platforms, but we'd like to.
>
>Dr. Milligan's implementation is still available at the CSNA website
>as Fortran source and (I think) a DOS executable:
>
>http://www.pitt.edu/~csna/Milligan/
>
>You might also try the algorithm developed by Waller et al.:
>
>@Article{waller99,
>  author = {Waller, N. G. and Underhill, J. M. and Kaiser, H. A.},
>  title = {A method for generating simulated plasmodes and
>           artificial test clusters with user-defined shape,
>           size, and orientation},
>  journal = {Multivariate Behavioral Research},
>  year = 1999,
>  volume = 34,
>  number = 2,
>  pages = {123--142},
>}
>
>Dr. Waller has links to Windows executables and S-PLUS implementations
>of the algorithm from this page:
>
>  http://peabody.vanderbilt.edu/depts/psych_and_hd/faculty/wallern/
>
>Final suggestion: these algorithms were developed to support the same
>kind of research you are proposing to begin. Review the methodologies
>Milligan and Waller employed, not just the algorithms for the test
>data. Here are two starting points that you can also use in citation
>searches:
>
>@Article{Milligan:1980,
>  author = {Glenn W. Milligan},
>  title = {An Examination of the Effect of Six Types of Error
>           Perturbation on Fifteen Clustering Algorithms},
>  journal = {Psychometrika},
>  year = 1980,
>  volume = 45,
>  number = 3,
>  pages = {325--341},
>  month = {September}
>}
>
>@Article{Waller:1998b,
>  author = {Niels G. Waller and Heather A. Kaiser and Janine
>            B. Illian and Mike Manry},
>  title = {A Comparison of the Classification Capabilities of the
>           1-Dimensional Kohonen Neural Network with Two
>           Partitioning and Three Hierarchical Cluster Analysis
>           Algorithms},
>  journal = {Psychometrika},
>  year = 1998,
>  volume = 63,
>  number = 1,
>  pages = {5--22},
>  month = {March}
>}
>
>cheers,
>
>Dave Dubin
>ISRL, UIUC
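For anyone who wants to try the 2-dimensional, scatter-plot-first check described above before setting up Milligan's or Waller's programs, here is a minimal sketch of a much simpler generator. It is NOT either published algorithm: it places cluster centers at random in a square, rejecting any candidate closer than a minimum separation to the centers already accepted, then scatters Gaussian points around each center and keeps the generating label as ground truth. The function names and parameters (`make_test_clusters`, `nearest_center_agreement`, `spread`, `min_separation`, `box`) are hypothetical choices for illustration only.

```python
import math
import random


def make_test_clusters(n_clusters=3, points_per_cluster=50, spread=1.0,
                       min_separation=8.0, box=50.0, seed=None,
                       max_tries=10_000):
    """Generate labelled 2-D points around well-separated cluster centers.

    Illustrative only: this is NOT Milligan's (1985) or Waller et al.'s
    (1999) algorithm. Returns (points, labels, centers), where points is a
    list of (x, y) tuples and labels[i] is the index of the center that
    generated points[i].
    """
    rng = random.Random(seed)
    centers = []
    tries = 0
    # Rejection sampling: accept a random candidate center only if it lies at
    # least min_separation away from every center accepted so far.
    while len(centers) < n_clusters:
        tries += 1
        if tries > max_tries:
            raise ValueError("could not place centers; "
                             "enlarge box or reduce min_separation")
        candidate = (rng.uniform(0.0, box), rng.uniform(0.0, box))
        if all(math.dist(candidate, c) >= min_separation for c in centers):
            centers.append(candidate)

    points, labels = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            # Independent Gaussian noise in x and y with std. dev. `spread`.
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(k)
    return points, labels, centers


def nearest_center_agreement(points, labels, centers):
    """Fraction of points whose nearest center is the one that generated them."""
    hits = 0
    for p, k in zip(points, labels):
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        if nearest == k:
            hits += 1
    return hits / len(points)


if __name__ == "__main__":
    pts, labs, ctrs = make_test_clusters(n_clusters=4, seed=42)
    print(len(pts))  # 200 points: 4 clusters x 50 points each
    print(nearest_center_agreement(pts, labs, ctrs))
```

The points can be handed to whichever clustering methods are being compared and to any scatter-plot tool (for example matplotlib's `pyplot.scatter`), and because the generating labels are kept, each method's output can be scored against them. Lowering `min_separation` relative to `spread` makes the clusters overlap, which is one crude way to make the test harder; the published algorithms give much finer control over cluster shape, size, and orientation.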