On Thu, 3 Jan 2002, George Feretzakis wrote:

> In order to compare Clustering methods, I need an artificial data
> set which is formed in true clusters. I would therefore greatly
> appreciate anyone sending me a data set like this or information
> where I could find this kind of data sets and if there is any
> simpler algorithm for generating artificial data.

Three suggestions for you. First, an algorithm developed by Glenn
Milligan:

@Article{Milligan:1985,
  author  = {Glenn W. Milligan},
  title   = {An Algorithm for Generating Artificial Test Clusters},
  journal = {Psychometrika},
  year    = 1985,
  volume  = 50,
  number  = 1,
  pages   = {123--127},
  month   = {March}
}

A C++ implementation of Milligan's algorithm has been written by Dan
Pape, based on an implementation in C that I wrote. You can get it
here:

http://clusutils.sourceforge.net/

It should compile easily on a Unix or Linux system with recent
versions of the standard GNU development tools. We haven't heard much
from people who have compiled it for other platforms, but we'd like
to.

Dr. Milligan's implementation is still available at the CSNA website
as Fortran source and (I think) a DOS executable:

http://www.pitt.edu/~csna/Milligan/

Second, you might try the algorithm developed by Waller et al.:

@Article{waller99,
  author  = {Waller, N. G. and Underhill, J. M. and Kaiser, H. A.},
  title   = {A method for generating simulated plasmodes and
             artificial test clusters with user-defined shape, size,
             and orientation},
  journal = {Multivariate Behavioral Research},
  year    = 1999,
  volume  = 34,
  number  = 2,
  pages   = {123--142}
}

Dr. Waller links to Windows executables and S-PLUS implementations of
the algorithm from this page:

http://peabody.vanderbilt.edu/depts/psych_and_hd/faculty/wallern/

Final suggestion: these algorithms were developed to support the same
kind of research you are proposing to begin. Review the methodologies
Milligan and Waller employed, not just the algorithms for the test data.
Here are two starting points that you can also use in citation
searches:

@Article{Milligan:1980,
  author  = {Glenn W. Milligan},
  title   = {An Examination of the Effect of Six Types of Error
             Perturbation on Fifteen Clustering Algorithms},
  journal = {Psychometrika},
  year    = 1980,
  volume  = 45,
  number  = 3,
  pages   = {325--341},
  month   = {September}
}

@Article{Waller:1998b,
  author  = {Niels G. Waller and Heather A. Kaiser and
             Janine B. Illian and Mike Manry},
  title   = {A Comparison of the Classification Capabilities of the
             1-Dimensional Kohonen Neural Network with Two
             Partitioning and Three Hierarchical Cluster Analysis
             Algorithms},
  journal = {Psychometrika},
  year    = 1998,
  volume  = 63,
  number  = 1,
  pages   = {5--22},
  month   = {March}
}

cheers,

Dave Dubin
ISRL, UIUC