Luca-

In spite of the appeal of large datasets in the consumer world, where many seem to believe that large amounts of information are tantamount to insurance of some type, the sorry fact is that with massive amounts of data, all too frequently what you get is massive redundancy. David Scott's suggestion to do mode clustering with large databases remains one of the most sensible suggestions I've ever heard.

Regards,

Tom Ball
McKinsey & Co
55 East 52nd Street
New York, NY 10022


From: Luca Meyer <lucameyer@TISCALI.IT>
Sent by: "Classification, clustering, and phylogeny estimation" <CLASS-L@lists.sunysb.edu>
Date: 06/16/2004 11:24 AM
To: [log in to unmask]
cc: (bcc: Thomas Ball/NYO/NorthAmerica/MCKINSEY)
Subject: TwoStep clustering method comparison
Please respond to "Classification, clustering, and phylogeny estimation"

Hello,

I am searching for working or published papers comparing the TwoStep clustering method with other methods, as well as references about this and other methods for clustering large datasets. I am already aware of the following material:

Chiu, T., Fang, D., Chen, J., Wang, Y., and Jeris, C. (2001). A Robust and Scalable Clustering Algorithm for Mixed Type Attributes in Large Database Environment. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 263.

Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 103-114, Montreal, Canada.

Gore, P. A. Jr. (2000). Cluster analysis. In H. E. A. Tinsley & S. D. Brown (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 297-321). San Diego, CA: Academic Press.

Thank you in advance,

Mr.
Luca Meyer
Consumer research advisor: http://www.lucameyer.com/en/
Italian Online Research Mailing List: http://it.groups.yahoo.com/group/ior
Tel: +390122854456 - Fax: +390122854837 - Mobile: +393355217628
- One world, one human race -