On Wed, 23 Oct 2002, Henry Bulley wrote:

> I am working on different classification methods based on ecological
> (spatial) variables. I would be glad if any of you could direct me to
> literature on comparing the different classification methods as to which
> one is better.

The problem of validation in clustering is a hard one. I assume by "which one is better" you mean that you want to compare the results of different clustering methods on the same data set and decide which results are the best.

I doubt there's any consensus about the best overall clustering method - the appropriateness of any algorithm depends strongly upon the characteristics of your data set as well as what you're trying to find. For example, some methods assume that your data are drawn from Gaussian distributions. Others make different assumptions. The quality of your results will be affected by how well your data fit the assumptions of the method.

Probably the best way to validate clustering results is to compare the classification to actual known data labels (ground truth), if you have them. Unfortunately, for many clustering applications you don't have that kind of information (which is why you're using a clustering algorithm).

Failing that, another approach would be to evaluate the robustness of the method - given a small perturbation of your original data, how much does the classification change? Or you can compare the results of different clustering algorithms just to get a sense of how much agreement there is - support from more than one approach for the same partition of the data gives you more confidence in the partition.

Hope this helps!

Kiri

------------ Kiri Wagstaff, Ph.D. ------------
Love is the image you place around your significant other, and how
close it is to being true love depends on how closely he or she fits
into the mold. -- Orlando de La Cruz
-----------------------------------------------------------------------------
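
The agreement check mentioned in the message (between two clustering algorithms, between a clustering and ground-truth labels, or between runs on original and perturbed data) can be made concrete with a pairwise agreement score. What follows is only a minimal sketch, not something from the original message: it uses the plain Rand index, and the function name `rand_index` and the toy label lists are illustrative assumptions.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    points in the same cluster, or both put them in different clusters.
    Cluster names themselves do not matter, only the grouping.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one point: nothing to disagree about
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical labelings of the same six sites from two methods.
method_1 = [0, 0, 0, 1, 1, 1]
method_2 = ["x", "x", "y", "y", "y", "y"]
score = rand_index(method_1, method_2)
```

A score of 1.0 means the two partitions are identical up to renaming of clusters. The plain Rand index is not corrected for chance agreement, so an adjusted variant is usually preferable when the number or size of clusters differs a lot between methods.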