CLASS-L Archives

October 2002


"Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Kiri Wagstaff <[log in to unmask]>
Wed, 23 Oct 2002 14:40:41 -0400
On Wed, 23 Oct 2002, Henry Bulley wrote:
> I am working on different classification methods based on ecological
> (spatial) variables. I would be glad if any of you could direct me to
> literature on comparing the different classification methods as to which
> one is better.

The problem of validation in clustering is a hard one.  I assume by "which
one is better" you're trying to compare the results of different
clustering methods on the same data set, and you want to decide which
results are the best.  I doubt there's any consensus about the best
overall clustering method - the appropriateness of any algorithm depends
strongly upon the characteristics of your data set as well as what you're
trying to find.  For example, some methods assume that the clusters in
your data follow Gaussian distributions.  Others make different
assumptions.  The
quality of your results will be affected by how well your data fits the
assumptions of the method.

Probably the best way to validate clustering results is to compare the
classification to some actual known data labels (ground truth), if you
have them.  Unfortunately, for many clustering applications you don't have
that kind of information (which is why you're using a clustering
algorithm).  Failing that, another approach would be to evaluate the
robustness of the method - given a small perturbation of your original
data, how much does the classification change?  Or you can compare the
results of different clustering algorithms just to get a sense of how much
agreement there is - support from more than one approach for the same
partition of the data gives you more confidence in the partition.
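
As a rough illustration of measuring agreement between two partitions,
here is a minimal Python sketch using the Rand index, which is one common
pair-counting measure and only my choice for the example.  The function
name and the toy label lists are made up.  The same function works for
comparing a clustering to ground truth labels, comparing the output of two
different algorithms, or comparing results before and after perturbing the
data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair counts as an agreement if both partitions put the two points
    in the same cluster, or both put them in different clusters.  The
    cluster names themselves do not matter, only the grouping.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical toy data: six points, two known classes.
truth = [0, 0, 0, 1, 1, 1]
method_a = [1, 1, 1, 0, 0, 0]  # same grouping, cluster names swapped
method_b = [0, 0, 1, 1, 1, 1]  # one point assigned to the other cluster

print(rand_index(truth, method_a))            # 1.0
print(round(rand_index(truth, method_b), 3))  # 0.667
print(round(rand_index(method_a, method_b), 3))
```

A value of 1.0 means the two partitions are identical up to renaming of
the clusters.  Note that the plain Rand index is not corrected for chance
agreement, so for real comparisons you may prefer the adjusted variant.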

        Hope this helps!


------------ Kiri Wagstaff, Ph.D. -------- [log in to unmask] -------------
        Love is the image you place around your significant other,
   and how close it is to being true love depends on how closely he or she
                fits into the mold.  -- Orlando de La Cruz