CLASS-L Archives

April 2005


Travis Brenden <[log in to unmask]>
Classification, clustering, and phylogeny estimation
Mon, 11 Apr 2005 13:01:00 -0400
Classification Listserve Members,

I am working on a clustering method that can be applied to digital river
networks in a geographic information system (GIS).  Essentially, the goal is to
cluster adjoining river reaches (i.e., river reaches that flow into each other)
into larger habitat units (i.e., patches, valley segments) based on habitat
data that are attributed to each river reach.  I have found that most standard
clustering methods do not work well with this type of data because they do not
recognize that only adjoining reaches should be clustered.  I have therefore
constructed an algorithm that "crawls" through a river network and forms
patches one at a time by iteratively merging adjoining river reaches until no
more adjoining reaches satisfy the merging threshold.  Once a patch is formed,
all reaches that comprise it are dropped from the candidate list so that they
cannot be clustered into another habitat patch.  Right now, I am basing my
threshold value on the average (or some other statistic) of the pairwise
Euclidean distances between all river reaches in the network.  Clustering is
also based on Euclidean distances in the habitat variables.
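
A minimal sketch of the "crawling" procedure described above, written under
stated assumptions rather than as the actual implementation: the river network
is given as a hypothetical `neighbors` dictionary mapping each reach to the
reaches it flows into or receives flow from; a candidate reach is compared
with the mean habitat vector (centroid) of the growing patch, which is only one
possible reading of "merging threshold"; and seeds are visited in sorted reach
order.  All function names are illustrative.

```python
import itertools
import math


def mean_pairwise_distance(habitat):
    """Average Euclidean distance over all pairs of reaches (one possible threshold)."""
    pairs = itertools.combinations(habitat.values(), 2)
    dists = [math.dist(a, b) for a, b in pairs]
    return sum(dists) / len(dists)


def centroid(points):
    """Mean habitat vector of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def grow_patches(habitat, neighbors, threshold):
    """Form patches one at a time by merging adjoining, still-unassigned reaches.

    habitat:   dict reach_id -> tuple of habitat variables
    neighbors: dict reach_id -> set of adjoining reach_ids (hypothetical input)
    threshold: largest Euclidean distance (candidate reach to patch centroid)
               at which a merge is still allowed (assumed criterion)
    """
    unassigned = set(habitat)
    patches = []
    for seed in sorted(habitat):  # assumed seed order: sorted reach id
        if seed not in unassigned:
            continue
        patch = {seed}
        unassigned.remove(seed)
        while True:
            # Candidates: unassigned reaches adjoining any reach already in the patch.
            frontier = {nb for r in patch for nb in neighbors[r]} & unassigned
            if not frontier:
                break
            center = centroid([habitat[r] for r in patch])
            best = min(sorted(frontier), key=lambda r: math.dist(habitat[r], center))
            if math.dist(habitat[best], center) > threshold:
                break  # no adjoining reach satisfies the merging threshold
            patch.add(best)
            unassigned.remove(best)  # drop from the candidate list
        patches.append(patch)
    return patches


# Usage: four reaches in a chain 1-2-3-4 with one habitat variable each.
habitat = {1: (0.0,), 2: (0.1,), 3: (5.0,), 4: (5.2,)}
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
threshold = mean_pairwise_distance(habitat)
print(grow_patches(habitat, neighbors, threshold))  # [{1, 2}, {3, 4}]
```

Because reaches are removed from `unassigned` as soon as they join a patch,
the result depends on seed order; trying several orders is one way to check
how sensitive the patches are to that choice.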

The method seems to work fairly well, but I would now like to try to merge
neighboring patches into larger units until some "optimum" number of patches is
found (although I realize "optimum" is probably a myth).  Essentially, this
concerns the implementation of a stopping rule for forming clusters.  My
current stopping rule is based on the Calinski and Harabasz (1974) index, which
in a nutshell is the ratio of the between-cluster sum of squares to the pooled
within-cluster sum of squares, each divided by its degrees of freedom.  I thus
iteratively merge the most similar neighboring patches until the Calinski and
Harabasz index can no longer be improved.
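
A minimal sketch of that stopping rule, assuming the usual definition
CH = [B / (k - 1)] / [W / (n - k)], where B is the between-patch sum of
squares, W the pooled within-patch sum of squares, k the number of patches and
n the number of reaches.  "Most similar" is taken here to mean the adjoining
pair of patches with the smallest centroid distance; that is an assumption, as
is the hypothetical `neighbors` dictionary of adjoining reaches.  Function
names are illustrative only.

```python
import math


def centroid(points):
    """Mean habitat vector of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two habitat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(habitat, patches):
    """CH = [B / (k - 1)] / [W / (n - k)] for a partition of the reaches."""
    n, k = len(habitat), len(patches)
    if not 2 <= k < n:
        raise ValueError("CH index needs 2 <= number of patches < number of reaches")
    overall = centroid(list(habitat.values()))
    between = within = 0.0
    for patch in patches:
        pts = [habitat[r] for r in patch]
        c = centroid(pts)
        between += len(pts) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in pts)
    if within == 0.0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def patches_adjoin(p, q, neighbors):
    """True if any reach in patch p adjoins a reach in patch q."""
    return any(neighbors[r] & q for r in p)


def merge_until_ch_stops(habitat, neighbors, patches):
    """Merge the most similar adjoining pair of patches while CH keeps improving."""
    patches = [set(p) for p in patches]
    best_ch = calinski_harabasz(habitat, patches)
    while len(patches) > 2:  # CH is undefined for a single patch
        candidates = []
        for i in range(len(patches)):
            for j in range(i + 1, len(patches)):
                if patches_adjoin(patches[i], patches[j], neighbors):
                    # Assumed similarity: distance between patch centroids.
                    d = math.dist(centroid([habitat[r] for r in patches[i]]),
                                  centroid([habitat[r] for r in patches[j]]))
                    candidates.append((d, i, j))
        if not candidates:
            break  # no two patches adjoin
        _, i, j = min(candidates)
        merged = [p for idx, p in enumerate(patches) if idx not in (i, j)]
        merged.append(patches[i] | patches[j])
        ch = calinski_harabasz(habitat, merged)
        if ch <= best_ch:
            break  # merging no longer improves the index
        patches, best_ch = merged, ch
    return patches, best_ch


# Usage: six reaches in a chain, starting from four patches.
habitat = {1: (0.0,), 2: (0.2,), 3: (1.0,), 4: (1.2,), 5: (5.0,), 6: (5.2,)}
neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
patches, ch = merge_until_ch_stops(habitat, neighbors, [{1}, {2}, {3, 4}, {5, 6}])
print(sorted(sorted(p) for p in patches), round(ch, 6))  # [[1, 2], [3, 4], [5, 6]] 700.0
```

Here merging reaches 1 and 2 raises CH from 467 to 700, but the next merge
(patches {1, 2} and {3, 4}) would drop it to about 102, so the procedure stops
at three patches.  Because CH is only defined for two or more patches, this
rule can never select a single patch.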

My question is whether the Calinski and Harabasz index is useful for this type
of application (trying to find an "optimum" number of clusters), and whether
anybody has suggestions for a better stopping rule.  I would also be
interested to hear any other suggestions on how to cluster only adjoining
river reaches.  I have done a number of web searches for a better method, but
I have always come up empty.  To me, this is a form of spatially constrained
clustering, but I have not come across anything similar in other fields.

Thanks in advance for any suggestions that might be provided.


Travis Brenden
School of Natural Resources and Environment
University of Michigan
212 Museums Annex
Ann Arbor, MI 48109-1084
734-663-3554 (Ext. 122)
[log in to unmask]