CLASS-L Archives

April 2002


Options: Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
"Wolfgang M. Martmann" <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Fri, 26 Apr 2002 08:23:57 +0200
text/plain (73 lines)
Dear Prof. Rohlf,
I have read your paper on the minimum spanning tree algorithm
and I think I've got your program to run. For testing I did use the
Birth and Dead Rate data in Hartigan's book. The Yuval grid
did run fine, but it looks like that I encounter an infinite loop with
the Rabin grid. Maybe I did specify something wrong. I will try
to trace this today. I think the LIMLOT in the Common block
cannot be larger than 2999. Do you have any idea how large
NDIMGR and LIMH normally should be?

But back to my original problem: First I would have to modify the
algorithm from searching for the smallest to searching for the largest
distances. Then I would need to look into the graph for those K objects,
the number K is user specified, which have no small interpoint distances.
That brings me to the question: Does the minimum spanning tree really
find a cluster of K cases which have all small interpoint distances?

In my applications the total number of cases N woiuld be in the thousands
but the number K of selected cases should be somehow between 100 and
1000. Then the K times K matrix of distances between the selected objects
should have no small interpoint distance (off-diagonal entry).

----- Original Message -----
From: F. James Rohlf <[log in to unmask]>
To: <[log in to unmask]>
Sent: Wednesday, April 17, 2002 4:23 PM
Subject: Re: Name and Reference of Algorithm

> I am not sure if is exactly the reference you need, but the following may
> of interest because it shows one method to avoid having to compute an
> distance matrix. It was intended for quickly finding near neighbors but it
> should also be useful for finding distant neighbors (I looked for points
> the same partition whereas you would be interested in points not in the
> partition).  Main limitation for my application was that it works best for
> large numbers of points in very low-dimensional spaces.
> Rohlf, F. J. 1977. A probabalistic minimum spanning tree algorithm.
> Information Proc. Letters, 7:44-48.
> > -----Original Message-----
> > From: Classification, clustering, and phylogeny estimation
> > [mailto:[log in to unmask]]On Behalf Of Wolfgang M. Martmann
> > Sent: Wednesday, April 17, 2002 9:45 AM
> > To: [log in to unmask]
> > Subject: Name and Reference of Algorithm
> >
> >
> > Hi:
> > For some time I'm working on a problem of sampling a set of K
> > observations (cases) from a large data set with N >> K cases so that
> > the selected observations are as "different as possible". In more
> > mathematical terms, I'm interested in locating those K cases which
> > will result in a (not necessarily Euclidean) distance matrix in which
> > the smallest off-diagonal entry d_ij is as large as possible.
> >
> > I have developed an algorithm which seems to work very well and
> > generates sets which are either optimal or close to optimality without
> > computing the entire distance matrix. However, I'm thinking more
> > and more that this maybe a known problem to people who work in
> > Cluster Analysis, MDS, or classification. I wonder if anybody on
> > this list could point me to some references about this search
> > problem.
> >
> > Thanks, Wolfgang Hartmann
> >