CLASS-L Archives

January 2004


From: Moritz Lennert
Reply-To: Classification, clustering, and phylogeny estimation
Date: Mon, 26 Jan 2004 16:45:32 +0100

Fionn Murtagh said:
> Moritz,
> Normal situation for clustering used with correspondence analysis.  Code
> in Java is at   I will put
> code in C for the clustering there soon, and in R for all - corresp.
> analysis and hier. clustering  (weights on cases, min. var. criterion,
> reciprocal nearest neigh. algorithm) .

Thank you, this sounds great. I would be very interested once you're done,
especially concerning R.

However, I was not really looking for code or algorithms (although I am
interested in seeing this implemented in R). We have done this before and
have home-made Fortran programs for it. While looking into the
possibility of doing this with R, though, I was surprised to see that none
of the clustering algorithms implemented there allows such weighting of
cases. I was therefore wondering whether there is a reason for this that I
should know of, other than a simple lack of interest.

To (hopefully) make my question clearer, here is an explanation of what we
are currently doing:

We have the census tracts of the city of Brussels and, for each tract, a
series of data on the housing market (type of ownership, number of rented
apartments, etc.). To bring some order into this information, we would
like to run a cluster analysis to identify different types of
ownership/housing structures.

My question stems from the fact that the total population of housing units
differs quite strongly from one census tract to another. We do not want
tracts with small populations to have the same influence on the types as
tracts with large populations. Hence the idea of weighting each census
tract according to its population.
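
To make the weighting idea concrete, here is a minimal sketch in Python. It
is an illustration under stated assumptions, not the Fortran programs
mentioned above and not Fionn Murtagh's code. It assumes a minimum-variance
(Ward-type) criterion in which merging two clusters with total weights w_a,
w_b and weighted centroids c_a, c_b costs w_a*w_b/(w_a+w_b) * |c_a - c_b|^2.
The function name weighted_ward and the toy data are hypothetical, and the
pair search is a naive scan over all pairs rather than the reciprocal
nearest neighbour algorithm.

```python
def weighted_ward(points, weights, n_clusters):
    """Naive agglomerative clustering with case weights (minimum-variance criterion)."""
    # Each cluster is (total weight, weighted centroid, member indices).
    clusters = [(w, list(p), [i]) for i, (p, w) in enumerate(zip(points, weights))]
    while len(clusters) > n_clusters:
        best = None
        # Scan all pairs for the merge with the smallest increase in
        # weighted within-cluster variance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                wa, ca, _ = clusters[a]
                wb, cb, _ = clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = wa * wb / (wa + wb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        wa, ca, ma = clusters[a]
        wb, cb, mb = clusters[b]
        w = wa + wb
        # The merged centroid is the weight-averaged centroid of the two parts.
        centroid = [(wa * x + wb * y) / w for x, y in zip(ca, cb)]
        clusters[a] = (w, centroid, ma + mb)
        del clusters[b]
    return [sorted(members) for _, _, members in clusters]


# Hypothetical toy data: three "tracts" described by a single variable.
tracts = [[0.0], [4.0], [10.0]]
equal = weighted_ward(tracts, [1, 1, 1], 2)
by_size = weighted_ward(tracts, [100, 100, 1], 2)
```

With equal weights the two closest tracts (0 and 1) are merged first. With
weights 100, 100 and 1, the small tract 2 is absorbed by tract 1 instead,
because moving a light case costs little variance while merging two heavy
tracts is expensive. Small tracts thus have less influence on the resulting
types.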

You seem to be saying that this is a standard situation when clustering
the results of correspondence analyses, but is it also used with
agglomerative clustering algorithms in general?