The usual UPGMA clustering uses this strategy within the cluster analysis
but starts with all objects having equal weights. It would be a trivial
generalization to provide an initial set of weights for each object. The
rest of the algorithm could then proceed normally. Would this give the
I am a bit puzzled by the suggested use of correspondence analysis. In that
approach one scales by the square roots of row and column frequencies and
thus, in effect, weights inversely to sample size - which I believe is the
opposite of Moritz's request. Or am I missing something?
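
To make the weighted UPGMA idea at the top of this message concrete, here is a
minimal Python sketch. It is only an illustration under stated assumptions, not
code from any existing package: the function name weighted_upgma and its return
format are hypothetical, and the search for the closest pair is deliberately
naive. The only change from ordinary UPGMA is that each object starts with a
supplied weight instead of a count of 1; a merged cluster then carries the sum
of its members' weights.

```python
def weighted_upgma(dist, weights):
    """UPGMA-style agglomeration with user-supplied initial case weights.

    dist    : n x n symmetric dissimilarity matrix (list of lists)
    weights : n positive initial weights (all ones gives ordinary UPGMA)
    Returns a list of merges (id_a, id_b, height, merged_weight).
    Original objects have ids 0..n-1; new clusters get n, n+1, ...
    """
    n = len(weights)
    w = {i: float(weights[i]) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(w) > 1:
        # Naive search for the closest pair of active clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        wa, wb = w.pop(a), w.pop(b)
        # Weighted-average update: the usual UPGMA formula with the
        # cluster sizes replaced by the accumulated weights.
        for k in w:
            dak = d.pop((min(a, k), max(a, k)))
            dbk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (wa * dak + wb * dbk) / (wa + wb)
        del d[(a, b)]
        w[next_id] = wa + wb
        merges.append((a, b, height, wa + wb))
        next_id += 1
    return merges


if __name__ == "__main__":
    toy = [[0, 2, 6],
           [2, 0, 10],
           [6, 10, 0]]
    print(weighted_upgma(toy, [1, 1, 1]))  # equal weights: ordinary UPGMA
    print(weighted_upgma(toy, [3, 1, 1]))  # object 0 counts three times
```

With equal weights the second merge in the toy example occurs at height
(6 + 10) / 2 = 8; giving object 0 a weight of 3 pulls it to
(3*6 + 1*10) / 4 = 7, so the heavier object dominates the average.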
F. James Rohlf - Dept. Ecology & Evolution
SUNY, Stony Brook, NY 11794-5245
> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> On Behalf Of Moritz Lennert
> Sent: Monday, January 26, 2004 10:46 AM
> Subject: Re: clustering with weighted cases
> Fionn Murtagh said:
> > Moritz,
> > Normal situation for clustering used with correspondence analysis.
> > Code in Java is at http://astro.u-strasbg.fr/~fmurtagh/mda-sw
> > I will put code in C for the clustering there soon, and in R for
> > all - analysis and hier. clustering (weights on cases, min. var.
> > reciprocal nearest neigh. algorithm).
> Thank you, this sounds great. I would be very interested once
> you're done, especially concerning R.
> However, I was not really looking for code or algorithms
> (although I'm interested in seeing this implemented in R). We
> have done this before and have home-made Fortran programs to
> do this. However, looking into the possibility of doing this
> with R, I was surprised to see that none of the existing
> cluster algorithms implemented allow such weighting of cases.
> I was, therefore, wondering whether this had any reason that
> I should know of, other than just pure lack of interest.
> In order to (hopefully) make my question clearer, here is an
> explanation of what we are doing currently:
> We have the census tracts of the city of Brussels. We have a
> series of data concerning the housing market in each census
> tract (type of ownership, number of rented apartments, etc).
> In order to put a bit of order into this information, we
> would like to run a cluster analysis to identify different
> types of ownership/housing structures.
> My question stems from the fact that the total population of
> housing units differs quite strongly from one census tract to
> the other. We do not want tracts with small populations to
> have the same influence on the types as tracts with large
> populations. Thus the idea of weighting each census tract
> according to its population.
> You seem to be saying that this is a standard situation when
> clustering results of correspondence analyses, but is this
> used in general agglomerative clustering algorithms?