Moritz,

This is the normal situation for clustering used with correspondence analysis. Code in Java is at http://astro.u-strasbg.fr/~fmurtagh/mda-sw . I will put C code for the clustering there soon, and R code for everything: correspondence analysis and hierarchical clustering (weights on cases, minimum variance criterion, reciprocal nearest neighbor algorithm).

Best regards,

Fionn Murtagh
(f dot murtagh at qub dot ac dot uk)

 

From: Moritz Lennert <[log in to unmask]>
Reply-To: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
To: [log in to unmask]
Subject: clustering with weighted cases
Date: Thu, 22 Jan 2004 16:05:01 +0100
Hello,
We want to submit data to a hierarchical cluster analysis which
weights each case by a value given for each case. In our case we are
speaking of geographical entities which we would like to weight according
to their population when clustering them according to a series of
variables linked to this population.
According to our ideas, the weighting should occur at two moments in the
process:
1) in the calculation of the distance, by multiplying (for each pair) the
sum of squares by the product of the respective weights divided by the sum
of the weights (if two pairs of observations are of equal Euclidean
distance, the pair with the higher weights should be considered of greater
distance)
2) in the calculation of the Ward criterion, by using not just the number
of observations in each cluster, but a weighted sum of observations.
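As a rough illustration of the two points above, here is a minimal Python sketch, not taken from the code mentioned in the reply. It assumes each case carries a mass (e.g. its population), that the dissimilarity between two clusters is the squared Euclidean distance between their centroids multiplied by (product of masses) / (sum of masses), and that a merged cluster gets the mass-weighted mean as its centroid and the summed mass. The function names are hypothetical, and the naive search over all pairs is used only for clarity; it is not the reciprocal nearest neighbor algorithm.

```python
from itertools import combinations


def ward_dissimilarity(c_a, w_a, c_b, w_b):
    """Weighted minimum variance dissimilarity between two clusters.

    c_a, c_b are centroids (tuples of floats); w_a, w_b are cluster masses.
    """
    sq_dist = sum((x - y) ** 2 for x, y in zip(c_a, c_b))
    return (w_a * w_b) / (w_a + w_b) * sq_dist


def merge(c_a, w_a, c_b, w_b):
    """Return (centroid, mass) of the union of two clusters."""
    w = w_a + w_b
    centroid = tuple((w_a * x + w_b * y) / w for x, y in zip(c_a, c_b))
    return centroid, w


def weighted_ward(points, weights):
    """Naive agglomeration; returns a list of (id_a, id_b, dissimilarity).

    Original cases get ids 0..n-1; each merge creates the next unused id.
    """
    clusters = {i: (tuple(p), float(w))
                for i, (p, w) in enumerate(zip(points, weights))}
    next_id = len(clusters)
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest dissimilarity.
        a, b = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: ward_dissimilarity(*clusters[pair[0]],
                                                *clusters[pair[1]]),
        )
        d = ward_dissimilarity(*clusters[a], *clusters[b])
        clusters[next_id] = merge(*clusters.pop(a), *clusters.pop(b))
        merges.append((a, b, d))
        next_id += 1
    return merges
```

With this definition, two pairs at the same Euclidean distance get different dissimilarities when their masses differ, and the heavier pair counts as farther apart, which is the behaviour described in point 1.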
However, skimming through the literature, I have not been able to find
many references on weighting of cases (= observations). Much is said about
weighting variables, but little about cases. Is this due to a simple lack
of interest in such a procedure, or are there any
mathematical/statistical reasons for not weighting cases?
If there is literature on the topic which I have missed, could someone
please point me to it?
Thank you,
Moritz Lennert

