We want to run a hierarchical cluster analysis in which each case is
weighted by a value given for that case. In our case, the cases are
geographical entities, which we would like to weight by their population
when clustering them on a series of variables linked to this population.
In our view, the weighting should enter at two points in the procedure:
1) in the calculation of the distance, by multiplying the sum of squares
for each pair by the product of the two weights divided by their sum,
i.e. w_i*w_j/(w_i+w_j) (if two pairs of observations are at equal Euclidean
distance, the pair with the higher weight should be considered of greater
importance);
2) in the calculation of the Ward criterion, by using not just the number
of observations in each cluster but the sum of their weights.
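For concreteness, here is a minimal sketch of what the two points above might look like in code. It is a naive, pure-Python agglomerative procedure under our own assumptions: the function names `weighted_ward`, `ward_cost` and `sq_dist` are hypothetical (not from any library), a cluster's weight is the sum of its cases' weights, and its centroid is the weighted mean. The merge cost used is w_A*w_B/(w_A+w_B) times the squared Euclidean distance between centroids, i.e. the increase in the weighted within-cluster sum of squares. For two single cases this reduces exactly to the distance of point 1, so in this formulation the two points turn out to be the same rule applied at different stages.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance (sum of squares) between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(wa, ca, wb, cb):
    """Hypothetical weighted Ward cost: increase in the weighted
    within-cluster sum of squares if clusters A and B (total weights
    wa, wb; weighted centroids ca, cb) are merged."""
    return wa * wb / (wa + wb) * sq_dist(ca, cb)


def weighted_ward(points, weights):
    """Naive O(n^3) agglomerative clustering with weighted Ward linkage.

    Returns a list of merges (id_a, id_b, cost, merged_weight).
    Original cases have ids 0..n-1; the k-th merge creates id n + k.
    """
    if len(points) != len(weights):
        raise ValueError("need exactly one weight per case")
    # cluster id -> (total weight, weighted centroid)
    clusters = {i: (float(w), [float(x) for x in p])
                for i, (p, w) in enumerate(zip(points, weights))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Merge the pair that increases the weighted sum of squares least.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: ward_cost(*clusters[pair[0]],
                                              *clusters[pair[1]]))
        wa, ca = clusters.pop(a)
        wb, cb = clusters.pop(b)
        cost = ward_cost(wa, ca, wb, cb)
        w = wa + wb
        # New centroid is the weight-averaged centroid of the two clusters.
        centroid = [(wa * x + wb * y) / w for x, y in zip(ca, cb)]
        clusters[next_id] = (w, centroid)
        merges.append((a, b, cost, w))
        next_id += 1
    return merges


pts = [[0.0], [4.0], [10.0]]
print(weighted_ward(pts, [1, 1, 1])[0][:2])      # (0, 1): closest pair first
print(weighted_ward(pts, [100, 100, 1])[0][:2])  # (1, 2): heavy pair kept apart
```

With equal weights, the two closest cases (0 and 4) are merged first. Giving them a large weight (standing in for a large population) makes separating them more important, so the light case at 10 is absorbed first instead.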
However, skimming through the literature, I have not been able to find
many references on weighting cases (i.e. observations). Much is said about
weighting variables, but little about weighting cases. Is this simply due
to a lack of interest in such a procedure, or are there mathematical or
statistical reasons for not weighting cases?
If there is literature on the topic that I have missed, could someone
please point me to it?