I am hoping someone here can help me with a “how to” question about running McIntyre and Blashfield’s (1980) nearest-centroid evaluation procedure to validate the stability of my cluster analysis solution. I am new to cluster analysis, and this is my first time running this procedure.
I have a sample of about 900 observations, which I randomly split into two subsamples (Sample A and Sample B). I conducted a hierarchical cluster analysis on each subsample and then calculated the centroid vectors for a 3-cluster solution in each (i.e., steps 1 through 4 of McIntyre and Blashfield’s evaluation technique).
Step 5 of McIntyre and Blashfield’s technique is to calculate “the squared Euclidean distance for each of Sample B’s objects from each of the centroids of Sample A,” and Step 6 is to assign “each object in Sample B to the closest centroid vector.” At this point, I am not sure what buttons to press in SPSS to complete the analysis. One approach I tried is to use K-means cluster analysis for these two steps, but K-means uses simple Euclidean distance (not squared Euclidean distance, as recommended by McIntyre and Blashfield) to assign the observations to clusters. Is this okay? Someone told me it was, but I want to double-check. I would greatly appreciate any guidance on which buttons to press in SPSS, or the appropriate syntax, to complete steps 5 and 6 of this analysis.
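
To make steps 5 and 6 concrete, here is a minimal sketch in plain Python (not SPSS syntax) of what the two steps compute. The centroid and object values are made up for illustration, and the function names are hypothetical. The sketch also runs the assignment with simple Euclidean distance: because the square root is a monotonic function, the closest centroid under squared Euclidean distance is also the closest under simple Euclidean distance, so the two assignments should agree.

```python
from math import sqrt


def squared_euclidean(x, c):
    """Squared Euclidean distance between object x and centroid c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign_to_nearest_centroid(objects, centroids, squared=True):
    """Return, for each object, the 0-based index of its closest centroid.

    squared=True  uses squared Euclidean distance (step 5 as quoted).
    squared=False uses simple Euclidean distance instead.
    """
    assignments = []
    for obj in objects:
        dists = [squared_euclidean(obj, c) for c in centroids]
        if not squared:
            dists = [sqrt(d) for d in dists]
        # Step 6: pick the centroid with the smallest distance.
        assignments.append(dists.index(min(dists)))
    return assignments


# Hypothetical data: three Sample A centroids and four Sample B objects,
# each measured on two clustering variables.
centroids_a = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
sample_b = [[1.0, 0.5], [4.0, 6.0], [9.0, 1.0], [7.0, 1.0]]

by_squared = assign_to_nearest_centroid(sample_b, centroids_a, squared=True)
by_simple = assign_to_nearest_centroid(sample_b, centroids_a, squared=False)

print(by_squared)               # cluster index for each Sample B object
print(by_squared == by_simple)  # do the two distance measures agree?
```

The resulting assignments for Sample B can then be compared with the cluster memberships from Sample B's own hierarchical solution, which is the agreement the procedure is meant to assess.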
Liza S. Rovniak, PhD, MPH
Adjunct Assistant Professor
Center for Behavioral Epidemiology & Community Health
Graduate School of Public Health, San Diego State University
San Diego, CA 92123
Phone: 858-505-4770, ext. 152; Fax: 858-505-8614