Hi, I am using correspondence analysis (CA) to examine degrees of homogamy/social distance in society, using the occupations of husbands and wives as markers of social position. I have done this for five points in time using New Zealand census data (1981-2001). I'm using the scores on dimensions 1 and 2 from the CA to obtain a ranked scale of homogamy/social interaction. The order of this ranking is expected to be similar to the ranking of occupations in the issco model, and this is indeed the case for three of the time periods. At the other two points in time, however, the ranking is inverted. Does anyone have tips on how to explain this, or how to switch it around? I believe that the 'best fit model' in correspondence analysis can be a little nebulous. Greenacre talks about 'rotating the axis.' Will this work, and how might I do it in SAS?
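One possible reading of the inversion: the sign of each CA dimension is arbitrary, because the solution comes from a singular value decomposition and multiplying every score on a dimension by -1 leaves the fit unchanged. If that is what happened in the two odd years, reflecting the axis is enough and no rotation is needed. The sketch below is a minimal illustration in Python rather than SAS. The function name `align_dimension`, the reference ranking, and all score values are hypothetical and made up for the example.

```python
def align_dimension(scores, reference):
    """Reflect one CA dimension if it runs opposite to a reference ordering.

    The sign of the covariance between the scores and the reference ranking
    decides whether the axis is multiplied by -1.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_r = sum(reference) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(scores, reference))
    return [-s for s in scores] if cov < 0 else list(scores)


# Hypothetical dimension-1 scores for five occupational groups (not real data).
reference_rank = [1, 2, 3, 4, 5]            # assumed external occupational ranking
scores_1981 = [-0.9, -0.4, 0.1, 0.5, 0.8]   # same direction as the reference
scores_1991 = [0.8, 0.5, -0.1, -0.4, -0.9]  # same structure, reflected axis

aligned_1981 = align_dimension(scores_1981, reference_rank)
aligned_1991 = align_dimension(scores_1991, reference_rank)
```

The same reflection has to be applied to both the row and the column scores on that dimension so that the husband and wife categories stay consistent with each other. If the inverted years still disagree with the reference ranking after reflection, the difference is substantive rather than an artifact of axis orientation.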
Any help will be useful.
Cheers, Stephen 

From: Classification, clustering, and phylogeny estimation [mailto:[log in to unmask]] On Behalf Of Liza Rovniak
Sent: Thursday, 4 September 2008 10:40 a.m.
To: [log in to unmask]
Subject: cluster analysis validation technique



I am hoping someone here can help me with a “how to” question on running McIntyre and Blashfield’s (1980) nearest-centroid evaluation procedure to validate the stability of my cluster analysis solution. I am a newbie to cluster analysis, so this is my first time running this procedure.


I have a sample of about 900 observations and have randomly split the sample in two (Sample A and Sample B). I conducted hierarchical cluster analysis and then calculated the centroid vectors for a 3-cluster solution on each of these two subsamples (i.e., steps 1 through 4 of McIntyre and Blashfield’s evaluation technique).


Step 5 of McIntyre and Blashfield’s technique is to calculate “the squared Euclidean distance for each of Sample B’s objects from each of the centroids of Sample A,” and Step 6 is to assign “each object in Sample B to the closest centroid vector.” At this point, I am not sure what buttons to press in SPSS to complete the analysis. One possibility I tried is to use K-means cluster analysis to achieve these two steps, but K-means uses simple Euclidean distance (not squared Euclidean distance, as recommended by McIntyre and Blashfield) to assign the observations to clusters. Is this okay? (Someone told me it was, but I just want to double-check.) I would greatly appreciate any guidance on which buttons to press in SPSS, or the appropriate syntax, to complete steps 5 and 6 of this analysis.
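Steps 5 and 6 as quoted above can be sketched outside SPSS as follows. This is a minimal illustration in Python, not SPSS syntax, and the centroid and object values are invented for the example. It also shows why the distance question does not affect the assignment: squaring is monotone for non-negative numbers, so the centroid with the smallest squared Euclidean distance is also the one with the smallest plain Euclidean distance.

```python
import math


def squared_euclidean(x, c):
    """Squared Euclidean distance between an object and a centroid."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def assign_to_nearest(objects, centroids, squared=True):
    """Step 5: distance of each object to each fixed centroid.
    Step 6: index of the closest centroid for each object."""
    labels = []
    for x in objects:
        dists = [squared_euclidean(x, c) for c in centroids]
        if not squared:
            dists = [math.sqrt(d) for d in dists]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical values: three Sample A centroids, four Sample B objects.
centroids_a = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
sample_b = [[1.0, 0.5], [4.0, 6.0], [9.0, 1.0], [6.0, 2.0]]

labels_squared = assign_to_nearest(sample_b, centroids_a, squared=True)
labels_plain = assign_to_nearest(sample_b, centroids_a, squared=False)
```

The part that does matter is that the Sample A centroids stay fixed. A K-means run that iterates and updates its centers would move them away from the Sample A solution, so any K-means shortcut would need to be restricted to classification against the supplied centers.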


Thank you.


Liza Rovniak


Liza S. Rovniak, PhD, MPH

Adjunct Assistant Professor

Center for Behavioral Epidemiology & Community Health

Graduate School of Public Health, San Diego State University

San Diego, CA 92123

Phone: 858-505-4770, ext. 152; Fax: 858-505-8614

Email: [log in to unmask]


---------------------------------------------- CLASS-L list. Instructions: http://www.classification-society.org/csna/lists.html#class-l