CLASS-L Archives

September 2008

CLASS-L@LISTS.SUNYSB.EDU

Sender:
"Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Date:
Thu, 4 Sep 2008 08:43:26 -0400
Reply-To:
"Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Subject:
From:
Art Kendall <[log in to unmask]>
In-Reply-To:
Organization:
Social Research Consultants
If you have SPSS, here are some ways to do this.

The squared Euclidean distance is the sum of the squared differences on 
each dimension.
If you have 10 z variables, try something like this *untested* syntax, 
which will find the distance of each case from each centroid.
Create a centroid file with 1 "case" that holds a variable called 
constant set to 1 and 60 centroid variables in 6 sets of 10:
cen1z1 to cen1z10 cen2z1 to cen2z10 ... cen6z1 to cen6z10

In your main file:
compute constant=1.
match files file=main /table=centroids /by constant.

* Still untested. Assumes z1 to z10 are adjacent in the file and that.
* cen1z1 to cen6z10 are adjacent and in that order.
vector distance(6).
vector z = z1 to z10.
vector center = cen1z1 to cen6z10.

* Variable j of centroid i is element (i-1)*10 + j of center.
loop #i = 1 to 6.
compute distance(#i) = 0.
loop #j = 1 to 10.
compute distance(#i) = distance(#i) + (center((#i - 1)*10 + #j) - z(#j))**2.
end loop.
end loop.
execute.
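
For readers who want to check the logic outside SPSS, the distance step and the nearest-centroid assignment that follows it (steps 5 and 6 in the question) can be sketched in plain Python. This is only an illustration: the function names and the two-variable toy data are made up here and are not part of the syntax above.

```python
def squared_euclidean(x, c):
    """Sum of the squared differences on each dimension."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def nearest_centroid(cases, centroids):
    """Return (distances, assignments).

    distances[i][k] is the squared Euclidean distance of case i from
    centroid k; assignments[i] is the 1-based number of the closest centroid.
    """
    distances = [[squared_euclidean(x, c) for c in centroids] for x in cases]
    assignments = [row.index(min(row)) + 1 for row in distances]
    return distances, assignments


# Hypothetical data: three Sample A centroids and three Sample B cases.
centroids_a = [[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]]
sample_b = [[0.5, -0.5], [2.0, 4.0], [-2.5, 2.0]]

distances, assignments = nearest_centroid(sample_b, centroids_a)
print(assignments)
```

Because the square root is a monotonic function, ranking centroids by plain Euclidean distance picks the same nearest centroid as ranking by squared Euclidean distance, so the assignments would not change.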

If you do not have a huge number of cases and have a fairly powerful 
machine, a solution that takes less effort on your part but a lot of 
computation for the machine might be this:
add 6 cases to the top of the main file, each representing a centroid, 
run PROXIMITIES on the whole file, and then delete the columns of the 
resulting matrix that you do not want.
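
The brute-force idea can be sketched the same way in Python: stack the centroids on top of the cases, compute every pairwise distance, and keep only the columns that belong to the centroids. The helper names and data below are hypothetical, and the sketch stands in for PROXIMITIES rather than reproducing it.

```python
def squared_euclidean(x, c):
    """Sum of the squared differences on each dimension."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def distances_via_full_matrix(centroids, cases):
    """Compute the full pairwise matrix, then keep only what is needed."""
    stacked = centroids + cases  # centroids go at the top of the file
    full = [[squared_euclidean(a, b) for b in stacked] for a in stacked]
    k = len(centroids)
    # Drop the centroid rows and every column that is not a centroid.
    return [row[:k] for row in full[k:]]


# Hypothetical data: three centroids and three cases on two variables.
centroids = [[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]]
cases = [[0.5, -0.5], [2.0, 4.0], [-2.5, 2.0]]

case_by_centroid = distances_via_full_matrix(centroids, cases)
print(case_by_centroid)
```

The full matrix has (cases + centroids) squared entries, which is why this route trades machine time for less effort on your part.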

Another way to look at the agreement between two solutions is to do the 
clusterings with filtered cases, saving the memberships.
Then run DISCRIMINANT twice, each time treating the other set of cases as 
unclustered in the classification phase, and save the assignments and 
probabilities of membership on each pass.
Then run CROSSTABS on the assignments from the DFA against those from the 
original clustering.
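
The cross-tabulation step amounts to counting how often each pair of labels occurs together. A minimal Python sketch, with made-up membership vectors and a hypothetical crosstab helper, might look like this:

```python
from collections import Counter


def crosstab(rows, cols):
    """Count co-occurrences of row labels and column labels."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical memberships for eight cases: the saved cluster membership
# and the group assigned in the classification phase.
cluster_labels = [1, 1, 2, 2, 3, 3, 3, 1]
classified_labels = [1, 1, 2, 3, 3, 3, 3, 1]

row_levels, col_levels, table = crosstab(cluster_labels, classified_labels)

# Share of cases that received the same label from both procedures.
agreement = sum(a == b for a, b in zip(cluster_labels, classified_labels)) / len(cluster_labels)

for level, row in zip(row_levels, table):
    print(level, row)
print(agreement)
```

Counts on the diagonal are cases on which the two procedures agree; off-diagonal counts show which clusters are being confused with each other.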

Art Kendall
Social Research Consultants




Liza Rovniak wrote:
>
> Hi,
>
>  
>
> I am hoping someone here can help me with a "how to" question on 
> running McIntyre and Blashfield's (1980) nearest-centroid evaluation 
> procedure to validate the stability of my cluster analysis solution. I 
> am a newbie to cluster analysis, so this is my first time running this 
> procedure.
>
>  
>
> I have a sample of about 900 observations and have randomly split the 
> sample in two (Sample A and Sample B). I conducted hierarchical 
> cluster analysis and then calculated the centroid vectors for a 
> 3-cluster solution on each of these two subsamples (i.e., steps 1 
> through 4 of McIntyre and Blashfield's evaluation technique).
>
>  
>
> Step 5 of McIntyre and Blashfield's technique is to calculate "the 
> squared Euclidean distance for each of Sample B's objects from each of 
> the centroids of Sample A," and Step 6 is to assign "each object in 
> Sample B to the closest centroid vector." At this point, I am not sure 
> what buttons to press in SPSS to complete the analysis. One 
> possibility I tried is to use K-means cluster analysis to achieve 
> these two steps, but K-means uses simple Euclidean distance (not 
> squared Euclidean distance as recommended by McIntyre and Blashfield) 
> to assign the observations to clusters. Is this okay? (someone told me 
> it was, but I just want to double-check).  I would greatly appreciate 
> any guidance on what buttons to press in SPSS/appropriate syntax to 
> complete steps 5 and 6 of this analysis.
>
>  
>
> Thank you.
>
>  
>
> Liza Rovniak
>
>  
>
> Liza S. Rovniak, PhD, MPH
>
> Adjunct Assistant Professor
>
> Center for Behavioral Epidemiology & Community Health
>
> Graduate School of Public Health, San Diego State University
>
> San Diego, CA 92123
>
> Phone: 858-505-4770, ext. 152; Fax: 858-505-8614
>
> Email: [log in to unmask]
>
>  
>
> ---------------------------------------------- CLASS-L list. 
> Instructions: 
> http://www.classification-society.org/csna/lists.html#class-l 


