CLASS-L Archives

December 2005

CLASS-L@LISTS.SUNYSB.EDU

Subject:
From:
Art Kendall <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Date:
Sat, 17 Dec 2005 09:33:51 -0500
A lot of the interpretation depends on the nature of the differences in 
the sources, the clustering software, and the substantive topic you are 
working on.

Are your data sets representative samples? Whole populations?
Availability samples? Are they from two different cultures, e.g.,
French vs US? Males vs females? Two sections of the same class in a
university? Two species of bacteria? Signatures of aircraft from two
brands of radar?

If the sources are very different and you get the same cluster profiles
in about the same proportions, you have a strong case for stability.
If the sources are very similar and you get the same cluster profiles
in about the same proportions, you have only a weak case for stability
beyond that type of source.
If the sources are very different and you find substantively different
clusters, then the reason for the difference may be instability, the
source, or both.
If the sources are very similar and you find substantively different
clusters, then the reason for the difference would lean more toward
instability than the source.
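
One way to make "same cluster profiles in about the same proportions"
concrete is sketched below. It is only an illustration, not a procedure
from any particular package: it assumes each solution is summarized by
its cluster centers (profiles) and cluster sizes, that the variables are
on comparable scales, and that k is small enough to try every pairing.
The function names are made up for the example.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_clusters(centers_a, centers_b):
    """Pair cluster i of solution A with cluster perm[i] of solution B.

    Cluster numbers are arbitrary labels, so every pairing is tried and
    the one with the smallest total squared profile distance is kept.
    """
    k = len(centers_a)
    return min(
        permutations(range(k)),
        key=lambda perm: sum(
            sq_dist(centers_a[i], centers_b[perm[i]]) for i in range(k)
        ),
    )


def compare_solutions(centers_a, sizes_a, centers_b, sizes_b):
    """For each matched pair, report the profile gap and both proportions."""
    perm = match_clusters(centers_a, centers_b)
    n_a, n_b = sum(sizes_a), sum(sizes_b)
    rows = []
    for i, j in enumerate(perm):
        rows.append({
            "cluster_a": i,
            "cluster_b": j,
            "profile_gap": sq_dist(centers_a[i], centers_b[j]) ** 0.5,
            "share_a": sizes_a[i] / n_a,
            "share_b": sizes_b[j] / n_b,
        })
    return rows
```

Small profile gaps together with similar shares would correspond to the
first two cases above; how small is small enough remains a substantive
judgment.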

Are you doing hierarchical or non-hierarchical clustering?  What do you 
mean by stable?  Are you also trying to establish some validity?

Are your results similar within a set?
If your procedures run reasonably quickly, did you try different
algorithms?
If you are using k-means, did you use a number of starts?
What stopping rules did you use to decide how many clusters to use?
What is the level of measurement of your variables (attributes)?  Are 
you using attributes in the sense of dichotomous (dummy) variables?
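
As an illustration of what "a number of starts" means for k-means, here
is a minimal sketch in plain Python. It is not the algorithm of any
particular package: it assumes numeric cases given as tuples, picks k
random cases as starting centers, and keeps the run with the smallest
within-cluster sum of squares. The function names, the default of 20
starts, and the seed are choices made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run starting from k randomly chosen cases."""
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute labels so they always agree with the final centers.
    labels = [nearest(p, centers) for p in points]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wss


def kmeans_multi_start(points, k, n_starts=20, seed=0):
    """Run k-means n_starts times and keep the run with the lowest WSS."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])
```

If different starts give noticeably different solutions with nearly the
same within-cluster sum of squares, that is already a sign of
instability within a single data set.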

Does your software have provisions for applying the same model to a
different set of cases? E.g., in SPSS you can save the cluster
memberships from different solutions.

Finally, one approach would be to slip between the horns of the
dilemma. Put all cases in one file, with variables indicating which
file a case came from and which replication (half) it belongs to. Apply
models from one subset of cases to the other cases: do 4 clusterings
and apply each model to all of the cases. That gives 4 membership
variables: source 1 half 1, source 1 half 2, source 2 half 1, and
source 2 half 2. Do a 4-way crosstab.
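
The pooled design just described might be sketched as follows. This is
an illustration under assumptions, not a recipe from any package: each
of the 4 models is taken to be a list of cluster centers, "applying a
model" is taken to mean assigning each case to its nearest center, and
the model names and function names are invented for the example.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(case, centers):
    """Apply a model to one case: index of the nearest cluster center."""
    return min(range(len(centers)), key=lambda j: sq_dist(case, centers[j]))


def membership_variables(cases, models):
    """One tuple per case, holding its cluster under each model in turn.

    models maps a model name (e.g. "s1h1" for source 1, half 1) to that
    model's list of centers; dict order fixes the order within the tuple.
    """
    return [
        tuple(assign(case, centers) for centers in models.values())
        for case in cases
    ]


def four_way_crosstab(cases, models):
    """Count the cases in each cell of the 4-way table of memberships."""
    return Counter(membership_variables(cases, models))
```

Cluster numbers are arbitrary within each solution, so agreement shows
up as cases piling into a few cells of the table, not necessarily the
cells where all four numbers are equal.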

If you have continuous variables among the attributes, explore using GLM
(ignoring the tests) to do a 4-way ANOVA.
If you have variables external to the clustering on the same cases, do 
the same GLM on them.  (Here the tests would not have the usual 
interpretation but could be used as a rough measure to interpret 
differences.)

Hope this helps.

Art
[log in to unmask]
Social Research Consultants
University Park, MD  USA 
(Inside the Washington, DC beltway.)
(301) 864-5570

jessie jessie wrote:

>Hi everyone, 
>
>I have a question about the replication analysis. In
>order to carry out a replication analysis, we need to
>have two datasets first. Currently I do have two
>datasets but they are from different sources although
>the attributes(columns) are the same and the number of
>rows are similar. Since the two datasets in the
>replication analyses I read about were obtained by
>dividing a bigger dataset into two halves, I wonder if
>I can still do replication analysis using my two
>datasets for the purpose of validation (maybe after
>some statistical procedures). The expectation I have
>is that if the result is good, then I claim the
>clusters I've found are stable. Could anyone please
>give me some insightful suggestions on this? Thank you
>very much in advance! 
>
>Jessie
>
