A lot of the interpretation depends on the nature of the differences between the sources, on the clustering software, and on the substantive topic you are working on. Are your data sets representative samples? Populations? Availability samples? From two different cultures (French vs. US, males vs. females)? Two sections of the same class at a university? Two species of bacteria? Signatures of aircraft from two brands of radar?

As a rough guide:

- If the sources are very different and you get the same cluster profiles in about the same proportions, you have a strong case for stability.
- If the sources are very similar and you get the same cluster profiles in about the same proportions, you have a weak case for stability beyond that type of source.
- If the sources are very different and you find substantively different clusters, the reason may be instability, or the source, or both.
- If the sources are very similar and you find substantively different clusters, the reason leans more toward instability than toward the source.

Some other questions to think about:

- Are you doing hierarchical or non-hierarchical clustering?
- What do you mean by "stable"?
- Are you also trying to establish some validity?
- Are your results similar within a set?
- If your procedures run reasonably quickly, did you try different algorithms?
- If you are using k-means, did you use a number of starts?
- What stopping rules did you use to decide how many clusters to keep?
- What is the level of measurement of your variables (attributes)? Are you using "attributes" in the sense of dichotomous (dummy) variables?
- Does your software have provisions for applying the same model to a different set of cases? E.g., in SPSS you can save the cluster memberships from different solutions.

Finally, one approach would be to slip between the horns of the dilemma. Put all cases in one file, with variables indicating which file a case came from and which replication half it is in, and apply models from one subset of cases to the other cases. Do four clusterings and apply each model to all of the cases. That gives four membership variables: Source 1 half 1, Source 1 half 2, Source 2 half 1, and Source 2 half 2. Do a four-way crosstab of them. If you have continuous variables among the attributes, explore using GLM (ignoring the tests) to do a four-way ANOVA. If you have variables external to the clustering on the same cases, do the same GLM on them. (Here the tests would not have the usual interpretation, but they could be used as a rough measure to interpret differences.) Rough sketches of both ideas are at the end of this message.

Hope this helps.

Art
[log in to unmask]
Social Research Consultants
University Park, MD USA
(Inside the Washington, DC beltway.)
(301) 864-5570

jessie jessie wrote:

>Hi everyone,
>
>I have a question about replication analysis. In order to carry out a
>replication analysis, we need to have two datasets first. Currently I
>do have two datasets, but they are from different sources, although
>the attributes (columns) are the same and the number of rows is
>similar. Since the two datasets in the replication analyses I read
>about were obtained by dividing a bigger dataset into two halves, I
>wonder if I can still do replication analysis using my two datasets
>for the purpose of validation (maybe after some statistical
>procedures). The expectation I have is that if the result is good,
>then I can claim the clusters I've found are stable. Could anyone
>please give me some insightful suggestions on this? Thank you very
>much in advance!
>
>Jessie
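To make the combined-file idea concrete, here is a rough sketch in Python with pandas and scikit-learn. The file names, the attribute columns, the choice of k-means, and k = 3 are all placeholders for whatever you actually have, not recommendations:

    # Rough sketch: split each source into replication halves, fit a
    # clustering on each source x half subset, apply every model to ALL
    # cases, then crosstab the four membership variables.
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    ATTRS = ["x1", "x2", "x3"]   # hypothetical attribute (column) names
    K = 3                        # hypothetical number of clusters

    # Hypothetical file names; both files have the same attribute columns.
    frames = []
    for source, path in [(1, "source1.csv"), (2, "source2.csv")]:
        df = pd.read_csv(path)
        df["source"] = source
        # Randomly assign each case in this source to half 1 or half 2.
        rng = np.random.default_rng(source)
        df["half"] = np.where(rng.random(len(df)) < 0.5, 1, 2)
        frames.append(df)

    # Put all cases in one file with source and half indicators.
    combined = pd.concat(frames, ignore_index=True)

    # Four clusterings (one per source x half), each applied to all cases,
    # giving four membership variables.
    for s in (1, 2):
        for h in (1, 2):
            subset = combined[(combined["source"] == s) & (combined["half"] == h)]
            model = KMeans(n_clusters=K, n_init=25, random_state=0).fit(subset[ATTRS])
            combined[f"memb_s{s}h{h}"] = model.predict(combined[ATTRS])

    # Four-way crosstab of the memberships (two in the rows, two in the columns).
    print(pd.crosstab(
        [combined["memb_s1h1"], combined["memb_s1h2"]],
        [combined["memb_s2h1"], combined["memb_s2h2"]],
    ))

Keep in mind that cluster numbers are arbitrary from one solution to the next, so you have to line the profiles up substantively rather than by label when you read the table.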
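And a similarly rough sketch of the GLM step, using statsmodels and the combined data frame from the sketch above. Here "x1" is just a stand-in for one of your continuous attributes, or for a variable external to the clustering measured on the same cases:

    # Rough sketch of the four-way GLM/ANOVA idea. Main effects only;
    # add interaction terms if your cell counts support them. The F
    # tests do not have their usual interpretation here; treat them as
    # rough descriptive measures of how much the four solutions differ.
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    fit = smf.ols(
        "x1 ~ C(memb_s1h1) + C(memb_s1h2) + C(memb_s2h1) + C(memb_s2h2)",
        data=combined,
    ).fit()
    print(sm.stats.anova_lm(fit, typ=2))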