I have a couple of questions on (hierarchical) cluster analysis and multidimensional scaling. As part of my research, I collected data using a method called 'similarity rating' on a scale of 1 to 9. There are 30 variables (30 concepts from physics, to be exact). I want to find out how people organise these concepts. The software I am using is SPSS 11, because SPSS is the only package I know how to use and it is one of the two statistical packages available on the university computers (I think the other one is SAS). I should add that I am not very familiar with the theoretical background of these analyses, though I am trying my best to get as much information as I need. For example, I have been reading a lot lately on MDS and HCA, but I still do not know what the basic assumptions of MDS and HCA are. I need to find a good book that explains things conceptually, with little mathematical notation.
Now my real problem. The data I enter in SPSS are the subjects' ratings of the pairwise similarities among the 30 concepts. I want to know which of these analyses is appropriate for my data. I am confused by the metric/non-metric distinction. My data are non-metric, I think. Can I use HCA with non-metric data? If I can, and if HCA is appropriate, which method is best: Ward's, between-groups linkage, within-groups linkage, or something else? Since my original data are already a proximity matrix (or at least I think they are), what HCA is doing seems to be wrong: it tries to create a proximity matrix again. Is this OK? When I run the analysis as it is, it seems fine, but when I change the syntax so that it reads the original data matrix with /MATRIX IN ('filename.sav'), a totally different clustering is produced. Which one is correct? Is there a clearly written book on multivariate analysis using SPSS?
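To illustrate what I think is happening, here is a small Python sketch (not SPSS syntax). The four concepts and their ratings are made up, and I am assuming that 9 means "most similar". It compares using the ratings directly as proximities with recomputing distances between the rows of the same matrix, which is my guess at what the default run does.

```python
import math

# Hypothetical mean similarity ratings for four made-up concepts
# (assumed scale: 1 = very different, 9 = very similar). The real data has 30.
concepts = ["force", "mass", "energy", "charge"]
sim = [
    [9, 7, 4, 2],
    [7, 9, 5, 2],
    [4, 5, 9, 3],
    [2, 2, 3, 9],
]

def direct_dissimilarity(sim, top=9):
    """Use the ratings themselves as proximities: dissimilarity = top - rating."""
    n = len(sim)
    return [[0 if i == j else top - sim[i][j] for j in range(n)] for i in range(n)]

def recomputed_distance(sim):
    """Treat each row as a case measured on n variables and compute
    Euclidean distances between rows (the matrix read as raw data)."""
    n = len(sim)
    return [[math.dist(sim[i], sim[j]) for j in range(n)] for i in range(n)]

direct = direct_dissimilarity(sim)
recomputed = recomputed_distance(sim)

# Print both versions side by side for every pair of concepts.
for i in range(len(concepts)):
    for j in range(i + 1, len(concepts)):
        print(concepts[i], concepts[j], direct[i][j], round(recomputed[i][j], 2))
```

The two matrices contain different numbers for the same pairs, so it would not be surprising if clustering them gave different trees. This is only my reading of the situation, not a description of what SPSS does internally.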
For MDS, I have a similar problem. What do I need to do to get a clear picture of how people organise these 30 concepts? Because the stress value with a small number of dimensions is quite low, I have to increase the number of dimensions. By the way, the SPSS results contain a lot of stress values: normalized raw stress, Stress-I, Stress-II and S-Stress. Which of these should I use to interpret my results? Also, what are "Dispersion Accounted For (D.A.F.)" and "Tucker's Coefficient of Congruence" used for? What is the difference between Simplex and Torgerson in the initial configuration options?
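From what I have read so far, Stress-I appears to be Kruskal's formula: the square root of the summed squared differences between the fitted distances and the disparities, divided by the summed squared distances. Below is a minimal Python sketch of that formula as I understand it. The function name and the numbers are my own, and I do not know whether SPSS computes its value in exactly this way.

```python
import math

def stress_1(distances, disparities):
    """Kruskal's Stress-1 for paired lists of fitted distances and disparities.

    Assumption: both lists hold one value per pair of concepts, in the same order.
    """
    numerator = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)

# Made-up values: a perfect fit, and a fit where one pair is off.
perfect = stress_1([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
imperfect = stress_1([3.0, 4.0], [3.0, 2.0])
print(perfect, imperfect)
```

If this is right, a value of 0 means the configuration reproduces the disparities exactly, and larger values mean a worse fit.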
I know this is a lot, but as I mentioned earlier, as far as I know there isn't any book on multivariate statistics using SPSS. Many books on multivariate statistics explain things in a way that makes life more difficult. If you could help me, I would be very happy.
Thank you very much for your interest and help in advance.