CLASS-L Archives

June 2003


From: leo horseman <[log in to unmask]>
Reply-To: Classification, clustering, and phylogeny estimation
Date: Mon, 16 Jun 2003 11:50:49 -0700

No, you do not have a metric.  You have no idea how each subject has
mentally scaled the values between a 1 and a 9.  You cannot construct a
Euclidean distance measure from these values.  Whatever you decide to do,
you may be disappointed in the results.  It is possible for a subject to
rate Concepts 1 and 2 as very similar; Concepts 2 and 3 as very similar; and
Concepts 1 and 3 as not very similar.
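
One way to see why such ratings need not behave like distances is to check them against the triangle inequality, d(a,c) <= d(a,b) + d(b,c). The following is only a minimal sketch: the function name, the data layout, and the recoding of ratings so that larger values mean "less similar" are all assumptions made for illustration, not part of the original study.

```python
from itertools import permutations

def triangle_violations(dissim):
    """Return triples (a, b, c) for which d(a, c) > d(a, b) + d(b, c).

    `dissim` maps frozenset({x, y}) to a dissimilarity for every pair
    of items (hypothetical layout, chosen for this sketch).
    """
    items = sorted(set().union(*dissim))
    violations = []
    for a, b, c in permutations(items, 3):
        if a < c:  # look at each (a, c) side once for every middle item b
            d_ac = dissim[frozenset((a, c))]
            d_ab = dissim[frozenset((a, b))]
            d_bc = dissim[frozenset((b, c))]
            if d_ac > d_ab + d_bc:
                violations.append((a, b, c))
    return violations

# Made-up ratings from one subject, recoded so that larger = less similar.
one_subject = {
    frozenset((1, 2)): 1,  # Concepts 1 and 2: very similar
    frozenset((2, 3)): 1,  # Concepts 2 and 3: very similar
    frozenset((1, 3)): 8,  # Concepts 1 and 3: not very similar
}
print(triangle_violations(one_subject))
```

Any triple reported here is a case where going "through" the middle concept is shorter than the direct rating, which a true distance would not allow.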

If you construct a frequency table for your k(k-1)/2 = 435 possible
pairings, with the values 1-9 along one dimension and the 435 pairs along
the other, and with each entry giving the number of responses at that value
for that pair, you can quickly scan it visually and find strange pairings.
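
The frequency table described above could be built along these lines. This is a sketch under assumptions: the input layout (one dict per subject, keyed by concept pair), the function names, and the idea of using the range of ratings as a rough screen for "strange" pairs are all illustrative choices, and the data shown is made up.

```python
from collections import Counter
from itertools import combinations

SCALE = range(1, 10)  # rating values 1..9

def frequency_table(responses, k):
    """Count how often each rating value was given to each concept pair.

    `responses` holds one dict per subject, mapping a pair (i, j) with
    i < j to that subject's rating. Returns {pair: Counter of ratings}.
    """
    table = {pair: Counter() for pair in combinations(range(1, k + 1), 2)}
    for subject in responses:
        for pair, rating in subject.items():
            table[pair][rating] += 1
    return table

def rating_range(counts):
    """Largest minus smallest rating actually used for one pair."""
    used = [value for value, n in counts.items() if n > 0]
    return max(used) - min(used)

def print_table(table):
    """Print one row per pair and one column per rating value."""
    print("pair    " + " ".join(f"{v:>3}" for v in SCALE))
    for (i, j), counts in table.items():
        print(f"({i:>2},{j:>2}) " + " ".join(f"{counts[v]:>3}" for v in SCALE))

# Made-up example: 3 subjects, 4 concepts, so 4*3/2 = 6 pairs.
responses = [
    {(1, 2): 9, (1, 3): 2, (1, 4): 5, (2, 3): 3, (2, 4): 5, (3, 4): 8},
    {(1, 2): 8, (1, 3): 2, (1, 4): 1, (2, 3): 3, (2, 4): 6, (3, 4): 8},
    {(1, 2): 9, (1, 3): 1, (1, 4): 9, (2, 3): 4, (2, 4): 5, (3, 4): 7},
]
table = frequency_table(responses, k=4)
print_table(table)
```

With k=30 the same function yields the 435 rows mentioned above. In the made-up data, pair (1, 4) receives ratings spread across the whole scale, which is the kind of row worth a second look.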

You may wish either to rethink your methodology or to consider some
non-Euclidean form of multidimensional scaling (see Kruskal's work).

You are in the U.K.  You should have easy access to the Clustan clustering
programs; also see the online StatSoft textbook.  I am not familiar with the
SPSS or SAS implementations.

Be careful in defining your proximities as "similarities" (9=most similar
and 1=least similar) or "dissimilarities" (9=least similar and 1=most
similar).  No, it is not O.K. to allow your computer program to construct
another similarity matrix.  Your subjects have already done so.  Your
proximity matrix should be a kxk concepts matrix, not an NxN subjects matrix
(you are clustering the concepts, not the people; the people are the
"variables" in this instance).
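
As an illustration of the kxk concepts matrix, here is a minimal sketch that pools the subjects' ratings into one symmetric dissimilarity matrix. It rests on assumptions that are not in the original question: that the ratings are similarities with 9 meaning most similar, that they are flipped to dissimilarities as (1 + 9) - rating, and that the median across subjects is an acceptable summary for ordinal ratings. The function name and data are hypothetical.

```python
from statistics import median

def concept_dissimilarity_matrix(responses, k, scale_min=1, scale_max=9):
    """Build a symmetric k x k dissimilarity matrix over concepts.

    `responses` holds one dict per subject, mapping a pair (i, j) with
    i < j (concepts numbered 1..k) to a similarity rating.
    Assumption: a high rating means "more similar", so each pooled
    rating is flipped to (scale_min + scale_max) - rating.
    """
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(i + 1, k + 1):
            ratings = [s[(i, j)] for s in responses if (i, j) in s]
            if not ratings:
                raise ValueError(f"no ratings for pair ({i}, {j})")
            # Median across subjects (an assumed choice for ordinal data).
            d = float((scale_min + scale_max) - median(ratings))
            matrix[i - 1][j - 1] = d
            matrix[j - 1][i - 1] = d
    return matrix

# Made-up example: 3 subjects rating all pairs among 4 concepts.
responses = [
    {(1, 2): 9, (1, 3): 2, (1, 4): 5, (2, 3): 3, (2, 4): 5, (3, 4): 8},
    {(1, 2): 8, (1, 3): 2, (1, 4): 1, (2, 3): 3, (2, 4): 6, (3, 4): 8},
    {(1, 2): 9, (1, 3): 1, (1, 4): 9, (2, 3): 4, (2, 4): 5, (3, 4): 7},
]
for row in concept_dissimilarity_matrix(responses, k=4):
    print(row)
```

The rows and columns are concepts, and the subjects enter only through the pooled ratings. A matrix of this shape is what a clustering or scaling routine should read directly as its proximities, rather than recomputing proximities from it.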

>From: Ufuk Yildirim <[log in to unmask]>
>Reply-To: "Classification, clustering, and phylogeny estimation"
>   <[log in to unmask]>
>To: [log in to unmask]
>Subject: Help on HCA and MDS
>Date: Tue, 10 Jun 2003 11:57:57 +0100
>Hi everyone,
>I have a couple of questions on (hierarchical) cluster analysis and
>Multidimensional scaling. As part of my research, I collected data using a
>method called 'similarity rating' on a scale of 1 to 9. There are 30
>variables (30 concepts from physics to be exact). I want to find out how
>people organise these concepts. The software I am using is SPSS 11, because
>SPSS is the only one I know how to use and one of the two statistical
>packages available in university computers (I think the other one is SAS).
>I should add that I am not very familiar with the theoretical background of
>these analyses, though trying my best to get as much information as I
>can/need. For example, I have been reading a lot lately on MDS and HCA, but
>I still do not know what the basic assumptions are for MDS and HCA. I need
>to find a good book which explains things conceptually, with little
>mathematical notation.
>Now my real problem, as I enter the data in SPSS, I use the subjects'
>ratings of the pairwise similarities for the 30 concepts. I want to know
>which of these is the appropriate statistical analysis for my analysis. I
>am confused with the metric/non-metric distinction. My data is non-metric I
>think. Can I use HCA with non-metric data? If I can, and if HCA is
>appropriate, what is the best method? Ward's? Between-groups linkage? or
>within-groups linkage? etc. Since my original data is already a proximity
>matrix (or at least I think it is), what HCA is doing seems to be wrong. It
>tries to create a proximity matrix again. Is this ok? When I run the analysis
>as it is, it seems fine, but when I change the syntax so that it uses the
>original data matrix in /MATRIX IN ('filename.sav'), a totally different
>clustering is produced. Which one is correct? Is there a clearly written
>book on multivariate analysis using SPSS?
>For MDS, I have a similar problem. What are the things I need to do to get a
>clear picture of how people organise these 30 concepts? Because stress
>value with low dimensions is quite low, I have to increase the number of
>dimensions. By the way, in SPSS results there are a lot of stress values:
>normalized raw stress, Stress-I, Stress-II and S-Stress. Which of these
>should I use to interpret my results? Also, what are "Dispersion Accounted
>For (D.A.F.)" and "Tucker's Coefficient of Congruence" used for? What is
>the difference between Simplex and Torgerson in initial configuration?
>I know this is a lot, but as I mentioned earlier there isn't any book on
>multivariate statistics using SPSS as far as I know. Many books on
>multivariate statistics explain things to make life more difficult. If you
>could help me, I would be very happy.
>Thank you very much for your interest and help in advance.
