CLASS-L Archives

February 2008


Arnaud Trollé <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Mon, 4 Feb 2008 14:26:15 -0500
Thank you all for your help.

Sorry, you're right, Art: I was too vague, and I should have begun by defining
the framework of my study.
During a listening test, 33 subjects were presented with 78 pairs of sounds (i.e.
the number of possible pairs among 13 sounds, 13 x 12 / 2 = 78). For each pair, the
subject is asked to indicate which sound he or she prefers, with three possible
answers: "first sound preferred", "second sound preferred", and "no preference".
So my data set consists of 33 cases by 78 categorical variables (each with 3 categories).
Before any other analysis, my first objective is to find out whether there exist
any subgroups of subjects with distinct patterns of preference.
So my approach is exploratory. However, if there are any subgroups (each of
a meaningful size), I expect only a small number of them. That is why I first
turned to partitioning methods such as k-modes, which I have heard of. But I
have too little experience to judge whether this method is among the best
suited to my case.
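
For reference, the k-modes idea applied to a 33 x 78 table like the one described
above can be sketched as follows. This is a minimal illustration, not a reference
implementation: the coding 0/1/2 for the three answers, the function names, the
choice k=2, and the randomly generated data are all assumptions made for the
example. Dissimilarity is the simple matching count (number of pairs on which two
subjects answer differently), and each cluster is summarised by its column-wise modes.

```python
import random
from collections import Counter
from itertools import combinations

def simple_matching(a, b):
    """Number of variables on which two cases disagree."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent category in each column (ties: first one encountered)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(data, k, n_iter=20, seed=0, init_modes=None):
    """Basic k-modes loop: assign each case to its nearest mode, then update modes."""
    rng = random.Random(seed)
    if init_modes is not None:
        modes = [tuple(m) for m in init_modes]
    else:
        # Assumption: initial modes are k cases drawn at random from the data.
        modes = [tuple(r) for r in rng.sample(data, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: simple_matching(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = column_modes(members)
    return labels, modes

# Synthetic stand-in for the listening test (NOT real data):
# 13 sounds give 78 unordered pairs; 33 subjects; answers coded
# 0 = first preferred, 1 = second preferred, 2 = no preference (assumed coding).
pairs = list(combinations(range(13), 2))
rng = random.Random(1)
data = [tuple(rng.choice((0, 1, 2)) for _ in pairs) for _ in range(33)]
labels, modes = k_modes(data, k=2)
print(Counter(labels))  # cluster sizes for this synthetic example
```

Like k-means, this procedure only reaches a local optimum that depends on the
starting modes, so with as few as 33 cases it is worth rerunning it with several
seeds and several values of k and comparing the resulting partitions.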

I hope these details help to clarify my initial question.

Best Regards,


From: Classification, clustering, and phylogeny estimation [CLASS-[log in to unmask]] on behalf of Art Kendall [[log in to unmask]]
Sent: Monday, 4 February 2008 18:30
To: [log in to unmask]
Subject: Re: About Partitioning Categorical Data ...

Please tell us more about your application. Are the values ordered? Are
you trying to find groups of variables or groups of cases (rows,
subjects, entities)?
How many cases (rows) do you have? How many variables? Do all of the
variables have 3 values? Are you trying to see how an existing partition
of cases or variables works with other cases or variables?
Often it is helpful to us to know the substantive meaning of your
variables, and what a case represents.

SPSS is widely available but there are also many specific purpose
programs around depending on what you are trying to do.
If SPSS itself does not have a procedure, you can call any R procedures
from within SPSS. So you might be able to use several procedures.
If you are partitioning variables into sets, then you might look at
Categorical Principal Components analysis (CATPCA).
If you are partitioning cases into sets, then you might look at TWOSTEP,
which clusters cases based on either/both categorical and continuous
variables.
If you have an existing 3-value variable and want to see how the cases
with each value differ on other variables, TREES, CATREG, and DISCRIMINANT
might be what you could use.
If you have three sets of variables, you can confirm how well a three
factor solution fits in CATPCA by specifying the number of factors you want.

Art Kendall
Social Research Consultants

Arnaud Trollé wrote:
> Hello,
> I'd like to cluster categorical data (3 categories) by means of a partitioning
> method; I'm quite a beginner in that field and I would need to be enlightened.
> From a literature review I carried out on this topic, it appeared to me
> that one method is often used: the k-modes method. From their own experience,
> could anyone confirm or deny that this is the case? If not, which method
> could be more "powerful"?
> Thanks in advance.
> Best Regards.
> Arnaud.
> PhD Student in Acoustics.
> Lyon, France.
> ----------------------------------------------
> CLASS-L list.
> Instructions:
