"Arthur J. Kendall" <[log in to unmask]>
Reply To: Classification, clustering, and phylogeny estimation
Mon, 22 Sep 2003 10:18:30 -0400
There are several different ways to explore this kind of data.

I would build each case's vector of variables from the set of possible
responses, coding each possible response as present or absent (a set of
multiple dichotomies).
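
As a rough illustration of the multiple-dichotomies coding, here is a minimal
Python sketch. The code list and the example case are hypothetical; the real
categories would come from the coders' codebook.

```python
# Hypothetical reason codes; the real list would come from the coders.
CODES = ["stress", "habit", "social", "craving"]


def to_dichotomies(responses, codes=CODES):
    """Return a 0/1 vector with one entry per possible response code."""
    given = set(responses)
    return [1 if code in given else 0 for code in codes]


# One case = one person at one time point, with 1-3 coded responses.
case = {"person": 1, "time": 0, "responses": ["stress", "habit"]}
vector = to_dichotomies(case["responses"])
print(vector)
```

Each case then contributes one row of 0/1 values, whether the person gave
one, two, or three responses.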

To start, I would try leaving each time point as a separate case for the
cluster analyses. I would then also cluster each subset of cases
separately and crosstab the resulting cluster memberships to look for
consistency.
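
The following Python sketch shows one way this check could be set up. It
makes several assumptions: the data are made up, the subsets are taken to be
the time points, two clusters are requested, and average linkage with the
Jaccard coefficient is used only because those were the defaults mentioned
in the original question. A statistics package would normally do the
clustering; plain Python is used here to keep the example self-contained.

```python
from collections import Counter
from itertools import combinations


def jaccard_distance(a, b):
    """1 - Jaccard similarity for two 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


def average_linkage(vectors, k):
    """Merge the closest clusters until k remain; return one label per vector."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        def avg_dist(pair):
            p, q = pair
            total = sum(jaccard_distance(vectors[i], vectors[j])
                        for i in clusters[p] for j in clusters[q])
            return total / (len(clusters[p]) * len(clusters[q]))
        # combinations() yields p < q, so deleting q leaves p valid.
        p, q = min(combinations(range(len(clusters)), 2), key=avg_dist)
        clusters[p] += clusters[q]
        del clusters[q]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def crosstab(labels_a, labels_b):
    """Count how often each pair of cluster labels occurs together."""
    return Counter(zip(labels_a, labels_b))


# Made-up data: (person, time point, 0/1 vector over four response codes).
cases = [
    (1, 0, [1, 1, 0, 0]), (2, 0, [1, 0, 0, 0]), (3, 0, [0, 0, 1, 1]),
    (1, 1, [1, 1, 0, 0]), (2, 1, [0, 0, 1, 0]), (3, 1, [0, 0, 1, 1]),
]

# Run 1: every person-by-time-point record is a separate case.
pooled = average_linkage([v for _, _, v in cases], k=2)

# Run 2: cluster each time point on its own and compare with run 1.
for t in sorted({time for _, time, _ in cases}):
    idx = [i for i, (_, time, _) in enumerate(cases) if time == t]
    subset = average_linkage([cases[i][2] for i in idx], k=2)
    print("time", t, dict(crosstab([pooled[i] for i in idx], subset)))
```

Cluster numbers are arbitrary, so consistency shows up as each pooled cluster
falling mostly into a single cluster of the subset run, not as matching
labels.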

Hope this helps.

Bob Green wrote:

> I am interested in the question of whether pooling data from the same
> individuals into a single variable, which would violate the assumption of
> the independence of observations in multiple regression, is problematic in
> cluster analysis.
> Briefly, I have data collected at baseline and 4 time points asking whether
> someone smoked and the reasons why. Any individual might give 1-3
> responses, which could range from a single word to a sentence. These
> open-ended responses have been coded by coders. There are therefore 5 time
> periods x potentially 3 responses.
> I have received advice that it is acceptable to pool this data into 1
> variable and have run the analysis using the cluster option in a content
> analysis software program and the results were both interpretable and made
> sense (the analysis was performed using the default options of a similarity
> matrix, average linkage and the Jaccard coefficient). However, my
> readings and enquiries to date have not been of much assistance in
> providing substantive support for this approach. Any advice or
> references in relation to this question would be appreciated,
> regards
> Bob Green