There are several different ways to explore this kind of data.
I would base the within-case vector of variables on the set of possible
responses (a set of multiple dichotomies).
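As a minimal sketch of what a "set of multiple dichotomies" could look like, the Python below turns the 1-3 codes assigned to one person's answer at one time point into a 0/1 vector with one position per code. The coding frame (`CODES`), the helper `to_dichotomies`, and the sample records are all hypothetical and only illustrate the idea; a real coding frame would come from the coders' own category list.

```python
# Hypothetical coding frame: each entry is one category a coder might assign
# to an open-ended "reason for smoking" response.
CODES = ["stress", "habit", "social", "boredom", "enjoyment"]


def to_dichotomies(coded_responses, codes=CODES):
    """Return a 0/1 vector with one entry per code (a multiple-dichotomy set).

    coded_responses: the 1-3 codes assigned to one person at one time point.
    A code assigned more than once still yields a single 1.
    """
    present = set(coded_responses)
    unknown = present - set(codes)
    if unknown:
        raise ValueError(f"codes not in coding frame: {sorted(unknown)}")
    return [1 if c in present else 0 for c in codes]


# Hypothetical records: one per person per time point (0 = baseline).
records = [
    ("p01", 0, ["stress", "habit"]),
    ("p01", 1, ["stress"]),
    ("p02", 0, ["social", "enjoyment", "habit"]),
]

for person, time_point, assigned in records:
    print(person, time_point, to_dichotomies(assigned))
```

Each printed row is one case whose variables are the dichotomies, which is the kind of input a Jaccard-based cluster analysis expects.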
To start, I would try keeping each time point as a separate case for the
cluster analyses. I would then also explore (cluster) each subset of
cases separately and cross-tabulate the results to look for consistency.
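One possible reading of this suggestion, sketched in Python under stated assumptions: every person-time row is clustered as its own case using the Jaccard coefficient and average linkage (the defaults Bob Green mentions in the quoted message), each time-point subset is then clustered on its own, and the subset labels are cross-tabulated against the pooled labels for the same rows. The data, the choice of two clusters, and the helper names `jaccard` and `average_linkage` are hypothetical; in practice a statistics package would replace this naive hand-written clustering.

```python
from collections import Counter
from itertools import combinations


def jaccard(u, v):
    """Jaccard similarity of two 0/1 vectors: shared 1s / positions where either is 1."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Assumption: two all-zero vectors are treated as identical.
    return 1.0 if either == 0 else both / either


def average_linkage(vectors, k):
    """Agglomerative clustering on Jaccard distance (1 - similarity), average linkage.

    Returns a label in 0..k-1 for each vector. Naive O(n^3), fine for small n.
    """
    clusters = [[i] for i in range(len(vectors))]
    dist = {(i, j): 1.0 - jaccard(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}

    def d(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair; a < b, so deleting b leaves index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: d(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical person-time cases: (person, time point, dichotomy vector).
cases = [
    ("p01", 0, [1, 1, 0, 0, 0]), ("p02", 0, [1, 0, 0, 0, 0]),
    ("p03", 0, [0, 0, 1, 0, 1]), ("p04", 0, [0, 0, 1, 1, 1]),
    ("p01", 1, [1, 1, 0, 0, 0]), ("p02", 1, [1, 1, 0, 0, 0]),
    ("p03", 1, [0, 0, 1, 0, 1]), ("p04", 1, [0, 0, 0, 1, 1]),
]

# 1. Pooled run: every person-time row is a separate case.
pooled = average_linkage([v for _, _, v in cases], k=2)

# 2. Separate run for each time-point subset, 3. cross-tabulated against pooled labels.
crosstab = Counter()
for t in sorted({t for _, t, _ in cases}):
    idx = [i for i, (_, time_point, _) in enumerate(cases) if time_point == t]
    sub = average_linkage([cases[i][2] for i in idx], k=2)
    for i, s in zip(idx, sub):
        crosstab[(t, pooled[i], s)] += 1

for (t, p, s), n in sorted(crosstab.items()):
    print(f"time {t}: pooled cluster {p} x subset cluster {s}: {n}")
```

Cluster numbers are arbitrary in each run, so consistency shows up as each pooled cluster mapping onto a single subset cluster in the cross-tabulation, not as matching label values.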
Hope this helps.
Social Research Consultants
University Park, MD USA
Bob Green wrote:
> I am interested in whether pooling data from the same individuals into a
> single variable, which would violate the assumption of independence of
> observations in multiple regression, is problematic in cluster analysis.
> Briefly, I have data collected at baseline and at 4 time points, asking
> whether someone smoked and the reasons why. Any individual might give 1-3
> responses, each ranging from a single word to a sentence. These
> open-ended responses have been coded by human coders. There are therefore
> 5 time periods x up to 3 responses.
> I have received advice that it is acceptable to pool these data into 1
> variable, and I have run the analysis using the cluster option in a content
> analysis software program. The results were both interpretable and made
> sense (the analysis used the default options: a similarity matrix, average
> linkage and the Jaccard coefficient). However, my reading and enquiries to
> date have not provided much substantive support for this approach. Any
> advice or references on this question would be appreciated.
> Bob Green