I am running a simple correspondence analysis on a contingency table
with more than 5000 rows (representing genes) and 4 columns, and then a
clustering analysis on the coordinates of all genes on the factorial
axes. The results of the correspondence analysis are quite surprising to
me: the coordinates of all rows on the 3 axes lie strictly within a
simplex that is a perfect tetrahedron. Of course, the vertices of the
tetrahedron correspond to the coordinates of the columns in the
factorial space. How could such a structure be obtained?

I first thought that this was due to a specific structure in my original
dataset, so I repeated the analysis on a randomly drawn contingency
table of the same size (each cell drawn from a uniform distribution
between 0 and 100). I obtained the same triangular structure again.
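
A minimal sketch of this random-table check, assuming Python with NumPy
rather than whatever software produced the results above. The function
name `correspondence_analysis` and the SVD-based formulation are
illustrative assumptions, not the original computation. The sketch also
checks the standard transition formula of correspondence analysis, under
which each row's principal coordinates are the average of the column
vertices (columns in standard coordinates) weighted by that row's
profile. If that holds, every row is a convex combination of the 4
column points, which may be what produces the tetrahedron.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple CA via SVD of the standardized residuals (illustrative sketch)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                  # non-trivial axes (3 for 4 columns)
    U, sv, V = U[:, :k], sv[:k], Vt.T[:, :k]
    row_principal = (U / np.sqrt(r)[:, None]) * sv   # row coordinates
    col_standard = V / np.sqrt(c)[:, None]           # column vertices
    row_profiles = P / r[:, None]         # each row rescaled to sum to 1
    return row_principal, col_standard, row_profiles


# Random table of the size described: 5000 rows, 4 columns, cells in [0, 100).
rng = np.random.default_rng(0)            # seed chosen arbitrarily
table = rng.uniform(0, 100, size=(5000, 4))
row_coords, col_vertices, profiles = correspondence_analysis(table)

# Row profiles are non-negative weights summing to 1, so if the rows equal
# profiles @ col_vertices they must lie inside the simplex of the columns.
is_barycentric = np.allclose(row_coords, profiles @ col_vertices)
print("rows are weighted averages of the column vertices:", is_barycentric)
```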

It thus seems that a simple correspondence analysis always produces such
a triangular structure in a space with a small number of dimensions, but
I have never heard of this before. I suspect that people usually do not
see it because they have either more dimensions or fewer rows in the
original table, so that the points appear to be spread more
homogeneously on the factorial planes.

Any explanation of this would be welcome!
Eric
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Eric Wajnberg
Chair of the ESF Scientific programme on
Behavioural Ecology of Insect Parasitoids (BEPAR)
Associated Professor at the UQAM
(Universite du Quebec a Montreal)
I.N.R.A.
400 Route des Chappes, BP 167,
06903 Sophia Antipolis Cedex, France
Tel: (33-0) 4.92.38.64.47
Fax: (33-0) 4.92.38.65.57
Web page: http://www.sophia.inra.fr/perso/wajnberg/
Editor-in-Chief of BioControl, Published by Springer.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~