CLASS-L Archives

September 2008


"Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Mon, 29 Sep 2008 21:08:43 -0400
"F. James Rohlf" <[log in to unmask]>
Stony Brook University
The following are some comments by Richard Reyment, who has worked on
problems in this area:

"Is this useful? Generally speaking, biologists know absolutely nothing
about the geometry of the simplex, and this is also true of a great many
statisticians. For the geomathematical fraternity, however, the subject is
of great importance because it is often connected to analyses involving
large-scale economic aspects where an inappropriate analysis can waste great
sums of money.

     G. G. Simpson was among the first biologists to point out that ratios
cannot be used in correlation exercises such as indicated in the Course 1
Originally, it was Karl Pearson who in 1898 proved that ratios induce
spurious correlations. This was in relation to so-called standardised

   In recent years geomathematicians have taken the subject much further,
following the results of the statistician John Aitchison, who proved that
correlation coefficients are not defined in simplex space, that is, the space
in which percentages, frequencies, etc. lie. This follows from the fact
that such data have a constant sum, so any alteration introduced into a
matrix of compositions changes the sum in a manner that is beyond control.
This is, of course, not a problem for open-space data.
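The constant-sum effect is easy to see numerically. This is a minimal sketch that assumes hypothetical raw amounts drawn independently from a uniform distribution on [1, 10]. After each row is rescaled to sum to 1 (closure), any two parts are negatively correlated. With D exchangeable parts the population correlation after closure is -1/(D - 1), i.e. -0.5 for three parts, even though the raw amounts are uncorrelated. The function names are illustrative only.

```python
import math
import random


def corr(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def close(parts):
    """Closure: rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def closure_demo(n=20000, d=3, seed=2):
    """Correlation of the first two parts before and after closing d independent amounts."""
    rng = random.Random(seed)
    # Assumption: hypothetical raw amounts, independent and Uniform(1, 10).
    raw = [[rng.uniform(1, 10) for _ in range(d)] for _ in range(n)]
    closed = [close(row) for row in raw]
    r_raw = corr([row[0] for row in raw], [row[1] for row in raw])
    r_closed = corr([row[0] for row in closed], [row[1] for row in closed])
    return r_raw, r_closed, closed


r_raw, r_closed, closed = closure_demo()
print(f"raw amounts:   r = {r_raw:.3f}")     # close to 0
print(f"after closure: r = {r_closed:.3f}")  # close to -1/(d - 1) = -0.5
```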

   Ref. John Aitchison: The Statistical Analysis of Compositional Data;
Chapman and Hall (1986), slightly revised version reprinted in 2003.

   Hence, multivariate analyses involving compositional data must be made
using the appropriate algebra for distributions on the simplex.  
Applying the "open-space" standard version can only lead to incorrect
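One of the log-ratio transforms described by Aitchison is the centred log-ratio (clr). It replaces each part by the log of its ratio to the geometric mean of all parts. The sketch below is a minimal illustration that assumes strictly positive parts; zeros need separate treatment that is not shown here. `clr` and `aitchison_distance` are hypothetical helper names. Because only ratios enter, a composition and any rescaled copy of it (percentages versus proportions, say) map to the same clr vector, and the clr components always sum to zero.

```python
import math


def clr(parts):
    """Centred log-ratio: log of each part divided by the geometric mean of all parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("clr needs strictly positive parts (zeros need special handling)")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between the clr vectors of x and y."""
    return math.dist(clr(x), clr(y))


percentages = [20.0, 30.0, 50.0]  # hypothetical composition in percent
proportions = [0.2, 0.3, 0.5]     # the same composition as proportions
print(clr(percentages))
print(clr(proportions))           # same values: only ratios matter
print(sum(clr(percentages)))      # zero up to rounding
print(aitchison_distance(percentages, proportions))  # zero up to rounding
```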

   Since the original work was published by Aitchison, the applied
mathematicians Professors Vera Pawlowsky-Glahn and Juan José Egozcue have
raised the bar several levels by introducing the concept of a
finite-dimensional Hilbert space into the analysis of simplicial geometry.
This leads to very elegant solutions.
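The vector-space structure on the simplex can be sketched with three operations. Perturbation plays the role of addition, powering plays the role of scalar multiplication, and the Aitchison inner product supplies lengths and angles. The sketch below also computes isometric log-ratio (ilr) coordinates for one particular orthonormal (Helmert-type) basis. The basis choice and all function names are assumptions made for illustration, and other orthonormal bases are equally valid. The isometry can be checked directly, because Euclidean distances between ilr coordinates equal distances between clr vectors.

```python
import math


def close(x):
    """Closure: rescale positive parts to sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(x, y):
    """Perturbation, the simplex counterpart of vector addition."""
    return close([a * b for a, b in zip(x, y)])


def power(alpha, x):
    """Powering, the simplex counterpart of multiplication by a scalar."""
    return close([v ** alpha for v in x])


def clr(x):
    """Centred log-ratio coordinates."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def inner(x, y):
    """Aitchison inner product, evaluated through clr coordinates."""
    return sum(a * b for a, b in zip(clr(x), clr(y)))


def ilr(x):
    """ilr coordinates for one illustrative (Helmert-type) orthonormal basis:
    z_i = sqrt(i / (i + 1)) * ln(g(x_1, ..., x_i) / x_(i+1)), i = 1, ..., D - 1,
    where g is the geometric mean."""
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(x)):
        mean_log_first_i = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_log_first_i - logs[i]))
    return coords


x = [0.1, 0.3, 0.6]    # hypothetical compositions
y = [0.5, 0.2, 0.3]
neutral = [1 / 3] * 3  # neutral element of perturbation for D = 3

print(perturb(x, neutral))        # x again
print(perturb(x, power(-1, x)))   # the neutral element
print(math.dist(ilr(x), ilr(y)))  # equals the Aitchison distance on the next line
print(math.dist(clr(x), clr(y)))
```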

   An indispensable reference is the recently published volume edited by A.
Buccianti, G. Mateu-Figueras and V. Pawlowsky-Glahn


Published by the Geological Society of London, Special Publication No. 264,
2006 (212 pp.)

   Best wishes

Richard A. Reyment"

F. James Rohlf, Distinguished Professor
Ecology & Evolution, Stony Brook University

> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> [mailto:[log in to unmask]] On Behalf Of Richard Wright
> Sent: Saturday, September 27, 2008 2:05 AM
> To: [log in to unmask]
> Subject: using ratios in MV correlational analysis
> There is a scattered literature on the dangers, or otherwise, of using
> ratios in correlational analyses.
> I have read what looks like a non-obfuscatory paper on this topic by
> Firebaugh and Gibbs "User's Guide to Ratio Variables" from American
> Sociological Review, Vol.50, No.5 (1985) pp.713-722.
> On page 721 the authors state: "Avoid mixed methods (part ratio, part
> component). If Z is controlled by division rather than by
> residualization, all of the other variables should be divided by Z.
> Should only some of the variables be divided by Z, the effect of Z is
> 'controlled' for some variables and not for others, and a defensible
> interpretation of the results is difficult."
> The reason for my interest is that I am trying to evaluate a
> morphometric paper that does linear discriminant analysis on a mixture
> of measurements and ratios derived from those same measurements. For
> example, the analysis includes (A) Length as well as Height/Length and
> (B) Height and Breadth as well as Height/Breadth and Height/Length.
> This paper seems to be an example of the 'mixed method' that Firebaugh
> and Gibbs warn against, where data are part ratio, part measurement,
> and spurious correlations are introduced into the data.
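The mixed situation described here can be illustrated with a simulation. This minimal sketch assumes hypothetical Height, Breadth and Length values drawn independently from a normal distribution with mean 10 and standard deviation 1. Even though the three measurements are unrelated, Height correlates at roughly 0.7 with Height/Breadth (a ratio that contains it), and Height/Breadth correlates at roughly 0.5 with Height/Length (the two ratios share a numerator). These figures are first-order approximations for equal coefficients of variation. The helper names are made up for this example.

```python
import math
import random


def corr(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def mixed_ratio_demo(n=20000, seed=3):
    """Correlations among raw measurements and ratios built from them."""
    rng = random.Random(seed)
    # Assumption: hypothetical, mutually independent measurements ~ Normal(10, 1).
    height = [rng.gauss(10, 1) for _ in range(n)]
    breadth = [rng.gauss(10, 1) for _ in range(n)]
    length = [rng.gauss(10, 1) for _ in range(n)]
    h_over_b = [h / b for h, b in zip(height, breadth)]
    h_over_l = [h / l for h, l in zip(height, length)]
    return {
        "Height vs Breadth": corr(height, breadth),
        "Height vs Height/Breadth": corr(height, h_over_b),
        "Height/Breadth vs Height/Length": corr(h_over_b, h_over_l),
    }


results = mixed_ratio_demo()
for label, r in results.items():
    print(f"{label:32s} r = {r:.3f}")
```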
> So my first question is whether I am correct in this interpretation.
> My second question also concerns ratios.
> In his Multivariate Statistical Methods, 2nd ed. 1994, B.F.J. Manly
> suggests controlling for the effects of absolute size difference in a
> PCA of pots (goblets) by expressing the measurements as "a proportion
> of the sum of all measurements on that goblet."
> Given that each variable is divided by the same sum, this example of
> the use of ratios seems to be a case that Firebaugh and Gibbs would
> not frown on.
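Manly's adjustment is a closure. Each goblet's measurements are divided by their own total, so every variable is treated alike. However, the resulting proportions sum to 1 and are therefore compositional data in the sense of Reyment's comments. This minimal sketch uses hypothetical goblet measurements (not Manly's data) and made-up function names. A goblet and a copy scaled up by a factor of two give identical proportions. The clr transform gives the same result whether it is applied to the raw measurements or to the proportions.

```python
import math


def proportions_of_total(measurements):
    """Size adjustment by closure: each measurement divided by the sum of all of them."""
    total = sum(measurements)
    return [m / total for m in measurements]


def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical measurements on one goblet (arbitrary units, not Manly's data).
goblet = [13.0, 21.0, 23.0, 14.0, 7.0, 8.0]
bigger_goblet = [2.0 * m for m in goblet]  # same shape, twice the size

p_small = proportions_of_total(goblet)
p_big = proportions_of_total(bigger_goblet)
print(max(abs(a - b) for a, b in zip(p_small, p_big)))  # zero up to rounding
print(sum(p_small))                                      # 1 up to rounding
print(clr(goblet))
print(clr(p_small))                                      # same as the line above
```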
> I shall welcome any comments on these questions and any pointers to
> relevant literature.
> Richard
> ----------------------------------------------
> CLASS-L list.
> Instructions: http://www.classification-
