CLASS-L Archives

September 2008


Richard Wright <[log in to unmask]>
Reply To: Classification, clustering, and phylogeny estimation
Tue, 30 Sep 2008 11:20:44 +1000
Dear Thomas,

Thanks for that.

The core problem (as I understand things) is that spurious correlation may be generated if an individual measurement that contributes to a ratio is itself included in the analysis along with the ratio. When the original measurement and the ratio are both included in a larger set of measurements, the correlational distortions involving the ratio affect the overall result. At least, this is how I understand the argument.
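
A small simulation may make the point concrete. This is my own sketch under assumed conditions, not the demonstration by Atkinson et al. and not the data from the paper under review: Height and Length are hypothetical, independent normal measurements (means and SDs invented, with Length kept well away from zero), so any correlation between Length and Height/Length can only come from Length appearing on both sides.

```python
import numpy as np


def spurious_ratio_correlation(n=10_000, seed=0):
    """Correlation between a measurement and a ratio built from it.

    Height and Length are simulated independently (hypothetical values),
    so corr(Height, Length) should be near zero; any correlation between
    Length and Height/Length is induced purely by the shared Length term.
    """
    rng = np.random.default_rng(seed)
    length = rng.normal(loc=50.0, scale=5.0, size=n)  # hypothetical units
    height = rng.normal(loc=30.0, scale=3.0, size=n)  # independent of length
    ratio = height / length

    r_raw = np.corrcoef(height, length)[0, 1]    # near 0 by construction
    r_mixed = np.corrcoef(length, ratio)[0, 1]   # induced, clearly negative
    return r_raw, r_mixed


r_raw, r_mixed = spurious_ratio_correlation()
print(f"corr(Height, Length)        = {r_raw:+.3f}")
print(f"corr(Length, Height/Length) = {r_mixed:+.3f}")
```

With these settings the second correlation comes out strongly negative (close to -0.7), even though the two raw measurements are unrelated. A "mixed" data set containing both Length and Height/Length would carry that built-in correlation into a discriminant analysis.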

An accessible demonstration of possible spurious correlation is given by Atkinson et al.

Maybe the Cauchy distribution of ratios is behind the problem. I am no mathematician.
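
For what it is worth, the Cauchy behaviour described in Thomas's message can be checked numerically. The sketch assumes the simplest textbook case, the ratio of two independent standard normal variables (zero means), which real measurements never are; the helper name iqr_of_sample_means is my own. If averaging helped, the spread of the sample mean would shrink as the sample grows.

```python
import numpy as np


def iqr_of_sample_means(sample_size, n_reps=2_000, seed=1):
    """Interquartile range of the sample mean of ratio variables.

    Each replicate draws `sample_size` ratios x/y with x, y independent
    standard normals (so each ratio is standard Cauchy), then averages them.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_reps, sample_size))
    y = rng.standard_normal((n_reps, sample_size))
    ratios = x / y                 # standard Cauchy draws
    means = ratios.mean(axis=1)    # one sample mean per replicate
    q75, q25 = np.percentile(means, [75, 25])
    return q75 - q25


for n in (10, 100, 1_000):
    print(n, round(iqr_of_sample_means(n), 2))
```

For normal data the interquartile range of the mean would shrink roughly tenfold between n = 10 and n = 1000 (it scales with 1/sqrt(n)); here it should stay near 2, the interquartile range of a single standard Cauchy draw.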

If the same divisor is used for every variable (e.g. the sum of all the measurements, as is done in morphometrics), then, as I understand it, the problem of spurious correlation is absent.
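
A sketch of the common-divisor approach, following the description of Manly's goblet example quoted below. The goblet values and the helper name proportions_of_total are invented for illustration: every measurement on a specimen is divided by the sum of that specimen's measurements, so two goblets of identical shape but different size end up with identical proportions.

```python
import numpy as np


def proportions_of_total(measurements):
    """Divide every measurement by its row (specimen) total.

    Rows are specimens (e.g. goblets), columns are measurements.
    All variables share the same divisor: the row sum.
    """
    m = np.asarray(measurements, dtype=float)
    return m / m.sum(axis=1, keepdims=True)


# Two hypothetical goblets with the same shape; the second is twice the size.
goblets = np.array([
    [10.0, 6.0, 4.0],
    [20.0, 12.0, 8.0],
])
props = proportions_of_total(goblets)
print(props)              # both rows: [0.5, 0.3, 0.2]
print(props.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Because every variable shares the divisor, absolute size drops out, and each row of proportions sums to 1.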

The fact that ratios are not normally distributed certainly affects the probabilistic part of classification by discriminant analysis. This is perhaps a separate but compounding problem in using ratios, though on its own it should not have too severe an effect on the descriptive side of discriminant analysis (i.e. canonical variates analysis).
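
To illustrate the non-normality point: when the denominator is a positive measurement well away from zero, the ratio is not Cauchy, but it is still not normal. The means and standard deviations below are hypothetical, and sample_skewness is a plain moment-based helper of my own; the only claim is that normal inputs give a visibly right-skewed ratio.

```python
import numpy as np


def sample_skewness(values):
    """Moment-based sample skewness (about 0 for a symmetric distribution)."""
    v = np.asarray(values, dtype=float)
    centred = v - v.mean()
    return np.mean(centred**3) / np.mean(centred**2) ** 1.5


rng = np.random.default_rng(2)
n = 200_000
height = rng.normal(loc=5.0, scale=0.5, size=n)    # hypothetical, CV = 10%
length = rng.normal(loc=10.0, scale=1.5, size=n)   # hypothetical, CV = 15%
ratio = height / length

print(round(sample_skewness(height), 3))  # close to 0: normal input
print(round(sample_skewness(ratio), 3))   # clearly positive: skewed ratio
```

The skew grows with the relative variability of the denominator, which is one reason normal-theory probabilities of group membership may be off when ratios enter a discriminant analysis.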

I would welcome any further comments you have on this topic.



>Subject: Re: using ratios in MV correlational analysis
>   From: Thomas Augustin <[log in to unmask]>
>   Date: Mon, 29 Sep 2008 20:49:55 +0200
>     To: [log in to unmask]
>Dear Richard,
>I am not a hundred percent certain whether this contributes to your 
>problem, but let me try nevertheless.
>One of the major problems in using ratios of variables could be the fact 
>that the ratio of two independent, zero-mean normal variables is Cauchy 
>distributed, and the Cauchy distribution is the standard counterexample 
>to many standard statistical optimality results. For instance, Cauchy 
>distributed variables do not even have an expected value, and the 
>arithmetic mean of standard Cauchy distributed variables has the same 
>distribution as a single variable, i.e. we cannot learn from the data by 
>increasing the sample size.
>Hope this comment is of some help, the more so as discriminant analysis 
>often relies on a model where variables are taken to be normally 
>distributed, so that, in my view, taking the ratio of these variables 
>could lead to such problems.
>Best wishes
>Prof. Dr. Thomas Augustin
>Department of  Statistics
>University of Munich
>Ludwigstr. 33/II
>D-80539 Munich
>Tel +49 89 2180 3520
>Fax +49 89 2180 5044
>[log in to unmask]
>Richard Wright wrote:
>> There is a scattered literature on the dangers, or otherwise, of using
>> ratios in correlational analyses. 
>> I have read what looks like a non-obfuscatory paper on this topic by
>> Firebaugh and Gibbs "User's Guide to Ratio Variables" from American
>> Sociological Review, Vol.50, No.5 (1985) pp.713-722.
>> On page 721 the authors state: "Avoid mixed methods (part ratio, part
>> component). If Z is controlled by division rather than by
>> residualization, all of the other variables should be divided by Z.
>> Should only some of the variables be divided by Z, the effect of Z is
>> 'controlled' for some variables and not for others, and a defensible
>> interpretation of the results is difficult." 
>> The reason for my interest is that I am trying to evaluate a
>> morphometric paper that does linear discriminant analysis on a mixture
>> of measurements and ratios derived from those same measurements. For
>> example, the analysis includes (A) Length as well as Height/Length and
>> (B) Height and Breadth as well as Height/Breadth and Height/Length. 
>> This paper seems to be an example of the 'mixed method' that Firebaugh
>> and Gibbs warn against, where data are part ratio, part measurement,
>> and spurious correlations are introduced into the data.
>> So my first question is whether I am correct in this interpretation.
>> My second question also concerns ratios.
>> In his Multivariate Statistical Methods, 2nd ed. 1994, B.F.J. Manly
>> suggests controlling for the effects of absolute size difference in a
>> PCA of pots (goblets) by expressing the measurements as "a proportion
>> of the sum of all measurements on that goblet."
>> Given that each variable is divided by the same sum, this example of
>> the use of ratios seems to be a case that Firebaugh and Gibbs would
>> not frown on.
>> I shall welcome any comments on these questions and any pointers to
>> relevant literature.
>> Richard
>> ----------------------------------------------
>> CLASS-L list.
>> Instructions: