CLASS-L Archives

June 2000


Ognian Asparoukhov
Reply To: Classification, clustering, and phylogeny estimation
Sat, 24 Jun 2000 09:10:32 +0300
At 11:34 23.6.2000, you wrote:
>Hi everyone,
>can someone tell me the limits on the number of variables in relation to
>sample size? Are there any good references on this topic? Thanks in advance,

A good empirical rule for discriminant analysis (classification) is:

N >= p, where

N is the number of observations (sample size);
p is the number of variables.
However, this rule applies to the two-class case.
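One concrete reason the sample size must grow with p appears when the classifier is linear discriminant analysis. That is an assumption here, since the message does not name a procedure. LDA inverts the pooled within-class covariance matrix. With two classes the rank of that matrix is at most N - 2, because each class's centred observations sum to zero, so the matrix is singular whenever N - 2 < p. The minimal sketch below checks this with exact rational arithmetic. The function names and the random integer data are illustrative assumptions.

```python
from fractions import Fraction
import random

def matrix_rank(rows):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Find a row at or below the current rank with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def centered(rows):
    """Subtract the class mean vector from every observation of one class."""
    n = len(rows)
    means = [Fraction(sum(col), n) for col in zip(*rows)]
    return [[Fraction(x) - mu for x, mu in zip(row, means)] for row in rows]

def pooled_scatter_rank(class_a, class_b):
    """Rank of the pooled within-class scatter matrix Z^T Z.

    Z stacks the class-centred rows, so rank(Z^T Z) = rank(Z) <= N - 2.
    """
    return matrix_rank(centered(class_a) + centered(class_b))

# Illustrative random integer data (hypothetical, not from the message).
rng = random.Random(0)

def sample(n, p):
    return [[rng.randint(-50, 50) for _ in range(p)] for _ in range(n)]

p = 10
small = pooled_scatter_rank(sample(4, p), sample(4, p))    # N = 8: rank <= 6 < p
large = pooled_scatter_rank(sample(20, p), sample(20, p))  # N = 40: full rank expected
print(small, large)
```

With N = 8 and p = 10 the scatter matrix cannot be inverted, whatever the data. With N = 40 it is (almost surely) of full rank p.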

In general, the minimum sample size depends on the procedure you use.
Parametric statistical procedures require a smaller N,
while nonparametric ones require a larger N.
In any case, you should use as many training observations as possible,
unless your data set is very large.

In any case, you need an unbiased estimate of
the classification accuracy (cross-validation, leave-one-out,
or a test sample) in order to assess the performance of a particular classifier.
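Leave-one-out, one of the estimates mentioned above, can be sketched as follows. Each observation is held out once, the classifier is trained on the remaining N - 1 observations, and the held-out observation is then classified. The nearest-class-mean rule used here is a hypothetical stand-in for "a particular classifier", and all function names are illustrative. Each class needs at least two observations so that it never disappears from a training fold.

```python
import math

def fit_nearest_mean(xs, ys):
    """Hypothetical classifier: store the mean vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, ys):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_nearest_mean(model, x):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(x, model[label]))

def leave_one_out_accuracy(xs, ys):
    """Hold out each observation once, train on the rest, and test on it."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit_nearest_mean(train_x, train_y)
        if predict_nearest_mean(model, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

xs = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
ys = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(xs, ys))  # well-separated toy data
```

Because every observation is tested on a model that never saw it, the estimate avoids the optimistic bias of resubstitution (testing on the training data). This costs N model fits.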

The most important questions are:
a) selection of variables (the best subset).
        Even if you have many variables and a moderate N,
        you can use a variable selection procedure
        to reduce p.
b) choice of an appropriate discriminant procedure.
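Greedy forward selection is one common variable selection procedure. The message does not say which one to use, so this is an assumption. The minimal sketch below repeatedly adds the variable that most improves the leave-one-out accuracy of a hypothetical nearest-class-mean rule. It stops when no remaining variable improves that score or when `max_vars` (an illustrative parameter) is reached.

```python
import math

def loo_nearest_mean_accuracy(xs, ys, subset):
    """Leave-one-out accuracy of a nearest-class-mean rule on the chosen columns."""
    correct = 0
    for i in range(len(xs)):
        rows_by_class = {}
        for k, (x, label) in enumerate(zip(xs, ys)):
            if k != i:  # leave observation i out of training
                rows_by_class.setdefault(label, []).append([x[j] for j in subset])
        centroids = {label: [sum(c) / len(rows) for c in zip(*rows)]
                     for label, rows in rows_by_class.items()}
        probe = [xs[i][j] for j in subset]
        guess = min(centroids, key=lambda label: math.dist(probe, centroids[label]))
        if guess == ys[i]:
            correct += 1
    return correct / len(xs)

def forward_selection(xs, ys, max_vars):
    """Greedy forward selection: add the variable that most improves the score."""
    chosen, best_score = [], 0.0
    remaining = list(range(len(xs[0])))
    while remaining and len(chosen) < max_vars:
        scored = [(loo_nearest_mean_accuracy(xs, ys, chosen + [j]), j) for j in remaining]
        score, j = max(scored)
        if score <= best_score:
            break  # no remaining variable improves the estimate
        chosen.append(j)
        remaining.remove(j)
        best_score = score
    return chosen, best_score

# Toy data: column 0 separates the classes, columns 1 and 2 are noise.
xs = [[0, 3, 7], [1, 8, 2], [0, 5, 5], [9, 4, 6], [10, 7, 3], [9, 2, 8]]
ys = [0, 0, 0, 1, 1, 1]
print(forward_selection(xs, ys, max_vars=3))
```

The score used to choose the subset is optimistically biased as an accuracy estimate for the final classifier. A separate test sample, or cross-validation wrapped around the whole selection step, is still needed for an unbiased estimate.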

Ognian Asparoukhov

Ognian Asparoukhov                        Phone:  ++(359) 2 700-528
Centre of Biomedical Engineering                  ++(359) 2 700-326
Bulgarian Academy of Sciences             Fax:    ++(359) 2 723-787
Acad. Georgi Bonchev Street, Bl. 105
1113 Sofia, BULGARIA