CLASS-L Archives

February 2008


From: Peter Flom
Reply-To: Classification, clustering, and phylogeny estimation
Date: Wed, 6 Feb 2008 08:51:22 -0800

I am starting to play around with nonlinear discriminant analysis (kernel method) for categorizing people based on their EEG data.  

Some questions:
1) I have more variables than people (at least in the training data set). What are good methods of data reduction? I do not want to use principal components or related methods, because one goal of this analysis (at least at this point) is to see which variables are important.

2) I am using SAS PROC DISCRIM, and when I use only one variable, results are more or less as expected.  But when I use multiple variables, I get warnings that 

         The ellipsoid centered at an observation in TESTDATA= data set does not 
         contain any training set observations in DATA= data set or BY group.  This     
         observation is classified into group "Other"

This applies to ALL the observations in the TESTDATA= data set. Any ideas on what is causing this? It happens even when only a few variables are used.

3)  Is there any way to assess the importance of the different variables to the nonlinear discriminant function?

Some notes:
Many of my variables are highly correlated
I have 600 variables
I have about 1,000 people total, divided into training and test data sets
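
One possible reading of the warning in question 2, offered only as a sketch: if the kernel has bounded support with a fixed radius, the ellipsoid around a test observation can easily be empty once several variables are used, because Mahalanobis distances between observations grow with the number of variables. The Python below is not a reproduction of PROC DISCRIM; the simulated standard-normal data, the radius `r = 1.0`, the sample sizes, and the helper name `count_in_ellipsoid` are all assumptions made for illustration.

```python
import numpy as np

def count_in_ellipsoid(train, test, r):
    """For each test row, count training rows within Mahalanobis distance r.

    Assumption: distances use the covariance estimated from the training
    data, as a stand-in for whatever metric the real procedure uses.
    """
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    inv = np.linalg.inv(cov)
    diff = test[:, None, :] - train[None, :, :]        # (n_test, n_train, p)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)  # squared distances
    return (d2 <= r * r).sum(axis=1)

rng = np.random.default_rng(0)
n_train, n_test, r = 500, 100, 1.0   # illustrative values, not from the post

empty_fraction = {}
for p in (1, 2, 5, 10, 20):
    # Simulated, uncorrelated data; real EEG variables would be correlated.
    train = rng.standard_normal((n_train, p))
    test = rng.standard_normal((n_test, p))
    counts = count_in_ellipsoid(train, test, r)
    # Share of test observations whose ellipsoid holds no training point.
    empty_fraction[p] = float((counts == 0).mean())
    print(f"p={p:2d}  fraction of empty ellipsoids: {empty_fraction[p]:.2f}")
```

If this is what is happening, a larger radius or a kernel with unbounded support (such as a normal kernel) would be the natural things to try, but that depends on options the post does not show.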


Peter L. Flom, PhD
Brainscope, Inc.
212 263 7863 (MTW)
212 845 4485 (Th)
917 488 7176 (F)
