I am starting to play around with nonlinear discriminant analysis (kernel method) for categorizing people based on their EEG data.
1) I have more variables than people (at least in the training data set). What are good methods of data reduction? I do not want to use principal components or related methods, because one goal of this analysis (at least at this point) is to see which variables are important.
2) I am using SAS PROC DISCRIM. When I use only one variable, the results are more or less as expected, but when I use multiple variables, I get this warning:
The ellipsoid centered at an observation in TESTDATA= data set does not
contain any training set observations in DATA= data set or BY group. This
observation is classified into group "Other"
This applies to ALL the observations in the TESTDATA= data set, and it happens even when only a few variables are used. Any ideas on what is causing this?
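To make the condition in that warning concrete, here is a minimal sketch in Python (not SAS) of how I read it: a test observation is left unclassified when no training observation falls inside an ellipsoid of fixed radius around it. Everything in it is an assumption for illustration only: the helper name count_in_ellipsoid, the radius r = 1.0, the use of the overall training covariance for the Mahalanobis distance, and the synthetic standard-normal data. It is not PROC DISCRIM's actual computation and it does not use my EEG data.

```python
import numpy as np

def count_in_ellipsoid(train, test, r):
    """For each test row, count the training rows whose Mahalanobis
    distance to it is at most r (hypothetical helper; covariance is
    estimated from the training rows as a stand-in)."""
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    inv = np.linalg.pinv(cov)
    # diff[i, j, :] = test[i] - train[j]
    diff = test[:, None, :] - train[None, :, :]
    # squared Mahalanobis distance for every (test, train) pair
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)
    return (d2 <= r * r).sum(axis=1)

rng = np.random.default_rng(0)
r = 1.0          # assumed fixed kernel radius
empty = {}       # number of variables -> test rows with an empty ellipsoid
for p in (1, 5, 20):
    train = rng.standard_normal((500, p))
    test = rng.standard_normal((100, p))
    counts = count_in_ellipsoid(train, test, r)
    empty[p] = int((counts == 0).sum())
    print(f"{p} variables: {empty[p]} of {len(test)} test rows have no training neighbor")
```

In this synthetic setup the number of empty ellipsoids grows with the number of variables when the radius is held fixed. Whether that is what is happening with my data is exactly what I am unsure about.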
3) Is there any way to assess the importance of the different variables to the nonlinear discriminant function?
Many of my variables are highly correlated.
I have 600 variables.
I have about 1,000 people in total, divided into training and test data sets.
Peter L. Flom, PhD