Thanks! I didn't know about that.

Peter L. Flom, PhD

-----Original Message-----
From: Classification, clustering, and phylogeny estimation on behalf of Kari Torkkola
Sent: Mon 7/2/2007 11:04 AM
To: [log in to unmask]
Subject: Re: Tree software

Have you considered the R interface to a public domain version of Random Forests?

http://cran.r-project.org/src/contrib/Descriptions/randomForest.html

You do not need to reduce the number of covariates. It thrives on correlated covariates.

Regards,
- Kari Torkkola

William Shannon wrote:
> Hi Peter,
>
> I am unaware of SPINA and am downloading party now to look into that
> software. I generally have used rpart (because Salford is so expensive)
> but have never dealt with this many variables with rpart.
>
> Do you have any way to reduce the number of covariates before
> partitioning? I would be concerned about the curse of dimensionality
> with 900 variables and 1,000 data points. It would be very easy to find
> excellent classifiers based on noise. Some suggest that a split data
> set (train on one subset randomly selected from the 1,000 data points
> and test on the remaining) overcomes this. However, if a variable X
> discriminates well by chance, owing to the curse of dimensionality,
> then it will discriminate well in both the training and test data sets.
>
> Can you reduce the 900 covariates by PCA, or perhaps use an upfront
> stepwise linear discriminant analysis with a high P value threshold to
> retain a covariate (say p = 0.2)? We have a paper where we proposed
> and tested a genetic algorithm to reduce the number of variables in
> microarray data that I can send you in a couple of weeks when I get back
> to St. Louis. It is being published in Sept. in the Interface Proceedings.
>
> Good luck.
> Bill Shannon
> Washington Univ. School of Medicine, St. Louis
> 314-704-8725
>
> */Peter Flom <[log in to unmask]>/* wrote:
>
> I have been getting involved with classification trees, and have
> some questions regarding software.
> My data consist of the following:
>
> about 1,000 subjects - likely to increase but not dramatically
>
> about 900 independent or predictor variables - all continuous, some
> highly correlated, all standardized and approximately normally
> distributed
>
> outcome which can be dichotomous or categorical, with up to 10 or so
> categories.
>
> I have been using software from R - both Torsten Hothorn's party
> package and Therneau and Atkinson's rpart - but these bog down when
> the tree is not dichotomous.
>
> I have investigated Salford Systems' software, which is very
> impressive, but expensive, and may be beyond our budget.
>
> I've looked briefly at SPINA.
>
> I'd appreciate any advice or references to recent reviews.
>
> Thanks
>
> Peter L. Flom, PhD
> Brainscope, Inc.
> 212 263 7863 (MTW)
> 212 845 4485 (Th)
> 917 488 7176 (F)
>
> ----------------------------------------------
> CLASS-L list.
> Instructions:
> http://www.classification-society.org/csna/lists.html#class-l
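
Bill Shannon's concern about finding classifiers based on noise with 900 variables and 1,000 data points can be illustrated with a small simulation. The sketch below is not from the thread: the function names, the seed, and the use of pure-noise predictors with a random balanced two-class label are all assumptions made for illustration. It compares the largest chance correlation among the 900 noise variables with the typical (median) one.

```python
import math
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def noise_correlations(n_subjects=1000, n_vars=900, seed=0):
    """Return (largest, median) absolute correlation between a random
    balanced 0/1 label and n_vars independent standard-normal predictors.

    Hypothetical helper: every predictor is pure noise by construction,
    so any apparent association with the label arises by chance alone.
    """
    rng = random.Random(seed)
    # Balanced dichotomous outcome, assigned at random.
    labels = [0] * (n_subjects // 2) + [1] * (n_subjects - n_subjects // 2)
    rng.shuffle(labels)

    abs_r = []
    for _ in range(n_vars):
        # One standardized, normally distributed predictor unrelated to labels.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        abs_r.append(abs(pearson(x, labels)))
    return max(abs_r), statistics.median(abs_r)


if __name__ == "__main__":
    best, typical = noise_correlations()
    print(f"largest |r| among noise variables: {best:.3f}")
    print(f"median  |r| among noise variables: {typical:.3f}")
```

This covers only single-variable screening. A tree that combines many splits can fit noise more closely than any one variable does, so the gap shown here should be read as a lower bound on the problem rather than a measure of it.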