Thanks!  I didn't know about that.

Peter L. Flom, PhD



-----Original Message-----
From: Classification, clustering, and phylogeny estimation on behalf of Kari Torkkola
Sent: Mon 7/2/2007 11:04 AM
To: [log in to unmask]
Subject: Re: Tree software
 
Have you considered the R interface to a public domain version of Random Forests?
http://cran.r-project.org/src/contrib/Descriptions/randomForest.html
You do not need to reduce the number of covariates.
It thrives on correlated covariates.
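
For illustration, here is a minimal sketch of what that could look like with the randomForest package. The data below are simulated stand-ins sized like the problem described in this thread (1,000 subjects, 900 standardized continuous predictors, a 10-category outcome); the variable names, the seed, and the choice of ntree = 200 are assumptions made only for the example, not recommendations.

```r
# Sketch only: simulated stand-in data, not the data discussed in this thread.
library(randomForest)

set.seed(42)
n <- 1000   # subjects
p <- 900    # continuous, standardized predictors

# Hypothetical predictor matrix with made-up column names v1..v900
x <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("v", seq_len(p))))

# Hypothetical 10-category outcome; here it is pure noise
y <- factor(sample(paste0("class", 1:10), n, replace = TRUE))

# All 900 predictors go in directly, with no prior variable reduction
fit <- randomForest(x = x, y = y, ntree = 200, importance = TRUE)

# The printed summary includes the out-of-bag (OOB) error estimate
print(fit)

# Permutation importance (mean decrease in accuracy), top 10 variables
imp <- importance(fit, type = 1)
head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE], 10)
```

Because the simulated outcome is unrelated to the predictors, the out-of-bag error should come out close to chance (roughly 90% for ten equally likely classes). Running the same call on real data and comparing the OOB error against that kind of noise baseline is one rough way to see whether the forest is picking up signal.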

Regards,

- Kari Torkkola


William Shannon wrote:
> Hi Peter,
> 
> I am unaware of SPINA and am downloading party now to look into that 
> software.  I generally have used rpart (because Salford is so expensive) 
> but have never dealt with this many variables with rpart.
> 
> Do you have any way to reduce the number of covariates before 
> partitioning?  I would be concerned about the curse of dimensionality 
> with 900 variables and 1,000 data points.  It would be very easy to find 
> excellent classifiers based on noise.  Some suggest that a split data 
> set (train on one subset randomly selected from the 1,000 data points 
> and test on the remaining) overcomes this.  However, if X by chance due 
> to the curse of dimensionality discriminates well then it will 
> discriminate well in both the training and test data sets.
> 
> Can you reduce the 900 covariates by PCA or perhaps use an upfront 
> stepwise linear discriminant analysis with a high P value threshold to 
> retain the covariate (say p = .2)?  We have a paper where we proposed 
> and tested a genetic algorithm to reduce the number of variables in 
> microarray data that I can send you in a couple of weeks when I get back 
> to St. Louis.  It is being published in Sept. in the Interface Proceedings.
> 
> Good luck.
> Bill Shannon
> Washington Univ. School of Medicine, St. Louis
> 314-704-8725
> 
> */Peter Flom <[log in to unmask]>/* wrote:
> 
>     I have been getting involved with classification trees, and have
>     some questions regarding software.  My data consist of the following:
> 
>     about 1,000 subjects - likely to increase but not dramatically
> 
>     about 900 independent or predictor variables - all continuous, some
>     highly correlated, all standardized and approximately normally
>     distributed
> 
>     outcome which can be dichotomous or categorical, with up to 10 or so
>     categories.
> 
>     I have been using software from R - both Torsten Hothorn's party
>     package and Therneau and Atkinson's rpart - but these bog down when
>     the tree is not dichotomous.
> 
>     I have investigated Salford Systems' software, which is very
>     impressive, but expensive, and may be beyond our budget.
> 
>     I've looked briefly at SPINA.
> 
> 
>     I'd appreciate any advice or references to recent reviews.
> 
>     Thanks
> 
>     Peter L. Flom, PhD
>     Brainscope, Inc.
>     212 263 7863 (MTW)
>     212 845 4485 (Th)
>     917 488 7176 (F)
> 
> 
>     ---------------------------------------------- CLASS-L list.
>     Instructions:
>     http://www.classification-society.org/csna/lists.html#class-l
> 
> 
