CLASS-L Archives

July 2007


Peter Flom <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Mon, 2 Jul 2007 12:45:57 -0700
Thanks!  I didn't know about that.

Peter L. Flom, PhD

-----Original Message-----
From: Classification, clustering, and phylogeny estimation on behalf of Kari Torkkola
Sent: Mon 7/2/2007 11:04 AM
To: [log in to unmask]
Subject: Re: Tree software
Have you considered the R interface to a public domain version of Random Forests?
You do not need to reduce the number of covariates.
It thrives on correlated covariates.


- Kari Torkkola

William Shannon wrote:
> Hi Peter,
> I am unaware of SPINA and am downloading party now to look into that 
> software.  I generally have used rpart (because Salford is so expensive) 
> but have never dealt with this many variables with rpart.
> Do you have any way to reduce the number of covariates before
> partitioning?  I would be concerned about the curse of dimensionality 
> with 900 variables and 1,000 data points.  It would be very easy to find 
> excellent classifiers based on noise.  Some suggest that a split data 
> set (train on one subset randomly selected from the 1,000 data points 
> and test on the remaining) overcomes this.  However, if X discriminates
> well by chance, due to the curse of dimensionality, then it will
> discriminate well in both the training and test data sets.
> Can you reduce the 900 covariates by PCA, or perhaps use an upfront
> stepwise linear discriminant analysis with a high P value threshold to
> retain a covariate (say p = .2)?  We have a paper where we proposed
> and tested a genetic algorithm to reduce the number of variables in 
> microarray data that I can send you in a couple of weeks when I get back 
> to St. Louis.  It is being published in Sept. in the Interface Proceedings.
> Good luck.
> Bill Shannon
> Washington Univ. School of Medicine, St. Louis
> 314-704-8725
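
A minimal sketch of the concern raised above, assuming pure-noise data of the size mentioned in the thread (1,000 subjects, 900 standardized predictors) and a coin-flip outcome. Nothing here comes from the posters' own analyses: the function names, the single-variable "x > 0" rule, and the half/half split are illustrative assumptions, and only the Python standard library is used. The sketch picks the single best-looking predictor in two ways: once using all subjects before splitting, and once using the training half only.

```python
import random


def make_noise_data(n_subjects, n_vars, seed):
    """Standard-normal predictors and a coin-flip outcome: no real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_subjects)]
    y = [rng.randint(0, 1) for _ in range(n_subjects)]
    return X, y


def raw_accuracy(X, y, j, rows):
    """Accuracy of the rule 'predict 1 when predictor j is positive' on rows."""
    hits = sum(1 for i in rows if (X[i][j] > 0.0) == (y[i] == 1))
    return hits / len(rows)


def select_best(X, y, rows):
    """Pick the predictor (and rule direction) with the highest accuracy on rows."""
    best = (0.0, 0, False)  # (accuracy, predictor index, rule flipped?)
    for j in range(len(X[0])):
        acc = raw_accuracy(X, y, j, rows)
        flipped = acc < 0.5          # use 'predict 1 when x <= 0' instead
        score = 1.0 - acc if flipped else acc
        if score > best[0]:
            best = (score, j, flipped)
    return best


def evaluate(X, y, j, flipped, rows):
    """Accuracy on rows of a rule whose direction was fixed during selection."""
    acc = raw_accuracy(X, y, j, rows)
    return 1.0 - acc if flipped else acc


def run_demo(n_subjects=1000, n_vars=900, seed=0):
    X, y = make_noise_data(n_subjects, n_vars, seed)
    rows = list(range(n_subjects))
    # Rows are already in random order, so a split by index is a random split.
    train, test = rows[: n_subjects // 2], rows[n_subjects // 2:]

    full_acc, j_full, flip_full = select_best(X, y, rows)       # selection sees test rows
    train_acc, j_train, flip_train = select_best(X, y, train)   # selection sees train only
    return {
        "selected_on_all_full": full_acc,
        "selected_on_all_test": evaluate(X, y, j_full, flip_full, test),
        "selected_on_train_train": train_acc,
        "selected_on_train_test": evaluate(X, y, j_train, flip_train, test),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

With no signal in the data, the winning predictor still tends to score noticeably above 50% on whatever rows were used to pick it. When the test rows took part in the selection, the test-set figure inherits that optimism, which is the situation the message above warns about; when selection is confined to the training half, the test-set figure should fall back toward chance. The exact numbers depend on the seed.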
> */Peter Flom <[log in to unmask]>/* wrote:
>     I have been getting involved with classification trees, and have
>     some questions regarding software.  My data consist of the following:
>     about 1,000 subjects - likely to increase but not dramatically
>     about 900 independent or predictor variables - all continuous, some
>     highly correlated, all standardized and approximately normally
>     distributed
>     outcome which can be dichotomous or categorical, with up to 10 or so
>     categories.
>     I have been using software from R - both Torsten Hothorn's party
>     package and Therneau and Atkinson's rpart - but these bog down when
>     the tree is not dichotomous.
>     I have investigated Salford Systems' software, which is very
>     impressive, but expensive, and may be beyond our budget.
>     I've looked briefly at SPINA.
>     I'd appreciate any advice or references to recent reviews.
>     Thanks
>     Peter L. Flom, PhD
>     Brainscope, Inc.
>     212 263 7863 (MTW)
>     212 845 4485 (Th)
>     917 488 7176 (F)
>     ---------------------------------------------- CLASS-L list.
>     Instructions: