CLASS-L Archives

April 2007

CLASS-L@LISTS.SUNYSB.EDU

Subject:
From:
Yakir Gagnon <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Date:
Tue, 24 Apr 2007 19:20:10 +0200
A small remark, but I'd love to get some response on the following.
I've had to tackle multivariate data sets as well (up to 100 variables),
using PCA to reduce that number to 2 or 3 principal components. (I also noticed
the signal-to-noise ratio problem: PCA works fine on clean data and not at all
on noisy data.) My problem, though, is that PCA produced a new data set that
has neither equality of variances nor equality of covariances, which
prevents us from analysing it with MANOVA. My question, then, is how I should
transform the new data. Assuming I use a Box-Cox transform, should I apply
it to one variable at a time (across all replicates) until I reach equality
of variances (checked with Cochran's test)?
Thanks for any help you can provide.
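
The workflow asked about above (PCA down to a few components, a Box-Cox transform per component, then a check for equal variances) can be sketched roughly as follows. This is only an illustration under several assumptions that are not in the original message: the data are synthetic, the group layout (3 groups of 40 replicates) is made up, each score column is shifted to be strictly positive because Box-Cox requires positive input, and Levene's test from SciPy is swapped in for Cochran's test. The helper name `boxcox_component` is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 3 groups x 40 replicates, 100 variables,
# skewed values whose spread differs between groups.
n_groups, n_per_group, n_vars = 3, 40, 100
groups = np.repeat(np.arange(n_groups), n_per_group)
scale = np.array([1.0, 2.0, 4.0])[groups][:, None]
X = rng.lognormal(mean=0.0, sigma=0.5, size=(groups.size, n_vars)) * scale

# PCA via SVD of the column-centred data; keep the first 3 components.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :3] * s[:3]


def boxcox_component(col):
    """Shift one score column to be strictly positive, then apply Box-Cox.

    The shift is an assumption of this sketch; lambda is estimated by
    maximum likelihood from all replicates pooled together.
    """
    shifted = col - col.min() + 1.0
    transformed, lam = stats.boxcox(shifted)
    return transformed, lam


# One component at a time: test homogeneity of variance across groups
# before and after the transform (Levene's test, not Cochran's).
for k in range(scores.shape[1]):
    before = stats.levene(*(scores[groups == g, k] for g in range(n_groups)))
    transformed, lam = boxcox_component(scores[:, k])
    after = stats.levene(*(transformed[groups == g] for g in range(n_groups)))
    print(f"PC{k + 1}: lambda={lam:.2f}, "
          f"Levene p before={before.pvalue:.3g}, after={after.pvalue:.3g}")
```

Nothing here guarantees that the transformed components pass the test. A per-component transform also addresses only the variances, not the equality of covariance matrices that MANOVA additionally assumes.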


On 24/04/07, Carlos Alberto Estombelo Montesco <[log in to unmask]> wrote:
>
> Dear Peter Flom,
>
> Some comments about your email:
>
> - When you say "classify ... neurological problems", I think that is quite
> general; there are probably characteristics in the signal that can define
> which neurological problems are meant. For example, are you talking about
> spikes related to schizophrenia?
>
> - It is interesting to do a principal component analysis on the data, but
> what you obtain are orthogonal components ordered by their variances. If
> you have a good signal-to-noise ratio (SNR), I think there is no
> problem. But if you have a low SNR, you need to be careful: the variance of
> the noise can be high compared with that of the signal of interest, so you
> lose the characteristics of the real signal of interest, and when you cluster,
> the data can appear spread out, which interferes with the clustering.
>
> - When you use PCA, you obtain components that are uncorrelated but not
> necessarily independent, because uncorrelatedness does not imply independence.
>
> - If you have "the diagnosis of the people", why not choose a
> representative subset (a percentage of the set) and train a classification
> algorithm on it, then test it on another small percentage, and finally
> classify the rest of the data?
>
> Best Regards,
>
> Carlos  Estombelo-Montesco
>
>
> 2007/4/24, Peter Flom <[log in to unmask]>:
> >
> >  Hello
> >
> > (note that this is the same Peter Flom at a different address with a new
> > e-mail and a new job)
> >
> > I have a data set with about 800 people and about 1000 variables.  The
> > variables are all 'features' of EEG data that have been extracted by subject
> > matter experts in neurology as being potentially useful. All variables have
> > been standardized to mean 0, sd 1. There are many high correlations among
> > them.
> >
> > We are interested in many aspects of this data - one primary aim is to
> > use the EEG data to better classify people who have neurological problems.
> > Two methods that seem particularly relevant to this list are clustering and
> > decision trees.  I've done a bit of both, but always on data sets with FAR
> > fewer variables (e.g. about 10 variables).  Especially with regard to
> > clustering, I was thinking of doing a principal components analysis prior to
> > the cluster analysis (perhaps with SAS PRINCOMP, FACTOR, or VARCLUS).
> >
> > With regard to trees, I've done some 'basic' analysis of other data sets
> > using R's 'party' and 'rpart' packages.  With those data sets, however, the
> > main goal was explanation, and so, I did not explore bagging and boosting
> > and such.  Any pointers or introductions to that literature would be most
> > welcome (preferably at a not TOO high mathematical level - I had some
> > calculus many years ago, but am much more interested in applications than in
> > 'theorem-proof' material).
> >
> > I will be exploring this data set for quite some time, so am willing to
> > invest some effort to learn best practices, and am also willing to try a
> > variety of methods.
> >
> > Finally, as to why I am looking at both trees and clusters - partly, we
> > know the diagnosis of the people (hence trees are useful) but we also know
> > that there are difficulties with the diagnoses, and that these difficulties
> > may be amenable to exploration with sophisticated methods.
> >
> >
> > Thanks in advance
> >
> > Peter Flom
> > Brainscope, Inc.
> >
> > ---------------------------------------------- CLASS-L list.
> > Instructions:
> > http://www.classification-society.org/csna/lists.html#class-l
>
>
>
>
> --
>
> +--------------------------------------------------------------------------------------+
>   Carlos Alberto Estombelo Montesco
>   PhD. Student in Physics Applied to Medicine and Biology
>
> .......................................................................................
>   University of Sao Paulo
>   Department of Physics and Mathematics
>   School of Philosophy, Sciences and Letters of Ribeirão Preto
>   Av. Bandeirantes, 3900 CEP: 14040-901 Ribeirão Preto, SP, Brazil
>   fax  : +55 16 3602 4887
>   email: [log in to unmask]
> +--------------------------------------------------------------------------------------+
> ---------------------------------------------- CLASS-L list. Instructions:
> http://www.classification-society.org/csna/lists.html#class-l
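
The point in the quoted reply that uncorrelated components need not be independent can be illustrated with a small numerical sketch. The example is a standard textbook construction, not taken from the thread: `y` is computed directly from `x`, so the two are completely dependent, yet their linear correlation is close to zero. The sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# x is symmetric around zero; y is a deterministic function of x,
# so the two variables are fully dependent.
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x ** 2

# Linear correlation between x and y: close to 0 despite the dependence.
r_xy = np.corrcoef(x, y)[0, 1]

# Correlation after a nonlinear transform of x: close to 1,
# which exposes the dependence that r_xy misses.
r_abs = np.corrcoef(np.abs(x), y)[0, 1]

print(f"corr(x, y)   = {r_xy:.3f}")
print(f"corr(|x|, y) = {r_abs:.3f}")
```

The same caveat applies to PCA scores: they are uncorrelated by construction, but structure such as clusters or nonlinear relationships can still link them.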




-- 
Yakir L. Gagnon, PhD student
The Vision Group
Tel  +46 (046) 222 93 40
Cell +46 (073) 7536354
Fax +46 (046) 222 44 25
http://www.biol.lu.se/funkmorf/vision/staff.html

----------------------------------------------
CLASS-L list.
Instructions: http://www.classification-society.org/csna/lists.html#class-l

