## CLASS-L@LISTS.SUNYSB.EDU

Subject: Re: Clustering question
From: shannon <[log in to unmask]>
Reply To: Classification, clustering, and phylogeny estimation
Date: Wed, 23 Oct 2002 09:09:29 -0500
Content-Type: TEXT/PLAIN
Parts/Attachments: TEXT/PLAIN (178 lines)
```
Jim

I agree that the data will probably never occur in real life, but I can
imagine three multivariate normal distributions with the same covariance
matrix that if partitioned should result in R1=R2=R3=R. However, if the
data is generated by a single distribution as you suggest, then
partitioning that data will result in very different correlation matrices.

One other approach that might be taken is latent class analysis where the
partition is found such that the variables within a cluster are
independent. Then the problem would be to find the partition that produces
the same variance estimates on the diagonal of the covariance matrix for
each of the three clusters.

None of this seems realistic in terms of finding a dataset that can be
modeled by all these conditions, but who knows.

Bill

PS -- Class-l is great and we should all be sending lots and lots of stuff
to it!!!

On Wed, 23 Oct 2002, F. James Rohlf wrote:

> The requirement that R1=R2=R3=R does not sound very reasonable to me. What
> is the motivation?
>
> If the clustering partitions the space then that must have an effect on the
> covariance matrices and thus the correlation matrices will also be affected.
> For example, if the overall cloud of points is very elongated and one chops
> the distribution along the major axis one would have similar major axes of
> covariation within each cluster. However, the relative importance of the
> major axis within the clusters (in comparison to that of the total sample)
> would be much less and thus the overall correlations found within the
> clusters would be reduced.
>
> On the other hand, if the overall cloud of points is pretty much spherical,
> then linear partitions of the space will likely show higher levels of
> correlation within the clusters.
>
> -----------------------
> F. James Rohlf
> State University of New York, Stony Brook, NY 11794-5245
> www: http://life.bio.sunysb.edu/ee/rohlf
>
> > -----Original Message-----
> > From: Classification, clustering, and phylogeny estimation
> > Sent: Wednesday, October 23, 2002 7:36 AM
> > Subject: Re: Clustering question
> >
> >
> > hi,
> >
> > I agree, mixture modelling can handle your specific data.
> > For mixture modelling, try MCLUST
> > http://www.stat.washington.edu/fraley/software.html/
> >
> > There are also some good papers and reports on the site.
> >
> > regards,
> > jacco
> > -------------------------------------------
> > J.C. Noordam
> > Agrotechnological Research Institute (ATO)
> > Department Production & Control Systems
> > P.O.Box 17,6700 AA Wageningen, the Netherlands
> > http://www.ato.wageningen-ur.nl
> > tel: +31.317.475139
> > fax: +31.317.475347
> >
> >
> > > -----Original Message-----
> > > Sent: woensdag 23 oktober 2002 13:32
> > > Subject: Re: Clustering question
> > >
> > >
> > > Hi
> > >
> > > I would think this could occur only in a special case where a mixture
> > > model approach can be used. The data would need to be from three different
> > > multivariate normal distributions, each with the same covariance matrix.
> > > If you do a web search on 'mixture models' you will come up with the
> > > information you need.
> > >
> > > I don't know of and can't imagine any type of hierarchical or scaling
> > > approach that could be used.
> > >
> > >
> > > Bill
> > > ---
> > >
> > > William D. Shannon, Ph.D.
> > >
> > > Assistant Professor of Biostatistics in Medicine
> > > Division of General Medical Sciences and Biostatistics
> > >
> > > Washington University School of Medicine
> > > Campus Box 8005, 660 S. Euclid
> > > St. Louis, MO   63110
> > >
> > > Phone: 314-454-8356
> > > Fax: 314-454-5113
> > > web page: http://ilya.wustl.edu/~shannon
> > >
> > >
> > > On Wed, 23 Oct 2002, Marinucci, Max (MB Ergo) wrote:
> > >
> > > > Dear all
> > > >
> > > >
> > > > I would like to know if there is some clustering procedure which does the
> > > > following. Given a data set with n observations on k variables with
> > > > correlation matrix R (k x k), I would like to obtain 3 clusters of
> > > > approximately equal size n1=n2=n3 that satisfy the following condition.
> > > >
> > > >
> > > > The correlation matrices of the three subgroups should be as close as
> > > > possible to each other and to the pooled correlation matrix, that
> > > > is, R1=R2=R3=R
> > > >
> > > >
> > > > Do you have any suggestions or ideas on how to proceed to obtain such
> > > > partitions?
> > > >
> > > >
> > > > Thanx a lot
> > > >
> > > >
> > > > Massimiliano Marinucci
> > > >
> > > >
> > > > PhD candidate
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ====================================================
> > > > This email is confidential and intended solely for the use of the
> > > > individual or organisation to whom it is addressed. Any opinions or
> > > > advice presented are solely those of the author and do not necessarily
> > > > represent those of the Millward Brown Group of Companies. If you are
> > > > not the intended recipient of this email, you should not copy, modify,
> > > > distribute or take any action in reliance on it. If you