Dear all,
thanks for all your suggestions, which I highly appreciate and will take into
account in my work.
Some of you asked me about the motivation for such a partitioning: to be more
concrete, it concerns the cross-validation of some econometric regression
models.
In other words, we want to split the data into three subsamples that should
be as homogeneous as possible in terms of the correlation patterns among
all variables (dependent variable and regressors).
We need to do this because our modelling tool uses out-of-sample
forecasting ability to suggest the best model.
I hope this helps you to better understand the sense and meaning of my
query.
We have already tried random allocation of the data, but we want to
implement in our tool some routines that do this in a more objective and
optimal way.
This is because the random solution almost always results in unsatisfactory
partitions when the data set is very small (e.g. 50 observations).
I apologise if the matter is not strictly related to cluster analysis.
Maybe it can be considered an allocation-optimization problem.
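
To make the allocation-optimization idea concrete, here is a minimal sketch, not a definitive method. It assumes the criterion is the sum of squared differences between each subgroup's correlation matrix and the pooled one, and it improves a random equal-sized split by trial swaps of observations between groups. The function names (corr_matrix, discrepancy, balanced_partition), the criterion, the swap search and the simulated data are all illustrative assumptions, not something proposed in this thread.

```python
import random
from math import sqrt


def corr_matrix(rows):
    """Pearson correlation matrix of a list of observations (rows)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cen = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in cen) for b in range(k)] for a in range(k)]
    return [[cov[a][b] / sqrt(cov[a][a] * cov[b][b]) for b in range(k)]
            for a in range(k)]


def discrepancy(data, groups, pooled):
    """Assumed criterion: sum over groups of ||R_g - R||^2 (squared Frobenius)."""
    k = len(pooled)
    total = 0.0
    for idx in groups:
        r = corr_matrix([data[i] for i in idx])
        total += sum((r[a][b] - pooled[a][b]) ** 2
                     for a in range(k) for b in range(k))
    return total


def balanced_partition(data, n_groups=3, n_iter=2000, seed=0):
    """Start from a random equal-sized split, keep only swaps that improve it."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    groups = [order[g::n_groups] for g in range(n_groups)]
    pooled = corr_matrix(data)
    start = best = discrepancy(data, groups, pooled)
    for _ in range(n_iter):
        g1, g2 = rng.sample(range(n_groups), 2)
        i = rng.randrange(len(groups[g1]))
        j = rng.randrange(len(groups[g2]))
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        score = discrepancy(data, groups, pooled)
        if score < best:
            best = score
        else:
            # The swap did not help, so undo it.
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
    return groups, start, best


# Simulated data for illustration only: 50 observations on 3 variables.
demo_rng = random.Random(1)
data = []
for _ in range(50):
    x = demo_rng.gauss(0, 1)
    z = demo_rng.gauss(0, 1)
    data.append([x, 0.6 * x + 0.8 * z, demo_rng.gauss(0, 1)])

groups, start, best = balanced_partition(data)
print(sorted(len(g) for g in groups), round(start, 4), round(best, 4))
```

The first value of the criterion corresponds to the purely random split and the second to the split after the swaps, so the two can be compared directly. A greedy search like this can stop in a local optimum; several restarts with different seeds would be one simple safeguard.
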
Kind regards
max
Max Marinucci
PhD Candidate
Universidad Complutense Madrid
-----Original Message-----
From: F. James Rohlf [mailto:[log in to unmask]]
Sent: Wednesday, October 23, 2002 14:52
To: [log in to unmask]
Subject: Re: Clustering question
The requirement that R1=R2=R3=R does not sound very reasonable to me. What
is the motivation?
If the clustering partitions the space then that must have an effect on the
covariance matrices and thus the correlation matrices will also be affected.
For example, if the overall cloud of points is very elongated and one chops
the distribution along the major axis, one would have similar major axes of
covariation within each cluster. However, the relative importance of the
major axis within the clusters (in comparison to that of the total sample)
would be much less and thus the overall correlations found within the
clusters would be reduced.
On the other hand, if the overall cloud of points is pretty much spherical,
then linear partitions of the space will likely show higher levels of
correlation within the clusters.
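
Both effects can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: "chopping" is interpreted as cutting the cloud into three equal-sized slices with boundaries perpendicular to the x + y direction, the elongated cloud is bivariate normal with correlation 0.9, the spherical cloud has independent coordinates, and the helper names (pearson, slice_correlations) are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def slice_correlations(points, n_slices=3):
    """Cut the cloud into equal-sized slices along x + y; correlate within each."""
    ordered = sorted(points, key=lambda p: p[0] + p[1])
    size = len(ordered) // n_slices
    result = []
    for s in range(n_slices):
        chunk = ordered[s * size:(s + 1) * size]
        result.append(pearson([p[0] for p in chunk], [p[1] for p in chunk]))
    return result


rng = random.Random(0)
n = 3000

# Elongated cloud: unit variances, correlation 0.9, major axis along x + y.
elongated = []
for _ in range(n):
    x = rng.gauss(0, 1)
    elongated.append((x, 0.9 * x + sqrt(1 - 0.81) * rng.gauss(0, 1)))

# Spherical cloud: independent coordinates, overall correlation near zero.
spherical = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

r_elong = pearson([p[0] for p in elongated], [p[1] for p in elongated])
r_sph = pearson([p[0] for p in spherical], [p[1] for p in spherical])
within_elong = slice_correlations(elongated)
within_sph = slice_correlations(spherical)

print("elongated: overall", round(r_elong, 2),
      "within slices", [round(r, 2) for r in within_elong])
print("spherical: overall", round(r_sph, 2),
      "within slices", [round(r, 2) for r in within_sph])
```

In the elongated case the slices restrict the range along the major axis, so the within-slice correlations fall below the overall one. In the spherical case the same cut shrinks the variance along x + y but not along x - y, which induces a negative correlation inside each slice.
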
-----------------------
F. James Rohlf
State University of New York, Stony Brook, NY 11794-5245
www: http://life.bio.sunysb.edu/ee/rohlf
> -----Original Message-----
> From: Classification, clustering, and phylogeny estimation
> [mailto:[log in to unmask]]On Behalf Of Noordam Ir J.C.
> Sent: Wednesday, October 23, 2002 7:36 AM
> To: [log in to unmask]
> Subject: Re: Clustering question
>
>
> hi,
>
> I agree, mixture modelling can handle your specific data.
> for mixture modelling, try MCLUST
> http://www.stat.washington.edu/fraley/software.html/
>
> There are also some good papers and reports on the site.
>
> regards,
> jacco
> -------------------------------------------
> J.C. Noordam
> Agrotechnological Research Institute (ATO)
> Department Production & Control Systems
> P.O.Box 17,6700 AA Wageningen, the Netherlands
> http://www.ato.wageningen-ur.nl
> email : [log in to unmask]
> tel: +31.317.475139
> fax: +31.317.475347
>
>
> > -----Original Message-----
> > From: shannon [mailto:[log in to unmask]]
> > Sent: Wednesday, 23 October 2002 13:32
> > To: [log in to unmask]
> > Subject: Re: Clustering question
> >
> >
> > Hi
> >
> > I would think this could occur only in a special case where a mixture
> > model approach can be used. The data would need to be from three
> > different multivariate normal distributions, each with the same
> > covariance matrix.
> > If you do a web search on 'mixture models' you will come up with the
> > information you need.
> >
> > I don't know of and can't imagine any type of hierarchical or scaling
> > approach that could be used.
> >
> >
> > Bill
> > ---
> >
> > William D. Shannon, Ph.D.
> >
> > Assistant Professor of Biostatistics in Medicine
> > Division of General Medical Sciences and Biostatistics
> >
> > Washington University School of Medicine
> > Campus Box 8005, 660 S. Euclid
> > St. Louis, MO 63110
> >
> > Phone: 314-454-8356
> > Fax: 314-454-5113
> > e-mail: [log in to unmask]
> > web page: http://ilya.wustl.edu/~shannon
> >
> >
> > On Wed, 23 Oct 2002, Marinucci, Max (MB Ergo) wrote:
> >
> > > Dear all
> > >
> > >
> > > I would like to know if there is some clustering procedure which does the
> > > following. Given a data set with n observations on k variables with
> > > correlation matrix R (k x k), I would like to obtain 3 clusters of
> > > approximately equal size n1=n2=n3 that satisfy the following condition.
> > >
> > >
> > > The correlation matrices of the three subgroups should be as close as
> > > possible to each other and to the pooled correlation matrix, that
> > > is R1=R2=R3=R.
> > >
> > >
> > > Do you have any suggestions or ideas on how to proceed to obtain such
> > > partitions?
> > >
> > >
> > > Thanx a lot
> > >
> > >
> > > Massimiliano Marinucci
> > >
> > >
> > > Phd candidate
> > >
> > >
> > > Universidad Complutense Madrid
> > >
> > >
> > >
> > >
> > >
> > >
> > > ====================================================
> > > This email is confidential and intended solely for the use of the
> > > individual or organisation to whom it is addressed. Any opinions or
> > > advice presented are solely those of the author and do not necessarily
> > > represent those of the Millward Brown Group of Companies. If you are
> > > not the intended recipient of this email, you should not copy, modify,
> > > distribute or take any action in reliance on it. If you have received
> > > this email in error please notify the sender and delete this email
> > > from your system. Although this email has been checked for viruses
> > > and other defects, no responsibility can be accepted for any loss or
> > > damage arising from its receipt or use.
> > > ====================================================
> > >
> >
>