CLASS-L Archives

October 2002

CLASS-L@LISTS.SUNYSB.EDU

Sender: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
From: "Marinucci, Max (MB Ergo)" <[log in to unmask]>
Date: Fri, 25 Oct 2002 08:51:55 +0100
Reply-To: "Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Subject:

Dear all,
I am resending my second posting below. I hope this helps.
Thanks in advance for your attention.
max



-----Original Message-----
From: Marinucci, Max (MB Ergo)
Sent: Wednesday, 23 October 2002 16:01
To: 'Classification, clustering, and phylogeny estimation'
Subject: RE: Clustering question


Dear all,
thanks for all your suggestions, which I highly appreciate and will take into
account in my work.
Some of you asked about the motivation for such a partitioning: concretely,
it concerns the cross-validation of some econometric regression models.
In other words, we want to split the data into three subsamples that should
be as homogeneous as possible in terms of the correlation patterns among
all variables (dependent variable and regressors).
We need to do this because our modelling tool uses out-of-sample
forecasting ability to suggest the best model.
I hope this helps you better understand the sense and meaning of my
query.
We have already tried random allocation of the data, but we want to
implement routines in our tool that do this in a more objective and
optimal way.
This is because the random solution almost always results in
unsatisfactory partitions when the data set is very small (e.g. 50
observations).
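
One simple way to make the random allocation less hit-and-miss can be
sketched as follows: draw many random three-way splits and keep the one
whose subsample correlation matrices are closest to the full-sample one.
This is only an illustration under assumptions that are not part of the
posting: the function names, the max-absolute-difference criterion, the
near-equal group sizes and the number of trials are all hypothetical
choices, and it is a random search, not a guaranteed optimum.

```python
import numpy as np

def correlation_gap(X, labels, k=3):
    """Largest absolute entrywise difference between any subsample
    correlation matrix and the full-sample correlation matrix.
    (Hypothetical criterion, chosen only for illustration.)"""
    R = np.corrcoef(X, rowvar=False)
    gaps = []
    for g in range(k):
        Rg = np.corrcoef(X[labels == g], rowvar=False)
        gaps.append(np.max(np.abs(Rg - R)))
    return max(gaps)

def best_random_split(X, k=3, n_trials=2000, seed=0):
    """Try n_trials random splits into k near-equal groups and return
    the labels and gap of the split with the smallest correlation gap."""
    rng = np.random.default_rng(seed)
    base = np.arange(X.shape[0]) % k  # group sizes differ by at most one
    best_labels, best_gap = None, np.inf
    for _ in range(n_trials):
        labels = rng.permutation(base)
        gap = correlation_gap(X, labels, k)
        if gap < best_gap:
            best_labels, best_gap = labels, gap
    return best_labels, best_gap
```

With X an (observations x variables) array holding the dependent variable
and the regressors, best_random_split(X) returns one group label per
observation. With 50 observations the remaining gap may still be large;
the search only picks the least bad of the splits it tried.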

I apologise if the matter is not strictly related to cluster analysis.
Maybe it can be considered an allocation-optimization problem.

Kind regards
max

Max Marinucci
PhD Candidate
Universidad Complutense Madrid



-----Original Message-----
From: shannon [mailto:[log in to unmask]]
Sent: Friday, 25 October 2002 4:17
To: [log in to unmask]
Subject: Re: classification comparison/R=R1=R2=R3


I'm not sure the problem that needed to be addressed is what we think it
is. I saw an email in the last couple of days from the original poster of
the problem and believe the issue is to generate 3 random samples (without
replacement) from a small number of observations such that each sample has
the same distribution as the original data.

I might be confusing this and maybe the original poster can resend a
description of the problem to the list.

Bill Shannon

On Thu, 24 Oct 2002, F. James Rohlf wrote:

> I agree. I don't yet see the point of why this should be done.
>
> > -----Original Message-----
> > From: Classification, clustering, and phylogeny estimation
> > [mailto:[log in to unmask]]On Behalf Of Murray Jorgensen
> > Sent: Thursday, October 24, 2002 8:21 PM
> > To: [log in to unmask]
> > Subject: Re: classification comparison/R=R1=R2=R3
> >
> >
> > At 15:36 24/10/02 +0200, Christian Hennig wrote:
> >
> > >2) On clustering with R1=R2=R3=R. k-means clustering implicitly assumes
> > >   clusters to have unit matrix correlation. So transforming the data to
> > >   unit covariance and then applying 3-means will give clusters with
> > >   approximately R1=R2=R3=R.
> >
> > R1=R2=R3, maybe but =R???
> >
> > Surely it is most unlikely that the overall correlation structure would
> > mirror the within-cluster structure? It is also hard to think why that
> > might be desirable. If it were then an obvious way to achieve it would be
> > to randomly allocate the data points to the three clusters.
> >
> > Murray Jorgensen
> >
> >
> > >   Maybe even better with a Gaussian mixture
> > >   model where covariance matrices of the clusters are restricted to cI,
> > >   where I is unit matrix and c may depend on the cluster. This again has
> > >   to be applied to data which is sphered, i.e. transformed to unit
> > >   covariance first. I hope this "covariance model" can be found in mclust,
> > >   mentioned previously in this discussion.
> > >
> > >Christian Hennig
> > >
> > >
> > >
> > >--
> > >***********************************************************************
> > >Christian Hennig
> > >Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (current)
> > >and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
> > >[log in to unmask], http://stat.ethz.ch/~hennig/
> > >[log in to unmask], http://www.math.uni-hamburg.de/home/hennig/
> > >#######################################################################
> > >ich empfehle www.boag.de
> > >
> > Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
> > Department of Statistics, University of Waikato, Hamilton, New Zealand
> > Email: [log in to unmask]                            Fax +64-7 838 4155
> > Phone  +64-7 838 4773 wk    +64 7 849 6486 home     Mobile 021 395 862
> >
>
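
Christian Hennig's suggestion quoted above (sphere the data, i.e.
transform it to unit covariance, then apply 3-means) can be sketched
roughly as below. This is a minimal illustration, not his code and not
mclust: the function names are hypothetical, the k-means is a plain
Lloyd iteration with a single random start, and the sphering assumes
the sample covariance matrix has full rank. As Murray Jorgensen points
out in the same thread, nothing here guarantees that the within-cluster
correlations will match the overall R.

```python
import numpy as np

def sphere(X):
    """Center X and transform it so its sample covariance is the
    identity matrix. Assumes the covariance matrix is full rank."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T  # symmetric cov^(-1/2)
    return Xc @ W

def kmeans(Z, k=3, n_iter=100, seed=0):
    """Plain Lloyd k-means with one random start (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Calling kmeans(sphere(X), k=3) gives one cluster label per observation;
the per-cluster correlation matrices can then be compared with R using
np.corrcoef on each subset.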


