CLASS-L Archives

June 2002

CLASS-L@LISTS.SUNYSB.EDU

From: "Arthur J. Kendall" <[log in to unmask]>
Reply To: Classification, clustering, and phylogeny estimation
Date: Sat, 22 Jun 2002 09:07:08 -0400
In clustering, as in factor analysis, there are no exact stopping rules.
The various rules give you a ballpark number of clusters to retain. The final
decision is based on agreement among different methods, similarity
coefficients, etc., and on the interpretability of the results.
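One simple way to check agreement between two methods is to crosstab the cluster memberships each one produces. The plain-Python sketch below does that and also computes the Rand index, the share of case pairs that both partitions treat the same way (both together or both apart). The membership lists `ward` and `average` are made-up values for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def crosstab(labels_a, labels_b):
    """Count cases for each (cluster under method A, cluster under method B) cell."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same cases")
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Share of case pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical memberships for the same 8 cases from two methods.
ward = [1, 1, 1, 2, 2, 3, 3, 3]
average = [1, 1, 2, 2, 2, 3, 3, 3]

table = crosstab(ward, average)
for (a, b), n in sorted(table.items()):
    print(a, b, n)
print(f"Rand index: {rand_index(ward, average):.3f}")  # Rand index: 0.857
```

A crosstab dominated by a few large cells, or a Rand index near 1, suggests the two solutions largely agree. Only case 3 changes cluster here.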

In SPSS's QUICK CLUSTER you specify the number of clusters. It has a default
maximum number of iterations (MXITER, 10) and a convergence criterion
(CONVERGE), expressed as a proportion of the minimum distance between the
initial cluster centers, that defaults to 0. Both can be overridden on the
CRITERIA subcommand.
QUICK CLUSTER {varlist}
 . . .
 [/INITIAL=(value list)]
 [/CRITERIA=[CLUSTER({2**})] [NOINITIAL]
                     {k  }
        [MXITER({10**})][CONVERGE({0**})]]
                {n   }            {n  }
. . .
 **Default if the subcommand is omitted.
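To show how the two knobs interact, here is a minimal k-means sketch in plain Python (3.8+ for `math.dist`). It is an illustration under stated assumptions, not SPSS's algorithm: starting centers are supplied by the caller (in the spirit of /INITIAL), `max_iter` plays the role of MXITER, and `converge` plays the role of CONVERGE, so iteration stops once the largest center shift is at most `converge` times the smallest distance between the initial centers. The function name `k_means` is hypothetical.

```python
import math


def k_means(data, initial, max_iter=10, converge=0.0):
    """Lloyd-style k-means with caller-supplied starting centers.

    Stops after max_iter passes, or earlier once the largest center shift is
    <= converge * (minimum distance between the initial centers).
    Returns (labels from the last assignment step, centers, passes run).
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centers = [tuple(map(float, c)) for c in initial]
    k = len(centers)
    # Smallest gap between any two starting centers (0.0 if k == 1).
    min_gap = min(
        (math.dist(centers[i], centers[j]) for i in range(k) for j in range(i + 1, k)),
        default=0.0,
    )
    threshold = converge * min_gap
    for iteration in range(1, max_iter + 1):
        # Assignment step: each case joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= threshold:
            break
    return labels, centers, iteration


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, n_iter = k_means(data, initial=[(0, 0), (10, 10)])
print(labels, centers, n_iter)
```

With `converge=0` the loop only stops early when the centers stop moving entirely; a positive value lets it stop once the shifts become small relative to how far apart the starting centers were.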


In SPSS's CLUSTER, which does hierarchical clustering, you get several kinds
of output that help.
The SAVE subcommand saves each case's cluster membership for one level of the
tree, or for a range of levels (min,max). You can use this to crosstab the
results of different methods, etc.
The SCHEDULE specification on the PRINT subcommand prints the agglomeration
schedule, which includes a coefficient column that you can inspect for a "jump".
The DENDROGRAM on the PLOT subcommand is scaled by the distances at which
clusters are joined. This shows which cases and clusters are combined at
smaller versus larger distances.
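To make the "jump" idea concrete, here is a small pure-Python sketch. It assumes between-groups average linkage on squared Euclidean distance (the concepts named by the BAVERAGE and SEUCLID defaults), records a schedule of (stage, cluster kept, cluster absorbed, coefficient) with clusters named by their lowest 0-based case number, and treats "stop just before the largest rise in the coefficient" as the stopping heuristic. `memberships` is a rough analogue of saving one level of the tree. All function names and the data are hypothetical, and neither the computation details nor the output layout are claimed to match SPSS.

```python
from itertools import combinations


def sq_euclid(p, q):
    """Squared Euclidean distance between two cases."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def average_linkage_schedule(data):
    """Merge the closest pair of clusters at each stage, where closeness is the
    mean squared Euclidean distance over all cross-cluster case pairs."""
    clusters = {i: [i] for i in range(len(data))}
    schedule = []
    for stage in range(1, len(data)):
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(sq_euclid(data[i], data[j]) for i in clusters[a] for j in clusters[b])
            coef = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or coef < best[2]:
                best = (a, b, coef)
        a, b, coef = best
        clusters[a] += clusters.pop(b)  # a < b, so a stays the lowest case number
        schedule.append((stage, a, b, coef))
    return schedule


def memberships(n_cases, schedule, k):
    """Cluster codes 1..k for each case when the tree is cut at k clusters."""
    owner = list(range(n_cases))
    for _, a, b, _ in schedule[: n_cases - k]:
        owner = [a if o == b else o for o in owner]
    codes = {}
    return [codes.setdefault(o, len(codes) + 1) for o in owner]


def largest_jump_k(schedule, n_cases):
    """Number of clusters left just before the biggest rise in coefficient."""
    coefs = [row[3] for row in schedule]
    if len(coefs) < 2:
        raise ValueError("need at least 3 cases to look for a jump")
    jumps = [later - earlier for earlier, later in zip(coefs, coefs[1:])]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    # The big rise happens at stage i + 2, so stop after stage i + 1.
    return n_cases - (i + 1)


data = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 9), (6, 9)]
schedule = average_linkage_schedule(data)
for stage, a, b, coef in schedule:
    print(stage, a, b, coef)
k = largest_jump_k(schedule, len(data))
print("suggested clusters:", k, memberships(len(data), schedule, k))
```

For these three tight pairs the coefficients run 1, 1, 1, 100.5, 106.5, so the largest rise comes at stage 4 and the heuristic suggests 3 clusters. As the post says, this is a ballpark to be checked against other methods and interpretability.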

CLUSTER varlist
 [/MEASURE={SEUCLID**          }]
. . . 40 or so distance/similarity measures clipped
 [/METHOD={BAVERAGE**}[(rootname)] [,...]]
          {WAVERAGE  }
          {SINGLE    }
          {COMPLETE  }
          {CENTROID  }
          {MEDIAN    }
          {WARD      }
 [/SAVE=CLUSTER({level  })]  [/ID=varname]
               {min,max}
 [/PRINT=[CLUSTER({level  })] [DISTANCE]]
                  {min,max}
         [SCHEDULE**] [NONE]
 [/PLOT=[VICICLE**[(min[,max[,inc]])]]]
        [HICICLE[(min[,max[,inc]])]]
        [DENDROGRAM] [NONE]
 . . .
**Default if the subcommand is omitted.

George Feretzakis wrote:

> Hi,
>
> Does anyone know whether any software package (SPSS, S-Plus,
> Stata) includes a function to determine the optimum number of
> groups in a data set using cluster-analysis stopping rules?
>
> Thanks everyone in advance.
> Sincerely yours,
> George Feretzakis (MSc in Biostatistics)
>
