A good candidate for such an exercise is the "World Population Data Sheet"
available from the Population Reference Bureau.  It covers all the countries
of the world with basic demographic info (% over 65, % under 19, crude birth
rate, crude death rate, GNP per capita, etc.), and some items are missing
for some countries.  There are enough countries, enough data, and enough
variation to make the exercise interesting.

Unfortunately, there's no "agreed upon" classification. However, in my
analyses, I typically end up with three clusters: older developed nations
with low crude birth rate (CBR), low crude death rate (CDR), and high per
capita GNP; developing nations with modest CBR and CDR, young populations,
and medium per capita GNP; and developing nations with high CBR and CDR,
very young populations, and low per capita GNP.
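
For anyone who wants to try this kind of exercise, here is a minimal sketch
of one way to get three clusters out of data shaped like this. It is not
necessarily the method used in the analyses above: the choice of k-means,
the mean imputation for missing items, and the seed rows are all
assumptions, and the six rows are made-up toy values, not real Population
Reference Bureau figures.

```python
import math

# Made-up toy rows for illustration only, NOT real data sheet figures.
# Columns: crude birth rate, crude death rate, % over 65, GNP per capita.
# None marks a missing item. Row names "A".."F" are hypothetical.
ROWS = {
    "A": [11.0, 10.0, 15.0, 25000.0],
    "B": [12.0, None, 14.0, 22000.0],
    "C": [25.0, 7.0, 5.0, 3000.0],
    "D": [27.0, 8.0, None, 3500.0],
    "E": [45.0, 18.0, 3.0, 400.0],
    "F": [43.0, 17.0, 2.5, None],
}


def impute_and_standardize(rows):
    """Fill missing items with the column mean, then z-score each column."""
    names = list(rows)
    scaled_cols = []
    for col in zip(*rows.values()):
        seen = [v for v in col if v is not None]
        mean = sum(seen) / len(seen)
        filled = [mean if v is None else v for v in col]
        var = sum((v - mean) ** 2 for v in filled) / len(filled)
        sd = math.sqrt(var) or 1.0  # guard against a constant column
        scaled_cols.append([(v - mean) / sd for v in filled])
    return {n: list(vals) for n, vals in zip(names, zip(*scaled_cols))}


def kmeans(points, seeds, iterations=20):
    """Plain k-means, started from the named seed rows (deterministic)."""
    centers = [list(points[s]) for s in seeds]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest center.
        for name, p in points.items():
            labels[name] = min(
                range(len(centers)), key=lambda k: math.dist(p, centers[k])
            )
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [points[n] for n in points if labels[n] == k]
            if members:
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


points = impute_and_standardize(ROWS)
labels = kmeans(points, seeds=["A", "C", "E"])
print(labels)
```

Mean imputation is the crudest way to handle the missing items and tends to
pull incomplete rows toward the middle of the data, so it is worth comparing
against a method that models the missing values directly.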

I may have an old 1987 data sheet entered in Stata format, if I can find it
in my old files.  If you are interested in that year, feel free to contact me
directly.

Dean H. Judson, Ph.D., Mathematical Statistician and Group Leader

Administrative Records Evaluation and Linkage Group

Planning, Research and Evaluation Division
U.S. Bureau of the Census
Washington, DC 20233

Phone: 301-457-4222
Fax: 301-457-6864
email: [log in to unmask]





From:     Lyn Hunt <[log in to unmask]NZ>
Sent by:  "Classification, clustering, and phylogeny estimation"
          <[log in to unmask]unysb.edu>
To:       [log in to unmask]
cc:
Subject:  Data set required please
Date:     03/20/2002 06:54 PM
Please respond to "Classification, clustering, and phylogeny estimation"






Would somebody be able to recommend a data set that I could use that meets
the following criteria?

It needs to be suitable for clustering and contain both categorical and
continuous variables, with missing values in some variables. It would also
be good if there were a commonly agreed classification, as with Fisher's
Iris data.

Thanks.



--------------------------------------------------------------------------------------------------

  Lyn Hunt                       Phone:  64 7 838 4466 ext 8338
  Department of Statistics,      Fax:      64 7 838 4155
  University of Waikato,
  Private Bag 3105,
  Hamilton, New Zealand.
---------------------------------------------------------------------------------------------------