CLASS-L Archives

March 2003

CLASS-L@LISTS.SUNYSB.EDU

Mime-Version:
1.0
Sender:
"Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Subject:
From:
Michael Fahey <[log in to unmask]>
Date:
Wed, 5 Mar 2003 20:06:09 +0100
Content-Type:
text/plain; charset="us-ascii"; format=flowed
Reply-To:
"Classification, clustering, and phylogeny estimation" <[log in to unmask]>
Hello all,

A quick question regarding the use of the Bayesian Information Criterion
(BIC) to determine the number of clusters when doing mixture-model clustering.

Consider these two analyses:

1)  I took a random sample of size N=2000 from a population.  I found,
as expected, that there was a point at which the BIC began to increase when
an additional cluster was estimated.  The BIC indicated that 4 clusters
were sufficient.

2)  I took a random sample of size N=10,000 from the same population as
above.  In this case, the BIC decreased monotonically for as many as 16
clusters.

My naive explanation for the different behaviour of the BIC is the
difference in sample size.  Is it (somewhat) analogous to the "ease" of
getting small p-values for hypothesis tests with large samples?
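
To make the sample-size intuition concrete, here is a minimal sketch. It
assumes the "smaller is better" form BIC = -2 log L + k ln N, which matches
the behaviour described above but may differ in sign or scaling from what a
given package reports. Under that form, an extra cluster with p additional
parameters lowers the BIC only if it raises the log-likelihood by more than
p ln(N) / 2 in total, i.e. by more than p ln(N) / (2N) per observation.
The value of p used here is hypothetical, not taken from the analyses above.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'smaller is better' form: -2*logL + k*ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def min_gain_per_obs(extra_params, n_obs):
    """Smallest average log-likelihood gain per observation that an
    extra cluster must deliver for the BIC to decrease."""
    return extra_params * math.log(n_obs) / (2.0 * n_obs)


# Hypothetical number of parameters added by one more cluster
# (an assumption for illustration only).
EXTRA_PARAMS = 5

for n in (2000, 10000):
    print(n, round(min_gain_per_obs(EXTRA_PARAMS, n), 5))
```

With these assumed numbers the threshold falls from roughly 0.0095 per
observation at N=2000 to roughly 0.0023 at N=10,000, so a much smaller
improvement in fit is enough to justify another cluster in the larger sample.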

Does anyone have any comments, pointers to literature, or suggestions?

Thanks in advance,


--
Michael Fahey
Unit of Human Nutrition and Cancer
IARC, 150 cours Albert-Thomas
69372, Lyon, cedex 08, France
Tel: +33-4-7273-8343
Fax: +33-4-7273-8361
