A quick question regarding the use of the Bayesian Information Criterion
(BIC) to determine the number of clusters when doing mixture-model clustering.
Consider these two analyses:
1) I took a random sample of size N=2000 from a population. I found,
as expected, that there was a point at which BIC began to increase with the
estimation of an additional cluster. The BIC indicated that 4 clusters
were optimal.
2) I took a random sample of size N=10,000 from the same population as
above. In this case, the BIC decreased monotonically for as many as 16
clusters.
My naive explanation for the different behaviour of the BIC is the
difference in sample size. Is it (somewhat) analogous to the "ease" of
getting small p-values for hypothesis tests with large samples?
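
To make my guess concrete, here is a minimal sketch of the arithmetic, assuming the "smaller is better" form BIC = -2*logL + k*ln(N). The per-observation log-likelihood gain and the number of extra parameters per cluster are made-up illustrative values, not estimates from my data, and the function name is just for this example.

```python
import math

def bic_change(gain_per_obs, extra_params, n_obs):
    """Change in BIC (= -2*logL + k*ln(N)) from adding one cluster.

    Assumes the extra cluster raises the log-likelihood by roughly
    gain_per_obs * n_obs, i.e. the improvement in fit scales with N,
    while the penalty for the extra parameters grows only with ln(N).
    """
    fit_term = -2.0 * n_obs * gain_per_obs           # reward for better fit
    penalty_term = extra_params * math.log(n_obs)    # cost of extra parameters
    return fit_term + penalty_term

GAIN_PER_OBS = 0.004  # hypothetical average log-likelihood gain per observation
EXTRA_PARAMS = 5      # hypothetical number of parameters per added cluster

for n in (2000, 10000):
    delta = bic_change(GAIN_PER_OBS, EXTRA_PARAMS, n)
    verdict = "BIC rises (reject cluster)" if delta > 0 else "BIC falls (keep cluster)"
    print(f"N={n}: change in BIC = {delta:.2f} -> {verdict}")
```

With these made-up values the change in BIC is positive at N=2000 and negative at N=10,000, so the same small improvement in fit is rejected in the smaller sample and accepted in the larger one.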
Does anyone have any comments, pointers to literature, or suggestions?
Thanks in advance,
Unit of Human Nutrition and Cancer
IARC, 150 cours Albert-Thomas
69372, Lyon, cedex 08, France