No matter how the performance of the model is measured (precision, recall,
MSE, correlation), we always need to measure it on the test set, not on the
training set. Performance on the training set only tells us that the model
has learned what it was supposed to learn. It is not a good indicator of
performance on unseen data. The test set can be obtained from an
independent sample or, when none is available, by resampling the existing
data with holdout techniques (cross-validation, leave-one-out).
To meaningfully compare the performance of two algorithms on a given type
of data, we need to test whether the difference in performance is
statistically significant.
We also need to compare performance against a baseline (chance or

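As a rough illustration of the points above, here is a minimal Python
sketch (standard library only). Everything in it is an assumption made for
the example: the synthetic one-feature data, the two toy classifiers, and
the helper names (make_data, one_nn, nearest_centroid, k_fold,
sign_flip_p_value). The significance test is a paired sign-flip
permutation test on the per-fold differences. It is only one possible
choice, and it is approximate here because cross-validation folds share
training data.

```python
import random
from itertools import product
from statistics import mean


def make_data(n=200, seed=0):
    """Synthetic two-class data (hypothetical): one noisy feature per example."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data


def one_nn(train):
    """1-nearest-neighbour classifier: it simply memorises the training set."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]


def nearest_centroid(train):
    """Predict the class whose mean feature value is closer to x."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0


def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


def k_fold(data, k):
    """Yield (train, test) splits; each example is held out exactly once."""
    for i in range(k):
        test = data[i::k]
        train = [ex for j, ex in enumerate(data) if j % k != i]
        yield train, test


def sign_flip_p_value(diffs):
    """Exact two-sided paired permutation test: flip the sign of each
    per-fold difference and count how often the total is at least as
    extreme as the one observed."""
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


data = make_data()
train_acc, test_a, test_b = [], [], []
for train, test in k_fold(data, 10):
    model_a, model_b = one_nn(train), nearest_centroid(train)
    train_acc.append(accuracy(model_a, train))  # measured on data seen in training
    test_a.append(accuracy(model_a, test))      # measured on held-out data
    test_b.append(accuracy(model_b, test))

diffs = [b - a for a, b in zip(test_a, test_b)]
p_value = sign_flip_p_value(diffs)

print("1-NN accuracy on training folds:", round(mean(train_acc), 3))
print("1-NN accuracy on held-out folds:", round(mean(test_a), 3))
print("centroid accuracy on held-out folds:", round(mean(test_b), 3))
print("p-value for the difference:", round(p_value, 4))
```

The 1-NN classifier is used on purpose: it reproduces its training data
perfectly, so its training accuracy says nothing about how it behaves on
the held-out folds. Setting k to the number of examples in k_fold gives
leave-one-out.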

Mitchell, Tom M. 1997. Machine Learning. New York: McGraw-Hill.

Witten, Ian H., and Eibe Frank. 2000. Data Mining: Practical Machine
Learning Tools and Techniques with Java Implementations. San Diego, CA:
Morgan Kaufmann.

----- Original Message -----
From: "Henry Bulley"
Sent: Saturday, November 27, 2004 12:28 PM
Subject: A Classification validation question

> Hello,
> I recently read that:
> "you can't validate the classification model with the data used to develop
> the model. You must use completely independent data otherwise you bias the
> results."
> Is there any resampling approach to address this issue?
> I would be grateful if any of you can point me to some good references or
> studies.
> Thanks for your help
> Henry