CLASS-L Archives

June 2005


Options: Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Ross Clement <[log in to unmask]>
Reply To:
Classification, clustering, and phylogeny estimation
Wed, 22 Jun 2005 08:18:32 -0500
text/plain (53 lines)
Hi. This is my first post to this list.

I'm interested if anyone could recommend a paper discussing the margins
of training sets and nearest neighbour classification. In case I'm using
the wrong terminology: I mean 'margin' as in the margin between a
decision boundary and instances optimised by support vector machines and

I'm interested in a good introductory paper in discussing the effective
margins when instances are classified by nearest neighbour
classification. I've just played around with a Java program which allows
me to create a 2d data set by plotting points with a mouse. I note that
for "1 nearest neighbours" approaches the margin between the instances
and the decision boundary is maximised (as it should be using euclidean
distances and a visible "instance space"), while "3 nearest neighbours"
can easily misclassify instances even when the problem is linearly
separable. Hence maximising the margins between the decision boundary
and instances is not what is happening with 3NN.

Putting "nearest neighbor" classification and margin into a search
engine returns large numbers of papers, but I have no idea which papers
or authors would the best ones to try first.

My reason for wanting to know more about this is that as a more
"symbolic AI" type of person, I have in the past ended up creating
nearest neighbour classifiers for problems in ad-hoc domain dependent
data representations and using ad-hoc distance metrics. Not knowing how
I would apply classifiers such as support vector machines to these
problems (or even if such is possible - conversion to vectors is
frequently not feasible). I do note that the type of decision boundary
created when using nearest neighbour approaches is very different from
the linear, polynomial, etc. decision boundaries created by other
approaches and am aware of the bias-variance tradeoff. But, I'm mainly
thinking about margins in instance spaces that are not vectors where it
is not feasible to convert the instances to vectors.

Thanks in anticipation,

Ross Clement
Ross Clement (Dr)
Senior Lecturer
Department of Artificial Intelligence and Interactive Multimedia
Harrow School of Computer Science
University of Westminster
Northwick Park

[log in to unmask]
+44-20-7911-5000 ext. 4159