- 1997-03
- Selection of Distance Metrics and Feature Subsets for k-Nearest Neighbor Classifiers
- Allen L. Barker, May 1997
- Advisor: Worthy N. Martin and Donald E. Brown
- Online Formats: PostScript, PDF
Abstract:
The k-nearest neighbor (kNN) classifier is a popular and
effective method for associating a feature vector with
a unique element in a known, finite set of classes. A common
choice for the distance metric used in kNN classification
is the quadratic distance Q(x,A,y) = (x-y)' A (x-y), where x
and y are n-vectors of features, A is a symmetric n by n matrix,
and prime denotes transpose. For finite sets of training samples
the choice of matrix A is important in optimizing classifier
performance. We show that A can be approximately optimized via
gradient descent on a sigmoidally smoothed estimate of the classifier's
probability of error. We describe an algorithm for performing the
metric selection, and compare the performance of our method with that
of other methods. We demonstrate that adding noise during the
descent process can reduce the effects of overfitting. We further
suggest how feature subset selection can be treated as a special
case of this metric selection.