Abstract
A more sophisticated approach, k-nearest neighbor (kNN) classification [10, 11, 21], finds a group of k objects in the training set that are closest to the test object, and bases the assignment of a label on the predominance of a particular class in this neighborhood. This addresses the issue that, in many data sets, it is unlikely that one object will exactly match another, as well as the fact that conflicting information about the class of an object may be provided by the objects closest to it. There are several key elements of this approach: (i) the set of labeled objects to be used for evaluating a test object’s class, (ii) a distance or similarity metric that can be used to compute the closeness of objects, (iii) the value of k, the number of nearest neighbors, and (iv) the method used to determine the class of the target object based on the classes and distances of the k nearest neighbors. In its simplest form, kNN assigns an object the class of its nearest neighbor or of the majority of its nearest neighbors, but a variety of enhancements are possible and are discussed below.
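The simplest majority-vote form described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function name `knn_classify`, the toy data, and the choice of Euclidean distance are assumptions for the example.

```python
from collections import Counter
import math

def knn_classify(train, test_point, k=3):
    """Classify test_point by majority vote among its k nearest
    labeled neighbors in train, a list of (features, label) pairs."""
    # Element (ii): Euclidean distance as the closeness metric.
    dists = sorted(
        (math.dist(features, test_point), label) for features, label in train
    )
    # Elements (iii) and (iv): take the k closest labels and vote.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training set (element (i)).
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))  # two of the three nearest are "A"
```

Note that ties in the vote (and the choice of k itself) must be handled by policy; the enhancements discussed in the chapter address such refinements.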
| Original language | English (US) |
|---|---|
| Title of host publication | The Top Ten Algorithms in Data Mining |
| Publisher | CRC Press |
| Pages | 151-161 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781420089653 |
| ISBN (Print) | 9781420089646 |
| DOIs | |
| State | Published - Jan 1 2009 |
Bibliographical note
Publisher Copyright: © 2009 by Taylor and Francis Group, LLC.