Slide 14.3: How does the K-nearest neighbor (kNN) algorithm work?

How Does the kNN Algorithm Work?

The K-nearest neighbor algorithm is based on the minimum distance from the query instance to the training samples: we compute the distance from the query instance to every training sample and take the K samples with the smallest distances as the K nearest neighbors. After we gather the K nearest neighbors, we take a simple majority vote among them as the prediction for the query instance.
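The two steps above (find the K closest training samples, then take a majority vote) can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function name `knn_predict`, the toy data, and the labels "Good" and "Bad" are all hypothetical, and ties in the vote are resolved by whichever label was seen first, which is an assumption rather than part of the method described here.

```python
import math
from collections import Counter

def knn_predict(training, query, k):
    """Predict the label of `query` from its k nearest training samples.

    training: list of (features, label) pairs, features being a tuple of numbers
    query:    tuple of numbers with the same length as each features tuple
    """
    # Step 1: order the training samples by Euclidean distance to the query.
    by_distance = sorted(training, key=lambda sample: math.dist(sample[0], query))
    # Step 2: keep the labels of the k closest samples.
    nearest_labels = [label for _, label in by_distance[:k]]
    # Step 3: simple majority vote (ties go to the label encountered first).
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data, invented for illustration only.
training = [
    ((1.0, 1.0), "Bad"),
    ((1.5, 1.5), "Bad"),
    ((3.0, 4.0), "Good"),
    ((3.0, 3.5), "Good"),
    ((4.0, 4.0), "Good"),
]

prediction = knn_predict(training, (3.0, 3.0), k=3)
print(prediction)  # Good
```

With a binary Y, choosing an odd K avoids tied votes altogether.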

The data for the kNN algorithm consist of several multivariate attributes X_i that are used to classify Y. The attributes can be on any measurement scale, from nominal and ordinal to quantitative, but for the moment let us deal only with quantitative X_i and a binary (nominal) Y. The last row is the query instance that we want to predict. The graph of this problem is shown below. Suppose we choose K=8 (that is, we use the 8 nearest neighbors) as the parameter of this algorithm.

Because we use only quantitative X_i, we can use the Euclidean distance. Suppose the query instance has coordinates (x₁^q, x₂^q) and a training sample has coordinates (x₁^t, x₂^t); then the squared Euclidean distance is d_tq² = (x₁^t-x₁^q)²+(x₂^t-x₂^q)². If you have more than 2 variables, you can use the general Euclidean distance formula:

d_ij = (Σ_k(x_ik-x_jk)²)^½

For example, the distance between objects A=(1,1) and B=(1.5,1.5) is computed as

d_AB = ((1-1.5)²+(1-1.5)²)^½ = 0.5^½ = 0.7071

As another example, the distance between objects D=(3,4) and F=(3,3.5) is computed as

d_DF = ((3-3)²+(4-3.5)²)^½ = 0.25^½ = 0.5
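The two worked distances can be checked with a short Python sketch of the general formula d_ij = (Σ_k(x_ik-x_jk)²)^½. The helper name `euclidean_distance` is an illustrative choice, not something defined in this tutorial; it accepts points with any number of coordinates, as long as both points have the same number.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared coordinate differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

A, B = (1, 1), (1.5, 1.5)
D, F = (3, 4), (3, 3.5)

print(round(euclidean_distance(A, B), 4))  # 0.7071
print(round(euclidean_distance(D, F), 4))  # 0.5
```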