The data for the kNN algorithm consist of several multivariate attributes, named $X_i$, that are used to classify $Y$.
The data can be on any measurement scale (ordinal, nominal, or quantitative), but for now let us deal only with quantitative $X_i$ and a binary (nominal) $Y$.
The last row of the table is the query instance whose class we want to predict.
The graph of this problem is shown below.
Suppose we set K = 8 (that is, we use the 8 nearest neighbors) as the parameter of this algorithm.
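Once the distance from the query instance to every training sample is known, K only controls how many of the closest samples get a vote. The following is a minimal sketch, assuming the distances have already been computed and are passed in as `(distance, label)` pairs; the function name `knn_vote` is hypothetical and not part of the original tutorial.

```python
from collections import Counter

def knn_vote(dist_label_pairs, k=8):
    """Return the majority label among the k pairs with the smallest distance.

    dist_label_pairs: list of (distance, label) tuples, one per training sample,
    where distance is measured from the query instance (hypothetical input format).
    """
    if k < 1 or k > len(dist_label_pairs):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by distance to the query and keep the k nearest.
    nearest = sorted(dist_label_pairs, key=lambda pair: pair[0])[:k]
    # Count how many of the k nearest neighbors belong to each class.
    counts = Counter(label for _, label in nearest)
    # most_common(1) returns [(label, count)] for the most frequent label.
    return counts.most_common(1)[0][0]
```

This sketch does not treat ties specially: with K = 8 and a binary Y, a 4 to 4 split is possible, so a practical implementation should decide on an explicit tie-breaking rule (or choose an odd K).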
Because all $X_i$ are quantitative, we can use the Euclidean distance.
Suppose the query instance has coordinates $(x_{1q}, x_{2q})$ and a training sample has coordinates $(x_{1t}, x_{2t})$; then the squared Euclidean distance is $d_{tq}^2 = (x_{1t} - x_{1q})^2 + (x_{2t} - x_{2q})^2$.
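The two-variable formula translates directly into code. Below is a small sketch; the function name `squared_distance_2d` and the sample points are hypothetical. Because the square root is monotonically increasing, ranking training samples by squared distance gives the same nearest neighbors as ranking by the distance itself, so the square root can be skipped when only the ordering matters.

```python
def squared_distance_2d(query, train):
    """Squared Euclidean distance between query (x1q, x2q) and training sample (x1t, x2t)."""
    x1q, x2q = query
    x1t, x2t = train
    # (x1t - x1q)^2 + (x2t - x2q)^2
    return (x1t - x1q) ** 2 + (x2t - x2q) ** 2

print(squared_distance_2d((0, 0), (3, 4)))  # 25
```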
If you have more than two variables, use the general Euclidean distance formula, where $n$ is the number of variables:

$$d_{tq} = \sqrt{\sum_{i=1}^{n} (x_{it} - x_{iq})^2}$$
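For any number of variables, the same sum of squared differences runs over every coordinate $i = 1, \dots, n$. A minimal sketch (the function name `euclidean_distance` is hypothetical), assuming both points are sequences of equal length:

```python
import math

def euclidean_distance(query, train):
    """Euclidean distance d_tq between a query point and a training sample with n variables."""
    if len(query) != len(train):
        raise ValueError("points must have the same number of variables")
    # Sum (x_it - x_iq)^2 over every variable i, then take the square root.
    return math.sqrt(sum((xt - xq) ** 2 for xq, xt in zip(query, train)))

print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0
```

On Python 3.8 and later, the standard library's `math.dist(p, q)` computes the same quantity.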