If X contains categorical data, that is, variables on a nominal or ordinal measurement scale, you may need to compute a weighted distance over the multivariate variables.
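As a rough illustration of what a weighted distance over mixed variables could look like, here is a minimal Python sketch. It is only one possible scheme, not necessarily the one intended in this tutorial: it assumes nominal variables contribute 0 when equal and 1 otherwise, numeric and rank-coded ordinal variables contribute their absolute difference divided by the variable's range, and the weights are chosen by you. The function name `mixed_distance`, the type labels, and the sample values are all hypothetical.

```python
def mixed_distance(a, b, kinds, weights, ranges):
    """Weighted average of per-variable distances for mixed data.

    kinds[i]   : "numeric", "ordinal" (coded as ranks) or "nominal"
    weights[i] : importance of variable i (assumed, chosen by the analyst)
    ranges[i]  : max - min of variable i; ignored for nominal variables
    """
    total = 0.0
    for x, y, kind, w, r in zip(a, b, kinds, weights, ranges):
        if kind == "nominal":
            # Simple matching: identical categories are at distance 0.
            d = 0.0 if x == y else 1.0
        else:
            # Range-normalized absolute difference, so d lies in [0, 1].
            d = abs(x - y) / r if r else 0.0
        total += w * d
    return total / sum(weights)


# Made-up records: one numeric variable (range 4) and one nominal variable.
kinds = ["numeric", "nominal"]
weights = [1.0, 1.0]
ranges = [4.0, None]
print(mixed_distance([7, "red"], [5, "red"], kinds, weights, ranges))
```

Normalizing by the range keeps every variable on the same 0 to 1 footing, so that the weights alone decide how much each variable matters.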
The next step is to find the K-nearest neighbors.
We include a training sample as a nearest neighbor if its distance to the query instance is less than or equal to the K-th smallest distance.
In other words, we sort the distances of all training samples to the query instance and determine the K-th minimum distance.
If the distance of a training sample is less than or equal to the K-th minimum, then we gather the category Y of that nearest-neighbor training sample.
In MS Excel, we can use the function SMALL(array,K) to determine the K-th minimum value in the array.
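Outside a spreadsheet, the same lookup is easy to reproduce. The sketch below is a plain Python stand-in for SMALL(array,K); the function name `small` and the distance values are made up for illustration and are not the tutorial's data.

```python
def small(values, k):
    """Return the k-th smallest value (k starts at 1), like SMALL(array, k)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    return sorted(values)[k - 1]


# Hypothetical distances from four training samples to the query instance.
distances = [16, 25, 9, 13]
print(small(distances, 3))  # sorted order is 9, 13, 16, 25, so this prints 16
```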
A special case arises in our example: the 4th through the 7th smallest distances happen to be equal.
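To see what the less-than-or-equal rule does with such ties, consider the following small Python sketch. The distances and categories are invented so that the 4th through 7th smallest distances coincide; they are not the values of the tutorial's example, and the function name `nearest_neighbors` is hypothetical.

```python
def nearest_neighbors(distances, categories, k):
    """Keep every sample whose distance is <= the k-th smallest distance."""
    threshold = sorted(distances)[k - 1]  # the k-th minimum distance
    return [(d, y) for d, y in zip(distances, categories) if d <= threshold]


# Made-up data: sorted distances are 1, 2, 3, 5, 5, 5, 5, 9,
# so the 4th through 7th smallest distances are all equal to 5.
distances = [5, 2, 9, 5, 1, 5, 5, 3]
categories = ["Bad", "Good", "Bad", "Good", "Good", "Bad", "Good", "Bad"]

neighbors = nearest_neighbors(distances, categories, k=4)
print(len(neighbors))  # 7 samples pass the threshold, although k is 4
```

With this rule, ties at the K-th distance can make the neighbor set larger than K, because every tied sample satisfies the threshold equally.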