A kNN Application Using scikit-learn


kNN is a simple, supervised machine learning (ML) algorithm that can be used for classification or regression tasks, and is also frequently used for missing-value imputation. It is based on the idea that the observations closest to a given data point are the most “similar” observations in the data set, so we can classify a new, unseen point based on the labels of the closest existing points. By choosing k, the user selects how many nearby observations the algorithm uses. This slide will show you how to implement the kNN algorithm for classification, and how different values of k affect the results.
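The nearest-neighbor idea above can be sketched directly with NumPy before turning to scikit-learn. This is a minimal illustration using made-up toy data and a hypothetical helper `knn_predict`; it computes Euclidean distances to every training point and takes a majority vote among the k closest:

```python
import numpy as np

# Toy training data: four points with known class labels (hypothetical values).
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array([0, 0, 1, 1])

def knn_predict(x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those neighbors.
    votes = np.bincount(y_train[nearest])
    return votes.argmax()

# A point near the first cluster is assigned that cluster's class.
print(knn_predict(np.array([1.2, 1.5]), k=3))  # → 0
```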

k is the number of nearest neighbors to use. For classification, a majority vote among those neighbors determines which class a new observation falls into. Larger values of k are often more robust to outliers and produce more stable decision boundaries than very small values (k=3 would usually be better than k=1, which can produce undesirable results). Below is a kNN application using scikit-learn:
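Since the original code listing did not survive export, here is a minimal sketch of what such an application could look like, assuming the built-in iris data set and an arbitrary train/test split; the choice of k values (1, 3, 7) and the random seed are illustrative, not from the original slide:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a built-in data set and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Compare several values of k: each classifier votes among its k neighbors.
for k in (1, 3, 7):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    print(f"k={k}: test accuracy = {knn.score(X_test, y_test):.3f}")
```

Running this for a few values of k shows how the choice of k changes test accuracy, which is the effect the slide describes.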

[Figure: scatter plot of all data points]
[Figure: the same scatter plot with a new data point classified]



