Slide 16.7: A K-means clustering application using scikit-learn

A K-Means Clustering Application Using scikit-learn

K-means is an unsupervised learning method for clustering data points. The algorithm iteratively divides the data points into K clusters by minimizing the variance within each cluster. Here, we show how to estimate the best value for K using the elbow method, and then use K-means clustering to group the data points into clusters.
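As a worked equation, the quantity K-means minimizes can be written as the within-cluster sum of squared distances. The notation here ($C_k$ for the points in cluster $k$, $\mu_k$ for its centroid, $J$ for the objective) is introduced for illustration and does not appear on the slide:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Since the inner sum equals $|C_k|$ times the variance of cluster $k$, minimizing $J$ amounts to minimizing the size-weighted sum of the cluster variances. In scikit-learn, a fitted `KMeans` model reports this value as its `inertia_` attribute.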

K-means clustering requires us to select K, the number of clusters into which we want to group the data. The algorithm then proceeds as follows:

1. Randomly assign each data point to one of the K clusters.
2. Compute the centroid of each cluster (the mean position of its points, in effect its center), and reassign each data point to the cluster with the closest centroid.
3. Repeat step 2 until no data point's cluster assignment changes.
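The three steps above can be sketched directly in NumPy. This is a minimal teaching version under stated assumptions (Euclidean distance, a fixed random seed, and re-seeding any cluster that becomes empty with a random data point). The function name `kmeans_from_scratch` and the sample points are hypothetical, and scikit-learn's own `KMeans` differs in its details.

```python
import numpy as np


def kmeans_from_scratch(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k groups with the three steps listed above.

    Hypothetical teaching sketch; scikit-learn's KMeans uses a different
    initialization (k-means++ by default) and is heavily optimized.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Step 1: randomly assign every data point to one of the k clusters.
    labels = rng.integers(0, k, size=len(X))

    for _ in range(max_iter):
        # Step 2a: each centroid is the mean of the points in its cluster.
        # Assumption: an empty cluster is re-seeded with a random data point.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j)
            else X[rng.integers(len(X))]
            for j in range(k)
        ])

        # Step 2b: reassign each point to the cluster with the closest centroid.
        # distances[i, j] is the Euclidean distance from point i to centroid j.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)

        # Step 3: stop once no assignment changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

    return labels, centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.1],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
labels, centroids = kmeans_from_scratch(points, k=2)
print(labels)
print(centroids)
```

Because step 1 is random, different seeds can take different paths through the loop, but on data as well separated as this the assignments should settle on the same two groups.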

The elbow method graphs the inertia (the sum of squared distances from each data point to its closest centroid) against K and looks for the point at which the curve stops dropping sharply and starts decreasing roughly linearly. This point is referred to as the “elbow”, and it is a good estimate of the best value for K for our data.
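To make “inertia” concrete, this sketch fits scikit-learn's `KMeans` once and recomputes `inertia_` by hand from `transform()`, which returns each point's distance to every centroid. The data are hypothetical. The elbow method simply repeats such a fit for K = 1, 2, 3, … and plots `inertia_` against K.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical sample data: two loose groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(4.0, 4.0), scale=0.5, size=(20, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# transform() gives each point's distance to every centroid; inertia is the sum,
# over all points, of the squared distance to the closest centroid.
closest_sq = model.transform(X).min(axis=1) ** 2
manual_inertia = closest_sq.sum()

print(model.inertia_, manual_inertia)  # the two values should agree up to rounding
```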

[Interactive figures: All data points; The elbow method; Data points clustered]
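The code behind the three panels did not survive in this copy of the slide. The following is a sketch of what each panel could compute, assuming matplotlib for plotting and a small hypothetical data set; the point coordinates and `chosen_k` are illustrative, not the slide's.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical sample data (not from the slide): ten 2-D points.
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# Panel 1: all data points.
plt.figure()
plt.scatter(x, y)
plt.title("All data points")

# Panel 2: the elbow method, inertia for K = 1..10.
inertias = []
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

plt.figure()
plt.plot(range(1, 11), inertias, marker="o")
plt.title("The elbow method")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Inertia")

# Panel 3: data points clustered with the K read off the elbow plot.
chosen_k = 2  # assumption: 2 is used here for illustration only
kmeans = KMeans(n_clusters=chosen_k, n_init=10, random_state=0)
kmeans.fit(data)

plt.figure()
plt.scatter(x, y, c=kmeans.labels_)  # color each point by its cluster label
plt.title("Data points clustered")

plt.show()
```

Setting `n_init` and `random_state` explicitly keeps the results repeatable and avoids depending on the library's default for `n_init`, which has changed between scikit-learn versions. The value of `chosen_k` should come from your own reading of the elbow plot.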
