An Example of K-means Using Sklearn


K-means is an unsupervised learning method for clustering data points. The algorithm iteratively divides the data points into K clusters by minimizing the variance within each cluster. Here, we will show how to estimate the best value for K using the elbow method, and then use K-means to group the data points into clusters.

How K-means Clustering Works
K-means clustering requires us to select K, the number of clusters we want to group the data into:
  1. Each data point is randomly assigned to one of the K clusters.
  2. Compute the centroid (functionally the center) of each cluster, and reassign each data point to the cluster with the closest centroid.
  3. Repeat this process until the cluster assignments for each data point are no longer changing.
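The three steps above can be sketched in plain NumPy. This is a minimal illustration of the idea, not sklearn's optimized implementation (sklearn uses smarter k-means++ initialization, for example), and the `kmeans` function name and empty-cluster fallback are our own choices:

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign each data point to one of the K clusters.
    labels = rng.integers(0, k, size=len(points))
    centroids = np.empty((k, points.shape[1]))
    for _ in range(n_iters):
        # Step 2: compute each cluster's centroid (the mean of its points);
        # if a cluster happens to be empty, re-seed it with a random point.
        for i in range(k):
            members = points[labels == i]
            centroids[i] = (members.mean(axis=0) if len(members)
                            else points[rng.integers(len(points))])
        # ...then reassign each point to the cluster with the closest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 3: stop once the cluster assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids

# The same toy data used in the listing below.
points = np.array([[4, 21], [5, 19], [10, 24], [4, 17], [3, 16],
                   [11, 25], [14, 24], [6, 22], [10, 21], [12, 21]])
labels, centroids = kmeans(points, k=2)
```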
The elbow method lets us graph the inertia (a distance-based metric) against K and visualize the point at which it starts decreasing linearly. This point is referred to as the "elbow" and is a good estimate for the best value for K based on our data.
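Producing the elbow plot might look like the sketch below, using the same toy data as the listing further down. `inertia_` is sklearn's name for the within-cluster sum of squared distances after fitting; the choice of trying K from 1 to 10 is ours:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# Fit K-means for each candidate K and record the inertia.
inertias = []
ks = range(1, 11)
for k in ks:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(data)
    inertias.append(kmeans.inertia_)

plt.plot(ks, inertias, marker="o")
plt.title("Elbow method")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")
plt.show()
```

The inertia drops sharply up to K = 2 and only gradually after that, which is the "elbow" we read off the plot.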

[Figures, left to right: all data points; the elbow method; the final clusters.]
The Python Source Code for the Rightmost Figure (K=2)
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

# The elbow method shows that 2 is a good value for K.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(data)

# Color each point by its assigned cluster label.
plt.scatter(x, y, c=kmeans.labels_)
plt.show()
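After fitting, the model also exposes the learned centroids via `cluster_centers_`, and `predict` assigns new points to the nearest cluster. A short follow-up sketch on the same data (the two new points are made-up examples):

```python
from sklearn.cluster import KMeans

x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
data = list(zip(x, y))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

print(kmeans.cluster_centers_)              # one (x, y) centroid per cluster
print(kmeans.predict([[5, 20], [12, 23]]))  # assign new points to clusters
```

Since (5, 20) sits among the lower-left points and (12, 23) among the upper-right ones, the two new points land in different clusters.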
