K Means Intuition

  • Data set with 30 unlabeled training examples (30 points)
  • Goal: Run K-means to find two clusters (K=2)
  • Later in week: discuss how to decide how many clusters to find
  • K-means algorithm first takes random guess at where centers of clusters might be
  • Randomly pick two points (red cross and blue cross) as initial cluster centers
  • These are just random initial guesses, not necessarily good ones (a rough initialization sketch follows this list)
  • Centers of clusters are called cluster centroids
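
As a minimal sketch (the function name `init_centroids` and the use of NumPy are my own assumptions, not from the lecture), one common way to make that random guess is to pick K of the training examples themselves as the starting centroids:

```python
import numpy as np

def init_centroids(X, K, seed=None):
    """Pick K distinct training examples at random as the initial centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=K, replace=False)
    return X[idx].copy()
```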

K-means repeatedly does two different things:

  1. Assign points to cluster centroids
  2. Move cluster centroids

Step 1: Assign Points to Cluster Centroids

  • Go through each of the points (x⁽¹⁾ through x⁽³⁰⁾)
  • For each point, check whether it is closer to the red cross or the blue cross
  • Assign each point to whichever cluster centroid it is closer to (a short assignment sketch follows this list)
  • Illustrate assignment by painting each point either red or blue
    • Point closer to red centroid → painted red
    • Point closer to blue centroid → painted blue
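
Continuing the sketch above (still an illustrative NumPy version, not the lecture's code), the assignment step computes the distance from every point to every centroid and keeps the index of the nearest one:

```python
import numpy as np

def assign_to_centroids(X, centroids):
    """Return, for each example, the index of the closest cluster centroid."""
    # Distance from every point to every centroid, shape (m, K)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # With K=2, these indices play the role of the red/blue colors
    return np.argmin(dists, axis=1)
```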

Step 2: Move Cluster Centroids

  • Look at all the red points and take their average
  • Move the red cross to the average location of the red dots
  • Look at all the blue dots and take their average
  • Move the blue cross to the average location of the blue dots
  • This gives new locations for both cluster centroids (a short update sketch follows this list)
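
A minimal sketch of the update step, in the same illustrative NumPy style (the handling of an empty cluster is my own assumption; the lecture does not cover that case here):

```python
import numpy as np

def move_centroids(X, assignments, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = X[assignments == k]
        if len(members) > 0:
            # Average location of all points "painted" with color k
            new_centroids[k] = members.mean(axis=0)
    return new_centroids
```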

Repeat the Two Steps

  1. Look through all 30 training examples again
  2. Check for every point whether it’s closer to red or blue centroid (new locations)
  3. Associate each point with closer cluster centroid (indicated by color)
  4. Some points may change color as centroids have moved
  5. Repeat: compute average location of red dots and blue dots
  6. Move cluster centroids to new average locations
  7. Repeat process
  • Keep repeating the two steps:
    • Assign each point to nearest cluster centroid
    • Move each cluster centroid to mean of all points with same color
  • The algorithm converges when there are no more changes to either of the following (a combined loop sketch follows this list):
    • Colors of points
    • Locations of cluster centroids
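
Putting the pieces together, here is a sketch of the full loop, reusing the helper functions from the earlier sketches (the `max_iters` safety cap is my own addition, not part of the lecture):

```python
import numpy as np

def k_means(X, K, max_iters=100, seed=None):
    """Alternate the assign/move steps until no point changes cluster."""
    centroids = init_centroids(X, K, seed=seed)      # random initial guesses
    assignments = assign_to_centroids(X, centroids)  # first "coloring" of the points
    for _ in range(max_iters):
        centroids = move_centroids(X, assignments, centroids)  # step 2: move centroids
        new_assignments = assign_to_centroids(X, centroids)    # step 1: reassign points
        if np.array_equal(new_assignments, assignments):
            break  # converged: no colors changed, so the centroids stop moving too
        assignments = new_assignments
    return centroids, assignments
```

For the 30-point example above, something like `centroids, labels = k_means(X, K=2)` would return two centroid locations and a 0/1 label (the "color") for each point.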

In the example, K-means has done a pretty good job:

  • Points up in one area correspond to one cluster
  • Points down in another area correspond to second cluster

The K-means algorithm demonstrates a simple yet powerful approach to automatically discovering natural groupings in data through this iterative refinement process.