K-means Intuition
Example Dataset Setup
- Dataset with 30 unlabeled training examples (30 points); a toy stand-in is sketched after this list
- Goal: Run K-means to find two clusters (K=2)
- Later in the week: how to decide how many clusters to find
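The notes don't reproduce the plotted points, so here is a hypothetical stand-in: 30 unlabeled 2-D points arranged as two loose blobs, built with NumPy. The names X and K are illustrative and are reused as assumed names in the sketches below.

```python
import numpy as np

# Hypothetical stand-in for the 30 unlabeled 2-D points:
# two loose blobs, roughly mimicking the plotted example.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[1.0, 1.0], scale=0.5, size=(15, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(15, 2)),
])
K = 2  # number of clusters to find
```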
Initial Setup
Step 1: Random Initialization
- The K-means algorithm first takes a random guess at where the centers of the clusters might be (a code sketch follows this list)
- Randomly pick two points (red cross and blue cross) as initial cluster centers
- These are just random initial guesses - not particularly good guesses
- Centers of clusters are called cluster centroids
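The notes don't specify the exact initialization scheme, but one common choice is to pick K of the training examples themselves as the starting centroids. A minimal sketch under that assumption, with illustrative names:

```python
import numpy as np

def init_centroids(X, K, seed=0):
    """Randomly pick K distinct training examples as the initial cluster centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=K, replace=False)
    return X[idx]
```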
Core K-means Process
K-means repeatedly does two different things:
- Assign points to cluster centroids
- Move cluster centroids
Step 1: Assign Points to Cluster Centroids
- Go through each of the points (x⁽¹⁾ through x⁽³⁰⁾); a code sketch of this step follows the list
- For each point, check if it’s closer to red cross or blue cross
- Assign each point to whichever cluster centroid it is closer to
- Illustrate assignment by painting each point either red or blue
- Point closer to red centroid → painted red
- Point closer to blue centroid → painted blue
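A sketch of this assignment step, assuming the points are rows of a NumPy array X and the current centroids are rows of centroids (these names are assumptions, not from the lecture):

```python
import numpy as np

def assign_to_centroids(X, centroids):
    """For each point, return the index of the closest cluster centroid
    (e.g. 0 = red cross, 1 = blue cross in the two-cluster example)."""
    # Pairwise distances from every point to every centroid: shape (m, K).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)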
Step 2: Move Cluster Centroids
- Look at all the red points and take their average (see the code sketch after this list)
- Move red cross to average location of red dots
- Look at all blue dots and take average of them
- Move blue cross to average location of blue dots
- This gives new locations for both cluster centroids
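The move step is then just a per-color average. A sketch under the same assumed names, assuming every cluster ends up with at least one point:

```python
import numpy as np

def move_centroids(X, assignments, K):
    """Move each cluster centroid to the mean of the points assigned to it."""
    # Assumes every cluster has at least one assigned point;
    # otherwise the mean below would be taken over an empty slice.
    return np.array([X[assignments == k].mean(axis=0) for k in range(K)])
```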
Iterative Process
Continue Until Convergence
- Look through all 30 training examples again
- Check for every point whether it’s closer to red or blue centroid (new locations)
- Associate each point with closer cluster centroid (indicated by color)
- Some points may change color as centroids have moved
- Repeat: compute average location of red dots and blue dots
- Move cluster centroids to new average locations
- Repeat process
Convergence
- Keep repeating the two steps (a full loop sketch follows this list):
- Assign each point to nearest cluster centroid
- Move each cluster centroid to mean of all points with same color
- Algorithm converges when there are no more changes to:
- Colors of points
- Locations of cluster centroids
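Putting it together, a minimal end-to-end sketch of the loop that stops once no point changes color (names and the max_iters safeguard are illustrative, not from the lecture):

```python
import numpy as np

def k_means(X, K, max_iters=100, seed=0):
    """Repeat the assign/move steps until the assignments stop changing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=K, replace=False)]
    assignments = None
    for _ in range(max_iters):
        # Step 1: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = np.argmin(dists, axis=1)
        # Converged: no point changed color, so the centroids won't move either.
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its assigned points.
        centroids = np.array([X[assignments == k].mean(axis=0) for k in range(K)])
    return centroids, assignments
```

Called on the toy 30-point dataset above with K=2, this would return two centroids and a 0/1 color label for every point.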
Final Result
In the example, K-means has done a pretty good job:
- Points up in one area correspond to one cluster
- Points down in another area correspond to second cluster
The K-means algorithm demonstrates a simple yet powerful approach to automatically discovering natural groupings in data through this iterative refinement process.