Initializing K-means
Random Initialization Method
Basic Constraint
Section titled “Basic Constraint”- Should always choose number of cluster centroids K < number of training examples m
- Doesn’t make sense to have K ≥ m (not enough examples per cluster)
- Example: K = 2, m = 30
Most Common Initialization Method
- Randomly pick K training examples from the dataset
- Set μ₁ through μₖ equal to these K training examples
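A minimal sketch of this initialization in NumPy (the helper name `init_centroids` and the seed are illustrative, not from the lecture):

```python
import numpy as np

def init_centroids(X, K, seed=0):
    """Randomly pick K distinct training examples as initial centroids mu_1..mu_K.

    Assumes K < m, where m = X.shape[0].
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=K, replace=False)  # K distinct row indices
    return X[idx]                                        # shape (K, n)
```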
Example Process
- Training set with many points
- Randomly select two training examples (for K = 2)
- Set red cluster centroid at first selected point
- Set blue cluster centroid at second selected point
- Run K-means from these initial positions
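A runnable version of this process might look like the following (a sketch assuming scikit-learn is available; the toy data and seeds are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                         # toy training set: m = 30, n = 2
init = X[rng.choice(len(X), size=2, replace=False)]  # two randomly chosen examples

# n_init=1 runs K-means exactly once, starting from these two centroids
km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
print(km.cluster_centers_)                           # final positions of the two centroids
```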
Problem: Local Optima
Different Initializations → Different Results
Section titled “Different Initializations → Different Results”- Same dataset can yield different clustering results
- Depends on initial random centroid placement
Example: Finding 3 Clusters (K = 3)
Good Initialization:
- Results in well-separated, sensible clusters
- Each cluster contains related data points
Poor Initialization:
- Two centroids initialized within same natural group
- One centroid in different group
- Results in a suboptimal clustering
- This represents a local optimum
K-means minimizes distortion cost function J, but different initializations can get stuck in different local minima.
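For reference, the distortion is the mean squared distance from each example to the centroid of its assigned cluster, where c⁽ⁱ⁾ denotes the index of the cluster assigned to example x⁽ⁱ⁾ (standard notation, consistent with the μ₁ through μₖ above):

$$
J\left(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K\right) = \frac{1}{m}\sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2
$$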
Multiple Random Initializations
Strategy: Try Multiple Times
Run K-means multiple times with different random initializations to find the best local optimum.
Algorithm for Multiple Initializations
```
for i = 1 to 100:
    # Randomly initialize K-means
    centroids = randomly_pick_K_examples(X)

    # Run K-means to convergence
    final_centroids, assignments = run_kmeans(X, centroids)

    # Compute cost function
    J[i] = compute_distortion(X, assignments, final_centroids)

# Pick solution with lowest cost
best_solution = argmin(J)
```
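A runnable version of this loop, as a minimal sketch: it assumes scikit-learn and NumPy, and the function name `best_of_n_kmeans` is invented for illustration, not the lecture's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def best_of_n_kmeans(X, K, n_inits=100, seed=0):
    """Run K-means from n_inits random initializations and keep
    the run that ends with the lowest distortion J."""
    rng = np.random.default_rng(seed)
    best_J, best_model = np.inf, None
    for _ in range(n_inits):
        init = X[rng.choice(len(X), size=K, replace=False)]    # pick K training examples
        km = KMeans(n_clusters=K, init=init, n_init=1).fit(X)  # run to convergence
        J = km.inertia_ / len(X)                               # mean squared distance = distortion
        if J < best_J:
            best_J, best_model = J, km
    return best_model, best_J
```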
Implementation Details
- Typical range: 50 to 1000 random initializations
- Why this range?
  - Fewer than 50: may miss good solutions
  - More than 1000: diminishing returns; computationally expensive
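In practice, libraries often handle this loop internally; for example, scikit-learn's KMeans exposes an n_init parameter that runs multiple random initializations and automatically keeps the run with the lowest distortion (the toy data below is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # toy data for illustration

# init="random" picks n_clusters training examples as starting centroids;
# n_init=50 repeats this 50 times and keeps the best run.
km = KMeans(n_clusters=3, init="random", n_init=50).fit(X)
print(km.inertia_)  # total squared distance for the best of the 50 runs
```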
Cost Function Comparison
Good clustering (top result):
- Green, red, and blue crosses (the centroids) have relatively small distances to their assigned dots
- Lower cost function J
Poor clustering (bottom results):
- Blue cross has larger distances to blue dots
- Red cross has larger distances to red dots
- Higher cost function J
Selection criterion: choose the clustering with the smallest distortion (cost function J)
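Continuing the earlier sketch, applying this criterion is just a matter of keeping the run with the lowest J (hypothetical usage, with X and `best_of_n_kmeans` as defined above):

```python
model, J = best_of_n_kmeans(X, K=3, n_inits=100)
print(f"lowest distortion found: J = {J:.4f}")
labels = model.labels_  # cluster assignments from the best run
```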
Practical Recommendation
Personal Usage
“When I’m using the K-means algorithm myself, I will almost always use more than one random initialization.”
Benefits
- Does a much better job of minimizing the distortion cost function
- Finds a much better choice of cluster centroids
- Higher likelihood of finding a good clustering (like the top example)
- Lower likelihood of getting stuck in an inferior local minimum (like the bottom examples)
Using multiple random initializations is a simple yet effective technique that dramatically improves the quality of K-means clustering results by exploring different starting points in the optimization landscape.