Pablo Rodriguez

Initializing K-Means

  • Always choose the number of cluster centroids K < the number of training examples m
  • K ≥ m doesn’t make sense (there wouldn’t be enough examples to give each centroid its own cluster)
  • Example: K = 2, m = 30

Random initialization procedure:

  1. Randomly pick K training examples from the dataset
  2. Set μ₁ through μ_K equal to these K training examples

Illustration (K = 2):

  • Start with a training set containing many points
  • Randomly select two training examples
  • Set the red cluster centroid at the first selected point
  • Set the blue cluster centroid at the second selected point
  • Run K-means from these initial positions
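This random initialization can be sketched in a few lines of NumPy (the function name `randomly_pick_K_examples` is illustrative, matching the pseudocode later in these notes):

```python
import numpy as np

def randomly_pick_K_examples(X, K, rng=None):
    """Pick K distinct training examples to serve as initial centroids."""
    rng = np.random.default_rng(rng)
    # Sample K distinct row indices from the m rows of X
    idx = rng.choice(X.shape[0], size=K, replace=False)
    return X[idx].copy()
```

For K = 2 and m = 30 as in the example above, this returns two rows of X to use as the red and blue starting centroids.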

Different Initializations → Different Results

  • Same dataset can yield different clustering results
  • Depends on initial random centroid placement

Good Initialization:

  • Results in well-separated, sensible clusters
  • Each cluster contains related data points

Poor Initialization:

  • Two centroids initialized within the same natural group
  • Only one centroid placed in a different group
  • Results in a suboptimal clustering
  • This represents a local optimum of the cost function

K-means minimizes distortion cost function J, but different initializations can get stuck in different local minima.

Run K-means multiple times with different random initializations, then keep the result that achieves the best local optimum.

```python
# multiple-initializations: run K-means 100 times, keep the lowest-cost result
for i in range(100):
    # Randomly initialize K-means
    centroids = randomly_pick_K_examples(X, K)
    # Run K-means to convergence
    final_centroids[i], assignments[i] = run_kmeans(X, centroids)
    # Compute the distortion cost function for this run
    J[i] = compute_distortion(X, assignments[i], final_centroids[i])

# Pick the solution with the lowest cost
best = argmin(J)
best_centroids, best_assignments = final_centroids[best], assignments[best]
```
  • Typical range: 50 to 1000 random initializations
  • Why this range?:
    • Less than 50: May miss good solutions
    • More than 1000: Diminishing returns, computationally expensive
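The whole procedure can be made runnable with a minimal Lloyd's-algorithm K-means. This is a sketch, not the lecture's implementation; all function names here are illustrative:

```python
import numpy as np

def run_kmeans(X, centroids, n_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    for _ in range(n_iters):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        assignments = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (leave a centroid in place if it has no assigned points)
        new = np.array([X[assignments == k].mean(axis=0) if np.any(assignments == k)
                        else centroids[k] for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assignments

def best_of_n_inits(X, K, n_inits=50, seed=0):
    """Run K-means n_inits times; return the lowest-distortion solution."""
    rng = np.random.default_rng(seed)
    best_J, best = np.inf, None
    for _ in range(n_inits):
        init = X[rng.choice(len(X), K, replace=False)]
        centroids, assignments = run_kmeans(X, init)
        # Distortion J for this run
        J = np.mean(np.sum((X - centroids[assignments]) ** 2, axis=1))
        if J < best_J:
            best_J, best = J, (centroids, assignments)
    return best, best_J
```

On a dataset with two well-separated groups, a single unlucky initialization can park both centroids in one group, but with 50 restarts the lowest-J run almost always recovers the natural clustering.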

Good clustering (top result):

  • Green, red, blue crosses have relatively small distances to their assigned dots
  • Lower cost function J

Poor clustering (bottom results):

  • Blue cross has larger distances to blue dots
  • Red cross has larger distances to red dots
  • Higher cost function J

Selection criteria: Choose clustering with smallest distortion/cost function J

“When I’m using the K-means algorithm myself, I will almost always use more than one random initialization.”

  • Does a much better job of minimizing the distortion cost function
  • Finds a much better choice of cluster centroids
  • Higher likelihood of finding a good clustering (like the top example)
  • Lower likelihood of getting stuck in an inferior local minimum (like the bottom examples)

Using multiple random initializations is a simple yet effective technique that dramatically improves the quality of K-means clustering results by exploring different starting points in the optimization landscape.