Optimization Objective
K-means as Cost Function Optimization
Like supervised learning algorithms (courses 1 and 2), K-means also optimizes a specific cost function. However, it doesn’t use gradient descent - it uses the iterative algorithm shown in the previous videos.
Notation Review
- c⁽ⁱ⁾ = index of the cluster (1 to K) to which training example x⁽ⁱ⁾ is currently assigned
- μₖ = location of cluster centroid k
- μc⁽ⁱ⁾ = cluster centroid of the cluster to which example x⁽ⁱ⁾ has been assigned
Example
- If training example x⁽¹⁰⁾ is assigned to cluster 2: c⁽¹⁰⁾ = 2
- Then μc⁽¹⁰⁾ = μ₂ = location of cluster centroid 2
Cost Function Definition
The cost function J is:
J(c⁽¹⁾,...,c⁽ᵐ⁾, μ₁,...,μₖ) = (1/m) Σ(i=1 to m) ||x⁽ⁱ⁾ - μc⁽ⁱ⁾||²
Components
- Input variables: c⁽¹⁾ through c⁽ᵐ⁾ (cluster assignments), μ₁ through μₖ (centroid locations)
- Calculation: Average squared distance between every training example x⁽ⁱ⁾ and the cluster centroid to which it’s assigned
- Alternative name: Distortion function (common in the literature)
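To make the formula concrete, here is a minimal NumPy sketch of the distortion computation. It is not from the course: the function name is an illustrative choice, and clusters are indexed 0 to K−1 as is conventional in code, rather than 1 to K as in the notation above.

```python
import numpy as np

def distortion(X, centroids, assignments):
    # X: (m, n) training examples; centroids: (K, n) centroid locations;
    # assignments: (m,) cluster index c(i) for each example, in 0..K-1.
    diffs = X - centroids[assignments]          # x(i) - mu_{c(i)} for every i
    return np.mean(np.sum(diffs ** 2, axis=1))  # (1/m) * sum of squared norms
```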
Visual Understanding
At any step of K-means, picture a two-cluster example with blue and red points:
- Look at all blue points → measure their distances to the blue centroid → square them
- Look at all red points → measure their distances to the red centroid → square them
- The average of all these squared distances = value of the cost function J
How K-means Minimizes J
Step 1: Assign Points to Cluster Centroids
- Goal: Update c⁽¹⁾ through c⁽ᵐ⁾ to minimize J
- Method: Hold μ₁ through μₖ fixed
- Logic: To minimize ||x⁽ⁱ⁾ - μc⁽ⁱ⁾||², assign x⁽ⁱ⁾ to the closest cluster centroid (sketched in code after the example below)
Example
- Two cluster centroids μ₁ and μ₂
- Single training example x⁽ⁱ⁾
- Assign to centroid 1: large squared distance
- Assign to centroid 2: much smaller squared distance
- Result: Choose closer centroid to minimize cost
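A minimal sketch of this assignment step, under the same assumptions as the distortion helper above (0-based cluster indices; the name is illustrative):

```python
import numpy as np

def assign_clusters(X, centroids):
    # Squared distance from every example to every centroid: shape (m, K).
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Picking the nearest centroid is exactly the choice of c(i) that
    # minimizes ||x(i) - mu_{c(i)}||^2 while the centroids are held fixed.
    return np.argmin(sq_dists, axis=1)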
Step 2: Move Cluster Centroids
- Goal: Update μ₁ through μₖ to minimize J
- Method: Hold c⁽¹⁾ through c⁽ᵐ⁾ fixed
- Logic: Choose μₖ as mean of points assigned to cluster k
Example
Cluster with two points at positions 1 and 11:
- Centroid at position 1:
  - Distance to point 1: 0, squared = 0
  - Distance to point 11: 10, squared = 100
  - Average: (0 + 100)/2 = 50
- Centroid at position 6 (the mean):
  - Distance to point 1: 5, squared = 25
  - Distance to point 11: 5, squared = 25
  - Average: (25 + 25)/2 = 25
The mean location minimizes the average squared distance, as the sketch below confirms.
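A sketch of the centroid-update step, plus a quick check of the arithmetic above using the distortion helper from earlier (again, names and 0-based indexing are illustrative assumptions):

```python
import numpy as np

def move_centroids(X, assignments, K):
    # Mean of the points assigned to each cluster; assumes every cluster
    # has at least one assigned point.
    return np.array([X[assignments == k].mean(axis=0) for k in range(K)])

points = np.array([[1.0], [11.0]])
both = np.zeros(2, dtype=int)                       # both points in cluster 0
print(distortion(points, np.array([[1.0]]), both))  # 50.0 (centroid at 1)
print(distortion(points, np.array([[6.0]]), both))  # 25.0 (centroid at the mean)
print(move_centroids(points, both, K=1))            # [[6.]]
```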
Algorithm Convergence Properties
Guaranteed Convergence
- Every K-means iteration either decreases J or keeps it the same
- J should never increase (if it does, there’s a bug in the code)
- Each step can only lower J, and J is bounded below by zero, so the algorithm must converge
Convergence Detection
- If the cost function stops going down → K-means has converged (see the loop sketched below)
- You can stop the algorithm when J stays the same for one iteration
- Sometimes J decreases very slowly → you may choose to stop early (similar to gradient descent)
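Putting the two steps together with a convergence check on J, here is a sketch reusing the helpers above (max_iters and tol are illustrative knobs, not part of the algorithm as presented):

```python
def k_means(X, initial_centroids, max_iters=100, tol=0.0):
    centroids = initial_centroids.copy()
    K = centroids.shape[0]
    prev_cost = float("inf")
    for _ in range(max_iters):
        assignments = assign_clusters(X, centroids)    # Step 1
        centroids = move_centroids(X, assignments, K)  # Step 2
        cost = distortion(X, centroids, assignments)
        assert cost <= prev_cost, "J increased: there is a bug somewhere"
        if prev_cost - cost <= tol:  # J stayed the same (or close enough)
            break
        prev_cost = cost
    return centroids, assignments, cost
```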
Understanding K-means as cost function optimization explains why the algorithm works and why it is guaranteed to converge to a local minimum of the distortion function.