K Means Programming
Programming Assignment: K-means
Section titled “Programming Assignment: K-means”Assignment Overview
Section titled “Assignment Overview”This exercise implements the K-means algorithm and applies it to image compression by reducing colors in an image to only the most common ones.
Assignment Structure
Section titled “Assignment Structure”Part 1: Implementing K-means
Section titled “Part 1: Implementing K-means”- 1.1 Finding closest centroids (Exercise 1)
- 1.2 Computing centroid means (Exercise 2)
Part 2: K-means on Sample Dataset
Section titled “Part 2: K-means on Sample Dataset”- Run implemented algorithm on toy 2D dataset
- Visualize algorithm progress through iterations
Part 3: Random Initialization
Section titled “Part 3: Random Initialization”- Understanding centroid initialization process
- Using
kMeans_init_centroids
function
Part 4: Image Compression with K-means
Section titled “Part 4: Image Compression with K-means”- 4.1 Dataset: Load and process bird image
- 4.2 K-means on image pixels: Apply clustering to RGB values
- 4.3 Compress image: Reconstruct using reduced color palette
Key Implementation Details
Section titled “Key Implementation Details”Exercise 1: Finding Closest Centroids
Section titled “Exercise 1: Finding Closest Centroids”Objective: Complete find_closest_centroids
function
Task:
- Input: Data matrix X, centroid locations
- Output: Array idx with closest centroid index for each example
- Formula: c⁽ⁱ⁾ := j that minimizes ||x⁽ⁱ⁾ - μⱼ||²
Implementation approach:
for i in range(X.shape[0]): distance = [] for j in range(centroids.shape[0]): norm_ij = np.linalg.norm(X[i] - centroids[j]) distance.append(norm_ij) idx[i] = np.argmin(distance)
Exercise 2: Computing Centroid Means
Section titled “Exercise 2: Computing Centroid Means”Objective: Complete compute_centroids
function
Task:
- Input: Data matrix X, assignments idx, number of clusters K
- Output: New centroids as means of assigned points
- Formula: μₖ = (1/|Cₖ|) Σᵢ∈Cₖ x⁽ⁱ⁾
Implementation approach:
for k in range(K): points = X[idx == k] # Get points assigned to cluster k centroids[k] = np.mean(points, axis=0) # Compute mean
Sample Dataset Results
Section titled “Sample Dataset Results”Algorithm Process
Section titled “Algorithm Process”- Starts with random initial centroids
- Iteratively assigns points and moves centroids
- Visualizes progress through multiple iterations
- Converges to final clustering solution
Expected Behavior
Section titled “Expected Behavior”- Algorithm should converge to stable cluster assignments
- Final centroids become black X-marks in cluster centers
- Connected X-marks show centroid movement history
Image Compression Application
Section titled “Image Compression Application”Technical Details
Section titled “Technical Details”- Original encoding: 24-bit color (8 bits × 3 RGB channels)
- Original size: 128 × 128 × 24 = 393,216 bits
- Compressed encoding: 4 bits per pixel + 16 colors × 24 bits
- Compressed size: 16 × 24 + 128 × 128 × 4 = 65,920 bits
- Compression ratio: ~6:1 reduction
Process
Section titled “Process”- Reshape image: Convert to m × 3 matrix (RGB values per pixel)
- Apply K-means: Find K=16 most representative colors
- Assign pixels: Map each pixel to closest representative color
- Reconstruct: Create new image using only 16 colors
Results
Section titled “Results”- Compressed image retains most characteristics of original
- Some compression artifacts due to reduced color palette
- Trade-off between image quality and file size
Key Learning Outcomes
Section titled “Key Learning Outcomes”Algorithm Understanding
Section titled “Algorithm Understanding”- Implementation of core K-means components
- Understanding of iterative optimization process
- Experience with convergence behavior
Practical Applications
Section titled “Practical Applications”- Real-world use case in image compression
- Trade-offs between quality and compression
- Visualization of clustering results
Technical Skills
Section titled “Technical Skills”- NumPy operations for distance calculations
- Array indexing and boolean masking
- Image processing and visualization
The K-means programming exercise demonstrates both the algorithmic foundations and practical utility of clustering in computer vision and data compression applications.