Feature Scaling
The Problem with Different Feature Ranges
When features have vastly different ranges of values, gradient descent can run slowly and inefficiently.
Example Scenario
- x₁ (house size): 300-2,000 square feet
- x₂ (bedrooms): 0-5 bedrooms
Parameter Relationships
For a house with 2,000 sq ft, 5 bedrooms, priced at $500k:
Poor Parameters: w₁=50, w₂=0.1, b=50
- Prediction: 50×2,000 + 0.1×5 + 50 = 100,050.5k (way too high)
Good Parameters: w₁=0.1, w₂=50, b=50
- Prediction: 0.1×2000 + 50×5 + 50 = 500k (correct)
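The two predictions above can be reproduced with a few lines of Python (prices in thousands of dollars; `predict` is just a hypothetical helper for this example):

```python
# Linear model f(x) = w1*x1 + w2*x2 + b, with prices in $1000s.
def predict(w1, w2, b, size_sqft, bedrooms):
    return w1 * size_sqft + w2 * bedrooms + b

poor = predict(50, 0.1, 50, 2000, 5)   # 100,050.5 — way too high
good = predict(0.1, 50, 50, 2000, 5)   # 500.0 — matches the listed price
```

Note how the "good" parameters pair a small weight with the large-valued feature (size) and a large weight with the small-valued feature (bedrooms).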
Impact on Gradient Descent
Contour Plot Characteristics
- Without Scaling: Tall, narrow elliptical contours
- Inefficient Path: Gradient descent bounces back and forth
- Slow Convergence: Takes many iterations to reach minimum
With Feature Scaling
- Circular Contours: More symmetric cost function shape
- Direct Path: Gradient descent moves efficiently toward minimum
- Fast Convergence: Significantly fewer iterations needed
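This convergence gap can be demonstrated with a small synthetic experiment (all data, seeds, and learning rates below are illustrative assumptions, not values from the original example):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(300, 2000, 50)             # house size (sq ft)
x2 = rng.integers(0, 6, 50).astype(float)   # bedrooms
y = 0.1 * x1 + 50 * x2 + 50                 # synthetic prices in $1000s

def run_gd(X, y, alpha, iters):
    """Batch gradient descent on half-MSE; returns the final mean squared error."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / len(y)
        b -= alpha * err.mean()
    return float(((X @ w + b - y) ** 2).mean())

X_raw = np.column_stack([x1, x2])
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Raw features force a tiny learning rate (anything much larger diverges)
# and still leave a large error; scaled features fit almost exactly.
mse_raw = run_gd(X_raw, y, alpha=1e-7, iters=500)
mse_scaled = run_gd(X_scaled, y, alpha=0.1, iters=500)
```

With unscaled inputs, the update for the small-ranged feature (bedrooms) barely moves at a learning rate safe for the large-ranged one (size) — exactly the bouncing, slow path the contour picture describes.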
Feature Scaling Methods
1. Division by Maximum
x1_scaled = x1 / max(x1)  # Range: 0.15 to 1.0
x2_scaled = x2 / max(x2)  # Range: 0.0 to 1.0
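In NumPy this looks like the following (the sample values are assumed for illustration):

```python
import numpy as np

x1 = np.array([300.0, 1200.0, 2000.0])   # house sizes (sq ft)
x2 = np.array([0.0, 2.0, 5.0])           # bedroom counts

x1_scaled = x1 / x1.max()   # [0.15, 0.6, 1.0]
x2_scaled = x2 / x2.max()   # [0.0, 0.4, 1.0]
```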
2. Mean Normalization
# Calculate mean (μ) for each feature
mu1 = mean(x1)  # e.g., 600
mu2 = mean(x2)  # e.g., 2.3

# Normalize
x1_norm = (x1 - mu1) / (max(x1) - min(x1))
x2_norm = (x2 - mu2) / (max(x2) - min(x2))
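A concrete sketch with assumed sample data:

```python
import numpy as np

x1 = np.array([300.0, 450.0, 650.0, 2000.0])   # assumed sizes; mean is 850
x1_norm = (x1 - x1.mean()) / (x1.max() - x1.min())
# The normalized values straddle zero and span exactly one unit.
```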
Results: Features centered around zero with both positive and negative values
3. Z-Score Normalization
# Calculate mean (μ) and standard deviation (σ)
mu1, sigma1 = mean(x1), std(x1)  # e.g., 600, 450
mu2, sigma2 = mean(x2), std(x2)  # e.g., 2.3, 1.4

# Z-score normalize
x1_zscore = (x1 - mu1) / sigma1
x2_zscore = (x2 - mu2) / sigma2
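A runnable version with assumed data; after z-scoring, a feature has mean 0 and standard deviation 1:

```python
import numpy as np

x1 = np.array([300.0, 500.0, 900.0, 2000.0])   # assumed house sizes

mu1, sigma1 = x1.mean(), x1.std()
x1_zscore = (x1 - mu1) / sigma1
# x1_zscore now has mean 0 and standard deviation 1.
```

scikit-learn's `StandardScaler` applies this same transform if you prefer a library implementation.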
Target Ranges and Guidelines
Acceptable Ranges
Aim for features in approximately the -1 to +1 range, though these are flexible guidelines:
- ✅ Good: -3 to +3, -0.3 to +0.3, 0 to 3, -2 to +0.5
- ⚠️ Consider Rescaling: -100 to +100, -0.001 to +0.001, 98.6 to 105°F
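These rules of thumb can be encoded in a small helper (the thresholds are assumptions chosen to match the examples above, not hard rules):

```python
def consider_rescaling(x_min, x_max):
    """Flag a feature whose range is much wider than roughly -3..+3,
    much narrower than ~0.3, or centered far from zero."""
    span = x_max - x_min
    center = (x_min + x_max) / 2
    return x_max > 3 or x_min < -3 or span < 0.3 or abs(center) > 3

consider_rescaling(-0.3, 0.3)    # False: already a reasonable range
consider_rescaling(-100, 100)    # True: far too wide
consider_rescaling(98.6, 105.0)  # True: centered far from zero
```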
When to Apply Feature Scaling
- ✅ Apply: Features with very different magnitudes (e.g., house size vs. number of rooms)
- ✅ Apply: When in doubt - there's rarely harm in feature scaling
- ❌ Usually unnecessary: Features already in similar ranges (approximately -1 to +1)
Practical Examples
Temperature Data
Hospital patient temperatures ranging from 98.6 to 105°F:
- Values around 100 are large compared to typical scaled features
- Recommendation: Scale to improve gradient descent performance
Financial Data
Income ($30k-$200k) vs. Age (25-65 years):
- Very different scales require feature scaling
- Prevents income from dominating the model due to larger numeric values
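A sketch of z-scoring both columns (the figures are assumed for illustration) so income's larger raw magnitude no longer dominates:

```python
import numpy as np

income = np.array([30_000.0, 65_000.0, 110_000.0, 200_000.0])  # dollars
age = np.array([25.0, 38.0, 51.0, 65.0])                       # years

income_z = (income - income.mean()) / income.std()
age_z = (age - age.mean()) / age.std()
# Both features now live on a comparable, unitless scale.
```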
Feature scaling is a simple yet powerful technique that can dramatically improve the efficiency and performance of gradient descent algorithms, making it essential for practical machine learning implementations.