
Feature Scaling

When features have vastly different ranges of values, gradient descent can run slowly and inefficiently.

  • x₁ (house size): 300-2,000 square feet
  • x₂ (bedrooms): 0-5

For a house with 2,000 sq ft, 5 bedrooms, priced at $500k:

Poor Parameters: w₁=50, w₂=0.1, b=50

  • Prediction: 50×2000 + 0.1×5 + 50 = 100,050.5k, about $100 million (way too high)

Good Parameters: w₁=0.1, w₂=50, b=50

  • Prediction: 0.1×2000 + 50×5 + 50 = 500k (correct; see the sketch below)
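A quick sketch verifying the arithmetic above (prices are in $1000s; the helper function is illustrative, not from the original notes):

# Check the two parameter sets on the 2,000 sq ft, 5-bedroom house
x1, x2 = 2000, 5

def predict(w1, w2, b):
    return w1 * x1 + w2 * x2 + b  # price in $1000s

print(predict(w1=50, w2=0.1, b=50))  # 100050.5 -> way too high
print(predict(w1=0.1, w2=50, b=50))  # 500.0 -> matches the $500k price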
Without Scaling:

  • Contours: Tall, narrow elliptical contours of the cost function
  • Inefficient Path: Gradient descent bounces back and forth across the valley
  • Slow Convergence: Takes many iterations to reach the minimum

With Scaling:

  • Circular Contours: More symmetric cost function shape
  • Direct Path: Gradient descent moves efficiently toward the minimum
  • Fast Convergence: Significantly fewer iterations needed
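To make the contour picture concrete, here is a minimal sketch (the synthetic data, learning rates, and iteration count are assumptions for illustration) comparing gradient descent on raw versus z-score-scaled features:

import numpy as np

# Synthetic data, assumed for illustration: price (in $1000s) follows
# the "good parameters" above: 0.1*size + 50*bedrooms + 50
rng = np.random.default_rng(0)
size = rng.uniform(300, 2000, 100)            # x1: 300-2000 sq ft
beds = rng.integers(0, 6, 100).astype(float)  # x2: 0-5 bedrooms
y = 0.1 * size + 50 * beds + 50

def gradient_descent(X, y, alpha, iters):
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.mean()
    return np.mean((X @ w + b - y) ** 2) / 2  # final cost

X_raw = np.column_stack([size, beds])
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Raw features force a tiny learning rate (larger ones diverge) and still
# leave the cost high after 1,000 iterations: this is the bouncing behavior
# described above. Scaled features tolerate a much larger learning rate
# and converge quickly.
print(gradient_descent(X_raw, y, alpha=1e-7, iters=1000))
print(gradient_descent(X_scaled, y, alpha=0.1, iters=1000))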
Max scaling divides each feature by its maximum value:

max_scaling.py
import numpy as np

x1_scaled = x1 / np.max(x1)  # house sizes 300-2,000 -> range 0.15 to 1.0
x2_scaled = x2 / np.max(x2)  # bedrooms 0-5 -> range 0.0 to 1.0
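Run on illustrative arrays (the values are assumptions consistent with the ranges above), this produces:

import numpy as np

x1 = np.array([300.0, 1000.0, 2000.0])  # house sizes in sq ft
x2 = np.array([0.0, 2.0, 5.0])          # bedroom counts

print(x1 / np.max(x1))  # [0.15 0.5  1.  ]
print(x2 / np.max(x2))  # [0.  0.4 1. ]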
Mean normalization centers each feature around zero by subtracting its mean and dividing by its range:

mean_norm.py
import numpy as np

# Calculate the mean (μ) of each feature
mu1 = np.mean(x1)  # e.g., 600
mu2 = np.mean(x2)  # e.g., 2.3

# Mean-normalize: subtract μ, divide by the range (max - min)
x1_norm = (x1 - mu1) / (np.max(x1) - np.min(x1))
x2_norm = (x2 - mu2) / (np.max(x2) - np.min(x2))

Results: Features centered around zero with both positive and negative values
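A quick check on the illustrative house-size array from above confirms this:

import numpy as np

x1 = np.array([300.0, 1000.0, 2000.0])
x1_norm = (x1 - np.mean(x1)) / (np.max(x1) - np.min(x1))
print(x1_norm)  # [-0.47 -0.06  0.53], centered near zero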

Z-score normalization subtracts each feature's mean and divides by its standard deviation:

zscore.py
import numpy as np

# Calculate the mean (μ) and standard deviation (σ) of each feature
mu1, sigma1 = np.mean(x1), np.std(x1)  # e.g., 600 and 450
mu2, sigma2 = np.mean(x2), np.std(x2)  # e.g., 2.3 and 1.4

# Z-score normalize
x1_zscore = (x1 - mu1) / sigma1
x2_zscore = (x2 - mu2) / sigma2
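In practice, scikit-learn's StandardScaler performs the same transformation; a minimal sketch (the data matrix is illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[300.0, 0.0],
              [1000.0, 2.0],
              [2000.0, 5.0]])             # columns: size, bedrooms
print(StandardScaler().fit_transform(X))  # per-column (x - μ) / σ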

Aim for features in roughly the -1 to +1 range, though these are flexible guidelines:

  • ✅ Good: -3 to +3, -0.3 to +0.3, 0 to 3, -2 to +0.5
  • ⚠️ Consider Rescaling: -100 to +100, -0.001 to +0.001, 98.6 to 105°F
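A small helper (hypothetical; the thresholds are assumptions, not from the original notes) can flag features that fall outside these rough guidelines:

import numpy as np

def needs_rescaling(x, too_large=10.0, too_small=0.1):
    # Assumed rule of thumb: flag magnitudes far from the ~[-1, +1] target
    span = np.max(np.abs(x))
    return span > too_large or span < too_small

print(needs_rescaling(np.array([-2.0, 0.5])))      # False: roughly in range
print(needs_rescaling(np.array([98.6, 105.0])))    # True: values too large
print(needs_rescaling(np.array([-0.001, 0.001])))  # True: values too small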

Scaling matters most when features have very different magnitudes (e.g., house size vs. number of rooms), as in the following examples:

Hospital patient temperatures ranging 98.6-105°F:

  • Values around 100 are large compared to typical scaled features
  • Recommendation: Scale to improve gradient descent performance
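For example, z-scoring brings such temperatures into a small range around zero (the readings are illustrative):

import numpy as np

temps = np.array([98.6, 99.5, 101.2, 103.0, 105.0])  # °F, illustrative
print((temps - np.mean(temps)) / np.std(temps))      # roughly -1.2 to +1.5, centered on zero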

Income ($30k-$200k) vs. Age (25-65 years):

  • Very different scales require feature scaling
  • Prevents income from dominating the model due to larger numeric values
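As a sketch (the arrays are illustrative, drawn from the quoted ranges), mean normalization puts both features on a comparable footing so neither dominates:

import numpy as np

income = np.array([30_000.0, 80_000.0, 200_000.0])  # dollars
age = np.array([25.0, 40.0, 65.0])                  # years

def mean_normalize(x):
    return (x - np.mean(x)) / (np.max(x) - np.min(x))

print(mean_normalize(income))  # approx. [-0.43 -0.14  0.57]
print(mean_normalize(age))     # approx. [-0.46 -0.08  0.54]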

Feature scaling is a simple yet powerful technique that can dramatically improve the efficiency and performance of gradient descent algorithms, making it essential for practical machine learning implementations.