More Data Solution
Benefits: Often the most effective approach
- Larger training sets make algorithms less likely to overfit
- Can continue using complex models with sufficient data
Limitation: Not always feasible to obtain more data
Reduce Features
Approach: Use fewer features
Regularization
Best of Both Worlds: Keep all features but prevent large parameters
Regularization is a powerful technique for reducing overfitting: it prevents parameters from becoming too large, which leads to simpler, more generalizable models. Instead of eliminating features entirely (setting their parameters to 0), it encourages smaller parameter values across all features.
Consider a high-order polynomial that overfits:
f(x) = w₁x + w₂x² + w₃x³ + w₄x⁴ + b
If we could make w₃ and w₄ very small (close to 0), we get a function closer to:
f(x) ≈ w₁x + w₂x² + b
This reduces complexity while keeping all features.
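To see the effect numerically, here is a minimal NumPy sketch (the coefficient values are made up for illustration, not fitted to any data):

```python
import numpy as np

# Hypothetical coefficients, chosen only to illustrate the idea
w_large = np.array([1.0, 0.5, 3.0, 2.0])        # w1..w4, with large cubic/quartic terms
w_shrunk = np.array([1.0, 0.5, 0.001, 0.0005])  # w3, w4 pushed close to 0
b = 0.2

def f(x, w, b):
    # f(x) = w1*x + w2*x^2 + w3*x^3 + w4*x^4 + b
    powers = np.stack([x, x**2, x**3, x**4], axis=-1)
    return powers @ w + b

x = np.linspace(-2, 2, 5)
print(f(x, w_large, b))        # wiggly: dominated by the x^3 and x^4 terms
print(f(x, w_shrunk, b))       # almost identical to the quadratic below
print(1.0*x + 0.5*x**2 + b)    # f(x) ≈ w1*x + w2*x^2 + b
```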
Add a penalty term to the cost function to discourage large parameters:
J(w,b) = (1/2m) * Σᵢ(f(x⁽ⁱ⁾) - y⁽ⁱ⁾)² + (λ/2m) * Σⱼ(wⱼ²)
Here the first sum runs over the m training examples and the second over the parameters w₁, …, wₙ (by convention, b is not penalized). The regularization parameter λ ≥ 0 controls the trade-off:
- λ = 0: no regularization, so a complex model can still overfit
- λ very large (e.g., 10¹⁰): the penalty dominates, all wⱼ ≈ 0, and f(x) ≈ b, which underfits
- λ just right: a balanced trade-off between fitting the training data and keeping the model simple
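A minimal sketch of this regularized cost in NumPy (the toy data and the names X, y, w, b are illustrative assumptions, not from the original example):

```python
import numpy as np

def regularized_cost(X, y, w, b, lam):
    """(1/2m)*sum((f(x)-y)^2) + (lam/2m)*sum(w_j^2); b is not regularized."""
    m = X.shape[0]
    err = X @ w + b - y                     # f(x^(i)) - y^(i) for every example
    return (err @ err) / (2 * m) + (lam / (2 * m)) * (w @ w)

# Toy data for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
w, b = np.array([1.0, -2.0, 0.5]), 0.0

for lam in (0.0, 1.0, 1e10):
    print(lam, regularized_cost(X, y, w, b, lam))
# lam = 0     -> pure data-fit cost, no penalty
# lam = 1     -> small penalty added for large w_j
# lam = 1e10  -> penalty dominates; minimizing J would drive every w_j toward 0
```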
On each iteration of gradient descent, the regularized update for wⱼ becomes:
wⱼ := wⱼ(1 - α*λ/m) - α*(1/m)*Σᵢ[(f(x⁽ⁱ⁾) - y⁽ⁱ⁾)*xⱼ⁽ⁱ⁾]
(The update for b is unchanged, since b is not regularized.)
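In code, one way that step might look (a sketch under the same NumPy assumptions as above; alpha and lam are hyperparameters to be chosen):

```python
import numpy as np

def gradient_step(X, y, w, b, alpha, lam):
    """One regularized gradient-descent step for linear regression.

    Equivalent to wⱼ := wⱼ*(1 - alpha*lam/m) - alpha*(1/m)*Σᵢ(errᵢ * xᵢⱼ);
    b gets the usual un-regularized update.
    """
    m = X.shape[0]
    err = X @ w + b - y                    # f(x^(i)) - y^(i)
    grad_w = (X.T @ err) / m + (lam / m) * w
    grad_b = err.mean()
    return w - alpha * grad_w, b - alpha * grad_b

# Each call shrinks every w_j by a factor (1 - alpha*lam/m) before the usual
# gradient correction, which is why this is often called "weight decay".
```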
The same principle applies to classification. For regularized logistic regression, the cost function becomes:
J(w,b) = -(1/m) * Σᵢ[y⁽ⁱ⁾*log(f(x⁽ⁱ⁾)) + (1-y⁽ⁱ⁾)*log(1-f(x⁽ⁱ⁾))] + (λ/2m) * Σⱼ(wⱼ²)
where f(x) is now the sigmoid output. The resulting gradient descent updates take the same form as in linear regression, just with this logistic f(x).
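And a matching sketch of the regularized logistic cost (again with illustrative names; only the data-fit term changes, the penalty is identical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regularized_cost(X, y, w, b, lam):
    """Cross-entropy cost plus the same L2 penalty (lam/2m)*sum(w_j^2)."""
    m = X.shape[0]
    f = sigmoid(X @ w + b)
    f = np.clip(f, 1e-12, 1 - 1e-12)        # avoid log(0) on extreme predictions
    cross_entropy = -(y @ np.log(f) + (1 - y) @ np.log(1 - f)) / m
    return cross_entropy + (lam / (2 * m)) * (w @ w)
```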
Regularization becomes even more important in deep learning due to the large number of parameters.
Regularization offers an elegant solution to overfitting by encouraging smaller parameter values rather than eliminating features entirely. By adding a penalty term to the cost function, it creates a balance between fitting the training data and maintaining model simplicity, leading to better generalization on new examples.