Plot the cost function J as a function of iteration number (not parameter values):
Horizontal Axis : Number of iterations
Vertical Axis : Cost J(w,b) computed on training set
Purpose : Visual verification that gradient descent is working correctly
The learning curve shows how the cost function changes after each simultaneous update of parameters w and b.
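A minimal sketch of how such a learning curve might be produced, assuming a plain linear regression model trained with batch gradient descent; the function names, synthetic data, and hyperparameters below are illustrative assumptions, not part of the original notes:

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(X, y, w, b):
    """Squared-error cost J(w,b) for linear regression."""
    m = X.shape[0]
    predictions = X @ w + b
    return np.sum((predictions - y) ** 2) / (2 * m)

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Run batch gradient descent, recording J(w,b) after every simultaneous update."""
    m = X.shape[0]
    cost_history = []
    for i in range(num_iters):
        err = X @ w + b - y                # prediction error on all m training examples
        dj_dw = X.T @ err / m              # gradient of J with respect to w
        dj_db = np.sum(err) / m            # gradient of J with respect to b
        w = w - alpha * dj_dw              # simultaneous update of w and b
        b = b - alpha * dj_db
        cost_history.append(compute_cost(X, y, w, b))
        if i % 100 == 0:
            print(f"iteration {i:5d}: cost {cost_history[-1]:.4f}")
    return w, b, cost_history

# Synthetic data purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 1.0

w, b, cost_history = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.1, num_iters=1000)

plt.plot(cost_history)                     # the learning curve
plt.xlabel("Number of iterations")         # horizontal axis
plt.ylabel("Cost J(w,b)")                  # vertical axis, computed on the training set
plt.show()
```

Printing the cost at regular intervals, as in the sketch, gives the same information as reading the plot at fixed iteration counts.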
Signs that gradient descent is working correctly:
Decreasing Cost : J should decrease after every iteration
Eventual Flattening : Cost levels off as the algorithm converges
Smooth Decline : Generally smooth downward trend
Reading the curve at example points:
100 iterations : Cost value after 100 parameter updates
200 iterations : Cost value after 200 updates
300+ iterations : Cost begins leveling off, indicating convergence
Note
Convergence Timing : The number of iterations needed varies dramatically between applications; it could be 30, 1,000, or even 100,000 iterations.
Problem : J increases after one or more iterations
Likely Causes : Learning rate α is too large, or there is a bug in the code
Action : Reduce the learning rate or debug the implementation
Pattern : Cost sometimes goes up, sometimes down
Diagnosis : Clear sign that gradient descent is not working properly
Solution : Check the learning rate and verify the implementation
An automatic convergence test is an alternative to visual inspection. Let ε (epsilon) be a small threshold (e.g., 0.001, i.e., 10⁻³):
Condition : If J decreases by less than ε in one iteration, declare convergence
Logic : The algorithm has reached the flat part of the curve
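A minimal sketch of this test, assuming a cost_history list like the one recorded in the sketch above; the function name converged and the default ε value are illustrative assumptions:

```python
def converged(cost_history, epsilon=1e-3):
    """Declare convergence when J decreases by less than epsilon
    between two consecutive iterations."""
    if len(cost_history) < 2:
        return False
    return (cost_history[-2] - cost_history[-1]) < epsilon

# Inside the training loop, after appending the latest cost:
#     if converged(cost_history):
#         break
```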
Drawbacks of the automatic test:
Threshold Selection : Choosing an appropriate ε value is difficult
Problem Detection : Doesn't help identify issues early
Limited Insight : Provides less information than visual inspection
Visual inspection of learning curves is often more reliable than automatic tests for:
Detecting convergence
Identifying problems early
Understanding algorithm behavior
Recommended practice:
Plot Learning Curves : For every training run
Check Trends : Ensure consistent decrease
Identify Issues : Spot problems early
Adjust Parameters : Based on curve behavior
Common learning-curve patterns:
Healthy Convergence : Smooth, consistent decrease that eventually flattens out
Too Slow : Very gradual decrease; consider increasing the learning rate
Unstable : Fluctuating or increasing cost; reduce the learning rate or check for bugs
Converged : Cost has flattened and is no longer decreasing significantly
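As a rough illustration only, these patterns could be checked programmatically from the recorded cost history; the thresholds below are arbitrary assumptions, not values from the original notes:

```python
import numpy as np

def diagnose_curve(cost_history, epsilon=1e-3):
    """Rough heuristic labelling of a learning curve (thresholds are arbitrary)."""
    costs = np.asarray(cost_history)
    if costs.size < 2:
        return "Not enough iterations to diagnose"
    diffs = np.diff(costs)                 # change in J from one iteration to the next
    if np.any(diffs > 0):
        return "Unstable : cost increased at some iteration; reduce alpha or check for bugs"
    if abs(diffs[-1]) < epsilon:
        return "Converged : recent decrease is below epsilon"
    if costs[-1] > 0.5 * costs[0]:
        return "Too Slow : cost has dropped less than half from its starting value; consider a larger alpha"
    return "Healthy Convergence : cost decreasing steadily"
```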
If the learning curve looks wrong, debugging steps include the following (a sketch follows the list):
Verify Implementation : Check gradient calculation and update rules
Reduce Learning Rate : Try a smaller α value
Check Data : Ensure features are properly scaled
Test Simple Case : Use a very small α to isolate issues; with a small enough α, J should decrease on every iteration if the implementation is correct
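A sketch of that learning-rate comparison, continuing in the same session as the first sketch (it reuses the hypothetical gradient_descent, X, and y defined there):

```python
# With a small enough alpha, J should decrease on every iteration if the
# implementation is correct; if it still increases, suspect a bug rather
# than the learning rate.
for alpha in [1.0, 0.1, 0.01, 0.001]:
    _, _, history = gradient_descent(X, y, np.zeros(2), 0.0, alpha=alpha, num_iters=200)
    plt.plot(history, label=f"alpha = {alpha}")

plt.xlabel("Number of iterations")
plt.ylabel("Cost J(w,b)")
plt.legend()
plt.show()
```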
Key takeaways:
Visual Inspection : Primary method for understanding behavior
Patience : Allow sufficient iterations for convergence
Documentation : Record what learning curves look like for future reference
Learning curves provide essential insight into gradient descent performance and are the primary tool for ensuring your optimization algorithm is working correctly and efficiently.