
Checking For Convergence

Monitoring Gradient Descent with Learning Curves


Plot the cost function J as a function of iteration number (not parameter values):

  • Horizontal Axis: Number of iterations
  • Vertical Axis: Cost J(w,b) computed on training set
  • Purpose: Visual verification that gradient descent is working correctly

The learning curve shows how the cost function changes after each simultaneous update of parameters w and b.
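The snippet below is a minimal sketch of how such a curve can be produced, assuming a single-feature linear regression model with a squared-error cost; the data, learning rate, and iteration count are illustrative, not from the original notes.

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(x, y, w, b):
    # Squared-error cost J(w, b) averaged over the m training examples
    m = len(x)
    return np.sum((w * x + b - y) ** 2) / (2 * m)

def gradient_descent(x, y, w, b, alpha, num_iters):
    # Runs batch gradient descent and records the cost after every update
    m = len(x)
    cost_history = []
    for _ in range(num_iters):
        err = w * x + b - y
        dj_dw = np.dot(err, x) / m
        dj_db = np.sum(err) / m
        w -= alpha * dj_dw          # simultaneous update of w and b
        b -= alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
    return w, b, cost_history

# Toy training set (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

w, b, costs = gradient_descent(x, y, w=0.0, b=0.0, alpha=0.01, num_iters=300)

plt.plot(costs)                      # horizontal axis: iteration number
plt.xlabel("Iteration")
plt.ylabel("Cost J(w, b)")           # vertical axis: cost on the training set
plt.title("Learning curve")
plt.show()
```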

What to expect:

  • Decreasing Cost: J should decrease after every iteration
  • Eventual Flattening: Cost levels off as the algorithm converges
  • Smooth Decline: Generally a smooth downward trend

Reading the curve:

  • 100 iterations: Cost value after 100 parameter updates
  • 200 iterations: Cost value after 200 updates
  • 300+ iterations: Cost begins leveling off, indicating convergence

If the cost increases after any iteration:

  • Likely Causes: Learning rate α too large, or a bug in the code
  • Action: Reduce the learning rate or debug the implementation (a simple check is sketched after this list)

If the cost sometimes goes up and sometimes down:

  • Diagnosis: A clear sign that gradient descent is not working properly
  • Solution: Check the learning rate and verify the implementation
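As a quick complement to the visual check, the small helper below (an illustrative sketch that reuses the `costs` list from the plotting example above) flags iterations where the cost went up, which usually means the learning rate is too large or there is a bug in the update code.

```python
def report_cost_increases(costs):
    # Find every iteration where the cost rose relative to the previous one
    increases = [i for i in range(1, len(costs)) if costs[i] > costs[i - 1]]
    if increases:
        print(f"Cost increased on {len(increases)} iteration(s), first at "
              f"iteration {increases[0]}: reduce the learning rate or check "
              "the gradient/update code.")
    else:
        print("Cost decreased on every iteration.")

report_cost_increases(costs)
```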

Let ε (epsilon) be a small threshold (e.g., 0.001, i.e. 10⁻³):

  • Condition: If J decreases by less than ε in one iteration, declare convergence
  • Logic: The algorithm has reached the flat part of the curve

Limitations of this automatic test:

  • Threshold Selection: Choosing an appropriate ε value is difficult
  • Problem Detection: It does not help identify issues early
  • Limited Insight: It provides less information than visual inspection
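For completeness, here is one way the automatic ε test could be written. The functions `step_fn` and `cost_fn` are hypothetical placeholders for a single parameter update and a training-set cost evaluation, and the default threshold mirrors the 10⁻³ example above.

```python
def run_until_converged(step_fn, cost_fn, epsilon=1e-3, max_iters=10_000):
    # step_fn(): performs one simultaneous update of w and b
    # cost_fn(): returns the current training-set cost J(w, b)
    prev_cost = cost_fn()
    for i in range(1, max_iters + 1):
        step_fn()
        cost = cost_fn()
        decrease = prev_cost - cost
        if decrease < 0:
            # Cost went up: treat as a warning sign, not convergence
            print(f"Warning: cost increased at iteration {i}; check alpha or the code.")
        elif decrease < epsilon:
            print(f"Declared convergence after {i} iterations (J = {cost:.6f})")
            return
        prev_cost = cost
    print("Reached max_iters without satisfying the epsilon test.")
```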

Visual inspection of learning curves is often more reliable than automatic tests for:

  • Detecting convergence
  • Identifying problems early
  • Understanding algorithm behavior

Recommended practice:

  1. Plot Learning Curves: For every training run
  2. Check Trends: Ensure a consistent decrease
  3. Identify Issues: Spot problems early
  4. Adjust Parameters: Based on curve behavior (see the comparison sketched below)
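One concrete way to follow this practice, reusing `gradient_descent` and the toy data from the first sketch (again, illustrative values only), is to overlay the learning curves for a few candidate learning rates and keep the one that decreases quickly and smoothly.

```python
import matplotlib.pyplot as plt

for alpha in (0.001, 0.01, 0.1):
    _, _, costs = gradient_descent(x, y, w=0.0, b=0.0, alpha=alpha, num_iters=300)
    plt.plot(costs, label=f"alpha = {alpha}")

plt.xlabel("Iteration")
plt.ylabel("Cost J(w, b)")
plt.legend()
plt.show()
```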

Healthy Convergence

Smooth, consistent decrease that eventually flattens out

Too Slow

Very gradual decrease - consider increasing learning rate

Unstable

Fluctuating or increasing cost - reduce learning rate or check for bugs

Converged

Cost has flattened and is no longer decreasing significantly

If gradient descent is not working:

  1. Verify Implementation: Check the gradient calculation and update rules
  2. Reduce Learning Rate: Try a smaller α value
  3. Check Data: Ensure features are properly scaled (see the normalization sketch after this list)
  4. Test Simple Case: Use a very small α to isolate issues
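For the "check data" step, one common fix (sketched here under the assumption that the features sit in an (m, n) NumPy array `X`) is z-score normalization, which typically lets gradient descent use a larger learning rate and converge in fewer iterations.

```python
import numpy as np

def zscore_normalize(X):
    # Scale each feature to zero mean and unit standard deviation
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Example: X_norm, mu, sigma = zscore_normalize(X)
# Apply the same mu and sigma to any new data at prediction time.
```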
Best practices:

  • Visual Inspection: Primary method for understanding behavior
  • Patience: Allow sufficient iterations for convergence
  • Documentation: Record what learning curves look like for future reference

Learning curves provide essential insight into gradient descent performance and are the primary tool for ensuring your optimization algorithm is working correctly and efficiently.