Plot the cost function J as a function of iteration number (not parameter values):
Horizontal Axis : Number of iterations
Vertical Axis : Cost J(w,b) computed on training set
Purpose : Visual verification that gradient descent is working correctly
The learning curve shows how the cost function changes after each simultaneous update of parameters w and b.
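A minimal sketch of how such a learning curve might be produced, assuming a plain linear regression model trained with batch gradient descent; the function names, synthetic data, and hyperparameters below are illustrative assumptions, not part of the original notes:

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(X, y, w, b):
    """Squared-error cost J(w,b) for linear regression."""
    m = X.shape[0]
    predictions = X @ w + b
    return np.sum((predictions - y) ** 2) / (2 * m)

def gradient_descent(X, y, w, b, alpha, num_iters):
    """Run batch gradient descent, recording J(w,b) after every simultaneous update."""
    m = X.shape[0]
    cost_history = []
    for i in range(num_iters):
        err = X @ w + b - y                # prediction error on all m training examples
        dj_dw = X.T @ err / m              # gradient of J with respect to w
        dj_db = np.sum(err) / m            # gradient of J with respect to b
        w = w - alpha * dj_dw              # simultaneous update of w and b
        b = b - alpha * dj_db
        cost_history.append(compute_cost(X, y, w, b))
        if i % 100 == 0:
            print(f"iteration {i:5d}: cost {cost_history[-1]:.4f}")
    return w, b, cost_history

# Synthetic data purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -2.0]) + 1.0

w, b, cost_history = gradient_descent(X, y, np.zeros(2), 0.0, alpha=0.1, num_iters=1000)

plt.plot(cost_history)                     # the learning curve
plt.xlabel("Number of iterations")         # horizontal axis
plt.ylabel("Cost J(w,b)")                  # vertical axis, computed on the training set
plt.show()
```

Printing the cost at regular intervals, as in the sketch, gives the same information as reading the plot at fixed iteration counts.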
Signs that gradient descent is working correctly:
Decreasing Cost : J should decrease after every iteration
Eventual Flattening : Cost levels off as the algorithm converges
Smooth Decline : Generally smooth downward trend
Reading the curve at example points:
100 iterations : Cost value after 100 parameter updates
200 iterations : Cost value after 200 updates
300+ iterations : Cost begins leveling off, indicating convergence
Note
Convergence Timing : The number of iterations needed varies dramatically between applications; it could be 30, 1,000, or even 100,000 iterations.
Problem : J increases after one or more iterations
Likely Causes : Learning rate α is too large, or there is a bug in the code
Action : Reduce the learning rate or debug the implementation
Pattern : Cost sometimes goes up, sometimes down
Diagnosis : Clear sign that gradient descent is not working properly
Solution : Check the learning rate and verify the implementation
An automatic convergence test is an alternative to visual inspection. Let ε (epsilon) be a small threshold (e.g., 0.001, i.e., 10⁻³):
Condition : If J decreases by less than ε in one iteration, declare convergence
Logic : The algorithm has reached the flat part of the curve
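A minimal sketch of this test, assuming a cost_history list like the one recorded in the sketch above; the function name converged and the default ε value are illustrative assumptions:

```python
def converged(cost_history, epsilon=1e-3):
    """Declare convergence when J decreases by less than epsilon
    between two consecutive iterations."""
    if len(cost_history) < 2:
        return False
    return (cost_history[-2] - cost_history[-1]) < epsilon

# Inside the training loop, after appending the latest cost:
#     if converged(cost_history):
#         break
```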
Drawbacks of the automatic test:
Threshold Selection : Choosing an appropriate ε value is difficult
Problem Detection : Doesn't help identify issues early
Limited Insight : Provides less information than visual inspection
Visual inspection of learning curves is often more reliable than automatic tests for:
Detecting convergence
Identifying problems early
Understanding algorithm behavior
Recommended practice:
Plot Learning Curves : For every training run
Check Trends : Ensure consistent decrease
Identify Issues : Spot problems early
Adjust Parameters : Based on curve behavior
Common learning-curve patterns:
Healthy Convergence : Smooth, consistent decrease that eventually flattens out
Too Slow : Very gradual decrease; consider increasing the learning rate
Unstable : Fluctuating or increasing cost; reduce the learning rate or check for bugs
Converged : Cost has flattened and is no longer decreasing significantly
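As a rough illustration only, these patterns could be checked programmatically from the recorded cost history; the thresholds below are arbitrary assumptions, not values from the original notes:

```python
import numpy as np

def diagnose_curve(cost_history, epsilon=1e-3):
    """Rough heuristic labelling of a learning curve (thresholds are arbitrary)."""
    costs = np.asarray(cost_history)
    if costs.size < 2:
        return "Not enough iterations to diagnose"
    diffs = np.diff(costs)                 # change in J from one iteration to the next
    if np.any(diffs > 0):
        return "Unstable : cost increased at some iteration; reduce alpha or check for bugs"
    if abs(diffs[-1]) < epsilon:
        return "Converged : recent decrease is below epsilon"
    if costs[-1] > 0.5 * costs[0]:
        return "Too Slow : cost has dropped less than half from its starting value; consider a larger alpha"
    return "Healthy Convergence : cost decreasing steadily"
```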
If the learning curve looks wrong, debugging steps include the following (a sketch follows the list):
Verify Implementation : Check gradient calculation and update rules
Reduce Learning Rate : Try a smaller α value
Check Data : Ensure features are properly scaled
Test Simple Case : Use a very small α to isolate issues; with a small enough α, J should decrease on every iteration if the implementation is correct
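A sketch of that learning-rate comparison, continuing in the same session as the first sketch (it reuses the hypothetical gradient_descent, X, and y defined there):

```python
# With a small enough alpha, J should decrease on every iteration if the
# implementation is correct; if it still increases, suspect a bug rather
# than the learning rate.
for alpha in [1.0, 0.1, 0.01, 0.001]:
    _, _, history = gradient_descent(X, y, np.zeros(2), 0.0, alpha=alpha, num_iters=200)
    plt.plot(history, label=f"alpha = {alpha}")

plt.xlabel("Number of iterations")
plt.ylabel("Cost J(w,b)")
plt.legend()
plt.show()
```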
Key takeaways:
Visual Inspection : Primary method for understanding behavior
Patience : Allow sufficient iterations for convergence
Documentation : Record what learning curves look like for future reference
Learning curves provide essential insight into gradient descent performance and are the primary tool for ensuring your optimization algorithm is working correctly and efficiently.