Learning curves show how algorithms perform as a function of training experience (number of training examples). They plot both J_cv (cross-validation error) and J_train (training error) against training set size.
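For a regression setting with the squared-error cost (an assumption here; a classification problem would use misclassification error or log loss instead), the two quantities can be written as:

$$J_{\text{train}}(\vec{w},b) = \frac{1}{2m_{\text{train}}}\sum_{i=1}^{m_{\text{train}}}\left(f_{\vec{w},b}\big(\vec{x}^{(i)}\big)-y^{(i)}\right)^2, \qquad J_{\text{cv}}(\vec{w},b) = \frac{1}{2m_{\text{cv}}}\sum_{i=1}^{m_{\text{cv}}}\left(f_{\vec{w},b}\big(\vec{x}^{(i)}_{\text{cv}}\big)-y^{(i)}_{\text{cv}}\right)^2$$

where $m_{\text{train}}$ and $m_{\text{cv}}$ are the number of training and cross-validation examples, and the regularization term is left out when evaluating both errors.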
**Cross-validation error (J_cv):**
- Decreases as training set size increases
- More examples → better model → lower CV error
- The algorithm learns patterns more effectively

**Training error (J_train):**
- Increases as training set size increases
- Small datasets: easy to achieve zero or very low error
- Large datasets: harder to fit all examples perfectly
**High bias (underfitting):**
- **Model**: simple (e.g., a linear function)
- **Training error**: rises, then plateaus (flattens out)
- **CV error**: decreases, then plateaus
- **Gap**: J_cv stays consistently higher than J_train
- **Baseline**: both errors remain above human-level performance
**Important:** If the algorithm has high bias, getting more training data will NOT help much.
The learning curves plateau because:
- Simple models (e.g., straight lines) don't change significantly with more data
- Both J_cv and J_train flatten and stay flat regardless of dataset size
- The curves never reach the baseline performance level
**High variance (overfitting):**
- **Model**: complex (e.g., a 4th-order polynomial with small λ)
- **Training error**: low (sometimes below human-level performance)
- **CV error**: much higher than the training error
- **Large gap**: significant difference between J_cv and J_train
**Important:** If the algorithm has high variance, getting more training data IS likely to help.
With more data:
- J_train continues rising
- J_cv comes down toward J_train
- The gap between the errors decreases
- Performance approaches the baseline
To plot learning curves in practice (see the sketch below):
1. Take subsets of the training data (100, 200, 300 examples, …)
2. Train a model on each subset size
3. Evaluate J_train and J_cv for each
4. Plot both results against training set size
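A minimal Python sketch of this procedure, using scikit-learn's `LinearRegression` and `mean_squared_error` on synthetic data (the dataset, subset sizes, and model choice are illustrative assumptions, not part of the original notes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 1))
y = 2.5 * X.ravel() + rng.normal(scale=2.0, size=1000)

# Hold out a fixed cross-validation set; J_cv is always measured on it.
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

subset_sizes = [100, 200, 300, 400, 500, 600, 700]
j_train_vals, j_cv_vals = [], []

for m in subset_sizes:
    # Train on the first m training examples only.
    model = LinearRegression().fit(X_train[:m], y_train[:m])
    # J_train is evaluated on the same m examples the model was fit to.
    j_train_vals.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
    # J_cv is evaluated on the full, fixed cross-validation set.
    j_cv_vals.append(mean_squared_error(y_cv, model.predict(X_cv)))

# These two lists, plotted against subset_sizes, form the learning curves.
for m, jt, jc in zip(subset_sizes, j_train_vals, j_cv_vals):
    print(f"m={m:4d}  J_train={jt:6.3f}  J_cv={jc:6.3f}")
```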
Practical considerations:
- **Expensive**: requires training many models
- **Time-consuming**: not done frequently in practice
- **Mental model**: still a useful conceptual framework for understanding algorithm behavior
Learning curves help decide what to try next:
**High bias scenario:**
- Flat learning curves
- More data won't help
- Needs increased model complexity

**High variance scenario:**
- Converging learning curves
- More data will help
- Focus on data collection
**Flat curves that plateau above the baseline:**
- Indicate high bias
- Adding data provides diminishing returns
- Focus on: more features, polynomial terms, reducing λ
**Large gap between J_train and J_cv:**
- Indicates high variance
- More data can close the gap
- Continue data collection efforts
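As a rough heuristic, this read-out can be expressed in code. The function name, thresholds, and numbers below are illustrative assumptions, not part of the course material:

```python
def diagnose(j_train, j_cv, baseline, tolerance=0.1):
    """Rough bias/variance read-out from the learning-curve endpoints.

    baseline is a reference error level (e.g., human-level performance);
    tolerance is an arbitrary slack used in the comparisons.
    """
    high_bias = (j_train - baseline) > tolerance   # J_train well above baseline
    high_variance = (j_cv - j_train) > tolerance   # large gap between J_cv and J_train

    if high_bias and high_variance:
        return "high bias and high variance: increase model capacity and gather more data"
    if high_bias:
        return "high bias: more data won't help; add features, polynomial terms, or reduce lambda"
    if high_variance:
        return "high variance: more data should help; also consider more regularization"
    return "neither: performance is close to baseline with a small gap"

# Example: training error near baseline but a large gap to CV error -> high variance.
print(diagnose(j_train=0.32, j_cv=0.95, baseline=0.30))
```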
Learning curves provide visual insight into whether bias or variance is the primary bottleneck, guiding resource allocation decisions in machine learning projects.