Learning curves show how algorithms perform as a function of training experience (number of training examples). They plot both J_cv (cross-validation error) and J_train (training error) against training set size.
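For a regression setting with the squared-error cost (an assumption here; a classification problem would use misclassification error or log loss instead), the two quantities can be written as:

$$J_{\text{train}}(\vec{w},b) = \frac{1}{2m_{\text{train}}}\sum_{i=1}^{m_{\text{train}}}\left(f_{\vec{w},b}\big(\vec{x}^{(i)}\big)-y^{(i)}\right)^2, \qquad J_{\text{cv}}(\vec{w},b) = \frac{1}{2m_{\text{cv}}}\sum_{i=1}^{m_{\text{cv}}}\left(f_{\vec{w},b}\big(\vec{x}^{(i)}_{\text{cv}}\big)-y^{(i)}_{\text{cv}}\right)^2$$

where $m_{\text{train}}$ and $m_{\text{cv}}$ are the number of training and cross-validation examples, and the regularization term is left out when evaluating both errors.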
**Cross-validation error (J_cv):**
- Decreases as training set size increases
- More examples → better model → lower CV error
- The algorithm learns patterns more effectively

**Training error (J_train):**
- Increases as training set size increases
- Small datasets: easy to achieve zero or very low error
- Large datasets: harder to fit all examples perfectly
**High bias (underfitting):**
- **Model**: simple (e.g., a linear function)
- **Training error**: rises, then plateaus (flattens out)
- **CV error**: decreases, then plateaus
- **Gap**: J_cv stays consistently higher than J_train
- **Baseline**: both errors remain above human-level performance
**Important:** If the algorithm has high bias, getting more training data will NOT help much.
The learning curves plateau because:
- Simple models (e.g., straight lines) don't change significantly with more data
- Both J_cv and J_train flatten and stay flat regardless of dataset size
- The curves never reach the baseline performance level
**High variance (overfitting):**
- **Model**: complex (e.g., a 4th-order polynomial with small λ)
- **Training error**: low (sometimes below human-level performance)
- **CV error**: much higher than the training error
- **Large gap**: significant difference between J_cv and J_train
**Important:** If the algorithm has high variance, getting more training data IS likely to help.
With more data:
- J_train continues rising
- J_cv comes down toward J_train
- The gap between the errors decreases
- Performance approaches the baseline
To plot learning curves in practice (see the sketch below):
1. Take subsets of the training data (100, 200, 300 examples, …)
2. Train a model on each subset size
3. Evaluate J_train and J_cv for each
4. Plot both results against training set size
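A minimal Python sketch of this procedure, using scikit-learn's `LinearRegression` and `mean_squared_error` on synthetic data (the dataset, subset sizes, and model choice are illustrative assumptions, not part of the original notes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 1))
y = 2.5 * X.ravel() + rng.normal(scale=2.0, size=1000)

# Hold out a fixed cross-validation set; J_cv is always measured on it.
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

subset_sizes = [100, 200, 300, 400, 500, 600, 700]
j_train_vals, j_cv_vals = [], []

for m in subset_sizes:
    # Train on the first m training examples only.
    model = LinearRegression().fit(X_train[:m], y_train[:m])
    # J_train is evaluated on the same m examples the model was fit to.
    j_train_vals.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
    # J_cv is evaluated on the full, fixed cross-validation set.
    j_cv_vals.append(mean_squared_error(y_cv, model.predict(X_cv)))

# These two lists, plotted against subset_sizes, form the learning curves.
for m, jt, jc in zip(subset_sizes, j_train_vals, j_cv_vals):
    print(f"m={m:4d}  J_train={jt:6.3f}  J_cv={jc:6.3f}")
```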
Practical considerations:
- **Expensive**: requires training many models
- **Time-consuming**: not done frequently in practice
- **Mental model**: still a useful conceptual framework for understanding algorithm behavior
Learning curves help decide what to try next:
**High bias scenario:**
- Flat learning curves
- More data won't help
- Needs increased model complexity

**High variance scenario:**
- Converging learning curves
- More data will help
- Focus on data collection
**Flat curves that plateau above the baseline:**
- Indicate high bias
- Adding data provides diminishing returns
- Focus on: more features, polynomial terms, reducing λ
**Large gap between J_train and J_cv:**
- Indicates high variance
- More data can close the gap
- Continue data collection efforts
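As a rough heuristic, this read-out can be expressed in code. The function name, thresholds, and numbers below are illustrative assumptions, not part of the course material:

```python
def diagnose(j_train, j_cv, baseline, tolerance=0.1):
    """Rough bias/variance read-out from the learning-curve endpoints.

    baseline is a reference error level (e.g., human-level performance);
    tolerance is an arbitrary slack used in the comparisons.
    """
    high_bias = (j_train - baseline) > tolerance   # J_train well above baseline
    high_variance = (j_cv - j_train) > tolerance   # large gap between J_cv and J_train

    if high_bias and high_variance:
        return "high bias and high variance: increase model capacity and gather more data"
    if high_bias:
        return "high bias: more data won't help; add features, polynomial terms, or reduce lambda"
    if high_variance:
        return "high variance: more data should help; also consider more regularization"
    return "neither: performance is close to baseline with a small gap"

# Example: training error near baseline but a large gap to CV error -> high variance.
print(diagnose(j_train=0.32, j_cv=0.95, baseline=0.30))
```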
Learning curves provide visual insight into whether bias or variance is the primary bottleneck, guiding resource allocation decisions in machine learning projects.