
Learning Curves

Learning curves show how algorithms perform as a function of training experience (number of training examples). They plot both J_cv (cross-validation error) and J_train (training error) against training set size.
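
The plotted errors are typically the standard squared-error costs evaluated on each set, written here without the regularization term (a sketch of the usual definitions; the notation f_{w,b} for the model's prediction is assumed):

$$J_{train}(\vec{w},b) = \frac{1}{2m_{train}} \sum_{i=1}^{m_{train}} \left( f_{\vec{w},b}\big(\vec{x}^{(i)}\big) - y^{(i)} \right)^2$$

$$J_{cv}(\vec{w},b) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( f_{\vec{w},b}\big(\vec{x}_{cv}^{(i)}\big) - y_{cv}^{(i)} \right)^2$$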

Cross-validation error (J_cv)

  • Decreases as training set size increases
  • More examples → better model → lower CV error
  • Algorithm learns patterns more effectively

Training error (J_train)

  • Increases as training set size increases
  • Small datasets: Easy to achieve zero/low error
  • Large datasets: Harder to fit all examples perfectly

High Bias (Underfitting)

  • Model: Simple (e.g., linear function)
  • Training error: Rises then plateaus (flattens out)
  • CV error: Decreases then plateaus
  • Gap: J_cv consistently higher than J_train
  • Baseline: Both errors remain above human-level performance
Important

If an algorithm has high bias, getting more training data will NOT help much.

In the high-bias case, learning curves plateau because (illustrated in the sketch after this list):

  • Simple models (straight lines) don’t change significantly with more data
  • Both J_cv and J_train flatten and stay flat regardless of dataset size
  • Curves never reach baseline performance level
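
As a concrete illustration, here is a minimal sketch (assuming NumPy and scikit-learn; the synthetic data and variable names are made up for this example) that fits a straight line to quadratic data at increasing training sizes. Both errors flatten well above zero, the high-bias signature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic quadratic data: a straight line cannot fit this well (high bias).
X = rng.uniform(-3, 3, size=(1000, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=1000)

X_cv, y_cv = X[800:], y[800:]  # held-out cross-validation set

for m in [20, 50, 100, 200, 400, 800]:
    model = LinearRegression().fit(X[:m], y[:m])
    j_train = mean_squared_error(y[:m], model.predict(X[:m]))  # J_train as plain MSE
    j_cv = mean_squared_error(y_cv, model.predict(X_cv))       # J_cv on the held-out set
    print(f"m={m:4d}  J_train={j_train:.2f}  J_cv={j_cv:.2f}")
```

No matter how large m gets, neither error approaches zero, because a straight line simply cannot represent the quadratic pattern.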

High Variance (Overfitting)

  • Model: Complex (e.g., 4th-order polynomial, small λ)
  • Training error: Low (sometimes below human performance)
  • CV error: Much higher than training error
  • Large gap: Significant difference between J_cv and J_train

If an algorithm has high variance, getting more training data IS likely to help.

With more data (see the sketch after this list):

  • J_train continues rising
  • J_cv comes down toward J_train
  • Gap between errors decreases
  • Performance approaches baseline
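
A companion sketch (same made-up synthetic setup) swaps in a 4th-order polynomial with a tiny regularization strength. At small m the gap between J_cv and J_train is large; as m grows, J_cv comes down toward J_train:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=1000)
X_cv, y_cv = X[800:], y[800:]

for m in [10, 20, 50, 100, 400, 800]:
    # 4th-order polynomial with near-zero lambda: prone to overfitting small sets.
    model = make_pipeline(PolynomialFeatures(degree=4), Ridge(alpha=1e-6))
    model.fit(X[:m], y[:m])
    j_train = mean_squared_error(y[:m], model.predict(X[:m]))
    j_cv = mean_squared_error(y_cv, model.predict(X_cv))
    print(f"m={m:4d}  J_train={j_train:.2f}  J_cv={j_cv:.2f}  gap={j_cv - j_train:.2f}")
```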

Plotting Learning Curves

To generate a learning curve (a minimal sketch follows the caveats below):

  1. Take subsets of the training data (100, 200, 300 examples…)
  2. Train a model on each subset size
  3. Evaluate J_train and J_cv for each
  4. Plot the results against training set size

Caveats:

  • Expensive: Requires training multiple models
  • Time-consuming: Not done frequently in practice
  • Mental model: Useful conceptual framework for understanding algorithm behavior
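
Steps 1–4 are what scikit-learn's learning_curve utility automates; a minimal sketch (the synthetic data is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=500)

# Trains the model on growing subsets and cross-validates each one.
sizes, train_scores, cv_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% ... 100% of the training data
    cv=5,
    scoring="neg_mean_squared_error",      # sklearn maximizes scores, hence "neg"
    n_jobs=-1,                             # train the subset models in parallel
)

j_train = -train_scores.mean(axis=1)  # flip sign back to an error
j_cv = -cv_scores.mean(axis=1)
for m, jt, jc in zip(sizes, j_train, j_cv):
    print(f"m={m:4d}  J_train={jt:.2f}  J_cv={jc:.2f}")
```

The n_jobs argument trains the models in parallel, which mitigates the expense noted in the caveats.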

Learning curves help decide what to try next:

High Bias Scenario

  • Flat learning curves
  • More data won’t help
  • Need to increase model complexity

High Variance Scenario

  • Converging learning curves
  • More data will help
  • Focus on data collection

Flat Curves (Plateau)

  • Indicates high bias
  • Adding data provides diminishing returns
  • Focus on: more features, polynomial terms, reducing λ

Large Gap Between Curves

  • Indicates high variance
  • More data can close the gap
  • Continue data collection efforts
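
These rules of thumb can be encoded directly. A hypothetical helper (the name diagnose and the tol threshold are made up, not from any library) that reads out the tail of a learning curve:

```python
def diagnose(j_train: float, j_cv: float, baseline: float, tol: float = 0.1) -> str:
    """Rough bias/variance read-out from final learning-curve values.

    baseline is a reference error such as human-level performance;
    tol is an arbitrary threshold, tuned to your error scale.
    """
    high_bias = (j_train - baseline) > tol     # J_train far above baseline
    high_variance = (j_cv - j_train) > tol     # large gap between the curves

    if high_bias and high_variance:
        return "High bias and high variance: increase model complexity first, then revisit."
    if high_bias:
        return "High bias: more data won't help; add features/polynomial terms or reduce lambda."
    if high_variance:
        return "High variance: more data should help; consider increasing lambda too."
    return "Near baseline: performance is about as good as can be expected."

print(diagnose(j_train=0.92, j_cv=0.98, baseline=0.30))  # -> high bias verdict
```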

Learning curves provide visual insight into whether bias or variance is the primary bottleneck, guiding resource allocation decisions in machine learning projects.