Pablo Rodriguez

Diagnosing Bias Lab

This lab builds on performance evaluation by using bias/variance analysis to improve models. Comparing training and cross-validation (CV) errors indicates whether a model has a high bias (underfitting) or high variance (overfitting) problem.
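The comparison described above can be sketched as a small helper. This is only an illustration: the function name `diagnose`, the `baseline` argument, and the relative tolerance `tol` are assumptions made for this sketch, not names taken from the lab.

```python
def diagnose(train_error, cv_error, baseline, tol=0.1):
    """Label a model from its errors.

    `baseline` is the error level considered acceptable and `tol` is a
    hypothetical relative tolerance; both are assumptions of this sketch.
    """
    # High bias: training error sits well above the baseline.
    high_bias = train_error > baseline * (1 + tol)
    # High variance: CV error sits well above the training error.
    high_variance = cv_error > train_error * (1 + tol)
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "good fit"


print(diagnose(train_error=15.0, cv_error=16.0, baseline=10.0))  # high bias
print(diagnose(train_error=10.5, cv_error=20.0, baseline=10.0))  # high variance
```

The tolerance is a judgment call; in practice the gap between errors is usually judged against the scale of the problem rather than a fixed percentage.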

High Bias

Model not capturing training data patterns

  • High training error
  • High CV error

High Variance

Model overfitting training set

  • Low training error
  • High CV error

Fixing High Bias: Add Polynomial Features

  • Adding polynomial features helps the model learn more complex patterns
  • The example plots training and CV errors against polynomial degree
  • With a baseline performance of 400%, models above degree 4 achieve low bias
  • With a lower baseline (250%), higher degrees are needed to reach it

Fixing High Bias: Get Additional Features

  • A second feature is added to the dataset (2 columns instead of 1)
  • With the additional information, training error moves closer to the baseline
  • More features give the model a richer representation

Fixing High Bias: Decrease λ

  • Ridge regression is tested with various λ values
  • High λ (10) → training error worse than baseline → high bias
  • Decreasing λ lets the model learn more complex patterns
  • With lower λ values, training error approaches the baseline

Fixing High Variance: Increase λ

  • Very small λ values keep bias low but leave variance high
  • Increasing λ improves cross-validation error
  • Example: raising λ from 0.01 to 1.0 reduces overfitting

Fixing High Variance: Remove Irrelevant Features

  • Irrelevant features (such as patient IDs) cause overfitting
  • Comparison: 2 features vs 3 features (the third is a random ID)
  • The 3-feature model has higher CV error, especially with polynomial terms
  • At degree=4, the gap between training and CV error is wider with the extra feature

Fixing High Variance: Get More Training Examples

  • Learning curves plot error against training set size
  • The example uses a 4th-degree polynomial model
  • CV error approaches training error as the dataset grows
  • More examples won’t solve high bias (training error stays flat)

High Variance Solutions:

  • Increase λ (regularization)
  • Remove irrelevant features
  • Collect more training data
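To make the effect of λ concrete, here is a minimal sketch of a regularization sweep. It assumes synthetic data and a closed-form ridge solution written with NumPy; the lab's own dataset and model code are not shown in these notes, so the helper names (`poly_features`, `fit_ridge`, `mse`) and the chosen λ values are illustrative only.

```python
import numpy as np


def poly_features(x, degree):
    """Columns x, x^2, ..., x^degree for a 1-D input array."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])


def fit_ridge(X, y, lam):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def mse(X, y, w):
    """Mean squared error of weights w (intercept first) on (X, y)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((Xb @ w - y) ** 2))


# Synthetic stand-in data (an assumption, not the lab's dataset).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, 60)
x_train, y_train, x_cv, y_cv = x[:30], y[:30], x[30:], y[30:]

# A deliberately flexible model: degree-10 polynomial on 30 examples.
X_train = poly_features(x_train, 10)
X_cv = poly_features(x_cv, 10)

results = {}
for lam in [0.0001, 0.01, 1.0, 10.0]:
    w = fit_ridge(X_train, y_train, lam)
    results[lam] = (mse(X_train, y_train, w), mse(X_cv, y_cv, w))
    print(f"lambda={lam:<7} train MSE={results[lam][0]:.3f}  CV MSE={results[lam][1]:.3f}")
```

Training error can only rise as λ grows, so the value to pick is the one with the lowest CV error rather than the lowest training error.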

High Bias Solutions:

  • Add polynomial features
  • Collect additional relevant features
  • Decrease λ values
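The polynomial-degree experiment can be approximated with a short sweep. This sketch uses `np.polyfit` on synthetic data as a stand-in for the lab's model and dataset, and the `baseline` value is a hypothetical threshold chosen for illustration.

```python
import numpy as np

# Synthetic stand-in data (an assumption, not the lab's dataset).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 80)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, 80)
x_train, y_train = x[:50], y[:50]
x_cv, y_cv = x[50:], y[50:]


def mse(coeffs, x_eval, y_eval):
    """Mean squared error of a fitted polynomial on (x_eval, y_eval)."""
    return float(np.mean((np.polyval(coeffs, x_eval) - y_eval) ** 2))


baseline = 0.05  # hypothetical acceptable error level
degrees = list(range(1, 9))
train_mses, cv_mses = [], []
for degree in degrees:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mses.append(mse(coeffs, x_train, y_train))
    cv_mses.append(mse(coeffs, x_cv, y_cv))
    print(f"degree={degree}  train MSE={train_mses[-1]:.4f}  CV MSE={cv_mses[-1]:.4f}")

# Lowest degree whose training error reaches the baseline, or None if none does.
low_bias_degree = next(
    (d for d, e in zip(degrees, train_mses) if e <= baseline), None
)
print("lowest degree at or below baseline:", low_bias_degree)
```

Lowering `baseline` pushes the required degree up, which mirrors the observation that a stricter baseline needs more polynomial terms.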

Learning Curves:

  • Show whether more data will help
  • High variance: CV error decreases toward training error
  • High bias: Both errors plateau regardless of data size
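A learning curve can be computed by refitting the same model on growing slices of the training set. The sketch below assumes synthetic data and `np.polyfit`; the function name `learning_curve` and the chosen sizes are illustrative, not taken from the lab.

```python
import numpy as np

rng = np.random.default_rng(2)


def make_data(n):
    """Synthetic stand-in data (an assumption, not the lab's dataset)."""
    x = rng.uniform(-1.0, 1.0, n)
    return x, np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)


x_train, y_train = make_data(200)
x_cv, y_cv = make_data(100)


def mse(coeffs, x_eval, y_eval):
    """Mean squared error of a fitted polynomial on (x_eval, y_eval)."""
    return float(np.mean((np.polyval(coeffs, x_eval) - y_eval) ** 2))


def learning_curve(degree, sizes):
    """Return (size, train MSE, CV MSE) for each training set size."""
    rows = []
    for m in sizes:
        coeffs = np.polyfit(x_train[:m], y_train[:m], degree)
        rows.append((m, mse(coeffs, x_train[:m], y_train[:m]), mse(coeffs, x_cv, y_cv)))
    return rows


for m, train_err, cv_err in learning_curve(degree=4, sizes=[10, 20, 50, 100, 200]):
    print(f"m={m:<4} train MSE={train_err:.4f}  CV MSE={cv_err:.4f}")
```

Calling `learning_curve(degree=1, ...)` on the same data is one way to see the high-bias pattern, where both errors level off and extra examples stop helping.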

The lab demonstrates systematic approaches to diagnosing and fixing bias/variance problems through practical experimentation with real datasets.