Lab: Diagnosing Bias and Variance

Lab Overview

This lab builds on the earlier work on performance evaluation and explores how bias/variance analysis can guide model improvement. Comparing training error and cross-validation (CV) error against a baseline indicates whether a model suffers from high bias (underfitting) or high variance (overfitting).

High Bias

The model fails to capture the patterns in the training data

High training error (well above baseline)
High CV error (close to training error)

High Variance

The model overfits the training set

Low training error (at or near baseline)
High CV error (well above training error)
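
The two diagnoses above can be sketched as a simple rule of thumb. This is a minimal illustration, not the lab's own code: the function name `diagnose`, the `tolerance` margin of 10%, and the error values in the usage lines are all hypothetical.

```python
def diagnose(train_error, cv_error, baseline_error, tolerance=0.1):
    """Label a model from its errors.

    `tolerance` is a hypothetical relative margin: gaps smaller than
    this fraction are treated as "close enough".
    """
    # Training error well above the baseline suggests underfitting.
    high_bias = train_error > baseline_error * (1 + tolerance)
    # CV error well above training error suggests overfitting.
    high_variance = cv_error > train_error * (1 + tolerance)

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "ok"


# Hypothetical error values, chosen only to exercise each branch.
print(diagnose(train_error=900, cv_error=950, baseline_error=400))  # high bias
print(diagnose(train_error=380, cv_error=800, baseline_error=400))  # high variance
```

In practice the margin would depend on the error metric and on how noisy the CV estimate is.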

Fixing High Bias

Adding Polynomial Features

Adding polynomial features lets the model fit more complex patterns
The example plots training and CV error against polynomial degree
With a baseline error of 400, models above degree 4 reach low bias
With a lower baseline (250), higher degrees are needed to reach low bias
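
A degree sweep of this kind might look like the sketch below. It assumes synthetic data (a noisy sine curve) and plain NumPy polynomial fitting, so the error values will not match the lab's numbers; the helper name `errors_by_degree` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic one-feature dataset (an assumption, not the lab's data).
x = rng.uniform(-1.0, 1.0, size=120)
y = np.sin(3.0 * x) + rng.normal(scale=0.1, size=x.size)

x_train, y_train = x[:80], y[:80]
x_cv, y_cv = x[80:], y[80:]


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def errors_by_degree(max_degree):
    """Fit one polynomial per degree and record (degree, train MSE, CV MSE)."""
    results = []
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = mse(y_train, np.polyval(coeffs, x_train))
        cv_err = mse(y_cv, np.polyval(coeffs, x_cv))
        results.append((degree, train_err, cv_err))
    return results


results = errors_by_degree(8)
for degree, train_err, cv_err in results:
    print(f"degree={degree}  train MSE={train_err:.4f}  CV MSE={cv_err:.4f}")
```

Training error can only fall or stay level as the degree grows, so the degree at which it drops to the baseline marks where the model stops being high bias.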

Getting Additional Features

A second feature is added to the dataset (2 columns instead of 1)
With the extra information, training error moves closer to baseline
Relevant additional features give the model a richer representation of the problem

Decreasing Regularization Parameter

Ridge regression is fitted with a range of λ values
High λ (10) → training error worse than baseline → high bias
Decreasing λ lets the model fit more complex patterns
Lower λ values bring training error closer to baseline
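
The effect of lowering λ on training error can be illustrated with a closed-form ridge fit. This is a sketch under stated assumptions: synthetic data, a degree-4 polynomial, a sum-of-squares objective with the intercept left unpenalized, and hypothetical helper names `poly_features` and `fit_ridge`. It is not the lab's implementation.

```python
import numpy as np


def poly_features(x, degree):
    # Columns: 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    # Minimizes ||y - Xw||^2 + lam * ||w[1:]||^2 (intercept not penalized).
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, size=50)  # synthetic data (assumption)
y_train = np.sin(3.0 * x_train) + rng.normal(scale=0.1, size=x_train.size)
X_train = poly_features(x_train, degree=4)

lambdas = [10.0, 1.0, 0.1, 0.01]  # from strong to weak regularization
train_errors = []
for lam in lambdas:
    w = fit_ridge(X_train, y_train, lam)
    train_errors.append(float(np.mean((X_train @ w - y_train) ** 2)))
    print(f"lambda={lam:<5} train MSE={train_errors[-1]:.4f}")
```

For a ridge objective, training error never increases as λ shrinks, which is why decreasing λ is a remedy for high bias rather than for high variance.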

Fixing High Variance

Increasing Regularization Parameter

Small λ values keep bias low but leave variance high
Increasing λ lowers CV error
Example: raising λ from 0.01 to 1.0 reduces overfitting
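
One way to pick λ is to sweep a few values and keep the one with the lowest CV error. The sketch below assumes synthetic data, a deliberately flexible degree-10 polynomial trained on only 20 points, and a hypothetical λ grid; whether a larger λ actually wins depends on the data.

```python
import numpy as np


def poly_features(x, degree):
    # Columns: 1, x, x^2, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(X, y, lam):
    # Closed-form ridge fit; the intercept column is not penalized.
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=60)  # synthetic data (assumption)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=x.size)
X = poly_features(x, degree=10)

# A small training set and a flexible model invite overfitting.
X_train, y_train = X[:20], y[:20]
X_cv, y_cv = X[20:], y[20:]

lambdas = [0.01, 0.03, 0.1, 0.3, 1.0]  # hypothetical grid
cv_errors = {}
for lam in lambdas:
    w = fit_ridge(X_train, y_train, lam)
    cv_errors[lam] = mse(X_cv, y_cv, w)
    print(f"lambda={lam:<5} train MSE={mse(X_train, y_train, w):.4f}  "
          f"CV MSE={cv_errors[lam]:.4f}")

best_lambda = min(cv_errors, key=cv_errors.get)
print("lambda with lowest CV error:", best_lambda)
```

Printing training and CV error side by side makes the gap between them visible at each λ, which is the quantity that shrinks as overfitting is reduced.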

Smaller Feature Sets

Irrelevant features (like patient IDs) can cause overfitting
Comparison: 2 features vs 3 features (the third is a random ID)
The 3-feature model has higher CV error, especially with polynomial terms
At degree=4: wider gap between training and CV error with the extra feature
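
The comparison can be sketched with ordinary least squares on synthetic data. Everything here is an assumption made for illustration: the two informative features, the `random_id` column standing in for a patient ID, and the helper `train_cv_mse`. The model is kept linear for brevity, so the effect is likely to be milder than with the polynomial terms described above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 80

# Two informative features and a target that depends only on them.
features = rng.normal(size=(n, 2))
y = 3.0 * features[:, 0] - 2.0 * features[:, 1] + rng.normal(scale=0.5, size=n)

# Stand-in for a patient ID: shuffled numbers unrelated to the target.
random_id = rng.permutation(n).astype(float) / n


def add_bias(X):
    return np.column_stack([np.ones(len(X)), X])


def train_cv_mse(X, y, n_train=50):
    """Fit least squares on the first n_train rows, evaluate on the rest."""
    X = add_bias(X)
    w, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    train_err = float(np.mean((X[:n_train] @ w - y[:n_train]) ** 2))
    cv_err = float(np.mean((X[n_train:] @ w - y[n_train:]) ** 2))
    return train_err, cv_err


two = train_cv_mse(features, y)
three = train_cv_mse(np.column_stack([features, random_id]), y)

print(f"2 features: train MSE={two[0]:.4f}  CV MSE={two[1]:.4f}")
print(f"3 features: train MSE={three[0]:.4f}  CV MSE={three[1]:.4f}")
```

The extra column can only lower training error, never raise it, so any improvement it brings on the training set without a matching improvement on the CV set is a sign of fitting noise.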

More Training Examples

Learning curves plot error against training set size
The example uses a 4th-degree polynomial model
CV error approaches training error as the dataset grows
More examples won’t fix high bias (training error stays flat)
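
A learning curve for a 4th-degree polynomial could be computed roughly as follows. The data is synthetic and the subset sizes are hypothetical; the helper name `learning_curve` is chosen for this sketch and is not taken from the lab.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (assumption): 200 points to train on, 60 held out for CV.
x = rng.uniform(-1.0, 1.0, size=260)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)
x_pool, y_pool = x[:200], y[:200]
x_cv, y_cv = x[200:], y[200:]


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def learning_curve(sizes, degree=4):
    """Train on the first m points for each m and record both errors."""
    curve = []
    for m in sizes:
        coeffs = np.polyfit(x_pool[:m], y_pool[:m], degree)
        train_err = mse(y_pool[:m], np.polyval(coeffs, x_pool[:m]))
        cv_err = mse(y_cv, np.polyval(coeffs, x_cv))
        curve.append((m, train_err, cv_err))
    return curve


curve = learning_curve([10, 25, 50, 100, 200])
for m, train_err, cv_err in curve:
    print(f"m={m:<4} train MSE={train_err:.4f}  CV MSE={cv_err:.4f}")
```

Reading the output row by row shows whether the gap between the two errors is still closing at the largest size, which is the signal that more data may help.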

Key Lab Insights

High Variance Solutions:

Increase λ (regularization)
Remove irrelevant features
Collect more training data

High Bias Solutions:

Add polynomial features
Collect additional relevant features
Decrease λ values

Learning Curves:

Show whether more data will help
High variance: CV error decreases toward training error
High bias: Both errors plateau regardless of data size

The lab demonstrates systematic approaches to diagnosing and fixing bias/variance problems through practical experimentation with real datasets.