
Establishing a Baseline Level of Performance

Before diagnosing bias or variance, establish what level of error is reasonably achievable. This baseline helps determine if training error is genuinely “high” or acceptable given task constraints.

Key Insight

Don’t evaluate training error in isolation; compare it against the level of error that is realistically achievable for the task.

Common sources for a baseline level of performance:

Human-Level Performance

  • Best for: Unstructured data (audio, images, text)
  • Example: Speech recognition where humans achieve 10.6% error due to noisy audio
  • Humans excel at pattern recognition in natural data

Competing Algorithms and Prior Solutions

  • Previous implementations
  • Competitor solutions
  • Industry benchmarks
  • Published research results

Informed Estimates

  • Domain expertise
  • Historical project performance
  • Theoretical limits
  • Business requirements

Consider the speech recognition example, where human-level error is 10.6%:

Without Baseline

  • Training error: 10.8%
  • Appears high, suggests bias problem
  • May lead to wrong optimization approach

With Baseline (10.6%)

  • Training error: 10.8% (only 0.2% above baseline)
  • CV error: 14.8% (4% gap from training)
  • Correctly identifies variance problem
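A minimal sketch of this comparison in Python (the error values come from the speech recognition example above; treating whichever gap is larger as the dominant problem is an illustrative simplification, not a fixed rule):

```python
# Speech recognition example: human-level error serves as the baseline.
baseline_err = 0.106   # human-level (baseline) error: 10.6%
train_err = 0.108      # training error: 10.8%
cv_err = 0.148         # cross-validation error: 14.8%

# Baseline-relative gaps drive the diagnosis, not absolute error values.
bias_gap = train_err - baseline_err   # distance from achievable performance
variance_gap = cv_err - train_err     # degradation on unseen data

print(f"Baseline gap (bias indicator):      {bias_gap:.1%}")      # 0.2%
print(f"Train->CV gap (variance indicator): {variance_gap:.1%}")  # 4.0%

if variance_gap > bias_gap:
    print("Dominant issue: variance")
else:
    print("Dominant issue: bias")
```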

Zero vs. Non-Zero Baselines

  • Zero baseline: Appropriate for tasks requiring perfect performance
  • Non-zero baseline: Realistic for noisy data applications
    • Speech recognition with background noise
    • Medical diagnosis with ambiguous cases
    • Image recognition with poor quality inputs

Real-world data often contains:

  • Noise: Background sounds, visual artifacts
  • Ambiguity: Multiple valid interpretations
  • Missing information: Incomplete data points
  • Human limitations: Tasks exceeding human capability

Decision Process

  1. Compare training error to the baseline (not an absolute threshold)
  2. Measure the gap between CV error and training error
  3. Use baseline-relative metrics for decision making

Example: High Bias

  • Baseline: 5%
  • Training: 15% (10% gap)
  • CV: 16% (1% gap)
  • Conclusion: Focus on reducing bias
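Running the same gap arithmetic on these numbers (a quick sketch using the values from the list above):

```python
baseline_err, train_err, cv_err = 0.05, 0.15, 0.16

print(f"Baseline gap:  {train_err - baseline_err:.0%}")  # 10% -> high bias
print(f"Train->CV gap: {cv_err - train_err:.0%}")        # 1%  -> variance is acceptable
```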

How to Measure a Baseline

  • Measure human performance on representative samples
  • Research published benchmarks for similar tasks
  • Test existing solutions on your specific dataset

Keeping the Comparison Fair

  • Ensure the baseline uses the same evaluation metrics
  • Test on a similar data distribution
  • Account for task-specific constraints
  • Consider resource limitations (time, budget)
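As an illustration of the first tip, here is a small sketch of estimating a human-level baseline from a hand-labeled sample. The arrays and the plain error-rate metric are assumptions for the example; the metric should match whatever you use for training and CV error:

```python
import numpy as np

# Hypothetical ground-truth labels and a human annotator's labels
# for a small representative sample of the evaluation data.
y_true  = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_human = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

# Use the same metric as the model's training/CV error (here: misclassification rate)
# so the baseline is directly comparable.
human_error = np.mean(y_true != y_human)
print(f"Estimated human-level baseline: {human_error:.0%}")  # 20% on this tiny sample
```

A larger sample, and where possible multiple annotators, gives a more trustworthy estimate.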

Setting an appropriate baseline transforms bias/variance analysis from guesswork into data-driven decision making, leading to more effective algorithm improvements.