
Establishing a Baseline Level of Performance

Before diagnosing bias or variance, establish what level of error is reasonably achievable. This baseline helps determine if training error is genuinely “high” or acceptable given task constraints.

Key Insight

Don’t evaluate training error in isolation; compare it against the level of error that is realistically achievable for the task.

Common sources for a baseline level of performance:

Human-Level Performance

  • Best for: Unstructured data (audio, images, text)
  • Example: Speech recognition where humans achieve 10.6% error due to noisy audio
  • Humans excel at pattern recognition in natural data

Competing Algorithms and Prior Solutions

  • Previous implementations
  • Competitor solutions
  • Industry benchmarks
  • Published research results

Informed Estimates

  • Domain expertise
  • Historical project performance
  • Theoretical limits
  • Business requirements

Consider the speech recognition example, where human-level error is 10.6%:

Without Baseline

  • Training error: 10.8%
  • Appears high, suggests bias problem
  • May lead to wrong optimization approach

With Baseline (10.6%)

  • Training error: 10.8% (only 0.2% above baseline)
  • CV error: 14.8% (4% gap from training)
  • Correctly identifies variance problem
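A minimal sketch of this comparison in Python (the error values come from the speech recognition example above; treating whichever gap is larger as the dominant problem is an illustrative simplification, not a fixed rule):

```python
# Speech recognition example: human-level error serves as the baseline.
baseline_err = 0.106   # human-level (baseline) error: 10.6%
train_err = 0.108      # training error: 10.8%
cv_err = 0.148         # cross-validation error: 14.8%

# Baseline-relative gaps drive the diagnosis, not absolute error values.
bias_gap = train_err - baseline_err   # distance from achievable performance
variance_gap = cv_err - train_err     # degradation on unseen data

print(f"Baseline gap (bias indicator):      {bias_gap:.1%}")      # 0.2%
print(f"Train->CV gap (variance indicator): {variance_gap:.1%}")  # 4.0%

if variance_gap > bias_gap:
    print("Dominant issue: variance")
else:
    print("Dominant issue: bias")
```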

Zero vs. Non-Zero Baselines

  • Zero baseline: Appropriate for tasks requiring perfect performance
  • Non-zero baseline: Realistic for noisy data applications
    • Speech recognition with background noise
    • Medical diagnosis with ambiguous cases
    • Image recognition with poor quality inputs

Real-world data often contains:

  • Noise: Background sounds, visual artifacts
  • Ambiguity: Multiple valid interpretations
  • Missing information: Incomplete data points
  • Human limitations: Tasks exceeding human capability

Decision Process

  1. Compare training error to the baseline (not an absolute threshold)
  2. Measure the gap between CV error and training error
  3. Use baseline-relative metrics for decision making

Example: High Bias

  • Baseline: 5%
  • Training: 15% (10% gap)
  • CV: 16% (1% gap)
  • Conclusion: Focus on reducing bias
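Running the same gap arithmetic on these numbers (a quick sketch using the values from the list above):

```python
baseline_err, train_err, cv_err = 0.05, 0.15, 0.16

print(f"Baseline gap:  {train_err - baseline_err:.0%}")  # 10% -> high bias
print(f"Train->CV gap: {cv_err - train_err:.0%}")        # 1%  -> variance is acceptable
```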

How to Measure a Baseline

  • Measure human performance on representative samples
  • Research published benchmarks for similar tasks
  • Test existing solutions on your specific dataset

Keeping the Comparison Fair

  • Ensure the baseline uses the same evaluation metrics
  • Test on a similar data distribution
  • Account for task-specific constraints
  • Consider resource limitations (time, budget)
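As an illustration of the first tip, here is a small sketch of estimating a human-level baseline from a hand-labeled sample. The arrays and the plain error-rate metric are assumptions for the example; the metric should match whatever you use for training and CV error:

```python
import numpy as np

# Hypothetical ground-truth labels and a human annotator's labels
# for a small representative sample of the evaluation data.
y_true  = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_human = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])

# Use the same metric as the model's training/CV error (here: misclassification rate)
# so the baseline is directly comparable.
human_error = np.mean(y_true != y_human)
print(f"Estimated human-level baseline: {human_error:.0%}")  # 20% on this tiny sample
```

A larger sample, and where possible multiple annotators, gives a more trustworthy estimate.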

Setting an appropriate baseline transforms bias/variance analysis from guesswork into data-driven decision making, leading to more effective algorithm improvements.