Pablo Rodriguez

Error Analysis

Second Most Important

After bias/variance analysis, error analysis is the second most important diagnostic tool for improving learning algorithm performance.

Core Concept: Manually examine misclassified examples from the cross-validation set and group them by common patterns and traits.

  • Cross-validation examples: m_cv = 500
  • Misclassified examples: 100 out of 500
  • Approach: Manual inspection of these 100 misclassified cases
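The first step above, pulling out the misclassified cross-validation examples, can be sketched as follows. This is a minimal illustration, assuming the labels and predictions are plain Python lists; the function name `misclassified_indices` and the toy data are hypothetical, not taken from the course material.

```python
def misclassified_indices(y_true, y_pred):
    """Return the positions where the prediction differs from the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Toy cross-validation labels (1 = spam, 0 = not spam); assumed data for illustration.
y_cv = [1, 0, 1, 1, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0]

errors = misclassified_indices(y_cv, y_hat)
print(errors)  # [1, 3]
print(f"{len(errors)} of {len(y_cv)} examples misclassified")
```

The returned indices are what a reviewer would then open and read by hand.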

After examining 100 misclassified spam examples:

Pharmaceutical Spam

21 examples - Medicine/drug sales (high impact problem)

Password Phishing

18 examples - Attempting to steal passwords (high impact problem)

Unusual Email Routing

7 examples - Suspicious server paths (moderate impact)

Embedded Image Spam

5 examples - Spam message in images (moderate impact)

Deliberate Misspellings

3 examples - “w4tches”, “med1cine” (low impact problem)
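A tally like the one above can be kept in a few lines once each reviewed email has been tagged by hand. The sketch below assumes each misclassified email is represented as a set of category tags, so one email may carry several tags or none; the tag names and the five toy emails are hypothetical and are not the 100 examples from the notes.

```python
from collections import Counter


def tally_categories(tagged_errors):
    """Count how many reviewed emails carry each category tag."""
    counts = Counter()
    for tags in tagged_errors:
        # set() guards against a tag being listed twice for the same email
        counts.update(set(tags))
    return counts


# Hand-assigned tags for five reviewed emails (illustrative data only).
tagged = [
    {"pharma", "misspelling"},
    {"phishing"},
    {"pharma", "routing"},
    set(),  # an error that fits no category
    {"phishing"},
]

counts = tally_categories(tagged)
total = len(tagged)
for category, n in counts.most_common():
    print(f"{category}: {n}/{total} ({n / total:.0%})")
```

Sorting by count puts the largest categories first, which is the ordering used to decide where to spend effort.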

Priorities from the counts:

  • Pharmaceutical spam: 21/100 misclassified examples, a major problem requiring attention
  • Phishing emails: 18/100, a major problem requiring attention
  • Deliberate misspellings: 3/100, a minor problem with lower priority

Categories are not mutually exclusive:

  • A single email can belong to multiple categories
  • Example: Pharmaceutical spam with unusual routing AND deliberate misspellings
  • Overlapping counts: Such an email is counted once in each category it matches, so the category counts need not sum to the number of examples reviewed

Choosing how many examples to review:

  • Small datasets: Examine all misclassified examples (e.g., 100 examples)
  • Large datasets: Randomly sample ~100-200 examples for manual review
  • Time constraint: Choose a sample size manageable for the available team/time
  • Statistical validity: ~100 examples are usually enough to show which error categories are most common
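The sampling guidance above can be sketched as follows. The helper names, the default of 100, and the fixed seed are assumptions made for illustration. The second function is a rough binomial standard error for a category's share, added only as a back-of-the-envelope way to see why about 100 examples is usually enough to separate a 21% category from a 3% one; it assumes the reviewed examples are a random sample.

```python
import math
import random


def sample_for_review(error_indices, k=100, seed=0):
    """Return all errors if there are at most k, else k chosen uniformly at random."""
    error_indices = list(error_indices)
    if len(error_indices) <= k:
        return error_indices  # small dataset: review everything
    rng = random.Random(seed)  # fixed seed keeps the review set reproducible
    return rng.sample(error_indices, k)


def share_standard_error(count, n):
    """Approximate standard error of the proportion count / n."""
    p = count / n
    return math.sqrt(p * (1 - p) / n)


review_set = sample_for_review(range(1000), k=100)
print(len(review_set))  # 100

# 21 of 100 reviewed errors were pharmaceutical spam: the share is about 0.21
# with a standard error of about 0.04.
print(round(share_standard_error(21, 100), 3))
```

With an uncertainty of roughly four percentage points on the largest category, the ranking of major versus minor categories is unlikely to change with a larger sample.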

Based on pharmaceutical spam being a major issue:

Data Collection:

  • Collect more pharmaceutical spam examples specifically
  • Focus on drug names and pharmaceutical product features

Feature Engineering:

  • Develop features for specific drug names
  • Create pharmaceutical product detection algorithms
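As one possible shape for such features, the sketch below counts pharmaceutical terms in an email after undoing a few digit-for-letter substitutions such as "med1cine". The vocabulary in `DRUG_TERMS`, the substitution table, and the feature names are all hypothetical placeholders; a real term list would come from the collected pharmaceutical spam examples.

```python
import re

# Hypothetical, illustrative vocabulary; not a list from the course material.
DRUG_TERMS = {"medicine", "pills", "pharmacy", "prescription"}

# Undo a few common digit-for-letter substitutions (e.g. "med1cine" -> "medicine").
# Genuine numbers get mangled too, which is harmless for term matching.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s"})


def pharma_features(text):
    """Return simple pharmaceutical-term features for one email body."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    normalized = [t.translate(LEET_MAP) for t in tokens]
    hits = sum(1 for t in normalized if t in DRUG_TERMS)
    return {"pharma_term_count": hits, "has_pharma_term": int(hits > 0)}


print(pharma_features("Cheap med1cine, no prescription needed"))
# {'pharma_term_count': 2, 'has_pharma_term': 1}
```

The resulting values would be appended to the classifier's existing feature vector; whether they help is something to confirm on the cross-validation set.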

Phishing-Specific Improvements:

  • Analyze URLs in emails for suspicious links
  • Create features for detecting phishing patterns
  • Collect more phishing email examples

Limitations of error analysis:

  • Works well: Problems humans can evaluate (email spam classification)
  • Challenging: Tasks where human performance is poor
  • Example limitation: Predicting ad clicks, where humans can’t reliably predict user behavior

Applications where even humans struggle:

  • Recommender systems
  • Complex user behavior prediction
  • Highly technical/specialized domains

Error analysis provides concrete direction for improvement efforts, potentially saving months of otherwise fruitless work by focusing attention on high-impact problem areas.