Second Most Important
After bias/variance analysis, error analysis is the second most important diagnostic tool for improving a learning algorithm's performance.
Core Concept : Manually examine the misclassified examples from the cross-validation set to identify patterns and common traits.
Cross-validation examples : m_cv = 500
Misclassified examples : 100 out of 500
Approach : Manual inspection of these 100 misclassified cases
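The first step is simply to collect the cross-validation examples the model got wrong. A minimal sketch, assuming labels and predictions are plain 0/1 lists (1 = spam); the function name misclassified_indices and the toy data are hypothetical:

```python
from typing import List, Sequence


def misclassified_indices(y_true: Sequence[int], y_pred: Sequence[int]) -> List[int]:
    """Return indices of cross-validation examples whose prediction differs from the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Hypothetical toy cross-validation data: 1 = spam, 0 = not spam
y_cv = [1, 0, 1, 1, 0, 0, 1, 0]
y_hat = [1, 0, 0, 1, 1, 0, 0, 0]

errors = misclassified_indices(y_cv, y_hat)
print(errors)  # [2, 4, 6]
print(f"{len(errors)} of {len(y_cv)} misclassified")
```

The returned indices are what a reviewer would then open and read by hand; with m_cv = 500 the same call would yield the 100 cases to inspect.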
After examining 100 misclassified spam examples:
Pharmaceutical Spam
21 examples - Medicine/drug sales
High impact problem
Password Phishing
18 examples - Attempting to steal passwords
High impact problem
Unusual Email Routing
7 examples - Suspicious server paths
Moderate impact problem
Embedded Image Spam
5 examples - Spam message in images
Moderate impact problem
Deliberate Misspellings
3 examples - “w4tches”, “med1cine”
Low impact problem
Pharmaceutical spam : 21/100 = Major problem requiring attention
Phishing emails : 18/100 = Major problem requiring attention
Deliberate misspellings : 3/100 = Minor problem, lower priority
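Turning the hand-tallied counts into fractions and sorting them makes the priority order explicit. A small sketch using the counts from the 100 reviewed examples above; the helper name rank_error_categories and the category keys are illustrative, not part of any library:

```python
from typing import Dict, List, Tuple


def rank_error_categories(counts: Dict[str, int], n_examined: int) -> List[Tuple[str, float]]:
    """Convert per-category counts into fractions of examined errors, largest first."""
    fractions = [(name, count / n_examined) for name, count in counts.items()]
    return sorted(fractions, key=lambda item: item[1], reverse=True)


# Counts from the 100 misclassified spam examples reviewed above
counts = {
    "pharmaceutical": 21,
    "phishing": 18,
    "unusual_routing": 7,
    "embedded_image": 5,
    "misspellings": 3,
}

for name, frac in rank_error_categories(counts, n_examined=100):
    print(f"{name:16s} {frac:.0%}")
```

Categories at the top of the list (pharmaceutical, phishing) are the ones worth engineering effort; the ones at the bottom are candidates to defer.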
Not mutually exclusive : A single email can belong to multiple categories
Example : Pharmaceutical spam with unusual routing AND deliberate misspellings
Overlapping counts : One email may be counted in several categories, so the category counts can add up to more than the number of examples reviewed
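Because categories overlap, each reviewed email is best recorded as a set of tags rather than a single label. A minimal sketch, assuming hypothetical email IDs and tag names, that counts tags with collections.Counter and shows the total exceeding the number of emails:

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical tags assigned during manual review; one email may carry several tags
reviewed: Dict[str, Set[str]] = {
    "email_001": {"pharmaceutical", "unusual_routing", "misspellings"},
    "email_002": {"phishing"},
    "email_003": {"pharmaceutical"},
}

# Count each tag once per email that carries it
category_counts = Counter(tag for tags in reviewed.values() for tag in tags)

print(category_counts["pharmaceutical"])  # 2
print(sum(category_counts.values()), "tags across", len(reviewed), "emails")  # 5 tags across 3 emails
```

Each category's fraction is still computed against the number of emails reviewed, not against the total number of tags.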
Small datasets : Examine all misclassified examples (e.g., 100 examples)
Large datasets : Randomly sample ~100-200 misclassified examples for manual review
Time constraint : Choose a sample size the available team can review in the time available
Statistical validity : ~100 examples usually give enough statistics to see which error types are most common
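The sampling rule can be expressed directly: review everything when the error set is small, otherwise draw a uniform random subset. A sketch assuming errors are held in a list of example IDs; sample_for_review and its max_review default are hypothetical choices:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_for_review(errors: Sequence[T], max_review: int = 100, seed: Optional[int] = None) -> List[T]:
    """Return all errors if there are few; otherwise a uniform random sample of size max_review."""
    if len(errors) <= max_review:
        return list(errors)
    rng = random.Random(seed)  # seeded for a reproducible review batch
    return rng.sample(list(errors), max_review)


# Hypothetical: 1,000 misclassified example IDs from a large cross-validation set
error_ids = list(range(1000))
batch = sample_for_review(error_ids, max_review=150, seed=0)
print(len(batch))  # 150
```

Sampling uniformly, rather than taking the first N errors, keeps the category fractions representative of the full error set.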
Since pharmaceutical spam is a major issue:
Data Collection :
Collect more pharmaceutical spam examples specifically
Focus on drug names and pharmaceutical product features
Feature Engineering :
Develop features for specific drug names
Create pharmaceutical product detection algorithms
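One simple form of drug-name features is a set of binary indicators for whether each term appears in the email. A sketch under stated assumptions: the DRUG_TERMS list is a hypothetical placeholder (a real system would use a curated vocabulary), and matching is whole-word and case-insensitive:

```python
import re
from typing import Dict, Iterable

# Hypothetical keyword list; a real system would use a curated drug-name vocabulary
DRUG_TERMS = ("pharmacy", "prescription", "pills", "medicine", "dosage")


def drug_name_features(text: str, terms: Iterable[str] = DRUG_TERMS) -> Dict[str, int]:
    """Binary features: 1 if the term appears as a whole word (case-insensitive), else 0."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    feats = {f"has_{t}": int(t in words) for t in terms}
    feats["drug_term_count"] = sum(feats.values())  # how many distinct terms matched
    return feats


print(drug_name_features("Cheap PILLS, no prescription needed!"))
```

Whole-word matching will not catch deliberate misspellings such as "med1cine"; that is one reason the misspelling category is tracked separately.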
Phishing-Specific Improvements :
Analyze URLs in emails for suspicious links
Create features for detecting phishing patterns
Collect more phishing email examples
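URL analysis can start from a few hand-designed features extracted from each link. A minimal sketch; the specific features (link count, IP-address hosts, "@" inside a URL, distinct hosts) are illustrative assumptions about what might signal phishing, not a prescribed feature set:

```python
import re
from typing import Dict
from urllib.parse import urlparse

# Rough URL matcher: http(s):// followed by non-space, non-quote characters
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_features(body: str) -> Dict[str, int]:
    """Hypothetical phishing-oriented features computed from the URLs in an email body."""
    urls = URL_RE.findall(body)
    hosts = [(urlparse(u).hostname or "") for u in urls]
    return {
        "num_urls": len(urls),
        # Links to a raw IP address instead of a domain name
        "has_ip_host": int(any(IPV4_RE.match(h) for h in hosts)),
        # "user@host" URLs can disguise the real destination
        "has_at_in_url": int(any("@" in u for u in urls)),
        "num_distinct_hosts": len(set(hosts)),
    }


body = "Verify your password at http://192.168.0.1/login now, or visit https://example.com/help."
print(url_features(body))
```

These features would be appended to the existing feature vector; error analysis on the next batch of misclassified emails then shows whether the phishing count actually drops.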
Works well : Problems where humans can judge the correct answer (e.g., email spam classification)
Challenging : Tasks where human performance is poor
Example limitation : Predicting ad clicks - humans can’t reliably predict user behavior
Applications where even humans struggle:
Recommender systems
Complex user behavior prediction
Highly technical/specialized domains
Error analysis provides concrete direction for improvement efforts, potentially saving months of otherwise fruitless work by focusing attention on high-impact problem areas.