
Error Metrics for Skewed Datasets

When the ratio of positive to negative examples is very far from 50-50, standard error metrics like accuracy become misleading.

Scenario: Binary classifier for rare disease detection

  • y = 1: Disease present
  • y = 0: Disease absent
  • Algorithm result: 1% error (99% accuracy)

Seems impressive, but…

If only 0.5% of the population has the disease:

Simple baseline algorithm:

baseline.py
# Non-learning baseline: ignores the input and always predicts "no disease"
def predict_disease(patient_data):
    return 0  # Always predict y = 0 (no disease)

Baseline performance: 99.5% accuracy (0.5% error)

Now compare three algorithms:

  • Algorithm A: 99.5% accuracy (0.5% error)
  • Algorithm B: 99.2% accuracy (0.8% error)
  • Algorithm C: 99.6% accuracy (0.4% error)

Problem: The lowest error may correspond to a useless predictor (one that always outputs y = 0) rather than a meaningful medical diagnosis.
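
To make the arithmetic concrete, here is a minimal sketch (assuming a hypothetical group of 1,000 patients; the 0.5% prevalence is the figure from the scenario above):

# Minimal sketch: accuracy of the always-predict-0 baseline when only
# 0.5% of patients actually have the disease.
n_patients = 1000                            # hypothetical population size
prevalence = 0.005                           # 0.5% of patients have y = 1
actual_positives = prevalence * n_patients   # 5 patients

# Always predicting y = 0 is wrong only on the actual positives.
errors = actual_positives
accuracy = 1 - errors / n_patients
print(f"Baseline accuracy: {accuracy:.1%}")  # Baseline accuracy: 99.5%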

Confusion matrix (y = 1 is the rare class):

                    Actual: 1              Actual: 0
Predicted: 1    15 (True Positive)     5 (False Positive)
Predicted: 0    10 (False Negative)   70 (True Negative)

True Positive

15 examples

  • Predicted: 1 (disease)
  • Actual: 1 (disease)
  • ✅ Correct positive prediction

False Positive

5 examples

  • Predicted: 1 (disease)
  • Actual: 0 (no disease)
  • ❌ Incorrect positive prediction

False Negative

10 examples

  • Predicted: 0 (no disease)
  • Actual: 1 (disease)
  • ❌ Missed disease cases

True Negative

70 examples

  • Predicted: 0 (no disease)
  • Actual: 0 (no disease)
  • ✅ Correct negative prediction
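
The four counts above can be tallied directly from paired labels and predictions. Below is a minimal sketch; confusion_counts is an illustrative helper, not something from the original notes:

# Minimal sketch: tallying confusion-matrix counts from labels and predictions.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# Example on a toy batch: one hit, one false alarm, one miss, two correct negatives.
y_true = [1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]
print(confusion_counts(y_true, y_pred))   # (1, 1, 1, 2)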

Precision: Of all patients predicted to have disease, what fraction actually has it?

Formula:

Precision = True Positives / (True Positives + False Positives)
= True Positives / Total Predicted Positive

Example calculation:

Precision = 15 / (15 + 5) = 15/20 = 0.75 (75%)

Interpretation: When algorithm predicts disease, it’s correct 75% of the time.
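
As a quick check of this calculation, a minimal sketch using the counts from the example table:

# Precision = TP / (TP + FP), with the counts from the example table.
tp, fp = 15, 5
precision = tp / (tp + fp)              # 15 / 20 = 0.75
print(f"Precision: {precision:.0%}")    # Precision: 75%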

Recall: Of all patients who actually have disease, what fraction did we correctly detect?

Formula:

Recall = True Positives / (True Positives + False Negatives)
= True Positives / Total Actual Positive

Example calculation:

Recall = 15 / (15 + 10) = 15/25 = 0.60 (60%)

Interpretation: Algorithm correctly identifies 60% of all disease cases.
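
And the corresponding sketch for recall, again using the example counts:

# Recall = TP / (TP + FN), with the counts from the example table.
tp, fn = 15, 10
recall = tp / (tp + fn)                 # 15 / 25 = 0.60
print(f"Recall: {recall:.0%}")          # Recall: 60%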

For an algorithm that always predicts y = 0:

  • True Positives: 0 (it never predicts positive)
  • Precision: 0/0, which is undefined (conventionally treated as 0)
  • Recall: 0 / (actual positives) = 0
Key Insight

Both precision and recall will be very low or zero for an algorithm that always predicts y = 0, making such non-useful algorithms easy to identify.
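
A minimal sketch of how this check might look in code; precision_recall is an illustrative helper, and the 0/0 case is treated as 0 by convention:

def precision_recall(tp, fp, fn):
    # Guard against the 0/0 case that arises when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# The always-predict-0 baseline on the example data: 0 TP, 0 FP,
# and all 25 actual positives become false negatives.
print(precision_recall(tp=0, fp=0, fn=25))   # (0.0, 0.0) -> clearly not useful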

Good algorithms need both:

  • High precision: When it predicts disease, probably correct
  • High recall: Catches most actual disease cases

Example interpretation:

  • 75% precision: Reliable when predicting disease presence
  • 60% recall: Identifies majority of actual disease cases
  • Precision importance: Avoid unnecessary anxiety and treatments from false positives
  • Recall importance: Don’t miss actual disease cases that need treatment
  • Balance needed: Both metrics should be reasonably high

Precision and recall provide a much better evaluation than accuracy on skewed datasets, ensuring an algorithm is both accurate in its positive predictions and effective at detecting the rare class.