
Error Metrics for Skewed Datasets

When the ratio of positive to negative examples is very far from 50-50, standard error metrics like accuracy become misleading.

Scenario: Binary classifier for rare disease detection

  • y = 1: Disease present
  • y = 0: Disease absent
  • Algorithm result: 1% error (99% accuracy)

Seems impressive, but…

If only 0.5% of the population has the disease:

Simple baseline algorithm:

baseline.py
# Non-learning baseline: ignores the input and always predicts "no disease"
def predict_disease(patient_data):
    return 0  # Always predict y = 0 (no disease)

Baseline performance: 99.5% accuracy (0.5% error)

Now compare three algorithms:

  • Algorithm A: 99.5% accuracy (0.5% error)
  • Algorithm B: 99.2% accuracy (0.8% error)
  • Algorithm C: 99.6% accuracy (0.4% error)

Problem: The lowest error may correspond to a useless predictor (one that always outputs y = 0) rather than a meaningful medical diagnosis.
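
To make the arithmetic concrete, here is a minimal sketch (assuming a hypothetical group of 1,000 patients; the 0.5% prevalence is the figure from the scenario above):

# Minimal sketch: accuracy of the always-predict-0 baseline when only
# 0.5% of patients actually have the disease.
n_patients = 1000                            # hypothetical population size
prevalence = 0.005                           # 0.5% of patients have y = 1
actual_positives = prevalence * n_patients   # 5 patients

# Always predicting y = 0 is wrong only on the actual positives.
errors = actual_positives
accuracy = 1 - errors / n_patients
print(f"Baseline accuracy: {accuracy:.1%}")  # Baseline accuracy: 99.5%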

Confusion matrix (y = 1 is the rare class):

                    Actual: 1              Actual: 0
Predicted: 1    15 (True Positive)     5 (False Positive)
Predicted: 0    10 (False Negative)   70 (True Negative)

True Positive

15 examples

  • Predicted: 1 (disease)
  • Actual: 1 (disease)
  • ✅ Correct positive prediction

False Positive

5 examples

  • Predicted: 1 (disease)
  • Actual: 0 (no disease)
  • ❌ Incorrect positive prediction

False Negative

10 examples

  • Predicted: 0 (no disease)
  • Actual: 1 (disease)
  • ❌ Missed disease cases

True Negative

70 examples

  • Predicted: 0 (no disease)
  • Actual: 0 (no disease)
  • ✅ Correct negative prediction
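
The four counts above can be tallied directly from paired labels and predictions. Below is a minimal sketch; confusion_counts is an illustrative helper, not something from the original notes:

# Minimal sketch: tallying confusion-matrix counts from labels and predictions.
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# Example on a toy batch: one hit, one false alarm, one miss, two correct negatives.
y_true = [1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]
print(confusion_counts(y_true, y_pred))   # (1, 1, 1, 2)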

Precision: Of all patients predicted to have disease, what fraction actually has it?

Formula:

Precision = True Positives / (True Positives + False Positives)
= True Positives / Total Predicted Positive

Example calculation:

Precision = 15 / (15 + 5) = 15/20 = 0.75 (75%)

Interpretation: When algorithm predicts disease, it’s correct 75% of the time.
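
As a quick check of this calculation, a minimal sketch using the counts from the example table:

# Precision = TP / (TP + FP), with the counts from the example table.
tp, fp = 15, 5
precision = tp / (tp + fp)              # 15 / 20 = 0.75
print(f"Precision: {precision:.0%}")    # Precision: 75%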

Recall: Of all patients who actually have disease, what fraction did we correctly detect?

Formula:

Recall = True Positives / (True Positives + False Negatives)
= True Positives / Total Actual Positive

Example calculation:

Recall = 15 / (15 + 10) = 15/25 = 0.60 (60%)

Interpretation: Algorithm correctly identifies 60% of all disease cases.
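
And the corresponding sketch for recall, again using the example counts:

# Recall = TP / (TP + FN), with the counts from the example table.
tp, fn = 15, 10
recall = tp / (tp + fn)                 # 15 / 25 = 0.60
print(f"Recall: {recall:.0%}")          # Recall: 60%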

For an algorithm that always predicts y = 0:

  • True Positives: 0 (it never predicts positive)
  • Precision: 0/0, which is undefined (conventionally treated as 0)
  • Recall: 0 / (actual positives) = 0
Key Insight

Both precision and recall will be very low or zero for an algorithm that always predicts y = 0, making such non-useful algorithms easy to identify.
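
A minimal sketch of how this check might look in code; precision_recall is an illustrative helper, and the 0/0 case is treated as 0 by convention:

def precision_recall(tp, fp, fn):
    # Guard against the 0/0 case that arises when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# The always-predict-0 baseline on the example data: 0 TP, 0 FP,
# and all 25 actual positives become false negatives.
print(precision_recall(tp=0, fp=0, fn=25))   # (0.0, 0.0) -> clearly not useful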

Good algorithms need both:

  • High precision: When it predicts disease, probably correct
  • High recall: Catches most actual disease cases

Example interpretation:

  • 75% precision: Reliable when predicting disease presence
  • 60% recall: Identifies majority of actual disease cases
  • Precision importance: Avoid unnecessary anxiety and treatments from false positives
  • Recall importance: Don’t miss actual disease cases that need treatment
  • Balance needed: Both metrics should be reasonably high

Precision and recall provide a much better evaluation than accuracy on skewed datasets, ensuring an algorithm is both accurate in its positive predictions and effective at detecting the rare class.