
Measuring Purity

Entropy quantifies the impurity of a set of examples, providing a mathematical foundation for a decision tree's splitting decisions.

For a set of examples:

  • p₁: Fraction of examples that are cats (positive class)
  • p₀: Fraction of examples that are not cats = (1 - p₁)

Mathematical Definition:

H(p₁) = -p₁ log₂(p₁) - p₀ log₂(p₀)
      = -p₁ log₂(p₁) - (1 - p₁) log₂(1 - p₁)

Base 2 Logarithm

Using log₂ makes the peak entropy value equal to 1, giving the measure an intuitive 0-to-1 scale.
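
As a rough illustration, here is a minimal Python sketch of this definition (the function name and edge-case handling are my own; the 0 log₂(0) = 0 convention it relies on is discussed later in this section):

```python
from math import log2

def entropy(p1: float) -> float:
    """Entropy of a binary label distribution, given p1 = fraction of positive examples."""
    p0 = 1.0 - p1
    # Skip zero-probability terms, applying the 0 * log2(0) = 0 convention,
    # so pure nodes (p1 = 0 or p1 = 1) evaluate to exactly 0.
    return sum(-p * log2(p) for p in (p1, p0) if p > 0)

print(entropy(0.5))  # 1.0 -- maximum impurity for a 50-50 split
```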

Example 1: 3 cats, 3 dogs (6 total)

  • p₁ = 3/6 = 0.5
  • H(0.5) = 1.0
  • Maximum impurity: 50-50 split represents highest uncertainty

Example 2: 5 cats, 1 dog (6 total)

  • p₁ = 5/6 ≈ 0.83
  • H(0.83) ≈ 0.65
  • Lower impurity: Strong majority class

Example 3: 6 cats, 0 dogs (6 total)

  • p₁ = 6/6 = 1.0
  • H(1.0) = 0
  • Zero impurity: Single class only

Example 4: 2 cats, 4 dogs (6 total)

  • p₁ = 2/6 = 1/3 ≈ 0.33
  • H(0.33) ≈ 0.92
  • High impurity: Closer to 50-50 than Example 2
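
The four worked examples above can be checked numerically; a small sketch, reusing the same entropy definition as before:

```python
from math import log2

def entropy(p1: float) -> float:
    # Same formula as above; zero-probability terms are skipped (0 * log2(0) = 0).
    return sum(-p * log2(p) for p in (p1, 1.0 - p1) if p > 0)

for cats, dogs in [(3, 3), (5, 1), (6, 0), (2, 4)]:
    p1 = cats / (cats + dogs)
    print(f"{cats} cats, {dogs} dogs: p1 = {p1:.2f}, H(p1) = {entropy(p1):.2f}")

# Expected H values (rounded): 1.00, 0.65, 0.00, 0.92
```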

Maximum at p₁ = 0.5

Entropy = 1.0

  • 50-50 class distribution
  • Highest uncertainty
  • Most impure state

Minimum at Extremes

Entropy = 0.0

  • p₁ = 0 (all negative)
  • p₁ = 1 (all positive)
  • Perfect purity

  • Mathematical issue: log₂(0) is undefined (the limit is negative infinity)
  • Convention: for entropy calculations, 0 log₂(0) is treated as 0
  • Result: entropy is correctly computed as 0 for pure nodes
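
A one-term helper makes the convention concrete (plogp is a hypothetical name, not from any library):

```python
from math import log2

def plogp(p: float) -> float:
    # The 0 * log2(0) = 0 convention: return 0 for a zero-probability term
    # instead of evaluating log2(0), which is undefined.
    return 0.0 if p == 0.0 else -p * log2(p)

print(plogp(1.0) + plogp(0.0))  # entropy of a pure node: 0.0
```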

Alternative Option: Some open-source packages use the Gini criterion instead of entropy

  • Similar behavior: Rises from 0 to a single peak at p₁ = 0.5 and falls back to 0 (compared in the sketch below)
  • Equivalent effectiveness: Works well in practice for decision trees
  • Focus Choice: We'll use entropy for simplicity and consistency
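
For a side-by-side feel, here is a minimal sketch comparing binary entropy with the standard binary Gini impurity, 2 · p₁ · (1 - p₁) (a generic formula, not tied to any particular package's implementation):

```python
from math import log2

def entropy(p1: float) -> float:
    return sum(-p * log2(p) for p in (p1, 1.0 - p1) if p > 0)

def gini(p1: float) -> float:
    # Binary Gini impurity: 1 - p1^2 - p0^2, which simplifies to 2 * p1 * (1 - p1).
    return 2.0 * p1 * (1.0 - p1)

for p1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p1 = {p1:.2f}: entropy = {entropy(p1):.2f}, gini = {gini(p1):.2f}")
```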

Interpreting Entropy Values

  • H ≈ 1.0: Mixed classes, needs splitting
  • H ≈ 0.5-0.8: Moderate impurity, potential for improvement
  • H ≈ 0.0: Pure node, stop splitting

Higher Entropy = More Information Needed

  • Uncertain outcomes require more questions
  • Additional features needed to improve classification

Lower Entropy = Less Information Needed

  • Predictable outcomes require fewer questions
  • Current features sufficient for good classification

Entropy provides the mathematical foundation for systematically choosing which feature to split on at each node, letting the decision tree algorithm pick the split that yields the largest measurable reduction in impurity.
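
As a preview of how this is used in practice, one common way to score a candidate split is the reduction in entropy from the parent node to the example-weighted average of its children; a minimal sketch, with hypothetical function and parameter names:

```python
from math import log2

def entropy(p1: float) -> float:
    return sum(-p * log2(p) for p in (p1, 1.0 - p1) if p > 0)

def impurity_reduction(p_parent: float,
                       p_left: float, w_left: float,
                       p_right: float, w_right: float) -> float:
    """Parent entropy minus the example-weighted entropy of the two child nodes."""
    return entropy(p_parent) - (w_left * entropy(p_left) + w_right * entropy(p_right))

# Example: a parent node that is 50% cats, split into two equal-sized children
# that are 80% cats and 20% cats respectively.
print(impurity_reduction(p_parent=0.5, p_left=0.8, w_left=0.5, p_right=0.2, w_right=0.5))
# ~0.28 -- a larger reduction means a more useful split
```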