[Figure: the entropy curve as a function of p₁, peaking at 1.0 when p₁ = 0.5 (a 50-50 class distribution: the highest-uncertainty, most impure state).]
Entropy quantifies the impurity of a set of examples, providing a mathematical foundation for decision tree splitting decisions.
Mathematical Definition: For a set of examples, let p₁ be the fraction of positive examples (for instance, the fraction that are cats in the datasets below) and p₀ = 1 - p₁ the fraction of negative examples. The entropy is:
H(p₁) = -p₁ log₂(p₁) - p₀ log₂(p₀) = -p₁ log₂(p₁) - (1-p₁) log₂(1-p₁)
Using log₂ (rather than the natural logarithm) makes the peak entropy value equal to exactly 1, which gives the measure an intuitive 0-to-1 scale.
- Dataset: 3 cats, 3 dogs (6 total): p₁ = 3/6 = 0.5, H(p₁) = 1.0
- Dataset: 5 cats, 1 dog (6 total): p₁ = 5/6 ≈ 0.83, H(p₁) ≈ 0.65
- Dataset: 6 cats, 0 dogs (6 total): p₁ = 6/6 = 1.0, H(p₁) = 0.0
- Dataset: 2 cats, 4 dogs (6 total): p₁ = 2/6 ≈ 0.33, H(p₁) ≈ 0.92
These entropy values are computed in the code sketch below.
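A minimal Python sketch of the binary entropy function, evaluated on the four datasets above. The function name H and the dataset labels are illustrative choices, not from any particular library; zero-probability terms are skipped to apply the 0 log₂(0) = 0 convention discussed later in this section.

```python
import math

def H(p1: float) -> float:
    """Binary entropy of a set in which a fraction p1 of the examples is positive."""
    # Skip zero-probability terms to apply the convention 0 * log2(0) = 0.
    total = sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0)
    return -total if total else 0.0

# The four example datasets: (label, number of cats out of 6 animals).
datasets = [("3 cats, 3 dogs", 3), ("5 cats, 1 dog", 5),
            ("6 cats, 0 dogs", 6), ("2 cats, 4 dogs", 2)]

for label, cats in datasets:
    p1 = cats / 6
    print(f"{label}: p1 = {p1:.2f}, H(p1) = {H(p1):.2f}")
```

Running this prints entropies of 1.00, 0.65, 0.00, and 0.92, matching the values listed above.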
Maximum at p₁ = 0.5: Entropy = 1.0 (an even 50-50 class split is the most impure state)
Minimum at the extremes (p₁ = 0 or p₁ = 1): Entropy = 0.0 (all examples belong to a single class)
Mathematical Issue: log(0) is undefined (the limit as the argument approaches 0 is negative infinity)
Convention: for the entropy calculation, take 0 log(0) = 0
Result: entropy is correctly computed as 0 for pure nodes
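A short sketch of how this convention looks in code (the helper name plogp is just an illustrative choice): a direct call to math.log2(0) would raise a ValueError, so the zero-probability term is treated as 0 instead.

```python
import math

def plogp(p: float) -> float:
    # Convention: treat 0 * log2(0) as 0; avoids calling log2(0), which raises ValueError.
    return p * math.log2(p) if p > 0 else 0.0

# Pure node: every example belongs to class 1.
p1 = 1.0
entropy = -(plogp(p1) + plogp(1.0 - p1))
print(entropy == 0.0)  # True: a pure node has zero entropy
```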
Alternative Option: Some open-source packages use the Gini criterion, a different impurity measure, instead of entropy (compared in the sketch below).
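For reference, a brief sketch comparing binary entropy with Gini impurity, using the standard binary Gini formula G(p₁) = 1 - p₁² - p₀²; the function names are illustrative and not tied to any particular package.

```python
import math

def entropy(p1: float) -> float:
    # Binary entropy with the 0 * log2(0) = 0 convention.
    total = sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0)
    return -total if total else 0.0

def gini(p1: float) -> float:
    # Binary Gini impurity: zero for pure sets, maximum of 0.5 at p1 = 0.5.
    p0 = 1.0 - p1
    return 1.0 - p1 * p1 - p0 * p0

for p1 in (0.0, 0.17, 0.5, 0.83, 1.0):
    print(f"p1 = {p1:.2f}  entropy = {entropy(p1):.2f}  gini = {gini(p1):.2f}")
```

Both measures are zero for pure sets and largest at p₁ = 0.5; scikit-learn's DecisionTreeClassifier, for example, lets you choose between them through its criterion parameter.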
- Higher entropy = more information needed to determine an example's class (a more impure, more mixed set)
- Lower entropy = less information needed (a purer set, closer to a single class)
Entropy provides the mathematical foundation for systematically choosing which feature to split on at each node: the decision tree algorithm selects the split that yields the largest measurable reduction in impurity, a quantity known as information gain.
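To make "reduction in impurity" concrete, here is a minimal sketch of information gain for a single binary split, assuming the standard definition: the parent node's entropy minus the size-weighted average entropy of its two children. The counts used at the bottom are purely hypothetical.

```python
import math

def entropy(p1: float) -> float:
    # Binary entropy with the 0 * log2(0) = 0 convention.
    total = sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0)
    return -total if total else 0.0

def information_gain(parent: tuple[int, int],
                     left: tuple[int, int],
                     right: tuple[int, int]) -> float:
    """Parent entropy minus the size-weighted entropy of the two child branches.

    Each argument is a (positive, negative) count pair.
    """
    def h(counts: tuple[int, int]) -> float:
        pos, neg = counts
        return entropy(pos / (pos + neg))

    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    weighted_child_entropy = (n_left / n) * h(left) + (n_right / n) * h(right)
    return h(parent) - weighted_child_entropy

# Hypothetical split: 5 cats and 5 dogs at the parent, separated into two branches.
print(information_gain(parent=(5, 5), left=(4, 1), right=(1, 4)))  # ~0.28
```

At each node, a tree-growing algorithm evaluates this quantity for every candidate split and chooses the one with the highest gain.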