Information Gain measures the reduction in entropy achieved by splitting on a particular feature, enabling systematic feature selection for decision tree nodes.
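The calculation can be sketched in a few lines of Python (the helper names entropy and information_gain below are illustrative, not from any particular library):

```python
import math

def entropy(p1):
    """Binary entropy H(p1) in bits; taken as 0 for a pure branch (p1 = 0 or 1)."""
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

def information_gain(p1_root, w_left, p1_left, w_right, p1_right):
    """Root entropy minus the weighted average entropy of the two branches."""
    return entropy(p1_root) - (w_left * entropy(p1_left) + w_right * entropy(p1_right))
```

For example, entropy(0.5) returns 1.0, the entropy of a node with an even class mix, and information_gain(0.5, 0.5, 4/5, 0.5, 1/5) ≈ 0.28, matching the Ear Shape result worked out below.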
Starting Point: All 10 examples at the root node (5 cats, 5 not cats), so the root entropy is H(0.5) = 1.0
Split Results:
- Ear Shape: left branch 5 examples (4/5 cats), right branch 5 examples (1/5 cats)
- Face Shape: left branch 7 examples (4/7 cats), right branch 3 examples (1/3 cats)
- Whiskers: left branch 4 examples (3/4 cats), right branch 6 examples (2/6 cats)
Key Insight: High entropy in a branch that holds many examples is worse than the same entropy in a branch that holds few, so each branch's entropy is weighted by the fraction of examples it receives.
Weighted Entropy Calculation:
Ear Shape Split:
Weighted Entropy = (5/10) × H(0.8) + (5/10) × H(0.2) = 0.5 × 0.72 + 0.5 × 0.72 = 0.72
Face Shape Split:
Weighted Entropy = (7/10) × H(4/7) + (3/10) × H(1/3) = 0.7 × 0.99 + 0.3 × 0.92 ≈ 0.97
Whiskers Split:
Weighted Entropy = (4/10) × H(3/4) + (6/10) × H(2/6) = 0.4 × 0.81 + 0.6 × 0.92 ≈ 0.88
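As a quick check, the three weighted entropies can be reproduced with plain Python (the splits dictionary below just restates the branch sizes and cat fractions from the calculations above):

```python
import math

def entropy(p1):
    # Same binary-entropy helper as in the earlier sketch, restated so this snippet runs on its own.
    return 0.0 if p1 in (0, 1) else -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)

# (branch weight, fraction of cats in branch) for each candidate split
splits = {
    "Ear Shape":  [(5/10, 4/5), (5/10, 1/5)],
    "Face Shape": [(7/10, 4/7), (3/10, 1/3)],
    "Whiskers":   [(4/10, 3/4), (6/10, 2/6)],
}

for feature, branches in splits.items():
    weighted = sum(w * entropy(p1) for w, p1 in branches)
    print(f"{feature}: weighted entropy = {weighted:.2f}")
# Ear Shape: 0.72, Face Shape: 0.97, Whiskers: 0.88
```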
Information Gain = Entropy Reduction from Split
Information Gain = H(p₁^root) - [w^left × H(p₁^left) + w^right × H(p₁^right)]
Where: w^left and w^right are the fractions of examples sent to the left and right branches, with w^left + w^right = 1 (every example goes either left or right)
Ear Shape
Information Gain = 1.0 - 0.72 = 0.28
- Highest reduction in entropy
- Best feature choice
Face Shape
Information Gain = 1.0 - 0.97 = 0.03
Whiskers
Information Gain = 1.0 - 0.88 = 0.12
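These three numbers follow directly by subtracting each weighted entropy from the root entropy of 1.0; a quick sanity check in Python (values rounded to two decimals, as above):

```python
root_entropy = 1.0  # H(0.5): 5 cats out of 10 examples at the root
weighted_entropies = {"Ear Shape": 0.72, "Face Shape": 0.97, "Whiskers": 0.88}

gains = {feature: round(root_entropy - h, 2) for feature, h in weighted_entropies.items()}
print(gains)  # {'Ear Shape': 0.28, 'Face Shape': 0.03, 'Whiskers': 0.12}
```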
Choose the feature that maximizes information gain
Example Result: Ear Shape (0.28) > Whiskers (0.12) > Face Shape (0.03)
Decision: Split on Ear Shape at the root node
Use Case: Stop splitting when information gain < threshold
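A minimal sketch of both rules together, assuming the per-feature gains computed above and an illustrative threshold of 0.01 (the threshold value is not prescribed by the source):

```python
gains = {"Ear Shape": 0.28, "Whiskers": 0.12, "Face Shape": 0.03}
MIN_GAIN = 0.01  # illustrative stopping threshold

best_feature = max(gains, key=gains.get)  # feature with the largest information gain
if gains[best_feature] >= MIN_GAIN:
    print(f"Split on {best_feature} (gain = {gains[best_feature]:.2f})")
else:
    print("Gain below threshold: make this node a leaf")
# Output: Split on Ear Shape (gain = 0.28)
```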
Information gain provides the mathematical foundation for systematic feature selection in decision trees: each split is chosen to reduce entropy as much as possible at that node, and the same measure enables principled stopping decisions.