Decision trees can work with continuous-valued features (any real number) by using threshold-based splitting rather than categorical splits.
New Feature Added : Weight of animal (in pounds)
Insight : Cats typically lighter than dogs (though overlap exists)
Feature Type : Continuous numerical value
Range : Any positive real number
Feature Options for Splitting :
Ear Shape (categorical)
Face Shape (categorical)
Whiskers (categorical)
Weight (continuous)
Goal : Determine if weight splitting provides better information gain than categorical features.
Continuous Feature Split : weight ≤ threshold vs. weight > threshold
Algorithm Process :
Test multiple threshold values
Calculate information gain for each threshold
Select threshold with highest information gain
Compare with other features’ best information gains
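A minimal sketch of this loop, assuming NumPy and a 0/1 label array (1 = cat); the candidate thresholds are passed in here, and generating them from midpoints is covered further below. All names are illustrative, not from the original lecture:

```python
import numpy as np

def entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(feature, labels, threshold):
    """Gain from splitting on feature <= threshold vs. feature > threshold."""
    feature, labels = np.asarray(feature), np.asarray(labels, dtype=float)
    left, right = labels[feature <= threshold], labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:   # degenerate split: no information gained
        return 0.0
    h_root = entropy(labels.mean())         # entropy at the node, e.g. H(0.5)
    return h_root - (len(left) / len(labels) * entropy(left.mean())
                     + len(right) / len(labels) * entropy(right.mean()))

def best_threshold(feature, labels, candidates):
    """Return the candidate threshold with the highest information gain."""
    gains = [information_gain(feature, labels, t) for t in candidates]
    i = int(np.argmax(gains))
    return candidates[i], gains[i]
```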
Dataset Visualization :
Horizontal axis : Animal weight (pounds)
Vertical axis : Class (Cat=1, Not Cat=0)
Data points : Scattered based on weight and class
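A minimal matplotlib sketch of this plot; the weights and labels below are purely illustrative placeholders, not the lecture's actual data:

```python
import matplotlib.pyplot as plt

# Hypothetical data: 5 cats (label 1) and 5 dogs (label 0) with made-up weights
weights = [7.5, 7.8, 8.5, 9.0, 11.0, 10.5, 12.5, 14.0, 17.0, 20.0]
is_cat  = [1,   1,   1,   1,   1,    0,    0,    0,    0,    0]

plt.scatter(weights, is_cat)
plt.xlabel("Weight (pounds)")
plt.ylabel("Class (Cat=1, Not Cat=0)")
plt.show()
```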
Split Results (threshold = 8) :
Left Branch : 2 animals, both cats
Right Branch : 8 animals (3 cats, 5 dogs)
Information Gain Calculation :
H(0.5) - [2/10 × H(2/2) + 8/10 × H(3/8)] = 0.24
Split Results (threshold = 9) :
Left Branch : 4 animals, all cats
Right Branch : 6 animals (1 cat, 5 dogs)
Information Gain Calculation :
H(0.5) - [4/10 × H(4/4) + 6/10 × H(1/6)] = 0.61
Split Results (threshold = 13) :
Left Branch : 7 animals (5 cats, 2 dogs)
Right Branch : 3 animals, all dogs
Information Gain Calculation :
H(0.5) - [7/10 × H(5/7) + 3/10 × H(0/3)] = 0.40
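The three worked values can be checked numerically; a quick sketch using the binary entropy formula H(p) = -p log2(p) - (1-p) log2(1-p) and the node's starting point of 5 cats out of 10:

```python
from math import log2

def H(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

print(round(H(0.5) - (2/10 * H(2/2) + 8/10 * H(3/8)), 2))  # 0.24  (threshold 8)
print(round(H(0.5) - (4/10 * H(4/4) + 6/10 * H(1/6)), 2))  # 0.61  (threshold 9)
print(round(H(0.5) - (7/10 * H(5/7) + 3/10 * H(0/3)), 2))  # 0.40  (threshold 13)
```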
Threshold Comparison :
Threshold = 8 : Information gain 0.24 (moderate improvement, unbalanced split)
Threshold = 9 : Information gain 0.61 (best performance, good class separation)
Threshold = 13 : Information gain 0.40 (decent improvement, less optimal than 9)
Rather than testing just a few hand-picked values, test all plausible thresholds:
Sort all examples by feature value (weight)
Identify candidate thresholds at midpoints between consecutive examples
Test each threshold for information gain
Select optimal threshold with highest gain
For 10 training examples : Test 9 different threshold values
Between examples 1&2 : Midpoint threshold
Between examples 2&3 : Midpoint threshold
…and so on
Example Sorted Weights : [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
Test Thresholds : [7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5]
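As a quick check, those candidate thresholds are just the midpoints of consecutive sorted values:

```python
# Midpoints between consecutive sorted weights (using the illustrative list above)
weights = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
thresholds = [(a + b) / 2 for a, b in zip(weights, weights[1:])]
print(thresholds)  # [7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5]
```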
Final Comparison : If weight (threshold=9) gives information gain of 0.61, and this exceeds all categorical features’ gains, then split on weight ≤ 9.
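A toy sketch of that comparison step; only the 0.61 value comes from the example above, and the categorical feature gains here are hypothetical placeholders:

```python
# Best information gain found for each candidate feature (categorical gains are made up)
best_gains = {
    "ear_shape": 0.28,
    "face_shape": 0.03,
    "whiskers": 0.12,
    "weight <= 9": 0.61,   # best threshold found for the continuous feature
}
split_on = max(best_gains, key=best_gains.get)
print(split_on)  # 'weight <= 9' -> the continuous feature wins this node
```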
Continuous features require :
Threshold optimization : Find best split point
Information gain calculation : Standard entropy-based approach
Comparison with categorical features : Select overall best feature
Binary splitting : Left (≤ threshold) vs. Right (> threshold)
If weight ≤ 9 is selected :
Left Branch : 4 examples (all cats)
Right Branch : 6 examples (1 cat, 5 dogs)
Continue recursively : Build subtrees using remaining features
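A minimal sketch of this recursive step; the weights and labels are hypothetical, chosen only to reproduce the counts above:

```python
weights = [7.5, 7.8, 8.5, 9.0, 11.0, 10.5, 12.5, 14.0, 17.0, 20.0]
is_cat  = [1,   1,   1,   1,   1,    0,    0,    0,    0,    0]

threshold = 9
left  = [(w, y) for w, y in zip(weights, is_cat) if w <= threshold]
right = [(w, y) for w, y in zip(weights, is_cat) if w > threshold]

print(len(left),  sum(y for _, y in left))   # 4 4 -> all cats: this branch becomes a leaf
print(len(right), sum(y for _, y in right))  # 6 1 -> 1 cat, 5 dogs: keep splitting on other features
```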
Key Steps :
Test multiple thresholds along feature range
Calculate information gain for each threshold
Select optimal threshold for the feature
Compare with other features (categorical and continuous)
Choose best overall split based on highest information gain
Result : Decision trees seamlessly handle mixed feature types (categorical and continuous) within the same unified framework.
Next, we can extend decision trees beyond classification to handle regression problems where the goal is predicting numerical outputs rather than discrete categories.