Decision trees can work with continuous-valued features (any real number) by using threshold-based splitting rather than categorical splits.
New Feature Added : Weight of animal (in pounds)
Insight : Cats typically lighter than dogs (though overlap exists)
Feature Type : Continuous numerical value
Range : Any positive real number
Feature Options for Splitting :
Ear Shape (categorical)
Face Shape (categorical)
Whiskers (categorical)
Weight (continuous)
Goal : Determine if weight splitting provides better information gain than categorical features.
Continuous Feature Split : weight ≤ threshold vs. weight > threshold
Algorithm Process :
Test multiple threshold values
Calculate information gain for each threshold
Select threshold with highest information gain
Compare with other features’ best information gains
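A minimal sketch of this loop, assuming NumPy and a 0/1 label array (1 = cat); the candidate thresholds are passed in here, and generating them from midpoints is covered further below. All names are illustrative, not from the original lecture:

```python
import numpy as np

def entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(feature, labels, threshold):
    """Gain from splitting on feature <= threshold vs. feature > threshold."""
    feature, labels = np.asarray(feature), np.asarray(labels, dtype=float)
    left, right = labels[feature <= threshold], labels[feature > threshold]
    if len(left) == 0 or len(right) == 0:   # degenerate split: no information gained
        return 0.0
    h_root = entropy(labels.mean())         # entropy at the node, e.g. H(0.5)
    return h_root - (len(left) / len(labels) * entropy(left.mean())
                     + len(right) / len(labels) * entropy(right.mean()))

def best_threshold(feature, labels, candidates):
    """Return the candidate threshold with the highest information gain."""
    gains = [information_gain(feature, labels, t) for t in candidates]
    i = int(np.argmax(gains))
    return candidates[i], gains[i]
```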
Dataset Visualization :
Horizontal axis : Animal weight (pounds)
Vertical axis : Class (Cat=1, Not Cat=0)
Data points : Scattered based on weight and class
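A minimal matplotlib sketch of this plot; the weights and labels below are purely illustrative placeholders, not the lecture's actual data:

```python
import matplotlib.pyplot as plt

# Hypothetical data: 5 cats (label 1) and 5 dogs (label 0) with made-up weights
weights = [7.5, 7.8, 8.5, 9.0, 11.0, 10.5, 12.5, 14.0, 17.0, 20.0]
is_cat  = [1,   1,   1,   1,   1,    0,    0,    0,    0,    0]

plt.scatter(weights, is_cat)
plt.xlabel("Weight (pounds)")
plt.ylabel("Class (Cat=1, Not Cat=0)")
plt.show()
```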
Split Results (threshold = 8) :
Left Branch : 2 animals, both cats
Right Branch : 8 animals (3 cats, 5 dogs)
Information Gain Calculation :
H(0.5) - [2/10 × H(2/2) + 8/10 × H(3/8)] = 0.24
Split Results (threshold = 9) :
Left Branch : 4 animals, all cats
Right Branch : 6 animals (1 cat, 5 dogs)
Information Gain Calculation :
H(0.5) - [4/10 × H(4/4) + 6/10 × H(1/6)] = 0.61
Split Results (threshold = 13) :
Left Branch : 7 animals (5 cats, 2 dogs)
Right Branch : 3 animals, all dogs
Information Gain Calculation :
H(0.5) - [7/10 × H(5/7) + 3/10 × H(0/3)] = 0.40
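The three worked values can be checked numerically; a quick sketch using the binary entropy formula H(p) = -p log2(p) - (1-p) log2(1-p) and the node's starting point of 5 cats out of 10:

```python
from math import log2

def H(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

print(round(H(0.5) - (2/10 * H(2/2) + 8/10 * H(3/8)), 2))  # 0.24  (threshold 8)
print(round(H(0.5) - (4/10 * H(4/4) + 6/10 * H(1/6)), 2))  # 0.61  (threshold 9)
print(round(H(0.5) - (7/10 * H(5/7) + 3/10 * H(0/3)), 2))  # 0.40  (threshold 13)
```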
Threshold Comparison :
Threshold = 8 : Information gain 0.24 (moderate improvement, unbalanced split)
Threshold = 9 : Information gain 0.61 (best performance, good class separation)
Threshold = 13 : Information gain 0.40 (decent improvement, less optimal than 9)
Rather than testing just a few hand-picked values, test all plausible thresholds:
Sort all examples by feature value (weight)
Identify candidate thresholds at midpoints between consecutive examples
Test each threshold for information gain
Select optimal threshold with highest gain
For 10 training examples : Test 9 different threshold values
Between examples 1&2 : Midpoint threshold
Between examples 2&3 : Midpoint threshold
…and so on
Example Sorted Weights : [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
Test Thresholds : [7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5]
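As a quick check, those candidate thresholds are just the midpoints of consecutive sorted values:

```python
# Midpoints between consecutive sorted weights (using the illustrative list above)
weights = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
thresholds = [(a + b) / 2 for a, b in zip(weights, weights[1:])]
print(thresholds)  # [7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5]
```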
Final Comparison : If weight (threshold=9) gives information gain of 0.61, and this exceeds all categorical features’ gains, then split on weight ≤ 9.
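A toy sketch of that comparison step; only the 0.61 value comes from the example above, and the categorical feature gains here are hypothetical placeholders:

```python
# Best information gain found for each candidate feature (categorical gains are made up)
best_gains = {
    "ear_shape": 0.28,
    "face_shape": 0.03,
    "whiskers": 0.12,
    "weight <= 9": 0.61,   # best threshold found for the continuous feature
}
split_on = max(best_gains, key=best_gains.get)
print(split_on)  # 'weight <= 9' -> the continuous feature wins this node
```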
Continuous features require :
Threshold optimization : Find best split point
Information gain calculation : Standard entropy-based approach
Comparison with categorical features : Select overall best feature
Binary splitting : Left (≤ threshold) vs. Right (> threshold)
If weight ≤ 9 is selected :
Left Branch : 4 examples (all cats)
Right Branch : 6 examples (1 cat, 5 dogs)
Continue recursively : Build subtrees using remaining features
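A minimal sketch of this recursive step; the weights and labels are hypothetical, chosen only to reproduce the counts above:

```python
weights = [7.5, 7.8, 8.5, 9.0, 11.0, 10.5, 12.5, 14.0, 17.0, 20.0]
is_cat  = [1,   1,   1,   1,   1,    0,    0,    0,    0,    0]

threshold = 9
left  = [(w, y) for w, y in zip(weights, is_cat) if w <= threshold]
right = [(w, y) for w, y in zip(weights, is_cat) if w > threshold]

print(len(left),  sum(y for _, y in left))   # 4 4 -> all cats: this branch becomes a leaf
print(len(right), sum(y for _, y in right))  # 6 1 -> 1 cat, 5 dogs: keep splitting on other features
```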
Key Steps :
Test multiple thresholds along feature range
Calculate information gain for each threshold
Select optimal threshold for the feature
Compare with other features (categorical and continuous)
Choose best overall split based on highest information gain
Result : Decision trees seamlessly handle mixed feature types (categorical and continuous) within the same unified framework.
Next, we can extend decision trees beyond classification to handle regression problems where the goal is predicting numerical outputs rather than discrete categories.