Continuous Valued Features

Decision trees can work with continuous-valued features (any real number) by using threshold-based splitting rather than categorical splits.

New Feature Added: Weight of animal (in pounds)

  • Insight: Cats typically lighter than dogs (though overlap exists)
  • Feature Type: Continuous numerical value
  • Range: Any positive real number

Feature Options for Splitting:

  • Ear Shape (categorical)
  • Face Shape (categorical)
  • Whiskers (categorical)
  • Weight (continuous)

Goal: Determine if weight splitting provides better information gain than categorical features.

Continuous Feature Split: weight ≤ threshold vs. weight > threshold

Algorithm Process (sketched in code after this list):

  1. Test multiple threshold values
  2. Calculate information gain for each threshold
  3. Select threshold with highest information gain
  4. Compare with other features’ best information gains

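A minimal sketch of these steps in Python is shown below. The helper names `entropy` and `information_gain`, and the use of NumPy arrays, are illustrative assumptions rather than code from the lecture:

```python
import numpy as np

def entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def information_gain(values, labels, threshold):
    """Information gain of splitting on values <= threshold (labels are 0/1)."""
    left, right = labels[values <= threshold], labels[values > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0                          # degenerate split: nothing gained
    p_root = labels.mean()                  # fraction of positive (cat) examples at the node
    weighted = (len(left) * entropy(left.mean())
                + len(right) * entropy(right.mean())) / len(labels)
    return entropy(p_root) - weighted
```

Steps 1 to 3 then amount to calling `information_gain` once per candidate threshold and keeping the largest value.
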
Dataset Visualization:

  • Horizontal axis: Animal weight (pounds)
  • Vertical axis: Class (Cat=1, Not Cat=0)
  • Data points: Scattered based on weight and class

Split Results (threshold = 8):

  • Left Branch: 2 animals, both cats
  • Right Branch: 8 animals (3 cats, 5 dogs)

Information Gain Calculation:

H(0.5) - [2/10 × H(2/2) + 8/10 × H(3/8)] = 0.24

Split Results (threshold = 9):

  • Left Branch: 4 animals, all cats
  • Right Branch: 6 animals (1 cat, 5 dogs)

Information Gain Calculation:

H(0.5) - [4/10 × H(4/4) + 6/10 × H(1/6)] = 0.61

Split Results (threshold = 13):

  • Left Branch: 7 animals (5 cats, 2 dogs)
  • Right Branch: 3 animals (0 cats, 3 dogs)

Information Gain Calculation:

H(0.5) - [7/10 × H(5/7) + 3/10 × H(0/3)] = 0.40

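The three calculations above can be checked from the class counts alone. A small self-contained sketch (the helper names `H` and `gain` are my own):

```python
from math import log2

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(left_cats, left_n, right_cats, right_n):
    n = left_n + right_n
    p_root = (left_cats + right_cats) / n          # H(0.5) = 1 at the root here
    return H(p_root) - (left_n / n * H(left_cats / left_n)
                        + right_n / n * H(right_cats / right_n))

print(round(gain(2, 2, 3, 8), 2))   # threshold 8  -> 0.24
print(round(gain(4, 4, 1, 6), 2))   # threshold 9  -> 0.61
print(round(gain(5, 7, 0, 3), 2))   # threshold 13 -> 0.4
```
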
Threshold = 8

Information Gain: 0.24

  • Moderate improvement
  • Unbalanced split

Threshold = 9

Information Gain: 0.61

  • Best performance
  • Good class separation

Threshold = 13

Information Gain: 0.40

  • Decent improvement
  • Lower information gain than threshold = 9

Rather than testing just a few values, test many possible thresholds:

  1. Sort all examples by feature value (weight)
  2. Identify candidate thresholds at midpoints between consecutive examples
  3. Test each threshold for information gain
  4. Select optimal threshold with highest gain

For 10 training examples: Test 9 different threshold values

  • Between examples 1&2: Midpoint threshold
  • Between examples 2&3: Midpoint threshold
  • …and so on

  • Example Sorted Weights: [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
  • Test Thresholds (midpoints): [7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5]
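
A short sketch of steps 1 and 2 for these example weights (NumPy is an assumed choice here; with repeated values one would take midpoints between the unique values, as `np.unique` does):

```python
import numpy as np

weights = np.array([7, 8, 9, 10, 11, 12, 13, 14, 15, 16], dtype=float)

sorted_w = np.unique(weights)                     # step 1: sorted (unique) feature values
candidates = (sorted_w[:-1] + sorted_w[1:]) / 2   # step 2: midpoints of consecutive values
print(candidates)   # [ 7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5]
```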

Final Comparison: If splitting on weight at threshold 9 gives an information gain of 0.61, and this exceeds the best information gain of every categorical feature, then the node splits on weight ≤ 9.

Continuous features require the following steps, combined in the sketch after this list:

  1. Threshold optimization: Find best split point
  2. Information gain calculation: Standard entropy-based approach
  3. Comparison with categorical features: Select overall best feature
  4. Binary splitting: Left (≤ threshold) vs. Right (> threshold)
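
Putting these requirements together, here is a hedged sketch of a best-threshold search. The weights and labels are hypothetical values chosen only to reproduce the branch counts quoted in this section, and the function names are illustrative:

```python
import numpy as np

def entropy(p):
    """Binary entropy in bits (as in the earlier sketch)."""
    return 0.0 if p in (0.0, 1.0) else float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def best_threshold_split(values, labels):
    """Return (best_threshold, best_gain) for one continuous feature."""
    root_h = entropy(labels.mean())
    uniq = np.unique(values)                        # sorted unique feature values
    candidates = (uniq[:-1] + uniq[1:]) / 2         # midpoints between consecutive values
    best_t, best_gain = None, 0.0
    for t in candidates:
        left, right = labels[values <= t], labels[values > t]
        weighted = (len(left) * entropy(left.mean())
                    + len(right) * entropy(right.mean())) / len(labels)
        gain = root_h - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Hypothetical weights and labels (1 = cat, 0 = not cat), chosen only to match the
# branch counts quoted in this section.
weights = np.array([7.0, 8.0, 8.5, 9.0, 10.0, 11.0, 12.5, 14.0, 15.0, 18.0])
labels  = np.array([1,   1,   1,   1,   0,    0,    1,    0,    0,    0])
print(best_threshold_split(weights, labels))        # ~(9.5, 0.61), i.e. the weight <= 9 split
```

The returned (threshold, gain) pair is what gets compared against the categorical features' gains when choosing the node's split.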

If weight ≤ 9 is selected:

  • Left Branch: 4 examples (all cats)
  • Right Branch: 6 examples (1 cat, 5 dogs)
  • Continue recursively: Build subtrees within each branch using that branch's subset of the data (see the sketch below)
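
A brief sketch of performing that split with boolean masks, again using hypothetical weights consistent with the counts above:

```python
import numpy as np

# Hypothetical weights and labels consistent with the counts above (1 = cat, 0 = not cat)
weights = np.array([7.0, 8.0, 8.5, 9.0, 10.0, 11.0, 12.5, 14.0, 15.0, 18.0])
labels  = np.array([1,   1,   1,   1,   0,    0,    1,    0,    0,    0])

mask = weights <= 9                                # the chosen split: weight <= 9
left_labels, right_labels = labels[mask], labels[~mask]
print(int(left_labels.sum()), len(left_labels))    # 4 4  -> left branch is pure (all cats)
print(int(right_labels.sum()), len(right_labels))  # 1 6  -> right branch is still mixed
# Tree building now recurses on (weights[mask], left_labels) and (weights[~mask], right_labels),
# re-selecting the best feature and threshold inside each branch.
```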

Key Steps:

  1. Test multiple thresholds along feature range
  2. Calculate information gain for each threshold
  3. Select optimal threshold for the feature
  4. Compare with other features (categorical and continuous)
  5. Choose best overall split based on highest information gain

Result: Decision trees seamlessly handle mixed feature types (categorical and continuous) within the same unified framework, as the sketch below illustrates.
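
To illustrate this unified handling, here is a compact sketch that scores binary (one-hot) categorical features and a continuous feature with the same information-gain criterion and picks the best overall split. The feature values are hypothetical and the helper names are my own:

```python
import numpy as np

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def split_gain(labels, mask):
    """Information gain of splitting a node into labels[mask] and labels[~mask]."""
    left, right = labels[mask], labels[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * entropy(left.mean())
                + len(right) * entropy(right.mean())) / len(labels)
    return entropy(labels.mean()) - weighted

def best_split(features, labels, continuous):
    """Best (feature, threshold, gain) across binary categorical and continuous features."""
    best = (None, None, 0.0)
    for name, values in features.items():
        if name in continuous:
            uniq = np.unique(values)
            for t in (uniq[:-1] + uniq[1:]) / 2:        # candidate midpoint thresholds
                gain = split_gain(labels, values <= t)
                if gain > best[2]:
                    best = (name, float(t), gain)
        else:
            gain = split_gain(labels, values == 1)      # one-hot feature: split on value == 1
            if gain > best[2]:
                best = (name, None, gain)
    return best

# Hypothetical one-hot and continuous feature values (1 = cat, 0 = not cat)
features = {
    "ear_shape_pointy": np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0]),
    "weight":           np.array([7.0, 8.0, 8.5, 9.0, 10.0, 11.0, 12.5, 14.0, 15.0, 18.0]),
}
labels = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])
print(best_split(features, labels, continuous={"weight"}))   # e.g. ('weight', 9.5, ~0.61)
```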

Next, we can extend decision trees beyond classification to handle regression problems where the goal is predicting numerical outputs rather than discrete categories.