
Regression Trees

Decision trees can be generalized beyond classification to handle regression tasks where the goal is predicting numerical values rather than discrete categories.

Previous task: predict whether an animal is a cat (classification).
New task: predict the weight of the animal (regression).

Input Features (unchanged):

  • Ear Shape (pointy/floppy)
  • Face Shape (round/not round)
  • Whiskers (present/absent)

Output Variable:

  • Y = weight in pounds (continuous numerical value)

Regression Task

Predicting a number rather than a category.

Sample regression tree:

  • Root: Split on Ear Shape
  • Left Subtree (Pointy): Split on Face Shape
  • Right Subtree (Floppy): Split on Face Shape

Note: The same feature can appear in multiple branches; this is perfectly valid in decision trees.

Classification vs. Regression Difference:

  • Classification trees: leaf nodes predict a class label, e.g. “Cat” or “Not Cat” (discrete categorical output)
  • Regression trees: leaf nodes predict a number (continuous numerical output)

Leaf Node Calculation (regression): the average of the target values of the training examples reaching that node

Example Leaf Node:

  • Training examples reaching node: Weights [7.2, 8.4, 7.6, 10.2]
  • Prediction: (7.2 + 8.4 + 7.6 + 10.2) ÷ 4 = 8.35 pounds

Additional Examples:

  • Single example node: Weight 9.2 → Prediction: 9.2 pounds
  • Multiple examples: [15.0, 18.0, 20.1] → Prediction: 17.70 pounds
  • Two examples: [9.1, 10.7] → Prediction: 9.90 pounds
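Computationally, the leaf prediction is just an arithmetic mean. A minimal Python sketch (the function name leaf_prediction is only illustrative):

```python
# A regression-tree leaf predicts the mean of the target values of the
# training examples that reach it (values taken from the examples above).
def leaf_prediction(weights):
    return sum(weights) / len(weights)

print(round(leaf_prediction([7.2, 8.4, 7.6, 10.2]), 2))  # 8.35
print(round(leaf_prediction([15.0, 18.0, 20.1]), 2))     # 17.7
print(round(leaf_prediction([9.1, 10.7]), 2))            # 9.9
```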

Classification Trees: Minimize entropy (measure of class impurity)
Regression Trees: Minimize variance (measure of numerical spread)

Variance Definition: A measure of how widely a set of numbers is spread around its mean; here, the sample variance (average of squared deviations from the mean, dividing by n − 1)

Example Comparisons:

  • Low variance: [7.2, 9.2, 8.4, 7.6, 10.2] → Variance = 1.47
  • High variance: [8.8, 15.0, 11.0, 18.0, 20.0] → Variance = 21.87

Interpretation: Higher variance indicates more spread in values, suggesting need for further splitting.
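A short sketch of the variance computation: the figures above are reproduced by the sample variance, dividing by n − 1 (the guard for fewer than two values is an added convenience, not part of the course material):

```python
# Sample variance (divide by n - 1); reproduces the figures in this section.
def variance(values):
    n = len(values)
    if n < 2:
        return 0.0  # a single value has no spread
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

print(round(variance([7.2, 9.2, 8.4, 7.6, 10.2]), 2))     # 1.47
print(round(variance([8.8, 15.0, 11.0, 18.0, 20.0]), 2))  # 21.87
```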

Similar structure to classification, but using variance instead of entropy:

Split Evaluation:

Weighted Variance = w^left × Variance(left) + w^right × Variance(right)

Example: Ear shape split

  • w^left = 5/10, w^right = 5/10
  • Left variance = 1.47, Right variance = 21.87
  • Weighted variance = 0.5 × 1.47 + 0.5 × 21.87 = 11.67
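The same calculation as a sketch, reusing variance() from above. The pointy and floppy lists are the weights assumed for the left and right branches, consistent with the variances quoted in this section:

```python
# Weighted variance of a candidate split: each branch's variance is weighted
# by the fraction of training examples that fall into that branch.
def weighted_variance(left_values, right_values):
    n = len(left_values) + len(right_values)
    w_left, w_right = len(left_values) / n, len(right_values) / n
    return w_left * variance(left_values) + w_right * variance(right_values)

pointy = [7.2, 9.2, 8.4, 7.6, 10.2]     # left branch (pointy ears)
floppy = [8.8, 15.0, 11.0, 18.0, 20.0]  # right branch (floppy ears)
print(round(weighted_variance(pointy, floppy), 2))  # 11.67
```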

Formula:

Variance Reduction = Root Variance - Weighted Variance After Split

Calculation Examples:

Ear Shape Split

  • Root Variance: 20.51
  • Weighted Variance: 11.67
  • Variance Reduction: 8.84
  • Best reduction

Face Shape Split

  • Root Variance: 20.51
  • Weighted Variance: 19.87
  • Variance Reduction: 0.64
  • Minimal improvement

Whiskers Split

  • Root Variance: 20.51
  • Weighted Variance: 14.29
  • Variance Reduction: 6.22
  • Moderate improvement

Selection Rule: Choose feature with largest variance reduction

Example Result: Ear shape (8.84) > Whiskers (6.22) > Face shape (0.64)

Decision: Split on ear shape
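A sketch of the reduction computation and the selection rule, reusing the helpers above. The ear-shape reduction is recomputed from the branch weights; the whiskers and face-shape values are quoted from the figures above, since their branch memberships are not listed here:

```python
# Variance reduction = variance at the node before splitting
#                      minus the weighted variance after the split.
def variance_reduction(root_values, left_values, right_values):
    return variance(root_values) - weighted_variance(left_values, right_values)

all_weights = pointy + floppy  # all 10 examples at the root
print(round(variance_reduction(all_weights, pointy, floppy), 2))  # 8.84 (ear shape)

# Feature selection is an argmax over the candidate reductions.
candidates = {"ear shape": 8.84, "whiskers": 6.22, "face shape": 0.64}
print(max(candidates, key=candidates.get))  # ear shape
```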

After selecting ear shape:

  1. Left Branch: 5 examples with pointy ears
  2. Right Branch: 5 examples with floppy ears
  3. Repeat process: Apply same algorithm to each subset
  4. Continue until stopping criteria met

Similar to classification trees:

  • Maximum depth reached
  • Minimum examples per node
  • Variance reduction below threshold
  • Pure node (all examples have the same target value; rare in practice)
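Putting the pieces together, here is a rough sketch of the recursive build loop with these stopping criteria, reusing variance_reduction() from above. The dict-based tree representation, binary feature encoding, and default thresholds are illustrative choices, not prescribed by the course:

```python
# Recursive build loop with the stopping criteria listed above. `examples` is
# a list of (features, target) pairs, where `features` is a dict of binary
# feature values; thresholds and tree representation are illustrative.
def build_tree(examples, features, depth=0, max_depth=5,
               min_examples=2, min_reduction=0.01):
    targets = [t for _, t in examples]
    mean_target = sum(targets) / len(targets)

    # Stopping criteria: depth limit, too few examples, or a pure node.
    if depth >= max_depth or len(examples) < min_examples or len(set(targets)) == 1:
        return {"leaf": True, "prediction": mean_target}

    # Choose the feature whose split gives the largest variance reduction.
    best = None
    for f in features:
        left = [(x, t) for x, t in examples if x[f]]
        right = [(x, t) for x, t in examples if not x[f]]
        if not left or not right:
            continue  # split sends every example to one side: useless
        reduction = variance_reduction(targets,
                                       [t for _, t in left],
                                       [t for _, t in right])
        if best is None or reduction > best[1]:
            best = (f, reduction, left, right)

    # Stop if no split reduces variance by more than the threshold.
    if best is None or best[1] < min_reduction:
        return {"leaf": True, "prediction": mean_target}

    f, _, left, right = best
    return {"leaf": False, "feature": f,
            "left": build_tree(left, features, depth + 1, max_depth,
                               min_examples, min_reduction),
            "right": build_tree(right, features, depth + 1, max_depth,
                                min_examples, min_reduction)}
```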

Summary: Classification vs. Regression Trees

| Aspect              | Classification Trees       | Regression Trees  |
| ------------------- | -------------------------- | ----------------- |
| Output              | Categories/Classes         | Numerical Values  |
| Leaf Prediction     | Most common class          | Average of values |
| Splitting Criterion | Entropy / Information Gain | Variance Reduction|
| Impurity Measure    | Class mixture              | Value spread      |
Shared aspects of both tree types:

  • Tree building process: Identical recursive approach
  • Feature selection: Choose best split at each node
  • Stopping criteria: Similar threshold-based approaches
  • Prediction process: Follow path from root to leaf
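For completeness, prediction with a tree built by the sketch above is a walk from root to leaf:

```python
# Prediction: follow the path from root to leaf, then return the leaf's
# stored average (same dict-based tree as the build sketch above).
def predict(tree, x):
    node = tree
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] else node["right"]
    return node["prediction"]
```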

Regression trees extend the power of decision trees beyond classification, enabling prediction of continuous numerical outcomes while maintaining the interpretable tree structure and systematic splitting approach.