
Optional Lab: Decision Trees

This notebook visualizes how decision trees split using information gain, implementing the core concepts from the lectures with hands-on code examples.

Ten training examples with three categorical features (first three rows shown below):

Animal | Ear Shape | Face Shape | Whiskers | Cat
-------|-----------|------------|----------|----
0      | Pointy    | Round      | Present  | 1
1      | Floppy    | Not Round  | Present  | 1
2      | Floppy    | Round      | Absent   | 0

Binary representation of categorical features:

  • Ear Shape: Pointy = 1, Floppy = 0
  • Face Shape: Round = 1, Not Round = 0
  • Whiskers: Present = 1, Absent = 0

Result: X_train holds the 10 examples encoded as 3 binary features each; y_train holds the labels (1 = cat, 0 = not cat)

dataset_setup.py
import numpy as np

X_train = np.array([[1, 1, 1],   # Pointy, Round, Present
                    [0, 0, 1],   # Floppy, Not Round, Present
                    [0, 1, 0],   # Floppy, Round, Absent
                    # ... more examples
                    ])
y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

Key implementation from lecture material:

entropy_function.py
def entropy(p):
    """Calculate entropy for probability p."""
    if p == 0 or p == 1:
        return 0
    else:
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Example: root node entropy
print(entropy(0.5))   # Output: 1.0 (maximum impurity)

Interactive plot shows the entropy curve (a plotting sketch follows this list):

  • Maximum at p = 0.5: entropy = 1.0 (highest uncertainty)
  • Minimum at the extremes (p = 0 or p = 1): entropy = 0.0 (complete certainty)
  • Shape: a concave curve, symmetric about p = 0.5 and peaking at an equal class distribution
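A minimal sketch of how this curve can be reproduced, assuming the entropy function defined above and matplotlib being available (the lab's interactive widget itself is not reproduced here):

import matplotlib.pyplot as plt

# Sweep p over the open interval (0, 1); entropy(0) = entropy(1) = 0 at the endpoints
p_values = np.linspace(0.001, 0.999, 200)
h_values = [entropy(p) for p in p_values]

plt.plot(p_values, h_values)
plt.xlabel("p (fraction of positive examples in the node)")
plt.ylabel("H(p)")
plt.title("Entropy versus class proportion")
plt.show()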

Split Indices Function:

split_function.py
def split_indices(X, index_feature):
    """Split dataset indices based on the value of one binary feature."""
    left_indices = []    # examples where the feature equals 1
    right_indices = []   # examples where the feature equals 0
    for i, x in enumerate(X):
        if x[index_feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    return left_indices, right_indices
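As a quick check on the three examples shown in the table above (X_small is an illustrative name, not part of the lab): splitting on Ear Shape (feature index 0) sends only the pointy-eared example to the left branch.

X_small = np.array([[1, 1, 1],   # example 0: Pointy, Round, Present
                    [0, 0, 1],   # example 1: Floppy, Not Round, Present
                    [0, 1, 0]])  # example 2: Floppy, Round, Absent

left, right = split_indices(X_small, 0)   # split on Ear Shape
print(left, right)                        # [0] [1, 2]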

Weighted Entropy Function:

weighted_entropy.py
def weighted_entropy(X, y, left_indices, right_indices):
    """Calculate the weighted entropy of the two branches after a split."""
    w_left = len(left_indices) / len(X)
    w_right = len(right_indices) / len(X)
    p_left = sum(y[left_indices]) / len(left_indices)
    p_right = sum(y[right_indices]) / len(right_indices)
    return w_left * entropy(p_left) + w_right * entropy(p_right)

information_gain.py
def information_gain(X, y, left_indices, right_indices):
    """Calculate the information gain from a split: node entropy minus weighted branch entropy."""
    p_node = sum(y) / len(y)
    h_node = entropy(p_node)
    w_entropy = weighted_entropy(X, y, left_indices, right_indices)
    return h_node - w_entropy
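Continuing the three-example check from above (a hand-worked sketch using the illustrative X_small and y_small, not the lab's printed output): the node holds 2 cats out of 3, the left branch is pure, and the right branch is a 50/50 mix.

y_small = np.array([1, 1, 0])             # labels for the three examples above
left, right = split_indices(X_small, 0)   # [0], [1, 2]

# (1/3) * H(1.0) + (2/3) * H(0.5) = 0 + 0.667
print(weighted_entropy(X_small, y_small, left, right))   # ~0.67

# H(2/3) - 0.667 = 0.918 - 0.667
print(information_gain(X_small, y_small, left, right))   # ~0.25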

Starting conditions:

  • 10 examples total: 5 cats, 5 dogs
  • Root entropy: H(0.5) = 1.0
  • Goal: Find feature with highest information gain

feature_comparison.py
for i, feature_name in enumerate(['Ear Shape', 'Face Shape', 'Whiskers']):
    left_indices, right_indices = split_indices(X_train, i)
    i_gain = information_gain(X_train, y_train, left_indices, right_indices)
    print(f"Feature: {feature_name}, Information Gain: {i_gain:.2f}")

# Expected output:
# Feature: Ear Shape, Information Gain: 0.28
# Feature: Face Shape, Information Gain: 0.03
# Feature: Whiskers, Information Gain: 0.12

Result: Ear Shape provides the highest information gain (0.28). Decision: split the root node on the Ear Shape feature.

After first split on ear shape:

  • Left branch: Examples with pointy ears
  • Right branch: Examples with floppy ears
  • Visualization: Interactive tree diagram showing split
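A minimal sketch of reproducing this root split in code, assuming the full X_train and y_train arrays (the exact index lists depend on the seven examples elided above, so no output is hard-coded):

left_indices, right_indices = split_indices(X_train, 0)   # feature 0 = Ear Shape
print("Pointy ears (left branch): ", left_indices)
print("Floppy ears (right branch):", right_indices)
print("Cats in left branch: ", sum(y_train[left_indices]), "of", len(left_indices))
print("Cats in right branch:", sum(y_train[right_indices]), "of", len(right_indices))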

Recursive process continues:

  • Maximum depth: Set to 2 for demonstration
  • Stopping criteria: Pure nodes or depth limit
  • Final tree: Multiple levels with leaf predictions

Tree-building steps (a recursive sketch follows this list):

  1. Start at root: All examples together
  2. Calculate gains: For each possible feature
  3. Select best: Feature with highest information gain
  4. Split data: According to selected feature
  5. Recurse: Repeat process on subsets until stopping criteria met
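A minimal sketch of these steps as a recursive builder, assuming the split_indices and information_gain functions defined earlier; build_tree_sketch, its branch labels, and the printed trace are illustrative names, not the lab's exact helper:

def build_tree_sketch(X, y, node_indices, max_depth, current_depth=0, branch_name="Root"):
    """Recursively split node_indices on the best feature until a pure node or the depth limit."""
    # Stopping criteria: depth limit reached or the node is pure (entropy = 0)
    p_node = sum(y[node_indices]) / len(node_indices)
    if current_depth == max_depth or p_node == 0 or p_node == 1:
        print(f"{'  ' * current_depth}{branch_name} leaf: examples {node_indices}")
        return

    # Evaluate the information gain of every feature on this node's examples
    best_feature, best_gain = None, 0.0
    for feature in range(X.shape[1]):
        left, right = split_indices(X[node_indices], feature)
        if len(left) == 0 or len(right) == 0:   # one-sided split gives no gain; skip it
            continue
        gain = information_gain(X[node_indices], y[node_indices], left, right)
        if gain > best_gain:
            best_feature, best_gain = feature, gain

    if best_feature is None:   # no split improves purity; stop here
        print(f"{'  ' * current_depth}{branch_name} leaf: examples {node_indices}")
        return

    print(f"{'  ' * current_depth}{branch_name}: split on feature {best_feature} (gain {best_gain:.2f})")

    # Map the positional indices returned by split_indices back to original example indices
    left_pos, right_pos = split_indices(X[node_indices], best_feature)
    left_indices = [node_indices[i] for i in left_pos]
    right_indices = [node_indices[i] for i in right_pos]

    build_tree_sketch(X, y, left_indices, max_depth, current_depth + 1, "Left")
    build_tree_sketch(X, y, right_indices, max_depth, current_depth + 1, "Right")

# Build a tree of depth 2 on the full training set, matching the demonstration depth
build_tree_sketch(X_train, y_train, list(range(len(X_train))), max_depth=2)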

Key concepts:

  • Entropy: Measures impurity using the log₂ formula
  • Information gain: Reduction in entropy from splitting
  • Weighted averages: Account for different branch sizes
  • Optimization: Choose splits that maximize purity
  • Feature comparison: Systematic evaluation of all options
  • Pure nodes: Natural stopping points (entropy = 0)
  • Tree depth: Controls model complexity
  • Recursive structure: Build large trees from smaller subtrees

The lab provides hands-on experience with the mathematical foundations of decision trees, making the algorithmic concepts concrete through implementation and visualization.