
Practice Lab Decision Trees

In this exercise, you will implement a decision tree from scratch and apply it to classify whether a mushroom is edible or poisonous.

Build from Scratch

This lab focuses on implementing the core decision tree algorithms rather than using libraries.

Business Context: Starting a wild mushroom company
Challenge: Determine whether mushrooms are edible or poisonous based on physical attributes
Goal: Build a classifier using existing data to identify safe mushrooms for sale

The dataset contains 10 training examples, each described by three categorical features:

Feature      | Values               | Encoding
Cap Color    | Brown / Red          | Brown=1, Red=0
Stalk Shape  | Tapering / Enlarging | Tapering=1, Enlarging=0
Solitary     | Yes / No             | Yes=1, No=0

Target Variable: Edible (1=Yes, 0=Poisonous)
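
To make the encoding concrete, here is a minimal sketch of how raw attribute values map to the 0/1 feature vectors used below; the helper name encode_mushroom and the string values are illustrative, not part of the lab.

encode_example.py (illustrative)
def encode_mushroom(cap_color, stalk_shape, solitary):
    """Map raw categorical attributes to the 0/1 encoding in the table above."""
    return [
        1 if cap_color == "Brown" else 0,       # Brown=1, Red=0
        1 if stalk_shape == "Tapering" else 0,  # Tapering=1, Enlarging=0
        1 if solitary == "Yes" else 0,          # Yes=1, No=0
    ]

print(encode_mushroom("Brown", "Tapering", "Yes"))  # [1, 1, 1], matches the first training example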

dataset_setup.py
import numpy as np

X_train = np.array([[1, 1, 1],   # Brown, Tapering, Solitary → Edible
                    [1, 0, 1],   # Brown, Enlarging, Solitary → Edible
                    [1, 0, 0],   # Brown, Enlarging, Not Solitary → Poisonous
                    # ... more examples
                   ])
y_train = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

print('Training examples (m):', len(X_train))   # 10 examples
print('Features (n):', X_train.shape[1])        # 3 features

Implementation Task: Calculate entropy to measure node impurity

compute_entropy.py
def compute_entropy(y):
    """
    Computes entropy for binary classification.

    Args:
        y (ndarray): Binary labels (1=edible, 0=poisonous)
    Returns:
        entropy (float): Entropy value
    """
    entropy = 0.

    # STUDENT IMPLEMENTATION REQUIRED
    if len(y) != 0:
        p1 = len(y[y == 1]) / len(y)   # Fraction of edible mushrooms
        if p1 != 0 and p1 != 1:
            entropy = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
        else:
            entropy = 0.   # Pure node

    return entropy

# Test: Root node entropy should be 1.0 (5 edible, 5 poisonous)

Key Implementation Points:

  • Handle edge cases: empty nodes and pure nodes (p₁ = 0 or 1)
  • Use log base 2 for standard entropy calculation
  • Convention: 0 log(0) = 0 for computation purposes
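
The quantity being implemented is the binary entropy H(p1) = -p1 log2(p1) - (1 - p1) log2(1 - p1). As a quick sanity check of the edge cases above, a minimal sketch (assuming numpy is imported as np and compute_entropy is defined as shown):

entropy_checks.py (illustrative)
# Root node: 5 edible, 5 poisonous → p1 = 0.5 → entropy = 1.0
print(compute_entropy(np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])))  # 1.0

# Pure node: all edible → entropy = 0.0 (the 0 log(0) = 0 convention)
print(compute_entropy(np.array([1, 1, 1])))                       # 0.0

# Empty node: no examples → entropy defaults to 0.0
print(compute_entropy(np.array([])))                              # 0.0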

Implementation Task: Split data based on feature values

split_dataset.py
def split_dataset(X, node_indices, feature):
    """
    Splits data into left (feature=1) and right (feature=0) branches.

    Args:
        X (ndarray): Feature matrix
        node_indices (list): Active example indices
        feature (int): Feature index to split on
    Returns:
        left_indices, right_indices: Split indices
    """
    left_indices = []    # feature = 1
    right_indices = []   # feature = 0

    # STUDENT IMPLEMENTATION REQUIRED
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)

    return left_indices, right_indices

# Test: Split on feature 0 (Brown Cap) at root
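
As a concrete check that uses only the three training rows listed in dataset_setup.py (indices 0-2), a minimal usage sketch:

split_example.py (illustrative)
# Rows shown above: index 0 = [1,1,1], index 1 = [1,0,1], index 2 = [1,0,0]
left, right = split_dataset(X_train, [0, 1, 2], feature=1)  # Stalk Shape
print(left, right)  # [0] [1, 2]: only index 0 has a tapering stalk (value 1)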

Implementation Task: Compute information gain from splitting

information_gain.py
def compute_information_gain(X, y, node_indices, feature):
    """
    Computes information gain from splitting on a given feature.

    Args:
        X, y: Training data
        node_indices: Active indices at current node
        feature: Feature to split on
    Returns:
        information_gain: Reduction in entropy
    """
    # Split the dataset
    left_indices, right_indices = split_dataset(X, node_indices, feature)

    # Extract relevant data
    y_node = y[node_indices]
    y_left = y[left_indices]
    y_right = y[right_indices]

    # STUDENT IMPLEMENTATION REQUIRED
    # Calculate entropies
    node_entropy = compute_entropy(y_node)
    left_entropy = compute_entropy(y_left)
    right_entropy = compute_entropy(y_right)

    # Calculate weights
    w_left = len(left_indices) / len(node_indices)
    w_right = len(right_indices) / len(node_indices)

    # Calculate weighted entropy
    weighted_entropy = w_left * left_entropy + w_right * right_entropy

    # Information gain
    information_gain = node_entropy - weighted_entropy

    return information_gain

Expected Results (splitting the root node):

  • Brown Cap: 0.035 information gain
  • Tapering Stalk: 0.125 information gain
  • Solitary: 0.278 information gain (highest - best split)
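
A short sketch to reproduce these numbers at the root node, assuming the full 10-example X_train/y_train from the lab and the functions defined above (the feature-name list is added here for readability):

root_information_gain.py (illustrative)
root_indices = list(range(len(X_train)))  # every example belongs to the root

for feature, name in enumerate(["Brown Cap", "Tapering Stalk", "Solitary"]):
    gain = compute_information_gain(X_train, y_train, root_indices, feature)
    print(f"{name}: information gain = {gain:.3f}")
# Expected output (per the lab): 0.035, 0.125, 0.278; Solitary reduces entropy the most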

Implementation Task: Find feature with maximum information gain

get_best_split.py
def get_best_split(X, y, node_indices):
    """
    Returns the optimal feature to split on.

    Args:
        X, y: Training data
        node_indices: Active indices at current node
    Returns:
        best_feature: Index of best feature to split on
    """
    num_features = X.shape[1]
    best_feature = -1

    # STUDENT IMPLEMENTATION REQUIRED
    max_info_gain = 0
    for feature in range(num_features):
        info_gain = compute_information_gain(X, y, node_indices, feature)
        if info_gain > max_info_gain:
            max_info_gain = info_gain
            best_feature = feature

    return best_feature

# Test: Should return feature 2 (Solitary) as best split
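
A minimal usage sketch for the root node, again assuming the full training set and the functions above:

best_split_example.py (illustrative)
root_indices = list(range(len(X_train)))
best_feature = get_best_split(X_train, y_train, root_indices)
print("Best feature to split on:", best_feature)  # Expected: 2 (Solitary)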

Automated Implementation (provided - no student coding required):

build_tree.py
def build_tree_recursive(X, y, node_indices, branch_name, max_depth, current_depth):
    """
    Builds the decision tree using recursive splitting.

    Stopping criteria:
    - Maximum depth reached (max_depth=2)
    - Pure node achieved
    - No more examples to split
    """
    # Check stopping criteria
    if current_depth == max_depth:
        print(f"Leaf node: {branch_name} with indices {node_indices}")
        return

    # Get best split
    best_feature = get_best_split(X, y, node_indices)
    print(f"Depth {current_depth}, {branch_name}: Split on feature {best_feature}")

    # Split and recurse
    left_indices, right_indices = split_dataset(X, node_indices, best_feature)
    build_tree_recursive(X, y, left_indices, "Left", max_depth, current_depth + 1)
    build_tree_recursive(X, y, right_indices, "Right", max_depth, current_depth + 1)
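
A short sketch of how the builder is invoked from the root, assuming the full training set; max_depth=2 matches the stopping criterion stated in the docstring:

build_tree_example.py (illustrative)
root_indices = list(range(len(X_train)))
build_tree_recursive(X_train, y_train, root_indices, "Root", max_depth=2, current_depth=0)
# Prints the feature chosen at each internal node and the example indices at each leaf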

Expected Tree Structure:

  1. Root: Split on Solitary (feature 2)
  2. Left Branch (Solitary=Yes): Further splits on other features
  3. Right Branch (Solitary=No): Further splits on other features
  4. Leaf Nodes: Final predictions (Edible/Poisonous)

Core Concepts Implemented:

  1. Entropy Calculation: Measure impurity using information theory
  2. Dataset Splitting: Partition examples based on feature values
  3. Information Gain: Quantify the improvement from splitting
  4. Feature Selection: Choose optimal splits systematically
  5. Recursive Construction: Build the complete tree through repeated splitting

Skills Practiced:

  • Mathematical formulas: Translate entropy equations into code
  • Data manipulation: Split datasets efficiently using indices
  • Algorithm logic: Implement greedy feature selection
  • Edge case handling: Deal with empty nodes and pure nodes
  • Recursive thinking: Understand tree building as a recursive process

Key Insights:

  • Feature importance: The Solitary feature provides the highest information gain
  • Stopping criteria: A maximum depth prevents overfitting
  • Pure nodes: Natural stopping points when entropy = 0
  • Greedy selection: The best local choice is made at each node

This hands-on implementation reinforces the theoretical concepts from the lectures and prepares students to use production-ready decision tree libraries with a deeper understanding of the underlying algorithms.