
Practice Lab Decision Trees

In this exercise, you will implement a decision tree from scratch and apply it to classify whether a mushroom is edible or poisonous.

Build from Scratch

This lab focuses on implementing the core decision tree algorithms rather than using libraries.

Business Context: Starting a wild mushroom company
Challenge: Determine whether mushrooms are edible or poisonous based on physical attributes
Goal: Build a classifier using existing data to identify safe mushrooms for sale

The dataset contains 10 training examples, each described by three categorical features:

Feature      | Values               | Encoding
Cap Color    | Brown / Red          | Brown=1, Red=0
Stalk Shape  | Tapering / Enlarging | Tapering=1, Enlarging=0
Solitary     | Yes / No             | Yes=1, No=0

Target Variable: Edible (1=Yes, 0=Poisonous)
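
To make the encoding concrete, here is a minimal sketch of how raw attribute values map to the 0/1 feature vectors used below; the helper name encode_mushroom and the string values are illustrative, not part of the lab.

encode_example.py (illustrative)
def encode_mushroom(cap_color, stalk_shape, solitary):
    """Map raw categorical attributes to the 0/1 encoding in the table above."""
    return [
        1 if cap_color == "Brown" else 0,       # Brown=1, Red=0
        1 if stalk_shape == "Tapering" else 0,  # Tapering=1, Enlarging=0
        1 if solitary == "Yes" else 0,          # Yes=1, No=0
    ]

print(encode_mushroom("Brown", "Tapering", "Yes"))  # [1, 1, 1], matches the first training example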

dataset_setup.py
import numpy as np

X_train = np.array([[1, 1, 1],   # Brown, Tapering, Solitary → Edible
                    [1, 0, 1],   # Brown, Enlarging, Solitary → Edible
                    [1, 0, 0],   # Brown, Enlarging, Not Solitary → Poisonous
                    # ... more examples
                   ])
y_train = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

print('Training examples (m):', len(X_train))   # 10 examples
print('Features (n):', X_train.shape[1])        # 3 features

Implementation Task: Calculate entropy to measure node impurity

compute_entropy.py
def compute_entropy(y):
    """
    Computes entropy for binary classification.

    Args:
        y (ndarray): Binary labels (1=edible, 0=poisonous)
    Returns:
        entropy (float): Entropy value
    """
    entropy = 0.

    # STUDENT IMPLEMENTATION REQUIRED
    if len(y) != 0:
        p1 = len(y[y == 1]) / len(y)   # Fraction of edible mushrooms
        if p1 != 0 and p1 != 1:
            entropy = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
        else:
            entropy = 0.   # Pure node

    return entropy

# Test: Root node entropy should be 1.0 (5 edible, 5 poisonous)

Key Implementation Points:

  • Handle edge cases: empty nodes and pure nodes (p₁ = 0 or 1)
  • Use log base 2 for standard entropy calculation
  • Convention: 0 log(0) = 0 for computation purposes
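
The quantity being implemented is the binary entropy H(p1) = -p1 log2(p1) - (1 - p1) log2(1 - p1). As a quick sanity check of the edge cases above, a minimal sketch (assuming numpy is imported as np and compute_entropy is defined as shown):

entropy_checks.py (illustrative)
# Root node: 5 edible, 5 poisonous → p1 = 0.5 → entropy = 1.0
print(compute_entropy(np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])))  # 1.0

# Pure node: all edible → entropy = 0.0 (the 0 log(0) = 0 convention)
print(compute_entropy(np.array([1, 1, 1])))                       # 0.0

# Empty node: no examples → entropy defaults to 0.0
print(compute_entropy(np.array([])))                              # 0.0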

Implementation Task: Split data based on feature values

split_dataset.py
def split_dataset(X, node_indices, feature):
    """
    Splits data into left (feature=1) and right (feature=0) branches.

    Args:
        X (ndarray): Feature matrix
        node_indices (list): Active example indices
        feature (int): Feature index to split on
    Returns:
        left_indices, right_indices: Split indices
    """
    left_indices = []    # feature = 1
    right_indices = []   # feature = 0

    # STUDENT IMPLEMENTATION REQUIRED
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)

    return left_indices, right_indices

# Test: Split on feature 0 (Brown Cap) at root
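
As a concrete check that uses only the three training rows listed in dataset_setup.py (indices 0-2), a minimal usage sketch:

split_example.py (illustrative)
# Rows shown above: index 0 = [1,1,1], index 1 = [1,0,1], index 2 = [1,0,0]
left, right = split_dataset(X_train, [0, 1, 2], feature=1)  # Stalk Shape
print(left, right)  # [0] [1, 2]: only index 0 has a tapering stalk (value 1)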

Implementation Task: Compute information gain from splitting

information_gain.py
def compute_information_gain(X, y, node_indices, feature):
    """
    Computes information gain from splitting on a given feature.

    Args:
        X, y: Training data
        node_indices: Active indices at current node
        feature: Feature to split on
    Returns:
        information_gain: Reduction in entropy
    """
    # Split the dataset
    left_indices, right_indices = split_dataset(X, node_indices, feature)

    # Extract relevant data
    y_node = y[node_indices]
    y_left = y[left_indices]
    y_right = y[right_indices]

    # STUDENT IMPLEMENTATION REQUIRED
    # Calculate entropies
    node_entropy = compute_entropy(y_node)
    left_entropy = compute_entropy(y_left)
    right_entropy = compute_entropy(y_right)

    # Calculate weights
    w_left = len(left_indices) / len(node_indices)
    w_right = len(right_indices) / len(node_indices)

    # Calculate weighted entropy
    weighted_entropy = w_left * left_entropy + w_right * right_entropy

    # Information gain
    information_gain = node_entropy - weighted_entropy

    return information_gain

Expected Results (splitting the root node):

  • Brown Cap: 0.035 information gain
  • Tapering Stalk: 0.125 information gain
  • Solitary: 0.278 information gain (highest - best split)
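
A short sketch to reproduce these numbers at the root node, assuming the full 10-example X_train/y_train from the lab and the functions defined above (the feature-name list is added here for readability):

root_information_gain.py (illustrative)
root_indices = list(range(len(X_train)))  # every example belongs to the root

for feature, name in enumerate(["Brown Cap", "Tapering Stalk", "Solitary"]):
    gain = compute_information_gain(X_train, y_train, root_indices, feature)
    print(f"{name}: information gain = {gain:.3f}")
# Expected output (per the lab): 0.035, 0.125, 0.278; Solitary reduces entropy the most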

Implementation Task: Find feature with maximum information gain

get_best_split.py
def get_best_split(X, y, node_indices):
    """
    Returns the optimal feature to split on.

    Args:
        X, y: Training data
        node_indices: Active indices at current node
    Returns:
        best_feature: Index of best feature to split on
    """
    num_features = X.shape[1]
    best_feature = -1

    # STUDENT IMPLEMENTATION REQUIRED
    max_info_gain = 0
    for feature in range(num_features):
        info_gain = compute_information_gain(X, y, node_indices, feature)
        if info_gain > max_info_gain:
            max_info_gain = info_gain
            best_feature = feature

    return best_feature

# Test: Should return feature 2 (Solitary) as best split
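
A minimal usage sketch for the root node, again assuming the full training set and the functions above:

best_split_example.py (illustrative)
root_indices = list(range(len(X_train)))
best_feature = get_best_split(X_train, y_train, root_indices)
print("Best feature to split on:", best_feature)  # Expected: 2 (Solitary)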

Automated Implementation (provided - no student coding required):

build_tree.py
def build_tree_recursive(X, y, node_indices, branch_name, max_depth, current_depth):
    """
    Builds the decision tree using recursive splitting.

    Stopping criteria:
    - Maximum depth reached (max_depth=2)
    - Pure node achieved
    - No more examples to split
    """
    # Check stopping criteria
    if current_depth == max_depth:
        print(f"Leaf node: {branch_name} with indices {node_indices}")
        return

    # Get best split
    best_feature = get_best_split(X, y, node_indices)
    print(f"Depth {current_depth}, {branch_name}: Split on feature {best_feature}")

    # Split and recurse
    left_indices, right_indices = split_dataset(X, node_indices, best_feature)
    build_tree_recursive(X, y, left_indices, "Left", max_depth, current_depth + 1)
    build_tree_recursive(X, y, right_indices, "Right", max_depth, current_depth + 1)
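
A short sketch of how the builder is invoked from the root, assuming the full training set; max_depth=2 matches the stopping criterion stated in the docstring:

build_tree_example.py (illustrative)
root_indices = list(range(len(X_train)))
build_tree_recursive(X_train, y_train, root_indices, "Root", max_depth=2, current_depth=0)
# Prints the feature chosen at each internal node and the example indices at each leaf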

Expected Tree Structure:

  1. Root: Split on Solitary (feature 2)
  2. Left Branch (Solitary=Yes): Further splits on other features
  3. Right Branch (Solitary=No): Further splits on other features
  4. Leaf Nodes: Final predictions (Edible/Poisonous)

Core Concepts Implemented:

  1. Entropy Calculation: Measure impurity using information theory
  2. Dataset Splitting: Partition examples based on feature values
  3. Information Gain: Quantify the improvement from splitting
  4. Feature Selection: Choose optimal splits systematically
  5. Recursive Construction: Build the complete tree through repeated splitting

Skills Practiced:

  • Mathematical formulas: Translate entropy equations into code
  • Data manipulation: Split datasets efficiently using indices
  • Algorithm logic: Implement greedy feature selection
  • Edge case handling: Deal with empty nodes and pure nodes
  • Recursive thinking: Understand tree building as a recursive process

Key Insights:

  • Feature importance: The Solitary feature provides the highest information gain
  • Stopping criteria: A maximum depth prevents overfitting
  • Pure nodes: Natural stopping points when entropy = 0
  • Greedy selection: The best local choice is made at each node

This hands-on implementation reinforces the theoretical concepts from the lectures and prepares students to use production-ready decision tree libraries with a deeper understanding of the underlying algorithms.