
Optional Lab: Decision Trees

This notebook visualizes how decision trees split using information gain, implementing the core concepts from the lectures with hands-on code examples.

Ten training examples with three categorical features (first three rows shown below):

Animal | Ear Shape | Face Shape | Whiskers | Cat
-------|-----------|------------|----------|----
0      | Pointy    | Round      | Present  | 1
1      | Floppy    | Not Round  | Present  | 1
2      | Floppy    | Round      | Absent   | 0

Binary representation of categorical features:

  • Ear Shape: Pointy = 1, Floppy = 0
  • Face Shape: Round = 1, Not Round = 0
  • Whiskers: Present = 1, Absent = 0

Result: X_train holds the 10 examples encoded as 3 binary features each; y_train holds the labels (1 = cat, 0 = not cat)

dataset_setup.py
import numpy as np

X_train = np.array([[1, 1, 1],   # Pointy, Round, Present
                    [0, 0, 1],   # Floppy, Not Round, Present
                    [0, 1, 0],   # Floppy, Round, Absent
                    # ... more examples
                    ])
y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

Key implementation from lecture material:

entropy_function.py
def entropy(p):
    """Calculate entropy for probability p."""
    if p == 0 or p == 1:
        return 0
    else:
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Example: root node entropy
print(entropy(0.5))   # Output: 1.0 (maximum impurity)

Interactive plot shows the entropy curve (a plotting sketch follows this list):

  • Maximum at p = 0.5: entropy = 1.0 (highest uncertainty)
  • Minimum at the extremes (p = 0 or p = 1): entropy = 0.0 (complete certainty)
  • Shape: a concave curve, symmetric about p = 0.5 and peaking at an equal class distribution
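A minimal sketch of how this curve can be reproduced, assuming the entropy function defined above and matplotlib being available (the lab's interactive widget itself is not reproduced here):

import matplotlib.pyplot as plt

# Sweep p over the open interval (0, 1); entropy(0) = entropy(1) = 0 at the endpoints
p_values = np.linspace(0.001, 0.999, 200)
h_values = [entropy(p) for p in p_values]

plt.plot(p_values, h_values)
plt.xlabel("p (fraction of positive examples in the node)")
plt.ylabel("H(p)")
plt.title("Entropy versus class proportion")
plt.show()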

Split Indices Function:

split_function.py
def split_indices(X, index_feature):
    """Split dataset indices based on the value of one binary feature."""
    left_indices = []    # examples where the feature equals 1
    right_indices = []   # examples where the feature equals 0
    for i, x in enumerate(X):
        if x[index_feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    return left_indices, right_indices
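As a quick check on the three examples shown in the table above (X_small is an illustrative name, not part of the lab): splitting on Ear Shape (feature index 0) sends only the pointy-eared example to the left branch.

X_small = np.array([[1, 1, 1],   # example 0: Pointy, Round, Present
                    [0, 0, 1],   # example 1: Floppy, Not Round, Present
                    [0, 1, 0]])  # example 2: Floppy, Round, Absent

left, right = split_indices(X_small, 0)   # split on Ear Shape
print(left, right)                        # [0] [1, 2]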

Weighted Entropy Function:

weighted_entropy.py
def weighted_entropy(X, y, left_indices, right_indices):
    """Calculate the weighted entropy of the two branches after a split."""
    w_left = len(left_indices) / len(X)
    w_right = len(right_indices) / len(X)
    p_left = sum(y[left_indices]) / len(left_indices)
    p_right = sum(y[right_indices]) / len(right_indices)
    return w_left * entropy(p_left) + w_right * entropy(p_right)

information_gain.py
def information_gain(X, y, left_indices, right_indices):
    """Calculate the information gain from a split: node entropy minus weighted branch entropy."""
    p_node = sum(y) / len(y)
    h_node = entropy(p_node)
    w_entropy = weighted_entropy(X, y, left_indices, right_indices)
    return h_node - w_entropy
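Continuing the three-example check from above (a hand-worked sketch using the illustrative X_small and y_small, not the lab's printed output): the node holds 2 cats out of 3, the left branch is pure, and the right branch is a 50/50 mix.

y_small = np.array([1, 1, 0])             # labels for the three examples above
left, right = split_indices(X_small, 0)   # [0], [1, 2]

# (1/3) * H(1.0) + (2/3) * H(0.5) = 0 + 0.667
print(weighted_entropy(X_small, y_small, left, right))   # ~0.67

# H(2/3) - 0.667 = 0.918 - 0.667
print(information_gain(X_small, y_small, left, right))   # ~0.25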

Starting conditions:

  • 10 examples total: 5 cats, 5 dogs
  • Root entropy: H(0.5) = 1.0
  • Goal: Find feature with highest information gain

feature_comparison.py
for i, feature_name in enumerate(['Ear Shape', 'Face Shape', 'Whiskers']):
    left_indices, right_indices = split_indices(X_train, i)
    i_gain = information_gain(X_train, y_train, left_indices, right_indices)
    print(f"Feature: {feature_name}, Information Gain: {i_gain:.2f}")

# Expected output:
# Feature: Ear Shape, Information Gain: 0.28
# Feature: Face Shape, Information Gain: 0.03
# Feature: Whiskers, Information Gain: 0.12

Result: Ear Shape provides the highest information gain (0.28). Decision: split the root node on the Ear Shape feature.

After first split on ear shape:

  • Left branch: Examples with pointy ears
  • Right branch: Examples with floppy ears
  • Visualization: Interactive tree diagram showing split
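A minimal sketch of reproducing this root split in code, assuming the full X_train and y_train arrays (the exact index lists depend on the seven examples elided above, so no output is hard-coded):

left_indices, right_indices = split_indices(X_train, 0)   # feature 0 = Ear Shape
print("Pointy ears (left branch): ", left_indices)
print("Floppy ears (right branch):", right_indices)
print("Cats in left branch: ", sum(y_train[left_indices]), "of", len(left_indices))
print("Cats in right branch:", sum(y_train[right_indices]), "of", len(right_indices))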

Recursive process continues:

  • Maximum depth: Set to 2 for demonstration
  • Stopping criteria: Pure nodes or depth limit
  • Final tree: Multiple levels with leaf predictions

Tree-building steps (a recursive sketch follows this list):

  1. Start at root: All examples together
  2. Calculate gains: For each possible feature
  3. Select best: Feature with highest information gain
  4. Split data: According to selected feature
  5. Recurse: Repeat process on subsets until stopping criteria met
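A minimal sketch of these steps as a recursive builder, assuming the split_indices and information_gain functions defined earlier; build_tree_sketch, its branch labels, and the printed trace are illustrative names, not the lab's exact helper:

def build_tree_sketch(X, y, node_indices, max_depth, current_depth=0, branch_name="Root"):
    """Recursively split node_indices on the best feature until a pure node or the depth limit."""
    # Stopping criteria: depth limit reached or the node is pure (entropy = 0)
    p_node = sum(y[node_indices]) / len(node_indices)
    if current_depth == max_depth or p_node == 0 or p_node == 1:
        print(f"{'  ' * current_depth}{branch_name} leaf: examples {node_indices}")
        return

    # Evaluate the information gain of every feature on this node's examples
    best_feature, best_gain = None, 0.0
    for feature in range(X.shape[1]):
        left, right = split_indices(X[node_indices], feature)
        if len(left) == 0 or len(right) == 0:   # one-sided split gives no gain; skip it
            continue
        gain = information_gain(X[node_indices], y[node_indices], left, right)
        if gain > best_gain:
            best_feature, best_gain = feature, gain

    if best_feature is None:   # no split improves purity; stop here
        print(f"{'  ' * current_depth}{branch_name} leaf: examples {node_indices}")
        return

    print(f"{'  ' * current_depth}{branch_name}: split on feature {best_feature} (gain {best_gain:.2f})")

    # Map the positional indices returned by split_indices back to original example indices
    left_pos, right_pos = split_indices(X[node_indices], best_feature)
    left_indices = [node_indices[i] for i in left_pos]
    right_indices = [node_indices[i] for i in right_pos]

    build_tree_sketch(X, y, left_indices, max_depth, current_depth + 1, "Left")
    build_tree_sketch(X, y, right_indices, max_depth, current_depth + 1, "Right")

# Build a tree of depth 2 on the full training set, matching the demonstration depth
build_tree_sketch(X_train, y_train, list(range(len(X_train))), max_depth=2)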

Key concepts:

  • Entropy: Measures impurity using the log₂ formula
  • Information gain: Reduction in entropy from splitting
  • Weighted averages: Account for different branch sizes
  • Optimization: Choose splits that maximize purity
  • Feature comparison: Systematic evaluation of all options
  • Pure nodes: Natural stopping points (entropy = 0)
  • Tree depth: Controls model complexity
  • Recursive structure: Build large trees from smaller subtrees

The lab provides hands-on experience with the mathematical foundations of decision trees, making the algorithmic concepts concrete through implementation and visualization.