Building a decision tree involves systematic decisions at each step to create an effective classifier from training data.
Decision Required : What feature to use at the root node (top of tree)
Algorithm Process :
Evaluate all available features (ear shape, face shape, whiskers)
Choose feature that provides best information gain
Example Result : Select “Ear Shape” as root feature
Based on the chosen feature, partition all 10 training examples:
Left Branch : 5 examples with pointy ears
Right Branch : 5 examples with floppy ears
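To make this partitioning step concrete, here is a minimal Python sketch. The dataset, the binary feature encoding (1 = pointy ears / round face / whiskers present), and the helper name split_on_feature are illustrative assumptions consistent with the counts above, not code from the original material.

```python
# Illustrative dataset consistent with the splits described above.
# Columns (assumed encoding): ear shape (1 = pointy), face shape (1 = round),
# whiskers (1 = present); label 1 = cat, 0 = not cat.
X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

def split_on_feature(X, node_indices, feature):
    """Partition a node's examples: feature value 1 goes left, 0 goes right."""
    left = [i for i in node_indices if X[i][feature] == 1]
    right = [i for i in node_indices if X[i][feature] == 0]
    return left, right

root_indices = list(range(len(X)))
left, right = split_on_feature(X, root_indices, feature=0)  # split on ear shape
print(len(left), len(right))  # 5 pointy-eared examples left, 5 floppy-eared right
```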
Focus on left branch (5 examples with pointy ears):
Choose next best feature for this subset
Algorithm Choice : Face Shape feature
Further Split :
4 examples with round faces → All cats → Leaf Node: Cat
1 example with a non-round face → Dog → Leaf Node: Not Cat
Focus on right branch (5 examples with floppy ears):
Contains 1 cat and 4 dogs
Choose best feature: Whiskers
Split Results :
1 example with whiskers → Cat → Leaf Node: Cat
4 examples without whiskers → Dogs → Leaf Node: Not Cat
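Continuing the sketch, the two branch splits above can be reproduced directly. The dataset and feature encoding are the same illustrative assumptions as in the previous snippet.

```python
# Same illustrative dataset and encoding as above:
# columns = (ear shape, face shape, whiskers), label 1 = cat.
X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

pointy = [i for i in range(len(X)) if X[i][0] == 1]   # left branch (5 examples)
floppy = [i for i in range(len(X)) if X[i][0] == 0]   # right branch (5 examples)

# Left branch: split on face shape -> 4 round-faced cats, 1 non-round dog.
print([y[i] for i in pointy if X[i][1] == 1],   # [1, 1, 1, 1] -> Leaf: Cat
      [y[i] for i in pointy if X[i][1] == 0])   # [0]          -> Leaf: Not Cat

# Right branch: split on whiskers -> 1 whiskered cat, 4 non-whiskered dogs.
print([y[i] for i in floppy if X[i][2] == 1],   # [1]          -> Leaf: Cat
      [y[i] for i in floppy if X[i][2] == 0])   # [0, 0, 0, 0] -> Leaf: Not Cat
```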
Question : How do you choose what feature to split on at each node?
Answer : Maximize Purity
Choose features that create subsets as close as possible to “all cats” or “all dogs”
Ideal Example : “Cat DNA” feature would create perfect splits (5/5 cats left, 0/5 cats right)
Actual Features : Compare ear shape, face shape, whiskers for best purity
Ear Shape Split
Left: 4/5 cats
Right: 1/5 cats
Highest purity of the three candidate splits
Face Shape Split
Left: 4/7 cats
Right: 1/3 cats
Moderate purity
Whiskers Split
Left: 3/4 cats
Right: 2/6 cats
Lower purity
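The three comparisons above can be checked with a short sketch that reports the fraction of cats on each side of every candidate split. The dataset is the same illustrative one used earlier; only the counts matter here.

```python
# Sketch: compare candidate splits by the fraction of cats in each subset.
X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
feature_names = ["ear shape", "face shape", "whiskers"]

for f, name in enumerate(feature_names):
    left = [y[i] for i in range(len(X)) if X[i][f] == 1]
    right = [y[i] for i in range(len(X)) if X[i][f] == 0]
    print(f"{name}: left {sum(left)}/{len(left)} cats, "
          f"right {sum(right)}/{len(right)} cats")

# Output matches the fractions listed above:
# ear shape: left 4/5 cats, right 1/5 cats
# face shape: left 4/7 cats, right 1/3 cats
# whiskers: left 3/4 cats, right 2/6 cats
```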
Question : When do you stop splitting nodes?
Multiple Criteria Options :
Complete Purity : Stop when node is 100% single class
Maximum Depth : Prevent tree from exceeding depth limit
Minimum Information Gain : Stop if improvement is below threshold
Minimum Examples : Stop if node has too few examples
Tree Depth Definition : Number of hops from root to node
Root Node : Depth 0
Next Level : Depth 1
Following Level : Depth 2
Example : If maximum depth = 2, don’t split nodes at depth 2
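One way to combine these stopping criteria is a single check that runs before each split. The function name, argument list, and default thresholds below are assumptions chosen for illustration; real libraries expose similar knobs under their own names.

```python
# Sketch: decide whether a node should become a leaf instead of being split.
# All names and default thresholds are illustrative assumptions.
def should_stop(labels, depth, best_gain,
                max_depth=2, min_gain=0.01, min_examples=2):
    if len(set(labels)) == 1:       # complete purity: node is 100% one class
        return True
    if depth >= max_depth:          # maximum depth reached (root is depth 0)
        return True
    if best_gain < min_gain:        # best available split improves purity too little
        return True
    if len(labels) < min_examples:  # too few examples to justify another split
        return True
    return False
```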
Smaller Trees : Easier to interpret and manage
Overfitting Prevention : Reduces risk of memorizing training data
Computational Efficiency : Faster training and prediction
Evolution of the Algorithm
Decision trees evolved through multiple researcher contributions:
Researcher A : Basic splitting algorithm
Researcher B : New splitting criteria
Researcher C : Maximum depth stopping
Researcher D : Minimum gain thresholds
Result : Many interconnected pieces and parameters
Modern Approach : Use well-established open-source packages
Pre-configured defaults : Good parameter choices provided
Automatic optimization : Packages handle complex decisions
Focus on application : Less manual parameter tuning needed
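As one example of this modern approach, scikit-learn's DecisionTreeClassifier can fit a tree in a few lines; the tiny dataset below is the illustrative cat/not-cat example from earlier, and the specific parameter choices are just one reasonable configuration.

```python
# Sketch: fitting a decision tree with an open-source package (scikit-learn)
# instead of implementing every splitting and stopping rule by hand.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Only a couple of parameters are set explicitly; the package supplies
# sensible defaults for everything else.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)

print(clf.predict([[1, 1, 1]]))  # pointy ears, round face, whiskers -> predicted cat (1)
```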
Note
Complexity Acknowledgment : “If this algorithm seems complicated and messy, it frankly does to me too, but it does work well.” The many components evolved to create a very effective learning algorithm.
Recursive Algorithm : Algorithm that calls itself
Overall Tree : Built by constructing smaller sub-trees
Left Subtree : Decision tree trained on left subset
Right Subtree : Decision tree trained on right subset
Implementation Note : Understanding recursion helps with implementation but isn’t required for using decision tree libraries.
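As an illustration of this recursive structure, here is a minimal sketch. The helper names, the misclassification-count split score (a simple stand-in for the entropy-based gain introduced next), and the dictionary tree representation are assumptions for illustration, not the course's implementation.

```python
# Sketch: build a tree recursively by building left and right subtrees on the
# left and right subsets. Split quality is scored by misclassification counts
# purely for brevity; entropy-based information gain is covered next.

def majority(labels):
    return max(set(labels), key=labels.count)

def split_errors(X, y, indices, feature):
    """Misclassified examples if each side of the split predicts its majority class."""
    total = 0
    for side in (1, 0):
        labels = [y[i] for i in indices if X[i][feature] == side]
        if labels:
            total += len(labels) - labels.count(majority(labels))
    return total

def build_tree(X, y, indices, depth=0, max_depth=2):
    labels = [y[i] for i in indices]
    if len(set(labels)) == 1 or depth == max_depth:      # stopping criteria
        return {"leaf": majority(labels)}
    f = min(range(len(X[0])), key=lambda f: split_errors(X, y, indices, f))
    left = [i for i in indices if X[i][f] == 1]
    right = [i for i in indices if X[i][f] == 0]
    if not left or not right:                            # degenerate split: make a leaf
        return {"leaf": majority(labels)}
    return {"feature": f,
            "left": build_tree(X, y, left, depth + 1, max_depth),    # left subtree
            "right": build_tree(X, y, right, depth + 1, max_depth)}  # right subtree

# On the illustrative dataset from earlier, this reproduces the structure above:
# ear shape at the root, face shape on the left branch, whiskers on the right.
```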
The foundation is choosing optimal splits at each node. This requires:
Measuring purity/impurity in a systematic way
Quantifying improvement from potential splits
Selecting best feature based on measurable criteria
The next critical component is understanding entropy: a mathematical way to measure the impurity of a set of examples, enabling systematic feature selection decisions.