Building a decision tree involves systematic decisions at each step to create an effective classifier from training data.
Decision Required : What feature to use at the root node (top of tree)
Algorithm Process :
Evaluate all available features (ear shape, face shape, whiskers)
Choose feature that provides best information gain
Example Result : Select “Ear Shape” as root feature
Based on the chosen feature, partition all 10 training examples:
Left Branch : 5 examples with pointy ears
Right Branch : 5 examples with floppy ears
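To make this partitioning step concrete, here is a minimal Python sketch. The dataset, the binary feature encoding (1 = pointy ears / round face / whiskers present), and the helper name split_on_feature are illustrative assumptions consistent with the counts above, not code from the original material.

```python
# Illustrative dataset consistent with the splits described above.
# Columns (assumed encoding): ear shape (1 = pointy), face shape (1 = round),
# whiskers (1 = present); label 1 = cat, 0 = not cat.
X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

def split_on_feature(X, node_indices, feature):
    """Partition a node's examples: feature value 1 goes left, 0 goes right."""
    left = [i for i in node_indices if X[i][feature] == 1]
    right = [i for i in node_indices if X[i][feature] == 0]
    return left, right

root_indices = list(range(len(X)))
left, right = split_on_feature(X, root_indices, feature=0)  # split on ear shape
print(len(left), len(right))  # 5 pointy-eared examples left, 5 floppy-eared right
```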
Focus on left branch (5 examples with pointy ears):
Choose next best feature for this subset
Algorithm Choice : Face Shape feature
Further Split :
4 examples with round faces → All cats → Leaf Node: Cat
1 example with a non-round face → Dog → Leaf Node: Not Cat
Focus on right branch (5 examples with floppy ears):
Contains 1 cat and 4 dogs
Choose best feature: Whiskers
Split Results :
1 example with whiskers → Cat → Leaf Node: Cat
4 examples without whiskers → Dogs → Leaf Node: Not Cat
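Continuing the sketch, the two branch splits above can be reproduced directly. The dataset and feature encoding are the same illustrative assumptions as in the previous snippet.

```python
# Same illustrative dataset and encoding as above:
# columns = (ear shape, face shape, whiskers), label 1 = cat.
X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

pointy = [i for i in range(len(X)) if X[i][0] == 1]   # left branch (5 examples)
floppy = [i for i in range(len(X)) if X[i][0] == 0]   # right branch (5 examples)

# Left branch: split on face shape -> 4 round-faced cats, 1 non-round dog.
print([y[i] for i in pointy if X[i][1] == 1],   # [1, 1, 1, 1] -> Leaf: Cat
      [y[i] for i in pointy if X[i][1] == 0])   # [0]          -> Leaf: Not Cat

# Right branch: split on whiskers -> 1 whiskered cat, 4 non-whiskered dogs.
print([y[i] for i in floppy if X[i][2] == 1],   # [1]          -> Leaf: Cat
      [y[i] for i in floppy if X[i][2] == 0])   # [0, 0, 0, 0] -> Leaf: Not Cat
```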
Question : How do you choose what feature to split on at each node?
Answer : Maximize Purity
Choose features that create subsets as close as possible to “all cats” or “all dogs”
Ideal Example : “Cat DNA” feature would create perfect splits (5/5 cats left, 0/5 cats right)
Actual Features : Compare ear shape, face shape, whiskers for best purity
Ear Shape Split
Left: 4/5 cats
Right: 1/5 cats
Highest purity of the three candidate splits
Face Shape Split
Left: 4/7 cats
Right: 1/3 cats
Moderate purity
Whiskers Split
Left: 3/4 cats
Right: 2/6 cats
Lower purity
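The three comparisons above can be checked with a short sketch that reports the fraction of cats on each side of every candidate split. The dataset is the same illustrative one used earlier; only the counts matter here.

```python
# Sketch: compare candidate splits by the fraction of cats in each subset.
X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
feature_names = ["ear shape", "face shape", "whiskers"]

for f, name in enumerate(feature_names):
    left = [y[i] for i in range(len(X)) if X[i][f] == 1]
    right = [y[i] for i in range(len(X)) if X[i][f] == 0]
    print(f"{name}: left {sum(left)}/{len(left)} cats, "
          f"right {sum(right)}/{len(right)} cats")

# Output matches the fractions listed above:
# ear shape: left 4/5 cats, right 1/5 cats
# face shape: left 4/7 cats, right 1/3 cats
# whiskers: left 3/4 cats, right 2/6 cats
```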
Question : When do you stop splitting nodes?
Multiple Criteria Options :
Complete Purity : Stop when node is 100% single class
Maximum Depth : Prevent tree from exceeding depth limit
Minimum Information Gain : Stop if improvement is below threshold
Minimum Examples : Stop if node has too few examples
Tree Depth Definition : Number of hops from root to node
Root Node : Depth 0
Next Level : Depth 1
Following Level : Depth 2
Example : If maximum depth = 2, don’t split nodes at depth 2
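One way to combine these stopping criteria is a single check that runs before each split. The function name, argument list, and default thresholds below are assumptions chosen for illustration; real libraries expose similar knobs under their own names.

```python
# Sketch: decide whether a node should become a leaf instead of being split.
# All names and default thresholds are illustrative assumptions.
def should_stop(labels, depth, best_gain,
                max_depth=2, min_gain=0.01, min_examples=2):
    if len(set(labels)) == 1:       # complete purity: node is 100% one class
        return True
    if depth >= max_depth:          # maximum depth reached (root is depth 0)
        return True
    if best_gain < min_gain:        # best available split improves purity too little
        return True
    if len(labels) < min_examples:  # too few examples to justify another split
        return True
    return False
```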
Smaller Trees : Easier to interpret and manage
Overfitting Prevention : Reduces risk of memorizing training data
Computational Efficiency : Faster training and prediction
Evolution of the Algorithm
Decision trees evolved through multiple researcher contributions:
Researcher A : Basic splitting algorithm
Researcher B : New splitting criteria
Researcher C : Maximum depth stopping
Researcher D : Minimum gain thresholds
Result : Many interconnected pieces and parameters
Modern Approach : Use well-established open-source packages
Pre-configured defaults : Good parameter choices provided
Automatic optimization : Packages handle complex decisions
Focus on application : Less manual parameter tuning needed
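As one example of this modern approach, scikit-learn's DecisionTreeClassifier can fit a tree in a few lines; the tiny dataset below is the illustrative cat/not-cat example from earlier, and the specific parameter choices are just one reasonable configuration.

```python
# Sketch: fitting a decision tree with an open-source package (scikit-learn)
# instead of implementing every splitting and stopping rule by hand.
from sklearn.tree import DecisionTreeClassifier

X = [[1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # 5 cats
     [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0]]   # 5 not cats
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Only a couple of parameters are set explicitly; the package supplies
# sensible defaults for everything else.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)

print(clf.predict([[1, 1, 1]]))  # pointy ears, round face, whiskers -> predicted cat (1)
```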
Note
Complexity Acknowledgment : “If this algorithm seems complicated and messy, it frankly does to me too, but it does work well.” The many components evolved to create a very effective learning algorithm.
Recursive Algorithm : Algorithm that calls itself
Overall Tree : Built by constructing smaller sub-trees
Left Subtree : Decision tree trained on left subset
Right Subtree : Decision tree trained on right subset
Implementation Note : Understanding recursion helps with implementation but isn’t required for using decision tree libraries.
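As an illustration of this recursive structure, here is a minimal sketch. The helper names, the misclassification-count split score (a simple stand-in for the entropy-based gain introduced next), and the dictionary tree representation are assumptions for illustration, not the course's implementation.

```python
# Sketch: build a tree recursively by building left and right subtrees on the
# left and right subsets. Split quality is scored by misclassification counts
# purely for brevity; entropy-based information gain is covered next.

def majority(labels):
    return max(set(labels), key=labels.count)

def split_errors(X, y, indices, feature):
    """Misclassified examples if each side of the split predicts its majority class."""
    total = 0
    for side in (1, 0):
        labels = [y[i] for i in indices if X[i][feature] == side]
        if labels:
            total += len(labels) - labels.count(majority(labels))
    return total

def build_tree(X, y, indices, depth=0, max_depth=2):
    labels = [y[i] for i in indices]
    if len(set(labels)) == 1 or depth == max_depth:      # stopping criteria
        return {"leaf": majority(labels)}
    f = min(range(len(X[0])), key=lambda f: split_errors(X, y, indices, f))
    left = [i for i in indices if X[i][f] == 1]
    right = [i for i in indices if X[i][f] == 0]
    if not left or not right:                            # degenerate split: make a leaf
        return {"leaf": majority(labels)}
    return {"feature": f,
            "left": build_tree(X, y, left, depth + 1, max_depth),    # left subtree
            "right": build_tree(X, y, right, depth + 1, max_depth)}  # right subtree

# On the illustrative dataset from earlier, this reproduces the structure above:
# ear shape at the root, face shape on the left branch, whiskers on the right.
```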
The foundation is choosing optimal splits at each node. This requires:
Measuring purity/impurity in a systematic way
Quantifying improvement from potential splits
Selecting best feature based on measurable criteria
The next critical component is understanding entropy: a mathematical way to measure the impurity of a set of examples, enabling systematic feature selection decisions.