
Multiple Decision Trees

High Sensitivity

Single decision trees are highly sensitive to small changes in the training data, which makes them less robust.

Original Dataset: 10 labeled cat/not-cat examples described by features such as Ear Shape, Face Shape, and Whiskers

  • Best root split: Ear Shape feature
  • Result: Specific tree structure

Modified Dataset: Change just ONE example

  • Original: Pointy ears, Round face, Whiskers absent
  • Modified: Floppy ears, Round face, Whiskers present
  • New best split: Whiskers feature (completely different!)
  • Result: Totally different tree structure
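
To make this sensitivity concrete, here is a minimal sketch using scikit-learn on a made-up 10-example binary dataset (illustrative only, not the course's exact data). Flipping a single example's Ear Shape and Whiskers values changes which feature the tree selects at the root:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: columns are [Ear Shape (1=pointy),
# Face Shape (1=round), Whiskers (1=present)]; label 1 = cat.
X = np.array([
    [1, 1, 0],                                      # the example we will modify
    [1, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 0],
    [0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
names = ["Ear Shape", "Face Shape", "Whiskers"]

def root_feature(X, y):
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    return names[tree.tree_.feature[0]]             # feature chosen at the root node

print("Original root split:", root_feature(X, y))   # -> Ear Shape

# Change just ONE example: pointy -> floppy ears, whiskers absent -> present.
X_mod = X.copy()
X_mod[0] = [0, 1, 1]
print("Modified root split:", root_feature(X_mod, y))  # -> Whiskers
```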

Tree Ensemble: A collection of decision trees that vote on the final prediction

Benefits:

  • Reduced sensitivity: No single tree controls the outcome
  • Improved robustness: Majority vote smooths out individual tree errors
  • Better accuracy: Multiple perspectives improve predictions

Example Prediction: Animal with pointy ears, not round face, whiskers present

  1. Tree 1 Prediction: Follows its path → Predicts “Cat”
  2. Tree 2 Prediction: Follows its path → Predicts “Not Cat”
  3. Tree 3 Prediction: Follows its path → Predicts “Cat”
  4. Majority Vote: 2 votes for “Cat”, 1 for “Not Cat”
  5. Final Prediction: “Cat” (correct!)
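
As a quick illustrative snippet (the three votes are the hypothetical ones listed above), the tally can be computed with a standard-library Counter:

```python
from collections import Counter

votes = ["Cat", "Not Cat", "Cat"]            # predictions from Trees 1-3
final = Counter(votes).most_common(1)[0][0]  # majority vote wins
print(final)                                 # -> Cat
```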

Individual Tree Problems:

  • Overfitting to specific training patterns
  • High variance in predictions
  • Sensitive to data perturbations

Ensemble Solutions:

  • Multiple perspectives: Each tree captures different patterns
  • Error averaging: Individual mistakes get outvoted
  • Reduced overfitting: Collective decision more stable

Single Tree

High Variance

  • Sensitive to data changes
  • Prone to overfitting
  • Unreliable predictions

Tree Ensemble

Low Variance

  • Robust to data changes
  • Reduced overfitting
  • Stable predictions
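
One way to make this variance gap tangible is to retrain both models on many bootstrap resamples of the same data and measure how much their predictions fluctuate. The sketch below uses synthetic data and scikit-learn's RandomForestClassifier as the ensemble; the exact numbers will vary, but the single tree's predictions typically fluctuate far more:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, y_train, X_test = X[:250], y[:250], X[250:]

rng = np.random.default_rng(0)
single, forest = [], []
for _ in range(20):
    idx = rng.integers(0, 250, size=250)   # bootstrap resample of the training set
    single.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]).predict(X_test))
    forest.append(RandomForestClassifier(n_estimators=50, random_state=0)
                  .fit(X_train[idx], y_train[idx]).predict(X_test))

# Variance of each model's predictions across resamples, averaged over test points:
print("single tree:", np.mean(np.var(single, axis=0)))
print("ensemble:   ", np.mean(np.var(forest, axis=0)))
```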

Question: How do you create multiple plausible but different decision trees?

Requirements for Good Ensemble:

  • Trees should be different enough to provide diverse perspectives
  • Trees should be accurate enough to contribute meaningful votes
  • Trees should complement each other’s strengths and weaknesses

Sampling with Replacement: Statistical technique to create different training sets

  • Purpose: Generate varied datasets for training different trees
  • Result: Each tree sees slightly different data patterns
  • Outcome: Naturally diverse tree ensemble
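
A minimal sketch of the mechanic, using NumPy (array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # size of the original training set

# Draw n indices uniformly WITH replacement: some examples appear more
# than once, others not at all, which is what makes the datasets differ.
idx = rng.choice(n, size=n, replace=True)
print(np.sort(idx))

# Each tree then trains on its own bootstrap sample:
# X_boot, y_boot = X[idx], y[idx]
```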

Combining Predictions:

  • Classification: Each tree votes for a class; the majority class wins
  • Regression: Average the numeric predictions from all trees
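
In NumPy terms (toy numbers, purely illustrative):

```python
import numpy as np

votes = np.array([1, 0, 1])             # class labels predicted by three trees
print(np.bincount(votes).argmax())      # classification: majority vote -> 1

outputs = np.array([2.3, 2.9, 2.5])     # numeric outputs from three trees
print(outputs.mean())                   # regression: average -> 2.566...
```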

For a new test example:

  1. Pass the example through Tree 1 → get prediction 1
  2. Pass the example through Tree 2 → get prediction 2
  3. Pass the example through Tree 3 → get prediction 3
  4. Combine the predictions by voting (classification) or averaging (regression)
  5. Output the final ensemble prediction
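
Putting the pieces together, here is a minimal bagging-style sketch on synthetic data (for real use, scikit-learn's BaggingClassifier or RandomForestClassifier implement this properly):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(3):
    idx = rng.integers(0, len(X), size=len(X))   # sampling with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

x_new = X[:1]                                    # a "new" test example
preds = [t.predict(x_new)[0] for t in trees]     # steps 1-3: one prediction per tree
final = np.bincount(preds).argmax()              # step 4: majority vote
print(preds, "->", final)                        # step 5: ensemble output
```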

Tree ensembles fundamentally solve the robustness problem of single decision trees by leveraging the wisdom of crowds: many diverse trees working together produce more reliable and accurate predictions than any individual tree could achieve alone.