Pablo Rodriguez

Learning Process

Building a decision tree involves systematic decisions at each step to create an effective classifier from training data.

Decision Required: What feature to use at the root node (top of tree)

Algorithm Process:

  • Evaluate all available features (ear shape, face shape, whiskers)
  • Choose feature that provides best information gain
  • Example Result: Select “Ear Shape” as root feature

Based on the chosen feature, partition all 10 training examples:

  • Left Branch: 5 examples with pointy ears
  • Right Branch: 5 examples with floppy ears
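
This partitioning step is easy to sketch in code. The snippet below uses a 1/0 feature encoding and a ten-example dataset reconstructed to match the counts quoted in these notes; the feature names and the `partition` helper are illustrative, not taken from the original lesson:

```python
# Each example: (ear_is_pointy, face_is_round, has_whiskers), label 1 = cat, 0 = not cat.
# Dataset reconstructed to be consistent with the counts quoted in these notes.
examples = [
    ((1, 1, 1), 1), ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 1, 0), 1),  # pointy-ear cats
    ((0, 0, 1), 1),                                                   # floppy-ear cat
    ((1, 0, 1), 0),                                                   # pointy-ear dog
    ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 0, 0), 0),   # floppy-ear dogs
]

def partition(examples, feature_index):
    """Split examples into the branch where the feature is 1 and the branch where it is 0."""
    left = [(x, y) for x, y in examples if x[feature_index] == 1]
    right = [(x, y) for x, y in examples if x[feature_index] == 0]
    return left, right

EAR_SHAPE = 0  # the feature chosen for the root node
left, right = partition(examples, EAR_SHAPE)
print(len(left), len(right))  # 5 examples with pointy ears, 5 with floppy ears
```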

Focus on left branch (5 examples with pointy ears):

  • Choose next best feature for this subset
  • Algorithm Choice: Face Shape feature
  • Further Split:
    • 4 examples with round faces → All cats → Leaf Node: Cat
    • 1 example with a not-round face → Dog → Leaf Node: Not Cat

Focus on right branch (5 examples with floppy ears):

  • Contains 1 cat and 4 dogs
  • Choose best feature: Whiskers
  • Split Results:
    • 1 example with whiskers → Cat → Leaf Node: Cat
    • 4 examples without whiskers → Dogs → Leaf Node: Not Cat
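
Putting the walkthrough together, the finished tree can be written down as a small nested structure. The dictionary layout and the `predict` helper below are just one illustrative encoding, not the lesson's own code:

```python
# The tree from the walkthrough: ear shape at the root, face shape on the left
# branch, whiskers on the right branch, and "cat"/"not cat" at the leaves.
tree = {
    "feature": "ear_shape",          # pointy vs. floppy
    "pointy": {
        "feature": "face_shape",     # round vs. not round
        "round": "cat",
        "not_round": "not cat",
    },
    "floppy": {
        "feature": "whiskers",       # present vs. absent
        "present": "cat",
        "absent": "not cat",
    },
}

def predict(node, animal):
    """Walk from the root to a leaf, following the branch that matches each feature value."""
    while isinstance(node, dict):
        node = node[animal[node["feature"]]]
    return node

print(predict(tree, {"ear_shape": "pointy", "face_shape": "round", "whiskers": "absent"}))  # cat
```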

Question: How do you choose what feature to split on at each node?

Answer: Maximize Purity

  • Choose features that create subsets as close as possible to “all cats” or “all dogs”
  • Ideal Example: “Cat DNA” feature would create perfect splits (5/5 cats left, 0/5 cats right)
  • Actual Features: Compare ear shape, face shape, whiskers for best purity

Ear Shape Split

  • Left: 4/5 cats
  • Right: 1/5 cats
  • Highest purity of the three splits

Face Shape Split

  • Left: 4/7 cats
  • Right: 1/3 cats
  • Lowest purity of the three splits

Whiskers Split

  • Left: 3/4 cats
  • Right: 2/6 cats
  • Moderate purity
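
These branch fractions can be verified directly. The sketch below reuses the same illustrative 1/0 encoding and reconstructed dataset as earlier and prints the cat fraction in each branch for all three candidate splits:

```python
from fractions import Fraction

# (ear_is_pointy, face_is_round, has_whiskers), label 1 = cat.
examples = [
    ((1, 1, 1), 1), ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 1, 0), 1), ((0, 0, 1), 1),
    ((1, 0, 1), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 0, 0), 0),
]

def cat_fraction(labels):
    return Fraction(sum(labels), len(labels))

for name, i in [("ear shape", 0), ("face shape", 1), ("whiskers", 2)]:
    left = [y for x, y in examples if x[i] == 1]
    right = [y for x, y in examples if x[i] == 0]
    print(f"{name}: left {cat_fraction(left)} cats, right {cat_fraction(right)} cats")

# ear shape:  left 4/5 cats, right 1/5 cats
# face shape: left 4/7 cats, right 1/3 cats
# whiskers:   left 3/4 cats, right 1/3 (= 2/6) cats
```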

Question: When do you stop splitting nodes?

Multiple Criteria Options:

  1. Complete Purity: Stop when node is 100% single class
  2. Maximum Depth: Prevent tree from exceeding depth limit
  3. Minimum Information Gain: Stop if improvement is below threshold
  4. Minimum Examples: Stop if node has too few examples
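
A minimal sketch of how these criteria might be combined into a single check that is consulted before splitting a node; the parameter names and default thresholds are illustrative assumptions, not values from the lesson:

```python
def should_stop(labels, depth, info_gain,
                max_depth=2, min_gain=0.01, min_examples=2):
    """Return True if the node should become a leaf instead of being split further."""
    if len(set(labels)) == 1:          # 1. node is 100% a single class (complete purity)
        return True
    if depth >= max_depth:             # 2. node is already at the maximum depth
        return True
    if info_gain < min_gain:           # 3. best available split improves purity too little
        return True
    if len(labels) <= min_examples:    # 4. too few examples to split reliably
        return True
    return False

print(should_stop([1, 1, 1, 1], depth=1, info_gain=0.3))  # True: node is already pure
print(should_stop([1, 0, 1, 0], depth=2, info_gain=0.3))  # True: node is at max depth
print(should_stop([1, 0, 1, 0], depth=1, info_gain=0.3))  # False: keep splitting
```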

Tree Depth Definition: Number of hops from root to node

  • Root Node: Depth 0
  • Next Level: Depth 1
  • Following Level: Depth 2

Example: If maximum depth = 2, don’t split nodes at depth 2

Benefits of limiting tree size:

  • Smaller Trees: Easier to interpret and manage
  • Overfitting Prevention: Reduces risk of memorizing training data
  • Computational Efficiency: Faster training and prediction

Evolution of the Algorithm

Decision trees evolved through contributions from many researchers:

  • Researcher A: Basic splitting algorithm
  • Researcher B: New splitting criteria
  • Researcher C: Maximum depth stopping
  • Researcher D: Minimum gain thresholds

Result: Many interconnected pieces and parameters

Modern Approach: Use well-established open-source packages

  • Pre-configured defaults: Good parameter choices provided
  • Automatic optimization: Packages handle complex decisions
  • Focus on application: Less manual parameter tuning needed
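
For example, scikit-learn's DecisionTreeClassifier handles the splitting criterion, the stopping rules, and the recursion internally; only the tiny example dataset below is an assumption, reusing the earlier illustrative 1/0 encoding:

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: ear_is_pointy, face_is_round, has_whiskers; labels: 1 = cat, 0 = not cat.
X = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0], [0, 0, 1],   # cats
    [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 0, 0],   # not cats
]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# criterion="entropy" splits by information gain; max_depth and min_samples_split
# correspond to the stopping criteria discussed above.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, min_samples_split=2)
clf.fit(X, y)

print(clf.predict([[1, 1, 0]]))  # pointy ears + round face -> predicted cat (1)
```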

Recursive Algorithm: Algorithm that calls itself

  • Overall Tree: Built by constructing smaller sub-trees
  • Left Subtree: Decision tree trained on left subset
  • Right Subtree: Decision tree trained on right subset
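
A compact sketch of that recursion, using simple purity-based helpers as a stand-in for the information-gain machinery introduced next; all function names and thresholds here are illustrative:

```python
def majority_label(examples):
    """Most common label among the examples at a node (used for leaf nodes)."""
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def branch_purity(labels):
    """Fraction of the majority class in one branch; 1.0 means perfectly pure."""
    p = sum(labels) / len(labels)
    return max(p, 1 - p)

def split_quality(examples, feature):
    """Weighted purity of the two branches produced by splitting on a feature."""
    left = [y for x, y in examples if x[feature] == 1]
    right = [y for x, y in examples if x[feature] == 0]
    if not left or not right:
        return 0.0                       # a split that sends everything one way is useless
    n = len(examples)
    return len(left) / n * branch_purity(left) + len(right) / n * branch_purity(right)

def build_tree(examples, features, depth=0, max_depth=2):
    """Recursively build a subtree: pick the best feature, split, recurse on each subset."""
    labels = {y for _, y in examples}
    if len(labels) == 1 or depth == max_depth or not features:
        return majority_label(examples)  # stop splitting: this node becomes a leaf
    best = max(features, key=lambda f: split_quality(examples, f))
    remaining = [f for f in features if f != best]
    left = [(x, y) for x, y in examples if x[best] == 1]
    right = [(x, y) for x, y in examples if x[best] == 0]
    if not left or not right:
        return majority_label(examples)  # no useful split available
    return {
        "feature": best,
        "left": build_tree(left, remaining, depth + 1, max_depth),    # left subtree
        "right": build_tree(right, remaining, depth + 1, max_depth),  # right subtree
    }

# Same illustrative dataset as above: (ear_is_pointy, face_is_round, has_whiskers), 1 = cat.
examples = [
    ((1, 1, 1), 1), ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 1, 0), 1), ((0, 0, 1), 1),
    ((1, 0, 1), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 0, 0), 0),
]
print(build_tree(examples, features=[0, 1, 2]))
# -> {'feature': 0, 'left': {'feature': 1, 'left': 1, 'right': 0},
#     'right': {'feature': 2, 'left': 1, 'right': 0}}
```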

Implementation Note: Understanding recursion helps with implementation but isn’t required for using decision tree libraries.

The foundation is choosing optimal splits at each node. This requires:

  1. Measuring purity/impurity in a systematic way
  2. Quantifying improvement from potential splits
  3. Selecting best feature based on measurable criteria
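
As a brief preview of the next topic, the sketch below uses binary entropy as the impurity measure and information gain as the improvement score, then selects the feature with the largest gain; the dataset is the same illustrative reconstruction used above:

```python
import math

def entropy(p):
    """Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p); 0 for a perfectly pure node."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(examples, feature):
    """Entropy at the node minus the weighted entropy of the two branches."""
    labels = [y for _, y in examples]
    left = [y for x, y in examples if x[feature] == 1]
    right = [y for x, y in examples if x[feature] == 0]
    p_node = sum(labels) / len(labels)
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(p_node) - (w_left * entropy(sum(left) / len(left))
                              + w_right * entropy(sum(right) / len(right)))

examples = [
    ((1, 1, 1), 1), ((1, 1, 1), 1), ((1, 1, 0), 1), ((1, 1, 0), 1), ((0, 0, 1), 1),
    ((1, 0, 1), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 1, 0), 0), ((0, 0, 0), 0),
]
for name, i in [("ear shape", 0), ("face shape", 1), ("whiskers", 2)]:
    print(f"{name}: gain = {information_gain(examples, i):.3f}")
# ear shape: 0.278, face shape: 0.035, whiskers: 0.124 -> ear shape is chosen as the root.
```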

The next critical component is understanding entropy - a mathematical way to measure the impurity of a set of examples, enabling systematic feature selection decisions.