Random Forest builds multiple decision trees using sampling with replacement to create a robust ensemble classifier.
For b = 1 to B (total number of trees) :
Use sampling with replacement to create new training set of size M
Train decision tree on this new dataset
For prediction :
Get all B trees to vote on final prediction
Use majority vote for classification
Original Training Set : 10 examples
Process :
Tree 1 : Sample 10 examples with replacement → Train decision tree
Tree 2 : Sample 10 examples with replacement → Train different decision tree
Tree 3 : Sample 10 examples with replacement → Train another decision tree
Continue for B total trees…
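As a quick illustration of the sampling step, here is a minimal NumPy sketch (not part of the original notes; variable names are illustrative) that draws one bootstrap sample of indices for the 10-example set above; some examples repeat while others are left out.
import numpy as np

rng = np.random.default_rng(seed=0)
original_indices = np.arange(10)                      # the 10 training examples
bootstrap_indices = rng.choice(original_indices, size=10, replace=True)
print(bootstrap_indices)                              # duplicates and omissions are expected
# Each tree would then be trained on original_data[bootstrap_indices]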
Typical B values : 64 to 128 trees
More trees : Generally doesn’t hurt performance
Diminishing returns : Beyond ~100 trees, improvement minimal
Computational cost : More trees = slower training and prediction
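To see the diminishing returns for yourself, a rough sketch along these lines (synthetic data via scikit-learn; exact numbers will vary) measures cross-validated accuracy as B grows:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
for n_trees in (1, 10, 50, 100, 200):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"B = {n_trees:3d} trees -> mean accuracy {score:.3f}")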
Bagged Decision Tree
This basic ensemble method is called a “bagged decision tree” : sampling with replacement is also known as bootstrap sampling, and “bagging” is short for bootstrap aggregating, as if the training examples were placed in a virtual “bag” and drawn from it to build each tree’s training set.
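As a concrete (hedged) sketch, scikit-learn’s BaggingClassifier wrapped around a DecisionTreeClassifier implements this scheme; the dataset and parameters below are illustrative:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# 100 trees, each fit on a bootstrap sample of the data;
# every split still searches over all features (bagging only, not yet a random forest)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 bootstrap=True, random_state=0)
bagged_trees.fit(X, y)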
Issue : Even with sampling, trees often make same split decisions
Root node : Frequently chooses same feature across different samples
Near-root nodes : Similar split patterns emerge
Result : Trees less diverse than optimal
Key Innovation : At each node, randomly select subset of features to consider
Process at each node :
Available features : n total features
Random subset : Choose k < n features randomly
Best split selection : Choose feature with highest information gain from k features only
Feature subset size (k) :
Typical choice : k = √n (square root of total features)
Small n : Feature sub-sampling adds little when there are only a few features; the technique is most useful for larger feature sets
Example : If n = 100 features, consider k = 10 features at each split
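A minimal NumPy sketch of the per-node feature sampling (names are illustrative; in scikit-learn the same idea is set with max_features='sqrt'):
import numpy as np

n_features = 100
k = int(np.sqrt(n_features))                 # k = 10 when n = 100
rng = np.random.default_rng(seed=0)
# At this node, the split search is restricted to these k randomly chosen features
candidate_features = rng.choice(n_features, size=k, replace=False)
print(sorted(candidate_features))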
Basic Bagging
Randomization source : Different training sets via sampling with replacement
Features at each node : All features available
Random Forest
Randomization source : Different training sets plus a random feature subset at each node
Result : Greater tree diversity
Multiple sources of randomness :
Data sampling : Different training sets via sampling with replacement
Feature sampling : Different feature subsets at each node split
Result : More diverse trees → Better ensemble performance
Ensemble averaging : The algorithm automatically explores many small variations of the training data
Data variations : Sampling with replacement perturbs the data each tree sees
Feature variations : Random feature selection gives each split a different perspective
Collective wisdom : Averaging over these variations reduces the impact of any single change
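In scikit-learn, the two sources of randomness map directly onto estimator parameters; a minimal sketch (in recent scikit-learn versions these values are the classification defaults, apart from random_state):
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,       # B trees
    bootstrap=True,         # data sampling: each tree sees a bootstrap sample
    max_features="sqrt",    # feature sampling: ~sqrt(n) features considered per split
    random_state=0,
)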
Single Decision Tree
Training : Fixed dataset, all features available
Weakness : High sensitivity to data changes
Strength : Fast, interpretable
Bagged Decision Trees
Training : Multiple sampled datasets, all features available
Improvement : Reduced sensitivity through ensemble voting
Limitation : Trees may still be too similar
Random Forest
Training : Multiple sampled datasets, random feature subsets at each node
Advantages : Greater tree diversity, strong robustness
Performance : Typically much better than single trees
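The progression can be checked empirically; the rough sketch below (synthetic data, illustrative settings, results will vary) compares cross-validated accuracy across the three stages:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=0)
models = {
    "single tree  ": DecisionTreeClassifier(random_state=0),
    "bagged trees ": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                       random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))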
# Conceptual Random Forest algorithm
# Assumes X_train, y_train, B, M, n_features and a query point x are already defined
import numpy as np
from math import sqrt
from sklearn.tree import DecisionTreeClassifier

forest = []
for b in range(B):  # B = number of trees
    # 1. Sample M training examples with replacement (bootstrap sample)
    idx = np.random.randint(0, len(X_train), size=M)
    # 2. Train a tree with feature randomization:
    #    at each node, only consider sqrt(n) random features
    tree = DecisionTreeClassifier(max_features=int(sqrt(n_features)))
    tree.fit(X_train[idx], y_train[idx])
    forest.append(tree)

# Prediction: majority vote from all trees
votes = [tree.predict([x])[0] for tree in forest]
prediction = max(set(votes), key=votes.count)  # majority vote wins
Accuracy : Usually significantly better than a single decision tree
Robustness : Much less sensitive to training data changes
Overfitting : Averaging over many trees reduces variance, and with it overfitting
Training time : Longer than single tree (training B trees)
Prediction time : Slower (need B predictions + voting)
Memory usage : Higher (storing B trees)
Parallelization : Tree training can be parallelized
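Because the B trees are independent of one another, training and prediction parallelize naturally; in scikit-learn this is a single parameter (a minimal sketch):
from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 uses all available CPU cores to fit the B trees (and to predict) in parallel
model = RandomForestClassifier(n_estimators=100, n_jobs=-1)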
Random Forest represents a major improvement over single decision trees, using two complementary randomization techniques to create diverse, robust ensembles that significantly outperform individual trees, while remaining relatively efficient to train and use; some of a single tree's interpretability is traded away in the process.