Random Forest builds multiple decision trees using sampling with replacement to create a robust ensemble classifier.
For b = 1 to B (total number of trees) :
Use sampling with replacement to create new training set of size M
Train decision tree on this new dataset
For prediction :
Get all B trees to vote on final prediction
Use majority vote for classification
Original Training Set : 10 examples
Process :
Tree 1 : Sample 10 examples with replacement → Train decision tree
Tree 2 : Sample 10 examples with replacement → Train different decision tree
Tree 3 : Sample 10 examples with replacement → Train another decision tree
Continue for B total trees…
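As a quick illustration of the sampling step, here is a minimal NumPy sketch (not part of the original notes; variable names are illustrative) that draws one bootstrap sample of indices for the 10-example set above; some examples repeat while others are left out.
import numpy as np

rng = np.random.default_rng(seed=0)
original_indices = np.arange(10)                      # the 10 training examples
bootstrap_indices = rng.choice(original_indices, size=10, replace=True)
print(bootstrap_indices)                              # duplicates and omissions are expected
# Each tree would then be trained on original_data[bootstrap_indices]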
Typical B values : 64 to 128 trees
More trees : Generally doesn’t hurt performance
Diminishing returns : Beyond ~100 trees, improvement minimal
Computational cost : More trees = slower training and prediction
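To see the diminishing returns for yourself, a rough sketch along these lines (synthetic data via scikit-learn; exact numbers will vary) measures cross-validated accuracy as B grows:
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
for n_trees in (1, 10, 50, 100, 200):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"B = {n_trees:3d} trees -> mean accuracy {score:.3f}")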
Bagged Decision Tree
This basic ensemble method is called a “bagged decision tree” : sampling with replacement is also known as bootstrap sampling, and “bagging” is short for bootstrap aggregating, as if the training examples were placed in a virtual “bag” and drawn from it to build each tree’s training set.
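As a concrete (hedged) sketch, scikit-learn’s BaggingClassifier wrapped around a DecisionTreeClassifier implements this scheme; the dataset and parameters below are illustrative:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# 100 trees, each fit on a bootstrap sample of the data;
# every split still searches over all features (bagging only, not yet a random forest)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 bootstrap=True, random_state=0)
bagged_trees.fit(X, y)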
Issue : Even with sampling, trees often make same split decisions
Root node : Frequently chooses same feature across different samples
Near-root nodes : Similar split patterns emerge
Result : Trees less diverse than optimal
Key Innovation : At each node, randomly select subset of features to consider
Process at each node :
Available features : n total features
Random subset : Choose k < n features randomly
Best split selection : Choose feature with highest information gain from k features only
Feature subset size (k) :
Typical choice : k = √n (square root of total features)
Small n : Feature sub-sampling adds little when there are only a few features; the technique is most useful for larger feature sets
Example : If n = 100 features, consider k = 10 features at each split
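A minimal NumPy sketch of the per-node feature sampling (names are illustrative; in scikit-learn the same idea is set with max_features='sqrt'):
import numpy as np

n_features = 100
k = int(np.sqrt(n_features))                 # k = 10 when n = 100
rng = np.random.default_rng(seed=0)
# At this node, the split search is restricted to these k randomly chosen features
candidate_features = rng.choice(n_features, size=k, replace=False)
print(sorted(candidate_features))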
Basic Bagging
Randomization source : Different training sets via sampling with replacement
Features at each node : All features available
Random Forest
Randomization source : Different training sets plus a random feature subset at each node
Result : Greater tree diversity
Multiple sources of randomness :
Data sampling : Different training sets via sampling with replacement
Feature sampling : Different feature subsets at each node split
Result : More diverse trees → Better ensemble performance
Ensemble averaging : The algorithm automatically explores many small variations of the training data
Data variations : Sampling with replacement perturbs the data each tree sees
Feature variations : Random feature selection gives each split a different perspective
Collective wisdom : Averaging over these variations reduces the impact of any single change
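In scikit-learn, the two sources of randomness map directly onto estimator parameters; a minimal sketch (in recent scikit-learn versions these values are the classification defaults, apart from random_state):
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,       # B trees
    bootstrap=True,         # data sampling: each tree sees a bootstrap sample
    max_features="sqrt",    # feature sampling: ~sqrt(n) features considered per split
    random_state=0,
)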
Single Decision Tree
Training : Fixed dataset, all features available
Weakness : High sensitivity to data changes
Strength : Fast, interpretable
Bagged Decision Trees
Training : Multiple sampled datasets, all features available
Improvement : Reduced sensitivity through ensemble voting
Limitation : Trees may still be too similar
Random Forest
Training : Multiple sampled datasets, random feature subsets at each node
Advantages : Greater tree diversity, strong robustness
Performance : Typically much better than single trees
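The progression can be checked empirically; the rough sketch below (synthetic data, illustrative settings, results will vary) compares cross-validated accuracy across the three stages:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=0)
models = {
    "single tree  ": DecisionTreeClassifier(random_state=0),
    "bagged trees ": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                       random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))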
# Conceptual Random Forest algorithm
# Assumes X_train, y_train, B, M, n_features and a query point x are already defined
import numpy as np
from math import sqrt
from sklearn.tree import DecisionTreeClassifier

forest = []
for b in range(B):  # B = number of trees
    # 1. Sample M training examples with replacement (bootstrap sample)
    idx = np.random.randint(0, len(X_train), size=M)
    # 2. Train a tree with feature randomization:
    #    at each node, only consider sqrt(n) random features
    tree = DecisionTreeClassifier(max_features=int(sqrt(n_features)))
    tree.fit(X_train[idx], y_train[idx])
    forest.append(tree)

# Prediction: majority vote from all trees
votes = [tree.predict([x])[0] for tree in forest]
prediction = max(set(votes), key=votes.count)  # majority vote wins
Accuracy : Usually significantly better than a single decision tree
Robustness : Much less sensitive to training data changes
Overfitting : Averaging over many trees reduces variance, and with it overfitting
Training time : Longer than single tree (training B trees)
Prediction time : Slower (need B predictions + voting)
Memory usage : Higher (storing B trees)
Parallelization : Tree training can be parallelized
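Because the B trees are independent of one another, training and prediction parallelize naturally; in scikit-learn this is a single parameter (a minimal sketch):
from sklearn.ensemble import RandomForestClassifier

# n_jobs=-1 uses all available CPU cores to fit the B trees (and to predict) in parallel
model = RandomForestClassifier(n_estimators=100, n_jobs=-1)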
Random Forest represents a major improvement over single decision trees, using two complementary randomization techniques to create diverse, robust ensembles that significantly outperform individual trees, while remaining relatively efficient to train and use; some of a single tree's interpretability is traded away in the process.