So far, each feature has taken on only two possible values. But what if a feature can take on more than two discrete values? One-hot encoding provides an elegant solution.
Original : Pointy or Floppy (2 values)
Extended : Pointy, Floppy, or Oval (3 values)
Challenge : How to handle splitting on a feature with three possible values?
One option : Create three branches from a single node
Left Branch : Pointy ears
Middle Branch : Floppy ears
Right Branch : Oval ears
Issue : Complicates algorithm design and implementation
Instead of one 3-value feature, create three binary features :
Pointy Ears Feature
Values : 1 if pointy, 0 otherwise
Binary decision: Has pointy ears?
Floppy Ears Feature
Values : 1 if floppy, 0 otherwise
Binary decision: Has floppy ears?
Oval Ears Feature
Values : 1 if oval, 0 otherwise
Binary decision: Has oval ears?
Original Data :
Example 1: Ear shape = Pointy
Example 2: Ear shape = Oval
Example 3: Ear shape = Floppy
One-Hot Encoded :
Example 1: [Pointy=1, Floppy=0, Oval=0]
Example 2: [Pointy=0, Floppy=0, Oval=1]
Example 3: [Pointy=0, Floppy=1, Oval=0]
Input : Single feature with k possible values
Output : k binary features, each taking values 0 or 1
Key Property : Exactly one feature equals 1 per example
“Hot” feature : The one feature with value 1
“Cold” features : All other features with value 0
Name origin : “One-hot” because only one feature is “hot” (active)
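As a minimal sketch, the mapping from one k-value feature to k binary indicators can be written as follows (the helper name `one_hot` is ours, not a library function):

```python
def one_hot(value, categories):
    """Return k binary indicators for a single categorical value."""
    return [1 if value == c else 0 for c in categories]

ear_shapes = ["pointy", "floppy", "oval"]
print(one_hot("oval", ear_shapes))  # [0, 0, 1] -- exactly one "hot" entry
```

By construction, exactly one indicator is 1 for any value in the category list, which is the key property noted above.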
Ear Shape : 3 values (pointy, floppy, oval)
Face Shape : 2 values (round, not round)
Whiskers : 2 values (present, absent)
Five total binary features :
Pointy ears : 1 if pointy, 0 otherwise
Floppy ears : 1 if floppy, 0 otherwise
Oval ears : 1 if oval, 0 otherwise
Round face : 1 if round, 0 otherwise
Whiskers present : 1 if present, 0 otherwise
One-hot encoding works for other algorithms too :
Neural Network Input Requirements :
Expects numerical inputs
Categorical features need numerical representation
One-hot encoding provides this conversion
Example Application :
Input : Five binary features from one-hot encoding
Process : Feed to neural network or logistic regression
Output : Cat classification prediction
Applicable to :
Decision Trees : Handle multi-value categorical features
Neural Networks : Convert categories to numerical inputs
Linear/Logistic Regression : Enable categorical feature use
Any algorithm requiring numerical inputs
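To make the "numerical inputs" point concrete, here is a minimal logistic-regression inference sketch over the five binary features. The weights and bias are made-up illustrative values, not a trained model:

```python
import math

# Feature order: [pointy, floppy, oval, round_face, whiskers]
# Hypothetical weights and bias for illustration only -- not trained values.
w = [1.2, -0.8, 0.4, 0.9, 1.5]
b = -1.0

def predict_cat_probability(x):
    """Sigmoid of a linear score over one-hot encoded inputs."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# Pointy ears, round face, whiskers present:
print(predict_cat_probability([1, 0, 0, 1, 1]))
```

The point is only that once the categorical features are encoded as 0s and 1s, they plug directly into any model that computes with numbers.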
Original Dataset :
Animal 1: Ear=Pointy, Face=Round, Whiskers=Present → Cat
Animal 2: Ear=Oval, Face=Not Round, Whiskers=Present → Cat
Animal 3: Ear=Floppy, Face=Round, Whiskers=Absent → Dog
One-Hot Encoded Dataset :
Animal 1: [1,0,0,1,1] → Cat
Animal 2: [0,0,1,0,1] → Cat
Animal 3: [0,1,0,1,0] → Dog
Feature order : [Pointy, Floppy, Oval, Round Face, Whiskers Present]
Animal 1: [1,0,0,1,1] = Pointy ears, Round face, Whiskers present
Animal 2: [0,0,1,0,1] = Oval ears, Not round face, Whiskers present
Animal 3: [0,1,0,1,0] = Floppy ears, Round face, No whiskers
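The encoded rows above can be reproduced with a short helper (the function name `encode_animal` is ours, for illustration):

```python
EAR_SHAPES = ["pointy", "floppy", "oval"]

def encode_animal(ear, face_round, has_whiskers):
    """Encode one animal as [pointy, floppy, oval, round_face, whiskers]."""
    ear_bits = [1 if ear == s else 0 for s in EAR_SHAPES]
    return ear_bits + [int(face_round), int(has_whiskers)]

rows = [
    encode_animal("pointy", True, True),   # Animal 1
    encode_animal("oval", False, True),    # Animal 2
    encode_animal("floppy", True, False),  # Animal 3
]
print(rows)  # [[1, 0, 0, 1, 1], [0, 0, 1, 0, 1], [0, 1, 0, 1, 0]]
```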
Clean Solution
Algorithm Compatibility : Works with existing binary decision tree algorithm
No Algorithm Modification : Use standard implementation
Clear Interpretation : Each feature has obvious meaning
Universal Application : Works across multiple machine learning algorithms
Increased Feature Space :
Before : 3 features (one taking 3 values)
After : 5 features (all binary)
Impact : Slightly more complex feature space
Mutual Exclusivity :
Exactly one ear shape feature equals 1
Built-in constraint prevents impossible combinations
Natural representation of categorical relationships
With one-hot encoding :
Feature evaluation : Calculate information gain for each binary feature
Split selection : Choose binary feature with highest gain
Tree construction : Proceed with standard binary splitting
Stopping criteria : Apply same rules as before
Example : If “Pointy ears” feature provides highest information gain, split on that feature:
Left branch : Animals with pointy ears (Pointy ears = 1)
Right branch : Animals without pointy ears (Pointy ears = 0)
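This feature-scoring step can be sketched with an entropy-based information-gain function (function names are ours; labels and columns come from the three-animal dataset above, where whiskers happens to separate cats from dogs perfectly):

```python
import math

def entropy(p):
    """Binary entropy of a class proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(labels, feature_values):
    """Information gain from splitting on one binary feature."""
    n = len(labels)
    left = [y for y, f in zip(labels, feature_values) if f == 1]
    right = [y for y, f in zip(labels, feature_values) if f == 0]

    def h(subset):
        return entropy(sum(subset) / len(subset)) if subset else 0.0

    root = entropy(sum(labels) / n)
    return root - (len(left) / n * h(left) + len(right) / n * h(right))

# Labels: 1 = cat, 0 = dog (Animals 1-3 above)
labels = [1, 1, 0]
pointy = [1, 0, 0]      # "Pointy ears" column
whiskers = [1, 1, 0]    # "Whiskers present" column

gains = {"pointy": information_gain(labels, pointy),
         "whiskers": information_gain(labels, whiskers)}
best = max(gains, key=gains.get)
print(best, gains)
```

Splitting then proceeds exactly as in the binary case: examples with the chosen feature equal to 1 go left, the rest go right.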
# Example one-hot encoding transformation
# Original: ear_shape = ['pointy', 'oval', 'floppy', 'pointy']
pointy_ears = [1, 0, 0, 1]
floppy_ears = [0, 0, 1, 0]
oval_ears = [0, 1, 0, 0]

# Illustrative values for the remaining binary features (same four animals)
round_face = [1, 0, 1, 1]
whiskers = [1, 1, 0, 1]

# Combined feature matrix (one list per binary feature)
features = [pointy_ears, floppy_ears, oval_ears, round_face, whiskers]
One-hot encoding elegantly solves the multi-value categorical feature problem by converting one k-value feature into k binary features. This keeps the standard binary decision tree algorithm unchanged while also making categorical data usable by neural networks, linear and logistic regression, and any other method that requires numerical inputs.