Pablo Rodriguez

One Hot Encoding

Using One-Hot Encoding of Categorical Features


So far, the features we considered took on only two possible values. But what if a feature can take on more than two discrete values? One-hot encoding provides an elegant solution.

  • Original: Pointy or Floppy (2 values)
  • Extended: Pointy, Floppy, or Oval (3 values)

Challenge: How to handle splitting on a feature with three possible values?

One option: Create three branches from a single node

  • Left Branch: Pointy ears
  • Middle Branch: Floppy ears
  • Right Branch: Oval ears

Issue: Complicates algorithm design and implementation

Instead of one 3-value feature, create three binary features:

Pointy Ears Feature

Values: 1 if pointy, 0 otherwise

  • Binary decision: Has pointy ears?

Floppy Ears Feature

Values: 1 if floppy, 0 otherwise

  • Binary decision: Has floppy ears?

Oval Ears Feature

Values: 1 if oval, 0 otherwise

  • Binary decision: Has oval ears?

Original Data:

  • Example 1: Ear shape = Pointy
  • Example 2: Ear shape = Oval
  • Example 3: Ear shape = Floppy

One-Hot Encoded:

  • Example 1: [Pointy=1, Floppy=0, Oval=0]
  • Example 2: [Pointy=0, Floppy=0, Oval=1]
  • Example 3: [Pointy=0, Floppy=1, Oval=0]
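The transformation above can be sketched as a small helper; the function name and category order here are illustrative, not part of any particular library:

```python
def one_hot(value, categories):
    """Return a one-hot list: 1 at the position of value, 0 elsewhere."""
    return [1 if category == value else 0 for category in categories]

ear_categories = ["pointy", "floppy", "oval"]
print(one_hot("pointy", ear_categories))  # [1, 0, 0]
print(one_hot("oval", ear_categories))    # [0, 0, 1]
```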

Input: Single feature with k possible values
Output: k binary features, each taking values 0 or 1

Key Property: Exactly one feature equals 1 per example

  • “Hot” feature: The one feature with value 1
  • “Cold” features: All other features with value 0
  • Name origin: “One-hot” because only one feature is “hot” (active)

Applying this to the full cat-classification example, the original features are:

  • Ear Shape: 3 values (pointy, floppy, oval)
  • Face Shape: 2 values (round, not round)
  • Whiskers: 2 values (present, absent)

Five total binary features:

  1. Pointy ears: 1 if pointy, 0 otherwise
  2. Floppy ears: 1 if floppy, 0 otherwise
  3. Oval ears: 1 if oval, 0 otherwise
  4. Round face: 1 if round, 0 otherwise
  5. Whiskers present: 1 if present, 0 otherwise

One-hot encoding works for other algorithms too:

Neural Network Input Requirements:

  • Expects numerical inputs
  • Categorical features need numerical representation
  • One-hot encoding provides this conversion

Example Application:

  • Input: Five binary features from one-hot encoding
  • Process: Feed to neural network or logistic regression
  • Output: Cat classification prediction
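A minimal sketch of feeding the five one-hot features into a logistic regression score; the weights and bias below are made-up values for illustration (a real model would learn them from data):

```python
import math

# Illustrative weights, one per binary feature; values are NOT trained
weights = [1.2, -0.8, 0.5, 0.7, 1.5]  # [pointy, floppy, oval, round face, whiskers]
bias = -1.0

def cat_probability(features):
    """Logistic regression: sigmoid of the weighted feature sum plus bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))

animal_1 = [1, 0, 0, 1, 1]  # pointy ears, round face, whiskers present
print(cat_probability(animal_1))
```

Because every feature is 0 or 1, each weight simply reads as the contribution of that category to the score.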

Applicable to:

  • Decision Trees: Handle multi-value categorical features
  • Neural Networks: Convert categories to numerical inputs
  • Linear/Logistic Regression: Enable categorical feature use
  • Any algorithm requiring numerical inputs

Original Dataset:

Animal 1: Ear=Pointy, Face=Round, Whiskers=Present → Cat
Animal 2: Ear=Oval, Face=Not Round, Whiskers=Present → Cat
Animal 3: Ear=Floppy, Face=Round, Whiskers=Absent → Dog

One-Hot Encoded Dataset:

Animal 1: [1,0,0,1,1] → Cat
Animal 2: [0,0,1,0,1] → Cat
Animal 3: [0,1,0,1,0] → Dog

[Pointy, Floppy, Oval, Round Face, Whiskers Present]

  • Animal 1: [1,0,0,1,1] = Pointy ears, Round face, Whiskers present
  • Animal 2: [0,0,1,0,1] = Oval ears, Not round face, Whiskers present
  • Animal 3: [0,1,0,1,0] = Floppy ears, Round face, No whiskers
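Decoding goes the other way: find the “hot” slot and map it back to its category name. A short sketch, assuming the feature order shown above:

```python
EAR_NAMES = ["Pointy", "Floppy", "Oval"]  # order assumed from the table above

def decode_ear_shape(row):
    """Recover the ear-shape category from the first three one-hot slots."""
    return EAR_NAMES[row[:3].index(1)]

print(decode_ear_shape([0, 0, 1, 0, 1]))  # Oval
print(decode_ear_shape([0, 1, 0, 1, 0]))  # Floppy
```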

Clean Solution

  1. Algorithm Compatibility: Works with existing binary decision tree algorithm
  2. No Algorithm Modification: Use standard implementation
  3. Clear Interpretation: Each feature has obvious meaning
  4. Universal Application: Works across multiple machine learning algorithms

Increased Feature Space:

  • Before: 3 features (some multi-valued)
  • After: 5 features (all binary)
  • Impact: Slightly more complex feature space

Mutual Exclusivity:

  • Exactly one ear shape feature equals 1
  • Built-in constraint prevents impossible combinations
  • Natural representation of categorical relationships

With one-hot encoding, tree construction proceeds exactly as before:

  1. Feature evaluation: Calculate information gain for each binary feature
  2. Split selection: Choose binary feature with highest gain
  3. Tree construction: Proceed with standard binary splitting
  4. Stopping criteria: Apply same rules as before

Example: If “Pointy ears” feature provides highest information gain, split on that feature:

  • Left branch: Animals with pointy ears (Pointy ears = 1)
  • Right branch: Animals without pointy ears (Pointy ears = 0)
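The information-gain computation for a binary feature can be sketched as follows, using the three animals above (the helper names are illustrative):

```python
import math

def entropy(labels):
    """Binary entropy of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(feature_values, labels):
    """Entropy reduction from splitting on a binary (0/1) feature."""
    root = entropy(labels)
    left = [y for x, y in zip(feature_values, labels) if x == 1]
    right = [y for x, y in zip(feature_values, labels) if x == 0]
    n = len(labels)
    return root - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

labels = [1, 1, 0]       # 1 = cat, 0 = dog, for the three animals above
pointy_ears = [1, 0, 0]  # the "Pointy ears" binary feature
print(information_gain(pointy_ears, labels))
```

The tree-building step then just picks whichever binary feature yields the largest such gain.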
one_hot_example.py
# Example one-hot encoding transformation
# Original: ear_shape = ['pointy', 'oval', 'floppy', 'pointy']
# One-hot encoded (one binary column per ear-shape category):
pointy_ears = [1, 0, 0, 1]
floppy_ears = [0, 0, 1, 0]
oval_ears = [0, 1, 0, 0]
# Other binary features for the same four examples (values illustrative)
round_face = [1, 0, 1, 1]
whiskers = [1, 1, 0, 1]
# Combined feature matrix: one list per feature column
features = [pointy_ears, floppy_ears, oval_ears, round_face, whiskers]

One-hot encoding elegantly solves the multi-value categorical feature problem by converting it to multiple binary features, maintaining compatibility with standard decision tree algorithms while enabling broader applicability across machine learning methods.