
Algorithm Refinement: Improved Neural Network Architecture

Efficiency Problem with Original Architecture

The original architecture required computing Q(s,a) separately for each action:

  • “Whenever we are in some state s, we would have to carry out inference in the neural network separately four times to compute these four values so as to pick the action a that gives us the largest Q value”
  • “This is inefficient because we have to carry out inference four times from every single state”
original_architecture.txt
Input: 12 numbers (state + action)
Hidden Layer 1: 64 units
Hidden Layer 2: 64 units
Output: 1 Q value
Required: 4 separate forward passes per state
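As a rough sketch (not from the lecture), the original setup could look like the following Keras-style model. The layer sizes come from the description above; the one-hot action encoding, the file name, and the names q_net_original and pick_action are illustrative assumptions:

original_architecture_sketch.py
import numpy as np
import tensorflow as tf

# Sketch of the original architecture: 8 state values plus a 4-dimensional
# one-hot action encoding go in; a single Q(s, a) estimate comes out.
q_net_original = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,)),           # state (8) + one-hot action (4)
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                     # Q(s, a)
])

def pick_action(state):
    """Requires four separate forward passes: one per candidate action."""
    q_values = []
    for a in range(4):                            # nothing, left, main, right
        x = np.concatenate([state, np.eye(4)[a]]).astype(np.float32).reshape(1, 12)
        q_values.append(q_net_original(x).numpy()[0, 0])
    return int(np.argmax(q_values))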

Key Improvement: “It turns out to be more efficient to train a single neural network to output all four of these values simultaneously.”

improved_architecture.txt
Input: 8 numbers (state only)
Hidden Layer 1: 64 units
Hidden Layer 2: 64 units
Output: 4 units (all Q values)
Output Units:
- Q(s, nothing)
- Q(s, left)
- Q(s, main)
- Q(s, right)
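A minimal sketch of the improved architecture in the same Keras style (only the layer sizes come from the description above; the file name and the name q_net are illustrative):

improved_architecture_sketch.py
import tensorflow as tf

# Sketch of the improved architecture: the state alone goes in, and the
# output layer produces all four Q values in a single forward pass.
q_net = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),            # state s only
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4),                     # [Q(s,nothing), Q(s,left), Q(s,main), Q(s,right)]
])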

Efficiency Gain: “This turns out to be more efficient because given the state s we can run inference just once and get all four of these values, and then very quickly pick the action a that maximizes Q(s,a).”

Additional Benefit: “You notice also in Bellman’s equations, there’s a step in which we have to compute max over a’ Q(s’, a’), this multiplied by gamma and then there was plus R(s) up here.”
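Written out, the step this quote refers to is the Bellman update target:

Q(s, a) = R(s) + γ max_a' Q(s', a')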

Computational Advantage: “This neural network also makes it much more efficient to compute this because we’re getting Q(s’, a’) for all actions a’ at the same time. You can then just pick the max to compute this value for the right-hand side of Bellman’s equations.”
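A hedged sketch of that computation, reusing the q_net sketch above. The batch shapes, gamma value, and placeholder arrays are assumptions, and a full DQN would also use a separate target network and handle terminal states:

bellman_target_sketch.py
import numpy as np

# Placeholder mini-batch of stored experience (shapes are illustrative).
rewards = np.zeros(32, dtype=np.float32)              # R(s) for 32 transitions
next_states = np.zeros((32, 8), dtype=np.float32)     # s' for the same transitions

gamma = 0.995                                         # discount factor (assumed value)

# One forward pass per s' yields Q(s', a') for all four actions at once,
# so the max on the right-hand side of Bellman's equation is a simple axis max.
q_next = q_net(next_states).numpy()                   # shape (32, 4)
y = rewards + gamma * np.max(q_next, axis=1)          # y = R(s) + gamma * max_a' Q(s', a')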

Architecture comparison:

  • Original: State + Action (12 numbers) → Single Q value
  • Improved: State only (8 numbers) → All four Q values

Action selection with the improved network (see the sketch after this list):

  1. Input state s to the neural network (8 numbers)
  2. Get all Q values simultaneously: [Q(s,nothing), Q(s,left), Q(s,main), Q(s,right)]
  3. Select the action with the highest Q value: a = argmax Q(s,a)

Efficiency gains:

  • Forward pass: 1 pass instead of 4 per decision
  • Bellman computation: faster max operation over actions
  • Overall training: significantly improved computational efficiency
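Using the q_net sketch above, those three steps reduce to one forward pass followed by an argmax (the state values here are placeholders):

action_selection_sketch.py
import numpy as np

state = np.zeros((1, 8), dtype=np.float32)    # step 1: current state s (placeholder values)
q_values = q_net(state).numpy()[0]            # step 2: all four Q values from one forward pass
action = int(np.argmax(q_values))             # step 3: a = argmax_a Q(s, a)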

Original Architecture

  • Input: 12 numbers (state + action)
  • Output: 1 Q value
  • Requires: 4 forward passes per decision
  • Use case: One Q value at a time

Improved Architecture

  • Input: 8 numbers (state only)
  • Output: 4 Q values simultaneously
  • Requires: 1 forward pass per decision
  • Use case: All Q values at once

The improved architecture eliminates the need for action encoding in the input and instead produces all action values as outputs, making both action selection and Bellman equation computation much more efficient.

“Most implementations of DQN actually use this more efficient architecture that we’ll see in this video” rather than the conceptually simpler but computationally inefficient original approach.

This architectural improvement represents a practical optimization that maintains the same learning objectives while significantly reducing computational overhead, making the algorithm much more viable for real-world applications.