Original Architecture
- Input: 12 numbers (state + action)
- Output: 1 Q value
- Requires: 4 forward passes per decision
- Use case: One Q value at a time
The original architecture required computing Q(s,a) separately for each action:
Input: 12 numbers (state + action)
↓
Hidden Layer 1: 64 units
↓
Hidden Layer 2: 64 units
↓
Output: 1 Q value
Required: 4 separate forward passes per state
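A minimal sketch of this original setup, assuming TensorFlow/Keras and a 4-dimensional one-hot action encoding (consistent with 12 = 8 state values + 4 action values). The names `q_net` and `pick_action` are illustrative, not taken from the course code:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

# Original architecture: estimates Q(s, a) for one action at a time.
q_net = Sequential([
    Input(shape=(12,)),           # 8 state values + 4-dim one-hot action (assumed encoding)
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(1),                     # single Q value
])

def pick_action(state):
    """Requires 4 separate forward passes: one per candidate action."""
    q_values = []
    for a in range(4):
        one_hot = np.zeros(4, dtype=np.float32)
        one_hot[a] = 1.0
        x = np.concatenate([state, one_hot])[None, :]   # shape (1, 12)
        q_values.append(float(q_net(x)))                # one forward pass per action
    return int(np.argmax(q_values))
```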
Improved Architecture
Key Improvement: “It turns out to be more efficient to train a single neural network to output all four of these values simultaneously.”
Input: 8 numbers (state only)
↓
Hidden Layer 1: 64 units
↓
Hidden Layer 2: 64 units
↓
Output: 4 units (all Q values)
Output Units:
- Q(s, nothing)
- Q(s, left)
- Q(s, main)
- Q(s, right)
Efficiency Gain: “This turns out to be more efficient because given the state s we can run inference just once and get all four of these values, and then very quickly pick the action a that maximizes Q(s,a).”
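A corresponding sketch of the improved architecture, under the same assumptions (TensorFlow/Keras; illustrative names): one forward pass produces all four Q values, and action selection reduces to a single argmax:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

# Improved architecture: all four Q values from a single forward pass.
q_net = Sequential([
    Input(shape=(8,)),            # state only
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(4),                     # Q(s, nothing), Q(s, left), Q(s, main), Q(s, right)
])

def pick_action(state):
    """Single inference, then argmax over the 4 output units."""
    q_values = q_net(state[None, :])          # shape (1, 4)
    return int(tf.argmax(q_values, axis=1)[0])
```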
Additional Benefit: “You notice also in Bellman’s equations, there’s a step in which we have to compute max over a’ Q(s’, a’), this multiplied by gamma and then there was plus R(s) up here.”
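Written out, the right-hand side the quote is describing is the standard Bellman target:

$$y = R(s) + \gamma \max_{a'} Q(s', a')$$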
Computational Advantage: “This neural network also makes it much more efficient to compute this because we’re getting Q(s’, a’) for all actions a’ at the same time. You can then just pick the max to compute this value for the right-hand side of Bellman’s equations.”
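A short sketch of that target computation for a minibatch, again assuming the 4-output Keras network above; the terminal-state masking via `done_flags` and the discount value are standard implementation details not spelled out in the quotes:

```python
import numpy as np

GAMMA = 0.995  # discount factor (illustrative value)

def compute_targets(rewards, next_states, done_flags, q_net):
    """rewards: (N,), next_states: (N, 8), done_flags: (N,) booleans."""
    q_next = q_net(next_states).numpy()      # shape (N, 4): Q(s', a') for all a' at once
    max_q_next = q_next.max(axis=1)          # max over a' for each example
    # No future reward is added beyond a terminal state.
    return rewards + GAMMA * max_q_next * (1.0 - done_flags.astype(np.float32))
```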
The improved architecture eliminates the need for action encoding in the input and instead produces all action values as outputs, making both action selection and Bellman equation computation much more efficient.
“Most implementations of DQN actually use this more efficient architecture that we’ll see in this video” rather than the conceptually simpler but computationally inefficient original approach.
This architectural improvement represents a practical optimization that maintains the same learning objectives while significantly reducing computational overhead, making the algorithm much more viable for real-world applications.