
State Action Value Function Quiz

Which of the following accurately describes the state-action value function Q(s,a)?

  • It is the return if you start from state s, take action a (once), then behave optimally after that.
  • It is the return if you start from state s and repeatedly take action a.
  • It is the return if you start from state s and behave optimally.
  • It is the immediate reward if you start from state s and take action a (once).

You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0.

What is the optimal action to take in state s?

  • STOP
  • ← (left)
  • → (right)
  • Impossible to tell
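
Since the optimal action is simply the argmax of Q(s,a) over the available actions, the choice can be checked mechanically. A minimal sketch using the Q-values given in this question (the dictionary layout and labels are illustrative, not course code):

```python
# Q-values given in the question; keys are just action labels.
q_values = {"left": -10, "right": -20, "stop": 0}

# Optimal action: the one that maximizes Q(s, a).
best_action = max(q_values, key=q_values.get)
print(best_action)  # -> "stop", since Q(s, STOP) = 0 is the largest value
```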

For this problem, γ = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).

[Diagram: states 1-6 with returns 100, 25, 6.25, 2.5, 10, 40; the optimal action points left in states 2-3 and right in states 4-5; states 1 and 6 are terminal]

  • 0.625
  • 0.391
  • 1.25
  • 2.5
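
For reference, a worked application of the Bellman equation. Assuming R = 0 in the non-terminal states (which the returns in the diagram imply, e.g. 25 = 0.25 × 100), and reading max_{a'} Q(4, a') = 2.5 off the diagram:

  Q(5, ←) = R(5) + γ × max_{a'} Q(4, a')
          = 0 + 0.25 × 2.5
          = 0.625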
Key Concepts

  • Definition: Q(s,a) is the return for taking action a once from state s, then behaving optimally afterward
  • Optimal action: choose the action that maximizes Q(s,a)
  • Bellman equation: Q(s,a) = R(s) + γ × max_{a'} Q(s', a')
Solution Steps

  1. Identify the current state s and the action a
  2. Determine the next state s' reached after taking action a
  3. Apply the Bellman equation: immediate reward plus discounted future return
  4. Use the max Q value from the next state to capture optimal future behavior (see the sketch after this list)
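
To make the steps concrete, here is a minimal value-iteration sketch for the 6-state line above. The state layout, zero rewards in states 2-5, and terminal rewards 100 and 40 are read off the quiz diagram; everything else (names, iteration count) is illustrative, not the course's reference code.

```python
GAMMA = 0.25
N_STATES = 6
REWARDS = [100, 0, 0, 0, 0, 40]   # index 0 = state 1, ..., index 5 = state 6
TERMINAL = {0, 5}                 # states 1 and 6 end the episode
MOVES = (-1, +1)                  # left, right

def value_iteration(n_iters=50):
    """Compute the optimal return V(s) by repeatedly applying the Bellman backup."""
    V = [0.0] * N_STATES
    for _ in range(n_iters):
        new_V = list(V)
        for s in range(N_STATES):
            if s in TERMINAL:
                new_V[s] = REWARDS[s]  # terminal state: return is just its reward
            else:
                # Bellman backup: R(s) + gamma * best achievable return next
                new_V[s] = REWARDS[s] + GAMMA * max(V[s + d] for d in MOVES)
        V = new_V
    return V

V = value_iteration()
# Q(s,a) = R(s) + gamma * V(s'), where s' is the state reached by action a.
q_5_left = REWARDS[4] + GAMMA * V[3]  # state 5, action left -> state 4
print(V)          # -> [100.0, 25.0, 6.25, 2.5, 10.0, 40.0], matching the diagram
print(q_5_left)   # -> 0.625
```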
Common Mistakes

  • Confusing Q(s,a) with the immediate reward only
  • Forgetting to take the maximum over actions in the next state
  • Mixing up current-state rewards with next-state rewards
  • Not applying the discount factor to future returns