
State Action Value Function Quiz

Which of the following accurately describes the state-action value function Q(s,a)?

  • It is the return if you start from state s, take action a (once), then behave optimally after that.
  • It is the return if you start from state s and repeatedly take action a.
  • It is the return if you start from state s and behave optimally.
  • It is the immediate reward if you start from state s and take action a (once).

You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0.

What is the optimal action to take in state s?

  • STOP
  • ← (left)
  • → (right)
  • Impossible to tell
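
Since the optimal action is simply the argmax of Q(s,a) over the available actions, the choice can be checked mechanically. A minimal sketch using the Q-values given in this question (the dictionary layout and labels are illustrative, not course code):

```python
# Q-values given in the question; keys are just action labels.
q_values = {"left": -10, "right": -20, "stop": 0}

# Optimal action: the one that maximizes Q(s, a).
best_action = max(q_values, key=q_values.get)
print(best_action)  # -> "stop", since Q(s, STOP) = 0 is the largest value
```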

For this problem, γ = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).

[Diagram: states 1-6 with returns 100, 25, 6.25, 2.5, 10, 40; the optimal action points left in states 2-3 and right in states 4-5; states 1 and 6 are terminal]

  • 0.625
  • 0.391
  • 1.25
  • 2.5
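
For reference, a worked application of the Bellman equation. Assuming R = 0 in the non-terminal states (which the returns in the diagram imply, e.g. 25 = 0.25 × 100), and reading max_{a'} Q(4, a') = 2.5 off the diagram:

  Q(5, ←) = R(5) + γ × max_{a'} Q(4, a')
          = 0 + 0.25 × 2.5
          = 0.625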
Key Concepts

  • Definition: Q(s,a) is the return for taking action a once from state s, then behaving optimally afterward
  • Optimal action: choose the action that maximizes Q(s,a)
  • Bellman equation: Q(s,a) = R(s) + γ × max_{a'} Q(s', a')
Solution Steps

  1. Identify the current state s and the action a
  2. Determine the next state s' reached after taking action a
  3. Apply the Bellman equation: immediate reward plus discounted future return
  4. Use the max Q value from the next state to capture optimal future behavior (see the sketch after this list)
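
To make the steps concrete, here is a minimal value-iteration sketch for the 6-state line above. The state layout, zero rewards in states 2-5, and terminal rewards 100 and 40 are read off the quiz diagram; everything else (names, iteration count) is illustrative, not the course's reference code.

```python
GAMMA = 0.25
N_STATES = 6
REWARDS = [100, 0, 0, 0, 0, 40]   # index 0 = state 1, ..., index 5 = state 6
TERMINAL = {0, 5}                 # states 1 and 6 end the episode
MOVES = (-1, +1)                  # left, right

def value_iteration(n_iters=50):
    """Compute the optimal return V(s) by repeatedly applying the Bellman backup."""
    V = [0.0] * N_STATES
    for _ in range(n_iters):
        new_V = list(V)
        for s in range(N_STATES):
            if s in TERMINAL:
                new_V[s] = REWARDS[s]  # terminal state: return is just its reward
            else:
                # Bellman backup: R(s) + gamma * best achievable return next
                new_V[s] = REWARDS[s] + GAMMA * max(V[s + d] for d in MOVES)
        V = new_V
    return V

V = value_iteration()
# Q(s,a) = R(s) + gamma * V(s'), where s' is the state reached by action a.
q_5_left = REWARDS[4] + GAMMA * V[3]  # state 5, action left -> state 4
print(V)          # -> [100.0, 25.0, 6.25, 2.5, 10.0, 40.0], matching the diagram
print(q_5_left)   # -> 0.625
```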
Common Mistakes

  • Confusing Q(s,a) with the immediate reward only
  • Forgetting to take the maximum over actions in the next state
  • Mixing up current-state rewards with next-state rewards
  • Not applying the discount factor to future returns