State-Action Value Function Quiz
Question 1
Which of the following accurately describes the state-action value function Q(s,a)?
- It is the return if you start from state s, take action a (once), then behave optimally after that. ✓
- It is the return if you start from state s and repeatedly take action a.
- It is the return if you start from state s and behave optimally.
- It is the immediate reward if you start from state s and take action a (once).
Question 2
You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0.
What is the optimal action to take in state s?
- STOP ✓
- ← (left)
- → (right)
- Impossible to tell
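A quick check: the optimal action is the one that maximizes Q(s, a), and max(-10, -20, 0) = 0 is achieved by STOP.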
Question 3
For this problem, γ = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).
[Diagram: states 1-6 with optimal returns 100, 25, 6.25, 2.5, 10, 40; given these returns and γ = 0.25, the optimal action is ← (left) from states 2-3 and → (right) from states 4-5]
- 0.625 ✓
- 0.391
- 1.25
- 2.5
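A worked check, reading the optimal return of 2.5 for state 4 off the diagram and taking R(5) = 0 (consistent with state 5's return of 10 = 0.25 × 40): Q(5, ←) = R(5) + γ × 2.5 = 0 + 0.25 × 2.5 = 0.625.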
Quick Reference
Q Function Key Points
- Definition: Return for taking action a once from state s, then behaving optimally afterward
- Optimal action: Choose action that maximizes Q(s,a)
- Bellman equation: Q(s,a) = R(s) + γ × max_{a'} Q(s', a')
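A minimal Python sketch of this equation (assuming a deterministic environment, so that taking action a in state s leads to one known next state s'):

```python
from typing import Dict


def bellman_q(reward_s: float, gamma: float, q_next: Dict[str, float]) -> float:
    """Bellman equation Q(s, a) = R(s) + gamma * max_a' Q(s', a').

    reward_s -- immediate reward R(s) in the current state
    gamma    -- discount factor
    q_next   -- Q(s', a') for each action a' available in the next state s'
    """
    return reward_s + gamma * max(q_next.values())
```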
Calculation Steps
- Identify current state s and action a
- Determine next state s' after taking action a
- Apply Bellman equation with immediate reward and discounted future return
- Use max Q value from next state for optimal future behavior
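Applying these steps to Question 3 (a sketch; it assumes the usual six-state setup with rewards of 100 in state 1, 40 in state 6, and 0 in between, which is consistent with the returns shown in the diagram):

```python
gamma = 0.25

# Optimal return from each state, as read off the diagram in Question 3.
optimal_return = {1: 100, 2: 25, 3: 6.25, 4: 2.5, 5: 10, 6: 40}

# Per-state reward under the assumed setup (100 and 40 at the ends, 0 elsewhere).
reward = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}

# Steps 1-2: start in state 5; action <- (left) leads to state 4.
s, s_next = 5, 4

# Steps 3-4: immediate reward plus the discounted optimal return from state 4.
q_5_left = reward[s] + gamma * optimal_return[s_next]
print(q_5_left)  # 0.625, matching the marked answer
```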
Common Mistakes to Avoid
- Confusing Q(s,a) with immediate reward only
- Forgetting to take maximum over actions in next state
- Mixing up current state rewards with next state rewards
- Not applying discount factor to future returns
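For example, using the Question 3 numbers (a small sketch; variable names are just illustrative), the first and last mistakes look like this:

```python
gamma = 0.25
reward_5 = 0          # R(5)
return_from_4 = 2.5   # optimal return from the next state (state 4)

q_wrong_immediate = reward_5                    # mistake: immediate reward only -> 0
q_wrong_no_discount = reward_5 + return_from_4  # mistake: discount factor dropped -> 2.5
q_correct = reward_5 + gamma * return_from_4    # correct -> 0.625
```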