Pablo Rodriguez

Reinforcement Quiz

You are using reinforcement learning to control a four-legged robot. The position of the robot would be its _____.

  • action
  • state
  • return
  • reward

You are controlling a Mars rover. You will be very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:

  • R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative.
  • R(1) > R(2) > R(3), where R(1), R(2) and R(3) are negative.
  • R(1) < R(2) < R(3), where R(1) and R(2) are negative and R(3) is positive.
  • R(1) > R(2) > R(3), where R(1), R(2) and R(3) are positive.

You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?

  • -100 - 0.25*100 + 0.25^2*1000
  • -0.25*100 - 0.25^2*100 + 0.25^3*1000
  • -0.75*100 - 0.75^2*100 + 0.75^3*1000
  • -100 - 0.75*100 + 0.75^2*1000
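The return can be checked numerically. Below is a minimal sketch of the discounted-return formula applied to the helicopter's reward sequence; the function name `discounted_return` is my own, not from the quiz.

```python
def discounted_return(rewards, gamma):
    # G = R1 + gamma*R2 + gamma^2*R3 + ...
    # The first reward gets no discount (gamma^0 = 1).
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Helicopter example: gamma = 0.75, rewards -100, -100, 1000
print(discounted_return([-100, -100, 1000], 0.75))
# -100 - 0.75*100 + 0.75^2*1000 = -100 - 75 + 562.5 = 387.5
```

Note that the discount factor multiplies the *second* reward onward; options that discount the first reward do not match the definition of the return.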

Given the rewards and actions below, compute the return from state 3 with a discount factor of γ = 0.25.

[Diagram shows: States 1-6 with rewards 100, 0, 0, 0, 0, 40 respectively, with arrows showing leftward movement from state 3]

  • 0.39
  • 0
  • 25
  • 6.25

Key terms:

  • State: Current position/situation of the agent
  • Action: Choice made by the agent (e.g., left/right)
  • Reward: Immediate feedback from the environment
  • Return: Discounted sum of all future rewards

Always use: R₁ + γR₂ + γ²R₃ + … where the first reward has no discount factor applied.
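Applying this formula to the state-3 question above (moving left, the agent sees rewards 0 at state 3, 0 at state 2, and 100 at state 1, with γ = 0.25) gives the return directly; the variable names below are illustrative.

```python
gamma = 0.25
rewards = [0, 0, 100]  # rewards collected moving left: states 3, 2, 1

# G = R1 + gamma*R2 + gamma^2*R3; only the third term is nonzero here.
G = sum(r * gamma**t for t, r in enumerate(rewards))
print(G)  # 0 + 0.25*0 + 0.25^2*100 = 6.25
```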