Reinforcement Quiz
Reinforcement Learning Introduction Quiz
Question 1
You are using reinforcement learning to control a four-legged robot. The position of the robot would be its _____.
- action
- state ✓
- return
- reward
Question 2
You are controlling a Mars rover. You will be very, very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:
- R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative. ✓
- R(1) > R(2) > R(3), where R(1), R(2) and R(3) are negative.
- R(1) < R(2) < R(3), where R(1) and R(2) are negative and R(3) is positive.
- R(1) > R(2) > R(3), where R(1), R(2) and R(3) are positive.
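The Python sketch below shows one possible reward function that satisfies the correct option. The specific values (100, 10, -100) are illustrative assumptions and do not come from the quiz. So does the choice to return 0 for any other state. `REWARDS` and `reward` are hypothetical names.

```python
# Hypothetical reward values: only the ordering R(1) > R(2) > R(3) and the
# signs (R(1), R(2) positive, R(3) negative) come from the quiz.
REWARDS = {
    1: 100.0,   # significant scientific discovery: very happy
    2: 10.0,    # small scientific discovery: slightly happy
    3: -100.0,  # rover permanently damaged: unhappy
}


def reward(state: int) -> float:
    """Reward for reaching `state`; 0 for any other state (an assumption)."""
    return REWARDS.get(state, 0.0)
```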
Question 3
You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?
- -100 - 0.25*100 + 0.25^2*1000
- -0.25*100 - 0.25^2*100 + 0.25^3*1000
- -0.75*100 - 0.75^2*100 + 0.75^3*1000
- -100 - 0.75*100 + 0.75^2*1000 ✓
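Evaluating the correct option step by step gives the numeric value of the return. The first reward is not discounted, and each later reward is multiplied by one more factor of γ = 0.75:

```latex
G = R_1 + \gamma R_2 + \gamma^2 R_3
  = -100 + 0.75(-100) + 0.75^2(1000)
  = -100 - 75 + 562.5
  = 387.5
```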
Question 4
Given the rewards and actions below, compute the return from state 3 with a discount factor of γ = 0.25.
[Diagram shows: States 1-6 with rewards 100, 0, 0, 0, 0, 40 respectively, with arrows showing leftward movement from state 3]
- 0.39
- 0
- 25
- 6.25 ✓
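The Python sketch below models the six-state example. It assumes that states 1 and 6 are terminal, because the diagram only gives the rewards. It also assumes the agent always moves left, as the arrows indicate. `REWARDS`, `TERMINAL` and `return_moving_left` are hypothetical names.

```python
# Rewards read from the diagram; treating states 1 and 6 as terminal is an assumption.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}


def return_moving_left(start: int, gamma: float) -> float:
    """Discounted return when the agent always moves left from `start`."""
    total = 0.0
    discount = 1.0  # the first reward is not discounted
    state = start
    while True:
        total += discount * REWARDS[state]
        if state in TERMINAL:
            break
        state -= 1         # action: move left
        discount *= gamma  # each later step gets one more factor of gamma
    return total


print(return_moving_left(3, 0.25))  # 6.25
```

Starting from state 3, the collected rewards are 0, 0 and 100. This reproduces 0 + 0.25·0 + 0.25²·100 = 6.25.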
Quick Reference
State vs Action vs Reward vs Return
- State: Current position/situation of the agent
- Action: Choice made by the agent (e.g., left/right)
- Reward: Immediate feedback from environment
- Return: Discounted sum of the rewards received from the current step onward
Return Calculation
Always use: Return = R₁ + γR₂ + γ²R₃ + …, where the first reward (R₁) has no discount factor applied.
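The Python helper below computes this formula for any finite reward sequence. It is a sketch, and `discounted_return` is a hypothetical name. It accumulates from the last reward backwards using G = R + γ·G_next. That produces the same sum as the formula and leaves the first reward undiscounted.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute R1 + gamma*R2 + gamma^2*R3 + ... for a finite reward sequence."""
    g = 0.0
    # Walk backwards: each step adds the current reward to the discounted
    # return of everything after it, so R1 ends up with no discount factor.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([-100, -100, 1000], 0.75))  # 387.5
```

For Question 3 this gives 387.5. With rewards [0, 0, 100] and γ = 0.25 it gives 6.25, the answer to Question 4.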