Pablo Rodriguez

Reinforcement Quiz

You are using reinforcement learning to control a four-legged robot. The position of the robot would be its _____.

  • action
  • state
  • return
  • reward

You are controlling a Mars rover. You will be very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:

  • R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative.
  • R(1) > R(2) > R(3), where R(1), R(2) and R(3) are negative.
  • R(1) < R(2) < R(3), where R(1) and R(2) are negative and R(3) is positive.
  • R(1) > R(2) > R(3), where R(1), R(2) and R(3) are positive.

You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?

  • -100 - 0.25*100 + 0.25^2*1000
  • -0.25*100 - 0.25^2*100 + 0.25^3*1000
  • -0.75*100 - 0.75^2*100 + 0.75^3*1000
  • -100 - 0.75*100 + 0.75^2*1000
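The return can be checked numerically. Below is a minimal sketch of the discounted-return formula applied to the helicopter's reward sequence; the function name `discounted_return` is my own, not from the quiz.

```python
def discounted_return(rewards, gamma):
    # G = R1 + gamma*R2 + gamma^2*R3 + ...
    # The first reward gets no discount (gamma^0 = 1).
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Helicopter example: gamma = 0.75, rewards -100, -100, 1000
print(discounted_return([-100, -100, 1000], 0.75))
# -100 - 0.75*100 + 0.75^2*1000 = -100 - 75 + 562.5 = 387.5
```

Note that the discount factor multiplies the *second* reward onward; options that discount the first reward do not match the definition of the return.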

Given the rewards and actions below, compute the return from state 3 with a discount factor of γ = 0.25.

[Diagram shows: States 1-6 with rewards 100, 0, 0, 0, 0, 40 respectively, with arrows showing leftward movement from state 3]

  • 0.39
  • 0
  • 25
  • 6.25

Key terms:

  • State: Current position/situation of the agent
  • Action: Choice made by the agent (e.g., left/right)
  • Reward: Immediate feedback from the environment
  • Return: Discounted sum of all future rewards

Always use: R₁ + γR₂ + γ²R₃ + … where the first reward has no discount factor applied.
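Applying this formula to the state-3 question above (moving left, the agent sees rewards 0 at state 3, 0 at state 2, and 100 at state 1, with γ = 0.25) gives the return directly; the variable names below are illustrative.

```python
gamma = 0.25
rewards = [0, 0, 100]  # rewards collected moving left: states 3, 2, 1

# G = R1 + gamma*R2 + gamma^2*R3; only the third term is nonzero here.
G = sum(r * gamma**t for t, r in enumerate(rewards))
print(G)  # 0 + 0.25*0 + 0.25^2*100 = 6.25
```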