
Mars Rover

Simplified Reinforcement Learning Environment


This example is "adapted from an example due to Stanford professor Emma Brunskill and one of my collaborators, Jagriti Agrawal, who had actually written code that is actually controlling the Mars rover right now."

Six Positions/States:

  • State 1, State 2, State 3, State 4, State 5, State 6
  • Rover starts in State 4
  • State: “The position of the Mars rover is called the state in reinforcement learning”

Science Mission Goals:

  • Use sensors (drill, radar, spectrometer) to analyze rocks
  • Take interesting pictures for scientists on Earth
  • Different locations have varying scientific value

Reward Structure:

  • State 1: Reward = 100 (very interesting surface, high scientific value)
  • State 6: Reward = 40 (pretty interesting surface, less valuable than State 1)
  • States 2, 3, 4, 5: Reward = 0 (not much interesting science)
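Written out, this reward structure is just a lookup table from state to reward. Here is a minimal sketch in Python (the names `REWARDS` and `TERMINAL_STATES` are illustrative choices, not from the lecture):

```python
# Reward associated with each of the six states.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}

# The day ends once the rover reaches either of these states.
TERMINAL_STATES = {1, 6}
```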

Two Actions Per Step:

  • Go left
  • Go right

End Conditions:

  • When rover reaches State 1 or State 6, “the day ends”
  • These are “terminal states” where:
    • Rover receives final reward at that state
    • “Nothing happens after that” (robot runs out of fuel/time)
    • No additional rewards can be earned
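Putting the positions, actions, and terminal conditions together, one way to sketch the transition logic is a small step function that reuses the `TERMINAL_STATES` set above. The function name and the "left"/"right" string encoding are assumptions made for illustration:

```python
def step(state: int, action: str) -> tuple[int, bool]:
    """Move the rover one position; report whether the day has ended.

    Assumed encoding: "left" moves toward State 1, "right" toward
    State 6. Rewards are looked up per state in the REWARDS table.
    """
    next_state = state - 1 if action == "left" else state + 1
    done = next_state in TERMINAL_STATES  # "nothing happens after that"
    return next_state, done
```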

Sequence: State 4 → State 3 → State 2 → State 1
Rewards: 0 → 0 → 0 → 100

Sequence: State 4 → State 5 → State 6
Rewards: 0 → 0 → 40

Inefficient Path: State 4 → State 5 → State 4 → State 3 → State 2 → State 1

  • Robot “is wasting a bit of time” by going right first, then changing direction
  • “This maybe isn’t such a great way to take actions”
  • Rewards: 0 → 0 → 0 → 0 → 0 → 100
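These reward sequences can be reproduced from the `REWARDS` table above with a short helper; `rewards_along` is a hypothetical name used here for illustration:

```python
def rewards_along(path: list[int]) -> list[int]:
    """Reward R(S) collected at each state visited along a path."""
    return [REWARDS[s] for s in path]

print(rewards_along([4, 3, 2, 1]))        # [0, 0, 0, 100]
print(rewards_along([4, 5, 6]))           # [0, 0, 40]
print(rewards_along([4, 5, 4, 3, 2, 1]))  # [0, 0, 0, 0, 0, 100]
```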

At every time step, the robot experiences:

  1. Current State (S): Where the robot currently is
  2. Action: Choice of going left or going right
  3. Reward (R(S)): Reward associated with the current state
  4. Next State (S’): Where the robot ends up after taking the action

For example, starting in State 4 and going left:

  • State: 4
  • Action: Go left
  • Reward: 0 (associated with State 4)
  • Next State: 3
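As a sketch, one time step's experience can be packaged as a single tuple mirroring the four fields above; the `Experience` class name is an illustrative choice, not part of any particular library:

```python
from typing import NamedTuple

class Experience(NamedTuple):
    state: int       # S: the current state
    action: str      # "left" or "right"
    reward: int      # R(S): reward associated with the current state
    next_state: int  # S': where the robot ends up

# The worked example above as a single tuple:
e = Experience(state=4, action="left", reward=0, next_state=3)
```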

Four Core Components:

  • State (S)
  • Action
  • Reward (R)
  • Next State (S’)

These four elements are “what reinforcement learning algorithms will look at when deciding how to take actions” and form the foundation for specific reinforcement learning algorithms.

The Mars rover example provides a simplified but complete framework for understanding how reinforcement learning problems are structured, with clear states, actions, rewards, and terminal conditions that mirror real-world applications.