Mars Rover
Mars Rover Example
Simplified Reinforcement Learning Environment
This example is “adapted from the example due to Stanford professor Emma Brunskill and one of my collaborators, Jagriti Agrawal, who has written code that is actually controlling the Mars rover right now.”
Environment Setup
Six Positions/States:
- State 1, State 2, State 3, State 4, State 5, State 6
- Rover starts in State 4
- State: “The position of the Mars rover is called the state in reinforcement learning”
Mission Objectives
Science Mission Goals:
- Use sensors (drill, radar, spectrometer) to analyze rocks
- Take interesting pictures for scientists on Earth
- Different locations have varying scientific value
Reward Structure:
- State 1: Reward = 100 (very interesting surface, high scientific value)
- State 6: Reward = 40 (pretty interesting surface, less valuable than State 1)
- States 2, 3, 4, 5: Reward = 0 (not much interesting science)
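This reward structure can be written down as a simple lookup table; a minimal sketch in Python (the names `REWARDS` and `START_STATE` are illustrative choices, not from the lecture):

```python
# Reward R(s) associated with each of the six states.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}

# The rover starts the day in state 4.
START_STATE = 4
```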
Available Actions
Two Actions Per Step:
- Go left
- Go right
Terminal States
End Conditions:
- When rover reaches State 1 or State 6, “the day ends”
- These are “terminal states” where:
  - Rover receives final reward at that state
  - “Nothing happens after that” (robot runs out of fuel/time)
  - No additional rewards can be earned
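Adding the two actions and the terminal conditions to the `REWARDS` table and `START_STATE` above gives a minimal sketch of the whole environment (the names `TERMINAL_STATES` and `step` are likewise illustrative, not from the lecture):

```python
# Reaching state 1 or 6 ends the day: no further actions or rewards.
TERMINAL_STATES = {1, 6}

def step(state: int, action: str) -> int:
    """Move the rover one position 'left' or 'right' and return the new state."""
    if state in TERMINAL_STATES:
        raise ValueError("Terminal state: nothing happens after the day ends.")
    return state - 1 if action == "left" else state + 1
```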
Example Action Sequences
Section titled “Example Action Sequences”Going Left from State 4
Sequence: State 4 → State 3 → State 2 → State 1
Rewards: 0 → 0 → 0 → 100
Going Right from State 4
Sequence: State 4 → State 5 → State 6
Rewards: 0 → 0 → 40
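Assuming the illustrative `step`, `REWARDS`, `TERMINAL_STATES`, and `START_STATE` definitions from the environment sketch above, both trajectories can be reproduced by repeating a single action until the day ends:

```python
def rollout(action: str, state: int = START_STATE) -> tuple[list[int], list[int]]:
    """Repeat one action until a terminal state; return states visited and their rewards."""
    states = [state]
    while state not in TERMINAL_STATES:
        state = step(state, action)
        states.append(state)
    return states, [REWARDS[s] for s in states]

print(rollout("left"))   # ([4, 3, 2, 1], [0, 0, 0, 100])
print(rollout("right"))  # ([4, 5, 6], [0, 0, 40])
```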
Mixed Strategy Example
Inefficient Path: State 4 → State 5 → State 4 → State 3 → State 2 → State 1
- Robot “is wasting a bit of time” by going right first, then changing direction
- “This maybe isn’t such a great way to take actions”
- Final rewards: 0 → 0 → 0 → 0 → 0 → 100
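The same illustrative helpers can replay this inefficient path and its rewards:

```python
state, rewards = START_STATE, []
for action in ["right", "left", "left", "left", "left"]:
    rewards.append(REWARDS[state])   # reward associated with the current state
    state = step(state, action)
rewards.append(REWARDS[state])       # final reward at terminal state 1
print(rewards)                       # [0, 0, 0, 0, 0, 100]
```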
Reinforcement Learning Components
Core Elements
At every time step, the robot experiences:
- Current State (S): Where the robot currently is
- Action: Choice between left or right
- Reward (R(S)): Reward associated with current state
- Next State (S’): Where robot moves after taking action
Concrete Example
- State: 4
- Action: Go left
- Reward: 0 (associated with State 4)
- Next State: 3
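Using the illustrative `step` and `REWARDS` helpers sketched earlier, this single experience can be written as an (S, A, R(S), S’) tuple:

```python
s = 4                 # current state S
a = "left"            # action A
r = REWARDS[s]        # reward R(S) associated with the current state: 0
s_next = step(s, a)   # next state S': 3

print((s, a, r, s_next))  # (4, 'left', 0, 3)
```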
Key Learning Elements
Four Core Components:
- State (S)
- Action
- Reward (R)
- Next State (S’)
These four elements are “what reinforcement learning algorithms will look at when deciding how to take actions” and form the foundation for specific reinforcement learning algorithms.
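As a rough sketch of what an algorithm might record, each day of experience can be collected as a list of (S, A, R(S), S’) tuples, again assuming the illustrative helpers defined above:

```python
def collect_trajectory(actions: list[str], state: int = START_STATE) -> list[tuple]:
    """Record the (S, A, R(S), S') tuple for every action taken during one day."""
    trajectory = []
    for action in actions:
        next_state = step(state, action)
        trajectory.append((state, action, REWARDS[state], next_state))
        state = next_state
        if state in TERMINAL_STATES:
            break
    return trajectory

print(collect_trajectory(["left", "left", "left"]))
# [(4, 'left', 0, 3), (3, 'left', 0, 2), (2, 'left', 0, 1)]
```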
The Mars rover example provides a simplified but complete framework for understanding how reinforcement learning problems are structured, with clear states, actions, rewards, and terminal conditions that mirror real-world applications.