Mars Rover
Mars Rover Example
Simplified Reinforcement Learning Environment
This example is “adapted from the example due to Stanford professor Emma Brunskill and one of my collaborators, Jagriti Agrawal, who has written code that is actually controlling the Mars rover right now.”
Environment Setup
Six Positions/States:
- State 1, State 2, State 3, State 4, State 5, State 6
- Rover starts in State 4
- State: “The position of the Mars rover is called the state in reinforcement learning”
Mission Objectives
Science Mission Goals:
- Use sensors (drill, radar, spectrometer) to analyze rocks
- Take interesting pictures for scientists on Earth
- Different locations have varying scientific value
Reward Structure:
- State 1: Reward = 100 (very interesting surface, high scientific value)
- State 6: Reward = 40 (pretty interesting surface, less valuable than State 1)
- States 2, 3, 4, 5: Reward = 0 (not much interesting science)
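This reward structure can be written down as a simple lookup table; a minimal sketch in Python (the names `REWARDS` and `START_STATE` are illustrative choices, not from the lecture):

```python
# Reward R(s) associated with each of the six states.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}

# The rover starts the day in state 4.
START_STATE = 4
```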
Available Actions
Two Actions Per Step:
- Go left
- Go right
Terminal States
End Conditions:
- When rover reaches State 1 or State 6, “the day ends”
- These are “terminal states” where:
  - Rover receives final reward at that state
  - “Nothing happens after that” (robot runs out of fuel/time)
  - No additional rewards can be earned
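Adding the two actions and the terminal conditions to the `REWARDS` table and `START_STATE` above gives a minimal sketch of the whole environment (the names `TERMINAL_STATES` and `step` are likewise illustrative, not from the lecture):

```python
# Reaching state 1 or 6 ends the day: no further actions or rewards.
TERMINAL_STATES = {1, 6}

def step(state: int, action: str) -> int:
    """Move the rover one position 'left' or 'right' and return the new state."""
    if state in TERMINAL_STATES:
        raise ValueError("Terminal state: nothing happens after the day ends.")
    return state - 1 if action == "left" else state + 1
```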
Example Action Sequences
Section titled “Example Action Sequences”Going Left from State 4
Sequence: State 4 → State 3 → State 2 → State 1
Rewards: 0 → 0 → 0 → 100
Going Right from State 4
Sequence: State 4 → State 5 → State 6
Rewards: 0 → 0 → 40
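Assuming the illustrative `step`, `REWARDS`, `TERMINAL_STATES`, and `START_STATE` definitions from the environment sketch above, both trajectories can be reproduced by repeating a single action until the day ends:

```python
def rollout(action: str, state: int = START_STATE) -> tuple[list[int], list[int]]:
    """Repeat one action until a terminal state; return states visited and their rewards."""
    states = [state]
    while state not in TERMINAL_STATES:
        state = step(state, action)
        states.append(state)
    return states, [REWARDS[s] for s in states]

print(rollout("left"))   # ([4, 3, 2, 1], [0, 0, 0, 100])
print(rollout("right"))  # ([4, 5, 6], [0, 0, 40])
```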
Mixed Strategy Example
Inefficient Path: State 4 → State 5 → State 4 → State 3 → State 2 → State 1
- Robot “is wasting a bit of time” by going right first, then changing direction
- “This maybe isn’t such a great way to take actions”
- Final rewards: 0 → 0 → 0 → 0 → 0 → 100
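The same illustrative helpers can replay this inefficient path and its rewards:

```python
state, rewards = START_STATE, []
for action in ["right", "left", "left", "left", "left"]:
    rewards.append(REWARDS[state])   # reward associated with the current state
    state = step(state, action)
rewards.append(REWARDS[state])       # final reward at terminal state 1
print(rewards)                       # [0, 0, 0, 0, 0, 100]
```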
Reinforcement Learning Components
Core Elements
At every time step, the robot experiences:
- Current State (S): Where the robot currently is
- Action: Choice between left or right
- Reward (R(S)): Reward associated with current state
- Next State (S’): Where robot moves after taking action
Concrete Example
- State: 4
- Action: Go left
- Reward: 0 (associated with State 4)
- Next State: 3
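Using the illustrative `step` and `REWARDS` helpers sketched earlier, this single experience can be written as an (S, A, R(S), S’) tuple:

```python
s = 4                 # current state S
a = "left"            # action A
r = REWARDS[s]        # reward R(S) associated with the current state: 0
s_next = step(s, a)   # next state S': 3

print((s, a, r, s_next))  # (4, 'left', 0, 3)
```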
Key Learning Elements
Four Core Components:
- State (S)
- Action
- Reward (R)
- Next State (S’)
These four elements are “what reinforcement learning algorithms will look at when deciding how to take actions” and form the foundation for specific reinforcement learning algorithms.
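As a rough sketch of what an algorithm might record, each day of experience can be collected as a list of (S, A, R(S), S’) tuples, again assuming the illustrative helpers defined above:

```python
def collect_trajectory(actions: list[str], state: int = START_STATE) -> list[tuple]:
    """Record the (S, A, R(S), S') tuple for every action taken during one day."""
    trajectory = []
    for action in actions:
        next_state = step(state, action)
        trajectory.append((state, action, REWARDS[state], next_state))
        state = next_state
        if state in TERMINAL_STATES:
            break
    return trajectory

print(collect_trajectory(["left", "left", "left"]))
# [(4, 'left', 0, 3), (3, 'left', 0, 2), (2, 'left', 0, 1)]
```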
The Mars rover example provides a simplified but complete framework for understanding how reinforcement learning problems are structured, with clear states, actions, rewards, and terminal conditions that mirror real-world applications.