The simplified Mars rover example used “a discrete set of states” where the rover “could only be in one of six possible positions.” However, most real robots can be in “any of a very large number of continuous value positions.”
Instead of discrete positions 1-6, a Mars rover could be positioned anywhere on a line from "0-6 kilometers where any number in between is valid." For example, the rover could be at position 2.7 km or 4.8 km, since any real number in the range is a valid state.
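The distinction can be sketched in a few lines of Python (the position values here are illustrative, not from the lecture):

```python
# Discrete Mars rover: the state is one of six possible positions.
DISCRETE_STATES = {1, 2, 3, 4, 5, 6}

def is_valid_discrete(state):
    return state in DISCRETE_STATES

# Continuous Mars rover: the state is any real number on the 0-6 km line.
def is_valid_continuous(position_km):
    return 0.0 <= position_km <= 6.0

print(is_valid_discrete(2.7))    # False: 2.7 is not one of the six states
print(is_valid_continuous(2.7))  # True: any number in [0, 6] is a valid state
```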
For controlling a self-driving car or truck smoothly, the state includes six numbers:

Position and Orientation:
- x, y coordinates
- Orientation angle θ

Velocity Information:
- Velocities ẋ, ẏ in the x and y directions
- Rate of turning θ̇ (how quickly the angle θ is changing)
Unlike the Mars rover with discrete states 1-6, the car state “comprises this vector of six numbers, and any of these numbers can take on any value within its valid range.” For example, “Theta should range between zero and 360 degrees.”
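A minimal sketch of that six-number car state. The lecture only pins down the range of θ; the field names and example values below are assumptions chosen for illustration:

```python
from dataclasses import dataclass

@dataclass
class CarState:
    """Six-number continuous state for a car: position, heading, and their rates."""
    x: float          # position along one axis
    y: float          # position along the other axis
    theta: float      # orientation angle in degrees, valid range [0, 360)
    x_dot: float      # velocity in the x direction
    y_dot: float      # velocity in the y direction
    theta_dot: float  # rate of turning (degrees per second)

    def as_vector(self):
        # The policy sees the state as this vector of six numbers.
        return [self.x, self.y, self.theta, self.x_dot, self.y_dot, self.theta_dot]

s = CarState(x=10.0, y=5.0, theta=30.0, x_dot=2.0, y_dot=1.0, theta_dot=0.5)
print(len(s.as_vector()))  # 6
```

Any of the six numbers can take on any value within its valid range, which is exactly what makes this a continuous state.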
Controlling an autonomous helicopter requires an even more sophisticated state representation with twelve numbers:
Position (3D):
- x, y, z coordinates

Orientation (3 angles):
- Roll, pitch, and yaw

Linear Velocities (3D):
- Rates of change ẋ, ẏ, ż of the position

Angular Velocities (3D):
- Rates of change of roll, pitch, and yaw
"This is actually the state used to control autonomous helicopters. It's this list of 12 numbers that is input to a policy, and the job of the policy is to look at these 12 numbers and decide what's an appropriate action to take in the helicopter."
Continuous State Markov Decision Process: “The state of the problem isn’t just one of a small number of possible discrete values, like a number from 1-6. Instead, it’s a vector of numbers, any of which could take any of a large number of values.”
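One way to picture a policy over such a state is a function from the 12-number vector to an action. The linear weights below are purely hypothetical; real policies (e.g., neural networks) are far more expressive, and this only illustrates the shape of the mapping:

```python
import random

STATE_DIM = 12  # x, y, z, roll, pitch, yaw, and their six rates of change

def make_toy_policy(seed=0):
    """Return a toy linear policy: action = weighted sum of the 12 state numbers.
    Purely illustrative -- it shows that the policy's input is a vector of
    12 continuous values, nothing more."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(STATE_DIM)]

    def policy(state):
        assert len(state) == STATE_DIM
        return sum(w * s for w, s in zip(weights, state))

    return policy

pi = make_toy_policy()
hover_state = [0.0] * STATE_DIM  # hovering at the origin, all rates zero
print(pi(hover_state))           # 0.0 for the all-zero state
```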
These continuous state spaces are essential for real-world control applications like self-driving cars, trucks, and autonomous helicopters, where the robot can occupy any of an essentially unlimited number of configurations.
"In the practice lab for this week, you get to implement for yourself a reinforcement learning algorithm applied to a simulated lunar lander application. Landing something on the moon is in simulation."
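For context, the simulated lunar lander commonly used in RL toolkits (e.g., Gymnasium's LunarLander environment) also has a continuous state: an 8-number vector. The sketch below hard-codes that layout rather than importing the environment, so treat the field names as a description of the state, not the library's API:

```python
# Typical lunar lander state layout: position, velocity, angle, angular
# velocity, plus two flags for leg-ground contact.
FIELDS = ["x", "y", "x_dot", "y_dot", "theta", "theta_dot",
          "left_leg_contact", "right_leg_contact"]

def describe(state):
    """Pair each of the 8 state numbers with its meaning."""
    assert len(state) == len(FIELDS)
    return dict(zip(FIELDS, state))

hover = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # illustrative values
print(describe(hover)["y"])  # 1.0: one unit above the landing pad
```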
The transition from discrete to continuous state spaces represents a significant increase in complexity but enables reinforcement learning to tackle real-world control problems where precise, smooth actions are required rather than simple discrete choices.