Making Decisions
Making Decisions: Policies in Reinforcement Learning
Action Selection Strategies
Possible Approaches
There are many different ways to choose actions in reinforcement learning:
- Nearest reward strategy: “Always go for the nearer reward”, that is, go left if the leftmost reward is nearer and right if the rightmost reward is nearer
- Largest reward strategy: Always pursue the larger reward, regardless of distance
- Smallest reward strategy: Always go for the smaller reward (this doesn’t seem like a good idea, but it is another option)
- Mixed strategy: “Go left unless you’re just one step away from the lesser reward, in which case, you go for that one”
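As a rough illustration, the four strategies above can each be written as a small rule that looks at the current state and returns a direction. The sketch below assumes a hypothetical world that the notes do not spell out: states numbered 1 to 6 on a line, with rewards only at the two ends. The reward values (100 on the left, 40 on the right) and all function names are assumptions made for the example.

```python
# Hypothetical world: states 1..6 on a line, rewards only at the two ends.
# The reward values below are illustrative assumptions, not from the notes.
LEFT_END, RIGHT_END = 1, 6
REWARDS = {LEFT_END: 100.0, RIGHT_END: 40.0}


def nearest_reward(state: int) -> str:
    """Head for whichever end is fewer steps away (ties go left)."""
    steps_left = state - LEFT_END
    steps_right = RIGHT_END - state
    return "left" if steps_left <= steps_right else "right"


def largest_reward(state: int) -> str:
    """Always head for the end with the larger reward, ignoring distance."""
    return "left" if REWARDS[LEFT_END] >= REWARDS[RIGHT_END] else "right"


def smallest_reward(state: int) -> str:
    """Always head for the end with the smaller reward."""
    return "right" if largest_reward(state) == "left" else "left"


def mixed(state: int) -> str:
    """Go left unless one step away from the lesser reward."""
    lesser_end = min(REWARDS, key=REWARDS.get)
    if abs(state - lesser_end) == 1:
        return "right" if lesser_end > state else "left"
    return "left"


if __name__ == "__main__":
    for s in range(2, 6):
        print(s, nearest_reward(s), largest_reward(s), smallest_reward(s), mixed(s))
```

Each function takes a state and returns an action, which is the shape the policy definition formalizes.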
Policy Definition
Core Concept
Policy (π): A function that “takes as input any state s and maps it to some action a that it wants us to take”
Mathematical Notation: π(s) = a
- Input: State s
- Output: Action a
Example Policy
Strategy: Go left unless one step from lesser reward
Policy Mapping:
- π(State 2) = Left
- π(State 3) = Left
- π(State 4) = Left
- π(State 5) = Right
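Because this example has only a handful of states, the mapping above can be stored directly as a lookup table. The following is a minimal sketch; the integer state labels, the `"left"`/`"right"` strings, and the names `POLICY` and `pi` are illustrative choices rather than anything fixed by the notes.

```python
from typing import Dict

# The example policy written as a table from state to action.
POLICY: Dict[int, str] = {
    2: "left",
    3: "left",
    4: "left",
    5: "right",
}


def pi(state: int) -> str:
    """Return the action the policy prescribes for the given state."""
    if state not in POLICY:
        raise KeyError(f"policy is undefined for state {state}")
    return POLICY[state]


if __name__ == "__main__":
    for s in sorted(POLICY):
        print(f"pi({s}) = {pi(s)}")
```

A table works only when the states can be enumerated; for larger problems π is usually represented by a function with learned parameters instead.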
Reinforcement Learning Goal
Objective
“The goal of reinforcement learning is to find a policy π or π(s) that tells you what action to take in every state so as to maximize the return.”
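One common way to write this objective is shown below. The notation is an assumption added here for illustration and does not come from the notes: γ is the discount factor, R_t is the reward received t steps after starting in state s, and G^π(s) is the return obtained by following π from s.

```latex
\[
G^{\pi}(s) = \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} R_{t} \;\middle|\; s_0 = s,\ a_t = \pi(s_t) \right],
\qquad
\pi^{*} \in \arg\max_{\pi} G^{\pi}(s) \quad \text{for every state } s .
\]
```

When the environment is deterministic, the expectation is over a single trajectory and can be dropped.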
Policy vs Controller Terminology
“I don’t know if policy is the most descriptive term of what π is, but it’s one of those terms that’s become standard in reinforcement learning. Maybe calling π a controller rather than a policy would be more natural terminology but policy is what everyone in reinforcement learning now calls this.”
Complete Framework
The policy represents the final piece needed for a complete reinforcement learning system:
- States: Possible positions/situations
- Actions: Available choices at each state
- Rewards: Feedback for being in each state
- Return: Discounted sum of future rewards
- Policy: Decision-making function that maps states to actions
Integration
- Policy determines which actions to take
- Actions determine which states are visited
- States determine which rewards are received
- Returns provide the metric for evaluating policy quality
- Goal is finding the policy that maximizes expected return
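The chain above (policy, then actions, then states, then rewards, then return) can be traced in a few lines of code. This is a sketch under stated assumptions, not a reference implementation: the six-state line world, the reward values (100 at state 1, 40 at state 6, zero elsewhere), the discount factor of 0.5, and every function name are hypothetical. Because this toy world is deterministic, the expected return is simply the return of the single trajectory.

```python
from typing import Callable, Dict

# Assumed toy environment (not specified in the notes).
REWARDS: Dict[int, float] = {1: 100.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 40.0}
TERMINAL = {1, 6}
GAMMA = 0.5  # assumed discount factor


def step(state: int, action: str) -> int:
    """Actions determine which state is visited next."""
    return state - 1 if action == "left" else state + 1


def discounted_return(policy: Callable[[int], str], start: int,
                      gamma: float = GAMMA, max_steps: int = 100) -> float:
    """Follow the policy from `start` and sum the discounted rewards."""
    total, discount, state = 0.0, 1.0, start
    for _ in range(max_steps):  # guard against policies that never terminate
        total += discount * REWARDS[state]  # reward for being in this state
        if state in TERMINAL:
            break
        state = step(state, policy(state))  # the policy picks the action
        discount *= gamma
    return total


def always_left(state: int) -> str:
    return "left"


def left_unless_next_to_lesser(state: int) -> str:
    return "right" if state == 5 else "left"


if __name__ == "__main__":
    for name, policy in [("always_left", always_left),
                         ("left_unless_next_to_lesser", left_unless_next_to_lesser)]:
        print(name, [discounted_return(policy, s) for s in range(2, 6)])
```

Under these assumptions, `always_left` yields returns of 50, 25, 12.5, and 6.25 from states 2 through 5, while `left_unless_next_to_lesser` yields 50, 25, 12.5, and 20. The second policy is at least as good in every state, which is the sense in which returns rank policies.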
The policy serves as the “brain” of the reinforcement learning agent: it encodes what the agent has learned about how to act in each state, and the best policy is the one that achieves the highest possible return.