Pablo Rodriguez

Making Decisions

Making Decisions: Policies in Reinforcement Learning

There are many different ways to choose actions in reinforcement learning:

  • Nearest reward strategy: “Always go for the nearer reward”, meaning go left if the leftmost reward is nearer and right if the rightmost reward is nearer
  • Largest reward strategy: Always pursue the larger reward, regardless of distance
  • Smallest reward strategy: Always go for the smaller reward (this doesn’t seem like a good idea, but it is another option)
  • Mixed strategy: “Go left unless you’re just one step away from the lesser reward, in which case, you go for that one”
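Each of these strategies can be written as a small function from a state to an action. The sketch below is only an illustration under assumed details that the notes do not give: six states in a row numbered 1 to 6, a reward of 100 at state 1 and 40 at state 6, and the hypothetical function names shown.

```python
# Assumed toy world (not specified in the notes): states 1..6 in a row,
# with rewards only at the two ends. The reward values are illustrative.
LEFT_STATE, RIGHT_STATE = 1, 6
REWARDS = {LEFT_STATE: 100, RIGHT_STATE: 40}

def nearest_reward(state):
    """Head for whichever end is closer (ties go left, an arbitrary choice)."""
    return "Left" if state - LEFT_STATE <= RIGHT_STATE - state else "Right"

def largest_reward(state):
    """Always head for the end holding the larger reward."""
    return "Left" if REWARDS[LEFT_STATE] >= REWARDS[RIGHT_STATE] else "Right"

def smallest_reward(state):
    """Always head for the end holding the smaller reward."""
    return "Left" if REWARDS[LEFT_STATE] < REWARDS[RIGHT_STATE] else "Right"

def mixed(state):
    """Go left unless one step away from the lesser reward, then take it."""
    lesser = min(REWARDS, key=REWARDS.get)  # state holding the smaller reward
    if abs(state - lesser) == 1:
        return "Right" if lesser > state else "Left"
    return "Left"

# Print the action each strategy picks in the non-terminal states 2..5.
for strategy in (nearest_reward, largest_reward, smallest_reward, mixed):
    print(strategy.__name__, {s: strategy(s) for s in range(2, 6)})
```

Under these assumptions the four functions give four different state-to-action mappings, which is the sense in which each strategy is a different way of choosing actions.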

Policy (π): A function that “takes as input any state s and maps it to some action a that it wants us to take”

Mathematical Notation: π(s) = a

  • Input: State s
  • Output: Action a

Example strategy: Go left unless you are one step away from the lesser reward

Policy Mapping:

  • π(State 2) = Left
  • π(State 3) = Left
  • π(State 4) = Left
  • π(State 5) = Right
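For a small, discrete set of states, one simple way to represent such a mapping is a lookup table. The following is a minimal sketch; the names `POLICY` and `pi` are chosen here for illustration and do not come from the notes.

```python
# The policy mapping listed above as a lookup table: state -> action.
POLICY = {2: "Left", 3: "Left", 4: "Left", 5: "Right"}

def pi(state):
    """Return the action the policy prescribes in `state`."""
    return POLICY[state]

print(pi(4))  # Left
print(pi(5))  # Right
```

A table works only when the states can be enumerated; for larger or continuous state spaces the same role would be played by some other function of the state.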

“The goal of reinforcement learning is to find a policy π or π(s) that tells you what action to take in every state so as to maximize the return.”
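To make “find a policy that maximizes the return” concrete, the sketch below simply tries every deterministic policy in a tiny world and keeps the best one. Everything specific here is an assumption for illustration, not something stated in the notes: six states in a row, terminal states 1 and 6 with rewards 100 and 40, a discount factor of 0.5, and scoring a policy by the sum of its returns over all starting states. Exhaustive search is not how reinforcement learning algorithms usually work; it is only feasible here because there are just 2^4 candidate policies.

```python
from itertools import product

# Assumed toy world (not specified in the notes): states 1..6 in a row,
# states 1 and 6 are terminal, and only they carry a nonzero reward.
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}
NON_TERMINAL = [2, 3, 4, 5]
GAMMA = 0.5      # assumed discount factor
MAX_STEPS = 20   # cap so that policies which loop forever still finish

def policy_return(policy, start):
    """Discounted sum of rewards collected by following `policy` from `start`."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(MAX_STEPS):
        total += discount * REWARDS[state]
        if state in TERMINAL:
            break
        state += -1 if policy[state] == "Left" else 1
        discount *= GAMMA
    return total

def best_policy():
    """Try all 2**4 deterministic policies; keep the one with the highest total return."""
    best, best_score = None, float("-inf")
    for actions in product(["Left", "Right"], repeat=len(NON_TERMINAL)):
        policy = dict(zip(NON_TERMINAL, actions))
        # Score: sum of returns over every non-terminal starting state.
        score = sum(policy_return(policy, s) for s in NON_TERMINAL)
        if score > best_score:
            best, best_score = policy, score
    return best

print(best_policy())
```

With these assumed rewards and discount factor, the search selects Left in states 2, 3, and 4 and Right in state 5. Different reward values or a different discount factor could change which policy comes out on top.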

“I don’t know if policy is the most descriptive term of what π is, but it’s one of those terms that’s become standard in reinforcement learning. Maybe calling π a controller rather than a policy would be more natural terminology but policy is what everyone in reinforcement learning now calls this.”

The policy represents the final piece needed for a complete reinforcement learning system:

  1. States: Possible positions/situations
  2. Actions: Available choices at each state
  3. Rewards: Feedback for being in each state
  4. Return: Discounted sum of future rewards
  5. Policy: Decision-making function that maps states to actions

How these pieces fit together:

  • Policy determines which actions to take
  • Actions determine which states are visited
  • States determine which rewards are received
  • Returns provide the metric for evaluating policy quality
  • Goal is finding the policy that maximizes expected return
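The chain described above (policy, then actions, then states, then rewards, then return) can be traced in a short rollout loop. This is a sketch under assumed details not given in the notes: a six-state row with terminal rewards of 100 at state 1 and 40 at state 6, a discount factor of 0.5, and a hypothetical `rollout` helper.

```python
# Assumed toy world (illustrative values, not from the notes).
REWARDS = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}
GAMMA = 0.5  # assumed discount factor
POLICY = {2: "Left", 3: "Left", 4: "Left", 5: "Right"}

def rollout(start, max_steps=20):
    """Follow POLICY from `start`; return the (state, action, reward) steps and the return."""
    state, trajectory, total = start, [], 0.0
    for step in range(max_steps):
        reward = REWARDS[state]                  # the state determines the reward
        total += (GAMMA ** step) * reward        # the return sums discounted rewards
        if state in TERMINAL:
            trajectory.append((state, None, reward))
            break
        action = POLICY[state]                   # the policy determines the action
        trajectory.append((state, action, reward))
        state += -1 if action == "Left" else 1   # the action determines the next state
    return trajectory, total

trajectory, total = rollout(4)
for state, action, reward in trajectory:
    print(state, action, reward)
print("return:", total)
```

Starting from state 4, this policy moves left three times, reaches state 1, and the discounted return under the assumed values is 12.5. Comparing such returns across policies is what makes it possible to say one policy is better than another.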

The policy serves as the “brain” of the reinforcement learning agent: once learned, it encapsulates the agent’s knowledge of how to behave in the environment so as to achieve the highest possible return.