Grid World MDP

  • Define and implement a grid world MDP in which
    • each action yields a reward of -1, and
    • reaching the goal yields a reward equal to the distance from the start state to the goal state.
  • Compute and visualize the optimal policy and the optimal value function.
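
A minimal sketch of the two tasks above, in Python (the outline names no language). Every concrete choice here is an assumption, not part of the assignment: a 5x5 grid with start (0, 0) and goal (4, 4); deterministic moves in four directions that leave the agent in place when they would leave the grid; Manhattan distance as the "distance"; the goal reward added on top of the -1 of the final move; and an undiscounted episodic task (GAMMA = 1.0). The helpers value_iteration, greedy_policy and render are hypothetical names. Value iteration computes V*, the greedy policy is read off from it, and render prints a text visualization (an arrow plus the value for each cell).

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Hypothetical layout (assumption): 5x5 grid, start top-left, goal bottom-right.
ROWS, COLS = 5, 5
START: State = (0, 0)
GOAL: State = (4, 4)
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}
STEP_REWARD = -1.0
# Assumption: "distance" means Manhattan distance from START to GOAL.
GOAL_REWARD = float(abs(GOAL[0] - START[0]) + abs(GOAL[1] - START[1]))
GAMMA = 1.0  # assumption: undiscounted episodic task
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def states() -> List[State]:
    return [(r, c) for r in range(ROWS) for c in range(COLS)]


def step(s: State, a: str) -> Tuple[State, float]:
    """Deterministic transition; a move off the grid leaves the agent in place."""
    dr, dc = ACTIONS[a]
    nr, nc = s[0] + dr, s[1] + dc
    s_next = (nr, nc) if 0 <= nr < ROWS and 0 <= nc < COLS else s
    # Every action costs 1; entering the goal additionally pays GOAL_REWARD.
    reward = STEP_REWARD + (GOAL_REWARD if s_next == GOAL else 0.0)
    return s_next, reward


def q_value(V: Dict[State, float], s: State, a: str) -> float:
    s_next, r = step(s, a)
    return r + GAMMA * V[s_next]


def value_iteration(tol: float = 1e-9) -> Dict[State, float]:
    """In-place value iteration; the goal is terminal and keeps value 0."""
    V = {s: 0.0 for s in states()}
    while True:
        delta = 0.0
        for s in states():
            if s == GOAL:
                continue
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V: Dict[State, float]) -> Dict[State, str]:
    """Pick an action maximizing Q(s, a) in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
            for s in states() if s != GOAL}


def render(V: Dict[State, float], policy: Dict[State, str]) -> str:
    """Text visualization: one arrow (or G for the goal) and the value per cell."""
    lines = []
    for r in range(ROWS):
        cells = []
        for c in range(COLS):
            mark = "G" if (r, c) == GOAL else ARROWS[policy[(r, c)]]
            cells.append(f"{mark}{V[(r, c)]:5.1f}")
        lines.append("  ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    V = value_iteration()
    print(render(V, greedy_policy(V)))
```

Under these assumptions V*(s) = 8 - d(s), where d(s) is the Manhattan distance from s to the goal, so the start state has value 0: the goal reward exactly offsets the eight step costs of a shortest path. Ties between equally good moves are broken by the order of ACTIONS.
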
Reward Shaping

  • Compute the optimal pseudo-rewards of the reward-shaping method.
  • Create a 2x2 plot in which each subplot shows the optimal pseudo-reward of one of the four possible actions in every state of the grid world.
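
The outline does not say which shaping scheme is meant. A minimal sketch, assuming potential-based reward shaping with potential Φ = V*, and taking the pseudo-reward to be the shaped reward r(s, a, s') + γΦ(s') − Φ(s). With deterministic transitions this equals Q*(s, a) − V*(s), so it is 0 for optimal actions and negative otherwise. To run on its own, the block re-creates a hypothetical grid world (5x5, start (0, 0), goal (4, 4), step reward -1, goal bonus equal to the Manhattan start-to-goal distance, γ = 1); pseudo_rewards and plot_pseudo_rewards are hypothetical names. Plotting uses matplotlib, imported only inside plot_pseudo_rewards so the computation works without it.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Hypothetical grid world (assumption): 5x5, start (0, 0), goal (4, 4),
# step reward -1, goal bonus = Manhattan distance start -> goal, gamma = 1.
ROWS, COLS = 5, 5
START: State = (0, 0)
GOAL: State = (4, 4)
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}
GOAL_REWARD = float(abs(GOAL[0] - START[0]) + abs(GOAL[1] - START[1]))
GAMMA = 1.0


def step(s: State, a: str) -> Tuple[State, float]:
    """Deterministic move; bumping into the boundary leaves the agent in place."""
    dr, dc = ACTIONS[a]
    nr, nc = s[0] + dr, s[1] + dc
    s_next = (nr, nc) if 0 <= nr < ROWS and 0 <= nc < COLS else s
    return s_next, -1.0 + (GOAL_REWARD if s_next == GOAL else 0.0)


def value_iteration(tol: float = 1e-9) -> Dict[State, float]:
    """In-place value iteration for V*; the terminal goal keeps value 0."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for s in V:
            if s == GOAL:
                continue
            best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def pseudo_rewards(V: Dict[State, float]) -> Dict[str, List[List[float]]]:
    """table[a][row][col] = r + GAMMA * V*(s') - V*(s) for action a in (row, col).

    The potential Phi = V* is an assumption; the goal cell is NaN because no
    action is taken in the terminal state.
    """
    table = {}
    for a in ACTIONS:
        grid = [[float("nan")] * COLS for _ in range(ROWS)]
        for (r, c) in V:
            if (r, c) == GOAL:
                continue
            s_next, reward = step((r, c), a)
            grid[r][c] = reward + GAMMA * V[s_next] - V[(r, c)]
        table[a] = grid
    return table


def plot_pseudo_rewards(table: Dict[str, List[List[float]]]):
    """Draw a 2x2 grid of heatmaps, one subplot per action (needs matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, axes = plt.subplots(2, 2, figsize=(8, 8))
    for ax, (a, grid) in zip(axes.flat, table.items()):
        im = ax.imshow(grid, cmap="RdYlGn")
        ax.set_title(f"action: {a}")
        ax.set_xticks(range(COLS))
        ax.set_yticks(range(ROWS))
        for r in range(ROWS):
            for c in range(COLS):
                if grid[r][c] == grid[r][c]:  # NaN != NaN, so the goal cell is skipped
                    ax.text(c, r, f"{grid[r][c]:.0f}", ha="center", va="center", color="k")
        fig.colorbar(im, ax=ax, fraction=0.046)
    fig.tight_layout()
    return fig
```

For example, plot_pseudo_rewards(pseudo_rewards(value_iteration())).savefig("pseudo_rewards.png") writes the 2x2 figure. Under the assumptions above every entry is 0 (an optimal move), -1 (bumping into the boundary) or -2 (stepping away from the goal), so acting greedily on the pseudo-reward alone already reproduces an optimal policy.
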
