- Define and implement a grid world MDP in which each action costs -1 and reaching the goal yields a reward equal to the distance from the start state to the goal state.
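
A minimal sketch of such an MDP, under assumptions that the task does not fix: a 5x5 grid with (row, column) coordinates and row 0 at the top, start (0, 0), goal (4, 4), four moves, and deterministic transitions where moving off the grid leaves the agent in place. "Distance" is read here as Manhattan distance, and the goal bonus is assumed to be paid on top of the -1 cost of the move that enters the goal. The class `GridWorld` and its methods are hypothetical names.

```python
from dataclasses import dataclass

# Action encoding: (row offset, column offset); row 0 is the top row.
ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right


@dataclass(frozen=True)
class GridWorld:
    """Deterministic grid world; size, start and goal are illustrative assumptions."""
    n_rows: int = 5
    n_cols: int = 5
    start: tuple = (0, 0)
    goal: tuple = (4, 4)
    step_cost: float = -1.0

    @property
    def goal_reward(self) -> float:
        # Assumption: "distance" means Manhattan distance from start to goal.
        return float(abs(self.goal[0] - self.start[0]) + abs(self.goal[1] - self.start[1]))

    def states(self):
        return [(r, c) for r in range(self.n_rows) for c in range(self.n_cols)]

    def is_terminal(self, s) -> bool:
        return s == self.goal

    def step(self, s, a):
        """Return (next_state, reward) for taking action a in a non-terminal state s."""
        dr, dc = ACTIONS[a]
        r, c = s[0] + dr, s[1] + dc
        # Moving off the grid leaves the agent where it is (the step cost is still paid).
        if not (0 <= r < self.n_rows and 0 <= c < self.n_cols):
            r, c = s
        s_next = (r, c)
        reward = self.step_cost
        if s_next == self.goal:
            reward += self.goal_reward  # assumption: bonus added on top of the -1 cost
        return s_next, reward
```

For example, `GridWorld().step((4, 3), 3)` moves right into the goal and returns `((4, 4), 7.0)`: the -1 step cost plus the Manhattan distance of 8 between (0, 0) and (4, 4).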

- Compute and visualize the optimal policy and the optimal value function
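
One common way to obtain both quantities is value iteration followed by greedy policy extraction. The sketch below assumes a hypothetical setup not given in the task: a 5x5 grid, start (0, 0), goal (4, 4), Manhattan distance as the goal bonus (added to the -1 step cost), and a discount factor `GAMMA = 0.9`. The "visualization" is a plain-text arrow map plus the value grid so that it runs without a plotting library; a matplotlib heatmap would serve equally well.

```python
import numpy as np

# Hypothetical setup (not fixed by the task): 5x5 grid, (row, col) coordinates,
# start (0, 0), goal (4, 4), goal bonus = Manhattan distance start -> goal = 8.
N_ROWS, N_COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
GOAL_REWARD = abs(GOAL[0] - START[0]) + abs(GOAL[1] - START[1])
GAMMA = 0.9  # assumed discount factor
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ARROWS = ["^", "v", "<", ">"]


def step(s, a):
    """Deterministic move; clamping to the grid means bumping a wall leaves s unchanged."""
    r = min(max(s[0] + MOVES[a][0], 0), N_ROWS - 1)
    c = min(max(s[1] + MOVES[a][1], 0), N_COLS - 1)
    # Every action costs -1; entering the goal additionally pays GOAL_REWARD (assumption).
    reward = -1.0 + (GOAL_REWARD if (r, c) == GOAL else 0.0)
    return (r, c), reward


def q_values(V, s):
    """One-step lookahead: Q(s, a) = r(s, a) + GAMMA * V(s') for each action a."""
    return [rew + GAMMA * V[s2] for s2, rew in (step(s, a) for a in range(len(MOVES)))]


def value_iteration(tol=1e-10):
    """In-place (Gauss-Seidel style) value iteration; the terminal goal keeps V = 0."""
    V = np.zeros((N_ROWS, N_COLS))
    while True:
        delta = 0.0
        for r in range(N_ROWS):
            for c in range(N_COLS):
                if (r, c) == GOAL:
                    continue
                best = max(q_values(V, (r, c)))
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """pi[r, c] = index of a greedy action with respect to V; -1 marks the goal."""
    pi = np.full((N_ROWS, N_COLS), -1, dtype=int)
    for r in range(N_ROWS):
        for c in range(N_COLS):
            if (r, c) != GOAL:
                pi[r, c] = int(np.argmax(q_values(V, (r, c))))
    return pi


def render(V, pi):
    """Plain-text visualization: one arrow per cell (G = goal), then the value grid."""
    arrows = [
        "".join("G" if (r, c) == GOAL else ARROWS[pi[r, c]] for c in range(N_COLS))
        for r in range(N_ROWS)
    ]
    return "\n".join(arrows) + "\n" + np.array2string(V, precision=2)
```

Usage: `V = value_iteration(); print(render(V, greedy_policy(V)))`. Because transitions are deterministic, acting greedily with respect to V* is optimal; `np.argmax` breaks ties (for example at the start, where moving down or right is equally good) in favor of the lower action index.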

- Compute the optimal pseudo-rewards of the reward shaping method
- Create a 2x2 plot in which each subplot visualizes the optimal pseudo-reward of one of the four possible actions in every state of the grid world.
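
A sketch under an explicit interpretation: "optimal pseudo-rewards" is read here as potential-based reward shaping (Ng, Harada and Russell, 1999) with the optimal value function as the potential, Φ = V*, so that F(s, a, s') = γ·V*(s') - V*(s). The grid (5x5, start (0, 0), goal (4, 4)), the Manhattan-distance goal bonus added to the -1 step cost, `GAMMA = 0.9`, and all function names are hypothetical assumptions. The array `F[a]` holds the pseudo-reward of action `a` in every state, which maps directly onto one subplot of the 2x2 figure.

```python
import numpy as np

# Hypothetical setup: 5x5 deterministic grid, start (0, 0), goal (4, 4), step cost -1,
# goal bonus = Manhattan distance start -> goal (8), discount GAMMA = 0.9 (assumed).
N_ROWS, N_COLS = 5, 5
START, GOAL = (0, 0), (4, 4)
GOAL_REWARD = abs(GOAL[0] - START[0]) + abs(GOAL[1] - START[1])
GAMMA = 0.9
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ACTION_NAMES = ["up", "down", "left", "right"]


def step(s, a):
    """Deterministic move (walls clamp the agent in place) and its reward."""
    r = min(max(s[0] + MOVES[a][0], 0), N_ROWS - 1)
    c = min(max(s[1] + MOVES[a][1], 0), N_COLS - 1)
    return (r, c), -1.0 + (GOAL_REWARD if (r, c) == GOAL else 0.0)


def optimal_values(tol=1e-10):
    """Synchronous value iteration for V*; the terminal goal has V*(goal) = 0."""
    V = np.zeros((N_ROWS, N_COLS))
    while True:
        V_new = np.zeros_like(V)
        for r in range(N_ROWS):
            for c in range(N_COLS):
                if (r, c) != GOAL:
                    V_new[r, c] = max(rew + GAMMA * V[s2]
                                      for s2, rew in (step((r, c), a) for a in range(4)))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def optimal_pseudo_rewards(V):
    """F[a, r, c] = GAMMA * V*(s') - V*(s): potential-based shaping with Phi = V*."""
    F = np.full((4, N_ROWS, N_COLS), np.nan)  # NaN at the terminal goal (no actions)
    for a in range(4):
        for r in range(N_ROWS):
            for c in range(N_COLS):
                if (r, c) != GOAL:
                    s2, _ = step((r, c), a)
                    F[a, r, c] = GAMMA * V[s2] - V[r, c]
    return F


def plot_pseudo_rewards(F):
    """2x2 grid of heatmaps, one subplot per action (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it
    fig, axes = plt.subplots(2, 2, figsize=(8, 8))
    for a, ax in enumerate(axes.flat):
        im = ax.imshow(F[a], cmap="viridis")
        ax.set_title(f"F(s, {ACTION_NAMES[a]})")
        for (r, c), val in np.ndenumerate(F[a]):
            if not np.isnan(val):
                ax.text(c, r, f"{val:.2f}", ha="center", va="center", fontsize=8)
        fig.colorbar(im, ax=ax)
    fig.tight_layout()
    plt.show()
```

With this choice, and deterministic transitions, the shaped reward r(s, a) + F(s, a, s') equals Q*(s, a) - V*(s): it is 0 for optimal actions and negative otherwise, so even a one-step greedy agent acts optimally under the shaped reward. Calling `plot_pseudo_rewards(optimal_pseudo_rewards(optimal_values()))` draws the 2x2 figure; the goal cell stays blank because no action is taken there.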

