Question

Reinforcement Learning by Richard S. Sutton and Andrew G. Barto (2nd Edition)

Exercise 3.7 Imagine that you are designing a robot to run a maze. You decide to give it a reward of +1 for escaping from the maze and a reward of zero at all other times. The task seems to break down naturally into episodes, the successive runs through the maze, so you decide to treat it as an episodic task, where the goal is to maximize expected total reward (3.7). After running the learning agent for a while, you find that it is showing no improvement in escaping from the maze. What is going wrong? Have you effectively communicated to the agent what you want it to achieve?

Explanation / Answer

Answer: In reinforcement learning, the agent receives a reward at each time step, and the goal of learning is to maximize the cumulative reward the agent receives. In this problem, learning is organized as an episodic task. In episodic tasks, it is critical to define a special state called the terminal state, which is followed by a reset to a standard starting state or to a sample from a distribution of starting states. Distinguishing terminal from non-terminal states is necessary for the episodic task to work.

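The success of reinforcement learning also depends on how the rewards are computed and on whether they indicate how to achieve the goal. Here the quantity being maximized, the total reward (3.7), is the undiscounted sum of the rewards in an episode, so every episode that ends in an escape has a total reward of exactly +1, no matter how many steps it took. The cumulative reward therefore indicates only whether the goal was achieved; it says nothing about how well it was achieved.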
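To make the reward computation concrete, the total reward (3.7) can be written out for this maze. This is a worked sketch that assumes the episode ends with an escape at its final time step, labeled T here:

```latex
G_0 = R_1 + R_2 + \dots + R_T
    = \underbrace{0 + \dots + 0}_{T-1 \text{ terms}} + 1
    = 1 \qquad \text{for every episode length } T
```

Because the result does not depend on T, maximizing it gives the agent no reason to prefer a short run over a long one.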

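Hence, if the robot is not showing improvement, there is a gap in what the reward communicates. The agent needs to know whether the goal was achieved and whether the terminal condition of each episode is assigned properly, but the reward must also reflect the real objective, which is escaping from the maze as quickly as possible. With +1 on escape and zero otherwise, a slow escape and a fast escape earn the same total reward, so the agent has not been told to hurry. Giving a reward of -1 for every time step that passes before escape, or discounting future rewards, would communicate that objective.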
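The effect of the reward design can be checked with a small calculation. The following Python sketch is illustrative only: the helper names `episode_rewards` and `discounted_return`, the episode lengths (10 and 1000 steps), and the discount factor 0.9 are assumptions chosen for the example, not part of the exercise. It compares the return of a fast and a slow escape under the original reward, under a -1 per-step reward, and under discounting.

```python
def episode_rewards(steps, step_reward, escape_reward):
    """Reward sequence for an episode that escapes on its final step."""
    return [step_reward] * (steps - 1) + [escape_reward]


def discounted_return(rewards, gamma=1.0):
    """Sum of rewards, each weighted by gamma**k (gamma=1.0 means undiscounted)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


FAST, SLOW = 10, 1000  # hypothetical episode lengths in time steps

# Original design: +1 on escape, 0 otherwise, no discounting.
sparse_fast = discounted_return(episode_rewards(FAST, 0, 1))
sparse_slow = discounted_return(episode_rewards(SLOW, 0, 1))

# Alternative 1: -1 for every step before escape, no discounting.
penalty_fast = discounted_return(episode_rewards(FAST, -1, 0))
penalty_slow = discounted_return(episode_rewards(SLOW, -1, 0))

# Alternative 2: keep +1 on escape but discount with gamma = 0.9 (assumed value).
disc_fast = discounted_return(episode_rewards(FAST, 0, 1), gamma=0.9)
disc_slow = discounted_return(episode_rewards(SLOW, 0, 1), gamma=0.9)

print("original:  ", sparse_fast, sparse_slow)    # identical returns
print("-1 per step:", penalty_fast, penalty_slow)  # fast escape scores higher
print("discounted: ", disc_fast, disc_slow)        # fast escape scores higher
```

Under the original design both episodes return 1.0, so the agent cannot tell them apart. Under either alternative the faster escape has the strictly larger return, which is what gives the learning algorithm something to improve on.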