ID: 3604141 • Letter: A

Question

Assume that we now need to solve a long-run average reward problem for the following matrices

i.e., there is no discount factor. Write a MATLAB program to perform relative value iteration. Show the MATLAB code and the output it produces when used to solve the MDP. Use the max norm for termination. Please show the final policy, how many iterations the algorithm took to converge, and the final value of the average reward. Use a tolerance of ε = 0.001. Note: MDP stands for Markov decision process.

12 9 0 0.3 0.7 0.2 0.8 12 4 0.6 0.4 0.1 0.9 7-13 6 20

Explanation / Answer

start
   string name
   num score
   num NUM_TESTS = 4
   num NUM_RANGES = 5
   num RANGES[NUM_RANGES] = 90, 80, 70, 60, 0
   string QUIT = "ZZZZZ"
   string GRADES[NUM_RANGES] = "A", "B", "C", "D", "F"
   num total
   num average
   num sub
   output "Enter student name or ", QUIT, " to quit "
   input name
   while name <> QUIT
      sub = 0
      while sub
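The question asks for relative value iteration, so a minimal sketch of that algorithm may help. It is written in Python rather than MATLAB, and the transition matrices `P` and reward matrices `R` are illustrative placeholders, not the matrices from the question. The function name `relative_value_iteration`, the reference state `ref_state`, and the iteration cap `max_iter` are likewise assumptions made for this example.

```python
def relative_value_iteration(P, R, eps=1e-3, ref_state=0, max_iter=10_000):
    """Relative value iteration for an average-reward MDP (sketch).

    P[a][i][j]: probability of moving from state i to j under action a.
    R[a][i][j]: immediate reward for that transition.
    Returns (policy, rho, J, iterations); policy holds 0-based action indices.
    """
    n_actions = len(P)
    n_states = len(P[0])
    J = [0.0] * n_states  # relative values, initialised to zero

    for k in range(1, max_iter + 1):
        # Q[i][a] = sum_j p(i,a,j) * (r(i,a,j) + J(j)); note: no discount factor
        Q = [
            [
                sum(P[a][i][j] * (R[a][i][j] + J[j]) for j in range(n_states))
                for a in range(n_actions)
            ]
            for i in range(n_states)
        ]
        best = [max(q) for q in Q]

        # Subtract the reference state's value so the iterates stay bounded;
        # the subtracted quantity is the running average-reward estimate.
        rho = best[ref_state]
        J_new = [b - rho for b in best]

        # Max-norm (infinity-norm) termination test
        diff = max(abs(x - y) for x, y in zip(J_new, J))
        J = J_new
        if diff < eps:
            policy = [q.index(max(q)) for q in Q]
            return policy, rho, J, k

    raise RuntimeError("relative value iteration did not converge")


# Placeholder two-state, two-action MDP (assumed data, NOT the question's).
P = [
    [[0.7, 0.3], [0.4, 0.6]],  # action 1
    [[0.9, 0.1], [0.2, 0.8]],  # action 2
]
R = [
    [[6.0, -5.0], [7.0, 12.0]],     # action 1
    [[10.0, 17.0], [-14.0, 13.0]],  # action 2
]

policy, rho, J, iterations = relative_value_iteration(P, R, eps=1e-3)
print("policy (1-based actions):", [a + 1 for a in policy])
print("iterations:", iterations)
print("average reward estimate:", round(rho, 4))
```

The loop stops once the max-norm change in the relative values drops below ε. The value subtracted at the reference state is what gets reported as the average reward. Porting this to MATLAB should mostly be a matter of replacing the nested lists with matrices and the termination test with `norm(J_new - J, Inf)`.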