Consider the following processors and compilers: P1: 5-stage pipelined MIPS proc
Question
Consider the following processors and compilers:
P1: 5-stage pipelined MIPS processor (stages: F, D, E, M, W) without data-forwarding, with Tclock = 1 ns.
P2: as P1, but with data-forwarding implemented.
P1R: as P1, but with a compiler that can rearrange instructions to reduce stalls.
P2R: as P2, but with a compiler that can rearrange instructions to reduce stalls.
The processors are used to execute the following fragment of MIPS code.
add $5, $0, $0
lw $4, 0($3)
add $5, $5, $4
lw $6, 4($3)
add $5, $5, $6
sw $5, 8($3)
a) Assuming that at execution time the content of register $3 is available, for P1, P2, P1R and P2R:
i. Show the timing diagram (instructions executed in each clock cycle) of the execution of the above fragment of MIPS code. For P1R and P2R you are free to rearrange the instructions to obtain fewer stalls.
ii. Compute the execution time of the fragment of MIPS code.
b) What is the speed-up of P2R over P1 and P2?
Explanation / Answer
Please tell us the binary content of your registers
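The register contents are not needed to count cycles, since only the dependencies between instructions matter. Below is a rough sketch in Python that counts cycles under some assumptions that the question does not state, so check them against your course's conventions: the register file is written in the first half of a cycle and read in the second half (so a dependent instruction can be in D while its producer is in W, giving 2 stalls without forwarding), forwarding leaves only a 1-cycle load-use stall, and the compiler only reorders instructions. The tuple encoding and the names `cycles`, `depends` and `best_reordering` are made up for this illustration.

```python
from itertools import permutations

# Each instruction: (opcode, destination register or None, source registers).
FRAGMENT = [
    ("add", 5, (0, 0)),
    ("lw", 4, (3,)),
    ("add", 5, (5, 4)),
    ("lw", 6, (3,)),
    ("add", 5, (5, 6)),
    ("sw", None, (5, 3)),
]

def cycles(program, forwarding, split_regfile=True):
    """Total cycles for a 5-stage F-D-E-M-W pipeline under the stated assumptions."""
    last_writer = {}  # register -> (decode cycle, opcode) of its latest producer
    decode = 1        # the first instruction is fetched in cycle 1, decoded in 2
    for op, dest, srcs in program:
        earliest = decode + 1  # in-order issue: at most one decode per cycle
        for reg in srcs:
            if reg in last_writer:
                d_prod, op_prod = last_writer[reg]
                if forwarding:
                    # ALU result usable by the next instruction; load needs one bubble.
                    gap = 2 if op_prod == "lw" else 1
                else:
                    # Consumer's D must coincide with (or follow) producer's W.
                    gap = 3 if split_regfile else 4
                earliest = max(earliest, d_prod + gap)
        decode = earliest
        if dest is not None:
            last_writer[dest] = (decode, op)
    return decode + 3  # E, M and W of the last instruction

def depends(a, b):
    """True if b (later in the original order) must stay after a."""
    raw = a[1] is not None and a[1] in b[2]
    war = b[1] is not None and b[1] in a[2]
    waw = a[1] is not None and a[1] == b[1]
    # Conservative assumption: never move a store across another memory access.
    mem = "sw" in (a[0], b[0]) and {a[0], b[0]} <= {"lw", "sw"}
    return raw or war or waw or mem

def best_reordering(program, forwarding):
    """Brute-force search over all dependency-preserving orders."""
    n = len(program)
    best = None
    for order in permutations(range(n)):
        pos = {idx: p for p, idx in enumerate(order)}
        legal = all(pos[i] < pos[j]
                    for i in range(n) for j in range(i + 1, n)
                    if depends(program[i], program[j]))
        if legal:
            candidate = [program[k] for k in order]
            t = cycles(candidate, forwarding)
            if best is None or t < best[0]:
                best = (t, candidate)
    return best

if __name__ == "__main__":
    p1 = cycles(FRAGMENT, forwarding=False)
    p2 = cycles(FRAGMENT, forwarding=True)
    p1r, _ = best_reordering(FRAGMENT, forwarding=False)
    p2r, order = best_reordering(FRAGMENT, forwarding=True)
    # Tclock = 1 ns, so the cycle count is also the execution time in ns.
    print("P1:", p1, "P2:", p2, "P1R:", p1r, "P2R:", p2r)
    print("speed-up P2R over P1:", p1 / p2r, "over P2:", p2 / p2r)
    print("one best order for P2R:", order)
```

Under these assumptions the sketch gives 16 cycles for P1, 12 for P2, 15 for P1R and 10 for P2R (so 16 ns, 12 ns, 15 ns and 10 ns at Tclock = 1 ns), and speed-ups for P2R of 16/10 = 1.6 over P1 and 12/10 = 1.2 over P2. If your course assumes the register file cannot be written and read in the same cycle, pass `split_regfile=False` to `cycles` and the no-forwarding numbers grow accordingly.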