Question
For this problem, use a simple 4-stage pipeline (FET, DEC, EXE, WB). Assume that the MULT instruction requires 3 cycles to execute and the memory operations require 2 cycles, but all other instructions can be completed in a single cycle. Assume perfect branch prediction. Show the scheduling for this loop, with and without forwarding. What is the CPI for instructions within the loop without forwarding? With forwarding?
L1: LD R1,0(R2)
MULT R4,R3,R1
ST R4,0(R2)
ADDI R2,R2,#4
CMP R2,R0
BLE L1
The second part of the problem is:
Schedule two executions of the loop from above for the same four-stage pipeline, but now support dual-issue, out-of-order execution.
Explanation / Answer
4-stage pipeline processing:
FET: Fetch = read (fetch) the instruction from memory
DEC: Decode = decode the fetched instruction to identify its opcode
and operands, and read the source registers
EXE: Execute the operation specified by the opcode
WB: Write Back = write the result of the operation to the
destination register that the instruction specifies
Hence the stages are Fetch, Decode, Execute, and Write Back
Hazard to take care of: RAW Hazard
Scheduling for the loop (the RAW dependences that cause stalls are noted as comments):
L1: LD R1,0(R2)
MULT R4,R3,R1 // This is equivalent to R4 = R3 * R1; needs R1 from LD
ST R4,0(R2) // needs R4 from MULT
ADDI R2,R2,#4
CMP R2,R0 // needs R2 from ADDI
BLE L1 // needs the condition set by CMP
Cycles Per Instruction (CPI) inside the loop with no forwarding:
CPI formula = Sum of (NI * Clock Cycles) / (Instruction Count)
where
NI = number of instructions of each type under consideration
Clock Cycles = clock cycles consumed by an instruction of that type
Instruction Count = total number of instructions
Without forwarding, the stall cycles caused by the RAW hazards must be added to the clock cycles of the stalled instructions.
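The formula above can be sketched in a few lines of Python. This is only an illustration: the function name weighted_cpi and the (count, cycles) pair layout are my own choices, and the mix counts execute cycles only, so any stall cycles would have to be added to the per-type cycle figures first.

```python
def weighted_cpi(mix):
    """Average CPI for a list of (instruction_count, cycles_per_instruction) pairs."""
    total_cycles = sum(count * cycles for count, cycles in mix)
    total_instructions = sum(count for count, _ in mix)
    return total_cycles / total_instructions


# Loop mix taken from the question: 1 MULT (3 cycles), 2 memory operations
# (2 cycles each), 3 single-cycle instructions (ADDI, CMP, BLE).
loop_mix = [(1, 3), (2, 2), (3, 1)]
print(round(weighted_cpi(loop_mix), 2))
```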
Cycles Per Instruction (CPI) inside the loop with forwarding:
The MULT instruction takes 3 cycles to execute.
Each memory operation takes 2 cycles to execute.
All other instructions take a single cycle.
Branch prediction is perfect.
Our loop has 1 MULT instruction: 1 * 3 = 3 cycles
2 memory operations (LD, ST): 3 + 2 * 2 = 7 cycles so far
3 other instructions (ADDI, CMP, BLE): 7 + 3 * 1 = 10 cycles in total
Assuming forwarding hides all RAW stalls, only these execute cycles count:
CPI = ( 1 * 3 + 2 * 2 + 3 * 1 ) / 6
= ( 3 + 4 + 3 ) / 6
= 10 / 6
= 1.67 (approximately)
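To compare the two cases, here is a rough timing model of the loop in Python. It is a sketch under assumptions that the question does not state: the pipeline is in-order and single-issue, EXE is a single non-pipelined unit, forwarding lets a consumer start EXE the cycle after its producer's last EXE cycle, and without forwarding the consumer's register read in DEC can overlap the producer's WB (a split-phase register file). The names LOOP, loop_cpi and the FLAGS pseudo-register are hypothetical.

```python
# Each entry: (name, destination registers, source registers, EXE cycles).
# "FLAGS" is a made-up pseudo-register that models the CMP -> BLE dependence.
LOOP = [
    ("LD",   ["R1"],    ["R2"],       2),
    ("MULT", ["R4"],    ["R3", "R1"], 3),
    ("ST",   [],        ["R4", "R2"], 2),
    ("ADDI", ["R2"],    ["R2"],       1),
    ("CMP",  ["FLAGS"], ["R2", "R0"], 1),
    ("BLE",  [],        ["FLAGS"],    1),
]


def loop_cpi(forwarding, iterations=20):
    """Steady-state CPI of LOOP on an in-order FET/DEC/EXE/WB pipeline."""
    # Assumption: with forwarding a result is usable 1 cycle after the
    # producer's last EXE cycle; without it, 2 cycles after (WB overlaps DEC).
    delay = 1 if forwarding else 2
    ready = {}         # register -> first cycle a consumer may enter EXE
    prev_exe_end = 0   # last EXE cycle of the previous instruction
    ld_starts = []     # EXE start cycle of LD in each iteration
    index = 0          # running instruction number across iterations
    for _ in range(iterations):
        for name, dests, srcs, latency in LOOP:
            earliest = index + 3  # FET in cycle index+1, DEC in cycle index+2
            operands = [ready.get(reg, 0) for reg in srcs]
            start = max([earliest, prev_exe_end + 1] + operands)
            end = start + latency - 1
            for reg in dests:
                ready[reg] = end + delay
            if name == "LD":
                ld_starts.append(start)
            prev_exe_end = end
            index += 1
    # Cycles between the last two LD issues = cycles per loop iteration.
    period = ld_starts[-1] - ld_starts[-2]
    return period / len(LOOP)


print("with forwarding:   ", round(loop_cpi(True), 2))
print("without forwarding:", round(loop_cpi(False), 2))
```

Under these assumptions the model gives 10 cycles per iteration with forwarding (CPI of about 1.67) and 14 cycles without forwarding (CPI of about 2.33). If DEC instead has to wait until the cycle after the producer's WB, the no-forwarding delay becomes 3 and the CPI rises accordingly.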