
For this problem, use a simple 4-stage pipeline (FET, DEC, EXE, WB). Assume that


Question

For this problem, use a simple 4-stage pipeline (FET, DEC, EXE, WB). Assume that the MULT instruction requires 3 cycles to execute and the memory operations require 2 cycles, but all other instructions can be completed in a single cycle. Assume perfect branch prediction. Show the scheduling for this loop, with and without forwarding. What is the CPI for instructions within the loop without forwarding? With forwarding?

L1: LD   R1,0(R2)
    MULT R4,R3,R1
    ST   R4,0(R2)
    ADDI R2,R2,#4
    CMP  R2,R0
    BLE  L1

The second part of the problem is:

Schedule two executions of the loop above on the same four-stage pipeline, but now with support for dual-issue, out-of-order execution.

Explanation / Answer

4-stage pipeline processing:

FET: Fetch = read the instruction from memory.

DEC: Decode = decode the instruction, identifying its opcode and its operands.

EXE: Execute = perform the operation specified by the opcode.

WB: Write Back = write the result of the operation to the destination specified by the instruction.

Hence the stages are Fetch, Decode, Execute, and Write Back.

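Hazard to take care of: RAW (read-after-write) hazard, which occurs when an instruction needs a value that an earlier instruction has not yet produced.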
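To make the RAW hazards concrete, here is a minimal sketch that lists the read-after-write dependencies inside one iteration of the loop from the question. The register lists are my reading of each instruction, FLAGS is a hypothetical name for the condition codes that CMP sets and BLE reads, and raw_dependencies is an illustrative helper, not part of any tool.

```python
# Each entry: (mnemonic, destination registers, source registers).
# FLAGS is an assumed name for the condition codes written by CMP.
LOOP = [
    ("LD",   ["R1"],    ["R2"]),
    ("MULT", ["R4"],    ["R3", "R1"]),
    ("ST",   [],        ["R4", "R2"]),
    ("ADDI", ["R2"],    ["R2"]),
    ("CMP",  ["FLAGS"], ["R2", "R0"]),
    ("BLE",  [],        ["FLAGS"]),
]

def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each RAW dependency."""
    last_writer = {}  # register -> index of the most recent instruction that wrote it
    deps = []
    for i, (_, dests, srcs) in enumerate(program):
        # Check reads first so an instruction never depends on its own write.
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        for reg in dests:
            last_writer[reg] = i
    return deps

for producer, consumer, reg in raw_dependencies(LOOP):
    print(f"{LOOP[producer][0]} -> {LOOP[consumer][0]} via {reg}")
```

This only looks within a single iteration. The dependency from ADDI in one iteration to LD and ST in the next (through R2) would show up if the list were repeated.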
Scheduling for the loop:
L1:      LD F1, R1
   LD F2, 0(R2)
   LD F4, -12 (R3)
   MULT RES1, R1
   MULT RES1, R3
   MULT RES1, R4
   ST RES2, R4
   ST RES2, 0(R2)
//   CMP R2, R0
   ST RES3, R2
   ST RES4, R0
   CMP RES3, RES4
   BLE L1


// This is equivalent to R4 = R3 * R1
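A schedule for the original loop (LD, MULT, ST, ADDI, CMP, BLE) can be checked with a small timing model. The sketch below is one possible model, not a definitive one, and it rests on several assumptions of mine: single issue, in-order execution, one instruction per stage, a non-pipelined EXE stage that an instruction occupies for its full latency (ST included), forwarding that delivers a result in the cycle after the producer's last EXE cycle, and, without forwarding, a register file that can be written in WB and read in DEC during the same cycle. FLAGS, schedule and loop_cpi are hypothetical names.

```python
# (mnemonic, destination registers, source registers, EXE latency in cycles)
LOOP = [
    ("LD",   ["R1"],    ["R2"],       2),
    ("MULT", ["R4"],    ["R3", "R1"], 3),
    ("ST",   [],        ["R4", "R2"], 2),
    ("ADDI", ["R2"],    ["R2"],       1),
    ("CMP",  ["FLAGS"], ["R2", "R0"], 1),
    ("BLE",  [],        ["FLAGS"],    1),
]

def schedule(program, forwarding):
    """Return, per instruction, the cycle in which it enters each stage."""
    ready = {}  # register -> earliest cycle a consumer may start EXE
    rows = []
    for name, dests, srcs, latency in program:
        if rows:
            prev = rows[-1]
            fet = prev["DEC"]                # FET frees up when prev enters DEC
            dec = max(fet + 1, prev["EXE"])  # DEC frees up when prev enters EXE
            exe = max(dec + 1, prev["WB"])   # EXE frees up when prev enters WB
        else:
            fet, dec, exe = 1, 2, 3
        for reg in srcs:                     # stall until every operand is available
            exe = max(exe, ready.get(reg, 0))
        wb = exe + latency                   # WB follows the last EXE cycle
        for reg in dests:
            # Forwarding: usable right after EXE. Otherwise: read in DEC during
            # the producer's WB cycle, so EXE can start one cycle later.
            ready[reg] = wb if forwarding else wb + 1
        rows.append({"name": name, "FET": fet, "DEC": dec, "EXE": exe, "WB": wb})
    return rows

def loop_cpi(forwarding):
    """Steady-state CPI: cycles between the ends of two back-to-back iterations."""
    n = len(LOOP)
    rows = schedule(LOOP * 2, forwarding)
    return (rows[2 * n - 1]["WB"] - rows[n - 1]["WB"]) / n

for fwd in (False, True):
    print("forwarding" if fwd else "no forwarding")
    for row in schedule(LOOP, fwd):
        print(f"  {row['name']:<5} FET={row['FET']:>2} DEC={row['DEC']:>2} "
              f"EXE={row['EXE']:>2} WB={row['WB']:>2}")
    print(f"  steady-state CPI = {loop_cpi(fwd):.2f}")
```

Changing any of the assumptions (for example, a pipelined EXE stage, or no same-cycle write and read of the register file) shifts the stall counts, so the printed numbers should be read as the result of this particular model.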

Cycles Per Instruction (CPI) inside the loop with no forwarding:
CPI = Sum over instruction types of (NI * Clock Cycles) / (Instruction Count)
where
NI = number of instructions of the type under consideration
Clock Cycles = clock cycles consumed by one instruction of that type
Instruction Count = total number of instructions
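The formula is a weighted average, which can be sketched in a few lines. The counts and cycle values below are taken from the problem statement (1 MULT at 3 cycles, 2 memory operations at 2 cycles, 3 other instructions at 1 cycle). The names weighted_cpi and loop_mix are illustrative, and the calculation counts execution cycles only, ignoring any stall cycles.

```python
def weighted_cpi(mix):
    """mix maps an instruction type to (count, cycles per instruction)."""
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    total_instructions = sum(count for count, _ in mix.values())
    return total_cycles / total_instructions

# Instruction mix of one loop iteration, per the problem statement.
loop_mix = {
    "MULT":   (1, 3),
    "memory": (2, 2),  # LD and ST
    "other":  (3, 1),  # ADDI, CMP, BLE
}

print(round(weighted_cpi(loop_mix), 2))
```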
  
Cycles Per Instruction (CPI) inside the loop with forwarding:

The MULT instruction takes 3 cycles to execute.
Each memory operation takes 2 cycles.
All other instructions take a single cycle.
Branch prediction is perfect.

Our program has:
1 MULT instruction: 1 * 3 = 3 cycles
2 memory operations (LD, ST): 2 * 2 = 4 cycles, so 3 + 4 = 7 cycles so far
3 other instructions (ADDI, CMP, BLE): 3 * 1 = 3 cycles, so 7 + 3 = 10 cycles in total

CPI = ( 1 * 3 + 2 * 2 + 3 * 1 ) / 6
= ( 3 + 4 + 3 ) / 6
= 10 / 6
= 1.67 (approximately)