CSE 401 Midterm May 15, 2017 1 pipeline (40pt) A particular (fictional) CPU has
ID: 3903575 • Letter: C
Question
CSE 401 Midterm May 15, 2017 1 pipeline (40pt) A particular (fictional) CPU has the following internal units and timings (WR and RR are write/read registers, ALU does all logic and integer nerations and there is a separate floating point unit. ID - instruction fetch and decode 150ps RR - read registers 40ps WR - write registers 60ps ALU - arithmetic and logic 180ps FPU - floating point 280ps MEM-memory access 200ps There are 6 basic instruction types: 1. LOAD: ID+RR+ALU+MEM+WR: 660ps 2. STORE: ID+RR+ALU+MEM: 600ps 3. MEMOP: ID+WR+RR+ALU+MEM: 660ps 4. LOGIC/INTEGER: ID+RR+ALU+WR: 460ps 5. FLOATING POINT: ID+RR+FPU+WR: 560ps 6. BRANCH: ID+RR+ALU: 400ps 1 cycle is 660ps for this machine, on the assumption that all instructions take 1 cycle (ignore memory delays). Outline what a pipeline would look like: Give • number of stages 5 Stages • operations in each stage • new clock rate • speedup compared to original 660ps instruction rate with no pipeline. Assume you double the number of registers, but increased address complexity would add 10ps to RR and 15ps to WR. What (if any) changes would you malee to your pipelined machine (e.g. clock or stages or both)?Explanation / Answer
Each instruction has five phases.
Operations in each phase of instruction
Instruction fetch (IF)
Instruction decode (ID)
Execution (EX)
Memory access (MEM)
Writeback(WB)
Thus, each instruction requires five cycles to execute (CPI )= 5
Since there are five separate stages, a pipeline with one instruction for each stage can be achieved.
Consider the following classical example of pipelininig:
Instruction
Cycle 1
Cycle2
Cycle 3
Cycle 4
Cycle 5
Cycle 6
Cycle 7
Cycle 8
Cycle 9
I1
IF
ID
EX
MEM
WB
I2
IF
ID
EX
MEM
WB
I3
IF
ID
EX
MEM
WB
I 4
IF
ID
EX
MEM
WB
I 5
IF
ID
EX
MEM
WB
Instruction1
completes
Instruction2
completes
Instruction3
completes
Instruction4
completes
Instruction5
Completes
Without pipelining each instruction will be executed in a sequential order in which all the instruction phases will take 5 cycles each making a total of 25 cycles to complete 5 instructions(i1,i1,….i5)
By applying pipeling the execution of instruction will be done in a parallel therefore completing one instruction per cycle
Pipeline speedup:
In the given problem,
Mnemonic
Instruction stage
Time in pico seconds
ID
INSTRUCTION FETCH AND DECODE
180
RR
READ REGISTERS
40
WR
WRITE REGISTERS
60
ALU
ARITHMETIC AND LOGIC
180
FPU
FLOATING POINT
280
MEM
MEMORY ACCESS
200
No. of pipeline stages: 5
Each pipeline stage:660ps
(1)
LOAD : ID+RR+ALU+MEM÷WR (660ps)
Speedup=660/5=132ps
(2)
STORE : ID+RR+ALU+MEM (660ps)
Speedup=660/5=132ps
(3)
MEMOP: ID+WR+RR+ALU+MEM (660ps)
Speedup=660/5=132ps
(4)
LOGIC/INTEGER. ID+RR+ALU+WR (460ps)
Speedup=460/5=92ps
(5)
FLOATING POINT: ID+RR+FPU+WR (560ps)
Speedup=560/5=112ps
(6)
BRANCH: ID+RR+ALU (400PS)
Speedup=400/5=80ps
INSTRUCTION NAME
/CYCLE NO.;
1
2
3
4
5
6
7
8
LOAD
ID
RR
ALU
MEM
WR
STORE
ID
RR
ALU
MEM
MEMOP
ID
WR
RR
ALU
MEM
LOGIC/INT
ID
RR
ALU
WR
FLOATING PT
ID
RR
FPU
WR
BRANCH
ID
RR
ALU
Theoretical speedup of a pipeline
S=nT/(k+n?1)t
where:
n is the no of tasks=(6)
T is the time to process a task=(660ps)
k is the number of stages =(5)
t is the time per stage =(280 considering maximum value of FPUoperation)
S=6(660)/(5+6-1)(280) =1.414286
So the speed up is 1.4 times faster in pipelined execution
Average no of cycles taken by an instruction in the given instruction set=
((660)*5 + (660)*4 + (660)*5+ (460)*4 + (560)*4+ (400)*3 = 14520)/(6*(660)) =3.67 clock cycles
No. Of cycles taken per instruction on average = 3.67
Assuming each cycle takes 660ps and ignoring memory delays
Assuming no. Of registers are doubled,
Address complexity of
RR=40+10=50ps
WR=60+15=75ps
By adding more number of stages we might reduce or maintain the same throughput of the machine
as the latency is inversely proportional to number of stages in pipeline ; increase in no of stages will reduce the average clock cycles needed for an instruction
Instruction
Cycle 1
Cycle2
Cycle 3
Cycle 4
Cycle 5
Cycle 6
Cycle 7
Cycle 8
Cycle 9
I1
IF
ID
EX
MEM
WB
I2
IF
ID
EX
MEM
WB
I3
IF
ID
EX
MEM
WB
I 4
IF
ID
EX
MEM
WB
I 5
IF
ID
EX
MEM
WB
Instruction1
completes
Instruction2
completes
Instruction3
completes
Instruction4
completes
Instruction5
Completes