Question

Consider the following vector code run on a 500 MHz version of VMIPS for a fixed vector length of 64. The load/store unit has a latency of 12 cycles, the add unit 6 cycles, the multiply unit 7 cycles, and the divide unit 20 cycles.

LV V1, Ra
MULV.D V2, V1, V3
ADDV.D V4, V1, V3
SV Rb, V2
SV Rc, V4

a) Assuming no chaining and a single memory pipeline, determine how many clock periods it would take to run the above VMIPS vector code.

b) If the vector sequence is chained, how many clock cycles would it take to run the above vector code?

c) Suppose VMIPS had three memory pipelines and chaining. If there were no bank conflicts in the accesses for the above loop, how many clock cycles are required to run this sequence?

Explanation / Answer

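Using the latencies given in the problem statement, each vector instruction is charged its functional unit's start-up latency plus 64 cycles (one per element of the vector):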

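LV V1, Ra ; 12 + 64 = 76
MULV.D V2, V1, V3 ; 7 + 64 = 71
ADDV.D V4, V1, V3 ; 6 + 64 = 70
SV Rb, V2 ; 12 + 64 = 76
SV Rc, V4 ; 12 + 64 = 76

With no chaining, and with each instruction finishing before the next one starts, the approximate time is 76 + 71 + 70 + 76 + 76 = 369 clock periods.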
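
The per-instruction estimate above (start-up latency plus one cycle per element) can be checked with a short script. This is only a minimal sketch: it assumes every instruction runs to completion before the next one starts, and the names `LATENCY`, `PROGRAM`, `instruction_cycles`, and `serial_cycles` are illustrative, not part of VMIPS or any tool.

```python
# Assumed timing model: cycles = functional-unit start-up latency + vector length,
# with no overlap between instructions (fully serialized execution).
VECTOR_LENGTH = 64
CLOCK_MHZ = 500

# Latencies in cycles, taken from the problem statement.
LATENCY = {"load_store": 12, "add": 6, "multiply": 7, "divide": 20}

# The vector sequence from the question, paired with the unit each instruction uses.
PROGRAM = [
    ("LV V1, Ra", "load_store"),
    ("MULV.D V2, V1, V3", "multiply"),
    ("ADDV.D V4, V1, V3", "add"),
    ("SV Rb, V2", "load_store"),
    ("SV Rc, V4", "load_store"),
]


def instruction_cycles(unit, vector_length=VECTOR_LENGTH):
    """Cycles for one vector instruction under the assumed model."""
    return LATENCY[unit] + vector_length


def serial_cycles(program, vector_length=VECTOR_LENGTH):
    """Total cycles when the instructions execute strictly one after another."""
    return sum(instruction_cycles(unit, vector_length) for _, unit in program)


if __name__ == "__main__":
    for text, unit in PROGRAM:
        print(f"{text:<20} {instruction_cycles(unit)}")
    total = serial_cycles(PROGRAM)
    # At 500 MHz one cycle lasts 1000 / 500 = 2 ns.
    print(f"total cycles: {total}, time: {total * 1000 / CLOCK_MHZ} ns")
```

Under these assumptions the serialized total is 369 cycles. Any model that lets independent instructions overlap (for example, MULV.D and ADDV.D running at the same time on separate units) would give a smaller figure.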

The hardware is changed such that the vector FP multiplier unit is chained to the scalar FP add unit (which stores its result in a
scalar FP register). The chaining requires a new bus, a Vector-Scalar (VS) bus, from the vector multiplier to the scalar
FP add functional unit. In the context of speculative superscalar MIPS execution, this means a reservation station
entry for the DPV in the integer section. That reservation station waits for the 64 results coming on the VS bus from
the vector unit. The VS bus runs in parallel with the integer and FP CDB bus(es).

If full chaining and multiple memory pipes are employed, the execution time would be 317 clock periods.