


Question

Suppose we have an application running on a 32-processor multiprocessor, which has a 200 ns time to handle a reference to a remote memory. For this application, assume that all the references except those involving communication hit in the local memory hierarchy, which is slightly optimistic. Processors are stalled on a remote request, and the processor clock rate is 3.3 GHz. If the base CPI (assuming that all references hit in the cache) is 0.5, how much faster is the multiprocessor if there is no communication versus if 0.2% of the instructions involve a remote communication reference?

Explanation / Answer

It is simplest to first calculate the clock cycles per instruction (CPI). The effective CPI for the multiprocessor with 0.2% remote references is

CPI = Base CPI + Remote request rate × Remote request cost

    = 0.5 + 0.2% × Remote request cost

The remote request cost, in cycles, is the remote access time divided by the cycle time. At 3.3 GHz the cycle time is 1/(3.3 GHz) ≈ 0.303 ns, so

Remote request cost = Remote access time / Cycle time = 200 ns / 0.303 ns ≈ 660 cycles

Therefore

CPI = 0.5 + 0.002 × 660 = 0.5 + 1.32 = 1.82

The multiprocessor with all local references is 1.82/0.5 ≈ 3.6 times faster. In practice, the performance analysis is much more complex, since some fraction of the noncommunication references will miss in the local hierarchy and the remote access time does not have a single constant value. For example, the cost of a remote reference could be quite a bit worse, since contention caused by many references trying to use the global interconnect can lead to increased delays.
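The arithmetic can be checked with a short script. This is only a sketch: the function names `remote_cost_cycles` and `effective_cpi` are made up for illustration, and it assumes the simple model from the question (every remote reference stalls the processor for the full 200 ns, with no contention). It uses the unrounded cycle count, 200 ns × 3.3 cycles/ns = 660 cycles.

```python
def remote_cost_cycles(remote_latency_ns: float, clock_ghz: float) -> float:
    """Stall cycles for one remote reference.

    A clock of N GHz completes N cycles per nanosecond, so
    latency (ns) * clock (GHz) gives the latency in cycles.
    """
    return remote_latency_ns * clock_ghz


def effective_cpi(base_cpi: float, remote_fraction: float,
                  remote_latency_ns: float, clock_ghz: float) -> float:
    """CPI = base CPI + remote request rate * remote request cost."""
    return base_cpi + remote_fraction * remote_cost_cycles(remote_latency_ns, clock_ghz)


if __name__ == "__main__":
    base = 0.5          # base CPI from the question
    fraction = 0.002    # 0.2% of instructions make a remote reference
    latency_ns = 200.0  # remote memory access time
    clock_ghz = 3.3     # processor clock rate

    cost = remote_cost_cycles(latency_ns, clock_ghz)
    cpi = effective_cpi(base, fraction, latency_ns, clock_ghz)

    print(f"remote request cost = {cost:.0f} cycles")           # 660 cycles
    print(f"effective CPI       = {cpi:.2f}")                   # 1.82
    print(f"all-local speedup   = {cpi / base:.2f}x")           # 3.64x
```

Raising `latency_ns` above 200 in the same script gives a rough feel for how interconnect contention would widen the gap, although the real remote access time is not a single constant.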