


Question

Please answer question (e).

a) Examine the code given below to compute the average of an array:

    total = 0;
    for (j = 0; j < k; j++) {
        sub_total = 0; /* Nested loops to avoid overflow */
        for (i = 0; i < N; i++)
            sub_total += A[j*N + i];
        total += sub_total/N;
    }
    average = total/k;

When designing a cache to run this application, given a constant cache capacity and associativity, will you want a larger or smaller block size? Why?

b) Examine the MODIFIED code given below:

    total = 0;
    for (i = 0; i < N; i++) {
        sub_total = 0; /* Nested loops to avoid overflow */
        for (j = 0; j < k; j++)
            sub_total += A[j*N + i];
        total += sub_total/k;
    }
    average = total/N;

Generally, how will the size of the array and the cache capacity impact the choice of block size for good performance? Why?

c) Translate the following line of code into MIPS. Assume i is in $s0, j is in $s1, the base address of A is in $a0, N is in $a1, and sub_total is in $s2.

    sub_total += A[j*N + i];

d) Now consider that we are executing one of these programs for the very first time. Assume we have a memory system with a TLB, an L1 I-cache, an L1 D-cache, an L2 cache, and a virtual memory system with a 2-level page table. List all the steps that will or may happen as we load instructions or data from memory. You will also need to list the steps taken when the target instructions or data are not in the caches or page tables, i.e. the steps to handle misses.

Load instructions:

1. Read instruction: look up the instruction's virtual page number in the TLB to check whether its page is in memory.

2. If the virtual page number is not present in the TLB, this is a TLB miss. Check the first-level page table to see if the virtual page number is present.

Load data:

1.

e) List and number all the ADDITIONAL steps that will happen when we are executing your translated code for sub_total += A[j*N + i];

f) Suppose the whole program fits into the L1 caches and we are executing it many times. Which steps from (d) will be skipped? (Just write down the #s.)

Explanation / Answer

Answers:

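a- A larger block size. The program never reuses an address and accesses consecutive addresses, so there is no temporal locality to exploit, only spatial locality. Every miss is a compulsory miss, and a larger block brings in more of the upcoming data per miss.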
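The effect in (a) can be put into numbers with a minimal sketch. It assumes each word of the array is read exactly once, in order, and that the array starts on a block boundary. The function name sequential_misses is made up for illustration and is not part of the assignment.

```c
#include <assert.h>

/* Hypothetical helper: misses for n_words consecutive words, each read
 * exactly once, assuming the array starts on a block boundary.
 * Every miss fetches one block of block_words words, so the count is
 * ceil(n_words / block_words). */
static unsigned long sequential_misses(unsigned long n_words,
                                       unsigned long block_words)
{
    assert(block_words > 0);
    return (n_words + block_words - 1) / block_words;
}
```

For a 1024-word array this gives 1024 misses with 1-word blocks, 256 with 4-word blocks, and 64 with 16-word blocks.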

b- The inner loop now walks down a “column” of A with a stride of N words, so spatial locality only pays off from one iteration of the outer loop to the next. We want at least one iteration of the outer loop to fit in the cache with a block size of at least 2 words. For this, at least two “columns” of A must fit in the cache at a time, so that you get some hits from spatial locality. Without this, the program will suffer 100% misses. If the block size is two words and an entire column can fit, one iteration of the outer loop results in all misses, but the next iteration will be all hits.

As the size of the array gets smaller relative to the cache capacity, the “columns” are “shorter,” so you will want a larger line size to reduce compulsory misses.

As the size of the array gets larger relative to the cache capacity, the “columns” get “taller.” You will need to make the line size smaller to get more lines in the cache and reduce conflict misses, so that it can still hold an entire iteration of the outer loop.
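The trade-off in (b) can be checked with a small miss counter. This is only a sketch under stated assumptions: the cache is direct-mapped (the question fixes the associativity but does not say what it is), sizes are in words, and A starts at word address 0. The name column_order_misses is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: count the misses of the column-order loop from (b)
 * on a direct-mapped cache (an assumption). Sizes are in words and A is
 * assumed to start at word address 0.
 * Returns -1 if the tag array cannot be allocated. */
static long column_order_misses(long k, long N, long cache_words,
                                long block_words)
{
    assert(k > 0 && N > 0 && block_words > 0 && cache_words >= block_words);
    long n_lines = cache_words / block_words;
    long *tags = malloc((size_t)n_lines * sizeof *tags);
    if (tags == NULL)
        return -1;
    for (long l = 0; l < n_lines; l++)
        tags[l] = -1;                      /* -1 marks an empty line */

    long misses = 0;
    for (long i = 0; i < N; i++) {         /* outer loop: one "column" */
        for (long j = 0; j < k; j++) {     /* inner loop: stride of N words */
            long block = (j * N + i) / block_words;
            long line = block % n_lines;   /* direct-mapped placement */
            if (tags[line] != block) {     /* miss: replace the line */
                tags[line] = block;
                misses++;
            }
        }
    }
    free(tags);
    return misses;
}
```

With k = N = 4 and a 16-word cache, this counts 8 misses for 2-word blocks (one outer iteration of misses, then one of hits) and 4 misses for 4-word blocks. Shrinking the cache to 8 words with 4-word blocks leaves only 2 lines, and all 16 accesses miss.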


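e- The steps are (main, outer loop, inner loop, end):

1. Set the starting index of j.
2. Fetch the loop bound k (and k as a byte offset).
3. Set sub_total = 0.
4. Set the starting index of i.
5. Fetch the loop bound N (and N as a byte offset).
6. Fetch sub_total.
7. Compute j*N + i.
8. Turn j*N + i into a byte offset.
9. Compute &A[j*N + i].
10. Fetch A[j*N + i].
11. Compute sub_total + A[j*N + i].
12. sub_total = ...
13. Increment loop index i.
14. Test whether i < N.
15. Increment loop index j.
16. Test whether j < k.
17. End.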
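The per-element steps in the list above (compute j*N + i, turn it into a byte offset, form the address, fetch, add) can be sketched in C with the arithmetic written out one step at a time. This is a sketch, not the answer's original code: it assumes A holds 32-bit words, the helper name add_element is made up, and the MIPS instructions in the comments are one possible translation using the registers from (c), with $t0 and $t1 as assumed temporaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring sub_total += A[j*N + i] step by step,
 * assuming 4-byte array elements. Each comment shows one possible MIPS
 * instruction for that step ($t0, $t1 are assumed temporaries). */
static int32_t add_element(const int32_t *A, int32_t i, int32_t j,
                           int32_t N, int32_t sub_total)
{
    assert(A != NULL && N > 0 && i >= 0 && j >= 0);
    int32_t index = j * N;                   /* mul $t0, $s1, $a1 */
    index = index + i;                       /* add $t0, $t0, $s0 */
    size_t offset = (size_t)index << 2;      /* sll $t0, $t0, 2   */
    const int32_t *addr =
        (const int32_t *)((const char *)A + offset); /* add $t0, $t0, $a0 */
    int32_t value = *addr;                   /* lw  $t1, 0($t0)   */
    return sub_total + value;                /* add $s2, $s2, $t1 */
}
```

The single lw is the only data memory access per element, so it is the step that goes through the TLB, D-cache, and page-table path from (d). On assemblers without a three-operand mul, mult followed by mflo would do the same job.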