Problem 424-4: Chapter 5 presents a series of performance measurements showing t

Question

Problem 424-4:
Chapter 5 presents a series of performance measurements showing the performance benefits of a sequence of optimizations to the original combine1() function. For this problem, you will repeat that sequence of optimizations starting with this function:

Start with this tar file (http://ece324web.groups.et.byu.net/hw/code/dotprod.tar), which includes dotproduct1() and a version of the getcpe timing code used in the previous homework set. Compiling and running the initial getcpe program should give you the CPE for the original code. Consistent with the treatment in the text, you should create a different version of the function for each of the six required optimizations listed below. Follow the naming conventions of the text -- dotproduct5() should be the version with 2x loop unrolling. For this assignment, you need only consider the case where data_t is a double.

2. Move the call to vec_length out of the loop

3. Directly access the vector data

4. Accumulate results in a local variable

5. Unroll the loop by 2

6. Unroll the loop by 2 with 2-way parallelism

7. Unroll the loop by 2 and reassociate
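
As a rough illustration of what steps 4 through 7 look like for a dot product, here is a minimal sketch. It is not the code from the tar file: the vec_rec struct, the get_vec_start() helper, and the function signatures are assumptions modeled on the textbook's vector helpers, and the real dotproduct1() may differ.

```c
#include <assert.h>

typedef double data_t;

/* Hypothetical stand-in for the assignment's vector type; the real
 * definition lives in the tar file and may differ. */
typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

static long vec_length(vec_ptr v) { return v->len; }
static data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Steps 2-4: length hoisted, direct data access, local accumulator. */
void dotproduct4(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i, length = vec_length(u);
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = 0;

    assert(length == vec_length(v));  /* checked once, outside the loop */
    for (i = 0; i < length; i++)
        sum = sum + udata[i] * vdata[i];
    *dest = sum;
}

/* Step 5: unroll by 2; still a single serial chain of additions. */
void dotproduct5(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i, length = vec_length(u);
    long limit = length - 1;
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = 0;

    assert(length == vec_length(v));
    for (i = 0; i < limit; i += 2)
        sum = (sum + udata[i] * vdata[i]) + udata[i + 1] * vdata[i + 1];
    for (; i < length; i++)           /* leftover element if length is odd */
        sum = sum + udata[i] * vdata[i];
    *dest = sum;
}

/* Step 6: unroll by 2 with two independent accumulators. */
void dotproduct6(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i, length = vec_length(u);
    long limit = length - 1;
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum0 = 0, sum1 = 0;

    assert(length == vec_length(v));
    for (i = 0; i < limit; i += 2) {
        sum0 = sum0 + udata[i] * vdata[i];          /* even indices */
        sum1 = sum1 + udata[i + 1] * vdata[i + 1];  /* odd indices  */
    }
    for (; i < length; i++)
        sum0 = sum0 + udata[i] * vdata[i];
    *dest = sum0 + sum1;
}

/* Step 7: unroll by 2 and reassociate, so the two products are added
 * to each other before touching the accumulator. */
void dotproduct7(vec_ptr u, vec_ptr v, data_t *dest)
{
    long i, length = vec_length(u);
    long limit = length - 1;
    data_t *udata = get_vec_start(u);
    data_t *vdata = get_vec_start(v);
    data_t sum = 0;

    assert(length == vec_length(v));
    for (i = 0; i < limit; i += 2)
        sum = sum + (udata[i] * vdata[i] + udata[i + 1] * vdata[i + 1]);
    for (; i < length; i++)
        sum = sum + udata[i] * vdata[i];
    *dest = sum;
}
```

Versions 6 and 7 add the products in a different order than versions 4 and 5. Floating-point addition is not associative, so for general double data the results can differ in the last bits even though the loops are mathematically equivalent.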

Your submission should include the C source code for all 6 new versions of the dotproduct function along with the reported CPE of each. (No other source code needs to be submitted.)
After you obtain all 7 required CPE measurements, you should write a paragraph comparing your results with those in the book and, where possible, explaining the differences. Finally, state what can be inferred from your results about the functional units in the processor of the system you used. (Can you determine both latency and throughput bounds, for example?)
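
One way to frame the last question, offered as a simplified model rather than a measured result: in a dot product with a single accumulator, only the floating-point addition sits on the loop-carried dependency chain, because each product is independent of earlier iterations. The symbols below are introduced here for illustration and are not from the assignment. The model assumes the compiler keeps the source-level operation order and neither vectorizes the loop nor fuses the multiply and add into one instruction.

```latex
% L_add : latency of a floating-point add, in cycles (illustrative symbol)
% T     : throughput bound, in cycles per element (illustrative symbol)
\begin{align*}
\mathrm{CPE}_{4},\ \mathrm{CPE}_{5} &\approx \max\left(L_{\mathrm{add}},\ T\right) \\
\mathrm{CPE}_{6},\ \mathrm{CPE}_{7} &\approx \max\left(\tfrac{L_{\mathrm{add}}}{2},\ T\right)
\end{align*}
```

Under this model, a CPE for versions 6 and 7 close to half that of version 5 would suggest the earlier versions were latency-bound and would give an estimate of the add latency. The throughput bound stays hidden unless the CPE stops improving, which may require more parallelism than the two-way versions requested here.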

Explanation / Answer

PROGRAM2:

And the results: