
Question

Part 2: Multi-threaded/Parallel Programming using OpenMP

The purpose of this exercise is to gain experience using OpenMP directives to turn sequential program code into concurrent regions that can execute in parallel on modern multicore systems, and to practice with the various synchronization primitives the API provides.

The programs listed in each task are to be compiled and run in your CentOS VM, and the collected output is to be included in your report and spreadsheet along with your observations.

Task 2.1 - Compile a “hello world” program

Newer versions of the gcc compiler can compile OpenMP programs. Create a subdirectory in your workspace called OpenMP. In this directory, create a text file named helloworld.c and place the following code in it:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[]) {
    int nthreads, tid;

    /* Fork a team of threads, giving each its own copies of the variables */
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();       /* obtain thread number */
        printf("Hello World from thread = %d\n", tid);

        if (tid == 0) {                   /* only master thread does this */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and disband */

    return 0;
}

This program uses the basic parallel construct to define a single parallel region for multiple threads. It also uses a private clause to declare variables local to each thread. Compile the program with the following command:

         gcc -fopenmp -o helloworld helloworld.c

Execute the compiled program, then observe and document the output in your report.

Set the number of threads to 4 by setting the OMP_NUM_THREADS environment variable with the command:

export OMP_NUM_THREADS=4

Verify the value of the environment variable you just set with:

echo $OMP_NUM_THREADS
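As an aside (not required by the task), the thread count can also be requested from inside the program rather than through the environment, either with the runtime routine omp_set_num_threads() called before the parallel region or with a num_threads clause on the directive. A minimal sketch:

#include <omp.h>
#include <stdio.h>

int main (void) {
    omp_set_num_threads(4);      /* runtime routine: request 4 threads...    */
    #pragma omp parallel         /* ...or put num_threads(4) on the directive */
    {
        printf("thread %d\n", omp_get_thread_num());
    }
    return 0;
}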

Re-execute the program, then observe and document the output in your report.
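The exact ordering of the lines varies from run to run, since the threads execute independently; with OMP_NUM_THREADS=4, a run might look something like:

    Hello World from thread = 1
    Hello World from thread = 0
    Number of threads = 4
    Hello World from thread = 3
    Hello World from thread = 2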

Task 2.2 - Work Sharing - Weight: 4%

This task explores the use of the for work-sharing construct. The program below adds two vectors together, using a work-sharing approach to assign the work to threads:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNKSIZE 10
#define N         100

int main (int argc, char *argv[]) {
    int nthreads, tid, i, chunk;
    float a[N], b[N], c[N];

    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;        /* initialize arrays */
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
        printf("Thread %d starting...\n", tid);

        #pragma omp for schedule(dynamic,chunk)
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
            printf("Thread %d: c[%d]= %f\n", tid, i, c[i]);
        }
    } /* end of parallel section */

    return 0;
}

This program has an overall parallel region within which there is a work-sharing for construct. Compile and execute the program. Depending on how the work is scheduled, different threads may add different elements of the vector; it may even be that one thread does all the work. Execute the program several times to see different thread schedules. When multiple threads are used, observe how their output may interleave.

Alter the code from dynamic scheduling to static scheduling and repeat. What are your conclusions?
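Only the schedule kind in the work-sharing directive needs to change; with static scheduling, iterations are divided into chunks of the given size and handed out to threads in round-robin order when the loop is entered, rather than on demand:

#pragma omp for schedule(static,chunk)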

Time of execution

Measure the execution time by instrumenting the code with the OpenMP routine omp_get_wtime() at the beginning and end of the program, and taking the difference between the two times.
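A minimal sketch of the instrumentation pattern (omp_get_wtime() returns wall-clock time in seconds as a double; the variable names here are illustrative):

double start = omp_get_wtime();              /* wall-clock time before the work */
/* ... program body ... */
double elapsed = omp_get_wtime() - start;    /* seconds elapsed */
printf("Execution time: %f seconds\n", elapsed);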


Task 2.3 - Work Sharing with sections construct - Weight: 4%

This task explores the use of the sections construct. The program below adds the elements of two vectors to form a third vector, and also multiplies the elements to produce a fourth:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 50

int main (int argc, char *argv[]) {
    int i, nthreads, tid;
    float a[N], b[N], c[N], d[N];

    /* Some initializations */
    for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
        c[i] = d[i] = 0.0;
    }

    #pragma omp parallel shared(a,b,c,d,nthreads) private(i,tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
        printf("Thread %d starting...\n", tid);

        #pragma omp sections nowait
        {
            #pragma omp section
            {
                printf("Thread %d doing section 1\n", tid);
                for (i = 0; i < N; i++) {
                    c[i] = a[i] + b[i];
                    printf("Thread %d: c[%d]= %f\n", tid, i, c[i]);
                }
            }

            #pragma omp section
            {
                printf("Thread %d doing section 2\n", tid);
                for (i = 0; i < N; i++) {
                    d[i] = a[i] * b[i];
                    printf("Thread %d: d[%d]= %f\n", tid, i, d[i]);
                }
            }
        } /* end of sections */

        printf("Thread %d done.\n", tid);
    } /* end of parallel section */

    return 0;
}

This program again has a parallel region, with variables declared as shared among the threads as well as private variables. There is also a sections work-sharing construct; within it, each individual section block is executed once, by one member of the team of threads. The nowait clause removes the implied barrier at the end of the sections construct, so a thread proceeds to the final printf without waiting for the other section to finish.

Compile and execute the program and draw conclusions about its execution.
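As an aside, when a parallel region contains nothing but a sections construct, OpenMP allows the two directives to be combined. A minimal sketch of that form, with the initialization and printing from the program above omitted:

#pragma omp parallel sections shared(a,b,c,d) private(i)
{
    #pragma omp section
    for (i = 0; i < N; i++) c[i] = a[i] + b[i];   /* vector sum */

    #pragma omp section
    for (i = 0; i < N; i++) d[i] = a[i] * b[i];   /* element-wise product */
}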

Explanation / Answer

Below is the Task 2.2 vector-addition program instrumented with omp_get_wtime(), as the task describes; the elapsed wall-clock time is stored in dtime and printed at the end:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNKSIZE 10
#define N         100

int main (int argc, char *argv[]) {
    int nthreads, tid, i, chunk;
    float a[N], b[N], c[N];
    double dtime;                     /* stores the execution time */

    dtime = omp_get_wtime();          /* wall-clock time at the start */

    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;        /* initialize arrays */
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
    {
        tid = omp_get_thread_num();
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
        printf("Thread %d starting...\n", tid);

        #pragma omp for schedule(dynamic,chunk)
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
            printf("Thread %d: c[%d]= %f\n", tid, i, c[i]);
        }
    } /* end of parallel section */

    dtime = omp_get_wtime() - dtime;  /* elapsed time in seconds */
    printf("Execution time: %f seconds\n", dtime);

    return 0;
}
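Save the program as, say, vectoradd_timed.c (the file name is just an assumption), then compile and run it the same way as before:

    gcc -fopenmp -o vectoradd_timed vectoradd_timed.c
    ./vectoradd_timed

The reported time will vary with the thread count and machine load; re-running with different OMP_NUM_THREADS values gives the data points for your report and spreadsheet.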