HPC Lab
With OpenMP, forked threads have access to shared memory. For more
details, visit the OpenMP home page.
Exercise 1 in OpenMP:
The purpose of this code is to introduce the basic concepts of parallel
programming using OpenMP. It shows how to create a parallel region, how to
obtain the thread ID and the total number of threads, and how to execute
parallel tasks. This "Hello World" example serves as a starting point for
understanding parallelism with OpenMP.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
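Only the headers of the listing survive in these notes. A minimal sketch of the body, reconstructed from the description above and the sample output below (it follows the classic OpenMP hello-world example, so the variable names `tid` and `nthreads` are assumptions), is:
```
int main(int argc, char *argv[])
{
    int nthreads, tid;

    /* Fork a team of threads; each thread gets private copies of tid and nthreads */
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();               /* this thread's ID */
        printf("Hello World from thread = %d\n", tid);

        if (tid == 0)                             /* only the master thread */
        {
            nthreads = omp_get_num_threads();     /* total threads in the team */
            printf("Number of threads = %d\n", nthreads);
        }
    } /* all threads join the master thread here */
    return 0;
}
```
With GCC this can be compiled with `gcc -fopenmp omp_hello.c -o omp_hello`, and the number of threads can be controlled through the `OMP_NUM_THREADS` environment variable.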
Running the program with four threads produces output such as:
```
Hello World from thread = 0
Hello World from thread = 1
Hello World from thread = 2
Hello World from thread = 3
Number of threads = 4
```
In this output:
- Each line corresponds to a thread printing "Hello World" along with its thread number
(`tid`).
- The master thread (thread with ID 0) additionally prints the total number of threads
(`nthreads`).
- The order in which threads print their "Hello World" messages may vary from run to run
because the threads in the parallel region execute concurrently. The master's "Number of
threads" line is printed inside the parallel region, right after its own greeting, so it may
appear before or after the other threads' messages.
If you run
int main()
{
    #pragma omp parallel for schedule(static,1)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
and
int main()
{
    #pragma omp parallel for
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
Result 1 (with `schedule(static,1)`): the iterations are handed out to the threads round-robin, one at a time, so thread t executes iterations t, t + nthreads, t + 2*nthreads, and so on.
Result 2 (no schedule clause): with the usual default, each thread receives one roughly equal, contiguous block of iterations.
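For example, assuming 4 threads, `schedule(static,1)` fixes the assignment of iterations to threads as follows (only the interleaving of the printed lines varies between runs):
```
thread 0: i = 0, 4, 8, 12, 16
thread 1: i = 1, 5, 9, 13, 17
thread 2: i = 2, 6, 10, 14, 18
thread 3: i = 3, 7, 11, 15, 19
```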
Static
#pragma omp parallel for schedule(static,chunk-size)
If you do not specify the chunk-size, OpenMP divides the iterations into chunks of
approximately equal size and assigns one contiguous chunk to each thread, in order (this
fixed, up-front assignment is what distinguishes the static schedule from the others). In the
for loop we discussed before, with 12 threads each thread handles 1-2 iterations; if you use
only 4 threads, each thread handles 5 iterations.
Result after using #pragma omp parallel for schedule(static) (with no chunk-size, each
thread receives one roughly equal, contiguous block of iterations):
If you specify a chunk-size, the iterations are divided into roughly iter_size / chunk_size
chunks.
In the following example,
int main()
{
    #pragma omp parallel for schedule(static, 3)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
the 20 iterations are divided into 7 chunks (6 chunks of 3 iterations and 1 chunk of 2
iterations); the result is:
But what if iter_size / chunk_size is larger than the number of threads available on your
machine, or than the number of threads you specified with
omp_set_num_threads(thread_num)?
The following example shows how OpenMP works under this kind of condition.
int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 3)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
Result:
OpenMP will still split the work into 7 chunks, but it distributes the chunks to the threads in
a round-robin (circular) order, as the following figure shows.
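Concretely, with 4 threads and a chunk size of 3, the static round-robin rule assigns the 20 iterations as:
```
chunk 0 (i = 0..2)   -> thread 0
chunk 1 (i = 3..5)   -> thread 1
chunk 2 (i = 6..8)   -> thread 2
chunk 3 (i = 9..11)  -> thread 3
chunk 4 (i = 12..14) -> thread 0
chunk 5 (i = 15..17) -> thread 1
chunk 6 (i = 18..19) -> thread 2
```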
Dynamic
#pragma omp parallel for schedule(dynamic,chunk-size)
OpenMP still splits the work into iter_size / chunk_size chunks, but it hands the chunks to
the threads dynamically, in no particular order: whichever thread finishes its current chunk
takes the next one.
If you run
int main()
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
Result:
You can see that in this run one thread (thread 1) took on 10 of the iterations while the others took only 0-1 each.
In OpenMP, scheduling refers to how loop iterations or tasks are divided among threads for
parallel execution. Two common scheduling strategies are dynamic scheduling and static
scheduling. Here's a comparison between the two:
1. Static Scheduling:
- In static scheduling, loop iterations are divided among threads before runtime, typically at
the beginning of the parallel region.
- The number of iterations assigned to each thread is determined statically based on the loop
iteration space and the number of threads.
- Workload distribution is done once, and each thread is assigned a fixed set of iterations to
execute.
- Static scheduling is beneficial when the workload of each iteration is roughly the same, and
there is little variation in execution time across iterations.
- It may lead to load imbalance if the workload of iterations varies significantly, as some
threads may finish their work much earlier than others.
2. Dynamic Scheduling:
- In dynamic scheduling, loop iterations are dynamically assigned to threads at runtime.
- The loop iterations are divided into chunks, and each thread takes a new chunk of work
when it finishes its previous chunk.
- This approach enables better load balancing because work distribution can adapt to runtime
conditions, such as varying execution times of iterations.
- Dynamic scheduling incurs overhead due to the runtime decision-making process and
synchronization between threads to acquire new work chunks.
- It is suitable for situations where the workload of iterations varies significantly or when the
execution time of iterations is unpredictable.
In summary, static scheduling divides loop iterations among threads before runtime, providing
simplicity and potentially better performance in cases of uniform workload. On the other hand,
dynamic scheduling assigns iterations dynamically at runtime, offering better load balancing at
the cost of additional overhead. The choice between static and dynamic scheduling depends on
the characteristics of the workload and the desired trade-offs between simplicity and load
balancing.
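As an illustration of this trade-off, the following sketch (not part of the lab handout; the triangular inner loop is just an artificial way to make iterations cost different amounts of work) runs a loop whose iteration i does work proportional to i. With `schedule(static)` the threads that receive the later iterations finish last, while `schedule(dynamic, 1)` lets idle threads keep grabbing new iterations:
```
#include <omp.h>
#include <stdio.h>

int main(void)
{
    double total = 0.0;

    /* Iteration i performs i * 100000 additions, so the cost per iteration
     * grows linearly. Switch the schedule clause to static to compare how
     * evenly the threads finish. */
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int i = 0; i < 32; i++)
    {
        double local = 0.0;
        for (long k = 0; k < (long)i * 100000L; k++)
            local += 1e-6;                 /* artificial, uneven work */
        total += local;
        printf("Thread %d finished iteration %d\n", omp_get_thread_num(), i);
    }

    printf("total = %f\n", total);
    return 0;
}
```
Timing the two variants (for example with `omp_get_wtime()`) makes the load-balancing difference visible.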
Exercise 2
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define CHUNKSIZE 10
#define N 100
/* Some initializations */
chunk = CHUNKSIZE;
#pragma omp parallel shared (a,b,c,nthreads,chunk) private(i,tid)
tid = omp_get_thread_num();
if (tid == 0)
nthreads = omp_get_num_threads();
printf("Thread %d starting...\n",tid);
This code is another OpenMP example demonstrating parallelization of a simple task. Here's what
it does:
1. Initialization: It initializes three arrays a, b, and c, each of size N=100. Arrays a and
b are filled with values based on the index.
2. Parallel Region: It enters a parallel region using #pragma omp parallel . This
directive spawns multiple threads to execute the enclosed code block in parallel.
3. Thread Information: Each thread obtains its thread ID using
omp_get_thread_num() and prints it. The master thread (thread with ID 0) also
prints the total number of threads using omp_get_num_threads().
4. Parallel Loop: Inside the parallel region, a loop is parallelized using #pragma omp
for. This loop computes element-wise addition of arrays a and b, storing the result in
array c.
5. Dynamic Scheduling: The loop is scheduled dynamically using schedule(dynamic,
chunk). This means that iterations are dynamically assigned to threads at runtime, and
each thread processes a chunk of chunk iterations before requesting more work.
6. Printing Results: Each thread prints the computed value of c[i] for the iterations it
processes.
7. End of Parallel Section: After the loop, the parallel section ends, and all threads
synchronize implicitly.
The purpose of this code is to demonstrate how to parallelize a simple computation task
(element-wise addition of two arrays) using OpenMP. It showcases how to utilize parallel loops
and dynamic scheduling to distribute work among threads efficiently. Additionally, it provides
insights into thread management and synchronization within a parallel region.
The output in the case of dynamic scheduling:
The output of this code will display the result of parallel addition of elements from arrays `a` and `b` into array `c`, along with some
diagnostic information about the threads. Since the code employs OpenMP parallelism, the order of thread execution and the
scheduling of iterations may vary. Here's an example of what the output might look like:
```
Thread 0 starting...
Thread 1 starting...
Thread 2 starting...
Thread 3 starting...
Number of threads = 4
Thread 3: c[0]= 0.000000
Thread 3: c[1]= 2.000000
Thread 3: c[2]= 4.000000
...
Thread 3: c[9]= 18.000000
Thread 0: c[10]= 20.000000
Thread 0: c[11]= 22.000000
...
Thread 0: c[19]= 38.000000
Thread 1: c[20]= 40.000000
...
Thread 1: c[29]= 58.000000
Thread 2: c[30]= 60.000000
...
Thread 0: c[90]= 180.000000
Thread 0: c[91]= 182.000000
...
Thread 0: c[99]= 198.000000
```
In this output:
- Each thread prints a message indicating its thread number (`tid`) when it starts.
- The master thread (thread with ID 0) additionally prints the total number of threads (`nthreads`) after all threads have started.
- Each thread then computes and prints the elements of array `c` that it has calculated; with `schedule(dynamic, 10)`, each block of 10 consecutive elements is handled by a single thread.
- Which thread gets which chunk, and the order in which the output lines appear, may vary from run to run due to the dynamic scheduling and the concurrent execution.
The output in the case of static scheduling:
If we replace the dynamic scheduling with static scheduling in the code, the output might change due to the different scheduling
behavior. With static scheduling, loop iterations are statically divided among threads before runtime, typically at the beginning of the
parallel region. Each thread is assigned a fixed set of iterations to execute.
Here's what the modified output might look like with static scheduling:
```
Thread 0 starting...
Thread 1 starting...
Thread 2 starting...
Thread 3 starting...
Number of threads = 4
Thread 0: c[0]= 0.000000
Thread 0: c[1]= 2.000000
...
Thread 0: c[9]= 18.000000
Thread 1: c[10]= 20.000000
...
Thread 1: c[19]= 38.000000
Thread 2: c[20]= 40.000000
...
Thread 2: c[29]= 58.000000
Thread 3: c[30]= 60.000000
...
Thread 3: c[39]= 78.000000
Thread 0: c[40]= 80.000000
...
Thread 0: c[49]= 98.000000
Thread 1: c[50]= 100.000000
...
Thread 3: c[70]= 140.000000
...
Thread 0: c[80]= 160.000000
...
Thread 0: c[89]= 178.000000
Thread 1: c[90]= 180.000000
...
Thread 1: c[99]= 198.000000
```
In this output:
- Each thread still starts and prints its thread number (`tid`) as before.
- The master thread (thread with ID 0) still prints the total number of threads (`nthreads`) as before.
- However, with `schedule(static, chunk)` the chunks of 10 iterations are dealt out to the threads in round-robin order before the loop executes, so the mapping of iterations to threads is fixed: thread 0 always handles c[0..9], c[40..49] and c[80..89], thread 1 handles c[10..19], c[50..59] and c[90..99], and so on. Only the interleaving of the printed lines can vary between runs.
Exercise 3
/******************************************************************************
* FILE: omp_orphan.c
* DESCRIPTION:
*   OpenMP Example - Parallel region with an orphaned directive - C/C++ Version
*   This example demonstrates a dot product being performed by an orphaned
*   loop reduction construct. Scoping of the reduction variable is critical.
* AUTHOR: Blaise Barney 5/99
* LAST REVISED: 06/30/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define VECLEN 100
float a[VECLEN], b[VECLEN], sum;

float dotprod()
{
    int i, tid;
    tid = omp_get_thread_num();
    #pragma omp for reduction(+:sum)    /* orphaned work-sharing construct */
    for (i = 0; i < VECLEN; i++)
    {
        sum = sum + (a[i] * b[i]);
        printf(" tid= %d i=%d\n", tid, i);
    }
    return sum;
}

int main(void)
{
    int i;
    for (i = 0; i < VECLEN; i++)
        a[i] = b[i] = 1.0 * i;          /* initialize the vectors */
    sum = 0.0;
    #pragma omp parallel
    dotprod();                          /* called by every thread of the team */
    printf("Sum = %f\n", sum);
    return 0;
}
This code is an example of using OpenMP for parallel computation of the dot
product of two vectors. Here's a breakdown of its purpose and functionality:
1. **Initialization**:
- Two arrays `a` and `b`, both of length `VECLEN (100)`, are initialized
with values based on their index.
- The variable `sum` is initialized to `0.0`. This variable will accumulate
the dot product of `a` and `b`.
2. **Parallel Region**:
- The `main()` function contains a parallel region defined by `#pragma omp
parallel`.
- Inside this parallel region, the function `dotprod()` is called by each
thread.
3. **Orphaned Work-Sharing Loop with Reduction**:
   - Inside `dotprod()`, the orphaned `#pragma omp for reduction(+:sum)` directive splits
the loop iterations among the threads of the enclosing parallel region, and each
thread's partial sum is combined into `sum` at the end of the loop.
4. **Print Statements**:
   - Within the loop in `dotprod()`, each thread prints its thread ID (`tid`)
and the index (`i`) it's currently processing.
5. **Summation Output**:
- After the parallel region, the `main()` function prints out the computed
sum, which represents the dot product of vectors `a` and `b`.
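As a quick sanity check (assuming the initialization a[i] = b[i] = i used in the reconstructed listing above), the dot product equals the sum of i² for i = 0..99, so the final line of output should read:
```
Sum = 328350.000000
```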
Exercise 1 in MPI:
/******************************************************************************
* FILE: mpi_hello.c
* DESCRIPTION:
*   MPI tutorial example code: Simple hello world program
* AUTHOR: Blaise Barney
* LAST REVISED: 03/05/10
******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
MPI_Get_processor_name(hostname, &len);
printf ("Hello from task %d on %s!\n", taskid, hostname);
if (taskid == MASTER)
printf("MASTER: Number of MPI tasks is: %d\n",numtasks);
MPI_Finalize();
}
This code is a simple "Hello World" program written using MPI (Message
Passing Interface), which is a standard for parallel programming. Here's a
breakdown of its purpose and functionality:
1. **Initialization**:
- The program starts by initializing MPI using `MPI_Init(&argc, &argv)`. This
call initializes the MPI execution environment.
2. **Process Information**:
- Each process obtains the total number of MPI tasks with `MPI_Comm_size`, its own
rank (`taskid`) with `MPI_Comm_rank`, and the name of the host it is running on with
`MPI_Get_processor_name`.
3. **Printing Greetings**:
- Each MPI process prints a "Hello" message, indicating its task ID and the
hostname it's running on.
- If the current process is the master process (identified by `taskid ==
MASTER`), it also prints the total number of MPI tasks.
4. **Finalization**:
- Finally, MPI is finalized using `MPI_Finalize()`, which cleans up the MPI
environment before program termination.
The purpose of this code is to illustrate the basic structure of an MPI program
and demonstrate how to initialize MPI, obtain process information, and
perform simple parallel output. It serves as a starting point for understanding
MPI parallel programming and how processes interact in a distributed-memory
computing environment.
The output of the code will depend on the number of MPI tasks (processes)
and the hostname of the system where it is executed.
Assuming the code is executed with multiple MPI tasks, here's an example of
the expected output:
```
Hello from task 0 on <hostname>!
Hello from task 1 on <hostname>!
...
Hello from task <numtasks-1> on <hostname>!
MASTER: Number of MPI tasks is: <numtasks>
```
- Each MPI task will print its task ID (`taskid`) and the hostname of the
system (`hostname`).
- The master task (with `taskid == MASTER`, usually 0) will also print the
total number of MPI tasks (`numtasks`).
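As a usage note (assuming a typical MPI installation such as Open MPI or MPICH, which provides the `mpicc` compiler wrapper and the `mpirun` launcher), the program can be built with `mpicc mpi_hello.c -o mpi_hello` and started with, for example, `mpirun -np 4 ./mpi_hello` to launch four tasks.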
Exercise 2
/******************************************************************************
* FILE: mpi_helloBsend.c
* DESCRIPTION:
*   MPI tutorial example code: Simple hello world program that uses blocking
*   send/receive routines.
* AUTHOR: Blaise Barney
* LAST REVISED: 06/08/15
******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
/* ... the program quits here if the number of tasks is not even ... */
else {
    if (taskid == MASTER)
        printf("MASTER: Number of MPI tasks is: %d\n", numtasks);
    MPI_Get_processor_name(hostname, &len);
    printf("Hello from task %d on %s!\n", taskid, hostname);
    /* ... blocking send/receive with the partner task ... */
}
MPI_Finalize();
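The listing above is only an excerpt. A complete sketch that matches the description and the expected output below (the exact pairing scheme, in which each task in the first half of the ranks exchanges task IDs with a task in the second half, is an assumption; the message passing uses standard blocking MPI_Send/MPI_Recv) could look like this:
```
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0

int main(int argc, char *argv[])
{
    int numtasks, taskid, len, partner, message;
    char hostname[MPI_MAX_PROCESSOR_NAME];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    /* An even number of tasks is required so that every task has a partner */
    if (numtasks % 2 != 0) {
        if (taskid == MASTER)
            printf("Quitting. Need an even number of tasks: numtasks=%d\n", numtasks);
    }
    else {
        if (taskid == MASTER)
            printf("MASTER: Number of MPI tasks is: %d\n", numtasks);

        MPI_Get_processor_name(hostname, &len);
        printf("Hello from task %d on %s!\n", taskid, hostname);

        /* Pair task t in the first half with task t + numtasks/2 (assumed scheme)
         * and exchange task IDs with blocking send/receive */
        if (taskid < numtasks / 2) {
            partner = numtasks / 2 + taskid;
            MPI_Send(&taskid, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
            MPI_Recv(&message, 1, MPI_INT, partner, 1, MPI_COMM_WORLD, &status);
        }
        else {
            partner = taskid - numtasks / 2;
            MPI_Recv(&message, 1, MPI_INT, partner, 1, MPI_COMM_WORLD, &status);
            MPI_Send(&taskid, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
        }

        printf("Task %d is partner with %d\n", taskid, message);
    }

    MPI_Finalize();
    return 0;
}
```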
1. **Initialization**: MPI is initialized with `MPI_Init`, and each task obtains its rank
(`taskid`) and the total number of tasks (`numtasks`).
2. **Even-Task Check**: The program requires an even number of tasks so that every task
can be paired with a partner; otherwise it prints a message and quits.
3. **Partner Exchange**: Each task is paired with a partner task, and the partners exchange
their task IDs using blocking `MPI_Send` and `MPI_Recv` calls.
4. **Hello Message**: Each MPI task prints a "Hello" message, indicating its
task ID and the hostname it's running on.
Overall, the purpose of this code is to illustrate basic message passing and
communication between MPI tasks using blocking send and receive routines.
It demonstrates how to pair up tasks and exchange information in a
distributed-memory computing environment.
The output of the code will depend on the number of MPI tasks (processes)
and the hostname of the system where it is executed. Here's what you can
expect:
1. If the number of MPI tasks (`numtasks`) is not even, the program will print
a message indicating that it's quitting because an even number of tasks is
required.
2. If the number of tasks is even, each task prints its greeting, the master reports the task
count, and every task reports its partner:
```
MASTER: Number of MPI tasks is: <numtasks>
Hello from task <taskid> on <hostname>!
Hello from task <taskid> on <hostname>!
...
Task <taskid1> is partner with <taskid2>
Task <taskid3> is partner with <taskid4>
...
```