HPC Lab

OpenMP is a shared-memory parallel programming API that allows for the creation and management of threads using directives and runtime routines. It employs a fork-join model for parallel execution and offers scheduling methods like static and dynamic to distribute tasks among threads. The document includes examples demonstrating basic OpenMP usage, including a 'Hello World' program and a parallelized array addition task.


Open multi-processing (OpenMP) is a standard application program interface (API) for shared-memory parallelism.

It provides a set of compiler directives, environment variables, and runtime library routines for thread creation, management, and synchronization.

When a parallel region executes, the program creates a number of threads running concurrently.

With OpenMP, forked threads have access to shared memory. For more details, visit the OpenMP home page.

Fork-Join Model:

OpenMP uses the fork-join model of parallel execution: a master thread runs sequentially until it reaches a parallel region, forks a team of threads to execute that region concurrently, and the threads join back into the master thread when the region ends.

Exercise 1 in OpenMP:
The purpose of this code is to introduce the basic concepts of parallel
programming using OpenMP. It shows how to create a parallel region, how to
obtain the thread ID and the total number of threads, and how to execute
parallel tasks. This "Hello World" example serves as a starting point for
understanding parallelism with OpenMP.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
  int nthreads, tid;

  /* Fork a team of threads giving them their own copies of variables */
  #pragma omp parallel private(nthreads, tid)
  {
    /* Obtain thread number */
    tid = omp_get_thread_num();
    printf("Hello World from thread = %d\n", tid);

    /* Only master thread does this */
    if (tid == 0)
    {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  } /* All threads join master thread and disband */

  return 0;
}

This code is an example of using OpenMP, an API for parallel programming in C/C++. It demonstrates a simple "Hello World" parallel program. Here's what it does:

1. It includes the necessary headers: omp.h for OpenMP functions, stdio.h for standard input/output, and stdlib.h for standard library functions.
2. The main() function declares two variables: nthreads (number of threads) and tid (thread ID), both private to each thread.
3. Inside a parallel region defined by #pragma omp parallel, each thread obtains its unique thread number using omp_get_thread_num() and prints "Hello World from thread = <thread_id>".
4. The master thread (thread with ID 0) additionally prints the total number of threads using omp_get_num_threads() (a sketch after this list shows how to request a specific team size).
5. All threads synchronize at the end of the parallel region with the implicit barrier.
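The size of the thread team can also be requested from inside the program. A minimal sketch, assuming a team of 4 is wanted (the implementation may still give fewer threads than requested):

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_num_threads(4);   /* request a team of 4 threads for the next parallel region */

    #pragma omp parallel
    {
        /* each thread reports its ID and the actual team size */
        printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```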


The output of the code

The output of this code will display the thread number and the total number of threads within the parallel region. Here's an example of what the output might look like:

```
Hello World from thread = 0
Hello World from thread = 1
Hello World from thread = 2
Hello World from thread = 3
Number of threads = 4
```

In this output:
- Each line corresponds to a thread printing "Hello World" along with its thread number (`tid`).
- The master thread (thread with ID 0) additionally prints the total number of threads (`nthreads`) right after its own greeting.
- The order in which threads print their "Hello World" messages may vary due to the concurrent execution of threads in the parallel region. The master thread's "Number of threads" line is only guaranteed to appear after its own greeting; other threads' messages may appear before or after it (the sketch below shows how to force it to come last with an explicit barrier).
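If the thread count should be printed only after every thread has said hello, an explicit barrier can be placed before the master's second printf. This is a minimal variation on the exercise, not part of the original code:

```c
#include <omp.h>
#include <stdio.h>

int main(void)
{
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);

        /* wait here until every thread has printed its greeting */
        #pragma omp barrier

        if (tid == 0)
            printf("Number of threads = %d\n", omp_get_num_threads());
    }
    return 0;
}
```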

What is Scheduling in OpenMP

Scheduling is the method OpenMP uses to distribute the iterations of a for loop among the threads.

The basic form of OpenMP scheduling is

#pragma omp parallel for schedule(scheduling-type)
for (/* conditions */) {
    /* do something */
}

Of course, you can use #pragma omp parallel for directly without a schedule clause; the schedule is then implementation defined, and most implementations divide the iterations into roughly equal contiguous blocks, one per thread.

If you run

#include <omp.h>
#include <stdio.h>

int main()
{
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

and

#include <omp.h>
#include <stdio.h>

int main()
{
    #pragma omp parallel for
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

In both cases the 20 iterations are spread across the 12 threads of my 6-core CPU (thread_number = core_number * 2). The order in which the lines are printed is essentially random, and running the same code multiple times produces differently ordered output; that is not a big issue. What does differ is which iterations each thread receives: with schedule(static,1), thread k gets iterations k and k+12 (Result 1), while the default schedule here hands each thread a contiguous block of iterations (Result 2).

Result 1:

Thread 5 is running number 5
Thread 5 is running number 17
Thread 1 is running number 1
Thread 1 is running number 13
Thread 3 is running number 3
Thread 3 is running number 15
Thread 6 is running number 6
Thread 6 is running number 18
Thread 0 is running number 0
Thread 0 is running number 12
Thread 9 is running number 9
Thread 4 is running number 4
Thread 4 is running number 16
Thread 2 is running number 2
Thread 2 is running number 14
Thread 7 is running number 7
Thread 7 is running number 19
Thread 10 is running number 10
Thread 11 is running number 11
Thread 8 is running number 8

Result 2:

Thread 4 is running number 8
Thread 4 is running number 9
Thread 1 is running number 2
Thread 1 is running number 3
Thread 0 is running number 0
Thread 0 is running number 1
Thread 6 is running number 12
Thread 6 is running number 13
Thread 8 is running number 16
Thread 9 is running number 17
Thread 10 is running number 18
Thread 11 is running number 19
Thread 2 is running number 4
Thread 2 is running number 5
Thread 5 is running number 10
Thread 5 is running number 11
Thread 3 is running number 6
Thread 3 is running number 7
Thread 7 is running number 14
Thread 7 is running number 15

Static
#pragma omp parallel for schedule(static,chunk-size)

If you do not specify the chunk-size variable, OpenMP divides the iterations into chunks that are approximately equal in size and distributes the chunks to the threads in order (notice that this ordering is what makes the static method different from the others). In the for loop we discussed before, under the 12-thread condition each thread handles 1-2 iterations; if you only use 4 threads, each thread handles 5 iterations.

Result after using #pragma omp parallel for schedule(static) (no chunk size specified, so each thread gets one roughly equal block of iterations):

Thread 0 is running number 0
Thread 0 is running number 1
Thread 6 is running number 12
Thread 6 is running number 13
Thread 8 is running number 16
Thread 3 is running number 6
Thread 3 is running number 7
Thread 2 is running number 4
Thread 2 is running number 5
Thread 9 is running number 17
Thread 10 is running number 18
Thread 11 is running number 19
Thread 5 is running number 10
Thread 5 is running number 11
Thread 1 is running number 2
Thread 1 is running number 3
Thread 4 is running number 8
Thread 4 is running number 9
Thread 7 is running number 14
Thread 7 is running number 15

If you specify the chunk-size variable, the iterations are divided into iter_size / chunk_size chunks (rounded up).

Notice: iter_size is 20 in this example, because the for loop ranges from 0 to 20 (not including 20 itself).

In

#include <omp.h>
#include <stdio.h>

int main()
{
    #pragma omp parallel for schedule(static, 3)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

the 20 iterations will be divided into 7 chunks (6 with 3 iterations, 1 with 2 iterations); the result is:

Thread 5 is running number 15
Thread 5 is running number 16
Thread 5 is running number 17
Thread 2 is running number 6
Thread 2 is running number 7
Thread 2 is running number 8
Thread 6 is running number 18
Thread 6 is running number 19
Thread 1 is running number 3
Thread 1 is running number 4
Thread 1 is running number 5
Thread 3 is running number 9
Thread 3 is running number 10
Thread 3 is running number 11
Thread 4 is running number 12
Thread 4 is running number 13
Thread 0 is running number 0
Thread 0 is running number 1
Thread 0 is running number 2
Thread 4 is running number 14

It is clear that only threads 0 to 6 are used here, since there are only 7 chunks.

But what if iter_size / chunk_size is larger than the number of threads on your computer, or larger than the number of threads you specified with omp_set_num_threads(thread_num)?

The following example shows how OpenMP works under this kind of condition.

#include <omp.h>
#include <stdio.h>

int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 3)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

Result:

Thread 1 is running number 3
Thread 1 is running number 4
Thread 1 is running number 5
Thread 1 is running number 15
Thread 1 is running number 16
Thread 1 is running number 17
Thread 3 is running number 9
Thread 3 is running number 10
Thread 3 is running number 11
Thread 0 is running number 0
Thread 0 is running number 1
Thread 0 is running number 2
Thread 0 is running number 12
Thread 0 is running number 13
Thread 0 is running number 14
Thread 2 is running number 6
Thread 2 is running number 7
Thread 2 is running number 8
Thread 2 is running number 18
Thread 2 is running number 19

OpenMP will still split the task into 7 chunks, but it distributes the chunks to the threads in a circular (round-robin) order: thread 0 gets chunks 0 and 4 (iterations 0-2 and 12-14), thread 1 gets chunks 1 and 5 (iterations 3-5 and 15-17), thread 2 gets chunks 2 and 6 (iterations 6-8 and 18-19), and thread 3 gets chunk 3 (iterations 9-11), as the result above shows.

Dynamic
#pragma omp parallel for schedule(dynamic,chunk-size)

OpenMP will still split the task into iter_size / chunk_size chunks, but it distributes the chunks to the threads dynamically, without any specific order: whichever thread becomes free takes the next chunk.

If you run

#include <omp.h>
#include <stdio.h>

int main()
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}

#pragma omp parallel for schedule(dynamic, 1) is equivalent to #pragma omp parallel for schedule(dynamic).

Result:

Thread 1 is running number 2
Thread 1 is running number 7
Thread 1 is running number 9
Thread 1 is running number 10
Thread 1 is running number 11
Thread 1 is running number 13
Thread 1 is running number 14
Thread 1 is running number 15
Thread 1 is running number 17
Thread 1 is running number 19
Thread 3 is running number 0
Thread 0 is running number 4
Thread 8 is running number 12
Thread 4 is running number 3
Thread 6 is running number 6
Thread 9 is running number 16
Thread 5 is running number 1
Thread 7 is running number 8
Thread 10 is running number 18
Thread 2 is running number 5

You can see that thread 1 took 10 iterations while the others took only 0-1 each: with dynamic scheduling, whichever thread becomes free grabs the next chunk, so the distribution can be very uneven.

In OpenMP, scheduling refers to how loop iterations or tasks are divided among threads for
parallel execution. Two common scheduling strategies are dynamic scheduling and static
scheduling. Here's a comparison between the two:

1. Static Scheduling:
- In static scheduling, loop iterations are divided among threads up front, when the parallel loop is encountered.
- The number of iterations assigned to each thread is determined statically based on the loop iteration space and the number of threads.
- Workload distribution is done once, and each thread is assigned a fixed set of iterations to execute.
- Static scheduling is beneficial when the workload of each iteration is roughly the same, and there is little variation in execution time across iterations.
- It may lead to load imbalance if the workload of iterations varies significantly, as some threads may finish their work much earlier than others.

2. Dynamic Scheduling:
- In dynamic scheduling, loop iterations are dynamically assigned to threads at runtime.
- The loop iterations are divided into chunks, and each thread takes a new chunk of work when it finishes its previous chunk.
- This approach enables better load balancing because work distribution can adapt to runtime conditions, such as varying execution times of iterations.
- Dynamic scheduling incurs overhead due to the runtime decision-making process and the synchronization between threads needed to acquire new work chunks.
- It is suitable for situations where the workload of iterations varies significantly or when the execution time of iterations is unpredictable.

In summary, static scheduling divides loop iterations among threads up front, providing simplicity and potentially better performance for uniform workloads. Dynamic scheduling, on the other hand, assigns iterations at runtime, offering better load balancing at the cost of additional overhead. The choice between static and dynamic scheduling depends on the characteristics of the workload and the desired trade-offs between simplicity and load balancing; the sketch below illustrates the difference on an uneven workload.
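To make the trade-off concrete, here is a small sketch (not part of the original lab) in which the work per iteration grows with the index. The work() helper is a made-up stand-in for an uneven computation; with schedule(static) the threads that received the cheap early iterations sit idle while the others finish, whereas schedule(dynamic, 4) lets idle threads keep taking new chunks:

```c
#include <omp.h>
#include <stdio.h>

/* hypothetical helper: iteration i costs roughly i units of work */
static double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i * 100000; k++)
        s += k * 0.5;
    return s;
}

int main(void)
{
    double total = 0.0;
    double t0 = omp_get_wtime();

    /* swap schedule(dynamic, 4) for schedule(static) and compare the elapsed times */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < 64; i++)
        total += work(i);

    printf("total = %f, elapsed = %f s\n", total, omp_get_wtime() - t0);
    return 0;
}
```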
Exercise 2
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNKSIZE 10
#define N 100

int main (int argc, char *argv[])
{
  int nthreads, tid, i, chunk;
  float a[N], b[N], c[N];

  /* Some initializations */
  for (i = 0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;

  #pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
  {
    tid = omp_get_thread_num();
    if (tid == 0)
    {
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
    printf("Thread %d starting...\n", tid);

    #pragma omp for schedule(dynamic,chunk)
    for (i = 0; i < N; i++)
    {
      c[i] = a[i] + b[i];
      printf("Thread %d: c[%d]= %f\n", tid, i, c[i]);
    }
  } /* end of parallel section */

  return 0;
}

This code is another OpenMP example demonstrating parallelization of a simple task. Here's what
it does:

1. Initialization: It initializes three arrays a, b, and c, each of size N=100. Arrays a and
b are filled with values based on the index.
2. Parallel Region: It enters a parallel region using #pragma omp parallel . This
directive spawns multiple threads to execute the enclosed code block in parallel.
3. Thread Information: Each thread obtains its thread ID using
omp_get_thread_num() and prints it. The master thread (thread with ID 0) also
prints the total number of threads using omp_get_num_threads().
4. Parallel Loop: Inside the parallel region, a loop is parallelized using #pragma omp
for. This loop computes element-wise addition of arrays a and b, storing the result in
array c.
5. Dynamic Scheduling: The loop is scheduled dynamically using schedule(dynamic,
chunk). This means that iterations are dynamically assigned to threads at runtime, and
each thread processes a chunk of chunk iterations before requesting more work.
6. Printing Results: Each thread prints the computed value of c[i] for the iterations it
processes.
7. End of Parallel Section: After the loop, the parallel section ends, and all threads
synchronize implicitly.

The purpose of this code is to demonstrate how to parallelize a simple computation task
(element-wise addition of two arrays) using OpenMP. It showcases how to utilize parallel loops
and dynamic scheduling to distribute work among threads efficiently. Additionally, it provides
insights into thread management and synchronization within a parallel region.
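For comparison, the separate parallel region and worksharing loop in this exercise can also be written as a single combined construct. A minimal sketch (same N and chunk size as above, with the per-element printing omitted):

```c
#include <omp.h>
#include <stdio.h>

#define CHUNKSIZE 10
#define N 100

int main(void)
{
    float a[N], b[N], c[N];
    int i;

    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0f;

    /* combined construct: parallel region + worksharing loop + schedule in one directive */
    #pragma omp parallel for schedule(dynamic, CHUNKSIZE)
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[%d] = %f\n", N - 1, c[N - 1]);
    return 0;
}
```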
The output in the case of dynamic scheduling

The output of this code will display the result of the parallel addition of elements from arrays `a` and `b` into array `c`, along with some diagnostic information about the threads. Since the code uses OpenMP parallelism, the order of thread execution and the scheduling of iterations may vary. Here's an example of what the output might look like:

```
Number of threads = 4
Thread 0 starting...
Thread 2 starting...
Thread 1 starting...
Thread 3 starting...
Thread 3: c[0]= 0.000000
Thread 3: c[1]= 2.000000
Thread 3: c[2]= 4.000000
...
Thread 3: c[9]= 18.000000
Thread 1: c[10]= 20.000000
Thread 1: c[11]= 22.000000
...
Thread 1: c[19]= 38.000000
Thread 2: c[20]= 40.000000
...
Thread 0: c[90]= 180.000000
Thread 0: c[91]= 182.000000
...
Thread 0: c[99]= 198.000000
```

In this output:
- Each thread prints a message indicating its thread number (`tid`) when it starts.
- The master thread (thread with ID 0) also prints the total number of threads (`nthreads`); in the code this happens just before its own "starting" message.
- Each thread then computes and prints the elements of array `c` for the chunks of 10 iterations it has been assigned; every element appears exactly once.
- The order in which threads start, and which chunks each thread ends up processing, may vary from run to run because of the dynamic scheduling and the concurrent execution.
The output in the case of static scheduling

If we replace the dynamic scheduling with static scheduling in the code (schedule(static, chunk)), the output might change due to the different scheduling behavior. With static scheduling, the assignment of loop iterations to threads is fixed when the loop is encountered: the iterations are divided into chunks of `chunk` iterations and handed to the threads in a fixed round-robin order, so each thread executes a predetermined set of chunks.

Here's what the modified output might look like with static scheduling:

```
Number of threads = 4
Thread 0 starting...
Thread 1 starting...
Thread 2 starting...
Thread 3 starting...
Thread 0: c[0]= 0.000000
Thread 0: c[1]= 2.000000
Thread 0: c[2]= 4.000000
...
Thread 0: c[9]= 18.000000
Thread 1: c[10]= 20.000000
Thread 1: c[11]= 22.000000
...
Thread 1: c[19]= 38.000000
Thread 2: c[20]= 40.000000
...
Thread 2: c[29]= 58.000000
Thread 3: c[30]= 60.000000
...
Thread 3: c[39]= 78.000000
Thread 0: c[40]= 80.000000
...
Thread 1: c[90]= 180.000000
...
Thread 1: c[99]= 198.000000
```

In this output:
- Each thread still starts and prints its thread number (`tid`) as before.
- The master thread (thread with ID 0) still prints the total number of threads (`nthreads`) as before.
- However, with static scheduling the distribution of loop iterations is fixed in advance: with 4 threads and a chunk size of 10, thread 0 computes iterations 0-9, 40-49 and 80-89, thread 1 computes 10-19, 50-59 and 90-99, and so on in round-robin order. Which thread computes which elements is therefore the same on every run, even though the printed lines can still interleave (the sketch below tallies the iterations per thread so you can verify the assignment).
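To check the assignment empirically, a small counting sketch can be used. This is an addition for illustration, not part of the original lab; NTHREADS and the chunk size of 10 are assumptions chosen to match the discussion above:

```c
#include <omp.h>
#include <stdio.h>

#define N 100
#define NTHREADS 4

int main(void)
{
    int count[NTHREADS] = {0};

    /* each thread increments only its own counter, so there is no data race;
       switch the clause to schedule(dynamic, 10) and compare the tallies */
    #pragma omp parallel for schedule(static, 10) num_threads(NTHREADS)
    for (int i = 0; i < N; i++)
        count[omp_get_thread_num()]++;

    for (int t = 0; t < NTHREADS; t++)
        printf("thread %d executed %d iterations\n", t, count[t]);
    return 0;
}
```

With the static schedule every run should print 30, 30, 20, 20 (threads 0 and 1 receive three chunks each, threads 2 and 3 two), while with the dynamic schedule the tallies change from run to run.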

Exercise 3
/******************************************************************************
* FILE: omp_orphan.c
* DESCRIPTION:
*   OpenMP Example - Parallel region with an orphaned directive - C/C++ Version
*   This example demonstrates a dot product being performed by an orphaned
*   loop reduction construct. Scoping of the reduction variable is critical.
* AUTHOR: Blaise Barney 5/99
* LAST REVISED: 06/30/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define VECLEN 100

float a[VECLEN], b[VECLEN], sum;

void dotprod(void)
{
  int i, tid;

  tid = omp_get_thread_num();
  #pragma omp for reduction(+:sum)
  for (i = 0; i < VECLEN; i++)
  {
    sum = sum + (a[i] * b[i]);
    printf(" tid= %d i=%d\n", tid, i);
  }
}

int main (int argc, char *argv[])
{
  int i;

  for (i = 0; i < VECLEN; i++)
    a[i] = b[i] = 1.0 * i;
  sum = 0.0;

  #pragma omp parallel
  dotprod();

  printf("Sum = %f\n", sum);
  return 0;
}

This code is an example of using OpenMP for parallel computation of the dot
product of two vectors. Here's a breakdown of its purpose and functionality:

1. **Initialization**:
- Two arrays `a` and `b`, both of length `VECLEN (100)`, are initialized
with values based on their index.
- The variable `sum` is initialized to `0.0`. This variable will accumulate
the dot product of `a` and `b`.

2. **Parallel Region**:
- The `main()` function contains a parallel region defined by `#pragma omp
parallel`.
- Inside this parallel region, the function `dotprod()` is called by each
thread.

3. **Dot Product Computation**:
- The `dotprod()` function is responsible for computing the dot product of
`a` and `b`.
- It utilizes a parallel loop construct with `#pragma omp for`.
- Within this loop, each thread iterates over elements of `a` and `b`,
accumulating their products into the `sum` variable.
- The reduction clause `reduction(+:sum)` ensures that each thread has a
private copy of `sum` and aggregates its local `sum` values into a shared
`sum` variable at the end of the loop.

4. **Print Statements**:
- Within the loop in `dotprod()`, each thread prints its thread ID (`tid`)
and the index (`i`) it's currently processing.

5. **Summation Output**:
- After the parallel region, the `main()` function prints out the computed
sum, which represents the dot product of vectors `a` and `b`.

The purpose of this code is to demonstrate how to parallelize a simple computation task, such as the dot product of two vectors, using OpenMP
directives. It showcases parallel loop constructs and reduction clauses to
efficiently distribute work among multiple threads and aggregate results.
Additionally, it highlights the scoping of variables within parallel regions
and the use of private and shared variables to maintain data consistency.
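For contrast with the orphaned directive above, the same dot product can be written with the reduction attached to a combined parallel-for directly in main(). A minimal sketch (same VECLEN and initialization as the exercise, added here for illustration):

```c
#include <omp.h>
#include <stdio.h>

#define VECLEN 100

int main(void)
{
    float a[VECLEN], b[VECLEN], sum = 0.0f;

    for (int i = 0; i < VECLEN; i++)
        a[i] = b[i] = 1.0f * i;

    /* non-orphaned version: the parallel region, the loop, and the
       reduction are all expressed in a single combined construct */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < VECLEN; i++)
        sum += a[i] * b[i];

    printf("Sum = %f\n", sum);
    return 0;
}
```

The orphaned form in the exercise is useful when the loop lives in a separate function called from inside a parallel region; the combined form is the more common pattern when everything sits in one place.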

Message passing interface (MPI) is a standard specification of a message-passing interface for parallel computation in distributed-memory systems.

MPI isn't a programming language. It's a library of functions that programmers can call from C, C++, or Fortran code to write parallel programs.

With MPI, an MPI communicator can be dynamically created and have multiple processes concurrently running on separate nodes of a cluster. Each process has a unique MPI rank to identify it, has its own memory space, and executes independently from the other processes. Processes communicate with each other by passing messages to exchange data. Parallelism occurs when a program task is partitioned into small chunks and those chunks are distributed among the processes, with each process working on its own part.
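As an illustration of that partitioning idea, here is a minimal sketch (an addition for illustration, not part of the original text) in which each rank sums its own contiguous slice of an index range and the partial sums are combined on rank 0. N is assumed to be divisible by the number of ranks to keep the sketch short:

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000   /* total number of elements; assumed divisible by the number of ranks */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each process works only on its own contiguous chunk of the index range */
    int chunk = N / size;
    int start = rank * chunk;
    int end   = start + chunk;

    long local_sum = 0;
    for (int i = start; i < end; i++)
        local_sum += i;

    /* combine the partial sums on rank 0 */
    long total = 0;
    MPI_Reduce(&local_sum, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of 0..%d = %ld\n", N - 1, total);

    MPI_Finalize();
    return 0;
}
```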

MPI Communication Methods

MPI provides three different communication methods that MPI processes can use to communicate with each other: point-to-point communication, collective communication (such as broadcasts and reductions), and one-sided communication. The first of these is discussed as follows.

Point-to-Point Communication
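In point-to-point communication, one process sends a message and exactly one other process receives it. A minimal sketch (an illustration rather than part of the original text; it must be launched with at least two ranks):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* send to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Exercise 2 below shows a fuller example of the same send/receive routines.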

MPI Versus OpenMP

The following is a list of the most common differences between MPI and OpenMP.

| MPI | OpenMP |
| --- | --- |
| Available from different vendors and gets compiled on Windows, macOS, and Linux operating systems. | An add-on in a compiler, such as the GNU compiler. |
| Supports parallel computation for distributed-memory and shared-memory systems. | Supports parallel computation for shared-memory systems only. |
| A process-based parallelism. | A thread-based parallelism. |
| With MPI, each process has its own memory space and executes independently from the other processes. | With OpenMP, threads share the same resources in shared memory. |
| Processes exchange data by passing messages to each other. | There is no notion of message-passing. Threads read and write shared memory. |
| Process creation overhead occurs one time. | It depends on the implementation. More overhead occurs when creating threads to join a task. |

Exercise 1 in MPI

/******************************************************************************
* FILE: mpi_hello.c
* DESCRIPTION:
*   MPI tutorial example code: Simple hello world program
* AUTHOR: Blaise Barney
* LAST REVISED: 03/05/10
******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0

int main (int argc, char *argv[])
{
  int numtasks, taskid, len;
  char hostname[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
  MPI_Get_processor_name(hostname, &len);
  printf("Hello from task %d on %s!\n", taskid, hostname);
  if (taskid == MASTER)
    printf("MASTER: Number of MPI tasks is: %d\n", numtasks);
  MPI_Finalize();
  return 0;
}
This code is a simple "Hello World" program written using MPI (Message
Passing Interface), which is a standard for parallel programming. Here's a
breakdown of its purpose and functionality:
1. **Initialization**:
- The program starts by initializing MPI using `MPI_Init(&argc, &argv)`. This
call initializes the MPI execution environment.

2. **Determination of Process Information**:
- It retrieves the total number of MPI processes (`numtasks`) and the ID of
the current process (`taskid`) using `MPI_Comm_size()` and
`MPI_Comm_rank()` functions, respectively.
- Additionally, it retrieves the hostname of the current process using
`MPI_Get_processor_name()`.

3. **Printing Greetings**:
- Each MPI process prints a "Hello" message, indicating its task ID and the
hostname it's running on.
- If the current process is the master process (identified by `taskid ==
MASTER`), it also prints the total number of MPI tasks.

4. **Finalization**:
- Finally, MPI is finalized using `MPI_Finalize()`, which cleans up the MPI
environment before program termination.

The purpose of this code is to illustrate the basic structure of an MPI program
and demonstrate how to initialize MPI, obtain process information, and
perform simple parallel output. It serves as a starting point for understanding
MPI parallel programming and how processes interact in a distributed-
memory computing environment.

The output of the code

The output of the code will depend on the number of MPI tasks (processes)
and the hostname of the system where it is executed.

Assuming the code is executed with multiple MPI tasks, here's an example of
the expected output:

```
Hello from task 0 on <hostname>!
Hello from task 1 on <hostname>!
...
Hello from task <numtasks-1> on <hostname>!
MASTER: Number of MPI tasks is: <numtasks>
```

- Each MPI task will print its task ID (`taskid`) and the hostname of the
system (`hostname`).
- The master task (with `taskid == MASTER`, usually 0) will also print the
total number of MPI tasks (`numtasks`).

The `<hostname>` will be replaced by the actual hostname of the system,
and `<numtasks>` will be replaced by the total number of MPI tasks running
the code.

Exercise 2 in MPI

/******************************************************************************
* FILE: mpi_helloBsend.c
* DESCRIPTION:
*   MPI tutorial example code: Simple hello world program that uses blocking
*   send/receive routines.
* AUTHOR: Blaise Barney
* LAST REVISED: 06/08/15
******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0

int main (int argc, char *argv[])
{
  int numtasks, taskid, len, partner, message;
  char hostname[MPI_MAX_PROCESSOR_NAME];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

  /* need an even number of tasks */
  if (numtasks % 2 != 0) {
    if (taskid == MASTER)
      printf("Quitting. Need an even number of tasks: numtasks=%d\n", numtasks);
  }
  else {
    if (taskid == MASTER)
      printf("MASTER: Number of MPI tasks is: %d\n", numtasks);

    MPI_Get_processor_name(hostname, &len);
    printf("Hello from task %d on %s!\n", taskid, hostname);

    /* determine partner and then send/receive with partner */
    if (taskid < numtasks/2) {
      partner = numtasks/2 + taskid;
      MPI_Send(&taskid, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
      MPI_Recv(&message, 1, MPI_INT, partner, 1, MPI_COMM_WORLD, &status);
    }
    else if (taskid >= numtasks/2) {
      partner = taskid - numtasks/2;
      MPI_Recv(&message, 1, MPI_INT, partner, 1, MPI_COMM_WORLD, &status);
      MPI_Send(&taskid, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
    }

    /* print partner info and exit */
    printf("Task %d is partner with %d\n", taskid, message);
  }

  MPI_Finalize();
  return 0;
}

The purpose of this code is to demonstrate a simple "Hello World" program using MPI (Message Passing Interface), specifically focusing on blocking send and receive routines (`MPI_Send` and `MPI_Recv`). Here's a breakdown of its purpose and functionality:

1. **Initialization**: The program begins by initializing the MPI execution environment using `MPI_Init(&argc, &argv)`. It then obtains the ID of the current MPI task (`taskid`) and the total number of MPI tasks (`numtasks`).

2. **Even Number of Tasks Check**: It checks if the total number of tasks is even. If it's not, the program prints a message and quits. This check ensures that the program can pair up tasks for communication.

3. **Printing Information**: If the total number of tasks is even and the current task is the master (with `taskid == MASTER`), it prints the total number of MPI tasks.

4. **Hello Message**: Each MPI task prints a "Hello" message, indicating its task ID and the hostname it's running on.

5. **Communication**: Tasks are paired up to perform simple communication using blocking send and receive:
- Each task determines its partner based on its task ID and the total number of tasks. For example, with 4 tasks, task 0 pairs with task 2 and task 1 pairs with task 3.
- If a task has a lower task ID than the midpoint of the total tasks, it sends its task ID to its partner and then receives a message from the partner.
- If a task has a higher or equal task ID than the midpoint, it first receives a message from its partner and then sends its task ID. Ordering the calls this way (send then receive on one side, receive then send on the other) keeps the blocking calls matched, so the two partners do not both sit waiting in `MPI_Recv` at the same time.
- This communication ensures that each task is paired with another task and exchanges information.

6. **Printing Partner Information**: After communication, each task prints information about its partner task.

7. **Finalization**: Finally, MPI is finalized using `MPI_Finalize()`, which cleans up the MPI environment before program termination.

Overall, the purpose of this code is to illustrate basic message passing and communication between MPI tasks using blocking send and receive routines. It demonstrates how to pair up tasks and exchange information in a distributed-memory computing environment.

The output of the code will depend on the number of MPI tasks (processes)
and the hostname of the system where it is executed. Here's what you can
expect:

1. If the number of MPI tasks (`numtasks`) is not even, the program will print
a message indicating that it's quitting because an even number of tasks is
required.

2. If the number of MPI tasks is even:
- The master task (task with ID 0) will print the total number of MPI tasks.
- Each MPI task will print a "Hello" message indicating its task ID and the
hostname of the system it's running on.
- Each MPI task will then pair up with another task and exchange
information:
- For tasks with IDs less than half of the total tasks, they will send their
task ID to their partner and receive a message from their partner.
- For tasks with IDs greater than or equal to half of the total tasks, they
will receive a message from their partner and send their task ID.
- After communication, each task will print information about its partner
task.

The output will look something like this:

```
MASTER: Number of MPI tasks is: <numtasks>
Hello from task <taskid> on <hostname>!
Hello from task <taskid> on <hostname>!
...
Task <taskid1> is partner with <taskid2>
Task <taskid3> is partner with <taskid4>
...
```

`<numtasks>` will be replaced by the actual number of MPI tasks,
`<taskid>` will be replaced by the task's ID, `<hostname>` will be replaced
by the hostname of the system, `<taskid1>` and `<taskid2>` will be
replaced by the task IDs of paired tasks, and so on.
