HPC Lab
With OpenMP, forked threads have access to shared memory. For more
details, visit the OpenMP home page.
Exercise 1 in OpenMP:
The purpose of this code is to introduce the basic concepts of parallel
programming using OpenMP. It shows how to create a parallel region, how to
obtain the thread ID and the total number of threads, and how to execute
parallel tasks. This "Hello World" example serves as a starting point for
understanding parallelism with OpenMP.
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
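Only the headers of the listing survive in these notes. A minimal sketch of the body, reconstructed from the description above and the sample output below (it follows the classic OpenMP hello-world example, so the variable names `tid` and `nthreads` are assumptions), is:
```
int main(int argc, char *argv[])
{
    int nthreads, tid;

    /* Fork a team of threads; each thread gets private copies of tid and nthreads */
    #pragma omp parallel private(nthreads, tid)
    {
        tid = omp_get_thread_num();               /* this thread's ID */
        printf("Hello World from thread = %d\n", tid);

        if (tid == 0)                             /* only the master thread */
        {
            nthreads = omp_get_num_threads();     /* total threads in the team */
            printf("Number of threads = %d\n", nthreads);
        }
    } /* all threads join the master thread here */
    return 0;
}
```
With GCC this can be compiled with `gcc -fopenmp omp_hello.c -o omp_hello`, and the number of threads can be controlled through the `OMP_NUM_THREADS` environment variable.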
Running the program with four threads produces output such as:
```
Hello World from thread = 0
Hello World from thread = 1
Hello World from thread = 2
Hello World from thread = 3
Number of threads = 4
```
In this output:
- Each line corresponds to a thread printing "Hello World" along with its thread number
(`tid`).
- The master thread (thread with ID 0) additionally prints the total number of threads
(`nthreads`).
- The order in which threads print their "Hello World" messages may vary from run to run
because the threads in the parallel region execute concurrently. The master's "Number of
threads" line is printed inside the parallel region, right after its own greeting, so it may
appear before or after the other threads' messages.
If you run
int main()
{
    #pragma omp parallel for schedule(static,1)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
and
int main()
{
    #pragma omp parallel for
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
Result 1 (with `schedule(static,1)`): the iterations are handed out to the threads round-robin, one at a time, so thread t executes iterations t, t + nthreads, t + 2*nthreads, and so on.
Result 2 (no schedule clause): with the usual default, each thread receives one roughly equal, contiguous block of iterations.
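For example, assuming 4 threads, `schedule(static,1)` fixes the assignment of iterations to threads as follows (only the interleaving of the printed lines varies between runs):
```
thread 0: i = 0, 4, 8, 12, 16
thread 1: i = 1, 5, 9, 13, 17
thread 2: i = 2, 6, 10, 14, 18
thread 3: i = 3, 7, 11, 15, 19
```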
Static
#pragma omp parallel for schedule(static,chunk-size)
If you do not specify the chunk-size, OpenMP divides the iterations into chunks of
approximately equal size and assigns one contiguous chunk to each thread, in order (this
fixed, up-front assignment is what distinguishes the static schedule from the others). In the
for loop we discussed before, with 12 threads each thread handles 1-2 iterations; if you use
only 4 threads, each thread handles 5 iterations.
Result after using #pragma omp parallel for schedule(static) (with no chunk-size, each
thread receives one roughly equal, contiguous block of iterations):
If you specify a chunk-size, the iterations are divided into roughly iter_size / chunk_size
chunks.
In the following example,
int main()
{
    #pragma omp parallel for schedule(static, 3)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
the 20 iterations are divided into 7 chunks (6 chunks of 3 iterations and 1 chunk of 2
iterations); the result is:
But what if iter_size / chunk_size is larger than the number of threads available on your
machine, or than the number of threads you specified with
omp_set_num_threads(thread_num)?
The following example shows how OpenMP works under this kind of condition.
int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel for schedule(static, 3)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
Result:
OpenMP will still split the work into 7 chunks, but it distributes the chunks to the threads in
a round-robin (circular) order, as the following figure shows.
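Concretely, with 4 threads and a chunk size of 3, the static round-robin rule assigns the 20 iterations as:
```
chunk 0 (i = 0..2)   -> thread 0
chunk 1 (i = 3..5)   -> thread 1
chunk 2 (i = 6..8)   -> thread 2
chunk 3 (i = 9..11)  -> thread 3
chunk 4 (i = 12..14) -> thread 0
chunk 5 (i = 15..17) -> thread 1
chunk 6 (i = 18..19) -> thread 2
```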
Dynamic
#pragma omp parallel for schedule(dynamic,chunk-size)
OpenMP still splits the work into iter_size / chunk_size chunks, but it hands the chunks to
the threads dynamically, in no particular order: whichever thread finishes its current chunk
takes the next one.
If you run
int main()
{
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < 20; i++)
    {
        printf("Thread %d is running number %d\n", omp_get_thread_num(), i);
    }
    return 0;
}
Result:
You can see that in this run one thread (thread 1) took on 10 of the iterations while the others took only 0-1 each.
In OpenMP, scheduling refers to how loop iterations or tasks are divided among threads for
parallel execution. Two common scheduling strategies are dynamic scheduling and static
scheduling. Here's a comparison between the two:
1. Static Scheduling:
- In static scheduling, loop iterations are divided among threads before runtime, typically at
the beginning of the parallel region.
- The number of iterations assigned to each thread is determined statically based on the loop
iteration space and the number of threads.
- Workload distribution is done once, and each thread is assigned a fixed set of iterations to
execute.
- Static scheduling is beneficial when the workload of each iteration is roughly the same, and
there is little variation in execution time across iterations.
- It may lead to load imbalance if the workload of iterations varies significantly, as some
threads may finish their work much earlier than others.
2. Dynamic Scheduling:
- In dynamic scheduling, loop iterations are dynamically assigned to threads at runtime.
- The loop iterations are divided into chunks, and each thread takes a new chunk of work
when it finishes its previous chunk.
- This approach enables better load balancing because work distribution can adapt to runtime
conditions, such as varying execution times of iterations.
- Dynamic scheduling incurs overhead due to the runtime decision-making process and
synchronization between threads to acquire new work chunks.
- It is suitable for situations where the workload of iterations varies significantly or when the
execution time of iterations is unpredictable.
In summary, static scheduling divides loop iterations among threads before runtime, providing
simplicity and potentially better performance in cases of uniform workload. On the other hand,
dynamic scheduling assigns iterations dynamically at runtime, offering better load balancing at
the cost of additional overhead. The choice between static and dynamic scheduling depends on
the characteristics of the workload and the desired trade-offs between simplicity and load
balancing.
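As an illustration of this trade-off, the following sketch (not part of the lab handout; the triangular inner loop is just an artificial way to make iterations cost different amounts of work) runs a loop whose iteration i does work proportional to i. With `schedule(static)` the threads that receive the later iterations finish last, while `schedule(dynamic, 1)` lets idle threads keep grabbing new iterations:
```
#include <omp.h>
#include <stdio.h>

int main(void)
{
    double total = 0.0;

    /* Iteration i performs i * 100000 additions, so the cost per iteration
     * grows linearly. Switch the schedule clause to static to compare how
     * evenly the threads finish. */
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int i = 0; i < 32; i++)
    {
        double local = 0.0;
        for (long k = 0; k < (long)i * 100000L; k++)
            local += 1e-6;                 /* artificial, uneven work */
        total += local;
        printf("Thread %d finished iteration %d\n", omp_get_thread_num(), i);
    }

    printf("total = %f\n", total);
    return 0;
}
```
Timing the two variants (for example with `omp_get_wtime()`) makes the load-balancing difference visible.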
Exercise 2
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define CHUNKSIZE 10
#define N 100
/* Some initializations */
chunk = CHUNKSIZE;
#pragma omp parallel shared (a,b,c,nthreads,chunk) private(i,tid)
tid = omp_get_thread_num();
if (tid == 0)
nthreads = omp_get_num_threads();
printf("Thread %d starting...\n",tid);
This code is another OpenMP example demonstrating parallelization of a simple task. Here's what
it does:
1. Initialization: It initializes three arrays a, b, and c, each of size N=100. Arrays a and
b are filled with values based on the index.
2. Parallel Region: It enters a parallel region using #pragma omp parallel . This
directive spawns multiple threads to execute the enclosed code block in parallel.
3. Thread Information: Each thread obtains its thread ID using
omp_get_thread_num() and prints it. The master thread (thread with ID 0) also
prints the total number of threads using omp_get_num_threads().
4. Parallel Loop: Inside the parallel region, a loop is parallelized using #pragma omp
for. This loop computes element-wise addition of arrays a and b, storing the result in
array c.
5. Dynamic Scheduling: The loop is scheduled dynamically using schedule(dynamic,
chunk). This means that iterations are dynamically assigned to threads at runtime, and
each thread processes a chunk of chunk iterations before requesting more work.
6. Printing Results: Each thread prints the computed value of c[i] for the iterations it
processes.
7. End of Parallel Section: After the loop, the parallel section ends, and all threads
synchronize implicitly.
The purpose of this code is to demonstrate how to parallelize a simple computation task
(element-wise addition of two arrays) using OpenMP. It showcases how to utilize parallel loops
and dynamic scheduling to distribute work among threads efficiently. Additionally, it provides
insights into thread management and synchronization within a parallel region.
The output in the case of dynamic scheduling:
The output of this code will display the result of parallel addition of elements from arrays `a` and `b` into array `c`, along with some
diagnostic information about the threads. Since the code employs OpenMP parallelism, the order of thread execution and the
scheduling of iterations may vary. Here's an example of what the output might look like:
```
Thread 0 starting...
Thread 1 starting...
Thread 2 starting...
Thread 3 starting...
Number of threads = 4
Thread 3: c[0]= 0.000000
Thread 3: c[1]= 2.000000
Thread 3: c[2]= 4.000000
...
Thread 3: c[9]= 18.000000
Thread 0: c[10]= 20.000000
Thread 0: c[11]= 22.000000
...
Thread 0: c[19]= 38.000000
Thread 1: c[20]= 40.000000
...
Thread 1: c[29]= 58.000000
Thread 2: c[30]= 60.000000
...
Thread 0: c[90]= 180.000000
Thread 0: c[91]= 182.000000
...
Thread 0: c[99]= 198.000000
```
In this output:
- Each thread prints a message indicating its thread number (`tid`) when it starts.
- The master thread (thread with ID 0) additionally prints the total number of threads (`nthreads`) after all threads have started.
- Each thread then computes and prints the elements of array `c` that it has calculated; with `schedule(dynamic, 10)`, each block of 10 consecutive elements is handled by a single thread.
- Which thread gets which chunk, and the order in which the output lines appear, may vary from run to run due to the dynamic scheduling and the concurrent execution.
The output in the case of static scheduling:
If we replace the dynamic scheduling with static scheduling in the code, the output might change due to the different scheduling
behavior. With static scheduling, loop iterations are statically divided among threads before runtime, typically at the beginning of the
parallel region. Each thread is assigned a fixed set of iterations to execute.
Here's what the modified output might look like with static scheduling:
```
Thread 0 starting...
Thread 1 starting...
Thread 2 starting...
Thread 3 starting...
Number of threads = 4
Thread 0: c[0]= 0.000000
Thread 0: c[1]= 2.000000
...
Thread 0: c[9]= 18.000000
Thread 1: c[10]= 20.000000
...
Thread 1: c[19]= 38.000000
Thread 2: c[20]= 40.000000
...
Thread 2: c[29]= 58.000000
Thread 3: c[30]= 60.000000
...
Thread 3: c[39]= 78.000000
Thread 0: c[40]= 80.000000
...
Thread 0: c[49]= 98.000000
Thread 1: c[50]= 100.000000
...
Thread 3: c[70]= 140.000000
...
Thread 0: c[80]= 160.000000
...
Thread 0: c[89]= 178.000000
Thread 1: c[90]= 180.000000
...
Thread 1: c[99]= 198.000000
```
In this output:
- Each thread still starts and prints its thread number (`tid`) as before.
- The master thread (thread with ID 0) still prints the total number of threads (`nthreads`) as before.
- However, with `schedule(static, chunk)` the chunks of 10 iterations are dealt out to the threads in round-robin order before the loop executes, so the mapping of iterations to threads is fixed: thread 0 always handles c[0..9], c[40..49] and c[80..89], thread 1 handles c[10..19], c[50..59] and c[90..99], and so on. Only the interleaving of the printed lines can vary between runs.
Exercise 3
/******************************************************************************
* FILE: omp_orphan.c
* DESCRIPTION:
*   OpenMP Example - Parallel region with an orphaned directive - C/C++ Version
*   This example demonstrates a dot product being performed by an orphaned
*   loop reduction construct. Scoping of the reduction variable is critical.
* AUTHOR: Blaise Barney 5/99
* LAST REVISED: 06/30/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define VECLEN 100
float a[VECLEN], b[VECLEN], sum;

float dotprod()
{
    int i, tid;
    tid = omp_get_thread_num();
    #pragma omp for reduction(+:sum)    /* orphaned work-sharing construct */
    for (i = 0; i < VECLEN; i++)
    {
        sum = sum + (a[i] * b[i]);
        printf(" tid= %d i=%d\n", tid, i);
    }
    return sum;
}

int main(void)
{
    int i;
    for (i = 0; i < VECLEN; i++)
        a[i] = b[i] = 1.0 * i;          /* initialize the vectors */
    sum = 0.0;
    #pragma omp parallel
    dotprod();                          /* called by every thread of the team */
    printf("Sum = %f\n", sum);
    return 0;
}
This code is an example of using OpenMP for parallel computation of the dot
product of two vectors. Here's a breakdown of its purpose and functionality:
1. **Initialization**:
- Two arrays `a` and `b`, both of length `VECLEN (100)`, are initialized
with values based on their index.
- The variable `sum` is initialized to `0.0`. This variable will accumulate
the dot product of `a` and `b`.
2. **Parallel Region**:
- The `main()` function contains a parallel region defined by `#pragma omp
parallel`.
- Inside this parallel region, the function `dotprod()` is called by each
thread.
3. **Orphaned Work-Sharing Loop with Reduction**:
   - Inside `dotprod()`, the orphaned `#pragma omp for reduction(+:sum)` directive splits
the loop iterations among the threads of the enclosing parallel region, and each
thread's partial sum is combined into `sum` at the end of the loop.
4. **Print Statements**:
   - Within the loop in `dotprod()`, each thread prints its thread ID (`tid`)
and the index (`i`) it's currently processing.
5. **Summation Output**:
- After the parallel region, the `main()` function prints out the computed
sum, which represents the dot product of vectors `a` and `b`.
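As a quick sanity check (assuming the initialization a[i] = b[i] = i used in the reconstructed listing above), the dot product equals the sum of i² for i = 0..99, so the final line of output should read:
```
Sum = 328350.000000
```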
Exercise 1 in MPI:
/******************************************************************************
* FILE: mpi_hello.c
* DESCRIPTION:
*   MPI tutorial example code: Simple hello world program
* AUTHOR: Blaise Barney
* LAST REVISED: 03/05/10
******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD,&taskid);
MPI_Get_processor_name(hostname, &len);
printf ("Hello from task %d on %s!\n", taskid, hostname);
if (taskid == MASTER)
printf("MASTER: Number of MPI tasks is: %d\n",numtasks);
MPI_Finalize();
}
This code is a simple "Hello World" program written using MPI (Message
Passing Interface), which is a standard for parallel programming. Here's a
breakdown of its purpose and functionality:
1. **Initialization**:
- The program starts by initializing MPI using `MPI_Init(&argc, &argv)`. This
call initializes the MPI execution environment.
2. **Process Information**:
- Each process obtains the total number of MPI tasks with `MPI_Comm_size`, its own
rank (`taskid`) with `MPI_Comm_rank`, and the name of the host it is running on with
`MPI_Get_processor_name`.
3. **Printing Greetings**:
- Each MPI process prints a "Hello" message, indicating its task ID and the
hostname it's running on.
- If the current process is the master process (identified by `taskid ==
MASTER`), it also prints the total number of MPI tasks.
4. **Finalization**:
- Finally, MPI is finalized using `MPI_Finalize()`, which cleans up the MPI
environment before program termination.
The purpose of this code is to illustrate the basic structure of an MPI program
and demonstrate how to initialize MPI, obtain process information, and
perform simple parallel output. It serves as a starting point for understanding
MPI parallel programming and how processes interact in a distributed-memory
computing environment.
The output of the code will depend on the number of MPI tasks (processes)
and the hostname of the system where it is executed.
Assuming the code is executed with multiple MPI tasks, here's an example of
the expected output:
```
Hello from task 0 on <hostname>!
Hello from task 1 on <hostname>!
...
Hello from task <numtasks-1> on <hostname>!
MASTER: Number of MPI tasks is: <numtasks>
```
- Each MPI task will print its task ID (`taskid`) and the hostname of the
system (`hostname`).
- The master task (with `taskid == MASTER`, usually 0) will also print the
total number of MPI tasks (`numtasks`).
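As a usage note (assuming a typical MPI installation such as Open MPI or MPICH, which provides the `mpicc` compiler wrapper and the `mpirun` launcher), the program can be built with `mpicc mpi_hello.c -o mpi_hello` and started with, for example, `mpirun -np 4 ./mpi_hello` to launch four tasks.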
Exercise 2
/******************************************************************************
* FILE: mpi_helloBsend.c
* DESCRIPTION:
*   MPI tutorial example code: Simple hello world program that uses blocking
*   send/receive routines.
* AUTHOR: Blaise Barney
* LAST REVISED: 06/08/15
******************************************************************************/
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
/* ... the program quits here if the number of tasks is not even ... */
else {
    if (taskid == MASTER)
        printf("MASTER: Number of MPI tasks is: %d\n", numtasks);
    MPI_Get_processor_name(hostname, &len);
    printf("Hello from task %d on %s!\n", taskid, hostname);
    /* ... blocking send/receive with the partner task ... */
}
MPI_Finalize();
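The listing above is only an excerpt. A complete sketch that matches the description and the expected output below (the exact pairing scheme, in which each task in the first half of the ranks exchanges task IDs with a task in the second half, is an assumption; the message passing uses standard blocking MPI_Send/MPI_Recv) could look like this:
```
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define MASTER 0

int main(int argc, char *argv[])
{
    int numtasks, taskid, len, partner, message;
    char hostname[MPI_MAX_PROCESSOR_NAME];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

    /* An even number of tasks is required so that every task has a partner */
    if (numtasks % 2 != 0) {
        if (taskid == MASTER)
            printf("Quitting. Need an even number of tasks: numtasks=%d\n", numtasks);
    }
    else {
        if (taskid == MASTER)
            printf("MASTER: Number of MPI tasks is: %d\n", numtasks);

        MPI_Get_processor_name(hostname, &len);
        printf("Hello from task %d on %s!\n", taskid, hostname);

        /* Pair task t in the first half with task t + numtasks/2 (assumed scheme)
         * and exchange task IDs with blocking send/receive */
        if (taskid < numtasks / 2) {
            partner = numtasks / 2 + taskid;
            MPI_Send(&taskid, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
            MPI_Recv(&message, 1, MPI_INT, partner, 1, MPI_COMM_WORLD, &status);
        }
        else {
            partner = taskid - numtasks / 2;
            MPI_Recv(&message, 1, MPI_INT, partner, 1, MPI_COMM_WORLD, &status);
            MPI_Send(&taskid, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
        }

        printf("Task %d is partner with %d\n", taskid, message);
    }

    MPI_Finalize();
    return 0;
}
```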
1. **Initialization**: MPI is initialized with `MPI_Init`, and each task obtains its rank
(`taskid`) and the total number of tasks (`numtasks`).
2. **Even-Task Check**: The program requires an even number of tasks so that every task
can be paired with a partner; otherwise it prints a message and quits.
3. **Partner Exchange**: Each task is paired with a partner task, and the partners exchange
their task IDs using blocking `MPI_Send` and `MPI_Recv` calls.
4. **Hello Message**: Each MPI task prints a "Hello" message, indicating its
task ID and the hostname it's running on.
Overall, the purpose of this code is to illustrate basic message passing and
communication between MPI tasks using blocking send and receive routines.
It demonstrates how to pair up tasks and exchange information in a
distributed-memory computing environment.
The output of the code will depend on the number of MPI tasks (processes)
and the hostname of the system where it is executed. Here's what you can
expect:
1. If the number of MPI tasks (`numtasks`) is not even, the program will print
a message indicating that it's quitting because an even number of tasks is
required.
2. If the number of tasks is even, each task prints its greeting, the master reports the task
count, and every task reports its partner:
```
MASTER: Number of MPI tasks is: <numtasks>
Hello from task <taskid> on <hostname>!
Hello from task <taskid> on <hostname>!
...
Task <taskid1> is partner with <taskid2>
Task <taskid3> is partner with <taskid4>
...
```