
NSCET

E-LEARNING
PRESENTATION
LISTEN … LEARN… LEAD…
COMPUTER SCIENCE AND ENGINEERING

IV YEAR / VIII SEMESTER

CS6801 – MULTICORE ARCHITECTURES AND PROGRAMMING

P. MAHALAKSHMI, M.E., MISTE
ASSISTANT PROFESSOR
Nadar Saraswathi College of Engineering & Technology,
Vadapudupatti, Annanji (po), Theni – 625531.
UNIT IV
DISTRIBUTED MEMORY
PROGRAMMING WITH MPI
Introduction
Distributed Memory
A distributed-memory system consists of a collection of core-memory pairs connected by a
network, and the memory associated with a core is directly accessible only to that core.

In message-passing programs, a program running on one core-memory pair is usually
called a process, and two processes can communicate by calling functions: one process
calls a send function and the other calls a receive function.



MPI
 The message-passing implementation we use is called MPI, which is an abbreviation of Message-
Passing Interface.
 MPI is not a new programming language. It defines a library of functions that can be called from C,
C++, and Fortran programs.
 MPI also provides “global” communication functions that can involve more than two processes. These functions
are called collective communications.
 MPI manages a parallel computation on a distributed memory system.
 The user arranges an algorithm so that pieces of work can be carried out as simultaneous but
separate processes, and expresses this in a C or FORTRAN program that includes calls to MPI
functions.
At runtime, MPI:
o Distributes a copy of the program to each processor;
o Assigns each process a distinct ID;
o Synchronizes the start of the programs;
o Transfers messages between the processes;
o Manages an orderly shutdown of the programs at the end.



Topic
MPI program execution – MPI constructs – libraries
Getting Started
Consider “hello, world” program in C language
#include <stdio.h>
int main(void)
{
printf("hello, worldnn");
return 0;
}
 Let’s write a program similar to “hello, world” that makes some use of MPI.
 Instead of having each process simply print a message, designate one process to do the output,
and the other processes will send it messages, which it will print.
 In parallel programming, it’s common (one might say standard) for the processes to be identified
by nonnegative integer ranks. So if there are p processes, the processes will have ranks 0, 1,
2, ..., p−1. For parallel “hello, world,” let’s make process 0 the designated process, and the other
processes will send it messages.



Compilation and Execution
 The details of compiling and running the program depend on the system, so you may need to
check with a local expert.
 However, when we need to be explicit, we’ll assume that we’re using a text editor to
write the program source, and the command line to compile and run. Many systems
use a command called mpicc for compilation.
$ mpicc -g -Wall -o mpi_hello mpi_hello.c
 Typically, mpicc is a script that’s a wrapper for the C compiler. A wrapper script is a
script whose main purpose is to run some program.
 In this case, the program is the C compiler. However, the wrapper simplifies the
running of the compiler by telling it where to find the necessary header files and which
libraries to link with the object file.



MPI program to print Hello World using C language
/* mpi-hello.c */
#include <stdio.h>
#include <string.h>
#include <mpi.h>
int main( int argc, char *argv[] )
{
int rank, size, len;
char hostname[MPI_MAX_PROCESSOR_NAME];
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
MPI_Comm_size( MPI_COMM_WORLD, &size );
MPI_Get_processor_name( hostname, &len );
printf("Greetings from process %d of %d running on %s\n", rank, size, hostname);
MPI_Finalize( );
return 0;
}



Compilation:
mpicc -Wall mpi_hello.c -o mpi_hello
Execution (8 processes on localhost):
mpirun -n 8 ./mpi_hello
Execution (two processes on host “foo” and one on host “bar”)
mpirun -H foo,foo,bar ./mpi_hello
Output
$ mpirun -n 8 ./mpi_hello
Greetings from process 7 of 8 running on wopr
Greetings from process 5 of 8 running on wopr
Greetings from process 0 of 8 running on wopr
Greetings from process 3 of 8 running on wopr
Greetings from process 6 of 8 running on wopr
Greetings from process 4 of 8 running on wopr
Greetings from process 1 of 8 running on wopr
Greetings from process 2 of 8 running on wopr



MPI programs
 The first thing to observe is that this is a C program.
 For example, it includes the standard C header files stdio.h and string.h.
 It also has a main function just like any other C program. However, there are many
parts of the program which are new.
 The program also includes the mpi.h header file. This contains prototypes of MPI functions, macro
definitions, type definitions, and so on; it contains all the definitions and declarations
needed for compiling an MPI program.
 The second thing to observe is that all of the identifiers defined by MPI start with the
string MPI_. The first letter following the underscore is capitalized for function names
and MPI-defined types. All of the letters in MPI-defined macros and constants are
capitalized, so there’s no question about what is defined by MPI and what’s defined by
the user program.



MPI_Init
 The call to MPI_Init tells the MPI system to do all of the necessary setup.
 For example, it might allocate storage for message buffers, and it might decide which
process gets which rank. As a rule of thumb, no other MPI functions should be called
before the program calls MPI_Init.
Syntax
int MPI_Init( int* argc_p /* in/out */, char*** argv_p /* in/out */);
The arguments, argc_p and argv_p, are pointers to the arguments to main, argc, and argv.
However, when our program doesn’t use these arguments, we can just pass NULL for both.
Like most MPI functions, MPI_Init returns an int error code, and in most cases we’ll ignore
these error codes.
MPI_Finalize
 The call to MPI_Finalize tells the MPI system that we’re done using MPI, and that any
resources allocated for MPI can be freed.
Syntax
int MPI_Finalize(void);



In general, no MPI functions should be called after the call to MPI_Finalize. Thus, a typical
MPI program has the following basic outline:
#include <mpi.h>
...
int main(int argc, char* argv[]) {
...
/* No MPI calls before this */
MPI_Init(&argc, &argv);
...
MPI_Finalize( );
/* No MPI calls after this */
...
return 0;
}
It’s not necessary to pass pointers to argc and argv to MPI_Init. It’s also not necessary that
the calls to MPI_Init and MPI_Finalize be in main.



Communicators, MPI_Comm_size and MPI_Comm_rank
In MPI a communicator is a collection of processes that can send messages to each other. One of the
purposes of MPI_Init is to define a communicator that consists of all of the processes started by the
user when she started the program.
This communicator is called MPI_COMM_WORLD. The function calls are getting information
about MPI_COMM_WORLD.
Syntax
int MPI_Comm_size(
MPI_Comm comm /* in */,
int* comm_sz_p /* out */ );
int MPI_Comm_rank(
MPI_Comm comm /* in */,
int* my_rank_p /* out */);
For both functions, the first argument is a communicator and has the special type defined by MPI for
communicators, MPI_Comm. MPI_Comm_size returns in its second argument the number of
processes in the communicator, and MPI_Comm_rank returns in its second argument the calling
process’ rank in the communicator. We often use the variable comm_sz for the number of processes
in MPI_COMM_WORLD, and the variable my_rank for the process rank.



SPMD programs
 Most MPI programs are written in this way. That is, a single program is written so that
different processes carry out different actions, and this is achieved by simply having
the processes branch on the basis of their process rank.
 Recall that this approach to parallel programming is called single program, multiple
data, or SPMD. The if-else branch on the process rank makes the program SPMD.
Communication
Each process, other than process 0, creates a message it will send to process 0. Process 0,
on the other hand, simply prints its message using printf, and then uses a for loop to
receive and print the messages sent by processes 1, 2, ..., comm_sz−1, as sketched below.
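A minimal sketch of this greetings pattern is shown below (it is not the official course listing); it uses the MPI_Send and MPI_Recv functions introduced in the next topic, and it assumes each greeting fits in a 100-character buffer.
/* greetings.c -- sketch of the SPMD greetings pattern */
#include <stdio.h>
#include <string.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    char greeting[100];   /* message buffer */
    int comm_sz;          /* number of processes */
    int my_rank;          /* my process rank */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (my_rank != 0) {
        /* every process except 0 builds a message and sends it to process 0 */
        sprintf(greeting, "Greetings from process %d of %d!", my_rank, comm_sz);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        /* process 0 prints its own message, then receives and prints the others in rank order */
        printf("Greetings from process %d of %d!\n", my_rank, comm_sz);
        for (int q = 1; q < comm_sz; q++) {
            MPI_Recv(greeting, 100, MPI_CHAR, q, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", greeting);
        }
    }
    MPI_Finalize();
    return 0;
}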



Topic
MPI send and receive
Introduction
 Blocking communication is done using MPI_Send( ) and MPI_Recv( ).
 These functions do not return (i.e., they block) until the communication is finished.
 This means that the buffer passed to MPI_Send( ) can be reused, either because MPI
saved it somewhere, or because it has been received by the destination.
 Similarly, MPI_Recv( ) returns when the receive buffer has been filled with valid data.
MPI_Send
It performs a blocking send.
int MPI_Send(
const void *buf,
int count,
MPI_Datatype datatype,
int dest,
int tag,
MPI_Comm comm)



 The first three arguments data, count and data type determine the contents of the
message.
 The remaining arguments destination, tag and communicator determine the
destination of the message.
 The first argument data is a pointer to the block of memory containing the contents of
the message.
 The second and third arguments count and data type determine the amount of data to
be sent.
 The fourth argument destination specifies the rank of the process that should receive
the message.
 The fifth argument tag is a nonnegative int. It can be used to distinguish messages that
are otherwise identical.
 The final argument to MPI_Send is a communicator. All MPI functions that involve
communication have a communicator argument. One of the most important purposes
of communicators is to specify communication universes; recall that a communicator is
a collection of processes that can send messages to each other.



Example
int buf = 123456;
MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Recv
 It performs a blocking receive for a message
int MPI_Recv(
void *buf,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm comm,
MPI_Status *status )



 The first argument data is a pointer to the block of memory containing the contents of
the message.
 The second and third arguments count and data type determine the amount of data to
be received.
 The source argument specifies the process from which the message should be received.
 The tag argument should match the tag argument of the message being sent.
 The communicator argument must match the communicator used by the sending
process.
 The status argument can be ignored by passing the MPI_STATUS_IGNORE constant.
Example
MPI_Recv( &buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status );



Example Program
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, count, tag=1;
char inmsg, outmsg='x';
MPI_Status Stat;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
dest = 1;
source = 1;
rc= MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
rc= MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD,
&Stat);
}
else if (rank == 1)
{
dest = 0;
source = 0;
rc= MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag,
MPI_COMM_WORLD, &Stat);
rc= MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag,
MPI_COMM_WORLD);
}
rc= MPI_Get_count(&Stat, MPI_CHAR, &count);
printf("Task%d: Received %d char(s) from task %d with tag %d \n",
rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG);
MPI_Finalize( );
return 0;
}



The status argument
 The MPI type MPI_Status is a struct with at least the three members MPI_SOURCE,
MPI_TAG and MPI_ERROR.
 If we pass an MPI_Status structure to the MPI_Recv function, it will be populated with
additional information about the receive operation after it completes. The three
primary pieces of information include:
o The amount of data in the message
o The sender of the message
o The tag of the message
 Suppose our program contains the definition
MPI_Status status;
 Then after a call to MPI_Recv in which &status is passed as the last argument, we can
determine the sender and tag by examining the two members:
i) status.MPI_SOURCE ii) status.MPI_TAG
 The amount of data does not have a predefined element in the status structure. Instead
we have to find out the length of the message with MPI_Get_count.



int MPI_Get_count(
MPI_Status* status,
MPI_Datatype datatype,
int* count
)
 In MPI_Get_count the user passes the MPI_Status structure, the data type of the message and count
is returned.
 The count variable is the total number of data type elements that were received.
 It turns out that MPI_Recv can take MPI_ANY_SOURCE for the rank of the sender and MPI_ANY_TAG
for the tag of the message.
 For this case the MPI_Status structure is the only way to find out the actual sender and tag of the
message.
 Furthermore, MPI_Recv is not guaranteed to receive the entire amount of elements passed as the
argument to the function call. Instead, it receives the amount of elements that were sent to it (and
returns an error if more elements were sent than the desired receive amount).
 The MPI_Get_count function is used to determine the actual receive amount.
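A minimal sketch (not from the original slides) combining MPI_ANY_SOURCE, MPI_ANY_TAG, and MPI_Get_count; the 100-element buffer and the rule that process q sends q ints with tag q are illustrative assumptions.
/* any_source.c -- sketch: receive from any sender and inspect the envelope */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank != 0) {
        int data[100];
        int n = (rank < 100) ? rank : 100;   /* process q sends q ints (capped at 100) */
        for (int i = 0; i < n; i++) data[i] = rank;
        MPI_Send(data, n, MPI_INT, 0, rank, MPI_COMM_WORLD);   /* tag = rank */
    } else {
        int recvbuf[100], count;
        MPI_Status status;
        for (int q = 1; q < size; q++) {
            /* accept a message from any sender with any tag */
            MPI_Recv(recvbuf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);   /* number of ints actually received */
            printf("Received %d ints from process %d with tag %d\n",
                   count, status.MPI_SOURCE, status.MPI_TAG);
        }
    }
    MPI_Finalize();
    return 0;
}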



Topic
Point-to-point and Collective
communication
Point to Point Communication
 A point-to-point communication is a dedicated communication link between two systems
or processes. Think of a wire that directly connects two systems. The systems use that
wire exclusively to communicate.
 MPI point-to-point communication sends messages between two different MPI
processes. One process performs a send operation while the other performs a matching
receive. MPI guarantees that every message will arrive intact without errors.



Each message (envelope) contains
i) The actual data that is to be sent
ii) The data type of each element of data
iii) The number of elements the data consists of
iv) An identification number for the message (tag)
v) The ranks of the source and destination process
 MPI_Send() and MPI_Recv() are commonly used blocking methods for sending
messages between two MPI processes.
 Blocking means that the sending process will wait until the complete message has been
correctly sent and the receiving process will block while waiting to correctly receive the
complete message.
 Message size is determined by the count of an MPI data type and the type of the data



Send operation
MPI_Send(buf, count, datatype, dest, tag, comm)
Parameters
buf - The data that is sent
count - Number of elements in buffer
datatype - Type of each element in buf (see later slides)
dest - The rank of the receiver
tag - An integer identifying the message
comm - Communicator
error - The function returns an error value; in C/C++ it’s the return value of the
function, and in Fortran an additional output parameter



Receive operation
MPI_Recv(buf, count, datatype, source, tag, comm, status)
Parameters
buf - Buffer for storing received data
count - Number of elements in buffer, not the number of elements that are actually
received
datatype - Type of each element in buf
source - Sender of the message
tag - Number identifying the message.
comm - Communicator
status - Information on the received message
error - As for send operation



Send-receive pairs illustrated by an example
 A point-to-point communication in the message passing model requires both the initiator and the
target of the message to explicitly participate in the communication of messages.
 Because a single executable from the same source code is launched on each process, the role of
each process is typically selected by branching on the process rank (for example, with if/else statements).
 The process with rank 0 comes up with a number, which is then passed to the next rank, and from
then on to the next rank, and so on, as sketched below.
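A minimal sketch of this passing pattern (not from the original slides); the starting value 42 and the variable names are illustrative assumptions, and the last rank simply keeps the number.
/* ring.c -- sketch: process 0 comes up with a number and passes it along */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size, number;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        number = 42;                      /* process 0 comes up with a number */
    } else {
        /* every other process first receives the number from its predecessor */
        MPI_Recv(&number, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process %d received %d from process %d\n", rank, number, rank - 1);
    }
    if (rank < size - 1)
        /* ...and passes it on to the next rank (the last rank keeps it) */
        MPI_Send(&number, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}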



Collective Communication
 Collective communication is defined as communication that involves a group of
processes.
 Collective communication is over all of the processes in that group.
 Programs using only collective communication can be easier to understand
i) Every program does roughly the same thing
ii) No “strange” communication patterns
 Algorithms for collective communication are subtle and tricky; this encourages the use of
communication algorithms devised by experts.
Tree Structured Communication
 The Collective communication functions make use of tree structure to calculate and
communicate the result.
 The pattern by which processes join the computation is tree based. The depth of a
balanced binary tree with N nodes is ⌈log2 N⌉.
 Tree-based communication is very common when values are being distributed or
collected as it makes optimal use of communication resources.



Fig: Tree Structured global sum
In the above diagram, initially processes 1, 3, 5 and 7 send their values to
processes 0, 2, 4 and 6 respectively. Then processes 0, 2, 4 and 6 add the received values to
their original values and the process is repeated twice:
1. a. Processes 2 and 6 send their new values to processes 0 and 4 respectively.
b. Processes 0 and 4 add the received values into their new values.
2. a. Process 4 sends its newest value to process 0.
b. Process 0 adds the received value to its newest value.
This pairwise pattern is sketched in the code below.
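The following is a minimal sketch of this tree-structured global sum written with point-to-point calls (the local value my_rank + 1 is an illustrative assumption; in practice the collective function MPI_Reduce, described later, does this for us).
/* tree_sum.c -- sketch: tree-structured global sum with point-to-point calls */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int my_rank, comm_sz;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    int sum = my_rank + 1;                /* each process's local value */
    for (int step = 1; step < comm_sz; step *= 2) {
        if (my_rank % (2 * step) != 0) {
            /* "odd" participant at this level: send the partial sum and drop out */
            MPI_Send(&sum, 1, MPI_INT, my_rank - step, 0, MPI_COMM_WORLD);
            break;
        } else if (my_rank + step < comm_sz) {
            /* "even" participant: receive the partner's partial sum and add it in */
            int received;
            MPI_Recv(&received, 1, MPI_INT, my_rank + step, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += received;
        }
    }
    if (my_rank == 0)
        printf("Tree-structured global sum = %d\n", sum);
    MPI_Finalize();
    return 0;
}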



Fig: An alternative tree structured global sum
a) MPI_Bcast
 During a broadcast one process sends the same data to all processes in a communicator.
One of the main uses of broadcasting is to send out user input to a parallel program or
send out configuration parameters to all processes.



Syntax
int MPI_Bcast (
void* data,
int count,
MPI_Datatype datatype,
int root,
MPI_Comm communicator)
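A minimal usage sketch (not from the original slides): process 0 broadcasts a single double; the value 3.14159 and the variable names are illustrative assumptions.
/* bcast_example.c -- sketch: process 0 broadcasts one double to every process */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank;
    double value = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        value = 3.14159;                  /* only the root has the data initially */
    MPI_Bcast(&value, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* root = 0 */
    printf("Process %d now has value %f\n", rank, value);
    MPI_Finalize();
    return 0;
}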



b) MPI_Scatter
 MPI_Scatter is a collective routine that is very similar to MPI_Bcast.
 The primary difference between MPI_Bcast and MPI_Scatter is small but important.
 MPI_Bcast sends the same piece of data to all processes while MPI_Scatter sends
chunks of an array to different processes.
Syntax
int MPI_Scatter(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator)
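A minimal usage sketch (not from the original slides): the root scatters one int to each process; the array contents are illustrative assumptions.
/* scatter_example.c -- sketch: the root scatters one int to each process */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size, my_value;
    int* sendbuf = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        /* only the root needs the full send buffer */
        sendbuf = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) sendbuf[i] = 100 + i;
    }
    /* each process receives one element of the root's array */
    MPI_Scatter(sendbuf, 1, MPI_INT, &my_value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("Process %d received %d\n", rank, my_value);
    if (rank == 0) free(sendbuf);
    MPI_Finalize();
    return 0;
}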



c) MPI_Gather
 MPI_Gather is the inverse of MPI_Scatter. Instead of spreading elements from one
process to many processes, MPI_Gather takes elements from many processes and gathers
them to one single process.
Syntax
int MPI_Gather (
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator)
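A minimal usage sketch (not from the original slides): every process contributes one int, which the root gathers in rank order; the local values are illustrative assumptions.
/* gather_example.c -- sketch: every process sends one int to the root */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    int* recvbuf = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int my_value = rank * rank;           /* each process's local contribution */
    if (rank == 0)
        recvbuf = malloc(size * sizeof(int));   /* only the root needs room for all values */
    MPI_Gather(&my_value, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("Value from process %d: %d\n", i, recvbuf[i]);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}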



d) MPI_Allgather
 Given a set of elements distributed across all processes, MPI_Allgather will gather all of
the elements to all the processes. In the most basic sense, MPI_Allgather is
an MPI_Gather followed by an MPI_Bcast.
Syntax
int MPI_Allgather (
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
MPI_Comm communicator )
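A minimal usage sketch (not from the original slides): every process contributes one int and every process receives the full array; the local values are illustrative assumptions.
/* allgather_example.c -- sketch: every process ends up with every value */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int my_value = 10 * rank;             /* local contribution */
    int* all_values = malloc(size * sizeof(int));   /* every process needs the full buffer */
    MPI_Allgather(&my_value, 1, MPI_INT, all_values, 1, MPI_INT, MPI_COMM_WORLD);
    printf("Process %d sees:", rank);
    for (int i = 0; i < size; i++) printf(" %d", all_values[i]);
    printf("\n");
    free(all_values);
    MPI_Finalize();
    return 0;
}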



e) MPI_Reduce
 Similar to MPI_Gather, MPI_Reduce takes an array of input elements on each process
and returns an array of output elements to the root process.
 The output elements contain the reduced result.
Syntax
int MPI_Reduce (
void* send_data,
void* recv_data,
int count,
MPI_Datatype datatype,
MPI_Op op,
int root,
MPI_Comm communicator )
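A minimal usage sketch (not from the original slides): each process contributes one int and the root receives the sum; the local values are illustrative assumptions.
/* reduce_example.c -- sketch: sum one value per process onto the root */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, size, global_sum = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int local_value = rank + 1;           /* local contributions 1, 2, ..., comm_sz */
    /* combine the local values with MPI_SUM; only process 0 gets the result */
    MPI_Reduce(&local_value, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Sum of 1..%d = %d\n", size, global_sum);
    MPI_Finalize();
    return 0;
}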



f) MPI_Allreduce
 Many parallel applications will require accessing the reduced results across all
processes rather than the root process.
 Just as MPI_Allgather complements MPI_Gather, MPI_Allreduce will
reduce the values and distribute the results to all processes.
Syntax
int MPI_Allreduce (
void* send_data,
void* recv_data,
int count,
MPI_Datatype datatype,
MPI_Op op,
MPI_Comm communicator)
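A minimal usage sketch (not from the original slides): each process contributes one int and every process receives the global maximum; the local values are illustrative assumptions.
/* allreduce_example.c -- sketch: every process obtains the global maximum */
#include <stdio.h>
#include <mpi.h>
int main(int argc, char* argv[]) {
    int rank, global_max;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int local_value = (rank * 7) % 5;     /* some arbitrary local value */
    /* like MPI_Reduce with MPI_MAX, but the result is delivered to every process */
    MPI_Allreduce(&local_value, &global_max, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
    printf("Process %d: global maximum is %d\n", rank, global_max);
    MPI_Finalize();
    return 0;
}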



Topic
MPI derived data types
Introduction
 In MPI, a derived datatype can be used to represent any collection of data items in
memory by storing both the types of the items and their relative locations in memory.
 The idea here is that if a function that sends data knows the types and the relative
locations in memory of a collection of data items, it can collect the items from memory
before they are sent.
 Similarly, a function that receives data can distribute the items into their correct
destinations in memory when they’re received.
Example
In our trapezoidal rule program we needed to call MPI_Bcast three times: once for the left
endpoint a, once for the right endpoint b, and once for the number of trapezoids n. As an
alternative, we could build a single derived datatype that consists of two doubles and
one int. If we do this, we’ll only need one call to MPI_Bcast. On process 0, a, b, and n will be
sent with the one call, while on the other processes, the values will be received with the
call.



Formally, a derived datatype consists of a sequence of basic MPI datatypes together with
a displacement for each of the datatypes.
In our trapezoidal rule example, suppose that on process 0 the variables a, b, and n are
stored in memory locations with the following addresses:
Variable   Address
a          24
b          40
n          48
Then the following derived datatype could represent these data items:
{(MPI_DOUBLE, 0),(MPI_DOUBLE, 16),(MPI_INT, 24)}.
The first element of each pair corresponds to the type of the data, and the second element
of each pair is the displacement of the data element from the beginning of the type. We’ve
assumed that the type begins with a, so it has displacement 0, and the other elements have
displacements measured, in bytes, from a: b is 40 - 24 = 16 bytes beyond the start of a,
and n is 48-24 = 24 bytes beyond the start of a.



Syntax
 Use MPI_Type_create_struct to build a derived datatype that consists of individual
elements that have different basic types:
int MPI_Type_create_struct(
int count,
const int array_of_blocklengths[ ],
const MPI_Aint array_of_displacements[ ],
const MPI_Datatype array_of_types[ ],
MPI_Datatype* newtype)



Input Parameters
count: number of blocks (integer)
array_of_blocklengths: number of elements in each block (array of integers)
array_of_displacements: byte displacement of each block (array of address integers)
array_of_types: type of elements in each block (array of handles to datatype objects)
Output Parameter
newtype: the new derived datatype (handle)
 The argument count is the number of elements in the datatype, so for our example, it
should be three.
 Each of the array arguments should have count elements.
 The first array, array_of_blocklengths, allows for the possibility that the individual data
items might be arrays or subarrays. If, for example, the first element were an array
containing five elements, we would have
array_of_blocklengths[0] = 5;
 Since none of our elements is an array, we can simply define
int array_of_blocklengths[3] = {1, 1, 1};
The third argument to MPI_Type_create_struct, array_of_displacements, specifies the
displacements, in bytes, from the start of the message. For our example,
MPI_Aint array_of_displacements[] = {0, 16, 24};
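Below is a minimal sketch (not the course's official listing) that builds this derived datatype for the trapezoidal-rule inputs a, b, and n and broadcasts them with a single call. It uses MPI_Get_address to compute the displacements at run time instead of the hard-coded 0, 16, 24; the helper name Bcast_input and the sample inputs are assumptions.
/* build_input_type.c -- sketch: one derived datatype for two doubles and an int */
#include <stdio.h>
#include <mpi.h>

/* build a datatype describing {a, b, n}, broadcast them in one call, then free it */
void Bcast_input(double* a_p, double* b_p, int* n_p) {
    int          array_of_blocklengths[3] = {1, 1, 1};
    MPI_Datatype array_of_types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};
    MPI_Aint     a_addr, b_addr, n_addr;
    MPI_Aint     array_of_displacements[3] = {0, 0, 0};
    MPI_Datatype input_mpi_t;

    /* displacements are measured from the address of a */
    MPI_Get_address(a_p, &a_addr);
    MPI_Get_address(b_p, &b_addr);
    MPI_Get_address(n_p, &n_addr);
    array_of_displacements[1] = b_addr - a_addr;
    array_of_displacements[2] = n_addr - a_addr;

    MPI_Type_create_struct(3, array_of_blocklengths, array_of_displacements,
                           array_of_types, &input_mpi_t);
    MPI_Type_commit(&input_mpi_t);       /* the type must be committed before use */

    MPI_Bcast(a_p, 1, input_mpi_t, 0, MPI_COMM_WORLD);   /* one call moves a, b and n */
    MPI_Type_free(&input_mpi_t);         /* release the datatype when done */
}

int main(int argc, char* argv[]) {
    int my_rank, n = 0;
    double a = 0.0, b = 0.0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (my_rank == 0) { a = 0.0; b = 3.0; n = 1024; }   /* illustrative inputs on the root */
    Bcast_input(&a, &b, &n);
    printf("Process %d: a = %f, b = %f, n = %d\n", my_rank, a, b, n);
    MPI_Finalize();
    return 0;
}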
Topic
Performance evaluation
 We write parallel programs because they are usually faster than a serial program that solves
the same problem.
Taking timings
 We’re usually not interested in the time taken from the start of program execution to
the end of program execution.
 MPI provides a function, MPI_Wtime, that returns the number of seconds that have
elapsed since some time in the past:
double MPI_Wtime(void);
Thus, we can time a block of MPI code as follows:
double start, finish;
start = MPI_Wtime( );
/* Code to be timed */
finish = MPI_Wtime( );
printf("Proc %d > Elapsed time = %e seconds\n", my_rank, finish-start);
Timing the serial code
There’s a C macro GET_TIME defined in the header file timer.h. This macro should be
called with a double argument.



#include "timer.h"
double start, finish
GET_TIME(start);
/*Code to be timed*/
GET_TIME(finish);
printf("Elapsed time=%e seconds\n", finish-start)
Both MPI_Wtime and GET_TIME return wall clock time. Recall that a timer like the
C clock function returns CPU time—the time spent in user code, library functions, and
operating system code. It doesn’t include idle time, which can be a significant part of
parallel run time. For example, a call to MPI_Recv may spend a significant amount of time
waiting for the arrival of a message. Wall clock time, on the other hand, gives total elapsed
time, so it includes idle time.
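Because the processes may reach the timed block at different moments, MPI programs often put an MPI_Barrier before starting the timer and report the maximum elapsed time over all processes. A minimal sketch of this pattern follows (it assumes it is placed inside an initialized MPI program where my_rank has already been set; the variable names are illustrative).
/* timing sketch -- assumes MPI has been initialized and my_rank has been set */
double local_start, local_finish, local_elapsed, elapsed;

MPI_Barrier(MPI_COMM_WORLD);            /* try to start all processes together */
local_start = MPI_Wtime();
/* Code to be timed */
local_finish = MPI_Wtime();
local_elapsed = local_finish - local_start;

/* the parallel run-time is the time taken by the slowest process */
MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
if (my_rank == 0)
    printf("Elapsed time = %e seconds\n", elapsed);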



Program: An MPI matrix-vector multiplication function
int main(int argc, char *argv[])
{
MPI_Init(&argc, &argv); //initialize MPI operations
MPI_Comm_rank(MPI_COMM_WORLD, &rank); //get the rank
MPI_Comm_size(MPI_COMM_WORLD, &size);
if (rank > 0)
{
MPI_Recv(&low_bound, 1, MPI_INT, 0, MASTER_TO_SLAVE_TAG, MPI_COMM_WORLD, &status);
MPI_Recv(&upper_bound, 1, MPI_INT, 0, MASTER_TO_SLAVE_TAG + 1, MPI_COMM_WORLD, &status);
MPI_Recv(&mat_a[low_bound][0], (upper_bound - low_bound) * NUM_COLUMNS_A, MPI_DOUBLE, 0,
MASTER_TO_SLAVE_TAG + 2, MPI_COMM_WORLD, &status);
for (i = low_bound; i < upper_bound; i++)
{
for (j = 0; j < NUM_COLUMNS_B; j++)
{
for (k = 0; k < NUM_ROWS_B; k++)
{
mat_result[i][j] += (mat_a[i][k] * mat_b[k][j]);
}
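/* NOTE: this listing is an excerpt -- the closing braces, the master process's code
   (which sends the row bounds and matrix data received above and presumably collects
   mat_result), and the call to MPI_Finalize are not shown. */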



Results
 The results of timing the matrix-vector multiplication program are shown in
Table. The input matrices were square. The times shown are in milliseconds.

 comm_sz is the number of processes. The times for comm_sz = 1 are the run-times of the
serial program running on a single core of the distributed-memory system.



Denote the serial run-time by Tserial. Since it typically depends on the size of the input, n,
we’ll frequently denote it as Tserial(n). Also recall that we denote the parallel run-time by
Tparallel. Since it depends on both the input size, n, and the number of processes,
comm_sz= p, we’ll frequently denote it as Tparallel(n, p). As we noted in Chapter 2, it’s
often the case that the parallel program will divide the work of the serial program among
the processes, and add in some overhead time, which we denoted Toverhead:
Tparallel(n, p) = Tserial(n)/p + Toverhead
 In MPI programs, the parallel overhead typically comes from communication, and it can
depend on both the problem size and the number of processes.
 However, the parallel program also needs to complete a call to MPI_Allgather before it
can carry out the local matrix-vector multiplication. In our example, it appears that
Tparallel(n, p) = Tserial(n)/p + Tallgather, where Tallgather is the time spent in the call to MPI_Allgather.


Speedup and efficiency

 Recall that the most widely used measure of the relation between the serial and the
parallel run-times is the speedup. It’s just the ratio of the serial run-time to the parallel
run-time:
S(n, p) = Tserial(n) / Tparallel(n, p)
 The ideal value for S(n, p) is p. If S(n, p) = p, then our parallel program
with comm_sz = p processes is running p times faster than the serial program. In
practice, this speedup, sometimes called linear speedup, is rarely achieved. Our
matrix-vector multiplication program got the speedups shown in Table. For small p and
large n, our program obtained nearly linear speedup. On the other hand, for large p and
small n, the speedup was considerably less than p. The worst case was n = 1024
and p = 16, when we only managed a speedup of 2.4.

 Also recall that another widely used measure of parallel performance is parallel
efficiency. This is “per process” speedup:
E(n, p) = S(n, p)/p = Tserial(n) / (p · Tparallel(n, p))
 Linear speedup corresponds to a parallel efficiency of p/p = 1.0, and, in general, we
expect that our efficiencies will be less than 1.
 The efficiencies for the matrix-vector multiplication program are shown in Table. Once
again, for small p and large n our parallel efficiencies are near one, and for large p and
small n, they are very far from one.
Scalability
 A program is scalable if the problem size can be increased at a rate so that the
efficiency doesn’t decrease as the number of processes increase.



Consider two parallel programs: program A and program B. Suppose that if p>=2, the efficiency of
program A is 0.75, regardless of problem size. Also suppose that the efficiency of
program B is n/(625p), provided p >= 2 and 1000 <= n <= 625p. Then according to our “definition,”
both programs are scalable. For program A, the rate of increase needed to maintain constant efficiency
is 0, while for program B if we increase n at the same rate as we increase p, we’ll maintain a constant
efficiency. For example, if n = 1000 and p = 2, the efficiency of B is 0.80. If we then double p to 4 and
we leave the problem size at n = 1000, the efficiency will drop to 0.40, but if we also double the
problem size to n = 2000, the efficiency will remain constant at 0.80. Program A is thus more scalable
than B, but both satisfy our definition of scalability.
 The programs that can maintain a constant efficiency without increasing the problem
size are sometimes said to be strongly scalable.
 Programs that can maintain a constant efficiency if the problem size increases at the
same rate as the number of processes are sometimes said to be weakly scalable.
 Program A is strongly scalable, and program B is weakly scalable. Furthermore, our
matrix-vector multiplication program is also apparently weakly scalable.

