PETSc Manual
$\sum_i |x_i|$, the 2-norm is $(\sum_i x_i^2)^{1/2}$, and the infinity norm is $\max_i |x_i|$.
For parallel vectors that are distributed across the processes by ranges, it is possible to determine a
process's local range with the routine
VecGetOwnershipRange(Vec vec,int *low,int *high);
The argument low indicates the first component owned by the local process, while high specifies one
more than the last owned by the local process. This command is useful, for instance, in assembling parallel
vectors.
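The half-open convention [low, high) can be illustrated without PETSc. The sketch below is a minimal illustration, assuming a contiguous block distribution in which any remainder entries go to the lowest-numbered processes; block_ownership_range() is a hypothetical helper, not a PETSc routine, and the range that PETSc actually assigned is always the one reported by VecGetOwnershipRange().

```c
#include <assert.h>

/* Hypothetical helper (not part of PETSc): computes the half-open range
   [low, high) owned by process `rank` when N entries are split into
   contiguous blocks over `size` processes. ASSUMPTION: the first N % size
   processes each receive one extra entry. */
static void block_ownership_range(int N, int size, int rank, int *low, int *high)
{
  assert(N >= 0 && size > 0 && rank >= 0 && rank < size);
  int base   = N / size;                      /* entries every process gets */
  int extra  = N % size;                      /* leftover entries */
  int nlocal = base + (rank < extra ? 1 : 0); /* local length */
  *low  = rank * base + (rank < extra ? rank : extra);
  *high = *low + nlocal;                      /* one more than the last owned index */
}
```

Under this assumption, 10 entries on 4 processes give the ranges [0,3), [3,6), [6,8), and [8,10); a loop of the form for (i = low; i < high; i++) then visits exactly the locally owned entries.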
On occasion, the user needs to access the actual elements of the vector. The routine VecGetArray()
returns a pointer to the elements local to the process:
VecGetArray(Vec v,PetscScalar **array);
When access to the array is no longer needed, the user should call
VecRestoreArray(Vec v, PetscScalar **array);
Function Name                                          Operation
VecAXPY(Vec y,PetscScalar a,Vec x);                    $y = y + a x$
VecAYPX(Vec y,PetscScalar a,Vec x);                    $y = x + a y$
VecWAXPY(Vec w,PetscScalar a,Vec x,Vec y);             $w = a x + y$
VecAXPBY(Vec y,PetscScalar a,PetscScalar b,Vec x);     $y = a x + b y$
VecScale(Vec x, PetscScalar a);                        $x = a x$
VecDot(Vec x, Vec y, PetscScalar *r);                  $r = \bar{x}^{T} y$
VecTDot(Vec x, Vec y, PetscScalar *r);                 $r = x^{T} y$
VecNorm(Vec x,NormType type, PetscReal *r);            $r = \|x\|_{type}$
VecSum(Vec x, PetscScalar *r);                         $r = \sum_i x_i$
VecCopy(Vec x, Vec y);                                 $y = x$
VecSwap(Vec x, Vec y);                                 $y = x$ while $x = y$
VecPointwiseMult(Vec w,Vec x,Vec y);                   $w_i = x_i y_i$
VecPointwiseDivide(Vec w,Vec x,Vec y);                 $w_i = x_i / y_i$
VecMDot(Vec x,int n,Vec y[],PetscScalar *r);           $r[i] = \bar{x}^{T} y[i]$
VecMTDot(Vec x,int n,Vec y[],PetscScalar *r);          $r[i] = x^{T} y[i]$
VecMAXPY(Vec y,int n, PetscScalar *a, Vec x[]);        $y = y + \sum_i a_i x[i]$
VecMax(Vec x, int *idx, PetscReal *r);                 $r = \max_i x_i$
VecMin(Vec x, int *idx, PetscReal *r);                 $r = \min_i x_i$
VecAbs(Vec x);                                         $x_i = |x_i|$
VecReciprocal(Vec x);                                  $x_i = 1/x_i$
VecShift(Vec x,PetscScalar s);                         $x_i = s + x_i$
VecSet(Vec x,PetscScalar alpha);                       $x_i = \alpha$
Table 1: PETSc Vector Operations
Minor differences exist in the Fortran interface for VecGetArray() and VecRestoreArray(), as discussed
in Section 10.1.3. It is important to note that VecGetArray() and VecRestoreArray() do not copy the vector
elements; they merely give users direct access to the vector elements. Thus, these routines require essentially
no time to call and can be used efficiently.
The number of elements stored locally can be accessed with
VecGetLocalSize(Vec v,int *size);
The global vector length can be determined by
VecGetSize(Vec v,int *size);
In addition to VecDot(), VecMDot(), and VecNorm(), PETSc provides split-phase versions of these
routines that allow several independent inner products and/or norms to share the same communication
(thus improving parallel efficiency). For example, one may have code such as
VecDot(Vec x,Vec y,PetscScalar *dot);
VecMDot(Vec x,PetscInt nv, Vec y[],PetscScalar *dot);
VecNorm(Vec x,NormType NORM_2,PetscReal *norm2);
VecNorm(Vec x,NormType NORM_1,PetscReal *norm1);
This code works fine; the problem is that each call performs its own separate parallel communication operation.
Instead one can write
VecDotBegin(Vec x,Vec y,PetscScalar *dot);
VecMDotBegin(Vec x, PetscInt nv,Vec y[],PetscScalar *dot);
VecNormBegin(Vec x,NormType NORM_2,PetscReal *norm2);
VecNormBegin(Vec x,NormType NORM_1,PetscReal *norm1);
VecDotEnd(Vec x,Vec y,PetscScalar *dot);
VecMDotEnd(Vec x, PetscInt nv,Vec y[],PetscScalar *dot);
VecNormEnd(Vec x,NormType NORM_2,PetscReal *norm2);
VecNormEnd(Vec x,NormType NORM_1,PetscReal *norm1);
With this code, the communication is delayed until the first call to VecxxxEnd(), at which point a single MPI
reduction is used to communicate all the required values. The calls to the VecxxxEnd() routines must be
made in the same order as the calls to the VecxxxBegin() routines; if you mistakenly make the
calls in the wrong order, PETSc will generate an error informing you of this. There are additional routines
VecTDotBegin(), VecTDotEnd(), VecMTDotBegin(), and VecMTDotEnd().
Note: these routines use only MPI-1 functionality, so they do not allow you to overlap computation and
communication (assuming no threads are spawned within an MPI process). Once MPI-2 implementations are
more common, we will improve these routines to allow overlap of inner product and norm calculations with
other calculations. Also, these routines currently work only for the PETSc built-in vector types.
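The benefit of the split-phase routines comes from packing several partial results into one buffer so that one reduction carries all of them. The following sketch is not PETSc code: the names local_partials() and combined_reduction() are hypothetical, the "processes" are simulated by rows of an array, and the element-wise sum merely stands in for a single MPI reduction. It assumes real (not complex) scalars.

```c
#include <assert.h>
#include <stddef.h>

enum { N_PARTIALS = 3 }; /* slots: x.y, sum of x_i^2, sum of |x_i| */

/* "Begin" phase: each process computes only its local contributions. */
static void local_partials(const double *x, const double *y, size_t n,
                           double buf[N_PARTIALS])
{
  buf[0] = buf[1] = buf[2] = 0.0;
  for (size_t i = 0; i < n; i++) {
    buf[0] += x[i] * y[i];                 /* inner product part */
    buf[1] += x[i] * x[i];                 /* 2-norm part (square root comes later) */
    buf[2] += (x[i] < 0.0) ? -x[i] : x[i]; /* 1-norm part */
  }
}

/* "End" phase: one element-wise sum over all buffers. In a real code this
   would be a single reduction of N_PARTIALS values across processes. */
static void combined_reduction(double bufs[][N_PARTIALS], size_t nproc,
                               double out[N_PARTIALS])
{
  assert(nproc > 0);
  for (int k = 0; k < N_PARTIALS; k++) {
    out[k] = 0.0;
    for (size_t p = 0; p < nproc; p++) out[k] += bufs[p][k];
  }
}
```

After the combined reduction, out[0] is the inner product, the 2-norm is the square root of out[1], and out[2] is the 1-norm, so three results cost one communication step instead of three.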
2.3 Indexing and Ordering
When writing parallel PDE codes there is extra complexity caused by having multiple ways of indexing
(numbering) and ordering objects such as vertices and degrees of freedom. For example, a grid generator
or partitioner may renumber the nodes, requiring adjustment of the other data structures that refer to these
objects; see Figure 10. In addition, local numbering (on a single process) of objects may be different than
the global (cross-process) numbering. PETSc provides a variety of tools that help to manage the mapping
among the various numbering systems. The two most basic are the AO (application ordering), which enables
mapping between different global (cross-process) numbering schemes and the ISLocalToGlobalMapping,
which allows mapping between local (on-process) and global (cross-process) numbering.
2.3.1 Application Orderings
In many applications it is desirable to work with one or more orderings (or numberings) of degrees of
freedom, cells, nodes, etc. Doing so in a parallel environment is complicated by the fact that each process
cannot keep complete lists of the mappings between different orderings. In addition, the orderings used in
the PETSc linear algebra routines (often contiguous ranges) may not correspond to the natural orderings
for the application.
PETSc provides certain utility routines that allow one to deal cleanly and efficiently with the various
orderings. To define a new application ordering (called an AO in PETSc), one can call the routine
AOCreateBasic(MPI_Comm comm,int n,const int apordering[],const int petscordering[],AO *ao);
The arrays apordering and petscordering, respectively, contain a list of integers in the application
ordering and their corresponding mapped values in the PETSc ordering. Each process can provide whatever
subset of the ordering it chooses, but multiple processes should never contribute duplicate values. The
argument n indicates the number of local contributed values.
For example, consider a vector of length five, where node 0 in the application ordering corresponds to
node 3 in the PETSc ordering. In addition, nodes 1, 2, 3, and 4 of the application ordering correspond,
respectively, to nodes 2, 1, 4, and 0 of the PETSc ordering. We can write this correspondence as
$\{0, 1, 2, 3, 4\} \to \{3, 2, 1, 4, 0\}$.
The user can create the PETSc-AO mappings in a number of ways. For example, if using two processes,
one could call
AOCreateBasic(PETSC_COMM_WORLD,2,{0,3},{3,4},&ao);
on the first process and
AOCreateBasic(PETSC_COMM_WORLD,3,{1,2,4},{2,1,0},&ao);
on the other process.
Once the application ordering has been created, it can be used with either of the commands
AOPetscToApplication(AO ao,int n,int *indices);
AOApplicationToPetsc(AO ao,int n,int *indices);
Upon input, the n-dimensional array indices specifies the indices to be mapped, while upon output,
indices contains the mapped values. Since we, in general, employ a parallel database for the AO mappings,
it is crucial that all processes that called AOCreateBasic() also call these routines; these routines
cannot be called by just a subset of processes in the MPI communicator that was used in the call to
AOCreateBasic().
An alternative routine to create the application ordering, AO, is
AOCreateBasicIS(IS apordering,IS petscordering,AO *ao);
where index sets (see Section 2.5.1) are used instead of integer arrays.
The mapping routines
AOPetscToApplicationIS(AO ao,IS indices);
AOApplicationToPetscIS(AO ao,IS indices);
will map index sets (IS objects) between orderings. Both the AOXxxToYyy() and AOXxxToYyyIS()
routines can be used regardless of whether the AO was created with AOCreateBasic() or AOCreateBasicIS().
The AO context should be destroyed with AODestroy(AO *ao) and can be viewed with
AOView(AO ao,PetscViewer viewer).
Although we refer to the two orderings as PETSc and application orderings, the user is free to use
them both for application orderings and to maintain relationships among a variety of orderings by employing
several AO contexts.
The AOXxxToYyy() routines allow negative entries in the input integer array. These entries are not
mapped; they simply remain unchanged. This functionality enables, for example, mapping neighbor lists
that use negative numbers to indicate nonexistent neighbors due to boundary conditions, etc.
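The semantics of the mapping routines, including the treatment of negative entries, can be mimicked with two lookup tables. The sketch below is a toy, serial stand-in and not the PETSc implementation: ToyAO and the toy_ao_* functions are hypothetical names, and the whole mapping is held in one place, whereas a real AO is distributed across the processes.

```c
#include <assert.h>

enum { NGLOBAL = 5 }; /* size of the five-node example above */

/* Hypothetical stand-in for an AO: one table per direction. */
typedef struct {
  int app_to_petsc[NGLOBAL];
  int petsc_to_app[NGLOBAL];
} ToyAO;

/* Record the n (application, PETSc) pairs contributed by one process. */
static void toy_ao_add(ToyAO *ao, int n, const int app[], const int petsc[])
{
  for (int k = 0; k < n; k++) {
    assert(app[k] >= 0 && app[k] < NGLOBAL);
    assert(petsc[k] >= 0 && petsc[k] < NGLOBAL);
    ao->app_to_petsc[app[k]]   = petsc[k];
    ao->petsc_to_app[petsc[k]] = app[k];
  }
}

/* Map in place; negative entries are left unchanged. */
static void toy_ao_application_to_petsc(const ToyAO *ao, int n, int indices[])
{
  for (int k = 0; k < n; k++)
    if (indices[k] >= 0) indices[k] = ao->app_to_petsc[indices[k]];
}

static void toy_ao_petsc_to_application(const ToyAO *ao, int n, int indices[])
{
  for (int k = 0; k < n; k++)
    if (indices[k] >= 0) indices[k] = ao->petsc_to_app[indices[k]];
}
```

Adding the pairs ({0,3},{3,4}) and ({1,2,4},{2,1,0}) from the two-process example and then mapping the application indices {0, 1, -1, 4} yields {3, 2, -1, 0}; the -1 entry, which might mark a nonexistent neighbor, passes through untouched.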
2.3.2 Local to Global Mappings
In many applications one works with a global representation of a vector (usually on a vector obtained with
VecCreateMPI()) and a local representation of the same vector that includes ghost points required for local
computation. PETSc provides routines to help map indices from a local numbering scheme to the PETSc
global numbering scheme. This is done via the following routines
ISLocalToGlobalMappingCreate(MPI_Comm comm,int N,int* globalnum,PetscCopyMode mode,ISLocalToGlobalMapping* ctx);
ISLocalToGlobalMappingApply(ISLocalToGlobalMapping ctx,int n,int *in,int *out);
ISLocalToGlobalMappingApplyIS(ISLocalToGlobalMapping ctx,IS isin,IS* isout);
ISLocalToGlobalMappingDestroy(ISLocalToGlobalMapping *ctx);
Here N denotes the number of local indices, globalnum contains the global number of each local number,
and ISLocalToGlobalMapping is the resulting PETSc object that contains the information needed to apply
the mapping with either ISLocalToGlobalMappingApply() or ISLocalToGlobalMappingApplyIS().
Note that the ISLocalToGlobalMapping routines serve a different purpose than the AO routines. In the
former case they provide a mapping from a local numbering scheme (including ghost points) to a global
numbering scheme, while in the latter they provide a mapping between two global numbering schemes. In
fact, many applications may use both AO and ISLocalToGlobalMapping routines. The AO routines are first
used to map from an application global ordering (that has no relationship to parallel processing etc.) to
the PETSc ordering scheme (where each process has a contiguous set of indices in the numbering). Then in
order to perform function or Jacobian evaluations locally on each process, one works with a local numbering
scheme that includes ghost points. The mapping from this local numbering scheme back to the global PETSc
numbering can be handled with the ISLocalToGlobalMapping routines.
If one is given a list of indices in a global numbering, the routine
ISGlobalToLocalMappingApply(ISLocalToGlobalMapping ctx,
ISGlobalToLocalMappingType type,int nin,int idxin[],int *nout,int idxout[]);
will provide a new list of indices in the local numbering. Again, negative values in idxin are left
unmapped. In addition, if type is set to IS_GTOLM_MASK, then nout is set to nin and all global
values in idxin that are not represented in the local to global mapping are replaced by -1. When type is
set to IS_GTOLM_DROP, the values in idxin that are not represented locally in the mapping are not
included in idxout, so that nout is potentially smaller than nin. One must pass in an array long enough to
hold all the indices. One can call ISGlobalToLocalMappingApply() with idxout equal to PETSC_NULL
to determine the required length (returned in nout), then allocate the required space and call
ISGlobalToLocalMappingApply() a second time to set the values.
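The difference between the two mapping directions and between the MASK and DROP modes can be seen in a small serial model. The code below is only a sketch with hypothetical names (toy_local_to_global(), toy_global_to_local(), ToyGtoLType); it uses a linear search rather than whatever data structure PETSc employs, and dropping negative input entries in DROP mode is a choice made by this sketch, not something stated in the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the roles of IS_GTOLM_MASK and IS_GTOLM_DROP. */
typedef enum { TOY_MASK, TOY_DROP } ToyGtoLType;

/* Local -> global: globalnum[l] is the global number of local index l. */
static void toy_local_to_global(const int globalnum[], int nlocal,
                                int n, const int in[], int out[])
{
  for (int k = 0; k < n; k++) {
    assert(in[k] >= 0 && in[k] < nlocal);
    out[k] = globalnum[in[k]];
  }
}

/* Returns the local index of global number g, or -1 if it is not present. */
static int toy_find_local(const int globalnum[], int nlocal, int g)
{
  for (int l = 0; l < nlocal; l++)
    if (globalnum[l] == g) return l;
  return -1;
}

/* Global -> local. If idxout is NULL, only the required length is computed. */
static void toy_global_to_local(const int globalnum[], int nlocal, ToyGtoLType type,
                                int nin, const int idxin[], int *nout, int idxout[])
{
  int count = 0;
  for (int k = 0; k < nin; k++) {
    /* Negative inputs are not mapped; others are looked up. */
    int local = (idxin[k] < 0) ? idxin[k]
                               : toy_find_local(globalnum, nlocal, idxin[k]);
    if (type == TOY_DROP && local < 0) continue; /* DROP: skip the entry */
    if (idxout) idxout[count] = local;           /* MASK: missing becomes -1 */
    count++;
  }
  *nout = count;
}
```

With globalnum = {10, 11, 12, 4} (three owned entries and one ghost), the global list {11, 7, 4} maps to {1, -1, 3} in MASK mode and to {1, 3} in DROP mode; calling the DROP variant first with a NULL output array returns the length 2 needed for the second call.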
Often it is convenient to set elements into a vector using the local node numbering rather than the global
node numbering (e.g., each process may maintain its own sublist of vertices and elements and number them
locally). To set values into a vector with the local numbering, one must first call
VecSetLocalToGlobalMapping(Vec v,ISLocalToGlobalMapping ctx);
and then call
VecSetValuesLocal(Vec x,int n,const int indices[],const PetscScalar values[],INSERT_VALUES);
Now the indices use the local numbering, rather than the global, meaning the entries lie in [0, n) where
n is the local size of the vector.
2.4 Structured Grids Using Distributed Arrays
Distributed arrays (DMDAs), which are used in conjunction with PETSc vectors, are intended for use with
logically regular rectangular grids when communication of nonlocal data is needed before certain local
computations can occur. PETSc distributed arrays are designed only for the case in which data can be
thought of as being stored in a standard multidimensional array; thus, DMDAs are not intended for
parallelizing unstructured grid problems, etc. DMDAs are intended for communicating vector (field) information;
they are not intended for storing matrices.
For example, a typical situation one encounters in solving PDEs in parallel is that, to evaluate a local
function, f(x), each process requires its local portion of the vector x as well as its ghost points (the
bordering portions of the vector that are owned by neighboring processes). Figure 9 illustrates the ghost
points for the seventh process of a two-dimensional, regular parallel grid. Each box represents a process;
the ghost points for the seventh process's local part of a parallel array are shown in gray.
Figure 9: Ghost Points for Two Stencil Types on the Seventh Process (panels: box-type stencil, star-type stencil)
2.4.1 Creating Distributed Arrays
The PETSc DMDA object manages the parallel communication required while working with data stored
in regular arrays. The actual data is stored in appropriately sized vector objects; the DMDA object only
contains the parallel data layout information and communication information. However, it may be used to
create vectors and matrices with the proper layout.
One creates a distributed array communication data structure in two dimensions with the command
DMDACreate2d(MPI_Comm comm,DMDABoundaryType xperiod,DMDABoundaryType yperiod,DMDAStencilType st,int M,
int N,int m,int n,int dof,int s,int *lx,int *ly,DM *da);
The arguments M and N indicate the global numbers of grid points in each direction, while m and n denote
the process partition in each direction; m*n must equal the number of processes in the MPI communicator,
comm. Instead of specifying the process layout, one may use PETSC_DECIDE for m and n so that PETSc
will determine the partition using MPI. The type of periodicity of the array is specified by xperiod and
yperiod, which can be DMDA_BOUNDARY_NONE (no periodicity), DMDA_BOUNDARY_PERIODIC
(periodic in that direction), DMDA_BOUNDARY_GHOSTED, or DMDA_BOUNDARY_MIRROR. The
argument dof indicates the number of degrees of freedom at each array point, and s is the stencil width
(i.e., the width of the ghost point region). The optional arrays lx and ly may contain the number of
nodes along the x and y axes for each cell of the process partition, i.e., the dimension of lx is m and the
dimension of ly is n; or PETSC_NULL may be passed in.
Two types of distributed array communication data structures can be created, as specified by st.
Star-type stencils that radiate outward only in the coordinate directions are indicated by DMDA_STENCIL_STAR,
while box-type stencils are specified by DMDA_STENCIL_BOX. For example, for the two-dimensional case,
DMDA_STENCIL_STAR with width 1 corresponds to the standard 5-point stencil, while DMDA_STENCIL_BOX
with width 1 denotes the standard 9-point stencil. In both instances the ghost points are identical, the only
difference being that with star-type stencils certain ghost points are ignored, decreasing substantially the
number of messages sent. Note that the DMDA_STENCIL_STAR stencils can save interprocess
communication in two and three dimensions.
These DMDA stencils have nothing directly to do with any finite difference stencils one might choose to
use for a discretization; they only ensure that the correct values are in place for application of a user-defined
finite difference stencil (or any other discretization technique).
The commands for creating distributed array communication data structures in one and three dimensions
are analogous:
DMDACreate1d(MPI_Comm comm,DMDABoundaryType xperiod,int M,int w,int s,int *lc,DM *inra);
DMDACreate3d(MPI_Comm comm,DMDABoundaryType xperiod,DMDABoundaryType yperiod,
DMDABoundaryType zperiod, DMDAStencilType stencil_type,
int M,int N,int P,int m,int n,int p,int w,int s,int *lx,int *ly,int *lz,DM *inra);
The routines to create distributed arrays are collective, so that all processes in the communicator comm must
call DMDACreateXXX().
2.4.2 Local/Global Vectors and Scatters
Each DMDA object defines the layout of two vectors: a distributed global vector and a local vector that
includes room for the appropriate ghost points. The DMDA object provides information about the size and
layout of these vectors, but does not internally allocate any associated storage space for field values. Instead,
the user can create vector objects that use the DMDA layout information with the routines
DMCreateGlobalVector(DM da,Vec *g);
DMCreateLocalVector(DM da,Vec *l);
These vectors will generally serve as the building blocks for local and global PDE solutions, etc. If additional
vectors with such layout information are needed in a code, they can be obtained by duplicating l or g via
VecDuplicate() or VecDuplicateVecs().
We emphasize that a distributed array provides the information needed to communicate the ghost value
information between processes. In most cases, several different vectors can share the same communication
information (or, in other words, can share a given DMDA). The design of the DMDA object makes this
easy, as each DMDA operation may operate on vectors of the appropriate size, as obtained via
DMCreateLocalVector() and DMCreateGlobalVector() or as produced by VecDuplicate(). As such, the DMDA
scatter/gather operations (e.g., DMGlobalToLocalBegin()) require vector input/output arguments, as discussed
below.
PETSc currently provides no container for multiple arrays sharing the same distributed array communi-
cation; note, however, that the dof parameter handles many cases of interest.
At certain stages of many applications, there is a need to work on a local portion of the vector, including
the ghost points. This may be done by scattering a global vector into its local parts by using the two-stage
commands
DMGlobalToLocalBegin(DM da,Vec g,InsertMode iora,Vec l);
DMGlobalToLocalEnd(DM da,Vec g,InsertMode iora,Vec l);
which allow the overlap of communication and computation. Since the global and local vectors, given
by g and l, respectively, must be compatible with the distributed array, da, they should be generated
by DMCreateGlobalVector() and DMCreateLocalVector() (or be duplicates of such a vector obtained via
VecDuplicate()). The InsertMode can be either ADD_VALUES or INSERT_VALUES.
One can scatter the local patches into the distributed vector with the commands
DMLocalToGlobalBegin(DM da,Vec l,InsertMode mode,Vec g);
DMLocalToGlobalEnd(DM da,Vec l,InsertMode mode,Vec g);
Like the global-to-local scatter, this operation is divided into beginning and ending phases.
A third type of distributed array scatter is from a local vector (including ghost points that contain irrele-
vant values) to a local vector with correct ghost point values. This scatter may be done by commands
DMDALocalToLocalBegin(DM da,Vec l1,InsertMode iora,Vec l2);
DMDALocalToLocalEnd(DM da,Vec l1,InsertMode iora,Vec l2);
Since both local vectors, l1 and l2, must be compatible with the distributed array, da, they should be
generated by DMCreateLocalVector() (or be duplicates of such vectors obtained via VecDuplicate()). The
InsertMode can be either ADD_VALUES or INSERT_VALUES.
It is possible to directly access the vector scatter contexts (see below) used in the local-to-global (ltog),
global-to-local (gtol), and local-to-local (ltol) scatters with the command
DMDAGetScatter(DM da,VecScatter *ltog,VecScatter *gtol,VecScatter *ltol);
Most users should not need to use these contexts.
2.4.3 Local (Ghosted) Work Vectors
In most applications the local ghosted vectors are only needed during user function evaluations. PETSc
provides an easy light-weight (requiring essentially no CPU time) way to obtain these work vectors and
return them when they are no longer needed. This is done with the routines
DMGetLocalVector(DM da,Vec *l);
/* ... use the local vector l ... */
DMRestoreLocalVector(DM da,Vec *l);
2.4.4 Accessing the Vector Entries for DMDA Vectors
PETSc provides an easy way to set values into the DMDA Vectors and access them using the natural grid
indexing. This is done with the routines
DMDAVecGetArray(DM da,Vec l,void *array);
... use the array indexing it with 1 or 2 or 3 dimensions
... depending on the dimension of the DMDA
DMDAVecRestoreArray(DM da,Vec l,void *array);
where array is a multidimensional C array with the same dimension as da. The vector l can be either a
global vector or a local vector. The array is accessed using the usual global indexing on the entire grid,
but the user may only refer to the local and ghost entries of this array as all other entries are undefined. For
example, for a scalar problem in two dimensions one could do
PetscScalar **f,**u;
...
DMDAVecGetArray(da,local,&u);   /* access the ghosted local vector */
DMDAVecGetArray(da,global,&f);  /* access the global vector */
...
f[i][j] = u[i][j] - ...
...
DMDAVecRestoreArray(da,local,&u);
DMDAVecRestoreArray(da,global,&f);
See ${PETSC_DIR}/src/snes/examples/tutorials/ex5.c for a complete example and see
${PETSC_DIR}/src/snes/examples/tutorials/ex19.c for an example for a multi-component
PDE.
2.4.5 Grid Information
The global indices of the lower left corner of the local portion of the array as well as the local array size can
be obtained with the commands
DMDAGetCorners(DM da,int *x,int *y,int *z,int *m,int *n,int *p);
DMDAGetGhostCorners(DM da,int *x,int *y,int *z,int *m,int *n,int *p);
The first version excludes any ghost points, while the second version includes them. The routine
DMDAGetGhostCorners() deals with the fact that subarrays along boundaries of the problem domain have ghost points
only on their interior edges, but not on their boundary edges.
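The relationship between the two sets of corners can be sketched in one dimension. The helpers below are hypothetical and are not PETSc routines; they assume a non-periodic boundary and an even split of the M grid points with any remainder given to the lowest-numbered processes, which need not be the partition a given DMDA uses. The same clipping would apply independently in each coordinate direction.

```c
#include <assert.h>

/* Hypothetical helper: start xs and width xm of the points owned by `rank`
   when M points are split over m processes (ASSUMED even split). */
static void toy_corners_1d(int M, int m, int rank, int *xs, int *xm)
{
  assert(M > 0 && m > 0 && rank >= 0 && rank < m);
  int base = M / m, extra = M % m;
  *xm = base + (rank < extra ? 1 : 0);
  *xs = rank * base + (rank < extra ? rank : extra);
}

/* Hypothetical helper: the same range widened by the stencil width s, but
   clipped at the physical boundary, where no ghost points exist. */
static void toy_ghost_corners_1d(int M, int m, int rank, int s, int *gxs, int *gxm)
{
  int xs, xm;
  toy_corners_1d(M, m, rank, &xs, &xm);
  int lo = xs - s;
  int hi = xs + xm + s;   /* one past the last ghosted point */
  if (lo < 0) lo = 0;     /* left edge of the domain */
  if (hi > M) hi = M;     /* right edge of the domain */
  *gxs = lo;
  *gxm = hi - lo;
}
```

For M = 10, m = 3, and s = 1, the middle process owns points 4 to 6 and its ghosted range is 3 to 7, while the first process owns 0 to 3 and gains only one ghost point (on its interior side), giving the ghosted range 0 to 4.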
When either type of stencil is used, DMDA_STENCIL_STAR or DMDA_STENCIL_BOX, the local vectors
(with the ghost points) represent rectangular arrays, including the extra corner elements in the
DMDA_STENCIL_STAR case. This configuration provides simple access to the elements by employing two- (or three-)
dimensional indexing. The only difference between the two cases is that when DMDA_STENCIL_STAR is used, the
extra corner components are not scattered between the processes and thus contain undefined values that should
not be used.
To assemble global stiffness matrices, one needs either
• the global node number of each local node including the ghost nodes. This number may be determined
  by using the command
  DMDAGetGlobalIndices(DM da,int *n,int *idx[]);
  The output argument n contains the number of local nodes, including ghost nodes, while idx contains
  a list of length n containing the global indices that correspond to the local nodes. Either parameter
  may be omitted by passing PETSC_NULL. Note that the Fortran interface differs slightly; see
  Section 10.1.3 for details.
• or to set up the vectors and matrices so that their entries may be added using the local numbering.
  This is done by first calling
  DMDAGetISLocalToGlobalMapping(DM da,ISLocalToGlobalMapping *map);
  followed by
  VecSetLocalToGlobalMapping(Vec x,ISLocalToGlobalMapping map);
  MatSetLocalToGlobalMapping(Mat A,ISLocalToGlobalMapping map);
  Now entries may be added to the vector and matrix using the local numbering and
  VecSetValuesLocal() and MatSetValuesLocal().
Since the global ordering that PETSc uses to manage its parallel vectors (and matrices) does not usually
correspond to the natural ordering of a two- or three-dimensional array, the DMDA structure provides an
application ordering AO (see Section 2.3.1) that maps between the natural ordering on a rectangular grid
and the ordering PETSc uses to parallelize. This ordering context can be obtained with the command
DMDAGetAO(DM da,AO *ao);
In Figure 10 we indicate the orderings for a two-dimensional distributed array, divided among four pro-
cesses.
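The two orderings can be computed explicitly for a small grid. The sketch below uses hypothetical helper names and 0-based indices; it assumes that processes are numbered row by row across the process grid and that each process numbers its own points row by row, with lx[] and ly[] giving the widths and heights of the pieces. With a 5 x 6 grid, lx = {3, 2}, and ly = {3, 3}, this appears to be the layout of Figure 10.

```c
#include <assert.h>

/* Natural ordering: number the whole M-wide grid row by row (0-based). */
static int toy_natural_index(int i, int j, int M)
{
  return j * M + i;
}

/* PETSc-style ordering (ASSUMED layout, see the lead-in): all points of
   process 0 first, then process 1, and so on. */
static int toy_petsc_index(int i, int j, int m, const int lx[], int n, const int ly[])
{
  int M = 0, px = 0, py = 0, x0 = 0, y0 = 0;
  for (int k = 0; k < m; k++) M += lx[k];       /* global width */
  while (px < m - 1 && i >= x0 + lx[px]) {      /* find the process column */
    x0 += lx[px];
    px++;
  }
  while (py < n - 1 && j >= y0 + ly[py]) {      /* find the process row */
    y0 += ly[py];
    py++;
  }
  assert(i >= x0 && i < x0 + lx[px] && j >= y0 && j < y0 + ly[py]);
  /* points owned by lower process rows, plus processes to the left */
  int before = y0 * M + ly[py] * x0;
  return before + (j - y0) * lx[px] + (i - x0); /* offset inside the owner */
}
```

For the point (i, j) = (3, 0), the natural index is 3 and the PETSc-style index is 9 (in 1-based numbering, natural 4 and PETSc 10), since that point is the first one owned by the second process.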
The example ${PETSC_DIR}/src/snes/examples/tutorials/ex5.c illustrates the use of
a distributed array in the solution of a nonlinear problem. The analogous Fortran program is
${PETSC_DIR}/src/snes/examples/tutorials/ex5f.F; see Chapter 5 for a discussion of
the nonlinear solvers.
Figure 10: Natural Ordering and PETSc Ordering for a 2D Distributed Array (Four Processes) (panels: Natural Ordering, PETSc Ordering)
2.5 Software for Managing Vectors Related to Unstructured Grids
2.5.1 Index Sets
To facilitate general vector scatters and gathers used, for example, in updating ghost points for problems
defined on unstructured grids, PETSc employs the concept of an index set. An index set, which is a
generalization of a set of integer indices, is used to define scatters, gathers, and similar operations on vectors and
matrices.
The following command creates an index set based on a list of integers:
ISCreateGeneral(MPI_Comm comm,int n,int *indices,PetscCopyMode mode, IS *is);
When mode is PETSC_COPY_VALUES this routine copies the n indices passed to it by the integer array
indices. Thus, the user should be sure to free the integer array indices when it is no longer needed,
perhaps directly after the call to ISCreateGeneral(). The communicator, comm, should consist of all
processes that will be using the IS.
Another standard index set is defined by a starting point (first) and a stride (step), and can be
created with the command
ISCreateStride(MPI_Comm comm,int n,int first,int step,IS *is);
Index sets can be destroyed with the command
ISDestroy(IS *is);
On rare occasions the user may need to access information directly from an index set. Several commands
assist in this process:
ISGetSize(IS is,int *size);
ISStrideGetInfo(IS is,int *first,int *stride);
ISGetIndices(IS is,int **indices);
The function ISGetIndices() returns a pointer to a list of the indices in the index set. For certain index sets,
this may be a temporary array of indices created specifically for a given routine. Thus, once the user finishes
using the array of indices, the routine
ISRestoreIndices(IS is, int **indices);
should be called to ensure that the system can free the space it may have used to generate the list of indices.
A blocked version of the index sets can be created with the command
ISCreateBlock(MPI_Comm comm,int bs,int n,int *indices,PetscCopyMode mode, IS *is);
This version is used for defining operations in which each element of the index set refers to a block of
bs vector entries. Related routines analogous to those described above exist as well, including
ISBlockGetIndices(), ISBlockGetSize(), ISBlockGetLocalSize(), ISGetBlockSize(). See the man pages for details.
2.5.2 Scatters and Gathers
PETSc vectors have full support for general scatters and gathers. One can select any subset of the com-
ponents of a vector to insert or add to any subset of the components of another vector. We refer to these
operations as generalized scatters, though they are actually a combination of scatters and gathers.
To copy selected components from one vector to another, one uses the following set of commands:
VecScatterCreate(Vec x,IS ix,Vec y,IS iy,VecScatter *ctx);
VecScatterBegin(VecScatter ctx,Vec x,Vec y,INSERT_VALUES,SCATTER_FORWARD);
VecScatterEnd(VecScatter ctx,Vec x,Vec y,INSERT_VALUES,SCATTER_FORWARD);
VecScatterDestroy(VecScatter *ctx);
Here ix denotes the index set of the first vector, while iy indicates the index set of the destination vector.
The vectors can be parallel or sequential. The only requirements are that the number of entries in the index
set of the first vector, ix, equal the number in the destination index set, iy, and that the vectors be long
enough to contain all the indices referred to in the index sets. The argument INSERT_VALUES specifies
that the vector elements will be inserted into the specified locations of the destination vector, overwriting
any existing values. To add the components, rather than insert them, the user should select the option
ADD_VALUES instead of INSERT_VALUES.
To perform a conventional gather operation, the user simply makes the destination index set, iy, be a
stride index set with a stride of one. Similarly, a conventional scatter can be done with an initial (sending)
index set consisting of a stride. The scatter routines are collective operations (i.e. all processes that own a
parallel vector must call the scatter routines). When scattering from a parallel vector to sequential vectors,
each process has its own sequential vector that receives values from locations as indicated in its own index
set. Similarly, in scattering from sequential vectors to a parallel vector, each process has its own sequential
vector that makes contributions to the parallel vector.
Caution: When INSERT_VALUES is used, if two different processes contribute different values to the
same component in a parallel vector, either value may end up being inserted. When ADD_VALUES is used,
the correct sum is added to the correct location.
In some cases one may wish to "undo" a scatter, that is, perform the scatter backwards, switching the
roles of the sender and receiver. This is done by using
VecScatterBegin(VecScatter ctx,Vec y,Vec x,INSERT_VALUES,SCATTER_REVERSE);
VecScatterEnd(VecScatter ctx,Vec y,Vec x,INSERT_VALUES,SCATTER_REVERSE);
Note that the roles of the first two arguments to these routines must be swapped whenever the
SCATTER_REVERSE option is used.
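On a single process, the effect of a generalized scatter reduces to a pair of indexed loops. The sketch below is a serial model with hypothetical names (toy_scatter_forward(), toy_scatter_reverse(), ToyInsertMode); it ignores all communication and is meant only to show what the index sets, the insert mode, and the forward and reverse directions mean.

```c
#include <assert.h>

/* Mirrors the roles of INSERT_VALUES and ADD_VALUES. */
typedef enum { TOY_INSERT, TOY_ADD } ToyInsertMode;

/* Forward: entry ix[k] of x goes to entry iy[k] of y. */
static void toy_scatter_forward(int n, const int ix[], const int iy[],
                                const double x[], double y[], ToyInsertMode mode)
{
  assert(n >= 0);
  for (int k = 0; k < n; k++) {
    if (mode == TOY_ADD) y[iy[k]] += x[ix[k]];
    else                 y[iy[k]]  = x[ix[k]];
  }
}

/* Reverse: same index sets, but data flows from y back into x. */
static void toy_scatter_reverse(int n, const int ix[], const int iy[],
                                const double y[], double x[], ToyInsertMode mode)
{
  assert(n >= 0);
  for (int k = 0; k < n; k++) {
    if (mode == TOY_ADD) x[ix[k]] += y[iy[k]];
    else                 x[ix[k]]  = y[iy[k]];
  }
}
```

With ix = {1, 3} and iy = {0, 1} (a stride of one in the destination, i.e. a conventional gather), a forward insert copies x[1] and x[3] into y[0] and y[1]; a reverse scatter in add mode then accumulates y[0] and y[1] back into x[1] and x[3].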
Vec p, x; /* initial vector, destination vector */
VecScatter scatter; /* scatter context */
IS from, to; /* index sets that define the scatter */
PetscScalar *values;
int idx_from[] = {100,200}, idx_to[] = {0,1};
VecCreateSeq(PETSC_COMM_SELF,2,&x);
ISCreateGeneral(PETSC_COMM_SELF,2,idx_from,PETSC_COPY_VALUES,&from);
ISCreateGeneral(PETSC_COMM_SELF,2,idx_to,PETSC_COPY_VALUES,&to);
VecScatterCreate(p,from,x,to,&scatter);
VecScatterBegin(scatter,p,x,INSERT_VALUES,SCATTER_FORWARD);
VecScatterEnd(scatter,p,x,INSERT_VALUES,SCATTER_FORWARD);
VecGetArray(x,&values);
/* ... use values[0] and values[1] ... */
VecRestoreArray(x,&values);
ISDestroy(&from);
ISDestroy(&to);
VecScatterDestroy(&scatter);
Figure 11: Example Code for Vector Scatters
Once a VecScatter object has been created it may be used with any vectors that have the appropriate
parallel data layout. That is, one can call VecScatterBegin() and VecScatterEnd() with different vectors than
used in the call to VecScatterCreate() so long as they have the same parallel layout (number of elements
on each process are the same). Usually, these different vectors would have been obtained via calls to
VecDuplicate() from the original vectors used in the call to VecScatterCreate().
There is a PETSc routine that is nearly the opposite of VecSetValues(), that is, VecGetValues(), but it can
only get local values from the vector. To get off process values, the user should create a new vector where
the components are to be stored and perform the appropriate vector scatter. For example, if one desires to
obtain the values of the 100th and 200th entries of a parallel vector, p, one could use a code such as that
within Figure 11. In this example, the values of the 100th and 200th components are placed in the array
values. In this example each process now has the 100th and 200th component, but obviously each process
could gather any elements it needed, or none by creating an index set with no entries.
The scatter comprises two stages, in order to allow overlap of communication and computation. The
introduction of the VecScatter context allows the communication patterns for the scatter to be computed
once and then reused repeatedly. Generally, even setting up the communication for a scatter requires
communication; hence, it is best to reuse such information when possible.
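The index-set semantics of a scatter can be illustrated without PETSc or MPI. The following sequential C
sketch is not PETSc code: the helper names scatter_forward_insert() and scatter_reverse_add() are
hypothetical, and plain arrays stand in for vectors. It only mirrors what a forward scatter with
INSERT_VALUES and a reverse scatter with ADD_VALUES do to the entries selected by the from and to
index sets.

```c
#include <assert.h>
#include <stddef.h>

/* Forward scatter, insert semantics: dst[to[k]] = src[from[k]]. */
void scatter_forward_insert(const double *src, const int *from,
                            double *dst, const int *to, size_t n)
{
    assert(n == 0 || (src && from && dst && to));
    for (size_t k = 0; k < n; k++)
        dst[to[k]] = src[from[k]];
}

/* Reverse scatter, add semantics: the roles of the two vectors swap
   and values accumulate, src[from[k]] += dst[to[k]]. */
void scatter_reverse_add(double *src, const int *from,
                         const double *dst, const int *to, size_t n)
{
    assert(n == 0 || (src && from && dst && to));
    for (size_t k = 0; k < n; k++)
        src[from[k]] += dst[to[k]];
}
```

With from = {100,200} and to = {0,1}, as in Figure 11, the forward call copies entries 100 and 200 of
the source array into entries 0 and 1 of the destination array.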
2.5.3 Scattering Ghost Values
The scatters provide a very general method for managing the communication of required ghost values for
unstructured grid computations. One scatters the global vector into a local ghosted work vector, performs
the computation on the local work vectors, and then scatters back into the global solution vector. In the
simplest case this may be written as
Function: (Input Vec globalin, Output Vec globalout)
VecScatterBegin(VecScatter scatter,Vec globalin,Vec localin,InsertMode INSERT_VALUES,
ScatterMode SCATTER_FORWARD);
VecScatterEnd(VecScatter scatter,Vec globalin,Vec localin,InsertMode INSERT_VALUES,
ScatterMode SCATTER_FORWARD);
/* For example, do local calculations from localin to localout */
VecScatterBegin(VecScatter scatter,Vec localout,Vec globalout,InsertMode ADD_VALUES,
ScatterMode SCATTER_REVERSE);
VecScatterEnd(VecScatter scatter,Vec localout,Vec globalout,InsertMode ADD_VALUES,
ScatterMode SCATTER_REVERSE);
2.5.4 Vectors with Locations for Ghost Values
There are two minor drawbacks to the basic approach described above:
- the extra memory requirement for the local work vector, localin, which duplicates the memory in
globalin, and
- the extra time required to copy the local values from localin to globalin.
An alternative approach is to allocate global vectors with space preallocated for the ghost values; this
may be done with either
VecCreateGhost(MPI_Comm comm,int n,int N,int nghost,int *ghosts,Vec *vv)
or
VecCreateGhostWithArray(MPI_Comm comm,int n,int N,int nghost,int *ghosts,
PetscScalar *array,Vec *vv)
Here n is the number of local vector entries, N is the number of global entries (or PETSC_DECIDE) and
nghost is the number of ghost entries. The array ghosts is of size nghost and contains the global
vector location for each local ghost location. Using VecDuplicate() or VecDuplicateVecs() on a ghosted
vector will generate additional ghosted vectors.
In many ways a ghosted vector behaves just like any other MPI vector created by VecCreateMPI(). The
difference is that the ghosted vector has an additional local representation that allows one to access the
ghost locations. This is done through the call to
VecGhostGetLocalForm(Vec g,Vec *l);
The vector l is a sequential representation of the parallel vector g that shares the same array space (and
hence numerical values), but allows one to access the ghost values past the end of the array. Note that
one accesses the entries in l using the local numbering of elements and ghosts, while they are accessed in g
using the global numbering.
A common usage of a ghosted vector is given by
VecGhostUpdateBegin(Vec globalin,InsertMode INSERT_VALUES,
ScatterMode SCATTER_FORWARD);
VecGhostUpdateEnd(Vec globalin,InsertMode INSERT_VALUES,
ScatterMode SCATTER_FORWARD);
VecGhostGetLocalForm(Vec globalin,Vec *localin);
VecGhostGetLocalForm(Vec globalout,Vec *localout);
/* Do local calculations from localin to localout */
VecGhostRestoreLocalForm(Vec globalin,Vec *localin);
VecGhostRestoreLocalForm(Vec globalout,Vec *localout);
VecGhostUpdateBegin(Vec globalout,InsertMode ADD_VALUES,
ScatterMode SCATTER_REVERSE);
VecGhostUpdateEnd(Vec globalout,InsertMode ADD_VALUES,
ScatterMode SCATTER_REVERSE);
The routines VecGhostUpdateBegin() and VecGhostUpdateEnd() are equivalent to the routines
VecScatterBegin() and VecScatterEnd() above except that since they are scattering into the ghost locations,
they do not need to copy the local vector values, which are already in place. In addition, the user does not
have to allocate the local work vector, since the ghosted vector already has allocated slots to contain the
ghost values.
The input arguments INSERT_VALUES and SCATTER_FORWARD cause the ghost values to be
correctly updated from the appropriate process. The arguments ADD_VALUES and SCATTER_REVERSE
update the local portions of the vector from all the other processes' ghost values. This would be
appropriate, for example, when performing a finite element assembly of a load vector.
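The two update modes can be sketched in plain C by simulating two processes inside one program. This is
an illustration only, not PETSc code: the type GhostedLocal, the two functions, and the fixed sizes are all
hypothetical, and it assumes every process owns the same number of entries so that the owner of a global
index can be found by integer division. Each local array holds the owned entries first and the ghost slots
past the end, as described above for the local form.

```c
#include <assert.h>

#define NPROC  2   /* simulated processes */
#define NLOCAL 3   /* owned entries per simulated process */
#define NGHOST 1   /* ghost entries per simulated process */

/* Local form of a ghosted vector: owned entries first, ghosts past the end. */
typedef struct {
    double a[NLOCAL + NGHOST];
    int    ghosts[NGHOST];      /* global index held by each ghost slot */
} GhostedLocal;

/* Like INSERT_VALUES with SCATTER_FORWARD: refresh ghost slots from the owners. */
void ghost_update_forward(GhostedLocal v[NPROC])
{
    for (int p = 0; p < NPROC; p++)
        for (int g = 0; g < NGHOST; g++) {
            int gi = v[p].ghosts[g];
            assert(gi >= 0 && gi < NPROC * NLOCAL);
            v[p].a[NLOCAL + g] = v[gi / NLOCAL].a[gi % NLOCAL];
        }
}

/* Like ADD_VALUES with SCATTER_REVERSE: add ghost slots into the owners' entries. */
void ghost_update_reverse_add(GhostedLocal v[NPROC])
{
    for (int p = 0; p < NPROC; p++)
        for (int g = 0; g < NGHOST; g++) {
            int gi = v[p].ghosts[g];
            assert(gi >= 0 && gi < NPROC * NLOCAL);
            v[gi / NLOCAL].a[gi % NLOCAL] += v[p].a[NLOCAL + g];
        }
}
```

In an assembly setting, the ghost slots would typically hold contributions computed locally for entries
owned elsewhere, and the reverse update sums those contributions on the owning process.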
Section 3.5 discusses the important topic of partitioning an unstructured grid.
Chapter 3
Matrices
PETSc provides a variety of matrix implementations because no single matrix format is appropriate for all
problems. Currently we support dense storage and compressed sparse row storage (both sequential and
parallel versions), as well as several specialized formats. Additional formats can be added.
This chapter describes the basics of using PETSc matrices in general (regardless of the particular format
chosen) and discusses tips for efficient use of the several simple uniprocess and parallel matrix types. The
use of PETSc matrices involves the following actions: create a particular type of matrix, insert values into it,
process the matrix, use the matrix for various computations, and finally destroy the matrix. The application
code does not need to know or care about the particular storage formats of the matrices.
3.1 Creating and Assembling Matrices
The simplest way to form a PETSc matrix, A, is to call MatCreate() followed by MatSetSizes():
MatCreate(MPI_Comm comm,Mat *A);
MatSetSizes(Mat A,int m,int n,int M,int N);
This sequence generates a sequential matrix when running one process and a parallel matrix for two or more
processes; the particular matrix format is set by the user via options database commands. The user specifies
either the global matrix dimensions, given by M and N, or the local dimensions, given by m and n, while PETSc
completely controls memory allocation. This approach facilitates switching among various matrix types, for
example, to determine the format that is most efficient for a certain application. By default, MatCreate()
employs the sparse AIJ format, which is discussed in detail in Section 3.1.1. See the manual pages for further
information about available matrix formats.
To insert or add entries to a matrix, one can call a variant of MatSetValues(), either
MatSetValues(Mat A,int m,const int idxm[],int n,const int idxn[],const PetscScalar values[],
INSERT_VALUES);
or
MatSetValues(Mat A,int m,const int idxm[],int n,const int idxn[],const PetscScalar values[],
ADD_VALUES);
This routine inserts or adds a logically dense subblock of dimension m*n into the matrix. The integer indices
idxm and idxn, respectively, indicate the global row and column numbers to be inserted. MatSetValues()
uses the standard C convention, where the row and column matrix indices begin with zero regardless of
the storage format employed. The array values is logically two-dimensional, containing the values that
are to be inserted. By default the values are given in row major order, which is the opposite of the
Fortran convention, meaning that the value to be put in row idxm[i] and column idxn[j] is located in
values[i*n+j]. To allow the insertion of values in column major order, one can call the command
MatSetOption(Mat A,MAT_COLUMN_ORIENTED,PETSC_TRUE);
Warning: Several of the sparse implementations do not currently support the column-oriented option.
This notation should not be a mystery to anyone. For example, to insert one matrix into another when
using MATLAB, one uses the command A(im,in) = B; where im and in contain the indices for the
rows and columns. This action is identical to the calls above to MatSetValues().
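The row major convention values[i*n+j] can be made concrete with a small sketch that adds a block into
an ordinary dense array. This is plain C rather than PETSc: the function dense_add_block() and the dense
row major target array are assumptions made for illustration, chosen so that the index arithmetic matches
the description of MatSetValues() with ADD_VALUES given above.

```c
#include <assert.h>
#include <stddef.h>

/* Add an m-by-n logically dense block, stored row major in values[],
   into the dense row major matrix A that has ncols columns.
   The entry values[i*n + j] goes to row idxm[i], column idxn[j]. */
void dense_add_block(double *A, size_t ncols,
                     int m, const int idxm[], int n, const int idxn[],
                     const double values[])
{
    assert(A && idxm && idxn && values);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            A[(size_t)idxm[i] * ncols + (size_t)idxn[j]] += values[i * n + j];
}
```

For idxm = {0,2}, idxn = {1,3}, and values = {1,2,3,4}, the entries (0,1), (0,3), (2,1), and (2,3)
receive 1, 2, 3, and 4, respectively.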
When using the block compressed sparse row matrix format (MATSEQBAIJ or MATMPIBAIJ), one can
insert elements more efficiently using the block variant, MatSetValuesBlocked() or
MatSetValuesBlockedLocal().
The function MatSetOption() accepts several other inputs; see the manual page for details.
After the matrix elements have been inserted or added into the matrix, they must be processed (also
called assembled) before they can be used. The routines for matrix processing are
MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);
By placing other code between these two calls, the user can perform computations while messages are in
transit. Calls to MatSetValues() with the INSERT_VALUES and ADD_VALUES options cannot be mixed
without intervening calls to the assembly routines. For such intermediate assembly calls the second routine
argument typically should be MAT_FLUSH_ASSEMBLY, which omits some of the work of the full assembly
process. MAT_FINAL_ASSEMBLY is required only in the last matrix assembly before a matrix is used.
Even though one may insert values into PETSc matrices without regard to which process eventually
stores them, for efficiency reasons we usually recommend generating most entries on the process where
they are destined to be stored. To help the application programmer with this task for matrices that are
distributed across the processes by ranges, the routine
MatGetOwnershipRange(Mat A,int *first_row,int *last_row);
informs the user that all rows from first_row to last_row-1 (since the value returned in last_row
is one more than the global index of the last local row) will be stored on the local process.
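The half-open convention for ownership ranges can be sketched as follows. This is a standalone C
illustration, not the PETSc implementation: the function ownership_range() is hypothetical, and it
assumes rows are divided as evenly as possible with any remainder assigned to the lowest ranks, which is
consistent with the default split of 8 rows over three processes into 3, 3, and 2 used later in this chapter.

```c
#include <assert.h>

/* Compute the half-open row range [*first_row, *last_row) owned by `rank`
   when N rows are split over `size` processes as evenly as possible. */
void ownership_range(int N, int size, int rank, int *first_row, int *last_row)
{
    assert(N >= 0 && size > 0 && rank >= 0 && rank < size);
    int base  = N / size;   /* rows every process receives           */
    int extra = N % size;   /* the first `extra` ranks get one more  */
    *first_row = rank * base + (rank < extra ? rank : extra);
    *last_row  = *first_row + base + (rank < extra ? 1 : 0);
}
```

A loop of the form for (row = first_row; row < last_row; row++) then visits exactly the locally owned
rows.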
In the sparse matrix implementations, once the assembly routines have been called, the matrices are
compressed and can be used for matrix-vector multiplication, etc. Inserting new values into the matrix
at this point will be expensive, since it requires copies and possible memory allocation. Thus, whenever
possible one should completely set the values in the matrices before calling the final assembly routines.
If one wishes to repeatedly assemble matrices that retain the same nonzero pattern (such as within a
nonlinear or time-dependent problem), the option
MatSetOption(Mat A,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE);
should be specified after the first matrix has been fully assembled. This option ensures that certain data
structures and communication information will be reused (instead of regenerated) during successive steps,
thereby increasing efficiency. See ${PETSC_DIR}/src/ksp/ksp/examples/tutorials/ex5.c
for a simple example of solving two linear systems that use the same matrix data structure.
3.1.1 Sparse Matrices
The default matrix representation within PETSc is the general sparse AIJ format (also called the Yale sparse
matrix format or compressed sparse row format, CSR). This section discusses tips for efficiently using this
matrix format for large-scale applications. Additional formats (such as block compressed row and block
diagonal storage, which are generally much more efficient for problems with multiple degrees of freedom
per node) are discussed below. Beginning users need not concern themselves initially with such details and
may wish to proceed directly to Section 3.2. However, when an application code progresses to the point of
tuning for efficiency and/or generating timing results, it is crucial to read this information.
Sequential AIJ Sparse Matrices
In the PETSc AIJ matrix formats, we store the nonzero elements by rows, along with an array of
corresponding column numbers and an array of pointers to the beginning of each row. Note that the diagonal
matrix entries are stored with the rest of the nonzeros (not separately).
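The three arrays just described can be written down directly. The sketch below is a generic compressed
sparse row layout in plain C, not PETSc's internal data structure: the type CsrMatrix and the function
csr_mult() are hypothetical names, and zero-based indexing is assumed.

```c
#include <assert.h>

typedef struct {
    int           m;        /* number of rows */
    const int    *row_ptr;  /* m+1 offsets: row i occupies [row_ptr[i], row_ptr[i+1]) */
    const int    *col;      /* column number of each stored nonzero */
    const double *val;      /* value of each stored nonzero, row by row */
} CsrMatrix;

/* Compute y = A*x by walking each row's slice of col[] and val[]. */
void csr_mult(const CsrMatrix *A, const double *x, double *y)
{
    assert(A && x && y);
    for (int i = 0; i < A->m; i++) {
        double sum = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col[k]];
        y[i] = sum;
    }
}
```

For the 3 by 3 matrix with rows (1 2 0), (0 5 6), (9 0 10), the arrays are row_ptr = {0,2,4,6},
col = {0,1,1,2,0,2}, and val = {1,2,5,6,9,10}; the diagonal entries 1, 5, and 10 sit among the other
nonzeros of their rows.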
To create a sequential AIJ sparse matrix, A, with m rows and n columns, one uses the command
MatCreateSeqAIJ(PETSC_COMM_SELF,int m,int n,int nz,int *nnz,Mat *A);
where nz or nnz can be used to preallocate matrix memory, as discussed below. The user can set nz=0
and nnz=PETSC_NULL for PETSc to control all matrix memory allocation.
The sequential and parallel AIJ matrix storage formats by default employ i-nodes (identical nodes)
when possible. We search for consecutive rows with the same nonzero structure, thereby reusing matrix
information for increased efficiency. Related options database keys are -mat_no_inode (do not use
inodes) and -mat_inode_limit <limit> (set inode limit (max limit=5)). Note that problems with
a single degree of freedom per grid node will automatically not use i-nodes.
By default the internal data representation for the AIJ formats employs zero-based indexing. For
compatibility with standard Fortran storage, thus enabling use of external Fortran software packages such as
SPARSKIT, the option -mat_aij_oneindex enables one-based indexing, where the stored row and
column indices begin at one, not zero. All user calls to PETSc routines, regardless of this option, use
zero-based indexing.
Preallocation of Memory for Sequential AIJ Sparse Matrices
The dynamic process of allocating new memory and copying from the old storage to the new is intrinsically
very expensive. Thus, to obtain good performance when assembling an AIJ matrix, it is crucial to preallocate
the memory needed for the sparse matrix. The user has two choices for preallocating matrix memory via
MatCreateSeqAIJ().
One can use the scalar nz to specify the expected number of nonzeros for each row. This is generally
fine if the number of nonzeros per row is roughly the same throughout the matrix (or as a quick and easy first
step for preallocation). If one underestimates the actual number of nonzeros in a given row, then during the
assembly process PETSc will automatically allocate additional needed space. However, this extra memory
allocation can slow the computation.
If different rows have very different numbers of nonzeros, one should attempt to indicate (nearly) the
exact number of elements intended for the various rows with the optional array, nnz of length m, where m is
the number of rows, for example
int nnz[m];
nnz[0] = <nonzeros in row 0>
nnz[1] = <nonzeros in row 1>
....
nnz[m-1] = <nonzeros in row m-1>
In this case, the assembly process will require no additional memory allocations if the nnz estimates are
correct. If, however, the nnz estimates are incorrect, PETSc will automatically obtain the additional needed
space, at a slight loss of efficiency.
Using the array nnz to preallocate memory is especially important for efficient matrix assembly if the
number of nonzeros varies considerably among the rows. One can generally set nnz either by knowing
in advance the problem structure (e.g., the stencil for finite difference problems on a structured grid) or by
precomputing the information by using a segment of code similar to that for the regular matrix assembly.
The overhead of determining the nnz array will be quite small compared with the overhead of the inherently
expensive mallocs and moves of data that are needed for dynamic allocation during matrix assembly. Always
guess high if the exact value is not known (since extra space is cheaper than too little).
Thus, when assembling a sparse matrix with very different numbers of nonzeros in various rows, one
could proceed as follows for finite difference methods:
- Allocate integer array nnz.
- Loop over grid, counting the expected number of nonzeros for the row(s)
associated with the various grid points.
- Create the sparse matrix via MatCreateSeqAIJ() or alternative.
- Loop over the grid, generating matrix entries and inserting in matrix via MatSetValues().
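The counting loop in the second step above can be sketched for one concrete case. The example assumes a
5-point finite difference stencil on an nx by ny structured grid with the natural ordering row = j*nx + i;
both the stencil and the function name count_nnz_5pt() are illustrative choices, not something prescribed
by PETSc. The resulting array is what one would pass as nnz to the matrix creation routine.

```c
#include <assert.h>

/* Fill nnz[] (length nx*ny) with the number of nonzeros in each row of a
   5-point finite difference matrix on an nx-by-ny grid, row = j*nx + i. */
void count_nnz_5pt(int nx, int ny, int *nnz)
{
    assert(nx > 0 && ny > 0 && nnz);
    for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i++) {
            int count = 1;              /* diagonal entry */
            if (i > 0)      count++;    /* west neighbor  */
            if (i < nx - 1) count++;    /* east neighbor  */
            if (j > 0)      count++;    /* south neighbor */
            if (j < ny - 1) count++;    /* north neighbor */
            nnz[j * nx + i] = count;
        }
}
```

On a 3 by 3 grid this gives 3 nonzeros for corner rows, 4 for edge rows, and 5 for the single interior
row.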
For (vertex-based) finite element type calculations, an analogous procedure is as follows:
- Allocate integer array nnz.
- Loop over vertices, computing the number of neighbor vertices, which determines the
number of nonzeros for the corresponding matrix row(s).
- Create the sparse matrix via MatCreateSeqAIJ() or alternative.
- Loop over elements, generating matrix entries and inserting in matrix via MatSetValues().
The -info option causes the routines MatAssemblyBegin() and MatAssemblyEnd() to print
information about the success of the preallocation. Consider the following example for the MATSEQAIJ matrix
format:
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:20 unneeded, 100 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 0
The first line indicates that the user preallocated 120 spaces but only 100 were used. The second line
indicates that the user preallocated enough space so that PETSc did not have to internally allocate additional
space (an expensive operation). In the next example the user did not preallocate sufficient space, as indicated
by the fact that the number of mallocs is very large (bad for efficiency):
MatAssemblyEnd_SeqAIJ:Matrix size 10 X 10; storage space:47 unneeded, 1000 used
MatAssemblyEnd_SeqAIJ:Number of mallocs during MatSetValues is 40000
Although at first glance such procedures for determining the matrix structure in advance may seem
unusual, they are actually very efficient because they alleviate the need for dynamic construction of the
matrix data structure, which can be very expensive.
Parallel AIJ Sparse Matrices
Parallel sparse matrices with the AIJ format can be created with the command
MatCreateMPIAIJ(MPI_Comm comm,int m,int n,int M,int N,int d_nz,
int *d_nnz, int o_nz,int *o_nnz,Mat *A);
A is the newly created matrix, while the arguments m, M, and N indicate the number of local rows and
the number of global rows and columns, respectively. In the PETSc partitioning scheme, all the matrix
columns are local and n is the number of columns corresponding to the local part of a parallel vector. Either
the local or global parameters can be replaced with PETSC_DECIDE, so that PETSc will determine them. The
matrix is stored with a fixed number of rows on each process, given by m, or determined by PETSc if m is
PETSC_DECIDE.
If PETSC_DECIDE is not used for the arguments m and n, then the user must ensure that they are chosen
to be compatible with the vectors. To do this, one first considers the matrix-vector product y = Ax. The m
that is used in the matrix creation routine MatCreateMPIAIJ() must match the local size used in the vector
creation routine VecCreateMPI() for y. Likewise, the n used must match that used as the local size in
VecCreateMPI() for x.
The user must set d_nz=0, o_nz=0, d_nnz=PETSC_NULL, and o_nnz=PETSC_NULL for PETSc
to control dynamic allocation of matrix memory space. Analogous to nz and nnz for the routine
MatCreateSeqAIJ(), these arguments optionally specify nonzero information for the diagonal (d_nz and
d_nnz) and off-diagonal (o_nz and o_nnz) parts of the matrix. For a square global matrix, we define each
process's diagonal portion to be its local rows and the corresponding columns (a square submatrix); each
process's off-diagonal portion encompasses the remainder of the local matrix (a rectangular submatrix). The
rank in the MPI communicator determines the absolute ordering of the blocks. That is, the process with rank
0 in the communicator given to MatCreateMPIAIJ() contains the top rows of the matrix; the i-th process in
that communicator contains the i-th block of the matrix.
Preallocation of Memory for Parallel AIJ Sparse Matrices
As discussed above, preallocation of memory is critical for achieving good performance during matrix
assembly, as this reduces the number of allocations and copies required. We present an example for three
processes to indicate how this may be done for the MATMPIAIJ matrix format. Consider the following 8 by 8
matrix, which is partitioned by default with three rows on the first process, three on the second and two on
the third.
1 2 0 | 0 3 0 | 0 4
0 5 6 | 7 0 0 | 8 0
9 0 10 | 11 0 0 | 12 0
13 0 14 | 15 16 17 | 0 0
0 18 0 | 19 20 21 | 0 0
0 0 0 | 22 23 0 | 24 0
25 26 27 | 0 0 28 | 29 0
30 0 0 | 31 32 33 | 0 34
The diagonal submatrix, d, on the first process is given by
1 2 0
0 5 6
9 0 10
while the off-diagonal submatrix, o, is given by
0 3 0 0 4
7 0 0 8 0
11 0 0 12 0
For the first process one could set d_nz to 2 (since each row has 2 nonzeros) or, alternatively, set d_nnz
to {2,2,2}. The o_nz could be set to 2 since each row of the o matrix has 2 nonzeros, or o_nnz could be
set to {2,2,2}.
For the second process the d submatrix is given by
15 16 17
19 20 21
22 23 0
.
Thus, one could set d_nz to 3, since the maximum number of nonzeros in each row is 3, or alternatively
one could set d_nnz to {3,3,2}, thereby indicating that the first two rows will have 3 nonzeros while the
third has 2. The corresponding o submatrix for the second process is
13 0 14 0 0
0 18 0 0 0
0 0 0 24 0
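The per-row counts for the diagonal and off-diagonal parts can be checked mechanically. The sketch below
is plain C and purely illustrative: it stores the 8 by 8 example matrix densely and counts nonzeros for
the rows [rstart, rend) owned by one process, assuming (as for a square matrix in the text) that the
diagonal block uses the same index range for its columns. The names A8 and count_diag_offdiag() are
hypothetical.

```c
#include <assert.h>

enum { NGLOBAL = 8 };

/* The 8 by 8 example matrix from the text, stored densely for counting. */
const double A8[NGLOBAL][NGLOBAL] = {
    { 1,  2,  0,  0,  3,  0,  0,  4},
    { 0,  5,  6,  7,  0,  0,  8,  0},
    { 9,  0, 10, 11,  0,  0, 12,  0},
    {13,  0, 14, 15, 16, 17,  0,  0},
    { 0, 18,  0, 19, 20, 21,  0,  0},
    { 0,  0,  0, 22, 23,  0, 24,  0},
    {25, 26, 27,  0,  0, 28, 29,  0},
    {30,  0,  0, 31, 32, 33,  0, 34}
};

/* For local rows [rstart, rend), count nonzeros whose column lies inside
   [rstart, rend) (diagonal part) and outside it (off-diagonal part). */
void count_diag_offdiag(int rstart, int rend, int *d_nnz, int *o_nnz)
{
    assert(0 <= rstart && rstart <= rend && rend <= NGLOBAL);
    for (int i = rstart; i < rend; i++) {
        d_nnz[i - rstart] = 0;
        o_nnz[i - rstart] = 0;
        for (int j = 0; j < NGLOBAL; j++) {
            if (A8[i][j] == 0.0) continue;
            if (j >= rstart && j < rend) d_nnz[i - rstart]++;
            else                         o_nnz[i - rstart]++;
        }
    }
}
```

Calling count_diag_offdiag(0, 3, ...) reproduces d_nnz = {2,2,2} and o_nnz = {2,2,2} for the first
process, and count_diag_offdiag(3, 6, ...) gives d_nnz = {3,3,2} for the second process, matching the
submatrices shown above.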
\begin{pmatrix}
A_{aa} & A_{ab} & A_{ac} \\
A_{ba} & A_{bb} & A_{bc} \\
A_{ca} & A_{cb} & A_{cc}
\end{pmatrix}.
There are two fundamentally different ways that this matrix could be stored: as a single assembled sparse
matrix where entries from all blocks are merged together (monolithic), or as separate assembled
matrices for each block (nested). These formats have different performance characteristics depending on the
operation being performed. In particular, many preconditioners require a monolithic format, but some that
are very effective for solving block systems (see Section 4.5) are more efficient when a nested format is
used. In order to stay flexible, we would like to be able to use the same code to assemble block matrices in
both monolithic and nested formats. Additionally, for software maintainability and testing, especially in a
multi-physics context where different groups might be responsible for assembling each of the blocks, it is
desirable to be able to use exactly the same code to assemble a single block independently as to assemble it
as part of a larger system. To do this, we introduce the four spaces shown in Figure 12.
- The monolithic global space is the space in which the Krylov and Newton solvers operate, with
collective semantics across the entire block system.
- The split global space splits the blocks apart, but each split still has collective semantics.
- The split local space adds ghost points and separates the blocks. Operations in this space can be
performed with no parallel communication. This is often the most natural, and certainly the most
powerful, space for matrix assembly code.
- The monolithic local space can be thought of as adding ghost points to the monolithic global space,
but it is often more natural to use it simply as a concatenation of split local spaces on each process. It
is not common to explicitly manipulate vectors or matrices in this space (at least not during assembly),
but it is useful for declaring which part of a matrix is being assembled.
The key to format-independent assembly is the function
MatGetLocalSubMatrix(Mat A,IS isrow,IS iscol,Mat *submat);
which provides a view submat into a matrix A that operates in the monolithic global space. The submat
transforms from the split local space defined by iscol to the split local space defined by isrow. The index
sets specify the parts of the monolithic local space that submat should operate in. If a nested matrix format
is used, then MatGetLocalSubMatrix() finds the nested block and returns it without making any copies. In
this case, submat is fully functional and has a parallel communicator. If a monolithic matrix format is used,
then MatGetLocalSubMatrix() returns a proxy matrix on PETSC_COMM_SELF that does not provide values
or implement MatMult(), but does implement MatSetValuesLocal() and, if isrow,iscol have a constant
block size, MatSetValuesBlockedLocal(). Note that although submat may not be a fully functional matrix
and the caller does not even know a priori which communicator it will reside on, it always implements the
local assembly functions (which are not collective). The index sets isrow,iscol can be obtained using
DMCompositeGetLocalISs() if DMComposite is being used. DMComposite can also be used to create
matrices, in which case the MATNEST format can be specified using -prefix_dm_mat_type nest
and MATAIJ can be specified using -prefix_dm_mat_type aij. See
$PETSC_DIR/src/snes/examples/tutorials/ex28.c for a simple example using this interface.
[Figure 12 diagram: the Monolithic Global, Monolithic Local, Split Global, and Split Local spaces, each shown
for ranks 0, 1, and 2, with arrows labeled LocalToGlobalMapping, LocalToGlobal(), GetLocalSubMatrix(), and
GetSubMatrix() / GetSubVector().]
Figure 12: The relationship between spaces used for coupled assembly.
3.2 Basic Matrix Operations
Table 2 summarizes basic PETSc matrix operations. We briefly discuss a few of these routines in more detail
below.
The parallel matrix can multiply a vector with n local entries, returning a vector with m local entries.
That is, to form the product
MatMult(Mat A,Vec x,Vec y);
the vectors x and y should be generated with
VecCreateMPI(MPI_Comm comm,n,N,&x);
VecCreateMPI(MPI_Comm comm,m,M,&y);
By default, if the user lets PETSc decide the number of components to be stored locally (by passing in
PETSC_DECIDE as the second argument to VecCreateMPI() or using VecCreate()), vectors and matrices of
the same dimension are automatically compatible for parallel matrix-vector operations.
Along with the matrix-vector multiplication routine, there is a version for the transpose of the matrix,
MatMultTranspose(Mat A,Vec x,Vec y);
There are also versions that add the result to another vector:
MatMultAdd(Mat A,Vec x,Vec y,Vec w);
MatMultTransposeAdd(Mat A,Vec x,Vec y,Vec w);
These routines, respectively, produce $w = A x + y$ and $w = A^T x + y$. In C it is legal for the vectors
y and w to be identical. In Fortran, this situation is forbidden by the language standard, but we allow it
anyway.
One can print a matrix (sequential or parallel) to the screen with the command
MatView(Mat mat,PETSC_VIEWER_STDOUT_WORLD);
Other viewers can be used as well. For instance, one can draw the nonzero structure of the matrix into the
default X-window with the command
MatView(Mat mat,PETSC_VIEWER_DRAW_WORLD);
Also one can use
MatView(Mat mat,PetscViewer viewer);
where viewer was obtained with PetscViewerDrawOpen(). Additional viewers and options are given in
the MatView() man page and Section 13.3.
The NormType argument to MatNorm() is one of NORM_1, NORM_INFINITY, and NORM_FROBENIUS.
3.3 Matrix-Free Matrices
Some people like to use matrix-free methods, which do not require explicit storage of the matrix, for the
numerical solution of partial differential equations. To support matrix-free methods in PETSc, one can use
the following command to create a Mat structure without ever actually generating the matrix:
MatCreateShell(MPI_Comm comm,int m,int n,int M,int N,void *ctx,Mat *mat);
Function Name                                         Operation
MatAXPY(Mat Y, PetscScalar a,Mat X,MatStructure);     Y = Y + a X
MatMult(Mat A,Vec x, Vec y);                          y = A x
MatMultAdd(Mat A,Vec x, Vec y,Vec z);                 z = y + A x
MatMultTranspose(Mat A,Vec x, Vec y);                 y = A^T x
MatMultTransposeAdd(Mat A,Vec x, Vec y,Vec z);        z = y + A^T x
MatNorm(Mat A,NormType type, double *r);              r = ||A||_type
MatDiagonalScale(Mat A,Vec l,Vec r);                  A = diag(l) A diag(r)
MatScale(Mat A,PetscScalar a);                        A = a A
MatConvert(Mat A,MatType type,Mat *B);                B = A
MatCopy(Mat A,Mat B,MatStructure);                    B = A
MatGetDiagonal(Mat A,Vec x);                          x = diag(A)
MatTranspose(Mat A,MatReuse,Mat* B);                  B = A^T
MatZeroEntries(Mat A);                                A = 0
MatShift(Mat Y,PetscScalar a);                        Y = Y + a I
Table 2: PETSc Matrix Operations
Here M and N are the global matrix dimensions (rows and columns), m and n are the local matrix dimensions,
and ctx is a pointer to data needed by any user-defined shell matrix operations; the manual page has
additional details about these parameters. Most matrix-free algorithms require only the application of the
linear operator to a vector. To provide this action, the user must write a routine with the calling sequence
UserMult(Mat mat,Vec x,Vec y);
and then associate it with the matrix, mat, by using the command
MatShellSetOperation(Mat mat,MatOperation MATOP_MULT,
(void(*)(void)) PetscErrorCode (*UserMult)(Mat,Vec,Vec));
Here MATOP_MULT is the name of the operation for matrix-vector multiplication. Within each user-defined
routine (such as UserMult()), the user should call MatShellGetContext() to obtain the user-defined
context, ctx, that was set by MatCreateShell(). This shell matrix can be used with the iterative linear equation
solvers discussed in the following chapters.
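The pattern behind a shell matrix, an opaque context plus a user routine reached through a generic entry
point, can be sketched in plain C. Nothing below is PETSc API: ShellMat, ShellMultFn, LaplaceCtx,
user_mult(), and shell_mult() are hypothetical stand-ins for the Mat object, the registered operation, the
user context, UserMult(), and MatMult(), respectively. The operator chosen for illustration is a 1D
Laplacian with stencil (-1, 2, -1)/h^2 that is applied without ever being stored.

```c
#include <assert.h>

typedef struct ShellMat ShellMat;
typedef int (*ShellMultFn)(const ShellMat *A, const double *x, double *y);

struct ShellMat {
    int         n;     /* the operator is n-by-n */
    void       *ctx;   /* user context, opaque to the generic code */
    ShellMultFn mult;  /* user routine computing y = A*x */
};

typedef struct { double h; } LaplaceCtx;   /* hypothetical context: grid spacing */

/* User routine: apply the 1D Laplacian (-1, 2, -1)/h^2 without storing it. */
int user_mult(const ShellMat *A, const double *x, double *y)
{
    const LaplaceCtx *c = (const LaplaceCtx *)A->ctx;  /* retrieve the context */
    double s = 1.0 / (c->h * c->h);
    for (int i = 0; i < A->n; i++) {
        double left  = (i > 0)        ? x[i - 1] : 0.0;
        double right = (i < A->n - 1) ? x[i + 1] : 0.0;
        y[i] = s * (2.0 * x[i] - left - right);
    }
    return 0;
}

/* Generic entry point: dispatches to whatever routine was registered. */
int shell_mult(const ShellMat *A, const double *x, double *y)
{
    assert(A && A->mult);
    return A->mult(A, x, y);
}
```

A caller that only knows shell_mult() can use such an operator exactly as it would a stored matrix, which
is the property that lets iterative solvers work with matrix-free operators.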
The routine MatShellSetOperation() can be used to set any other matrix operations as well. The file
${PETSC_DIR}/include/petscmat.h provides a complete list of matrix operations, which have
the form MATOP_<OPERATION>, where <OPERATION> is the name (in all capital letters) of the user
interface routine (for example, MatMult() -> MATOP_MULT). All user-provided functions have the same
calling sequence as the usual matrix interface routines, since the user-defined functions are intended to be
accessed through the same interface, e.g., MatMult(Mat,Vec,Vec) -> UserMult(Mat,Vec,Vec). The
final argument for MatShellSetOperation() needs to be cast to void(*)(void), as in the call shown above,
since the final argument could (depending on the MatOperation) be a variety of different functions.
Note that MatShellSetOperation() can also be used as a backdoor means of introducing user-defined
changes in matrix operations for other storage formats (for example, to override the default LU factorization
routine supplied within PETSc for the MATSEQAIJ format). However, we urge anyone who introduces
such changes to use caution, since it would be very easy to accidentally create a bug in the new routine that
could affect other routines as well.
See also Section 5.5 for details on one set of helpful utilities for using the matrix-free approach for
nonlinear solvers.
3.4 Other Matrix Operations
In many iterative calculations (for instance, in a nonlinear equations solver), it is important for efficiency
to reuse the nonzero structure of a matrix, rather than determining it anew every time the matrix is
generated. To retain a given matrix but reinitialize its contents, one can employ
MatZeroEntries(Mat A);
This routine will zero the matrix entries in the data structure but keep all the data that indicates where the
nonzeros are located. In this way a new matrix assembly will be much less expensive, since no memory
allocations or copies will be needed. Of course, one can also explicitly set selected matrix elements to zero
by calling MatSetValues().
By default, if new entries are made in locations where no nonzeros previously existed, space will be
allocated for the new entries. To prevent the allocation of additional memory and simply discard those new
entries, one can use the option
MatSetOption(Mat A,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE);
Once the matrix has been assembled, one can factor it numerically without repeating the ordering or the
symbolic factorization. This option can save some computational time, although it does require that the
factorization is not done in-place.
In the numerical solution of elliptic partial differential equations, it can be cumbersome to deal with
Dirichlet boundary conditions. In particular, one would like to assemble the matrix without regard to
boundary conditions and then at the end apply the Dirichlet boundary conditions. In numerical analysis
classes this process is usually presented as moving the known boundary conditions to the right-hand side and
then solving a smaller linear system for the interior unknowns. Unfortunately, implementing this requires
extracting a large submatrix from the original matrix and creating its corresponding data structures. This
process can be expensive in terms of both time and memory.
One simple way to deal with this difficulty is to replace those rows in the matrix associated with known
boundary conditions by rows of the identity matrix (or some scaling of it). This action can be done with the
command
MatZeroRows(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec x,Vec b),
or equivalently,
MatZeroRowsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
For sparse matrices this removes the data structures for certain rows of the matrix. If diag_value is
zero, even the diagonal entry is removed. Otherwise, the given value is placed in the diagonal entry of
the eliminated rows.
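The effect of replacing rows by scaled identity rows can be sketched on a small dense matrix. This is
plain C, not PETSc: dense_zero_rows() is a hypothetical helper operating on a row major n by n array.
It assumes that when both x and b are supplied, the right-hand side entry of each eliminated row is set to
diag*x[row], so that the eliminated equation reproduces the prescribed value.

```c
#include <assert.h>
#include <stddef.h>

/* Replace the listed rows of the dense row-major n-by-n matrix A by
   diag times the corresponding identity rows. If x and b are both
   non-NULL, set b[r] = diag * x[r] so that row r enforces x[r]. */
void dense_zero_rows(double *A, int n, int nrows, const int rows[],
                     double diag, const double *x, double *b)
{
    assert(A && n > 0);
    for (int k = 0; k < nrows; k++) {
        int r = rows[k];
        assert(r >= 0 && r < n);
        for (int j = 0; j < n; j++)
            A[r * n + j] = 0.0;
        A[r * n + r] = diag;
        if (x != NULL && b != NULL)
            b[r] = diag * x[r];
    }
}
```

Only the listed rows change; the columns of the eliminated unknowns keep their entries in the remaining
rows, so a symmetric matrix generally does not stay symmetric under this operation.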
One nice feature of this approach is that when solving a nonlinear problem such that at each iteration the
Dirichlet boundary conditions are in the same positions and the matrix retains the same nonzero structure,
the user can call MatZeroRows() in the rst iteration. Then, before generating the matrix in the second
iteration the user should call
MatSetOption(Mat A,MAT_NO_NEW_NONZERO_LOCATIONS,PETSC_TRUE);
From that point, no new values will be inserted into those (boundary) rows of the matrix.
The functions MatZeroRowsLocal() and MatZeroRowsLocalIS() can also be used if for each process
one provides the Dirichlet locations in the local numbering of the matrix. A drawback of MatZeroRows() is
that it destroys the symmetry of a matrix. Thus one can use
MatZeroRowsColumns(Mat A,PetscInt numRows,PetscInt rows[],PetscScalar diag_value,Vec x,Vec b),
or equivalently,
MatZeroRowsColumnsIS(Mat A,IS rows,PetscScalar diag_value,Vec x,Vec b);
Note that each of these routines can be called only once on a given assembled matrix to update the x and b
vectors. They cannot be used if one wishes to solve multiple right-hand-side problems for the same matrix, since
the matrix entries needed for updating the b vector are removed in the first call.
Once the rows have been zeroed, the new matrix may have many rows with only a diagonal entry,
which affects the parallel load balancing. The PCREDISTRIBUTE preconditioner removes all the zeroed rows
(and associated columns, adjusting the right-hand side based on the removed columns) and then rebalances
the remaining rows of the smaller matrix across the processes. Thus one can use MatZeroRows() to set the
Dirichlet points and then solve with the preconditioner PCREDISTRIBUTE. Note that if the original matrix was
symmetric, the smaller solved matrix will also be symmetric.
Another matrix routine of interest is
MatConvert(Mat mat,MatType newtype,Mat *M)
which converts the matrix mat to a new matrix, M, that has either the same or different format. Set newtype
to MATSAME to copy the matrix, keeping the same matrix format. See ${PETSC_DIR}/include/
petscmat.h for other available matrix types; standard ones are MATSEQDENSE, MATSEQAIJ, MATMPIAIJ,
MATSEQBAIJ and MATMPIBAIJ.
In certain applications it may be necessary for application codes to directly access elements of a matrix.
This may be done by using the command (for local rows only)
MatGetRow(Mat A,int row, int *ncols,const PetscInt (*cols)[],const PetscScalar (*vals)[]);
The argument ncols returns the number of nonzeros in that row, while cols and vals return the column
indices (with indices starting at zero) and values in the row. If only the column indices are needed (and
not the corresponding matrix elements), one can use PETSC_NULL for the vals argument. Similarly,
one can use PETSC_NULL for the cols argument. The user can only examine the values extracted with
MatGetRow(); the values cannot be altered. To change the matrix entries, one must use MatSetValues().
Once the user has finished using a row, he or she must call
MatRestoreRow(Mat A,int row,int *ncols,int **cols,PetscScalar **vals);
to free any space that was allocated during the call to MatGetRow().
3.5 Partitioning
For almost all unstructured grid computations, the distribution of portions of the grid across the processes'
work load and memory can have a very large impact on performance. In most PDE calculations the grid
partitioning and distribution across the processes can (and should) be done in a pre-processing step before
the numerical computations. However, this does not mean it need be done in a separate, sequential program;
rather, it should be done before one sets up the parallel grid data structures in the actual program. PETSc provides
an interface to ParMETIS (developed by George Karypis; see the docs/installation/index.htm file
for directions on installing PETSc to use ParMETIS) to allow the partitioning to be done in parallel. PETSc
does not currently provide direct support for dynamic repartitioning, load balancing by migrating matrix
entries between processes, etc. For problems that require mesh refinement, PETSc uses the "rebuild the data
structure" approach, as opposed to the "maintain dynamic data structures that support the insertion/deletion
of additional vector and matrix rows and columns entries" approach.
Partitioning in PETSc is organized around the MatPartitioning object. One first creates a parallel matrix
that contains the connectivity information about the grid (or other graph-type object) that is to be partitioned.
This is done with the command
MatCreateMPIAdj(MPI_Comm comm,int mlocal,int n,const int ia[],const int ja[],
int *weights,Mat *Adj);
The argument mlocal indicates the number of rows of the graph being provided by the given process, and n is
the total number of columns, equal to the sum of all the mlocal. The arguments ia and ja are the row
pointers and column pointers for the given rows; this is the usual format for parallel compressed sparse
row storage, using indices starting at 0, not 1.
Figure 13: Numbering on Simple Unstructured Grid
This, of course, assumes that one has already distributed the grid (graph) information among the processes.
The details of this initial distribution are not important; it could be simply determined by assigning to
the first process the first $n_0$ nodes from a file, the second process the next $n_1$ nodes, etc.
For example, we demonstrate the form of the ia and ja for a triangular grid where we
(1) partition by element (triangle)
Process 0, mlocal = 2, n = 4, ja = {2, 3, |3}, ia = {0, 2, 3}
Process 1, mlocal = 2, n = 4, ja = {0, |0, 1}, ia = {0, 1, 3}
Note that elements are not connected to themselves and we only indicate edge connections (in some contexts
single vertex connections between elements may also be included). We use a | above to denote the transition
between rows in the matrix.
and (2) partition by vertex.
Process 0, mlocal = 3, n = 6, ja = {3, 4, |4, 5, |3, 4, 5}, ia = {0, 2, 4, 7}
Process 1, mlocal = 3, n = 6, ja = {0, 2, 4, |0, 1, 2, 3, 5, |1, 2, 4}, ia = {0, 3, 8, 11}.
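As a check on the vertex-partitioned arrays above, the following plain C sketch builds the local ia and ja for a contiguous block of rows from a list of undirected edges. The helper name build_local_adjacency and the edge-list input format are assumptions made for this illustration and are not part of PETSc. The edge list in the usage note is the one implied by the ja entries listed for case (2).

```c
/* Build local compressed sparse row arrays for rows
 * [row_start, row_start + mlocal) of an undirected graph with n vertices.
 * edges[e][0] and edges[e][1] are the two endpoints of edge e.
 * ia needs mlocal + 1 entries; ja needs one entry per local neighbor.
 * Column indices in each row come out in ascending order, start at 0,
 * and exclude self connections. Returns the number of entries in ja.
 * Hypothetical helper for illustration; the triple loop favors clarity
 * over speed. */
int build_local_adjacency(int n, int row_start, int mlocal,
                          const int (*edges)[2], int nedges,
                          int *ia, int *ja)
{
    int nnz = 0;
    ia[0] = 0;
    for (int r = row_start; r < row_start + mlocal; ++r) {
        for (int c = 0; c < n; ++c) {
            if (c == r) continue;            /* no self connections */
            for (int e = 0; e < nedges; ++e) {
                if ((edges[e][0] == r && edges[e][1] == c) ||
                    (edges[e][0] == c && edges[e][1] == r)) {
                    ja[nnz++] = c;
                    break;                   /* record each neighbor once */
                }
            }
        }
        ia[r - row_start + 1] = nnz;         /* end of this local row */
    }
    return nnz;
}
```

With the nine edges 0-3, 0-4, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 4-5, calling the routine with row_start = 0 and mlocal = 3 yields the Process 0 arrays, and row_start = 3 yields the Process 1 arrays.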
Once the connectivity matrix has been created the following code will generate the renumbering required
for the new partition
MatPartitioningCreate(MPI_Comm comm,MatPartitioning *part);
MatPartitioningSetAdjacency(MatPartitioning part,Mat Adj);
MatPartitioningSetFromOptions(MatPartitioning part);
MatPartitioningApply(MatPartitioning part,IS *is);
MatPartitioningDestroy(MatPartitioning *part);
MatDestroy(Mat *Adj);
ISPartitioningToNumbering(IS is,IS *isg);
The resulting isg contains for each local node the new global number of that node. The resulting is
contains the new process number that each local node has been assigned to.
Now that a new numbering of the nodes has been determined one must renumber all the nodes and
migrate the grid information to the correct process. The command
AOCreateBasicIS(isg,PETSC_NULL,&ao);
generates (see Section 2.3.1) an AO object that can be used in conjunction with the is and isg to move the
relevant grid information to the correct process and renumber the nodes, etc.
PETSc does not currently provide tools that completely manage the migration and node renumbering,
since it will be dependent on the particular data structure you use to store the grid information and the type
of grid information that you need for your application. We do plan to include more support for this in the
future, but designing the appropriate general user interface and providing a scalable implementation that can
be used for a wide variety of different grids requires a great deal of time.
Chapter 4
KSP: Linear Equations Solvers
The object KSP is the heart of PETSc, because it provides uniform and efficient access to all of the package's
linear system solvers, including parallel and sequential, direct and iterative. KSP is intended for solving
nonsingular systems of the form
Ax = b, (4.1)
where A denotes the matrix representation of a linear operator, b is the right-hand-side vector, and x is the
solution vector. KSP uses the same calling sequence for both direct and iterative solution of a linear system.
In addition, particular solution techniques and their associated options can be selected at runtime.
The combination of a Krylov subspace method and a preconditioner is at the center of most modern
numerical codes for the iterative solution of linear systems. See, for example, [7] for an overview of the
theory of such methods. KSP creates a simplified interface to the lower-level KSP and PC modules within
the PETSc package. The KSP package, discussed in Section 4.3, provides many popular Krylov subspace
iterative methods; the PC module, described in Section 4.4, includes a variety of preconditioners. Although
both KSP and PC can be used directly, users should employ the interface of KSP.
4.1 Using KSP
To solve a linear system with KSP, one must first create a solver context with the command
KSPCreate(MPI_Comm comm,KSP *ksp);
Here comm is the MPI communicator, and ksp is the newly formed solver context. Before actually solving
a linear system with KSP, the user must call the following routine to set the matrices associated with the
linear system:
KSPSetOperators(KSP ksp,Mat Amat,Mat Pmat,MatStructure flag);
The argument Amat, representing the matrix that defines the linear system, is a symbolic place holder for
any kind of matrix. In particular, KSP does support matrix-free methods. The routine MatCreateShell()
in Section 3.3 provides further information regarding matrix-free methods. Typically the preconditioning
matrix (i.e., the matrix from which the preconditioner is to be constructed), Pmat, is the same as the matrix
that defines the linear system, Amat; however, occasionally these matrices differ (for instance, when a
preconditioning matrix is obtained from a lower order method than that employed to form the linear system
matrix). The argument flag can be used to eliminate unnecessary work when repeatedly solving linear
systems of the same size with the same preconditioning method; when solving just one linear system, this
flag is ignored. The user can set flag as follows:
SAME_NONZERO_PATTERN - the preconditioning matrix has the same nonzero structure during
successive linear solves,
DIFFERENT_NONZERO_PATTERN - the preconditioning matrix does not have the same nonzero
structure during successive linear solves,
SAME_PRECONDITIONER - the preconditioner matrix is identical to that of the previous linear solve.
If the structure of a matrix is not known a priori, one should use the flag DIFFERENT_NONZERO_PATTERN.
Much of the power of KSP can be accessed through the single routine
KSPSetFromOptions(KSP ksp);
This routine accepts the options -h and -help as well as any of the KSP and PC options discussed below.
To solve a linear system, one sets the right-hand-side and solution vectors and executes the command
KSPSolve(KSP ksp,Vec b,Vec x);
where b and x respectively denote the right-hand-side and solution vectors. On return, the iteration number
at which the iterative process stopped can be obtained using
KSPGetIterationNumber(KSP ksp, int *its);
Note that this does not state that the method converged at this iteration: it can also have reached the maxi-
mum number of iterations, or have diverged. Section 4.3.2 gives more details regarding convergence testing.
Note that multiple linear solves can be performed by the same KSP context. Once the KSP context is no
longer needed, it should be destroyed with the command
KSPDestroy(KSP *ksp);
The above procedure is sufficient for general use of the KSP package. One additional step is required for
users who wish to customize certain preconditioners (e.g., see Section 4.4.4) or to log certain performance
data using the PETSc profiling facilities (as discussed in Chapter 11). In this case, the user can optionally
explicitly call
KSPSetUp(KSP ksp)
before calling KSPSolve() to perform any setup required for the linear solvers. The explicit call of this
routine enables the separate monitoring of any computations performed during the set up phase, such as
incomplete factorization for the ILU preconditioner.
The default solver within KSP is restarted GMRES, preconditioned for the uniprocess case with ILU(0),
and for the multiprocess case with the block Jacobi method (with one block per process, each of which
is solved with ILU(0)). A variety of other solvers and options are also available. To allow application
programmers to set any of the preconditioner or Krylov subspace options directly within the code, we
provide routines that extract the PC and KSP contexts,
KSPGetPC(KSP ksp,PC *pc);
The application programmer can then directly call any of the PC or KSP routines to modify the correspond-
ing default options.
To solve a linear system with a direct solver (currently supported by PETSc for sequential matrices,
and by several external solvers through PETSc interfaces (see Section 4.7)) one may use the options
-ksp_type preonly -pc_type lu (see below).
By default, if a direct solver is used, the factorization is not done in-place. This approach prevents the
user from the unexpected surprise of having a corrupted matrix after a linear solve. The routine
PCFactorSetUseInPlace(), discussed below, causes factorization to be done in-place.
4.2 Solving Successive Linear Systems
When solving multiple linear systems of the same size with the same method, several options are available.
To solve successive linear systems having the same preconditioner matrix (i.e., the same data structure
with exactly the same matrix elements) but different right-hand-side vectors, the user should simply call
KSPSolve() multiple times. The preconditioner setup operations (e.g., factorization for ILU) will be done
during the first call to KSPSolve() only; such operations will not be repeated for successive solves.
To solve successive linear systems that have different preconditioner matrices (i.e., the matrix elements
and/or the matrix data structure change), the user must call KSPSetOperators() and KSPSolve() for each
solve. See Section 4.1 for a description of various flags for KSPSetOperators() that can save work for such
cases.
4.3 Krylov Methods
The Krylov subspace methods accept a number of options, many of which are discussed below. First, to set
the Krylov subspace method that is to be used, one calls the command
KSPSetType(KSP ksp,KSPType method);
The type can be one of KSPRICHARDSON, KSPCHEBYCHEV, KSPCG, KSPGMRES, KSPTCQMR,
KSPBCGS, KSPCGS, KSPTFQMR, KSPCR, KSPLSQR, KSPBICG, or KSPPREONLY. The KSP method
can also be set with the options database command -ksp_type, followed by one of the options richardson,
chebychev, cg, gmres, tcqmr, bcgs, cgs, tfqmr, cr, lsqr, bicg, or preonly. There are
method-specific options for the Richardson, Chebychev, and GMRES methods:
KSPRichardsonSetScale(KSP ksp,double damping_factor);
KSPChebychevSetEigenvalues(KSP ksp,double emax,double emin);
KSPGMRESSetRestart(KSP ksp,int max_steps);
The default parameter values are damping_factor=1.0, emax=100.0, emin=0.01, and
max_steps=30. The GMRES restart and Richardson damping factor can also be set with the options
-ksp_gmres_restart <n> and -ksp_richardson_scale <factor>.
The default technique for orthogonalization of the Hessenberg matrix in GMRES is the unmodified
(classical) Gram-Schmidt method, which can be set with
KSPGMRESSetOrthogonalization(KSP ksp,KSPGMRESClassicalGramSchmidtOrthogonalization);
or the options database command -ksp_gmres_classicalgramschmidt. By default this will not
use iterative refinement to improve the stability of the orthogonalization. This can be changed with the
option
KSPGMRESSetCGSRefinementType(KSP ksp,KSPGMRESCGSRefinementType type)
or via the options database with
-ksp_gmres_cgs_refinement_type none,ifneeded,always
The values for KSPGMRESCGSRefinementType are KSP_GMRES_CGS_REFINEMENT_NONE,
KSP_GMRES_CGS_REFINEMENT_IFNEEDED and KSP_GMRES_CGS_REFINEMENT_ALWAYS.
One can also use modified Gram-Schmidt, by setting the orthogonalization routine
KSPGMRESModifiedGramSchmidtOrthogonalization() or by using the command line option
-ksp_gmres_modifiedgramschmidt.
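The difference between the two orthogonalization variants can be seen in a short plain C sketch. The routine name gram_schmidt_step and its storage layout are assumptions made for this illustration; this is not the PETSc implementation. In the classical variant all inner products are taken against the original vector before any update, whereas the modified variant uses the partially orthogonalized vector for each successive inner product.

```c
#include <stddef.h>

/* Inner product of two vectors of length n. */
double vec_dot(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i] * y[i];
    return s;
}

/* Orthogonalize w (length n) against k orthonormal vectors stored
 * contiguously in q (vector j starts at q + j*n). h receives the k
 * projection coefficients, i.e. one column of the Hessenberg matrix.
 * modified == 0: classical Gram-Schmidt, all coefficients first.
 * modified != 0: modified Gram-Schmidt, update w after each coefficient.
 * Hypothetical helper for illustration; not part of PETSc. */
void gram_schmidt_step(const double *q, size_t k, size_t n,
                       double *w, double *h, int modified)
{
    if (!modified) {
        for (size_t j = 0; j < k; ++j) {
            h[j] = vec_dot(q + j * n, w, n);      /* uses the original w */
        }
        for (size_t j = 0; j < k; ++j) {
            for (size_t i = 0; i < n; ++i) w[i] -= h[j] * q[j * n + i];
        }
    } else {
        for (size_t j = 0; j < k; ++j) {
            h[j] = vec_dot(q + j * n, w, n);      /* uses the updated w */
            for (size_t i = 0; i < n; ++i) w[i] -= h[j] * q[j * n + i];
        }
    }
}
```

In exact arithmetic both branches give the same result. In the classical branch the k inner products are independent of one another, so in a parallel setting they could be combined into a single reduction; the modified branch trades that for better numerical stability.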
For the conjugate gradient method with complex numbers, there are two slightly different algorithms
depending on whether the matrix is Hermitian symmetric or truly symmetric (the default is to assume that it
is Hermitian symmetric). To indicate that it is symmetric, one uses the command
KSPCGSetType(KSP ksp,KSPCGType KSP_CG_SYMMETRIC);
Note that this option is not valid for all matrices.
The LSQR algorithm does not involve a preconditioner; any preconditioner set to work with the KSP
object is ignored if LSQR was selected.
By default, KSP assumes an initial guess of zero by zeroing the initial value for the solution vector that
is given; this zeroing is done at the call to KSPSolve(). To use a nonzero initial guess, the
user must call
KSPSetInitialGuessNonzero(KSP ksp,PetscBool flg);
4.3.1 Preconditioning within KSP
Since the rate of convergence of Krylov projection methods for a particular linear system is strongly de-
pendent on its spectrum, preconditioning is typically used to alter the spectrum and hence accelerate the
convergence rate of iterative techniques. Preconditioning can be applied to the system (4.1) by
$$(M_L^{-1} A M_R^{-1}) \, (M_R x) = M_L^{-1} b, \qquad (4.2)$$
where $M_L$ and $M_R$ indicate preconditioning matrices (or, matrices from which the preconditioner is to be
constructed). If $M_L = I$ in (4.2), right preconditioning results, and the residual of (4.1),
$$r \equiv b - Ax = b - A M_R^{-1} M_R x,$$
is preserved. In contrast, the residual is altered for left ($M_R = I$) and symmetric preconditioning, as given
by
$$r_L \equiv M_L^{-1} b - M_L^{-1} A x = M_L^{-1} r.$$
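To make the distinction between the true and the left-preconditioned residual concrete, the following plain C sketch computes both for a small dense system. It assumes, purely for illustration, that the left preconditioner is the diagonal of A (a Jacobi preconditioner); the routine name residuals_jacobi is hypothetical and not part of PETSc.

```c
#include <stddef.h>

/* For a dense row-major n-by-n matrix a, compute
 *   r      = b - A x               (true residual)
 *   r_left = M^{-1} r, M = diag(A) (left-preconditioned residual).
 * Assumes every diagonal entry of A is nonzero.
 * Hypothetical helper for illustration; not part of PETSc. */
void residuals_jacobi(const double *a, const double *b, const double *x,
                      size_t n, double *r, double *r_left)
{
    for (size_t i = 0; i < n; ++i) {
        double ax = 0.0;
        for (size_t j = 0; j < n; ++j) ax += a[i * n + j] * x[j];
        r[i] = b[i] - ax;                 /* true residual entry */
        r_left[i] = r[i] / a[i * n + i];  /* apply M^{-1} = diag(A)^{-1} */
    }
}
```

For A = [4 1; 2 5], b = (8, 10) and x = 0, the true residual is (8, 10) while the left-preconditioned residual is (2, 2). The two norms differ, so a tolerance applied to one is not equivalent to the same tolerance applied to the other.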
By default, all KSP implementations use left preconditioning. Right preconditioning can be activated for
some methods by using the options database command -ksp_pc_side right or calling the routine
KSPSetPCSide(KSP ksp,PCSide PC_RIGHT);
Attempting to use right preconditioning for a method that does not currently support it results in an error
message of the form
KSPSetUp_Richardson:No right preconditioning for KSPRICHARDSON
We summarize the defaults for the residuals used in KSP convergence monitoring within Table 3. Details
regarding specific convergence tests and monitoring routines are presented in the following sections. The
preconditioned residual is used by default for convergence testing of all left-preconditioned KSP methods.
For the conjugate gradient, Richardson, and Chebychev methods the true residual can be used by the options
database command -ksp_norm_type unpreconditioned or by calling the routine
KSPSetNormType(KSP ksp,KSP_NORM_UNPRECONDITIONED);
Note: the bi-conjugate gradient method requires application of both the matrix and its transpose plus
the preconditioner and its transpose. Currently not all matrices and preconditioners provide this support and
thus the KSPBICG cannot always be used.
Method KSPType Options Database Name
Richardson KSPRICHARDSON richardson
Chebychev KSPCHEBYCHEV chebychev
Conjugate Gradient [12] KSPCG cg
BiConjugate Gradient KSPBICG bicg
Generalized Minimal Residual [16] KSPGMRES gmres
Flexible Generalized Minimal Residual KSPFGMRES fgmres
Deflated Generalized Minimal Residual KSPDGMRES dgmres
Generalized Conjugate Residual KSPGCR gcr
BiCGSTAB [19] KSPBCGS bcgs
Conjugate Gradient Squared [18] KSPCGS cgs
Transpose-Free Quasi-Minimal Residual (1) [8] KSPTFQMR tfqmr
Transpose-Free Quasi-Minimal Residual (2) KSPTCQMR tcqmr
Conjugate Residual KSPCR cr
Least Squares Method KSPLSQR lsqr
Shell for no KSP method KSPPREONLY preonly
Table 3: KSP Objects.
4.3.2 Convergence Tests
The default convergence test, KSPDefaultConverged(), is based on the $l_2$-norm of the residual. Convergence
(or divergence) is decided by three quantities: the decrease of the residual norm relative to the norm of the
right hand side, rtol, the absolute size of the residual norm, atol, and the relative increase in the residual,
dtol. Convergence is detected at iteration $k$ if
$$\|r_k\|_2 < \max(\text{rtol} \cdot \|b\|_2, \text{atol}),$$
where $r_k = b - A x_k$. Divergence is detected if
$$\|r_k\|_2 > \text{dtol} \cdot \|b\|_2.$$
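The two inequalities above translate directly into a small decision routine. The following plain C sketch is not the body of KSPDefaultConverged(); the name default_convergence_check and the return convention (positive for convergence, negative for divergence, zero to continue) are assumptions chosen for this illustration.

```c
/* Apply the convergence and divergence inequalities to one iteration.
 *   rnorm: 2-norm of the current residual r_k
 *   bnorm: 2-norm of the right-hand side b
 * Returns  1 if rnorm < max(rtol*bnorm, atol)  (converged),
 *         -1 if rnorm > dtol*bnorm             (diverged),
 *          0 otherwise                         (keep iterating).
 * Hypothetical helper for illustration; not part of PETSc. */
int default_convergence_check(double rnorm, double bnorm,
                              double rtol, double atol, double dtol)
{
    double target = rtol * bnorm;
    if (atol > target) target = atol;    /* max(rtol*||b||, atol) */
    if (rnorm < target) return 1;
    if (rnorm > dtol * bnorm) return -1;
    return 0;
}
```

With rtol = 1e-5, atol = 1e-50, dtol = 1e5 and a right-hand side of unit norm, a residual norm of 1e-6 is reported as converged, 1e-3 as not yet converged, and 1e6 as diverged.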
These parameters, as well as the maximum number of allowable iterations, can be set with the routine
KSPSetTolerances(KSP ksp,double rtol,double atol,double dtol,int maxits);
The user can retain the default value of any of these parameters by specifying PETSC_DEFAULT as the
corresponding tolerance; the defaults are rtol=$10^{-5}$, atol=$10^{-50}$, dtol=$10^{5}$, and maxits=$10^{5}$. These
parameters can also be set from the options database with the commands -ksp_rtol <rtol>,
-ksp_atol <atol>, -ksp_divtol <dtol>, and -ksp_max_it <its>.
In addition to providing an interface to a simple convergence test, KSP allows the application programmer
the flexibility to provide customized convergence-testing routines. The user can specify a customized
routine with the command
KSPSetConvergenceTest(KSP ksp,PetscErrorCode (*test)(KSP ksp,int it,double rnorm,
KSPConvergedReason *reason,void *ctx),void *ctx,PetscErrorCode (*destroy)(void *ctx));
The final routine argument, ctx, is an optional context for private data for the user-defined convergence
routine, test. Other test routine arguments are the iteration number, it, and the residual's $l_2$ norm,
rnorm. The routine for detecting convergence, test, should set reason to positive for convergence, 0 for
no convergence, and negative for failure to converge. A list of possible KSPConvergedReason is given in
include/petscksp.h. You can use KSPGetConvergedReason() after KSPSolve() to see why
convergence/divergence was detected.
4.3.3 Convergence Monitoring
By default, the Krylov solvers run silently without displaying information about the iterations. The user can
indicate that the norms of the residuals should be displayed by using -ksp_monitor within the options
database. To display the residual norms in a graphical window (running under X Windows), one should use
-ksp_monitor_draw [x,y,w,h], where either all or none of the options must be specified. Application
programmers can also provide their own routines to perform the monitoring by using the command
KSPMonitorSet(KSP ksp,PetscErrorCode (*mon)(KSP ksp,int it,double rnorm,void *ctx),
void *ctx,PetscErrorCode (*mondestroy)(void**));
The final routine argument, ctx, is an optional context for private data for the user-defined monitoring
routine, mon. Other mon routine arguments are the iteration number (it) and the residual's $l_2$ norm (rnorm).
A helpful routine within user-defined monitors is PetscObjectGetComm((PetscObject)ksp,MPI_Comm *comm),
which returns in comm the MPI communicator for the KSP context. See section 1.3
for more discussion of the use of MPI communicators within PETSc.
Several monitoring routines are supplied with PETSc, including
KSPMonitorDefault(KSP,int,double, void *);
KSPMonitorSingularValue(KSP,int,double, void *);
KSPMonitorTrueResidualNorm(KSP,int,double, void *);
The default monitor simply prints an estimate of the $l_2$-norm of the residual at each iteration. The routine
KSPMonitorSingularValue() is appropriate only for use with the conjugate gradient method or GMRES,
since it prints estimates of the extreme singular values of the preconditioned operator at each iteration.
Since KSPMonitorTrueResidualNorm() prints the true residual at each iteration by actually computing the
residual using the formula $r = b - Ax$, the routine is slow and should be used only for testing or convergence
studies, not for timing. These monitors may be accessed with the command line options -ksp_monitor,
-ksp_monitor_singular_value, and -ksp_monitor_true_residual.
To employ the default graphical monitor, one should use the commands
PetscDrawLG lg;
KSPMonitorLGCreate(char *display,char *title,int x,int y,int w,int h,PetscDrawLG *lg);
KSPMonitorSet(KSP ksp,KSPMonitorLG,lg,0);
When no longer needed, the line graph should be destroyed with the command
KSPMonitorLGDestroy(PetscDrawLG *lg);
The user can change aspects of the graphs with the PetscDrawLG*() and PetscDrawAxis*() routines.
One can also access this functionality from the options database with the command
-ksp_monitor_draw [x,y,w,h], where x, y, w, h are the optional location and size of the window.
One can cancel hardwired monitoring routines for KSP at runtime with -ksp_monitor_cancel.
Unless the Krylov method converges so that the residual norm is small, say $10^{-10}$, many of the final
digits printed with the -ksp_monitor option are meaningless. Worse, they are different on different
machines, due to different round-off rules used by, say, the IBM RS6000 and the Sun Sparc. This makes
testing between different machines difficult. The option -ksp_monitor_short causes PETSc to print
fewer of the digits of the residual norm as it gets smaller; thus on most of the machines it will always print
the same numbers, making cross-system testing easier.
4.3.4 Understanding the Operator's Spectrum
Since the convergence of Krylov subspace methods depends strongly on the spectrum (eigenvalues) of the
preconditioned operator, PETSc has specific routines for eigenvalue approximation via the Arnoldi or Lanczos
iteration. First, before the linear solve one must call
KSPSetComputeEigenvalues(KSP ksp,PETSC_TRUE);
Then after the KSP solve one calls
KSPComputeEigenvalues(KSP ksp, int n,double *realpart,double *complexpart,int *neig);
Here, n is the size of the two arrays and the eigenvalues are inserted into those two arrays. The argument neig is the
number of eigenvalues computed; this number depends on the size of the Krylov space generated during
the linear system solution; for GMRES it is never larger than the restart parameter. There is an additional
routine
KSPComputeEigenvaluesExplicitly(KSP ksp, int n,double *realpart,double *complexpart);
that is useful only for very small problems. It explicitly computes the full representation of the preconditioned
operator and calls LAPACK to compute its eigenvalues. It should be used only for matrices of size
up to a couple hundred. The PetscDrawSP*() routines are very useful for drawing scatter plots of the
eigenvalues.
The eigenvalues may also be computed and displayed graphically with the options database commands
-ksp_plot_eigenvalues and -ksp_plot_eigenvalues_explicitly. Or they can
be dumped to the screen in ASCII text via -ksp_compute_eigenvalues and
-ksp_compute_eigenvalues_explicitly.
4.3.5 Other KSP Options
To obtain the solution vector and right hand side from a KSP context, one uses
KSPGetSolution(KSP ksp,Vec *x);
KSPGetRhs(KSP ksp,Vec *rhs);
During the iterative process the solution may not yet have been calculated or it may be stored in a different
location. To access the approximate solution during the iterative process, one uses the command
KSPBuildSolution(KSP ksp,Vec w,Vec *v);
where the solution is returned in v. The user can optionally provide a vector in w as the location to store
the vector; however, if w is PETSC_NULL, space allocated by PETSc in the KSP context is used. One
should not destroy this vector. For certain KSP methods, (e.g., GMRES), the construction of the solution is
expensive, while for many others it requires not even a vector copy.
Access to the residual is done in a similar way with the command
KSPBuildResidual(KSP ksp,Vec t,Vec w,Vec *v);
Again, for GMRES and certain other methods this is an expensive operation.
Method PCType Options Database Name
Jacobi PCJACOBI jacobi
Block Jacobi PCBJACOBI bjacobi
SOR (and SSOR) PCSOR sor
SOR with Eisenstat trick PCEISENSTAT eisenstat
Incomplete Cholesky PCICC icc
Incomplete LU PCILU ilu
Additive Schwarz PCASM asm
Linear solver PCKSP ksp
Combination of preconditioners PCCOMPOSITE composite
LU PCLU lu
Cholesky PCCHOLESKY cholesky
No preconditioning PCNONE none
Shell for user-defined PC PCSHELL shell
Table 4: PETSc Preconditioners
4.4 Preconditioners
As discussed in Section 4.3.1, the Krylov space methods are typically used in conjunction with a precondi-
tioner. To employ a particular preconditioning method, the user can either select it from the options database
using input of the form -pc_type <methodname> or set the method with the command
PCSetType(PC pc,PCType method);
In Table 4 we summarize the basic preconditioning methods supported in PETSc. The PCSHELL precon-
ditioner uses a specic, application-provided preconditioner. The direct preconditioner, PCLU, is, in fact, a
direct solver for the linear system that uses LU factorization. PCLU is included as a preconditioner so that
PETSc has a consistent interface among direct and iterative linear solvers.
Each preconditioner may have associated with it a set of options, which can be set with routines and
options database commands provided for this purpose. Such routine names and commands are all of the
form PC<TYPE>Option and -pc_<type>_option [value]. A complete list can be found by
consulting the manual pages; we discuss just a few in the sections below.
4.4.1 ILU and ICC Preconditioners
Some of the options for ILU preconditioner are
PCFactorSetLevels(PC pc,int levels);
PCFactorSetReuseOrdering(PC pc,PetscBool flag);
PCFactorSetDropTolerance(PC pc,double dt,double dtcol,int dtcount);
PCFactorSetReuseFill(PC pc,PetscBool flag);
PCFactorSetUseInPlace(PC pc);
PCFactorSetAllowDiagonalFill(PC pc);
When repeatedly solving linear systems with the same KSP context, one can reuse some information
computed during the first linear solve. In particular, PCFactorSetReuseOrdering() causes the ordering (for
example, set with -pc_factor_mat_ordering_type order) computed in the first factorization to
be reused for later factorizations. PCFactorSetUseInPlace() is often used with PCASM or PCBJACOBI
when zero fill is used; since it reuses the matrix space to store the incomplete factorization, it saves memory
and copying time. Note that in-place factorization is not appropriate with any ordering besides natural and
cannot be used with the drop tolerance factorization. These options may be set in the database with
-pc_factor_levels <levels>
-pc_factor_reuse_ordering
-pc_factor_reuse_fill
-pc_factor_in_place
-pc_factor_nonzeros_along_diagonal
-pc_factor_diagonal_fill
See Section 12.4.2 for information on preallocation of memory for anticipated fill during factorization.
By alleviating the considerable overhead for dynamic memory allocation, such tuning can significantly
enhance performance.
PETSc supports incomplete factorization preconditioners for several sequential matrix types
(for example MATSEQAIJ, MATSEQBAIJ, MATSEQSBAIJ).
4.4.2 SOR and SSOR Preconditioners
PETSc provides only a sequential SOR preconditioner that can only be used on sequential matrices or as the
subblock preconditioner when using block Jacobi or ASM preconditioning (see below).
The options for SOR preconditioning are
PCSORSetOmega(PC pc,double omega);
PCSORSetIterations(PC pc,int its,int lits);
PCSORSetSymmetric(PC pc,MatSORType type);
The first of these commands sets the relaxation factor for successive over (under) relaxation. The second
command sets the number of inner iterations its and local iterations lits (the number of smoothing
sweeps on a process before doing a ghost point update from the other processes) to use between steps of the
Krylov space method. The total number of SOR sweeps is given by its*lits. The third command sets
the kind of SOR sweep, where the argument type can be one of SOR_FORWARD_SWEEP,
SOR_BACKWARD_SWEEP or SOR_SYMMETRIC_SWEEP, the default being SOR_FORWARD_SWEEP. Setting the type
to be SOR_SYMMETRIC_SWEEP produces the SSOR method. In addition, each process can locally and
independently perform the specified variant of SOR with the types SOR_LOCAL_FORWARD_SWEEP,
SOR_LOCAL_BACKWARD_SWEEP, and SOR_LOCAL_SYMMETRIC_SWEEP. These variants can also be set
with the options -pc_sor_omega <omega>, -pc_sor_its <its>, -pc_sor_lits <lits>,
-pc_sor_backward, -pc_sor_symmetric, -pc_sor_local_forward,
-pc_sor_local_backward, and -pc_sor_local_symmetric.
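The forward, backward, and symmetric sweeps mentioned above can be sketched for a small dense matrix as follows. This is a sequential illustration in plain C, not the PETSc implementation; the routine names sor_sweep and ssor_sweep are hypothetical, and the matrix is assumed to have nonzero diagonal entries.

```c
#include <stddef.h>

/* One SOR sweep for A x = b on a dense row-major n-by-n matrix.
 * forward != 0 visits rows 0..n-1; forward == 0 visits rows n-1..0.
 * omega is the relaxation factor (omega = 1 gives Gauss-Seidel).
 * Hypothetical helper for illustration; not part of PETSc. */
void sor_sweep(const double *a, const double *b, double *x,
               size_t n, double omega, int forward)
{
    for (size_t k = 0; k < n; ++k) {
        size_t i = forward ? k : n - 1 - k;
        double sigma = 0.0;
        for (size_t j = 0; j < n; ++j) {
            if (j != i) sigma += a[i * n + j] * x[j];  /* off-diagonal part */
        }
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / a[i * n + i];
    }
}

/* A symmetric sweep is a forward sweep followed by a backward sweep. */
void ssor_sweep(const double *a, const double *b, double *x,
                size_t n, double omega)
{
    sor_sweep(a, b, x, n, omega, 1);
    sor_sweep(a, b, x, n, omega, 0);
}
```

Repeating ssor_sweep() its times corresponds, in this sequential setting, to the inner iteration count discussed above; the distinction between its and lits only matters when ghost point updates between processes are involved.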
The Eisenstat trick [5] for SSOR preconditioning can be employed with the method PCEISENSTAT
(-pc_type eisenstat). By using both left and right preconditioning of the linear system, this variant
of SSOR requires about half of the floating-point operations for conventional SSOR. The option
-pc_eisenstat_no_diagonal_scaling (or the routine PCEisenstatNoDiagonalScaling()) turns off
diagonal scaling in conjunction with Eisenstat SSOR method, while the option -pc_eisenstat_omega
<omega> (or the routine PCEisenstatSetOmega(PC pc,double omega)) sets the SSOR relaxation
coefficient, omega, as discussed above.
4.4.3 LU Factorization
The LU preconditioner provides several options. The first, given by the command
PCFactorSetUseInPlace(PC pc);
causes the factorization to be performed in-place and hence destroys the original matrix. The options
database variant of this command is -pc_factor_in_place. Another direct preconditioner option
is selecting the ordering of equations with the command
-pc_factor_mat_ordering_type <ordering>
The possible orderings are
MATORDERINGNATURAL - Natural
MATORDERINGND - Nested Dissection
MATORDERING1WD - One-way Dissection
MATORDERINGRCM - Reverse Cuthill-McKee
MATORDERINGQMD - Quotient Minimum Degree
These orderings can also be set through the options database by specifying one of the following:
-pc_factor_mat_ordering_type natural, or nd, or 1wd, or rcm, or qmd. In addition, see
MatGetOrdering(), discussed in Section 15.2.
The sparse LU factorization provided in PETSc does not perform pivoting for numerical stability (since
it is designed to preserve nonzero structure); thus occasionally an LU factorization will fail with a
zero pivot when, in fact, the matrix is nonsingular. The option -pc_factor_nonzeros_along_diagonal
<tol> will often help eliminate the zero pivot by preprocessing the column ordering to
remove small values from the diagonal. Here, tol is an optional tolerance to decide if a value is nonzero;
by default it is 1.e-10.
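The zero-pivot failure described above can be reproduced on a tiny example. The sketch below, in plain C without PETSc, factors a dense matrix without pivoting and reports the first zero pivot; swapping two columns of the nonsingular matrix [[0,1],[1,0]] then lets the factorization succeed. The function names are hypothetical, and the column swap only illustrates the idea of reordering columns; it is not the algorithm PETSc uses.

```c
#include <assert.h>
#include <stddef.h>

/* In-place LU factorization WITHOUT pivoting of a dense n-by-n row-major
   matrix. Returns 0 on success, or k+1 if the pivot in row k is exactly
   zero (the factorization is abandoned at that point). */
int lu_nopivot(size_t n, double *A)
{
  for (size_t k = 0; k < n; k++) {
    double pivot = A[k*n + k];
    if (pivot == 0.0) return (int)(k + 1);
    for (size_t i = k + 1; i < n; i++) {
      double l = A[i*n + k] / pivot;      /* multiplier, stored in L */
      A[i*n + k] = l;
      for (size_t j = k + 1; j < n; j++) {
        A[i*n + j] -= l * A[k*n + j];     /* update the trailing block */
      }
    }
  }
  return 0;
}

/* Swap columns p and q of a dense n-by-n row-major matrix. */
void swap_columns(size_t n, double *A, size_t p, size_t q)
{
  assert(p < n && q < n);
  for (size_t i = 0; i < n; i++) {
    double tmp = A[i*n + p];
    A[i*n + p] = A[i*n + q];
    A[i*n + q] = tmp;
  }
}
```

Calling lu_nopivot() on {0,1,1,0} reports a zero pivot in the first row even though the matrix is nonsingular; after swap_columns(2,A,0,1) the same call succeeds.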
In addition, Section 12.4.2 provides information on preallocation of memory for anticipated fill during
factorization. Such tuning can significantly enhance performance, since it eliminates the considerable
overhead for dynamic memory allocation.
4.4.4 Block Jacobi and Overlapping Additive Schwarz Preconditioners
The block Jacobi and overlapping additive Schwarz methods in PETSc are supported in parallel; however,
only the uniprocess version of the block Gauss-Seidel method is currently in place. By default, the PETSc
implementations of these methods employ ILU(0) factorization on each individual block (that is, the default
solver on each subblock is PCType=PCILU, KSPType=KSPPREONLY); the user can set alternative linear
solvers via the options -sub_ksp_type and -sub_pc_type. In fact, all of the KSP and PC options
can be applied to the subproblems by inserting the prefix -sub_ at the beginning of the option name. These
options database commands set the particular options for all of the blocks within the global problem. In
addition, the routines
PCBJacobiGetSubKSP(PC pc,int *n_local,int *first_local,KSP **subksp);
PCASMGetSubKSP(PC pc,int *n_local,int *first_local,KSP **subksp);
extract the KSP context for each local block. The argument n_local is the number of blocks on the calling
process, and first_local indicates the global number of the first block on the process. The blocks are
numbered successively by processes from zero through gb-1, where gb is the number of global blocks.
The array of KSP contexts for the local blocks is given by subksp. This mechanism enables the user to set
different solvers for the various blocks. To set the appropriate data structures, the user must explicitly call
KSPSetUp() before calling PCBJacobiGetSubKSP() or PCASMGetSubKSP(). For further details, see the
example ${PETSC_DIR}/src/ksp/ksp/examples/tutorials/ex7.c.
The block Jacobi, block Gauss-Seidel, and additive Schwarz preconditioners allow the user to set the
number of blocks into which the problem is divided. The options database commands to set this value are
-pc_bjacobi_blocks n and -pc_bgs_blocks n, and, within a program, the corresponding routines
are
PCBJacobiSetTotalBlocks(PC pc,int blocks,int *size);
PCASMSetTotalSubdomains(PC pc,int n,IS *is);
PCASMSetType(PC pc,PCASMType type);
The optional argument size is an array indicating the size of each block. Currently, for certain parallel
matrix formats, only a single block per process is supported. However, the MATMPIAIJ and MATMPIBAIJ
formats support the use of general blocks as long as no blocks are shared among processes. The is argument
contains the index sets that define the subdomains.
The object PCASMType is one of PC_ASM_BASIC, PC_ASM_INTERPOLATE, PC_ASM_RESTRICT,
PC_ASM_NONE and may also be set with the options database -pc_asm_type [basic, interpolate,
restrict, none]. The type PC_ASM_BASIC (or -pc_asm_type basic) corresponds to the standard
additive Schwarz method that uses the full restriction and interpolation operators. The type
PC_ASM_RESTRICT (or -pc_asm_type restrict) uses a full restriction operator, but during the
interpolation process ignores the off-process values. Similarly, PC_ASM_INTERPOLATE (or -pc_asm_type
interpolate) uses a limited restriction process in conjunction with a full interpolation, while
PC_ASM_NONE (or -pc_asm_type none) ignores off-process values for both restriction and interpolation. The
ASM types with limited restriction or interpolation were suggested by Xiao-Chuan Cai and Marcus Sarkis
[3]. PC_ASM_RESTRICT is the PETSc default, as it saves substantial communication and for many problems
has the added benefit of requiring fewer iterations for convergence than the standard additive Schwarz
method.
The user can also set the number of blocks and sizes on a per-process basis with the commands
PCBJacobiSetLocalBlocks(PC pc,int blocks,int *size);
PCASMSetLocalSubdomains(PC pc,int N,IS *is);
For the ASM preconditioner one can use the following command to set the overlap to compute in
constructing the subdomains.
PCASMSetOverlap(PC pc,int overlap);
The overlap defaults to 1, so if one desires that no additional overlap be computed beyond what may have
been set with a call to PCASMSetTotalSubdomains() or PCASMSetLocalSubdomains(), then overlap
must be set to be 0. In particular, if one does not explicitly set the subdomains in an application code,
then all overlap would be computed internally by PETSc, and using an overlap of 0 would result in an
ASM variant that is equivalent to the block Jacobi preconditioner. Note that one can define initial index
sets is with any overlap via PCASMSetTotalSubdomains() or PCASMSetLocalSubdomains(); the routine
PCASMSetOverlap() merely allows PETSc to extend that overlap further if desired.
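As a rough illustration of what the overlap parameter means, the sketch below splits the indices 0..n-1 of a one-dimensional problem into contiguous blocks and widens each block by a given overlap on both sides. This is plain C, not PETSc code: the Range type and the function name are hypothetical, and measuring overlap as index distance is an assumption that only matches matrix connectivity for something like a tridiagonal matrix.

```c
#include <assert.h>

/* Half-open index range [start, end). */
typedef struct {
  int start;
  int end;
} Range;

/* Split n unknowns into nblocks contiguous, nonoverlapping blocks, then
   extend every block by `overlap` indices on each side, clipped to [0, n).
   With overlap == 0 the result is a plain block (block Jacobi) partition. */
void overlapped_subdomains(int n, int nblocks, int overlap, Range *sub)
{
  assert(n > 0 && nblocks > 0 && nblocks <= n && overlap >= 0);
  int base  = n / nblocks;   /* minimum block size                  */
  int rem   = n % nblocks;   /* the first `rem` blocks get one more */
  int start = 0;
  for (int b = 0; b < nblocks; b++) {
    int size = base + (b < rem ? 1 : 0);
    int lo   = start - overlap;
    int hi   = start + size + overlap;
    sub[b].start = lo < 0 ? 0 : lo;
    sub[b].end   = hi > n ? n : hi;
    start += size;
  }
}
```

For n = 10 and three blocks, an overlap of 0 gives [0,4), [4,7), [7,10), while an overlap of 1 gives [0,5), [3,8), [6,10).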
4.4.5 Shell Preconditioners
The shell preconditioner simply uses an application-provided routine to implement the preconditioner. To
set this routine, one uses the command
PCShellSetApply(PC pc,PetscErrorCode (*apply)(PC,Vec,Vec));
Often a preconditioner needs access to an application-provided data structure. For this, one should use
PCShellSetContext(PC pc,void *ctx);
to set this data structure and
PCShellGetContext(PC pc,void **ctx);
to retrieve it in apply. The three routine arguments of apply() are the PC, the input vector, and the
output vector, respectively.
For a preconditioner that requires some sort of setup before being used, and a new setup
every time the operator is changed, one can provide a setup routine that is called every time the operator is
changed (usually via KSPSetOperators()).
PCShellSetSetUp(PC pc,PetscErrorCode (*setup)(PC));
The argument to the setup routine is the same PC object which can be used to obtain the operators with
PCGetOperators() and the application-provided data structure that was set with PCShellSetContext().
4.4.6 Combining Preconditioners
The PC type PCCOMPOSITE allows one to form new preconditioners by combining already defined
preconditioners and solvers. Combining preconditioners usually requires some experimentation to find a
combination of preconditioners that works better than any single method. It is a tricky business and is not
recommended until your application code is complete and running and you are trying to improve performance.
In many cases using a single preconditioner is better than a combination; an exception is the
multigrid/multilevel preconditioners (solvers) that are always combinations of some sort, see Section 4.4.7.
Let $B_1$ and $B_2$ represent the application of two preconditioners of type type1 and type2. The
preconditioner $B = B_1 + B_2$ can be obtained with
PCSetType(pc,PCCOMPOSITE);
PCCompositeAddPC(pc,type1);
PCCompositeAddPC(pc,type2);
Any number of preconditioners may be added in this way.
This way of combining preconditioners is called additive, since the actions of the preconditioners are
added together. This is the default behavior. An alternative can be set with the option
PCCompositeSetType(PC pc,PCCompositeType PC_COMPOSITE_MULTIPLICATIVE);
In this form the new residual is updated after the application of each preconditioner and the next
preconditioner applied to the next residual. For example, with two composed preconditioners $B_1$ and $B_2$,
$y = Bx$ is obtained from
\[
\begin{aligned}
y   &= B_1 x \\
w_1 &= x - A y \\
y   &= y + B_2 w_1
\end{aligned}
\]
Loosely, this corresponds to a Gauss-Seidel iteration, while additive corresponds to a Jacobi iteration.
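The difference between the two ways of combining can be sketched in plain C on a small dense matrix. The code below is not PETSc code: the two sample preconditioners (a Jacobi scaling and a lower-triangular solve), the ApplyFn type, and all function names are assumptions made for illustration. The multiplicative routine follows the three steps given above.

```c
#include <assert.h>
#include <stddef.h>

/* A preconditioner application y = B x for a dense row-major n-by-n A. */
typedef void (*ApplyFn)(size_t n, const double *A, const double *x, double *y);

/* Sample B1: Jacobi, y = D^{-1} x. */
void apply_jacobi(size_t n, const double *A, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) {
    assert(A[i*n + i] != 0.0);
    y[i] = x[i] / A[i*n + i];
  }
}

/* Sample B2: forward substitution with the lower triangle, (D + L) y = x. */
void apply_lower(size_t n, const double *A, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) {
    double s = x[i];
    assert(A[i*n + i] != 0.0);
    for (size_t j = 0; j < i; j++) s -= A[i*n + j] * y[j];
    y[i] = s / A[i*n + i];
  }
}

/* Additive combination: y = B1 x + B2 x. z is work space of length n. */
void composite_additive(size_t n, const double *A, ApplyFn B1, ApplyFn B2,
                        const double *x, double *y, double *z)
{
  B1(n, A, x, y);
  B2(n, A, x, z);
  for (size_t i = 0; i < n; i++) y[i] += z[i];
}

/* Multiplicative combination: y = B1 x; w = x - A y; y = y + B2 w.
   w and z are work arrays of length n. */
void composite_multiplicative(size_t n, const double *A, ApplyFn B1, ApplyFn B2,
                              const double *x, double *y, double *w, double *z)
{
  B1(n, A, x, y);
  for (size_t i = 0; i < n; i++) {        /* residual after the first stage */
    double Ay = 0.0;
    for (size_t j = 0; j < n; j++) Ay += A[i*n + j] * y[j];
    w[i] = x[i] - Ay;
  }
  B2(n, A, w, z);
  for (size_t i = 0; i < n; i++) y[i] += z[i];
}
```

For A = [[4,1],[1,3]] and x = (1,2), the exact solution of Ay = x is (1/11, 7/11). The additive result is (0.5, 1.25), and the multiplicative result is (1/12, 23/36), which is much closer to the exact solution.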
Under most circumstances the multiplicative form requires one-half the number of iterations as the
additive form; but the multiplicative form does require the application of A inside the preconditioner.
In the multiplicative version, the calculation of the residual inside the preconditioner can be done in two
ways: using the original linear system matrix or using the matrix used to build the preconditioners B1, B2,
etc. By default it uses the preconditioner matrix; to use the true matrix, use the option
PCCompositeSetUseTrue(PC pc);
The individual preconditioners can be accessed (in order to set options) via
PCCompositeGetPC(PC pc,int count,PC *subpc);
For example, to set the first subpreconditioner to use ILU(1):
PC subpc;
PCCompositeGetPC(pc,0,&subpc);
PCFactorSetLevels(subpc,1);
These various options can also be set via the options database. For example, -pc_type composite
-pc_composite_pcs jacobi,ilu causes the composite preconditioner to be used with two
preconditioners: Jacobi and ILU. The option -pc_composite_type multiplicative initiates the
multiplicative version of the algorithm, while -pc_composite_type additive initiates the additive version.
Using the true matrix is obtained with the option -pc_composite_true. One sets options for the
subpreconditioners with the extra prefix -sub_N_ where N is the number of the subpreconditioner. For
example, -sub_0_pc_ifactor_fill 0.
PETSc also allows a preconditioner to be a complete linear solver. This is achieved with the PCKSP
type.
PCSetType(pc,PCKSP);
PCKSPGetKSP(pc,&ksp);
/* set any KSP/PC options */
From the command line one can use 5 iterations of bi-CG-stab with ILU(0) preconditioning as the
preconditioner with -pc_type ksp -ksp_pc_type ilu -ksp_ksp_max_it 5 -ksp_ksp_type bcgs.
By default the inner KSP preconditioner uses the outer preconditioner matrix as the matrix to be solved
in the linear system; to use the true matrix use the option
PCKSPSetUseTrue(PC pc);
or at the command line with -pc_ksp_true.
Naturally one can use a KSP preconditioner inside a composite preconditioner. For example,
-pc_type composite -pc_composite_pcs ilu,ksp -sub_1_pc_type jacobi -sub_1_ksp_max_it 10
uses two preconditioners: ILU(0) and 10 iterations of GMRES with Jacobi preconditioning,
though it is not clear whether one would ever wish to do such a thing.
4.4.7 Multigrid Preconditioners
A large suite of routines is available for using multigrid as a preconditioner. In the PC framework the
user is required to provide the coarse grid solver, smoothers, restriction, and interpolation, as well as the
code to calculate residuals. The PC package allows all of that to be wrapped up into a PETSc compliant
preconditioner. We fully support both matrix-free and matrix-based multigrid solvers. See also Chapter
7 for a higher level interface to the multigrid solvers for linear and nonlinear problems using the DMMG
object.
A multigrid preconditioner is created with the four commands
KSPCreate(MPI_Comm comm,KSP *ksp);
KSPGetPC(KSP ksp,PC *pc);
PCSetType(PC pc,PCMG);
PCMGSetLevels(pc,int levels,MPI_Comm *comms);
A large number of parameters affect the multigrid behavior. The command
PCMGSetType(PC pc,PCMGType mode);
indicates which form of multigrid to apply [17].
For standard V or W-cycle multigrids, one sets the mode to be PC_MG_MULTIPLICATIVE; for
the additive form (which in certain cases reduces to the BPX method, or additive multilevel Schwarz, or
multilevel diagonal scaling), one uses PC_MG_ADDITIVE as the mode. For a variant of full multigrid, one
can use PC_MG_FULL, and for the Kaskade algorithm PC_MG_KASKADE. For the multiplicative and full
multigrid options, one can use a W-cycle by calling
PCMGSetCycleType(PC pc,PCMGCycleType ctype);
with a value of PC_MG_CYCLE_W for ctype. The commands above can also be set from the options
database. The option names are -pc_mg_type [multiplicative, additive, full, kaskade],
and -pc_mg_cycle_type <ctype>.
The user can control the amount of pre- and postsmoothing by using either the options
-pc_mg_smoothup m and -pc_mg_smoothdown n or the routines
PCMGSetNumberSmoothUp(PC pc,int m);
PCMGSetNumberSmoothDown(PC pc,int n);
The multigrid routines, which determine the solvers and interpolation/restriction operators that are used,
are mandatory. To set the coarse grid solver, one must call
PCMGGetCoarseSolve(PC pc,KSP *ksp);
and set the appropriate options in ksp. Similarly, the smoothers are set by calling
PCMGGetSmoother(PC pc,int level,KSP *ksp);
and setting the various options in ksp. To use a different pre- and postsmoother, one should call the
following routines instead.
PCMGGetSmootherUp(PC pc,int level,KSP *upksp);
and
PCMGGetSmootherDown(PC pc,int level,KSP *downksp);
Use
PCMGSetInterpolation(PC pc,int level,Mat P);
and
PCMGSetRestriction(PC pc,int level,Mat R);
to define the intergrid transfer operations. If only one of these is set, its transpose will be used for the other.
It is possible for these interpolation operations to be matrix free (see Section 3.3); in that case the user should
make sure that these operations are defined for the (matrix-free) matrices passed in. Note that this system is
arranged so that if the interpolation is the transpose of the restriction, you can pass the same mat argument
to both PCMGSetRestriction() and PCMGSetInterpolation().
On each level except the coarsest, one must also set the routine to compute the residual. The following
command suffices:
PCMGSetResidual(PC pc,int level,PetscErrorCode (*residual)(Mat,Vec,Vec,Vec),Mat mat);
The residual() function can be set to be PCMGDefaultResidual() if one's operator is stored in a Mat format.
In certain circumstances, where it is much cheaper to calculate the residual directly, rather than through the
usual formula b - Ax, the user may wish to provide an alternative.
Finally, the user may provide three work vectors for each level (except on the finest, where only the
residual work vector is required). The work vectors are set with the commands
PCMGSetRhs(PC pc,int level,Vec b);
PCMGSetX(PC pc,int level,Vec x);
PCMGSetR(PC pc,int level,Vec r);
The PC references these vectors so you should call VecDestroy() when you are finished with them. If any of
these vectors are not provided, the preconditioner will allocate them.
One can control the KSP and PC options used on the various levels (as well as the coarse grid) using the
prefix mg_levels_ (mg_coarse_ for the coarse grid). For example,
-mg_levels_ksp_type cg
will cause the CG method to be used as the Krylov method for each level. Or
-mg_levels_pc_type ilu -mg_levels_pc_factor_levels 2
will cause the ILU preconditioner to be used on each level with two levels of fill in the incomplete
factorization.
4.5 Solving Block Matrices
Block matrices represent an important class of problems in numerical linear algebra and offer the possibility
of far more efficient iterative solvers than just treating the entire matrix as a black box. In this section we
use the common linear algebra definition of block matrices, where matrices are divided into a small (two,
three or so) number of very large blocks and the blocks arise naturally from the underlying physics
or discretization of the problem, for example, the velocity and pressure. Under a certain numbering of
unknowns the matrix can be written as
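\[
\begin{pmatrix}
A_{00} & A_{01} & A_{02} & A_{03} \\
A_{10} & A_{11} & A_{12} & A_{13} \\
A_{20} & A_{21} & A_{22} & A_{23} \\
A_{30} & A_{31} & A_{32} & A_{33}
\end{pmatrix}.
\]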
Here each $A_{ij}$ is an entire block. On a parallel computer the matrices are not stored explicitly this way;
however, each process will own some of the rows of $A_{0*}$, $A_{1*}$, etc. On a process the blocks may be stored
one block followed by another
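\[
\begin{pmatrix}
A_{{00}_{00}} & A_{{00}_{01}} & A_{{00}_{02}} & \cdots & A_{{01}_{00}} & A_{{01}_{02}} & \cdots \\
A_{{00}_{10}} & A_{{00}_{11}} & A_{{00}_{12}} & \cdots & A_{{01}_{10}} & A_{{01}_{12}} & \cdots \\
A_{{00}_{20}} & A_{{00}_{21}} & A_{{00}_{22}} & \cdots & A_{{01}_{20}} & A_{{01}_{22}} & \cdots \\
\vdots \\
A_{{10}_{00}} & A_{{10}_{01}} & A_{{10}_{02}} & \cdots & A_{{11}_{00}} & A_{{11}_{02}} & \cdots \\
A_{{10}_{10}} & A_{{10}_{11}} & A_{{10}_{12}} & \cdots & A_{{11}_{10}} & A_{{11}_{12}} & \cdots \\
\vdots
\end{pmatrix}
\]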
or interlaced, for example, with two blocks
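\[
\begin{pmatrix}
A_{{00}_{00}} & A_{{01}_{00}} & A_{{00}_{01}} & A_{{01}_{01}} & \cdots \\
A_{{10}_{00}} & A_{{11}_{00}} & A_{{10}_{01}} & A_{{11}_{01}} & \cdots \\
\vdots \\
A_{{00}_{10}} & A_{{01}_{10}} & A_{{00}_{11}} & A_{{01}_{11}} & \cdots \\
A_{{10}_{10}} & A_{{11}_{10}} & A_{{10}_{11}} & A_{{11}_{11}} & \cdots \\
\vdots
\end{pmatrix}.
\]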
Note that for interlaced storage the number of rows/columns of each block must be the same size. Matrices
obtained with DMGetMatrix() where the DM is a DMDA are always stored interlaced. Block matrices can
also be stored using the MATNEST format which holds separate assembled blocks. Each of these nested
matrices is itself distributed in parallel. It is more efficient to use MATNEST with the methods described
in this section because there are fewer copies and better formats (e.g. BAIJ or SBAIJ) can be used for the
components, but it is not possible to use many other methods with MATNEST. See Section 3.1.3 for more
on assembling block matrices without depending on a specific matrix format.
The PETSc PCFIELDSPLIT preconditioner is used to implement the block solvers in PETSc. There
are three ways to provide the information that defines the blocks. If the matrices are stored as interlaced
then PCFieldSplitSetFields() can be called repeatedly to indicate which fields belong to each block. More
generally PCFieldSplitSetIS() can be used to indicate exactly which rows/columns of the matrix belong to a
particular block. You can provide names for each block with these routines; if you do not provide names they
are numbered from 0. With these two approaches the blocks may overlap (though generally they will not).
If only one block is defined then the complement of the matrices is used to define the other block. Finally
the option -pc_fieldsplit_detect_saddle_point causes two diagonal blocks to be found, one
associated with all rows/columns that have zeros on the diagonals and the rest.
For simplicity in the rest of the section we restrict our matrices to two by two blocks. So the matrix is
\[
\begin{pmatrix}
A_{00} & A_{01} \\
A_{10} & A_{11}
\end{pmatrix}.
\]
On occasion the user may provide another matrix that is used to construct parts of the preconditioner
\[
\begin{pmatrix}
Ap_{00} & Ap_{01} \\
Ap_{10} & Ap_{11}
\end{pmatrix}.
\]
For notational simplicity define ksp(A, Ap) to mean approximately solving a linear system using KSP with
operator A and preconditioner built from matrix Ap.
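For matrices defined with any number of blocks there are three block algorithms available: block
Jacobi,
\[
\begin{pmatrix}
\mathrm{ksp}(A_{00},Ap_{00}) & 0 \\
0 & \mathrm{ksp}(A_{11},Ap_{11})
\end{pmatrix}
\]
block Gauss-Seidel,
\[
\begin{pmatrix} I & 0 \\ 0 & A_{11}^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{10} & I \end{pmatrix}
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & I \end{pmatrix}
\]
which is implemented (see the note below) as
\[
\begin{pmatrix} I & 0 \\ 0 & \mathrm{ksp}(A_{11},Ap_{11}) \end{pmatrix}
\left[
\begin{pmatrix} 0 & 0 \\ 0 & I \end{pmatrix}
+
\begin{pmatrix} I & 0 \\ -A_{10} & -A_{11} \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}
\right]
\begin{pmatrix} \mathrm{ksp}(A_{00},Ap_{00}) & 0 \\ 0 & I \end{pmatrix}
\]
Note: This may seem an odd way to implement the method since it involves the extra multiply by $-A_{11}$.
The reason it is implemented this way is that this approach works for any number of blocks that may overlap.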
and symmetric block Gauss-Seidel
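\[
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & I \end{pmatrix}
\begin{pmatrix} I & -A_{01} \\ 0 & I \end{pmatrix}
\begin{pmatrix} A_{00} & 0 \\ 0 & A_{11}^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{10} & I \end{pmatrix}
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & I \end{pmatrix}.
\]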
These can be accessed with -pc_fieldsplit_type <additive,multiplicative,symmetric_multiplicative>
or the function PCFieldSplitSetType(). The option prefixes for the internal KSPs are
given by -fieldsplit_name_.
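For two-by-two blocks only, there is another family of solvers, based on Schur complements. With the
Schur complement $S = A_{11} - A_{10} A_{00}^{-1} A_{01}$, the inverse of the Schur complement factorization is
\[
\begin{aligned}
&\left[
\begin{pmatrix} I & 0 \\ A_{10}A_{00}^{-1} & I \end{pmatrix}
\begin{pmatrix} A_{00} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} I & A_{00}^{-1}A_{01} \\ 0 & I \end{pmatrix}
\right]^{-1} \\
&\quad=
\begin{pmatrix} I & A_{00}^{-1}A_{01} \\ 0 & I \end{pmatrix}^{-1}
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ A_{10}A_{00}^{-1} & I \end{pmatrix}^{-1} \\
&\quad=
\begin{pmatrix} I & -A_{00}^{-1}A_{01} \\ 0 & I \end{pmatrix}
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{10}A_{00}^{-1} & I \end{pmatrix} \\
&\quad=
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & I \end{pmatrix}
\begin{pmatrix} I & -A_{01} \\ 0 & I \end{pmatrix}
\begin{pmatrix} A_{00} & 0 \\ 0 & S^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{10} & I \end{pmatrix}
\begin{pmatrix} A_{00}^{-1} & 0 \\ 0 & I \end{pmatrix}.
\end{aligned}
\]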
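The preconditioner is accessed with -pc_fieldsplit_type schur and is implemented as
\[
\begin{pmatrix} \mathrm{ksp}(A_{00},Ap_{00}) & 0 \\ 0 & I \end{pmatrix}
\begin{pmatrix} I & -A_{01} \\ 0 & I \end{pmatrix}
\begin{pmatrix} I & 0 \\ 0 & \mathrm{ksp}(\hat{S},\hat{S}p) \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{10}\,\mathrm{ksp}(A_{00},Ap_{00}) & I \end{pmatrix},
\]
where $\hat{S} = A_{11} - A_{10}\,\mathrm{ksp}(A_{00},Ap_{00})\,A_{01}$ is the approximate Schur complement.
By default $\hat{S}p$ is $Ap_{11}$.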
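The order of operations in a Schur complement solve can be followed on the smallest possible case, where every block is a single number. The plain C sketch below is an illustration only (the function name and the scalar blocks are assumptions, and exact divisions stand in for the inner KSP solves): it eliminates the first unknown, solves with the Schur complement, and back-substitutes.

```c
#include <assert.h>

/* Solve the 2-by-2 block system
       [a00 a01] [x0]   [b0]
       [a10 a11] [x1] = [b1]
   with scalar blocks via the Schur complement s = a11 - a10 a00^{-1} a01.
   Returns 0 on success, 1 if a00 is zero, 2 if the Schur complement is zero. */
int schur_solve_2x2(double a00, double a01, double a10, double a11,
                    double b0, double b1, double *x0, double *x1)
{
  assert(x0 != 0 && x1 != 0);
  if (a00 == 0.0) return 1;
  double t = b0 / a00;                 /* first inner solve:  t = a00^{-1} b0  */
  double s = a11 - a10 * (a01 / a00);  /* Schur complement                     */
  if (s == 0.0) return 2;
  *x1 = (b1 - a10 * t) / s;            /* Schur solve:  x1 = s^{-1}(b1 - a10 t) */
  *x0 = (b0 - a01 * (*x1)) / a00;      /* second inner solve (back-substitute)  */
  return 0;
}
```

For a00 = 4, a01 = 1, a10 = 2, a11 = 3 and right-hand side (1, 2), the Schur complement is 2.5 and the solution is x0 = 0.1, x1 = 0.6.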
There are several variants of the Schur complement preconditioner obtained by dropping some of
the terms; these can be obtained with -pc_fieldsplit_schur_factorization_type <diag,lower,upper,full>
or NOTE WE NEED TO ADD A FUNCTIONAL INTERFACE. Note that the
diag form uses the preconditioner
\[
\begin{pmatrix}
\mathrm{ksp}(A_{00},Ap_{00}) & 0 \\
0 & -\mathrm{ksp}(\hat{S},\hat{S}p)
\end{pmatrix}
\]
since the Schur complement for saddle point problems is negative definite. IS THIS ALWAYS TRUE,
SHOULD THERE BE A FLAG TO SET THE SIGN?
You can use the PCLSC preconditioner for the Schur complement with -pc_fieldsplit_type
schur -fieldsplit_1_pc_type lsc. This uses for the preconditioner to $\hat{S}$ the operator
\[
\mathrm{ksp}(A_{10}A_{01}, A_{10}A_{01})\; A_{10} A_{00} A_{01}\; \mathrm{ksp}(A_{10}A_{01}, A_{10}A_{01})
\]
which, of course, introduces two additional inner solves for each application of the Schur complement. The
options prefix for this inner KSP is -fieldsplit_1_lsc_. Instead of constructing the matrix $A_{10}A_{01}$
the user can provide their own matrix. This is done by attaching the matrix/matrices to the Sp matrix they
provide with
PetscObjectCompose((PetscObject)Sp,"LSC_L",(PetscObject)L);
PetscObjectCompose((PetscObject)Sp,"LSC_Lp",(PetscObject)Lp);
4.6 Solving Singular Systems
Sometimes one is required to solve linear systems that are singular, that is, systems in which the matrix has a
null space. For example, the discretization of the Laplacian operator with Neumann boundary conditions has
a null space of the constant functions. PETSc has tools to help solve these systems.
First, one must know what the null space is and store it using an orthonormal basis in an array of PETSc
Vecs. (The constant functions can be handled separately, since they are such a common case). Create a
MatNullSpace object with the command
MatNullSpaceCreate(MPI_Comm,PetscBool hasconstants,int dim,Vec *basis,MatNullSpace *nsp);
Here dim is the number of vectors in basis and hasconstants indicates if the null space contains the
constant functions. (If the null space contains the constant functions you do not need to include it in the
basis vectors you provide).
One then tells the KSP object you are using what the null space is with the call
KSPSetNullSpace(KSP ksp,MatNullSpace nsp);
The PETSc solvers will now handle the null space during the solution process.
But if one chooses a direct solver (or an incomplete factorization) it may still detect a zero pivot.
You can run with the additional option -pc_factor_shift_nonzero <dampingfactor> to prevent the
zero pivot. A good choice for the damping factor is 1.e-10.
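Handling a null space during the solve amounts, conceptually, to projecting the null-space components out of a vector. The following plain C sketch shows that projection for the constant functions and for a user-supplied orthonormal basis, mirroring the hasconstants, dim, and basis arguments above. It is an illustration under stated assumptions (dense arrays, hypothetical function names), not PETSc's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Remove the component of v along the constant vector by subtracting
   the mean of its entries. */
void remove_constant_nullspace(size_t n, double *v)
{
  assert(n > 0);
  double mean = 0.0;
  for (size_t i = 0; i < n; i++) mean += v[i];
  mean /= (double)n;
  for (size_t i = 0; i < n; i++) v[i] -= mean;
}

/* Remove the components of v along dim orthonormal basis vectors stored
   row by row in Q (dim-by-n, row-major), and optionally along the constant
   vector. Orthonormality of the rows of Q is assumed, not checked. */
void remove_nullspace(size_t n, size_t dim, const double *Q,
                      int has_constants, double *v)
{
  if (has_constants) remove_constant_nullspace(n, v);
  for (size_t k = 0; k < dim; k++) {
    double dot = 0.0;
    for (size_t i = 0; i < n; i++) dot += Q[k*n + i] * v[i];
    for (size_t i = 0; i < n; i++) v[i] -= dot * Q[k*n + i];
  }
}
```

For example, removing the constants from (1, 2, 3, 6) leaves (-2, -1, 0, 3), whose entries sum to zero.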
4.7 Using PETSc to interface with external linear solvers
PETSc interfaces to several external linear solvers (see Acknowledgments). To use these solvers, one needs
to:
1. Run ./configure with the additional options --download-packagename. For example:
--download-superlu_dist --download-parmetis (SuperLU_DIST needs ParMetis) or
--download-mumps --download-scalapack --download-blacs (MUMPS requires ScaLAPACK and
BLACS).
2. Build the PETSc libraries.
3. Use the runtime option: -ksp_type preonly -pc_type <pctype>
-pc_factor_mat_solver_package <packagename>. For example: -ksp_type preonly -pc_type lu
-pc_factor_mat_solver_package superlu_dist.
MatType    PCType     MatSolverPackage         Package (-pc_factor_mat_solver_package)
seqaij     lu         MATSOLVERESSL            essl
seqaij     lu         MATSOLVERLUSOL           lusol
seqaij     lu         MATSOLVERMATLAB          matlab
aij        lu         MATSOLVERMUMPS           mumps
aij        cholesky
sbaij      cholesky
mpidense   lu         MATSOLVERPLAPACK         plapack
mpidense   cholesky
aij        lu         MATSOLVERSPOOLES         spooles
sbaij      cholesky
seqaij     lu         MATSOLVERSUPERLU         superlu
aij        lu         MATSOLVERSUPERLU_DIST    superlu_dist
seqaij     lu         MATSOLVERUMFPACK         umfpack
Table 5: Options for External Solvers
The default and available input options for each external software can be found by specifying -help (or
-h) at runtime.
As an alternative to using runtime flags to employ these external packages, procedural calls are provided
for some packages. For example, the following procedural calls are equivalent to the runtime options
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 2:
KSPSetType(ksp,KSPPREONLY);
KSPGetPC(ksp,&pc);
PCSetType(pc,PCLU);
PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
PCFactorGetMatrix(pc,&F);
icntl=7; ival = 2;
MatMumpsSetIcntl(F,icntl,ival);
One can also create matrices with the appropriate capabilities by calling MatCreate() followed by
MatSetType() specifying the desired matrix type from Table 5. These matrix types inherit capabilities from
their PETSc matrix parents: seqaij, mpiaij, etc. As a result, the preallocation routines
MatSeqAIJSetPreallocation(), MatMPIAIJSetPreallocation(), etc. and any other type specific routines of the
base class are supported. One can also call MatConvert() inplace to convert the matrix to and from its base
class without performing an expensive data copy. MatConvert() cannot be called on matrices that have
already been factored.
In Table 5, the base class aij refers to the fact that inheritance is based on MATSEQAIJ when
constructed with a single process communicator, and on MATMPIAIJ otherwise. The same holds for baij and
sbaij. For codes that are intended to be run either as a single process or with multiple processes, depending
on the mpiexec command, it is recommended that both sets of preallocation routines be called for these
communicator morphing types. The call for the incorrect type will simply be ignored without any harm or
message.
Chapter 5
SNES: Nonlinear Solvers
The solution of large-scale nonlinear problems pervades many facets of computational science and demands
robust and flexible solution strategies. The SNES library of PETSc provides a powerful suite of
data-structure-neutral numerical routines for such problems. Built on top of the linear solvers and data structures
discussed in preceding chapters, SNES enables the user to easily customize the nonlinear solvers according
to the application at hand. Also, the SNES interface is identical for the uniprocess and parallel cases; the
only difference in the parallel version is that each process typically forms only its local contribution to
various matrices and vectors.
The SNES class includes methods for solving systems of nonlinear equations of the form
\[ F(x) = 0, \qquad (5.1) \]
where $F: \Re^n \to \Re^n$. Newton-like methods provide the core of the package, including both line search and
trust region techniques, which are discussed further in Section 5.2. Following the PETSc design philosophy,
the interfaces to the various solvers are all virtually identical. In addition, the SNES software is completely
flexible, so that the user can at runtime change any facet of the solution process.
The general form of the n-dimensional Newton's method for solving (5.1) is
\[ x_{k+1} = x_k - [F'(x_k)]^{-1} F(x_k), \quad k = 0, 1, \ldots, \qquad (5.2) \]
where $x_0$ is an initial approximation to the solution and $F'(x_k)$, the Jacobian, is nonsingular at each
iteration. In practice, the Newton iteration (5.2) is implemented by the following two steps:
1. (Approximately) solve $F'(x_k)\,\Delta x_k = -F(x_k)$. (5.3)
2. Update $x_{k+1} = x_k + \Delta x_k$. (5.4)
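The two steps (5.3) and (5.4) can be sketched in a few lines of plain C for a two-variable system, with the 2-by-2 linear solve done by Cramer's rule. This is an illustration only, not SNES code: the sample system, the function names, and the stopping test on the residual are assumptions chosen for the sketch, and no line search or trust region is used.

```c
#include <assert.h>

typedef void (*FuncFn)(const double x[2], double f[2]);
typedef void (*JacFn)(const double x[2], double J[2][2]);

/* Illustrative system: F(x) = (x0^2 + x0 x1 - 3, x0 x1 + x1^2 - 6),
   which has a root at x = (1, 2). */
void example_function(const double x[2], double f[2])
{
  f[0] = x[0]*x[0] + x[0]*x[1] - 3.0;
  f[1] = x[0]*x[1] + x[1]*x[1] - 6.0;
}

/* Analytic Jacobian F'(x) of the system above. */
void example_jacobian(const double x[2], double J[2][2])
{
  J[0][0] = 2.0*x[0] + x[1];  J[0][1] = x[0];
  J[1][0] = x[1];             J[1][1] = x[0] + 2.0*x[1];
}

double abs_value(double v) { return v < 0.0 ? -v : v; }

/* Newton's method for two unknowns. Returns the number of Newton steps
   taken, or -1 if the Jacobian is singular or max_it is reached first. */
int newton2(FuncFn F, JacFn Jf, double x[2], double tol, int max_it)
{
  assert(tol > 0.0 && max_it > 0);
  for (int k = 0; k < max_it; k++) {
    double f[2], J[2][2];
    F(x, f);
    if (abs_value(f[0]) < tol && abs_value(f[1]) < tol) return k;
    Jf(x, J);
    double det = J[0][0]*J[1][1] - J[0][1]*J[1][0];
    if (det == 0.0) return -1;
    /* Step (5.3): solve F'(x_k) dx = -F(x_k) by Cramer's rule. */
    double dx0 = (-f[0]*J[1][1] + f[1]*J[0][1]) / det;
    double dx1 = (-f[1]*J[0][0] + f[0]*J[1][0]) / det;
    /* Step (5.4): update the iterate. */
    x[0] += dx0;
    x[1] += dx1;
  }
  return -1;
}
```

Starting from x = (0.5, 0.5), the iterates of this sketch approach the root (1, 2).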
5.1 Basic SNES Usage
In the simplest usage of the nonlinear solvers, the user must merely provide a C, C++, or Fortran routine to
evaluate the nonlinear function of Equation (5.1). The corresponding Jacobian matrix can be approximated
with finite differences. For codes that are typically more efficient and accurate, the user can provide a
routine to compute the Jacobian; details regarding these application-provided routines are discussed below.
To provide an overview of the use of the nonlinear solvers, we first introduce a complete and simple example
in Figure 14, corresponding to ${PETSC_DIR}/src/snes/examples/tutorials/ex1.c.
static char help[] = "Newton's method for a two-variable system, sequential.\n\n";
/*T
Concepts: SNES^basic example
T*/
/*
Include "petscsnes.h" so that we can use SNES solvers. Note that this
file automatically includes:
petscsys.h    - base PETSc routines   petscvec.h - vectors
petscmat.h    - matrices
petscis.h     - index sets            petscksp.h - Krylov subspace methods
petscviewer.h - viewers               petscpc.h  - preconditioners
petscksp.h    - linear solvers
*/
#include <petscsnes.h>
typedef struct {
Vec xloc,rloc; /* local solution, residual vectors */
VecScatter scatter;
} AppCtx;
/*
User-defined routines
*/
extern PetscErrorCode FormJacobian1(SNES,Vec,Mat*,Mat*,MatStructure*,void*);
extern PetscErrorCode FormFunction1(SNES,Vec,Vec,void*);
extern PetscErrorCode FormJacobian2(SNES,Vec,Mat*,Mat*,MatStructure*,void*);
extern PetscErrorCode FormFunction2(SNES,Vec,Vec,void*);
#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **argv)
{
SNES snes; /* nonlinear solver context */
KSP ksp; /* linear solver context */
PC pc; /* preconditioner context */
Vec x,r; /* solution, residual vectors */
Mat J; /* Jacobian matrix */
PetscErrorCode ierr;
PetscInt its;
PetscMPIInt size,rank;
PetscScalar pfive = .5,*xx;
PetscBool flg;
AppCtx user; /* user-defined work context */
IS isglobal,islocal;
PetscInitialize(&argc,&argv,(char *)0,help);
ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);
ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Create nonlinear solver context
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
ierr = SNESCreate(PETSC_COMM_WORLD,&snes);CHKERRQ(ierr);
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Create matrix and vector data structures; set corresponding routines
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/*
Create vectors for solution and nonlinear function
*/
ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
ierr = VecSetSizes(x,PETSC_DECIDE,2);CHKERRQ(ierr);
ierr = VecSetFromOptions(x);CHKERRQ(ierr);
ierr = VecDuplicate(x,&r);CHKERRQ(ierr);
if (size > 1){
ierr = VecCreateSeq(PETSC_COMM_SELF,2,&user.xloc);CHKERRQ(ierr);
ierr = VecDuplicate(user.xloc,&user.rloc);CHKERRQ(ierr);
/* Create the scatter between the global x and local xloc */
ierr = ISCreateStride(MPI_COMM_SELF,2,0,1,&islocal);CHKERRQ(ierr);
ierr = ISCreateStride(MPI_COMM_SELF,2,0,1,&isglobal);CHKERRQ(ierr);
ierr = VecScatterCreate(x,isglobal,user.xloc,islocal,&user.scatter);CHKERRQ(ierr);
ierr = ISDestroy(&isglobal);CHKERRQ(ierr);
ierr = ISDestroy(&islocal);CHKERRQ(ierr);
}
/*
Create Jacobian matrix data structure
*/
ierr = MatCreate(PETSC_COMM_WORLD,&J);CHKERRQ(ierr);
ierr = MatSetSizes(J,PETSC_DECIDE,PETSC_DECIDE,2,2);CHKERRQ(ierr);
ierr = MatSetFromOptions(J);CHKERRQ(ierr);
ierr = PetscOptionsHasName(PETSC_NULL,"-hard",&flg);CHKERRQ(ierr);
if (!flg) {
/*
Set function evaluation routine and vector.
*/
ierr = SNESSetFunction(snes,r,FormFunction1,&user);CHKERRQ(ierr);
/*
Set Jacobian matrix data structure and Jacobian evaluation routine
*/
ierr = SNESSetJacobian(snes,J,J,FormJacobian1,PETSC_NULL);CHKERRQ(ierr);
} else {
if (size != 1) SETERRQ(PETSC_COMM_SELF,1,"This case is a uniprocessor example only!");
ierr = SNESSetFunction(snes,r,FormFunction2,PETSC_NULL);CHKERRQ(ierr);
ierr = SNESSetJacobian(snes,J,J,FormJacobian2,PETSC_NULL);CHKERRQ(ierr);
}
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Customize nonlinear solver; set runtime options
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/*
Set linear solver defaults for this problem. By extracting the
KSP and PC contexts from the SNES context, we can then
directly call any KSP and PC routines to set various options.
*/
ierr = SNESGetKSP(snes,&ksp);CHKERRQ(ierr);
ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
ierr = PCSetType(pc,PCNONE);CHKERRQ(ierr);
ierr = KSPSetTolerances(ksp,1.e-4,PETSC_DEFAULT,PETSC_DEFAULT,20);CHKERRQ(ierr);
/
*
Set SNES/KSP/KSP/PC runtime options, e.g.,
-snes_view -snes_monitor -ksp_type <ksp> -pc_type <pc>
These options will override those specified above as long as
SNESSetFromOptions() is called _after_ any other customization
routines.
*
/
ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);
/
*
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Evaluate initial guess; then solve nonlinear system
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
*
/
  if (!flg) {
    ierr = VecSet(x,pfive);CHKERRQ(ierr);
  } else {
    ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
    xx[0] = 2.0; xx[1] = 3.0;
    ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  }
  /*
     Note: The user should initialize the vector, x, with the initial guess
     for the nonlinear solver prior to calling SNESSolve(). In particular,
     to employ an initial guess of zero, the user should explicitly set
     this vector to zero by calling VecSet().
  */
  ierr = SNESSolve(snes,PETSC_NULL,x);CHKERRQ(ierr);
  ierr = SNESGetIterationNumber(snes,&its);CHKERRQ(ierr);
  if (flg) {
    Vec f;
    ierr = VecView(x,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
    ierr = SNESGetFunction(snes,&f,0,0);CHKERRQ(ierr);
    ierr = VecView(r,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  }
  ierr = PetscPrintf(PETSC_COMM_WORLD,"number of Newton iterations = %D\n\n",its);CHKERRQ(ierr);
  /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     Free work space. All PETSc objects should be destroyed when they
     are no longer needed.
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
  ierr = VecDestroy(&x);CHKERRQ(ierr); ierr = VecDestroy(&r);CHKERRQ(ierr);
  ierr = MatDestroy(&J);CHKERRQ(ierr); ierr = SNESDestroy(&snes);CHKERRQ(ierr);
  if (size > 1){
    ierr = VecDestroy(&user.xloc);CHKERRQ(ierr);
    ierr = VecDestroy(&user.rloc);CHKERRQ(ierr);
    ierr = VecScatterDestroy(&user.scatter);CHKERRQ(ierr);
  }
  ierr = PetscFinalize();
  return 0;
}
/* ------------------------------------------------------------------- */
#undef __FUNCT__
#define __FUNCT__ "FormFunction1"
/*
   FormFunction1 - Evaluates nonlinear function, F(x).
   Input Parameters:
.  snes - the SNES context
.  x - input vector
.  ctx - optional user-defined context
   Output Parameter:
.  f - function vector
*/
PetscErrorCode FormFunction1(SNES snes,Vec x,Vec f,void *ctx)
{
  PetscErrorCode ierr;
  PetscScalar    *xx,*ff;
  AppCtx         *user = (AppCtx*)ctx;
  Vec            xloc=user->xloc,floc=user->rloc;
  VecScatter     scatter=user->scatter;
  MPI_Comm       comm;
  PetscMPIInt    size,rank;
  PetscInt       rstart,rend;
  ierr = PetscObjectGetComm((PetscObject)snes,&comm);CHKERRQ(ierr);
  ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
  if (size > 1){
    /*
       This is a ridiculous case for testing intermediate steps from sequential
       code development to parallel implementation.
       (1) scatter x into a sequential vector;
       (2) each process evaluates all values of floc;
       (3) scatter floc back to the parallel f.
    */
    ierr = VecScatterBegin(scatter,x,xloc,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecScatterEnd(scatter,x,xloc,INSERT_VALUES,SCATTER_FORWARD);CHKERRQ(ierr);
    ierr = VecGetOwnershipRange(f,&rstart,&rend);CHKERRQ(ierr);
    ierr = VecGetArray(xloc,&xx);CHKERRQ(ierr);
    ierr = VecGetArray(floc,&ff);CHKERRQ(ierr);
    ff[0] = xx[0]*xx[0] + xx[0]*xx[1] - 3.0;
    ff[1] = xx[0]*xx[1] + xx[1]*xx[1] - 6.0;
    ierr = VecRestoreArray(floc,&ff);CHKERRQ(ierr);
    ierr = VecRestoreArray(xloc,&xx);CHKERRQ(ierr);
    ierr = VecScatterBegin(scatter,floc,f,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
    ierr = VecScatterEnd(scatter,floc,f,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  } else {
    /*
       Get pointers to vector data.
       - For default PETSc vectors, VecGetArray() returns a pointer to
         the data array. Otherwise, the routine is implementation dependent.
       - You MUST call VecRestoreArray() when you no longer need access to
         the array.
    */
    ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
    ierr = VecGetArray(f,&ff);CHKERRQ(ierr);
    /*
       Compute function
    */
    ff[0] = xx[0]*xx[0] + xx[0]*xx[1] - 3.0;
    ff[1] = xx[0]*xx[1] + xx[1]*xx[1] - 6.0;
    /*
       Restore vectors
    */
    ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
    ierr = VecRestoreArray(f,&ff);CHKERRQ(ierr);
  }
  return 0;
}
/* ------------------------------------------------------------------- */
#undef __FUNCT__
#define __FUNCT__ "FormJacobian1"
/*
   FormJacobian1 - Evaluates Jacobian matrix.
   Input Parameters:
.  snes - the SNES context
.  x - input vector
.  dummy - optional user-defined context (not used here)
   Output Parameters:
.  jac - Jacobian matrix
.  B - optionally different preconditioning matrix
.  flag - flag indicating matrix structure
*/
PetscErrorCode FormJacobian1(SNES snes,Vec x,Mat *jac,Mat *B,MatStructure *flag,void *dummy)
{
  PetscScalar    *xx,A[4];
  PetscErrorCode ierr;
  PetscInt       idx[2] = {0,1};
  /*
     Get pointer to vector data
  */
  ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
  /*
     Compute Jacobian entries and insert into matrix.
     - Since this is such a small problem, we set all entries for
       the matrix at once.
  */
  A[0] = 2.0*xx[0] + xx[1]; A[1] = xx[0];
  A[2] = xx[1]; A[3] = xx[0] + 2.0*xx[1];
  ierr = MatSetValues(*B,2,idx,2,idx,A,INSERT_VALUES);CHKERRQ(ierr);
  *flag = SAME_NONZERO_PATTERN;
  /*
     Restore vector
  */
  ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  /*
     Assemble matrix
  */
  ierr = MatAssemblyBegin(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  if (*jac != *B){
    ierr = MatAssemblyBegin(*jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(*jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  }
  return 0;
}
/* ------------------------------------------------------------------- */
#undef __FUNCT__
#define __FUNCT__ "FormFunction2"
PetscErrorCode FormFunction2(SNES snes,Vec x,Vec f,void *dummy)
{
  PetscErrorCode ierr;
  PetscScalar    *xx,*ff;
  /*
     Get pointers to vector data.
     - For default PETSc vectors, VecGetArray() returns a pointer to
       the data array. Otherwise, the routine is implementation dependent.
     - You MUST call VecRestoreArray() when you no longer need access to
       the array.
  */
  ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
  ierr = VecGetArray(f,&ff);CHKERRQ(ierr);
  /*
     Compute function
  */
  ff[0] = PetscSinScalar(3.0*xx[0]) + xx[0];
  ff[1] = xx[1];
  /*
     Restore vectors
  */
  ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  ierr = VecRestoreArray(f,&ff);CHKERRQ(ierr);
  return 0;
}
/* ------------------------------------------------------------------- */
#undef __FUNCT__
#define __FUNCT__ "FormJacobian2"
PetscErrorCode FormJacobian2(SNES snes,Vec x,Mat *jac,Mat *B,MatStructure *flag,void *dummy)
{
  PetscScalar    *xx,A[4];
  PetscErrorCode ierr;
  PetscInt       idx[2] = {0,1};
  /*
     Get pointer to vector data
  */
  ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
  /*
     Compute Jacobian entries and insert into matrix.
     - Since this is such a small problem, we set all entries for
       the matrix at once.
  */
  A[0] = 3.0*PetscCosScalar(3.0*xx[0]) + 1.0; A[1] = 0.0;
  A[2] = 0.0; A[3] = 1.0;
  ierr = MatSetValues(*B,2,idx,2,idx,A,INSERT_VALUES);CHKERRQ(ierr);
  *flag = SAME_NONZERO_PATTERN;
  /*
     Restore vector
  */
  ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  /*
     Assemble matrix
  */
  ierr = MatAssemblyBegin(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  if (*jac != *B){
    ierr = MatAssemblyBegin(*jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(*jac,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  }
  return 0;
}
Figure 14: Example of Uniprocessor SNES Code
To create a SNES solver, one must first call SNESCreate() as follows:
SNESCreate(MPI_Comm comm,SNES *snes);
The user must then set routines for evaluating the function of equation (5.1) and its associated Jacobian
matrix, as discussed in the following sections.
To choose a nonlinear solution method, the user can either call
SNESSetType(SNES snes,SNESType method);
or use the option -snes_type <method>, where details regarding the available methods are presented
in Section 5.2. The application code can take complete control of the linear and nonlinear techniques
used in the Newton-like method by calling
SNESSetFromOptions(snes);
This routine provides an interface to the PETSc options database, so that at runtime the user can select
a particular nonlinear solver, set various parameters and customized routines (e.g., specialized line search
variants), prescribe the convergence tolerance, and set monitoring routines. With this routine the user can
also control all linear solver options in the KSP and PC modules, as discussed in Chapter 4.
After having set these routines and options, the user solves the problem by calling
SNESSolve(SNES snes,Vec b,Vec x);
where b is the right-hand side (PETSC_NULL for a zero right-hand side, as in Figure 14) and x indicates the solution vector. The user should initialize x to the initial guess for the
nonlinear solver prior to calling SNESSolve(). In particular, to employ an initial guess of zero, the user
should explicitly set this vector to zero by calling VecSet(). Finally, after solving the nonlinear system (or
several systems), the user should destroy the SNES context with
SNESDestroy(SNES *snes);
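To make the role of the function and Jacobian routines concrete, the following plain C sketch applies a full-step Newton iteration to the two-equation system evaluated by FormFunction1() and FormJacobian1() in Figure 14. It is only an illustration under stated assumptions: it uses no PETSc calls, the helper names (form_function, form_jacobian, newton_solve_2x2) are hypothetical, the 2x2 linear system is solved by Cramer's rule rather than by KSP, and no line search is applied, so it does not reproduce the default SNES algorithm.

```c
#include <assert.h>

/* Residual of the Figure 14 test problem; F(x) = 0 at x = (1, 2). */
static void form_function(const double x[2], double f[2])
{
  f[0] = x[0]*x[0] + x[0]*x[1] - 3.0;
  f[1] = x[0]*x[1] + x[1]*x[1] - 6.0;
}

/* Analytic Jacobian, stored row by row as in FormJacobian1(). */
static void form_jacobian(const double x[2], double J[4])
{
  J[0] = 2.0*x[0] + x[1]; J[1] = x[0];
  J[2] = x[1];            J[3] = x[0] + 2.0*x[1];
}

/* Full-step Newton iteration (hypothetical helper, no line search).
   Returns the number of Newton steps taken, or -1 on failure. */
int newton_solve_2x2(double x[2], double atol, int max_it)
{
  double f[2], J[4], det, dx0, dx1;
  int    it;

  assert(atol > 0.0 && max_it > 0);
  for (it = 0; it <= max_it; it++) {
    form_function(x, f);
    /* Compare squared 2-norms to avoid needing sqrt(). */
    if (f[0]*f[0] + f[1]*f[1] < atol*atol) return it;
    if (it == max_it) break;
    form_jacobian(x, J);
    det = J[0]*J[3] - J[1]*J[2];
    if (det == 0.0) return -1;            /* singular Jacobian */
    /* Solve J dx = -f by Cramer's rule, then update the iterate. */
    dx0 = (-f[0]*J[3] + f[1]*J[1]) / det;
    dx1 = (-f[1]*J[0] + f[0]*J[2]) / det;
    x[0] += dx0;
    x[1] += dx1;
  }
  return -1;                              /* iteration limit reached */
}
```

Starting from the initial guess (0.5, 0.5) used in Figure 14, the iterates approach the root (1, 2); as in the text above, the caller must fill x with the initial guess before calling the solver.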
5.1.1 Nonlinear Function Evaluation
When solving a system of nonlinear equations, the user must provide a vector, f, for storing the function of
Equation (5.1), as well as a routine that evaluates this function at the vector x. This information should be
set with the command
SNESSetFunction(SNES snes,Vec f,
PetscErrorCode (*FormFunction)(SNES snes,Vec x,Vec f,void *ctx),void *ctx);
The argument ctx is an optional user-defined context, which can store any private, application-specific
data required by the function evaluation routine; PETSC_NULL should be used if such information is not
needed. In C and C++, a user-defined context is merely a structure in which various objects can be stashed; in
Fortran a user context can be an integer array that contains both parameters and pointers to PETSc objects.
${PETSC_DIR}/src/snes/examples/tutorials/ex5.c and
${PETSC_DIR}/src/snes/examples/tutorials/ex5f.F give examples of user-defined application contexts in C and Fortran,
respectively.
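The context mechanism is the ordinary C idiom of passing an opaque void pointer through to a callback. The sketch below shows the idiom without PETSc; the structure AppParams, the callback type ResidualFn, and the stand-in driver evaluate_residual() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application context: any data the residual routine needs. */
typedef struct {
  double rhs[2];    /* constants subtracted in the residual */
  int    n_evals;   /* counts residual evaluations */
} AppParams;

/* Callback type mirroring the (x, f, void *ctx) part of FormFunction(). */
typedef int (*ResidualFn)(const double *x, double *f, void *ctx);

/* Residual routine: recovers its typed context from the opaque pointer. */
int form_function_ctx(const double *x, double *f, void *ctx)
{
  AppParams *user = (AppParams *)ctx;

  assert(user != NULL);
  f[0] = x[0]*x[0] + x[0]*x[1] - user->rhs[0];
  f[1] = x[0]*x[1] + x[1]*x[1] - user->rhs[1];
  user->n_evals++;
  return 0;
}

/* Stand-in for a solver: it stores and forwards ctx without interpreting it. */
int evaluate_residual(ResidualFn fn, void *ctx, const double *x, double *f)
{
  return fn(x, f, ctx);
}
```

The solver side never needs to know the layout of the context, which is why a routine that needs no extra data can simply be registered with a null context.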
5.1.2 Jacobian Evaluation
The user must also specify a routine to form some approximation of the Jacobian matrix, A, at the current
iterate, x, as is typically done with
SNESSetJacobian(SNES snes,Mat A,Mat B,PetscErrorCode (*FormJacobian)(SNES snes,
Vec x,Mat *A,Mat *B,MatStructure *flag,void *ctx),void *ctx);
The arguments of the routine FormJacobian() are the current iterate, x; the Jacobian matrix, A; the
preconditioner matrix, B (which is usually the same as A); a flag indicating information about the
preconditioner matrix structure; and an optional user-defined Jacobian context, ctx, for application-specific data.
The options for flag are identical to those for the flag of KSPSetOperators(), discussed in Section 4.1.
Note that the SNES solvers are all data-structure neutral, so the full range of PETSc matrix formats
(including matrix-free methods) can be used. Chapter 3 discusses information regarding available matrix formats
and options, while Section 5.5 focuses on matrix-free methods in SNES. We briefly touch on a few details
of matrix usage that are particularly important for efficient use of the nonlinear solvers.
A common usage paradigm is to assemble the problem Jacobian in the preconditioner storage B, rather
than A. In the case where they are identical, as in many simulations, this makes no difference. However, it
allows us to check the analytic Jacobian we construct in FormJacobian() by passing the -snes_mf_operator
flag. This causes PETSc to approximate the Jacobian using finite differencing of the function
evaluation (discussed in Section 5.6), and the analytic Jacobian becomes merely the preconditioner. Even if
the analytic Jacobian is incorrect, it is likely that the finite difference approximation will converge, and thus
this is an excellent method to verify the analytic Jacobian. Moreover, if the analytic Jacobian is incomplete
(some terms are missing or approximate), -snes_mf_operator may be used to obtain the exact solution,
where the Jacobian approximation has been transferred to the preconditioner.
Method          SNESType    Options Name    Default Convergence Test
Line search     SNESLS      ls              SNESConverged_LS()
Trust region    SNESTR      tr              SNESConverged_TR()
Test Jacobian   SNESTEST    test
Table 6: PETSc Nonlinear Solvers
During successive calls to FormJacobian(), the user can either insert new matrix contexts or reuse
old ones, depending on the application requirements. For many sparse matrix formats, reusing the old space
(and merely changing the matrix elements) is more efficient; however, if the matrix structure completely
changes, creating an entirely new matrix context may be preferable. Upon subsequent calls to the
FormJacobian() routine, the user may wish to reinitialize the matrix entries to zero by calling MatZeroEntries().
See Section 3.4 for details on the reuse of the matrix context.
If the preconditioning matrix retains identical nonzero structure during successive nonlinear iterations,
setting the parameter, flag, in the FormJacobian() routine to be SAME_NONZERO_PATTERN and
reusing the matrix context can save considerable overhead. For example, when one is using a parallel
preconditioner such as incomplete factorization in solving the linearized Newton systems for such problems,
matrix colorings and communication patterns can be determined a single time and then reused repeatedly
throughout the solution process. In addition, if using different matrices for the actual Jacobian and the
preconditioner, the user can hold the preconditioner matrix fixed for multiple iterations by setting flag to
SAME_PRECONDITIONER. See the discussion of KSPSetOperators() in Section 4.1 for details.
The directory ${PETSC_DIR}/src/snes/examples/tutorials provides a variety of exam-
ples.
5.2 The Nonlinear Solvers
As summarized in Table 6, SNES includes several Newton-like nonlinear solvers based on line search tech-
niques and trust region methods.
Each solver may have associated with it a set of options, which can be set with routines and options
database commands provided for this purpose. A complete list can be found by consulting the manual pages
or by running a program with the -help option; we discuss just a few in the sections below.
5.2.1 Line Search Techniques
The method SNESLS (-snes_type ls) provides a line search Newton method for solving systems of
nonlinear equations. By default, this technique employs cubic backtracking [4]. An alternative line search
routine can be set with the command
SNESSetLineSearch(SNES snes,PetscErrorCode (*ls)(SNES,Vec,Vec,Vec,Vec,double,double*,double*),
void *lsctx);
Other line search methods provided by PETSc are SNESLineSearchQuadratic(), SNESLineSearchNo(), and
SNESLineSearchNoNorms(), which can be set with the option
-snes_ls [cubic, quadratic, basic, basicnonorms]
The line search routines involve several parameters, which are set to defaults that are reasonable for many
applications. The user can override the defaults by using the options -snes_ls_alpha <alpha>,
-snes_ls_maxstep <max>, and -snes_ls_steptol <tol>.
5.2.2 Trust Region Methods
The trust region method in SNES for solving systems of nonlinear equations, SNESTR (-snes_type
tr), is taken from the MINPACK project [13]. Several parameters can be set to control the variation of the
trust region size during the solution process. In particular, the user can control the initial trust region radius,
computed by
\[ \Delta = \Delta_0 \|F_0\|_2, \]
by setting $\Delta_0$ via the option -snes_tr_delta0 <delta0>.
5.3 General Options
This section discusses options and routines that apply to all SNES solvers and problem classes. In particular,
we focus on convergence tests, monitoring routines, and tools for checking derivative computations.
5.3.1 Convergence Tests
Convergence of the nonlinear solvers can be detected in a variety of ways; the user can even specify a
customized test, as discussed below. The default convergence routines for the various nonlinear solvers
within SNES are listed in Table 6; see the corresponding manual pages for detailed descriptions. Each
of these convergence tests involves several parameters, which are set by default to values that should be
reasonable for a wide range of problems. The user can customize the parameters to the problem at hand by
using some of the following routines and options.
One method of convergence testing is to declare convergence when the norm of the change in the solution
between successive iterations is less than some tolerance, stol. Convergence can also be determined based
on the norm of the function. Such a test can use either the absolute size of the norm, atol, or its relative
decrease, rtol, from an initial guess. The following routine sets these parameters, which are used in many
of the default SNES convergence tests:
SNESSetTolerances(SNES snes,double atol,double rtol,double stol,
int its,int fcts);
This routine also sets the maximum numbers of allowable nonlinear iterations, its, and function evalu-
ations, fcts. The corresponding options database commands for setting these parameters are -snes_
atol <atol>, -snes_rtol <rtol>, -snes_stol <stol>, -snes_max_it <its>, and
-snes_max_funcs <fcts>. A related routine is SNESGetTolerances().
Convergence tests for trust region methods often use an additional parameter that indicates the
minimum allowable trust region radius. The user can set this parameter with the option -snes_trtol
<trtol> or with the routine
SNESSetTrustRegionTolerance(SNES snes,double trtol);
Users can set their own customized convergence tests in SNES by using the command
SNESSetConvergenceTest(SNES snes,PetscErrorCode (*test)(SNES snes,int it,double xnorm,
double gnorm,double f,SNESConvergedReason *reason,
void *cctx),void *cctx,PetscErrorCode (*destroy)(void *cctx));
The final argument of the convergence test routine, cctx, denotes an optional user-defined context for
private data. When solving systems of nonlinear equations, the arguments xnorm, gnorm, and f are the
current iterate norm, current step norm, and function norm, respectively. The output argument reason should
be set positive for convergence and negative for divergence. See include/petscsnes.h for a list of
values for SNESConvergedReason.
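The logic of a tolerance-based test of this kind can be sketched in plain C as follows. This is an assumption-laden illustration rather than the PETSc default test: the reason codes, the SketchTolerances structure, and the exact inequalities are hypothetical and only follow the convention stated above (positive for convergence, negative for divergence, zero to keep iterating).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reason codes (not the PETSc values): positive = converged,
   negative = diverged, zero = keep iterating. */
typedef enum {
  SKETCH_DIVERGED_MAX_IT     = -1,
  SKETCH_ITERATING           =  0,
  SKETCH_CONVERGED_FNORM_ABS =  1,
  SKETCH_CONVERGED_FNORM_REL =  2,
  SKETCH_CONVERGED_STEP      =  3
} SketchReason;

/* Tolerances named after the arguments of SNESSetTolerances(). */
typedef struct {
  double atol, rtol, stol;
  int    max_it;
  double fnorm0;    /* function norm at the initial guess */
} SketchTolerances;

/* xnorm, gnorm, fnorm: iterate norm, step norm, and function norm. */
SketchReason sketch_converged(const SketchTolerances *t, int it,
                              double xnorm, double gnorm, double fnorm)
{
  assert(t != NULL && it >= 0);
  if (fnorm < t->atol)                   return SKETCH_CONVERGED_FNORM_ABS;
  if (fnorm <= t->rtol * t->fnorm0)      return SKETCH_CONVERGED_FNORM_REL;
  if (it > 0 && gnorm < t->stol * xnorm) return SKETCH_CONVERGED_STEP;
  if (it >= t->max_it)                   return SKETCH_DIVERGED_MAX_IT;
  return SKETCH_ITERATING;
}
```

The step-size test is skipped at iteration 0 because no step has been taken yet; a customized test may of course order or combine the criteria differently.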
5.3.2 Convergence Monitoring
By default the SNES solvers run silently without displaying information about the iterations. The user can
initiate monitoring with the command
SNESMonitorSet(SNES snes,PetscErrorCode (*mon)(SNES,int its,double norm,void* mctx),
void *mctx,PetscErrorCode (*monitordestroy)(void**));
The routine, mon, indicates a user-defined monitoring routine, where its and mctx respectively denote the
iteration number and an optional user-defined context for private data for the monitor routine. The argument
norm is the function norm.
The routine set by SNESMonitorSet() is called once after every successful step computation within
the nonlinear solver. Hence, the user can employ this routine for any application-specific computations that
should be done after the solution update. The option -snes_monitor activates the default SNES monitor
routine, SNESMonitorDefault(), while -snes_monitor_draw draws a simple line graph of the residual
norm's convergence.
One can cancel hardwired monitoring routines for SNES at runtime with -snes_monitor_cancel.
As the Newton method converges so that the residual norm is small, say $10^{-10}$, many of the final
digits printed with the -snes_monitor option are meaningless. Worse, they are different on different
machines, due to different round-off rules used by, say, the IBM RS6000 and the Sun Sparc. This makes
testing between different machines difficult. The option -snes_monitor_short causes PETSc to print
fewer of the digits of the residual norm as it gets smaller; thus on most machines it will always print
the same numbers, making cross-machine testing easier.
The routines
SNESGetSolution(SNES snes,Vec *x);
SNESGetFunction(SNES snes,Vec *r,void *ctx,
int(**func)(SNES,Vec,Vec,void*));
return the solution vector and function vector from a SNES context. These routines are useful, for instance,
if the convergence test requires some property of the solution or function other than those passed with routine
arguments.
5.3.3 Checking Accuracy of Derivatives
Since hand-coding routines for Jacobian matrix evaluation can be error prone, SNES provides easy-to-use
support for checking these matrices against finite difference versions. In the simplest form of comparison,
users can employ the option -snes_type test to compare the matrices at several points. Although
not exhaustive, this test will generally catch obvious problems. One can compare the elements of the two
matrices by using the option -snes_test_display, which causes the two matrices to be printed to
the screen.
Another means for verifying the correctness of a code for Jacobian computation is running the problem
with either the finite difference or matrix-free variant, -snes_fd or -snes_mf (see Section 5.6 or Section
5.5). If a problem converges well with these matrix approximations but not with a user-provided routine, the
problem probably lies with the hand-coded matrix.
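The idea behind such a check can be sketched in plain C for the two-equation function of Figure 14: build a forward-difference Jacobian column by column and report the largest entrywise deviation from the hand-coded one. Everything here is an illustrative assumption (the helper names, the fixed differencing parameter h passed by the caller, and the deliberately wrong buggy_jacobian); it is not how -snes_type test is implemented.

```c
#include <assert.h>

typedef void (*JacobianFn)(const double x[2], double J[4]);

static void residual(const double x[2], double f[2])
{
  f[0] = x[0]*x[0] + x[0]*x[1] - 3.0;
  f[1] = x[0]*x[1] + x[1]*x[1] - 6.0;
}

/* Correct hand-coded Jacobian (row-major), as in FormJacobian1(). */
void analytic_jacobian(const double x[2], double J[4])
{
  J[0] = 2.0*x[0] + x[1]; J[1] = x[0];
  J[2] = x[1];            J[3] = x[0] + 2.0*x[1];
}

/* Hypothetical coding mistake: the factor 2 in the last entry is missing. */
void buggy_jacobian(const double x[2], double J[4])
{
  analytic_jacobian(x, J);
  J[3] = x[0] + x[1];
}

/* Largest |J_ij - (F_i(x + h e_j) - F_i(x))/h| over all entries. */
double jacobian_max_error(JacobianFn jac, const double x[2], double h)
{
  double J[4], f0[2], f1[2], xp[2], d, err = 0.0;
  int    i, j;

  assert(jac != 0 && h > 0.0);
  residual(x, f0);
  jac(x, J);
  for (j = 0; j < 2; j++) {
    xp[0] = x[0]; xp[1] = x[1];
    xp[j] += h;                       /* perturb one variable at a time */
    residual(xp, f1);
    for (i = 0; i < 2; i++) {
      d = (f1[i] - f0[i])/h - J[2*i + j];
      if (d < 0.0) d = -d;
      if (d > err) err = d;
    }
  }
  return err;
}
```

With a correct Jacobian the reported error is on the order of h, while a wrong entry shows up as an error comparable to the size of the missing term.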
5.4 Inexact Newton-like Methods
Since exact solution of the linear Newton systems within (5.2) at each iteration can be costly, modifications
are often introduced that significantly reduce these expenses and yet retain the rapid convergence of
Newton's method. Inexact or truncated Newton techniques approximately solve the linear systems using
an iterative scheme. In comparison with using direct methods for solving the Newton systems, iterative
methods have the virtue of requiring little space for matrix storage and potentially saving significant
computational work. Within the class of inexact Newton methods, of particular interest are Newton-Krylov
methods, where the subsidiary iterative technique for solving the Newton system is chosen from the class of
Krylov subspace projection methods. Note that at runtime the user can set any of the linear solver options
discussed in Chapter 4, such as -ksp_type <ksp_method> and -pc_type <pc_method>, to set
the Krylov subspace and preconditioner methods.
Two levels of iterations occur for the inexact techniques, where during each global or outer Newton
iteration a sequence of subsidiary inner iterations of a linear solver is performed. Appropriate control of the
accuracy to which the subsidiary iterative method solves the Newton system at each global iteration is crit-
ical, since these inner iterations determine the asymptotic convergence rate for inexact Newton techniques.
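While the Newton systems must be solved well enough to retain fast local convergence of the Newton
iterates, use of excessive inner iterations, particularly when $\|x_k - x_*\|$ is large, is neither necessary
nor economical. Instead, the inner iteration is continued only until the residuals
\[ r_k^{(i)} = F'(x_k)\,\Delta x_k + F(x_k) \]
satisfy
\[ \frac{\|r_k^{(i)}\|}{\|F(x_k)\|} \le \eta_k \le \eta < 1. \]
Here $x_0$ is an initial approximation of the solution, and $\|\cdot\|$ denotes an arbitrary norm in $\Re^n$.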
By default a constant relative convergence tolerance is used for solving the subsidiary linear systems
within the Newton-like methods of SNES. When solving a system of nonlinear equations, one can instead
employ the techniques of Eisenstat and Walker [6] to compute $\eta_k$ at each step of the nonlinear solver by
using the option -snes_ksp_ew_conv. In addition, by adding one's own KSP convergence test (see
Section 4.3.2), one can easily create one's own, problem-dependent, inner convergence tests.
5.5 Matrix-Free Methods
The SNES class fully supports matrix-free methods. The matrices specified in the Jacobian evaluation routine
need not be conventional matrices; instead, they can point to the data required to implement a particular
matrix-free method. The matrix-free variant is allowed only when the linear systems are solved by an it-
erative method in combination with no preconditioning (PCNONE or -pc_type none), a user-provided
preconditioner matrix, or a user-provided preconditioner shell (PCSHELL, discussed in Section 4.4); that
is, obviously matrix-free methods cannot be used if a direct solver is to be employed.
The user can create a matrix-free context for use within SNES with the routine
MatCreateSNESMF(SNES snes,Mat *mat);
This routine creates the data structures needed for the matrix-vector products that arise within Krylov space
iterative methods [2] by employing the matrix type MATSHELL, discussed in Section 3.3. The default
SNES matrix-free approximations can also be invoked with the command -snes_mf. Or, one can retain
the user-provided Jacobian preconditioner, but replace the user-provided Jacobian matrix with the default
matrix free variant with the option -snes_mf_operator.
See also
MatCreateMFFD(Vec x, Mat *mat);
for users who need a matrix-free matrix but are not using SNES.
The user can set one parameter to control the Jacobian-vector product approximation with the command
MatMFFDSetFunctionError(Mat mat,double rerror);
The parameter rerror should be set to the square root of the relative error in the function evaluations, $e_{rel}$;
the default is $10^{-8}$, which assumes that the functions are evaluated to full double precision accuracy. This
parameter can also be set from the options database with
-snes_mf_err <err>
In addition, SNES provides a way to register new routines to compute the differencing parameter (h);
see the manual pages for MatMFFDSetType() and MatMFFDRegisterDynamic(). We currently provide two
default routines accessible via
-snes_mf_type <default or wp>
For the default approach there is one tuning parameter, set with
MatMFFDDSSetUmin(Mat mat,PetscReal umin);
This parameter, umin (or $u_{min}$), is a bit involved; its default is $10^{-6}$. The Jacobian-vector product is
approximated via the formula
\[ F'(u)a \approx \frac{F(u + h a) - F(u)}{h} \]
where h is computed via
\[ h = \begin{cases} e_{rel}\, u^T a / \|a\|_2^2 & \text{if } |u^T a| > u_{min} \|a\|_1, \\ e_{rel}\, u_{min}\, \mathrm{sign}(u^T a)\, \|a\|_1 / \|a\|_2^2 & \text{otherwise.} \end{cases} \]
This approach is taken from Brown and Saad [2]. The parameter can also be set from the options database
with
-snes_mf_umin <umin>
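The two formulas above translate directly into a few lines of C. The sketch below computes h with the two-branch rule and then forms the differenced product for the two-equation function of Figure 14. It is a standalone illustration, not the MatMFFD implementation: the function names are hypothetical, and sign(0) is taken to be +1 (an assumption) so that h never vanishes in the second branch.

```c
#include <assert.h>

static double absval(double v) { return v < 0.0 ? -v : v; }

/* Differencing parameter h from the two-branch formula; u and a have length n. */
double mffd_compute_h(const double *u, const double *a, int n,
                      double erel, double umin)
{
  double uta = 0.0, norm1 = 0.0, norm2sq = 0.0;
  int    i;

  assert(n > 0);
  for (i = 0; i < n; i++) {
    uta     += u[i]*a[i];       /* u^T a     */
    norm1   += absval(a[i]);    /* ||a||_1   */
    norm2sq += a[i]*a[i];       /* ||a||_2^2 */
  }
  assert(norm2sq > 0.0);
  if (absval(uta) > umin*norm1) return erel*uta/norm2sq;
  /* Assumption: sign(0) is treated as +1. */
  return erel*umin*(uta >= 0.0 ? 1.0 : -1.0)*norm1/norm2sq;
}

static void residual(const double x[2], double f[2])
{
  f[0] = x[0]*x[0] + x[0]*x[1] - 3.0;
  f[1] = x[0]*x[1] + x[1]*x[1] - 6.0;
}

/* y ~ F'(u) a, computed as (F(u + h a) - F(u))/h without forming F'(u). */
void mffd_multiply(const double u[2], const double a[2],
                   double erel, double umin, double y[2])
{
  double h = mffd_compute_h(u, a, 2, erel, umin);
  double up[2], f0[2], f1[2];

  up[0] = u[0] + h*a[0];
  up[1] = u[1] + h*a[1];
  residual(u, f0);
  residual(up, f1);
  y[0] = (f1[0] - f0[0])/h;
  y[1] = (f1[1] - f0[1])/h;
}
```

For u = (0.5, 0.5) and a = (1, 0) the exact product is the first Jacobian column (1.5, 0.5), which the differenced product matches to within the expected truncation and round-off error.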
The second approach, taken from Walker and Pernice [15], computes h via
\[ h = \frac{\sqrt{1 + \|u\|}\; e_{rel}}{\|a\|} \]
This has no tunable parameters, but note that inside the nonlinear solve, u does not change during the entire
linear iterative process; hence $\sqrt{1 + \|u\|}$ need be computed only once. This information may be set with the
options
MatMFFDWPSetComputeNormU(Mat mat,PetscBool );
or
-mat_mffd_compute_normu <true or false>
This information is used to eliminate the redundant computation of these parameters, thereby reducing the
number of collective operations and improving the efficiency of the application code.
It is also possible to monitor the differencing parameters h that are computed via the routines
MatMFFDSetHHistory(Mat,PetscScalar *,int);
MatMFFDResetHHistory(Mat,PetscScalar *,int);
MatMFFDGetH(Mat,PetscScalar *);
We include an example in Figure 15 that explicitly uses a matrix-free approach. Note that by using the
option -snes_mf one can easily convert any SNES code to use a matrix-free Newton-Krylov method with-
out a preconditioner. As shown in this example, SNESSetFromOptions() must be called after
SNESSetJacobian() to enable runtime switching between the user-specified Jacobian and the default SNES matrix-free
form.
Table 7 summarizes the various matrix situations that SNES supports. In particular, different linear
system matrices and preconditioning matrices are allowed, as well as both matrix-free and application-
provided preconditioners. All combinations are possible, as demonstrated by the example,
${PETSC_DIR}/src/snes/examples/tutorials/ex6.c, in Figure 15.
Matrix Use              Conventional Matrix Formats          Matrix-Free Versions
Jacobian Matrix         Create matrix with MatCreate().      Create matrix with MatCreateShell().
                        Assemble matrix with user-defined    Use MatShellSetOperation() to set
                        routine.                             various matrix actions.
                                                             Or use MatCreateMFFD() or MatCreateSNESMF().
Preconditioning Matrix  Create matrix with MatCreate().      Use SNESGetKSP() and KSPGetPC()
                        Assemble matrix with user-defined    to access the PC, then use
                        routine.                             PCSetType(pc,PCSHELL);
                                                             followed by PCShellSetApply().
\[ F'_{:i} \approx \frac{F(u + h\,dx_i) - F(u)}{h} \]
where h is computed via
\[ h = \begin{cases} e_{rel}\, u_i & \text{if } |u_i| > u_{min}, \\ e_{rel}\, u_{min}\, \mathrm{sign}(u_i) & \text{otherwise.} \end{cases} \]
These parameters may be set from the options database with
-mat_fd_coloring_err <err>
-mat_fd_coloring_umin <umin>
Note that although MatGetColoring() works for parallel matrices, the routine currently uses a sequential
algorithm. Extensions may be forthcoming. However, if one can compute the coloring iscoloring some
other way, the routine MatFDColoringCreate() is scalable. An example of this for 2D distributed arrays is
given below that uses the utility routine DMGetColoring().
DMGetColoring(da,IS_COLORING_GHOSTED,&iscoloring);
MatFDColoringCreate(J,iscoloring,&fdcoloring);
MatFDColoringSetFromOptions(fdcoloring);
ISColoringDestroy(&iscoloring);
Note that the routine MatFDColoringCreate() currently is only supported for the AIJ and BAIJ matrix
formats.
5.7 Variational Inequalities
SNES can also solve variational inequalities with box constraints, that is, nonlinear algebraic systems with
additional inequality constraints on some or all of the variables: $Lu_i \le u_i \le Hu_i$. Some or all of the lower
bounds may be negative infinity (indicated to PETSc with SNES_VI_NINF) and some or all of the upper
bounds may be infinity (indicated by SNES_VI_INF). The command
SNESVISetVariableBounds(SNES,Vec Lu,Vec Hu);
is used to indicate that one is solving a variational inequality. The option -snes_vi_monitor turns on
extra monitoring of the active set associated with the bounds, and -snes_vi_type allows selecting from
several VI solvers; the default is preferred.
Chapter 6
TS: Scalable ODE and DAE Solvers
The TS library provides a framework for the scalable solution of ODEs and DAEs arising from the dis-
cretization of time-dependent PDEs, and of steady-state problems using pseudo-timestepping.
Time-Dependent Problems: Consider the ODE
\[ u_t = F(u, t), \]
where u is a finite-dimensional vector, usually obtained from discretizing a PDE with finite differences,
finite elements, etc. For example, discretizing the heat equation
\[ u_t = u_{xx} \]
with centered finite differences results in
\[ (u_i)_t = \frac{u_{i+1} - 2u_i + u_{i-1}}{h^2}; \]
or with a piecewise linear finite element approximation $u(x,t) \doteq \sum_i \xi_i(t)\phi_i(x)$ yields the semi-discrete
equation
\[ B \xi'(t) = A \xi(t). \]
Applying the backward Euler method then gives the discrete equation
\[ (B - dt^n A) u^{n+1} = B u^n, \]
in which
\[ u_i^n = \xi_i(t_n) \doteq u(x_i, t_n), \qquad \xi'(t_{n+1}) \doteq \frac{u_i^{n+1} - u_i^n}{dt^n}, \]
A is the stiffness matrix and B is the mass matrix.
The TS library provides code to solve these equations (currently using the forward or backward Euler
method) as well as an interface to other sophisticated ODE solvers, in a clean and easy manner, where the
user need only provide code for the evaluation of F(u, t) and (optionally) its associated Jacobian matrix.
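For the finite difference discretization of the heat equation, where the mass matrix is the identity, one backward Euler step amounts to a single tridiagonal solve. The plain C sketch below shows this with the Thomas algorithm; it is an illustration only and does not use TS. The function name, the bound MAX_N, and the choice of zero Dirichlet boundary values are assumptions made for the example.

```c
#include <assert.h>

#define MAX_N 64   /* hypothetical upper bound on the number of interior points */

/* One backward Euler step for u_t = u_xx on n interior grid points with
   spacing h and zero Dirichlet boundary values:
       (I - dt*A) u_new = u_old,   A = tridiag(1, -2, 1)/h^2.
   The tridiagonal system is solved with the Thomas algorithm. */
void heat_backward_euler_step(const double *u_old, double *u_new,
                              int n, double h, double dt)
{
  double c[MAX_N], d[MAX_N];
  double r, diag, off, m;
  int    i;

  assert(n >= 1 && n <= MAX_N && h > 0.0 && dt > 0.0);
  r    = dt/(h*h);
  diag = 1.0 + 2.0*r;   /* diagonal entry of I - dt*A       */
  off  = -r;            /* sub- and super-diagonal entries  */

  /* Forward elimination */
  c[0] = off/diag;
  d[0] = u_old[0]/diag;
  for (i = 1; i < n; i++) {
    m    = diag - off*c[i-1];
    c[i] = off/m;
    d[i] = (u_old[i] - off*d[i-1])/m;
  }
  /* Back substitution */
  u_new[n-1] = d[n-1];
  for (i = n-2; i >= 0; i--) u_new[i] = d[i] - c[i]*u_new[i+1];
}
```

Because the matrix I - dt*A is strictly diagonally dominant, no pivoting is needed, and the step is stable for any positive dt.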
Steady-State Problems: In addition, TS provides a general code for performing pseudo timestepping with
a variable timestep at each physical node point. For example, instead of directly attacking the steady-state
problem
\[ F(u) = 0, \]
we can use pseudo-transient continuation by solving
\[ u_t = F(u). \]
Using time differencing
\[ u_t \doteq \frac{u^{n+1} - u^n}{dt^n} \]
with the backward Euler method, we obtain nonlinear equations at a series of pseudo-timesteps
\[ \frac{1}{dt^n} B (u^{n+1} - u^n) = F(u^{n+1}). \]
For this problem the user must provide F(u), the time steps $dt^n$ and the left-hand-side matrix B (or
optionally, if the timestep is position independent and B is the identity matrix, a scalar timestep), as well as
optionally the Jacobian of F(u).
More generally, this can be applied to implicit ODE and DAE for which the transient form is
\[ F(u, \dot{u}) = 0. \]
See Section 6.1.2 for details on this formulation.
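A scalar example may help fix the idea of pseudo-transient continuation. The sketch below, in plain C without TS, drives F(u) = 8 - u^3 to its steady state u = 2 by taking backward Euler pseudo-timesteps with B equal to the identity and solving each implicit step with an inner Newton iteration. The test function, the helper names, the fixed inner iteration count, and the rule of doubling the timestep after every step are all assumptions chosen for illustration; they are not the strategy used by the PETSc pseudo-timestepping solver.

```c
#include <assert.h>

/* Hypothetical steady-state residual with root u = 2 and F'(u) < 0 for u > 0. */
static double steady_residual(double u)            { return 8.0 - u*u*u; }
static double steady_residual_derivative(double u) { return -3.0*u*u; }

static double absval(double v) { return v < 0.0 ? -v : v; }

/* Pseudo-transient continuation: repeat backward Euler steps
       (v - u)/dt = F(v)
   with a growing timestep until |F(u)| < tol or max_steps is reached. */
double pseudo_transient_solve(double u, double dt, int max_steps, double tol)
{
  double v, g, dg;
  int    n, k;

  assert(dt > 0.0 && max_steps > 0 && tol > 0.0);
  for (n = 0; n < max_steps; n++) {
    if (absval(steady_residual(u)) < tol) break;
    /* Inner Newton iteration on g(v) = (v - u)/dt - F(v). */
    v = u;
    for (k = 0; k < 20; k++) {
      g  = (v - u)/dt - steady_residual(v);
      dg = 1.0/dt - steady_residual_derivative(v);
      v -= g/dg;
    }
    u   = v;
    dt *= 2.0;   /* assumed timestep growth rule */
  }
  return u;
}
```

Small initial timesteps keep the early iterates close to the transient solution, while the growing timestep makes later steps behave more and more like Newton's method applied directly to F(u) = 0.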
6.1 Basic TS Usage
The user first creates a TS object with the command
int TSCreate(MPI_Comm comm,TSProblemType problemtype,TS *ts);
The TSProblemType is one of TS_LINEAR or TS_NONLINEAR, to indicate whether F(u, t) is given by a
matrix A, or A(t), or a function F(u, t).
One can set the solution method with the routine
TSSetType(TS ts,TSType type);
Currently supported types are TSEULER, TSRK (Runge-Kutta), TSBEULER, TSCN (Crank-Nicolson),
TSTHETA, TSGL (generalized linear), TSPSEUDO, and TSSUNDIALS (only if the Sundials package is
installed), or the command line option
-ts_type euler,rk,beuler,cn,theta,gl,pseudo,sundials.
Set the initial time and timestep with the command
TSSetInitialTimeStep(TS ts,double time,double dt);
One can change the timestep with the command
TSSetTimeStep(TS ts,double dt);
and determine the current timestep with the routine
TSGetTimeStep(TS ts,double* dt);
Here, "current" refers to the timestep being used to attempt to promote the solution from $u^n$ to $u^{n+1}$.
One sets the total number of timesteps to run or the total time to run (whichever comes first) with the command
TSSetDuration(TS ts,int maxsteps,double maxtime);
One performs the requested number of time steps with
TSSolve(TS ts,Vec X,PetscReal *ftime);
The solve call implicitly sets up the timestep context; this can be done explicitly with
TSSetUp(TS ts);
One destroys the context with
TSDestroy(TS *ts);
and views it with
TSView(TS ts,PetscViewer viewer);
In place of TSSolve(), a single step can be taken using
TSStep(TS ts);
6.1.1 Solving Time-dependent Problems
To set up TS for solving an ODE, one must set the following:
Solution:
TSSetSolution(TS ts, Vec initialsolution);
The vector initialsolution should contain the initial conditions for the ODE.
Function:
For linear functions (solved with implicit timestepping), the user must call
TSSetMatrices(TS ts,
Mat A,PetscErrorCode (*frhs)(TS,PetscReal,Mat*,Mat*,MatStructure*,void*),
Mat B,PetscErrorCode (*flhs)(TS,PetscReal,Mat*,Mat*,MatStructure*,void*),
MatStructure flag,void *ctx)
The matrices A and B are the right- and left-hand-side matrices, respectively. The functions frhs and flhs
are used to form the matrices A and B at each timestep if the matrices are time dependent. If the
matrices do not depend on time, the user should pass in PETSC_NULL. The variable ctx allows
users to pass in an application context that is passed to the frhs() or flhs() function whenever
they are called, as the final argument. The user must provide the matrix A. If B is an identity
matrix, the user should pass in PETSC_NULL. If the right-hand side is provided only as a linear
function, the user must construct a MATSHELL matrix. Note that this is the same interface as that for
SNESSetJacobian().
For nonlinear problems (or linear problems solved using explicit timestepping methods) the
user passes the function with the routine
TSSetRHSFunction(TS ts,Vec R,PetscErrorCode (*f)(TS,double,Vec,Vec,void*),void *fP);
The vector R is an optional location to store the result. The arguments to the function f() are the
timestep context, the current time, the input for the function, the output for the function, and the
(optional) user-provided context variable fP.
Jacobian: For nonlinear problems the user must also provide the (approximate) Jacobian matrix of
F(u,t) and a function to compute it at each Newton iteration. This is done with the command
TSSetRHSJacobian(TS ts,Mat A, Mat P,PetscErrorCode (*fjac)(TS,double,Vec,Mat*,Mat*,
MatStructure*,void*),void *fP);
The arguments for the function fjac() are the timestep context, the current time, the location where
the Jacobian is to be computed, the Jacobian matrix, an alternative approximate Jacobian matrix used
as a preconditioner, and the optional user-provided context, passed in as fP. The user must provide the
Jacobian as a matrix; thus, if a matrix-free approach is used, the user must create a MATSHELL
matrix. Again, note the similarity to SNESSetJacobian().
The routines TSDefaultComputeJacobianColor() and TSDefaultComputeJacobian() are the counterparts of
SNESDefaultComputeJacobianColor() and SNESDefaultComputeJacobian(), respectively.
6.1.2 Solving Differential Algebraic Equations
The interface for solving time dependent problems given in the implicit form
\[ F(t,u,\dot{u}) = 0, \qquad u(t_0) = u_0 \]
is slightly different. In general, this is a differential algebraic equation (DAE), but if the matrix
$F_{\dot{u}}(t) = \partial F/\partial\dot{u}$ is nonsingular then it is an ODE and can be transformed to the standard explicit form, although this
transformation may not lead to efficient algorithms. For ODE with nontrivial mass matrices such as arise in
FEM, the implicit/DAE interface significantly reduces overhead to prepare the system for algebraic solvers
(SNES/KSP) by having the user assemble the correctly shifted matrix. Therefore this interface is also useful
for ODE systems.
To solve a DAE, instead of TSSetRHSFunction() and TSSetRHSJacobian(), one uses:
Function $F(t,u,\dot{u})$
TSSetIFunction(TS ts,Vec R,PetscErrorCode (*f)(TS,PetscReal,Vec,Vec,Vec,void*),void *funP);
The vector R is an optional location to store the residual. The arguments to the function f() are the
timestep context, current time, input state $u$, input time derivative $\dot{u}$, output residual vector,
and the (optional) user-provided context funP.
Jacobian $F_u + aF_{\dot{u}}$
Unless one is using matrix-free methods without preconditioning, the user must also provide an (approximate)
Jacobian matrix of $G(u) = F(t,u,w+au)$. This form for G arises because the time
integrator internally approximates $\dot{u}$ by $w+au$ where the positive shift $a$ and vector $w$ depend on
the integration method, step size, and past states. The function that evaluates $G'(u) = F_u + aF_{\dot{u}}$ is
set with
TSSetIJacobian(TS ts,Mat A,Mat B,
PetscErrorCode (*fjac)(TS,PetscReal,Vec,Vec,PetscReal,Mat*,Mat*,MatStructure*,void*),void *jacP);
The arguments for the function fjac() are the timestep context, current time, input state $u$, input
derivative $\dot{u}$, shift $a$, matrix A, preconditioning matrix B, flag describing structure of preconditioning
matrix (see the discussion of KSPSetOperators() in Section 4.1 for details), and the (optional) user-provided
context jacP.
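To make the role of the shift concrete, here is a minimal scalar sketch in plain C (no PETSc calls). It assumes a hypothetical residual F(t,u,udot) = udot + u^3 and a backward Euler integrator, for which udot is approximated by w + a*u with a = 1/dt and w = -u_old/dt. The names ifunction, ijacobian, and implicit_euler_step are illustrative only; the iteration count and tolerance are arbitrary choices.

```c
#include <assert.h>

static double absd(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical scalar residual F(t,u,udot) = udot + u^3. */
static double ifunction(double t, double u, double udot)
{
  (void)t;
  return udot + u * u * u;
}

/* Shifted Jacobian dF/du + a*dF/dudot; here dF/du = 3u^2 and dF/dudot = 1. */
static double ijacobian(double t, double u, double udot, double a)
{
  (void)t;
  (void)udot;
  return 3.0 * u * u + a;
}

/* One backward Euler step: udot is approximated by w + a*u with a = 1/dt
   and w = -u_old/dt, and Newton's method solves G(u) = F(t,u,w+a*u) = 0. */
double implicit_euler_step(double t, double dt, double u_old)
{
  assert(dt > 0.0);
  const double a = 1.0 / dt;
  const double w = -u_old / dt;
  double u = u_old; /* initial guess */
  int it;
  for (it = 0; it < 50; it++) {
    const double udot = w + a * u;
    const double g    = ifunction(t + dt, u, udot);
    if (absd(g) <= 1e-13 * (1.0 + absd(a * u_old))) break;
    u -= g / ijacobian(t + dt, u, udot, a);
  }
  return u;
}
```

The user-supplied Jacobian routine only needs to combine the two partial derivatives with the shift it is handed; the integrator chooses a and w, so the same routine can serve different implicit methods.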
6.1.3 Using Implicit-Explicit (IMEX) methods for multi-rate problems
The methods of the last section can be generalized for problems with multiple time scales using the form
\[ G(t,u,\dot{u}) = F(t,u), \qquad u(t_0) = u_0 \tag{6.1} \]
where G will be treated implicitly using a method suitable for stiff problems and F will be treated explicitly
when using an IMEX method like TSARKIMEX. G is typically linear or weakly nonlinear while F
may have very strong nonlinearities such as arise in non-oscillatory methods for hyperbolic PDE. The user
provides three pieces of information, the APIs for which have been described above.
• Slow part $F(t,u)$ using TSSetRHSFunction().
• Stiff part $G(t,u,\dot{u})$ using TSSetIFunction().
• Jacobian $J(t,u,\dot{u}) = G_u + aG_{\dot{u}}$ using TSSetIJacobian(). An assembled Jacobian is optional and can
be approximated using finite differencing.
To solve Equation 6.1 using fully implicit methods, the user may also provide the Jacobian $F_u$.
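The splitting can be illustrated with a first-order IMEX Euler step, sketched below in plain C for a scalar problem. This is not the scheme used by TSARKIMEX; it is only the simplest method with the same structure. The stiff part is assumed to be G(t,u,udot) = udot + k*u and the slow part is a hypothetical F(t,u) = 1 - u^2; the names rhs_slow and imex_euler_step are made up for this example.

```c
#include <assert.h>

/* Hypothetical slow, nonstiff part F(t,u), treated explicitly. */
static double rhs_slow(double t, double u)
{
  (void)t;
  return 1.0 - u * u;
}

/* First-order IMEX Euler step for udot + k*u = F(t,u):
   the stiff linear term k*u is taken at the new time level (implicit),
   F at the old one (explicit):
     (u_new - u_old)/dt + k*u_new = F(t,u_old). */
double imex_euler_step(double t, double dt, double k, double u_old)
{
  assert(dt > 0.0 && k >= 0.0);
  return (u_old + dt * rhs_slow(t, u_old)) / (1.0 + k * dt);
}
```

Because the stiff term is linear here, the implicit solve reduces to a single division; for a nonlinear G one would solve for u_new with Newton's method using the shifted Jacobian.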
6.1.4 Using Sundials from PETSc
Sundials is a parallel ODE solver developed by Hindmarsh et al. at LLNL. The TS library provides an
interface to use the CVODE component of Sundials directly from PETSc. (To install PETSc to use Sundials,
see the installation guide, docs/installation/index.htm.)
To use the Sundials integrators, call
TSSetType(TS ts,TSType TSSUNDIALS);
or use the command line option -ts_type sundials.
The Sundials CVODE solver comes with two main integrator families, Adams and BDF (backward differentiation
formula). One can select these with
TSSundialsSetType(TS ts,TSSundialsLmmType [SUNDIALS_ADAMS,SUNDIALS_BDF]);
or the command line option -ts_sundials_type <adams,bdf>. BDF is the default.
Sundials does not use the SNES library within PETSc for its nonlinear solvers, so one cannot change
the nonlinear solver options via SNES. Rather, Sundials uses the preconditioners within the PC package of
PETSc, which can be accessed via
TSSundialsGetPC(TS ts,PC *pc);
The user can then directly set preconditioner options; alternatively, the usual runtime options can be em-
ployed via -pc_xxx.
Finally, one can set the Sundials tolerances via
TSSundialsSetTolerance(TS ts,double abs,double rel);
where abs denotes the absolute tolerance and rel the relative tolerance.
Other PETSc-Sundials options include
TSSundialsSetGramSchmidtType(TS ts,TSSundialsGramSchmidtType type);
where type is either SUNDIALS_MODIFIED_GS or SUNDIALS_UNMODIFIED_GS. This may be set
via the options database with -ts_sundials_gramschmidt_type <modified,unmodified>.
The routine
TSSundialsSetGMRESRestart(TS ts,int restart);
sets the number of vectors in the Krylov subspace used by GMRES. This may be set in the options database
with -ts_sundials_gmres_restart restart.
6.1.5 Solving Steady-State Problems with Pseudo-Timestepping
For solving steady-state problems with pseudo-timestepping one proceeds as follows.
Provide the function F(u) with the routine
TSSetRHSFunction(TS ts,PetscErrorCode (*f)(TS,double,Vec,Vec,void*),void *fP);
The arguments to the function f() are the timestep context, the current time, the input for the func-
tion, the output for the function and the (optional) user-provided context variable fP.
Provide the (approximate) Jacobian matrix of F(u,t) and a function to compute it at each Newton
iteration. This is done with the command
TSSetRHSJacobian(TS ts,Mat A, Mat B,PetscErrorCode (*f)(TS,double,Vec,Mat*,Mat*,
MatStructure*,void*),void *fP);
The arguments for the function f() are the timestep context, the current time, the location where the
Jacobian is to be computed, the Jacobian matrix, an alternative approximate Jacobian matrix used as
a preconditioner, and the optional user-provided context, passed in as fP. The user must provide the
Jacobian as a matrix; thus, if using a matrix-free approach, one must create a MATSHELL matrix.
In addition, the user must provide a routine that computes the pseudo-timestep. The interface differs slightly
depending on whether one uses a constant timestep over the entire grid or one that varies with location.
For location-independent pseudo-timestepping, one uses the routine
TSPseudoSetTimeStep(TS ts,int(*dt)(TS,double*,void*),void* dtctx);
The function dt is a user-provided function that computes the next pseudo-timestep. As a default
one can use TSPseudoDefaultTimeStep(TS,double*,void*) for dt. This routine updates the pseudo-timestep
with one of two strategies: the default
\[ dt^n = dt_{\mathrm{increment}} \cdot dt^{n-1} \cdot \frac{\|F(u^{n-1})\|}{\|F(u^n)\|} \]
or, the alternative,
\[ dt^n = dt_{\mathrm{increment}} \cdot dt^0 \cdot \frac{\|F(u^0)\|}{\|F(u^n)\|} \]
which can be set with the call
TSPseudoIncrementDtFromInitialDt(TS ts);
or the option -ts_pseudo_increment_dt_from_initial_dt. The value $dt_{\mathrm{increment}}$ is by
default 1.1, but can be reset with the call
TSPseudoSetTimeStepIncrement(TS ts,double inc);
or the option -ts_pseudo_increment <inc>.
For location-dependent pseudo-timestepping, the interface function has not yet been created.
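The two update strategies can be written out as a small plain-C sketch. The PseudoDt structure and the function pseudo_dt_next are hypothetical names introduced for illustration and are not the PETSc implementation; the code only encodes the two formulas given above, with the residual norm supplied by the caller.

```c
#include <assert.h>

/* State needed by the two pseudo-timestep update rules (illustrative only). */
typedef struct {
  double inc;          /* dt_increment (1.1 is the default quoted in the text) */
  double dt0;          /* initial pseudo-timestep dt^0 */
  double fnorm0;       /* ||F(u^0)|| */
  double dt_prev;      /* dt^{n-1} */
  double fnorm_prev;   /* ||F(u^{n-1})|| */
  int    from_initial; /* nonzero selects the alternative strategy */
} PseudoDt;

/* Return dt^n given the current residual norm ||F(u^n)|| and update the state. */
double pseudo_dt_next(PseudoDt *p, double fnorm)
{
  double dt;
  assert(p != 0 && fnorm > 0.0);
  if (p->from_initial) {
    dt = p->inc * p->dt0 * p->fnorm0 / fnorm;         /* alternative rule */
  } else {
    dt = p->inc * p->dt_prev * p->fnorm_prev / fnorm; /* default rule */
  }
  p->dt_prev    = dt;
  p->fnorm_prev = fnorm;
  return dt;
}
```

With either rule the pseudo-timestep grows as the residual norm falls, so the iteration takes larger steps as it approaches the steady state.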
6.1.6 Using the Explicit Runge-Kutta timestepper with variable timesteps
The Explicit Runge-Kutta timestepper with variable timesteps is an implementation of standard Runge-Kutta
using Dormand-Prince 5(4). It is easy to change this table if needed. Since the time-stepper is using variable
timesteps, the TSSetInitialTimeStep() function is not used.
Setting the tolerance with
TSRKSetTolerance(TS ts,double tolerance)
or -ts_rk_tol defines the global tolerance for the whole time period. The tolerance for each timestep is
calculated relative to the size of the timestep.
The error in each timestep is calculated using the two solutions given by Dormand-Prince 5(4). The
local error is calculated as the 2-norm of the difference of the two solutions.
Other timestep features:
• The next timestep can be at most 5 times the present timestep.
• The smallest timestep can be 1e-14 (to avoid machine precision errors).
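The error measure and the two timestep limits listed above can be sketched in plain C as follows. The function names are hypothetical, and the rule that proposes the next timestep is not specified in this section, so the sketch takes the proposed value as an input. To avoid a square root, the 2-norm is compared in squared form.

```c
#include <assert.h>
#include <stddef.h>

/* Squared 2-norm of the difference between the two embedded solutions. */
double local_error_squared(size_t n, const double *y5, const double *y4)
{
  double s = 0.0;
  size_t i;
  for (i = 0; i < n; i++) {
    const double d = y5[i] - y4[i];
    s += d * d;
  }
  return s;
}

/* Accept the step when ||y5 - y4||_2 <= step_tol (compared in squared form). */
int step_accepted(double err_sq, double step_tol)
{
  assert(step_tol >= 0.0);
  return err_sq <= step_tol * step_tol;
}

/* Limit a proposed next step: at most 5 times the present step, at least 1e-14. */
double clamp_next_step(double dt_now, double dt_proposed)
{
  assert(dt_now > 0.0);
  if (dt_proposed > 5.0 * dt_now) dt_proposed = 5.0 * dt_now;
  if (dt_proposed < 1e-14) dt_proposed = 1e-14;
  return dt_proposed;
}
```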
More details about the solver and code examples can be found at
http://www.parallab.uib.no/projects/molecul/matrix/.
Chapter 7
High Level Support for Multigrid with
DMMG
This infrastructure will be replaced in the next PETSc release. DO NOT USE IT.
PETSc provides an easy to use high-level interface for multigrid on a single structured grid using the
PETSc DMDA object (or the DMComposite object) to decompose the grid across the processes. This
DMMG code is built on top of the lower level PETSc multigrid interface provided in the PCType of MG,
see Section 4.4.7. Currently we only provide piecewise linear and piecewise constant interpolation, but can
add more if needed. The DMMG routines only provide linear multigrid but they can be used easily with
either KSP (for linear problems) or SNES (for nonlinear problems).
For linear problems the examples src/ksp/ksp/examples/tutorials/ex22.c and ex25.c
can be used to guide your development. We give a short summary here.
DMMG *dmmg;
DM da;
Vec soln;
/* Create the DMDA that stores information about the coarsest grid you wish to use */
ierr = DMDACreate3d(PETSC_COMM_WORLD,DMDA_NONPERIODIC,DMDA_STENCIL_STAR,
3,3,3,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,1,1,0,0,0,&da);CHKERRQ(ierr);
/* Create the DMMG data structure
- the second argument indicates the number of levels you wish to use and
can be changed with the option -dmmg_nlevels */
ierr = DMMGCreate(PETSC_COMM_WORLD,3,PETSC_NULL,&dmmg);CHKERRQ(ierr);
/* Tell the DMMG object to use the da to define the coarsest grid */
ierr = DMMGSetDM(dmmg,(DM)da);CHKERRQ(ierr);
/* Tell the DMMG we are solving a linear problem (hence KSP) and provide the
callback function to compute the right hand side and matrices for each level */
ierr = DMMGSetKSP(dmmg,ComputeRHS,ComputeMatrix);CHKERRQ(ierr);
/* Solve the problem */
ierr = DMMGSolve(dmmg);CHKERRQ(ierr);
/* One can access the solution with */
soln = DMMGGetx(dmmg);
ierr = DMMGDestroy(dmmg);CHKERRQ(ierr);
ierr = DMDestroy(&da);CHKERRQ(ierr);
The options -ksp_monitor and -mg_levels_ksp_monitor, and optionally -mg_coarse_ksp_monitor,
cause the DMMG code to print the residual norms for each level of the solver to the screen so
that the coarser the grid the more indented the print out. The option -dmmg_grid_sequence causes
the DMMG solve to use grid sequencing to generate the initial guess by solving the same problem on the
previous coarser grid; this often results in a much faster time to solution.
The solver (smoother) used on each level but the coarsest can be controlled via the options database with
any PC or KSP option prefixed as -mg_levels_[pc/ksp]_. The solver options on the coarsest grid can
be set with -mg_coarse_[pc/ksp]_. The DMMG has many other options that can be viewed by running the
DMMG program with the option -help. You should generally run your code with the option -ksp_view
to see exactly what solvers are being used.
For nonlinear problems one replaces the DMMGSetKSP() with
DMMGSetSNES(DMMG *dmmg,PetscErrorCode (*function)(SNES,Vec,Vec,void*),
PetscErrorCode (*jacobian)(SNES,Vec,Mat*,Mat*,MatStructure*,void*))
or the preferred approach
DMMGSetSNESLocal(DMMG *dmmg,
PetscErrorCode (*localfunction)(DMDALocalInfo *info,void *x,void *f,void* appctx),
PetscErrorCode (*localjacobian)(DMDALocalInfo *,void *x,Mat J,void *appctx),
ad_function,ad_mf_function);
The ad_function and ad_mf_function are described in the next chapter. See examples
src/snes/examples/tutorials/ex18.c and ex19.c for complete details.
For scalar problems (problems with one degree of freedom per node), the localfunction x and
f arguments are simply multi-dimensional arrays of double precision (or complex) numbers (according to
the dimension of the grid) that should be indexed using global i, j, k indices on the entire grid. For multi-
component problems you must create a C struct with an entry for each component and the x and f arguments
are appropriately dimensioned arrays of that struct. For example, for a 3d scalar problem the function would
be
int localfunction(DMDALocalInfo *info,double ***x, double ***f,void *ctx)
For a 2d multi-component problem with u, v, and p components one would write
typedef struct {
PetscScalar u,v,p;
} Field;
...
int localfunction(DMDALocalInfo *info,Field **x,Field **f,void *ctx)
For many nonlinear problems it is too difficult to compute the Jacobian analytically; thus if jacobian
or localjacobian is not provided (indicated by passing in PETSC_NULL), the DMMG will automatically compute
the sparse Jacobian reasonably efficiently using finite differencing. See the next chapter on
computing the Jacobian via automatic differentiation. The option -dmmg_jacobian_mf_fd causes the
code to not compute the Jacobian explicitly but rather to use differences to apply the matrix vector product
of the Jacobian.
The usual option -snes_monitor can be used to monitor the progress of the nonlinear solver. The
usual -snes_ options may be used to control the nonlinear solves. Again we recommend using the options
-dmmg_grid_sequence and -snes_view for most runs.
Chapter 8
Using ADIC and ADIFOR with PETSc
Automatic differentiation is an incredible technique to generate code that computes Jacobians and other
derivatives directly from code that only evaluates the function. For structured grid problems, via the
DMMG interface (see Chapter 7) PETSc provides a way to use ADIFOR and ADIC to compute the sparse
Jacobians or perform matrix-free vector products with them. See src/snes/examples/tutorials/ex18.c
and ex5f.F for example usage.
First one indicates the functions for which one needs Jacobians by adding in the comments in the code
/* Process adiC(maximum number colors): FormFunctionLocal FormFunctionLocali */
where one lists the functions. In Fortran use
! Process adifor: FormFunctionLocal
Next one uses the call
DMMGSetSNESLocal(DMMG *dmmg,
PetscErrorCode (*localfunction)(DMDALocalInfo *info,void *x,void *f,void* appctx),PETSC_NULL,
ad_localfunction,ad_mf_localfunction);
where the names of the last two functions are obtained by prepending ad_ and ad_mf_ to the
function name. In Fortran, this is done by prepending g_ and m_.
Two useful options are -dmmg_jacobian_mf_ad and -dmmg_jacobian_mf_ad_operator.
The former uses matrix-free automatic differentiation both to apply the operator and to define the
preconditioner operator. The latter uses the matrix-free form for the matrix-vector product but still computes
the Jacobian (by default with finite differences) used to construct the preconditioner.
8.1 Work arrays inside the local functions
In C you can call DMDAGetArray() to get work arrays (this is low overhead). In Fortran you can provide
a FormFunctionLocal() that has local arrays with hardwired sizes that are large enough, or
somehow allocate space and pass it into an inner FormFunctionLocal() that is the one you differentiate; this
second approach will require some hand massaging. For example,
subroutine TrueFormFunctionLocal(info,x,f,work,ctx,ierr)
double precision x(gxs:gxe,gys:gye),f(xs:xe,ys:ye)
DMDA info(DMDA_LOCAL_INFO_SIZE)
integer ctx
PetscErrorCode ierr
double precision work(gxs:gxe,gys:gye)
.... do the work ....
return
end
subroutine FormFunctionLocal(info,x,f,ctx,ierr)
double precision x(*),f(*)
DMDA info(DMDA_LOCAL_INFO_SIZE)
PetscErrorCode ierr
integer ctx
double precision work(10000)
call TrueFormFunctionLocal(info,x,f,work,ctx,ierr)
return
end
Chapter 9
Using MATLAB with PETSc
There are four basic ways to use MATLAB with PETSc: (1) dumping files to be read into MATLAB, (2)
automatically sending data from a running PETSc program to a MATLAB process where you may interactively
type MATLAB commands (or run scripts), (3) automatically sending data back and forth between
PETSc and MATLAB where MATLAB commands are issued not interactively but from a script or the
PETSc program, and (4) directly in MATLAB using PETSc objects.
9.1 Dumping Data for MATLAB
One can dump PETSc matrices and vectors to the screen (and thus save in a file via > filename.m) in
a format that MATLAB can read in directly. This is done with the command line options -vec_view_matlab
or -mat_view_matlab. This causes the PETSc program to print the vectors and matrices every
time VecAssemblyXXX() or MatAssemblyXXX() is called. To provide finer control over when and what
vectors and matrices are dumped one can use the VecView() and MatView() functions with a viewer type
of ASCII (see PetscViewerASCIIOpen(), PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_STDOUT_SELF,
or PETSC_VIEWER_STDOUT_(MPI_Comm)). Before calling the viewer set the output type
with, for example,
PetscViewerSetFormat(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_ASCII_MATLAB);
VecView(A,PETSC_VIEWER_STDOUT_WORLD);
or
PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD,PETSC_VIEWER_ASCII_MATLAB);
MatView(B,PETSC_VIEWER_STDOUT_WORLD);
The name of each PETSc variable printed for MATLAB may be set with
PetscObjectSetName((PetscObject)A,"name");
If no name is specified, the object is given a default name using PetscObjectName.
9.2 Sending Data to Interactive Running MATLAB Session
One creates a viewer to MATLAB via
PetscViewerSocketOpen(MPI_Comm,char *machine,int port,PetscViewer *v);
(port is usually set to PETSC_DEFAULT; use PETSC_NULL for the machine if the MATLAB interactive
session is running on the same machine as the PETSc program) and then sends matrices or vectors via
VecView(Vec A,v);
MatView(Mat B,v);
One can also send arrays or integer arrays via PetscIntView(), PetscRealView() and PetscScalarView(). One
may start the MATLAB program manually or use the PETSc command
PetscStartMATLAB(MPI_Comm,char *machine,char *script,FILE **fp);
where machine and script may be PETSC_NULL.
To receive the objects in MATLAB you must first make sure that ${PETSC_DIR}/bin/matlab is
in your MATLAB path. Use p = sopen; (or p = sopen(portnum) if you provided a port number in
your call to PetscViewerSocketOpen()), then a = PetscBinaryRead(p); returns the object you have
passed from PETSc. PetscBinaryRead() may be called any number of times. Each call should correspond on
the PETSc side with viewing a single vector or matrix. You may call sclose() to close the connection
from MATLAB. It is also possible to start your PETSc program from MATLAB via launch().
9.3 Using the MATLAB Compute Engine
One creates access to the MATLAB engine via
PetscMatlabEngineCreate(MPI_Comm comm,char *machine,PetscMatlabEngine *e);
where machine is the name of the machine hosting MATLAB (PETSC_NULL may be used for localhost).
One can send objects to MATLAB via
PetscMatlabEnginePut(PetscMatlabEngine e,PetscObject obj);
One can get objects via
PetscMatlabEngineGet(PetscMatlabEngine e,PetscObject obj);
Similarly one can send arrays via
PetscMatlabEnginePutArray(PetscMatlabEngine e,int m,int n,PetscScalar *array,char *name);
and get them back via
PetscMatlabEngineGetArray(PetscMatlabEngine e,int m,int n,PetscScalar *array,char *name);
One cannot use MATLAB interactively in this mode but you can send MATLAB commands via
PetscMatlabEngineEvaluate(PetscMatlabEngine,format,...);
where format has the usual printf() format. For example,
PetscMatlabEngineEvaluate(PetscMatlabEngine,"x = %g *y + z;",avalue);
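Since the format argument follows printf() conventions, the command string that reaches MATLAB can be previewed with standard C formatting. The following sketch uses only snprintf() from the C library and does not call PETSc; the helper name format_matlab_command and the buffer handling are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Format the MATLAB command from the example above into buf, the way a
   printf()-style interface would.  Returns 0 on success and -1 if buf is
   too small to hold the whole command. */
int format_matlab_command(char *buf, size_t len, double avalue)
{
  assert(buf != NULL && len > 0);
  const int n = snprintf(buf, len, "x = %g *y + z;", avalue);
  return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

For avalue equal to 2.5, the buffer then holds the MATLAB statement x = 2.5 *y + z;.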
The name of each PETSc variable passed to Matlab may be set with
PetscObjectSetName((PetscObject)A,"name");
Text responses can be returned from MATLAB via
PetscMatlabEngineGetOutput(PetscMatlabEngine,char **);
or
PetscMatlabEnginePrintOutput(PetscMatlabEngine,FILE*).
There is a short-cut to starting the MATLAB engine with PETSC_MATLAB_ENGINE_(MPI_Comm).
9.4 Using PETSc objects directly in MATLAB
PETSc must be configured with --with-shared-libraries --with-matlab-engine --with-matlab
[--download-f2cblaslapack]. The option --download-f2cblaslapack must be
used if using 64bit MATLAB on LINUX or 64bit MATLAB and the --download-ml external package
on Apple Mac OS X. You can build with or without MPI, but cannot run on more than one process. There is
currently no MPI in the API: the MPI_Comm is not in any of the argument lists, but otherwise the argument
lists try to mimic the C binding. Add ${PETSC_DIR}/bin/matlab/classes to your MATLAB path.
In MATLAB use help PETSc to get started using PETSc from MATLAB once you have set up PETSc to
support MATLAB.
Chapter 10
PETSc for Fortran Users
Most of the functionality of PETSc can be obtained by people who program purely in Fortran 77 or Fortran
90. The PETSc Fortran interface works with both F77 and F90 compilers.
Since Fortran77 does not provide type checking of routine input/output parameters, we find that many errors
encountered within PETSc Fortran programs result from accidentally using incorrect calling sequences.
Such mistakes are immediately detected during compilation when using C/C++. Thus, using a mixture of
C/C++ and Fortran often works well for programmers who wish to employ Fortran for the core numerical
routines within their applications. In particular, one can effectively write PETSc driver routines in C/C++,
thereby preserving flexibility within the program, and still use Fortran when desired for underlying numerical
computations. With Fortran 90 compilers we now can provide some type checking from Fortran.
10.1 Differences between PETSc Interfaces for C and Fortran
Only a few differences exist between the C and Fortran PETSc interfaces, all of which are due to Fortran
77 syntax limitations. Since PETSc is primarily written in C, the FORTRAN 90 dynamic allocation is not
easily accessible. All Fortran routines have the same names as the corresponding C versions, and PETSc
command line options are fully supported. The routine arguments follow the usual Fortran conventions; the
user need not worry about passing pointers or values. The calling sequences for the Fortran version are in
most cases identical to the C version, except for the error checking variable discussed in Section 10.1.2 and
a few routines listed in Section 10.1.10.
10.1.1 Include Files
The Fortran include files for PETSc are located in the directory ${PETSC_DIR}/include/finclude
and should be used via statements such as the following:
#include "finclude/includefile.h"
Since one must be very careful to include each file no more than once in a Fortran routine, application
programmers must manually include each file needed for the various PETSc libraries within their program.
This approach differs from the PETSc C/C++ interface, where the user need only include the highest level
file, for example, petscsnes.h, which then automatically includes all of the required lower level files.
As shown in the examples of Section 10.2, in Fortran one must explicitly list each of the include files. One
must employ the Fortran file suffix .F rather than .f. This convention enables use of the CPP preprocessor,
which allows the use of the #include statements that define PETSc objects and variables. (Familiarity with
the CPP preprocessor is not needed for writing PETSc Fortran code; one can simply begin by copying a
PETSc Fortran example and its corresponding makefile.)
For some of the Fortran 90 functionality of PETSc and type checking of PETSc function calls you can
use
#include "finclude/includefile.h"
#include "finclude/includefile.h90"
See the manual page UsingFortran for how you can use PETSc Fortran module files in your code.
10.1.2 Error Checking
In the Fortran version, each PETSc routine has as its final argument an integer error variable, in contrast
to the C convention of providing the error variable as the routine's return value. The error code is set to
be nonzero if an error has been detected; otherwise, it is zero. For example, the Fortran and C variants of
KSPSolve() are given, respectively, below, where ierr denotes the error variable:
call KSPSolve(KSP ksp,Vec b,Vec x,PetscErrorCode ierr)
KSPSolve(KSP ksp,Vec b,Vec x);
Fortran programmers can check these error codes with CHKERRQ(ierr), which terminates all pro-
cesses when an error is encountered. Likewise, one can set error codes within Fortran programs by using
SETERRQ(comm,ierr,p, ), which again terminates all processes upon detection of an error. Note
that complete error tracebacks with CHKERRQ() and SETERRQ(), as described in Section 1.4 for C rou-
tines, are not directly supported for Fortran routines; however, Fortran programmers can easily use the error
codes in writing their own tracebacks. For example, one could use code such as the following:
call KSPSolve(ksp,b,x,ierr)
if ( ierr .ne. 0) then
print*, 'Error in routine ...'
return
endif
The most common reason for crashing PETSc Fortran code is forgetting the final ierr argument.
10.1.3 Array Arguments
Since Fortran 77 does not allow arrays to be returned in routine arguments, all PETSc routines that return
arrays, such as VecGetArray(), MatGetArray(), ISGetIndices(), and DMDAGetGlobalIndices() are defined
slightly differently in Fortran than in C. Instead of returning the array itself, these routines accept as input
a user-specified array of dimension one and return an integer index to the actual array used for data storage
within PETSc. The Fortran interface for several routines is as follows:
double precision xx_v(1), aa_v(1)
PetscErrorCode ierr
integer ss_v(1), dd_v(1), nloc
PetscOffset ss_i, xx_i, aa_i, dd_i
Vec x
Mat A
IS s
DM d
call VecGetArray(x,xx_v,xx_i,ierr)
call MatGetArray(A,aa_v,aa_i,ierr)
call ISGetIndices(s,ss_v,ss_i,ierr)
call DMDAGetGlobalIndices(d,nloc,dd_v,dd_i,ierr)
To access array elements directly, both the user-specified array and the integer index must then be used
together. For example, the following Fortran program fragment illustrates directly setting the values of
a vector array instead of using VecSetValues(). Note the (optional) use of the preprocessor #define
statement to enable array manipulations in the conventional Fortran manner.
#define xx_a(ib) xx_v(xx_i + (ib))
double precision xx_v(1)
PetscOffset xx_i
PetscErrorCode ierr
integer i, n
Vec x
call VecGetArray(x,xx_v,xx_i,ierr)
call VecGetLocalSize(x,n,ierr)
do 10, i=1,n
xx_a(i) = 3*i + 1
10 continue
call VecRestoreArray(x,xx_v,xx_i,ierr)
Figure 17 contains an example of using VecGetArray() within a Fortran routine.
Since in this case the array is accessed directly from Fortran, indexing begins with 1, not 0 (unless the
array is declared as xx_v(0:1)). This is different from the use of VecSetValues(), where indexing always
starts with 0.
Note: If using VecGetArray(), MatGetArray(), ISGetIndices(), or DMDAGetGlobalIndices() from For-
tran, the user must not compile the Fortran code with options to check for array entries out of bounds (e.g.,
on the IBM RS/6000 this is done with the -C compiler option, so never use the -C option with this).
10.1.4 Calling Fortran Routines from C (and C Routines from Fortran)
Different machines have different methods of naming Fortran routines called from C (or C routines called
from Fortran). Most Fortran compilers change all the capital letters in Fortran routines to small. On some
machines, the Fortran compiler appends an underscore to the end of each Fortran routine name; for example,
the Fortran routine Dabsc() would be called from C with dabsc_(). Other machines change all the
letters in Fortran routine names to capitals.
PETSc provides two macros (defined in C/C++) to help write portable code that mixes C/C++ and Fortran.
They are PETSC_HAVE_FORTRAN_UNDERSCORE and PETSC_HAVE_FORTRAN_CAPS, which
are defined in the file ${PETSC_DIR}/${PETSC_ARCH}/include/petscconf.h. The macros are
used, for example, as follows:
#if defined(PETSC_HAVE_FORTRAN_CAPS)
#define dabsc_ DABSC
#elif !defined(PETSC_HAVE_FORTRAN_UNDERSCORE)
#define dabsc_ dabsc
#endif
.....
dabsc_(&n,x,y); /* call the Fortran function */
10.1.5 Passing Null Pointers
In several PETSc C functions, one has the option of passing a 0 (null) argument (for example, the fifth
argument of MatCreateSeqAIJ()). From Fortran, users must pass PETSC_NULL_XXX to indicate a null argument
(where XXX is INTEGER, DOUBLE, CHARACTER, or SCALAR depending on the type of argument
required); passing 0 from Fortran will crash the code. Note that the C convention of passing PETSC_NULL
(or 0) cannot be used. For example, when no options prefix is desired in the routine PetscOptionsGetInt(),
one must use the following command in Fortran:
call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-name',N,flg,ierr)
This Fortran requirement is inconsistent with C, where the user can employ PETSC_NULL for all null
arguments.
10.1.6 Duplicating Multiple Vectors
The Fortran interface to VecDuplicateVecs() differs slightly from the C/C++ variant because Fortran does
not allow arrays to be returned in routine arguments. To create n vectors of the same format as an existing
vector, the user must declare a vector array, v_new, of size n. Then, after VecDuplicateVecs() has been
called, v_new will contain (pointers to) the new PETSc vector objects. When finished with the vectors, the
user should destroy them by calling VecDestroyVecs(). For example, the following code fragment duplicates
v_old to form two new vectors, v_new(1) and v_new(2).
Vec v_old, v_new(2)
integer ierr
PetscScalar alpha
....
call VecDuplicateVecs(v_old,2,v_new,ierr)
alpha = 4.3
call VecSet(v_new(1),alpha,ierr)
alpha = 6.0
call VecSet(v_new(2),alpha,ierr)
....
call VecDestroyVecs(2,v_new,ierr)
10.1.7 Matrix, Vector and IS Indices
All matrices, vectors and IS in PETSc use zero-based indexing, regardless of whether C or Fortran is being
used. The interface routines, such as MatSetValues() and VecSetValues(), always use zero indexing. See
Section 3.2 for further details.
10.1.8 Setting Routines
When a function pointer is passed as an argument to a PETSc function, such as the test in
KSPSetConvergenceTest(), it is assumed that this pointer references a routine written in the same language as the PETSc
interface function that was called. For instance, if KSPSetConvergenceTest() is called from C, the test
argument is assumed to be a C function. Likewise, if it is called from Fortran, the test is assumed to be written
in Fortran.
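The callback convention can be illustrated in plain C without PETSc. The sketch below is only an analogy: SketchSolver, SketchSolverSetConvergenceTest(), SketchSolverSolve(), and MyConvergenceTest() are hypothetical names, and the callback signature is deliberately simpler than the one KSPSetConvergenceTest() actually expects. The point is that the solver calls the user routine through a function pointer using the calling conventions of the language in which the setter was invoked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns nonzero once iteration should stop. */
typedef int (*SketchConvergenceTest)(int it, double rnorm, void *ctx);

typedef struct {
  SketchConvergenceTest test; /* user routine, written in C like the caller */
  void *ctx;                  /* optional user context passed back unchanged */
} SketchSolver;

void SketchSolverSetConvergenceTest(SketchSolver *s, SketchConvergenceTest test, void *ctx)
{
  assert(s != NULL && test != NULL);
  s->test = test;
  s->ctx = ctx;
}

/* Toy iteration: the residual norm halves at every step.
   Returns the number of steps taken before the test reported convergence. */
int SketchSolverSolve(const SketchSolver *s, double rnorm0, int maxit)
{
  double rnorm = rnorm0;
  int it = 0;
  assert(s != NULL && s->test != NULL);
  while (it < maxit && !s->test(it, rnorm, s->ctx)) {
    rnorm *= 0.5;
    it++;
  }
  return it;
}

/* User-written test: converged when the residual norm drops below *ctx. */
int MyConvergenceTest(int it, double rnorm, void *ctx)
{
  const double *tol = (const double *)ctx;
  (void)it; /* iteration number unused in this simple test */
  return rnorm < *tol;
}
```

Starting from a residual norm of 1.0 with a tolerance of 1.0e-3, this toy solver stops after 10 halvings, since 2^-10 is the first power of two below the tolerance.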
10.1.9 Compiling and Linking Fortran Programs
Figure 23 shows a sample makefile that can be used for PETSc programs. In this makefile, one can compile
and run a debugging version of the Fortran program ex3.F with the actions make ex3 and make runex3,
respectively. The compilation command is restated below:
ex3: ex3.o
-${FLINKER} -o ex3 ex3.o ${PETSC_LIB}
${RM} ex3.o
10.1.10 Routines with Different Fortran Interfaces
The following Fortran routines differ slightly from their C counterparts; see the manual pages and previous
discussion in this chapter for details:
PetscInitialize(char *filename,int ierr)
PetscError(MPI_COMM,int err,char *message,int ierr)
VecGetArray(), MatGetArray()
ISGetIndices(), DMDAGetGlobalIndices()
VecDuplicateVecs(), VecDestroyVecs()
PetscOptionsGetString()
The following functions are not supported in Fortran:
PetscFClose(), PetscFOpen(), PetscFPrintf(), PetscPrintf()
PetscPopErrorHandler(), PetscPushErrorHandler()
PetscInfo()
PetscSetDebugger()
VecGetArrays(), VecRestoreArrays()
PetscViewerASCIIGetPointer(), PetscViewerBinaryGetDescriptor()
PetscViewerStringOpen(), PetscViewerStringSPrintf()
PetscOptionsGetStringArray()
10.1.11 Fortran90
PETSc includes limited support for direct use of Fortran90 pointers. Current routines include:
VecGetArrayF90(), VecRestoreArrayF90()
VecDuplicateVecsF90(), VecDestroyVecsF90()
DMDAGetGlobalIndicesF90()
MatGetArrayF90(), MatRestoreArrayF90()
ISGetIndicesF90(), ISRestoreIndicesF90()
See the manual pages for details and pointers to example programs. To use the routines VecGetArrayF90(),
VecRestoreArrayF90(), VecDuplicateVecsF90(), and VecDestroyVecsF90(), one must use the Fortran90
vector include file,
#include "finclude/petscvec.h90"
Analogous include files for other libraries are petscdm.h90, petscmat.h90, and petscis.h90.
Unfortunately, these routines currently work only on certain machines with certain compilers. They
currently work with the SGI, Solaris, Cray T3E, IBM, and NAG Fortran 90 compilers.
10.2 Sample Fortran77 Programs
Sample programs that illustrate the PETSc interface for Fortran are given in Figures 16-19, corresponding
to ${PETSC_DIR}/src/vec/vec/examples/tests/ex19f.F,
${PETSC_DIR}/src/vec/vec/examples/tutorials/ex4f.F,
${PETSC_DIR}/src/sys/draw/examples/tests/ex5f.F, and
${PETSC_DIR}/src/snes/examples/ex1f.F, respectively. We also refer Fortran programmers to the C examples listed throughout
the manual, since PETSc usage within the two languages differs only slightly.
!
!
program main
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
!
! This example demonstrates basic use of the PETSc Fortran interface
! to vectors.
!
PetscInt n
PetscErrorCode ierr
PetscBool flg
PetscScalar one,two,three,dot
PetscReal norm,rdot
Vec x,y,w
n = 20
one = 1.0
two = 2.0
three = 3.0
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-n',n,flg,ierr)
! Create a vector, then duplicate it
call VecCreate(PETSC_COMM_WORLD,x,ierr)
call VecSetSizes(x,PETSC_DECIDE,n,ierr)
call VecSetFromOptions(x,ierr)
call VecDuplicate(x,y,ierr)
call VecDuplicate(x,w,ierr)
call VecSet(x,one,ierr)
call VecSet(y,two,ierr)
call VecDot(x,y,dot,ierr)
rdot = PetscRealPart(dot)
write(6,100) rdot
100 format('Result of inner product ',f10.4)
call VecScale(x,two,ierr)
call VecNorm(x,NORM_2,norm,ierr)
write(6,110) norm
110 format('Result of scaling ',f10.4)
call VecCopy(x,w,ierr)
call VecNorm(w,NORM_2,norm,ierr)
write(6,120) norm
120 format('Result of copy ',f10.4)
call VecAXPY(y,three,x,ierr)
call VecNorm(y,NORM_2,norm,ierr)
write(6,130) norm
130 format('Result of axpy ',f10.4)
call VecDestroy(x,ierr)
call VecDestroy(y,ierr)
call VecDestroy(w,ierr)
call PetscFinalize(ierr)
end
Figure 16: Sample Fortran Program: Using PETSc Vectors
!
!
! Description: Illustrates the use of VecSetValues() to set
! multiple values at once; demonstrates VecGetArray().
!
!/*T
! Concepts: vectors^assembling;
! Concepts: vectors^arrays of vectors;
! Processors: 1
!T*/
! -----------------------------------------------------------------------
program main
implicit none
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Include files
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! The following include statements are required for Fortran programs
! that use PETSc vectors:
! petscsys.h - base PETSc routines
! petscvec.h - vectors
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Macro definitions
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! Macros to make clearer the process of setting values in vectors and
! getting values from vectors.
!
! - The element xx_a(ib) is element ib+1 in the vector x
! - Here we add 1 to the base array index to facilitate the use of
! conventional Fortran 1-based array indexing.
!
#define xx_a(ib) xx_v(xx_i + (ib))
#define yy_a(ib) yy_v(yy_i + (ib))
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Beginning of program
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PetscScalar xwork(6)
PetscScalar xx_v(1),yy_v(1)
PetscInt i,n,loc(6),isix
PetscErrorCode ierr
PetscOffset xx_i,yy_i
Vec x,y
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
n = 6
isix = 6
! Create initial vector and duplicate it
call VecCreateSeq(PETSC_COMM_SELF,n,x,ierr)
call VecDuplicate(x,y,ierr)
! Fill work arrays with vector entries and locations. Note that
! the vector indices are 0-based in PETSc (for both Fortran and
! C vectors)
do 10 i=1,n
loc(i) = i-1
xwork(i) = 10.0*i
10 continue
! Set vector values. Note that we set multiple entries at once.
! Of course, usually one would create a work array that is the
! natural size for a particular problem (not one that is as long
! as the full vector).
call VecSetValues(x,isix,loc,xwork,INSERT_VALUES,ierr)
! Assemble vector
call VecAssemblyBegin(x,ierr)
call VecAssemblyEnd(x,ierr)
! View vector
call PetscObjectSetName(x, "initial vector:",ierr)
call VecView(x,PETSC_VIEWER_STDOUT_SELF,ierr)
call VecCopy(x,y,ierr)
! Get a pointer to vector data.
! - For default PETSc vectors, VecGetArray() returns a pointer to
! the data array. Otherwise, the routine is implementation dependent.
! - You MUST call VecRestoreArray() when you no longer need access to
! the array.
! - Note that the Fortran interface to VecGetArray() differs from the
! C version. See the users manual for details.
call VecGetArray(x,xx_v,xx_i,ierr)
call VecGetArray(y,yy_v,yy_i,ierr)
! Modify vector data
do 30 i=1,n
xx_a(i) = 100.0*i
yy_a(i) = 1000.0*i
30 continue
! Restore vectors
call VecRestoreArray(x,xx_v,xx_i,ierr)
call VecRestoreArray(y,yy_v,yy_i,ierr)
! View vectors
call PetscObjectSetName(x, "new vector 1:",ierr)
call VecView(x,PETSC_VIEWER_STDOUT_SELF,ierr)
call PetscObjectSetName(y, "new vector 2:",ierr)
call VecView(y,PETSC_VIEWER_STDOUT_SELF,ierr)
! Free work space. All PETSc objects should be destroyed when they
! are no longer needed.
call VecDestroy(x,ierr)
call VecDestroy(y,ierr)
call PetscFinalize(ierr)
end
Figure 17: Sample Fortran Program: Using VecSetValues() and VecGetArray()
!
!
program main
#include <finclude/petscsys.h>
#include <finclude/petscdraw.h>
!
! This example demonstrates basic use of the Fortran interface for
! PetscDraw routines.
!
PetscDraw draw
PetscDrawLG lg
PetscDrawAxis axis
PetscErrorCode ierr
PetscBool flg
integer x,y,width,height
PetscScalar xd,yd
PetscInt i,n,w,h
n = 20
x = 0
y = 0
w = 300
h = 300
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
! GetInt requires a PetscInt so have to do this ugly setting
call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-width',w, &
& flg,ierr)
width = w
call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-height',h, &
& flg,ierr)
height = h
call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-n',n,flg,ierr)
call PetscDrawCreate(PETSC_COMM_SELF,PETSC_NULL_CHARACTER, &
& PETSC_NULL_CHARACTER,x,y,width,height,draw,ierr)
call PetscDrawSetType(draw,PETSC_DRAW_X,ierr)
call PetscDrawLGCreate(draw,1,lg,ierr)
call PetscDrawLGGetAxis(lg,axis,ierr)
call PetscDrawAxisSetColors(axis,PETSC_DRAW_BLACK,PETSC_DRAW_RED, &
& PETSC_DRAW_BLUE,ierr)
call PetscDrawAxisSetLabels(axis,'toplabel','xlabel','ylabel', &
& ierr)
do 10, i=0,n-1
xd = i - 5.0
yd = xd*xd
call PetscDrawLGAddPoint(lg,xd,yd,ierr)
10 continue
call PetscDrawLGIndicateDataPoints(lg,ierr)
call PetscDrawLGDraw(lg,ierr)
call PetscDrawFlush(draw,ierr)
call PetscSleep(10,ierr)
call PetscDrawLGDestroy(lg,ierr)
call PetscDrawDestroy(draw,ierr)
call PetscFinalize(ierr)
end
Figure 18: Sample Fortran Program: Using PETSc PetscDraw Routines
!
!
! Description: Uses the Newton method to solve a two-variable system.
!
!/*T
! Concepts: SNES^basic uniprocessor example
! Processors: 1
!T*/
!
! -----------------------------------------------------------------------
program main
implicit none
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Include files
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! The following include statements are generally used in SNES Fortran
! programs:
! petscsys.h - base PETSc routines
! petscvec.h - vectors
! petscmat.h - matrices
! petscksp.h - Krylov subspace methods
! petscpc.h - preconditioners
! petscsnes.h - SNES interface
! Other include statements may be needed if using additional PETSc
! routines in a Fortran program, e.g.,
! petscviewer.h - viewers
! petscis.h - index sets
!
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscmat.h>
#include <finclude/petscksp.h>
#include <finclude/petscpc.h>
#include <finclude/petscsnes.h>
!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Variable declarations
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! Variables:
! snes - nonlinear solver
! ksp - linear solver (Krylov subspace method context)
! pc - preconditioner context
! x, r - solution, residual vectors
! J - Jacobian matrix
! its - iterations for convergence
!
SNES snes
PC pc
KSP ksp
Vec x,r
Mat J
PetscErrorCode ierr
PetscInt its,i2,i20
PetscMPIInt size,rank
PetscScalar pfive
double precision tol
PetscBool setls
! Note: Any user-defined Fortran routines (such as FormJacobian)
! MUST be declared as external.
external FormFunction, FormJacobian, MyLineSearch
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Macro definitions
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! Macros to make clearer the process of setting values in vectors and
! getting values from vectors. These vectors are used in the routines
! FormFunction() and FormJacobian().
! - The element lx_a(ib) is element ib in the vector x
!
#define lx_a(ib) lx_v(lx_i + (ib))
#define lf_a(ib) lf_v(lf_i + (ib))
!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Beginning of program
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
call MPI_Comm_size(PETSC_COMM_WORLD,size,ierr)
call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
if (size .ne. 1) then
if (rank .eq. 0) then
write(6,*) 'This is a uniprocessor example only!'
endif
SETERRQ(PETSC_COMM_SELF,1,' ',ierr)
endif
i2 = 2
i20 = 20
! - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - - - -
! Create nonlinear solver context
! - - - - - - - - - -- - - - - - - - - - - - - - - - - - - - - - - - - -
call SNESCreate(PETSC_COMM_WORLD,snes,ierr)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Create matrix and vector data structures; set corresponding routines
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Create vectors for solution and nonlinear function
call VecCreateSeq(PETSC_COMM_SELF,i2,x,ierr)
call VecDuplicate(x,r,ierr)
! Create Jacobian matrix data structure
call MatCreate(PETSC_COMM_SELF,J,ierr)
call MatSetSizes(J,PETSC_DECIDE,PETSC_DECIDE,i2,i2,ierr)
call MatSetFromOptions(J,ierr)
! Set function evaluation routine and vector
call SNESSetFunction(snes,r,FormFunction,PETSC_NULL_OBJECT,ierr)
! Set Jacobian matrix data structure and Jacobian evaluation routine
call SNESSetJacobian(snes,J,J,FormJacobian,PETSC_NULL_OBJECT, &
& ierr)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Customize nonlinear solver; set runtime options
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Set linear solver defaults for this problem. By extracting the
! KSP and PC contexts from the SNES context, we can then
! directly call any KSP and PC routines to set various options.
call SNESGetKSP(snes,ksp,ierr)
call KSPGetPC(ksp,pc,ierr)
call PCSetType(pc,PCNONE,ierr)
tol = 1.e-4
call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION, &
& PETSC_DEFAULT_DOUBLE_PRECISION,i20,ierr)
! Set SNES/KSP/PC runtime options, e.g.,
! -snes_view -snes_monitor -ksp_type <ksp> -pc_type <pc>
! These options will override those specified above as long as
! SNESSetFromOptions() is called _after_ any other customization
! routines.
call SNESSetFromOptions(snes,ierr)
call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-setls',setls,ierr)
if (setls) then
call SNESLineSearchSet(snes,MyLineSearch, &
& PETSC_NULL_OBJECT,ierr)
endif
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Evaluate initial guess; then solve nonlinear system
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Note: The user should initialize the vector, x, with the initial guess
! for the nonlinear solver prior to calling SNESSolve(). In particular,
! to employ an initial guess of zero, the user should explicitly set
! this vector to zero by calling VecSet().
pfive = 0.5
call VecSet(x,pfive,ierr)
call SNESSolve(snes,PETSC_NULL_OBJECT,x,ierr)
call SNESGetIterationNumber(snes,its,ierr)
if (rank .eq. 0) then
write(6,100) its
endif
100 format('Number of Newton iterations = ',i5)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Free work space. All PETSc objects should be destroyed when they
! are no longer needed.
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
call VecDestroy(x,ierr)
call VecDestroy(r,ierr)
call MatDestroy(J,ierr)
call SNESDestroy(snes,ierr)
call PetscFinalize(ierr)
end
!
! ------------------------------------------------------------------------
!
! FormFunction - Evaluates nonlinear function, F(x).
!
! Input Parameters:
! snes - the SNES context
! x - input vector
! dummy - optional user-defined context (not used here)
!
! Output Parameter:
! f - function vector
!
subroutine FormFunction(snes,x,f,dummy,ierr)
implicit none
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscsnes.h>
SNES snes
Vec x,f
PetscErrorCode ierr
integer dummy(*)
! Declarations for use with local arrays
PetscScalar lx_v(1),lf_v(1)
PetscOffset lx_i,lf_i
! Get pointers to vector data.
! - For default PETSc vectors, VecGetArray() returns a pointer to
! the data array. Otherwise, the routine is implementation dependent.
! - You MUST call VecRestoreArray() when you no longer need access to
! the array.
! - Note that the Fortran interface to VecGetArray() differs from the
! C version. See the Fortran chapter of the users manual for details.
call VecGetArray(x,lx_v,lx_i,ierr)
call VecGetArray(f,lf_v,lf_i,ierr)
! Compute function
lf_a(1) = lx_a(1)*lx_a(1) &
& + lx_a(1)*lx_a(2) - 3.0
lf_a(2) = lx_a(1)*lx_a(2) &
& + lx_a(2)*lx_a(2) - 6.0
! Restore vectors
call VecRestoreArray(x,lx_v,lx_i,ierr)
call VecRestoreArray(f,lf_v,lf_i,ierr)
return
end
! ---------------------------------------------------------------------
!
! FormJacobian - Evaluates Jacobian matrix.
!
! Input Parameters:
! snes - the SNES context
! x - input vector
! dummy - optional user-defined context (not used here)
!
! Output Parameters:
! A - Jacobian matrix
! B - optionally different preconditioning matrix
! flag - flag indicating matrix structure
!
subroutine FormJacobian(snes,X,jac,B,flag,dummy,ierr)
implicit none
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscmat.h>
#include <finclude/petscpc.h>
#include <finclude/petscsnes.h>
SNES snes
Vec X
Mat jac,B
MatStructure flag
PetscScalar A(4)
PetscErrorCode ierr
PetscInt idx(2),i2
integer dummy(*)
! Declarations for use with local arrays
PetscScalar lx_v(1)
PetscOffset lx_i
! Get pointer to vector data
i2 = 2
call VecGetArray(x,lx_v,lx_i,ierr)
! Compute Jacobian entries and insert into matrix.
! - Since this is such a small problem, we set all entries for
! the matrix at once.
! - Note that MatSetValues() uses 0-based row and column numbers
! in Fortran as well as in C (as set here in the array idx).
idx(1) = 0
idx(2) = 1
A(1) = 2.0*lx_a(1) + lx_a(2)
A(2) = lx_a(1)
A(3) = lx_a(2)
A(4) = lx_a(1) + 2.0*lx_a(2)
call MatSetValues(jac,i2,idx,i2,idx,A,INSERT_VALUES,ierr)
flag = SAME_NONZERO_PATTERN
! Restore vector
call VecRestoreArray(x,lx_v,lx_i,ierr)
! Assemble matrix
call MatAssemblyBegin(jac,MAT_FINAL_ASSEMBLY,ierr)
call MatAssemblyEnd(jac,MAT_FINAL_ASSEMBLY,ierr)
return
end
subroutine MyLineSearch(snes,lctx,x,f,g,y,w,fnorm,ynorm,gnorm, &
& flag,ierr)
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscmat.h>
#include <finclude/petscksp.h>
#include <finclude/petscpc.h>
#include <finclude/petscsnes.h>
SNES snes
integer lctx
Vec x, f,g, y, w
double precision fnorm,ynorm,gnorm
PetscBool flag
PetscErrorCode ierr
PetscScalar mone
mone = -1.0d0
flag = .false.
call VecNorm(y,NORM_2,ynorm,ierr)
call VecAYPX(y,mone,x,ierr)
call SNESComputeFunction(snes,y,g,ierr)
call VecNorm(g,NORM_2,gnorm,ierr)
return
end
Figure 19: Sample Fortran Program: Using PETSc Nonlinear Solvers
Part III
Additional Information
Chapter 11
Profiling
PETSc includes a consistent, lightweight scheme to allow the profiling of application programs. The PETSc
routines automatically log performance data if certain options are specified at runtime. The user can also
log information about application codes for a complete picture of performance. In addition, as described
in Section 11.1.1, PETSc provides a mechanism for printing informative messages about computations.
Section 11.1 introduces the various profiling options in PETSc, while the remainder of the chapter focuses
on details such as monitoring application codes and tips for accurate profiling.
11.1 Basic Profiling Information
If an application code and the PETSc libraries have been configured with --with-log=1, the default,
then various kinds of profiling of code between calls to PetscInitialize() and PetscFinalize() can be activated
at runtime. The profiling options include the following:
-log_summary - Prints an ASCII version of performance data at the program's conclusion. These
statistics are comprehensive and concise and require little overhead; thus, -log_summary is
intended as the primary means of monitoring the performance of PETSc codes.
-info [infofile] - Prints verbose information about code to stdout or an optional file. This
option provides details about algorithms, data structures, etc. Since the overhead of printing such
output slows a code, this option should not be used when evaluating a program's performance.
-log_trace [logfile] - Traces the beginning and ending of all PETSc events. This option,
which can be used in conjunction with -info, is useful to see where a program is hanging without
running in the debugger.
As discussed in Section 11.1.3, additional profiling can be done with MPE.
11.1.1 Interpreting -log_summary Output: The Basics
As shown in Figure 8 (in Part I), the option -log_summary activates printing of profile data to standard
output at the conclusion of a program. Profiling data can also be printed at any time within a program by
calling PetscLogView().
We print performance data for each routine, organized by PETSc libraries, followed by any user-defined
events (discussed in Section 11.2). For each routine, the output data include the maximum time and floating
point operation (flop) rate over all processes. Information about parallel performance is also included, as
discussed in the following section.
For the purpose of PETSc floating point operation counting, we define one flop as one operation of
any of the following types: multiplication, division, addition, or subtraction. For example, one VecAXPY()
operation, which computes y = a x + y for a scalar a and vectors of length N, requires 2N flops (consisting of N additions
and N multiplications). Bear in mind that flop rates present only a limited view of performance, since
memory loads and stores are the real performance barrier.
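The counting rule can be made concrete with a small C sketch. This is not PETSc's implementation of VecAXPY() or of its logging; sketch_axpy is a hypothetical routine that performs the update on plain arrays and tallies flops by the definition above, one multiplication and one addition per entry.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y = a*x + y on plain arrays of length n and returns the number
   of flops performed under the counting rule above (2 per entry). */
long sketch_axpy(size_t n, double a, const double *x, double *y)
{
  long flops = 0;
  size_t i;
  assert(x != NULL && y != NULL);
  for (i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i]; /* one multiplication, one addition */
    flops += 2;
  }
  return flops;
}
```

For vectors of length 3 the routine reports 6 flops, matching the 2N count. Note that each entry also requires two loads and one store, which is why flop rates alone give a limited picture.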
For simplicity, the remainder of this discussion focuses on interpreting profile data for the KSP library,
which provides the linear solvers at the heart of the PETSc package. Recall the hierarchical organization of
the PETSc library, as shown in Figure 2. Each KSP solver is composed of a PC (preconditioner) and a KSP
(Krylov subspace) part, which are in turn built on top of the Mat (matrix) and Vec (vector) modules. Thus,
operations in the KSP module are composed of lower-level operations in these packages. Note also that the
nonlinear solvers library, SNES, is built on top of the KSP module, and the timestepping library, TS, is in
turn built on top of SNES.
We briefly discuss interpretation of the sample output in Figure 8, which was generated by solving
a linear system on one process using restarted GMRES and ILU preconditioning. The linear solvers in
KSP consist of two basic phases, KSPSetUp() and KSPSolve(), each of which consists of a variety of
actions, depending on the particular solution technique. For the case of using the PCILU preconditioner
and KSPGMRES Krylov subspace method, the breakdown of PETSc routines is listed below. As indicated
by the levels of indentation, the operations in KSPSetUp() include all of the operations within PCSetUp(),
which in turn include MatILUFactor(), and so on.
KSPSetUp - Set up linear solver
    PCSetUp - Set up preconditioner
        MatILUFactor - Factor preconditioning matrix
            MatILUFactorSymbolic - Symbolic factorization phase
            MatLUFactorNumeric - Numeric factorization phase
KSPSolve - Solve linear system
    PCApply - Apply preconditioner
        MatSolve - Forward/backward triangular solves
    KSPGMRESOrthog - Orthogonalization in GMRES
        VecDot or VecMDot - Inner products
    MatMult - Matrix-vector product
    MatMultAdd - Matrix-vector product + vector addition
    VecScale, VecNorm, VecAXPY, VecCopy, ...
The summaries printed via -log_summary reflect this routine hierarchy. For example, the performance
summaries for a particular high-level routine such as KSPSolve include all of the operations
accumulated in the lower-level components that make up the routine.
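The inclusive accounting described here can be sketched in C as a sum over a tree of events. The structure below is purely illustrative and is not how PETSc stores its logs: SketchEvent, SKETCH_MAX_CHILDREN, and sketch_inclusive_time are hypothetical names, and any times supplied to it are made-up inputs rather than measurements.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

/* One node of a hypothetical routine hierarchy. */
typedef struct SketchEvent {
  const char *name;
  double self_time;  /* time spent in the routine itself, excluding callees */
  size_t nchildren;
  const struct SketchEvent *children[SKETCH_MAX_CHILDREN];
} SketchEvent;

/* Time attributed to a routine in an inclusive summary:
   its own time plus the inclusive time of every lower-level routine. */
double sketch_inclusive_time(const SketchEvent *e)
{
  double total;
  size_t i;
  assert(e != NULL && e->nchildren <= SKETCH_MAX_CHILDREN);
  total = e->self_time;
  for (i = 0; i < e->nchildren; i++) {
    total += sketch_inclusive_time(e->children[i]);
  }
  return total;
}
```

For instance, if a KSPSolve node with 0.25 seconds of its own work has children PCApply (0.25 seconds, itself calling MatSolve for 0.5 seconds) and MatMult (1.0 second), the inclusive time reported for KSPSolve would be 2.0 seconds.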
Admittedly, we do not currently present the output with -log_summary so that the hierarchy of PETSc
operations is completely clear, primarily because we have not determined a clean and uniform way to do
so throughout the library. Improvements may follow. However, for a particular problem, the user should
generally have an idea of the basic operations that are required for its implementation (e.g., which operations
are performed when using GMRES and ILU, as described above), so that interpreting the -log_summary
data should be relatively straightforward.
11.1.2 Interpreting -log_summary Output: Parallel Performance
We next discuss performance summaries for parallel programs, as shown within Figures 20 and 21, which
present the combined output generated by the -log_summary option. The program that generated this
mpiexec -n 4 ./ex10 -f0 medium -f1 arco6 -ksp_gmres_classicalgramschmidt -log_summary -mat_type baij \
-matload_block_size 3 -pc_type bjacobi -options_left
Number of iterations = 19
Residual norm = 7.7643e-05
Number of iterations = 55
Residual norm = 6.3633e-01
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
ex10 on a rs6000 named p039 with 4 processors, by mcinnes Wed Jul 24 16:30:22 1996
Max Min Avg Total
Time (sec): 3.289e+01 1.0 3.288e+01
Objects: 1.130e+02 1.0 1.130e+02
Flops: 2.195e+08 1.0 2.187e+08 8.749e+08
Flops/sec: 6.673e+06 1.0 2.660e+07
MPI Messages: 2.205e+02 1.4 1.928e+02 7.710e+02
MPI Message Lengths: 7.862e+06 2.5 5.098e+06 2.039e+07
MPI Reductions: 1.850e+02 1.0
Summary of Stages: ---- Time ------ ----- Flops ------- -- Messages -- -- Message-lengths -- Reductions -
Avg %Total Avg %Total counts %Total avg %Total counts %Total
0: Load System 0: 1.191e+00 3.6% 3.980e+06 0.5% 3.80e+01 4.9% 6.102e+04 0.3% 1.80e+01 9.7%
1: KSPSetup 0: 6.328e-01 2.5% 1.479e+04 0.0% 0.00e+00 0.0% 0.000e+00 0.0% 0.00e+00 0.0%
2: KSPSolve 0: 2.269e-01 0.9% 1.340e+06 0.0% 1.52e+02 19.7% 9.405e+03 0.0% 3.90e+01 21.1%
3: Load System 1: 2.680e+01 107.3% 0.000e+00 0.0% 2.10e+01 2.7% 1.799e+07 88.2% 1.60e+01 8.6%
4: KSPSetup 1: 1.867e-01 0.7% 1.088e+08 2.3% 0.00e+00 0.0% 0.000e+00 0.0% 0.00e+00 0.0%
5: KSPSolve 1: 3.831e+00 15.3% 2.217e+08 97.1% 5.60e+02 72.6% 2.333e+06 11.4% 1.12e+02 60.5%
------------------------------------------------------------------------------------------------------------------------
.... [Summary of various phases, see part II below] ...
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants Mem.
Viewer 5 5 0 0
Index set 10 10 127076 0
Vector 76 76 9152040 0
Vector Scatter 2 2 106220 0
Matrix 8 8 9611488 5.59773e+06
Krylov Solver 4 4 33960 7.5966e+06
Preconditioner 4 4 16 9.49114e+06
KSP 4 4 0 1.71217e+07
Figure 20: Profiling a PETSc Program: Part I - Overall Summary
data is ${PETSC_DIR}/src/ksp/ksp/examples/ex10.c. The code loads a matrix and right-hand-side
vector from a binary file and then solves the resulting linear system; the program then repeats
this process for a second linear system. This particular case was run on four processors of an IBM SP, using
restarted GMRES and the block Jacobi preconditioner, where each block was solved with ILU.
Figure 20 presents an overall performance summary, including times, floating-point operations,
computational rates, and message-passing activity (such as the number and size of messages sent and collective
operations). Summaries for various user-defined stages of monitoring (as discussed in Section 11.3) are
also given. Information about the various phases of computation then follows (as shown separately here in
Figure 21). Finally, a summary of memory usage and object creation and destruction is presented.
We next focus on the summaries for the various phases of the computation, as given in the table within
Figure 21. The summary for each phase presents the maximum times and flop rates over all processes, as
well as the ratio of maximum to minimum times and flop rates for all processes. A ratio of approximately
1 indicates that computations within a given phase are well balanced among the processes; as the ratio
increases, the balance becomes increasingly poor. Also, the total computational rate (in units of MFlops/sec)
is given for each phase in the final column of the phase summary table.
Total Mflop/sec = 10^{-6} * (sum of flops over all processors)/(max time over all processors)
Note: Total computational rates < 1 MFlop are listed as 0 in this column of the phase summary table.
Additional statistics for each phase include the total number of messages sent, the average message length,
and the number of global reductions.
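The two derived quantities in the phase summary, the max/min ratio and the total rate, are simple enough to sketch in C. The helpers below are hypothetical (sketch_total_mflops and sketch_max_min_ratio are not PETSc functions) and assume that the per-process values have already been gathered into a plain array.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Total rate in Mflop/sec: 1e-6 * (sum of flops over all processes)
   divided by (maximum time over all processes). */
double sketch_total_mflops(double sum_flops, double max_time)
{
  assert(max_time > 0.0);
  return 1.0e-6 * sum_flops / max_time;
}

/* Ratio of the maximum to the minimum of per-process values;
   a value near 1 indicates a well-balanced phase. */
double sketch_max_min_ratio(const double *v, size_t nproc)
{
  double vmax, vmin;
  size_t i;
  assert(v != NULL && nproc > 0);
  vmax = vmin = v[0];
  for (i = 1; i < nproc; i++) {
    if (v[i] > vmax) vmax = v[i];
    if (v[i] < vmin) vmin = v[i];
  }
  assert(vmin > 0.0); /* the ratio is undefined for a zero minimum */
  return vmax / vmin;
}
```

As a consistency check against Figure 20, the total of 8.749e+08 flops and the maximum time of 3.289e+01 seconds give roughly 26.6 Mflop/sec, in line with the total Flops/sec entry of 2.660e+07 in that figure.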
As discussed in the preceding section, the performance summaries for higher-level PETSc routines in-
mpiexec -n 4 ./ex10 -f0 medium -f1 arco6 -ksp_gmres_classicalgramschmidt -log_summary -mat_type baij \
-matload_block_size 3 -pc_type bjacobi -options_left
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
.... [Overall summary, see part I] ...
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: optional user-defined stages of a computation. Set stages with PLogStagePush() and PLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10^6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Phase Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
...
--- Event Stage 4: KSPSetUp 1
MatGetReordering 1 3.491e-03 1.0 0.0e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatILUFctrSymbol 1 6.970e-03 1.2 0.0e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatLUFactorNumer 1 1.829e-01 1.1 3.2e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 90 99 0 0 0 110
KSPSetUp 2 1.989e-01 1.1 2.9e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 99 99 0 0 0 102
PCSetUp 2 1.952e-01 1.1 2.9e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 97 99 0 0 0 104
PCSetUpOnBlocks 1 1.930e-01 1.1 3.0e+07 1.1 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 96 99 0 0 0 105
--- Event Stage 5: KSPSolve 1
MatMult 56 1.199e+00 1.1 5.3e+07 1.0 1.1e+03 4.2e+03 0.0e+00 5 28 99 23 0 30 28 99 99 0 201
MatSolve 57 1.263e+00 1.0 4.7e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 27 0 0 0 33 28 0 0 0 187
VecNorm 57 1.528e-01 1.3 2.7e+07 1.3 0.0e+00 0.0e+00 2.3e+02 1 1 0 0 31 3 1 0 0 51 81
VecScale 57 3.347e-02 1.0 4.7e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 184
VecCopy 2 1.703e-03 1.1 0.0e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 3 2.098e-03 1.0 0.0e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 3 3.247e-03 1.1 5.4e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 200
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/Vec/VecMDot.html#VecMDot}{VecMDot}\findex{VecMDot} 55 5.216e-01 1.2 9.8e+07 1.2 0.0e+00 0.0e+00 2.2e+02 2 20 0 0 30 12 20 0 0 49 327
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/Vec/VecMAXPY.html#VecMAXPY}{VecMAXPY}\findex{VecMAXPY} 57 6.997e-01 1.1 6.9e+07 1.1 0.0e+00 0.0e+00 0.0e+00 3 21 0 0 0 18 21 0 0 0 261
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/Vec/VecScatterBegin.html#VecScatterBegin}{VecScatterBegin}\findex{VecScatterBegin} 56 4.534e-02 1.8 0.0e+00 0.0 1.1e+03 4.2e+03 0.0e+00 0 0 99 23 0 1 0 99 99 0 0
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/Vec/VecScatterEnd.html#VecScatterEnd}{VecScatterEnd}\findex{VecScatterEnd} 56 2.095e-01 1.2 0.0e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 5 0 0 0 0 0
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/KSP/KSPSolve.html#KSPSolve}{KSPSolve}\findex{KSPSolve} 1 3.832e+00 1.0 5.6e+07 1.0 1.1e+03 4.2e+03 4.5e+02 15 97 99 23 61 99 99 99 99 99 222
KSPGMRESOrthog 55 1.177e+00 1.1 7.9e+07 1.1 0.0e+00 0.0e+00 2.2e+02 4 39 0 0 30 29 40 0 0 49 290
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/PC/PCSetUpOnBlocks.html#PCSetUpOnBlocks}{PCSetUpOnBlocks}\findex{PCSetUpOnBlocks} 1 1.180e-05 1.1 0.0e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
\href{http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-3.2/docs/manualpages/PC/PCApply.html#PCApply}{PCApply}\findex{PCApply} 57 1.267e+00 1.0 4.7e+07 1.0 0.0e+00 0.0e+00 0.0e+00 5 27 0 0 0 33 28 0 0 0 186
------------------------------------------------------------------------------------------------------------------------
.... [Conclusion of overall summary, see part I] ...
Figure 21: Profiling a PETSc Program: Part II - Phase Summaries
clude the statistics for the lower levels of which they are made up. For example, the communication within
matrix-vector products MatMult() consists of vector scatter operations, as given by the routines
VecScatterBegin() and VecScatterEnd().
The final data presented are the percentages of the various statistics (time (%T), flops/sec (%F), messages (%M),
average message length (%L), and reductions (%R)) for each event relative to the total computation and to
any user-defined stages (discussed in Section 11.3). These statistics can aid in optimizing performance,
since they indicate the sections of code that could benefit from various kinds of tuning. Chapter 12 gives
suggestions about achieving good performance with PETSc codes.
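As a rough illustration of how the Ratio, Total Mflop/s, and percentage columns relate to per-process measurements, the following standalone C sketch computes them from arrays of per-process values. The function names are hypothetical and the code does not call PETSc; it only restates the definitions given in the legend of Figure 21. (The legend writes the scale factor as 10^6; converting flops per second to Mflop/s means dividing by 10^6.)

```c
#include <assert.h>
#include <stddef.h>

/* Ratio column: maximum over processes divided by minimum over processes.
   Returns 0 when the minimum is zero, mirroring the 0.0 entries in the figure. */
double max_min_ratio(const double v[], size_t n)
{
  double lo, hi;
  size_t i;
  assert(n > 0);
  lo = hi = v[0];
  for (i = 1; i < n; i++) {
    if (v[i] < lo) lo = v[i];
    if (v[i] > hi) hi = v[i];
  }
  return lo > 0.0 ? hi / lo : 0.0;
}

/* Total Mflop/s: (sum of flops over all processes)/(max time over all
   processes), expressed in units of 10^6 flops per second. */
double total_mflops(const double flops[], const double seconds[], size_t n)
{
  double sum = 0.0, tmax = 0.0;
  size_t i;
  for (i = 0; i < n; i++) {
    sum += flops[i];
    if (seconds[i] > tmax) tmax = seconds[i];
  }
  return tmax > 0.0 ? 1.0e-6 * sum / tmax : 0.0;
}

/* Percentage columns (%T, %F, ...): share of one event in a total. */
double percent_of(double part, double whole)
{
  return whole > 0.0 ? 100.0 * part / whole : 0.0;
}
```

For two processes that perform 2.0e6 and 4.0e6 flops in 1.0 and 2.0 seconds, total_mflops() gives about 3.0 and max_min_ratio() applied to the times gives 2.0.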
11.1.3 Using -log_mpe with Upshot/Jumpshot
It is also possible to use the Upshot (or Jumpshot) package [11] to visualize PETSc events. This package
comes with the MPE software, which is part of the MPICH [9] implementation of MPI. The option
-log_mpe [logfile]
creates a logfile of events appropriate for viewing with Upshot. The user can either use the default logging
file, mpe.log, or specify an optional name via logfile.
By default, not all PETSc events are logged with MPE. For example, since MatSetValues() may be called
thousands of times in a program, by default its calls are not logged with MPE. To activate MPE logging of
a particular event, one should use the command
PetscLogEventMPEActivate(int event);
To deactivate logging of an event for MPE, one should use
PetscLogEventMPEDeactivate(int event);
The event may be either a predefined PETSc event (as listed in the file ${PETSC_DIR}/include/petsclog.h)
or one obtained with PetscLogEventRegister() (as described in Section 11.2). These routines
may be called as many times as desired in an application program, so that one could restrict MPE event
logging only to certain code segments.
To see what events are logged by default, the user can view the source code; see the files
src/plot/src/plogmpe.c and include/petsclog.h. A simple program and GUI to see the events
that are predefined and their definitions is being developed.
The user can also log MPI events. To do this, simply consider the PETSc application as any MPI
application, and follow the MPI implementation's instructions for logging MPI calls. For example, when
using MPICH, this merely requires adding -llmpich to the library list before -lmpich.
11.2 Profiling Application Codes
PETSc automatically logs object creation, times, and floating-point counts for the library routines. Users can
easily supplement this information by monitoring their application codes as well. The basic steps involved
in logging a user-defined portion of code, called an event, are shown in the code fragment below:
#include "petsclog.h"
PetscLogEvent USER_EVENT;
PetscLogEventRegister("User event name",0,&USER_EVENT);
PetscLogEventBegin(USER_EVENT,0,0,0,0);
/* application code segment to monitor */
PetscLogFlops(number of flops for this code segment);
PetscLogEventEnd(USER_EVENT,0,0,0,0);
One must register the event by calling PetscLogEventRegister(), which assigns a unique integer to
identify the event for profiling purposes:
PetscLogEventRegister(const char string[],PetscClassId classid,PetscLogEvent *e);
Here string is a user-defined event name, and classid is the class identifier with which the event is
associated (0 in the fragment above); one should see the manual page for details. The argument returned in e should
then be passed to the PetscLogEventBegin() and PetscLogEventEnd() routines.
Events are logged by using the pair
PetscLogEventBegin(int event,PetscObject o1,PetscObject o2,
PetscObject o3,PetscObject o4);
PetscLogEventEnd(int event,PetscObject o1,PetscObject o2,
PetscObject o3,PetscObject o4);
The four objects are the PETSc objects that are most closely associated with the event. For instance, in
a matrix-vector product they would be the matrix and the two vectors. These objects can be omitted by
specifying 0 for o1 - o4. The code between these two routine calls will be automatically timed and logged
as part of the specified event.
The user can log the number of floating-point operations for this segment of code by calling
PetscLogFlops(number of flops for this code segment);
between the calls to PetscLogEventBegin() and PetscLogEventEnd(). This value will automatically be added
to the global flop counter for the entire program.
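To make concrete what a begin/end pair and a flop count accumulate, here is a small standalone C analogue. It is only a sketch: the UserEvent structure and the event_* functions are hypothetical stand-ins, not PETSc routines, and PETSc's own logging records considerably more (messages, reductions, stages).

```c
#include <assert.h>
#include <time.h>

/* Hypothetical record of what is accumulated for one user-defined event. */
typedef struct {
  const char *name;
  int         count;    /* number of completed begin/end pairs */
  double      seconds;  /* total wall-clock time between begin and end */
  double      flops;    /* total flops reported by the user */
  double      t_begin;  /* time stamp taken by the most recent begin */
} UserEvent;

static double now_seconds(void)
{
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);            /* C11 wall-clock time */
  return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

void event_begin(UserEvent *e)             { e->t_begin = now_seconds(); }
void event_log_flops(UserEvent *e, double n) { e->flops += n; }
void event_end(UserEvent *e)
{
  e->seconds += now_seconds() - e->t_begin;
  e->count++;
}

/* A monitored code segment: a dot product of length n. */
double monitored_dot(UserEvent *e, const double *x, const double *y, int n)
{
  double s = 0.0;
  int i;
  event_begin(e);
  for (i = 0; i < n; i++) s += x[i] * y[i];
  event_log_flops(e, 2.0 * n);             /* n multiplications and n additions */
  event_end(e);
  return s;
}
```

The point of the design is that the user supplies only the flop count; the timing and the call count come for free from the begin/end pair.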
11.3 Profiling Multiple Sections of Code
By default, the profiling produces a single set of statistics for all code between the PetscInitialize() and
PetscFinalize() calls within a program. One can independently monitor up to ten stages of code by switching
among the various stages with the commands
PetscLogStagePush(PetscLogStage stage);
PetscLogStagePop();
where stage is an integer (0-9); see the manual pages for details. The command
PetscLogStageRegister(const char *name,PetscLogStage *stage)
allows one to associate a name with a stage; these names are printed whenever summaries are generated
with -log_summary or PetscLogView(). The following code fragment uses three profiling stages within
a program.
PetscInitialize(&argc,&args,0,0);
/* stage 0 of code here */
PetscLogStageRegister("Stage 0 of Code",&stagenum0);
/* register stages 1 and 2 once, outside the loop */
PetscLogStageRegister("Stage 1 of Code",&stagenum1);
PetscLogStageRegister("Stage 2 of Code",&stagenum2);
for (i=0; i<ntimes; i++) {
  PetscLogStagePush(stagenum1);
  /* stage 1 of code here */
  PetscLogStagePop();
  PetscLogStagePush(stagenum2);
  /* stage 2 of code here */
  PetscLogStagePop();
}
PetscFinalize();
Figures 20 and 21 show output generated by -log_summary for a program that employs several
profiling stages. In particular, this program is subdivided into six stages: loading a matrix and right-hand-side
vector from a binary file, setting up the preconditioner, and solving the linear system; this sequence is
then repeated for a second linear system. For simplicity, Figure 21 contains output only for stages 4 and
5 (linear solve of the second system), which comprise the part of this computation of most interest to us
in terms of performance monitoring. This code organization (solving a small linear system followed by a
larger system) enables generation of more accurate profiling statistics for the second system by overcoming
the often considerable overhead of paging, as discussed in Section 11.8.
11.4 Restricting Event Logging
By default, all PETSc operations are logged. To enable or disable the PETSc logging of individual events,
one uses the commands
PetscLogEventActivate(int event);
PetscLogEventDeactivate(int event);
The event may be either a predefined PETSc event (as listed in the file ${PETSC_DIR}/include/petsclog.h)
or one obtained with PetscLogEventRegister() (as described in Section 11.2).
PETSc also provides routines that deactivate (or activate) logging for entire components of the library.
Currently, the components that support such logging (de)activation are Mat (matrices), Vec (vectors), KSP
(linear solvers, including KSP and PC), and SNES (nonlinear solvers):
PetscLogEventDeactivateClass(MAT_CLASSID);
PetscLogEventDeactivateClass(KSP_CLASSID); /* includes PC and KSP */
PetscLogEventDeactivateClass(VEC_CLASSID);
PetscLogEventDeactivateClass(SNES_CLASSID);
and
PetscLogEventActivateClass(MAT_CLASSID);
PetscLogEventActivateClass(KSP_CLASSID); /* includes PC and KSP */
PetscLogEventActivateClass(VEC_CLASSID);
PetscLogEventActivateClass(SNES_CLASSID);
Recall that the option -log_all produces extensive profile data, which can be a challenge for PETScView
to handle due to the memory limitations of Tcl/Tk. Thus, one should generally use -log_all when running
programs with a relatively small number of events or when disabling some of the events that occur
many times in a code (e.g., VecSetValues(), MatSetValues()).
Section 11.1.3 gives information on the restriction of events in MPE logging.
11.5 Interpreting -info Output: Informative Messages
Users can activate the printing of verbose information about algorithms, data structures, etc. to the screen
by using the option -info or by calling PetscInfoAllow(PETSC_TRUE). Such logging, which is used
throughout the PETSc libraries, can aid the user in understanding algorithms and tuning program performance.
For example, as discussed in Section 3.1.1, -info activates the printing of information about
memory allocation during matrix assembly.
Application programmers can employ this logging as well, by using the routine
PetscInfo(void* obj,char *message,...)
where obj is the PETSc object associated most closely with the logging statement, message. For example,
in the line search Newton methods, we use a statement such as
PetscInfo(snes,"Cubically determined step, lambda %g\n",lambda);
One can selectively turn off informative messages about any of the basic PETSc objects (e.g., Mat,
SNES) with the command
PetscInfoDeactivateClass(int object_classid)
where object_classid is one of MAT_CLASSID, SNES_CLASSID, etc. Messages can be reactivated
with the command
PetscInfoActivateClass(int object_classid)
Such deactivation can be useful when one wishes to view information about higher-level PETSc libraries
(e.g., TS and SNES) without seeing all lower level data as well (e.g., Mat). One can deactivate events at
runtime for matrix and linear solver libraries via -info [no_mat, no_ksp].
11.6 Time
PETSc application programmers can access the wall clock time directly with the command
PetscLogDouble time;
ierr = PetscGetTime(&time);CHKERRQ(ierr);
which returns the current time in seconds since the epoch, and is commonly implemented with MPI_Wtime.
A floating-point number is returned in order to express fractions of a second. In addition, as discussed in
Section 11.2, PETSc can automatically profile user-defined segments of code.
11.7 Saving Output to a File
All output from PETSc programs (including informative messages, profiling information, and convergence
data) can be saved to a file by using the command line option -history [filename]. If no file
name is specified, the output is stored in the file ${HOME}/.petschistory. Note that this option
only saves output printed with the PetscPrintf() and PetscFPrintf() commands, not the standard printf()
and fprintf() statements.
11.8 Accurate Profiling: Overcoming the Overhead of Paging
One factor that often plays a significant role in profiling a code is paging by the operating system. Generally,
when running a program only a few pages required to start it are loaded into memory rather than the entire
executable. When the execution proceeds to code segments that are not in memory, a page fault occurs,
prompting the required pages to be loaded from the disk (a very slow process). This activity distorts the
results significantly. (The paging effects are noticeable in the log files generated by -log_mpe, which
is described in Section 11.1.3.)
To eliminate the effects of paging when profiling the performance of a program, we have found an
effective procedure is to run the exact same code on a small dummy problem before running it on the actual
problem of interest. We thus ensure that all code required by a solver is loaded into memory during solution
of the small problem. When the code proceeds to the actual (larger) problem of interest, all required pages
have already been loaded into main memory, so that the performance numbers are not distorted.
When this procedure is used in conjunction with the user-defined stages of profiling described in
Section 11.3, we can focus easily on the problem of interest. For example, we used this technique in the
program ${PETSC_DIR}/src/ksp/ksp/examples/tutorials/ex10.c to generate the timings
within Figures 20 and 21. In this case, the profiled code of interest (solving the linear system for the larger
problem) occurs within event stages 4 and 5. Section 11.1.2 provides details about interpreting such profiling
data.
In particular, the macros
PetscPreLoadBegin(PetscBool ,char* stagename),
PetscPreLoadStage(char *stagename),
and
PetscPreLoadEnd()
can be used to easily convert a regular PETSc program to one that uses preloading. The command line
options -preload true and -preload false may be used to turn on and off preloading at run time
for PETSc programs that use these macros.
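The preloading idea itself (run a small dummy problem first, then time only the problem of interest) can be sketched in plain C as follows. This is an illustration of the pattern under simple assumptions, not the expansion of the PetscPreLoad macros; the names run_preloaded, SolveFn, and sum_of_squares are hypothetical.

```c
#include <assert.h>
#include <time.h>

typedef double (*SolveFn)(int n);  /* stand-in for "solve a problem of size n" */

static double wall_seconds(void)
{
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* If preload is nonzero, first run the solver on the small problem so that
   its code pages are already in memory; only the large run is timed. */
double run_preloaded(SolveFn solve, int n_small, int n_large, int preload,
                     double *timed_seconds)
{
  double t0, result;
  if (preload) (void)solve(n_small);   /* warm-up pass, result discarded */
  t0 = wall_seconds();
  result = solve(n_large);             /* the run of interest */
  *timed_seconds = wall_seconds() - t0;
  return result;
}

/* Trivial workload used only to exercise the pattern. */
double sum_of_squares(int n)
{
  double s = 0.0;
  int i;
  for (i = 1; i <= n; i++) s += (double)i * (double)i;
  return s;
}
```

The preload flag plays the role of the -preload true and -preload false options: the computed result is the same either way, and only the reported time is affected.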
Chapter 12
Hints for Performance Tuning
This chapter presents some tips on achieving good performance within PETSc codes. We urge users to read
these hints before evaluating the performance of PETSc application codes.
12.1 Compiler Options
Code configured with --with-debugging=0 is faster than the default debugging version, so we
recommend using one of the optimized versions of code when evaluating performance.
12.2 Profiling
Users should not spend time optimizing a code until after having determined where it spends the bulk of its
time on realistically sized problems. As discussed in detail in Chapter 11, the PETSc routines automatically
log performance data if certain runtime options are specified. We briefly highlight usage of these features
below.
Run the code with the option -log_summary to print a performance summary for various phases
of the code.
Run the code with the option -log_mpe [logfilename], which creates a logfile of events
suitable for viewing with Upshot or Nupshot (part of MPICH).
12.3 Aggregation
Performing operations on chunks of data rather than a single element at a time can significantly enhance
performance.
Insert several (many) elements of a matrix or vector at once, rather than looping and inserting a single
value at a time. In order to access elements of a vector repeatedly, employ VecGetArray() to allow
direct manipulation of the vector elements.
When possible, use VecMDot() rather than a series of calls to VecDot().
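One plausible reason that an aggregated multiple dot product can beat a series of single dot products is that the vector x is read once rather than once per product (in parallel, the reductions can also be combined). The standalone C sketch below contrasts the two access patterns on plain arrays; dot() and multi_dot() are hypothetical helpers and say nothing about how VecDot() or VecMDot() are actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* One dot product per call: x is traversed again for every y. */
double dot(const double *x, const double *y, size_t n)
{
  double s = 0.0;
  size_t i;
  for (i = 0; i < n; i++) s += x[i] * y[i];
  return s;
}

/* Aggregated form: a single sweep over x updates all m results,
   r[j] = sum_i x[i]*y[j][i]. */
void multi_dot(const double *x, size_t n, size_t m,
               const double *const y[], double r[])
{
  size_t i, j;
  for (j = 0; j < m; j++) r[j] = 0.0;
  for (i = 0; i < n; i++) {
    const double xi = x[i];            /* loaded once, reused m times */
    for (j = 0; j < m; j++) r[j] += xi * y[j][i];
  }
}
```

With x = (1,2,3), y[0] = (1,0,0), and y[1] = (1,1,1), multi_dot() returns r = (1, 6), the same values that two separate calls to dot() produce.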
12.4 Efficient Memory Allocation
12.4.1 Sparse Matrix Assembly
Since the process of dynamic memory allocation for sparse matrices is inherently very expensive,
accurate preallocation of memory is crucial for efficient sparse matrix assembly. One should use the matrix
creation routines for particular data structures, such as MatCreateSeqAIJ() and MatCreateMPIAIJ() for
compressed, sparse row formats, instead of the generic MatCreate() routine. For problems with multiple
degrees of freedom per node, the block, compressed, sparse row formats, created by MatCreateSeqBAIJ()
and MatCreateMPIBAIJ(), can significantly enhance performance. Section 3.1.1 includes extensive details
and examples regarding preallocation.
12.4.2 Sparse Matrix Factorization
When symbolically factoring an AIJ matrix, PETSc has to guess how much fill there will be. Careful use
of the fill parameter in the MatILUInfo structure when calling MatLUFactorSymbolic() or
MatILUFactorSymbolic() can reduce greatly the number of mallocs and copies required, and thus greatly improve the
performance of the factorization. One way to determine a good value for f is to run a program with the
option -info. The symbolic factorization phase will then print information such as
Info:MatILUFactorSymbolic AIJ:Realloc 12 Fill ratio:given 1 needed 2.16423
This indicates that the user should have used a fill estimate factor of about 2.17 (instead of 1) to prevent the
12 required mallocs and copies. The command line option
-pc_ilu_fill 2.17
will cause PETSc to preallocate the correct amount of space for incomplete (ILU) factorization. The
corresponding option for direct (LU) factorization is -pc_factor_fill <fill_amount>.
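The step from the reported "needed" ratio to a fill value can be automated when many runs are being tuned. The following standalone C sketch parses a line in the format of the sample message above and rounds the needed ratio up to two decimals. The function names are hypothetical, and the format string is an assumption taken from that single sample line; the exact text printed by a given PETSc version may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Extracts the reallocation count and the given/needed fill ratios from a
   line shaped like the sample -info message. Returns 1 on success, 0 otherwise. */
int parse_fill_info(const char *line, int *reallocs, double *given, double *needed)
{
  return sscanf(line,
                "Info:MatILUFactorSymbolic AIJ:Realloc %d Fill ratio:given %lf needed %lf",
                reallocs, given, needed) == 3;
}

/* Rounds a nonnegative value up to the next hundredth, so that the
   suggested fill is never smaller than what was needed. */
double round_up_hundredths(double v)
{
  long scaled = (long)(v * 100.0);            /* truncates toward zero */
  if ((double)scaled < v * 100.0) scaled++;   /* bump if anything was cut off */
  return (double)scaled / 100.0;
}
```

Applied to the sample line, the parse yields 12 reallocations with ratios 1 and 2.16423, and rounding up gives the value 2.17 used in the text.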
12.4.3 PetscMalloc() Calls
Users should employ a reasonable number of PetscMalloc() calls in their codes. Hundreds or thousands of
memory allocations may be appropriate; however, if tens of thousands are being used, then reducing the
number of PetscMalloc() calls may be warranted. For example, reusing space or allocating large chunks and
dividing it into pieces can produce a significant savings in allocation overhead. Section 12.5 gives details.
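The idea of allocating one large chunk and dividing it into pieces can be sketched in standalone C as follows. The example uses the standard malloc() and free() so that it compiles without PETSc; the same pattern would apply with PetscMalloc() and PetscFree(). The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Carves m work arrays of length n out of a single allocation, so that
   two allocator calls replace m+1. Returns NULL on failure. */
double **alloc_work_arrays(size_t m, size_t n)
{
  double **rows;
  double  *chunk;
  size_t   j;
  assert(m > 0 && n > 0);
  rows = malloc(m * sizeof *rows);
  if (!rows) return NULL;
  chunk = malloc(m * n * sizeof *chunk);     /* one large chunk */
  if (!chunk) { free(rows); return NULL; }
  for (j = 0; j < m; j++) rows[j] = chunk + j * n;   /* piece j */
  return rows;
}

/* Releases everything obtained from alloc_work_arrays(). */
void free_work_arrays(double **rows)
{
  if (rows) {
    free(rows[0]);   /* rows[0] is the start of the single chunk */
    free(rows);
  }
}
```

Because the pieces are contiguous, the layout also tends to be friendlier to the cache than m independently allocated arrays.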
12.5 Data Structure Reuse
Data structures should be reused whenever possible. For example, if a code often creates new matrices or
vectors, there often may be a way to reuse some of them. Very significant performance improvements can
be achieved by reusing matrix data structures with the same nonzero pattern. If a code creates thousands of
matrix or vector objects, performance will be degraded. For example, when solving a nonlinear problem or
timestepping, reusing the matrices and their nonzero structure for many steps when appropriate can make
the code run significantly faster.
A simple technique for saving work vectors, matrices, etc. is employing a user-defined context. In C and
C++ such a context is merely a structure in which various objects can be stashed; in Fortran a user context
can be an integer array that contains both parameters and pointers to PETSc objects. See
${PETSC_DIR}/snes/examples/tutorials/ex5.c and ${PETSC_DIR}/snes/examples/tutorials/ex5f.F
for examples of user-defined application contexts in C and Fortran, respectively.
12.6 Numerical Experiments
PETSc users should run a variety of tests. For example, there are a large number of options for the linear
and nonlinear equation solvers in PETSc, and different choices can make a very big difference in
convergence rates and execution times. PETSc employs defaults that are generally reasonable for a wide range
of problems, but clearly these defaults cannot be best for all cases. Users should experiment with many
combinations to determine what is best for a given problem and customize the solvers accordingly.
Use the options -snes_view, -ksp_view, etc. (or the routines KSPView(), SNESView(), etc.)
to view the options that have been used for a particular solver.
Run the code with the option -help for a list of the available runtime commands.
Use the option -info to print details about the solvers' operation.
Use the PETSc monitoring discussed in Chapter 11 to evaluate the performance of various numerical
methods.
12.7 Tips for Efficient Use of Linear Solvers
As discussed in Chapter 4, the default linear solvers are
uniprocess: GMRES(30) with ILU(0) preconditioning
multiprocess: GMRES(30) with block Jacobi preconditioning, where there is 1 block per process, and
each block is solved with ILU(0)
One should experiment to determine alternatives that may be better for various applications. Recall that one
can specify the KSP methods and preconditioners at runtime via the options:
-ksp_type <ksp_name> -pc_type <pc_name>
One can also specify a variety of runtime customizations for the solvers, as discussed throughout the manual.
In particular, note that the default restart parameter for GMRES is 30, which may be too small for some
large-scale problems. One can alter this parameter with the option -ksp_gmres_restart <restart> or
by calling KSPGMRESSetRestart(). Section 4.3 gives information on setting alternative GMRES
orthogonalization routines, which may provide much better parallel performance.
12.8 Detecting Memory Allocation Problems
PETSc provides a number of tools to aid in detection of problems with memory allocation, including leaks
and use of uninitialized space. We briefly describe these below.
The PETSc memory allocation (which collects statistics and performs error checking) is employed
by default for codes compiled in debug mode (configured with --with-debugging=1). PETSc
memory allocation can be activated for optimized mode (configured with --with-debugging=0)
using the option -malloc. The option -malloc=0 forces the use of conventional memory
allocation when debugging is enabled. When running timing tests, one should build libraries in
optimized mode.
When the PETSc memory allocation routines are used, the option -malloc_dump will print a list
of unfreed memory at the conclusion of a program. If all memory has been freed, only a message
stating the maximum allocated space will be printed. However, if some memory remains unfreed, this
information will be printed. Note that the option -malloc_dump merely activates a call to
PetscMallocDump() during PetscFinalize(); the user can also call PetscMallocDump() elsewhere in
a program.
Another useful option for use with PETSc memory allocation routines is -malloc_log, which
activates logging of all calls to malloc and reports memory usage, including all Fortran arrays. This option
provides a more complete picture than -malloc_dump for codes that employ Fortran with
hardwired arrays. The option -malloc_log activates logging by calling PetscMallocSetDumpLog()
in PetscInitialize() and then prints the log by calling PetscMallocDumpLog() in PetscFinalize().
The user can also call these routines elsewhere in a program. When finer granularity is desired,
the user should call PetscMallocGetCurrentUsage() and PetscMallocGetMaximumUsage() for
memory allocated by PETSc, or PetscMemoryGetCurrentUsage() and PetscMemoryGetMaximumUsage()
for the total memory used by the program. Note that PetscMemorySetGetMaximumUsage() must be
called before PetscMemoryGetMaximumUsage() (typically at the beginning of the program).
12.9 System-Related Problems
The performance of a code can be affected by a variety of factors, including the cache behavior, other users
on the machine, etc. Below we briefly describe some common problems and possibilities for overcoming
them.
Problem too large for physical memory size: When timing a program, one should always leave at
least a ten percent margin between the total memory a process is using and the physical size of the
machine's memory. One way to estimate the amount of memory used by a given process is with the
UNIX getrusage system routine. Also, the PETSc option -log_summary prints the amount
of memory used by the basic PETSc objects, thus providing a lower bound on the memory used.
Another useful option is -malloc_log which reports all memory, including any Fortran arrays in
an application code.
Effects of other users: If other users are running jobs on the same physical processor nodes on which
a program is being proled, the timing results are essentially meaningless.
Overhead of timing routines on certain machines: On certain machines, even calling the system
clock in order to time routines is slow; this skews all of the flop rates and timing results. The file
${PETSC_DIR}/src/benchmarks/PetscTime.c contains a simple test problem that will
approximate the amount of time required to get the current time in a running program. On good
systems it will be on the order of 1.e-6 seconds or less.
Problem too large for good cache performance: Certain machines with lower memory bandwidths
(slow memory access) attempt to compensate by having a very large cache. Thus, if a significant
portion of an application fits within the cache, the program will achieve very good performance; if the
code is too large, the performance can degrade markedly. To analyze whether this situation affects a
particular code, one can try plotting the total flop rate as a function of problem size. If the flop rate
decreases rapidly at some point, then the problem may be too large for the cache size.
Inconsistent timings: Inconsistent timings are likely due to other users on the machine, thrashing
(using more virtual memory than available physical memory), or paging in of the initial executable.
Section 11.8 provides information on overcoming paging overhead when profiling a code. We have
found on all systems that if you follow all the advice above your timings will be consistent within a
variation of less than five percent.
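The cost of reading the clock, mentioned in the item on timing overhead above, can be estimated with a few lines of standalone C. This sketch is not the PetscTime.c benchmark; it uses the C11 routine timespec_get() as the timer, and timer_overhead() is a hypothetical name.

```c
#include <assert.h>
#include <time.h>

static double wall_seconds(void)
{
  struct timespec ts;
  timespec_get(&ts, TIME_UTC);
  return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Estimates the average cost, in seconds, of one call to the timer by
   calling it many times back to back. */
double timer_overhead(int calls)
{
  double t0, t1 = 0.0;
  int i;
  assert(calls > 0);
  t0 = wall_seconds();
  for (i = 0; i < calls; i++) t1 = wall_seconds();
  return (t1 - t0) / calls;
}
```

If the value returned for a large number of calls is far above the figure of 1.e-6 seconds quoted above, timings of short code segments on that machine should be treated with caution.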
Chapter 13
Other PETSc Features
13.1 PETSc on a process subset
Users who wish to employ PETSc routines on only a subset of processes within a larger parallel job, or
who wish to use a master process to coordinate the work of slave PETSc processes, should specify an
alternative communicator for PETSC_COMM_WORLD by calling
PetscSetCommWorld(MPI_Comm comm);
before calling PetscInitialize(), but, obviously, after calling MPI_Init(). PetscSetCommWorld() can be
called at most once per process. Most users will never need to use the routine PetscSetCommWorld().
13.2 Runtime Options
Allowing the user to modify parameters and options easily at runtime is very desirable for many applications.
PETSc provides a simple mechanism to enable such customization. To print a list of available options for a
given program, simply specify the option -help (or -h) at runtime, e.g.,
mpiexec -n 1 ./ex1 -help
Note that all runtime options correspond to particular PETSc routines that can be explicitly called from
within a program to set compile-time defaults. For many applications it is natural to use a combination of
compile-time and runtime choices. For example, when solving a linear system, one could explicitly specify
use of the Krylov subspace technique BiCGStab by calling
KSPSetType(ksp,KSPBCGS);
One could then override this choice at runtime with the option
-ksp_type tfqmr
to select the Transpose-Free QMR algorithm. (See Chapter 4 for details.)
The remainder of this section discusses details of runtime options.
13.2.1 The Options Database
Each PETSc process maintains a database of option names and values (stored as text strings). This database
is generated with the command PetscInitialize(), which is listed below in its C/C++ and Fortran variants,
respectively:
PetscInitialize(int *argc,char ***args,const char *file,const char *help);
call PetscInitialize(character file,integer ierr)
The arguments argc and args (in the C/C++ version only) are the addresses of usual command line
arguments, while the file is a name of a file that can contain additional options. By default this file is
called .petscrc in the user's home directory. The user can also specify options via the environmental
variable PETSC_OPTIONS. The options are processed in the following order:
file
environmental variable
command line
Thus, the command line options supersede the environmental variable options, which in turn supersede the
options file.
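The precedence rule follows directly from the processing order: if each source simply overwrites entries that are already present, the source processed last wins. The standalone C sketch below models this with a tiny table; the types and functions (Option, OptionsDB, db_set, db_get, db_load) are hypothetical and are not the PETSc options database API.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPTS 32

typedef struct { const char *name; const char *value; } Option;
typedef struct { Option opt[MAX_OPTS]; int n; } OptionsDB;

/* Inserts an entry, or overwrites the value if the name is already present.
   Returns 0 on success, -1 if the table is full. */
int db_set(OptionsDB *db, const char *name, const char *value)
{
  int i;
  for (i = 0; i < db->n; i++) {
    if (strcmp(db->opt[i].name, name) == 0) { db->opt[i].value = value; return 0; }
  }
  if (db->n == MAX_OPTS) return -1;
  db->opt[db->n].name  = name;
  db->opt[db->n].value = value;
  db->n++;
  return 0;
}

/* Returns the stored value, or NULL if the option is absent. */
const char *db_get(const OptionsDB *db, const char *name)
{
  int i;
  for (i = 0; i < db->n; i++) {
    if (strcmp(db->opt[i].name, name) == 0) return db->opt[i].value;
  }
  return NULL;
}

/* Processes the three sources in order: file, environment, command line. */
void db_load(OptionsDB *db, const Option *file, int nf,
             const Option *env, int ne, const Option *cmd, int nc)
{
  int i;
  for (i = 0; i < nf; i++) db_set(db, file[i].name, file[i].value);
  for (i = 0; i < ne; i++) db_set(db, env[i].name, env[i].value);
  for (i = 0; i < nc; i++) db_set(db, cmd[i].name, cmd[i].value);
}
```

If the file sets -ksp_type gmres, the environment sets -ksp_type cg, and the command line sets -ksp_type tfqmr, a lookup returns tfqmr, while options given only in the file remain visible.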
The file format for specifying options is
-optionname possible value
-anotheroptionname possible value
...
All of the option names must begin with a dash (-) and have no intervening spaces. Note that the option
values cannot have intervening spaces either, and tab characters cannot be used between the option names
and values. The user can employ any naming convention. For uniformity throughout PETSc, we employ the
format -package_option (for instance, -ksp_type and -mat_view_info).
Users can specify an alias for any option name (to avoid typing the sometimes lengthy default name) by
adding an alias to the .petscrc file in the format
alias -newname -oldname
For example,
alias -kspt -ksp_type
alias -sd -start_in_debugger
Comments can be placed in the .petscrc file by using one of the following symbols in the first column of a
line: #, %, or !.
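The three kinds of lines described above (comments, aliases, and options) can be told apart with a short routine. The standalone C sketch below is only an illustration of the stated file format; parse_options_line() and LineKind are hypothetical names, and the sketch is more permissive than the rules in the text (for example, it does not reject tab characters).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { LINE_COMMENT, LINE_ALIAS, LINE_OPTION, LINE_OTHER } LineKind;

/* Classifies one line of an options file. For LINE_OPTION, name and value
   receive the option and its (possibly empty) value; for LINE_ALIAS they
   receive the new name and the old name. Both buffers hold 64 characters. */
LineKind parse_options_line(const char *line, char name[64], char value[64])
{
  name[0] = value[0] = '\0';
  /* a comment symbol must appear in the first column */
  if (line[0] == '#' || line[0] == '%' || line[0] == '!') return LINE_COMMENT;
  if (sscanf(line, "alias %63s %63s", name, value) == 2 &&
      name[0] == '-' && value[0] == '-') return LINE_ALIAS;
  name[0] = value[0] = '\0';
  /* option names must begin with a dash; the value is optional */
  if (sscanf(line, "%63s %63s", name, value) >= 1 && name[0] == '-') return LINE_OPTION;
  name[0] = value[0] = '\0';
  return LINE_OTHER;
}
```

For instance, the line "alias -kspt -ksp_type" is classified as an alias with new name -kspt and old name -ksp_type, and "-ksp_type gmres" as an option with value gmres.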
13.2.2 User-Defined PetscOptions
Any subroutine in a PETSc program can add entries to the database with the command
PetscOptionsSetValue(char *name,char *value);
though this is rarely done. To locate options in the database, one should use the commands
PetscOptionsHasName(char *pre,char *name,PetscBool *flg);
PetscOptionsGetInt(char *pre,char *name,int *value,PetscBool *flg);
PetscOptionsGetReal(char *pre,char *name,double *value,PetscBool *flg);
PetscOptionsGetString(char *pre,char *name,char *value,int maxlen,PetscBool *flg);
PetscOptionsGetStringArray(char *pre,char *name,char **values,int *maxlen,PetscBool *flg);
PetscOptionsGetIntArray(char *pre,char *name,int *value,int *nmax,PetscBool *flg);
PetscOptionsGetRealArray(char *pre,char *name,double *value, int *nmax,PetscBool *flg);
All of these routines set flg=PETSC_TRUE if the corresponding option was found, flg=PETSC_FALSE
if it was not found. The optional argument pre indicates that the true name of the option is the given name
(with the dash - removed) prepended by the prex pre. Usually pre should be set to PETSC NULL
(or PETSC_NULL_CHARACTER for Fortran); its purpose is to allow someone to rename all the options
in a package without knowing the names of the individual options. For example, when using block Jacobi
preconditioning, the KSP and PC methods used on the individual blocks can be controlled via the options
-sub_ksp_type and -sub_pc_type.
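The prefix rule can be made concrete with a small helper that builds the full option name from a prefix and a base name. This is a sketch of the naming rule only, under the assumption that the prefix is inserted directly after the leading dash; the function option_full_name is hypothetical and is not part of PETSc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "-" + pre + (name without its leading dash) into out[len].
   A NULL prefix leaves the name unchanged.
   Returns 0 on success, 1 if name lacks a leading dash or the result does not fit. */
int option_full_name(const char *pre, const char *name, char *out, size_t len)
{
  const char *p = pre ? pre : "";

  assert(name != NULL && out != NULL);
  if (name[0] != '-') return 1;
  /* Result has strlen(p) + strlen(name) characters, plus the terminating NUL. */
  if (strlen(p) + strlen(name) + 1 > len) return 1;
  snprintf(out, len, "-%s%s", p, name + 1);
  return 0;
}
```

For example, the prefix sub_ applied to -ksp_type gives -sub_ksp_type, which matches the block Jacobi option named above.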
13.2.3 Keeping Track of Options
One useful means of keeping track of user-specified runtime options is use of -options_table, which
prints to stdout during PetscFinalize() a table of all runtime options that the user has specified.
A related option is -options_left, which prints the options table and indicates any options that have
not been requested upon a call to PetscFinalize(). This feature is useful to check whether an option has been
activated for a particular PETSc object (such as a solver or matrix format), or whether an option name may
have been accidentally misspelled.
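The idea behind an "options left" report can be modeled as a table in which each entry carries a flag that is set the first time the option is queried. The sketch below is an illustrative model only; the OptionTable type and the table_* functions are hypothetical and do not reflect how PETSc stores its options database.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPTIONS 16

typedef struct {
  const char *name[MAX_OPTIONS];  /* option names as given by the user */
  int         used[MAX_OPTIONS];  /* 1 once some code has queried the option */
  int         count;
} OptionTable;

/* Records an option supplied by the user. Returns 0 on success, 1 if the table is full. */
int table_add(OptionTable *t, const char *name)
{
  assert(t != NULL && name != NULL);
  if (t->count == MAX_OPTIONS) return 1;
  t->name[t->count] = name;
  t->used[t->count] = 0;
  t->count++;
  return 0;
}

/* Looks up an option and marks it as used if present. Returns 1 if found, 0 otherwise. */
int table_has(OptionTable *t, const char *name)
{
  int i;

  assert(t != NULL && name != NULL);
  for (i = 0; i < t->count; i++) {
    if (strcmp(t->name[i], name) == 0) { t->used[i] = 1; return 1; }
  }
  return 0;
}

/* Counts the options that were supplied but never queried. */
int table_count_left(const OptionTable *t)
{
  int i, left = 0;

  assert(t != NULL);
  for (i = 0; i < t->count; i++) if (!t->used[i]) left++;
  return left;
}
```

In this model a misspelled option such as -ksp_tpye is never queried by any solver, so it remains in the "left" count at the end of the run.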
13.3 Viewers: Looking at PETSc Objects
PETSc employs a consistent scheme for examining, printing, and saving objects through commands of the
form
XXXView(XXX obj,PetscViewer viewer);
Here obj is any PETSc object of type XXX, where XXX is Mat, Vec, SNES, etc. There are several predefined
viewers:
Passing in a zero for the viewer causes the object to be printed to the screen; this is most useful when
viewing an object in a debugger.
PETSC_VIEWER_STDOUT_SELF and PETSC_VIEWER_STDOUT_WORLD cause the object to be
printed to the screen.
PETSC_VIEWER_DRAW_SELF and PETSC_VIEWER_DRAW_WORLD cause the object to be drawn
in a default X window.
Passing in a viewer obtained by PetscViewerDrawOpen() causes the object to be displayed graphically.
To save an object to a file in ASCII format, the user creates the viewer object with the command
PetscViewerASCIIOpen(MPI_Comm comm,char *file,PetscViewer *viewer). This
object is analogous to PETSC_VIEWER_STDOUT_SELF (for a communicator of MPI_COMM_SELF)
and PETSC_VIEWER_STDOUT_WORLD (for a parallel communicator).
To save an object to a file in binary format, the user creates the viewer object with the command
PetscViewerBinaryOpen(MPI_Comm comm,char *file,PetscViewerBinaryType type,PetscViewer *viewer).
Details of binary I/O are discussed below.
Vector and matrix objects can be passed to a running MATLAB process with a viewer created by
PetscViewerSocketOpen(MPI_Comm comm,char *machine,int port,PetscViewer *viewer).
On the MATLAB side, one must first run v = sreader(int port) and then A = PetscBinaryRead(v)
to obtain the matrix or vector. Once all objects have been received, the port can be
closed from the MATLAB end with close(v). On the PETSc side, one should destroy the viewer
object with PetscViewerDestroy(). The corresponding MATLAB mex files are located in
${PETSC_DIR}/src/sys/viewer/impls/socket/matlab.
The user can control the format of ASCII printed objects with viewers created by PetscViewerASCIIOpen()
by calling
PetscViewerSetFormat(PetscViewer viewer,int format);
Possible formats include PETSC_VIEWER_DEFAULT, PETSC_VIEWER_ASCII_MATLAB, and
PETSC_VIEWER_ASCII_IMPL. The implementation-specific format, PETSC_VIEWER_ASCII_IMPL, displays
the object in the most natural way for a particular implementation.
The routines
PetscViewerPushFormat(PetscViewer viewer,int format);
PetscViewerPopFormat(PetscViewer viewer);
allow one to temporarily change the format of a viewer.
As discussed above, one can output PETSc objects in binary format by first opening a binary viewer
with PetscViewerBinaryOpen() and then using MatView(), VecView(), etc. The corresponding routines for
input of a binary object have the form XXXLoad(). In particular, matrix and vector binary input is handled
by the following routines:
MatLoad(PetscViewer viewer,MatType outtype,Mat *newmat);
VecLoad(PetscViewer viewer,VecType outtype,Vec *newvec);
These routines generate parallel matrices and vectors if the viewer's communicator has more than one
process. The particular matrix and vector formats are determined from the options database; see the manual
pages for details.
One can provide additional information about matrix data for matrices stored on disk by providing an
optional file matrixfilename.info, where matrixfilename is the name of the file containing the
matrix. The format of the optional file is the same as the .petscrc file and can (currently) contain the
following:
-matload_block_size <bs>
The block size indicates the size of blocks to use if the matrix is read into a block-oriented data structure (for
example, MATMPIBAIJ). The diagonal information s1,s2,s3,... indicates which (block) diagonals
in the matrix have nonzero values.
13.4 Debugging
PETSc programs may be debugged using one of the two options below.
-start_in_debugger [noxterm,dbx,xxgdb] [-display name] - start all processes in debugger
-on_error_attach_debugger [noxterm,dbx,xxgdb] [-display name] - start debugger only on
encountering an error
Note that, in general, debugging MPI programs cannot be done in the usual manner of starting the program
in the debugger (because then it cannot set up the MPI communication and remote processes).
By default the GNU debugger gdb is used when -start_in_debugger or -on_error_attach_debugger
is specified. To employ either xxgdb or the common UNIX debugger dbx, one uses command
line options as indicated above. On HP-UX machines the debugger xdb should be used instead of dbx; on
RS/6000 machines the xldb debugger is supported as well. By default, the debugger will be started in a new
xterm (to enable running separate debuggers on each process), unless the option noxterm is used. In order
to handle the MPI startup phase, the debugger command cont should be used to continue execution of the
program within the debugger. Rerunning the program through the debugger requires terminating the first
job and restarting the processor(s); the usual run option in the debugger will not correctly handle the MPI
startup and should not be used. Not all debuggers work on all machines, so the user may have to experiment
to find one that works correctly.
You can select a subset of the processes to be debugged (the rest just run without the debugger) with the
option
-debugger_nodes node1,node2,...
where you simply list the nodes you want the debugger to run with.
13.5 Error Handling
Errors are handled through the routine PetscError(). This routine checks a stack of error handlers and calls
the one on the top. If the stack is empty, it selects PetscTraceBackErrorHandler(), which tries to print a
traceback. A new error handler can be put on the stack with
PetscPushErrorHandler(PetscErrorCode (*HandlerFunction)(int line,char *dir,char *file,
char *message,int number,void*),void *HandlerContext)
The arguments to HandlerFunction() are the line number where the error occurred, the file in which
the error was detected, the corresponding directory, the error message, the error integer, and the
HandlerContext. The routine
PetscPopErrorHandler()
removes the last error handler and discards it.
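The handler stack described above can be modeled in a few lines: errors are dispatched to the handler on top of the stack, and a default handler is used when the stack is empty. This sketch is a simplified model and not the PETSc implementation; the handler signature is shortened, and all names (ErrorHandler, push_handler, pop_handler, raise_error, counting_handler) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HANDLER_STACK_MAX 8

/* Hypothetical handler signature, simplified from the one described in the text. */
typedef int (*ErrorHandler)(int line, const char *file, const char *message,
                            int code, void *ctx);

typedef struct {
  ErrorHandler fn;
  void        *ctx;
} HandlerEntry;

static HandlerEntry handler_stack[HANDLER_STACK_MAX];
static int          handler_depth = 0;

/* Fallback used when the stack is empty: passes the error code through unchanged. */
int default_handler(int line, const char *file, const char *message, int code, void *ctx)
{
  (void)line; (void)file; (void)message; (void)ctx;
  return code;
}

/* Example user handler: counts errors in the int that ctx points to. */
int counting_handler(int line, const char *file, const char *message, int code, void *ctx)
{
  (void)line; (void)file; (void)message;
  ++*(int *)ctx;
  return code;
}

/* Puts a handler and its context on top of the stack. Returns 1 if the stack is full. */
int push_handler(ErrorHandler fn, void *ctx)
{
  assert(fn != NULL);
  if (handler_depth == HANDLER_STACK_MAX) return 1;
  handler_stack[handler_depth].fn  = fn;
  handler_stack[handler_depth].ctx = ctx;
  handler_depth++;
  return 0;
}

/* Removes and discards the top handler. Returns 1 if the stack is empty. */
int pop_handler(void)
{
  if (handler_depth == 0) return 1;
  handler_depth--;
  return 0;
}

/* Dispatches an error to the top handler, or to the default handler if none is installed. */
int raise_error(int line, const char *file, const char *message, int code)
{
  if (handler_depth == 0) return default_handler(line, file, message, code, NULL);
  return handler_stack[handler_depth - 1].fn(line, file, message, code,
                                             handler_stack[handler_depth - 1].ctx);
}
```

The context pointer is what lets a handler carry state (here an error counter) without global variables, which is the role HandlerContext plays in the routine above.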
PETSc provides two additional error handlers besides PetscTraceBackErrorHandler():
PetscAbortErrorHandler()
PetscAttachErrorHandler()
The function PetscAbortErrorHandler() calls abort on encountering an error, while PetscAttachErrorHandler()
attaches a debugger to the running process if an error is detected. At runtime, these error handlers
can be set with the options -on_error_abort or
-on_error_attach_debugger [noxterm,dbx,xxgdb,xldb] [-display DISPLAY].
All PETSc calls can be traced (useful for determining where a program is hanging without running in
the debugger) with the option
-log_trace [filename]
where filename is optional. By default the traces are printed to the screen. This can also be set with the
command PetscLogTraceBegin(FILE*).
It is also possible to trap signals by using the command
PetscPushSignalHandler( PetscErrorCode (*Handler)(int,void *),void *ctx);
The default handler PetscDefaultSignalHandler() calls PetscError() and then terminates. In general, a signal
in PETSc indicates a catastrophic failure. Any error handler that the user provides should only try to clean up
before exiting. By default all PETSc programs use the default signal handler, although the user can turn this
off at runtime with the option -no_signal_handler.
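For readers unfamiliar with signal trapping, the underlying mechanism can be shown with the standard C signal() interface alone. The sketch below does not use PETSc; it only records which signal arrived, and the names trap_signal, record_signal, and last_trapped_signal are made up for this illustration.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t last_signal = 0;

/* Handler kept deliberately minimal: it only records which signal arrived. */
static void record_signal(int sig)
{
  last_signal = sig;
}

/* Installs the recording handler for sig. Returns 0 on success, 1 on failure. */
int trap_signal(int sig)
{
  assert(sig > 0);
  return signal(sig, record_signal) == SIG_ERR ? 1 : 0;
}

/* Returns the most recently recorded signal number, or 0 if none has arrived. */
int last_trapped_signal(void)
{
  return (int)last_signal;
}
```

Keeping the handler this small reflects the advice above: after a signal, little can be assumed about program state, so a handler should do minimal work and leave cleanup and exit to ordinary code.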
There is a separate signal handler for floating-point exceptions. The option -fp_trap turns on the
floating-point trap at runtime, and the routine
PetscSetFPTrap(int flag);
can be used in-line. A flag of PETSC_FP_TRAP_ON indicates that floating-point exceptions should be
trapped, while a value of PETSC_FP_TRAP_OFF (the default) indicates that they should be ignored. Note
that on certain machines, in particular the IBM RS/6000, trapping is very expensive.
A small set of macros is used to make the error handling lightweight. These macros are used throughout
the PETSc libraries and can be employed by the application programmer as well. When an error is first
detected, one should set it by calling
SETERRQ(MPI_Comm comm,PetscErrorCode flag,char *message);
The user should check the return codes for all PETSc routines (and possibly user-defined routines as well)
with
ierr = PetscRoutine(...);CHKERRQ(PetscErrorCode ierr);
Likewise, all memory allocations should be checked with
ierr = PetscMalloc(n*sizeof(double),&ptr);CHKERRQ(ierr);
If this procedure is followed throughout all of the user's libraries and codes, any error will by default generate
a clean traceback of the location of the error.
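The way such macros produce a traceback can be seen in a stripped-down imitation: each macro prints the current location and returns the error code, so every level of the call chain adds one line as the error propagates upward. The macros MY_SETERR and MY_CHKERR, the functions check_range and run_step, and the error code 63 are all hypothetical stand-ins chosen for this sketch; they are not the PETSc definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an error-setting macro: report the location and return the code. */
#define MY_SETERR(code, msg)                                                          \
  do {                                                                                \
    fprintf(stderr, "%s() line %d in %s: %s\n", __func__, __LINE__, __FILE__, (msg)); \
    return (code);                                                                    \
  } while (0)

/* Hypothetical stand-in for an error-checking macro: on a nonzero code, add a traceback line. */
#define MY_CHKERR(ierr)                                                      \
  do {                                                                       \
    if (ierr) {                                                              \
      fprintf(stderr, "%s() line %d in %s\n", __func__, __LINE__, __FILE__); \
      return (ierr);                                                         \
    }                                                                        \
  } while (0)

/* Lowest-level routine: detects the error and sets it. */
int check_range(int n)
{
  if (n < 0) MY_SETERR(63, "argument out of range");
  return 0;
}

/* Caller: checks the return code and passes any error further up. */
int run_step(int n)
{
  int ierr;

  ierr = check_range(n); MY_CHKERR(ierr);
  assert(ierr == 0);  /* reached only when no error occurred */
  return 0;
}
```

Calling run_step(-1) prints one line from check_range and one from run_step before returning 63, which is the kind of per-level trace the checking convention is meant to provide.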
Note that the macro __FUNCT__ is used to keep track of routine names during error tracebacks. Users
need not worry about this macro in their application codes; however, users can take advantage of this feature
if desired by setting this macro before each user-defined routine that may call SETERRQ() or CHKERRQ().
A simple example of usage is given below.
#undef __FUNCT__
#define __FUNCT__ "MyRoutine1"
int MyRoutine1() {
/* code here */
return 0;
}
13.6 Incremental Debugging
When developing large codes, one is often in the position of having a correctly (or at least believed to be
correctly) running code; making a change to the code then changes the results for some unknown reason.
Often even determining the precise point at which the old and new codes diverge is a major pain. In other
cases, a code generates different results when run on different numbers of processes, although in exact
arithmetic the same answer is expected. (Of course, this assumes that exactly the same solver and parameters
are used in the two cases.)
PETSc provides some support for determining exactly where in the code the computations lead to different
results. First, compile both programs with different names. Next, start running both programs as a single
MPI job. This procedure is dependent on the particular MPI implementation being used. For example, when
using MPICH on workstations, procgroup files can be used to specify the processors on which the job is to
be run. Thus, to run two programs, old and new, each on two processors, one should create the procgroup
file with the following contents:
local 0
workstation1 1 /home/bsmith/old
workstation2 1 /home/bsmith/new
workstation3 1 /home/bsmith/new
(Of course, workstation1, etc. can be the same machine.) Then, one can execute the command
mpiexec -p4pg <procgroup filename> old -compare <tolerance> [options]
Note that the same runtime options must be used for the two programs. The first time an inner product
or norm detects an inconsistency larger than <tolerance>, PETSc will generate an error. The usual
runtime options -start_in_debugger and -on_error_attach_debugger may be used. The
user can also place the commands
PetscCompareDouble()
PetscCompareScalar()
PetscCompareInt()
in portions of the application code to check for consistency between the two versions.
13.7 Complex Numbers
PETSc supports the use of complex numbers in application programs written in C, C++, and Fortran. To do
so, we employ either the C99 complex type or the C++ versions of the PETSc libraries in which the basic
scalar datatype, given in PETSc codes by PetscScalar, is defined as complex (or complex<double>
for machines using templated complex class libraries). To work with complex numbers, the user should
run ./configure with the additional option --with-scalar-type=complex. The file
${PETSC_DIR}/src/docs/website/documentation/installation.html provides detailed instructions
for installing PETSc. You can use --with-clanguage=c (the default) to use the C99 complex numbers
or --with-clanguage=c++ to use the C++ complex type.
We recommend using optimized Fortran kernels for some key numerical routines with complex numbers
(such as matrix-vector products, vector norms, etc.) instead of the default C/C++ routines. This can be
done with the ./configure option --with-fortran-kernels=generic. This implementation
exploits the maturity of Fortran compilers while retaining the identical user interface. For example, on
rs6000 machines, the base single-node performance when using the Fortran kernels is 4-5 times faster than
the default C++ code.
Recall that each variant of the PETSc libraries is stored in a different directory, given by
${PETSC_DIR}/lib/${PETSC_ARCH},
according to the architecture. Thus, the libraries for complex numbers are maintained separately from
those for real numbers. When using any of the complex numbers versions of PETSc, all vector and matrix
elements are treated as complex, even if their imaginary components are zero. Of course, one can elect to
use only the real parts of the complex numbers when using the complex versions of the PETSc libraries;
however, when working only with real numbers in a code, one should use a version of PETSc for real
numbers for best efficiency.
The program ${PETSC_DIR}/src/ksp/ksp/examples/tutorials/ex11.c solves a linear
system with a complex coefficient matrix. Its Fortran counterpart is
${PETSC_DIR}/src/ksp/ksp/examples/tutorials/ex11f.F.
13.8 Emacs Users
If users develop application codes using Emacs (which we highly recommend), the etags feature can
be used to search PETSc files quickly and efficiently. To use this feature, one should first check if the
file ${PETSC_DIR}/TAGS exists. If this file is not present, it should be generated by running make
alletags from the PETSc home directory. Once the file exists, from Emacs the user should issue the
command
M-x visit-tags-table
where M denotes the Emacs Meta key, and enter the name of the TAGS file. Then the command M-. will
cause Emacs to find the file and line number where a desired PETSc function is defined. Any string in any
of the PETSc files can be found with the command M-x tags-search. To find repeated occurrences,
one can simply use M-, to find the next occurrence.
An alternative which provides reverse lookups (e.g. find all call sites for a given function) and is somewhat
more amenable to managing many projects is GNU Global, available from http://www.gnu.org/s/global/.
Tags for PETSc and all external packages can be generated by running the command
find $PETSC_DIR/{include,src,tutorials,$PETSC_ARCH/include} any/other/paths \
-regex '.*\.\(cc\|hh\|cpp\|C\|hpp\|c\|h\|cu\)$' \
| grep -v ftn-auto | gtags -f -
from your home directory or wherever you keep source code. If you are making large changes, it is useful
to either set this up to run as a cron job or to make a convenient alias so that refreshing is easy. Then add the
following to ~/.emacs to enable gtags and replace the plain etags bindings.
(when (require 'gtags)
  (global-set-key "\C-cf" 'gtags-find-file)
  (global-set-key "\M-." 'gtags-find-tag)
  (define-key gtags-mode-map (kbd "C-c r") 'gtags-find-rtag))
(add-hook 'c-mode-common-hook
          '(lambda () (gtags-mode t))) ; Or add to existing hook
13.9 Vi and Vim Users
If users develop application codes using Vi or Vim, the tags feature can be used to search PETSc files
quickly and efficiently. To use this feature, one should first check if the file ${PETSC_DIR}/CTAGS
exists. If this file is not present, it should be generated by running make alletags from the PETSc home
directory. Once the file exists, from Vi/Vim the user should issue the command
:set tags=CTAGS
from the PETSC_DIR directory. Then the command :tag functionname
will cause Vi/Vim to find the file and line number where a desired PETSc function is defined. See,
for example, http://www.yolinux.com/TUTORIALS/LinuxTutorialAdvanced_vi.html for additional Vi/Vim
options that allow searches etc. It is also possible to use GNU Global with Vim; see the description for
Emacs above.
13.10 Eclipse Users
If you are interested in developing code that uses PETSc from Eclipse, or in developing PETSc in Eclipse,
and know how to do indexing and build libraries in Eclipse, please contact us at petsc-dev@mcs.anl.gov.
To allow an Eclipse project to compile with the PETSc include files and link with the PETSc libraries,
one user has suggested the following. Right-click on your managed C project and
select
Properties -> C/C++ Build -> Settings
A new window then opens with the various setting options on the right-hand side.
Select Includes, and add the required PETSc paths. In my case I have added
${PETSC_DIR}/include
${PETSC_DIR}/${PETSC_ARCH}/include
Then select Libraries under the header Linker and set the Library search path:
${PETSC_DIR}/${PETSC_ARCH}/lib
and then the libraries, in my case:
m, petsc, stdc++, mpichxx, mpich, lapack, blas, gfortran, dl, rt, gcc_s, pthread, X11
To make PETSc an Eclipse package
Install the Mercurial plugin for Eclipse and then import the PETSc repository to Eclipse.
Select New -> Convert to C/C++ project and select shared library. After this point you can perform
searches in the code.
A PETSc user has provided the following steps to build an Eclipse index for PETSc that can be used
with their own code without compiling PETSc source into their project.
In the user project source directory, create a symlink to the petsc/src directory.
Refresh the project explorer in Eclipse, so the new symlink is followed.
Right-click on the project in the project explorer, and choose Index -> Rebuild. The index should
now be built.
Right-click on the PETSc symlink in the project explorer, and choose Exclude from build... to make
sure Eclipse does not try to compile PETSc with the project.
13.11 Qt Creator
Qt Creator is developed as a part of the Qt SDK for cross-platform GUI programming using C++ and thus has
excellent support for application development using C/C++. This information was provided by Mohammad
Mirzadeh.
1. Qt Creator supports the automated, cross-platform qmake and CMake build systems. This means
that, starting from a handful of options, Qt Creator is able to generate makefiles that are
used to compile and run your application package on Linux, Windows and Mac systems.
2. Qt Creator makes the task of linking the PETSc library to your application package easy.
3. Qt Creator has a visual debugger and supports both GNU's GDB and Microsoft's CDB (on Windows).
4. Qt Creator (as of version 2.3) has an interface to memory profiling software such as valgrind.
5. Qt Creator has built-in auto-completion and function look-up. This feature of the editor makes it possible
for the user to easily browse inside huge packages like PETSc and easily find relevant functions and
objects.
6. Qt Creator has inline error/warning messages. This feature of the editor greatly reduces programming
bugs by informing the user of potential errors/warnings as the code is being written and even before the
compile step.
7. Qt Creator is an excellent environment for doing GUI programming using the Qt framework. This means
you can easily generate a nice GUI for your application package as well!
Qt Creator is a part of the Qt SDK, currently being developed by Nokia Inc. It is available under GPL v3,
LGPL v2 and a commercial license and may be obtained, either as a part of the Qt SDK or as stand-alone
software, via http://qt.nokia.com/downloads/.
How to create a project?
Upon installation, all you really need to do to port your software package to Qt Creator is to build a .pro
file. A .pro file is simply a configuration file that tells Qt Creator about all build/compile options and
locations of source files. One may start with a blank .pro file and fill in the configuration options as needed:
TARGET = name_of_your_app
APP_DIR = /home/path/to/your/app
PETSC_DIR = /home/path/to/your/petsc
PETSC_ARCH = linux-gnu-cxx-debug
PETSC_BINS = $$PETSC_DIR/$$PETSC_ARCH/bin
INCLUDEPATH += $$APP_DIR/include \
               $$PETSC_DIR/include \
               $$PETSC_DIR/$$PETSC_ARCH/include
LIBS += -L/path/to/petsc/libs -lall_needed_petsc_libs
QMAKE_CC = $$PETSC_BINS/mpicc
QMAKE_CXX = $$PETSC_BINS/mpicxx
QMAKE_CFLAGS += -O3
QMAKE_CXXFLAGS += -O3
QMAKE_LFLAGS += -O3
SOURCES += \
    $$APP_DIR/src/source1.cpp \
    $$APP_DIR/src/source2.cpp \
    $$APP_DIR/src/main.cpp
HEADERS += \
    $$APP_DIR/src/source1.h \
    $$APP_DIR/src/source2.h
OTHER_FILES += \
    $$APP_DIR/src/somefile.cpp \
    $$APP_DIR/dir/anotherfile.xyz
In this example, there are different keywords used. These include:
TARGET: This defines the name of your application executable. You can choose it however you like.
INCLUDEPATH: This is used at compile time to point to all the needed include files. Essentially it
is used as a -I $$INCLUDEPATH flag for the compiler. This should include all your application-specific
header files and those related to PETSc, which may be found via make getincludedirs.
LIBS: This defines all needed external libraries to link with your application. To get PETSc's linking
libraries do make getlinklibs.
QMAKE_CC and QMAKE_CXX: These define which C/C++ compilers to choose for your application.
QMAKE_CFLAGS, QMAKE_CXXFLAGS and QMAKE_LFLAGS: These set the corresponding compile
and linking flags.
SOURCES: This sets all the source files that need to be compiled.
HEADERS: This sets all the header files that are needed for your application. Note that since header
files are merely included in the source files, they are not compiled. As a result this is an optional flag
and is simply included to allow users to easily access the header files in their application (or even
PETSc header files).
OTHER_FILES: This can be used to include virtually any other file (source, header or any other
extension) just as a way to make access to auxiliary files easier. They could be anything from a
text file to a PETSc source file. Note that none of the source files placed here are compiled.
Note that there are many more options available that one can feed into the .pro file. For more information,
one should follow http://doc.qt.nokia.com/latest/qmake-variable-reference.html.
Once the .pro file is generated, the user can simply open it via Qt Creator. Upon opening, generally
one has the option to create two different build options, debug and release, and switch between the two easily
at any time. For more information on how to use the Qt Creator interface, and other more advanced aspects of
this IDE, one may refer to http://doc.qt.nokia.com/qtcreator-snapshot/index.html
13.12 Developers Studio Users
In the projects subdirectory of PETSc are several Developers Studio projects that use PETSc.
13.13 XCode Users (The Apple GUI Development System)
Please let us know how you have set things up at petsc-dev@mcs.anl.gov.
13.14 Parallel Communication
When used in a message-passing environment, all communication within PETSc is done through MPI, the
message-passing interface standard [14]. Any file that includes petscsys.h (or any other PETSc include
file) can freely use any MPI routine.
13.15 Graphics
The PETSc graphics library is not intended to compete with high-quality graphics packages. Instead, it is
intended to be easy to use interactively with PETSc programs. We urge users to generate their
publication-quality graphics using a professional graphics package. If a user wants to hook certain packages
into PETSc, he or she should send a message to petsc-maint@mcs.anl.gov, and we will see whether it is
reasonable to try to provide direct interfaces.
13.15.1 Windows as PetscViewers
For drawing predefined PETSc objects such as matrices and vectors, one must first create a viewer using the
command
PetscViewerDrawOpen(MPI_Comm comm,char *display,char *title,int x,
int y,int w,int h,PetscViewer *viewer);
This viewer may be passed to any of the XXXView() routines. To draw into the viewer, one must obtain
the Draw object with the command
PetscViewerDrawGetDraw(PetscViewer viewer,PetscDraw *draw);
Then one can call any of the PetscDrawXXX commands on the draw object. If one obtains the draw
object in this manner, one does not call the PetscDrawOpenX() command discussed below.
Predefined viewers, PETSC_VIEWER_DRAW_WORLD and PETSC_VIEWER_DRAW_SELF, may be
used at any time. Their initial use will cause the appropriate window to be created.
By default, PETSc drawing tools employ a private colormap, which remedies the problem of poor color
choices for contour plots due to an external program's mangling of the colormap (e.g., Netscape tends to
do this). Unfortunately, this causes flashing of colors as the mouse is moved between the PETSc windows
and other windows. Alternatively, a shared colormap can be used via the option
-draw_x_shared_colormap.
13.15.2 Simple PetscDrawing
One can open a window that is not associated with a viewer directly under the X11 Window System with
the command
PetscDrawOpenX(MPI_Comm comm,char *display,char *title,int x,
int y,int w,int h,PetscDraw *win);
All drawing routines are done relative to the window's coordinate system and viewport. By default the
drawing coordinates are from (0,0) to (1,1), where (0,0) indicates the lower left corner of the window.
The application program can change the window coordinates with the command
PetscDrawSetCoordinates(PetscDraw win,double xl,double yl,double xr,double yr);
By default, graphics will be drawn in the entire window. To restrict the drawing to a portion of the window,
one may use the command
PetscDrawSetViewPort(PetscDraw win,double xl,double yl,double xr,double yr);
These arguments, which indicate the fraction of the window in which the drawing should be done, must
satisfy $0 \le xl \le xr \le 1$ and $0 \le yl \le yr \le 1$.
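One way to picture how the window coordinates and the viewport combine is as a linear map along each axis: a drawing coordinate is first normalized by the coordinate range and then placed inside the viewport's fraction of the window. The function below sketches that map for a single axis. It is an assumed model for illustration, not code taken from PETSc, and the name to_window_fraction is hypothetical.

```c
#include <assert.h>

/* Maps coordinate c, given in the user range [c_lo, c_hi], to a fraction of the window,
   where the viewport occupies the fraction [v_lo, v_hi] of the window along this axis. */
double to_window_fraction(double c, double c_lo, double c_hi, double v_lo, double v_hi)
{
  assert(c_hi != c_lo);
  assert(0.0 <= v_lo && v_lo <= v_hi && v_hi <= 1.0);
  return v_lo + (c - c_lo) / (c_hi - c_lo) * (v_hi - v_lo);
}
```

With the default coordinates (0 to 1) and a viewport covering the left half of the window (0 to 0.5), the coordinate 0.5 lands at one quarter of the window width.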
To draw a line, one uses the command
PetscDrawLine(PetscDraw win,double xl,double yl,double xr,double yr,int cl);
The argument cl indicates the color (which is an integer between 0 and 255) of the line. A list of predefined
colors may be found in include/petscdraw.h and includes PETSC_DRAW_BLACK,
PETSC_DRAW_RED, PETSC_DRAW_BLUE, etc.
To ensure that all graphics actually have been displayed, one should use the command
PetscDrawFlush(PetscDraw win);
When displaying by using double buffering, which is set with the command
PetscDrawSetDoubleBuffer(PetscDraw win);
all processes must call
PetscDrawSynchronizedFlush(PetscDraw win);
in order to swap the buffers. From the options database one may use -draw_pause n, which causes the
PETSc application to pause n seconds at each PetscDrawPause(). A time of -1 indicates that the application
should pause until receiving mouse input from the user.
Text can be drawn with either of the two commands
PetscDrawString(PetscDraw win,double x,double y,int color,char *text);
PetscDrawStringVertical(PetscDraw win,double x,double y,int color,char *text);
The user can set the text font size or determine it with the commands
PetscDrawStringSetSize(PetscDraw win,double width,double height);
PetscDrawStringGetSize(PetscDraw win,double *width,double *height);
13.15.3 Line Graphs
PETSc includes a set of routines for manipulating simple two-dimensional graphs. These routines, which
begin with PetscDrawAxisDraw(), are usually not used directly by the application programmer. Instead, the
programmer employs the line graph routines to draw simple line graphs. As shown in the program in
Figure 22, line graphs are created with the command
PetscDrawLGCreate(PetscDraw win,int ncurves,PetscDrawLG *ctx);
The argument ncurves indicates how many curves are to be drawn. Points can be added to each of the
curves with the command
PetscDrawLGAddPoint(PetscDrawLG ctx,double *x,double *y);
The arguments x and y are arrays containing the next point value for each curve. Several points for each
curve may be added with
PetscDrawLGAddPoints(PetscDrawLG ctx,int n,double **x,double **y);
The line graph is drawn (or redrawn) with the command
PetscDrawLGDraw(PetscDrawLG ctx);
A line graph that is no longer needed can be destroyed with the command
PetscDrawLGDestroy(PetscDrawLG *ctx);
To plot new curves, one can reset a linegraph with the command
PetscDrawLGReset(PetscDrawLG ctx);
The line graph automatically determines the range of values to display on the two axes. The user can change
these defaults with the command
PetscDrawLGSetLimits(PetscDrawLG ctx,double xmin,double xmax,double ymin,double ymax);
It is also possible to change the display of the axes and to label them. This procedure is done by first
obtaining the axes context with the command
PetscDrawLGGetAxis(PetscDrawLG ctx,PetscDrawAxis *axis);
One can set the axes colors and labels, respectively, by using the commands
PetscDrawAxisSetColors(PetscDrawAxis axis,int axis_lines,int ticks,int text);
PetscDrawAxisSetLabels(PetscDrawAxis axis,char *top,char *x,char *y);
static char help[] = "Plots a simple line graph.\n";

#include <petscsys.h>

#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc,char **argv)
{
  PetscDraw          draw;
  PetscDrawLG        lg;
  PetscDrawAxis      axis;
  PetscInt           n = 20,i,x = 0,y = 0,width = 300,height = 300,nports = 1;
  PetscBool          flg;
  const char         *xlabel,*ylabel,*toplabel;
  PetscReal          xd,yd;
  PetscDrawViewPorts *ports;
  PetscErrorCode     ierr;

  xlabel = "X-axis Label";toplabel = "Top Label";ylabel = "Y-axis Label";

  ierr = PetscInitialize(&argc,&argv,(char*)0,help);CHKERRQ(ierr);
  ierr = PetscOptionsGetInt(PETSC_NULL,"-width",&width,PETSC_NULL);CHKERRQ(ierr);
  ierr = PetscOptionsGetInt(PETSC_NULL,"-height",&height,PETSC_NULL);CHKERRQ(ierr);
  ierr = PetscOptionsGetInt(PETSC_NULL,"-n",&n,PETSC_NULL);CHKERRQ(ierr);
  ierr = PetscOptionsHasName(PETSC_NULL,"-nolabels",&flg);CHKERRQ(ierr);
  if (flg) {
    xlabel = (char*)0; toplabel = (char*)0;
  }
  ierr = PetscDrawCreate(PETSC_COMM_SELF,0,"Title",x,y,width,height,&draw);CHKERRQ(ierr);
  ierr = PetscDrawSetFromOptions(draw);CHKERRQ(ierr);
  ierr = PetscOptionsGetInt(PETSC_NULL,"-nports",&nports,PETSC_NULL);CHKERRQ(ierr);
  ierr = PetscDrawViewPortsCreate(draw,nports,&ports);CHKERRQ(ierr);
  ierr = PetscDrawViewPortsSet(ports,0);CHKERRQ(ierr);

  ierr = PetscDrawLGCreate(draw,1,&lg);CHKERRQ(ierr);
  ierr = PetscDrawLGGetAxis(lg,&axis);CHKERRQ(ierr);
  ierr = PetscDrawAxisSetColors(axis,PETSC_DRAW_BLACK,PETSC_DRAW_RED,PETSC_DRAW_BLUE);CHKERRQ(ierr);
  ierr = PetscDrawAxisSetLabels(axis,toplabel,xlabel,ylabel);CHKERRQ(ierr);

  for (i=0; i<n ; i++) {
    xd = (PetscReal)(i - 5); yd = xd*xd;
    ierr = PetscDrawLGAddPoint(lg,&xd,&yd);CHKERRQ(ierr);
  }
  ierr = PetscDrawLGIndicateDataPoints(lg);CHKERRQ(ierr);
  ierr = PetscDrawLGDraw(lg);CHKERRQ(ierr);
  ierr = PetscDrawString(draw,-3.,150.0,PETSC_DRAW_BLUE,"A legend");CHKERRQ(ierr);
  ierr = PetscDrawFlush(draw);CHKERRQ(ierr);
  ierr = PetscSleep(2);CHKERRQ(ierr);

  ierr = PetscDrawViewPortsDestroy(ports);CHKERRQ(ierr);
  ierr = PetscDrawLGDestroy(&lg);CHKERRQ(ierr);
  ierr = PetscDrawDestroy(&draw);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}
Figure 22: Example of PetscDrawing Plots
It is possible to turn off all graphics with the option -nox. This will prevent any windows from being
opened and any drawing actions from being performed. This is useful for running large jobs when the
graphics overhead is too large, or for timing.
13.15.4 Graphical Convergence Monitor
For both the linear and nonlinear solvers, default routines allow one to graphically monitor convergence
of the iterative method. These are accessed via the command line with -ksp_monitor_draw and
-snes_monitor_draw. See also Sections 4.3.3 and 5.3.2.
The two functions used are KSPMonitorLG() and KSPMonitorLGCreate(). These can easily be modified
to serve specialized needs.
13.15.5 Disabling Graphics at Compile Time
To disable all X-window-based graphics, run ./configure with the additional option --with-x=0
Chapter 14
Makefiles
This chapter describes the design of the PETSc makefiles, which are the key to managing our code portability
across a wide variety of UNIX and Windows systems.
14.1 Our Makefile System
To make a program named ex1, one may use the command
make PETSC_ARCH=arch ex1
which will compile the example and automatically link the appropriate libraries. The architecture, arch, is
one of solaris, rs6000, IRIX, hpux, etc. Note that when using command line options with make
(as illustrated above), one must not place spaces on either side of the = signs. The variable PETSC_ARCH
can also be set as an environment variable. Although PETSc is written in C, it can be compiled with a C++
compiler. For many C++ users this may be the preferred route. To compile with the C++ compiler, one
should use the ./configure option --with-clanguage=c++.
14.1.1 Makefile Commands
The directory ${PETSC_DIR}/conf contains virtually all makefile commands and customizations to
enable portability across different architectures. Most makefile commands for maintaining the PETSc system
are defined in the file ${PETSC_DIR}/conf. These commands, which process all appropriate files within
the directory of execution, include
lib - Updates the PETSc libraries based on the source code in the directory.
libfast - Updates the libraries faster. Since libfast recompiles all source files in the directory
at once, rather than individually, this command saves time when many files must be compiled.
clean - Removes garbage files.
The tree command enables the user to execute a particular action within a directory and all of its
subdirectories. The action is specified by ACTION=[action], where action is one of the basic commands
listed above. For example, if the command
make ACTION=lib tree
were executed from the directory ${PETSC_DIR}/src/ksp/ksp, the debugging library for all Krylov
subspace solvers would be built.
14.1.2 Customized Makefiles
The directory ${PETSC_DIR}/ contains a subdirectory for each architecture that contains machine-specific
information, enabling the portability of our makefile system; these are
${PETSC_DIR}/${PETSC_ARCH}/conf. Each architecture directory contains two makefiles:
petscvariables - definitions of the compilers, linkers, etc.
petscrules - some build rules specific to this machine.
These files are generated automatically when you run ./configure.
The architecture-independent makefiles are located in ${PETSC_DIR}/conf, and the
machine-specific makefiles get included from here.
14.2 PETSc Flags
PETSc has several flags that determine how the source code will be compiled. The default flags for particular
versions are specified by the variable PETSCFLAGS within the base files of
${PETSC_DIR}/${PETSC_ARCH}/conf, discussed in Section 14.1.2. The flags include
PETSC_USE_DEBUG - The PETSc debugging options are activated. We recommend always using
this.
PETSC_USE_COMPLEX - The version with scalars represented as complex numbers is used.
PETSC_USE_LOG - Various monitoring statistics on floating-point operations and message-passing
activity are kept.
14.2.1 Sample Makefiles
Maintaining portable PETSc makefiles is very simple.
The first sample, given in Figure 23, is a minimal makefile for maintaining a single program that uses the
PETSc libraries. The most important lines in this makefile are the lines starting with include:
include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
These lines include other makefiles that provide the needed definitions and rules for the particular base PETSc
installation (specified by ${PETSC_DIR}) and architecture (specified by ${PETSC_ARCH}). (See Section 1.2
for information on setting these environment variables.) As listed in the sample makefile, the appropriate
include file is automatically completely specified; the user should not alter these statements within the
makefile.
ALL: ex2
CFLAGS =
FFLAGS =
CPPFLAGS =
FPPFLAGS =
CLEANFILES = ex2
include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
ex2: ex2.o chkopts
${CLINKER} -o ex2 ex2.o ${PETSC_LIB}
${RM} ex2.o
Figure 23: Sample PETSc Makefile for a Single Program
Users who wish to manage the compile process themselves, and not use the rules PETSc uses for
compiling programs, should include variables but not rules. That is, use something like
ALL: ex2
CFLAGS = ${PETSC_CC_INCLUDES}
FFLAGS = ${PETSC_FC_INCLUDES}
include ${PETSC_DIR}/conf/variables
ex2: ex2.o
mylinkercommand -o ex2 ex2.o ${PETSC_LIB}
Figure 24: Sample PETSc Makefile that does not use PETSc's rules for compiling
The variables ${PETSC_CC_INCLUDES}, ${PETSC_FC_INCLUDES} and ${PETSC_LIB} are
defined by the included conf/variables file.
If you do not wish to include any PETSc makefiles in your makefile, you can use the following commands
(run in the PETSc root directory) to get the information needed by your makefile: make getlinklibs
getincludedirs getpetscflags. All the libraries listed need to be linked into your executable, and the
include directories and flags need to be passed to the compiler; usually this is done with
CFLAGS = list of -I and other flags and FFLAGS = list of -I and other flags in your makefile.
Note that the variable ${PETSC_LIB} (as listed on the link line in the above makefile) specifies all
of the various PETSc libraries in the appropriate order for correct linking. Users who employ only
a specific PETSc library can use alternative variables like ${PETSC_SYS_LIB}, ${PETSC_VEC_LIB},
${PETSC_MAT_LIB}, ${PETSC_DM_LIB}, ${PETSC_KSP_LIB}, ${PETSC_SNES_LIB} or
${PETSC_TS_LIB}.
The second sample makefile, given in Figure 25, controls the generation of several example programs.
CFLAGS =
FFLAGS =
CPPFLAGS =
FPPFLAGS =
include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
ex1: ex1.o
-${CLINKER} -o ex1 ex1.o ${PETSC_LIB}
${RM} ex1.o
ex2: ex2.o
-${CLINKER} -o ex2 ex2.o ${PETSC_LIB}
${RM} ex2.o
ex3: ex3.o
-${FLINKER} -o ex3 ex3.o ${PETSC_LIB}
${RM} ex3.o
ex4: ex4.o
-${CLINKER} -o ex4 ex4.o ${PETSC_LIB}
${RM} ex4.o
runex1:
-@${MPIEXEC} ex1
runex2:
-@${MPIEXEC} -n 2 ./ex2 -mat_seqdense -options_left
runex3:
-@${MPIEXEC} ex3 -v -log_summary
runex4:
-@${MPIEXEC} -n 4 ./ex4 -trdump
RUNEXAMPLES_1 = runex1 runex2
RUNEXAMPLES_2 = runex4
RUNEXAMPLES_3 = runex3
EXAMPLESC = ex1.c ex2.c ex4.c
EXAMPLESF = ex3.F
EXAMPLES_1 = ex1 ex2
EXAMPLES_2 = ex4
EXAMPLES_3 = ex3
include ${PETSC_DIR}/conf/test
Figure 25: Sample PETSc Makefile for Several Example Programs
Again, the most important line in this makefile is the include line that includes the files defining all
of the macro variables. Some additional variables that can be used in the makefile are defined as follows:
CFLAGS, FFLAGS - User-specified additional options for the C compiler and Fortran compiler.
CPPFLAGS, FPPFLAGS - User-specified additional flags for the C preprocessor and Fortran
preprocessor.
CLINKER, FLINKER - the C and Fortran linkers.
RM - the remove command for deleting files.
Note that the PETSc example programs are divided into several categories, which currently include:
EXAMPLES_1 basic C suite used in installation tests
EXAMPLES_2 additional C suite including graphics
EXAMPLES_3 basic Fortran .F suite
EXAMPLES_4 subset of 1 and 2 that runs on only a single process
EXAMPLES_5 examples that require complex numbers
EXAMPLES_6 C examples that do not work with complex numbers
EXAMPLES_8 Fortran .F examples that do not work with complex numbers
EXAMPLES_9 uniprocess version of 3
EXAMPLES_10 Fortran .F examples that require complex numbers
We next list in Figure 26 a makefile that maintains a PETSc library. Although most users do not need to
understand or deal with such makefiles, they are also easily used.
ALL: lib
CFLAGS =
SOURCEC = sp1wd.c spinver.c spnd.c spqmd.c sprcm.c
SOURCEF = degree.f fnroot.f genqmd.f qmdqt.f rcm.f fn1wd.f gen1wd.f \
genrcm.f qmdrch.f rootls.f fndsep.f gennd.f qmdmrg.f qmdupd.f
SOURCEH =
OBJSC = sp1wd.o spinver.o spnd.o spqmd.o sprcm.o
OBJSF = degree.o fnroot.o genqmd.o qmdqt.o rcm.o fn1wd.o gen1wd.o \
genrcm.o qmdrch.o rootls.o fndsep.o gennd.o qmdmrg.o qmdupd.o
LIBBASE = libpetscmat
MANSEC = Mat
include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
Figure 26: Sample PETSc Makefile for Library Maintenance
The library's name is libpetscmat.a, and the source files being added to it are indicated by
SOURCEC (for C files) and SOURCEF (for Fortran files). Note that the OBJSF and OBJSC are identical to
SOURCEF and SOURCEC, respectively, except they use the suffix .o rather than .c or .f.
The variable MANSEC indicates that any manual pages generated from this source should be included in
the Mat section.
14.3 Limitations
This approach to portable makefiles has some minor limitations, including the following:
Each makefile must be called makefile.
Each makefile can maintain at most one archive library.
Chapter 15
Unimportant and Advanced Features of
Matrices and Solvers
This chapter introduces additional features of the PETSc matrices and solvers. Since most PETSc users
should not need to use these features, we recommend skipping this chapter during an initial reading.
15.1 Extracting Submatrices
One can extract a (parallel) submatrix from a given (parallel) matrix using
MatGetSubMatrix(Mat A,IS rows,IS cols,MatReuse call,Mat *B);
This extracts the rows and columns of the matrix A into B. If call is MAT_INITIAL_MATRIX it will
create the matrix B. If call is MAT_REUSE_MATRIX it will reuse the B created with a previous call.
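To make the role of the two index sets concrete, here is a minimal sketch in plain C that does not use PETSc at all. The function name extract_submatrix and the dense row-major storage are assumptions made only for this illustration; the real routine works on PETSc Mat and IS objects and may distribute the result across processes.

```c
#include <stddef.h>

/* Hypothetical helper, not a PETSc routine: copy the entries
   A(rows[i], cols[j]) of a dense row-major matrix A (lda entries per row)
   into the dense row-major nrows-by-ncols matrix B. */
void extract_submatrix(const double *A, size_t lda,
                       const size_t *rows, size_t nrows,
                       const size_t *cols, size_t ncols,
                       double *B)
{
  size_t i, j;
  for (i = 0; i < nrows; i++) {
    for (j = 0; j < ncols; j++) {
      B[i*ncols + j] = A[rows[i]*lda + cols[j]];
    }
  }
}
```

Calling the helper a second time with the same B simply overwrites its entries, which is loosely analogous to the MAT_REUSE_MATRIX case.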
15.2 Matrix Factorization
Normally, PETSc users will access the matrix solvers through the KSP interface, as discussed in Chapter 4,
but the underlying factorization and triangular solve routines are also directly accessible to the user.
The LU and Cholesky matrix factorizations are split into two or three stages depending on the user's
needs. The first stage is to calculate an ordering for the matrix. The ordering generally is done to reduce fill
in a sparse factorization; it does not make much sense for a dense matrix.
MatGetOrdering(Mat matrix,MatOrderingType type,IS* rowperm,IS* colperm);
The currently available alternatives for the ordering type are
MATORDERINGNATURAL - Natural
MATORDERINGND - Nested Dissection
MATORDERING1WD - One-way Dissection
MATORDERINGRCM - Reverse Cuthill-McKee
MATORDERINGQMD - Quotient Minimum Degree
These orderings can also be set through the options database.
Certain matrix formats may support only a subset of these; more options may be added. Check the
manual pages for up-to-date information. All of these orderings are symmetric at the moment; ordering
routines that are not symmetric may be added. Currently we support orderings only for sequential matrices.
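Why an ordering matters for fill can be seen on a small example. The sketch below is plain C and independent of PETSc; count_fill and NMAX are hypothetical names introduced for this illustration. It applies a symmetric permutation to a nonzero pattern and counts the new nonzeros that elimination without pivoting would create.

```c
#define NMAX 16

/* Hypothetical helper, not a PETSc routine. pattern is an n-by-n row-major
   array (n <= NMAX) whose nonzero bytes mark nonzero matrix entries.
   perm[i] is the original index placed at position i; the same permutation
   is applied to rows and columns. Returns the number of fill entries
   created by Gaussian elimination without pivoting, or -1 if n is out of
   range. */
int count_fill(int n, const unsigned char *pattern, const int *perm)
{
  unsigned char p[NMAX*NMAX];
  int i, j, k, fill = 0;

  if (n < 0 || n > NMAX) return -1;
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      p[i*n + j] = pattern[perm[i]*n + perm[j]];

  for (k = 0; k < n; k++) {
    for (i = k+1; i < n; i++) {
      if (!p[i*n + k]) continue;
      for (j = k+1; j < n; j++) {
        /* eliminating row k from row i creates an entry at (i,j) */
        if (p[k*n + j] && !p[i*n + j]) { p[i*n + j] = 1; fill++; }
      }
    }
  }
  return fill;
}
```

For a 4-by-4 "arrow" pattern with a dense first row and column, the natural ordering fills in the whole matrix, while reversing the ordering produces no fill at all.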
Users can add their own ordering routines by providing a function with the calling sequence
int reorder(Mat A,MatOrderingType type,IS* rowperm,IS* colperm);
Here A is the matrix for which we wish to generate a new ordering, type may be ignored and rowperm
and colperm are the row and column permutations generated by the ordering routine. The user registers
the ordering routine with the command
MatOrderingRegisterDynamic(MatOrderingType inname,char *path,char *sname,
PetscErrorCode (*reorder)(Mat,MatOrderingType,IS*,IS*));
The input argument inname is a string of the user's choice: either an ordering defined in
petscmat.h or a user's string, to indicate that one is introducing a new ordering. See the
code in src/mat/impls/order/sorder.c and other files in that directory for examples of how the
reordering routines may be written.
Once the reordering routine has been registered, it can be selected for use at runtime with the command
line option -pc_factor_mat_ordering_type sname. If reordering directly, the user should provide
the name as the second input argument of MatGetOrdering().
The following routines perform complete, in-place, symbolic, and numerical factorizations for symmet-
ric and nonsymmetric matrices, respectively:
MatCholeskyFactor(Mat matrix,IS permutation,const MatFactorInfo *info);
MatLUFactor(Mat matrix,IS rowpermutation,IS columnpermutation,const MatFactorInfo *info);
The argument info->fill > 1 is the predicted fill expected in the factored matrix, as a ratio of the
original fill. For example, info->fill=2.0 would indicate that one expects the factored matrix to have
twice as many nonzeros as the original.
For sparse matrices it is very unlikely that the factorization is actually done in-place. More likely, new
space is allocated for the factored matrix and the old space deallocated, but to the user it appears in-place
because the factored matrix replaces the unfactored matrix.
The two factorization stages can also be performed separately, by using the out-of-place mode. First, one
obtains the matrix object that will hold the factor
MatGetFactor(Mat matrix,const MatSolverPackage package,MatFactorType ftype,Mat *factor);
and then performs the factorization
MatCholeskyFactorSymbolic(Mat factor,Mat matrix,IS perm,const MatFactorInfo *info);
MatLUFactorSymbolic(Mat factor,Mat matrix,IS rowperm,IS colperm,const MatFactorInfo *info);
MatCholeskyFactorNumeric(Mat factor,Mat matrix,const MatFactorInfo *info);
MatLUFactorNumeric(Mat factor,Mat matrix,const MatFactorInfo *info);
In this case, the contents of the result matrix are undefined between the symbolic and numeric factorization
stages. It is possible to reuse the symbolic factorization. For the second and succeeding factorizations, one
simply calls the numerical factorization with a new input matrix and the same factored result matrix. It
is essential that the new input matrix have exactly the same nonzero structure as the original factored matrix.
(The numerical factorization merely overwrites the numerical values in the factored matrix and does not
disturb the symbolic portion, thus enabling reuse of the symbolic phase.) In general, calling
XXXFactorSymbolic with a dense matrix will do nothing except allocate the new matrix; the XXXFactorNumeric
routines will do all of the work.
Why provide the plain XXXFactor routines when one could simply call the two-stage routines? The
answer is that if one desires in-place factorization of a sparse matrix, the intermediate stage between the
symbolic and numeric phases cannot be stored in a result matrix, and it does not make sense to store the
intermediate values inside the original matrix that is being transformed. We originally made the combined
factor routines do either in-place or out-of-place factorization, but then decided that this approach was not
needed and could easily lead to confusion.
We do not currently support sparse matrix factorization with pivoting for numerical stability. This is
because trying to both reduce fill and do pivoting can become quite complicated. Instead, we provide a poor
stepchild substitute. After one has obtained a reordering with
MatGetOrdering(Mat A,MatOrderingType type,IS *row,IS *col), one may call
MatReorderForNonzeroDiagonal(Mat A,double tol,IS row, IS col);
which will try to reorder the columns to ensure that no values along the diagonal are smaller than tol in
absolute value. If small values are detected and corrected for, a nonsymmetric permutation of the rows
and columns will result. This is not guaranteed to work, but may help if one was simply unlucky in the
original ordering. When using the KSP solver interface, the option
-pc_factor_nonzeros_along_diagonal <tol> may be used. Here, tol is an optional tolerance to decide if
a value is nonzero; by default it is 1.e-10.
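The idea of repairing small diagonal entries by permuting columns can be sketched as follows. This is a simplified illustration in plain C on a dense matrix, not the algorithm PETSc actually uses; the names fix_small_diagonal and abs_value are hypothetical. For each row with a small diagonal entry it looks for a column swap that leaves acceptable entries on both affected diagonal positions.

```c
/* Hypothetical helper: absolute value without needing the math library. */
double abs_value(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper, not a PETSc routine. A is a dense row-major n-by-n
   matrix and colperm[j] is the original column currently at position j
   (the caller initializes it, for example to the identity). Rows whose
   diagonal entry is smaller than tol in absolute value are repaired by
   swapping two columns when that is possible. Returns the number of rows
   that could not be repaired. */
int fix_small_diagonal(int n, const double *A, double tol, int *colperm)
{
  int i, k, t, bad = 0;
  for (i = 0; i < n; i++) {
    if (abs_value(A[i*n + colperm[i]]) >= tol) continue;
    for (k = 0; k < n; k++) {
      if (k == i) continue;
      /* after the swap, row i sees column colperm[k] on its diagonal and
         row k sees column colperm[i]; both must be large enough */
      if (abs_value(A[i*n + colperm[k]]) >= tol &&
          abs_value(A[k*n + colperm[i]]) >= tol) {
        t = colperm[i]; colperm[i] = colperm[k]; colperm[k] = t;
        break;
      }
    }
    if (k == n) bad++;
  }
  return bad;
}
```

As in the text, such a repair is not guaranteed to succeed, which is why the sketch reports how many rows remain with a small diagonal entry.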
Once a matrix has been factored, it is natural to solve linear systems. The following four routines enable
this process:
MatSolve(Mat A,Vec x, Vec y);
MatSolveTranspose(Mat A, Vec x, Vec y);
MatSolveAdd(Mat A,Vec x, Vec y, Vec w);
MatSolveTransposeAdd(Mat A, Vec x, Vec y, Vec w);
The matrix A of these routines must have been obtained from a factorization routine; otherwise, an error will
be generated. In general, the user should use the KSP solvers discussed in Chapter 4 rather than using
these factorization and solve routines directly.
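The factor-once, solve-many pattern behind these routines can be illustrated with a small dense example in plain C. The helpers lu_factor_inplace and lu_solve are hypothetical names for this sketch and are not PETSc routines; they omit pivoting and sparse storage entirely. The argument order of lu_solve follows MatSolve(): x is the right-hand side and y receives the solution.

```c
/* Hypothetical helper, not a PETSc routine. In-place LU factorization of a
   dense row-major n-by-n matrix without pivoting: on return the strict
   lower triangle of A holds L (unit diagonal implied) and the upper
   triangle holds U. Returns 0 on success, -1 if a zero pivot is met. */
int lu_factor_inplace(int n, double *A)
{
  int i, j, k;
  for (k = 0; k < n; k++) {
    if (A[k*n + k] == 0.0) return -1;
    for (i = k+1; i < n; i++) {
      A[i*n + k] /= A[k*n + k];
      for (j = k+1; j < n; j++) A[i*n + j] -= A[i*n + k]*A[k*n + j];
    }
  }
  return 0;
}

/* Hypothetical helper: solve (L U) y = x with the factors stored in LU. */
void lu_solve(int n, const double *LU, const double *x, double *y)
{
  int i, j;
  for (i = 0; i < n; i++) {            /* forward substitution with L */
    y[i] = x[i];
    for (j = 0; j < i; j++) y[i] -= LU[i*n + j]*y[j];
  }
  for (i = n-1; i >= 0; i--) {         /* back substitution with U */
    for (j = i+1; j < n; j++) y[i] -= LU[i*n + j]*y[j];
    y[i] /= LU[i*n + i];
  }
}
```

Once lu_factor_inplace() has overwritten the matrix, lu_solve() can be called repeatedly with different right-hand sides at the cost of two triangular solves each.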
15.3 Unimportant Details of KSP
Again, virtually all users should use KSP through the KSP interface and, thus, will not need to know the
details that follow.
It is possible to generate a Krylov subspace context with the command
KSPCreate(MPI_Comm comm,KSP *ksp);
Before using the Krylov context, one must set the matrix-vector multiplication routine and the preconditioner
with the commands
PCSetOperators(PC pc,Mat mat,Mat pmat,MatStructure flag);
KSPSetPC(KSP ksp,PC pc);
In addition, the KSP solver must be initialized with
KSPSetUp(KSP ksp);
Solving a linear system is done with the command
KSPSolve(KSP ksp,Vec b,Vec x);
Finally, the KSP context should be destroyed with
KSPDestroy(KSP *ksp);
It may seem strange to put the matrix in the preconditioner rather than directly in the KSP; this decision
was the result of much agonizing. The reason is that for SSOR with Eisenstat's trick, and certain other
preconditioners, the preconditioner has to change the matrix-vector multiply. This procedure could not be
done cleanly if the matrix were stashed in the KSP context that PC cannot access.
Any preconditioner can supply not only the preconditioner, but also a routine that essentially performs
a complete Richardson step. The reason for this is mainly SOR. To use SOR in the Richardson framework,
that is,
$u^{n+1} = u^n + B(f - A u^n)$,
is much more expensive than just updating the values. With this addition it is reasonable to state that all our
iterative methods are obtained by combining a preconditioner from the PC package with a Krylov method
from the KSP package. This strategy makes things much simpler conceptually, so (we hope) clean code
will result. Note: We had this idea already implicitly in older versions of KSP, but, for instance, just doing
Gauss-Seidel with Richardson in old KSP was much more expensive than it had to be. With PETSc this
should not be a problem.
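For reference, the generic Richardson step written above can be sketched in plain C. This is only an illustration under stated assumptions: richardson_jacobi is a hypothetical name, the matrix is dense and row-major, and the preconditioner B is taken to be the inverse of the diagonal of A (Jacobi) because that keeps the example short. The generic form always computes the full residual f - A u before applying B, which is the cost that a preconditioner supplying its own Richardson step can avoid.

```c
/* Hypothetical helper, not a PETSc routine. Performs `its` Richardson
   steps u <- u + B (f - A u) for a dense row-major n-by-n matrix A, with B
   chosen (an assumption of this sketch) as the inverse of the diagonal of
   A. work must have length n. */
void richardson_jacobi(int n, const double *A, const double *f,
                       double *u, double *work, int its)
{
  int it, i, j;
  for (it = 0; it < its; it++) {
    for (i = 0; i < n; i++) {          /* work = f - A u */
      work[i] = f[i];
      for (j = 0; j < n; j++) work[i] -= A[i*n + j]*u[j];
    }
    for (i = 0; i < n; i++) {          /* u = u + B work */
      u[i] += work[i]/A[i*n + i];
    }
  }
}
```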
15.4 Unimportant Details of PC
Most users will obtain their preconditioner contexts from the KSP context with the command KSPGetPC().
It is possible to create, manipulate, and destroy PC contexts directly, although this capability should rarely
be needed. To create a PC context, one uses the command
PCCreate(MPI_Comm comm,PC *pc);
The routine
PCSetType(PC pc,PCType method);
sets the preconditioner method to be used. The routine
PCSetOperators(PC pc,Mat mat,Mat pmat,MatStructure flag);
sets the matrices that are to be used with the preconditioner. The routine
PCGetOperators(PC pc,Mat *mat,Mat *pmat,MatStructure *flag);
returns the values set with PCSetOperators().
The preconditioners in PETSc can be used in several ways. The two most basic routines simply apply
the preconditioner or its transpose and are given, respectively, by
PCApply(PC pc,Vec x,Vec y);
PCApplyTranspose(PC pc,Vec x,Vec y);
In particular, for a preconditioner matrix, B, that has been set via PCSetOperators(pc,A,B,flag), the routine
PCApply(pc,x,y) computes $y = B^{-1} x$ by solving the linear system $By = x$ with the specified
preconditioner method.
Additional preconditioner routines are
PCApplyBAorAB(PC pc,PCSide right,Vec x,Vec y,Vec work,int its);
PCApplyBAorABTranspose(PC pc,PCSide right,Vec x,Vec y,Vec work,int its);
PCApplyRichardson(PC pc,Vec x,Vec y,Vec work,PetscReal rtol,PetscReal atol, PetscReal dtol,PetscInt
maxits,PetscBool zeroguess,PetscInt *its,PCRichardsonConvergedReason*);
The first two routines apply the action of the matrix followed by the preconditioner or the preconditioner
followed by the matrix depending on whether the argument right is PC_LEFT or PC_RIGHT. The final routine
applies its iterations of Richardson's method. The last three routines are provided to improve efficiency
for certain Krylov subspace methods.
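Written as formulas, the two cases of PCApplyBAorAB() read as follows. Here $A$ is the matrix and $B$ denotes the action of the preconditioner, as in the Richardson formula of Section 15.3 (not the preconditioner matrix passed to PCSetOperators()); this is only a restatement of the description above.

```latex
y = B A x \quad (\texttt{PC\_LEFT}), \qquad
y = A B x \quad (\texttt{PC\_RIGHT}).
```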
A PC context that is no longer needed can be destroyed with the command
PCDestroy(PC *pc);
Chapter 16
Unstructured Grids in PETSc
This chapter introduces the DMMesh subclass of DM, which allows the user to handle unstructured grids
using the generic DM interface for hierarchy and multi-physics. The underlying implementation is written
in C++, and thus requires PETSc to be built using --with-clanguage=c++. More advanced features will also
require direct C++ access in the user code; however, much can be done using only the C/F77/F90 interface.
16.1 Sections: Vectors on Unstructured Grids
16.1.1 Defining Sections
The sequence for setting up any Section is the following:
1. Specify the chart
2. Specify the fiber dimensions
3. Allocate
4. Specify the values
We will expand on each item in turn.
First, you specify the chart, which is the domain for the Section. For any Section retrieved from the
Mesh, for instance using MeshGetSectionReal(), this is done automatically.
Second, you determine sizes for both values and constraints at the same time:
for (loop over points) {
  section->setFiberDimension(p, dof);
}
for (loop over boundary points) {
  section->setConstraintDimension(p, numConstraints);
}
Third, we allocate storage for this section
section->allocatePoint()
which also creates storage for the boundary conditions.
Now you can input values and constraints whenever you want
for (loop over points) {
  section->updatePoint(p, values);
}
for (loop over boundary points) {
  /* Here we constrain the x and y values */
  int dofs[2] = {0, 1};
  section->setConstraintDof(p, dofs);
}
Notice that the input to setConstraintDof() is the list of local dofs that are fixed.
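The four setup steps can be mimicked with a small data structure in plain C. The sketch below is a model for illustration only: SimpleSection and its functions are hypothetical names, constraints are left out, and storing each point's values contiguously at an offset given by the running sum of fiber dimensions is an assumption of the sketch rather than a statement about how the Section classes are implemented.

```c
#include <stdlib.h>

/* Hypothetical model of a section over the chart 0 .. npoints-1. */
typedef struct {
  int     npoints;  /* size of the chart (assumed > 0) */
  int    *dim;      /* fiber dimension of each point */
  int    *offset;   /* start of each point's values in `values` */
  double *values;   /* storage, valid after simple_section_allocate() */
} SimpleSection;

/* Step 1: specify the chart. Returns 0 on success, -1 on failure. */
int simple_section_create(SimpleSection *s, int npoints)
{
  s->npoints = npoints;
  s->dim     = (int*)calloc((size_t)npoints, sizeof(int));
  s->offset  = (int*)calloc((size_t)npoints + 1, sizeof(int));
  s->values  = NULL;
  return (s->dim && s->offset) ? 0 : -1;
}

/* Step 2: specify the fiber dimension of point p. */
void simple_section_set_fiber_dimension(SimpleSection *s, int p, int dof)
{
  s->dim[p] = dof;
}

/* Step 3: allocate; offsets are the running sum of the fiber dimensions. */
int simple_section_allocate(SimpleSection *s)
{
  int p;
  s->offset[0] = 0;
  for (p = 0; p < s->npoints; p++) s->offset[p+1] = s->offset[p] + s->dim[p];
  s->values = (double*)calloc((size_t)s->offset[s->npoints], sizeof(double));
  return (s->values || s->offset[s->npoints] == 0) ? 0 : -1;
}

/* Step 4: specify the values of point p. */
void simple_section_update_point(SimpleSection *s, int p, const double *v)
{
  int d;
  for (d = 0; d < s->dim[p]; d++) s->values[s->offset[p] + d] = v[d];
}

void simple_section_destroy(SimpleSection *s)
{
  free(s->dim); free(s->offset); free(s->values);
}
```

The ordering of the steps matters in this model for the same reason it does in the text: the storage size is only known once every fiber dimension has been set.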
16.1.2 Updating Values
There are many kinds of update*() functions since users want different ways of setting values. As an
example, we will take a point p with 3 degrees of freedom (u,v,w), and we constrain v, so the fiber dimension
is 3, the constraint dimension is 1, and the constraint dof is [1]. Now, the simplest update is
section->updatePoint(p, [a, b, c])
which will set the values a and c, ignoring b since it is constrained. The function updateAddPoint()
does the same thing but adds the values. We have a similar function
section->updateFreePoint(p, [a, c])
which updates the unconstrained values, and only takes those values as input. If you want to update
constrained values, you could use
section->updatePointBC(p, [b])
which updates the values of constrained unknowns, and takes only the constraint values. You could also
use
section->updatePointBCFull(p, [a, b, c])
which takes all the values, but only updates the constraint b. Finally,
section->updatePointAll(p, [a, b, c])
takes all the values and updates all of them, ignoring constraints.
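The differences between these update variants are easy to mix up, so here is a small model in plain C of a single point with constrained dofs. PointModel and the model_* functions are hypothetical names for this sketch; they imitate the behavior described above and are not the C++ Section interface itself.

```c
/* Hypothetical model of one point with `dim` values, of which the `ncons`
   local dofs listed in `cdof` are constrained. */
typedef struct {
  int        dim;
  int        ncons;
  const int *cdof;
  double    *val;
} PointModel;

int model_is_constrained(const PointModel *p, int d)
{
  int c;
  for (c = 0; c < p->ncons; c++) if (p->cdof[c] == d) return 1;
  return 0;
}

/* Like updatePoint(): takes all values, sets only the unconstrained ones. */
void model_update_point(PointModel *p, const double *in)
{
  int d;
  for (d = 0; d < p->dim; d++)
    if (!model_is_constrained(p, d)) p->val[d] = in[d];
}

/* Like updateFreePoint(): takes only the unconstrained values. */
void model_update_free_point(PointModel *p, const double *in)
{
  int d, k = 0;
  for (d = 0; d < p->dim; d++)
    if (!model_is_constrained(p, d)) p->val[d] = in[k++];
}

/* Like updatePointBC(): takes only the constrained values. */
void model_update_point_bc(PointModel *p, const double *in)
{
  int d, k = 0;
  for (d = 0; d < p->dim; d++)
    if (model_is_constrained(p, d)) p->val[d] = in[k++];
}

/* Like updatePointBCFull(): takes all values, sets only constrained ones. */
void model_update_point_bc_full(PointModel *p, const double *in)
{
  int d;
  for (d = 0; d < p->dim; d++)
    if (model_is_constrained(p, d)) p->val[d] = in[d];
}

/* Like updatePointAll(): takes all values and sets all of them. */
void model_update_point_all(PointModel *p, const double *in)
{
  int d;
  for (d = 0; d < p->dim; d++) p->val[d] = in[d];
}
```

With dim = 3 and cdof = {1}, this reproduces the (u,v,w) example: model_update_point() changes only the first and third values, while model_update_point_bc() changes only the second.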
196
Index
-compare, 173
-dmmg grid sequence, 124
-draw pause, 179
-fp trap, 22, 172
-h, 22, 167
-help, 167
-history, 158
-info, 60, 62, 151, 157
-ksp atol, 75
-ksp compute eigenvalues, 77
-ksp compute eigenvalues explicitly, 77
-ksp divtol, 75
-ksp gmres cgs renement type, 73
-ksp gmres classicalgramschmidt, 73
-ksp gmres modiedgramschmidt, 73
-ksp gmres restart, 73
-ksp max it, 75
-ksp monitor, 76, 123
-ksp monitor cancel, 76
-ksp monitor draw, 76, 181
-ksp monitor short, 77
-ksp monitor singular value, 76
-ksp monitor true residual, 76
-ksp pc side, 74
-ksp plot eigenvalues, 77
-ksp plot eigenvalues explicitly, 77
-ksp richardson scale, 73
-ksp rtol, 75
-ksp type, 73
-log mpe, 154, 161
-log summary, 151, 152, 161
-log trace, 151, 171
-malloc, 163
-malloc dump, 163
-malloc log, 164
-mat aij oneindex, 59
-mat coloring, 112
-mat fd coloring err, 112
-mat fd coloring umin, 112
-mat view matlab, 127
-mg levels, 85
-no signal handler, 172
-nox, 181
-on error attach debugger, 23
-options left, 169
-options table, 169
-pc asm type, 81
-pc bgs blocks, 81
-pc bjacobi blocks, 81
-pc composite pcs, 83
-pc composite true, 83
-pc composite type, 83
-pc eisenstat omega, 79
-pc factor diagonal ll, 79
-pc factor ll, 162
-pc factor in place, 79, 80
-pc factor levels, 79
-pc factor mat ordering type, 80
-pc factor nonzeros along diagonal, 79, 80, 191
-pc factor reuse ll, 79
-pc factor reuse ordering, 79
-pc factor shift nonzero, 88
-pc factor sshift nonzero, 88
-pc eldsplit detect saddle point, 86
-pc eldsplit type, 87
-pc ksp true, 83
-pc mg cycle type, 84
-pc mg smoothdown, 84
-pc mg smoothup, 84
-pc mg type, 84
-pc sor backward, 79
-pc sor its, 79
-pc sor local backward, 79
-pc sor local forward, 79
-pc sor local symmetric, 79
-pc sor omega, 79
-pc sor symmetric, 79
-pc type, 78
-preload, 159
-snes atol, 101
-snes ksp ew conv, 103
-snes ls, 100
197
-snes ls alpha, 101
-snes ls maxstep, 101
-snes ls steptol, 101
-snes max funcs, 101
-snes max it, 101
-snes mf, 104
-snes mf err, 104
-snes mf operator, 104
-snes mf umin, 104
-snes monitor, 102, 124
-snes monitor cancel, 102
-snes monitor draw, 102, 181
-snes monitor short, 102
-snes rtol, 101
-snes stol, 101
-snes test display, 102
-snes trtol, 101
-snes type, 98
-snes vi monitor, 113
-snes vi type, 113
-start in debugger, 23
-sub ksp type, 80
-sub pc type, 80
-trdump, 22
-ts pseudo increment, 120
-ts pseudo increment dt from initial dt, 120
-ts rk tol, 121
-ts sundials gmres restart, 119
-ts sundials gramschmidt type, 119
-ts sundials type, 119
-ts type, 116
-v, 22
-vec type, 41
-vec view matlab, 127
.petschistory, 158
.petscrc, 168
1-norm, 43, 65
2-norm, 43
Adams, 119
additive preconditioners, 82
aggregation, 161
AIJ matrix format, 58
alias, 168
AO, 41, 4547, 51, 70
AOApplicationToPetsc, 46
AOApplicationToPetscIS, 46
AOCreateBasic, 45, 46
AOCreateBasicIS, 46, 70
AODestroy, 46
AOPetscToApplication, 46
AOPetscToApplicationIS, 46
AOView, 46
Arnoldi, 77
array, distributed, 48
ASM, 80
assembly, 42
axis, drawing, 180
backward Euler, 116
BDF, 119
Bi-conjugate gradient, 74
block Gauss-Seidel, 80, 81
block Jacobi, 80, 81, 169
boundary conditions, 67
C++, 183
Cai, Xiao-Chuan, 81
CG, 73
CHKERRQ, 123, 158, 172
CHKERRQ(), 172
Cholesky, 189
coarse grid solve, 84
collective operations, 29
coloring with SNES, 111
coloring with TS, 118
combining preconditioners, 82
command line arguments, 23
command line options, 167
communicator, 76, 167
compiler options, 161
complex numbers, 28, 173
composite, 83
convergence tests, 75, 101
coordinates, 178
Crank-Nicolson, 116
CSR, compressed sparse row format, 58
damping, 88
debugger, 23
debugging, 170, 171
developers studio, 177
DIFFERENT NONZERO PATTERN, 72
direct solver, 79
distributed array, 48
DMCompositeGetLocalISs, 63
DMCreateGlobalVector, 49
DMCreateLocalVector, 49, 50
DMDA BOUNDARY GHOSTED, 48
198
DMDA BOUNDARY MIRROR, 48
DMDA BOUNDARY NONE, 48
DMDA BOUNDARY PERIODIC, 48
DMDABoundaryType, 48, 49
DMDACreate1d, 49
DMDACreate2d, 48
DMDACreate3d, 49, 123
DMDAGetAO, 51
DMDAGetArray, 125
DMDAGetCorners, 51
DMDAGetGhostCorners, 51
DMDAGetGlobalIndices, 51, 132, 133, 135
DMDAGetGlobalIndicesF90, 135
DMDAGetScatter, 50
DMDALocalInfo, 124, 125
DMDALocalToLocalBegin, 49
DMDALocalToLocalEnd, 49
DMDAStencilType, 48, 49
DMDAVecGetArray, 50
DMDAVecRestoreArray, 50
DMDestroy, 123
DMGetColoring, 112
DMGetLocalVector, 50
DMGetMatrix, 86
DMGlobalToLocalBegin, 49
DMGlobalToLocalEnd, 49
DMLocalToGlobalBegin, 49
DMLocalToGlobalEnd, 49
DMMG, 83, 123125
DMMGCreate, 123
DMMGDestroy, 123
DMMGGetx, 123
DMMGSetDM, 123
DMMGSetKSP, 123, 124
DMMGSetSNES, 124
DMMGSetSNESLocal, 124, 125
DMMGSolve, 123
DMRestoreLocalVector, 50
Dorman-Prince, 121
double buffer, 179
eclipse, 175
eigenvalues, 77
Eisenstat trick, 79
Emacs, 174
errors, 171
etags, in Emacs, 174
Euler, 116
Euler, backward, 116
factorization, 189
oating-point exceptions, 172
ushing, graphics, 179
Frobenius norm, 65
gather, 53
generalized linear, 116
ghost points, 46, 47
global numbering, 45
global representation, 46
global to local mapping, 47
GMRES, 73
Gram-Schmidt, 73
graphics, 178
graphics, disabling, 181
grid partitioning, 68
Hermitian matrix, 73
Hindmarsh, 119
ICC, parallel, 79
IEEE oating point, 172
ILU, parallel, 79
in-place solvers, 79
incremental debugging, 172
index sets, 52
inexact Newton methods, 103
innity norm, 43, 65
INSERT VALUES, 53
InsertMode, 49, 50, 5456
installing PETSc, 21
IS, 19, 41, 46, 5254, 63, 67, 68, 70, 81, 87, 112,
132, 134, 189191
IS GTOLM DROP, 47
IS GTOLM MASK, 47
ISBlockGetIndices, 53
ISBlockGetLocalSize, 53
ISBlockGetSize, 53
ISColoring, 111
ISColoringDestroy, 111, 112
ISCreateBlock, 53
ISCreateGeneral, 52, 54
ISCreateStride, 52
ISDestroy, 52, 54
ISGetBlockSize, 53
ISGetIndices, 52, 53, 132, 133, 135
ISGetIndicesF90, 135
ISGetSize, 52
ISGlobalToLocalMappingApply, 47
ISGlobalToLocalMappingType, 47
199
ISLocalToGlobalMapping, 41, 4547, 51
ISLocalToGlobalMappingApply, 46, 47
ISLocalToGlobalMappingApplyIS, 46, 47
ISLocalToGlobalMappingCreate, 46
ISLocalToGlobalMappingDestroy, 46
ISPartitioningToNumbering, 70
ISRestoreIndices, 53
ISRestoreIndicesF90, 135
ISStrideGetInfo, 52
Jacobi, 80
Jacobian, 91
Jacobian, debugging, 102
Jacobian, testing, 102
Krylov subspace methods, 71, 73
KSP, 19, 24, 28, 30, 7178, 80, 8388, 99, 103,
118, 123, 124, 132, 152, 157, 163, 169,
189, 191, 192
KSP CG SYMMETRIC, 73
KSP GMRES CGS REFINEMENT ALWAYS, 73
KSP GMRES CGS REFINEMENT IFNEEDED,
73
KSP GMRES CGS REFINEMENT NONE, 73
KSPBCGS, 73, 75, 167
KSPBICG, 7375
KSPBuildResidual, 77
KSPBuildSolution, 77
KSPCG, 73, 75
KSPCGS, 73, 75
KSPCGSetType, 74
KSPCGType, 74
KSPCHEBYCHEV, 73, 75
KSPChebychevSetEigenvalues, 73
KSPComputeEigenvalues, 77
KSPComputeEigenvaluesExplicitly, 77
KSPConvergedReason, 75, 76
KSPCR, 73, 75
KSPCreate, 28, 30, 71, 83, 191
KSPDefaultConverged, 75
KSPDestroy, 28, 72, 192
KSPDGMRES, 75
KSPFGMRES, 75
KSPGCR, 75
KSPGetConvergedReason, 76
KSPGetIterationNumber, 72
KSPGetPC, 72, 83, 89, 105, 192
KSPGetRhs, 77
KSPGetSolution, 77
KSPGMRES, 73, 75, 152
KSPGMRESCGSRenementType, 73
KSPGMRESClassicalGramSchmidtOrthogonalization,
73
KSPGMRESModiedGramSchmidtOrthogonalization,
73
KSPGMRESSetCGSRenementType, 73
KSPGMRESSetOrthogonalization, 73
KSPGMRESSetRestart, 73, 163
KSPLSQR, 73, 75
KSPMonitorDefault, 76
KSPMonitorLGCreate, 76, 181
KSPMonitorLGDestroy, 76
KSPMonitorSet, 76
KSPMonitorSingularValue, 76
KSPMonitorTrueResidualNorm, 76
KSPPREONLY, 73, 75, 80, 89
KSPRICHARDSON, 7375
KSPRichardsonSetScale, 73
KSPSetComputeEigenvalues, 77
KSPSetConvergenceTest, 134
KSPSetFromOptions, 28, 72
KSPSetInitialGuessNonzero, 74
KSPSetNormType, 74
KSPSetNullSpace, 88
KSPSetOperators, 28, 71, 73, 82, 99, 100, 118
KSPSetPC, 191
KSPSetPCSide, 74
KSPSetTolerances, 75
KSPSetType, 73, 89, 167
KSPSetUp, 72, 74, 80, 152, 191
KSPSolve, 28, 7274, 76, 132, 152, 192
KSPTCQMR, 73, 75
KSPTFQMR, 73, 75
KSPType, 73, 75, 80
KSPView, 163
Lanczos, 77
line graphs, 179
line search, 91, 100
linear system solvers, 71
lines, drawing, 178
local linear solves, 80
local representation, 46
local to global mapping, 46
logging, 151, 161
LU, 189
Mat, 19, 28, 30, 51, 5760, 62, 63, 6571, 84, 85,
99, 103105, 117, 118, 120, 124, 128,
200
132, 152, 157, 158, 169, 170, 187, 189
192
MAT FINAL ASSEMBLY, 58
MAT FLUSH ASSEMBLY, 58
MAT INITIAL MATRIX, 189
MAT NEW NONZERO LOCATIONS, 67
MAT REUSE MATRIX, 189
MATAIJ, 28, 63
MatAssemblyBegin, 28, 58, 60, 62
MatAssemblyEnd, 28, 58, 60
MatAXPY, 66
MatCholeskyFactor, 190
MatCholeskyFactorNumeric, 190
MatCholeskyFactorSymbolic, 190
MATCOLORINGID, 112
MATCOLORINGLF, 112
MATCOLORINGNATURAL, 112
MATCOLORINGSL, 112
MatConvert, 66, 68, 89
MatCopy, 66
MatCreate, 28, 30, 57, 89, 105, 162
MatCreateMFFD, 104, 105
MatCreateMPIAdj, 69
MatCreateMPIAIJ, 60, 61, 105, 162
MatCreateMPIBAIJ, 162
MatCreateMPIDense, 62
MatCreateSeqAIJ, 29, 5961, 133, 162
MatCreateSeqBAIJ, 162
MatCreateSeqDense, 62
MatCreateShell, 65, 66, 71, 105
MatCreateSNESMF, 103, 105
MatDestroy, 70
MatDiagonalScale, 66
MatFactorInfo, 190
MatFactorType, 190
MatFDColoring, 111, 112
MatFDColoringCreate, 111, 112
MatFDColoringSetFromOptions, 111, 112
MatFDColoringSetParameters, 112
MatGetArray, 132, 133, 135
MatGetArrayF90, 135
MatGetColoring, 111, 112
MatGetDiagonal, 66
MatGetFactor, 190
MatGetLocalSubMatrix, 63
MatGetOrdering, 80, 189191
MatGetOwnershipRange, 58, 62
MatGetRow, 68
MatGetSubMatrix, 189
MatILUFactor, 152
MatILUFactorSymbolic, 152, 162
MATLAB, 127
MatLoad, 170
MatLUFactor, 190
MatLUFactorNumeric, 152, 190
MatLUFactorSymbolic, 162, 190
MatMFFDDSSetUmin, 104
MatMFFDGetH, 105
MatMFFDRegisterDynamic, 104
MatMFFDResetHHistory, 105
MatMFFDSetFunctionError, 104
MatMFFDSetHHistory, 105
MatMFFDSetType, 104
MatMFFDWPSetComputeNormU, 104
MATMPIAIJ, 61, 62, 68, 81, 89
MatMPIAIJSetPreallocation, 89
MATMPIBAIJ, 58, 62, 68, 81, 170
MatMult, 63, 65, 66, 152, 154
MatMultAdd, 65, 66, 152
MatMultTranspose, 65, 66
MatMultTransposeAdd, 65, 66
MatMumpsSetIcntl, 89
MATNEST, 63, 86
MatNorm, 65, 66
MatNullSpace, 88
MatNullSpaceCreate, 88
MATORDERING1WD, 80
MATORDERINGNATURAL, 80
MATORDERINGND, 80
MATORDERINGQMD, 80
MATORDERINGRCM, 80
MatOrderingRegisterDynamic, 190
MatOrderingType, 189, 190
MatPartitioning, 69, 70
MatPartitioningApply, 70
MatPartitioningCreate, 70
MatPartitioningDestroy, 70
MatPartitioningSetAdjacency, 70
MatPartitioningSetFromOptions, 70
MatReorderForNonzeroDiagonal, 191
MatRestoreArrayF90, 135
MatRestoreRow, 68
MatReuse, 66, 189
matrices, 28, 57
matrix ordering, 190
matrix-free Jacobians, 103
matrix-free methods, 65, 71
MatScale, 66
MATSEQAIJ, 60, 66, 68, 79, 89
MatSeqAIJSetPreallocation, 89
MATSEQBAIJ, 58, 68, 79
MATSEQDENSE, 68
MATSEQSBAIJ, 79
MatSetLocalToGlobalMapping, 51
MatSetOption, 58, 67
MatSetSizes, 28, 57
MatSetType, 28, 29, 89
MatSetValues, 28, 57, 58, 60, 67, 68, 134, 155,
157
MatSetValuesBlocked, 58
MatSetValuesBlockedLocal, 58, 63
MatSetValuesLocal, 51, 63
MATSHELL, 104, 117, 118, 120
MatShellGetContext, 66
MatShellSetOperation, 66, 105
MatShift, 66
MatSolve, 152, 191
MatSolve(), 191
MatSolveAdd, 191
MATSOLVERESSL, 89
MATSOLVERLUSOL, 89
MATSOLVERMATLAB, 89
MATSOLVERMUMPS, 89
MatSolverPackage, 89, 190
MATSOLVERPLAPACK, 89
MATSOLVERSPOOLES, 89
MATSOLVERSUPERLU, 89
MATSOLVERUMFPACK, 89
MatSolveTranspose, 191
MatSolveTransposeAdd, 191
MatSORType, 79
MatStructure, 28, 66, 71, 99, 111, 117, 118, 120,
124, 191, 192
MatTranspose, 66
MatType, 28, 66, 68, 89, 170
MatView, 65, 127, 128, 170
MatZeroEntries, 66, 67, 100
MatZeroRows, 67, 68
MatZeroRowsColumns, 68
MatZeroRowsColumnsIS, 68
MatZeroRowsIS, 67
MatZeroRowsLocal, 67
MatZeroRowsLocalIS, 67
memory allocation, 163
memory leaks, 163
MPI, 177
MPI_Comm, 129
MPI_Finalize(), 24
MPI_Init(), 23
mpiexec, 22
multigrid, 83
multigrid, additive, 84
multigrid, full, 84
multigrid, Kaskade, 84
multigrid, multiplicative, 84
multiplicative preconditioners, 82
nested dissection, 80
Newton-like methods, 91
nonlinear equation solvers, 91
NormType, 43–45, 65, 66
null space, 88
Nupshot, 154
ODE solvers, 115, 119
one-way dissection, 80
options, 167
ordering, 190
orderings, 45, 46, 78, 80
overlapping Schwarz, 80
partitioning, 68
PC, 19, 28, 71, 72, 74, 78–85, 99, 105, 119, 124,
152, 157, 169, 191–193
PC_ASM_BASIC, 81
PC_ASM_INTERPOLATE, 81
PC_ASM_NONE, 81
PC_ASM_RESTRICT, 81
PC_COMPOSITE_ADDITIVE, 82
PC_COMPOSITE_MULTIPLICATIVE, 82
PC_LEFT, 193
PC_MG_ADDITIVE, 84
PC_MG_CYCLE_W, 84
PC_MG_FULL, 84
PC_MG_KASKADE, 84
PC_MG_MULTIPLICATIVE, 84
PC_RIGHT, 193
PCApply, 152, 192, 193
PCApplyBAorABTranspose, 193
PCApplyRichardson, 193
PCApplyTranspose, 192
PCASM, 78
PCASMGetSubKSP, 80
PCASMSetLocalSubdomains, 81
PCASMSetOverlap, 81
PCASMSetTotalSubdomains, 81
PCASMSetType, 81
PCASMType, 81
PCBJACOBI, 78
PCBJacobiGetSubKSP, 80
PCBJacobiSetLocalBlocks, 81
PCBJacobiSetTotalBlocks, 81
PCCHOLESKY, 78
PCCOMPOSITE, 78, 82
PCCompositeAddPC, 82
PCCompositeGetPC, 83
PCCompositeSetType, 82
PCCompositeSetUseTrue, 82
PCCompositeType, 82
PCCreate, 192
PCDestroy, 193
PCEISENSTAT, 78, 79
PCEisenstatNoDiagonalScaling, 79
PCEisenstatSetOmega, 79
PCFactorGetMatrix, 89
PCFactorSetAllowDiagonalFill, 78
PCFactorSetFill, 83
PCFactorSetLevels, 78
PCFactorSetMatSolverPackage, 89
PCFactorSetReuseFill, 78
PCFactorSetReuseOrdering, 78
PCFactorSetUseInPlace, 72, 78, 79
PCFIELDSPLIT, 86
PCFieldSplitSetFields, 86
PCFieldSplitSetIS, 86
PCFieldSplitSetType, 87
PCGetOperators, 82, 192
PCICC, 78
PCILU, 78, 80, 152
PCJACOBI, 78
PCKSP, 78, 83
PCKSPGetKSP, 83
PCKSPSetUseTrue, 83
PCLSC, 87
PCLU, 78, 89
PCMG, 83
PCMGCycleType, 84
PCMGDefaultResidual, 85
PCMGGetCoarseSolve, 84
PCMGGetSmoother, 84
PCMGGetSmootherDown, 84
PCMGGetSmootherUp, 84
PCMGSetCycleType, 84
PCMGSetInterpolation, 84
PCMGSetLevels, 83
PCMGSetNumberSmoothDown, 84
PCMGSetNumberSmoothUp, 84
PCMGSetR, 85
PCMGSetResidual, 84
PCMGSetRestriction, 84
PCMGSetRhs, 85
PCMGSetType, 84
PCMGSetX, 85
PCMGType, 84
PCNONE, 78, 103
PCREDISTRIBUTE, 68
PCRichardsonConvergedReason, 193
PCSetOperators, 191, 192
PCSetType, 78, 82, 83, 89, 105, 192
PCSetUp, 152
PCSHELL, 78, 103, 105
PCShellGetContext, 82
PCShellSetApply, 81, 105
PCShellSetContext, 81, 82
PCShellSetSetUp, 82
PCSide, 74, 193
PCSOR, 78
PCSORSetIterations, 79
PCSORSetOmega, 79
PCSORSetSymmetric, 79
PCType, 78, 80, 89, 123, 192
performance tuning, 161
PETSC_DECIDE, 41, 60
PETSC_DIR, 22
PETSC_FP_TRAP_OFF, 172
PETSC_FP_TRAP_ON, 172
PETSC_HAVE_FORTRAN_CAPS, 133
PETSC_HAVE_FORTRAN_UNDERSCORE, 133
PETSC_LIB, 185
PETSC_NULL_CHARACTER, 134
PETSC_NULL_DOUBLE, 134
PETSC_NULL_INTEGER, 134
PETSC_NULL_SCALAR, 134
PETSC_OPTIONS, 168
PETSC_USE_COMPLEX, 184
PETSC_USE_DEBUG, 184
PETSC_USE_LOG, 184
PETSC_VIEWER_ASCII_IMPL, 170
PETSC_VIEWER_ASCII_MATLAB, 170
PETSC_VIEWER_DEFAULT, 170
PetscAbortErrorHandler, 171
PetscBinaryRead, 128
PetscBool, 74, 78, 88, 104, 159, 168, 193
PetscCopyMode, 46, 52, 53
PetscDefaultSignalHandler, 172
PetscDraw, 178, 179
PetscDrawAxis, 180
PetscDrawAxis*(), 76
PetscDrawAxisDraw, 179
PetscDrawAxisSetColors, 180
PetscDrawAxisSetLabels, 180
PetscDrawFlush, 179
PetscDrawLG, 76, 179, 180
PetscDrawLG*(), 76
PetscDrawLGAddPoint, 179
PetscDrawLGAddPoints, 179
PetscDrawLGCreate, 179
PetscDrawLGDestroy, 179
PetscDrawLGDraw, 179
PetscDrawLGGetAxis, 180
PetscDrawLGReset, 180
PetscDrawLGSetLimits, 180
PetscDrawLine, 179
PetscDrawOpenX, 178
PetscDrawPause, 179
PetscDrawSetCoordinates, 178
PetscDrawSetDoubleBuffer, 179
PetscDrawSetViewPort, 178
PetscDrawSP*(), 77
PetscDrawString, 179
PetscDrawStringGetSize, 179
PetscDrawStringSetSize, 179
PetscDrawStringVertical, 179
PetscDrawSynchronizedFlush, 179
PetscError, 135, 171, 172
PetscErrorCode, 66, 75, 76, 81, 82, 84, 99–102,
117, 118, 120, 124–126, 132, 133, 171,
172, 190
PetscFClose, 135
PetscFinalize, 23, 151, 156, 164, 169
PetscFOpen, 135
PetscFPrintf, 135, 158
PetscGetTime, 158
PetscInfo, 135, 157
PetscInfoActivateClass, 158
PetscInfoAllow, 157
PetscInfoDeactivateClass, 157
PetscInitialize, 22–24, 135, 151, 156, 167, 168
PetscInt, 43–45, 67, 68, 193
PetscIntView, 128
PetscLogEvent, 155
PetscLogEventActivate, 157
PetscLogEventActivateClass, 157
PetscLogEventBegin, 155, 156
PetscLogEventDeactivate, 157
PetscLogEventDeactivateClass, 157
PetscLogEventEnd, 155, 156
PetscLogEventRegister, 155, 157
PetscLogFlops, 155, 156
PetscLogStage, 156
PetscLogStagePop, 156
PetscLogStagePush, 156
PetscLogStageRegister, 156
PetscLogTraceBegin, 171
PetscMalloc, 162, 172
PetscMallocDump, 164
PetscMallocDumpLog, 164
PetscMallocGetCurrentUsage, 164
PetscMallocGetMaximumUsage, 164
PetscMallocSetDumpLog, 164
PetscMatlabEngine, 128
PetscMatlabEngineCreate, 128
PetscMatlabEngineEvaluate, 128
PetscMatlabEngineGet, 128
PetscMatlabEngineGetArray, 128
PetscMatlabEngineGetOutput, 128
PetscMatlabEnginePut, 128
PetscMatlabEnginePutArray, 128
PetscMemoryGetCurrentUsage, 164
PetscMemoryGetMaximumUsage, 164
PetscMemorySetGetMaximumUsage, 164
PetscObject, 87, 127, 128, 155
PetscObjectCompose, 87
PetscObjectName, 127
PetscObjectSetName, 127, 128
PetscObjectSetName(), 127
PetscOffset, 132, 133
PetscOptionsGetInt, 27, 134, 168
PetscOptionsGetIntArray, 168
PetscOptionsGetReal, 168
PetscOptionsGetRealArray, 168
PetscOptionsGetString, 135, 168
PetscOptionsGetStringArray, 135, 168
PetscOptionsHasName, 168
PetscOptionsSetValue, 168
PetscPopErrorHandler, 135, 171
PetscPreLoadBegin, 159
PetscPreLoadEnd, 159
PetscPreLoadStage, 159
PetscPrintf, 135, 158
PetscPushErrorHandler, 135, 171
PetscPushSignalHandler, 172
PetscReal, 44, 45, 104, 116–118, 193
PetscRealView, 128
PetscScalar, 27, 28, 42–45, 47, 50, 54, 55, 57, 62,
66–68, 105, 124, 128, 134, 173
PetscScalarView, 128
PetscSetDebugger, 135
PetscSetFPTrap, 172
PetscTraceBackErrorHandler, 171
PetscViewer, 42, 46, 65, 117, 127, 169, 170, 178
PetscViewerASCIIGetPointer, 135
PetscViewerASCIIOpen, 127, 169, 170
PetscViewerBinaryGetDescriptor, 135
PetscViewerBinaryOpen, 169, 170
PetscViewerDrawGetDraw, 178
PetscViewerDrawOpen, 65, 169, 178
PetscViewerPopFormat, 170
PetscViewerPushFormat, 127, 170
PetscViewerSetFormat, 127, 170
PetscViewerSocketOpen, 127, 128, 169
PetscViewerStringOpen, 135
PetscViewerStringSPrintf, 135
preconditioners, 78
preconditioning, 71, 74
preconditioning, right and left, 193
profiling, 151, 161
providing arrays for vectors, 43
Qt Creator, 175
quotient minimum degree, 80
relaxation, 79, 84
reorder, 189
restart, 73
reverse Cuthill-McKee, 80
Richardson's method, 193
Runge-Kutta, 116, 121
running PETSc programs, 22
runtime options, 167
SAME_NONZERO_PATTERN, 72, 100
SAME_PRECONDITIONER, 72
Sarkis, Marcus, 81
scatter, 53
SCATTER_FORWARD, 53
ScatterMode, 54–56
sections, 195
SETERRQ, 172
SETERRQ(), 172
signals, 171
singular systems, 88
smoothing, 84
SNES, 19, 29, 91, 98–105, 111, 113, 118, 119,
123, 124, 152, 157, 158, 169
SNESConvergedReason, 101, 102
SNESCreate, 98
SNESDefaultComputeJacobian, 118
SNESDefaultComputeJacobianColor, 111, 118
SNESDestroy, 99
SNESGetFunction, 102
SNESGetKSP, 105
SNESGetSolution, 102
SNESGetTolerances, 101
SNESLineSearchNo, 100
SNESLineSearchNoNorms, 100
SNESLS, 100
SNESMonitorDefault, 102
SNESMonitorSet, 102
SNESSetConvergenceTest, 101
SNESSetFromOptions, 98, 105
SNESSetFunction, 99
SNESSetJacobian, 99, 105, 111, 117, 118
SNESSetTolerances, 101
SNESSetTrustRegionTolerance, 101
SNESSetType, 98
SNESSolve, 99
SNESTEST, 100
SNESTR, 100, 101
SNESType, 98, 100
SNESView, 163
SNESVISetVariableBounds, 113
SOR, 79
SOR_BACKWARD_SWEEP, 79
SOR_FORWARD_SWEEP, 79
SOR_LOCAL_BACKWARD_SWEEP, 79
SOR_LOCAL_FORWARD_SWEEP, 79
SOR_LOCAL_SYMMETRIC_SWEEP, 79
SOR_SYMMETRIC_SWEEP, 79
SPARSKIT, 59
spectrum, 77
SSOR, 79
stride, 52
submatrices, 189
Sundials, 119
SUNDIALS_MODIFIED_GS, 119
SUNDIALS_UNMODIFIED_GS, 119
symbolic factorization, 190
tags, in Vi/Vim, 174
text, drawing, 179
time, 158
timing, 151, 161
trust region, 91, 101
TS, 19, 115–121, 152, 158
TSARKIMEX, 119
TSBEULER, 116
TSCN, 116
TSCreate, 116
TSDefaultComputeJacobian, 118
TSDefaultComputeJacobianColor, 118
TSDestroy, 117
TSEULER, 116
TSGetTimeStep, 116
TSGL, 116
TSProblemType, 116
TSPSEUDO, 116
TSPseudoDefaultTimeStep, 120
TSPseudoIncrementDtFromInitialDt, 120
TSPseudoSetTimeStep, 120
TSPseudoSetTimeStepIncrement, 120
TSRK, 116
TSRKSetTolerance, 121
TSSetDuration, 116
TSSetIFunction, 118, 119
TSSetIJacobian, 119
TSSetInitialTimeStep, 116, 121
TSSetRHSFunction, 117–120
TSSetRHSJacobian, 118, 120
TSSetSolution, 117
TSSetTimeStep, 116
TSSetType, 116, 119
TSSetUp, 117
TSSolve, 116, 117
TSStep, 117
TSSUNDIALS, 116, 119
TSSundialsGetPC, 119
TSSundialsSetGramSchmidtType, 119
TSSundialsSetTolerance, 119
TSSundialsSetType, 119
TSTHETA, 116
TSType, 116, 119
TSView, 117
Upshot, 154
UsingFortran, 132
V-cycle, 84
Vec, 19, 27, 28, 30, 41–45, 47, 49, 50, 53–56, 65–68,
72, 77, 81, 84, 85, 88, 99, 100, 102,
104, 113, 116–118, 120, 123, 124, 128,
132–134, 152, 157, 169, 170, 191–193
VecAbs, 44
VecAssemblyBegin, 42
VecAssemblyEnd, 42
VecAXPBY, 44
VecAXPY, 44, 152
VecAYPX, 44
VecCopy, 44, 152
VecCreate, 27, 30, 41, 65
VecCreateGhost, 55
VecCreateGhostWithArray, 55
VecCreateMPI, 41, 46, 55, 61, 65
VecCreateMPIWithArray, 43
VecCreateSeq, 41, 54
VecCreateSeqWithArray, 43
VecDestroy, 43, 85
VecDestroyVecs, 43, 134, 135
VecDestroyVecsF90, 135
VecDot, 44, 152, 161
VecDotBegin, 45
VecDotEnd, 45
VecDuplicate, 27, 43, 49, 50, 54, 55
VecDuplicateVecs, 43, 49, 55, 134, 135
VecDuplicateVecsF90, 135
VecGetArray, 42–44, 54, 132, 133, 135, 161
VecGetArrayF90, 135
VecGetArrays, 135
VecGetLocalSize, 44, 133
VecGetOwnershipRange, 43
VecGetSize, 44
VecGetValues, 42, 54
VecGhostGetLocalForm, 55
VecGhostRestoreLocalForm, 55
VecGhostUpdateBegin, 55, 56
VecGhostUpdateEnd, 55, 56
VecLoad, 170
VecMax, 44
VecMAXPY, 44
VecMDot, 44, 152, 161
VecMDotBegin, 45
VecMDotEnd, 45
VecMin, 44
VecMTDot, 44
VecMTDotBegin, 45
VecMTDotEnd, 45
VecNorm, 43, 44, 152
VecNormBegin, 45
VecNormEnd, 45
VecPointwiseDivide, 44
VecPointwiseMult, 44
VecReciprocal, 44
VecRestoreArray, 43, 44, 133
VecRestoreArrayF90, 135
VecRestoreArrays, 135
Vecs, 88
VecScale, 44, 152
VecScatter, 41, 50, 53–55
VecScatterBegin, 53–56, 154
VecScatterCreate, 53, 54
VecScatterDestroy, 53, 54
VecScatterEnd, 53–56, 154
VecSet, 27, 42, 44, 99, 134
VecSetFromOptions, 27, 41
VecSetLocalToGlobalMapping, 47, 51
VecSetSizes, 27, 41
VecSetType, 27
VecSetValues, 27, 42, 54, 133, 134, 157
VecSetValuesLocal, 47, 51
VecShift, 44
VecSum, 44
VecSwap, 44
VecTDot, 44
VecTDotBegin, 45
VecTDotEnd, 45
vector values, getting, 54
vector values, setting, 42
vectors, 27, 41
vectors, setting values with local numbering, 47
vectors, user-supplied arrays, 43
vectors, with ghost values, 55
VecType, 170
VecView, 42, 127, 128, 170
VecWAXPY, 44
Vi, 174
Vim, 174
W-cycle, 84
wall clock time, 158
X windows, 178
xcode, 177
zero pivot, 88
Bibliography
[1] Satish Balay, William D. Gropp, Lois Curfman McInnes, and Barry F. Smith. Efficient management of
parallelism in object oriented numerical software libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen,
editors, Modern Software Tools in Scientific Computing, pages 163–202. Birkhäuser Press,
1997.
[2] Peter N. Brown and Youcef Saad. Hybrid Krylov methods for nonlinear systems of equations. SIAM
J. Sci. Stat. Comput., 11:450–481, 1990.
[3] X.-C. Cai and M. Sarkis. A restricted additive Schwarz preconditioner for general sparse linear
systems. Technical Report CU-CS 843-97, Computer Science Department, University of Colorado-Boulder,
1997. (accepted by SIAM J. of Scientific Computing).
[4] J. E. Dennis Jr. and Robert B. Schnabel. Numerical Methods for Unconstrained Optimization and
Nonlinear Equations. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.
[5] S. Eisenstat. Efficient implementation of a class of CG methods. SIAM J. Sci. Stat. Comput., 2:1–4,
1981.
[6] S. C. Eisenstat and H. F. Walker. Choosing the forcing terms in an inexact Newton method. SIAM J.
Scientific Computing, 17:16–32, 1996.
[7] R. Freund, G. H. Golub, and N. Nachtigal. Iterative Solution of Linear Systems, pages 57–100. Acta
Numerica. Cambridge University Press, 1992.
[8] Roland W. Freund. A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems.
SIAM J. Sci. Stat. Comput., 14:470–482, 1993.
[9] William Gropp and Ewing Lusk. MPICH Web page. http://www.mcs.anl.gov/mpi/mpich.
[10] William Gropp, Ewing Lusk, and Anthony Skjellum. Using MPI: Portable Parallel Programming with
the Message Passing Interface. MIT Press, 1994.
[11] Virginia Herrarte and Ewing Lusk. Studying parallel program behavior with Upshot. Technical Report
ANL-91/15, Argonne National Laboratory, August 1991.
[12] Magnus R. Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems. J.
Research of the National Bureau of Standards, 49:409–436, 1952.
[13] Jorge J. Moré, Danny C. Sorensen, Burton S. Garbow, and Kenneth E. Hillstrom. The MINPACK
project. In Wayne R. Cowell, editor, Sources and Development of Mathematical Software, pages 88–111,
1984.
[14] MPI: A message-passing interface standard. International J. Supercomputing Applications, 8(3/4),
1994.
[15] M. Pernice and H. F. Walker. NITSOL: A Newton iterative solver for nonlinear systems. SIAM J. Sci.
Stat. Comput., 19:302–318, 1998.
[16] Youcef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving
nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.
[17] Barry F. Smith, Petter Bjørstad, and William D. Gropp. Domain Decomposition: Parallel Multilevel
Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.
[18] Peter Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat.
Comput., 10:36–52, 1989.
[19] H. A. van der Vorst. BiCGSTAB: A fast and smoothly converging variant of BiCG for the solution of
nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 13:631–644, 1992.
Argonne National Laboratory is a U.S. Department of Energy
laboratory managed by UChicago Argonne, LLC
Mathematics and Computer Science Division
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240
Argonne, IL 60439-4847
www.anl.gov