
TMA4212 Numerical solution of partial differential
equations with finite difference methods

Brynjulf Owren

January 31, 2017

Translated and amended by E. Celledoni
Preface

The preparation of this note started in the winter of 2004. The note is a teaching aid for
the first half of the course TMA4210 "Numerical solution of partial differential equations
with difference methods". In the winter of 2006 the note was updated and several new
sections were added to adapt it to the course TMA4212. I want to thank all the students
who followed the courses during these semesters. They have been a source of inspiration
for the writing, and they helped me correct many typos and mistakes in the earlier
versions of this note.

Contents

1 Introduction

2 Background material
  2.1 Background on matrix theory
      2.1.1 Jordan form
      2.1.2 Symmetric matrices
      2.1.3 Positive definite matrices
      2.1.4 Gershgorin's theorem
      2.1.5 Vector and matrix norms
      2.1.6 Consistent and subordinate matrix norms
      2.1.7 Matrix norms and spectral radius
  2.2 Difference formulae
      2.2.1 Taylor expansion
      2.2.2 Big O-notation
      2.2.3 Difference approximations to the derivatives
      2.2.4 Difference operators and other operators
      2.2.5 Differential operator

3 Boundary value problems
  3.1 A simple case example
      3.1.1 2-norm stability for the case example
      3.1.2 Neumann boundary conditions
  3.2 Linear boundary value problems
      3.2.1 A self-adjoint case
  3.3 A nonlinear example

4 Discretization of the heat equation
  4.1 On the derivation of the heat equation
  4.2 Numerical solution of the initial/boundary value problem
      4.2.1 Numerical approximation on a grid
      4.2.2 Euler, Backward Euler and Crank–Nicolson
      4.2.3 Solution of the linear systems in Backward Euler's method and Crank–Nicolson
      4.2.4 Solution of linear systems in Matlab
      4.2.5 The θ-method
  4.3 Semi-discretization
      4.3.1 Semi-discretization of the heat equation
      4.3.2 Semidiscretization principle in general
      4.3.3 General approach
      4.3.4 u_t = Lu with different choices of L
  4.4 Boundary conditions involving derivatives
      4.4.1 Different types of boundary conditions
      4.4.2 Discretization of the boundary conditions
  4.5 Nonlinear parabolic differential equations

5 Stability, consistency and convergence
  5.1 Properties of the continuous problem
  5.2 Convergence of a numerical method
  5.3 Domain of dependence of a numerical method
  5.4 Proof of convergence for the Euler's method on the (I/BVP) with r ≤ 1/2
  5.5 Stability on unbounded time interval (F-stability)
  5.6 Stability on [0, T] when h → 0, k → 0
  5.7 Stability and roundoff error
  5.8 Consistency and Lax' equivalence theorem
  5.9 von Neumann's stability criterion

6 Elliptic differential equations
  6.1 Elliptic equation on the plane
  6.2 Difference methods derived using Taylor series
      6.2.1 Discretization of a self-adjoint equation
  6.3 Boundary conditions of Neumann and Robin type
  6.4 Grid-like net and variable step-size
  6.5 General rectangular net
  6.6 Discretization using Taylor expansion on a completely general net
  6.7 Difference formulae derived by integration
  6.8 Net based on triangles
  6.9 Difference equations
  6.10 Convergence of the methods for elliptic equations
      6.10.1 Convergence for the 5-point formula on a Dirichlet problem
      6.10.2 Some general comments on convergence
  6.11 A discussion on the solution of large linear systems

7 Hyperbolic equations
  7.1 Examples
  7.2 Characteristics
  7.3 Explicit difference formulae for u_t + a u_x = 0
  7.4 Stability
  7.5 Implicit methods for u_t + a u_x = 0
  7.6 Hyperbolic systems of first order equations
  7.7 Dissipation and dispersion

Chapter 1

Introduction
Chapter 1

Introduction

The numerical approximation of partial differential equations is an important component


in the simulation of natural processes. Examples where simulation techniques are useful
are chemical processes, fluid mechanics, structural dynamics, quantum physical processes,
electromagnetism, finance, etc.
When we talk about partial differential equations (PDEs) we mean equations where the
solution is a function (or a vector of functions) of at least two variables, which are called
independent variables. The equation describes a relation involving the solution and its
partial derivatives. But the specification of a mathematical model in applications involves
much more than just this relation. First of all, the model is associated with a geometry.
This means that we specify a domain in the space of the independent variables where the
differential equation should be satisfied.
This domain can be finite or infinite. In two dimensions such a domain can be a subset
of the plane, but also the surface of a cylinder or a sphere.
Typically a PDE in itself has infinitely many solutions. The specification of a PDE on
a domain must be supplemented with the specification of boundary conditions. There are
many different types of boundary conditions, but in general they specify the solution or its
derivatives on the boundary of the domain. If one of the independent variables is physical
time, then the boundary conditions at the initial time are called initial conditions, and this
is just a particular type of boundary conditions.
Most PDEs are such that it is not possible to write their solution in closed form. To
understand the behavior of the solutions, it is therefore necessary to use approximation
methods and computer simulations. There are several numerical methods which can be
used for this purpose; among them we will focus mostly on difference methods. Other
techniques are spectral methods and finite element methods. These can be studied in
other courses (TMA4220).
Partial differential equations can be linear or nonlinear. Since this is an introductory
course we will mostly consider the linear case. Linear PDEs are divided into three subclasses:
parabolic, elliptic and hyperbolic differential equations. The PhD course MA8103 considers
nonlinear PDEs with particular focus on hyperbolic PDEs.
A typical prototype of a parabolic partial differential equation is the heat equation; this is
the main subject of chapter 4. In that chapter we work most of the time with two independent
variables, time and one space dimension. For this reason the discussion about the geometry
of the problems is somewhat limited there.
In chapter 5 we consider general topics regarding difference approximation of PDEs
which are of interest for all three types of PDEs. This chapter is about convergence of
the numerical approximations to the exact solution of the PDEs. The concepts of stability
and consistency will also be introduced as important tools for showing convergence of a
numerical scheme.
In chapter 6 we discuss elliptic equations; typical examples in this case are Laplace's
equation and Poisson's equation. We work with 2-dimensional domains, where both
independent variables are space variables. In this case it is interesting to look at more
complex geometries compared to the parabolic case, and the domain is not necessarily
rectangular like in chapter 4. As a consequence, most of the discussion is about how to
approximate different boundary conditions for domains which are not rectangular.
The next chapter of this note reviews some background material on matrix theory and
Taylor expansion; there we will also set the notation for differential operators.
Chapter 2

Background material

2.1 Background on matrix theory


Let A be an n × n matrix with real (or complex) entries; we write A ∈ R^{n×n} (or A ∈ C^{n×n}).
We say that A is diagonalizable if there exists a matrix X ∈ C^{n×n} such that

    Λ = X^{-1} A X = diag(λ_1, λ_2, . . . , λ_n).

Every λ_i ∈ C is called an eigenvalue of A. The matrix X has n columns, denoted X =
[x_1, . . . , x_n], and every x_i ∈ C^n is called an eigenvector (associated to the eigenvalue λ_i).
We write a diagonal matrix such as Λ above in the form

    Λ = diag(λ_1, . . . , λ_n).

2.1.1 Jordan form

For any A ∈ R^{n×n} (or A ∈ C^{n×n}) there exists a matrix M ∈ C^{n×n} such that

    M^{-1} A M = J = diag(J_1, J_2, . . . , J_k)    (block-diagonal).    (2.1)

Here J_i is an m_i × m_i matrix, and Σ_{i=1}^{k} m_i = n. The Jordan blocks J_i have the form

          [ λ_i  1           ]
    J_i = [     λ_i  ⋱       ]    if m_i ≥ 2,
          [          ⋱   1   ]
          [             λ_i  ]

and J_i = [λ_i] if m_i = 1. If all m_i = 1, then k = n and the matrix is diagonalizable. If A
has n distinct eigenvalues, it is always diagonalizable. The converse is not true, that is, a
matrix can be diagonalizable even if it has multiple eigenvalues.


2.1.2 Symmetric matrices

When we talk about symmetric matrices, we normally mean real symmetric matrices. The
transpose A^T of an m × n matrix A is an n × m matrix with a_{ji} as its (i, j) element (a matrix
whose columns are the rows of A). An n × n matrix is symmetric if A^T = A.

A symmetric n × n matrix has real eigenvalues λ_1, . . . , λ_n and a set of real orthonor-
mal eigenvectors x_1, . . . , x_n. Let ⟨·, ·⟩ denote the standard inner product on C^n; then
⟨x_i, x_j⟩ = δ_{ij} (Kronecker delta).

A consequence of this is that the matrix of eigenvectors X = [x_1, . . . , x_n] is real and
orthogonal, and its inverse is therefore the transpose

    X^{-1} = X^T.

The diagonalization of A is given by

    Λ = diag(λ_1, . . . , λ_n),  X = [x_1, . . . , x_n],  X^T X = I,  X^T A X = Λ  ⇔  A = X Λ X^T.

2.1.3 Positive definite matrices

If A is symmetric and ⟨x, Ax⟩ = x^T A x > 0 for all 0 ≠ x ∈ R^n, then A is called positive definite.
A (symmetric) matrix is positive semi-definite if ⟨x, Ax⟩ ≥ 0 for all x ∈ R^n and ⟨x, Ax⟩ = 0
for at least one x ≠ 0.

    A positive definite ⇔ A has only positive eigenvalues.

    A positive semi-definite ⇔ A has only non-negative eigenvalues, and at least one 0-eigenvalue.

2.1.4 Gershgorin's theorem

Gershgorin's theorem. Given A = (a_{jk}) ∈ C^{n×n}, define n disks S_j in the complex
plane by

    S_j = { z ∈ C : |z − a_{jj}| ≤ Σ_{k≠j} |a_{jk}| }.

The union S = ∪_{j=1}^{n} S_j contains all the eigenvalues of A. For every eigenvalue λ of A
there is a j such that λ ∈ S_j.

Example.

        [ 1+i   1   0 ]
    A = [ 0.5   3   1 ].
        [  1    1   5 ]
[Figure: the three Gershgorin disks of A plotted in the complex plane, Re z from 0 to 7, Im z from −3 to 3.]

Proof of Gershgorin's theorem: Let λ be an eigenvalue with associated eigenvector x =
[ξ_1, . . . , ξ_n]^T ≠ 0. Choose ℓ among the indices 1, . . . , n such that |ξ_ℓ| ≥ |ξ_k|, k = 1, . . . , n,
so that |ξ_ℓ| > 0. Component ℓ of the equation Ax = λx reads

    Σ_{k=1}^{n} a_{ℓk} ξ_k = λ ξ_ℓ   ⇒   (λ − a_{ℓℓ}) ξ_ℓ = Σ_{k≠ℓ} a_{ℓk} ξ_k.

Divide by ξ_ℓ on each side and take absolute values:

    |λ − a_{ℓℓ}| = | Σ_{k≠ℓ} a_{ℓk} ξ_k/ξ_ℓ | ≤ Σ_{k≠ℓ} |a_{ℓk}| |ξ_k|/|ξ_ℓ| ≤ Σ_{k≠ℓ} |a_{ℓk}|.

Then we get λ ∈ S_ℓ.

Example. Symmetric diagonally dominant matrices with positive diagonal elements are
positive definite. Why?
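As a quick numerical sanity check, the disks and the eigenvalues of the example matrix
above can be compared in Matlab; this is a minimal sketch with our own variable names:

    A = [1+1i 1 0; 0.5 3 1; 1 1 5];
    centers = diag(A);                      % disk centers a_jj
    radii = sum(abs(A),2) - abs(centers);   % radii: sums of off-diagonal moduli
    disp([centers radii])
    disp(eig(A))                            % every eigenvalue lies in some disk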

2.1.5 Vector and matrix norms

Consider a vector space X (real or complex). A norm ‖·‖ : X → R satisfies the following
axioms:

1. ‖x‖ ≥ 0 for all x, and ‖x‖ = 0 ⇔ x = 0.

2. ‖αx‖ = |α| ‖x‖  (α ∈ R (C)).

3. ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Examples. x = (ξ_k), X = R^n.

    ‖x‖_1 = Σ_{k=1}^{n} |ξ_k|,   ‖x‖_2 = ( Σ_{k=1}^{n} |ξ_k|² )^{1/2},   ‖x‖_∞ = max_{1≤k≤n} |ξ_k|.

The matrix spaces R^{n×n} and C^{n×n} are also vector spaces over R (C). We say that ‖·‖
is a matrix norm if for all A, B ∈ R^{n×n} (C^{n×n})

1. ‖A‖ > 0 for all A ≠ 0, and ‖A‖ = 0 ⇔ A = 0,

2. ‖αA‖ = |α| ‖A‖  (α ∈ R (C)),

3. ‖A + B‖ ≤ ‖A‖ + ‖B‖,

4. ‖A · B‖ ≤ ‖A‖ · ‖B‖.

Remark. The last point requires that a matrix-matrix product is defined (this operation is
not defined in a general vector space). In abstract terms, axioms 1–4 give an example
of a Banach algebra.

Example. The Frobenius norm of a matrix is defined as

    ‖A‖_F = ( Σ_{j=1}^{n} Σ_{k=1}^{n} |a_{jk}|² )^{1/2}.

2.1.6 Consistent and subordinate matrix norms

A given matrix norm is consistent with a given vector norm on R^n if

    ‖Ax‖ ≤ ‖A‖ · ‖x‖  for all A ∈ R^{n×n}, x ∈ R^n.

A given matrix norm is subordinate to a given vector norm on R^n if

    ‖A‖ = max_{x≠0} ‖Ax‖ / ‖x‖.

Examples. We give here as examples some of the most common subordinate matrix
norms. We look for the matrix norms subordinate to the three vector norms ‖·‖_1, ‖·‖_2 and
‖·‖_∞.

1. Let ‖·‖_1 be the matrix norm subordinate to the vector norm ‖·‖_1. One can show
   that for A ∈ R^{n×n} (C^{n×n})

       ‖A‖_1 = max_{1≤k≤n} Σ_{i=1}^{n} |a_{ik}|.

   In other words, ‖A‖_1 is the "maximal column sum of A".

2. To find the matrix norm subordinate to the vector norm ‖·‖_2 we must define the
   spectral radius of a matrix M ∈ R^{n×n} (C^{n×n}). If λ_1, . . . , λ_n are the eigenvalues of
   M, we denote the spectral radius of M by ρ(M), and it is defined as

       ρ(M) = max_{1≤k≤n} |λ_k|.    (2.2)

   If we plot the eigenvalues of M in the complex plane, the spectral radius is the
   minimal radius of a circle centered at the origin and containing all eigenvalues of M.
   We now define the 2-norm of a matrix A as

       ‖A‖_2 = √ρ(A^T A).

   Note that A^T A is positive (semi)definite, so all its eigenvalues are real and non-negative.
   Taking the square root of the biggest eigenvalue, we obtain ‖A‖_2. Note also that
   the spectral radius of A can be very different from (the square root of) the spectral
   radius of A^T A. On the other hand, if A is symmetric then ‖A‖_2 = ρ(A).

3. Let ‖·‖_∞ be the matrix norm subordinate to the vector norm ‖·‖_∞. We have

       ‖A‖_∞ = max_{1≤i≤n} Σ_{k=1}^{n} |a_{ik}|.

   That is, ‖A‖_∞ is the "maximal row sum of A". Observe also that ‖A‖_1 = ‖A^T‖_∞.
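These formulas are easy to check numerically against Matlab's built-in norms; a minimal
sketch with an example matrix of our own choosing:

    A = [4 -1 0; -1 4 -1; 0 -1 4];
    max(sum(abs(A),1))       % maximal column sum, equals norm(A,1)
    max(sum(abs(A),2))       % maximal row sum, equals norm(A,inf)
    sqrt(max(eig(A'*A)))     % the spectral definition, equals norm(A,2)
    max(abs(eig(A)))         % rho(A); coincides with norm(A,2) here since A is symmetric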

2.1.7 Matrix norms and spectral radius

For any matrix norm ‖·‖ it is true that

    ‖A‖ ≥ ρ(A).    (2.3)

Proof: Let x be an eigenvector of A associated to an eigenvalue λ, so that

    Ax = λx.

Let y ∈ C^n be arbitrary (nonzero). Then we have

    A (x y^T) = (Ax) y^T = λ (x y^T),

such that

    ‖A (x y^T)‖ ≤ ‖A‖ ‖x y^T‖.

As a consequence

    |λ| ‖x y^T‖ = ‖λ (x y^T)‖ = ‖A (x y^T)‖ ≤ ‖A‖ ‖x y^T‖.

Therefore |λ| ≤ ‖A‖, and since this is true for every eigenvalue of A, it must be that ρ(A) ≤ ‖A‖.

Question to the reader: What is wrong with the following line of reasoning?

    |λ| ‖x‖ = ‖λx‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖  etc.

Convergent matrix. A matrix A is said to be convergent (to zero) if

    A^k → 0  when k → ∞.

A sufficient criterion. If ‖A‖ < 1 for a particular matrix norm, then A is convergent.

Proof.

    ‖A^k‖ = ‖A · A^{k−1}‖ ≤ ‖A‖ · ‖A^{k−1}‖ ≤ · · · ≤ ‖A‖^k → 0  if ‖A‖ < 1.

Necessary and sufficient criterion. A is convergent if and only if the spectral radius
ρ(A), defined by (2.2), satisfies ρ(A) < 1.

Proof: We use the Jordan form, and let A = M J M^{-1} where M ∈ C^{n×n} and J is as in
(2.1). Then we have A² = M J M^{-1} M J M^{-1} = M J² M^{-1}, and by induction we get
A^k = M J^k M^{-1}. Now A^k → 0 if and only if J^k → 0. And J^k → 0 if and only if
every Jordan block J_i^k → 0. Assume such a Jordan block has diagonal element λ, and let
the m_i × m_i matrix F have its (j, j + 1) elements, for j = 1, . . . , m_i − 1, equal to 1, and
the other elements equal to zero. Then J_i = λI + F where I is the identity matrix. The
matrix F is nilpotent, i.e. F^m = 0 for m ≥ n. We assume that k ≥ n − 1 and compute

    J_i^k = (λI + F)^k = Σ_{m=0}^{k} (k choose m) λ^{k−m} F^m = Σ_{m=0}^{n−1} (k choose m) λ^{k−m} F^m = Σ_{m=0}^{n−1} φ_k^{(m)}(λ) F^m,

where φ_k^{(m)}(λ) = (k choose m) λ^{k−m}. When k → ∞, then φ_k^{(m)}(λ) → 0 for 0 ≤ m ≤ n − 1
if and only if |λ| < 1. This must be true for all Jordan blocks (i.e. eigenvalues of A), and
this concludes the proof. □
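A small Matlab experiment (our own construction) illustrates the criterion: a non-normal
matrix can have norm larger than 1 in all the standard norms and still be convergent, as
long as ρ(A) < 1.

    A = [0.5 2; 0 0.5];            % rho(A) = 0.5, but norm(A,1) = 2.5 > 1
    fprintf('rho = %g, 1-norm = %g\n', max(abs(eig(A))), norm(A,1))
    Ak = A;
    for k = 1:60
        Ak = Ak*A;                 % compute the powers A^k
    end
    norm(Ak)                       % tends to zero as k grows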

2.2 Difference formulae

2.2.1 Taylor expansion

1 free variable. Let u ∈ C^{n+1}(I) where I ⊂ R is an interval of the real line. This means
that the (n + 1)-th derivative of u exists and is continuous on the interval I. Then the
following formula is valid.

Taylor's formula with remainder. With x ∈ I, x + h ∈ I,

    u(x + h) = Σ_{m=0}^{n} (h^m/m!) u^{(m)}(x) + r_n,

where

    r_n = (h^{n+1}/(n + 1)!) u^{(n+1)}(x + θh),  0 < θ < 1.

2 free variables. Assume now that u ∈ C^{n+1}(Ω) where Ω ⊂ R². It is convenient to use
an operator notation for the partial derivatives. We write h = [h, k], and let

    h · ∇ := h ∂/∂x + k ∂/∂y,  i.e.  h · ∇u = h ∂u/∂x + k ∂u/∂y.

The operator produces the derivative of a function in the direction h = [h, k]: we find
the directional derivative.
We can also define powers of the operator, for example

    (h · ∇)² = (h ∂/∂x + k ∂/∂y)² = h² ∂²/∂x² + 2hk ∂²/∂x∂y + k² ∂²/∂y².

The extension to the m-th power is obvious. Then we can write:



Taylor's formula with remainder for functions of two variables.

    u(x + h, y + k) = Σ_{m=0}^{n} (1/m!) (h ∂/∂x + k ∂/∂y)^m u(x, y) + r_n,    (2.4)

where

    r_n = (1/(n + 1)!) (h ∂/∂x + k ∂/∂y)^{n+1} u(x + θh, y + θk),  0 < θ < 1.

We have here assumed that the line segment between (x, y) and (x + h, y + k) is included
in Ω.

Derivation of the previous formula. We look at a function of one variable, μ(t) =
u(x + th, y + tk), for fixed x, y, h, k. Using Taylor's expansion with remainder for the case
of one variable for μ(t) around t = 0, we obtain the two-variable formula by setting t = 1.

2.2.2 Big O-notation

Let φ be a function of h and p a positive integer. Then we write

    φ(h) = O(h^p)  when h → 0

if there exist two constants C, H > 0 such that

    |φ(h)| ≤ C |h|^p  when 0 < |h| < H.

If this holds, we say that φ(h) is of order p in the variable h.

The typical use of the big O-notation is in connection with the local truncation error
in numerical methods. For example, in the Taylor expansion in one variable,

    |r_n| = |h^{n+1}/(n + 1)!| |u^{(n+1)}(x + θh)| ≤ (M/(n + 1)!) |h|^{n+1},   M = max_{y∈I} |u^{(n+1)}(y)|,

where we know that the maximum exists if I is a closed and bounded interval. So in this
case we have r_n = O(h^{n+1}).
Note that with the definition above, for positive integers p we have

    φ(h) = O(h^{p+1})  ⇒  φ(h) = O(h^p).

Therefore, sometimes when we write φ(h) = O(h^p) we mean that p is the biggest possible
integer such that this is true. Often it is convenient to write O(h^p) in formulae with sums;
for example, in the Taylor expansion of u above we replace r_n with O(h^{n+1}) such that

    u(x + h) = Σ_{m=0}^{n} (h^m/m!) u^{(m)}(x) + O(h^{n+1}).

In general, if φ(h) = O(h^{p_φ}) and ψ(h) = O(h^{p_ψ}), then

    φ(h) + ψ(h) = O(h^q),  where q = min(p_φ, p_ψ).

But sometimes one can get higher powers. An obvious example is φ(h) = h²,
ψ(h) = h³ − h²: each of them is O(h²) but their sum is O(h³). If you multiply a function
φ(h) by a constant (≠ 0), the order does not change.

2.2.3 Difference approximations to the derivatives

We introduce a grid on R, i.e. a monotone sequence of real numbers {x_n}, x_n ∈ R:

    · · ·  x_{n−1}  x_n  x_{n+1}  · · ·

Assume that u(x) is a given function, u ∈ C^q(I), for a q which we will specify later. Let

    u_n := u(x_n),   u_n^{(m)} := u^{(m)}(x_n).

Assume the grid points x_n are equidistant, i.e. x_{n+1} = x_n + h for all n, where h ∈ R is
called the step-size. We want to approximate u_n^{(m)} with expressions of the type

    Σ_{ℓ=p}^{q} a_ℓ u_{n+ℓ},

where p ≤ q are integers, and typically p ≤ 0 and q ≥ 0.

Truncation error. We define

    τ_n(h) = Σ_{ℓ=p}^{q} a_ℓ u_{n+ℓ} − u_n^{(m)}.

The strategy is to choose p and q, and then compute the q − p + 1 parameters a_p, . . . , a_q
such that τ_n is "small".
By Taylor expansion we obtain

    u_{n+ℓ} = u(x_n + ℓh) = Σ_{k=0}^{ν} ((ℓh)^k/k!) u_n^{(k)} + r_ν,

where r_ν = O(h^{ν+1}) and ν ≥ m, such that

    τ_n = Σ_{ℓ=p}^{q} a_ℓ Σ_{k=0}^{ν} (1/k!) (ℓh)^k u_n^{(k)} − u_n^{(m)} + O(h^{ν+1}),

which can be rearranged in the form

    τ_n = Σ_{k=0}^{ν} (h^k/k!) ( Σ_{ℓ=p}^{q} a_ℓ ℓ^k ) u_n^{(k)} − u_n^{(m)} + O(h^{ν+1}).

We want τ_n = O(h^r) with r as big as possible. To approximate u_n^{(m)} we need to
impose conditions on p and q. Set j := q − p. In order to get consistent approximation
formulae (i.e. such that τ_n(h) → 0 when h → 0), we must require j ≥ m. We then choose
a_p, . . . , a_q such that

    (h^k/k!) Σ_{ℓ=p}^{q} ℓ^k a_ℓ  =  0  for 0 ≤ k ≤ m − 1,
                                     1  for k = m,                    (2.5)
                                     0  for m + 1 ≤ k ≤ j.

Note that we have q − p + 1 = j + 1 free parameters a_p, . . . , a_q, and the conditions
in (2.5) must be satisfied for 0 ≤ k ≤ j; this means a total of j + 1 conditions. The system
of equations has a unique solution for h ≠ 0. If we choose the a_ℓ from (2.5), and assume
ν ≥ j, we obtain the following truncation error:

    τ_n = Σ_{k=j+1}^{ν} (h^k/k!) u_n^{(k)} Σ_{ℓ=p}^{q} a_ℓ ℓ^k + O(h^{ν+1}).

This method is called the method of undetermined coefficients.



Example. m = 1 (u_n'). Choose p = −1, q = 1, j = 2. We want to find a_{−1}, a_0, a_1. We
write the j + 1 = 3 equations, i.e. k = 0, 1, 2, in (2.5):

    k = 0:   a_{−1} + a_0 + a_1 = 0                     a_{−1} = −1/(2h)
    k = 1:   −h a_{−1} + 0 · a_0 + h a_1 = 1      ⇒     a_0 = 0
    k = 2:   h² a_{−1} + 0 · a_0 + h² a_1 = 0           a_1 = 1/(2h)

Looking at the first terms in the local truncation error we obtain

    τ_n = Σ_{k=3} (h^k/k!) u_n^{(k)} ( −(−1)^k/(2h) + 1/(2h) ) = Σ_{s=1} h^{2s} u_n^{(2s+1)}/(2s + 1)!.

In the last equality we have used the fact that the terms with even k disappear, such that
we can put k = 2s + 1 and let s = 1, 2, . . . . We have omitted the upper limit value for the
index s on purpose, because the number of terms we include in the remainder depends on
the circumstances. Since the first term in the expression for τ_n is of type h², we say that
the formula is of order 2.
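The linear system (2.5) is small and easy to set up on a computer. Here is a minimal
Matlab sketch of the method of undetermined coefficients (the function name and interface
are our own); for p = −1, q = 1, m = 1 it reproduces the coefficients −1/(2h), 0, 1/(2h)
found above:

    function a = undetermined_coeffs(p, q, m, h)
    % Solve (2.5): row k has entries (h^k/k!) * l^k, right hand side is e_{m+1}.
    l = p:q;                       % offsets of the grid points used
    j = q - p;
    V = zeros(j+1, j+1);
    b = zeros(j+1, 1);
    for k = 0:j
        V(k+1, :) = (h^k / factorial(k)) * l.^k;
    end
    b(m+1) = 1;
    a = V \ b;                     % coefficients a_p, ..., a_q
    end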

Some more formulae. Other popular difference approximations are

    m = 1:   (u_{n+1} − u_n)/h = u_n' + (1/2!) h u_n'' + · · ·

    m = 1:   (u_n − u_{n−1})/h = u_n' − (1/2!) h u_n'' + · · ·

    m = 2:   (u_{n+1} − 2u_n + u_{n−1})/h² = u_n'' + (1/12) h² u_n^{(4)} + · · ·

What order do these formulae have?

Exercises

1. Assume m = 1, p = −2, q = 0; use the method of undetermined coefficients to obtain
   an approximation of u_n^{(1)} of second order in h.

2. Assume m = 1, p = −2, q = 1; use the method of undetermined coefficients to obtain
   an approximation of u_n^{(1)} of third order in h.

3. Consider m = 2, p = 0, q = 1, so that j = q − p = 1 < m. Try to construct an
   approximation formula for u_n^{(2)} using the method of undetermined coefficients. What
   happens?

2.2.4 Difference operators and other operators

Forward difference:   Δu(x) = u(x + h) − u(x).

Backward difference:  ∇u(x) = u(x) − u(x − h).

Central difference:   δu(x) = u(x + h/2) − u(x − h/2).

Mean value:           μu(x) = (1/2) ( u(x + h/2) + u(x − h/2) ).

Shift:                Eu(x) = u(x + h).

Unity operator:       1 u(x) = u(x).

Linearity. All the operators

    Δ, ∇, δ, μ, E, 1

are linear. This means that for α ∈ R, and with functions u(x) and v(x), we have

    F(αu(x) + v(x)) = α F u(x) + F v(x),

where F can be any of the operators above. Let us verify this for F = Δ:

    Δ(αu(x) + v(x)) = (αu(x + h) + v(x + h)) − (αu(x) + v(x))
                    = α(u(x + h) − u(x)) + (v(x + h) − v(x)) = αΔu(x) + Δv(x).

Powers of the operators. Let F be one of the operators defined above. We can define
powers of F as follows:

    F^0 = 1,   F^k u(x) = F(F^{k−1} u(x)).

Example.

    δu(x) = u(x + h/2) − u(x − h/2),

    δ²u(x) = δ(δu(x)) = δu(x + h/2) − δu(x − h/2) = u(x + h) − u(x) − (u(x) − u(x − h))
           = u(x + h) − 2u(x) + u(x − h).

Another interesting example is the shift operator. We observe that E^k u(x) = u(x + kh).
In this case it is easy to extend the definition to include all possible real powers, simply by
defining E^s u(x) = u(x + sh) for all s ∈ R. For example, we have E^{−1} u(x) = u(x − h),
and this is the inverse of E since E u(x − h) = u(x) and E^{−1} u(x + h) = u(x).

Relations between the difference operators.

    Δu(x) = u(x + h) − u(x) = Eu(x) − 1 u(x) = (E − 1) u(x),
    ∇u(x) = u(x) − u(x − h) = 1 u(x) − E^{−1} u(x) = (1 − E^{−1}) u(x),
    δu(x) = u(x + h/2) − u(x − h/2) = (E^{1/2} − E^{−1/2}) u(x),
    μu(x) = (1/2) ( u(x + h/2) + u(x − h/2) ) = (1/2) (E^{1/2} + E^{−1/2}) u(x).

In a more compact notation we have

    Δ = E − 1,
    ∇ = 1 − E^{−1},
    δ = E^{1/2} − E^{−1/2},
    μ = (1/2) (E^{1/2} + E^{−1/2}).

And now we have, for example,

    Δ^k = (E − 1)^k = Σ_{ℓ=0}^{k} (k choose ℓ) (−1)^{k−ℓ} E^ℓ,

such that

    Δ^k u(x) = Σ_{ℓ=0}^{k} (k choose ℓ) (−1)^{k−ℓ} E^ℓ u(x) = Σ_{ℓ=0}^{k} (k choose ℓ) (−1)^{k−ℓ} u(x + ℓh).

2.2.5 Differential operator

Define

    D = d/dx,  so that  Du(x) = u'(x),

and let

    D^m u(x) = u^{(m)}(x).

If u(x) is analytic¹ in an interval containing x and x + h, we have

    u(x + h) = Σ_{m=0}^{∞} (h^m/m!) D^m u(x) = ( Σ_{m=0}^{∞} (1/m!) (hD)^m ) u(x) = e^{hD} u(x).

We think of this only as a notation. We have

    Eu(x) = e^{hD} u(x),

and then E = e^{hD}.

Relation between D and the other operators.

    Δ = E − 1 = e^{hD} − 1 = Σ_{m=1}^{∞} (1/m!) (hD)^m,

that is,

    Δ = hD + (1/2!) (hD)² + · · · .

We will see that, under the extra assumption that u is analytic, we can make manipulations
with analytic functions in the way we are used to. The meaning is always that the final
result is expanded in a Taylor expansion and is interpreted as a sum of powers of operators
which are applied to a smooth function. The analyticity requirement can always be
relaxed by considering a Taylor expansion with remainder, and requiring the function to be
differentiable only a finite number of times.
We consider powers of Δ, and we obtain

    Δ^k = ( Σ_{m=1}^{∞} (hD)^m/m! )^k = h^k D^k + (k/2!) h^{k+1} D^{k+1} + · · · ,

or

    Δ^k u(x) = h^k D^k u(x) + (k/2!) h^{k+1} D^{k+1} u(x) + · · · ,

showing that Δ^k/h^k is a first order approximation (truncation error O(h)) of the operator
D^k.
Note that for s ∈ R we have

    E^s u(x) = u(x + sh) = Σ_{k=0}^{∞} ((sh)^k/k!) D^k u(x) = e^{shD} u(x),

which reflects known computational rules. For central differences we can therefore write

    δ = E^{1/2} − E^{−1/2} = e^{hD/2} − e^{−hD/2} = 2 sinh(hD/2).

¹By an analytic function on an interval we simply mean that its Taylor expansion converges in a
neighborhood of any point of the interval.

We can also compute

    δ^k = ( 2 sinh(hD/2) )^k = ( hD + (1/24) (hD)³ + · · · )^k = (hD)^k + (k/24) (hD)^{k+2} + · · · ,

that is,

    δ^k u(x) = h^k D^k u(x) + (k/24) h^{k+2} D^{k+2} u(x) + · · · ;

this shows that δ^k/h^k is a second order approximation of D^k.
In particular we find, as before, that

    δ² u(x) = u(x + h) − 2u(x) + u(x − h) = h² u''(x) + (1/12) h⁴ u^{(4)}(x) + · · · .    (2.6)

It is tempting to manipulate further with analytic functions. We have seen that

    δ/2 = sinh(hD/2).

We therefore write formally

    D = (2/h) sinh^{−1}(δ/2).

It is possible to expand sinh^{−1} z in a Taylor expansion,

    sinh^{−1} z = z − (1/6) z³ + (3/40) z⁵ − (5/112) z⁷ + · · · ,

so with z = δ/2, multiplying by 2/h, we obtain

    D = (1/h) ( δ − (1/24) δ³ + (3/640) δ⁵ − (5/7168) δ⁷ + · · · ).

Since we know that δ^k = O(h^k), we see that we can find approximations to the differential
operator D of arbitrarily high order by including enough terms in the expansion. The
manipulation we have carried out is not rigorously justified here, but it turns out to be
correct. For a detailed discussion of algebraic manipulations with differential operators,
see the textbook by Arieh Iserles, A first course in the numerical analysis of differential
equations, published by Cambridge University Press.
Chapter 3

Boundary value problems

3.1 A simple case example

We consider the boundary value problem

    u_xx = f(x),  0 < x < 1,  u(0) = α,  u(1) = β;    (3.1)

the exact solution can be obtained by integrating twice and then imposing the boundary
conditions. We want to use this simple test problem to illustrate some of the basic features
of finite difference discretization methods.
To obtain a finite difference discretization for this problem we consider the grid

    x_m = m h,  m = 0, . . . , M + 1,  h = 1/(M + 1),

and the notation u_m := u(x_m), such that u_0 = α and u_{M+1} = β. We denote with capital
letters U_m ≈ u_m the numerical approximation to u(x) at the grid point x = x_m.
By replacing derivatives with central difference approximations in the left hand side of
(3.1) we obtain the so-called discrete problem, whose solution is the numerical approxima-
tion that we are seeking:

    (1/h²) (U_{m−1} − 2U_m + U_{m+1}) = f_m,  m = 1, . . . , M.    (3.2)
This is a linear system of equations

    A_h U = F,    (3.3)

where A_h is an M × M matrix, U ∈ R^M, F ∈ R^M, and

                 [ −2   1            ]          [ U_1     ]         [ f_1 − α/h²     ]
                 [  1  −2   1        ]          [ U_2     ]         [ f_2            ]
    A_h := (1/h²)[     ⋱   ⋱   ⋱     ],  U :=   [  ⋮      ],  F :=  [  ⋮             ].
                 [        1  −2   1  ]          [ U_{M−1} ]         [ f_{M−1}        ]
                 [            1  −2  ]          [ U_M     ]         [ f_M − β/h²     ]

We know that u''(x_m) = (1/h²) δ² u(x_m) + O(h²), and we want to deduce similar information
about the error

    e_m := U_m − u_m,   e_0 = 0,   e_{M+1} = 0.


Let the error vector e_h be

    e_h := U − u,   u := [u_1, . . . , u_M]^T.

We will in the sequel associate such a vector to a piecewise constant function e_h(x) defined
on the interval [0, 1] as follows:

    e_h(x) = e_m,  x ∈ [x_m, x_{m+1}),  m = 1, . . . , M.

Because we are approximating a function, the solution u(x) of (3.1), it is appropriate to
think of the numerical solution as a piecewise constant function approximating u(x), and
similarly for the error. We are therefore interested in measuring the norm of this piecewise
constant error function rather than the norm of the corresponding error vector; however,
the two are closely related. In fact we can see that the following relationships hold:

• ‖e_h‖_∞ = max_{0≤x≤1} |e_h(x)| = max_{1≤m≤M} |e_m| = ‖e_h‖_∞;

• ‖e_h‖_1 = ∫_0^1 |e_h(x)| dx = Σ_{m=1}^{M} ∫_{x_m}^{x_{m+1}} |e_h(x)| dx = h Σ_{m=1}^{M} |e_m| = h ‖e_h‖_1;

• ‖e_h‖_2 = ( ∫_0^1 |e_h(x)|² dx )^{1/2} = ( h Σ_{m=1}^{M} |e_m|² )^{1/2} = h^{1/2} ‖e_h‖_2;

and we see that in these three popular cases the vector norm is related to the function
norm of the corresponding piecewise constant function (with respect to the assumed grid)
simply by a scaling factor. A similar result is true for the general case ‖·‖_q.

Truncation error. The truncation error is the vector that by definition has components

    τ_m := (1/h²) (u_{m−1} − 2u_m + u_{m+1}) − f_m,  m = 1, . . . , M,   τ_h := [τ_1, . . . , τ_M]^T.

By using that (1/h²) δ² u(x_m) = u''(x_m) + (1/12) h² u_m^{(4)} + O(h⁴) and u_m'' = f_m, we obtain

    τ_m = u_m'' + (1/12) h² u_m^{(4)} + O(h⁴) − f_m = (1/12) h² u_m^{(4)} + O(h⁴).

Equation for the error. The relationship between the error e_h and the truncation error
is easily obtained: recall that by definition

    τ_h = A_h u − F;

rearranging and subtracting this from A_h U = F we obtain the important relation

    A_h e_h = −τ_h,    (3.4)

which can also be written componentwise as

    (1/h²) (e_{m−1} − 2e_m + e_{m+1}) = −τ_m,  m = 1, . . . , M.

Definition. A method for the boundary value problem (3.1) is said to be consistent with
the boundary value problem with respect to the norm ‖·‖ if and only if

    ‖τ_h‖ → 0  when h → 0.

Consistency in the vector norm ‖·‖ implies that the corresponding piecewise constant
function tends to zero as h tends to zero in the corresponding function norm; this is because
of the relationship between vector and function norms, as we have seen for ‖·‖_1, ‖·‖_∞ and
‖·‖_2.

Definition. A method for the boundary value problem (3.1) is said to be convergent in
the (function) norm ‖·‖ if and only if

    ‖e_h‖ → 0  when h → 0.

Definition. Stability. Assume a difference method for the boundary value problem (3.1)
is given by the discrete equation

    A_h U = F,

where h is the step-size of the discretization. The method is stable in the norm ‖·‖ if there
exist constants C > 0 and H > 0 such that

1. A_h^{−1} exists for all h < H;

2. ‖A_h^{−1}‖ ≤ C for all h < H.

The matrix norm in which we should prove stability is the one subordinate to the
chosen function/vector norm in which we want to prove convergence.

Proposition. For the boundary value problem (3.1), stability and consistency with
respect to the norm ‖·‖ imply convergence in the same norm.

Proof. We use (3.4) to obtain a bound for the norm of the error. Since we have stability,
A_h is invertible for all h < H and therefore

    e_h = −A_h^{−1} τ_h.

Then

    ‖e_h‖_q ≤ h^{1/q} ‖A_h^{−1}‖_q ‖τ_h‖_q = ‖A_h^{−1}‖_q ‖τ_h‖_q ≤ C ‖τ_h‖_q,

where in the first expression ‖τ_h‖_q is the vector norm, and the factor h^{1/q} turns it into
the function norm used in the middle expression. We conclude that ‖e_h‖_q → 0 as h → 0
because so does ‖τ_h‖_q.

For the case ‖·‖_∞ the vector norm of v ∈ R^M and the function norm of the corre-
sponding piecewise constant function defined on the grid coincide, so to include this case
we can conveniently adopt the notation h^{1/∞} := lim_{q→∞} h^{1/q} = 1.
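The whole discussion can be tested in a few lines of Matlab. The sketch below is our own
test problem (f(x) = e^x, for which the exact solution is u(x) = e^x); it assembles A_h with
spdiags, solves (3.3), and measures the ∞-norm error. Halving h should divide the error
by about four, in agreement with the O(h²) truncation error:

    f = @(x) exp(x);  uex = @(x) exp(x);          % u(x) = e^x solves u'' = e^x
    alpha = uex(0); beta = uex(1);
    for M = [9 19 39]
        h = 1/(M+1);  x = (1:M)'*h;
        e = ones(M,1);
        Ah = spdiags([e -2*e e], -1:1, M, M)/h^2;  % the matrix in (3.3)
        F = f(x);  F(1) = F(1) - alpha/h^2;  F(M) = F(M) - beta/h^2;
        U = Ah\F;
        fprintf('h = %6.4f   error = %8.2e\n', h, norm(U - uex(x), inf));
    end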

3.1.1 2-norm stability for the case example

Observe that the eigenvalues of the matrix A_h in (3.3) are

    λ_m = (2/h²) (cos(mπh) − 1),  m = 1, . . . , M,

and the corresponding eigenvectors v_m have components

    v_j^m = sin(mπjh),  j = 1, . . . , M.

Since by definition ‖A_h‖_2 = √ρ(A_h^T A_h), and because A_h is symmetric, ‖A_h‖_2 = ρ(A_h).
Denote by σ(B) the spectrum of the M × M matrix B (the collection of all the eigenvalues
of B); then ‖A_h‖_2 = max_{λ∈σ(A_h)} |λ|. Analogously,

    ‖A_h^{−1}‖_2 = ρ(A_h^{−1}) = max_{λ∈σ(A_h)} |λ^{−1}| = 1 / min_{λ∈σ(A_h)} |λ|.

Using the series expansion cos(x) = 1 − x²/2 + x⁴/4! + O(x⁶), with x = mπh, in the
expression for the m-th eigenvalue we get

    λ_m = (2/h²) ( −(mπh)²/2 + (mπh)⁴/4! + O(h⁶) ) = −m²π² + O(h²),

and

    min_{λ∈σ(A_h)} |λ| = |λ_1| = π² + O(h²),

such that

    ‖A_h^{−1}‖_2 = 1/|λ_1| → 1/π²  when h → 0,

and so there exist C > 0 and H > 0 such that ‖A_h^{−1}‖_2 = 1/|λ_1| < C for all h < H, which
proves stability of the proposed difference scheme in the 2-norm.
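These eigenvalues are easy to confirm numerically; a short Matlab check (our own, reusing
the spdiags construction above):

    M = 50; h = 1/(M+1); e = ones(M,1);
    Ah = spdiags([e -2*e e], -1:1, M, M)/h^2;
    lam = sort(eig(full(Ah)), 'descend');       % numerical eigenvalues, all negative
    m = (1:M)';
    lam_formula = 2/h^2*(cos(m*pi*h) - 1);      % the formula above
    max(abs(lam - lam_formula))                 % agreement to rounding error
    1/abs(lam(1)) - 1/pi^2                      % norm of inv(Ah) in 2-norm approaches 1/pi^2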
Exercise. Using the estimates for ‖A_h^{−1}‖_2 and ‖τ_h‖_2 obtained in this section, prove
that

    ‖e_h‖ ≤ (1/π²) (h^{2.5}/12) ‖f''‖_2 + O(h^{3.5});

assume f is twice differentiable.

3.1.2 Neumann boundary conditions

We consider now the boundary value problem

    u_xx = f(x),  0 < x < 1,  u'(0) = σ,  u(1) = β,    (3.5)

and we propose three different discretizations of the left boundary condition which, com-
bined with the discretization of the second derivative considered earlier, will lead to three
different linear systems. In this case the matrices we obtain are no longer symmetric.

Case 1

We approximate u'(0) = σ by

    (U_1 − U_0)/h = σ,

and combine this discrete equation with (3.2); this leads to the (M + 1) × (M + 1) linear
system

    A_h U = F,

where

                 [ −h   h            ]          [ U_0 ]         [ σ          ]
                 [  1  −2   1        ]          [ U_1 ]         [ f_1        ]
    A_h := (1/h²)[     ⋱   ⋱   ⋱     ],  U :=   [  ⋮  ],  F :=  [  ⋮         ].
                 [        1  −2   1  ]          [  ⋮  ]         [ f_{M−1}    ]
                 [            1  −2  ]          [ U_M ]         [ f_M − β/h² ]

This leads to an overall method of consistency order 1.

Case 2

To obtain order two we introduce a fictitious node external to the domain, to the left of
x_0, which we call x_{−1}. We approximate u'(0) = σ by

    (U_1 − U_{−1})/(2h) = σ.

We use this equation together with (3.2) for m = 0, i.e.

    (1/h²) (U_{−1} − 2U_0 + U_1) = f_0,

and eliminate U_{−1}. We are left with the equation

    (U_1 − U_0)/h = (f_0 h)/2 + σ,

which we now combine with (3.2); this leads to an (M + 1) × (M + 1) linear system

    A_h U = F,

where A_h is as in the previous case and

    F := [ (f_0 h)/2 + σ,  f_1,  . . . ,  f_{M−1},  f_M − β/h² ]^T.

Case 3

In this discretization we want to obtain order 2 without the use of fictitious nodes. We
then consider the approximation of u'(0) = σ by

    −( (3/2) U_0 − 2U_1 + (1/2) U_2 ) / h = σ,

where the left hand side gives a discretization of order two of the first derivative of u at
zero. Proceeding as in case 1, we obtain the (M + 1) × (M + 1) linear system

    A_h U = F,

where

                 [ −(3/2)h  2h  −h/2      ]          [ U_0 ]         [ σ          ]
                 [  1  −2   1             ]          [ U_1 ]         [ f_1        ]
    A_h := (1/h²)[     ⋱   ⋱   ⋱          ],  U :=   [  ⋮  ],  F :=  [  ⋮         ].
                 [         1  −2   1      ]          [  ⋮  ]         [ f_{M−1}    ]
                 [             1  −2      ]          [ U_M ]         [ f_M − β/h² ]

Exercise. Prove with the method of undetermined coefficients that

    −( (3/2) u(0) − 2u(h) + (1/2) u(2h) ) / h = u'(0) + O(h²).
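A quick numerical order check of this one-sided formula, as a Matlab sketch with a test
function of our own choosing:

    u = @(x) sin(x);                      % so that u'(0) = 1
    for h = [0.1 0.05 0.025]
        d = -(1.5*u(0) - 2*u(h) + 0.5*u(2*h))/h;
        fprintf('h = %5.3f   error = %8.2e\n', h, abs(d - 1));
    end
    % the error is divided by about 4 each time h is halved: order 2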

Neumann boundary conditions on both sides

We consider now the boundary value problem

    u_xx = f(x),  0 < x < 1,  u'(0) = σ_0,  u'(1) = σ_1,    (3.6)

and we propose a second order discretization of the boundary conditions using two fictitious
nodes external to [0, 1], x_{−1} and x_{M+2}. The approximated boundary conditions become

    (U_1 − U_{−1})/(2h) = σ_0,   (U_{M+2} − U_M)/(2h) = σ_1,

which we combine respectively with (3.2) for m = 0 and m = M + 1 and obtain the
(M + 2) × (M + 2) linear system

    A_h U = F,

where

                 [ −h   h            ]            [ U_0     ]         [ σ_0 + (h/2) f_0     ]
                 [  1  −2   1        ]            [ U_1     ]         [ f_1                 ]
    A_h := (1/h²)[     ⋱   ⋱   ⋱     ],   U :=    [  ⋮      ],  F :=  [  ⋮                  ].
                 [        1  −2   1  ]            [ U_M     ]         [ f_M                 ]
                 [            h  −h  ]            [ U_{M+1} ]         [ (h/2) f_{M+1} − σ_1 ]

But now the matrix A_h is singular; in fact it is easy to verify that v := [1, . . . , 1]^T ∈ R^{M+2}
is an eigenvector of A_h with eigenvalue 0. We want to determine the condition for existence
of solutions of this linear system.

Before we do that, we consider the solution of (3.6). Integrating both sides between 0
and 1 we obtain

    u'(1) − u'(0) = ∫_0^1 f(x) dx,

that is,

    σ_1 − σ_0 = ∫_0^1 f(x) dx;    (3.7)

this is a necessary condition for the existence of solutions of the boundary value problem
(3.6).
We now turn to the analysis of the solutions of the discrete boundary value problem
A_h U = F. Recall that

    Range(A_h) := span{A_h e_1, . . . , A_h e_{M+2}},

where we denote by e_j, j = 1, . . . , M + 2, the canonical basis of R^{M+2}, so that
A_h e_1, . . . , A_h e_{M+2} are the columns of A_h.
Recall that

    Ker(A_h^T) = { z ∈ R^{M+2} | A_h^T z = 0 },

and Ker(A_h^T) is a vector space.

Proposition. A_h U = F admits a solution if and only if F ∈ Range(A_h).

Proposition. F ∈ Range(A_h) if and only if F ⊥ Ker(A_h^T).

Proof. The proof of these two propositions is left as an exercise.

Since F ⊥ Ker(A_h^T) if and only if

    F^T z = 0  for all z ∈ R^{M+2} such that A_h^T z = 0,

we need to characterize a generic element z = [z_0, . . . , z_{M+1}]^T of Ker(A_h^T); this amounts
to finding the solutions of the homogeneous linear system A_h^T z = 0,

          [ −h   1              ]   [ z_0     ]   [ 0 ]
          [  h  −2   1          ]   [ z_1     ]   [ 0 ]
    (1/h²)[      1  −2   ⋱      ]   [  ⋮      ] = [ ⋮ ],
          [          ⋱   ⋱   h  ]   [  ⋮      ]   [ 0 ]
          [              1  −h  ]   [ z_{M+1} ]   [ 0 ]

and we find that the solution is z = [z_0, hz_0, . . . , hz_0, z_0]^T ∈ R^{M+2} for any value of
z_0 ∈ R. Then

    F^T z = 0  ⇔  z_0 ( (h/2) f_0 + σ_0 ) + h z_0 Σ_{m=1}^{M} f_m + z_0 ( (h/2) f_{M+1} − σ_1 ) = 0,

for all z_0; for z_0 ≠ 0 we can divide out by z_0 to obtain the condition

    σ_1 − σ_0 = (h/2) f_0 + h Σ_{m=1}^{M} f_m + (h/2) f_{M+1}.

We easily recognize that this condition is the discrete analogue of (3.7): in fact the
right hand side is the composite trapezoidal rule applied to the integral of f(x) on the
discretization grid.
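Both the kernel vector and the compatibility condition are easy to verify numerically; a
minimal sketch (our own construction of A_h for this pure Neumann problem):

    M = 8; h = 1/(M+1); e = ones(M+2,1);
    Ah = spdiags([e -2*e e], -1:1, M+2, M+2)/h^2;
    Ah(1,1:2) = [-h h]/h^2;                 % first and last rows from the boundary conditions
    Ah(M+2,M+1:M+2) = [h -h]/h^2;
    z = [1; h*ones(M,1); 1];                % the kernel vector of Ah' derived above
    norm(Ah'*z)                             % zero to rounding error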

3.2 Linear boundary value problems

Consider the general linear boundary value problem

    a(x) u''(x) + b(x) u'(x) + c(x) u(x) = f(x),  a ≤ x ≤ b,  u(a) = α,  u(b) = β.

A discretization of this problem on the grid x_m = a + m h, h = (b − a)/(M + 1), is

    a_m (U_{m−1} − 2U_m + U_{m+1})/h² + b_m (U_{m+1} − U_{m−1})/(2h) + c_m U_m = f_m,  m = 1, . . . , M,

where a_m := a(x_m), b_m := b(x_m), c_m := c(x_m) and U_m ≈ u_m := u(x_m). The corre-
sponding linear system of equations is tridiagonal; the m-th row reads

    (1/h²) [ (a_m − h b_m/2) U_{m−1} + (h² c_m − 2a_m) U_m + (a_m + h b_m/2) U_{m+1} ] = f_m,

and the terms involving the known boundary values U_0 = α and U_{M+1} = β are moved to
the right hand side, so that the first and last equations become

    (1/h²) [ (h² c_1 − 2a_1) U_1 + (a_1 + h b_1/2) U_2 ] = f_1 − α (a_1/h² − b_1/(2h)),
    (1/h²) [ (a_M − h b_M/2) U_{M−1} + (h² c_M − 2a_M) U_M ] = f_M − β (a_M/h² + b_M/(2h)).

The linear system we obtain will in general not be symmetric.

3.2.1 A self-adjoint case

Suppose now we want to discretize

    (κ(x) u'(x))' = f(x),  a ≤ x ≤ b,  u(a) = α,  u(b) = β.

We could differentiate out the left hand side to obtain a problem of the class described
above, i.e.

    κ(x) u''(x) + κ'(x) u'(x) = f(x);

this is however not the best strategy because, as we have seen, it leads in general to non-
symmetric linear systems. Consider now the differential operator L defined as follows:

    L u(x) := (κ(x) u'(x))'.

This operator is self-adjoint with respect to the L²([a, b]) inner product

    ⟨f, g⟩ := ∫_a^b f(x) g(x) dx,   f, g ∈ L²([a, b]).

This motivates the search for a symmetric discretization A_h of L.


To achieve this we consider the mid-points of the grid, x_{m−1/2} = x_m − h/2 and
x_{m+1/2} = x_m + h/2, and the following approximations:

    κ_{m−1/2} u'_{m−1/2} = κ_{m−1/2} (u_m − u_{m−1})/h + O(h²)

and

    κ_{m+1/2} u'_{m+1/2} = κ_{m+1/2} (u_{m+1} − u_m)/h + O(h²).

We use these two to produce

    (κ u')'(x_m) = (1/h²) [ κ_{m−1/2} u_{m−1} − (κ_{m−1/2} + κ_{m+1/2}) u_m + κ_{m+1/2} u_{m+1} ] + O(h²).

The method is of second order, and the linear system we have to solve is

          [ −(κ_{1/2}+κ_{3/2})   κ_{3/2}                              ] [ U_1 ]   [ f_1 − α κ_{1/2}/h²   ]
          [  κ_{3/2}  −(κ_{3/2}+κ_{5/2})   κ_{5/2}                    ] [ U_2 ]   [ f_2                  ]
    (1/h²)[           ⋱              ⋱             ⋱                  ] [  ⋮  ] = [  ⋮                   ].
          [                κ_{M−1/2}   −(κ_{M−1/2}+κ_{M+1/2})         ] [ U_M ]   [ f_M − β κ_{M+1/2}/h² ]
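A sketch of the assembly in Matlab (κ and the grid are our own example choices); note
that the resulting matrix is symmetric by construction:

    kappa = @(x) 1 + x.^2;                      % an example variable coefficient
    a = 0; b = 1; M = 20; h = (b-a)/(M+1);
    km = kappa(a + ((1:M+1)' - 0.5)*h);         % kappa at the mid-points x_{m-1/2}, m = 1..M+1
    main = -(km(1:M) + km(2:M+1));              % diagonal: -(k_{m-1/2} + k_{m+1/2})
    Ah = spdiags([[km(2:M);0] main [0;km(2:M)]], -1:1, M, M)/h^2;
    issymmetric(Ah)                             % returns true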

3.3 A nonlinear example

We consider the pendulum equation

    θ''(t) + sin(θ(t)) = 0,  0 < t < T,  θ(0) = α,  θ(T) = β,    (3.8)

with numerical discretization on the grid t_m = m h, m = 0, . . . , M + 1, and h = T/(M + 1),
leading to the discretized equations

    (1/h²) (Θ_{m−1} − 2Θ_m + Θ_{m+1}) + sin(Θ_m) = 0,  m = 1, 2, . . . , M.

The system of equations is now nonlinear, and we denote it in vector form as

    G_h(Θ) = 0,   Θ = [Θ_1, . . . , Θ_M]^T,

where G_h(Θ) = A_h Θ + sin(Θ), sin(Θ) := [sin(Θ_1), . . . , sin(Θ_M)]^T, and A_h is the same
as in (3.3).
The numerical solution of this system must be addressed iteratively, for example with
Newton's method:

    Θ^{[k+1]} = Θ^{[k]} − J_h(Θ^{[k]})^{−1} G_h(Θ^{[k]}),

where J_h(Θ) is the Jacobian of G_h(Θ). It is easily seen that J_h(Θ) is a tridiagonal
symmetric matrix with

    (J_h)_{i,j}(Θ) =  1/h²               for j = i − 1 or j = i + 1,
                     −2/h² + cos(Θ_i)    for j = i,
                      0                  otherwise,

so

    J_h(Θ) = A_h + C(Θ),   C(Θ) := diag(cos(Θ_1), . . . , cos(Θ_M)).
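A minimal Matlab sketch of this Newton iteration (tolerance, starting guess and parameter
values are our own choices; the boundary values enter G_h through the vector b):

    T = 3; alpha = 0.5; beta = 0.5; M = 49; h = T/(M+1);
    e = ones(M,1);
    Ah = spdiags([e -2*e e], -1:1, M, M)/h^2;
    b = zeros(M,1); b(1) = alpha/h^2; b(M) = beta/h^2;
    Th = alpha + (beta-alpha)*(1:M)'/(M+1);          % starting guess: linear interpolant
    for k = 1:20
        G = Ah*Th + sin(Th) + b;                     % G_h(Theta), boundary terms included
        Jh = Ah + spdiags(cos(Th), 0, M, M);         % J_h = A_h + C(Theta)
        d = Jh\G;
        Th = Th - d;                                 % Newton update
        if norm(d, inf) < 1e-12, break, end
    end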
The truncation error is

    τ_h := G_h(θ),   θ := [θ_1, . . . , θ_M]^T,   θ_j := θ(t_j).

It is easily shown by Taylor expansion that the components of τ_h satisfy

    τ_j = (1/12) h² θ^{(4)}(t_j) + O(h⁴),  j = 1, . . . , M;

this ensures that the method is consistent of order 2. As usual we want to use the connection
between the error E_h := Θ − θ and the truncation error τ_h in order to prove convergence.
In general this connection is a bit less manageable in the case of nonlinear problems, but
in this particular example it is not too complicated.
this particular example it is not too complicated.
We combine the discrete equation G_h(Θ) = 0 and the equation for the truncation error
G_h(θ) = τ_h to obtain

    G_h(Θ) − G_h(θ) = −τ_h.    (3.9)

From (3.9), using G_h(Θ) = A_h Θ + sin(Θ), we get

    A_h E_h + sin(Θ) − sin(θ) = −τ_h,

and by Taylor expansion

    sin(Θ) = sin(θ) + C(θ̂) E_h,

where C(θ̂) is the diagonal matrix defined earlier, and the components θ̂_i of the vector θ̂
belong to the open intervals between Θ_i and θ(t_i).
Due to the stability already proven in the linear case, A_h is invertible for all h < H,
and so

    E_h + A_h^{−1} C(θ̂) E_h = −A_h^{−1} τ_h.

We here use the same notation for the vectors E_h and τ_h and the corresponding piecewise
constant functions defined on the discretization grid; the norms we consider in the sequel
are function norms. Working in the 2-norm, we get

    ‖E_h‖_2 ≤ ‖A_h^{−1}‖_2 ( ‖C(θ̂) E_h‖_2 + ‖τ_h‖_2 ) ≤ ‖A_h^{−1}‖_2 ( max_{1≤m≤M} |cos(θ̂_m)| ‖E_h‖_2 + ‖τ_h‖_2 ),

so, since |cos(θ̂_m)| ≤ 1,

    (1 − ‖A_h^{−1}‖_2) ‖E_h‖_2 ≤ ‖A_h^{−1}‖_2 ‖τ_h‖_2.

We know from the earlier analysis that ‖A_h^{−1}‖_2 = 1/|λ_1|, where λ_1 is the eigenvalue of
A_h with minimum absolute value. We also obtained the estimate |λ_1| = π² + O(h²), and
so we can also deduce that ‖A_h^{−1}‖_2 = 1/π² + O(h²); for h small enough ‖A_h^{−1}‖_2 < 1
and we get

    ‖E_h‖_2 ≤ ( ‖A_h^{−1}‖_2 / (1 − ‖A_h^{−1}‖_2) ) ‖τ_h‖_2.

Using again the estimate for ‖A_h^{−1}‖_2, we can obtain

    ‖A_h^{−1}‖_2 / (1 − ‖A_h^{−1}‖_2) = 1/(π² − 1) + O(h²).

Therefore

    ‖E_h‖_2 ≤ ( 1/(π² − 1) + O(h²) ) ‖τ_h‖_2,

and recalling that ‖τ_h‖_2 = (1/12) h² ‖θ^{(4)}‖_2 + O(h⁴), we finally get

    ‖E_h‖_2 ≤ (1/(π² − 1)) ( (1/12) h² ‖θ^{(4)}‖_2 + O(h⁴) ),

which guarantees convergence when h goes to zero, as it is easy to see that θ^{(4)} is bounded,
by differentiating the equation twice.
Chapter 4

Discretization of the heat equation

4.1 On the derivation of the heat equation

We will use a standard example to describe the numerical schemes for parabolic differential
equations throughout the course: the linear heat equation in one space dimension, which
for example can be used to model the flow of heat in a straight homogeneous rod over
time. The rod is insulated everywhere except at the two ends.

[Figure: a straight rod of length 1, with the coordinate x running from 0 to 1.]

The rod in the picture has length L = 1, and we let the coordinate x describe a point
along the rod. At time t ≥ 0 the rod has temperature u(x, t) at the point x. We can derive
the differential equation by using Fourier's law: the flux of heat φ through a cross section
of the rod at x is proportional to the temperature gradient, such that

    φ = −λ u_x,  λ > 0,

and the following conservation law holds true:

    ρc u_t + φ_x = 0,  ρ being the rod's density.

These two equations together imply that

    u_t = a u_xx,   a = λ/(ρc).

By introducing scales for time, space and temperature,

    w = u/u_0,   y = x/L,   τ = at/L²,

where w, y and τ are dimensionless variables, u_0 is the characteristic temperature, and L
is the rod's length, we get

    w_τ = w_yy,  0 < y < 1.

We have seen that after scaling it is always possible to assume that the space interval is
[0, 1] and that the coefficient a can be set equal to 1. From now on we will usually look
at the problem

    u_t = u_xx,  0 < x < 1,  t > 0.


Together with the differential equation we need to provide appropriate boundary conditions
and initial conditions. The kind of boundary and initial conditions necessary and sufficient
to get a unique solution varies from differential equation to differential equation. We will
consider a few options for the heat equation.

Pure initial value problem. In this case we assume the rod is infinitely long:

    u_t = u_xx,  x ∈ R,  t > 0,
    u(x, 0) = f(x),  x ∈ R.

Initial/boundary value problem (I/BVP). This case includes the situation of heat
transport in a homogeneous rod of length 1. We must consider an initial function and
boundary conditions at the two ends of the rod:

    u_t = u_xx,  0 < x < 1,  t > 0,
    u(x, 0) = f(x),  0 ≤ x ≤ 1,
    u(0, t) = g_0(t),  t > 0,
    u(1, t) = g_1(t),  t > 0.    (4.1)

[Figure: the domain in the (x, t)-plane, with u = f on the initial line and u = g_0, u = g_1 on the lateral boundaries.]

4.2 Numerical solution of the initial/boundary value problem

4.2.1 Numerical approximation on a grid

We adopt a step-size h in the x-direction, and one in the t-direction which we denote by k.
We assume at first that h = 1/(M + 1) for a given integer M.
We then define gridpoints or nodes (x_m, t_n) by

    x_m = mh,  0 ≤ m ≤ M + 1,   t_n = nk,  n = 0, 1, 2, . . . .

Observe that this means that x_0 = 0 and x_{M+1} = 1 are the boundary points. The solution
at the point (x_m, t_n) is denoted u_m^n := u(x_m, t_n), and from now on we denote by U_m^n
the approximation to the solution at (x_m, t_n) produced by the numerical method.

[Figure: the grid in the (x, t)-plane; filled nodes mark points where u is known (initial and boundary values), open nodes points where u is unknown.]

4.2.2 Euler, Backward Euler and Crank–Nicolson

We present three different difference schemes for the heat equation.

Euler's method. We adopt a simpler notation for the derivatives and set

    ∂_x u = u_x = ∂u/∂x,   ∂_x^k u = ∂^k u/∂x^k,   ∂_t u = u_t = ∂u/∂t.

We expand u_m^{n+1} = u(x_m, t_n + k), for constant x = x_m, around t = t_n, and get

    u_m^{n+1} = u_m^n + k ∂_t u_m^n + φ_m^n,   φ_m^n = (1/2) k² ∂_t² u_m^n + · · · .

But we can now use the heat equation, ensuring ∂_t u_m^n = ∂_x² u_m^n, and approximate this
second derivative with central differences as in (2.6):

    u_m^{n+1} = u_m^n + (k/h²) δ_x² u_m^n − ψ_m^n + φ_m^n,

where the index on δ means that we apply this operator in the x-direction, i.e.

    δ_x² u_m^n = u_{m+1}^n − 2u_m^n + u_{m−1}^n.

From the expression in (2.6) we find that

    ψ_m^n = (1/12) k h² ∂_x⁴ u_m^n + · · · .

Summarizing, we have

    u_m^{n+1} = u_m^n + (k/h²) δ_x² u_m^n + k τ_m^n = u_m^n + (k/h²) (u_{m+1}^n − 2u_m^n + u_{m−1}^n) + k τ_m^n,

where

    k τ_m^n = φ_m^n − ψ_m^n = ( (1/2) k² ∂_t² − (1/12) k h² ∂_x⁴ ) u_m^n + · · · .

Euler's formula is obtained by replacing all the exact u-values (small letters) with
approximate values U (capital letters) in the above formula and discarding the term k τ_m^n.

Euler's method

    U_m^{n+1} = U_m^n + r δ_x² U_m^n,   r = k/h².    (4.2)

[Computational molecule: the new value at (m, n+1) is computed from the values at (m−1, n), (m, n) and (m+1, n).]

Such a picture is called a computational molecule; it is a sort of local chart over the grid,
indicating which grid-points are used in the formula. The idea is now to start at n = 0,
corresponding to t_0 = 0, where u(x, t_0) = u(x, 0) = f(x) is known. It is then possible to
assign the values U_m^0 = f(x_m), m = 0, . . . , M + 1. Then we set n = 1 and use first the
boundary values to get U_0^1 = g_0(k) and U_{M+1}^1 = g_1(k). For the remaining values we
use the formula (4.2) above. It is possible to see that the grid-point at level t_{n+1} = t_1
in the computational molecule can be computed using known values.

Algorithm (Euler's method for the heat equation)

    U_m^0 := f(x_m),  m = 0, . . . , M + 1
    for n = 0, 1, 2, . . .
        U_0^{n+1} := g_0(t_{n+1})
        U_{M+1}^{n+1} := g_1(t_{n+1})
        U_m^{n+1} := U_m^n + r δ_x² U_m^n,  m = 1, . . . , M
    end

Example.

    u_t = u_xx,  0 < x < 1,  t > 0,

    f(x) = 2x        for 0 ≤ x ≤ 1/2,
           2(1 − x)  for 1/2 < x ≤ 1,

    g_0(t) = g_1(t) = 0,  t > 0.

[Figure: the initial function f, a triangular profile peaking at x = 1/2.]

We run a Matlab simulation based on this example, with h = 0.1 (M = 9) and k = 0.0045.
The reason why we take k so small compared to h will be explained later on. Figure 4.1
shows the numerical solution at the grid-points as small rings, at the time steps n = 0, 5,
10 and 20.
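A minimal Matlab sketch of the algorithm on this example (vectorized over the interior
points; array indices are shifted by one since Matlab arrays start at 1):

    M = 9; h = 1/(M+1); k = 0.0045; r = k/h^2;
    x = (0:M+1)'*h;
    U = min(2*x, 2*(1-x));             % the initial function f
    for n = 1:20
        U(2:M+1) = U(2:M+1) + r*(U(1:M) - 2*U(2:M+1) + U(3:M+2));
        U(1) = 0; U(M+2) = 0;          % boundary values g0 = g1 = 0
        if any(n == [5 10 20]), plot(x, U, 'o-'), hold on, end
    end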
4.2. NUMERICAL SOLUTION OF THE INITIAL/BOUNDARY VALUE PROBLEM31

n=0 n=5
1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1

n=10 n=20
1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1

Figure 4.1: Matlab simulation of the heat equation using the Euler method

Backward Euler. We now instead expand u_m^n around x = x_m, t = t_{n+1}, and obtain

    u_m^n = u(x_m, t_{n+1} − k)
          = u_m^{n+1} − k ∂_t u_m^{n+1} + (1/2) k² ∂_t² u_m^{n+1} + · · ·
          = u_m^{n+1} − k ∂_x² u_m^{n+1} + (1/2) k² ∂_t² u_m^{n+1} + · · ·
          = u_m^{n+1} − k ( (1/h²) δ_x² u_m^{n+1} − (1/12) h² ∂_x⁴ u_m^{n+1} + · · · ) + (1/2) k² ∂_t² u_m^{n+1} + · · ·
          = u_m^{n+1} − r δ_x² u_m^{n+1} + k τ_m^n,

where

    k τ_m^n = ( (1/12) k h² ∂_x⁴ + (1/2) k² ∂_t² ) u_m^{n+1} + · · · .

By replacing the u's with U's and discarding the term k τ_m^n we obtain

Backward Euler's method

    U_m^{n+1} − r δ_x² U_m^{n+1} = U_m^n,   r = k/h².    (4.3)

[Computational molecule: the values at (m−1, n+1), (m, n+1) and (m+1, n+1) are coupled to the known value at (m, n).]

The Backward Euler method is an implicit method. This means that at each time step we
must solve a system of linear equations to compute U_m^{n+1}, m = 1, . . . , M. We will
discuss the solution of the linear systems later on. We now present another implicit
method.

Crank–Nicolson's method. This method is based on the trapezoidal rule. Consider the
following expansion of the error of the trapezoidal rule for quadrature:

    ∫_0^k f(t) dt = (1/2) k (f(0) + f(k)) − (1/12) k³ f''(k/2) + · · · .

To derive the method we use the obvious formula

    u(x_m, t_{n+1}) − u(x_m, t_n) = ∫_{t_n}^{t_{n+1}} u_t(x_m, t) dt,

and approximate the integral with the trapezoidal rule, where we use the notation
u_m^{n+1/2} = u(x_m, t_n + k/2):

    u_m^{n+1} = u_m^n + (1/2) k (∂_t u_m^n + ∂_t u_m^{n+1}) − (1/12) k³ ∂_t³ u_m^{n+1/2} + · · ·
              = u_m^n + (1/2) k (∂_x² u_m^n + ∂_x² u_m^{n+1}) − (1/12) k³ ∂_t³ u_m^{n+1/2} + · · ·
              = u_m^n + (1/2) k ( (1/h²) δ_x² u_m^n + (1/h²) δ_x² u_m^{n+1} )
                − (1/2) k ( (1/12) h² ∂_x⁴ u_m^n + (1/12) h² ∂_x⁴ u_m^{n+1} ) − (1/12) k³ ∂_t³ u_m^{n+1/2} + · · · .

We simplify and summarize:

    u_m^{n+1} = u_m^n + (r/2) (δ_x² u_m^n + δ_x² u_m^{n+1}) + k τ_m^n,

    k τ_m^n = −(1/12) k³ ∂_t³ u_m^{n+1/2} − (1/12) k h² ∂_x⁴ u_m^{n+1/2} + · · · ,

where we have used that

    (1/2) (∂_x⁴ u_m^n + ∂_x⁴ u_m^{n+1}) = ∂_x⁴ u_m^{n+1/2} + O(k²).

Crank–Nicolson's method

    (1 − (r/2) δ_x²) U_m^{n+1} = (1 + (r/2) δ_x²) U_m^n,   r = k/h².    (4.4)

[Computational molecule: all six points (m−1, n), (m, n), (m+1, n), (m−1, n+1), (m, n+1), (m+1, n+1) appear.]

We summarize by writing all the formulae in compact form:

    (E)   U_m^{n+1} = (1 + r δ_x²) U_m^n              or   (1/k) Δ_t U_m^n = (1/h²) δ_x² U_m^n,

    (BE)  (1 − r δ_x²) U_m^{n+1} = U_m^n              or   (1/k) ∇_t U_m^{n+1} = (1/h²) δ_x² U_m^{n+1},

    (CN)  (1 − (r/2) δ_x²) U_m^{n+1} = (1 + (r/2) δ_x²) U_m^n   or   (1/k) δ_t U_m^{n+1/2} = (1/h²) δ_x² μ_t U_m^{n+1/2}.

Note that (E) is explicit while both (BE) and (CN) are implicit.

The local truncation error. In the methods presented we have used the symbol τ_m^n;
this is the local truncation error at the point (x_m, t_n). Given a finite difference formula,
to find the corresponding local truncation error one inserts the exact solution evaluated at
the grid points into the finite difference formula. Since the exact solution does not satisfy
the finite difference formula, an error term appears: this is the local truncation error.
So, for example, for the forward Euler method applied to the heat equation we have the
finite difference formula

    (U_m^{n+1} − U_m^n)/k = (U_{m−1}^n − 2U_m^n + U_{m+1}^n)/h²;

replacing U_m^n with the exact solution at (x_m, t_n), u_m^n, we get

    (u_m^{n+1} − u_m^n)/k = (u_{m−1}^n − 2u_m^n + u_{m+1}^n)/h² + τ_m^n,

so

    τ_m^n := (u_m^{n+1} − u_m^n)/k − (u_{m−1}^n − 2u_m^n + u_{m+1}^n)/h².

The local truncation error can be expressed in powers of the step-sizes h and k and the
derivatives of the exact solution by using the Taylor expansion.

4.2.3 Solution of the linear systems in Backward Euler's method and
      Crank–Nicolson

Our starting point is that we know U_m^n, 0 ≤ m ≤ M + 1, together with U_0^{n+1} and
U_{M+1}^{n+1} given by the boundary conditions. We need to compute U_m^{n+1}, 1 ≤ m ≤ M.
Let us consider Crank–Nicolson first. The right hand side (r.h.s.) of the equations is known;
we set

    d_m^{n+1} = (1 + (r/2) δ_x²) U_m^n = (r/2) U_{m−1}^n + (1 − r) U_m^n + (r/2) U_{m+1}^n,  1 ≤ m ≤ M.

For the left hand side (l.h.s.) we get component-wise

    (1 − (r/2) δ_x²) U_m^{n+1} = −(r/2) U_{m−1}^{n+1} + (1 + r) U_m^{n+1} − (r/2) U_{m+1}^{n+1},  1 ≤ m ≤ M,

where we substitute U_0^{n+1} = g_0^{n+1} = g_0(t_{n+1}) and U_{M+1}^{n+1} = g_1^{n+1} = g_1(t_{n+1}).
We can now express the equation in matrix-vector form:

    [ 1+r   −r/2                ] [ U_1^{n+1}     ]   [ d_1^{n+1} + (r/2) g_0^{n+1} ]
    [ −r/2  1+r   −r/2          ] [ U_2^{n+1}     ]   [ d_2^{n+1}                   ]
    [       ⋱     ⋱     ⋱       ] [  ⋮            ] = [  ⋮                          ].
    [          −r/2  1+r  −r/2  ] [ U_{M−1}^{n+1} ]   [ d_{M−1}^{n+1}               ]
    [               −r/2   1+r  ] [ U_M^{n+1}     ]   [ d_M^{n+1} + (r/2) g_1^{n+1} ]

In a similar way we get the following linear system for Backward Euler:

    [ 1+2r   −r               ] [ U_1^{n+1}     ]   [ U_1^n + r g_0^{n+1} ]
    [ −r    1+2r   −r         ] [ U_2^{n+1}     ]   [ U_2^n               ]
    [       ⋱      ⋱    ⋱     ] [  ⋮            ] = [  ⋮                  ].
    [          −r   1+2r  −r  ] [ U_{M−1}^{n+1} ]   [ U_{M−1}^n           ]
    [               −r  1+2r  ] [ U_M^{n+1}     ]   [ U_M^n + r g_1^{n+1} ]

The two matrices in Crank-Nicolson and Backward Euler are examples of tridiagonal
matrices. These special tridiagonal matrices are also called Toeplitz matrices, i.e. the
elements along each of the three diagonals are equal. In Toeplitz matrices we need only to
specify the first row and the first column to determine the whole matrix. If in addition we
know that the matrix is symmetric, it is enough to specify the first row (or column).
In general when we solve differential equations with difference methods we obtain sparse
matrices. PDEs in one space dimension with at most second order derivatives typically
give rise to (at least) tridiagonal matrices. Higher order derivatives imply a larger bandwidth
in the matrix. There is a clear correspondence between the bandwidth of the matrix and
the number of neighboring values U_{m−p}, ..., U_m, U_{m+1}, ..., U_{m+q} used to approximate the
highest order derivative of u with respect to x at the grid point x_m (see also the method of
undetermined coefficients from Chapter 1). In the case of several space dimensions we obtain
a block structure in the matrices; for example the heat equation in two space dimensions
will typically give a block-tridiagonal matrix. The Toeplitz structure is lost when considering
a heat equation where the rod's material is inhomogeneous (see Section 4.1); then the
differential equation becomes

u_t = a(x) u_xx,

where a(x) is a given function.
There are special algorithms which can be used to solve linear systems with tridiagonal
matrices. Among the direct methods there is a variant of Gaussian elimination called the
Thomas algorithm. We are not going to discuss this algorithm in detail here.
Instead we consider some simple examples on how it is possible to use Matlab to
assemble and solve the above equations.
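For reference, though, here is a minimal sketch of how the Thomas algorithm could look in Matlab. The function name thomas and the argument convention (sub-, main and super-diagonal stored in the vectors a, b, c) are our own choices for illustration, and the sketch does no pivoting.

function x = thomas(a, b, c, d)
% Solve a tridiagonal system: a holds the sub-diagonal (a(1) unused),
% b the main diagonal, c the super-diagonal (c(n) unused), d the r.h.s.
n = length(d);
for i = 2:n                          % forward elimination
    w = a(i)/b(i-1);
    b(i) = b(i) - w*c(i-1);
    d(i) = d(i) - w*d(i-1);
end
x = zeros(n,1);
x(n) = d(n)/b(n);
for i = n-1:-1:1                     % back substitution
    x(i) = (d(i) - c(i)*x(i+1))/b(i);
end

The cost is O(n) operations, compared with O(n³) for Gaussian elimination on a full matrix.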

4.2.4 Solution of linear systems in Matlab


When working with sparse matrices, that is, matrices the majority of whose elements
are zero, it is important to store the matrix in a cheap way in the computer. For
an M × M matrix like the one considered in the previous section it is impractical to store all
the elements; it is possible instead to store a list of the indexes corresponding to the nonzero
elements together with the corresponding values. If you choose M = 1000 in the last example, the
matrix will have 10^6 = one million elements in total, while only about 3000 elements
are different from zero. Another issue is that if we multiply a large and sparse matrix
with a vector, we will make a lot of multiplications by and additions with zero which could
be avoided.
Matlab has built-in facilities for this. Start Matlab and try typing > help sparse or
> help spdiags. The function sparse converts a full matrix into a sparse matrix, meaning
that only the non-zero elements are stored. The conversion of a matrix from sparse format
to full format can be achieved using the function full. The function spdiags is used to
generate matrices in sparse format from their diagonals. To create the matrix for the
Crank–Nicolson method in sparse format, we proceed as follows. We assume M = 10
so that h = 1/11 and choose k such that r = k/h² = 1. Try the following sequence of
commands
commands
> M=10;
> r=1;
> e=ones(M,1);
> A=spdiags([-r/2*e, (1+r)*e, -r/2*e], -1:1,M,M);
Try to remove ';' in the last command to see how Matlab shows a matrix in sparse format.
Matlab assigns indexes to the diagonals of a matrix by giving the index 0 to the main
diagonal; the sub-diagonal gets index −1, the super-diagonal gets index 1 and so on. The
second input argument of spdiags is a vector d of integer numbers such that column j
of the matrix given as the first input argument becomes the diagonal d(j) in the result.
The two last input arguments in the call to the function specify that the result must be an
M × M matrix.
Let us see how a time-step with Crank–Nicolson can be implemented in Matlab. Assume
that the variable U0 stores the numerical approximation at time t = t_n. If for example
n = 0, U0 is generated from the given starting values. It is a good idea to let U0 have
dimension M + 2 and store the boundary values in U0(1) and U0(M+2).
Let us now set n = 0, and use the hat function as a starting value, i.e. f(x) = 1 − |2x − 1|.
We define U0 by

> h=1/(M+1);           % define the space step h
> X=(0:h:1)';          % the grid points in the x-direction, as a column vector
> U0 = 1-abs(2*X-1);   % define the hat function as starting value

Assume the above commands are executed, such that r,A,U0,X are all defined.
We assume also that the boundary values are g0 (t) = g1 (t) = 0. Then we can make a
step with Crank–Nicolson as follows

> d=r/2*U0(1:end-2)+(1-r)*U0(2:end-1)+r/2*U0(3:end);
> U1=[0;A\d;0];        % numerical solution at time step n+1
> plot(X,U1,'o')       % plot the result

Observe that the definition of A is done once and for all, and it is reused in all the following
time-steps. It would have been even better to LU-factorize A at the beginning, so that in
the subsequent steps one only performs forward and backward substitution.
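A minimal sketch of this idea, assuming A, r, U0 and a number of steps N are defined as above, with zero boundary values:

> [L,U] = lu(A);                  % factorize once, before the time loop
> for n = 1:N
>     d = r/2*U0(1:end-2)+(1-r)*U0(2:end-1)+r/2*U0(3:end);
>     U0 = [0; U\(L\d); 0];       % only two triangular solves per step
> end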
Try to plot or make a movie of the results in time; look at > help movie, for example.
You can also try a bigger value of M to decrease the numerical error.

4.2.5 The θ-method


It is possible to present all the three methods defined in the previous sections in a unified
format by writing

(1 − θ r δ_x²) U_m^{n+1} = (1 + (1 − θ) r δ_x²) U_m^n.

One gets then

(E)   θ = 0,
(BE)  θ = 1,
(CN)  θ = 1/2.
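The following minimal Matlab sketch implements one step of the θ-method for the interior unknowns, assuming homogeneous boundary values g_0 = g_1 = 0; the function name thetastep is our own.

function U1 = thetastep(U0, r, theta)
% One step of the theta-method for the heat equation; U0 holds the M
% interior values at time t_n, homogeneous Dirichlet boundary conditions.
M = length(U0);
e = ones(M,1);
S = spdiags([e, -2*e, e], -1:1, M, M);   % matrix representing delta_x^2
I = speye(M);
U1 = (I - theta*r*S) \ ((I + (1-theta)*r*S)*U0);

With theta = 0, 1 and 1/2 this reproduces (E), (BE) and (CN) respectively.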

Local truncation error for the θ-method

k τ_m^n = (1 − θ r δ_x²) u_m^{n+1} − (1 + (1 − θ) r δ_x²) u_m^n = (1 − θ r δ_x²)(u_m^{n+1} − u_m^n) − r δ_x² u_m^n.

We expand all the expressions around (x_m, t_n), using the differential equation ∂_t u = ∂_x² u
to replace time derivatives by space derivatives where convenient, and obtain

k τ_m^n = [1 − θ k (∂_x² + (1/12) h² ∂_x⁴ + ···)] (k ∂_t + (1/2) k² ∂_t² + (1/6) k³ ∂_t³) u_m^n
          − k (∂_x² + (1/12) h² ∂_x⁴ + ···) u_m^n

       = [k ∂_x² + (1/2) k² ∂_t² + (1/6) k³ ∂_t³ − θ k² ∂_t² − (1/2) θ k³ ∂_t³ − k ∂_x² − (1/12) k h² ∂_x⁴ + ···] u_m^n

       = (1/2 − θ) k² ∂_t² u_m^n − (1/12) k h² ∂_x⁴ u_m^n + (1/6 − θ/2) k³ ∂_t³ u_m^n + ··· .

We conclude that

k τ_m^n = O(k² + k h²)   when θ ≠ 1/2,
k τ_m^n = O(k³ + k h²)   when θ = 1/2.

We then expect (CN) to be more accurate than (E) and (BE).

4.3 Semi-discretization
4.3.1 Semi-discretization of the heat equation
We look again at the (I/BV) problem for the heat equation (4.1). Let us now draw vertical
grid-lines as shown in the picture.

[Figure: vertical grid lines x = x_1, ..., x_{M+1}, parallel to the t-axis, with spacing h; boundary values u = g_0 on x = 0 and u = g_1 on x = 1.]

The lines are parallel to the t-axis and cross the x-axis at x = x_m, m = 0, ..., M + 1.
We consider the differential equation along such lines; this means

∂_t u(x_m, t) = ∂_x² u(x_m, t)
             = (1/h²) δ_x² u(x_m, t) + ϕ(x_m, t),

ϕ(x_m, t) = −(1/12) h² ∂_x⁴ u(x_m, t) + ··· .

We introduce the functions of one variable v_m(t), m = 0, ..., M + 1, as approximations to
u(x_m, t). We require that

v_0(t) = g_0(t),
v_{M+1}(t) = g_1(t),
v̇_m(t) = (1/h²) δ_x² v_m(t),    v_m(0) = f(x_m),    m = 1, ..., M,

where v̇_m(t) = dv_m(t)/dt. For a more compact notation, let v(t) := [v_1(t), ..., v_M(t)]^T. We
get

      [ v̇_1     ]           [ −2   1              ] [ v_1     ]           [ g_0(t) ]
      [ v̇_2     ]           [  1  −2   1          ] [ v_2     ]           [   0    ]
v̇ =  [   ...   ]  = (1/h²)  [     ...  ...  ...   ] [   ...   ] + (1/h²)  [  ...   ] ,
      [ v̇_{M−1} ]           [      1   −2   1     ] [ v_{M−1} ]           [   0    ]
      [ v̇_M     ]           [           1  −2     ] [ v_M     ]           [ g_1(t) ]

where the matrix, including the factor 1/h², is denoted A and the last vector is denoted b(t).

So we obtain a (linear) system of ordinary differential equations (ODEs) of the type

v̇ = Av + b(t), v(0) = v 0 = [f (x1 ), . . . , f (xM )]T . (4.5)

This system is a special case of the general format for ODEs which is used in standard
numerical codes for ODEs. The general format is

v̇ = F (t, v), v(0) = v 0 , (4.6)

where v and F (t, v) are vectors in RM . Three of the simplest methods for the numerical
solution of (4.6) with time-step k are

(E)   Euler:            V^{n+1} = V^n + k F(t_n, V^n),
(BE)  Backward Euler:   V^{n+1} = V^n + k F(t_{n+1}, V^{n+1}),
(T)   Trapezoidal rule: V^{n+1} = V^n + (k/2) (F(t_n, V^n) + F(t_{n+1}, V^{n+1})).

By setting F (t, v) = Av + b(t) from (4.5), it is possible to reproduce the three methods
presented earlier. In particular (T) becomes (CN).
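As an illustration, the semi-discrete system (4.5) can be handed directly to one of Matlab's ODE solvers. The following sketch assumes homogeneous boundary values, so that b(t) = 0, and uses the hat function from before as initial value; ode15s is a solver suited to stiff systems (see below).

> M = 50; h = 1/(M+1); x = (1:M)'*h;
> e = ones(M,1);
> A = spdiags([e, -2*e, e], -1:1, M, M)/h^2;   % the matrix A in (4.5)
> v0 = 1 - abs(2*x - 1);                       % hat function at the interior points
> [t, V] = ode15s(@(t,v) A*v, [0 0.1], v0);
> plot(x, V(end,:))                            % approximation of u(x, 0.1)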

4.3.2 Semidiscretization principle in general


This strategy can be applied also to differential equations other than the heat equation.
For such equations we replace all space derivatives with difference approximations, while
the time, t, remains continuous. The result is

PDE −→ System of ODEs


An advantage with this approach is that now it is possible to exploit off-the-shelf software
for ODEs, where advanced routines for error and step-size control are already incorporated.

In particular the method can be of interest in the case of nonlinear PDEs, because in that
case also the semi-discretized ODE problem will be nonlinear, and standard ODE software
is designed to solve such problems numerically in an efficient way.
A problem often encountered is that the resulting ODE system is stiff. For the linear
semi-discrete system (4.5) this means typically that the eigenvalues λ_1, ..., λ_M of A have
negative real part and the quotient

α = max_i |Re λ_i| / min_i |Re λ_i|

is relatively large. If A is the tridiagonal matrix reported above, then

λ_s = −(4/h²) sin²(sπ/(2(M + 1))),    s = 1, ..., M,

that is, all the eigenvalues are real. For small values of x, sin x ≈ x, such that, for big M,
the eigenvalue with the smallest absolute value is

|λ_1| ≈ (4/h²) (π² h²/2²) = π²,

and the eigenvalue with the biggest absolute value is

|λ_M| = (4/h²) sin²(Mπ/(2(M + 1))) ≈ (4/h²) sin²(π/2) = 4/h².

Then we get α ≈ 4/(π² h²) ≫ 1 when h is small.
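This is easy to check numerically; the sketch below compares eig with the eigenvalue formula above.

> M = 20; h = 1/(M+1); e = ones(M,1);
> A = full(spdiags([e, -2*e, e], -1:1, M, M))/h^2;
> s = (1:M)';
> lam = -4/h^2*sin(s*pi/(2*(M+1))).^2;     % the eigenvalue formula
> max(abs(sort(eig(A)) - sort(lam)))       % should be of round-off size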


Later on we will see that this fact implies that (BE) and (CN) perform better than
(E).

4.3.3 General approach


Abstractly we can write a partial differential equation (evolution equation) in the form

∂t u = Lu,

where L is a differential operator with space derivatives. For example the heat equation is
obtained by considering L = ∂x2 . Generally, in one space dimension, we have

L = L(x, t, ∂x , ∂x2 , . . .).

The semi-discretization leads to

L −→ Lh ,

that is, L_h is a discrete operator, acting now on the components of a vector of functions
of one variable instead of on functions of two variables. For each component we write

∂t u(xm , t) = Lh u(xm , t) + ϕ(xm , t),

where ϕ(xm , t) is the truncation error in the space direction. We let now vm (t) ≈ u(xm , t)
and define
v̇m (t) = Lh vm (t), (including the boundary conditions).
We will next look at the truncation error due to the time-discretization, that is after having
chosen an integration method for ODEs. Assume we use the trapezoidal rule for example.

Let y(t) be the solution of

ẏ = F (t, y), y ∈ RM .

It is possible to show that with step-size k this gives

y_m^{n+1} := y_m(t_{n+1}) = y_m^n + (k/2) (F_m(t_n, y^n) + F_m(t_{n+1}, y^{n+1})) + ψ_m^n,

where

ψ_m^n = −(1/12) k³ y_m^{(3)}(t_n) + ··· .
But u(xm , t) is such that ∂t u(xm , t) = Lh u(xm , t)+ϕ(xm , t). Let us insert ym (t) = u(xm , t)
in the general ODE-formulation above, such that

Fm (t, y) = Lh u(xm , t) + ϕ(xm , t),

i.e., a system of ODEs whose solution is the solution of the PDE problem along the vertical
lines (x = xm , t). We get then

u_m^{n+1} = u_m^n + (k/2) (L_h u_m^n + ϕ_m^n + L_h u_m^{n+1} + ϕ_m^{n+1}) + ψ_m^n
          = u_m^n + (k/2) (L_h u_m^n + L_h u_m^{n+1}) + k τ_m^n,

where

k τ_m^n = (k/2) (ϕ_m^n + ϕ_m^{n+1}) + ψ_m^n.
Note that this is true in general for the semi-discretization principle. In particular the result
is consistent with what we know for the three methods (E), (BE) and (CN) applied to
the heat equation.

4.3.4 ut = Lu with different choices of L


Case A.

u_t = a(x) u_xx + b(x) u_x + c(x) u =: Lu.

We can also write

L = a ∂_x² + b ∂_x + c.

Requirements: a, b and c are continuous in [0, 1], and a(x) > 0 in [0, 1].

Space-discretization:

    Discretization                                            Truncation error
u_xx → (1/h²) δ_x² u                                          O(h²)
u_x  → (1/h) ∆_x u                                            O(h)
u_x  → (1/h) ∇_x u                                            O(h)
u_x  → (1/(2h)) (u(x + h) − u(x − h)) = (1/h) µ δ_x u         O(h²)

We set therefore

L_h u = a (1/h²) δ_x² u + b { (1/h) ∆_x u  or  (1/h) ∇_x u  or  (1/h) µ δ_x u } + c u.

We get Lu = L_h u + ϕ, where

ϕ = −(1/12) a h² ∂_x⁴ u + b { −(1/2) h ∂_x² u  or  (1/2) h ∂_x² u  or  −(1/6) h² ∂_x³ u } + ··· ,

with the three alternatives corresponding to the choice of ∆_x, ∇_x or µ δ_x for the first
derivative.

The choice of ∆_x versus ∇_x is called upwind/downwind differencing. One of these two is
chosen in the case of so-called convection-dominated problems. The sign of b determines
whether one should use ∆_x or ∇_x: b > 0 → ∆_x, b < 0 → ∇_x.

Case B. Consider now the equation

u_t = (a(x) u_x)_x =: Lu,    L = ∂_x(a ∂_x).

L is self-adjoint. In particular this means that if we use the inner product between
differentiable functions which are 0 at the endpoints 0 and 1, defined by

⟨u, v⟩ = ∫_0^1 u(x) v(x) dx,

then ⟨Lu, v⟩ = ⟨u, Lv⟩ for all u, v. If we look at the analogous situation with the inner
product on R^n,

⟨x, y⟩ = y^T x,

and replace L with a matrix, then the analogous condition becomes

y^T A x = ⟨Ax, y⟩ = ⟨x, Ay⟩ = y^T A^T x

for all x, y ∈ R^n, implying that A = A^T, i.e. that A is symmetric.


A possible idea is to expand L by using the product rule of differentiation,

u_t = a u_xx + a′ u_x,

so that we get an equation of the same type as in Case A with b = a′ and c = 0.
Another (and usually better) possibility is to discretize the original form directly; we let

∂_x(a ∂_x) u(x_m, t) → (1/h²) δ_x(a δ_x U)_m = (1/h²) δ_x(a_m (U_{m+1/2} − U_{m−1/2}))
                     = (1/h²) (a_{m+1/2} (U_{m+1} − U_m) − a_{m−1/2} (U_m − U_{m−1})).

The truncation error is O(h²).

A method by Tikhonov and Samarski. Write the equation in conservative form:

1)  u_t + w_x = 0,    2)  w = −a u_x.

We discretize 1) by

∂_t u_m = −∂_x w_m ≈ −(1/h) δ_x w_m = −(1/h) (w_{m+1/2} − w_{m−1/2}).

For the other equation we get u_x = −w/a and

∫_{x_m}^{x_{m+1}} u_x dx = −∫_{x_m}^{x_{m+1}} (w/a) dx ≈ −w_{m+1/2} ∫_{x_m}^{x_{m+1}} dx/a.

Set now

A_m = 1 / ((1/h) ∫_{x_m}^{x_{m+1}} dx/a).

And therefore

w_{m+1/2} ≈ −A_m (1/h) (u_{m+1} − u_m),    w_{m−1/2} ≈ −A_{m−1} (1/h) (u_m − u_{m−1}),

such that in the discretization of 1) we get

∂_t u_m ≈ (1/h²) (A_m (u_{m+1} − u_m) − A_{m−1} (u_m − u_{m−1})),

and the semi-discrete system is

v̇_m = (1/h²) (A_m (v_{m+1} − v_m) − A_{m−1} (v_m − v_{m−1})).
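The coefficient A_m is a harmonic average of a over the cell, and can be computed by numerical quadrature; a small sketch, with a(x) = 1 + x² as an arbitrary example:

> a = @(x) 1 + x.^2;
> h = 0.1; xm = 0.3;
> Am = 1/(integral(@(x) 1./a(x), xm, xm+h)/h)   % harmonic average of a over the cell

For a constant a this reduces to A_m = a, as it should.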

4.4 Boundary conditions involving derivatives


4.4.1 Different types of boundary conditions
We look at boundary conditions used for the heat equation in 3 space dimensions.

The picture illustrates the flux of heat φ through a surface A, with normal vector ~n, from
side I to side II. This flux is proportional to the directional derivative of the temperature
in the direction of a normal vector pointing outwards from the domain. We write

φ = −λ ∂u/∂n = −λ ~n · ∇u.

You can think of the surface A as a part of the surface (boundary) of the domain of
definition of the problem in R³ where we solve the equations. We name the domain in
space Ω, and the boundary ∂Ω.

Physical situations with boundary conditions including derivatives.

1. The heat flux given (specified) on ∂Ω:

   −λ ∂u/∂n = φ given.

2. Convection (where we do not consider the boundary layer):

   −λ ∂u/∂n = α(u − u_0).

3. Radiation (Planck's radiation law from statistical mechanics):

   −λ ∂u/∂n = σ(u⁴ − u_0⁴).

In what follows we will consider the model

∂u/∂n + η u = g,

where η ∈ R and the function g is defined on the boundary ∂Ω. We require η > 0, and
recall that the normal vector ~n points outwards from Ω. We consider the case of one space
dimension, where ∂u/∂n = ±∂_x u; the sign depends on whether the normal vector points
to the right or to the left. At the left endpoint ∂u/∂n = −∂_x u and the boundary condition
reads −∂_x u + η_0 u = g_0; at the right endpoint ∂u/∂n = ∂_x u and it reads ∂_x u + η_1 u = g_1.
The corresponding initial/boundary value problem is

u_t = u_xx,
u(x, 0) = f(x),
−u_x(0, t) + η_0 u(0, t) = g_0(t),
u_x(1, t) + η_1 u(1, t) = g_1(t),    η_0, η_1 > 0.

NB! u(0, t) and u(1, t) are unknown.

Semi-discretization: we choose now h = 1/M and x_m = mh, 0 ≤ m ≤ M. We get
altogether M + 1 unknowns v_0, ..., v_M, where v_m(t) ≈ u(x_m, t).

4.4.2 Discretization of the boundary conditions


We look at how the derivatives in the boundary conditions can be discretized. A useful
technique is to introduce “fictitious grid lines”; one to the left for the left boundary, i.e.
the line x = −h, t > 0, and one to the right for the right boundary, that is the line
x = 1 + h, t > 0.

Left boundary. We want to use a difference approximation with truncation error O(h²)
and try central differences, which involve the solution at the fictitious grid line outside
the domain of definition of the solution of the equation:

−∂_x u(0, t) + η_0 u(0, t) = g_0(t),

−(u_1 − u_{−1})/(2h) + η_0 u_0 = g_0 + θ_0,

where

θ_0 = −(1/6) h² ∂_x³ u_0 + ··· = truncation error.

Here we have u_{−1} = u(x_{−1}, t) = u(−h, t) outside the domain where we look for u(x, t).
This might seem a bit dubious, but later on we will see that u_{−1} is eliminated from the
discrete equations.

Right boundary. Similarly we get

∂_x u(1, t) + η_1 u(1, t) = g_1(t),

(u_{M+1} − u_{M−1})/(2h) + η_1 u_M = g_1 + θ_1,

where

θ_1 = (1/6) h² ∂_x³ u_M + ··· .
The semi-discretization is then

v̇_m = (1/h²) δ_x² v_m,    0 ≤ m ≤ M,
−(v_1 − v_{−1})/(2h) + η_0 v_0 = g_0,                                      (4.7)
(v_{M+1} − v_{M−1})/(2h) + η_1 v_M = g_1,

that is, M + 3 equations for the M + 3 unknowns v_{−1}, v_0, ..., v_M, v_{M+1}. We eliminate v_{−1}
and v_{M+1}. From the two last equations in (4.7) we obtain the following expressions for the
values on the fictitious grid lines:

v_{−1} = v_1 − 2hη_0 v_0 + 2hg_0,
v_{M+1} = v_{M−1} − 2hη_1 v_M + 2hg_1.

We substitute the above expressions in the first equation of (4.7) for m = 0 and m = M:

v̇_0 = (1/h²) δ_x² v_0 = (1/h²) (v_{−1} − 2v_0 + v_1)
    = (1/h²) (−2(hη_0 + 1) v_0 + 2v_1) + (2/h) g_0,

v̇_M = (1/h²) δ_x² v_M = (1/h²) (v_{M−1} − 2v_M + v_{M+1})
    = (1/h²) (2v_{M−1} − 2(hη_1 + 1) v_M) + (2/h) g_1.

We can write this in matrix-vector notation. With v(t) = [v_0(t), ..., v_M(t)]^T we get

v̇ = (1/h²) Q v + (2/h) d,                                                 (4.8)

where

    [ −2(hη_0+1)   2                      ]        [ g_0 ]
    [  1          −2    1                 ]        [  0  ]
Q = [      ...    ...  ...                ],   d = [ ... ]                 (4.9)
    [              1   −2    1            ]        [  0  ]
    [                   2   −2(hη_1+1)    ]        [ g_1 ]

We note that this matrix is not symmetric, but it can be obtained from a symmetric matrix
via a similarity transformation. With the diagonal matrix D = diag(√2, 1, ..., 1, √2) we
find in fact that Q̃ = D^{−1}QD is symmetric, and it has therefore real eigenvalues. Q̃ is also
negative definite (a matrix A is negative definite if and only if −A is positive definite).
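A minimal Matlab sketch assembling Q and d from (4.9) could look as follows; the boundary functions g0, g1 and the constants eta0, eta1 are assumed given.

> M = 10; h = 1/M;
> e = ones(M+1,1);
> Q = spdiags([e, -2*e, e], -1:1, M+1, M+1);
> Q(1,1) = -2*(h*eta0+1); Q(1,2) = 2;            % first row of (4.9)
> Q(end,end) = -2*(h*eta1+1); Q(end,end-1) = 2;  % last row of (4.9)
> d = @(t) [g0(t); zeros(M-1,1); g1(t)];
> F = @(t,v) Q*v/h^2 + 2/h*d(t);                 % right hand side of (4.8)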
To get an example of a fully discrete method, we can use the trapezoidal rule, and
obtain Crank–Nicolson. Let U^n = (U_0^n, ..., U_M^n)^T; then

U^{n+1} = U^n + (k/2) ((1/h²) Q U^n + (2/h) d^n + (1/h²) Q U^{n+1} + (2/h) d^{n+1}),

or in matrix-vector notation

(I − (r/2) Q) U^{n+1} = (I + (r/2) Q) U^n + (k/h) (d^n + d^{n+1}).

Alternative discretization of the boundary conditions. It is possible to avoid
fictitious grid lines and use difference approximations of lower order, obtaining a less
accurate approximation:

−∂_x u_0 + η_0 u_0 = g_0  →  −(u_1 − u_0)/h + η_0 u_0 = g_0 + θ̃_0,

where θ̃_0 = (1/2) h ∂_x² u_0 + ···, and correspondingly backward differences for the right
boundary,

∂_x u_M + η_1 u_M = g_1  →  (u_M − u_{M−1})/h + η_1 u_M = g_1 + θ̃_1.

4.5 Nonlinear parabolic differential equations

In general one could consider equations of the form

u_t = f(x, t, u, u_x, u_xx),    ∂f/∂u_xx > 0,    + initial conditions & boundary conditions.

We semi-discretize the equation by introducing x_m = mh and v_m(t) ≈ u(x_m, t), h = 1/(M + 1):

v̇_m = f(x_m, t, v_m, (1/(2h)) (v_{m+1} − v_{m−1}), (1/h²) δ_x² v_m).

If we include the boundary conditions, we get a system of ordinary differential equations
(ODEs). Taking v = [v_1, v_2, ..., v_M]^T and F = [F_1, F_2, ..., F_M]^T, where

F_m = f(x_m, t, v_m, (1/(2h)) (v_{m+1} − v_{m−1}), (1/h²) δ_x² v_m),

we have found a nonlinear system of ODEs,

v̇ = F(t, v),

which can be solved with suitable ODE integration codes (for example in Matlab).

Burgers' equation.

u_t = ε u_xx − u u_x.

It is possible to semi-discretize by taking

F_m = (ε/h²) (v_{m+1} − 2v_m + v_{m−1}) − v_m (1/(2h)) (v_{m+1} − v_{m−1}).

Here it is possible for example to use a Runge–Kutta method applied to v̇ = F(v); see
for example > help ode45 in Matlab. Note that Burgers' equation can be written in the
form

∂_t u = ε ∂_x² u − (1/2) ∂_x u²,

which can be discretized directly with central differences, to obtain

F_m = (ε/h²) (v_{m+1} − 2v_m + v_{m−1}) − (1/(4h)) (v²_{m+1} − v²_{m−1}).
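A sketch of the whole computation for Burgers' equation, using the conservative form of F_m and assuming, for simplicity, homogeneous Dirichlet boundary values:

> M = 200; h = 1/(M+1); x = (1:M)'*h; epsil = 0.01;
> F = @(v) epsil/h^2*([v(2:end);0] - 2*v + [0;v(1:end-1)]) ...
>        - ([v(2:end);0].^2 - [0;v(1:end-1)].^2)/(4*h);
> [t, V] = ode45(@(t,v) F(v), [0 1], sin(pi*x));
> plot(x, V(end,:))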

A particular equation type. Sometimes nonlinear partial differential equations are
given in the following form:

b(u) u_t = (a(u) u_x)_x,    b(u) > 0,    a(u) > 0.

Here it is possible to use the same strategy as above on the expanded problem

u_t = (a(u)/b(u)) u_xx + (a′(u)/b(u)) u_x².

But a better approach is to let

((a(u) u_x)_x)_m → (1/h²) (δ_x(a δ_x v))_m = (1/h²) (a_{m+1/2} (v_{m+1} − v_m) − a_{m−1/2} (v_m − v_{m−1})),

where

a_{m±1/2} = a(v_{m±1/2}) = a(v(x_m ± h/2));

these are quantities which are not defined on the grid. But we can use the approximation

u_{m±1/2} = (1/2) (u_m + u_{m±1}) + O(h²),

which has a truncation error of the same order as the truncation error due to the
approximation of the derivatives. We next define

α_{m±1/2} = a((v_m + v_{m±1})/2).

The semi-discretization becomes now

b(v_m) v̇_m = (1/h²) (α_{m+1/2} (v_{m+1} − v_m) − α_{m−1/2} (v_m − v_{m−1})),    truncation error: O(h²).

Crank–Nicolson for a nonlinear parabolic problem.

U_m^{n+1} = U_m^n + (k/2) (F_m(U^n) + F_m(U^{n+1})),

where

F_m(U^n) = (1/(b(U_m^n) h²)) (α^n_{m+1/2} ∆_x U_m^n − α^n_{m−1/2} ∇_x U_m^n).

This means that we must solve a nonlinear equation to compute U_m^{n+1}. If we require the
same accuracy as for Crank–Nicolson, but would like to avoid solving nonlinear systems
of equations at each time step, we could try to apply a 3-level formula. We use central
differences in time and get

(U_m^{n+1} − U_m^{n−1})/(2k) = F_m(U^n),

or

U_m^{n+1} = U_m^{n−1} + 2k F_m(U^n).

NB! This formula is always unstable for this equation.

Modification. In the unstable formula we have

F_m(U^n) = (1/(b(U_m^n) h²)) (α^n_{m+1/2} ∆_x U_m^n − α^n_{m−1/2} ∇_x U_m^n).

Replace

∆_x U_m^n → (1/3) (∆_x U_m^{n−1} + ∆_x U_m^n + ∆_x U_m^{n+1}),
∇_x U_m^n → (1/3) (∇_x U_m^{n−1} + ∇_x U_m^n + ∇_x U_m^{n+1}).

We get the method

U_m^{n+1} = U_m^{n−1} + (2/3) (r/b(U_m^n)) (α_{m+1/2} (∆_x U_m^{n−1} + ∆_x U_m^n + ∆_x U_m^{n+1})
            − α_{m−1/2} (∇_x U_m^{n−1} + ∇_x U_m^n + ∇_x U_m^{n+1})),

which was originally proposed by Lees. The computational molecule for this scheme involves
the three time levels n − 1, n and n + 1. The formula is linearly implicit, i.e. only one
linear system is solved per time-step. Here, as for multi-step methods for ODEs, we need a
starting procedure: we must compute U^1 with another method, which should have the same
local truncation error as the method used for the rest of the integration. Starting procedures
are needed in general in p-level formulae when p > 2.
Chapter 5

Stability, consistency and convergence

We are now going to analyze the numerical solution of a partial differential equation. Part
of this theory is valid for general PDEs, but the examples are based on the heat equation
and the methods introduced so far for that equation.

5.1 Properties of the continuous problem


When approximating the solution of a partial differential equation it is important that the
PDE itself has a solution. A well posed PDE problem satisfies the following three criteria:

1. A solution of the problem exists.

2. The solution is unique.

3. The solution depends continuously on initial and boundary data.

Example. We consider again the I/BVP for the heat equation as an example

ut = uxx , 0 < x < 1, t > 0,

u(x, 0) = f (x), 0 ≤ x ≤ 1,
u(0, t) = g0 (t), t > 0,
u(1, t) = g1 (t), t > 0.

Initial and boundary data are the functions f , g0 and g1 .

Assume f , g0 and g1 are continuous and that f (0) = g0 (0), f (1) = g1 (0). Then the I/BVP
for the heat equation has a unique solution u(x, t) which is continuous for 0 ≤ x ≤ 1, t ≥ 0
and fulfilling the maximum principle

max_{0≤x≤1} |u(x, t)| ≤ max { max_{0≤x≤1} |f(x)|, max_{s≤t} |g_0(s)|, max_{s≤t} |g_1(s)| }.     (5.1)

More general and advanced results of this type can be found in the literature. We
are not going to report the proof of this result, but rather observe how the maximum


principle (5.1) implies the properties (2) and (3) listed above. Let u^(1) and u^(2) be solutions
with data

f^(i), g_0^(i), g_1^(i),    i = 1, 2.

Consider

w = u^(1) − u^(2),   φ = f^(1) − f^(2),   γ_0 = g_0^(1) − g_0^(2),   γ_1 = g_1^(1) − g_1^(2).

We obtain w_t = u_t^(1) − u_t^(2) = u_xx^(1) − u_xx^(2) = w_xx, so w is a solution of the heat equation
with data

w(x, 0) = φ(x),   w(0, t) = γ_0(t),   w(1, t) = γ_1(t),

and from (5.1) we obtain that

max_{0≤x≤1} |w(x, t)| ≤ max { max_{0≤x≤1} |φ(x)|, max_{s≤t} |γ_0(s)|, max_{s≤t} |γ_1(s)| }.     (5.2)

So uniqueness (2) follows now, because two solutions with the same initial and boundary
data will have φ, γ0 , γ1 identically equal to zero, and therefore also w(x, t) will be zero. But
also the property (3) follows from (5.2). The property (3) is often referred to as stability
of the differential equation. Can we transfer this concept to the numerical solution?

5.2 Convergence of a numerical method


Let u be the solution of a PDE problem (for example I/BVP as above) on a rectangle

ΩT = [0, 1] × [0, T ] = {(x, t) : 0 ≤ x ≤ 1, 0 ≤ t ≤ T } .

We introduce a grid

G = {(xm , tn ), 0 ≤ m ≤ M, 0 ≤ n ≤ N } ,

where

x_m = mh,   h = 1/M,   t_n = nk,   k = T/N.

Let U be defined on the grid such that U_m^n ≈ u(x_m, t_n). The discretization error is

e_m^n = u_m^n − U_m^n,    u_m^n = u(x_m, t_n).

We say that U → u in Ω_T when h → 0, k → 0 if

max_{0≤n≤T/k} max_{0≤m≤1/h} |e_m^n| → 0,   h → 0, k → 0.

In more general terms. Consider

U^n = [U_0^n, ..., U_M^n]^T,   u^n = [u_0^n, ..., u_M^n]^T,   e^n = [e_0^n, ..., e_M^n]^T,   vectors in R^{M+1}.

We choose now a vector-norm ‖·‖ defined on R^{M+1} for all M ≥ 0, and say that U → u in
Ω_T if

max_{0≤n≤T/k} ‖e^n‖ → 0   when k → 0, h → 0.

Examples of norms are

‖e^n‖_∞ = max_m |e_m^n|,

and a scaled variant of the usual ‖·‖_2-norm,

‖e^n‖_{2,h} = (h Σ_{m=0}^M |e_m^n|²)^{1/2} = (1/√M) ‖e^n‖_2.

Remarks.

1. The concept of norm is a bit tricky in this context. We know that all norms on a
   finite-dimensional space are equivalent, a fact that implies in turn that convergence
   in one norm is equivalent to convergence in another norm. But now we must take
   into account that the dimension of the space goes to infinity when h → 0. We have
   for example the relationship

   ‖x‖_{2,h} ≤ ‖x‖_∞ ≤ M ‖x‖_{2,h},

   and the vector with 1 in the first component and 0 in the others will converge to 0
   in the norm ‖·‖_{2,h} but not in the norm ‖·‖_∞ as h → 0 (and M → ∞).

2. The scaled norm ‖·‖_{2,h} can be interpreted as an approximation of the L²-norm of
   an underlying continuous function,

   ‖f‖_{2,h} = (h Σ_{m=0}^M |f_m|²)^{1/2} ≈ (∫_0^1 |f(x)|² dx)^{1/2} = ‖f‖_{L²},

   where f_m = f(mh); or as the exact L²-norm of an underlying piecewise-constant
   function.

5.3 Domain of dependence of a numerical method


Domain of dependence of the Euler method. We look at the computational molecule
for the Euler method presented earlier; we note that the approximation U_m^n at the point
(x_m, t_n) depends on U_{m−1}^{n−1}, U_m^{n−1} and U_{m+1}^{n−1}. Each of these depends on three
grid-points at the previous time level, and so on. Continuing downwards in the same way, we
find that U_m^n depends indirectly on the points U_{m−n}^0, ..., U_{m+n}^0 if one solves a pure
initial value problem (or if 0 ≤ m − n, m + n ≤ M). The domain of dependence is in this case
the triangle with vertices U_{m−n}^0, U_{m+n}^0, U_m^n. Generally the domain of dependence of
U_m^n includes all U_µ^ν values which have been included in the computation of U_m^n.

[Computational molecule for the Euler method: U_m^{n+1} at (m, n + 1) is computed from the values at (m − 1, n), (m, n) and (m + 1, n).]

In the (I/BV) problem the domain of dependence will look as in the picture below. The angle

φ = arctan(k/h)

characterizes the domain of dependence for the Euler method. The solution of the PDE,
u(x_m, t_n), has a domain of dependence including the whole rectangle with corners
(0, 0), (0, t_n), (1, t_n), (1, 0). If we let r = k/h² be constant when h → 0, then

φ = arctan(k/h) = arctan(rh) → 0,

such that in the limit we will get the whole domain of dependence of the exact solution.

5.4 Proof of convergence for Euler's method on the (I/BVP) with r ≤ 1/2

We consider the problem (4.1) and recall that Euler's method is

U_m^{n+1} = U_m^n + r δ_x² U_m^n,    1 ≤ m ≤ M,  n ≥ 0,                    (5.3)
U_0^n = g_0^n,    U_{M+1}^n = g_1^n,    n > 0,
U_m^0 = f_m,    0 ≤ m ≤ M + 1.

The exact solution satisfies

u_m^{n+1} = u_m^n + r δ_x² u_m^n + k τ_m^n,                                (5.4)

where τ_m^n is the local truncation error. We define now e_m^n = u_m^n − U_m^n, and subtract
(5.3) from (5.4). We get

e_m^{n+1} = e_m^n + r δ_x² e_m^n + k τ_m^n,    n > 0,  1 ≤ m ≤ M.          (5.5)

Moreover we have e_m^0 = 0 and e_0^n = e_{M+1}^n = 0. We know that for Euler's method it
holds that

k τ_m^n = (1/2) k² ∂_t² u_m^n − (1/12) k h² ∂_x⁴ u_m^n + ··· ,

and it seems reasonable, assuming enough regularity of the exact solution, to assume that
there exists a constant A such that

|τ_m^n| ≤ A (k + h²)   for all m, n.

We write out (5.5) explicitly and get

e_m^{n+1} = r e_{m−1}^n + (1 − 2r) e_m^n + r e_{m+1}^n + k τ_m^n.

When we later take absolute values, we will make use of the hypothesis that 0 ≤ r ≤ 1/2,
because this implies that both r and 1 − 2r are non-negative. We obtain therefore

|e_m^{n+1}| ≤ r |e_{m−1}^n| + (1 − 2r) |e_m^n| + r |e_{m+1}^n| + A (k² + kh²)
           ≤ max_ℓ |e_ℓ^n| (r + (1 − 2r) + r) + A (k² + kh²)
           = max_ℓ |e_ℓ^n| + A (k² + kh²).

Denoting now E^n = max_ℓ |e_ℓ^n|, we find that

E^{n+1} ≤ E^n + A (k² + kh²),

and since E^0 = 0,

E^n ≤ n k A (k + h²) = t_n A (k + h²) ≤ T A (k + h²),

that is,

max_ℓ |e_ℓ^n| ≤ T A (k + h²)   for all n ≤ T/k.

We conclude that Euler's method converges, that is, U → u when h → 0 and k → 0 with
constant r ≤ 1/2.
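The convergence rate can also be observed numerically. The sketch below uses f(x) = sin(πx) with g_0 = g_1 = 0, for which the exact solution is u(x, t) = e^{−π²t} sin(πx), and keeps r = 1/2 fixed; the maximum error at t ≈ T should then decrease like h².

> T = 0.1; r = 0.5;
> for M = [10 20 40 80]
>     h = 1/(M+1); k = r*h^2; N = round(T/k);
>     x = (1:M)'*h; U = sin(pi*x);
>     for n = 1:N                  % Euler's method, zero boundary values
>         U = U + r*([U(2:end);0] - 2*U + [0;U(1:end-1)]);
>     end
>     disp(max(abs(U - exp(-pi^2*N*k)*sin(pi*x))))
> end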

5.5 Stability on unbounded time interval (F-stability)

Let us write the θ-method (including (E), (BE), (CN)) in a matrix form. We have

(1 − θ r δ_x²) U_m^{n+1} = (1 + (1 − θ) r δ_x²) U_m^n.

Let S be the matrix

    [ −2   1              ]
    [  1  −2   1          ]
S = [     ...  ...  ...   ].                                               (5.6)
    [          1  −2   1  ]
    [              1  −2  ]

Defining the vector U^n = [U_1^n, ..., U_M^n]^T for n = 0, 1, ..., we can write the θ-method in
vector form as

(I − θ r S) U^{n+1} = (I + (1 − θ) r S) U^n + d^n,

where

d^n = [θ r g_0^{n+1} + (1 − θ) r g_0^n, 0, ..., 0, θ r g_1^{n+1} + (1 − θ) r g_1^n]^T.

Typically the difference scheme can be written as

A U n+1 = B U n + cn . (5.7)

A and B will depend on h and k since the elements are functions of h and k, but also
because the matrix dimension typically is M × M where M = 1/h (or M = 1/h − 1).
A and B can depend on n, but we assume here this does not happen.
A method is called a one-step method (ODE-terminology) or alternatively two-level-
method (PDE-terminology) if it can be written in the form

U n+1 = C U n + q n . (5.8)

If A in (5.7) is invertible, we can take C = A−1 B and q n = A−1 dn .

Definition of F-stability (stability on (0, ∞)).
For an arbitrary vector w^0, compute the sequence w^{n+1} = C w^n, n = 0, 1, .... Choose a
vector norm ‖·‖. We say that (5.8) is F-stable if there exists a constant L, independent
of n, such that

‖w^n‖ ≤ L ‖w^0‖   for all w^0.

Note that the stability concept here considered has nothing to do with the solution of
a differential equation, but is merely a property of the difference scheme.

Criterion for F -stability.

ρ(C) < 1 ⇒ F -stability ⇒ ρ(C) ≤ 1.

Without Proof.
You can find more on F -stability in the Norwegian version of the note.

5.6 Stability on [0, T ] when h → 0, k → 0


We now change viewpoint, and consider the stability of a method approximating a PDE
on a rectangle [0, 1] × [0, T ]. We still look at the process n → ∞, but now simultaneously
k → 0, such that we always have an upper bound T = nk. We assume moreover that
h → 0 at the same time, so that the dimensions of the matrices increase in the process.
We call this concept simply stability in this case. We analyze also now a computational
scheme of the type (5.8).

Definition of stability. We say that (5.8) is stable if and only if it exists a constant
L independent on h and k such that wn+1 = Cwn satisfies
T
kwn k ≤ L kw0 k for all n ≤ and starting vectors w0 , (5.9)
k
where k · k is a vector-norm.

Equivalent definition. The scheme defined by (5.8) is stable if and only if there exists
a constant L independent of h and k such that
T
kC n k ≤ L for all n ≤ , (5.10)
k
where the matrix norm is subordinate to the vector-norm in the previous definition.

Example. If we use the norm

‖w‖_∞ = max_m |w_m|   in (5.9),

the corresponding subordinate norm is

‖C‖_∞ = max_ℓ Σ_m |C_{ℓm}|   in (5.10).

Proof of the equivalence of (5.9) and (5.10). Assume that (5.9) holds true. Let h, k and
n be arbitrary. Since the matrix norm in (5.10) is subordinate, we can find w0 such that
kC n w0 k = kC n k kw0 k. And so we get

kC n k kw0 k = kC n w0 k = kwn k ≤ L kw0 k ⇒ kC n k ≤ L

Assume on the other hand that (5.10) holds true, and let w0 , h, k and n be arbitrary, then

kwn k = kC n w0 k ≤ kC n k kw0 k ≤ L kw0 k. 

If not specified otherwise, we will assume from now on that the matrix-norm which is
used is subordinate to the vector norm in the definition (5.9).

Sufficient criterion for stability. If there exists a µ ≥ 0 independent of h and k such
that

‖C‖ ≤ 1 + µ k,

then (5.8) is stable.

Proof.

‖C^n‖ ≤ ‖C‖^n ≤ (1 + µk)^n ≤ (1 + µk)^{T/k} = ((1 + µk)^{1/(µk)})^{µT} ≤ e^{µT}.

Here we use that the sequence x_n = (1 + 1/n)^n is monotonically increasing and converges to
e = 2.718··· as n → ∞, such that we can take L = e^{µT} in (5.9).

Necessary condition for stability. If (5.8) is stable, there exists ν ≥ 0 independent
of h and k such that

ρ(C) ≤ 1 + ν k,                                                            (5.11)

where ρ(C) is the spectral radius of C.

Proof. Since we assume that (5.8) is stable, there exists a constant L such that ‖C^n‖ ≤ L
for n ≤ T/k. Moreover we have from (2.3) that

ρ(C)^n = ρ(C^n) ≤ ‖C^n‖.

So we get ρ(C) ≤ L^{1/n}, n ≤ T/k, and in particular for n = T/k we have

ρ(C) ≤ L^{k/T} = e^{(k/T) ln L}.

We apply Taylor's formula with remainder to the right hand side of this inequality and get

ρ(C) ≤ 1 + (k/T) ln L · e^{(k/T) θ ln L},   where 0 < θ < 1.

We use now that e^x is a monotonically increasing function and that θ < 1 and k ≤ T, so
we get

ρ(C) ≤ 1 + (k/T) ln L · e^{ln L} = 1 + (k/T) L ln L,

such that (5.11) is fulfilled with ν = (L ln L)/T.

Remarks. The condition ρ(C) ≤ 1 + ν k is in general not sufficient for stability. A
somewhat artificial counterexample is obtained by considering C = C(h) ∈ R^{M×M},
M = 1/h, with elements equal to 1 in the positions (i, i) and (i, i − 1) and 0 otherwise
(as in a Jordan block). It is quite easy to see, for example, that ‖C^n‖_∞ = 2^n when
0 ≤ n ≤ M − 1. This fact is sufficient to conclude that (5.8) with such a choice of C
cannot be stable. But on the other hand, since C is triangular and its eigenvalues coincide
with its diagonal elements, we must have ρ(C) = 1, and (5.11) is fulfilled.
Note also that stability depends on the norm; it is possible that a difference method
is stable in one norm, but not stable in another norm.

A common mistake. The following argument is wrong. Assume that the considered
subordinate matrix norm is such that for diagonal matrices we have ρ(D) = ‖D‖ (this is
true for the usual norms). Assume also that C is diagonalizable, C = P Λ P^{−1}. Then

‖C^n‖ = ‖P Λ^n P^{−1}‖ ≤ ‖P‖ ‖Λ^n‖ ‖P^{−1}‖ = ‖P‖ ‖P^{−1}‖ ρ(C)^n
      ≤ ‖P‖ ‖P^{−1}‖ (1 + νk)^n ≤ ‖P‖ ‖P^{−1}‖ e^{νT},

so we seem to have found a bound for ‖C^n‖. The problem is however that P can depend
on h, k in such a way that ‖P‖ ‖P^{−1}‖ → ∞ when h, k → 0.

Important special case. If C is symmetric, the condition ρ(C) ≤ 1 + ν k is both
necessary and sufficient for stability when we use ‖·‖_{2,h}, since for symmetric matrices
we have ‖C‖_{2,h} = ρ(C).

Stability of the θ-method for (I/BVP). We recall that the θ-method has the form

(I − θ r S) U^{n+1} = (I + (1 − θ) r S) U^n + d^n,

such that

C = (I − θ r S)^{−1} (I + (1 − θ) r S).

Here we have r = k/h², and S is the symmetric matrix defined in (5.6). We therefore have
the diagonalization S = P Λ P^T where P^T P = I. We get also

I − θ r S = P (I − θ r Λ) P^T,
I + (1 − θ) r S = P (I + (1 − θ) r Λ) P^T.

By substituting in the expression for C we get

C = P (I − θ r Λ)^{−1} (I + (1 − θ) r Λ) P^T = P ∆ P^T.

Now ∆ is a diagonal matrix with real elements

∆_m = (1 + (1 − θ) r λ_m) / (1 − θ r λ_m).

From the diagonalization it is clear that C is also symmetric, so it is enough to require
that ρ(C) ≤ 1 + ν k for a ν ≥ 0. From before we know that

λ_m = −4 sin² φ_m,    φ_m = mπ/(2(M + 1)),    m = 1, ..., M,

and therefore

∆_m = (1 − 4 (1 − θ) r sin² φ_m) / (1 + 4 θ r sin² φ_m).

We assume that 0 ≤ θ ≤ 1, such that the numerator is ≤ 1 while the denominator is ≥ 1,
so we have ∆_m ≤ 1 for all m. We then require ∆_m ≥ −1: substitute the expression for
∆_m in this inequality, and multiply each side by the denominator (which is positive):

1 − 4 (1 − θ) r sin² φ_m ≥ −1 − 4 θ r sin² φ_m
⇕
2 (1 − 2θ) r sin² φ_m ≤ 1.

If 1/2 ≤ θ ≤ 1 the left hand side is ≤ 0, so the inequality is satisfied for all values of
r ≥ 0. But if 0 ≤ θ < 1/2 we must require

r ≤ 1 / (2 (1 − 2θ) sin² φ_m),    m = 1, ..., M.

The right hand side is minimal when m = M, i.e. φ_M = Mπ/(2(M + 1)) = π/2 − hπ/2. We get

sin²(π/2 − hπ/2) = cos²(hπ/2),

so the condition becomes

r ≤ 1 / (2 (1 − 2θ) cos²(hπ/2)).

For small values of h we have cos²(hπ/2) ≈ 1, and we get a sufficient condition by replacing
it with 1. In summary we get:

Stability criterion for the θ-method applied to (I/BVP).

0 ≤ θ < 1/2  ⇒  stable if 0 ≤ r ≤ 1/(2(1 − 2θ)),
1/2 ≤ θ ≤ 1  ⇒  stable for all r ≥ 0.
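The criterion is easy to check numerically by computing the spectral radius of C directly; a small sketch, with our own choice of test values:

> M = 50; e = ones(M,1);
> S = full(spdiags([e, -2*e, e], -1:1, M, M));
> rho = @(theta, r) max(abs(eig((eye(M) - theta*r*S)\(eye(M) + (1-theta)*r*S))));
> rho(1/2, 10)     % <= 1 for theta >= 1/2, for any r
> rho(0, 0.51)     % > 1: Euler with r > 1/2 is unstable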

Stability of the θ-method for (I/BVPD). (Case of boundary conditions involving
derivatives.) If we apply the θ-method to the semi-discretized system (4.8) we get a
difference method which can be written in the form (5.8) with

C = (I − θ r Q)^{−1} (I + (1 − θ) r Q),

where Q is given by (4.9). Since Q is not symmetric, C will not be symmetric either, and
it is not possible to use the condition ρ(C) ≤ 1 + ν k for non-symmetric matrices (because
it is only necessary for stability). However, in this particular case this condition is also
sufficient. Here is the explanation. We have seen earlier that by using the matrix
D = diag(√2, 1, ..., 1, √2) we get that Q̃ = D^{−1} Q D is symmetric. This implies that also
C̃ = D^{−1} C D is symmetric. We obtain that

‖C^n‖ ≤ ‖D‖ ‖D^{−1}‖ ‖C̃^n‖.

For the most common norms we have ‖D‖ = √2 and ‖D^{−1}‖ = 1. Since C̃ is symmetric
we have stability if ρ(C̃) ≤ 1. But C and C̃ are similar matrices, so they have the same
eigenvalues. Stability is then guaranteed for (I/BVPD) if ρ(C) ≤ 1.
The matrix C has eigenvalues

∆_m = (1 + (1 − θ) r λ_m) / (1 − θ r λ_m),                                 (5.12)

where λ_m are the eigenvalues of Q. We cannot find an explicit expression for the
eigenvalues of Q, but we know they are real because Q̃ is symmetric. We can use Gershgorin's
theorem to get a sufficient criterion for stability.

The Gershgorin discs for all rows except the first and the last are identical: they have
center −2 and radius 2. The first and the last discs have center −2(1 + η_i h), i = 0, 1, and
radius 2, so their left intersection with the real axis is −4 − 2η_i h, i = 0, 1. The eigenvalues
of Q are therefore on the real axis, and −4 − 2η h ≤ λ ≤ 0 where η = max{η_0, η_1}. From
(5.12) it follows now that ∆_m ≤ 1 for 0 ≤ θ ≤ 1. The condition ∆_m ≥ −1 gives us moreover

r λ_m (2θ − 1) ≤ 2,

which is satisfied for θ ≥ 1/2, for any r > 0.
Let θ < 1/2. If we had λ_m = 0 the inequality would hold unconditionally. For λ_m < 0 we
divide both sides by the positive value −λ_m (1 − 2θ). The critical value occurs for the
eigenvalue placed furthest to the left, so we substitute the limit value λ_m ≥ −4 − 2η h.
In the end we get:

Stability criterion for the θ-method applied to (I/BVPD).

0 ≤ θ < 1/2  ⇒  stable if 0 ≤ r ≤ 1/(2(1 − 2θ)(1 + ηh/2)),
1/2 ≤ θ ≤ 1  ⇒  stable for all r ≥ 0,

where η = max{η_0, η_1}.

5.7 Stability and roundoff error

In numerical computations on a computer one should always take into account roundoff
error, because the real numbers in a computer are represented only with a fixed and finite
number of digits. When we try to compute U^{n+1} from (5.8), it is in fact another quantity
we really find, and it is given by

Ũ^{n+1} = C Ũ^n + q^n + s^n.

The vector s^n contains the round-off error produced at step n. If we define the error due
to round-off by R^n = Ũ^n − U^n we get

R^{n+1} = C R^n + s^n.

We can use the formula recursively and get

R^n = C^n R^0 + C^{n−1} s^0 + ··· + C s^{n−2} + s^{n−1}.

Assume R^0 = 0. Then we get

‖R^n‖ ≤ ‖C^{n−1}‖ ‖s^0‖ + ··· + ‖C‖ ‖s^{n−2}‖ + ‖s^{n−1}‖.

Further we assume to have a bound σ such that ‖s^ℓ‖ ≤ σ for all ℓ. If (5.8) is stable we
get then

‖R^n‖ ≤ σ + Σ_{j=1}^{n−1} L σ = (1 + (n − 1) L) σ,

so stability guarantees that the round-off error increases at most linearly with n.

5.8 Consistency and Lax' equivalence theorem

We recall (5.7),

A U^{n+1} = B U^n + c^n,                                                   (5.13)

which holds for difference methods applied to a linear PDE. If we substitute the exact
solution of the PDE into the formula, we obtain the local truncation error as residual: we
set τ^n = [τ_1^n, ..., τ_M^n]^T, and have by definition

A u^{n+1} = B u^n + c^n + k τ^n.                                           (5.14)

Consistency. The difference method (5.13) is consistent (with the differential equation) if

τ_m^n → 0   for all m, n when h → 0, k → 0.

Remark. The literature is not consistent about the definition of local truncation error,
and this fact influences also the definition of consistency. Alternatively it is common to
define the truncation error as τ̂_m^n = k τ_m^n, such that the condition for consistency is
(1/k) τ̂_m^n → 0.

Lax' equivalence theorem. A consistent difference scheme is convergent if and only
if it is stable.

The proof of Lax' equivalence theorem is outside the scope of this course. We will instead
just show a simpler result, namely that consistency and stability imply convergence.

Proposition. Assume the two-level difference scheme (5.13) is consistent (i.e., for τ_m^n
defined by (5.14), τ_m^n → 0 when k → 0 and h → 0), and that there exist constants K > 0,
K̃ > 0 and H > 0 such that the inverse A^{−1} exists for all h < H and k < K, with
‖A^{−1}‖ ≤ K̃ for all h < H and k < K, in a given norm subordinate to the vector norm ‖·‖.
Assume the difference scheme can be written in the form (5.8) with C = A^{−1} B and that
it is stable. Then the difference scheme converges.
5.9. VON NEUMANN’S STABILITY CRITERION 59

Proof. Subtracting (5.14) from (5.13) we obtain the following equation for the error:

A E^{n+1} = B E^n − k τ^n.

Exchanging n + 1 with n and multiplying by the inverse of A we get

E^n = A^{−1} B E^{n−1} − k A^{−1} τ^{n−1};

we set q^{n−1} := −k A^{−1} τ^{n−1} and simplify the above equation to obtain

E^n = C E^{n−1} + q^{n−1},

and by using the formula recursively we obtain

E^n = C^n E^0 + C^{n−1} q^0 + C^{n−2} q^1 + ··· + C q^{n−2} + q^{n−1}.

Taking norms on both sides, using E^0 = 0 and the assumed stability of (5.8), we get

‖E^n‖ ≤ L Σ_{s=0}^{n−1} ‖q^s‖.

Now we use that

‖q^s‖ ≤ k ‖A^{−1}‖ ‖τ^s‖ ≤ k K̃ ‖τ^s‖,

and we obtain

‖E^n‖ ≤ L K̃ n k max_{0≤s≤n−1} ‖τ^s‖ ≤ L K̃ T max_{0≤s≤n−1} ‖τ^s‖ → 0

when k → 0, h → 0, and we have proved convergence.

5.9 von Neumann's stability criterion

We consider now again the heat equation, but we use periodic boundary conditions:

u_t = u_xx,    −∞ < x < ∞,  t > 0,
u(x, 0) = f(x),    −∞ < x < ∞,
f(x + 2π) = f(x),    x ∈ R,
u(x + 2π, t) = u(x, t),    x ∈ R.

We can expand f(x) in a Fourier series:

f(x) = Σ_{β=−∞}^∞ A_β e^{iβx},    A_β = (1/2π) ∫_0^{2π} f(x) e^{−iβx} dx.

Using separation of variables, we get solutions of the form

u_β(x, t) = e^{−β²t} e^{iβx},    β ∈ Z,

and by using the initial function f(x) this gives

u(x, t) = Σ_{β=−∞}^∞ A_β e^{−β²t} e^{iβx}.

Let us consider an analogous analysis for the numerical solution; we use first the Euler
method:

U_m^{n+1} = (1 + r δ_x²) U_m^n,    h = 2π/M,
U_{m+M}^n = U_m^n    for all m ∈ Z,
U_m^0 = f(x_m)    for all m ∈ Z.

We can now try to write

U_m^n = Σ_{β=−∞}^∞ A_β ξ^n e^{iβx_m};                                      (5.15)

this formula fits for n = 0. We check if it is possible to choose ξ such that it is satisfied
also for n > 0. It is enough to check one general term in the series, so we set

U_m^n = ξ^n e^{iβx_m},

which substituted in Euler's method gives

ξ^{n+1} e^{iβx_m} = ξ^n e^{iβx_m} + r (ξ^n e^{iβx_{m−1}} − 2 ξ^n e^{iβx_m} + ξ^n e^{iβx_{m+1}}).

We can assume ξ ≠ 0 and use that x_m = mh; dividing each side by ξ^n e^{iβx_m} we get

ξ = 1 + r (e^{−iβh} − 2 + e^{iβh}) = 1 + 2r (cos βh − 1) = 1 − 4r sin²(βh/2).

We can take ξ = ξ_β from this expression in the sum (5.15) and obtain

U_m^n = Σ_{β=−∞}^∞ A_β ξ_β^n e^{iβx_m},    ξ_β = 1 + 2r (cos βh − 1),

which is the exact solution of the difference scheme. ξ_β corresponds to the factor e^{−β²k}
in the expression for the exact solution. We should therefore require that |ξ_β| ≤ 1 for all β
to ensure that the numerical solution is stable. In this particular case ξ_β is real, and we
note immediately that ξ_β ≤ 1 for all β. When we require ξ_β ≥ −1 we get as a condition

r ≤ 1 / (2 sin²(βh/2))    for all β,

so we must again require r ≤ 1/2.



General case. We consider 2-level difference formulae written in the form

Σ_{p=−r}^r a_p U_{m+p}^{n+1} = Σ_{p=−s}^s b_p U_{m+p}^n.

We look for solutions of the form

U_m^n = ξ^n e^{iβx_m},

which substituted into the difference formula gives

ξ^{n+1} e^{iβx_m} Σ_{p=−r}^r a_p e^{iβph} = ξ^n e^{iβx_m} Σ_{p=−s}^s b_p e^{iβph},

and so

ξ = (Σ_{p=−s}^s b_p e^{iβph}) / (Σ_{p=−r}^r a_p e^{iβph}).

Von Neumann’s stability criterion. There is a constant µ ≥ 0 such that

|ξ| ≤ 1 + µ k.

Example. Let us now use the differential equation

u_t = u_xx − λ u_x.

We use Euler's method and central differences on u_x:

U_m^{n+1} = U_m^n + (k/h²) δ_x² U_m^n − λ (k/(2h)) (U_{m+1}^n − U_{m−1}^n).

Assume U_m^n = ξ^n e^{iβx_m} and note that r = k/h² implies k/(2h) = rh/2. Then

ξ = 1 + r (e^{−iβh} − 2 + e^{iβh}) − (λrh/2) (e^{iβh} − e^{−iβh})
  = 1 − 4 r sin²(βh/2) − i λ r h sin βh.

We compute |ξ|² = (Re ξ)² + (Im ξ)²,

|ξ|² = (1 − 4 r sin²(βh/2))² + λ² r k sin² βh,    since r²h² = rk.

In order to get |ξ| ≤ 1 + µ k we need to have

|1 − 4 r sin²(βh/2)| ≤ 1 + µ̃ k,

and from Euler's method applied to u_t = u_xx we already know that this implies r ≤ 1/2.
But if r ≤ 1/2 we get

|ξ|² ≤ 1 + (1/2) λ² k,

and obtain, by using the mean-value theorem, that von Neumann's stability criterion is
satisfied with µ = (1/4) λ² when r ≤ 1/2.

Relationship between von Neumann and the earlier definition of stability.


Assume
1. the differential equation has constant coefficients and one dependent variable; (only
one u-component).

2. we are given a pure initial value problem;

3. we apply a 2-level difference formula;

then von Neumann’s stability is necessary and sufficient for stability (as previously de-
fined).

One might note that von Neumann’s stability analysis is used as an indication of
stability or instability in much more general cases.

Example.

u_t = a(x, u) u_xx,    (+ boundary conditions & initial conditions).

Using Euler's method we get

U_m^{n+1} = U_m^n + r a(x_m, U_m^n) δ_x² U_m^n.

Treating the coefficient a as a constant, von Neumann's stability criterion takes the form

|ξ| = |1 − 4 a r sin²(βh/2)| = 1 + O(k).

Let us for example assume it is possible to bound a such that 1 ≤ a(x, u) ≤ 2 for all x, u
we are interested in. Then we could require

r ≤ 1/(2 max a) = 1/4,

without having performed any rigorous analysis.
Chapter 6

Elliptic differential equations

6.1 Elliptic equation on the plane


We consider partial differential equations of the type

a uxx + 2b uxy + c uyy = d(x, y, u, ux , uy ), (x, y) ∈ Ω

where a, b, c and d can be functions of x and y.

[Figure: a domain Ω in the plane with boundary ∂Ω.]

Ellipticity. If the functions a, b and c satisfy

ac − b² > 0

for all (x, y) ∈ Ω, the differential equation is elliptic on Ω.

Example. If a = c = 1, b = d = 0 we get the Laplace equation

uxx + uyy = 0.

Boundary conditions. There are three types of boundary conditions:

1. Dirichlet boundary conditions: u = f on ∂Ω.

2. Neumann boundary conditions: ∂u/∂n = ~n · ∇u = g on ∂Ω.


3. Robin boundary conditions: α u + β ∂u/∂n = γ on ∂Ω.
The boundary conditions can often be a mixture of 1–3, such that the boundary ∂Ω can
be divided into parts, with different boundary conditions on the different parts.

Maximum principle. Let Ω be an open connected subset of R². Define the differential
operator L by

Lu = a u_xx + 2b u_xy + c u_yy + d u_x + e u_y,

where a, b, c, d and e are functions of x and y, and L is elliptic (ac > b²) in Ω.

If Lu = 0 on Ω, then u cannot assume a strict local maximum or minimum in Ω unless
u is constant on Ω.

Alternative formulation. If u is continuous on Ω̄ = Ω ∪ ∂Ω and Lu = 0 on Ω, then u
assumes its maximum/minimum on the boundary ∂Ω.

6.2 Difference methods derived using Taylor series


We start by considering a regular grid.

Grid-lines: x = x_ℓ, y = y_m.
Grid-points: P = (x_ℓ, y_m).

We look for an approximation of the solution of the elliptic PDE on a net made of
grid-points and points of intersection between the boundary and the grid-lines.

We define some subsets of the net:

G = {(x_ℓ, y_m)}: the whole grid,
N° = G ∩ Ω: the set of internal grid-points,
D = {(x, y) ∈ ∂Ω : x = x_ℓ or y = y_m},
N = N° ∪ D.
The intersection points (between the boundary and the grid-lines) can cause problems,
because often they reduce the accuracy of the numerical approximation and can also destroy
the matrix structure (e.g. symmetry) and produce linear systems of equations which are
difficult to solve numerically via iterative techniques.

Grid-like net. A net is grid-like if all the points in the set D are grid-points. In the
sequel we consider grid-like nets with constant step-size, i.e. x` = x0 + `h, ` = 0, 1, . . . and
ym = y0 + mh, m = 0, 1, . . ..

Poisson's equation.

∆u = u_xx + u_yy = f  in Ω,    u = g(x, y)  on ∂Ω.

With step-size h in the x-direction and k in the y-direction we have

∂_x² u_p = (1/h²) δ_x² u_p + O(h²),
∂_y² u_p = (1/k²) δ_y² u_p + O(k²),

where u_p is the solution of the differential equation evaluated at the point p = (x, y).
If we discretize the differential equation at the point p = (x, y) we get

∆u_p = f_p  →  (1/h²) δ_x² u_p + (1/k²) δ_y² u_p = f_p + τ_p,

where f_p is the function f evaluated at the point p, and the local truncation error τ_p is

τ_p = (1/12) h² ∂_x⁴ u_p + (1/12) k² ∂_y⁴ u_p + ··· .                      (6.1)
We let U_p be the approximation of u_p and we get the linear system

(1/h²) δ_x² U_p + (1/k²) δ_y² U_p = f_p,    p ∈ N°,
U_p = g_p,    p ∈ D.

Classic 5-point formula for the Poisson equation. In the case k = h we get

δ_x² U_p + δ_y² U_p = h² f_p.

We can rewrite this formula by using the four neighbours of p in the directions north (N),
south (S), east (E) and west (W):

U_W + U_S + U_E + U_N − 4U_p = h² f_p.

Example. We compute in detail a concrete case:

∆u = 0,   (x, y) ∈ Ω = (0, 1) × (0, 1),   u(x, y) = g(x, y),   (x, y) ∈ ∂Ω,

where

g(0, y) = 0,            0 ≤ y ≤ 1,
g(x, 0) = 0,            0 ≤ x ≤ 1,
g(1, y) = 4y(1 − y²),   0 ≤ y ≤ 1,
g(x, 1) = 4x(x² − 1),   0 ≤ x ≤ 1.

With h = 1/3 there are four internal grid-points, numbered 1, 2 (bottom row, left to right)
and 3, 4 (top row, left to right). The 5-point formula gives

p = 1:  −4U_1 + U_2 + U_3 = 0,
p = 2:   U_1 − 4U_2 + U_4 = −32/27,
p = 3:   U_1 − 4U_3 + U_4 = 32/27,
p = 4:   U_2 + U_3 − 4U_4 = 0.

By solving these equations we get

U_1 = U_4 = 0,   U_2 = 8/27,   U_3 = −8/27.

You can verify that the solution of the partial differential equation is

u(x, y) = 4xy (x² − y²).

From (6.1) we see that the local truncation error is identically equal to zero, because
∂_x⁴ u ≡ 0 and ∂_y⁴ u ≡ 0, and so are all the higher order derivatives. So in this case the
formula gives the exact solution of the problem. This is a very special case; with different
boundary conditions we would get τ_p ≠ 0.
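The same computation can be done on finer grids. Below is a Matlab sketch of the 5-point formula on the unit square, using the Kronecker product to build the matrix and the boundary data of this example (only the edges x = 1 and y = 1 give nonzero contributions); the variable names are our own.

> M = 30; h = 1/(M+1);
> e = ones(M,1); I = speye(M);
> S = spdiags([e, -2*e, e], -1:1, M, M);
> A = kron(I,S) + kron(S,I);            % 5-point formula: delta_x^2 + delta_y^2
> xi = (1:M)'*h;                        % interior coordinates
> b = zeros(M^2,1);
> b(M:M:end)     = b(M:M:end)     - 4*xi.*(1 - xi.^2);  % east neighbours: u(1,y)
> b(end-M+1:end) = b(end-M+1:end) - 4*xi.*(xi.^2 - 1);  % north neighbours: u(x,1)
> U = reshape(A\b, M, M);               % U(i,j) approximates u(x_i, y_j)
> [X, Y] = ndgrid(xi);
> max(max(abs(U - 4*X.*Y.*(X.^2 - Y.^2))))   % exact up to round-off, cf. (6.1)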

6.2.1 Discretization of a self-adjoint equation

We consider the problem

Lu = f,   where Lu = ∂_x(a ∂_x u) + ∂_y(c ∂_y u),   a = a(x, y),  c = c(x, y).

Denote by E, W, N, S the four neighbours of the point p, and by E′, W′, N′, S′ the midpoints
between p and the corresponding neighbour. Then

∂_x(a ∂_x u) = (1/h²) (a_{E′} (u_E − u_p) − a_{W′} (u_p − u_W)) + O(h²),
∂_y(c ∂_y u) = (1/k²) (c_{N′} (u_N − u_p) − c_{S′} (u_p − u_S)) + O(k²).

So Lu = f can be discretized to

α_E U_E + α_N U_N + α_W U_W + α_S U_S − α_p U_p = f_p,

where α_p = α_E + α_N + α_W + α_S and

α_E = (1/h²) a_{E′},   α_N = (1/k²) c_{N′},   α_W = (1/h²) a_{W′},   α_S = (1/k²) c_{S′}.

6.3 Boundary conditions of Neumann and Robin type

Example.

∆u = 0  in Ω,
u = g(x, y)  on ∂Ω_1 ∪ ∂Ω_2,
∂_x u − q^(0) u = f^(0)  on ∂Ω_3,
∂_y u − q^(1) u = f^(1)  on ∂Ω_4.

We require that q^(0) ≥ 0 and q^(1) ≥ 0. Now we need an equation for U_p for all p ∈ N°,
and for all p on ∂Ω_3 and ∂Ω_4. Let us for the sake of simplicity use the same step-size in
both space directions, i.e. k = h. Let us use the left boundary ∂Ω_3 as an example,

∂_x u − q u = f.

Alternative 1.

∂_x u_p = (u_E − u_p)/h + O(h)  →  (U_E − U_p)/h − q_p^(0) U_p = f_p^(0),

where q_p^(0) and f_p^(0) are the given functions q^(0) and f^(0) evaluated at the point p.

Alternative 2. Use a fictitious point W lying outside the domain:

∂_x u_p = (u_E − u_W)/(2h) + O(h²)  →  (U_E − U_W)/(2h) − q_p^(0) U_p = f_p^(0).     (6.2)

The extra unknown U_W requires an extra equation, and we use the scheme for the
approximation of the differential equation at the point p,

U_E + U_N + U_W + U_S − 4U_p = 0;                                          (6.3)

we eliminate U_W using the discretized boundary condition. From (6.2) we find

U_W = U_E − 2h (q_p^(0) U_p + f_p^(0)),

which substituted in (6.3) gives

2U_E + U_N + U_S − (4 + 2h q_p^(0)) U_p = 2h f_p^(0).

This discretization is more accurate than Alternative 1.



The same approach is used on the boundary ∂Ω_4. For this boundary we consider only
Alternative 2, with fictitious boundary point S, and p on ∂Ω_4:

∂_y u_p = (u_N − u_S)/(2h) + O(h²)  →  (U_N − U_S)/(2h) − q_p^(1) U_p = f_p^(1).

Here we can also use (6.3) and eliminate the fictitious value U_S; the result is

2U_N + U_W + U_E − (4 + 2h q_p^(1)) U_p = 2h f_p^(1).

Boundary along the grid-diagonal. Consider a boundary condition

∂_n u + q u = ~n · ∇u + q u = f

on a boundary running along the diagonal of the grid, with ~n = [n_x, n_y]^T and
n_x = n_y = 1/√2 (since k = h). Then

∂_n u = n_x ∂_x u + n_y ∂_y u = (1/√2) (∂_x u + ∂_y u),

and

∂_n u_p = (1/√2) ((u_p − u_W)/h + (u_p − u_S)/h) + O(h),

such that a first order approximation is

(2 + √2 h q_p) U_p − U_W − U_S = √2 h f_p.

It is usual to consider such low order approximations for this kind of boundary. In
principle it is possible to use fictitious grid-points as in Alternative 2, but the formulae
become quite complicated.

A problem with pure Neumann boundary conditions for Laplace's equation.
The problem

∆u = 0  in Ω,
∂_n u = g  on ∂Ω

has a solution only if ∫_{∂Ω} g ds = 0. This can be shown by using the divergence theorem
on ∫_Ω ∆u dA. If u is a solution, we see immediately that also u + c is a solution for an
arbitrary constant c. Therefore we do not have a well posed PDE-problem. This fact has
a counterpart in the discrete problem, where we solve a linear system of equations of the
type AU = b with A a square singular matrix. If b belongs to the range of A (the span of
the columns of A) there is at least one solution (not necessarily unique, as we can add any
arbitrary nonzero vector belonging to the null space of A); otherwise there is no solution
of the linear system.

6.4 Grid-like net and variable step-size

[Figure: a grid point p with neighbours W, E, N, S at distances h_W, h_E, h_N, h_S;
the points W′, E′, N′, S′ are the midpoints between p and the corresponding neighbours.]
We consider the approximation of

Lu = ∂x (a∂x u) + ∂y (c∂y u).

We let

∂_x(a ∂_x u) → L_h^(x) U_p = (2/(h_W + h_E)) (a_{E′} (U_E − U_p)/h_E − a_{W′} (U_p − U_W)/h_W)

and

∂_y(c ∂_y u) → L_h^(y) U_p = (2/(h_S + h_N)) (c_{N′} (U_N − U_p)/h_N − c_{S′} (U_p − U_S)/h_S).

We refer to the figure above for the definition of the points W, W′, S, S′, E, E′, N, N′
and the corresponding step-sizes. We approximate L by

L_h = L_h^(x) + L_h^(y)

and find

L_h^(x) u_p = (2/(h_W + h_E)) [ (1 + (h_E/2) ∂_x + (h_E²/8) ∂_x² + (h_E³/48) ∂_x³ + ···)(a_p (∂_x + (h_E²/24) ∂_x³ + ···) u_p)
              − (1 − (h_W/2) ∂_x + (h_W²/8) ∂_x² − (h_W³/48) ∂_x³ + ···)(a_p (∂_x + (h_W²/24) ∂_x³ + ···) u_p) ]

           = ∂_x(a ∂_x) u_p + (1/3) (h_E − h_W) ∂_x²(a ∂_x) u_p
             + (1/24) ((h_E³ + h_W³)/(h_E + h_W)) (∂_x a ∂_x³ + ∂_x³ a ∂_x) u_p + ··· .

By a similar calculation for L_h^(y) u_p we eventually get

Lu − L_h u = O((h_E − h_W) + (h_N − h_S) + h_E² + h_N²)   in general,
Lu − L_h u = O(h_E² + h_N²)   if h_E = h_W and h_N = h_S.

6.5 General rectangular net

[Figure: a general rectangular net; the legend distinguishes internal grid-points from net-points on the boundary.]

We consider the equation ∆u = f as an example.

Dirichlet problem. We can use the general 5-point formula with hø, hv, hn, hs for all the grid-points internal to the domain.

Robin problem. The difficulty here is ∂n u = ~n · ∇u. We let the grid be rectangular with step-sizes h and k respectively in the x- and y-direction. Let P be a boundary point whose outward normal meets the grid-line through the neighbouring points R and S in a point Q, with the distances d(P, Q) = d, d(Q, S) = h′ and d(P, S) = k′ as in the picture. We have

∂n u_P = (u_P − u_Q)/d + O(d),   d = √(h′² + k′²).

The problem is that Q is not a grid-point, but we can approximate the solution in the point Q using linear interpolation. We find

u_Q = (h′/h) u_R + ((h − h′)/h) u_S + O(h²).

Therefore we get

∂n u_P = (1/d) [ u_P − ( (h′/h) u_R + ((h − h′)/h) u_S ) ] + O(h²/d) + O(d).
But we note that this type of discrete problem is generally difficult to derive and handle.

6.6 Discretization using Taylor expansion on a completely general net

We want to set up a discrete version of the operator

Lu = a uxx + 2b uxy + c uyy + d ux + e uy + f u.

Let P be an arbitrary node internal to the domain. We choose s nodes among the rest of the net-points, typically the s closest adjacent points to P. Let h be a characteristic grid spacing. We describe the position of Qi relative to P by coordinates

P Qi = (ξi h, ηi h).

We approximate the operator L by Lh, where we assume

Lh U_P = Σ_{i=1}^s αi U_{Qi} − α0 U_P

for a choice of constants α0, . . . , αs.

As usual we insert the exact solution u in this formula, and using Taylor expansion in 2 dimensions as in (2.4) we get

u_{Qi} = u_P + ξi h ∂x u_P + ηi h ∂y u_P + ½ ξi² h² ∂x² u_P + ξi ηi h² ∂x∂y u_P + ½ ηi² h² ∂y² u_P + ···,

which substituted in the expression for Lh u_P gives

Lh u_P = (Σ_{i=1}^s αi − α0) u_P + h (Σ_{i=1}^s ξi αi) ∂x u_P + h (Σ_{i=1}^s ηi αi) ∂y u_P + ½ h² (Σ_{i=1}^s ξi² αi) ∂x² u_P
       + h² (Σ_{i=1}^s ξi ηi αi) ∂x∂y u_P + ½ h² (Σ_{i=1}^s ηi² αi) ∂y² u_P.
i=1 i=1

Since this should be “as similar as possible to Lu_P”, we should require

Σ_{i=1}^s αi = α0 + f,
Σ_{i=1}^s ξi αi = d/h,
Σ_{i=1}^s ηi αi = e/h,
Σ_{i=1}^s ξi² αi = 2a/h²,
Σ_{i=1}^s ξi ηi αi = 2b/h²,
Σ_{i=1}^s ηi² αi = 2c/h².

Moreover the following equations should be satisfied for as many values of ℓ as possible:

Σ_{i=1}^s ξi^m ηi^{ℓ−m} αi = 0,   m = 0, . . . , ℓ,   ℓ = 3, 4, . . . .

This can be justified by considering the rest of the Taylor expansion of Lh u_P, i.e. the terms of order 3 and higher in h,

Σ_{ℓ=3}^∞ (h^ℓ/ℓ!) Σ_{m=0}^ℓ (ℓ choose m) ( Σ_{i=1}^s αi ξi^m ηi^{ℓ−m} ) ∂x^m ∂y^{ℓ−m} u_P.
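The moment conditions form a small linear system for the weights. As a sanity check, the sketch below recovers the classical 5-point stencil for the Laplacian (a = c = 1, b = d = e = f = 0) from the four axis neighbours; the backslash solve of the overdetermined but consistent system returns the exact weights αi = 1/h² and α0 = 4/h².

    % Recover the 5-point Laplacian weights from the moment conditions.
    h   = 0.1;
    xi  = [1 -1 0  0];                  % x-offsets of Q1..Q4, in units of h
    eta = [0  0 1 -1];                  % y-offsets
    % Rows: xx-, xy-, yy-, x-, y- and zeroth-moment conditions.
    M = [xi.^2; xi.*eta; eta.^2; xi; eta; ones(1,4)];
    M = [M, [0;0;0;0;0;-1]];            % alpha_0 enters the last row only
    rhs = [2/h^2; 0; 2/h^2; 0; 0; 0];   % a = c = 1, b = d = e = f = 0
    alpha = M \ rhs;                    % alpha(1:4) = 1/h^2, alpha(5) = 4/h^2
    disp(alpha')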

6.7 Difference formulae derived by integration

This technique is often called box-integration, and is closely related to what in the literature goes under the name of finite-volume methods.

Gauss' divergence theorem. Given a domain Ω as in the picture, with boundary ∂Ω such that in each point the normal vector ~n points outwards, and a vector field p~ = p~(x, y) on Ω, it is true that

∫_Ω div p~ dA = ∮_∂Ω p~ · ~n ds.

We denote by div p~ = ∇ · p~ the divergence of the vector field p~. In particular, if p~ = ∇u (a gradient vector field) we get that div p~ = ∆u = ∂x²u + ∂y²u. So we obtain

∫_Ω ∆u dA = ∮_∂Ω ∇u · ~n ds = ∮_∂Ω ∂n u ds.

We illustrate box-integration via a special example. Let Ω = R be a rectangle with boundary parts Γ1, . . . , Γ4 as in the picture (Γ1 the right edge, Γ2 the top, Γ3 the left, Γ4 the bottom; the legend marks where u is known and where it is unknown). We consider the problem

∆u = f  in R,
∂n u + au = d  on Γ1,
u = g  on Γ2 ∪ Γ3 ∪ Γ4.

Here we have introduced a rectangular grid on R; note that the step-sizes may vary in both directions.

Let now P be an internal grid-point in R. We first look at the rectangle bounded by the grid-lines neighbouring the grid-lines on which P lies (see the picture). Next we look at a new rectangle placed exactly in the center of the previous one; we call this Ω. The edges of Ω are centered between the grid-lines and are called γi, i = 1, 2, 3, 4, the length of γi being ℓi. We let the step-sizes out of P be h1 (to the right, towards Q1), h2 (upwards, towards Q2), h3 (to the left, towards Q3) and h4 (downwards, towards Q4). So as in the picture we get

ℓ1 = ℓ3 = (h2 + h4)/2,   ℓ2 = ℓ4 = (h1 + h3)/2.

The area of Ω becomes A = ¼(h1 + h3)(h2 + h4). Now we use Gauss' divergence theorem on the small rectangle Ω and get

∆u = f   ⇒   ∫_Ω ∆u dA = ∮_∂Ω ∂n u ds = ∫_Ω f dA,

where we denote the middle term by I and the right hand side by II.

We now approximate I and II:

I:   ∮_∂Ω ∂n u ds = Σ_{i=1}^4 ∫_{γi} ∂n u ds ≈ Σ_{i=1}^4 ℓi (u_{Qi} − u_P)/h_i,

II:  ∫_Ω f dA ≈ f_P A = ¼ f_P (h1 + h3)(h2 + h4).

So our difference formula becomes

Σ_{i=1}^4 (ℓi/(A h_i)) (U_{Qi} − U_P) = f_P.

We can alternatively write the formula in the "old format"

Σ_{i=1}^4 αi U_{Qi} − α0 U_P = f_P

where

α1 = 2/(h1(h1 + h3)),   α2 = 2/(h2(h2 + h4)),   α3 = 2/(h3(h1 + h3)),   α4 = 2/(h4(h2 + h4)),   α0 = 2/(h1 h3) + 2/(h2 h4).
This formula can be used for all internal points in R. But we must have equations also
for the unknowns on the boundary Γ1 .
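A small sketch of these coefficients for some made-up local step-sizes; the built-in check that the weights sum to α0 confirms that constants are annihilated by the formula, and for h1 = h2 = h3 = h4 = h the weights reduce to the familiar αi = 1/h², α0 = 4/h².

    % Box-integration stencil weights for given local step-sizes h1..h4.
    h1 = 0.1;  h2 = 0.2;  h3 = 0.1;  h4 = 0.05;    % made-up values
    alpha = [2/(h1*(h1+h3)), 2/(h2*(h2+h4)), ...
             2/(h3*(h1+h3)), 2/(h4*(h2+h4))];
    alpha0 = 2/(h1*h3) + 2/(h2*h4);
    fprintf('sum(alpha) - alpha0 = %.2e\n', sum(alpha) - alpha0);   % ~ 0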
Now γ1 coincides with the exterior boundary Γ1, where we have

∂n u + au = d.

Gauss' divergence theorem on Ω gives

Σ_{i=1}^4 ∫_{γi} ∂n u ds = ∫_Ω f dA.

Looking in particular at the edge γ1, where ∂n u is given as d − au, we obtain

∫_{γ1} ∂n u ds = ∫_{γ1} (d − au) ds ≈ ℓ1 (d_P − a_P u_P),

where a_P and d_P are the functions a and d evaluated in the point P. On the other edges we set

∫_{γi} ∂n u ds ≈ ℓi (u_{Qi} − u_P)/h_i,   i = 2, 3, 4.

We end up with the following difference formula for a point P belonging to the boundary Γ1:

ℓ1 (d_P − a_P U_P) + Σ_{i=2}^4 (ℓi/h_i)(U_{Qi} − U_P) = f_P A

where

A = ¼ h3 (h2 + h4),   ℓ2 = ℓ4 = ½ h3,   ℓ3 = (h2 + h4)/2.

6.8 Net based on triangles

The nice thing about box-integration is that it is not based on the assumption that the domain is subdivided into rectangles; in fact, triangles can be used with the same ease. It is often simpler to subdivide a domain which does not have a rectangular shape using triangles rather than rectangles. The triangles need not be equal or similar (same angles), but we require that all the angles are less than 90 degrees. We also assume that there are no so-called "hanging nodes": no node may be placed along the edge of a triangle (except at the corner points).

In the next picture we have depicted a part of the triangular net where s triangles (s = 6 in the picture) have the internal grid-point P as a common corner. The edge γi of the internal polygon Ω intersects the segment P Qi at a 90 degree angle in its midpoint. We let γi have length ℓi and the segment P Qi length hi. We assume also that Ω has area A. We use Gauss' divergence theorem on Ω, and get

Σ_{i=1}^s ∫_{γi} ∂n u ds = ∫_Ω f dA.

Now we approximate as before

∫_{γi} ∂n u ds ≈ ℓi (U_{Qi} − U_P)/h_i,

such that the final formula becomes

Σ_{i=1}^s (ℓi/(h_i A)) (U_{Qi} − U_P) = f_P.

We have not given specific formulae for the computation of A or any relation between ℓi and hi (this is not possible in the general case we have considered). There is an important alternative to box-integration, namely the so-called finite element method, which relies on a completely different mathematical foundation than the one presented in this and the previous chapter. The course TMA4220 Numerical solution of partial differential equations with the finite element method treats all the details about these methods.

6.9 Difference equations

Let us first write an elliptic PDE with boundary conditions in an abstract form as

Lu = f  in Ω,
Bu = g  on ∂Ω.

We introduce the following discretization of this problem:

α0 U_P − Σ_{i=1}^s αi U_{Qi} = β_P.   (6.4)

We let P range over all the points where we want to approximate u. Here Qi, i = 1, . . . , s, are the neighbouring points of P. The coefficients α0, . . . , αs can depend on P, and so can s. Using these formulae one obtains the linear system of algebraic equations approximating the elliptic PDE: the entries of its coefficient matrix are the coefficients αi, i = 0, . . . , s, for all P where the numerical solution is sought.

Useful properties for (6.4).

i. α0 > 0 and αi ≥ 0 for all P.

ii. α0 ≥ Σ_{i=1}^s αi for all P; if this property holds the coefficient matrix is called diagonally dominant. We also require strict inequality for at least one P.

iii. The coefficient matrix is symmetric. If we denote by α_{P,Q} the coefficient used in node P in front of the unknown in the neighbouring point Q, the symmetry of the matrix means that α_{P,Q} = α_{Q,P}.
We recall the maximum principle discussed in Section 6.1. An analogous principle holds true for difference equations of the type described above.

Discrete maximum principle. Assume that the difference equation satisfies (i) and (ii) above. Assume the quantity V_P is defined for all P ∈ Ω ∪ ∂Ω and that

α0 V_P − Σ_i αi V_{Qi} ≤ 0

for every P ∈ Ω (internal grid-point). Then we have

V_P ≤ max_{S ∈ ∂Ω} V_S   for all P ∈ Ω.

Proof. Assume the contrary, namely that there is a P* ∈ Ω such that

V_{P*} = max_{P ∈ Ω} V_P > max_{S ∈ ∂Ω} V_S.   (6.5)

By using the hypothesis of the theorem we get

α0 V_{P*} ≤ Σ_i αi V_{Qi},

which by (i) is equivalent to

V_{P*} ≤ Σ_i (αi/α0) V_{Qi} ≤ (1/Σ_j αj) Σ_i αi V_{Qi} = Σ_i γi V_{Qi},

where the second inequality uses (ii), and

γi = αi / (Σ_j αj),   so that   Σ_i γi = 1.

We therefore get

Σ_i γi V_{Qi} ≤ Σ_i γi max_i V_{Qi} = max_i V_{Qi},

and then V_{P*} ≤ max_i V_{Qi}. By (6.5) we get V_{P*} = max_i V_{Qi} = V_{Q_{i1}}, so (α0 − α_{i1}) V_{P*} ≤ Σ_{i≠i1} αi V_{Qi}. So either there is only one neighbouring point Q_{i1}, or α0 − α_{i1} > 0, and by a similar argument we get V_{P*} = max_{i≠i1} V_{Qi}; proceeding in this way we cover all the neighbouring points, so that V_{Qi} = V_{P*} for all i. We can next move in turn to each one of the neighbouring points Qi and repeat the same argument, until one of the neighbouring points is a boundary point S. Therefore we get V_S = V_{P*} for some S ∈ ∂Ω, which is a contradiction (see (6.5)), and we conclude that the theorem holds.

6.10 Convergence of the methods for elliptic equations

6.10.1 Convergence for the 5-point formula on a Dirichlet problem

Consider the problem

−∆u = f  in R,
u = g  on ∂R,

where R is the square (0, 1) × (0, 1). Set h = 1/M and apply the 5-point formula Lh U_p = f_p, where

Lh U_p = (1/h²)(4U_p − U_ø − U_v − U_n − U_s),   p ∈ R,
U_p = g_p,   p ∈ ∂R.

The truncation error at the node p is given as τ_p = Lh u_p − f_p. Using Taylor expansion we can show that

|τ_p| ≤ (1/6) h² K =: τ̄,

where

K = max_{p ∈ R} {|∂x⁴ u_p|, |∂y⁴ u_p|};

this bound requires that these fourth derivatives are bounded on R. The discretization error (global error) is defined as

e_p = u_p − U_p,

and we find that

Lh e_p = Lh u_p − Lh U_p = f_p + τ_p − f_p = τ_p,   p ∈ R,
e_p = 0,   p ∈ ∂R.

So we have

|Lh e_p| ≤ τ̄   for all p ∈ R.

We now consider the function ϕ(x, y) = ½x² and apply the operator Lh to it:

Lh ϕ_p = (1/h²)(4 · ½x_p² − ½x_ø² − ½x_v² − ½x_s² − ½x_n²).

Here we have

x_ø = x_p + h,   x_v = x_p − h,   x_n = x_s = x_p,

so that

Lh ϕ_p = (1/(2h²))(4x_p² − (x_p + h)² − (x_p − h)² − x_p² − x_p²) = −1.

We set now V_p = e_p + τ̄ ϕ_p and get

Lh V_p = Lh e_p + τ̄ Lh ϕ_p = Lh e_p − τ̄ ≤ 0.

So V_p = e_p + τ̄ ϕ_p satisfies the hypothesis of the discrete maximum principle. Therefore we get

e_p + τ̄ ϕ_p ≤ max_{S ∈ ∂R} (e_S + τ̄ ϕ_S) ≤ ½ τ̄   for all p ∈ R,

since e_S = 0 and R is the square (0, 1) × (0, 1), so ϕ(x, y) = ½x² ≤ ½ for (x, y) ∈ R. If we repeat the same argument with V_p = −e_p + τ̄ ϕ_p, we find that

−e_p + τ̄ ϕ_p ≤ ½ τ̄   for all p ∈ R.

Since τ̄ ϕ_p ≥ 0 we can conclude that

|e_p| ≤ ½ τ̄ ≤ (1/12) K h²   for all p ∈ R.
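This O(h²) bound is easy to confirm numerically. The sketch below solves −∆u = f on the unit square with the made-up exact solution u = sin(πx) sin(πy), so that f = 2π² sin(πx) sin(πy) and g = 0; the maximum error is divided by roughly 4 each time h is halved.

    % Convergence check for the 5-point formula on a Dirichlet problem.
    for M = [10 20 40]
        h = 1/M;  m = M - 1;               % m interior points per direction
        e = ones(m,1);
        T = spdiags([-e 2*e -e], -1:1, m, m) / h^2;   % 1D -d^2/dx^2
        A = kron(speye(m), T) + kron(T, speye(m));    % 2D 5-point -Laplacian
        [X, Y] = meshgrid(h*(1:m));
        f = 2*pi^2 * sin(pi*X) .* sin(pi*Y);
        U = A \ f(:);
        err = max(abs(U - reshape(sin(pi*X).*sin(pi*Y), [], 1)));
        fprintf('h = %6.4f   max error = %8.2e\n', h, err);
    end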

6.10.2 Some general comments on convergence

We look at a general difference scheme of the type

α_pp u_p − Σ_q α_pq u_q = β_p + τ_p.

The discretization error is e_p = u_p − U_p, and by substituting into the formula we obtain

α_pp e_p − Σ_q α_pq e_q = τ_p,   p ∈ Ω.

If we arrange all the values of the error and the local truncation error in the grid-points, e_p and τ_p, in vectors e and τ, we can write the system in the form

A e = τ.

If A is invertible we get e = A⁻¹ τ, implying

‖e‖∞ ≤ ‖A⁻¹‖∞ · ‖τ‖∞.

Stability: the difference method is called stable if there is a constant C such that

‖A⁻¹‖ ≤ C   for all step-sizes h.

It is typically possible to prove (by Taylor expansion) that the truncation error τ satisfies

‖τ‖∞ = O(h^σ),   σ an integer.

This together with stability implies that

‖e‖∞ = O(h^σ).

It can happen that for example τ_p = O(h²) for some p, while for others (typically those close to the boundary or on the boundary) we have τ_p = O(h). So in general the global error ‖e‖∞ cannot be expected to be of higher order than 1 in h, i.e. O(h). It can happen, however, that in such situations one gets ‖e‖∞ = O(h²) anyway.

6.11 Some remarks on the solution of linear systems of algebraic equations

The solution of linear systems of the type (6.4) is an important sub-area of numerical analysis. How complex PDE problems we can handle, and how accurate numerical solutions we can produce on a computer, depend greatly on how large linear systems of equations we can solve. It is outside the scope of this course to discuss this subject in detail; we give here some general information.

When we choose a method to solve a linear system of equations numerically, there are two main classes of methods to consider: direct methods and iterative methods. The first class includes Gaussian elimination, or more specifically the Cholesky factorisation if the system is symmetric and positive definite. The iterative methods include Jacobi, Gauss–Seidel, and SOR (successive over-relaxation). But the types of iterative linear equation-solvers which have had most success in the last decades are the so-called Krylov subspace methods, and for symmetric matrices the conjugate gradient method. It is not easy to say precisely when it is best to use iterative methods rather than direct methods. Typical advantages of iterative methods are:

1. the linear systems are very large and sparse; a system is sparse if the non-zero entries of the matrix are a relatively small portion of the total number of entries;

2. the matrices are not banded, i.e. there are indices, corresponding to non-zero elements, relatively far away from the diagonal position (ij-elements with large values of |i − j|); the bandwidth of a matrix can for example be defined as

   b(A) = max{|i − j| : aij ≠ 0};

3. the system can be preconditioned effectively; this means that it is possible to find matrices T, S such that the system

   (T⁻¹ A S)(S⁻¹ x) = T⁻¹ b,   i.e.   Â x̂ = b̂,

   is "easier" to solve than the original system Ax = b.

In practice direct methods and iterative methods are both effective when working with partial differential equations in 2 space dimensions, while in 3 space dimensions we might expect an advantage in using iterative methods. Typically iterative solvers use matrix-vector multiplications (by A or Â respectively) as building blocks. If property 1 above is satisfied, the multiplication Ax can be executed efficiently provided the matrix is easy to access in the computer memory. Property 2 disfavours the use of Gaussian elimination, which can be executed efficiently for banded matrices: the factors L and U in the LU-factorization have in this case the same bandwidth as A (if no pivoting is performed). On the other hand, if the bandwidth is large, the number of non-zero elements in L and U can be much larger than in A. This effect is called "fill-in".

A real difference in performance between iterative methods and direct methods can be seen when we use appropriate preconditioning techniques, property 3 above. Note that the transformation Â = T⁻¹AS is not carried out in practice but only in theory. For the sake of simplicity, let us assume S = I. As mentioned above, the iterative methods are built on operations of the type y = Ax for arbitrary vectors x. For the preconditioned system we get y = Âx = T⁻¹Ax. The preconditioning is therefore an internal procedure in the method: each time we compute Âx we do it by first computing ỹ = Ax and then y = T⁻¹ỹ. The last operation is not performed as an explicit matrix-vector multiplication, but as a process, a function applied to ỹ. An example of such a process is the computation of parts of the Gaussian elimination algorithm on the system Ay = ỹ. Such an approach is called incomplete LU-factorization and is obtained by limiting or neglecting the fill-in. This approach can also be parallelized efficiently, meaning that the computational work can be divided over many processors on a supercomputer. Another preconditioning technique can be obtained by splitting the domain of definition of the partial differential equation into many small sub-domains. In turn, the differential equation is split into many different differential equations, while discarding or approximating the coupling between the various sub-domains. Each of the linear systems arising from one of the differential equations is then solved on a different processor by, for example, Gaussian elimination.

The course TMA4205 Numerical Linear Algebra presents these issues and methods in detail.
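As a small illustration of these points in Matlab: the sketch below applies the conjugate gradient method (pcg) to the sparse 5-point matrix from section 6.10.1, without preconditioning and with an incomplete Cholesky factorization (ichol, the symmetric analogue of incomplete LU); the tolerance and problem size are arbitrary choices.

    % CG on the 5-point Laplacian: plain versus incomplete Cholesky.
    m = 100;  e = ones(m,1);
    T = spdiags([-e 2*e -e], -1:1, m, m);
    A = kron(speye(m), T) + kron(T, speye(m));      % sparse and SPD
    b = rand(m^2, 1);
    [~, ~, ~, it1] = pcg(A, b, 1e-8, 2000);
    L = ichol(A);                                   % zero fill-in factor
    [~, ~, ~, it2] = pcg(A, b, 1e-8, 2000, L, L');
    fprintf('iterations: %d (plain)  vs  %d (ichol)\n', it1, it2);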
Chapter 7

Hyperbolic equations

7.1 Examples

1. The most famous example of a hyperbolic differential equation is the linear second order wave equation

   utt = c² uxx,

   or in several space dimensions utt = c² ∆u.

2. Consider the general second order linear PDE in two space dimensions,

   a uxx + 2b uxy + c uyy + d ux + e uy + f u = 0,

   where a, b, c, d, e, f are functions of x, y. Such an equation is said to be hyperbolic when b² − ac > 0. Note that y plays here the role of t in the equations above.

3. Another equation appearing in many applications is the so-called conservation law

   ut + c(x, t, u) ux = 0.

   This is a scalar quasi-linear PDE of first order which is also said to be hyperbolic. Such an equation can be used to describe road traffic flow.

4. Systems of first order linear equations with constant coefficients can be written in the following form

   ut + A ux = 0,   (7.1)

   where u ∈ Rⁿ and A is a real n × n-matrix. Such a system is hyperbolic if A is diagonalisable with real eigenvalues.

5. Let us consider a nonlinear hyperbolic system of equations describing an important application, namely the shallow water equations. We consider this set of equations in one space dimension, and let v(x, t) be the velocity of the fluid in the point x at time t, while z(x, t) measures the (vertical) wave height relative to an equilibrium position.

   Mass conservation: zt + (vz)x = 0.
   Momentum conservation: vt + (½v² + z)x = 0.

6. Next we write all these equations in general conservation form

   ut + (f(u))x = 0.   (7.2)

   Here u ∈ Rⁿ while f : Rⁿ → Rⁿ is a nonlinear mapping, and (7.1) is the special case where f(u) = Au for an n × n matrix A. To ensure that (7.2) is hyperbolic we require that the Jacobian matrix Df = f′(u) is diagonalisable with real eigenvalues.

7.2 Characteristics

We consider a model problem which we will use a lot in the sequel,

ut + aux = 0,   −∞ < x < ∞,   t ≥ 0,   a > 0,   (7.3)

where a is a constant. The initial function must be given on all of R:

u(x, 0) = f(x),   −∞ < x < ∞.

For this simple model problem it is possible to find the exact solution, that is

u(x, t) = f(x − at).

You can verify, by differentiating appropriately and substituting into the equation, that u(x, t) = f(x − at) satisfies (7.3). This means that in the (x, t)-plane there are lines along which u(x, t) is constant, namely the lines x = x0 + at, where u(x, t) = u(x0 + at, t) = f(x0 + at − at) = f(x0). Such a line is called a characteristic.

Constant values of u are transported from the initial value u(x0, 0) = f(x0) forward in time. The velocity of propagation is the reciprocal of the slope of the curve in the (x, t)-plane. When the line is vertical the slope is infinite, and the velocity is 0. If the characteristic is oriented towards right and upwards, the value u(x0, 0) is transported in the same direction. In an xu-plot we can draw the solution at different values of time and see the initial profile moving from t = t0 to t = t1.

Let us now consider the more general case

ut + a(x, t) ux = b(x, t).   (7.4)

Characteristic equation for (7.4):

dx/dt = a(x, t).   (7.5)

Assume now that x0 and t0 are given, and let

x = g(x0, t0, t)

be the solution of (7.5) satisfying x(t0) = x0; this is the characteristic γ through (x0, t0). Let u(x, t) be a solution of (7.4) and consider

v(t) := u(x, t)|_{x=g} = u(g(x0, t0, t), t).

Here v gives the solution u(x, t) along the curve γ. We can differentiate v with respect to time using the chain rule:

v′(t) = (ut + (dx/dt) ux)|_{x=g} = (ut + a(x, t) ux)|_{x=g} = b(x, t)|_{x=g}.

We see that along the characteristic, v′(t) is a known function of t, so we can find v by integration:

v(t) = v(t0) + ∫_{t0}^t b(x, s)|_{x=g(x0,t0,s)} ds.

If we know u(x0, t0) we can find u(x, t) for (x, t) ∈ γ from the formula

u(x, t) = u(x0, t0) + ∫_{t0}^t b(g(x0, t0, s), s) ds.   (7.6)

Special case: b(x, t) ≡ 0 implies u(x, t) = u(x0, t0) for all (x, t) ∈ γ.
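When (7.5) cannot be solved in closed form, the characteristic x = g(x0, t0, t) can be traced numerically with any ODE solver. A minimal sketch with the made-up coefficient a(x, t) = x, for which the exact characteristic through (x0, t0) = (0.5, 0) is x = 0.5 eᵗ:

    % Tracing the characteristic dx/dt = a(x,t) numerically with ode45;
    % a(x,t) = x is a made-up coefficient chosen for illustration.
    afun = @(t, x) x;
    [tt, xx] = ode45(afun, [0 1], 0.5);
    fprintf('x(1) = %.6f   exact = %.6f\n', xx(end), 0.5*exp(1));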

Initial value problem for (7.4). Assume that u(x, t) satisfies the PDE (7.4) and that u(x, t) = f(x, t) along a curve Σ : x = σ(t). NB! Σ should not be a characteristic.

Solution strategy: Compute the characteristic x = g(x0, t0, t) through the point (x0, t0) belonging to Σ. Then use (7.6) with u(x0, t0) = f(x0, t0).

A simple example. Let us consider again the simplest example

ut + aux = 0,   a ∈ R.

The characteristics are

dx/dt = a   ⇔   x = g(x0, t0, t) = x0 + a(t − t0).

Let us for example assume that the curve Σ is the x-axis, where u(x, 0) = f(x), i.e. we have t0 = 0. The characteristic through (x, t) intersects the x-axis in x0 = x − at, where the solution is u(x0, 0) = f(x0) = f(x − at), and we have reproduced the exact solution presented earlier.

Initial/boundary value problem for ut + aux = 0. We formulate the problem for 0 ≤ x < ∞ and t ≥ 0. Assume for simplicity that a > 0.

Initial value:  u(x, 0) = f(x),  x ≥ 0.
Boundary value: u(0, t) = h(t),  t ≥ 0.

The exact solution of this problem is given by

u(x, t) = f(x − at)    if x − at ≥ 0,
u(x, t) = h(t − x/a)   if x − at < 0.

Remark. We could have used, for example, boundaries at x = 0 and x = 1, but then it would be wrong to specify the solution along the line x = 1.

7.3 Explicit difference formulae for ut + aux = 0

We use the grid xm = mh, tn = nk, and write u^n_m = u(xm, tn). Different discretizations:

∂t u^n_m = (1/k)(u^{n+1}_m − u^n_m) + O(k),   (7.7)
∂x u^n_m = (1/h)(u^n_{m+1} − u^n_m) + O(h),   (7.8)
∂x u^n_m = (1/h)(u^n_m − u^n_{m−1}) + O(h),   (7.9)
∂x u^n_m = (1/(2h))(u^n_{m+1} − u^n_{m−1}) + O(h²).   (7.10)

If we choose (7.7) and (7.9) we get

(1/k)(u^{n+1}_m − u^n_m) + (a/h)(u^n_m − u^n_{m−1}) + O(k) + O(h) = 0,

and therefore the numerical method

U^{n+1}_m = (1 − ap) U^n_m + ap U^n_{m−1},   p = k/h.   (7.11)

The truncation error becomes kτ^n_m = O(k²) + O(kh). If we instead choose (7.10) for ux we obtain

U^{n+1}_m = U^n_m − (ap/2)(U^n_{m+1} − U^n_{m−1}).   (7.12)

Here we obtain kτ^n_m = O(k²) + O(kh²), which looks like an improvement compared to (7.11), but we will later show that (7.12) is always unstable for this problem!
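A minimal sketch of the scheme (7.11) on the quarter-plane problem of the previous section, with the made-up data a = 1, f(x) = e^{−200(x−0.25)²} and boundary value h(t) = 0, run with ap = 0.8; after t = 0.5 the pulse sits near x = 0.75, somewhat smeared by the O(h) error.

    % Upwind scheme (7.11) for u_t + a u_x = 0 on 0 <= x <= 1.
    a = 1;  h = 0.01;  p = 0.8/a;  k = p*h;     % a*p = 0.8 (CFL satisfied)
    x = (0:h:1)';
    U = exp(-200*(x - 0.25).^2);                % initial value f(x)
    for n = 1:round(0.5/k)
        U(2:end) = (1 - a*p)*U(2:end) + a*p*U(1:end-1);
        U(1) = 0;                               % boundary value h(t) = 0
    end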

Lax-Wendroff's formula. From the differential equation we obtain

utt = (−aux)t = −a(ut)x = −a(−aux)x = a² uxx.

We construct a difference discretization by Taylor expanding u^{n+1}_m to second order, using the differential equation, and then discretizing the spatial derivatives with central differences. We obtain

u^{n+1}_m = u^n_m + k ∂t u^n_m + ½k² ∂t² u^n_m + O(k³)
          = u^n_m − ak ∂x u^n_m + ½(ak)² ∂x² u^n_m + O(k³)
          = u^n_m − (ak/(2h))(u^n_{m+1} − u^n_{m−1}) + ½(ak)²(1/h²) δx² u^n_m + O(kh²) + O(k³).

From here we obtain

Lax-Wendroff's formula for ut + aux = 0:

U^{n+1}_m = U^n_m − (ap/2)(U^n_{m+1} − U^n_{m−1}) + ½(ap)² δx² U^n_m.   (7.13)

Truncation error:

kτ^n_m = O(k³) + O(kh²).
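One step of (7.13) in vectorized Matlab form; the periodic wrap-around via circshift is an assumption made purely to keep the sketch short (on the quarter-plane problem one would instead use the inflow boundary value).

    % One Lax-Wendroff step (7.13); r = a*p, periodic closure assumed.
    a = 1;  p = 0.8;  r = a*p;
    x = (0:0.01:1)';  U = exp(-200*(x - 0.25).^2);
    Up = circshift(U, -1);  Um = circshift(U, 1);   % U_{m+1} and U_{m-1}
    U  = U - (r/2)*(Up - Um) + (r^2/2)*(Up - 2*U + Um);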

Leap-frog formula. Now use central differences in both time and space:

∂t u^n_m = (1/(2k))(u^{n+1}_m − u^{n−1}_m) + O(k²),
∂x u^n_m = (1/(2h))(u^n_{m+1} − u^n_{m−1}) + O(h²).

We therefore get the leap-frog formula for ut + aux = 0:

U^{n+1}_m = U^{n−1}_m − ap(U^n_{m+1} − U^n_{m−1}).

Truncation error: kτ^n_m = O(k³) + O(kh²).

Note that this is a two-step (three time levels) method. For this reason it is necessary to provide a starting method, analogously to what happens for multistep methods for ordinary differential equations.

7.4 Stability

Courant-Friedrichs-Lewy condition. We consider again the equation ut + aux = 0. Assume we have a difference formula of the type

U^{n+1}_m = α−1 U^n_{m−1} + α0 U^n_m + α1 U^n_{m+1}.   (7.14)

We study the domain of dependence for U^n_m: U^n_m depends on U^0_{m+ℓ}, where −n ≤ ℓ ≤ n. The dependence interval for U^n_m on the x-axis is I^n_m = [x_{m−n}, x_{m+n}].

Let us fix xm = x* = mh and tn = t* = nk while sending h and k to zero and m, n to infinity simultaneously. Assume also that this is done in such a way that p = k/h is constant. The end points of I^n_m are then

x_{m±n} = x* ± nh = x* ± (h/k) t* = x* ± t*/p.

So I^n_m = [x* − t*/p, x* + t*/p]. Note that this interval is fixed when h and k go to zero as described above. Let γ be the characteristic for ut + aux = 0 passing through the point (x*, t*). Its slope satisfies dx/dt = a, and it intersects the x-axis in x* − at*, so u(x*, t*) = f(x* − at*). If x* − at* ∉ I^n_m, this means that the computed approximation U^n_m is built upon initial data that do not include f(x* − at*), regardless of how small h and k are. In that situation we cannot have convergence towards the exact solution, when h, k → 0, for all initial values.

Necessary condition for convergence (CFL-condition): the characteristic through (x*, t*) must intersect the x-axis in a point of the domain of dependence for xm = x*, tn = t*. CFL is short for Courant-Friedrichs-Lewy.

The same principle holds also for curved characteristics, such as those one gets when a = a(x, t): the characteristic through (xm, tn+1) must intersect the line t = tn between the two extreme points m − 1 and m + 1 whose numerical approximations feature in the difference formula. The characteristics must never leave the domain of dependence.

Let us see what the necessary criterion for convergence becomes in the case of constant a and a difference formula of the type considered above. We have the condition

x* − t*/p ≤ x* − at* ≤ x* + t*/p,

giving |a|p ≤ 1. This is the necessary condition for convergence for all numerical formulae of the type (7.14).

Von Neumann condition. We recall that this condition, discussed earlier in this course, is based on Fourier analysis. The method consists in substituting

U^n_m = ξⁿ e^{iβxm},   i = √(−1),   (7.15)

in the difference equation, and then solving with respect to the amplification factor ξ. The stability condition is then

|ξ| ≤ 1   for all β ∈ R.

Stability of the Lax-Wendroff scheme. We recall the formula, valid for problems of the type ut + aux = 0:

U^{n+1}_m = U^n_m − ½ap(U^n_{m+1} − U^n_{m−1}) + ½(ap)²(U^n_{m+1} − 2U^n_m + U^n_{m−1}).

If we substitute (7.15), divide by ξⁿ e^{iβxm} on both sides and use e^{iβh} = cos βh + i sin βh, we get

ξ = 1 − i ap sin βh + (ap)²(cos βh − 1) = 1 − 2(ap)² sin²(βh/2) − i ap sin βh.

Here and below we use the trigonometric identity cos βh = 1 − 2 sin²(βh/2). Let us define r = ap and q = sin(βh/2). Then

|ξ|² = (1 − 2r²q²)² + r² sin² βh = (1 − 2r²q²)² + r²(1 − (1 − 2q²)²)
     = 1 − 4r²q² + 4r⁴q⁴ + 4r²q² − 4r²q⁴
     = 1 − 4r²(1 − r²) q⁴.

We then require

1 − 4r²(1 − r²) q⁴ ≤ 1   for 0 ≤ q ≤ 1,

which is equivalent to 4r²(1 − r²) ≥ 0, i.e. to |r| ≤ 1, i.e. to |a|p ≤ 1; then we obtain exactly the same condition as from the CFL-condition.
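The computation above is easy to double-check by sampling the amplification factor over θ = βh; the value r = 1.1 deliberately violates the condition and gives max |ξ| > 1.

    % Sampling |xi| for Lax-Wendroff over theta = beta*h.
    theta = linspace(-pi, pi, 1001);
    for r = [0.5 1.0 1.1]
        xi = 1 - 2*r^2*sin(theta/2).^2 - 1i*r*sin(theta);
        fprintf('r = %4.2f   max|xi| = %.4f\n', r, max(abs(xi)));
    end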
A simpler formula to check is the "naive" formula (7.12), which we DO NOT recommend:

U^{n+1}_m = U^n_m − (ap/2)(U^n_{m+1} − U^n_{m−1}).

We obtain here

ξ = 1 − i ap sin βh,   (7.16)

so that

|ξ|² = 1 + (ap)² sin² βh > 1

for almost all β; that is, the formula is unstable for all step-sizes. An interesting modification of this bad method can be obtained by replacing U^n_m with ½(U^n_{m−1} + U^n_{m+1}). In the von Neumann analysis the expression for ξ in (7.16) is then replaced by

ξ = cos βh − i ap sin βh,

and we find that |ξ|² = 1 − (1 − (ap)²) sin² βh, so that we get stability if |ap| ≤ 1. This method is called the Lax-Friedrichs method.

7.5 Implicit methods for ut + aux = 0

Implicit methods can only be used on initial/boundary value problems. We consider problems of the type

ut + aux = 0,   x ≥ 0,   t ≥ 0,
u(x, 0) = f(x),   x ≥ 0,
u(0, t) = g(t),   t ≥ 0.

The easiest implicit formula. We try the simplest possible discretization of the derivatives in (xm, tn+1):

∂t u^{n+1}_m = (1/k)(u^{n+1}_m − u^n_m) + O(k),
∂x u^{n+1}_m = (1/h)(u^{n+1}_m − u^{n+1}_{m−1}) + O(h),

which gives the formula

U^{n+1}_m − U^n_m + ap(U^{n+1}_m − U^{n+1}_{m−1}) = 0.

We can solve explicitly with respect to U^{n+1}_m and obtain

U^{n+1}_m = (ap/(1 + ap)) U^{n+1}_{m−1} + (1/(1 + ap)) U^n_m.

Even if the superscript n + 1 appears on the right hand side of this equation, the method is in practice explicit if we compute the solution values at time level n + 1 from left to right. We observe that U^{n+1}_1 depends only on U^{n+1}_0 at time level n + 1, and the latter is given as g(t_{n+1}). The computed value U^{n+1}_1 is subsequently used to compute U^{n+1}_2, and so on.

We find that the local truncation error for this method is

kτ^n_m = −½(a²k² + ahk) ∂x² u^n_m + ··· = O(k² + hk).
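The left-to-right sweep in Matlab, with the same made-up data as in the upwind sketch (a = 1, f(x) = e^{−200(x−0.25)²}, g(t) = 0); since the method turns out to be stable for all p, the time-step is deliberately chosen larger than h.

    % The easiest implicit formula as an explicit left-to-right sweep.
    a = 1;  h = 0.01;  k = 0.02;  p = k/h;      % p = 2: no CFL restriction
    x = (0:h:1)';
    U = exp(-200*(x - 0.25).^2);
    for n = 1:round(0.5/k)
        U(1) = 0;                               % boundary value g(t_{n+1})
        for m = 2:numel(x)
            U(m) = (a*p*U(m-1) + U(m)) / (1 + a*p);
        end
    end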

Wendroff's method. We consider a procedure similar to box-integration. Integrate the equation over the rectangle R = [xm, xm+1] × [tn, tn+1]:

ut + aux = 0   ⇒   ∫_{tn}^{tn+1} ∫_{xm}^{xm+1} (ut + aux) dx dt = 0.

If we exchange the order of integration in the first term and use the fundamental theorem of calculus, we find

∫_{xm}^{xm+1} (u^{n+1} − u^n) dx + a ∫_{tn}^{tn+1} (u_{m+1} − u_m) dt = 0,   (7.17)

where u^n = u(x, tn) and u_m = u(xm, t). Now we recall the trapezoidal rule for quadrature. Given a function f(x) we have

∫_r^{r+d} f(s) ds = (d/2)(f(r) + f(r + d)) − (1/12) d³ f″(r + d/2) + ···

So if we apply the trapezoidal rule to both integrals in (7.17) we obtain

(h/2)[(u^{n+1}_m − u^n_m) + (u^{n+1}_{m+1} − u^n_{m+1})] + (ak/2)[(u^n_{m+1} − u^n_m) + (u^{n+1}_{m+1} − u^{n+1}_m)] + O(k³ + h³) = 0.

Wendroff's method:

(1 + ap) U^{n+1}_{m+1} + (1 − ap) U^{n+1}_m − (1 − ap) U^n_{m+1} − (1 + ap) U^n_m = 0.

The truncation error can be expanded around the midpoint (xm + h/2, tn + k/2) of the rectangle R, and we then get

kτ^n_m = (1/6)(a³k³ − a kh²) ∂x³ u^{n+1/2}_{m+1/2} + ··· = O(k³ + kh²).

To study the stability of the method we use the von Neumann condition again. With γ = (1 − ap)/(1 + ap) we can write

U^{n+1}_{m+1} + γ U^{n+1}_m − γ U^n_{m+1} − U^n_m = 0.

By setting U^n_m = ξⁿ e^{iβmh} we get

ξ(e^{iβh} + γ) = γ e^{iβh} + 1,

which we solve with respect to ξ to obtain

ξ = e^{iβh} (γ + e^{−iβh}) / (γ + e^{iβh}).

The first factor e^{iβh} has absolute value 1, and since γ is real, the fraction is an expression of the type z̄/z, which also has absolute value 1. Therefore |ξ| = 1 for all β, and we say that Wendroff's method is unconditionally stable, that is, stable for all h and k.
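Wendroff's method is also an explicit left-to-right sweep once the boundary value U^{n+1}_0 is known: solving the formula for U^{n+1}_{m+1} gives U^{n+1}_{m+1} = γ(U^n_{m+1} − U^{n+1}_m) + U^n_m. A sketch on the same made-up model problem as above:

    % Wendroff's method as a left-to-right sweep.
    a = 1;  h = 0.01;  k = 0.02;  p = k/h;
    x = (0:h:1)';  U = exp(-200*(x - 0.25).^2);
    gam = (1 - a*p)/(1 + a*p);
    for n = 1:round(0.5/k)
        Uold = U;
        U(1) = 0;                               % boundary value g(t_{n+1})
        for m = 1:numel(x)-1
            U(m+1) = gam*(Uold(m+1) - U(m)) + Uold(m);
        end
    end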

7.6 Hyperbolic systems of first order equations

Definition of hyperbolicity, characteristics. We consider systems of partial differential equations of first order with two independent variables x and t and ℓ dependent variables,

ut + Aux = 0,   u = (u1, . . . , uℓ)ᵀ,   A ∈ R^{ℓ×ℓ}.   (7.18)

To begin with we let A be constant. If A is diagonalizable and has real eigenvalues, we call (7.18) hyperbolic. In this case A can be factorized as

A = TΛT⁻¹,   Λ = diag(λi)_{i=1:ℓ},   λi real.

We perform a variable transformation

u = Tv ⇔ v = T⁻¹u,   ut = Tvt,   ux = Tvx,

where T is a constant matrix. By substituting in (7.18) we get

Tvt + (TΛT⁻¹)Tvx = T(vt + Λvx) = 0.

We have therefore decoupled (7.18) into

vt + Λvx = 0   ⇔   (vi)t + λi(vi)x = 0,   i = 1, . . . , ℓ.   (7.19)

For each equation we have a characteristic equation

dx/dt = λi,   i = 1, . . . , ℓ.   (7.20)

We observe that (7.18) has ℓ characteristic equations and ℓ families of characteristics,

x = λi t + constant,   i = 1, . . . , ℓ.   (7.21)

Example. We rewrite the wave equation

φtt = c² φxx

into a system of first order equations, as follows:

u1 = φt,   u2 = φx,
(u1)t = φtt = c² φxx = c² (u2)x,
(u2)t = φxt = φtx = (u1)x,

and in the form (7.18),

[u1; u2]_t + [0  −c²; −1  0] [u1; u2]_x = 0.   (7.22)

So the matrix A in (7.18) has the form

A = − [0  c²; 1  0].

A has eigenvalues and eigenvectors

λ1 = c,  t1 = [−c; 1],   λ2 = −c,  t2 = [c; 1],

and so it is diagonalizable: A = TΛT⁻¹ with

Λ = [c  0; 0  −c],   T = [−c  c; 1  1],   T⁻¹ = [−1/(2c)  1/2; 1/(2c)  1/2].

The characteristic equation is dx/dt = ±c, and (7.22) has two families of characteristics x = ±ct + constant. The transformation to the form (7.19) is

v = T⁻¹u   ⇔   v1 = (−u1/c + u2)/2,   v2 = (u1/c + u2)/2,

which gives the decoupled equations

(v1)t + c(v1)x = 0,
(v2)t − c(v2)x = 0.
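A two-line numerical check of this diagonalization, with the arbitrary choice c = 2:

    % Verify A = T*Lambda*inv(T) for the wave-equation system, c = 2.
    c = 2;
    A = -[0 c^2; 1 0];
    [T, Lam] = eig(A);                          % columns of T: eigenvectors
    fprintf('eigenvalues %g, %g   residual %.1e\n', ...
            Lam(1,1), Lam(2,2), norm(A - T*Lam/T));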

General linear system of first order equations.

ut + A(x, t) ux + B(x, t) u = f(x, t).   (7.23)

The equation (7.23) is hyperbolic if A is diagonalizable with real eigenvalues,

A(x, t) = T(x, t) Λ(x, t) T⁻¹(x, t),   Λ(x, t) = diag(λi(x, t))_{i=1:ℓ}.

We perform the change of variables

u = Tv,   ut = Tvt + Tt v,   ux = Tvx + Tx v,

so that (7.23) gives

Tvt + Tt v + TΛT⁻¹(Tvx + Tx v) + BTv = f,

i.e.

vt + Λvx + Γv = g,   Γ = T⁻¹Tt + ΛT⁻¹Tx + T⁻¹BT,   g = T⁻¹f.   (7.24)

The equations (7.24) are no longer a decoupled system as in the case (7.19), but they lead to characteristic equations for (7.23):

dx/dt = λi(x, t),   i = 1, . . . , ℓ.

Initial/boundary value problems. The pure initial value problem is given as

ut + Aux = 0,   −∞ < x < ∞,   t > 0,
u(x, 0) = f(x),   −∞ < x < ∞.

Initial/boundary value problem of type I. Let R = {(x, t) : x ≥ 0, t ≥ 0} and consider

ut + Aux = 0  in R,
u(x, 0) = f(x),   x ≥ 0.

Moreover we need a set of boundary conditions for u along the t-axis, as we will see in the sequel.

Consider first a scalar equation

wt + awx = 0.

The characteristic equation is dx/dt = a, so the characteristics are x = at + constant. Two cases occur:

a ≤ 0: the characteristics x = at + b point upwards towards the left and carry w(x, t) = w(b, 0), so w(x, t) is uniquely determined in R if w(x, 0) is known for x ≥ 0.

a > 0: the characteristics point upwards towards the right; the characteristic through (0, q) is x = a(t − q) and carries w(x, t) = w(0, q). To be able to determine w(x, t) in all of R we therefore need to know both w(x, 0) for x ≥ 0 and w(0, t) for t ≥ 0.

Consider now a system

ut + Aux = 0  in R,   u(x, 0) = f(x),   x ≥ 0,

which with A = TΛT⁻¹, u = Tv and h = T⁻¹f is equivalent to

vt + Λvx = 0  in R,   v(x, 0) = h(x),   x ≥ 0.

Let Λ = diag(λi)_{i=1:ℓ} and assume the eigenvalues are ordered such that λi > 0 for i = 1, . . . , k and λi ≤ 0 for i = k + 1, . . . , ℓ (for a certain k). We get the scalar equations

(vi)t + λi(vi)x = 0  in R,   vi(x, 0) = hi(x),   x ≥ 0.


In this case vi(x, t) is determined by vi(x, 0) = hi(x) for all i ≥ k + 1 (characteristics pointing upwards towards the left). To compute vi(x, t) in R for i ≤ k we need values of vi along the t-axis. We can get such values by imposing k boundary conditions,

Σ_{j=1}^ℓ γij vj(0, t) = gi(t),   i = 1, . . . , k.

The equations must be solvable for {vi, i = 1, . . . , k}; this means that C = [γij]_{i,j=1:k} must be non-singular. This leads to the following initial/boundary value problem:

ut + Aux = 0  in R,
u(x, 0) = f(x),   x ≥ 0,
Bu(0, t) = g(t),

where A ti = λi ti, i = 1, . . . , ℓ, with λi > 0 for i = 1, . . . , k; B is k × ℓ and C = B · [t1, · · · , tk] is non-singular.

Example.

(u1)t − c²(u2)x = 0,  (u2)t − (u1)x = 0  in R,
u1(x, 0) = f1(x),  u2(x, 0) = f2(x),   x ≥ 0,
β1 u1(0, t) + β2 u2(0, t) = g(t),   t ≥ 0,   with −β1 c + β2 ≠ 0.

Initial/boundary value problem of type II. Let R = {(x, t) : 0 ≤ x ≤ 1, t ≥ 0} and assume A has k positive and m negative eigenvalues (k + m ≤ ℓ). The problem reads

ut + Aux = 0  in R,
u(x, 0) = f(x),   0 ≤ x ≤ 1,
B0 u(0, t) = g0(t),   t ≥ 0,
B1 u(1, t) = g1(t),   t ≥ 0,

where B0 is k × ℓ and B1 is m × ℓ.
Lax-Wendroff and Wendroff for systems. We recall the Lax-Wendroff method for the scalar equation ut + aux = 0:

U^{n+1}_m = U^n_m − ap μx δx U^n_m + ½(ap)² δx² U^n_m,

where μx u(x, t) = ½(u(x + h/2, t) + u(x − h/2, t)) is an averaging operator and δx u(x, t) = u(x + h/2, t) − u(x − h/2, t) as always. In particular we note that

μx δx U^n_m = μx(U^n_{m+1/2} − U^n_{m−1/2}) = ½(U^n_{m+1} + U^n_m) − ½(U^n_m + U^n_{m−1}) = ½(U^n_{m+1} − U^n_{m−1}).

We now formulate the Lax-Wendroff method for a system of equations ut + Aux = 0, where the matrix A ∈ R^{ℓ×ℓ} is constant and U^n_m ∈ R^ℓ for all m and n:

U^{n+1}_m = U^n_m − pA μx δx U^n_m + (p²/2) A² δx² U^n_m;   (7.25)

this is a straightforward generalization. But if we let A = A(x, t) depend on x and t, we get

Lax-Wendroff for systems with variable A, ut + A(x, t)ux = 0:

U^{n+1}_m = U^n_m − p A^{n+1/2}_m μx δx U^n_m + (p²/2) A^{n+1/2}_m δx (A^{n+1/2} δx U^n)_m,

where A^{n+1/2}_m = A(xm, tn + k/2).

We consider now stability for constant A, and generalize the von Neumann condition to systems. We present only the procedure, which is based on taking

U^n_m = e^{iβxm} Gⁿ U⁰

and substituting it in the difference method. Now G ∈ C^{ℓ×ℓ} is a matrix, often called the amplification matrix, β ∈ R is an arbitrary frequency, and U⁰ ∈ R^ℓ an arbitrary vector. We substitute this expression into (7.25), and get

G = I − ipA sin θ + p²(cos θ − 1)A²,   θ = βh.   (7.26)

A necessary condition for stability is then

ρ(G) ≤ 1   for all θ ∈ R.

Since A is diagonalizable, we can write

A = TΛT⁻¹,   Λ = diag(λ1, . . . , λℓ).

If we use this in (7.26) we get

G = T(I − ip sin θ Λ + p²(cos θ − 1)Λ²) T⁻¹,

so the matrix T diagonalizes both A and G. The eigenvalues of G are found in the middle diagonal factor; they are

μj = 1 − ip sin θ λj + p²(cos θ − 1)λj².

The expression for μj is exactly the same as that for ξ in the scalar stability analysis of the Lax-Wendroff method, only with a replaced by λj. The same analysis goes through when we require |μj| ≤ 1, and we obtain the condition p|λj| ≤ 1 for all j, i.e.

ρ(A) p ≤ 1,

which is the stability condition for Lax-Wendroff applied to systems with constant A. A necessary condition for stability for variable A is that ρ(A(x, t)) p ≤ 1 for all (x, t) where the method is used.
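For constant A, one step of (7.25) can be written in Matlab with the solution stored as an ℓ-by-M array; the periodic closure via circshift is again only an assumption to keep the sketch self-contained. Here A is the wave-equation system with c = 1, so ρ(A) p = 0.5 satisfies the condition.

    % One Lax-Wendroff step (7.25) for u_t + A u_x = 0, constant A.
    c = 1;  A = -[0 c^2; 1 0];  p = 0.5;
    xg = linspace(0, 1, 101);
    W  = [exp(-100*(xg - 0.5).^2); zeros(1, 101)];       % made-up data
    Wp = circshift(W, -1, 2);  Wm = circshift(W, 1, 2);  % W_{m+1}, W_{m-1}
    W  = W - (p/2)*A*(Wp - Wm) + (p^2/2)*(A*A)*(Wp - 2*W + Wm);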

We consider now a generalization of Wendroff's method to the case of systems. We recall the method for the scalar equation,

(1 + ap) U^{n+1}_{m+1} + (1 − ap) U^{n+1}_m − (1 − ap) U^n_{m+1} − (1 + ap) U^n_m = 0.

By using forward differences in space we can rewrite this expression as

(1 + ½(1 + ap)∆x) U^{n+1}_m = (1 + ½(1 − ap)∆x) U^n_m.

So we obtain

Wendroff's method for systems with variable A, ut + A(x, t)ux = 0:

(I + ½(I + p A^{n+1/2}_{m+1/2})∆x) U^{n+1}_m = (I + ½(I − p A^{n+1/2}_{m+1/2})∆x) U^n_m,

where A^{n+1/2}_{m+1/2} = A(xm + h/2, tn + k/2). If we can compute A only in the grid-points, we can replace

A^{n+1/2}_{m+1/2}   with   ¼(A^n_m + A^n_{m+1} + A^{n+1}_m + A^{n+1}_{m+1})

without loss of accuracy. The stability analysis for constant A follows the same technique as before, and the result is that Wendroff's method is stable for all p.

Initial/boundary value problem for systems of two equations. The wave equation φtt = φxx can be rewritten in the form of a system as in (7.22); with c = 1 the equation becomes

[u1; u2]_t + [0  −1; −1  0] [u1; u2]_x = 0.

Here A has eigenvalues λ1 = 1 and λ2 = −1, so one family of characteristics points towards the left and the other towards the right, as in the pictures earlier in this section. So it is necessary to specify only one boundary condition. Writing (u, v) for (u1, u2), the boundary/initial conditions can for example be

u(x, 0) = f(x),   x ≥ 0,
v(x, 0) = g(x),   x ≥ 0,
u(0, t) = φ(t),   t ≥ 0.

Let us for example choose the Lax-Wendroff method as difference method. We let

W^n_m = [U^n_m; V^n_m],   A = [0  −1; −1  0].

The method becomes then

W^{n+1}_m = (I − pA μx δx + (p²/2) A² δx²) W^n_m.   (7.27)

Assume now that W^n_m is known for all m ≥ 0 at a time level n ≥ 0. We can use (7.27) to find W^{n+1}_m for m ≥ 1. What about W^{n+1}_0? U^{n+1}_0 = φ(t_{n+1}) is fine; the problem is V^{n+1}_0. To obtain V^{n+1}_0 we propose two alternatives.

1. From the first differential equation, ut − vx = 0, we find

   vx(0, t) = ut(0, t) = φ′(t).

   We approximate vx(0, t) = (1/h)(v(h, t) − v(0, t)) + O(h). Then we get the approximation

   V^{n+1}_0 = V^{n+1}_1 − h φ′(t_{n+1}).

   Note that V^{n+1}_1 is obtained from (7.27). Higher order approximations of the derivative can also be used.

2. Consider the second differential equation, vt − ux = 0, and use box-integration:

   ∫_0^h ∫_{tn}^{tn+1} vt dt dx = ∫_{tn}^{tn+1} ∫_0^h ux dx dt.

   From the fundamental theorem of calculus we then get

   ∫_0^h (v(x, t_{n+1}) − v(x, tn)) dx = ∫_{tn}^{tn+1} (u(h, t) − u(0, t)) dt.

   We approximate both integrals with the trapezoidal rule:

   (h/2)(V^{n+1}_0 − V^n_0 + V^{n+1}_1 − V^n_1) = (k/2)(U^n_1 − U^n_0 + U^{n+1}_1 − U^{n+1}_0).

   The equation is then solved with respect to V^{n+1}_0:

   V^{n+1}_0 = −V^{n+1}_1 + V^n_0 + V^n_1 + p(U^n_1 + U^{n+1}_1 − U^n_0 − U^{n+1}_0).

7.7 Dissipation and dispersion

Let us now consider again the pure initial value problem

ut + aux = 0,   −∞ < x < ∞,   t ≥ 0,
u(x, 0) = f(x),   −∞ < x < ∞.

The Fourier transform of the initial function f(x) is

f̂(β) = (1/(2π)) ∫_{−∞}^{∞} f(x) e^{−iβx} dx.

The inverse transform is

f(x) = ∫_{−∞}^{∞} f̂(β) e^{iβx} dβ.

The solution of the pure initial value problem can be written in the form

u(x, t) = ∫_{−∞}^{∞} f̂(β) e^{iβ(x−at)} dβ.

If we let f(x) = e^{iβx} we get the solution

u(x, t) = e^{iβ(x−at)}.

The solution u is a wave with amplitude 1 and wave length λ = 2π/β, moving with speed a along the x-axis. Let us now apply a difference method to this equation with the same initial function. This gives a numerical approximation

U^n_m = ξⁿ e^{iβxm},

as we have already seen in the case of von Neumann stability. Here ξ is a complex number depending on β, which one finds by substituting the expression above into the difference equation. In the sequel it is useful to write ξ in polar form,

ξ = |ξ| e^{−iϕ}.

Let us define the number α by requiring ϕ = βαk, where k is the time-step, so α = ϕ/(βk). We find

ξⁿ = |ξ|ⁿ e^{−inϕ} = |ξ|ⁿ e^{−inkαβ} = |ξ|ⁿ e^{−iβαtn},

and so we obtain the following expressions for the numerical and the exact solution:

U^n_m = |ξ|ⁿ e^{iβ(xm − αtn)},
u^n_m = 1ⁿ e^{iβ(xm − atn)}.

In general one can use an arbitrary initial function f(x), and in this case

U^n_m = ∫_{−∞}^{∞} f̂(β) |ξ|ⁿ e^{iβ(xm − αtn)} dβ.

Earlier we used |ξ| ≤ 1 as the stability condition. The strict inequality is considered in the following definition.

Dissipation. If there is an upper bound k0 > 0 for the time-step and a constant σ > 0 such that

|ξ| ≤ 1 − σ(βh)^{2s}   for |βh| ≤ π

and all k ≤ k0, the difference formula is dissipative of order 2s.

Dispersion. If α depends on β, the difference method is dispersive. Then different frequencies are transported at different speeds. In other words, dissipation measures the error in the amplitude, and dispersion measures the error in the phase.

Example. Let us apply the definitions to the Lax-Wendroff method. From the earlier analysis we have (with θ = βh)

ξ = 1 − i ap sin θ − (ap)²(1 − cos θ) = 1 − 2r² sin²(θ/2) − i r sin θ,   r = ap.

From the stability analysis for the Lax-Wendroff method we found the expression

|ξ|² = 1 − q sin⁴(θ/2),   q = 4r²(1 − r²).   (7.28)

As shown earlier, |ξ| ≤ 1 if and only if |r| ≤ 1, but the question is how |ξ| behaves for 0 < |r| < 1, in which case 0 < q ≤ 1. Note that if x is a number such that x ≤ 1, then

1 − x ≤ 1 − x + ¼x² = (1 − x/2)²   ⇒   √(1 − x) ≤ 1 − x/2.

Taking the square root of (7.28) and letting x = q sin⁴(θ/2), we get

|ξ| ≤ 1 − (q/2) sin⁴(θ/2) = 1 − (q/2) (sin(θ/2)/θ)⁴ θ⁴.

By the definition of dissipation we need only look at the interval −π ≤ θ ≤ π. The smallest value of sin(θ/2)/θ there is sin(π/2)/π = 1/π. Then we get

|ξ| ≤ 1 − (q/2)(1/π)⁴ θ⁴ = 1 − (q/(2π⁴)) θ⁴,   |θ| ≤ π.

We conclude that Lax-Wendroff is dissipative of order 4 with σ = 2(ap)²(1 − (ap)²)/π⁴.

If we now analyze the dispersion, we find that

ϕ = αβk = arctan( r sin θ / (1 − 2r² sin²(θ/2)) ).

Solving with respect to α and using that βk = θr/a, we get

α = a (1/(rθ)) arctan( r sin θ / (1 − 2r² sin²(θ/2)) ),

so in general α definitely depends on β, and Lax-Wendroff is therefore dispersive. It is interesting to see that at the stability limit r = 1 the scheme is not dispersive: one gets α = a for all frequencies β.
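A few sample values of the phase speed ratio α/a computed from this formula (atan2 picks the correct branch of the arctangent); at r = 0.8 the sampled ratios are all below 1, i.e. the numerical waves lag behind the exact ones, most severely for the short wavelengths.

    % Phase speed ratio alpha/a for Lax-Wendroff at r = a*p = 0.8.
    r = 0.8;
    theta = [pi/8 pi/4 pi/2 3*pi/4];
    phi = atan2(r*sin(theta), 1 - 2*r^2*sin(theta/2).^2);
    disp(phi ./ (r*theta))          % about 0.99, 0.97, 0.91, 0.92
    % Repeating with r = 1 returns exactly 1 for every theta.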
