NLA Full Notes 22
• Your notes from Introduction to Linear Algebra or Accelerated Algebra and Calculus for Direct
Entry.
• B.N Datta. Numerical Linear Algebra and Applications, SIAM, 2010 (second edition). Sections
2.1 - 2.4.
• D. Poole. Linear Algebra: A modern introduction. Cengage Learning. Sections 1.1 - 1.3, 2.3,
3.1-3.3, 3.5, 4.1-4.4 and 5.1 - 5.5.
Definition 0.1. The dot product between two vectors x, y ∈ R^n is given by x · y = Σ_{i=1}^n xi yi.
Definition 0.2. The length (or Euclidean norm) of a vector x ∈ R^n is given by ‖x‖ = √(x · x) = (Σ_{i=1}^n xi²)^{1/2}.
Definition 0.7. For two vectors x, y ∈ R^n, with y ≠ 0, the projection of x onto y is the vector projy(x) given by
projy(x) = (x · y / y · y) y.
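These definitions are easy to check numerically. The following short sketch uses numpy on two made-up vectors; numpy's built-in np.linalg.norm gives the same Euclidean norm.
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = np.dot(x, y)                         # x . y = sum_i x_i y_i
norm_x = np.sqrt(np.dot(x, x))             # ||x|| = sqrt(x . x)
proj = (np.dot(x, y) / np.dot(y, y)) * y   # projection of x onto y

print(dot, norm_x, np.linalg.norm(x))      # the last two values agree
print(proj)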
2. In each non-zero row, the first non-zero entry (called the leading entry) is in a column to
the left of any leading entries below it.
Definition 0.9. The rank of A is the number of non-zero rows in its row echelon form, which is
equal to the number of linearly independent rows of A. It is also equal to the number of linearly
independent columns of A.
Theorem 0.10 (Rank Theorem). Let A ∈ R^{m×n}, with m ≥ n. The following two statements are equivalent:
1. Ax = 0 if and only if x = 0.
2. rank(A) = n.
Definition 0.11. Given A ∈ Rm×n with entries (A)ij := aij , the transpose AT ∈ Rn×m is given
by (AT )ij := aji . A square matrix A ∈ Rn×n is symmetric if A = AT .
Definition 0.12. Given A ∈ Rn×n , the inverse matrix A−1 ∈ Rn×n is such that AA−1 =
A−1 A = I, where I ∈ Rn×n denotes the identity matrix. If A−1 exists, A is called invertible.
Definition 0.15. Given a matrix A ∈ Rn×n , a scalar λ ∈ C is called an eigenvalue and the vector
x ∈ Cn \ {0} a corresponding eigenvector of A if Ax = λx.
Theorem 0.16. The eigenvalues of A ∈ Rn×n are the roots of the characteristic polynomial ξA ,
defined by ξA (z) = det(A − zI), for z ∈ C. In other words, the eigenvalues λ1 , . . . , λn of A satisfy
det(A − zI) = ∏_{i=1}^n (λi − z).
Theorem 0.17. A matrix A ∈ R^{n×n} is diagonalisable if and only if it has n linearly independent eigenvectors. A diagonalisable matrix A can be written in the form A = SDS^{−1}, where the columns of S ∈ C^{n×n} are the eigenvectors of A and the diagonal entries of D ∈ C^{n×n} are the corresponding eigenvalues, in the same order.
1 Background material
This section contains some background material on floating point arithmetic, algorithms and norms.
We need these concepts to understand how computers perform computations, how we can write down
our methods in a structured way, and whether our methods are computationally efficient and the
computed solutions are accurate.
The mantissa can only store a finite number of digits, which you can think of as using a finite
number of significant figures. For a more thorough discussion of the relation between floating point
representations, scientific notation and significant figures, please watch the video at
https://www.youtube.com/watch?v=PZRI1IfStY0.
We work under the following assumptions on floating point representation.
Assumption 1.1. There is a parameter εm > 0, called machine epsilon, such that the following
assumptions hold.
A1. For every α ∈ R, there is ε ∈ (−εm, εm) with
fl(α) = α(1 + ε).
A2. For every α, β ∈ F and every elementary operation ∗ ∈ {+, −, ×, ÷}, there is δ ∈ (−εm, εm) with
α ⊛ β = (α ∗ β)(1 + δ),
where ⊛ denotes the computed version of ∗.
Note that by Assumption A1, the rounding error in α is relative to the size of α:
|α − fl(α)| ≤ |α|εm .
Likewise, by Assumption A2, the error in α ⊛ β is relative to the size of α ∗ β:
|α ∗ β − α ⊛ β| ≤ |α ∗ β| εm.
The vast majority of modern computers use the floating point representation specified by the
IEEE standard 754 with double precision. In this case we use 64 bits to represent a floating point
number, split as 1 bit for the sign, 52 bits for the mantissa, and 11 bits for the exponent (in base
2). In this case we have εm = 2.22 × 10−16 , to 3 significant figures. You can check this yourself in
Python, using the following code:
import numpy as np
em = np.finfo(float).eps   # machine epsilon for double precision floating point numbers
print(em)
According to Assumption 1.1, εm is the accuracy with which we can store real numbers and
perform arithmetic operations. Since εm gives the relative accuracy, we can interpret εm ≈ 10−16 as
working with 15 significant figures. Note that using a finite number of significant figures makes arith-
metic operations involving numbers of very different magnitude susceptible to errors. For example, consider adding the numbers 10^12 and 10^−12: since 10^−12/10^12 = 10^−24 is far below εm, the result is rounded to fl(10^12 + 10^−12) = 10^12, and the contribution of the smaller number is lost entirely.
Floating point arithmetic will have an effect on any quantities we compute. In general, there will be a difference between the true solution to the problem we want to solve, e.g. the solution x to Ax = b, and the solution computed by a computer in floating point arithmetic, typically denoted by x̂.
Definition 1.3. We say C(n) = O(f(n)) if there exists a constant α > 0 such that C(n) ≤ αf(n) for all n. This means that C(n) does not grow more quickly than f(n) as a function of n. For example,
5n³ = O(n³),
5n³ + 2n² = O(n³).
To compute the cost of Algorithm DP, we count the number of arithmetic operations. Line DP.3 involves 1 addition and 1 multiplication, and it is executed for all values of i = 1, . . . , n. The cost of DP is therefore C(n) = Σ_{i=1}^n 2 = 2n = O(n).
Algorithm MV involves m applications of Algorithm DP on vectors in R^n, hence C(m, n) = Σ_{i=1}^m 2n = m × 2n = O(mn). In particular, if A ∈ R^{n×n} is square, we have C(n) = O(n²).
There are always many different algorithms that solve a given problem, and it is important to
think about efficient implementation. Suppose we want to compute the product ABC, for matrices
A ∈ Rm×m , B ∈ Rm×m and C ∈ Rm×n . We can do this in two ways:
1. Compute AB using Algorithm MM, at cost 2m³, then compute the product of AB ∈ R^{m×m} and C using Algorithm MM at cost 2m²n. Total cost: 2m³ + 2m²n = 2m²(m + n).
2. Compute BC using Algorithm MM, at cost 2m²n, then compute the product of A and BC ∈ R^{m×n} using Algorithm MM at cost 2m²n. Total cost: 2m²n + 2m²n = 4m²n.
The first approach is faster if m < n, whereas the second approach is faster if m > n.
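The difference between the two orderings can be seen directly in numpy; the sizes below are made-up choices with m < n, so the first ordering should be noticeably faster.
import time
import numpy as np

m, n = 200, 2000                      # m < n, so (AB)C should win
A = np.random.rand(m, m)
B = np.random.rand(m, m)
C = np.random.rand(m, n)

t0 = time.perf_counter()
X1 = (A @ B) @ C                      # cost ~ 2m^2(m + n)
t1 = time.perf_counter()
X2 = A @ (B @ C)                      # cost ~ 4m^2 n
t2 = time.perf_counter()

print("(AB)C:", t1 - t0, "s   A(BC):", t2 - t1, "s")
print("max difference:", np.abs(X1 - X2).max())   # same product, up to rounding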
For a vector x ∈ R^n, the p-norms are defined by
‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p},  p ∈ [1, ∞),
‖x‖∞ = max_{1≤i≤n} |xi|.
Which p-norm we choose to work with depends on the situation. The 1-norm ensures that smaller components still count, whereas the ∞-norm measures only the largest component. For example, for x = (100, 1, . . . , 1)^T ∈ R^{202}, the ∞-norm only measures the entry 100, whereas the 1-norm also counts the 201 smaller entries.
When measuring the difference between the true solution x to Ax = b and the computed solution x̂, the 1-norm and 2-norm represent an average error over all components, whereas the ∞-norm represents the maximum error over all components.
In general, norms have the following properties.
Given the p-norm on R^n, we can define a norm on R^{m×n} by the induced norm
‖A‖p = max_{x≠0} ‖Ax‖p / ‖x‖p,  for all A ∈ R^{m×n}.
The size of a matrix A is hence measured by how much the application of A to a vector x can change the size of the vector, measured in the p-norm.
The maximum in the expression above can be difficult to compute explicitly, but this is possible
in particular cases.
Theorem 1.5. a) The matrix norm induced by the ∞-norm is the maximum row sum:
‖A‖∞ = max_{1≤i≤m} Σ_{j=1}^n |aij|.
b) The matrix norm induced by the 1-norm is the maximum column sum:
‖A‖1 = max_{1≤j≤n} Σ_{i=1}^m |aij|.
c) The matrix norm induced by the 2-norm is the square root of the spectral radius of A^T A:
‖A‖2 = √(ρ(A^T A)),  where ρ(A^T A) = max{|λ| : λ is an eigenvalue of A^T A}.
Part a) is proved in Workshop 2, the proof of part b) is similar. Part c) is proved in Workshop
10.
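The three formulas in Theorem 1.5 can be checked against numpy's built-in matrix norms; a small sketch on a made-up matrix:
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  0.0, -1.0]])

inf_norm = np.abs(A).sum(axis=1).max()                  # maximum row sum
one_norm = np.abs(A).sum(axis=0).max()                  # maximum column sum
two_norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())   # sqrt of spectral radius of A^T A

print(inf_norm, np.linalg.norm(A, np.inf))
print(one_norm, np.linalg.norm(A, 1))
print(two_norm, np.linalg.norm(A, 2))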
Ax = b (SLE)
The main idea behind modern algorithms to solve (SLE) is to factorise A = MN, where M and
N are simpler matrices, and then solve
My = b
Nx = y.
This gives the correct solution, since Ax = MNx = My = b. Here, M and N being simpler
matrices means that the two systems of linear equations above are easy to solve. The next sections
give examples of different choices for M and N.
Example. Let
A = [ 2 1 0 ].
    [ 4 3 3 ]
    [ 6 7 8 ]
We want to put A into row echelon form. We create zeros below the diagonal in the first column by subtracting multiples of the first row from the other rows:
L1 A = [  1 0 0 ] [ 2 1 0 ]   [ 2 1 0 ]
       [ −2 1 0 ] [ 4 3 3 ] = [ 0 1 3 ].
       [ −3 0 1 ] [ 6 7 8 ]   [ 0 4 8 ]
To quantify the structure of the matrices L and U computed in the above example, recall the
following definitions.
A unit diagonal, upper triangular, or lower triangular matrix B = (bij) has all diagonal entries equal to 1, i.e. bii = 1 for all i.
The matrix U from the example is upper triangular. Lemma 2.2 shows that the matrix L = (L2 L1 )−1
is unit lower triangular.
Lemma 2.2. (a) Let L = (lij ) be unit lower triangular with non-zero entries below the diagonal
only in column k. Then L−1 is also unit lower triangular with non-zero entries below the
diagonal only in column k, and (L−1 )ik = −lik , for all i > k.
(b) Let A = (aij ) and B = (bij ) be unit lower triangular n × n matrices, where A has non-zero
entries below the diagonal only in columns 1 to k, and B has non-zero entries below the
diagonal only in columns k + 1 to n. Then AB is unit lower triangular, with (AB)ij = aij for j ≤ k, and (AB)ij = bij for j ≥ k + 1.
Proof. (a) Multiplying L with the suggested inverse gives the identity.
Note that in the example above, Lemma 2.2 allowed us to compute the matrix L without performing any arithmetic operations. The inversion of the individual matrices L1 and L2 only requires flipping a sign by Lemma 2.2 (a), and the multiplication of L1^{−1} and L2^{−1} only requires a concatenation of the entries in L1^{−1} and L2^{−1} by Lemma 2.2 (b). Also note that the entries of L are the so-called multipliers: ljk = multiple of row k subtracted from row j to create zeros below the diagonal in column k.
For a general matrix A ∈ R^{n×n}, Gaussian elimination will result in A = (∏_{k=1}^{n−1} Lk^{−1}) U =: LU. The matrix U is by construction upper triangular. Each matrix Lk represents the elementary row operations of subtracting a multiple of row k from rows k + 1, . . . , n. Hence it is unit lower triangular with non-zero entries below the diagonal only in column k, and Lemma 2.2 applies.
Gaussian elimination computes an LU factorisation of A.
Algorithm LU LU factorisation
Input: A ∈ Rn×n
Output: L, U ∈ Rn×n , the LU factorisation of A
1: L = I, U = A
2: for k = 1, . . . , n − 1 do
3: for j = k + 1, . . . , n do
4: ljk = ujk /ukk
5: (ujk , . . . , ujn ) = (ujk , . . . , ujn ) − ljk (ukk , . . . , ukn )
6: end for
7: end for
Line LU.5 subtracts a multiple of row k from row j, resulting in ujk = 0. After iteration k, U contains
zeros below the diagonal in columns 1 to k. Hence, they do not need to be included in line LU.5.
Lemma 2.2 shows that L is computed correctly.
Algorithm LU shows that an LU factorisation of A exists, provided ukk 6= 0 in line LU.4, for
k = 1, . . . , n − 1. In other words, an LU factorisation of A ∈ Rn×n exists if A is non-singular and
can be put into row echelon form without performing any row swaps. If an LU factorisation exists,
it is unique.
If you would like to see a detailed walk through of another example of an LU factorisation, you
can watch the video at https://www.youtube.com/watch?v=HS7RadfcoFk.
We now have the factorisation A = LU, so we can solve (SLE) by solving
Ly = b
Ux = y.
Since L (resp. U) is lower (resp. upper) triangular, this can be done by forward (resp. back)
substitution.
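As a sketch of this factorise-then-solve strategy, the following Python code implements Algorithm LU (without pivoting, so it assumes all pivots ukk are non-zero) together with forward and back substitution; the right hand side b is made up.
import numpy as np

def lu(A):
    # LU factorisation without pivoting (along the lines of Algorithm LU)
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float)
    for k in range(n - 1):
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]                   # multiplier l_jk
            U[j, k:] = U[j, k:] - L[j, k] * U[k, k:]      # subtract l_jk * row k
    return L, U

def forward_sub(L, b):
    # solve Ly = b for unit lower triangular L
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    # solve Ux = y for upper triangular U
    n = len(y)
    x = np.zeros_like(y, dtype=float)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2.0, 1.0, 0.0], [4.0, 3.0, 3.0], [6.0, 7.0, 8.0]])
b = np.array([1.0, 2.0, 3.0])                             # made-up right hand side
L, U = lu(A)
x = back_sub(U, forward_sub(L, b))
print(np.allclose(A @ x, b))                              # True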
Computational cost
Lemma 2.4. The LU factorisation algorithm has computational cost
C(n) = (2/3)n³ + (1/2)n² − (7/6)n.
Proof. We count the number of FLOPs. Line LU.5 requires (n − k + 1) multiplications and (n − k + 1) subtractions. Line LU.4 requires 1 division. The inner loop (over j) hence needs
Σ_{j=k+1}^n (1 + 2(n − k + 1)) = (n − k) · (1 + 2(n − k + 1)) operations,
where (n − k) is the number of iterations and 1 + 2(n − k + 1) the number of FLOPs per iteration.
A similar argument shows that the computational cost of forward substitution is C(n) = n2 .
Error analysis
Algorithm GE is an unstable algorithm, meaning that the finite precision εm of floating point arith-
metic can have a disastrous effect on the computed solution. The following example illustrates this.
Example. Let
A = [ 10^−20  1 ].
    [ 1       1 ]
Finding an LU factorisation of A:
L1 A = [ 1       0 ] [ 10^−20  1 ]   [ 10^−20  1         ]
       [ −10^20  1 ] [ 1       1 ] = [ 0       1 − 10^20 ]
⇒  L = [ 1      0 ],  U = [ 10^−20  1         ].
       [ 10^20  1 ]       [ 0       1 − 10^20 ]
If fl(1 − 10^20) = −10^20, the matrices will be stored as
L̂ = L,  Û = [ 10^−20  1       ].
             [ 0       −10^20 ]
Then
L̂Û = [ 10^−20  1 ] = A + [ 0  0  ],
      [ 1       0 ]       [ 0  −1 ]
and the solution x̂ computed using L̂ and Û will be completely different to the true solution x.
Definition 2.7. A matrix P ∈ Rn×n is called a permutation matrix if every row and every
column of P contains n − 1 zeros and 1 one.
For A ∈ R^{n×n}, multiplying by a permutation matrix P on the left permutes the rows of A (and multiplying on the right permutes its columns). In the example above, swapping the two rows of A before eliminating avoids the division by the tiny pivot 10^−20, and the error in the computed LU factorisation of the permuted matrix A′ is much smaller compared to the error in the LU factorisation of A.
• GEPP (Gaussian elimination with partial pivoting) — swap rows only to maximise |ukk | in
line LU.4
• GECP (Gaussian elimination with complete pivoting) — swap rows and columns to maximise
|ukk | in line LU.4
By maximising |ukk | in line LU.4, we avoid dividing by very small numbers and hence avoid
very large numbers in the computation of the LU factorisation. In the above example, we avoid
computations with 1020 , and hence minimise the effect of rounding errors.
The general idea of Algorithm LUPP below is to compute an LU factorisation of a permuted
matrix PA. However, a suitable choice of the permutation matrix P is typically not known a-priori,
and needs to be determined as part of the algorithm.
Example. Let
A = [ 2 1 0 ].
    [ 4 3 3 ]
    [ 6 7 8 ]
Last week we saw that the LU factorisation of A is
L = [ 1 0 0 ],  U = [ 2 1  0 ].
    [ 2 1 0 ]       [ 0 1  3 ]
    [ 3 4 1 ]       [ 0 0 −4 ]
U = A, L = I, P = I
k = 1:
i = 3, since |u31| = max{|u11|, |u21|, |u31|}
Swap rows 1 and 3 of U:
U = [ 6 7 8 ]
    [ 4 3 3 ]
    [ 2 1 0 ]
Swap rows 1 and 3 of P:
P = [ 0 0 1 ]
    [ 0 1 0 ]
    [ 1 0 0 ]
No rows to swap in L, since we haven't filled in any entries yet
l21 = u21/u11 = 2/3,
(u21, u22, u23) = (u21, u22, u23) − l21 (u11, u12, u13) = (0, −5/3, −7/3),
l31 = u31/u11 = 1/3,
(u31, u32, u33) = (u31, u32, u33) − l31 (u11, u12, u13) = (0, −4/3, −8/3),
U = [ 6  7    8   ]      L = [ 1   0 0 ]      P = [ 0 0 1 ]
    [ 0 −5/3 −7/3 ],         [ 2/3 1 0 ],         [ 0 1 0 ]
    [ 0 −4/3 −8/3 ]          [ 1/3 0 1 ]          [ 1 0 0 ]
k = 2:
i = 2, since |u22| = max{|u22|, |u32|}
No swaps required
l32 = u32/u22 = 4/5,
(u32, u33) = (u32, u33) − l32 (u22, u23) = (0, −4/5),
U = [ 6  7    8   ]      L = [ 1   0   0 ]      P = [ 0 0 1 ]
    [ 0 −5/3 −7/3 ],         [ 2/3 1   0 ],         [ 0 1 0 ].
    [ 0  0   −4/5 ]          [ 1/3 4/5 1 ]          [ 1 0 0 ]
Computational cost
We assume the exchange of rows carries no cost. The computational cost of Algorithm GEPP is the
computational cost of Algorithm GE, plus the additional cost for choosing the pivot i in line LUPP.3.
This does not require any FLOPs, but it does require the comparison of different numbers in order
to find the maximum. Let us assign unit cost to the comparison of two numbers.
For partial pivoting, choosing i involves finding the maximum of (n − k + 1) numbers. This can
be done using the following algorithm.
Algorithm Max involves m − 1 comparisons for finding the maximum of m numbers, and the total
cost of choosing the pivots i in Algorithm LUPP is hence:
Σ_{k=1}^{n−1} (n − k) = Σ_{l=1}^{n−1} l = n(n − 1)/2 = O(n²).
Error analysis
In comparison to Algorithm GE, Algorithm GEPP is stable: the computed solution x̂ satisfies (A + ∆A)x̂ = b + ∆b, with
‖∆A‖p/‖A‖p + ‖∆b‖p/‖b‖p ≤ α εm,
for a moderate constant α > 0.
Theorem 2.9. Let x̂ be the computed solution to (SLE) by Algorithm GEPP. Then
(A + ∆A) x̂ = b,
The bound on ‖∆A‖∞ is sharp for some matrices A; however, for most matrices the bound is quite pessimistic. Recent research suggests that ‖∆A‖∞ ≈ √n εm ‖A‖∞ for most matrices A ∈ R^{n×n}.
Complete pivoting improves on the bound, but the moderate increase in accuracy is usually not worth
the increased computational cost. In practice, Algorithm GEPP mostly works well, even for large n.
The built-in Python solver, numpy.linalg.solve, uses Algorithm GEPP.
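A short usage sketch of the built-in solver on a made-up system, together with the relative residual of the computed solution:
import numpy as np

A = np.array([[2.0, 1.0, 0.0], [4.0, 3.0, 3.0], [6.0, 7.0, 8.0]])
b = np.array([3.0, 10.0, 21.0])

x = np.linalg.solve(A, b)             # solves Ax = b via GEPP (LU with partial pivoting)
r = b - A @ x                         # residual of the computed solution
print(x)
print(np.linalg.norm(r, np.inf) / np.linalg.norm(b, np.inf))   # relative residual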
The proof of Theorem 2.9 can be found in Theorem 9.5 in the book Accuracy and Stability of
Numerical Algorithms, by N.J. Higham. It is based on Assumption 1.1, and involves going through
each operation performed in Algorithm GEPP, accumulating all rounding errors.
Given backward stability, what can we say about the error x − x̂? For this, we need the notion
of condition number.
Definition 2.10. The condition number κp(A) of A ∈ R^{n×n} with respect to the p-norm ‖·‖p is
κp(A) = ‖A‖p ‖A^{−1}‖p if A is invertible, and κp(A) = ∞ otherwise.
The matrix A is ill-conditioned if κp (A) is large.
The condition number κp(A) is a measure of how close the matrix A is to being singular, cf. Theorem 2.12. What is "large" for κp(A) is not precisely defined, and depends on the context. Workshop 4 shows κp(A) ≥ 1 for any matrix A ∈ R^{n×n}, and any matrix with condition number close to 1, say κp(A) ≤ 10^4, is considered well-conditioned. A matrix is definitely ill-conditioned once κp(A) ≈ εm^{−1}, say κp(A) ≥ 10^{12}.
Theorem 2.11. Let Ax = b and (A + ∆A)x̂ = b. If κp(A) ‖∆A‖p/‖A‖p < 1, then
‖x − x̂‖p / ‖x‖p ≤ κp(A) / (1 − κp(A) ‖∆A‖p/‖A‖p) · ‖∆A‖p/‖A‖p.
The above theorem shows that for the relative error ‖x − x̂‖p/‖x‖p to be small, we need:
a) the relative size of the perturbation ‖∆A‖p/‖A‖p to be small, i.e. the algorithm to have good stability properties;
b) the condition number κp(A) to be not too large, i.e. the problem to be well-conditioned.
A combination of Theorems 2.9 and 2.11 gives a bound on the error in the solution x̂ computed by Algorithm GEPP. Note that the theorem gives a bound on the relative error ‖x − x̂‖p/‖x‖p, which is much more informative than a bound on the actual error ‖x − x̂‖p. To see why this is the case, consider the following two examples:
Example 1: x = (10^−6, 10^−6)^T, x̂ = (10^−6, −10^−3)^T:  ‖x − x̂‖∞ = 1.001 × 10^−3,  ‖x − x̂‖∞/‖x‖∞ = 1.001 × 10^3.
Example 2: x = (1, 1)^T, x̂ = (1, 1.001)^T:  ‖x − x̂‖∞ = 10^−3,  ‖x − x̂‖∞/‖x‖∞ = 10^−3.
As example 1 shows, a small actual error does not necessarily mean that the solution has been
computed accurately - the second component is off by orders of magnitude. We always want to
measure the relative error, ie the size of the error compared to the size of the solution x.
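A small numerical illustration of the role of the condition number: for an ill-conditioned matrix, a tiny perturbation of A produces a large relative error in the solution. The matrices and the perturbation below are made up.
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])     # nearly singular: cond ~ 4e10
b = np.array([2.0, 2.0 + 1e-10])                   # exact solution x = (1, 1)
x = np.array([1.0, 1.0])

dA = 1e-8 * np.array([[0.0, 0.0], [0.0, 1.0]])     # tiny relative perturbation of A
x_hat = np.linalg.solve(A + dA, b)

print(np.linalg.cond(A, 2))
print(np.linalg.norm(x - x_hat, 2) / np.linalg.norm(x, 2))   # large relative error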
To provide some more intuition why the condition number measures how close the matrix A is
to being singular, consider the following result.
Theorem 2.12. Let A ∈ R^{n×n}, with real eigenvalues |λ1| ≥ |λ2| ≥ . . . ≥ |λn| and corresponding real eigenvectors x1, . . . , xn. If x1, . . . , xn form an orthonormal set, then the condition number of A in the 2-norm is
κ2(A) = ‖A‖2 ‖A^{−1}‖2 = |λ1|/|λn| ≥ 1.
Proof. If A is singular, then λn = 0, and |λ1|/|λn| = ∞. This is consistent with Definition 2.10, where singular matrices are assigned condition number ∞.
We will prove the result by proving kAk2 = |λ1 | and kA−1 k2 = |λn |−1 . We start with the
former.
For x ∈ Rn \ {0}, let c ∈ Rn be such that
x = c1 x 1 + . . . + cn x n .
Then
Ax = A (c1 x1 + · · · + cn xn )
= c1 Ax1 + . . . + cn Axn
= c1 λ1 x1 + . . . + cn λn xn .
We have
‖x‖2² = x^T x = (c1 x1 + . . . + cn xn)^T (c1 x1 + . . . + cn xn) = c1² + . . . + cn² = Σ_{i=1}^n ci²,
since x1, . . . , xn form an orthonormal set, and similarly ‖Ax‖2² = Σ_{i=1}^n λi² ci².
Hence,
‖Ax‖2 / ‖x‖2 = (Σ_{i=1}^n λi² ci²)^{1/2} / (Σ_{i=1}^n ci²)^{1/2} ≤ |λ1| (Σ_{i=1}^n ci²)^{1/2} / (Σ_{i=1}^n ci²)^{1/2} = |λ1|,
which implies
‖A‖2 = max_{x≠0} ‖Ax‖2/‖x‖2 ≤ |λ1|.
We now show that ‖A‖2 ≥ |λ1|:
‖A‖2 = max_{x≠0} ‖Ax‖2/‖x‖2 ≥ ‖Ax1‖2/‖x1‖2 = ‖λ1 x1‖2/‖x1‖2 = |λ1| ‖x1‖2/‖x1‖2 = |λ1|.
The assumptions of Theorem 2.12 are satisfied for example by symmetric matrices. The result
shows that the closer the smallest eigenvalue of A is to zero, the larger the condition number of A.
Since a zero eigenvalue is equivalent to the matrix being singular, ill-conditioned matrices are close
to being singular.
Recall that an orthogonal matrix Q ∈ Rn×n is a matrix that satisfies QT Q = I, or in other words
Q−1 = QT . We then have the following algorithm.
A QR factorisation exists for all non-singular A, and can be computed using Gram-Schmidt
orthonormalisation. First, note that the definition QT Q = I of an orthogonal matrix Q is equivalent
to the columns of Q forming an orthonormal set. To see this, denote by q1 , . . . , qn the columns of
Q. Then
Q^T Q = I  ⇔  (Q^T Q)ij = 1 for i = j and 0 for i ≠ j  ⇔  qi^T qj = 1 for i = j and 0 for i ≠ j.
For a unit vector u, the projection from Definition 0.7 simplifies to proju(x) = (u^T x) u.
The columns of Q form an orthonormal basis of R^n. They are constructed from the columns of A in the following way:
1. To construct q1, we normalise a1 to unit length: q1 = a1/‖a1‖2.
2. To construct q2, we start with a2, subtract its orthogonal projection onto q1, and normalise to unit length:
q̃2 = a2 − (a2^T q1) q1,
q2 = q̃2 / ‖q̃2‖2.
Subsequent columns qj are constructed in the same way, by subtracting from aj its projections onto q1, . . . , q_{j−1} and normalising. The construction only breaks down if some rjj = ‖q̃j‖2 is zero, but
rjj = 0 ⇒ q̃j = 0 ⇒ a1, . . . , aj are linearly dependent ⇒ A is singular.
Example. Consider the matrix A = [ 1 1 ]. Using Algorithm GS, we have
                                 [ 1 0 ]
j = 1:
r11 = ‖a1‖2 = √(1 + 1) = √2,
q1 = a1/r11 = (1/√2) (1, 1)^T,
j = 2:
r12 = q1^T a2 = 1/√2,
q̃2 = a2 − r12 q1 = (1/2) (1, −1)^T,
r22 = ‖q̃2‖2 = (1/2)√(1 + 1) = 1/√2,
q2 = q̃2/r22 = (1/√2) (1, −1)^T.
Hence, A = QR, with Q = (1/√2) [ 1  1 ] and R = [ √2 1/√2 ] = (1/√2) [ 2 1 ].
                               [ 1 −1 ]         [ 0  1/√2 ]          [ 0 1 ]
Computational cost
Theorem 2.15. The computational cost of Algorithm GS is
C(n) = 2n³ + n² + n.
Proof. Line GS.5 requires 2n operations using Algorithm DP, and Line GS.6 requires n multiplications and n subtractions. Summing over the inner loop:
Σ_{k=1}^{j−1} 4n = 4n(j − 1).
Line GS.8 requires 1 square root and 2n operations to calculate the dot product. Line GS.9 requires n divisions. Summing over the outer loop:
Σ_{j=1}^n (4n(j − 1) + 1 + 2n + n) = 4n Σ_{j=1}^n (j − 1) + (3n + 1)n
  = 4n Σ_{i=0}^{n−1} i + (3n + 1)n
  = 4n (n − 1)n/2 + 3n² + n
  = 2n³ + n² + n.
Solving the system Qy = b in step 2 of Algorithm SQR is done in 2n2 operations using Algo-
rithm MV.
Solving the system Rx = y in step 3 of Algorithm SQR is done in n2 operations using Algo-
rithm BS.
Theorem 2.16. The computational cost of Algorithm SQR is
Error analysis
Algorithm GS is known to produce large errors. For the computed QR factorisation Q̂R̂, we have
Q̂T Q̂ = I + ∆I , Q̂R̂ = A + ∆A
where ∆I, ∆A ∈ Rn×n are typically very large.
Algorithm GS can be stabilised by a small modification – replace line GS.5 by
4′: rkj = qk^T qj
Denote by Algorithm MGS (Modified Gram-Schmidt) Algorithm GS with line 4′ instead of GS.5.
In exact arithmetic, the two algorithms compute the same matrices Q and R. This can be seen from the following calculation: at iteration j in the outer loop and iteration k in the inner loop, we have in exact arithmetic
rkj^MGS = qk^T qj = qk^T ( aj − Σ_{l=1}^{k−1} rlj^MGS ql ) = qk^T aj − Σ_{l=1}^{k−1} rlj^MGS qk^T ql = qk^T aj = rkj^GS,
since qk^T ql = 0 for l ≠ k, as {q1, . . . , q_{j−1}} is an orthonormal set.
In floating point arithmetic, the computed vectors {q̂1 , . . . , q̂j−1 } will not be orthonormal. Al-
gorithm MGS takes this into account when computing q̂j ; Algorithm GS does not. This small
change makes a big difference in practice, and we have the following stability result for Algo-
rithm MGS.
Theorem 2.17. Denote by Q̂ and R̂ the QR factorisation of A computed by Algorithm MGS. Then
Q̂^T Q̂ = I + ∆I,  Q̂R̂ = A + ∆A,
where
‖∆I‖2 ≤ α1(n) εm κ2(A) / (1 − α1(n) εm κ2(A)),  if 1 − α1(n) εm κ2(A) > 0,
‖∆A‖2 ≤ α2(n) εm ‖A‖2.
Here, α1 (n) and α2 (n) are constants depending on n; their values increase with increasing n.
The proof of Theorem 2.17 can be found in Theorem 19.13 in the book Accuracy and Stability of
Numerical Algorithms, by N.J. Higham. It is based on Assumption 1.1, and involves going through
each operation performed in Algorithm MGS, accumulating all rounding errors.
The following example also demonstrates the improved stability properties of Algorithm MGS.
Example. Consider the matrix
A = [ 1 1 1 ].
    [ ε 0 0 ]
    [ 0 ε 0 ]
Suppose ε is small, such that fl(1 + ε²) = 1. Using Algorithm GS, we have
j = 1:
r11 = ‖a1‖2 = √(1 + ε²) ≈ 1 = r̂11,
q̂1 = a1/r̂11 = a1 = (1, ε, 0)^T,
j = 2:
r̂12 = q̂1^T a2 = 1,
q̃2 = a2 − r̂12 q̂1 = (0, −ε, ε)^T,
r̂22 = ‖q̃2‖2 = √(ε² + ε²) = √2 ε,
q̂2 = q̃2/r̂22 = (1/√2) (0, −1, 1)^T,
j = 3:
r̂13 = q̂1^T a3 = 1,
r̂23 = q̂2^T a3 = 0,
q̃3 = a3 − r̂13 q̂1 − r̂23 q̂2 = (0, −ε, 0)^T,
r̂33 = ‖q̃3‖2 = √(ε²) = ε,
q̂3 = q̃3/r̂33 = (0, −1, 0)^T.
So there is a significant loss of orthogonality between q̂2 and q̂3 , and ∆I is large. For
Algorithm MGS, on the other hand, we have
j = 1:
r11 = ‖a1‖2 = √(1 + ε²) ≈ 1 = r̂11,
q̂1 = a1/r̂11 = a1 = (1, ε, 0)^T,
j = 2:
r̂12 = q̂1^T a2 = 1,
q̃2 = a2 − r̂12 q̂1 = (0, −ε, ε)^T,
r̂22 = ‖q̃2‖2 = √(ε² + ε²) = √2 ε,
q̂2 = q̃2/r̂22 = (1/√2) (0, −1, 1)^T,
j = 3:
r̂13 = q̂1^T a3 = 1,
r̂23 = q̂2^T (a3 − r̂13 q̂1) = ε/√2,
q̃3 = a3 − r̂13 q̂1 − r̂23 q̂2 = (0, −ε/2, −ε/2)^T,
r̂33 = ‖q̃3‖2 = √(ε²/4 + ε²/4) = ε/√2,
q̂3 = q̃3/r̂33 = (1/√2) (0, −1, −1)^T.
Now q̂2^T q̂3 = 0, so the orthogonality between q̂2 and q̂3 is preserved.
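The loss of orthogonality can be reproduced numerically. The sketch below is a simplified implementation of both update rules (classical GS projects the original column aj, modified GS projects the current working vector), applied to the ε-matrix from the example; it illustrates the idea rather than reproducing the exact pseudocode of Algorithms GS and MGS.
import numpy as np

def gram_schmidt(A, modified=False):
    # QR factorisation by classical or modified Gram-Schmidt
    n = A.shape[1]
    Q = A.astype(float).copy()
    R = np.zeros((n, n))
    for j in range(n):
        for k in range(j):
            # classical GS uses the original column a_j,
            # modified GS uses the partially orthogonalised column
            R[k, j] = Q[:, k] @ (Q[:, j] if modified else A[:, j])
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] = Q[:, j] / R[j, j]
    return Q, R

eps = 1e-8                       # small enough that fl(1 + eps^2) = 1
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0]])

for modified in (False, True):
    Q, R = gram_schmidt(A, modified)
    loss = np.linalg.norm(Q.T @ Q - np.eye(3))
    print("MGS" if modified else "GS ", "loss of orthogonality:", loss)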
In line QR.4, e1 denotes the vector with a 1 in its first entry, and all other entries equal to 0.
Furthermore, sign(u1 ) = 1 if u1 > 0 and sign(u1 ) = −1 otherwise.
The matrix Hk is called a Householder reflection matrix. Applying Hk to a vector x reflects
the vector x about the hyperplane that is orthogonal to w:
Hk x = x − 2wwT x = x − 2 projw (x).
projw (x) is the projection of x onto w, and x − projw (x) is the projection of x onto the hyperplane
orthogonal to w. Hence, x − 2 projw (x) is the reflection of x about the hyperplane orthogonal to w:
(Figure: the vector x, its projection projw(x) onto w, and its reflection x − 2 projw(x) about the hyperplane orthogonal to w.)
Algorithm QR starts with R = A, and creates zeros below the diagonal in column k at iteration
k. This is achieved by using the Householder reflections Hk .
Lemma 2.18. The matrix Rk = Qk . . . Q1 A has zeros below the diagonal in columns 1 to k, for 1 ≤ k ≤ n − 1.
Proof. For k = 1, let u be the first column of A. Then
Q1 u = (I − 2ww^T) u.
Using the definition of w in lines QR.3–QR.5, a direct calculation (see Workshop 6) shows
Q1 u = − sign(u1) ‖u‖2 e1,
so the first column of R1 = Q1 A has zeros below its first entry. Now suppose Rk−1 has zeros below the diagonal in columns 1 to k − 1, and write
Rk−1 = [ R11  R12 ]
       [ 0    R22 ]
for some R11 ∈ R^{(k−1)×(k−1)} upper triangular, R22 ∈ R^{(n−k+1)×(n−k+1)}. Then, from line QR.7,
Rk = Qk Rk−1 = [ I  0  ] [ R11  R12 ] = [ R11  R12    ].
               [ 0  Hk ] [ 0    R22 ]   [ 0    Hk R22 ]
Now, u in line QR.3 is the first column of R22. As before, a direct calculation shows
Hk u = − sign(u1) ‖u‖2 e1.
Hence, the first column of Hk R22 has zeros everywhere except the first entry, and so the kth column of Rk has zeros below the diagonal.
Hence, Rk has zeros below the diagonal in columns 1 to k if Rk−1 has zeros below the diagonal in columns 1 to k − 1, and the statement of the Lemma holds by induction.
In particular, Lemma 2.18 shows that the output R = Rn−1 = Qn−1 . . . Q1 A is upper triangular.
Lemma 2.19. The matrix Qk in line QR.7 is orthogonal and symmetric, for 1 6 k 6 n − 1.
The matrix Q = Q1 · · · Qn−1 returned by Algorithm QR is then also orthogonal.
Proof. We have Q = IQ1 . . . Qn−1 . Since the product of orthogonal matrices is orthogonal, this
follows from Lemma 2.19.
k = 1:
u = a1 = (1, 1)^T
v = u + sign(u1)‖u‖2 e1 = (1, 1)^T + (√2, 0)^T = (1 + √2, 1)^T
w = v/‖v‖2 = (1/√((1 + √2)² + 1)) (1 + √2, 1)^T = (1/(√2 √(2 + √2))) (1 + √2, 1)^T
Hk = I − 2ww^T = [ 1 0 ] − (2/(2(2 + √2))) [ 3 + 2√2  1 + √2 ] = (1/√2) [ −1 −1 ]
                 [ 0 1 ]                   [ 1 + √2   1      ]          [ −1  1 ]
R = Hk A = (1/√2) [ −1 −1 ] [ 1 1 ] = (1/√2) [ −2 −1 ]
                  [ −1  1 ] [ 1 0 ]          [  0 −1 ]
Q = I Hk = Hk = (1/√2) [ −1 −1 ]
                       [ −1  1 ]
Hence, we have Q = (1/√2) [ −1 −1 ] and R = (1/√2) [ −2 −1 ]. Note that up to a change in sign,
                          [ −1  1 ]                [  0 −1 ]
this QR factorisation is the same as the one computed in the previous section using Algorithm GS. In fact the QR factorisation of a matrix can be shown to be unique up to changes in sign, and the QR factorisation can be made unique by requiring all diagonal entries of R to be positive.
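A compact Python sketch of QR factorisation by Householder reflections, along the lines of Algorithm QR (simplified, and assuming the relevant columns are non-zero); applied to the example matrix above it reproduces the factors up to rounding.
import numpy as np

def householder_qr(A):
    # QR factorisation via Householder reflections (sketch of Algorithm QR)
    n = A.shape[0]
    R = A.astype(float)
    Q = np.eye(n)
    for k in range(n - 1):
        u = R[k:, k].copy()
        s = 1.0 if u[0] > 0 else -1.0          # sign convention from line QR.4
        v = u.copy()
        v[0] += s * np.linalg.norm(u)          # v = u + sign(u1) ||u||_2 e_1
        w = v / np.linalg.norm(v)
        H = np.eye(n - k) - 2.0 * np.outer(w, w)   # reflection about the hyperplane orthogonal to w
        R[k:, k:] = H @ R[k:, k:]              # create zeros below the diagonal in column k
        Q[:, k:] = Q[:, k:] @ H                # accumulate Q = Q_1 ... Q_{n-1}
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0]])
Q, R = householder_qr(A)
print(np.round(Q, 3))
print(np.round(R, 3))
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))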
Computational cost
Algorithm QR should be implemented efficiently. In particular, we do not want to use Algorithm MM
to compute the matrix-matrix products in lines QR.8 and QR.9.
Using the special structure of the matrices Hk and Qk, we can perform lines QR.8 and QR.9 using Algorithms MV and OP. For example, at iteration k, in line QR.8 we have
Qk R = [ I  0  ] [ R11  R12 ] = [ R11  R12    ],
       [ 0  Hk ] [ 0    R22 ]   [ 0    Hk R22 ]
with Hk R22 = R22 − 2ww^T R22, and we can compute z^T = w^T R22 using Algorithm MV, 2w using n − k + 1 multiplications and 2wz^T using Algorithm OP. Together with the (n − k + 1)² subtractions to compute R22 − 2wz^T, this gives a total cost of 4(n − k + 1)² + (n − k + 1) for line QR.8. Note that this is much smaller than the cost of 2n³ we would get by applying Algorithm MM to compute Qk R directly!
Algorithm QR has computational cost C(n) = (8/3)n³ + O(n²), so it is more expensive than Algorithms GS and MGS.
For Algorithm SQR, we do not need the matrix Q explicitly, but rather only the product Q^T b. Thus, we can modify the algorithm, and initialise y = b in line 1 and replace line QR.9 of Algorithm QR with
9′: y = Qk y.
This computes
y = Qn−1 · · · Q1 b = Q_{n−1}^T · · · Q_1^T b = (Q1 · · · Qn−1)^T b = Q^T b.
This reduces the cost of Algorithm QR to C(n) = (4/3)n³ + O(n²). This can be implemented efficiently using Algorithm DP as in Workshop 2 Exercise 1.
Error analysis
Algorithm SQR has better stability properties than Algorithm GEPP, which outweighs the larger
cost.
Theorem 2.21. Let Ax = b, and denote by x̂ the solution computed through Algorithm SQR
with a Householder QR factorisation. Then
(A + ∆A) x̂ = b,
The proof of Theorem 2.21 can be found in Theorem 19.5 in the book Accuracy and Stability of
Numerical Algorithms, by N.J. Higham. It is based on Assumption 1.1, and involves going through
each operation performed in Algorithm QR, accumulating all rounding errors.
Compared to the bounds in Theorem 2.9 for Algorithm GEPP, the size of the perturbation ∆A
only depends polynomially on n. Theorem 2.11, together with Theorem 2.21, can be used to bound
the error in the computed solution x
b.
Different choices for M and N result in different particular methods, and we will see two examples
in the next sections.
Since the error ek = x − xk is not computable in practice, we stop the iteration when the residual
rk = b − Axk is sufficiently small. Note that since SLE has unique solution, krk kp = 0 ⇔ xk = x.
Typical choices for the starting guess x0 include the zero vector and random vectors. Typical choices for the tolerance εr are for example 10^−3 ‖b‖p or 10^−6 ‖b‖p. Scaling the tolerance by ‖b‖p means that we are stopping the iteration when the relative residual ‖Axk − b‖p/‖b‖p is small enough.
Writing A = M + N, the exact solution x of (SLE) satisfies
Mx = b − Nx,
and the iterates satisfy
Mxk = b − Nxk−1.
Subtracting, the error ek = x − xk hence satisfies
Mek = −Nek−1,  i.e.  ek = (−M^{−1}N)ek−1 = (−M^{−1}N)^k e0 =: R^k e0.
Theorem 3.1. Let ek = Rk e0 , for k ∈ N and some R ∈ Rn×n . If kRkp < 1, then kek kp → 0 as
k → ∞.
Proof. Using the properties ‖By‖p ≤ ‖B‖p ‖y‖p and ‖B^k‖p ≤ ‖B‖p^k, for any B ∈ R^{n×n} and y ∈ R^n, we have
‖ek‖p = ‖R^k e0‖p ≤ ‖R^k‖p ‖e0‖p ≤ ‖R‖p^k ‖e0‖p → 0 as k → ∞,
since ‖R‖p < 1.
The system in line 5 can be solved efficiently using a diagonal solver: (xk)i = (yk)i / dii, for i = 1, . . . , n.
Example. Let A = [ 2 1 ] and b = (3, 3)^T. Consider solving the system Ax = b using the
                 [ 1 2 ]
Jacobi method with x0 = (0, 0)^T.
With M = D = [ 2 0 ] and N = L + U = [ 0 1 ], we then have
             [ 0 2 ]                 [ 1 0 ]
k = 1:
y1 = b − (L + U)x0 = (3, 3)^T − (0, 0)^T = (3, 3)^T,
Dx1 = y1  ⇒  x1 = (3/2, 3/2)^T.
k = 2:
y2 = b − (L + U)x1 = (3, 3)^T − (3/2, 3/2)^T = (3/2, 3/2)^T,
Dx2 = y2  ⇒  x2 = (3/4, 3/4)^T.
k = 3:
y3 = b − (L + U)x2 = (3, 3)^T − (3/4, 3/4)^T = (9/4, 9/4)^T,
Dx3 = y3  ⇒  x3 = (9/8, 9/8)^T.
Error analysis
Convergence of the Jacobi method follows from Theorem 3.1, with R = −D−1 (L + U).
The entries of R = −D^{−1}(L + U) are rii = 0 and rij = −aij/aii for j ≠ i. Then
‖R‖∞ = max_{1≤i≤n} Σ_{j=1}^n |rij| = max_{1≤i≤n} Σ_{j≠i} |aij|/|aii| = max_{1≤i≤n} (1/|aii|) Σ_{j≠i} |aij| < 1,
where in the last step we have used that A is strictly diagonally dominant. It then follows from Theorem 3.1 that ‖ek‖∞ → 0 as k → ∞.
Computational cost
The computational cost of the Jacobi method depends on:
The number of iterations kmax depends on the chosen accuracy εr , the norm of R and the starting
guess x0 . See the Python demo video and the computer lab in Week 7 for examples.
For each iteration of the Jacobi method, line JAC.4 requires n subtractions and one matrix-vector
multiplication with cost 2n2 , which is a total of 2n2 + n operations. Line JAC.5 requires n divisions.
So, in total, one iteration requires 2n2 +2n = O(n2 ) operations. The cost of Jacobi is then O(kmax n2 ),
which is typically much smaller than O(n3 ) for large n.
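A minimal sketch of the Jacobi iteration with the relative-residual stopping rule discussed above; the tolerance and starting guess are illustrative choices, and the test system is the 2 × 2 example used earlier.
import numpy as np

def jacobi(A, b, x0=None, tol=1e-6, kmax=1000):
    # Jacobi iteration: M = D, N = L + U, solve D x_k = b - (L + U) x_{k-1}
    n = len(b)
    d = np.diag(A)                       # diagonal of A
    N = A - np.diag(d)                   # off-diagonal part L + U
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(1, kmax + 1):
        y = b - N @ x                    # right hand side of the diagonal system
        x = y / d                        # diagonal solve
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k                  # converged: small relative residual
    return x, kmax

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 3.0])
x, k = jacobi(A, b)
print(x, "after", k, "iterations")       # approaches the true solution (1, 1)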
The matrix L + D is lower triangular, so the system in line 5 can be solved efficiently using
Algorithm FS.
Example. (ctd) Let A = [ 2 1 ] and b = (3, 3)^T. Consider solving the system Ax = b using
                       [ 1 2 ]
the Gauss-Seidel method with x0 = (0, 0)^T.
With M = L + D = [ 2 0 ] and N = U = [ 0 1 ], we then have
                 [ 1 2 ]             [ 0 0 ]
k = 1:
y1 = b − Ux0 = (3, 3)^T − (0, 0)^T = (3, 3)^T,
(L + D)x1 = y1  ⇒  x1 = (3/2, (3 − 3/2)/2)^T = (3/2, 3/4)^T.
k = 2:
y2 = b − Ux1 = (3, 3)^T − (3/4, 0)^T = (9/4, 3)^T,
(L + D)x2 = y2  ⇒  x2 = (9/8, (3 − 9/8)/2)^T = (9/8, 15/16)^T.
k = 3:
y3 = b − Ux2 = (3, 3)^T − (15/16, 0)^T = (33/16, 3)^T,
(L + D)x3 = y3  ⇒  x3 = (33/32, (3 − 33/32)/2)^T = (33/32, 63/64)^T.
Are we converging to the right solution? What do you notice about the speed of convergence
compared to the Jacobi method? What do you notice about the size of the errors in the first
and second component?
Error analysis
Theorem 3.3. Suppose A ∈ R^{n×n} is strictly diagonally dominant, i.e.
|aii| > Σ_{j≠i} |aij|,  ∀ i = 1, . . . , n.
Then the Gauss-Seidel method converges, i.e. ‖ek‖∞ → 0 as k → ∞.
Computational cost
The number of iterations kmax again depends on the chosen accuracy εr , the norm of R and the
starting guess x0 . Compared to Jacobi, we typically need fewer iterations, as illustrated by the
example above.
Line GAU.5 can be done using forward substitution, since L + D is lower triangular, and so costs
n2 operations. Line GAU.4 requires n subtractions and one matrix-vector multiplication, which costs
2n2 operations using Algorithm MV. This would give a total of 3n2 + n operations.
However, the matrix U in the matrix-vector product in line GAU.4 is strictly upper triangular, and so Algorithm MV is not efficient in this case. The strictly upper triangular structure of U implies that uij = 0 for j ≤ i, and so we have (Uv)i = Σ_{j=1}^n uij vj = Σ_{j=i+1}^n uij vj for any vector v. Hence, (Uv)i can be computed by applying Algorithm DP to the vectors u_{i,i+1:n} := [u_{i,i+1}, . . . , u_{i,n}]^T ∈ R^{n−i} and v_{i+1:n} := [v_{i+1}, . . . , vn]^T ∈ R^{n−i}. The cost of computing a matrix-vector product Uv is then C(n) = Σ_{i=1}^n 2(n − i) = 2 Σ_{j=0}^{n−1} j = (n − 1)n. This gives a total cost of 2n² operations per iteration of the Gauss-Seidel method.
The costs per iteration of Jacobi and Gauss-Seidel hence both have leading order term 2n2 , and
for large n, the difference between them is negligible. (Note that we can also slightly improve on the
cost per iteration of Jacobi, by noting that the matrix L+U has zeros on the diagonal and modifying
Algorithm MV correspondingly. This saves 1 multiplication and 1 addition per entry of the vector
(L + U)xk , so 2n operations in total. The costs per iteration of Jacobi and Gauss-Seidel are then
the same.)
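A matching sketch of the Gauss-Seidel iteration, with forward substitution for the lower triangular solve in line GAU.5; the tolerance is again an illustrative choice.
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-6, kmax=1000):
    # Gauss-Seidel iteration: M = L + D, N = U, solve (L + D) x_k = b - U x_{k-1}
    n = len(b)
    U = np.triu(A, 1)                    # strictly upper triangular part
    LD = np.tril(A)                      # L + D
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(1, kmax + 1):
        y = b - U @ x
        x_new = np.zeros(n)
        for i in range(n):               # forward substitution for (L + D) x = y
            x_new[i] = (y[i] - LD[i, :i] @ x_new[:i]) / LD[i, i]
        x = x_new
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k
    return x, kmax

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 3.0])
print(gauss_seidel(A, b))                # converges to (1, 1), in fewer iterations than Jacobi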
4 Polynomial Fitting
Suppose we are given a set of points {(ai, bi)}_{i=1}^m, and we want to find the degree m − 1 polynomial that goes through these points. Then we need to determine coefficients x ∈ R^m such that
bi = x1 + x2 ai + x3 ai² + . . . + xm ai^{m−1},  for i = 1, . . . , m.
This corresponds to solving the linear system Ax = b, where b ∈ R^m is the vector containing {bi}_{i=1}^m, and the matrix A ∈ R^{m×m} is the Vandermonde matrix with entries aij = ai^{j−1}:
Ax = b  ⇔  (Ax)i = bi, ∀ i ∈ {1, . . . , m}  ⇔  Σ_{j=1}^m aij xj = bi, ∀ i ∈ {1, . . . , m}.
Vandermonde matrices are very ill-conditioned, especially when the number of points m is large (see
Computer lab 7).
An illustration of polynomial fitting applied to a specific set of observed points {(ai, bi)}_{i=1}^{10} is given in Figure 1. The data points are indicated by circles, and the interpolating degree 9 polynomial is
in Figure 1. The data points are indicated by circles, and the interpolating degree 9 polynomial is
shown in the dotted line. We can see that even though the dotted line does pass through each of the
observed points, it does not seem to be a good indicator of the general trend in the data. Suppose
we want to predict the value of the output b at the input a = 0.95. Do you think the polynomial
gives a reasonable prediction here?
What if we instead want to find the straight line that best fits our observations {(ai, bi)}_{i=1}^m? Let the line have equation b = x1 + x2 a. Then we want to choose x ∈ R² so as to minimise
Σ_{i=1}^m ((x1 + x2 ai) − bi)²,
i.e. we want to minimise the difference between our observed outputs bi and the predicted outputs x1 + x2 ai of the straight line. In terms of Figure 1, this corresponds to minimising the vertical distance between the circles and the straight line.
With the Vandermonde matrix A ∈ R^{m×2} with entries aij = ai^{j−1}, and b ∈ R^m the vector containing {bi}_{i=1}^m, this corresponds to x minimising
‖Ax − b‖2².
Since kAx − bk2 is always non-negative, this is equivalent to minimising kAx − bk2 .
This can be generalised to fitting a polynomial of degree n − 1 to the data points {(ai, bi)}_{i=1}^m, in which case we have the Vandermonde matrix A ∈ R^{m×n}.
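A short numpy sketch of both fits on made-up data: interpolation through all points with the square Vandermonde system, and the least squares straight line via numpy's least squares solver.
import numpy as np

# made-up observations (a_i, b_i)
a = np.linspace(0.0, 1.0, 10)
b = np.sin(2 * np.pi * a) + 0.1 * np.random.randn(10)

# interpolation: square Vandermonde system, A_ij = a_i^{j-1}
A_full = np.vander(a, N=len(a), increasing=True)
coef_interp = np.linalg.solve(A_full, b)          # degree 9 polynomial through all points

# least squares straight line: m x 2 Vandermonde matrix
A_line = np.vander(a, N=2, increasing=True)
coef_line, res, rank, sv = np.linalg.lstsq(A_line, b, rcond=None)

print(np.linalg.cond(A_full))   # Vandermonde matrices are typically very ill-conditioned
print(coef_line)                # [x1, x2]: intercept and slope of the best fitting line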
Theorem 5.1. Let A ∈ Rm×n , b ∈ Rm . A vector x ∈ Rn minimises kAx − bk2 if and only if it
solves the normal equations
AT Ax = AT b.
Proof. ⇒: Suppose x ∈ R^n minimises ‖Ax − b‖2. Let g(x) = ‖Ax − b‖2². Then minimising ‖Ax − b‖2 is equivalent to minimising g. The gradient of g is given by
∇g(x) = ∇(x^T A^T Ax − 2x^T A^T b + b^T b) = 2A^T Ax − 2A^T b.
At a minimiser, the gradient vanishes, so 2A^T Ax − 2A^T b = 0, i.e. A^T Ax = A^T b.
⇐: Suppose x ∈ R^n solves the normal equations, and let y ∈ R^n. Then
‖Ay − b‖2² = ‖(Ax − b) + A(y − x)‖2² = ‖Ax − b‖2² + 2(y − x)^T A^T (Ax − b) + ‖A(y − x)‖2² = ‖Ax − b‖2² + ‖A(y − x)‖2²,
since A^T (Ax − b) = 0. Since ‖Ac‖2² ≥ 0 for all c ∈ R^n, this shows that ‖Ay − b‖2² ≥ ‖Ax − b‖2² for all y ∈ R^n. In other words, x ∈ R^n minimises ‖Ax − b‖2² and hence minimises ‖Ax − b‖2.
In general, the matrix A^T A need not be invertible, and the minimiser of ‖Ax − b‖2 is not unique. If (and only if) we have m ≥ n and rank(A) = n, then the matrix A^T A is invertible, and the normal equations have a unique solution
x = (A^T A)^{−1} A^T b.
The matrix A† = (A^T A)^{−1} A^T ∈ R^{n×m} is in that case called the Moore-Penrose pseudo-inverse of A. If A is invertible, then A† = A^{−1}.
The normal equations approach then consists of two steps:
• Computing A^T A and A^T b. This can be done using Algorithms MM and MV, respectively, at cost 2mn² and 2mn.
• Solving A^T Ax = A^T b. This can be done using Algorithm GEPP, at cost (2/3)n³ + 3n² − (5/3)n.
Since the matrix A^T A is symmetric, we can essentially halve the computational cost of the steps above using efficient implementations. For step 2, we can use the Cholesky factorisation as in Workshop 4 Exercise 4, instead of the LU factorisation, which reduces the cost of solving the system to (1/3)n³ + 3n² + (2/3)n. We can reduce the cost of step 1 by modifying Algorithm MM to only compute the entries (A^T A)ij for i ≤ j, since (A^T A)ji = (A^T A)ij. This means we are computing Σ_{j=1}^n j = n(n + 1)/2 entries of A^T A, each of which involves an application of Algorithm DP at cost 2m, which gives a total cost of mn² + mn to compute A^T A.
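A sketch of the normal equations approach using numpy's Cholesky routine for step 2 (for simplicity the two triangular systems are solved with np.linalg.solve; a dedicated triangular solver would be used in practice). The data are made up.
import numpy as np

def lsq_normal_equations(A, b):
    # solve min ||Ax - b||_2 via A^T A x = A^T b, with A^T A = G G^T (Cholesky)
    G = np.linalg.cholesky(A.T @ A)      # lower triangular G
    y = np.linalg.solve(G, A.T @ b)      # forward substitution G y = A^T b
    return np.linalg.solve(G.T, y)       # back substitution G^T x = y

m, n = 20, 3
A = np.random.rand(m, n)                 # made-up full-rank data matrix
b = np.random.rand(m)
x = lsq_normal_equations(A, b)
print(np.allclose(A.T @ (A @ x), A.T @ b))   # x satisfies the normal equations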
For the solution x̂ computed using Algorithm GEPP to solve the normal equations, an application of Theorem 2.11 gives us an error bound proportional to κp(A^T A). This dependency on κp(A^T A) is undesirable, since it turns out that κp(A^T A) = κp(A)². So even if A is only mildly ill-conditioned, A^T A can be severely ill-conditioned and the accuracy in the computed solution can be very poor.
Above, we have used the condition number κp (A) of the rectangular matrix A, which is defined
analogously to the condition number of square matrices in Definition 2.10.
Definition 5.2. The condition number κp(A) of A ∈ R^{m×n} with respect to the p-norm is
κp(A) = ‖A‖p ‖A†‖p if rank(A) = n, and κp(A) = ∞ otherwise.
We can show that κp (AT A) = κp (A)2 using the singular value decomposition of A.
A = UΣVT
An SVD exists for all matrices A ∈ Rm×n . The singular values σ1 , . . . , σp are uniquely determined
(up to permutation), whereas the singular vectors are not uniquely determined.
If A = UΣVT is an SVD, then
Avi = σi ui , and uTi A = σi viT ,
where ui is the ith column of U and vi is the ith column of V.
Since U and V are invertible, we have
rank(A) = rank(UΣVT ) = rank(Σ).
Since Σ is diagonal, the rank of Σ is equal to the number of non-zero diagonal entries. Any column
with a zero diagonal entry will be equal to the zero vector. For A ∈ Rm×n with m > n, we hence
have
rank(A) = n iff σ1 , . . . , σn > 0,
i.e. A is of full rank iff it does not have a zero singular value.
We have the following equivalent of Theorem 2.12.
Theorem 5.4. Let A ∈ R^{m×n}, with m ≥ n and singular value decomposition A = UΣV^T. Then the condition number of A in the 2-norm is
κ2(A) = ‖A‖2 ‖A†‖2 = σmax/σmin ≥ 1,
where σmax = max_{1≤i≤n} σi and σmin = min_{1≤i≤n} σi.
Proof. If A is not of rank n, then σmin = 0, and σmax/σmin = ∞. This is consistent with Definition 5.2.
We will prove the result by proving ‖A‖2 = σmax and ‖A†‖2 = σmin^{−1}. We start with the former. We first show ‖A‖2 ≤ σmax, and will afterwards show ‖A‖2 ≥ σmax.
By definition of the induced 2-norm, we have
‖A‖2 = max_{x≠0} ‖UΣV^T x‖2 / ‖x‖2.
Since the orthogonal matrix U does not change the 2-norm of a vector,
max_{x≠0} ‖UΣV^T x‖2 / ‖x‖2 = max_{x≠0} ‖ΣV^T x‖2 / ‖x‖2.
Since V^T is also orthogonal, ‖V^T x‖2 = ‖x‖2, and V^T x ranges over all non-zero vectors as x does, so
max_{x≠0} ‖ΣV^T x‖2 / ‖x‖2 = max_{x≠0} ‖ΣV^T x‖2 / ‖V^T x‖2 = max_{y≠0} ‖Σy‖2 / ‖y‖2.
Since Σ is diagonal, ‖Σy‖2 = (Σ_{i=1}^n σi² yi²)^{1/2} ≤ σmax ‖y‖2, which implies
‖A‖2 ≤ σmax.
We now show that ‖A‖2 ≥ σmax. Let ej ∈ R^n be the vector with a 1 in its jth entry and zeros in all other entries, where j is such that σj = σmax. Then
‖A‖2 = max_{y≠0} ‖Ay‖2/‖y‖2 = max_{y≠0} ‖Σy‖2/‖y‖2 ≥ ‖Σej‖2/‖ej‖2 = σmax.
Using the singular value decomposition, we can prove κ2 (AT A) = κ2 (A)2 (see Workshop 8), and
so the error in the solution x̂ computed by solving the normal equations with Algorithm GEPP can
be very large. We want to avoid working with the matrix AT A, and work instead directly with A.
If A = Q1 R1 is the reduced QR factorisation of A, y = Q1^T b, and x solves R1 x = y, then
A^T Ax = R1^T Q1^T Q1 R1 x = R1^T R1 x = R1^T y = R1^T Q1^T b = A^T b,
so x solves the normal equations and hence minimises ‖Ax − b‖2.
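A minimal sketch of this approach using numpy's reduced QR factorisation, on made-up data:
import numpy as np

def lsq_qr(A, b):
    # solve min ||Ax - b||_2 via the reduced QR factorisation A = Q1 R1
    Q1, R1 = np.linalg.qr(A, mode='reduced')   # Q1 is m x n with orthonormal columns
    y = Q1.T @ b
    return np.linalg.solve(R1, y)              # back substitution R1 x = y

m, n = 20, 3
A = np.random.rand(m, n)
b = np.random.rand(m)
x = lsq_qr(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # agrees with numpy's least squares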
Computational cost
The QR factorisation can be computed using Algorithm QR:
Error analysis
By working directly with the matrix A, rather than AT A, we get rid of the dependency of the error
on κ(AT A), and instead get a dependency on κ(A).
Theorem 5.5. Let x be the minimiser of kAx − bk2 , and denote by x̂ the solution computed
through Algorithm LSQ-QR with a Householder QR factorisation. Then x̂ is the minimiser of
k(A + ∆A)x − bk2 where, with α > 0 a small integer constant,
The proof is similar to that of Theorem 2.21, see Theorem 20.3 in Accuracy and Stability of
Numerical Algorithms by Higham.
Theorem 5.6. Let x be the minimiser of ‖Ax − b‖2, and x̂ be the minimiser of ‖(A + ∆A)x − b‖2. If rank(A) = rank(A + ∆A) = n and κ2(A) ‖∆A‖2/‖A‖2 < 1, then
‖x − x̂‖2/‖x‖2 ≤ κ2(A) / (1 − κ2(A) ‖∆A‖2/‖A‖2) · ( ‖∆A‖2/‖A‖2 + (κ2(A) + 1) ‖r‖2/(‖A‖2 ‖x‖2) ),
where r = b − Ax.
For a proof of Theorem 5.6, see Theorem 20.1 in Accuracy and Stability of Numerical Algorithms
by Higham. The proof uses the pseudo-inverses A† and (A + ∆A)† .
To compute the reduced SVD of A, we make use of the following: The factorisation A = UΣVT
is equivalent to AV = UΣVT V = UΣ, since V is invertible and VT V = I. This gives
Avi = σi ui , for i = 1, . . . , n.
Similarly, the factorisation AT = VΣT UT is equivalent to AT U = VΣT UT U = VΣT , since U is
invertible and UT U = I. This gives
AT ui = σi vi , for i = 1, . . . , n.
Together, this implies
A^T Avi = σi A^T ui = σi² vi,  for i = 1, . . . , n.
So we need to compute the eigenvalues {σi²}_{i=1}^n of A^T A ∈ R^{n×n}, together with its corresponding eigenvectors {vi}_{i=1}^n. The vectors {ui}_{i=1}^n can then be computed as ui = (1/σi) Avi. This will give us the reduced SVD.
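A sketch of this recipe, computing the eigendecomposition of A^T A with numpy and comparing the resulting singular values with numpy's own SVD; the matrix is made up.
import numpy as np

m, n = 6, 3
A = np.random.rand(m, n)

# eigenvalues/eigenvectors of the symmetric matrix A^T A, sorted in decreasing order
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                 # singular values sigma_i
U = (A @ V) / sigma                  # u_i = (1 / sigma_i) A v_i, column by column

print(np.allclose((U * sigma) @ V.T, A))        # A = U Sigma V^T (reduced SVD)
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))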
Example. Consider the matrix
A = [ 1 2 ].
    [ 2 1 ]
Since A is square, the SVD and the reduced SVD are the same. To find an SVD, we follow the steps above. We have
A^T A = [ 5 4 ].
        [ 4 5 ]
The eigenvalues of A^T A satisfy
0 = det(A^T A − λI) = (5 − λ)² − 16 = λ² − 10λ + 9 = (λ − 1)(λ − 9),
so λ1 = 9 and λ2 = 1, giving singular values σ1 = 3 and σ2 = 1, with corresponding orthonormal eigenvectors v1 = (1/√2)(1, 1)^T and v2 = (1/√2)(−1, 1)^T.
For the singular vectors u1 and u2, we have
u1 = (1/σ1) Av1 = (1/√2) (1, 1)^T  and  u2 = (1/σ2) Av2 = (1/√2) (1, −1)^T.
This gives U = [ u1 u2 ] = [ 1/√2   1/√2 ].
                           [ 1/√2  −1/√2 ]
6 Eigenvalue Problems
Given A ∈ Rn×n , we want to find λ ∈ C and x ∈ Cn \ {0} such that
Ax = λx.
The eigenvalues of A are the roots of its characteristic polynomial ξA (λ) = det(A − λI). The
below theorem in Galois theory shows that it is not possible to find an algorithm that calculates
eigenvalues exactly after a finite number of steps. Any possible algorithm would be based on the
operations mentioned in the theorem. All algorithms for computing eigenvalues are hence iterative,
and only yield an approximate solution. We want to avoid applying root finding algorithms to the
characteristic polynomial, since this is a very unstable (and expensive) approach.
Theorem 6.1. (Abel, 1824) For all n ≥ 5, there exists a polynomial of degree n with rational co-
efficients that has a real root which cannot be expressed by only using rational numbers, addition,
subtraction, multiplication, division and taking kth roots.
We will restrict our attention to symmetric matrices A, which have real eigenvalues and eigenvectors.
Theorem 6.2. Let A ∈ R^{n×n} be symmetric.
a) All eigenvalues λ1, . . . , λn of A are real.
b) The eigenvectors x1, . . . , xn can be chosen such that they form an orthonormal basis of R^n.
Proof. This is also known as the Spectral Theorem, and can be proved by induction on n, see e.g.
Theorem 5.20 in Linear Algebra: a Modern Introduction by D. Poole.
Theorem 6.2 shows that symmetric matrices A diagonalise as A = QDQT , where the columns
of the orthogonal matrix Q ∈ Rn×n are the eigenvectors of A, and the diagonal matrix D ∈ Rn×n
has the corresponding eigenvalues as diagonal entries, in the same order.
We order the eigenvalues of A in decreasing order of absolute value, |λ1| ≥ |λ2| ≥ . . . ≥ |λn|. For x ∈ R^n \ {0}, the Rayleigh quotient of x is
rA(x) = x^T Ax / x^T x.
• given an eigenvalue λi , we can find the corresponding eigenvector xi by solving the system of
equations (A − λi I)xi = 0.
• given an eigenvector xi , we can find the eigenvalue λi by computing the Rayleigh quotient
rA (xi ).
Writing z(0) = Σ_{i=1}^n ci xi in the basis of eigenvectors, we have A^k z(0) = Σ_{i=1}^n ci λi^k xi. For large k ∈ N, this sum will be dominated by the term corresponding to the eigenvalue of A with largest absolute value. For this reason, the eigenvalue λ1 of A with largest absolute value is known as the dominating eigenvalue, and denoted by λmax. We denote the corresponding eigenvector by xmax.
As with all iterative methods, we need to stop the iteration when we are "close enough" to
the real answer. Typically, we stop the iteration when either kz(k) − z(k−1) k2 , |λ(k) − λ(k−1) | or
‖Az(k) − λ(k) z(k)‖2 is smaller than some tolerance ε. With a stopping criterion based on the closeness of eigenvectors, we have to be careful with a possible change of sign of z(k) with each iteration. This happens when λmax < 0, and λmax^k changes sign with each iteration.
The algorithm calculates an approximate eigenvector as z(k) = A^k z(0) / ‖A^k z(0)‖2. To improve stability and avoid working with very large or very small numbers, the vector z(k) is normalised in every step of the iteration.
Example. Let A = [ 5/2  −3/2 ]. This matrix diagonalises as A = QDQ^T, with D = [ 1 0 ]
                 [ −3/2  5/2 ]                                                  [ 0 4 ]
and Q = (1/√2) [ 1  1 ]. Let z(0) = (1, 0)^T. Then
               [ 1 −1 ]
Az(0) = (5/2, −3/2)^T,
A²z(0) = (17/2, −15/2)^T,
A³z(0) = (65/2, −63/2)^T.
Computational cost
Line 5 of Algorithm PI requires one matrix-vector multiplication, at cost 2n2 using Algorithm MV.
Line 6 requires one application of Algorithm DP at cost 2n and one square root to compute kw(k) k2 ,
and n divisions. Computing the Rayleigh quotient in line 6 requires one matrix-vector multiplication at cost 2n² and one dot product at cost 2n. Note that since z(k) is normalised to length 1, the denominator in the Rayleigh quotient is simply one. So one iteration step costs O(n²) in total.
The number of iterations kmax required depends on the speed of convergence (i.e. on |λ2|/|λ1|) and the stopping criterion employed.
The total cost of Power Iteration is then kmax(4n² + 5n + 1).
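A minimal sketch of Power Iteration with the Rayleigh quotient and a residual-based stopping rule; the tolerance and starting vector are illustrative, and the test matrix is the 2 × 2 example above.
import numpy as np

def power_iteration(A, z0, tol=1e-10, kmax=1000):
    # approximate the dominating eigenpair of a symmetric matrix A
    z = z0 / np.linalg.norm(z0)
    for k in range(1, kmax + 1):
        w = A @ z                          # apply A
        z = w / np.linalg.norm(w)          # normalise to length 1
        lam = z @ A @ z                    # Rayleigh quotient (denominator is 1)
        if np.linalg.norm(A @ z - lam * z) <= tol:
            return lam, z, k
    return lam, z, kmax

A = np.array([[2.5, -1.5], [-1.5, 2.5]])
lam, z, k = power_iteration(A, np.array([1.0, 0.0]))
print(lam, z, k)     # approaches the dominating eigenvalue 4 and eigenvector (1, -1)/sqrt(2)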
Error Analysis
We have the following result on the convergence of the Power Iteration algorithm.
Theorem 6.5. Suppose |λ1| > |λ2| and x1^T z(0) ≠ 0. Then there exists a sequence {σk}_{k∈N} with σk = ±1 such that, as k → ∞,
‖σk z(k) − x1‖2 → 0  and  |λ(k) − λ1| → 0.
Proof. Write z(0) = Σ_{i=1}^n ci xi in the basis of eigenvectors, with c1 = x1^T z(0) ≠ 0. Then A^k z(0) = Σ_{i=1}^n ci λi^k xi = c1 λ1^k (x1 + yk), where yk = Σ_{i=2}^n (ci/c1)(λi/λ1)^k xi. Since |λ1| > |λ2|, the vector yk satisfies yk → 0 as k → ∞.
Then
z(k) = A^k z(0) / ‖A^k z(0)‖2
     = c1 λ1^k (x1 + yk) / ‖c1 λ1^k (x1 + yk)‖2
     = (c1 λ1^k / |c1 λ1^k|) · (x1 + yk) / ‖x1 + yk‖2
     = σk^{−1} (x1 + yk) / ‖x1 + yk‖2,
with σk^{−1} = c1 λ1^k / |c1 λ1^k|.
The first claim of the Theorem then follows:
‖σk z(k) − x1‖2 = ‖ (x1 + yk)/‖x1 + yk‖2 − x1 ‖2 → 0 as k → ∞,
since yk → 0 and ‖x1 + yk‖2 → ‖x1‖2 = 1.
For the second claim, we use the definition of the Rayleigh quotient:
λ(k) = rA(z(k)) = (z(k))^T A z(k) = σk² (z(k))^T A z(k) = (σk z(k))^T A (σk z(k))
     = (1/‖x1 + yk‖2²) (x1 + yk)^T A (x1 + yk)
     → x1^T A x1 as k → ∞,
since yk → 0 as k → ∞. The claim then follows since λ1 = rA(x1) = x1^T A x1, which gives |λ(k) − λ1| → 0.
The proof of Theorem 6.5 reveals that the speed of convergence of z(k) and λ(k) depends on the ratio |λ2|/|λ1|. (This directly influences the size of the vector yk.) Sharpening the proof of Theorem 6.5, we can show that
‖σk z(k) − x1‖2 ≤ α1 (|λ2|/|λ1|)^k,
|λ(k) − λ1| ≤ α2 (|λ2|/|λ1|)^{2k},
for some constants α1, α2 ∈ R. So the bigger the gap between |λ1| and |λ2|, the faster Power Iteration converges.
Note that by computing λ(k) = rA (z(k) ), we have λ(k) ≈ λj , where λj is the eigenvalue of A closest
to s. The inverse iteration algorithm is obtained by setting s = 0.
Computational cost
Compared to Algorithm PI, the only line that has changed is line 5. Instead of performing a matrix-
vector multiplication, we now have to solve a system of linear equations of size n. This can be done
using Algorithm GEPP. Since the matrix A − sI is the same for each k, and only the right hand side
z(k−1) changes, we can compute the LU factorisation with permutation of A − sI once, and reuse this
for all values of k. This gives a one-off cost of O(n3 ) and a cost per iteration of O(n2 ).
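A sketch of inverse iteration with shift, reusing a single LU factorisation with partial pivoting (computed here with scipy) for every iteration; the shift, tolerance and starting vector are illustrative choices.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, s, z0, tol=1e-10, kmax=1000):
    # approximate the eigenpair of symmetric A whose eigenvalue is closest to the shift s
    n = A.shape[0]
    lu_piv = lu_factor(A - s * np.eye(n))    # one-off O(n^3) factorisation, reused below
    z = z0 / np.linalg.norm(z0)
    for k in range(1, kmax + 1):
        w = lu_solve(lu_piv, z)              # solve (A - sI) w = z at O(n^2) cost
        z = w / np.linalg.norm(w)
        lam = z @ A @ z                      # Rayleigh quotient
        if np.linalg.norm(A @ z - lam * z) <= tol:
            return lam, z, k
    return lam, z, kmax

A = np.array([[2.5, -1.5], [-1.5, 2.5]])
print(shifted_inverse_iteration(A, s=0.9, z0=np.array([1.0, 0.0])))  # finds the eigenvalue 1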
Error Analysis
The following is a direct consequence of Theorem 6.5.
Theorem 6.6. Let s ∈ R. Suppose a, b ∈ {1, 2, . . . , n} are such that |λa − s|^{−1} > |λb − s|^{−1} ≥ |λi − s|^{−1}, for i ∈ {1, . . . , n} \ {a, b}, and xa^T z(0) ≠ 0. Then there exists a sequence {σk}_{k∈N} with σk = ±1 such that, as k → ∞,
‖σk z(k) − xa‖2 → 0  and  |λ(k) − λa| → 0.
The speed of convergence again depends on how well the eigenvalues are separated, i.e. on the size of |λa − s|/|λb − s|:
‖σk z(k) − xa‖2 ≤ α1 (|λa − s|/|λb − s|)^k,
|λ(k) − λa| ≤ α2 (|λa − s|/|λb − s|)^{2k}.
In line 6, it is crucial that we do not just normalise all the columns of Z(k) to length 1, but also
make the columns orthogonal. As seen in the analysis of Power Iteration, all columns of Ak Z(0)
would converge to multiples of the dominating eigenvector x1. In fact, the first column of Z(k) is exactly the vector z(k) computed by Algorithm PI with starting guess z(0) equal to the first column of Z(0). (It is multiplied by A in line 3 and then normalised to length 1 in line 6.)
In order to make sure that the second column of Z(k) converges to x2 , we need to subtract the
component in the direction of x1 . What is left will then be dominated by the component in the
direction of x2 . Similarly, we need to subtract the components in the directions of x1 , . . . , xi−1 from
the ith column of Z(k) . This is exactly what the QR factorisation does. Think back to how Gram-
Schmidt orthonormalisation works: The first column of Z(k) is the first column of W(k) , normalised
to length 1. To obtain the second column of Z(k) , we take the second column of W(k) , subtract the
projection onto the first column of W(k) , and normalise to length 1. Since the first column of Z(k) is
converging to x1, this will eventually get rid of the term in the direction of x1 in the second column of Z(k). The argument for the remainder of the columns is similar.
For the Orthogonal Iteration Algorithm to converge, we need the assumption that A has distinct
eigenvalues:
|λ1 | > |λ2 | > · · · > |λn |.
The speed of convergence of the columns of Z(k) to the eigenvectors of A and the diagonal entries of
Λ(k) to the eigenvalues of A, depends on how well the eigenvalues are separated. More precisely, it
depends on the quantity
max_{1≤l≤n−1} |λ_{l+1}| / |λ_l|.
Algorithm OI is unstable when implemented in practice, and small rounding errors typically lead
to large errors in the computed eigenvectors and eigenvalues. To see why this is the case, consider
extracting the component in the direction of xn of the vector
A^k z(0) = Σ_{i=1}^n ci λi^k xi.
Theoretically, this can be done following the procedure in Algorithm GS, and subtracting the projec-
tions onto x1 , . . . , xn−1 . However, the component in the direction of xn will be very small compared
to the other terms in the sum, and so numerically this component might get lost in the presence of
rounding errors.
Algorithm QR-Eig in the next section is a much more stable way of implementing the idea behind
Algorithm OI.
Example. (ctd) Let A = [ 5/2  −3/2 ]. This matrix diagonalises as A = QDQ^T, with
                       [ −3/2  5/2 ]
D = [ 1 0 ] and Q = (1/√2) [ 1  1 ]. Let Z(0) = [ 1 0 ] = I. Then, to 3 significant figures, we have
    [ 0 4 ]                [ 1 −1 ]            [ 0 1 ]
k = 1:
W(1) = AZ(0) = [ 5/2  −3/2 ]
               [ −3/2  5/2 ]
W(1) = [ −0.857 0.514 ] [ −2.91 2.57 ] = Z(1) R(1)
       [ 0.514  0.857 ] [ 0     1.37 ]
Λ(1) = (Z(1))^T A Z(1) = [ 3.82  0.706 ]
                         [ 0.706 1.18  ]
k = 2:
W(2) = AZ(1) = [ −2.92 0    ]
               [ 2.57  1.37 ]
W(2) = [ −0.750 0.661 ] [ 3.89 0.908 ] = Z(2) R(2)
       [ 0.662  0.750 ] [ 0    1.03  ]
Λ(2) = (Z(2))^T A Z(2) = [ 3.99  0.187 ]
                         [ 0.187 1.01  ]
k = 3:
W(3) = AZ(2) = [ −2.87 0.529 ]
               [ 2.78  0.882 ]
W(3) = [ −0.718 0.696 ] [ 3.99 0.234 ] = Z(3) R(3)
       [ 0.696  0.718 ] [ 0    1.00  ]
Λ(3) = (Z(3))^T A Z(3) = [ 4.00  0.047 ]
                         [ 0.047 1.00  ]
We see that the columns of Z(k) converge very quickly to a multiple of the eigenvectors
x1 = (1/√2) (1, −1)^T ≈ (0.707, −0.707)^T and x2 = (1/√2) (1, 1)^T ≈ (0.707, 0.707)^T.
The diagonal entries of Λ(k) converge to the eigenvalues 4 and 1, respectively, and the off-diagonal entries converge to zero. As expected from the error analysis for Algorithm PI, we can see that the estimates of the eigenvalues converge faster than the estimates of the eigenvectors.
Furthermore,
Example. (ctd) Let A = [ 5/2  −3/2 ]. This matrix diagonalises as A = QDQ^T, with
                       [ −3/2  5/2 ]
D = [ 1 0 ] and Q = (1/√2) [ 1  1 ]. Then, to 3 significant figures, we have
    [ 0 4 ]                [ 1 −1 ]
Λ(0) = A = [ 2.5  −1.5 ]
           [ −1.5  2.5 ]
k = 1:
Λ(0) = [ −0.857 0.514 ] [ −2.92 2.57 ] = Q(1) R(1)
       [ 0.514  0.857 ] [ 0     1.37 ]
Λ(1) = R(1) Q(1) = [ 3.82  0.706 ]
                   [ 0.706 1.18  ]
k = 2:
Λ(1) = [ −0.983 −0.182 ] [ −3.89 −0.908 ] = Q(2) R(2)
       [ −0.182  0.983 ] [ 0      1.03  ]
Λ(2) = R(2) Q(2) = [ 3.99   −0.187 ]
                   [ −0.187  1.01  ]
k = 3:
Λ(2) = [ −0.999 0.0468 ] [ −3.99 0.234 ] = Q(3) R(3)
       [ 0.0468 0.999  ] [ 0     1.00  ]
Λ(3) = R(3) Q(3) = [ 4.00  0.047 ]
                   [ 0.047 1.00  ]
We see that the matrices Λ(k) are indeed the same matrices as computed by Algorithm OI,
differing only by a possible change of sign in the off-diagonal entries. This is due to the QR
factorisation only being unique up to a change of sign.
Although the matrices Λ(k) are the same for Algorithm OI and Algorithm QR-Eig when
computed in exact arithmetic, there is typically a substantial difference when implemented
in floating point arithmetic. The matrices computed with Algorithm QR-Eig will be more
accurate.
Computational Cost
In each iteration, we need to perform a matrix-matrix multiplication and find a QR factorisation,
both of which have cost of O(n3 ). Hence, the total cost of Algorithm QR-Eig is O(kmax n3 ).
If we use Algorithm MGS (Modified Gram-Schmidt) and Algorithm MM (Matrix-matrix multi-
plication), then the total cost of Algorithm QR-Eig is kmax (4n3 + n2 + n). But note that R is upper
triangular, and so the efficiency of Algorithm MM can be improved.
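A minimal sketch of Algorithm QR-Eig using numpy's QR factorisation; kmax, the tolerance and the convergence check are illustrative choices.
import numpy as np

def qr_eig(A, kmax=100, tol=1e-10):
    # QR iteration for a symmetric matrix: Lambda_k = R_k Q_k with Q_k R_k = Lambda_{k-1}
    Lam = A.astype(float)
    for k in range(kmax):
        Q, R = np.linalg.qr(Lam)         # QR factorisation of the current iterate
        Lam = R @ Q                      # similarity transform: R Q = Q^T Lambda_{k-1} Q
        if np.linalg.norm(Lam - np.diag(np.diag(Lam))) <= tol:
            break                        # off-diagonal entries are negligible
    return np.diag(Lam)                  # approximate eigenvalues

A = np.array([[2.5, -1.5], [-1.5, 2.5]])
print(qr_eig(A))                         # approximately [4, 1]
print(np.linalg.eigvalsh(A))             # numpy's eigenvalues for comparison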