NLA Full Notes 22
• Your notes from Introduction to Linear Algebra or Accelerated Algebra and Calculus for Direct
Entry.
• B.N Datta. Numerical Linear Algebra and Applications, SIAM, 2010 (second edition). Sections
2.1 - 2.4.
• D. Poole. Linear Algebra: A modern introduction. Cengage Learning. Sections 1.1 - 1.3, 2.3,
3.1-3.3, 3.5, 4.1-4.4 and 5.1 - 5.5.
Definition 0.1. The dot product between two vectors x, y ∈ R^n is given by x · y = Σ_{i=1}^n xi yi.
Definition 0.2. The length (or Euclidean norm) of a vector x ∈ R^n is given by ‖x‖ = √(x · x) = (Σ_{i=1}^n xi²)^{1/2}.
Definition 0.7. For two vectors x, y ∈ R^n, with y ≠ 0, the projection of x onto y is the vector projy(x) given by
projy(x) = (x · y / y · y) y.
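These definitions are easy to check numerically. The following short sketch uses numpy on two made-up vectors; numpy's built-in np.linalg.norm gives the same Euclidean norm.
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, 4.0])

dot = np.dot(x, y)                         # x . y = sum_i x_i y_i
norm_x = np.sqrt(np.dot(x, x))             # ||x|| = sqrt(x . x)
proj = (np.dot(x, y) / np.dot(y, y)) * y   # projection of x onto y

print(dot, norm_x, np.linalg.norm(x))      # the last two values agree
print(proj)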
2. In each non-zero row, the first non-zero entry (called the leading entry) is in a column to
the left of any leading entries below it.
Definition 0.9. The rank of A is the number of non-zero rows in its row echelon form, which is
equal to the number of linearly independent rows of A. It is also equal to the number of linearly
independent columns of A.
Theorem 0.10 (Rank Theorem). Let A ∈ R^{m×n}, with m ≥ n. The following two statements are equivalent:
1. Ax = 0 if and only if x = 0.
2. rank(A) = n.
Definition 0.11. Given A ∈ Rm×n with entries (A)ij := aij , the transpose AT ∈ Rn×m is given
by (AT )ij := aji . A square matrix A ∈ Rn×n is symmetric if A = AT .
Definition 0.12. Given A ∈ Rn×n , the inverse matrix A−1 ∈ Rn×n is such that AA−1 =
A−1 A = I, where I ∈ Rn×n denotes the identity matrix. If A−1 exists, A is called invertible.
Definition 0.15. Given a matrix A ∈ Rn×n , a scalar λ ∈ C is called an eigenvalue and the vector
x ∈ Cn \ {0} a corresponding eigenvector of A if Ax = λx.
Theorem 0.16. The eigenvalues of A ∈ Rn×n are the roots of the characteristic polynomial ξA ,
defined by ξA (z) = det(A − zI), for z ∈ C. In other words, the eigenvalues λ1 , . . . , λn of A satisfy
det(A − zI) = ∏_{i=1}^n (λi − z).
Theorem 0.17. A matrix A ∈ R^{n×n} is diagonalisable if and only if it has n linearly independent eigenvectors. A diagonalisable matrix A can be written in the form A = SDS^{−1}, where the columns of S ∈ C^{n×n} are the eigenvectors of A and the diagonal entries of D ∈ C^{n×n} are the corresponding eigenvalues, in the same order.
1 Background material
This section contains some background material on floating point arithmetic, algorithms and norms.
We need these concepts to understand how computers perform computations, how we can write down
our methods in a structured way, and whether our methods are computationally efficient and the
computed solutions are accurate.
The mantissa can only store a finite number of digits, which you can think of as using a finite
number of significant figures. For a more thorough discussion of the relation between floating point
representations, scientific notation and significant figures, please watch the video at
https://www.youtube.com/watch?v=PZRI1IfStY0.
We work under the following assumptions on floating point representation.
Assumption 1.1. There is a parameter εm > 0, called machine epsilon, such that the following
assumptions hold.
A1. For every α ∈ R, there is ε ∈ (−εm, εm) with
fl(α) = α(1 + ε).
A2. For every α, β ∈ F and every elementary operation ∗ ∈ {+, −, ×, ÷}, there is δ ∈ (−εm, εm) with
α ⊛ β = (α ∗ β)(1 + δ),
where ⊛ denotes the computed version of ∗.
Note that by Assumption A1, the rounding error in α is relative to the size of α:
|α − fl(α)| ≤ |α|εm .
Likewise, by Assumption A2, the error in α ⊛ β is relative to the size of α ∗ β:
|α ∗ β − α ⊛ β| ≤ |α ∗ β| εm.
The vast majority of modern computers use the floating point representation specified by the
IEEE standard 754 with double precision. In this case we use 64 bits to represent a floating point
number, split as 1 bit for the sign, 52 bits for the mantissa, and 11 bits for the exponent (in base
2). In this case we have εm = 2.22 × 10−16 , to 3 significant figures. You can check this yourself in
Python, using the following code:
import numpy as np
em = np.finfo(float).eps   # machine epsilon for double precision floating point numbers
print(em)
According to Assumption 1.1, εm is the accuracy with which we can store real numbers and
perform arithmetic operations. Since εm gives the relative accuracy, we can interpret εm ≈ 10−16 as
working with 15 significant figures. Note that using a finite number of significant figures makes arith-
metic operations involving numbers of very different magnitude susceptible to errors. For example, consider adding the numbers 10^12 and 10^−12: since 10^−12/10^12 = 10^−24 is far below εm, the result is rounded to fl(10^12 + 10^−12) = 10^12, and the contribution of the smaller number is lost entirely.
Floating point arithmetic will have an effect on any quantities we compute. In general, there will be a difference between the true solution to the problem we want to solve, e.g. the solution x to Ax = b, and the solution computed by a computer in floating point arithmetic, typically denoted by x̂.
Definition 1.3. We say C(n) = O(f(n)) if there exists a constant α > 0 such that C(n) ≤ αf(n) for all n. This means that C(n) does not grow more quickly than f(n) as a function of n. For example,
5n³ = O(n³),
5n³ + 2n² = O(n³).
To compute the cost of Algorithm DP, we count the number of arithmetic operations. Line DP.3 involves 1 addition and 1 multiplication, and it is executed for all values of i = 1, . . . , n. The cost of DP is therefore C(n) = Σ_{i=1}^n 2 = 2n = O(n).
Algorithm MV involves m applications of Algorithm DP on vectors in R^n, hence C(m, n) = Σ_{i=1}^m 2n = m × 2n = O(mn). In particular, if A ∈ R^{n×n} is square, we have C(n) = O(n²).
There are always many different algorithms that solve a given problem, and it is important to
think about efficient implementation. Suppose we want to compute the product ABC, for matrices
A ∈ Rm×m , B ∈ Rm×m and C ∈ Rm×n . We can do this in two ways:
1. Compute AB using Algorithm MM, at cost 2m³, then compute the product of AB ∈ R^{m×m} and C using Algorithm MM at cost 2m²n. Total cost: 2m³ + 2m²n = 2m²(m + n).
2. Compute BC using Algorithm MM, at cost 2m²n, then compute the product of A and BC ∈ R^{m×n} using Algorithm MM at cost 2m²n. Total cost: 2m²n + 2m²n = 4m²n.
The first approach is faster if m < n, whereas the second approach is faster if m > n.
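The difference between the two orderings can be seen directly in numpy; the sizes below are made-up choices with m < n, so the first ordering should be noticeably faster.
import time
import numpy as np

m, n = 200, 2000                      # m < n, so (AB)C should win
A = np.random.rand(m, m)
B = np.random.rand(m, m)
C = np.random.rand(m, n)

t0 = time.perf_counter()
X1 = (A @ B) @ C                      # cost ~ 2m^2(m + n)
t1 = time.perf_counter()
X2 = A @ (B @ C)                      # cost ~ 4m^2 n
t2 = time.perf_counter()

print("(AB)C:", t1 - t0, "s   A(BC):", t2 - t1, "s")
print("max difference:", np.abs(X1 - X2).max())   # same product, up to rounding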
For a vector x ∈ R^n, the p-norms are defined by
‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p},  p ∈ [1, ∞),
‖x‖∞ = max_{1≤i≤n} |xi|.
Which p-norm we choose to work with depends on the situation. The 1-norm ensures that smaller components still count, whereas the ∞-norm measures only the largest component. For example, for x = (100, 1, . . . , 1)^T ∈ R^{202}, the ∞-norm only measures the entry 100, whereas the 1-norm also counts the 201 smaller entries.
When measuring the difference between the true solution x to Ax = b and the computed solution x̂, the 1-norm and 2-norm represent an average error over all components, whereas the ∞-norm represents the maximum error over all components.
In general, norms have the following properties.
Given the p-norm on R^n, we can define a norm on R^{m×n} by the induced norm
‖A‖p = max_{x≠0} ‖Ax‖p / ‖x‖p,  for all A ∈ R^{m×n}.
The size of a matrix A is hence measured by how much the application of A to a vector x can change the size of the vector, measured in the p-norm.
The maximum in the expression above can be difficult to compute explicitly, but this is possible
in particular cases.
Theorem 1.5. a) The matrix norm induced by the ∞-norm is the maximum row sum:
‖A‖∞ = max_{1≤i≤m} Σ_{j=1}^n |aij|.
b) The matrix norm induced by the 1-norm is the maximum column sum:
‖A‖1 = max_{1≤j≤n} Σ_{i=1}^m |aij|.
c) The matrix norm induced by the 2-norm is the square root of the spectral radius of A^T A:
‖A‖2 = √(ρ(A^T A)),  where ρ(A^T A) = max{|λ| : λ is an eigenvalue of A^T A}.
Part a) is proved in Workshop 2, the proof of part b) is similar. Part c) is proved in Workshop
10.
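The three formulas in Theorem 1.5 can be checked against numpy's built-in matrix norms; a small sketch on a made-up matrix:
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  0.0, -1.0]])

inf_norm = np.abs(A).sum(axis=1).max()                  # maximum row sum
one_norm = np.abs(A).sum(axis=0).max()                  # maximum column sum
two_norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())   # sqrt of spectral radius of A^T A

print(inf_norm, np.linalg.norm(A, np.inf))
print(one_norm, np.linalg.norm(A, 1))
print(two_norm, np.linalg.norm(A, 2))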
Ax = b (SLE)
The main idea behind modern algorithms to solve (SLE) is to factorise A = MN, where M and
N are simpler matrices, and then solve
My = b
Nx = y.
This gives the correct solution, since Ax = MNx = My = b. Here, M and N being simpler
matrices means that the two systems of linear equations above are easy to solve. The next sections
give examples of different choices for M and N.
Example. Let
A = [ 2 1 0 ].
    [ 4 3 3 ]
    [ 6 7 8 ]
We want to put A into row echelon form. We create zeros below the diagonal in the first column by subtracting multiples of the first row from the other rows:
L1 A = [  1 0 0 ] [ 2 1 0 ]   [ 2 1 0 ]
       [ −2 1 0 ] [ 4 3 3 ] = [ 0 1 3 ].
       [ −3 0 1 ] [ 6 7 8 ]   [ 0 4 8 ]
To quantify the structure of the matrices L and U computed in the above example, recall the
following definitions.
A unit diagonal, upper triangular, or lower triangular matrix B = (bij) has all diagonal entries equal to 1, i.e. bii = 1 for all i.
The matrix U from the example is upper triangular. Lemma 2.2 shows that the matrix L = (L2 L1 )−1
is unit lower triangular.
Lemma 2.2. (a) Let L = (lij ) be unit lower triangular with non-zero entries below the diagonal
only in column k. Then L−1 is also unit lower triangular with non-zero entries below the
diagonal only in column k, and (L−1 )ik = −lik , for all i > k.
(b) Let A = (aij ) and B = (bij ) be unit lower triangular n × n matrices, where A has non-zero
entries below the diagonal only in columns 1 to k, and B has non-zero entries below the
diagonal only in columns k + 1 to n. Then AB is unit lower triangular, with (AB)ij = aij for j ≤ k, and (AB)ij = bij for j ≥ k + 1.
Proof. (a) Multiplying L with the suggested inverse gives the identity.
Note that in the example above, Lemma 2.2 allowed us to compute the matrix L without performing any arithmetic operations. The inversion of the individual matrices L1 and L2 only requires flipping a sign by Lemma 2.2 (a), and the multiplication of L1^{−1} and L2^{−1} only requires a concatenation of the entries in L1^{−1} and L2^{−1} by Lemma 2.2 (b). Also note that the entries of L are the so-called multipliers: ljk = multiple of row k subtracted from row j to create zeros below the diagonal in column k.
For a general matrix A ∈ R^{n×n}, Gaussian elimination will result in A = (∏_{k=1}^{n−1} Lk^{−1}) U =: LU. The matrix U is by construction upper triangular. Each matrix Lk represents the elementary row operations of subtracting a multiple of row k from rows k + 1, . . . , n. Hence it is unit lower triangular with non-zero entries below the diagonal only in column k, and Lemma 2.2 applies.
Gaussian elimination computes an LU factorisation of A.
Algorithm LU LU factorisation
Input: A ∈ Rn×n
Output: L, U ∈ Rn×n , the LU factorisation of A
1: L = I, U = A
2: for k = 1, . . . , n − 1 do
3: for j = k + 1, . . . , n do
4: ljk = ujk /ukk
5: (ujk , . . . , ujn ) = (ujk , . . . , ujn ) − ljk (ukk , . . . , ukn )
6: end for
7: end for
Line LU.5 subtracts a multiple of row k from row j, resulting in ujk = 0. After iteration k, U contains
zeros below the diagonal in columns 1 to k. Hence, they do not need to be included in line LU.5.
Lemma 2.2 shows that L is computed correctly.
Algorithm LU shows that an LU factorisation of A exists, provided ukk 6= 0 in line LU.4, for
k = 1, . . . , n − 1. In other words, an LU factorisation of A ∈ Rn×n exists if A is non-singular and
can be put into row echelon form without performing any row swaps. If an LU factorisation exists,
it is unique.
If you would like to see a detailed walk through of another example of an LU factorisation, you
can watch the video at https://www.youtube.com/watch?v=HS7RadfcoFk.
We now have the factorisation A = LU, so we can solve (SLE) by solving
Ly = b
Ux = y.
Since L (resp. U) is lower (resp. upper) triangular, this can be done by forward (resp. back)
substitution.
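As a sketch of this factorise-then-solve strategy, the following Python code implements Algorithm LU (without pivoting, so it assumes all pivots ukk are non-zero) together with forward and back substitution; the right hand side b is made up.
import numpy as np

def lu(A):
    # LU factorisation without pivoting (along the lines of Algorithm LU)
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float)
    for k in range(n - 1):
        for j in range(k + 1, n):
            L[j, k] = U[j, k] / U[k, k]                   # multiplier l_jk
            U[j, k:] = U[j, k:] - L[j, k] * U[k, k:]      # subtract l_jk * row k
    return L, U

def forward_sub(L, b):
    # solve Ly = b for unit lower triangular L
    y = np.zeros_like(b, dtype=float)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_sub(U, y):
    # solve Ux = y for upper triangular U
    n = len(y)
    x = np.zeros_like(y, dtype=float)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2.0, 1.0, 0.0], [4.0, 3.0, 3.0], [6.0, 7.0, 8.0]])
b = np.array([1.0, 2.0, 3.0])                             # made-up right hand side
L, U = lu(A)
x = back_sub(U, forward_sub(L, b))
print(np.allclose(A @ x, b))                              # True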
Computational cost
Lemma 2.4. The LU factorisation algorithm has computational cost
C(n) = (2/3)n³ + (1/2)n² − (7/6)n.
Proof. We count the number of FLOPs. Line LU.5 requires (n − k + 1) multiplications and (n − k + 1) subtractions. Line LU.4 requires 1 division. The inner loop (over j) hence needs
Σ_{j=k+1}^n (1 + 2(n − k + 1)) = (n − k) · (1 + 2(n − k + 1)) operations,
where (n − k) is the number of iterations and 1 + 2(n − k + 1) the number of FLOPs per iteration.
A similar argument shows that the computational cost of forward substitution is C(n) = n2 .
Error analysis
Algorithm GE is an unstable algorithm, meaning that the finite precision εm of floating point arith-
metic can have a disastrous effect on the computed solution. The following example illustrates this.
Example. Let
A = [ 10^−20  1 ].
    [ 1       1 ]
Finding an LU factorisation of A:
L1 A = [ 1       0 ] [ 10^−20  1 ]   [ 10^−20  1         ]
       [ −10^20  1 ] [ 1       1 ] = [ 0       1 − 10^20 ]
⇒  L = [ 1      0 ],  U = [ 10^−20  1         ].
       [ 10^20  1 ]       [ 0       1 − 10^20 ]
If fl(1 − 10^20) = −10^20, the matrices will be stored as
L̂ = L,  Û = [ 10^−20  1       ].
             [ 0       −10^20 ]
Then
L̂Û = [ 10^−20  1 ] = A + [ 0  0  ],
      [ 1       0 ]       [ 0  −1 ]
and the solution x̂ computed using L̂ and Û will be completely different to the true solution x.
Definition 2.7. A matrix P ∈ Rn×n is called a permutation matrix if every row and every
column of P contains n − 1 zeros and 1 one.
For A ∈ R^{n×n}, multiplying by a permutation matrix P on the left permutes the rows of A (and multiplying on the right permutes its columns). In the example above, swapping the two rows of A before eliminating avoids the division by the tiny pivot 10^−20, and the error in the computed LU factorisation of the permuted matrix A′ is much smaller compared to the error in the LU factorisation of A.
• GEPP (Gaussian elimination with partial pivoting) — swap rows only to maximise |ukk | in
line LU.4
• GECP (Gaussian elimination with complete pivoting) — swap rows and columns to maximise
|ukk | in line LU.4
By maximising |ukk | in line LU.4, we avoid dividing by very small numbers and hence avoid
very large numbers in the computation of the LU factorisation. In the above example, we avoid
computations with 1020 , and hence minimise the effect of rounding errors.
The general idea of Algorithm LUPP below is to compute an LU factorisation of a permuted
matrix PA. However, a suitable choice of the permutation matrix P is typically not known a-priori,
and needs to be determined as part of the algorithm.
Example. Let
A = [ 2 1 0 ].
    [ 4 3 3 ]
    [ 6 7 8 ]
Last week we saw that the LU factorisation of A is
L = [ 1 0 0 ],  U = [ 2 1  0 ].
    [ 2 1 0 ]       [ 0 1  3 ]
    [ 3 4 1 ]       [ 0 0 −4 ]
U = A, L = I, P = I
k = 1:
i = 3, since |u31| = max{|u11|, |u21|, |u31|}
Swap rows 1 and 3 of U:
U = [ 6 7 8 ]
    [ 4 3 3 ]
    [ 2 1 0 ]
Swap rows 1 and 3 of P:
P = [ 0 0 1 ]
    [ 0 1 0 ]
    [ 1 0 0 ]
No rows to swap in L, since we haven't filled in any entries yet
l21 = u21/u11 = 2/3,
(u21, u22, u23) = (u21, u22, u23) − l21 (u11, u12, u13) = (0, −5/3, −7/3),
l31 = u31/u11 = 1/3,
(u31, u32, u33) = (u31, u32, u33) − l31 (u11, u12, u13) = (0, −4/3, −8/3),
U = [ 6  7    8   ]      L = [ 1   0 0 ]      P = [ 0 0 1 ]
    [ 0 −5/3 −7/3 ],         [ 2/3 1 0 ],         [ 0 1 0 ]
    [ 0 −4/3 −8/3 ]          [ 1/3 0 1 ]          [ 1 0 0 ]
k = 2:
i = 2, since |u22| = max{|u22|, |u32|}
No swaps required
l32 = u32/u22 = 4/5,
(u32, u33) = (u32, u33) − l32 (u22, u23) = (0, −4/5),
U = [ 6  7    8   ]      L = [ 1   0   0 ]      P = [ 0 0 1 ]
    [ 0 −5/3 −7/3 ],         [ 2/3 1   0 ],         [ 0 1 0 ].
    [ 0  0   −4/5 ]          [ 1/3 4/5 1 ]          [ 1 0 0 ]
Computational cost
We assume the exchange of rows carries no cost. The computational cost of Algorithm GEPP is the
computational cost of Algorithm GE, plus the additional cost for choosing the pivot i in line LUPP.3.
This does not require any FLOPs, but it does require the comparison of different numbers in order
to find the maximum. Let us assign unit cost to the comparison of two numbers.
For partial pivoting, choosing i involves finding the maximum of (n − k + 1) numbers. This can
be done using the following algorithm.
Algorithm Max involves m − 1 comparisons for finding the maximum of m numbers, and the total
cost of choosing the pivots i in Algorithm LUPP is hence:
Σ_{k=1}^{n−1} (n − k) = Σ_{l=1}^{n−1} l = n(n − 1)/2 = O(n²).
Error analysis
In comparison to Algorithm GE, Algorithm GEPP is stable: the computed solution x̂ satisfies (A + ∆A)x̂ = b + ∆b, with
‖∆A‖p/‖A‖p + ‖∆b‖p/‖b‖p ≤ α εm,
for a moderate constant α > 0.
Theorem 2.9. Let x̂ be the computed solution to (SLE) by Algorithm GEPP. Then
(A + ∆A) x̂ = b,
The bound on ‖∆A‖∞ is sharp for some matrices A; however, for most matrices the bound is quite pessimistic. Recent research suggests that ‖∆A‖∞ ≈ √n εm ‖A‖∞ for most matrices A ∈ R^{n×n}.
Complete pivoting improves on the bound, but the moderate increase in accuracy is usually not worth
the increased computational cost. In practice, Algorithm GEPP mostly works well, even for large n.
The built-in Python solver, numpy.linalg.solve, uses Algorithm GEPP.
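A short usage sketch of the built-in solver on a made-up system, together with the relative residual of the computed solution:
import numpy as np

A = np.array([[2.0, 1.0, 0.0], [4.0, 3.0, 3.0], [6.0, 7.0, 8.0]])
b = np.array([3.0, 10.0, 21.0])

x = np.linalg.solve(A, b)             # solves Ax = b via GEPP (LU with partial pivoting)
r = b - A @ x                         # residual of the computed solution
print(x)
print(np.linalg.norm(r, np.inf) / np.linalg.norm(b, np.inf))   # relative residual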
The proof of Theorem 2.9 can be found in Theorem 9.5 in the book Accuracy and Stability of
Numerical Algorithms, by N.J. Higham. It is based on Assumption 1.1, and involves going through
each operation performed in Algorithm GEPP, accumulating all rounding errors.
Given backward stability, what can we say about the error x − x̂? For this, we need the notion
of condition number.
Definition 2.10. The condition number κp(A) of A ∈ R^{n×n} with respect to the p-norm ‖·‖p is
κp(A) = ‖A‖p ‖A^{−1}‖p if A is invertible, and κp(A) = ∞ otherwise.
The matrix A is ill-conditioned if κp (A) is large.
The condition number κp(A) is a measure of how close the matrix A is to being singular, cf. Theorem 2.12. What is "large" for κp(A) is not precisely defined, and depends on the context. Workshop 4 shows κp(A) ≥ 1 for any matrix A ∈ R^{n×n}, and any matrix with condition number close to 1, say κp(A) ≤ 10^4, is considered well-conditioned. A matrix is definitely ill-conditioned once κp(A) ≈ εm^{−1}, say κp(A) ≥ 10^{12}.
Theorem 2.11. Let Ax = b and (A + ∆A)x̂ = b. If κp(A) ‖∆A‖p/‖A‖p < 1, then
‖x − x̂‖p / ‖x‖p ≤ κp(A) / (1 − κp(A) ‖∆A‖p/‖A‖p) · ‖∆A‖p/‖A‖p.
The above theorem shows that for the relative error ‖x − x̂‖p/‖x‖p to be small, we need:
a) the relative size of the perturbation ‖∆A‖p/‖A‖p to be small, i.e. the algorithm to have good stability properties;
b) the condition number κp(A) to be not too large, i.e. the problem to be well-conditioned.
A combination of Theorems 2.9 and 2.11 gives a bound on the error in the solution x̂ computed by Algorithm GEPP. Note that the theorem gives a bound on the relative error ‖x − x̂‖p/‖x‖p, which is much more informative than a bound on the actual error ‖x − x̂‖p. To see why this is the case, consider the following two examples:
Example 1: x = (10^−6, 10^−6)^T, x̂ = (10^−6, −10^−3)^T:  ‖x − x̂‖∞ = 1.001 × 10^−3,  ‖x − x̂‖∞/‖x‖∞ = 1.001 × 10^3.
Example 2: x = (1, 1)^T, x̂ = (1, 1.001)^T:  ‖x − x̂‖∞ = 10^−3,  ‖x − x̂‖∞/‖x‖∞ = 10^−3.
As example 1 shows, a small actual error does not necessarily mean that the solution has been
computed accurately - the second component is off by orders of magnitude. We always want to
measure the relative error, ie the size of the error compared to the size of the solution x.
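A small numerical illustration of the role of the condition number: for an ill-conditioned matrix, a tiny perturbation of A produces a large relative error in the solution. The matrices and the perturbation below are made up.
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-10]])     # nearly singular: cond ~ 4e10
b = np.array([2.0, 2.0 + 1e-10])                   # exact solution x = (1, 1)
x = np.array([1.0, 1.0])

dA = 1e-8 * np.array([[0.0, 0.0], [0.0, 1.0]])     # tiny relative perturbation of A
x_hat = np.linalg.solve(A + dA, b)

print(np.linalg.cond(A, 2))
print(np.linalg.norm(x - x_hat, 2) / np.linalg.norm(x, 2))   # large relative error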
To provide some more intuition why the condition number measures how close the matrix A is
to being singular, consider the following result.
Theorem 2.12. Let A ∈ R^{n×n}, with real eigenvalues |λ1| ≥ |λ2| ≥ . . . ≥ |λn| and corresponding real eigenvectors x1, . . . , xn. If x1, . . . , xn form an orthonormal set, then the condition number of A in the 2-norm is
κ2(A) = ‖A‖2 ‖A^{−1}‖2 = |λ1|/|λn| ≥ 1.
Proof. If A is singular, then λn = 0, and |λ1|/|λn| = ∞. This is consistent with Definition 2.10, where singular matrices are assigned condition number ∞.
We will prove the result by proving kAk2 = |λ1 | and kA−1 k2 = |λn |−1 . We start with the
former.
For x ∈ Rn \ {0}, let c ∈ Rn be such that
x = c1 x 1 + . . . + cn x n .
Then
Ax = A (c1 x1 + · · · + cn xn )
= c1 Ax1 + . . . + cn Axn
= c1 λ1 x1 + . . . + cn λn xn .
We have
‖x‖2² = x^T x = (c1 x1 + . . . + cn xn)^T (c1 x1 + . . . + cn xn) = c1² + . . . + cn² = Σ_{i=1}^n ci²,
since x1, . . . , xn form an orthonormal set, and similarly ‖Ax‖2² = Σ_{i=1}^n λi² ci².
Hence,
‖Ax‖2 / ‖x‖2 = (Σ_{i=1}^n λi² ci²)^{1/2} / (Σ_{i=1}^n ci²)^{1/2} ≤ |λ1| (Σ_{i=1}^n ci²)^{1/2} / (Σ_{i=1}^n ci²)^{1/2} = |λ1|,
which implies
‖A‖2 = max_{x≠0} ‖Ax‖2/‖x‖2 ≤ |λ1|.
We now show that ‖A‖2 ≥ |λ1|:
‖A‖2 = max_{x≠0} ‖Ax‖2/‖x‖2 ≥ ‖Ax1‖2/‖x1‖2 = ‖λ1 x1‖2/‖x1‖2 = |λ1| ‖x1‖2/‖x1‖2 = |λ1|.
The assumptions of Theorem 2.12 are satisfied for example by symmetric matrices. The result
shows that the closer the smallest eigenvalue of A is to zero, the larger the condition number of A.
Since a zero eigenvalue is equivalent to the matrix being singular, ill-conditioned matrices are close
to being singular.
Recall that an orthogonal matrix Q ∈ Rn×n is a matrix that satisfies QT Q = I, or in other words
Q−1 = QT . We then have the following algorithm.
A QR factorisation exists for all non-singular A, and can be computed using Gram-Schmidt
orthonormalisation. First, note that the definition QT Q = I of an orthogonal matrix Q is equivalent
to the columns of Q forming an orthonormal set. To see this, denote by q1 , . . . , qn the columns of
Q. Then
Q^T Q = I  ⇔  (Q^T Q)ij = 1 for i = j and 0 for i ≠ j  ⇔  qi^T qj = 1 for i = j and 0 for i ≠ j.
For a unit vector u, the projection from Definition 0.7 simplifies to proju(x) = (u^T x) u.
The columns of Q form an orthonormal basis of R^n. They are constructed from the columns of A in the following way:
1. To construct q1, we normalise a1 to unit length: q1 = a1/‖a1‖2.
2. To construct q2, we start with a2, subtract its orthogonal projection onto q1, and normalise to unit length:
q̃2 = a2 − (a2^T q1) q1,
q2 = q̃2 / ‖q̃2‖2.
Subsequent columns qj are constructed in the same way, by subtracting from aj its projections onto q1, . . . , q_{j−1} and normalising. The construction only breaks down if some rjj = ‖q̃j‖2 is zero, but
rjj = 0 ⇒ q̃j = 0 ⇒ a1, . . . , aj are linearly dependent ⇒ A is singular.
Example. Consider the matrix A = [ 1 1 ]. Using Algorithm GS, we have
                                 [ 1 0 ]
j = 1:
r11 = ‖a1‖2 = √(1 + 1) = √2,
q1 = a1/r11 = (1/√2) (1, 1)^T,
j = 2:
r12 = q1^T a2 = 1/√2,
q̃2 = a2 − r12 q1 = (1/2) (1, −1)^T,
r22 = ‖q̃2‖2 = (1/2)√(1 + 1) = 1/√2,
q2 = q̃2/r22 = (1/√2) (1, −1)^T.
Hence, A = QR, with Q = (1/√2) [ 1  1 ] and R = [ √2 1/√2 ] = (1/√2) [ 2 1 ].
                               [ 1 −1 ]         [ 0  1/√2 ]          [ 0 1 ]
Computational cost
Theorem 2.15. The computational cost of Algorithm GS is
C(n) = 2n³ + n² + n.
Proof. Line GS.5 requires 2n operations using Algorithm DP, and Line GS.6 requires n multiplications and n subtractions. Summing over the inner loop:
Σ_{k=1}^{j−1} 4n = 4n(j − 1).
Line GS.8 requires 1 square root and 2n operations to calculate the dot product. Line GS.9 requires n divisions. Summing over the outer loop:
Σ_{j=1}^n (4n(j − 1) + 1 + 2n + n) = 4n Σ_{j=1}^n (j − 1) + (3n + 1)n
  = 4n Σ_{i=0}^{n−1} i + (3n + 1)n
  = 4n (n − 1)n/2 + 3n² + n
  = 2n³ + n² + n.
Solving the system Qy = b in step 2 of Algorithm SQR is done in 2n2 operations using Algo-
rithm MV.
Solving the system Rx = y in step 3 of Algorithm SQR is done in n2 operations using Algo-
rithm BS.
Theorem 2.16. The computational cost of Algorithm SQR is
Error analysis
Algorithm GS is known to produce large errors. For the computed QR factorisation Q̂R̂, we have
Q̂T Q̂ = I + ∆I , Q̂R̂ = A + ∆A
where ∆I, ∆A ∈ Rn×n are typically very large.
Algorithm GS can be stabilised by a small modification – replace line GS.5 by
4′: rkj = qk^T qj
Denote by Algorithm MGS (Modified Gram-Schmidt) Algorithm GS with line 4′ instead of GS.5.
In exact arithmetic, the two algorithms compute the same matrices Q and R. This can be seen from the following calculation: at iteration j in the outer loop and iteration k in the inner loop, we have in exact arithmetic
rkj^MGS = qk^T qj = qk^T ( aj − Σ_{l=1}^{k−1} rlj^MGS ql ) = qk^T aj − Σ_{l=1}^{k−1} rlj^MGS qk^T ql = qk^T aj = rkj^GS,
since qk^T ql = 0 for l ≠ k, as {q1, . . . , q_{j−1}} is an orthonormal set.
In floating point arithmetic, the computed vectors {q̂1 , . . . , q̂j−1 } will not be orthonormal. Al-
gorithm MGS takes this into account when computing q̂j ; Algorithm GS does not. This small
change makes a big difference in practice, and we have the following stability result for Algo-
rithm MGS.
Theorem 2.17. Denote by Q̂ and R̂ the QR factorisation of A computed by Algorithm MGS. Then
Q̂^T Q̂ = I + ∆I,  Q̂R̂ = A + ∆A,
where
‖∆I‖2 ≤ α1(n) εm κ2(A) / (1 − α1(n) εm κ2(A)),  if 1 − α1(n) εm κ2(A) > 0,
‖∆A‖2 ≤ α2(n) εm ‖A‖2.
Here, α1 (n) and α2 (n) are constants depending on n; their values increase with increasing n.
The proof of Theorem 2.17 can be found in Theorem 19.13 in the book Accuracy and Stability of
Numerical Algorithms, by N.J. Higham. It is based on Assumption 1.1, and involves going through
each operation performed in Algorithm MGS, accumulating all rounding errors.
The following example also demonstrates the improved stability properties of Algorithm MGS.
Example. Consider the matrix
A = [ 1 1 1 ].
    [ ε 0 0 ]
    [ 0 ε 0 ]
Suppose ε is small, such that fl(1 + ε²) = 1. Using Algorithm GS, we have
j = 1:
r11 = ‖a1‖2 = √(1 + ε²) ≈ 1 = r̂11,
q̂1 = a1/r̂11 = a1 = (1, ε, 0)^T,
j = 2:
r̂12 = q̂1^T a2 = 1,
q̃2 = a2 − r̂12 q̂1 = (0, −ε, ε)^T,
r̂22 = ‖q̃2‖2 = √(ε² + ε²) = √2 ε,
q̂2 = q̃2/r̂22 = (1/√2) (0, −1, 1)^T,
j = 3:
r̂13 = q̂1^T a3 = 1,
r̂23 = q̂2^T a3 = 0,
q̃3 = a3 − r̂13 q̂1 − r̂23 q̂2 = (0, −ε, 0)^T,
r̂33 = ‖q̃3‖2 = √(ε²) = ε,
q̂3 = q̃3/r̂33 = (0, −1, 0)^T.
So there is a significant loss of orthogonality between q̂2 and q̂3 , and ∆I is large. For
Algorithm MGS, on the other hand, we have
j = 1:
r11 = ‖a1‖2 = √(1 + ε²) ≈ 1 = r̂11,
q̂1 = a1/r̂11 = a1 = (1, ε, 0)^T,
j = 2:
r̂12 = q̂1^T a2 = 1,
q̃2 = a2 − r̂12 q̂1 = (0, −ε, ε)^T,
r̂22 = ‖q̃2‖2 = √(ε² + ε²) = √2 ε,
q̂2 = q̃2/r̂22 = (1/√2) (0, −1, 1)^T,
j = 3:
r̂13 = q̂1^T a3 = 1,
r̂23 = q̂2^T (a3 − r̂13 q̂1) = ε/√2,
q̃3 = a3 − r̂13 q̂1 − r̂23 q̂2 = (0, −ε/2, −ε/2)^T,
r̂33 = ‖q̃3‖2 = √(ε²/4 + ε²/4) = ε/√2,
q̂3 = q̃3/r̂33 = (1/√2) (0, −1, −1)^T.
Now q̂2^T q̂3 = 0, so the orthogonality between q̂2 and q̂3 is preserved.
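The loss of orthogonality can be reproduced numerically. The sketch below is a simplified implementation of both update rules (classical GS projects the original column aj, modified GS projects the current working vector), applied to the ε-matrix from the example; it illustrates the idea rather than reproducing the exact pseudocode of Algorithms GS and MGS.
import numpy as np

def gram_schmidt(A, modified=False):
    # QR factorisation by classical or modified Gram-Schmidt
    n = A.shape[1]
    Q = A.astype(float).copy()
    R = np.zeros((n, n))
    for j in range(n):
        for k in range(j):
            # classical GS uses the original column a_j,
            # modified GS uses the partially orthogonalised column
            R[k, j] = Q[:, k] @ (Q[:, j] if modified else A[:, j])
            Q[:, j] = Q[:, j] - R[k, j] * Q[:, k]
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] = Q[:, j] / R[j, j]
    return Q, R

eps = 1e-8                       # small enough that fl(1 + eps^2) = 1
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0]])

for modified in (False, True):
    Q, R = gram_schmidt(A, modified)
    loss = np.linalg.norm(Q.T @ Q - np.eye(3))
    print("MGS" if modified else "GS ", "loss of orthogonality:", loss)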
In line QR.4, e1 denotes the vector with a 1 in its first entry, and all other entries equal to 0.
Furthermore, sign(u1 ) = 1 if u1 > 0 and sign(u1 ) = −1 otherwise.
The matrix Hk is called a Householder reflection matrix. Applying Hk to a vector x reflects
the vector x about the hyperplane that is orthogonal to w:
Hk x = x − 2wwT x = x − 2 projw (x).
projw (x) is the projection of x onto w, and x − projw (x) is the projection of x onto the hyperplane
orthogonal to w. Hence, x − 2 projw (x) is the reflection of x about the hyperplane orthogonal to w:
(Figure: the vector x, its projection projw(x) onto w, and its reflection x − 2 projw(x) about the hyperplane orthogonal to w.)
Algorithm QR starts with R = A, and creates zeros below the diagonal in column k at iteration
k. This is achieved by using the Householder reflections Hk .
Lemma 2.18. The matrix Rk = Qk . . . Q1 A has zeros below the diagonal in columns 1 to k, for 1 ≤ k ≤ n − 1.
Proof. For k = 1, let u be the first column of A. Then
Q1 u = (I − 2ww^T) u.
Using the definition of w in lines QR.3–QR.5, a direct calculation (see Workshop 6) shows
Q1 u = − sign(u1) ‖u‖2 e1,
so the first column of R1 = Q1 A has zeros below its first entry. Now suppose Rk−1 has zeros below the diagonal in columns 1 to k − 1, and write
Rk−1 = [ R11  R12 ]
       [ 0    R22 ]
for some R11 ∈ R^{(k−1)×(k−1)} upper triangular, R22 ∈ R^{(n−k+1)×(n−k+1)}. Then, from line QR.7,
Rk = Qk Rk−1 = [ I  0  ] [ R11  R12 ] = [ R11  R12    ].
               [ 0  Hk ] [ 0    R22 ]   [ 0    Hk R22 ]
Now, u in line QR.3 is the first column of R22. As before, a direct calculation shows
Hk u = − sign(u1) ‖u‖2 e1.
Hence, the first column of Hk R22 has zeros everywhere except the first entry, and so the kth column of Rk has zeros below the diagonal.
Hence, Rk has zeros below the diagonal in columns 1 to k if Rk−1 has zeros below the diagonal in columns 1 to k − 1, and the statement of the Lemma holds by induction.
In particular, Lemma 2.18 shows that the output R = Rn−1 = Qn−1 . . . Q1 A is upper triangular.
Lemma 2.19. The matrix Qk in line QR.7 is orthogonal and symmetric, for 1 6 k 6 n − 1.
The matrix Q = Q1 · · · Qn−1 returned by Algorithm QR is then also orthogonal.
Proof. We have Q = IQ1 . . . Qn−1 . Since the product of orthogonal matrices is orthogonal, this
follows from Lemma 2.19.
k = 1:
u = a1 = (1, 1)^T
v = u + sign(u1)‖u‖2 e1 = (1, 1)^T + (√2, 0)^T = (1 + √2, 1)^T
w = v/‖v‖2 = (1/√((1 + √2)² + 1)) (1 + √2, 1)^T = (1/(√2 √(2 + √2))) (1 + √2, 1)^T
Hk = I − 2ww^T = [ 1 0 ] − (2/(2(2 + √2))) [ 3 + 2√2  1 + √2 ] = (1/√2) [ −1 −1 ]
                 [ 0 1 ]                   [ 1 + √2   1      ]          [ −1  1 ]
R = Hk A = (1/√2) [ −1 −1 ] [ 1 1 ] = (1/√2) [ −2 −1 ]
                  [ −1  1 ] [ 1 0 ]          [  0 −1 ]
Q = I Hk = Hk = (1/√2) [ −1 −1 ]
                       [ −1  1 ]
Hence, we have Q = (1/√2) [ −1 −1 ] and R = (1/√2) [ −2 −1 ]. Note that up to a change in sign,
                          [ −1  1 ]                [  0 −1 ]
this QR factorisation is the same as the one computed in the previous section using Algorithm GS. In fact the QR factorisation of a matrix can be shown to be unique up to changes in sign, and the QR factorisation can be made unique by requiring all diagonal entries of R to be positive.
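A compact Python sketch of QR factorisation by Householder reflections, along the lines of Algorithm QR (simplified, and assuming the relevant columns are non-zero); applied to the example matrix above it reproduces the factors up to rounding.
import numpy as np

def householder_qr(A):
    # QR factorisation via Householder reflections (sketch of Algorithm QR)
    n = A.shape[0]
    R = A.astype(float)
    Q = np.eye(n)
    for k in range(n - 1):
        u = R[k:, k].copy()
        s = 1.0 if u[0] > 0 else -1.0          # sign convention from line QR.4
        v = u.copy()
        v[0] += s * np.linalg.norm(u)          # v = u + sign(u1) ||u||_2 e_1
        w = v / np.linalg.norm(v)
        H = np.eye(n - k) - 2.0 * np.outer(w, w)   # reflection about the hyperplane orthogonal to w
        R[k:, k:] = H @ R[k:, k:]              # create zeros below the diagonal in column k
        Q[:, k:] = Q[:, k:] @ H                # accumulate Q = Q_1 ... Q_{n-1}
    return Q, R

A = np.array([[1.0, 1.0], [1.0, 0.0]])
Q, R = householder_qr(A)
print(np.round(Q, 3))
print(np.round(R, 3))
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(2)))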
Computational cost
Algorithm QR should be implemented efficiently. In particular, we do not want to use Algorithm MM
to compute the matrix-matrix products in lines QR.8 and QR.9.
Using the special structure of the matrices Hk and Qk, we can perform lines QR.8 and QR.9 using Algorithms MV and OP. For example, at iteration k, in line QR.8 we have
Qk R = [ I  0  ] [ R11  R12 ] = [ R11  R12    ],
       [ 0  Hk ] [ 0    R22 ]   [ 0    Hk R22 ]
with Hk R22 = R22 − 2ww^T R22, and we can compute z^T = w^T R22 using Algorithm MV, 2w using n − k + 1 multiplications and 2wz^T using Algorithm OP. Together with the (n − k + 1)² subtractions to compute R22 − 2wz^T, this gives a total cost of 4(n − k + 1)² + (n − k + 1) for line QR.8. Note that this is much smaller than the cost of 2n³ we would get by applying Algorithm MM to compute Qk R directly!
Algorithm QR has computational cost C(n) = (8/3)n³ + O(n²), so it is more expensive than Algorithms GS and MGS.
For Algorithm SQR, we do not need the matrix Q explicitly, but rather only the product Q^T b. Thus, we can modify the algorithm, and initialise y = b in line 1 and replace line QR.9 of Algorithm QR with
9′: y = Qk y.
This computes
y = Qn−1 · · · Q1 b = Q_{n−1}^T · · · Q_1^T b = (Q1 · · · Qn−1)^T b = Q^T b.
This reduces the cost of Algorithm QR to C(n) = (4/3)n³ + O(n²). This can be implemented efficiently using Algorithm DP as in Workshop 2 Exercise 1.
Error analysis
Algorithm SQR has better stability properties than Algorithm GEPP, which outweighs the larger
cost.
Theorem 2.21. Let Ax = b, and denote by x̂ the solution computed through Algorithm SQR
with a Householder QR factorisation. Then
(A + ∆A) x̂ = b,
The proof of Theorem 2.21 can be found in Theorem 19.5 in the book Accuracy and Stability of
Numerical Algorithms, by N.J. Higham. It is based on Assumption 1.1, and involves going through
each operation performed in Algorithm QR, accumulating all rounding errors.
Compared to the bounds in Theorem 2.9 for Algorithm GEPP, the size of the perturbation ∆A
only depends polynomially on n. Theorem 2.11, together with Theorem 2.21, can be used to bound
the error in the computed solution x
b.
Different choices for M and N result in different particular methods, and we will see two examples
in the next sections.
Since the error ek = x − xk is not computable in practice, we stop the iteration when the residual
rk = b − Axk is sufficiently small. Note that since SLE has unique solution, krk kp = 0 ⇔ xk = x.
Typical choices for the starting guess x0 include the zero vector and random vectors. Typical choices for the tolerance εr are for example 10^−3 ‖b‖p or 10^−6 ‖b‖p. Scaling the tolerance by ‖b‖p means that we are stopping the iteration when the relative residual ‖Axk − b‖p/‖b‖p is small enough.
Writing A = M + N, the exact solution x of (SLE) satisfies
Mx = b − Nx,
and the iterates satisfy
Mxk = b − Nxk−1.
Subtracting, the error ek = x − xk hence satisfies
Mek = −Nek−1,  i.e.  ek = (−M^{−1}N)ek−1 = (−M^{−1}N)^k e0 =: R^k e0.
Theorem 3.1. Let ek = Rk e0 , for k ∈ N and some R ∈ Rn×n . If kRkp < 1, then kek kp → 0 as
k → ∞.
Proof. Using the properties ‖By‖p ≤ ‖B‖p ‖y‖p and ‖B^k‖p ≤ ‖B‖p^k, for any B ∈ R^{n×n} and y ∈ R^n, we have
‖ek‖p = ‖R^k e0‖p ≤ ‖R^k‖p ‖e0‖p ≤ ‖R‖p^k ‖e0‖p → 0 as k → ∞,
since ‖R‖p < 1.
The system in line 5 can be solved efficiently using a diagonal solver: (xk)i = (yk)i / dii, for i = 1, . . . , n.
Example. Let A = [ 2 1 ] and b = (3, 3)^T. Consider solving the system Ax = b using the
                 [ 1 2 ]
Jacobi method with x0 = (0, 0)^T.
With M = D = [ 2 0 ] and N = L + U = [ 0 1 ], we then have
             [ 0 2 ]                 [ 1 0 ]
k = 1:
y1 = b − (L + U)x0 = (3, 3)^T − (0, 0)^T = (3, 3)^T,
Dx1 = y1  ⇒  x1 = (3/2, 3/2)^T.
k = 2:
y2 = b − (L + U)x1 = (3, 3)^T − (3/2, 3/2)^T = (3/2, 3/2)^T,
Dx2 = y2  ⇒  x2 = (3/4, 3/4)^T.
k = 3:
y3 = b − (L + U)x2 = (3, 3)^T − (3/4, 3/4)^T = (9/4, 9/4)^T,
Dx3 = y3  ⇒  x3 = (9/8, 9/8)^T.
Error analysis
Convergence of the Jacobi method follows from Theorem 3.1, with R = −D−1 (L + U).
The entries of R = −D^{−1}(L + U) are rii = 0 and rij = −aij/aii for j ≠ i. Then
‖R‖∞ = max_{1≤i≤n} Σ_{j=1}^n |rij| = max_{1≤i≤n} Σ_{j≠i} |aij|/|aii| = max_{1≤i≤n} (1/|aii|) Σ_{j≠i} |aij| < 1,
where in the last step we have used that A is strictly diagonally dominant. It then follows from Theorem 3.1 that ‖ek‖∞ → 0 as k → ∞.
Computational cost
The computational cost of the Jacobi method depends on:
The number of iterations kmax depends on the chosen accuracy εr , the norm of R and the starting
guess x0 . See the Python demo video and the computer lab in Week 7 for examples.
For each iteration of the Jacobi method, line JAC.4 requires n subtractions and one matrix-vector
multiplication with cost 2n2 , which is a total of 2n2 + n operations. Line JAC.5 requires n divisions.
So, in total, one iteration requires 2n2 +2n = O(n2 ) operations. The cost of Jacobi is then O(kmax n2 ),
which is typically much smaller than O(n3 ) for large n.
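A minimal sketch of the Jacobi iteration with the relative-residual stopping rule discussed above; the tolerance and starting guess are illustrative choices, and the test system is the 2 × 2 example used earlier.
import numpy as np

def jacobi(A, b, x0=None, tol=1e-6, kmax=1000):
    # Jacobi iteration: M = D, N = L + U, solve D x_k = b - (L + U) x_{k-1}
    n = len(b)
    d = np.diag(A)                       # diagonal of A
    N = A - np.diag(d)                   # off-diagonal part L + U
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(1, kmax + 1):
        y = b - N @ x                    # right hand side of the diagonal system
        x = y / d                        # diagonal solve
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k                  # converged: small relative residual
    return x, kmax

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 3.0])
x, k = jacobi(A, b)
print(x, "after", k, "iterations")       # approaches the true solution (1, 1)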
The matrix L + D is lower triangular, so the system in line 5 can be solved efficiently using
Algorithm FS.
Example. (ctd) Let A = [ 2 1 ] and b = (3, 3)^T. Consider solving the system Ax = b using
                       [ 1 2 ]
the Gauss-Seidel method with x0 = (0, 0)^T.
With M = L + D = [ 2 0 ] and N = U = [ 0 1 ], we then have
                 [ 1 2 ]             [ 0 0 ]
k = 1:
y1 = b − Ux0 = (3, 3)^T − (0, 0)^T = (3, 3)^T,
(L + D)x1 = y1  ⇒  x1 = (3/2, (3 − 3/2)/2)^T = (3/2, 3/4)^T.
k = 2:
y2 = b − Ux1 = (3, 3)^T − (3/4, 0)^T = (9/4, 3)^T,
(L + D)x2 = y2  ⇒  x2 = (9/8, (3 − 9/8)/2)^T = (9/8, 15/16)^T.
k = 3:
y3 = b − Ux2 = (3, 3)^T − (15/16, 0)^T = (33/16, 3)^T,
(L + D)x3 = y3  ⇒  x3 = (33/32, (3 − 33/32)/2)^T = (33/32, 63/64)^T.
Are we converging to the right solution? What do you notice about the speed of convergence
compared to the Jacobi method? What do you notice about the size of the errors in the first
and second component?
Error analysis
Theorem 3.3. Suppose A ∈ R^{n×n} is strictly diagonally dominant, i.e.
|aii| > Σ_{j≠i} |aij|,  ∀ i = 1, . . . , n.
Then the Gauss-Seidel method converges, i.e. ‖ek‖∞ → 0 as k → ∞.
Computational cost
The number of iterations kmax again depends on the chosen accuracy εr , the norm of R and the
starting guess x0 . Compared to Jacobi, we typically need fewer iterations, as illustrated by the
example above.
Line GAU.5 can be done using forward substitution, since L + D is lower triangular, and so costs
n2 operations. Line GAU.4 requires n subtractions and one matrix-vector multiplication, which costs
2n2 operations using Algorithm MV. This would give a total of 3n2 + n operations.
However, the matrix U in the matrix-vector product in line GAU.4 is strictly upper triangular, and so Algorithm MV is not efficient in this case. The strictly upper triangular structure of U implies that uij = 0 for j ≤ i, and so we have (Uv)i = Σ_{j=1}^n uij vj = Σ_{j=i+1}^n uij vj for any vector v. Hence, (Uv)i can be computed by applying Algorithm DP to the vectors u_{i,i+1:n} := [u_{i,i+1}, . . . , u_{i,n}]^T ∈ R^{n−i} and v_{i+1:n} := [v_{i+1}, . . . , vn]^T ∈ R^{n−i}. The cost of computing a matrix-vector product Uv is then C(n) = Σ_{i=1}^n 2(n − i) = 2 Σ_{j=0}^{n−1} j = (n − 1)n. This gives a total cost of 2n² operations per iteration of the Gauss-Seidel method.
The costs per iteration of Jacobi and Gauss-Seidel hence both have leading order term 2n2 , and
for large n, the difference between them is negligible. (Note that we can also slightly improve on the
cost per iteration of Jacobi, by noting that the matrix L+U has zeros on the diagonal and modifying
Algorithm MV correspondingly. This saves 1 multiplication and 1 addition per entry of the vector
(L + U)xk , so 2n operations in total. The costs per iteration of Jacobi and Gauss-Seidel are then
the same.)
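A matching sketch of the Gauss-Seidel iteration, with forward substitution for the lower triangular solve in line GAU.5; the tolerance is again an illustrative choice.
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-6, kmax=1000):
    # Gauss-Seidel iteration: M = L + D, N = U, solve (L + D) x_k = b - U x_{k-1}
    n = len(b)
    U = np.triu(A, 1)                    # strictly upper triangular part
    LD = np.tril(A)                      # L + D
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for k in range(1, kmax + 1):
        y = b - U @ x
        x_new = np.zeros(n)
        for i in range(n):               # forward substitution for (L + D) x = y
            x_new[i] = (y[i] - LD[i, :i] @ x_new[:i]) / LD[i, i]
        x = x_new
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, k
    return x, kmax

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 3.0])
print(gauss_seidel(A, b))                # converges to (1, 1), in fewer iterations than Jacobi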
4 Polynomial Fitting
Suppose we are given a set of points {(ai, bi)}_{i=1}^m, and we want to find the degree m − 1 polynomial that goes through these points. Then we need to determine coefficients x ∈ R^m such that
bi = x1 + x2 ai + x3 ai² + . . . + xm ai^{m−1},  for i = 1, . . . , m.
This corresponds to solving the linear system Ax = b, where b ∈ R^m is the vector containing {bi}_{i=1}^m, and the matrix A ∈ R^{m×m} is the Vandermonde matrix with entries aij = ai^{j−1}:
Ax = b  ⇔  (Ax)i = bi, ∀ i ∈ {1, . . . , m}  ⇔  Σ_{j=1}^m aij xj = bi, ∀ i ∈ {1, . . . , m}.
Vandermonde matrices are very ill-conditioned, especially when the number of points m is large (see
Computer lab 7).
An illustration of polynomial fitting applied to a specific set of observed points {(ai, bi)}_{i=1}^{10} is given in Figure 1. The data points are indicated by circles, and the interpolating degree 9 polynomial is
in Figure 1. The data points are indicated by circles, and the interpolating degree 9 polynomial is
shown in the dotted line. We can see that even though the dotted line does pass through each of the
observed points, it does not seem to be a good indicator of the general trend in the data. Suppose
we want to predict the value of the output b at the input a = 0.95. Do you think the polynomial
gives a reasonable prediction here?
What if we instead want to find the straight line that best fits our observations {(ai, bi)}_{i=1}^m? Let the line have equation b = x1 + x2 a. Then we want to choose x ∈ R² so as to minimise
Σ_{i=1}^m ((x1 + x2 ai) − bi)²,
i.e. we want to minimise the difference between our observed outputs bi and the predicted outputs x1 + x2 ai of the straight line. In terms of Figure 1, this corresponds to minimising the vertical distance between the circles and the straight line.
With the Vandermonde matrix A ∈ R^{m×2} with entries aij = ai^{j−1}, and b ∈ R^m the vector containing {bi}_{i=1}^m, this corresponds to x minimising
‖Ax − b‖2².
Since kAx − bk2 is always non-negative, this is equivalent to minimising kAx − bk2 .
This can be generalised to fitting a polynomial of degree n − 1 to the data points {(ai, bi)}_{i=1}^m, in which case we have the Vandermonde matrix A ∈ R^{m×n}.
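A short numpy sketch of both fits on made-up data: interpolation through all points with the square Vandermonde system, and the least squares straight line via numpy's least squares solver.
import numpy as np

# made-up observations (a_i, b_i)
a = np.linspace(0.0, 1.0, 10)
b = np.sin(2 * np.pi * a) + 0.1 * np.random.randn(10)

# interpolation: square Vandermonde system, A_ij = a_i^{j-1}
A_full = np.vander(a, N=len(a), increasing=True)
coef_interp = np.linalg.solve(A_full, b)          # degree 9 polynomial through all points

# least squares straight line: m x 2 Vandermonde matrix
A_line = np.vander(a, N=2, increasing=True)
coef_line, res, rank, sv = np.linalg.lstsq(A_line, b, rcond=None)

print(np.linalg.cond(A_full))   # Vandermonde matrices are typically very ill-conditioned
print(coef_line)                # [x1, x2]: intercept and slope of the best fitting line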
Theorem 5.1. Let A ∈ Rm×n , b ∈ Rm . A vector x ∈ Rn minimises kAx − bk2 if and only if it
solves the normal equations
AT Ax = AT b.
Proof. ⇒: Suppose x ∈ R^n minimises ‖Ax − b‖2. Let g(x) = ‖Ax − b‖2². Then minimising ‖Ax − b‖2 is equivalent to minimising g. The gradient of g is given by
∇g(x) = ∇(x^T A^T Ax − 2x^T A^T b + b^T b) = 2A^T Ax − 2A^T b.
At a minimiser, the gradient vanishes, so 2A^T Ax − 2A^T b = 0, i.e. A^T Ax = A^T b.
⇐: Suppose x ∈ R^n solves the normal equations, and let y ∈ R^n. Then
‖Ay − b‖2² = ‖(Ax − b) + A(y − x)‖2² = ‖Ax − b‖2² + 2(y − x)^T A^T (Ax − b) + ‖A(y − x)‖2² = ‖Ax − b‖2² + ‖A(y − x)‖2²,
since A^T (Ax − b) = 0. Since ‖Ac‖2² ≥ 0 for all c ∈ R^n, this shows that ‖Ay − b‖2² ≥ ‖Ax − b‖2² for all y ∈ R^n. In other words, x ∈ R^n minimises ‖Ax − b‖2² and hence minimises ‖Ax − b‖2.
In general, the matrix A^T A need not be invertible, and the minimiser of ‖Ax − b‖2 is not unique. If (and only if) we have m ≥ n and rank(A) = n, then the matrix A^T A is invertible, and the normal equations have a unique solution
x = (A^T A)^{−1} A^T b.
The matrix A† = (A^T A)^{−1} A^T ∈ R^{n×m} is in that case called the Moore-Penrose pseudo-inverse of A. If A is invertible, then A† = A^{−1}.
The normal equations approach then consists of two steps:
• Computing A^T A and A^T b. This can be done using Algorithms MM and MV, respectively, at cost 2mn² and 2mn.
• Solving A^T Ax = A^T b. This can be done using Algorithm GEPP, at cost (2/3)n³ + 3n² − (5/3)n.
Since the matrix A^T A is symmetric, we can essentially halve the computational cost of the steps above using efficient implementations. For step 2, we can use the Cholesky factorisation as in Workshop 4 Exercise 4, instead of the LU factorisation, which reduces the cost of solving the system to (1/3)n³ + 3n² + (2/3)n. We can reduce the cost of step 1 by modifying Algorithm MM to only compute the entries (A^T A)ij for i ≤ j, since (A^T A)ji = (A^T A)ij. This means we are computing Σ_{j=1}^n j = n(n + 1)/2 entries of A^T A, each of which involves an application of Algorithm DP at cost 2m, which gives a total cost of mn² + mn to compute A^T A.
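A sketch of the normal equations approach using numpy's Cholesky routine for step 2 (for simplicity the two triangular systems are solved with np.linalg.solve; a dedicated triangular solver would be used in practice). The data are made up.
import numpy as np

def lsq_normal_equations(A, b):
    # solve min ||Ax - b||_2 via A^T A x = A^T b, with A^T A = G G^T (Cholesky)
    G = np.linalg.cholesky(A.T @ A)      # lower triangular G
    y = np.linalg.solve(G, A.T @ b)      # forward substitution G y = A^T b
    return np.linalg.solve(G.T, y)       # back substitution G^T x = y

m, n = 20, 3
A = np.random.rand(m, n)                 # made-up full-rank data matrix
b = np.random.rand(m)
x = lsq_normal_equations(A, b)
print(np.allclose(A.T @ (A @ x), A.T @ b))   # x satisfies the normal equations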
For the solution x̂ computed using Algorithm GEPP to solve the normal equations, an application of Theorem 2.11 gives us an error bound proportional to κp(A^T A). This dependency on κp(A^T A) is undesirable, since it turns out that κp(A^T A) = κp(A)². So even if A is only mildly ill-conditioned, A^T A can be severely ill-conditioned and the accuracy in the computed solution can be very poor.
Above, we have used the condition number κp (A) of the rectangular matrix A, which is defined
analogously to the condition number of square matrices in Definition 2.10.
Definition 5.2. The condition number κp(A) of A ∈ R^{m×n} with respect to the p-norm is
κp(A) = ‖A‖p ‖A†‖p if rank(A) = n, and κp(A) = ∞ otherwise.
We can show that κp (AT A) = κp (A)2 using the singular value decomposition of A.
A = UΣVT
An SVD exists for all matrices A ∈ Rm×n . The singular values σ1 , . . . , σp are uniquely determined
(up to permutation), whereas the singular vectors are not uniquely determined.
If A = UΣVT is an SVD, then
Avi = σi ui , and uTi A = σi viT ,
where ui is the ith column of U and vi is the ith column of V.
Since U and V are invertible, we have
rank(A) = rank(UΣVT ) = rank(Σ).
Since Σ is diagonal, the rank of Σ is equal to the number of non-zero diagonal entries. Any column
with a zero diagonal entry will be equal to the zero vector. For A ∈ Rm×n with m > n, we hence
have
rank(A) = n iff σ1 , . . . , σn > 0,
i.e. A is of full rank iff it does not have a zero singular value.
We have the following equivalent of Theorem 2.12.
Theorem 5.4. Let A ∈ R^{m×n}, with m ≥ n and singular value decomposition A = UΣV^T. Then the condition number of A in the 2-norm is
κ2(A) = ‖A‖2 ‖A†‖2 = σmax/σmin ≥ 1,
where σmax = max_{1≤i≤n} σi and σmin = min_{1≤i≤n} σi.
Proof. If A is not of rank n, then σmin = 0, and σmax/σmin = ∞. This is consistent with Definition 5.2.
We will prove the result by proving ‖A‖2 = σmax and ‖A†‖2 = σmin^{−1}. We start with the former. We first show ‖A‖2 ≤ σmax, and will afterwards show ‖A‖2 ≥ σmax.
By definition of the induced 2-norm, we have
‖A‖2 = max_{x≠0} ‖UΣV^T x‖2 / ‖x‖2.
Since the orthogonal matrix U does not change the 2-norm of a vector,
max_{x≠0} ‖UΣV^T x‖2 / ‖x‖2 = max_{x≠0} ‖ΣV^T x‖2 / ‖x‖2.
Since V^T is also orthogonal, ‖V^T x‖2 = ‖x‖2, and V^T x ranges over all non-zero vectors as x does, so
max_{x≠0} ‖ΣV^T x‖2 / ‖x‖2 = max_{x≠0} ‖ΣV^T x‖2 / ‖V^T x‖2 = max_{y≠0} ‖Σy‖2 / ‖y‖2.
Since Σ is diagonal, ‖Σy‖2 = (Σ_{i=1}^n σi² yi²)^{1/2} ≤ σmax ‖y‖2, which implies
‖A‖2 ≤ σmax.
We now show that ‖A‖2 ≥ σmax. Let ej ∈ R^n be the vector with a 1 in its jth entry and zeros in all other entries, where j is such that σj = σmax. Then
‖A‖2 = max_{y≠0} ‖Ay‖2/‖y‖2 = max_{y≠0} ‖Σy‖2/‖y‖2 ≥ ‖Σej‖2/‖ej‖2 = σmax.
Using the singular value decomposition, we can prove κ2 (AT A) = κ2 (A)2 (see Workshop 8), and
so the error in the solution x̂ computed by solving the normal equations with Algorithm GEPP can
be very large. We want to avoid working with the matrix AT A, and work instead directly with A.
If A = Q1 R1 is the reduced QR factorisation of A, y = Q1^T b, and x solves R1 x = y, then
A^T Ax = R1^T Q1^T Q1 R1 x = R1^T R1 x = R1^T y = R1^T Q1^T b = A^T b,
so x solves the normal equations and hence minimises ‖Ax − b‖2.
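A minimal sketch of this approach using numpy's reduced QR factorisation, on made-up data:
import numpy as np

def lsq_qr(A, b):
    # solve min ||Ax - b||_2 via the reduced QR factorisation A = Q1 R1
    Q1, R1 = np.linalg.qr(A, mode='reduced')   # Q1 is m x n with orthonormal columns
    y = Q1.T @ b
    return np.linalg.solve(R1, y)              # back substitution R1 x = y

m, n = 20, 3
A = np.random.rand(m, n)
b = np.random.rand(m)
x = lsq_qr(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # agrees with numpy's least squares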
Computational cost
The QR factorisation can be computed using Algorithm QR:
Error analysis
By working directly with the matrix A, rather than AT A, we get rid of the dependency of the error
on κ(AT A), and instead get a dependency on κ(A).
Theorem 5.5. Let x be the minimiser of kAx − bk2 , and denote by x̂ the solution computed
through Algorithm LSQ-QR with a Householder QR factorisation. Then x̂ is the minimiser of
k(A + ∆A)x − bk2 where, with α > 0 a small integer constant,
The proof is similar to that of Theorem 2.21, see Theorem 20.3 in Accuracy and Stability of
Numerical Algorithms by Higham.
Theorem 5.6. Let x be the minimiser of ‖Ax − b‖2, and x̂ be the minimiser of ‖(A + ∆A)x − b‖2. If rank(A) = rank(A + ∆A) = n and κ2(A) ‖∆A‖2/‖A‖2 < 1, then
‖x − x̂‖2/‖x‖2 ≤ κ2(A) / (1 − κ2(A) ‖∆A‖2/‖A‖2) · ( ‖∆A‖2/‖A‖2 + (κ2(A) + 1) ‖r‖2/(‖A‖2 ‖x‖2) ),
where r = b − Ax.
For a proof of Theorem 5.6, see Theorem 20.1 in Accuracy and Stability of Numerical Algorithms
by Higham. The proof uses the pseudo-inverses A† and (A + ∆A)† .
To compute the reduced SVD of A, we make use of the following: The factorisation A = UΣVT
is equivalent to AV = UΣVT V = UΣ, since V is invertible and VT V = I. This gives
Avi = σi ui , for i = 1, . . . , n.
Similarly, the factorisation AT = VΣT UT is equivalent to AT U = VΣT UT U = VΣT , since U is
invertible and UT U = I. This gives
AT ui = σi vi , for i = 1, . . . , n.
Together, this implies
A^T Avi = σi A^T ui = σi² vi,  for i = 1, . . . , n.
So we need to compute the eigenvalues {σi²}_{i=1}^n of A^T A ∈ R^{n×n}, together with its corresponding eigenvectors {vi}_{i=1}^n. The vectors {ui}_{i=1}^n can then be computed as ui = (1/σi) Avi. This will give us the reduced SVD.
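A sketch of this recipe, computing the eigendecomposition of A^T A with numpy and comparing the resulting singular values with numpy's own SVD; the matrix is made up.
import numpy as np

m, n = 6, 3
A = np.random.rand(m, n)

# eigenvalues/eigenvectors of the symmetric matrix A^T A, sorted in decreasing order
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

sigma = np.sqrt(lam)                 # singular values sigma_i
U = (A @ V) / sigma                  # u_i = (1 / sigma_i) A v_i, column by column

print(np.allclose((U * sigma) @ V.T, A))        # A = U Sigma V^T (reduced SVD)
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))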
Example. Consider the matrix
A = [ 1 2 ].
    [ 2 1 ]
Since A is square, the SVD and the reduced SVD are the same. To find an SVD, we follow the steps above. We have
A^T A = [ 5 4 ].
        [ 4 5 ]
The eigenvalues of A^T A satisfy
0 = det(A^T A − λI) = (5 − λ)² − 16 = λ² − 10λ + 9 = (λ − 1)(λ − 9),
so λ1 = 9 and λ2 = 1, giving singular values σ1 = 3 and σ2 = 1, with corresponding orthonormal eigenvectors v1 = (1/√2)(1, 1)^T and v2 = (1/√2)(−1, 1)^T.
For the singular vectors u1 and u2, we have
u1 = (1/σ1) Av1 = (1/√2) (1, 1)^T  and  u2 = (1/σ2) Av2 = (1/√2) (1, −1)^T.
This gives U = [ u1 u2 ] = [ 1/√2   1/√2 ].
                           [ 1/√2  −1/√2 ]
6 Eigenvalue Problems
Given A ∈ Rn×n , we want to find λ ∈ C and x ∈ Cn \ {0} such that
Ax = λx.
The eigenvalues of A are the roots of its characteristic polynomial ξA (λ) = det(A − λI). The
below theorem in Galois theory shows that it is not possible to find an algorithm that calculates
eigenvalues exactly after a finite number of steps. Any possible algorithm would be based on the
operations mentioned in the theorem. All algorithms for computing eigenvalues are hence iterative,
and only yield an approximate solution. We want to avoid applying root finding algorithms to the
characteristic polynomial, since this is a very unstable (and expensive) approach.
Theorem 6.1. (Abel, 1824) For all n ≥ 5, there exists a polynomial of degree n with rational co-
efficients that has a real root which cannot be expressed by only using rational numbers, addition,
subtraction, multiplication, division and taking kth roots.
We will restrict our attention to symmetric matrices A, which have real eigenvalues and eigenvectors.
Theorem 6.2. Let A ∈ R^{n×n} be symmetric.
a) All eigenvalues λ1, . . . , λn of A are real.
b) The eigenvectors x1, . . . , xn can be chosen such that they form an orthonormal basis of R^n.
Proof. This is also known as the Spectral Theorem, and can be proved by induction on n, see e.g.
Theorem 5.20 in Linear Algebra: a Modern Introduction by D. Poole.
Theorem 6.2 shows that symmetric matrices A diagonalise as A = QDQT , where the columns
of the orthogonal matrix Q ∈ Rn×n are the eigenvectors of A, and the diagonal matrix D ∈ Rn×n
has the corresponding eigenvalues as diagonal entries, in the same order.
We order the eigenvalues of A in decreasing order of absolute value, |λ1| ≥ |λ2| ≥ . . . ≥ |λn|. For x ∈ R^n \ {0}, the Rayleigh quotient of x is
rA(x) = x^T Ax / x^T x.
• given an eigenvalue λi , we can find the corresponding eigenvector xi by solving the system of
equations (A − λi I)xi = 0.
• given an eigenvector xi , we can find the eigenvalue λi by computing the Rayleigh quotient
rA (xi ).
Writing z(0) = Σ_{i=1}^n ci xi in the basis of eigenvectors, we have A^k z(0) = Σ_{i=1}^n ci λi^k xi. For large k ∈ N, this sum will be dominated by the term corresponding to the eigenvalue of A with largest absolute value. For this reason, the eigenvalue λ1 of A with largest absolute value is known as the dominating eigenvalue, and denoted by λmax. We denote the corresponding eigenvector by xmax.
As with all iterative methods, we need to stop the iteration when we are "close enough" to
the real answer. Typically, we stop the iteration when either kz(k) − z(k−1) k2 , |λ(k) − λ(k−1) | or
‖Az(k) − λ(k) z(k)‖2 is smaller than some tolerance ε. With a stopping criterion based on the closeness of eigenvectors, we have to be careful with a possible change of sign of z(k) with each iteration. This happens when λmax < 0, and λmax^k changes sign with each iteration.
The algorithm calculates an approximate eigenvector as z(k) = A^k z(0) / ‖A^k z(0)‖2. To improve stability and avoid working with very large or very small numbers, the vector z(k) is normalised in every step of the iteration.
Example. Let A = [ 5/2  −3/2 ]. This matrix diagonalises as A = QDQ^T, with D = [ 1 0 ]
                 [ −3/2  5/2 ]                                                  [ 0 4 ]
and Q = (1/√2) [ 1  1 ]. Let z(0) = (1, 0)^T. Then
               [ 1 −1 ]
Az(0) = (5/2, −3/2)^T,
A²z(0) = (17/2, −15/2)^T,
A³z(0) = (65/2, −63/2)^T.
Computational cost
Line 5 of Algorithm PI requires one matrix-vector multiplication, at cost 2n2 using Algorithm MV.
Line 6 requires one application of Algorithm DP at cost 2n and one square root to compute kw(k) k2 ,
and n divisions. Computing the Rayleigh quotient in line 6 requires one matrix-vector multiplication at cost 2n² and one dot product at cost 2n. Note that since z(k) is normalised to length 1, the denominator in the Rayleigh quotient is simply one. So one iteration step costs O(n²) in total.
The number of iterations kmax required depends on the speed of convergence (i.e. on |λ2|/|λ1|) and the stopping criterion employed.
The total cost of Power Iteration is then kmax(4n² + 5n + 1).
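A minimal sketch of Power Iteration with the Rayleigh quotient and a residual-based stopping rule; the tolerance and starting vector are illustrative, and the test matrix is the 2 × 2 example above.
import numpy as np

def power_iteration(A, z0, tol=1e-10, kmax=1000):
    # approximate the dominating eigenpair of a symmetric matrix A
    z = z0 / np.linalg.norm(z0)
    for k in range(1, kmax + 1):
        w = A @ z                          # apply A
        z = w / np.linalg.norm(w)          # normalise to length 1
        lam = z @ A @ z                    # Rayleigh quotient (denominator is 1)
        if np.linalg.norm(A @ z - lam * z) <= tol:
            return lam, z, k
    return lam, z, kmax

A = np.array([[2.5, -1.5], [-1.5, 2.5]])
lam, z, k = power_iteration(A, np.array([1.0, 0.0]))
print(lam, z, k)     # approaches the dominating eigenvalue 4 and eigenvector (1, -1)/sqrt(2)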
Error Analysis
We have the following result on the convergence of the Power Iteration algorithm.
Theorem 6.5. Suppose |λ1| > |λ2| and x1^T z(0) ≠ 0. Then there exists a sequence {σk}_{k∈N} with σk = ±1 such that, as k → ∞,
‖σk z(k) − x1‖2 → 0  and  |λ(k) − λ1| → 0.
Proof. Write z(0) = Σ_{i=1}^n ci xi in the basis of eigenvectors, with c1 = x1^T z(0) ≠ 0. Then A^k z(0) = Σ_{i=1}^n ci λi^k xi = c1 λ1^k (x1 + yk), where yk = Σ_{i=2}^n (ci/c1)(λi/λ1)^k xi. Since |λ1| > |λ2|, the vector yk satisfies yk → 0 as k → ∞.
Then
z(k) = A^k z(0) / ‖A^k z(0)‖2
     = c1 λ1^k (x1 + yk) / ‖c1 λ1^k (x1 + yk)‖2
     = (c1 λ1^k / |c1 λ1^k|) · (x1 + yk) / ‖x1 + yk‖2
     = σk^{−1} (x1 + yk) / ‖x1 + yk‖2,
with σk^{−1} = c1 λ1^k / |c1 λ1^k|.
The first claim of the Theorem then follows:
‖σk z(k) − x1‖2 = ‖ (x1 + yk)/‖x1 + yk‖2 − x1 ‖2 → 0 as k → ∞,
since yk → 0 and ‖x1 + yk‖2 → ‖x1‖2 = 1.
For the second claim, we use the definition of the Rayleigh quotient:
λ(k) = rA(z(k)) = (z(k))^T A z(k) = σk² (z(k))^T A z(k) = (σk z(k))^T A (σk z(k))
     = (1/‖x1 + yk‖2²) (x1 + yk)^T A (x1 + yk)
     → x1^T A x1 as k → ∞,
since yk → 0 as k → ∞. The claim then follows since λ1 = rA(x1) = x1^T A x1, which gives |λ(k) − λ1| → 0.
The proof of Theorem 6.5 reveals that the speed of convergence of z(k) and λ(k) depends on the ratio |λ2|/|λ1|. (This directly influences the size of the vector yk.) Sharpening the proof of Theorem 6.5, we can show that
‖σk z(k) − x1‖2 ≤ α1 (|λ2|/|λ1|)^k,
|λ(k) − λ1| ≤ α2 (|λ2|/|λ1|)^{2k},
for some constants α1, α2 ∈ R. So the bigger the gap between |λ1| and |λ2|, the faster Power Iteration converges.
Note that by computing λ(k) = rA (z(k) ), we have λ(k) ≈ λj , where λj is the eigenvalue of A closest
to s. The inverse iteration algorithm is obtained by setting s = 0.
Computational cost
Compared to Algorithm PI, the only line that has changed is line 5. Instead of performing a matrix-
vector multiplication, we now have to solve a system of linear equations of size n. This can be done
using Algorithm GEPP. Since the matrix A − sI is the same for each k, and only the right hand side
z(k−1) changes, we can compute the LU factorisation with permutation of A − sI once, and reuse this
for all values of k. This gives a one-off cost of O(n3 ) and a cost per iteration of O(n2 ).
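A sketch of inverse iteration with shift, reusing a single LU factorisation with partial pivoting (computed here with scipy) for every iteration; the shift, tolerance and starting vector are illustrative choices.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, s, z0, tol=1e-10, kmax=1000):
    # approximate the eigenpair of symmetric A whose eigenvalue is closest to the shift s
    n = A.shape[0]
    lu_piv = lu_factor(A - s * np.eye(n))    # one-off O(n^3) factorisation, reused below
    z = z0 / np.linalg.norm(z0)
    for k in range(1, kmax + 1):
        w = lu_solve(lu_piv, z)              # solve (A - sI) w = z at O(n^2) cost
        z = w / np.linalg.norm(w)
        lam = z @ A @ z                      # Rayleigh quotient
        if np.linalg.norm(A @ z - lam * z) <= tol:
            return lam, z, k
    return lam, z, kmax

A = np.array([[2.5, -1.5], [-1.5, 2.5]])
print(shifted_inverse_iteration(A, s=0.9, z0=np.array([1.0, 0.0])))  # finds the eigenvalue 1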
Error Analysis
The following is a direct consequence of Theorem 6.5.
Theorem 6.6. Let s ∈ R. Suppose a, b ∈ {1, 2, . . . , n} are such that |λa − s|^{−1} > |λb − s|^{−1} ≥ |λi − s|^{−1}, for i ∈ {1, . . . , n} \ {a, b}, and xa^T z(0) ≠ 0. Then there exists a sequence {σk}_{k∈N} with σk = ±1 such that, as k → ∞,
‖σk z(k) − xa‖2 → 0  and  |λ(k) − λa| → 0.
The speed of convergence again depends on how well the eigenvalues are separated, i.e. on the size of |λa − s|/|λb − s|:
‖σk z(k) − xa‖2 ≤ α1 (|λa − s|/|λb − s|)^k,
|λ(k) − λa| ≤ α2 (|λa − s|/|λb − s|)^{2k}.
In line 6, it is crucial that we do not just normalise all the columns of Z(k) to length 1, but also
make the columns orthogonal. As seen in the analysis of Power Iteration, all columns of Ak Z(0)
would converge to multiples of the dominating eigenvector x1. In fact, the first column of Z(k) is exactly the vector z(k) computed by Algorithm PI with starting guess z(0) equal to the first column of Z(0). (It is multiplied by A in line 3 and then normalised to length 1 in line 6.)
In order to make sure that the second column of Z(k) converges to x2 , we need to subtract the
component in the direction of x1 . What is left will then be dominated by the component in the
direction of x2 . Similarly, we need to subtract the components in the directions of x1 , . . . , xi−1 from
the ith column of Z(k) . This is exactly what the QR factorisation does. Think back to how Gram-
Schmidt orthonormalisation works: The first column of Z(k) is the first column of W(k) , normalised
to length 1. To obtain the second column of Z(k) , we take the second column of W(k) , subtract the
projection onto the first column of W(k) , and normalise to length 1. Since the first column of Z(k) is
converging to x1, this will eventually get rid of the term in the direction of x1 in the second column of Z(k). The argument for the remainder of the columns is similar.
For the Orthogonal Iteration Algorithm to converge, we need the assumption that A has distinct
eigenvalues:
|λ1 | > |λ2 | > · · · > |λn |.
The speed of convergence of the columns of Z(k) to the eigenvectors of A and the diagonal entries of
Λ(k) to the eigenvalues of A, depends on how well the eigenvalues are separated. More precisely, it
depends on the quantity
max_{1≤l≤n−1} |λ_{l+1}| / |λ_l|.
Algorithm OI is unstable when implemented in practice, and small rounding errors typically lead
to large errors in the computed eigenvectors and eigenvalues. To see why this is the case, consider
extracting the component in the direction of xn of the vector
A^k z(0) = Σ_{i=1}^n ci λi^k xi.
Theoretically, this can be done following the procedure in Algorithm GS, and subtracting the projec-
tions onto x1 , . . . , xn−1 . However, the component in the direction of xn will be very small compared
to the other terms in the sum, and so numerically this component might get lost in the presence of
rounding errors.
Algorithm QR-Eig in the next section is a much more stable way of implementing the idea behind
Algorithm OI.
Example. (ctd) Let A = [ 5/2  −3/2 ]. This matrix diagonalises as A = QDQ^T, with
                       [ −3/2  5/2 ]
D = [ 1 0 ] and Q = (1/√2) [ 1  1 ]. Let Z(0) = [ 1 0 ] = I. Then, to 3 significant figures, we have
    [ 0 4 ]                [ 1 −1 ]            [ 0 1 ]
k = 1:
W(1) = AZ(0) = [ 5/2  −3/2 ]
               [ −3/2  5/2 ]
W(1) = [ −0.857 0.514 ] [ −2.91 2.57 ] = Z(1) R(1)
       [ 0.514  0.857 ] [ 0     1.37 ]
Λ(1) = (Z(1))^T A Z(1) = [ 3.82  0.706 ]
                         [ 0.706 1.18  ]
k = 2:
W(2) = AZ(1) = [ −2.92 0    ]
               [ 2.57  1.37 ]
W(2) = [ −0.750 0.661 ] [ 3.89 0.908 ] = Z(2) R(2)
       [ 0.662  0.750 ] [ 0    1.03  ]
Λ(2) = (Z(2))^T A Z(2) = [ 3.99  0.187 ]
                         [ 0.187 1.01  ]
k = 3:
W(3) = AZ(2) = [ −2.87 0.529 ]
               [ 2.78  0.882 ]
W(3) = [ −0.718 0.696 ] [ 3.99 0.234 ] = Z(3) R(3)
       [ 0.696  0.718 ] [ 0    1.00  ]
Λ(3) = (Z(3))^T A Z(3) = [ 4.00  0.047 ]
                         [ 0.047 1.00  ]
We see that the columns of Z(k) converge very quickly to a multiple of the eigenvectors
x1 = (1/√2) (1, −1)^T ≈ (0.707, −0.707)^T and x2 = (1/√2) (1, 1)^T ≈ (0.707, 0.707)^T.
The diagonal entries of Λ(k) converge to the eigenvalues 4 and 1, respectively, and the off-diagonal entries converge to zero. As expected from the error analysis for Algorithm PI, we can see that the estimates of the eigenvalues converge faster than the estimates of the eigenvectors.
Furthermore,
Example. (ctd) Let A = [ 5/2  −3/2 ]. This matrix diagonalises as A = QDQ^T, with
                       [ −3/2  5/2 ]
D = [ 1 0 ] and Q = (1/√2) [ 1  1 ]. Then, to 3 significant figures, we have
    [ 0 4 ]                [ 1 −1 ]
Λ(0) = A = [ 2.5  −1.5 ]
           [ −1.5  2.5 ]
k = 1:
Λ(0) = [ −0.857 0.514 ] [ −2.92 2.57 ] = Q(1) R(1)
       [ 0.514  0.857 ] [ 0     1.37 ]
Λ(1) = R(1) Q(1) = [ 3.82  0.706 ]
                   [ 0.706 1.18  ]
k = 2:
Λ(1) = [ −0.983 −0.182 ] [ −3.89 −0.908 ] = Q(2) R(2)
       [ −0.182  0.983 ] [ 0      1.03  ]
Λ(2) = R(2) Q(2) = [ 3.99   −0.187 ]
                   [ −0.187  1.01  ]
k = 3:
Λ(2) = [ −0.999 0.0468 ] [ −3.99 0.234 ] = Q(3) R(3)
       [ 0.0468 0.999  ] [ 0     1.00  ]
Λ(3) = R(3) Q(3) = [ 4.00  0.047 ]
                   [ 0.047 1.00  ]
We see that the matrices Λ(k) are indeed the same matrices as computed by Algorithm OI,
differing only by a possible change of sign in the off-diagonal entries. This is due to the QR
factorisation only being unique up to a change of sign.
Although the matrices Λ(k) are the same for Algorithm OI and Algorithm QR-Eig when
computed in exact arithmetic, there is typically a substantial difference when implemented
in floating point arithmetic. The matrices computed with Algorithm QR-Eig will be more
accurate.
Computational Cost
In each iteration, we need to perform a matrix-matrix multiplication and find a QR factorisation,
both of which have cost of O(n3 ). Hence, the total cost of Algorithm QR-Eig is O(kmax n3 ).
If we use Algorithm MGS (Modified Gram-Schmidt) and Algorithm MM (Matrix-matrix multi-
plication), then the total cost of Algorithm QR-Eig is kmax (4n3 + n2 + n). But note that R is upper
triangular, and so the efficiency of Algorithm MM can be improved.
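A minimal sketch of Algorithm QR-Eig using numpy's QR factorisation; kmax, the tolerance and the convergence check are illustrative choices.
import numpy as np

def qr_eig(A, kmax=100, tol=1e-10):
    # QR iteration for a symmetric matrix: Lambda_k = R_k Q_k with Q_k R_k = Lambda_{k-1}
    Lam = A.astype(float)
    for k in range(kmax):
        Q, R = np.linalg.qr(Lam)         # QR factorisation of the current iterate
        Lam = R @ Q                      # similarity transform: R Q = Q^T Lambda_{k-1} Q
        if np.linalg.norm(Lam - np.diag(np.diag(Lam))) <= tol:
            break                        # off-diagonal entries are negligible
    return np.diag(Lam)                  # approximate eigenvalues

A = np.array([[2.5, -1.5], [-1.5, 2.5]])
print(qr_eig(A))                         # approximately [4, 1]
print(np.linalg.eigvalsh(A))             # numpy's eigenvalues for comparison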