Linear Systems
When we solve a linear system $Ax = b$ we often do not know $A$ and $b$ exactly, but have only approximations $\hat{A}$ and $\hat{b}$ available. Then the best we can do is to solve $\hat{A}\hat{x} = \hat{b}$ exactly, which gives a different solution vector $\hat{x}$. We would like to know how the errors of $\hat{A}$ and $\hat{b}$ influence the error in $\hat{x}$.
Vector norms
In order to measure errors in vectors by a single number we use a so-called vector norm.
A vector norm $\|x\|$ measures the size of a vector $x \in \mathbb{R}^n$ by a nonnegative number and has the following properties:
$$\|x\| = 0 \;\Rightarrow\; x = 0$$
$$\|\alpha x\| = |\alpha|\,\|x\|$$
$$\|x + y\| \le \|x\| + \|y\|$$
for any $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$. There are many possible vector norms. We will use the three norms $\|x\|_1$, $\|x\|_2$, $\|x\|_\infty$ defined by
$$\|x\|_1 = |x_1| + \cdots + |x_n|$$
$$\|x\|_2 = \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2}$$
$$\|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}$$
If we write $\|x\|$ in an equation without any subscript, then the equation is valid for all three norms (using the same norm everywhere).
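In Matlab all three norms are available through the built-in function norm; a small sketch with an arbitrary example vector:

    % the three vector norms for an example vector x
    x = [3; -4; 1];
    n1   = norm(x, 1);    % |3| + |-4| + |1| = 8
    n2   = norm(x, 2);    % sqrt(3^2 + 4^2 + 1^2) = sqrt(26)
    ninf = norm(x, Inf);  % max(|3|, |-4|, |1|) = 4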
If the exact vector is $x$ and the approximation is $\hat{x}$ we can define the relative error with respect to a vector norm as $\|\hat{x} - x\| / \|x\|$.
Example: Note that in the above example we have $\|\hat{b} - b\|_\infty / \|b\|_\infty = 0.01$, but $\|\hat{x} - x\|_\infty / \|x\|_\infty = 1$. That means that the relative error of the solution is 100 times as large as the relative error in the given data, i.e., the condition number of the problem is at least 100.
Matrix norms
A matrix norm $\|A\|$ measures the size of a matrix $A \in \mathbb{R}^{n \times n}$ by a nonnegative number. We would like to have the property
$$\|Ax\| \le \|A\|\,\|x\| \quad \text{for all } x \in \mathbb{R}^n \qquad (1)$$
where $\|x\|$ is one of the above vector norms $\|x\|_1$, $\|x\|_2$, $\|x\|_\infty$. We define $\|A\|$ as the smallest number satisfying (1):
$$\|A\| := \sup_{x \in \mathbb{R}^n,\; x \ne 0} \frac{\|Ax\|}{\|x\|} = \max_{x \in \mathbb{R}^n,\; \|x\| = 1} \|Ax\|$$
By using the 1, 2, $\infty$ vector norm in this definition we obtain the matrix norms $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$ (which are in general different numbers). It turns out that $\|A\|_1$ and $\|A\|_\infty$ are easy to compute:
Theorem:
$$\|A\|_\infty = \max_{i=1,\ldots,n} \sum_{j=1,\ldots,n} |a_{ij}| \quad \text{(maximum of row sums of absolute values)}$$
$$\|A\|_1 = \max_{j=1,\ldots,n} \sum_{i=1,\ldots,n} |a_{ij}| \quad \text{(maximum of column sums of absolute values)}$$
Proof: For the infinity norm we have
$$\|Ax\|_\infty = \max_i \Big| \sum_j a_{ij} x_j \Big| \le \max_i \sum_j |a_{ij}|\,|x_j| \le \Big( \max_i \sum_j |a_{ij}| \Big) \|x\|_\infty$$
implying $\|A\|_\infty \le \max_i \sum_j |a_{ij}|$. Let $i_*$ be the index where the maximum occurs and define $x_j = \operatorname{sign} a_{i_* j}$; then $\|x\|_\infty = 1$ and $\|Ax\|_\infty = \max_i \sum_j |a_{ij}|$.
For the 1-norm we have
$$\|Ax\|_1 = \sum_i \Big| \sum_j a_{ij} x_j \Big| \le \sum_j \Big( \sum_i |a_{ij}| \Big) |x_j| \le \Big( \max_j \sum_i |a_{ij}| \Big) \|x\|_1$$
implying $\|A\|_1 \le \max_j \sum_i |a_{ij}|$. Let $j_*$ be the index where the maximum occurs and define $x_{j_*} = 1$ and $x_j = 0$ for $j \ne j_*$; then $\|x\|_1 = 1$ and $\|Ax\|_1 = \max_j \sum_i |a_{ij}|$.
We will not use $\|A\|_2$ since it is more complicated to compute (it involves eigenvalues).
Note that for $A, B \in \mathbb{R}^{n \times n}$ we have $\|AB\| \le \|A\|\,\|B\|$ since $\|ABx\| \le \|A\|\,\|Bx\| \le \|A\|\,\|B\|\,\|x\|$ for all $x \in \mathbb{R}^n$.
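The row-sum and column-sum formulas of the theorem are easy to check numerically; a small sketch with an arbitrary example matrix, compared against Matlab's built-in norm:

    % ||A||_inf = maximum row sum, ||A||_1 = maximum column sum (of absolute values)
    A = [1 -2 3; 0 4 -1; 2 2 2];
    rowsum = max(sum(abs(A), 2));      % should equal norm(A, Inf) = 6
    colsum = max(sum(abs(A), 1));      % should equal norm(A, 1)   = 8
    [rowsum, norm(A, Inf); colsum, norm(A, 1)]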
Lemma 1: Let $A \in \mathbb{R}^{n \times n}$ be nonsingular and $E \in \mathbb{R}^{n \times n}$. Then $\|E\| < \frac{1}{\|A^{-1}\|}$ implies that $A + E$ is nonsingular.
Proof: Assume that $A + E$ is singular. Then there exists a nonzero $x \in \mathbb{R}^n$ such that $(A + E)x = 0$ and hence
$$\frac{\|x\|}{\|A^{-1}\|} \le \|Ax\| = \|Ex\| \le \|E\|\,\|x\|$$
The left inequality for $b := Ax$ follows from $\|x\| = \|A^{-1}b\| \le \|A^{-1}\|\,\|b\|$. As $\|x\| > 0$ we obtain $\frac{1}{\|A^{-1}\|} \le \|E\|$, contradicting the assumption $\|E\| < \frac{1}{\|A^{-1}\|}$.
Lemma 2: For given vectors $x, y \in \mathbb{R}^n$ with $x \ne 0$ there exists a matrix $E \in \mathbb{R}^{n \times n}$ with $Ex = y$ and $\|E\| = \frac{\|y\|}{\|x\|}$.
Proof: For the infinity norm we have $\|x\|_\infty = |x_j|$ for some $j$. Let $a \in \mathbb{R}^n$ be the vector with $a_j = \operatorname{sign}(x_j)$, $a_k = 0$ for $k \ne j$ and let
$$E = \frac{1}{\|x\|_\infty}\, y\, a^\top,$$
then (i) $a^\top x = \|x\|_\infty$ implies $Ex = y$ and (ii) $\|y\, a^\top v\|_\infty = |a^\top v|\,\|y\|_\infty$ with $|a^\top v| \le \|v\|_\infty$ implies $\|E\| \le \frac{\|y\|_\infty}{\|x\|_\infty}$.
For the 1-norm we use $a \in \mathbb{R}^n$ with $a_j = \operatorname{sign}(x_j)$ since $a^\top x = \|x\|_1$ and $|a^\top v| \le \|v\|_1$.
For the 2-norm we use $a = x / \|x\|_2$ since $a^\top x = \|x\|_2$ and $|a^\top v| \le \|a\|_2\,\|v\|_2 = \|v\|_2$.
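For the infinity norm the construction in the proof is completely explicit; a small sketch with arbitrary example vectors:

    % build E = y*a'/||x||_inf with a = sign(x_j)*e_j, j the index of the largest |x_k|
    x = [1; -3; 2];  y = [5; 0; -1];
    [~, j] = max(abs(x));
    a = zeros(size(x));  a(j) = sign(x(j));
    E = y * a' / norm(x, Inf);
    E*x - y                                     % zero: E maps x to y
    [norm(E, Inf), norm(y, Inf)/norm(x, Inf)]   % the two numbers agree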
Condition numbers
Let $x$ denote the solution vector of the linear system $Ax = b$. If we choose a slightly different right hand side vector $\hat{b}$ then we obtain a different solution vector $\hat{x}$ satisfying $A\hat{x} = \hat{b}$. We want to know how the relative error $\|\hat{b} - b\| / \|b\|$ influences the relative error $\|\hat{x} - x\| / \|x\|$ (“error propagation”). We have $A(\hat{x} - x) = \hat{b} - b$, hence $\|\hat{x} - x\| \le \|A^{-1}\|\,\|\hat{b} - b\|$, and together with $\|x\| \ge \|b\| / \|A\|$ this gives
$$\frac{\|\hat{x} - x\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\, \frac{\|\hat{b} - b\|}{\|b\|}$$
The number $\operatorname{cond}(A) := \|A\|\,\|A^{-1}\|$ is called the condition number of the matrix $A$. It determines how much the relative error of the right hand side can be magnified in the solution. For the matrix $A$ of our example we have $\|A\|_\infty = 2$ and $\|A^{-1}\|_\infty = 50$, hence $\operatorname{cond}_\infty(A) = 100$ and therefore
$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \le 100\, \frac{\|\hat{b} - b\|_\infty}{\|b\|_\infty}$$
which is consistent with our results above ($b$ and $\hat{b}$ were chosen so that the worst possible error magnification occurs).
The fact that the matrix $A$ in our example has a large condition number is related to the fact that $A$ is close to the singular matrix $B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
The following result shows that $\operatorname{cond}(A)$ indicates how close $A$ is to a singular matrix:
Theorem:
$$\min_{B \in \mathbb{R}^{n \times n},\; B \text{ singular}} \frac{\|A - B\|}{\|A\|} = \frac{1}{\operatorname{cond}(A)}$$
Proof: (1) Lemma 1 shows: $B$ singular implies $\|A - B\| \ge \frac{1}{\|A^{-1}\|}$.
(2) By the definition of $\|A^{-1}\|$ there exist $x, y \in \mathbb{R}^n$ such that $x = A^{-1} y$ and $\|A^{-1}\| = \frac{\|x\|}{\|y\|}$. By Lemma 2 there exists a matrix $E \in \mathbb{R}^{n \times n}$ such that $Ex = y$ and $\|E\| = \frac{\|y\|}{\|x\|}$. Then $B := A - E$ satisfies $Bx = Ax - Ex = y - y = 0$, hence $B$ is singular and $\|A - B\| = \|E\| = \frac{1}{\|A^{-1}\|}$.
Example: The matrix $A = \begin{pmatrix} 1.01 & .99 \\ .99 & 1.01 \end{pmatrix}$ is close to the singular matrix $B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$, so that $\frac{\|A - B\|_\infty}{\|A\|_\infty} = \frac{.02}{2} = .01$. By the theorem we have $\frac{1}{\operatorname{cond}_\infty(A)} \le 0.01$, i.e., $\operatorname{cond}_\infty(A) \ge 100$. As we saw above we have $\operatorname{cond}_\infty(A) = 100$, i.e., the matrix $B$ is really the closest singular matrix to the matrix $A$.
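These numbers are easy to verify; a small sketch:

    % condition number of the example matrix and its distance to the singular B
    A = [1.01 .99; .99 1.01];
    B = [1 1; 1 1];
    condA = norm(A, Inf) * norm(inv(A), Inf)   % = 100
    norm(A - B, Inf) / norm(A, Inf)            % = 0.01 = 1/condA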
When we solve a linear system $Ax = b$ we have to store the entries of $A$ and $b$ in the computer, yielding a matrix $\hat{A}$ with rounded entries $\hat{a}_{ij} = \mathrm{fl}(a_{ij})$ and a rounded right hand side vector $\hat{b}$. If the original matrix $A$ is singular then the linear system has no solution or infinitely many solutions, so that any computed solution is meaningless. How can we recognize this on a computer? Note that the matrix $\hat{A}$ which the computer uses may no longer be singular.
Answer: We should compute (or at least estimate) $\operatorname{cond}(\hat{A})$. If $\operatorname{cond}(\hat{A}) < \frac{1}{\varepsilon_M}$ then we can guarantee that any matrix $A$ which is rounded to $\hat{A}$ must be nonsingular: $|\hat{a}_{ij} - a_{ij}| \le \varepsilon_M |a_{ij}|$ implies $\|\hat{A} - A\| \le \varepsilon_M \|A\|$ for the infinity or 1-norm. Therefore
$$\frac{\|\hat{A} - A\|}{\|\hat{A}\|} \le \frac{\varepsilon_M}{1 - \varepsilon_M} \approx \varepsilon_M \qquad\text{and}\qquad \operatorname{cond}(\hat{A}) < \frac{1 - \varepsilon_M}{\varepsilon_M} \approx \frac{1}{\varepsilon_M}$$
imply $\frac{\|\hat{A} - A\|}{\|\hat{A}\|} < \frac{1}{\operatorname{cond}(\hat{A})}$. Hence the matrix $A$ must be nonsingular by the theorem.
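In Matlab this check might look as follows (a sketch; eps is the machine epsilon $\varepsilon_M$, condest estimates $\operatorname{cond}_1(\hat{A})$, and the matrix below is only an illustrative choice):

    % decide whether the stored matrix Ahat could be the rounded version of a singular A
    Ahat = [1 2; 2 4*(1+eps)];    % within rounding distance of the singular [1 2; 2 4]
    if condest(Ahat) < 1/eps
        disp('every A that rounds to Ahat is nonsingular')
    else
        disp('Ahat could be the machine representation of a singular A')
    end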
Now we assume that we perturb both the right hand side vector b and the matrix A:
Theorem: Assume $Ax = b$ and $\hat{A}\hat{x} = \hat{b}$. If $A$ is nonsingular and $\|\hat{A} - A\| < 1 / \|A^{-1}\|$ there holds
$$\frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\operatorname{cond}(A)}{1 - \operatorname{cond}(A)\frac{\|\hat{A} - A\|}{\|A\|}} \left( \frac{\|\hat{b} - b\|}{\|b\|} + \frac{\|\hat{A} - A\|}{\|A\|} \right)$$
Proof: Let $E = \hat{A} - A$, hence $A\hat{x} = \hat{b} - E\hat{x}$. Subtracting $Ax = b$ gives $A(\hat{x} - x) = (\hat{b} - b) - E\hat{x}$ and therefore
$$\|\hat{x} - x\| \le \|A^{-1}\| \Big( \|\hat{b} - b\| + \|E\| \underbrace{\|\hat{x}\|}_{\le\, \|x\| + \|\hat{x} - x\|} \Big)$$
$$\big(1 - \|A^{-1}\|\,\|E\|\big)\, \|\hat{x} - x\| \le \|A^{-1}\| \big( \|\hat{b} - b\| + \|E\|\,\|x\| \big)$$
Dividing by $\|x\|$ and using $\|b\| \le \|A\|\,\|x\| \iff \|x\| \ge \|b\| / \|A\|$ gives
$$\big(1 - \|A^{-1}\|\,\|E\|\big)\, \frac{\|\hat{x} - x\|}{\|x\|} \le \|A^{-1}\| \left( \frac{\|\hat{b} - b\|}{\|b\| / \|A\|} + \|E\| \right)$$
$$\left(1 - \operatorname{cond}(A)\frac{\|E\|}{\|A\|}\right) \frac{\|\hat{x} - x\|}{\|x\|} \le \operatorname{cond}(A) \left( \frac{\|\hat{b} - b\|}{\|b\|} + \frac{\|E\|}{\|A\|} \right)$$
If $\|E\| < 1 / \|A^{-1}\| \iff \frac{\|E\|}{\|A\|} < 1 / \operatorname{cond}(A)$ we can divide by $1 - \operatorname{cond}(A)\frac{\|E\|}{\|A\|}$.
If $\operatorname{cond}(A)\frac{\|\hat{A} - A\|}{\|A\|} \ll 1$ we have that both the relative error in the right hand side vector and in the matrix are magnified by the factor $\operatorname{cond}(A)$.
If $\operatorname{cond}(A)\frac{\|\hat{A} - A\|}{\|A\|} = \|A^{-1}\|\,\|\hat{A} - A\| \ge 1$ then by the theorem for the condition number the matrix $\hat{A}$ may actually be singular, and the computed solution is usually meaningless.
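The bound of the theorem is straightforward to check numerically; a sketch in which the matrix, right hand side, and perturbation size are all arbitrary choices:

    % compare the actual relative error with the bound of the theorem (infinity norm)
    n = 5;
    A = randn(n);  b = randn(n, 1);  x = A \ b;
    Ahat = A + 1e-8*randn(n);  bhat = b + 1e-8*randn(n, 1);
    xhat = Ahat \ bhat;
    condA = norm(A, Inf) * norm(inv(A), Inf);
    relA  = norm(Ahat - A, Inf) / norm(A, Inf);
    relb  = norm(bhat - b, Inf) / norm(b, Inf);
    err   = norm(xhat - x, Inf) / norm(x, Inf);
    bound = condA / (1 - condA*relA) * (relb + relA);
    [err, bound]                      % err should not exceed bound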
We have seen that the condition number is very useful: It tells us what accuracy we can expect for the solution, and how
close our matrix is to a singular matrix.
In order to compute the condition number we have to find $A^{-1}$. This takes $n^3 + O(n^2)$ operations, compared with $\frac{n^3}{3} + O(n^2)$ operations for the LU decomposition. Therefore the computation of the condition number would make the solution of a linear system 3 times as expensive. For large problems this is not reasonable.
However, we do not need to compute the condition number with full machine accuracy. Just knowing the order of magnitude is sufficient. Since $\|A^{-1}\| = \max_{\|z\|=1} \|A^{-1} z\|$ we can pick any vector $z$ with $\|z\| = 1$ and have the lower bound $\|A^{-1}\| \ge \|A^{-1} z\|$. Note that $u = A^{-1} z$ is the solution of the linear system $Au = z$. If we already have $L$, $U$, $p$ we can easily find $u$ using u = U\(L\z(p)) with a cost of only $n^2 + O(n)$ operations. The trick is to pick $z$ with $\|z\| = 1$ such that $\|A^{-1} z\|$ becomes as large as possible, so that it is close to $\|A^{-1}\|$. There are heuristic methods available which achieve fairly good lower bounds: (i) pick $c = (\pm 1, \ldots, \pm 1)$ and choose the signs so that the forward substitution gives a large vector; (ii) setting $\tilde{c} := z$ and solving $A\tilde{z} = \tilde{c}$ often improves the lower bound. The Matlab functions condest(A) and 1/rcond(A) use similar ideas to give lower bounds for $\operatorname{cond}_1(A)$. Typically they give an estimated condition number $c$ with $c \le \operatorname{cond}_1(A) \le 3c$ and require the solution of 2 or 3 linear systems, which costs $O(n^2)$ operations if the LU decomposition is known. (However, the Matlab commands condest and rcond only use the matrix $A$ as an input value, so they have to compute the LU decomposition of $A$ first and need $\frac{n^3}{3} + O(n^2)$ operations.)
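A minimal version of this idea, without the sign heuristics, might look as follows (the matrix size and the particular choice of $z$ are arbitrary; condest is Matlab's more refined estimator):

    % cheap lower bound for cond_1(A) reusing an existing LU factorization
    n = 100;  A = randn(n);
    [L, U, p] = lu(A, 'vector');       % A(p,:) = L*U with permutation vector p
    z = ones(n, 1) / n;                % any z with ||z||_1 = 1; smarter choices do better
    u = U \ (L \ z(p));                % u = A^{-1}*z at a cost of O(n^2)
    est = norm(A, 1) * norm(u, 1);     % lower bound for cond_1(A)
    [est, condest(A), cond(A, 1)]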
When we run Gaussian elimination on a computer each single operation causes some roundoff error, and instead of the exact
solution x of a linear system we only get an approximation x̂. As explained above we should select the pivot candidate with
the largest absolute value to avoid unnecessary subtractive cancellation, and this usually is a numerically stable algorithm.
However, there is no theorem which guarantees this for partial pivoting (row interchanges). (For “full pivoting” with row
and column interchanges some theoretical results exist. However, this algorithm is more expensive, and for all practical
examples partial pivoting seems to work fine.)
Question 2: After we have computed $\hat{x}$, how can we check how good our computation was? The obvious thing is to compute $\hat{b} := A\hat{x}$ and to compare it with $b$. The difference $r = \hat{b} - b$ is called the residual. As $Ax = b$ and $A\hat{x} = \hat{b}$ we have
$$\frac{\|\hat{x} - x\|}{\|x\|} \le \operatorname{cond}(A)\, \frac{\|\hat{b} - b\|}{\|b\|}$$
where $\|\hat{b} - b\| / \|b\|$ is called the relative residual. We can compute (or at least estimate) $\operatorname{cond}(A)$, and therefore can obtain an upper bound for the error $\|\hat{x} - x\| / \|x\|$. Actually, we can obtain a slightly better estimate by using
$$\|\hat{x} - x\| = \left\| A^{-1} (\hat{b} - b) \right\| \le \|A^{-1}\|\, \|\hat{b} - b\| \;\Longrightarrow\; \frac{\|\hat{x} - x\|}{\|\hat{x}\|} \le \operatorname{cond}(A)\, \frac{\|\hat{b} - b\|}{\|A\|\,\|\hat{x}\|}$$
with the weighted residual $\rho := \frac{\|\hat{b} - b\|}{\|A\|\,\|\hat{x}\|}$. Note that $\|\hat{x} - x\| / \|\hat{x}\| \le \delta$ implies for $\delta < 1$
$$\|\hat{x} - x\| \le \delta\, \|x + (\hat{x} - x)\| \le \delta \big( \|x\| + \|\hat{x} - x\| \big) \;\Longrightarrow\; \frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\delta}{1 - \delta}$$
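In Matlab the check might look like this (a sketch with an arbitrary random system; condest is used in place of the exact condition number):

    % a-posteriori error bound from the weighted residual
    n = 50;  A = randn(n);  b = randn(n, 1);
    xhat = A \ b;                                      % computed solution
    r    = A*xhat - b;                                 % residual
    rho  = norm(r, 1) / (norm(A, 1) * norm(xhat, 1));  % weighted residual
    errbound = condest(A) * rho;                       % bound for ||xhat - x|| / ||xhat||
    [rho, errbound]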
Moreover, let $y := b - \hat{b}$. Using Lemma 2 we get a matrix $E$ with $E\hat{x} = y$ and $\|E\| = \frac{\|y\|}{\|\hat{x}\|}$. Then $\tilde{A} := A + E$ satisfies $\tilde{A}\hat{x} = (A + E)\hat{x} = \hat{b} + (b - \hat{b}) = b$ and $\frac{\|E\|}{\|A\|} = \frac{\|b - \hat{b}\|}{\|\hat{x}\|\,\|A\|} = \rho$. Hence $\hat{x}$ is the exact solution of a nearby linear system $\tilde{A}\hat{x} = b$ whose matrix differs from $A$ by the relative error $\rho$.
Summary
– If $Ax = b$ and $A\hat{x} = \hat{b}$ we have $\frac{\|\hat{x} - x\|}{\|x\|} \le \operatorname{cond}(A)\, \frac{\|\hat{b} - b\|}{\|b\|}$.
– The unavoidable error due to the rounding of $A$ and $b$ is approximately $2 \operatorname{cond}(A)\, \varepsilon_M$.
– If $\operatorname{cond}(\hat{A}) \ge 1/\varepsilon_M$ then the matrix $\hat{A}$ could be the machine representation of a singular matrix $A$, and the computed solution is usually meaningless.
– To check whether a matrix $A$ is singular, up to a perturbation within machine accuracy:
∗ DO NOT use the determinant $\det A$
∗ DO NOT use the pivot elements $u_{jj}$ obtained by Gaussian elimination
∗ DO use the condition number and check $\operatorname{cond}(A) < 1/\varepsilon_M$
• You should compute an approximation to the condition number $\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|$. Here $\|A^{-1}\|$ can be approximated by solving a few linear systems with the existing LU decomposition (condest in Matlab).
• In order to check the accuracy of a computed solution $\hat{x}$ compute the residual $r := A\hat{x} - b$ and the weighted residual $\rho := \frac{\|r\|}{\|A\|\,\|\hat{x}\|}$.
– we get an error bound $\frac{\|\hat{x} - x\|}{\|\hat{x}\|} \le \operatorname{cond}(A)\, \rho$
– if $\rho$ is not much larger than $\varepsilon_M$ the computation is numerically stable: perturbing $A$ and $b$ with a relative error $\varepsilon_M$ would cause an error of the same size.
– if $\rho$ is much larger than $\varepsilon_M$ the computation is numerically unstable: the error is much larger than the error resulting from the uncertainty of the input values.
We can obtain a more accurate result by iterative improvement: let $r := A\hat{x} - b$ and solve $Ae = r$ using the existing LU decomposition, then let $\hat{x} := \hat{x} - e$ (see the sketch after this summary).
• Gaussian elimination with the pivoting strategy of choosing the largest absolute value is in almost all cases numerically stable. We can check this by computing the weighted residual $\rho$. If $\rho$ is much larger than $\varepsilon_M$ we can compute a numerically stable result by using iterative improvement.
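A sketch of one step of iterative improvement, reusing the LU factorization (the system below is an arbitrary random example):

    % one step of iterative improvement: xhat := xhat - e where A*e = r
    n = 50;  A = randn(n);  b = randn(n, 1);
    [L, U, p] = lu(A, 'vector');
    xhat = U \ (L \ b(p));            % initial computed solution
    r    = A*xhat - b;                % residual of the computed solution
    e    = U \ (L \ r(p));            % solve A*e = r with the existing factorization
    xhat = xhat - e;                  % improved solution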