Errors for Linear Systems

When we solve a linear system $Ax = b$ we often do not know $A$ and $b$ exactly, but have only approximations $\hat{A}$ and $\hat{b}$ available.
Then the best thing we can do is to solve $\hat{A}\hat{x} = \hat{b}$ exactly, which gives a different solution vector $\hat{x}$. We would like to know
how the errors of $\hat{A}$ and $\hat{b}$ influence the error in $\hat{x}$.

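Example: Consider the linear system $Ax = b$ with

$$\begin{pmatrix} 1.01 & 0.99 \\ 0.99 & 1.01 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$

We can easily see that the solution is $x = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Now let us use the slightly different right hand side vector $\hat{b} = \begin{pmatrix} 2.02 \\ 1.98 \end{pmatrix}$
and solve the linear system $A\hat{x} = \hat{b}$. This gives the solution vector $\hat{x} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$. In this case a small change in the right hand
side vector has caused a large change in the solution vector.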
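The example can be checked numerically. The following is a minimal sketch in plain Python; the helper name `solve_2x2` is an illustrative assumption, and Cramer's rule is used only because the system is 2 by 2 (it is not the method recommended later in these notes).

```python
def solve_2x2(A, b):
    """Solve a 2x2 linear system by Cramer's rule (tiny illustration only)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return [x1, x2]

A = [[1.01, 0.99], [0.99, 1.01]]
x = solve_2x2(A, [2.0, 2.0])         # original right hand side
x_hat = solve_2x2(A, [2.02, 1.98])   # slightly perturbed right hand side
print(x)      # close to [1, 1]
print(x_hat)  # close to [2, 0]
```

A change of 0.02 in each component of the right hand side moves the solution by 1 in each component.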

Vector norms

In order to measure errors in vectors by a single number we use a so-called vector norm.
A vector norm $\|x\|$ measures the size of a vector $x \in \mathbb{R}^n$ by a nonnegative number and has the following properties:
$$\|x\| = 0 \;\Rightarrow\; x = 0$$
$$\|\alpha x\| = |\alpha| \, \|x\|$$
$$\|x + y\| \le \|x\| + \|y\|$$
for any $x, y \in \mathbb{R}^n$, $\alpha \in \mathbb{R}$. There are many possible vector norms. We will use the three norms $\|x\|_1$, $\|x\|_2$, $\|x\|_\infty$ defined by
$$\|x\|_1 = |x_1| + \cdots + |x_n|$$
$$\|x\|_2 = \left( |x_1|^2 + \cdots + |x_n|^2 \right)^{1/2}$$
$$\|x\|_\infty = \max\{|x_1|, \ldots, |x_n|\}$$

If we write $\|x\|$ in an equation without any subscript, then the equation is valid for all three norms (using the same norm throughout).
If the exact vector is $x$ and the approximation is $\hat{x}$ we can define the relative error with respect to a vector norm as $\|\hat{x} - x\| / \|x\|$.

Example: Note that in the above example we have $\|\hat{b} - b\|_\infty / \|b\|_\infty = 0.01$, but $\|\hat{x} - x\|_\infty / \|x\|_\infty = 1$. That means that the relative error of
the solution is 100 times as large as the relative error in the given data, i.e., the condition number of the problem is at least 100.
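The three vector norms and the relative errors of the example can be evaluated as follows. This is a small sketch; the function names `norm1`, `norm2`, `norm_inf` and `rel_err` are illustrative choices, not part of the notes.

```python
import math

def norm1(v):
    return sum(abs(t) for t in v)

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def norm_inf(v):
    return max(abs(t) for t in v)

def rel_err(approx, exact, norm):
    """Relative error ||approx - exact|| / ||exact|| in the given norm."""
    diff = [a - e for a, e in zip(approx, exact)]
    return norm(diff) / norm(exact)

b, b_hat = [2.0, 2.0], [2.02, 1.98]
x, x_hat = [1.0, 1.0], [2.0, 0.0]
data_err = rel_err(b_hat, b, norm_inf)   # about 0.01
sol_err = rel_err(x_hat, x, norm_inf)    # 1.0
print(data_err, sol_err, sol_err / data_err)
```

The ratio of the two relative errors is about 100, which is the amplification factor observed in the example.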

Matrix norms

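A matrix norm $\|A\|$ measures the size of a matrix $A \in \mathbb{R}^{n \times n}$ by a nonnegative number. We would like to have the property
$$\|Ax\| \le \|A\| \, \|x\| \quad \text{for all } x \in \mathbb{R}^n \tag{1}$$
where $\|x\|$ is one of the above vector norms $\|x\|_1$, $\|x\|_2$, $\|x\|_\infty$. We define $\|A\|$ as the smallest number satisfying (1):
$$\|A\| := \sup_{x \in \mathbb{R}^n,\; x \ne 0} \frac{\|Ax\|}{\|x\|} = \max_{x \in \mathbb{R}^n,\; \|x\| = 1} \|Ax\|$$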

By using the 1, 2, ∞ vector norm in this definition we obtain the matrix norms $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$ (which are in general
different numbers). It turns out that $\|A\|_1$ and $\|A\|_\infty$ are easy to compute:
$$\|A\|_\infty = \max_{i=1,\ldots,n} \sum_{j=1}^{n} |a_{ij}| \quad \text{(maximum of row sums of absolute values)}$$
$$\|A\|_1 = \max_{j=1,\ldots,n} \sum_{i=1}^{n} |a_{ij}| \quad \text{(maximum of column sums of absolute values)}$$
Proof: For the infinity norm we have
$$\|Ax\|_\infty = \max_i \Big| \sum_j a_{ij} x_j \Big| \le \max_i \sum_j |a_{ij}| \, |x_j| \le \Big( \max_i \sum_j |a_{ij}| \Big) \|x\|_\infty$$
implying $\|A\|_\infty \le \max_i \sum_j |a_{ij}|$. Let $i_*$ be the index where the maximum occurs and define $x_j = \operatorname{sign} a_{i_* j}$, then $\|x\|_\infty = 1$ and
$\|Ax\|_\infty = \max_i \sum_j |a_{ij}|$.
For the 1-norm we have
$$\|Ax\|_1 = \sum_i \Big| \sum_j a_{ij} x_j \Big| \le \sum_j \Big( \sum_i |a_{ij}| \Big) |x_j| \le \Big( \max_j \sum_i |a_{ij}| \Big) \|x\|_1$$
implying $\|A\|_1 \le \max_j \sum_i |a_{ij}|$. Let $j_*$ be the index where the maximum occurs and define $x_{j_*} = 1$ and $x_j = 0$ for $j \ne j_*$, then
$\|x\|_1 = 1$ and $\|Ax\|_1 = \max_j \sum_i |a_{ij}|$. $\square$
We will not use $\|A\|_2$ since it is more complicated to compute (it involves eigenvalues).
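The two formulas translate directly into code. The sketch below uses a small test matrix chosen for illustration and also builds the sign vector from the proof to confirm that the infinity-norm bound is attained; the function names are assumptions, and zero entries are given sign +1 for simplicity.

```python
def mat_norm_inf(A):
    """Maximum row sum of absolute values."""
    return max(sum(abs(a) for a in row) for row in A)

def mat_norm_1(A):
    """Maximum column sum of absolute values."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def mat_vec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

A = [[1.0, -2.0], [3.0, 4.0]]
print(mat_norm_inf(A), mat_norm_1(A))   # row sums 3, 7; column sums 4, 6

# Sign vector from the proof: x_j = sign(a_{i*,j}) for the row i* with the largest sum.
i_star = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
x = [1.0 if a >= 0 else -1.0 for a in A[i_star]]
attained = max(abs(t) for t in mat_vec(A, x))   # ||Ax||_inf with ||x||_inf = 1
print(attained)
```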
Note that for $A, B \in \mathbb{R}^{n \times n}$ we have $\|AB\| \le \|A\| \, \|B\|$ since

$$\|ABx\| \le \|A\| \, \|Bx\| \le \|A\| \, \|B\| \, \|x\|.$$

The following results about matrix norms will be useful later:

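Lemma 1: Let $A \in \mathbb{R}^{n \times n}$ be nonsingular and $E \in \mathbb{R}^{n \times n}$. Then $\|E\| < \frac{1}{\|A^{-1}\|}$ implies that $A + E$ is nonsingular.
Proof: Assume that $A + E$ is singular. Then there exists a nonzero $x \in \mathbb{R}^n$ such that $(A + E)x = 0$ and hence
$$\frac{\|x\|}{\|A^{-1}\|} \le \|Ax\| = \|Ex\| \le \|E\| \, \|x\|.$$
The left inequality for $b := Ax$ follows from $\|x\| = \|A^{-1} b\| \le \|A^{-1}\| \, \|b\|$. As $\|x\| > 0$ we obtain
$$\frac{1}{\|A^{-1}\|} \le \|E\|,$$
which contradicts the assumption. $\square$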

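Lemma 2: For given vectors $x, y \in \mathbb{R}^n$ with $x \ne 0$ there exists a matrix $E \in \mathbb{R}^{n \times n}$ with $Ex = y$ and $\|E\| = \frac{\|y\|}{\|x\|}$.

Proof: For the infinity-norm we have $\|x\|_\infty = |x_j|$ for some $j$. Let $a \in \mathbb{R}^n$ be the vector with $a_j = \operatorname{sign}(x_j)$, $a_k = 0$ for $k \ne j$ and let
$$E = \frac{1}{\|x\|_\infty} \, y a^\top,$$
then (i) $a^\top x = \|x\|_\infty$ implies $Ex = y$ and (ii) $\|y a^\top v\|_\infty = |a^\top v| \, \|y\|_\infty$ with $|a^\top v| \le \|v\|_\infty$ implies $\|E\| \le \frac{\|y\|}{\|x\|}$. Equality holds since $\|E\| \ge \|Ex\| / \|x\| = \|y\| / \|x\|$.
For the 1-norm we use $a \in \mathbb{R}^n$ with $a_j = \operatorname{sign}(x_j)$ for all $j$ since $a^\top x = \|x\|_1$ and $|a^\top v| \le \|v\|_1$.

For the 2-norm we use $a = x / \|x\|_2$ since $a^\top x = \|x\|_2$ and $|a^\top v| \le \|a\|_2 \|v\|_2 = \|v\|_2$. $\square$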
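The construction in the proof of Lemma 2 can be tried out for the infinity-norm. The sketch below is an illustration under the assumption that the vector a carries the sign of the largest component of x; the function name `lemma2_matrix_inf` and the test vectors are hypothetical.

```python
def lemma2_matrix_inf(x, y):
    """Rank-one matrix E = y a^T / ||x||_inf with a = sign(x_j) e_j, j = argmax |x_j|."""
    n = len(x)
    j = max(range(n), key=lambda k: abs(x[k]))
    x_norm = abs(x[j])                      # ||x||_inf, nonzero because x != 0
    sign = 1.0 if x[j] > 0 else -1.0
    # Only column j of E is nonzero.
    return [[(y[i] * sign / x_norm) if k == j else 0.0 for k in range(n)]
            for i in range(n)]

x = [1.0, -4.0, 2.0]
y = [3.0, 0.5, -6.0]
E = lemma2_matrix_inf(x, y)
Ex = [sum(e * t for e, t in zip(row, x)) for row in E]
E_norm = max(sum(abs(e) for e in row) for row in E)   # infinity norm of E
print(Ex)      # reproduces y
print(E_norm)  # equals ||y||_inf / ||x||_inf = 6 / 4
```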

Condition numbers

Let $x$ denote the solution vector of the linear system $Ax = b$. If we choose a slightly different right hand side vector $\hat{b}$ then we
obtain a different solution vector $\hat{x}$ satisfying $A\hat{x} = \hat{b}$. We want to know how the relative error $\|\hat{b} - b\| / \|b\|$ influences the
relative error $\|\hat{x} - x\| / \|x\|$ (“error propagation”). We have $A(\hat{x} - x) = \hat{b} - b$ and therefore

$$\|\hat{x} - x\| = \|A^{-1}(\hat{b} - b)\| \le \|A^{-1}\| \, \|\hat{b} - b\|.$$

On the other hand we have $\|b\| = \|Ax\| \le \|A\| \, \|x\|$. Combining this we obtain

$$\frac{\|\hat{x} - x\|}{\|x\|} \le \|A\| \, \|A^{-1}\| \, \frac{\|\hat{b} - b\|}{\|b\|}.$$

The number $\operatorname{cond}(A) := \|A\| \, \|A^{-1}\|$ is called the condition number of the matrix $A$. It determines how much the relative error
of the right hand side vector can be amplified. The condition number depends on the choice of the matrix norm: In general
$\operatorname{cond}_1(A) := \|A\|_1 \|A^{-1}\|_1$ and $\operatorname{cond}_\infty(A) := \|A\|_\infty \|A^{-1}\|_\infty$ are different numbers.

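Example: In the above example we have

$$\operatorname{cond}_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty = \left\| \begin{pmatrix} 1.01 & 0.99 \\ 0.99 & 1.01 \end{pmatrix} \right\|_\infty \left\| \begin{pmatrix} 25.25 & -24.75 \\ -24.75 & 25.25 \end{pmatrix} \right\|_\infty = 2 \cdot 50 = 100$$

and therefore
$$\frac{\|\hat{x} - x\|_\infty}{\|x\|_\infty} \le 100 \, \frac{\|\hat{b} - b\|_\infty}{\|b\|_\infty}$$
which is consistent with our results above ($b$ and $\hat{b}$ were chosen so that the worst possible error magnification occurs).
The fact that the matrix $A$ in our example has a large condition number is related to the fact that $A$ is close to the singular
matrix $B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.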
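The value of the condition number in this example can be reproduced with a few lines of Python. This is only a sketch: it forms the inverse explicitly, which is acceptable for a 2 by 2 illustration but is not how one would proceed for large matrices. The helper names are assumptions.

```python
def inverse_2x2(A):
    """Explicit inverse of a 2x2 matrix (illustration only)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_norm_inf(A):
    return max(sum(abs(t) for t in row) for row in A)

def cond_inf(A):
    return mat_norm_inf(A) * mat_norm_inf(inverse_2x2(A))

A = [[1.01, 0.99], [0.99, 1.01]]
print(inverse_2x2(A))   # approximately [[25.25, -24.75], [-24.75, 25.25]]
print(cond_inf(A))      # approximately 100
```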
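The following result shows that $\operatorname{cond}(A)$ indicates how close $A$ is to a singular matrix:

Theorem:
$$\min_{B \in \mathbb{R}^{n \times n},\; B \text{ singular}} \frac{\|A - B\|}{\|A\|} = \frac{1}{\operatorname{cond}(A)}$$
Proof: (1) Lemma 1 shows: $B$ singular implies $\|A - B\| \ge \frac{1}{\|A^{-1}\|}$.
(2) By the definition of $\|A^{-1}\|$ there exist $x, y \in \mathbb{R}^n$ such that $x = A^{-1}y$ and $\|A^{-1}\| = \frac{\|x\|}{\|y\|}$. By Lemma 2 there exists a matrix
$E \in \mathbb{R}^{n \times n}$ such that $Ex = y$ and $\|E\| = \frac{\|y\|}{\|x\|}$. Then $B := A - E$ satisfies $Bx = Ax - Ex = y - y = 0$, hence $B$ is singular and
$\|A - B\| = \|E\| = \frac{1}{\|A^{-1}\|}$. $\square$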
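Step (2) of the proof is constructive, so it can be carried out numerically for the 2 by 2 matrix used throughout these notes. The sketch below works in the infinity-norm and assumes the Lemma 2 construction with a rank-one matrix; all helper names are illustrative.

```python
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def mat_norm_inf(A):
    return max(sum(abs(t) for t in row) for row in A)

A = [[1.01, 0.99], [0.99, 1.01]]
A_inv = inverse_2x2(A)

# y with ||y||_inf = 1 maximizing ||A^{-1} y||_inf: signs of the largest row of A^{-1}.
i_star = max(range(2), key=lambda i: sum(abs(t) for t in A_inv[i]))
y = [1.0 if t >= 0 else -1.0 for t in A_inv[i_star]]
x = mat_vec(A_inv, y)

# Lemma 2 (infinity norm): E = y a^T / ||x||_inf with a = sign(x_j) e_j.
j = max(range(2), key=lambda k: abs(x[k]))
s = 1.0 if x[j] > 0 else -1.0
E = [[(y[i] * s / abs(x[j])) if k == j else 0.0 for k in range(2)] for i in range(2)]
B = [[A[i][k] - E[i][k] for k in range(2)] for i in range(2)]

det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]    # zero up to roundoff
rel_dist = mat_norm_inf(E) / mat_norm_inf(A)     # about 0.01 = 1 / cond_inf(A)
print(B, det_B, rel_dist)
```

The matrix B obtained this way has two equal columns, so it is singular, and its relative distance to A is about 0.01. A closest singular matrix is in general not unique.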

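Example: The matrix $A = \begin{pmatrix} 1.01 & 0.99 \\ 0.99 & 1.01 \end{pmatrix}$ is close to the singular matrix $B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ so that $\frac{\|A - B\|_\infty}{\|A\|_\infty} = \frac{0.02}{2} = 0.01$. By
the theorem we have that $0.01 \ge \frac{1}{\operatorname{cond}_\infty(A)}$ or $\operatorname{cond}_\infty(A) \ge 100$. As we saw above we have $\operatorname{cond}_\infty(A) = 100$, i.e., the matrix $B$
is really a closest singular matrix to the matrix $A$ in the infinity-norm.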
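When we solve a linear system $Ax = b$ we have to store the entries of $A$ and $b$ in the computer, yielding a matrix $\hat{A}$ with
rounded entries $\hat{a}_{ij} = fl(a_{ij})$ and a rounded right hand side vector $\hat{b}$. If the original matrix $A$ is singular then the linear system
has no solution or infinitely many solutions, so that any computed solution is meaningless. How can we recognize this on a
computer? Note that the matrix $\hat{A}$ which the computer uses may no longer be singular.
Answer: We should compute (or at least estimate) $\operatorname{cond}(\hat{A})$. If $\operatorname{cond}(\hat{A}) < \frac{1}{\varepsilon_M}$ then we can guarantee that any matrix $A$ which
is rounded to $\hat{A}$ must be nonsingular: $|\hat{a}_{ij} - a_{ij}| \le \varepsilon_M |a_{ij}|$ implies $\|\hat{A} - A\| \le \varepsilon_M \|A\|$ for the infinity or 1-norm. Therefore
$\frac{\|\hat{A} - A\|}{\|\hat{A}\|} \le \frac{\varepsilon_M}{1 - \varepsilon_M} \approx \varepsilon_M$ and $\operatorname{cond}(\hat{A}) < \frac{1 - \varepsilon_M}{\varepsilon_M} \approx \frac{1}{\varepsilon_M}$ imply $\frac{\|\hat{A} - A\|}{\|\hat{A}\|} < \frac{1}{\operatorname{cond}(\hat{A})}$. Hence the matrix $A$ must be nonsingular by the theorem above.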

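Now we assume that we perturb both the right hand side vector $b$ and the matrix $A$:
Assume $Ax = b$ and $\hat{A}\hat{x} = \hat{b}$. If $A$ is nonsingular and $\|\hat{A} - A\| < 1 / \|A^{-1}\|$ there holds

$$\frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\operatorname{cond}(A)}{1 - \operatorname{cond}(A) \frac{\|\hat{A} - A\|}{\|A\|}} \left( \frac{\|\hat{b} - b\|}{\|b\|} + \frac{\|\hat{A} - A\|}{\|A\|} \right)$$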

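Proof: Let $E = \hat{A} - A$, hence $A\hat{x} = \hat{b} - E\hat{x}$. Subtracting $Ax = b$ gives $A(\hat{x} - x) = (\hat{b} - b) - E\hat{x}$ and therefore, using $\|\hat{x}\| \le \|x\| + \|\hat{x} - x\|$,
$$\|\hat{x} - x\| \le \|A^{-1}\| \left( \|\hat{b} - b\| + \|E\| \, \|\hat{x}\| \right)$$
$$\left( 1 - \|A^{-1}\| \, \|E\| \right) \|\hat{x} - x\| \le \|A^{-1}\| \left( \|\hat{b} - b\| + \|E\| \, \|x\| \right)$$

Dividing by $\|x\|$ and using $\|b\| \le \|A\| \, \|x\| \iff \|x\| \ge \|b\| / \|A\|$ gives
$$\left( 1 - \|A^{-1}\| \, \|E\| \right) \frac{\|\hat{x} - x\|}{\|x\|} \le \|A^{-1}\| \left( \frac{\|\hat{b} - b\|}{\|b\| / \|A\|} + \|E\| \right)$$
$$\left( 1 - \operatorname{cond}(A) \frac{\|E\|}{\|A\|} \right) \frac{\|\hat{x} - x\|}{\|x\|} \le \operatorname{cond}(A) \left( \frac{\|\hat{b} - b\|}{\|b\|} + \frac{\|E\|}{\|A\|} \right)$$
If $\|E\| < 1 / \|A^{-1}\| \iff \frac{\|E\|}{\|A\|} < 1 / \operatorname{cond}(A)$ we can divide by $1 - \operatorname{cond}(A) \frac{\|E\|}{\|A\|}$. $\square$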

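If $\operatorname{cond}(A) \frac{\|\hat{A} - A\|}{\|A\|} \ll 1$ we have that both the relative error in the right hand side vector and in the matrix are magnified by $\operatorname{cond}(A)$.
If $\operatorname{cond}(A) \frac{\|\hat{A} - A\|}{\|A\|} = \|A^{-1}\| \, \|\hat{A} - A\| \ge 1$ then by the theorem for the condition number the matrix $\hat{A}$ may actually be singular,
so that the solution $\hat{x}$ is no longer well defined.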
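The bound for simultaneous perturbations of the matrix and the right hand side can be compared with the actual error on a small example. In the sketch below the perturbed data `A_hat` and `b_hat` are hypothetical values chosen for illustration, and the infinity-norm is used throughout.

```python
def solve_2x2(A, b):
    """Cramer's rule for a 2x2 system (illustration only)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

def norm_inf(v):
    return max(abs(t) for t in v)

def mat_norm_inf(A):
    return max(sum(abs(t) for t in row) for row in A)

A = [[1.01, 0.99], [0.99, 1.01]]
b = [2.0, 2.0]
A_hat = [[1.011, 0.99], [0.99, 1.01]]   # hypothetical perturbed matrix
b_hat = [2.002, 2.0]                    # hypothetical perturbed right hand side

x = solve_2x2(A, b)
x_hat = solve_2x2(A_hat, b_hat)
actual = norm_inf([s - t for s, t in zip(x_hat, x)]) / norm_inf(x)

# Columns of A^{-1} from two solves, then cond_inf(A) = ||A|| ||A^{-1}||.
c0, c1 = solve_2x2(A, [1.0, 0.0]), solve_2x2(A, [0.0, 1.0])
A_inv = [[c0[0], c1[0]], [c0[1], c1[1]]]
cond = mat_norm_inf(A) * mat_norm_inf(A_inv)

E = [[ah - a for ah, a in zip(rh, r)] for rh, r in zip(A_hat, A)]
rel_A = mat_norm_inf(E) / mat_norm_inf(A)
rel_b = norm_inf([s - t for s, t in zip(b_hat, b)]) / norm_inf(b)

assert cond * rel_A < 1, "the bound requires ||E|| < 1 / ||A^{-1}||"
bound = cond / (1 - cond * rel_A) * (rel_b + rel_A)
print(actual, bound)
```

Here the actual relative error is well below the bound, since the bound covers the worst case over all perturbations of the given size.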

Computing the condition number

We have seen that the condition number is very useful: It tells us what accuracy we can expect for the solution, and how
close our matrix is to a singular matrix.
In order to compute the condition number we have to find $A^{-1}$. This takes $n^3 + O(n^2)$ operations, compared with $\frac{n^3}{3} + O(n^2)$
operations for the LU-decomposition. Therefore the computation of the condition number would make the solution of a
linear system 3 times as expensive. For large problems this is not reasonable.
However, we do not need to compute the condition number with full machine accuracy. Just knowing the order of magnitude
is sufficient. Since $\|A^{-1}\| = \max_{\|z\| = 1} \|A^{-1} z\|$ we can pick any vector $z$ with $\|z\| = 1$ and have the lower bound $\|A^{-1}\| \ge \|A^{-1} z\|$.
Note that $u = A^{-1} z$ is the solution of the linear system $Au = z$. If we already have L,U,p we can easily find u using
u=U\(L\z(p)) with a cost of only $n^2 + O(n)$ operations. The trick is to pick $z$ with $\|z\| = 1$ such that $\|A^{-1} z\|$ becomes as
large as possible, so that it is close to $\|A^{-1}\|$. There are heuristic methods available which achieve fairly good lower bounds:
(i) Pick $c = (\pm 1, \ldots, \pm 1)$ and pick the signs so that the forward substitution gives a large vector, (ii) picking $\tilde{c} := z$ and solving
$A\tilde{z} = \tilde{c}$ often improves the lower bound. The Matlab functions condest(A) and 1/rcond(A) use similar ideas to give lower
bounds for $\operatorname{cond}_1(A)$. Typically they give an estimated condition number $c$ with $c \le \operatorname{cond}_1(A) \le 3c$ and require the solution
of 2 or 3 linear systems which costs $O(n^2)$ operations if the LU decomposition is known. (However, the Matlab commands
condest and rcond only use the matrix A as an input value, so they have to compute the LU decomposition of A first and
need $\frac{n^3}{3} + O(n^2)$ operations.)
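The lower bound idea can be illustrated on the 2 by 2 example. The sketch below is not the algorithm behind condest or rcond; it simply tries given vectors z with infinity-norm 1 and keeps the largest value of the norm of the solution u of Au = z. The `solve` argument stands in for a solve with existing LU factors, and trying all sign patterns is feasible only for tiny n.

```python
from itertools import product

def solve_2x2(A, b):
    """Stand-in for a solve with existing LU factors (Cramer's rule, 2x2 only)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

def cond_inf_lower_bound(A, solve, trial_vectors):
    """Lower bound ||A||_inf * max ||A^{-1} z||_inf over trial z with ||z||_inf = 1."""
    norm_A = max(sum(abs(t) for t in row) for row in A)
    best = 0.0
    for z in trial_vectors:
        u = solve(A, z)                             # u = A^{-1} z
        best = max(best, max(abs(t) for t in u))    # ||u||_inf <= ||A^{-1}||_inf
    return norm_A * best

A = [[1.01, 0.99], [0.99, 1.01]]
one_trial = cond_inf_lower_bound(A, solve_2x2, [[1.0, 1.0]])
all_signs = cond_inf_lower_bound(
    A, solve_2x2, [list(z) for z in product([1.0, -1.0], repeat=2)])
print(one_trial)   # about 1: a poor choice of signs
print(all_signs)   # about 100: the signs (1, -1) reveal the ill-conditioning
```

The two results show why the choice of signs matters: every trial vector gives a valid lower bound, but only a well chosen one comes close to the true condition number.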

Computation in machine arithmetic and residuals

When we run Gaussian elimination on a computer each single operation causes some roundoff error, and instead of the exact
solution x of a linear system we only get an approximation x̂. As explained above we should select the pivot candidate with
the largest absolute value to avoid unnecessary subtractive cancellation, and this usually is a numerically stable algorithm.
However, there is no theorem which guarantees this for partial pivoting (row interchanges). (For “full pivoting” with row
and column interchanges some theoretical results exist. However, this algorithm is more expensive, and for all practical
examples partial pivoting seems to work fine.)

Question 1: How much error do we have to accept for $\frac{\|\hat{x} - x\|}{\|x\|}$? This is the unavoidable error which occurs even for an
ideal algorithm where we only round the input values and the output value to machine accuracy, and use infinite accuracy
for all computations.
When we want to solve $Ax = b$ we have to store the entries of $A$, $b$ in the computer, yielding a matrix $\hat{A}$ and a right hand side
vector $\hat{b}$ of machine numbers so that $\frac{\|\hat{A} - A\|}{\|A\|} \le \varepsilon_M$ and $\frac{\|\hat{b} - b\|}{\|b\|} \le \varepsilon_M$. An ideal algorithm would then try to solve this linear
system exactly, i.e., compute a vector $\hat{x}$ such that $\hat{A}\hat{x} = \hat{b}$. Then we have
$$\frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\operatorname{cond}(A)}{1 - \operatorname{cond}(A)\,\varepsilon_M} (\varepsilon_M + \varepsilon_M) \approx 2 \operatorname{cond}(A)\,\varepsilon_M$$
if $\operatorname{cond}(A) \ll 1/\varepsilon_M$. Therefore the unavoidable error is $2 \operatorname{cond}(A)\,\varepsilon_M$.
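To get a feeling for the size of the unavoidable error, the bound can be evaluated for a few condition numbers. The sketch assumes IEEE double precision with round to nearest, so that the machine accuracy is taken as half of `sys.float_info.epsilon`; that identification is an assumption, not something stated in these notes.

```python
import sys

# Assumed machine accuracy: unit roundoff 2**-53 of IEEE double precision.
eps_M = sys.float_info.epsilon / 2

def unavoidable_error(cond):
    """Evaluate cond / (1 - cond * eps_M) * 2 * eps_M; needs cond * eps_M < 1."""
    if cond * eps_M >= 1:
        raise ValueError("cond(A) >= 1/eps_M: no error bound available")
    return cond / (1 - cond * eps_M) * 2 * eps_M

for c in (1e2, 1e8, 1e14):
    print(c, unavoidable_error(c))
```

Roughly speaking, each factor of 10 in the condition number costs one decimal digit of accuracy in the solution.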

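Question 2: After we computed $\hat{x}$ how can we check how good our computation was? The obvious thing to check is
$\hat{b} := A\hat{x}$ and to compare it with $b$. The difference $r = \hat{b} - b$ is called the residual. As $Ax = b$ and $A\hat{x} = \hat{b}$ we have

$$\frac{\|\hat{x} - x\|}{\|x\|} \le \operatorname{cond}(A) \, \frac{\|\hat{b} - b\|}{\|b\|}$$

where $\|\hat{b} - b\| / \|b\|$ is called the relative residual. We can compute (or at least estimate) $\operatorname{cond}(A)$, and therefore can obtain
an upper bound for the error $\|\hat{x} - x\| / \|x\|$. Actually, we can obtain a slightly better estimate by using

$$\|\hat{x} - x\| = \|A^{-1}(\hat{b} - b)\| \le \|A^{-1}\| \, \|\hat{b} - b\| \;\Longrightarrow\; \frac{\|\hat{x} - x\|}{\|\hat{x}\|} \le \operatorname{cond}(A) \, \frac{\|\hat{b} - b\|}{\|A\| \, \|\hat{x}\|}$$

with the weighted residual $\rho := \frac{\|\hat{b} - b\|}{\|A\| \, \|\hat{x}\|}$. Note that $\|\hat{x} - x\| / \|\hat{x}\| \le \delta$ implies for $\delta < 1$
$$\|\hat{x} - x\| \le \delta \, \|x + (\hat{x} - x)\| \le \delta \left( \|x\| + \|\hat{x} - x\| \right) \;\Longrightarrow\; \frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\delta}{1 - \delta}$$

which is the same as $\delta$ up to higher order terms $O(\delta^2)$.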
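The weighted residual and the resulting error bound are cheap to evaluate once a computed solution is available. In the following sketch the vector `x_hat` is a hypothetical computed solution, and the exact solution is used only to compare the bound with the true error.

```python
def mat_vec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm_inf(v):
    return max(abs(t) for t in v)

def mat_norm_inf(A):
    return max(sum(abs(t) for t in row) for row in A)

def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.01, 0.99], [0.99, 1.01]]
b = [2.0, 2.0]
x = [1.0, 1.0]             # exact solution, known here for comparison
x_hat = [1.001, 1.002]     # hypothetical computed solution

r = [s - t for s, t in zip(mat_vec(A, x_hat), b)]           # residual A x_hat - b
rho = norm_inf(r) / (mat_norm_inf(A) * norm_inf(x_hat))     # weighted residual
cond = mat_norm_inf(A) * mat_norm_inf(inverse_2x2(A))
delta = cond * rho                                          # bound relative to ||x_hat||

diff = [s - t for s, t in zip(x_hat, x)]
err_vs_xhat = norm_inf(diff) / norm_inf(x_hat)
err_vs_x = norm_inf(diff) / norm_inf(x)
print(rho, delta, err_vs_xhat, err_vs_x, delta / (1 - delta))
```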

If $\|\hat{b} - b\| / \|b\|$ is not much larger than $\varepsilon_M$ then the computation was numerically stable: Just perturbing the input slightly
from $b$ to $\hat{b}$ and then doing everything else exactly would give the same result $\hat{x}$.
But it can happen that the relative residual is much larger than $\varepsilon_M$, and yet the computation is numerically stable. We obtain
a better way to measure numerical stability by considering perturbations of the matrix $A$:
Assume we have a computed solution $\hat{x}$. If we can find a slightly perturbed matrix $\tilde{A}$ such that

$$\frac{\|\tilde{A} - A\|}{\|A\|} \le \varepsilon, \qquad \tilde{A}\hat{x} = b \tag{2}$$
where $\varepsilon$ is not much larger than $\varepsilon_M$, then the computation is numerically stable: Just perturbing the matrix within the roundoff
error and then doing everything exactly gives the same result as our computation.
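How can we check whether such a matrix $\tilde{A}$ exists? We again use the “weighted residual”

$$\rho := \frac{\|\hat{b} - b\|}{\|A\| \, \|\hat{x}\|}.$$
1. If $\hat{x}$ is the solution of a slightly perturbed problem (2) we have $\rho \le \varepsilon$.
2. If $\rho \le \varepsilon$ then $\hat{x}$ is the solution of a slightly perturbed problem (2).
Proof:
1. Let $E = \tilde{A} - A$. Then $(A + E)\hat{x} = b$ or $\hat{b} - b = -E\hat{x}$ yielding

$$\|\hat{b} - b\| \le \|E\| \, \|\hat{x}\|, \qquad \frac{\|\hat{b} - b\|}{\|A\| \, \|\hat{x}\|} \le \frac{\|E\|}{\|A\|} \le \varepsilon.$$

2. Let $y := b - \hat{b}$. Using Lemma 2 we get a matrix $E$ with $E\hat{x} = y$ and $\|E\| = \frac{\|y\|}{\|\hat{x}\|}$. Then $\tilde{A} := A + E$ satisfies $\tilde{A}\hat{x} =
(A + E)\hat{x} = \hat{b} + (b - \hat{b}) = b$ and $\frac{\|E\|}{\|A\|} = \frac{\|y\|}{\|\hat{x}\| \, \|A\|} = \rho \le \varepsilon$. $\square$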
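Part 2 of the argument can be made concrete: given a computed solution, one can build a perturbed matrix for which that vector is an exact solution. The sketch below uses the infinity-norm version of the Lemma 2 construction; `x_hat` is a hypothetical computed solution and all names are illustrative.

```python
def mat_vec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def norm_inf(v):
    return max(abs(t) for t in v)

def mat_norm_inf(A):
    return max(sum(abs(t) for t in row) for row in A)

A = [[1.01, 0.99], [0.99, 1.01]]
b = [2.0, 2.0]
x_hat = [1.001, 1.002]                       # hypothetical computed solution

b_hat = mat_vec(A, x_hat)
y = [s - t for s, t in zip(b, b_hat)]        # y = b - b_hat
rho = norm_inf(y) / (mat_norm_inf(A) * norm_inf(x_hat))

# Lemma 2 (infinity norm): E = y a^T / ||x_hat||_inf, a = sign(x_hat_j) e_j.
n = len(x_hat)
j = max(range(n), key=lambda k: abs(x_hat[k]))
s = 1.0 if x_hat[j] > 0 else -1.0
E = [[(y[i] * s / abs(x_hat[j])) if k == j else 0.0 for k in range(n)]
     for i in range(n)]
A_tilde = [[A[i][k] + E[i][k] for k in range(n)] for i in range(n)]

check = mat_vec(A_tilde, x_hat)              # should reproduce b up to roundoff
backward_err = mat_norm_inf(E) / mat_norm_inf(A)
print(check, backward_err, rho)
```

The relative size of the perturbation equals the weighted residual, so checking the weighted residual is the same as checking whether a nearby matrix has the computed vector as its exact solution.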


• Recommended method for solving linear systems on a computer:

1. Given A find L,U, p using Gaussian elimination with pivoting, choosing the pivot candidate with the largest
absolute value.
2. Solve $Lu = \tilde{b}$ (where $\tilde{b}_i = b_{p_i}$) by forward substitution and $Ux = u$ by back substitution.
• DO NOT compute the inverse matrix $A^{-1}$. This takes about 3 times as long as computing the LU decomposition.
• The condition number $\operatorname{cond}(A) = \|A\| \, \|A^{-1}\|$ characterizes the sensitivity of the linear system:

– If $Ax = b$ and $A\hat{x} = \hat{b}$ we have $\frac{\|\hat{x} - x\|}{\|x\|} \le \operatorname{cond}(A) \, \frac{\|\hat{b} - b\|}{\|b\|}$.
– The unavoidable error due to the rounding of $A$ and $b$ is approximately $2 \operatorname{cond}(A)\,\varepsilon_M$.
– If $\operatorname{cond}(\hat{A}) \ge 1/\varepsilon_M$ then the matrix $\hat{A}$ could be the machine representation of a singular matrix $A$, and the computed solution is usually meaningless.
– To check whether a matrix A is singular, up to a perturbation within machine accuracy:
∗ DO NOT use the determinant det A
∗ DO NOT use the pivot elements u j j obtained by Gaussian elimination
∗ DO use the condition number and check cond(A) < 1/εM
• You should compute an approximation to the condition number $\operatorname{cond}(A) = \|A\| \, \|A^{-1}\|$. Here $\|A^{-1}\|$ can be approximated by solving a few linear systems with the existing LU decomposition (condest in Matlab).
• In order to check the accuracy of a computed solution $\hat{x}$ compute the residual $r := A\hat{x} - b$ and the weighted residual
$\rho := \frac{\|r\|}{\|A\| \, \|\hat{x}\|}$.

– we get an error bound $\frac{\|\hat{x} - x\|}{\|\hat{x}\|} \le \operatorname{cond}(A)\,\rho$
– if ρ is not much larger than εM the computation is numerically stable: Perturbing A and b with a relative error
εM would cause the same error.
– if ρ is much larger than εM the computation is numerically unstable: The error is much larger than the error
resulting from the uncertainty of the input values.
We can obtain a more accurate result by iterative improvement: Let r := Ax̂ − b and solve Ae = r using the
existing LU decomposition. Then let x̂ := x̂ − e.
• Gaussian elimination with the pivoting strategy of choosing the largest absolute value is in almost all cases numerically
stable. We can check this by computing the weighted residual ρ. If ρ is much larger than εM we can compute a
numerically stable result by using iterative improvement.
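The recommended procedure (LU decomposition with pivoting, two triangular solves, and one step of iterative improvement) can be sketched in plain Python as follows. The function names and the artificially contaminated solution are assumptions made for illustration. The residual is computed in ordinary double precision here; using higher precision for the residual is a common refinement that these notes do not discuss.

```python
def lu_factor(A):
    """Gaussian elimination with partial pivoting; L (unit lower) and U packed in LU."""
    n = len(A)
    LU = [row[:] for row in A]
    p = list(range(n))
    for k in range(n - 1):
        # choose the pivot candidate with the largest absolute value
        m = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[m][k] == 0.0:
            raise ValueError("matrix is singular")
        LU[k], LU[m] = LU[m], LU[k]
        p[k], p[m] = p[m], p[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                   # multiplier, stored in place of L
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    if LU[n - 1][n - 1] == 0.0:
        raise ValueError("matrix is singular")
    return LU, p

def lu_solve(LU, p, b):
    n = len(b)
    u = [b[i] for i in p]                          # permuted right hand side
    for i in range(n):                             # forward substitution L u = b(p)
        for j in range(i):
            u[i] -= LU[i][j] * u[j]
    x = u[:]
    for i in reversed(range(n)):                   # back substitution U x = u
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x

def improve(A, LU, p, b, x_hat):
    """One step of iterative improvement: r = A x_hat - b, solve A e = r, return x_hat - e."""
    r = [sum(a * t for a, t in zip(row, x_hat)) - bi for row, bi in zip(A, b)]
    e = lu_solve(LU, p, r)
    return [xi - ei for xi, ei in zip(x_hat, e)]

A = [[1.01, 0.99], [0.99, 1.01]]
b = [2.0, 2.0]
LU, p = lu_factor(A)
x_hat = lu_solve(LU, p, b)
x_hat[0] += 1e-6                                   # hypothetical contamination of the result
x_new = improve(A, LU, p, b, x_hat)
print(x_hat, x_new)
```

The improvement step reuses the existing factors, so it costs only one matrix-vector product and two triangular solves.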

