Numerical Linear Algebra

Victor Eijkhout

Fall 2022

Many algorithms are based on linear algebra, including some
non-obvious ones such as graph algorithms. This session will mostly
discuss aspects of solving linear systems, focusing on those that have
computational ramifications.

Linear algebra

• Mathematical aspects: mostly linear system solving

• Practical aspects: even simple operations are hard
– Dense matrix-vector product: scalability aspects
– Sparse matrix-vector: implementation

Let’s start with the math...

Two approaches to linear system solving
Solve Ax = b

Direct methods:

• Deterministic
• Exact up to machine precision
• Expensive (in time and space)

Iterative methods:

• Only approximate
• Cheaper in space and (possibly) time
• Convergence not guaranteed

Really bad example of direct method

Cramer’s rule
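Write |A| for the determinant, then

\[
x_i = \begin{vmatrix}
a_{11} & a_{12} & \dots & a_{1,i-1} & b_1 & a_{1,i+1} & \dots & a_{1n} \\
a_{21} &        & \dots &           & b_2 &           & \dots & a_{2n} \\
\vdots &        &       &           & \vdots &        &       & \vdots \\
a_{n1} &        & \dots &           & b_n &           & \dots & a_{nn}
\end{vmatrix} \Big/ |A|
\]

Time complexity O(n!)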

Not a good method either

Ax = b

• Compute A^{-1} explicitly,
• then x ← A^{-1} b.
• Numerical stability issues.
• Amount of work?

A closer look at linear system solving: direct methods

Gaussian elimination

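Example
\[
\begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 3 \end{pmatrix} x
= \begin{pmatrix} 16 \\ 26 \\ -19 \end{pmatrix}
\]
\[
\left[\begin{array}{rrr|r} 6 & -2 & 2 & 16 \\ 12 & -8 & 6 & 26 \\ 3 & -13 & 3 & -19 \end{array}\right]
\longrightarrow
\left[\begin{array}{rrr|r} 6 & -2 & 2 & 16 \\ 0 & -4 & 2 & -6 \\ 0 & -12 & 2 & -27 \end{array}\right]
\longrightarrow
\left[\begin{array}{rrr|r} 6 & -2 & 2 & 16 \\ 0 & -4 & 2 & -6 \\ 0 & 0 & -4 & -9 \end{array}\right]
\]
Solve x_3, then x_2, then x_1.

6, −4, −4 are the ‘pivots’.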

Gaussian elimination, step by step
⟨LU factorization⟩:
  for k = 1, n − 1:
    ⟨eliminate values in column k⟩
⟨eliminate values in column k⟩:
  for i = k + 1 to n:
    ⟨compute multiplier for row i⟩
    ⟨update row i⟩
⟨compute multiplier for row i⟩:
  aik ← aik / akk
⟨update row i⟩:
  for j = k + 1 to n:
    aij ← aij − aik ∗ akj

Gaussian elimination, all together
⟨LU factorization⟩:
  for k = 1, n − 1:
    for i = k + 1 to n:
      aik ← aik / akk
      for j = k + 1 to n:
        aij ← aij − aik ∗ akj

Amount of work:
\[
\sum_{k=1}^{n-1} \sum_{i,j>k} 1 = \sum_{k=1}^{n-1} (n-k)^2 \approx \sum_k k^2 \approx n^3/3
\]
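The triple loop can be written out directly in C. This is only a minimal sketch without pivoting: the function name `lu_inplace` and the row-major storage are assumptions made for illustration, not something the slides prescribe.

```c
#include <stddef.h>

/* In-place LU factorization without pivoting of an n-by-n matrix stored
   row-major in a[]. On return the strict lower triangle holds the
   multipliers (L has an implicit unit diagonal) and the upper triangle
   holds U. Returns 0 on success, -1 if a zero pivot is encountered
   during elimination. (Hypothetical helper for illustration.) */
int lu_inplace(double *a, size_t n) {
  for (size_t k = 0; k + 1 < n; k++) {
    double pivot = a[k * n + k];
    if (pivot == 0.0) return -1;           /* would need a row exchange */
    for (size_t i = k + 1; i < n; i++) {
      a[i * n + k] /= pivot;               /* multiplier for row i */
      for (size_t j = k + 1; j < n; j++)   /* update row i */
        a[i * n + j] -= a[i * n + k] * a[k * n + j];
    }
  }
  return 0;
}
```

Applied to the 3×3 example matrix, this leaves the pivots 6, −4, −4 on the diagonal and the multipliers 2, 1/2, 3 below it.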


If a pivot is zero, exchange that row with another one.

(If the matrix is nonsingular, there is always a row with a nonzero pivot.)
The best choice is the largest possible pivot in absolute value.
In fact, that is a good choice even if the pivot is not zero:
partial pivoting.
(Full pivoting would use both row and column exchanges.)

Roundoff control
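\[
\begin{pmatrix} \epsilon & 1 \\ 1 & 1 \end{pmatrix} x
= \begin{pmatrix} 1+\epsilon \\ 2 \end{pmatrix}
\]
with solution x = (1, 1)^t

Ordinary elimination:
\[
\begin{pmatrix} \epsilon & 1 \\ 0 & 1 - \frac{1}{\epsilon} \end{pmatrix} x
= \begin{pmatrix} 1+\epsilon \\ 2 - \frac{1+\epsilon}{\epsilon} \end{pmatrix}
= \begin{pmatrix} 1+\epsilon \\ 1 - \frac{1}{\epsilon} \end{pmatrix}.
\]

We can now solve x_2 and from it x_1:
\[
x_2 = (1 - \epsilon^{-1})/(1 - \epsilon^{-1}) = 1, \qquad
x_1 = \epsilon^{-1}(1 + \epsilon - x_2) = 1
\]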

Roundoff 2
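If ε < ε_mach, then in the right-hand side 1 + ε → 1, so the system is:
\[
\begin{pmatrix} \epsilon & 1 \\ 1 & 1 \end{pmatrix} x
= \begin{pmatrix} 1 \\ 2 \end{pmatrix}
\]
The solution (1, 1) is still correct to machine precision!

Elimination, where rounding turns both 1 − ε^{-1} and 2 − ε^{-1} into −ε^{-1}:
\[
\begin{pmatrix} \epsilon & 1 \\ 0 & 1 - \epsilon^{-1} \end{pmatrix} x
= \begin{pmatrix} 1 \\ 2 - \epsilon^{-1} \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} \epsilon & 1 \\ 0 & -\epsilon^{-1} \end{pmatrix} x
= \begin{pmatrix} 1 \\ -\epsilon^{-1} \end{pmatrix}
\]

Solving first x_2, then x_1, we get:
\[
x_2 = \epsilon^{-1}/\epsilon^{-1} = 1, \qquad
x_1 = \epsilon^{-1}(1 - 1 \cdot x_2) = \epsilon^{-1} \cdot 0 = 0,
\]
so x_2 is correct, but x_1 is completely wrong.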

Roundoff 3

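Pivot first:
\[
\begin{pmatrix} 1 & 1 \\ \epsilon & 1 \end{pmatrix} x
= \begin{pmatrix} 2 \\ 1+\epsilon \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} 1 & 1 \\ 0 & 1-\epsilon \end{pmatrix} x
= \begin{pmatrix} 2 \\ 1-\epsilon \end{pmatrix}
\]

Now we get, regardless of the size of ε:
\[
x_2 = \frac{1-\epsilon}{1-\epsilon} = 1, \qquad x_1 = 2 - x_2 = 1
\]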
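The effect can be reproduced in double precision. The sketch below is illustrative only: the function name `solve2x2` and the `pivot` flag are assumptions, and ε = 10^{-20} is simply one value well below machine epsilon.

```c
/* Absolute value without needing the math library. */
static double absval(double v) { return v < 0 ? -v : v; }

/* Solve the 2x2 system [a11 a12; a21 a22] x = (b1, b2) by elimination.
   If pivot is nonzero, the rows are exchanged first whenever
   |a21| > |a11| (partial pivoting). Hypothetical helper. */
void solve2x2(double a11, double a12, double a21, double a22,
              double b1, double b2, int pivot, double x[2]) {
  if (pivot && absval(a21) > absval(a11)) {
    double t;
    t = a11; a11 = a21; a21 = t;
    t = a12; a12 = a22; a22 = t;
    t = b1;  b1  = b2;  b2  = t;
  }
  double m = a21 / a11;            /* multiplier for the second row */
  a22 -= m * a12;
  b2  -= m * b1;
  x[1] = b2 / a22;                 /* back substitution */
  x[0] = (b1 - a12 * x[1]) / a11;
}
```

Calling `solve2x2(eps, 1, 1, 1, 1 + eps, 2, 0, x)` with `eps = 1e-20` should give x_1 = 0 and x_2 = 1, while the same call with `pivot` set to 1 should recover x_1 = x_2 = 1, assuming IEEE double arithmetic.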

LU factorization
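Same example again:
\[
A = \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 3 \end{pmatrix}
\]

2nd row minus 2× first; 3rd row minus 1/2× first;

equivalent to
\[
L_1 A x = L_1 b, \qquad
L_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix}
\]

(elementary elimination matrix, also called a Gauss transform)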

LU 2

Next step: L_2 L_1 A x = L_2 L_1 b with
\[
L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}
\]

Define U = L_2 L_1 A, then A = LU with L = L_1^{-1} L_2^{-1}:
‘LU factorization’ with U upper triangular; for L see next.

LU 3
   
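\[
L_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix}
\qquad
L_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix}
\]
\[
L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}
\qquad
L_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix}
\]
Even more remarkable:
\[
L_1^{-1} L_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}
\]
Lower triangular!
Can be computed in place! (pivoting?)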

Solve LU system
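Ax = b −→ LUx = b; solve in two steps:
Ly = b, and Ux = y

Forward sweep:
\[
\begin{pmatrix}
1 & & & & \emptyset \\
\ell_{21} & 1 \\
\ell_{31} & \ell_{32} & 1 \\
\vdots & \vdots & & \ddots \\
\ell_{n1} & \ell_{n2} & & \cdots & 1
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}
\]

y_1 = b_1, y_2 = b_2 − ℓ_21 y_1, ...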

Solve LU 2
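Backward sweep:
\[
\begin{pmatrix}
u_{11} & u_{12} & \dots & u_{1n} \\
 & u_{22} & \dots & u_{2n} \\
 & & \ddots & \vdots \\
\emptyset & & & u_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\]

\[
x_n = u_{nn}^{-1} y_n, \qquad
x_{n-1} = u_{n-1,n-1}^{-1} (y_{n-1} - u_{n-1,n} x_n), \quad \dots
\]

(Compute inverses once; store)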
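The two sweeps might look as follows in C. This is a sketch under the assumption that L and U share one row-major array `lu` (unit diagonal of L implied), as an in-place factorization would produce; the function names are made up for illustration.

```c
#include <stddef.h>

/* Forward sweep: solve L y = b, where L is unit lower triangular and is
   stored in the strict lower triangle of lu[] (row-major, n-by-n). */
void forward_sweep(const double *lu, const double *b, double *y, size_t n) {
  for (size_t i = 0; i < n; i++) {
    double s = b[i];
    for (size_t j = 0; j < i; j++) s -= lu[i * n + j] * y[j];
    y[i] = s;                       /* diagonal of L is 1: no division */
  }
}

/* Backward sweep: solve U x = y, with U in the upper triangle of lu[]. */
void backward_sweep(const double *lu, const double *y, double *x, size_t n) {
  for (size_t i = n; i-- > 0;) {    /* i = n-1 down to 0 */
    double s = y[i];
    for (size_t j = i + 1; j < n; j++) s -= lu[i * n + j] * x[j];
    x[i] = s / lu[i * n + i];
  }
}
```

This version divides by u_ii in every solve. When many right-hand sides are solved with the same factorization, storing the reciprocals of the pivots once, as suggested above, replaces those divisions by multiplications.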

Computational aspects

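Matrix-vector product:
\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
\leftarrow
\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]

Solving LU system:
\[
\begin{pmatrix} a_{11} & & \emptyset \\ \vdots & \ddots & \\ a_{n1} & \dots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
\]

(and similarly the U matrix)

Compare operation counts. Can you think of other points of
comparison? (Think modern computers.)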

Short detour: Partial Differential Equations

Second order PDEs; 1D case
\[
-u''(x) = f(x), \quad x \in [a, b]; \qquad u(a) = u_a, \quad u(b) = u_b
\]
Using Taylor series:
\[
u(x+h) + u(x-h) = 2u(x) + u''(x) h^2 + u^{(4)}(x) \frac{h^4}{12} + \cdots
\]
\[
u''(x) = \frac{u(x+h) - 2u(x) + u(x-h)}{h^2} + O(h^2)
\]
Numerical scheme:
\[
-\frac{u(x+h) - 2u(x) + u(x-h)}{h^2} = f(x, u(x), u'(x))
\]

This leads to linear algebra
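\[
-u_{xx} = f \quad\rightarrow\quad
\frac{2u(x) - u(x+h) - u(x-h)}{h^2} = f(x, u(x), u'(x))
\]
Equally spaced points on [0, 1]: x_k = kh where h = 1/(n + 1), then
\[
-u_{k+1} + 2u_k - u_{k-1} = h^2 f(x_k, u_k, u_k') \qquad \text{for } k = 1, \dots, n
\]

Written as matrix equation, with f_k = f(x_k, u_k, u_k'):
\[
\begin{pmatrix}
2 & -1 & & \emptyset \\
-1 & 2 & -1 & \\
 & \ddots & \ddots & \ddots \\
\emptyset & & &
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \end{pmatrix}
=
\begin{pmatrix} h^2 f_1 + u_0 \\ h^2 f_2 \\ \vdots \end{pmatrix}
\]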
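Because the matrix is tridiagonal, Gaussian elimination only touches one subdiagonal entry per row. The following is a minimal sketch of such a solver, assuming no pivoting is needed (the matrix is positive definite); the name `solve_poisson1d` and the separate `work` array are illustrative assumptions.

```c
#include <stddef.h>

/* Solve the n-by-n system tridiag(-1, 2, -1) u = rhs (n >= 1) by Gaussian
   elimination without pivoting. rhs[] holds the right-hand side of the
   matrix equation, boundary values already added in, and is overwritten.
   work[] (length n) receives the pivots; u[] receives the solution. */
void solve_poisson1d(size_t n, double *rhs, double *work, double *u) {
  work[0] = 2.0;
  for (size_t k = 1; k < n; k++) {
    double m = -1.0 / work[k - 1];   /* multiplier for row k */
    work[k] = 2.0 + m;               /* new pivot: 2 - 1/pivot_{k-1} */
    rhs[k] -= m * rhs[k - 1];
  }
  u[n - 1] = rhs[n - 1] / work[n - 1];
  for (size_t k = n - 1; k-- > 0;)   /* back substitution */
    u[k] = (rhs[k] + u[k + 1]) / work[k];
}
```

As a check, f = 2 with zero boundary values has the exact solution u(x) = x(1 − x); since the difference formula is exact for quadratics, the computed u_k should match x_k(1 − x_k) up to rounding.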

Second order PDEs; 2D case

\[
-u_{xx}(\bar x) - u_{yy}(\bar x) = f(\bar x), \quad \bar x \in \Omega = [0, 1]^2; \qquad
u(\bar x) = u_0, \quad \bar x \in \delta\Omega
\]
Now using central differences in both x and y directions:
\[
4u(x, y) - u(x+h, y) - u(x-h, y) - u(x, y+h) - u(x, y-h) = h^2 f(x, y)
\]

The stencil view of things

        −1
   −1    4   −1
        −1

Sparse matrix from 2D equation
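With n points per grid line, numbered line by line, the matrix has the block form
\[
A = \begin{pmatrix}
T & -I & & \emptyset \\
-I & T & -I & \\
 & \ddots & \ddots & \ddots \\
\emptyset & & -I & T
\end{pmatrix},
\qquad
T = \begin{pmatrix}
4 & -1 & & \emptyset \\
-1 & 4 & -1 & \\
 & \ddots & \ddots & \ddots \\
\emptyset & & -1 & 4
\end{pmatrix}
\]
where T and I are n × n. A typical row k has the nonzeros

column:   k−n   k−1    k    k+1   k+n
value:    −1    −1     4    −1    −1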

The stencil view is often more insightful.
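One way to see this: the matrix-vector product can be coded straight from the stencil, without ever storing the matrix. The sketch below assumes an n × n grid of interior points numbered line by line (k = i·n + j) and treats points outside the grid as boundary points that contribute nothing; `apply_stencil` is a hypothetical name.

```c
#include <stddef.h>

/* y = A u for the 5-point matrix on an n-by-n grid of interior points,
   numbered line by line (k = i*n + j). Neighbours outside the grid are
   boundary points and are skipped. */
void apply_stencil(size_t n, const double *u, double *y) {
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) {
      size_t k = i * n + j;
      double s = 4.0 * u[k];
      if (j > 0)     s -= u[k - 1];   /* column k-1 */
      if (j + 1 < n) s -= u[k + 1];   /* column k+1 */
      if (i > 0)     s -= u[k - n];   /* column k-n */
      if (i + 1 < n) s -= u[k + n];   /* column k+n */
      y[k] = s;
    }
  }
}
```

The four `if` tests correspond to the places where the −1 diagonals of the matrix are interrupted.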

Matrix properties
• Very sparse, banded
• Factorization takes less than n^2 space, n^3 work
• Symmetric (only because 2nd order problem)
• Sign pattern: positive diagonal, nonpositive off-diagonal
(true for many second order methods)
• Positive definite (just like the continuous problem)
• Constant diagonals: only because of the constant coefficient
differential equation
• Factorization: lower complexity than dense, recursion length less
than N.

Sparse matrices

Sparse matrix storage
Matrix above has many zeros: n^2 elements but only O(n) nonzeros.
Big waste of space to store this as square array.

Matrix is called ‘sparse’ if there are enough zeros to make specialized
storage feasible.

Compressed Row Storage
 
\[
A = \begin{pmatrix}
10 & 0 & 0 & 0 & -2 & 0 \\
3 & 9 & 0 & 0 & 0 & 3 \\
0 & 7 & 8 & 7 & 0 & 0 \\
3 & 0 & 8 & 7 & 5 & 0 \\
0 & 8 & 0 & 9 & 9 & 13 \\
0 & 4 & 0 & 0 & 2 & -1
\end{pmatrix}
\qquad (1)
\]
Compressed Row Storage (CRS): store all nonzeros by row, their
column indices, and pointers to where each row starts (1-based indexing):

val       10  -2   3   9   3   7   8   7   3  ···   9  13   4   2  -1
col_ind    1   5   1   2   6   2   3   4   1  ···   5   6   2   5   6
row_ptr    1   3   6   9  13  17  20
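To make the three arrays concrete, here is a sketch that builds them from a dense array. Note that it uses 0-based indices, as is natural in C, so every entry of `col_ind` and `row_ptr` is one less than in the 1-based listing above; the function name `dense_to_crs` is an assumption for illustration.

```c
#include <stddef.h>

/* Convert a dense row-major matrix to CRS with 0-based indices.
   val[] and col_ind[] need room for all nonzeros, row_ptr[] for
   nrows+1 entries. Returns the number of nonzeros. */
size_t dense_to_crs(const double *a, size_t nrows, size_t ncols,
                    double *val, size_t *col_ind, size_t *row_ptr) {
  size_t nnz = 0;
  for (size_t i = 0; i < nrows; i++) {
    row_ptr[i] = nnz;                    /* where row i starts */
    for (size_t j = 0; j < ncols; j++) {
      if (a[i * ncols + j] != 0.0) {
        val[nnz] = a[i * ncols + j];
        col_ind[nnz] = j;
        nnz++;
      }
    }
  }
  row_ptr[nrows] = nnz;                  /* one past the last nonzero */
  return nnz;
}
```

The extra final entry of `row_ptr` means that row i always occupies positions `row_ptr[i]` up to, but not including, `row_ptr[i+1]`.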

Sparse matrix-vector operations

• Simplest, and important in many contexts: matrix-vector product.
• Matrix-matrix product: rare in engineering science,
  very important in Deep Learning.
• Gaussian elimination is a complicated story.
• In general: changes to sparse structure are hard!

Dense matrix-vector product
Most common operation in many cases: matrix-vector product

aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (col=0; col<ncols; col++) {
    s += a[aptr] * x[col];
    aptr++;   /* advance through the matrix, stored by rows */
  }
  y[row] = s;
}

Reuse? Locality? Cachelines?

Better implementation
Three loops: block, columns inside block, row;
permute blocks to outermost
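One possible reading of this three-loop structure is sketched below: the columns are split into blocks of width `bs`, and the block loop is outermost, so that one slice of x is reused across all rows before moving on. The name `matvec_blocked` and the block size parameter are assumptions; a good block size depends on the cache.

```c
#include <stddef.h>

/* y = A x for a dense row-major nrows-by-ncols matrix, processing the
   columns in blocks of width bs (bs > 0) so that the slice
   x[cb .. cb+bs) can stay in cache while all rows use it. */
void matvec_blocked(const double *a, const double *x, double *y,
                    size_t nrows, size_t ncols, size_t bs) {
  for (size_t row = 0; row < nrows; row++) y[row] = 0.0;
  for (size_t cb = 0; cb < ncols; cb += bs) {           /* block loop */
    size_t cend = cb + bs < ncols ? cb + bs : ncols;
    for (size_t row = 0; row < nrows; row++) {          /* row loop */
      double s = 0.0;
      for (size_t col = cb; col < cend; col++)          /* columns in block */
        s += a[row * ncols + col] * x[col];
      y[row] += s;
    }
  }
}
```

The price is that y is now read and written once per block instead of once in total, so this only pays off when x is too large to stay in cache.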

Sparse matrix-vector product
for (row=0; row<nrows; row++) {
  s = 0;
  for (icol=ptr[row]; icol<ptr[row+1]; icol++) {
    int col = ind[icol];
    s += a[icol] * x[col];   /* nonzero number icol sits in column col */
  }
  y[row] = s;
}

Again: Reuse? Locality? Cachelines?

Indirect addressing of x gives low spatial and temporal locality.

Exercise: sparse coding

What if you need access to both rows and columns at the same time?
Implement an algorithm that tests whether a matrix stored in CRS
format is symmetric. Hint: keep an array of pointers, one for each row,
that keeps track of how far you have progressed in that row.

Remember Gaussian elimination algorithm:

for k = 1, n − 1:
  for i = k + 1 to n:
    for j = k + 1 to n:
      aij ← aij − aik ∗ akj / akk

Fill-in: index (i, j) where aij = 0 originally, but gets updated to nonzero
(and so ℓij ≠ 0 or uij ≠ 0).

Change in the sparsity structure! How do you deal with that?

LU of a sparse matrix

   
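\[
\begin{pmatrix}
2 & -1 & 0 & \dots \\
-1 & 2 & -1 & \\
0 & -1 & 2 & -1 \\
 & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix}
2 & -1 & 0 & \dots \\
0 & 2 - \frac{1}{2} & -1 & \\
0 & -1 & 2 & -1 \\
 & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}
\]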

How does this continue by induction?


LU of a sparse matrix
 
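\[
\begin{pmatrix}
4 & -1 & 0 & \dots & -1 & & \\
-1 & 4 & -1 & \dots & 0 & -1 & \\
\vdots & & \ddots & & & & \ddots \\
-1 & 0 & \dots & & 4 & -1 & \\
0 & -1 & 0 & \dots & -1 & 4 & -1
\end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix}
4 & -1 & 0 & \dots & -1 & & \\
 & 4 - \frac{1}{4} & -1 & \dots & -1/4 & -1 & \\
 & & \ddots & & & & \ddots \\
 & -1/4 & \dots & & 4 - \frac{1}{4} & -1 & \\
 & -1 & 0 & \dots & -1 & 4 & -1
\end{pmatrix}
\]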

A little graph theory
A graph is a tuple G = ⟨V, E⟩ where V = {v_1, ..., v_n} for some n, and
E ⊂ {(i, j) : 1 ≤ i, j ≤ n, i ≠ j}.

V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (2, 6), (4, 3), (4, 4), (4, 5)}

Graphs and matrices
For a graph G = ⟨V , E ⟩, the adjacency matrix M is defined by
\[
M_{ij} = \begin{cases} 1 & (i, j) \in E \\ 0 & \text{otherwise} \end{cases}
\]

A dense and a sparse matrix, both with their adjacency graph

Fill-in: index (i , j ) where aij = 0 originally, but gets updated to non-zero.

aij ← aij − aik ∗ akj /akk

Eliminating a vertex introduces a new edge in the quotient graph

LU of sparse matrix, with graph view: 1

Original matrix.

LU of sparse matrix, with graph view: 2

Eliminating (2, 1) causes fill-in at (2, 3).

LU of sparse matrix, with graph view: 3

Remaining matrix when step 1 finished.

LU of sparse matrix, with graph view: 4

Eliminating (3, 2) fills (3, 4)

LU of sparse matrix, with graph view: 5

After step 2

Fill-in is a function of ordering

 
\[
\begin{pmatrix}
* & * & \cdots & * \\
* & * & & \emptyset \\
\vdots & & \ddots & \\
* & \emptyset & & *
\end{pmatrix}
\]
After factorization the matrix is dense.
Can this be permuted?
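A small experiment can answer this. The sketch below runs elimination on the nonzero pattern only (assuming no accidental cancellation) and counts the fill-in for an 'arrow' pattern whose full row and column sit at a chosen index. Both function names are hypothetical.

```c
#include <stddef.h>

/* Symbolic elimination on an n-by-n nonzero pattern (row-major, 1 = nonzero).
   Position (i,j) becomes nonzero whenever (i,k) and (k,j) are nonzero for
   a pivot k < i,j. Returns the number of new nonzeros (fill-in).
   Assumes numerical cancellation never produces an exact zero. */
size_t count_fill(unsigned char *pat, size_t n) {
  size_t fill = 0;
  for (size_t k = 0; k + 1 < n; k++)
    for (size_t i = k + 1; i < n; i++) {
      if (!pat[i * n + k]) continue;       /* nothing to eliminate */
      for (size_t j = k + 1; j < n; j++)
        if (pat[k * n + j] && !pat[i * n + j]) {
          pat[i * n + j] = 1;              /* new nonzero: fill-in */
          fill++;
        }
    }
  return fill;
}

/* Arrow pattern: full diagonal plus one full row and column at index dense. */
void arrow_pattern(unsigned char *pat, size_t n, size_t dense) {
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      pat[i * n + j] = (i == j || i == dense || j == dense);
}
```

With the full row and column first (`dense = 0`), every off-diagonal position of the remaining block fills in; with them last (`dense = n - 1`), the count is zero. So a symmetric permutation that moves the first variable to the end removes the fill-in entirely.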

Exercise: LU of a penta-diagonal matrix
Consider the matrix
 
\[
\begin{pmatrix}
2 & 0 & -1 & & & \\
0 & 2 & 0 & -1 & & \\
-1 & 0 & 2 & 0 & -1 & \\
 & -1 & 0 & 2 & 0 & -1 \\
 & & \ddots & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}
\]
Describe the LU factorization of this matrix:

• Convince yourself that there will be no fill-in. Give an inductive
proof of this.
• What does the graph of this matrix look like? (Find a tutorial on
graph theory. What is a name for such a graph?)
• Can you relate this graph to the answer to the question of the fill-in?

Exercise: LU of a band matrix

Suppose a matrix A is banded with halfbandwidth p:

aij = 0 if |i − j | > p

Derive how much space an LU factorization of A will take if no pivoting
is used. (For bonus points: consider partial pivoting.)

Can you also derive how much space the inverse will take? (Hint: if
A = LU, does that give you an easy formula for the inverse?)


