hpc_linear
Victor Eijkhout
Fall 2022
Justification

Linear algebra
Two approaches to linear system solving
Solve Ax = b
Direct methods:
• Deterministic
• Exact up to machine precision
• Expensive (in time and space)
Iterative methods:
• Only approximate
• Cheaper in space and (possibly) time
• Convergence not guaranteed
Really bad example of a direct method
Cramer's rule
Write $|A|$ for the determinant; then, with $A_i$ denoting $A$ with column $i$ replaced by $b$:
$$x_i = |A_i| / |A|$$
Evaluating the determinants by cofactor expansion costs $O(n!)$ operations.
Not a good method either
Solving $Ax = b$ by explicitly computing $A^{-1}$ and forming $x = A^{-1}b$: more expensive than a factorization, and numerically less stable.
A close look at linear system solving: direct methods
Gaussian elimination
Example
$$\begin{pmatrix} 6&-2&2\\ 12&-8&6\\ 3&-13&3 \end{pmatrix} x = \begin{pmatrix} 16\\ 26\\ -19 \end{pmatrix}$$
$$\left(\begin{array}{ccc|c} 6&-2&2&16\\ 12&-8&6&26\\ 3&-13&3&-19 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 6&-2&2&16\\ 0&-4&2&-6\\ 0&-12&2&-27 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 6&-2&2&16\\ 0&-4&2&-6\\ 0&0&-4&-9 \end{array}\right)$$
Solve $x_3$, then $x_2$, then $x_1$.
Gaussian elimination, step by step
⟨LU factorization⟩:
for k = 1, n − 1:
⟨eliminate values in column k ⟩
⟨eliminate values in column k ⟩:
for i = k + 1 to n:
⟨compute multiplier for row i ⟩
⟨update row i ⟩
⟨compute multiplier for row i ⟩:
aik ← aik /akk
⟨update row i ⟩:
for j = k + 1 to n:
aij ← aij − aik ∗ akj
Gaussian elimination, all together
⟨LU factorization⟩:
for k = 1, n − 1:
for i = k + 1 to n:
aik ← aik /akk
for j = k + 1 to n:
aij ← aij − aik ∗ akj
Amount of work:
$$\sum_{k=1}^{n-1} \sum_{i,j>k} 1 = \sum_{k=1}^{n-1} (n-k)^2 \approx \sum_k k^2 \approx n^3/3$$
(precisely: $\sum_{j=1}^{n-1} j^2 = (n-1)n(2n-1)/6 \approx n^3/3$)
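A minimal sketch of this algorithm in C (my own illustration, not code from the slides): in-place factorization of a dense, row-major array, without pivoting; the multipliers overwrite the strictly lower triangle.

#include <stdio.h>

/* In-place LU factorization without pivoting: on return the strict
   lower triangle of `a` holds the multipliers (L, with unit diagonal
   implied) and the upper triangle holds U. `a` is n x n, row-major. */
void lu_factor(double *a, int n) {
  for (int k = 0; k < n-1; k++)
    for (int i = k+1; i < n; i++) {
      a[i*n+k] /= a[k*n+k];            /* multiplier for row i */
      for (int j = k+1; j < n; j++)    /* update row i */
        a[i*n+j] -= a[i*n+k] * a[k*n+j];
    }
}

int main(void) {
  /* the 3x3 example from these slides */
  double a[9] = { 6,-2,2, 12,-8,6, 3,-13,3 };
  lu_factor(a, 3);
  for (int i = 0; i < 3; i++)    /* prints L and U packed in one array */
    printf("%6.2f %6.2f %6.2f\n", a[3*i], a[3*i+1], a[3*i+2]);
  return 0;
}

The printed multipliers (2, 1/2, 3) and the upper triangle match the elimination steps of the worked example.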
Pivoting
Roundoff control
Consider
$$\begin{pmatrix} \epsilon&1\\ 1&1 \end{pmatrix} x = \begin{pmatrix} 1+\epsilon\\ 2 \end{pmatrix}$$
with solution $x = (1,1)^t$.
Ordinary elimination:
$$\begin{pmatrix} \epsilon&1\\ 0&1-\frac{1}{\epsilon} \end{pmatrix} x
= \begin{pmatrix} 1+\epsilon\\ 2-\frac{1+\epsilon}{\epsilon} \end{pmatrix}
= \begin{pmatrix} 1+\epsilon\\ 1-\frac{1}{\epsilon} \end{pmatrix}.$$
Roundoff 2
If $\epsilon < \epsilon_{\mathrm{mach}}$, then in the rhs $1+\epsilon \rightarrow 1$, so the system is:
$$\begin{pmatrix} \epsilon&1\\ 1&1 \end{pmatrix} x = \begin{pmatrix} 1\\ 2 \end{pmatrix}$$
Eliminating:
$$\begin{pmatrix} \epsilon&1\\ 0&1-\epsilon^{-1} \end{pmatrix} x = \begin{pmatrix} 1\\ 2-\epsilon^{-1} \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} \epsilon&1\\ 0&-\epsilon^{-1} \end{pmatrix} x = \begin{pmatrix} 1\\ -\epsilon^{-1} \end{pmatrix}$$
Back substitution now gives $x_2 = 1$ and then $\epsilon x_1 + 1 = 1$, so $x_1 = 0$: far from the true solution.
Roundoff 3
Pivot first:
$$\begin{pmatrix} 1&1\\ \epsilon&1 \end{pmatrix} x = \begin{pmatrix} 2\\ 1+\epsilon \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} 1&1\\ 0&1-\epsilon \end{pmatrix} x = \begin{pmatrix} 2\\ 1-\epsilon \end{pmatrix}$$
$$x_2 = \frac{1-\epsilon}{1-\epsilon} = 1, \qquad x_1 = 2 - x_2 = 1$$
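The effect is easy to reproduce; a small C experiment (my own sketch, using an ε far below machine precision for double) carries out both eliminations on the rounded system:

#include <stdio.h>

int main(void) {
  double eps = 1e-20;   /* well below machine epsilon for double */

  /* no pivoting: eliminate with pivot eps */
  double m  = 1.0/eps;                          /* multiplier */
  double x2 = (2.0 - m*(1.0+eps)) / (1.0 - m);
  double x1 = ((1.0+eps) - x2) / eps;
  printf("no pivoting:   x1=%g x2=%g\n", x1, x2);  /* x1 comes out 0 */

  /* pivot first: exchange the rows, so the pivot is 1 */
  x2 = ((1.0+eps) - eps*2.0) / (1.0 - eps);
  x1 = 2.0 - x2;
  printf("with pivoting: x1=%g x2=%g\n", x1, x2);  /* both come out 1 */
  return 0;
}

Without the row exchange the 2 in the right-hand side is absorbed into $-1/\epsilon$, and back substitution returns $x_1 = 0$.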
LU factorization
Same example again:
$$A = \begin{pmatrix} 6&-2&2\\ 12&-8&6\\ 3&-13&3 \end{pmatrix}$$
Eliminate the first column by multiplying with an elementary elimination matrix (Gauss transform).
LU 2
The first elimination step is multiplication by $L_1$ (given on the next slide):
$$L_1 A = \begin{pmatrix} 6&-2&2\\ 0&-4&2\\ 0&-12&2 \end{pmatrix},$$
and the second by $L_2$:
$$L_2 L_1 A = U = \begin{pmatrix} 6&-2&2\\ 0&-4&2\\ 0&0&-4 \end{pmatrix},
\quad\text{so}\quad A = L_1^{-1} L_2^{-1} U = LU.$$
LU 3
Observe:
$$L_1 = \begin{pmatrix} 1&0&0\\ -2&1&0\\ -1/2&0&1 \end{pmatrix}
\qquad
L_1^{-1} = \begin{pmatrix} 1&0&0\\ 2&1&0\\ 1/2&0&1 \end{pmatrix}$$
Likewise
$$L_2 = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&-3&1 \end{pmatrix}
\qquad
L_2^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&3&1 \end{pmatrix}$$
Even more remarkable:
$$L_1^{-1} L_2^{-1} = \begin{pmatrix} 1&0&0\\ 2&1&0\\ 1/2&3&1 \end{pmatrix}
\qquad\text{Lower triangular!}$$
Can be computed in place! (pivoting?)
Solve LU system
$Ax = b \longrightarrow LUx = b$: solve in two steps: $Ly = b$, then $Ux = y$.
Forward sweep:
$$\begin{pmatrix} 1&&&&\emptyset\\ \ell_{21}&1\\ \ell_{31}&\ell_{32}&1\\ \vdots&&&\ddots\\ \ell_{n1}&\ell_{n2}&\cdots&&1 \end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}
= \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{pmatrix}$$
$$y_1 = b_1, \quad y_2 = b_2 - \ell_{21} y_1, \ldots$$
Solve LU 2
Backward sweep:
$$\begin{pmatrix} u_{11}&u_{12}&\ldots&u_{1n}\\ &u_{22}&\ldots&u_{2n}\\ &&\ddots&\vdots\\ \emptyset&&&u_{nn} \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
= \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}$$
$$x_n = u_{nn}^{-1} y_n, \quad x_{n-1} = u_{n-1,n-1}^{-1} (y_{n-1} - u_{n-1,n} x_n), \ldots$$
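In C, both sweeps are short loops; a sketch (function names are my own) under the storage convention of the earlier LU sketch: unit lower triangle of the factored array holds L, upper triangle holds U, row-major.

/* Forward sweep Ly = b, with L unit lower triangular, stored in
   the strict lower triangle of the factored array `a`. */
void lsolve(const double *a, int n, const double *b, double *y) {
  for (int i = 0; i < n; i++) {
    double s = b[i];
    for (int j = 0; j < i; j++)
      s -= a[i*n+j] * y[j];
    y[i] = s;                    /* diagonal of L is 1 */
  }
}

/* Backward sweep Ux = y, with U in the upper triangle of `a`. */
void usolve(const double *a, int n, const double *y, double *x) {
  for (int i = n-1; i >= 0; i--) {
    double s = y[i];
    for (int j = i+1; j < n; j++)
      s -= a[i*n+j] * x[j];
    x[i] = s / a[i*n+i];
  }
}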
Computational aspects
Compare:
Short detour: Partial Differential Equations
Second order PDEs; 1D case
$$\begin{cases} -u''(x) = f(x) & x \in [a,b]\\ u(a) = u_a, \; u(b) = u_b \end{cases}$$
Using Taylor series:
$$u(x+h) + u(x-h) = 2u(x) + u''(x) h^2 + u^{(4)}(x) \frac{h^4}{12} + \cdots$$
so
$$u''(x) = \frac{u(x+h) - 2u(x) + u(x-h)}{h^2} + O(h^2)$$
Numerical scheme:
$$-\frac{u(x+h) - 2u(x) + u(x-h)}{h^2} = f(x, u(x), u'(x))$$
This leads to linear algebra
$$-u_{xx} = f \quad\rightarrow\quad \frac{2u(x) - u(x+h) - u(x-h)}{h^2} = f(x, u(x), u'(x))$$
Equally spaced points on $[0,1]$: $x_k = kh$ where $h = 1/(n+1)$, then for $k = 1,\ldots,n$:
$$-u_{k-1} + 2u_k - u_{k+1} = h^2 f(x_k, u_k, u_k')$$
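As an illustration (my own sketch, not from the slides), this tridiagonal system can be assembled and solved directly: for a tridiagonal matrix, the LU factorization and the two sweeps collapse to short recurrences.

#include <stdio.h>
#include <stdlib.h>

/* Solve -u'' = f on (0,1), u(0)=u(1)=0, discretized as
   -u_{k-1} + 2u_k - u_{k+1} = h^2 f(x_k): tridiagonal LU
   (forward elimination) plus back substitution. */
int main(void) {
  int n = 100;
  double h = 1.0/(n+1);
  double *diag = malloc(n*sizeof(double)),
         *rhs  = malloc(n*sizeof(double)),
         *u    = malloc(n*sizeof(double));
  for (int k = 0; k < n; k++) {
    diag[k] = 2.0;
    rhs[k]  = h*h * 1.0;           /* test problem: f(x) = 1 */
  }
  for (int k = 1; k < n; k++) {    /* eliminate the subdiagonal -1 */
    double m = -1.0 / diag[k-1];   /* multiplier for row k */
    diag[k] -= m * (-1.0);         /* diag[k] = 2 - 1/diag[k-1] */
    rhs[k]  -= m * rhs[k-1];
  }
  u[n-1] = rhs[n-1] / diag[n-1];   /* back substitution */
  for (int k = n-2; k >= 0; k--)
    u[k] = (rhs[k] + u[k+1]) / diag[k];
  printf("u at midpoint: %g; exact solution x(1-x)/2 gives about 0.125\n",
         u[n/2]);
  free(diag); free(rhs); free(u);
  return 0;
}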
Second order PDEs; 2D case
$$\begin{cases} -u_{xx}(\bar x) - u_{yy}(\bar x) = f(\bar x) & \bar x \in \Omega = [0,1]^2\\ u(\bar x) = u_0 & \bar x \in \delta\Omega \end{cases}$$
Now using central differences in both x and y directions:
$$\frac{4u(x,y) - u(x+h,y) - u(x-h,y) - u(x,y+h) - u(x,y-h)}{h^2} = f(x,y)$$
The stencil view of things
$$\begin{matrix} &-1&\\ -1&4&-1\\ &-1& \end{matrix}$$
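For illustration (my own sketch), this stencil can be applied without storing the matrix at all, assuming an n x n grid of interior unknowns in lexicographic order and zero boundary values:

/* y = A u for the 2D Laplacian stencil on an n x n grid,
   lexicographic ordering u[i*n+j]; points outside the grid
   are treated as zero (homogeneous boundary conditions). */
void apply_stencil(int n, const double *u, double *y) {
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      double s = 4.0 * u[i*n+j];
      if (i > 0)   s -= u[(i-1)*n+j];   /* west  */
      if (i < n-1) s -= u[(i+1)*n+j];   /* east  */
      if (j > 0)   s -= u[i*n+(j-1)];   /* south */
      if (j < n-1) s -= u[i*n+(j+1)];   /* north */
      y[i*n+j] = s;
    }
}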
Sparse matrix from 2D equation
$$A = \begin{pmatrix}
4&-1&&\emptyset&-1&&\emptyset\\
-1&4&-1&&&-1\\
&\ddots&\ddots&\ddots&&&\ddots\\
-1&&-1&4&-1&&&-1\\
&\ddots&&\ddots&\ddots&\ddots&&&\ddots
\end{pmatrix}$$
In row $k$ the nonzero entries are $-1$, $-1$, $4$, $-1$, $-1$, in columns $k-n$, $k-1$, $k$, $k+1$, $k+n$.
Matrix properties
• Very sparse, banded
• Factorization takes less than $n^2$ space, $n^3$ work
• Symmetric (only because the problem is second order)
• Sign pattern: positive diagonal, nonpositive off-diagonal
  (true for many second order methods)
• Positive definite (just like the continuous problem)
• Constant diagonals: only because of the constant coefficient
  differential equation
• Factorization: lower complexity than dense, recursion length less than $N$.
Sparse matrices
Sparse matrix storage
Matrix above has many zeros: $n^2$ elements but only $O(n)$ nonzeros.
Big waste of space to store this as a square array.
Compressed Row Storage
$$A = \begin{pmatrix}
10&0&0&0&-2&0\\
3&9&0&0&0&3\\
0&7&8&7&0&0\\
3&0&8&7&5&0\\
0&8&0&9&9&13\\
0&4&0&0&2&-1
\end{pmatrix}$$
Compressed Row Storage (CRS): store all nonzeros by row, with their
column indices, and pointers to where each row starts (1-based
indexing):
val      10 -2  3  9  3  7  8  7  3 ···  9 13  4  2 -1
col_ind   1  5  1  2  6  2  3  4  1 ···  5  6  2  5  6
row_ptr   1  3  6  9 13 17 20
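In C this storage scheme could be captured in a small struct (a sketch; the field names are my own, and it uses 0-based indexing rather than the 1-based indexing of the example above):

/* Compressed Row Storage, 0-based: row r occupies positions
   ptr[r] .. ptr[r+1]-1 of the arrays val and ind. */
typedef struct {
  int nrows, nnz;
  double *val;   /* nonzero values, stored by row        */
  int    *ind;   /* column index of each nonzero         */
  int    *ptr;   /* nrows+1 entries; ptr[0]=0, ptr[nrows]=nnz */
} crs_matrix;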
Sparse matrix-vector operations
Dense matrix-vector product
In many applications the most common operation: the matrix-vector product
/* dense matrix-vector product y = A x;
   a is row-major, traversed with a running pointer */
aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (col=0; col<ncols; col++) {
    s += a[aptr] * x[col];   /* a[aptr] == a[row*ncols+col] */
    aptr++;
  }
  y[row] = s;
}
Better implementation
Three loops: blocks, columns inside a block, rows;
permute the block loop to the outermost position (sketch below).
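A sketch of that loop structure (my own illustration; BLOCK is a tuning parameter, not prescribed by the slides): the columns are strip-mined into blocks and the block loop is made outermost, so each segment of x stays in cache while all rows traverse it.

#define BLOCK 256   /* tuning parameter: segment of x kept in cache */

void matvec_blocked(int nrows, int ncols,
                    const double *a, const double *x, double *y) {
  for (int row = 0; row < nrows; row++)
    y[row] = 0.0;
  /* blocks outermost: reuse x[b .. be) across all rows */
  for (int b = 0; b < ncols; b += BLOCK) {
    int be = b + BLOCK < ncols ? b + BLOCK : ncols;
    for (int row = 0; row < nrows; row++) {
      double s = y[row];
      for (int col = b; col < be; col++)
        s += a[row*ncols + col] * x[col];
      y[row] = s;
    }
  }
}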
Sparse matrix-vector product
/* CRS matrix-vector product y = A x: row `row` occupies
   positions ptr[row] .. ptr[row+1]-1 of the value array a
   and the column-index array ind */
aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (icol=ptr[row]; icol<ptr[row+1]; icol++) {
    int col = ind[icol];
    s += a[aptr] * x[col];   /* aptr runs in step with icol */
    aptr++;
  }
  y[row] = s;
}
Exercise: sparse coding
What if you need access to both rows and columns at the same time?
Implement an algorithm that tests whether a matrix stored in CRS
format is symmetric. Hint: keep an array of pointers, one for each row,
that keeps track of how far you have progressed in that row.
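One possible sketch in C (of many, and simpler than the cursor-array approach the hint suggests): for every stored element (i, j), binary-search row j for column index i, assuming the indices within each row are sorted ascending.

#include <stdbool.h>

/* Test whether a CRS matrix (0-based) is symmetric: for every
   stored (i,j,v), search row j for column i and compare values. */
bool crs_is_symmetric(int n, const int *ptr, const int *ind,
                      const double *val) {
  for (int i = 0; i < n; i++)
    for (int k = ptr[i]; k < ptr[i+1]; k++) {
      int j = ind[k];
      /* binary search row j for column index i */
      int lo = ptr[j], hi = ptr[j+1] - 1, pos = -1;
      while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (ind[mid] == i) { pos = mid; break; }
        else if (ind[mid] < i) lo = mid + 1;
        else hi = mid - 1;
      }
      if (pos < 0 || val[pos] != val[k]) return false;
    }
  return true;
}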
Fill-in
Remember Gaussian elimination algorithm:
for k = 1, n − 1:
for i = k + 1 to n:
for j = k + 1 to n:
aij ← aij − aik ∗ akj /akk
LU of a sparse matrix
$$\begin{pmatrix} 2&-1&0&\ldots\\ -1&2&-1\\ 0&-1&2&-1\\ &\ddots&\ddots&\ddots&\ddots \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} 2&-1&0&\ldots\\ 0&2-\frac{1}{2}&-1\\ 0&-1&2&-1\\ &\ddots&\ddots&\ddots&\ddots \end{pmatrix}$$
Observations?
LU of a sparse matrix
$$\begin{pmatrix}
4&-1&0&\ldots&-1\\
-1&4&-1&0&\ldots&0&-1\\
&\ddots&\ddots&\ddots&&&\ddots\\
-1&0&\ldots&&4&-1\\
0&-1&0&\ldots&&-1&4&-1\\
&&\ddots&&&\ddots&\ddots&\ddots
\end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix}
4&-1&0&\ldots&-1\\
&4-\frac{1}{4}&-1&0&\ldots&-\frac{1}{4}&-1\\
&\ddots&\ddots&\ddots&&\ddots&&\ddots\\
&-\frac{1}{4}&\ldots&&4-\frac{1}{4}&-1\\
&-1&0&\ldots&&-1&4&-1\\
&&\ddots&&&\ddots&\ddots&\ddots
\end{pmatrix}$$
Eliminating the first variable creates fill-in: the $-\frac{1}{4}$ entries were zero in the original matrix.
A little graph theory
Graph is a tuple $G = \langle V, E \rangle$ where $V = \{v_1, \ldots, v_n\}$ for some $n$, and
$E \subset \{(i,j) : 1 \le i,j \le n,\; i \ne j\}$.
$$V = \{1,2,3,4,5,6\}, \qquad E = \{(1,2), (2,6), (4,3), (4,4), (4,5)\}$$
Graphs and matrices
For a graph G = ⟨V , E ⟩, the adjacency matrix M is defined by
$$M_{ij} = \begin{cases} 1 & (i,j) \in E\\ 0 & \text{otherwise} \end{cases}$$
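For the example graph above (taking the listed edge set literally, including (4, 4)) this gives:
$$M = \begin{pmatrix}
0&1&0&0&0&0\\
0&0&0&0&0&1\\
0&0&0&0&0&0\\
0&0&1&1&1&0\\
0&0&0&0&0&0\\
0&0&0&0&0&0
\end{pmatrix}$$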
Fill-in
Fill-in: a position $(i,j)$ where $a_{ij} = 0$ originally, but gets updated to nonzero during the factorization.
LU of sparse matrix, with graph view: 1
Original matrix. [figure]
LU of sparse matrix, with graph view: 2
[figure]
LU of sparse matrix, with graph view: 3
[figure]
LU of sparse matrix, with graph view: 4
[figure]
LU of sparse matrix, with graph view: 5
After step 2. [figure]
Fill-in is a function of ordering
$$\begin{pmatrix} \ast&\ast&\cdots&\ast\\ \ast&\ast&&\emptyset\\ \vdots&&\ddots\\ \ast&\emptyset&&\ast \end{pmatrix}$$
After factorization the matrix is dense.
Can this be permuted?
Exercise: LU of a penta-diagonal matrix
Consider the matrix
$$\begin{pmatrix}
2&0&-1\\
0&2&0&-1\\
-1&0&2&0&-1\\
&-1&0&2&0&-1\\
&&\ddots&\ddots&\ddots&\ddots&\ddots
\end{pmatrix}$$
Describe the LU factorization of this matrix:
Exercise: LU of a band matrix
$$a_{ij} = 0 \text{ if } |i-j| > p$$
Can you also derive how much space the inverse will take? (Hint: if
$A = LU$, does that give you an easy formula for the inverse?)