hpc_linear
Victor Eijkhout
Fall 2022
Justification

Linear algebra
Two approaches to linear system solving
Solve Ax = b
Direct methods:
• Deterministic
• Exact up to machine precision
• Expensive (in time and space)
Iterative methods:
• Only approximate
• Cheaper in space and (possibly) time
• Convergence not guaranteed
Really bad example of a direct method
Cramer's rule
Write $|A|$ for the determinant; then, with $A_i$ denoting $A$ with column $i$ replaced by $b$:
$$x_i = |A_i| / |A|$$
Evaluating the determinants by cofactor expansion costs $O(n!)$ operations.
Not a good method either
Solving $Ax = b$ by explicitly computing $A^{-1}$ and forming $x = A^{-1}b$: more expensive than a factorization, and numerically less stable.
A close look at linear system solving: direct methods
Gaussian elimination
Example
$$\begin{pmatrix} 6&-2&2\\ 12&-8&6\\ 3&-13&3 \end{pmatrix} x = \begin{pmatrix} 16\\ 26\\ -19 \end{pmatrix}$$
$$\left(\begin{array}{ccc|c} 6&-2&2&16\\ 12&-8&6&26\\ 3&-13&3&-19 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 6&-2&2&16\\ 0&-4&2&-6\\ 0&-12&2&-27 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 6&-2&2&16\\ 0&-4&2&-6\\ 0&0&-4&-9 \end{array}\right)$$
Solve $x_3$, then $x_2$, then $x_1$.
Gaussian elimination, step by step
⟨LU factorization⟩:
for k = 1, n − 1:
⟨eliminate values in column k ⟩
⟨eliminate values in column k ⟩:
for i = k + 1 to n:
⟨compute multiplier for row i ⟩
⟨update row i ⟩
⟨compute multiplier for row i ⟩:
aik ← aik /akk
⟨update row i ⟩:
for j = k + 1 to n:
aij ← aij − aik ∗ akj
Gaussian elimination, all together
⟨LU factorization⟩:
for k = 1, n − 1:
for i = k + 1 to n:
aik ← aik /akk
for j = k + 1 to n:
aij ← aij − aik ∗ akj
Amount of work:
$$\sum_{k=1}^{n-1} \sum_{i,j>k} 1 = \sum_{k=1}^{n-1} (n-k)^2 \approx \sum_k k^2 \approx n^3/3$$
(precisely: $\sum_{j=1}^{n-1} j^2 = (n-1)n(2n-1)/6 \approx n^3/3$)
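A minimal sketch of this algorithm in C (my own illustration, not code from the slides): in-place factorization of a dense, row-major array, without pivoting; the multipliers overwrite the strictly lower triangle.

#include <stdio.h>

/* In-place LU factorization without pivoting: on return the strict
   lower triangle of `a` holds the multipliers (L, with unit diagonal
   implied) and the upper triangle holds U. `a` is n x n, row-major. */
void lu_factor(double *a, int n) {
  for (int k = 0; k < n-1; k++)
    for (int i = k+1; i < n; i++) {
      a[i*n+k] /= a[k*n+k];            /* multiplier for row i */
      for (int j = k+1; j < n; j++)    /* update row i */
        a[i*n+j] -= a[i*n+k] * a[k*n+j];
    }
}

int main(void) {
  /* the 3x3 example from these slides */
  double a[9] = { 6,-2,2, 12,-8,6, 3,-13,3 };
  lu_factor(a, 3);
  for (int i = 0; i < 3; i++)    /* prints L and U packed in one array */
    printf("%6.2f %6.2f %6.2f\n", a[3*i], a[3*i+1], a[3*i+2]);
  return 0;
}

The printed multipliers (2, 1/2, 3) and the upper triangle match the elimination steps of the worked example.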
Pivoting
Roundoff control
Consider
$$\begin{pmatrix} \epsilon&1\\ 1&1 \end{pmatrix} x = \begin{pmatrix} 1+\epsilon\\ 2 \end{pmatrix}$$
with solution $x = (1,1)^t$.
Ordinary elimination:
$$\begin{pmatrix} \epsilon&1\\ 0&1-\frac{1}{\epsilon} \end{pmatrix} x
= \begin{pmatrix} 1+\epsilon\\ 2-\frac{1+\epsilon}{\epsilon} \end{pmatrix}
= \begin{pmatrix} 1+\epsilon\\ 1-\frac{1}{\epsilon} \end{pmatrix}.$$
Roundoff 2
If $\epsilon < \epsilon_{\mathrm{mach}}$, then in the rhs $1+\epsilon \rightarrow 1$, so the system is:
$$\begin{pmatrix} \epsilon&1\\ 1&1 \end{pmatrix} x = \begin{pmatrix} 1\\ 2 \end{pmatrix}$$
Eliminating:
$$\begin{pmatrix} \epsilon&1\\ 0&1-\epsilon^{-1} \end{pmatrix} x = \begin{pmatrix} 1\\ 2-\epsilon^{-1} \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} \epsilon&1\\ 0&-\epsilon^{-1} \end{pmatrix} x = \begin{pmatrix} 1\\ -\epsilon^{-1} \end{pmatrix}$$
Back substitution now gives $x_2 = 1$ and then $\epsilon x_1 + 1 = 1$, so $x_1 = 0$: far from the true solution.
Roundoff 3
Pivot first:
$$\begin{pmatrix} 1&1\\ \epsilon&1 \end{pmatrix} x = \begin{pmatrix} 2\\ 1+\epsilon \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} 1&1\\ 0&1-\epsilon \end{pmatrix} x = \begin{pmatrix} 2\\ 1-\epsilon \end{pmatrix}$$
$$x_2 = \frac{1-\epsilon}{1-\epsilon} = 1, \qquad x_1 = 2 - x_2 = 1$$
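The effect is easy to reproduce; a small C experiment (my own sketch, using an ε far below machine precision for double) carries out both eliminations on the rounded system:

#include <stdio.h>

int main(void) {
  double eps = 1e-20;   /* well below machine epsilon for double */

  /* no pivoting: eliminate with pivot eps */
  double m  = 1.0/eps;                          /* multiplier */
  double x2 = (2.0 - m*(1.0+eps)) / (1.0 - m);
  double x1 = ((1.0+eps) - x2) / eps;
  printf("no pivoting:   x1=%g x2=%g\n", x1, x2);  /* x1 comes out 0 */

  /* pivot first: exchange the rows, so the pivot is 1 */
  x2 = ((1.0+eps) - eps*2.0) / (1.0 - eps);
  x1 = 2.0 - x2;
  printf("with pivoting: x1=%g x2=%g\n", x1, x2);  /* both come out 1 */
  return 0;
}

Without the row exchange the 2 in the right-hand side is absorbed into $-1/\epsilon$, and back substitution returns $x_1 = 0$.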
LU factorization
Same example again:
$$A = \begin{pmatrix} 6&-2&2\\ 12&-8&6\\ 3&-13&3 \end{pmatrix}$$
Eliminate the first column by multiplying with an elementary elimination matrix (Gauss transform).
LU 2
The first elimination step is multiplication by $L_1$ (given on the next slide):
$$L_1 A = \begin{pmatrix} 6&-2&2\\ 0&-4&2\\ 0&-12&2 \end{pmatrix},$$
and the second by $L_2$:
$$L_2 L_1 A = U = \begin{pmatrix} 6&-2&2\\ 0&-4&2\\ 0&0&-4 \end{pmatrix},
\quad\text{so}\quad A = L_1^{-1} L_2^{-1} U = LU.$$
LU 3
Observe:
$$L_1 = \begin{pmatrix} 1&0&0\\ -2&1&0\\ -1/2&0&1 \end{pmatrix}
\qquad
L_1^{-1} = \begin{pmatrix} 1&0&0\\ 2&1&0\\ 1/2&0&1 \end{pmatrix}$$
Likewise
$$L_2 = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&-3&1 \end{pmatrix}
\qquad
L_2^{-1} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&3&1 \end{pmatrix}$$
Even more remarkable:
$$L_1^{-1} L_2^{-1} = \begin{pmatrix} 1&0&0\\ 2&1&0\\ 1/2&3&1 \end{pmatrix}
\qquad\text{Lower triangular!}$$
Can be computed in place! (pivoting?)
Solve LU system
$Ax = b \longrightarrow LUx = b$: solve in two steps: $Ly = b$, then $Ux = y$.
Forward sweep:
$$\begin{pmatrix} 1&&&&\emptyset\\ \ell_{21}&1\\ \ell_{31}&\ell_{32}&1\\ \vdots&&&\ddots\\ \ell_{n1}&\ell_{n2}&\cdots&&1 \end{pmatrix}
\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}
= \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{pmatrix}$$
$$y_1 = b_1, \quad y_2 = b_2 - \ell_{21} y_1, \ldots$$
Solve LU 2
Backward sweep:
$$\begin{pmatrix} u_{11}&u_{12}&\ldots&u_{1n}\\ &u_{22}&\ldots&u_{2n}\\ &&\ddots&\vdots\\ \emptyset&&&u_{nn} \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}
= \begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix}$$
$$x_n = u_{nn}^{-1} y_n, \quad x_{n-1} = u_{n-1,n-1}^{-1} (y_{n-1} - u_{n-1,n} x_n), \ldots$$
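In C, both sweeps are short loops; a sketch (function names are my own) under the storage convention of the earlier LU sketch: unit lower triangle of the factored array holds L, upper triangle holds U, row-major.

/* Forward sweep Ly = b, with L unit lower triangular, stored in
   the strict lower triangle of the factored array `a`. */
void lsolve(const double *a, int n, const double *b, double *y) {
  for (int i = 0; i < n; i++) {
    double s = b[i];
    for (int j = 0; j < i; j++)
      s -= a[i*n+j] * y[j];
    y[i] = s;                    /* diagonal of L is 1 */
  }
}

/* Backward sweep Ux = y, with U in the upper triangle of `a`. */
void usolve(const double *a, int n, const double *y, double *x) {
  for (int i = n-1; i >= 0; i--) {
    double s = y[i];
    for (int j = i+1; j < n; j++)
      s -= a[i*n+j] * x[j];
    x[i] = s / a[i*n+i];
  }
}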
Computational aspects
Compare:
Short detour: Partial Differential Equations
Second order PDEs; 1D case
$$\begin{cases} -u''(x) = f(x) & x \in [a,b]\\ u(a) = u_a, \; u(b) = u_b \end{cases}$$
Using Taylor series:
$$u(x+h) + u(x-h) = 2u(x) + u''(x) h^2 + u^{(4)}(x) \frac{h^4}{12} + \cdots$$
so
$$u''(x) = \frac{u(x+h) - 2u(x) + u(x-h)}{h^2} + O(h^2)$$
Numerical scheme:
$$-\frac{u(x+h) - 2u(x) + u(x-h)}{h^2} = f(x, u(x), u'(x))$$
This leads to linear algebra
$$-u_{xx} = f \quad\rightarrow\quad \frac{2u(x) - u(x+h) - u(x-h)}{h^2} = f(x, u(x), u'(x))$$
Equally spaced points on $[0,1]$: $x_k = kh$ where $h = 1/(n+1)$, then for $k = 1,\ldots,n$:
$$-u_{k-1} + 2u_k - u_{k+1} = h^2 f(x_k, u_k, u_k')$$
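As an illustration (my own sketch, not from the slides), this tridiagonal system can be assembled and solved directly: for a tridiagonal matrix, the LU factorization and the two sweeps collapse to short recurrences.

#include <stdio.h>
#include <stdlib.h>

/* Solve -u'' = f on (0,1), u(0)=u(1)=0, discretized as
   -u_{k-1} + 2u_k - u_{k+1} = h^2 f(x_k): tridiagonal LU
   (forward elimination) plus back substitution. */
int main(void) {
  int n = 100;
  double h = 1.0/(n+1);
  double *diag = malloc(n*sizeof(double)),
         *rhs  = malloc(n*sizeof(double)),
         *u    = malloc(n*sizeof(double));
  for (int k = 0; k < n; k++) {
    diag[k] = 2.0;
    rhs[k]  = h*h * 1.0;           /* test problem: f(x) = 1 */
  }
  for (int k = 1; k < n; k++) {    /* eliminate the subdiagonal -1 */
    double m = -1.0 / diag[k-1];   /* multiplier for row k */
    diag[k] -= m * (-1.0);         /* diag[k] = 2 - 1/diag[k-1] */
    rhs[k]  -= m * rhs[k-1];
  }
  u[n-1] = rhs[n-1] / diag[n-1];   /* back substitution */
  for (int k = n-2; k >= 0; k--)
    u[k] = (rhs[k] + u[k+1]) / diag[k];
  printf("u at midpoint: %g; exact solution x(1-x)/2 gives about 0.125\n",
         u[n/2]);
  free(diag); free(rhs); free(u);
  return 0;
}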
Second order PDEs; 2D case
$$\begin{cases} -u_{xx}(\bar x) - u_{yy}(\bar x) = f(\bar x) & \bar x \in \Omega = [0,1]^2\\ u(\bar x) = u_0 & \bar x \in \delta\Omega \end{cases}$$
Now using central differences in both x and y directions:
$$\frac{4u(x,y) - u(x+h,y) - u(x-h,y) - u(x,y+h) - u(x,y-h)}{h^2} = f(x,y)$$
The stencil view of things
$$\begin{matrix} &-1&\\ -1&4&-1\\ &-1& \end{matrix}$$
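For illustration (my own sketch), this stencil can be applied without storing the matrix at all, assuming an n x n grid of interior unknowns in lexicographic order and zero boundary values:

/* y = A u for the 2D Laplacian stencil on an n x n grid,
   lexicographic ordering u[i*n+j]; points outside the grid
   are treated as zero (homogeneous boundary conditions). */
void apply_stencil(int n, const double *u, double *y) {
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++) {
      double s = 4.0 * u[i*n+j];
      if (i > 0)   s -= u[(i-1)*n+j];   /* west  */
      if (i < n-1) s -= u[(i+1)*n+j];   /* east  */
      if (j > 0)   s -= u[i*n+(j-1)];   /* south */
      if (j < n-1) s -= u[i*n+(j+1)];   /* north */
      y[i*n+j] = s;
    }
}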
Sparse matrix from 2D equation
$$A = \begin{pmatrix}
4&-1&&\emptyset&-1&&\emptyset\\
-1&4&-1&&&-1\\
&\ddots&\ddots&\ddots&&&\ddots\\
-1&&-1&4&-1&&&-1\\
&\ddots&&\ddots&\ddots&\ddots&&&\ddots
\end{pmatrix}$$
In row $k$ the nonzero entries are $-1$, $-1$, $4$, $-1$, $-1$, in columns $k-n$, $k-1$, $k$, $k+1$, $k+n$.
Matrix properties
• Very sparse, banded
• Factorization takes less than $n^2$ space, $n^3$ work
• Symmetric (only because the problem is second order)
• Sign pattern: positive diagonal, nonpositive off-diagonal
  (true for many second order methods)
• Positive definite (just like the continuous problem)
• Constant diagonals: only because of the constant coefficient
  differential equation
• Factorization: lower complexity than dense, recursion length less than $N$.
Sparse matrices
Sparse matrix storage
Matrix above has many zeros: $n^2$ elements but only $O(n)$ nonzeros.
Big waste of space to store this as a square array.
Compressed Row Storage
$$A = \begin{pmatrix}
10&0&0&0&-2&0\\
3&9&0&0&0&3\\
0&7&8&7&0&0\\
3&0&8&7&5&0\\
0&8&0&9&9&13\\
0&4&0&0&2&-1
\end{pmatrix}$$
Compressed Row Storage (CRS): store all nonzeros by row, with their
column indices, and pointers to where each row starts (1-based
indexing):
val      10 -2  3  9  3  7  8  7  3 ···  9 13  4  2 -1
col_ind   1  5  1  2  6  2  3  4  1 ···  5  6  2  5  6
row_ptr   1  3  6  9 13 17 20
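In C this storage scheme could be captured in a small struct (a sketch; the field names are my own, and it uses 0-based indexing rather than the 1-based indexing of the example above):

/* Compressed Row Storage, 0-based: row r occupies positions
   ptr[r] .. ptr[r+1]-1 of the arrays val and ind. */
typedef struct {
  int nrows, nnz;
  double *val;   /* nonzero values, stored by row        */
  int    *ind;   /* column index of each nonzero         */
  int    *ptr;   /* nrows+1 entries; ptr[0]=0, ptr[nrows]=nnz */
} crs_matrix;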
Sparse matrix-vector operations
Dense matrix-vector product
In many applications the most common operation: the matrix-vector product
/* dense matrix-vector product y = A x;
   a is row-major, traversed with a running pointer */
aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (col=0; col<ncols; col++) {
    s += a[aptr] * x[col];   /* a[aptr] == a[row*ncols+col] */
    aptr++;
  }
  y[row] = s;
}
Better implementation
Three loops: blocks, columns inside a block, rows;
permute the block loop to the outermost position (sketch below).
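A sketch of that loop structure (my own illustration; BLOCK is a tuning parameter, not prescribed by the slides): the columns are strip-mined into blocks and the block loop is made outermost, so each segment of x stays in cache while all rows traverse it.

#define BLOCK 256   /* tuning parameter: segment of x kept in cache */

void matvec_blocked(int nrows, int ncols,
                    const double *a, const double *x, double *y) {
  for (int row = 0; row < nrows; row++)
    y[row] = 0.0;
  /* blocks outermost: reuse x[b .. be) across all rows */
  for (int b = 0; b < ncols; b += BLOCK) {
    int be = b + BLOCK < ncols ? b + BLOCK : ncols;
    for (int row = 0; row < nrows; row++) {
      double s = y[row];
      for (int col = b; col < be; col++)
        s += a[row*ncols + col] * x[col];
      y[row] = s;
    }
  }
}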
Sparse matrix-vector product
/* CRS matrix-vector product y = A x: row `row` occupies
   positions ptr[row] .. ptr[row+1]-1 of the value array a
   and the column-index array ind */
aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (icol=ptr[row]; icol<ptr[row+1]; icol++) {
    int col = ind[icol];
    s += a[aptr] * x[col];   /* aptr runs in step with icol */
    aptr++;
  }
  y[row] = s;
}
Exercise: sparse coding
What if you need access to both rows and columns at the same time?
Implement an algorithm that tests whether a matrix stored in CRS
format is symmetric. Hint: keep an array of pointers, one for each row,
that keeps track of how far you have progressed in that row.
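One possible sketch in C (of many, and simpler than the cursor-array approach the hint suggests): for every stored element (i, j), binary-search row j for column index i, assuming the indices within each row are sorted ascending.

#include <stdbool.h>

/* Test whether a CRS matrix (0-based) is symmetric: for every
   stored (i,j,v), search row j for column i and compare values. */
bool crs_is_symmetric(int n, const int *ptr, const int *ind,
                      const double *val) {
  for (int i = 0; i < n; i++)
    for (int k = ptr[i]; k < ptr[i+1]; k++) {
      int j = ind[k];
      /* binary search row j for column index i */
      int lo = ptr[j], hi = ptr[j+1] - 1, pos = -1;
      while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (ind[mid] == i) { pos = mid; break; }
        else if (ind[mid] < i) lo = mid + 1;
        else hi = mid - 1;
      }
      if (pos < 0 || val[pos] != val[k]) return false;
    }
  return true;
}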
Fill-in
Remember Gaussian elimination algorithm:
for k = 1, n − 1:
for i = k + 1 to n:
for j = k + 1 to n:
aij ← aij − aik ∗ akj /akk
LU of a sparse matrix
$$\begin{pmatrix} 2&-1&0&\ldots\\ -1&2&-1\\ 0&-1&2&-1\\ &\ddots&\ddots&\ddots&\ddots \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} 2&-1&0&\ldots\\ 0&2-\frac{1}{2}&-1\\ 0&-1&2&-1\\ &\ddots&\ddots&\ddots&\ddots \end{pmatrix}$$
Observations?
LU of a sparse matrix
$$\begin{pmatrix}
4&-1&0&\ldots&-1\\
-1&4&-1&0&\ldots&0&-1\\
&\ddots&\ddots&\ddots&&&\ddots\\
-1&0&\ldots&&4&-1\\
0&-1&0&\ldots&&-1&4&-1\\
&&\ddots&&&\ddots&\ddots&\ddots
\end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix}
4&-1&0&\ldots&-1\\
&4-\frac{1}{4}&-1&0&\ldots&-\frac{1}{4}&-1\\
&\ddots&\ddots&\ddots&&\ddots&&\ddots\\
&-\frac{1}{4}&\ldots&&4-\frac{1}{4}&-1\\
&-1&0&\ldots&&-1&4&-1\\
&&\ddots&&&\ddots&\ddots&\ddots
\end{pmatrix}$$
Eliminating the first variable creates fill-in: the $-\frac{1}{4}$ entries were zero in the original matrix.
A little graph theory
Graph is a tuple $G = \langle V, E \rangle$ where $V = \{v_1, \ldots, v_n\}$ for some $n$, and
$E \subset \{(i,j) : 1 \le i,j \le n,\; i \ne j\}$.
$$V = \{1,2,3,4,5,6\}, \qquad E = \{(1,2), (2,6), (4,3), (4,4), (4,5)\}$$
Graphs and matrices
For a graph G = ⟨V , E ⟩, the adjacency matrix M is defined by
$$M_{ij} = \begin{cases} 1 & (i,j) \in E\\ 0 & \text{otherwise} \end{cases}$$
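For the example graph above (taking the listed edge set literally, including (4, 4)) this gives:
$$M = \begin{pmatrix}
0&1&0&0&0&0\\
0&0&0&0&0&1\\
0&0&0&0&0&0\\
0&0&1&1&1&0\\
0&0&0&0&0&0\\
0&0&0&0&0&0
\end{pmatrix}$$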
Fill-in
Fill-in: a position $(i,j)$ where $a_{ij} = 0$ originally, but gets updated to nonzero during the factorization.
LU of sparse matrix, with graph view: 1
Original matrix. [figure]
LU of sparse matrix, with graph view: 2
[figure]
LU of sparse matrix, with graph view: 3
[figure]
LU of sparse matrix, with graph view: 4
[figure]
LU of sparse matrix, with graph view: 5
After step 2. [figure]
Fill-in is a function of ordering
$$\begin{pmatrix} \ast&\ast&\cdots&\ast\\ \ast&\ast&&\emptyset\\ \vdots&&\ddots\\ \ast&\emptyset&&\ast \end{pmatrix}$$
After factorization the matrix is dense.
Can this be permuted?
Exercise: LU of a penta-diagonal matrix
Consider the matrix
$$\begin{pmatrix}
2&0&-1\\
0&2&0&-1\\
-1&0&2&0&-1\\
&-1&0&2&0&-1\\
&&\ddots&\ddots&\ddots&\ddots&\ddots
\end{pmatrix}$$
Describe the LU factorization of this matrix:
Exercise: LU of a band matrix
$$a_{ij} = 0 \text{ if } |i-j| > p$$
Can you also derive how much space the inverse will take? (Hint: if
$A = LU$, does that give you an easy formula for the inverse?)