
Numerical Linear Algebra

Victor Eijkhout

Fall 2022
Justification

Many algorithms are based on linear algebra, including some
non-obvious ones such as graph algorithms. This session will mostly
discuss aspects of solving linear systems, focusing on those that have
computational ramifications.

2
Linear algebra

• Mathematical aspects: mostly linear system solving


• Practical aspects: even simple operations are hard
– Dense matrix-vector product: scalability aspects
– Sparse matrix-vector: implementation

Let’s start with the math…

3
Two approaches to linear system solving
Solve Ax = b

Direct methods:

• Deterministic
• Exact up to machine precision
• Expensive (in time and space)

Iterative methods:

• Only approximate
• Cheaper in space and (possibly) time
• Convergence not guaranteed

4
Really bad example of direct method

Cramer’s rule:
write $|A|$ for the determinant, then

$$ x_i = \begin{vmatrix}
a_{11} & a_{12} & \ldots & a_{1,i-1} & b_1 & a_{1,i+1} & \ldots & a_{1n} \\
a_{21} & & \ldots & & b_2 & & \ldots & a_{2n} \\
\vdots & & & & \vdots & & & \vdots \\
a_{n1} & & \ldots & & b_n & & \ldots & a_{nn}
\end{vmatrix} \Big/ \; |A| $$

Time complexity $O(n!)$
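To make the factorial cost concrete, here is a toy C sketch (mine, not from the slides): it solves the 3×3 example from the Gaussian elimination slide below by Cramer’s rule, with determinants computed by recursive cofactor expansion, which performs O(n!) operations.

#include <stdio.h>
#define N 3

/* Determinant by cofactor expansion along row `row`, restricted to the
   columns still marked in active[]. Work grows as O(n!). */
static double det(double a[N][N], int row, int active[N]) {
    if (row == N) return 1.0;              /* empty submatrix */
    double sum = 0.0;
    int sign = 1;
    for (int j = 0; j < N; j++) {
        if (!active[j]) continue;
        active[j] = 0;                     /* delete column j ... */
        sum += sign * a[row][j] * det(a, row + 1, active);
        active[j] = 1;                     /* ... and restore it */
        sign = -sign;
    }
    return sum;
}

int main(void) {
    double A[N][N] = {{6,-2,2},{12,-8,6},{3,-13,3}};
    double b[N]    = {16,26,-19};
    int active[N]  = {1,1,1};
    double detA = det(A, 0, active);
    for (int i = 0; i < N; i++) {          /* Cramer: x_i = |A_i| / |A| */
        double Ai[N][N];
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                Ai[r][c] = (c == i) ? b[r] : A[r][c];
        printf("x%d = %g\n", i + 1, det(Ai, 0, active) / detA);
    }
    return 0;
}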

5
Not a good method either

Ax = b

• Compute explicitly $A^{-1}$,
• then $x \leftarrow A^{-1}b$.
• Numerical stability issues.
• Amount of work?

6
A close look at linear system solving: direct methods

7
Gaussian elimination

Example:
$$ \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 3 \end{pmatrix} x = \begin{pmatrix} 16 \\ 26 \\ -19 \end{pmatrix} $$

$$ \left(\begin{array}{ccc|c} 6 & -2 & 2 & 16 \\ 12 & -8 & 6 & 26 \\ 3 & -13 & 3 & -19 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 6 & -2 & 2 & 16 \\ 0 & -4 & 2 & -6 \\ 0 & -12 & 2 & -27 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 6 & -2 & 2 & 16 \\ 0 & -4 & 2 & -6 \\ 0 & 0 & -4 & -9 \end{array}\right) $$

Solve $x_3$, then $x_2$, then $x_1$.

$6, -4, -4$ are the ‘pivots’

8
Gaussian elimination, step by step
⟨LU factorization⟩:
  for k = 1, n − 1:
    ⟨eliminate values in column k⟩

⟨eliminate values in column k⟩:
  for i = k + 1 to n:
    ⟨compute multiplier for row i⟩
    ⟨update row i⟩

⟨compute multiplier for row i⟩:
  $a_{ik} \leftarrow a_{ik}/a_{kk}$

⟨update row i⟩:
  for j = k + 1 to n:
    $a_{ij} \leftarrow a_{ij} - a_{ik} \ast a_{kj}$

9
Gaussian elimination, all together
⟨LU factorization⟩:
  for k = 1, n − 1:
    for i = k + 1 to n:
      $a_{ik} \leftarrow a_{ik}/a_{kk}$
      for j = k + 1 to n:
        $a_{ij} \leftarrow a_{ij} - a_{ik} \ast a_{kj}$

Amount of work:
$$ \sum_{k=1}^{n-1} \sum_{i,j>k} 1 = \sum_{k=1}^{n-1} (n-k)^2 \approx \sum_k k^2 \approx n^3/3 $$
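This transcribes almost directly into C; a minimal sketch, assuming row-major storage and nonzero pivots (no pivoting yet), that overwrites A with its factors:

/* In-place LU factorization without pivoting. Afterwards the strict
   lower triangle of a holds L (unit diagonal implicit) and the upper
   triangle holds U. a is row-major, n x n. */
void lu_factor(int n, double *a) {
    for (int k = 0; k < n - 1; k++)
        for (int i = k + 1; i < n; i++) {
            a[i*n + k] /= a[k*n + k];          /* multiplier l_ik */
            for (int j = k + 1; j < n; j++)
                a[i*n + j] -= a[i*n + k] * a[k*n + j];
        }
}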

10
Pivoting

If a pivot is zero, exchange that row with another
(there is always a row with a nonzero pivot if the matrix is nonsingular).

The best choice is the largest possible pivot;
in fact, that’s a good choice even if the pivot is not zero:
partial pivoting.
(Full pivoting would use both row and column exchanges.)

11
Roundoff control
Consider
$$ \begin{pmatrix} \epsilon & 1 \\ 1 & 1 \end{pmatrix} x = \begin{pmatrix} 1+\epsilon \\ 2 \end{pmatrix} $$
with solution $x = (1, 1)^t$

Ordinary elimination:
$$ \begin{pmatrix} \epsilon & 1 \\ 0 & 1-\frac{1}{\epsilon} \end{pmatrix} x
= \begin{pmatrix} 1+\epsilon \\ 2-\frac{1+\epsilon}{\epsilon} \end{pmatrix}
= \begin{pmatrix} 1+\epsilon \\ 1-\frac{1}{\epsilon} \end{pmatrix}. $$

We can now solve $x_2$ and from it $x_1$:
$$ \begin{cases}
x_2 = (1-\epsilon^{-1})/(1-\epsilon^{-1}) = 1 \\
x_1 = \epsilon^{-1}(1+\epsilon-x_2) = 1
\end{cases} $$

12
Roundoff 2
If $\epsilon < \epsilon_{\mathrm{mach}}$, then in the rhs $1+\epsilon \rightarrow 1$, so the system is:
$$ \begin{pmatrix} \epsilon & 1 \\ 1 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 2 \end{pmatrix} $$

The solution (1, 1) is still correct!

Eliminating:
$$ \begin{pmatrix} \epsilon & 1 \\ 0 & 1-\epsilon^{-1} \end{pmatrix} x
= \begin{pmatrix} 1 \\ 2-\epsilon^{-1} \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} \epsilon & 1 \\ 0 & -\epsilon^{-1} \end{pmatrix} x
= \begin{pmatrix} 1 \\ -\epsilon^{-1} \end{pmatrix} $$

Solving first $x_2$, then $x_1$, we get:
$$ \begin{cases}
x_2 = -\epsilon^{-1}/-\epsilon^{-1} = 1 \\
x_1 = \epsilon^{-1}(1 - 1\cdot x_2) = \epsilon^{-1}\cdot 0 = 0,
\end{cases} $$
so $x_2$ is correct, but $x_1$ is completely wrong.

13
Roundoff 3

Pivot first:
$$ \begin{pmatrix} 1 & 1 \\ \epsilon & 1 \end{pmatrix} x
= \begin{pmatrix} 2 \\ 1+\epsilon \end{pmatrix}
\quad\Rightarrow\quad
\begin{pmatrix} 1 & 1 \\ 0 & 1-\epsilon \end{pmatrix} x
= \begin{pmatrix} 2 \\ 1-\epsilon \end{pmatrix} $$

Now we get, regardless of the size of $\epsilon$:
$$ x_2 = \frac{1-\epsilon}{1-\epsilon} = 1, \qquad x_1 = 2 - x_2 = 1 $$
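This is easy to reproduce in IEEE double precision; a small demonstration (mine, not from the slides) that carries out both eliminations for ε = 10⁻²⁰, well below the machine precision of roughly 2.2·10⁻¹⁶:

#include <stdio.h>

int main(void) {
    double eps = 1e-20;                    /* eps < eps_mach ~ 2.2e-16 */
    /* without pivoting: subtract (1/eps) times row 1 from row 2 */
    double u22 = 1.0 - 1.0/eps;            /* rounds to -1/eps */
    double y2  = 2.0 - (1.0 + eps)/eps;    /* rounds to -1/eps */
    double x2  = y2 / u22;                 /* = 1: still fine */
    double x1  = ((1.0 + eps) - x2) / eps; /* catastrophic: gives 0 */
    printf("no pivoting:   x1 = %g, x2 = %g\n", x1, x2);
    /* pivot first: subtract eps times row 1 from row 2 */
    double x2p = (1.0 + eps - 2.0*eps) / (1.0 - eps);  /* = 1 */
    double x1p = 2.0 - x2p;                            /* = 1 */
    printf("with pivoting: x1 = %g, x2 = %g\n", x1p, x2p);
    return 0;
}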

14
LU factorization
Same example again:
$$ A = \begin{pmatrix} 6 & -2 & 2 \\ 12 & -8 & 6 \\ 3 & -13 & 3 \end{pmatrix} $$

2nd row minus $2\times$ first; 3rd row minus $1/2\times$ first;
equivalent to $L_1 Ax = L_1 b$ with
$$ L_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} $$

(an elementary elimination matrix, or ‘Gauss transform’)

15
LU 2

Next step: $L_2 L_1 Ax = L_2 L_1 b$ with
$$ L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} $$

Define $U = L_2 L_1 A$, then $A = LU$ with $L = L_1^{-1} L_2^{-1}$

‘LU factorization’ with $U$ upper triangular; for $L$ see next.

16
LU 3
Observe:
$$ L_1 = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} \qquad
L_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix} $$
Likewise
$$ L_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} \qquad
L_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} $$
Even more remarkable:
$$ L_1^{-1} L_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} \qquad \text{lower triangular!} $$
Can be computed in place! (pivoting?)

17
Solve LU system
Ax = b −→ LUx = b solve in two steps:
Ly = b, and Ux = y

Forward sweep:
$$ \begin{pmatrix}
1 & & & & \emptyset \\
\ell_{21} & 1 \\
\ell_{31} & \ell_{32} & 1 \\
\vdots & & & \ddots \\
\ell_{n1} & \ell_{n2} & \cdots & & 1
\end{pmatrix}
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ \vdots \\ b_n \end{pmatrix} $$

$y_1 = b_1$, $\quad y_2 = b_2 - \ell_{21} y_1$, …

18
Solve LU 2
Backward sweep:
$$ \begin{pmatrix}
u_{11} & u_{12} & \ldots & u_{1n} \\
& u_{22} & \ldots & u_{2n} \\
& & \ddots & \vdots \\
\emptyset & & & u_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} $$

$$ x_n = u_{nn}^{-1} y_n, \quad x_{n-1} = u_{n-1,n-1}^{-1}(y_{n-1} - u_{n-1,n} x_n), \quad \ldots $$

(Compute the inverses $u_{ii}^{-1}$ once; store them.)
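Both sweeps in C: a minimal sketch, assuming the factors are stored as in the lu_factor sketch earlier (L in the strict lower triangle with implicit unit diagonal, U in the upper triangle, row-major):

/* Solve LUx = b by a forward sweep (Ly = b) and a backward sweep
   (Ux = y). x is used as scratch space for y. */
void lu_solve(int n, const double *a, const double *b, double *x) {
    for (int i = 0; i < n; i++) {          /* forward: unit diagonal */
        double s = b[i];
        for (int j = 0; j < i; j++)
            s -= a[i*n + j] * x[j];
        x[i] = s;
    }
    for (int i = n - 1; i >= 0; i--) {     /* backward */
        double s = x[i];
        for (int j = i + 1; j < n; j++)
            s -= a[i*n + j] * x[j];
        x[i] = s / a[i*n + i];
    }
}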

19
Computational aspects
Compare:

Matrix-vector product:
$$ \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \leftarrow
\begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} $$

Solving an LU system:
$$ \begin{pmatrix} a_{11} & & \emptyset \\ \vdots & \ddots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} =
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} $$
(and similarly for the U matrix)

Compare operation counts. Can you think of other points of comparison? (Think modern computers.)

20
Short detour: Partial Differential Equations

21
Second order PDEs; 1D case
$$ \begin{cases} -u''(x) = f(x) & x \in [a,b] \\ u(a) = u_a, \; u(b) = u_b \end{cases} $$

Using Taylor series:
$$ u(x+h) + u(x-h) = 2u(x) + u''(x)h^2 + u^{(4)}(x)\frac{h^4}{12} + \cdots $$
so
$$ u''(x) = \frac{u(x+h) - 2u(x) + u(x-h)}{h^2} + O(h^2) $$

Numerical scheme:
$$ -\frac{u(x+h) - 2u(x) + u(x-h)}{h^2} = f(x, u(x), u'(x)) $$

22
This leads to linear algebra
$$ -u_{xx} = f \;\rightarrow\; \frac{2u(x) - u(x+h) - u(x-h)}{h^2} = f(x, u(x), u'(x)) $$

Equally spaced points on $[0,1]$: $x_k = kh$ where $h = 1/(n+1)$, then

$$ -u_{k+1} + 2u_k - u_{k-1} = h^2 f(x_k, u_k, u_k') \qquad \text{for } k = 1, \ldots, n $$

Written as matrix equation:
$$ \begin{pmatrix}
2 & -1 & & \emptyset \\
-1 & 2 & -1 \\
& \ddots & \ddots & \ddots \\
\emptyset
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \end{pmatrix} =
\begin{pmatrix} h^2 f_1 + u_0 \\ h^2 f_2 \\ \vdots \end{pmatrix} $$
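As a concrete illustration, a C sketch (names and interface are mine) that assembles this system, storing only the three diagonals:

/* Assemble the 1D system for -u'' = f on [0,1], u(0)=u0, u(1)=u1:
   n interior points, h = 1/(n+1). Only the three diagonals are stored. */
void assemble_1d(int n, double *sub, double *diag, double *sup,
                 double *rhs, double (*f)(double), double u0, double u1) {
    double h = 1.0 / (n + 1);
    for (int k = 0; k < n; k++) {
        double xk = (k + 1) * h;
        sub[k] = -1.0; diag[k] = 2.0; sup[k] = -1.0;
        rhs[k] = h * h * f(xk);
    }
    rhs[0]     += u0;     /* known boundary values move to the rhs */
    rhs[n - 1] += u1;
}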

23
Second order PDEs; 2D case

$$ \begin{cases} -u_{xx}(\bar x) - u_{yy}(\bar x) = f(\bar x) & \bar x \in \Omega = [0,1]^2 \\ u(\bar x) = u_0 & \bar x \in \delta\Omega \end{cases} $$

Now using central differences in both the $x$ and $y$ directions:
$$ \frac{4u(x,y) - u(x+h,y) - u(x-h,y) - u(x,y+h) - u(x,y-h)}{h^2} = f(x,y) $$
24
The stencil view of things

      −1
−1     4    −1
      −1

25
Sparse matrix from 2D equation
$$ \begin{pmatrix}
4 & -1 & & & -1 \\
-1 & 4 & -1 & & & -1 \\
& \ddots & \ddots & \ddots & & & \ddots \\
& & -1 & 4 & -1 & & & -1 \\
-1 & & & -1 & 4 & -1 & & & -1 \\
& \ddots & & & \ddots & \ddots & \ddots & & \ddots
\end{pmatrix} $$

In row $k$ the nonzeros sit in columns $k-n$, $k-1$, $k$ (diagonal value 4), $k+1$, and $k+n$; everything in between is zero.

The stencil view is often more insightful.

26
Matrix properties
• Very sparse, banded
• Factorization takes less than $n^2$ space, $n^3$ work
• Symmetric (only because 2nd order problem)
• Sign pattern: positive diagonal, nonpositive off-diagonal
(true for many second order methods)
• Positive definite (just like the continuous problem)
• Constant diagonals: only because of the constant coefficient
differential equation
• Factorization: lower complexity than dense, recursion length less
than N.

27
Sparse matrices

28
Sparse matrix storage
Matrix above has many zeros: $n^2$ elements but only $O(n)$ nonzeros.
Big waste of space to store this as a square array.

A matrix is called ‘sparse’ if there are enough zeros to make specialized
storage feasible.

29
Compressed Row Storage
$$ A = \begin{pmatrix}
10 & 0 & 0 & 0 & -2 & 0 \\
3 & 9 & 0 & 0 & 0 & 3 \\
0 & 7 & 8 & 7 & 0 & 0 \\
3 & 0 & 8 & 7 & 5 & 0 \\
0 & 8 & 0 & 9 & 9 & 13 \\
0 & 4 & 0 & 0 & 2 & -1
\end{pmatrix} \qquad (1) $$

Compressed Row Storage (CRS): store all nonzeros by row, together with their
column indices and pointers to where each row starts (1-based indexing):

val      10 -2 3 9 3 7 8 7 3 ··· 9 13 4 2 -1
col_ind   1  5 1 2 6 2 3 4 1 ···  5  6 2 5  6
row_ptr   1  3 6 9 13 17 20
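In C these three arrays can be written down directly; a sketch, using the 0-based indexing conventional in C (every index above shifts down by one):

double val[]     = {10,-2, 3,9,3, 7,8,7, 3,8,7,5, 8,9,9,13, 4,2,-1};
int    col_ind[] = { 0, 4, 0,1,5, 1,2,3, 0,2,3,4, 1,3,4,5,  1,4,5};
int    row_ptr[] = { 0, 2, 5, 8, 12, 16, 19 };  /* length nrows+1 */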

30
Sparse matrix-vector operations

• Simplest, and important in many contexts: the matrix-vector product.
• Matrix-matrix product: rare in engineering science,
  very important in Deep Learning.
• Gaussian elimination is a complicated story.
• In general: changes to the sparse structure are hard!

31
Dense matrix-vector product
Most common operation in many cases: matrix-vector product

aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (col=0; col<ncols; col++) {
    s += a[aptr] * x[col];   /* a stored by rows: aptr walks sequentially */
    aptr++;
  }
  y[row] = s;                /* one result per row */
}

Reuse? Locality? Cachelines?

32
Better implementation
Three loops: blocks of columns, columns inside a block, rows;
permute the block loop to outermost, so each block of x is reused across all rows (see the sketch below).
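A minimal sketch of that loop structure in C, assuming row-major storage and a block size bs chosen so that a block of x fits in cache:

/* Blocked dense matrix-vector product: the block loop is outermost,
   so the entries x[cb..cend) are reused by every row while cached. */
for (int row = 0; row < nrows; row++) y[row] = 0;
for (int cb = 0; cb < ncols; cb += bs) {
    int cend = cb + bs < ncols ? cb + bs : ncols;
    for (int row = 0; row < nrows; row++) {
        double s = y[row];
        for (int col = cb; col < cend; col++)
            s += a[row*ncols + col] * x[col];
        y[row] = s;
    }
}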

33
Sparse matrix-vector product
aptr = 0;
for (row=0; row<nrows; row++) {
  s = 0;
  for (icol=ptr[row]; icol<ptr[row+1]; icol++) {
    int col = ind[icol];     /* column index of this nonzero */
    s += a[aptr] * x[col];   /* indirect access into x */
    aptr++;                  /* aptr stays equal to icol here */
  }
  y[row] = s;
}

Again: Reuse? Locality? Cachelines?

Indirect addressing of x gives low spatial and temporal locality.

34
Exercise: sparse coding

What if you need access to both rows and columns at the same time?
Implement an algorithm that tests whether a matrix stored in CRS
format is symmetric. Hint: keep an array of pointers, one for each row,
that keeps track of how far you have progressed in that row.

35
Fill-in
Remember Gaussian elimination algorithm:

for k = 1, n − 1:
  for i = k + 1 to n:
    for j = k + 1 to n:
      $a_{ij} \leftarrow a_{ij} - a_{ik} \ast a_{kj}/a_{kk}$

Fill-in: index $(i, j)$ where $a_{ij} = 0$ originally, but gets updated to non-zero
(and so $\ell_{ij} \neq 0$ or $u_{ij} \neq 0$).

Change in the sparsity structure! How do you deal with that?

36
LU of a sparse matrix

   
$$ \begin{pmatrix}
2 & -1 & 0 & \ldots \\
-1 & 2 & -1 \\
0 & -1 & 2 & -1 \\
& \ddots & \ddots & \ddots & \ddots
\end{pmatrix}
\;\rightarrow\;
\begin{pmatrix}
2 & -1 & 0 & \ldots \\
0 & 2-\frac12 & -1 \\
0 & -1 & 2 & -1 \\
& \ddots & \ddots & \ddots & \ddots
\end{pmatrix} $$

How does this continue by induction?

Observations?

37
LU of a sparse matrix
 
$$ \begin{pmatrix}
4 & -1 & 0 & \ldots & -1 \\
-1 & 4 & -1 & 0 & \ldots & 0 & -1 \\
& \ddots & \ddots & \ddots & & & \ddots \\
-1 & 0 & \ldots & & 4 & -1 \\
0 & -1 & 0 & \ldots & & -1 & 4 & -1
\end{pmatrix}
\;\rightarrow\;
\begin{pmatrix}
4 & -1 & 0 & \ldots & -1 \\
& 4-\frac14 & -1 & 0 & \ldots & -\frac14 & -1 \\
& & \ddots & \ddots & & & \ddots \\
& -\frac14 & 0 & \ldots & & 4-\frac14 & -1 \\
& -1 & 0 & \ldots & & -1 & 4 & -1
\end{pmatrix} $$

Eliminating the first variable couples its neighbours: fill-in appears in positions $(2, n+1)$ and $(n+1, 2)$.

38
A little graph theory
A graph is a tuple $G = \langle V, E \rangle$ where $V = \{v_1, \ldots, v_n\}$ for some $n$, and
$E \subset \{(i, j) : 1 \leq i, j \leq n, \; i \neq j\}$.

$$ \begin{cases} V = \{1, 2, 3, 4, 5, 6\} \\ E = \{(1,2), (2,6), (4,3), (4,5)\} \end{cases} $$

39
Graphs and matrices
For a graph G = ⟨V , E ⟩, the adjacency matrix M is defined by
$$ M_{ij} = \begin{cases} 1 & (i,j) \in E \\ 0 & \text{otherwise} \end{cases} $$

A dense and a sparse matrix, both with their adjacency graph

40
Fill-in
Fill-in: index $(i, j)$ where $a_{ij} = 0$ originally, but gets updated to non-zero:
$$ a_{ij} \leftarrow a_{ij} - a_{ik} \ast a_{kj}/a_{kk} $$

Eliminating a vertex introduces a new edge in the quotient graph.

41
LU of sparse matrix, with graph view: 1

Original matrix.

42
LU of sparse matrix, with graph view: 2

Eliminating (2, 1) causes fill-in at (2, 3).

43
LU of sparse matrix, with graph view: 3

Remaining matrix when step 1 finished.

44
LU of sparse matrix, with graph view: 4

Eliminating (3, 2) fills (3, 4)

45
LU of sparse matrix, with graph view: 5

After step 2

46
Fill-in is a function of ordering

 
$$ \begin{pmatrix}
\ast & \ast & \cdots & \ast \\
\ast & \ast & & \emptyset \\
\vdots & & \ddots \\
\ast & \emptyset & & \ast
\end{pmatrix} $$
After factorization the matrix is dense.
Can this be permuted?

47
Exercise: LU of a penta-diagonal matrix
Consider the matrix
 
$$ \begin{pmatrix}
2 & 0 & -1 \\
0 & 2 & 0 & -1 \\
-1 & 0 & 2 & 0 & -1 \\
& -1 & 0 & 2 & 0 & -1 \\
& & \ddots & \ddots & \ddots & \ddots & \ddots
\end{pmatrix} $$
Describe the LU factorization of this matrix:

• Convince yourself that there will be no fill-in. Give an inductive
  proof of this.
• What does the graph of this matrix look like? (Find a tutorial on
  graph theory. What is a name for such a graph?)
• Can you relate this graph to the answer on the question of the
  fill-in?

48
Exercise: LU of a band matrix

Suppose a matrix A is banded with halfbandwidth p:
$$ a_{ij} = 0 \quad \text{if } |i-j| > p $$

Derive how much space an LU factorization of A will take if no pivoting
is used. (For bonus points: consider partial pivoting.)

Can you also derive how much space the inverse will take? (Hint: if
A = LU, does that give you an easy formula for the inverse?)

49
