
Matrix Algorithms

Timothy Vismor
March 1, 2012

Abstract
This document examines various aspects of matrix and linear algebra
that are relevant to the analysis of large scale networks. Particular emphasis
is placed on computational aspects of the topics of interest.

Copyright © 1990 - 2012 Timothy Vismor



Contents
1 Matrix Nomenclature 6

2 Matrix Algebra 7
2.1 Matrix Equality . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Matrix Transposition . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Scalar Multiplication . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Matrix Addition . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . 10
2.6 Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . 11
2.7 Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.8 Similarity Transformations . . . . . . . . . . . . . . . . . . . . 13
2.9 Partitioning a Matrix . . . . . . . . . . . . . . . . . . . . . . . 13

3 Linear Systems 14
3.1 Solving Fully Determined Systems . . . . . . . . . . . . . . . 15
3.2 Solving Underdetermined Systems . . . . . . . . . . . . . . . 16
3.3 Solving Overdetermined Systems . . . . . . . . . . . . . . . . 17
3.4 Computational Complexity of Linear Systems . . . . . . . . . 17

4 LU Decomposition 18
4.1 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Doolittle’s LU Factorization . . . . . . . . . . . . . . . . . . . 19
4.3 Crout’s LU Factorization . . . . . . . . . . . . . . . . . . . . . 22
4.4 LDU Factorization . . . . . . . . . . . . . . . . . . . . . . . . 24
4.5 Numerical Instability During Factorization . . . . . . . . . . . 24
4.6 Pivoting Strategies for Numerical Stability . . . . . . . . . . . 26
4.7 Diagonal Dominance and Pivoting . . . . . . . . . . . . . . . 26
4.8 Partial Pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.9 Complete Pivoting . . . . . . . . . . . . . . . . . . . . . . . . 28
4.10 Computational Complexity of Pivoting . . . . . . . . . . . . . 29
4.11 Scaling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . 29

5 Solving Triangular Systems 29


5.1 Forward Substitution . . . . . . . . . . . . . . . . . . . . . . . 30
5.2 Backward Substitution . . . . . . . . . . . . . . . . . . . . . . 30
5.3 Outer Product Formulation . . . . . . . . . . . . . . . . . . . 31


6 Factor Update 32
6.1 LDU Factor Update . . . . . . . . . . . . . . . . . . . . . . . 32
6.2 LU Factor Update . . . . . . . . . . . . . . . . . . . . . . . . 33
6.3 Additional Considerations . . . . . . . . . . . . . . . . . . . . 34

7 Symmetric Matrices 35
7.1 LDU Decomposition of Symmetric Matrices . . . . . . . . . . 35
7.2 LU Decomposition of Symmetric Matrices . . . . . . . . . . . 35
7.3 Symmetric Matrix Data Structures . . . . . . . . . . . . . . . 37
7.4 Doolittle’s Method for Symmetric Matrices . . . . . . . . . . . 38
7.5 Crout’s Method for Symmetric Matrices . . . . . . . . . . . . 38
7.6 Forward Substitution for Symmetric Systems . . . . . . . . . . 39
7.6.1 Forward Substitution Using Lower Triangular Factors . 40
7.6.2 Forward Substitution Using Upper Triangular Factors . 40
7.7 Backward Substitution for Symmetric Systems . . . . . . . . . 41
7.7.1 Back Substitution Using Upper Triangular Factors . . 42
7.7.2 Back Substitution Using Lower Triangular Factors . . 42
7.8 Symmetric Factor Update . . . . . . . . . . . . . . . . . . . . 42
7.8.1 Symmetric LDU Factor Update . . . . . . . . . . . . 43
7.8.2 Symmetric LU Factor Update . . . . . . . . . . . . . . 44

8 Sparse Matrices 45
8.1 Sparse Matrix Methodology . . . . . . . . . . . . . . . . . . . 46
8.2 Abstract Data Types for Sparse Matrices . . . . . . . . . . . . 46
8.2.1 Sparse Matrix Proper . . . . . . . . . . . . . . . . . . 47
8.2.2 Adjacency List . . . . . . . . . . . . . . . . . . . . . . 47
8.2.3 Reduced Graph . . . . . . . . . . . . . . . . . . . . . 48
8.2.4 List . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.2.5 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.2.6 Vector . . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.3 Pivoting To Preserve Sparsity . . . . . . . . . . . . . . . . . . 50
8.3.1 Markowitz Pivot Strategy . . . . . . . . . . . . . . . . 51
8.3.2 Minimum Degree Pivot Strategy . . . . . . . . . . . . 51
8.4 Symbolic Factorization of Sparse Matrices . . . . . . . . . . . 52
8.4.1 Symbolic Factorization with Minimum Degree Pivot . 53
8.4.2 Computational Complexity of Symbolic Factorization 54
8.5 Creating 𝐏𝐀𝐏𝐓 from a Symbolic Factorization . . . . . . . . . 55
8.6 Numeric Factorization of Sparse Matrices . . . . . . . . . . . . 57
8.7 Solving Sparse Linear Systems . . . . . . . . . . . . . . . . . . 57
8.7.1 Permute the Constant Vector . . . . . . . . . . . . . . 59


8.7.2 Sparse Forward Substitution . . . . . . . . . . . . . . 59


8.7.3 Sparse Backward Substitution . . . . . . . . . . . . . . 60
8.7.4 Permute the Solution Vector . . . . . . . . . . . . . . 61
8.8 Sparse LU Factor Update . . . . . . . . . . . . . . . . . . . . 61
8.8.1 Factorization Path of a Singleton Update . . . . . . . . 62
8.8.2 Revising LU after a Singleton Update . . . . . . . . . 63

9 Implementation Notes 63
9.1 Sparse Matrix Representation . . . . . . . . . . . . . . . . . . 65
9.2 Database Cache Performance . . . . . . . . . . . . . . . . . . 66
9.2.1 Sequential Matrix Element Retrieval . . . . . . . . . . 68
9.2.2 Arbitrary Matrix Element Retrieval . . . . . . . . . . . 68
9.2.3 Arbitrary Matrix Element Update . . . . . . . . . . . 68
9.2.4 Matrix Element Insertion . . . . . . . . . . . . . . . . 68
9.2.5 Matrix Element Deletion . . . . . . . . . . . . . . . . 69
9.2.6 Empirical Performance Measurements . . . . . . . . . 69
9.3 Floating Point Performance . . . . . . . . . . . . . . . . . . . 71
9.4 Auxiliary Store . . . . . . . . . . . . . . . . . . . . . . . . . . 73

List of Figures
1 Computational Sequence of Doolittle’s Method . . . . . . . . 21
2 Computational Sequence of Crout’s Method . . . . . . . . . . 23
3 Computational Sequence of Tinney’s LDU Decomposition . . 25
4 Matrix Tuple Structure . . . . . . . . . . . . . . . . . . . . . . 65
5 Sparse Matrix Representation . . . . . . . . . . . . . . . . . . 67

List of Tables
1 Database Cache Benchmarks . . . . . . . . . . . . . . . . . . 70
2 Floating Point Benchmarks . . . . . . . . . . . . . . . . . . . 71
3 Math Library Benchmarks . . . . . . . . . . . . . . . . . . . . 72

List of Algorithms
1 LU Decomposition . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Doolittle’s LU Decompostion . . . . . . . . . . . . . . . . . . 21
3 Crout’s LU Decomposition . . . . . . . . . . . . . . . . . . . 23
4 Forward Substitution . . . . . . . . . . . . . . . . . . . . . . . 30


5 Backward Substitution . . . . . . . . . . . . . . . . . . . . . . 31
6 Forward Substitution - Outer Product . . . . . . . . . . . . . 31
7 Back Substitution - Outer Product . . . . . . . . . . . . . . . 31
8 LDU Factor Update . . . . . . . . . . . . . . . . . . . . . . . 33
9 LU Factor Update . . . . . . . . . . . . . . . . . . . . . . . . 34
10 Doolittle’s Method - Symmetric Implementation . . . . . . . . 38
11 Doolittle’s Method - Symmetric, Array Based . . . . . . . . . . 39
12 Crout’s Method - Symmetric Implementation . . . . . . . . . 39
13 Crout’s Method - Symmetric, Array Based . . . . . . . . . . . 40
14 Symmetric Forward Substitution via Upper Triangular Factors 41
15 Symmetric Forward Substitution using 𝐔 with Array Storage . 41
16 Symmetric Forward Substitution using 𝐔, Outer Product . . . 41
17 Symmetric Forward Substitution using 𝐔, Outer Product, Array 42
18 Symmetric Back Substitution using Lower Triangular Factors . 43
19 Symmetric Backward Substitution using 𝐋 with Array Storage . 43
20 Symmetric LDU Factor Update . . . . . . . . . . . . . . . . . 44
21 Symmetric LU Factor Update . . . . . . . . . . . . . . . . . . 45
22 Symbolic Factorization of a Sparse Matrix . . . . . . . . . . . 54
23 Construct 𝐏𝐀𝐏𝐓 of a Sparse Matrix . . . . . . . . . . . . . . . 56
24 Construct 𝐏𝐀𝐏𝐓 of a Sparse Symmetric Matrix . . . . . . . . . 56
25 LU Decomposition of a Sparse Matrix . . . . . . . . . . . . . 58
26 LU Decomposition of a Sparse Symmetric Matrix . . . . . . . 58
27 Permute 𝐛 to order 𝐏 . . . . . . . . . . . . . . . . . . . . . . . 60
28 Sparse Forward Substitution . . . . . . . . . . . . . . . . . . . 60
29 Sparse Forward Substitution - Outer Product . . . . . . . . . . 60
30 Sparse Back Substitution . . . . . . . . . . . . . . . . . . . . . 61
31 Permute 𝐱 to order 𝐐 . . . . . . . . . . . . . . . . . . . . . . . 61
32 Factorization Path . . . . . . . . . . . . . . . . . . . . . . . . 62
33 Symmetric Factorization Path . . . . . . . . . . . . . . . . . . 63
34 Structurally Symmetric Sparse LU Factor Update . . . . . . . . 64
35 Symmetric Sparse LU Factor Update . . . . . . . . . . . . . . 64


1 Matrix Nomenclature
Since any finite dimensional linear operator can be represented as a matrix, matrix
algebra and linear algebra are two sides of the same coin. Properties of linear
systems are gleaned from either discipline. The following sections draw on both
of these perspectives to examine the basic concepts, numerical techniques, and
practical constraints of computational linear algebra.
Assuming the symbols 𝑥𝑖 represent variables and the symbols 𝑎𝑖𝑗 and 𝑏𝑖 are
complex constants, the following is a system of 𝑚 linear equations in 𝑛 un-
knowns.

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned} \tag{1}$$

This system of equations is expressed in matrix notation as

𝐀𝐱 = 𝐛 (2)
where

$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \quad
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad
\mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \tag{3}$$
A rectangular array of coefficients such as 𝐀 is referred to as a matrix. The
matrix 𝐀 has 𝑚 rows and 𝑛 columns. As such, it is called an 𝑚 × 𝑛 matrix. A
square matrix has an equal number of rows and columns, e.g. an 𝑛 × 𝑛 matrix.
A vector is a matrix with just one row or just one column. A 1 × 𝑛 matrix is a
row vector. A matrix with a single column, such as 𝐱 or 𝐛 in Equation 2, is called
a column vector.
The elements of a matrix 𝑎𝑖𝑖 whose row and column index are equal are referred
to as its diagonal. The elements of a matrix above the diagonal (𝑎𝑖𝑗, where
𝑖 < 𝑗) are its superdiagonal entries. The elements of a matrix below the diagonal
(𝑎𝑖𝑗, where 𝑖 > 𝑗) are its subdiagonal entries. A matrix whose subdiagonal
entries are zero is called upper triangular. An upper triangular matrix with ones
along the diagonal is called unit upper triangular. The following 3 × 3 matrix is
unit upper triangular.


$$\begin{pmatrix} 1 & a_{12} & a_{13} \\ 0 & 1 & a_{23} \\ 0 & 0 & 1 \end{pmatrix}$$
Similarly, a matrix whose superdiagonal entries are zero is called lower triangular.
A lower triangular matrix with ones along the diagonal is called unit
lower triangular. The following 3 × 3 matrix is lower triangular.

$$\begin{pmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
A matrix whose superdiagonal and subdiagonal entries are zero is a diagonal
matrix, e.g.

$$\begin{pmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{pmatrix}$$
A square matrix whose subdiagonal elements are the mirror image of its
superdiagonal elements is referred to as a symmetric matrix. More formally, a
symmetric matrix 𝐀 has the property 𝑎𝑖𝑗 = 𝑎𝑗𝑖. A trivial example of a symmetric
matrix is a diagonal matrix. The general case of a 3 × 3 symmetric matrix follows.

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}$$
A matrix whose elements are all zero is called the zero matrix or the null
matrix.

2 Matrix Algebra
The set of square matrices of dimension 𝑛 forms an algebraic entity known as a
ring. By definition, a ring consists of a set 𝑅 and two operators (addition + and
multiplication ×) such that

• 𝑅 is an Abelian group with respect to addition.


• 𝑅 is a semigroup with respect to multiplication.
• 𝑅 is left distributive, i.e. 𝑎 × (𝑏 + 𝑐) = (𝑎 × 𝑏) + (𝑎 × 𝑐).


• 𝑅 is right distributive, i.e. (𝑏 + 𝑐) × 𝑎 = (𝑏 × 𝑎) + (𝑐 × 𝑎).

An Abelian group consists of a set 𝐺 and a binary operator such that

• 𝐺 is associative with respect to the operator.


• 𝐺 has an identity element with respect to the operator.
• Each element of 𝐺 has an inverse with respect to the operator.
• 𝐺 is commutative with respect to the operator.

A semigroup consists of a set 𝐺 and a binary operator such that 𝐺 is associative
with respect to the operator.
For non-square matrices, even these limited properties are not generally true.
The following sections examine the algebraic properties of matrices in further
detail.

2.1 Matrix Equality


Two 𝑚 × 𝑛 matrices 𝐀 and 𝐁 are equal if their corresponding elements are equal.

𝐀=𝐁 (4)
implies

𝑎𝑖𝑗 = 𝑏𝑖𝑗 , where 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛 (5)


The notion of matrix equality is undefined unless the operands have the same
dimensions.

2.2 Matrix Transposition


The transpose of an 𝑚 × 𝑛 matrix 𝐀 is an 𝑛 × 𝑚 matrix denoted by 𝐀𝐓. The
columns of 𝐀𝐓 are the rows of 𝐀 and the rows of 𝐀𝐓 are the columns of 𝐀.

𝑎𝑇𝑖𝑗 = 𝑎𝑗𝑖 , where 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛 (6)


A symmetric matrix is its own transpose, i.e. if 𝐀 is symmetric

𝐀 = 𝐀𝐓
The transpose of the 2 × 3 matrix


$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}$$
is the 3 × 2 matrix

$$\begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}$$

2.3 Scalar Multiplication


The product of an 𝑚 × 𝑛 matrix 𝐀 and a scalar 𝛼 is an 𝑚 × 𝑛 matrix whose elements
are the arithmetic products of 𝛼 and the elements of 𝐀.

𝐂=𝛼⋅𝐀 (7)
implies

𝑐𝑖𝑗 = 𝛼 ⋅ 𝑎𝑖𝑗 , where 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛 (8)

2.4 Matrix Addition


The sum of 𝑚 × 𝑛 matrices 𝐀 and 𝐁 is an 𝑚 × 𝑛 matrix 𝐂 which is the element
by element sum of the addends.

𝐂=𝐀+𝐁 (9)
implies

𝑐𝑖𝑗 = 𝑎𝑖𝑗 + 𝑏𝑖𝑗 , where 1 ≤ 𝑖 ≤ 𝑚 and 1 ≤ 𝑗 ≤ 𝑛 (10)


Matrix addition is undefined unless the addends have the same dimensions.
Matrix addition is commutative.

𝐀+𝐁=𝐁+𝐀
Matrix addition is also associative.

(𝐀 + 𝐁) + 𝐂 = 𝐀 + (𝐁 + 𝐂)
The additive identity is the zero matrix. The additive inverse of matrix 𝐀 is
denoted by −𝐀 and consists of the element by element negation of 𝐀, i.e. it is
the matrix formed when 𝐀 is multiplied by the scalar −1.


−𝐀 = −1 ⋅ 𝐀 (11)

2.5 Matrix Multiplication


The product of an 𝑚 × 𝑝 matrix 𝐀 and a 𝑝 × 𝑛 matrix 𝐁 is an 𝑚 × 𝑛 matrix 𝐂 where
each element 𝑐𝑖𝑗 is the dot product of row 𝑖 of 𝐀 and column 𝑗 of 𝐁.

𝐂 = 𝐀𝐁 (12)
implies
$$c_{ij} = \sum_{k=1}^{p} a_{ik}\,b_{kj}\,,\ \text{where } 1 \le i \le m \text{ and } 1 \le j \le n \tag{13}$$

The product of matrices 𝐀 and 𝐁 is undefined unless the number of columns in 𝐀
is equal to the number of rows in 𝐁. In this case, the matrices are conformable
for multiplication.
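As a concrete illustration of Equation 13, the following C sketch forms the product of an 𝑚 × 𝑝 matrix and a 𝑝 × 𝑛 matrix held in row major arrays; the function name and calling convention are illustrative only and do not appear elsewhere in this document.

    #include <stddef.h>

    /* C = A*B, where A is m x p, B is p x n, and each matrix is stored row
     * major in a one-dimensional array (a[i*p + k] holds a_ik).            */
    void matrix_multiply(const double *a, const double *b, double *c,
                         size_t m, size_t p, size_t n)
    {
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < n; j++) {
                double sum = 0.0;              /* dot product of row i of A  */
                for (size_t k = 0; k < p; k++) /* and column j of B (Eq. 13) */
                    sum += a[i*p + k] * b[k*n + j];
                c[i*n + j] = sum;
            }
        }
    }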
In general, matrix multiplication is not commutative.

𝐀𝐁 ≠ 𝐁𝐀
As a consequence, the following terminology is sometimes used. Considering
the matrix product

𝐀𝐁
The left multiplicand 𝐀 is said to premultiply the matrix 𝐁. The right multiplicand
𝐁 is said to postmultiply the matrix 𝐀.
Matrix multiplication distributes over matrix addition

𝐀(𝐁 + 𝐂) = (𝐀𝐁) + (𝐀𝐂)


and

(𝐁 + 𝐂)𝐀 = (𝐁𝐀) + (𝐂𝐀)


if 𝐀, 𝐁, and 𝐂 are conformable for the indicated operations.
With the same caveat, matrix multiplication is associative.

𝐀 (𝐁𝐂) = (𝐀𝐁) 𝐂
The transpose of a matrix product is the product of the transposes of its factors
in reverse order, i.e.


(𝐀𝐁𝐂)𝐓 = 𝐂𝐓 𝐁𝐓 𝐀𝐓 (14)
The set of square matrices has a multiplicative identity which is denoted by
𝐈. The identity is a diagonal matrix with ones along the diagonal

$$a_{ij} = \begin{cases} 1 & \text{where } i = j \\ 0 & \text{where } i \ne j \end{cases} \tag{15}$$
The 3 × 3 multiplicative identity is

$$\mathbf{I} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

2.6 Inverse of a Matrix


If 𝐀 and 𝐁 are square 𝑛 × 𝑛 matrices such that

𝐀𝐁 = 𝐈 (16)
then 𝐁 is a right inverse of 𝐀. Similarly, if 𝐂 is an 𝑛 × 𝑛 matrix such that

𝐂𝐀 = 𝐈 (17)
then 𝐂 is a left inverse of 𝐀. When both Equation 16 and Equation 17 hold

𝐀𝐁 = 𝐂𝐀 = 𝐈 (18)
then 𝐁 = 𝐂 and 𝐁 is the two-sided inverse of 𝐀.
The two-sided inverse of 𝐀 will be referred to as its multiplicative inverse or
simply its inverse. If the inverse of 𝐀 exists, it is unique and denoted by 𝐀−𝟏.
𝐀−𝟏 exists if and only if 𝐀 is square and nonsingular. A square 𝑛 × 𝑛 matrix is
singular when its rank is less than 𝑛, i.e. two or more of its columns (or rows) are
linearly dependent. The rank of a matrix is examined more closely in Section
2.7 of this document.
A few additional facts about inverses follow. If 𝐀 is invertible, so is 𝐀−𝟏, and

$$\left(\mathbf{A}^{-1}\right)^{-1} = \mathbf{A} \tag{19}$$
If 𝐀 and 𝐁 are invertible, so is 𝐀𝐁 and

(𝐀𝐁)−𝟏 = 𝐁−𝟏 𝐀−𝟏 (20)


Extending the previous example

(𝐀𝐁𝐂)−𝟏 = 𝐂−𝟏 𝐁−𝟏 𝐀−𝟏 (21)


If 𝐀 is invertible, then

$$\left(\mathbf{A}^{-1}\right)^{\mathbf{T}} = \left(\mathbf{A}^{\mathbf{T}}\right)^{-1} \tag{22}$$

The conditional or generalized inverse, which may be defined for any matrix,
is beyond the scope of the current discussion.

2.7 Rank of a Matrix


The rank of an 𝑚 × 𝑛 matrix 𝐀 is the maximum number of linearly independent
columns in 𝐀. Column vectors of 𝐀, denoted 𝐚𝐢, are linearly independent if
the only set of scalars 𝛼𝑖 such that

𝛼1 𝐚𝟏 + 𝛼2 𝐚𝟐 + ... + 𝛼𝑛 𝐚𝐧 = 𝟎 (23)
is the set

𝛼1 = 𝛼2 = ... = 𝛼𝑛 = 0
For a more concrete example, consider the following matrix.

$$\mathbf{A} = \begin{pmatrix} 0 & 1 & 1 & 2 \\ 1 & 2 & 3 & 4 \\ 2 & 0 & 2 & 0 \end{pmatrix}$$
The rank of 𝐀 is two, since its third and fourth columns are linear combinations
of its first two columns, i.e.

𝐚𝟑 = 𝐚 𝟏 + 𝐚 𝟐
𝐚𝟒 = 2𝐚𝟐
If 𝐀 is an 𝑚 × 𝑛 matrix, it can be shown

rank (𝐀) ≤ min(𝑚, 𝑛) (24)


Furthermore,

$$\operatorname{rank}(\mathbf{A}\mathbf{B}) \le \min\left(\operatorname{rank}(\mathbf{A}),\, \operatorname{rank}(\mathbf{B})\right) \tag{25}$$

$$\operatorname{rank}\left(\mathbf{A}\mathbf{A}^{\mathbf{T}}\right) = \operatorname{rank}\left(\mathbf{A}^{\mathbf{T}}\mathbf{A}\right) = \operatorname{rank}(\mathbf{A}) \tag{26}$$


2.8 Similarity Transformations


If 𝐀 and 𝐁 are 𝑛 × 𝑛 matrices, 𝐀 is similar to 𝐁 if there exists an invertible matrix
𝐏 such that

𝐁 = 𝐏𝐀𝐏−𝟏 (27)
Every matrix is similar to itself with 𝐏 = 𝐈. The only similarity transformation
that holds for the identity matrix or the zero matrix is this trivial one.
Similarity is a symmetric relation. If 𝐀 ∼ 𝐁, then 𝐁 ∼ 𝐀. Therefore, premultiplying
Equation 27 by 𝐏−𝟏 and postmultiplying it by 𝐏 yields

𝐀 = 𝐏−𝟏 𝐁𝐏 (28)
Similarity is also a transitive relation. If 𝐀 ∼ 𝐁 and 𝐁 ∼ 𝐂, then 𝐀 ∼ 𝐂.
Since similarity is reflexive, symmetric, and transitive, it is an equivalence
relation. A common example of a similarity transformation in linear algebra is
changing the basis of a vector space.

2.9 Partitioning a Matrix


Matrices may be divided into subsections for computational purposes. Consider
the 𝑛 × 𝑛 matrix 𝐀 which is partitioned along the following lines.

$$\mathbf{A} = \begin{pmatrix} \mathbf{A_{11}} & \mathbf{A_{12}} \\ \mathbf{A_{21}} & \mathbf{A_{22}} \end{pmatrix} \tag{29}$$
If 𝐀𝟏𝟏 (𝑘 × 𝑘) and 𝐀𝟐𝟐 (𝑝 × 𝑝) are square matrices, then 𝐀𝟏𝟐 has dimensions 𝑘 × 𝑝
and 𝐀𝟐𝟏 has dimensions 𝑝 × 𝑘.
The transpose of 𝐀 is

$$\mathbf{A}^{\mathbf{T}} = \begin{pmatrix} \mathbf{A_{11}^{T}} & \mathbf{A_{21}^{T}} \\ \mathbf{A_{12}^{T}} & \mathbf{A_{22}^{T}} \end{pmatrix} \tag{30}$$
If 𝐀 is invertible, its inverse is

$$\mathbf{A}^{-1} = \begin{pmatrix} \mathbf{B_{11}} & \mathbf{B_{12}} \\ \mathbf{B_{21}} & \mathbf{B_{22}} \end{pmatrix} \tag{31}$$
where


$$\begin{aligned}
\mathbf{B_{11}} &= \left(\mathbf{A_{11}} - \mathbf{A_{12}}\mathbf{A_{22}^{-1}}\mathbf{A_{21}}\right)^{-1} \\
\mathbf{B_{12}} &= -\mathbf{A_{11}^{-1}}\mathbf{A_{12}}\mathbf{B_{22}} \\
\mathbf{B_{21}} &= -\mathbf{A_{22}^{-1}}\mathbf{A_{21}}\mathbf{B_{11}} \\
\mathbf{B_{22}} &= \left(\mathbf{A_{22}} - \mathbf{A_{21}}\mathbf{A_{11}^{-1}}\mathbf{A_{12}}\right)^{-1}
\end{aligned} \tag{32}$$

Alternately,

$$\begin{aligned}
\mathbf{B_{12}} &= -\mathbf{B_{11}}\mathbf{A_{12}}\mathbf{A_{22}^{-1}} \\
\mathbf{B_{22}} &= \mathbf{A_{22}^{-1}} - \mathbf{A_{22}^{-1}}\mathbf{A_{21}}\mathbf{B_{12}}
\end{aligned} \tag{33}$$

The product of 𝐀 and another 𝑛 × 𝑛 matrix 𝐁 which is partitioned along the
same lines is an identically partitioned matrix 𝐂 such that

$$\begin{aligned}
\mathbf{C_{11}} &= \mathbf{A_{11}}\mathbf{B_{11}} + \mathbf{A_{12}}\mathbf{B_{21}} \\
\mathbf{C_{12}} &= \mathbf{A_{11}}\mathbf{B_{12}} + \mathbf{A_{12}}\mathbf{B_{22}} \\
\mathbf{C_{21}} &= \mathbf{A_{21}}\mathbf{B_{11}} + \mathbf{A_{22}}\mathbf{B_{21}} \\
\mathbf{C_{22}} &= \mathbf{A_{21}}\mathbf{B_{12}} + \mathbf{A_{22}}\mathbf{B_{22}}
\end{aligned} \tag{34}$$

The current discussion has focused on the principal partition of a square
matrix; however, all aspects of the discussion (except the inversion rules) are
more general – provided the dimensions of the partitions are conformable for
the indicated operations.

3 Linear Systems
Consider the 𝑚 × 𝑛 system of linear equations

𝐀𝐱 = 𝐛 (35)
If 𝑚 = 𝑛 and 𝐀 is not singular, Equation 35 possesses a unique solution and is
referred to as a fully determined system of equations. When 𝑚 < 𝑛 (or 𝑚 = 𝑛
and 𝐀 is singular), Equation 35 is an underdetermined system of equations.
Otherwise, 𝑚 > 𝑛 and Equation 35 is an overdetermined system of equations.


3.1 Solving Fully Determined Systems


An 𝑚 × 𝑛 system of linear equations where 𝑚 = 𝑛 and 𝐀 is not singular possesses
a unique solution. A solution for the unknown vector 𝐱 is obtained by
premultiplying Equation 35 by 𝐀−𝟏, i.e.

$$\mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$
or simply

𝐱 = 𝐀−𝟏 𝐛 (36)
since 𝐀−𝟏𝐀 is by definition the multiplicative identity.
The solution algorithm suggested by Equation 36 is

1. Invert matrix 𝐀.
2. Premultiply the vector 𝐛 by 𝐀−𝟏 .

This procedure is computationally inefficient and rarely used in practice.
Most direct solutions to systems of linear equations are derived from a procedure
known as Gaussian elimination, which is a formalization of the ad hoc techniques
used to solve linear equations in high school algebra. The basic algorithm is

1. Transform the system of equations into a triangular form.


2. Solve the triangular set of equations through a series of variable substitu-
tions.

A major drawback to this procedure is that both sides (𝐀 and 𝐛) of Equa-


tion 35 are modified as the system is forced into triangular form. This requires
repetition of the entire procedure if you want to solve Equation 35 with a new
𝐛 vector. A technique known as LU decomposition overcomes this problem by
systematically capturing the intermediate states of 𝐀 as the transformation to
triangular form progresses. This effectively decouples the operations on 𝐀 from
those on 𝐛, permitting solutions to Equation 35 for many 𝐛 vectors based on a
single triangular factorization.
More specifically, LU decomposition produces a lower triangular matrix 𝐋
and an upper triangular matrix 𝐔 such that

𝐋𝐔 = 𝐀 (37)
Substituting Equation 37 into Equation 35 yields


(𝐋𝐔) 𝐱 = 𝐛 (38)
Associating the factors in Equation 38 yields

𝐋 (𝐔𝐱) = 𝐛 (39)
Recalling that efficient procedures exist for solving triangular systems (i.e. for-
ward substitution for lower triangular systems and backward substitution for
upper triangular systems), Equation 39 suggests an algorithm for solving Equa-
tion 35. Define a vector 𝐲 such that

𝐲 = 𝐔𝐱 (40)
Substituting Equation 40 into Equation 39 yields

𝐋𝐲 = 𝐛 (41)
Since 𝐛 is known and 𝐋 is lower triangular, Equation 41 can be solved for 𝐲
by forward substitution. Once 𝐲 is known, Equation 40 can be solved for 𝐱 by
back substitution.
In summary, the preferred algorithm for solving a nonsingular 𝑛 × 𝑛
system of linear equations is

1. Compute an 𝐋𝐔 decomposition of 𝐀.
2. Solve Equation 41 for 𝐲 by forward substitution.
3. Solve Equation 40 for 𝐱 by back substitution.

3.2 Solving Underdetermined Systems


An 𝑚 × 𝑛 system of linear equations where 𝑚 < 𝑛 (or 𝑚 = 𝑛 and 𝐀 is singular) is
underdetermined. There are fewer independent equations than there are unknowns.
Underdetermined systems have 𝑞 linearly independent families of solutions, where

𝑞=𝑛−𝑟
and

𝑟 = rank (𝐀)
The value 𝑞 is referred to as the nullity of matrix 𝐀. The 𝑞-dimensional set of
vectors 𝐱 satisfying 𝐀𝐱 = 𝟎 is the null space of 𝐀.


“Solving” an underdetermined set of equations usually boils down to solving


a fully determined 𝑟×𝑟 system (known as the range of 𝐀) and adding this solution
to any linear combination of the other 𝑞 vectors of 𝐀. A numerical procedure
that solves the crux of this problem is known as singular value decomposition
(or SVD). A singular value decomposition constructs a set of orthonormal bases
for the null space and range of 𝐀.

3.3 Solving Overdetermined Systems


An 𝑚 × 𝑛 system of linear equations with 𝑚 > 𝑛 is overdetermined. There are
more equations than there are unknowns. “Solving” this system is the process
of reducing it to an 𝑛 × 𝑛 problem, then solving the reduced set of
equations. A common technique for constructing a reduced set of equations is
known as the least squares solution to the equations. The least squares equations
are derived by premultiplying Equation 35 by 𝐀𝐓, i.e.
$$\mathbf{A}^{\mathbf{T}}\mathbf{A}\mathbf{x} = \mathbf{A}^{\mathbf{T}}\mathbf{b} \tag{42}$$
Often Equation 42 is referred to as the normal equations of the linear least
squares problem. e least squares terminology refers to the fact that the solu-
tion to Equation 42 minimizes the sum of the squares of the differences between
the left and right sides of Equation 35.

3.4 Computational Complexity of Linear Systems


As was mentioned in Section 3.1, the decomposition algorithm for solving linear
equations is motivated by the computational inefficiency of matrix inversion.
Inverting a dense matrix 𝐀 requires

$$2n^3 + O(n^2)$$

floating point operations. Computing the LU decomposition of 𝐀 requires

$$\frac{2}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n$$

or

$$\frac{2}{3}n^3 + O(n^2)$$

floating point operations. Computing 𝐱 from the factorization requires

$$2n^2 + n$$


or

$$2n^2 + O(n)$$

floating point operations (which is equivalent to computing the product 𝐀−𝟏𝐛).
Therefore, solving a linear system of equations by matrix inversion requires
approximately three times the amount of work as a solution via LU decomposition.
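To give these counts a sense of scale, evaluating the leading terms for a dense system with 𝑛 = 1000 (an arbitrary example, not a benchmark from this document) gives

$$2n^3 = 2 \times 10^9, \qquad \tfrac{2}{3}n^3 \approx 0.67 \times 10^9, \qquad 2n^2 + n \approx 2 \times 10^6$$

so the factor of roughly three between inversion and factorization dominates, and the substitution phase is negligible by comparison.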
When 𝐀 is a sparse matrix, the computational discrepancy between the two
methods becomes even more overwhelming. The reason is straightforward. In
general,

• Inversion destroys the sparsity of 𝐀, whereas


• LU decomposition preserves the sparsity of 𝐀.

Much work can be avoided by taking advantage of the sparsity of a matrix


and its triangular factors. Algorithms for solving sparse systems of equations
are described in detail in Section 8 of this document.

4 LU Decomposition
There are many algorithms for computing the LU decomposition of the matrix
𝐀. All algorithms derive a matrix 𝐋 and a matrix 𝐔 that satisfy Equation 37.
Most algorithms also permit 𝐋 and 𝐔 to occupy the same amount of
space as 𝐀. This implies that either 𝐋 or 𝐔 is computed as a unit triangular
matrix so that explicit storage is not required for its diagonal (which is all ones).
There are two basic approaches to arriving at an LU decomposition:

• Simulate Gaussian elimination by using row operations to zero elements


in 𝐀 until an upper triangular matrix exists. Save the multipliers produced
at each stage of the elimination procedure as 𝐋.
• Use the definition of matrix multiplication to solve Equation 37 directly
for the elements of 𝐋 and 𝐔.

Discussions of the subject by Fox (1964) [1], Golub and Van Loan (1983) [2],
Duff, et al.(1986) [3], and Press, et al.(1988) [4] are complementary in many
respects. Taken as a group, these works provide a good sense of perspective
concerning the problem.


4.1 Gaussian Elimination


Approaches to LU decomposition which systematically capture the intermediate
results of Gaussian elimination often differ in the order in which 𝐀 is forced
into upper triangular form. e most common alternatives are to eliminate the
subdiagonal parts of 𝐀 either one row at a time or one column at a time. e
calculations required to zero a complete row or a complete column are referred
to as one stage of the elimination process.
e effects of the 𝑘𝑡ℎ stage of Gaussian elimination on the 𝐀 matrix are
summarized by the following equation.

$$a_{ij}^{(k+1)} = a_{ij}^{(k)} - \left(\frac{a_{ik}^{(k)}}{a_{kk}^{(k)}}\right) a_{kj}^{(k)}\,,\ \text{where } i, j > k \tag{43}$$

The notation $a_{ij}^{(k)}$ means the value of 𝑎𝑖𝑗 produced during the 𝑘𝑡ℎ stage of the
elimination procedure. In Equation 43, the term $a_{ik}^{(k)} / a_{kk}^{(k)}$ (sometimes referred to as a
multiplier) captures the crux of the elimination process. It describes the effect of
eliminating element 𝑎𝑖𝑘 on the other entries in row 𝑖 during the 𝑘𝑡ℎ stage of the
elimination. In fact, these multipliers are the elements of the lower triangular
matrix 𝐋, i.e.

$$l_{ik} = \frac{a_{ik}^{(k)}}{a_{kk}^{(k)}} \tag{44}$$
Algorithm 1 implements Equation 43 and Equation 44 and computes the LU
decomposition of an 𝑚 × 𝑛 matrix 𝐀 (it is based on Algorithm 4.2-1 of Golub
and Van Loan (1983) [2]).
The algorithm overwrites 𝑎𝑖𝑗 with 𝑙𝑖𝑗 when 𝑖 > 𝑗. Otherwise, 𝑎𝑖𝑗 is overwritten
by 𝑢𝑖𝑗. The algorithm creates a matrix 𝐔 that is upper triangular and a
matrix 𝐋 that is unit lower triangular. Note that a working vector 𝐰 of length 𝑛
is required by the algorithm.

4.2 Doolittle’s LU Factorization


An LU decomposition of 𝐀 may be obtained by applying the definition of matrix
multiplication to the equation 𝐀 = 𝐋𝐔. If 𝐋 is unit lower triangular and 𝐔 is
upper triangular, then

$$a_{ij} = \sum_{p=1}^{\min(i,j)} l_{ip}\,u_{pj}\,,\ \text{where } 1 \le i, j \le n \tag{45}$$


Algorithm 1: LU Decomposition
    for 𝑘 = 1, ⋯ , min(𝑚 − 1, 𝑛)
        for 𝑗 = 𝑘 + 1, ⋯ , 𝑛
            𝑤𝑗 = 𝑎𝑘𝑗
        for 𝑖 = 𝑘 + 1, ⋯ , 𝑚
            𝛼 = 𝑎𝑖𝑘 / 𝑎𝑘𝑘
            𝑎𝑖𝑘 = 𝛼
            for 𝑗 = 𝑘 + 1, ⋯ , 𝑛
                𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝛼𝑤𝑗

Rearranging the terms of Equation 45 yields

$$l_{ij} = \frac{a_{ij} - \sum_{p=1}^{j-1} l_{ip}\,u_{pj}}{u_{jj}}\,,\ \text{where } i > j \tag{46}$$

and

$$u_{ij} = a_{ij} - \sum_{p=1}^{i-1} l_{ip}\,u_{pj}\,,\ \text{where } i \le j \tag{47}$$

Jointly Equation 46 and Equation 47 are referred to as Doolittle’s method of


computing the LU decomposition of 𝐀. Algorithm 2 implements Doolittle’s
method. Calculations are sequenced to compute one row of 𝐋 followed by the
corresponding row of 𝐔 until 𝐀 is exhausted. You will observe that the proce-
dure overwrites the subdiagonal portions of 𝐀 with 𝐋 and the upper triangular
portions of 𝐀 with 𝐔. Figure 1 depicts the computational sequence associated
with Doolittle’s method.
Two loops in the Doolittle algorithm are of the form

𝛼 = 𝛼 − 𝑎𝑖𝑝 𝑎𝑝𝑗 (48)


These loops determine an element of the factorization 𝑙𝑖𝑗 or 𝑢𝑖𝑗 by computing the
dot product of a partial column vector in 𝐔 and a partial row vector in 𝐋. As such,
the loops perform an inner product accumulation. These computations have a
numerical advantage over the gradual accumulation of 𝑙𝑖𝑗 or 𝑢𝑖𝑗 during each stage
of Gaussian elimination (sometimes referred to as a partial sums accumulation).


Figure 1: Computational Sequence of Doolittle's Method (during iteration 𝑖, the elements 𝑙𝑖1, …, 𝑙𝑖,𝑖−1 of row 𝑖 of 𝐋 are computed, followed by 𝑢𝑖𝑖, 𝑢𝑖,𝑖+1, …, 𝑢𝑖𝑛 of row 𝑖 of 𝐔; prior iterations supply the rows above, subsequent iterations the rows below)

Algorithm 2: Doolittle's LU Decomposition
    for 𝑖 = 1, ⋯ , 𝑛
        for 𝑗 = 1, ⋯ , 𝑖 − 1
            𝛼 = 𝑎𝑖𝑗
            for 𝑝 = 1, ⋯ , 𝑗 − 1
                𝛼 = 𝛼 − 𝑎𝑖𝑝 𝑎𝑝𝑗
            𝑎𝑖𝑗 = 𝛼 / 𝑎𝑗𝑗
        for 𝑗 = 𝑖, ⋯ , 𝑛
            𝛼 = 𝑎𝑖𝑗
            for 𝑝 = 1, ⋯ , 𝑖 − 1
                𝛼 = 𝛼 − 𝑎𝑖𝑝 𝑎𝑝𝑗
            𝑎𝑖𝑗 = 𝛼


This advantage is based on the fact that the product of two single precision floating
point numbers is always computed with double precision arithmetic (at least in
the C programming language). Because of this, the product 𝑎𝑖𝑝 𝑎𝑝𝑗 suffers no loss
of precision. If the product is accumulated in a double precision variable 𝛼, there
is no loss of precision during the entire inner product calculation. Therefore,
one double precision variable can preserve the numerical integrity of the inner
product.
Recalling the partial sum accumulation loop of the elimination-based pro-
cedure:

𝑎𝑖𝑗 = 𝑎𝑖𝑗 − 𝛼𝑤𝑗


You will observe that truncation to single precision must occur each time 𝑎𝑖𝑗 is
updated unless both 𝐀 and 𝑤 are stored in double precision arrays.
The derivation of Equation 46 and Equation 47 is discussed more fully in
Conte and De Boor (1972) [5].
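A minimal C sketch of Algorithm 2 for a dense matrix follows; it assumes pivoting is unnecessary (e.g. the matrix is diagonally dominant), stores the matrix row major in single precision, and accumulates each inner product in a double precision variable as discussed above. The function name and storage choices are illustrative, not prescribed by the text.

    #include <stddef.h>

    /* In-place Doolittle factorization of a dense n x n matrix stored row
     * major in a (a[i*n + j] holds a_ij, zero based).  On exit the strict
     * lower triangle holds L (unit diagonal implied) and the upper triangle
     * holds U.  No pivoting is performed.                                   */
    void doolittle_lu(float *a, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            for (size_t j = 0; j < i; j++) {        /* row i of L (Eq. 46) */
                double alpha = a[i*n + j];
                for (size_t p = 0; p < j; p++)
                    alpha -= (double)a[i*n + p] * a[p*n + j];
                a[i*n + j] = (float)(alpha / a[j*n + j]);
            }
            for (size_t j = i; j < n; j++) {        /* row i of U (Eq. 47) */
                double alpha = a[i*n + j];
                for (size_t p = 0; p < i; p++)
                    alpha -= (double)a[i*n + p] * a[p*n + j];
                a[i*n + j] = (float)alpha;
            }
        }
    }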

4.3 Crout’s LU Factorization


An equivalent LU decomposition of 𝐀 = 𝐋𝐔 may be obtained by assuming that
𝐋 is lower triangular and 𝐔 is unit upper triangular. This factorization
scheme is referred to as Crout's method. The defining equations for Crout's
method are
$$l_{ij} = a_{ij} - \sum_{p=1}^{j-1} l_{ip}\,u_{pj}\,,\ \text{where } i \ge j \tag{49}$$

and

$$u_{ij} = \frac{a_{ij} - \sum_{p=1}^{i-1} l_{ip}\,u_{pj}}{l_{ii}}\,,\ \text{where } i < j \tag{50}$$
Algorithm 3 implements Crout’s method. Calculations are sequenced to com-
pute one column of 𝐋 followed by the corresponding row of 𝐔 until 𝐀 is ex-
hausted.
Figure 2 depicts the computational sequence associated with Crout’s method.
You should observe that Crout’s method, like Doolittle’s, exhibits inner
product accumulation.
A good comparison of the various compact factorization schemes is found
in Duff, et al.(1986) [3].


Figure 2: Computational Sequence of Crout's Method (during iteration 𝑗, the elements 𝑙𝑗𝑗, 𝑙𝑗+1,𝑗, …, 𝑙𝑛𝑗 of column 𝑗 of 𝐋 are computed, followed by 𝑢𝑗,𝑗+1, …, 𝑢𝑗𝑛 of row 𝑗 of 𝐔; prior iterations supply the columns and rows already factored, subsequent iterations the remainder)

Algorithm 3: Crout's LU Decomposition
    for 𝑗 = 1, ⋯ , 𝑛
        for 𝑖 = 𝑗, ⋯ , 𝑛
            𝛼 = 𝑎𝑖𝑗
            for 𝑝 = 1, ⋯ , 𝑗 − 1
                𝛼 = 𝛼 − 𝑎𝑖𝑝 𝑎𝑝𝑗
            𝑎𝑖𝑗 = 𝛼
        for 𝑖 = 𝑗 + 1, ⋯ , 𝑛
            𝛼 = 𝑎𝑗𝑖
            for 𝑝 = 1, ⋯ , 𝑗 − 1
                𝛼 = 𝛼 − 𝑎𝑗𝑝 𝑎𝑝𝑖
            𝑎𝑗𝑖 = 𝛼 / 𝑎𝑗𝑗


4.4 LDU Factorization


Some factorization algorithms, referred to as LDU decompositions, derive three
matrices 𝐋, 𝐃, and 𝐔 from 𝐀 such that

𝐋𝐃𝐔 = 𝐀 (51)
where 𝐋 is unit lower triangular, 𝐃 is diagonal, and 𝐔 is unit upper triangular.
It should be obvious that the storage requirements of LDU decompositions and
LU decompositions are the same.
A procedure proposed by Tinney and Walker (1967) [6] provides a concrete
example of an LDU decomposition that is based on Gaussian elimination. One
row of the subdiagonal portion of 𝐀 is eliminated at each stage of the compu-
tation. Tinney refers to the LDU decomposition as a “table of factors”. He
constructs the factorization as follows:

• The elements of the unit upper triangular matrix 𝐔 are $u_{ij} = a_{ij}^{(i)}$, where 𝑖 < 𝑗.

• The elements of the diagonal matrix 𝐃 are $d_{ii} = 1 / a_{ii}^{(i-1)}$.

• The elements of the unit lower triangular matrix 𝐋 are $l_{ij} = a_{ij}^{(j-1)}$, where 𝑖 > 𝑗.

Figure 3 depicts the first two stages of Tinney's factorization scheme.

4.5 Numerical Instability During Factorization


Examining Equation 43 and Equation 44, you will observe that LU decomposition
will fail when the value of $a_{kk}^{(k)}$ (called the pivot element) is zero. In many
applications, the possibility of a zero pivot is quite real and constitutes a serious
impediment to the use of Gaussian elimination. This problem is compounded
by the fact that Gaussian elimination is numerically unstable even if there are
no zero pivot elements.
Numerical instability occurs when errors introduced by the finite precision
representation of real numbers are of sufficient magnitude to swamp the true
solution to a problem. In other words, a numerically unstable problem has a
theoretical solution that may be unobtainable in finite precision arithmetic.
The other LU decomposition schemes examined in this section exhibit similar
characteristics, e.g. instability is introduced by the division by 𝑢𝑗𝑗 in Equation 46
and by 𝑙𝑖𝑖 in Equation 50.


Figure 3: Computational Sequence of Tinney's LDU Decomposition (stage 1 produces 𝑑11 = 1/𝑎11 and row 1 of 𝐔 as 𝑎1𝑗/𝑎11 while updating the remaining submatrix; stage 2 repeats the process on the updated submatrix). Note: $a_{ij}^{(k)}$ is the 𝑘𝑡ℎ stage partial sum of 𝑎𝑖𝑗.


4.6 Pivoting Strategies for Numerical Stability


A solution to the numerical instability of LU decomposition algorithms is ob-
tained by interchanging the rows and columns of 𝐀 to avoid zero (and other
numerically unstable) pivot elements. ese interchanges do not effect the so-
lution to Equation 35 as long as the permutations are logged and taken into
account during the substitution process. e choice of pivot elements 𝑎(𝑘) 𝑘𝑘
is
referred to as a pivot strategy. In the general case, there is no optimal pivot
strategy. Two common heuristics are:

• At the 𝑘𝑡ℎ stage of the computation, choose the largest remaining element
in 𝐀 as the pivot. If pivoting has proceeded along the diagonal in stages
1 through 𝑘 − 1, this implies the next pivot should be the largest element
$a_{ij}^{(k-1)}$ where 𝑘 ≤ 𝑖 ≤ 𝑛 and 𝑘 ≤ 𝑗 ≤ 𝑛. This strategy is referred to as
complete pivoting.
• At the 𝑘𝑡ℎ stage of the computation, select the largest element in column
𝑘 as the pivot. This strategy is referred to as partial pivoting (a sketch of
the column search follows this list).

Both procedures have good computational properties. Gaussian elimination


with complete pivoting is numerically stable. In most practical applications,
Gaussian elimination with partial pivoting has the same stability characteristics
as complete pivoting. However, there are theoretical situations where partial
pivoting strategies can become unstable.
Applications of current interest are diagonally dominant; therefore, algo-
rithms which incorporate pivoting strategies for numerical stability are not ex-
amined in this document. For implementations of Gaussian elimination with
complete and partial pivoting, see algorithms 4.4-1 and 4.4-2 of Golub and
Van Loan (1983) [2]. For an implementation of Doolittle’s method with scaled
partial pivoting, see algorithm 3.5 of Conte and de Boor (1972) [5]. Crout’s
method with scaled partial pivoting is implemented in section 2.3 of Press, et
al.(1988) [4]. Pivoting strategies which control element growth in sparse ma-
trices are examined in Section 8.3 of this document.
The following sections present a brief introduction to the topic of pivoting
to reduce numerical instability.

4.7 Diagonal Dominance and Pivoting


We will begin our discussion of pivoting by identifying a condition in which
pivoting is unnecessary. The matrix 𝐀 is row diagonally dominant when the
following inequality holds.


$$|a_{ii}| > \sum_{j \ne i} |a_{ij}|\,,\ \text{where } i = 1, \cdots, n \tag{52}$$

The matrix 𝐀 is column diagonally dominant when the following inequality
holds.

$$|a_{jj}| > \sum_{i \ne j} |a_{ij}|\,,\ \text{where } j = 1, \cdots, n \tag{53}$$

If either of these conditions applies, the LU decomposition algorithms discussed
in this document are numerically stable without pivoting.
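A direct translation of Equation 52 into C might look like the following check (an illustrative helper, not part of any factorization routine in this document):

    #include <math.h>
    #include <stddef.h>

    /* Returns 1 if the dense n x n matrix a (row major) is strictly row
     * diagonally dominant in the sense of Equation 52, 0 otherwise.      */
    int is_row_diagonally_dominant(const double *a, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            double off = 0.0;
            for (size_t j = 0; j < n; j++)
                if (j != i)
                    off += fabs(a[i*n + j]);  /* sum of off-diagonal magnitudes */
            if (fabs(a[i*n + i]) <= off)
                return 0;
        }
        return 1;
    }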

4.8 Partial Pivoting


If a partial pivoting strategy is observed (pivoting is restricted to row inter-
changes), factorization produces matrices 𝐋 and 𝐔 which satisfy the following
equation.

𝐋𝐔 = 𝐏𝐀 (54)
where 𝐏 is a permutation matrix that is derived as follows:

1. 𝐏 is initialized to 𝐈.
2. Each row interchange that occurs during the decomposition of 𝐀 causes
a corresponding row swap in 𝐏.

Recalling the definition of a linear system of equations

𝐀𝐱 = 𝐛
and premultiplying both sides by 𝐏

𝐏𝐀𝐱 = 𝐏𝐛
Using Equation 54 to substitute for 𝐏𝐀 yields

𝐋𝐔𝐱 = 𝐏𝐛 (55)
Following the same train of logic used to derive Equation 40 and
Equation 41 implies that a solution for 𝐱 can be achieved by the sequential
solution of two triangular systems.


𝐲 = 𝐏𝐛 (56)
𝐋𝐜 = 𝐲
𝐔𝐱 = 𝐜

Observe that the product 𝐏𝐛 is computed before forward substitution begins.


Computationally, this implies that 𝐏 can be implemented as a mapping that is
applied to 𝐛 before substitution.
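One way to realize 𝐏 as such a mapping is to record the row interchanges in an integer permutation vector and gather 𝐛 just before forward substitution. A minimal C sketch under that assumption (perm[i] holds the original index of the row that pivoting moved into position 𝑖):

    #include <stddef.h>

    /* Form y = P*b by gathering the entries of b in pivot order. */
    void permute_rhs(const double *b, const size_t *perm, double *y, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = b[perm[i]];
    }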

4.9 Complete Pivoting


If a complete pivoting strategy is observed (pivoting involves both row and col-
umn interchanges), factorization produces matrices 𝐋 and 𝐔 which satisfy the
following equation.

𝐋𝐔 = 𝐏𝐀𝐐 (57)
where 𝐏 is a row permutation matrix and 𝐐 is a column permutation matrix.
𝐐 is derived from column interchanges in the same way 𝐏 is derived from row
interchanges.
If 𝐀 and its factors are related according to Equation 57, then Equation 35
can still be solved for 𝐱 by the sequential solution of two triangular systems.

𝐲 = 𝐏𝐛 (58)
𝐋𝐜 = 𝐲 (59)
𝐔𝐳 = 𝐜 (60)
𝐱 = 𝐐𝐳 (61)

Since Equation 56 and Equation 58 are identical, 𝐏 can still be implemented as
a mapping that is applied to 𝐛 before substitution begins. Since Equation 61
computes the product 𝐐𝐳 after back substitution is finished, 𝐐 can be implemented
as a mapping that is applied to 𝐳 following the substitution process.
If 𝐀 is symmetric, pivoting for numerical stability may destroy the symmetry
of the LU decomposition of 𝐀. For a symmetric factorization of 𝐀, matching
row and column interchanges are required. In other words, pivoting must be
complete and the permutation matrices must be related as follows:

𝐐 = 𝐏𝐓


4.10 Computational Complexity of Pivoting


Obviously, complete pivoting and partial pivoting differ substantially with regard
to the computational effort required to determine the next pivot element.
Complete pivoting on a dense, asymmetric matrix is an $O(n^3)$ operation requiring

$$\frac{2}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n$$

floating point comparisons. Partial pivoting on the same matrix is an $O(n^2)$
operation requiring

$$\frac{n^2 + n}{2}$$

floating point comparisons.

4.11 Scaling Strategies


Some algorithms attempt to reduce the roundoff error generated during LU decomposition
by preprocessing 𝐀. The most common roundoff control strategy
is row scaling, where each equation is normalized (so that its largest coefficient
has a value of one) before a pivot is chosen. Pivoting strategies which employ
scaling techniques usually are implemented in two stages:

1. Scan the equations to determine a set of scale factors.


2. Choose the pivot elements taking the scale factors into account.

However, a word of caution is in order. Scaling techniques do not solve the


fundamental roundoff problem of adding a large quantity to a small one. In fact,
problems may arise where scaling techniques exacerbate rather than ameliorate
roundoff error.
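A typical first stage of a scaled pivoting strategy is sketched below in C: each row's scale factor is the reciprocal of its largest magnitude coefficient, so that later pivot comparisons can be made on normalized values. The routine is illustrative only.

    #include <math.h>
    #include <stddef.h>

    /* scale[i] = 1 / max_j |a_ij| for each row of a dense n x n matrix.
     * Returns 0 if some row is identically zero (the matrix is singular). */
    int row_scale_factors(const double *a, double *scale, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            double big = 0.0;
            for (size_t j = 0; j < n; j++) {
                double mag = fabs(a[i*n + j]);
                if (mag > big)
                    big = mag;
            }
            if (big == 0.0)
                return 0;
            scale[i] = 1.0 / big;
        }
        return 1;
    }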

5 Solving Triangular Systems


A triangular system of equations is efficiently solved by a series of variable substitutions.
By definition, there will always be at least one equation with a single
unknown, one equation with only two unknowns (one of which will correspond
to the unknown in the single variable equation), etc. This procedure can be
formalized into two simple algorithms.


Algorithm 4: Forward Substitution
    for 𝑖 = 1, ⋯ , 𝑛
        𝛼 = 𝑏𝑖
        for 𝑗 = 1, ⋯ , 𝑖 − 1
            𝛼 = 𝛼 − 𝑙𝑖𝑗 𝑦𝑗
        𝑦𝑖 = 𝛼 / 𝑙𝑖𝑖

• A lower triangular system is solved by forward substitution.


• An upper triangular system is solved by backward substitution.

These operations are described with reference to the solution of systems of
linear equations whose LU decomposition is known. Equation 41 (𝐋𝐲 = 𝐛) and
Equation 40 (𝐔𝐱 = 𝐲) pose the problem in this context.

5.1 Forward Substitution


The equation 𝐋𝐲 = 𝐛 is solved by forward substitution as follows.

$$y_i = \frac{b_i - \sum_{j=1}^{i-1} l_{ij}\,y_j}{l_{ii}}\,,\ \text{where } 1 \le i \le n \tag{62}$$
Algorithm 4 implements Equation 62.
If 𝐋 is unit lower triangular, the division by 𝑙𝑖𝑖 is unnecessary (since 𝑙𝑖𝑖 is 1).
Notice that the update to 𝑦𝑖 is accumulated as an inner product in 𝛼 .

5.2 Backward Substitution


The equation 𝐲 = 𝐔𝐱 is solved by backward substitution as follows.

$$x_i = \frac{y_i - \sum_{j=i+1}^{n} u_{ij}\,x_j}{u_{ii}}\,,\ \text{where } i = n, n-1, \cdots, 1 \tag{63}$$
Algorithm 5 implements Equation 63.
If 𝐔 is unit upper triangular, the division by 𝑢𝑖𝑖 is unnecessary (since 𝑢𝑖𝑖 is
1). Notice that the update to 𝑥𝑖 is accumulated as an inner product in 𝛼 .


Algorithm 5: Backward Substitution
    for 𝑖 = 𝑛, ⋯ , 1
        𝛼 = 𝑦𝑖
        for 𝑗 = 𝑖 + 1, ⋯ , 𝑛
            𝛼 = 𝛼 − 𝑢𝑖𝑗 𝑥𝑗
        𝑥𝑖 = 𝛼 / 𝑢𝑖𝑖
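For reference, Algorithms 4 and 5 translate directly into C when the factors are packed in a single dense row major array, which is the layout produced by Algorithm 2 (𝐋 below the diagonal with an implied unit diagonal, 𝐔 on and above it). The routine below is an illustrative sketch under those assumptions.

    #include <stddef.h>

    /* Solve L*y = b then U*x = y.  The strict lower triangle of lu holds L
     * (unit diagonal implied) and the upper triangle holds U.  The vector x
     * holds y during the forward pass and the solution on return.          */
    void lu_solve(const double *lu, const double *b, double *x, size_t n)
    {
        for (size_t i = 0; i < n; i++) {           /* forward substitution  */
            double alpha = b[i];
            for (size_t j = 0; j < i; j++)
                alpha -= lu[i*n + j] * x[j];
            x[i] = alpha;                          /* l_ii = 1              */
        }
        for (size_t i = n; i-- > 0; ) {            /* backward substitution */
            double alpha = x[i];
            for (size_t j = i + 1; j < n; j++)
                alpha -= lu[i*n + j] * x[j];
            x[i] = alpha / lu[i*n + i];
        }
    }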

Algorithm 6: Forward Substitution - Outer Product
    for 𝑖 = 1, ⋯ , 𝑛
        𝑦𝑖 = 𝑏𝑖 / 𝑙𝑖𝑖
        for 𝑘 = 𝑖 + 1, ⋯ , 𝑛
            𝑏𝑘 = 𝑏𝑘 − 𝑦𝑖 𝑙𝑘𝑖

5.3 Outer Product Formulation


Sections 5.1 and 5.2 solve triangular systems by procedures which use an in-
ner product accumulation for each row of 𝐀. It is possible to formulate these
algorithms in a manner that arrives at the solution through partial sum accumu-
lations (also known as an outer product). Algorithm 6 solves lower triangular
systems using an outer product formulation of forward substitution.
You should observe that if 𝑏𝑖 is zero, the 𝑖𝑡ℎ stage can be skipped. In this
situation, 𝑦𝑖 will also be zero and the term 𝑦𝑖 𝑙𝑘𝑖 will not change any of the partial
sums 𝑏𝑘.
Algorithm 7 solves upper triangular systems using an outer product formulation
of the backward substitution algorithm.
George and Liu (1981) [7] examine outer product solutions of triangular
systems as do Tinney, Bradwajn, and Chan (1985) [8].

Algorithm 7: Back Substitution - Outer Product
    for 𝑖 = 𝑛, ⋯ , 1
        𝑥𝑖 = 𝑦𝑖 / 𝑢𝑖𝑖
        for 𝑘 = 𝑖 − 1, ⋯ , 1
            𝑦𝑘 = 𝑦𝑘 − 𝑥𝑖 𝑢𝑘𝑖


6 Factor Update
If the LU decomposition of the matrix 𝐀 exists and the factorization of a related
matrix

$$\mathbf{A}' = \mathbf{A} + \mathbf{\Delta A} \tag{64}$$

is needed, it is sometimes advantageous to compute the factorization of 𝐀′ by
modifying the factors of 𝐀 rather than explicitly decomposing 𝐀′. Implementations
of this factor update operation should have the following properties:

• Arithmetic is minimized,
• Numerical stability is maintained, and
• Sparsity is preserved.

The current discussion outlines procedures for updating the factors of 𝐀
following a rank one modification. A rank one modification of 𝐀 is defined as

$$\mathbf{A}' = \mathbf{A} + \alpha\,\mathbf{y}\mathbf{z}^{\mathbf{T}} \tag{65}$$

where 𝛼 is a scalar and the vectors 𝐲 and 𝐳𝐓 are dimensionally correct. The
terminology comes from the observation that the product 𝛼𝐲𝐳𝐓 is a matrix whose
rank is one.
Computationally, a rank one factor update to a dense matrix is an $O(n^2)$
operation. Recall that decomposing a matrix from scratch is $O(n^3)$.
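Equation 65 itself is straightforward to apply explicitly; the 𝑂(𝑛²) C sketch below forms 𝐀′ = 𝐀 + 𝛼𝐲𝐳𝐓 in place, which can be useful for checking a factor update routine against a full refactorization. The function name is illustrative.

    #include <stddef.h>

    /* Apply the rank one modification A := A + alpha * y * z^T to a dense
     * n x n matrix stored row major.                                      */
    void rank_one_update(double *a, double alpha,
                         const double *y, const double *z, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                a[i*n + j] += alpha * y[i] * z[j];
    }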

6.1 LDU Factor Update


Algorithm 8 follows the lead of our sources (see Section 6.3 for details) and
implements a technique for updating the factors 𝐋, 𝐃, and 𝐔 of 𝐀 following a
rank one change to 𝐀.
The scalar 𝛼 and the vectors 𝐲 and 𝐳𝐓 are destroyed by this procedure. The
factors of 𝐀 are overwritten by their new values.
The outer loop of Algorithm 8 does not have to begin at one unless 𝐲 is full.
If 𝐲 has any leading zeros, the initial value of 𝑖 should be the index of 𝑦𝑖, the first
nonzero element of 𝐲. If there is no a priori information about the structure
of 𝐲 but there is a high probability of leading zeros, testing 𝑦𝑖 for zero at the
beginning of the loop might save a lot of work. However, you must remember
to cancel the test as soon as a nonzero 𝑦𝑖 is encountered.


Algorithm 8: LDU Factor Update
    for 𝑖 = 1, ⋯ , 𝑛
        𝛿 = 𝑑𝑖, 𝑝 = 𝑦𝑖, 𝑞 = 𝑧𝑖
        𝑑𝑖 = 𝑑𝑖 + 𝛼𝑝𝑞
        𝛽1 = 𝛼𝑝 / 𝑑𝑖, 𝛽2 = 𝛼𝑞 / 𝑑𝑖, 𝛼 = 𝛼𝛿 / 𝑑𝑖
        for 𝑗 = 𝑖 + 1, ⋯ , 𝑛
            𝑦𝑗 = 𝑦𝑗 − 𝑝 𝑙𝑗𝑖
            𝑧𝑗 = 𝑧𝑗 − 𝑞 𝑢𝑖𝑗
            𝑙𝑗𝑖 = 𝑙𝑗𝑖 + 𝛽1 𝑦𝑗
            𝑢𝑖𝑗 = 𝑢𝑖𝑗 + 𝛽2 𝑧𝑗

6.2 LU Factor Update


Similar algorithms exist for updating 𝐋𝐔 decompositions of 𝐀. If 𝐔′ is upper
triangular and 𝐋 is unit lower triangular, an element 𝑢′𝑖𝑗 of 𝐔′ is related to the
elements 𝑢𝑖𝑗 and 𝑑𝑖 of the 𝐋𝐃𝐔 decomposition as follows.

$$u'_{ij} = u_{ij}\,d_i \tag{66}$$

The recurrence relations of the inner loop of Algorithm 8 must change to reflect
this relationship. The following statement updates the 𝐳 vector for a unit upper
triangular 𝐔.

$$z_j = z_j - q\,u_{ij} \tag{67}$$

If the explicit factor 𝐔′ is upper triangular, the statement becomes

$$z_j = z_j - q\,\frac{u'_{ij}}{\delta} \tag{68}$$

where 𝛿 is the value of 𝑢′𝑖𝑖 before it was changed during stage 𝑖 of the procedure.
Along the same lines, the factor update statement

$$u_{ij} = u_{ij} + \beta_2\,z_j \tag{69}$$

becomes

$$\frac{u'_{ij}}{u'_{ii}} = \frac{u'_{ij}}{\delta} + \beta_2\,z_j \tag{70}$$

Solving for the updated value of 𝑢′𝑖𝑗 yields


Algorithm 9: LU Factor Update
    for 𝑖 = 1, ⋯ , 𝑛
        𝛿 = 𝑢𝑖𝑖, 𝑝 = 𝑦𝑖, 𝑞 = 𝑧𝑖
        𝑢𝑖𝑖 = 𝑢𝑖𝑖 + 𝛼𝑝𝑞
        𝛽1 = 𝛼𝑝 / 𝑢𝑖𝑖, 𝛽2 = 𝛼𝑞, 𝑞 = 𝑞 / 𝛿, 𝛿 = 𝑢𝑖𝑖 / 𝛿, 𝛼 = 𝛼 / 𝛿
        for 𝑗 = 𝑖 + 1, ⋯ , 𝑛
            𝑦𝑗 = 𝑦𝑗 − 𝑝 𝑙𝑗𝑖
            𝑧𝑗 = 𝑧𝑗 − 𝑞 𝑢𝑖𝑗
            𝑙𝑗𝑖 = 𝑙𝑗𝑖 + 𝛽1 𝑦𝑗
            𝑢𝑖𝑗 = 𝛿 𝑢𝑖𝑗 + 𝛽2 𝑧𝑗

$$u'_{ij} = u'_{ii}\left(\frac{u'_{ij}}{\delta} + \beta_2\,z_j\right) \tag{71}$$
Taking these observations into consideration and pulling operations on con-
stants out of the inner loop, Algorithm 9 updates 𝐔 based on a rank one change
to 𝐀.
If 𝐔 is unit upper triangular and 𝐋′ is lower triangular, a similar algorithm is
derived from the observation that 𝑙′𝑖𝑗 of 𝐋′ and the elements 𝑙𝑖𝑗, 𝑑𝑗 of the 𝐋𝐃𝐔
decomposition are related as follows.

$$l'_{ij} = l_{ij}\,d_j \tag{72}$$

The resulting algorithm deviates from Algorithm 8 in a manner that parallels
Algorithm 9.

6.3 Additional Considerations


The algorithms presented in Sections 6.1 and 6.2 are based on the work of
Bennett (1965) [9]. The nomenclature is similar to that of Gill, Golub, Murray,
and Sanders (1974) [10]. These citations describe procedures for updating the
factors of an 𝐋𝐃𝐔 decomposition.
The procedure described by Bennett (1965) [9] is more general than the
algorithms described in this section in that it applies to rank 𝑚 changes to 𝐀.
However, decomposing a rank 𝑚 change into 𝑚 rank one changes and applying
the current algorithms has the same complexity as Bennett's process and saves a
little array space. Gill, Golub, Murray, and Sanders (1974) [10] state that Bennett's
algorithm is theoretically unstable unless 𝐋 = 𝐔𝐓 and 𝐲 = 𝐳. In practice,


Bennett's algorithm has proven to be stable for many physical problems with
reasonable values of 𝛼, 𝐲, and 𝐳. The algorithm rarely exhibits instability when it is
applied to diagonally dominant matrices where pivoting is not required. Gill,
et al. (1974) [10] describe alternate algorithms for situations where stability
problems arise.
Hager (1989) [11] provides a good overview of approaches to the problem of
updating the inverse of a matrix and describes practical areas in which the prob-
lem arises. Chan and Brandwajn (1986) [12] examine applications in network
analysis.

7 Symmetric Matrices
Recall that an 𝑛 × 𝑛 symmetric matrix 𝐀 is its own transpose

𝐀 = 𝐀𝐓
This being the case, the elements of 𝐀 are described by the following relationship

𝑎𝑖𝑗 = 𝑎𝑗𝑖 , for all 𝑖, 𝑗

7.1 LDU Decomposition of Symmetric Matrices


If 𝐀 is symmetric, its 𝐋𝐃𝐔 decomposition is symmetric, i.e.

𝐋 = 𝐔𝐓 (73)
and

𝐔 = 𝐋𝐓 (74)
For this reason, the 𝐋𝐃𝐔 decomposition of a symmetric matrix is sometimes
referred to as an 𝐋𝐃𝐋𝐓 decomposition. The elements 𝐋 and 𝐔 of the LDU
decomposition of a symmetric matrix are related as follows.

𝑙𝑖𝑗 = 𝑢𝑗𝑖 , where 𝑖 ≠ 𝑗 (75)

7.2 LU Decomposition of Symmetric Matrices


Given the symmetric structure of the 𝐋𝐃𝐔 factors of a symmetric matrix (see
Section 7.1) and the common use of 𝐋𝐔 factorization in the analysis of linear
systems, it is constructive to develop expressions that relate an explicit 𝐋𝐔 de-
composition to an implicit 𝐋𝐃𝐔 factorization. In subsequent sections of this


document, they will prove useful in deriving symmetric variants of the algo-
rithms discussed in Sections 4 and 5.
In other words, the symmetric factorization algorithms discussed in this doc-
ument assume an 𝐋𝐔 decomposition exists (or is to be computed) such that

• Its existing (explicit) factorization (either 𝐋 or 𝐔) is triangular, and


• The desired (implicit) factorization is unit triangular.

This implies that algorithms which deal with an explicit set of lower triangular
factors, call them 𝐋′, will associate the factors of an implicit 𝐋𝐃𝐔 decomposition
as follows

$$\mathbf{L}' = \mathbf{L}\mathbf{D} \tag{76}$$

or

$$\mathbf{L} = \mathbf{L}'\mathbf{D}^{-1}$$

Substituting for 𝐋 based on Equation 73 yields

$$\mathbf{U}^{\mathbf{T}} = \mathbf{L}'\mathbf{D}^{-1} \tag{77}$$

Recalling that the inverse of a diagonal matrix is the arithmetic inverse of each
element and taking the product yields

$$u_{ji} = \frac{l'_{ij}}{d_{jj}}$$

Since $d_{jj} = l'_{jj}$, this is equivalent to

$$u_{ij} = \frac{l'_{ji}}{l'_{ii}} \tag{78}$$

In a similar vein, algorithms that deal with an explicit set of upper triangular
factors, call them 𝐔′, will associate the factors of an 𝐋𝐃𝐔 decomposition as follows.

$$\mathbf{U}' = \mathbf{D}\mathbf{U} \tag{79}$$

This association yields the following relationship between the explicit factors 𝐔′
and the implicit factors 𝐋.


$$l_{ij} = \frac{u'_{ji}}{u'_{jj}} \tag{80}$$

These observations show that it is only necessary to compute and store one
explicit triangular factor (𝐋′ or 𝐔′) during the 𝐋𝐔 factorization of a symmetric
matrix. This halves the arithmetic required during the factorization procedure.
However, symmetry does not reduce the work required during forward and
backward substitution.

7.3 Symmetric Matrix Data Structures


Recognizing the special character of symmetric matrices can save time and storage
during the solution of linear systems. More specifically, a dense matrix
requires storage for 𝑛² elements. A symmetric matrix can be stored in about
half the space, (𝑛² + 𝑛)/2 elements. Only the upper (or lower) triangular portion of
𝐀 has to be explicitly stored. The implicit portion of 𝐀 can be retrieved using
the relationships of Section 7.2. An efficient data structure for storing dense,
symmetric matrices is a simple linear array. If the upper triangular portion of 𝐀
is retained, the array is organized in the following manner.

𝐀 = (𝑎11, 𝑎12, ⋯, 𝑎1𝑛, 𝑎22, ⋯, 𝑎2𝑛, ⋯, 𝑎𝑛𝑛)   (81)


The element 𝑎𝑖𝑗 is retrieved from the linear array by the following indexing rule.

𝑎𝑖𝑗 = 𝐚[(𝑖 − 1)(𝑛) − (𝑖 − 1)𝑖/2 + 𝑗 ] (82)


If array and matrix indexing is zero based (as in the C programming language),
the subscripting rule becomes

𝑎𝑖𝑗 = 𝐚[𝑖𝑛 − 𝑖(𝑖 + 1)/2 + 𝑗]   (83)


If the lower triangular portion of 𝐀 is retained, the linear array is organized as
follows.

𝐀 = (𝑎11 , 𝑎21 , 𝑎22 , 𝑎31 , ⋯ , 𝑎𝑛1 , 𝑎𝑛2 , ⋯ , 𝑎𝑛𝑛 ) (84)


The element 𝑎𝑖𝑗 is retrieved from the linear array by the following indexing rule.

𝑎𝑖𝑗 = 𝐚[𝑖(𝑖 − 1)/2 + 𝑗 ] (85)


If array and matrix subscripts are zero based, Equation 85 becomes

𝑎𝑖𝑗 = 𝐚[𝑖(𝑖 + 1)/2 + 𝑗 ] (86)


Algorithm 10: Doolittle's Method - Symmetric Implementation
    for 𝑖 = 1, ⋯ , 𝑛
        for 𝑗 = 1, ⋯ , 𝑖 − 1
            𝑤𝑗 = 𝑎𝑗𝑖 / 𝑎𝑗𝑗
        for 𝑗 = 𝑖, ⋯ , 𝑛
            𝛼 = 𝑎𝑖𝑗
            for 𝑝 = 1, ⋯ , 𝑖 − 1
                𝛼 = 𝛼 − 𝑤𝑝 𝑎𝑝𝑗
            𝑎𝑖𝑗 = 𝛼

You will observe that the dimension of 𝐀 does not enter the indexing calculation
when its lower triangular portion is retained.
The indexing equations are implemented most efficiently by replacing division
by two with a right shift.
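The zero based rules of Equation 83 and Equation 86 might be packaged as small C helpers, with the division by two written as a right shift as suggested above (the function names are illustrative):

    #include <stddef.h>

    /* Index of a_ij (j >= i) when the upper triangle is packed row by row,
     * zero based: in - i(i+1)/2 + j.                                       */
    static inline size_t upper_index(size_t i, size_t j, size_t n)
    {
        return i*n - ((i*(i + 1)) >> 1) + j;
    }

    /* Index of a_ij (j <= i) when the lower triangle is packed row by row,
     * zero based: i(i+1)/2 + j.                                            */
    static inline size_t lower_index(size_t i, size_t j)
    {
        return ((i*(i + 1)) >> 1) + j;
    }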

7.4 Doolittle’s Method for Symmetric Matrices


If 𝐀 is a symmetric 𝑛 × 𝑛 matrix, Algorithm 10 computes – one row at a time –
the upper triangular matrix 𝐔 that results from a Doolittle decomposition. The
upper triangular portion of 𝐀 is overwritten by 𝐔.
The algorithm uses a working vector 𝐰 of length 𝑛 to construct the relevant
portion of row 𝑖 of 𝐋 at each stage of the factorization. Elements from 𝐋 are
derived using Equation 80.
If the upper triangular portion of 𝐀 is stored in a linear array, Algorithm 11
results in the same factorization (assuming zero based subscripting). The upper
triangular portion of 𝐀 is again overwritten by 𝐔.
For the general implementation of Doolittle's method, see Section 4.2.

7.5 Crout’s Method for Symmetric Matrices


If 𝐀 is a symmetric 𝑛 × 𝑛 matrix, Algorithm 12 computes – one column at a time
– the lower triangular matrix 𝐋 that results from a Crout decomposition. The
lower triangular portion of 𝐀 is overwritten by 𝐋.
The algorithm uses a working vector 𝐰 of length 𝑛 to construct the relevant
portion of column 𝑗 of 𝐔 at each stage of the factorization. Elements from
𝐔 are derived using Equation 78.


Algorithm 11: Doolittle's Method - Symmetric, Array Based
    for 𝑖 = 0, ⋯ , 𝑛 − 1
        for 𝑗 = 0, ⋯ , 𝑖 − 1
            𝐰[j] = 𝐚[jn-j(j+1)/2+i] / 𝐚[jn-j(j+1)/2+j]
        for 𝑗 = 𝑖, ⋯ , 𝑛 − 1
            𝛼 = 𝐚[in-i(i+1)/2+j]
            for 𝑝 = 0, ⋯ , 𝑖 − 1
                𝛼 = 𝛼 − 𝐰[p] ⋅ 𝐚[pn-p(p+1)/2+j]
            𝐚[in-i(i+1)/2+j] = 𝛼

Algorithm 12: Crout's Method - Symmetric Implementation
    for 𝑗 = 1, ⋯ , 𝑛
        for 𝑖 = 1, ⋯ , 𝑗 − 1
            𝑤𝑖 = 𝑎𝑗𝑖 / 𝑎𝑖𝑖
        for 𝑖 = 𝑗, ⋯ , 𝑛
            𝛼 = 𝑎𝑖𝑗
            for 𝑝 = 1, ⋯ , 𝑗 − 1
                𝛼 = 𝛼 − 𝑎𝑖𝑝 𝑤𝑝
            𝑎𝑖𝑗 = 𝛼

If the lower triangular portion of 𝐀 is stored in a linear array, Algorithm 13


results in the same factorization (assuming zero based subscripting).
The algorithm overwrites 𝐀 with 𝐋.
For the general implementation of Crout’s method, see Section 4.3.

7.6 Forward Substitution for Symmetric Systems


Symmetry reduces the amount of work required to decompose symmetric systems
into triangular factors. It does not reduce the work required to actually
solve the system from an existing triangular factorization. Implementing forward
substitution for a symmetric decomposition boils down to making sure
implicit data (i.e. the portion of the symmetric factorization that is not physically
stored) is correctly derived from the explicitly stored data. See Section 7.2
for a discussion of implicit data in symmetric LU decompositions.


Algorithm 13: Crout’s Method - Symmetric, Array Based


for 𝑗 = 0, ⋯ , 𝑛 − 1
    for 𝑖 = 0, ⋯ , 𝑗 − 1
        𝐰[i] = 𝐚[j(j+1)/2+i] / 𝐚[j(j+1)/2+j]
    for 𝑖 = 𝑗, ⋯ , 𝑛 − 1
        𝛼 = 𝐚[i(i+1)/2+j]
        for 𝑝 = 0, ⋯ , 𝑗 − 1
            𝛼 = 𝛼 − 𝐚[i(i+1)/2+p] ⋅ 𝐰[p]
        𝐚[i(i+1)/2+j] = 𝛼

7.6.1 Forward Substitution Using Lower Triangular Factors


Forward substitution solves lower triangular systems. When 𝐋 is available, the
symmetric and asymmetric solution algorithms are identical. See Section 5.1
for the general implementation of forward substitution. If 𝐋 is stored in a linear
array, use Equation 85 or Equation 86 for indexing.

7.6.2 Forward Substitution Using Upper Triangular Factors


The case in which 𝐔 is the explicit factorization is examined more closely. If 𝐔
is an 𝑛 × 𝑛 matrix containing the upper triangular factors of a symmetric matrix,
then 𝐋 is unit lower triangular and obtained from 𝐔 via Equation 80 with 𝑙𝑗𝑗
being one. Substituting Equation 80 into Equation 62 yields:
𝑦𝑖 = 𝑏𝑖 − ∑_{𝑗=1}^{𝑖−1} (𝑢𝑗𝑖 / 𝑢𝑗𝑗) 𝑦𝑗 ,  where 1 ≤ 𝑖 ≤ 𝑛    (87)

Algorithm 14 implements Equation 87, i.e. the inner product formulation of


forward substitution for symmetric systems whose upper triangular factors are
available.
If 𝐔 is stored in a linear array with zero based indexing, the inner product
formulation of forward substitution is implemented by Algorithm 15.
Algorithm 16 implements the outer product formulation of forward substi-
tution for symmetric systems whose upper triangular factors are available.
This differs from the asymmetric implementation in that

𝑙𝑖𝑖 = 1 and 𝑙𝑘𝑖 = 𝑢𝑖𝑘 / 𝑢𝑖𝑖 ,  where 𝑖 ≠ 𝑘    (88)


Algorithm 14: Symmetric Forward Substitution via Upper Triangular Factors


for 𝑖 = 1, ⋯ , 𝑛
    𝛼 = 𝑏𝑖
    for 𝑗 = 1, ⋯ , 𝑖 − 1
        𝛼 = 𝛼 − (𝑢𝑗𝑖 / 𝑢𝑗𝑗) 𝑦𝑗
    𝑦𝑖 = 𝛼

Algorithm 15: Symmetric Forward Substitution using 𝐔 with Array Storage


for 𝑖 = 0, ⋯ , 𝑛 − 1
    𝛼 = 𝐛[i]
    for 𝑗 = 0, ⋯ , 𝑖 − 1
        𝛼 = 𝛼 − (𝐮[jn-j(j+1)/2+i] / 𝐮[jn-j(j+1)/2+j]) ⋅ 𝐲[j]
    𝐲[i] = 𝛼

Therefore, the initial division by 𝑙𝑖𝑖 is omitted and the division by 𝑢𝑖𝑖 is pulled
out of the 𝑘 loop. The outer product formulation of forward substitution where
𝐔 is stored in a linear array with zero based indexing is realized by Algorithm
17.
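A minimal Python sketch of Algorithm 14 (the inner product form of Equation 87) may make the implicit handling of 𝐋 concrete. It is only an illustration, assuming 𝐔 is a full NumPy array holding the upper triangular factors; the function name is arbitrary.

    import numpy as np

    def forward_substitution_upper(U, b):
        # Solve L y = b where L is implicit in U: l_ij = u_ji / u_jj (Equation 80).
        n = len(b)
        y = np.zeros(n)
        for i in range(n):
            alpha = b[i]
            for j in range(i):
                alpha -= (U[j, i] / U[j, j]) * y[j]
            y[i] = alpha
        return y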

7.7 Backward Substitution for Symmetric Systems


As stated previously, symmetry reduces the amount of work required to decom-
pose symmetric systems into triangular factors. It does not reduce the work
required to actually solve the system from an existing triangular factorization.
Implementing backward substitution for a symmetric decomposition reduces

Algorithm 16: Symmetric Forward Substitution using 𝐔, Outer Product


for 𝑖 = 1, ⋯ , 𝑛
    𝑦𝑖 = 𝑏𝑖
    𝛼 = 𝑦𝑖 / 𝑢𝑖𝑖
    for 𝑘 = 𝑖 + 1, ⋯ , 𝑛
        𝑏𝑘 = 𝑏𝑘 − 𝛼 𝑢𝑖𝑘


Algorithm 17: Symmetric Forward Substitution using 𝐔, Outer Product, Array


for 𝑖 = 0, ⋯ , 𝑛 − 1
    𝐲[i] = 𝐛[i]
    𝛼 = 𝐲[i] / 𝐮[in-i(i+1)/2+i]
    for 𝑘 = 𝑖 + 1, ⋯ , 𝑛 − 1
        𝐛[k] = 𝐛[k] − 𝛼 ⋅ 𝐮[in-i(i+1)/2+k]

to making sure that implicit data (i.e. the portion of the symmetric factorization
that is not physically stored) is correctly derived from the explicitly
stored data. See Section 7.2 for a discussion of implicit data in symmetric LU
decompositions.

7.7.1 Back Substitution Using Upper Triangular Factors


Backward substitution solves upper triangular systems. When 𝐔 is available,
the symmetric and asymmetric solution algorithms are identical. See Section
5.2 for the general implementation of backward substitution. If 𝐔 is stored in
a linear array, use Equation 82 or Equation 83 for indexing.

7.7.2 Back Substitution Using Lower Triangular Factors


The case in which 𝐋 is the explicit factorization merits further attention. If 𝐋 is
an 𝑛 × 𝑛 matrix containing the lower triangular factors of a symmetric matrix,
then 𝐔 is unit upper triangular and obtained from 𝐋 via Equation 78 with 𝑢𝑖𝑖
being one. Substituting Equation 78 into Equation 63 yields:
𝑥𝑖 = 𝑦𝑖 − ∑_{𝑗=𝑖+1}^{𝑛} (𝑙𝑗𝑖 / 𝑙𝑗𝑗) 𝑥𝑗 ,  where 𝑖 = 𝑛, 𝑛 − 1, ⋯ , 1    (89)

Algorithm 18 implements this inner product formulation of backward substi-


tution for symmetric systems whose lower triangular factors are available.
If 𝐋 is stored in a linear array with zero based indexing, the inner product
formulation of back substitution is stated in Algorithm 19.

7.8 Symmetric Factor Update


If the 𝐋𝐔 decomposition of the 𝑛 × 𝑛 symmetric matrix 𝐀 exists and the factor-
ization of a related matrix


Algorithm 18: Symmetric Back Substitution using Lower Triangular Factors


for 𝑖 = 𝑛, ⋯ , 1
    𝛼 = 𝑦𝑖
    for 𝑗 = 𝑖 + 1, ⋯ , 𝑛
        𝛼 = 𝛼 − (𝑙𝑗𝑖 / 𝑙𝑗𝑗) 𝑥𝑗
    𝑥𝑖 = 𝛼

Algorithm 19: Symmetric Backward Substitution using 𝐋 with Array Storage


for 𝑖 = 𝑛 − 1, ⋯ , 0
    𝛼 = 𝐲[i]
    for 𝑗 = 𝑖 + 1, ⋯ , 𝑛 − 1
        𝛼 = 𝛼 − (𝐥[j(j+1)/2+i] / 𝐥[j(j+1)/2+j]) ⋅ 𝐱[j]
    𝐱[i] = 𝛼

𝐀 = 𝐀 + 𝚫𝐀
is desired, factor update is often the procedure of choice.
Section 6 examines factor update techniques for dense, asymmetric matrices.
e current section examines techniques that exploit computational efficiencies
introduced by symmetry. Symmetry reduces the work required to update the
factorization of 𝐀 by half, just as it reduces the work required to decompose 𝐀
in the rst place.
More speci cally, the current section examines procedures for updating the
factors of 𝐀 following a symmetric rank one modi cation

𝐀 = 𝐀 + 𝛼𝐲𝐲𝐓
where 𝛼 is a scalar and 𝐲 is an 𝑛 vector.

7.8.1 Symmetric LDU Factor Update


Algorithm C1 of Gill, Golub, et al. (1974) [10] updates the factors 𝐋 and 𝐃 of
an 𝐋𝐃𝐔 decomposition of 𝐀. The algorithm assumes that the upper triangular
factors 𝐔 are implicit. Algorithm 20 mimics Algorithm C1.


Algorithm 20: Symmetric LDU Factor Update


for 𝑗 = 1, ⋯ , 𝑛
    𝛿 = 𝑑𝑗
    𝑝 = 𝑦𝑗
    𝑑𝑗 = 𝑑𝑗 + 𝛼𝑝²
    𝛽 = 𝛼𝑝 / 𝑑𝑗
    𝛼 = 𝛼𝛿 / 𝑑𝑗
    for 𝑖 = 𝑗 + 1, ⋯ , 𝑛
        𝑦𝑖 = 𝑦𝑖 − 𝑝 𝑙𝑖𝑗
        𝑙𝑖𝑗 = 𝑙𝑖𝑗 + 𝛽 𝑦𝑖

The scalar 𝛼 and the 𝐲 vector are destroyed by this procedure. The factors of
𝐀 are overwritten by their new values.
See Section 6.1 for a discussion of updating asymmetric LDU factorizations
following rank one changes to a matrix.
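The following Python sketch mirrors Algorithm 20 (it is an illustration only; the function name is arbitrary). It assumes 𝐋 is a full NumPy array with an implicit unit diagonal and 𝐝 holds the diagonal of 𝐃; as noted above, 𝛼 and 𝐲 are consumed by the procedure, so the sketch works on a local copy of 𝐲.

    import numpy as np

    def ldu_rank_one_update(L, d, alpha, y):
        # Update L and d in place so that L D Lᵀ becomes A + alpha * y yᵀ (Algorithm 20).
        n = len(d)
        y = np.array(y, dtype=float)        # local copy; the algorithm destroys y
        for j in range(n):
            delta = d[j]
            p = y[j]
            d[j] = d[j] + alpha * p * p
            beta = alpha * p / d[j]
            alpha = alpha * delta / d[j]
            for i in range(j + 1, n):
                y[i] -= p * L[i, j]
                L[i, j] += beta * y[i]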

7.8.2 Symmetric LU Factor Update


If the 𝐋𝐔 decomposition of the symmetric matrix 𝐀 exists and 𝐔 is stored ex-
plicitly, the recurrence relations of the inner loop of Algorithm 20 must change.
Following a train of thought similar to the derivation of Algorithm 9 (see Sec-
tion 6.2 for details) results in Algorithm 21 which updates 𝐔 based on a rank
one change to 𝐀.
If 𝐔 is maintained in a zero-based linear array, Algorithm 21 changes in the
normal manner, that is

1. The double subscript notation is replaced by the indexing rule defined in
Equation 83.
2. The outer loop counter 𝑖 ranges from zero to 𝑛 − 1.
3. The inner loop counter 𝑗 ranges from 𝑖 + 1 to 𝑛 − 1.

The outer loops of the symmetric factor update algorithms do not have to
begin at one unless 𝐲 is full. If 𝐲 has any leading zeros, the initial value of 𝑖
(or 𝑗 in Algorithm 20) should be the index of 𝑦𝑖 , the first nonzero element of 𝐲.
If there is no a priori information about the structure of 𝐲 but there is a high


Algorithm 21: Symmetric LU Factor Update


for 𝑖 = 1, ⋯ , 𝑛
    𝛿 = 𝑢𝑖𝑖 , 𝑝 = 𝑦𝑖
    𝑢𝑖𝑖 = 𝑢𝑖𝑖 + 𝛼𝑝²
    𝛽 = 𝛼𝑝
    𝑝 = 𝑝 / 𝛿
    𝛿 = 𝑢𝑖𝑖 / 𝛿
    𝛼 = 𝛼 / 𝛿
    for 𝑗 = 𝑖 + 1, ⋯ , 𝑛
        𝑦𝑗 = 𝑦𝑗 − 𝑝 𝑢𝑖𝑗
        𝑢𝑖𝑗 = 𝛿 𝑢𝑖𝑗 + 𝛽 𝑦𝑗

probability of leading zeros, testing 𝑦𝑖 for zero at the beginning of the loop might
save a lot of work. However, you must remember to suspend the test as soon as
the first nonzero value of 𝑦𝑖 is encountered.
For a fuller discussion of the derivation and implementation of LU factor
update, see Section 6.2.

8 Sparse Matrices
The preceding sections examined dense matrix algorithms for solving systems
of linear equations. It was seen that significant savings in storage and computation
are achieved by exploiting the structure of symmetric matrices. An even
more dramatic performance gain is possible by exploiting the sparsity intrinsic
to many classes of large systems. Sparse matrix algorithms are based on the
simple concept of avoiding the unnecessary storage of zeros and unnecessary
arithmetic associated with zeros (such as multiplication by zero or addition of
zero). Recognizing and taking advantage of sparsity often permits the solution
of problems that are otherwise computationally intractable. Practical examples
provided by Tinney and Hart (1967) [13] suggest that in the analysis of large
power system networks the use of sparse matrix algorithms makes both the storage
and computational requirements approximately linear with respect to the
size of the network. In other words, data storage is reduced from an 𝑂(𝑛²)
problem to an 𝑂(𝑛) problem and computational complexity diminishes from
𝑂(𝑛³) to 𝑂(𝑛).


8.1 Sparse Matrix Methodology


Any matrix with a significant number of zero-valued elements is referred to as a
sparse matrix. The meaning of “significant” in the preceding definition is rather
vague. It is pinned down (in a circular way) by defining a sparse matrix to be
a matrix with enough zeros to benefit from the use of sparsity techniques. The
intent of this definition is to emphasize that there is a computational overhead
required by sparse matrix procedures. If the degree of sparsity in a matrix compensates
for the algorithmic overhead, sparsity techniques should be employed.
Otherwise, dense matrix algorithms should be utilized. This argument simply
restates a fundamental rule of numerical analysis: a priori information concerning
the nature of a problem tends to result in more efficient solution techniques.
The next few sections will explore the application of sparsity techniques to
the solution of large systems of linear equations. The standard approach is to
break the solution into three phases:

1. Analyze. Determine an ordering of the equations such that the 𝐋𝐔 decomposition
will retain as much sparsity as possible. This problem has been
shown to be NP-complete (i.e. the optimal solution can not be efficiently
determined). However, a number of satisfactory heuristics are available.
The analysis phase of the solution usually produces a complete definition
of the sparsity pattern that will result when the 𝐋𝐔 decomposition is computed.

2. Factor. Compute the 𝐋𝐔 decomposition.

3. Solve. Use the 𝐋𝐔 decomposition to compute a solution to the system of
equations (i.e. perform forward and backward substitution).

The degree to which these phases are distinct depends on the implementation.

8.2 Abstract Data Types for Sparse Matrices


The traditional implementation of sparse matrix techniques in engineering and
scientific analysis is heavily reminiscent of its roots in the static data structures
of FORTRAN. Descriptions of these data structures are provided by Duff, Erisman,
and Reid (1986) [3], George and Liu (1981) [7], Eisenstat, Schultz, and
Sherman (1976) [14], and Tinney and Hart (1967) [13] among others. The
current implementation takes a different tack. Sparse matrix algorithms are described
using an abstract data type paradigm. That is, data sets and operators
are specified, but the actual data structures used to implement them are left undefined.
Any data structure that efficiently satisfies the constraints imposed in
this section is suited for the job.
All signals emitted by the operators defined in this section are used to navigate
through data, not to indicate errors. Error processing is intentionally omitted
from the algorithms appearing in this document. The intent is to avoid
clutter that obscures the nature of the algorithms.

8.2.1 Sparse Matrix Proper


A sparse matrix, 𝐀, is stored in a dynamic data structure that locates an element
𝑎𝑖𝑗 based on its row index 𝑖 and column index 𝑗 . The following operations are
supported on 𝐀:

• Insert adds an arbitrary element 𝑎𝑖𝑗 to 𝐀. If 𝑎𝑖𝑗 does not already exist,
insert signals a successful insertion.

• Get retrieves an arbitrary element 𝑎𝑖𝑗 from 𝐀. When 𝑎𝑖𝑗 is an element of


𝐀, get signals a successful lookup.

• Scan permits sequential access to the nonzero entries of row 𝑖 of 𝐀. Row
scans are bounded. More specifically, a row scan finds all nonzero entries
𝑎𝑖𝑗 in row 𝑖 of 𝐀 such that 𝑗𝑚𝑖𝑛 ≤ 𝑗 ≤ 𝑗𝑚𝑎𝑥 . When scan finds 𝑎𝑖𝑗 its column
index 𝑗 is returned. When the scan has exhausted all entries in its range,
a finished signal is emitted.
A scan has two support operations: push and pop. A push suspends the
scan at its current position and a pop resumes a suspended scan. The push
and pop operations permit scans to be nested.
• Put updates the value of an arbitrary element 𝑎𝑖𝑗 of 𝐀.

The algorithms assume that operations that read the data structure (get and
scan) make the designated element 𝑎𝑖𝑗 of 𝐀 available in a buffer (this buffer is
usually denoted by the symbol 𝑎). Operations that update 𝑎𝑖𝑗 (insert and
put) do so based on the current contents of the communication buffer 𝑎.
Section 9.1 examines one possible realization of the sparse matrix data type.
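Before that, a toy in-core realization in Python may help fix ideas (it is only a sketch, not the database-backed structure of Section 9; all names are arbitrary). Rows are held as dictionaries keyed by column index, and scan is a bounded generator, so nested scans suspend and resume implicitly.

    class SparseMatrix:
        def __init__(self):
            self.rows = {}                          # row index -> {column index: value}

        def insert(self, i, j, value):
            row = self.rows.setdefault(i, {})
            is_new = j not in row                   # True signals a successful insertion
            row[j] = value
            return is_new

        def get(self, i, j):
            return self.rows.get(i, {}).get(j)      # None signals a failed lookup

        def put(self, i, j, value):
            self.rows[i][j] = value                 # update an existing element

        def scan(self, i, jmin, jmax):
            for j in sorted(self.rows.get(i, {})):  # bounded row scan
                if jmin <= j <= jmax:
                    yield j, self.rows[i][j]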

8.2.2 Adjacency List


An adjacency list, 𝐴, is a data type for representing adjacency relationships of
the sparse graph 𝐺 = (𝑉, 𝐸). An adjacency list is typically stored in a dynamic
data structure that identifies the edge from vertex 𝑖 to vertex 𝑗 as an ordered pair


of vertex labels (𝑖, 𝑗). Descriptive information is usually associated with each
edge.
Since both adjacency lists and sparse matrices represent sparse networks, it
should come as no surprise that they require a similar set of operations. More
specifically, the following operations are supported on an adjacency list 𝐴:

• Insert adds an arbitrary edge (𝑖, 𝑗) to 𝐴. If edge (𝑖, 𝑗) is not already in


the list, insert signals a successful insertion.
• Get retrieves an arbitrary edge (𝑖, 𝑗) from 𝐴. When edge (𝑖, 𝑗) is in 𝐴, get
signals a successful lookup.
• Scan permits sequential access to all edges incident upon vertex 𝑖. Vertex
scans are bounded. More specifically, a vertex scan finds all edges (𝑖, 𝑗)
such that 𝑗𝑚𝑖𝑛 ≤ 𝑗 ≤ 𝑗𝑚𝑎𝑥 . When scan finds edge (𝑖, 𝑗), it returns 𝑗 .
When a vertex scan has exhausted all entries in its range, a finished signal
is emitted.
A scan has two support operations: push and pop. A push suspends
the scan at its current position. A pop resumes a suspended scan. The
push and pop operations permit scans to be nested.

• Put updates the information associated with an arbitrary edge (𝑖, 𝑗) in 𝐴.

The algorithms assume that read operations (get and scan) make edge
information available in a buffer (this buffer is usually denoted by the symbol
𝑎). Update operations (insert and put) modify the description of an edge
based on the current contents of the communication buffer.
Implementation of adjacency lists is examined in detail in Graph Algorithms¹.

8.2.3 Reduced Graph


A reduced graph, 𝐺 = (𝑉  , 𝐸  ), is a data structure that supports the systematic
elimination of all vertices from the graph 𝐺 = (𝑉, 𝐸). e vertices of the re-
duced graph are denoted as 𝑉  (𝐺 ) and its edges as 𝐸  (𝐺 ). A crucial attribute
of the reduced graph is efficient identi cation of the vertex in 𝑉  (𝐺 ) with the
minimum degree.
A reduced graph supports the following operations:

• Increase_degree increases the degree of vertex 𝑣 in 𝑉  (𝐺 ) by one.


1
https://vismor.com/documents/network_analysis/graph_algorithms/


• Decrease_degree decreases the degree of vertex 𝑣 in 𝑉′(𝐺′) by one.

• In_graph tests to see whether vertex 𝑣 is in 𝑉′(𝐺′).
• Minimum_degree finds the vertex 𝑣 in 𝑉′(𝐺′) with the smallest degree.
• Remove excises vertex 𝑣 from 𝑉′(𝐺′).

Implementation of reduced graph modeling is examined in detail in Graph
Algorithms².

8.2.4 List
A simple list 𝐿 is an ordered set of elements. If the set {𝑙1 , ⋯ , 𝑙𝑖 , 𝑙𝑖+1 , ⋯ , 𝑙𝑛 }
represents 𝐿, then the list contains 𝑛 elements. Element 𝑙1 is the first item on
the list and 𝑙𝑛 is the last item on the list. Element 𝑙𝑖 precedes 𝑙𝑖+1 and element
𝑙𝑖+1 follows 𝑙𝑖 . Element 𝑙𝑖 is at position 𝑖 in 𝐿. Descriptive information may
accompany each item on a list. Lists associated with matrix algorithms support
the following operations:

• Link adds an element 𝑥 to a list at position 𝑖. Inserting element 𝑥 into
the list at position 𝑖 results in an updated list: {𝑙1 , ⋯ , 𝑙𝑖−1 , 𝑥, 𝑙𝑖 , 𝑙𝑖+1 , ⋯ , 𝑙𝑛 }.
An insertion at position 𝑒𝑜𝑙 appends 𝑥 to the end of the list.
• Unlink removes the element at position 𝑖 from the list. Deleting element
𝑖 results in the list {𝑙1 , ⋯ , 𝑙𝑖−1 , 𝑙𝑖+1 , ⋯ , 𝑙𝑛 }.

• Find looks for an element on the list and returns its position. If the
element is not a member of the list, 𝑒𝑜𝑙 is returned.
• First returns the position of the first item on the list. When the list is
empty, 𝑒𝑜𝑙 is returned.
• Next returns position 𝑖 + 1 on the list if position 𝑖 is provided. If 𝑙𝑖 is the
last item on the list, 𝑒𝑜𝑙 is returned.
• Prev returns position 𝑖 − 1 on the list if position 𝑖 is provided. If 𝑖 is one,
𝑒𝑜𝑙 is returned.

A linked list refers to a list implementation that does not require its mem-
bers to reside in contiguous storage locations. In this environment, an efficient
implementation of the prev operator dictates the use of a doubly linked list.
2
https://vismor.com/documents/network_analysis/graph_algorithms/


Communicating with a simple list is analogous to adjacency list communi-


cation. Read operations (find, first, next, and prev) make list information
available in a buffer. Update operations (link, unlink) modify the list based
on the current contents of the buffer.

8.2.5 Mapping
A mapping 𝜇 relates elements of its domain 𝑑 to elements of its range 𝑟 as follows.

𝜇(𝑑) = 𝑟
A mapping resides in a data structure that supports two operations:

• Map links an element 𝑟 in the range of 𝜇 to an arbitrary element 𝑑 in the


domain of 𝜇, i.e. sets 𝜇(𝑑) to 𝑟.
• Evaluate evaluates the mapping 𝜇 for an arbitrary element 𝑑 in its do-
main, i.e. returns 𝜇(𝑑).

8.2.6 Vector
For simplicity of exposition, a full vector is represented as a linear array. How-
ever, any data structure that lets you retrieve and update an arbitrary element 𝑏𝑖
of a vector 𝐛 based upon its index 𝑖 will suffice.

8.3 Pivoting To Preserve Sparsity


As Gaussian elimination is applied to a sparse matrix 𝐀, row operations tend to
introduce nonzero elements into 𝐋 and 𝐔 that have no counterpart in 𝐀. These
nonzero entries in 𝐋 and 𝐔 that are induced by the factorization process are
referred to as fill-ups. A fact central to sparse matrix techniques is that changes
in the pivot strategy change the number of fill-ups that occur during factorization.
This being the case, an important goal of sparse matrix algorithms is to find
a pivot strategy that minimizes the number of fill-ups during 𝐋𝐔 decomposition.
For the asymmetric case, Rose and Tarjan (1975) [15] have shown that
this minimization problem is NP-complete. For the symmetric case, no optimal
solution exists to date. Therefore, existing fill-up reduction algorithms are
heuristic in nature.


8.3.1 Markowitz Pivot Strategy


Among the most successful heuristics are those based on the work of Markowitz
(1957). A Markowitz pivot strategy involves choosing a pivot element 𝑎𝑖𝑗 which
minimizes a quantity called the Markowitz count at each elimination step. The
Markowitz count is the product

(𝑟𝑖 − 1)(𝑐𝑗 − 1)    (90)


where

𝑟𝑖 is the number of entries in row 𝑖 of the reduced matrix, and

𝑐𝑗 is the number of entries in column 𝑗 of the reduced matrix.

Stability considerations are introduced into Markowitz pivoting strategies by
defining numerical thresholds that also apply to pivot candidates. In effect,
these thresholds temper sparsity considerations to preserve numerical stability.
Typical threshold considerations require that the successful pivot candidate satisfy
the following condition:

|𝑎⁽ᵏ⁾𝑖𝑗| ≥ 𝑢 ⋅ max_{𝑙≥𝑘} |𝑎⁽ᵏ⁾𝑙𝑗|    (91)

where 𝑢 is a number falling in the range 0 < 𝑢 ≤ 1.


Duff, Erisman, and Reid (1986) [3] provide a thorough examination of piv-
oting strategies in asymmetric matrices that are based on the Markowitz crite-
rion.
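A small Python sketch may clarify how the Markowitz count and the stability threshold interact. It treats the reduced matrix as a dictionary mapping (row, column) pairs to values and simply searches every stored entry; production codes are far more selective, so this is illustrative only, and the names and the default threshold value are arbitrary.

    def markowitz_pivot(A, u=0.1):
        # A: {(i, j): value} for the reduced matrix; returns the chosen pivot position.
        rows, cols, col_max = {}, {}, {}
        for (i, j), v in A.items():
            rows[i] = rows.get(i, 0) + 1
            cols[j] = cols.get(j, 0) + 1
            col_max[j] = max(col_max.get(j, 0.0), abs(v))
        best, best_count = None, None
        for (i, j), v in A.items():
            if abs(v) < u * col_max[j]:
                continue                              # fails the threshold of Equation 91
            count = (rows[i] - 1) * (cols[j] - 1)     # Markowitz count, Equation 90
            if best_count is None or count < best_count:
                best, best_count = (i, j), count
        return best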

8.3.2 Minimum Degree Pivot Strategy


If 𝑎𝑗𝑖 is nonzero whenever 𝑎𝑖𝑗 is nonzero, matrix 𝐀 has a symmetric sparsity
structure. If 𝐀 is diagonally dominant and has a symmetric sparsity structure,
the minimum degree pivot strategy is commonly used to reduce fill-ups during
𝐋𝐔 decomposition. Minimum degree pivoting is a special case of Markowitz
pivoting that ignores the numerical values of 𝑎𝑖𝑗 and concentrates on the structure
of 𝐀.
The most straightforward motivation of minimum degree pivoting is based
on the following observations:

• Any matrix 𝐀 with symmetric sparsity structure can be represented by an


undirected, ordered graph 𝐺 = (𝑉, 𝐸).


• The effect of Gaussian elimination on the sparsity structure of 𝐀 is modeled
by the impact of eliminating vertices from 𝐺.

Vertex elimination is examined elsewhere in this series of monographs (see
the discussion of reduced graph modeling in Graph Algorithms³). At this point,
it suffices to say that a vertex 𝑣 is eliminated from the graph 𝐺 = (𝑉, 𝐸) by
1. Removing vertex 𝑣 from 𝑉(𝐺).
2. Removing all edges that were incident upon 𝑣 from 𝐸(𝐺).
3. Adding edges to 𝐸(𝐺) that connect all the vertices that were adjacent to 𝑣.
The edges that are added to 𝐺 when vertex 𝑣 is eliminated correspond to the
fill-ups that occur in 𝐀 when row 𝑣 is eliminated.
The minimum degree pivot strategy is just the order in which the following
algorithm eliminates the vertices of 𝐺.

for 𝑖 = 1, ⋯ , |𝑉|
    Choose the vertex 𝑣 from 𝑉 that has the minimum degree
    Eliminate vertex 𝑣 from 𝐺
You should recall that the degree of vertex 𝑣 is the number of vertices that are
adjacent to 𝑣. The algorithm has an arbitrary tie-breaking rule. If more than one
vertex is of minimum degree at the beginning of an elimination step, any vertex
from the group may be eliminated. Gomez and Franquelo (1988a,b) [16],
[17] examine the impact of alternate tie-breaking schemes on minimum degree
pivoting strategies.
Simply put, the minimum degree algorithm pivots on the vertex which has
the fewest neighbors at each elimination step. This heuristic appears to have
been first described by Tinney and Hart (1967) [13]. A lucid and thorough
examination of the topic is found in George and Liu (1981) [7].
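The vertex elimination loop above translates almost directly into Python. The sketch below is only an illustration (the adjacency representation and the names are arbitrary); it returns a minimum degree ordering and adds the fill edges to a working copy of the graph.

    def minimum_degree_order(adjacency):
        # adjacency: {vertex: set of neighboring vertices} for a symmetric structure.
        graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
        order = []
        while graph:
            v = min(graph, key=lambda x: len(graph[x]))   # vertex of minimum degree
            neighbors = graph.pop(v)
            for w in neighbors:
                graph[w].discard(v)
            for w in neighbors:                            # connect former neighbors;
                graph[w].update(neighbors - {w})           # these edges are the fill-ups
            order.append(v)
        return order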

8.4 Symbolic Factorization of Sparse Matrices


The goal of symbolic factorization is to define the sparsity pattern of the 𝐋𝐔
decomposition of a sparse matrix 𝐀. Recall that

𝐋𝐔 = 𝐏𝐀𝐐    (92)

where 𝐏 and 𝐐 are row and column permutations that reflect the pivot strategy
associated with the factorization process.
3
https://vismor.com/documents/network_analysis/graph_algorithms/


8.4.1 Symbolic Factorization with Minimum Degree Pivot


The goal of the current section is somewhat more specific. It describes a symbolic
factorization algorithm that simulates the decomposition of 𝐀 when a minimum
degree pivot strategy is applied. The algorithm operates on an undirected graph
𝐺 = (𝑉, 𝐸) whose vertices 𝑉 are labeled from 1 to |𝑉|. 𝐺 is defined by its adjacency
list 𝐴. The algorithm arrives at a symbolic factorization by eliminating
vertices from 𝐺. Each stage of the process creates a reduced graph 𝐺′ = (𝑉′, 𝐸′).
The vertex 𝑣 with the minimum degree in 𝐺′ is chosen as the next elimination
candidate. The algorithm describes the structure of 𝐋, 𝐔, 𝐏, and 𝐐 in the following
manner:
• The adjacency list 𝐴 is augmented to account for fill-ups that will occur
during numeric factorization.
• A list 𝐿 is created. The list orders the vertices in 𝑉 according to a minimum
degree pivot strategy. If the vertices of 𝑉 are labeled according to
their position in 𝐋𝐔, a minimum degree ordering of 𝑉 is generated.
• A mapping 𝜓 is created. The domain of 𝜓 is the initial label of vertex
𝑣. The range of 𝜓 is the minimum degree label of 𝑣. That is, 𝜓(𝑣) is the
minimum degree label of 𝑣.
• A mapping 𝜏 is created. The domain of 𝜏 is the minimum degree label
of vertex 𝑣. The range of 𝜏 is the initial label of 𝑣. That is, if 𝑣 is the
minimum degree label of a vertex, 𝜏(𝑣) is the initial label of the vertex.
The value of an element 𝑎𝑖𝑗 in the adjacency list is a binary variable indicating
whether edge (𝑖, 𝑗) is a fill-up or not. The fill-up indicator is communicated
through the buffer 𝑎 in the normal manner.
Algorithm 22 computes a symbolic factorization of 𝐺 that consists of these
four data structures.
A few observations concerning the factorization procedure:
• The reduced graph 𝐺′ is tracked in data structures designed to model vertex
elimination (see Section 8.2.3 and Graph Algorithms⁴ for implementation
details).
• The adjacency list 𝐴 is augmented to log fill-ups but it is never diminished
to reflect graph reduction.
• If the algorithm is efficiently implemented, 𝜓 , 𝜏 , and 𝐿 can occupy space
that is vacated as 𝐺′ shrinks.
4
https://vismor.com/documents/network_analysis/graph_algorithms/


Algorithm 22: Symbolic Factorization of a Sparse Matrix


for 𝑖 = 1, ⋯ , |𝑉|
    𝑣 = minimum_degree( 𝑉′ )
    while [ 𝑤 = scan( 𝐴, row 𝑣, 1, |𝑉| ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        if in_graph( 𝑉′, 𝑤 )
            decrease_degree( 𝑉′, 𝑤 )
            push
            while [ 𝑧 = scan( 𝐴, row 𝑣, 𝑤 + 1, |𝑉| ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
                𝑎 = fill-up
                if insert( 𝐴, 𝑧, 𝑤 )
                    increase_degree( 𝑉′, 𝑧 )
                    insert( 𝐴, 𝑤, 𝑧 )
                    increase_degree( 𝑉′, 𝑤 )
            pop
    remove( 𝑉′, 𝑣 )
    map( 𝜓, 𝑣, 𝑖 )
    map( 𝜏, 𝑖, 𝑣 )
    link( 𝐿, 𝑣 )

8.4.2 Computational Complexity of Symbolic Factorization


The complexity of the symbolic factorization is determined by the size of the adjacency
list. One iteration of the main loop requires (𝑒² − 𝑒)/2 adjacency list accesses,
where 𝑒 is the number of edges incident upon vertex 𝑣. If all vertices in 𝑉 have
the same number of incident edges, the total operation count is |𝑉|(𝑒² − 𝑒)/2. In this
case, the time complexity of symbolic factorization is 𝑂(|𝑉|𝑒²) if all operations
on the data structures are 𝑂(1).
Consider the following comparison of symbolic factorization operation counts
to those associated with the 𝐋𝐔 decomposition of a dense matrix. If a graph
has 100 vertices each of which has 5 incident edges, computing the symbolic
factorization requires 1,000 operations. Computing the dense factorization requires
661,650 operations. If the number of vertices increases to 1,000 while
the incident edges per vertex remains constant (the typical scenario in network
analysis problems associated with electric power systems), symbolic factorization
requires 1 × 10⁴ operations and dense matrix decomposition requires 6 × 10⁸ operations.


8.5 Creating 𝐏𝐀𝐏𝐓 from a Symbolic Factorization


Symbolic factorization determines the structure of 𝐋 and 𝐔 when the product
𝐏𝐀𝐐 is decomposed. The current section examines a procedure for creating
this product directly when 𝐀 does not exist. Pivoting down the diagonal of the
resulting matrix creates the 𝐋𝐔 decomposition predicted by symbolic factorization.
More specifically, the current section describes an algorithm which acts on
the adjacency list 𝐴 of an undirected graph 𝐺 = (𝑉, 𝐸) to create a matrix with
symmetric sparsity pattern that represents the product

𝐏𝐀𝐏𝐓
where

𝐀 is the matrix that corresponds to the graph 𝐺.

𝐏 is the row permutation matrix corresponding to the minimum degree labeling


of 𝑉 .

It is assumed that 𝐴 has been expanded to accommodate fill-ups that will occur
during 𝐋𝐔 decomposition and that the permutation matrix 𝐏 is defined by

• A list 𝐿 which traverses 𝑉 in minimum degree order, and

• A mapping 𝜓 whose domain is the initial label of a vertex 𝑣 from 𝑉(𝐺)
and whose range is the minimum degree label of 𝑣.

See Section 8.4.1 for details concerning creation of the augmented adjacency
list 𝐴 and the permutation matrix 𝐏, i.e. the minimum degree traversal 𝐿 and
vertex label mapping 𝜓 .
It is also assumed that both the adjacency list 𝐴 and the matrix 𝐏𝐀𝐏𝐓 are
maintained in sparse data structures supporting the scan and insert operators.
Communication with the data structures is maintained through buffers
𝑎 and 𝑝𝑎𝑝 in the normal manner. It is further assumed that procedure make
creates element 𝑎𝑖𝑗 of 𝐀 when its row and column indices, 𝑖 and 𝑗 , are specified.
Algorithm 23 constructs a full matrix 𝐏𝐀𝐏𝐓 based on these assumptions.
Zero valued entries are created for elements that will fill up during 𝐋𝐔 decomposition.
Algorithm 24 constructs the symmetric matrix 𝐏𝐀𝐏𝐓 based on these as-
sumptions.


Algorithm 23: Construct 𝐏𝐀𝐏𝐓 of a Sparse Matrix


𝑣 = first( 𝐿 )
for 𝑖 = 1, ⋯ , |𝑉|
    while [ 𝑤 = scan( 𝐴, row 𝑣, 1, |𝑉| ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        if 𝑎 is fill-up
            𝑝𝑎𝑝 = 0
        else
            𝑝𝑎𝑝 = make( 𝐀, 𝑣, 𝑤 )
        𝑗 = evaluate( 𝜓, 𝑤 )
        insert( 𝐏𝐀𝐏𝐓 , 𝑖, 𝑗 )
    𝑝𝑎𝑝 = make( 𝐀, 𝑣, 𝑣 )
    insert( 𝐏𝐀𝐏𝐓 , 𝑖, 𝑖 )
    𝑝𝑎𝑝 = row header
    insert( 𝐏𝐀𝐏𝐓 , 𝑖, 0 )
    𝑣 = next( 𝐿, 𝑣 )

Algorithm 24: Construct 𝐏𝐀𝐏𝐓 of a Sparse Symmetric Matrix


𝑣 = first( 𝐿 )
for 𝑖 = 1, ⋯ , |𝑉|
    while [ 𝑤 = scan( 𝐴, row 𝑣, 1, |𝑉| ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        𝑗 = evaluate( 𝜓, 𝑤 )
        if 𝑗 > 𝑖
            if 𝑎 is fill-up
                𝑝𝑎𝑝 = 0
            else
                𝑝𝑎𝑝 = make( 𝐀, 𝑣, 𝑤 )
            insert( 𝐏𝐀𝐏𝐓 , 𝑖, 𝑗 )
    𝑝𝑎𝑝 = make( 𝐀, 𝑣, 𝑣 )
    insert( 𝐏𝐀𝐏𝐓 , 𝑖, 𝑖 )
    𝑣 = next( 𝐿, 𝑣 )


8.6 Numeric Factorization of Sparse Matrices


Numeric factorization algorithms work with the nonzero values of a sparse matrix
𝐀 and the data structures resulting from symbolic factorization to compute the
factorization

𝐋𝐔 = 𝐏𝐀𝐏𝐓

Algorithms discussed in the current section act on a sparse 𝑛 × 𝑛 matrix 𝐀. They
assume that

• 𝐀 already reflects the pivot strategy defined by 𝐏 and 𝐏𝐓 , i.e. the algorithms
pivot down the diagonal.
• 𝐀 has zero-valued entries at fill-up locations.
• 𝐀 is maintained in a sparse data structure supporting the get, scan, and
put operators. Communication with the data structure is maintained
through the buffer 𝑎 in the normal manner.

See Section 8.5 for details concerning the creation of a pivot ordered, fill-up
augmented 𝐀 matrix.
Algorithm 25 uses Doolittle's method (see Section 4.2 for more information)
to compute the 𝐋𝐔 decomposition of a sparse matrix 𝐀. The algorithm
overwrites 𝐀 with 𝐋𝐔.
Algorithm 26 uses Doolittle's method to compute 𝐔, the upper triangular
factors, of a symmetric sparse matrix 𝐀. It is assumed that 𝐀 is initially stored
as an upper triangular matrix.
The algorithm overwrites 𝐀 with 𝐔. The vector 𝐰 is used to construct the
relevant portion of each row of 𝐋 at each stage of the factorization. The vector
𝐜 contains cursors to the rows of 𝐔 from which the entries of 𝐰 are derived,
e.g. if 𝑤𝑘 holds 𝑢𝑗𝑖 / 𝑢𝑗𝑗 , then 𝑐𝑘 is 𝑗 .
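As an illustration of Algorithm 25, the following Python sketch factors a matrix held as a dictionary of rows (each row a dictionary keyed by column index, one-based, with fill-up positions already present as explicit zeros, as produced in Section 8.5). It is a sketch only; every row 1 through 𝑛 is assumed to exist, and the names are arbitrary.

    def sparse_lu_in_place(A, n):
        # Overwrite A (dict of rows {i: {j: value}}) with its LU factors (Algorithm 25).
        for i in range(2, n + 1):
            row_i = A[i]
            for j in sorted(row_i):
                alpha = row_i[j]
                m = min(i - 1, j - 1)
                for p in sorted(row_i):          # scan row i for columns 1 .. m
                    if p > m:
                        break
                    u_pj = A[p].get(j)           # u_pj, if it is stored
                    if u_pj is not None:
                        alpha -= u_pj * row_i[p]
                if i > j:
                    alpha /= A[j][j]             # divide by u_jj for the L part
                row_i[j] = alpha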

8.7 Solving Sparse Linear Systems


As we have seen, the 𝐋𝐔 decomposition of the matrix 𝐏𝐀𝐐 is used to solve the
linear system of equations 𝐀𝐱 = 𝐛 by sequentially solving Equation 58 through
Equation 61, which are repeated below.


Algorithm 25: LU Decomposition of a Sparse Matrix


for 𝑖 = 2, ⋯ , 𝑛
    while [ 𝑗 = scan( 𝐀, row 𝑖, 1, 𝑛 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        push
        𝛼 = 𝑎
        𝑚 = min( 𝑖 − 1, 𝑗 − 1 )
        while [ 𝑝 = scan( 𝐀, row 𝑖, 1, 𝑚 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
            𝑏 = 𝑎
            if get( 𝐀, 𝑝, 𝑗 ) is successful
                𝛼 = 𝛼 − 𝑎𝑏
        if 𝑖 > 𝑗
            get( 𝐀, 𝑗, 𝑗 )
            𝛼 = 𝛼 / 𝑎
        𝑎 = 𝛼
        put( 𝐀, 𝑖, 𝑗 )
        pop

Algorithm 26: LU Decomposition of a Sparse Symmetric Matrix


for 𝑖 = 1, ⋯ , 𝑛
    𝑘 = 0
    for 𝑗 = 1, ⋯ , 𝑖 − 1
        if get( 𝐀, 𝑗, 𝑖 ) is successful
            𝑘 = 𝑘 + 1
            𝑤𝑘 = 𝑎
            𝑐𝑘 = 𝑗
            get( 𝐀, 𝑗, 𝑗 )
            𝑤𝑘 = 𝑤𝑘 / 𝑎
    while [ 𝑗 = scan( 𝐀, row 𝑖, 𝑖, 𝑛 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        𝛼 = 𝑎
        for 𝑝 = 1, ⋯ , 𝑘
            if get( 𝐀, 𝑐𝑝 , 𝑗 ) is successful
                𝛼 = 𝛼 − 𝑎 𝑤𝑝
        𝑎 = 𝛼
        put( 𝐀, 𝑖, 𝑗 )


𝐲 = 𝐏𝐛
𝐋𝐜 = 𝐲
𝐔𝐳 = 𝐜
𝐱 = 𝐐𝐳

Sparsity techniques benefit this process. The algorithms presented in this section
assume:

• An 𝐋𝐔 decomposition of 𝐏𝐀𝐐 exists.

• The factorization is maintained in a sparse matrix data structure which
supports the scan and get operators.
• The row permutations 𝐏 are represented by a mapping 𝜓 whose domain
is a row index in 𝐀 and whose range is the corresponding row index in
𝐏𝐀𝐐.

• The column permutations 𝐐 are represented by a mapping 𝜏 whose domain
is a column index in 𝐏𝐀𝐐 and whose range is the corresponding
column index in 𝐀.

For example, Section 8.6 describes algorithms that create numeric factorizations
satisfying the first two of these assumptions. Section 8.4.1 describes an
algorithm for obtaining the row and column permutations corresponding to a
minimum degree pivot strategy.
For simplicity of exposition, it is assumed that the vectors 𝐛, 𝐜, 𝐱, 𝐲, and 𝐳
are stored in linear arrays. However, any data structure that lets you retrieve and
update an element of a vector based on its index will suffice.

8.7.1 Permute the Constant Vector


The equation 𝐲 = 𝐏𝐛 is efficiently implemented using the mapping 𝜓 to permute
the elements of 𝐛. Algorithm 27 illustrates this procedure.

8.7.2 Sparse Forward Substitution


The lower triangular system of equations 𝐋𝐜 = 𝐲 is solved by forward substitution.
Algorithm 28 implements the inner product formulation of forward
substitution on a sparse 𝐋 and full vectors 𝐜 and 𝐲.


Algorithm 27: Permute 𝐛 to order 𝐏


for 𝑖 = 1, ⋯ , 𝑛
    𝑘 = evaluate( 𝜓, 𝑖 )
    𝑦𝑖 = 𝑏𝑘

Algorithm 28: Sparse Forward Substitution


for 𝑖 = 1, ⋯ , 𝑛
    𝛼 = 𝑦𝑖
    while [ 𝑗 = scan( 𝐋, row 𝑖, 1, 𝑖 − 1 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        𝛼 = 𝛼 − 𝑎 𝑦𝑗
    get( 𝐋, 𝑖, 𝑖 )
    𝑐𝑖 = 𝛼 / 𝑎

The sequencing of the operations permits the use of a single vector 𝐲. Operations
that update 𝑐𝑖 would overwrite 𝑦𝑖 instead. If 𝐋 is unit lower triangular,
division by the diagonal element 𝑙𝑖𝑖 is omitted.
Algorithm 29 implements an outer product formulation of forward substitution
for use with symmetric systems whose upper triangular factors are available.
See Sections 5.3 and 7.6 for additional information.

8.7.3 Sparse Backward Substitution


The upper triangular system of equations 𝐔𝐳 = 𝐜 is solved by backward substitution.
Algorithm 30 implements the inner product formulation of backward
substitution on a sparse 𝐔 and full vectors 𝐜 and 𝐳.
The sequencing of the operations permits the use of a single vector 𝐜. Op-

Algorithm 29: Sparse Forward Substitution - Outer Product


for 𝑖 = 1, ⋯ , 𝑛
    𝑐𝑖 = 𝑦𝑖
    get( 𝐔, 𝑖, 𝑖 )
    𝛼 = 𝑐𝑖 / 𝑎
    while [ 𝑘 = scan( 𝐔, row 𝑖, 𝑖 + 1, 𝑛 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        𝑦𝑘 = 𝑦𝑘 − 𝑎 𝛼


Algorithm 30: Sparse Back Substitution


for 𝑖 = 𝑛, ⋯ , 1
    𝛼 = 𝑐𝑖
    while [ 𝑗 = scan( 𝐔, row 𝑖, 𝑖 + 1, 𝑛 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        𝛼 = 𝛼 − 𝑎 𝑧𝑗
    get( 𝐔, 𝑖, 𝑖 )
    𝑧𝑖 = 𝛼 / 𝑎

Algorithm 31: Permute 𝐱 to order 𝐐


for 𝑖 = 1, ⋯ , 𝑛
    𝑘 = evaluate( 𝜏, 𝑖 )
    𝑥𝑖 = 𝑧𝑘

Operations that update 𝑧𝑖 would overwrite 𝑐𝑖 instead. If 𝐔 is unit upper triangular,
division by the diagonal element 𝑢𝑖𝑖 is omitted.

8.7.4 Permute the Solution Vector


The equation 𝐱 = 𝐐𝐳 is efficiently implemented using the mapping 𝜏 to permute
the elements of 𝐳. Algorithm 31 illustrates this process.
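For comparison, the same four-step sequence is what a general sparse library factorization performs internally. The Python sketch below uses SciPy's SuperLU wrapper, which chooses its own fill-reducing column permutation; it is offered only to show the permutations and triangular solves at work, not as part of the original exposition.

    import numpy as np
    from scipy.sparse import csc_matrix
    from scipy.sparse.linalg import splu

    A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                             [1.0, 3.0, 1.0],
                             [0.0, 1.0, 2.0]]))
    b = np.array([1.0, 2.0, 3.0])

    lu = splu(A)        # factors P A Q = L U
    x = lu.solve(b)     # internally: y = P b,  L c = y,  U z = c,  x = Q z
    assert np.allclose(A @ x, b)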

8.8 Sparse LU Factor Update


If the 𝐋𝐔 decomposition of the sparse product 𝐏𝐀𝐐 exists and the factorization
of a related matrix

𝐏𝐀′𝐐 = 𝐏(𝐀 + 𝚫𝐀)𝐐    (93)

is desired, factor update is at times the procedure of choice. The current section
examines procedures for updating the sparse factorization of 𝐏𝐀𝐐 following a
rank one modification of 𝐀, that is

𝐀′ = 𝐀 + 𝛼𝐲𝐳𝐓    (94)

where 𝛼 is a scalar, 𝐲 and 𝐳 are 𝑛 vectors, and 𝐀′ has the same sparsity pattern as
𝐀.
The condition on the structure of 𝐀′ is not imposed by the factor update
process, but is instead a comment on the utility of factor update in a sparse


Algorithm 32: Factorization Path


while 𝑖 ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
    𝑗 = scan( 𝐔, row 𝑖, 𝑖 + 1, 𝑛 )
    𝑘 = scan( 𝐋, column 𝑖, 𝑖 + 1, 𝑛 )
    𝑖 = min( 𝑗, 𝑘 )

environment. If the modification to 𝐀 introduces new elements into the matrix,
the pivot sequence determined during symbolic factorization may no longer
apply. The sparsity degradation introduced by an inappropriate pivot sequence
may outweigh the benefits gained from updating the existing factorization.
The performance of factor update algorithms is often enhanced by restricting
pivot operations to the portions of 𝐋 and 𝐔 that are directly affected by the
change in 𝐀. Papers by Tinney, Brandwajn, and Chan (1985) [8] and Chan and
Brandwajn (1986) [12] describe a systematic methodology for determining this
subset of 𝐋𝐔. The rows of 𝐔 and columns of 𝐋 that are changed during factor
update are referred to as the factorization path. The fundamental operation
is to determine the factorization path associated with a vector 𝐲 with just one
nonzero element. Such a vector is called a singleton. Its factorization path is
called a singleton path. If more than one element of 𝐲 is nonzero, the
composite factorization path is simply the union of the singleton paths.

8.8.1 Factorization Path of a Singleton Update


If the 𝐋𝐔 decomposition of 𝐏𝐀𝐐 is stored as an asymmetric sparse matrix and
a singleton vector 𝐲 has one nonzero element at location 𝑖, its factorization path
through 𝐋𝐔 is determined by Algorithm 32. Each value ascribed to 𝑖 by Algorithm
32 is a vertex on the factorization path of 𝐲.
In words, Algorithm 32 starts at 𝑢𝑖𝑖 (the diagonal element in 𝐔 corresponding
to the nonzero element in 𝐲) and looks to the right until it finds the first
nonzero element 𝑢𝑖𝑗 . It then starts at element 𝑙𝑖𝑖 and looks down column 𝑖 of 𝐋
until it finds a nonzero element 𝑙𝑘𝑖 . The value of 𝑖 is then reset to the smaller of 𝑗
and 𝑘 and the process is repeated for the next 𝑢𝑖𝑖 and 𝑙𝑖𝑖 . The procedure ends when
there are no elements to the right of the diagonal in 𝐔 and none below the diagonal in
𝐋 for some vertex on the factorization path.
Obviously, Algorithm 32 assumes that a column scan is independent of a
row scan but works in the same manner.
If the 𝐋𝐔 decomposition of 𝐏𝐀𝐏𝐓 has a symmetric sparsity pattern, Algorithm
32 simplifies to Algorithm 33.


Algorithm 33: Symmetric Factorization Path


while 𝑖 ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
    𝑖 = scan( 𝐔, row 𝑖, 𝑖 + 1, 𝑛 )

The symmetry makes the search through column 𝑖 of 𝐋 unnecessary. In
Algorithm 33, the use of 𝐔 to determine the factorization path was arbitrary.
Either 𝐋 or 𝐔 can anchor the process.
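A Python sketch of the symmetric case (Algorithm 33) is almost trivial. It assumes the structure of 𝐔 is available as a dictionary mapping each row index to the set of its nonzero column indices; the names are arbitrary.

    def singleton_path(U_structure, i):
        # Factorization path of a singleton whose nonzero is in position i (Algorithm 33).
        path = [i]
        while True:
            above_diagonal = [j for j in U_structure.get(i, ()) if j > i]
            if not above_diagonal:
                return path
            i = min(above_diagonal)         # first nonzero to the right of the diagonal
            path.append(i)

For a vector with several nonzero entries, the composite path is the union of the singleton paths, as noted above.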

8.8.2 Revising LU after a Singleton Update


Algorithm 34 updates a structurally symmetric 𝐋𝐔 factorization after a rank one
modification to 𝐀. It assumes:

• 𝐋 is unit lower triangular and maintained in a sparse data structure that
communicates using the buffer 𝑙.
• 𝐔 is upper triangular and maintained in a sparse data structure that communicates
using the buffer 𝑢.
• The sparse data structure does not permit a column scan.
• The 𝐲 vector is full and all entries are zero except 𝑦𝑖 .
• The 𝐳 vector is full.
• The product 𝐲𝐳𝐓 has the same sparsity structure as 𝐀.
• The 𝐲 and 𝐳𝐓 vectors have been permuted by 𝐏 and 𝐏𝐓 so they are in the
same frame of reference as 𝐋 and 𝐔.

The requirement that 𝐋 and 𝐔 occupy different data structures in Algorithm
34 is pedantic. In practice, 𝐋 will occupy the subdiagonal portion of a sparse
matrix and 𝐔 will occupy the diagonal and superdiagonal portions of the same
matrix.
If 𝐀 is symmetric, 𝐲 = 𝐳, and 𝐀′ has the same sparsity pattern as 𝐀, Algorithm
34 simplifies to Algorithm 35.

9 Implementation Notes
This document concludes with a brief discussion of an experimental implementation
of sparse matrix algorithms in a highly cached database environment. It


Algorithm 34: Structurally Symmetric Sparse LU Factor Update


while 𝑖 ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
    get( 𝐔, 𝑖, 𝑖 )
    𝛿 = 𝑢 , 𝑝 = 𝑦𝑖 , 𝑞 = 𝑧𝑖
    𝑢 = 𝑢 + 𝛼𝑝𝑞
    put( 𝐔, 𝑖, 𝑖 )
    𝛽1 = 𝛼𝑝 / 𝑢 , 𝛽2 = 𝛼𝑞 , 𝑞 = 𝑞 / 𝛿 , 𝛿 = 𝑢 / 𝛿 , 𝛼 = 𝛼 / 𝛿
    𝑘 = 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
    while [ 𝑗 = scan( 𝐔, row 𝑖, 𝑖 + 1, 𝑛 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        if 𝑗 is initial element encountered on row 𝑖
            𝑘 = 𝑗
        𝑧𝑗 = 𝑧𝑗 − 𝑞𝑢
        𝑢 = 𝛿𝑢 + 𝛽2 𝑧𝑗
        put( 𝐔, 𝑖, 𝑗 )
        get( 𝐋, 𝑗, 𝑖 )
        𝑦𝑗 = 𝑦𝑗 − 𝑝𝑙
        𝑙 = 𝑙 + 𝛽1 𝑦𝑗
        put( 𝐋, 𝑗, 𝑖 )
    𝑖 = 𝑘

Algorithm 35: Symmetric Sparse LU Factor Update


while 𝑖 ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
    get( 𝐔, 𝑖, 𝑖 )
    𝛿 = 𝑢 , 𝑝 = 𝑦𝑖
    𝑢 = 𝑢 + 𝛼𝑝²
    put( 𝐔, 𝑖, 𝑖 )
    𝛽 = 𝛼𝑝 , 𝑝 = 𝑝 / 𝛿 , 𝛿 = 𝑢 / 𝛿 , 𝛼 = 𝛼 / 𝛿
    𝑘 = 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
    while [ 𝑗 = scan( 𝐔, row 𝑖, 𝑖 + 1, 𝑛 ) ] ≠ 𝑓𝑖𝑛𝑖𝑠ℎ𝑒𝑑
        if 𝑗 is initial element encountered on row 𝑖
            𝑘 = 𝑗
        𝑦𝑗 = 𝑦𝑗 − 𝑝𝑢
        𝑢 = 𝛿𝑢 + 𝛽𝑦𝑗
        put( 𝐔, 𝑖, 𝑗 )
    𝑖 = 𝑘


Figure 4: Matrix Tuple Structure – each tuple holds a row index, a column index, and a matrix element.

was written as part of the documentation of an experimental project, from a
bygone era, which proceeded along these lines.
The relevance of the material in this section to current readers is debatable
since it examines software implementations for machine architectures that are
no longer in use (or available). Nonetheless, some aspects of the discussion are
still pertinent despite changes in the execution environment. For this reason
and for the sake of completeness, it was decided to retain the information in
this monograph. Do with it as you like.

9.1 Sparse Matrix Representation


Data structures used to maintain sparse matrices must provide access to the
nonzero elements of a matrix in a manner which facilitates efficient implementation
of the algorithms that are examined in Section 8. The current sparse
matrix implementation also seeks to support a high degree of generality both in
problem size and the definition of a matrix element. Among other things, this
implies that the algorithms must be able to solve problems that are too large to fit
into core. A B-link tree supported by a database cache (described elsewhere, also
see Lehman and Yao (1981) [18]) is one possible vehicle for this task. Simply
put, the fundamental sparse matrix data structure is:

• Each matrix is a relation in a data base, and

• Each nonzero element of a matrix is a tuple in a matrix relation.

Matrix tuples have the structure indicated in Figure 4.
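Rendered in Python, a matrix tuple might look like the following sketch (a hypothetical illustration only; the field names are arbitrary).

    from dataclasses import dataclass

    @dataclass
    class MatrixTuple:
        row: int        # together with column, forms the compound key
        column: int
        element: float  # or any type supporting +, -, *, and /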


The row and column domains of each tuple constitute a compound key to
the matrix relation. Their meaning corresponds to the standard dense matrix
terminology.
The description of a matrix element is left intentionally vague. Its definition
varies with the application. A matrix element may be a real number, a double
precision real number, a complex number, or any other entity for which the


arithmetical operations of addition, subtraction, multiplication, and division
are reasonably defined.
In this context, matrix elements are accessed through high level data base
operations:
• Get retrieves a random tuple.
• Next retrieves tuples sequentially. You will recall that the scan operator
(defined in Sections 8.2.1 and 8.2.2) is used extensively by sparse matrix
algorithms in Section 8. Scan is implemented by embellishing the next
primitive.
• Put updates the non-key portions of an existing tuple.
• Insert adds a new tuple to a relation.
• Delete removes an existing tuple from a relation.
This data structure places few constraints on the representation of a matrix.
However, several conventions are adopted to facilitate consistent algorithms and
efficient cache access:
• Matrices have one-based indexing, i.e. the row and column indices of an
𝑛 × 𝑛 matrix range from 1 to 𝑛.
• Column zero exists for each row of an asymmetric matrix. Column zero
serves as a row header and facilitates row operations. It does not enter into
the calculations.
• A symmetric matrix is stored as an upper triangular matrix. In this
representation, the diagonal element anchors row operations as well as
entering into the computations. Column zero is not used for symmetric
matrices.
Figure 5 depicts the data structure associated with sparse matrices.

9.2 Database Cache Performance


When matrix algorithms are implemented in a relational environment, database
access requirements can play a significant (if not dominating) role in an algorithm's
time complexity. The current section examines the theoretical and empirical
performance characteristics of a B-link tree (see Lehman and Yao (1981) [18])
supported by an in-core cache with an LRU paging discipline. The operators described
in the previous section form the basis of the discussion.


Figure 5: Sparse Matrix Representation – an asymmetric matrix carries a row header in column 0 of every row (rows 1 through 𝑛, columns 0 through 𝑛); a symmetric matrix stores only its upper triangle (rows 1 through 𝑛, columns 1 through 𝑛), with no column 0.


9.2.1 Sequential Matrix Element Retrieval


The time complexity of next is 𝑂(1), since next just looks up a virtual address
(no search is involved). A next operation requires one cache access. No key
comparisons are necessary. An additional cache access is required to obtain the
non-key portion of the tuple.

9.2.2 Arbitrary Matrix Element Retrieval


The time complexity of get is 𝑂(log 𝑛) where 𝑛 is the number of tuples in the
relation. A get operation is implemented by a B-link tree search. When the key
to a sparse matrix is four bytes long, the B-link tree of the matrix is a 31-61 tree.
Interpolating the tables found in Gonnet (1984) [19] yields:

• The expected height of a 10,000 key B-link tree (i.e. the number of nodes
searched) is three.
• The average node to key ratio of a 10,000 key B-link tree is 0.02904. Hence,
the average node will contain 34.43 keys.

Each descending branch in a B-link tree search is determined through a binary
search of a tree node. The maximum number of key comparisons needed to
search a node in this manner is log₂ 𝑘, where 𝑘 is the number of keys in the node.
Therefore, it will take no more than 5.1 comparisons to locate the appropriate
tree branch in an average 35 key node.
These observations imply that no more than 3 cache lookups and 16 key
comparisons are required to locate an entry in a 10,000 key B-link tree. An additional
cache access is required to obtain the non-key portions of the tuple.

9.2.3 Arbitrary Matrix Element Update


If the address of the non-key portion of a tuple is known, put is an 𝑂(1) operation
requiring a single cache access. If the address is not known, put is equivalent
to a get – 𝑂(log 𝑛) – followed by a direct cache access.

9.2.4 Matrix Element Insertion


The time complexity of insert is 𝑂(log 𝑛) where 𝑛 is the number of tuples in
the relation. An insert operation is equivalent to a put operation unless the
tree splits. Interpolating the tables in Gonnet (1984) [19] yields:

• The average number of splits for the (𝑛+1)st insertion into a 10,000 key 31-
61 tree is approximately 0.02933, i.e. the tree will split each time 33.40
items are inserted (on the average).

Splitting increases the constant associated with the growth rate slightly. It
does not increase the growth rate per se.

9.2.5 Matrix Element Deletion


Deleting a key from a B-link tree is analogous to inserting one, except tree nodes
occasionally combine instead of splitting. Therefore, the time complexity of
delete is 𝑂(log 𝑛), like insertion.

9.2.6 Empirical Performance Measurements


The measurements in Table 1 provide empirical performance statistics for various
database operations. The measurements were made on a 16 MHz IBM
PS/2 Model 70 with a 16 MHz 80387 coprocessor and a 27 msec, 60 Mbyte
fixed disk drive (do you think the hardware is a bit dated? I suspect many readers
have only seen this hardware in a museum — if at all). You should note two
characteristics of the measurements:

• The cache was large enough to hold the entire B-link tree of relations A,
B, and C. There are no cache faults to disk in these measurements. The
relation D was too big to fit in core. Its next times reflect numerous
cache faults.
• The get operation looked for the same item during each repetition. This
explains the lack of cache faults while relation D was processed. Once
the path to the item was in core it was never paged out.

Neglecting cache faults, the time required to find a tuple is a function of
two variables: the size of the relation and the size of the key. The number of
tuples in a relation determines the number of comparisons that are made. The
size of the key affects the amount of work required to perform each comparison.
Comparing the “get relation D” measurements of Table 1 to the “get relation
C” measurements provides an indication of the actual effect of a relation's size
on search time (since both relations have the same size key). Note that

2,520 𝜇sec / 2,165 𝜇sec ≈ 1.167


Table 1: Database Cache Benchmarks

Repetitions
30k 50k 100k 200k Average
Operation (seconds) (seconds) (seconds) (seconds) (𝜇sec)
𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧 𝐀1
Next n/a 12 25 51 255
Get n/a 26 52 103 515
𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧 𝐁2
Next 7 12 24 47 235
Get 34 57 114 228 1,140
𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧 𝐂3
Next 7 12 24 49 245
Get 65 108 216 433 2,165
𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧 𝐃4
Next 48 82 164 n/a 1,640
Cache faults 2,058 3,541 7,095
Get 76 126 252 n/a 2,520
Cache faults 1 1 1

¹ 112 tuples, 2 byte key.
² 463 tuples, 4 byte key.
³ 673 tuples, 22 byte key.
⁴ 10,122 tuples, 22 byte key.

which is below the theoretical bound

log₂ 10,122 / log₂ 673 ≈ 13.31 / 9.40 ≈ 1.416

Comparing “get relation C” to “get relation B” gives a good feel for the impact
of key size. The size of the relation should not have much impact on the search
time discrepancies since

log₂ 673 / log₂ 463 ≈ 9.40 / 8.87 ≈ 1.060
The next operation was metered by repeatedly scanning a relation until the
desired number of operations was achieved. Each time the end of a relation was
encountered the scan was restarted. The high next measurement for relation
A probably reflects the overhead of many loop starts and stops (since there were

only 112 tuples in the relation). The elevated next time of relation C is
probably due to key length. After all the keys on a cache page are processed,
a relatively expensive page fault occurs. Larger keys cause the cache page to
change more frequently. Given that the in-core next observations have a mean
of 240 𝜇sec and a standard deviation of 10 𝜇sec, the effects of these peripheral
processes appear to be relatively insignificant.
In summary, Table 1 shows that the 𝑂(1) operations have an empirical time
advantage that ranges from 1.54/1 to 8.84/1 over the 𝑂(log 𝑛) operations. This
observation underscores the importance of tailoring algorithmic implementations
to take advantage of the 𝑂(1) operators. In practical terms, this means that
sequential access and stack operations are preferred to direct random access.

9.3 Floating Point Performance


Note.is section contains data that is so antiquated that we’re not sure it has much
relevance to modern hardware con gurations. It is included for the sake of completeness
and as a historical curiosity.

Table 2: Floating Point Benchmarks

With 80387 No 80387


Repetitions Average Repetitions Average
2,000k Time 200k Time
Operation (seconds) (𝜇sec) (seconds) (𝜇sec)
Add 25.9 13.0 34.3 172
Subtract 25.9 13.0 35.3 177
Multiply 28.4 14.2 44.3 222
Divide 33.6 16.8 50.9 255
Inner product1 40.3 20.2 61.9 310
Scalar multiply2 30.2 15.1 45.0 225
Loop overhead 1.3 1.3

¹ sum += a[cursor[i]] * y[cursor[j]]
² a[cursor[i]] *= scalar

A variety of floating point operations were monitored under MS DOS version
3.30 on a 16 MHz IBM PS/2 Model 70 with a 16 MHz 80387 coprocessor
and a 27 msec, 60 Mbyte fixed disk drive. The 32 bit operations available on
the 80386 were not used. Table 2 catalogs the time requirements of the simple


arithmetical operations, inner product accumulation, and multiplying a vector
by a scalar. All benchmarks were performed using double precision real numbers.
The test contains a loop that was restarted after every 500 operations,
e.g. 200k repetitions also includes the overhead of starting and stopping a loop
400 times. With this testing scheme, all loop counters and array indices were
maintained in the registers.
The measurements in Table 3 provide a similar analysis of math functions in
the Microsoft C Version 5.1 math library. These benchmarks were conducted
with a single loop whose counter was a long integer.

Table 3: Math Library Benchmarks

With 80387 No 80387


Repetitions Average Repetitions Average
300k Time 10k Time
Function (seconds) (𝜇sec) (seconds) (𝜇sec)
acos 36.2 121 30.5 3,050
asin 35.1 117 29.9 2,990
atan 26.0 87 23.0 2,300
cos 37.7 126 25.3 2,530
sin 37.0 123 24.7 2,470
tan 31.7 106 19.2 1,920
log 25.4 85 18.5 1,850
sqrt 16.5 55 5.7 570
pow 51.4 171 38.6 3,860
j0¹             235.1      784        60.7       6,070
j6              662.0²     2,207      176.3      17,603
y0³             510.0²     1,700      146.4      14,640
Loop overhead   3                     3

¹ Bessel function of the first kind, order 0.
² Extrapolated from 30,000 repetitions.
³ Bessel function of the second kind, order 0.

Differences in loop overheads found in Table 2 and Table 3 are accounted for
by the differences in the loop counter implementation described above. The 3
𝜇sec overhead reflects the time required to increment a long integer and monitor
the termination condition (which also involved a long integer comparison). The
1.3 𝜇sec overhead reflects the time required to increment a register and monitor

the termination condition (which involved a register comparison).

9.4 Auxiliary Store


A data structure referred to as the auxiliary store is provided to support temporary
information that is required by matrix algorithms but not stored in the
matrix relations themselves. The auxiliary store is implemented in a manner
that takes advantage of unused heap space at execution time. If the heap is big
enough to accommodate the entire auxiliary store, an array of structures is allo-
cated and the store is maintained in core. If available heap space is inadequate, a
relation is allocated and the auxiliary store is maintained in the database cache.
Access to the in-core version of the auxiliary store requires 13.8 𝜇sec. Heap
access time does not vary with the size of the store.

References
[1] L. Fox, An Introduction to Numerical Linear Algebra, Clarendon Press, Ox-
ford, 1964. 18
[2] G. Golub and C. Van Loan, Matrix Computations, e Johns Hopkins Uni-
versity Press, Baltimore, 1983. 18, 19, 26
[3] I. Duff, A. Erisman, and J. Reid, Direct Methods for Sparse Matrices, Claren-
don Press, Oxford, 1986. 18, 22, 46, 51
[4] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling, Numerical Recipes in
C, Cambridge University Press, Cambridge and New York, 1988. 18, 26
[5] S. Conte and C. de Boor, Elementary Numerical Analysis, McGraw-Hill Book
Company, New York, 1972. 22, 26
[6] W. Tinney and J. Walker, “Direct solutions of sparse network equations
by optimally ordered triangular factorization”, pp 1801-1809, Proceedings of
the IEEE, Volume 55, No. 11, 1967. 24
[7] A. George and J. Liu, Computer Solutions of Large Sparse Positive Definite
Systems, Prentice-Hall, Englewood Cliffs, New Jersey, 1981. 31, 46, 52
[8] W. Tinney, V. Brandwajn, and S. Chan, “Sparse vector methods”, IEEE
Transactions on Power Apparatus and Systems, PAS-104, No. 2, 1985. 31, 62
[9] J. Bennett, “Triangular Factors of Modified Matrices”, Numerische Mathematik,
Volume 7, pp. 217-221, 1965. 34


[10] P. Gill, G. Golub, W. Murray, and M. Saunders, “Methods for Modifying
Matrix Factorizations”, Mathematics of Computation, Volume 28, No. 126, pp.
505-535, 1974. 34, 35, 43
[11] W. Hager, “Updating the Inverse of A Matrix”, SIAM Review, Volume 31,
No. 2, pp. 221-239, 1989. 35
[12] S. Chan and V. Brandwajn, “Partial matrix refactorization”, IEEE Trans-
actions on Power Systems, Volume 1, No. 1, pp.193-200, 1986. 35, 62
[13] W. Tinney and C. Hart, “Power flow solution by Newton's method”, IEEE
Transactions on Power Apparatus and Systems, PAS-86, No. 11, 1967. 45, 46,
52
[14] S. Eisenstat, M. Schultz, and A. Sherman, “Considerations in the design
of software for sparse Gaussian elimination”, in Sparse Matrix Computations,
Edited by J. Bunch and D. Rose, Academic Press, New York and London,
pp. 263-289, 1976. 46
[15] D. Rose and R. Tarjan, “Algorithmic aspects of vertex elimination”, Proceedings
Seventh Annual ACM Symposium on Theory of Computing, pp. 245-254,
1975. 50
[16] A. Gomez and L. Franquelo, “Node ordering algorithms for sparse vector
method improvement”, IEEE Transactions on Power Systems, Volume 3, No.
1, pp. 73-79, 1988. 52
[17] A. Gomez and L. Franquelo, “An efficient ordering algorithm to improve
sparse vector methods”, IEEE Transactions on Power Systems, Volume 3, No.
4, pp. 1538–1544, 1988. 52
[18] P. Lehman and B. Yao, “Efficient Locking for Concurrent Operations on
B-Trees”, ACM Transactions on Database Systems, Volume 6, Number 4, pp.
650-669, December, 1981. 65, 66
[19] G. Gonnet, Handbook of Algorithms and Data Structures, Addison-Wesley,
Reading, Massachusetts, 1984. 68
