LangCommentary
Henry C. Pinkham
© 2013 by Henry C. Pinkham
All rights reserved
Contents

1 Gaussian Elimination
  1.1 Row Operations
  1.2 Elementary Matrices
  1.3 Elimination via Matrices
2 Matrix Inverses
  2.1 Background
  2.2 More Elimination via Matrices
  2.3 Invertible Matrices
6 Duality
  6.1 Non-degenerate Scalar Products
  6.2 The Dual Basis
  6.3 The Orthogonal Complement
  6.4 The Double Dual
  6.5 The Dual Map
7 Orthogonal Projection
  7.1 Orthogonal Projection
  7.2 Solving the Inhomogeneous System
  7.3 Solving the Inconsistent Inhomogeneous System
8 Uniqueness of Determinants
  8.1 The Properties of Expansion by Minors
  8.2 The Determinant of an Elementary Matrix
  8.3 Uniqueness
  8.4 The Determinant of the Transpose
12 Homework Solutions
  12.1 Chapter XI, §3
  12.2 Chapter XI, §4
References
Preface
These notes were written to complement and supplement Lang’s Linear Algebra
[4] as a textbook in an Honors Linear Algebra class at Columbia University. The
students in the class were gifted but had limited exposure to linear algebra. As Lang
says in his introduction, his book is not meant as a substitute for an elementary
text. The book is intended for students having had an elementary course in linear
algebra. However, by spending a week on Gaussian elimination after covering
the second chapter of [4], it was possible to make the book work in this class. I
had spent a fair amount of time looking for an appropriate textbook, and I could
find nothing that seemed more appropriate for budding mathematics majors than
Lang’s book. He restricts his ground field to the real and complex numbers, which
is a reasonable compromise.
The book has many strengths. No assumptions are made: everything is defined.
The first chapter presents the rather tricky theory of dimension of vector spaces
admirably. The next two chapters are remarkably clear and efficient in presenting
matrices and linear maps, so one has the two key ingredients of linear algebra,
the allowable objects and the maps between them, quite concisely, but with many
examples of linear maps in the exercises. The presentation of determinants is good,
and eigenvectors and eigenvalues are well handled. Hermitian forms and hermitian
and unitary matrices are well covered, on a par with the corresponding concepts
over the real numbers. Decomposition of a linear operator on a vector space is
done using the fact that a polynomial ring over a field is a Euclidean domain, and
therefore a principal ideal domain. These concepts are defined from scratch, and
the proofs presented very concisely, again. The last chapter covers the elementary
theory of convex sets: a beautiful topic if one has the time to cover it. Advanced
students will enjoy reading Appendix II on the Iwasawa Decomposition. A motto
for the book might be:
A little thinking about the abstract principles underlying linear algebra
can avoid a lot of computation.
Many of the sections in the book have excellent exercises. The most remarkable
CONTENTS vi
ones are in §II.3, III.3, III.4, VI.3, VII.1, VII.2, VII.3, VIII.3, VIII.4, XI.3, XI.6.
A few of the sections have no exercises at all, and the remainder have the standard set.
There is a solutions manual by Shakarchi [5]. It has answers to all the exercises in
the book, and good explanations for some of the more difficult ones. I have added
some clarification to a few of the exercise solutions.
A detailed study of the book revealed flaws that go beyond the missing material
on Gaussian elimination. The biggest problems are in Chapters IV and V. Certain
of the key passages in Chapter IV are barely comprehensible, especially in §3. Ad-
mittedly this is the hardest material in the book, and is often omitted in elementary
texts. In Chapter V there are problems of a different nature: because Lang does
not use Gaussian elimination he is forced to resort to more complicated arguments
to prove that row rank is equal to column rank, for example. While the coverage
of duality is good, it is incomplete. The same is true for projections. The coverage
of positive definite matrices over the reals misses the tests that are easiest to apply,
again because they require Gaussian elimination. In his quest for efficiency in the
definition of determinants, he produces a key lemma that has to do too much work.
While he covers some of the abstract algebra he needs, for my purposes he
could have done more: cover elementary group theory and spend more time on
permutations. One book that does more in this direction is Valenza’s textbook [7].
Its point of view is similar to Lang's, and it covers roughly the same material. Another
is Artin’s Algebra text [1], which starts with a discussion of linear algebra. Both
books work over an arbitrary ground field, which raises the bar a little higher for
the students. Fortunately all the ground work for doing more algebra is laid in
Lang’s text: I was able to add it in my class without difficulty.
Lang covers few applications of linear algebra, with the exception of differen-
tial equations which come up in exercises in Chapter III, §3, and in the body of the
text in Chapters VIII, §1 and §4, and XI, §4.
Finally, a minor problem of the book is that Lang does not refer to standard
concepts and theorems by what is now their established name. The most egre-
gious example is the Rank-Nullity Theorem, which for Lang is just Theorem 3.2
of Chapter III. He calls the Cayley-Hamilton Theorem the Hamilton-Cayley The-
orem. He does not give names to the axioms he uses: associativity, commutativity,
existence of inverses, etc.
Throughout the semester, I wrote notes to complement and clarify the expo-
sition in [4] where needed. Since most of the book is clear and concise, I only
covered a small number of sections.
• I start with an introduction to Gaussian elimination, which I draw on later in
these notes, together with simple extensions of some results.
• I continue this with a chapter on matrix inverses that fills one of the gaps in
Lang’s presentation.
CONTENTS vii
• The next chapter develops some elementary group theory, especially in con-
nection to permutation groups. I also introduce permutation matrices, which
gives a rich source of examples of matrices.
• The next six chapters (4-9) address problems in Lang’s presentation of key
material in his Chapters IV, V and VI.
• In Chapter 10 I discuss the companion matrix, which is not mentioned at all
in Lang, and can be skipped. It is interesting to compare the notion of cyclic
vector it leads to with the different one developed in Lang’s discussion of the
Jordan normal form in [XI, §6].
• The last chapter greatly expands Lang’s presentation of tests for positive
(semi)definite matrices, which are only mentioned in the exercises, and which
omit the ones that are most useful and easiest to apply.
To the extent possible I use the notation and the style of the author. For example
I minimize the use of summations, and give many proofs by induction. One ex-
ception: I use At to denote the transpose of the matrix A. Those notes form the
chapters of this book. There was no time in my class to cover Lang’s Chapter XII
on Convex Sets or Appendix II on the Iwasawa Decomposition, so they are not
mentioned here. Otherwise we covered the entire book in class.
I hope that instructors and students reading Lang’s book find these notes useful.
Comments, corrections, and other suggestions for improving these notes are
welcome. Please email them to me at hcp3@columbia.edu.
Henry C. Pinkham
New York, NY
Draft of May 31, 2013
Chapter 1
Gaussian Elimination
The main topic that is missing in [4], and that needs to be covered in a first linear
algebra class is Gaussian elimination. This can be easily fit into a course using [4]:
after covering [4], Chapter II, spend two lectures covering Gaussian elimination,
including elimination via left multiplication by elementary matrices. This chapter
covers succinctly the material needed. Another source for this material is Lang’s
more elementary text [3].
Ax = b,
Proof. We prove this by induction on the number of rows of the matrix A. If A has
one row, there is nothing to prove.
So assume the result for matrices with m − 1 rows, and let A be a matrix with
m rows. If the matrix is the 0 matrix, we are done. Otherwise consider the first
column Ak of A that has a non-zero element. Then for some i, aik ≠ 0, and for all
j < k, the column Aj is the zero vector.
Since Ak has non-zero entry aik, interchange rows 1 and i of A. So the new
matrix A has a1k ≠ 0. Then if another row Ai of A has k-th entry aik ≠ 0, replace
that row by
\[
A_i - \frac{a_{ik}}{a_{1k}} A_1,
\]
which has k-th entry equal to 0. Repeat this on all rows that have a non-zero
element in column k. The new matrix, which we still call A, has only zeroes in its
first k columns except for a1k.
Now consider the submatrix B of the new matrix A obtained by removing the
first row of A and the first k columns of A. Then we may apply the induction
hypothesis to B and put B in row echelon form by row operations. The key ob-
servation is that these same row operations may be applied to A, and do not affect
the zeroes in the first k columns of A, since the only non-zero there is in row 1 and
that row has been removed from B.
1.1.7 Exercise. Reduce the matrices in Example 1.1.5 that are not already in row
echelon form to row echelon form.
Er (c) := I + (c − 1)Irr
First, since
\[
E_1(c) = \begin{pmatrix} c & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
matrix multiplication gives
\[
E_1(c)A = \begin{pmatrix} ca_{11} & \dots & ca_{1n} \\ a_{21} & \dots & a_{2n} \\ a_{31} & \dots & a_{3n} \end{pmatrix}.
\]
Next since
\[
T_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\]
we get
\[
T_{23}A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ a_{31} & \dots & a_{3n} \\ a_{21} & \dots & a_{2n} \end{pmatrix}.
\]
Finally, since
\[
E_{13}(c) = \begin{pmatrix} 1 & 0 & c \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
we get
\[
E_{13}(c)A = \begin{pmatrix} a_{11} + ca_{31} & \dots & a_{1n} + ca_{3n} \\ a_{21} & \dots & a_{2n} \\ a_{31} & \dots & a_{3n} \end{pmatrix}.
\]
1.2.3 Theorem. All elementary matrices are invertible.
Proof. For each type of elementary matrix E we write down an inverse, namely a
matrix F such that EF = I = F E.
For Ei (c) the inverse is Ei (1/c). Tij is its own inverse. Finally the inverse of
Eij (c) is Eij (−c).
A′ = Ek . . . E1 A.    (1.3.3)
Proof. Using Theorem 1.3.1, we can bring A into row echelon form. Call the new
matrix B. As we noted in Remark 1.1.4 this means that B is upper triangular.
If any diagonal entry bii of B is 0, then the bottom row of B is zero, and we have
reached one of the conclusions of the theorem.
So we may assume that all the diagonal entries of B are non-zero. We now do
what is called backsubstitution to transform B into the identity matrix.
First we left multiply by the elementary matrices Ei(1/bii), for i = 1, . . . , n. This
new matrix, which we still call B, now has 1s along the diagonal.
Next we left multiply by elementary matrices of type (3), where in each case the
constant c is chosen so that a new zero is created in the matrix. The order of these
multiplications is chosen so that no zero created by a multiplication is destroyed
by a subsequent one. At the end of the process we get the identity matrix I, so we
are done.
1.3.4 Exercise. Reduce the matrices in Example 1.1.5 either to a matrix with bot-
tom row zero or to the identity matrix, using left multiplication by elementary matri-
ces.
For example, the first matrix
\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
backsubstitutes to
\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \text{ then }
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \text{ then }
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
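The whole reduction can be carried out mechanically. Here is a minimal NumPy sketch (not part of Lang or of these notes; the helper names are ours) that builds the three types of elementary matrices of §1.2 and performs the backsubstitution above by left multiplication:

```python
import numpy as np

def E(n, r, c):
    """Type (1): multiply row r by the non-zero scalar c."""
    M = np.eye(n)
    M[r, r] = c
    return M

def T(n, i, j):
    """Type (2): interchange rows i and j."""
    M = np.eye(n)
    M[[i, j]] = M[[j, i]]
    return M

def E_add(n, i, j, c):
    """Type (3): add c times row j to row i."""
    M = np.eye(n)
    M[i, j] = c
    return M

# The first matrix of the backsubstitution example above.
A = np.array([[1., 2., 3.],
              [0., 4., 0.],
              [0., 0., 1.]])

# Scale row 2 by 1/4, then clear the entries above the diagonal,
# working from the last column back so no zero already created is destroyed.
steps = [E(3, 1, 1/4),          # gives (1 2 3 / 0 1 0 / 0 0 1)
         E_add(3, 0, 2, -3.),   # gives (1 2 0 / 0 1 0 / 0 0 1)
         E_add(3, 0, 1, -2.)]   # gives the identity

for S in steps:
    A = S @ A
print(A)   # the 3 x 3 identity matrix
```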
We will use these results on elementary matrices in §2.2 and then throughout
Chapter 8.
Chapter 2
Matrix Inverses
This chapter fills in some gaps in the exposition in Lang, especially the break be-
tween his more elementary text [3] and the textbook [4] we use. There are some
additional results: Proposition 2.2.5 and Theorem 2.2.8. They will be important
later.
2.1 Background
On page 35 of our text [4], Lang gives the definition of an invertible matrix.
2.1.1 Definition. An n × n matrix A is invertible if there exists another n × n matrix
B such that
AB = BA = I. (2.1.2)
We say that B is both a left inverse and a right inverse for A. It is reasonable
to require both since matrix multiplication is not commutative. Then Lang proves:
2.1.3 Theorem. If A has an inverse, then its inverse B is unique.
Proof. Indeed assume there is another matrix C satisfying (2.1.2) when C replaces
B. Then
C = CI = C(AB) = (CA)B = IB = B (2.1.4)
so we are done.
Note that the proof only uses that C is a left inverse and B a right inverse.
We say, simply, that B is the inverse of A and A the inverse of B.
In [4], on p. 86, Chapter IV, §2, in the proof of Theorem 2.2, Lang only estab-
lishes that a certain matrix A has a left inverse, when in fact he needs to show it
has an inverse. We recall this theorem and its proof in 4.1.4. We must prove that
having a left inverse is enough to get an inverse. Indeed, we will establish this in
Theorem 2.2.8.
Proof. This follows easily from a more general result. Let A and B be two invert-
ible n × n matrices, with inverses A−1 and B−1. Then B−1A−1 is the inverse of
AB.
Indeed just compute
B−1A−1AB = B−1IB = B−1B = I
and
ABB−1A−1 = AIA−1 = AA−1 = I.
2.2.5 Proposition. Let A be a square matrix with one row equal to the zero vector.
Then A is not invertible.
AX = O (2.2.6)
has a non-trivial solution X, since one equation is missing, so that the number of
variables is greater than the number of equations. But if A were invertible, we
could multiply (2.2.6) on the left by A−1 to get A−1 AX = A−1 O. This yields
X = O, implying there cannot be a non-trivial solution, a contradiction to the
assumption that A is invertible.
Proposition 5.1.4 generalizes this result to matrices that are not square.
So we get the following useful theorem. The proof is immediate.
2.2.8 Theorem. Let A be a square matrix which has a left inverse B, so that
BA = I. Then A is invertible and B is its inverse.
Similarly, if A has a right inverse B, so that AB = I, the same conclusion
holds.
2.2.9 Remark. The first five chapters of Artin’s book [1] form a nice introduction
to linear algebra at a slightly higher level than Lang’s, with some group theory
thrown in too. The main difference is that Artin allows his base field to be any
field, including a finite field, while Lang only allows R and C. The references to
Artin’s book are from the first edition.
\[
A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]
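As a quick numerical sanity check of this formula, here is a short sketch (the example matrix is ours) comparing it with NumPy's built-in inverse:

```python
import numpy as np

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the (ad - bc) formula above."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return (1.0 / det) * np.array([[d, -b],
                                   [-c, a]])

A = np.array([[2., 1.],
              [5., 3.]])          # det = 1
print(inverse_2x2(A))             # [[ 3. -1.] [-5.  2.]]
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))   # True
```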
In class we studied the motions of the equilateral triangle whose center of gravity is
at the origin of the plane, so that the three vertices are equidistant from the origin.
We set this up so that one side of the triangle is parallel to the x- axis. There are
6 = 3! motions of the triangle permuting the vertices, and they can be realized by
linear transformations, namely multiplication by 2 × 2 matrices.
These notes describe these six matrices and show that they can be multiplied
together without ever leaving them, because they form a group: they are closed
under matrix multiplication. The multiplication is not commutative, however. This
is the simplest example of a noncommutative group, and is worth remembering.
Finally, this leads us naturally to the n×n permutation matrices. We will see in
§3.3 that they form a group. When n = 3 it is isomorphic to the group of motions
of the equilateral triangle. From the point of view of linear algebra, permutations
and permutation matrices are the most important concepts of this chapter. We
will need them when we study determinants.
There are some exercises in this chapter. Lots of good practice in matrix mul-
tiplication.
[Figure: the equilateral triangle with vertices P, Q, R centered at the origin, one side parallel to the x-axis, shown with the x and y axes, the rotation angle θ = 120°, a 30° angle, and the mirror axes, including the left-right mirror.]
\[
R_2 = \begin{pmatrix} -1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix}.
\]
More importantly note that R2 is obtained by repeating R1, in matrix multiplication:
\[
R_2 = R_1^2.
\]
Also recall Lang’s Exercise 23 of Chapter II, §3.
Next let’s find the left-right mirror reflection of the triangle. What is its matrix?
In the y coordinate nothing should change, and in the x coordinate, the sign should
change. The matrix that does this is
\[
M_1 = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
\[
M_2 = R_1 M_1 = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix}
\quad\text{and}\quad
M_3 = R_2 M_1 = \begin{pmatrix} 1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix}.
\]
In the same way, to compute the product M1 R1 just change the sign of the
terms in the first row of R1 .
3.1.2 Exercise. Check that the squares of M2 and M3 are the identity. Describe
each geometrically: what is the invariant line of each one. In other words, if you
input a vector X ∈ R2 , what is M2 X, etc.
3.2 Groups
3.2.1 Exercise. We now have six 2 × 2 matrices, all invertible. Why? The next
exercise will give an answer.
3.2.2 Exercise. Construct a multiplication table for them. List the 6 elements in
the order I, R1 , R2 , M1 , M2 , M3 , both horizontally and vertically, then have the
elements in the table you construct represent the product, in the following order: if
A is the row element, and B is the column element, then put AB in the table. See
Artin [1], chapter II, §1, page 40 for an example, and the last page of these notes
for an unfinished example. The solution is given in Table 3.1. Show that this
multiplication is not commutative, and notice that each row and column contains
all six elements of the group.

Table 3.1:

        I    R1   R2   M1   M2   M3
  I     I    R1   R2   M1   M2   M3
  R1    R1   R2   I    M2   M3   M1
  R2    R2   I    R1   M3   M1   M2
  M1    M1   M3   M2   I    R2   R1
  M2    M2   M1   M3   R1   I    R2
  M3    M3   M2   M1   R2   R1   I
Table 3.1 is the solution to the exercise. The first row and the first column of
the table just give the elements to be multiplied, and the entry in the table gives the
product. Notice that the 6 × 6 table divides into four 3 × 3 subtables, each of which
contains either M elements or R and I elements, but not both.
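The table can also be checked mechanically. The sketch below (ours; the names match the text) builds the six matrices, prints the multiplication table, verifies closure, and confirms that the multiplication is not commutative:

```python
import numpy as np

s = np.sqrt(3) / 2
I  = np.eye(2)
R1 = np.array([[-1/2, -s], [ s, -1/2]])   # rotation by 120 degrees
R2 = R1 @ R1                              # rotation by 240 degrees
M1 = np.array([[-1., 0.], [0., 1.]])      # the left-right mirror
M2 = R1 @ M1
M3 = R2 @ M1

group = {"I": I, "R1": R1, "R2": R2, "M1": M1, "M2": M2, "M3": M3}

def name_of(M):
    """Return the name of the group element equal to M (closure check)."""
    for name, G in group.items():
        if np.allclose(M, G):
            return name
    raise ValueError("product left the set: not closed")

names = ["I", "R1", "R2", "M1", "M2", "M3"]
for a in names:
    print(a, [name_of(group[a] @ group[b]) for b in names])  # compare with Table 3.1

# The multiplication is not commutative:
print(np.allclose(M1 @ R1, R1 @ M1))   # False
```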
What is a group? The formal definition is given in Lang in Appendix II. A group
is a set G with a law of composition satisfying the following axioms.

GR 1. Associativity of the group law: for all u, v and w in G,
(uv)w = u(vw).

GR 2. Existence of a neutral element 1 ∈ G for the group law: for every element
u ∈ G,
1u = u1 = u.

GR 3. Existence of inverses: for every element u ∈ G there is an element u−1 ∈ G
with
uu−1 = u−1 u = 1.
If you look at the first three axioms of a vector space, given in Lang p.3, you
will see that they are identical. Only two changes: the group law is written like
multiplication, while the law + in vector spaces is addition. The neutral element
for vector spaces is O, while here we write it as 1. These are just changes in
notation, not in substance. Note that we lack axiom VP 4, which in our notation
would be written:
GR 4. The group law is commutative: For all u and v in G
uv = vu.
So the group law is not required to be commutative, unlike addition in vector
spaces. A group satisfying GR 4 is called a commutative group. So vector spaces
are commutative groups for their law of addition.
One important property of groups is that cancelation is possible:
3.2.4 Proposition. Given elements u, v and w in G,
1. If uw = vw, then u = v.
2. If wu = wv, then u = v again.
Proof. For (1) multiply the equation on the right by the inverse of w to get
(uw)w−1 = (vw)w−1 by existence of inverse,
u(ww−1 ) = v(ww−1 ) by associativity,
u1 = v1 by properties of inverses,
u=v by property of neutral element.
For (2), multiply by w−1 on the left.
Because vector spaces are groups for addition, this gives a common framework
for solving some of the problems in Lang: for example problems 4 and 5 of [I, §1].
3.2.5 Example. Here are some groups we will encounter.
1. Any vector space for its law of addition.
2. The integers Z, with addition, so the neutral element is 0 and the inverse of
n is −n.
3. Permutations on n elements, where the law is composition of permutations.
4. The group of all invertible n × n matrices, with matrix multiplication as its
law. The neutral element is the identity matrix
\[
I = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}.
\]
5. The group of all invertible n × n matrices with determinant 1.
Much of what we will do in this course is to prove results that will help us understand
these groups. For example Theorem 8.3.3 clarifies the relationship between
the last two examples. See Remark 8.3.5.
3.3.5 Exercise. Compute the matrix product PQ. Also compute the matrix product
QP.
Solution:
\[
PQ = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\]
while
\[
QP = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
\]
The answers are different, but each one is a permutation matrix.
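These products are easy to reproduce; the sketch below (ours) also checks the hint of Exercise 3.3.7 that the transpose of a permutation matrix is its inverse:

```python
import numpy as np

P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
Q = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

print(P @ Q)                                  # [[0 0 1] [0 1 0] [1 0 0]]
print(Q @ P)                                  # [[1 0 0] [0 0 1] [0 1 0]]
print(np.array_equal(P @ Q, Q @ P))           # False
# Hint of Exercise 3.3.7: the transpose is the inverse.
print(np.array_equal(P @ P.T, np.eye(3, dtype=int)))   # True
```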
3.3.7 Exercise. Write down all 3 × 3 permutation matrices. Show that they are
invertible by finding their inverses. Make sure that your technique generalizes to
n × n permutation matrices. (Hint: transpose)
Here is the solution to these exercises. One key concept is the notion of the
order of an element, namely the smallest power of the element that makes it the
identity. The order of I is obviously 1.
Table 3.1 shows that the order of all three elements Mi is 2. Note that this
means that they are their own inverses. A little extra thought shows that the order
of R1 and R2 is 3: indeed R1 is the rotation by 120 degrees and R2 the rotation by
240 degrees, so applying either one three times gives a rotation by a multiple of 360 degrees.
For our permutation matrices, to have order 2 means to be symmetric, as you
should check. The matrices Q and R are symmetric, but the matrix P is not. P 2 is
the last non-symmetric permutation matrix:
\[
P^2 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.
\]
Chapter 4

Representation of a Linear Transformation by a Matrix
This chapter concerns [4], chapter IV. At the end of the chapter Lang gives the
definition of similarity. Here we expand on this in §4.3. Then there are some
solved exercises.
4.1.2 Exercise. Show that Lang’s Theorem 2.1 is a corollary of Theorem 4.2.7
below.
It is the matrix product BA. All we used is the associativity of matrix multiplica-
tion. We use this in the proof of Theorem 4.2.15.
It is worth amplifying Lang’s proof of his Theorem 2.2 p.86.
BAj = E j for j = 1, . . . , n.
But now, just looking at the way matrix multiplication works (a column at a time
for the right hand matrix), since E j is the j-th column of the identity matrix, we
get
BA = I
so B is a left inverse for A. Since Theorem 2.2.8 implies that A is invertible , we
are done for this direction.
The converse is easy: we have already seen this argument before in the proof
of Proposition 2.2.5. Assume A is invertible. Consider the linear map LA :
LA (X) = AX = x1 A1 + · · · + xn An ,
where the Ai are the columns of A.
If AX = O, then multiplying by the inverse A−1 we get A−1 AX = X = O,
so that the only solution of the equation AX = O is X = O. So the only solution
of
x 1 A1 + · · · + x n An = O
is X = O, which says precisely that A1 , . . . , An are linearly independent.
This gives us an n × m matrix C: this is not the matrix we want, but these
numbers are typically what is known about the linear transformation. Although
Lang is unwilling to write this expression down at the beginning of the section,
note that he does write it down in Example 3 p. 90.
Instead we use our basis B to construct an isomorphism from V to K n that we have
studied before: see example 1 p.52. Lang now calls it XB : V → K n . To any
element
v = x1 v1 + · · · + xn vn ∈ V
it associates its vector of coordinates (x1 , . . . , xn ) in K n . We do the same thing on
the W side: we have an isomorphism XE : W → K m . To an arbitrary element
w = y1 w1 + · · · + ym wm ∈ W
XE−1 : K m → W.

We then have the diagram

          F
    V --------> W
    |           |
    XB          XE
    v           v
   K^n -------> K^m
       MEB(F)
4.2.4 Theorem. The above diagram is commutative, meaning that for any v ∈ V ,
This is exactly the content of the equation in Lang’s Theorem 3.1 p. 88.
We want to compute MEB (F ), given information about F . Since XB is an
isomorphism, it has an inverse. Thus MEB (F ) can be written as the composition
MEB (F ) = XE ◦ F ◦ XB−1
4.2.5 Proposition.
Proof. In the commutative diagram above, we first want to get from V to K m via
K n . We get there from V by composing the maps XB followed by MEB (F ). In
our simplified notation, this is the matrix product AX, which is an m vector. Then
to get back to W , we need to apply the inverse of XE . So we need to take each
component of the m-vector AX and put it in front of the appropriate basis element
in W , as we did in Remark 4.2.2. The components of AX are the products of the
rows of A with X. That gives (4.2.6).
Next we relate MEB (F ) to the matrix C of (4.2.1). Using (4.2.1) for the F (vi ),
we write out F (v) = F (x1 v1 + · · · + xn vn ) to get:
F (v) = x1 (c11 w1 + · · · + c1m wm ) + · · · + xn (cn1 w1 + · · · + cnm wm ),
so the coefficients of the wj in F (v) are the dot products of X with the columns of C. Compare this to (4.2.6): it says that A is
the transpose of C.
4.2.7 Theorem. The matrix MEB (F ) is the transpose of the matrix C of (4.2.1). In
particular it is an m × n matrix.
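In coordinates the theorem says: record the coordinates of F(vi) in the basis E as the i-th row of C, as in (4.2.1); then the matrix acting on coordinate columns is C t. A small sketch (the numbers are our own example, not Lang's):

```python
import numpy as np

# Row i of C holds the coordinates of F(v_i) in the basis E, as in (4.2.1).
# Here n = 3 (domain) and m = 2 (codomain).
C = np.array([[1., 2.],    # F(v1) =  1*w1 + 2*w2
              [0., 1.],    # F(v2) =  0*w1 + 1*w2
              [4., -1.]])  # F(v3) =  4*w1 - 1*w2

A = C.T                    # the matrix of Theorem 4.2.7; it is m x n

# Applying A to the coordinate vector of v1 must return the
# coordinates of F(v1), i.e. the first row of C.
e1 = np.array([1., 0., 0.])
print(A @ e1)              # [1. 2.]
print(A.shape)             # (2, 3): an m x n matrix
```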
4.2.8 Remark. A major source of confusion in Lang is the set of equations (∗) on
p.89. Since a matrix A is already defined, the reader has every right to assume that
the aij in these equations are the coefficients of the matrix A, especially since he
already uses Ai to denote the rows of A. Nothing of the sort. The aij are being
defined by these equations, just as here we defined the cij in the analogous (4.2.1).
Because he has written the coefficients in (∗) backwards: aji for aij , he lands on
his feet: they are the coefficients of A.
while the matrix A = MBB′ (in Lang’s notation) is its transpose, the matrix associ-
ated with F .
If we interpret the left hand side as the identity linear transformation id applied to
the basis elements wi here, this is the same set of equations as (4.2.1), but with the
role of the bases reversed: {w1 , . . . , wn } is now a basis in the domain of id, and
{v1 , . . . , vn } a basis in the codomain.
We compute MBE (id). Note the change of position of the indices! We apply
Proposition 4.2.5, untangling the meaning of all the terms, and using as before the
abbreviation A for MBE (Id). Then, as in (4.2.6):
F (v1 ) = w1 , . . . , F (vn ) = wn .
The uniqueness follows from the important Theorem 2.1 of Chapter III of Lang.
F : V → W and G : W → U
where the left hand side is the matrix product of a matrix of size p × m by a matrix
of size m × n.
Proof. We can extend our previous commutative diagram:

          F           G
    V --------> W --------> U
    |           |           |
    XB          XE          XH
    v           v           v
   K^n ------> K^m -------> K^p
       MEB(F)      MHE(G)
and compare to the commutative diagram of the composite:

          G∘F
    V ----------> U
    |             |
    XB            XH
    v             v
   K^n ---------> K^p
       MHB(G∘F)
We need to show they are the same. This is nothing more than (4.1.3).
Now assume the linear map F is a map from a vector space V to itself. Pick a
basis B of V , and consider
MBB (F )
the matrix associated to F relative to B. Then doing this for the identity map Id,
we obviously get the identity matrix I:
MEE (F ) = N −1 MBB (F )N
Proof. Indeed take N = MBE (id), which has inverse MEB (id) by Corollary 4.2.17.
Then the expression on the right becomes
First apply Theorem 4.2.15 to the two terms on the right in (4.2.19):
so we are done.
Two square matrices M and P are called similar if there is an invertible matrix
N such that
P = N −1 M N
Theorem 4.2.18 shows that two matrices that represent the same linear trans-
formation F : V → V in different bases of V are similar. We have an easy converse
that Lang does not mention:
use for C the transpose of the matrix N , that gives us the basis E = {w1 , . . . , wn }.
By construction N = MBE (id), so by Corollary 4.2.17
as required.
Jordan normal form in Chapter XI, §6. The simplest example is probably given by
the matrices
\[
\begin{pmatrix} \alpha & 0 \\ 0 & \alpha \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} \alpha & 1 \\ 0 & \alpha \end{pmatrix}
\]
for any complex number α.
4.3.6 Remark. When you read Lang’s Chapter XI, §6, be aware that Figure 1 on
p. 264 is potentially misleading: the αi are not assumed to be distinct.
4.3.7 Exercise. Show that row equivalence (see Definition 2.2.2) is an equivalence
relation on n × m matrices.
The second part is unrelated. We answer in the usual way by doing row oper-
ations. First we interchange two rows to simplify the computations, and then we
make the matrix upper triangular:
\[
\begin{pmatrix} 1 & 1 & 5 \\ 2 & 0 & 1 \\ 2 & 1 & 2 \end{pmatrix},\quad
\begin{pmatrix} 1 & 1 & 5 \\ 0 & -2 & -9 \\ 0 & -1 & -8 \end{pmatrix},\quad
\begin{pmatrix} 1 & 1 & 5 \\ 0 & 0 & 7 \\ 0 & -1 & -8 \end{pmatrix},\quad
\begin{pmatrix} 1 & 1 & 5 \\ 0 & -1 & -8 \\ 0 & 0 & 7 \end{pmatrix}
\]
So by the big theorem of Lang, Chapter IV, §3, recalled in Theorem 4.2.7: the
matrix representing the linear transformation is the transpose of the matrix on the
basis elements. Thus the answer is
2 1 0
0 −1 1
−1 1 2
This is worth remembering. Thus once you unwind the definitions, the only diffi-
culty is to compute the inverse of the matrix B′ using Gaussian elimination.
Exercise 1 a)
The source basis B is
\[
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\;
\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \tag{4.4.4}
\]
for the three choices of the right hand side corresponding to the three basis vectors
of B. So we compute the inverse C of B′ by setting up, as usual:
2 0 −1 | 1 0 0
1 0 1 | 0 1 0
1 1 1 | 0 0 1
and doing row operations to make the left half of the matrix the identity matrix.
The C will appear on the right hand side. First divide the first row by 2:
1 0 −1/2 | 1/2 0 0
1 0 1 | 0 1 0
1 1 1 | 0 0 1
Subtract the third row from the second, multiply the third row by 2/3:
1 0 −1/2 | 1/2 0 0
0 1 0 | 0 −1 1
0 0 1 | −1/3 2/3 0
Now just plug in the values for the ri . When they are (1, 1, 0), (−1, 1, 1), (0, 1, 2)
we get in turn for the {xi } the three column vectors:
\[
\begin{pmatrix} 2/3 \\ -1 \\ 1/3 \end{pmatrix},\;
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 1/3 \\ 1 \\ 2/3 \end{pmatrix}, \tag{4.4.6}
\]
and they are the columns of the matrix we are looking for, since they are the coor-
dinates, in the B′ basis, of the unit coordinate vectors in the B basis.
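The arithmetic can be checked by solving the three systems B′x = r directly, one for each basis vector of B; a sketch using the data of (4.4.4) and the augmented matrix above:

```python
import numpy as np

# Columns of Bmat are the basis vectors of B from (4.4.4).
Bmat = np.array([[1., -1., 0.],
                 [1.,  1., 1.],
                 [0.,  1., 2.]])
# Columns of Bprime are the target basis vectors, i.e. the columns of the
# left half of the augmented matrix above.
Bprime = np.array([[2., 0., -1.],
                   [1., 0.,  1.],
                   [1., 1.,  1.]])

# Solve B' X = B column by column: column i of X holds the coordinates
# of the i-th vector of B in the basis B'.
X = np.linalg.solve(Bprime, Bmat)
print(np.round(X, 4))
# [[ 0.6667  0.      0.3333]
#  [-1.      0.      1.    ]
#  [ 0.3333  1.      0.6667]]   the columns of (4.4.6)
```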
One of the most important theorems of linear algebra is the theorem that says that
the row rank of a matrix is equal to its column rank. This result is sometimes called
the fundamental theorem of linear algebra. The result, and its proof, is buried
in Lang’s Chapter V. It is stated as Theorem 3.2 p.114, but the proof requires a
result from §6: Theorem 6.4. The goal here is to give the result the prominence it
deserves, and to give a different proof. The one here uses Gaussian elimination.
5.1.1 Definition. If A is an m×n matrix, then the column rank of A is the dimension
of the subspace of K m generated by the columns {A1 , . . . , An } of A. The row rank
of A is the dimension of the subspace of K n generated by the rows {A1 , . . . , Am }
of A.
First, as Lang points out on page 113, the space of solutions of an m × n homo-
geneous system of linear equations AX = O can be interpreted in two ways:
1. either as those vectors X giving linear relations
x1 A1 + · · · + xn An = O;
2. or as those vectors X orthogonal to all the rows of A:
⟨X, A1⟩ = 0, . . . , ⟨X, Am⟩ = 0.
These are just two different ways of saying that X is in the kernel of the lin-
ear map LA we investigated in Chapter IV. The second characterization uses the
standard scalar product on K n and K m . Continuing in this direction:
Proof. Notice that Proposition 2.2.5 is the case r = n. To have row rank r, B must
have linearly independent rows. If B has a row Bi of zeroes, then it satisfies the
row equation Bi = O, a contradiction. Conversely, if B does not have a row of
zeroes, then for each row i of B there is an index µ(i) so that the entry bi,µ(i) is the
first non-zero coordinate of row Bi . Note that µ(i) is a strictly increasing function
of i, so that there are exactly n − r indices j ∈ [1, n] that are not of the form µ(i).
The variables yj for these n − r indices are called the free variables. Assume the
Bi satisfy an equation of linear dependence
λ1 B1 + λ2 B2 + · · · + λr Br = O.
Look at the equation involving the µ(1) coordinate. The only row with a non-zero
entry there is B1 . Thus λ1 = 0. Continuing in this way, we see that all the λi are
0, so this is not an equation of linear dependence.
Back to the proof of Proposition 5.1.3. We can give arbitrary values to each one
of the free variables and then do backsubstitution as in §1.3 to solve uniquely for
the remaining variables. This implies that the space of solutions has dimension
n − r, so we are done.
Now we can prove one of the most important theorems of linear algebra.
5.1.5 Theorem. The row rank and the column rank of any matrix are the same.
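The theorem is easy to observe numerically; a two-line check (our sketch) using NumPy's rank routine on a deliberately rank-deficient example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 7)).astype(float)
A[3] = A[0] + 2 * A[1]      # force a dependent row, so the rank is at most 3

# Column rank of A = rank of A; row rank of A = column rank of A^t.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # equal
```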
Chapter 6

Duality
This chapter is a clarification and an expansion of Lang [4], section [V, §6] entitled
The Dual Space and Scalar Products. The first three sections clarify the material
there, the remaining sections contain new results.
If V is a vector space over the field K, V ∗ denotes the vector space of linear
maps f : V → K. These linear maps are called functionals. V ∗ is called the dual
space of V , and the passage from V to V ∗ is called duality.
6.1.1 Theorem. Any scalar product on a finite dimensional vector space V admits
an orthogonal basis.
⟨vi , vj⟩ = 0 whenever i ≠ j.
This is defined p.103. Lang proves this theorem by a variant of the Gram-Schmidt
process.
As Lang notes in Theorem 4.1 of Chapter V, if X and Y are column vectors in
K n , then a scalar product on K n is equivalent to specifying a unique symmetric
n × n matrix A so that
⟨X, Y⟩ = X t AY.    (6.1.2)
Returning to V , if we choose an orthogonal basis B for V and its scalar product,
and let X and Y be the component vectors in K n of vectors v and w in V , then the
symmetric matrix A of (6.1.2) describing the scalar product is diagonal.
We get the easy corollary, not mentioned by Lang:
D: V → V ∗
So D(v) = Lv .
6.2.1 Definition. Let B = {v1 , . . . , vn } be a basis for V . Then the dual basis
B ∗ = {ϕ1 , . . . , ϕn } of V ∗ is defined as follows. For each ϕj , set
\[
\varphi_j(v_k) = \begin{cases} 1, & \text{if } k = j; \\ 0, & \text{otherwise.} \end{cases}
\]
The last condition is needed to insure that the scalar product is non-degenerate.
Let’s call the map from V to V ∗ given by this scalar product E instead of D, since
we will get a different linear map. The definition shows that E(v1 ) is the functional
which in the basis B for V takes the value
\[
\begin{pmatrix} 1 & 0 \end{pmatrix}
\begin{pmatrix} a & b \\ b & d \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a x_1 + b x_2
\]
on the vector x1 v1 + x2 v2 . So E(v1 ) is the functional aϕ1 + bϕ2 in the dual basis
on V ∗ . In the same way, E(v2 ) is the functional bϕ1 + dϕ2 , so we get a different
map when we choose a different scalar product.
Note that this theorem holds for all finite dimensional vector spaces V , without
reference to a scalar product. Its proof is very similar to that of Theorem 2.3 p. 106.
Now assume that V is an n-dimensional vector space with a non-degenerate
scalar product. Then the easy Theorem 6.2 p.128 mentioned above and the com-
ments in Lang starting at the middle of page 130 establish the following theorem
in this context.
W ⊥ ≅ W ∗⊥.
This is the key point, so we should prove it, even though it is easy.
6.4.1 Definition. Pick a v ∈ V . For any ϕ ∈ V ∗ , let ev (ϕ) = ϕ(v). The map
ev : V ∗ → K is easily seen to be linear. It is called evaluation at v.
Proof. First we need to show D2 is a linear map. The main point is that for two
elements v and w of V ,
ev+w = ev + ew .
To show this we evaluate ev+w on any ϕ ∈ V ∗ :
F : V → W.
F ∗ : ψ ∈ W ∗ ↦ ϕ = ψ ◦ F ∈ V ∗ .
Our goal is to understand the relationship between the m × n matrix of F and the
n × m matrix of F ∗ , in suitable bases, namely the dual bases.
For the vector space V ∗ we use the dual basis B ∗ = {ϕ1 , . . . , ϕn } of the basis
B and for the vector space W ∗ we use the dual basis E ∗ = {ψ1 , . . . , ψm } of the
basis E.
The m × n matrix for F associated to B and E is denoted MEB (F ), as per Lang
p. 88, while the matrix for F ∗ associated to E ∗ and B ∗ is denoted MB∗E∗ (F ∗ ).
What is the relationship between these two matrices? Here is the answer.
Proof. Any functional ϕ ∈ V ∗ can be written in terms of the dual basis as:
Just test the equality by applying the functionals to any basis vector vj to see that
both sides agree, since all the terms on the right hand side except the j-th one
vanish by definition of the dual basis. That means they are the same.
Writing A for MEB (F ), then for any j:
This is what we learned in Chapter IV, §3. Note the indexing going in the wrong
order: in other words, we are taking the dot product of w with the j-th column of
A.
V ∗ ←—F∗— W ∗
Now identify V and V ∗ each to K n using the basis B and the dual basis B ∗ ;
also identify W and W ∗ each to K m using the basis E and the dual basis E ∗ .
Let B = MBB∗ (DB ) be the matrix representing DB in these two bases, and
E = ME∗E (DE ) the matrix representing DE . Then we get the corresponding diagram of
matrix multiplication:
          A
   K^n --------> K^m
    |             |
    B             E
    v             v
   K^n <-------- K^m
          At
This is most interesting if the scalar product on V is the ordinary dot product in
the basis B, and the same holds on W in the basis E. Then it is an easy exercise to
show that the matrices B and E are the identity matrices, so the diagram becomes:
          A
   K^n --------> K^m
    |             |
    I             I
    v             v
   K^n <-------- K^m
          At
Ker(LA ) ⊕ Image(LAt ) = K n ,
Image(LA ) ⊕ Ker(LAt ) = K m .
Thus the four subspaces associated to the matrix A are the kernels and images
of A and At . The text by Strang [6] builds systematically on this theorem.
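A numerical illustration (our sketch, for the standard dot products) of the first decomposition: the kernel of A is orthogonal to the image of At, and the dimensions add up to n.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 1.],
              [1., 3., 1., 2.]])      # third row = sum of the first two, so rank 2
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Ker(A): right singular vectors beyond the rank span the null space.
_, s, Vt = np.linalg.svd(A)
kernel = Vt[r:].T                     # n x (n - r); its columns span Ker(A)

# Image(A^t) is spanned by the rows of A.
print(kernel.shape[1] + r == n)            # dimensions add up: True
print(np.allclose(A @ kernel, 0))          # every row of A is orthogonal to Ker(A): True
```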
w = y1 v 1 + y2 v 2 + · · · + yn v n
then
ϕi (v + w) = xi + yi = ϕi (v) + ϕi (w)
and
ϕi (cv) = cxi = cϕi (v).
Note that ϕi (vj ) = 0 if i ≠ j, and ϕi (vi ) = 1.
Then to every element w we can associate a functional called ϕw given by
ϕw = y1 ϕ1 + y2 ϕ2 + · · · + yn ϕn .
Thus we have constructed a map from V to V ∗ that associates to any w in V
the functional ϕw . We need to show this is a linear map: this follows from the
previous computations.
6.7.1 Proposition. The ϕi , 1 ≤ i ≤ n form a basis of V ∗ .
Proof. First we show the ϕi are linearly independent. Assume not. Then there is
an equation of linear dependence:
a1 ϕ1 + a2 ϕ2 + · · · + an ϕn = O.
Apply the equation to vj . Then we get aj = 0, so all the coefficients are zero, and
this is not an equation of dependence. Now take an arbitrary functional ψ on V .
Let aj = ψ(vj ). Now consider the functional
ψ − a1 ϕ1 − a2 ϕ2 − · · · − an ϕn
Applied to any basis vector vi this functional gives 0. So it is the zero functional
and we are done.
So we have proved that the dimension of V ∗ is n. The basis B ∗ = {ϕ1 , . . . , ϕn }
is called the dual basis of the basis B.
6.7.2 Remark. Now, just to connect to Lang’s approach, let’s define a scalar prod-
uct on V using the bases B and B ∗ . Let
⟨v, w⟩ = ϕv (w).
The linearity in both variables is trivial. To conclude, we must show we have
symmetry, so
⟨v, w⟩ = ⟨w, v⟩ = ϕw (v).
With the same notation as before for the components of v and w, we get
ϕv (w) = x1 ϕ1 (w) + · · · + xn ϕn (w)
= x1 y1 + · · · + xn yn
= y1 ϕ1 (v) + · · · + yn ϕn (v)
= ϕw (v)
We now use these ideas to show that the row rank and the column rank of a
matrix are the same.
Let A be an m × n matrix. Let N be the kernel of the usual map K n → K m
given by matrix multiplication:
x ↦ Ax.
View the rows of A as elements of V ∗ . Indeed, they are elements of V ∗
which, when applied to any element of N , give 0. Pick any collection of linearly independent
rows of A, then extend this collection to a basis of V ∗ . Pick the dual basis for
V ∗∗ = V . By definition N is the orthogonal complement of the collection of
independent rows of A. So
Chapter 7

Orthogonal Projection
v = c1 u1 + · · · + cm um + d1 w1 + · · · + dr wr (7.1.1)
We use the definition of component in Lang, page 99. Dotting (7.1.1) with ui
gives (v − ci ui ) · ui = 0, which is exactly what is required. In the same way, the
di is the component of v along wi .
Now we apply Lang’s Theorem 1.3 of Chapter V. First to v and u1 , . . . , um .
The theorem tells us that the point p = c1 u1 + · · · + cm um is the point in U closest
to v. Similarly, the point q = d1 w1 + · · · + dr wr is the point in W closest to v.
Here we map
v 7→ p = c1 u1 + · · · + cm um .
as per (7.1.1), and we have shown in Proposition 7.1.2 that this linear map P is the
orthogonal projection to U .
The kernel of P is W . The image of P is of course U . Obviously P 2 = P .
So Exercise 10 of Lang, Chapter III, §4 applies. Notice that we have, similarly, a
projection to W that we call Q. It sends
v 7→ q = d1 w1 + · · · + dr wr .
We have the set up of Exercises 11-12 of Lang, Chapter III, §4, p. 71. Finally note
that the matrix of P in our basis can be written in block form as
\[
A_m = \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix}
\]
where Im is the m×m identity matrix, and the other matrices are all zero matrices.
In particular it is a symmetric matrix.
Conversely we can establish:

7.1.4 Theorem. A linear operator P on V that
• is symmetric (P t = P ),
• and satisfies P 2 = P ,
is the orthogonal projection to its image.
Proof. We establish Definition 7.1.3 just using the two properties. For all v and w
in V :
⟨v − P v, P w⟩ = (v − P v)t P w = v t P w − v t P t P w = v t P w − v t P 2 w = 0.
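The computation is easy to illustrate numerically; the sketch below (our example) builds the orthogonal projection onto a plane in R3 and checks the two properties and the orthogonality ⟨v − Pv, Pw⟩ = 0:

```python
import numpy as np

# Orthogonal projection onto the span of two orthonormal vectors in R^3.
u1 = np.array([1., 0., 0.])
u2 = np.array([0., 1., 1.]) / np.sqrt(2)
P = np.outer(u1, u1) + np.outer(u2, u2)

print(np.allclose(P, P.T))        # symmetric
print(np.allclose(P @ P, P))      # idempotent

rng = np.random.default_rng(1)
v, w = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose((v - P @ v) @ (P @ w), 0.0))   # <v - Pv, Pw> = 0
```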
Ax = b
Proof. To say that the equation Ax = b can be solved is simply to say that b is in
the image of the linear map with matrix A. By the four subspaces theorem 6.6.1
the image of A in K m is the orthogonal complement of the kernel of At . Thus for
any b orthogonal to the kernel of At , the equation Ax = b can be solved, and only
for such b. This is precisely our assertion.
x1 − x2 = b1
x2 − x3 = b2
x3 − x1 = b3
So
1 −1 0
A= 0 1 −1 .
−1 0 1
Now A has rank 2, so up to a scalar, there is only one non-zero vector y such
that y t A = 0. To find y add the three equations. We get
0 = b1 + b2 + b3 .
This says that the scalar product of (1, 1, 1) with b is 0. So by the theorem the
system has a solution for all b such that b1 + b2 + b3 = 0.
Let’s work it out. Write b3 = −b1 − b2 . Then the third equation is a linear
combination of the first two, so can be omitted. It is sufficient to solve the system:
x1 − x2 = b1
x2 − x3 = b2
p = x1 A1 + · · · + xn An
for real variables xi . This is the matrix product p = Ax: check the boxed formula
of Lang, page 113. To use the method of §7.1, we need the orthogonal complement
of U , namely the subspace of V orthogonal to the columns of A. That is the kernel
of At , since that is precisely what is orthogonal to all the columns.
So we conclude that b − p must be in the kernel of At . Writing this out, we get
the key condition:
At (b − Ax) = 0, or At Ax = At b. (7.3.1)
Because At A is invertible (see Theorem 7.3.3 below), we can solve for the un-
knowns x:
x = (At A)−1 At b.
So finally we can find the projection point:
p = Ax = A(At A)−1 At b.
So we get from any b to its projection point p by the linear transformation with
matrix
P = A(At A)−1 At
Since A is an m × n matrix, it is easy to see that P is m × m. Notice that P 2 = P :
by cancellation of one of the (At A)−1 by At A in the middle. Also notice that P is
symmetric by computing its transpose:
We used (At )−1 = (A−1 )t (see Lang’s exercise 32 p. 41), and of course we used
(At )t = A. So we have shown (no surprise, since it is a projection matrix):
7.3.2 Theorem. The matrix P above satisfies Theorem 7.1.4: it is symmetric and
P2 = P.
We now prove a result we used above. We reprove a special case of this result
in Theorem 11.2.1. Positive definite matrices are considered at great length in
Chapter 11.
Proof. Because A has maximal rank, its kernel is trivial, meaning that the only
n-vector x such that Ax = 0 is the zero vector. So take x ≠ 0; then Ax ≠ 0, and
xt (At A)x = (Ax)t (Ax) = ‖Ax‖2 .
Now ‖Ax‖2 = 0 implies that Ax is the zero vector, and this cannot be the case.
Thus xt (At A)x > 0 whenever x ≠ 0. This is precisely the definition that At A is
positive definite. It is symmetric because
(At A)t = At (At )t = At A.
This is obviously positive definite. In this case it is easy to work out the projection
matrix A(At A)−1 At :
\[
P = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} 1/3 & 0 \\ 0 & 1/2 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 5/6 & 2/6 & -1/6 \\ 2/6 & 2/6 & 2/6 \\ -1/6 & 2/6 & 5/6 \end{pmatrix}.
\]
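This computation is easily reproduced (a sketch, with A as reconstructed above):

```python
import numpy as np

A = np.array([[1., -1.],
              [1.,  0.],
              [1.,  1.]])

P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(P * 6))            # 6*P = [[5 2 -1] [2 2 2] [-1 2 5]]
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True
```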
‖Ax − b‖
It is not so simple because the columns of A are not mutually perpendicular. So in-
stead we can use the standard minimization technique from multivariable calculus.
First, to have an easier function to deal with, we take the square, which we write
as a matrix product:
f (x) = (Ax − b)t (Ax − b) = xt At Ax − 2bt Ax + bt b.
Notice that each term is a number: check the size of the matrices and the vectors
involved. Calculus tells us f (x), which is a quadratic polynomial, has an extremum
(minimum or maximum or inflection point) only when all the partial derivatives
with respect to the xi vanish. It is an exercise to see that the gradient ∇f of f
in x is At Ax − At b, so setting this to 0 gives the key condition (7.3.1) back. No
surprise.
Of course Theorem 7.1.4 shows that it is possible to bypass distance minimiza-
tion entirely, and use perpendicularity instead.
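Either route gives the same x; a quick sketch (example data ours) comparing the normal equations (7.3.1) with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))          # generic, so the columns are independent
b = rng.standard_normal(6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # solve A^t A x = A^t b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ax - b||

print(np.allclose(x_normal, x_lstsq))             # True
```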
Chapter 8
Uniqueness of Determinants
Using the material from Lang [3] on elementary matrices, summarized in Chapter
1, we give an alternative quick proof of the uniqueness of the determinant function.
This is the approach of Artin [1], Chapter I, §3, and is different from the one using
permutations that Lang uses in our textbook [4]. You may wonder how Lang covers
this material in [3]: the answer is that he omits the proofs.
F (A) = (−1)i+1 ai1 F (Ai1 ) + · · · + (−1)i+j aij F (Aij ) + · · · + (−1)i+n ain F (Ain )
where the F on the right hand side refers to the function for (n − 1) × (n − 1)
matrices, and Aij is the ij-th minor of A. Note that Lang does not use this word,
which is universally used. We will have more to say about minors in Chapter 11.
The expansion along the j-th column can be written inductively as
F (A) = (−1)j+1 a1j F (A1j )+· · ·+(−1)i+j aij F (Aij )+· · ·+(−1)j+n anj F (Anj ).
8.2.4 Remark. It is important to notice that the computation of F (E) does not
depend on whether F is a row or column expansion. We only used Property (3):
F (I) = 1 that holds in both directions, as does the rest of the computation.
8.2.5 Theorem. For any elementary matrix E and any square matrix A,
F (EA) = F (E)F (A).
More generally, for elementary matrices E1 , . . . , Ek ,
F (Ek · · · E1 A) = F (Ek ) · · · F (E1 )F (A).
Proof. Now that we have computed the value of F (E) for any elementary matrix,
we just plug it into the equations (8.2.1), (8.2.2) and (8.2.3) to get the desired
result. For the last statement, just peel off one of the elementary matrices at a time,
by induction, starting with the one on the left.
8.3 Uniqueness
In this section we follow Artin [1], p. 23-24 closely.
By Theorem 8.2.5, if (1.3.3) is satisfied, then
8.3.1 Theorem. This function F , which we now call the determinant, is the only
function of the rows (columns) of n × n matrices to K satisfying Properties (1), (2)
and (3).
8.3.2 Corollary. The determinant of A is ≠ 0 if and only if A is invertible.
This follows from Gaussian elimination by matrices: A is invertible if and only
if in (1.3.3) the matrix A′ = I, which is the only case in which the determinant of
A is non-zero.
We now get one of the most important results concerning determinants:
8.3.3 Theorem. For any two n × n matrices A and B,
det(AB) = det(A) det(B).
Proof. First we assume A is invertible. By (1.3.3) we know A is the product of
elementary matrices:
A = Ek . . . E1 .
Note that these are not the same E1 , . . . , Ek as before. By using Theorem 8.2.5 we
get
det(A) = det(Ek ) . . . det(E1 ) (8.3.4)
and
det(AB) = det(Ek . . . E1 B) by definition of A
= det(Ek ) . . . det(E1 ) det(B) by Theorem 8.2.5
= det(A) det(B). using (8.3.4)
Next assume that A is not invertible, so det(A) = 0. We must show det(AB) =
0. Now apply (1.3.3) to A′ with bottom row equal to 0. Matrix multiplication
shows that the bottom row of A′B is 0, so by Property (7), det(A′B) = 0. So
using Theorem 8.2.5 again
0 = det(A′B) as noted
= det(Ek . . . E1 AB) by definition of A′
= det(Ek ) . . . det(E1 ) det(AB) Th. 8.2.5 applied to AB
Since the det(Ei ) are non-zero, this forces det(AB) = 0, and we are done.
As a trivial corollary we get, when A is invertible, so A−1 A = I:
\[
\det(A^{-1}) = \frac{1}{\det(A)}.
\]
These last two results are proved in Lang [4] p.172: Theorem 7.3 and Corollary
7.4 of Chapter VI.
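A quick numerical spot-check of Theorem 8.3.3 and its corollary (the example matrices are random, ours):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # True
```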
8.3.5 Remark. Theorem 8.3.3 says that the map which associates to any n × n
matrix its determinant has a special property. For those who know group theory
this can be expressed as follows. Look at the set of all invertible n × n matrices,
usually denoted Gl(n) as we mentioned in §2.3. Now Gl(n) is a group for matrix
multiplication, with I as its neutral element. As we already know, the determinant
of an invertible matrix is non-zero, so when restricted to Gl(n) the determinant
function maps to K ∗ which denotes the non zero elements of the field K. Now
K ∗ is a group for multiplication, as you should check. Theorem 8.3.3 says that the
determinant function preserves multiplication, i.e., det(AB) = det(A) det(B).
Maps that do this are called group homomorphisms and have wonderful properties
that you can learn about in an abstract algebra book such as [1]. In particular it
implies that the kernel of the determinant function, meaning the matrices that have
determinant equal to the neutral element of K ∗ , namely 1, form a subgroup of
Gl(n), called Sl(n). Lang studies Sl(n) in Appendix II.
det(At ) = det(A)
Proof. Indeed computing the expansion by minors of At along columns is the same
as computing that of A along rows, so the computation will give us det(A) both
ways. The key point is again Remark 8.2.4.
This is Lang’s Theorem 7.5 p.172. He proves it using the expansion formula
for the determinant in terms of permutations: see (9.1.3).
Chapter 9
The point of this chapter is to clarify Lang’s key lemma 7.1 and the material that
precedes it in §7 of [4]. A good reference for this material is Hoffman and Kunze
[2], §5.3.
These notes can be read before reading about permutations in Lang’s Chapter
VI, §6. We do assume a knowledge of Lang [4], §2-3 of Chapter VI: the notation
from there will be used.
What does the summation mean? We need to take all possible choices of
k1 , k2 , . . . , kn between 1 and n. It looks like we have n^n terms.
Now we use the hypothesis that F is alternating. Then
F (ek1 , ek2 , . . . , ekn ) = 0
unless all of the indices are distinct: otherwise the matrix has two equal columns.
This is how permutations enter the picture. When k1 , k2 , . . . , kn are distinct, then
the mapping
1 → k1 , 2 → k2 , . . . , n → kn
is a permutation of Jn = {1, . . . , n}. Our sum is over all permutations.
Now at last we assume Property (3), so that F (I) = 1. Then
F (ek1 , ek2 , . . . , ekn ) = ±1,
because we have the columns of the identity matrix, in a different order. At each
interchange of two columns, the sign of F changes. So we get 1 if we need an even
number of interchanges, and −1 if we need an odd number of interchanges.
The number F (ek1 , ek2 , . . . , ekn ) is what is called the sign of the permu-
tation. Lang writes it ε(σ).
Thus if we write the permutations σ, we have proved
\[
F(A) = \sum_{\sigma} \varepsilon(\sigma)\, a_{\sigma(1),1} a_{\sigma(2),2} \cdots a_{\sigma(n),n} \tag{9.1.3}
\]
where the sum is over all permutations of Jn . This is a special case of the formula
in the key Lemma 7.1, and gives us Theorem 7.2 directly.
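Formula (9.1.3) can be coded directly; the sketch below (ours) computes the sign of a permutation by counting inversions and compares the sum over all permutations with NumPy's determinant. It is only practical for small n, since there are n! terms.

```python
import numpy as np
from itertools import permutations

def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    """Formula (9.1.3): sum over sigma of eps(sigma) a_{sigma(1),1} ... a_{sigma(n),n}."""
    n = A.shape[0]
    return sum(sign(s) * np.prod([A[s[j], j] for j in range(n)])
               for s in permutations(range(n)))

A = np.random.default_rng(4).standard_normal((5, 5))
print(np.isclose(det_by_permutations(A), np.linalg.det(A)))   # True
```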
Now we compute F (C) by expanding the columns using these equations and
multilinearity, exactly as we did in the previous section. So C plays the role of A,
the bij replace the aij , and the Ai replace the ei .
Now we assume F is also alternating. We will not make use of Property (3)
until we mention it. By the same argument as before we get the analog of (9.1.2):
\[
F(C) = \sum_{k_1,\dots,k_n=1}^{n} b_{k_1 1} b_{k_2 2} \cdots b_{k_n n}\, F(A^{k_1}, A^{k_2}, \dots, A^{k_n}).
\]
which is the precise analog of the expression in Lang’s Lemma 7.1. Using (9.1.3)
for B, we get:
F (AB) = F (A) det(B).
Finally if we assume that F satisfies Property (3), by the result of the previous
section we know that F is the determinant, so we get the major theorem:
Chapter 10

The Companion Matrix

This important matrix is not discussed in [4], but it provides a good introduction to
the notion of a cyclic vector, which becomes important in [4], XI, §4.
10.1 Introduction
We start with a monic polynomial of degree n:
with coefficients in a field K. Monic just means that the coefficient of the leading
term t^n is 1.
We associate to this polynomial an n × n matrix known as its companion matrix:
\[
A = \begin{pmatrix}
0 & 0 & 0 & \dots & 0 & 0 & a_0 \\
1 & 0 & 0 & \dots & 0 & 0 & a_1 \\
0 & 1 & 0 & \dots & 0 & 0 & a_2 \\
0 & 0 & 1 & \dots & 0 & 0 & a_3 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & 0 & \dots & 1 & 0 & a_{n-2} \\
0 & 0 & 0 & \dots & 0 & 1 & a_{n-1}
\end{pmatrix} \tag{10.1.2}
\]
Thus the last column of A has the coefficients of f in increasing order, omitting
the coefficient 1 of the leading term, while the subdiagonal of A, namely the terms
ai+1,i are all equal to 1. All other terms are 0.
We compute the characteristic polynomial det(tI − A) of the companion matrix
by expansion along the first row, and induction on n. First we do the case n = 2.
The determinant we need is
\[
\begin{vmatrix} t & -a_0 \\ -1 & t - a_1 \end{vmatrix}
= t(t - a_1) - a_0 = t^2 - a_1 t - a_0,
\]
as required.
Now we do the case n. By Laplace expansion of the determinant along the first
row we get two terms:
\[
t\,
\begin{vmatrix}
t & 0 & \dots & 0 & -a_1 \\
-1 & t & \dots & 0 & -a_2 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & \dots & -1 & t & -a_{n-2} \\
0 & \dots & 0 & -1 & t - a_{n-1}
\end{vmatrix}
\;+\; a_0(-1)^{n}
\begin{vmatrix}
-1 & t & \dots & 0 \\
0 & -1 & \ddots & \vdots \\
\vdots & & \ddots & t \\
0 & 0 & \dots & -1
\end{vmatrix}
\]
The first term is t times the determinant of the matrix of the same form built from
a1 , . . . , an−1 , so by induction it equals t(t^{n-1} − a_{n-1}t^{n-2} − · · · − a_1),
while the second term gives a0 (−1)n (−1)n−1 = −a0 , since the matrix is triangu-
lar with −1 along the diagonal.
Thus we do get the polynomial (10.1.1) as characteristic polynomial of (10.1.2).
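A numerical check of this claim (a sketch; the display (10.1.1) is not reproduced above, so the code assumes, consistently with the 2 × 2 case, that f(t) = t^n − a_{n−1}t^{n−1} − · · · − a_1 t − a_0):

```python
import numpy as np

def companion(a):
    """Companion matrix (10.1.2) of the coefficient list a = [a0, a1, ..., a_{n-1}]."""
    n = len(a)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)     # 1s on the subdiagonal
    A[:, -1] = a                   # last column holds a0, ..., a_{n-1}
    return A

a = [2., -1., 3., 5.]              # arbitrary example coefficients
A = companion(a)

# np.poly(A) returns the coefficients of det(tI - A), highest degree first.
# Expected: t^4 - 5 t^3 - 3 t^2 + t - 2, i.e. [1, -a3, -a2, -a1, -a0].
expected = np.concatenate(([1.], -np.array(a[::-1])))
print(np.allclose(np.poly(A), expected))   # True
```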
Let L = LA be the linear map with matrix A. Then
L(e1 ) = e2 , L(e2 ) = e3 , . . . , L(en−1 ) = en , and L(en ) = a0 e1 + a1 e2 + · · · + an−1 en ,
since e2 is the first column of A, e3 the second column of A, en the (n−1)-th column
of A, and a0 e1 + a1 e2 + · · · + an−1 en the last column of A.
This means that e1 is a cyclic vector for L: as we apply powers of the operator
L to e1 , we generate a basis of the vector space K n ; in other words the vectors e1 ,
L(e1 ), L2 (e1 ), . . . , Ln−1 (e1 ) are linearly independent.
Conversely,
10.2.1 Theorem. Assume that the operator L : V → V has no invariant sub-
spaces, meaning that there is no proper non-zero subspace W ⊂ V such that L(W ) ⊂ W .
Pick any non-zero vector v ∈ V . Then the vectors v, L(v), L2 (v), . . . , Ln−1 (v)
are linearly independent, and therefore form a basis of V .
Proof. To establish linear independence we show that there is no equation of linear
dependence between v, L(v), L2 (v), . . . , Ln−1 (v). By contradiction, let k be the
smallest positive integer such that there is an equation of dependence
The minimality of k means that bk ≠ 0, so that we can solve for Lk (v) in terms of
the previous basis elements. But this gives us an invariant subspace for L: the one
generated by v, L(v), L2 (v), . . . , Lk (v), a contradiction.
The matrix of L in the basis {v, L(v), L2 (v), . . . , Ln−1 (v)} is the companion
matrix of the characteristic polynomial of L.
are linearly independent. If we use these as a basis, in this order, we see that
Thus it is lower triangular, with α along the diagonal, and 1 on the subdiago-
nal. Because Lang takes the reverse order for the elements of the basis, he gets the
transpose of this matrix: see p. 263. This confirms that the characteristic polyno-
mial of this matrix is (t − α)n .
Finally we ask for the eigenvectors of (10.3.1). It is an exercise ([4], XI, §6,
exercise 2) to show that there is only one eigenvector: wn−1 .
The trace of this matrix is a + d and its determinant ad − bc. Notice how they
appear in the characteristic polynomial. For this to be irreducible over R, by the
quadratic formula we must have (a + d)2 − 4(ad − bc) < 0.
The full value of the companion matrix only reveals itself when one takes
smaller subfields of C, for example the field of rational numbers Q. Over such
a field there are irreducible polynomials of arbitrary high degree: for example the
cyclotomic polynomial
Φp (t) = t^{p−1} + t^{p−2} + · · · + t + 1
for p a prime number. Since (t − 1)Φp (t) = t^p − 1, the roots of Φp (t) are complex
numbers on the circle of radius 1, thus certainly not rational. It is a bit harder
to show that Φp (t) is irreducible, but it only requires the elementary theory of
polynomials in one variable. A good reference is Steven H. Weintraub’s paper
Several Proofs of the Irreducibility of the Cyclotomic Polynomial.
Chapter 11
Real symmetric matrices come up in all areas of mathematics and applied mathe-
matics. It is useful to develop easy tests to determine when a real symmetric matrix
is positive definite or positive semidefinite.
The material in Lang [4] on positive (semi)definite matrices is scattered in ex-
ercises in Chapter VII, §1 and in Chapter VIII, §4. The goal of this chapter is to
bring it all together, and to establish a few other important results in this direction.
11.1 Introduction
Throughout this chapter V will denote a real vector space of dimension n with a
positive definite scalar product written hv, wi as usual. By the Gram-Schmidt or-
thonormalization process (Lang V.2) this guarantees that there exists an orthonor-
mal basis B = {v1 , v2 , . . . , vn } for V .
Recall that a symmetric operator T on V satisfies
⟨T (v), w⟩ = ⟨v, T (w)⟩
11.1.3 Remark. Lang uses the term semipositive instead of positive semidefinite:
Exercise 10 of VII §1 and Exercise 4 of VIII §4. Here we will use positive semidef-
inite, since that is the term commonly used.
Lang also defines a positive definite matrix A on R^n with the standard dot product. It is a real n × n symmetric matrix A such that for all coordinate vectors X ≠ O, X^t AX > 0. See Exercise 9 of VII §1. In the same way we can define a
positive semidefinite matrix.
Here is an important fact that is only stated explicitly by Lang in Exercise 15 b) of VIII §4:
11.1.4 Proposition. Let A be the matrix of the symmetric operator T in the orthonormal basis B, and let X and Y be the coordinate vectors of v and w in B. Then
$$\langle T(v), w \rangle = X^t A Y,$$
so that T is positive (semi)definite if and only if the matrix A is positive (semi)definite.
This follows immediately from the results of Chapter IV, §3 and the definitions.
Next it is useful to see how the matrix A changes as one varies the basis of
V. Following the notation of Lang V.4, we write g(v, w) = ⟨T(v), w⟩. This is a
symmetric bilinear form: see Lang, V.4, Exercises 1 and 2. We have just seen that
for an orthonormal basis B of V ,
g(v, w) = X^t A Y
where X and Y are the coordinate vectors of v and w in that basis. Now consider another basis B'. By Lang's Corollary 3.2 of Chapter IV, if U is the invertible change of basis matrix M_{B'}^{B}(id), and X' and Y' are the coordinate vectors of v and w in the B' basis, then
$$X = U X' \quad\text{and}\quad Y = U Y'.$$
Thus if A' denotes the matrix for g(v, w) in the B' basis, we must have
$$g(v, w) = X^t A Y = (X')^t U^t A U\, Y' = (X')^t A' Y',$$
so that
$$A' = U^t A U. \tag{11.1.5}$$
11.1.6 Definition. Two symmetric matrices A and A' related by (11.1.5) for some invertible matrix U are called congruent.
11.1.7 Exercise. Show that two congruent matrices represent the same symmetric
bilinear form on V in different bases. Thus they have the same index of positiv-
ity and same index of nullity. In particular, if one is positive definite or positive
semidefinite, the other is also. See Lang V.8.
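A small numerical illustration of Exercise 11.1.7 (the specific 3 × 3 matrix, the unit upper triangular U, and the use of NumPy are my own choices): congruent matrices have different eigenvalues in general, but the signs agree, so positive definiteness is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                 # a positive definite example
U = np.eye(3) + np.triu(rng.standard_normal((3, 3)), k=1)   # invertible
A_prime = U.T @ A @ U                           # congruent to A, as in (11.1.5)

print(np.linalg.eigvalsh(A))                    # all positive
print(np.linalg.eigvalsh(A_prime))              # different values, still all positive
```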
11.2.1 Theorem. The following conditions on the symmetric operator T are equiv-
alent.
1. T is positive (semi)definite.
2. The eigenvalues of T are all positive (nonnegative).
3. There exists an operator S on V such that T = S^t S; S is invertible if and only if T is positive definite.
4. The index of negativity of T is 0. If T is positive definite, the index of nullity is also 0.
Proof. For (2) ⇒ (1), take an orthonormal basis {v_1, ..., v_n} of eigenvectors of T with eigenvalues λ_1, ..., λ_n, and write an arbitrary v ∈ V as
$$v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n.$$
Then ⟨T(v), v⟩ = λ_1 c_1^2 + · · · + λ_n c_n^2, which is positive (nonnegative) for v ≠ O when all the λ_i are positive (nonnegative). For (2) ⇒ (3), define S on the same basis by S(v_i) = √λ_i v_i; then S is symmetric and S^t S = S^2 = T.
S is called the square root of T . Note that a positive semidefinite operator also has
a square root, but it is invertible only if T is positive definite.
(3) ⇒ (1) is a special case of Theorem 7.3.3. We reprove it in this context in the language of operators. We take any operator S on V. Then S^t S is symmetric because ⟨S^t S(v), w⟩ = ⟨S(v), S(w)⟩ = ⟨v, S^t S(w)⟩, and it is positive semidefinite because ⟨S^t S(v), v⟩ = ⟨S(v), S(v)⟩ ≥ 0, with equality only if S(v) = O. In particular, if S is invertible, then S^t S is positive definite.
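Here is a short NumPy sketch of criteria (2) and (3) for one small matrix (the matrix and the spectral construction of the square root are my own illustrative choices; the theorem itself does not prescribe this construction).

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                      # symmetric

# Criterion (2): the eigenvalues are all positive.
print(np.linalg.eigvalsh(A))

# Criterion (3): build a square root S with A = S^t S from the
# spectral decomposition A = Q diag(lam) Q^t.
lam, Q = np.linalg.eigh(A)
S = Q @ np.diag(np.sqrt(lam)) @ Q.T             # symmetric square root of A
print(np.allclose(S.T @ S, A))                  # True
print(np.all(np.linalg.eigvalsh(S) > 0))        # True: S invertible, A definite
```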
There are two other useful criteria. The purpose of the rest of this chapter is to
explain them. They are not mentioned in Lang, but follow easily from earlier results.
11.3.3 Remark. Lang looks at certain minors in VI.2, while studying the Laplace
expansion along a row or column. See for example the discussion on p.148. He
does not use the word minor, but his notation A_{ij} denotes the submatrix of size (n − 1) × (n − 1) where the i-th row and the j-th column have been removed. Then the det A_{ij} are minors, and the det A_{ii} are principal minors.
then
$$A(1, 2) = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}, \quad
A(2, 3) = \begin{pmatrix} 4 & 3 \\ 3 & 4 \end{pmatrix}, \quad\text{and}\quad
A(1, 2, 3) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 4 & 3 \\ 0 & 3 & 4 \end{pmatrix}.$$
$$p_j = \sum_J \det A_J,$$
where the sum is over all choices of j elements J = {i_1, ..., i_j} from the set of the first n integers, and A_J is the corresponding principal submatrix of A. Thus the sum has $\binom{n}{j}$ terms.
• If j = n, then there is only one choice for J: all the integers between 1 and
n. The theorem then says that pn = det A, which is clear: just evaluate at
t = 0.
• If j = 1, we know that p_1 is the trace of the matrix. The sets J have just one element, so the theorem says
$$p_1 = a_{11} + a_{22} + \cdots + a_{nn} = \operatorname{tr} A.$$
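As a numerical check (not in Lang; the 3 × 3 matrix below reuses the entries of A(1, 2, 3) from the example above, and np.poly is NumPy's legacy characteristic-polynomial helper), the signed coefficients of det(tI − A) indeed match the sums of principal minors.

```python
import numpy as np
from itertools import combinations

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 4.0, 3.0],
              [0.0, 3.0, 4.0]])
n = A.shape[0]

# np.poly(A) returns det(tI - A) = t^3 - p1 t^2 + p2 t - p3 as [1, -p1, p2, -p3].
c = np.poly(A)
p = [(-1) ** j * c[j] for j in range(1, n + 1)]

# Each p_j should equal the sum of the j x j principal minors of A.
for j in range(1, n + 1):
    s = sum(np.linalg.det(A[np.ix_(J, J)]) for J in combinations(range(n), j))
    print(j, round(p[j - 1], 6), round(s, 6))   # the two columns agree
```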
Proof. To do the general case, we use the expansion of the determinant in terms
of permutations in Theorem 9.1.3. We consider the entries aij of the matrix A as
variables.
To pass from the determinant to the characteristic polynomial, we make the following substitutions: we replace each off-diagonal term a_{ij} by −a_{ij} and each diagonal term a_{ii} by t − a_{ii}, where t is a new variable. How do we write this substitution in the term
$$\epsilon(\sigma)\, a_{\sigma(1),1}\, a_{\sigma(2),2} \cdots a_{\sigma(n),n}$$
of the determinant expansion? Let f be the number of integers i fixed by σ, meaning that σ(i) = i, and let the fixed integers be {j_1, ..., j_f}. Designate the remaining n − f integers in [1, n] by {i_1, ..., i_{n−f}}. Then the term associated to σ in the characteristic polynomial can be written
$$\epsilon(\sigma) \prod_{l=1}^{f} (t - a_{j_l, j_l}) \prod_{l=1}^{n-f} (-a_{\sigma(i_l), i_l}). \tag{11.3.7}$$
11.4.1 Theorem. A symmetric matrix A is positive definite if and only if all its
leading principal minors are positive. It is positive semidefinite if and only if all its
principal minors are non-negative.
Notice the subtle difference between the two cases: to establish that A is pos-
itive semidefinite, you need to check all the principal minors, not just the leading
ones.
Consider for example the 2 × 2 matrix
$$\begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}.$$
The leading principal minors of this matrix are both 0, and yet it is obviously not positive semidefinite, since its eigenvalues (0 and −1) are not both non-negative.
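The contrast between the two tests is easy to see numerically. In the sketch below (my own example matrices, with NumPy), B passes the leading-minor test and is positive definite, while the 2 × 2 matrix just discussed has both leading minors equal to 0 yet is not positive semidefinite.

```python
import numpy as np

def leading_minors(A):
    """Leading principal minors D_1, ..., D_n of a square matrix."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

B = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(leading_minors(B), np.linalg.eigvalsh(B))   # D_k > 0 and eigenvalues > 0

C = np.array([[0.0, 0.0],
              [0.0, -1.0]])
print(leading_minors(C), np.linalg.eigvalsh(C))   # D_1 = D_2 = 0, eigenvalue -1 < 0
```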
Since
$$A_J = \begin{pmatrix} a_{22} & a_{24} \\ a_{42} & a_{44} \end{pmatrix},$$
you can easily verify (11.4.4) in this case.
A weaker result implied by this corollary is useful when just scanning the ma-
trix.
11.4.10 Corollary. If A is positive definite, the entry of largest absolute value must be on the diagonal.
Now we return to the proof of the main theorem. In the positive definite case it remains to show that if all the leading principal minors of the matrix A are positive, then A is positive definite. Here is the strategy. From Exercise 11.1.7, if U is any invertible n × n matrix, then the symmetric matrix A is positive definite if and only if U^t AU is positive definite. Then we have the following obvious fact for diagonal matrices, which guides us.
11.4.11 Proposition. A diagonal matrix is positive definite if and only if its diagonal entries are all positive, which happens if and only if its leading principal minors are all positive.
Proof. Indeed, if the diagonal entries are {d_1, d_2, ..., d_n}, the leading principal minors are D_1 = d_1, D_2 = d_1 d_2, ..., D_n = d_1 d_2 · · · d_n. So the positivity of the d_i is equivalent to that of the D_i.
Thus a11 = D1 > 0, so we can use it to clear all the entries of the first column of
the matrix A below a11 . In other words we left-multiply A by the invertible matrix
U_1 which has 1's along the diagonal and whose only other non-zero entries are in the first column:
$$U_1 = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\frac{a_{21}}{a_{11}} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{a_{n1}}{a_{11}} & 0 & \cdots & 1
\end{pmatrix}.$$
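Here is what one elimination step looks like numerically (the 3 × 3 matrix is my own example; the text goes on to set A^{(1)} = U_1 A U_1^t, which is what the last line computes): conjugating by U_1 clears the first row and column outside the (1,1) entry while keeping the matrix symmetric.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.0, 4.0]])          # leading principal minors 2, 5, 17

n = A.shape[0]
U1 = np.eye(n)
U1[1:, 0] = -A[1:, 0] / A[0, 0]          # the matrix U_1 displayed above

A1 = U1 @ A @ U1.T                       # symmetric, first row and column cleared
print(A1.round(6))
```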
Define A^{(2)} = U_2 A^{(1)} U_2^t. It is symmetric, with zeroes in the first two rows and columns except at a^{(2)}_{11} = a_{11} and a^{(2)}_{22} = a^{(1)}_{22}. Since A^{(2)} is obtained from A^{(1)} using row and column operations involving the second row and second column, we have
$$D_k^{(2)} = D_k^{(1)} > 0.$$
Now from partial diagonalization
$$D_3^{(2)} = a_{11}\, a^{(1)}_{22}\, a^{(2)}_{33},$$
so a^{(2)}_{33} > 0. Continuing in this way, we get a diagonal matrix A^{(n−1)} with positive diagonal elements, obtained from A by left-multiplying by
$$U = U_{n-1} U_{n-2} \cdots U_1$$
and right-multiplying by U^t.
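The whole reduction, and the claim that the leading principal minors are unchanged at each stage, can be checked directly. Below is a minimal NumPy sketch under the assumption that every pivot encountered is non-zero, which holds when all the D_k are positive; the example matrix is the same one used above.

```python
import numpy as np

def symmetric_reduce(A):
    """Return (D, U) with D = U A U^t diagonal and U lower triangular
    with 1's on the diagonal.  Assumes every pivot encountered is non-zero."""
    n = A.shape[0]
    D = A.astype(float).copy()
    U = np.eye(n)
    for k in range(n - 1):
        Uk = np.eye(n)
        Uk[k + 1:, k] = -D[k + 1:, k] / D[k, k]
        D = Uk @ D @ Uk.T
        U = Uk @ U
    return D, U

def leading_minors(M):
    return [np.linalg.det(M[:k, :k]) for k in range(1, M.shape[0] + 1)]

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.0, 4.0]])
D, U = symmetric_reduce(A)
print(np.diag(D).round(6))                 # positive diagonal entries
print(np.round(leading_minors(A), 6))      # 2, 5, 17
print(np.round(leading_minors(D), 6))      # the same: leading minors unchanged
```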
More generally, the proof implies a result that is interesting in its own right.
Proof. As before, this immediately follows from the fact that if you add to a row
(or column) of a square matrix a multiple of another row (or another column),
then the determinant of the matrix does not change. Just apply this to the leading
principal minors.
11.4.14 Exercise. State the result concerning negative definite matrices that is anal-
ogous to the main theorem, noting that Proposition 11.4.13 applies. Do the same
for Theorem 11.2.1.
We now finish the proof of the main theorem in the positive definite case.
Assume that A is a symmetric matrix whose leading principal minors Dk are
all positive. Proposition 11.4.12 tells us that A can be diagonalized to a matrix A^{(n−1)} = U A U^t by a lower triangular matrix U with 1's on the diagonal. The diagonal matrix A^{(n−1)} obtained has all its diagonal entries a_{11}, a^{(1)}_{22}, ..., a^{(n−1)}_{nn} positive, so it is positive definite by the easy Proposition 11.4.11. By Exercise 11.1.7, A is positive definite, so we are done.
We now prove Theorem 11.4.1 in the positive semidefinite case. Proposition
11.4.6 establishes one of the implications. For the other implication we build on
Lang’s Exercise 13 of VII.1. Use the proof of this exercise in [5] to prove:
Proof. We first note that all the roots of P(t) are non-negative, using only the non-negativity of the p_i. Assume we have a negative root λ. Then all the terms of P(λ) have the same sign, meaning that if n is even, all the terms are non-negative,
while if n is odd, all the terms are non-positive. Since the leading term λ^n is non-zero, this is a contradiction. Thus all the roots are non-negative, and A is therefore positive semidefinite by Theorem 11.2.1, (2). If the p_i are all positive (in fact if a
single one of them is positive) then the polynomial cannot have 0 as a root, so by
the same criterion A is positive definite.
11.5.1 Theorem. A is positive definite if and only if all the pi , 1 ≤ i ≤ n, are pos-
itive. A is positive semidefinite if and only if all the pi , 1 ≤ i ≤ n, are nonnegative.
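A direct implementation of this test (my own sketch, again using NumPy's legacy np.poly helper for the characteristic polynomial; the example matrices repeat earlier ones):

```python
import numpy as np

def char_poly_test(A):
    """Coefficients p_1, ..., p_n of det(tI - A) written as
    t^n - p_1 t^(n-1) + p_2 t^(n-2) - ... + (-1)^n p_n."""
    c = np.poly(A)                        # [1, -p_1, p_2, -p_3, ...]
    return [(-1) ** j * c[j] for j in range(1, A.shape[0] + 1)]

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 4.0, 3.0],
              [0.0, 3.0, 4.0]])
print(np.round(char_poly_test(A), 6))     # all positive: A is positive definite

B = np.array([[0.0, 0.0],
              [0.0, -1.0]])
print(np.round(char_poly_test(B), 6))     # p_1 = -1 < 0: B is not semidefinite
```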
Chapter 12
Homework Solutions
This chapter contains solutions of a few exercises in Lang, where the solution man-
ual is too terse and where the method of solution is as important as the solution
itself.
where the mi are the multiplicities of the eigenvalues. The goal of the exercise is
to show that the minimal polynomial of A is
$$\mu(t) = (t - \alpha_1)(t - \alpha_2) \cdots (t - \alpha_r).$$
Bg(D)B^{-1} = O, and therefore
$$g(D) = O. \tag{12.1.1}$$
Since g(D) is the diagonal matrix with diagonal entries g(d_1), ..., g(d_n), this means that
$$g(d_i) = 0 \quad \text{for } i = 1, \ldots, n. \tag{12.1.2}$$
But g(t) has only k roots, and there are r > k distinct numbers among the d_i. So (12.1.2) is impossible, and the minimal degree for g is r.
Finally you may ask, could there be a polynomial g(t) of degree r that vanishes
on A, other than µ(t)? No: if there were, we could divide µ(t) by g(t). If there is
no remainder, the polynomials are the same. If there is a remainder, it has degree
< r and vanishes on A: this is impossible as we have just established.
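A tiny numerical illustration of the conclusion (the diagonal matrix with eigenvalues 1, 1, 2 is my own example): the product of the distinct linear factors already annihilates A, while a polynomial missing one of the eigenvalues does not.

```python
import numpy as np

# A diagonalizable matrix with eigenvalues 1 (multiplicity 2) and 2.
A = np.diag([1.0, 1.0, 2.0])
I = np.eye(3)

mu_of_A = (A - 1 * I) @ (A - 2 * I)       # mu(t) = (t - 1)(t - 2) evaluated at A
print(np.allclose(mu_of_A, 0))            # True: mu(A) = O

g_of_A = A - 2 * I                        # g(t) = t - 2 misses the eigenvalue 1
print(np.allclose(g_of_A, 0))             # False: g does not vanish on A
```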
A.3 Matrices
Matrices are written with parentheses. Matrices are denoted by capital roman
letters such as A, and have as entries the corresponding lower case letter. So
A = (aij ). A is an m × n matrix if it has m rows and n columns, so 1 ≤ i ≤ m
and 1 ≤ j ≤ n. We write the columns of A as A^j and the rows as A_i, following Lang.
A^t is the transpose of the matrix A. This differs from Lang.
D(d_1, d_2, ..., d_n) is the n × n diagonal matrix
$$\begin{pmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{pmatrix}.$$
[1] Michael Artin, Algebra, First Edition, Prentice Hall, New York, 1991. ↑vi, 10, 14, 56, 58, 60
[2] Kenneth Hoffman and Ray Kunze, Linear Algebra, Second Edition, Prentice Hall, Englewood Cliffs, NJ, 1971. ↑61
[3] Serge Lang, Introduction to Linear Algebra, Second Edition, Springer, New York, 1997. ↑1, 3,
7, 56
[4] Serge Lang, Linear Algebra, Third Edition, Springer, New York, 1987. ↑v, vi, 1, 7, 9, 10, 11, 12, 19, 21, 39, 56, 57, 59, 61, 64, 66, 67, 69, 81
[5] Rami Shakarchi, Solutions Manual for Lang's Linear Algebra, Springer, New York, 1996. ↑vi, 80, 81, 82
[6] Gilbert Strang, Linear Algebra and its Applications, Third Edition, Harcourt Brace Jovanovich,
San Diego, CA, 1988. ↑46
[7] Robert J. Valenza, Linear Algebra: An Introduction to Abstract Mathematics, Undergraduate
Texts in Mathematics, Springer, New York, 1993. ↑vi