LangCommentary
Henry C. Pinkham
© 2013 by Henry C. Pinkham
All rights reserved
Contents

1 Gaussian Elimination
  1.1 Row Operations
  1.2 Elementary Matrices
  1.3 Elimination via Matrices
2 Matrix Inverses
  2.1 Background
  2.2 More Elimination via Matrices
  2.3 Invertible Matrices
6 Duality
  6.1 Non-degenerate Scalar Products
  6.2 The Dual Basis
  6.3 The Orthogonal Complement
  6.4 The Double Dual
  6.5 The Dual Map
7 Orthogonal Projection
  7.1 Orthogonal Projection
  7.2 Solving the Inhomogeneous System
  7.3 Solving the Inconsistent Inhomogeneous System
8 Uniqueness of Determinants
  8.1 The Properties of Expansion by Minors
  8.2 The Determinant of an Elementary Matrix
  8.3 Uniqueness
  8.4 The Determinant of the Transpose
12 Homework Solutions
  12.1 Chapter XI, §3
  12.2 Chapter XI, §4
References
Preface
These notes were written to complement and supplement Lang’s Linear Algebra
[4] as a textbook in an Honors Linear Algebra class at Columbia University. The
students in the class were gifted but had limited exposure to linear algebra. As Lang
says in his introduction, his book is not meant as a substitute for an elementary
text. The book is intended for students having had an elementary course in linear
algebra. However, by spending a week on Gaussian elimination after covering
the second chapter of [4], it was possible to make the book work in this class. I
had spent a fair amount of time looking for an appropriate textbook, and I could
find nothing that seemed more appropriate for budding mathematics majors than
Lang’s book. He restricts his ground field to the real and complex numbers, which
is a reasonable compromise.
The book has many strengths. No assumptions are made: everything is defined.
The first chapter presents the rather tricky theory of dimension of vector spaces
admirably. The next two chapters are remarkably clear and efficient in presenting
matrices and linear maps, so one has the two key ingredients of linear algebra,
the allowable objects and the maps between them, quite concisely, but with many
examples of linear maps in the exercises. The presentation of determinants is good,
and eigenvectors and eigenvalues are well handled. Hermitian forms and hermitian
and unitary matrices are well covered, on a par with the corresponding concepts
over the real numbers. Decomposition of a linear operator on a vector space is
done using the fact that a polynomial ring over a field is a Euclidean domain, and
therefore a principal ideal domain. These concepts are defined from scratch, and
the proofs presented very concisely, again. The last chapter covers the elementary
theory of convex sets: a beautiful topic if one has the time to cover it. Advanced
students will enjoy reading Appendix II on the Iwasawa Decomposition. A motto
for the book might be:
A little thinking about the abstract principles underlying linear algebra
can avoid a lot of computation.
Many of the sections in the book have excellent exercises. The most remarkable
CONTENTS vi
ones are in §II.3, III.3, III.4, VI.3, VII.1, VII.2, VII.3, VIII.3, VIII.4, XI.3, XI.6.
A few of the sections have no exercises at all, and the remainder have the standard set.
There is a solutions manual by Shakarchi [5]. It has answers to all the exercises in
the book, and good explanations for some of the more difficult ones. I have added
some clarification to a few of the exercise solutions.
A detailed study of the book revealed flaws that go beyond the missing material
on Gaussian elimination. The biggest problems are in Chapters IV and V. Certain
of the key passages in Chapter IV are barely comprehensible, especially in §3. Ad-
mittedly this is the hardest material in the book, and is often omitted in elementary
texts. In Chapter V there are problems of a different nature: because Lang does
not use Gaussian elimination he is forced to resort to more complicated arguments
to prove that row rank is equal to column rank, for example. While the coverage
of duality is good, it is incomplete. The same is true for projections. The coverage
of positive definite matrices over the reals misses the tests that are easiest to apply,
again because they require Gaussian elimination. In his quest for efficiency in the
definition of determinants, he produces a key lemma that has to do too much work.
While he covers some of the abstract algebra he needs, for my purposes he
could have done more: cover elementary group theory and spend more time on
permutations. One book that does more in this direction is Valenza’s textbook [7].
Its point of view is similar to Lang's, and it covers roughly the same material. Another
is Artin’s Algebra text [1], which starts with a discussion of linear algebra. Both
books work over an arbitrary ground field, which raises the bar a little higher for
the students. Fortunately all the ground work for doing more algebra is laid in
Lang’s text: I was able to add it in my class without difficulty.
Lang covers few applications of linear algebra, with the exception of differen-
tial equations which come up in exercises in Chapter III, §3, and in the body of the
text in Chapters VIII, §1 and §4, and XI, §4.
Finally, a minor problem of the book is that Lang does not refer to standard
concepts and theorems by what is now their established name. The most egre-
gious example is the Rank-Nullity Theorem, which for Lang is just Theorem 3.2
of Chapter III. He calls the Cayley-Hamilton Theorem the Hamilton-Cayley The-
orem. He does not give names to the axioms he uses: associativity, commutativity,
existence of inverses, etc.
Throughout the semester, I wrote notes to complement and clarify the expo-
sition in [4] where needed. Since most of the book is clear and concise, I only
covered a small number of sections.
• I start with an introduction to Gaussian elimination, which I draw on later in
these notes, together with simple extensions of some results.
• I continue this with a chapter on matrix inverses that fills one of the gaps in
Lang’s presentation.
CONTENTS vii
• The next chapter develops some elementary group theory, especially in con-
nection to permutation groups. I also introduce permutation matrices, which
gives a rich source of examples of matrices.
• The next six chapters (4-9) address problems in Lang’s presentation of key
material in his Chapters IV, V and VI.
• In Chapter 10 I discuss the companion matrix, which is not mentioned at all
in Lang, and can be skipped. It is interesting to compare the notion of cyclic
vector it leads to with the different one developed in Lang’s discussion of the
Jordan normal form in [XI, §6].
• The last chapter greatly expands Lang’s presentation of tests for positive
(semi)definite matrices, which are only mentioned in the exercises, and which
omit the ones that are most useful and easiest to apply.
To the extent possible I use the notation and the style of the author. For example
I minimize the use of summations, and give many proofs by induction. One ex-
ception: I use At to denote the transpose of the matrix A. Those notes form the
chapters of this book. There was no time in my class to cover Lang’s Chapter XII
on Convex Sets or Appendix II on the Iwasawa Decomposition, so they are not
mentioned here. Otherwise we covered the entire book in class.
I hope that instructors and students reading Lang’s book find these notes useful.
Comments, corrections, and other suggestions for improving these notes are
welcome. Please email them to me at hcp3@columbia.edu.
Henry C. Pinkham
New York, NY
Draft of May 31, 2013
Chapter 1
Gaussian Elimination
The main topic that is missing in [4], and that needs to be covered in a first linear
algebra class is Gaussian elimination. This can be easily fit into a course using [4]:
after covering [4], Chapter II, spend two lectures covering Gaussian elimination,
including elimination via left multiplication by elementary matrices. This chapter
covers succinctly the material needed. Another source for this material is Lang’s
more elementary text [3].
Ax = b,
Proof. We prove this by induction on the number of rows of the matrix A. If A has
one row, there is nothing to prove.
So assume the result for matrices with m − 1 rows, and let A be a matrix with
m rows. If the matrix is the 0 matrix, we are done. Otherwise consider the first
column Ak of A that has a non-zero element. Then for some i, aik ≠ 0, and for all
j < k, the column Aj is the zero vector.
Since Ak has non-zero entry aik, interchange rows 1 and i of A. So the new
matrix A has a1k ≠ 0. Then if another row Ai of A has k-th entry aik ≠ 0, replace
that row by
\[
A_i - \frac{a_{ik}}{a_{1k}} A_1,
\]
which has k-th entry equal to 0. Repeat this on all rows that have a non-zero
element in column k. The new matrix, which we still call A, has only zeroes in its
first k columns except for a1k.
Now consider the submatrix B of the new matrix A obtained by removing the
first row of A and the first k columns of A. Then we may apply the induction
hypothesis to B and put B in row echelon form by row operations. The key ob-
servation is that these same row operations may be applied to A, and do not affect
the zeroes in the first k columns of A, since the only non-zero there is in row 1 and
that row has been removed from B.
1.1.7 Exercise. Reduce the matrices in Example 1.1.5 that are not already in row
echelon form to row echelon form.
Er (c) := I + (c − 1)Irr
First, since
\[
E_1(c) = \begin{pmatrix} c & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
matrix multiplication gives
\[
E_1(c)A = \begin{pmatrix} ca_{11} & \dots & ca_{1n} \\ a_{21} & \dots & a_{2n} \\ a_{31} & \dots & a_{3n} \end{pmatrix}.
\]
Next since
\[
T_{23} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\]
we get
\[
T_{23}A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ a_{31} & \dots & a_{3n} \\ a_{21} & \dots & a_{2n} \end{pmatrix}.
\]
Finally, since
\[
E_{13}(c) = \begin{pmatrix} 1 & 0 & c \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
we get
\[
E_{13}(c)A = \begin{pmatrix} a_{11} + ca_{31} & \dots & a_{1n} + ca_{3n} \\ a_{21} & \dots & a_{2n} \\ a_{31} & \dots & a_{3n} \end{pmatrix}.
\]
1.2.3 Theorem. All elementary matrices are invertible.
Proof. For each type of elementary matrix E we write down an inverse, namely a
matrix F such that EF = I = F E.
For Ei (c) the inverse is Ei (1/c). Tij is its own inverse. Finally the inverse of
Eij (c) is Eij (−c).
A′ = Ek . . . E1 A.    (1.3.3)
Proof. Using Theorem 1.3.1, we can bring A into row echelon form. Call the new
matrix B. As we noted in Remark 1.1.4 this means that B is upper triangular.
If any diagonal entry bii of B is 0, then the bottom row of B is zero, and we have
reached one of the conclusions of the theorem.
So we may assume that all the diagonal entries of B are non-zero. We now do
what is called backsubstitution to transform B into the identity matrix.
First we left multiply by the elementary matrices Ei(1/bii), for i = 1, . . . , n. This
new matrix, which we still call B, now has 1s along the diagonal.
Next we left multiply by elementary matrices of type (3), where in each case the
constant c is chosen so that a new zero is created in the matrix. The order of these
multiplications is chosen so that no zero created by a multiplication is destroyed
by a subsequent one. At the end of the process we get the identity matrix I, so we
are done.
1.3.4 Exercise. Reduce the matrices in Example 1.1.5 either to a matrix with bot-
tom row zero or to the identity matrix, using left multiplication by elementary matri-
ces.
For example, the first matrix
\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]
backsubstitutes to
\[
\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \text{ then }
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \text{ then }
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
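The whole reduction can be carried out mechanically. Here is a minimal NumPy sketch (not part of Lang or of these notes; the helper names are ours) that builds the three types of elementary matrices of §1.2 and performs the backsubstitution above by left multiplication:

```python
import numpy as np

def E(n, r, c):
    """Type (1): multiply row r by the non-zero scalar c."""
    M = np.eye(n)
    M[r, r] = c
    return M

def T(n, i, j):
    """Type (2): interchange rows i and j."""
    M = np.eye(n)
    M[[i, j]] = M[[j, i]]
    return M

def E_add(n, i, j, c):
    """Type (3): add c times row j to row i."""
    M = np.eye(n)
    M[i, j] = c
    return M

# The first matrix of the backsubstitution example above.
A = np.array([[1., 2., 3.],
              [0., 4., 0.],
              [0., 0., 1.]])

# Scale row 2 by 1/4, then clear the entries above the diagonal,
# working from the last column back so no zero already created is destroyed.
steps = [E(3, 1, 1/4),          # gives (1 2 3 / 0 1 0 / 0 0 1)
         E_add(3, 0, 2, -3.),   # gives (1 2 0 / 0 1 0 / 0 0 1)
         E_add(3, 0, 1, -2.)]   # gives the identity

for S in steps:
    A = S @ A
print(A)   # the 3 x 3 identity matrix
```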
We will use these results on elementary matrices in §2.2 and then throughout
Chapter 8.
Chapter 2
Matrix Inverses
This chapter fills in some gaps in the exposition in Lang, especially the break be-
tween his more elementary text [3] and the textbook [4] we use. There are some
additional results: Proposition 2.2.5 and Theorem 2.2.8. They will be important
later.
2.1 Background
On page 35 of our text [4], Lang gives the definition of an invertible matrix.
2.1.1 Definition. An n × n matrix A is invertible if there exists another n × n matrix
B such that
AB = BA = I. (2.1.2)
We say that B is both a left inverse and a right inverse for A. It is reasonable
to require both since matrix multiplication is not commutative. Then Lang proves:
2.1.3 Theorem. If A has an inverse, then its inverse B is unique.
Proof. Indeed assume there is another matrix C satisfying (2.1.2) when C replaces
B. Then
C = CI = C(AB) = (CA)B = IB = B (2.1.4)
so we are done.
Note that the proof only uses that C is a left inverse and B a right inverse.
We say, simply, that B is the inverse of A and A the inverse of B.
In [4], on p. 86, Chapter IV, §2, in the proof of Theorem 2.2, Lang only estab-
lishes that a certain matrix A has a left inverse, when in fact he needs to show it
has an inverse. We recall this theorem and its proof in 4.1.4. We must prove that
having a left inverse is enough to get an inverse. Indeed, we will establish this in
Theorem 2.2.8.
Proof. This follows easily from a more general result. Let A and B be two invert-
ible n × n matrices, with inverses A−1 and B−1. Then B−1A−1 is the inverse of
AB.
Indeed just compute
B−1A−1AB = B−1IB = B−1B = I
and
ABB−1A−1 = AIA−1 = AA−1 = I.
2.2.5 Proposition. Let A be a square matrix with one row equal to the zero vector.
Then A is not invertible.
AX = O (2.2.6)
has a non-trivial solution X, since one equation is missing, so that the number of
variables is greater than the number of equations. But if A were invertible, we
could multiply (2.2.6) on the left by A−1 to get A−1 AX = A−1 O. This yields
X = O, implying there cannot be a non-trivial solution, a contradiction to the
assumption that A is invertible.
Proposition 5.1.4 generalizes this result to matrices that are not square.
So we get the following useful theorem. The proof is immediate.
2.2.8 Theorem. Let A be a square matrix which has a left inverse B, so that
BA = I. Then A is invertible and B is its inverse.
Similarly, if A has a right inverse B, so that AB = I, the same conclusion
holds.
2.2.9 Remark. The first five chapters of Artin’s book [1] form a nice introduction
to linear algebra at a slightly higher level than Lang’s, with some group theory
thrown in too. The main difference is that Artin allows his base field to be any
field, including a finite field, while Lang only allows R and C. The references to
Artin’s book are from the first edition.
\[
A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]
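As a quick numerical sanity check of this formula, here is a short sketch (the example matrix is ours) comparing it with NumPy's built-in inverse:

```python
import numpy as np

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the (ad - bc) formula above."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return (1.0 / det) * np.array([[d, -b],
                                   [-c, a]])

A = np.array([[2., 1.],
              [5., 3.]])          # det = 1
print(inverse_2x2(A))             # [[ 3. -1.] [-5.  2.]]
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))   # True
```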
In class we studied the motions of the equilateral triangle whose center of gravity is
at the origin of the plane, so that the three vertices are equidistant from the origin.
We set this up so that one side of the triangle is parallel to the x- axis. There are
6 = 3! motions of the triangle permuting the vertices, and they can be realized by
linear transformations, namely multiplication by 2 × 2 matrices.
These notes describe these six matrices and show that they can be multiplied
together without ever leaving them, because they form a group: they are closed
under matrix multiplication. The multiplication is not commutative, however. This
is the simplest example of a noncommutative group, and is worth remembering.
Finally, this leads us naturally to the n×n permutation matrices. We will see in
§3.3 that they form a group. When n = 3 it is isomorphic to the group of motions
of the equilateral triangle. From the point of view of linear algebra, permutations
and permutation matrices are the most important concepts of this chapter. We
will need them when we study determinants.
There are some exercises in this chapter. Lots of good practice in matrix mul-
tiplication.
[Figure: the equilateral triangle with vertices P, Q, R centered at the origin, one side parallel to the x-axis, shown with the x and y axes, the rotation angle θ = 120°, a 30° angle, and the mirror axes, including the left-right mirror.]
\[
R_2 = \begin{pmatrix} -1/2 & \sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix}.
\]
More importantly note that R2 is obtained by repeating R1, in matrix multiplication:
\[
R_2 = R_1^2.
\]
Also recall Lang’s Exercise 23 of Chapter II, §3.
Next let’s find the left-right mirror reflection of the triangle. What is its matrix?
In the y coordinate nothing should change, and in the x coordinate, the sign should
change. The matrix that does this is
\[
M_1 = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.
\]
\[
M_2 = R_1 M_1 = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix}
\quad\text{and}\quad
M_3 = R_2 M_1 = \begin{pmatrix} 1/2 & \sqrt{3}/2 \\ \sqrt{3}/2 & -1/2 \end{pmatrix}.
\]
In the same way, to compute the product M1 R1 just change the sign of the
terms in the first row of R1 .
3.1.2 Exercise. Check that the squares of M2 and M3 are the identity. Describe
each geometrically: what is the invariant line of each one. In other words, if you
input a vector X ∈ R2 , what is M2 X, etc.
3.2 Groups
3.2.1 Exercise. We now have six 2 × 2 matrices, all invertible. Why? The next
exercise will give an answer.
3.2.2 Exercise. Construct a multiplication table for them. List the 6 elements in
the order I, R1 , R2 , M1 , M2 , M3 , both horizontally and vertically, then have the
elements in the table you construct represent the product, in the following order: if
A is the row element, and B is the column element, then put AB in the table. See
Artin [1], chapter II, §1, page 40 for an example, and the last page of these notes
for an unfinished example. The solution is given in Table 3.1. Show that this
multiplication is not commutative, and notice that each row and column contains
all six elements of the group.

Table 3.1:

        I    R1   R2   M1   M2   M3
  I     I    R1   R2   M1   M2   M3
  R1    R1   R2   I    M2   M3   M1
  R2    R2   I    R1   M3   M1   M2
  M1    M1   M3   M2   I    R2   R1
  M2    M2   M1   M3   R1   I    R2
  M3    M3   M2   M1   R2   R1   I
Table 3.1 is the solution to the exercise. The first row and the first column of
the table just give the elements to be multiplied, and the entry in the table gives the
product. Notice that the 6 × 6 table divides into four 3 × 3 subtables, each of which
contains either M elements or R and I elements, but not both.
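The table can also be checked mechanically. The sketch below (ours; the names match the text) builds the six matrices, prints the multiplication table, verifies closure, and confirms that the multiplication is not commutative:

```python
import numpy as np

s = np.sqrt(3) / 2
I  = np.eye(2)
R1 = np.array([[-1/2, -s], [ s, -1/2]])   # rotation by 120 degrees
R2 = R1 @ R1                              # rotation by 240 degrees
M1 = np.array([[-1., 0.], [0., 1.]])      # the left-right mirror
M2 = R1 @ M1
M3 = R2 @ M1

group = {"I": I, "R1": R1, "R2": R2, "M1": M1, "M2": M2, "M3": M3}

def name_of(M):
    """Return the name of the group element equal to M (closure check)."""
    for name, G in group.items():
        if np.allclose(M, G):
            return name
    raise ValueError("product left the set: not closed")

names = ["I", "R1", "R2", "M1", "M2", "M3"]
for a in names:
    print(a, [name_of(group[a] @ group[b]) for b in names])  # compare with Table 3.1

# The multiplication is not commutative:
print(np.allclose(M1 @ R1, R1 @ M1))   # False
```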
What is a group? The formal definition is given in Lang in Appendix II. A group
is a set G with a law of composition satisfying the following axioms.

GR 1. Associativity of the group law: for all u, v and w in G,
(uv)w = u(vw).

GR 2. Existence of a neutral element 1 ∈ G for the group law: for every element
u ∈ G,
1u = u1 = u.

GR 3. Existence of inverses: for every element u ∈ G there is an element u−1 ∈ G
with
uu−1 = u−1 u = 1.
If you look at the first three axioms of a vector space, given in Lang p.3, you
will see that they are identical. Only two changes: the group law is written like
multiplication, while the law + in vector spaces is addition. The neutral element
for vector spaces is O, while here we write it as 1. These are just changes in
notation, not in substance. Note that we lack axiom VP 4, which in our notation
would be written:
GR 4. The group law is commutative: For all u and v in G
uv = vu.
So the group law is not required to be commutative, unlike addition in vector
spaces. A group satisfying GR 4 is called a commutative group. So vector spaces
are commutative groups for their law of addition.
One important property of groups is that cancelation is possible:
3.2.4 Proposition. Given elements u, v and w in G,
1. If uw = vw, then u = v.
2. If wu = wv, then u = v again.
Proof. For (1) multiply the equation on the right by the inverse of w to get
(uw)w−1 = (vw)w−1 by existence of inverse,
u(ww−1 ) = v(ww−1 ) by associativity,
u1 = v1 by properties of inverses,
u=v by property of neutral element.
For (2), multiply by w−1 on the left.
Because vector spaces are groups for addition, this gives a common framework
for solving some of the problems in Lang: for example problems 4 and 5 of [I, §1].
3.2.5 Example. Here are some groups we will encounter.
1. Any vector space for its law of addition.
2. The integers Z, with addition, so the neutral element is 0 and the inverse of
n is −n.
3. Permutations on n elements, where the law is composition of permutations.
4. The group of all invertible n × n matrices, with matrix multiplication as its
law. The neutral element is the identity matrix
\[
I = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix}.
\]
5. The group of all invertible n × n matrices with determinant 1.
Much of what we will do in this course is to prove results that will help us understand
these groups. For example Theorem 8.3.3 clarifies the relationship between
the last two examples. See Remark 8.3.5.
3.3.5 Exercise. Compute the matrix product PQ. Also compute the matrix product
QP.
Solution:
\[
PQ = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\]
while
\[
QP = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
\]
The answers are different, but each one is a permutation matrix.
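These products are easy to reproduce; the sketch below (ours) also checks the hint of Exercise 3.3.7 that the transpose of a permutation matrix is its inverse:

```python
import numpy as np

P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
Q = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

print(P @ Q)                                  # [[0 0 1] [0 1 0] [1 0 0]]
print(Q @ P)                                  # [[1 0 0] [0 0 1] [0 1 0]]
print(np.array_equal(P @ Q, Q @ P))           # False
# Hint of Exercise 3.3.7: the transpose is the inverse.
print(np.array_equal(P @ P.T, np.eye(3, dtype=int)))   # True
```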
3.3.7 Exercise. Write down all 3 × 3 permutation matrices. Show that they are
invertible by finding their inverses. Make sure that your technique generalizes to
n × n permutation matrices. (Hint: transpose)
Here is the solution to these exercises. One key concept is the notion of the
order of an element, namely the smallest power of the element that makes it the
identity. The order of I is obviously 1.
Table 3.1 shows that the order of all three elements Mi is 2. Note that this
means that they are their own inverses. A little extra thought shows that the order
of R1 and R2 is 3: indeed R1 is the rotation by 120 degrees and R2 the rotation by
240 degrees, so applying either one three times gives a rotation by a multiple of 360 degrees.
For our permutation matrices, to have order 2 means to be symmetric, as you
should check. The matrices Q and R are symmetric, but the matrix P is not. P 2 is
the last non-symmetric permutation matrix:
\[
P^2 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.
\]
Chapter 4

Representation of a Linear Transformation by a Matrix
This chapter concerns [4], chapter IV. At the end of the chapter Lang gives the
definition of similarity. Here we expand on this in §4.3. Then there are some
solved exercises.
4.1.2 Exercise. Show that Lang’s Theorem 2.1 is a corollary of Theorem 4.2.7
below.
It is the matrix product BA. All we used is the associativity of matrix multiplica-
tion. We use this in the proof of Theorem 4.2.15.
It is worth amplifying Lang’s proof of his Theorem 2.2 p.86.
BAj = E j for j = 1, . . . , n.
But now, just looking at the way matrix multiplication works (a column at a time
for the right hand matrix), since E j is the j-th column of the identity matrix, we
get
BA = I
so B is a left inverse for A. Since Theorem 2.2.8 implies that A is invertible , we
are done for this direction.
The converse is easy: we have already seen this argument before in the proof
of Proposition 2.2.5. Assume A is invertible. Consider the linear map LA :
LA (X) = AX = x1 A1 + · · · + xn An ,
where the Ai are the columns of A.
If AX = O, then multiplying by the inverse A−1 we get A−1 AX = X = O,
so that the only solution of the equation AX = O is X = O. So the only solution
of
x 1 A1 + · · · + x n An = O
is X = O, which says precisely that A1 , . . . , An are linearly independent.
This gives us an n × m matrix C: this is not the matrix we want, but these
numbers are typically what is known about the linear transformation. Although
Lang is unwilling to write this expression down at the beginning of the section,
note that he does write it down in Example 3 p. 90.
Instead we use our basis B to construct an isomorphism from V to K n that we have
studied before: see example 1 p.52. Lang now calls it XB : V → K n . To any
element
v = x1 v1 + · · · + xn vn ∈ V
it associates its vector of coordinates (x1 , . . . , xn ) in K n . We do the same thing on
the W side: we have an isomorphism XE : W → K m . To an arbitrary element
w = y1 w1 + · · · + ym wm ∈ W
XE−1 : K m → W.

We then have the diagram

          F
    V --------> W
    |           |
    XB          XE
    v           v
   K^n -------> K^m
       MEB(F)
4.2.4 Theorem. The above diagram is commutative, meaning that for any v ∈ V ,
This is exactly the content of the equation in Lang’s Theorem 3.1 p. 88.
We want to compute MEB (F ), given information about F . Since XB is an
isomorphism, it has an inverse. Thus MEB (F ) can be written as the composition
MEB (F ) = XE ◦ F ◦ XB−1
4.2.5 Proposition.
Proof. In the commutative diagram above, we first want to get from V to K m via
K n . We get there from V by composing the maps XB followed by MEB (F ). In
our simplified notation, this is the matrix product AX, which is an m vector. Then
to get back to W , we need to apply the inverse of XE . So we need to take each
component of the m-vector AX and put it in front of the appropriate basis element
in W , as we did in Remark 4.2.2. The components of AX are the products of the
rows of A with X. That gives (4.2.6).
Next we relate MEB (F ) to the matrix C of (4.2.1). Using (4.2.1) for the F (vi ),
we write out F (v) = F (x1 v1 + · · · + xn vn ) to get:
F (v) = x1 (c11 w1 + · · · + c1m wm ) + · · · + xn (cn1 w1 + · · · + cnm wm ),
so the coefficients of the wj in F (v) are the dot products of X with the columns of C. Compare this to (4.2.6): it says that A is
the transpose of C.
4.2.7 Theorem. The matrix MEB (F ) is the transpose of the matrix C of (4.2.1). In
particular it is an m × n matrix.
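In coordinates the theorem says: record the coordinates of F(vi) in the basis E as the i-th row of C, as in (4.2.1); then the matrix acting on coordinate columns is C t. A small sketch (the numbers are our own example, not Lang's):

```python
import numpy as np

# Row i of C holds the coordinates of F(v_i) in the basis E, as in (4.2.1).
# Here n = 3 (domain) and m = 2 (codomain).
C = np.array([[1., 2.],    # F(v1) =  1*w1 + 2*w2
              [0., 1.],    # F(v2) =  0*w1 + 1*w2
              [4., -1.]])  # F(v3) =  4*w1 - 1*w2

A = C.T                    # the matrix of Theorem 4.2.7; it is m x n

# Applying A to the coordinate vector of v1 must return the
# coordinates of F(v1), i.e. the first row of C.
e1 = np.array([1., 0., 0.])
print(A @ e1)              # [1. 2.]
print(A.shape)             # (2, 3): an m x n matrix
```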
4.2.8 Remark. A major source of confusion in Lang is the set of equations (∗) on
p.89. Since a matrix A is already defined, the reader has every right to assume that
the aij in these equations are the coefficients of the matrix A, especially since he
already uses Ai to denote the rows of A. Nothing of the sort. The aij are being
defined by these equations, just as here we defined the cij in the analogous (4.2.1).
Because he has written the coefficients in (∗) backwards: aji for aij , he lands on
his feet: they are the coefficients of A.
while the matrix A = MBB′ (in Lang’s notation) is its transpose, the matrix associ-
ated with F .
If we interpret the left hand side as the identity linear transformation id applied to
the basis elements wi here, this is the same set of equations as (4.2.1), but with the
role of the bases reversed: {w1 , . . . , wn } is now a basis in the domain of id, and
{v1 , . . . , vn } a basis in the codomain.
We compute MBE (id). Note the change of position of the indices! We apply
Proposition 4.2.5, untangling the meaning of all the terms, and using as before the
abbreviation A for MBE (Id). Then, as in (4.2.6):
F (v1 ) = w1 , . . . , F (vn ) = wn .
The uniqueness follows from the important Theorem 2.1 of Chapter III of Lang.
F : V → W and G : W → U
where the left hand side is the matrix product of a matrix of size p × m by a matrix
of size m × n.
Proof. We can extend our previous commutative diagram:

          F           G
    V --------> W --------> U
    |           |           |
    XB          XE          XH
    v           v           v
   K^n ------> K^m -------> K^p
       MEB(F)      MHE(G)
and compare to the commutative diagram of the composite:

          G∘F
    V ----------> U
    |             |
    XB            XH
    v             v
   K^n ---------> K^p
       MHB(G∘F)
We need to show they are the same. This is nothing more than (4.1.3).
Now assume the linear map F is a map from a vector space V to itself. Pick a
basis B of V , and consider
MBB (F )
the matrix associated to F relative to B. Then doing this for the identity map Id,
we obviously get the identity matrix I:
MEE (F ) = N −1 MBB (F )N
Proof. Indeed take N = MBE (id), which has inverse MEB (id) by Corollary 4.2.17.
Then the expression on the right becomes
First apply Theorem 4.2.15 to the two terms on the right in (4.2.19):
so we are done.
Two square matrices M and P are called similar if there is an invertible matrix
N such that
P = N −1 M N
Theorem 4.2.18 shows that two matrices that represent the same linear trans-
formation F : V → V in different bases of V are similar. We have an easy converse
that Lang does not mention:
use for C the transpose of the matrix N , that gives us the basis E = {w1 , . . . , wn }.
By construction N = MBE (id), so by Corollary 4.2.17
as required.
Jordan normal form in Chapter XI, §6. The simplest example is probably given by
the matrices
\[
\begin{pmatrix} \alpha & 0 \\ 0 & \alpha \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} \alpha & 1 \\ 0 & \alpha \end{pmatrix}
\]
for any complex number α.
4.3.6 Remark. When you read Lang’s Chapter XI, §6, be aware that Figure 1 on
p. 264 is potentially misleading: the αi are not assumed to be distinct.
4.3.7 Exercise. Show that row equivalence (see Definition 2.2.2) is an equivalence
relation on n × m matrices.
The second part is unrelated. We answer in the usual way by doing row oper-
ations. First we interchange two rows to simplify the computations, and then we
make the matrix upper triangular:
\[
\begin{pmatrix} 1 & 1 & 5 \\ 2 & 0 & 1 \\ 2 & 1 & 2 \end{pmatrix},\quad
\begin{pmatrix} 1 & 1 & 5 \\ 0 & -2 & -9 \\ 0 & -1 & -8 \end{pmatrix},\quad
\begin{pmatrix} 1 & 1 & 5 \\ 0 & 0 & 7 \\ 0 & -1 & -8 \end{pmatrix},\quad
\begin{pmatrix} 1 & 1 & 5 \\ 0 & -1 & -8 \\ 0 & 0 & 7 \end{pmatrix}
\]
So by the big theorem of Lang, Chapter IV, §3, recalled in Theorem 4.2.7: the
matrix representing the linear transformation is the transpose of the matrix on the
basis elements. Thus the answer is
2 1 0
0 −1 1
−1 1 2
This is worth remembering. Thus once you unwind the definitions, the only diffi-
culty is to compute the inverse of the matrix B′ using Gaussian elimination.
Exercise 1 a)
The source basis B is
\[
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\;
\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \tag{4.4.4}
\]
for the three choices of the right hand side corresponding to the three basis vectors
of B. So we compute the inverse C of B′ by setting up, as usual:
2 0 −1 | 1 0 0
1 0 1 | 0 1 0
1 1 1 | 0 0 1
and doing row operations to make the left half of the matrix the identity matrix.
The C will appear on the right hand side. First divide the first row by 2:
1 0 −1/2 | 1/2 0 0
1 0 1 | 0 1 0
1 1 1 | 0 0 1
Subtract the third row from the second, multiply the third row by 2/3:
1 0 −1/2 | 1/2 0 0
0 1 0 | 0 −1 1
0 0 1 | −1/3 2/3 0
Now just plug in the values for the ri . When they are (1, 1, 0), (−1, 1, 1), (0, 1, 2)
we get in turn for the {xi } the three column vectors:
\[
\begin{pmatrix} 2/3 \\ -1 \\ 1/3 \end{pmatrix},\;
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},\;
\begin{pmatrix} 1/3 \\ 1 \\ 2/3 \end{pmatrix}, \tag{4.4.6}
\]
and they are the columns of the matrix we are looking for, since they are the coor-
dinates, in the B′ basis, of the unit coordinate vectors in the B basis.
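The arithmetic can be checked by solving the three systems B′x = r directly, one for each basis vector of B; a sketch using the data of (4.4.4) and the augmented matrix above:

```python
import numpy as np

# Columns of Bmat are the basis vectors of B from (4.4.4).
Bmat = np.array([[1., -1., 0.],
                 [1.,  1., 1.],
                 [0.,  1., 2.]])
# Columns of Bprime are the target basis vectors, i.e. the columns of the
# left half of the augmented matrix above.
Bprime = np.array([[2., 0., -1.],
                   [1., 0.,  1.],
                   [1., 1.,  1.]])

# Solve B' X = B column by column: column i of X holds the coordinates
# of the i-th vector of B in the basis B'.
X = np.linalg.solve(Bprime, Bmat)
print(np.round(X, 4))
# [[ 0.6667  0.      0.3333]
#  [-1.      0.      1.    ]
#  [ 0.3333  1.      0.6667]]   the columns of (4.4.6)
```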
One of the most important theorems of linear algebra is the theorem that says that
the row rank of a matrix is equal to its column rank. This result is sometimes called
the fundamental theorem of linear algebra. The result, and its proof, is buried
in Lang’s Chapter V. It is stated as Theorem 3.2 p.114, but the proof requires a
result from §6: Theorem 6.4. The goal here is to give the result the prominence it
deserves, and to give a different proof. The one here uses Gaussian elimination.
5.1.1 Definition. If A is an m×n matrix, then the column rank of A is the dimension
of the subspace of K m generated by the columns {A1 , . . . , An } of A. The row rank
of A is the dimension of the subspace of K n generated by the rows {A1 , . . . , Am }
of A.
First, as Lang points out on page 113, the space of solutions of an m × n homo-
geneous system of linear equations AX = O can be interpreted in two ways:
1. either as those vectors X giving linear relations
x1 A1 + · · · + xn An = O;
2. or as those vectors X orthogonal to all the rows of A:
⟨X, A1⟩ = 0, . . . , ⟨X, Am⟩ = 0.
These are just two different ways of saying that X is in the kernel of the lin-
ear map LA we investigated in Chapter IV. The second characterization uses the
standard scalar product on K n and K m . Continuing in this direction:
Proof. Notice that Proposition 2.2.5 is the case r = n. To have row rank r, B must
have linearly independent rows. If B has a row Bi of zeroes, then it satisfies the
row equation Bi = O, a contradiction. Conversely, if B does not have a row of
zeroes, then for each row i of B there is an index µ(i) so that the entry bi,µ(i) is the
first non-zero coordinate of row Bi . Note that µ(i) is a strictly increasing function
of i, so that there are exactly n − r indices j ∈ [1, n] that are not of the form µ(i).
The variables yj for these n − r indices are called the free variables. Assume the
Bi satisfy an equation of linear dependence
λ1 B1 + λ2 B2 + · · · + λr Br = O.
Look at the equation involving the µ(1) coordinate. The only row with a non-zero
entry there is B1 . Thus λ1 = 0. Continuing in this way, we see that all the λi are
0, so this is not an equation of linear dependence.
Back to the proof of Proposition 5.1.3. We can give arbitrary values to each one
of the free variables and then do backsubstitution as in §1.3 to solve uniquely for
the remaining variables. This implies that the space of solutions has dimension
n − r, so we are done.
Now we can prove one of the most important theorems of linear algebra.
5.1.5 Theorem. The row rank and the column rank of any matrix are the same.
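The theorem is easy to observe numerically; a two-line check (our sketch) using NumPy's rank routine on a deliberately rank-deficient example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 7)).astype(float)
A[3] = A[0] + 2 * A[1]      # force a dependent row, so the rank is at most 3

# Column rank of A = rank of A; row rank of A = column rank of A^t.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # equal
```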
Chapter 6

Duality
This chapter is a clarification and an expansion of Lang [4], section [V, §6] entitled
The Dual Space and Scalar Products. The first three sections clarify the material
there, the remaining sections contain new results.
If V is a vector space over the field K, V ∗ denotes the vector space of linear
maps f : V → K. These linear maps are called functionals. V ∗ is called the dual
space of V , and the passage from V to V ∗ is called duality.
6.1.1 Theorem. Any scalar product on a finite dimensional vector space V admits
an orthogonal basis.
⟨vi , vj⟩ = 0 whenever i ≠ j.
This is defined p.103. Lang proves this theorem by a variant of the Gram-Schmidt
process.
As Lang notes in Theorem 4.1 of Chapter V, if X and Y are column vectors in
K n , then a scalar product on K n is equivalent to specifying a unique symmetric
n × n matrix A so that
⟨X, Y⟩ = X t AY.    (6.1.2)
Returning to V , if we choose an orthogonal basis B for V and its scalar product,
and let X and Y be the component vectors in K n of vectors v and w in V , then the
symmetric matrix A of (6.1.2) describing the scalar product is diagonal.
We get the easy corollary, not mentioned by Lang:
D: V → V ∗
So D(v) = Lv .
6.2.1 Definition. Let B = {v1 , . . . , vn } be a basis for V . Then the dual basis
B ∗ = {ϕ1 , . . . , ϕn } of V ∗ is defined as follows. For each ϕj , set
\[
\varphi_j(v_k) = \begin{cases} 1, & \text{if } k = j; \\ 0, & \text{otherwise.} \end{cases}
\]
The last condition is needed to insure that the scalar product is non-degenerate.
Let’s call the map from V to V ∗ given by this scalar product E instead of D, since
we will get a different linear map. The definition shows that E(v1 ) is the functional
which in the basis B for V takes the value
\[
\begin{pmatrix} 1 & 0 \end{pmatrix}
\begin{pmatrix} a & b \\ b & d \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = a x_1 + b x_2
\]
on the vector x1 v1 + x2 v2 . So E(v1 ) is the functional aϕ1 + bϕ2 in the dual basis
on V ∗ . In the same way, E(v2 ) is the functional bϕ1 + dϕ2 , so we get a different
map when we choose a different scalar product.
Note that this theorem holds for all finite dimensional vector spaces V , without
reference to a scalar product. Its proof is very similar to that of Theorem 2.3 p. 106.
Now assume that V is an n-dimensional vector space with a non-degenerate
scalar product. Then the easy Theorem 6.2 p.128 mentioned above and the com-
ments in Lang starting at the middle of page 130 establish the following theorem
in this context.
W ⊥ ≅ W ∗⊥.
This is the key point, so we should prove it, even though it is easy.
6.4.1 Definition. Pick a v ∈ V . For any ϕ ∈ V ∗ , let ev (ϕ) = ϕ(v). The map
ev : V ∗ → K is easily seen to be linear. It is called evaluation at v.
Proof. First we need to show D2 is a linear map. The main point is that for two
elements v and w of V ,
ev+w = ev + ew .
To show this we evaluate ev+w on any ϕ ∈ V ∗ :
F : V → W.
F ∗ : ψ ∈ W ∗ ↦ ϕ = ψ ◦ F ∈ V ∗ .
Our goal is to understand the relationship between the m × n matrix of F and the
n × m matrix of F ∗ , in suitable bases, namely the dual bases.
For the vector space V ∗ we use the dual basis B ∗ = {ϕ1 , . . . , ϕn } of the basis
B and for the vector space W ∗ we use the dual basis E ∗ = {ψ1 , . . . , ψm } of the
basis E.
The m × n matrix for F associated to B and E is denoted MEB (F ), as per Lang
p. 88, while the matrix for F ∗ associated to E ∗ and B ∗ is denoted MB∗E∗ (F ∗ ).
What is the relationship between these two matrices? Here is the answer.
Proof. Any functional ϕ ∈ V ∗ can be written in terms of the dual basis as:
Just test the equality by applying the functionals to any basis vector vj to see that
both sides agree, since all the terms on the right hand side except the j-th one
vanish by definition of the dual basis. That means they are the same.
Writing A for MEB (F ), then for any j:
This is what we learned in Chapter IV, §3. Note the indexing going in the wrong
order: in other words, we are taking the dot product of w with the j-th column of
A.
V ∗ ←—F∗— W ∗
Now identify V and V ∗ each to K n using the basis B and the dual basis B ∗ ;
also identify W and W ∗ each to K m using the basis E and the dual basis E ∗ .
Let B = MBB∗ (DB ) be the matrix representing DB in these two bases, and
E = ME∗E (DE ) the matrix representing DE . Then we get the corresponding diagram of
matrix multiplication:
          A
   K^n --------> K^m
    |             |
    B             E
    v             v
   K^n <-------- K^m
          At
This is most interesting if the scalar product on V is the ordinary dot product in
the basis B, and the same holds on W in the basis E. Then it is an easy exercise to
show that the matrices B and E are the identity matrices, so the diagram becomes:
          A
   K^n --------> K^m
    |             |
    I             I
    v             v
   K^n <-------- K^m
          At
Ker(LA ) ⊕ Image(LAt ) = K n ,
Image(LA ) ⊕ Ker(LAt ) = K m .
Thus the four subspaces associated to the matrix A are the kernels and images
of A and At . The text by Strang [6] builds systematically on this theorem.
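A numerical illustration (our sketch, for the standard dot products) of the first decomposition: the kernel of A is orthogonal to the image of At, and the dimensions add up to n.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 1.],
              [1., 3., 1., 2.]])      # third row = sum of the first two, so rank 2
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Ker(A): right singular vectors beyond the rank span the null space.
_, s, Vt = np.linalg.svd(A)
kernel = Vt[r:].T                     # n x (n - r); its columns span Ker(A)

# Image(A^t) is spanned by the rows of A.
print(kernel.shape[1] + r == n)            # dimensions add up: True
print(np.allclose(A @ kernel, 0))          # every row of A is orthogonal to Ker(A): True
```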
w = y1 v 1 + y2 v 2 + · · · + yn v n
then
ϕi (v + w) = xi + yi = ϕi (v) + ϕi (w)
and
ϕi (cv) = cxi = cϕi (v).
Note that ϕi (vj ) = 0 if i ≠ j, and ϕi (vi ) = 1.
Then to every element w we can associate a functional called ϕw given by
ϕw = y1 ϕ1 + y2 ϕ2 + · · · + yn ϕn .
Thus we have constructed a map from V to V ∗ that associates to any w in V
the functional ϕw . We need to show this is a linear map: this follows from the
previous computations.
6.7.1 Proposition. The ϕi , 1 ≤ i ≤ n form a basis of V ∗ .
Proof. First we show the ϕi are linearly independent. Assume not. Then there is
an equation of linear dependence:
a1 ϕ1 + a2 ϕ2 + · · · + an ϕn = O.
Apply the equation to vj . Then we get aj = 0, so all the coefficients are zero, and
this is not an equation of dependence. Now take an arbitrary functional ψ on V .
Let aj = ψ(vj ). Now consider the functional
ψ − a1 ϕ1 − a2 ϕ2 − · · · − an ϕn
Applied to any basis vector vi this functional gives 0. So it is the zero functional
and we are done.
So we have proved that the dimension of V ∗ is n. The basis B ∗ = {ϕ1 , . . . , ϕn }
is called the dual basis of the basis B.
6.7.2 Remark. Now, just to connect to Lang’s approach, let’s define a scalar prod-
uct on V using the bases B and B ∗ . Let
⟨v, w⟩ = ϕv (w).
The linearity in both variables is trivial. To conclude, we must show we have
symmetry, so
⟨v, w⟩ = ⟨w, v⟩ = ϕw (v).
With the same notation as before for the components of v and w, we get
ϕv (w) = x1 ϕ1 (w) + · · · + xn ϕn (w)
= x1 y1 + · · · + xn yn
= y1 ϕ1 (v) + · · · + yn ϕn (v)
= ϕw (v)
We now use these ideas to show that the row rank and the column rank of a
matrix are the same.
Let A be an m × n matrix. Let N be the kernel of the usual map K n → K m
given by matrix multiplication:
x ↦ Ax.
View the rows of A as elements of V ∗ . Indeed, they are elements of V ∗
which, when applied to any element of N , give 0. Pick any collection of linearly independent
rows of A, then extend this collection to a basis of V ∗ . Pick the dual basis for
V ∗∗ = V . By definition N is the orthogonal complement of the collection of
independent rows of A. So
Chapter 7

Orthogonal Projection
v = c1 u1 + · · · + cm um + d1 w1 + · · · + dr wr (7.1.1)
We use the definition of component in Lang, page 99. Dotting (7.1.1) with ui
gives (v − ci ui ) · ui = 0, which is exactly what is required. In the same way, the
di is the component of v along wi .
Now we apply Lang’s Theorem 1.3 of Chapter V. First to v and u1 , . . . , um .
The theorem tells us that the point p = c1 u1 + · · · + cm um is the point in U closest
to v. Similarly, the point q = d1 w1 + · · · + dr wr is the point in W closest to v.
Here we map
v 7→ p = c1 u1 + · · · + cm um .
as per (7.1.1), and we have shown in Proposition 7.1.2 that this linear map P is the
orthogonal projection to U .
The kernel of P is W . The image of P is of course U . Obviously P 2 = P .
So Exercise 10 of Lang, Chapter III, §4 applies. Notice that we have, similarly, a
projection to W that we call Q. It sends
v 7→ q = d1 w1 + · · · + dr wr .
We have the set up of Exercises 11-12 of Lang, Chapter III, §4, p. 71. Finally note
that the matrix of P in our basis can be written in block form as
\[
A_m = \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix}
\]
where Im is the m×m identity matrix, and the other matrices are all zero matrices.
In particular it is a symmetric matrix.
Conversely we can establish:

7.1.4 Theorem. A linear operator P on V that
• is symmetric (P t = P ),
• and satisfies P 2 = P ,
is the orthogonal projection to its image.
Proof. We establish Definition 7.1.3 just using the two properties. For all v and w
in V :
⟨v − P v, P w⟩ = (v − P v)t P w = v t P w − v t P t P w = v t P w − v t P 2 w = 0.
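The computation is easy to illustrate numerically; the sketch below (our example) builds the orthogonal projection onto a plane in R3 and checks the two properties and the orthogonality ⟨v − Pv, Pw⟩ = 0:

```python
import numpy as np

# Orthogonal projection onto the span of two orthonormal vectors in R^3.
u1 = np.array([1., 0., 0.])
u2 = np.array([0., 1., 1.]) / np.sqrt(2)
P = np.outer(u1, u1) + np.outer(u2, u2)

print(np.allclose(P, P.T))        # symmetric
print(np.allclose(P @ P, P))      # idempotent

rng = np.random.default_rng(1)
v, w = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose((v - P @ v) @ (P @ w), 0.0))   # <v - Pv, Pw> = 0
```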
Ax = b
Proof. To say that the equation Ax = b can be solved is simply to say that b is in
the image of the linear map with matrix A. By the four subspaces theorem 6.6.1
the image of A in K m is the orthogonal complement of the kernel of At . Thus for
any b orthogonal to the kernel of At , the equation Ax = b can be solved, and only
for such b. This is precisely our assertion.
x1 − x2 = b1
x2 − x3 = b2
x3 − x1 = b3
So
1 −1 0
A= 0 1 −1 .
−1 0 1
Now A has rank 2, so up to a scalar, there is only one non-zero vector y such
that y t A = 0. To find y add the three equations. We get
0 = b1 + b2 + b3 .
This says that the scalar product of (1, 1, 1) with b is 0. So by the theorem the
system has a solution for all b such that b1 + b2 + b3 = 0.
Let’s work it out. Write b3 = −b1 − b2 . Then the third equation is a linear
combination of the first two, so can be omitted. It is sufficient to solve the system:
x1 − x2 = b1
x2 − x3 = b2
p = x1 A1 + · · · + xn An
for real variables xi . This is the matrix product p = Ax: check the boxed formula
of Lang, page 113. To use the method of §7.1, we need the orthogonal complement
of U , namely the subspace of V orthogonal to the columns of A. That is the kernel
of At , since that is precisely what is orthogonal to all the columns.
So we conclude that b − p must be in the kernel of At . Writing this out, we get
the key condition:
At (b − Ax) = 0, or At Ax = At b. (7.3.1)
Because At A is invertible (see Theorem 7.3.3 below), we can solve for the un-
knowns x:
x = (At A)−1 At b.
So finally we can find the projection point:
p = Ax = A(At A)−1 At b.
So we get from any b to its projection point p by the linear transformation with
matrix
P = A(At A)−1 At
Since A is an m × n matrix, it is easy to see that P is m × m. Notice that P 2 = P :
by cancellation of one of the (At A)−1 by At A in the middle. Also notice that P is
symmetric by computing its transpose:
We used (At )−1 = (A−1 )t (see Lang’s exercise 32 p. 41), and of course we used
(At )t = A. So we have shown (no surprise, since it is a projection matrix):
7.3.2 Theorem. The matrix P above satisfies Theorem 7.1.4: it is symmetric and
P2 = P.
We now prove a result we used above. We reprove a special case of this result
in Theorem 11.2.1. Positive definite matrices are considered at great length in
Chapter 11.
Proof. Because A has maximal rank, its kernel is trivial, meaning that the only
n-vector x such that Ax = 0 is the zero vector. So take x ≠ 0; then Ax ≠ 0, and
xt (At A)x = (Ax)t (Ax) = ‖Ax‖2 .
Now ‖Ax‖2 = 0 implies that Ax is the zero vector, and this cannot be the case.
Thus xt (At A)x > 0 whenever x ≠ 0. This is precisely the definition that At A is
positive definite. It is symmetric because
(At A)t = At (At )t = At A.
This is obviously positive definite. In this case it is easy to work out the projection
matrix A(At A)−1 At :
\[
P = \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} 1/3 & 0 \\ 0 & 1/2 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 5/6 & 2/6 & -1/6 \\ 2/6 & 2/6 & 2/6 \\ -1/6 & 2/6 & 5/6 \end{pmatrix}.
\]
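This computation is easily reproduced (a sketch, with A as reconstructed above):

```python
import numpy as np

A = np.array([[1., -1.],
              [1.,  0.],
              [1.,  1.]])

P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(P * 6))            # 6*P = [[5 2 -1] [2 2 2] [-1 2 5]]
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True
```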
‖Ax − b‖
It is not so simple because the columns of A are not mutually perpendicular. So in-
stead we can use the standard minimization technique from multivariable calculus.
First, to have an easier function to deal with, we take the square, which we write
as a matrix product:
f (x) = (Ax − b)t (Ax − b) = xt At Ax − 2bt Ax + bt b.
Notice that each term is a number: check the size of the matrices and the vectors
involved. Calculus tells us f (x), which is a quadratic polynomial, has an extremum
(minimum or maximum or inflection point) only when all the partial derivatives
with respect to the xi vanish. It is an exercise to see that the gradient ∇f of f
in x is At Ax − At b, so setting this to 0 gives the key condition (7.3.1) back. No
surprise.
Of course Theorem 7.1.4 shows that it is possible to bypass distance minimiza-
tion entirely, and use perpendicularity instead.
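Either route gives the same x; a quick sketch (example data ours) comparing the normal equations (7.3.1) with NumPy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))          # generic, so the columns are independent
b = rng.standard_normal(6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # solve A^t A x = A^t b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||Ax - b||

print(np.allclose(x_normal, x_lstsq))             # True
```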
Chapter 8
Uniqueness of Determinants
Using the material from Lang [3] on elementary matrices, summarized in Chapter
1, we give an alternative quick proof of the uniqueness of the determinant function.
This is the approach of Artin [1], Chapter I, §3, and is different from the one using
permutations that Lang uses in our textbook [4]. You may wonder how Lang covers
this material in [3]: the answer is that he omits the proofs.
F (A) = (−1)i+1 ai1 F (Ai1 ) + · · · + (−1)i+j aij F (Aij ) + · · · + (−1)i+n ain F (Ain )
where the F on the right hand side refers to the function for (n − 1) × (n − 1)
matrices, and Aij is the ij-th minor of A. Note that Lang does not use this word,
which is universally used. We will have more to say about minors in Chapter 11.
The expansion along the j-th column can be written inductively as
F (A) = (−1)j+1 a1j F (A1j )+· · ·+(−1)i+j aij F (Aij )+· · ·+(−1)j+n anj F (Anj ).
8.2.4 Remark. It is important to notice that the computation of F (E) does not
depend on whether F is a row or column expansion. We only used Property (3):
F (I) = 1 that holds in both directions, as does the rest of the computation.
8.2.5 Theorem. For any elementary matrix E and any square matrix A,
F (EA) = F (E)F (A).
More generally, for elementary matrices E1 , . . . , Ek ,
F (Ek · · · E1 A) = F (Ek ) · · · F (E1 )F (A).
Proof. Now that we have computed the value of F (E) for any elementary matrix,
we just plug it into the equations (8.2.1), (8.2.2) and (8.2.3) to get the desired
result. For the last statement, just peel off one of the elementary matrices at a time,
by induction, starting with the one on the left.
8.3 Uniqueness
In this section we follow Artin [1], p. 23-24 closely.
By Theorem 8.2.5, if (1.3.3) is satisfied, then
8.3.1 Theorem. This function F , which we now call the determinant, is the only
function of the rows (columns) of n × n matrices to K satisfying Properties (1), (2)
and (3).
8.3.2 Corollary. The determinant of A is ≠ 0 if and only if A is invertible.
This follows from Gaussian elimination by matrices: A is invertible if and only
if in (1.3.3) the matrix A′ = I, which is the only case in which the determinant of
A is non-zero.
We now get one of the most important results concerning determinants:
8.3.3 Theorem. For any two n × n matrices A and B,
det(AB) = det(A) det(B).
Proof. First we assume A is invertible. By (1.3.3) we know A is the product of
elementary matrices:
A = Ek . . . E1 .
Note that these are not the same E1 , . . . , Ek as before. By using Theorem 8.2.5 we
get
det(A) = det(Ek ) . . . det(E1 ) (8.3.4)
and
det(AB) = det(Ek . . . E1 B) by definition of A
= det(Ek ) . . . det(E1 ) det(B) by Theorem 8.2.5
= det(A) det(B). using (8.3.4)
Next assume that A is not invertible, so det(A) = 0. We must show det(AB) =
0. Now apply (1.3.3) to A′ with bottom row equal to 0. Matrix multiplication
shows that the bottom row of A′B is 0, so by Property (7), det(A′B) = 0. So
using Theorem 8.2.5 again
0 = det(A′B) as noted
= det(Ek . . . E1 AB) by definition of A′
= det(Ek ) . . . det(E1 ) det(AB) Th. 8.2.5 applied to AB
Since the det(Ei ) are non-zero, this forces det(AB) = 0, and we are done.
As a trivial corollary we get, when A is invertible, so A−1 A = I:
\[
\det(A^{-1}) = \frac{1}{\det(A)}.
\]
These last two results are proved in Lang [4] p.172: Theorem 7.3 and Corollary
7.4 of Chapter VI.
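A quick numerical spot-check of Theorem 8.3.3 and its corollary (the example matrices are random, ours):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # True
```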
8.3.5 Remark. Theorem 8.3.3 says that the map which associates to any n × n
matrix its determinant has a special property. For those who know group theory
this can be expressed as follows. Look at the set of all invertible n × n matrices,
usually denoted Gl(n) as we mentioned in §2.3. Now Gl(n) is a group for matrix
multiplication, with I as its neutral element. As we already know, the determinant
of an invertible matrix is non-zero, so when restricted to Gl(n) the determinant
function maps to K ∗ which denotes the non zero elements of the field K. Now
K ∗ is a group for multiplication, as you should check. Theorem 8.3.3 says that the
determinant function preserves multiplication, i.e., det(AB) = det(A) det(B).
Maps that do this are called group homomorphisms and have wonderful properties
that you can learn about in an abstract algebra book such as [1]. In particular it
implies that the kernel of the determinant function, meaning the matrices that have
determinant equal to the neutral element of K ∗ , namely 1, form a subgroup of
Gl(n), called Sl(n). Lang studies Sl(n) in Appendix II.
det(At ) = det(A)
Proof. Indeed computing the expansion by minors of At along columns is the same
as computing that of A along rows, so the computation will give us det(A) both
ways. The key point is again Remark 8.2.4.
This is Lang’s Theorem 7.5 p.172. He proves it using the expansion formula
for the determinant in terms of permutations: see (9.1.3).
Chapter 9
The point of this chapter is to clarify Lang’s key lemma 7.1 and the material that
precedes it in §7 of [4]. A good reference for this material is Hoffman and Kunze
[2], §5.3.
These notes can be read before reading about permutations in Lang’s Chapter
VI, §6. We do assume a knowledge of Lang [4], §2-3 of Chapter VI: the notation
from there will be used.
What does the summation mean? We need to take all possible choices of
k1 , k2 , . . . , kn between 1 and n. It looks like we have n^n terms.
Now we use the hypothesis that F is alternating. Then
F (ek1 , ek2 , . . . , ekn ) = 0
unless all of the indices are distinct: otherwise the matrix has two equal columns.
This is how permutations enter the picture. When k1 , k2 , . . . , kn are distinct, then
the mapping
1 → k1 , 2 → k2 , . . . , n → kn
is a permutation of Jn = {1, . . . , n}. Our sum is over all permutations.
Now at last we assume Property (3), so that F (I) = 1. Then
F (ek1 , ek2 , . . . , ekn ) = ±1,
because we have the columns of the identity matrix, in a different order. At each
interchange of two columns, the sign of F changes. So we get 1 if we need an even
number of interchanges, and −1 if we need an odd number of interchanges.
The number F (ek1 , ek2 , . . . , ekn ) is what is called the sign of the permu-
tation. Lang writes it ε(σ).
Thus if we write the permutations σ, we have proved
\[
F(A) = \sum_{\sigma} \varepsilon(\sigma)\, a_{\sigma(1),1} a_{\sigma(2),2} \cdots a_{\sigma(n),n} \tag{9.1.3}
\]
where the sum is over all permutations of Jn . This is a special case of the formula
in the key Lemma 7.1, and gives us Theorem 7.2 directly.
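Formula (9.1.3) can be coded directly; the sketch below (ours) computes the sign of a permutation by counting inversions and compares the sum over all permutations with NumPy's determinant. It is only practical for small n, since there are n! terms.

```python
import numpy as np
from itertools import permutations

def sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    """Formula (9.1.3): sum over sigma of eps(sigma) a_{sigma(1),1} ... a_{sigma(n),n}."""
    n = A.shape[0]
    return sum(sign(s) * np.prod([A[s[j], j] for j in range(n)])
               for s in permutations(range(n)))

A = np.random.default_rng(4).standard_normal((5, 5))
print(np.isclose(det_by_permutations(A), np.linalg.det(A)))   # True
```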
Now we compute F (C) by expanding the columns using these equations and
multilinearity, exactly as we did in the previous section. So C plays the role of A,
the bij replace the aij , and the Ai replace the ei .
Now we assume F is also alternating. We will not make use of Property (3)
until we mention it. By the same argument as before we get the analog of (9.1.2):
\[
F(C) = \sum_{k_1,\dots,k_n=1}^{n} b_{k_1 1} b_{k_2 2} \cdots b_{k_n n}\, F(A^{k_1}, A^{k_2}, \dots, A^{k_n}).
\]
which is the precise analog of the expression in Lang’s Lemma 7.1. Using (9.1.3)
for B, we get:
F (AB) = F (A) det(B).
Finally if we assume that F satisfies Property (3), by the result of the previous
section we know that F is the determinant, so we get the major theorem:
Chapter 10

The Companion Matrix

This important matrix is not discussed in [4], but it provides a good introduction to
the notion of a cyclic vector, which becomes important in [4], XI, §4.
10.1 Introduction
We start with a monic polynomial of degree n:
with coefficients in a field K. Monic just means that the coefficient of the leading
term t^n is 1.
We associate to this polynomial an n × n matrix known as its companion matrix:
\[
A = \begin{pmatrix}
0 & 0 & 0 & \dots & 0 & 0 & a_0 \\
1 & 0 & 0 & \dots & 0 & 0 & a_1 \\
0 & 1 & 0 & \dots & 0 & 0 & a_2 \\
0 & 0 & 1 & \dots & 0 & 0 & a_3 \\
\vdots & & & \ddots & & & \vdots \\
0 & 0 & 0 & \dots & 1 & 0 & a_{n-2} \\
0 & 0 & 0 & \dots & 0 & 1 & a_{n-1}
\end{pmatrix} \tag{10.1.2}
\]
Thus the last column of A has the coefficients of f in increasing order, omitting
the coefficient 1 of the leading term, while the subdiagonal of A, namely the terms
ai+1,i are all equal to 1. All other terms are 0.
We compute the characteristic polynomial det(tI − A) of the companion matrix
by expansion along the first row, and induction on n. First we do the case n = 2.
The determinant we need is
\[
\begin{vmatrix} t & -a_0 \\ -1 & t - a_1 \end{vmatrix}
= t(t - a_1) - a_0 = t^2 - a_1 t - a_0,
\]
as required.
Now we do the case n. By Laplace expansion of the determinant along the first
row we get two terms:
\[
t\,
\begin{vmatrix}
t & 0 & \dots & 0 & -a_1 \\
-1 & t & \dots & 0 & -a_2 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & \dots & -1 & t & -a_{n-2} \\
0 & \dots & 0 & -1 & t - a_{n-1}
\end{vmatrix}
\;+\; a_0(-1)^{n}
\begin{vmatrix}
-1 & t & \dots & 0 \\
0 & -1 & \ddots & \vdots \\
\vdots & & \ddots & t \\
0 & 0 & \dots & -1
\end{vmatrix}
\]
The first term is t times the determinant of the matrix of the same form built from
a1 , . . . , an−1 , so by induction it equals t(t^{n-1} − a_{n-1}t^{n-2} − · · · − a_1),
while the second term gives a0 (−1)n (−1)n−1 = −a0 , since the matrix is triangu-
lar with −1 along the diagonal.
Thus we do get the polynomial (10.1.1) as characteristic polynomial of (10.1.2).
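A numerical check of this claim (a sketch; the display (10.1.1) is not reproduced above, so the code assumes, consistently with the 2 × 2 case, that f(t) = t^n − a_{n−1}t^{n−1} − · · · − a_1 t − a_0):

```python
import numpy as np

def companion(a):
    """Companion matrix (10.1.2) of the coefficient list a = [a0, a1, ..., a_{n-1}]."""
    n = len(a)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)     # 1s on the subdiagonal
    A[:, -1] = a                   # last column holds a0, ..., a_{n-1}
    return A

a = [2., -1., 3., 5.]              # arbitrary example coefficients
A = companion(a)

# np.poly(A) returns the coefficients of det(tI - A), highest degree first.
# Expected: t^4 - 5 t^3 - 3 t^2 + t - 2, i.e. [1, -a3, -a2, -a1, -a0].
expected = np.concatenate(([1.], -np.array(a[::-1])))
print(np.allclose(np.poly(A), expected))   # True
```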
Let L = LA be the linear map with matrix A. Then
L(e1 ) = e2 , L(e2 ) = e3 , . . . , L(en−1 ) = en , and L(en ) = a0 e1 + a1 e2 + · · · + an−1 en ,
since e2 is the first column of A, e3 the second column of A, en the (n−1)-th column
of A, and a0 e1 + a1 e2 + · · · + an−1 en the last column of A.
This means that e1 is a cyclic vector for L: as we apply powers of the operator
L to e1 , we generate a basis of the vector space K n ; in other words the vectors e1 ,
L(e1 ), L2 (e1 ), . . . , Ln−1 (e1 ) are linearly independent.
Conversely,
10.2.1 Theorem. Assume that the operator L : V → V has no invariant sub-
spaces, meaning that there is no proper non-zero subspace W ⊂ V such that L(W ) ⊂ W .
Pick any non-zero vector v ∈ V . Then the vectors v, L(v), L2 (v), . . . , Ln−1 (v)
are linearly independent, and therefore form a basis of V .
Proof. To establish linear independence we show that there is no equation of linear
dependence between v, L(v), L2 (v), . . . , Ln−1 (v). By contradiction, let k be the
smallest positive integer such that there is an equation of dependence
The minimality of k means that bk ≠ 0, so that we can solve for Lk (v) in terms of
the previous basis elements. But this gives us an invariant subspace for L: the one
generated by v, L(v), L2 (v), . . . , Lk (v), a contradiction.
The matrix of L in the basis {v, L(v), L2 (v), . . . , Ln−1 (v)} is the companion
matrix of the characteristic polynomial of L.
are linearly independent. If we use these as a basis, in this order, we see that
Thus it is lower triangular, with α along the diagonal, and 1 on the subdiago-
nal. Because Lang takes the reverse order for the elements of the basis, he gets the
transpose of this matrix: see p. 263. This confirms that the characteristic polyno-
mial of this matrix is (t − α)n .
Finally we ask for the eigenvectors of (10.3.1). It is an exercise ([4], XI, §6,
exercise 2) to show that there is only one eigenvector: wn−1 .
The trace of this matrix is a + d and its determinant ad − bc. Notice how they
appear in the characteristic polynomial. For this to be irreducible over R, by the
quadratic formula we must have (a + d)2 − 4(ad − bc) < 0.
The full value of the companion matrix only reveals itself when one takes
smaller subfields of C, for example the field of rational numbers Q. Over such
a field there are irreducible polynomials of arbitrary high degree: for example the
cyclotomic polynomial
Φp (t) = t^{p−1} + t^{p−2} + · · · + t + 1
for p a prime number. Since (t − 1)Φp (t) = t^p − 1, the roots of Φp (t) are complex
numbers on the circle of radius 1, thus certainly not rational. It is a bit harder
to show that Φp (t) is irreducible, but it only requires the elementary theory of
polynomials in one variable. A good reference is Steven H. Weintraub’s paper
Several Proofs of the Irreducibility of the Cyclotomic Polynomial.
Chapter 11
Real symmetric matrices come up in all areas of mathematics and applied mathe-
matics. It is useful to develop easy tests to determine when a real symmetric matrix
is positive definite or positive semidefinite.
The material in Lang [4] on positive (semi)definite matrices is scattered in ex-
ercises in Chapter VII, §1 and in Chapter VIII, §4. The goal of this chapter is to
bring it all together, and to establish a few other important results in this direction.
11.1 Introduction
Throughout this chapter V will denote a real vector space of dimension n with a
positive definite scalar product written hv, wi as usual. By the Gram-Schmidt or-
thonormalization process (Lang V.2) this guarantees that there exists an orthonor-
mal basis B = {v1 , v2 , . . . , vn } for V .
Recall that a symmetric operator T on V satisfies
⟨T (v), w⟩ = ⟨v, T (w)⟩
11.1.3 Remark. Lang uses the term semipositive instead of positive semidefinite:
Exercise 10 of VII §1 and Exercise 4 of VIII §4. Here we will use positive semidef-
inite, since that is the term commonly used.
Lang also defines a positive definite matrix A on R^n with the standard dot product. It is a real n × n symmetric matrix A such that for all coordinate vectors X ≠ O, X^t AX > 0. See Exercise 9 of VII §1. In the same way we can define a
positive semidefinite matrix.
Here is an important fact that is only stated explicitly by Lang in Exercise 15 b) of VIII §4:
11.1.4 Proposition. Let A be the matrix of the symmetric operator T in the orthonormal basis B, and let X and Y be the coordinate vectors of v and w in B. Then
$$\langle T(v), w \rangle = X^t A Y,$$
so that T is positive (semi)definite if and only if the matrix A is positive (semi)definite.
This follows immediately from the results of Chapter IV, §3 and the definitions.
Next it is useful to see how the matrix A changes as one varies the basis of
V. Following the notation of Lang V.4, we write g(v, w) = ⟨T(v), w⟩. This is a
symmetric bilinear form: see Lang, V.4, Exercises 1 and 2. We have just seen that
for an orthonormal basis B of V ,
g(v, w) = X^t A Y
where X and Y are the coordinate vectors of v and w in that basis. Now consider another basis B'. By Lang's Corollary 3.2 of Chapter IV, if U is the invertible change of basis matrix M_{B'}^{B}(id), and X' and Y' are the coordinate vectors of v and w in the B' basis, then
$$X = U X' \quad\text{and}\quad Y = U Y'.$$
Thus if A' denotes the matrix for g(v, w) in the B' basis, we must have
$$g(v, w) = X^t A Y = (X')^t U^t A U\, Y' = (X')^t A' Y',$$
so that
$$A' = U^t A U. \tag{11.1.5}$$
11.1.6 Definition. Two symmetric matrices A and A' related by (11.1.5) for some invertible matrix U are called congruent.
11.1.7 Exercise. Show that two congruent matrices represent the same symmetric
bilinear form on V in different bases. Thus they have the same index of positiv-
ity and same index of nullity. In particular, if one is positive definite or positive
semidefinite, the other is also. See Lang V.8.
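A small numerical illustration of Exercise 11.1.7 (the specific 3 × 3 matrix, the unit upper triangular U, and the use of NumPy are my own choices): congruent matrices have different eigenvalues in general, but the signs agree, so positive definiteness is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])                 # a positive definite example
U = np.eye(3) + np.triu(rng.standard_normal((3, 3)), k=1)   # invertible
A_prime = U.T @ A @ U                           # congruent to A, as in (11.1.5)

print(np.linalg.eigvalsh(A))                    # all positive
print(np.linalg.eigvalsh(A_prime))              # different values, still all positive
```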
11.2.1 Theorem. The following conditions on the symmetric operator T are equiv-
alent.
1. T is positive (semi)definite.
2. The eigenvalues of T are all positive (nonnegative).
3. There exists an operator S on V such that T = S^t S; S is invertible if and only if T is positive definite.
4. The index of negativity of T is 0. If T is positive definite, the index of nullity is also 0.
Proof. For (2) ⇒ (1), take an orthonormal basis {v_1, ..., v_n} of eigenvectors of T with eigenvalues λ_1, ..., λ_n, and write an arbitrary v ∈ V as
$$v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n.$$
Then ⟨T(v), v⟩ = λ_1 c_1^2 + · · · + λ_n c_n^2, which is positive (nonnegative) for v ≠ O when all the λ_i are positive (nonnegative). For (2) ⇒ (3), define S on the same basis by S(v_i) = √λ_i v_i; then S is symmetric and S^t S = S^2 = T.
S is called the square root of T . Note that a positive semidefinite operator also has
a square root, but it is invertible only if T is positive definite.
(3) ⇒ (1) is a special case of Theorem 7.3.3. We reprove it in this context in the language of operators. We take any operator S on V. Then S^t S is symmetric because ⟨S^t S(v), w⟩ = ⟨S(v), S(w)⟩ = ⟨v, S^t S(w)⟩, and it is positive semidefinite because ⟨S^t S(v), v⟩ = ⟨S(v), S(v)⟩ ≥ 0, with equality only if S(v) = O. In particular, if S is invertible, then S^t S is positive definite.
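Here is a short NumPy sketch of criteria (2) and (3) for one small matrix (the matrix and the spectral construction of the square root are my own illustrative choices; the theorem itself does not prescribe this construction).

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                      # symmetric

# Criterion (2): the eigenvalues are all positive.
print(np.linalg.eigvalsh(A))

# Criterion (3): build a square root S with A = S^t S from the
# spectral decomposition A = Q diag(lam) Q^t.
lam, Q = np.linalg.eigh(A)
S = Q @ np.diag(np.sqrt(lam)) @ Q.T             # symmetric square root of A
print(np.allclose(S.T @ S, A))                  # True
print(np.all(np.linalg.eigvalsh(S) > 0))        # True: S invertible, A definite
```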
There are two other useful criteria. The purpose of the rest of this chapter is to
explain them. They are not mentioned in Lang, but follow easily from earlier results.
11.3.3 Remark. Lang looks at certain minors in VI.2, while studying the Laplace
expansion along a row or column. See for example the discussion on p.148. He
does not use the word minor, but his notation A_{ij} denotes the submatrix of size (n − 1) × (n − 1) where the i-th row and the j-th column have been removed. Then the det A_{ij} are minors, and the det A_{ii} are principal minors.
then
$$A(1, 2) = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}, \quad
A(2, 3) = \begin{pmatrix} 4 & 3 \\ 3 & 4 \end{pmatrix}, \quad\text{and}\quad
A(1, 2, 3) = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 4 & 3 \\ 0 & 3 & 4 \end{pmatrix}.$$
$$p_j = \sum_J \det A_J,$$
where the sum is over all choices of j elements J = {i_1, ..., i_j} from the set of the first n integers, and A_J is the corresponding principal submatrix of A. Thus the sum has $\binom{n}{j}$ terms.
• If j = n, then there is only one choice for J: all the integers between 1 and
n. The theorem then says that pn = det A, which is clear: just evaluate at
t = 0.
• If j = 1, we know that p_1 is the trace of the matrix. The sets J have just one element, so the theorem says
$$p_1 = a_{11} + a_{22} + \cdots + a_{nn} = \operatorname{tr} A.$$
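As a numerical check (not in Lang; the 3 × 3 matrix below reuses the entries of A(1, 2, 3) from the example above, and np.poly is NumPy's legacy characteristic-polynomial helper), the signed coefficients of det(tI − A) indeed match the sums of principal minors.

```python
import numpy as np
from itertools import combinations

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 4.0, 3.0],
              [0.0, 3.0, 4.0]])
n = A.shape[0]

# np.poly(A) returns det(tI - A) = t^3 - p1 t^2 + p2 t - p3 as [1, -p1, p2, -p3].
c = np.poly(A)
p = [(-1) ** j * c[j] for j in range(1, n + 1)]

# Each p_j should equal the sum of the j x j principal minors of A.
for j in range(1, n + 1):
    s = sum(np.linalg.det(A[np.ix_(J, J)]) for J in combinations(range(n), j))
    print(j, round(p[j - 1], 6), round(s, 6))   # the two columns agree
```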
Proof. To do the general case, we use the expansion of the determinant in terms
of permutations in Theorem 9.1.3. We consider the entries aij of the matrix A as
variables.
To pass from the determinant to the characteristic polynomial, we make the following substitutions: we replace each off-diagonal term a_{ij} by −a_{ij} and each diagonal term a_{ii} by t − a_{ii}, where t is a new variable. How do we write this substitution in the term
$$\epsilon(\sigma)\, a_{\sigma(1),1}\, a_{\sigma(2),2} \cdots a_{\sigma(n),n}$$
of the determinant expansion? Let f be the number of integers i fixed by σ, meaning that σ(i) = i, and let the fixed integers be {j_1, ..., j_f}. Designate the remaining n − f integers in [1, n] by {i_1, ..., i_{n−f}}. Then the term associated to σ in the characteristic polynomial can be written
$$\epsilon(\sigma) \prod_{l=1}^{f} (t - a_{j_l, j_l}) \prod_{l=1}^{n-f} (-a_{\sigma(i_l), i_l}). \tag{11.3.7}$$
11.4.1 Theorem. A symmetric matrix A is positive definite if and only if all its
leading principal minors are positive. It is positive semidefinite if and only if all its
principal minors are non-negative.
Notice the subtle difference between the two cases: to establish that A is pos-
itive semidefinite, you need to check all the principal minors, not just the leading
ones.
Consider for example the 2 × 2 matrix
$$\begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}.$$
The leading principal minors of this matrix are both 0, and yet it is obviously not positive semidefinite, since its eigenvalues (0 and −1) are not both non-negative.
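The contrast between the two tests is easy to see numerically. In the sketch below (my own example matrices, with NumPy), B passes the leading-minor test and is positive definite, while the 2 × 2 matrix just discussed has both leading minors equal to 0 yet is not positive semidefinite.

```python
import numpy as np

def leading_minors(A):
    """Leading principal minors D_1, ..., D_n of a square matrix."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

B = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(leading_minors(B), np.linalg.eigvalsh(B))   # D_k > 0 and eigenvalues > 0

C = np.array([[0.0, 0.0],
              [0.0, -1.0]])
print(leading_minors(C), np.linalg.eigvalsh(C))   # D_1 = D_2 = 0, eigenvalue -1 < 0
```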
Since
$$A_J = \begin{pmatrix} a_{22} & a_{24} \\ a_{42} & a_{44} \end{pmatrix},$$
you can easily verify (11.4.4) in this case.
A weaker result implied by this corollary is useful when just scanning the ma-
trix.
11.4.10 Corollary. If A is positive definite, the entry of largest absolute value must be on the diagonal.
Now we return to the proof of the main theorem. In the positive definite case it remains to show that if all the leading principal minors of the matrix A are positive, then A is positive definite. Here is the strategy. From Exercise 11.1.7, if U is any invertible n × n matrix, then the symmetric matrix A is positive definite if and only if U^t AU is positive definite. Then we have the following obvious fact for diagonal matrices, which guides us.
11.4.11 Proposition. A diagonal matrix is positive definite if and only if its diagonal entries are all positive, which happens if and only if its leading principal minors are all positive.
Proof. Indeed, if the diagonal entries are {d_1, d_2, ..., d_n}, the leading principal minors are D_1 = d_1, D_2 = d_1 d_2, ..., D_n = d_1 d_2 · · · d_n. So the positivity of the d_i is equivalent to that of the D_i.
Thus a11 = D1 > 0, so we can use it to clear all the entries of the first column of
the matrix A below a11 . In other words we left-multiply A by the invertible matrix
U_1 which has 1's along the diagonal and whose only other non-zero entries are in the first column:
$$U_1 = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\frac{a_{21}}{a_{11}} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\frac{a_{n1}}{a_{11}} & 0 & \cdots & 1
\end{pmatrix}.$$
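Here is what one elimination step looks like numerically (the 3 × 3 matrix is my own example; the text goes on to set A^{(1)} = U_1 A U_1^t, which is what the last line computes): conjugating by U_1 clears the first row and column outside the (1,1) entry while keeping the matrix symmetric.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.0, 4.0]])          # leading principal minors 2, 5, 17

n = A.shape[0]
U1 = np.eye(n)
U1[1:, 0] = -A[1:, 0] / A[0, 0]          # the matrix U_1 displayed above

A1 = U1 @ A @ U1.T                       # symmetric, first row and column cleared
print(A1.round(6))
```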
Define A^{(2)} = U_2 A^{(1)} U_2^t. It is symmetric, with zeroes in the first two rows and columns except at a^{(2)}_{11} = a_{11} and a^{(2)}_{22} = a^{(1)}_{22}. Since A^{(2)} is obtained from A^{(1)} using row and column operations involving the second row and second column, we have
$$D_k^{(2)} = D_k^{(1)} > 0.$$
Now from partial diagonalization
$$D_3^{(2)} = a_{11}\, a^{(1)}_{22}\, a^{(2)}_{33},$$
so a^{(2)}_{33} > 0. Continuing in this way, we get a diagonal matrix A^{(n−1)} with positive diagonal elements, obtained from A by left-multiplying by
$$U = U_{n-1} U_{n-2} \cdots U_1$$
and right-multiplying by U^t.
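The whole reduction, and the claim that the leading principal minors are unchanged at each stage, can be checked directly. Below is a minimal NumPy sketch under the assumption that every pivot encountered is non-zero, which holds when all the D_k are positive; the example matrix is the same one used above.

```python
import numpy as np

def symmetric_reduce(A):
    """Return (D, U) with D = U A U^t diagonal and U lower triangular
    with 1's on the diagonal.  Assumes every pivot encountered is non-zero."""
    n = A.shape[0]
    D = A.astype(float).copy()
    U = np.eye(n)
    for k in range(n - 1):
        Uk = np.eye(n)
        Uk[k + 1:, k] = -D[k + 1:, k] / D[k, k]
        D = Uk @ D @ Uk.T
        U = Uk @ U
    return D, U

def leading_minors(M):
    return [np.linalg.det(M[:k, :k]) for k in range(1, M.shape[0] + 1)]

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 0.0, 4.0]])
D, U = symmetric_reduce(A)
print(np.diag(D).round(6))                 # positive diagonal entries
print(np.round(leading_minors(A), 6))      # 2, 5, 17
print(np.round(leading_minors(D), 6))      # the same: leading minors unchanged
```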
More generally, the proof implies a result that is interesting in its own right.
Proof. As before, this immediately follows from the fact that if you add to a row
(or column) of a square matrix a multiple of another row (or another column),
then the determinant of the matrix does not change. Just apply this to the leading
principal minors.
11.4.14 Exercise. State the result concerning negative definite matrices that is anal-
ogous to the main theorem, noting that Proposition 11.4.13 applies. Do the same
for Theorem 11.2.1.
We now finish the proof of the main theorem in the positive definite case.
Assume that A is a symmetric matrix whose leading principal minors Dk are
all positive. Proposition 11.4.12 tells us that A can be diagonalized to a matrix A^{(n−1)} = U A U^t by a lower triangular matrix U with 1's on the diagonal. The diagonal matrix A^{(n−1)} obtained has all its diagonal entries a_{11}, a^{(1)}_{22}, ..., a^{(n−1)}_{nn} positive, so it is positive definite by the easy Proposition 11.4.11. By Exercise 11.1.7, A is positive definite, so we are done.
We now prove Theorem 11.4.1 in the positive semidefinite case. Proposition
11.4.6 establishes one of the implications. For the other implication we build on
Lang’s Exercise 13 of VII.1. Use the proof of this exercise in [5] to prove:
Proof. We first note that all the roots of P(t) are non-negative, using only the non-negativity of the p_i. Assume we have a negative root λ. Then all the terms of P(λ) have the same sign, meaning that if n is even, all the terms are non-negative,
while if n is odd, all the terms are non-positive. Since the leading term λ^n is non-zero, this is a contradiction. Thus all the roots are non-negative, and A is therefore positive semidefinite by Theorem 11.2.1, (2). If the p_i are all positive (in fact if a
single one of them is positive) then the polynomial cannot have 0 as a root, so by
the same criterion A is positive definite.
11.5.1 Theorem. A is positive definite if and only if all the pi , 1 ≤ i ≤ n, are pos-
itive. A is positive semidefinite if and only if all the pi , 1 ≤ i ≤ n, are nonnegative.
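A direct implementation of this test (my own sketch, again using NumPy's legacy np.poly helper for the characteristic polynomial; the example matrices repeat earlier ones):

```python
import numpy as np

def char_poly_test(A):
    """Coefficients p_1, ..., p_n of det(tI - A) written as
    t^n - p_1 t^(n-1) + p_2 t^(n-2) - ... + (-1)^n p_n."""
    c = np.poly(A)                        # [1, -p_1, p_2, -p_3, ...]
    return [(-1) ** j * c[j] for j in range(1, A.shape[0] + 1)]

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 4.0, 3.0],
              [0.0, 3.0, 4.0]])
print(np.round(char_poly_test(A), 6))     # all positive: A is positive definite

B = np.array([[0.0, 0.0],
              [0.0, -1.0]])
print(np.round(char_poly_test(B), 6))     # p_1 = -1 < 0: B is not semidefinite
```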
Chapter 12
Homework Solutions
This chapter contains solutions of a few exercises in Lang, where the solution man-
ual is too terse and where the method of solution is as important as the solution
itself.
where the mi are the multiplicities of the eigenvalues. The goal of the exercise is
to show that the minimal polynomial of A is
$$\mu(t) = (t - \alpha_1)(t - \alpha_2) \cdots (t - \alpha_r).$$
Bg(D)B^{-1} = O, and therefore
$$g(D) = O. \tag{12.1.1}$$
Since g(D) is the diagonal matrix with diagonal entries g(d_1), ..., g(d_n), this means that
$$g(d_i) = 0 \quad \text{for } i = 1, \ldots, n. \tag{12.1.2}$$
But g(t) has only k roots, and there are r > k distinct numbers among the d_i. So (12.1.2) is impossible, and the minimal degree for g is r.
Finally you may ask, could there be a polynomial g(t) of degree r that vanishes
on A, other than µ(t)? No: if there were, we could divide µ(t) by g(t). If there is
no remainder, the polynomials are the same. If there is a remainder, it has degree
< r and vanishes on A: this is impossible as we have just established.
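A tiny numerical illustration of the conclusion (the diagonal matrix with eigenvalues 1, 1, 2 is my own example): the product of the distinct linear factors already annihilates A, while a polynomial missing one of the eigenvalues does not.

```python
import numpy as np

# A diagonalizable matrix with eigenvalues 1 (multiplicity 2) and 2.
A = np.diag([1.0, 1.0, 2.0])
I = np.eye(3)

mu_of_A = (A - 1 * I) @ (A - 2 * I)       # mu(t) = (t - 1)(t - 2) evaluated at A
print(np.allclose(mu_of_A, 0))            # True: mu(A) = O

g_of_A = A - 2 * I                        # g(t) = t - 2 misses the eigenvalue 1
print(np.allclose(g_of_A, 0))             # False: g does not vanish on A
```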
A.3 Matrices
Matrices are written with parentheses. Matrices are denoted by capital roman
letters such as A, and have as entries the corresponding lower case letter. So
A = (aij ). A is an m × n matrix if it has m rows and n columns, so 1 ≤ i ≤ m
and 1 ≤ j ≤ n. We write the columns of A as A^j and the rows as A_i, following Lang.
A^t is the transpose of the matrix A. This differs from Lang.
D(d_1, d_2, ..., d_n) is the n × n diagonal matrix
$$\begin{pmatrix}
d_1 & 0 & \cdots & 0 \\
0 & d_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_n
\end{pmatrix}.$$
[1] Michael Artin, Algebra, First Edition, Prentice Hall, New York, 1991. ↑vi, 10, 14, 56, 58, 60
[2] Kenneth Hoffman and Ray Kunze, Linear Algebra, Second Edition, Prentice Hall, Englewood Cliffs, NJ, 1971. ↑61
[3] Serge Lang, Introduction to Linear Algebra, Second Edition, Springer, New York, 1997. ↑1, 3,
7, 56
[4] Serge Lang, Linear Algebra, Third Edition, Springer, New York, 1987. ↑v, vi, 1, 7, 9, 10, 11, 12, 19, 21, 39, 56, 57, 59, 61, 64, 66, 67, 69, 81
[5] Rami Shakarchi, Solutions Manual for Lang's Linear Algebra, Springer, New York, 1996. ↑vi, 80, 81, 82
[6] Gilbert Strang, Linear Algebra and its Applications, Third Edition, Harcourt Brace Jovanovich,
San Diego, CA, 1988. ↑46
[7] Robert J. Valenza, Linear Algebra: An Introduction to Abstract Mathematics, Undergraduate
Texts in Mathematics, Springer, New York, 1993. ↑vi