7 Diagonalization and Quadratic Forms
Diagonalization
Recall the definition of a diagonal matrix from Section 1.6.
A^7 = AAAAAAA.
Exercise. With A as above, work out A^16. Then try to do it directly, without the "diagonalization"!
and one can check (exercise) that
P^{-1}AP = ( 3  0 )
           ( 0 −1 ).
(iii) A is symmetric.
Proof. (i) ⇔ (ii) This follows from Theorems 6.6, 6.8 and 7.4 (exercise:
write down exactly how).
(i) ⇒ (iii) If (i) holds, say P^{-1}AP = D is diagonal with P orthogonal, then we have A = PDP^{-1} = PDP^T. Clearly D is symmetric, so
A^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = A.
Quadratic Forms
Definition 7.8. A quadratic form in n variables is a function f : R^n → R of the form
f(x) = f(x_1, ..., x_n) = Σ_{1≤i≤j≤n} c_{ij} x_i x_j          (∗)
for some constants c_{ij} ∈ R.
1. f(x_1) = x_1^2.
4. f(x_1, ..., x_n) = x_1^2 + x_2^2 + · · · + x_n^2 = ⟨v|v⟩, where v = (x_1, ..., x_n). So the Euclidean inner product (see Chapter 6) gives rise to a quadratic form.
If we set a_ii = c_ii for i = 1, ..., n and a_ij = (1/2) c_ij for 1 ≤ i < j ≤ n, then (∗) becomes
f(x) = Σ_{i=1}^{n} a_ii x_i^2 + Σ_{1≤i<j≤n} 2 a_ij x_i x_j.
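With this choice of a_ij the form can be written as f(x) = x^T Ax, where A = (a_ij) is symmetric. A minimal NumPy sketch of the construction (an aside, not part of the notes; the coefficients c_ij are made up for illustration):

    import numpy as np

    # Made-up coefficients c_ij (1 <= i <= j <= n) for n = 3:
    # f(x) = x1^2 + 4*x1*x2 + 6*x1*x3 + 2*x2^2 + 3*x3^2
    n = 3
    c = {(1, 1): 1.0, (1, 2): 4.0, (1, 3): 6.0,
         (2, 2): 2.0, (2, 3): 0.0, (3, 3): 3.0}

    # Build the symmetric matrix: a_ii = c_ii and a_ij = a_ji = c_ij / 2 for i < j
    A = np.zeros((n, n))
    for (i, j), cij in c.items():
        if i == j:
            A[i - 1, i - 1] = cij
        else:
            A[i - 1, j - 1] = A[j - 1, i - 1] = cij / 2

    # x^T A x reproduces the sum in (*) for any x
    x = np.array([1.0, -2.0, 3.0])
    direct = sum(cij * x[i - 1] * x[j - 1] for (i, j), cij in c.items())
    print(np.isclose(x @ A @ x, direct))   # True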
Theorem 7.11. (The Principal Axes Theorem) Every quadratic form f can
be diagonalized.
More specifically, if f(x) = x^T Ax is a quadratic form in x = (x_1, ..., x_n)^T, then there exists an orthogonal matrix Q such that, in terms of the new variables y = (y_1, ..., y_n)^T = Q^T x, the form becomes
f(x) = λ_1 y_1^2 + ... + λ_n y_n^2,
where λ_1, ..., λ_n are the eigenvalues of A.
From the first part of the course we know that Q is the matrix whose columns are the unit eigenvectors of the matrix A of f.
and so we have
Q = ( 1/√3  −2/√6    0   )
    ( 1/√3   1/√6   1/√2 )
    ( 1/√3   1/√6  −1/√2 ).
If we set y = Q^T x, we get
y_1 = (1/√3)(x_1 + x_2 + x_3),   y_2 = (1/√6)(−2x_1 + x_2 + x_3),   y_3 = (1/√2)(x_2 − x_3).
Then, expressed in terms of the variables y_1, y_2 and y_3, the quadratic form becomes 2y_1^2 − y_2^2 − y_3^2.
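The beginning of this example is not shown above; the data is consistent with the quadratic form f(x) = 2x_1x_2 + 2x_1x_3 + 2x_2x_3, and the following NumPy sketch (an aside, not part of the notes) reproduces the diagonalization numerically:

    import numpy as np

    # Symmetric matrix consistent with the example above
    # (assumed form: f(x) = 2*x1*x2 + 2*x1*x3 + 2*x2*x3)
    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    # eigh returns the eigenvalues and an orthogonal matrix of unit eigenvectors
    eigvals, Q = np.linalg.eigh(A)
    print(eigvals)                            # [-1. -1.  2.]
    print(np.allclose(Q.T @ Q, np.eye(3)))    # True: Q is orthogonal

    # Setting y = Q^T x turns f into a sum of squares with the eigenvalues as coefficients
    x = np.array([1.0, -2.0, 0.5])
    y = Q.T @ x
    print(np.isclose(x @ A @ x, np.sum(eigvals * y**2)))   # True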
An immediate consequence of the Principal Axes Theorem is the follow-
ing:
Theorem 7.14. Let f (x) = xT Ax be a quadratic form with matrix A. Then
f is positive definite if and only if all the eigenvalues of A are positive.
Proof. By the Principal Axes Theorem, there exists an orthogonal matrix Q
such that
f(x) = λ_1 y_1^2 + ... + λ_n y_n^2
where y = (y_1, ..., y_n)^T = Q^T x and λ_1, ..., λ_n are the eigenvalues of A. If all the λ_i are positive then f(x) > 0 except when y = 0. But y = 0 happens if and only if x = 0 because Q^T is invertible. Therefore f is positive definite.
On the other hand, if one of the eigenvalues λ_i ≤ 0, letting y = e_i and x = Qy we get f(x) = λ_i ≤ 0 and so f is not positive definite.
We say that a symmetric matrix A is positive definite if the associated
quadratic form
f (x) = xT Ax
is positive definite.
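A short NumPy sketch of this eigenvalue criterion (an aside, not part of the notes; the test matrices are made up):

    import numpy as np

    def is_positive_definite(A, tol=1e-12):
        """Test a symmetric matrix for positive definiteness via its eigenvalues."""
        A = np.asarray(A, dtype=float)
        assert np.allclose(A, A.T), "the matrix must be symmetric"
        return bool(np.all(np.linalg.eigvalsh(A) > tol))

    print(is_positive_definite([[2, -1], [-1, 2]]))   # True: eigenvalues are 1 and 3
    print(is_positive_definite([[1, 2], [2, 1]]))     # False: eigenvalues are 3 and -1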
The Principal Axes Theorem has important applications in geometry.
8 Vector Spaces
Definition and Examples
In the first part of the course we've looked at properties of the real n-space R^n. We also introduced the idea of a field K in Section 3.1: a set with two binary operations + and × satisfying the 9 field axioms. R is an example of a field but there are many more, for example C, Q and Z_p (p a prime, with modulo p addition and multiplication).
1.(i) u + v = v + u for all u, v ∈ V
The elements of V are called vectors and the elements of K are called
scalars. We sometimes refer to V as a K−space.
Examples 8.2. 1. For all n ≥ 1, R^n with the usual addition and scalar multiplication is a vector space over R. More generally, let
K^n = { (x_1, ..., x_n)^T | x_i ∈ K }
and define
(x_1, ..., x_n)^T + (y_1, ..., y_n)^T = (x_1 + y_1, ..., x_n + y_n)^T,    c(x_1, ..., x_n)^T = (cx_1, ..., cx_n)^T.
Then K^n is a vector space over K.
2. The set Mmn (R) of all m × n matrices with entries in R with addition
of matrices and scalar multiplication is a vector space over R. More
generally, let
M_mn(K) = { (a_ij)_{1≤i≤m, 1≤j≤n} | a_ij ∈ K }
be the set of m × n matrices with entries in K and define
( a_11 ... a_1n )   ( b_11 ... b_1n )   ( a_11 + b_11 ... a_1n + b_1n )
(  :         :  ) + (  :         :  ) = (      :               :      )
( a_m1 ... a_mn )   ( b_m1 ... b_mn )   ( a_m1 + b_m1 ... a_mn + b_mn )

  ( a_11 ... a_1n )   ( ca_11 ... ca_1n )
c (  :         :  ) = (   :          :  )
  ( a_m1 ... a_mn )   ( ca_m1 ... ca_mn )

where a_ij, b_ij, c ∈ K. Then M_mn(K) is a vector space over K.
We write M_n(K) = M_nn(K).
3. Let P_n denote the set of all polynomials of degree ≤ n with real coefficients,
P_n = { a_0 + a_1 x + ... + a_n x^n | a_0, ..., a_n ∈ R },
and define
Σ_{i=0}^{n} a_i x^i + Σ_{i=0}^{n} b_i x^i = Σ_{i=0}^{n} (a_i + b_i) x^i,
c ( Σ_{i=0}^{n} a_i x^i ) = Σ_{i=0}^{n} (c a_i) x^i.
Then P_n is a vector space over R.
5. The set C of complex numbers is a vector space over R with the usual
addition of complex numbers and multiplication by real numbers.
6. An unusual example: Let U be a set. Consider the power set
P(U ) = {A| A ⊆ U }. For A, B ⊆ U define
A + B = (A ∪ B)\(A ∩ B).
This definition satisfies conditions 1(i)-(iv) of Definition 8.1.
The zero in P(U ) is ∅ and −A = A.
Consider the field Z_2 = {0, 1} and define
1·A = A,   0·A = ∅,   for all A ⊆ U.
We can show that 2(i)-(iv) of Definition 8.1 are satisfied.
Hence P(U ) is a vector space over Z2 .
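As an aside (not part of the notes), Python's built-in set type implements this addition as the symmetric-difference operator ^, which makes the axioms easy to sanity-check; the sets below are an arbitrary choice:

    # Vector addition in P(U): symmetric difference; scalars come from Z_2 = {0, 1}
    U = {1, 2, 3, 4, 5}
    A, B, C = {1, 2}, {2, 3, 4}, {5}

    def add(X, Y):
        return X ^ Y                         # A + B = (A ∪ B) \ (A ∩ B)

    def smul(c, X):
        return set(X) if c == 1 else set()   # 1.A = A, 0.A = ∅

    print(add(A, B) == add(B, A))                    # commutativity
    print(add(add(A, B), C) == add(A, add(B, C)))    # associativity
    print(add(A, set()) == A)                        # ∅ is the zero vector
    print(add(A, A) == set())                        # -A = A
    print(smul(1, A) == A and smul(0, A) == set())   # scalar action of Z_2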
Theorem 8.3. Let V be a vector space over K. Then, for all u ∈ V and all
a ∈ K we have:
(i) 0u = 0;
(ii) a0 = 0;
(iii) (−1)u = −u; and
(iv) if au = 0, then a = 0 or u = 0.
Subspaces
Definition 8.4. A non-empty subset W of a K−space V is a subspace if
(i) u + v ∈ W, ∀u, v ∈ W ; and
(ii) au ∈ W, ∀u ∈ W, ∀a ∈ K.
Theorem 8.5. A subspace W of a K−space V is itself a vector space over
K with the same addition and scalar multiplication as in V .
For example, in the vector space F of all functions f : R → R, the subset
W = {f ∈ F | f(1) = 0}
is a subspace of F.
(v) In any vector space V , the subset {0} is a subspace, called the zero
subspace.
Proof. We have 0 = 0 + 0 ∈ W_1 + W_2.
Let u + v, u′ + v′ ∈ W_1 + W_2, where u, u′ ∈ W_1 and v, v′ ∈ W_2. Then
(u + v) + (u′ + v′) = (u + u′) + (v + v′) ∈ W_1 + W_2,
c(u + v) = cu + cv ∈ W_1 + W_2.
W_1 = {A ∈ M_n(R) | A = A^T},   W_2 = {B ∈ M_n(R) | B = −B^T}.
We have already seen that W_1 is a subspace and it's not hard to show that W_2 is a subspace. We have W_1 + W_2 = M_n(R): let A ∈ M_n(R). Then we can write
A = (1/2)(A + A^T) + (1/2)(A − A^T)
and by properties of the transpose we have (1/2)(A + A^T) ∈ W_1 and (1/2)(A − A^T) ∈ W_2. Therefore A ∈ W_1 + W_2 and W_1 + W_2 = M_n(R).
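A small NumPy illustration of this decomposition (an aside, not part of the notes; the matrix A is randomly generated):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))   # an arbitrary matrix in M_4(R)

    S = (A + A.T) / 2                 # symmetric part, lies in W_1
    N = (A - A.T) / 2                 # antisymmetric part, lies in W_2

    print(np.allclose(S, S.T))        # True
    print(np.allclose(N, -N.T))       # True
    print(np.allclose(S + N, A))      # True: every A is in W_1 + W_2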
We can extend Definition 8.8 to the sum of more than two subspaces: for subspaces W_1, ..., W_t of V we set
W_1 + ... + W_t = { w_1 + ... + w_t | w_i ∈ W_i for i = 1, ..., t }.
An easy induction on t and Theorem 8.9 show that this sum is a subspace of V.
Examples 9.2. 1. The set S consisting of the four matrices
( 1 0 )   ( 0 1 )   ( 0 0 )   ( 0 0 )
( 0 0 ),  ( 0 0 ),  ( 1 0 ),  ( 0 1 )
is a spanning set for the vector space M_2(K) over any field K, because we can write any 2 × 2 matrix as
( a b )     ( 1 0 )     ( 0 1 )     ( 0 0 )     ( 0 0 )
( c d ) = a ( 0 0 ) + b ( 0 0 ) + c ( 1 0 ) + d ( 0 1 ),
where a, b, c, d ∈ K.
2. The set S = {1, x, x^2, x^3, ...} is a spanning set for the vector space P of polynomials with coefficients in R. By definition, any polynomial f = Σ_{i=0}^{n} a_i x^i is a linear combination of elements of S.
in M_2(K).
Since the three matrices in S are symmetric, any linear combination of them is symmetric. Indeed,
( a b )     ( 1 0 )     ( 0 1 )     ( 0 0 )
( b d ) = a ( 0 0 ) + b ( 1 0 ) + d ( 0 1 ).
3b + c = −1, 2a = 4, −a + c = 0, −b = 1.
(i) span(v1 , ..., vk ) is a subspace of V ;
Proof. (i) Consider the span of a single vector v_i, span(v_i) = {a v_i | a ∈ K}. This is a subspace because a v_i + b v_i = (a + b)v_i and c(a v_i) = (ca)v_i both lie in span(v_i).
Linear Independence
Definition 9.4. Let V be a vector space. A finite set S = {v1 , ..., vk } of
vectors in V is called linearly independent if the only solution to
a1 v1 + ... + ak vk = 0
is a1 = ... = ak = 0.
If S is not linearly independent we say it is linearly dependent.
Remark:
If we can find a1 , ..., ak ∈ K, not all zero, such that a1 v1 + ... + ak vk = 0,
we say this is a non-trivial combination of the vectors.
Proof. Let S = {v1 , ..., vk } and assume, without loss of generality, that
v1 = a2 v2 + ... + ak vk .
Then
1v1 − a2 v2 − ... − ak vk = 0
and therefore S is linearly dependent.
Conversely assume there is a non-trivial combination
b_1 v_1 + ... + b_k v_k = 0.
Then at least one of the coefficients b_i ≠ 0 and we can write
v_i = (−b_i^{-1} b_1)v_1 + ... + (−b_i^{-1} b_{i−1})v_{i−1} + (−b_i^{-1} b_{i+1})v_{i+1} + ... + (−b_i^{-1} b_k)v_k
as required.
Examples 9.6. 1. In M_2(R), let
A = ( 1 0 ),   B = (  0 1 ),   C = ( −1 0 ).
    ( 0 0 )        ( −2 0 )        (  0 1 )
If
aA + bB + cC = 0
for some a, b, c ∈ R we have
( a − c   b ) = ( 0 0 )
(  −2b    c )   ( 0 0 )
and so a − c = 0, b = −2b = 0, c = 0, and the only solution is a = b = c = 0. Therefore {A, B, C} is linearly independent in M_2(R).
2. Consider the vector space P and let S = {1, x, x^2, ..., x^n}. Then
a_0 + a_1 x + ... + a_n x^n = 0
if and only if a_0 = a_1 = ... = a_n = 0. Hence S is linearly independent in P. (Here we use the fact that two polynomials are equal if and only if their coefficients are equal.)
3. In the vector space F of all functions f : R → R, the set {sin^2(x), cos^2(x), cos(2x)} is linearly dependent because
cos(2x) = cos^2(x) − sin^2(x).
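As an aside (not part of the notes), such independence claims can be checked numerically by flattening the vectors into the columns of a matrix and computing its rank; the sample points used for the function example are an arbitrary choice:

    import numpy as np

    # Example 9.6.1: the matrices A, B, C in M_2(R), flattened into vectors in R^4
    A = np.array([[1, 0], [0, 0]])
    B = np.array([[0, 1], [-2, 0]])
    C = np.array([[-1, 0], [0, 1]])
    M = np.column_stack([X.flatten() for X in (A, B, C)])
    print(np.linalg.matrix_rank(M) == 3)   # True: {A, B, C} is linearly independent

    # Example 9.6.3: sample sin^2, cos^2 and cos(2x) at a few points
    xs = np.linspace(0.0, 1.0, 5)
    F = np.column_stack([np.sin(xs)**2, np.cos(xs)**2, np.cos(2 * xs)])
    print(np.linalg.matrix_rank(F))        # 2: reflects cos(2x) = cos^2(x) - sin^2(x)
    # (sampling can only reveal dependence; it does not prove independence)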
The concept of linear independence can be extended to infinite sets: an
arbitrary non-empty set S in a vector space V is linearly independent if all
its finite subsets are linearly independent.
For example, in F, for each α ∈ R define f_α : R → R by
f_α(x) = 1 if x = α,   f_α(x) = 0 if x ≠ α,
and let S = {f_α | α ∈ R}. If a_1 f_{α_1} + ... + a_k f_{α_k} = 0 for distinct α_1, ..., α_k ∈ R, then evaluating at x = α_i gives
a_i · 1 = 0  ⇒  a_i = 0,  for all i = 1, ..., k.
Hence {f_{α_1}, ..., f_{α_k}} is linearly independent and so by definition S is linearly independent.
Notice this is an example of an uncountably infinite linearly inde-
pendent set.
10 Bases
Definition 10.1. A subset B of a vector space V is a basis for V if B spans
V and B is linearly independent.
Examples 10.2. 1. The standard basis for K^n is the set {e_1, ..., e_n}, where e_i = (0, ..., 0, 1, 0, ..., 0)^T with the 1 appearing in the i-th row. There are many more bases for K^n.
3. In M_mn(K), the set of matrices E_ij with a 1 in the (i, j)-th position and 0 everywhere else, for 1 ≤ i ≤ m, 1 ≤ j ≤ n, is the standard basis for M_mn(K).
4. The set of polynomials {1, x, x^2, ..., x^n} forms a basis for P_n and the infinite set {1, x, x^2, ...} is a basis for P.
Theorem 10.3. Let V be a vector space and let B be a finite basis for V .
Then for every vector v ∈ V there is a unique expression for v as a linear
combination of the vectors in B.
v = a_1 v_1 + ... + a_n v_n = b_1 v_1 + ... + b_n v_n
⇒ (a_1 − b_1)v_1 + ... + (a_n − b_n)v_n = 0
⇒ a_1 − b_1 = ... = a_n − b_n = 0   (since B is linearly independent)
⇒ a_i = b_i for all i = 1, ..., n.
(This proves the theorem for finite B. The proof for an infinite B is
similar, but requires some additional argument.)
for all v_j ∈ S\B, B ∪ {v_j} is linearly dependent. So there is a non-trivial combination
a_1 v_1 + ... + a_n v_n + a_j v_j = 0
where a_j ≠ 0 because B is linearly independent. Therefore
v_j = −a_j^{-1}(a_1 v_1 + ... + a_n v_n).
Now any w ∈ V is a linear combination of the vectors of S = {v_1, ..., v_m}, say
w = Σ_{i=1}^{n} b_i v_i + Σ_{j=n+1}^{m} b_j v_j
  = Σ_{i=1}^{n} b_i v_i + Σ_{j=n+1}^{m} b_j ( Σ_{i=1}^{n} (−a_j^{-1} a_i) v_i )
  = Σ_{i=1}^{n} b_i v_i + Σ_{i=1}^{n} ( Σ_{j=n+1}^{m} −b_j a_j^{-1} a_i ) v_i
  = Σ_{i=1}^{n} ( b_i − Σ_{j=n+1}^{m} b_j a_j^{-1} a_i ) v_i.
Hence w ∈ span(B), so B spans V and is therefore a basis.
Proof. Let S0 be a finite linearly independent set in V and let T be any finite
spanning set for V . Then S = S0 ∪ T is a finite spanning set for V containing
S0 . By Lemma 10.5, S contains a basis B of V with S0 ⊆ B.
The next result is required to prove the main theorem of this section:
Lemma 10.7. Let V be a finite dimensional vector space. Let R = {u1 , ..., un }
be a linearly independent set in V and S = {v1 , ..., vm } be a spanning set for
V . Then n ≤ m.
Proof. Consider the set T1 = {u1 } ∪ S. Then T1 is a spanning set because
it contains S and it is linearly dependent because u1 is a linear combination
of elements of S. Therefore T1 contains a basis B1 containing the linearly
independent set {u1 } by Lemma 10.5. B1 is a proper subset of T1 because T1
is linearly dependent and so B1 = {u1 } ∪ S1 where S1 is a proper subset of
S.
Now consider T2 = {u2 } ∪ B1 = {u1 , u2 } ∪ S1 . Then T2 is a spanning set
and it is linearly dependent because u2 ∈ span(B1 ). Therefore T2 contains a
basis B2 containing the linearly independent set {u1 , u2 } and B2 is a proper
subset of T2 . We can write B2 = {u1 , u2 } ∪ S2 where S2 is a proper subset of
S1 .
Continuing in this way we find that V has a basis B_n = {u_1, ..., u_n} ∪ S_n where
S_n ⊊ S_{n−1} ⊊ ... ⊊ S_1 ⊊ S.
Hence
0 ≤ |S_n| ≤ |S| − n  ⇒  n ≤ |S| = m.
Theorem 10.8. (The Basis Theorem) Any two bases of a finite dimen-
sional vector space have the same number of elements.
Proof. Let V be a finite dimensional vector space . Then V has a finite basis
B by definition. Suppose B has n elements. Let C be any other finite basis
and suppose that C has m elements. Since B is linearly independent and C
is a spanning set, then Lemma 10.7 gives n ≤ m. Similarly m ≤ n since B is
a spanning set and C is linearly independent. Therefore n = m.
Finally, any basis of V is finite, otherwise V would contain an infinite
linearly independent set, and hence a finite linearly independent set with
more than n elements. However this contradicts Lemma 10.7 because B is a
spanning set with n elements.
Definition 10.9. Let V be a finite dimensional vector space over K, with
V 6= {0}. The number of vectors in any basis for V is called the dimension
of V , denoted by dimK (V ) or simply dim(V ). For the zero space we set
dim{0} = 0.
Examples 10.10. 1. For any n ≥ 1 and any field K,
dimK (K n ) = n.
3. We have seen that the set of matrices
( 1 0 )   ( 0 0 )   ( 0 1 )
( 0 0 ),  ( 0 1 ),  ( 1 0 )
forms a basis for the space of symmetric matrices in M_2(K). Hence this space has dimension 3. It is a subspace of M_2(K).
4. The set
{E_ij | 1 ≤ i ≤ m, 1 ≤ j ≤ n},
where E_ij is the matrix with (i, j)-th entry 1 and all other entries 0, is a basis for M_mn(K). Therefore
dim_K(M_mn(K)) = mn.
5. Indicating the field K can be important because some sets can be re-
garded as vector spaces over different fields. For example if V = C,
then
dimC (C) = 1, with basis {1},
dimR (C) = 2, with basis {1, i}.
1. V has a finite basis and all bases of V have the same (finite) number of ele-
ments. This number is called the dimension of V .
Proof. (i) If S is a linearly independent subset of V , then S is contained in
a basis B and so |S| ≤ |B| = n.
(ii) If S is a spanning set for V , then S contains a basis B and n = |B| ≤
|S|.
(iii) If S is a linearly independent set with |S| = n, then (i) implies that
S = B for some basis B.
(iv) If S is a spanning set for V with |S| = n, then (ii) implies that S = B
for some basis B.
The next result shows that any subspace of a finite dimensional vector
space is also finite dimensional.
dim(W ) = m ≤ n = dim(V ).
W = V ⇒ dimW = dimV.
Proof. Since W1 ∩ W2 is a subspace of W1 and of W2 it is finite dimensional
by Theorem 10.12(i). Let B = {u1 , ..., un } be a basis for W1 ∩ W2 . Then
this can be extended to a basis B1 = {u1 , ..., un , v1 , ..., vk } for W1 and a basis
B2 = {u1 , ..., un , w1 , ..., wm } for W2 . We will show that
B3 = B1 ∪ B2 = {u1 , ..., un , v1 , ..., vk , w1 , ..., wm }
is a basis for W1 + W2 . It is clear that B3 spans W1 + W2 so it remains to
show that it is linearly independent. Assume that
a1 u1 + ... + an un + b1 v1 + ... + bk vk + c1 w1 + ... + cm wm = 0 (∗)
for some scalars a1 , ..., an , b1 , ..., bk , c1 , ..., cm ∈ K. Then u + v + w = 0, where
u = a1 u1 + ... + an un , v = b1 v1 + ... + bk vk , w = c1 w1 + ... + cm wm . So
w = −u − v ∈ W1 ∩ W2 and we can write w = d1 u1 + ... + dn un for some
d1 , ..., dn ∈ K. Substituting into (∗) we get
(a1 + d1 )u1 + ... + (an + dn )un + b1 v1 + ... + bk vk = 0. (∗∗)
Since B1 is linearly independent, all coefficients in (∗∗) are zero and so (∗)
becomes
a1 u1 + ... + an un + c1 w1 + ... + cm wm = 0. (∗ ∗ ∗)
Since B2 is linearly independent, all coefficients in (∗ ∗ ∗) are zero and this
means that all coefficients in (∗) are zero. Therefore B3 is linearly indepen-
dent and is a basis for W1 + W2 .
Finally we have
dim(W1 + W2 ) = n + k + m = (n + k) + (n + m) − n
= dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).
11 Co-ordinates and change of bases
Let V be a vector space with dim(V ) = n and let B = {v1 , ..., vn } be a
basis for V . Then every vector v ∈ V has a unique expression as a linear
combination
v = a1 v1 + ... + an vn
for some a_1, ..., a_n ∈ K. The scalars a_1, ..., a_n are called the co-ordinates of v with respect to B, and the vector (a_1, ..., a_n)^T ∈ K^n is called the co-ordinate vector of v with respect to B, denoted by [v]_B.
Therefore the co-ordinates of A are a, b, c, d and the co-ordinate vector is [A]_B = (a, b, c, d)^T.
3. Consider C as a vector space over R. A standard basis is B = {1, i}.
The co-ordinate vector of v = a + bi with respect to B is [v]_B = (a, b)^T.
u = v ⇔ [u]B = [v]B .
v ↔ [v]B .
The next result shows that this correspondence agrees with the vector space
operations:
Theorem 11.2. Let B = {v1 , ..., vn } be a basis for V . Then for all u, v ∈ V
and all a ∈ K
(i) [u + v]_B = [u]_B + [v]_B;
(ii) [au]_B = a[u]_B.
u = a1 v1 + ... + an vn , v = b1 v1 + ... + bn vn .
Then
Hence
[u + v]_B = (a_1 + b_1, ..., a_n + b_n)^T = (a_1, ..., a_n)^T + (b_1, ..., b_n)^T = [u]_B + [v]_B, and
[au]_B = (aa_1, ..., aa_n)^T = a(a_1, ..., a_n)^T = a[u]_B.
[a1 u1 + ... + ak uk ]B = 0 ∈ K n .
Then
a1 [u1 ]B + ... + ak [uk ]B = 0 ∈ K n
and so a_1 = ... = a_k = 0 since C′ is linearly independent. Therefore C is linearly independent as required.
Change of Bases
Let B = {u1 , ..., un } and C = {v1 , ..., vn } be bases for a vector space V . Then
each ui has a unique expression as a linear combination of the vectors in C:
Definition 11.5. The n × n matrix whose column vectors are the co-ordinate vectors [u_1]_C, ..., [u_n]_C of the vectors of B with respect to C is denoted by P_{B→C} and is called the change of basis matrix from B to C:
P_{B→C} = ( [u_1]_C  [u_2]_C  ...  [u_n]_C ).
Example 11.6. In R^2, let B = { (1, 1)^T, (−1, 1)^T } and C = { (1, 2)^T, (2, 1)^T }. Then
u_1 = (1, 1)^T = (1/3)(1, 2)^T + (1/3)(2, 1)^T = (1/3)v_1 + (1/3)v_2,
u_2 = (−1, 1)^T = 1·(1, 2)^T − 1·(2, 1)^T = 1·v_1 − 1·v_2.
Therefore
P_{B→C} = ( 1/3   1 )
          ( 1/3  −1 ).
If we know the change of basis matrix PB→C and the co-ordinates of a
vector w ∈ V with respect to B we can easily get the co-ordinates of w with
respect to C.
Theorem 11.7. Let B = {u1 , ..., un } and C = {v1 , ..., vn } be bases for a
vector space V and let PB→C be the change of basis matrix from B to C. Then
(i) [v]_C = P_{B→C} [v]_B for all v ∈ V;
(ii) P_{B→C} is the only n × n matrix P satisfying P[v]_B = [v]_C for all v ∈ V;
(iii) P_{B→C} is invertible and (P_{B→C})^{-1} = P_{C→B}.
Proof. (i) Let v ∈ V with [v]_B = (a_1, ..., a_n)^T, ie. v = a_1 u_1 + ... + a_n u_n. Then
[v]_C = [a_1 u_1 + ... + a_n u_n]_C
      = a_1 [u_1]_C + ... + a_n [u_n]_C
      = ([u_1]_C ... [u_n]_C) (a_1, ..., a_n)^T
      = P_{B→C} [v]_B.
(ii) Suppose P is an n × n matrix with P [v]B = [v]C for all v ∈ V . Then
for v = ui we get
[ui ]B = ei ,
where ei is the vector with a 1 in the i-th position and 0 everywhere else. So
the i-th column of P is
Pi = P ei = P [ui ]B = [ui ]C ,
the i-th column of P = PB→C . Therefore P = PB→C .
(iii) By part (i) we have, for all v ∈ V ,
[v]B = PC→B [v]C = PC→B PB→C [v]B .
Hence P = PC→B PB→C has the property that
[v]B = P [v]B
for all v ∈ V. Therefore P = P_{C→B} P_{B→C} = I_n. Hence P_{B→C} is invertible and
(P_{B→C})^{-1} = P_{C→B}.
Corollary 11.8. If B, C and D are bases of a vector space V , then
PB→C = PD→C PB→D .
Proof. By part (i) of Theorem 11.7, we have for all v ∈ V ,
[v]C = PD→C [v]D = PD→C PB→D [v]B
and then part (ii) of Theorem 11.7 gives that
PB→C = PD→C PB→D .
This Corollary gives an easy method for computing a change of basis
matrix. Suppose we are given B, C and D. Then we have
PB→C = PD→C PB→D
= (PC→D )−1 PB→D .
We can take advantage of this if PB→D is easy to compute, for example if D
is a standard basis.
Example 11.9. As in our previous example, let B = { (1, 1)^T, (−1, 1)^T } and C = { (1, 2)^T, (2, 1)^T }. Set D = { (1, 0)^T, (0, 1)^T }, the standard basis for R^2. Then
P_{B→D} = ( 1 −1 ),    P_{C→D} = ( 1 2 ).
          ( 1  1 )               ( 2 1 )
Then
P_{B→C} = (P_{C→D})^{-1} P_{B→D}
        = ( −1/3   2/3 ) ( 1 −1 ) = ( 1/3   1 )
          (  2/3  −1/3 ) ( 1  1 )   ( 1/3  −1 )
which is what we calculated directly before.
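A NumPy sketch of this calculation (an aside, not part of the notes), using the matrices from Example 11.9:

    import numpy as np

    # Columns are the B- and C-basis vectors written in the standard basis D
    P_B_to_D = np.array([[1.0, -1.0],
                         [1.0,  1.0]])
    P_C_to_D = np.array([[1.0, 2.0],
                         [2.0, 1.0]])

    # P_{B->C} = (P_{C->D})^{-1} P_{B->D}; solve() avoids forming the inverse explicitly
    P_B_to_C = np.linalg.solve(P_C_to_D, P_B_to_D)
    print(P_B_to_C)                          # approximately [[1/3, 1], [1/3, -1]]

    # Check: u_1 has B-co-ordinates (1, 0); its C-co-ordinates should be (1/3, 1/3)
    print(P_B_to_C @ np.array([1.0, 0.0]))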
Example 11.10. Consider C as a vector space over R. Let B = {1+i, 1−i},
C = {2 + 3i, 1 + 2i}, bases for C. Then
P_{B→C} = (  1   3 )
          ( −1  −5 ).
12 Linear Transformations
Definition 12.1. A linear transformation from a vector space V (over
K) to a vector space W (over K) is a function T : V → W such that for all
u, v ∈ V and all a ∈ K,
1. T (u + v) = T (u) + T (v);
2. T (au) = aT (u).
Note: It follows from the definition that a function T : V → W is a linear
transformation if and only if for all u1 , ..., uk ∈ V and for all a1 , ..., ak ∈ K,
T (a1 u1 + ... + ak uk ) = a1 T (u1 ) + ... + ak T (uk ).
We say that T commutes with linear combinations.
Examples 12.2. 1. Matrix transformations: For any matrix A ∈ Mmn (K),
define TA : K n → K m by setting
TA (u) = Au
Another example: the transpose map T : M_mn(K) → M_nm(K) given by
T(A) = A^T
is a linear transformation.
4. For any two vector spaces V and W over K, the zero transformation
T0 : V → W defined by
T0 (v) = 0
for all v ∈ V, and the identity map I : V → V defined by
I(v) = v
for all v ∈ V, are both linear transformations.
(i) T (0) = 0;
for all x ∈ A.
Proof. Exercise.
For any three linear transformations T : U → V, S : V → W, R : W → Y
we have
R ◦ (S ◦ T ) = (R ◦ S) ◦ T,
the associativity law for composition of functions. Also,
T ◦ IU = T, IV ◦ T = T.
Recall that a function f : A → B is invertible if there exists a function
g : B → A with g ◦ f = IA and f ◦ g = IB , and in this case g is unique and
is called the inverse of f , denoted by f −1 . Also f is invertible if and only if
f is injective and surjective.
Theorem 12.5. If a linear transformation T : V → W is invertible, then
the inverse T −1 : W → V is also a linear transformation.
Proof. Let x, y ∈ W and a ∈ K. Then
T ◦ T^{-1}(x + y) = I_W(x + y) = x + y = T ◦ T^{-1}(x) + T ◦ T^{-1}(y) = T(T^{-1}(x) + T^{-1}(y)),
using the linearity of T, and since T is injective this gives
T^{-1}(x + y) = T^{-1}(x) + T^{-1}(y).
Also
T ◦ T −1 (ax) = ax = aT ◦ T −1 (x) = T (aT −1 (x))
and again, since T is injective this gives T −1 (ax) = aT −1 (x). Hence T −1 is a
linear transformation.
2. Consider T : Mmn (K) → Mnm (K) given by T (A) = AT . Then
A ∈ ker(T ) ⇔ T (A) = 0nm ⇔ AT = 0nm ⇔ A = 0mn .
Therefore ker(T ) = {0mn }. Here we have
range(T ) = Mnm (K)
because every matrix in Mnm (K) is the image of its transpose, ie.
A = T (AT ) = (AT )T
for all A ∈ Mnm (K).
3. Consider T : M_2(R) → F given by
T ( a b ) = a sin(x) − 2d cos(x).
  ( c d )
Here ( a b ) ∈ ker(T) ⇔ a sin(x) − 2d cos(x) = 0 for all x ∈ R.
     ( c d )
Since sin(x) and cos(x) are linearly independent in F, this implies that a = d = 0. Hence
ker(T) = { ( 0 b ) | b, c ∈ R }.
           ( c 0 )
The range of T is the subspace of F spanned by {sin(x), cos(x)}.
Theorem 12.8. The kernel of a linear transformation T : V → W is a
subspace of V and the range is a subspace of W .
Proof. Since T (0) = 0, we have 0 ∈ ker(T ) and so ker(T ) 6= ∅. Let u, v ∈
ker(T ) and a ∈ K. Then
T (u + v) = T (u) + T (v) = 0 + 0 = 0
and so u + v ∈ ker(T ), and
T (au) = aT (u) = a0 = 0
and so au ∈ ker(T ). By the subspace criteria, ker(T ) is a subspace of V .
Let x, y ∈ range(T ) and suppose x = T (u), y = T (v) for some u, v ∈ V .
Then
x + y = T (u) + T (v) = T (u + v)
and so x + y ∈ range(T ) and if a ∈ K, then
ax = aT (u) = T (au)
and so ax ∈ range(T ). Therefore range(T ) is a subspace of W .
Definition 12.9. Let T : V → W be a linear transformation. The rank of
T is defined by
rank(T ) = dim(range(T ))
and the nullity of T is defined as
nullity(T) = dim(ker(T)).
But then
w = T (a1 v1 + ... + ak vk + ak+1 vk+1 + ... + an vn )
= a1 T (v1 ) + ... + ak T (vk ) + ak+1 T (vk+1 ) + ... + an T (vn )
= ak+1 T (vk+1 ) + ... + an T (vn )
because v_1, ..., v_k ∈ ker(T). Therefore w is a linear combination of vectors in B_2 and so range(T) is spanned by B_2. To show that B_2 is linearly independent, assume that
c_{k+1} T(v_{k+1}) + ... + c_n T(v_n) = 0.
Then T(c_{k+1} v_{k+1} + ... + c_n v_n) = 0 and so c_{k+1} v_{k+1} + ... + c_n v_n ∈ ker(T). Since B_1 is a basis for ker(T) we have
Theorem 12.13. Let V and W be vector spaces over a field K with dim(V ) =
dim(W ). Then a linear transformation T : V → W is injective if and only
if it is surjective.
Proof. Exercise
in W . Then
T (a1 v1 + ... + an vn ) = 0
and so a1 v1 + ... + an vn ∈ ker(T ) = {0}. Then
a1 v1 + ... + an vn = 0
Proof. Let B = {v_1, ..., v_n} be a basis for V. Then T(B) is a linearly independent set of n vectors in W and since dim(W) = n, T(B) must form a basis for W.
Proof. If T : V → W is an isomorphism, then range(T ) = W and ker(T ) =
{0} and then the Rank Theorem gives
Corollary 12.18. Any finite dimensional vector space V over a field K with
dim(V ) = n is isomorphic to K n .
T (v) = [v]B
is an isomorphism.
and C = {w_1, w_2}, where
w_1 = (1, 1)^T,   w_2 = (1, 0)^T,
Proof. Let B = {v1 , ..., vn } and let {e1 , ..., en } be the standard basis for K n .
Then [vi ]B = ei for each i = 1, ..., n. Let A =B [T ]C . Then
A[vi ]B = Aei = [T (vi )]C (∗)
since Aei is the i-th column of A and this is [T (vi )]C by the definition of A.
Now any v ∈ V can be written as a linear combination v = a1 v1 + ... + an vn
and then
[v]B = a1 e1 + ... + an en .
So
A[v]B = A(a1 e1 + ... + an en )
= a1 Ae1 + ... + an Aen
= a1 [T (v1 )]C + ... + an [T (vn )]C
= [a1 T (v1 ) + ... + an T (vn )]C
= [T (a1 v1 + ... + an vn )]C = [T (v)]C .
In our previous example, let v = (1, 3, 4)^T = 1v_1 + 2v_2 + 2v_3. Then
T(v) = (4, −1)^T = −1w_1 + 5w_2,
so
[T(v)]_C = (−1, 5)^T.
Also
B[T]_C [v]_B = ( 1 0 −1 ) ( 1 )   ( −1 )
               ( 1 1  1 ) ( 2 ) = (  5 ),
                          ( 2 )
as predicted by the Theorem.
Example 13.4. Let IV : V → V be the identity map on a vector space V
with dim(V ) = n and let B = {v1 , ..., vn } and C be bases of V . What is
B [IV ]C ?
By definition, the i-th column of this matrix is [IV (vi )]C = [vi ]C . Therefore
B [IV ]C = PB→C ,
the change of basis matrix. In particular
B [IV ]B = In .
Theorem 13.6. Let V and W be finite dimensional vector spaces over a
field K with bases B and C respectively, and let T : V → W be a linear
transformation. Then T is invertible if and only if dim(V ) = dim(W ) and
B [T ]C is an invertible matrix. In this case
C[T]_B = (B[T]_C)^{-1}.
T (v) = 0 ⇒ [T (v)]C = 0
⇒B [T ]C [v]B = 0 ⇒ [v]B = 0
⇒ v = 0.
The Rank Theorem now gives range(T ) = W and so T is a bijection and
hence T is invertible.
[T]_B = B[T]_B.
[T]_C = P^{-1} [T]_B P, where P = P_{C→B}.
Proof. Let B = {u1 , ..., un } and C = {v1 , ..., vn }. Then the i-th column of
[T ]C is
[T (vi )]C = PB→C .[T (vi )]B
= PB→C .[T ]B [vi ]B
= PB→C .[T ]B .PC→B [vi ]C
= PB→C .[T ]B .PC→B ei .
Therefore
[T ]C = PB→C .[T ]B .PC→B
= (PC→B )−1 .[T ]B .PC→B
as required.
Note: If matrices A and B can be written as B = P −1 AP for some invertible
matrix P , then we say that A and B are similar matrices. Theorem 13.7
says that the matrices of T with respect to different bases are similar.
Conversely if A, B ∈ Mn (K) are similar matrices, then they represent the
same linear operator T : K n → K n with respect to some bases B, C of K n .
Suppose B = P −1 AP for some invertible P ∈ Mn (K). We have A = [TA ]B
where TA (v) = Av for all v ∈ K n and B = {e1 , ..., en } is the standard basis
for K n . Therefore
B = P −1 AP = P −1 [TA ]B P = [TA ]C
where C is the basis {P e1 , ..., P en }, ie. the basis of K n with change of basis
matrix PC→B = P .
Example 13.8. Let
A = ( 1 1 ) ∈ M_2(R).
    ( 1 1 )
Then A = [T_A]_B where B = {e_1, e_2} is the standard basis of R^2 and
T_A ( x ) = ( 1 1 ) ( x ) = ( x + y )
    ( y )   ( 1 1 ) ( y )   ( x + y ).
Let C = {u_1, u_2} with u_1 = (1, 1)^T = e_1 + e_2 and u_2 = (1, −1)^T = e_1 − e_2. Then
P_{C→B} = ( 1  1 )   and   (P_{C→B})^{-1} = ( 1/2  1/2 ).
          ( 1 −1 )                          ( 1/2 −1/2 )
Therefore
[T_A]_C = ( 1/2  1/2 ) ( 1 1 ) ( 1  1 )
          ( 1/2 −1/2 ) ( 1 1 ) ( 1 −1 )
        = ( 2 0 )
          ( 0 0 ).
We can check this directly from the definition:
T_A(u_1) = (2, 2)^T = 2u_1 + 0u_2,   T_A(u_2) = (0, 0)^T = 0u_1 + 0u_2.
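A quick NumPy check of this example (an aside, not part of the notes):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
    P = np.array([[1.0,  1.0],    # P = P_{C->B}: columns are u_1 and u_2 in the standard basis
                  [1.0, -1.0]])

    TA_C = np.linalg.inv(P) @ A @ P
    print(TA_C)                       # [[2. 0.] [0. 0.]]

    # Directly from the definition: T_A(u_1) = 2u_1 and T_A(u_2) = 0
    print(A @ P[:, 0], A @ P[:, 1])   # [2. 2.] and [0. 0.]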
If we wish to discover whether a vector v is in row(A), then we can
consider AT , using the method above and the fact that row(A) = col(AT ).
Theorem 14.4. For any matrix A ∈ Mmn (K), the dimension of row(A) is
the number of non-zero rows in the reduced row echelon form of A and the
non-zero rows of the RREF of A form a basis for row(A).
A(u + v) = Au + Av = 0 + 0 = 0
The dimension of null(A) is called the nullity of A, denoted nullity(A).
Example 14.7. Find a basis for null(A) where
A = ( 1 −1  2 )
    ( 0  1 −1 )
    ( 3 −3  6 ).
We must find a basis for the set of vectors v = (a, b, c)^T such that Av = 0. The augmented matrix is
( 1 −1  2 | 0 )
( 0  1 −1 | 0 )
( 3 −3  6 | 0 )
which reduces to
( 1 −1  2 | 0 )
( 0  1 −1 | 0 )
( 0  0  0 | 0 ).
We get a − b + 2c = b − c = 0 and so v = (−t, t, t)^T = t(−1, 1, 1)^T for t ∈ R. Therefore null(A) is a 1-dimensional subspace of R^3 spanned by (−1, 1, 1)^T, and
nullity(A) = 1.
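The same calculation can be done with SymPy (an aside, not part of the notes; SymPy's nullspace method returns a basis of null(A) in exact arithmetic):

    import sympy as sp

    A = sp.Matrix([[1, -1, 2],
                   [0, 1, -1],
                   [3, -3, 6]])

    basis = A.nullspace()             # basis vectors of null(A)
    print(basis)                      # [Matrix([[-1], [1], [1]])]
    print(len(basis))                 # nullity(A) = 1
    print(A.rank() + len(basis))      # rank(A) + nullity(A) = 3, the number of columns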
Proof. Write vi for the ith column of A. Then vi = Aei ∈ range(TA ) (where
{e1 , . . . , en } is the standard basis of K n ). So
Definition 14.9. For A ∈ Mmn (K), define rank(A) to be the dimension of
the column space of A. By the above lemma we have rank(A) = rank(TA ).
Note that it is not true in general that col(A) = row(A). They just have
the same dimension.
Proof. Let R be the reduced row echelon form (RREF) of A. By Theorem
14.4 we have dim(row(A)) = r, the number of non-zero rows in R. Since
solutions to Av = 0 are precisely the solutions of Rv = 0, we have null(A) =
null(R).
By Theorem 14.10, nullity(A) + rank(A) = n = nullity(R) + rank(R). So rank(A) = rank(R) = r = dim(row(A)).
Example 14.12. Consider the matrix
A = ( 3 −1 5 )
    ( 2  1 3 )
    ( 0 −5 1 ) ∈ M_3(R).
The reduced row echelon form of A is
R = ( 1 0  8/5 )
    ( 0 1 −1/5 )
    ( 0 0   0  ).
Therefore row(A) has dimension 2 and we can take the two non-zero rows of R as a basis for row(A), i.e.,
B = { (1, 0, 8/5)^T, (0, 1, −1/5)^T }
is a basis for row(A). For the column space we can perform column operations on A or row operations on A^T. If we do the latter, the reduced row echelon form of
A^T = (  3  2  0 )
      ( −1  1 −5 )
      (  5  3  1 )
is
R′ = ( 1 0  2 )
     ( 0 1 −3 )
     ( 0 0  0 ).
Therefore a basis for col(A) is
B′ = { (1, 0, 2)^T, (0, 1, −3)^T }.
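A SymPy sketch of Example 14.12 (an aside, not part of the notes):

    import sympy as sp

    A = sp.Matrix([[3, -1, 5],
                   [2, 1, 3],
                   [0, -5, 1]])

    R, pivots = A.rref()              # reduced row echelon form of A
    print(R)                          # rows (1, 0, 8/5), (0, 1, -1/5), (0, 0, 0)
    print(A.T.rref()[0])              # RREF of A^T: rows (1, 0, 2), (0, 1, -3), (0, 0, 0)

    # Built-in helpers give (possibly different) bases for the same spaces
    print(A.rowspace())               # a basis for row(A): 2 vectors
    print(A.columnspace())            # a basis for col(A): 2 vectors
    print(A.rank())                   # 2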