
7 Diagonalization and Quadratic Forms

Diagonalization
Recall the definition of a diagonal matrix from Section 1.6.

Definition 7.1. A square matrix A is diagonalizable if there exists an invertible matrix P such that P^{-1} AP is diagonal. We say that P diagonalizes A.

Remark. Why is this interesting? For many applications we need to compute powers of matrices, for example

A^7 = AAAAAAA.

To do this by direct calculation is a lot of work, but if A is diagonalizable, say P^{-1} AP = D diagonal, then A = P D P^{-1} so

A^7 = (P D P^{-1})(P D P^{-1})(P D P^{-1})(P D P^{-1})(P D P^{-1})(P D P^{-1})(P D P^{-1}) = P D^7 P^{-1},

and more generally, A^k = P D^k P^{-1} for all k. We have seen in Exercise Sheet 5 that D^k is easy to compute, so this gives a much easier way to work out A^k for large k.

Examples 7.2. Consider


     
A = \begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix}, \quad P = \begin{pmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \quad P^{-1} = \begin{pmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{pmatrix}.

Then we can check that


 
P^{-1} AP = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix} =: D,

so A is diagonalizable. Now for any m we have


A^m = P D^m P^{-1} = P \begin{pmatrix} 2^m & 0 & 0 \\ 0 & 2^m & 0 \\ 0 & 0 & 1^m \end{pmatrix} P^{-1}

so to calculate Am we need only calculate the scalar power 2m and then


perform two matrix multiplications.

Exercise. With A as above, work out A16 . Then try and do it directly
without the “diagonalization”!
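For readers who want to check their answer numerically, here is a minimal sketch (assuming NumPy; not part of the original exercise) that computes A^16 both via the diagonalization A^16 = P D^16 P^{-1} and by direct repeated multiplication, and confirms the two agree.

```python
import numpy as np

# Matrices from Examples 7.2
A = np.array([[0, 0, -2],
              [1, 2,  1],
              [1, 0,  3]], dtype=float)
P = np.array([[-1, 0, -2],
              [ 0, 1,  1],
              [ 1, 0,  1]], dtype=float)

P_inv = np.linalg.inv(P)
D = P_inv @ A @ P                       # should be diag(2, 2, 1)

# A^16 via the diagonalization: only the diagonal entries get powered
A16_diag = P @ np.diag(np.diag(D) ** 16) @ P_inv

# A^16 by direct repeated multiplication, for comparison
A16_direct = np.linalg.matrix_power(A, 16)

assert np.allclose(A16_diag, A16_direct)
print(A16_diag)
```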

Given that diagonalizing a matrix is so useful, it is natural to ask which


matrices can be diagonalized. To answer this question we will need a lemma
giving yet another characterisation of invertible matrices.

Lemma 7.3. Let P be an n × n square matrix. Then P is invertible if and


only if its columns (viewed as column n-vectors) form a set of n linearly
independent vectors.

Proof. See Section 14.

Theorem 7.4. Let A be an n×n matrix. Then A is diagonalizable if and only


if A has n linearly independent eigenvectors. A matrix P diagonalizes A if
and only if P ’s columns form a set of n linearly independent eigenvectors for
A. If it does, then the main diagonal entries of the diagonal matrix P −1 AP
are the eigenvalues of A (in the order corresponding to the columns of P ).

Proof. Suppose P −1 AP = D is diagonal. Let c1 , . . . , cn be the columns of


P . By Lemma 7.3, the columns are linearly independent. Now P −1 AP = D
implies
AP = P D.
It follows from the definition of matrix multiplication that (i) the ith column
of P D is Dii ci and (ii) the ith column of AP is Aci . Thus we have Aci = Dii ci ,
so each column ci is an eigenvector of A corresponding to the eigenvalue Dii .
Conversely, if A has n linearly independent eigenvectors c1 , . . . , cn then
let P be the matrix with these as columns, and D the diagonal matrix with
the corresponding eigenvalues on the main diagonal. By Lemma 7.3, P is
invertible. Now reversing the argument above, the ith column of AP is Aci
and the ith column of P D is Dii ci so AP = P D, so P −1 AP = D.

Example 7.5. Let


     
A = \begin{pmatrix} 3 & 0 \\ 8 & -1 \end{pmatrix}, \quad u = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.

Then A has eigenvalues 3 and −1, with corresponding eigenvectors u and


v respectively (exercise: check this). It is easy to check that {u, v} is
linearly independent. If we let P be the matrix whose columns are u and v,
   
P = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}, \quad \text{then} \quad P^{-1} = \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}

and one can check (exercise) that
 
P^{-1} AP = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}.

Definition 7.6. An n×n matrix A is called orthogonally diagonalizable


if there is an orthogonal matrix P such that P −1 AP = P T AP is diagonal.

Theorem 7.7. Let A be an n × n matrix. Then the following are equivalent:

(i) A is orthogonally diagonalizable.

(ii) A has an orthonormal set of n eigenvectors;

(iii) A is symmetric.

Proof. (i) ⇔ (ii) This follows from Theorems 6.6, 6.8 and 7.4 (exercise:
write down exactly how).
(i) ⇒ (iii) If (i) holds, say P −1 AP = D is diagonal with P orthogonal,
then we have A = P DP −1 = P DP T . Clearly D is symmetric, so

AT = (P DP T )T = (P T )T DT P T = P DP T = A

which means that A is symmetric.


(iii) ⇒ (ii) Omitted (see for example the textbook of Anton).

Quadratic Forms
Definition 7.8. A quadratic form in n variables is a function f : Rn → R
of the form

f (x) = f (x1 , ..., xn ) = \sum_{1 \le i \le j \le n} cij xi xj    (∗)

where x ∈ Rn and cij ∈ R (1 ≤ i ≤ j ≤ n). Alternatively, a quadratic form is


a homogeneous polynomial of degree 2 in n variables x1 , ..., xn .

Examples 7.9. The following are quadratic forms:

1. f (x1 ) = x1^2

2. f (x1 , x2 ) = 2x1^2 + 3x2^2 − x1 x2

3. f (x1 , x2 , x3 ) = x1^2 + x2^2 + x3^2 − 2x1 x2 − 2x1 x3 − 2x2 x3

4. f (x1 , . . . , xn ) = x1^2 + x2^2 + · · · + xn^2 = ⟨v|v⟩, where v = (x1 , . . . , xn ). So the Euclidean inner product (see Chapter 6) gives rise to a quadratic form.
If we set aii = cii for i = 1, ..., n and aij = (1/2) cij for 1 ≤ i < j ≤ n, then (∗) becomes

f (x) = \sum_{i=1}^{n} aii xi^2 + \sum_{1 \le i < j \le n} 2 aij xi xj

and we can write this as


f (x) = xT Ax
where A is the symmetric n × n matrix with (i, j)-th entry equal to aij . Then
A is called the matrix of the quadratic form f .

Example 7.10. Let f (x1 , x2 ) = 2x1^2 − 3x2^2 − x1 x2 . Then

f (x1 , x2 ) = \begin{pmatrix} x1 & x2 \end{pmatrix} \begin{pmatrix} 2 & -1/2 \\ -1/2 & -3 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix}.
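As a quick numerical sanity check (a sketch assuming NumPy; not part of the notes), one can build this symmetric matrix and confirm that x^T A x reproduces f:

```python
import numpy as np

# Matrix of the quadratic form f(x1, x2) = 2*x1^2 - 3*x2^2 - x1*x2
A = np.array([[ 2.0, -0.5],
              [-0.5, -3.0]])

def f(x1, x2):
    return 2*x1**2 - 3*x2**2 - x1*x2

x = np.array([1.7, -0.4])               # an arbitrary test point
assert np.isclose(x @ A @ x, f(*x))     # x^T A x agrees with f
```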

Now that a symmetric matrix is involved, we can take advantage of Theorem 7.7, ie. there exists an orthogonal matrix Q such that

Q^T A Q = D = \begin{pmatrix} λ1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & λn \end{pmatrix}

where D is a diagonal matrix and λ1 , ..., λn are the eigenvalues of A.


Now let y = Q^{-1} x = Q^T x. Then x = Qy and

f (x) = (Qy)^T A (Qy) = y^T Q^T A Q y = y^T D y

and if y = \begin{pmatrix} y1 \\ \vdots \\ yn \end{pmatrix} then

y^T D y = λ1 y1^2 + ... + λn yn^2 ,

a quadratic form in variables y1 , ..., yn with no cross terms. This process


is called diagonalization of the quadratic form f . We have just proved a
famous theorem, namely

Theorem 7.11. (The Principal Axes Theorem) Every quadratic form f can be diagonalized. More specifically, if f (x) = x^T A x is a quadratic form in x = \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix}, then there exists an orthogonal matrix Q such that

f (x) = x^T A x = λ1 y1^2 + ... + λn yn^2

where \begin{pmatrix} y1 \\ \vdots \\ yn \end{pmatrix} = Q^T \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} and λ1 , ..., λn are the eigenvalues of the matrix A.

From the first part of the course we know that Q is the matrix whose columns are the unit eigenvectors of the matrix A of f .

Example 7.12. Let f (x1 , x2 , x3 ) = 2x1 x2 + 2x1 x3 + 2x2 x3 . The matrix of f is

A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.

The eigenvalues of A are 2, −1, −1 with corresponding unit eigenvectors

\begin{pmatrix} 1/√3 \\ 1/√3 \\ 1/√3 \end{pmatrix}, \quad \begin{pmatrix} -2/√6 \\ 1/√6 \\ 1/√6 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1/√2 \\ -1/√2 \end{pmatrix}

and so we have

Q = \begin{pmatrix} 1/√3 & -2/√6 & 0 \\ 1/√3 & 1/√6 & 1/√2 \\ 1/√3 & 1/√6 & -1/√2 \end{pmatrix}.

If we set y = Q^T x, we get

y1 = (1/√3)(x1 + x2 + x3 ), \quad y2 = (1/√6)(−2x1 + x2 + x3 ), \quad y3 = (1/√2)(x2 − x3 ).

Then, expressed in terms of the variables y1 , y2 and y3 , the quadratic form becomes 2y1^2 − y2^2 − y3^2 .
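This orthogonal diagonalization is easy to reproduce numerically. A small sketch (assuming NumPy; not part of the notes) — note that np.linalg.eigh returns the eigenvalues in ascending order, so they appear as −1, −1, 2 rather than 2, −1, −1:

```python
import numpy as np

# Matrix of the quadratic form in Example 7.12
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])

# eigh: eigenvalues in ascending order, orthonormal eigenvectors as columns of Q
eigvals, Q = np.linalg.eigh(A)
print(eigvals)                          # [-1. -1.  2.]

# Q^T A Q is diagonal with the eigenvalues on the diagonal
D = Q.T @ A @ Q
assert np.allclose(D, np.diag(eigvals))

# In the new variables y = Q^T x the form has no cross terms
x = np.array([0.3, -1.2, 2.0])
y = Q.T @ x
assert np.isclose(x @ A @ x, np.sum(eigvals * y**2))
```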

Definition 7.13. A quadratic form f : Rn → R is positive definite if f (x) > 0 for all x ≠ 0.

An immediate consequence of the Principal Axes Theorem is the follow-
ing:
Theorem 7.14. Let f (x) = xT Ax be a quadratic form with matrix A. Then
f is positive definite if and only if all the eigenvalues of A are positive.
Proof. By the Principal Axes Theorem, there exists an orthogonal matrix Q
such that
f (x) = λ1 y1^2 + ... + λn yn^2

where y = \begin{pmatrix} y1 \\ \vdots \\ yn \end{pmatrix} = Q^T x and λ1 , ..., λn are the eigenvalues of A. If all the
λi are positive then f (x) > 0 except when y = 0. But this happens if and
only if x = 0 because QT is invertible. Therefore f is positive definite.
On the other hand if one of the eigenvalues λi ≤ 0, letting y = ei and
x = Qy we get f (x) = λi ≤ 0 and so f is not positive definite.
We say that a symmetric matrix A is positive definite if the associated
quadratic form
f (x) = xT Ax
is positive definite.
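Theorem 7.14 also gives a practical numerical test: compute the eigenvalues of the symmetric matrix and check that they are all positive. A small sketch (assuming NumPy; the helper name is illustrative and not part of the notes):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positive definiteness of a symmetric matrix via its eigenvalues (Theorem 7.14)."""
    A = np.asarray(A, dtype=float)
    assert np.allclose(A, A.T), "matrix must be symmetric"
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

print(is_positive_definite([[2, -0.5], [-0.5, 3]]))    # True
print(is_positive_definite([[2, -0.5], [-0.5, -3]]))   # False (the matrix of Example 7.10)
```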
The Principal Axes Theorem has important applications in geometry.

8 Vector Spaces
Definition and Examples
In the first part of the course we’ve looked at properties of the real n-space
Rn . We also introduced the idea of a field K in Section 3.1 which is any set
with two binary operations + and × satisfying the 9 field axioms. R is an
example of a field but there are many more, for example C, Q and Zp (p a
prime, with modulo p addition and multiplication).

In the second part of the course we will be looking at vector spaces.


These will be a generalisation of Rn and we will see many other examples that
have the same properties. We will start with an abstract definition listing
the vector space axioms. All the properties we derive will then apply to
any example that satisfies this definition.
Definition 8.1. A vector space over a field K is a set V with addition
and scalar multiplication, ie. u + v ∈ V is defined for all u, v ∈ V and
au ∈ V is defined for all a ∈ K and u ∈ V , such that

1.(i) u + v = v + u for all u, v ∈ V

(ii) (u + v) + w = u + (v + w) for all u, v, w ∈ V

(iii) there exists an element 0 ∈ V such that u + 0 = u for all u ∈ V

(iv) for each u ∈ V , there exists a unique element −u ∈ V such that u +


(−u) = 0

2.(i) a(u + v) = au + av for all a ∈ K, for all u, v ∈ V

(ii) (a + b)u = au + bu for all a, b ∈ K, for all u ∈ V

(iii) a(bu) = (ab)u for all a, b ∈ K, for all u ∈ V

(iv) 1u = u for all u ∈ V .

The elements of V are called vectors and the elements of K are called
scalars. We sometimes refer to V as a K−space.

Examples 8.2. 1. For all n ≥ 1, Rn with the usual addition and scalar
multiplication is a vector space over R. More generally, let
  

K^n = \left\{ \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} \;\middle|\; xi ∈ K \right\}

and define

\begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} + \begin{pmatrix} y1 \\ \vdots \\ yn \end{pmatrix} = \begin{pmatrix} x1 + y1 \\ \vdots \\ xn + yn \end{pmatrix}, \qquad c \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} = \begin{pmatrix} c x1 \\ \vdots \\ c xn \end{pmatrix}

where xi , yi , c ∈ K. Then K n is a vector space over K.

2. The set Mmn (R) of all m × n matrices with entries in R with addition
of matrices and scalar multiplication is a vector space over R. More
generally, let
  

M_{mn}(K) = \left\{ \begin{pmatrix} a11 & \dots & a1n \\ \vdots & & \vdots \\ am1 & \dots & amn \end{pmatrix} \;\middle|\; aij ∈ K \right\}

be the set of m × n matrices with entries in K and define

\begin{pmatrix} a11 & \dots & a1n \\ \vdots & & \vdots \\ am1 & \dots & amn \end{pmatrix} + \begin{pmatrix} b11 & \dots & b1n \\ \vdots & & \vdots \\ bm1 & \dots & bmn \end{pmatrix} = \begin{pmatrix} a11 + b11 & \dots & a1n + b1n \\ \vdots & & \vdots \\ am1 + bm1 & \dots & amn + bmn \end{pmatrix},

c \begin{pmatrix} a11 & \dots & a1n \\ \vdots & & \vdots \\ am1 & \dots & amn \end{pmatrix} = \begin{pmatrix} c a11 & \dots & c a1n \\ \vdots & & \vdots \\ c am1 & \dots & c amn \end{pmatrix}

where aij , bij , c ∈ K. Then Mmn (K) is a vector space over K.
We write Mn (K) = Mnn (K).

3. Let Pn denote the set of all polynomials of degree ≤ n with real coef-
ficients:

Pn = {a0 + a1 x + a2 x2 + ... + an xn | ai ∈ R, ∀i = 0, ..., n}

and define

\sum_{i=0}^{n} ai x^i + \sum_{i=0}^{n} bi x^i = \sum_{i=0}^{n} (ai + bi ) x^i ,

c \left( \sum_{i=0}^{n} ai x^i \right) = \sum_{i=0}^{n} (c ai ) x^i .

Then Pn is a vector space over R.


The zero vector is the zero polynomial with all coefficients equal to 0, and −(\sum_{i=0}^{n} ai x^i ) = \sum_{i=0}^{n} (−ai ) x^i .
The set P of all polynomials over R is also a vector space over R.

4. Let F denote the set of all functions from R → R and for


f : R → R, g : R → R and c ∈ R define

(f + g)(x) = f (x) + g(x), (cf )(x) = c.f (x), ∀x ∈ R.

Then F is a vector space over R.


The zero is the constant function f0 such that f0 (x) = 0, ∀x ∈ R and
for any f ∈ F, −f is the function defined by

(−f )(x) = −f (x), ∀x ∈ R.

5. The set C of complex numbers is a vector space over R with the usual
addition of complex numbers and multiplication by real numbers.
6. An unusual example: Let U be a set. Consider the power set
P(U ) = {A| A ⊆ U }. For A, B ⊆ U define
A + B = (A ∪ B)\(A ∩ B).
This definition satisfies conditions 1(i)-(iv) of Definition 8.1.
The zero in P(U ) is ∅ and −A = A.
Consider the field Z2 = {0, 1} and define
1.A = A, 0.A = ∅, ∀A ⊆ U.
We can show that 2(i)-(iv) of Definition 8.1 are satisfied.
Hence P(U ) is a vector space over Z2 .
Theorem 8.3. Let V be a vector space over K. Then, for all u ∈ V and all
a ∈ K we have:
(i) 0u = 0;
(ii) a0 = 0;
(iii) (−1)u = −u; and
(iv) if au = 0, then a = 0 or u = 0.

Proof. (i) 0u = (0 + 0)u = 0u + 0u


⇒ 0 = 0u − 0u = (0u + 0u) − 0u = 0u + (0u − 0u) = 0u.
(ii) a0 = a(0 + 0) = a0 + a0
⇒ 0 = a0 − a0 = (a0 + a0) − a0 = a0 + (a0 − a0) = a0.
(iii) We need to show that u + (−1)u = 0. We have
u + (−1)u = 1u + (−1)u = (1 − 1)u = 0u = 0.

(iv) If au = 0 and a ≠ 0, then there exists a−1 ∈ K. We have


a−1 (au) = a−1 0 = 0
but also a−1 (au) = (a−1 a)u = 1u = u and therefore u = 0.

Subspaces
Definition 8.4. A non-empty subset W of a K−space V is a subspace if
(i) u + v ∈ W, ∀u, v ∈ W ; and

(ii) au ∈ W, ∀u ∈ W, ∀a ∈ K.
Theorem 8.5. A subspace W of a K−space V is itself a vector space over
K with the same addition and scalar multiplication as in V .

Proof. Since W ≠ ∅, there exists u ∈ W and then 0 = 0.u ∈ W by (ii) of


Definition 8.4. For each v ∈ W we have −v = (−1)v ∈ W . The remaining
properties of a vector space hold in W because they hold in V and W ⊆
V.
  

Examples 8.6. (i) In Rn , let W = \left\{ \begin{pmatrix} x1 \\ \vdots \\ xn \end{pmatrix} \;\middle|\; xi ∈ R, x1 = 0 \right\}. Then W satisfies the two conditions of Definition 8.4 and is a subspace of Rn .

(ii) In Mn (K) let


W = {A ∈ Mn (K)| A = AT }
be the subset of all symmetric matrices. Then W is a subspace because
the sum of two symmetric matrices is symmetric and any scalar multiple
of a symmetric matrix is also symmetric.

(iii) For all n, Pn is a subspace of P because the sum of two polynomials


with degree ≤ n is a polynomial of degree ≤ n and any scalar multiple
of a polynomial of degree ≤ n has degree ≤ n.
Also Pn is a subspace of Pm for all n ≤ m.

(iv) In the space of real-valued functions F let

W = {f ∈ F| f (1) = 0}.

Then W is a subspace of F because if f, g ∈ W then

(f + g)(1) = f (1) + g(1) = 0 + 0 = 0

and for all a ∈ R,

(af )(1) = af (1) = a0 = 0.

(v) In any vector space V , the subset {0} is a subspace, called the zero
subspace.

Theorem 8.7. Let W1 and W2 be subspaces of the vector space V . Then


W1 ∩ W2 is also a subspace of V .

Proof. First note that 0 ∈ W1 ∩ W2 and so W1 ∩ W2 ≠ ∅. Let u, v ∈ W1 ∩ W2 .


Then u + v ∈ W1 and u + v ∈ W2 because W1 , W2 are subspaces. Hence
u + v ∈ W1 ∩ W2 .
Similarly if a ∈ K and u ∈ W1 ∩ W2 , then au ∈ W1 and au ∈ W2 and so
au ∈ W1 ∩ W2 . Therefore W1 ∩ W2 satisfies the two conditions of Definition
8.4 and is a subspace of V .

Definition 8.8. Let W1 and W2 be subspaces of a vector space V . Then the


set
W1 + W2 = {u + v| u ∈ W1 , v ∈ W2 }
is called the sum of W1 and W2 in V .

Theorem 8.9. The sum of two subspaces of a vector space V is a subspace


of V .

Proof. We have 0 = 0 + 0 ∈ W1 + W2 .
Let u + v, u′ + v′ ∈ W1 + W2 , where u, u′ ∈ W1 and v, v′ ∈ W2 . Then

(u + v) + (u′ + v′) = (u + u′) + (v + v′) ∈ W1 + W2

and for any c ∈ K,

c(u + v) = cu + cv ∈ W1 + W2

because W1 and W2 are subspaces of V .

Example 8.10. In Mn (R) consider the subsets

W1 = {A ∈ Mn (R)| A = AT }, W2 = {B ∈ Mn (R)| B = −B T }.

We have already seen that W1 is a subspace and it’s not hard to show that
W2 is a subspace. We have

W1 ∩ W2 = {A ∈ Mn (R)| A = AT and A = −AT }.


So if A ∈ W1 ∩W2 , then AT = −AT and we get AT = 0 and A = 0. Therefore
W1 ∩ W2 = {0}.

Let A ∈ Mn (R). Then we can write
A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A − A^T)

and by properties of the transpose we have \frac{1}{2}(A + A^T) ∈ W1 and \frac{1}{2}(A − A^T) ∈ W2 . Therefore A ∈ W1 + W2 and W1 + W2 = Mn (R).

We can extend Definition 8.8 to the sum of more than two subspaces:

Definition 8.11. Let W1 , W2 , ..., Wt be subspaces of the vector space V . Then

W1 + W2 + ... + Wt = {w1 + w2 + ... + wt | wi ∈ Wi , i = 1, ..., t}

is the sum of the subspaces W1 , ..., Wt .

An easy induction on t and Theorem 8.9 show that this sum is a subspace
of V .

9 Spanning Sets and Linear Independence


Spanning Sets
Let V be a vector space over K. An expression of the form
a1 v1 + a2 v2 + ... + am vm = \sum_{i=1}^{m} ai vi ,

where v1 , ..., vm ∈ V and a1 , ..., am ∈ K is called a linear combination of


the vectors v1 , ..., vm .

Definition 9.1. Let S be a set of vectors in a K−space V . The set of all


linear combinations of vectors from S is called the span of S, denoted by
span(S). If V = span(S), then S is called a spanning set for V and we
say V is spanned by S.

Note: The span of S can be written as


span(S) = \left\{ \sum_{i=1}^{m} ai vi \;\middle|\; v1 , ..., vm ∈ S, a1 , ..., am ∈ K, m ≥ 1 \right\}.

Examples 9.2. 1. The set
       
S = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}

is a spanning set for the vector space M2 (K) over any field K because we can write any 2 × 2 matrix as

\begin{pmatrix} a & b \\ c & d \end{pmatrix} = a \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + b \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + c \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + d \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}

where a, b, c, d ∈ K.

2. The set S = {1, x, x2 , x3 , ...} is a spanning set for the vector space P
of polynomials with coefficients in R. By definition, any polynomial
f = \sum_{i=0}^{n} ai x^i is a linear combination of elements of S.

In P, span(x, x2 , x3 , ...) is the set of all polynomials with zero constant


term, span(1, x, ..., xn ) is the set Pn of all polynomials of degree ≤ n.

3. Find the span of the subset


     
S = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}

in M2 (K).
Since the three matrices in S are symmetric, any linear combination of them is symmetric. Indeed,

\begin{pmatrix} a & b \\ b & d \end{pmatrix} = a \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + b \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + d \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}

and so span(S) is the set of all symmetric matrices in M2 (K).

4. Show that −1 + 4x + x3 ∈ span(2x − x2 , 3 − x3 , 1 + x2 ).

Assume −1 + 4x + x3 = a(2x − x2 ) + b(3 − x3 ) + c(1 + x2 ). Then

3b + c = −1, 2a = 4, −a + c = 0, −b = 1.

This set of equations has solution a = c = 2, b = −1 and therefore


−1 + 4x + x3 ∈ span(2x − x2 , 3 − x3 , 1 + x2 ).

Theorem 9.3. Let V be a vector space over K and v1 , ..., vk ∈ V . Then

(i) span(v1 , ..., vk ) is a subspace of V ;

(ii) span(v1 , ..., vk ) is the smallest subspace of V that contains v1 , ..., vk .

Proof. (i) Consider the span of a single vector vi , span(vi ) = {avi | a ∈ K}.
This is a subspace because

avi + bvi = (a + b)vi ∈ span(vi ), c(avi ) = (ca)vi ∈ span(vi )

for any a, b, c ∈ K. Also

span(v1 , ..., vk ) = span(v1 ) + ... + span(vk )

and therefore span(v1 , ..., vk ) is a subspace by Theorem 8.9.

(ii) If W is a subspace containing v1 , ..., vk , then, since subspaces are closed


under addition and scalar multiplication, W contains all linear combinations
of v1 , ..., vk , and hence W contains span(v1 , ..., vk ).

Linear Independence
Definition 9.4. Let V be a vector space. A finite set S = {v1 , ..., vk } of
vectors in V is called linearly independent if the only solution to

a1 v1 + ... + ak vk = 0

is a1 = ... = ak = 0.
If S is not linearly independent we say it is linearly dependent.

Remark:
If we can find a1 , ..., ak ∈ K, not all zero, such that a1 v1 + ... + ak vk = 0,
we say this is a non-trivial combination of the vectors.

Theorem 9.5. A finite set of vectors S is linearly dependent if and only if


one of the vectors in S can be written as a linear combination of the other
vectors in S.

Proof. Let S = {v1 , ..., vk } and assume, without loss of generality, that

v1 = a2 v2 + ... + ak vk .

Then
1v1 − a2 v2 − ... − ak vk = 0
and therefore S is linearly dependent.

Conversely assume there is a non-trivial combination
b1 v1 + ... + bk vk = 0.
Then at least one of the coefficients bi ≠ 0 and we can write

bi vi = −b1 v1 − ... − bi−1 vi−1 − bi+1 vi+1 − ... − bk vk .

Therefore

vi = bi^{-1} (−b1 v1 − ... − bi−1 vi−1 − bi+1 vi+1 − ... − bk vk ) = (−bi^{-1} b1 )v1 + ... + (−bi^{-1} bi−1 )vi−1 + (−bi^{-1} bi+1 )vi+1 + ... + (−bi^{-1} bk )vk

as required.
   
Examples 9.6. 1. In M2 (R), let A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, B = \begin{pmatrix} 0 & 1 \\ -2 & 0 \end{pmatrix}, C = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}. If

aA + bB + cC = 0

for some a, b, c ∈ R we have

\begin{pmatrix} a−c & b \\ −2b & c \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
and so a − c = 0, b = −2b = 0, c = 0 and the only solution is a = b =
c = 0. Therefore {A, B, C} is linearly independent in M2 (R).
2. Consider the vector space P and let S = {1, x, x2 , ..., xn }. Then
a0 + a1 x + ... + an xn = 0
if and only if a0 = a1 = ... = an = 0. Hence S is linearly independent
in P. (Here we use the fact that two polynomials are equal if and only
if their coefficients are equal.)
3. In the vector space F of all functions f : R → R, the set {sin2 (x), cos2 (x), cos(2x)}
is linearly dependent because
cos(2x) = cos2 (x) − sin2 (x).

4. Sets containing a single non-zero vector are always linearly indepen-


dent.
Sets containing the zero vector are always linearly dependent.
Sets containing two non-zero vectors {v1 , v2 } are linearly dependent if
and only if v1 = av2 for some scalar a.

The concept of linear independence can be extended to infinite sets: an
arbitrary non-empty set S in a vector space V is linearly independent if all
its finite subsets are linearly independent.

Examples 9.7. 1. The set {1, x, x2 , ...} is linearly independent in P.

2. In F, consider the set S = {fα | α ∈ R} where

fα (x) = \begin{cases} 1 & \text{if } x = α \\ 0 & \text{if } x ≠ α. \end{cases}

Let {fα1 , ..., fαk } be an arbitrary finite subset of S. If

a1 fα1 + ... + ak fαk = 0,

then evaluating the LHS at x = αi we get

a1 fα1 (αi ) + ... + ak fαk (αi ) = 0,

⇒ ai .1 = 0 ⇒ ai = 0, ∀i = 1, ..., k.
Hence {fα1 , ..., fαk } is linearly independent and so by definition S is
linearly independent.
Notice this is an example of an uncountably infinite linearly inde-
pendent set.

10 Bases
Definition 10.1. A subset B of a vector space V is a basis for V if B spans
V and B is linearly independent.

Examples 10.2. 1. The standard basis for K n is the set {e1 , ..., en }
 
where ei = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}, where the 1 appears in the i-th row. There are many more bases for K n .

2. Another basis for R2 is {e1 , e2 }, where


   
e1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad e2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

3. In Mmn (K), the set of matrices Eij with a 1 in the (i, j)−th position
and 0 everywhere else, for 1 ≤ i ≤ m, 1 ≤ j ≤ n is the standard basis
for Mmn (K).

4. The set of polynomials {1, x, x2 , ..., xn } form a basis for Pn and the
infinite set {1, x, x2 , ...} is a basis for P.

The advantage of a basis over an ordinary spanning set is that every


vector in the space has a unique representation as a linear combination of
the basis vectors:

Theorem 10.3. Let V be a vector space and let B be a finite basis for V .
Then for every vector v ∈ V there is a unique expression for v as a linear
combination of the vectors in B.

Proof. Let B = {v1 , ..., vn }. Suppose that

v = a1 v1 + ... + an vn = b1 v1 + ... + bn vn

where a1 , ..., an , b1 , ..., bn ∈ K. Then

0 = v − v = (a1 − b1 )v1 + ... + (an − bn )vn

and since B is linearly independent this gives

a1 − b1 = ... = an − bn = 0

⇒ ai = b i , ∀i = 1, ..., n.

(This proves the theorem for finite B. The proof for an infinite B is
similar, but requires some additional argument.)

Definition 10.4. A vector space V is called finite dimensional if it has


a finite basis.

Lemma 10.5. Let V be a finite dimensional vector space. Let S be a finite


spanning set for V and S0 be a linearly independent subset of S. Then S
contains a basis B of V with S0 ⊆ B.

Proof. Let B be a maximum linearly independent subset of S that contains


S0 , ie. B is not contained in any larger linearly independent subset of S that
contains S0 . We need to show that B is a spanning set for V . If B = S we are
done so assume that B = {v1 , ..., vn }, S = {v1 , ..., vm } where m > n. Then

for all vj ∈ S\B, B ∪ {vj } is linearly dependent. So there is a non-trivial
combination
a1 v1 + ... + an vn + aj vj = 0

where aj ≠ 0 because B is linearly independent. Therefore

vj = −aj^{-1} (a1 v1 + ... + an vn )

and vj is a linear combination of the vectors in B for all j = n + 1, ..., m.


Now consider any v ∈ V . Since S is a spanning set we can write
v = \sum_{i=1}^{m} bi vi
  = \sum_{i=1}^{n} bi vi + \sum_{j=n+1}^{m} bj vj
  = \sum_{i=1}^{n} bi vi + \sum_{j=n+1}^{m} bj \sum_{i=1}^{n} (−aj^{-1} ai ) vi
  = \sum_{i=1}^{n} bi vi + \sum_{i=1}^{n} \left( \sum_{j=n+1}^{m} −bj aj^{-1} ai \right) vi
  = \sum_{i=1}^{n} \left( bi − \sum_{j=n+1}^{m} bj aj^{-1} ai \right) vi .

Therefore v can be written as a linear combination of vectors in B and B is


a spanning set for V .

Theorem 10.6. Every finite linearly independent set in a finite dimensional


vector space V can be extended to form a basis for V .

Proof. Let S0 be a finite linearly independent set in V and let T be any finite
spanning set for V . Then S = S0 ∪ T is a finite spanning set for V containing
S0 . By Lemma 10.5, S contains a basis B of V with S0 ⊆ B.
The next result is required to prove the main theorem of this section:

Lemma 10.7. Let V be a finite dimensional vector space. Let R = {u1 , ..., un }
be a linearly independent set in V and S = {v1 , ..., vm } be a spanning set for
V . Then n ≤ m.

Proof. Consider the set T1 = {u1 } ∪ S. Then T1 is a spanning set because
it contains S and it is linearly dependent because u1 is a linear combination
of elements of S. Therefore T1 contains a basis B1 containing the linearly
independent set {u1 } by Lemma 10.5. B1 is a proper subset of T1 because T1
is linearly dependent and so B1 = {u1 } ∪ S1 where S1 is a proper subset of
S.
Now consider T2 = {u2 } ∪ B1 = {u1 , u2 } ∪ S1 . Then T2 is a spanning set
and it is linearly dependent because u2 ∈ span(B1 ). Therefore T2 contains a
basis B2 containing the linearly independent set {u1 , u2 } and B2 is a proper
subset of T2 . We can write B2 = {u1 , u2 } ∪ S2 where S2 is a proper subset of
S1 .
Continuing in this way we find that V has a basis Bn = {u1 , ..., un } ∪ Sn
where
Sn ⊊ Sn−1 ⊊ ... ⊊ S1 ⊊ S.

Hence

0 ≤ |Sn | ≤ |S| − n  ⇒  n ≤ |S| = m.

Theorem 10.8. (The Basis Theorem) Any two bases of a finite dimen-
sional vector space have the same number of elements.
Proof. Let V be a finite dimensional vector space . Then V has a finite basis
B by definition. Suppose B has n elements. Let C be any other finite basis
and suppose that C has m elements. Since B is linearly independent and C
is a spanning set, then Lemma 10.7 gives n ≤ m. Similarly m ≤ n since B is
a spanning set and C is linearly independent. Therefore n = m.
Finally, any basis of V is finite, otherwise V would contain an infinite
linearly independent set, and hence a finite linearly independent set with
more than n elements. However this contradicts Lemma 10.7 because B is a
spanning set with n elements.
Definition 10.9. Let V be a finite dimensional vector space over K, with
V 6= {0}. The number of vectors in any basis for V is called the dimension
of V , denoted by dimK (V ) or simply dim(V ). For the zero space we set
dim{0} = 0.
Examples 10.10. 1. For any n ≥ 1 and any field K,

dimK (K n ) = n.

2. The set {1, x, ..., xn } is a basis for Pn . Hence dim(Pn ) = n + 1.

     
3. We have seen that the set of matrices \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
forms a basis for the space of symmetric matrices in M2 (K). Hence this
space has dimension 3. It is a subspace of M2 (K).

4. The matrix space Mmn (K) has standard basis

{Eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n}

where Eij is the matrix with (i, j)-th entry 1 and all other entries 0.
Therefore
dimK (Mmn (K)) = mn.

5. Indicating the field K can be important because some sets can be re-
garded as vector spaces over different fields. For example if V = C,
then
dimC (C) = 1, with basis {1},
dimR (C) = 2, with basis {1, i}.

Summary: Let V be a finite dimensional vector space over a field K. Then

1. V has a finite basis and all bases of V have the same (finite) number of ele-
ments. This number is called the dimension of V .

2. Every finite spanning set for V contains a basis.

3. Every linearly independent set in V can be extended to form a basis.

The following theorem is an easy consequence of these important facts.

Theorem 10.11. Let V be a vector space with dim(V ) = n. Then

(i) any linearly independent set in V contains at most n vectors;

(ii) any spanning set for V contains at least n vectors;

(iii) any linearly independent set with exactly n vectors is a basis;

(iv) any spanning set of exactly n vectors is a basis.

Proof. (i) If S is a linearly independent subset of V , then S is contained in
a basis B and so |S| ≤ |B| = n.
(ii) If S is a spanning set for V , then S contains a basis B and n = |B| ≤
|S|.
(iii) If S is a linearly independent set with |S| = n, then (i) implies that
S = B for some basis B.
(iv) If S is a spanning set for V with |S| = n, then (ii) implies that S = B
for some basis B.

Remark: Let V be a vector space of dimension n. Then if we are given a


set of n vectors B, to prove B is a basis we only need to show ONE of the
following: either B is a spanning set OR B is linearly independent.

The next result shows that any subspace of a finite dimensional vector
space is also finite dimensional.

Theorem 10.12. Let W be a subspace of a finite dimensional vector space


V . Then

(i) W is finite dimensional and dim(W ) ≤ dim(V );

(ii) dim(W ) = dim(V ) if and only if W = V .

Proof. (i) Let dim(V ) = n. Since W ⊆ V , it cannot contain more than n


linearly independent vectors by Theorem 10.11(i). Let C = {w1 , ..., wm } be a
basis for W . Then C is linearly independent in V and therefore

dim(W ) = m ≤ n = dim(V ).

(ii) If dim(W ) = dim(V ) = n, then W has a basis with n vectors and, by Theorem 10.11(iii), this is also a basis for V . Therefore W = V . Conversely,

W = V ⇒ dim(W ) = dim(V ).

Theorem 10.13. Let W1 and W2 be finite dimensional subspaces in a vector


space V . Then

dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).

Proof. Since W1 ∩ W2 is a subspace of W1 and of W2 it is finite dimensional
by Theorem 10.12(i). Let B = {u1 , ..., un } be a basis for W1 ∩ W2 . Then
this can be extended to a basis B1 = {u1 , ..., un , v1 , ..., vk } for W1 and a basis
B2 = {u1 , ..., un , w1 , ..., wm } for W2 . We will show that
B3 = B1 ∪ B2 = {u1 , ..., un , v1 , ..., vk , w1 , ..., wm }
is a basis for W1 + W2 . It is clear that B3 spans W1 + W2 so it remains to
show that it is linearly independent. Assume that
a1 u1 + ... + an un + b1 v1 + ... + bk vk + c1 w1 + ... + cm wm = 0 (∗)
for some scalars a1 , ..., an , b1 , ..., bk , c1 , ..., cm ∈ K. Then u + v + w = 0, where
u = a1 u1 + ... + an un , v = b1 v1 + ... + bk vk , w = c1 w1 + ... + cm wm . So
w = −u − v ∈ W1 ∩ W2 and we can write w = d1 u1 + ... + dn un for some
d1 , ..., dn ∈ K. Substituting into (∗) we get
(a1 + d1 )u1 + ... + (an + dn )un + b1 v1 + ... + bk vk = 0. (∗∗)
Since B1 is linearly independent, all coefficients in (∗∗) are zero and so (∗)
becomes
a1 u1 + ... + an un + c1 w1 + ... + cm wm = 0. (∗ ∗ ∗)
Since B2 is linearly independent, all coefficients in (∗ ∗ ∗) are zero and this
means that all coefficients in (∗) are zero. Therefore B3 is linearly indepen-
dent and is a basis for W1 + W2 .
Finally we have
dim(W1 + W2 ) = n + k + m = (n + k) + (n + m) − n
= dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).

Corollary 10.14. If W1 , W2 are finite dimensional subspaces of a vector


space V such that W1 ∩W2 = {0}, then dim(W1 +W2 ) = dim(W1 )+dim(W2 ).
Example 10.15. In M2 (R) let
W1 = {A ∈ M2 (R) | A = AT },
W2 = {A ∈ M2 (R) | A = −AT }
Then dim(W1 ) = 3 (see Examples 10.10) and dim(W2 ) = 1 (with basis \left\{ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \right\}).
Also W1 ∩ W2 = {0} and so dim(W1 + W2 ) = 4 = dim(M2 (R)).
Therefore W1 + W2 = M2 (R).

11 Co-ordinates and change of bases
Let V be a vector space with dim(V ) = n and let B = {v1 , ..., vn } be a
basis for V . Then every vector v ∈ V has a unique expression as a linear
combination
v = a1 v1 + ... + an vn
for some a1 , ..., an ∈ K. The scalars a1 , ..., an are called the co-ordinates of v with respect to B and the vector \begin{pmatrix} a1 \\ \vdots \\ an \end{pmatrix} ∈ K n is called the co-ordinate vector of v with respect to B, denoted by [v]B .

Remark: In defining the co-ordinate vector, we have implicitly used an


ordering of the elements of the basis. Uniqueness of the co-ordinate vector
is based on this ordering.
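Numerically, finding [v]B amounts to solving the linear system whose coefficient matrix has the vectors of B as columns. A small sketch for K = R (assuming NumPy; the helper name is illustrative and not part of the notes). It reproduces the co-ordinates (2a − b, b − a) found for the basis B′ in Examples 11.1 below:

```python
import numpy as np

def coordinate_vector(basis, v):
    """Return [v]_B, where basis is a list of vectors forming a basis of R^n."""
    B = np.column_stack(basis)       # basis vectors as columns
    return np.linalg.solve(B, v)     # solve B a = v for the co-ordinates a

# Basis B' = {(1,1), (1,2)} from Examples 11.1, and v = (a, b) = (5, 3)
coords = coordinate_vector([np.array([1., 1.]), np.array([1., 2.])],
                           np.array([5., 3.]))
print(coords)                        # [ 7. -2.]  i.e. 2a - b and b - a
```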
   
Examples 11.1. 1. In R2 , the standard basis is B = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} and with respect to this basis any vector v = \begin{pmatrix} a \\ b \end{pmatrix} has co-ordinates a and b and the co-ordinate vector is [v]B = v.
Consider another basis of R2 ,

B′ = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}.

We can write

v = \begin{pmatrix} a \\ b \end{pmatrix} = (2a − b) \begin{pmatrix} 1 \\ 1 \end{pmatrix} + (b − a) \begin{pmatrix} 1 \\ 2 \end{pmatrix}

and so the co-ordinate vector with respect to B′ is

[v]B′ = \begin{pmatrix} 2a − b \\ b − a \end{pmatrix}.

2. Let V = M2 (R). We know that


       
B = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}

is a basis for M2 (R) and any 2 × 2 matrix can be written as

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + b \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + c \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + d \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

Therefore the co-ordinates of A are a, b, c, d and the co-ordinate vector is [A]B = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}.
3. Consider C as a vector space over R. A standard basis is B = {1, i}.
The co-ordinate vector of v = a + bi with respect to B is
 
[v]B = \begin{pmatrix} a \\ b \end{pmatrix}.

Let C = {2 + 3i, 1 + 2i}, another basis for C. We can write v = a + bi = (2a − b)(2 + 3i) + (2b − 3a)(1 + 2i). So the co-ordinate vector of v with respect to C is

[v]C = \begin{pmatrix} 2a − b \\ 2b − 3a \end{pmatrix}.
Remark If B = {v1 , ..., vn } is a basis for V over the field K, then the co-
ordinates of any vector in V with respect to B are uniquely determined.
Hence there is a 1-1 correspondence between the elements of V and their
co-ordinate vectors: for any vectors u, v ∈ V,

u = v ⇔ [u]B = [v]B .

In other words there is a 1-1 correspondence between V and K n given by

v ↔ [v]B .

The next result shows that this correspondence agrees with the vector space
operations:
Theorem 11.2. Let B = {v1 , ..., vn } be a basis for V . Then for all u, v ∈ V
and all a ∈ K
(i) [u + v]B = [u]B + [v]B

(ii) [au]B = a[u]B


Proof. Write u, v as linear combinations of the basis vectors:

u = a1 v1 + ... + an vn , v = b1 v1 + ... + bn vn .

Then

u + v = (a1 + b1 )v1 + ... + (an + bn )vn , au = aa1 v1 + ... + aan vn .

Hence
     
[u + v]B = \begin{pmatrix} a1 + b1 \\ \vdots \\ an + bn \end{pmatrix} = \begin{pmatrix} a1 \\ \vdots \\ an \end{pmatrix} + \begin{pmatrix} b1 \\ \vdots \\ bn \end{pmatrix} = [u]B + [v]B , and

[au]B = \begin{pmatrix} aa1 \\ \vdots \\ aan \end{pmatrix} = a \begin{pmatrix} a1 \\ \vdots \\ an \end{pmatrix} = a[u]B .

Corollary 11.3. For all u1 , ..., uk ∈ V, a1 , ..., ak ∈ K,

[a1 u1 + ... + ak uk ]B = a1 [u1 ]B + ... + ak [uk ]B .

Theorem 11.4. Let B = {v1 , ..., vn } be a basis for V . Then a set C = {u1 , ..., uk } ⊆ V is linearly independent if and only if the set C′ = {[u1 ]B , ..., [uk ]B } is linearly independent in K n .

Proof. Suppose that C is linearly independent. Let

a1 [u1 ]B + ... + ak [uk ]B = 0.

Then by Corollary 11.3,

[a1 u1 + ... + ak uk ]B = 0 ∈ K n .

Then by definition, a1 u1 + ... + ak uk = 0v1 + ... + 0vn = 0 ∈ V and so


a1 = ... = ak = 0 since C is linearly independent. This means that C′ is linearly independent.
Conversely suppose that C′ is linearly independent and let

a1 u1 + ... + ak uk = 0 = 0v1 + ... + 0vn ∈ V.

Then
a1 [u1 ]B + ... + ak [uk ]B = 0 ∈ K n
and so a1 = ... = ak = 0 since C′ is linearly independent. Therefore C is
linearly independent as required.

Change of Bases
Let B = {u1 , ..., un } and C = {v1 , ..., vn } be bases for a vector space V . Then
each ui has a unique expression as a linear combination of the vectors in C:

ui = a1i v1 + a2i v2 + ... + ani vn

and the co-ordinate vector of ui with respect to C is


 
[ui ]C = \begin{pmatrix} a1i \\ \vdots \\ ani \end{pmatrix}

for each i = 1, ..., n.

Definition 11.5. The n×n matrix whose column vectors are the co-ordinate
vectors [u1 ]C , ..., [un ]C of the vectors of B with respect to C is denoted by PB→C
and is called the change of basis matrix from B to C:

PB→C = [[u1 ]C ...[un ]C ]

       
Example 11.6. In R2 , let B = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}, C = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right\}. Then

u1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{3} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \frac{1}{3} \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \frac{1}{3} v1 + \frac{1}{3} v2 ,

u2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} − \begin{pmatrix} 2 \\ 1 \end{pmatrix} = 1.v1 − 1.v2 .

Therefore PB→C = \begin{pmatrix} 1/3 & 1 \\ 1/3 & -1 \end{pmatrix}.
If we know the change of basis matrix PB→C and the co-ordinates of a
vector w ∈ V with respect to B we can easily get the co-ordinates of w with
respect to C.

Theorem 11.7. Let B = {u1 , ..., un } and C = {v1 , ..., vn } be bases for a
vector space V and let PB→C be the change of basis matrix from B to C. Then

(i) PB→C [v]B = [v]C for all v ∈ V .

(ii) If P is an n × n matrix such that P [v]B = [v]C for all v ∈ V , then


P = PB→C .

(iii) PB→C is invertible and (PB→C )−1 = PC→B .

Proof. (i) Let v ∈ V with [v]B = \begin{pmatrix} a1 \\ \vdots \\ an \end{pmatrix}, ie. v = a1 u1 + ... + an un . Then

[v]C = [a1 u1 + ... + an un ]C = a1 [u1 ]C + ... + an [un ]C = ([u1 ]C ... [un ]C ) \begin{pmatrix} a1 \\ \vdots \\ an \end{pmatrix} = PB→C [v]B .
(ii) Suppose P is an n × n matrix with P [v]B = [v]C for all v ∈ V . Then
for v = ui we get
[ui ]B = ei ,
where ei is the vector with a 1 in the i-th position and 0 everywhere else. So
the i-th column of P is
Pi = P ei = P [ui ]B = [ui ]C ,
the i-th column of P = PB→C . Therefore P = PB→C .
(iii) By part (i) we have, for all v ∈ V ,
[v]B = PC→B [v]C = PC→B PB→C [v]B .
Hence P = PC→B PB→C has the property that
[v]B = P [v]B
for all v ∈ V . Therefore P = PC→B PB→C = In . Hence PB→C is invertible and (PB→C )−1 = PC→B .
Corollary 11.8. If B, C and D are bases of a vector space V , then
PB→C = PD→C PB→D .
Proof. By part (i) of Theorem 11.7, we have for all v ∈ V ,
[v]C = PD→C [v]D = PD→C PB→D [v]B
and then part (ii) of Theorem 11.7 gives that
PB→C = PD→C PB→D .

This Corollary gives an easy method for computing a change of basis
matrix. Suppose we are given B, C and D. Then we have
PB→C = PD→C PB→D
= (PC→D )−1 PB→D .
We can take advantage of this if PB→D is easy to compute, for example if D
is a standard basis.
   
Example 11.9. As in our previous example, let B = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}, C = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right\}. Set D = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}, the standard basis for R2 . Then

PB→D = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \quad PC→D = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.

Then

PB→C = (PC→D )−1 PB→D = \begin{pmatrix} -1/3 & 2/3 \\ 2/3 & -1/3 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1/3 & 1 \\ 1/3 & -1 \end{pmatrix}

which is what we calculated directly before.
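This route ("go through the standard basis") is also the easiest one to compute with. A small sketch (assuming NumPy; an illustration, not part of the notes):

```python
import numpy as np

# Basis vectors written as the columns of P_{B->D} and P_{C->D},
# where D is the standard basis of R^2 (Example 11.9)
P_BD = np.array([[1., -1.],
                 [1.,  1.]])
P_CD = np.array([[1., 2.],
                 [2., 1.]])

# P_{B->C} = (P_{C->D})^{-1} P_{B->D}
P_BC = np.linalg.solve(P_CD, P_BD)     # solves P_CD X = P_BD without forming the inverse
print(P_BC)                            # [[ 0.3333  1.   ]
                                       #  [ 0.3333 -1.   ]]
```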
Example 11.10. Consider C as a vector space over R. Let B = {1+i, 1−i},
C = {2 + 3i, 1 + 2i}, bases for C. Then
 
PB→C = \begin{pmatrix} 1 & 3 \\ -1 & -5 \end{pmatrix}.

12 Linear Transformations
Definition 12.1. A linear transformation from a vector space V (over
K) to a vector space W (over K) is a function T : V → W such that for all
u, v ∈ V and all a ∈ K,
1. T (u + v) = T (u) + T (v);
2. T (au) = aT (u).
Note: It follows from the definition that a function T : V → W is a linear
transformation if and only if for all u1 , ..., uk ∈ V and for all a1 , ..., ak ∈ K,
T (a1 u1 + ... + ak uk ) = a1 T (u1 ) + ... + ak T (uk ).
We say that T commutes with linear combinations.

Examples 12.2. 1. Matrix transformations: For any matrix A ∈ Mmn (K),
define TA : K n → K m by setting

TA (u) = Au

for all u ∈ K n . We know that this is a linear transformation. This


is an immediate consequence of the elementary properties of matrix
multiplication.

2. Define T : Mmn (K) → Mnm (K) by setting

T (A) = AT .

Let A, B ∈ Mmn (K) and c ∈ K. Then

T (A + B) = (A + B)T = AT + B T = T (A) + T (B)

T (cA) = (cA)T = cAT = cT (A).


Therefore T is a linear transformation.

3. Define T : M2 (R) → F by setting


 
T \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a sin(x) − 2d cos(x).

This is a linear transformation because for all \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \begin{pmatrix} e & f \\ g & h \end{pmatrix} ∈ M2 (R) and all t ∈ R we have

T \left( \begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} e & f \\ g & h \end{pmatrix} \right) = T \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}
= (a+e) sin(x) − 2(d+h) cos(x) = a sin(x) − 2d cos(x) + e sin(x) − 2h cos(x)
= T \begin{pmatrix} a & b \\ c & d \end{pmatrix} + T \begin{pmatrix} e & f \\ g & h \end{pmatrix}

and

T \left( t \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right) = T \begin{pmatrix} ta & tb \\ tc & td \end{pmatrix} = ta sin(x) − 2td cos(x) = t(a sin(x) − 2d cos(x)) = t T \begin{pmatrix} a & b \\ c & d \end{pmatrix}.

4. For any two vector spaces V and W over K, the zero transformation
T0 : V → W defined by
T0 (v) = 0
for all v ∈ V and the identity map I : V → V defined by

I(v) = v

for all v ∈ V are both linear transformations. We write IV for the


identity map on V if we want to indicate the space.

The next result lists some basic properties of linear transformations.

Theorem 12.3. Let T : V → W be a linear transformation. Then,

(i) T (0) = 0;

(ii) T (−v) = −T (v) for all v ∈ V ;

(iii) T (u − v) = T (u) − T (v) for all u, v ∈ V .

Proof. (i) T (0) = T (0.0) = 0.T (0) = 0.

(ii) T (−v) = T ((−1)v) = (−1)T (v) = −T (v).

(iii) T (u − v) = T (u) + T (−v) = T (u) − T (v).

Composition of Linear Transformations


Recall that for any functions f : A → B, g : B → C, the composite of f and
g is the function g ◦ f : A → C defined by

(g ◦ f )(x) = g(f (x))

for all x ∈ A.

Theorem 12.4. If T : U → V and S : V → W are linear transformations,


then the composite S ◦ T : U → W is also a linear transformation.

Proof. Exercise.
For any three linear transformations T : U → V, S : V → W, R : W → Y
we have
R ◦ (S ◦ T ) = (R ◦ S) ◦ T,

the associativity law for composition of functions. Also,
T ◦ IU = T, IV ◦ T = T.
Recall that a function f : A → B is invertible if there exists a function
g : B → A with g ◦ f = IA and f ◦ g = IB , and in this case g is unique and
is called the inverse of f , denoted by f −1 . Also f is invertible if and only if
f is injective and surjective.
Theorem 12.5. If a linear transformation T : V → W is invertible, then
the inverse T −1 : W → V is also a linear transformation.
Proof. Let x, y ∈ W and a ∈ K. Then
T ◦ T −1 (x + y) = IW (x + y) = x + y
= T ◦ T −1 (x) + T ◦ T −1 (y)
and since T is injective this gives
T −1 (x + y) = T −1 (x) + T −1 (y).
Also
T ◦ T −1 (ax) = ax = aT ◦ T −1 (x) = T (aT −1 (x))
and again, since T is injective this gives T −1 (ax) = aT −1 (x). Hence T −1 is a
linear transformation.

Kernel and Range of a Linear Transformation


Definition 12.6. (i) The range of a linear transformation T : V → W is
simply the range of T as a function, ie.
range(T ) = {w ∈ W | w = T (v) for some v ∈ V } ⊆ W.
(ii) The kernel of a linear transformation T : V → W is the set of all
vectors in V that are mapped to 0 by T :
ker(T ) = {v ∈ V | T (v) = 0} ⊆ V.
Examples 12.7. 1. For a matrix transformation TA : K n → K m , where
A ∈ Mmn (K), we have, for any v ∈ K n ,
v ∈ ker(TA ) ⇔ TA (v) = 0
⇔ Av = 0.
n
Therefore ker(TA ) = {v ∈ K | Av = 0}.
We call this set the null space of A, denoted by null(A). The range
of TA is the subspace of K m spanned by the columns of A. We call this
col(A).

2. Consider T : Mmn (K) → Mnm (K) given by T (A) = AT . Then
A ∈ ker(T ) ⇔ T (A) = 0nm ⇔ AT = 0nm ⇔ A = 0mn .
Therefore ker(T ) = {0mn }. Here we have
range(T ) = Mnm (K)
because every matrix in Mnm (K) is the image of its transpose, ie.
A = T (AT ) = (AT )T
for all A ∈ Mnm (K).
3. Consider T : M2 (R) → F given by
 
T \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a sin(x) − 2d cos(x).

Here \begin{pmatrix} a & b \\ c & d \end{pmatrix} ∈ ker(T ) ⇔ a sin(x) − 2d cos(x) = 0 for all x ∈ R.
Since sin(x) and cos(x) are linearly independent in F, this implies that a = d = 0. Hence

ker(T ) = \left\{ \begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix} \;\middle|\; b, c ∈ R \right\}.
The range of T is the subspace of F spanned by {sin(x), cos(x)}.
Theorem 12.8. The kernel of a linear transformation T : V → W is a
subspace of V and the range is a subspace of W .
Proof. Since T (0) = 0, we have 0 ∈ ker(T ) and so ker(T ) ≠ ∅. Let u, v ∈
ker(T ) and a ∈ K. Then
T (u + v) = T (u) + T (v) = 0 + 0 = 0
and so u + v ∈ ker(T ), and
T (au) = aT (u) = a0 = 0
and so au ∈ ker(T ). By the subspace criteria, ker(T ) is a subspace of V .
Let x, y ∈ range(T ) and suppose x = T (u), y = T (v) for some u, v ∈ V .
Then
x + y = T (u) + T (v) = T (u + v)
and so x + y ∈ range(T ) and if a ∈ K, then
ax = aT (u) = T (au)
and so ax ∈ range(T ). Therefore range(T ) is a subspace of W .

Definition 12.9. Let T : V → W be a linear transformation. The rank of
T is defined by
rank(T ) = dim(range(T ))
and the nullity of T is defined as

nullity(T ) = dim(ker(T )).

Examples 12.10. 1. For T : Mmn (K) → Mnm (K) with T (A) = AT we


have range(T ) = Mnm (K), ker(T ) = {0} and so

rank(T ) = mn and nullity(T ) = 0.


 
2. For T : M2 (R) → F with T \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a sin(x) − 2d cos(x) we have range(T ) = span({sin(x), cos(x)}) and ker(T ) = \left\{ \begin{pmatrix} 0 & b \\ c & 0 \end{pmatrix} \;\middle|\; b, c ∈ R \right\}.
Since {sin(x), cos(x)} is linearly independent in F, we get rank(T ) = 2, and since the set of matrices \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} is a basis for ker(T ), we get nullity(T ) = 2.

Let T : V → W be a linear transformation and assume that V is finite


dimensional, ie. V has a finite basis B. Then T (B) is a spanning set for the
range of T and hence range(T ) is also finite dimensional (although W may
not be). The next result tells us more about the dimension of range(T ), ie.
the rank of T .
Theorem 12.11. The Rank Theorem: Let T : V → W be a linear
transformation from a finite dimensional vector space V to an arbitrary space
W . Then
rank(T ) + nullity(T ) = dim(V ).
Proof. Let dim(V ) = n and let B1 = {v1 , ..., vk } be a basis for ker(T ) (so
nullity(T ) = k). Since B1 is linearly independent, it can be extended to a ba-
sis B = {v1 , ...vk , vk+1 , ..., vn } for V . We will show that B2 = {T (vk+1 ), ..., T (vn )}
is a basis for range(T ). Using the definitions of rank(T ) and nullity(T ), this
will be sufficient to prove the theorem.
We need to show that B2 spans range(T ) and that it is linearly indepen-
dent. Let w ∈ range(T ). Then w = T (v) for some v ∈ V . Since B is a basis
for V we can find a1 , ..., an ∈ K with

v = a1 v1 + ... + ak vk + ak+1 vk+1 + ... + an vn .

But then
w = T (a1 v1 + ... + ak vk + ak+1 vk+1 + ... + an vn )
= a1 T (v1 ) + ... + ak T (vk ) + ak+1 T (vk+1 ) + ... + an T (vn )
= ak+1 T (vk+1 ) + ... + an T (vn )
because v1 , .., vk ∈ ker(T ). Therefore w is a linear combination of vectors in
B2 and so range(T ) is spanned by B2 . To show that B2 is linearly indepen-
dent, assume that

ck+1 T (vk+1 ) + ... + cn T (vn ) = 0

for some ck+1 , ..., cn ∈ K. Then

T (ck+1 vk+1 + ... + cn vn ) = 0

and ck+1 vk+1 + ... + cn vn ∈ ker(T ). Since B1 is a basis for ker(T ) we have

ck+1 vk+1 + ... + cn vn = c1 v1 + ... + ck vk

for some c1 , ..., ck ∈ K, but then

−c1 v1 + ... − ck vk + ck+1 vk+1 + ... + cn vn = 0.

Since B is linearly independent we get c1 = ... = ck = ck+1 = ... = cn = 0.


This means that B2 is linearly independent as required.
Recall that a function f : A → B is surjective (or onto) if range(f ) = B,
ie. ∀b ∈ B, ∃a ∈ A with f (a) = b; and f is injective (or 1-1) if ∀x, y ∈ A,
f (x) = f (y) ⇒ x = y.

Theorem 12.12. A linear transformation T : V → W is injective if and


only if ker(T ) = {0}.

Proof. If T is injective and v ∈ V with v ≠ 0, then T (v) ≠ T (0) = 0 and so v ∉ ker(T ). Therefore ker(T ) = {0}.
Conversely suppose ker(T ) = {0} and let u, v ∈ V with T (u) = T (v).
Then
0 = T (u) − T (v) = T (u − v)
⇒ u − v ∈ ker(T ) ⇒ u − v = 0 ⇒ u = v.
Therefore T is injective.
The following results are easy consequences of the Rank Theorem and
Theorem 12.12.

Theorem 12.13. Let V and W be vector spaces over a field K with dim(V ) =
dim(W ). Then a linear transformation T : V → W is injective if and only
if it is surjective.

Proof. Exercise

Theorem 12.14. Let T : V → W be an injective linear transformation


and let B = {v1 , ..., vn } be a linearly independent set in V . Then T (B) =
{T (v1 ), ..., T (vn )} is a linearly independent set in W .

Proof. Suppose that

a1 T (v1 ) + ... + an T (vn ) = 0

in W . Then
T (a1 v1 + ... + an vn ) = 0
and so a1 v1 + ... + an vn ∈ ker(T ) = {0}. Then

a1 v1 + ... + an vn = 0

and so a1 = ... = an = 0 because B is linearly independent in V . Therefore


T (B) is linearly independent.

Corollary 12.15. If T : V → W is injective and dim(V ) = dim(W ) = n,


then T maps any basis of V to a basis of W .

Proof. Let B = {v1 , ..., vn } be a basis for V . Then T (B) is linearly indepen-
dent set of n vectors in W and since dim(W ) = n, T (B) must form a basis
for W .

Isomorphisms of Vector Spaces


Definition 12.16. A linear transformation T : V → W is called an iso-
morphism if it is a bijection (injective and surjective). We say that two
vector spaces V and W are isomorphic (V ≅ W ) if there exists an isomor-
phism T : V → W .

Theorem 12.17. Two finite dimensional vector spaces V and W over K


are isomorphic if and only if dim(V ) = dim(W ).

Proof. If T : V → W is an isomorphism, then range(T ) = W and ker(T ) =
{0} and then the Rank Theorem gives

dim(W ) = rank(T ) = dim(V ) − nullity(T ) = dim(V ).

Conversely let dim(V ) = dim(W ) = n and let B = {v1 , ..., vn } and C =


{w1 , ..., wn } be bases for V and W respectively. Let T : V → W be the
linear transformation T : V → W such that T (vi ) = wi for i = 1, ..., n. Then
T is surjective because C ⊆ range(T ) and hence it is injective by Theorem
12.13. Hence T is an isomorphism and V ≅ W.

Corollary 12.18. Any finite dimensional vector space V over a field K with
dim(V ) = n is isomorphic to K n .

In fact if V is a K-space with dim(V ) = n and B = {v1 , ...vn } is a basis


for V , then the map T : V → K n defined by

T (v) = [v]B

is an isomorphism.

13 The Matrix of a Linear Transformation


Definition 13.1. Let V and W be finite dimensional vector spaces over K
with bases B = {v1 , ..., vn } and C = {w1 , ..., wm } respectively and let T : V →
W be a linear transformation. Then the m × n matrix whose columns are the
co-ordinate vectors [T (vi )]C (i = 1, ..., n) is denoted by B [T ]C and called the
matrix of T with respect to B and C.

Remark: We are implicitly choosing an ordering of the basis vectors.


Choosing different orderings would give matrices with permuted rows or
columns.
 
Example 13.2. Let T : R3 → R2 be given by T \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x+y \\ y−z \end{pmatrix}.
Let B = {v1 , v2 , v3 }, where

v1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad v2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad v3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},

and C = {w1 , w2 }, where

w1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad w2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},

be bases of R3 and R2 respectively. Then

T (v1 ) = T \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} = 1w1 + 1w2 ,

T (v2 ) = T \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 0w1 + 1w2 ,

T (v3 ) = T \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} = −1w1 + 1w2 .

Therefore B [T ]C = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 1 & 1 \end{pmatrix}.
Theorem 13.3. Let V and W be finite dimensional vector spaces over K
with bases B, C respectively and let T : V → W be a linear transformation.
Then, for all v ∈ V ,
B [T ]C [v]B = [T (v)]C .

Proof. Let B = {v1 , ..., vn } and let {e1 , ..., en } be the standard basis for K n .
Then [vi ]B = ei for each i = 1, ..., n. Let A =B [T ]C . Then
A[vi ]B = Aei = [T (vi )]C (∗)
since Aei is the i-th column of A and this is [T (vi )]C by the definition of A.
Now any v ∈ V can be written as a linear combination v = a1 v1 + ... + an vn
and then
[v]B = a1 e1 + ... + an en .
So
A[v]B = A(a1 e1 + ... + an en )
= a1 Ae1 + ... + an Aen
= a1 [T (v1 )]C + ... + an [T (vn )]C
= [a1 T (v1 ) + ... + an T (vn )]C
= [T (a1 v1 + ... + an vn )]C = [T (v)]C .

 
In our previous example, let v = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} = 1v1 + 2v2 + 2v3 . Then

T (v) = \begin{pmatrix} 4 \\ -1 \end{pmatrix} = −1w1 + 5w2 ,

so

[T (v)]C = \begin{pmatrix} -1 \\ 5 \end{pmatrix}.

Also

B [T ]C [v]B = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 5 \end{pmatrix},

as predicted by the Theorem.
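A quick numerical check of Theorem 13.3 for this example (a sketch assuming NumPy; not part of the notes; the bases B and C are as above):

```python
import numpy as np

def T(v):
    """The map T(x, y, z) = (x + y, y - z) from Example 13.2."""
    x, y, z = v
    return np.array([x + y, y - z])

# Basis vectors of B and C as the columns of two matrices
B = np.column_stack([[1, 1, 0], [0, 1, 1], [0, 0, 1]]).astype(float)
C = np.column_stack([[1, 1], [1, 0]]).astype(float)

# Matrix of T: the i-th column is [T(v_i)]_C, i.e. the solution of C a = T(v_i)
M = np.column_stack([np.linalg.solve(C, T(B[:, i])) for i in range(3)])
print(M)                                   # [[ 1.  0. -1.]
                                           #  [ 1.  1.  1.]]

v = np.array([1., 3., 4.])
v_B = np.linalg.solve(B, v)                # co-ordinates of v w.r.t. B: (1, 2, 2)
assert np.allclose(M @ v_B, np.linalg.solve(C, T(v)))   # B[T]C [v]_B = [T(v)]_C
```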
Example 13.4. Let IV : V → V be the identity map on a vector space V
with dim(V ) = n and let B = {v1 , ..., vn } and C be bases of V . What is
B [IV ]C ?
By definition, the i-th column of this matrix is [IV (vi )]C = [vi ]C . Therefore
B [IV ]C = PB→C ,
the change of basis matrix. In particular
B [IV ]B = In .

Matrices of Composites and Inverse Linear Transforma-


tions
Theorem 13.5. Let U, V and W be finite dimensional vector spaces over a
field K with bases B, C and D respectively and let T : U → V and S : V → W
be linear transformations. Then
B [S ◦ T ]D =C [S]D ·B [T ]C .
Proof. Let B = {v1 , ..., vn }. Then the i-th column of B [S ◦ T ]D is
[(S ◦ T )(vi )]D = [(S(T (vi ))]D
=C [S]D [T (vi )]C by Theorem 13.3
=C [S]D ·B [T ]C [vi ]B by Theorem 13.3
=C [S]D ·B [T ]C ei .
However, this is the i-th column of the matrix C [S]D ·B [T ]C and the Theorem
follows.

Theorem 13.6. Let V and W be finite dimensional vector spaces over a
field K with bases B and C respectively, and let T : V → W be a linear
transformation. Then T is invertible if and only if dim(V ) = dim(W ) and
B [T ]C is an invertible matrix. In this case

C [T −1 ]B = (B [T ]C )−1 .

Proof. If T is invertible it must be injective and surjective and therefore


ker(T ) = {0} and range(T ) = W . Then dim(V ) = dim(W ) by the Rank
Theorem. Then both B [T ]C and C [T −1 ]B are n×n matrices and since T −1 ◦T =
IV we have
C [T −1 ]B ·B [T ]C = B [IV ]B = In .
Therefore B [T ]C is invertible and (B [T ]C )−1 =C [T −1 ]B .
Conversely, if dim(V ) = dim(W ) and B [T ]C is invertible, then ker(T ) =
{0} because if v ∈ ker(T ), then

T (v) = 0 ⇒ [T (v)]C = 0

⇒B [T ]C [v]B = 0 ⇒ [v]B = 0
⇒ v = 0.
The Rank Theorem now gives range(T ) = W and so T is a bijection and
hence T is invertible.

Change of Basis and Similarity


In this section we consider the special case of linear transformations from V
to V . These are called linear operators. For a linear operator T : V → V
where V is finite dimensional with basis B, we introduce the notation

[T ]B =B [T ]B .

Note that [T ]B ∈ Mn (K) where n = dim(V ). Now suppose that C is another


basis for V . What is the relationship between [T ]C and [T ]B ?

Theorem 13.7. Let V be a finite dimensional K-space with bases B and C


and T : V → V be a linear transformation. Then

[T ]C = P −1 [T ]B P

where P = PC→B is the change of basis matrix from C to B.

Proof. Let B = {u1 , ..., un } and C = {v1 , ..., vn }. Then the i-th column of
[T ]C is
[T (vi )]C = PB→C .[T (vi )]B
= PB→C .[T ]B [vi ]B
= PB→C .[T ]B .PC→B [vi ]C
= PB→C .[T ]B .PC→B ei .
Therefore
[T ]C = PB→C .[T ]B .PC→B
= (PC→B )−1 .[T ]B .PC→B
as required.
Note: If matrices A and B can be written as B = P −1 AP for some invertible
matrix P , then we say that A and B are similar matrices. Theorem 13.7
says that the matrices of T with respect to different bases are similar.
Conversely if A, B ∈ Mn (K) are similar matrices, then they represent the
same linear operator T : K n → K n with respect to some bases B, C of K n .
Suppose B = P −1 AP for some invertible P ∈ Mn (K). We have A = [TA ]B
where TA (v) = Av for all v ∈ K n and B = {e1 , ..., en } is the standard basis
for K n . Therefore

B = P −1 AP = P −1 [TA ]B P = [TA ]C

where C is the basis {P e1 , ..., P en }, ie. the basis of K n with change of basis
matrix PC→B = P .
 
Example 13.8. Let A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} ∈ M2 (R). Then A = [TA ]B where B = {e1 , e2 } is the standard basis of R2 and

TA \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x+y \\ x+y \end{pmatrix}.

Let C = {u1 , u2 } with u1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = e1 + e2 and u2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = e1 − e2 .
Then PC→B = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} and (PC→B )−1 = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix}. Therefore

[TA ]C = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.

We can check this directly from the definition:

TA (u1 ) = \begin{pmatrix} 2 \\ 2 \end{pmatrix} = 2u1 + 0u2 , \quad TA (u2 ) = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0u1 + 0u2 .
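The similarity computation is easy to reproduce numerically (a sketch on the same data, assuming NumPy; not part of the notes):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.]])

# Columns of P are the new basis vectors u1 = (1, 1), u2 = (1, -1)
P = np.array([[1.,  1.],
              [1., -1.]])

# Matrix of T_A with respect to C: P^{-1} A P
A_C = np.linalg.inv(P) @ A @ P
print(A_C)                      # [[2. 0.]
                                #  [0. 0.]]
```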

14 Subspaces Associated with Matrices


Definition 14.1. Let A be an m × n matrix in Mmn (K).
1. The row space of A is the subspace of K n spanned by the rows of A,
denoted by row(A).
2. The column space of A is the subspace of K m spanned by the columns
of A, denoted by col(A).
 
Example 14.2. Let A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 3 & -3 \end{pmatrix} ∈ M32 (R). Determine whether b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} is in the column space of A.

We need to find x, y ∈ R such that x \begin{pmatrix} 1 \\ 0 \\ 3 \end{pmatrix} + y \begin{pmatrix} -1 \\ 1 \\ -3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. This is the same as solving Av = b where v = \begin{pmatrix} x \\ y \end{pmatrix} and b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. From part 1 of the course we use row operations to transform the augmented matrix to its reduced row echelon form (RREF), ie.

(A | b) = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 2 \\ 3 & -3 & 3 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}.

Therefore the system Av = b is consistent (with unique solution v = \begin{pmatrix} 3 \\ 2 \end{pmatrix}) and b is in col(A).
Note that the row space is the span of

\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ -3 \end{pmatrix}.

If we wish to discover whether a vector v is in row(A), then we can
consider AT , using the method above and the fact that row(A) = col(AT ).
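In code, the membership test "is b in col(A)?" from Example 14.2 is just a consistency check on Av = b; a least-squares solve with a zero residual gives the answer. A small sketch (assuming NumPy; an illustration, not part of the notes):

```python
import numpy as np

A = np.array([[1., -1.],
              [0.,  1.],
              [3., -3.]])
b = np.array([1., 2., 3.])

# Least-squares solution of A v = b; b lies in col(A) iff A v reproduces b exactly
v, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
in_col_space = np.allclose(A @ v, b)
print(v, in_col_space)          # [3. 2.] True
```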

Theorem 14.3. If A, B ∈ Mmn (K) are row equivalent, then row(B) =


row(A). If A, B are column equivalent, then col(B) = col(A).

Proof. The matrix A can be transformed into B by applying a sequence of


elementary row operations. Therefore the rows of B are linear combinations
of the rows of A and so row(B) ⊆ row(A). On the other hand, reversing the
row operations we can transform B into A and applying the same argument
we get row(A) ⊆ row(B). Therefore row(A) = row(B). The second part
uses the same argument but with elementary column operations.

Theorem 14.4. For any matrix A ∈ Mmn (K), the dimension of row(A) is
the number of non-zero rows in the reduced row echelon form of A and the
non-zero rows of the RREF of A form a basis for row(A).

Proof. Let R be the RREF of A. By Theorem 14.3, row(R) = row(A) and


so dim(row(A)) = dim(row(R)). Suppose that R has k non-zero rows. Each
of these rows has a leading 1 with all other entries in the same column equal
to zero. Therefore the non-zero rows of R are linearly independent. Hence
the non-zero rows form a basis of row(R) = row(A) and dim(row(A)) =
dim(row(R)) = k as required.

Definition 14.5. For any m × n matrix A, define the null space of A,


null(A), as the subset of K n consisting of solutions of the linear system
Av = 0, i.e.,
null(A) = {v ∈ K n | Av = 0}.

Defining TA : K n → K m to be the linear transformation given by TA (v) =


Av, we have null(A) = null(TA ).

Theorem 14.6. Let A be an m × n matrix. Then the set null(A) is a


subspace of K n .

Proof. Since A0=0, we have 0 ∈ null(A). Let u, v ∈ null(A). Then

A(u + v) = Au + Av = 0 + 0 = 0

and for any c ∈ K,


A(cu) = c(Au) = c0 = 0
and therefore u+v, cu ∈ null(A). By the subspace test, null(A) is a subspace
of K n .

The dimension of null(A) is called the nullity of A, denoted nullity(A).

 
Example 14.7. Find a basis for null(A) where A = \begin{pmatrix} 1 & -1 & 2 \\ 0 & 1 & -1 \\ 3 & -3 & 6 \end{pmatrix}.

We must find a basis for the set of vectors v = \begin{pmatrix} a \\ b \\ c \end{pmatrix} such that Av = 0. The augmented matrix is

\begin{pmatrix} 1 & -1 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 3 & -3 & 6 & 0 \end{pmatrix}

which reduces to

\begin{pmatrix} 1 & -1 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

We get a − b + 2c = b − c = 0 and so v = \begin{pmatrix} -t \\ t \\ t \end{pmatrix} = t \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} for t ∈ R. Therefore null(A) is a 1-dimensional subspace of R3 spanned by \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix} and nullity(A) = 1.
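For hand-checking, an exact computation with SymPy reproduces this basis (a sketch, not part of the notes):

```python
from sympy import Matrix

A = Matrix([[1, -1,  2],
            [0,  1, -1],
            [3, -3,  6]])

print(A.rref())        # reduced row echelon form and the pivot columns
print(A.nullspace())   # [Matrix([[-1], [1], [1]])] -> basis of null(A), so nullity(A) = 1
```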

Lemma 14.8. Let A ∈ Mmn (K). Define TA : K n → K m to be the linear


transformation given by TA (v) = Av. Then col(A) = range(TA ).

Proof. Write vi for the ith column of A. Then vi = Aei ∈ range(TA ) (where
{e1 , . . . , en } is the standard basis of K n ). So

col(A) = span({v1 , . . . , vn }) ⊆ range(TA ).

Now let w ∈ range(TA ). Then w = TA (v) = Av for some v ∈ K n .


Write v = a1 e1 + · · · + an en , where a1 , . . . , an ∈ K.
Then

w = Av = A(a1 e1 +· · ·+an en ) = a1 Ae1 +· · ·+an Aen = a1 v1 +· · ·+an vn ∈ col(A).

So range(TA ) ⊆ col(A), and we are done.

Definition 14.9. For A ∈ Mmn (K), define rank(A) to be the dimension of
the column space of A. By the above lemma we have rank(A) = rank(TA ).

Theorem 14.10. (The Rank Theorem for Matrices) If A is an m × n


matrix, then
rank(A) + nullity(A) = n.

Proof. Define as usual TA : K n → K m to be the linear transformation given


by TA (v) = Av. By Lemma 14.8 we have rank(A) = rank(TA ). By definition
we have nullity(A) = nullity(TA ). The result follows by the rank theorem
for linear transformations.
In particular, for A, B ∈ Mmn (K), if null(A) = null(B), then rank(A) =
rank(B).

Theorem 14.11. Let A ∈ Mmn (K). Then dim(col(A)) = dim(row(A)).


Hence rank(AT ) = rank(A).

Note that it is not true in general that col(A) = row(A). They just have
the same dimension.
Proof. Let R be the reduced row echelon form (RREF) of A. By Theorem
14.4 we have dim(row(A)) = r, the number of non-zero rows in R. Since
solutions to Av = 0 are precisely the solutions of Rv = 0, we have null(A) =
null(R).
By Theorem 14.10 nullity(A) + rank(A) = n = nullity(R) + rank(R). So

dim(col(A)) = rank(A) = rank(R) = dim(col(R)) = r,

as this is the number of pivot columns of R.


For the last part, observe that

rank(AT ) = dim(col(AT )) = dim(row(AT )) = dim(col(A)) = rank(A).

 
Example 14.12. Consider the matrix A = \begin{pmatrix} 3 & -1 & 5 \\ 2 & 1 & 3 \\ 0 & -5 & 1 \end{pmatrix} ∈ M3 (R). The reduced row echelon form of A is

R = \begin{pmatrix} 1 & 0 & 8/5 \\ 0 & 1 & -1/5 \\ 0 & 0 & 0 \end{pmatrix}.

Therefore row(A) has dimension 2 and we can take the two non-zero rows of R as a basis for row(A), i.e.,

B = \left\{ \begin{pmatrix} 1 \\ 0 \\ 8/5 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -1/5 \end{pmatrix} \right\}

is a basis for row(A). For the column space we can perform column operations on A or row operations on AT . If we do the latter, the reduced row echelon form of AT = \begin{pmatrix} 3 & 2 & 0 \\ -1 & 1 & -5 \\ 5 & 3 & 1 \end{pmatrix} is

R′ = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{pmatrix}.

Therefore a basis for col(A) is

B′ = \left\{ \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -3 \end{pmatrix} \right\}.

We have rank(A) = 2. By the rank theorem we have nullity(A) = 3 − rank(A) = 1.
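The same computation done exactly with SymPy (a sketch, not part of the notes):

```python
from sympy import Matrix

A = Matrix([[3, -1, 5],
            [2,  1, 3],
            [0, -5, 1]])

R, pivots = A.rref()
print(R)                 # RREF of A: rows (1, 0, 8/5), (0, 1, -1/5), (0, 0, 0)
print(A.T.rref()[0])     # RREF of A^T: rows (1, 0, 2), (0, 1, -3), (0, 0, 0)
print(A.rank())          # 2, so nullity(A) = 3 - 2 = 1
```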
To finish the course we pull together results we have proved in parts 1
and 2 and apply them to the special case of invertible matrices to get:
Theorem 14.13. (The Fundamental Theorem of Invertible Matri-
ces) Let A be an n × n matrix. The following are equivalent:
(i) A is invertible
(ii) Av = b has a unique solution for v, for any given b ∈ Rn
(iii) Av = 0 has only the trivial solution v = 0
(iv) rank(A) = n
(v) nullity(A) = 0
(vi) The reduced row echelon form of A is In
(vii) The rows of A form a basis of Rn
(viii) The columns of A form a basis of Rn .
