THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS
KEITH CONRAD
1. Introduction
The easiest matrices to compute with are the diagonal ones. The sum and product of
diagonal matrices can be computed componentwise along the main diagonal, and taking
powers of a diagonal matrix is simple too. All the complications of matrix operations are
gone when working only with diagonal matrices. If a matrix A is not diagonal but can be
conjugated to a diagonal matrix, say D := PAP^{-1} is diagonal, then A = P^{-1}DP so A^k =
P^{-1}D^kP for all integers k, which reduces us to computations with a diagonal matrix. In
many applications of linear algebra (e.g., dynamical systems, differential equations, Markov
chains, recursive sequences) powers of a matrix are crucial to understanding the situation,
so the relevance of knowing when we can conjugate a nondiagonal matrix into a diagonal
matrix is clear.
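As a concrete illustration (a minimal sketch in Python using the sympy library; the matrix below is a hypothetical example, not from the text), a power of a diagonalizable matrix can be computed through its diagonalization. Note that sympy's diagonalize returns P with A = PDP^{-1}, so its P plays the role of P^{-1} in the notation above.

```python
from sympy import Matrix

# A small diagonalizable matrix, chosen only for illustration.
A = Matrix([[2, 1],
            [1, 2]])

# diagonalize() returns P and D with A = P*D*P^(-1); the columns of P are eigenvectors.
P, D = A.diagonalize()

k = 5
Ak = P * D**k * P.inv()    # powers of the diagonal matrix D are taken entrywise

assert Ak == A**k          # agrees with computing the power directly
print(Ak)
```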
We want to look at the coordinate-free formulation of the idea of a diagonal matrix, which
will be called a diagonalizable operator. There is a special polynomial, the minimal polyno-
mial (generally not equal to the characteristic polynomial), which will tell us exactly when
a linear operator is diagonalizable. The minimal polynomial will also give us information
about nilpotent operators (those having a power equal to O).
All linear operators under discussion are understood to be acting on nonzero finite-
dimensional vector spaces over a given field F .
2. Diagonalizable Operators
Definition 2.1. We say the linear operator A : V → V is diagonalizable when it admits
a diagonal matrix representation with respect to some basis of V : there is a basis B of V
such that the matrix [A]B is diagonal.
Let’s translate diagonalizability into the language of eigenvectors rather than matrices.
Theorem 2.2. The linear operator A : V → V is diagonalizable if and only if there is a
basis of eigenvectors for A in V .
Proof. Suppose there is a basis B = {e1 , . . . , en } of V in which [A]B is diagonal:
[A]B = diag(a1 , . . . , an ).
Then Aei = ai ei for all i, so each ei is an eigenvector for A. Conversely, if V has a basis
{v1 , . . . , vn } of eigenvectors of A, with Avi = λi vi for λi ∈ F , then in this basis the matrix
representation of A is diag(λ1 , . . . , λn ).
A basis of eigenvectors for an operator is called an eigenbasis.
An example of a linear operator that is not diagonalizable over every field F is ( 1 1 ; 0 1 ) acting
on F^2. Its only eigenvectors are the nonzero vectors (x, 0), and these do not span F^2, so there
are not enough eigenvectors to form a basis: the matrix is not diagonalizable. Since ( 1 1 ; 0 1 )
and ( 1 0 ; 0 1 ) have the same characteristic polynomial, and the second matrix is diagonalizable
(it is already diagonal), the characteristic polynomial doesn’t determine (in general) if an
operator is diagonalizable.
Here are the main results we will obtain about diagonalizability:
(1) There are ways of determining if an operator is diagonalizable without having to
look explicitly for a basis of eigenvectors.
(2) When F is algebraically closed, “most” operators on a finite-dimensional F -vector
space are diagonalizable.
(3) There is a polynomial, the minimal polynomial of the operator, which can be used
to detect diagonalizability.
(4) If two operators are each diagonalizable, they can be simultaneously diagonalized
(i.e., there is a common eigenbasis) precisely when they commute.
Let’s look at three examples related to diagonalizability over R and C.
Example 2.3. Let R = ( 0 −1 ; 1 0 ), the 90-degree rotation matrix acting on R^2. It is not
diagonalizable on R^2 since there are no eigenvectors: a rotation in R^2 sends no nonzero
vector to a scalar multiple of itself. This geometric reason is complemented by an algebraic
reason: the characteristic polynomial T^2 + 1 of R has no roots in R, so there are no real
eigenvalues and thus no eigenvectors in R^2. However, there are roots ±i of T^2 + 1 in C,
and there are eigenvectors of R as an operator on C^2 rather than R^2. Eigenvectors of R in
C^2 for the eigenvalues i and −i are (i, 1) and (−i, 1), respectively. In the basis B = {(i, 1), (−i, 1)},
the matrix of R is [R]B = ( i 0 ; 0 −i ), where the first diagonal entry is the eigenvalue of the first
basis vector in B and the second diagonal entry is the eigenvalue of the second basis vector
in B. (Review the proof of Theorem 2.2 to see why this relation between the ordering of
vectors in an eigenbasis and the ordering of entries in a diagonal matrix always holds.)
Put more concretely, since passing to a new matrix representation of an operator from
an old one amounts to conjugating the old matrix representation by the change-of-basis
matrix expressing the old basis in terms of the new basis, we must have ( i 0 ; 0 −i ) = PRP^{-1},
where P is the matrix whose columns are the B-coordinates of (1, 0) and (0, 1): P = ( −i/2 1/2 ; i/2 1/2 ).
Verify that this P really conjugates R to ( i 0 ; 0 −i ).
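One way to carry out that verification is with a short computer check; the following sympy sketch (not part of the original text) performs the conjugation exactly.

```python
from sympy import Matrix, I, Rational, diag

R = Matrix([[0, -1],
            [1, 0]])                   # the 90-degree rotation matrix

# Change-of-basis matrix whose columns are the B-coordinates of (1,0) and (0,1).
P = Matrix([[-I/2, Rational(1, 2)],
            [ I/2, Rational(1, 2)]])

assert P * R * P.inv() == diag(I, -I)  # conjugating R gives diag(i, -i)

# sympy can also produce the eigenvalues and eigenvectors of R directly:
print(R.eigenvects())
```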
Example 2.4. Every A ∈ Mn (R) satisfying A = A^T can be diagonalized over R. This
is a significant result, called the real spectral theorem. (Any theorem that gives sufficient
conditions under which an operator can be diagonalized is called a spectral theorem, because
the set of eigenvalues of an operator is called its spectrum.) The essential step in the proof of the
real spectral theorem is to show that every real symmetric matrix has a real eigenvalue.
Example 2.5. Any A ∈ Mn (C) satisfying AA* = A*A, where A* denotes the conjugate transpose
of A, is diagonalizable in Mn (C). When A is real, A* = A^T, so saying AA^T = A^T A is weaker
than saying A = A^T. In particular, the real matrix ( 0 −1 ; 1 0 ) commutes with its transpose
and thus is diagonalizable over C,
but the real spectral theorem does not apply to this matrix and in fact this matrix isn’t
diagonalizable over R (it has no real eigenvalues).
A diagonalizable operator has all of its eigenvalues in F (see the proof of Theorem 2.2). The
converse is false: if all the eigenvalues of an operator are in F this does not necessarily mean
the operator is diagonalizable. Just think about our basic example ( 1 1 ; 0 1 ), whose only
eigenvalue is 1. It is a “repeated eigenvalue,” in the sense that the characteristic polynomial
(T − 1)^2 has 1 as a repeated root. Imposing an
additional condition, that the eigenvalues lie in F and are simple roots of the characteristic
polynomial, does force diagonalizability. To prove this, we start with a general lemma on
eigenvalues and linear independence.
Lemma 3.1. Eigenvectors for distinct eigenvalues are linearly independent. More pre-
cisely, if A : V → V is linear and v1 , . . . , vr are eigenvectors of A with distinct eigenvalues
λ1 , . . . , λr , the vi ’s are linearly independent.
Proof. This will be an induction on r. The case r = 1 is easy. If r > 1, suppose there is a
linear relation
(3.1) c1 v1 + · · · + cr−1 vr−1 + cr vr = 0
with ci ∈ F . Apply A to both sides: vi becomes Avi = λi vi , so
(3.2) c1 λ1 v1 + · · · + cr−1 λr−1 vr−1 + cr λr vr = 0.
Multiply the linear relation in (3.1) by λr :
(3.3) c1 λr v1 + · · · + cr−1 λr vr−1 + cr λr vr = 0.
Subtracting (3.3) from (3.2), the last terms on the left cancel:
c1 (λ1 − λr )v1 + · · · + cr−1 (λr−1 − λr )vr−1 = 0.
Now we have a linear relation with r − 1 eigenvectors having distinct eigenvalues. By
induction, all the coefficients are 0: ci (λi −λr ) = 0 for i = 1, . . . , r −1. Since λ1 , . . . , λr−1 , λr
are distinct, λi − λr ≠ 0 for i = 1, . . . , r − 1. Thus ci = 0 for i = 1, . . . , r − 1. Now our
original linear relation (3.1) becomes cr vr = 0. The vector vr is not 0 (eigenvectors are
always nonzero by definition), so cr = 0.
Theorem 3.2. A linear operator on V whose characteristic polynomial is a product of
linear factors in F [T ] with distinct roots is diagonalizable.
Proof. The assumption is that there are n different eigenvalues in F , where n = dim V . Call
them λ1 , . . . , λn . Let ei be an eigenvector with eigenvalue λi . The eigenvalues are distinct,
so by Lemma 3.1 the ei ’s are linearly independent. Since there are n of these vectors, the
ei ’s are a basis of V , so the operator admits an eigenbasis and is diagonalizable.
Example 3.3. The matrix ( 1 1 ; 0 1 ) has characteristic polynomial (T − 1)^2, which has linear
factors in R[T ] but the roots are not distinct, so Theorem 3.2 does not say the matrix is
diagonalizable in M2 (R), and in fact it isn’t.
Example 3.4. The matrix ( 1 1 ; 1 0 ) has characteristic polynomial T^2 − T − 1, which has 2
different real roots, so the matrix is diagonalizable in M2 (R).
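As a quick check (an illustrative sympy sketch, not part of the original text), the diagonalization can be computed exactly:

```python
from sympy import Matrix, simplify, zeros

A = Matrix([[1, 1],
            [1, 0]])

# The characteristic polynomial T^2 - T - 1 has the two distinct real roots
# (1 + sqrt(5))/2 and (1 - sqrt(5))/2, so Theorem 3.2 applies.
assert A.is_diagonalizable()
P, D = A.diagonalize()                 # A = P*D*P^(-1)
assert (P * D * P.inv() - A).applyfunc(simplify) == zeros(2, 2)
print(D)                               # the two eigenvalues on the diagonal
```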
There are many diagonal matrices with repeated diagonal entries (take the simplest
example, In !), and their characteristic polynomials have repeated roots. The criterion in
Theorem 3.2 will never detect a diagonalizable operator with a repeated eigenvalue, so that
criterion is a sufficient but not necessary condition for diagonalizability. In Section 4 we will
see a way to detect diagonalizability using a different polynomial than the characteristic
polynomial that is both necessary and sufficient.
Exactly how common is it for a characteristic polynomial to have distinct roots (whether
or not they lie in F )? Consider 2 × 2 matrices: the characteristic polynomial T^2 + bT + c has
repeated roots if and only if b^2 − 4c = 0. A random quadratic will usually satisfy b^2 − 4c ≠ 0
(you have to be careful to arrange things so that b^2 − 4c is 0), so “most” 2 × 2 matrices have
a characteristic polynomial with distinct roots. Similarly, a random n × n matrix usually
has a characteristic polynomial with distinct roots. In particular, over the complex numbers
this means a random n × n complex matrix almost certainly has distinct eigenvalues and
therefore (since the eigenvalues lie in C) Theorem 3.2 tells us that a random n × n complex
matrix is diagonalizable. So diagonalizability is the rule rather than the exception over C,
or more generally over an algebraically closed field.
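To see this heuristic in action, here is a small illustrative experiment (a sketch with arbitrarily chosen random integer matrices, not part of the original text); it uses the fact that a polynomial has a repeated root exactly when it shares a nonconstant factor with its derivative.

```python
import random
from sympy import Matrix, symbols, gcd, diff

T = symbols('T')
random.seed(0)
trials, n, repeated = 500, 3, 0
for _ in range(trials):
    A = Matrix(n, n, lambda i, j: random.randint(-5, 5))
    p = A.charpoly(T).as_expr()
    if gcd(p, diff(p, T)).has(T):   # nonconstant gcd means a repeated root
        repeated += 1
print(f"{repeated} of {trials} random matrices had a repeated eigenvalue")
```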
Since r(T ) vanishes at A and either r(T ) = 0 or r(T ) has degree less than the degree of the
minimal polynomial of A, it must be the case that r(T ) = 0. Therefore f (T ) = mA (T )q(T ),
so mA (T ) | f (T ).
Theorem 4.4 justifies speaking of the minimal polynomial. If two monic polynomials are
both of least degree killing A, Theorem 4.4 shows they divide each other, and therefore they
are equal (since they are both monic). Minimal polynomials of linear operators need not
be irreducible (e.g., ( 1 1 ; 0 1 ) has minimal polynomial (T − 1)^2).
Example 4.5. Write V as a direct sum of subspaces, say V = U ⊕ W . Let P : V → V be
projection onto the subspace U from this particular decomposition: P (u + w) = u. Since
P (u) = u, we have P^2 (u + w) = P (u + w), so P^2 = P . Thus P is killed by the polynomial
T^2 − T = T (T − 1). If T^2 − T is not the minimal polynomial then by Theorem 4.4 either
T or T − 1 kills P ; the first case means P = O (so U = {0}) and the second case means
P = idV (so U = V ). As long as U and W are both nonzero, P is neither O nor idV and
T^2 − T is the minimal polynomial of the projection P .
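For instance (an illustrative sketch with a hypothetical decomposition of F^3 chosen for simplicity), this can be checked directly:

```python
from sympy import Matrix, eye, zeros

# Projection of F^3 = U + W onto U, where U is spanned by (1,0,0), (0,1,0)
# and W is spanned by (0,0,1).  (This particular splitting is just an example.)
P = Matrix([[1, 0, 0],
            [0, 1, 0],
            [0, 0, 0]])

assert P**2 == P                         # P is killed by T^2 - T
assert P != zeros(3, 3) and P != eye(3)  # neither T nor T - 1 kills P
# so the minimal polynomial of this projection is T^2 - T = T(T - 1)
```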
While all operators on an n-dimensional space have characteristic polynomials of degree n,
the degree of the minimal polynomial varies from one operator to the next. Its computation
is not as mechanical as the characteristic polynomial, since there isn’t a universal formula
for the minimal polynomial. Indeed, consider the matrix
Aε = ( 1 + ε 0 ; 0 1 ).
For ε ≠ 0, Aε has two different eigenvalues, 1 + ε and 1. Therefore the minimal polynomial
of Aε is not of degree 1, so its minimal polynomial must be its characteristic polynomial
T^2 − (2 + ε)T + 1 + ε. However, when ε = 0 the matrix Aε is the identity I2 with minimal
polynomial T − 1. When eigenvalue multiplicities change, the minimal polynomial changes
in a drastic way. It is not a continuous function of the matrix.
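A short sketch makes the comparison explicit (the value ε = 1/10 below is an arbitrary stand-in for a nonzero ε; this is illustrative only):

```python
from sympy import Matrix, eye, zeros, Rational

for eps in [Rational(1, 10), 0]:
    A = Matrix([[1 + eps, 0],
                [0,       1]])
    if A - eye(2) == zeros(2, 2):
        print(f"eps = {eps}: minimal polynomial is T - 1")
    else:
        # two distinct eigenvalues, so the minimal polynomial is the
        # characteristic polynomial T^2 - (2 + eps)T + 1 + eps
        print(f"eps = {eps}: minimal polynomial has degree 2")
```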
To compute the minimal polynomial of a linear operator, Theorem 4.4 looks useful. For
example, the minimal polynomial divides the characteristic polynomial, since the charac-
teristic polynomial kills the operator by the Cayley–Hamilton theorem.4
Example 4.6. Let
A = ( 0 −1 1 ; 1 2 −1 ; 1 1 0 ).
The characteristic polynomial is T^3 − 2T^2 + T = T (T − 1)^2. If the characteristic polynomial
is not the minimal polynomial then the minimal polynomial divides one of the quadratic
factors. There are two of these: T (T − 1) and (T − 1)^2. A calculation shows A(A − I3 ) = O,
so the minimal polynomial divides T (T − 1). Since A and A − I3 are not O, the minimal
polynomial of A is T (T − 1) = T^2 − T .
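This calculation can be verified with a few lines of sympy (an illustrative sketch, not part of the original text):

```python
from sympy import Matrix, eye, zeros, symbols

T = symbols('T')
A = Matrix([[0, -1,  1],
            [1,  2, -1],
            [1,  1,  0]])

print(A.charpoly(T).as_expr())            # T**3 - 2*T**2 + T
assert A * (A - eye(3)) == zeros(3, 3)    # T(T - 1) kills A
assert A != zeros(3, 3) and A - eye(3) != zeros(3, 3)
# so the minimal polynomial of A is T(T - 1) = T^2 - T
```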
Since the minimal polynomial divides the characteristic polynomial, every root of the
minimal polynomial (possibly in an extension of F ) is an eigenvalue. The converse is also
true:
Theorem 4.7. Any eigenvalue of a linear operator is a root of its minimal polynomial in
F [T ], so the minimal polynomial and characteristic polynomial have the same roots.
4 In particular, deg mA (T ) ≤ dimF V . This inequality can also be proved directly by an induction on the
dimension, without using the Cayley–Hamilton theorem. See [1].
Proof. The minimal polynomials of a linear operator and its matrix representation in some
basis are the same, so we pick bases to work with a matrix A acting on F n (n = dimF V ).
Say λ is an eigenvalue of A, in some extension field E. We want to show mA (λ) = 0. There
is an eigenvector in E^n for this eigenvalue: Av = λv and v ≠ 0. Then A^k v = λ^k v for all
k ≥ 1, so f (A)v = f (λ)v for all f ∈ E[T ]. In particular, taking f (T ) = mA (T ), mA (A) = O
so 0 = mA (λ)v. Thus mA (λ) = 0.
Remark 4.8. The proof may look a bit funny: nowhere in the argument did we use
minimality of mA (T ). Indeed, we showed every polynomial that kills A has an eigenvalue
of A as a root. Since the minimal polynomial for A divides all other polynomials killing A,
the result is most useful when applied to the minimal polynomial of A, and that’s why we
formulated Theorem 4.7 for the minimal polynomial.
Example 4.9. In Example 4.6, χA (T ) = T (T − 1)^2, so the eigenvalues of A are 0 and
1. Theorem 4.7 says mA (T ) has roots 0 and 1, so mA (T ) is divisible by T and T − 1.
Therefore if mA ≠ χA then mA = T (T − 1). The consideration of (T − 1)^2 in Example 4.6
was unnecessary; it couldn’t have worked because mA (T ) must have both 0 and 1 as roots.
Corollary 4.10. The minimal and characteristic polynomials of a linear operator have the
same irreducible factors in F [T ].
Proof. Any irreducible factor of the minimal polynomial is a factor of the characteristic
polynomial since the minimal polynomial divides the characteristic polynomial. Conversely,
if π(T ) is an irreducible factor of the characteristic polynomial, a root of it (possibly in some
larger field than F ) is an eigenvalue and therefore is also a root of the minimal polynomial
by Theorem 4.7. Any polynomial in F [T ] sharing a root with π(T ) is divisible by π(T ), so
π(T ) divides the minimal polynomial.
If we compare the irreducible factorizations in F [T ]
χA (T ) = π1 (T )^{e1} · · · πk (T )^{ek} ,    mA (T ) = π1 (T )^{f1} · · · πk (T )^{fk} ,
we have 1 ≤ fi ≤ ei for each i since mA (T ) | χA (T ). Since ei ≤ n ≤ nfi , we also get a “reverse”
divisibility: χA (T ) | mA (T )^n in F [T ].
We say a polynomial in F [T ] splits if it is a product of linear factors in F [T ]. For instance,
T^2 − 5 splits in R[T ], but not in Q[T ]. The polynomial (T − 1)^2 splits in every F [T ]. Any
factor of a polynomial in F [T ] that splits also splits.
Using the minimal polynomial in place of the characteristic polynomial provides a good
criterion for diagonalizability over a field, which is our main result:
Theorem 4.11. Let A : V → V be a linear operator. Then A is diagonalizable if and only
if its minimal polynomial in F [T ] splits in F [T ] and has distinct roots.
Theorem 4.11 gives necessary and sufficient conditions for diagonalizability, rather than
just sufficient conditions as in Theorem 3.2. Because the minimal polynomial is a factor of
the characteristic polynomial, Theorem 4.11 implies Theorem 3.2.
Proof. Suppose mA (T ) splits in F [T ] with distinct roots. We will show V has a basis of
eigenvectors for A, so A is diagonalizable. Let
mA (T ) = (T − λ1 ) · · · (T − λr ),
so the λi ’s are the eigenvalues of A (Theorem 4.7) and by hypothesis they are distinct.
For an eigenvalue λi , let
Eλi = {v ∈ V : Av = λi v}
5. Simultaneous Diagonalizability
Now that we understand when a single linear operator is diagonalizable (if and only if the
minimal polynomial splits with distinct roots), we consider simultaneous diagonalizability
of several linear operators Aj : V → V , j = 1, 2, . . . , r. Assuming each Aj has a diagonal
matrix representation in some basis, can we find a common basis in which the Aj ’s are all
diagonal matrices? (This possibility is called simultaneous diagonalization.) A necessary
constraint is commutativity: every set of diagonal matrices commutes, so if the Aj ’s can
be simultaneously diagonalized, they must commute. Happily, this necessary condition is
also sufficient, as we will soon see. What is special about commuting operators is they
preserve each other’s eigenspaces: if AB = BA and Av = λv then A(Bv) = B(Av) =
B(λv) = λ(Bv), so B sends each vector in the λ-eigenspace of A to another vector in the
λ-eigenspace of A. Pay attention to how this is used in the next theorem.
Theorem 5.1. If A1 , . . . , Ar are linear operators on V and each Ai is diagonalizable, they
are simultaneously diagonalizable if and only if they commute.
Proof. We already indicated why simultaneously diagonalizable operators have to commute.
To show the converse direction, assume A1 , . . . , Ar commute and are each diagonaliz-
able. To show they are simultaneously diagonalizable, we induct on the number r of linear
operators. The result is clear if r = 1, so assume r ≥ 2. Let
Eλ = {v : Ar v = λv}
be an eigenspace for Ar for some eigenvalue λ of Ar . Since Ar is diagonalizable on V , V is
the direct sum of the eigenspaces for Ar .
For v ∈ Eλ , Ar (Ai v) = Ai (Ar v) = Ai (λv) = λ(Ai v), so Ai v ∈ Eλ . Thus each Ai
restricts to a linear operator on the subspace Eλ . The linear operators A1 |Eλ , . . . , Ar−1 |Eλ
commute since the Ai ’s already commuted as operators on V , and these restrictions to
Eλ are diagonalizable by Corollary 7.5. There are r − 1 of them, so induction on r (while
quantifying over all finite-dimensional vector spaces) implies there is a basis for Eλ consisting
of common eigenvectors for A1 |Eλ , . . . , Ar−1 |Eλ .5 The elements of this basis for Eλ are
eigenvectors for Ar |Eλ as well, since all nonzero vectors in Eλ are eigenvectors for Ar . Thus
A1 |Eλ , . . . , Ar−1 |Eλ , Ar |Eλ are all diagonalizable. The vector space V is the direct sum of
the eigenspaces Eλ of Ar , so stringing together common eigenbases of all Ai |Eλ as λ runs
over the eigenvalues of Ar gives a common eigenbasis of V for all the Ai ’s.
Remark 5.2. Theorem 5.1 is not saying commuting operators diagonalize! It says commut-
ing diagonalizable operators simultaneously diagonalize. For example, the matrices ( 1 a ; 0 1 )
for all a commute with each other, but none of them is diagonalizable when a ≠ 0.
Because we are dealing with operators on finite-dimensional spaces, Theorem 5.1 extends
to a possibly infinite number of commuting operators, as follows.
Corollary 5.3. Let {Ai } be commuting linear operators on a finite-dimensional vector space
V . If each Ai is diagonalizable on V then they are simultaneously diagonalizable.
Proof. Let U be the subspace of EndF (V ) spanned by the Ai ’s. Since EndF (V ) is
finite-dimensional, its subspace U is finite-dimensional, so U is spanned by a finite number of
Ai ’s, say Ai1 , . . . , Air . By Theorem 5.1, there is a common eigenbasis of V for Ai1 , . . . , Air .
A common eigenbasis for linear operators is also an eigenbasis for every linear combination
of the operators, so this common eigenbasis of Ai1 , . . . , Air diagonalizes every element of U ,
and in particular diagonalizes each Ai .
Corollary 5.4. Let A and B be linear operators V → V that are diagonalizable and com-
mute. Then every operator in the ring F [A, B] is diagonalizable. In particular, A + B and
AB are diagonalizable.
Proof. Since A and B commute, there is a common eigenbasis of V for A and B. Members
of this basis are also eigenvectors for each operator in F [A, B], so all these operators are
diagonalizable too.
5 This choice of basis for Eλ is not made by Ar , but by the other operators together.
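To illustrate Corollary 5.4 concretely, here is a small sympy sketch (the matrices are hypothetical and were chosen to commute because both are polynomials in a single symmetric matrix):

```python
from sympy import Matrix

S = Matrix([[2, 1],
            [1, 2]])      # symmetric, hence diagonalizable; eigenvalues 1 and 3
A = S
B = S**2 - S              # a polynomial in S, so it commutes with A

assert A * B == B * A

# A common eigenbasis: the eigenvectors of S work for every operator in F[A, B].
P, _ = S.diagonalize()
for M in (A, B, A + B, A * B):
    assert (P.inv() * M * P).is_diagonal()
print("columns of P form a common eigenbasis:")
print(P)
```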
6. Nilpotent Operators
The minimal polynomial classifies not only diagonalizable operators, but also nilpotent
operators (those having a power equal to O).
Theorem 6.1. For a linear operator N : V → V , the following are equivalent:
(1) N is nilpotent: N^k = O for some k ≥ 1,
(2) N^n = O, where n = dim V ,
(3) the minimal polynomial of N is T^k for some k ≤ n.
Proof. We will show (1) ⇒ (2) ⇒ (3). (That (3) implies (1) is obvious.) If N^k = O for some
k ≥ 1 then the minimal polynomial of N is a factor of T^k , so the minimal polynomial of N is
a power of T . The characteristic polynomial is monic of degree n with the same irreducible
factors as the minimal polynomial (Corollary 4.10), so χN (T ) = T^n , which implies N^n = O
by Cayley–Hamilton. The minimal polynomial divides the characteristic polynomial, so if
χN (T ) = T^n then the minimal polynomial is T^k for some k ≤ n.
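A quick sympy check of Theorem 6.1 on a hypothetical strictly upper triangular (hence nilpotent) matrix, for illustration only:

```python
from sympy import Matrix, zeros, symbols

T = symbols('T')
N = Matrix([[0, 1, 2],
            [0, 0, 3],
            [0, 0, 0]])        # strictly upper triangular, hence nilpotent

n = N.shape[0]
assert N**n == zeros(n, n)             # N^n = O, as in Theorem 6.1(2)
print(N.charpoly(T).as_expr())         # T**3
# The minimal polynomial is the smallest power of T killing N;
# here it is T^3, since N^2 is not the zero matrix.
assert N**2 != zeros(n, n)
```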
Corollary 6.2. A linear operator N : V → V is nilpotent if and only if its only eigenvalue
in extensions of F is 0.
Proof. We know N is nilpotent if and only if its characteristic polynomial is T^n , which is
equivalent to saying the only eigenvalue of N in extensions of F is 0.
We can’t diagonalize a nilpotent operator except if it is O: a minimal polynomial of
the form T^k has distinct roots only when it is T , and the only operator with minimal
polynomial T is O. But there is something we can say from Theorem 6.1 about possible
matrix representations of a nilpotent operator.
Corollary 6.3. A nilpotent linear operator N : V → V has a strictly upper triangular
matrix representation.
A strictly upper triangular matrix, like ( 0 1 ; 0 0 ), is an upper triangular matrix with 0’s along
the main diagonal.
Proof. We argue by induction on dim V . When dim V = 1, N = O and the result is easy. If
dim V > 1 and N = O, the result is still easy. If dim V > 1 and N ≠ O, then W := ker N
is a proper subspace of V and W 6= {0} since N is not injective (a nonzero nilpotent
operator on V certainly isn’t one-to-one, as V 6= {0} from our conventions at the end of the
introduction). Since N (W ) = {0} ⊂ W , N induces linear operators on W and V /W . Since
a power of N on V is O, that same power of N is O on W and V /W , so the operators that
N induces on W and V /W are both nilpotent. Of course N really acts on W as O, but
on V /W all we can say is that N is nilpotent. Since 0 < dim V /W < dim V , by induction
there is a basis of V /W with respect to which the operator induced by N on V /W has a
strictly upper triangular matrix representation. Lift such a basis of V /W (arbitrarily) to
vectors in V (the lifts are automatically linearly independent) and add to this set a basis of
W to get a basis of V . Put the basis vectors from W first in the ordering of this basis for
V . With respect to this choice of basis of V , the matrix representation of N on V has the
form ( O ∗ ; O U ), where U is the strictly upper triangular (square) matrix for N on V /W . This
matrix for N on V is strictly upper triangular, so we are done.
Now we give the proof of Theorem 4.17: a linear operator has an upper triangular matrix
representation if and only if its minimal polynomial splits in F [T ].
Proof. If a linear operator has an upper triangular matrix representation, then the charac-
teristic polynomial of the upper triangular matrix splits in F [T ], so the minimal polynomial
(a factor of that) also splits in F [T ].
Conversely, assume mA (T ) splits in F [T ], say
mA (T ) = (T − λ1 )^{e1} · · · (T − λr )^{er} ,
where the λi ’s are the distinct roots. Then the polynomials fi (T ) = mA (T )/(T − λi )^{ei} are
relatively prime, so arguing as in the proof of Theorem 4.11 (where all the exponents ei are
1) we get
V = ker((A − λ1 )^{e1} ) ⊕ · · · ⊕ ker((A − λr )^{er} ).
Let Wi = ker((A − λi )^{ei} ). Since A commutes with (A − λi )^{ei} , A(Wi ) ⊂ Wi . We will show
A|Wi has an upper-triangular matrix representation, and by stringing together bases of the
Wi ’s to get a basis of V we will obtain an upper-triangular matrix representation of A on
V.
On Wi , (A − λi )^{ei} = O, so A − λi is a nilpotent operator. Write A = λi + (A − λi ), which
expresses A on Wi as the sum of a scaling operator and a nilpotent operator. By what we
know about nilpotent operators, there is a basis of Wi with respect to which A−λi is strictly
upper triangular. Now with respect to every basis, the scaling operator λi is diagonal. So
using the basis that makes A − λi a strictly upper triangular matrix, the matrix for A is
the sum of a diagonal and strictly upper triangular matrix, and that’s an upper triangular
matrix.
Corollary 6.4. If A1 , . . . , Ar are commuting linear operators on V and each Ai is upper
triangularizable, they are simultaneously upper triangularizable.
Unlike Theorem 5.1, the commuting hypothesis used here is far from being necessary:
most upper triangular matrices (unlike all diagonal matrices) do not commute with each
other! There is a theorem of A. Borel about linear algebraic groups that relaxes the commu-
tativity assumption to a more reasonable hypothesis (solvability, together with some other
technical conditions).
Proof. We argue by induction on the dimension of the vector space (not on the number of
operators, as in the proof of Theorem 5.1). The one-dimensional case is clear. Assume now
dim V ≥ 2 and the corollary is known for lower-dimensional spaces. We may assume the
Ai ’s are not all scalar operators on V (otherwise the result is obvious using an arbitrary
basis of V ). Without loss of generality, let Ar not be a scalar operator.
Since Ar is upper triangularizable on V , its eigenvalues are in F . Let λ ∈ F be an
eigenvalue of Ar and set Eλ to be the λ-eigenspace of Ar in V . Then 0 < dim Eλ < dim V .
Since the Ai ’s commute, Ai (Eλ ) ⊂ Eλ for all i. Moreover, the minimal polynomial of
Ai |Eλ is a factor of the minimal polynomial of Ai on V , so every Ai |Eλ is upper triangular-
izable by Theorem 4.17. Since the Ai ’s commute on V , they also commute as operators on
Eλ , so by induction on dimension the Ai ’s are simultaneously upper triangularizable on Eλ .
In particular, the first vector in a simultaneous “upper triangular basis” for the Ai ’s is a
common eigenvector of all the Ai ’s. Call this vector e1 and set W = F e1 . Then Ai (W ) ⊂ W
for all i. The Ai ’s are all operators on W and thus also on V /W . Let Ai : V /W → V /W
be the operator Ai induces on V /W . On V /W these operators commute and are upper
triangularizable (since their minimal polynomials divide those of the Ai ’s on V , which split
in F [T ]), so again by induction on dimension the operators Ai on V /W are simultaneously
upper triangularizable. If we lift a common upper triangular basis for the Ai ’s from V /W to
V and tack on e1 as the first member of a basis for V , we obtain a common upper triangular
basis for the Ai ’s by an argument similar to that in the proof of Corollary 6.3.
There is a linear dependence relation on the set {v, A(v), . . . , A^d (v)}, and the coefficient
of A^d (v) in the relation must be nonzero since the other vectors are linearly independent.
We can scale the coefficient of A^d (v) in such a relation to be 1, say
(7.1) A^d (v) + cd−1 A^{d−1} (v) + · · · + c1 A(v) + c0 v = 0,
where ci ∈ F . This tells us the polynomial
m(T ) := T^d + cd−1 T^{d−1} + · · · + c1 T + c0
satisfies m(A)(v) = 0, so for every f (T ) ∈ F [T ] we have m(A)(f (A)v) = f (A)(m(A)v) =
f (A)(0) = 0. Every element of W is f (A)(v) for some f (T ), so m(A) kills all of W : m(T )
is the minimal polynomial of A acting on W . (Incidentally, this also shows dim W = d and
W has basis {v, A(v), . . . , A^{d−1} (v)}.)
Set W1 = W and m1 (T ) = m(T ). If W1 ≠ V , pick a vector v′ ∉ W1 and run through
the same argument for the subspace W2 of V spanned by the vectors v′, A(v′), A^2 (v′), . . .
to get a minimal polynomial m2 (T ) for A on W2 . And so on. Since V is finite-dimensional,
eventually we will get a sequence of subspaces W1 , W2 , . . . , Wk where A(Wi ) ⊂ Wi for all i
and the Wi ’s add up to V . The minimal polynomial of A on V is the least common multiple
of the mi (T )’s by Theorem 7.1.
Example 7.3. Let
A = ( 0 4 1 −2 ; −1 4 0 −1 ; 0 0 1 0 ; −1 3 0 0 ).
Set v1 = (1, 0, 0, 0), so A(v1 ) = (0, −1, 0, −1) and A^2 (v1 ) = (−2, −3, 0, −3) = 3A(v1 ) − 2v1 .
Thus A^2 (v1 ) − 3A(v1 ) + 2v1 = 0, so m1 (T ) = T^2 − 3T + 2. Let W1 be the span of v1 and A(v1 ).
The vector v2 = (0, 1, 0, 0) is not in W1 . It turns out v2 , A(v2 ), and A^2 (v2 ) are linearly
independent and A^3 (v2 ) = 4A^2 (v2 ) − 5A(v2 ) + 2v2 , so m2 (T ) = T^3 − 4T^2 + 5T − 2. Let W2 be
the span of v2 , A(v2 ), and A^2 (v2 ). Both W1 and W2 are in the subspace of vectors with third
component 0, so W1 + W2 ≠ F^4 . Take v3 = (0, 0, 1, 0). A calculation shows the same linear
relations hold for A-iterates of v3 as for v2 , so m3 (T ) = m2 (T ). Since {v1 , A(v1 ), v2 , v3 } is
a basis for F^4 , the minimal polynomial of A is lcm(m1 , m2 , m3 ) = T^3 − 4T^2 + 5T − 2.
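The procedure in this example can be automated. The sketch below (Python with sympy, not part of the original text) finds each local minimal polynomial from the first linear dependence among the iterates v, A(v), A^2(v), . . .; for simplicity it takes the lcm over all four standard basis vectors, which span F^4, instead of choosing v1, v2, v3 adaptively, and by Theorem 7.1 the least common multiple is the same.

```python
from sympy import Matrix, symbols, lcm

T = symbols('T')

def local_min_poly(A, v):
    # Minimal polynomial of A on the subspace spanned by v, A(v), A^2(v), ...
    n = A.shape[0]
    cols = [v]
    for d in range(1, n + 1):
        cols.append(A * cols[-1])
        K = Matrix.hstack(*cols)        # columns v, A(v), ..., A^d(v)
        null = K.nullspace()
        if null:                        # first linear dependence appears at degree d
            rel = null[0] / null[0][d]  # scale so the coefficient of A^d(v) is 1
            return sum(rel[i] * T**i for i in range(d + 1))
    raise RuntimeError("no dependence found; impossible once d reaches dim V")

A = Matrix([[ 0, 4, 1, -2],
            [-1, 4, 0, -1],
            [ 0, 0, 1,  0],
            [-1, 3, 0,  0]])

polys = [local_min_poly(A, Matrix.eye(4).col(j)) for j in range(4)]
m = polys[0]
for p in polys[1:]:
    m = lcm(m, p)
print(m.expand())    # T**3 - 4*T**2 + 5*T - 2, matching the example
```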
The algorithm we described for computing the minimal polynomial of a linear operator
used subspaces preserved by the operator. Let’s look at such subspaces more closely. If
A : V → V is linear, a subspace W ⊂ V is called A-stable when A(W ) ⊂ W . For example,
an eigenspace of A is A-stable. If A = idV then every subspace of V is A-stable. (We don’t
require A(W ) = W , only A(W ) ⊂ W .) When A(W ) ⊂ W , A induces a linear operator on
W and on the quotient space V /W . Denote these induced linear maps as AW and AV /W .
We look at how each of diagonalizability and nilpotency is related for A, AW , and AV /W
when W is an A-stable subspace. First we see how the minimal polynomials of these three
linear maps are related.
Theorem 7.4. Suppose A : V → V is linear and W is an A-stable subspace of V . The
induced linear maps AW : W → W and AV /W : V /W → V /W have minimal polynomials
that are factors of mA (T ). More precisely, their least common multiple is a factor of mA (T )
and their product is divisible by mA (T ).
Proof. The polynomial mAW (T ) kills W when T is replaced by A|W , and the polynomial
mAV /W (T ) kills V /W when T is replaced by A acting on V /W (that is, by AV /W ). Since
mA (T ) kills W when T is replaced by A and kills V /W when T is replaced by A, mA (T ) is
divisible by both mAW (T ) and by mAV /W (T ) and hence by their least common multiple.
References
[1] M. D. Burrow, “The Minimal Polynomial of a Linear Transformation,” Amer. Math. Monthly 80 (1973),
1129–1131.