Li Ti 2009

ON BEST APPROXIMATIONS OF POLYNOMIALS IN MATRICES
IN THE MATRIX 2-NORM

JÖRG LIESEN∗ AND PETR TICHÝ‡
Abstract. We show that certain matrix approximation problems in the matrix 2-norm have
uniquely defined solutions, despite the lack of strict convexity of the matrix 2-norm. The problems
we consider are generalizations of the ideal Arnoldi and ideal GMRES approximation problems
introduced by Greenbaum and Trefethen [SIAM J. Sci. Comput., 15 (1994), pp. 359–368]. We also
discuss general characterizations of best approximation in the matrix 2-norm and provide an example
showing that a known sufficient condition for uniqueness in these characterizations is not necessary.
Key words. matrix approximation problems, polynomials in matrices, matrix functions, matrix
2-norm, GMRES, Arnoldi’s method
AMS subject classifications. 41A52, 15A60, 65F35
1. Introduction. Much of the work in approximation theory concerns the ap-

proximation of a given function f on some (compact) set Ω in the complex plane by
polynomials. Classical results in this area deal with the best approximation problem
(1.1) min kf − pkΩ ,

p∈Pm
where kgkΩ ≡ maxz∈Ω |g(z)|, and Pm denotes the set of polynomials of degree at
most m. (Note that since in (1.1) we seek an approximation from a finite dimensional
subspace, the minimum is indeed attained by some polynomial p∗ ∈ Pm .)
Scalar approximation problems of the form (1.1) have been studied since the mid
1850s. Accordingly, numerous results on existence and uniqueness of the solution as
well as estimates for the value of (1.1) are known. Here we consider a problem that at
first sight looks similar, but apparently is much less understood: Let f be a function
that is analytic in an open neighborhood of the spectrum of a given matrix A ∈ Cn×n ,
so that f (A) is well defined, and let | · | be a given matrix norm. Consider the matrix
approximation problem
(1.2) min |f (A) − p(A)| .

p∈Pm
Does this problem have a unique solution?

An answer to this question of course depends on the norm used in (1.2). A
norm | · | on a vector space V is called strictly convex, when for all vectors v1 , v2 ∈ V
the equation |v1 | = |v2 | = 12 |v1 + v2 | implies that v1 = v2 . A geometric interpretation
of strict convexity is that the unit sphere in V with respect to the norm | · | does not
contain any line segments. If S ⊆ V is a finite dimensional subspace, then for any
given v ∈ V there exists a unique s∗ ∈ S so that
|v − s∗ | = min |v − s| .
s∈S
∗ Institute of Mathematics, Technical University of Berlin, Straße des 17. Juni 136, 10623 Berlin,
Germany (liesen@math.tu-berlin.de). The work of this author was supported by the Heisenberg
Program of the Deutsche Forschungsgemeinschaft.
‡ Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou
věžı́ 2, 18207 Prague, Czech Republic (tichy@cs.cas.cz). The work of this author was supported
by the GAAS grant IAA100300802.
1
2 JÖRG LIESEN AND PETR TICHÝ
A proof of this classical result can be found in most books on approximation theory;
see, e.g., [3, Chapter 1]. In particular, if the norm is strictly convex, then (1.2) is
guaranteed to have a unique solution as long as the value of (1.2) is positive.
A useful matrix norm that is met in many applications is the matrix 2-norm (or
spectral norm), which for a given matrix A is equal to the largest singular value of A.
We denote the 2-norm of A by kAk. This norm is not strictly convex as can be seen
from the following simple example: Suppose that we have two matrices A1 , A2 ∈ Cn×n
of the form

B 0 B 0
A1 = , A2 = ,
0 C 0 D
with kA1 k = kA2 k = kBk ≥ 12 kC + Dk. Then 12 kA1 + A2 k = kBk, but whenever
C 6= D, we have A1 6= A2 . Consequently, in case of the matrix 2-norm the classi-
cal uniqueness result mentioned above does not apply, and our question about the
uniqueness of the solution of the matrix approximation problem (1.2) is nontrivial.
It is well known that when the function f is analytic in an open neighborhood
of the spectrum of the matrix A ∈ Cn×n , then f (A) is a well-defined complex n × n
matrix. In fact, f (A) = pf (A), where pf is a polynomial that depends on the values
and possibly the derivatives of f on the spectrum of A. The recent book of Higham [5]
gives an extensive overview of definitions, applications, and computational techniques
for matrix functions. Our above question now naturally leads to the following math-
ematical problem: Let a polynomial b and a nonnegative integer m < deg b be given.
Determine conditions so that the best approximation problem
(1.3) min kb(A) − p(A)k

p∈Pm
has a unique solution, where k · k is the matrix 2-norm and Pm denotes the set of
polynomials of degree at most m.
When searching the literature we found a number of results on general characteri-
zations of best approximations in normed linear spaces of matrices, e.g. in [7, 9, 15, 16],
but just a few papers related to our specific problem. In particular, Greenbaum and
Trefethen consider in [4] the two approximation problems
(1.4) min kAm+1 − p(A)k,

p∈Pm
(1.5) min kI − Ap(A)k.
p∈Pm
They state that both (1.4) and (1.5) (for nonsingular A) have a unique minimizera .
The problem (1.4) is equal to (1.3) with b(A) = Am+1 . Because of its relation to
the convergence of the Arnoldi method [1] for approximating eigenvalues of A, the
uniquely defined monic polynomial z m+1 − p∗ that solves (1.4) is called the (m + 1)st
ideal Arnoldi polynomial of A. In a paper that is mostly concerned with algorithmic
and computational results, Toh and Trefethen [13] call this polynomial the (m + 1)st
Chebyshev polynomial of A. The reason for this terminology is the following: When
the matrix A is normal, i.e. unitarily diagonalizable, problem (1.4) becomes a scalar
approximation problem of the form (1.1) with f (z) = z m+1 and Ω being the spectrum
a The statement of uniqueness is true but the proof given in [4], which was later repeated in [14,
Chapter 29], contains a small error at the very end. After the error was spotted by Michael Eiermann,
it was fixed by Anne Greenbaum in 2005, but the correction has not been published.
ON BEST APPROXIMATIONS OF POLYNOMIALS IN MATRICES 3
of A. The resulting monic polynomial is the (m + 1)st Chebyshev polynomial on this

(discrete) set Ω, i.e., the unique monic polynomial of degree m + 1 with minimal
maximum norm on Ω. In this sense, the matrix approximation problem (1.3) we
study here can be considered a generalization of the classical scalar approximation
problem (1.1). Some further results on Chebyshev polynomials of matrices are given
in [11], and [14, Chapter 29].
The quantity (1.5) can be used for bounding the relative residual norm in the
GMRES method [8]; for details see, e.g., [10, 12]. Therefore, the uniquely defined
polynomial 1 − z p∗ that solves (1.5) is called the (m + 1)st ideal GMRES polynomial
of A.
In this paper we show that, despite the lack of strict convexity of the matrix
2-norm, the approximation problem (1.3) as well as a certain related problem that
generalizes (1.5) have a unique minimizer. Furthermore, we discuss some of the above
mentioned general characterizations of best approximations with respect to the 2-
norm in linear spaces of matrices. On the example of a Jordan block we show that a
sufficient condition for uniqueness of the best approximation obtained by Ziȩtak [15]
does not hold. We are not aware that such an example for a nonnormal matrix has
been given before.
2. Uniqueness results. Let ℓ ≥ 0 and m ≥ 0 be given integers, and consider a
given polynomial b of the form
ℓ+m+1
X
b = βj z j ∈ Pℓ+m+1 .
j=0
Let us rewrite the approximation problem (1.3) in a more convenient equivalent form:
m
X
min kb(A) − p(A)k = min k b(A) − p(A) + βj Aj k
p∈Pm p∈Pm
j=0
ℓ+m+1
X
= min k βj Aj − p(A) k
p∈Pm
j=m+1
ℓ
X
(2.1) = min k Am+1 βj+m+1 Aj − p(A) k .
p∈Pm
j=0
The polynomials in (2.1) are of the form z m+1 g + h, where the polynomial g ∈ Pℓ is
given, and h ∈ Pm is sought. Hence (1.3) is equivalent to the problem
(2.2) min kAm+1 g(A) + h(A)k,

h∈Pm
where g ∈ Pℓ is a given polynomial, or,

(2.3) min kp(A)k, where Gℓ,m (g)
≡ z m+1 g + h : g ∈ Pℓ is given, h ∈ Pm .
(g)
p∈Gℓ,m
With ℓ = 0 and g = 1, (2.3) reduces to (1.4).

Similarly, we may consider the approximation problem

(2.4) min kp(A)k, where Hℓ,m (h)
≡ z m+1 g + h : h ∈ Pm is given, g ∈ Pℓ .
(h)
p∈Hℓ,m
Setting m = 0 and h = 1 in (2.4), we retrieve a problem of the form (1.5).

The problems (2.3) and (2.4) are trivial for g = 0 and h = 0, respectively. Both
cases are unconstrained minimizations problems, and it is easily seen that the resulting
minimum value is zero. In the following we will therefore exclude the cases g = 0 in
(g) (h)
(2.3) and h = 0 in (2.4). Under this assumption, both Gℓ,m and Hℓ,m are subsets of
(g)
Pℓ+m+1 where certain coefficients are fixed. In case of Gℓ,m these are the coefficients
at the ℓ + 1 largest powers of z, namely z m+1 , . . . , z ℓ+m+1 . For Hℓ,m
(h)
these are the
m
coefficients at the m + 1 smallest powers of z, namely 1, . . . , z .
We start with conditions so that the values of (2.3) and (2.4) are positive for all
given nonzero polynomials g ∈ Pℓ and h ∈ Pm , respectively.
Lemma 2.1. Consider the approximation problems (2.3) and (2.4), where ℓ ≥ 0
and m ≥ 0 are given integers. Denote by d(A) the degree of the minimal polynomial
of the given matrix A ∈ Cn×n . Then the following two assertions are equivalent:
(1) minp∈G (g) kp(A)k > 0 for all nonzero polynomials g ∈ Pℓ .
ℓ,m
(2) m + ℓ + 1 < d(A).
If A is nonsingular, the two assertions are equivalent with:
(3) minp∈H(h) kp(A)k > 0 for all nonzero polynomials h ∈ Pm .
ℓ,m
Proof. (1) ⇒ (2): We suppose that m + ℓ + 1 ≥ d(A) and show that (1) fails to
hold. Denote the minimal polynomial of A by ΨA . If m + 1 ≤ d(A) ≤ ℓ + m + 1,
then there exist uniquely determined polynomials b g 6= 0, and b
g ∈ Pℓ , b h ∈ Pm , so that
g+b
z m+1 · b h = ΨA . Hence minp∈G (g) kp(A)k = 0 for g = gb. If 0 ≤ d(A) ≤ m, let bg be
ℓ,m
any nonzero polynomial of degree at most ℓ. By the division theorem for polynomialsb ,
there exist uniquely defined polynomials q ∈ Pm+ℓ+1−d(A) and h ∈ Pm−1 , so that
z m+1 ·b
g = q·ΨA −h, or, equivalently, z m+1 ·b
g +h = q·ΨA . Hence Am+1 gb(A)+h(A) = 0,
which means that minp∈G (g) kp(A)k = 0 for the nonzero polynomial g = b g ∈ Pℓ .
ℓ,m
(g)
(2) ⇒ (1): If m+ℓ+1 < d(A), then Gℓ,m ⊂ Pm+ℓ+1 implies minp∈G (g) kp(A)k > 0
ℓ,m
for every nonzero polynomial g ∈ Pℓ .
(h)
(2) ⇒ (3): If m+ℓ+1 < d(A), then Hℓ,m ⊂ Pm+ℓ+1 implies minp∈H(h) kp(A)k > 0
ℓ,m
for every nonzero polynomial h ∈ Pm .
(3) ⇒ (2): For this implication we use that A is nonsingular. Suppose that (2)
does not hold, i.e. that 0 ≤ d(A) ≤ m + ℓ + 1. Then there exist uniquely defined
polynomials bg ∈ Pℓ and b g+b
h ∈ Pm , such that z m+1 · b h = ΨA . Since A is assumed
b
to be nonsingular, we must have h 6= 0. Consequently, minp∈H(h) kp(A)k = 0 for the
ℓ,m
nonzero polynomial h = b
h ∈ Pm .
In the following Theorem 2.2 we show that the problem (2.3) has a uniquely
defined minimizer when the value of this problem is positive (and not zero). In the
previous lemma we have shown that m + ℓ + 1 < d(A) is necessary and sufficient so
that the value of (2.3) is positive for all nonzero polynomials g ∈ Pℓ . However, it is
possible that for some nonzero polynomial g ∈ Pℓ the value of (2.3) is positive even
when m + 1 ≤ d(A) ≤ m + ℓ + 1. It is possible to further analyze this special case, but
for the ease of the presentation we simply assume that the value of (2.3) is positive.
b If f and g 6= 0 are polynomials over a field F, then there exist uniquely defined polynomials s
and r over F, such that (i) f = g · s + r, and (ii) either r = 0 or deg r < deg g. If deg f ≥ deg g, then
deg f = deg g + deg s. For a proof of this standard result, see, e.g., [6, Chapter 4].
The same assumption is made in Theorem 2.3 below, where we prove uniqueness of
the minimizer of (2.4) (under the additional assumption that A is nonsingular).
We point out that Lemma 2.1 implies that the approximation problems (1.4), and
(1.5) for nonsingular A, have positive values if and only if m + 1 < d(A). Of course,
if m + 1 = d(A), then the value of both problems is zero. In this case, the (m + 1)st
ideal Arnoldi polynomial that solves (1.4) is equal to the minimal polynomial of A,
and the (m + 1)st ideal GMRES polynomial that solves (1.5) is a scalar multiple of
that polynomial.
Theorem 2.2. Let A ∈ Cn×n be a given matrix, ℓ ≥ 0 and m ≥ 0 be given
integers, and g ∈ Pℓ be a given nonzero polynomial. If the value of (2.3) is positive,
then this problem has a uniquely defined minimizer.
Proof. The general strategy in the following is similar to the construction in [4,
Section 5]. We suppose that q1 = z m+1 g + h1 ∈ Gℓ,m
(g)
and q2 = z m+1 g + h2 ∈ Gℓ,m
(g)
are
two distinct solutions to (2.3) and derive a contradiction. Suppose that the minimal
norm attained by the two polynomials is
C = kq1 (A)k = kq2 (A)k.
By assumption, C > 0. Define q ≡ 12 (q1 + q2 ) ∈ Gℓ,m

(g)
, then
1
kq(A)k ≤ (kq1 (A)k + kq2 (A)k) = C.
2
Since C is assumed to be the minimal value of (2.3), we must have kq(A)k = C.
Denote the singular value decomposition of q(A) by
(2.5) q(A) = V diag(σ1 , . . . , σn ) W ∗ .
Suppose that the maximal singular value σ1 = C of q(A) is J-fold, with left and right
singular vectors given by v1 , . . . , vJ and w1 , . . . , wJ , respectively.
It is well known that the 2-norm for vectors v ∈ Cn , kvk ≡ (v ∗ v)1/2 , is strictly
convex. For each wj , 1 ≤ j ≤ J, we have
1
C = kq(A)wj k ≤ (kq1 (A)wj k + kq2 (A)wj k) ≤ C ,
2
which implies
kq1 (A)wj k = kq2 (A)wj k = C, 1≤j≤J.
By the strict convexity of the vector 2-norm,
q1 (A)wj = q2 (A)wj , 1≤j≤J.
Similarly, one can show that
q1 (A)∗ vj = q2 (A)∗ vj , 1 ≤ j ≤ J.
Thus,
(2.6) (q2 (A) − q1 (A))wj = 0, (q2 (A) − q1 (A))∗ vj = 0, 1 ≤ j ≤ J.

By assumption, q2 − q1 ∈ Pm is a nonzero polynomial. By the division theorem for

polynomials (see the footnote on p. 4), there exist uniquely defined polynomials s
and r, with deg s ≤ ℓ + m + 1 and deg r < deg (q2 − q1 ) ≤ m (or r = 0) so that
z m+1 g = (q2 − q1 ) · s + r .
Hence we have shown that for the given polynomials q2 − q1 and g there exist poly-
nomials s and r, such that
qe ≡ (q2 − q1 ) · s = z m+1 g − r ∈ Gℓ,m

(g)
.
Since g 6= 0, we must have qe 6= 0. For a fixed ǫ ∈ (0, 1), consider the polynomial
(g)
qǫ = (1 − ǫ)q + ǫe
q ∈ Gℓ,m .
By (2.6),
qe(A)wj = 0, qe(A)∗ vj = 0, 1 ≤ j ≤ J,
and thus
qǫ (A)∗ qǫ (A)wj = (1 − ǫ)qǫ (A)∗ q(A)wj = (1 − ǫ)Cqǫ (A)∗ vj

= (1 − ǫ)2 Cq(A)∗ vj = (1 − ǫ)2 C 2 wj ,
which shows that w1 , . . . , wJ are right singular vectors of qǫ (A) corresponding to the
singular value (1 − ǫ)C. Note that (1 − ǫ)C < C since C > 0.
Now there are two cases: Either kqǫ (A)k = (1 − ǫ)C, or (1 − ǫ)C is not the largest
singular value of qǫ (A). In the first case we have a contradiction to the fact that C is
the minimal value of (2.3). Therefore the second case must hold. In that case, none
of the vectors w1 , . . . , wJ corresponds to the largest singular value of qǫ (A). Using
this fact and the singular value decomposition (2.5), we get
kqǫ (A)k = kqǫ (A)W k

= kqǫ (A)[wJ+1 , . . . , wn ]k
= k(1 − ǫ)q(A)[wJ+1 , . . . , wn ] + ǫe q (A)[wJ+1 , . . . , wn ]k
≤ (1 − ǫ) k[vJ+1 , . . . , vn ] diag(σJ+1 , . . . , σn )k + ǫke
q(A)[wJ+1 , . . . , wn ]k
(2.7) ≤ (1 − ǫ)σJ+1 + ǫke
q(A)[wJ+1 , . . . , wn ]k .
Note that the norm ke q (A)[wJ+1 , . . . , wn ]k in (2.7) does not depend on the choice of ǫ,
and that (2.7) goes to σJ+1 as ǫ goes to zero. Since σJ > σJ+1 , one can find a positive
ǫ∗ ∈ (0, 1), such that (2.7) is less than σJ for all ǫ ∈ (0, ǫ∗ ). Any of the corresponding
polynomials qǫ gives a matrix qǫ (A) whose norm is less than σJ . This contradiction
finishes the proof.
In the following theorem we prove that the problem (2.4), and hence in particular
the problem (1.5), has a uniquely defined minimizer.
Theorem 2.3. Let A ∈ Cn×n be a given nonsingular matrix, ℓ ≥ 0 and m ≥ 0
be given integers, and h ∈ Pm be a given nonzero polynomial. If the value of (2.4) is
positive, then this problem has a uniquely defined minimizer.
Proof. Most parts of the following proof are analogous to the proof of Theorem 2.2,
and are stated only briefly. However, the construction of the polynomial qǫ used to
derive the contradiction is different.
We suppose that q1 = z m+1 g1 +h ∈ Hℓ,m (h)

and q2 = z m+1 g2 +h ∈ Hℓ,m (h)
are two dis-
tinct solutions to (2.4), and that the minimal norm attained by them is C = kq1 (A)k =
kq2 (A)k. By assumption, C > 0. Define q ≡ 12 (q1 + q2 ) ∈ Hℓ,m (h)
, then kq(A)k = C.
Denote the singular value decomposition of q(A) by q(A) = V diag(σ1 , . . . , σn )W ∗ ,
and suppose that the maximal singular value σ1 = C of q(A) is J-fold, with left and
right singular vectors given by v1 , . . . , vJ and w1 , . . . , wJ , respectively. As previously,
we can show that
(q2 (A) − q1 (A))wj = 0, (q2 (A) − q1 (A))∗ vj = 0, 1 ≤ j ≤ J.
Since A is nonsingular, and q2 − q1 = z m+1 (g2 − g1 ), these relations imply that
(2.8) (g2 (A) − g1 (A))wj = 0, (g2 (A) − g1 (A))∗ vj = 0, 1 ≤ j ≤ J.
By assumption, 0 6= g2 − g1 ∈ Pℓ . Hence there exists an integer d, 0 ≤ d ≤ ℓ, so that

ℓ
X
g2 − g1 = γi z i , with γd 6= 0 .
i=d
Now define
ge ≡ z −d (g2 − g1 ) ∈ Pℓ−d .
By construction, e
g is a polynomial with a nonzero constant term. Furthermore, define
b
h ≡ z −m−1−ℓ+d h and gb ≡ z −ℓ+d e
g.
After a formal change of variables z −1 7→ y, we obtain

b
h(y) ∈ Pm+1+ℓ−d and g(y) ∈ Pℓ−d \ Pℓ−d−1 .
b
(Here P−1 ≡ ∅ in case d = ℓ.) By the division theorem for polynomials (see the
footnote on p. 4), there exist uniquely defined polynomials s(y) and r(y) with deg s ≤
m + 1 (since gb 6= 0 is of exact degree ℓ − d) and deg r < ℓ − d (or r = 0) such that
b g(y) · s(y) − r(y) .

h(y) = b
We now multiply the preceding equation by y −m−1−ℓ+d , which gives

y −m−1−ℓ+d bh(y) = y −ℓ+d bg(y) · y −m−1 s(y) − y −m−1 y −ℓ+d r(y) .
Since y −1 = z, this equation is equivalent to
h = ge · se − z m+1 re,
where se ∈ Pm+1 and re ∈ Pℓ−d−1 . Hence we have shown that for the given polynomi-
g there exist polynomials se ∈ Pm+1 and re ∈ Pℓ−d−1 , such that
als h and e
g · se = z m+1 re + h ∈ Hℓ,m
qe ≡ e (h)
.
For a fixed ǫ ∈ (0, 1), consider

(h)
qǫ = (1 − ǫ)q + ǫe
q ∈ Hℓ,m .
Since qe = sez −d (g2 − g1 ), (2.8) implies that
qe(A)wj = 0, qe(A)∗ vj = 0, 1 ≤ j ≤ J,
which can be used to show that
qǫ (A)∗ qǫ (A)wj = (1 − ǫ)2 C 2 wj , 1 ≤ j ≤ J.
Now the same argument as in the proof of Theorem 2.2 gives a contradiction to the
6 q1 .
original assumption that q2 =
Remark 2.4. Similarly as in Lemma 2.1, the assumption of nonsingularity in
the previous theorem is in general necessary. In other words, when A is singular the
approximation problem (2.4) might have more than one solution even when the value
of (2.4) is positive. The following example demonstrating this fact was pointed out to
us by Krystyna Ziȩtak:
Consider a normal matrix A = U ΛU ∗ , where U ∗ U = I and Λ = diag(λ1 , . . . , λn ).
Suppose that A is singular with n distinct eigenvalues and λ1 = 0. Furthermore,
suppose that h ∈ Pm is any given polynomial that satisfies h(0) 6= 0 and |h(0)| >
|h(λj )| for j = 2, . . . , n. Then for any integer ℓ ≥ 0,
min kp(A)k = min max |λm+1

j g(λj ) + h(λj )| = |h(0)| > 0 .
(h)
p∈Hℓ,m g∈Pℓ j
One solution of this problem is given by the polynomial g = 0. Moreover, the minimum
value is attained for any polynomial g ∈ Pℓ that satisfies
min max |λm+1

j g(λj ) + h(λj )| ≤ |h(0)| ,
g∈Pℓ 2≤j≤n
i.e., for any polynomial g ∈ Pℓ that is close enough to the zero polynomial.
3. Characterization of best approximation with respect to the matrix
2-norm. In this section we discuss general characterizations of best approximation in
linear spaces of matrices with respect to the matrix 2-norm obtained by Ziȩtak [15, 16],
and give an example from our specific problem. To state Ziȩtak’s results, we need
some notation. Suppose that we are given m matrices A1 , . . . , Am ∈ Cn×n that are
linearly independent in Cn×n . We assume that 1 ≤ m < n2 to avoid trivialities.
Denote A ≡ span {A1 , . . . , Am }, which is an m-dimensional subspace of Cn×n . As
above, let k · k denote the matrix 2-norm. For a given matrix B ∈ Cn×n \A, we
consider the best approximation (or matrix nearness) problem
(3.1) min kB − M k .
M∈A
A matrix A∗ ∈ A for which this minimum is achieved (such a matrix exists since A is
finite dimensional) is called a spectral approximation of B from the subspace A. The
corresponding matrix R(A∗ ) = B − A∗ is called a residual matrix.
The approximation problems (2.3) and (2.4) studied in the previous section are
both special cases of (3.1). In case of (2.3),
B = Am+1 g(A), where g ∈ Pℓ is given, and A = {I, A, . . . , Am },
while in case of (2.4),
B = h(A), where h ∈ Pm is given, and A = {Am+1 , . . . , Aℓ+m+1 }.

We have shown that when the values of these approximation problems are positive
(which is true if ℓ + m + 1 < d(A)), for both these problems there exists a uniquely
defined spectral approximation A∗ of B from the subspace A (in case of (2.4), we have
assumed that A is nonsingular). Another approximation problem that fits into the
template (3.1) arises in the convergence theory for Arnoldi eigenvalue iterations in [2],
where the authors study the problem of minimizing kI − h(A)p(A)k over polynomials
p ∈ Pℓ−2m , ℓ ≥ 2m ≥ 2, and h ∈ Pm is a given polynomial.
In general, the spectral approximation of a matrix B ∈ Cn×n from a subspace A ⊂
n×n
C is not unique. Ziȩtak [15] studies the problem (3.1) and gives a general char-
acterization of spectral approximations based on the singular value decomposition of
the residual matrices. In particular, combining results of [16] with [15, Theorem 4.3]
yields the following sufficient condition for uniqueness of the spectral approximation.
Lemma 3.1. In the notation established above, let A∗ be a spectral approximation
of B from the subspace A. If the residual matrix R(A∗ ) = B−A∗ has an n-fold singular
value, then the spectral approximation A∗ of B from the subspace A is unique.
It is quite obvious that the sufficient condition in Lemma 3.1 is, in general, not
necessary. To construct a nontrivial counterexample, we recall that the dual norm to
the matrix 2-norm is the trace norm (also called energy norm or c1 -norm),
r
X
(3.2) ||| M ||| ≡ σj (M ) ,
j=1
where σ1 (M ), . . . , σr (M ) denote the singular values of the matrix M ∈ Cn×n with

rank(M ) = r. For X ∈ Cn×n and Y ∈ Cn×n we define the inner product hX, Y i ≡
trace(Y ∗ X). Using this notation, we can state the following result, which is given in
[16, p. 173].
Lemma 3.2. The matrix A∗ ∈ A is a spectral approximation of B from the
subspace A if and only if there exists a matrix Z ∈ Cn×n with ||| Z ||| = 1, such that
(3.3) hZ, Xi = 0 for all X ∈ A, and Re hZ, B − A∗ i = kB − A∗ k .
Remark 3.3. Lemmas 3.1 and 3.2 are both stated here for square complex ma-
trices. Originally, Lemma 3.1 is formulated in [15] for real rectangular matrices and
Lemma 3.2 given in [16] for square complex matrices. A further generalization to
rectangular complex matrices seems possible, but it is out of our focus here.
Based on Lemma 3.2 we can prove the following result.
Theorem 3.4. For λ ∈ C, consider the n × n Jordan block
 
λ 1
 .. .. 
 . . 
Jλ ≡  .. .
 . 1
λ
Then for any nonnegative integer m with m+1 ≤ n, the solution to the approximation
problem (1.4) with A = Jλ , i.e. the (m + 1)st ideal Arnoldi (or Chebyshev) polynomial
of Jλ , is uniquely defined and given by (z − λ)m+1 .
Proof. With A = Jλ , the approximation problem (1.4) reads

(3.4) min kJλm+1 − p(Jλ )k .
p∈Pm
In the notation established in this section, we seek a spectral approximation A∗ of

B = Jλm+1 from the subspace A = span {I, Jλ , . . . , Jλm }. We claim that the uniquely
defined solution is given by the matrix A∗ = Jλm+1 − (Jλ − λI)m+1 . For this matrix
A∗ we get
B − A∗ = Jλm+1 − A∗ = (Jλ − λI)m+1 = J0m+1 .
For m + 1 = n, A∗ = Jλn − (Jλ − λI)n = Jλn yields B − A∗ = J0n = 0. The
corresponding ideal Arnoldi polynomial of Jλ is uniquely defined and equal to (z −λ)n ,
the minimal polynomial of Jλ .
For m + 1 < n, the value of (3.4) is positive, and hence Theorem 2.2 ensures that
the spectral approximation of Jλm+1 from the subspace A is uniquely defined. We
prove our claim using Lemma 3.2. Define Z ≡ e1 eTm+2 , then |||Z||| = 1,
hZ, Jλj i = 0, for j = 0, . . . , m ,

and kB − A∗ k = kJ0m+1 k = 1, so that
hZ, B − A∗ i = hZ, J0m+1 i = 1 = kB − A∗ k,
which shows (3.3) and completes the proof.
The proof of this theorem shows that the residual matrix of the spectral approx-
imation A∗ of B = Jλm+1 from the subspace A = span {I, Jλ , . . . , Jλm } is given by
R(A∗ ) = J0m+1 . This matrix R(A∗ ) has m + 1 singular values equal to zero, and
n − m − 1 singular values equal to one. Hence, for m + 1 < n, the maximal singular
value of the residual matrix is not n-fold, and the sufficient condition of Lemma 3.1
does not hold. Nevertheless, the spectral approximation of B from the subspace A is
unique whenever m + 1 < n.
As shown above, for m = 0, 1, . . . , n − 1 the polynomial (z − λ)m+1 solves the
ideal Arnoldi approximation problem (1.4) for A = Jλ . For λ 6= 0, we can write
(z − λ)m+1 = (−λ)m+1 · (1 − λ−1 z)m+1 .
Note that the rightmost factor is a polynomial that has value one at the origin. Hence
it is a candidate for the solution of the ideal GMRES approximation problem (1.5)
for A = Jλ . More generally, it is tempting to assume that the (m + 1)st ideal GMRES
polynomial for a given matrix A is equal to a scaled version of its (m + 1)st ideal
Arnoldi (or Chebyshev) polynomial. However, this assumption is false, as we can
already see in case A = Jλ . As shown in [10], the determination of the ideal GMRES
polynomials for a Jordan block is an intriguing problem, since these polynomials can
become quite complicated. They are of the simple form (1 − λ−1 z)m+1 if and only
if 0 ≤ m + 1 < n/2 and |λ| ≥ ̺−1 m+1,n−m−1 , cf. [10, Theorem 3.2]. Here ̺k,n denotes
the radius of the polynomial numerical hull of degree k of an n × n Jordan block (this
radius is independent of the eigenvalue λ).
Now let n be even and consider m + 1 = n/2. If |λ| ≤ 2−2/n , the ideal GMRES
polynomial of degree n/2 of Jλ is equal to the constant polynomial 1. If |λ| ≥ 2−2/n ,
the ideal GMRES polynomial of degree n/2 of Jλ is equal to
2 4λn − 1
(3.5) + n (1 − λ−1 z)n/2 ,
4λn + 1 4λ + 1
cf. [10, p. 465]. Obviously, neither the polynomial 1 nor the polynomial (3.5) are
scalar multiples of (z − λ)n/2 , the ideal Arnoldi polynomial of degree n/2 of Jλ .
Acknowledgements. We thank Krystyna Ziȩtak for many discussions and sug-

gestions that helped to improve the content of this paper. We also thank Mark
Embree, Shmuel Friedland, Anne Greenbaum, Nick Trefethen and two anonymous
referees for their helpful comments.
REFERENCES
[1] W. E. Arnoldi, The principle of minimized iteration in the solution of the matrix eigenvalue
problem, Quart. Appl. Math., 9 (1951), pp. 17–29.
[2] C. A. Beattie, M. Embree, and D. C. Sorensen, Convergence of polynomial restart Krylov
methods for eigenvalue computations, SIAM Rev., 47 (2005), pp. 492–515 (electronic).
[3] E. W. Cheney, Introduction to Approximation Theory, McGraw-Hill Book Co., New York,
1966.
[4] A. Greenbaum and L. N. Trefethen, GMRES/CR and Arnoldi/Lanczos as matrix approx-
imation problems, SIAM J. Sci. Comput., 15 (1994), pp. 359–368.
[5] N. J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and
Applied Mathematics (SIAM), Philadelphia, PA, 2008.
[6] K. Hoffman and R. Kunze, Linear Algebra, 2nd edition, Prentice-Hall Inc., Englewood Cliffs,
N.J., 1971.
[7] K. K. Lau and W. O. J. Riha, Characterization of best approximations in normed linear spaces
of matrices by elements of finite-dimensional linear subspaces, Linear Algebra Appl., 35
(1981), pp. 109–120.
[8] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving
nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856–869.
[9] I. Singer, Best approximation in normed linear spaces by elements of linear subspaces, Trans-
lated from the Romanian by Radu Georgescu. Die Grundlehren der mathematischen Wis-
senschaften, Band 171, Publishing House of the Academy of the Socialist Republic of
Romania, Bucharest, 1970.
[10] P. Tichý, J. Liesen, and V. Faber, On worst-case GMRES, ideal GMRES, and the polynomial
numerical hull of a Jordan block, Electron. Trans. Numer. Anal., 26 (2007), pp. 453–473
(electronic).
[11] K.-C. Toh, Matrix Approximation Problems and Nonsymmetric Iterative Methods, PhD thesis,
Cornell University, Ithaca, N.Y., 1996.
[12] K.-C. Toh, GMRES vs. ideal GMRES, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 30–36.
[13] K.-C. Toh and L. N. Trefethen, The Chebyshev polynomials of a matrix, SIAM J. Matrix
Anal. Appl., 20 (1998), pp. 400–419.
[14] L. N. Trefethen and M. Embree, Spectra and Pseudospectra, Princeton University Press,
Princeton, N.J., 2005.
[15] K. Ziȩtak, Properties of linear approximations of matrices in the spectral norm, Linear Algebra
Appl., 183 (1993), pp. 41–60.
[16] K. Zietak, On approximation problems with zero-trace matrices, Linear Algebra Appl., 247
‘
(1996), pp. 169–183.

Li Ti 2009

Uploaded by

Copyright:

Available Formats

Li Ti 2009

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Li Ti 2009

Uploaded by

Copyright:

Available Formats

ON BEST APPROXIMATIONS OF POLYNOMIALS IN MATRICES

IN THE MATRIX 2-NORM

AMS subject classifications. 41A52, 15A60, 65F35

1. Introduction. Much of the work in approximation theory concerns the ap-

(1.1) min kf − pkΩ ,

(1.2) min |f (A) − p(A)| .

Does this problem have a unique solution?

(1.3) min kb(A) − p(A)k

(1.4) min kAm+1 − p(A)k,

of A. The resulting monic polynomial is the (m + 1)st Chebyshev polynomial on this

(2.2) min kAm+1 g(A) + h(A)k,

where g ∈ Pℓ is a given polynomial, or,

With ℓ = 0 and g = 1, (2.3) reduces to (1.4).

Setting m = 0 and h = 1 in (2.4), we retrieve a problem of the form (1.5).

C = kq1 (A)k = kq2 (A)k.

By assumption, C > 0. Define q ≡ 12 (q1 + q2 ) ∈ Gℓ,m

(2.5) q(A) = V diag(σ1 , . . . , σn ) W ∗ .

kq1 (A)wj k = kq2 (A)wj k = C, 1≤j≤J.

By the strict convexity of the vector 2-norm,

q1 (A)wj = q2 (A)wj , 1≤j≤J.

Similarly, one can show that

(2.6) (q2 (A) − q1 (A))wj = 0, (q2 (A) − q1 (A))∗ vj = 0, 1 ≤ j ≤ J.

By assumption, q2 − q1 ∈ Pm is a nonzero polynomial. By the division theorem for

qe ≡ (q2 − q1 ) · s = z m+1 g − r ∈ Gℓ,m

qǫ (A)∗ qǫ (A)wj = (1 − ǫ)qǫ (A)∗ q(A)wj = (1 − ǫ)Cqǫ (A)∗ vj

kqǫ (A)k = kqǫ (A)W k

We suppose that q1 = z m+1 g1 +h ∈ Hℓ,m (h)

(q2 (A) − q1 (A))wj = 0, (q2 (A) − q1 (A))∗ vj = 0, 1 ≤ j ≤ J.

Since A is nonsingular, and q2 − q1 = z m+1 (g2 − g1 ), these relations imply that

(2.8) (g2 (A) − g1 (A))wj = 0, (g2 (A) − g1 (A))∗ vj = 0, 1 ≤ j ≤ J.

By assumption, 0 6= g2 − g1 ∈ Pℓ . Hence there exists an integer d, 0 ≤ d ≤ ℓ, so that

After a formal change of variables z −1 7→ y, we obtain

b g(y) · s(y) − r(y) .

We now multiply the preceding equation by y −m−1−ℓ+d , which gives

Since y −1 = z, this equation is equivalent to

For a fixed ǫ ∈ (0, 1), consider

Since qe = sez −d (g2 − g1 ), (2.8) implies that

which can be used to show that

qǫ (A)∗ qǫ (A)wj = (1 − ǫ)2 C 2 wj , 1 ≤ j ≤ J.

min kp(A)k = min max |λm+1

min max |λm+1

B = Am+1 g(A), where g ∈ Pℓ is given, and A = {I, A, . . . , Am },

while in case of (2.4),

B = h(A), where h ∈ Pm is given, and A = {Am+1 , . . . , Aℓ+m+1 }.

where σ1 (M ), . . . , σr (M ) denote the singular values of the matrix M ∈ Cn×n with

(3.3) hZ, Xi = 0 for all X ∈ A, and Re hZ, B − A∗ i = kB − A∗ k .

Proof. With A = Jλ , the approximation problem (1.4) reads

In the notation established in this section, we seek a spectral approximation A∗ of

hZ, Jλj i = 0, for j = 0, . . . , m ,

Acknowledgements. We thank Krystyna Ziȩtak for many discussions and sug-

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.