Fundamental Ideas of Analysis

Digitized by the Internet Archive
in 2019 with funding from
Kahle/Austin Foundation

https://archive.org/details/fundamentalideasOOOOreed
Fundamental
Ideas of
ANALYSIS
MICHAEL C. REED
Duke University

John Wiley & Sons, Inc.


New York Chichester Weinheim Brisbane Toronto Singapore
Cover art: Matisse, Henri.
Interior with a Violin Case. Nice, (winter 1918-19).
Oil on canvas, 28 3/4 x 23 5/8"(73 x 60 cm).
The Museum of Modern Art, New York. Lillie P. Bliss Collection.
Photograph ©1998 The Museum of Modern Art, New York.

©1998 Succession Henri Matisse, Paris/Artists Rights Society (ARS), New York

MATHEMATICS EDITOR Barbara Holland

MARKETING MANAGER Leslie Hines

PRODUCTION EDITOR Ken Santor

DESIGNER Maddy Lesure

ILLUSTRATION AND COMPOSITION John Davies

This book was typeset in Palatino by John Davies with the LaTeX Documentation System and printed
and bound by the Hamilton Printing Company. The cover was printed by Phoenix Color Corporation.

Recognizing the importance of preserving what has been written, it is a policy of John Wiley & Sons,
Inc. to have books of enduring value published in the United States printed on acid-free paper, and
we exert our best efforts to that end.

The paper in this book was manufactured by a mill whose forest management programs include
sustained yield harvesting of its timber lands. Sustained yield harvesting principles ensure that the
number of trees cut each year does not exceed the amount of new growth.

Copyright ©1998 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either
the prior written permission of the Publisher, or authorization through payment of the appropriate
per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (508)
750-8400, fax (508) 750-4470. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012,
(212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ@WILEY.COM.

Library of Congress Cataloging in Publication Data


Reed, Michael C.
Fundamental ideas of analysis / Michael C. Reed.
p. cm.
Includes bibliographical references (p. 403 - 405 ) and index.
ISBN 0-471-15996-4 (cloth : alk. paper)
1. Mathematical analysis. I. Title.
QA300.R44 1998
515-dc21 97-20683
CIP

Printed in the United States of America


10 9 8 7 6 5 4 3
for Rhonda
I prove a theorem and the house expands:
the windows jerk free to hover near the ceiling,
the ceiling floats away with a sigh.

- from the poem “Geometry”, by Rita Dove


Poet Laureate of the United States, 1993-1995.
Preface

The ideas and methods of mathematics, long central to the physical sciences,
now play an increasingly important role in a wide variety of disciplines.
This success, in fields as diverse as biology, economics, operations
research, robotics, cryptology, and finance, raises difficult questions
about the undergraduate curriculum. Many mathematics majors now
plan careers in other fields, and non-majors form a significant part of the
student population in advanced undergraduate courses. Because of the
interests of these students, and the (appropriate) use of machine computation,
some courses which used to be highly theoretical (e.g., ordinary
differential equations) now emphasize methods and computational experiments.
There is a bewildering array of undergraduate courses, both
pure and applied. What should we teach these students? Where will
they learn the power and beauty of pure mathematics? How can we
make clear the coherence and continuity of the central ideas of mathematics?
Because of all of these issues, the undergraduate analysis course is
more important than ever. For in this course students learn that mathe¬
matics is more than just methods that work. Analysis provides theorems
that prove that results are true and provides techniques to estimate the
errors in approximate calculations. In this course, students are asked for
the first time to construct long proofs and it is from the proofs that they
obtain deep understanding of the ideas. Finally, the ideas and methods
of analysis play a fundamental role in ordinary differential equations,
probability theory, differential geometry, numerical analysis, complex
analysis, and partial differential equations, as well as in most areas of
applied mathematics. For all these reasons, analysis should be a central,
if not the central, course in the undergraduate curriculum.
This analysis text makes the connections to the rest of the curriculum
visible. The standard topics for a one term undergraduate real analysis
course are covered in the unstarred sections of chapters 1 through 6.
But, in addition, numerous examples are given which show the ways in
which real analysis is used in ordinary and partial differential equations,
probability theory, numerical analysis, number theory, and so forth. In
this way the importance of the technical questions becomes clear to the
students and the entire course is more lively and interesting. These
“connections” are developed in the starred sections of chapters 1 through
6 and in chapters 7-10. Throughout the text the connections are also
emphasized in examples, problems, and projects. The development of
the standard topics does not depend on the material in the starred sections,
so the instructor can choose which starred sections to include and
which to omit. Furthermore, with one or two exceptions, the starred
sections and chapters are independent of each other. My intention is
to provide materials which make it easy for the individual instructor to
construct a lively and interesting course that is appropriate, both in content
and in level of difficulty, for his or her own students. Assistance is
available in a separate Instructor's Manual in which the starred sections
and chapters, the projects, and the harder problems are thoroughly discussed.

By teaching most of the starred sections and chapters, an instructor
can use this text for a two semester course. There are too many subjects
in the undergraduate curriculum that a student “must” know, at least
if each is taught as a separate undergraduate course. The solution is, I
believe, to decide which subjects in analysis are central and to include
them as topics in a required year-long course. My choices are evident
from the table of contents: Students should know the existence theory
for ordinary differential equations; they should know Cauchy's theorem
and Fourier series; they should understand why probability theory is a
branch of analysis; and, along the way, they should see a partial differential
equation and some numerical analysis. Of course, I hope that this
text will prove useful for individuals and departments whose choices
would not be identical to mine.

Because of its philosophy, this text differs in a variety of ways from
other texts in real analysis. The properties of the real numbers are assumed,
and set theory and point set topology play only a minor role.
The development of the analytical tools is lean. The point is not to prove
the best possible theorem, but to show why a theorem is necessary, to
prove it, and then to use it. No strong distinction is made between classical
and modern analysis. Rather, the modern ideas are introduced when
they arise naturally to clarify and unify explicit calculations that the students
have already made. These differences are changes of emphasis
rather than radical departures from the traditional texts.
Chapters 1 and 2 proceed quite slowly since many students encounter
long proofs here for the first time. The level of difficulty is a little higher
in the unstarred sections of chapters 3 through 6 and still higher in starred
sections and in chapters 7 through 10. Note, however, that some sections
in the later chapters, for example Sections 7.3 and 10.2, are easy and can
be taught without teaching the other material in the chapter.
Each section of the book has a large selection of problems designed to
help the students understand definitions and learn to construct proofs.
Each chapter ends with a few projects which can be assigned to students
individually or in groups. Some topics, for example the wave equation,
the Laplace transform, the zeta function, and Chebyshev's inequality, are
introduced in the problems and projects. Throughout the text, and in the
problems and projects, it is assumed that the students have available and
can use graphing hand calculators.
The illustrations in the text were created by John Davies, Senior Systems
Programmer in the Mathematics Department at Duke, who also
contributed many ideas to the design of the book. He has my deep gratitude.
Thanks are also due to my editor at Wiley, Barbara Holland, whose
suggestions and ideas helped to shape the presentation of the material.
Many others have made valuable suggestions: Lester Coyle, Greg
Lawler, Harold Layton, Laura McKinney, Lang Moore, William Pardon,
Isaac Reed, Kirsten Travers. I am grateful to the reviewers: Morton
Brown at the University of Michigan, Robert Cole at the Evergreen State
College, Steven London at the University of Houston-Downtown, Brad
Osgood at Stanford University, Daniel Robinson at the Georgia Institute
of Technology, Eric Schecter at Vanderbilt University, Joe Thrash at the
University of Southern Mississippi-Gulf Park, and Robert Underwood
at the Colorado School of Mines. They greatly influenced the choice
of material in the book. Particularly important for me was the feedback
from colleagues who taught out of a preliminary edition last year:
Stephen Abbott at Middlebury College, Jose Barrionuevo at University
of South Alabama, Danielle Carr at Bryn Mawr College, James Donaldson
at Howard University, Edward Effros at UCLA, Tony Phillips at
SUNY Stonybrook, and David Schaeffer at Duke. I thank them all.
Finally, I am particularly grateful to my wife, Rhonda Hughes, whose
idea it really was.

Michael Reed
Contents
Preface

chapter 1 Preliminaries
1.1 The Real Numbers
1.2 Sets and Functions
1.3 Cardinality
1.4 Methods of Proof

chapter 2 Sequences
2.1 Convergence
2.2 Limit Theorems
2.3 Two-state Markov Chains*
2.4 Cauchy Sequences
2.5 Supremum and Infimum
2.6 The Bolzano-Weierstrass Theorem
2.7 The Quadratic Map*
Projects

chapter 3 The Riemann Integral
3.1 Continuity
3.2 Continuous Functions on Closed Intervals
3.3 The Riemann Integral
3.4 Numerical Methods*
3.5 Discontinuities
3.6 Improper Integrals
Projects

chapter 4 Differentiation
4.1 Differentiable Functions
4.2 The Fundamental Theorem of Calculus
4.3 Taylor's Theorem
4.4 Newton's Method*
4.5 Inverse Functions
4.6 Functions of Two Variables*
Projects

chapter 5 Sequences of Functions
5.1 Pointwise and Uniform Convergence
5.2 Limit Theorems
5.3 The Supremum Norm
5.4 Integral Equations*
5.5 The Calculus of Variations*
5.6 Metric Spaces
5.7 The Contraction Mapping Principle
5.8 Normed Linear Spaces*
Projects

chapter 6 Series of Functions
6.1 Lim sup and Lim inf
6.2 Series of Real Constants
6.3 The Weierstrass M-test
6.4 Power Series
6.5 Complex Numbers*
6.6 Infinite Products and Prime Numbers*
Projects

chapter 7 Differential Equations*
7.1 Local Existence
7.2 Global Existence
7.3 The Error Estimate for Euler's Method
Projects

chapter 8 Complex Analysis*
8.1 Analytic Functions
8.2 Integration on Paths
8.3 Cauchy's Theorem
Projects

chapter 9 Fourier Series*
9.1 The Heat Equation
9.2 Definitions and Examples
9.3 Pointwise Convergence
9.4 Mean-square Convergence
Projects

chapter 10 Probability Theory*
10.1 Discrete Random Variables
10.2 Coding Theory
10.3 Continuous Random Variables
10.4 The Variation Metric
Projects

Bibliography

Symbol Index

Index
CHAPTER 1

Preliminaries

This chapter has two purposes. First, we review the properties of real
numbers and establish our notation for sets and functions. Second, we
provide simple examples of mathematical proofs which can serve as
models for students with no previous experience in proving theorems.

1.1 The Real Numbers

The reader is surely familiar with the basic arithmetic properties of
addition, x + y, and multiplication, xy, of real numbers.

(P1) x + y = y + x for all x and y.
(P2) (x + y) + z = x + (y + z) for all x, y, and z.
(P3) x + 0 = x for all x.
(P4) For every x there is a number -x so that x + (-x) = 0.
(P5) xy = yx for all x and y.
(P6) (xy)z = x(yz) for all x, y, and z.
(P7) x · 1 = x for all x.
(P8) For every x ≠ 0 there is a number x^(-1) so that x x^(-1) = 1.
(P9) x(y + z) = xy + xz for all x, y, and z.

Addition and multiplication are commutative (P1 and P5) and associative
(P2 and P6). Property P9 is called the distributive property. We
occasionally write x · y instead of xy to emphasize that we are multiplying
x and y. A set with operations of addition and multiplication which
satisfy these nine properties is called a field. Thus, the set of real numbers,
which we denote by R, is a field. There are other fields. Any set with two
elements can be made into a field by defining addition and multiplication
appropriately (see problem 7). The complex numbers, C, are a field
that contains R. We define C and study some of its properties in Section
6.5. It is convenient to think of the real numbers as the points on a line:

[Figure 1.1.1: the real number line, with the integers -1, 0, 1, 2, 3 and the points √2, e, and π marked]
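
The remark above that a two-element set can be made into a field can be checked mechanically, because with only two elements every instance of (P1) through (P9) can be tried. The following is a minimal Python sketch, assuming the operations are addition and multiplication modulo 2 (the rules stated in problem 7). The names Z2, add, mul, and is_field are illustrative and do not come from the text.

```python
from itertools import product

# Assumption for illustration: the two elements are the Python ints 0 and 1.
Z2 = (0, 1)

def add(x, y):
    """Addition in the two-element set: ordinary addition reduced mod 2."""
    return (x + y) % 2

def mul(x, y):
    """Multiplication in the two-element set: ordinary product reduced mod 2."""
    return (x * y) % 2

def is_field():
    """Check properties (P1)-(P9) by trying every combination of elements."""
    for x, y, z in product(Z2, repeat=3):
        if add(x, y) != add(y, x):                          # P1
            return False
        if add(add(x, y), z) != add(x, add(y, z)):          # P2
            return False
        if mul(x, y) != mul(y, x):                          # P5
            return False
        if mul(mul(x, y), z) != mul(x, mul(y, z)):          # P6
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):  # P9
            return False
    for x in Z2:
        if add(x, 0) != x or mul(x, 1) != x:                # P3 and P7
            return False
        if not any(add(x, w) == 0 for w in Z2):             # P4
            return False
        if x != 0 and not any(mul(x, w) == 1 for w in Z2):  # P8
            return False
    return True

print(is_field())
```

Because the set is finite, the exhaustive check is a complete verification for this particular pair of operations. No such check is possible for R, which is why the properties of R are taken as given.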

The real numbers have an order relation ≤, “less than or equal to”,
which has the following properties:

(O1) For all x and y, either x ≤ y or y ≤ x.
(O2) If x ≤ y and y ≤ x, then x = y.
(O3) If x ≤ y and y ≤ z, then x ≤ z.
(O4) If x ≤ y, then for any z we have x + z ≤ y + z.
(O5) If x ≤ y, then for any 0 ≤ z, we have xz ≤ yz.

We can define ≥ and < in terms of ≤. We say that x ≥ y if and only if
y ≤ x, and we say that x < y if and only if x ≤ y and x ≠ y. In terms of
the line in Figure 1.1.1, x < y means that x is to the left of y.
All the rules for the manipulation of elementary algebraic expressions,
for example, removing parentheses or the laws of exponents, can
be proven from (P1) through (P9). The usual rules for the manipulation
of inequalities can be derived from (P1)-(P9) and (O1)-(O5). We illustrate
how this can be done in the following proposition. The reader is
asked to construct other such proofs in problems 1 through 6.

Proposition 1.1.1

(a) For all real numbers x, y, and z, if x + z = y + z, then x = y.

(b) Additive inverses are unique.

(c) For all real numbers x and y, (-x)(-y) = xy.

(d) For all real numbers x and y, if x ≤ y, then -x ≥ -y.

Proof. To prove (a), let -z be a real number such that z + (-z) = 0. We
know that such a number exists by P4. Since x + z = y + z, we know that

(x + z) + (-z) = (y + z) + (-z)
x + (z + (-z)) = y + (z + (-z))    (by P2)
x + 0 = y + 0    (by P4)
x = y    (by P3)

This proves (a).

To prove (b), suppose that z + x = 0 and z + y = 0. Then x + z =
z + x = 0 and y + z = z + y = 0 by P1. Thus, x + z = 0 = y + z, so by
part (a), we conclude that x = y.
To prove (c), we use the fact that z · 0 = 0 = 0 · z for all real numbers
z (problem 4) and the distributive rule (P9) to conclude that

xy + x(-y) = x(y + (-y)) = x · 0 = 0    (1)
xy + (-x)y = (x + (-x))y = 0 · y = 0.    (2)

It follows from (1) and part (b) that x(-y) = -(xy). Since this equality
holds for all real numbers x and y, it holds if we replace x by -x. Thus
(-x)(-y) = -((-x)y). But, by (2) and (P4), (-x)y = -(xy), so we can
use the result of problem 3 to conclude that

(-x)(-y) = -((-x)y) = -(-(xy)) = xy.

To prove (d) we suppose that x ≤ y. Let z = (-x) + (-y). Then,
by (O4), x + (-x) + (-y) ≤ y + (-x) + (-y). By the commutative and
associative properties, (P1), (P2), (P5), and (P6),

x + (-x) + (-y) = (x + (-x)) + (-y) = 0 + (-y) = (-y) + 0 = -y.

Similarly,

y + (-x) + (-y) = (-x) + y + (-y) = (-x) + 0 = -x.

Thus -y ≤ -x, which is the same as -x ≥ -y by the definition of ≥. □

We will normally use □ to denote the end of a formal proof or the
beginning of a theorem. Using ≤, we can now define the absolute value
of a real number:

|x| ≡ x if x ≥ 0,
|x| ≡ -x if x < 0.

We use ≡ instead of = when the left-hand side is defined in terms of the
right-hand side.
The following proposition states the most important properties of
absolute value.

Proposition 1.1.2

(a) |x| ≥ 0 for all x and |x| = 0 if and only if x = 0.

(b) |xy| = |x||y| for all x and y.

(c) |x + y| ≤ |x| + |y| for all x and y (the triangle inequality).

Proof. To prove (a), we first suppose that x ≥ 0. Then, by definition,
|x| = x so |x| ≥ 0. On the other hand, if x < 0, then |x| = -x, so again
|x| ≥ 0. Thus, in both cases, |x| ≥ 0. If x ≠ 0 then -x ≠ 0, so the only
number with absolute value zero is 0. This proves (a). The proof of (b) is
left as an exercise (problem 1).
To prove (c) we argue as follows. For all x, we know that x ≤ |x|
since |x| = x if x is nonnegative and x < 0 < -x = |x| if x is negative.
Similarly, y ≤ |y|. Thus, by using (O4) twice,

x + y ≤ |x| + y ≤ |x| + |y|.

Therefore, if x + y ≥ 0, we have that |x + y| = x + y ≤ |x| + |y|. If x + y < 0,
we proceed as follows. We always know that x ≥ -|x| since this is clearly
true if x is nonnegative and x = -|x| if x is negative. Similarly, y ≥ -|y|.
Thus, again using (O4) twice,

x + y ≥ -|x| + y ≥ -|x| - |y| = -(|x| + |y|).

Using part (d) of Proposition 1.1.1, we see that -(x + y) ≤ |x| + |y|.
Therefore, since |x + y| = -(x + y) if x + y < 0, we conclude that |x + y| ≤
|x| + |y|, which completes the proof of (c). □
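
The three properties of absolute value can be spot-checked on exact rational numbers. The sketch below is only a sanity check on a small, arbitrarily chosen grid of fractions, not a substitute for the proof. The function name abs_value and the choice of sample points are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def abs_value(x):
    """Absolute value as defined in the text: x if x >= 0, otherwise -x."""
    return x if x >= 0 else -x

# A small grid of rationals n/d; Fraction keeps the arithmetic exact,
# so the comparisons below are not affected by rounding.
samples = [Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3)]

for x, y in product(samples, repeat=2):
    assert abs_value(x) >= 0                                  # part (a)
    assert (abs_value(x) == 0) == (x == 0)                    # part (a)
    assert abs_value(x * y) == abs_value(x) * abs_value(y)    # part (b)
    assert abs_value(x + y) <= abs_value(x) + abs_value(y)    # part (c)

print("all sampled pairs satisfy (a), (b), and (c)")
```

Exact fractions are used instead of floating-point numbers so that a failed comparison could only mean a genuine counterexample, never a rounding artifact.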

There are two other important properties of the real numbers. The
real numbers are complete, which means roughly that every sequence
of real numbers that looks as if it is converging does indeed converge to
a limit. This is discussed in Section 2.4. Second, the real numbers satisfy
the Archimedian property, which says that if a > 0 and b > 0, then there
is an integer n so that na > b. If we think about the real numbers, this
seems obvious. However, there are ordered fields which do not have
the Archimedian property (ordered fields are defined in problem 12).
Throughout the discussion of completeness in Section 2.4 we assume that
the Archimedian property holds for R.
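
For rational a and b, an integer n with na > b can be written down explicitly. The sketch below assumes exact rational inputs; the function name archimedean_n is illustrative. It takes n to be one more than the integer part of b/a.

```python
from fractions import Fraction
from math import floor

def archimedean_n(a, b):
    """Return an integer n with n * a > b, assuming a > 0 and b > 0."""
    if a <= 0 or b <= 0:
        raise ValueError("both a and b must be positive")
    # floor(b / a) * a <= b < (floor(b / a) + 1) * a, so adding one copy
    # of a to floor(b / a) copies pushes the total strictly past b.
    return floor(b / a) + 1

a, b = Fraction(1, 1000), Fraction(22, 7)
n = archimedean_n(a, b)
print(n, n * a > b)
```

The computation relies on being able to take the integer part of b/a, which for real numbers is itself a consequence of the Archimedian property. The code illustrates the statement and does not prove it.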

Problems

1. Prove that if x and y are real numbers, then |xy| = |x||y|. Hint: check all
the cases.

2. Use the field properties of the real numbers to provide a careful proof of
the elementary algebraic identity (x + y)^2 = x^2 + 2xy + y^2.

3. Prove that -(-x) = x for all real numbers x. Hint: show that (-x) + (x) =
0 and then use part (b) of Proposition 1.1.1.

4. Prove that all real numbers z satisfy z · 0 = 0 = 0 · z. Hint: first prove that
0 + z · 0 = z · 0 + z · 0 and then use part (a) of Proposition 1.1.1.

5. Use part (d) of Proposition 1.1.1 to prove that if x ≤ y and z ≤ 0, then
zy ≤ zx.

6. Suppose that 0 < x < y.

(a) Using the properties (O1)-(O5), prove that 0 < x^2 < y^2. Is the
conclusion true if we omit the hypothesis that 0 < x?
(b) Using mathematical induction, prove that 0 < x^n < y^n for all positive
integers n. Induction is discussed in Section 1.4.

7. Let Z_2 be a set consisting of two elements which we denote by 0 and 1.
Define an operation of addition, +, and an operation of multiplication, ·,
by the following rules:

0 + 0 = 0; 1 + 1 = 0; 1 + 0 = 1 = 0 + 1

0 · 0 = 0; 1 · 1 = 1; 1 · 0 = 0 = 0 · 1.

Prove that Z_2 is a field.

8. Prove that if x and y are real numbers, then 2xy ≤ x^2 + y^2.

9. Let x_1, x_2, and x_3 be real numbers. Prove that

|x_1 + x_2 + x_3| ≤ |x_1| + |x_2| + |x_3|.

10. Prove that all real numbers x and y satisfy

| |x| - |y| | ≤ |x - y|.    (3)

Hint: apply the triangle inequality to x = (x - y) + y and then reverse the
roles of x and y.

11. Let a and b be real numbers with a < b. Prove that there are integers m
and n ≠ 0 so that

a < m/n < b.

Hint: First use the Archimedian property to prove that an n exists so
that bn - an > 1. Then argue from this that there is an integer m so that
an < m < bn.

12. Let F be a field. Suppose that there is a set P ⊆ F which satisfies the
following properties:

(a) For each x in F, exactly one of the following three statements holds:
x is in P; -x is in P; x = 0.
(b) If x is in P and y is in P, then x + y is in P and xy is in P.

If such a P exists, F is called an ordered field. For any set P ⊆ F, define
x ≤ y if and only if (y - x) is in P or x = y. Prove that P satisfies the
properties (a) and (b) if and only if ≤ satisfies the properties (O1)-(O5).

1.2 Sets and Functions

Sets are defined by describing their members or by the bracket notation
{x | p(x)}, which indicates the collection of objects x so that the
proposition p(x) is true. Thus, if a and b are real numbers,

S = {x | a ≤ x ≤ b}

defines the set S of real numbers that are greater than or equal to a and
less than or equal to b. S is also denoted by [a, b] and called a closed
interval because it contains its endpoints.

T = {x | a < x < b}

defines the set T of real numbers which are greater than a and less than
b. T is also denoted by (a, b) and called an open interval. Occasionally,
we will use half-open intervals [a, b) = {x | a ≤ x < b}. The half-open
interval (a, b] is defined analogously. Sometimes we specify a set by listing
its members. For example, the natural numbers are the set N, where

N = {1, 2, 3, ...}

and the integers are the set

Z = {..., -2, -1, 0, 1, 2, ...}.

The set of real numbers that can be written in the form m/n, where m and
n are integers and n ≠ 0, is called the rational numbers and denoted by
Q. A real number which is not rational is called irrational. If x belongs
to a set X, we write

x ∈ X,

and, if it does not belong to X, we write

x ∉ X.

We use the standard notation for the union, S ∪ T, and intersection, S ∩ T,
of sets:

S ∪ T = {x | x ∈ S or x ∈ T}
S ∩ T = {x | x ∈ S and x ∈ T}.

We say that S is a subset of T if every element of S is also in T, in which
case we write S ⊆ T. If S ⊆ T and S ≠ T, we say that S is strictly
contained in T. We denote the set with no members by ∅ and note that
it is a subset of every set X since every x ∈ ∅ is in X. We remark that
when we talk about sets and set operations we are assuming that there
is a universal set which contains all the sets we are talking about. For
example, if we are talking about intervals of real numbers or rational
numbers, the universal set is R.
If S is a subset of X, the complement of S in X, denoted by S^c, is the
set of elements of X which are not in S; that is,

S^c = {x ∈ X | x ∉ S}.

The definition of S^c depends on the set X that contains S. Thus, if S ⊆
T ⊆ X and T is strictly contained in X, then the complement of S as
a subset of T is different from the complement of S as a subset of X.
In cases like this we will denote the complement of S in T by T\S and
the complement of S in X by X\S. More generally, if T and S are any
subsets of X, we define T\S = {x ∈ X | x ∈ T and x ∉ S}. The reader who
is unaccustomed to unions, intersections, and complements should work
problems 1-4.
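
For finite sets, the operations just defined can be tried directly with Python's built-in set type. The sketch below assumes a small finite universal set X standing in for the universal set of the text. The particular sets and the helper name complement are illustrative.

```python
# Assumption for illustration: a finite universal set X = {0, 1, ..., 9}.
X = set(range(10))
S = {1, 2, 3}
T = {1, 2, 3, 4, 5}

def complement(A, universe):
    """Complement of A in universe: the elements of universe that are not in A."""
    return {x for x in universe if x not in A}

print(S | T)               # union of S and T
print(S & T)               # intersection of S and T
print(S <= T, S < T)       # S is a subset of T; S is strictly contained in T
print(complement(S, T))    # the complement of S in T, written T\S
print(complement(S, X))    # the complement of S in X, written X\S
print(set() <= X)          # the empty set is a subset of every set
```

The two complements differ, which is the point made above: the complement of S depends on the set in which it is taken.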
If S and T are sets, we define their Cartesian product, denoted S × T,
as the set of all ordered pairs where the first element of the pair comes
from S and the second element of the pair comes from T:

S × T = {(s, t) | s ∈ S and t ∈ T}.

Thus, if S = [2,3] and T = [1,4], then [2,3] × [1,4] is the set of ordered
pairs (x, y) such that 2 ≤ x ≤ 3 and 1 ≤ y ≤ 4. That is, [2,3] × [1,4] is just
the rectangle in the plane with vertices at the points (2,1), (3,1), (2,4),
and (3,4). Since the Euclidean plane R^2 is just the set of all ordered pairs
(x, y) where x ∈ R and y ∈ R, we see that R^2 = R × R. Note that there is a
possibility of confusion because we are using the notation (a, b) to mean
two different things: an open interval on the real line and a point in the
plane. We will usually say “the open interval (a, b)” or “the point (a, b)”
to distinguish between the two possibilities.
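
A short Python sketch of the Cartesian product follows. It uses the two-element sets of endpoints {2, 3} and {1, 4} rather than the full intervals, since an interval has infinitely many members. The membership test in_rectangle is an illustrative stand-in for the rectangle [2,3] × [1,4].

```python
from itertools import product

S = {2, 3}    # endpoints of [2, 3]
T = {1, 4}    # endpoints of [1, 4]

# S x T as a set of ordered pairs (s, t) with s in S and t in T.
SxT = set(product(S, T))
print(sorted(SxT))

def in_rectangle(x, y):
    """Membership test for the rectangle [2, 3] x [1, 4]."""
    return 2 <= x <= 3 and 1 <= y <= 4

# Every pair built from the endpoints is a vertex of the rectangle.
print(all(in_rectangle(s, t) for (s, t) in SxT))
```

The product of the endpoint sets gives exactly the four vertices listed above. Order matters in a pair: (2, 1) belongs to the product but (1, 2) does not.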
In pre-calculus, students are taught that a function (of one variable) is
“a rule” which assigns to every real number x another real number f(x)
called the “value” of f at the “argument” x. Typically, such functions are
given by formulas like f(x) = 2x^2 - 1 or f(x) = x sin x. We will analyze
such functions but we will also consider functions which are not given
by simple explicit formulas. Further, we shall need functions whose
arguments and values are in more general sets than the real numbers R.
Thus, it makes sense to be more careful about what a function is. The
idea of Cartesian product allows us to say precisely what we mean by a
function.

Definition. Let S and T be sets. A function from a set S to a set T is
a subset, F, of S × T such that each s ∈ S occurs in at most one ordered
pair in F. For each pair (s, t) ∈ F, we call t the value of the function at s,
and if the name of the function is f we will write

t = f(s).

Note the distinction between F and f. The set F is a subset of the set
of ordered pairs, while f(s) is the name of the second element of the
ordered pair whose first element is s. The symbol f is the name of the
“rule” which assigns f(s) to s.

Definition. The set {s | (s, t) ∈ F} is called the domain of f, and the set
{t | (s, t) ∈ F} is called the range of f. These sets are sometimes denoted
by Dom(f) and Ran(f) for short.

Thus Dom(f) is simply the subset of members of S which occur as the
first element of an ordered pair in F. Ran(f) is the subset of members
of T which occur as the second element of an ordered pair in F. We say
that f “is a function from S to T” to indicate that the domain of f is S
and the range of f is a subset of T.
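
The definition of a function as a set of ordered pairs can be made concrete for finite sets. The following Python sketch is illustrative only; the helper names is_function, domain, range_of, and value, and the sample sets, are assumptions and not notation from the text.

```python
def is_function(F):
    """True if each first element occurs in at most one ordered pair of F."""
    firsts = [s for (s, _) in F]
    return len(firsts) == len(set(firsts))

def domain(F):
    """The set of first elements of the pairs in F."""
    return {s for (s, _) in F}

def range_of(F):
    """The set of second elements of the pairs in F."""
    return {t for (_, t) in F}

def value(F, s):
    """The value f(s): the second element of the unique pair whose first element is s."""
    matches = [t for (first, t) in F if first == s]
    if len(matches) != 1:
        raise KeyError(s)
    return matches[0]

S = {-2, -1, 0, 1, 2}
F = {(s, s * s) for s in S}      # the pairs (s, s^2) for s in S
G = {(1, 0), (1, 5)}             # 1 occurs in two pairs, so G is not a function

print(is_function(F), is_function(G))
print(domain(F), range_of(F))
print(value(F, -2))
```

Here F plays the role of the set of ordered pairs, while value(F, s) plays the role of f(s), mirroring the distinction drawn above between F and f.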

Example 1 Let f be the function from R into R that is given by the
formula f(s) = s^2 - 2. Since S = R and T = R, we know that S × T =
R^2. The set F consists of all ordered pairs of real numbers of the form
(s, s^2 - 2). That is,

F = {(s, s^2 - 2) | s ∈ R}.

Thus, the set F consists of the points in the plane that we normally refer
to as the graph of f. See Figure 1.2.1(a). The requirement in the definition
of function that each s occur as the first element of at most one ordered
pair ensures that for each s ∈ Dom(f) the function has exactly one value.
That is, the vertical line passing through (s, 0) intersects the graph of f
in exactly one point. The domain of f is R, and the range of f is the
set [-2, ∞). We use the symbol [a, ∞) to denote the set {x ∈ R | x ≥ a}.
Similarly, the symbol (-∞, b] denotes the set {x ∈ R | x ≤ b}, and (a, ∞)
and (-∞, b) are defined analogously.

Example 2 Let f be the function from R into R that is given by the
formula f(s) = ln s for s > 0. The reader is probably familiar with the
natural logarithm function; it is defined formally in Example 2 of Section
4.3. Once again S × T = R^2. The set F, which is given by

F = {(s, ln s) | s ∈ R and s > 0},

is the graph of the natural logarithm function; see Figure 1.2.1(b). However,
in this case the only members of S = R which occur as the first
element of ordered pairs in F are positive numbers. Thus Dom(f) is
the interval (0, ∞). On the other hand, any real number is the natural
logarithm of some s > 0; so Ran(f) = R.

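Example 3 Let M(2) denote the set of two-by-two matrices with real
entries. Let f be the function from M(2) to R that assigns to each matrix
\begin{pmatrix} a & b \\ c & d \end{pmatrix} its determinant ad - bc. Then,

F = \left\{ \left( \begin{pmatrix} a & b \\ c & d \end{pmatrix},\ ad - bc \right) \;\middle|\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M(2) \right\}

and

f\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.

In this case Dom(f) = M(2) and Ran(f) = R, since, given any real
number q, it is easy to choose a, b, c, and d so that ad - bc = q.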

Example 4 For each point (a, b) in R^2, let f(a, b) be the two-by-two
matrix given by

f(a, b)

Then f is a function from R^2 to M(2). The domain of f is all of R^2, and
the range of f is the subset of M(2) consisting of all two-by-two matrices
which are symmetric.

Example 5 We denote by C[0,1] the set of real-valued functions on
the interval [0,1] that are continuous. We will give a technical definition
of “continuous” in Chapter 3. For the moment, just assume that saying
that a function g is continuous means that the graph of g doesn't have
any jumps in it. We will see in Chapter 3 that one can integrate any
continuous function on a finite interval. We define a function I on C[0,1]
by

I(g) = ∫_0^1 g(x) dx.

I is a function from C[0,1] to R. The set F is

F = {(g, I(g)) | g ∈ C[0,1]}.

The domain of I is C[0,1] and the range of I is R since, given any real
number a, we can easily find a function g so that I(g) = a.

Notice that, by the formal definition, a function from S to T is a set of
ordered pairs F ⊆ (S × T) such that each s ∈ S occurs at most once. However,
following common terminology, we shall use the word “function”
for the expression which gives the second element of the ordered pair in
terms of the first.

Definition. Let f be a function from a set S to a set T. If Ran(f) = T
we say that the function is onto. If for each t ∈ Ran(f) there is only one
s ∈ S so that f(s) = t, then we say that f is one-to-one. Other texts
sometimes use “surjective” instead of “onto” and “injective” instead of
“one-to-one”.

The function f(x) = x^2 - 2 in Example 1 is not onto since its range
is [-2, ∞), and it is not one-to-one since for every x ∈ R, f(x) = f(-x).
The function f(x) = ln x in Example 2 certainly looks like it is onto since
it appears that for every real number t there is an s > 0 so that ln s = t.
Furthermore, it appears to be strictly increasing; that is, if s_1 < s_2, then
ln s_1 < ln s_2. Thus, it should be one-to-one. Once we give the formal
definition of the natural logarithm, these facts will be easy to prove. The
function f in Example 3 is certainly onto since every real number can
be the determinant of a two-by-two matrix. It is not one-to-one since
two different matrices can have the same determinant. The function f in
Example 4 is one-to-one because if the two pairs (a_1, b_1) and (a_2, b_2) are
different, then the matrices f(a_1, b_1) and f(a_2, b_2) are different. However,
f is not onto since every point in Ran(f) is a symmetric matrix,
but not all two-by-two matrices are symmetric. Finally, the function I in
Example 5 is onto since its range is R, but it is not one-to-one since two
different functions, g_1 and g_2, can have the same integral, I(g_1) = I(g_2).
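
On finite sets, the onto and one-to-one properties can be tested by listing values. The sketch below restricts attention to small sets of integers, so it illustrates the definitions rather than the examples on R. The names is_onto, is_one_to_one, square_minus_2, and double_plus_1 are assumptions made for illustration.

```python
def is_onto(f, S, T):
    """True if Ran(f) = T, where f is evaluated on the finite set S."""
    return {f(s) for s in S} == set(T)

def is_one_to_one(f, S):
    """True if no value of f is taken at two different elements of S."""
    values = [f(s) for s in S]
    return len(values) == len(set(values))

def square_minus_2(s):
    return s * s - 2

def double_plus_1(s):
    return 2 * s + 1

S = range(-3, 4)     # the integers -3, ..., 3
T = range(-5, 8)     # the integers -5, ..., 7

print(is_one_to_one(square_minus_2, S))    # f(1) = f(-1), so not one-to-one
print(is_one_to_one(double_plus_1, S))
print(is_onto(double_plus_1, S, T))        # only odd values are attained
print(is_onto(double_plus_1, S, {2 * s + 1 for s in S}))
```

The last two lines show that whether a function is onto depends on the target set T: the same rule is onto its own range but not onto a larger set.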
The significance of the one-to-one property is that it allows us to define
inverse functions. Suppose that f is a one-to-one function from a set
S to a set T. We define f^(-1), called the inverse function of f, to be the
function from T to S, with domain Ran(f), so that for each t ∈ Ran(f),
the value of f^(-1) at t is the unique element s ∈ S such that f(s) = t. See
Figure 1.2.2. The functions f and f^(-1) are related as follows. For each
t ∈ Ran(f), we have

f(f^(-1)(t)) = t,

and for each s ∈ S, we have

f^(-1)(f(s)) = s.

It is clear that f^(-1) is one-to-one and that the inverse function of f^(-1) itself
is the original function f. We will see later that the inverse function of
the natural logarithm is the exponential function.

Many of the functions which we will consider in this book are functions
from $\mathbb{R}$ to $\mathbb{R}$. In this case there are several natural ways
to make new functions from two given functions $f$ and $g$. We define the
sum of the functions, $f + g$, by

$$(f + g)(x) = f(x) + g(x)$$

on the domain

$$\mathrm{Dom}(f + g) = \mathrm{Dom}(f) \cap \mathrm{Dom}(g).$$

We define the product of the functions by

$$(fg)(x) = f(x)g(x)$$

on the same domain. Finally we define the composition of the two functions
$f$ and $g$, denoted by $f \circ g$, by

$$(f \circ g)(x) = f(g(x))$$

on the domain

$$\mathrm{Dom}(f \circ g) = \{x \in \mathbb{R} \mid x \in \mathrm{Dom}(g) \text{ and } g(x) \in \mathrm{Dom}(f)\}.$$

Thus, to compute the value of $f \circ g$ at $x$, we first compute the number
$g(x)$ and then compute $f(g(x))$. The reason for the complicated expression
for the domain of $f \circ g$ is that $x$ must be in the domain of $g$ and
$g(x)$ must be in the domain of $f$. Note that $f \circ g$ and $g \circ f$ are
not normally the same function.

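Example 6 Let $f$ be the function $f(x) = \frac{1}{x}$ with domain
$\{x \in \mathbb{R} \mid x \neq 0\}$ and let $g$ be the function $g(x) = \sin x$.
The domain of $g$ is $\mathbb{R}$, so the domain of $f \circ g$ is the set of $x$
so that $g(x) \neq 0$ since 0 is not in $\mathrm{Dom}(f)$. Thus,

$$\mathrm{Dom}(f \circ g) = \mathbb{R} \setminus \{0, \pm\pi, \pm 2\pi, \ldots\}$$

and

$$(f \circ g)(x) = \frac{1}{\sin x}.$$

On the other hand, $\sin x$ is defined everywhere, so

$$\mathrm{Dom}(g \circ f) = \{x \in \mathbb{R} \mid x \neq 0\}$$

and

$$(g \circ f)(x) = \sin\frac{1}{x}.$$

Thus, $f \circ g$ and $g \circ f$ are not the same function.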
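
The two compositions in Example 6 can be tried out numerically. The following is only a small sketch: the helper `compose` and the choice to signal "not in the domain" with a `ValueError` are assumptions made for illustration, not part of the text.

```python
import math

def f(x):
    # f(x) = 1/x, defined only for x != 0
    if x == 0:
        raise ValueError("x is not in Dom(f)")
    return 1.0 / x

def g(x):
    # g(x) = sin x, defined for every real x
    return math.sin(x)

def compose(outer, inner):
    # (outer o inner)(x) = outer(inner(x)); x must lie in the domain of
    # inner, and inner(x) must lie in the domain of outer
    return lambda x: outer(inner(x))

f_of_g = compose(f, g)   # x -> 1 / sin(x)
g_of_f = compose(g, f)   # x -> sin(1 / x)

x = 2.0
print(f_of_g(x), g_of_f(x))   # two different values at the same x
```

At $x = 0$ both compositions fail, but for different reasons: $g(0) = 0$ is not in the domain of $f$, while 0 itself is not in the domain of $f$.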

Problems
1. Let $S$ be the open interval $(1,2)$ and let $T$ be the closed interval $[-2,2]$.
Describe the following sets:

(a) $S \cup T$.
(b) $S \cap T$.
(c) $\mathbb{R} \setminus S$.
(d) $T \setminus S$.
(e) $\mathbb{R} \setminus (T \setminus S)$.
2. Describe the following sets of real numbers:

(a) $\{x \in \mathbb{R} \mid x^2 > 2\}$.
(b) $\{x \in \mathbb{R} \mid |x| < 3\}$.
(c) $\{x \in \mathbb{R} \mid |x - 2| < 3\}$.
(d) $\{x \in \mathbb{Q} \mid |x| < 1\}$.

3. If $A$, $B$, and $C$ are sets, prove that $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

4. If $A$, $B$, and $C$ are sets, prove that $A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)$.

5. In each case, describe the Cartesian product $S \times T$.

(a) $S = [0,1]$ and $T = \{x \mid x > 0\}$.
(b) $S = [0,1]$ and $T = [2,4] \cup [5,6]$.
(c) $S = \mathbb{N}$ and $T = \mathbb{N}$.
(d) $S = \mathbb{N}$ and $T = \mathbb{R}$.

6. Generate the graph of each of the following functions on $\mathbb{R}$ and use it to
determine the range of the function and whether it is onto and one-to-one:

(a) $f(x) = x^3$.
(b) $f(x) = \sin x$.
(c) $f(x) = e^x$.
(d) f(x) =

7. Each of the following functions has domain $\{x \in \mathbb{R} \mid x \neq 0\}$. For each,
generate its graph and use it to determine the range of the function and
whether it is onto and one-to-one:

(a) f(x) =
(b) f(x) = -±2.
(c) f(x) = sin
(d) $f(x) = \ln |x|$.

8. Let V be the set of polynomials of one real variable. If p(x) is such a


polynomial, define I(p) to be the function whose value at x is

Explain why I is a function from V to V and determine whether it is
one-to-one and onto.
9. In each case, determine the range of $f + g$ and $fg$ and say whether the
function is onto and whether it is one-to-one.

(a) f(x) = x and g{x) = x.


(b) f(x) = x and g(x) = | sin a;.
(c) f(x) = sin2 x and g{x) = cos2 x.

10. In each case, determine the domain and compute a formula for $f \circ g$ and
$g \circ f$.

(a) $f(x) = 3x^2 + 2$ and $g(x) = e^x$.
(b) $f(x) = 4 - x^2$ and $g(x) = \ln x$.
(c) f(x) = i and g(x) = lnx.

1.3 Cardinality

Two sets, $S$ and $T$, are said to have the same number of elements, or the
same cardinality, if there is a one-to-one function, $f$, from $S$ onto $T$, that
is, $\mathrm{Dom}(f) = S$ and $\mathrm{Ran}(f) = T$. This definition is reasonable
because if $S$ and $T$ have the same cardinality, then the function $f$ gives a
way of matching up each element of $S$ with a corresponding element of $T$. We
say that $f$ establishes a one-to-one correspondence between $S$ and $T$.

Proposition 1.3.1 Let $S$, $T$, and $U$ be sets. If $S$ and $T$ have the same
cardinality and $T$ and $U$ have the same cardinality, then $S$ and $U$ have
the same cardinality.

Proof. Since $S$ and $T$ have the same cardinality, there is a one-to-one
function, $f$, from $S$ onto $T$. On the other hand, since $T$ and $U$ have the
same cardinality, there is a one-to-one function, $g$, from $T$ onto $U$. Then,
$g \circ f$ is a function from $S$ to $U$ with domain $S$ and range $U$. If
$s_1 \neq s_2$, then $f(s_1) \neq f(s_2)$ since $f$ is one-to-one. Thus, since
$g$ is one-to-one, $g(f(s_1)) \neq g(f(s_2))$. Therefore, $g \circ f$ is
one-to-one, which proves, by definition, that $S$ and $U$ have the same
cardinality. □

A set is called finite if it is empty or if it has the cardinality of the set
$\{1, 2, 3, \ldots, n\}$ for some $n$. Otherwise it is called infinite. This
definition just states formally the way in which we usually determine the size
of finite sets. We count the elements! That is, we make a one-to-one
correspondence between the elements of the set and the integers, 1, 2, ...,
until we run out of set elements. Of course it is clear what we mean by
the size of a finite set. The purpose of the definition is to say what we
mean when sets are infinite. One of the interesting aspects of the set
theory that was developed in the 19th century was the discovery that there
are infinite sets of different sizes. An infinite set is called countable if it
has the cardinality of the natural numbers $\mathbb{N}$.

Example 1 Consider the set of integers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$. To
show that $\mathbb{Z}$ is countable, we must construct a one-to-one function from
$\mathbb{Z}$ to $\mathbb{N}$ with domain $\mathbb{Z}$ and range $\mathbb{N}$. It is easy to check that

$$f(n) = \begin{cases} 2n & \text{if } n \geq 1 \\ 1 - 2n & \text{if } n \leq 0 \end{cases}$$

has these properties. Notice that this means that, according to our definition
of cardinality, $\mathbb{Z}$ and $\mathbb{N}$ have the same cardinality even though
$\mathbb{N}$ is strictly contained in $\mathbb{Z}$.
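
The claim that this function is one-to-one and onto can be spot-checked on a finite window of integers. The sketch below is only a sanity check, not a proof; the window $-50, \ldots, 50$ is an arbitrary choice.

```python
def f(n):
    # f(n) = 2n for n >= 1 and f(n) = 1 - 2n for n <= 0
    return 2 * n if n >= 1 else 1 - 2 * n

window = range(-50, 51)                 # the integers -50, ..., 50
values = sorted(f(n) for n in window)

# Positive integers go to the even numbers 2, 4, ..., 100 and the
# integers 0, -1, ..., -50 go to the odd numbers 1, 3, ..., 101.
print(values[:6], values[-1])
```

On this window the 101 inputs hit each of the natural numbers 1 through 101 exactly once, which is the finite shadow of the one-to-one correspondence.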

Proposition 1.3.2 If $S$ is an infinite subset of a countable set $T$, then $S$
is countable.

Proof. Since $T$ is countable, there is a one-to-one correspondence, $f$,
from $\mathbb{N}$ onto $T$. The elements of $S$ are a subset of
$T = \{f(n) \mid n \in \mathbb{N}\}$. Let $n_1$ be the smallest integer in
$\mathbb{N}$ so that $f(n_1) \in S$. Then, let $n_2$ be the smallest integer
bigger than $n_1$ such that $f(n_2) \in S$. Continuing in this manner we define
$n_k$ to be the smallest integer bigger than $n_{k-1}$ such that
$f(n_k) \in S$. It is easy to check that the function $g$ defined by

$$g(k) = f(n_k)$$

is a one-to-one function from $\mathbb{N}$ onto $S$. Therefore, $S$ is countable. □

In some cardinality proofs it is convenient to use the fundamental


theorem of arithmetic, which, for completeness, we shall state formally.
Recall that the prime numbers are the set, P, of positive integers that have
no divisors besides themselves and the number 1.

□ Theorem 1.3.3 (The Fundamental Theorem of Arithmetic) Every positive
integer $N \geq 2$ can be written uniquely as a finite product of positive
integral powers of primes:

$$N = p_1^{k_1} p_2^{k_2} \cdots p_j^{k_j}.$$

A discussion and proof of the fundamental theorem of arithmetic can be
found in most undergraduate texts on number theory; see, for example,
[33]. We use it in the following proposition. For other uses, see
Proposition 1.4.2 in the next section or Section 6.6.

Proposition 1.3.4 If $S$ and $T$ are countable sets, then $S \times T$ is countable.

Proof. Since $S$ and $T$ are countable, there exist one-to-one functions, $f$
and $g$, with domains equal to $\mathbb{N}$ and ranges equal to $S$ and $T$,
respectively. Each element of $S \times T$ is of the form $(f(n), g(m))$ for
some $n \in \mathbb{N}$ and $m \in \mathbb{N}$. Let $h$ be the function from
$S \times T$ to $\mathbb{N}$ defined by

$$h(f(n), g(m)) = 2^n 3^m.$$

By the fundamental theorem of arithmetic, $h$ gives a one-to-one
correspondence between $S \times T$ and an infinite subset of $\mathbb{N}$.
According to Proposition 1.3.2, this infinite subset is countable. Thus, by
Proposition 1.3.1, we conclude that $S \times T$ is countable. □
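
The role of unique factorization in this proof can be made concrete. In the sketch below, pairs of indices $(n, m)$ stand in for the elements $(f(n), g(m))$; the function name `h_inverse` is hypothetical and is introduced only to show that the exponents can be read back off the value $2^n 3^m$.

```python
def h(n, m):
    # the encoding used in the proof: (n, m) -> 2^n * 3^m
    return 2**n * 3**m

def h_inverse(value):
    # Recover (n, m) by dividing out the factors of 2 and then of 3.
    # Unique factorization guarantees that the answer is well defined.
    n = m = 0
    while value % 2 == 0:
        value //= 2
        n += 1
    while value % 3 == 0:
        value //= 3
        m += 1
    if value != 1:
        raise ValueError("not of the form 2^n * 3^m")
    return n, m

pairs = [(n, m) for n in range(1, 21) for m in range(1, 21)]
codes = {h(n, m) for n, m in pairs}
print(len(pairs), len(codes))   # equal counts mean no two pairs collide
```

Numbers such as 5 or 10 are never produced by $h$, which is why the proof only claims a correspondence with an infinite subset of $\mathbb{N}$ and then appeals to Proposition 1.3.2.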

Example 2 The set of pairs of positive integers is countable since it is
equal to $\mathbb{N} \times \mathbb{N}$.

We can combine these propositions to show that Q is countable.

□ Theorem 1.3.5 The set of rational numbers, Q, is countable.

Proof. Every positive rational number can be written in the form $\frac{m}{n}$,
where $m$ and $n$ are integers, $n \neq 0$, and the fraction $\frac{m}{n}$ is in
lowest terms (that is, $n$ and $m$ have no common factor). The function $f$,
defined by

$$f\left(\frac{m}{n}\right) = (m, n),$$

gives a one-to-one correspondence between $\mathbb{Q}^+$, the set of positive
rationals, and an infinite subset of $\mathbb{N} \times \mathbb{N}$. Since
$\mathbb{N} \times \mathbb{N}$ is countable, Propositions 1.3.1 and 1.3.2
guarantee that $\mathbb{Q}^+$ is countable. Since $\mathbb{Q}$ can be written as
$\mathbb{Q}^- \cup \{0\} \cup \mathbb{Q}^+$, parts (b) and (c) of problem 3
guarantee that $\mathbb{Q}$ is countable. □

So far, all the infinite sets that we have considered have been count¬
able, and, therefore, the following theorem may come as a surprise.

□ Theorem 1.3.6 The set of real numbers between 0 and 1, that is (0,1), is
not countable.

Proof. In the proof we will use several facts about the decimal expansion
of real numbers. Decimal expansions are discussed in project 4 of
Chapter 6. Every real number $x$ between 0 and 1 has a decimal expansion
$x = 0.x_1x_2x_3\ldots$ where each $x_i$ is an integer between 0 and 9. Each
decimal expansion corresponds to a different number, except for expansions
ending in 0's, which correspond to numbers that can also be represented
by expansions ending in 9's. For example, $\frac{1}{2}$ can be written as
$.5000\ldots$ or as $.4999\ldots$.

The proof is by contradiction. Suppose that $(0,1)$ were countable.
Then there would be a one-to-one function $f$ from $\mathbb{N}$ to $(0,1)$ with
domain $\mathbb{N}$ and range $(0,1)$. Denote by $x_j^{(n)}$ the $j$th integer in
the decimal expansion of $f(n)$. That is,

$$\begin{aligned}
f(1) &= 0.x_1^{(1)} x_2^{(1)} x_3^{(1)} \ldots x_j^{(1)} \ldots \\
f(2) &= 0.x_1^{(2)} x_2^{(2)} x_3^{(2)} \ldots x_j^{(2)} \ldots \\
f(3) &= 0.x_1^{(3)} x_2^{(3)} x_3^{(3)} \ldots x_j^{(3)} \ldots \\
&\;\;\vdots \\
f(n) &= 0.x_1^{(n)} x_2^{(n)} x_3^{(n)} \ldots x_j^{(n)} \ldots
\end{aligned}$$

We now choose a sequence of integers $y_1, y_2, y_3, \ldots$ as follows. Choose
$y_1$ to be any integer between 2 and 8 which is not equal to $x_1^{(1)}$.
Choose $y_2$ to be any integer between 2 and 8 which is not equal to
$x_2^{(2)}$. Continuing in this manner we choose $y_n$ to be any integer
between 2 and 8 which is not equal to $x_n^{(n)}$. The decimal expansion

$$y = .y_1 y_2 y_3 \ldots y_n \ldots$$

corresponds to a unique real number between 0 and 1 since it doesn't end
in 0's or 9's. However, $y$ cannot equal $f(1)$ since the decimal expansion
for $y$ differs from the decimal expansion for $f(1)$ in the first place. On the
other hand, $y$ cannot equal $f(2)$ since the decimal expansion for $y$ differs
from the decimal expansion for $f(2)$ in the second place. Continuing,
we see that $y$ cannot equal $f(n)$ since the decimal expansion for $y$ differs
from the decimal expansion for $f(n)$ in the $n$th place. That is, $y$ does
not equal any of the numbers $f(n)$ for any $n$, so $y$ is not in the range of
$f$. But this is a contradiction since $f$ was assumed to have range $(0,1)$.
Therefore, $(0,1)$ is not countable. □
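
The diagonal construction in this proof can be imitated on a finite list. The following sketch is an illustration under obvious assumptions: a real list would be infinite, the digit strings are made-up sample data, and the rule "use 2 unless the listed digit is 2, then use 3" is just one of the many allowed choices of an integer between 2 and 8.

```python
def diagonal_digits(rows):
    # rows[i] holds the leading decimal digits of the (i+1)-th listed number.
    # Return digits y_1, y_2, ... with y_n between 2 and 8 and y_n different
    # from the n-th digit of the n-th listed number.
    digits = []
    for n, row in enumerate(rows):
        listed = int(row[n])                 # n-th digit of the n-th number
        digits.append(2 if listed != 2 else 3)
    return digits

listed_numbers = ["5000", "4999", "3333", "1428"]   # hypothetical sample list
y = diagonal_digits(listed_numbers)
print("0." + "".join(str(d) for d in y))
```

Whatever list is supplied, the number built this way differs from the $n$th entry in the $n$th decimal place, so it cannot appear anywhere in the list.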

An infinite set that is not countable is called uncountable. Since $\mathbb{R}$
contains the interval $(0,1)$, $\mathbb{R}$ is also uncountable. In fact,
$\mathbb{R}$ can be put into one-to-one correspondence with $(0,1)$. Consider
the function $f(x) = \tan x$ on the interval $(-\frac{\pi}{2}, \frac{\pi}{2})$.
The graph of $f$, shown in Figure 1.3.2, shows that $f$ is one-to-one and its
range is $\mathbb{R}$. Thus, $f$ is a one-to-one correspondence between
$(-\frac{\pi}{2}, \frac{\pi}{2})$ and $\mathbb{R}$. Since the function
$g(x) = \pi x - \frac{\pi}{2}$ gives a one-to-one correspondence between the
interval $(0,1)$ and the interval $(-\frac{\pi}{2}, \frac{\pi}{2})$, the
composition $f \circ g$ gives a one-to-one correspondence between $(0,1)$ and
$\mathbb{R}$.
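
The correspondence between $(0,1)$ and the real numbers can be explored numerically. The sketch below assumes the names `to_real` and `to_unit`, which are not from the text; `to_unit` is the inverse map obtained by solving $y = \tan(\pi x - \frac{\pi}{2})$ for $x$.

```python
import math

def g(x):
    # maps (0, 1) onto (-pi/2, pi/2)
    return math.pi * x - math.pi / 2

def f(x):
    # maps (-pi/2, pi/2) onto the real line
    return math.tan(x)

def to_real(x):
    # the composition f o g, taking (0, 1) onto the real line
    return f(g(x))

def to_unit(y):
    # the inverse correspondence, taking a real number back into (0, 1)
    return (math.atan(y) + math.pi / 2) / math.pi

for x in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(x, to_real(x), to_unit(to_real(x)))
```

Inputs near 0 and 1 are sent to very large negative and positive numbers, which is how a bounded interval manages to cover the whole line.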
The reader is asked to show in problem 3 that the union of two countable
sets is countable. Since $\mathbb{R}$ is the union of the rational numbers and
the irrational numbers, and the rational numbers are countable, the
irrational numbers must be uncountable, for otherwise $\mathbb{R}$ would be
countable. In this sense, there are a lot more irrational numbers than
rational numbers.
Most of the ideas in this section were introduced by Georg Cantor
(1845-1918).

Problems
1. Prove that the set S = {5,10,15,20,...} is countable by constructing a
one-to-one function from S onto N.
2. Prove that the set of real numbers of the form $e^n$, $n = 0, \pm 1, \pm 2, \ldots$ is
countable.
3. Prove that:

(a) The union of two finite sets is finite.


(b) The union of a finite set and a countable set is countable.
(c) The union of two countable sets is countable.

4. Using Propositions 1.3.2 and 1.3.4 we proved that $\mathbb{N} \times \mathbb{N}$ is countable
(Example 2). Construct a function which gives the correspondence explicitly.
Hint: look at the pairs $(n, m)$ in the plane and figure out how to count
them!
5. Prove that the set of two-by-two matrices with rational entries is
countable.

6. Prove that the set of two-by-two matrices with real entries is uncountable.

7. Find an explicit one-to-one correspondence between the interval $(-1,7)$
and the real numbers, $\mathbb{R}$.

8. Suppose that for each natural number $n$, the set $A_n$ is countable. Denote
the union of the sets $A_n$ by $A = \cup_n A_n$. Prove that $A$ is countable.

9. For each of the following sets, say whether it is finite, countable, or un¬
countable:

(a) The set of functions from a finite set to a finite set.


(b) The set of functions from a finite set to a countable set.
(c) The set of functions from a countable set to a finite set with two or
more elements.
(d) The set of all finite subsets of the integers. Hint: prove that for each
$n$ the set of finite subsets of size $n$ is countable.

1.4 Methods of Proof

Learning how to construct proofs is an important part of a first course
in analysis. Since each problem and each proof is different, there is no
“method” to follow that will yield a correct proof. The best way to begin
is to read and reread the proofs in the text, following the logic line by
line. Next, the easy problems should be attempted (usually the first few
in a problem set). Begin by asking why the result is true. How is the
result related to other results that you know? What information do you
already have about the objects in question? Your instructor will give you
feedback on the correctness of your logic and the style of your proofs.
Often a result can be proven in different ways, so you will benefit by
comparing your proofs with those of other students. There is no
substitute for thought and hard work. But everyone, including a professional
mathematician, gets stuck sometimes, so you should not hesitate to ask
for help. Above all, you should not be discouraged. Constructing correct
and elegant mathematical proofs is difficult. Learning how to do it is a
major intellectual accomplishment.
A typical theorem statement consists of a set, P, of declarative
sentences, called the “hypotheses” or the “assumptions,” and a set, Q, of
declarative sentences called the conclusions. The statement of the
theorem is that if all the sentences in P are true, then it follows that the
sentences in Q are true. That is, P implies Q. Propositions 1.3.1, 1.3.2,
and 1.3.4 have exactly this form. Sometimes a theorem statement has
only conclusions, Q. This is the case for Theorems 1.3.5 and 1.3.6. This
is done for brevity when it is assumed that the reader knows what the
assumptions, P, are. For example, for Theorem 1.3.5, the unwritten
assumptions are: (1) “The real numbers satisfy the properties P1 through
P9 of Section 1.1”; (2) “$\mathbb{Q}$ is the set of real numbers of the form
$\frac{m}{n}$.” The first sentence could be listed as an assumption for every
theorem in this book, but that would be annoying, so the assumption is
unwritten but implicit in the theorem statement. The second statement is just
the definition of $\mathbb{Q}$. Another example of unwritten hypotheses is
Proposition 1.1.2, which seems to have only conclusions. It is very important
to remember that a theorem which asserts that P implies Q does not assert
that either P or Q is true. It asserts only that if P is true, then Q is also
true.

Just because P implies Q does not mean that Q implies P, which is
called the converse statement. For example, consider the set of students
at a certain college. Let P be the statement “x is male” and Q be the
statement “x is blond.” Suppose that P implies Q (which might be true
for that particular school). That does not mean necessarily that Q implies
P. In the case where P implies Q and Q implies P, we say that P and Q
are equivalent or that P is true if and only if Q is true. If P implies Q we
often say that P is “sufficient” for Q because if P is true, then Q is true.
If Q implies P then we say that P is “necessary” for Q because Q can't
be true without P being true. Thus, saying that P is both necessary and
sufficient for Q is the same as saying that P and Q are equivalent.

“Not P” is the statement that not all the sentences in P are true. The
statements “P implies Q” and “not Q implies not P” are logically
equivalent. Here is how one can see this. First, suppose that P implies Q.
Suppose that not Q is true. If not P is false, then P is true, which would
imply that Q is true. Therefore, not P must be true. Thus, not Q implies
not P. On the other hand, suppose that not Q implies not P. Suppose P
is true. Then Q must be true because otherwise not P would be true
since not Q implies not P. The fact that “P implies Q” and “not Q implies
not P” are equivalent is useful because it means that we have two ways
of proving that P implies Q. We can take P as true and prove that the
statements Q are true. This is called a direct proof. Or we can take the
statement not Q as our hypothesis and try to prove that the statement
not P is true. This is called a contrapositive proof. We give both a direct
and a contrapositive proof for the following simple proposition.
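
The equivalence of “P implies Q” and “not Q implies not P” can also be confirmed by listing all four truth assignments. This is a small sketch, treating P and Q as single true-or-false statements; the helper name `implies` is an assumption for illustration.

```python
from itertools import product

def implies(p, q):
    # "p implies q" is false only when p is true and q is false
    return (not p) or q

rows = []
for p, q in product([False, True], repeat=2):
    direct = implies(p, q)                   # P implies Q
    contrapositive = implies(not q, not p)   # not Q implies not P
    converse = implies(q, p)                 # Q implies P
    rows.append((p, q, direct, contrapositive, converse))
    print(p, q, direct, contrapositive, converse)
```

The third and fourth columns agree in every row, while the converse column does not, which matches the earlier warning that P implies Q says nothing about whether Q implies P.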

Proposition 1.4.1 Suppose that $x^2 - x > 0$ and $x > 0$. Then $x > 1$.

Direct Proof. Suppose that $x^2 - x > 0$ and $x > 0$. Since $x(x - 1) \neq 0$,
neither $x$ nor $x - 1$ can be zero. Since their product is positive they are
either both positive or both negative. Thus, since $x$ is positive, $x - 1$ must
be positive. That is, $x - 1 > 0$, which implies $x > 1$. □

Contrapositive Proof. Suppose not Q; that is, $x \leq 1$. We want to show
not P, that is, that $x^2 - x > 0$ and $x > 0$ cannot both be true. If $x > 0$,
then, multiplying $x \leq 1$ by $x$, we obtain $x^2 \leq x$, which means that
$x^2 - x \leq 0$. On the other hand, suppose $x(x - 1) = x^2 - x > 0$; then,
since $x - 1 \leq 0$, we must have $x < 0$ since the product is positive. Thus,
$x^2 - x > 0$ and $x > 0$ cannot both be true. □

Another common method of proof is proof by contradiction. To
prove that P implies Q one assumes that P is true and that Q is not true
and then shows that this leads to a contradiction. The proof of Theorem
1.3.6 used proof by contradiction. Here is an example of this method,
created by Euclid more than 2000 years ago. The statement is another
example of omitted hypotheses. The unstated hypotheses are, “The real
numbers satisfy (P1)-(P9)” and “there exists a real number, called
$\sqrt{2}$, whose square is 2”.

Proposition 1.4.2 $\sqrt{2}$ is irrational.

Proof. If $\sqrt{2}$ is rational then we can write $\sqrt{2}$ as a ratio of
integers in lowest terms, $\sqrt{2} = \frac{m}{n}$. Lowest terms means that the
numerator and denominator have no common factor. Thus,

$$\sqrt{2}\,n = m.$$

Squaring both sides yields

$$2n^2 = m^2 \tag{4}$$

which shows that 2 is a factor of $m^2$. By the fundamental theorem of
arithmetic, the factors of $m^2$ are just the factors of $m$ each taken twice.
Thus 2 is a factor of $m$, which means that we can write $m = 2k$ for some
integer $k$. Substituting in (4) yields

$$2n^2 = (2k)^2 = 4k^2$$

or

$$n^2 = 2k^2.$$

It follows that 2 is a factor of $n^2$ and therefore, by the same reasoning as
above, 2 is a factor of $n$. But if 2 is a factor of both $m$ and $n$, then the
fraction $\frac{m}{n}$ is not in lowest terms, which is a contradiction. □

Another method of proof is proof by induction. Suppose that for
each positive integer $n$, $Q(n)$ is a statement. Suppose that we know that
$Q(1)$ is true. Then we show for each $k$ that if $Q(k)$ is true, then $Q(k + 1)$
is true. The statement that $Q(k)$ is true is called the induction hypothesis.
The statement that if $Q(k)$ is true then $Q(k + 1)$ is true is called the
induction step. Since we know $Q(1)$ is true, we conclude from the induction
step that $Q(2)$ is true. Since $Q(2)$ is true we conclude, again from the
induction step, that $Q(3)$ is true. Continuing in this manner we see that we
have proved that $Q(n)$ is true for each $n$. Thus, a proof by induction
consists of verifying $Q(1)$ and proving the induction step. Here is a simple
example from number theory.

Proposition 1.4.3 If $n$ is a positive integer, then

$$1 + 2 + \cdots + n = \frac{n(n + 1)}{2}. \tag{5}$$

Proof. $Q(n)$ is statement (5). $Q(1)$ is certainly true since $1 = (1) \cdot (2)/2$.
Now, suppose that $Q(k)$ is true for some $k$. That is,

$$1 + 2 + \cdots + k = \frac{k(k + 1)}{2}.$$

Then, by this induction hypothesis,

$$\begin{aligned}
1 + 2 + \cdots + k + (k + 1) &= \frac{k(k + 1)}{2} + (k + 1) \\
&= (k + 1)\left(\frac{k}{2} + 1\right) \\
&= \frac{(k + 1)(k + 2)}{2} \\
&= \frac{(k + 1)((k + 1) + 1)}{2}.
\end{aligned}$$

We have shown that if $Q(k)$ is true, then $Q(k + 1)$ is true. Therefore, by
induction, $Q(n)$ is true for all $n$. □
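
A quick numerical check can make the two parts of this induction proof tangible. It is a sketch only: checking finitely many cases does not replace the proof, and the names `left_side` and `right_side` are introduced here for illustration.

```python
def left_side(n):
    # 1 + 2 + ... + n, added up term by term
    return sum(range(1, n + 1))

def right_side(n):
    # the closed form n(n + 1)/2 from statement (5)
    return n * (n + 1) // 2

# Base case Q(1).
base_case = left_side(1) == right_side(1)

# Induction step in arithmetic form: adding (k + 1) to the closed form
# for k gives the closed form for k + 1.
step_holds = all(right_side(k) + (k + 1) == right_side(k + 1)
                 for k in range(1, 1001))

print(base_case, step_holds, left_side(100), right_side(100))
```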

The following example shows another simple use of induction.

Example 1 Suppose that we know that if $f$ and $g$ are differentiable
functions, then $f + g$ is differentiable and $(f + g)' = f' + g'$. We would
like to extend this to finite sums. Thus, $Q(n)$ is the following statement:
If $f_1, f_2, \ldots, f_n$ are differentiable functions, then
$f_1 + f_2 + \cdots + f_n$ is differentiable and
$(f_1 + f_2 + \cdots + f_n)' = f_1' + f_2' + \cdots + f_n'$. We are assuming
that we already know that this is true for $n = 2$. So, suppose that $Q(k)$ is
true for some $k \geq 2$. We want to prove $Q(k + 1)$. Let
$f_1, f_2, \ldots, f_{k+1}$ be differentiable functions and write

$$f_1 + f_2 + \cdots + f_{k+1} = (f_1 + f_2 + \cdots + f_k) + f_{k+1}.$$

The first term on the right is differentiable by the induction hypothesis
and the second term is differentiable by assumption. Therefore, since
$Q(2)$ is true, $(f_1 + f_2 + \cdots + f_{k+1})$ is differentiable and

$$\begin{aligned}
(f_1 + f_2 + \cdots + f_{k+1})' &= (f_1 + f_2 + \cdots + f_k)' + f_{k+1}' \\
&= f_1' + f_2' + \cdots + f_k' + f_{k+1}'
\end{aligned}$$

by the induction hypothesis.

Problems

1. Give five examples which show that P implies Q does not necessarily
mean that Q implies P.

2. Proposition 1.1.2 seems to consist only of conclusions. What are the un¬
written hypotheses?

3. Suppose that $a$, $b$, $c$, and $d$ are positive numbers such that $a/b < c/d$. Prove
that

$$\frac{a}{b} < \frac{a + c}{b + d} < \frac{c}{d}.$$

4. Suppose that $0 < a < b$. Prove that

(a) $a < \sqrt{ab} < b$.
(b) $\sqrt{ab} < \frac{1}{2}(a + b)$.

5. Suppose that x and y satisfy f + | = 1. Prove that x1 2 + y2 > 1. Hint: try


a contrapositive proof.
6. Prove that if $n$ is a positive integer, then $n^3 + 5n$ is divisible by 6.
7. Prove that for any $n$ real numbers, $x_1, x_2, \ldots, x_n$,

8. Prove that for all positive integers $n$,

$$1^3 + 2^3 + \cdots + n^3 = (1 + 2 + 3 + \cdots + n)^2.$$

9. Let $x > -1$ and $n$ be a positive integer. Prove Bernoulli's inequality:

$$(1 + x)^n \geq 1 + nx.$$

10. In order to disprove the implication that P implies Q, one often provides
an example in which P is true but Q is not. Such an example is called a
counterexample to the statement that P implies Q. For each of the follow¬
ing incorrect statements, identify P, identify Q, and provide a counterex¬
ample.

(a) If an integer is divisible by 2, then it is divisible by 4.


(b) All quadratic polynomials have two real roots.
(c) If a function $f$ from $\mathbb{R}$ to $\mathbb{R}$ is one-to-one, then the function $f^2$ is
one-to-one.
(d) If a function $f$ from $\mathbb{R}$ to $\mathbb{R}$ is one-to-one and bounded, then $f^{-1}$ is
one-to-one and bounded.

11. Suppose that c < d.

(a) Prove that there is a $q \in \mathbb{Q}$ so that $|q - \sqrt{2}| < d - c$. Hint: use problem
11 of Section 1.1.
(b) Prove that $q - \sqrt{2}$ is irrational.
(c) Prove that there is an irrational number between c and d.

12. Prove that $e$ is irrational by supposing that $e = \frac{m}{n}$ and deriving a
contradiction. Use the fact that $e = \sum_{j=0}^{\infty} \frac{1}{j!}$. Let $s_k$ be the
partial sum $s_k = \sum_{j=0}^{k} \frac{1}{j!}$.

(a) Prove that

$$e - s_k < \frac{1}{(k+1)!}\left(1 + \frac{1}{k+1} + \frac{1}{(k+1)^2} + \cdots\right).$$


(b) Prove that e - sk < for all positive integers k.

(c) If $e = \frac{m}{n}$, prove that $n!\,e$ and $n!\,s_n$ are integers.

(d) If $e = \frac{m}{n}$, prove that $n!(e - s_n)$ is an integer between 0 and 1, which
is absurd.
CHAPTER 2

Sequences

This chapter is devoted to studying sequences of real numbers and their
limits. Most of the important concepts of analysis are defined by limiting
operations. For example, the derivative is the limit of the difference
quotient and the integral is the limit of Riemann sums. Furthermore,
many proofs in analysis and applied mathematics involve “taking the
limit” of a sequence or series of elementary functions that get close to
the function one is trying to find. These limiting operations are simplest
to study in the case of sequences of real numbers, so we begin there.
Furthermore, there are many beautiful questions about sequences which
make them interesting to study in their own right. Finally, sequences
occur frequently in other branches of mathematics and in applications; see
Section 2.3, Section 2.7, the problems, and the projects.

2.1 Convergence

Intuitively, a sequence is simply an infinite list of numbers, containing a
first number, a second, and so forth:

$$2,\ 4,\ 6,\ 8,\ 10,\ 12,\ \ldots$$
$$1,\ -1,\ 1,\ -1,\ 1,\ -1,\ \ldots$$
$$1,\ 1\tfrac{1}{2},\ 1\tfrac{3}{4},\ 1\tfrac{7}{8},\ 1\tfrac{15}{16},\ \ldots$$

More formally, a sequence is a function from the natural numbers, $\mathbb{N}$, to
the real numbers, $\mathbb{R}$. We usually give names to the values of the
function, for example, by letting $a_1$ denote the value of the function at 1,
$a_2$ denote the value of the function at 2, and so forth. We represent the
entire sequence by $\{a_n\}_{n=1}^{\infty}$, where $n$ is the index which runs
from 1 to $\infty$, indicating that the successive terms of the sequence are
$a_1, a_2, a_3, \ldots$.
Often, a sequence is specified by giving an explicit formula for the
terms that shows how they depend on $n$. For example, the first sequence
above is given by the formula $a_n = 2n$, the second sequence by the
formula $a_n = (-1)^{n+1}$, and the third by $a_n = 2 - 2^{-n+1}$. However,
sometimes a sequence is specified by giving an algorithm for computing the
terms. For example, if

$$a_1 = 2$$

and we are told that

$$a_{n+1} = 6a_n$$

for all $n$, then it is easy to see that $a_2 = (6)(2)$, $a_3 = (6)^2(2)$, and so
forth. Thus, the formula for the $n$th term of the sequence is
$a_n = 2(6^{n-1})$. When we specify a sequence, the name of the index doesn't
matter, so $\{a_n\}_{n=1}^{\infty}$ and $\{a_k\}_{k=1}^{\infty}$ specify the same
sequence. Sometimes, a sequence is given where the index $n$ starts at some
integer other than 1. In such a case, one can define a new index for the same
sequence that starts at 1. For example, if the sequence
$\{a_n\}_{n=2}^{\infty}$ is given, we can set $k = n - 1$ and observe that
$\{a_{k+1}\}_{k=1}^{\infty}$ specifies the same sequence. When the index
runs from 1 to $\infty$, we will often drop the explicit statement of the range
of the index and write simply $\{a_n\}$ instead of $\{a_n\}_{n=1}^{\infty}$.
The idea of a “limit” of a sequence is very simple. We say that a
sequence $\{a_n\}$ converges to a limit $a$ if the terms in the sequence, $a_n$,
get closer and closer to $a$ as $n$ gets larger. So, the first sequence above
does not converge to a limit because each term is two bigger than the previous
term, so the sequence can't get closer and closer to any finite number.
The second sequence doesn't converge because it keeps hopping back
and forth between $-1$ and 1. The third sequence converges to 2 because,
as $n$ gets larger, the terms get closer and closer to 2.
These ideas seem so simple and clear that it is reasonable to ask why
we need a technical definition of convergence. The answer is that we
will encounter lots of sequences whose convergence and limits are not
obvious. Even when the sequence is given by an explicit formula, it may
be difficult to see immediately whether it converges and, if so, what the
limit is. This is the case with the following two sequences,

$$a_n = n \sin\frac{1}{n}$$

and

$$a_n = \left(1 + \frac{1}{n}\right)^n,$$

which are often introduced in calculus. At other times, the sequence may
be specified by giving $a_0$ and a recursive formula for the rest of the terms,
for example,

$$a_{n+1} = \frac{1}{2}a_n + 2$$

or

$$a_{n+1} = 2a_n(1 - a_n).$$

Again, it is not easy to see immediately whether either sequence
converges and, if so, what the limit is. Given $a_0 = .75$, for example, you can
easily compute the first 10 terms of each of these sequences. They look
like they are converging, and you'll even be able to make a reasonable
guess for the limit. However, to prove that they converge we need a
technical criterion for “limit” and a lot of practice in verifying the criterion in
simple cases.
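
The suggested computation of the first 10 terms is easy to carry out by machine. The sketch below is one way to do it; the helper `iterate` and the variable names are assumptions for illustration, and floating-point arithmetic is used, so the printed terms are approximations.

```python
def iterate(step, a0, count):
    # Return [a_0, a_1, ..., a_count], where a_{n+1} = step(a_n).
    terms = [a0]
    for _ in range(count):
        terms.append(step(terms[-1]))
    return terms

# a_{n+1} = (1/2) a_n + 2 and a_{n+1} = 2 a_n (1 - a_n), both with a_0 = .75
first = iterate(lambda a: 0.5 * a + 2, 0.75, 10)
second = iterate(lambda a: 2 * a * (1 - a), 0.75, 10)

for n, (u, v) in enumerate(zip(first, second)):
    print(n, u, v)
```

The printed terms of the first sequence creep up toward 4 and those of the second settle near $\frac{1}{2}$, which are reasonable guesses for the limits. A table of numbers is still only evidence; proving convergence requires the definition that follows.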

Definition. We say that a sequence $\{a_n\}$ converges to a limit $a$ if, for
every $\varepsilon > 0$, there is an integer $N(\varepsilon)$ so that

$$|a_n - a| < \varepsilon \quad \text{for all } n > N. \tag{1}$$

If $\{a_n\}$ converges to $a$, then we write

$$\lim_{n \to \infty} a_n = a \quad \text{or} \quad a_n \to a.$$

Here is what this definition means geometrically. Let a small number
$\varepsilon > 0$ be given. Consider the interval of length $2\varepsilon$, which has
$a$ at its center; see Figure 2.1.1.

[Figure 2.1.1: the number line with the points $a - \varepsilon$, $a$, and
$a + \varepsilon$ marked, together with several terms of the sequence.]

There must exist an $N$ so that all the terms $a_n$ in the sequence for $n > N$
lie in the interval. Note that for convergence to hold this must be true
for each $\varepsilon$. That is, for each $\varepsilon$ there must exist such an $N$.
The size of $N$ will normally depend on $\varepsilon$, smaller $\varepsilon$ requiring
larger $N$, which is why we wrote $N(\varepsilon)$ in the definition.

Example 1 Consider the sequence $\{a_n\}$, whose $n$th term is given by
$a_n = 1 + \frac{1}{\sqrt{n}}$. As $n$ gets larger, $\sqrt{n}$ gets larger, so
$a_n$ should converge to 1.
Let's prove this by checking the definition of convergence. Let
$\varepsilon > 0$ be given. We want to show that we can choose $N(\varepsilon)$
so that (1) holds, that is, so that

$$|a_n - 1| = \frac{1}{\sqrt{n}} < \varepsilon$$

for all sufficiently large $n$. The inequality on the right is equivalent to

$$1 < \varepsilon\sqrt{n},$$

which is equivalent to

$$\frac{1}{\varepsilon^2} < n.$$

Choose $N$ to be any integer greater than or equal to $\frac{1}{\varepsilon^2}$.
Then, if $n > N$, we have

$$n > N \geq \frac{1}{\varepsilon^2}.$$

Therefore,

$$\frac{1}{\sqrt{n}} < \frac{1}{\sqrt{N}} \leq \varepsilon.$$

This shows that (1) holds for $n > N$. Since we have shown how to choose
$N(\varepsilon)$ for every $\varepsilon > 0$, we have proven that $a_n \to 1$.
Throughout the proof we used properties of the order relation among real
numbers. For example, we used the fact that if $0 < x < y$, then
$0 < y^{-1} < x^{-1}$.
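
The choice of $N$ in Example 1 can be tested numerically. The sketch below takes $N$ to be the smallest integer greater than or equal to $1/\varepsilon^2$ and checks the inequality on a finite stretch of indices; the function names are illustrative, and a finite check is of course no substitute for the proof.

```python
import math

def a(n):
    # a_n = 1 + 1/sqrt(n)
    return 1 + 1 / math.sqrt(n)

def N_for(eps):
    # smallest integer greater than or equal to 1/eps^2
    return math.ceil(1 / eps**2)

for eps in (0.5, 0.1, 0.03):
    N = N_for(eps)
    # check |a_n - 1| < eps for a stretch of indices n > N
    ok = all(abs(a(n) - 1) < eps for n in range(N + 1, N + 500))
    print(eps, N, ok)
```

Shrinking $\varepsilon$ by a factor of 10 multiplies $N$ by about 100, a concrete instance of the remark that smaller $\varepsilon$ requires larger $N$.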

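Example 2 Suppose that $\{a_n\}$ is the sequence given by

$$a_n = \sqrt{2 + \frac{3}{n}}.$$

Since $\frac{3}{n}$ gets smaller and smaller as $n$ gets larger, the expression
under the square root sign gets closer and closer to 2. So, it is intuitively
reasonable that this sequence has a limit and the limit is $\sqrt{2}$. In order
to prove it, let $\varepsilon > 0$ be given. We want to show that

$$\left|\sqrt{2 + \frac{3}{n}} - \sqrt{2}\right| < \varepsilon \tag{2}$$

for all sufficiently large $n$. Multiplying and dividing by
$\sqrt{2 + \frac{3}{n}} + \sqrt{2}$, we find

$$\begin{aligned}
\left|\sqrt{2 + \frac{3}{n}} - \sqrt{2}\right|
&= \frac{\left(\sqrt{2 + \frac{3}{n}} - \sqrt{2}\right)\left(\sqrt{2 + \frac{3}{n}} + \sqrt{2}\right)}{\sqrt{2 + \frac{3}{n}} + \sqrt{2}} \\
&= \frac{3/n}{\sqrt{2 + \frac{3}{n}} + \sqrt{2}} \\
&< \frac{3/n}{2\sqrt{2}}.
\end{aligned}$$

In the last step we replaced the denominator $\sqrt{2 + \frac{3}{n}} + \sqrt{2}$ by
something smaller, namely $2\sqrt{2}$, so the fraction got larger. Thus, we want

$$\frac{3/n}{2\sqrt{2}} < \varepsilon,$$

which is equivalent to

$$n > \frac{3}{2\varepsilon\sqrt{2}}.$$

Choose $N$ to be any integer $> \frac{3}{2\varepsilon\sqrt{2}}$. Then, if $n > N$,
we have $n > \frac{3}{2\varepsilon\sqrt{2}}$, which is equivalent to
$\frac{3}{2n\sqrt{2}} < \varepsilon$. Therefore $n > N$ implies that (2) holds. We
have verified the criterion (1) in the definition of limit, so we conclude
that $\sqrt{2 + \frac{3}{n}} \to \sqrt{2}$.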

Example 3 Suppose that $a_1 = 5$ and the rest of the members of the
sequence $\{a_n\}$ are given by the recursion relation

$$a_n = r a_{n-1}$$

where $|r| < 1$. Then, $a_2 = 5r$; $a_3 = 5r^2$; and, in general,
$a_n = 5r^{n-1}$. Since we are assuming $|r| < 1$, it seems reasonable that
$a_n \to 0$. Let's prove it. Let $\varepsilon > 0$ be given. We want to show that
for $n$ large enough

$$|a_n - 0| = 5|r|^{n-1} < \varepsilon, \tag{3}$$

or equivalently (when $r \neq 0$; if $r = 0$, then $a_n = 0$ for all $n \geq 2$ and
there is nothing to prove),

$$\left(\frac{1}{|r|}\right)^{n-1} > \frac{5}{\varepsilon}.$$

If we take the natural logarithm of both sides we see that

$$(n - 1)\ln\frac{1}{|r|} > \ln\frac{5}{\varepsilon}$$

or

$$n > \frac{\ln(5/\varepsilon)}{\ln(1/|r|)} + 1. \tag{4}$$

Note that $\ln\frac{1}{|r|} > 0$ since $|r| < 1$. Choose $N$ to be any integer such that

$$N > \frac{\ln(5/\varepsilon)}{\ln(1/|r|)} + 1.$$

Then $n > N$ implies that (3) holds, since (4) and (3) are equivalent.
Therefore, by the definition of limit, we conclude that $a_n \to 0$.
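
The bound (4) gives a recipe for $N$ that is easy to try out. The sketch below assumes $0 < |r| < 1$ and takes $N$ to be the first integer strictly greater than the right-hand side of (4); the function name `N_for` is hypothetical, and the check runs over a finite stretch of indices only.

```python
import math

def N_for(eps, r):
    # right-hand side of (4); requires 0 < |r| < 1
    bound = math.log(5 / eps) / math.log(1 / abs(r)) + 1
    # the first integer strictly greater than the bound
    return math.floor(bound) + 1

def a(n, r):
    # a_n = 5 r^(n-1)
    return 5 * r ** (n - 1)

for eps, r in ((0.5, 0.5), (1e-3, -0.9), (1e-6, 0.99)):
    N = N_for(eps, r)
    ok = all(abs(a(n, r)) < eps for n in range(N, N + 200))
    print(eps, r, N, ok)
```

The closer $|r|$ is to 1, the smaller $\ln(1/|r|)$ becomes and the larger $N$ must be, which agrees with the intuition that such a sequence decays slowly.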

If a sequence $\{a_n\}$ does not converge, we say that it diverges. There
are different ways in which a sequence can diverge. The sequence
$a_n = (-1)^n$ diverges because it keeps hopping back and forth between
1 and $-1$. On the other hand, the sequence $a_n = n^2$ diverges because
the terms get large. It is convenient to have some special terminology for
this second case.

Definition. We say that a sequence $\{a_n\}$ diverges to $\infty$ if, given any
number $M$, there is an $N$ so that $n > N$ implies that $a_n > M$. If $\{a_n\}$
diverges to $\infty$, we write $a_n \to \infty$ or $\lim_{n \to \infty} a_n = \infty$.
Similarly, we say that a sequence $\{a_n\}$ diverges to $-\infty$ if, given any
number $M$, there is an $N$ so that $n > N$ implies that $a_n < M$. If $\{a_n\}$
diverges to $-\infty$, we write $a_n \to -\infty$ or
$\lim_{n \to \infty} a_n = -\infty$.

In terms of the number line, saying that $a_n \to \infty$ means that, given any
number $M$, there is an $N$ so that $n > N$ implies that $a_n$ is to the right of
$M$.

Example 4 Let's consider the sequence defined recursively in Example
3 but now with the assumption $|r| \ge 1$. We consider three cases. If $r = 1$,
then $a_n = 5$ for all $n$, so the sequence converges to 5. If $r > 1$, then, since
$a_n = 5r^{n-1}$, it is intuitively clear that $a_n$ should diverge to $+\infty$. Let's
check that the criterion in the definition is satisfied. Let $M$ be given; we
want to show that

$$5r^{n-1} > M \tag{5}$$

for $n$ large enough. If $M \le 0$, the inequality holds for all $n$ so there is
nothing more to prove. If $M > 0$, the inequality (5) is equivalent to

$$(n - 1)\ln r + \ln 5 > \ln M$$

since the natural logarithm is an increasing function. Rearranging this
inequality, we find

$$n > \frac{\ln M - \ln 5}{\ln r} + 1. \tag{6}$$

Thus, if we choose $N$ to be any integer greater than the right-hand side
of (6), we see that $n > N$ implies (5). Therefore, $5r^{n-1} \to \infty$ as $n \to \infty$ if
$r > 1$.
If $r \le -1$, then the values of $5r^{n-1}$ jump back and forth between
positive numbers $> 1$ and negative numbers $< -1$, so the sequence does
not converge, nor does it diverge to $\infty$ or to $-\infty$.

Example 5 Consider the sequence an = 2 — Each term is larger


than the previous one, but the sequence does not diverge to oo. In fact,
it converges to 2. Thus, just because a sequence is increasing does not
mean that it diverges to oo.

Problems

1. Compute enough terms of the following sequences to guess what their


limits are:

(a) an = n sin C
(b) an = ( 1 + J)n.
(c) $a_{n+1} = \tfrac{1}{2} a_n + 2$, $a_1 = .5$.
(d) $a_{n+1} = 2.5\,a_n(1 - a_n)$, $a_1 = .3$.

2. Prove directly that each of the following sequences converges by letting


e > 0 be given and finding N{e) so that (1) holds.

(a) an = l + 4t=.

(b) an = 1 +

(c) an — 3 + °~n.

(d) an = I

3. Prove directly that each of the following sequences converges by letting


e > 0 be given and finding N(e) so that (1) holds:

(a) an = 5 - for n > 2.



(b) $a_n = \frac{3n+1}{n+2}$. Hint: to determine the limit, divide numerator and
denominator by $n$.

(c) an = £^2 for n>2-


(d) an = Hint: look at the ratio of successive terms.

4. Prove directly that each of the following sequences diverges to $\infty$ or
diverges to $-\infty$:

(a) an = 2n.
(b) an = -n2.
(c) an — Vhrn.
(d) an = g.

5. Prove that a sequence {an} can have at most one limit.


6. Suppose that $a_n \to a$ and let $b$ be any number strictly less than $a$. Prove
that $a_n > b$ for all but finitely many $n$.
7. Suppose that $a_n \to a$ and that $a_n \ge b$ for each $n$. Prove that $a \ge b$.
8. Suppose that $a_n \to 0$ and $b_n \to \infty$. Show that we cannot draw any
definite conclusions about the sequence $c_n = a_n b_n$ by giving examples
which satisfy these hypotheses and

(a) $c_n \to 0$.
(b) $c_n \to \infty$.
(c) $c_n \to 1$.
(d) $\{c_n\}$ does not converge, nor does it diverge to $+\infty$ or $-\infty$.

9. Before we gave the formal definition of convergence, we said intuitively


that {an} converges to a if the an get “closer and closer” to a. Did we
mean that each successive term was closer to a than the previous one?
No. We used “closer and closer” in an imprecise way. This illustrates
why one needs technical definitions which say exactly what one means.

(a) Find a sequence $\{a_n\}$ and a real number $a$ so that

$$|a_{n+1} - a| < |a_n - a|$$

for each $n$, but $\{a_n\}$ does not converge to $a$. Thus a sequence can
get “closer and closer” to $a$ without converging to $a$.
(b) Find a sequence $\{a_n\}$ and a real number $a$ so that $a_n \to a$ but so
that the above inequality is violated for infinitely many $n$. Thus a
sequence can converge without getting “closer and closer.”

2.2 Limit Theorems


As we have seen in the last section, proving that limits exist by using the
definition of “limit” can be hard work. Instead, one can sometimes use
known limits and general limit theorems of the kind we will prove in
this section. For example, let {an} and {bn} be sequences and suppose
that $a_n \to a$ and $b_n \to b$. Since $a_n$ gets close to $a$ and $b_n$ gets close to $b$, it
is intuitively clear that anbn should get close to ab. That is, if we define
a new sequence, {cn}, by setting cn = anbn, it should be true that {cn}
has a limit and the limit should be ab. As a warm-up to proving the limit
theorems, we prove two propositions. The first shows that convergent
sequences are automatically bounded.

Proposition 2.2.1 Suppose that $b_n \to b$. Then the sequence $\{b_n\}$ is
bounded; that is, there is a number, $M$, so that

$$|b_n| \le M, \quad \text{for all } n. \tag{7}$$

Proof. By the definition of convergence, we can choose $N$ so that $n \ge N$
implies $|b_n - b| \le 1$. Then, using the triangle inequality, we see that

$$|b_n| = |b_n - b + b| \le |b_n - b| + |b| \le 1 + |b|$$

for $n \ge N$. Define $M = \max\{|b_1|, |b_2|, \ldots, |b_{N-1}|, 1 + |b|\}$. Then, by the
way that $M$ is defined, we see that $|b_n| \le M$ for $1 \le n < N$. In addition,
$|b_n| \le 1 + |b| \le M$ for $n \ge N$. Thus, the inequality in (7) holds for all $n$.

The second proposition shows that if a sequence is sandwiched between
two sequences that converge to the same limit, then the sequence
converges to that limit too.

Proposition 2.2.2 Let $\{a_n\}$ and $\{b_n\}$ be sequences such that $a_n \to L$ and
$b_n \to L$. Let $\{c_n\}$ be a sequence such that $a_n \le c_n \le b_n$ for all $n$. Then
$c_n \to L$.

Proof. Let $\varepsilon > 0$ be given. Since $a_n \to L$, we can choose an $N_1$ such that
$n > N_1$ implies $|a_n - L| < \varepsilon$. Since $b_n \to L$, we can choose an $N_2$ such
that $n > N_2$ implies $|b_n - L| < \varepsilon$. If we choose $N = \max\{N_1, N_2\}$, then
$n > N$ implies that both $n > N_1$ and $n > N_2$. Suppose $n > N$. Then
either $c_n \le L$ or $c_n \ge L$. In the first case,

$$a_n \le c_n \le L$$

so

$$|L - c_n| \le |L - a_n| < \varepsilon.$$

In the second case,

$$L \le c_n \le b_n$$

so

$$|L - c_n| \le |b_n - L| < \varepsilon.$$

In either case, $n > N$ implies

$$|L - c_n| < \varepsilon,$$

so we have proven that $c_n \to L$. □

Example 1 Let $c_n = 1 + \frac{\cos n \,\sin^2 n}{n}$. The sequence $\{c_n\}$ looks complicated,
but notice that $-1 \le \cos n \sin^2 n \le 1$ for each $n$. Thus,

$$1 - \frac{1}{n} \le c_n \le 1 + \frac{1}{n}.$$

Since $\{1 - \frac{1}{n}\}$ and $\{1 + \frac{1}{n}\}$ both converge to 1, the sequence $\{c_n\}$ must
converge to 1, too, according to Proposition 2.2.2.
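The sandwich in this example is easy to observe numerically. The sketch below assumes the sequence is $c_n = 1 + (\cos n \sin^2 n)/n$, as the bounds in the example indicate; the helper names `c`, `lower`, and `upper` are illustrative only.

```python
import math


def c(n: int) -> float:
    """The n-th term c_n = 1 + cos(n) * sin(n)**2 / n."""
    return 1 + math.cos(n) * math.sin(n) ** 2 / n


def lower(n: int) -> float:
    """Lower comparison sequence 1 - 1/n."""
    return 1 - 1 / n


def upper(n: int) -> float:
    """Upper comparison sequence 1 + 1/n."""
    return 1 + 1 / n


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, lower(n), c(n), upper(n))
```

The printed rows show $c_n$ trapped between two sequences whose gap $2/n$ shrinks to zero, which is exactly the situation covered by Proposition 2.2.2.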

□ Theorem 2.2.3 Let $\{a_n\}$ and $\{b_n\}$ be sequences and suppose that $a_n \to a$
and $b_n \to b$. Then,

$$\lim_{n\to\infty} (a_n + b_n) = a + b.$$

Proof. Let $\varepsilon > 0$ be given. Since $a_n \to a$, we can choose $N_1$ so that
$n > N_1$ implies $|a_n - a| < \frac{\varepsilon}{2}$. (Why?) Since $b_n \to b$, we can choose $N_2$ so
that $n > N_2$ implies that $|b_n - b| < \frac{\varepsilon}{2}$. Define $N = \max\{N_1, N_2\}$. Then, if
$n > N$,

$$|(a_n + b_n) - (a + b)| = |(a_n - a) + (b_n - b)|$$
$$\le |a_n - a| + |b_n - b|$$
$$< \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

This proves, by the definition of limit, that $\lim_{n\to\infty} (a_n + b_n) = a + b$. In
the second line we used the triangle inequality. □

□ Theorem 2.2.4 Let $\{a_n\}$ be a sequence and suppose that $a_n \to a$. Then,
for any constant $k$,

$$\lim_{n\to\infty} (k a_n) = k \lim_{n\to\infty} a_n = k a.$$

□ Theorem 2.2.5 Let $\{a_n\}$ and $\{b_n\}$ be sequences and suppose that $a_n \to a$
and $b_n \to b$. Then,

$$\lim_{n\to\infty} (a_n b_n) = ab.$$

Proof. Let $\varepsilon > 0$ be given. To see how to choose $N$, we write

$$|a_n b_n - ab| = |a_n b_n - a b_n + a b_n - ab|$$
$$\le |(a_n - a) b_n| + |a (b_n - b)|$$
$$= |a_n - a|\,|b_n| + |a|\,|b_n - b|.$$

We want to show that the sum on the right-hand side is less than $\varepsilon$ if we
choose $n$ large enough. We'll work on the second term first. If $a = 0$,
then the second term is $< \frac{\varepsilon}{2}$ for all $n$. If $a \ne 0$, then, since $b_n \to b$, we can
choose an $N_1$ so that $n > N_1$ implies

$$|b_n - b| < \frac{\varepsilon}{2|a|}.$$

Thus for such $n$ the second term will be less than $\frac{\varepsilon}{2}$.
Since $b_n \to b$, Proposition 2.2.1 allows us to choose an $M$ so that (7)
holds. Also, $a_n \to a$, so we can choose $N_2$ so that $n > N_2$ implies

$$|a_n - a| < \frac{\varepsilon}{2M}.$$

Now choose $N = \max\{N_1, N_2\}$. Then,

$$|a_n b_n - ab| \le |a_n - a|\,|b_n| + |a|\,|b_n - b| < \frac{\varepsilon}{2M}\, M + |a|\, \frac{\varepsilon}{2|a|} = \varepsilon$$

if $n > N$. Therefore, $a_n b_n \to ab$. □



□ Theorem 2.2.6 Let $\{a_n\}$ and $\{b_n\}$ be sequences and suppose that $a_n \to a$
and $b_n \to b$. Suppose that $b \ne 0$ and $b_n \ne 0$ for any $n$. Then,

$$\lim_{n\to\infty} \frac{a_n}{b_n} = \frac{a}{b}.$$

The proofs of Theorem 2.2.4 and Theorem 2.2.6 are left to the student
(problems 3 and 5).

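Example 2 Let $a_n = \left(1 + \frac{1}{\sqrt{n}}\right)^2 + (2 - 2^{-n})$. Since we proved in Section
2.1 that $1 + \frac{1}{\sqrt{n}} \to 1$, Theorem 2.2.5 implies that

$$\lim_{n\to\infty} \left(1 + \frac{1}{\sqrt{n}}\right)^2 = 1.$$

Also, we know that

$$\lim_{n\to\infty} (2 - 2^{-n}) = 2,$$

so Theorem 2.2.3 guarantees that $\lim_{n\to\infty} a_n$ exists and

$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} \left(1 + \frac{1}{\sqrt{n}}\right)^2 + \lim_{n\to\infty} (2 - 2^{-n}) = 3.$$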

Example 3 In problem 3b of Section 2.1, you were asked to show directly
that the sequence $a_n = \frac{3n+1}{n+2}$ has a limit. We write

$$\frac{3n+1}{n+2} = \frac{3 + \frac{1}{n}}{1 + \frac{2}{n}}.$$

Since the numerator on the right-hand side has limit 3 and the denominator
has limit 1, Theorem 2.2.6 assures us that $\{a_n\}$ has a limit and that
the limit is 3.
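The algebraic step in Example 3 can be checked with exact rational arithmetic. This is only an illustrative sketch: the function names `a` and `a_rewritten` are hypothetical, and a finite check of course does not replace Theorem 2.2.6.

```python
from fractions import Fraction


def a(n: int) -> Fraction:
    """The n-th term (3n + 1) / (n + 2), computed exactly."""
    return Fraction(3 * n + 1, n + 2)


def a_rewritten(n: int) -> Fraction:
    """The same term after dividing numerator and denominator by n."""
    inv = Fraction(1, n)
    return (3 + inv) / (1 + 2 * inv)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # The two forms agree exactly; the gap to the limit 3 is 5 / (n + 2).
        print(n, a(n) == a_rewritten(n), float(3 - a(n)))
```

The numerator $3 + 1/n$ and the denominator $1 + 2/n$ each have an obvious limit, which is what makes the limit theorems applicable.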

Notice how much easier the proof in Example 3 was than the direct
proof in problem 3b of Section 2.1. There is an important point here
about the role of theory in mathematics. Suppose that we'd been given
six problems just like problem 3b except that the coefficients in the
numerator and denominator were different. For each problem the direct
proof of convergence would have been long, and after a while we would
have noticed that the pattern of proof was the same. We would be doing
lots of hard work (basically the same hard work) each time to show that
the limit of the ratio is the ratio of the limits. That's when one sees the
need for proving Theorem 2.2.6. The proof of Theorem 2.2.6 contains all
the hard work which we were doing over and over in the separate
problems. Once Theorem 2.2.6 is proved, it can be used to do each of the six
problems easily.

Problems

1. Prove that each of the following limits exists by using Theorem 2.2.3 -
Theorem 2.2.6:

(a) an = 5(1 + ^)2.


(b) $a_n = \frac{3n+1}{n+2}$.

(c) $a_n = \frac{n^2+6}{3n^2-2}$.

5+(A'2
i 3* )
(d) an = 2n + 5
2+ 3n — 2

2. Find the limits of the following sequences:

(a) an = e-n sin n.


(b) an = (sin n) sin ^.
(c) an = (cosn)(Inn)-1 for n > 2.

3. Prove Theorem 2.2.4.


4. Let $\{b_n\}$ be a bounded sequence and suppose that $a_n \to 0$. Prove that
$a_n b_n \to 0$.
5. Prove Theorem 2.2.6. Hint: first show that there are numbers, $M_1$ and
$M_2$, so that $0 < M_1 \le |b_n| \le M_2$ for all $n$.

6. Let $p(x)$ be any polynomial and suppose that $a_n \to a$. Prove that

$$\lim_{n\to\infty} p(a_n) = p(a).$$

7. Let $\{a_n\}$ and $\{b_n\}$ be sequences and suppose that $a_n \le b_n$ for all $n$ and
that $a_n \to \infty$. Prove that $b_n \to \infty$.
8. Let $a_n = \sqrt{n+1} - \sqrt{n}$. Prove that $a_n \to 0$.
9. (a) Let $\{a_n\}$ be the sequence in problem 1(c) of Section 2.1. Prove that
$a_n \to 4$. Hint: subtract 4 from both sides.
(b) Consider the sequence defined by the recursive relation $a_{n+1} =
\alpha a_n + 2$. Show that if $|\alpha| < 1$ the sequence has a limit independent
of $a_1$.

10. The Euclidean plane $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ is the set of ordered pairs $(x, y)$, where
$x \in \mathbb{R}$ and $y \in \mathbb{R}$. Note that we can add pairs, $(x_1, y_1) + (x_2, y_2) = (x_1 +
x_2, y_1 + y_2)$, and multiply pairs by real numbers, $a(x, y) = (ax, ay)$. These
operations correspond to vector addition and scalar multiplication. We
define a notion of “size” for points in $\mathbb{R}^2$ by

$$\|(x, y)\| = \sqrt{x^2 + y^2}.$$

That is, the size of $(x, y)$ is just the Euclidean distance from the point to
the origin.

(a) Let $x_1, x_2, y_1, y_2$ be real numbers. Using the inequality in problem 8
of Section 1.1, prove that

$$|x_1 x_2 + y_1 y_2| \le \sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}.$$

(b) Prove that for any two points $(x_1, y_1)$ and $(x_2, y_2)$,

$$\|(x_1, y_1) + (x_2, y_2)\| \le \|(x_1, y_1)\| + \|(x_2, y_2)\|.$$

(c) Let $p_n = (x_n, y_n)$ be a sequence of points in the plane and let $p =
(x, y)$. We say that $p_n \to p$ if $\|p_n - p\| \to 0$. Prove that $p_n \to p$ if and
only if $x_n \to x$ and $y_n \to y$.

2.3 Two-state Markov Chains

Consider the following problem. We check a phone every minute to see
whether it is busy. Let $p_n$ denote the probability that it is free on the $n$th
check and $q_n$ denote the probability that it is busy on the $n$th check. We
now make some simple assumptions that allow us to compute $p_{n+1}$ and
$q_{n+1}$ from $p_n$ and $q_n$. Suppose that the phone is free when we check.
Then we assume that with probability $q$ it will be busy on the next check.
Since it must either be free or busy, it will remain free with probability
$1 - q$. Similarly, we assume that if it is busy when we check, then the
probability that it will be free when we next check will be $p$, and therefore
the probability that it will remain busy is $1 - p$. Then $p_{n+1}$ and $q_{n+1}$ can
be given in terms of $p_n$ and $q_n$ by the formulas

$$p_{n+1} = (1 - q)p_n + p q_n \tag{8}$$

$$q_{n+1} = q p_n + (1 - p) q_n. \tag{9}$$

The first formula says simply that the probability that the phone will be
free on the $(n+1)$st check is the probability that it was free on the $n$th check
times the probability that if it's free it remains free plus the probability
that it was busy on the $n$th check times the probability that if it's busy it
will switch to free. Similar reasoning gives the second formula. If we are
given the probability that the phone starts free, $p_0$, and the probability
that it starts busy, $q_0$, then we can use these recursive formulas to compute
$p_n$ and $q_n$ for each $n$. Although it would be complicated to write out
the formulas explicitly, we can see what is happening if we reformulate
the recursion using vectors and matrices. Think of the pair $(p_n, q_n)$ as a
two-dimensional column vector. Then (8) and (9) can be rewritten as

$$\begin{pmatrix} p_{n+1} \\ q_{n+1} \end{pmatrix} = \begin{pmatrix} 1 - q & p \\ q & 1 - p \end{pmatrix} \begin{pmatrix} p_n \\ q_n \end{pmatrix}. \tag{10}$$

So, if we let $A$ denote the matrix

$$A = \begin{pmatrix} 1 - q & p \\ q & 1 - p \end{pmatrix} \tag{11}$$

and $v_n = (p_n, q_n)$, we can write

$$v_{n+1} = A v_n$$

so

$$v_n = A^n v_0.$$

This formula shows that we can obtain the probabilities at the $n$th check
by applying the $n$th power of the matrix $A$ to the vector of initial
probabilities $v_0 = (p_0, q_0)$. Note that we must have $p_0 + q_0 = 1$ since the phone
is either free or busy initially. By adding equations (8) and (9) we see that
$p_{n+1} + q_{n+1} = p_n + q_n$. Thus $p_n + q_n = 1$ for all $n$.
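The recursion (8) and (9) is simple to iterate on a computer. The sketch below is one way to do it; the function names `step` and `iterate` are illustrative, and the sample values $p = .7$, $q = .5$ are the ones used in the problems at the end of this section.

```python
def step(p_n: float, q_n: float, p: float, q: float) -> tuple[float, float]:
    """Apply formulas (8) and (9) once: (p_n, q_n) -> (p_{n+1}, q_{n+1})."""
    return (1 - q) * p_n + p * q_n, q * p_n + (1 - p) * q_n


def iterate(p0: float, q0: float, p: float, q: float, n: int) -> tuple[float, float]:
    """Return (p_n, q_n) by applying the recursion n times, i.e. A^n v_0."""
    state = (p0, q0)
    for _ in range(n):
        state = step(state[0], state[1], p, q)
    return state


if __name__ == "__main__":
    # Two different starting vectors, same transition probabilities.
    for start in ((0.5, 0.5), (0.2, 0.8)):
        print(start, iterate(start[0], start[1], 0.7, 0.5, 50))
```

Running it with different starting vectors suggests that the long-run values do not depend on $(p_0, q_0)$, and that $p_n + q_n$ stays equal to 1 at every step.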
We have created two sequences of real numbers, $\{p_n\}$ and $\{q_n\}$. We
would like to know how they behave for large $n$. If we wait a long time,
what are the probabilities that the phone will be free or busy? How do
these probabilities depend on the initial probabilities $p_0$, $q_0$? First we
assume that $\{p_n\}$ and $\{q_n\}$ converge to limits $\bar{p}$ and $\bar{q}$, respectively, and ask
what those limits might be. Using equation (8) and Theorems 2.2.3 and
2.2.4 from Section 2.2, we see that

$$\bar{p} = \lim_n p_n$$
$$= \lim_n p_{n+1}$$
$$= \lim_n \big((1 - q)p_n + p q_n\big)$$
$$= (1 - q) \lim_n p_n + p \lim_n q_n$$
$$= (1 - q)\bar{p} + p\bar{q}.$$

In similar fashion,

$$\bar{p} + \bar{q} = \lim_n (p_n + q_n) = 1.$$

This gives us two linear equations that must be satisfied by any limits $\bar{p}$
and $\bar{q}$. The equations are independent and thus have a unique solution,
which is easily calculated to be:

$$\bar{p} = \frac{p}{p+q}, \qquad \bar{q} = \frac{q}{p+q}.$$
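The formulas for the candidate limits can be verified exactly for particular values of $p$ and $q$. The following sketch uses rational arithmetic so that no rounding is involved; the names `invariant` and `apply_A` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction


def invariant(p: Fraction, q: Fraction) -> tuple[Fraction, Fraction]:
    """Candidate limits (p / (p + q), q / (p + q))."""
    total = p + q
    return p / total, q / total


def apply_A(vec: tuple[Fraction, Fraction], p: Fraction, q: Fraction) -> tuple[Fraction, Fraction]:
    """Multiply the column vector vec by the matrix A of (11)."""
    x, y = vec
    return (1 - q) * x + p * y, q * x + (1 - p) * y


if __name__ == "__main__":
    p, q = Fraction(7, 10), Fraction(1, 2)
    v = invariant(p, q)
    # v is unchanged by A and its entries sum to 1.
    print(v, apply_A(v, p, q) == v, sum(v) == 1)
```

The check confirms, for the sample values, that the candidate limits satisfy both linear equations derived above.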
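We shall show that $p_n \to \bar{p}$ and $q_n \to \bar{q}$, where $\bar{p} = p/(p+q)$ and $\bar{q} = q/(p+q)$.
Using the equations (8), (9), and the equations above satisfied by $\bar{p}$ and $\bar{q}$, we calculate

$$(p_{n+1} - \bar{p})^2 + (q_{n+1} - \bar{q})^2$$
$$= \big((1 - q)p_n + p q_n - \bar{p}\big)^2 + \big(q p_n + (1 - p)q_n - \bar{q}\big)^2$$
$$= \big((1 - q)(p_n - \bar{p}) + p(q_n - \bar{q})\big)^2 + \big(q(p_n - \bar{p}) + (1 - p)(q_n - \bar{q})\big)^2.$$

To make the algebra easier, define $a = p_n - \bar{p}$. Since $p_n + q_n = 1$ and
$\bar{p} + \bar{q} = 1$, notice that $-a = q_n - \bar{q}$. Thus,

$$(p_{n+1} - \bar{p})^2 + (q_{n+1} - \bar{q})^2 = \big((1 - q)a - pa\big)^2 + \big(qa - (1 - p)a\big)^2$$
$$= a^2\{(1 - q)^2 - 2p(1 - q) + p^2\} + a^2\{q^2 - 2q(1 - p) + (1 - p)^2\}$$
$$= 2a^2(1 - (p + q))^2$$
$$= (1 - (p + q))^2\{(p_n - \bar{p})^2 + (q_n - \bar{q})^2\}.$$

If we iterate this equality, starting with the case $n = 1$, we obtain

$$(p_{n+1} - \bar{p})^2 + (q_{n+1} - \bar{q})^2 = (1 - (p + q))^{2n}\{(p_1 - \bar{p})^2 + (q_1 - \bar{q})^2\}. \tag{12}$$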
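The one-step identity behind (12) says that the squared distance to the limiting pair shrinks by the factor $(1 - (p+q))^2$ at every check. The sketch below tests this with exact fractions for a starting vector whose entries sum to 1; the names `step`, `sq_dist`, and `contraction_holds` are illustrative.

```python
from fractions import Fraction


def step(state, p, q):
    """One application of (8) and (9)."""
    x, y = state
    return (1 - q) * x + p * y, q * x + (1 - p) * y


def sq_dist(state, target):
    """Squared Euclidean distance between two pairs."""
    return (state[0] - target[0]) ** 2 + (state[1] - target[1]) ** 2


def contraction_holds(p, q, start, steps):
    """Check that each step multiplies the squared distance by (1 - (p + q))**2."""
    target = (p / (p + q), q / (p + q))
    factor = (1 - (p + q)) ** 2
    state = start
    for _ in range(steps):
        nxt = step(state, p, q)
        if sq_dist(nxt, target) != factor * sq_dist(state, target):
            return False
        state = nxt
    return True


if __name__ == "__main__":
    start = (Fraction(1, 4), Fraction(3, 4))
    print(contraction_holds(Fraction(1, 5), Fraction(1, 2), start, 20))
```

Because the factor is the same at every step, the distance after $n$ steps is the initial distance times the $n$th power of that factor, which is the content of (12).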

From now on we assume that $0 < p < 1$ and $0 < q < 1$. It follows that
$0 < p + q < 2$, which implies that $|1 - (p + q)| < 1$. Therefore,

$$\lim_{n\to\infty} (1 - (p + q))^{2n} = 0$$

and so it follows from Theorem 2.2.4 that the right-hand side of (12)
converges to zero. Thus, the left-hand side of (12) converges to zero too.
Since

$$(p_{n+1} - \bar{p})^2 \le (p_{n+1} - \bar{p})^2 + (q_{n+1} - \bar{q})^2,$$

it follows from Proposition 2.2.2 that $\lim_{n\to\infty} (p_{n+1} - \bar{p})^2 = 0$. On the
other hand, if a sequence of positive numbers converges to zero, then
the sequence of square roots converges to zero too (problem 1). Thus,
$\lim_{n\to\infty} (p_n - \bar{p}) = 0$, so we have shown that $p_n \to \bar{p}$. A similar sequence
of steps shows that $q_n \to \bar{q}$.
What this convergence means is that if we wait a long time, then the
probability that the phone will be free will be approximately $\bar{p}$ and the
probability that it will be busy will be approximately $\bar{q}$. Notice that this
is true no matter what the initial probabilities $p_0$ and $q_0$ are.
Probabilistic systems such as this are called stochastic processes. The
one that we have described is called a Markov chain because time is
discrete and at each time the probabilities are determined from the
probabilities at the previous time. What we have described is a two-state
Markov chain because at each time the system (that is, the phone) has
only two possible states. The probabilities $\bar{p}$ and $\bar{q}$ are called invariant
probabilities because $(\bar{p}, \bar{q}) = A(\bar{p}, \bar{q})$. If we start with the probabilities $\bar{p}$
and $\bar{q}$, then every time we check the probabilities will be $\bar{p}$ and $\bar{q}$. In the
language of probability theory, we have proven that a two-state Markov
chain has unique invariant probabilities to which the system converges
as $n \to \infty$. A somewhat easier proof of this result can be given by using
the tools of linear algebra. This is outlined in Project 2.

Problems

1. Suppose that $a_n \ge 0$ and that $a_n \to 0$. Prove that $\sqrt{a_n} \to 0$.
2. Let $p = .7$ and $q = .5$ and suppose that $p_0 = .5$ and $q_0 = .5$. Compute as
many iterates of equations (8) and (9) as you need to make good guesses
for $\lim_{n\to\infty} p_n$ and $\lim_{n\to\infty} q_n$.
3. Let $p$ and $q$ have the same values as in problem 2. Let $p_0 = .2$ and $q_0 = .8$
and compute as many iterates as you need to verify that $p_n$ and $q_n$
converge to the same limits as in problem 2, even though the initial
probabilities are different.
4. Let $p = .8$ and $q = .9$ and suppose that $p_0 = .5$ and $q_0 = .5$. What are
$\lim_{n\to\infty} p_n$ and $\lim_{n\to\infty} q_n$ in this case? Is the convergence to these limits
more or less rapid than the convergence in problem 2? Why?

5. Suppose that $p = .2$ and $q = .5$. How large must we choose $n$ to be sure
that $p_n$ and $q_n$ are within $10^{-3}$ of the limiting values $\bar{p}$ and $\bar{q}$ no matter
what $p_0$ and $q_0$ are?

6. In a certain city, taxis are sometimes on the north side and sometimes on
the south side, but they must be in one place or the other. Let $p_n$ and $q_n$
denote the probabilities that a particular taxi is on the north side and the
south side, respectively, on the $n$th time that we check. If a taxi is on the
north side when we check, it has a probability of .5 of having switched
to the south side when we next check. If it is on the south side, it has a
probability of .3 of having switched north when we check again. If there
are 100 taxis, about how many will be on the north and south sides if we
wait a long time?

7. Suppose that taxis can be on the north side or the south side, and let $p_n$
and $q_n$ be the probabilities of finding them there on the $n$th check. Suppose
that between any two times that we check, each taxi has a finite
probability $r$ of going home to the taxi barn (neither north nor south) from
which it doesn't emerge. Explain why $p_n$ and $q_n$ satisfy a recursion relation
of the form

$$p_{n+1} = b_{11} p_n + b_{12} q_n$$

$$q_{n+1} = b_{21} p_n + b_{22} q_n,$$

where the coefficients $b_{ij}$ all satisfy $0 < b_{ij} < 1$ and in addition $b_{11} + b_{21} <
1$ and $b_{12} + b_{22} < 1$. By summing the two equations, prove that $p_n \to 0$
and $q_n \to 0$ as $n \to \infty$. What does this mean?

8. Suppose that the taxi in problem 7 can reemerge from the taxi barn and go
to the north side or the south side with certain probabilities. Formulate
(but do not solve) a three-state Markov chain for the probabilities $p_n$, $q_n$,
and $r_n$ that the taxi will be on the north side, the south side, or in the taxi
barn, respectively.

2.4 Cauchy Sequences

In the first three sections of this chapter we proved that many different
sequences converge to limits. In all cases, the idea of the proof was the
same. First we guessed the limit, $a$. Then we subtracted the purported
limit from the terms of the sequence, $a_n$, making a new sequence $\{a_n - a\}$.
Then we proved that the sequence of differences converges to zero; that
is, $(a_n - a) \to 0$. Since this is equivalent to $a_n \to a$, we concluded that the
sequence $\{a_n\}$ converges and the limit is, indeed, $a$. But what if we can't
guess the limit $a$? Is there a method by which we can tell if a sequence
converges just by looking at the terms of the sequence itself? Just such a
criterion was introduced by the mathematician Augustin-Louis Cauchy
(1789-1857).

Definition. A sequence $\{a_n\}$ is called a Cauchy sequence if, given any
$\varepsilon > 0$, there exists $N$ so that

$$|a_n - a_m| < \varepsilon \quad \text{if } n > N \text{ and } m > N.$$

That is, a sequence is a Cauchy sequence if, given any $\varepsilon > 0$, all the
terms of the sequence are within $\varepsilon$ of each other from some index $N$ on.
We emphasize that for each $\varepsilon$ there must exist an $N$. In general, if $\varepsilon$ is
smaller, one will have to choose $N$ larger. Notice that there is no mention
of a limit $a$.

Example 1 We will prove that the sequence $a_n = \frac{3n+1}{n+2}$ is a Cauchy
sequence. Let $\varepsilon > 0$ be given. To see how we should choose $N$ we first
estimate:

$$\left|\frac{3n+1}{n+2} - \frac{3m+1}{m+2}\right| = \left|\frac{5(n - m)}{(n+2)(m+2)}\right|$$
$$\le \frac{5n}{(n+2)(m+2)} + \frac{5m}{(n+2)(m+2)}$$
$$\le \frac{5}{m+2} + \frac{5}{n+2}.$$

In the last step we used the fact that $\frac{k}{k+2} \le 1$ for each positive integer $k$.
Choose $N$ so that $N > 10/\varepsilon$, or equivalently $5/N < \varepsilon/2$; the reason for
this choice will be clear shortly. If $n > N$ and $m > N$, then

$$\frac{5}{m+2} + \frac{5}{n+2} \le \frac{5}{m} + \frac{5}{n} < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

Since we have been able to choose an appropriate $N$ for each $\varepsilon$, we have
proven that $\{a_n\}$ is a Cauchy sequence.
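The Cauchy criterion for this example can be spot-checked on a finite block of indices using exact fractions. The sketch assumes $a_n = (3n+1)/(n+2)$ and the choice $N > 10/\varepsilon$ from the estimate above; `cauchy_threshold` and `a` are hypothetical names.

```python
import math
from fractions import Fraction


def a(n: int) -> Fraction:
    """The n-th term (3n + 1) / (n + 2), computed exactly."""
    return Fraction(3 * n + 1, n + 2)


def cauchy_threshold(eps: Fraction) -> int:
    """Smallest integer strictly greater than 10 / eps."""
    return math.floor(10 / eps) + 1


if __name__ == "__main__":
    eps = Fraction(1, 10)
    N = cauchy_threshold(eps)
    worst = max(abs(a(n) - a(m)) for n in range(N, N + 40) for m in range(N, N + 40))
    # The largest gap found among these terms, compared with eps.
    print(N, float(worst), worst < eps)
```

Notice that the check never refers to the limit 3; it only compares terms of the sequence with each other, which is the point of the definition.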

The sequence in the example converges (problem 1(b) of Section 2.2).


In fact, every convergent sequence is automatically a Cauchy sequence,
as the following proposition shows.

Proposition 2.4.1 If $\{a_n\}$ converges to a finite limit, then $\{a_n\}$ is a
Cauchy sequence.

Proof. Let $a = \lim a_n$ and let $\varepsilon > 0$ be given. Since $a_n \to a$, we can
choose $N$ so that $n > N$ implies that $|a_n - a| < \varepsilon/2$. Therefore if $n > N$
and $m > N$, we find

$$|a_n - a_m| = |(a_n - a) + (a - a_m)|$$
$$\le |a_n - a| + |a - a_m|$$
$$< \varepsilon/2 + \varepsilon/2 = \varepsilon.$$

Thus, $\{a_n\}$ is a Cauchy sequence. □

The more difficult question is whether the converse to Proposition
2.4.1 is true. That is, if a sequence is a Cauchy sequence, does it necessarily
converge to a limit? Before taking up this question, recall that the
rational numbers, $\mathbb{Q}$, are the real numbers that can be expressed as quotients,
$m/n$, where $m$ and $n$ are integers and $n \ne 0$. The irrational numbers
are the real numbers that are not rational. The reader is certainly
familiar with decimal expansions of real numbers; we discuss decimal
expansions rigorously in Project 4 of Chapter 6. In terms of decimals, the
rational numbers are just the real numbers whose decimal expansions
are repeating after some point. The following irrational numbers,

$\pi = 3.14159\ldots$
$e = 2.71828\ldots$
$\sqrt{2} = 1.41421\ldots$

occur frequently. The number $\pi$ is the ratio of the circumference to the
diameter of any circle; $e$ can be defined as the unique number so that the
derivative of $e^x$ is again $e^x$, that is, so that the slope of the tangent line to
the graph of $e^x$ at the point $(x, e^x)$ is $e^x$; $\sqrt{2}$ can be defined as the length
of the hypotenuse of a right triangle whose other two sides both have
length 1. Now one must prove that the numbers so defined are irrational
and devise ways of computing as much of their decimal expansions as
one needs. We gave Euclid's proof that $\sqrt{2}$ is irrational in Proposition
1.4.2 and outlined a proof that $e$ is irrational in problem 12 of Section 1.4.
We proved in Section 1.3 that the set of rational numbers is countable
and the set of irrational numbers is uncountable.
The rational numbers have the very important property that they are
dense in the real numbers. This means that, given any real number $a$,
there is a sequence of rational numbers, $\{r_n\}$, so that $r_n \to a$. If $a$ is
rational we just pick every member of the sequence to be equal to $a$. If $a$
is irrational, we choose $r_n$ to be the rational numbers given by taking the
first $n$ terms of the decimal expansion of $a$ followed by zeros. So, in the
case of $\pi$ we would have

$r_1 = 3.0$
$r_2 = 3.1$
$r_3 = 3.14$
$r_4 = 3.141$
$r_5 = 3.1415$,

and so forth. Notice that if a number $r$ has a decimal expansion that
agrees with the decimal expansion of $a$ through the $n$th decimal place,
then the number differs from $a$ by less than $10^{-n}$. Thus it is clear that
the sequence of rationals $\{r_n\}$ converges to $a$.
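The truncation construction can be imitated in a few lines. This sketch uses the floating-point constant `math.pi` as a stand-in for $\pi$, which is an assumption that limits it to roughly the first fifteen digits; the function name `truncate` is hypothetical. Keeping $k$ decimal places corresponds to the term $r_{k+1}$ in the list above.

```python
import math
from fractions import Fraction


def truncate(x: float, k: int) -> Fraction:
    """Rational number obtained by keeping k decimal places of x (for x >= 0)."""
    scale = 10 ** k
    return Fraction(math.floor(x * scale), scale)


if __name__ == "__main__":
    for k in range(6):
        r = truncate(math.pi, k)
        # Each truncation is a rational number within 10**(-k) of pi.
        print(k, r, math.pi - r)
```

Each value is an honest rational number (a ratio of two integers), yet the differences from $\pi$ shrink by a factor of about ten at each step.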
We now return to the question of whether Cauchy sequences of real
numbers necessarily have limits. A set of numbers which has this property
is called complete. Let us suppose that we didn't know about irrational
numbers, so that our number system consisted of only the rational
numbers, $\mathbb{Q}$. Suppose that we had a Cauchy sequence $\{r_n\}$ of rational
numbers. Could we be sure that it converged to a limit? Clearly not, as
the example of the sequence $\{r_n\}$ above shows. Since the sequence $\{r_n\}$
cannot converge to any other real number (problem 5 of Section 2.1), and
since $\pi$ is not in $\mathbb{Q}$, the sequence $\{r_n\}$ does not converge to any limit in
$\mathbb{Q}$, even though it is a Cauchy sequence. Therefore, $\mathbb{Q}$ is not complete
because not all Cauchy sequences in $\mathbb{Q}$ have limits in $\mathbb{Q}$. If we want to
enlarge $\mathbb{Q}$ to make a complete set of numbers, we will have to include at
least all the irrational numbers, for each of these is a limit of a Cauchy
sequence of rationals. Indeed, this idea of characterizing all real numbers
as the set of “limits” of Cauchy sequences of rationals is one of the
methods of constructing the real numbers from the rationals. Even if we
include the irrational numbers, the question of completeness isn't settled.
Would a Cauchy sequence of irrationals necessarily have a limit
which is an irrational or rational number? The answer, though not obvious,
is “yes”. This is proven as part of the construction of the set of real
numbers from the rationals. Since we are not giving this construction
here, we will merely assume the result, that is, that the real numbers are
complete.

Axiom of Completeness. Every Cauchy sequence of real numbers converges
to a finite real number.

Combining the Axiom of Completeness and Proposition 2.4.1, we
have:

□ Theorem 2.4.2 A sequence of real numbers is a Cauchy sequence if and
only if it converges to a finite real number.

A sequence $\{a_n\}$ is said to be monotone increasing if $a_n \le a_{n+1}$ for
all $n$, and it is said to be monotone decreasing if $a_n \ge a_{n+1}$ for all $n$. The proof
of the following theorem depends in the last step on the completeness of
$\mathbb{R}$. The theorem itself is extremely useful since it is often easy to prove
that a sequence is bounded. If it is also monotone, then we know that
the sequence must converge.

□ Theorem 2.4.3 Every bounded monotone sequence converges.

Proof. Let $\{a_n\}$ be a bounded monotone increasing sequence; the proof
for the decreasing case is similar. We will construct a sequence of intervals,
$[b_n, c_n]$, of decreasing length so that each interval contains the tail of
the sequence $\{a_n\}$. Since $\{a_n\}$ is bounded there is an $M$ so that $a_n \le M$
for all $n$. Let $b_1 = a_1$ and $c_1 = M$. Since $\{a_n\}$ is monotone increasing,
every term of the sequence is in the interval $[b_1, c_1]$.

[Figure 2.4.1: the two cases for the choice of $[b_2, c_2]$. Case 1: the right half $[m_1, c_1]$, which contains $a_{N_2}$. Case 2: the left half $[b_1, m_1]$.]

Note that $c_1 - b_1 = M - a_1$. Now, let $m_1$ be the midpoint of the interval
$[b_1, c_1]$. If there is one term of the sequence in the interval $[m_1, c_1]$, then all
the rest of the terms of the sequence must be in $[m_1, c_1]$ since the sequence
is monotone increasing. In this case, let $N_2$ be an integer so that $a_{N_2}$ is in
$[m_1, c_1]$, and define our second interval by $b_2 = m_1$ and $c_2 = c_1$, i.e. the
right half of $[b_1, c_1]$. See case 1 in Figure 2.4.1. If no points of the sequence
are in the interval $[m_1, c_1]$, we define our second interval to be the left half
of $[b_1, c_1]$; that is, $b_2 = b_1$ and $c_2 = m_1$; see case 2 in Figure 2.4.1. In either
case, we know that

$$a_n \in [b_2, c_2], \quad \text{for } n \ge N_2$$

and

$$c_2 - b_2 = \frac{1}{2}(M - a_1).$$

We now continue in this manner, successively dividing our intervals in
half and choosing one of the two subintervals. Suppose that the interval
$[b_j, c_j]$ and $N_j$ have been defined so that

$$a_n \in [b_j, c_j], \quad \text{for } n \ge N_j$$

and

$$c_j - b_j = \left(\frac{1}{2}\right)^{j-1}(M - a_1).$$

Let $m_j$ be the midpoint of $[b_j, c_j]$. If there is an integer $N_{j+1}$ so that
$a_{N_{j+1}} \in [m_j, c_j]$, then we choose the right subinterval, that is, $b_{j+1} = m_j$
and $c_{j+1} = c_j$. If none of the $a_n$ are in $[m_j, c_j]$, then we choose the left
subinterval, i.e. $b_{j+1} = b_j$ and $c_{j+1} = m_j$. In either case,

$$a_n \in [b_{j+1}, c_{j+1}], \quad \text{for } n \ge N_{j+1}$$

and

$$c_{j+1} - b_{j+1} = \left(\frac{1}{2}\right)^{j}(M - a_1).$$

Now, let $\varepsilon > 0$ be given. Choose $j$ so that $2^{-j+1}(M - a_1) < \varepsilon$. Then, if
$n \ge N_j$ and $m \ge N_j$, both $a_n$ and $a_m$ are in the interval $[b_j, c_j]$, so

$$|a_n - a_m| \le 2^{-j+1}(M - a_1) < \varepsilon.$$

Thus, $\{a_n\}$ is a Cauchy sequence. By Theorem 2.4.2, it therefore converges
to a finite limit. □
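The interval-halving construction in the proof can be imitated on a computer, with one important caveat: a program can only inspect finitely many terms, so the sketch below works with a finite increasing list standing in for the sequence. The function name `halving_intervals` and the sample sequence $n/(n+1)$ are assumptions made for illustration.

```python
def halving_intervals(terms, M, steps):
    """Follow the halving construction on a finite increasing list of terms.

    terms: finite, monotone increasing list standing in for the sequence
    M:     an upper bound for every term
    Returns the intervals (b_j, c_j) for j = 1, ..., steps + 1.
    """
    b, c = terms[0], M
    intervals = [(b, c)]
    for _ in range(steps):
        m = (b + c) / 2
        if any(m <= t <= c for t in terms):
            b = m  # some term lies in the right half, so keep the right half
        else:
            c = m  # no term lies in the right half, so keep the left half
        intervals.append((b, c))
    return intervals


if __name__ == "__main__":
    terms = [n / (n + 1) for n in range(1, 1001)]
    for b, c in halving_intervals(terms, 1.0, 8):
        print(b, c, c - b)
```

Each interval has half the length of the previous one and still contains the later terms of the list, mirroring the two properties used in the proof.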

Example 1 It is easy to check that the sequence an = ^ is monotone


increasing, and it is clearly bounded by 1. Therefore, by Theorem 2.4.3,
the sequence converges. Of course, this can also be proved directly.

Example 2 Consider the sequence $a_n = \sqrt[n]{n}$. Let $f$ be the function

$$f(x) = e^{(\ln x)/x}$$

defined for $x > 0$. Notice that for all positive integers $n$,

$$f(n) = e^{(\ln n)/n} = \sqrt[n]{n},$$

so the sequence $\{a_n\}$ consists of the values of $f$ at the positive integers.
Differentiating $f$, we find

$$f'(x) = \frac{1 - \ln x}{x^2}\, e^{(\ln x)/x},$$

which is negative if $x > e$ since $\ln x$ is an increasing function and $\ln e =
1$. Thus the numbers $a_n = f(n)$ are monotone decreasing for $n \ge 3$.
Since they are bounded below by 1, Theorem 2.4.3 guarantees that the
sequence $\sqrt[n]{n}$ converges. Problem 11 outlines the proof that the limit is 1.
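The monotonicity claimed in this example is easy to observe numerically. The sketch below assumes the sequence is $a_n = n^{1/n}$; the function name `nth_root_of_n` is illustrative.

```python
def nth_root_of_n(n: int) -> float:
    """The n-th term n**(1/n) of the sequence in Example 2."""
    return n ** (1.0 / n)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 10, 100, 1000):
        print(n, nth_root_of_n(n))
```

The printed values rise from $n = 1$ to $n = 3$ and decrease afterwards while staying above 1, which is consistent with the sign of $f'(x)$ changing at $x = e$.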

Notice that if $\{a_n\}$ is monotone increasing and is not bounded, then
for each $M$ there is an element of the sequence, say $a_N$, so that $a_N > M$.
Because the sequence is monotone increasing, we have $a_n > M$ for all
$n > N$. Since we can choose such an $N$ for each $M$, $\{a_n\}$ diverges to $+\infty$
according to the definition in Section 2.1. Thus every increasing sequence
converges to a finite limit or diverges to $+\infty$. Similarly, every decreasing
sequence converges to a finite limit or diverges to $-\infty$.

Problems

1. Prove directly, by verifying the definition, that $a_n = 1 + \frac{1}{\sqrt{n}}$ is a Cauchy
sequence.

2. Prove that the rational numbers are dense in the real numbers without
using decimal expansions. Hint: use problem 11 of Section 1.1.

3. Suppose that $\{a_n\}$ and $\{b_n\}$ are both Cauchy sequences and that $a_n \le b_n$
for each $n$. Prove that $\lim_{n\to\infty} a_n \le \lim_{n\to\infty} b_n$.

4. Suppose that $\{x_n\}$ and $\{y_n\}$ are both Cauchy sequences. Prove that
$\lim_{n\to\infty} (x_n - y_n)$ exists.

5. Suppose that $\{a_n\}$ is a Cauchy sequence. Prove that $\{a_n^2\}$ is a Cauchy
sequence. Is the converse true?

6. Let $\{b_n\}$ be a sequence of positive numbers such that $b_n \to 0$. Suppose
that the terms of a sequence $\{a_n\}$ satisfy

$$|a_m - a_n| \le b_n \quad \text{for all } m \ge n.$$

Prove that $\{a_n\}$ is a Cauchy sequence.

7. Suppose that the terms $\{a_n\}$ satisfy $|a_{n+1} - a_n| \le 2^{-n}$ for all $n$. Prove that
$\{a_n\}$ is a Cauchy sequence.

8. A sequence of points in R2, {pn} , is called a Cauchy sequence if, given


e > 0, there is an N so that ||pn — pm|| < e for all n > N and m > N.
The norm || • || is defined in problem 10 of Section 2.2. Prove that every
Cauchy sequence in R2 has a limit in R2, that is, R2 is complete.

9. Suppose that ai > 0 and an+1 = an + ■£-. Prove that {an} diverges to oo.
Hint: if {an} converges to a nonzero limit, find an equation that would
be satisfied by the limit.

10. Let ai = y/2, and let an for n > 2 be defined recursively by the formula

^n+1

(a) Prove by induction that y/2 < an < 2 for all n.


(b) Prove that {an} is a Cauchy sequence and conclude that {an} con¬
verges.

11. (a) Compute enough terms of the sequence $a_n = \sqrt[n]{n}$ to guess the limit.
(b) Let $a_n = \sqrt[n]{n} - 1$. Use the binomial theorem to prove that

$$n = (1 + a_n)^n > \frac{n(n - 1)}{2}\, a_n^2.$$

(c) Rearrange the inequality and prove that $\sqrt[n]{n} \to 1$.

12. Let $a_n = (1 + \frac{1}{n})^n$.

(a) Prove that $f(x) = (1 + \frac{1}{x})^x$ is increasing for $x \ge 1$. Hint: show that
$f'(1) > 0$ and $f''(x) > 0$ for $x \ge 1$.
(b) Use the binomial theorem to show that $a_n$ is bounded above by
$1 + \sum_{k=0}^{n} \frac{1}{k!}$.

(c) Prove that $\lim_{n\to\infty} a_n$ exists.

13. Let $\{a_n\}$ be monotone and bounded and define $\{b_n\}$ by

    $b_n = \frac{a_1 + a_2 + \cdots + a_n}{n}$.

Prove that $\{b_n\}$ is monotone and bounded and therefore has a limit.

14. Let $\{a_n\}$ be a sequence and suppose that $a_n \to a$. Define the sequence
$\{b_n\}$ as above and prove that $b_n \to a$.

2.5 Supremum and Infimum


Let $S$ be a set of real numbers. We say that $m \in S$ is a maximal element
if $m \ge x$ for all $x \in S$. Some sets have maximal elements, and some
sets don't. The number 1 is the maximal element of the closed interval
$[0,1]$, but the open interval $(0,1)$ has no maximal element. Nevertheless,
we want to characterize the relationship of 1 to the interval $(0,1)$. Notice
that 1 is an upper bound for the interval. In fact, of all the upper bounds
it is the least. To generalize this idea to more complicated sets, we make
a formal definition.

Definition. A number $b$ is an upper bound for a set $S$ if $x \le b$ for all
$x \in S$. We say that $b_0$ is a least upper bound of $S$ if $b_0$ is an upper bound
and $b_0 \le b$ for any other upper bound $b$.

Though not all bounded sets have maximal elements, they all have
least upper bounds, as the following theorem shows.

□ Theorem 2.5.1 Let $S$ be a set of real numbers which is bounded above.
Then $S$ has a unique least upper bound.

Proof. We will prove the existence of a least upper bound; uniqueness
is left as an exercise (problem 4). Let $M$ be an upper bound for the set $S$
and let $b$ be an element of $S$. We shall construct a decreasing sequence,
$\{a_n\}$, of upper bounds that converges to a least upper bound. Let $b_1 = b$,
$c_1 = M$ and let $I_1$ be the interval $I_1 = [b_1, c_1]$. Choose $a_1 = M$. Note
that $a_1$ is an upper bound and that there is an element of $S$, namely $b$,
within $M - b$ of $a_1$. Let $m_1$ be the midpoint of the interval $I_1$. If there
are no elements of $S$ in the interval $[m_1, c_1]$, choose $a_2 = m_1$ and define
$I_2 = [b_1, m_1]$. If there are elements of $S$ in $[m_1, c_1]$, choose $a_2 = c_1$ and
define $I_2 = [m_1, c_1]$. See Figure 2.5.1. Note that in both cases, $a_2$ is the
right endpoint of $I_2$, $a_2$ is an upper bound for $S$, $a_2 \le a_1$, and there is an
element $x_2 \in S$ so that $|a_2 - x_2| \le 2^{-1}(M - b)$.

[Figure 2.5.1: the interval $I_1 = [b_1, c_1] = [b, M]$ with its midpoint $m_1$.]

Rename the endpoints of $I_2$ to be $b_2$ and $c_2$ and let $m_2$ be the midpoint
of $I_2$. Choose $a_3$ to be $m_2$ if there is no point of $S$ in $[m_2, c_2]$,
and $a_3 = c_2$ otherwise. Continuing this constructive procedure, we obtain
a decreasing sequence, $\{a_n\}$, of upper bounds for $S$. The $n$-th
term of the sequence, $a_n$, has the property that
there is an element $x_n \in S$ so that $|a_n - x_n| \le 2^{-n+1}(M - b)$.
Since the sequence $\{a_n\}$ is decreasing and is bounded below
by $b$, Theorem 2.4.3 guarantees that $\{a_n\}$ converges to a limit $a$.
To complete the proof we use two simple results that we have left as
exercises. First, since $a_n$ is an upper bound for $S$ for each $n$, the limit $a$
is also an upper bound for $S$ (problem 2). Next, let $\varepsilon > 0$ be given, and
choose $n$ so that $2^{-n+1}(M - b) \le \varepsilon/2$ and so that $|a_n - a| \le \varepsilon/2$. Then,

    $|x_n - a| \le |x_n - a_n| + |a_n - a| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$,

so there is an element of $S$ within $\varepsilon$ of $a$. According to problem 3 this
means that no upper bound for $S$ could be less than $a$. Thus, $a$ is a least
upper bound for $S$. □
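The bisection in this proof can be tried out numerically. The following is a minimal sketch, assuming that $S$ is a finite list of numbers so that the question "are there elements of $S$ in $[m, c]$?" can be answered by inspection; the function name `sup_by_bisection` and the sample set are illustrative choices, not part of the text.

```python
def sup_by_bisection(points, upper_bound, steps=40):
    """Approximate the least upper bound of a finite, nonempty list of
    numbers by the bisection used in the proof of Theorem 2.5.1.

    Returns the last right endpoint together with the whole sequence
    a_1, a_2, ... of right endpoints (each one an upper bound).
    """
    lo, hi = points[0], upper_bound   # I_1 = [b_1, c_1] = [b, M]
    history = [hi]                    # a_1 = M
    for _ in range(steps):
        mid = (lo + hi) / 2
        if any(mid <= x <= hi for x in points):
            lo = mid                  # elements of S in [m, c]: keep the right half
        else:
            hi = mid                  # none there, so m is itself an upper bound
        history.append(hi)            # next a_n is the right endpoint
    return hi, history


# Illustrative data (an assumption for this sketch).
S = [0.3, 0.75, 0.5]
approx, history = sup_by_bisection(S, upper_bound=2.0)
print(approx)
print(history[:6])
```

Each entry of `history` is an upper bound for the list, the entries never increase, and the interval length halves at every step, which mirrors the estimate $|a_n - x_n| \le 2^{-n+1}(M - b)$.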

The least upper bound of the set $S$ is also called the supremum of the
set and denoted by $\sup S$. A similar proof shows that any set which is
bounded below has a greatest lower bound. The greatest lower bound
is called the infimum of the set and is denoted by $\inf S$. If a set $S$ is
not bounded above, we define $\sup S = +\infty$. It is easy to see that if
$\sup S = +\infty$ then there is a sequence of points $s_n \in S$ such that $s_n \to +\infty$
(problem 7). Similarly, if $S$ is not bounded below we define $\inf S = -\infty$.
Our discussion of completeness assumes the Archimedean property
(for example, it was used implicitly in the proof of Theorem 2.4.3). The
Archimedean property follows from Theorem 2.5.1.

□ Theorem 2.5.2 Let $a$ and $b$ be real numbers which satisfy $0 < a < b$.
Then there is a positive integer $n$ such that $b < na$.

Proof. We shall give a proof by contradiction. Assume that the Archimedean
property fails for some $a$ and $b$ satisfying $0 < a < b$. That is,
$na \le b$ for all positive integers $n$. Thus $b$ is an upper bound for the set
$S = \{na \mid n \in \mathbb{N}\}$. Let $c$ be the least upper bound of $S$ guaranteed by
Theorem 2.5.1. Then, $c < c + a$ since $a > 0$, so $c - a < c$. Since $c$ is the least
upper bound of $S$, $c - a$ cannot be an upper bound. Thus, there is an element of
$S$ of the form $ma$, where $m$ is a positive integer, such that $ma > c - a$. It
follows that $c < (m + 1)a$, but this is a contradiction since $(m + 1)a$ is in $S$
by definition and $c$ is the least upper bound of $S$. Thus the Archimedean
property holds for all $a$ and $b$ satisfying $0 < a < b$. □

The following result will be useful when we define the Riemann integral;
see (10) in Section 3.3.

□ Theorem 2.5.3 Let $A$ and $B$ be two sets of real numbers and suppose
that for each $a \in A$ and $b \in B$ the inequality $a \le b$ holds. Then $\sup A \le \inf B$.

Proof. Choose any $b \in B$. Since $a \le b$ for all $a \in A$, it is clear that $b$ is an
upper bound for $A$. Since $\sup A$ is the least upper bound, we see that

    $\sup A \le b$.

Since this is true for all $b \in B$, the set $B$ is bounded below by $\sup A$. Since
$\inf B$ is by definition the greatest lower bound, we conclude that
$\sup A \le \inf B$. □

We took as an axiom the statement that if a sequence of real numbers
is a Cauchy sequence, then it converges to a limit. From that axiom we
proved that bounded monotone sequences converge (Theorem 2.4.3) and
that sets which are bounded above have least upper bounds (Theorem
2.5.1). We remark that Theorem 2.4.2, Theorem 2.4.3, and Theorem 2.5.1
are all equivalent (problem 10). That is, if one of them is true, then all
three are true. Thus, we could have taken any one of them as an axiom
and proved the other two.

Problems

1. Find the sup and inf of each of the following sets:

(a) $\{\frac{1}{n} \mid n \in \mathbb{N}\}$.
(b) $\{2 - \frac{1}{n} \mid n \in \mathbb{N}\}$.
(c) $\{e^r \mid r \in \mathbb{Q}\}$.
(d) $\{x^2 \mid 0 < x < 2\}$.

2. Let $S$ be a set of real numbers and suppose that $a_n$ is an upper bound for
$S$ for each $n$. Prove that if $a_n \to a$, then $a$ is an upper bound for $S$.

3. Suppose that a set $S$ of real numbers is bounded and let $\mu$ be an upper
bound for $S$. Show that $\mu$ is the least upper bound of $S$ if and only if for
every $\varepsilon > 0$ there is an element of $S$ in the interval $[\mu - \varepsilon, \mu]$.

4. Prove that least upper bounds are unique. That is, if $\mu_1$ and $\mu_2$ are both
least upper bounds for the set $S$, then $\mu_1 = \mu_2$.

5. Suppose that $a$ is an upper bound for the set $S$ and that there is a sequence
of elements $x_n \in S$ such that $x_n \to a$. Prove that $a$ is the least upper bound.

6. (a) Let $S_1$ and $S_2$ be two bounded, nonempty sets of real numbers.
Prove that

    $\sup\{S_1 \cup S_2\} = \max\{\sup\{S_1\}, \sup\{S_2\}\}$.

(b) Let $S$ be the set consisting of all numbers of the form $a_1 + a_2$ where
$a_1 \in S_1$ and $a_2 \in S_2$. Prove that $\sup\{S\} = \sup\{S_1\} + \sup\{S_2\}$.

7. Show that if $\sup S = +\infty$, then there is a sequence of points $s_n \in S$ such
that $s_n \to +\infty$.

8. Let $S$ be a set of real numbers and suppose that $a > 0$. Prove that
$\sup\{ax \mid x \in S\} = a \sup S$.

9. Let $\{a_n\}$ and $\{b_n\}$ be bounded sequences and define sets $A$, $B$, and $C$ by
$A = \{a_n\}$, $B = \{b_n\}$, and $C = \{a_n + b_n\}$. Prove that $\sup C \le \sup A + \sup B$.
Give an example to show that strict inequality may hold.

10. (a) Assume that Theorem 2.5.1 is true and prove Theorem 2.4.2.
(b) Assume that Theorem 2.4.3 is true and prove Theorem 2.4.2.
(c) Conclude that Theorem 2.4.2, Theorem 2.4.3, and Theorem 2.5.1 are
equivalent.

2.6 The Bolzano-Weierstrass Theorem

Suppose that we have a sequence $\{a_n\}$ of real numbers. We can make
new sequences from the elements of $\{a_n\}$ by skipping terms:

    $a_2, a_4, a_6, \ldots$

    $a_2, a_4, a_8, a_{16}, \ldots$

    $a_1, a_{15}, a_{93}, a_{94}, a_{135}, \ldots$

These new sequences are called, for obvious reasons, subsequences of
the original sequence $\{a_n\}$. Notice that not only is each of the elements
of the subsequence a member of the original sequence but, in addition,
they must occur in the same order as they did in the original sequence.
These ideas are the basis of the following formal definition.

Definition. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Let $k \mapsto n(k)$ be
a function from $\mathbb{N}$ to $\mathbb{N}$ having the property that $n(k + 1) > n(k)$ for all
$k \in \mathbb{N}$. Then $\{a_{n(k)}\}_{k=1}^{\infty}$ is called a subsequence of $\{a_n\}_{n=1}^{\infty}$.

Normally, we use the simpler notation $n_k$ instead of $n(k)$, and we write
the subsequence as $\{a_{n_k}\}$, where it is assumed that $k$ runs from 1 to $\infty$.
Next, we introduce the notion of limit point.

Definition. Let $\{a_n\}$ be a sequence of real numbers. A number $d$ is called
a limit point of the sequence if, given $\varepsilon > 0$ and an integer $N$, there
exists an $n > N$ so that $|a_n - d| < \varepsilon$.

Intuitively, $d$ is a limit point if the sequence keeps returning close to $d$.
The notions of subsequence and limit point are related to each other, as
the following proposition shows.

Proposition 2.6.1 Let $\{a_n\}$ be a sequence of real numbers. Then $d$ is a
limit point of the sequence if and only if there exists a subsequence $\{a_{n_k}\}$
of $\{a_n\}$ which converges to $d$.

Proof. First suppose that there is a subsequence, $\{a_{n_k}\}$, which converges
to a limit $d$. Let $\varepsilon > 0$ and a positive integer $N$ be given. Since $a_{n_k} \to d$
as $k \to \infty$, we can choose a $K$ so that

    $|a_{n_k} - d| < \varepsilon$ for all $k > K$.

Since $n_k \to \infty$ as $k \to \infty$, we can choose a particular $k > K$ so that
$n_k > N$. Thus, we have found an element of $\{a_n\}$, namely, $a_{n_k}$, so that
$n_k > N$ and $|a_{n_k} - d| < \varepsilon$. Therefore $d$ is a limit point.
Conversely, suppose that $d$ is a limit point. Then, there exists an $n$
such that $|a_n - d| < 1$. Denote this $n$ by $n_1$. Given $n_1$ and $\frac{1}{2}$, we can
choose an $n_2 > n_1$ so that $|a_{n_2} - d| < \frac{1}{2}$. We continue in this way, defining
successively larger integers $n_1 < n_2 < \ldots < n_k < \ldots$ so that

    $|a_{n_k} - d| < \frac{1}{k}$.

It follows easily from this inequality that $a_{n_k} \to d$ as $k \to \infty$, so we have
found a subsequence of $\{a_n\}$ that converges to $d$. □

Example 1 Let $a_n = 2(-1)^n + \frac{1}{n}$. Then, since $\frac{1}{n} \to 0$ and $(-1)^n$ oscillates
between $+1$ and $-1$, the sequence $\{a_n\}$ has two limit points, 2 and $-2$.
According to Proposition 2.6.1, these limit points must be the limits of
subsequences, and, indeed, they are:

    $a_{2k} = 2 + \frac{1}{2k} \to 2$

    $a_{2k+1} = -2 + \frac{1}{2k+1} \to -2$.

Example 2 Let $a_0 = .2$ and define the rest of the sequence $\{a_n\}$ by the
recursive relation

    $a_{n+1} = (3.3) a_n (1 - a_n)$.        (13)

Using a hand calculator, we can compute approximately as many terms
of the sequence as we like:

    a_0 = .2          a_7  = .479631    a_14 = .823603
    a_1 = .528        a_8  = .823631    a_15 = .479428
    a_2 = .822413     a_9  = .479368    a_16 = .823603
    a_3 = .481965     a_10 = .823595    a_17 = .479427
    a_4 = .823927     a_11 = .479444    a_18 = .823603
    a_5 = .478736     a_12 = .823606    a_19 = .479427
    a_6 = .823508     a_13 = .479422    a_20 = .823603

The sequence appears to settle down to oscillating between the number
.479427 and the number .823603. If this is the case, then the sequence
would have two limit points, .479427 and .823603. The subsequence
$\{a_{2n}\}$ would converge to .823603 and the subsequence $\{a_{2n+1}\}$ would
converge to .479427. We use the word “would” because we have not
proved that the sequence behaves in this way; we have found strong
numerical evidence. One of the interesting things about the iteration (13) is
that no matter what number between 0 and 1 we choose for $a_0$, the resulting
sequence settles down to oscillating between the same two numbers,
.479427 and .823603. This is discussed further in Section 2.7.
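The hand-calculator experiment in Example 2 is easy to repeat by machine. The sketch below iterates (13) in ordinary floating-point arithmetic; the helper name `orbit` is an illustrative choice, and the printed digits are subject to the same round-off caveats as the table above.

```python
def orbit(a, x0, n):
    """Return [x_0, x_1, ..., x_n] for the recursion x_{k+1} = a * x_k * (1 - x_k)."""
    xs = [x0]
    for _ in range(n):
        xs.append(a * xs[-1] * (1 - xs[-1]))
    return xs


terms = orbit(3.3, 0.2, 20)
for k, value in enumerate(terms):
    print(f"a_{k} = {value:.6f}")

# The even- and odd-indexed terms are the two subsequences discussed in the text.
even_terms = terms[0::2]
odd_terms = terms[1::2]
```

Printing `even_terms` and `odd_terms` separately makes the two apparent limit points visible; as the text stresses, this is numerical evidence and not a proof.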

The proof of the following theorem, named after Bernhard Bolzano
(1781 - 1848) and Karl Weierstrass (1815 - 1897), uses the idea of nested
subintervals that was used in the proof of Theorem 2.4.3.

□ Theorem 2.6.2 (The Bolzano-Weierstrass Theorem) Every bounded
sequence has a convergent subsequence.

Proof. Let $\{a_n\}$ be the sequence. Since it is bounded, all of the terms are
contained in an interval of the form $[-M, M]$ for some $M$. We will construct
the subsequence and a nested family $\{I_k\}$ of subintervals whose
lengths shrink to zero so that each $I_k$ contains the tail of the subsequence.
We first divide the interval into two parts at the midpoint. If $[0, M]$ contains
infinitely many terms in the sequence $\{a_n\}$, we choose $a_{n_1}$ to be one of
those terms and choose $I_1 = [0, M]$. If $[0, M]$ contains only finitely
many terms, then $[-M, 0]$ must contain infinitely many terms of $\{a_n\}$. In
this case we let $a_{n_1}$ be one of those terms and choose $I_1 = [-M, 0]$. In
either case, $I_1$ has length $M$ and contains $a_{n_1}$ and infinitely many other
terms of $\{a_n\}$. We now divide $I_1$ at its midpoint, making two subintervals.
Either the right subinterval or the left subinterval (or both) must
contain infinitely many points of $\{a_n\}$. Choose $I_2$ to be a subinterval
with infinitely many points of $\{a_n\}$ and define $a_{n_2}$ to be any member of
$\{a_n\}$ in this subinterval whose index is greater than $n_1$. Note that the
length of $I_2$ is $2^{-1}M$. See Figure 2.6.1.

[Figure 2.6.1: the interval $[-M, M]$ and its nested subintervals.]

Continuing in this manner we construct a nested family, $\{I_k\}$, of
intervals, each containing all the succeeding ones. The length of $I_k$ is
$2^{-k+1}M$, and a term of the sequence $a_{n_k}$ is in $I_k$. Since the intervals are
nested, each $I_k$ contains $a_{n_j}$ for all $j \ge k$. Given $\varepsilon > 0$, we can choose $K$
so that $2^{-K+1}M \le \varepsilon$. If $j \ge K$ and $k \ge K$, both $a_{n_j}$ and $a_{n_k}$ are in $I_K$, so

    $|a_{n_j} - a_{n_k}| \le 2^{-K+1}M \le \varepsilon$.

Thus the subsequence $\{a_{n_k}\}$ is a Cauchy sequence, and so by Theorem
2.4.2 it converges to a finite limit. □

Corollary 2.6.3 If $\{a_n\}$ is a sequence of numbers in the closed interval
$[b, c]$, then a subsequence of $\{a_n\}$ converges to a point in $[b, c]$.

Proof. Since the sequence consists of numbers in the interval $[b, c]$ it is a
bounded sequence and therefore, by the Bolzano-Weierstrass theorem, a
subsequence $\{a_{n_k}\}$ converges to a finite limit, $a$. Suppose $a$ is not in $[b, c]$.
Then we can choose an $\varepsilon > 0$ so that the intervals $[a - \varepsilon, a + \varepsilon]$ and $[b, c]$
are disjoint. Since $a_{n_k} \to a$, we know that $a_{n_k}$ is in $[a - \varepsilon, a + \varepsilon]$ for $k$
large enough. But this is a contradiction since all the $a_n$ are in $[b, c]$ by
hypothesis. Thus, $a$ must be in $[b, c]$ too. □

We shall see in the succeeding chapters that the Bolzano-Weierstrass
theorem is extremely useful and arises in unexpected circumstances.
Project 5 outlines an application to number theory.

Problems

1. Prove that if a sequence converges it has exactly one limit point. Is the
converse true? Prove it or give a counterexample.
2. For each of the following sequences, find the limit points. For each limit
point find a subsequence that converges to it:

(b) an = (-l)n +
(c) an = sin ™.

3. Suppose that the sequence $\{a_n\}$ converges to $a$ and that $d$ is a limit point
of the sequence $\{b_n\}$. Prove that $ad$ is a limit point of the sequence $\{a_n b_n\}$.
4. Let $c$ be a limit point of $\{a_n\}$ and $d$ be a limit point of $\{b_n\}$. Is $c + d$
necessarily a limit point of $\{a_n + b_n\}$? Prove it or give a counterexample.
5. Let $\{y_j\}_{j=1}^{N}$ be $N$ given real numbers. Construct a sequence $\{a_n\}$ so that
$\{y_j\}_{j=1}^{N}$ is the set of limit points of $\{a_n\}$ but $a_n \ne y_j$ for any $n$ or $j$.

6. Consider the following sequence: $a_1 = \frac{1}{2}$, the next three terms are
$\frac{1}{4}, \frac{1}{2}, \frac{3}{4}$, the next seven terms are $\frac{1}{8}, \frac{1}{4}, \frac{3}{8}, \ldots$ and so forth.
What are the limit points?
7. Let {dn} be a sequence of limit points of the sequence {an}. Suppose that
dn —> d. Prove that d is a limit point of {an}.

8. Let $\{I_k\}_{k=1}^{\infty}$ be a nested family of closed, finite intervals; that is,
$I_1 \supset I_2 \supset \ldots$. Prove that there is a point $p$ contained in all the intervals,
that is, $p \in \bigcap_{k=1}^{\infty} I_k$.
9. Suppose that {xn} is a monotone increasing sequence of points in R and
suppose that a subsequence of {xn} converges to a finite limit. Prove that
{xn} converges to a finite limit.

10. A sequence of points in the plane, $\{(x_n, y_n)\}$, is said to be bounded if
there is an $M$ so that $0 \le |x_n| \le M$ and $0 \le |y_n| \le M$ for all $n$. Explain
why these definitions are geometrically reasonable. Prove that every
bounded sequence in the plane has a convergent subsequence.
11. Let $\{(x_n, y_n)\}$ be a sequence of points in a rectangle $R = [a, b] \times [c, d]$.
Prove that $\{(x_n, y_n)\}$ has a subsequence which converges to a point of $R$.
12. Let $\{p_n\}$ be a sequence of points in $\mathbb{R}^2$. Use the notion of convergence
introduced in problem 10 of Section 2.2 to

(a) Define what it means for a point $p \in \mathbb{R}^2$ to be a limit point of $\{p_n\}$.
(b) Prove that $p$ is a limit point of $\{p_n\}$ if and only if $\{p_n\}$ has a
subsequence which converges to $p$.
(c) Determine the limit points of the sequence {pn} if for each n, pn =

(d) Determine the limit points of {pn} if the polar coordinates of pn are
rn = 2 - J and dn =

13. Let $\{I_k\}_{k=1}^{\infty}$ be a family of closed finite intervals which has the property
that the intersection of any finite subcollection of the intervals is
nonempty. Prove that there is a point $p$ contained in $\bigcap_{k=1}^{\infty} I_k$. Hint: consider
the sets $J_k = \bigcap_{n=1}^{k} I_n$ and use the idea of the proof of problem 8.

2.7 The Quadratic Map


In this section we consider sequences which are defined recursively by
the relation

    $x_{n+1} = a x_n (1 - x_n)$        (14)

where $a > 0$ and $x_0$ is a given number in the interval $[0,1]$. If we define

    $F(x) = ax(1 - x)$

then the maximum of $F$ on $[0,1]$ occurs at $x = \frac{1}{2}$ and the maximum value
is $\frac{a}{4}$. Thus, if $a \le 4$, $F(x)$ is in $[0,1]$ if $x$ is in $[0,1]$, so $\{x_n\}$ is a sequence of
numbers between 0 and 1.
Sequences produced by iterative formulas such as (14) arise in a variety
of applications. For example, in a simple model of limited population
growth, $x_n$ represents the fraction of some maximal population present
in year $n$. Given an initial population $x_0$, one wants to understand the
behavior of the whole sequence $\{x_n\}$, which is called the orbit of $x_0$ under
the iteration. Will the population approach a limiting value? How does
the behavior of the orbit depend on the initial condition $x_0$? How does
the behavior of the orbit depend on the parameter $a$? These are the kinds
of questions that one asks about most dynamical systems, both discrete
and continuous. The importance of (14) is that, even though it looks so
simple, it has many interesting properties usually associated with more
complicated dynamical systems. We will see, for example, that as the
parameter $a$ is changed, the behavior of the orbit $\{x_n\}$ changes dramatically.
When $a$ is small, the orbits are quite regular, but as $a$ becomes large
the orbits become very complicated.
First, we will consider the case $0 < a \le 1$. If we choose a numerical
value for $a$ and choose $x_0 \in (0,1)$, then we can compute on a hand calculator
as many iterates as we like. For each such numerical experiment,
we see that $x_n \to 0$ as $n \to \infty$. This gives us an idea about the behavior
of the orbits in the case $0 < a \le 1$, but it is not a proof, of course, since we
are looking at only finitely many of the terms of each orbit and we cannot
perform the experiment for all possible choices of $a$ and $x_0$. Furthermore,
because of round-off error the numerical calculations are only approximate.
What if orbits that start close diverge from each other quickly?
In this case, the numerical experiments are likely to be inaccurate and
may be quite misleading. In addition, the numerical experiments do not
give us insight into the reason why the sequence $\{x_n\}$ converges to zero.
To gain understanding, we can construct the orbit approximately by a
geometric procedure.

[Figure 2.7.1: graphical construction of the orbit from the graph of $F$ and the line $y = x$; panels (a) and (b).]

Given $x_0$, the horizontal line through the point $(x_0, F(x_0))$ intersects the
line $y = x$ in the point $(F(x_0), F(x_0))$. See Figure 2.7.1(a). Since $x_1 =
F(x_0)$, the first coordinate of this point is just $x_1$. The horizontal line
through $(x_1, F(x_1))$ intersects the line $y = x$ in the point $(F(x_1), F(x_1))$,
whose first coordinate is $x_2$. Continuing in this way, we can construct
approximately as many terms of the orbit $\{x_n\}$ as we like and see that
they get smaller and smaller because the graph of $F(x)$ lies below the
line $y = x$. We can also give an analytical proof that $x_n \to 0$.

□ Theorem 2.7.1 Suppose that $0 < a \le 1$, $0 \le x_0 \le 1$, and the sequence
$\{x_n\}$ satisfies (14). Then, $x_n \to 0$ as $n \to \infty$.

Proof. If $x_0 = 0$ or $x_0 = 1$, all the rest of the terms in the orbit are zero
so there is nothing more to prove. On the other hand, if $0 < x_n < 1$, then

    $x_{n+1} = a x_n (1 - x_n) < x_n$

since $a \le 1$ and $1 - x_n < 1$. Thus, the sequence is strictly monotone decreasing
and bounded below by zero. By Theorem 2.4.3, $\{x_n\}$ converges
to some $x$, and since each $x_n \ge 0$, we know that $x \in [0,1]$. Taking the
limit of both sides of (14), using Theorems 2.2.4 and 2.2.5, we see that $x$
satisfies

    $x = ax(1 - x)$.        (15)

Since $a \le 1$ and $x \in [0,1]$, this can only be true if $x = 0$. Thus, $x_n \to 0$ as
$n \to \infty$. □

Even in this very simple case we see the value of three different
methods of investigation: numerical experimentation, geometric construction,
and analytical proof. Let's try this same approach for $a > 1$.
Set $a = 2.4$. If we try two different values for $x_0$, namely, $x_0 = .1$ and
$x_0 = .85$, and iterate (14), we find

    x_0     .1          .85
    x_1     .216        .306
    x_2     .406426     .509674
    x_3     .578985     .599775
    x_4     .585027     .576108
    x_5     .582649     .586098
    x_6     .583606     .582209
    x_7     .583224     .58378
    x_8     .583377     .583154
    x_9     .583316     .583405
    x_10    .58334      .583305

In both cases the sequences seem to be converging to the same number,
something near .583. We know from the argument in the proof of
Theorem 2.7.1 that if $\{x_n\}$ converges, then the limit $x$ must satisfy (15).
Solving (15) algebraically, we find that there are two possibilities, $x = 0$
and $x = \frac{a-1}{a}$. Since $a = 2.4$, we calculate $\frac{a-1}{a} = .583333$, so now we
see where the .583 came from. But why does the sequence converge to
.583333 and not to zero? Let's look at the graph of $F$ in Figure 2.7.1(b)
and construct the successive points by the same geometrical method as
above. The $x$ coordinate of the point where the graph of $F$ intersects the
line $y = x$ is $\bar{x} = \frac{a-1}{a}$. If $x_n$ is less than $\bar{x}$, then $x_{n+1}$ will be larger than $x_n$
because the graph of $F$ is higher than the line $y = x$ for $x < \bar{x}$. This explains
why the sequence cannot converge to zero. If $x_n$ is greater than $\bar{x}$, then
$x_{n+1}$ will be less than $x_n$ because the graph of $F$ is lower than the line
for $x > \bar{x}$. But that in and of itself doesn't prove that the sequence will
converge to $\bar{x}$ since it could hop back and forth from one side of $\bar{x}$ to the
other. In fact, in the case shown in Figure 2.7.1(b), it does hop back and
forth, but, nevertheless, the sequence $x_n$ converges to $\frac{a-1}{a}$. We shall give
an analytical proof in the case $1 < a < 2\sqrt{2}$.
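The numerical experiment for $a = 2.4$ can be repeated as follows. This is a minimal sketch in floating-point arithmetic; the name `quadratic_orbit` is an illustrative choice, and the distance to $\frac{a-1}{a}$ is printed so that the hopping from one side of the fixed point to the other can be seen in the signs.

```python
def quadratic_orbit(a, x0, n):
    """Return [x_0, ..., x_n] for x_{k+1} = a * x_k * (1 - x_k), i.e. iteration (14)."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(a * x * (1 - x))
    return xs


a = 2.4
fixed_point = (a - 1) / a          # the nonzero solution of (15)

for x0 in (0.1, 0.85):
    xs = quadratic_orbit(a, x0, 10)
    print(f"x_0 = {x0}")
    for k, x in enumerate(xs):
        # A sign change in the last column means the orbit crossed the fixed point.
        print(f"  x_{k} = {x:.6f}   x_{k} - fixed point = {x - fixed_point:+.6f}")
```

Running more iterations shrinks the printed differences further, which is consistent with, but does not replace, the analytical proof.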

□ Theorem 2.7.2 Suppose that $1 < a < 2\sqrt{2}$ and that the sequence $\{x_n\}$
satisfies (14). Then, for all $0 < x_0 < 1$, the sequence $\{x_n\}$ converges to
$\frac{a-1}{a}$ as $n \to \infty$.

Proof. Let $\bar{x} = \frac{a-1}{a}$. Recall that the function $F$ achieves its maximum
value at $x = \frac{1}{2}$ and that value is $\frac{a}{4}$. Thus, $x_n \le \frac{a}{4}$ for $n \ge 1$. Now, choose
a number $\delta$ small enough so that

    $a(1 - \delta) > 1$ and $\delta < F(\frac{a}{4}) = a \frac{a}{4} (1 - \frac{a}{4})$.

We can choose such a $\delta$ satisfying the first inequality since $a > 1$. The
reasons for this choice will become clear shortly. First of all, notice that
the graph of $F$ is concave downward since $F''(x) = -2a$. Thus on the
closed interval $[\delta, \frac{a}{4}]$, the minimum value of $F$ occurs at one of the endpoints.
By our choice of $\delta$ above, $F(\delta) = a\delta(1 - \delta) > \delta$ and $F(\frac{a}{4}) > \delta$.
Thus,

    $\delta \le F(x) \le \frac{a}{4}$ if $\delta \le x \le \frac{a}{4}$.        (16)

Therefore, if $x_i$ is in the interval $[\delta, \frac{a}{4}]$, then all the succeeding iterates
will be in $[\delta, \frac{a}{4}]$. On the other hand, if any $x_j < \delta$, then

    $x_{j+1} = a x_j (1 - x_j) > a(1 - \delta) x_j$.

Iterating this inequality shows that if $x_1, \ldots, x_k$ are all $< \delta$, then

    $x_k > (a(1 - \delta))^{k-1} x_1$.

Since $a(1 - \delta) > 1$ and $x_1 > 0$, this gives the contradiction $x_k > \delta$ for $k$
large. Therefore, for each $x_0$, some iterate, $x_k$, must satisfy $x_k \ge \delta$, and,
by what we proved above, all the succeeding iterates, $x_n$ for $n \ge k$, will
remain in $[\delta, \frac{a}{4}]$.
To prove the convergence, we subtract $\bar{x} = \frac{a-1}{a}$ from both sides of
(14) and rearrange algebraically to obtain

    $(x_{n+1} - \bar{x}) = (1 - a x_n)(x_n - \bar{x})$.        (17)

Since $x_n$ is in $[\delta, \frac{a}{4}]$, $a x_n \ge a\delta$, and the hypothesis that $1 < a < 2\sqrt{2}$
implies that $a x_n \le \frac{a^2}{4} < 2$. Thus,

    $-1 < 1 - \frac{a^2}{4} \le 1 - a x_n \le 1 - a\delta < 1$.

Therefore, if we define

    $\beta = \max\{|1 - \frac{a^2}{4}|, |1 - a\delta|\} < 1$,

it follows from (17) that

    $|x_{n+1} - \bar{x}| \le \beta |x_n - \bar{x}|$        (18)

for all $n \ge k$. Iterating the inequality (18) gives

    $|x_n - \bar{x}| \le \beta^{n-k} |x_k - \bar{x}|$,

which proves that $x_n \to \bar{x}$ as $n \to \infty$. □
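The algebraic rearrangement that produces (17) is not written out in the proof. One way to carry it out, using only (14) and the fact that $\bar{x} = \frac{a-1}{a}$ satisfies $\bar{x} = a\bar{x}(1 - \bar{x})$, is sketched here; other routes to the same identity are possible.

```latex
\begin{align*}
x_{n+1} - \bar{x}
  &= a x_n (1 - x_n) - a \bar{x} (1 - \bar{x}) \\
  &= a (x_n - \bar{x}) - a \bigl( x_n^2 - \bar{x}^2 \bigr) \\
  &= a (x_n - \bar{x}) \bigl( 1 - x_n - \bar{x} \bigr) \\
  &= \bigl( a - a x_n - (a - 1) \bigr) (x_n - \bar{x})
     \qquad \text{since } a \bar{x} = a - 1 \\
  &= (1 - a x_n)(x_n - \bar{x}).
\end{align*}
```

The factor $1 - a x_n$ is exactly the quantity bounded between $-1$ and $1$ in the proof, which is what makes the contraction estimate (18) work.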

Now let's investigate the behavior of the iteration for $a > 3$. If we
set $a = 3.3$ and use a hand calculator to compute the first 20 iterations
starting with three different initial values of $x_0$, we find the results in the
following table:

    x_0     .1          .5          .88
    x_1     .297        .825        .34848
    x_2     .68901      .476438     .749238
    x_3     .707108     .823168     .620006
    x_4     .683451     .480356     .777475
    x_5     .713941     .823727     .570925
    x_6     .673957     .479164     .8084
    x_7     .725139     .823567     .511135
    x_8     .657731     .479504     .824591
    x_9     .742899     .823614     .477315
    x_10    .6303       .479405     .823302
    x_11    .768972     .8236       .480071
    x_12    .586258     .479433     .823689
    x_13    .800447     .823604     .479243
    x_14    .527115     .479425     .823578
    x_15    .822574     .823603     .479481
    x_16    .481622     .479428     .823611
    x_17    .823885     .823603     .479412
    x_18    .478824     .479427     .823601
    x_19    .82352      .823603     .479432
    x_20    .479604     .479427     .823604

The sequences certainly don't look like they are converging! As $n$ gets
larger, they seem to alternate from close to .479 to close to .823. Amazingly,
any other initial condition between 0 and 1 seems to give a sequence
with the same properties. To understand what is happening, we
look at the composition of $F$ with itself, $F^{(2)}(x)$,

    $F^{(2)}(x) = F(F(x)) = a^2 x(1 - x)(1 - ax(1 - x))$,

which is a quartic polynomial. The graph of $F^{(2)}(x)$ is shown in
Figure 2.7.2. In addition to $x = 0$, there are three other points, labeled
$p_1$, $\bar{x}$, and $p_2$ in Figure 2.7.2, such that $F^{(2)}(x) = x$. These points are
called fixed points of $F^{(2)}$. The point $\bar{x} = \frac{a-1}{a}$ is easy to understand.
Since $\bar{x}$ is a fixed point of $F$, it is automatically a fixed point of $F^{(2)}$.
Consider the point $p_1$. Since $F^{(2)}(F(p_1)) = F(F^{(2)}(p_1)) = F(p_1)$, the
point $F(p_1)$ is a fixed point of $F^{(2)}$. Since the odd iterates of $F(p_1)$ equal
$p_1$, $F(p_1)$ cannot be 0 or $\bar{x}$. If $F(p_1) = p_1$, then $p_1$ would be a fixed
point of $F$. Since the only fixed points of $F$ are 0 and $\bar{x}$, we conclude that
$F(p_1) = p_2$, and likewise, $F(p_2) = p_1$. Thus, if $x_0 = p_1$ or $x_0 = p_2$, the
iterates oscillate back and forth between $p_1$ and $p_2$. Such an orbit is said to
have period two. The numerical evidence above suggests that any orbit
starting with $x_0$ between 0 and 1 gets closer and closer as $n \to \infty$ to an orbit
that oscillates between $p_1$ and $p_2$. This can be proven analytically if $x_0 \ne \bar{x}$
by showing that one of the two subsequences,

    $x_{2n+2} = F^{(2)}(x_{2n})$

and

    $x_{2n+1} = F^{(2)}(x_{2n-1})$,

converges to $p_1$ and the other to $p_2$ using techniques similar to those in
Theorem 2.7.2. If $x_0 = \bar{x}$, then all the terms $x_n = \bar{x}$. Note that we would
not normally be able to see this numerically because the infinite decimal
expansion of $\bar{x}$ will be truncated by the calculator and the resulting orbit
will get closer and closer to the period-two orbit discussed above.

[Figure 2.7.2: graph of $F^{(2)}$ with the fixed points $p_1$, $\bar{x}$, and $p_2$ labeled.]
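The points $p_1$ and $p_2$ can be computed directly rather than read off from the table. The sketch below relies on a factorization that is not stated in the text: $F^{(2)}(x) - x = -x\,(ax - (a-1))\,(a^2x^2 - a(a+1)x + (a+1))$, which can be checked by expanding both sides. The quadratic factor has real roots when $a \ge 3$, and for $a > 3$ these are the two period-two points. The function names are illustrative choices.

```python
import math


def F(a, x):
    """The quadratic map F(x) = a * x * (1 - x)."""
    return a * x * (1 - x)


def period_two_points(a):
    """Roots of a^2 x^2 - a(a+1) x + (a+1) = 0, assuming a > 3.

    These are the fixed points of F(F(x)) other than 0 and (a-1)/a.
    """
    root = math.sqrt((a + 1) * (a - 3))
    return ((a + 1) - root) / (2 * a), ((a + 1) + root) / (2 * a)


a = 3.3
p1, p2 = period_two_points(a)
print(f"p1 = {p1:.6f}, p2 = {p2:.6f}")

# F should swap the two points, up to round-off.
print(abs(F(a, p1) - p2), abs(F(a, p2) - p1))
```

For $a = 3.3$ the two roots agree to six decimal places with the values .479427 and .823603 that the tabulated orbits settle near.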
And this is just the beginning! As $a$ gets still larger, orbits of period
4, 8, ... and so on appear and the dynamics of the iterates becomes more
interesting and difficult to analyze. When $a = 4$ the iterates are “chaotic”
in the sense that most orbits wander everywhere, there is sensitive dependence
of the orbit on the initial point, $x_0$, and the points with periodic
orbits are dense. Further work in this direction would take us too
far from our goal. Our intention here is just to show that investigating
an iteration even as simple as (14) can lead to deep and interesting questions
in analysis. For further development of the mathematics of discrete
dynamical systems, see [9] and [8].

Problems

1. Consider the iteration


1
®n+1 2,Xn'

What happens to the iteration for different choices of $x_0$ in $\mathbb{R}$? Prove your
conclusions.

2. Consider the iteration


xn+1 = xn■

What happens to the iteration for different choices of $x_0$ in $\mathbb{R}$? Prove your
conclusions.

3. Consider the iteration


    $x_{n+1} = \sin x_n$

for $0 < x_0 < \pi$. Prove that $x_n \to 0$. Hint: derive the inequality $\sin x < x$
for $0 < x < \pi$ by using the Fundamental Theorem of Calculus.

4. Let $a$, $b$, and $x_0$ be positive numbers and consider the iteration

    $x_{n+1} = \frac{a x_n}{b + x_n}$.

Find out all you can about the orbits by numerical experimentation and
analytical proof.

5. Consider the iteration


xn-\~i xn * Xxnj

where $x_0$ is any real number and $0 < \lambda < 1$. Find out all you can about
the orbits by numerical experimentation and analytical proof.

The following discussion provides background for problems 6-9. For
more information, see [25]. Consider a gene locus having two alleles, a
and b. Individuals in a population of size $N$ are said to have genetic
types aa, ab, or bb, according to whether they have two a alleles, one
of each, or two b alleles. Thus, there are $2N$ alleles in the population.
We assume that the population reproduces synchronously and that each
new individual receives two alleles chosen randomly from the gene pool
of the previous generation. Let $x_n$ denote the fraction of a alleles in the
gene pool at the $n$-th generation. Constants $\alpha$, $\beta$, and $\gamma$, called fitness
coefficients, represent the probabilities that each of the three genetic types
will reach maturity and reproduce. Thus the total gene pool in the next
generation will be proportional to

    $\alpha x_n^2 + 2\beta x_n(1 - x_n) + \gamma(1 - x_n)^2$,

and the number of a alleles will be proportional to

    $\alpha x_n^2 + \beta x_n(1 - x_n)$.

Therefore,

    $x_{n+1} = \frac{\alpha x_n^2 + \beta x_n(1 - x_n)}{\alpha x_n^2 + 2\beta x_n(1 - x_n) + \gamma(1 - x_n)^2}$.

6. (a) Show that if $0 < x_0 < 1$, then $0 < x_n < 1$ for all $n$.
(b) Show that if $\alpha = \beta = \gamma$, then $x_n$ remains constant in all generations.

7. Suppose that $\alpha > \beta > \gamma$. Prove that $x_n \to 1$ as $n \to \infty$.

8. Suppose that $\alpha < \beta < \gamma$. Prove that $x_n \to 0$ as $n \to \infty$.

9. Suppose that $\alpha < \beta$ and $\gamma < \beta$. Find numerical evidence that $\{x_n\}$ converges
to a number $x$ satisfying $0 < x < 1$ and characterize $x$ analytically.

Projects

1. The Fibonacci sequence, 1, 2, 3, 5, 8, 13, ..., is given by the recursive
formula

    $F_{n+1} = F_n + F_{n-1}$,

where $F_1 = 1$ and $F_2 = 2$. Let $a_n = F_n / F_{n-1}$.

(a) Suppose that $\{a_n\}$ converges to a limit. What must that limit be?
Hint: divide the above equation by $F_n$ to find an equation relating
$a_{n+1}$ and $a_n$.

(b) Show that $3/2 \le a_n \le 2$ for all $n \ge 2$.

(c) For each $n \ge 2$, prove that

    $|a_{n+1} - a_n| \le (2/3) |a_n - a_{n-1}|$.

(d) Prove that for each $m \ge 2$,

    $|a_{m+1} - a_m| \le (2/3)^{m-1} |a_2 - a_1|$.

(e) Use the inequality in (d) to show that $\{a_n\}$ is a Cauchy sequence
and therefore converges to a limit. Hint: express

    $a_n - a_m = (a_n - a_{n-1}) + (a_{n-1} - a_{n-2}) + \cdots + (a_{m+1} - a_m)$

and use the geometric series.

2. The purpose of this project is to derive the result of Section 2.3 more easily
using the tools of linear algebra.

(a) Show that 1 and $(1 - p - q)$ are the two eigenvalues of the matrix $A$.
(b) Find a nonsingular matrix $Q$ so that $A = QDQ^{-1}$, where $D$ is the
diagonal matrix

    $D = \begin{pmatrix} 1 & 0 \\ 0 & 1 - p - q \end{pmatrix}$.

(c) Explain why $A^n = QD^nQ^{-1}$.
(d) Prove that

lim An ( Po ) = ( p+9 ^
rwoo V <lo J V Ufi /

3. We are interested in the price of a commodity which is traded at regular
intervals. We let $Q_k$ denote the supply of the commodity, $D_k$ the demand
for the commodity, and $p_k$ the price at the $k$-th time. The demand depends
on the current price,

    $D_k = a + b p_k$,

and the supply depends on the previous price,

    $Q_k = c + d p_{k-1}$.

(a) Explain why it is reasonable to take $a$, $c$, and $d$ to be positive and $b$
to be negative.
(b) Suppose we make the assumption that the supply is always equal to
the demand. Find the difference equation satisfied by the sequence
$\{p_k\}$.
(c) Suppose that the sequence of prices $\{p_k\}$ converges to a limiting
price $p$. What must $p$ be?
(d) Find a condition on the coefficients so that you can prove that $p_k \to
p$. Why is it reasonable that the conditions depend on $d$ and $b$?
(e) Use specific choices of the coefficients and find numerical evidence
that the prices can oscillate wildly if the condition is not satisfied.
(f) Iterate the equation for $p_k$ and use the partial sum of the geometric
series to prove that the sequence does not converge if the coefficients
do not satisfy the condition in (d).

4. The gas CO2 is produced by the cellular metabolism of the body and
is removed from the body when one exhales. There is a mechanism in
the brain which senses the CO2 level and sends signals to the breathing
mechanism to take deeper breaths if the CO2 level is too high. Let $V_n$ be
the volume of the $n$th breath and let $C_n$ be the CO2 concentration at the
time of the $n$th breath. Let $M$ be the (assumed) constant CO2 concentration
produced by the metabolism between each breath. Then, a simple
model of this system is:
$$C_{n+1} = C_n - \alpha V_n + M \qquad (19)$$
$$V_{n+1} = \beta C_n \qquad (20)$$
The second equation says that the volume of the $(n+1)$st breath is proportional
to the CO2 concentration at the time of the $n$th breath. The first
equation says that the CO2 concentration $C_n$ is lowered by an amount
proportional to the volume of the $n$th breath and raised by $M$. Given the
initial concentration $C_0$ and breath size $V_0$, we would like to determine
the behavior of the sequences $\{C_n\}$ and $\{V_n\}$. Our discussion follows
[10], where more information can be found.

(a) Show that the constant sequences $C_n = M/(\alpha\beta)$ and $V_n = M/\alpha$ solve
(19) and (20).
(b) Define $c_n = C_n - M/(\alpha\beta)$ and $v_n = V_n - M/\alpha$. Show that the
sequences $\{c_n\}$ and $\{v_n\}$ satisfy
$$c_{n+1} = c_n - \alpha v_n \qquad (21)$$
$$v_{n+1} = \beta c_n. \qquad (22)$$
(c) Show that $\{c_n\}$ satisfies the second order difference equation,
$$c_{n+1} - c_n + \alpha\beta\,c_{n-1} = 0. \qquad (23)$$

(d) Show that for appropriate choices of $\lambda_1$ and $\lambda_2$,
$$c_n = d_1\lambda_1^n + d_2\lambda_2^n$$
solves (23) for any choice of $d_1$ and $d_2$.

(e) For any choice of $V_0$ and $C_0$, show that $d_1$ and $d_2$ can be chosen so
that
$$C_n = d_1\lambda_1^n + d_2\lambda_2^n + M/(\alpha\beta)$$
$$V_n = \beta\,(d_1\lambda_1^{n-1} + d_2\lambda_2^{n-1}) + M/\alpha$$
solve the original problem.

(f) Prove that if $4\alpha\beta < 1$, then $C_n \to M/(\alpha\beta)$ and $V_n \to M/\alpha$ as
$n \to \infty$, so that whatever the initial concentration and breath size, the
breathing settles down to a steady rate.

(g) Show that if $1 < 4\alpha\beta < 4$, the concentrations exhibit oscillations
which die out.
(h) Show that if $\alpha$ and $\beta$ are large enough, the size of breaths and the
CO2 concentration can exhibit larger and larger oscillations. Can
you explain this analytically?
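
The behavior described in parts (f) and (h) can be explored by iterating (19) and (20) directly. The sketch below is only an illustration: the function name `simulate_breathing` and all parameter values are assumptions chosen for the experiment, not values from the text, and the model is iterated as written even when it produces physically meaningless negative values.

```python
def simulate_breathing(alpha, beta, metabolism, c0, v0, steps):
    """Iterate C_{n+1} = C_n - alpha*V_n + M and V_{n+1} = beta*C_n."""
    concentrations, volumes = [c0], [v0]
    for _ in range(steps):
        c, v = concentrations[-1], volumes[-1]
        concentrations.append(c - alpha * v + metabolism)   # equation (19)
        volumes.append(beta * c)                             # equation (20)
    return concentrations, volumes


M = 1.0  # illustrative value

# Case 4*alpha*beta = 0.8 < 1: the iterates should settle down.
damped_c, damped_v = simulate_breathing(0.5, 0.4, M, c0=8.0, v0=1.0, steps=200)
print(damped_c[-1], M / (0.5 * 0.4))   # compare with M/(alpha*beta)
print(damped_v[-1], M / 0.5)           # compare with M/alpha

# Case alpha*beta = 1.5 > 1: deviations from M/(alpha*beta) should grow.
growing_c, _ = simulate_breathing(1.5, 1.0, M, c0=1.0, v0=1.0, steps=60)
deviations = [c - M / 1.5 for c in growing_c]
print(max(abs(d) for d in deviations[:10]), max(abs(d) for d in deviations[-10:]))
```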

5. We shall prove a theorem in number theory which uses the Bolzano-Weierstrass
theorem. The proof is longer and more difficult than most of
the proofs in Chapter 2. The project is to read and understand the proof,
filling in all the details, and to prepare a lecture on it for presentation to
the class.
For each positive integer $n$, there is a unique nonnegative integer $j_n$ so
that $n$ lies in the half-open interval
$$2\pi j_n \le n < 2\pi j_n + 2\pi.$$
Define
$$\alpha_n = n - 2\pi j_n.$$
The number $\alpha_n$, which is denoted $n \pmod{2\pi}$ in number theory, is just the
remainder when $n$ is divided by $2\pi$.

□ Theorem. The set of limit points of the sequence $\{\alpha_n\}$ is the entire
interval $[0, 2\pi]$.

Proof. Let $\varepsilon > 0$ and $N$ be given. We will show that for any $x \in [0, 2\pi)$,
there is an $n > N$ so that
$$|\alpha_n - x| < \varepsilon. \qquad (24)$$

We first note that $n \ne m$ implies that $\alpha_n \ne \alpha_m$. If $\alpha_n = \alpha_m$, then
$$m - n = 2\pi\,(j_m - j_n),$$
but this would imply that $\pi$ is a quotient of integers and therefore rational.
Since $\pi$ is irrational, we must have $\alpha_n \ne \alpha_m$. Now, we apply the
Bolzano-Weierstrass theorem to the sequence $\{\alpha_n\}$, which is bounded
because each $\alpha_n$ is in $[0, 2\pi)$. Since a subsequence converges, we can choose
a large integer $m > N$ and a larger integer $n > m$ so that $|\alpha_n - \alpha_m| < \varepsilon$.
Let $k = n - m$. Then
$$k = \alpha_n - \alpha_m + 2\pi\,(j_n - j_m).$$
There are two cases, depending on whether $\alpha_n - \alpha_m$ is positive or negative.
Suppose $\alpha_n - \alpha_m = \delta > 0$. Then,
$$\alpha_k = \alpha_n - \alpha_m = \delta$$
and since
$$2k = 2(\alpha_n - \alpha_m) + 4\pi\,(j_n - j_m),$$
we have $\alpha_{2k} = 2\delta$. Continuing in this manner, we see that the numbers
$\alpha_k, \alpha_{2k}, \alpha_{3k}, \ldots$ equal $\delta, 2\delta, 3\delta, \ldots$, respectively. Each $x \in [0, 2\pi]$ is therefore
within $\delta$ of one of the numbers $\alpha_k, \alpha_{2k}, \alpha_{3k}, \ldots$. Since $\delta < \varepsilon$, this proves
(24). The other case is handled similarly. If $\delta < 0$, then $\alpha_k = 2\pi + \delta$,
$\alpha_{2k} = 2\pi + 2\delta$, and so forth. As above, this proves that any $x$ is within $|\delta|$ of one
of the points $\alpha_k, \alpha_{2k}, \alpha_{3k}, \ldots$. □
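
The statement of the theorem can be probed numerically. The following is a small sketch, under the assumption that double-precision arithmetic is accurate enough for the modest values of n used; the helper names `alpha` and `first_close_index`, the targets, and the tolerance 0.01 are all illustrative choices. For each target x it searches for an index n at or beyond a given starting index whose remainder lies within the tolerance of x.

```python
import math

TWO_PI = 2 * math.pi


def alpha(n):
    """Return alpha_n = n - 2*pi*j_n, the remainder of n on division by 2*pi."""
    return math.fmod(n, TWO_PI)


def first_close_index(x, eps, start=1, limit=100_000):
    """Return the first n with start <= n <= limit and |alpha_n - x| < eps, or None."""
    for n in range(start, limit + 1):
        if abs(alpha(n) - x) < eps:
            return n
    return None


for target in (0.5, 3.0, 6.0):
    n = first_close_index(target, eps=0.01, start=1000)
    print(target, n, alpha(n))
```

A search like this only illustrates the theorem for particular targets and tolerances; the proof above is what guarantees that the search succeeds for every x, every tolerance, and every starting index.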
CHAPTER 3

The Riemann Integral

In this chapter we define what it means for a function to be continuous
and we construct the Riemann integral. Continuity is defined in Section 3.1,
and we prove several fundamental theorems about continuous
functions on closed intervals in Section 3.2, for example, that they have
maxima and minima. The Riemann integral for continuous functions on
finite intervals is introduced in Section 3.3, where we show that the integral
is well defined and has the properties that we expect. In Section 3.4,
Taylor's theorem (from Section 4.3) is used to prove error estimates for
numerical integration techniques. The integration theory is extended to
functions with discontinuities in Section 3.5 and to unbounded continuous
functions and functions on infinite intervals in Section 3.6.

3.1 Continuity
Throughout the first few chapters of this book we study functions which
are defined on subsets of $\mathbb{R}$ and take values in $\mathbb{R}$. Recall that the set
on which a function $f$ is defined is called its domain and is denoted by
$\mathrm{Dom}(f)$. Normally the functions we deal with will be defined on intervals
(open, closed, or half open) or unions of intervals. For example, the
function $f(x) = (2 - x^2)^{-1}$ is naturally defined on the union of the intervals
$(-\infty, -\sqrt{2})$, $(-\sqrt{2}, \sqrt{2})$, and $(\sqrt{2}, \infty)$, but not at the points $\sqrt{2}$ or
$-\sqrt{2}$. Occasionally, functions are defined on more complicated sets.

Definition. A function $f$ is said to be continuous at a point $c$ in $\mathrm{Dom}(f)$
if for every sequence $\{x_n\}$ of points of $\mathrm{Dom}(f)$ which converges to $c$ we
have
$$\lim_{n\to\infty} f(x_n) = f(c). \qquad (1)$$
$f$ is said to be continuous if it is continuous at every $c$ in $\mathrm{Dom}(f)$.



The definition says that a function is continuous at a point if, as one gets
closer to the point, the function values approach the function value at
the point.

Example 1 Consider the function $f(x) = x^2$ whose domain is the whole
real line $\mathbb{R}$. Let $c \in \mathbb{R}$ be given and suppose $x_n \to c$. Then,
$$\lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} x_n^2 = \Big(\lim_{n\to\infty} x_n\Big)^2 = c^2 = f(c),$$
where we used Theorem 2.2.5 in the second step. Thus, $f$ is continuous
at $c$. Since $c$ was arbitrary, $f$ is continuous on $\mathbb{R}$.

The idea in Example 1 can easily be generalized to give a proof that
all polynomials are continuous functions on $\mathbb{R}$ (problem 6 of Section 2.2).
Let $p(x) = b_0 + b_1 x + b_2 x^2 + \ldots + b_N x^N$. We can take all of $\mathbb{R}$ as the domain
of $p$ since $p$ is well defined for all real numbers. Choose any $c \in \mathbb{R}$ and
suppose that $x_n \to c$. Then, using Theorems 2.2.3, 2.2.4, and 2.2.5,
$$\begin{aligned}
\lim_{n\to\infty} p(x_n) &= \lim_{n\to\infty} \{b_0 + b_1 x_n + b_2 x_n^2 + \ldots + b_N x_n^N\} \\
&= \lim_{n\to\infty} b_0 + \lim_{n\to\infty} b_1 x_n + \lim_{n\to\infty} b_2 x_n^2 + \ldots + \lim_{n\to\infty} b_N x_n^N \\
&= b_0 + b_1 \lim_{n\to\infty} x_n + b_2 \lim_{n\to\infty} x_n^2 + \ldots + b_N \lim_{n\to\infty} x_n^N \\
&= b_0 + b_1 \lim_{n\to\infty} x_n + b_2 \Big(\lim_{n\to\infty} x_n\Big)^2 + \ldots + b_N \Big(\lim_{n\to\infty} x_n\Big)^N \\
&= b_0 + b_1 c + b_2 c^2 + \ldots + b_N c^N \\
&= p(c).
\end{aligned}$$
Thus $p$ is continuous at $x = c$. Since $c$ was arbitrarily chosen, we have
shown that $p$ is continuous on $\mathbb{R}$. Notice that $p$ is therefore automatically
continuous on any subset $S$ contained in $\mathbb{R}$. Therefore, if a function is
made up of pieces of polynomials (as in Example 2, below), the only
points where the function might not be continuous are the points where
the pieces match up, that is, where the defining formula of the function
changes.

Example 2 Consider the following function defined on $[0, 2]$:
$$f(x) = \begin{cases} 2x & \text{if } 0 \le x < 1 \\ 2 - \tfrac{1}{2}x^2 & \text{if } 1 \le x \le 2. \end{cases}$$
Since $f$ is made up of pieces of polynomials and polynomials are continuous,
$f$ is continuous on the interval $[0, 1)$ and on the interval $(1, 2]$.
In Figure 3.1.1, it looks as if $f$ is not continuous at $x = 1$ because the values of
$f$ as $x$ approaches 1 from the left approach 2, while $f(1) = 1.5$. To see that
the criterion in the definition is violated, let $x_n = 1 - \frac{1}{n}$. Then, $x_n \to 1$ as $n \to \infty$,
but
$$\lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} 2\Big(1 - \frac{1}{n}\Big) = 2 \ne f(1).$$
Figure 3.1.1
Thus, $f$ is continuous at all points of $[0, 2]$ except 1.
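
The failure of the sequence criterion at x = 1 can be seen by tabulating a few values. This is a minimal sketch in Python; the function name `f` mirrors the example, while the particular values of n are arbitrary choices.

```python
def f(x):
    """Piecewise function of Example 2, defined on [0, 2]."""
    if 0 <= x < 1:
        return 2 * x
    if 1 <= x <= 2:
        return 2 - 0.5 * x ** 2
    raise ValueError("x is outside the domain [0, 2]")


# Values of f along the sequence x_n = 1 - 1/n, which converges to 1.
left_values = [f(1 - 1 / n) for n in (10, 100, 1000, 10000)]
print(left_values)   # these approach 2
print(f(1))          # but the value at the limit point is 1.5
```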

One of the nicest properties of continuous functions is that when they
are added or multiplied the result is again a continuous function.

□ Theorem 3.1.1 Let $f$ and $g$ be continuous functions and define
$D = \mathrm{Dom}(f) \cap \mathrm{Dom}(g)$. Then,

(a) $f + g$ is continuous on $D$.

(b) For any constant $\alpha$, the function $\alpha f$ is continuous on $\mathrm{Dom}(f)$.

(c) $fg$ is continuous on $D$.

(d) $f/g$ is continuous at all $x \in D$ such that $g(x) \ne 0$.

Proof. The proofs of all four parts follow easily from Theorems 2.2.3 -
2.2.6 in Section 2.2. For example, to prove (a), let $c$ be a point of $D$. Suppose
that $\{x_n\}$ is a sequence of points such that $x_n \in D$ and $x_n \to c$. Since
$f$ is continuous at $c$, we know that $f(x_n) \to f(c)$. Since $g$ is continuous
at $c$ we know that $g(x_n) \to g(c)$. Therefore, by Theorem 2.2.3,
$$\lim_{n\to\infty} \big(f(x_n) + g(x_n)\big) = \lim_{n\to\infty} f(x_n) + \lim_{n\to\infty} g(x_n) = f(c) + g(c).$$
Thus, by definition, $f + g$ is continuous at $c$. Since $c$ was an arbitrary
point of $D$, we have proven (a). The other parts are proved similarly. □

We will see later that $\sin x$, $\cos x$, $e^{ax}$, and many other elementary
functions are continuous on $\mathbb{R}$. Because of Theorem 3.1.1, any algebraic
combination of these functions and polynomials is a continuous function
on $\mathbb{R}$, except possibly where denominators vanish. In particular, parts (a)
and (b) show that the set of continuous functions on a set $D$, denoted by
$C(D)$, is a vector space; that is, linear combinations of continuous functions
are again continuous functions. Vector spaces are defined formally
in Section 5.8. The composition of continuous functions is also continuous.

□ Theorem 3.1.2 Let $f$ and $g$ be continuous functions and define
$$D = \{x \mid x \in \mathrm{Dom}(g) \text{ and } g(x) \in \mathrm{Dom}(f)\}.$$
Then $f \circ g$ is continuous on $D$.

Proof. Let $c$ be a point of $D$ and suppose $x_n \in D$ and $x_n \to c$. By the
definition of $D$, the numbers $g(x_n)$ and $g(c)$ are in $\mathrm{Dom}(f)$. Further, since
$g$ is continuous, $g(x_n) \to g(c)$. Thus, since $f$ is continuous, $f(g(x_n)) \to
f(g(c))$, which proves that $f \circ g$ is continuous. □

The definition of $D$ looks complicated but it merely expresses the fact
that $f$ can only be evaluated at points in $\mathrm{Dom}(f)$. Therefore $g(x)$ must
be in $\mathrm{Dom}(f)$ and $x$ must be in $\mathrm{Dom}(g)$ in order to evaluate $f(g(x))$.

Example 3 Since $\sin x$, $\cos x$, $e^{ax}$, and polynomials are continuous on $\mathbb{R}$,
all compositions such as
$$\sin(x^2 - 3x), \qquad e^{3\cos x}, \qquad \sin(\cos x)$$
are automatically continuous on $\mathbb{R}$.

Example 4 Let's consider two examples where we have to be careful
about domains. As you recall from calculus, the function $\ln x$ is defined
for all $x > 0$. Later we will see that $\ln x$ is continuous. Thus $\ln(f(x))$
is defined only at those $x$ where $f(x) > 0$. But on that set, $\ln(f(x))$
is automatically continuous if $f$ is a continuous function, by Theorem
3.1.2. A slightly more complicated example is the function $\tan(\ln x)$.
Since $\tan x = \sin x/\cos x$, the domain of the tangent function consists of all
$x \ne \pi(2n + 1)/2$ where $n$ is an integer. Therefore, we must avoid those $x$
such that $\ln x = \pi(2n + 1)/2$, that is, the points of the form $e^{\pi(2n+1)/2}$. Of
course, $\ln x$ is defined only if $x > 0$. Thus,
$$\mathrm{Dom}(\tan(\ln x)) = \{x \mid x > 0 \text{ and } x \ne e^{\pi(2n+1)/2} \text{ for any } n\}.$$
By Theorem 3.1.2, $\tan(\ln x)$ is continuous on this domain.

Sometimes, it is convenient to have a criterion for continuity that
does not involve sequences. We use this criterion in the next section
when we define uniform continuity.

□ Theorem 3.1.3 A function $f(x)$ is continuous at a point $c$ in $\mathrm{Dom}(f)$ if
and only if for each $\varepsilon > 0$ there is a $\delta > 0$ such that for all $x$ in $\mathrm{Dom}(f)$:
$$|x - c| \le \delta \text{ implies } |f(x) - f(c)| \le \varepsilon. \qquad (2)$$

Proof. First suppose that the $\varepsilon$-$\delta$ condition holds. We will show that
$f$ is continuous at $c$. Suppose that $x_n \in \mathrm{Dom}(f)$ and $x_n \to c$. Let $\varepsilon > 0$ be
given and choose $\delta$ so that (2) holds. Since $x_n \to c$, we can choose an $N$ so
that $|x_n - c| \le \delta$ for $n \ge N$. Thus, by (2), we know that $|f(x_n) - f(c)| \le \varepsilon$
for $n \ge N$. Thus, by the definition of convergence for sequences, the
sequence $\{f(x_n)\}$ converges to $f(c)$. Therefore, $f$ is continuous at $c$.
To show that continuity implies the $\varepsilon$-$\delta$ condition, we give a contrapositive
proof. That is, assume that there exists an $\varepsilon > 0$ so that there is
no $\delta > 0$ so that (2) holds. We'll show that $f$ is not continuous at $c$. Let $\delta_n = 1/n$.
Since (2) does not hold, there is an $x_n \in \mathrm{Dom}(f)$ satisfying
$$|x_n - c| \le \delta_n \qquad (3)$$
such that
$$|f(x_n) - f(c)| > \varepsilon. \qquad (4)$$
From (3), it follows that $x_n \to c$ since $\delta_n \to 0$. But by (4), the sequence
$\{f(x_n)\}$ cannot converge to $f(c)$ since each $f(x_n)$ is always a distance of
more than $\varepsilon$ away from $f(c)$. Therefore $f$ is not continuous at $c$. □

This theorem suggests a natural notation. For each $c$, we say that
$$\lim_{x\to c} f(x) = L \qquad (5)$$
if and only if
$$\lim_{n\to\infty} f(x_n) = L$$
for all sequences $\{x_n\}$ in $\mathrm{Dom}(f)$, with $x_n \ne c$, that converge to $c$. We
assume in this definition that there is at least one such sequence (if not,
(5) would be true for any $L$), which means that $c$ can be approached
from $\mathrm{Dom}(f)$. According to the proof of Theorem 3.1.3, (5) is equivalent
to saying that for each $\varepsilon > 0$ there is a $\delta > 0$ such that for all $x$ in $\mathrm{Dom}(f)$:
$$0 < |x - c| \le \delta \text{ implies } |f(x) - L| \le \varepsilon.$$
Note that, in general, $c$ need not be in $\mathrm{Dom}(f)$ nor need $L$ be a value of
$f$. For example, the natural domain of $f(x) = x^{-1}\sin x$ is the set of all
$x \ne 0$. We will show later that
$$\lim_{x\to 0} \frac{\sin x}{x} = 1.$$
Of course, if $f$ is continuous and $c$ is in $\mathrm{Dom}(f)$, then $f(c) = L$.
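
The sequence form of the limit can be illustrated numerically for the function sin x / x. The sketch below is not a proof; the function name `sin_ratio` and the two sample sequences are illustrative assumptions. It evaluates the function along two sequences of nonzero points converging to 0, one from each side.

```python
import math


def sin_ratio(x):
    """Evaluate sin(x)/x on its natural domain, x != 0."""
    if x == 0:
        raise ValueError("0 is not in the domain of sin(x)/x")
    return math.sin(x) / x


sequences = {
    "x_n = 1/n": [1 / n for n in (1, 10, 100, 1000)],
    "x_n = -1/n^2": [-1 / n ** 2 for n in (1, 10, 100, 1000)],
}
for name, points in sequences.items():
    print(name, [sin_ratio(x) for x in points])
```

Both lists of values approach 1, which is consistent with the limit stated above even though 0 itself is not in the domain.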



Problems

1. Students are often told in calculus that continuous functions are those
functions “whose graphs you can draw without picking your pencil up
from the paper.” Write a paragraph explaining the relation between this
informal idea and the formal definition in this section.
2. Prove part (c) of Theorem 3.1.1.
3. Let $f(x)$ be a continuous function. Prove that $|f(x)|$ is a continuous function.
4. Where is the function $\ln(\sin x)$ defined and continuous?
5. Suppose that $f$ is a continuous function on $\mathbb{R}$ such that $f(q) = 0$ for all
$q \in \mathbb{Q}$. Prove that $f(x) = 0$ for all $x \in \mathbb{R}$.

6. Suppose that the function $f$ is defined only on the integers. Explain why
it is continuous.
7. Let $f(x) = 3x - 1$ and let $\varepsilon > 0$ be given. How small must $\delta$ be chosen so
that $|x - 1| \le \delta$ implies $|f(x) - 2| \le \varepsilon$?
8. Let $f(x) = x^2$ and let $\varepsilon > 0$ be given.

(a) Find a $\delta$ so that $|x - 1| \le \delta$ implies $|f(x) - 1| \le \varepsilon$.
(b) Find a $\delta$ so that $|x - 2| \le \delta$ implies $|f(x) - 4| \le \varepsilon$.
(c) If $n > 2$ and you had to find a $\delta$ so that $|x - n| \le \delta$ implies
$|f(x) - n^2| \le \varepsilon$, would the $\delta$ be larger or smaller than the $\delta$ for parts (a) and
(b)? Why?

9. Let $f(x) = 3x^3 - 2$ and let $\varepsilon > 0$ be given. Find a $\delta$ so that $|x - 1| \le \delta$
implies $|f(x) - 1| \le \varepsilon$.
10. Suppose that a function $f$ is continuous at a point $c$ and $f(c) > 0$. Prove
that there is a $\delta > 0$ so that for all $x \in \mathrm{Dom}(f)$,
$$|x - c| \le \delta \text{ implies } f(x) \ge \frac{f(c)}{2}.$$

11. Let $f(x) = \sqrt{x}$ with domain $\{x \mid x \ge 0\}$.

(a) Let $\varepsilon > 0$ be given. For each $c > 0$, show how to choose $\delta$ so that
$|x - c| \le \delta$ implies $|\sqrt{x} - \sqrt{c}| \le \varepsilon$. Hint: write
$$\sqrt{x} - \sqrt{c} = \frac{x - c}{\sqrt{x} + \sqrt{c}}.$$
(b) Give a separate argument to show that $f$ is continuous at zero.

12. (a) Sketch a rough graph of the function $f(x) = e^{-1/x^2}$ defined on the
domain $\{x \mid x \ne 0\}$.
(b) Prove that $\lim_{x\to 0} e^{-1/x^2}$ exists.
(c) Show that $f$ can be defined at zero in such a way that it is a continuous
function on $\mathbb{R}$.

13. (a) Generate the graph of the function $f(x) = \sin\frac{1}{x}$ on the interval $(0, 1]$.
(b) Find sequences $\{\alpha_n\}$ and $\{\beta_n\}$ of numbers in $(0, 1]$, so that $\alpha_n \to 0$,
$\beta_n \to 0$, and
$$\lim_{n\to\infty} \sin\frac{1}{\alpha_n} = 1, \qquad \lim_{n\to\infty} \sin\frac{1}{\beta_n} = -1.$$
(c) Conclude from part (b) that $\lim_{x\to 0} \sin\frac{1}{x}$ does not exist.
(d) Can $f$ be defined at 0 so that $f$ is continuous on $[0, 1]$?
(e) Can $g(x) = x\sin\frac{1}{x}$ be defined at 0 so that $g$ is continuous on $[0, 1]$?

3.2 Continuous Functions on Closed Intervals


In the last section we proved several simple properties of continuous
functions. We use the word “simple” because the proofs depend only on
the idea of convergent sequence and the limit theorems in Section 2.2. In
this section we prove several deeper properties of continuous functions
which depend on the notion of Cauchy sequence and the completeness
of the real numbers.

Definition. A real-valued function $f$ defined on a set $S$ is said to be
bounded if there exists a real number $B$ so that
$$|f(x)| \le B \text{ for all } x \in S. \qquad (6)$$

□ Theorem 3.2.1 A continuous function on a closed finite interval is
bounded.

Proof. We give a proof by contradiction. Suppose that $f$ is continuous
on $[a, b]$ but no such $B$ exists. Then for each $n$ there is an $x_n \in [a, b]$
such that $|f(x_n)| \ge n$. Since the sequence $\{x_n\}$ is contained in $[a, b]$, the
Bolzano-Weierstrass theorem and its corollary guarantee that $\{x_n\}$ has a
subsequence $\{x_{n_k}\}$ that converges to a point $c$ in $[a, b]$ as $k \to \infty$. Since
$f$ is continuous, $f(x_{n_k}) \to f(c)$ as $k \to \infty$. But this is impossible since
the numbers $|f(x_{n_k})|$ diverge to $+\infty$. Therefore a $B$ must exist so that (6)
holds. □

If $f$ is a function on a set $S$, we define the supremum of $f$ on the set
$S$, denoted $\sup_S f$, to be
$$\sup_S f = \sup\,\{f(x) \mid x \in S\}.$$
That is, $\sup_S f$ is just the supremum of the set of values of $f$ on $S$. Similarly,
we define the infimum of $f$ on $S$ by
$$\inf_S f = \inf\,\{f(x) \mid x \in S\}.$$
Note that $\sup_S f$ could be $+\infty$ and $\inf_S f$ could be $-\infty$, depending on
the function $f$ and the set $S$. For example, if $f(x) = x^2$ and $S = \mathbb{R}$, then
$\sup_S f = +\infty$ and $\inf_S f = 0$. Even in the case where $\sup_S f$ is finite, we
do not necessarily know that there is a point $c$ in $S$ so that $f(c) = \sup_S f$,
as the following example shows.

Example 1 Let $f(x) = x^2$ be defined on the set $S = (0, 2)$. See Figure 3.2.1.
Then it is easy to see that $\sup_{(0,2)} f = 4$ and $\inf_{(0,2)} f = 0$, but there are
no points in $(0, 2)$ where $f$ takes on these values. Note that had we included
the endpoints of the interval $(0, 2)$ in the domain of definition of $f$, then there
would have been points (0 and 2) where the values of $f$ were equal to $\sup_S f$
and $\inf_S f$.

Figure 3.2.1

The example suggests that a continuous function on a closed interval


assumes its supremum and infimum.

□ Theorem 3.2.2 Let $f$ be a continuous function on a closed interval $[a, b]$.
Then, there exist points $c$ and $d$ in $[a, b]$ such that
$$f(c) = \sup_{[a,b]} f \quad \text{and} \quad f(d) = \inf_{[a,b]} f.$$

Proof. According to Theorem 3.2.1, the set of values of the function $f$
on $[a, b]$ is a bounded set, so $\sup_{[a,b]} f$ and $\inf_{[a,b]} f$ are finite numbers. Let
$M = \sup_{[a,b]} f$. For each $n$, there must be a point $x_n$ in $[a, b]$ so that
$$M - \frac{1}{n} \le f(x_n) \le M, \qquad (7)$$
because, otherwise, $M$ wouldn't be the supremum of the values of $f$.
Since the interval is finite, the Bolzano-Weierstrass theorem guarantees
that the sequence $\{x_n\}$ has a subsequence $\{x_{n_k}\}$ that converges to a point
$c$ in $[a, b]$ as $k \to \infty$. Since $f$ is continuous,
$$\lim_{k\to\infty} f(x_{n_k}) = f(c).$$
But, from (7), we see that
$$\lim_{k\to\infty} f(x_{n_k}) = M.$$
Thus, we must have $f(c) = M$ since a convergent sequence can have
only one limit. The proof of the existence of $d$ is similar. □

□ Theorem 3.2.3 (The Intermediate Value Theorem) Let $f$ be a continuous
function on a closed interval $[a, b]$ such that $f(a) \ne f(b)$. Let $y$ be a
real number between $f(a)$ and $f(b)$; that is, either $f(a) < y < f(b)$ or
$f(a) > y > f(b)$. Then there is a $c$ in $(a, b)$ such that $f(c) = y$.

Proof. We shall consider the case $f(a) < y < f(b)$; the proof of the other
case is similar. Let
$$S = \{x \in [a, b] \mid f(x) \le y\}$$
and define $c = \sup S$. Note that $S$ is nonempty since it contains $a$ and
$S$ is bounded by $b$. Thus, $c$ is a well defined element of $[a, b]$. Since $f$
is continuous at $a$ and $f(a) < y$, there is a $\delta > 0$ so that $f(x) < y$ if
$a \le x < a + \delta$. Thus $c > a$. We shall show that $f(c) = y$.

Figure 3.2.2

For each $n$ we can choose an $x_n \in S$ that is in the interval $[c - \frac{1}{n}, c]$; see
problem 3 of Section 2.5. Since $x_n \to c$ and $f$ is continuous, $f(x_n) \to f(c)$.
See Figure 3.2.2. Thus, $f(c) \le y$ since $f(x_n) \le y$ for all $n$. If $f(c) < y$,
then if $\beta > 0$ is small enough, $f(c + \beta) < y$ because $f$ is continuous. Since
this contradicts the definition of $c$, we conclude that $f(c) = y$. □

Figure 3.2.3

The Intermediate Value Theorem says something that seems almost
obvious. If the graph of a continuous function $f$ starts below the height
$y$ and ends above it, then there must be a point where it crosses the horizontal
line at height $y$. As in Figure 3.2.3, there can be more than one
such point. If the result is “obvious,” why is it so hard to prove? Consider
the function $f(x) = x^2 - 2$ on the interval $[0, 2]$. Since $f(0) = -2$
and $f(2) = 2$, the Intermediate Value Theorem guarantees that there is a
number $c \in [0, 2]$ so that $c^2 - 2 = 0$; that is, $c = \sqrt{2}$. Since $\sqrt{2}$ is irrational,
if our number system contained only rational numbers, then the “graph”
of $x^2 - 2$ would cross the $x$-axis without meeting it. Thus, the completeness
of $\mathbb{R}$ must be involved in a deep way in the proof of the Intermediate
Value Theorem. It is, since we used the fact that a bounded set has a least
upper bound, which is equivalent to the axiom of completeness.
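
The theorem guarantees that a crossing point exists but the proof above does not compute it. One standard way to approximate such a point is bisection, which is a different technique from the supremum argument used in the proof: it repeatedly halves an interval on which the function changes sign. The following Python sketch applies it to the example $x^2 - 2$ on $[0, 2]$; the function name `bisect_root` and the tolerance are illustrative assumptions.

```python
def bisect_root(f, a, b, tol=1e-12):
    """Approximate a point c in [a, b] with f(c) = 0.

    Assumes f is continuous on [a, b] and f(a), f(b) have opposite signs
    (or one of them is zero), so the Intermediate Value Theorem applies.
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0:
            return mid
        if fa * fm < 0:
            b = mid            # sign change lies in [a, mid]
        else:
            a, fa = mid, fm    # sign change lies in [mid, b]
    return (a + b) / 2


root = bisect_root(lambda x: x * x - 2, 0.0, 2.0)
print(root, root * root)
```

Each step keeps an interval on which the function changes sign, so the theorem guarantees that the interval always contains a zero.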

Corollary 3.2.4 Let $f$ be a continuous function on a closed interval $[a, b]$
and define $m = \inf_{[a,b]} f$ and $M = \sup_{[a,b]} f$. Then, the range of $f$ is the
interval $[m, M]$.

Proof. According to Theorem 3.2.2, there exist $c$ and $d$ in $[a, b]$ so that
$f(c) = M$ and $f(d) = m$. By Theorem 3.2.3, for every $y \in [m, M]$ there
is an $x$ between $c$ and $d$ so that $f(x) = y$. Thus, the range of $f$ contains
$[m, M]$. By the definitions of $M$ and $m$, the range cannot contain any
points outside $[m, M]$, so the range equals $[m, M]$. □

We now study more deeply the criterion for continuity introduced in
Theorem 3.1.3. In order to prove continuity of $f$ at $c$, one must show that,
given $\varepsilon > 0$, we can choose a $\delta$ so that (2) holds. In general, the size of $\delta$
may depend on what point $c$ we are looking at. Intuitively, we will have
to choose $\delta$ smaller if the function $f$ is rising or falling faster.

Definition. A continuous function $f$ defined on $\mathrm{Dom}(f)$ is said to be
uniformly continuous if for each $\varepsilon > 0$ there is a $\delta > 0$ so that for all $x$ in
$\mathrm{Dom}(f)$ and all $c$ in $\mathrm{Dom}(f)$
$$|x - c| \le \delta \text{ implies } |f(x) - f(c)| \le \varepsilon. \qquad (8)$$
Thus, to show that a function is uniformly continuous one must be able
to find a $\delta$ that is independent of $c$.

Example 2 Consider the function $f(x) = x^2$, with domain equal to the
whole real line, $\mathbb{R}$. Let $\varepsilon > 0$ be given and choose a point $c$. Then,
$$|f(x) - f(c)| = |x - c|\,|x + c|.$$
Choose a $\delta$ and suppose that $x$ satisfies $|x - c| \le \delta$. Then,
$$|f(x) - f(c)| \le \delta\,|x + c|.$$
We want to see how small $\delta$ must be so that the right-hand side is less
than $\varepsilon$. Since $x$ is within $\delta$ of $c$, we know that $|x| \le |c| + \delta$. Therefore if
$\delta \le 1$, we have
$$|f(x) - f(c)| \le \delta(|x| + |c|) \le \delta(|c| + \delta + |c|) \le \delta(2|c| + 1).$$
Now we can see how to choose $\delta$. If $\delta \le 1$ and
$$\delta \le \frac{\varepsilon}{2|c| + 1},$$
then
$$|x - c| \le \delta \text{ implies } |f(x) - f(c)| \le \varepsilon.$$
Thus we have proven that $f(x)$ is continuous at $c$, and we see explicitly
how the choice of $\delta$ depends on $c$. As $|c|$ gets larger, we have to choose
$\delta$ smaller. This makes sense because the slope of $f$ gets steeper as $x$ gets
larger. So, to make the difference $|f(x) - f(c)|$ small, we have to choose
$\delta$ smaller as $c$ gets larger. Notice, however, that $\delta = \frac{\varepsilon}{2N+1}$ is a suitable
$\delta$ for all $c$ in $[-N, N]$ (provided $\delta \le 1$). Thus $f(x) = x^2$ is uniformly continuous on each
interval $[-N, N]$ but not on the whole real line $\mathbb{R}$.
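
The dependence of the admissible tolerance on the point c can be checked numerically. The sketch below, with hypothetical helper names `delta_for_square`, `gap_at`, and `worst_gap`, samples points c in the interval from -N to N, moves a distance just under the tolerance in each direction, and records the largest change in x squared. It then reuses the same tolerance at a point far outside the interval to show that it no longer works there.

```python
def delta_for_square(eps, N):
    """Tolerance from the example for f(x) = x^2 on [-N, N], capped at 1."""
    return min(1.0, eps / (2 * N + 1))


def gap_at(c, delta):
    """Largest |x^2 - c^2| over the two test points x = c -+ 0.999*delta."""
    return max(abs(x * x - c * c) for x in (c - 0.999 * delta, c + 0.999 * delta))


def worst_gap(eps, N, samples=2001):
    """Largest gap over evenly spaced sample points c in [-N, N]."""
    delta = delta_for_square(eps, N)
    return max(gap_at(-N + 2 * N * i / (samples - 1), delta) for i in range(samples))


eps, N = 0.1, 10
delta = delta_for_square(eps, N)
print(worst_gap(eps, N))       # stays below eps on [-N, N]
print(gap_at(1000.0, delta))   # the same delta fails badly near c = 1000
```

A sampled check is only suggestive, but it matches the estimate derived in the example.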

What we found in Example 2 is a general phenomenon.

□ Theorem 3.2.5 Let $f$ be a continuous function on a closed finite interval
$[a, b]$. Then, $f$ is uniformly continuous on $[a, b]$.

Proof. We will give a proof by contradiction. Suppose that $f$ is not
uniformly continuous. Then, there exists an $\varepsilon_0 > 0$ so that we cannot choose
any $\delta > 0$ with the property that
$$|x - c| \le \delta \text{ implies } |f(x) - f(c)| \le \varepsilon_0$$
for all $x$ and $c$ in the interval. Set $\delta_n = 2^{-n}$. Then, for each $n$ there are
points $x_n$ and $c_n$ in $[a, b]$ so that
$$|x_n - c_n| \le \delta_n \quad \text{and} \quad |f(x_n) - f(c_n)| > \varepsilon_0. \qquad (9)$$
Since $[a, b]$ is a finite interval, the Bolzano-Weierstrass theorem guarantees
that there exists a subsequence $\{x_{n_k}\}$ that converges to a point $d$ in
$[a, b]$. Furthermore,
$$|c_{n_k} - d| \le |c_{n_k} - x_{n_k}| + |x_{n_k} - d| \le 2^{-n_k} + |x_{n_k} - d|$$
so $c_{n_k} \to d$ also as $k \to \infty$. Since $f$ is continuous, $f(x_{n_k}) \to f(d)$ and
$f(c_{n_k}) \to f(d)$ as $k \to \infty$. This is impossible since $|f(x_{n_k}) - f(c_{n_k})| > \varepsilon_0$
for each $k$. Thus $f$ is uniformly continuous on $[a, b]$. □

The following criterion guarantees uniform continuity (problem 7).

Definition. A function $f$ defined on a set $S \subseteq \mathbb{R}$ is said to be Lipschitz
continuous on $S$ if there exists an $M$ so that
$$\frac{|f(x) - f(c)|}{|x - c|} \le M$$
for all $x$ and $c$ in $S$ such that $x \ne c$.

Problems

1. Prove that $f(x) = x^3 - 4x + 2$ has a zero in the interval $[0, 1]$.
2. Prove that all cubic polynomials have at least one real root.
3. Let $f$ be a continuous function on a finite interval $[a, b]$. Suppose that
$f(x) > 0$ for all $x$ in $[a, b]$. Prove that there is an $\alpha > 0$ such that $f(x) \ge \alpha$
for all $x$ in $[a, b]$.
4. Let $f$ and $g$ be continuous functions on the finite interval $[a, b]$. Suppose
that $f(x) > g(x)$ for all $x$ in $[a, b]$. Prove that there is an $\alpha > 0$ such that
$f(x) \ge \alpha + g(x)$ for all $x$ in $[a, b]$.
5. Let $f$ and $g$ be continuous functions on the finite interval $[a, b]$. Suppose
that $f(x) < g(x)$ for all $x$ in $[a, b]$. Prove that there is an $\alpha < 1$ such
that $f(x) \le \alpha g(x)$ for all $x$ in $[a, b]$. Hint: consider the sets
$\{x \mid f(x) \ge (1 - \frac{1}{n})\,g(x)\}$.
6. Let $f$ be a continuous function on a finite interval $[a, b]$. Suppose that
$f(a) < f(b)$ and choose $c$ and $d$ so that $f(a) < c < d < f(b)$. Define
$$S = \{x \mid c \le f(x) \le d\}.$$
(a) Show by example that $S$ need not be a single interval.
(b) Suppose that $x < y$ implies that $f(x) < f(y)$ (such a function is
called monotone increasing). Prove that in this case $S$ is an interval.

7. Prove that if $f$ is Lipschitz continuous on a set $S \subseteq \mathbb{R}$ then $f$ is uniformly
continuous on $S$.
8. In problem 13 of Section 3.1 we showed that $g(x) = x\sin\frac{1}{x}$ with domain
$(0, 1]$ can be defined at zero in such a way that the extended function is
continuous on $[0, 1]$. Is the extended function uniformly continuous on
$[0, 1]$?
9. Show that the function $f(x) = \frac{1}{x}$ is not uniformly continuous on the interval
$(0, \infty)$ but is uniformly continuous on any interval of the form $[\mu, \infty)$
if $\mu > 0$.
10. Prove that $f(x) = \sqrt{x}$ is uniformly continuous on $[0, \infty)$.
11. Show by example that a function can be uniformly continuous without
being Lipschitz continuous. Hint: consider $f(x) = \sqrt{x}$.

12. A function $f$ defined on $\mathbb{R}$ is said to be periodic with period $p$ if $f(x + p) =
f(x)$ for all $x \in \mathbb{R}$. Prove that if $f$ is periodic and continuous, then $f$ is
uniformly continuous.

13. Use the result in Project 5 of Chapter 2 and the fact that $\sin x$ is a continuous
function to prove that the set of limit points of the sequence $\{\sin n\}$
is the entire interval $[-1, 1]$.

3.3 The Riemann Integral


In this section we use many of the technical tools which we have developed
and the uniform continuity of continuous functions on closed finite
intervals to show the existence and properties of the Riemann integral.
Let $f$ be a bounded function on the interval $[a, b]$. We define a partition,
$P$, of the interval to be any finite collection of points
$$x_0 < x_1 < x_2 < \cdots < x_N$$
such that $x_0 = a$ and $x_N = b$. For each of the $N$ subintervals, $[x_{i-1}, x_i]$,
we define
$$M_i = \sup_x\,\{f(x) \mid x_{i-1} \le x \le x_i\}$$
$$m_i = \inf_x\,\{f(x) \mid x_{i-1} \le x \le x_i\}.$$

Imagine that $f$ is a positive continuous function such as the one pictured
in Figure 3.3.1. Although we each have an intuitive idea of “the area
under the curve between $a$ and $b$,” it is not clear how to define “area”
technically. However we define area, it should have the property that
$$\sum_{i=1}^{N} m_i(x_i - x_{i-1}) \le \text{area} \le \sum_{i=1}^{N} M_i(x_i - x_{i-1})$$
because the sum on the left is the sum of the areas of the rectangles lying
under the curve in Figure 3.3.1(a), and the sum on the right is the sum
of the areas of the rectangles whose tops are over the curve in Figure
3.3.1(b). The sum on the left is called a lower sum and is denoted by
$L_P(f)$ because it depends on $f$ and on the partition $P$.

Similarly, the sum on the right is called an upper sum and is denoted by
$U_P(f)$. For any bounded function $f$ and any particular partition $P$ we
always know that $L_P(f) \le U_P(f)$ since $m_i \le M_i$ for each $i$.
Intuitively, upper sums should approximate the area from above and
lower sums from below. And we should get better approximations to the
“area” if we pick partitions with more points and shorter bases $(x_i - x_{i-1})$
for the rectangles. The following lemma shows that if we add points to a
partition, then the upper sum cannot increase and the lower sum cannot
decrease. The second lemma shows that, indeed, all upper sums are
larger than all lower sums.

Lemma 1. Let $Q$ be a partition which contains the points of $P$ and some
additional points. Then $L_P(f) \le L_Q(f)$ and $U_Q(f) \le U_P(f)$.

Proof. We add the additional points to the partition $P$ one at a time. Let
$Q_1$ denote the partition with one new point $y$ added in the $j$th interval:
$$x_0 < x_1 < \ldots < x_{j-1} < y < x_j < \ldots < x_N.$$
Then,
$$U_P(f) = \sum_{i=1}^{j-1} M_i(x_i - x_{i-1}) + M_j(x_j - x_{j-1}) + \sum_{i=j+1}^{N} M_i(x_i - x_{i-1})$$
$$U_{Q_1}(f) = \sum_{i=1}^{j-1} M_i(x_i - x_{i-1}) + \Big(\sup_{x_{j-1} \le x \le y} f(x)\Big)(y - x_{j-1}) + \Big(\sup_{y \le x \le x_j} f(x)\Big)(x_j - y) + \sum_{i=j+1}^{N} M_i(x_i - x_{i-1}).$$
Since,
$$\Big(\sup_{x_{j-1} \le x \le y} f(x)\Big)(y - x_{j-1}) + \Big(\sup_{y \le x \le x_j} f(x)\Big)(x_j - y) \le M_j(y - x_{j-1}) + M_j(x_j - y) = M_j(x_j - x_{j-1}),$$
we conclude that $U_{Q_1}(f) \le U_P(f)$. Now we add a second point to $Q_1$
and make the same argument to show that $U_{Q_2}(f) \le U_{Q_1}(f)$ and so
forth. Continuing in this way, we find that $U_Q(f) \le U_P(f)$. The proof
that $L_P(f) \le L_Q(f)$ is similar. □

Lemma 2 Let $P$ and $Q$ be partitions. Then $L_P(f) \le U_Q(f)$.

Proof. Consider the partition $P \cup Q$, made up of both the points of $P$
and the points of $Q$. Then $L_{P \cup Q}(f) \le U_{P \cup Q}(f)$. Since $P \cup Q$ contains $P$
and $P \cup Q$ contains $Q$, Lemma 1 implies that
$$L_P(f) \le L_{P \cup Q}(f) \le U_{P \cup Q}(f) \le U_Q(f).$$
Therefore, $L_P(f) \le U_Q(f)$. □

Thus, each upper sum is larger than each lower sum, so by Theorem
2.5.3,
$$\sup_P\,\{L_P(f)\} \le \inf_P\,\{U_P(f)\}. \qquad (10)$$
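
Lemmas 1 and 2 can be checked on a concrete case. The sketch below is restricted, as a simplifying assumption, to an increasing function, for which the infimum on each subinterval is attained at the left endpoint and the supremum at the right endpoint; the function name `lower_upper_sums` and the three partitions of the interval from 0 to 2 are illustrative choices.

```python
def lower_upper_sums(f, partition):
    """Return (L_P(f), U_P(f)) for an INCREASING function f.

    For an increasing f, m_i = f(x_{i-1}) and M_i = f(x_i) on [x_{i-1}, x_i].
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper


def square(x):
    return x * x


P = [0.0, 0.5, 2.0]
Q = [0.0, 0.5, 1.0, 1.5, 2.0]   # contains the points of P (Lemma 1)
R = [0.0, 0.7, 1.9, 2.0]        # unrelated to P and Q (Lemma 2)

L_P, U_P = lower_upper_sums(square, P)
L_Q, U_Q = lower_upper_sums(square, Q)
L_R, U_R = lower_upper_sums(square, R)
print(L_P, L_Q, U_Q, U_P)
print(L_R, U_R)
```

In this example the refinement Q raises the lower sum and lowers the upper sum relative to P, and every lower sum stays below every upper sum, including across the unrelated partitions.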

Definition. A bounded function $f$ on a finite interval $[a, b]$ is said to be
Riemann integrable if
$$\sup_P\,\{L_P(f)\} = \inf_P\,\{U_P(f)\}, \qquad (11)$$
in which case we define the integral of $f$ over $[a, b]$ by
$$\int_a^b f(x)\,dx = \inf_P\,\{U_P(f)\}.$$
The Riemann integral is named after Georg Riemann (1826 - 1866).

Lemma 3 Let $f$ be a bounded function on $[a, b]$. Suppose that for each
$\varepsilon > 0$ there is a partition $P$ so that
$$U_P(f) - L_P(f) \le \varepsilon. \qquad (12)$$
Then $f$ is Riemann integrable.

Proof. Define $\alpha = \inf_P\,\{U_P(f)\} - \sup_P\,\{L_P(f)\}$. Equation (10) shows
that $\alpha \ge 0$. Suppose that $\alpha > 0$. For every partition $P$, $U_P(f) \ge
\inf_P\,\{U_P(f)\}$ and $L_P(f) \le \sup_P\,\{L_P(f)\}$, so $U_P(f) - L_P(f) \ge \alpha$. Thus,
(12) cannot hold for any partition if $\varepsilon < \alpha$. Since we are assuming (12),
we conclude that $\alpha = 0$. This proves, by definition (11), that $f$ is Riemann
integrable. □

□ Theorem 3.3.1 A continuous function on a closed interval is Riemann
integrable.

Proof. Let f be continuous on [a, b]. We shall show that the criterion
of Lemma 3 is satisfied. Since f is continuous on [a, b], we know from
Theorem 3.2.5 that it is uniformly continuous. Therefore, given $\varepsilon > 0$,
we can choose $\delta > 0$ so that

$$|x - y| \le \delta \quad \text{implies} \quad |f(x) - f(y)| \le \frac{\varepsilon}{b - a}.$$

Let P be any partition such that the maximum subinterval length is less
than or equal to $\delta$. Since f is continuous, by Theorem 3.2.2 there are
points $c_i$ and $d_i$ in each subinterval $[x_{i-1}, x_i]$ such that $f(c_i) = M_i$ and
$f(d_i) = m_i$. Since $|c_i - d_i| \le \delta$ for each i, we therefore know that
$M_i - m_i \le \varepsilon/(b - a)$. Thus,

$$\sum_{i=1}^{N} M_i (x_i - x_{i-1}) - \sum_{i=1}^{N} m_i (x_i - x_{i-1}) = \sum_{i=1}^{N} (M_i - m_i)(x_i - x_{i-1}) \le \sum_{i=1}^{N} \frac{\varepsilon}{b - a}(x_i - x_{i-1}) = \varepsilon.$$

Since we have found a partition so that (12) holds, Lemma 3 guarantees
that f is Riemann integrable. □
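
The criterion of Lemma 3 can be watched numerically. The following is a minimal sketch, not part of the text: the function name `upper_lower_sums` and the choice of $\sin x$ on $[0, \pi]$ are ours, and the sup and inf on each subinterval are only estimated by sampling. It prints the gap between the upper and lower sums as the partition is refined.

```python
import math

def upper_lower_sums(f, a, b, n, samples=50):
    """Estimate U_P(f) and L_P(f) on a uniform partition with n subintervals.

    The sup and inf on each subinterval are estimated by sampling, which is
    exact here only because sin is monotone on each subinterval we use.
    """
    h = (b - a) / n
    upper = 0.0
    lower = 0.0
    for i in range(n):
        left = a + i * h
        values = [f(left + h * j / samples) for j in range(samples + 1)]
        upper += max(values) * h
        lower += min(values) * h
    return upper, lower

# The gap U_P(f) - L_P(f) shrinks as the subintervals get shorter.
for n in (10, 100, 1000):
    u, l = upper_lower_sums(math.sin, 0.0, math.pi, n)
    print(n, u - l)
```

Because the gap can be made as small as we like, Lemma 3 applies, which is exactly the argument in the proof above.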

Note that this theorem proves that the integral of a continuous function
exists, but it doesn't tell us how to compute it. If the function f is
the derivative of a function that we know, then we can evaluate the integral
by using the Fundamental Theorem of Calculus (Section 4.2). If not,
which is often the case, then we must use some method of approximation.
If P is a partition and $x_i^*$ is a point in the ith subinterval $[x_{i-1}, x_i]$,
we call

$$\sum_{i=1}^{N} f(x_i^*)(x_i - x_{i-1})$$

a Riemann sum for f corresponding to the partition P. Note that there
are many possible Riemann sums, depending on how we choose the
points $x_i^*$. For example, if we choose each $x_i^*$ to be the point in each
$[x_{i-1}, x_i]$ where the maximum is taken on, then we get an upper sum.
Similarly, if we choose $x_i^*$ to be the point in each subinterval where the
minimum is taken on, then we get a lower sum.

Corollary 3.3.2 Suppose that f is a continuous function on [a, b], and
let $\{P_k\}$ be a sequence of partitions such that the maximum length of
the subintervals goes to zero as $k \to \infty$. Let $S_k$ be any Riemann sum
corresponding to $P_k$. Then

$$S_k \to \int_a^b f(x)\,dx \quad \text{as } k \to \infty.$$

Proof. Let $\varepsilon > 0$ be given and choose $\delta$ as in the proof of Theorem 3.3.1.
Choose K so that $k \ge K$ implies that the maximum subinterval length is
less than $\delta$. For all k,

$$L_{P_k}(f) \le S_k \le U_{P_k}(f)$$

and

$$L_{P_k}(f) \le \int_a^b f(x)\,dx \le U_{P_k}(f).$$

Therefore, since $U_{P_k}(f) - L_{P_k}(f) \le \varepsilon$ for $k \ge K$, we conclude that

$$\left| S_k - \int_a^b f(x)\,dx \right| \le \varepsilon$$

for $k \ge K$. This proves that $S_k \to \int_a^b f(x)\,dx$ as $k \to \infty$. □
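
Corollary 3.3.2 says the choice of the points in each subinterval does not matter in the limit. As a small illustration (a sketch under our own assumptions: the helper `riemann_sum`, the random choice of points, and the test function $x^2$ on $[0, 1]$, whose integral is $1/3$, are not from the text), we can pick the points at random and watch the sums converge anyway.

```python
import random

def riemann_sum(f, a, b, n, rng):
    """Riemann sum on a uniform partition with one random point per subinterval."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        point = a + (i + rng.random()) * h   # some point of [x_{i-1}, x_i]
        total += f(point) * h
    return total

rng = random.Random(0)
exact = 1.0 / 3.0                            # integral of x^2 over [0, 1]
for n in (10, 100, 1000, 10000):
    s = riemann_sum(lambda x: x * x, 0.0, 1.0, n, rng)
    print(n, abs(s - exact))
```

Since $x^2$ is increasing on $[0, 1]$, the upper and lower sums differ by exactly $1/n$ here, so every printed error is at most $1/n$ no matter how the points are chosen.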

In practice, the Riemann sums are evaluated by machine computation,
and one is very interested in choosing the $x_i^*$ so that the convergence
of the $S_k$ is rapid. This is the subject of Section 3.4. For the moment, we
use Riemann sums to verify that the Riemann integral has the properties
that we expect.

□ Theorem 3.3.3 Let f and g be continuous functions on the interval [a, b]
and let $\alpha$ and $\beta$ be constants. Then

$$\int_a^b (\alpha f(x) + \beta g(x))\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx. \tag{13}$$

Proof. Let $\{P_k\}$ be a sequence of partitions such that the maximum
length of the subintervals goes to zero as $k \to \infty$. For each k, we denote
the points of the partition $P_k$ by $x_0^k < x_1^k < \dots < x_N^k$. Note that N
depends on k. In each subinterval $[x_{i-1}^k, x_i^k]$ we choose a point $x_i^{k*}$. Then,

$$\sum_{i=1}^{N} (\alpha f(x_i^{k*}) + \beta g(x_i^{k*}))(x_i^k - x_{i-1}^k) = \alpha \sum_{i=1}^{N} f(x_i^{k*})(x_i^k - x_{i-1}^k) + \beta \sum_{i=1}^{N} g(x_i^{k*})(x_i^k - x_{i-1}^k).$$

By Corollary 3.3.2, the sequence of Riemann sums on the left converges
to the integral on the left of (13) as $k \to \infty$. Each of the sums on the right
converges to the corresponding integral on the right of (13). Thus the
result follows from Theorem 2.2.3 and Theorem 2.2.4. □

□ Theorem 3.3.4 Let f and g be continuous functions on the interval [a, b]
and suppose that $f(x) \le g(x)$ for all $x \in [a, b]$. Then,

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \tag{14}$$

Proof. Let $\{P_k\}$ and $x_i^{k*}$ be as above. Then,

$$\sum_{i=1}^{N} f(x_i^{k*})(x_i^k - x_{i-1}^k) \le \sum_{i=1}^{N} g(x_i^{k*})(x_i^k - x_{i-1}^k).$$

Since the Riemann sums on the left converge to the integral on the left of
(14) and the Riemann sums on the right converge to the integral on the
right, we conclude from problem 3 of Section 2.4 that (14) holds. □

□ Theorem 3.3.5 Let f be a continuous function on the interval [a, b].
Then,

$$\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \tag{15}$$

Proof. Let $\{P_k\}$ and $x_i^{k*}$ be as above. By the triangle inequality,

$$\left| \sum_{i=1}^{N} f(x_i^{k*})(x_i^k - x_{i-1}^k) \right| \le \sum_{i=1}^{N} |f(x_i^{k*})|(x_i^k - x_{i-1}^k).$$

The Riemann sums on the left converge to the integral on the left of (15)
and the Riemann sums on the right converge to the integral on the right,
so we conclude that (15) holds. □

Theorems 3.3.4 and 3.3.5 can be combined to give the following fundamental
integral estimate.

Corollary 3.3.6 Let f be a continuous function on the interval [a, b].
Then,

$$\left| \int_a^b f(x)\,dx \right| \le (b - a) \sup_{[a, b]} |f(x)|. \tag{16}$$

□ Theorem 3.3.7 Let f be a continuous function on the interval [a, b] and
suppose that $a < c < b$. Then,

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{17}$$

The proof of Theorem 3.3.7 is left to the student (see problem 6). If
$f(x) \ge 0$ for all $x \in [a, b]$, we define the area under f and above the x-axis
between a and b to be $\int_a^b f(x)\,dx$. Note that by Theorem 3.3.4, the area is
always nonnegative. If $f(x) \le 0$, we define the area above f and below
the x-axis to be $\int_a^b |f(x)|\,dx$. In this case also, the area is nonnegative.
In our definition of the Riemann integral $\int_a^b f(x)\,dx$, we assumed that
$a < b$. If $a > b$ we define

$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx.$$

The Riemann integral on $\mathbb{R}^2$ is considered in project 4 of Chapter 4.

Problems
1. Let f be the function on [0, 1] given by

$$f(x) = \begin{cases} 0 & \text{if } x \text{ is rational} \\ 1 & \text{if } x \text{ is irrational.} \end{cases}$$

Explain why $U_P(f) = 1$ and $L_P(f) = 0$ for every partition P. Is f Riemann
integrable?

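2. Let f be the function on [0, 1] given by

$$f(x) = \begin{cases} 1 & \text{if } x \ne \tfrac{1}{2} \\ 2 & \text{if } x = \tfrac{1}{2}. \end{cases}$$

Prove that f is Riemann integrable and compute $\int_0^1 f(x)\,dx$. Hint: for
each $\varepsilon > 0$, find a partition P so that $U_P(f) - L_P(f) \le \varepsilon$ and use Lemma
3.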

3. Suppose that f is Riemann integrable on [a, b] and $f(x) \ge 0$ for all x.

(a) Prove that $\int_a^b f(x)\,dx \ge 0$.

(b) Prove that if $\int_a^b f(x)\,dx = 0$ and f is continuous, then $f(x) = 0$ for
all x in [a, b].
(c) Find a counterexample which shows that the conclusion of part (b)
may not hold if the hypothesis of continuity is removed.

4. Let $f(x) = 3x$ on the interval [0, 1] and let $\varepsilon > 0$ be given.

(a) Explain how to construct explicitly a partition P so that
$U_P(f) - L_P(f) \le \varepsilon$.

(b) Compute $\int_0^1 f(x)\,dx$ without using the Fundamental Theorem of
Calculus.

5. Let f be a continuous function on the interval [a, b]. Explain why
$\int_a^b f(x)\,dx$ can be interpreted as “the sum of the areas above the x-axis minus
the areas below.” Explain why Theorem 3.3.4, Theorem 3.3.5, Corollary
3.3.6, and Theorem 3.3.7 make sense in terms of areas.

6. Prove Theorem 3.3.7.

7. Let $f(x) = x^2$ and let P be a partition of the interval [1, 2] into subintervals
of length $\delta$. Compute $U_P(f)$ and $L_P(f)$ when $\delta = 0.5$, $\delta = 0.2$, and $\delta = 0.1$.

8. Let $f(x) = \frac{1}{x}$. Find a partition P of the interval [1, 3] so that
$U_P(f) - L_P(f) \le 10^{-2}$. What can you conclude about $\int_1^3 \frac{1}{x}\,dx$?

9. Use Theorem 3.3.4 to prove that

$$e - 1 \le \int_0^1 \sqrt{1 + x}\,e^x\,dx \le \sqrt{2}\,(e - 1).$$

10. Let f(x) — x + .l(sina;)3. Estimate the integral f“ f(x) dx from above and
below.

11. Let / be the function in problem 10. Estimate ef (®) dx from above and
below.

12. Let f be a continuous function on [a, b] and suppose that $\delta > 0$.

(a) Prove that $\int_a^b f(x)\,dx = \lim_{\delta \to 0} \int_{a+\delta}^{b} f(x)\,dx$.

(b) Prove that $\int_a^b f(x)\,dx = \lim_{\delta \to 0} \int_{a+\delta}^{b-\delta} f(x)\,dx$.

13. Let f be a continuous function on the interval [a, b] and define

$$F(x) = \int_a^x f(t)\,dt.$$

Prove that F is Lipschitz continuous on [a, b].

14. Let f be a continuous function on the interval [a, b]. Suppose that for
every $a_1 \in [a, b]$ and $b_1 \in [a, b]$ we know that

$$\int_{a_1}^{b_1} f(x)\,dx = 0.$$

Prove that $f(x) = 0$ for all x.

15. Let f be a continuous function on the interval [a, b]. Prove that there exists
an $x \in [a, b]$ so that

$$f(x) = \frac{1}{b - a} \int_a^b f(t)\,dt.$$

Why does this make sense geometrically?

3.4 Numerical Methods

In the last section we saw that the Riemann integral of a continuous function
can be approximated by a Riemann sum. Indeed, the integral was
defined as the limit of such approximations. Sometimes one can use the
Fundamental Theorem of Calculus to evaluate an integral exactly, but often
some numerical method must be used. In this section we show how
to use analytic techniques to derive error estimates for several different
approximation schemes. Our purpose is not to find the “best” methods
or estimates, but to show how analytical tools can be used.
Consider a particular rectangle in the approximation by the lower
sum in Figure 3.3.1(a). See Figure 3.4.1(a). The error is the
area above the rectangle and below the curve. How big will this error
be? In each interval we are approximating the function f by a constant
function (whose height is the minimum of f in the interval). The error
will, therefore, depend on how quickly f moves up from this constant
value. That is, the error should depend on the derivative of f. If f has
a small derivative, then the graph of f will be almost flat in a small interval
and the approximation will be good. If f has a large derivative,
then the graph of f will rise steeply and the approximation will be poor.
Thus, what we want is an explicit estimate which says how good an approximation
the Riemann sum is in terms of the size of the derivative of
f. Throughout this section we use the properties of differentiation introduced
in Chapter 4; in particular, we use the Mean Value Theorem from
Section 4.2 and Taylor's theorem from Section 4.3.
From now on, we will always choose partitions which divide [a, b]
into subintervals of equal length. So, if there are N subintervals, then
each has length

$$h = \frac{b - a}{N}.$$

We choose points $x_i^* \in [x_{i-1}, x_i]$ and denote by

$$R_N = \sum_{i=1}^{N} f(x_i^*)(x_i - x_{i-1}) = h \sum_{i=1}^{N} f(x_i^*)$$

a Riemann sum with N subintervals.

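□ Theorem 3.4.1 Let f be a continuously differentiable function on the
interval [a, b] and suppose that $|f'(x)| \le M$ for all $x \in [a, b]$. Then,

$$\left| \int_a^b f(x)\,dx - R_N \right| \le M h (b - a).$$

Proof. For every $x \in [x_{i-1}, x_i]$, the Mean Value Theorem guarantees that
there is a $\xi$ between x and $x_i^*$ such that

$$f(x) - f(x_i^*) = f'(\xi)(x - x_i^*),$$

so

$$|f(x) - f(x_i^*)| = |f'(\xi)|\,|x - x_i^*| \le M h.$$

Therefore,

$$\left| \int_a^b f(x)\,dx - \sum_{i=1}^{N} f(x_i^*)(x_i - x_{i-1}) \right| = \left| \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} (f(x) - f(x_i^*))\,dx \right|$$

$$\le \sum_{i=1}^{N} \left| \int_{x_{i-1}}^{x_i} (f(x) - f(x_i^*))\,dx \right|$$

$$\le \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} |f(x) - f(x_i^*)|\,dx$$

$$\le \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} M h\,dx$$

$$= M h (b - a). \qquad □$$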

Notice that the theorem gives an upper bound for the error independent
of how the points $x_i^*$ in the Riemann sum are chosen. We can use the
left endpoints, the right endpoints, the midpoints, or the points where f
achieves maxima or minima. The actual error will depend on the $x_i^*$, and
some choices may be better than others.

Example 1 Suppose that we wish to evaluate $\int_{0.5}^{1.5} e^{-x^2/2}\,dx$, which,
up to the normalizing factor $1/\sqrt{2\pi}$, is the probability that a standard normal
random variable will take on a value between 0.5 and 1.5. Let $f(x) = e^{-x^2/2}$. Since
$f'(x) = -x e^{-x^2/2}$ and $f''(x) = (x^2 - 1) e^{-x^2/2}$, it is easy to check that the maximal
values of $|f'(x)|$ occur at $x = \pm 1$. Thus for all $x \in [0.5, 1.5]$, we have
$|f'(x)| \le e^{-1/2}$. Since $b - a = 1$, Theorem 3.4.1 guarantees that

$$\left| \int_{0.5}^{1.5} e^{-x^2/2}\,dx - R_N \right| \le h e^{-1/2}$$

for any Riemann sum with N subintervals of length h. Therefore, if we
want to be sure that the Riemann sum is within $10^{-6}$ of the true value of
the integral, we need to choose $h \le 10^{-6}\sqrt{e}$ or $N \ge 10^6/\sqrt{e}$.
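
The arithmetic in Example 1 can be checked by machine. The sketch below is ours, not the text's: it uses left-endpoint sums, and it takes the reference value of the integral from the identity $\int_a^b e^{-x^2/2}\,dx = \sqrt{\pi/2}\,(\operatorname{erf}(b/\sqrt{2}) - \operatorname{erf}(a/\sqrt{2}))$ via Python's `math.erf`. It computes the number of subintervals the bound asks for and compares the actual error with the bound $h e^{-1/2}$ for a few small N.

```python
import math

def f(x):
    return math.exp(-x * x / 2.0)

def left_riemann_sum(g, a, b, n):
    """Riemann sum using the left endpoint of each of n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + i * h) for i in range(n))

a, b = 0.5, 1.5
M = math.exp(-0.5)        # bound on |f'| on [0.5, 1.5], from Example 1
tol = 1e-6

# M*h*(b - a) <= tol with h = (b - a)/N gives N >= M*(b - a)^2/tol.
n_required = math.ceil(M * (b - a) ** 2 / tol)
print("N required by the bound:", n_required)

# Reference value from the error function (used only for comparison).
reference = math.sqrt(math.pi / 2.0) * (
    math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0))
)

for n in (10, 100, 1000):
    h = (b - a) / n
    error = abs(left_riemann_sum(f, a, b, n) - reference)
    print(n, error, "bound:", M * h * (b - a))
```

In each row the observed error sits below the bound, as Theorem 3.4.1 guarantees, and both shrink only in proportion to h.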

The method of approximating integrals by general Riemann sums is
called a first-order method because the error bound, $E(h) = Mh(b - a)$,
depends on the first power of h. You can see from the example why first-order
methods aren't very good. To get an accuracy of h, we have to sum
over approximately 1/h intervals (ignoring the $\sqrt{e}$). If $h = 10^{-6}$ this is
definitely not something to do on a hand calculator! But there is another
problem too. Any computer, whether it is a hand calculator, a PC, or a
supercomputer, may make a small round-off error on each arithmetic operation.
After a large number of arithmetic operations, the accumulated
round-off errors may corrupt the answer one is trying to compute. For
example, suppose that the round-off error is $\varepsilon$. To compute the Riemann
sum $R_N$ one has to evaluate the function N times, add the N values together,
and multiply by h. Assuming that each evaluation and operation
has a possible error of $\varepsilon$, the Riemann sum which we compute may differ
by as much as $2N\varepsilon$ from the actual Riemann sum. Thus our computation
may differ from the real value of the integral by as much as

$$E(h) = Mh(b - a) + \frac{2\varepsilon(b - a)}{h}. \tag{19}$$

Of course, E(h) is a bound for the error, not the error itself, which may
be smaller. Notice that $E(h) \to \infty$ as $h \to 0$. In fact, E(h) has a minimum
at $h_0 = \sqrt{2\varepsilon/M}$ and that minimum is

$$E(h_0) = 2\sqrt{2M\varepsilon}\,(b - a).$$

Thus there is no point in choosing $h < h_0$ since the error may just get
larger, and there is no way to choose h to guarantee that our approximation
is closer than $2\sqrt{2M\varepsilon}\,(b - a)$ to the correct value of the integral.
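
To see the trade-off between truncation error and round-off concretely, here is a small sketch. It assumes the bound has the form $E(h) = Mh(b - a) + 2\varepsilon(b - a)/h$, which is what the count of $2N$ operations with $N = (b - a)/h$ gives; the function name `error_bound` is ours, and the sample values $M = b - a = 1$, $\varepsilon = 10^{-9}$ are borrowed from problem 6.

```python
import math

def error_bound(h, M, eps, length):
    """E(h) = M*h*(b - a) + 2*eps*(b - a)/h, with length = b - a."""
    return M * h * length + 2.0 * eps * length / h

M, eps, length = 1.0, 1e-9, 1.0

# Setting E'(h) = M*(b - a) - 2*eps*(b - a)/h^2 = 0 gives h0.
h0 = math.sqrt(2.0 * eps / M)
print("h0 =", h0)
print("E(h0) =", error_bound(h0, M, eps, length))
print("2*sqrt(2*M*eps)*(b - a) =", 2.0 * math.sqrt(2.0 * M * eps) * length)

# The bound grows again on both sides of h0.
for h in (h0 / 10, h0 / 2, h0, 2 * h0, 10 * h0):
    print(h, error_bound(h, M, eps, length))
```

The table makes the point of the paragraph above: shrinking h below $h_0$ makes the guaranteed accuracy worse, not better.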
To get better methods for computing $\int_a^b f(x)\,dx$ we should approximate
f better on each subinterval. Since we need to be able to integrate
the approximations explicitly, polynomials are natural candidates. In a
Riemann sum, the function is approximated by a function which is constant
on each subinterval, so it is natural to try the next simplest thing,
linear approximations.

Figure 3.4.1. (a) A lower sum or left-hand endpoint approximation. (b) - (d): Different linear approximations. In all cases the error is shaded.

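On each subinterval $[x_{i-1}, x_i]$, we shall approximate f(x) by the first-order
Taylor polynomial

$$T^i(x) = f(x_{i-1}) + f'(x_{i-1})(x - x_{i-1}),$$

which has the same value as f at $x_{i-1}$ and the same derivative as f at
$x_{i-1}$. Let $T_N$ denote the sum of the integrals of the functions $T^i(x)$ on the
intervals $[x_{i-1}, x_i]$. That is,

$$T_N = \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} T^i(x)\,dx$$

$$= \sum_{i=1}^{N} f(x_{i-1})\,h + \sum_{i=1}^{N} f'(x_{i-1}) \int_{x_{i-1}}^{x_i} (x - x_{i-1})\,dx$$

$$= h \sum_{i=1}^{N} f(x_{i-1}) + \frac{h^2}{2} \sum_{i=1}^{N} f'(x_{i-1}).$$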

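□ Theorem 3.4.2 Let f be a twice continuously differentiable function on
[a, b] and suppose that $|f''(x)| \le M$ for all $x \in [a, b]$. Then

$$\left| \int_a^b f(x)\,dx - T_N \right| \le \frac{M h^2 (b - a)}{6}.$$

Proof. Suppose $x \in [x_{i-1}, x_i]$. By Taylor's theorem (Theorem 4.3.1), there
is a $\xi \in [x_{i-1}, x]$ such that

$$f(x) - T^i(x) = \frac{f''(\xi)}{2!}(x - x_{i-1})^2.$$

Thus,

$$\left| \int_a^b f(x)\,dx - T_N \right| = \left| \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} (f(x) - T^i(x))\,dx \right|$$

$$\le \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} |f(x) - T^i(x)|\,dx$$

$$\le \sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} \frac{M}{2!}(x - x_{i-1})^2\,dx$$

$$= \frac{M}{2!} \sum_{i=1}^{N} \frac{h^3}{3}$$

$$= \frac{M h^3 N}{6}$$

$$= \frac{M h^2 (b - a)}{6}. \qquad □$$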
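
As a quick check of Theorem 3.4.2, the sketch below evaluates $T_N = h \sum f(x_{i-1}) + \frac{h^2}{2} \sum f'(x_{i-1})$ for $f = \sin$ on $[0, \pi/2]$, where the integral is 1 and $|f''| \le 1$. The function name `taylor_rule` and the test integral are our choices for illustration, not the text's.

```python
import math

def taylor_rule(f, df, a, b, n):
    """T_N built from the first-order Taylor polynomial at each left endpoint."""
    h = (b - a) / n
    left_endpoints = [a + i * h for i in range(n)]        # x_0, ..., x_{N-1}
    return (h * sum(f(x) for x in left_endpoints)
            + 0.5 * h * h * sum(df(x) for x in left_endpoints))

a, b = 0.0, math.pi / 2.0
exact = 1.0           # integral of sin over [0, pi/2]
M = 1.0               # |f''(x)| = |sin x| <= 1

for n in (4, 8, 16, 32):
    h = (b - a) / n
    error = abs(taylor_rule(math.sin, math.cos, a, b, n) - exact)
    print(n, error, "bound:", M * h * h * (b - a) / 6.0)
```

Each doubling of N cuts both the error and the bound by roughly a factor of four, which is what a second-order scheme should do.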

Though natural, the approximation scheme of Theorem 3.4.2 is not
used in practice because one must compute (symbolically or numerically)
the derivative of f and evaluate it, as well as evaluate f itself.
We can avoid this by choosing the linear approximation which connects
$(x_{i-1}, f(x_{i-1}))$ and $(x_i, f(x_i))$ by a straight line,

$$L^i(x) = f(x_{i-1}) + \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}(x - x_{i-1}),$$

in each interval $[x_{i-1}, x_i]$. See Figure 3.4.1(c). This approximation scheme
gives the trapezoidal rule (problem 3), which is also order 2 and has the
error bound (20), where M is a bound for the second derivative of f. The
proof of the error bound is outlined in project 4.

$$E(h) = \frac{M h^2 (b - a)}{12} \tag{20}$$

One can test the order of any scheme by trying it out on a function
whose integral is known. If the scheme is order 1, then halving the size of
h should halve the size of the error. If the scheme is order 2, then halving
the size of h should cause the resulting error to be 1/4 of the former error.
If one tries this out on the special Riemann sum method

$$R_N = \sum_{i=1}^{N} f(\bar{x}_i)(x_i - x_{i-1}), \qquad \bar{x}_i = \frac{x_{i-1} + x_i}{2}, \tag{21}$$

one finds that it is order 2. This is surprising since we proved in Theorem
3.4.1 that Riemann sum methods are order 1. Of course the error
estimates are upper bounds for the error, so a particular Riemann sum
method might have higher order. In fact, this method, called the Midpoint
Rule for obvious reasons, is order 2. The following proof shows
why.

□ Theorem 3.4.3 Let f be a twice continuously differentiable function on
[a, b] and suppose that $|f''(x)| \le M$ for all $x \in [a, b]$. If $R_N$ is given by (21),
then

$$\left| \int_a^b f(x)\,dx - R_N \right| \le \frac{M h^2 (b - a)}{24}. \tag{22}$$

Proof. On each interval we approximate f by the linear Taylor polynomial
centered at the midpoint. Define

$$H^i(x) = f(\bar{x}_i) + f'(\bar{x}_i)(x - \bar{x}_i), \qquad x \in [x_{i-1}, x_i].$$

See Figure 3.4.1(d). By Taylor's theorem, there is a $\xi$ between $\bar{x}_i$ and x such that

$$f(x) - H^i(x) = \frac{f''(\xi)}{2}(x - \bar{x}_i)^2,$$

and thus

$$\int_{x_{i-1}}^{x_i} |f(x) - H^i(x)|\,dx \le \int_{x_{i-1}}^{x_i} \frac{M}{2}(x - \bar{x}_i)^2\,dx \tag{23}$$

$$= \frac{M}{3}\left(\frac{h}{2}\right)^3. \tag{24}$$

Summing over the N intervals gives the estimate on the right side of (22).
But why do the integrals of the $H^i(x)$ give the Midpoint Rule?

$$\int_{x_{i-1}}^{x_i} H^i(x)\,dx = f(\bar{x}_i) \int_{x_{i-1}}^{x_i} dx + f'(\bar{x}_i) \int_{x_{i-1}}^{x_i} (x - \bar{x}_i)\,dx.$$

The second term on the right is zero because $\bar{x}_i$ is the midpoint of
$[x_{i-1}, x_i]$. Thus,

$$\sum_{i=1}^{N} \int_{x_{i-1}}^{x_i} H^i(x)\,dx = \sum_{i=1}^{N} f(\bar{x}_i)(x_i - x_{i-1}). \qquad □$$
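
The halving test described in this section is easy to carry out. The following sketch (our own helper names; $e^x$ on $[0, 1]$, with integral $e - 1$, is chosen only because its integral is known) estimates the order p from the ratio of the errors at h and h/2, since an error of size $C h^p$ shrinks by the factor $2^p$ when h is halved.

```python
import math

def left_rule(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def midpoint_rule(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def trapezoid_rule(f, a, b, n):
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

def observed_order(rule, f, a, b, exact, n):
    """Estimate p in error ~ C*h^p from the errors with n and 2n subintervals."""
    error_h = abs(rule(f, a, b, n) - exact)
    error_half_h = abs(rule(f, a, b, 2 * n) - exact)
    return math.log2(error_h / error_half_h)

exact = math.e - 1.0      # integral of exp over [0, 1]
for name, rule in (("left endpoint", left_rule),
                   ("midpoint", midpoint_rule),
                   ("trapezoid", trapezoid_rule)):
    print(name, observed_order(rule, math.exp, 0.0, 1.0, exact, 64))
```

The left-endpoint sums show an order close to 1, while the midpoint and trapezoid rules show an order close to 2, in agreement with Theorem 3.4.3 and the bound (20).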

Problems
1. If we approximate the integral in Example 1 by using the method of Theorem
3.4.2, how small should h be chosen so that the error is $\le 10^{-4}$?

2. Show how to get an nth order scheme by generalizing the idea in Theorem
3.4.2.

3. Show that the linear approximation scheme with the $L^i(x)$ defined as in
this section gives the trapezoid rule

$$\int_a^b f(x)\,dx \approx \frac{h}{2}\left( f(x_0) + 2f(x_1) + 2f(x_2) + \dots + 2f(x_{N-1}) + f(x_N) \right).$$

4. Use the integrals in problem 5 to provide experimental evidence that the
Midpoint rule has order 2.

5. In each case below, compute how small h must be to guarantee that the
error in the left-hand endpoint Riemann sum method and the error in the
Midpoint rule are $\le 10^{-4}$:

(a) f^2 sinx dx.

6. Suppose that the error bound (including round-off) for a first-order numerical
scheme is given by (19).

(a) Show that if h is small the error bound increases as we make h still
smaller.
(b) For what h is E(h) smallest?

(c) Suppose that the error bound for a second-order method is given by

$$E_2(h) = M h^2 (b - a) + \frac{2\varepsilon(b - a)}{h}.$$

Find the minimum of $E_2(h)$ and compare it to the minimum of E(h)
in the special case $M = (b - a) = 1$ and $\varepsilon = 10^{-9}$.
(d) How small must h be for $E_2$ to be as small as the smallest value of
E? Why is this important?

7. Let f be a continuous function on the interval [a, b], which we divide into
N segments of length $h = (b - a)/N$. Instead of approximating f on each subinterval
$[x_{i-1}, x_i]$ by a linear function, we approximate f by the quadratic
function that has the same values as f does at $x_{i-1}$, at $x_i$, and at the midpoint
$\bar{x}_i$. Show that the sum of the integrals of these quadratic approximations is

$$\frac{h}{6} \sum_{i=1}^{N} \left( f(x_{i-1}) + 4f(\bar{x}_i) + f(x_i) \right).$$

This is called Simpson's rule.

8. Provide numerical evidence that Simpson's rule is order 4 by using it to
evaluate one of the integrals in problem 5.

9. If f is four times continuously differentiable and $|f^{(4)}(x)| \le M$ for all
$x \in [a, b]$, the error estimate for Simpson's rule is

For each of the integrals in problem 5, how small must h be so that the
error in Simpson's rule is $\le 10^{-4}$?

3.5 Discontinuities

Upper and lower sums were defined in Section 3.3 for any function on a
finite interval [a, b] that is bounded. We proved there that if f is continuous
on [a, b], then the inf of the upper sums equals the sup of the lower
sums, and so, by definition, the Riemann integral exists. In this section
we show that the Riemann integral also exists for special classes of functions
which have points in their domains where they are not continuous.
Such functions are called discontinuous, and the points are called discontinuities.
We begin with an example (problem 1 of Section 3.3) which
shows that if a function has too many discontinuities, the Riemann integral
may not exist.

Example 1 Consider the function f on [0, 1] defined by

$$f(x) = \begin{cases} 0 & \text{if } x \text{ is rational} \\ 1 & \text{if } x \text{ is irrational.} \end{cases}$$

Let P be any partition. Then, because each interval in the partition contains
both rational and irrational numbers (problem 11 of Section 1.1 and
problem 11 of Section 1.4), the upper sum equals 1 and the lower sum
equals 0. Since this is true for each partition P, the infimum of the upper
sums is 1 and the supremum of the lower sums is 0. Thus the Riemann
integral of f does not exist.

Some special classes of discontinuous bounded functions do have
Riemann integrals, however. A function is said to be monotone increasing
if $x < y$ implies that $f(x) \le f(y)$. Monotone decreasing is defined
analogously.

□ Theorem 3.5.1 Suppose that f is a bounded monotone function on [a, b].
Then f is Riemann integrable.

Proof. Suppose that f is monotone increasing; the proof for the decreasing
case is similar. Let M be such that $|f(x)| \le M$ for all $x \in [a, b]$ and let
$\varepsilon > 0$ be given. Let P be a partition so that the subintervals have equal
length h with $h \le \varepsilon/(2M)$. Since f is monotone increasing, the value of
f at the right-hand endpoint of each subinterval $[x_{i-1}, x_i]$ is the supremum
of f over the interval and the value at the left-hand endpoint is the
infimum of f. Thus,

$$U_P(f) = \sum_{i=1}^{N} f(x_i)(x_i - x_{i-1}),$$

$$L_P(f) = \sum_{i=1}^{N} f(x_{i-1})(x_i - x_{i-1}),$$

and so,

$$U_P(f) - L_P(f) = \sum_{i=1}^{N} (f(x_i) - f(x_{i-1}))(x_i - x_{i-1})$$

$$= h \sum_{i=1}^{N} (f(x_i) - f(x_{i-1}))$$

$$= h(f(b) - f(a))$$

$$\le h(2M)$$

$$\le \varepsilon.$$

Thus, by Lemma 3 of Section 3.3, f is Riemann integrable. □
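
The telescoping identity $U_P(f) - L_P(f) = h(f(b) - f(a))$ in this proof holds even when f jumps. The sketch below checks it for a monotone increasing function with jumps at the integers; the function `staircase` and the helper name are hypothetical choices of ours, made only for illustration.

```python
import math

def monotone_upper_lower(f, a, b, n):
    """Upper and lower sums of a monotone increasing f on n equal subintervals."""
    h = (b - a) / n
    points = [a + i * h for i in range(n)] + [b]     # x_0, ..., x_N with x_N = b
    upper = h * sum(f(x) for x in points[1:])        # right endpoints give the sup
    lower = h * sum(f(x) for x in points[:-1])       # left endpoints give the inf
    return upper, lower

def staircase(x):
    """Monotone increasing, with a jump of size 1/2 at each integer."""
    return math.floor(x) + 0.5 * (x - math.floor(x))

a, b = 0.0, 3.0
for n in (7, 70, 700):
    u, l = monotone_upper_lower(staircase, a, b, n)
    h = (b - a) / n
    print(n, u - l, h * (staircase(b) - staircase(a)))
```

The two printed columns agree up to rounding, and both go to zero with h, so the discontinuities do no harm.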

If f is not monotone, the situation is much more difficult, and we will
restrict our attention to functions which are continuous except at finitely
many points.

Example 2 Consider the function on the interval [0, 1] defined by $f(0) = 0$
and $f(x) = \sin\frac{1}{x}$ if $x > 0$. This function, whose graph is shown in
Figure 3.5.1, is continuous on (0, 1] but is not continuous at x = 0. This
is not because we defined f(0) = 0. Since $\lim_{x \to 0} f(x)$ does not exist, no
definition of f at x = 0 would make f a continuous function on [0, 1]. For
each $\delta > 0$, f is continuous on the interval $[\delta, 1]$, so the integral $\int_\delta^1 f(x)\,dx$
certainly exists. If the integral of f on [0, 1] does exist, it seems natural to
guess that it should equal the limit of the numbers $\int_\delta^1 f(x)\,dx$ as $\delta \searrow 0$, if
this limit exists.

The following theorem answers the questions raised by Example 2.

□ Theorem 3.5.2 Let f be a bounded function on [a, b] that is continuous
on (a, b]. Then, the Riemann integral of f exists on [a, b] and

$$\int_a^b f(x)\,dx = \lim_{\delta \to 0} \int_{a+\delta}^{b} f(x)\,dx. \tag{25}$$

Proof. Choose M so that \f(x)\ < M for all x in [a, b], and let e > 0 be
given. Choose <5 small enough so that SM < §. The reason for this choice
will become clear later. Let $P_\delta$ be a partition of the interval $[a + \delta, b]$, which
we label with $a + \delta = x_1 < x_2 < \cdots < x_N = b$. Any such partition $P_\delta$ can
be extended to be a partition P of [a, b] by adding the point $x_0 = a$. Let

$$M_i = \sup_{x_{i-1} \le x \le x_i} f(x), \qquad m_i = \inf_{x_{i-1} \le x \le x_i} f(x),$$

as in Section 3.3. Since f is continuous on $[a + \delta, b]$, we can choose the
partition $P_\delta$ so that

$$\sum_{i=2}^{N} M_i (x_i - x_{i-1}) - \sum_{i=2}^{N} m_i (x_i - x_{i-1}) \le \frac{\varepsilon}{4}. \tag{26}$$
Therefore,

N N

Y Mi{Xi - Xi-1) - Y mi(Xi ~ Xi-1)


i=l *=1

N N

=- (Mi - mi)S + YMi(xi _ *t-i) _ Ymi(Xi Xi-1)

by (26) and our choice of £ above. Thus, by Lemma 3 of Section 3.3,


the Riemann integral of / exists on [a, b\. To see that f(x)dx can be
calculated by taking the limit (25), we estimate

(27)

N
< ^ ^ -Mj(Xj X{ — i)
i—1

+ M\(x\ - ®0)

£
<
4'

Thus we have shown that, given $\varepsilon > 0$, we can choose a number $\delta > 0$ so
that $\left| \int_a^b f(x)\,dx - \int_{a+\delta}^{b} f(x)\,dx \right| \le \varepsilon$. This proves (25). □

We remark that a special case of the hypotheses of Theorem 3.5.2 occurs
when the function f is, in fact, continuous at a. In that case, we
already know the limit formula (25); see problem 12 of Section 3.3. Notice
that since the integral can be computed by the limit in (25), the value
of $\int_a^b f(x)\,dx$ does not depend on the value of f at x = a.

Example 2 (revisited) We now know that the Riemann integral
$\int_0^1 \sin\frac{1}{x}\,dx$ exists and the limit formula (25) suggests a way to compute it.
Although the integral exists for the class of functions treated in Theorem
3.5.2, notice that we do not yet know that the properties of the integral
(Theorems 3.3.3 - 3.3.7) are true. Assuming that the properties hold, we
have

$$\int_0^1 \sin\frac{1}{x}\,dx = \int_0^\delta \sin\frac{1}{x}\,dx + \int_\delta^1 \sin\frac{1}{x}\,dx$$

by Theorem 3.3.7. Thus,

$$\left| \int_0^1 \sin\frac{1}{x}\,dx - \int_\delta^1 \sin\frac{1}{x}\,dx \right| = \left| \int_0^\delta \sin\frac{1}{x}\,dx \right| \le \delta$$

by Corollary 3.3.6 since $|\sin\frac{1}{x}| \le 1$. This shows us how to compute
$\int_0^1 \sin\frac{1}{x}\,dx$ as closely as we like. Suppose that we want to know the
value of $\int_0^1 \sin\frac{1}{x}\,dx$ to within $10^{-2}$. If we choose $\delta = \frac{1}{2}10^{-2}$ and replace
$\int_0^1 \sin\frac{1}{x}\,dx$ by $\int_\delta^1 \sin\frac{1}{x}\,dx$, we make an error of at most $\frac{1}{2}10^{-2}$. On the
interval $[\delta, 1]$, the function $\sin\frac{1}{x}$ is infinitely often continuously differentiable,
so we can use any of the numerical methods discussed in Section
3.4 to estimate $\int_\delta^1 \sin\frac{1}{x}\,dx$ to within $\frac{1}{2}10^{-2}$.
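
The recipe of this example can be carried out directly. The sketch below is one way to do it, under assumptions of ours: we use the Midpoint Rule with its bound $M h^2 (b - a)/24$, and we bound the second derivative crudely by $|f''(x)| = |2\cos(1/x)/x^3 - \sin(1/x)/x^4| \le 2/\delta^3 + 1/\delta^4$ on $[\delta, 1]$.

```python
import math

def midpoint_rule(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return math.sin(1.0 / x)

delta = 0.5e-2                    # cutting off [0, delta] costs at most delta
tol = 0.5e-2                      # error allowed for the numerical step

# Crude bound for |f''| on [delta, 1].
M = 2.0 / delta ** 3 + 1.0 / delta ** 4
length = 1.0 - delta

# M*h^2*length/24 <= tol with h = length/N gives N >= sqrt(M*length^3/(24*tol)).
n = math.ceil(math.sqrt(M * length ** 3 / (24.0 * tol)))
approx = midpoint_rule(f, delta, 1.0, n)
print("N =", n)
print("approximate value of the integral over [0, 1]:", approx)
```

The large N is the price of the rapid oscillation near 0: the bound on $f''$ grows like $\delta^{-4}$, so a smaller $\delta$ quickly makes the guaranteed cost much higher.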

Although the proof of Theorem 3.5.2 is a little complicated, the idea
is very simple. Since the function is bounded, the terms in the upper and
lower sums that come from the first interval $[a, a + \delta]$ can be made as
small as we like by choosing $\delta$ small. On the rest of the interval, $[a + \delta, b]$,
f is continuous, so the upper and lower sums there can be made as close
as we like by picking the appropriate partition. We could have allowed
f to be discontinuous at b too. In fact, if f is bounded and is continuous
on [a, b] except for finitely many points, then the Riemann integral of
f exists. The proof, which uses exactly the same ideas as the proof of
Theorem 3.5.2, is omitted.

□ Theorem 3.5.3 Let f be a bounded function on [a, b] that is continuous
on [a, b] except for finitely many points $a_1, a_2, \dots, a_k$. Set $a_0 = a$ and
$a_{k+1} = b$. Then, the Riemann integral of f exists on [a, b] and

$$\int_a^b f(x)\,dx = \sum_{j=1}^{k+1} \int_{a_{j-1}}^{a_j} f(x)\,dx, \tag{28}$$

where each integral on the right can be expressed as

$$\int_{a_{j-1}}^{a_j} f(x)\,dx = \lim_{\delta \searrow 0} \int_{a_{j-1}+\delta}^{a_j-\delta} f(x)\,dx. \tag{29}$$

Each of the integrals $\int_{a_{j-1}}^{a_j} f(x)\,dx$ is defined as the common value
of the inf of the upper sums and the sup of the lower sums. Since this
value can be computed by the limiting operation in (29), the value of
$\int_{a_{j-1}}^{a_j} f(x)\,dx$ does not depend on the values of f at the points of discontinuity
$a_1, a_2, \dots, a_k$.
We want to define a particularly simple class of discontinuities called
jump discontinuities. To do so, we first generalize the definition of
$\lim_{x \to c} f(x)$ introduced in Section 3.1. Suppose that a function f is defined
on an open interval (c, b). We say that f has a limit from the right,
$f(c^+)$, at c, if for every sequence $x_n \to c$ with $x_n \in (c, b)$ we have

$$\lim_{n \to \infty} f(x_n) = f(c^+).$$

Note that each $x_n > c$. Similarly, suppose that f is defined on an open
interval (a, c). We say that f has a limit from the left, $f(c^-)$, at c, if for
every sequence $x_n \to c$ with $x_n \in (a, c)$ we have

$$\lim_{n \to \infty} f(x_n) = f(c^-).$$

We sometimes denote the limit from the left by $\lim_{x \nearrow c} f(x)$ and the limit
from the right by $\lim_{x \searrow c} f(x)$. If f has both right-hand and left-hand
limits at c and those limits are unequal, then f is said to have a jump
discontinuity at c. Notice that nothing is said about the value of f at c.
It may not even be defined there. However, if f is defined at c and

$$\lim_{x \nearrow c} f(x) = f(c) = \lim_{x \searrow c} f(x), \tag{30}$$

then f is continuous at c (problem 5).

Example 3 Consider the function

' 2x if 0 < x <1


1 if x = 1
1/2 if 1 < x <2
f - (* - 2)2 if 2 < x < 3
defined on [0, 3]. In the graph shown in Figure 3.5.2, we can see that f
has jump discontinuities at x = 1 and x = 2. At x = 1, the limits from
the left and right are 2 and 1/2, respectively. At x = 2, the limits from
the left and right are 1/2 and $\pi/2$, respectively.

Definition. A function f defined on a finite interval [a, b] is said to be
piecewise continuous if it is continuous except at finitely many points
at which it has jump discontinuities.

Suppose that f is piecewise continuous on [a, b] with jump discontinuities
at $a_1, a_2, \dots, a_k$. Set $a_0 = a$ and $a_{k+1} = b$. Then f is continuous on
each of the open intervals $(a_{j-1}, a_j)$. Let $f_j$ be the function on $[a_{j-1}, a_j]$
that is equal to f on $(a_{j-1}, a_j)$ and that is defined at $a_{j-1}$ and $a_j$ by the
following limits:

$$f_j(a_{j-1}) = \lim_{x \searrow a_{j-1}} f(x) \quad \text{and} \quad f_j(a_j) = \lim_{x \nearrow a_j} f(x).$$

For example, in Example 3, there are three subintervals and three functions,
$f_1$, $f_2$, and $f_3$. It is easy to check (problem 6) that each $f_j$ is continuous
on $[a_{j-1}, a_j]$. Therefore, by problem 12 of Section 3.3,

$$\int_{a_{j-1}}^{a_j} f_j(x)\,dx = \lim_{\delta \searrow 0} \int_{a_{j-1}+\delta}^{a_j-\delta} f_j(x)\,dx.$$

This means that for piecewise continuous functions, Theorem 3.5.3 can
be written in a simpler form:

Corollary 3.5.4 Suppose that f is piecewise continuous on [a, b] with
jump discontinuities at $a_1, a_2, \dots, a_k$. Set $a_0 = a$ and $a_{k+1} = b$ and define
the functions $f_j$ as above. Then f is Riemann integrable and

$$\int_a^b f(x)\,dx = \sum_{j=1}^{k+1} \int_{a_{j-1}}^{a_j} f_j(x)\,dx.$$
The distinction that is being made here is as follows. In Theorem 3.5.3
the integrals $\int_{a_{j-1}}^{a_j} f(x)\,dx$ exist because the inf of the upper sums is equal
to the sup of the lower sums. This doesn't help us to evaluate the integral
since taking such infs and sups is difficult. Therefore, an important part
of Theorem 3.5.3 is that $\int_{a_{j-1}}^{a_j} f(x)\,dx$ can be expressed as the limit (29).
For each $\delta > 0$, $\int_{a_{j-1}+\delta}^{a_j-\delta} f(x)\,dx$ is the integral of a continuous function on
a closed interval and so can be evaluated by ordinary means (analytical
or numerical). If, however, f is piecewise continuous, then one does
not need the limiting operation in (29). The integral $\int_{a_{j-1}}^{a_j} f(x)\,dx$ equals
$\int_{a_{j-1}}^{a_j} f_j(x)\,dx$, which is an integral of a continuous function on a finite
interval. Thus, in Example 3,

J
/»3
f(x)dx = J 2
/*1
xdx +
f*2 ^
J -dx + J yj-
— — (x — 2)2 dx

and each of the integrals on the right can be evaluated by elementary
means.
We know that bounded monotone functions and functions which are
bounded and continuous except for finitely many points are Riemann
integrable on finite closed intervals. On the other hand, from Example 1
we know that not all bounded functions have Riemann integrals. Thus,
we have not characterized exactly which bounded functions have Riemann
integrals. The proofs of Theorems 3.5.2 and 3.5.3 depend on the
fact that the set of discontinuities is finite and thus the terms in upper
and lower sums corresponding to small intervals about these points do
not contribute much. The function in Example 1 is, however, discontinuous
at every point in the interval [0, 1]. This suggests that a bounded
function on a finite interval should be Riemann integrable if the set of
discontinuities is “small” in some sense. This is true, but a complete
characterization, though interesting and important for historical reasons,
is beyond the scope of this introductory text.
In Section 3.3, we proved the properties of the Riemann integral (Theorems
3.3.3 - 3.3.7) for continuous functions. In fact, they are true for all
functions that are Riemann integrable. This can be proved for particular
classes of functions that are Riemann integrable (see, for example,
problems 9, 11, and 12) or directly from the definition of Riemann integrability
(see problem 13).

Problems

1. Let f be the function on [0, 1] given by

$$f(x) = \begin{cases} x & \text{if } x \text{ is rational} \\ 0 & \text{if } x \text{ is irrational.} \end{cases}$$

Show that f is not Riemann integrable.

2. Let / be the function on [0,1] given by

0, 0 < x < 1
/(*) 1, 1 < x < 2
2, 2 < x < 3

(a) Prove that / is Riemann integrable without appealing to any theo¬


rems in this section.
(b) Which theorems in this section guarantee that / is Riemann inte¬
grable?

(c) What is J03 f(x) dx?

3. Let / be the function on [0,1] given by

,, X / b 0<x<±
f(X) " I * - | 5<*<1
(a) Prove that / is Riemann integrable without appealing to any theo¬
rems in this section.
(b) Which theorems in this section guarantee that / is Riemann inte¬
grable?

(c) What is /q f(x)dx?

4. Each of the following functions is well defined for x > 0. For each, ex¬
plain whether Theorem 3.5.2 can be used to prove that the function is
Riemann integrable on [0,1]. Explain why the answers don't depend on
how the functions are defined at x = 0:

(a) sin1 2 3 4
Cb) |sini.

(c) In*.

(d) Hint: derive the inequality sinx < x for 0 < x < 7t by using
the Fundamental Theorem of Calculus.

5. Suppose that f is defined in an open interval containing c and that (30)
holds. Prove that f is continuous at c; that is, show that for every sequence
$x_n \to c$ we have $\lim_{n \to \infty} f(x_n) = f(c)$.

6. Suppose that f is continuous on an open interval (a, b) and that f has a
right-hand limit at a and a left-hand limit at b. Define an extension of
f to [a, b] by setting $f_e(x) = f(x)$ for $a < x < b$ and $f_e(a) = f(a^+)$,
$f_e(b) = f(b^-)$.

(a) Show that $f_e$ is continuous on [a, b].

(b) Show that if $f_e$ has any other values at a or b, then $f_e$ would not be
continuous on [a, b].

7. Suppose that f is monotone increasing on [a, b]. Prove that any discontinuities
that f has are jump discontinuities.

8. Show by example that a monotone increasing function on a finite interval
can have infinitely many jump discontinuities.

9. Use Corollary 3.5.4 to prove that the properties of the Riemann integral
(Theorems 3.3.3 - 3.3.7) hold for the integrals of piecewise continuous
functions on finite intervals.

10. Prove Theorem 3.5.3 by following the ideas of the proof of Theorem 3.5.2.

11. Let f be bounded on [a, b] and continuous except for finitely many points.
Let $P_n$ be a sequence of partitions so that the maximal subinterval length
goes to zero as $n \to \infty$, and let $S_n$ be a Riemann sum corresponding to
$P_n$. Prove that $S_n \to \int_a^b f(x)\,dx$ as $n \to \infty$.

12. Use the result of problem 11 to show that the properties of the Riemann
integral (Theorems 3.3.3 - 3.3.7) hold for functions which are bounded on
[a, b] and are continuous except for finitely many points.

13. Use the definition of Riemann integrability to show directly that if f and
g are Riemann integrable on [a, b], then f + g is Riemann integrable on
[a, b] and

$$\int_a^b \bigl(f(x) + g(x)\bigr)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$$

3.6 Improper Integrals


In the last section we investigated the Riemann integral for bounded
functions with discontinuities. In this section we extend the Riemann
integral to some unbounded functions and to some functions on infinite
intervals. We begin with an example.

Example 1 Consider the function $f(x) = 1/\sqrt{x}$ whose graph is shown in
Figure 3.6.1. Note that f is continuous on (0,1] but diverges to $\infty$ as
$x \searrow 0$. The Riemann integral as defined in Section 3.3 certainly does
not exist since $U_P(f) = \infty$ for every partition P because the supremum
of f is infinite over any interval of the form $[0,\delta)$ as long as
$\delta > 0$. However, our experience in Section 3.5 suggests another approach.
For every $\delta > 0$ the Riemann integral of f exists on $[\delta,1]$ because
f is continuous on $[\delta,1]$. And since f is so simple, we can compute it
explicitly:

$$\int_\delta^1 \frac{1}{\sqrt{x}}\,dx = 2 - 2\sqrt{\delta}.$$

Figure 3.6.1

Since the right-hand side has a limit as $\delta \searrow 0$, namely 2, we could
define the integral of $1/\sqrt{x}$ on [0,1] to be that limit:

$$\int_0^1 \frac{1}{\sqrt{x}}\,dx = \lim_{\delta\searrow 0} \int_\delta^1 \frac{1}{\sqrt{x}}\,dx = 2.$$

The example gives us the idea for the following definition.

Definition. Suppose that f is a continuous function on a half-open finite
interval (a, b]. If the limit of $\int_{a+\delta}^b f(x)\,dx$ exists and is finite
as $\delta \searrow 0$, we define

$$\int_a^b f(x)\,dx = \lim_{\delta\searrow 0} \int_{a+\delta}^b f(x)\,dx$$

and call it the improper Riemann integral of f on [a, b]. A similar definition
holds if f is continuous but unbounded on [a, b).

Example 2 Let us consider the family of functions $f_\alpha(x) = x^\alpha$ on the
interval [0,1] for different values of $\alpha$. For $\alpha \ge 0$, the function
$f_\alpha$ is continuous on [0,1], so the Riemann integral exists. For
$\alpha < 0$, $f_\alpha$ is continuous but unbounded on (0,1]. There are three
cases to consider. If $\alpha < 0$ and $\alpha \ne -1$,

$$\int_\delta^1 x^\alpha\,dx = \frac{1}{1+\alpha} - \frac{\delta^{1+\alpha}}{1+\alpha}.$$

In the case $-1 < \alpha < 0$, the right-hand side has a limit as
$\delta \searrow 0$, so the improper Riemann integral exists and

$$\int_0^1 x^\alpha\,dx = \frac{1}{1+\alpha}.$$

However, if $\alpha < -1$, then

$$\lim_{\delta\searrow 0} \left\{ \frac{1}{1+\alpha} - \frac{\delta^{1+\alpha}}{1+\alpha} \right\} = +\infty,$$

so the improper Riemann integral does not exist. The case $\alpha = -1$ is left
as problem 3.
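
The two cases of Example 2 can be checked numerically. The sketch below is only an illustration: the function name `integral_on_delta_to_one` is hypothetical, and it simply evaluates the closed form $\frac{1}{1+\alpha} - \frac{\delta^{1+\alpha}}{1+\alpha}$ from the example for shrinking $\delta$ (the case $\alpha = -1$ is deliberately left out, since it is problem 3).

```python
def integral_on_delta_to_one(alpha: float, delta: float) -> float:
    """Exact integral of x**alpha over [delta, 1]; valid for delta > 0, alpha != -1."""
    if alpha == -1.0:
        # The logarithmic case is left to the reader (problem 3).
        raise ValueError("alpha = -1 is not handled by this formula")
    return 1.0 / (1.0 + alpha) - delta ** (1.0 + alpha) / (1.0 + alpha)


if __name__ == "__main__":
    # alpha = -0.5 is Example 1; alpha = -2 is a divergent case.
    for alpha in (-0.5, -2.0):
        for k in (2, 4, 6, 8):
            delta = 10.0 ** (-k)
            value = integral_on_delta_to_one(alpha, delta)
            print(f"alpha={alpha:5.1f}  delta=1e-{k}  integral={value:.6g}")
```

For $\alpha = -1/2$ the printed values settle down near $1/(1+\alpha) = 2$, while for $\alpha = -2$ they grow without bound as $\delta$ shrinks.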

The functions in Example 2 were easy to analyze because the integrals
could be computed explicitly. Often, this is impossible and one has
to make estimates.

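Example 3 Consider the integral $\int_0^1 \frac{\cos x}{\sqrt{1-x}}\,dx$. The
problem here is near x = 1, where the function is unbounded. Define

$$I_\delta = \int_0^{1-\delta} \frac{\cos x}{\sqrt{1-x}}\,dx.$$

We want to show that $\lim_{\delta\searrow 0} I_\delta$ exists. This means that
for any sequence $\delta_n \to 0$ with $\delta_n > 0$, the sequence
$\{I_{\delta_n}\}$ has a limit and the limit is independent of the sequence
$\{\delta_n\}$ chosen. We did not emphasize this in Example 2 since the limits
there were simple and explicit. Here we will be more careful. Let
$\{\delta_n\}$ be such a sequence and consider the two terms $I_{\delta_n}$ and
$I_{\delta_m}$. Suppose $\delta_n < \delta_m$. Then,

$$\begin{aligned}
|I_{\delta_n} - I_{\delta_m}|
&= \left| \int_0^{1-\delta_n} \frac{\cos x}{\sqrt{1-x}}\,dx - \int_0^{1-\delta_m} \frac{\cos x}{\sqrt{1-x}}\,dx \right| \\
&= \left| \int_{1-\delta_m}^{1-\delta_n} \frac{\cos x}{\sqrt{1-x}}\,dx \right| \\
&\le \int_{1-\delta_m}^{1-\delta_n} \frac{|\cos x|}{\sqrt{1-x}}\,dx \\
&\le \int_{1-\delta_m}^{1-\delta_n} \frac{1}{\sqrt{1-x}}\,dx \\
&= 2\sqrt{\delta_m} - 2\sqrt{\delta_n}.
\end{aligned}$$

Since $\delta_n \searrow 0$ and the square root function is continuous,
$\sqrt{\delta_n} \to 0$ as $n \to \infty$. Thus, given $\varepsilon > 0$, we can
choose N so that $n \ge N$ and $m \ge N$ implies
$|\sqrt{\delta_n} - \sqrt{\delta_m}| \le \varepsilon/2$. By the above estimate,
it follows that $|I_{\delta_n} - I_{\delta_m}| \le \varepsilon$ for $n \ge N$ and
$m \ge N$. Thus, $\{I_{\delta_n}\}$ is a Cauchy sequence and therefore converges
since the real numbers are complete.
We have shown that $\{I_{\delta_n}\}$ converges for any sequence
$\delta_n \searrow 0$. If $\gamma_n \searrow 0$ is another such sequence, then
the same estimate as above shows that

$$|I_{\delta_n} - I_{\gamma_n}| \le 2\,|\sqrt{\delta_n} - \sqrt{\gamma_n}|$$

for all n. Since the right-hand side converges to zero as $n \to \infty$, the
left-hand side must converge to zero too; that is,

$$\lim_{n\to\infty} I_{\delta_n} = \lim_{n\to\infty} I_{\gamma_n}.$$

Thus, the improper integral $\int_0^1 \frac{\cos x}{\sqrt{1-x}}\,dx$ exists. To
compute it approximately, one can use the idea in Example 2 (revisited) of
Section 3.5.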

This same idea allows us to define the improper Riemann integral on


semi-infinite intervals.

Definition. Let f be a continuous function on the semi-infinite interval
$[a, \infty)$. If $\lim_{b\to\infty} \int_a^b f(x)\,dx$ exists and is finite, we define

$$\int_a^\infty f(x)\,dx = \lim_{b\to\infty} \int_a^b f(x)\,dx$$

and call it the improper Riemann integral of f on $[a, \infty)$. A similar
definition holds if f is continuous on a semi-infinite interval of the form
$(-\infty, b]$.

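Example 4 Let us consider the function $f(x) = 1/x^\alpha$ for $\alpha > 0$.
If $\alpha = 2$, the graph is shown in Figure 3.6.2. Suppose that $\alpha > 1$.
Then

$$\int_1^b \frac{1}{x^\alpha}\,dx = \frac{1}{\alpha - 1}\left(1 - \frac{1}{b^{\alpha-1}}\right),
\qquad\text{so}\qquad
\lim_{b\to\infty} \int_1^b \frac{1}{x^\alpha}\,dx = \frac{1}{\alpha - 1}.$$

Figure 3.6.2

Thus, the improper Riemann integral exists. Similarly, it is not hard to
see that the improper integral does not exist in the case $0 < \alpha \le 1$.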
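
A small numerical sketch of Example 4 follows. The helper name `integral_one_to_b` is hypothetical; it evaluates the elementary antiderivative of $x^{-\alpha}$ on [1, b] so that the behavior as b grows can be seen for $\alpha > 1$, $\alpha = 1$, and $0 < \alpha < 1$.

```python
import math


def integral_one_to_b(alpha: float, b: float) -> float:
    """Exact integral of x**(-alpha) over [1, b], for alpha > 0 and b >= 1."""
    if alpha == 1.0:
        return math.log(b)  # antiderivative of 1/x
    return (1.0 - b ** (1.0 - alpha)) / (alpha - 1.0)


if __name__ == "__main__":
    for alpha in (2.0, 1.0, 0.5):
        values = [integral_one_to_b(alpha, 10.0 ** k) for k in (1, 2, 4, 6)]
        print(alpha, [f"{v:.6g}" for v in values])
```

For $\alpha = 2$ the values approach $1/(\alpha - 1) = 1$; for $\alpha = 1$ and $\alpha = 1/2$ they keep growing, in line with the statement that the improper integral fails to exist when $0 < \alpha \le 1$.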

Sometimes one can show that improper Riemann integrals exist even
though one cannot evaluate the limit explicitly.

Example 5 Consider the improper Riemann integral
$\int_1^\infty \frac{\sin x}{x}\,dx$. Using integration by parts (problem 12 of
Section 4.2), we see that

$$\int_1^b \frac{\sin x}{x}\,dx = \cos(1) - \frac{\cos b}{b} - \int_1^b \frac{\cos x}{x^2}\,dx.$$

The first two terms on the right clearly have nice limits as $b \to \infty$. To
handle the third term notice that if c and d are both large and c < d, then

$$\left| \int_1^d \frac{\cos x}{x^2}\,dx - \int_1^c \frac{\cos x}{x^2}\,dx \right| \le \frac{1}{c} - \frac{1}{d}.$$

This estimate can be used similarly to the way in which the estimate was
used in Example 3 to show that $\int_1^{c_n} \frac{\cos x}{x^2}\,dx$ converges to
a finite limit as $c_n \to \infty$, and the limit is independent of the sequence
$\{c_n\}$ chosen (problem 7).

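There are subtleties in improper Riemann integrals. Choose $\mu > 0$ small
enough so that $\sin x \ge \tfrac12$ if $x \in [\tfrac{\pi}{2}, \tfrac{\pi}{2} + \mu]$. Then,

$$\begin{aligned}
\sum_{n=1}^{N} \int_{\frac{\pi}{2}+2n\pi}^{\frac{\pi}{2}+2n\pi+\mu} \frac{\sin x}{x}\,dx
&\ge \frac12 \sum_{n=1}^{N} \int_{\frac{\pi}{2}+2n\pi}^{\frac{\pi}{2}+2n\pi+\mu} \frac{dx}{x} \\
&\ge \frac12 \sum_{n=1}^{N} \frac{1}{\frac{\pi}{2}+2n\pi+\mu} \int_{\frac{\pi}{2}+2n\pi}^{\frac{\pi}{2}+2n\pi+\mu} dx \\
&= \frac{\mu}{2} \sum_{n=1}^{N} \frac{1}{\frac{\pi}{2}+2n\pi+\mu} \\
&\ge \frac{\mu}{8\pi} \sum_{n=1}^{N} \frac{1}{n}.
\end{aligned}$$

Since the harmonic series diverges (see Section 6.2), we see that the total
amount of area in the positive bumps of the function $x^{-1}\sin x$ is infinite.
Similarly there is an infinite amount of area in the negative bumps.
Yet the improper integral $\int_1^\infty \frac{\sin x}{x}\,dx$ exists because when
one takes the limit of

$$\int_1^b \frac{\sin x}{x}\,dx$$

as $b \to \infty$ there are cancellations between the successive positive bumps
and negative bumps. This is analogous to an infinite series that is conditionally
but not absolutely convergent.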
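
The contrast between the signed area and the total area can be seen numerically. The following is a rough sketch, not part of the text's argument: `simpson`, `signed_area`, and `absolute_area` are hypothetical helper names, and composite Simpson quadrature with about 100 subintervals per unit length is an arbitrary choice that is assumed to be accurate enough for illustration.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def signed_area(b):
    """Approximate integral of sin(x)/x over [1, b], for b > 1."""
    n = 100 * math.ceil(b - 1.0)
    return simpson(lambda x: math.sin(x) / x, 1.0, b, n)


def absolute_area(b):
    """Approximate integral of |sin(x)|/x over [1, b], for b > 1."""
    n = 100 * math.ceil(b - 1.0)
    return simpson(lambda x: abs(math.sin(x)) / x, 1.0, b, n)


if __name__ == "__main__":
    for b in (50.0, 100.0, 200.0, 400.0):
        print(f"b={b:6.0f}  signed={signed_area(b):.5f}  absolute={absolute_area(b):.5f}")
```

The signed values barely move once b is large, while the absolute values keep increasing (roughly like a multiple of $\ln b$), which is the numerical face of conditional but not absolute convergence.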

Problems
1. Prove that the following functions have improper Riemann integrals on
the interval [0,1]

1+x1 2
(a) \[x
cos 2ar
(b) 13/4 •
1
(c) y/x(x+l) '

2. Determine which of the following functions have improper Riemann integrals
on the interval [0,1].

(a) $\ln x$. Hint: integrate by parts.



(b) Pyf. Hint: see problem 4(d) in Section 3.5.

3. Use the integral formula for the natural logarithm (see Example 2 of Section
4.3) and properties of the logarithm to prove that the improper Riemann
integral $\int_0^1 \frac{1}{x}\,dx$ does not exist.

4. Prove that the improper Riemann integral $\int_0^\infty e^{-x}\,dx$ exists.

5. Determine whether the improper Riemann integral /2°° dx exists.

6. Prove that the improper Riemann integral $\int_0^\infty e^{-x^2/2}\,dx$ exists.
Hint: for large x, estimate $e^{-x^2/2}$ by $e^{-x}$.

7. Complete the proof of Example 5 by using the estimate to show that if
$c_n \to \infty$, then the integral $\int_1^{c_n} \frac{\cos x}{x^2}\,dx$ converges
as $n \to \infty$ and the limit is independent of the choice of $\{c_n\}$.

8. Suppose that f and g are continuous functions on the interval (a, b] and
assume that the improper Riemann integrals $\int_a^b f(x)\,dx$ and
$\int_a^b g(x)\,dx$ both exist. Prove that the improper Riemann integral
$\int_a^b (f(x) + g(x))\,dx$ exists and equals
$\int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.

9. Suppose that f is an unbounded continuous function on the open interval
(a, b). We say that the improper Riemann integral $\int_a^b f(x)\,dx$ exists if
there is a $c \in (a, b)$ so that the improper integrals $\int_a^c f(x)\,dx$ and
$\int_c^b f(x)\,dx$ exist (one of them might be proper). In this case we define

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

Show that if this is true for one $c \in (a, b)$ then it is true for all
$c \in (a, b)$ and the value of $\int_a^b f(x)\,dx$ is independent of the choice of c.

10. Suppose that f is a continuous function on $\mathbb{R}$. We say that the
improper Riemann integral $\int_{-\infty}^{\infty} f(x)\,dx$ exists if there is a
$c \in \mathbb{R}$ so that the improper integrals $\int_{-\infty}^{c} f(x)\,dx$ and
$\int_c^{\infty} f(x)\,dx$ exist. In this case we define

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_c^{\infty} f(x)\,dx.$$

Show that if this is true for one $c \in \mathbb{R}$ then it is true for all
$c \in \mathbb{R}$ and the value of $\int_{-\infty}^{\infty} f(x)\,dx$ is independent
of the choice of c.

11. Which of the following improper Riemann integrals exist?

(a) fo(^S + Tfc) dx-


(b) f! oo 1 + x2
dx.

(c) /°° e-*1 2 dx.

(d) J — OO
xdx.

12. Suppose that f is a continuous function on $\mathbb{R}$ such that the improper
integral $\int_{-\infty}^{\infty} |f(x)|\,dx$ exists.

(a) Show that the improper integral $\int_{-\infty}^{\infty} f(x)\,dx$ exists.

(b) Show that if g is continuous and bounded on $\mathbb{R}$, then the improper
integral $\int_{-\infty}^{\infty} g(x) f(x)\,dx$ exists.

Projects

1. Suppose that f is continuous on the open interval (a, b). The purpose of
this project is to show that f can be extended to be a continuous function
on [a, b] if and only if f is uniformly continuous on (a, b). It is clear from
Theorem 3.2.5 that if f can be extended it must be uniformly continuous.
So, suppose f is uniformly continuous on (a, b).

(a) Prove that f is bounded on (a, b).
(b) Let $a_n \to a$. Prove that $\{f(a_n)\}$ has a subsequence which converges
to a limit L.
(c) Define f(a) = L and prove that f is continuous at a.
(d) Do the same at the other endpoint and conclude that f has a continuous
extension. Is the extension unique?
(e) Give examples of bounded continuous functions on open intervals
which do not have continuous extensions.

2. The purpose of this project is to derive integral expressions for certain
physical or geometric quantities.

(a) Let f be a continuously differentiable function on the interval [a, b].
As in Section 3.4, divide up [a, b] into N subintervals of
equal length. Approximate the length of the graph of f over [a, b] by
writing down the sum of the lengths of the straight line segments
between the successive points $(x_i, f(x_i))$. Rearrange your sum so
that it is a Riemann sum on [a, b]. As $N \to \infty$, the Riemann sums
converge to an integral. What is the integral? Why is this a reasonable
way to define the arc length of the graph of f over [a, b]?
(b) A metal bar is lying on the x-axis between x = 0 and x = 2. The
density of the bar at x is $\rho(x)$ units of mass per unit length. Find
and justify an expression for the total mass of the bar.
(c) Consider the problem of finding a point x so that if one puts one's
finger under the bar at x it will just balance. The point x is called
the center of mass. A unit of mass m at a distance d from x exerts a
downward moment proportional to m times d. The point x will be
such that the moments on the left are equal to the moments on the
right. Find an integral expression for x.
(d) Let f be a positive continuously differentiable function on the interval
[a, b]. Derive the formulas for the volume and surface area of the
figure obtained by revolving the graph of f around the x-axis.

3. This project is a cautionary tale about numerical methods. Define the
number $E_n$ for each integer $n \ge 0$ by

$$E_n = \int_0^1 x^n e^x\,dx.$$

Suppose that we are interested in $E_{10}$ but don't want to evaluate the
integral numerically.

(a) Show that $E_n = e - nE_{n-1}$.

(b) Show that $0 < E_n < e$ for each n and that $E_0 = e - 1$.

(c) Use the recursion relation to compute $E_1, E_2, \ldots, E_{10}$. Ignore the
fact that your calculator has an "e" key and use the (good) approximate
value of 2.718 for e. Do you think you've found a good value
for $E_{10}$? Do your numbers satisfy the inequality in (b)? Explain
what happened by deriving a recursion relation for the difference
between your approximate $E_n$ (call it $\tilde{E}_n$) and the real $E_n$.
(d) Rewrite the recursion relation in (a) to express $E_{n-1}$ in terms of $E_n$.
Pick any number between -100 and +100 for $E_{20}$ and use the recursion
relation to compute $E_{19}, E_{18}, \ldots, E_{10}$.
(e) Do you think this value for $E_{10}$ is right? See if it is right by numerically
estimating the integral for $E_{10}$. How can this be?
(f) Explain what is going on!

4. The purpose of this project is to prove that the trapezoid rule has er¬
ror bound E(h) = vVe use the linear approximation Ll(x) on
Xi] as defined in Section 3.4.

(a) Let $x \in (x_{i-1}, x_i)$ and define

$$g(t) = f(t) - L^1(t) - \bigl(f(x) - L^1(x)\bigr)\,\frac{(t - x_{i-1})(t - x_i)}{(x - x_{i-1})(x - x_i)}.$$

Verify that $g(x_i) = g(x) = g(x_{i-1}) = 0$.

(b) Use Rolle's theorem (Theorem 4.2.2) twice to conclude that there
exists a $\xi \in (x_{i-1}, x_i)$ so that $g''(\xi) = 0$.
(c) Prove that $f(x) - L^1(x) = \tfrac12 f''(\xi)(x - x_{i-1})(x - x_i)$.
(d) Prove the error bound for the trapezoid rule.
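
The numerical experiment in Project 3 can be set up in a few lines. This sketch assumes $E_n = \int_0^1 x^n e^x\,dx$, which is consistent with the recursion in part (a) and the value $E_0 = e - 1$ in part (b); the function names `forward`, `backward`, and `quadrature` are hypothetical, and Simpson's rule with 1000 panels is an arbitrary choice for the independent check.

```python
import math


def forward(n, e_value=2.718):
    """Start from E_0 = e - 1 and apply E_k = e - k * E_{k-1} up to k = n,
    with e replaced by the rounded value e_value."""
    value = e_value - 1.0
    for k in range(1, n + 1):
        value = e_value - k * value
    return value


def backward(target, start=20, guess=0.0, e_value=math.e):
    """Apply E_{k-1} = (e - E_k) / k downward from an arbitrary guess for
    E_start until E_target is reached."""
    value = guess
    for k in range(start, target, -1):
        value = (e_value - value) / k
    return value


def quadrature(n, panels=1000):
    """Composite Simpson approximation of the integral of x**n * exp(x) on [0, 1]."""
    f = lambda x: x ** n * math.exp(x)
    h = 1.0 / panels
    total = f(0.0) + f(1.0)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    print("forward  E_10:", forward(10))
    print("backward E_10:", backward(10, guess=100.0))
    print("Simpson  E_10:", quadrature(10))
```

Comparing the three printed numbers with the bound in part (b) is a good starting point for parts (c) through (f).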
CHAPTER 4

Differentiation

In this chapter we introduce the notion of derivative, develop its properties,
and explain the relationship between differentiation and integration.
Some of the results, though probably not the proofs, will be familiar
from calculus. In Section 4.3 we prove Taylor's theorem, a fundamental
result because it often allows one to approximate functions by polynomials.
We have already used Taylor's theorem in discussing numerical integration
techniques in Section 3.4. Further applications appear in Section
4.4 and throughout the book. In Section 4.5, we show how to compute
the derivatives of inverse functions, and the result is used to justify the
usual change-of-variables techniques in integration. Finally, in Section
4.6 we show how some of the ideas of Chapters 3 and 4 can be extended
to functions of two variables.

4.1 Differentiable Functions

Definition. A function f is said to be differentiable at a point
$x \in Dom(f)$ if the limit of the difference quotient

$$\frac{f(x + h) - f(x)}{h}$$

exists as $h \to 0$, in which case we call the limit the derivative of f at x
and denote it by

$$f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}. \qquad (1)$$

We will also use the standard notation $\frac{df}{dx} = f'(x)$.



It is assumed in the definition that an open interval about x is contained
in the domain of f. Recall the definition of $\lim_{h\to 0}$ from Section
3.1. Saying that (1) is true means that

$$f'(x) = \lim_{n\to\infty} \frac{f(x + h_n) - f(x)}{h_n}$$

for every sequence $\{h_n\}$ such that $h_n \ne 0$ and $x + h_n \in Dom(f)$ for each
n and $h_n \to 0$. Equivalently, given $\varepsilon > 0$, there is a $\delta > 0$ such that

$$\left| f'(x) - \frac{f(x + h) - f(x)}{h} \right| \le \varepsilon \quad\text{if } 0 < |h| \le \delta.$$

Note that f' is itself a function since the limit will, in general, depend on
x. Since x is required to be in the domain of f we always have
$Dom(f') \subseteq Dom(f)$.

Example 1 To illustrate the definition, we'll check that $f(x) = x^2$ is
differentiable everywhere. Let x be given, then

$$\frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 - x^2}{h} = 2x + h.$$

Since the limit of (2x + h) exists as $h \to 0$ and equals 2x, we conclude
from the definition that f' exists at x and f'(x) = 2x.
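
The computation in Example 1 is easy to check numerically. In this small sketch the names `difference_quotient` and `square` are hypothetical; the point is only that the quotient for $x^2$ agrees with $2x + h$ and so approaches $2x$ as h shrinks.

```python
def difference_quotient(f, x, h):
    """The quotient (f(x + h) - f(x)) / h for h != 0."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


if __name__ == "__main__":
    x = 3.0
    for h in (1e-1, 1e-2, 1e-3, -1e-3):
        # For f(x) = x^2 the quotient should equal 2x + h up to rounding.
        print(h, difference_quotient(square, x, h), 2 * x + h)
```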

The difference quotient (1) is the slope of the straight line through the
points (x, f(x)) and (x + h, f(x + h)) on the graph of f; see Figure 4.1.1(a).
Thus, intuitively, f'(x) is the limit of these slopes, that is, the slope of the
line tangent to the graph of f at (x, f(x)).

Figure 4.1.1

Example 2 Figure 4.1.1(b) shows an example of a function, f(x) = |x|,
which is not differentiable at x = 0. In order to show that the difference
quotient does not have a limit, we just need to exhibit a sequence $\{h_n\}$
satisfying $h_n \to 0$ such that the limit of

$$\frac{f(0 + h_n) - f(0)}{h_n}$$

as $n \to \infty$ does not exist. Let $h_n = (-1)^n/n$. Then $h_n \to 0$ as
$n \to \infty$. However, when n is odd, $h_n$ is negative and

$$\frac{f(0 + h_n) - f(0)}{h_n} = -1.$$

When n is even, $h_n$ is positive and

$$\frac{f(0 + h_n) - f(0)}{h_n} = 1.$$

Thus the sequence of quotients does not converge as $n \to \infty$. The difference
quotient does have a limit from the left and a limit from the right at
x = 0, but the limits are not the same.
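
The alternating behavior in Example 2 can be reproduced directly. This sketch assumes the sequence $h_n = (-1)^n/n$, one choice that matches the sign pattern described in the example; the function name `quotient_at_zero` is hypothetical.

```python
def quotient_at_zero(n):
    """Difference quotient of f(x) = |x| at 0 along h_n = (-1)**n / n."""
    h = (-1) ** n / n
    return (abs(0.0 + h) - abs(0.0)) / h


if __name__ == "__main__":
    # Odd n gives a negative step, even n a positive step.
    print([quotient_at_zero(n) for n in range(1, 9)])
```

The quotients alternate between the left-hand and right-hand slopes of |x| at the origin and therefore have no limit.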

It is useful to think of f'(x) in three ways. It is the limit of the difference
quotient (1). It is the rate of change of the function f at x. It is the
slope of the tangent line to the graph of f at the point (x, f(x)).

□ Theorem 4.1.1 If f is differentiable at x, then f is continuous at x.

Proof. We write

$$f(x + h) - f(x) = \frac{f(x + h) - f(x)}{h}\cdot h.$$

Since f is differentiable at x, the quotient on the right converges to a
finite limit. Therefore, since $h \to 0$, the right-hand side converges to zero
by Theorem 2.2.5. Thus, the left-hand side converges to zero too, which
proves that f is continuous at x. □

□ Theorem 4.1.2 Suppose that f and g are differentiable at x. Then,

(a) For any constants $\alpha$ and $\beta$, $\alpha f + \beta g$ is differentiable at x and

$$(\alpha f + \beta g)' = \alpha f' + \beta g'.$$

(b) The product fg is differentiable at x and $(fg)' = f'g + fg'$.

(c) If $g(x) \ne 0$, then f/g is differentiable at x and

$$\left(\frac{f}{g}\right)' = \frac{gf' - fg'}{g^2}.$$

Proof. By definition, $(\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x)$. Thus,

$$\frac{(\alpha f + \beta g)(x + h) - (\alpha f + \beta g)(x)}{h}
= \alpha\,\frac{f(x+h) - f(x)}{h} + \beta\,\frac{g(x+h) - g(x)}{h}.$$

Since f and g are differentiable at x, the difference quotients for f and
g on the right side have limits f'(x) and g'(x), respectively. By Theorem
2.2.3 and Theorem 2.2.4, the whole right-hand side converges to
$\alpha f'(x) + \beta g'(x)$. Thus, the left-hand side has a limit, and the limit is
$\alpha f'(x) + \beta g'(x)$. This proves (a).
To prove (b), we write

$$\frac{f(x + h)g(x + h) - f(x)g(x)}{h}
= \frac{f(x+h) - f(x)}{h}\,g(x+h) + \frac{g(x+h) - g(x)}{h}\,f(x).$$

As above, the difference quotients for f and g on the right side have
limits f'(x) and g'(x) respectively. Furthermore, by Theorem 4.1.1,
$g(x + h) \to g(x)$ as $h \to 0$. Thus, by Theorems 2.2.3 - 2.2.5, the right-hand side
has a limit, and the limit is f'(x)g(x) + f(x)g'(x). Therefore, the left-hand
side has a limit, and the limit is f'(x)g(x) + f(x)g'(x).
We omit the proof of (c). □

Example 3 To show how useful this theorem is, we will prove that
polynomials are differentiable at all x and derive the usual formula for
the derivative. First, it is easy to see that if $f(x) = \alpha$ for all x, then the
difference quotient is zero for all h, so f'(x) = 0. It is also easy to use
the definition to see that if f(x) = x, then f'(x) = 1. To differentiate
$f(x) = x^2$, notice that $x^2 = x \cdot x$. So, applying part (b) of the theorem, we
see that $f(x) = x^2$ is differentiable and

$$(x \cdot x)' = x \cdot 1 + 1 \cdot x = 2x.$$

Of course, we knew this already by explicitly evaluating the limit of the
difference quotient, but this gives us the idea for a general proof. Suppose
we know that $(x^n)' = nx^{n-1}$ for some $n \ge 1$. Then, by part (b) of
Theorem 4.1.2,

$$\begin{aligned}
(x^{n+1})' &= (x \cdot x^n)' \\
&= (x)'(x^n) + x(x^n)' \\
&= x^n + x \cdot nx^{n-1} \\
&= (n + 1)x^n.
\end{aligned}$$

Therefore, by induction, we have proven that $(x^n)' = nx^{n-1}$ for all $n \ge 1$.
Using part (a) of the theorem repeatedly, we obtain

$$(\alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_m x^m)' = \alpha_1 + 2\alpha_2 x + \cdots + m\alpha_m x^{m-1},$$

which is the usual formula for the derivative of a polynomial. To see
what we gained by using Theorem 4.1.2, try to prove directly that if
$f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_m x^m$, then the limit of the
difference quotient exists.

□ Theorem 4.1.3 (The Chain Rule) Suppose that f is differentiable at x
and g is differentiable at f(x). Then $g \circ f$ is differentiable at x and
$(g(f(x)))' = g'(f(x))f'(x)$.

Proof. We define a new function H on

$$Dom(H) = \{y \mid y \in Dom(g) \text{ and } y \ne f(x)\}$$

by the formula

$$H(y) = \frac{g(y) - g(f(x))}{y - f(x)}. \qquad (2)$$

Let y = f(x) + h. Then

$$H(f(x) + h) = \frac{g(f(x) + h) - g(f(x))}{h} \to g'(f(x))$$

as $h \to 0$ since g is differentiable at f(x). Therefore, if we extend the
domain of H to include f(x) and define H(f(x)) = g'(f(x)), then H is
continuous at f(x). Multiply both sides of (2) by (y - f(x)), substitute
y = f(x + h), and divide both sides by h. The result is

$$\frac{g(f(x + h)) - g(f(x))}{h} = H(f(x + h))\,\frac{f(x + h) - f(x)}{h}.$$

Since f is differentiable at x, $\frac{1}{h}(f(x + h) - f(x)) \to f'(x)$ as $h \to 0$. Since
f is continuous at x, $f(x + h) \to f(x)$; so, by the continuity of H, we
see that $H(f(x + h)) \to H(f(x))$. Thus, the limit of the right-hand side
exists and is equal to g'(f(x))f'(x). It follows that the left-hand side has
a limit and the limit is g'(f(x))f'(x). Therefore, by definition, $g \circ f$ is
differentiable at x and $(g(f(x)))' = g'(f(x))f'(x)$. □

We will often use sin x, cos x, and the exponential function in examples
even though we will not formally define them until Chapter 6. We
assume that the reader is familiar with them, knows that they are differentiable,
and knows their derivatives. The chain rule allows us to assert
that complicated functions like $\sin(e^{2x})$ are differentiable (and it gives a
formula for the derivative!) without our having to take the limit of the
difference quotient.
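
As a sanity check on the chain rule formula, one can compare it with a difference quotient for $\sin(e^{2x})$. This is a sketch under the assumption, made in the text, that the derivatives of sin and the exponential are known; the names `composite`, `chain_rule_derivative`, and `difference_quotient` are hypothetical.

```python
import math


def composite(x):
    """g(f(x)) with f(x) = exp(2x) and g = sin."""
    return math.sin(math.exp(2.0 * x))


def chain_rule_derivative(x):
    inner = math.exp(2.0 * x)              # f(x) = e^{2x}
    inner_prime = 2.0 * inner              # f'(x) = 2 e^{2x}
    return math.cos(inner) * inner_prime   # g'(f(x)) * f'(x)


def difference_quotient(func, x, h):
    return (func(x + h) - func(x)) / h


if __name__ == "__main__":
    x = 0.3
    for h in (1e-2, 1e-4, 1e-6):
        print(h, difference_quotient(composite, x, h), chain_rule_derivative(x))
```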

Definition. If f is differentiable at every point of an open interval (a, b),
we say that f is differentiable on (a, b). If, in addition, f' is continuous
on (a, b), we say that f is continuously differentiable on (a, b).

Recall from Section 3.2 that continuous functions on closed intervals
have very strong properties, so it is natural to try to extend this definition
to closed intervals. We say that f is differentiable on [a, b] if it is
differentiable on (a, b) and, in addition, the right-hand limit of the difference
quotient exists at a and the left-hand limit of the difference quotient
exists at b. Right-hand and left-hand limits are defined in Section 3.5.

Definition. If f is differentiable on [a, b] and f' is continuous on [a, b],
we say that f is continuously differentiable on [a, b].

It is convenient to give names to sets of functions which satisfy different
hypotheses. We denote the set of continuous functions on [a, b] by
C[a, b] and the continuously differentiable functions on [a, b] by $C^{(1)}[a, b]$.
Similarly, $C(\mathbb{R})$ and $C^{(1)}(\mathbb{R})$ denote the continuous and continuously
differentiable functions on $\mathbb{R}$. Note that these sets of functions are vector
spaces in the sense that linear combinations of functions in these sets are
again in the sets (by Theorems 3.1.1 and 4.1.2). They are also algebras
because products of functions in these spaces are again in the spaces.
If a function f is differentiable in an open interval about x and its
derivative f' is differentiable at x, we say that f is twice differentiable at
x and denote the second derivative by

$$f''(x) = \frac{d}{dx}\left(\frac{df}{dx}\right)$$

or $\frac{d^2 f}{dx^2}$. Higher derivatives are defined analogously; we will often denote
the n-th derivative by $f^{(n)}(x)$. As above, we denote by $C^{(n)}[a, b]$
the set of functions which are n times continuously differentiable on
[a, b], and by $C^{(\infty)}[a, b]$ the functions which are infinitely often continuously
differentiable. The spaces $C^{(n)}(\mathbb{R})$ and $C^{(\infty)}(\mathbb{R})$ are defined
analogously. We showed in Example 3 that polynomials are differentiable
everywhere and that their derivatives are again polynomials. Thus all
polynomials are in $C^{(\infty)}(\mathbb{R})$. Since sin x, cos x, and $e^x$ are continuous and
$(\sin x)' = \cos x$, $(\cos x)' = -\sin x$, and $(e^x)' = e^x$, these functions are
continuously differentiable. By differentiating repeatedly, one can easily
show by induction that they are in $C^{(\infty)}(\mathbb{R})$.

Problems

1. Prove that /(x) = x1 2 3 4 5 is differentiable everywhere by showing that the


limit of the difference quotient exists.

2. Let n > 0 be a positive integer. For all x ^ 0, prove that f(x) = is


differentiable everywhere and f’(x) = ^t, by showing that the limit of
the difference quotient exists.
3. Let p and q be integers, $q \ne 0$. Suppose that $f(x) = x^{p/q}$ is differentiable
for x > 0. Prove that

$$f'(x) = \frac{p}{q}\,x^{\frac{p}{q} - 1}.$$

Hint: differentiate $f(x)^q = x^p$.

4. Where are the following functions differentiable?

(a) |sinx|.
(b) sin|x|.

5. Let p(x) be a polynomial and suppose that $x_0$ is a real root; that is,
$p(x_0) = 0$. When will |p(x)| be differentiable at $x_0$?

6. Is $f(x) = \sqrt{x}$ continuously differentiable on the interval (0,1)? On the
interval [0,1]?
7. Suppose that we assume that $e^{ax}$, sin x, and cos x are continuously differentiable
with derivatives $ae^{ax}$, cos x, and $-\sin x$, respectively. Prove that
$e^{\sin x} \in C^{(\infty)}(\mathbb{R})$.

8. Suppose that $f \in C^{(1)}[a, b]$. Prove that f is Lipschitz continuous on [a, b].
Hint: use the Mean Value Theorem.

9. Suppose that f is differentiable at x. Prove that

$$\lim_{h\to 0} \frac{f(x + h) - f(x - h)}{2h} = f'(x).$$
10. Suppose that $0 \le f(x) \le x^2$ for all $x \in \mathbb{R}$.

(a) Prove that f is differentiable at x = 0 and f'(0) = 0.

(b) Give an example of a function which satisfies the hypothesis but
which is not continuous for $x \ne 0$. Is the function continuous at
x = 0?

11. Suppose that /(x) > 0 for all x e R. Assume that f(x)2 is differentiable.
Is /(x) necessarily differentiable?

12. Let $f(x) = x^2 \sin(1/x)$ on the set $E = (-\infty, 0) \cup (0, \infty)$.

(a) Explain why f is continuously differentiable on E.

(b) Compute the derivative of f on E.
(c) Define f(0) = 0 and use the difference quotient to show that f is
differentiable at zero.
(d) Show that if f(0) = 0, then f is differentiable but not continuously
differentiable on $\mathbb{R}$.

13. Let g be the function g(x) = |x| on the interval [—1,1). Extend g to the
whole real line by requiring that g(x + 2) = g(x) for all x.

(a) Draw the graph of g.


(b) Where is g continuous? Where is g differentiable?
(c) Define gn(x) = g(4nx). Where is gn continuous? Where is gn differ¬
entiable?

Remark: These functions are used in Example 2 of Section 6.3 where we
prove the existence of a continuous function that is nowhere differentiable.

14. A function is called piecewise smooth on [a, b] if it is infinitely often
continuously differentiable at all but finitely many points, $x_1, x_2, \ldots, x_N$, of
[a, b] at which f and all its derivatives are continuous or have jump
discontinuities. Which of the following functions are piecewise smooth?

(a) f(x) = |x| on [-1,1].
(b) The functions in parts (a) and (b) of problem 4 defined on $[-2\pi, 2\pi]$.
(c) f(x) ~ */x on [1,2],
(d) The function / of problem 12 on [-1,1],
(e) The function g of problem 13 on [—1,1],

4.2 The Fundamental Theorem of Calculus


Before proving the Fundamental Theorem of Calculus, we prove three
theorems which relate the values of a function f to values of the derivative.
Each theorem contains a simple geometric idea. The first theorem
says that at an interior maximum point, the tangent line must be horizontal.
See Figure 4.2.1(a).

Figure 4.2.1

□ Theorem 4.2.1 Suppose that f is continuous on the finite interval [a, b].
Let c be a point where f attains its maximum. If a < c < b and f is
differentiable at c, then f'(c) = 0.

Proof. Suppose f'(c) > 0. Since $\frac{1}{h}(f(c + h) - f(c)) \to f'(c)$, we can
choose a $\delta$ so that

$$\frac{f(c + h) - f(c)}{h} \ge f'(c)/2 \quad\text{if } 0 < |h| \le \delta.$$

Thus for $0 < h \le \delta$,

$$f(c + h) \ge f(c) + hf'(c)/2 > f(c),$$

which contradicts the hypothesis that f achieves its maximum at c. A
similar proof shows that f'(c) < 0 is also impossible. □

A similar result holds for the point where / achieves its minimum
(problem 1).
The next theorem says that if / is zero at two different points then it
must achieve a maximum or a minimum somewhere in between.

□ Theorem 4.2.2 (Rolle's Theorem) Suppose that f is continuous on the
finite interval [a, b], differentiable on (a, b), and f(a) = 0 = f(b). Then
there is a point c satisfying a < c < b such that f'(c) = 0.

Proof. If f(x) = 0 for all $x \in (a, b)$, then f'(x) = 0 for all x, so we can
choose c to be any point in the interval. Otherwise, there must be a point
$x_0$ such that $f(x_0) \ne 0$. If $f(x_0) > 0$, then, by Theorem 3.2.2, f achieves
a positive maximum at some point c in the interval [a, b]. The point c
cannot equal a or b since f is zero there, so a < c < b. Thus, by Theorem
4.2.1, f'(c) = 0. See Figure 4.2.1(b). If $f(x_0) < 0$, a similar proof, using
the analogue of Theorem 4.2.1 for minima, gives the result. □

The next theorem states that there must be a point on the graph of
a function between (a, f(a)) and (b, f(b)) where the slope of the tangent
line equals the slope of the straight line between (a, f(a)) and (b, f(b)).
See Figure 4.2.1(c).

□ Theorem 4.2.3 (The Mean Value Theorem) Suppose that f is continuous
on the finite interval [a, b] and differentiable on (a, b). Then there is a
point c satisfying a < c < b such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Proof. The idea of the proof is simple; we subtract from f the function
whose graph is the straight line and then use Rolle's theorem. Define

$$g(x) = f(x) - \left\{ f(a) + \frac{f(b) - f(a)}{b - a}\,(x - a) \right\}. \qquad (3)$$

Then g(a) = 0 = g(b), so Rolle's theorem implies that there is a point c
satisfying a < c < b such that g'(c) = 0. Differentiating both sides of (3)
and evaluating at c, we obtain the result. □

Notice that, in general, the point c depends on a and b. Though
Rolle's theorem and the Mean Value Theorem are very simple, they are
extremely useful and important.

Example 1 Consider sin x on the interval [0, x]. Since $(\sin x)' = \cos x$,
the Mean Value Theorem tells us that for each x there is a point c(x)
satisfying 0 < c(x) < x such that

$$\frac{\sin x}{x} = \frac{\sin x - \sin 0}{x - 0} = \cos c(x).$$

As $x \searrow 0$, $c(x) \to 0$ since 0 < c(x) < x. Thus, because cos x is continuous,

$$\lim_{x\searrow 0} \frac{\sin x}{x} = \lim_{x\searrow 0} \cos c(x) = 1.$$

Since sin x is an odd function, the limit from the left is the same. Thus,

$$\lim_{x\to 0} \frac{\sin x}{x} = 1.$$
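
For this particular function the Mean Value Theorem point can be written down explicitly, which makes the argument easy to watch numerically. The sketch below is only illustrative: `mvt_point` is a hypothetical name, and it uses the fact that for $0 < x < \pi$ the equation $\cos c = \sin x / x$ has the solution $c = \arccos(\sin x / x)$ in $(0, x)$.

```python
import math


def mvt_point(x):
    """For 0 < x < pi, the point c in (0, x) with cos(c) = sin(x) / x."""
    return math.acos(math.sin(x) / x)


if __name__ == "__main__":
    for x in (1.0, 0.1, 0.01, 0.001):
        c = mvt_point(x)
        # c(x) is squeezed between 0 and x, so cos(c(x)) = sin(x)/x tends to 1.
        print(f"x={x:<6}  c(x)={c:.6f}  sin(x)/x={math.sin(x) / x:.9f}")
```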

□ Theorem 4.2.4 (The Fundamental Theorem, Part I) Let f be a continuously
differentiable function on a finite interval [a, b]. Then,

$$\int_a^b f'(x)\,dx = f(b) - f(a). \qquad (4)$$

Proof. Let P be a partition of [a, b] into N subintervals. By the Mean
Value Theorem, there is a $\xi_i$ in each subinterval $[x_{i-1}, x_i]$ such that

$$f(x_i) - f(x_{i-1}) = f'(\xi_i)(x_i - x_{i-1}).$$

Thus,

$$f(b) - f(a) = \sum_{i=1}^{N} \bigl(f(x_i) - f(x_{i-1})\bigr) \qquad (5)$$

$$= \sum_{i=1}^{N} f'(\xi_i)(x_i - x_{i-1}). \qquad (6)$$

The sum (6) is a Riemann sum for $\int_a^b f'(x)\,dx$. By Corollary 3.3.2, the Riemann
sum converges to $\int_a^b f'(x)\,dx$ as the maximal length of the subintervals
gets smaller since f' is continuous. But each of the Riemann sums
equals f(b) - f(a), so the limit of the Riemann sums equals f(b) - f(a). □

Part I of the Fundamental Theorem is the basis for analytical integration.
Given an integral $\int_a^b g(x)\,dx$, one tries to guess a G so that G'(x) = g(x).
If one is able to, then $\int_a^b g(x)\,dx = G(b) - G(a)$, by the Fundamental Theorem.
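
The telescoping argument in the proof of Part I can be made concrete for $f(x) = x^3$, where the Mean Value Theorem point on each subinterval has a closed form. This is a sketch under that specific assumption, restricted to partitions of an interval with $a \ge 0$; the helper names `uniform_partition`, `mvt_point_for_cube`, and `tagged_riemann_sum` are hypothetical.

```python
import math


def uniform_partition(a, b, n):
    """Partition points a = x_0 < x_1 < ... < x_n = b of equal spacing."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def mvt_point_for_cube(left, right):
    """For 0 <= left < right, the point xi with
    3 * xi**2 == (right**3 - left**3) / (right - left)."""
    return math.sqrt((right * right + right * left + left * left) / 3.0)


def tagged_riemann_sum(derivative, points, tags):
    """Sum of derivative(tag_i) * (x_i - x_{i-1}) over the partition."""
    return sum(derivative(t) * (r - l)
               for l, r, t in zip(points[:-1], points[1:], tags))


if __name__ == "__main__":
    cube_prime = lambda x: 3.0 * x * x
    for n in (4, 40, 400):
        pts = uniform_partition(0.0, 2.0, n)
        mvt_tags = [mvt_point_for_cube(l, r) for l, r in zip(pts[:-1], pts[1:])]
        left_tags = pts[:-1]
        print(n,
              tagged_riemann_sum(cube_prime, pts, mvt_tags),   # equals f(2) - f(0) = 8 up to rounding
              tagged_riemann_sum(cube_prime, pts, left_tags))  # only approaches 8 as n grows
```

With the Mean Value Theorem tags every Riemann sum already equals $f(b) - f(a)$, while sums with other tags (left endpoints here) merely converge to it, which is exactly how the proof combines (6) with Corollary 3.3.2.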

□ Theorem 4.2.5 (The Fundamental Theorem, Part II) Let f be a continuous
function on a finite interval [a, b]. Define

$$F(x) = \int_a^x f(t)\,dt.$$

Then F is continuously differentiable on [a, b], and F'(x) = f(x).

Proof. Let $x \in [a, b)$. We shall show that the difference quotient for F
has the right-hand limit f(x) at x. Suppose h > 0. Then

$$\frac{F(x + h) - F(x)}{h} = \frac{1}{h}\int_a^{x+h} f(t)\,dt - \frac{1}{h}\int_a^{x} f(t)\,dt
= \frac{1}{h}\int_x^{x+h} f(t)\,dt.$$

For the moment, fix h and let m and M denote the minimum and maximum
of f on the interval [x, x + h]. Then, by Theorem 3.3.4,

$$m \le \frac{1}{h}\int_x^{x+h} f(t)\,dt \le M.$$

Since f takes on the values m and M, the Intermediate Value Theorem
guarantees that there is a c in the interval [x, x + h] so that

$$f(c) = \frac{1}{h}\int_x^{x+h} f(t)\,dt.$$

Now, c depends on h, but since $c \in [x, x + h]$ we must have $c \to x$ as $h \searrow 0$.
Since f is continuous, we know that $f(c) \to f(x)$ as $h \searrow 0$. Thus the
difference quotient for F(x) converges to f(x) as $h \searrow 0$. A similar proof
shows that if $x \in (a, b]$, then the left-hand limit of the difference quotient
of F converges to f(x) as $h \nearrow 0$. Hence F is differentiable on [a, b] and
its derivative is f. Since f is continuous, F is continuously differentiable
on [a, b]. □
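
The key quantity in the proof of Part II is the average $\frac{1}{h}\int_x^{x+h} f(t)\,dt$, which is trapped between the minimum and maximum of f on [x, x + h]. The sketch below illustrates this for f = cos; `simpson` and `average_value` are hypothetical helper names, and Simpson quadrature stands in for the exact integral.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def average_value(f, x, h):
    """(F(x + h) - F(x)) / h, computed as (1/h) times the integral of f over [x, x + h]."""
    return simpson(f, x, x + h) / h


if __name__ == "__main__":
    x = 1.0
    for h in (1e-1, 1e-2, 1e-3):
        # cos is decreasing on [1, 1 + h], so m = cos(x + h) and M = cos(x).
        print(h, math.cos(x + h), average_value(math.cos, x, h), math.cos(x))
```

As h shrinks, the lower and upper bounds close in on cos(1), and the average is forced to the same limit.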

Note that Part II answers a very natural question. Does every continuous
function have an antiderivative? That is, given f, is there an F so
that F' = f?

All five theorems proven in this section have versions with weaker
hypotheses.

Problems

1. State and prove the analogue of Theorem 4.2.1 for the point where $f$ achieves its minimum.

2. Let $f(x) = e^{-x^2}$. Find a formula for a function $F$ so that $F'(x) = f(x)$.

3. Let $g$ be twice continuously differentiable on $[a, b]$. Suppose that there are three distinct points, $x_1$, $x_2$, $x_3$, in $[a, b]$ so that $g(x_1) = g(x_2) = g(x_3) = 0$. Prove that there is a $c$ satisfying $a < c < b$ such that $g''(c) = 0$. Hint: use Rolle's theorem twice.

4. Prove that

x-+0 x
Hint: use the Mean Value Theorem.

5. Suppose that $f$ is continuously differentiable on $[a, b]$ and $f'(x) = 0$ for all $x$. Prove that $f$ is a constant function.

6. Suppose that $f$ is continuously differentiable on $[a, b]$ and $f'(x) > 0$ for all $x$. Prove that $f$ is strictly monotone increasing on $[a, b]$; that is, if $x < y$, then $f(x) < f(y)$.

7. Let $f$ and $g$ be in $C^{(1)}(\mathbb{R})$ and suppose that $f(0) = g(0)$ and $f'(x) \le g'(x)$ for all $x \ge 0$. Prove that $f(x) \le g(x)$ for all $x \ge 0$.

8. Suppose that $f$ is continuously differentiable on $[a, b]$ and that $f(a) = 2$ and $|f'(x)| \le .3$ for all $x \in [a, b]$. What can you say about $f(b)$?

9. Let $f$ be a piecewise continuous function on a finite interval $[a, b]$ with jump discontinuities at $a_1, a_2, \ldots, a_K$. Define $F(x) = \int_a^x f(t)\,dt$. Prove that $F$ is differentiable except at the points $a_1, a_2, \ldots, a_K$ and $F'(x) = f(x)$.

10. Suppose that $f$ is continuous and $g$ is continuously differentiable on $\mathbb{R}$. Prove that $\int^{g(x)} f(t)\,dt$ is a continuously differentiable function of $x$ and compute its derivative.

11. Let $f$ be twice continuously differentiable on $[a, b]$ and suppose that $f'(c) = 0$ for some $c \in (a, b)$. Suppose $f''(c) > 0$. Prove that

(a) there is a $\delta > 0$ such that $f''(x) > 0$ for all $x \in (c - \delta, c + \delta)$.
(b) $f'(x) < 0$ for $x \in (c - \delta, c)$ and $f'(x) > 0$ for $x \in (c, c + \delta)$.
(c) $f(x) > f(c)$ for $x \in (c - \delta, c + \delta)$ such that $x \neq c$; that is, $c$ is a local minimum.
(d) the hypothesis $f''(c) \ge 0$ is not sufficient to guarantee the conclusion of part (c).

12. Suppose that $f$ and $g$ are continuously differentiable on $[a, b]$. Prove that

Hint: use the Fundamental Theorem of Calculus.

13. Suppose that $f$ is a continuous function on $\mathbb{R}$ that satisfies

f(x) — 5 + 2 / f(t) dt.

Prove that $f \in C^{(1)}(\mathbb{R})$ and express $f'$ in terms of $f$. Then find $f$.

14. Suppose that $f$ is a continuous function on an interval about 0 that satisfies
f(x) = 5 + 2

Prove that $f \in C^{(1)}(\mathbb{R})$ and express $f'$ in terms of $f$. Then find $f$. Hint: see project 1.

4.3 Taylor's Theorem


It is natural to try to approximate functions by polynomials because polynomials are easy to integrate and easy to differentiate. Suppose that we are given a function $f$ defined on some finite interval $[a, b]$ and we want to approximate it near some given point $x_0$ in the interval. If $f$ is a differentiable function, it is reasonable to make the linear approximation $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$. We use the symbol $\approx$ to mean "approximately equal to." The function on the right is the straight line that has the same value at $x_0$ as $f$ and the same derivative at $x_0$ as $f$. For $x$ close to $x_0$ the linear approximation should be pretty good. But how close and how good? If $f$ itself were a straight line near $x_0$, that is, if $f'$ were constant, then the approximation would be perfect. But if the slope $f'(x)$ is changing, then the graph of $f$ will curve away from the straight line approximation. See Figure 4.3.1. Since the rate of change of $f'(x)$ is given by $f''(x)$, our intuition suggests that the size of the second derivative of $f$ will determine how good the linear approximation is. We shall see below that this intuition is correct.

Figure 4.3.1

If we want to make better approximations, we could use a quadratic approximation

\[ f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2, \]

which has the same value, the same derivative and the same second derivative as $f$ at $x_0$. Continuing in this way, we define the $n$th Taylor polynomial of $f$ at $x_0$ by

\[ T^{(n)}(x, x_0) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n. \]

$T^{(n)}(x, x_0)$ is a polynomial in $x$, but the coefficients depend on $x_0$. Note that $f$ must be differentiable $n$ times at $x_0$ in order to define $T^{(n)}(x, x_0)$.

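Example 1 Let $f(x) = \cos x$ and $x_0 = 0$. Then, $f(0) = 1$, $f'(0) = 0$, $f''(0) = -1$, $f^{(3)}(0) = 0$, and $f^{(4)}(0) = 1$, so the first five Taylor polynomials are:

\[
\begin{aligned}
T^{(0)}(x, 0) &= 1 \\
T^{(1)}(x, 0) &= 1 \\
T^{(2)}(x, 0) &= 1 - \frac{x^2}{2!} \\
T^{(3)}(x, 0) &= 1 - \frac{x^2}{2!} \\
T^{(4)}(x, 0) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!}.
\end{aligned}
\]

The graphs of $\cos x$ and four of its Taylor polynomials about $0$ (the first being $T^{(0)}(x, 0)$) are shown in Figure 4.3.2.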

□ Theorem 4.3.1 (Taylor's Theorem) Suppose that $f$ is $n$ times continuously differentiable on the interval $[a, b]$ and suppose that $f^{(n+1)}$ exists on $(a, b)$. Let $x_0$ be a point of $[a, b]$. Then for all $x$ in $[a, b]$, $x \neq x_0$, there is a $\xi$ between $x$ and $x_0$ such that

\[ f(x) = T^{(n)}(x, x_0) + \frac{f^{(n+1)}(\xi)}{(n+1)!}\,(x - x_0)^{n+1}. \tag{7} \]

Proof. Fix an $x$ in $[a, b]$ and let $\alpha$ be the number so that

\[ f(x) = T^{(n)}(x, x_0) + \alpha\,(x - x_0)^{n+1}. \tag{8} \]

Define a new function $g$ on $[a, b]$ by

\[ g(t) = f(t) - T^{(n)}(t, x_0) - \alpha\,(t - x_0)^{n+1}. \]

Differentiating $n+1$ times yields

\[ g^{(n+1)}(t) = f^{(n+1)}(t) - \alpha\,(n+1)!. \]

Hence, we just need to show that there is a $\xi$ between $x$ and $x_0$ such that $g^{(n+1)}(\xi) = 0$, for then $\alpha = f^{(n+1)}(\xi)/(n+1)!$. Because the first $n$ derivatives of $T^{(n)}$ equal those of $f$ at $x_0$, we have

\[ g(x_0) = g'(x_0) = g^{(2)}(x_0) = \cdots = g^{(n)}(x_0) = 0. \]

Now, since $g(x) = 0$ (by the way we chose $\alpha$ in (8)), Rolle's theorem implies that there is an $x_1$ between $x$ and $x_0$ so that $g'(x_1) = 0$. Since $g'(x_0) = 0$ too, Rolle's theorem implies that there is an $x_2$ between $x_1$ and $x_0$ so that $g^{(2)}(x_2) = 0$. Continuing in this manner, we find an $x_{n+1}$ so that $g^{(n+1)}(x_{n+1}) = 0$. Setting $\xi = x_{n+1}$, we conclude that (7) holds.

Notice that the Mean Value Theorem is just Taylor's theorem for the
case n = 0.

Example 1 (revisited) Let's see how good an approximation $T^{(4)}(x, 0)$ is to $\cos x$. Since the fifth derivative of $\cos x$ is $-\sin x$,

\[ \left| \cos x - \left\{ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} \right\} \right| \le \left| \frac{\sin \xi}{5!} \right| |x|^5 \le \frac{|x|^5}{5!}. \]

Thus, the approximation is very good near the origin but gets much worse as $x$ grows, as we saw in Figure 4.3.2.
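The error estimate above can be compared with the actual error at a few sample points. The following sketch is an illustration only; the function names and the sample points are our own choices, not part of the text.

```python
import math

def taylor_cos_4(x):
    """Fourth-order Taylor polynomial of cos about 0."""
    return 1 - x**2 / math.factorial(2) + x**4 / math.factorial(4)

def remainder_bound(x):
    """The bound |x|^5 / 5! obtained from Taylor's theorem."""
    return abs(x) ** 5 / math.factorial(5)

for x in (0.1, 0.5, 1.0, 2.0, 3.0):
    error = abs(math.cos(x) - taylor_cos_4(x))
    # The actual error should never exceed the bound.
    print(f"x = {x}: error = {error:.3e}, bound = {remainder_bound(x):.3e}")
```

Both columns grow quickly with $x$, in line with the remark that the approximation deteriorates away from the origin.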

Example 2 (the natural logarithm) The function $\ln x$ is defined for $x > 0$ by

\[ \ln x = \int_1^x \frac{1}{t}\,dt. \]

By the Fundamental Theorem of Calculus, $\ln x$ is continuously differentiable and $(\ln x)' = 1/x$. Thus $\ln x$ is infinitely often continuously differentiable on $(0, \infty)$. Writing $f(x) = \ln x$, the first three derivatives evaluated at $x = 1$ are $f'(1) = 1$, $f''(1) = -1$, and $f^{(3)}(1) = 2$. We shall use the third-order Taylor polynomial about the point $x_0 = 1$,

\[ T^{(3)}(x, 1) = (x - 1) - \tfrac{1}{2}(x - 1)^2 + \tfrac{1}{3}(x - 1)^3, \]

to approximate $\ln 1.2$. Note that $\ln 1.2$ can also be evaluated by computing the defining integral approximately using the methods of Section 3.4 (problem 6). Since the fourth derivative of $\ln x$ is $-6/x^4$, Taylor's theorem says that

\[ \ln x - T^{(3)}(x, 1) = -\frac{6}{4!\,\xi^4}\,(x - 1)^4 = -\frac{1}{4\,\xi^4}\,(x - 1)^4 \]

for some $\xi$ between 1 and $x$. We are interested in $x = 1.2$. Since $\xi \in (1, 1.2)$, we know that $1/\xi < 1$, so

\[ |\ln 1.2 - T^{(3)}(1.2, 1)| \le \tfrac{1}{4}(.2)^4 = .0004. \]

Thus, the value $T^{(3)}(1.2, 1) = .1827$ is within .0004 of $\ln 1.2$.
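The arithmetic in Example 2 is easy to check by machine. The sketch below assumes only the standard library logarithm `math.log` as the reference value; the helper name `taylor_ln_3` is ours.

```python
import math

def taylor_ln_3(x):
    """Third-order Taylor polynomial of ln about 1."""
    u = x - 1
    return u - u**2 / 2 + u**3 / 3

approx = taylor_ln_3(1.2)
error = abs(math.log(1.2) - approx)
bound = 0.2**4 / 4  # the estimate (1/4)(.2)^4 from Taylor's theorem

# Print the approximation, the actual error, and the guaranteed bound.
print(approx, error, bound)
```

The printed error should fall below the bound, as the estimate promises.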

Taylor's Theorem can also be used to evaluate certain difficult limits.

□ Theorem 4.3.2 (l'Hospital's Rule) Suppose that $f$ and $g$ are continuously differentiable on an interval containing the point $x_0$ and that $f(x_0) = 0 = g(x_0)$. Suppose also that $g'(x_0) \neq 0$. Then,

\[ \lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{f'(x_0)}{g'(x_0)}. \]

Proof. Since $g'(x_0) \neq 0$, both $g$ and $g'$ are nonzero in a small interval about $x_0$ except for the root of $g$ at $x_0$ (problem 12). Using the Mean Value Theorem, we can therefore compute

\[ \lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{g(x) - g(x_0)} = \lim_{x \to x_0} \frac{f'(\xi_1)}{g'(\xi_2)} = \frac{f'(x_0)}{g'(x_0)} \]

since $f'$ and $g'$ are continuous. We used the fact that $\xi_1$ and $\xi_2$ are between $x_0$ and $x$, so $\xi_1 \to x_0$ and $\xi_2 \to x_0$ as $x \to x_0$. □
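As a numerical illustration of the theorem, the sketch below uses a pair of functions chosen by us (they do not appear in the text): $f(x) = e^{2x} - 1$ and $g(x) = \sin 3x$ at $x_0 = 0$, for which $f'(0)/g'(0) = 2/3$.

```python
import math

def f(x):
    return math.exp(2 * x) - 1   # f(0) = 0 and f'(0) = 2

def g(x):
    return math.sin(3 * x)       # g(0) = 0 and g'(0) = 3

predicted = 2 / 3                # f'(0) / g'(0), the limit given by the theorem

for x in (0.1, 0.01, 0.001):
    # The quotient f(x)/g(x) should approach the predicted value as x -> 0.
    print(x, f(x) / g(x), predicted)
```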

Suppose that $f$ is infinitely often continuously differentiable in an open interval about $x_0$. In that case we can define the Taylor polynomial for all $n$. This raises the question of whether the infinite series

\[ \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}\,(x - x_0)^n \]

converges and, if it does, whether it converges to $f(x)$. We study the convergence of power series in Section 6.4, where we shall see that this question is deeper than it looks.

Problems

1. Compare the graph of $\ln x$ to the graphs of the first five Taylor polynomials of $\ln x$ about $x = 1$.

2. Compare the graph of /2 3 4 5 6 7 8 9 10 to the graphs of the first few Taylor


polynomials about x = 0.

3. Use l'Hospital's rule to evaluate the following limits:

(a) limx^o^.

(b) lim^i

(c) limx^0 ^r1-

4. Compute

\[ \lim_{x \to 0} \frac{1}{\sin x} \int_0^{\sin 2x} \cos 5t\,dt. \]

5. Verify all the calculations in Example 2.

6. Use one of the methods in Section 3.4 to estimate the integral for $\ln 1.2$ and compare the result with that obtained in Example 2.

7. Use Taylor polynomials at x — xa to approximate \/9^2 and compare the


result with what you get using your hand calculator. How do you think
the hand calculator makes the computation?

8. Suppose that $f$ is twice continuously differentiable on $[0, 1]$ and that $f(0) = 1$, $f'(0) = 2$, and $|f''(x)| \le .3$ for all $x \in [0, 1]$. Compute $\int_0^1 f(x)\,dx$ as best you can.

9. Let f(x) = s-^F±.

(a) Find the third-order Taylor polynomial for $\sin x$ about $x = 0$.

(b) Use the error estimate in Taylor's theorem to compute $\lim_{x \to 0} f(x)$.

10. (a) Prove that $\ln a^2 = 2 \ln a$ for all $a > 0$. Hint: write

and use a change of variables in the second integral. Change of variables is justified in Section 4.5.

(b) Prove that $\ln a^n = n \ln a$ for all $a > 0$ and all integers $n > 0$.

11. (a) Prove that $\ln x$ is strictly monotone increasing and that $\ln x \to \infty$ as $x \to \infty$. Hint: the harmonic series diverges.

(b) Prove that there is a unique number $e$ such that $\ln e = 1$.

(c) Find the first few Taylor polynomials of $f(x) = \ln(1 + x)$ about $x = 0$.

(d) Use Taylor's theorem to prove that $\lim_{n \to \infty} n \ln\left(1 + \frac{1}{n}\right) = 1$.

(e) Prove that $\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e$.

12. Suppose that $g$ is continuously differentiable in an interval about $x_0$ and $g(x_0) = 0$. If $g'(x_0) \neq 0$, show that there is a small interval about $x_0$ in which neither $g$ nor $g'$ vanishes except for the root of $g$ at the point $x_0$.

13. Suppose that $f$ and $g$ are twice continuously differentiable on an interval containing the point $x_0$ and that $f(x_0) = f'(x_0) = 0 = g'(x_0) = g(x_0)$. Suppose that $g''(x_0) \neq 0$. Prove that

\[ \lim_{x \to x_0} \frac{f(x)}{g(x)} = \frac{f''(x_0)}{g''(x_0)}. \]

14. Evaluate lim^^o cosJ 1 •

4.4 Newton's Method

Finding the roots of functions, that is, the values of x so that f(x) = 0, is
important in both pure and applied mathematics. The most familiar use
is finding the roots of the derivative of a function in order to determine
possible local maxima and minima. Even if f(x) is a relatively simple
function like a polynomial, this is not an easy question. The quadratic
formula allows one to find the roots of quadratic polynomials, and more
complicated formulas allow one to write down analytic expressions for
the roots of cubic and quartic polynomials. But it can be proven that
there are no general formulas for quintic and higher order polynomials.
Newton's method, discovered by Isaac Newton, is based on a simple
geometric idea. Consider the graph of a function f(x) near one of its
roots.
Figure 4.4.1

Let $x_n$ denote the $n$th guess for the root. We will explain how to construct the $(n+1)$st guess. If $f(x_n) = 0$, we have found the root and we can stop. Otherwise, go to the point $(x_n, f(x_n))$ on the graph of $f$. Construct the tangent line to the graph at this point; it has slope $f'(x_n)$. Then $x_{n+1}$ is defined to be the point where the tangent line crosses the $x$-axis. See Figure 4.4.1. To find the formula for $x_{n+1}$, consider the triangle whose vertices are at $(x_n, 0)$, $(x_{n+1}, 0)$, and $(x_n, f(x_n))$. The height is $f(x_n)$ and the base is $x_{n+1} - x_n$, so the slope of the hypotenuse (which is a piece of the tangent line) is the ratio. That is,

\[ f'(x_n) = \frac{-f(x_n)}{x_{n+1} - x_n}. \]

Solving for $x_{n+1}$, we find

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \tag{9} \]

Example 1 Let's try out Newton's method on the polynomial $f(x) = 2 - x^2$. Of course we know that the roots are $\pm\sqrt{2}$, but let's suppose that we don't know that. From a very rough graph, we can see that there is a positive root somewhere between 1 and 2. If we choose as our initial guess $x_0 = 1$ and apply Newton's method, we find

\[
\begin{aligned}
x_0 &= 1.0 \\
x_1 &= 1.5 \\
x_2 &= 1.41667 \\
x_3 &= 1.41422 \\
x_4 &= 1.41421
\end{aligned}
\]

If we choose as our initial guess $x_0 = 1.9$ and apply Newton's method, we find

\[
\begin{aligned}
x_0 &= 1.9 \\
x_1 &= 1.47632 \\
x_2 &= 1.41552 \\
x_3 &= 1.41421 \\
x_4 &= 1.41421
\end{aligned}
\]

In both cases the sequence converges extremely rapidly to the positive root $\sqrt{2} \approx 1.41421$.
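The iteration (9) takes only a few lines to program. The sketch below is one possible implementation, not a definitive one; the function name `newton` and its arguments are our own choices. Applied to $f(x) = 2 - x^2$ it should reproduce the two tables of iterates above, up to rounding in the last digit.

```python
def newton(f, fprime, x0, steps):
    """Return the list [x_0, x_1, ..., x_steps] generated by formula (9)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        slope = fprime(x)
        if slope == 0:
            # Horizontal tangent line: it never crosses the x-axis.
            raise ZeroDivisionError("Newton's method breaks down: f'(x_n) = 0")
        xs.append(x - f(x) / slope)
    return xs

def f(x):
    return 2 - x**2

def fprime(x):
    return -2 * x

for start in (1.0, 1.9):
    print([round(x, 5) for x in newton(f, fprime, start, 4)])
```

Calling `newton(f, fprime, 0.0, 1)` raises the error, since the tangent line at the starting point is horizontal.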

It is clear, even in a simple case like this, that there are pitfalls in Newton's method. First, notice that if we start with $x_0 < 0$, then the sequence of iterates will converge to the root $-\sqrt{2}$ rather than to the root $\sqrt{2}$. Second, if we had been unlucky enough to start with the initial guess $x_0 = 0$, then the tangent line has zero slope and so doesn't intersect the $x$-axis anywhere; hence the method breaks down immediately. Intuition suggests that we can avoid these pitfalls if we start close enough to the root that we want to find. The following theorem shows that under reasonable hypotheses the intuition is correct.

□ Theorem 4.4.1 Suppose that $f(\bar{x}) = 0$ and that $f$ is twice continuously differentiable in an interval $(\bar{x} - \delta, \bar{x} + \delta)$ containing $\bar{x}$. Suppose that $f'(\bar{x}) \neq 0$. Then, if $x_0$ is chosen sufficiently close to $\bar{x}$, the iterates in Newton's method will converge to $\bar{x}$.

Proof. Note that we are not sure yet that the iterates given by formula (9) even exist: for how do we know the iterates stay in the interval or that $f'(x_n) \neq 0$ for each of the $x_n$? For the moment we assume that everything is all right and make an estimate. That estimate will tell us how close we need to choose $x_0$ to $\bar{x}$. From (9),

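\begin{align}
x_{n+1} - \bar{x} &= x_n - \bar{x} - \frac{f(x_n)}{f'(x_n)} \tag{10} \\
&= (x_n - \bar{x}) - \frac{f(x_n) - f(\bar{x})}{f'(x_n)} \tag{11} \\
&= (x_n - \bar{x}) - \frac{f'(\bar{x})(x_n - \bar{x}) + f''(\tau_n)(x_n - \bar{x})^2/2!}{f'(x_n)} \tag{12} \\
&= (x_n - \bar{x}) \left\{ \frac{f'(x_n) - f'(\bar{x})}{f'(x_n)} - \frac{f''(\tau_n)(x_n - \bar{x})}{2!\,f'(x_n)} \right\} \tag{13} \\
&= (x_n - \bar{x}) \left\{ \frac{1}{f'(x_n)} \left( f''(\tilde{\tau}_n) - \frac{f''(\tau_n)}{2!} \right) (x_n - \bar{x}) \right\}. \tag{14}
\end{align}
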

We used Taylor's theorem in (12), and the Mean Value Theorem applied to $f'$ in (14). Hence $\tau_n$ and $\tilde{\tau}_n$ are between $\bar{x}$ and $x_n$. Notice that if the expression in curly brackets has absolute value less than 1, then each iterate will be strictly closer to $\bar{x}$ than the previous one. We are now ready to say how close is close. Since $f'$ is continuous and $f'(\bar{x}) \neq 0$ by hypothesis, we can choose a $\delta_1 < \delta$ such that

\[ |f'(x)| \ge \frac{|f'(\bar{x})|}{2} \quad \text{for all } x \in [\bar{x} - \delta_1, \bar{x} + \delta_1]. \]

Equivalently,

\[ \frac{1}{|f'(x)|} \le \frac{2}{|f'(\bar{x})|} \quad \text{for all } x \in [\bar{x} - \delta_1, \bar{x} + \delta_1]. \]

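Next, since $f''$ is continuous, there is an $M$ so that $|f''(x)| \le M$ for all $x \in [\bar{x} - \delta_1, \bar{x} + \delta_1]$. Thus,

\[ \left| \frac{1}{f'(x_n)} \left( f''(\tilde{\tau}_n) - \frac{f''(\tau_n)}{2!} \right) (x_n - \bar{x}) \right| \le \frac{3M}{|f'(\bar{x})|}\,|x_n - \bar{x}| \le \frac{1}{2} \quad \text{if } |x_n - \bar{x}| \le (6M)^{-1} |f'(\bar{x})|. \tag{15} \]

Therefore, when (15) holds,

\[ |x_{n+1} - \bar{x}| \le \tfrac{1}{2}\,|x_n - \bar{x}|. \tag{16} \]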

Hence, if $x_0$ is within $\delta_1$ of $\bar{x}$ and satisfies (15) for $n = 0$, all the succeeding $x_n$ will be successively closer to $\bar{x}$ and (16) will hold for all $n$. Iterating (16), we obtain

\[ |x_{n+1} - \bar{x}| \le \left( \tfrac{1}{2} \right)^{n+1} |x_0 - \bar{x}|, \tag{17} \]

which proves that the iterates converge to $\bar{x}$. □

We have proven the above theorem to illustrate how to use Taylor's formula to get convergence. One can show convergence under much weaker hypotheses. For example, it is not necessary that $f'(\bar{x}) \neq 0$. Notice that if we define $C = 3M/|f'(\bar{x})|$, then it follows from (14) and the estimate above (15) that

\[ |x_{n+1} - \bar{x}| \le C\,|x_n - \bar{x}|^2. \]

This estimate shows that the convergence in Newton's method is extremely rapid once one is close enough to $\bar{x}$. For example, suppose that $C = 10$ and that $|x_n - \bar{x}| \le 10^{-3}$. Then, $|x_{n+1} - \bar{x}| \le 10 \times (10^{-3})^2 = 10^{-5}$. This explains why the convergence in Example 1 is so rapid. In contrast, another technique, the bisection method, discussed in problems 6 and 7, is reliable but slow.
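The quadratic estimate can be observed directly for the function of Example 1. The sketch below is an illustration under our own choices: it computes the ratios $|x_{n+1} - \bar{x}| / |x_n - \bar{x}|^2$ for $f(x) = 2 - x^2$ with $\bar{x} = \sqrt{2}$, using `math.sqrt` as the reference value. Only the first few steps are examined, because after that the error reaches the level of floating-point rounding and the ratio is no longer meaningful.

```python
import math

def newton_error_ratios(x0, steps):
    """For f(x) = 2 - x^2, return |x_{n+1} - r| / |x_n - r|^2 with r = sqrt(2)."""
    root = math.sqrt(2)
    ratios = []
    x = x0
    for _ in range(steps):
        x_next = x - (2 - x**2) / (-2 * x)   # formula (9)
        ratios.append(abs(x_next - root) / abs(x - root) ** 2)
        x = x_next
    return ratios

# The ratios stay bounded, which is the content of the quadratic estimate.
print(newton_error_ratios(1.0, 3))
```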
For functions with many roots and many points where the derivative
is zero, one must start close to a particular root in order to converge to
it. The following example shows that, even for a beautiful function with
only one root, the iterates of Newton's method may diverge if we don't
start close enough to the root.

Figure 4.4.2

Example 2 Let $f(x) = \arctan x$. The graph of $f(x)$ is shown in Figure 4.4.2. It is not hard to check that $f$ and the root $\bar{x} = 0$ satisfy the hypotheses of Theorem 4.4.1. So if $x_0$ is close enough to zero, the sequence $\{x_n\}$ produced by Newton's method will converge to zero. On the other hand, if one starts far enough away from zero, the sequence of iterates in Newton's method diverges. You can see the reason by drawing the appropriate tangent lines on the graph. If one starts with a large positive $x_0$, then $x_1$ will be even further away from zero, and so forth. We can prove this divergence analytically too. Choose a number $b$ so that $b = \tan 1$; that is, $1 = \arctan b$. Since $\arctan x$ is monotone increasing and odd, we know that

\[ |\arctan x| \ge 1 \quad \text{if } |x| \ge b. \]

Using the inequality $1 + x^2 \ge 2|x|$, we see that

\[ \left| \frac{f(x_n)}{f'(x_n)} \right| = (1 + x_n^2)\,|\arctan x_n| \ge 2|x_n| \]

if $|x_n| \ge b$. It follows from (9) that

\[ |x_{n+1}| \ge |x_n| \]

if $|x_n| \ge b$. Thus if we begin with $|x_0| \ge b$, the successive iterates will get further and further away from the origin.
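Both behaviors in Example 2 are easy to observe numerically. The sketch below is an illustration; the starting points $0.5$ and $b + 0.1$ are our own choices, picked so that one lies close to the root and the other satisfies $|x_0| > b = \tan 1$.

```python
import math

def newton_arctan(x0, steps):
    """Newton iterates for f(x) = arctan x, where f(x)/f'(x) = (1 + x^2) arctan x."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - (1 + x**2) * math.atan(x))
    return xs

b = math.tan(1.0)

converging = newton_arctan(0.5, 5)      # starts close to the root 0
diverging = newton_arctan(b + 0.1, 5)   # starts with |x_0| > b

print(converging)
print(diverging)
```

In the second list the iterates alternate in sign while their absolute values grow, which matches the picture obtained by drawing the tangent lines.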

Example 3 One important use of Newton's method is to solve equations which cannot be solved analytically. For example, suppose that $0 < a < 1$ and that we want to find the $x$ such that

\[ a = e^x (1 - x). \tag{18} \]

Let $f(x) = e^x (1 - x)$. Since $f(0) = 1$ and $f(1) = 0$, the Intermediate Value Theorem guarantees that there is an $x$ between 0 and 1 so that (18) holds. In fact, since $f$ is strictly monotone decreasing, there is a unique solution $x$ for each $a \in (0, 1)$. Unfortunately, no amount of algebraic manipulation (or taking logs) solves (18) explicitly for $x$ in terms of $a$. However, given any particular $a$, we can find $x$ by using Newton's method to find the root of $e^x (1 - x) - a$.
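A minimal sketch of this computation follows. The value $a = 0.25$, the starting guess $x_0 = 1$, the tolerance, and the function name `solve_for_x` are all our own illustrative choices. The starting guess $x_0 = 0$ is avoided on purpose: the derivative of $e^x(1 - x) - a$ is $-x e^x$, which vanishes there.

```python
import math

def solve_for_x(a, x0=1.0, tol=1e-12, max_steps=50):
    """Newton's method for g(x) = e^x (1 - x) - a, whose derivative is g'(x) = -x e^x."""
    x = x0
    for _ in range(max_steps):
        g = math.exp(x) * (1 - x) - a
        if abs(g) < tol:
            return x
        x = x - g / (-x * math.exp(x))   # formula (9) applied to g
    raise RuntimeError("Newton's method did not converge")

x = solve_for_x(0.25)
# The second printed number should agree with a = 0.25.
print(x, math.exp(x) * (1 - x))
```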

When one uses the method of separation of variables (Project 1) to solve ordinary differential equations, the solution, $y(t)$, is typically determined by an implicit relation $g(y(t), t) = 0$, which often cannot be solved to find $y$ explicitly in terms of $t$. However, one can use Newton's method and the idea in Example 3 to find $y$ approximately for different values of $t$. See Project 2.
Finally, we note that finding the roots of functions of one variable is
relatively easy. First we generate the graph of the function. This enables
us to make a good guess for each of the roots, and Newton's method
does the rest. Finding the common roots of several functions of several
variables is much more difficult.

Problems

1. Find the roots of $f(x) = x^4 - 6x^2 + 2x + 1$.

2. Let a = 0.5. Find the x which satisfies (18).

3. Find the values of x so that cos x = x.

4. What is the property of the function $\arctan x$ which makes Newton's method diverge if $x_0$ is large? Find other functions for which Newton's method can diverge if one starts too far from the root.

5. (a) Which hypothesis of Theorem 4.4.1 is not satisfied by the function $f(x) = x^2$?
(b) For different choices of $x_0$, provide numerical evidence that Newton's method converges rapidly to zero anyway.
(c) Prove that for $f(x) = x^2$ and any $x_0$, the iterates in Newton's method converge to zero.

6. Suppose that $f$ is a continuous function on an interval $[a, b]$. If $f(a) < 0$ and $f(b) > 0$, then by the Intermediate Value Theorem, $f$ has a root between $a$ and $b$. If $m$ is the midpoint of $[a, b]$ and $f(m) \neq 0$, then there is a root of $f$ either in $[a, m]$ or in $[m, b]$, depending on the sign of $f(m)$. We carry out the same procedure on this new interval, and so forth. For obvious reasons, this is called the bisection method.

(a) If $\{y_n\}$ is the sequence of midpoints of the intervals in the bisection method, prove that $\{y_n\}$ converges to a number $c$.
(b) Prove that $f(c) = 0$.
(c) Compute in terms of $a$ and $b$ how large $n$ must be to guarantee that $|y_n - c| \le 10^{-6}$.

7. Let $f(x) = x^2 - 2$. Let $\{x_n\}$ be the sequence of iterates generated by Newton's method starting with $x_0 = 1$. Let $\{y_n\}$ be the sequence of iterates generated by the bisection method starting with the interval $[0, 2]$.

(a) How many iterates are required in each method to find $\sqrt{2}$ to within $10^{-6}$?
(b) Explain carefully why Newton's method is so much faster.

8. Explain why the factor $(1/2)^n$ in (17) can be made to be $(1/10)^n$ if we start even closer to the root.

9. Theorem 4.4.1 guarantees that Newton's method will converge if we start in the interval $[\bar{x} - \delta_1, \bar{x} + \delta_1]$. How do we know what $\delta_1$ is? Suppose that $f(x)$ is twice continuously differentiable on $(\bar{x} - \delta, \bar{x} + \delta)$ and $|f''(x)| \le M$. Suppose that $f(\bar{x}) = 0$ and $f'(\bar{x}) = c \neq 0$. Prove that $\delta_1$ in the proof
of Theorem 4.4.1 can be chosen to be Hint: use the Fundamental
Theorem of Calculus to relate f'(x) and f"(x).

10. For $A$ larger than 2.6 (approximately), the function $f(x) = x^3 - Ax^2 + x + 1$ has three real roots, which we denote $r_1(A)$, $r_2(A)$, and $r_3(A)$. Use Newton's method to generate good graphs of $r_1$, $r_2$, and $r_3$ on the interval $[2.6, 4]$.

4.5 Inverse Functions


A function $f$ is said to be strictly monotone increasing if $x < y$ implies $f(x) < f(y)$. Strictly monotone decreasing is defined similarly. It is easy to see from the Fundamental Theorem of Calculus that if $f$ is continuously differentiable and $f'(x) > 0$, then $f$ is strictly monotone increasing, although this condition is not necessary (problem 1). The importance of strict monotonicity is that such a function is automatically one-to-one; that is, if $x \neq y$ then $f(x) \neq f(y)$. Thus, a strictly monotone function $f$ has an inverse function, denoted $f^{-1}$, satisfying $f^{-1}(f(x)) = x$ for all $x \in \mathrm{Dom}(f)$, and $f(f^{-1}(y)) = y$ for all $y \in \mathrm{Dom}(f^{-1}) = \mathrm{Ran}(f)$.

□ Theorem 4.5.1 Suppose that $f$ is a strictly monotone function on an interval $[a, b]$.

(a) If $\mathrm{Ran}(f)$ is an interval, then $f$ is continuous.

(b) If $f$ is continuous, then $f^{-1}$ is continuous.

Proof. We will assume that $f$ is strictly monotone increasing. The proof for the other case is similar. Suppose that $c \in (a, b)$. Then, $f(x) < f(c) < f(z)$ if $x < c < z$, so

\[ L = \sup_{x < c} f(x) \le f(c) \le \inf_{z > c} f(z) = R. \]

Suppose $L < R$. By monotonicity $f(x) \le L$ for $x < c$ and $f(z) \ge R$ for $z > c$. Thus $\mathrm{Ran}(f)$ would contain at most one point, $f(c)$, in the interval $(L, R)$. This is impossible since then $\mathrm{Ran}(f)$ would not be an interval, which would contradict the hypothesis. Therefore $L = f(c) = R$. Now suppose $c_n \to c$ and let $\varepsilon$ be given. Since $f(c) = \sup_{x < c} f(x)$, we can choose $x_1 < c$ so that $f(c) - f(x_1) \le \varepsilon$. Similarly we can choose $z_1 > c$ so that $f(z_1) - f(c) \le \varepsilon$. That is,

\[ f(c) - \varepsilon \le f(x_1) \le f(c) \le f(z_1) \le f(c) + \varepsilon. \]

Since $c_n \to c$, we can choose $N$ so that $x_1 < c_n < z_1$ if $n \ge N$. Therefore, by monotonicity,

\[ f(c) - \varepsilon \le f(x_1) \le f(c_n) \le f(z_1) \le f(c) + \varepsilon \]

for $n \ge N$. This proves that $|f(c_n) - f(c)| \le \varepsilon$ for $n \ge N$, so $f$ is continuous. The proofs at the endpoints $c = a$ and $c = b$ are similar. This proves (a). Note that we needed only monotonicity, not strict monotonicity.
To prove (b) we recall that if $f$ is continuous, then $\mathrm{Ran}(f)$ is an interval (Corollary 3.2.4). Since $f$ is strictly monotone, $f^{-1}$ exists, and it is easy to check that $f^{-1}$ is strictly monotone on the interval $\mathrm{Ran}(f)$. Since the range of $f^{-1}$ is $[a, b]$, part (a) implies that $f^{-1}$ is continuous. □

□ Theorem 4.5.2 Suppose that $f$ is a strictly monotone continuous function on an interval $[a, b]$. If $f$ is differentiable at a point $x$ and $f'(x) \neq 0$, then $f^{-1}$ is differentiable at $y = f(x)$ and

\[ (f^{-1})'(y) = \frac{1}{f'(x)}. \tag{19} \]

Proof. Suppose that $x$ is in $[a, b]$ and satisfies the hypotheses. Let $y = f(x)$. Let $\lambda_n \to 0$ with $\lambda_n \neq 0$ and define $y_n = y + \lambda_n$. Then,

\[
\begin{aligned}
\frac{f^{-1}(y + \lambda_n) - f^{-1}(y)}{\lambda_n} &= \frac{f^{-1}(y_n) - f^{-1}(y)}{y_n - y} \\
&= \frac{f^{-1}(y_n) - x}{f(f^{-1}(y_n)) - f(x)} \\
&= \frac{h_n}{f(x + h_n) - f(x)}
\end{aligned}
\]

where we define $h_n = f^{-1}(y_n) - x$. Note that by strict monotonicity, $h_n \neq 0$, and therefore $f(x + h_n) \neq f(x)$, again by strict monotonicity. Since $f$ is continuous by hypothesis, $f^{-1}$ is continuous by Theorem 4.5.1, which implies that $h_n \to 0$ as $n \to \infty$. Since $f$ is differentiable at $x$, the limit of the right-hand side exists and equals $1/f'(x)$ by Theorem 2.2.6. Thus the limit of the left-hand side exists, which proves that $f^{-1}$ is differentiable at $y$ and that (19) holds. □
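Formula (19) is useful precisely when $f^{-1}$ has no closed form. The sketch below is an illustration with a function of our own choosing, $f(x) = x^5 + x$, which is strictly increasing. Its inverse is computed by bisection (an assumption of this sketch, not a method used in the proof), and a central difference quotient of $f^{-1}$ at $y = f(1) = 2$ is compared with $1/f'(1) = 1/6$.

```python
def f(x):
    return x**5 + x            # strictly increasing on the real line

def fprime(x):
    return 5 * x**4 + 1

def f_inverse(y, lo=-10.0, hi=10.0):
    """Invert f by bisection, assuming f(lo) < y < f(hi)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 1.0
y = f(x)                       # y = 2
h = 1e-6
numeric = (f_inverse(y + h) - f_inverse(y - h)) / (2 * h)

# The two printed numbers should agree to several decimal places.
print(numeric, 1 / fprime(x))
```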

Corollary 4.5.3 If $f$ is continuously differentiable on $[a, b]$ and $f'(x) \neq 0$ for all $x \in [a, b]$, then $f^{-1}$ is continuously differentiable and (19) holds for all $x \in [a, b]$.

Proof. Since $f$ is continuously differentiable and $f'(x) \neq 0$ for all $x$ in $[a, b]$, $f^{-1}$ is differentiable for all $y$ in $\mathrm{Dom}(f^{-1})$ and (19) holds. By hypothesis, $f$ and $f'$ are continuous and $f^{-1}$ is continuous by Theorem 4.5.1. Therefore,

\[ (f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))} \]

is continuous by Theorems 3.1.1(d) and 3.1.2. □

Next, we prove two theorems which justify the usual techniques for changing variables in integrals. Both of these theorems have slick proofs that use the chain rule and the Fundamental Theorem of Calculus (problems 6 and 7). We give the proofs below because changing variables in Riemann sums is central to the definition of the integral over more complicated objects like curves and surfaces.

□ Theorem 4.5.4 Let $f$ be a continuous function on a finite interval $[a, b]$. Suppose that $\phi$ is a strictly monotone continuously differentiable function on an interval $[c, d]$ such that $\phi(c) = a$ and $\phi(d) = b$. Then,

\[ \int_a^b f(x)\,dx = \int_c^d f(\phi(t))\,\phi'(t)\,dt. \tag{20} \]

Proof. Let $P$ be a partition of $[a, b]$ into $n$ subintervals $[x_{i-1}, x_i]$. Let $t_i$ be the point in $[c, d]$ so that $\phi(t_i) = x_i$. Since $\phi$ is strictly monotone, the points $\{t_i\}$ form a partition of $[c, d]$. On each interval $[t_{i-1}, t_i]$, the Mean Value Theorem guarantees the existence of a point $t_i^*$ such that $\phi(t_i) - \phi(t_{i-1}) = \phi'(t_i^*)(t_i - t_{i-1})$. Let $x_i^* = \phi(t_i^*)$. Then,

\begin{align}
\sum_{i=1}^{n} f(x_i^*)(x_i - x_{i-1}) &= \sum_{i=1}^{n} f(\phi(t_i^*))\,(\phi(t_i) - \phi(t_{i-1})) \tag{21} \\
&= \sum_{i=1}^{n} f(\phi(t_i^*))\,\phi'(t_i^*)\,(t_i - t_{i-1}). \tag{22}
\end{align}

Now let the interval length in partition $P$ get small. By Corollary 3.3.2, the Riemann sum on the left side of (21) converges to the left side of (20). On the other hand, since $\phi^{-1}$ is continuous, and therefore uniformly continuous, on $[a, b]$, the interval length of the $\{t_i\}$ partition also gets small. Thus, again by Corollary 3.3.2, the right-hand side of (22) converges to the right-hand side of (20). Thus (20) holds. □
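Formula (20) can be tested numerically on a concrete case. In the sketch below the choices $f = \cos$, $\phi(t) = t^2$ on $[c, d] = [1, 2]$ (so that $[a, b] = [1, 4]$), and the midpoint rule are ours; both sides of (20) are approximated and compared with the exact value $\sin 4 - \sin 1$.

```python
import math

def midpoint_rule(g, lo, hi, n=10_000):
    """Approximate the integral of g over [lo, hi] with n midpoint subintervals."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

f = math.cos

def phi(t):
    return t**2                # strictly increasing on [1, 2], phi(1) = 1, phi(2) = 4

def phi_prime(t):
    return 2 * t

left = midpoint_rule(f, phi(1.0), phi(2.0))                           # integral of f over [a, b]
right = midpoint_rule(lambda t: f(phi(t)) * phi_prime(t), 1.0, 2.0)   # integral over [c, d]

print(left, right, math.sin(4.0) - math.sin(1.0))
```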

□ Theorem 4.5.5 Let $f$ be a continuous function on a finite interval $[a, b]$. Suppose that $\phi$ is a continuously differentiable function such that $\phi'(x) \neq 0$ on $[a, b]$. Then,

\[ \int_a^b f(\phi(x))\,dx = \int_{\phi(a)}^{\phi(b)} f(t)\,(\phi^{-1}(t))'\,dt. \tag{23} \]

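Proof. The proof is very similar to that above, so we just give a sketch here. Choose partition $P$ as above and define $t_i = \phi(x_i)$. By the Mean Value Theorem, there are points $t_i^*$ so that $\phi^{-1}(t_i) - \phi^{-1}(t_{i-1}) = (\phi^{-1})'(t_i^*)(t_i - t_{i-1})$. Define $x_i^*$ so that $\phi(x_i^*) = t_i^*$. Then,

\[
\begin{aligned}
\sum_{i=1}^{n} f(\phi(x_i^*))(x_i - x_{i-1}) &= \sum_{i=1}^{n} f(t_i^*)\,(\phi^{-1}(t_i) - \phi^{-1}(t_{i-1})) \\
&= \sum_{i=1}^{n} f(t_i^*)\,(\phi^{-1})'(t_i^*)\,(t_i - t_{i-1}).
\end{aligned}
\]

Letting the interval length in the partitions get small, we obtain (23). □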

Problems

1. Suppose that $f$ is continuously differentiable on $[a, b]$ and that $f'(x) > 0$. Prove that $f$ is strictly monotone increasing. Give an example of a strictly monotone $f$ whose derivative has a zero.

2. Suppose that $f$ is continuous on $[a, b]$ and that $f$ is one-to-one. Prove that $f$ is either strictly monotone increasing or strictly monotone decreasing.

3. Prove that each of the following functions is strictly monotone on $(0, \infty)$. Compute the inverse function and its derivative explicitly and verify that (19) holds.

(a) $f(x) = x^3$.


(b) f(x) = I.

4. Let $f(x) = x^4 + x^2 + 1$. Compute $(f^{-1})'(3)$ and $(f^{-1})'(21)$.

5. Prove that the function $\tan x$ is strictly monotone increasing on the interval $(-\frac{\pi}{2}, \frac{\pi}{2})$. Use Theorem 4.5.2 to compute a formula for the derivative of $\arctan x$.

6. Give a different proof of Theorem 4.5.4 as follows. Let $F(u) = \int_a^u f(x)\,dx$ and define $G(t) = F(\phi(t))$. Now, apply the Fundamental Theorem of Calculus to $G'$. Note that this proof does not require $\phi$ to be monotone.

7. Give a different proof of Theorem 4.5.5 as follows. Let $G$ be such that $G' = f \circ \phi$. Now, compute $(G \circ \phi^{-1})'$ and use the Fundamental Theorem of Calculus.

8. The natural logarithm was defined in Example 2 of Section 4.3.

(a) Explain why $\ln x$ defined on $(0, \infty)$ has an inverse function. Call it $\phi$. What are the domain and range of $\phi$? Is $\phi$ continuously differentiable?
(b) Show by a change of variables that

(c) Prove that $\ln ab = \ln a + \ln b$ for all positive numbers $a$ and $b$.

(d) Prove that $\phi$ satisfies $\phi(x + y) = \phi(x)\,\phi(y)$ for all real numbers $x$ and $y$. We shall see in Example 2 of Section 6.4 that $\phi(x) = e^x$.

4.6 Functions of Two Variables


In mathematics and its applications, functions which depend on several variables occur frequently. In this book we use such functions in the study of integral equations (Section 5.4), the calculus of variations (Section 5.5), ordinary differential equations (Chapter 7), complex analysis (Chapter 8), and partial differential equations (Section 9.1). Therefore, in this section, we show how two of the fundamental ideas which we have introduced, continuity and differentiation, can be extended to functions of several variables. For simplicity, we shall restrict our attention to functions of two variables that take values in the real numbers.
The Euclidean plane $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ is the set of ordered pairs $(x, y)$ where $x \in \mathbb{R}$ and $y \in \mathbb{R}$. We can add pairs, $(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$, and multiply pairs by real numbers, $a(x, y) = (ax, ay)$. These operations correspond to vector addition and scalar multiplication. We define the Euclidean distance from a point to the origin by $\|(x, y)\| = \sqrt{x^2 + y^2}$. This notion of distance satisfies the triangle inequality (problem 10 of Section 2.2):

\[ \|(x_1, y_1) + (x_2, y_2)\| \le \|(x_1, y_1)\| + \|(x_2, y_2)\|. \]


If $(x_1, y_1)$ and $(x_2, y_2)$ are points in $\mathbb{R}^2$, we define the distance between them to be

\[ \|(x_1, y_1) - (x_2, y_2)\| = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \]

Definition. Let $p_n = (x_n, y_n)$ be a sequence of points in the plane and let $p = (x, y)$. We say that $\{p_n\}$ converges to $p$, written $p_n \to p$, if $\|p_n - p\| \to 0$.

It is not hard to check that $p_n \to p$ if and only if $x_n \to x$ and $y_n \to y$ (problem 10 of Section 2.2). Now that we have a notion of convergence we can define continuity.

Definition. A function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$ is said to be continuous at $p \in \mathrm{Dom}(f)$ if, whenever $p_n \in \mathrm{Dom}(f)$, $p \in \mathrm{Dom}(f)$, and $p_n \to p$, it follows that

\[ \lim_{n \to \infty} f(p_n) = f(p). \]

If $f$ is continuous at every point of a set $E \subseteq \mathbb{R}^2$, then $f$ is said to be continuous on $E$.

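Example 1 Let $f(x, y) = 3x^2 y + y^3 + 1$ and suppose that $p_n = (x_n, y_n)$, $p = (x, y)$, and $p_n \to p$. Then

\[
\begin{aligned}
\lim_{n \to \infty} f(p_n) &= \lim_{n \to \infty} \left( 3x_n^2 y_n + y_n^3 + 1 \right) \\
&= 3 \left( \lim_{n \to \infty} x_n \right)^2 \left( \lim_{n \to \infty} y_n \right) + \left( \lim_{n \to \infty} y_n \right)^3 + 1 \\
&= 3x^2 y + y^3 + 1 \\
&= f(p),
\end{aligned}
\]

where we used the limit theorems from Section 2.2 in the second step. Thus $f$ is continuous on $\mathbb{R}^2$. The same idea can be used to prove that all polynomials in two variables are continuous functions on $\mathbb{R}^2$.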

The three theorems in Section 3.1 have analogues for functions of two variables. The sum, product, and quotient of continuous functions are continuous except possibly where the denominator vanishes. If $f$ is a continuous function from $\mathbb{R}^2$ to $\mathbb{R}$ and $g$ is a continuous function on $\mathbb{R}$ with $\mathrm{Ran}(f) \subseteq \mathrm{Dom}(g)$, then $g \circ f$ is a continuous function on $\mathrm{Dom}(f)$. Thus, a composition such as $\sin(x^2 + y^3)$ is continuous on the plane. Finally, $f$ is continuous at $p$ if and only if, given $\varepsilon > 0$, there is a $\delta > 0$ such that

\[ \|q - p\| \le \delta \quad \text{implies} \quad |f(q) - f(p)| \le \varepsilon. \]

In all these cases, the proofs are virtually identical to those in Section 3.1 except that the absolute value, $|\cdot|$, in $\mathbb{R}$ is replaced by the Euclidean distance, $\|\cdot\|$. The reader is asked to give the proof of the third result in problem 1.

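Definition. A continuous function on a set $E \subseteq \mathbb{R}^2$ is said to be uniformly continuous if, given $\varepsilon > 0$, there exists a $\delta > 0$ such that

\[ \|q - p\| \le \delta \quad \text{implies} \quad |f(q) - f(p)| \le \varepsilon \quad \text{for all } q \text{ and } p \text{ in } E. \]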

Sets of the form $R = \{(x, y) \mid a_1 \le x \le b_1,\ a_2 \le y \le b_2\}$ are called closed rectangles. The following theorem generalizes Theorems 3.2.1, 3.2.2, and 3.2.5.

□ Theorem 4.6.1 Let $f$ be a continuous real-valued function on a closed rectangle $R$ in the plane. Then,

(a) $f$ is bounded on $R$.

(b) There exist points $c$ and $d$ in $R$ so that

$$f(c) = \sup_{p \in R} f(p), \qquad f(d) = \inf_{p \in R} f(p).$$

(c) $f$ is uniformly continuous on $R$.

Proof. The proofs of all three parts follow closely the proofs in Section
3.2 except that they use the generalization of the Bolzano-Weierstrass
theorem proved in problems 10 and 11 of Section 2.6. We will prove part
(a), leaving parts (b) and (c) to the problems. To prove that $f$ is bounded
on $R$, we need to show that there is an $M$ so that $|f(p)| \le M$ for all
$p \in R$. Suppose that this is not true. Then for each large integer $n$, there
is a $p_n \in R$ such that $|f(p_n)| \ge n$. By the (generalization of the) Bolzano-Weierstrass
theorem, the sequence $\{p_n\}$ has a subsequence $\{p_{n_k}\}$ that
converges to a point $p_0 \in R$ as $k \to \infty$. Since $f$ is continuous, $f(p_{n_k}) \to f(p_0)$
as $k \to \infty$. However, this is impossible since $|f(p_{n_k})| \ge n_k \to \infty$. This
contradiction shows that $f$ must be bounded. □

In the proof of Theorem 4.6.1, the only property of $R$ that was used
was that sequences in $R$ have subsequences that converge to a point of
$R$. Sets which have this property are called compact sets. Thus, continuous
functions on compact sets have the three properties (a), (b), and (c).
Compact sets are investigated in problems 5 and 6.

We are now ready to discuss differentiation. Let $(x_0, y_0)$ be a point
in the domain of a function of two variables, $f$, and suppose that all the
points $(x_0 + h, y_0)$ are in the domain of $f$ for $h$ small enough. Then, we
define the partial derivative of $f$ with respect to $x$ at $(x_0, y_0)$, written
$\frac{\partial f}{\partial x}(x_0, y_0)$, by

$$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$$

if the limit exists. Similarly, suppose that all the points $(x_0, y_0 + h)$ are in
the domain of $f$ for $h$ small enough. We define the partial derivative of
$f$ with respect to $y$ at $(x_0, y_0)$, written $\frac{\partial f}{\partial y}(x_0, y_0)$, by

$$\frac{\partial f}{\partial y}(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}$$

if the limit exists. We shall often use the simpler notation $f_x(x_0, y_0)$ and
$f_y(x_0, y_0)$ for $\frac{\partial f}{\partial x}(x_0, y_0)$ and $\frac{\partial f}{\partial y}(x_0, y_0)$, respectively. In the language of
calculus, we compute $f_x$ by holding $y$ fixed and differentiating with respect
to $x$, and we compute $f_y$ by holding $x$ fixed and differentiating
with respect to $y$.

Example 2 Suppose that $f(x, y) = 3x^2y + y^3 + 1$. If we hold $y$ fixed,
then $f(x, y)$ is a polynomial in $x$ and therefore differentiable. Similarly, if
we hold $x$ fixed, $f(x, y)$ is a polynomial in $y$ and therefore differentiable.
Thus $f_x$ and $f_y$ exist at all points $(x, y)$ in the plane and

$$f_x(x, y) = 6xy, \qquad f_y(x, y) = 3x^2 + 3y^2.$$

The same argument shows that the partial derivatives of all polynomials
in two variables exist.
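
As a quick numerical sanity check of Example 2, the following Python sketch compares difference quotients with the formulas $f_x = 6xy$ and $f_y = 3x^2 + 3y^2$. The helper names `partial_x` and `partial_y`, the step size, and the use of a symmetric (central) difference quotient instead of the one-sided quotient in the definition are illustrative assumptions, not part of the text.

```python
def f(x, y):
    """The polynomial from Example 2."""
    return 3 * x**2 * y + y**3 + 1


def partial_x(g, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x (y held fixed)."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def partial_y(g, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y (x held fixed)."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


x0, y0 = 1.5, -2.0
print("f_x estimate:", partial_x(f, x0, y0), " formula:", 6 * x0 * y0)
print("f_y estimate:", partial_y(f, x0, y0), " formula:", 3 * x0**2 + 3 * y0**2)
```

The estimates should agree with the formulas up to a small discretization and rounding error.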

Notice that we have not yet defined what it means for $f$ to be
“differentiable” at a point $(x_0, y_0)$. It would be natural to think that the
right condition is to require that both of the partial derivatives, $f_x$ and $f_y$,
exist. However, the following example shows clearly that this condition
is not strong enough.

Example 3 Let $f(x, y)$ be the function such that $f(x, y) = 1$ if $x > 0$ and $y > 0$ and
$f(x, y) = 0$ otherwise. Notice that the function $f$ is identically zero on the $x$ and $y$ axes.
It follows that the limits of the difference quotients for the partial derivatives of $f$ exist and
that $f_x(0, 0) = 0$ and $f_y(0, 0) = 0$. However,
on any line through $(0, 0)$ with positive slope, the
values of $f$ jump from 0 to 1. See Figure 4.6.1.
Thus, this function should not be considered
“differentiable” at $(0, 0)$. In fact, $f$ is not even
continuous at the point $(0, 0)$.

[Figure 4.6.1]
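
The behavior in Example 3 can be seen numerically. The sketch below (function names are illustrative assumptions) evaluates the difference quotients at the origin along the two axes and along the diagonal $y = x$: the axis quotients are identically zero, while the diagonal quotient equals $1/h$ for $h > 0$ and so has no limit.

```python
def f(x, y):
    """Step function of Example 3: 1 in the open first quadrant, 0 elsewhere."""
    return 1.0 if (x > 0 and y > 0) else 0.0


def quotient_x(h):
    """Difference quotient for f_x at the origin."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h


def quotient_y(h):
    """Difference quotient for f_y at the origin."""
    return (f(0.0, h) - f(0.0, 0.0)) / h


def quotient_diag(h):
    """Difference quotient at the origin along the diagonal y = x."""
    return (f(h, h) - f(0.0, 0.0)) / h


for h in (1e-1, 1e-2, 1e-3):
    print(h, quotient_x(h), quotient_y(h), quotient_diag(h))
```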

Definition. We say that $f$ is differentiable at a point $(x_0, y_0)$ of $\mathbb{R}^2$
if for some $\delta > 0$ the partial derivatives of $f$ exist and are continuous
in the disk $D_\delta(x_0, y_0) = \{(x, y) \in \mathbb{R}^2 \mid (x - x_0)^2 + (y - y_0)^2 < \delta^2\}$. If this
criterion holds at every point of a set $E$, then $f$ is said to be continuously
differentiable on $E$.

This criterion for differentiability is too strong since it implies differentiability
in the entire disk $D_\delta(x_0, y_0)$. More delicate criteria can be
given, but this one will be sufficient for our purposes.

□ Theorem 4.6.2 Suppose that $f$ is differentiable at $(x_0, y_0)$ and let $a$ and
$b$ be any real numbers. Then,

$$\lim_{h \to 0} \frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0)}{h} = a\,\frac{\partial f}{\partial x}(x_0, y_0) + b\,\frac{\partial f}{\partial y}(x_0, y_0). \tag{24}$$

Proof. Let $D_\delta(x_0, y_0)$ be a disk in which the partials exist and are continuous.
Choose $h$ small enough so that the point $(x_0 + ah, y_0 + bh)$ is in
the disk. See Figure 4.6.2. We now rewrite the difference quotient as

$$\begin{aligned}
\frac{f(x_0 + ah, y_0 + bh) - f(x_0, y_0)}{h}
&= \frac{f(x_0 + ah, y_0 + bh) - f(x_0 + ah, y_0)}{h} + \frac{f(x_0 + ah, y_0) - f(x_0, y_0)}{h} \\
&= \frac{\partial f}{\partial y}(x_0 + ah, \xi)\,\frac{bh}{h} + a\,\frac{f(x_0 + ah, y_0) - f(x_0, y_0)}{ah}.
\end{aligned}$$

In the first term on the right, we used the Mean Value Theorem in $y$
for the function $f(x_0 + ah, y)$ on the interval between $y_0$ and $y_0 + bh$, so
$\xi$ lies between $y_0$ and $y_0 + bh$. We can use the Mean Value Theorem because of the hypothesis
that $f$ is continuously differentiable in $y$ for each fixed $x$. (If $a = 0$, the second term is simply zero.) Since
$\xi$ lies between $y_0$ and $y_0 + bh$ and the partial $\frac{\partial f}{\partial y}$ is continuous in the disk, the
limit of the first term on the right as $h \to 0$ is $b\,\frac{\partial f}{\partial y}(x_0, y_0)$.
Since the limit of the second term is $a\,\frac{\partial f}{\partial x}(x_0, y_0)$, we have
proved the theorem. □

[Figure 4.6.2: the points $(x_0, y_0)$, $(x_0 + ah, y_0)$, $(x_0 + ah, \xi)$, and $(x_0 + ah, y_0 + bh)$ in the disk.]
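
Formula (24) can be illustrated numerically. The following sketch reuses the polynomial $3x^2y + y^3 + 1$ from Example 2; the base point, the direction $(a, b) = (0.6, 0.8)$, and the function names are arbitrary illustrative choices. It prints the difference quotient for shrinking $h$ next to the value $a f_x + b f_y$.

```python
def f(x, y):
    return 3 * x**2 * y + y**3 + 1


def fx(x, y):
    """Partial derivative in x, from Example 2."""
    return 6 * x * y


def fy(x, y):
    """Partial derivative in y, from Example 2."""
    return 3 * x**2 + 3 * y**2


def directional_quotient(x0, y0, a, b, h):
    """The difference quotient on the left-hand side of (24)."""
    return (f(x0 + a * h, y0 + b * h) - f(x0, y0)) / h


x0, y0, a, b = 1.0, 2.0, 0.6, 0.8
target = a * fx(x0, y0) + b * fy(x0, y0)
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    q = directional_quotient(x0, y0, a, b, h)
    print(f"h={h:g}  quotient={q:.6f}  error={abs(q - target):.2e}")
```

The error should shrink roughly in proportion to $h$, as one expects from a one-sided difference quotient of a smooth function.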

If $a^2 + b^2 = 1$, the limit in (24) is called the directional derivative in
the direction $(a, b)$. Finally, we prove a particular chain rule which we
use in Section 5.5. Other chain rules are considered in the problems.

□ Theorem 4.6.3 Let $x(t)$ and $y(t)$ be continuously differentiable functions
of one variable and suppose that $f$ is a continuously differentiable
function of two variables. Define $g(t) = f(x(t), y(t))$. Then, $g$ is continuously
differentiable and

$$g'(t) = \frac{\partial f}{\partial x}(x(t), y(t))\,x'(t) + \frac{\partial f}{\partial y}(x(t), y(t))\,y'(t). \tag{25}$$

Proof. Fix $t$ and write the difference quotient for $g$ as

$$\begin{aligned}
\frac{f(x(t+h), y(t+h)) - f(x(t), y(t))}{h}
&= \frac{f(x(t+h), y(t+h)) - f(x(t+h), y(t))}{h} + \frac{f(x(t+h), y(t)) - f(x(t), y(t))}{h} \\
&= \frac{\partial f}{\partial y}(x(t+h), \xi_1)\,\frac{y(t+h) - y(t)}{h} + \frac{\partial f}{\partial x}(\xi_2, y(t))\,\frac{x(t+h) - x(t)}{h}.
\end{aligned}$$

In each term we used the Mean Value Theorem, so $\xi_1$ is between $y(t)$ and
$y(t + h)$ and $\xi_2$ is between $x(t)$ and $x(t + h)$. Thus, as $h \to 0$, we know
that $\xi_1 \to y(t)$ and $\xi_2 \to x(t)$. Since $x(t)$ and $y(t)$ are differentiable and
the partials of $f$ are continuous, the limit of the right-hand side exists
and equals the right-hand side of (25). Thus $g$ is differentiable and (25)
holds. Since the composition and product of continuous functions are
continuous, $g'$ is continuous. □
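
The chain rule (25) can be checked numerically on a concrete curve. In the sketch below, the choices $x(t) = \cos t$, $y(t) = \sin t$ and the polynomial $f$ of Example 2 are illustrative assumptions; the code compares a central-difference estimate of $g'(t)$ with the right-hand side of (25).

```python
import math


def f(x, y):
    return 3 * x**2 * y + y**3 + 1


def fx(x, y):
    return 6 * x * y


def fy(x, y):
    return 3 * x**2 + 3 * y**2


def x(t):
    return math.cos(t)


def y(t):
    return math.sin(t)


def g(t):
    """The composition g(t) = f(x(t), y(t))."""
    return f(x(t), y(t))


def g_prime_chain(t):
    """Right-hand side of (25), using x'(t) = -sin t and y'(t) = cos t."""
    return fx(x(t), y(t)) * (-math.sin(t)) + fy(x(t), y(t)) * math.cos(t)


def g_prime_numeric(t, h=1e-6):
    """Central-difference estimate of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)


for t in (0.0, 0.7, 2.0):
    print(t, g_prime_chain(t), g_prime_numeric(t))
```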

A partial differential equation is an equation involving the partial
derivatives of an unknown function. The investigation of partial differential
equations was a primary stimulus for the rigorization of analysis
in the 19th century, and such equations are used today in many models
in the physical, biological, and social sciences. A simple example, the
wave equation, is discussed in problems 12 and 13. The heat equation is
discussed in Section 9.1.
Most of the concepts and theorems in this section have natural generalizations
to real-valued functions defined on $\mathbb{R}^n = \mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}$. This
is just the beginning of a difficult and important subject, the analysis of
functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. For more information, see [24] or [12]. We discuss
some simple aspects of the integration of functions of two variables
in project 4.

Problems

1. State and prove the analogue of Theorem 3.1.3 for real-valued functions
of two variables.

2. Prove part (b) of Theorem 4.6.1.

3. Prove part (c) of Theorem 4.6.1.

4. Suppose that $g$ and $h$ are continuous functions on $\mathbb{R}$ and $f$ is a continuous
function on $\mathbb{R}^2$. For each $t$, define $y(t) = f(g(t), h(t))$. Prove that $y$ is a
continuous function on $\mathbb{R}$.

5. We wish to find criteria for a subset $E \subseteq \mathbb{R}^2$ to be compact. We know from
problem 8 of Section 2.4 that every Cauchy sequence in $\mathbb{R}^2$ converges to
a point of $\mathbb{R}^2$. We say that $E$ is closed if every Cauchy sequence of points
of $E$ converges to a point of $E$.

(a) Use problem 10 in Section 2.6 to show that if $E$ is closed and
bounded, then $E$ is compact.

(b) Show that if $E$ is not closed and bounded, then $E$ is not compact.

(c) Is $\{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le 1\}$ compact?

(d) Is $\{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 < 1\}$ compact?

(e) Is $\{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \ge 1\}$ compact?

6. Give an example of a noncompact set $E \subseteq \mathbb{R}^2$ and a continuous function
$f$ on $E$ so that none of the three properties (a), (b), or (c) in Theorem 4.6.1
is true.

7. Show that the partial derivatives of

$$f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}$$

exist at $(0, 0)$ but that $f$ is not differentiable there.

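8. Let $f$ be differentiable at $(x, y)$. Show that the directional derivative of $f$
at $(x, y)$ is largest in the direction $(a, b)$ where

$$a = \frac{f_x(x, y)}{\sqrt{f_x(x, y)^2 + f_y(x, y)^2}}, \qquad b = \frac{f_y(x, y)}{\sqrt{f_x(x, y)^2 + f_y(x, y)^2}}.$$

That is, it is largest in the direction of the gradient.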

9. Let $H(x, p)$ be a differentiable function of two variables. For a given point
$(x_0, p_0)$ in the plane, let $x(t)$ and $p(t)$ be the solutions of the following
system of differential equations:

$$x'(t) = \frac{\partial H}{\partial p}(x(t), p(t)), \qquad x(0) = x_0$$

$$p'(t) = -\frac{\partial H}{\partial x}(x(t), p(t)), \qquad p(0) = p_0.$$

This is called a Hamiltonian system, $H$ is called the Hamiltonian, and
the curve $(x(t), p(t))$ in the $x$-$p$ plane is called an orbit.

(a) Prove that $H$ is constant on orbits.

(b) Suppose that $H(x, p) = x^2 + p^2$. What do the orbits look like?

(c) Suppose that $H(x, p) = ax + bp$. What do the orbits look like?

10. Let $g$ be a continuously differentiable function on $\mathbb{R}$ and let $f$ be continuously
differentiable on $\mathbb{R}^2$. Prove that $h(x, y) = g(f(x, y))$ is continuously
differentiable on $\mathbb{R}^2$ and compute its partial derivatives.

11. Let $f$ be continuously differentiable on the plane and define

$$h(x, y) = \int_0^{f(x, y)} G(t)\,dt,$$

where $G$ is a continuous function on $\mathbb{R}$. Prove that $h$ is continuously
differentiable and compute its partial derivatives.

12. Imagine an infinitely long elastic string lying along the $x$-axis. Suppose
the string is set in motion at time $t = 0$, and let $u(x, t)$ denote the vertical
displacement of the string at $x$ at time $t$. According to a simple model for
small displacements, $u$ should satisfy the wave equation

$$u_{tt} - c^2 u_{xx} = 0,$$

where $c$ is a constant determined by the elastic properties of the string.
We assume that $u$ also satisfies the initial conditions

$$u(x, 0) = f(x), \qquad u_t(x, 0) = g(x),$$

where $f$ gives the initial displacement and $g$ gives the initial velocity of
the string at $x$. Assume that $f$ and $g$ are twice continuously differentiable.
Verify that

$$u(x, t) = \frac{f(x + ct) + f(x - ct)}{2} + \frac{1}{2c} \int_{x - ct}^{x + ct} g(s)\,ds$$

solves the wave equation and the initial conditions. This solution was
first written down by J. d'Alembert (1717-1783).

13. Let $f$ be a twice continuously differentiable function of one variable. Define

$$u(t, x, y, z) = \frac{f(t - r)}{r},$$

where $r = \sqrt{x^2 + y^2 + z^2}$. Verify that $u$ satisfies the wave equation in
three dimensions:

$$u_{tt} - u_{xx} - u_{yy} - u_{zz} = 0.$$

Projects

1. A first-order ordinary differential equation with initial condition $y(0) = y_0$
is called variables separable if it can be written in the form

$$f(y)\,\frac{dy}{dt} = g(t).$$

In calculus, you probably learned that to solve the differential equation,
you multiply both sides by “$dt$” to obtain

$$f(y)\,dy = g(t)\,dt.$$


You then integrate the left side with respect to y, obtaining F(y), and
integrate the right side with respect to t, obtaining G(t). Setting F(y) =
G(t) + C, you determine the constant C by the initial condition and then
solve for y in terms of t. How can one integrate one side of an equality
with respect to one variable and the other side with respect to another
variable and expect the results to be equal?

(a) Using the chain rule and the Fundamental Theorem of Calculus,
explain carefully what is really going on here.

(b) Use the method to solve y' = y(0) = 2.

(c) Use the method to solve y' = y2, y(0) = 2.

2. The purpose of this project is to show how Newton's method arises in
solving differential equations.

(a) Show that the solution of the initial value problem

$$\frac{dy}{dt} = \frac{y}{y + 1}, \qquad y(0) = 1,$$

satisfies

$$y(t) + \ln y(t) = t + 1. \tag{26}$$

(b) Try every trick in the book to find an expression for $y$ in terms of $t$.
(c) Use the Intermediate Value Theorem to show that for each $t$ there is
a number $y(t)$ that satisfies (26).
(d) Prove that for each $t$, $y(t)$ is unique.
(e) For a number of different $t$ between 0 and 4, use Newton's method
to determine $y(t)$ and draw a sketch of the graph of $y(t)$.

3. Consider the function $f(x) = e^{-1/x^2}$.

(a) Explain why $f$ is infinitely often differentiable on the set $(-\infty, 0) \cup (0, \infty)$.
(b) Show that $f$ can be extended to be continuous on $\mathbb{R}$ by defining
$f(0) = 0$.
(c) Show that $f$ is continuously differentiable on $\mathbb{R}$ and $f'(0) = 0$.

(d) Prove that for $x \neq 0$ the $n$th derivative of $f$ is of the form
$p_n(1/x)\,e^{-1/x^2}$, where $p_n$ is a polynomial.
(e) Prove that $f$ is infinitely often continuously differentiable on $\mathbb{R}$ and
$f^{(n)}(0) = 0$ for all $n$.

(f) What is the Taylor polynomial approximation to $f$ around the
point $x_0 = 0$?

4. The purpose of this project is to develop some simple aspects of integration
theory on $\mathbb{R}^2$. Let $R = [a, b] \times [c, d]$ be a rectangle in the plane. We
define a partition $P$ of $R$ to be a pair of partitions of $[a, b]$ and $[c, d]$, respectively:

$$a = x_0 < x_1 < x_2 < \cdots < x_N = b$$

$$c = y_0 < y_1 < y_2 < \cdots < y_M = d.$$

For each partition we define rectangles $R_{ij} = [x_{i-1}, x_i] \times [y_{j-1}, y_j]$. Note
that $R = \bigcup_{ij} R_{ij}$. Setting

$$M_{ij} = \sup_{(x, y) \in R_{ij}} f(x, y), \qquad m_{ij} = \inf_{(x, y) \in R_{ij}} f(x, y),$$

we define upper and lower sums corresponding to the partition $P$ by:

$$U_P(f) = \sum_{ij} M_{ij}\,(x_i - x_{i-1})(y_j - y_{j-1})$$

$$L_P(f) = \sum_{ij} m_{ij}\,(x_i - x_{i-1})(y_j - y_{j-1}).$$

(a) Formulate and prove the analogues of Lemma 1 and Lemma 2 in
Section 3.3.

If $\sup_P \{L_P(f)\} = \inf_P \{U_P(f)\}$, we say that $f$ is Riemann integrable
over $R$ and define

$$\iint_R f(x, y)\,dx\,dy = \inf_P U_P(f).$$

(b) Prove that if $f$ is continuous on $R$ then $f$ is Riemann integrable.

(c) Prove the analogue of Corollary 3.3.2.

(d) Prove that $G(y) = \int_a^b f(x, y)\,dx$ is a continuous function of $y$ on
$[c, d]$.

(e) Prove that $\iint_R f(x, y)\,dx\,dy$ can be computed by iterating the integrals;
that is,

$$\iint_R f(x, y)\,dx\,dy = \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy.$$

Hint: write

$$\begin{aligned}
\left| \iint_R f(x, y)\,dx\,dy - \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy \right|
&\le \left| \iint_R f(x, y)\,dx\,dy - \sum_{ij} M_{ij}\,(x_i - x_{i-1})(y_j - y_{j-1}) \right| \\
&\quad + \left| \sum_{ij} M_{ij}\,(x_i - x_{i-1})(y_j - y_{j-1}) - \sum_j (y_j - y_{j-1}) \int_a^b f(x, y_j)\,dx \right| \\
&\quad + \left| \sum_j \left( \int_a^b f(x, y_j)\,dx \right) (y_j - y_{j-1}) - \int_c^d \left( \int_a^b f(x, y)\,dx \right) dy \right|
\end{aligned}$$

and justify carefully why each term can be made less than $\varepsilon/3$ by choosing
the partition appropriately.
CHAPTER 5

Sequences of Functions

In this chapter we study the convergence of sequences of functions. Two
kinds of convergence, pointwise and uniform, are introduced in Section
5.1, and in Section 5.2 several limit theorems, which depend on uniform
convergence, are proved. Section 5.4 (Integral Equations) and Section
5.5 (The Calculus of Variations) show why these limit theorems are useful
and important. In Section 5.3, the point of view shifts from individual
sequences of functions to sets of functions. The supremum norm is introduced,
and the related notions of Cauchy sequence and completeness
are studied. These ideas lead naturally to the definition of metric space
in Section 5.6 and the study of completeness in Section 5.7. An important
class of metric spaces that have linear structure, normed linear spaces, is
introduced in Section 5.8.

5.1 Pointwise and Uniform Convergence

Sequences of real numbers have played a central role in this text so far.
For example, sequences are used in the definitions of continuous and differentiable
functions. The Riemann integral is the limit of a sequence of
Riemann sums. The invariant probabilities of a two-state Markov process
were found by taking the limits of sequences. Newton's method
shows how to find the roots of a polynomial by constructing sequences
of approximations. On the other hand, many of the most important objects
which one studies in analysis are functions. The solution of a differential
or integral equation is a function. The solution of a partial differential
equation is a function of several variables. Often one determines
these solutions by constructing a sequence of functions, $f_n(x)$, that gets
closer and closer to the solution $f(x)$ as $n \to \infty$. This is the technique we
shall use when we study integral equations (Section 5.4) and ordinary
differential equations (Section 7.1). But what do we mean by “closer and
closer”? What does it mean for a sequence of functions to get closer and
closer to a limiting function? As we shall see, there are many different
notions of convergence for sequences of functions. In this section we introduce
two of the simplest, pointwise and uniform convergence. Other
kinds of convergence are discussed in Section 5.3 and later sections.

Definition. A sequence of functions, $f_n$, defined on the same set $E$ is
said to converge pointwise to a limiting function, $f$ on $E$, if

$$\lim_{n \to \infty} f_n(x) = f(x), \quad \text{for every } x \in E. \tag{1}$$

That is, given $\varepsilon > 0$ and $x \in E$, there exists an $N$ so that $n \ge N$ implies

$$|f_n(x) - f(x)| \le \varepsilon. \tag{2}$$

Example 1 Consider the functions $f_n(x) = x^2 - \frac{1}{2n}x + 1$ defined on the
whole real line $\mathbb{R}$. Set $f(x) = x^2 + 1$. The graphs of the first few $f_n$ and $f$
are shown in Figure 5.1.1. For each fixed $x$

$$x^2 - \frac{1}{2n}x + 1 \to x^2 + 1$$

as $n \to \infty$. That is, $f_n(x) \to f(x)$ for each $x$ as $n \to \infty$, so by definition,
the sequence of functions $f_n$ converges pointwise to the function $f$.

[Figure 5.1.1]

Example 2 Consider the sequence of functions $f_n(x) = x^n$. What happens
to the sequence of numbers $f_n(x)$ for different choices of $x$? If
$|x| > 1$, the sequence of numbers $x^n$ diverges. If $|x| < 1$, the sequence
of numbers $x^n$ converges to zero. If $x = 1$, the sequence of numbers $x^n$
converges to 1. If $x = -1$, the sequence of numbers $x^n$ does not converge
because it oscillates between 1 and $-1$. Thus, if $E$ is the whole real line
$\mathbb{R}$, the sequence of functions $\{f_n\}$ does not converge pointwise since the
sequence of numbers $\{f_n(x)\}$ does not converge for every $x \in E$. If $E$
is the interval $[-1, 1]$, the sequence of functions $\{f_n\}$ does not converge
pointwise since the sequence of numbers $\{f_n(x)\}$ does not converge for
$x = -1$. If $E$ is the interval $(-1, 1]$, the sequence of functions $\{f_n\}$ does
converge pointwise since the sequence of numbers $\{f_n(x)\}$ converges
for each $x \in E$. Thus we see that whether or not a sequence of functions
converges pointwise may depend on the set on which the functions are
defined.

Example 3 Let's consider the same sequence of functions as in Example
2, $f_n(x) = x^n$, this time defined on the interval $[0, 1]$.

[Figure 5.1.2]

The sequence of functions $\{f_n\}$ converges pointwise to the function

$$f(x) = \begin{cases} 0, & 0 \le x < 1, \\ 1, & x = 1. \end{cases}$$

Thus, a sequence of continuous functions on a finite interval may converge
pointwise to a limiting function that is not continuous.

Example 4 Consider the sequence of functions $\{f_n\}$ on the interval
$[0, 1]$, whose graphs are shown in Figure 5.1.3. On the interval $[0, 2^{-n}]$,
the graph of $f_n$ consists of the straight line from $(0, 0)$ to $(2^{-1}2^{-n}, 2^n)$ and
the straight line from $(2^{-1}2^{-n}, 2^n)$ to $(2^{-n}, 0)$. For all $x \ge 2^{-n}$, $f_n(x) = 0$.

[Figure 5.1.3]

For any fixed $x > 0$, the sequence of numbers $\{f_n(x)\}$ converges to zero
as $n \to \infty$. In fact, if we choose $N$ so that $2^{-N} < x$, then $f_n(x) = 0$ for all
$n \ge N$. If $x = 0$, then $f_n(0) = 0$ for all $n$, so $f_n(0) \to 0$. Thus the sequence
of functions $f_n$ converges pointwise to the zero function, $f(x) = 0$, on the
interval $[0, 1]$. Notice, however, that for each $n$, $\int_0^1 f_n(x)\,dx = 2^{-1}$ since
the area under each graph is just one half the base ($2^{-n}$) of the triangle
times the height ($2^n$). Therefore,

$$\lim_{n \to \infty} \int_0^1 f_n(x)\,dx = 2^{-1} \neq 0 = \int_0^1 \lim_{n \to \infty} f_n(x)\,dx.$$
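
The two facts in Example 4 (pointwise convergence to zero, yet integrals equal to $1/2$) can be reproduced with a short Python sketch. The names `tent` and `integral_on_unit_interval` are hypothetical; the integral is computed by the trapezoid rule on the breakpoints of the graph, which is exact for a piecewise linear function.

```python
def tent(n, x):
    """The function f_n of Example 4: a triangle of height 2**n over [0, 2**-n]."""
    width = 2.0 ** (-n)
    half = width / 2.0
    height = 2.0 ** n
    if x <= 0.0 or x >= width:
        return 0.0
    if x <= half:
        return height * x / half           # rising edge from (0, 0) to the peak
    return height * (width - x) / half     # falling edge from the peak to (2**-n, 0)


def integral_on_unit_interval(n):
    """Trapezoid rule on the breakpoints 0, 2**-(n+1), 2**-n, 1 (exact here)."""
    nodes = [0.0, 2.0 ** (-n - 1), 2.0 ** (-n), 1.0]
    total = 0.0
    for left, right in zip(nodes, nodes[1:]):
        total += 0.5 * (tent(n, left) + tent(n, right)) * (right - left)
    return total


x = 0.01
print([tent(n, x) for n in range(1, 10)])                   # eventually all zeros
print([integral_on_unit_interval(n) for n in range(1, 10)])  # constant in n
```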

Pointwise convergence is an extremely natural notion. It makes sense
to say that a sequence of functions converges to a limiting function if the
values of the functions converge at each $x$. But Examples 3 and 4 show
that pointwise convergence may not be very useful. In Example 3 we
saw that the pointwise limit of a sequence of continuous functions might
not be continuous. In Example 4 we saw that pointwise convergence
is not enough to guarantee that the limit of the integrals of the $f_n$ is the
integral of the limiting function $f$. We therefore want to define a stronger
notion of convergence that gives us some control over the properties of
the limiting function.

Definition. Suppose that the sequence of functions $\{f_n\}$ converges
pointwise to a function $f$ on the set $E$. Then, the sequence $\{f_n\}$ is said to
converge uniformly to $f$ if, given $\varepsilon > 0$, there exists an $N$ so that $n \ge N$
implies

$$|f_n(x) - f(x)| \le \varepsilon \quad \text{for all } x \in E. \tag{3}$$

There is a very important difference between (2) and (3). In pointwise
convergence, one might have to choose a different $N$ for each different
$x$. In uniform convergence, there is an $N$ which works for all $x$ in the set
$E$. This is a much stronger requirement, and we will see in Section 5.2
that this is just the condition we need. The name “uniform convergence”
comes from the fact that, given $\varepsilon > 0$, we can choose an $N$ so that the
graphs of all the $f_n$, for $n \ge N$, lie in an $\varepsilon$-band about the graph of the
limiting function $f$; that is, they are uniformly close to $f$. Note that, by
definition, uniform convergence implies pointwise convergence. Let's
examine the sequences of functions in the above examples to see whether
they converge uniformly.

Example 1 (revisited) Suppose that $\varepsilon > 0$ is given. Notice that

$$|f_n(x) - f(x)| = \frac{|x|}{2n}.$$

In order to make the right-hand side $\le \varepsilon$, we will have to choose $n$ large,
but how large depends on $x$. The bigger $|x|$ is, the larger $n$ will have to be
chosen so that $|f_n(x) - f(x)| \le \varepsilon$. In particular, given $\varepsilon > 0$ and any $n$,
there is an $x$ so that $|f_n(x) - f(x)| > \varepsilon$. Thus $f_n$ does not converge uniformly
to $f$ on $\mathbb{R}$. However, the convergence is uniform on each finite interval
$[a, b]$. To see this, let $\varepsilon > 0$ be given. Choose $N \ge \max\{|a|, |b|\}/2\varepsilon$.
Then, for all $x \in [a, b]$, $n \ge N$ implies

$$|f_n(x) - f(x)| = \frac{|x|}{2n} \le \frac{\max\{|a|, |b|\}}{2N} \le \varepsilon.$$

Thus $f_n$ converges uniformly to $f$ on $[a, b]$.


Example 3 (revisited) The functions $f_n(x) = x^n$ converge pointwise
but do not converge uniformly on $[0, 1]$. Let $\varepsilon > 0$ be given. The problem
is not at $x = 1$, where all the functions have value 1, but for $x$ near 1.
For each $x < 1$ in $[0, 1]$, the sequence of numbers $x^n$ converges to zero as
$n \to \infty$, but the closer $x$ is to 1 the longer it will take until $x^n$ is less than
$\varepsilon$. In particular, given any small $\varepsilon > 0$ and any $N$, there is an $x \in [0, 1)$ so
that $x^N > \varepsilon$. Thus, there is no $N$ so that (3) holds for all $x$ in $[0, 1]$. On the
other hand, the functions $f_n$ do converge uniformly to the zero function
on $[0, \mu]$ if $\mu < 1$.
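
Both halves of this example can be made concrete in a few lines of Python. In the sketch below, `witness` and `first_index` are hypothetical helper names: the first produces, for each $n$, a point $x < 1$ at which $x^n$ is still about $1/2$, so no single $N$ works on $[0, 1)$; the second finds an $N$ that works on all of $[0, \mu]$, using the fact that $x^n \le \mu^n$ there.

```python
def witness(n, level=0.5):
    """A point x in [0, 1) with x**n equal to `level` (up to rounding)."""
    return level ** (1.0 / n)


def first_index(mu, eps):
    """Smallest n with mu**n < eps; then x**n < eps for every x in [0, mu]."""
    n = 0
    while mu ** n >= eps:
        n += 1
    return n


# On [0, 1): for every n there is a point where x**n is still about 0.5.
for n in (1, 10, 100, 1000):
    x = witness(n)
    print(n, x, x ** n)

# On [0, 0.9]: one index N serves every x in the interval.
N = first_index(0.9, 0.01)
print("N =", N, " sup of x**N on [0, 0.9] =", 0.9 ** N)
```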

Example 4 (revisited) Looking at the graphs of the functions $f_n$ in
Figure 5.1.3, we can see why they do not converge to the zero function
uniformly. The closer $x$ is to zero, the longer it takes before $|f_n(x)| \le \varepsilon$
for any given $\varepsilon$. In particular, given any $0 < \varepsilon < 1$ and any $n$, there
is an $x$ (corresponding, say, to the peak) so that $f_n(x) \ge 1$. Thus, (3)
cannot hold for any choice of $N$, so $\{f_n\}$ does not converge uniformly to
the zero function on $[0, 1]$. Since uniform convergence implies pointwise
convergence and $\{f_n\}$ converges pointwise to the zero function, $\{f_n\}$ cannot
converge uniformly to any other function. The sequence $\{f_n\}$ does
converge uniformly to the zero function on any interval of the form $[\mu, 1]$
if $0 < \mu < 1$.

We will often use the notation $f_n \to f$ to mean that the sequence of
functions converges to the limiting function $f$. The arrow itself doesn't
indicate whether the convergence is pointwise or uniform, so we will
always say $f_n \to f$ pointwise or $f_n \to f$ uniformly. In Section 5.3 we
discuss other notions of convergence.

Problems

1. Prove that the sequence $f_n(x) = x^n$ converges uniformly to zero on any
interval of the form $[0, \mu]$ if $\mu < 1$.

2. Suppose that $g$ is a continuous function on $[a, b]$.

(a) Prove that if $f_n \to f$ pointwise, then $g f_n \to g f$ pointwise.

(b) Prove that if $f_n \to f$ uniformly, then $g f_n \to g f$ uniformly.

3. Let $E$ be a subset of $\mathbb{R}$. Suppose that $f_n \to f$ uniformly on $E$ and $g_n \to g$
uniformly on $E$. Prove that $f_n + g_n \to f + g$ uniformly on $E$.

4. Suppose that $g$ is a continuous function on $[a, b]$ such that $g(x) > 0$ for
each $x \in [a, b]$. Prove that if $f_n \to f$ uniformly, then $f_n/g \to f/g$ uniformly.

5. Prove that $f_n(x) = (x - \frac{1}{n})^2$ converges uniformly on any finite interval.

6. Prove that $\sin(x + \frac{1}{n}) \to \sin x$ uniformly on $\mathbb{R}$. Hint: use the Mean Value
Theorem.

7. Let fn(x) = 1+nz 2 • Prove that fn —> 0 pointwise but not uniformly on
[0,1]-
8. Let fn(x) = Prove that fn —>• 0 pointwise but not uniformly on
[0, oo).

9. Suppose that $g$ is a continuous function on $[0, 1]$ and that $g(1) = 0$. Define
$f_n(x) = g(x)x^n$. Prove that $f_n \to 0$ uniformly.

10. Let $f$ be a uniformly continuous function on $\mathbb{R}$ and define $f_n(x) = f(x + \frac{1}{n})$.
Prove that $f_n \to f$ uniformly.

11. Let $g(x) = e^{-x^2}$ and define $f_n(x) = g(x - n)$. Prove that $f_n \to 0$ pointwise
but not uniformly on $\mathbb{R}$.

12. Let $\{f_n\}$ be a sequence of continuous functions such that $f_n \to f$ uniformly
on $\mathbb{R}$. Suppose that $x_n \to x_0$. Prove that $\lim_{n \to \infty} f_n(x_n) = f(x_0)$.

13. Let $f$ be a continuous function on $[0, 1]$. Under what conditions on $f$ will
the sequence $f_n(x) = f(x)^n$ converge pointwise? Uniformly?

14. (a) Prove that $(1 + \frac{x}{n})^n \to e^x$ for all $x$. Hint: see problem 11 of
Section 4.3.
(b) Prove that $(1 + \frac{x}{n})^n \to e^x$ uniformly on any finite interval $[a, b]$.
(c) Prove that the convergence is not uniform on $\mathbb{R}$.

5.2 Limit Theorems


In this section we prove several theorems which guarantee that limiting
functions have nice properties if the convergence is uniform.

□ Theorem 5.2.1 Let $\{f_n\}$ be a sequence of continuous functions on a set
$E \subseteq \mathbb{R}$. If $f_n \to f$ uniformly on $E$ as $n \to \infty$, then $f$ is continuous on $E$.

Proof. Let $x_0 \in E$ and $\varepsilon > 0$ be given. Since $f_n \to f$ uniformly, we can
choose an $N$ so that $n \ge N$ implies that

$$|f_n(x) - f(x)| \le \varepsilon/3 \quad \text{for all } x \in E. \tag{4}$$

Further, since $f_N$ is continuous, we can find a $\delta$ so that

$$|f_N(x) - f_N(x_0)| \le \varepsilon/3 \quad \text{if } |x - x_0| \le \delta. \tag{5}$$

Using (4) and (5), we see that

$$\begin{aligned}
|f(x) - f(x_0)| &\le |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)| \\
&\le \varepsilon/3 + \varepsilon/3 + \varepsilon/3 \\
&= \varepsilon
\end{aligned}$$

for all $x$ such that $|x - x_0| \le \delta$. By Theorem 3.1.3, this proves that $f$ is
continuous at $x_0$. Since $x_0$ was arbitrary, $f$ is continuous on $E$. □

Example 3 of Section 5.1 shows that the limiting function may not be
continuous if the convergence is not uniform.

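□ Theorem 5.2.2 Let $\{f_n\}$ be a sequence of continuous functions on a finite
interval $[a, b]$. If $f_n \to f$ uniformly as $n \to \infty$, then

$$\lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx.$$

Proof. Let $\varepsilon > 0$ be given. Since $f_n \to f$ uniformly, we can choose an $N$
so that $n \ge N$ implies that

$$|f_n(x) - f(x)| \le \frac{\varepsilon}{b - a} \quad \text{for all } x \in [a, b].$$

Thus,

$$\begin{aligned}
\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| &= \left| \int_a^b (f_n(x) - f(x))\,dx \right| \\
&\le \int_a^b |f_n(x) - f(x)|\,dx \\
&\le \frac{\varepsilon}{b - a} \int_a^b dx \\
&= \varepsilon
\end{aligned}$$

if $n \ge N$. Thus $\int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx$. In the next to last step we used
the estimate in Corollary 3.3.6. □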

Example 4 in Section 5.1 shows that the conclusion of Theorem 5.2.2


may not hold if the convergence is not uniform.

Example 1 To see that the hypothesis that $[a, b]$ is a finite interval is
needed in Theorem 5.2.2, consider the following sequence of continuous
functions on $[0, \infty)$. Let $f_n(x) = 1/n$ on the interval $[0, n)$; let $f_n$ be the
straight line from the point $(n, 1/n)$ to the point $(n + 1, 0)$ on the interval
$[n, n + 1]$; let $f_n(x) = 0$ for $x > n + 1$. Since $0 \le f_n(x) \le 1/n$ for all $x$, it is
easy to see that $f_n \to 0$ uniformly as $n \to \infty$. However,

$$\int_0^\infty f_n(x)\,dx = n \cdot \frac{1}{n} + \frac{1}{2} \cdot \frac{1}{n} = 1 + \frac{1}{2n}.$$

Thus, $\lim_{n \to \infty} \int_0^\infty f_n(x)\,dx = 1$, while the integral of the limiting function
is 0.
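
The following Python sketch illustrates this example under the definition of $f_n$ given above; the names `plateau` and `integral` are hypothetical. Because $f_n$ is piecewise linear and vanishes beyond $n + 1$, the trapezoid rule on the breakpoints $0$, $n$, $n + 1$ gives the integral exactly. The maximum of $f_n$ shrinks to zero while the integral stays near 1.

```python
def plateau(n, x):
    """f_n from the example: 1/n on [0, n), linear down to 0 on [n, n+1], then 0."""
    if x < n:
        return 1.0 / n
    if x <= n + 1:
        return (n + 1 - x) / n
    return 0.0


def integral(n):
    """Integral of f_n over [0, infinity) via the trapezoid rule on the breakpoints."""
    nodes = [0.0, float(n), float(n + 1)]
    total = 0.0
    for left, right in zip(nodes, nodes[1:]):
        total += 0.5 * (plateau(n, left) + plateau(n, right)) * (right - left)
    return total


for n in (1, 10, 100, 1000):
    print(n, "max =", plateau(n, 0.0), " integral =", integral(n))
```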

□ Theorem 5.2.3 Let $\{f_n\}$ be a sequence of continuously differentiable
functions on a finite interval $[a, b]$. Suppose that the sequence of derivatives
$\{f_n'\}$ converges uniformly to a function $g$ on $[a, b]$ and that for one
point, $x_0$, the sequence of real numbers $\{f_n(x_0)\}$ converges. Then, $\{f_n\}$
converges uniformly to a continuous function $f$ on $[a, b]$, $f$ is continuously
differentiable, and $f' = g$.

Proof. Since $f_n'$ is continuous, the Fundamental Theorem of Calculus
guarantees that

$$f_n(x) = f_n(x_0) + \int_{x_0}^x f_n'(t)\,dt \tag{6}$$

for each $x \in [a, b]$. Now, $f_n' \to g$ uniformly, so by Theorem 5.2.2,

$$\lim_{n \to \infty} \int_{x_0}^x f_n'(t)\,dt = \int_{x_0}^x g(t)\,dt.$$

In addition, $\{f_n(x_0)\}$ converges by hypothesis, so the right-hand side
of (6) converges as $n \to \infty$. Therefore, for each $x$, the left-hand side
of (6) converges, too. If we define $f(x) = \lim_{n \to \infty} f_n(x)$ and $c = \lim_{n \to \infty} f_n(x_0)$, then

$$f(x) = c + \int_{x_0}^x g(t)\,dt \tag{7}$$

for each $x \in [a, b]$. Since $g$ is continuous, we know by the Fundamental
Theorem of Calculus that $f$ is continuously differentiable and $f' = g$. It
remains to show that the convergence $f_n \to f$ is uniform. Let $\varepsilon > 0$ be
given. Choose $N_1$ so that $n \ge N_1$ implies $|c - f_n(x_0)| \le \varepsilon/2$. Choose
$N_2 \ge N_1$ so that $n \ge N_2$ implies that $|f_n'(t) - g(t)| \le \frac{\varepsilon}{2(b - a)}$ for all $t \in [a, b]$. Then, if
$n \ge N_2$,

$$\begin{aligned}
|f_n(x) - f(x)| &\le |f_n(x_0) - c| + \left| \int_{x_0}^x (f_n'(t) - g(t))\,dt \right| \\
&\le \varepsilon/2 + \left| \int_{x_0}^x |f_n'(t) - g(t)|\,dt \right| \\
&\le \varepsilon/2 + \frac{\varepsilon}{2(b - a)} \left| \int_{x_0}^x dt \right| \\
&\le \varepsilon/2 + \varepsilon/2
\end{aligned}$$

for all $x \in [a, b]$. Thus, $f_n \to f$ uniformly. □

Finally, we prove that one can differentiate under the integral sign
with respect to a parameter. This theorem plays an important role in our
derivation of the Euler equation in Section 5.5.

□ Theorem 5.2.4 Let $f$ be a continuous function on the rectangle $Q = [a, b] \times [c, d]$.
Suppose that for each $x$ the function $f(x, \cdot)$ is differentiable
on $[c, d]$ and $f_y$ is continuous on $Q$. Define a function $F$ on $[c, d]$ by

$$F(y) = \int_a^b f(x, y)\,dx.$$

Then $F$ is continuously differentiable and

$$F'(y) = \int_a^b f_y(x, y)\,dx. \tag{8}$$

Proof. To show that $F(y)$ is differentiable, suppose that $h_n \neq 0$ and
$h_n \to 0$. We want to show that the limit of the difference quotient

$$\lim_{n \to \infty} \frac{F(y + h_n) - F(y)}{h_n} = \lim_{n \to \infty} \int_a^b \frac{f(x, y + h_n) - f(x, y)}{h_n}\,dx \tag{9}$$

exists and equals the right-hand side of (8). Since $f$ is continuously differentiable
in the second variable,

$$\frac{f(x, y + h_n) - f(x, y)}{h_n} \to f_y(x, y) \tag{10}$$

as $n \to \infty$ for each fixed $x$. What we need to know is that we can bring
the limit on the right-hand side of (9) inside the integral. This is exactly
the question treated in Theorem 5.2.2. According to the hypothesis of
that theorem, we need to know that the convergence in (10) is uniform
in $x$ in order to interchange the limit and the integral. By the Mean Value
Theorem, for each fixed $x$,

$$\frac{f(x, y + h_n) - f(x, y)}{h_n} = f_y(x, \xi)$$

for some $\xi$ between $y$ and $y + h_n$. By hypothesis, $f_y$ is continuous on the rectangle.
This implies, by Theorem 4.6.1(c), that $f_y$ is uniformly continuous on the
rectangle. Thus, given $\varepsilon > 0$, we can choose a $\delta$ so that for all $x \in [a, b]$, we
have $|f_y(x, z) - f_y(x, y)| \le \varepsilon$ if $|z - y| \le \delta$. Therefore, choosing $n$ large
enough so that $|h_n| \le \delta$ guarantees that

$$\left| \frac{f(x, y + h_n) - f(x, y)}{h_n} - f_y(x, y) \right| = |f_y(x, \xi) - f_y(x, y)| \le \varepsilon,$$

so the convergence in (10) is uniform for $x$ in $[a, b]$. Thus, by Theorem
5.2.2, the limit in (9) exists and can be computed by taking the limit inside
the integral. This proves (8). The proof that $F'$ is continuous is left to the
reader as problem 10. □
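
Formula (8) lends itself to a numerical check. The sketch below is only an illustration under assumed choices: the integrand $f(x, y) = \sin(xy)$ on $[0, 1]$ in $x$, Simpson's rule for the integrals, and a central difference for $F'$; none of these come from the text. It compares the derivative of $F$ with the integral of $f_y(x, y) = x\cos(xy)$.

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def f(x, y):
    return math.sin(x * y)


def f_y(x, y):
    """Partial derivative of f with respect to y."""
    return x * math.cos(x * y)


def F(y):
    """F(y) = integral of f(x, y) over x in [0, 1]."""
    return simpson(lambda x: f(x, y), 0.0, 1.0)


def F_prime_formula(y):
    """Right-hand side of (8): integrate f_y over x in [0, 1]."""
    return simpson(lambda x: f_y(x, y), 0.0, 1.0)


def F_prime_numeric(y, h=1e-5):
    """Central-difference estimate of F'(y)."""
    return (F(y + h) - F(y - h)) / (2 * h)


for y in (0.5, 1.0, 2.0):
    print(y, F_prime_formula(y), F_prime_numeric(y))
```

For this particular integrand one can also integrate by hand: at $y = 1$, $\int_0^1 x\cos x\,dx = \sin 1 + \cos 1 - 1$, which gives an independent value to compare against.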

Problems

1. Let $f_n(x) = e^{-nx}$ on the interval $[0, 1]$. Explain why the sequence of functions
$\{f_n\}$ converges pointwise on $[0, 1]$. What is the limiting function? Is
it continuous? Is the convergence uniform?

2. Compute the following limits:

(a) limn_+oo f*(x + £)1 2 3 dx.

(b) limn^oo /x2 e~nx dx.

(c) lim^oo // sin (x + J) dx.

(d) limn^oo f* (1 + f )n dx.

3. Give an example of a sequence of continuously differentiable functions
$\{f_n\}$ on $[0, 1]$ so that $f_n \to f$ uniformly but $f$ is not differentiable at all
points of $[0, 1]$. Hint: draw graphs first.

4. Let $\{f_n\}$ be a sequence of continuous functions on a finite interval $[a,b]$
that converges uniformly to $f$. Show that for all continuous functions $g$
on $[a, b]$,

$$\lim_{n \to \infty} \int_a^b f_n(t) g(t)\, dt = \int_a^b f(t) g(t)\, dt.$$

5. Suppose that $f$ is continuous and $|f(x)| < 1$ for all $x \in [a, b]$. Prove that

$$\lim_{n \to \infty} \int_a^b (f(t))^n\, dt = 0.$$

6. Let $\{f_n\}$ be a sequence of continuous functions that converges uniformly
on $[0,1]$. Show that there is an $M$ so that $|f_n(x)| \le M$ for all $n$ and all
$x \in [0,1]$.

7. Suppose that $\{f_n\}$ is a sequence of continuous functions on an open interval
$(a, b)$ that converges uniformly to $f$ on $(a, b)$. Suppose that each $f_n$
is uniformly continuous on $(a, b)$. Prove that $f$ is uniformly continuous
on $(a, b)$.

8. Show by example that the hypothesis of Theorem 5.2.3, that $\{f_n(x)\}$ converges
for at least one $x$, is needed to obtain the conclusion.

9. Let $\{f_n\}$ be a sequence of continuous functions defined on $\mathbb{R}$, and suppose
that $f_n \to f$ uniformly on every finite interval $[a, b]$. Prove that $f$ is
a continuous function on $\mathbb{R}$. If each of the functions $f_n$ is bounded, is it
necessarily true that $f$ is bounded?

10. Complete the proof of Theorem 5.2.4 by showing that F' is continuous.

11. Let $f$ be a continuous function on $[0, \infty)$ which equals zero outside the
interval $[a, b]$. For each $\lambda > 0$, define

$$F(\lambda) = \int_0^\infty e^{-\lambda x} f(x)\, dx.$$

By using Theorem 5.2.4, prove that $F$ is infinitely often continuously
differentiable on $(0, \infty)$. Remark: $F$ is called the Laplace transform of $f$.

12. Let $\{f_n\}$ be a sequence of continuous functions defined on $\mathbb{R}$, and suppose
that $f_n \to f$ uniformly on every finite interval $[a, b]$. Suppose that there
is a nonnegative continuous function $g$ on $\mathbb{R}$ such that $|f_n(x)| \le g(x)$
for all $n$ and all $x \in \mathbb{R}$ and suppose that the improper Riemann integral
$\int_{-\infty}^{\infty} g(x)\, dx$ exists.

(a) Prove that

$$\lim_{n \to \infty} \int_{-\infty}^{\infty} f_n(x)\, dx = \int_{-\infty}^{\infty} f(x)\, dx.$$

(b) Find a counterexample that shows that if the hypothesis
$|f_n(x)| \le g(x)$ is omitted, then the conclusion may not hold.

13. Compute dx.


14. Suppose that $f$ is bounded and continuous on $[0, \infty)$. Prove that the
Laplace transform of $f$ is infinitely often continuously differentiable.
Hint: use problem 12 and the idea in the proof of Theorem 5.2.4.

15. Suppose that $\{f_n\}$ is a sequence of continuous functions on a finite interval
$[a,b]$. Suppose that $f_n \to 0$ pointwise on $[a, b]$ and that for each
$x \in [a, b]$ the sequence of numbers $\{f_n(x)\}$ is nonincreasing. Prove that
$f_n \to 0$ uniformly. Hint: if not, there is an $\varepsilon > 0$ and points $x_n$ so that
$f_n(x_n) > \varepsilon$. Use the Bolzano-Weierstrass theorem to get a contradiction.

5.3 The Supremum Norm


Recall the reason for the definition of Cauchy sequence that we introduced
in Section 2.4. We wanted a criterion for convergence of a sequence
$\{a_n\}$ of real numbers that depended on the sequence itself, not
on the limit. The reason was simple. In many situations we construct
a sequence $\{a_n\}$ by some procedure and then need to know that it converges.
The limit is not given to us; its existence is, in fact, the goal of
our calculations. The Cauchy criterion allows us to assert the existence
of the limit, given estimates on the sequence itself. We used this idea
in the Bolzano-Weierstrass theorem, in the definition of the Riemann integral
and in many proofs. Since the definitions of “continuous” and
“differentiable” involve limits, the Cauchy criterion for sequences of real
numbers is involved, either explicitly or implicitly, in almost every theorem
that we have proved.
Many of the functions which are “solutions” of problems in analysis
(for example, integral equations or differential equations) are constructed
as the limit of a sequence of functions. Thus, we need a Cauchy
criterion for sequences of functions analogous to the Cauchy criterion
for sequences of real numbers. First, we need a way of saying when two
functions are “close,” and second, in order to do that, we need a notion
of “size” for functions.

Definition. Let $f$ be a bounded function on a set $E \subseteq \mathbb{R}$. We define

$$\|f\|_\infty = \sup_{x \in E} |f(x)|.$$

The number $\|f\|_\infty$ is called the supremum norm or the sup norm of the
function $f$.

Thus, $\|f\|_\infty$ is just the supremum of the values of $|f(x)|$ on the set $E$.
Usually the set $E$ will be an interval, finite or infinite. Notice that $\|f\|_\infty$
depends on the set $E$, but we will not usually indicate that in the symbol
$\|f\|_\infty$. If $f$ is a continuous function on a finite interval $[a, b]$, then $|f|$ is
also continuous on $[a, b]$, so $\|f\|_\infty$ is just the maximum of $|f(x)|$ on $[a, b]$.
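
For a continuous function on a finite interval, the sup norm can be estimated by sampling. The following is a minimal sketch under that assumption; the helper name `sup_norm` and the grid size are illustrative choices, and sampling only ever gives a lower bound for the true supremum.

```python
import math

def sup_norm(f, a, b, samples=10001):
    """Approximate sup |f(x)| over [a, b] by sampling an evenly spaced grid.

    The sampled maximum is a lower bound for the true supremum; for a
    continuous f it approaches the maximum of |f| as the grid is refined.
    """
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step)) for i in range(samples))

print(sup_norm(math.sin, 0.0, math.pi))         # close to 1
print(sup_norm(lambda x: x * x - x, 0.0, 1.0))  # close to 1/4
```

Changing the interval changes the answer, which is the dependence on $E$ noted above: the same function `math.sin` sampled on $[0, 1]$ gives a smaller value.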

Proposition 5.3.1 Let $f$ and $g$ be functions defined on a set $E \subseteq \mathbb{R}$.
Then,

(a) $\|f\|_\infty \ge 0$ and $\|f\|_\infty = 0$ if and only if $f$ is the zero function on $E$.

(b) For every $a \in \mathbb{R}$, we have $\|af\|_\infty = |a|\, \|f\|_\infty$.

(c) $\|f + g\|_\infty \le \|f\|_\infty + \|g\|_\infty$ (the triangle inequality).

Proof. The proofs follow quite easily from the properties of sup proven
in Section 2.5. Since $|f(x)| \ge 0$ for each $x \in E$, the supremum over all
such $|f(x)|$ must be nonnegative. Further, it is clear that $\|f\|_\infty = 0$ if and
only if $|f(x)| = 0$ for all $x \in E$, in which case $f$ is the zero function on $E$
by definition. This proves (a).
To prove (b), note that

$$\begin{aligned}
\|af\|_\infty &= \sup\{|af(x)| \mid x \in E\} \\
&= \sup\{|a|\, |f(x)| \mid x \in E\} \\
&= |a| \sup\{|f(x)| \mid x \in E\} \\
&= |a|\, \|f\|_\infty,
\end{aligned}$$

where we used problem 8 of Section 2.5 in the next to last step.
Finally, for each $x \in E$, we know that

$$|f(x) + g(x)| \le |f(x)| + |g(x)| \le \|f\|_\infty + \|g\|_\infty.$$

Thus, $\|f\|_\infty + \|g\|_\infty$ is an upper bound for $\{|f(x) + g(x)| \mid x \in E\}$, and
therefore

$$\|f + g\|_\infty = \sup\{|f(x) + g(x)| \mid x \in E\} \le \|f\|_\infty + \|g\|_\infty,$$

which proves (c). □

Note that the properties of $\|f\|_\infty$ that we have just proven are analogous
to the properties of absolute value in measuring the size of real
numbers: (a) $|x| \ge 0$ and $|x| = 0$ if and only if $x = 0$; (b) $|ax| = |a|\,|x|$
for all $x$; (c) for all $x$ and $y$, $|x + y| \le |x| + |y|$. The sup norm is not the
only way one can measure the size of functions. Other notions of size are
discussed below.
Using the sup norm, we can define convergence.

Definition. A sequence of functions $\{f_n\}$ on a set $E \subseteq \mathbb{R}$ is said to
converge in the sup norm to a function $f$ if, given $\varepsilon > 0$, there is an $N$
such that

$$\|f_n - f\|_\infty \le \varepsilon \quad \text{for } n \ge N.$$

In other words, $\{f_n\}$ converges to $f$ in the sup norm if and only if

$$\lim_{n \to \infty} \|f_n - f\|_\infty = 0.$$

Proposition 5.3.2 Let $\{f_n\}$ be a sequence of functions defined on a set
$E \subseteq \mathbb{R}$. Let $f$ be a function on $E$. Then, $f_n$ converges to $f$ uniformly on
$E$ if and only if $f_n$ converges to $f$ in the sup norm on $E$.

Proof. Suppose that $f_n$ converges to $f$ uniformly and let $\varepsilon > 0$ be given.
We can choose an $N$ so that

$$|f_n(x) - f(x)| \le \varepsilon \quad \text{for all } n \ge N \text{ and all } x \in E. \tag{11}$$

Therefore,

$$\|f_n - f\|_\infty \le \varepsilon \quad \text{for } n \ge N. \tag{12}$$

Thus, by definition, $f_n$ converges to $f$ in the sup norm.
Conversely, if $f_n$ converges to $f$ in the sup norm, then (12) holds,
which implies (11). Therefore, $f_n$ converges to $f$ uniformly. □
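
The proposition turns uniform convergence into a statement about a single sequence of numbers, which is easy to tabulate. The sketch below uses two sequences chosen purely for illustration (they do not come from the text): $f_n(x) = nx/(1 + n^2x^2)$, which tends to $0$ at every point of $[0,1]$ but has $f_n(1/n) = 1/2$, and $g_n(x) = x/n$, whose sup norm on $[0,1]$ is $1/n$. The sup norm is approximated by sampling.

```python
def sup_distance(f, g, a, b, samples=2001):
    """Approximate the sup norm of f - g on [a, b] by sampling a grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + i * step) - g(a + i * step)) for i in range(samples))

def zero(x):
    return 0.0

def bump(n):
    # f_n(x) = n x / (1 + n^2 x^2): tends to 0 at every x, but f_n(1/n) = 1/2.
    return lambda x: n * x / (1.0 + (n * x) ** 2)

def ramp(n):
    # g_n(x) = x / n: tends to 0 with sup norm 1/n on [0, 1].
    return lambda x: x / n

for n in (1, 10, 100):
    print(n,
          sup_distance(bump(n), zero, 0.0, 1.0),
          sup_distance(ramp(n), zero, 0.0, 1.0))
```

The first column of distances stays near $1/2$, so that sequence converges pointwise but not in the sup norm; the second column tends to zero, so that sequence converges uniformly.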

We have reformulated uniform convergence in terms of the sup norm
so that we can make the following definition, which is the analogue of
the definition of Cauchy sequence in Section 2.4.

Definition. A sequence of functions $\{f_n\}$ on a set $E \subseteq \mathbb{R}$ is said to be
a Cauchy sequence in the sup norm if, given any $\varepsilon > 0$, there is an $N$
such that

$$\|f_n - f_m\|_\infty \le \varepsilon \quad \text{for all } n \ge N \text{ and } m \ge N. \tag{13}$$

If $f_n$ converges to $f$ in the sup norm, then it is not hard to see that the sequence
$\{f_n\}$ is a Cauchy sequence (problem 1). The more difficult question
is this. Let $S$ be a set of functions. Does a Cauchy sequence of functions
in $S$ necessarily have a limit in $S$? If $S$ is the set of continuous functions
on a set $E$, the answer is yes.

□ Theorem 5.3.3 Let $\{f_n\}$ be a sequence of continuous functions on $E \subseteq \mathbb{R}$.
Suppose that $\{f_n\}$ is a Cauchy sequence in the sup norm. Then, there
exists a continuous function $f$ on $E$ such that

$$\lim_{n \to \infty} \|f_n - f\|_\infty = 0.$$

Proof. Since $\{f_n\}$ is a Cauchy sequence in the sup norm, given $\varepsilon > 0$,
we can choose $N$ so that (13) holds. For each $x \in E$, $|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty$, so

$$|f_n(x) - f_m(x)| \le \varepsilon \quad \text{for all } n \ge N \text{ and } m \ge N. \tag{14}$$

Thus, for each $x$, the sequence of real numbers $\{f_n(x)\}$ is a Cauchy sequence
and therefore converges because the real numbers are complete.
Define $f(x) = \lim_{n \to \infty} f_n(x)$. Since the absolute value is a continuous
function,

$$|f(x) - f_m(x)| = \lim_{n \to \infty} |f_n(x) - f_m(x)|,$$

so (14) implies that

$$|f(x) - f_m(x)| \le \varepsilon \quad \text{for all } m \ge N \text{ and all } x \in E.$$

This shows that $f_m \to f$ uniformly. Therefore, $f$ is a continuous function
(by Theorem 5.2.1), and $f_n$ converges to $f$ in the sup norm (by Proposition
5.3.2). □

Because of Theorem 5.3.3, we say that $C[a,b]$, the set of continuous
functions on a finite interval $[a, b]$, is complete in the sup norm. That
is, every sequence of continuous functions which is a Cauchy sequence
in the sup norm on $[a, b]$ converges to a limiting continuous function.
This fact will play as important a role in the subsequent chapters as did
the convergence of Cauchy sequences of real numbers in the previous
chapters.

If we use the sup norm, the “size” of a continuous function on $[a, b]$ is
just the maximum value of $|f(x)|$. There are other reasonable notions of
size. For example, we could define

$$\|f\|_1 = \int_a^b |f(x)|\, dx,$$

in which case we are taking the size of $f$ to be the area under the graph
of $|f(x)|$. More generally, we define for $1 \le p < \infty$,

$$\|f\|_p = \left( \int_a^b |f(x)|^p\, dx \right)^{1/p},$$

although these sizes do not have simple geometric interpretations if
$p > 1$. Each of these is called a “norm” on $C[a,b]$ because it satisfies
the three properties, (a), (b), and (c), of Proposition 5.3.1. This is easy
to see in case $p = 1$ (problem 4), but harder to see for $p > 1$. These different
norms are useful in different circumstances. For example, we will
see that the $L^2$ norm $\|f\|_2$ arises naturally in the study of Fourier series
(Chapter 9). We give a formal definition of “norm” in Section 5.8. The
point of introducing norms here is to point out that there are other reasonable
ways to measure the size of functions and to raise the question
of whether $C[a, b]$ is complete if we use one of these other norms.

Figure 5.3.1

Example 1 Consider the functions fn on [0,1] depicted in Figure 5.3.1


and defined by

Each $f_n$ is a positive continuous function on $[0, 1]$, and if $m > n$ we have
$f_m(x) \ge f_n(x)$ for each $x$, so

Thus, given $\varepsilon > 0$, we can choose $N$ so that

$$\|f_m - f_n\|_1 \le \varepsilon \quad \text{for all } n \ge N \text{ and } m \ge N.$$

Therefore, $\{f_n\}$ is a sequence of continuous functions that is a Cauchy
sequence in the norm $\|\cdot\|_1$. Can there exist a continuous function $f$ so
that $\|f - f_n\|_1 \to 0$ as $n \to \infty$? Evidently not, because any such $f$ would
be bounded by Theorem 3.2.1, and for large enough $n$ the function $f_n$
will have more and more area over the bound of $f$; thus, it would be
impossible for $\|f_n - f\|_1$ to converge to zero. Thus, $\{f_n\}$ is a Cauchy
sequence of functions in $C[0,1]$ in the norm $\|\cdot\|_1$ that does not have a
limit in $C[0,1]$. Thus $C[0,1]$ is not complete in the norm $\|\cdot\|_1$. In fact,
$C[0,1]$ is not complete in any of the norms $\|\cdot\|_p$ if $1 \le p < \infty$.
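
The phenomenon can be seen numerically. The sketch below does not use the functions of Example 1; it assumes a hypothetical stand-in sequence, $f_n(x) = \min(n, 1/\sqrt{x})$, which is positive, continuous, and increasing in $n$ on $[0,1]$. For this stand-in one can check by hand that $\|f_m - f_n\|_1 = 1/n - 1/m$ for $m > n$, while the sup norm of $f_m - f_n$ is $m - n$. Integrals are approximated by the midpoint rule.

```python
def f(n):
    """Hypothetical stand-in sequence: f_n(x) = min(n, 1/sqrt(x)) on [0, 1]."""
    def f_n(x):
        return float(n) if x * n * n <= 1.0 else x ** -0.5
    return f_n

def l1_distance(g, h, samples=100000):
    """Midpoint-rule approximation of the integral of |g - h| over [0, 1]."""
    step = 1.0 / samples
    return step * sum(abs(g((i + 0.5) * step) - h((i + 0.5) * step))
                      for i in range(samples))

def sup_distance(g, h, samples=100000):
    """Largest sampled value of |g - h| on [0, 1] (a lower bound for the sup norm)."""
    step = 1.0 / samples
    return max(abs(g((i + 0.5) * step) - h((i + 0.5) * step))
               for i in range(samples))

for n in (5, 10, 20):
    m = 2 * n
    print(n, m, l1_distance(f(m), f(n)), sup_distance(f(m), f(n)))
```

The $\|\cdot\|_1$ distances shrink as $n$ grows while the sup-norm distances grow, so this sequence is Cauchy in $\|\cdot\|_1$ but not in the sup norm.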

This leads us to a very important point about the Riemann integral,
which was developed by A. Cauchy and B. Riemann in the 19th century.
We have seen that the Riemann integral is naturally and easily defined
on continuous functions and can be extended to functions which have
only finitely many discontinuities. There is a theorem (which we have
not proven) which shows that the Riemann integral exists, that is, the infimum
of the upper sums is equal to the supremum of the lower sums, for
all functions that are “almost” continuous in the sense that their singularities
occur on a “small” set. The above discussion shows why a more
general theory of integration is necessary. A variety of norms, like the
$\|\cdot\|_p$ norms, arise naturally in different analytical situations. To use these
norms effectively one must use spaces of functions that are complete in
these norms so that one knows that Cauchy sequences of functions have
limits. Since spaces of continuous functions, or almost continuous functions,
are not complete in these norms, one needs to use larger sets of
functions which are complete. These sets of functions must contain at
least the functions that one obtains as limits of Cauchy sequences of continuous
functions. Since these limiting functions will, in general, have
large sets on which they are not continuous, we need a theory of how to
integrate such functions. This theory, which was developed in the early
20th century by Lebesgue and others, is covered in more advanced textbooks.
Our point here is that the motivation for the Lebesgue theory
is not that one wants to integrate singular functions per se but that one
needs to work with sets of functions that are complete.
Finally, we remark that for a continuous function $f$ on $[0,1]$, it is not
too hard to show that $\|f\|_p$ converges to $\|f\|_\infty$ as $p \to \infty$. See project 1.
This is why the sup norm is denoted by the symbol $\|\cdot\|_\infty$.

Problems

1. Let $\{f_n\}$ be a sequence of bounded functions on a set $E$, and suppose that
$f$ is a bounded function such that $\|f_n - f\|_\infty \to 0$ as $n \to \infty$. Prove that
$\{f_n\}$ is a Cauchy sequence in the sup norm.

2. Let $f$ and $g$ be continuous functions on $[a, b]$.

(a) Use the triangle inequality to prove that

$$\bigl|\, \|f\|_\infty - \|g\|_\infty \bigr| \le \|f - g\|_\infty.$$

(b) Suppose that $f_n \to f$ in the sup norm. Prove that $\|f_n\|_\infty \to \|f\|_\infty$.

3. Let $C_b(\mathbb{R})$ denote the set of bounded continuous functions on $\mathbb{R}$. Prove
that $C_b(\mathbb{R})$ is complete in the sup norm. Hint: use problem 2 to show that
a Cauchy sequence of functions in the sup norm must have a uniform
bound.

4. For $f \in C[a,b]$, define $\|f\|_1 = \int_a^b |f(x)|\, dx$. Show that $\|\cdot\|_1$ satisfies the
three properties, (a), (b), and (c), of Proposition 5.3.1.

5. Let the functions fn be defined on [0,1] by

and define
1 0 < x < \

0 \ < x < 1.

(a) Prove that $f_n \to f$ pointwise on $[0,1]$. Hint: draw the graph of $f_n$.
(b) Prove that $\|f - f_n\|_\infty = 1$ for each $n$ so that $f_n$ does not converge to
$f$ in the sup norm.
(c) Explain how you could have predicted the result of part (b) simply
by using Theorem 5.2.1.
(d) Prove that $\|f - f_n\|_1 \to 0$ as $n \to \infty$.

6. Let fn be the functions in Example 1 and define f(x) = on (0,1]. Prove


that each of the functions $|f - f_n|$ has an improper Riemann integral on
$[0,1]$ and

$$\lim_{n \to \infty} \|f - f_n\|_1 = \lim_{n \to \infty} \int_0^1 |f(x) - f_n(x)|\, dx = 0.$$

7. Let $C^{(1)}[a, b]$ denote the set of continuously differentiable functions on a
finite interval $[a, b]$. For each $f$ in $C^{(1)}[a, b]$, define $\|f\| = \|f\|_\infty + \|f'\|_\infty$.

(a) Show that $\|f\|$ has the properties (a), (b), and (c) of Proposition 5.3.1.
(b) Prove that $C^{(1)}[a,b]$ is complete in the norm $\|f\|$. Hint: follow the
proof of Theorem 5.3.3 and use Theorems 5.2.1 and 5.2.3.

8. Let $C([a, b]; \mathbb{R}^2)$ denote the set of pairs, $(f(x), g(x))$, of continuous functions
on $[a, b]$. Define a norm on $C([a, b]; \mathbb{R}^2)$ by

$$\|(f, g)\|_\infty = \|f\|_\infty + \|g\|_\infty.$$

(a) Prove that this norm satisfies the properties (a), (b), and (c) of Proposition
5.3.1.
(b) Prove that $C([a, b]; \mathbb{R}^2)$ is complete.

9. Let $Q$ be a closed, finite rectangle in the plane, and let $C(Q)$ denote the
set of real-valued continuous functions on $Q$. For $f$ in $C(Q)$, define

$$\|f\|_\infty = \sup_{(x,y) \in Q} |f(x, y)|.$$

(a) Prove that this norm satisfies the properties (a), (b), and (c) of Proposition
5.3.1.
(b) Prove that $C(Q)$ is complete.

10. Let $C_0(\mathbb{R})$ denote the set of continuous functions $f$ on $\mathbb{R}$ such that
$\lim_{x \to \infty} f(x) = 0$ and $\lim_{x \to -\infty} f(x) = 0$. Prove that $C_0(\mathbb{R})$ is complete in
the $\|\cdot\|_\infty$ norm. Hint: since $C_0(\mathbb{R}) \subseteq C_b(\mathbb{R})$ the limit of a Cauchy sequence
certainly exists and, by problem 3, is in $C_b(\mathbb{R})$.

5.4 Integral Equations

An equation for an unknown function $\psi(x)$ that involves the integral of
$\psi(x)$ or the integral of a function of $\psi(x)$ is called an integral equation. A
simple example is

$$\psi(x) = f(x) + \lambda \int_a^b K(x, y) \psi(y)\, dy. \tag{15}$$

Here $K(x, y)$ is a given function of two variables, $f$ is a given function of
one variable and $\lambda$ is a parameter. We can't solve directly for $\psi(x)$ since
$\psi(x)$ occurs in the integral as well as on the left-hand side. This is a linear
integral equation because the integral term depends linearly on $\psi(x)$. A
nonlinear integral equation is discussed in project 2. Integral equations
often arise in the study of ordinary and partial differential equations. Examples
of applications to ordinary differential equations can be found in
Project 3 and Section 7.1. Integral equations can also occur over infinite
intervals and can have variables in the limits of integration; see problems
10 and 11. For more information and the further development of
the subject, see [22].
We will prove that (15) has solutions in the case where $\lambda$ is small, $f$ is
continuous on $[a, b]$, and $K$ is a continuous function on the square

$$Q = [a, b] \times [a, b].$$

The continuity of $K$ implies that $K$ is bounded and uniformly continuous
on $Q$ (Theorem 4.6.1).

□ Theorem 5.4.1 Let $f$ be a continuous function on the finite interval $[a, b]$
and let $K$ be continuous on $Q$. Then, if $\lambda$ is small enough, there exists a
unique continuous function $\psi(x)$ on $[a, b]$ that solves (15).

Proof. Let $\psi_0(x)$ be any continuous function on $[a, b]$. Define functions
$\psi_1(x), \psi_2(x), \ldots$, recursively by the formula

$$\psi_n(x) = f(x) + \lambda \int_a^b K(x, y) \psi_{n-1}(y)\, dy. \tag{16}$$

We begin by showing that the functions $\psi_n$ are continuous. Suppose that
$\psi_n$ is continuous for some $n$. For each fixed $x$, $K(x, y)$ is a continuous
function of $y$. By Theorem 3.1.1(c), $K(x, y)\psi_n(y)$ is a continuous function
of $y$, so the integral in (16) makes sense. Now let $x_1$ and $x_2$ be points in
$[a, b]$. Then,

$$\begin{aligned}
|\psi_{n+1}(x_1) - \psi_{n+1}(x_2)|
&\le |f(x_1) - f(x_2)| + \left| \lambda \int_a^b (K(x_1, y) - K(x_2, y)) \psi_n(y)\, dy \right| \qquad (17) \\
&\le |f(x_1) - f(x_2)| + |\lambda| \int_a^b |K(x_1, y) - K(x_2, y)|\, |\psi_n(y)|\, dy. \qquad (18)
\end{aligned}$$

Let $\varepsilon > 0$ be given. Since $f$ is continuous, we can choose a $\delta_1$ such that
$|f(x_1) - f(x_2)| \le \varepsilon/2$ if $|x_1 - x_2| \le \delta_1$. To handle the second term in
(18), note that, because $\psi_n$ is continuous, there is a constant $C_n$ so that
$|\psi_n(y)| \le C_n$ for all $y \in [a, b]$. Since $K$ is uniformly continuous on the
square, we can choose a $\delta_2$ so that $|x_1 - x_2| \le \delta_2$ implies

$$|K(x_1, y) - K(x_2, y)| \le \frac{\varepsilon/2}{C_n (b - a)} \quad \text{for all } y \in [a, b]. \tag{19}$$

Using (19) to estimate the integral in (18), we find

$$|\psi_{n+1}(x_1) - \psi_{n+1}(x_2)| \le \varepsilon/2 + \varepsilon/2$$

if $|x_1 - x_2| \le \min\{\delta_1, \delta_2\}$, which shows that $\psi_{n+1}$ is continuous. By assumption,
$\psi_0(x)$ is continuous. So, by induction, all the $\psi_n$ are continuous
functions on $[a, b]$.
Next, we will show that $\{\psi_n\}$ is a Cauchy sequence in the sup norm
if $\lambda$ is sufficiently small. Let $M$ be an upper bound for $|K|$ on the square.
Then, for $n \ge 1$,

$$\begin{aligned}
|\psi_{n+1}(x) - \psi_n(x)| &\le \left| \lambda \int_a^b K(x, y)(\psi_n(y) - \psi_{n-1}(y))\, dy \right| \\
&\le |\lambda| \int_a^b |K(x, y)|\, |\psi_n(y) - \psi_{n-1}(y)|\, dy \\
&\le |\lambda| \int_a^b M \|\psi_n - \psi_{n-1}\|_\infty\, dy \\
&= |\lambda| M (b - a) \|\psi_n - \psi_{n-1}\|_\infty.
\end{aligned}$$

We have repeatedly used the fact that if one replaces an integrand by
something larger, the integral gets bigger (Theorem 3.3.4). For simplicity,
set

$$\alpha = |\lambda| M (b - a)$$

and choose $\lambda$ small enough so that $\alpha < 1$. We showed above that

$$|\psi_{n+1}(x) - \psi_n(x)| \le \alpha \|\psi_n - \psi_{n-1}\|_\infty$$

for all $x \in [a, b]$. Thus, taking the sup of the left-hand side, we find

$$\|\psi_{n+1} - \psi_n\|_\infty \le \alpha \|\psi_n - \psi_{n-1}\|_\infty.$$

Iterating this inequality gives

$$\|\psi_{n+1} - \psi_n\|_\infty \le \alpha^n \|\psi_1 - \psi_0\|_\infty. \tag{20}$$

Suppose that $n \ge N$ and $m \ge N$, and let $m > n$ be the larger of the two
integers. Then, by the triangle inequality (Proposition 5.3.1(c)),

$$\begin{aligned}
\|\psi_m - \psi_n\|_\infty &= \Bigl\| \sum_{j=n+1}^{m} (\psi_j - \psi_{j-1}) \Bigr\|_\infty \\
&\le \sum_{j=n+1}^{m} \|\psi_j - \psi_{j-1}\|_\infty \\
&\le \sum_{j=n+1}^{m} \alpha^{j-1} \|\psi_1 - \psi_0\|_\infty \\
&\le \Bigl( \sum_{j=n}^{m-1} \alpha^j \Bigr) \|\psi_1 - \psi_0\|_\infty.
\end{aligned}$$

The sum in parentheses is part of the tail of the geometric series in $\alpha$.
Using the estimate in part (b) of problem 1, we find that

$$\sum_{j=n}^{m-1} \alpha^j = \alpha^n \sum_{j=0}^{m-n-1} \alpha^j \le \frac{\alpha^n}{1 - \alpha} \le \frac{\alpha^N}{1 - \alpha}$$

since $n \ge N$ and $0 \le \alpha < 1$. Because $\|\psi_1 - \psi_0\|_\infty$ is a fixed number and
$\alpha < 1$, it is clear that we can choose $N$ large enough so that

$$\|\psi_m - \psi_n\|_\infty \le \varepsilon$$

if $n \ge N$ and $m \ge N$. This proves that the sequence $\{\psi_n\}$ is a Cauchy
sequence in the sup norm on $[a, b]$. By Theorem 5.3.3, there exists a continuous
function $\psi$ on $[a, b]$ such that $\|\psi_n - \psi\|_\infty \to 0$. By Proposition
5.3.2, $\psi_n \to \psi$ uniformly.

We are not done since we must still show that $\psi$ satisfies (15). Consider
equation (16). For each $x$, the left-hand side converges to $\psi(x)$. On
the other hand, for each fixed $x$, $K(x,y)$ is a continuous function of $y$;
thus, since $\psi_n \to \psi$ uniformly, we conclude (problem 4 of Section 5.2)
that

$$\int_a^b K(x, y) \psi_{n-1}(y)\, dy \to \int_a^b K(x, y) \psi(y)\, dy$$

as $n \to \infty$. Therefore, $\psi(x)$ satisfies (15). Suppose that $\phi(x)$ is another
continuous function that satisfies (15). If we subtract the $\phi(x)$ equation
from the $\psi(x)$ equation and estimate as above, we find

$$\|\psi - \phi\|_\infty \le \alpha \|\psi - \phi\|_\infty.$$

Since $\alpha < 1$, this inequality can hold only if $\|\psi - \phi\|_\infty = 0$, which implies
that $\phi(x) = \psi(x)$ for all $x$. Thus, the solution $\psi(x)$ is unique. □

Notice that the proof not only guarantees the existence of a $\psi(x)$ satisfying
(15) but also shows that, if we start with any $\psi_0(x)$ and iterate the
relation (16), we can approximate $\psi(x)$ by $\psi_n(x)$ after $n$ steps, and the
estimates show how good our approximation will be.
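
The iteration in the proof can be carried out on a computer. The following sketch is an illustration under stated assumptions, not the method of the text verbatim: the integral in (16) is replaced by the trapezoid rule on a grid, the starting function is $\psi_0 = 0$, and the data $K(x, y) = xy$, $f(x) = 1$, $[a, b] = [0, 1]$, $\lambda = 1/2$ are chosen so that $\alpha = |\lambda| M (b - a) = 1/2 < 1$. For these data one can check by substitution that $\psi(x) = 1 + 0.3x$ solves (15). The function name `iterate_integral_equation` is hypothetical.

```python
def iterate_integral_equation(f, K, lam, a, b, steps=25, grid=201):
    """Iterate (16) on an evenly spaced grid, starting from psi_0 = 0.

    The integral is replaced by the trapezoid rule, so the result only
    approximates the continuous solution.
    """
    h = (b - a) / (grid - 1)
    xs = [a + i * h for i in range(grid)]
    weights = [h] * grid
    weights[0] = weights[-1] = h / 2.0
    fx = [f(x) for x in xs]
    kernel = [[K(x, y) for y in xs] for x in xs]
    psi = [0.0] * grid
    gaps = []  # sampled sup-norm distance between consecutive iterates
    for _ in range(steps):
        new = [
            fx[i] + lam * sum(w * k * p for w, k, p in zip(weights, kernel[i], psi))
            for i in range(grid)
        ]
        gaps.append(max(abs(u - v) for u, v in zip(new, psi)))
        psi = new
    return xs, psi, gaps

xs, psi, gaps = iterate_integral_equation(lambda x: 1.0, lambda x, y: x * y,
                                          0.5, 0.0, 1.0)
error = max(abs(p - (1.0 + 0.3 * x)) for x, p in zip(xs, psi))
print("first gaps:", gaps[:4])
print("distance from 1 + 0.3x:", error)
```

The list `gaps` records the distances between consecutive iterates; each entry should be at most $\alpha$ times the previous one, as the estimate preceding (20) predicts.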

Problems

1. Let $S_n = \sum_{j=0}^{n} \alpha^j$.

(a) Show that $\alpha S_n + 1 = S_n + \alpha^{n+1}$.

(b) Suppose $0 \le \alpha < 1$. Show that $S_n \le \frac{1}{1 - \alpha}$.

2. Consider equation (15) with $\lambda = 1$. By carefully considering the steps in
the proof of Theorem 5.4.1, explain why (15) will have a unique continuous
solution if $(b - a)$ is small enough or if $\|K\|_\infty$ is small enough.

3. Explain why there is a unique continuous function on [0,3] that satisfies

1
b(z) sin a? + e x y dy.
4

4. Explain why there is a unique continuous function on [0,2] that satisfies

ip(x) + [ xye xyip(y)dy.


Jo

5. Suppose that $K(x, y) = xy$, $f(x) = 1$, $a = 0$, $b = 1$, and $|\lambda| < 3$ in equation
(15). Start with $\psi_0(x) = 1$ and find $\psi$. Show that there is no solution if
$\lambda = 3$. Hint: notice that any solution must be of the form $\psi(x) = 1 + cx$
for some $c$.

6. Suppose that $K(x, y) = g(x)h(y)$ and that $\int_a^b g(x)h(x)\, dx = 0$. Let $\psi_0(x) = f(x)$.
Show that all iterates equal the first iterate and find a simple formula
for the solution.
7. Suppose that $K(x,y) = (2 + x^2) \sin(x^2 + y^2)$, $a = 0$, $b = 1$, and $\lambda = \frac{1}{5}$.
Let $f(x) = 1$. Choose a $\psi_0(x)$ and estimate how close the iterate will
be to the solution. Hint: estimate $\|\psi_m - \psi_n\|_\infty$ in terms of $\|\psi_1 - \psi_0\|_\infty$
and let $m \to \infty$. Then estimate $\|\psi_1 - \psi_0\|_\infty$. Note that your estimate will
depend on your choice of $\psi_0(x)$.
8. Suppose that there is a nonzero continuous function $h(x)$ so that

$$h(x) = \lambda \int_a^b K(x, y) h(y)\, dy.$$

Show that if $\psi(x)$ is a solution of (15), then $\psi(x) + h(x)$ is also a solution.
Let $a = 0$, $b = \pi$, and $K(x,y) = \sin x \sin y$. Show that there is a choice
of $\lambda$ so that such an $h(x)$ exists. Why doesn't this nonuniqueness violate
Theorem 5.4.1?
9. Prove that there is at most one continuous function on the interval [0,2]
that satisfies

0(x) = f(x)
s-2

Hint: estimate and use the Mean Value Theorem.


cos (.30(y)) dy.

10. (a) Show that there is a unique, bounded, continuous function 0 on the
interval [0, oo) that solves
poo
-2x
0(x) + / e~2x~2y sin (x — y)0(y) dy.
Jo
Hint: follow the proof of Theorem 5.4.1.
(b) Prove that ||0||oo < 2. Hint: iterate ||0||oo < 1 + ||

11. Let $K$ be a continuous function on the square $S = [0,1] \times [0,1]$ and let $f$
be a continuous function on $[0,1]$. We want to solve the Volterra integral
equation:

$$\psi(x) = f(x) + \int_0^x K(x, y) \psi(y)\, dy. \tag{21}$$

Let $\psi_0(x) = 1$ and define $\psi_n(x)$ recursively by

$$\psi_{n+1}(x) = f(x) + \int_0^x K(x, y) \psi_n(y)\, dy.$$

(a) Prove that $\psi_n(x)$ is continuous on $[0,1]$ for each $n$.

(b) Let $M = \sup_S |K|$ and $C = \|\psi_1 - \psi_0\|_\infty$. Prove that

$$|\psi_2(x) - \psi_1(x)| \le MCx \quad \text{for all } x \in [0,1].$$

(c) Prove that for each $n \ge 2$,

$$|\psi_{n+1}(x) - \psi_n(x)| \le \frac{M^n C x^n}{n!} \quad \text{for all } x \in [0,1].$$

(d) Prove that $\psi_n$ converges uniformly to a solution of (21). Hint: the
series $\sum_n \frac{M^n}{n!}$ converges.

5.5 The Calculus of Variations


One of the familiar uses of calculus is to find the maxima and minima
of functions of a real variable. However, in many applications the quantities
which one wants to maximize or minimize depend on functions
rather than points. For example, the flight path of an airplane flying between
two cities is given by specifying the three functions, $x(t)$, $y(t)$, and
$z(t)$, which give the coordinates of the plane as a function of time. If we
are given the wind speeds at each point of space, we might want to know
the flight path that minimizes the amount of gasoline used. The rate of
growth of a yeast colony in a tank depends in a complicated way on the
amount of glucose, $g(t)$, available at each time $t$. How should we choose
$g(t)$ so that the growth is the greatest with the minimum expenditure
of glucose? The branch of mathematics in which one studies methods
for maximizing and minimizing quantities that depend on functions is
called the Calculus of Variations. It has a long and rich history which
greatly influenced the development of analysis in the 18th and 19th centuries.
See the discussion in [28]. In this section we illustrate some of the
simplest ideas by examining two classical questions from geometry and
physics. Further developments can be found in [14] and [18].

Example 1 Let $(x_1, y_1)$ and $(x_2, y_2)$ be two points in the plane. What
is the curve between them that has the shortest length? We all know
that the answer is the straight line that connects the points, and that certainly
seems reasonable geometrically. But can we prove it analytically?
To keep the analysis simple, we will only consider curves between the
points that are the graphs of functions. Let $y(x)$ be such a function, that
is, $y(x_1) = y_1$ and $y(x_2) = y_2$, and suppose that $y(x)$ is continuously differentiable.
See Figure 5.5.1(a). Then the arc length of the graph of $y(x)$
between the points $(x_1, y_1)$ and $(x_2, y_2)$ is

$$J(y') = \int_{x_1}^{x_2} \sqrt{1 + (y'(x))^2}\, dx.$$

Project 2(a) of Chapter 3 outlines the derivation of this formula. $J$
is called a functional because its domain consists of a set of functions
(namely the continuously differentiable functions whose graphs go
through the two points) and its value is a real number. We write $J(y')$ because
$J$ depends explicitly on $y'$ and depends on $x$ and $y(x)$ only through
$y'$. We want to know which function $y(x)$ minimizes $J$.
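
A functional takes a function as input and returns a number, which is exactly what a higher-order function does in code. The sketch below is an illustration only: it approximates $J(y')$ by the midpoint rule and evaluates it on three curves joining $(0,0)$ to $(1,1)$. The curves and the name `arc_length` are choices made for the example.

```python
import math

def arc_length(dy, x1, x2, samples=20000):
    """Midpoint-rule approximation of J(y'), the integral of sqrt(1 + y'(x)^2)."""
    h = (x2 - x1) / samples
    return h * sum(math.sqrt(1.0 + dy(x1 + (i + 0.5) * h) ** 2)
                   for i in range(samples))

# Three curves from (0, 0) to (1, 1), each described by its derivative y'.
line = arc_length(lambda x: 1.0, 0.0, 1.0)            # y = x
parabola = arc_length(lambda x: 2.0 * x, 0.0, 1.0)    # y = x^2
wiggle = arc_length(                                  # y = x + 0.3 sin(pi x)
    lambda x: 1.0 + 0.3 * math.pi * math.cos(math.pi * x), 0.0, 1.0)

print(line, parabola, wiggle)
```

Among these three candidates the straight line gives the smallest value, which is consistent with the geometric expectation but of course proves nothing about all admissible curves.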

Figure 5.5.1

Example 2 In 1696, John Bernoulli posed the following problem. Let
$(x_1, y_1)$ be a point in the plane with $x_1 \ne 0$ and $y_1 < 0$. A wire, whose
graph is given by the function $y(x)$, connects the origin $(0,0)$ to the point
$(x_1, y_1)$. At time $t = 0$, a bead on the wire at $(0,0)$ is released and slides
down the wire to $(x_1, y_1)$ under the influence of gravity, which we assume
points in the negative $y$ direction. If we neglect friction and air
resistance, what is the correct shape of the wire so that the transit time is
the shortest? This is known as the brachistochrone (shortest time) problem.
See Figure 5.5.1(b).
Let $y(x)$ be the function whose graph has the shape of the wire. Let
$v(x)$ be the speed of the bead when the $x$-coordinate is $x$. Let $s(x)$ denote
the length already traversed and $t(x)$ the time already spent when the
$x$-coordinate is $x$. Then,

$$v(x) = \frac{ds}{dt} = \frac{ds}{dx} \frac{dx}{dt}$$

or

$$\frac{dx}{dt} = \frac{v(x)}{ds/dx}.$$

By Theorem 4.5.2,

$$\frac{dt}{dx} = \frac{ds/dx}{v(x)}.$$

From project 2(a) in Chapter 3 we know that the rate of change of arc
length with respect to $x$ is given by $\frac{ds}{dx} = \sqrt{1 + (y'(x))^2}$. To figure out how
$v(x)$ and $y(x)$ are related, we use the fact that the total energy (kinetic
plus potential),

$$E = \tfrac{1}{2} m v(x)^2 + m g y(x),$$

is conserved since we are neglecting friction. Here $m$ is the mass of the
bead and $g$ is the acceleration of gravity. Since the bead starts at rest at
the origin, we can take the energy to be zero at $t = 0$. Solving, we find
that $v(x) = \sqrt{-2 g y(x)}$ for all $x$. Therefore, by the Fundamental Theorem
of Calculus,

$$\begin{aligned}
\text{Total time elapsed} &= t(x_1) - t(0) \\
&= \int_0^{x_1} \frac{dt}{dx}\, dx \\
&= \int_0^{x_1} \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{-2 g y(x)}}\, dx.
\end{aligned}$$

Thus, we want to find how to choose $y(x)$ so that the functional

$$J(y, y') = \int_0^{x_1} \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{-2 g y(x)}}\, dx$$

is minimized. We write $J(y, y')$ because $J$ depends explicitly on both
$y(x)$ and $y'(x)$.

These two examples suggest a very general question. Let $f$ be a function
of three independent variables. How do we find the functions $y(x)$,
whose graphs pass through the points $(x_1, y_1)$ and $(x_2, y_2)$, so that
the functional

$$J(x, y(x), y'(x)) = \int_{x_1}^{x_2} f(x, y(x), y'(x))\, dx \tag{22}$$

is maximized or minimized? We have allowed $J$ to depend explicitly on
$x$ though that was not the case in the above two examples.

Figure 5.5.2

Here is the idea. Suppose that $y(x)$ minimizes $J$, and let $\eta(x)$ be a
continuously differentiable function such that $\eta(x_1) = 0 = \eta(x_2)$. For
every $\varepsilon$, the graph of $y(x) + \varepsilon\eta(x)$ passes through the points $(x_1, y_1)$ and
$(x_2, y_2)$. See Figure 5.5.2. Note that $\varepsilon$ is allowed to take negative values.
Define a real-valued function $I$ on $\mathbb{R}$ by

$$I(\varepsilon) = J(x, y(x) + \varepsilon\eta(x), y'(x) + \varepsilon\eta'(x)).$$

If $y(x)$ minimizes $J$, then the function $I(\varepsilon)$ should reach its minimum
at $\varepsilon = 0$, and this should be true for every choice of $\eta(x)$ that satisfies
$\eta(x_1) = 0 = \eta(x_2)$. If $I(\varepsilon)$ is continuously differentiable, then, according
to Theorem 4.2.1,

$$I'(0) = 0, \tag{23}$$

and this should be true for all choices of $\eta(x)$. If $y$ is a twice continuously
differentiable function such that (23) holds for all such $\eta(x)$, then
$y$ is called an extremal for the functional $J$. We want to determine what
condition this puts on the function $y(x)$.

□ Theorem 5.5.1 Suppose that $f$ is a twice continuously differentiable
function of three variables and that the functional $J$ is defined by (22).
Suppose that $y(x)$ is an extremal for $J$. Then, $y(x)$ satisfies the differential
equation

$$f_y(x, y(x), y'(x)) - \frac{d}{dx} f_{y'}(x, y(x), y'(x)) = 0. \tag{24}$$

Note. We follow tradition in the Calculus of Variations and denote the
partial derivatives of $f$ with respect to the second and third variables by
$f_y$ and $f_{y'}$ respectively. Equation (24) is known as the Euler equation
after Leonhard Euler (1707-1783).

Proof. Let $\eta(x)$ be a twice continuously differentiable function satisfying
$\eta(x_1) = 0 = \eta(x_2)$. If $I(\varepsilon)$ is defined by (22), then $I'(0) = 0$ since $y$ is
an extremal. Using Theorem 5.2.4, we can compute $I'(\varepsilon)$ by differentiating
under the integral sign and using the chain rule:

$$\begin{aligned}
I'(\varepsilon) = {} & \int_{x_1}^{x_2} f_y(x, y(x) + \varepsilon\eta(x), y'(x) + \varepsilon\eta'(x))\, \eta(x)\, dx \\
& + \int_{x_1}^{x_2} f_{y'}(x, y(x) + \varepsilon\eta(x), y'(x) + \varepsilon\eta'(x))\, \eta'(x)\, dx.
\end{aligned}$$

Thus,

$$\begin{aligned}
I'(0) &= \int_{x_1}^{x_2} \bigl( f_y(x, y(x), y'(x))\, \eta(x) + f_{y'}(x, y(x), y'(x))\, \eta'(x) \bigr)\, dx \qquad (25) \\
&= \int_{x_1}^{x_2} \Bigl\{ f_y(x, y(x), y'(x)) - \frac{d}{dx} f_{y'}(x, y(x), y'(x)) \Bigr\}\, \eta(x)\, dx, \qquad (26)
\end{aligned}$$

where we integrated by parts in the second term and used the assumption
that $\eta$ vanishes at the endpoints. Therefore, if $y(x)$ is an extremal,

$$\int_{x_1}^{x_2} \Bigl\{ f_y(x, y(x), y'(x)) - \frac{d}{dx} f_{y'}(x, y(x), y'(x)) \Bigr\}\, \eta(x)\, dx = 0$$

for all twice continuously differentiable functions $\eta(x)$ vanishing at the
endpoints. Since $y(x)$ is twice continuously differentiable by assumption,
the expression in brackets is continuous and it follows easily (problem
1) that the expression in brackets must be identically zero. □

Example 1 (revisited) In Example 1, $f(x, y, y') = \sqrt{1 + (y')^2}$. Calculating the derivatives in Euler's equation and rearranging, we find

$$f_y(x, y(x), y'(x)) - \frac{d}{dx} f_{y'}(x, y(x), y'(x)) = -\frac{d}{dx}\,\frac{y'}{(1 + (y')^2)^{1/2}} = -\frac{y''}{(1 + (y')^2)^{3/2}}.$$

Thus, if $y(x)$ is an extremal, then $y''(x) = 0$ for all $x \in [x_1, x_2]$. Therefore, $y(x)$ is a straight line, and since it must pass through $(x_1, y_1)$ and $(x_2, y_2)$ the function $y(x)$ is determined. So far so good. The only extremal is the straight line between the points. But how do we know whether the extremal is a minimum, a maximum, or neither? Even if we can show that it is a local minimum (it minimizes $J$ over all curves through the points that are close to $y(x)$), how do we know it is a global minimum? In general, such questions are very difficult, but in this case we can answer them directly because of the simplicity of the functional. Let $\kappa$ be the slope of the extremal $y(x)$. Then

$$I(\varepsilon) = \int_{x_1}^{x_2} \sqrt{1 + (\kappa + \varepsilon\eta'(x))^2}\,dx.$$

A short calculation (problem 2) shows that

$$I''(\varepsilon) = \int_{x_1}^{x_2} \frac{(\eta'(x))^2}{(1 + (\kappa + \varepsilon\eta'(x))^2)^{3/2}}\,dx,$$

and so $I''(\varepsilon)$ is strictly positive for all $\varepsilon$ if $\eta(x)$ is not the zero function on $[x_1, x_2]$. It follows from the Fundamental Theorem of Calculus that $I'(\varepsilon) > 0$ if $\varepsilon > 0$ and $I'(\varepsilon) < 0$ if $\varepsilon < 0$. From this, it follows that $I(\varepsilon) > I(0)$ if $\varepsilon \ne 0$. Since every function whose graph goes through $(x_1, y_1)$ and $(x_2, y_2)$ can be written in the form $y(x) + \eta(x)$ for an $\eta(x)$ that vanishes at $x_1$ and $x_2$, we conclude that the straight line is the absolute global minimum of the length functional $J$ over the whole class of twice differentiable functions whose graphs go through $(x_1, y_1)$ and $(x_2, y_2)$.
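The conclusion above can be checked numerically. The following is a minimal sketch, not part of the original argument: the endpoints $(0,0)$ and $(1,2)$, the perturbation $\eta(x) = \sin(\pi x)$, and the helper names `arc_length` and `I` are all illustrative assumptions. It approximates the length of $y + \varepsilon\eta$ by a polygonal path and compares it with the length of the straight line.

```python
import math

def arc_length(y, x1, x2, n=2000):
    """Approximate the length of the graph of y on [x1, x2] by a polygonal path."""
    h = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        xa, xb = x1 + i * h, x1 + (i + 1) * h
        total += math.hypot(xb - xa, y(xb) - y(xa))
    return total

# Illustrative endpoints (0, 0) and (1, 2); the extremal is the line y = 2x.
x1, x2 = 0.0, 1.0
line = lambda x: 2.0 * x
# eta vanishes at both endpoints, as the argument requires.
eta = lambda x: math.sin(math.pi * x)

def I(eps):
    """Length of the perturbed curve y + eps * eta."""
    return arc_length(lambda x: line(x) + eps * eta(x), x1, x2)

lengths = {eps: I(eps) for eps in (-0.5, -0.1, 0.0, 0.1, 0.5)}
for eps, value in lengths.items():
    print(f"eps = {eps:+.1f}  length = {value:.6f}")
```

In this sketch the smallest length occurs at $\varepsilon = 0$, where it equals $\sqrt{5}$, the distance between the two endpoints.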

The Euler equation is normally a second-order differential equation. However, if the integrand $f$ of the functional $J$ does not depend on $x$ explicitly (as in Examples 1 and 2), then we can find a first-order differential equation satisfied by $y$. By the chain rule and Euler's equation,

$$\frac{d}{dx}\left( y' f_{y'} - f \right) = y'' f_{y'} + f_{y'y}\,(y')^2 + f_{y'y'}\,y' y'' - f_y\,y' - f_{y'}\,y'' = 0.$$

Thus

$$f_{y'}\,y' - f = C \tag{27}$$

for some constant $C$. Since (27) is a first-order differential equation, it is usually easier to solve.

Example 2 (revisited) For the Brachistochrone,

$$f(y, y') = \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{-2g\,y(x)}}.$$

So, using (27) and carrying out the differentiations, we compute that

$$\frac{(y'(x))^2}{\sqrt{1 + (y'(x))^2}\,\sqrt{-2g\,y(x)}} - \frac{\sqrt{1 + (y'(x))^2}}{\sqrt{-2g\,y(x)}} = C.$$

Rearranging and squaring both sides, we find that any extremal $y(x)$ must satisfy the differential equation

$$y(x)\left(1 + (y'(x))^2\right) = -\frac{1}{2gC^2}.$$

The method for solving this differential equation, which requires a special change of variables, is outlined in project 4.
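As a sanity check on the differential equation above, one can test a candidate solution numerically. The sketch below assumes, as is classically known but not derived in this section, that the solution is a cycloid, parametrized here as $x = R(\theta - \sin\theta)$, $y = -R(1 - \cos\theta)$; the radius `R` and the function name `cycloid_invariant` are illustrative choices. Along such a curve the quantity $y(1 + (y')^2)$ should be the constant $-2R$.

```python
import math

R = 0.7  # illustrative radius of the generating circle (an assumption)

def cycloid_invariant(theta):
    """Return y * (1 + (y')^2) at the cycloid point with parameter theta."""
    y = -R * (1.0 - math.cos(theta))
    dx_dtheta = R * (1.0 - math.cos(theta))
    dy_dtheta = -R * math.sin(theta)
    slope = dy_dtheta / dx_dtheta  # y'(x) via the chain rule
    return y * (1.0 + slope ** 2)

values = [cycloid_invariant(t) for t in (0.3, 1.0, 2.0, 3.0, 5.0)]
print(values)
```

Comparing with the equation in the text, the constant $-2R$ corresponds to $-1/(2gC^2)$, that is, $R = 1/(4gC^2)$.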

Problems

1. Suppose that $H(x)$ is a continuous function on the interval $[x_1, x_2]$ and that

$$\int_{x_1}^{x_2} H(x)\,\eta(x)\,dx = 0 \tag{28}$$

for all twice continuously differentiable functions $\eta(x)$ that vanish at the endpoints. Prove that $H(x) = 0$ in the interval as follows:

(a) Let $[a, b]$ be any finite interval. Show how to construct a twice continuously differentiable function on $\mathbb{R}$ which is strictly positive on the open interval $(a, b)$ and identically zero everywhere else. Hint: use pieces of polynomials.
(b) If $x_0$ is a point of $[x_1, x_2]$ such that $H(x_0) \ne 0$, show how to choose $\eta(x)$ so that hypothesis (28) is violated.

2. Carry out the second derivative calculation in Example 1, revisited.

3. Find a curve passing through $(1, 2)$ and $(2, 4)$ that is an extremal for the functional

$$\int_1^2 \left( x\,y'(x) + (y'(x))^2 \right) dx.$$



4. Find a curve passing through (1,1) and (2,2) that is an extremal for the
functional

Ji x°

5. Find a curve passing through (0,0) and (1,1) that is an extremal for the
functional

6. Find a curve passing through (0,0) and (1,1) that is an extremal for the
functional

7. Find a curve passing through (0, 0) and (f, 1) that is an extremal for the
functional

8. Let $y(x)$ be a twice differentiable function whose graph passes through the points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane, where $y_1 > 0$ and $y_2 > 0$.

(a) Show that the surface area generated when the curve is revolved
around the x-axis is given by (project 2(d) of Chapter 3)

(b) Show that if y(x) is an extremal, then y(x) satisfies

y&) _ „
V1 + (y '(x))2

(c) Show that the functions y(x) = C2 cosh satisfy the differen¬
tial equation for all choices of C\ and C2.
(d) Show that $C_1$ and $C_2$ can be chosen so that the graph of $y$ passes through $(x_1, y_1)$ and $(x_2, y_2)$.

9. Explain why you would not expect the functional

to have a minimum in the class of twice continuously differentiable functions whose graphs go through the points $(x_1, y_1)$ and $(x_2, y_2)$.

5.6 Metric Spaces


There is a very strong similarity between the way we studied convergent sequences of numbers in Chapter 2 and the way we studied convergent sequences of functions in Sections 5.1-5.3. For real numbers $x \in \mathbb{R}$, we have a notion of “size,” $|x|$. This allowed us to define the distance between two numbers $x$ and $y$ as $|x - y|$. Using this notion of distance, we then defined what it means for a sequence $\{a_n\}$ to converge to a limit $a$. Namely, given $\varepsilon > 0$, there is an $N$ so that $n > N$ implies that $|a_n - a| < \varepsilon$. We then introduced the idea of Cauchy sequence and the idea of completeness, and the rest, so to speak, is history.

For continuous functions on a finite interval $[a, b]$, we also have a notion of “size,” $\|f\|_\infty = \sup_x |f(x)|$. The sup norm allows us to define the distance between two functions $f$ and $g$ as $\|f - g\|_\infty$. In Section 5.3 we saw that a sequence of functions $\{f_n\}$ converges uniformly to a limit function $f$ if and only if, given $\varepsilon > 0$, there is an $N$ so that $n > N$ implies that $\|f - f_n\|_\infty < \varepsilon$. We then introduced the idea of Cauchy sequence and the idea of completeness, and we showed that the set of continuous functions, $C[a, b]$, is complete.

Although the two sets of objects, $\mathbb{R}$ and $C[a, b]$, are very different, the progression of ideas is very similar. In both cases, the definition of “convergence” depends on having a way to measure the “distance” between two elements of the set. This suggests that there is a general idea here which is worth studying. We begin by saying what properties distance functions, called metrics, should have.

Definition. If $\mathcal{M}$ is a set, a function $\rho$ from $\mathcal{M} \times \mathcal{M}$ to $[0, \infty)$ is called a metric if

(a) for all $x$ and $y$ in $\mathcal{M}$, $\rho(x, y) \ge 0$ and $\rho(x, y) = 0$ if and only if $x = y$.

(b) $\rho(x, y) = \rho(y, x)$.

(c) for any three points $x$, $y$, and $z$ in $\mathcal{M}$, $\rho(x, z) \le \rho(x, y) + \rho(y, z)$.

The pair $(\mathcal{M}, \rho)$ is called a metric space.

The three conditions are very intuitive. The first statement says that the “distance” between distinct points is always positive. The second says that the distance from $x$ to $y$ is always the same as the distance from $y$ to $x$. The third says that the distance from $x$ to $z$ is less than or equal to the distance from $x$ to $y$ plus the distance from $y$ to $z$. This idea is very familiar from the usual idea of distance in the plane. See Figure 5.6.1. In the spaces that we have previously considered, we shall see that property (c) is really just the triangle inequality.

Figure 5.6.1

Example 1 Let $\mathcal{M} = \mathbb{R}$ and define $\rho(x, y) = |x - y|$. Then properties (a) and (b) are immediate since the absolute value function is even and the absolute values of all numbers except zero are positive. Since

$$\rho(x, z) = |x - z| = |(x - y) + (y - z)| \le |x - y| + |y - z| = \rho(x, y) + \rho(y, z),$$

we see that property (c) holds too. Thus $\rho$ is a metric on $\mathbb{R}$. Note that the crucial step in proving (c) was the triangle inequality.

Example 2 Let $\mathcal{M} = C[a, b]$ and define $\rho_\infty(f, g) = \|f - g\|_\infty$. If $\rho_\infty(f, g) = 0$, then $|f(x) - g(x)| = 0$ for all $x \in [a, b]$. Thus the functions $f$ and $g$ are identical. As in Example 1, property (b) holds because absolute value is an even function. Let $f$, $g$, and $h$ be any three functions in $C[a, b]$. Then,

$$\rho_\infty(f, g) = \|f - g\|_\infty = \|(f - h) + (h - g)\|_\infty \le \|f - h\|_\infty + \|h - g\|_\infty = \rho_\infty(f, h) + \rho_\infty(h, g),$$

so property (c) holds too. Thus $\rho_\infty$ is a metric on $C[a, b]$. The crucial step in proving (c) was the triangle inequality (Proposition 5.3.1(c)).

Example 3 Let $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ be points in $\mathbb{R}^n$. We define the Euclidean metric on $\mathbb{R}^n$ by

$$\rho_2(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^2 \right)^{1/2}.$$

In the case $n = 1$, this is just the metric in Example 1. As in the above examples, properties (a) and (b) are easy to verify. In the case $n = 2$, property (c) follows easily from the triangle inequality proved in problem 10 of Section 2.2. The proof in the general case, which is somewhat harder, is given in Example 3 of Section 5.8. For the moment, we restrict our attention to $\mathbb{R}^2$, where

$$\rho_2((x_1, y_1), (x_2, y_2)) = \sqrt{|x_1 - x_2|^2 + |y_1 - y_2|^2}.$$

The same set can have different metrics. In problem 1, the student is asked to verify that both

$$\rho_1((x_1, y_1), (x_2, y_2)) = |x_1 - x_2| + |y_1 - y_2|$$

and

$$\rho_{\max}((x_1, y_1), (x_2, y_2)) = \max\{|x_1 - x_2|, |y_1 - y_2|\}$$

are metrics on $\mathbb{R}^2$. Let us now denote a general point in $\mathbb{R}^2$ by $(x, y)$ and ask what is the set of points $(x, y)$ that are a distance $< 1$ from the origin $(0, 0)$. Since the three metrics, $\rho_2$, $\rho_1$, and $\rho_{\max}$ measure distance differently, it is not surprising that the three sets,

$$\{(x, y) \in \mathbb{R}^2 \mid \rho_2((x, y), (0, 0)) < 1\} = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 < 1\}$$
$$\{(x, y) \in \mathbb{R}^2 \mid \rho_1((x, y), (0, 0)) < 1\} = \{(x, y) \in \mathbb{R}^2 \mid |x| + |y| < 1\}$$
$$\{(x, y) \in \mathbb{R}^2 \mid \rho_{\max}((x, y), (0, 0)) < 1\} = \{(x, y) \in \mathbb{R}^2 \mid \max\{|x|, |y|\} < 1\},$$

are different, as shown in Figure 5.6.2. Thus different metrics correspond to different geometries, which is why metrics play an important role in geometry; see, for example, [11] or [40].

Figure 5.6.2: (a) $\sqrt{x^2 + y^2} < 1$, (b) $|x| + |y| < 1$, (c) $\max\{|x|, |y|\} < 1$
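The difference between the three metrics can also be seen by direct computation. The following is a small sketch in Python; the function names `rho_2`, `rho_1`, `rho_max` mirror the notation of Example 3, while the sample point $(0.6, 0.6)$ and the random spot-check of property (c) are illustrative assumptions and of course prove nothing.

```python
import math
import random

def rho_2(p, q):
    """Euclidean metric on R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rho_1(p, q):
    """Sum of the coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rho_max(p, q):
    """Largest coordinate difference."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

metrics = {"rho_2": rho_2, "rho_1": rho_1, "rho_max": rho_max}
origin = (0.0, 0.0)
point = (0.6, 0.6)

# Is the sample point within distance 1 of the origin in each metric?
inside = {name: rho(point, origin) < 1 for name, rho in metrics.items()}
print(inside)

# Spot-check the triangle inequality on random triples of points.
random.seed(0)

def random_point():
    return (random.uniform(-5, 5), random.uniform(-5, 5))

triangle_ok = True
for _ in range(1000):
    x, y, z = random_point(), random_point(), random_point()
    for rho in metrics.values():
        if rho(x, z) > rho(x, y) + rho(y, z) + 1e-12:
            triangle_ok = False
print(triangle_ok)
```

The point $(0.6, 0.6)$ lies in the unit sets of $\rho_2$ and $\rho_{\max}$ but not in that of $\rho_1$, which is one way to see that the three regions in Figure 5.6.2 differ.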

Notice that metric spaces are not required to have any linear structure; that is, they need not be vector spaces. No notion of addition or scalar multiplication occurs in the definition. Thus for any metric space $(\mathcal{M}, \rho)$ and any subset $\mathcal{M}_1 \subset \mathcal{M}$, $(\mathcal{M}_1, \rho)$ is also a metric space. For example, the set of points $(x, y)$ in $\mathbb{R}^2$ such that $x^2 = y$ is a metric space under any of the three metrics on $\mathbb{R}^2$ discussed in Example 3. If $\mathcal{M}$ is a vector space and the vector space has a norm $\|\cdot\|$, then $\rho(x, y) = \|x - y\|$ defines a metric on $\mathcal{M}$ because (c) follows from the triangle inequality for norms. Vector spaces with norms are discussed in Section 5.8.

Example 4 Let $S^2$ denote the set of points in $\mathbb{R}^3$ that are a distance 1 from the origin in the Euclidean metric. That is,

$$S^2 = \{(x, y, z) \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 = 1\}.$$

Since $S^2$ is a subset of $\mathbb{R}^3$, the Euclidean metric, $\rho_2$, is a metric on $S^2$. Here is a different metric. For any two points $\alpha$ and $\beta$ in $S^2$, let $\rho_g(\alpha, \beta)$ be the length of the shortest curve between $\alpha$ and $\beta$ which remains on the surface $S^2$. Here “length” means the Euclidean length of the curve as a curve in $\mathbb{R}^3$. These shortest curves that remain on the surface are called geodesics, and $\rho_g$ is called the geodesic metric. The great circles are the circles on $S^2$ whose centers are $(0, 0, 0)$. Unless $\alpha$ and $\beta$ are antipodal, there is a unique great circle through $\alpha$ and $\beta$. It can be shown that the geodesic between $\alpha$ and $\beta$ is the shorter piece of the great circle through $\alpha$ and $\beta$. If $\alpha$ and $\beta$ are antipodal, then there are infinitely many great circles through $\alpha$ and $\beta$. This does not create a problem since they all give the same geodesic distance, $\pi$. It is clear that $\rho_g$ satisfies properties (a) and (b) in the definition of metric. Property (c) is harder to prove though it looks “obvious” if the three points are close together.
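To compare the two metrics on $S^2$ concretely, here is a brief sketch. It takes for granted the fact quoted above, that the geodesic is the shorter great-circle arc, so that $\rho_g(\alpha, \beta)$ equals the angle between the unit vectors $\alpha$ and $\beta$; the function names and the three sample points are illustrative assumptions.

```python
import math

def rho_2(a, b):
    """Euclidean (chord) distance between two points of R^3."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))

def rho_g(a, b):
    """Great-circle distance: the angle between the unit vectors a and b."""
    dot = sum(s * t for s, t in zip(a, b))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.acos(max(-1.0, min(1.0, dot)))

north = (0.0, 0.0, 1.0)
south = (0.0, 0.0, -1.0)
east = (1.0, 0.0, 0.0)

print(rho_g(north, south), rho_2(north, south))  # antipodal points
print(rho_g(north, east), rho_2(north, east))
```

For antipodal points the geodesic distance is $\pi$ while the chord has length 2. In general the two are related by $\rho_2 = 2\sin(\rho_g/2)$, an elementary fact about chords of the unit circle.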

Example 5 Let $Q$ be an alphabet of symbols, finite or infinite, and let $N$ be a positive integer. Define $\mathcal{M}$ to be the set of strings of symbols of length $N$ which can be made from the symbols in $Q$. Let $\delta$ be the function on $Q \times Q$ such that $\delta(q, q) = 0$ and $\delta(q, p) = 1$ if $q \ne p$ for all symbols $q$ and $p$ in $Q$. If $x = q_1 q_2 \cdots q_N$ and $y = p_1 p_2 \cdots p_N$ are two strings of symbols in $\mathcal{M}$, we define

$$\rho(x, y) = \sum_{i=1}^{N} \delta(q_i, p_i). \tag{29}$$

It is not hard to show that $\rho$ is a metric on $\mathcal{M}$ (problem 2). Metrics like $\rho$ are useful in coding theory, where one wants to know how “close” two messages are; see Section 10.2. Here we will briefly describe why such metrics are useful in molecular biology.
Deoxyribonucleic acid (DNA) is a two-stranded polymer, each strand consisting of a sequence of four building blocks joined together linearly along a sugar-phosphate backbone. The four building blocks are the nucleotides adenine (A), thymine (T), cytosine (C), and guanine (G). Since A only binds to T and C to G, the linear sequence of A's, T's, C's and G's along one strand determines the other strand and therefore the whole DNA molecule. The length of the sequence ranges from 4.6 million for an E. coli bacterium to about 3 billion in human DNA. Short segments of the DNA molecule code for the production of specific proteins. Every protein consists of a linear chain of amino acids (typically 50 to 1500) selected from a fixed list of 20. Experimental techniques allow one to determine the sequence of relatively small segments of DNA molecules and the entire amino acid sequence of some proteins.

There are several reasons why one wants to compare two different linear sequences in order to say how “similar” they are. If the DNAs of two different animals are similar, then the animals are probably close to each other on the evolutionary tree. Very similar short segments of DNA probably code for similar proteins. If the sequences of amino acids in two proteins are similar, their three-dimensional structures may be similar and their functions may also be similar. The three-dimensional structures and the functions of proteins are both difficult to determine. Thus, astute comparisons of the sequence of a new protein with the sequences of well-known proteins in data bases may suggest reasonable hypotheses about structure and function.

In the case of DNA, several biological facts make the situation more complicated. First, the sequences that we wish to compare may have different lengths. We can handle this by counting as a mismatch a symbol matched with nothing. Second, DNA mutates by substitutions (a symbol replaced by another symbol), additions (a symbol placed between two there already), and deletions (a symbol removed). In the case of a deletion, we can hold the place of the deleted symbol with a new symbol, $-$.

So, given two sequences, a natural question is to find the relative placement so that the distance between them is minimal. Or one can ask the same question but permit additions or deletions or substitutions. Even for short sequences, difficult combinatorial questions are involved. For moderately long sequences, efficient algorithms for machine computation must be devised. See project 5. The notion of “similar” and all the rest of our discussion depends of course on the metric that is chosen. Metrics with different weights, for example, assigning a different value to $\delta(x, -)$, have also been used. For more information about the use of metrics in molecular biology, see [30].
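The metric (29) is straightforward to compute. The sketch below is illustrative only: the names `delta`, `rho`, and `padded_distance`, the sample strings, and the choice to pad the shorter string on the right with the placeholder symbol are assumptions made for the example, not a method prescribed in the text. Padding on the right is just one of the many possible relative placements discussed above.

```python
def delta(q, p):
    """The function delta of Example 5: 0 for equal symbols, 1 otherwise."""
    return 0 if q == p else 1

def rho(x, y):
    """The metric (29) on strings of a common length N."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length N")
    return sum(delta(q, p) for q, p in zip(x, y))

def padded_distance(x, y, gap="-"):
    """Compare strings of different lengths by padding the shorter one on the
    right with a placeholder, so an unmatched symbol counts as a mismatch."""
    n = max(len(x), len(y))
    return rho(x.ljust(n, gap), y.ljust(n, gap))

print(rho("GATTACA", "GACTATA"))          # two positions differ
print(padded_distance("GATTACA", "GATT")) # three unmatched symbols
```

Finding the placement of gaps that minimizes the distance is the harder combinatorial problem mentioned above; this sketch does not attempt it.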

Once one has a metric, there is a natural notion of convergence.

Definition. A sequence of points $\{x_n\}$ in a metric space $(\mathcal{M}, \rho)$ is said to converge to $x \in \mathcal{M}$ if $\rho(x_n, x) \to 0$ as $n \to \infty$. The point $x$ is called the limit of the sequence $\{x_n\}$, and we also write $\lim_{n \to \infty} x_n = x$ or simply $x_n \to x$.

In problems 7, 8, and 9, the student is asked to show that convergent sequences in metric spaces have many of the same properties as convergent sequences of real numbers. Cauchy sequences and completeness are discussed in the next section.

Does it matter what metric one uses on a set? As we indicated in Section 5.3, where we discussed different norms on $C[a, b]$, the answer is yes, in general. On the other hand, it wouldn't make much difference if we used the metric $3\rho_2$ instead of the Euclidean metric $\rho_2$ on $\mathbb{R}^n$. To make this idea precise, we make a definition.

Definition. Two metrics, $\rho$ and $\sigma$, on a set $\mathcal{M}$ are said to be equivalent if, for every $x \in \mathcal{M}$ and $\varepsilon > 0$, there is a $\delta > 0$ such that for all $y \in \mathcal{M}$

$$\rho(x, y) < \delta \ \text{ implies } \ \sigma(x, y) < \varepsilon \tag{30}$$

$$\sigma(x, y) < \delta \ \text{ implies } \ \rho(x, y) < \varepsilon. \tag{31}$$

Examples of equivalent and nonequivalent metrics are developed in the problems. Equivalent metrics have the same convergent sequences.

Proposition 5.6.1 Let $\rho$ and $\sigma$ be two equivalent metrics on a set $\mathcal{M}$ and suppose that $\{x_n\}$ is a sequence in $\mathcal{M}$. Then $x_n \to x$ in the metric $\rho$ if and only if $x_n \to x$ in the metric $\sigma$.

Proof. Suppose that $x_n \to x$ in the metric $\rho$. Let $\varepsilon > 0$ be given, and choose a $\delta > 0$ which satisfies (30) for this $x$ and $\varepsilon$. Since $\rho(x_n, x) \to 0$, we can choose an $N$ so that $n > N$ implies that $\rho(x_n, x) < \delta$. By (30), $n > N$ therefore implies that $\sigma(x_n, x) < \varepsilon$. Thus, $x_n \to x$ in the metric $\sigma$. The proof of the converse is the same. □

Problems

1. Prove that the functions $\rho_1$ and $\rho_{\max}$ defined in Example 3 are indeed metrics on $\mathbb{R}^2$.
2. Prove that the function $\rho$ defined in Example 5 is a metric.

3. Prove that $\rho(f, g) = \int_a^b |f(x) - g(x)|\,dx$ is a metric on $C[a, b]$.

4. Suppose that $\rho$ is a metric on $\mathcal{M}$. Prove that the following are also metrics:

(a) $\rho_1 = 5\rho$.
(b) $\rho_2 = \min\{1, \rho\}$.

5. Suppose that $\rho_1$ and $\rho_2$ are metrics on $\mathcal{M}$. Prove that the following are also metrics:

(a) $\rho = \rho_1 + \rho_2$.
(b) $\rho = \max\{\rho_1, \rho_2\}$.

6. Prove that $\rho(x, y)$ is a metric on $(0, \infty)$.

7. Let $(\mathcal{M}, \rho)$ be a metric space and suppose that $\{x_n\}$ is a sequence in $(\mathcal{M}, \rho)$ so that $\lim_{n \to \infty} x_n = x$ and $\lim_{n \to \infty} x_n = y$. Prove that $x = y$.

8. Suppose that $x_n \to x$ and $y_n \to y$ in metric space $(\mathcal{M}, \rho)$. Prove that $\lim_{n \to \infty} \rho(x_n, y_n) = \rho(x, y)$.

9. Let $(\mathcal{M}, \rho)$ be a metric space. A point $d \in \mathcal{M}$ is called a limit point of a sequence $\{x_n\}$ if for every $\varepsilon > 0$ there is an $n$ so that $\rho(d, x_n) < \varepsilon$. Prove that $d$ is a limit point of $\{x_n\}$ if and only if $\{x_n\}$ has a subsequence which converges to $d$.

10. A metric space $(\mathcal{M}, \rho)$ is said to be discrete if for every $x \in \mathcal{M}$ there is an $\varepsilon > 0$ so that $\rho(x, y) < \varepsilon$ implies $y = x$.

(a) Define a function $\delta$ on $\mathbb{R}$ by $\delta(x, x) = 0$ and $\delta(x, y) = 1$ if $x \ne y$. Prove that $(\mathbb{R}, \delta)$ is discrete.
(b) Prove that $(\mathbb{R}, \rho_2)$ is not discrete.
(c) Which of the metrics in Examples 1-5 are discrete?
(d) In a discrete metric space $(\mathcal{M}, \rho)$, what are the convergent sequences?
11. Two metrics, $\rho$ and $\sigma$, on a set $\mathcal{M}$ are said to be uniformly equivalent if there exist positive constants $c_1$ and $c_2$ such that

$$c_1\,\rho(x, y) \le \sigma(x, y) \le c_2\,\rho(x, y)$$

for all $x$ and $y$ in $\mathcal{M}$. Prove that if $\rho$ and $\sigma$ are uniformly equivalent, then they are equivalent.

12. Prove that the metrics $\rho_1$, $\rho_{\max}$, and $\rho_2$ defined in Example 3 are uniformly equivalent.

13. Are the metrics $\rho_\infty$ of Example 2 and $\rho$ of problem 3 equivalent?

14. Let $\rho$ be the function defined on $\mathbb{R} \times \mathbb{R}$ by

$$\rho(x, y) = \frac{|x - y|}{1 + |x - y|}.$$

(a) Prove that $\rho$ is a metric.

(b) Prove that $\rho$ is equivalent to the Euclidean metric $\rho_2$.

(c) Prove that $\rho$ is not uniformly equivalent to $\rho_2$.

15. Let $\psi$ be a continuous function on $[a, b]$. Define a function $\rho_\psi$ on $C[a, b] \times C[a, b]$ by

(a) Prove that $\rho_\psi$ is a metric on $C[a, b]$ if $\psi(x) > 0$ for all $x \in [a, b]$.
(b) Explain why the condition $\psi(x) \ge 0$ is not enough to guarantee that $\rho_\psi$ is a metric on $C[a, b]$.
(c) Suppose that $\psi$ and $\varphi$ are continuous functions on $[a, b]$ which satisfy $\psi(x) > 0$ and $\varphi(x) > 0$ for all $x$. Prove that the metrics $\rho_\psi$ and $\rho_\varphi$ are uniformly equivalent.

5.7 The Contraction Mapping Principle

If a set has a metric, then the notions of Cauchy sequence and completeness can be defined analogously to their definitions on $\mathbb{R}$.

Definition. A sequence of points $\{x_n\}$ in a metric space $(\mathcal{M}, \rho)$ is said to be a Cauchy sequence if, given $\varepsilon > 0$, there is an $N$ so that $\rho(x_n, x_m) < \varepsilon$ for all $n > N$ and $m > N$.

Definition. If every Cauchy sequence in a metric space $(\mathcal{M}, \rho)$ has a limit in $(\mathcal{M}, \rho)$, we say that $\mathcal{M}$ is complete in the metric $\rho$ and refer to the pair $(\mathcal{M}, \rho)$ as a complete metric space.

As we have seen in many special cases, it is easy to prove that a convergent sequence is automatically a Cauchy sequence (problem 1). It is the converse, completeness, which is crucial for the constructions of analysis. The main objects of calculus, derivatives, integrals, and series, are defined by limiting operations and therefore depend heavily on the completeness of $\mathbb{R}$. The proof of the Bolzano-Weierstrass theorem, which is the main tool for proving the properties of continuous functions in Section 3.2, also depends on the completeness of $\mathbb{R}$. We proved in Section 5.3 that $C[a, b]$ is complete in the sup norm, and this fact played a central role in the construction of solutions of integral equations in Section 5.4. The completeness of $C[a, b]$ will also be used in the construction of local solutions of differential equations in Section 7.1.

Example 1 Let $\mathcal{M} = \{(x, y) \in \mathbb{R}^2 \mid y = x^2\}$ with the Euclidean metric, $\rho_2$. Since $\rho_2$ is a metric on $\mathbb{R}^2$, it is automatically a metric on $\mathcal{M}$ because $\mathcal{M}$ is a subset of $\mathbb{R}^2$. Let $\{p_n\}$ be a Cauchy sequence in $(\mathcal{M}, \rho_2)$. Then, $\{p_n\}$ is a Cauchy sequence in $\mathbb{R}^2$ and therefore has a limit $p = (x, y)$ in $\mathbb{R}^2$ since $\mathbb{R}^2$ is complete (problem 8 of Section 2.4). If $p_n = (x_n, y_n)$, then, by problem 10(c) of Section 2.2, we know that $x_n \to x$ and $y_n \to y$. Thus, by Theorem 2.2.5,

$$y = \lim_{n \to \infty} y_n = \lim_{n \to \infty} x_n^2 = \left( \lim_{n \to \infty} x_n \right)^2 = x^2.$$

Therefore, the limit point $(x, y)$ is in $\mathcal{M}$, so $(\mathcal{M}, \rho_2)$ is complete.

Example 2 Let $\mathcal{M} = \{f \in C[a, b] \mid f(x) \ge 0 \text{ for } x \in [a, b]\}$ with the metric $\rho_\infty$, which comes from the sup norm. Since $\rho_\infty$ is a metric on $C[a, b]$, it is a metric on $\mathcal{M}$. Let $\{f_n\}$ be a Cauchy sequence in $\mathcal{M}$. Since $\{f_n\}$ is a Cauchy sequence in $C[a, b]$ and $C[a, b]$ is complete (Theorem 5.3.2), there is a continuous function $f$ on $[a, b]$ such that $f_n \to f$ uniformly. For each $x$, $f_n(x) \ge 0$ for all $n$, and this implies that $f(x) = \lim_{n \to \infty} f_n(x) \ge 0$. Therefore $f \in \mathcal{M}$, so $(\mathcal{M}, \rho_\infty)$ is complete.

Many other examples of complete and incomplete metric spaces are given in the problems. We turn now to an important theorem whose proof uses completeness.

A function $T$ from a metric space $(\mathcal{M}, \rho)$ to itself is called a contraction if there is an $\alpha$ which satisfies $0 \le \alpha < 1$ so that

$$\rho(T(x), T(y)) \le \alpha\,\rho(x, y) \quad \text{for all } x, y \in \mathcal{M}.$$

Thus a contraction shrinks the distance between points. The importance of this concept is that on complete metric spaces contractions have fixed points, that is, points $x$ such that $T(x) = x$.

□ Theorem 5.7.1 (The Contraction Mapping Principle) Let $T$ be a contraction on a complete metric space $(\mathcal{M}, \rho)$. Then there is a unique point $\bar{x} \in \mathcal{M}$ such that $T(\bar{x}) = \bar{x}$. Furthermore, if $x_0$ is any point in $\mathcal{M}$ and we define $x_{n+1} = T(x_n)$, then $x_n \to \bar{x}$ as $n \to \infty$.

Proof. Uniqueness is easy, for suppose that there are two fixed points, $\bar{x}$ and $\tilde{x}$. Then,

$$\rho(\bar{x}, \tilde{x}) = \rho(T(\bar{x}), T(\tilde{x})) \le \alpha\,\rho(\bar{x}, \tilde{x}),$$

which implies that $\rho(\bar{x}, \tilde{x}) = 0$ since $\alpha < 1$. Thus, by the definition of metric, $\bar{x} = \tilde{x}$. To prove existence, we first show that for any $x_0 \in \mathcal{M}$, the sequence $\{x_n\}$ defined by $x_{n+1} = T(x_n)$ is a Cauchy sequence. Since

$$\rho(x_{n+1}, x_n) = \rho(T(x_n), T(x_{n-1})) \le \alpha\,\rho(x_n, x_{n-1}),$$

iteration shows that $\rho(x_{n+1}, x_n) \le \alpha^n \rho(x_1, x_0)$. Therefore, by the triangle inequality, if $m > n$,

$$\rho(x_m, x_n) \le \rho(x_m, x_{m-1}) + \rho(x_{m-1}, x_{m-2}) + \cdots + \rho(x_{n+1}, x_n) \le \left( \sum_{j=n}^{m-1} \alpha^j \right) \rho(x_1, x_0).$$

Since $\alpha < 1$, the geometric series $\sum \alpha^j$ converges. Now, if $\rho(x_1, x_0) = 0$, then $x_0$ is a fixed point and we are done. Otherwise, $\rho(x_1, x_0) > 0$, and, given $\varepsilon > 0$, we can choose an $N$ so that $\sum_{j=n}^{m-1} \alpha^j \le \varepsilon\,\rho(x_1, x_0)^{-1}$ if $n > N$ and $m > N$. For such $n$ and $m$, $\rho(x_m, x_n) \le \varepsilon$, which proves that $\{x_n\}$ is a Cauchy sequence. Since $\mathcal{M}$ is complete, there is an $\bar{x} \in \mathcal{M}$ so that $x_n \to \bar{x}$. Finally, by the triangle inequality,

$$\rho(T(\bar{x}), \bar{x}) \le \rho(T(\bar{x}), T(x_n)) + \rho(T(x_n), x_n) + \rho(x_n, \bar{x})$$
$$\le \alpha\,\rho(\bar{x}, x_n) + \rho(x_{n+1}, x_n) + \rho(x_n, \bar{x})$$
$$\le \alpha\,\rho(\bar{x}, x_n) + \rho(x_{n+1}, \bar{x}) + \rho(\bar{x}, x_n) + \rho(x_n, \bar{x}).$$

Since $x_n \to \bar{x}$, all the terms on the right converge to zero. Therefore, we conclude that $\rho(T(\bar{x}), \bar{x}) = 0$, and so, by the definition of metric, $T(\bar{x}) = \bar{x}$. □
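The iteration in the theorem is easy to carry out on a computer. As a minimal sketch, take $\mathcal{M} = [0, 1]$ with the Euclidean metric and $T(x) = \cos x$; this choice of $T$, the helper name `iterate_to_fixed_point`, and the tolerance are illustrative assumptions. The map $\cos$ takes $[0, 1]$ into itself, and by the Mean Value Theorem $|\cos x - \cos y| \le (\sin 1)\,|x - y|$ there, so it is a contraction with $\alpha = \sin 1 < 1$.

```python
import math

def iterate_to_fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) until successive iterates differ by at most tol.

    Returns the last iterate and the number of steps taken."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) <= tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# Two different starting points in [0, 1] lead to the same fixed point.
fixed_a, steps_a = iterate_to_fixed_point(math.cos, 0.0)
fixed_b, steps_b = iterate_to_fixed_point(math.cos, 1.0)
print(fixed_a, steps_a)
print(fixed_b, steps_b)
```

Both runs settle near $0.739085$, the unique solution of $\cos x = x$, in agreement with the statement that the limit does not depend on the starting point $x_0$.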

Example 3 The contraction mapping principle can be used to shorten the proof of the existence of solutions of integral equations in Section 5.4. We take $C[a, b]$ with the metric $\rho_\infty$ as our metric space $\mathcal{M}$ and define a function $T$ on $C[a, b]$ by

$$T(\psi)(x) = f(x) + \lambda \int_a^b K(x, y)\,\psi(y)\,dy. \tag{32}$$

As in the proof of Theorem 5.4.1, we prove that $T(\psi)(x)$ is continuous on $[a, b]$ if $\psi$ is. This shows that $T$ is a function from $\mathcal{M}$ to $\mathcal{M}$. Again, as in that proof, we estimate

$$|T(\varphi)(x) - T(\psi)(x)| \le |\lambda| \int_a^b |K(x, y)|\,|\varphi(y) - \psi(y)|\,dy \le |\lambda|\,M\,(b - a)\,\|\varphi - \psi\|_\infty.$$

Taking the supremum of the left-hand side, we see that

$$\|T(\varphi) - T(\psi)\|_\infty \le \alpha\,\|\varphi - \psi\|_\infty,$$

where $\alpha = |\lambda| M (b - a)$. If $\alpha < 1$, the contraction mapping principle guarantees that there is a unique continuous function $\psi$ such that $T(\psi) = \psi$. Replacing $T(\psi)$ by $\psi$ in (32) shows that $\psi$ solves the integral equation. Thus, the contraction mapping principle eliminates the need for the convergence part of the proof of Theorem 5.4.1.
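The fixed point of $T$ can be approximated by iterating $\psi \mapsto T(\psi)$ on a grid. What follows is a rough sketch under stated assumptions, not the construction of Section 5.4: the integral is replaced by the trapezoid rule, and the data $f(x) = x$, $K(x, y) = xy$, $\lambda = 1/2$ on $[0, 1]$ are chosen only because the exact solution $\psi(x) = 1.2\,x$ can be checked by hand. The name `solve_by_iteration` and its parameters are hypothetical.

```python
def solve_by_iteration(f, K, lam, a, b, n=101, tol=1e-12, max_iter=500):
    """Iterate psi <- T(psi) on a grid, with the integral replaced by the trapezoid rule."""
    h = (b - a) / (n - 1)
    xs = [a + i * h for i in range(n)]
    weights = [h] * n
    weights[0] = weights[-1] = h / 2.0
    fx = [f(x) for x in xs]
    kernel = [[K(x, y) for y in xs] for x in xs]
    psi = [0.0] * n  # starting guess psi_0 = 0
    for _ in range(max_iter):
        new_psi = [
            fx[i] + lam * sum(weights[j] * kernel[i][j] * psi[j] for j in range(n))
            for i in range(n)
        ]
        # Sup-norm distance between successive iterates on the grid.
        change = max(abs(u - v) for u, v in zip(new_psi, psi))
        psi = new_psi
        if change <= tol:
            return xs, psi
    raise RuntimeError("iteration did not converge")

# Illustrative data (not from the text): f(x) = x, K(x, y) = x*y, lambda = 1/2 on [0, 1].
# Here M = 1 and alpha = |lambda| * M * (b - a) = 1/2 < 1; the exact solution is 1.2 * x.
xs, psi = solve_by_iteration(lambda x: x, lambda x, y: x * y, 0.5, 0.0, 1.0)
max_error = max(abs(p - 1.2 * x) for x, p in zip(xs, psi))
print(max_error)
```

The remaining error comes from the quadrature rule rather than from the iteration, so refining the grid (larger `n`) should reduce it.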

Let $f$ be a real-valued function on $\mathbb{R}$ and consider the iteration

$$x_{n+1} = f(x_n),$$

which starts with a given point $x_0$. A point $\bar{x}$ is called a fixed point of the iteration (or of $f$) if $f(\bar{x}) = \bar{x}$. A fixed point is called stable if there is a $\delta > 0$ so that $x_0 \in [\bar{x} - \delta, \bar{x} + \delta]$ implies that $x_n \to \bar{x}$. In Section 2.7, we investigated an iteration called the quadratic map and proved that some of its fixed points are stable.

□ Theorem 5.7.2 Suppose that $f$ is continuously differentiable in an open interval about a fixed point $\bar{x}$. If $|f'(\bar{x})| < 1$, then $\bar{x}$ is a stable fixed point.

Proof. Since $f'$ is continuous, there is a $\beta < 1$ and a $\delta > 0$ such that $|f'(x)| \le \beta < 1$ for all $x \in [\bar{x} - \delta, \bar{x} + \delta]$. Let $\mathcal{M} = [\bar{x} - \delta, \bar{x} + \delta]$ with the Euclidean metric. If $x \in \mathcal{M}$,

$$|f(x) - \bar{x}| = |f(x) - f(\bar{x})| = |f'(c)|\,|x - \bar{x}| \le \beta\,|x - \bar{x}|$$

by the Mean Value Theorem. This proves that $f(x)$ is closer to $\bar{x}$ than $x$, so $f$ takes $\mathcal{M}$ into itself. Using the Mean Value Theorem again, we see that if $x$ and $y$ are in $\mathcal{M}$, then

$$|f(x) - f(y)| \le \beta\,|x - y|,$$

which shows that $f$ is a contraction on $\mathcal{M}$ since $\beta < 1$. It follows from the contraction mapping principle that $\bar{x}$ is the only fixed point in $\mathcal{M}$, and whatever $x_0$ we choose in $\mathcal{M}$, the sequence of points defined by the iteration $x_{n+1} = f(x_n)$ will converge to $\bar{x}$. Thus $\bar{x}$ is a stable fixed point. □

Example 4 Let's see what information Theorem 5.7.2 gives us about the quadratic map,

$$f(x) = ax(1 - x),$$

in the case $1 < a < 3$. It is easy to check that the only fixed points in the interval $[0, 1]$ are the points $0$ and $\bar{x} = \frac{a - 1}{a}$. After calculating that $f'(x) = a(1 - 2x)$, we substitute $\bar{x}$ and find $f'(\bar{x}) = 2 - a$. Since $1 < a < 3$, we see that $|f'(\bar{x})| < 1$, so the hypothesis of Theorem 5.7.2 is satisfied. Therefore, $\bar{x}$ is stable, and if we start close enough to $\bar{x}$, the iteration will converge to $\bar{x}$. Notice that this gives us some information quite easily. Theorem 2.7.2, whose proof is quite long and difficult, shows much more. For any $x_0 \in (0, 1)$, the sequence $\{x_n\}$ converges to $\bar{x}$.
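The stability described in this example can be observed numerically. The sketch below uses the illustrative value $a = 2.5$ and the starting point $0.55$, both assumptions chosen for the demonstration; the helper names `quadratic_map` and `iterate` are likewise hypothetical.

```python
def quadratic_map(a):
    """Return the quadratic map f(x) = a * x * (1 - x) for a given parameter a."""
    return lambda x: a * x * (1.0 - x)

def iterate(f, x0, n):
    """Apply f to x0 a total of n times."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x

a = 2.5                            # illustrative parameter with 1 < a < 3
x_bar = (a - 1.0) / a              # the nonzero fixed point, here 0.6
slope = a * (1.0 - 2.0 * x_bar)    # f'(x_bar), which should equal 2 - a
limit = iterate(quadratic_map(a), 0.55, 200)
print(x_bar, slope, limit)
```

Here $f'(\bar{x}) = -0.5$, so near $\bar{x}$ the distance to the fixed point is roughly halved at each step and the iterates approach $0.6$.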

The contraction mapping principle shows the importance of completeness. But what if we are working in a space $\mathcal{M}$ that is not complete, for example, $C[a, b]$ with the $L^1$ or $L^2$ norm? There is a theorem which says that any metric space can be enlarged to become complete. To say exactly what this means, we need two definitions. First, a set $V \subset \mathcal{M}$ is said to be dense in $\mathcal{M}$ if, given any $\varepsilon > 0$ and any $x \in \mathcal{M}$, there is a $y \in V$ such that $\rho(x, y) < \varepsilon$. Second, a function $T$ from one metric space $(\mathcal{M}_1, \rho_1)$ into another $(\mathcal{M}_2, \rho_2)$ is called an isometry if $\rho_2(T(x), T(y)) = \rho_1(x, y)$ for all $x, y \in \mathcal{M}_1$. If the range of $T$ is all of $\mathcal{M}_2$ then $\mathcal{M}_1$ and $\mathcal{M}_2$ are said to be isometric. Isometric metric spaces are identical as far as their metric space properties are concerned.

□ Theorem 5.7.3 Let $(\mathcal{M}_1, \rho_1)$ be a metric space. Then, there is a complete metric space $(\mathcal{M}_2, \rho_2)$ and an isometry $T$ from $\mathcal{M}_1$ into $\mathcal{M}_2$ such that the range of $T$ is dense in $\mathcal{M}_2$.

Theorem 5.7.3 says that in terms of metric space properties $(\mathcal{M}_1, \rho_1)$ can be identified with a dense subset of a complete metric space $(\mathcal{M}_2, \rho_2)$. Though Theorem 5.7.3 says that every metric space can be “enlarged” to become complete, it is not very useful since in practice one wants to be able to characterize the added points. For example, what are the added points if one completes $C[a, b]$ in the $L^2$ norm? Problem 12 gives an example of an incomplete space for which one can characterize the completion.

Problems

1. Show that a convergent sequence in a metric space is a Cauchy sequence.

2. Let $(\mathcal{M}, \rho)$ be a metric space and suppose that $\{x_n\}$ and $\{y_n\}$ are Cauchy sequences in $(\mathcal{M}, \rho)$. Prove that $\lim_{n \to \infty} \rho(x_n, y_n)$ exists.
3. Which of the following subsets of $\mathbb{R}$ are complete metric spaces with the Euclidean metric? (a) $[-1, 6]$, (b) $[0, \infty)$, (c) $(0, \infty)$, (d) $\mathbb{Q}$, (e) $\mathbb{N}$.
4. Which of the following subsets of $\mathbb{R}^2$ are complete metric spaces with the Euclidean metric?

(a) $\{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 < 1\}$.
(b) $\{(x, y) \in \mathbb{R}^2 \mid x > 1 \text{ and } y < -2\}$.
(c) $\{(x, y) \in \mathbb{R}^2 \mid y \in \mathbb{N}\}$.
(d) $\{(x, y) \in \mathbb{R}^2 \mid f(x, y) = 0\}$, where $f$ is continuous on $\mathbb{R}^2$.

5. Which of the following subsets of $C[a, b]$ are complete metric spaces with the metric $\rho_\infty$?

(a) $\{f \in C[a, b] \mid f(x) > 0 \text{ for } x \in [a, b]\}$.
(b) $\{f \in C[a, b] \mid f(a) = 0\}$.
(c) $\{f \in C[a, b] \mid f(x) = 0 \text{ for } a < c < x < d < b\}$.
(d) $\{f \in C[a, b] \mid |f(x)| < 2 + f(x)^2 \text{ for } x \in [a, b]\}$.

6. Give an example to show that a discrete metric space may not be complete.

7. Consider the integral equation (15), where $f \in C[a, b]$ and $K$ is continuous on the square $[a, b] \times [a, b]$. Suppose that $f$ and $K$ are nonnegative. Prove that the solution $\psi$ is nonnegative. Hint: use the metric space in Example 2.

8. Show how to use the contraction mapping principle to provide an easier proof that the nonlinear integral equation in project 2 has a unique solution.

9. For b > 0, show that (0, b) is complete in the metric p(x, y) = j ^ .

10. (a) Prove that if $\rho$ and $\sigma$ are uniformly equivalent metrics on $\mathcal{M}$, then $(\mathcal{M}, \rho)$ is complete if and only if $(\mathcal{M}, \sigma)$ is complete.
(b) Suppose that $\rho$ and $\sigma$ are equivalent metrics on $\mathcal{M}$. Show by example that it is possible that $(\mathcal{M}, \rho)$ is complete but $(\mathcal{M}, \sigma)$ is not complete. Hint: see problem 9.

11. Let $(\mathcal{M}_i, \rho_i)$, $i = 1, \ldots, N$, be a finite collection of complete metric spaces. Let $\mathcal{M}$ be the product of the spaces $\mathcal{M}_i$; that is, $\mathcal{M}$ consists of the $N$-tuples $(x_1, x_2, \ldots, x_N)$ with $x_i \in \mathcal{M}_i$ for each $i$. For two such $N$-tuples, $x = (x_1, x_2, \ldots, x_N)$ and $y = (y_1, y_2, \ldots, y_N)$, define

i= 1

Prove that (M, p) is a complete metric space.


12. Let $\mathcal{M}$ be the set of continuous functions on $\mathbb{R}$ which vanish outside a finite interval (the interval may depend on the function).

(a) Show that $\mathcal{M}$ is a metric space in the sup norm.
(b) Show that $\mathcal{M}$ is not complete.
(c) Show that $C_0(\mathbb{R})$, the continuous functions which go to zero at $\infty$, is complete in the sup norm (problem 10 of Section 5.3).
(d) Prove that $\mathcal{M}$ is dense in $C_0(\mathbb{R})$.

13. (a) Let $\mathcal{M}$ be the circle of radius 1 with the center at the origin in $\mathbb{R}^2$. Let $\rho_2$ be the Euclidean metric, and let $\rho$ be the metric which assigns to two points the arc length along the circle between them (going the shorter way). Prove that these metrics are uniformly equivalent.
(b) Assume that the geodesic metric, $\rho_g$, of Example 4 of Section 5.6 assigns to the points $\alpha$ and $\beta$ on $S^2$ the shorter of the two great circle arcs between them. Use part (a) to show that $\rho_g$ is uniformly equivalent to $\rho_2$ on $S^2$.
(c) Prove that $(S^2, \rho_2)$ is complete.
(d) Conclude that $(S^2, \rho_g)$ is complete.

14. In studying lateral inhibition in the retina, one is led to the following kind of model. We imagine a line of cells indexed by $j$ for $-\infty < j < \infty$, and denote by $e_j$ a nonnegative number representing the stimulation of the $j$th cell. If $r_j$ represents the response of the $j$th cell, we would like to solve the family of equations

rj = ej - \(rj-1 + rj+1).

(a) Prove that for every bounded sequence $\{e_j\}$, there is a unique bounded sequence $\{r_j\}$ so that these equations hold. Hint: you will need to use the fact that $\ell_\infty$ is a Banach space; see Section 5.8.
(b) How can you compute the sequence $\{r_j\}$?

5.8 Normed Linear Spaces


Throughout this text we have used many sets which have natural notions
of addition and scalar multiplication. We can add real numbers; we can
add pairs of real numbers. If we add two continuous functions, the result
is another continuous function. We have sometimes referred to these
sets informally as “vector spaces.” We now make this concept precise by
giving a formal definition.

Definition. A vector space over the real numbers is a set $V$, whose elements are called vectors, together with operations of addition and scalar multiplication that satisfy the following rules:

1. For $v$ and $w$ in $V$, the vector $v + w$ is in $V$.

2. For $v$ and $w$ in $V$, $v + w = w + v$.

3. For all $v$, $w$, and $u$ in $V$, $(u + v) + w = u + (v + w)$.

4. There exists a unique vector $0$ in $V$ so that $v + 0 = v$ for all $v \in V$.

5. For each $v \in V$ there exists another vector in $V$, denoted $-v$, so that $v + (-v) = 0$.

6. For each $\alpha$ in $\mathbb{R}$ and $v$ in $V$, $\alpha v$ is in $V$.

7. For $\alpha$ and $\beta$ in $\mathbb{R}$ and $v$ in $V$, $(\alpha + \beta)v = \alpha v + \beta v$.

8. For $\alpha$ and $\beta$ in $\mathbb{R}$ and $v$ in $V$, $\alpha(\beta v) = (\alpha\beta)v$.

9. For $\alpha$ in $\mathbb{R}$ and $v$ and $w$ in $V$, $\alpha(v + w) = \alpha v + \alpha w$.

Example 1 The set $\mathbb{R}^n$, consisting of all $n$-tuples of real numbers,
$(x_1, x_2, \ldots, x_n)$, is a vector space which is familiar from linear algebra.
Addition and scalar multiplication are defined by

$$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$$

$$\alpha(x_1, x_2, \ldots, x_n) = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n).$$

Example 2 The set of continuous functions on an interval, $C[a,b]$, is
a vector space. Suppose that $W$ is a subset of $C[a,b]$ which contains
the zero function. Then properties 2, 3, 7, 8, and 9 hold automatically
because every vector in $W$ is in $C[a,b]$ and these properties hold there. To
prove properties 1, 5, and 6, one needs to verify that for all $v$ and $w$ in $W$
and all $\alpha$ and $\beta$ in $\mathbb{R}$, it is true that $\alpha v + \beta w$ is in $W$ too. So, for example, the
set of continuous functions on $[a,b]$ that vanish at a particular point $x_0$
is a vector space since this property is preserved by linear combinations.
On the other hand, the set of continuous functions on $[0,2]$ such that
$f(1) = 5$ is not a vector space.

Throughout the text we have used the absolute value $|x|$ to measure
the size of a real number, and in Section 5.3 we introduced the sup norm
$\|f\|_\infty$ to measure the size of bounded functions. Notice that the properties
of the absolute value in Proposition 1.1.2 and the properties of $\|\cdot\|_\infty$
in Proposition 5.3.1 are the same. Other properties of $|\cdot|$ and $\|\cdot\|_\infty$ are
also very similar; compare, for example, problem 10 in Section 1.1 with
problem 2(a) in Section 5.3. This suggests that the absolute value and the
sup norm are special cases of a more general idea.

Definition. Let $V$ be a vector space over the real numbers. A function,
$\|\cdot\|$, from $V$ to $\mathbb{R}$ is called a norm if it satisfies the following conditions:

(a) $\|v\| \ge 0$ and $\|v\| = 0$ if and only if $v$ is the zero vector in $V$.

(b) For every $\alpha \in \mathbb{R}$, we have $\|\alpha v\| = |\alpha|\,\|v\|$.

(c) For all $v$ and $w$ in $V$, $\|v + w\| \le \|v\| + \|w\|$ (the triangle inequality).

A vector space with a norm is called a normed linear space.

We have already seen several examples of norms besides the absolute
value and the sup norm. The Euclidean norm on $\mathbb{R}^2$ was defined in problem 10
of Section 2.2, and the $L_1$ norm on $C[a,b]$ was defined in Section 5.3.

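Example 3 It is not hard to show that

$$\|(x_1, x_2, \ldots, x_n)\|_1 = \sum_{j=1}^{n} |x_j|$$

is a norm on $\mathbb{R}^n$. To prove that the Euclidean norm

$$\|(x_1, x_2, \ldots, x_n)\|_2 = \Big(\sum_{j=1}^{n} |x_j|^2\Big)^{1/2}$$

satisfies the triangle inequality is a little more difficult. The case $n = 2$
was outlined in problem 10 of Section 2.2. The general case requires
the Cauchy-Schwarz inequality (problem 10). Suppose that $x =
(x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ are in $\mathbb{R}^n$. Then,

$$\|x + y\|_2^2 = \sum_{i=1}^{n} |x_i + y_i|^2 \tag{33}$$

$$\le \sum_{i=1}^{n} (|x_i| + |y_i|)^2 \tag{34}$$

$$= \sum_{i=1}^{n} |x_i|^2 + 2\sum_{i=1}^{n} |x_i||y_i| + \sum_{i=1}^{n} |y_i|^2 \tag{35}$$

$$\le \sum_{i=1}^{n} |x_i|^2 + 2\Big(\sum_{i=1}^{n} |x_i|^2\Big)^{1/2}\Big(\sum_{i=1}^{n} |y_i|^2\Big)^{1/2} + \sum_{i=1}^{n} |y_i|^2 \tag{36}$$

$$= \Big(\Big(\sum_{i=1}^{n} |x_i|^2\Big)^{1/2} + \Big(\sum_{i=1}^{n} |y_i|^2\Big)^{1/2}\Big)^2 \tag{37}$$

$$= (\|x\|_2 + \|y\|_2)^2. \tag{38}$$

Taking the square root of both sides proves the triangle inequality for
the Euclidean norm. The Cauchy-Schwarz inequality was used in going
from (35) to (36).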
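The inequalities in Example 3 can be checked numerically on random vectors. The following is only an illustrative sketch, not a proof: the function names `norm1`, `norm2`, and `check_inequalities` are made up for this example, and a small tolerance is added to absorb floating-point rounding.

```python
import math
import random

def norm1(x):
    """The 1-norm: the sum of the absolute values of the components."""
    return sum(abs(t) for t in x)

def norm2(x):
    """The Euclidean norm: the square root of the sum of the squares."""
    return math.sqrt(sum(t * t for t in x))

def check_inequalities(x, y, tol=1e-12):
    """Return True if Cauchy-Schwarz and both triangle inequalities hold."""
    s = [a + b for a, b in zip(x, y)]
    # Cauchy-Schwarz in the form used to pass from (35) to (36).
    cauchy_schwarz = (sum(abs(a) * abs(b) for a, b in zip(x, y))
                      <= norm2(x) * norm2(y) + tol)
    triangle1 = norm1(s) <= norm1(x) + norm1(y) + tol
    triangle2 = norm2(s) <= norm2(x) + norm2(y) + tol
    return cauchy_schwarz and triangle1 and triangle2

random.seed(0)  # fixed seed so the experiment is reproducible
samples = [([random.uniform(-1, 1) for _ in range(5)],
            [random.uniform(-1, 1) for _ in range(5)])
           for _ in range(1000)]
all_ok = all(check_inequalities(x, y) for x, y in samples)
print(all_ok)
```

A passing check on sampled vectors is consistent with the argument above but does not replace it.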

If $\|\cdot\|$ is a norm on a vector space $V$, then it is easy to see (problem 3)
that

$$\rho(x, y) = \|x - y\|$$

is a metric on $V$. Thus, the concepts of convergence, Cauchy sequence,
and completeness, which we introduced in the last two sections, have
meaning on $V$.

Definition. Let $V$ be a vector space with a norm $\|\cdot\|$. If every Cauchy
sequence in $V$ converges to a limit in $V$, then $V$ is said to be complete in
the norm $\|\cdot\|$. A complete normed linear space is called a Banach space
after Stefan Banach (1892-1945).

We already know that $\mathbb{R}$ is complete in the absolute value norm and
that $C[a,b]$ is complete in the sup norm. We know from Example 1 in
Section 5.3 that $C[a,b]$ is not complete in the $L_1$ norm. The proof that $\mathbb{R}^n$
is complete in the Euclidean norm is outlined in problem 6. Here we
will investigate a new space.
Let $\ell_\infty$ denote the set of bounded sequences, $\{a_j\}$, of real numbers.
We define addition of sequences and scalar multiplication by

$$\{a_j\} + \{b_j\} = \{a_j + b_j\}$$

$$\alpha\{a_j\} = \{\alpha a_j\}.$$

The sequence which is all zeros is the 0 of the vector space. We define a
norm on $\ell_\infty$ by

$$\|\{a_j\}\|_\infty = \sup_j |a_j|.$$

Since $|a_j| \ge 0$ for each $j$, it is clear that $\|\{a_j\}\|_\infty \ge 0$ and $\|\{a_j\}\|_\infty = 0$ if
and only if $a_j = 0$ for each $j$. Furthermore, if $\alpha \in \mathbb{R}$,

$$\|\alpha\{a_j\}\|_\infty = \|\{\alpha a_j\}\|_\infty = \sup_j |\alpha a_j| = |\alpha| \sup_j |a_j| = |\alpha|\,\|\{a_j\}\|_\infty.$$

Thus, properties (a) and (b) in the definition of norm hold.


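If $\alpha$ and $\beta$ are in $\mathbb{R}$ and $\{a_j\}$ and $\{b_j\}$ are in $\ell_\infty$, then

$$\|\alpha\{a_j\} + \beta\{b_j\}\|_\infty = \|\{\alpha a_j + \beta b_j\}\|_\infty$$

$$= \sup_j |\alpha a_j + \beta b_j|$$

$$\le \sup_j |\alpha||a_j| + \sup_j |\beta||b_j|$$

$$= |\alpha| \sup_j |a_j| + |\beta| \sup_j |b_j|$$

$$= |\alpha|\,\|\{a_j\}\|_\infty + |\beta|\,\|\{b_j\}\|_\infty$$

$$< \infty.$$

This proves that linear combinations of sequences in $\ell_\infty$ are again in $\ell_\infty$.
Furthermore, if $\alpha = 1 = \beta$, this is just the triangle inequality. Thus, $\ell_\infty$ is
a normed linear space.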

□ Theorem 5.8.1 $\ell_\infty$ is a Banach space.

Proof. Let $\{a^{(n)}\}$ be a sequence of elements of $\ell_\infty$. Each $a^{(n)}$ is a sequence
$\{a_j^{(n)}\}_{j=1}^{\infty}$. Suppose that $\{a^{(n)}\}$ is a Cauchy sequence. That is,
given $\varepsilon > 0$, there is an $N$ so that

$$\|a^{(n)} - a^{(m)}\|_\infty \le \varepsilon \tag{39}$$

for all $n \ge N$, $m \ge N$. Since

$$\|a^{(n)} - a^{(m)}\|_\infty = \big\|\{a_j^{(n)} - a_j^{(m)}\}_{j=1}^{\infty}\big\|_\infty = \sup_j |a_j^{(n)} - a_j^{(m)}|,$$

we see that

$$|a_j^{(n)} - a_j^{(m)}| \le \varepsilon \tag{40}$$

if $n \ge N$, $m \ge N$, for each fixed $j$. Thus, each of the sequences of components
$\{a_j^{(n)}\}_{n=1}^{\infty}$ is a Cauchy sequence of real numbers and therefore
converges to a real number $a_j$. By problem 4(a),

$$\big|\,\|a^{(n)}\|_\infty - \|a^{(m)}\|_\infty\big| \le \|a^{(n)} - a^{(m)}\|_\infty,$$

so because of (39), $\{\|a^{(n)}\|_\infty\}$ is a Cauchy sequence of real numbers and
is therefore bounded. That is, there is an $M$ so that $\sup_j |a_j^{(n)}| \le M$ for
all $n$. Thus $|a_j^{(n)}| \le M$ for all $n$ and all $j$. Since $a_j^{(n)} \to a_j$ as $n \to \infty$,
this proves that $|a_j| \le M$ for all $j$. Therefore, the sequence $a = \{a_j\}$ is
bounded and is thus an element of $\ell_\infty$.
Finally, letting $n \to \infty$ in (40) shows that $|a_j - a_j^{(m)}| \le \varepsilon$ for each $j$ if
$m \ge N$. Thus,

$$\|a - a^{(m)}\|_\infty = \sup_j |a_j - a_j^{(m)}| \le \varepsilon$$

if $m \ge N$. This proves that $a^{(m)} \to a$ in $\ell_\infty$ as $m \to \infty$. Thus, $\ell_\infty$ is
complete. □

If normed linear spaces are a special case of metric spaces, why do
we treat them separately? The reason is that the vector space structure
allows us to define the important concept of linear transformation.

Definition. A function $T$ from a normed linear space $V$ to a normed
linear space $W$ is called a linear transformation if

$$T(\alpha u + \beta v) = \alpha T(u) + \beta T(v) \tag{41}$$

for all $u$ and $v$ in $V$ and $\alpha$ and $\beta$ in $\mathbb{R}$.

Example 4 To define a function $T$ from $\mathbb{R}^n$ to $\mathbb{R}^m$, one must specify the
$m$ components of $y = T(x)$ for each $x \in \mathbb{R}^n$. If for each $i$, the component
$y_i$ is a linear combination of the components $x_j$ of $x$, that is,

$$y_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n,$$

then $T$ is a linear transformation. It is represented (in the standard basis)
by the matrix $\{a_{ij}\}$.
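The linearity condition (41) for a transformation given by a matrix can be verified on a small concrete case. This is a minimal sketch under assumed data: the matrix `a`, the vectors `u` and `v`, and the helper names `apply_matrix` and `combo` are all chosen only for illustration.

```python
def apply_matrix(a, x):
    """Return y = T(x), where y_i = a[i][0]*x[0] + ... + a[i][n-1]*x[n-1]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def combo(alpha, u, beta, v):
    """Return the linear combination alpha*u + beta*v, componentwise."""
    return [alpha * s + beta * t for s, t in zip(u, v)]

a = [[1, 2, 0],
     [0, -1, 3]]  # a 2 x 3 matrix, so T maps R^3 to R^2
u = [1, 0, 2]
v = [-1, 4, 1]
alpha, beta = 3, -2

left = apply_matrix(a, combo(alpha, u, beta, v))   # T(alpha*u + beta*v)
right = combo(alpha, apply_matrix(a, u), beta, apply_matrix(a, v))  # alpha*T(u) + beta*T(v)
print(left, right)  # both are [-11, 20]
```

Integer entries are used so that the two sides agree exactly, with no rounding.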

Example 5 If $K$ is a continuous function on the square $[a,b] \times [a,b]$, we
can define

$$S(\psi)(x) = \int_a^b K(x,y)\psi(y)\,dy.$$

We showed in Section 5.4 that if $\psi$ is a continuous function on $[a,b]$, then
$S(\psi)$ is a continuous function on $[a,b]$. That is, $S$ is a function from $C[a,b]$
to itself. Notice that $S$ is a linear transformation because

$$S(\alpha\psi + \beta\phi)(x) = \int_a^b K(x,y)(\alpha\psi(y) + \beta\phi(y))\,dy$$

$$= \alpha\int_a^b K(x,y)\psi(y)\,dy + \beta\int_a^b K(x,y)\phi(y)\,dy$$

$$= \alpha S(\psi)(x) + \beta S(\phi)(x).$$
Example 6 Consider the operation of differentiation $\frac{d}{dx}$. Let $C^{(1)}[a,b]$
denote the set of continuously differentiable functions on $[a,b]$. We know
from problem 7 of Section 5.3 that $C^{(1)}[a,b]$ is a Banach space with the
norm $\|f\| = \|f\|_\infty + \|f'\|_\infty$. If we differentiate a function in $C^{(1)}[a,b]$
we get a continuous function. That is, $\frac{d}{dx}$ is a function from $C^{(1)}[a,b]$ to
$C[a,b]$. In fact, $\frac{d}{dx}$ is a linear transformation since

$$\frac{d}{dx}(\alpha f(x) + \beta g(x)) = \alpha\frac{d}{dx}f(x) + \beta\frac{d}{dx}g(x).$$

These examples show that some of the most important objects that
one wants to study are linear transformations on Banach spaces. If the
Banach spaces are finite dimensional as in Example 4, the study of linear
transformations is called linear algebra. If the underlying Banach
spaces are infinite dimensional, as in Examples 5 and 6, the study of linear
transformations is part of a branch of mathematics called functional
analysis.

Problems

1. Which of the following subsets of $C[a,b]$ are vector spaces?

(a) The continuous functions, $f$, which satisfy $f(a) = 1$.

(b) $C^{(1)}[a,b]$.

(c) The continuous functions, $f$, which satisfy $\int_a^b f(x)\,dx = 0$.

(d) The functions $f \in C^{(2)}[a,b]$ that satisfy

$$f''(x) + (2x^2 + 1)f'(x) + (\sin x)f(x) = 0.$$

2. Which of the following subsets of $C(\mathbb{R})$ are vector spaces?

(a) $C_b(\mathbb{R})$, the bounded continuous functions on $\mathbb{R}$.

(b) $C_0(\mathbb{R})$, the continuous functions that go to zero at $\pm\infty$.

(c) The continuous functions that go to 1 at $\pm\infty$.

(d) $C^{(1)}(\mathbb{R})$, the continuously differentiable functions on $\mathbb{R}$.

(e) The continuous functions on $\mathbb{R}$ that vanish at $x = 5$.

(f) The continuous functions on $\mathbb{R}$ that satisfy $|f(x)| \le ce^{x^2}$ for some
$c \in \mathbb{R}$ which can depend on $f$.

3. Let $\|\cdot\|$ be a norm on a vector space $V$. Prove that $\rho(x,y) = \|x - y\|$ is a
metric on $V$.

4. Let $V$ be a vector space with a norm $\|\cdot\|$.

(a) Prove that for all $v \in V$ and $w \in V$,

$$\big|\,\|v\| - \|w\|\,\big| \le \|v - w\|.$$

(b) Suppose that $v_n \to v$ in $V$. Prove that $\|v_n\| \to \|v\|$.

5. Show that every convergent sequence in a normed linear space is a
Cauchy sequence.

6. Prove that $\mathbb{R}^n$ is complete in the Euclidean norm. Hint: show that a sequence
is Cauchy in $\mathbb{R}^n$ if and only if each of the sequences of components
is Cauchy in $\mathbb{R}$.

7. Show that

$$\int_{-\infty}^{\infty} |f(x)|e^{-x^2}\,dx$$

is a norm on the space of bounded continuous functions on $\mathbb{R}$.

8. Two norms, $\|\cdot\|_1$ and $\|\cdot\|_2$, are called equivalent if there are positive
constants, $c$ and $d$, so that

$$c\|v\|_1 \le \|v\|_2 \le d\|v\|_1$$

for all $v \in V$.

(a) Prove that if $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent, then $V$ is complete in
$\|\cdot\|_1$ if and only if $V$ is complete in $\|\cdot\|_2$.

(b) Prove that the $\|\cdot\|_1$ norm and the Euclidean norm $\|\cdot\|_2$ are equivalent
on $\mathbb{R}^n$ by showing that

$$\|x\|_2 \le \|x\|_1 \le n\|x\|_2.$$

(c) Prove that the sup norm and the $L_1$ norm are not equivalent on
$C[a,b]$.

9. Recall from linear algebra that a set of vectors $\{v_i\}_{i=1}^{n}$ is said to be linearly
independent if no linear combination $a_1v_1 + a_2v_2 + \ldots + a_nv_n$ is the zero
vector unless $a_i = 0$ for all $i$. A vector space $V$ is said to have dimension
$N$ if every set of $N$ independent vectors $\{v_i\}_{i=1}^{N}$ spans $V$; that is, every
vector in $V$ can be written as a linear combination of the $v_i$. If $V$ has
dimension $N$ for some $N$, $V$ is said to be finite dimensional.

(a) Show that $\mathbb{R}^n$ has dimension $n$.

(b) Show that $\{1, x, x^2, \ldots, x^n\}$ is an independent set of vectors in
$C[a,b]$ for each $n$.

(c) Prove that $C[a,b]$ is not finite dimensional.

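10. Let $\{x_i\}_{i=1}^{N}$ and $\{y_i\}_{i=1}^{N}$ be real numbers not all zero. Define a quadratic,
$p(\lambda)$, by

$$p(\lambda) = \sum_{i=1}^{N} (x_i + \lambda y_i)^2.$$

Explain why $p(\lambda)$ has either two complex roots or a double real root. Use
this fact to prove the Cauchy-Schwarz inequality

$$\Big|\sum_{i=1}^{N} x_iy_i\Big| \le \Big(\sum_{i=1}^{N} x_i^2\Big)^{1/2}\Big(\sum_{i=1}^{N} y_i^2\Big)^{1/2}. \tag{42}$$

Under what circumstances does one get equality?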


11. Let $c_0$ denote the set of sequences, $\{a_j\}$, of real numbers such that $a_j \to 0$
as $j \to \infty$. Define

$$\|\{a_j\}\|_\infty = \sup_j |a_j|.$$

(a) Explain why $c_0$ is a normed linear space with the norm $\|\cdot\|_\infty$.
(b) Prove that $c_0$ is complete. Hint: since $c_0 \subset \ell_\infty$, we know that any
Cauchy sequence has a limit in $\ell_\infty$.
(c) Show that the set of sequences which are zero after finitely many
terms is dense in $c_0$.
(d) Show that $c_0$ is not dense in $\ell_\infty$.

12. Let $T$ be a linear transformation from a Banach space to itself and suppose
that $T$ is a contraction. What fixed points can $T$ have?

13. Let $A_n$ be the set of $n \times n$ matrices $A = \{a_{ij}\}$. Define

$$\|A\| = \sup_{i,j} |a_{ij}|.$$

(a) Explain why $A_n$ is a vector space.
(b) Prove that $\|\cdot\|$ is a norm.
(c) Prove that $A_n$ is complete in the norm $\|\cdot\|$.
(d) Let $B$ be a fixed element of $A_n$ and define $T(A) = BA$ where $BA$ denotes
matrix multiplication. Show that $T$ is a linear transformation
on $A_n$.

14. For $f \in C^{(2)}[a,b]$, define $\|f\| = \|f\|_\infty + \|f'\|_\infty + \|f''\|_\infty$.

(a) Explain why $C^{(2)}[a,b]$ is a vector space.
(b) Prove that $\|\cdot\|$ is a norm on $C^{(2)}[a,b]$.
(c) Prove that $C^{(2)}[a,b]$ is complete in the norm $\|\cdot\|$.
(d) Prove that $\frac{d^2}{dx^2}$ is a linear transformation from $C^{(2)}[a,b]$ to $C[a,b]$.
(e) Prove that $\{f \in C^{(2)}[a,b] \mid f''(x) = 0\}$ is a vector space. It is called the
kernel of $\frac{d^2}{dx^2}$.
(f) Identify the functions in the kernel.
(g) Is the linear transformation $\frac{d^2}{dx^2}$ one-to-one?

Projects

1. The purpose of this project is to prove that if $f$ is a continuous function
on $[0,1]$, then $\|f\|_p \to \|f\|_\infty$ as $p \to \infty$.

(a) Show that for each $p$, $\|f\|_p \le \|f\|_\infty$.

(b) Explain why $|f(x)|$ is continuous on $[0,1]$. Let $x_0$ be the point where
$|f(x)|$ achieves its maximum. Explain why $|f(x_0)| = \|f\|_\infty$.

(c) Assume that $x_0$ is not one of the endpoints and let $\varepsilon > 0$ be given.
Explain why you can choose a $\gamma > 0$ so that $|f(x)| \ge \|f\|_\infty - \varepsilon$ for
all $x \in [x_0 - \gamma, x_0 + \gamma]$.

(d) Prove that $\int_0^1 |f(x)|^p\,dx \ge 2\gamma(\|f\|_\infty - \varepsilon)^p$ for all $p$.

(e) Show that for $p$ large enough, $\|f\|_\infty \ge \|f\|_p \ge \|f\|_\infty - 2\varepsilon$ and conclude
that $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$.

2. The purpose of this project is to show by example that the method outlined
in Section 5.4 can sometimes be used to solve nonlinear integral
equations. Consider the following integral equation on the whole line $\mathbb{R}$:

$$\psi(x) = \frac{1}{2}\cos x + \frac{1}{2}\int_{x-\frac{1}{4}}^{x+\frac{1}{4}} \sin(x - y)(\psi(y))^2\,dy. \tag{43}$$

As in Section 5.4, we will try to solve the equation iteratively by choosing
a $\psi_0(x)$ and then defining

$$\psi_{n+1}(x) = \frac{1}{2}\cos x + \frac{1}{2}\int_{x-\frac{1}{4}}^{x+\frac{1}{4}} \sin(x - y)(\psi_n(y))^2\,dy.$$

(a) Recall that $C_b(\mathbb{R})$ denotes the space of bounded continuous functions
on $\mathbb{R}$ and that $C_b(\mathbb{R})$ is complete (problem 3 in Section 5.3).
Prove that if $\psi_n \in C_b(\mathbb{R})$, then $\psi_{n+1} \in C_b(\mathbb{R})$. Argue inductively that
$\psi_n \in C_b(\mathbb{R})$ for all $n$ if $\psi_0 \in C_b(\mathbb{R})$.
(b) Suppose $\psi_0 \in C_b(\mathbb{R})$ and $\|\psi_0\|_\infty \le 1$. Prove that $\|\psi_n\|_\infty \le 1$.
(c) Use estimates similar to those in the proof of Theorem 5.4.1 to show
that

$$\|\psi_{n+1} - \psi_n\|_\infty \le \frac{1}{2}\|\psi_n - \psi_{n-1}\|_\infty.$$

Explain why the proof in Theorem 5.4.1 allows us to conclude that
$\psi_n$ converges uniformly to a function $\psi \in C_b(\mathbb{R})$.
(d) Prove that $\|\psi\|_\infty \le 1$ and $\psi$ satisfies (43).
(e) Prove that $\psi$ is the unique function in $C_b(\mathbb{R})$ satisfying $\|\psi\|_\infty \le 1$
which solves (43).
(f) Give a simpler proof using the contraction mapping principle.

3. The purpose of this project is to show that integral equations can sometimes
be used to solve boundary value problems for differential equations.
Let $f$ be a continuous function on $[0,1]$ and define

$$K(x,y) = \begin{cases} y(1 - x) & \text{if } y \le x \\ x(1 - y) & \text{if } y \ge x. \end{cases}$$

(a) Show that $K$ is a continuous function on $[0,1] \times [0,1]$.

(b) Suppose that $\psi$ is a continuous function on $[0,1]$ that satisfies

$$\psi(x) = -\int_0^1 K(x,y)f(y)\,dy + \lambda\int_0^1 K(x,y)\psi(y)\,dy.$$

Prove that $\psi$ is twice continuously differentiable on $[0,1]$ and that $\psi$
satisfies

$$\psi''(x) + \lambda\psi(x) = f(x)$$

and the boundary conditions $\psi(0) = 0 = \psi(1)$. Hint: divide the
region of integration into two parts $0 \le y \le x$ and $x \le y \le 1$ and
use the Fundamental Theorem of Calculus repeatedly.

4. The purpose of this project is to solve the differential equation satisfied
by the extremal function for the Brachistochrone problem. The extremal
$y(x)$ satisfies $y(x)(1 + (y'(x))^2) = C_1$ for some constant $C_1$.

(a) Explain why $C_1 < 0$ and why $y'(x)$ must blow up as $x \searrow 0$. What
does it mean geometrically that $y'(x)$ blows up?

(b) Prove that % =

(c) We introduce a new variable $\theta$ as a parameter and try to find both
$x$ and $y$ in terms of $\theta$. Let $y$ and $\theta$ be related by the equation $y =
C_1(\sin\theta)^2$. Use the chain rule to prove that

$$\frac{dx}{d\theta} = C_1(1 - \cos 2\theta).$$

(d) Find $x(\theta)$ in terms of $C_1$ and a new constant $C_2$.

(e) Show that the constants $C_1$ and $C_2$ can be chosen so that the curve
$(x(\theta), y(\theta))$ passes through the points $(0,0)$ and $(x_1, y_1)$.
(f) Generate the graph of the curve $(x(\theta), y(\theta))$. Why do you think that
the curve which gives the shortest time of descent is so steep near
the origin?

5. The purpose of this project is to introduce some of the computational
issues involved in DNA sequence comparison.

(a) Consider the sequences AGGCTC and AGCTCG drawn from the
DNA alphabet. We use the discrete metric in which a letter opposite
a deletion symbol and a letter opposite nothing are counted
as full mismatches. Show that the minimum distance between the
sequences is 3 if we do not allow deletion symbols to be inserted.
Show that the minimum distance is 2 if we do allow deletion symbols.
(b) Design and implement a computational algorithm for finding the
minimal distance (with no deletion symbols allowed) between two
sequences of length 10 and length 8 constructed from the DNA alphabet.
(c) Design an algorithm which produces random DNA sequences of
length 8 and 10.
(d) Conduct an experiment in which you determine the minimum distance
of 1000 randomly chosen pairs of lengths 10 and 8, respectively.
What fraction has distance 2, distance 3, and so forth? How
likely is it that two randomly chosen pairs have a distance < 3?
(e) If the lengths of the sequences are $N$ and $M$ instead of 10 and 8, estimate
how the number of computational steps involved in finding
the minimal distance grows as $N$ and $M$ get large.
CHAPTER 6

Series of Functions

Before starting our discussion of series, we introduce in Section 6.1 a new
tool for analyzing sequences, the limit superior and the limit inferior. In
Section 6.2 we define what it means for a series to converge and we prove
various tests for convergence. Series of functions are introduced in Section 6.3,
and power series, an important special case, are treated in Section 6.4.
In Section 6.5 we review the basic properties of complex numbers
and extend many of the ideas which we have developed to complex
sequences and series. Finally, in Section 6.6 we give criteria that guarantee
that infinite products converge and use the results to investigate the
distribution of prime numbers.

6.1 Lim sup and Lim inf

Let $\{a_n\}$ be a sequence of real numbers and define for each $N$

$$\overline{s}_N = \sup\{a_n \mid n \ge N\}.$$

As $N$ gets larger, the sup is taken over a smaller set so the sequence of
numbers $\{\overline{s}_N\}$ is monotone decreasing; that is, $\overline{s}_N \ge \overline{s}_{N+1}$. If $\{\overline{s}_N\}$ is
bounded then, by Theorem 2.4.3, $\{\overline{s}_N\}$ converges to a finite number $\overline{s}$.
We define $\overline{s}$ to be the limit superior of the sequence $\{a_n\}$ and write

$$\limsup_{n\to\infty} a_n = \overline{s} = \lim_{N\to\infty} \overline{s}_N.$$

We shall usually write $\limsup a_n$, omitting the subscript $n \to \infty$. If $\{\overline{s}_N\}$
is not bounded, there are only two possibilities since it is monotone decreasing.
Either $\overline{s}_N = \infty$ for all $N$, in which case we say that $\limsup a_n =
\infty$, or $\overline{s}_N \to -\infty$, in which case we say that $\limsup a_n = -\infty$. Similarly,
we define

$$\underline{s}_N = \inf\{a_n \mid n \ge N\}$$

and the limit inferior of the sequence $\{a_n\}$ by

$$\liminf_{n\to\infty} a_n = \underline{s} = \lim_{N\to\infty} \underline{s}_N.$$

A sequence $\{a_n\}$ may or may not have a limit, but it always has a lim sup
and lim inf, though they may equal $\pm\infty$. Notice that $\underline{s}_N \le \overline{s}_N$ for all $N$,
so, by problem 3 in Section 2.4, we always have $\underline{s} \le \overline{s}$.

Example 1 Let $a_n = (-1)^n$. The sequence $\{a_n\}$ certainly does not converge.
However, for each $N$, $\overline{s}_N = 1$ and $\underline{s}_N = -1$. Thus, $\limsup a_n = 1$
and $\liminf a_n = -1$.

Example 2 Consider the sequence $a_n = 2 + (-1)^n(1 + \frac{1}{n})$. The values
are depicted in Figure 6.1.1. For each $N$, $\overline{s}_N = \sup_{n \ge N}\{a_n\}$ is a little
above 3, but the amount above decreases to 0 as $N \to \infty$. Thus, we see
that $\overline{s} = \lim \overline{s}_N = 3$. Similarly, for each $N$, $\underline{s}_N = \inf_{n \ge N}\{a_n\}$ is a little
below 1, but the amount below 1 decreases to 0 as $N \to \infty$. Therefore,
$\underline{s} = \lim \underline{s}_N = 1$.

[Figure 6.1.1: the values $a_n$ plotted for $n = 1, \ldots, 8$.]
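The tail suprema and infima of the sequence $a_n = 2 + (-1)^n(1 + \frac{1}{n})$ from Example 2 can be tabulated directly. The sketch below is an illustration under one stated assumption: the sup and inf over the infinite tail $n \ge N$ are replaced by a max and min over a finite window. For this particular sequence nothing is lost, because the largest tail value occurs at the first even index and the smallest at the first odd index; for a general sequence a finite window only gives an approximation. The names `tail_sup`, `tail_inf`, and `window` are hypothetical.

```python
def a(n):
    """The sequence of Example 2: a_n = 2 + (-1)^n (1 + 1/n)."""
    return 2 + (-1) ** n * (1 + 1 / n)

def tail_sup(N, window=1000):
    """Largest value of a_n for N <= n < N + window."""
    return max(a(n) for n in range(N, N + window))

def tail_inf(N, window=1000):
    """Smallest value of a_n for N <= n < N + window."""
    return min(a(n) for n in range(N, N + window))

# The tail sup decreases toward 3 and the tail inf increases toward 1.
for N in (1, 10, 100, 1000):
    print(N, tail_sup(N), tail_inf(N))
```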

Example 3 Let $\{a_n\}$ be the sequence

$$1, 0, 1, -1, 1, -2, 1, -3, 1, -4, 1, -5, \ldots$$

For each $N$, $\overline{s}_N = 1$ and $\underline{s}_N = -\infty$. Therefore, $\limsup a_n = 1$ and
$\liminf a_n = -\infty$.

We will see later that lim sup and lim inf are very useful. For the moment
we prove a theorem that gives a practical and intuitive characterization
of $\overline{s}$ and $\underline{s}$.

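□ Theorem 6.1.1 Let $\{a_n\}$ be a sequence of real numbers.

(a) If $\overline{s}$ is finite and $\varepsilon > 0$ is given, there exists an $N$ so that $a_n < \overline{s} + \varepsilon$ for
all $n \ge N$, and for each $N$ there exists an $n \ge N$ so that $a_n > \overline{s} - \varepsilon$.
Conversely, if $s$ is a number satisfying these properties, then $s = \overline{s}$.

(b) If $\underline{s}$ is finite and $\varepsilon > 0$ is given, there exists an $N$ so that $a_n > \underline{s} - \varepsilon$ for
all $n \ge N$, and for each $N$ there exists an $n \ge N$ so that $a_n < \underline{s} + \varepsilon$.
Conversely, if $s$ is a number satisfying these properties, then $s = \underline{s}$.

Proof. We will prove (a); the proof of (b) is similar. Let $\varepsilon > 0$ be given.
Since $\overline{s}_N \to \overline{s}$ and $\overline{s}$ is finite, we can choose $N$ so that $\overline{s}_N - \overline{s} < \varepsilon$. That is,
$\sup\{a_n \mid n \ge N\} < \overline{s} + \varepsilon$ so $a_n < \overline{s} + \varepsilon$ for all $n \ge N$. Given $N$, suppose that
there were no $n \ge N$ such that $a_n > \overline{s} - \varepsilon$. Then $\overline{s}_N = \sup\{a_n \mid n \ge N\} \le
\overline{s} - \varepsilon$, which is impossible since $\overline{s}_N$ decreases to $\overline{s}$.
Conversely, suppose that $s$ satisfies the stated properties. Then for
each $\varepsilon > 0$ there is an $N$ so that $\overline{s}_N \le s + \varepsilon$; thus, since $\overline{s}_N$ is monotone
decreasing, $\overline{s} \le s + \varepsilon$. Since $\varepsilon$ is arbitrary, we conclude that $\overline{s} \le s$. On
the other hand, given any $N$, there is an $n \ge N$ such that $a_n > s - \varepsilon$.
This implies that $\overline{s}_N > s - \varepsilon$. Thus $\overline{s} \ge s - \varepsilon$, and since $\varepsilon$ is arbitrary we
conclude that $\overline{s} \ge s$. Therefore, $s = \overline{s}$. □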

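Corollary 6.1.2 A sequence $\{a_n\}$ of real numbers converges to a finite
limit $a$ if and only if

$$-\infty < \limsup a_n \le \liminf a_n < \infty \tag{1}$$

in which case

$$\limsup a_n = a = \liminf a_n.$$

Proof. Suppose that $a_n \to a$. Given $\varepsilon > 0$, there is an $N$ so that $a - \varepsilon <
a_n < a + \varepsilon$ for all $n \ge N$. Thus, for $n \ge N$,

$$a - \varepsilon \le \overline{s}_n \le a + \varepsilon \quad \text{and} \quad a - \varepsilon \le \underline{s}_n \le a + \varepsilon,$$

so

$$a - \varepsilon \le \overline{s} \le a + \varepsilon \quad \text{and} \quad a - \varepsilon \le \underline{s} \le a + \varepsilon.$$

Because $\varepsilon$ is arbitrary, it follows that $\overline{s} = a = \underline{s}$, which implies (1).
Conversely, suppose that (1) holds. Then $\overline{s}$ and $\underline{s}$ are finite and $\underline{s} \ge \overline{s}$.
Since we always have $\underline{s} \le \overline{s}$, we conclude that $\overline{s} = \underline{s}$. Define $a = \overline{s} = \underline{s}$,
and let $\varepsilon > 0$ be given. Then, by part (a) of Theorem 6.1.1, there is an
$N_1$ so that $a_n < a + \varepsilon$ if $n \ge N_1$, and by part (b), there is an $N_2$ so
that $a_n > a - \varepsilon$ if $n \ge N_2$. Choosing $N_3 = \max\{N_1, N_2\}$, we see that
$|a_n - a| < \varepsilon$ if $n \ge N_3$, which proves that $a_n \to a$. □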

Theorem 6.1.1 gives some intuition about lim sup and lim inf. The
terms of the sequence eventually get below any number that is bigger
than $\overline{s}$ and keep coming back above any number that is less than $\overline{s}$. Similarly,
the terms of the sequence are eventually above any number below $\underline{s}$
and keep coming back below any number above $\underline{s}$. We emphasize that a
sequence $\{a_n\}$ may or may not have a limit, but $\limsup a_n$ and $\liminf a_n$
always exist.
The following technical theorem will be used when we consider infinite
series, and its proof illustrates the concepts that we have defined.

□ Theorem 6.1.3 Let $\{a_n\}$ be a sequence of positive numbers. Then

$$\liminf \frac{a_{n+1}}{a_n} \le \liminf \sqrt[n]{a_n} \le \limsup \sqrt[n]{a_n} \le \limsup \frac{a_{n+1}}{a_n}. \tag{2}$$

Proof. The middle inequality holds because the lim inf of any sequence
is less than or equal to the lim sup. We will prove the inequality on the
right. The proof of the one on the left is similar. Define

$$\alpha = \limsup_{n\to\infty} \frac{a_{n+1}}{a_n}.$$

If $\alpha = \infty$, we have nothing to prove. Otherwise, let $\varepsilon > 0$ be given. Then,
by part (a) of Theorem 6.1.1, there is an $N$ such that

$$\frac{a_{n+1}}{a_n} < \alpha + \varepsilon$$

for all $n \ge N$. We can rewrite this as $a_{n+1} < a_n(\alpha + \varepsilon)$. Iterating this
inequality, starting at $N$, gives

$$a_n \le a_N(\alpha + \varepsilon)^{n-N},$$

so

$$\sqrt[n]{a_n} \le \sqrt[n]{a_N(\alpha + \varepsilon)^{-N}}\,(\alpha + \varepsilon). \tag{3}$$

As $n \to \infty$ the right-hand side of (3) converges to $\alpha + \varepsilon$ since the term
with the $n$th root converges to 1. Therefore, by the result in problem 7,
$\limsup \sqrt[n]{a_n} \le \alpha + \varepsilon$. Since $\varepsilon$ was arbitrary, $\limsup \sqrt[n]{a_n} \le \alpha$, which is
what we needed to prove. □

Problems
1. Find the lim sup and lim inf of each of the following sequences:

(a) an = 5 + (-l)n.
(b) an = 5 + (—2)n.
(c) an = 5 + i sin n.
(d) — (3.2)an(l dn), with no 2*

(e) an = sin n. Hint: see project 5 of Chapter 2.

2. Let an — 1 if n = 2k for some positive integer k, and an = ^ otherwise.

(a) Find lim sup an and lim inf an.


(b) Find lim sup .

(c) Find lim sup | an | « .

3. Suppose that $\limsup a_n = c > 0$.

(a) Prove that $\limsup (2a_n + 1) = 2c + 1$.

(b) Prove that $\limsup (a_n^2) = c^2$.

4. Let $\{a_n\}$ be a sequence of real numbers and suppose that $\limsup a_n$ is
finite. Prove that if $c > 0$, we have $\limsup ca_n = c \limsup a_n$.
5. Let $\{a_n\}$ be a sequence of real numbers and suppose that $\limsup a_n$ is
finite. Let $\{c_n\}$ be another sequence and suppose that $c_n \to c$.

(a) Prove that, if $c > 0$, then

$$\limsup c_na_n = c \limsup a_n. \tag{4}$$

(b) Find a counterexample to (4) with $c < 0$.

6. Let $\{a_n\}$ and $\{b_n\}$ be sequences of real numbers. Prove that

$$\limsup (a_n + b_n) \le \limsup a_n + \limsup b_n$$

and give an example which shows that strict inequality can hold.
7. Suppose that $\{a_n\}$ and $\{b_n\}$ are sequences such that $a_n \le b_n$ for all $n$ and
$b_n \to b$. Prove that $\limsup a_n \le b$.
8. Let $\{a_n\}$ be a bounded sequence of real numbers. Prove that $\{a_n\}$ has a
subsequence that converges to $\limsup a_n$.

9. Let $\{a_n\}$ be a bounded sequence of real numbers and let $P$ be the set of
limit points of $\{a_n\}$. Limit points are defined in Section 2.6. Prove that
$\limsup a_n = \sup P$ and $\liminf a_n = \inf P$.

10. Consider the sequence

$$\frac{1}{2}, \frac{1}{3}, \frac{1}{2^2}, \frac{1}{3^2}, \frac{1}{2^3}, \frac{1}{3^3}, \frac{1}{2^4}, \frac{1}{3^4}, \ldots$$

Prove that all four quantities in (2) have different values.

6.2 Series of Real Constants


We begin our study of series of functions with the simplest special case,
series of constants. Let $\{a_j\}$ be a sequence of real numbers. Throughout,
we shall use the summation notation

$$\sum_{j=n}^{m} a_j = a_n + a_{n+1} + \cdots + a_{m-1} + a_m.$$

If $m = \infty$, then infinitely many terms are being added up and the sum is
called an infinite series. We want to determine conditions under which
we can give a reasonable meaning to $\sum_{j=1}^{\infty} a_j$, and we want to prove theorems
that allow us to manipulate infinite series. For each $n$, we define
the partial sum, $S_n$, of the series $\sum_{j=1}^{\infty} a_j$ to be

$$S_n = \sum_{j=1}^{n} a_j.$$

The sum defining $S_n$ is over finitely many terms so there is no doubt
about its meaning.

Definition. If the sequence of partial sums $\{S_n\}$ converges to a finite
number $S$ as $n \to \infty$, we say that the infinite series $\sum_{j=1}^{\infty} a_j$ converges
and define

$$\sum_{j=1}^{\infty} a_j = S.$$

If the sequence of partial sums does not converge we say that the series
diverges.
Example 1 If $\alpha$ is a real number, the series $\sum_{j=0}^{\infty} \alpha^j$ is called the geometric
series. In this case, the index $j$ starts at zero. Because of the special
form of the terms of this series, we can calculate $S_n = \sum_{j=0}^{n} \alpha^j$ explicitly.
If we multiply $S_n$ by $\alpha$, we see that

$$\alpha S_n + 1 = S_n + \alpha^{n+1}.$$

If $\alpha \ne 1$ we can solve for $S_n$ obtaining

$$S_n = \frac{1 - \alpha^{n+1}}{1 - \alpha}.$$

If $|\alpha| < 1$, then $\alpha^{n+1} \to 0$ as $n \to \infty$. Therefore, by Theorem 2.2.6, $S_n$
converges to $\frac{1}{1-\alpha}$ as $n \to \infty$. Thus, by definition, the series converges
and

$$\sum_{j=0}^{\infty} \alpha^j = \frac{1}{1 - \alpha}.$$

To see that this is reasonable, consider the case $\alpha = \frac{1}{2}$. We have shown
that as we take more and more terms, the sum of

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \cdots$$

gets closer and closer to 2. If $|\alpha| > 1$, then one can see from the explicit
formula for $S_n$ that $S_n$ does not converge. In the case $\alpha = 1$, we are
adding up 1's, so the series certainly does not converge. In the case $\alpha =
-1$, the partial sums alternate between 1 and 0 and so do not converge.
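The behavior of the geometric partial sums can be seen numerically. The sketch below compares a term-by-term sum with the closed form $(1 - \alpha^{n+1})/(1 - \alpha)$ for $\alpha = \frac{1}{2}$; the function names `partial_sum` and `closed_form` are chosen only for this illustration.

```python
def partial_sum(alpha, n):
    """S_n = alpha^0 + alpha^1 + ... + alpha^n, added term by term."""
    total, term = 0.0, 1.0
    for _ in range(n + 1):
        total += term
        term *= alpha
    return total

def closed_form(alpha, n):
    """(1 - alpha^(n+1)) / (1 - alpha), valid only for alpha != 1."""
    return (1 - alpha ** (n + 1)) / (1 - alpha)

# For alpha = 1/2 both columns approach 1 / (1 - 1/2) = 2.
for n in (1, 5, 10, 50):
    print(n, partial_sum(0.5, n), closed_form(0.5, n))
```

With $\alpha = -1$ the same `partial_sum` function returns 1 for even $n$ and 0 for odd $n$, matching the last case discussed in the example.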

For simplicity, we will often write $\sum_{j=1}^{\infty} a_j$ as $\sum a_j$, where the indices
are understood.
Since a series converges if and only if the sequence of partial sums
converges, we can use the theorems which we have proven about sequences
to study series.

□ Theorem 6.2.1

(a) A series $\sum a_j$ converges if and only if for each given $\varepsilon > 0$ there
is an $N$ such that

$$\Big|\sum_{j=n}^{m} a_j\Big| \le \varepsilon \quad \text{for all } n \ge N \text{ and } m \ge N. \tag{5}$$

(b) If $\sum a_j$ converges, then $a_j \to 0$ as $j \to \infty$.

(c) If $\sum |a_j|$ converges, then $\sum a_j$ converges.

(d) If $\sum a_j$ converges and $\sum b_j$ converges and $c$ and $d$ are any real numbers,
then $\sum (ca_j + db_j)$ converges and

$$\sum (ca_j + db_j) = c\sum a_j + d\sum b_j. \tag{6}$$

Proof. Let $S_n = \sum_{j=1}^{n} a_j$ and suppose $\varepsilon > 0$ is given. If the series
converges, then, by definition, $\{S_n\}$ converges and is therefore a Cauchy sequence. Thus, for $n$
and $m$ large enough, $|S_m - S_{n-1}| \le \varepsilon$, so

$$\Big|\sum_{j=n}^{m} a_j\Big| \le \varepsilon,$$

which proves (5). Conversely, if (5) holds, then $|S_m - S_{n-1}| \le \varepsilon$ for $n$ and
$m$ large enough, so $\{S_n\}$ is a Cauchy sequence. Thus $\lim_{n\to\infty} S_n$ exists
and by definition the series converges.
To prove (b), let $\varepsilon > 0$ be given and choose $N$ so that (5) holds. If we
choose $m = n$, then

$$|a_n| = \Big|\sum_{j=n}^{n} a_j\Big| \le \varepsilon$$

for $n \ge N$, which implies that $a_j \to 0$ as $j \to \infty$. Part (c) follows immediately
from (a) and the fact that

$$\Big|\sum_{j=n}^{m} a_j\Big| \le \sum_{j=n}^{m} |a_j|.$$

To prove (d), let $S_{a,n}$ and $S_{b,n}$ be the partial sums of $\sum a_j$ and $\sum b_j$
respectively. Let $S_n$ be the $n$th partial sum of $\sum (ca_j + db_j)$. Then,
$S_n = cS_{a,n} + dS_{b,n}$ for each $n$. Since $S_{a,n}$ converges and $S_{b,n}$ converges,
Theorems 2.2.3 and 2.2.4 guarantee that $S_n$ converges and

$$\lim_{n\to\infty} S_n = c\lim_{n\to\infty} S_{a,n} + d\lim_{n\to\infty} S_{b,n}.$$

Thus $\sum (ca_j + db_j)$ converges and (6) holds. □

A series $\sum a_j$ for which $\sum |a_j|$ converges is called absolutely convergent.
If $\sum a_j$ converges but $\sum |a_j|$ does not converge, the series is said to
be conditionally convergent. Part (c) of Theorem 6.2.1 shows that if a series is absolutely
convergent, then it is convergent. The converse is not true. Alternating
series, which are discussed in project 1, are sometimes convergent but
not absolutely convergent.

[Figure 6.2.1: the graph of $f(x) = 1/x^p$ over $[m-1, m]$, with the heights $1/(m-1)^p$ and $1/m^p$ marked.]

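Example 2 We will show that the series $\sum \frac{1}{j^p}$ converges if $p > 1$.
Notice that $\sum_{j=n}^{m} \frac{1}{j^p}$ is a lower sum for the integral of $f(x) = \frac{1}{x^p}$ on the
interval $[n-1, m]$. See Figure 6.2.1. Thus,

$$\sum_{j=n}^{m} \frac{1}{j^p} \le \int_{n-1}^{m} \frac{1}{x^p}\,dx$$

$$= \frac{-1}{p-1}\left(\frac{1}{m^{p-1}} - \frac{1}{(n-1)^{p-1}}\right)$$

$$\le \frac{1}{p-1}\,\frac{2}{(N-1)^{p-1}}$$

if $n \ge N$ and $m \ge N$. Since $p > 1$, the expression on the right can
be made as small as we like by choosing $N$ large. Thus, by part (a) of
Theorem 6.2.1, $\sum \frac{1}{j^p}$ converges.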

Example 3 The series $\sum \frac{1}{j}$, which is called the harmonic series, does
not converge. To see this, group the terms as indicated:

$$1 + \frac{1}{2} + \Big\{\frac{1}{3} + \frac{1}{4}\Big\} + \Big\{\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\Big\} + \Big\{\frac{1}{9} + \cdots + \frac{1}{16}\Big\} + \cdots$$

The terms in each bracket add up to a number greater than $\frac{1}{2}$. Thus the
sequence of partial sums diverges to $\infty$, and thus the harmonic series
does not converge. Notice, however, that the sequence $a_j = \frac{1}{j}$ converges
to zero. This shows that the converse to part (b) of Theorem 6.2.1 is false.
It is not true that $a_j \to 0$ implies that $\sum a_j$ converges. $\sum \frac{1}{j}$ can also be
shown to diverge by using an integral argument like that in Example 2
(problem 2).
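The contrast between the harmonic series and the series $\sum \frac{1}{j^2}$ shows up clearly in the partial sums. In the sketch below, the lower bound $1 + k/2$ for the harmonic partial sum with $2^k$ terms comes from the grouping argument: after the first two terms, each of the $k - 1$ brackets contributes more than $\frac{1}{2}$. The function name `partial_sum` is chosen only for this illustration, and a finite computation can suggest divergence but cannot prove it.

```python
def partial_sum(p, n):
    """The partial sum 1/1^p + 1/2^p + ... + 1/n^p."""
    return sum(1 / j ** p for j in range(1, n + 1))

# Columns: n = 2^k, harmonic partial sum, the lower bound 1 + k/2,
# and the partial sum of 1/j^2, which stays bounded.
for k in (4, 8, 12, 16):
    n = 2 ** k
    print(n, partial_sum(1, n), 1 + k / 2, partial_sum(2, n))
```

The harmonic column keeps growing, slowly, while the last column settles down, in line with Examples 2 and 3.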

□ Theorem 6.2.2 (The Comparison Test) Let $\{a_j\}$, $\{b_j\}$, and $\{c_j\}$ be sequences
of nonnegative numbers such that $a_j \le b_j \le c_j$ for each $j$. Then,

(a) If $\sum c_j$ converges, then $\sum b_j$ converges.

(b) If $\sum a_j$ diverges, then $\sum b_j$ diverges.

Proof. Suppose that $\sum c_j$ converges and let $\varepsilon > 0$ be given. Then, by
part (a) of Theorem 6.2.1, there is an $N$ such that $\sum_{j=n}^{m} c_j \le \varepsilon$ if $n \ge N$
and $m \ge N$. Since $0 \le b_j \le c_j$ for each $j$,

$$\Big|\sum_{j=n}^{m} b_j\Big| = \sum_{j=n}^{m} b_j \le \sum_{j=n}^{m} c_j \le \varepsilon.$$

Therefore, by part (a) of Theorem 6.2.1, $\sum b_j$ converges.
To prove (b), notice that $S_n = \sum_{j=1}^{n} a_j$ is a monotone increasing sequence
because the $a_j$ are nonnegative. Therefore, either $S_n$ converges
to a finite limit or $S_n \to \infty$. Since $\sum a_j$ diverges by assumption, $S_n \to \infty$.
But since $a_j \le b_j$ for each $j$, $S_n \le \sum_{j=1}^{n} b_j$ for each $n$. Thus the partial
sums of $\sum b_j$ diverge to $\infty$. □

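Example 4 Consider the series $\sum \frac{\sin j}{j^2 + 1}$. Since

$$\frac{|\sin j|}{j^2 + 1} \le \frac{1}{j^2}$$

and $\sum \frac{1}{j^2}$ converges, by part (a) of Theorem 6.2.2, $\sum \frac{\sin j}{j^2 + 1}$ is absolutely
convergent. Therefore, by part (c) of Theorem 6.2.1, $\sum \frac{\sin j}{j^2 + 1}$ is convergent.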

A very important point to remember is that the convergence or diver¬


gence of a series is not affected by changing finitely many of its terms.
This is shown by part (a) of Theorem 6.2.1 since criterion (5) is required to
hold only for n and m larger than some large N. This fact is often useful
6.2 Series of Real Constants 233

when applying the comparison test because it means that the criterion
$a_j \le b_j \le c_j$ need only hold for all $j$ bigger than some finite number $J$.
Though convergence depends only on the tail of the series, the sum of
the series, if it converges, depends on all of the terms.

□ Theorem 6.2.3 (The Root Test) Set $\alpha = \limsup |a_j|^{1/j}$. Then

(a) if $\alpha < 1$, the series $\sum a_j$ converges absolutely.

(b) if $\alpha > 1$, the series $\sum a_j$ diverges.

Proof. If $\alpha < 1$, choose a number $\beta$ so that $\alpha < \beta < 1$. By Theorem 6.1.1
we can find a $J$ so that $|a_j|^{1/j} \le \beta$ for all $j \ge J$. Thus, $|a_j| \le \beta^j$ for those $j$
and $\sum a_j$ converges absolutely by comparison with the geometric series.
If $\alpha > 1$, we choose $\beta$ so that $\alpha > \beta > 1$. Again, by Theorem 6.1.1, we
can find infinitely many values of $j$ so that $|a_j|^{1/j} \ge \beta$. Since $|a_j| \ge 1$ for
those values of $j$, the sequence $\{a_j\}$ does not converge to zero. Therefore,
by part (b) of Theorem 6.2.1, $\sum a_j$ cannot converge. □

□ Theorem 6.2.4 (The Ratio Test) Let $\sum a_j$ be a series of nonzero terms. Then,

(a) if $\limsup \frac{|a_{j+1}|}{|a_j|} < 1$, the series converges absolutely.

(b) if $\liminf \frac{|a_{j+1}|}{|a_j|} > 1$, the series diverges.

Proof. Set $\alpha = \limsup |a_j|^{1/j}$. By Theorem 6.1.3,

$$\liminf \frac{|a_{j+1}|}{|a_j|} \le \alpha \le \limsup \frac{|a_{j+1}|}{|a_j|}.$$

Thus, if $\limsup \frac{|a_{j+1}|}{|a_j|} < 1$, the series converges by part (a) of Theorem 6.2.3. And, if $\liminf \frac{|a_{j+1}|}{|a_j|} > 1$, the series diverges by part (b) of Theorem 6.2.3. □

Neither the root test nor the ratio test gives information if the respective limits equal 1. We remark that in many cases the sequence $\{|a_{j+1}|/|a_j|\}$ has
a limit, in which case the four limits in Theorem 6.1.3 are all the same.

Example 5 Consider the series $\sum \frac{1}{j!}$. Since

$$\frac{|a_{j+1}|}{|a_j|} = \frac{1}{j+1} \to 0$$

as $j \to \infty$, the ratio test proves that the series converges.
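To see the ratio test at work on a concrete case, the sketch below takes, as an illustrative assumption, the series with terms $a_j = 1/j!$, and computes the successive ratios exactly. The function name `ratios` is hypothetical.

```python
import math
from fractions import Fraction

def ratios(term, n):
    """Return |a_{j+1}| / |a_j| for j = 0..n-1, computed exactly."""
    return [abs(term(j + 1)) / abs(term(j)) for j in range(n)]

def a(j):
    """Illustrative terms a_j = 1/j!."""
    return Fraction(1, math.factorial(j))

r = ratios(a, 10)
partial = float(sum(a(j) for j in range(20)))
print([str(q) for q in r], partial)
```

The ratios are exactly $1/(j+1)$, so their lim sup is 0, and the partial sums settle quickly near $e$.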

We defined the sum of a series to be the limit of the sequence of partial sums if the limit exists. If we rearrange the series, that is if we add
up the terms in a different order, that will certainly change the sequence
of partial sums. Will the new partial sums converge? To the same limit?

□ Theorem 6.2.5 Suppose that $\sum a_j$ converges absolutely and let $f$ be a
one-to-one function from $\mathbb{N}$ onto $\mathbb{N}$. Then, the series $\sum a_{f(j)}$ converges
absolutely and $\sum a_{f(j)} = \sum a_j$.

Proof. Define $S_n = \sum_{j=0}^{n} a_j$, $S = \lim_{n\to\infty} S_n$, and $T_n = \sum_{j=0}^{n} a_{f(j)}$. Let
$\varepsilon > 0$ be given. Since $\sum a_j$ converges absolutely, we can choose $N_1$ so
that

$$\sum_{j=N_1+1}^{\infty} |a_j| \le \varepsilon/2. \qquad (7)$$

Let $J = \max\{j_0, j_1, \ldots, j_{N_1}\}$, where $j_k$ is the natural number such that
$f(j_k) = k$. Such integers exist because $f$ is onto. Now, choose $N = \max\{J + 1, N_1\}$. By the triangle inequality,

$$|T_n - S| \le |T_n - S_n| + |S_n - S|$$

for each $n$. If $n \ge N$, then $n \ge N_1$, so the second term on the right is
$\le \varepsilon/2$ by (7). Further, since $n \ge J + 1$, the partial sum $\sum_{j=0}^{n} a_{f(j)}$ contains
every term in the sum $\sum_{j=0}^{N_1} a_j$. Thus, the difference $T_n - S_n$ contains
only terms $a_j$ with $j \ge N_1 + 1$, and therefore $|T_n - S_n| \le \varepsilon/2$, again by (7).
Thus,

$$|T_n - S| \le \varepsilon \quad \text{for } n \ge N,$$

so $T_n \to S$; that is, $\sum a_{f(j)} = \sum a_j$. The same proof, using $|a_j|$ and
$|a_{f(j)}|$ in place of $a_j$ and $a_{f(j)}$, shows that $\sum |a_{f(j)}| = \sum |a_j|$, so $\sum a_{f(j)}$
converges absolutely. □

It might seem that Theorem 6.2.5 is just a technical exercise, but it is
not. It can be shown that if $\sum a_j$ is a conditionally convergent series and
$\alpha$ is any real number, then there is a rearrangement $f$ so that $\sum a_{f(j)} = \alpha$.
See problem 16. In other words, one can rearrange a conditionally convergent series to get any sum one likes! More important, in some situations a sum of infinitely many terms may arise with no order specified.
If the sum is absolutely convergent, then the order doesn't matter. Consider the question of whether one can multiply out two series:

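$$\left(\sum_{j=0}^{\infty} a_j\right)\left(\sum_{k=0}^{\infty} b_k\right) = \sum_{j,k} a_j b_k \qquad (8)$$

How are we to understand the double sum on the right? We could mean

$$\sum_{j=0}^{\infty}\left(\sum_{k=0}^{\infty} a_j b_k\right) \quad\text{or}\quad \sum_{k=0}^{\infty}\left(\sum_{j=0}^{\infty} a_j b_k\right) \quad\text{or}\quad \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n} a_k b_{n-k}\right).$$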

Each of these expressions uses a different ordering of the pairs of nonnegative integers $(j, k)$. Theorem 6.2.5 tells us that if we can show that
$\sum a_j b_k$ is absolutely convergent in any ordering, then it is absolutely convergent in all orderings, and most important, the sum $\sum a_j b_k$ is the same
in all orderings.

□ Theorem 6.2.6 Suppose that $\sum a_j$ and $\sum b_k$ are absolutely convergent.
Then $\sum a_j b_k$ is absolutely convergent and (8) holds.

Proof. Let $A = \sum |a_j|$ and $B = \sum |b_k|$. Consider the following subsequence of the partial sums of the absolute values of the terms $a_j b_k$ in a
special ordering,

$$S_N = \sum_{k=0}^{N}\left(\sum_{j=0}^{N} |a_j||b_k|\right) = \sum_{k=0}^{N} |b_k| \left(\sum_{j=0}^{N} |a_j|\right) = \left(\sum_{j=0}^{N} |a_j|\right)\left(\sum_{k=0}^{N} |b_k|\right) \le AB.$$

Thus, this subsequence of partial sums, $\{S_N\}$, is bounded and since it
is monotone increasing, it converges by Theorem 2.4.3. By problem 9 of

Section 2.6, the sequence of partial sums converges. Thus, in this ordering, $\sum a_j b_k$ is absolutely convergent. By Theorem 6.2.5, $\sum a_j b_k$ is absolutely convergent in all orderings and $\sum a_j b_k$ is the same in all orderings.
So, using the usual rules of arithmetic,

$$\left(\sum_{j=0}^{N} a_j\right)\left(\sum_{k=0}^{N} b_k\right) = \sum_{k=0}^{N}\left(\sum_{j=0}^{N} a_j b_k\right).$$

Taking the limit of both sides as $N \to \infty$ yields (8). □

One can use this theorem to show that $e^x e^y = e^{x+y}$ by multiplying
out the series for the terms on the left and regrouping (problem 12 of
Section 6.5). We remark that the conclusions of Theorem 6.2.5 are still
true if only one of the two series $\sum a_j$ and $\sum b_k$ is absolutely convergent,
but the proof is harder.
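The third ordering of the double sum (grouping the pairs $(j,k)$ by $n = j + k$) is easy to experiment with. The sketch below is an illustration only: it multiplies the geometric series $\sum x^j$ by itself at the arbitrarily chosen point $x = 0.5$, truncated at 60 terms, and compares the product of the sums with the sum of the regrouped products. The name `cauchy_product` is hypothetical.

```python
def cauchy_product(a, b):
    """Coefficients c_n = sum_{k=0}^n a_k b_{n-k} for two coefficient lists."""
    n = min(len(a), len(b))
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(n)]

x = 0.5
N = 60
a = [x**j for j in range(N)]          # terms of the geometric series
c = cauchy_product(a, a)              # c_n = (n + 1) x^n
product_of_sums = sum(a) ** 2
sum_of_products = sum(c)
print(product_of_sums, sum_of_products, 1 / (1 - x) ** 2)
```

Both numbers agree with $1/(1-x)^2 = 4$ to within rounding and truncation error, as absolute convergence predicts.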

Problems

1. Determine whether the following series converge or diverge:

2. Use an integral argument similar to that in Example 2 to show that the
harmonic series diverges.
3. How many terms of the series T do we have to add up to be sure we
are within $10^{-4}$ of the sum? Hint: estimate the tail of the series by using
integrals, as we did in Example 2.
4. Consider the series V00 ,-^Tr-pr.

(a) Prove that the series converges by the comparison test.


(b) What information does the ratio test give?
(c) Use the formula = f to compute the partial sums. Show
explicitly that the partial sums converge.
5. Consider the series

Show that the ratio test gives no information but that the root test and the
comparison test show convergence.

6. Show that the series


j+1
O' + 2)1010

diverges. What happens if you calculate the first few partial sums on
your hand calculator?

7. Compute as many terms of Y U— as necessary to provide strong nu¬


merical evidence that the series converges. See also project 1.

8. Show that the series YJLi fjzj converges.

9. Show that the series Y^Li s*n (j)2 converges. Hint: use the Mean Value
Theorem.

10. Establish the convergence or divergence of Y'jLiln (1 + j). Hint: use the
definition of ln x in Example 2 of Section 4.3.
hi l\Sj
11. Establish the convergence or divergence of Y’jLi ~—7T~-
12. Prove that if the sequence $\{b_j\}$ is bounded and $\sum |a_j|$ converges, then
$\sum a_j b_j$ converges.

13. Prove that if aj > 0 for all j and Y aj converges, then Y a) converges.
14. Suppose that aj > 0 and that Y aj converges.

(a) Show by example that it is not necessarily true that Y \[^j con-
verges.

(b) Show that Y j' converges. Hint: use the Cauchy-Schwarz in¬
equality.

15. Suppose that aj > 0 and that Y aj diverges. Prove that Y diverges.
Hint: first show that if it converges, then aj —> 0.

16. Consider the series Y — • Since the signs alternate and the absolute
values of the terms decrease and converge to zero, this series satisfies the
hypotheses of the Alternating Series Theorem in project 1. Therefore, it
converges. However, it is conditionally convergent since the harmonic
series diverges.

(a) Prove that the sum of the positive terms diverges to infinity. Prove
that the sum of the absolute values of the negative terms diverges
to infinity.
(b) Let a be a given real number. Rearrange the series by choosing only
positive terms, starting at the beginning of the series, until the sum
is greater than a. Choose as many negative terms as needed to bring
the sum below a. Continue in this manner and use the fact that the
terms in the series converge to zero to show that this can be done in
such a way that the sequence of partial sums converges to a.
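The procedure in part (b) can be simulated. The sketch below assumes, for illustration, that the series is the alternating harmonic series with the sign convention $\sum_{j \ge 1} (-1)^{j+1}/j$ (positive terms $1, 1/3, 1/5, \ldots$ and negative terms $-1/2, -1/4, \ldots$); the function name and the number of terms are arbitrary. It is a numerical experiment, not a substitute for the proof.

```python
def rearranged_partial_sums(target, n_terms):
    """Greedy rearrangement of sum_{j>=1} (-1)^(j+1)/j aiming at `target`."""
    pos = 1   # next odd denominator: positive term 1/pos
    neg = 2   # next even denominator: negative term -1/neg
    total, sums = 0.0, []
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / pos    # take the next unused positive term
            pos += 2
        else:
            total -= 1.0 / neg    # take the next unused negative term
            neg += 2
        sums.append(total)
    return sums

for target in (2.0, -0.5):
    print(target, rearranged_partial_sums(target, 100000)[-1])
```

Each term of the original series is used exactly once, and the partial sums stay within the size of the most recently used term of the target once it has first been crossed.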

6.3 The Weierstrass M-test

Suppose that we have a sequence of functions, $\{f_j(x)\}$, and we try to
form the infinite sum

$$f(x) = \sum_{j=1}^{\infty} f_j(x). \qquad (9)$$

In general, since each $f_j$ depends on $x$, the sum will depend on $x$ when
it exists. Suppose that the interval $[a, b]$ is in the domain of all of the
functions $f_j$. We say that the series of functions $\sum f_j$ converges to $f$
on $[a, b]$ if, for each $x \in [a, b]$, the series of numbers $\sum f_j(x)$ converges to
$f(x)$. By the definition of convergence, this just means that the sequence
of partial sums

$$S_n(x) = \sum_{j=1}^{n} f_j(x) \qquad (10)$$

converges pointwise to $f(x)$ on $[a, b]$. We have already seen that pointwise convergence does not give us much control over the limiting function. Thus, we want conditions which guarantee that the partial sums
$S_n(x)$ converge uniformly to $f(x)$, in which case we say that series (9)
converges uniformly.

□ Theorem 6.3.1 (The Weierstrass M-test) Let $\{f_j(x)\}$ be a sequence of
functions defined on a set $E \subseteq \mathbb{R}$. Suppose that for each $j$ there is a
constant $M_j$ such that $|f_j(x)| \le M_j$ for all $x \in E$ and that $\sum M_j$ converges.
Then the partial sums $S_n(x)$ converge uniformly on $E$ to a function $f(x)$. If each $f_j$ is continuous on $E$,
then $f$ is continuous on $E$.

Proof. Let $\varepsilon > 0$ be given. Since $\sum M_j$ converges, we can choose an $N$
so that $\sum_{j=n+1}^{m} M_j \le \varepsilon$ if $n \ge N$ and $m \ge N$. We estimate

$$|S_m(x) - S_n(x)| = \left|\sum_{j=n+1}^{m} f_j(x)\right| \le \sum_{j=n+1}^{m} |f_j(x)| \le \sum_{j=n+1}^{m} M_j \le \varepsilon.$$

Thus, for each $x \in E$, $S_n(x)$ is a Cauchy sequence of numbers and therefore converges to a limit $f(x)$, which is, by definition, the sum of the
series. Letting $m \to \infty$ in the above inequality, we find

$$|f(x) - S_n(x)| \le \varepsilon$$

for each $x \in E$ if $n \ge N$. Thus, the series converges uniformly. Finally,
suppose that each $f_j$ is continuous on $E$. Then, by Theorem 3.1.1(a), $S_n$
is continuous for each $n$. Since $S_n \to f$ uniformly, we conclude from
Theorem 5.2.1 that $f$ is continuous on $E$. □

□ Theorem 6.3.2 Let $\{f_j(x)\}$ be a sequence of continuous functions defined on a finite interval $[a,b]$. Suppose that the series $\sum_{j=1}^{\infty} f_j(x)$ converges uniformly to $f(x)$ on $[a, b]$. Then for each $x \in [a, b]$

$$\int_a^x f(t)\,dt = \sum_{j=1}^{\infty} \int_a^x f_j(t)\,dt \qquad (11)$$

and the series on the right converges uniformly in $x$.

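Proof. Since the series $\sum f_j$ converges uniformly to $f(x)$, we can
choose an $N$ so that $\left|\sum_{j=1}^{n} f_j(x) - f(x)\right| \le \varepsilon/(b - a)$ for all $x \in [a, b]$ if
$n \ge N$. Thus,

$$\left|\sum_{j=1}^{n} \int_a^x f_j(t)\,dt - \int_a^x f(t)\,dt\right| \le \int_a^x \left|\sum_{j=1}^{n} f_j(t) - f(t)\right| dt \le (x - a)\frac{\varepsilon}{b - a} \le \varepsilon$$

for all $x \in [a, b]$. Thus the right-hand side of (11) converges uniformly to
the left-hand side. □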

Theorem 6.3.2 shows that if a series of continuous functions converges uniformly, then we can integrate it term by term. The following
theorem shows that under reasonable hypotheses we can differentiate a
series term by term.

□ Theorem 6.3.3 Let $\{f_j(x)\}$ be a sequence of continuously differentiable
functions defined on a finite or infinite interval $[a, b]$. Suppose that the
sum $\sum f_j(x)$ converges uniformly to $f(x)$ on $[a, b]$ and that $\sum f_j'(x)$ converges uniformly on $[a, b]$. Then, $f$ is continuously differentiable on $[a, b]$
and

$$f'(x) = \sum_{j=1}^{\infty} f_j'(x). \qquad (12)$$

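Proof. As above, define $S_n(x) = \sum_{j=1}^{n} f_j(x)$. Then, by hypothesis,
$S_n(x) \to f(x)$ uniformly and $S_n'$ converges uniformly. By Theorem 5.2.3,
$f(x)$ is continuously differentiable and $f'(x) = \lim S_n'(x)$. Since $S_n'(x) = \sum_{j=1}^{n} f_j'(x)$,

$$f'(x) = \lim_{n\to\infty} S_n'(x) = \sum_{j=1}^{\infty} f_j'(x). \qquad □$$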

Although the hypotheses of Theorems 6.3.2 and 6.3.3 do not mention the Weierstrass M-test, the uniform convergence in the hypotheses
is usually proven in practice by using Theorem 6.2.1 and the M-test. We
note that the theorems in this section are easy to prove because we have
done the hard work in Sections 5.2 and 5.3 already.

Example 1 Let $\{a_n\}$ be a sequence such that $|a_n| \le C/n^p$ where $p > 1$.
Define a function $f(x)$ by

$$f(x) = \sum_{n=1}^{\infty} a_n \sin nx.$$

Since $|a_n \sin nx| \le C/n^p$, the series converges uniformly by the Weierstrass M-test. Thus, $f$ is well defined and continuous on $\mathbb{R}$ since $a_n \sin nx$
is continuous for each $n$. By Theorem 6.3.2, we can integrate $f$ by integrating the series term by term. So, for example,

$$\int_0^{2\pi} f(x)\,dx = \sum_{n=1}^{\infty} a_n \int_0^{2\pi} \sin nx\,dx = 0.$$

Suppose that $p > 2$. Then, if we differentiate the series term by term, we
obtain

$$\sum_{n=1}^{\infty} n a_n \cos nx.$$

Since $|n a_n \cos nx| \le C/n^{p-1}$, this series converges uniformly by the M-test because $p - 1 > 1$. Thus, by Theorem 6.3.3, $f$ is continuously differentiable and

$$f'(x) = \sum_{n=1}^{\infty} n a_n \cos nx.$$

Series like these are called Fourier series. We study Fourier series in
Chapter 9.
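The uniform bound supplied by the M-test can be observed numerically. The sketch below assumes the particular coefficients $a_n = 1/n^p$ with $p = 2$ (so $C = 1$), and compares the largest gap between two partial sums over a grid of points with the corresponding tail of $\sum M_n$. The function names, the grid, and the cutoffs are illustrative choices, not taken from the text.

```python
import math

def partial_sum(x, n_terms, p):
    """S_N(x) = sum_{n=1}^{N} sin(n x) / n^p, i.e. a_n = 1/n^p (so C = 1)."""
    return sum(math.sin(n * x) / n**p for n in range(1, n_terms + 1))

def tail_bound(n_from, n_to, p):
    """Sum of M_n = 1/n^p for n_from < n <= n_to, the M-test majorant."""
    return sum(1.0 / n**p for n in range(n_from + 1, n_to + 1))

p, N, M = 2, 10, 400
grid = [2 * math.pi * k / 200 for k in range(201)]
# Largest observed |S_M(x) - S_N(x)| over the grid; the bound does not depend on x.
worst = max(abs(partial_sum(x, M, p) - partial_sum(x, N, p)) for x in grid)
bound = tail_bound(N, M, p)
print(worst, bound)
```

The same bound holds at every point of the grid, which is the content of uniform convergence: the estimate does not depend on $x$.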

Example 2 We will use the function $g$ defined in problem 13 of Section
4.1 to construct a continuous function on $\mathbb{R}$ that is nowhere differentiable.
Note that $g$ is continuous on $\mathbb{R}$, bounded by 1, and satisfies

$$|g(x) - g(y)| \le |x - y| \qquad (13)$$

for all $x$ and $y$ in $\mathbb{R}$. Equality holds in (13) if there is no integer strictly
between $x$ and $y$. Define

$$f(x) = \sum_{j=0}^{\infty} \left(\frac{3}{4}\right)^j g(4^j x).$$

The $j$th term in the sum is a continuous function which is bounded by
$\left(\frac{3}{4}\right)^j$. Thus, by the Weierstrass M-test, $f$ is a continuous function on $\mathbb{R}$.
Let $x \in \mathbb{R}$ be given. We shall show that $f$ is not differentiable at $x$ by
exhibiting a sequence $\{h_n\}$ with $h_n \to 0$ such that

$$\left|\frac{f(x + h_n) - f(x)}{h_n}\right| \to \infty.$$

Define $h_n = \pm\frac{1}{2}4^{-n}$, where we choose the plus or minus sign for each
$n$ so that there is no integer strictly between $4^n x$ and $4^n x + 4^n h_n$. Fix $n$.
Then,

$$\left|\frac{g(4^j(x + h_n)) - g(4^j x)}{h_n}\right| \;\begin{cases} = 0, & \text{if } j > n \\ = 4^n, & \text{if } j = n \\ \le 4^j, & \text{if } 0 \le j \le n - 1. \end{cases}$$

If $j > n$, then $4^j(x + h_n)$ and $4^j x$ differ by a multiple of 2, so the quotient is zero by the periodicity of $g$. If $j \le n - 1$, the inequality follows
from (13), and if $j = n$, the quotient equals $\pm 4^n$ because equality holds
in (13) if there is no integer between $x$ and $y$. We can now compute that

$$\left|\frac{f(x + h_n) - f(x)}{h_n}\right| = \left|\sum_{j=0}^{\infty} \left(\frac{3}{4}\right)^j \frac{g(4^j(x + h_n)) - g(4^j x)}{h_n}\right| \qquad (14)$$

$$= \left|\sum_{j=0}^{n} \left(\frac{3}{4}\right)^j \frac{g(4^j(x + h_n)) - g(4^j x)}{h_n}\right| \qquad (15)$$

$$\ge 3^n - \left|\sum_{j=0}^{n-1} \left(\frac{3}{4}\right)^j \frac{g(4^j(x + h_n)) - g(4^j x)}{h_n}\right| \qquad (16)$$

$$\ge 3^n - \sum_{j=0}^{n-1} 3^j \qquad (17)$$

$$= \frac{1}{2}(3^n + 1). \qquad (18)$$

Thus, the difference quotient for $f$ diverges as $n \to \infty$ and $h_n \to 0$. Since
$x$ was chosen arbitrarily, $f$ is not differentiable anywhere. In going from
(15) to (16) we used the inequality $|a + b| \ge |a| - |b|$. In going from (17)
to (18) we used the explicit formula for the partial sum of the geometric
series.
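The growth of the difference quotients can be checked exactly with rational arithmetic. The sketch below rests on an assumption: since problem 13 of Section 4.1 is not reproduced here, $g$ is taken to be $g(t) = |t|$ on $[-1, 1]$ extended with period 2, which is continuous, bounded by 1, and satisfies (13) with equality when no integer lies strictly between the two points. All function names are illustrative.

```python
from fractions import Fraction
from math import floor

def g(t):
    """Assumed g: g(t) = |t| on [-1, 1], extended with period 2."""
    r = t % 2                      # r lies in [0, 2)
    return r if r <= 1 else 2 - r

def f_partial(x, terms):
    """Partial sum of sum_j (3/4)^j g(4^j x) with `terms` terms."""
    return sum(Fraction(3, 4)**j * g(4**j * x) for j in range(terms))

def integer_strictly_between(lo, hi):
    """True if some integer k satisfies lo < k < hi."""
    return floor(lo) + 1 < hi

def difference_quotient(x, n):
    """Quotient (f(x + h_n) - f(x)) / h_n with h_n = +-(1/2) 4^(-n)."""
    h = Fraction(1, 2 * 4**n)
    a = 4**n * x
    if integer_strictly_between(a, a + Fraction(1, 2)):
        h = -h                     # switch sign to avoid crossing an integer
    # Terms with j > n cancel exactly, so n + 1 terms of the series suffice.
    return (f_partial(x + h, n + 1) - f_partial(x, n + 1)) / h

x = Fraction(1, 3)
for n in range(1, 8):
    print(n, abs(difference_quotient(x, n)), Fraction(3**n + 1, 2))
```

For each $n$ the absolute value of the quotient is at least $\frac{1}{2}(3^n + 1)$, in agreement with (18).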

Problems

1. Show that the series $\sum_{j=0}^{\infty} x^j$ converges uniformly in the interval $[-\beta, \beta]$
if $|\beta| < 1$.
2. (a) Show that the series $\sum_{j=0}^{\infty} e^{-jx} x^j$ converges uniformly on $[0, \infty)$.
Hint: how large can $xe^{-x}$ be for $x \ge 0$?
(b) Compute the sum of the series.

3. (a) Prove that $\sum_{j=0}^{\infty} x^j$ is differentiable on $(-1,1)$ and

$$\frac{d}{dx} \sum_{j=0}^{\infty} x^j = \sum_{j=0}^{\infty} (j + 1)x^j.$$

(b) Use the fact that $\sum_{j=0}^{\infty} x^j = \frac{1}{1-x}$ on $(-1,1)$ to find a formula for
$\sum_{j=0}^{\infty} (j + 1)x^j$.

(c) Use this to calculate exactly.



4. Find a formula for y if x is in (-1,1).

5. (a) Show that the series

$$f(x) = x + \sum_{j=1}^{\infty} x(1 - x)^j$$

converges for every x in [0,1].

(b) What is/(x)?

(c) Is the convergence uniform on [0,1]?

6. (a) Show that the series

$$f(x) = \sum_{j=0}^{\infty} \frac{x^j}{j!}$$

converges for every x in R.

(b) Prove that / is differentiable and /'(x) = /(x) for all x.

7. Let /(x) be defined by

°° i

f{x) m

(a) Prove that / is a well-defined, continuous function on the whole


real line.

(b) Prove that / is continuously differentiable and find a series repre¬


sentation for /'.

8. Let $N$ be a positive integer and suppose that $p > N$. Let $\{a_j\}$ be a sequence of numbers satisfying $|a_j| \le C/j^p$ for some constant $C$. Prove
that

$$f(x) = \sum_{j=1}^{\infty} a_j \sin(2\pi j x)$$

is $N - 1$ times continuously differentiable.

9. Suppose that $p > 1$ and that $\{a_j\}_{j=0}^{\infty}$ is a sequence of numbers satisfying
$|a_j| \le C/j^p$ for $j > 0$ for some constant $C$. Define $f$ by

$$f(x) = \sum_{j=0}^{\infty} a_j \cos(2\pi j x).$$

Compute $\int_0^1 f(x)\,dx$.

10. Let {a.,} be a bounded sequence and, for t > 0, define

OO

/(*) = aie 3t2-

Prove that / is infinitely often continuously differentiable for t > 0. Eval¬


uate / explicitly in the case where aj = 1 for all j.

11. (a) Show that the series

converges uniformly on [0, 27r].


(b) Prove that / is uniformly continuous on [0, 2-k].
(c) Let $\varepsilon = 10^{-3}$. Find a $\delta > 0$ so that $|x - y| \le \delta$ implies that $|f(x) - f(y)| \le \varepsilon$. Hint: use the Mean Value Theorem.
12. Prove that

defines a continuous function for $x > 1$. The function $\zeta(x)$ is called the
Riemann zeta function.

13. Let $Q$ be a rectangle $[a, b] \times [c, d]$ in the plane. Let $f_j(x, y)$ be a sequence
of continuous functions on $Q$ that satisfy $|f_j(x, y)| \le M_j$ for all $(x, y) \in Q$.
Suppose that $\sum_{j=0}^{\infty} M_j < \infty$. Prove that

OO

is a well-defined continuous function on Q.

14. Let $V$ be a Banach space with norm $\|\cdot\|$. Let $\{x_i\}$ be a sequence of
elements of $V$ such that $\sum_{i=0}^{\infty} \|x_i\| < \infty$. Prove that $\sum x_i$ is a well-defined element of $V$.

15. Prove that

is a continuous function on R that goes to zero as x —>■ ±oo.



6.4 Power Series

A special class of series of functions arises naturally. In Section 4.3 we
introduced the Taylor polynomials,

$$T^{(n)}(x, x_0) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n$$

as approximations to a given function $f(x)$ near a point $x_0$. Recall that
$f$ must be $n$ times differentiable at $x_0$ for $T^{(n)}(x, x_0)$ to be well defined.
If $f$ is $n + 1$ times continuously differentiable near $x_0$, Taylor's theorem
gives us an error estimate for $|f(x) - T^{(n)}(x, x_0)|$, and if $f$ is infinitely
often continuously differentiable then the Taylor polynomials exist for
all $n$. The $n$th Taylor polynomial $T^{(n)}$ is the partial sum of the infinite
series of functions

$$\sum_{j=0}^{\infty} \frac{f^{(j)}(x_0)}{j!}(x - x_0)^j.$$

This raises the natural question of whether the infinite series equals the
function itself, that is, whether

$$f(x) = \sum_{j=0}^{\infty} \frac{f^{(j)}(x_0)}{j!}(x - x_0)^j.$$

The sum on the right is called the Taylor series of the function $f$. In the
special case when $x_0 = 0$ the series is called the Maclaurin series for
$f$. There are really two separate important questions here. Where does
a series of the form $\sum a_j(x - x_0)^j$ converge? Such a series is called a
power series. And if the Taylor series of a function $f$ converges, does it
converge to $f(x)$? We begin with the first question, which has a straightforward answer.
Let $\rho = \limsup |a_j|^{1/j}$ and define $R = 1/\rho$ if $\rho$ is finite and nonzero. If
$\rho = 0$, we define $R = \infty$, and if $\rho = \infty$, we define $R = 0$. $R$ is called the
radius of convergence of the series

$$\sum_{j=0}^{\infty} a_j (x - x_0)^j, \qquad (19)$$

a name which is justified by the following theorem.



□ Theorem 6.4.1 Let $R$ be the radius of convergence of series (19). Then
the series converges for all $x$ in the open interval $(x_0 - R, x_0 + R)$ and
diverges for all $x$ outside the closed interval $[x_0 - R, x_0 + R]$. For each
$0 < r < R$, the series converges uniformly on the interval $[x_0 - r, x_0 + r]$.

Proof. Suppose that $R$ is finite and nonzero. Then,

$$\limsup \left(|a_j||x - x_0|^j\right)^{1/j} = \limsup \left(|x - x_0||a_j|^{1/j}\right) \qquad (20)$$

$$= |x - x_0| \limsup |a_j|^{1/j} \qquad (21)$$

$$= |x - x_0|/R. \qquad (22)$$

In the second step we used problem 4 of Section 6.1. Thus, if $|x - x_0| < R$, (19) converges by the root test (Theorem 6.2.3), and if $|x - x_0| > R$,
(19) diverges by the root test. If $R = \infty$, then (21) is zero, so the series
converges for all $x$ by the root test. Finally, if $R = 0$, the series diverges
for $x \ne x_0$ by the root test.
It remains to show the uniform convergence. Suppose that $r < R$,
and choose $\gamma$ so that $\frac{r}{R} < \gamma < 1$. If $|x - x_0| \le r$, then by (22) we have

$$\limsup \left(|a_j||x - x_0|^j\right)^{1/j} \le \frac{r}{R} < \gamma.$$

By Theorem 6.1.1, we can choose a $J$ so that

$$\left(|a_j||x - x_0|^j\right)^{1/j} \le \gamma$$

for all $j \ge J$. Now choose $M_j = \gamma^j$ for $j \ge J$ and define

$$M_j = \sup \{\, |a_j(x - x_0)^j| \;:\; x \in [x_0 - r, x_0 + r] \,\}$$

for $j = 0, 1, 2, \ldots, J - 1$. Then

$$|a_j(x - x_0)^j| \le M_j$$

for all $j$ and all $x \in [x_0 - r, x_0 + r]$. Since $\gamma < 1$, the series $\sum M_j$ converges,
so, by the Weierstrass M-test, (19) converges uniformly on $[x_0 - r, x_0 + r]$.

Example 1 Consider the geometric series $\sum_{j=0}^{\infty} x^j$. Since $a_j = 1$ for all
$j$, it is clear that $\limsup |a_j|^{1/j} = 1$, so $R = 1$. Thus Theorem 6.4.1 confirms what we already know about the geometric series, namely, that it
converges for $|x| < 1$ and diverges for $|x| > 1$. The theorem gives no
information about $x = \pm R$, but we can see in this case that the series diverges for $x = \pm 1$. We have already computed that the sum of the series
is $f(x) = \frac{1}{1-x}$ for $|x| < 1$. This is a perfectly nice infinitely differentiable
function everywhere on $\mathbb{R}$ except for $x = 1$, but the series $\sum_{j=0}^{\infty} x^j$ equals
the function only on the interval $(-1,1)$. To represent $f(x)$ around the
point $x = 3$, we write

$$\frac{1}{1 - x} = -\frac{1}{2} \cdot \frac{1}{1 + \frac{x-3}{2}} = -\frac{1}{2} \sum_{j=0}^{\infty} (-1)^j \left(\frac{x - 3}{2}\right)^j$$

which converges if $|x - 3| < 2$. This is the Taylor series for $f$ around the
point $x_0 = 3$, as one can check by computing the derivatives of $f$ at 3,
and it represents the function in the interval $(1,5)$. What we see here is
a general phenomenon: the radius of convergence of the Taylor series
is the distance from $x_0$ to the nearest “singularity” of the function. The
reason for this will be clarified when we study analytic functions of a
complex variable in Chapter 8.
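The two expansions of $1/(1-x)$ can be compared numerically. The sketch below is an illustration under stated assumptions: the function names and the truncation at 200 terms are arbitrary, and the expansion about 3 is written as $-\frac{1}{2}\sum_j (-1)^j \left(\frac{x-3}{2}\right)^j$, which follows from $1 - x = -2\left(1 + \frac{x-3}{2}\right)$.

```python
def series_about_3(x, terms):
    """Partial sum of -(1/2) * sum_j (-1)^j ((x - 3)/2)^j."""
    q = (x - 3.0) / 2.0
    return -0.5 * sum((-1)**j * q**j for j in range(terms))

def geometric_about_0(x, terms):
    """Partial sum of sum_j x^j."""
    return sum(x**j for j in range(terms))

# x = 0.5 lies in (-1, 1); x = 4.0 and x = 2.5 lie in (1, 5).
print(geometric_about_0(0.5, 200), 1 / (1 - 0.5))
print(series_about_3(4.0, 200), 1 / (1 - 4.0))
print(series_about_3(2.5, 200), 1 / (1 - 2.5))
```

Each series reproduces the function only inside its own interval of convergence; at $x = 2.5$, for instance, the series about 0 diverges while the series about 3 converges to $-2/3$.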

□ Theorem 6.4.2 Let $R$ be the radius of convergence of the series (19).
Then the function $f(x)$ to which the series converges in $(x_0 - R, x_0 + R)$
is infinitely often continuously differentiable in $(x_0 - R, x_0 + R)$, and the
derivatives can be computed by differentiating the series term by term.
Furthermore, (19) is just the Taylor series of $f$ expanded about the point
$x_0$.

Proof. We shall give a sketch of the proof. Let $r < R$. We know that
$f(x) = \sum_{j=0}^{\infty} a_j(x - x_0)^j$ and that the powers of $x - x_0$ are continuously
differentiable. Thus, Theorem 6.3.3 guarantees that $f$ is continuously
differentiable and the derivative can be computed term by term if we
can show that the series of term-by-term derivatives

$$\sum_{j=1}^{\infty} j a_j (x - x_0)^{j-1} \qquad (23)$$

converges uniformly. It follows from problem 3b of Section 6.1 that

$$\limsup_{j\to\infty} \left(j|a_j|\right)^{\frac{1}{j-1}} = \lim_{j\to\infty} \left(j^{\frac{1}{j-1}}\right) \limsup_{j\to\infty} \left(|a_j|^{\frac{1}{j}}\right)^{\frac{j}{j-1}} = \limsup_{j\to\infty} \left(|a_j|^{\frac{1}{j}}\right)$$

since $j^{\frac{1}{j-1}} \to 1$. Thus (23) has the same radius of convergence as (19).
Therefore, Theorem 6.4.1 implies that (23) converges uniformly on $[x_0 - r, x_0 + r]$. Thus, by Theorem 6.3.3, $f(x)$ is continuously differentiable on
$(x_0 - r, x_0 + r)$ and

$$f'(x) = \sum_{j=1}^{\infty} j a_j (x - x_0)^{j-1}. \qquad (24)$$

Since $r$ was an arbitrary number less than $R$, (24) holds on $(x_0 - R, x_0 + R)$.
We now apply the same idea to show that $f''$ exists and that

$$f''(x) = \sum_{j=2}^{\infty} j(j - 1) a_j (x - x_0)^{j-2} \qquad (25)$$

on $(x_0 - R, x_0 + R)$. The crucial step is to show that (25) has the same
radius of convergence as (19). The argument, which is similar to that
above, uses the fact that $\lim_{j\to\infty} (j(j - 1))^{\frac{1}{j-2}} = 1$. Continuing in this
manner, we prove that

$$f^{(n)}(x) = \sum_{j=n}^{\infty} j(j - 1)\cdots(j - n + 1)\, a_j (x - x_0)^{j-n} \qquad (26)$$

by showing that the series on the right-hand side has the same radius of
convergence as (19). We omit the details, which are very similar to those
outlined above. Notice that if we evaluate both sides of (26) at $x_0$,
we find that

$$f^{(n)}(x_0) = n(n - 1)\cdots(1)\, a_n$$

since the other terms on the right vanish when $x = x_0$. Thus $a_j = \frac{f^{(j)}(x_0)}{j!}$,
and so the original series (19) is just the Taylor series for $f$ expanded
about the point $x_0$. □

Example 2 (the exponential function) We define

$$e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!}. \qquad (27)$$

Since $\frac{|a_{j+1}|}{|a_j|} = \frac{1}{j+1} \to 0$, Theorem 6.1.3 guarantees that $\limsup |a_j|^{1/j} = 0$.
Thus the radius of convergence of the right-hand side of (27) is $R = \infty$.
By Theorem 6.4.2, $e^x$ is infinitely often differentiable and its derivative
can be computed by differentiating term by term in (27). When one does
this, one obtains the same series, so $(e^x)' = e^x$. We will show that $e^x$ is the
inverse function of $\ln x$ as defined in Example 2 of Section 4.3. If $\psi$ is the
inverse function of the natural logarithm, then $\psi$ is defined everywhere
since the range of the natural logarithm is $\mathbb{R}$, and for all positive $x$ we
have $\psi(\ln x) = x$. By Theorem 4.5.2, we know that $\psi$ is differentiable at
$\ln x$ and

$$\psi'(\ln x) = \frac{1}{(\ln x)'} = x = \psi(\ln x).$$

Thus, for all real numbers $y = \ln x$, we have $\psi'(y) = \psi(y)$. By the uniqueness of the solutions of ordinary differential equations (Theorem 7.1.1),
$\psi(x) = Ce^x$ for some constant $C$. Since $\ln 1 = 0$, we know that $\psi(0) = 1$,
so $C = 1$. Finally, problem 8 in Section 4.5 shows that the inverse function to the natural log satisfies $\psi(x + y) = \psi(x)\psi(y)$, which proves that
$e^x e^y = e^{x+y}$.
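The series definition of the exponential can be tried out directly. The following sketch sums the first 60 terms of $\sum x^j/j!$ (the cutoff and the function name `exp_series` are arbitrary choices) and checks the functional equation and the comparison with the library exponential at a few sample points. It illustrates the statements above numerically and proves nothing.

```python
import math

def exp_series(x, terms=60):
    """Partial sum of sum_{j>=0} x^j / j!, built term by term."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= x / (j + 1)       # turns x^j / j! into x^(j+1) / (j+1)!
    return total

print(exp_series(1.0), math.e)
print(exp_series(0.7) * exp_series(1.1), exp_series(1.8))
print(math.log(exp_series(2.0)))
```

The last line reflects the fact that the series is the inverse of the natural logarithm: taking the logarithm of the series value at 2 returns approximately 2.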

Example 3 (the trigonometric functions) We define $\sin x$ and $\cos x$ by
the power series

$$\sin x = \sum_{j=0}^{\infty} (-1)^j \frac{x^{2j+1}}{(2j+1)!} \qquad (28)$$

$$\cos x = \sum_{j=0}^{\infty} (-1)^j \frac{x^{2j}}{(2j)!} \qquad (29)$$

As in Example 2, the ratio test shows that the radius of convergence of
both series is $R = \infty$. By Theorem 6.4.2, both functions are infinitely
differentiable and, by differentiating the series term by term, we see that
the usual formulas, $(\sin x)' = \cos x$ and $(\cos x)' = -\sin x$, hold. All of
the other trigonometric functions are defined in terms of $\sin x$ and $\cos x$
so their differentiation formulas can be derived from these properties of
$\sin x$ and $\cos x$.

We have shown that power series converge on intervals and where
they converge they are the Taylor series of their limits. Unfortunately,
it is not true that the Taylor series of an infinitely differentiable function
$f$ necessarily converges. Even more surprising, the Taylor series may
converge but not to the function $f$.

Example 4 Consider the function $f(x) = e^{-1/x^2}$, which we initially take
to be defined for $x \ne 0$. Away from zero, $f$ is clearly infinitely differentiable since it is the composition of two infinitely differentiable functions.
We define $f(0) = 0$. Since $-1/x^2 \to -\infty$ as $x \to 0$, we see that $f(x) \to 0$
as $x \to 0$. Thus $f$ is continuous at zero. To see that $f$ is differentiable at
zero consider the difference quotient

$$\frac{f(h) - f(0)}{h} = \frac{1}{h e^{1/h^2}}.$$

Since $|h e^{1/h^2}| \to \infty$ as $h \to 0$, we see that the difference quotient converges
to zero. Thus $f$ is differentiable at $x = 0$ and $f'(0) = 0$. Furthermore,
away from zero $f'(x) = \frac{2}{x^3} e^{-1/x^2}$ by the chain rule. Since $f'(x) \to 0$ as
$x \to 0$, $f$ is continuously differentiable. Continuing in this way one can
show, by explicitly taking limits, that $f$ is infinitely often continuously
differentiable and all its derivatives are zero at zero. The proofs use only
the fact that

$$h^{-k} e^{-1/h^2} \to 0$$

as $h \to 0$ for any positive $k$. This can be proven directly or by using
the power series for $e^x$. Since all the derivatives of $f$ have value zero
at zero, the Taylor series for $f$ about zero is identically zero. Thus the Taylor series
converges but does not equal the function anywhere except at $x = 0$.
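A quick numerical look shows how flat this function is near zero. The sketch below takes $f(x) = e^{-1/x^2}$ with $f(0) = 0$ and evaluates $h^{-k} e^{-1/h^2}$ for a few sample values; the helper name `scaled` and the sample points are illustrative choices.

```python
import math

def f(x):
    """f(x) = exp(-1/x^2) for x != 0 and f(0) = 0."""
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))

def scaled(h, k):
    """h^(-k) * exp(-1/h^2), which tends to 0 as h -> 0 for each fixed k."""
    return f(h) / h**k

for h in (0.5, 0.2, 0.1):
    print(h, f(h), scaled(h, 5))
```

The function is strictly positive away from zero, yet it vanishes at zero faster than any power of $h$, which is why every Taylor coefficient at zero is 0.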

In Chapter 8 we show that the infinitely differentiable functions that
are the limits of their Taylor series are the restrictions to the real axis of
analytic functions in the complex plane.

Problems

1. Find the radius of convergence of the series £\itx-’ tor each of the fol¬
lowing choices of the coefficients a,:

(a) 3J tbl J- (c) In f (dl js

(e) 25. (f) 2j - 1 (g) 2J rib Prob. 10 of Sec. o.l.


j

2. Given the following conditions on the coefficients $\{a_j\}$, what can you say
about the radius of convergence of $\sum a_j x^j$?

(a) $0 < m_1 \le a_j \le m_2$ for some constants $m_1$ and $m_2$.
(b) $2^j \le a_j \le 3^j$.
(c) $j^2 \le a_j \le j^3$.

3. What do you think is the radius of convergence of the Taylor series for
$\ln x$ expanded about $x_0 = 1$? Find the series and prove it. What do you
think is the radius of convergence of the Taylor series for $\ln x$ expanded
about 4? Find it and prove it.

4. What is the radius of convergence of a power series whose coefficients an


are those given in problem 2 of Section 6.1?

5. Using power series, prove that —v 1 as x —r 0.

6. Using power series, prove that tana3~a: —> 0 as x —> 0.

7. (a) Use power series to evaluate directly the limits in problems 3(a) and
3(c) of Section 4.3.

(b) Suppose that $f$ and $g$ can be represented by convergent Taylor series in an interval about $x_0$. Prove l'Hospital's rule (Theorem 4.3.2)
and its generalization in problem 13 of Section 4.3 without using the
Mean Value Theorem or Taylor's theorem.

8. In calculus, the function $f(x) = e^{-x^2}$ is always given as an example of a
function that “you can't integrate” because one can't guess an elementary
function $F$ such that $F'(x) = e^{-x^2}$. Find a power series for $F$.

9. Is there a power series which converges to the function f(x) = \x\ for all
x?

10. Suppose that $f(x)$ is defined and infinitely differentiable on an interval
$(-r, r)$. Suppose that $f$ satisfies the estimate $|f^{(n)}(x)| \le M$ for all $n$ and
all $x \in (-r, r)$. Prove that the Taylor series of $f$ converges to $f(x)$ for all
$x \in (-r, r)$.

11. Find the radius of convergence of the series $\sum_{j=0}^{\infty} (j + 1)(j + 2)x^j$. Find
the function to which the series converges.

12. Give examples which show that a Maclaurin series can either converge
or diverge at the points -R and R (independently) where R is the radius
of convergence.

13. Suppose that we didn't know how to solve the differential equation
$y'(t) = y(t)$, with initial condition $y(0) = y_0$. Let's try to write a power series $y(t) = \sum c_j t^j$ for the unknown solution. By differentiating the series,
show that we can make $y'(t) = y(t)$ if the coefficients satisfy

$$c_1 = c_0,$$
$$2c_2 = c_1,$$
$$3c_3 = c_2,$$

and so forth. Solve these relations to determine all the coefficients in
terms of $c_0$. Determine $c_0$ from the initial condition and observe that we
have found the solution. This example suggests that power series can be
used to solve differential equations. This is discussed further in project 3.

6.5 Complex Numbers

We define the complex numbers, $\mathbb{C}$, to be the set of ordered pairs of
real numbers $(x, y)$ endowed with the following notions of addition and
multiplication:

$$(x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\; y_1 + y_2) \qquad (30)$$

$$(x_1, y_1)(x_2, y_2) = (x_1 x_2 - y_1 y_2,\; x_1 y_2 + x_2 y_1). \qquad (31)$$

We regard two pairs as equal if and only if both components are equal.
It is straightforward to check that addition and multiplication in $\mathbb{C}$ are
commutative and associative. That is, if $z_1 = (x_1, y_1)$, $z_2 = (x_2, y_2)$, and
$z_3 = (x_3, y_3)$, then

$$z_1 + z_2 = z_2 + z_1$$
$$(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$$
$$z_1 z_2 = z_2 z_1$$
$$(z_1 z_2) z_3 = z_1 (z_2 z_3).$$

Furthermore, the distributive law

$$z_3(z_1 + z_2) = z_3 z_1 + z_3 z_2$$

holds. The element $(0,0)$ is called the zero of $\mathbb{C}$ and $(1,0)$ is called the
identity of $\mathbb{C}$ since

$$z + (0,0) = z$$
$$z(1,0) = z$$

for all $z \in \mathbb{C}$. Given $z_1$ and $z_2$ in $\mathbb{C}$, the equation $z + z_1 = z_2$ has a unique
solution $z$ and the equation $z z_1 = z_2$ has a unique solution as long as
$z_1 \ne (0,0)$. To check this last statement, let $z_1 = (x_1, y_1)$, $z_2 = (x_2, y_2)$,
and $z = (x, y)$. Then $z z_1 = z_2$ if and only if

$$x_1 x - y_1 y = x_2$$
$$y_1 x + x_1 y = y_2.$$

Since the determinant of the coefficients on the left side is $x_1^2 + y_1^2$, the
equations have a unique solution as long as $x_1$ and $y_1$ are not both zero.
Thus, the complex numbers $\mathbb{C}$ satisfy the axioms for a field given in Section 1.1. The set of complex numbers is the same as the set of points in
the Euclidean plane $\mathbb{R}^2$, and the definition of addition (30) corresponds
to vector addition in the plane. But when we want to consider the plane
as endowed with the special multiplication law (31), we denote it by $\mathbb{C}$
and call it the complex numbers.
Let $R$ denote the special subset of $\mathbb{C}$ of elements of the form $(x, 0)$.
It is easy to see that the operations of addition (30) and multiplication
(31) take elements of $R$ into itself and that $R$ satisfies all the properties
of a field. Thus $R$ is a subfield of $\mathbb{C}$. Let $\phi$ be the function from the real
numbers $\mathbb{R}$ to $R$ defined by $\phi(x) = (x, 0)$. Then $\phi$ is a one-to-one function
which maps $\mathbb{R}$ onto $R$. The notions of addition and multiplication on the
two sets correspond to one another under $\phi$; that is,

$$\phi(x_1)\phi(x_2) = (x_1, 0)(x_2, 0) = (x_1 x_2, 0) = \phi(x_1 x_2)$$

and similarly for addition. The function $\phi$ is said to be an isomorphism
between the two fields $\mathbb{R}$ and $R$. By using $\phi$ to identify points of $\mathbb{R}$ with
points of $R$, the real numbers can be regarded as a subset of $\mathbb{C}$. From now
on, we do so, dropping the notation $R$, by saying that a complex number
is real if it has the special form $(x, 0)$ for some real number $x$. Using the
definition of multiplication (31), it is easy to see that

$$(x, y) = (x, 0) + (0, 1)(y, 0).$$

Thus, if we give the special complex number $(0,1)$ the name $i$, we see
that every complex number $z$ can be written in the form

$$x + iy$$

where the real number $x$ is called the real part of $z$, $x = \mathrm{Re}(z)$, and the
real number $y$ is called the imaginary part of $z$, $y = \mathrm{Im}(z)$.

It is possible to define the complex numbers by saying that they are
the set of abstract objects of the form $x + iy$ where $x$ and $y$ are real numbers
and $i$ is a special object which satisfies $i^2 = -1$. We then define
the operations of addition and multiplication on this set of objects to be
what we get by adding and multiplying out, using the usual rules of
arithmetic and the special rule $i^2 = -1$. This set of abstract objects is
isomorphic to the complex numbers $\mathbb{C}$, as we have defined them.
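The pair construction can be tried out directly. The following is a minimal sketch, assuming that (30) and (31) are the usual componentwise addition and the rule $(x_1,y_1)(x_2,y_2) = (x_1x_2 - y_1y_2,\ x_1y_2 + y_1x_2)$; the function names `c_add` and `c_mul` are illustrative only. It checks that $(0,1)$ squares to the real number $-1$ and that the pair product agrees with Python's built-in complex type.

```python
def c_add(z, w):
    """Addition (30): componentwise, as for vectors in the plane."""
    return (z[0] + w[0], z[1] + w[1])


def c_mul(z, w):
    """Multiplication (31), assumed to be the standard rule
    (x1, y1)(x2, y2) = (x1*x2 - y1*y2, x1*y2 + y1*x2)."""
    x1, y1 = z
    x2, y2 = w
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)


i = (0.0, 1.0)
i_squared = c_mul(i, i)              # the pair representing the real number -1

z, w = (3.0, -2.0), (0.5, 4.0)
product = c_mul(z, w)                # product computed on pairs
builtin = complex(*z) * complex(*w)  # same product with Python's complex type
```

Writing `x + iy` and multiplying out with $i^2 = -1$ gives the same pair, which is the isomorphism described above in concrete form.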
If $z = x + iy$, we define the absolute value of $z$ to be

$$|z| = \sqrt{x^2 + y^2},$$

so $|z|$ is just the Euclidean distance from the point $(x,y)$ to the origin.
Therefore, if $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$, then $|z_1 - z_2|$ is the Euclidean
distance between $(x_1,y_1)$ and $(x_2,y_2)$. In particular, the set of $z$ which
satisfy $|z - z_1| = c$ is a circle of radius $c$ about $(x_1,y_1)$. The absolute value
satisfies several simple properties:

$$|z_1 + z_2| \leq |z_1| + |z_2| \quad \text{(the triangle inequality)}$$

$$\big||z_1| - |z_2|\big| \leq |z_1 - z_2|$$

$$|z_1 z_2| = |z_1||z_2|.$$

The second inequality follows from the triangle inequality, and the third
statement is easy to verify directly (problem 2). To prove the triangle
inequality, we square the left-hand side:

$$|z_1 + z_2|^2 = (x_1 + x_2)^2 + (y_1 + y_2)^2 \qquad (32)$$

$$= (x_1^2 + y_1^2) + (x_2^2 + y_2^2) + 2(x_1x_2 + y_1y_2) \qquad (33)$$

$$\leq |z_1|^2 + |z_2|^2 + 2(x_1^2 + y_1^2)^{1/2}(x_2^2 + y_2^2)^{1/2} \qquad (34)$$

$$= (|z_1| + |z_2|)^2. \qquad (35)$$

Taking the square root of both sides gives the triangle inequality. In going
from (33) to (34) we used the Cauchy-Schwarz inequality (problem
10(a) in Section 2.2). For any complex number $z = x + iy$, we define the
complex conjugate $\bar{z}$ by $\bar{z} = x - iy$ and note that $|z|^2 = z\bar{z}$.

Definition. A sequence of complex numbers $\{z_n\}_{n=1}^{\infty}$ is said to converge
to a limit $z$ if, given $\varepsilon > 0$, there is an $N$ such that

$$|z_n - z| < \varepsilon \quad \text{for all } n > N.$$

In this case we write $\lim_{n\to\infty} z_n = z$ or $z_n \to z$ as $n \to \infty$.



Thus, $z_n \to z$ if and only if, given any circle about $z$, the sequence
gets inside the circle and stays inside after finitely many terms. If
$z_n = x_n + iy_n$ and $z = x + iy$, then

$$|z_n - z|^2 = (x_n - x)^2 + (y_n - y)^2, \qquad (36)$$

from which it follows that $z_n \to z$ if and only if $x_n \to x$ and $y_n \to y$. If
for every given $M$ there is an $N$ so that $n > N$ implies $|z_n| > M$, we say
that $\{z_n\}$ converges to $\infty$. Analogously to the real case, we say that the
sequence $\{z_n\}$ is a Cauchy sequence if, given $\varepsilon > 0$, there is an $N$ such
that

$$|z_n - z_m| < \varepsilon \quad \text{if } n > N \text{ and } m > N.$$

□ Theorem 6.5.1 A sequence of complex numbers converges if and only
if it is a Cauchy sequence.

Proof. Suppose that $\{z_n\}$ is a Cauchy sequence. Since

$$|z_n - z_m|^2 = (x_n - x_m)^2 + (y_n - y_m)^2, \qquad (37)$$

we know that $|x_n - x_m| \leq |z_n - z_m|$ and $|y_n - y_m| \leq |z_n - z_m|$. It follows that
$\{x_n\}$ and $\{y_n\}$ are Cauchy sequences in $\mathbb{R}$. By Theorem 2.4.2, $\{x_n\}$ and
$\{y_n\}$ converge to finite limits $x$ and $y$, respectively. If we define $z = (x, y)$,
then (36) shows that $z_n \to z$. The converse argument is similar. □

Definition. Let $\{a_j\}$ be a sequence of complex numbers. The infinite
series $\sum a_j$ is said to converge if the sequence of partial sums

$$S_n = \sum_{j=1}^{n} a_j$$

converges to a complex number $S$, in which case we call $S$ the sum of
the series and write $S = \sum_{j=1}^{\infty} a_j$. If $\sum |a_j| < \infty$, the series is said to
converge absolutely.

Many of the theorems of Section 6.2 are true for infinite series of complex
numbers with no change in proof. The geometric series (Example
1) converges if $a$ is a complex number satisfying $|a| < 1$ and the sum
of the geometric series is $\frac{1}{1-a}$. This follows from (31), which implies that
$|a^n| = |a|^n \to 0$. Theorem 6.2.1 holds unchanged. The comparison test

(Theorem 6.2.2) makes no sense as it stands since there is no order relation
among complex numbers; however, we can reformulate it as follows.
The proof is virtually identical to the proof of Theorem 6.2.2.

□ Theorem 6.5.2 Let $\{a_j\}$, $\{b_j\}$, and $\{c_j\}$ be sequences of complex numbers
such that $|a_j| \leq |b_j| \leq |c_j|$ for all $j$.

(a) If $\sum |c_j|$ converges, then $\sum b_j$ converges absolutely.

(b) If $\sum |a_j|$ diverges, then $\sum b_j$ does not converge absolutely.

The hypotheses, conclusions, and proofs of the ratio and root tests (Theorems
6.2.3 and 6.2.4) refer only to the absolute values $|a_j|$, so they go
over without change to the complex case.
Suppose now that $\{a_j\}$ is a sequence of complex numbers and $z_0$ is a
given complex number. We want to ask for which $z \in \mathbb{C}$ the power series

$$f(z) = \sum_{j=0}^{\infty} a_j (z - z_0)^j \qquad (38)$$

converges. Note that where the right-hand side converges, it defines a
function $f$ that takes $\mathbb{C}$ into $\mathbb{C}$. To state the analogue of Theorem 6.4.1,
we need several definitions. The radius of convergence is defined, as in
the real case, in terms of the sequence $\{|a_j|\}$. The set $\{z \mid |z - z_0| < r\}$
is called the open disk of radius $r$ about $z_0$, and $\{z \mid |z - z_0| \leq r\}$ is
called the closed disk of radius $r$ about $z_0$. The series is said to converge
uniformly to $f$ on a set $S \subseteq \mathbb{C}$ if, given $\varepsilon > 0$, there is an $N$ so that

$$\Big| f(z) - \sum_{j=0}^{n} a_j (z - z_0)^j \Big| < \varepsilon \quad \text{for all } n > N \text{ and all } z \in S.$$

□ Theorem 6.5.3 Let $R$ be the radius of convergence of (38). Then the
series converges for all $z$ in the open disk $\{z \mid |z - z_0| < R\}$ and diverges
for all $z$ outside the closed disk $\{z \mid |z - z_0| \leq R\}$. The series converges
uniformly on each closed subdisk $\{z \mid |z - z_0| \leq r\}$ with $r < R$.

The proof of Theorem 6.5.3 is virtually identical to that of Theorem
6.4.1 and is omitted. Note that nothing is said about convergence on the
circle $\{z \mid |z - z_0| = R\}$. In fact, series can converge at some points on the
circle and diverge at others (see, for example, problem 12 in Section 6.4).

Theorem 6.5.3 shows that the natural sets of convergence for complex
power series are disks. The intersection of a disk with the real subset $\mathbb{R}$ of
the complex numbers $\mathbb{C}$ is an interval or a point. This is why the natural
domains of convergence of power series of a real variable $x$ are intervals
or points. In Chapter 8 we return to the question of which functions from
$\mathbb{C}$ to $\mathbb{C}$ can be expressed by convergent power series. Using complex
series, we can define the exponential function.

Example 1 (the exponential function) The power series

$$e^z = \sum_{j=0}^{\infty} \frac{z^j}{j!} \qquad (39)$$

converges for all complex numbers $z$ since the radius of convergence is
$R = \infty$. Thus we can use (39) as a definition of $e^z$. The exponential
function satisfies

$$e^{z_1 + z_2} = e^{z_1} e^{z_2}. \qquad (40)$$

This can be proven by multiplying out the series for $e^{z_1}$ and $e^{z_2}$ and collecting
terms (problem 12) or by using the fact that $e^z$ is an entire analytic
function, and (40) holds for real $x$ and $y$ (see problem 9 of Section 8.3). If
$\theta$ is a real number then

$$e^{i\theta} = \sum_{j=0}^{\infty} \frac{(i\theta)^j}{j!} \qquad (41)$$

$$= \sum_{j=0}^{\infty} \frac{(-1)^j \theta^{2j}}{(2j)!} + i \sum_{j=0}^{\infty} \frac{(-1)^j \theta^{2j+1}}{(2j+1)!} \qquad (42)$$

$$= \cos\theta + i\sin\theta. \qquad (43)$$

By using (43) and the complex conjugate $e^{-i\theta} = \cos\theta - i\sin\theta$, we can
rearrange algebraically to find the usual formulas for $\sin\theta$ and $\cos\theta$ in
terms of complex exponentials:

$$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad \sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}.$$

These representations of $\sin\theta$ and $\cos\theta$ will be very useful when we consider
Fourier series in Chapter 9.
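A short numerical experiment can make (39), (40), and (43) concrete. The sketch below sums finitely many terms of the exponential series and compares the result with $\cos\theta + i\sin\theta$ and with the library exponential. The helper name `exp_partial_sum`, the cutoff of 40 terms, and the sample values are assumptions made for illustration; a partial sum only approximates $e^z$.

```python
import cmath
import math


def exp_partial_sum(z, terms=40):
    """Sum the first `terms` terms of the series (39): sum of z**j / j!."""
    total = 0 + 0j
    term = 1 + 0j                # z**0 / 0!
    for j in range(terms):
        total += term
        term *= z / (j + 1)      # turn z**j / j! into z**(j+1) / (j+1)!
    return total


theta = 0.75
approx = exp_partial_sum(1j * theta)                  # partial sum for e^{i theta}
euler = complex(math.cos(theta), math.sin(theta))     # cos(theta) + i sin(theta)

z1, z2 = 0.3 + 1.1j, -0.8 + 0.4j
lhs = exp_partial_sum(z1 + z2)                        # e^{z1 + z2}
rhs = exp_partial_sum(z1) * exp_partial_sum(z2)       # e^{z1} e^{z2}
```

For values of $z$ of moderate size the two sides agree to within rounding error, which is consistent with (40) and (43) but of course does not prove them.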

Example 2 (polar form) Recall that the addition of complex numbers
corresponds to vector addition in the plane. Using (40) we can figure out
what is happening geometrically when we multiply complex numbers. For any
complex number $z = x + iy$, we let $\theta$ denote the angle between the vector from
the origin to the point $(x, y)$ and the positive $x$ axis. See Figure 6.5.1. This angle
is traditionally called the argument of $z$. We write $\theta = \mathrm{Arg}(z)$, noting that $\theta$ is
only determined up to an integral multiple of $2\pi$. The length of the vector is
$|z| = \sqrt{x^2 + y^2}$. Thus we can write

Figure 6.5.1

$$z = |z|\cos\theta + i|z|\sin\theta = |z|e^{i\theta},$$

which is called the polar form of $z$. Given two complex numbers, $z_1$ and
$z_2$, we can compute their product by using (40) and the polar form:

$$z_1 z_2 = |z_1|e^{i\theta_1}|z_2|e^{i\theta_2} = |z_1||z_2|e^{i(\theta_1 + \theta_2)}.$$

Therefore, when we multiply complex numbers we multiply their absolute
values and add their arguments.

Example 3 (roots of unity) An important reason for the creation of
the complex numbers was the desire to find a field, larger than the real
number field, in which every number has a square root. Let $z = |z|e^{i\theta}$ be
a given nonzero complex number. If $\omega = |\omega|e^{i\theta_1}$ is a square root of $z$, we must have

$$|z|e^{i\theta} = |\omega|^2 e^{i2\theta_1}.$$

Two nonzero complex numbers are equal if their absolute values are equal and
their arguments are equal or differ by an integral multiple of $2\pi$. Therefore
$|\omega| = |z|^{1/2}$ and

$$\theta + m2\pi = 2\theta_1$$

for some integer $m$. Thus, $\theta_1 = \theta/2 + m\pi$. As $m$ varies through the
integers, $\theta_1$ takes on only two distinct values modulo $2\pi$, $\theta/2$ and $\theta/2 + \pi$.
Therefore, $z$ has two square roots: $|z|^{1/2}e^{i\theta/2}$ and $|z|^{1/2}e^{i(\theta/2 + \pi)}$. For example,
$-1 = e^{i\pi}$, so the two square roots of $-1$ are $i = e^{i\pi/2}$ and $-i = e^{i3\pi/2}$.

If we want $\omega$ to be an $n$th root of $z$, the same reasoning leads to the
requirement

$$\theta + m2\pi = n\theta_1.$$

Solving for $\theta_1$ and letting $m$ run through the integers, we find exactly $n$
distinct choices, modulo $2\pi$, for $\theta_1$:

$$\theta_1 = \frac{\theta}{n} + \frac{m2\pi}{n}, \qquad m = 0, 1, 2, \ldots, n-1.$$

Thus, every nonzero complex number has exactly $n$ $n$th roots. For example, the
number 1 has $n$th roots $1, e^{i2\pi/n}, \ldots, e^{i2\pi(n-1)/n}$. These numbers all have
absolute value 1 and are equally spaced around the circle of radius 1 with
center at the origin.
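The recipe for $n$th roots translates directly into a few lines of code. The following sketch uses Python's standard `cmath` module; the function name `nth_roots` is illustrative, and the choice of argument made by `cmath.phase` (a value in $(-\pi, \pi]$) is just one of the admissible values of $\mathrm{Arg}(z)$.

```python
import cmath
import math


def nth_roots(z, n):
    """Return the n distinct nth roots of a nonzero complex number z."""
    r = abs(z)
    theta = cmath.phase(z)               # one choice of Arg(z)
    root_modulus = r ** (1.0 / n)        # |z|^(1/n)
    return [
        # modulus |z|^(1/n), argument theta/n + 2*pi*m/n for m = 0, ..., n-1
        cmath.rect(root_modulus, theta / n + 2 * math.pi * m / n)
        for m in range(n)
    ]


roots_of_unity = nth_roots(1, 6)                 # six points on the unit circle
square_roots_of_minus_one = nth_roots(-1, 2)     # approximately i and -i
```

Raising each returned number to the $n$th power recovers $z$ up to rounding error, and all roots share the same absolute value.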

Problems

1. Verify from definitions (30) and (31) that complex addition is associative
and complex multiplication is commutative.

2. For all $z_1$ and $z_2$ in $\mathbb{C}$, prove that $|z_1 z_2| = |z_1||z_2|$.

3. Describe the following regions in the complex plane.

(a) $\{z \mid |z - i| < 1\}$.
(b) $\{z \mid -1 < \mathrm{Im}(z) < 1\}$.
(c) $\{z \mid |z| > 2 \text{ and } \mathrm{Re}(z) > 0\}$.
(d) $\{z \mid \mathrm{Im}(z^2) > 0\}$.
(e) $\{z \mid z + \bar{z} = 2\}$.

4. Show how to express in the form x + iy by multiplying numerator


and denominator by the complex conjugate of 1 - 6i.

5. Express the following complex numbers in polar form: (a) 1 + i


(b)l-i (c) —10 (dlv^ + i.

6. Find the four fourth roots of i.

7. Suppose that $z_n \to z$. Prove that

(a) $|z_n| \to |z|$.
(b) $\bar{z}_n \to \bar{z}$.
(c) $\{z_n\}$ is a bounded sequence.

8. Suppose that $z_n \to z$, $w_n \to w$, and $\beta \in \mathbb{C}$. Prove that

(a) $z_n + w_n \to z + w$.
(b) $z_n \cdot w_n \to z \cdot w$.
(c) $\beta z_n \to \beta z$.
(d) for each positive integer $m$, $z_n^m \to z^m$.

9. Let $\{a_j\}_{j=0}^{m}$ be complex numbers and for each $z$ define $p(z) = a_0 + a_1 z +
a_2 z^2 + \ldots + a_m z^m$. Suppose that $z_n \to z$. Prove that

$$\lim_{n \to \infty} p(z_n) = p(z).$$

10. For all $z \in \mathbb{C}$, define

$$\sin z = \frac{e^{iz} - e^{-iz}}{2i}, \qquad \cos z = \frac{e^{iz} + e^{-iz}}{2}.$$

Find power series representations for $\sin z$ and $\cos z$ and verify that each
series has radius of convergence $R = \infty$. Are $\sin z$ and $\cos z$ bounded
functions on $\mathbb{C}$?

11. Prove that for $|z| > 1$,

$$\frac{1}{1+z} = \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{z^j}.$$

12. Prove formula (40) by multiplying out the series on the right (using Theorem
6.2.6) and show by regrouping (using Theorem 6.2.5) that you get
the series on the left.

6.6 Infinite Products and Prime Numbers


For any finite set of complex numbers, $a_1, \ldots, a_N$, we introduce a special
symbol, $\prod$, to denote the product of the numbers in the set:

$$\prod_{j=1}^{N} a_j = a_1 \cdot a_2 \cdot \ldots \cdot a_N.$$

Definition. Let $\{a_n\}$ be an infinite sequence of nonzero complex numbers.
The infinite product, $\prod a_n$, is said to converge if the sequence of
partial products

$$P_n = \prod_{j=1}^{n} a_j$$

converges to a finite nonzero limit $P$ as $n \to \infty$. In this case we define

$$P = \prod_{j=1}^{\infty} a_j.$$

If $\{P_n\}$ converges to zero (or diverges to $\infty$) we say that the infinite product
diverges to zero (or $\infty$). We require the $a_n$ to be nonzero because if
one $a_n$ were zero, then all partial products beyond $n$ would be zero automatically,
no matter how wildly the sequence $\{a_n\}$ behaved. Similarly,
if $P$ were allowed to be zero, then "convergence" would not put strong
conditions on the behavior of $a_j$ for large $j$. For example, the sequence in
which aj = for j odd and aj = -■ for j even diverges to zero. We need
a criterion on the partial products {Pn} which guarantees convergence
to a nonzero limit.

□ Theorem 6.6.1 The sequence of partial products $\{P_n\}$ converges to a
nonzero limit $P$ if and only if, given $\varepsilon > 0$, there exists an $N$ so that

$$\Big| \prod_{j=m}^{n} a_j - 1 \Big| < \varepsilon \quad \text{for all } n \geq m \geq N. \qquad (44)$$

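Proof. Suppose $P_n \to P \neq 0$, and let $\varepsilon > 0$ be given. Since the absolute
value is a continuous function (problem 7(a) of Section 6.5), we can
choose $N_1$ so that $|P_n| \geq \frac{|P|}{2}$ for $n \geq N_1$. Thus,

$$|P_n - P_{m-1}| = \Big| \prod_{j=1}^{n} a_j - \prod_{j=1}^{m-1} a_j \Big| = \Big| \prod_{j=1}^{m-1} a_j \Big| \, \Big| \prod_{j=m}^{n} a_j - 1 \Big| \geq \frac{|P|}{2} \Big| \prod_{j=m}^{n} a_j - 1 \Big|.$$

Therefore,

$$\Big| \prod_{j=m}^{n} a_j - 1 \Big| \leq \frac{2}{|P|} |P_n - P_{m-1}| \quad \text{for all } n \geq m > N_1.$$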

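Since $\{P_n\}$ is a Cauchy sequence, the right-hand side can be made $< \varepsilon$ by
choosing $N > N_1$ large enough and requiring $n \geq m \geq N$. This proves
(44).
Conversely, suppose that condition (44) holds. Then we can choose
$N$ so that $\big| \prod_{j=m}^{n} a_j - 1 \big| \leq \frac{1}{2}$ for $n \geq m \geq N$. This implies, in particular,
that

$$\frac{1}{2} \leq \Big| \prod_{j=m}^{n} a_j \Big| \leq \frac{3}{2}.$$

Thus,

$$|P_n - P_m| = \Big| \prod_{j=1}^{N} a_j \Big| \, \Big| \prod_{j=N+1}^{m} a_j \Big| \, \Big| \prod_{j=m+1}^{n} a_j - 1 \Big| \leq \Big| \prod_{j=1}^{N} a_j \Big| \left(\frac{3}{2}\right) \Big| \prod_{j=m+1}^{n} a_j - 1 \Big|$$

for $n > m > N$; thus, it is clear that $|P_n - P_m|$ can be made small by
choosing $n$ and $m$ sufficiently large. Therefore, $\{P_n\}$ converges. Furthermore,
for $n > N$,

$$\Big| \prod_{j=1}^{n} a_j \Big| = \Big| \prod_{j=1}^{N} a_j \Big| \, \Big| \prod_{j=N+1}^{n} a_j \Big| \geq \frac{1}{2} \Big| \prod_{j=1}^{N} a_j \Big| > 0,$$

so the product cannot converge to zero. □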

An immediate consequence of Theorem 6.6.1 is that $a_j \to 1$ is a necessary
condition for convergence. This is analogous to the necessary condition
$a_n \to 0$ for the convergence of the series $\sum a_j$. Since we must have
$a_n \to 1$, it is convenient to write infinite products in the form

$$\prod_{j=1}^{\infty} (1 + b_j) \qquad (45)$$

where $b_j \to 0$. If the $b_j$ are real and nonnegative, then it is easy to say
when the infinite product (45) converges.

□ Theorem 6.6.2 Suppose that $b_j \geq 0$ for all $j$. Then $\prod (1 + b_j)$ converges
if and only if $\sum b_j$ converges.

Proof. First, suppose that $\sum b_j$ converges. It is easy to check that the
function $f(x) = e^x - (1 + x)$ has its minimum at zero where it has the
value zero. Thus, $(1 + x) \leq e^x$ for all real numbers $x$. It follows that

$$\prod_{j=1}^{n} (1 + b_j) \leq e^{\sum_{j=1}^{n} b_j} \qquad (46)$$

for all $n$. Since $\sum b_j$ converges, the right-hand side of (46) is a bounded
sequence. Thus the left-hand side is a bounded sequence, and since it is
increasing, it converges by Theorem 2.4.3.
Conversely, suppose that $\prod (1 + b_j)$ converges. By multiplying out
$\prod_{j=1}^{n} (1 + b_j)$ and using the positivity of the $b_j$, it is easy to see that

$$1 + \sum_{j=1}^{n} b_j \leq \prod_{j=1}^{n} (1 + b_j)$$

for all $n$. Since $\prod (1 + b_j)$ converges, the sequence on the right side is
bounded. Thus, the sequence of partial sums on the left is bounded and
therefore converges because it is increasing. □

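Example 1 Since the harmonic series $\sum \frac{1}{j}$ diverges and the series
$\sum \frac{1}{j^2}$ converges (Example 2 of Section 6.2), Theorem 6.6.1 guarantees that
$\prod_{j=2}^{\infty} \left(1 - \frac{1}{j}\right)$ diverges and $\prod_{j=2}^{\infty} \left(1 - \frac{1}{j^2}\right)$ converges. We can verify this
explicitly by computing the partial products:

$$\prod_{j=2}^{n} \left(1 - \frac{1}{j}\right) = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} \cdot \frac{4}{5} \cdots \frac{n-1}{n} = \frac{1}{n}$$

$$\prod_{j=2}^{n} \left(1 - \frac{1}{j^2}\right) = \frac{(1)(3)}{2^2} \cdot \frac{(2)(4)}{3^2} \cdot \frac{(3)(5)}{4^2} \cdots \frac{(n-1)(n+1)}{n^2} = \frac{n+1}{2n}$$

Therefore, the first partial product approaches zero, and the second partial
product converges to $\frac{1}{2}$ as $n \to \infty$.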
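The two closed forms for the partial products can be confirmed with exact rational arithmetic. This is a small sketch using Python's standard `fractions` module; the helper name `partial_product` and the choice $n = 50$ are illustrative assumptions.

```python
from fractions import Fraction


def partial_product(term, n, start=2):
    """Exact product of term(j) for j = start, ..., n."""
    p = Fraction(1)
    for j in range(start, n + 1):
        p *= term(j)
    return p


n = 50
first = partial_product(lambda j: 1 - Fraction(1, j), n)        # compare with 1/n
second = partial_product(lambda j: 1 - Fraction(1, j * j), n)   # compare with (n+1)/(2n)
```

Because the arithmetic is exact, `first` and `second` can be compared with $\frac{1}{n}$ and $\frac{n+1}{2n}$ by equality rather than by a tolerance.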

We say that the infinite product $\prod (1 + b_j)$ converges absolutely if
$\prod (1 + |b_j|)$ converges. Theorem 6.6.2 gives a criterion for the convergence
of the latter product, and the following theorem shows that absolute convergence
implies convergence.

□ Theorem 6.6.3 If $\prod (1 + b_j)$ converges absolutely, then $\prod (1 + b_j)$ converges.

Proof. Let $P_n = \prod_{j=1}^{n} (1 + b_j)$ and $n > m$. By expanding the product
$\prod_{j=m}^{n} (1 + b_j)$, subtracting 1, and using the triangle inequality, we find

$$\Big| \prod_{j=m}^{n} (1 + b_j) - 1 \Big| \leq \prod_{j=m}^{n} (1 + |b_j|) - 1. \qquad (47)$$

Since $\prod (1 + b_j)$ converges absolutely, Theorem 6.6.1 guarantees that the
right-hand side of (47) can be made smaller than any given $\varepsilon > 0$ by
choosing $n$ and $m$ large enough. Thus, the same is true of the left-hand
side, which proves, by Theorem 6.6.1, that the product converges. □

It might seem that the definition of absolute convergence for the
product is unnatural. Why don't we say that $\prod (1 + b_j)$ converges absolutely
if $\prod |1 + b_j|$ converges? Let $\theta$ be any real number that is not an integral
multiple of $2\pi$ and let $b_j = e^{ij\theta} - 1$. Then $\prod |1 + b_j|$ certainly converges since each term in
the product equals 1, but $\prod (1 + b_j) = \prod e^{ij\theta}$ does not converge since the
terms $e^{ij\theta}$ go around the unit circle in the complex plane as $j$ increases
and do not approach 1. The definition above requires not that $|1 + b_j|$
be close to 1 but that the absolute value of the difference between $(1 + b_j)$
and 1 be small, which is a much stronger statement.

Example 2 The product n^LiU + ^~-2^) converges absolutely since

52 j2 < oo. However, the product IIjLi(l + j^~) does not converge
absolutely since 52 j — 00 •

We now show how infinite products can be used in the study of
prime numbers. The prime numbers are the set, $P$, of positive integers
that have no divisors besides themselves and the number 1. More than
two thousand years ago, Euclid proved that $P$ is an infinite set by the
following argument. Suppose that $P$ is a finite set and let $p_1, p_2, \ldots, p_k$ be
a listing of the primes in $P$. Then the number $q = p_1 p_2 \cdots p_k + 1$ must be a
prime because it is not divisible by any of the $p_i$ (since the remainder of
the division is 1). But this leads to a contradiction since $q$ is clearly larger
than each of the $p_i$ and therefore is not on our finite list of primes. Therefore
$P$ must be an infinite set. Throughout our discussion of prime numbers
we shall use the fundamental theorem of arithmetic (Theorem 1.3.3).

Note that the fundamental theorem is used implicitly in Euclid's proof
that $P$ is infinite.
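It may help to see Euclid's construction on a concrete list. The sketch below takes the first six primes (an arbitrary illustrative choice), forms $q = p_1 p_2 \cdots p_k + 1$, and checks that every listed prime leaves remainder 1. It also shows why the fundamental theorem enters: $q$ need not itself be prime when the list is not the full set of primes, but its prime factors are necessarily missing from the list.

```python
from math import prod

primes = [2, 3, 5, 7, 11, 13]          # a finite list of primes (not all of them)
q = prod(primes) + 1                   # Euclid's number for this list

# Dividing q by any prime on the list leaves remainder 1.
remainders = [q % p for p in primes]

# The smallest divisor of q greater than 1 is a prime factor of q;
# it cannot be on the list, since no listed prime divides q.
smallest_factor = next(d for d in range(2, q + 1) if q % d == 0)
```

Here `smallest_factor` is a prime that was not on the list, which is the contradiction used in the argument when the list is assumed to contain every prime.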
If we look at a list of prime numbers or compute how many there
are between 1 and 100, between 101 and 200, and so forth, we find that
the primes become sparse as the integers get larger. Let $\pi(n)$ denote
the number of primes $\leq n$. The fact that $P$ is infinite says simply that
$\pi(n) \to \infty$ as $n \to \infty$. In order to characterize how sparse the prime
numbers are as $n \to \infty$, we want to investigate how fast $\pi(n)$ goes to $\infty$.
It is clear that $\pi(n) \leq n$ by definition. Suppose that $n$ is even. There are $\frac{n}{2}$
multiples of 2 that are $\leq n$, only one of which (2 itself) is a prime. Thus,

$$\pi(n) \leq \frac{n}{2} + 1,$$

and a similar estimate holds for $n$ odd. That is, less than half (approximately)
of the integers $\leq n$ can be prime. Of course the multiples of 3
cannot be prime either, and they constitute approximately $\frac{1}{3}$ of the integers
$\leq n$. This suggests that less than $\frac{1}{6}$ of the integers $\leq n$ can be prime since
$1 - \frac{1}{2} - \frac{1}{3} = \frac{1}{6}$. However, there is a problem with this reasoning since
some multiples of 2 are also multiples of 3 and vice versa, and so we've
subtracted too much. Nevertheless, this suggests that as $n$ gets larger,
$\pi(n)$ becomes smaller relative to $n$.

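Lemma 1 The infinite product $\prod_{p \in P} \left(1 - \frac{1}{p}\right)$ diverges to zero.

Proof. In the infinite product we will take the primes to be ordered in
the natural way, $P = \{2, 3, 5, 7, \ldots\}$, though it turns out that the ordering
doesn't matter (problem 6). The statement of the lemma is equivalent to
the statement that the product

$$\prod_{j=1}^{\infty} \left(1 - \frac{1}{p_j}\right)^{-1} \qquad (48)$$

diverges to $\infty$ since the corresponding partial products are reciprocals
of each other. If $p_1, p_2, \ldots, p_K$ are the first $K$ primes, then $Q_K$, the $K$th
partial product of (48), can be written

$$Q_K = \prod_{j=1}^{K} \left(1 - \frac{1}{p_j}\right)^{-1} \qquad (49)$$

$$= \prod_{j=1}^{K} \left( \sum_{k=0}^{\infty} \frac{1}{p_j^k} \right) \qquad (50)$$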

since each term is a sum of a geometric series. Let $M > 0$ be given and
choose $N$ so that

$$\sum_{n=1}^{N} \frac{1}{n} \geq M,$$

which we may do since the harmonic series diverges to $\infty$. By the fundamental
theorem of arithmetic, each $n \leq N$ can be written uniquely as
a finite product of powers of primes. Choose $K$ large enough so that the
finite product (49) is taken over all the primes that occur in the product
representations of all $n \leq N$. Then, each term $\frac{1}{n}$ is contained in the sum
obtained by multiplying out the series in the finite product (50). Note
that we can multiply out these series because they are absolutely convergent
(Theorem 6.2.5). Because the rest of the terms obtained from the
product are positive, we see that

$$M \leq \sum_{n=1}^{N} \frac{1}{n} \leq Q_K.$$

Since $M > 0$ was arbitrary, we conclude that $Q_K \to \infty$ as $K \to \infty$. □
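The key inequality in the proof, that the partial product over all primes $\leq N$ dominates the harmonic sum up to $N$, can be observed numerically. The following is a sketch with exact fractions; the helper names (`primes_up_to`, `harmonic`, `q_product`) and the sample values of $N$ are assumptions for illustration, and trial division is used only because the numbers are small.

```python
from fractions import Fraction


def primes_up_to(n):
    """Primes <= n by trial division (adequate for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def harmonic(n):
    """Exact partial sum 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def q_product(primes):
    """Exact product of (1 - 1/p)^(-1) = p / (p - 1) over the given primes."""
    result = Fraction(1)
    for p in primes:
        result *= Fraction(p, p - 1)
    return result


for n in (10, 100, 1000):
    # Harmonic sum up to n versus the product (49) over all primes <= n.
    print(n, float(harmonic(n)), float(q_product(primes_up_to(n))))
```

Both columns grow without bound, but very slowly, which is why divergence here is a statement about limits rather than something visible at a glance.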

We introduce a function $w$ defined on pairs of positive integers $(m, r)$
to help us sort out the double-counting difficulty mentioned above. Define
$w(m, r)$ to be the number of positive integers $\leq m$ that are not multiples
of any of the first $r$ prime numbers. Thus, for example, $w(12, 1) = 6$ since
all the odd numbers are not multiples of 2, and $w(12, 4) = 2$ since only 1
and 11 are not multiples of 2, 3, 5, or 7.

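Lemma 2 For any real $x$, let $[x]$ denote the greatest integer $\leq x$. Then,
for all $m$ and $r$,

$$w(m, r) = m - \sum_{i \leq r} \left[\frac{m}{p_i}\right] + \sum_{i_1 < i_2 \leq r} \left[\frac{m}{p_{i_1} p_{i_2}}\right] - \cdots + (-1)^r \left[\frac{m}{p_1 p_2 \cdots p_r}\right]. \qquad (51)$$

Proof. Notice that for each choice of $i_1, i_2, \ldots, i_k$,

$$\left[\frac{m}{p_{i_1} p_{i_2} \cdots p_{i_k}}\right]$$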

is just the number of multiples of $p_{i_1} p_{i_2} \cdots p_{i_k}$ which are $\leq m$. Thus, a fixed
integer $n \leq m$ is counted in the term

$$\sum \left[\frac{m}{p_{i_1} p_{i_2} \cdots p_{i_k}}\right]$$

a number of times equal to the number of different products $p_{i_1} p_{i_2} \cdots p_{i_k}$
that divide $n$. So, if $n$ is not a multiple of any of the $p_i$, it is counted once
in the first term ($m$) and not counted in any of the other terms. Suppose $n$
is a multiple of exactly one of the first $r$ primes. Then, $n$ is counted once
in the first term ($m$) and counted once (with a minus sign) in the second
term, so the net count is zero. In general, suppose that $n$ is a multiple of
exactly $j$ of the first $r$ primes. Then, $n$ is counted once in the first term,
$n$ is counted $j$ times in the second term, $n$ is counted $\binom{j}{2}$ times in the
third term because there are $\binom{j}{2}$ ways in which $n$ is a multiple of
two primes, and so forth. So, the net number of times that $n$ is counted
(including minus signs) is

$$1 - j + \binom{j}{2} - \binom{j}{3} + \cdots + (-1)^j \binom{j}{j} = (1 - 1)^j = 0.$$

Thus, each $n$ that is not a multiple of any $p_i$ is counted once by the expression
on the right of (51). Each $n$ which is a multiple of at least one
$p_i$ is counted zero times (net) by the expression on the right of (51). Thus
the sum on the right of (51) is equal to the number of integers $\leq m$ that
are not multiples of the first $r$ primes. That is, the sum equals $w(m, r)$.
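Formula (51) can be compared against a direct count. The sketch below is one way to do this; the names `w_direct` and `w_formula` and the fixed list `FIRST_PRIMES` are illustrative assumptions, and the list only supports $r \leq 6$.

```python
from itertools import combinations
from math import prod

FIRST_PRIMES = [2, 3, 5, 7, 11, 13]    # enough for the small r used here


def w_direct(m, r):
    """Count the integers 1..m that are divisible by none of the first r primes."""
    primes = FIRST_PRIMES[:r]
    return sum(1 for n in range(1, m + 1) if all(n % p for p in primes))


def w_formula(m, r):
    """Evaluate the right side of (51) by inclusion-exclusion."""
    primes = FIRST_PRIMES[:r]
    total = 0
    for k in range(r + 1):
        # Sum [m / (product of k distinct primes)] with sign (-1)^k;
        # the empty product (k = 0) contributes the leading term m.
        for subset in combinations(primes, k):
            total += (-1) ** k * (m // prod(subset))
    return total
```

For instance, `w_direct(12, 1)` and `w_direct(12, 4)` reproduce the two values worked out in the text, and the two functions agree on every pair $(m, r)$ in the supported range.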

□ Theorem 6.6.4 Let $\pi(m)$ denote the number of primes $\leq m$. Then,

$$\lim_{m \to \infty} \frac{\pi(m)}{m} = 0. \qquad (52)$$

Proof. The set of primes $\leq m$ is contained in the union of the first $r$
primes and the set of numbers $\leq m$ that are not multiples of the first $r$
primes. Thus, for every $m$ and $r$,

$$\pi(m) \leq r + w(m, r). \qquad (53)$$


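Notice that replacing $[x]$ by $x$ always makes an error $\leq 1$. Therefore,
since there are

$$1 + \binom{r}{1} + \binom{r}{2} + \cdots + \binom{r}{r} = (1 + 1)^r = 2^r$$

terms in expression (51) for $w(m, r)$, we have

$$\pi(m) \leq r + 2^r + m - \sum_{i \leq r} \frac{m}{p_i} + \sum_{i_1 < i_2 \leq r} \frac{m}{p_{i_1} p_{i_2}} - \cdots + (-1)^r \frac{m}{p_1 p_2 \cdots p_r}$$

$$= r + 2^r + m \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$

This estimate is true for all positive integers $m$ and $r$. Now, let $\varepsilon > 0$
be given. By Lemma 1 we can choose an $r$ (henceforth fixed) so that
$\prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right) < \frac{\varepsilon}{2}$. Thus,

$$\pi(m) < r + 2^r + m\frac{\varepsilon}{2}$$

or

$$\frac{\pi(m)}{m} < \frac{r}{m} + \frac{2^r}{m} + \frac{\varepsilon}{2}$$

for all $m$. For $m$ large enough,

$$\frac{r}{m} + \frac{2^r}{m} < \frac{\varepsilon}{2}$$

and so

$$\frac{\pi(m)}{m} < \varepsilon,$$

which proves the theorem. □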

The above theorem is a simple example of the use of analysis in number
theory. Much more is known about the detailed behavior of $\pi(m)$.
The Prime Number Theorem, which states that

$$\lim_{m \to \infty} \frac{\pi(m)}{m / \ln m} = 1,$$

was originally conjectured by C. F. Gauss (1777-1855). It was proven in
1896 independently by J. Hadamard and C.-J. de la Vallée Poussin, using
advanced techniques from the theory of analytic functions. Some of the
basic ideas of analytic function theory are developed in Chapter 8.
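Both limits can be explored numerically. The following sketch tabulates $\pi(n)$ with a sieve of Eratosthenes, a standard technique that is not taken from the text; the function name `prime_count_table` and the cutoff $10^6$ are illustrative assumptions, and the function expects `limit >= 1`.

```python
import math


def prime_count_table(limit):
    """Return a list pi with pi[n] = number of primes <= n, for 0 <= n <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]           # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out the multiples of p, starting from p*p.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    pi, count = [], 0
    for flag in is_prime:
        count += flag                        # running count of primes so far
        pi.append(count)
    return pi


pi = prime_count_table(10**6)
for m in (10**2, 10**4, 10**6):
    # pi(m)/m for Theorem 6.6.4, and pi(m) ln(m)/m for the Prime Number Theorem.
    print(m, pi[m] / m, pi[m] * math.log(m) / m)
```

In this range the first ratio decreases steadily while the second stays close to 1 and drifts toward it only slowly, which is consistent with the two statements but is evidence rather than proof.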

Problems

1. Prove that
1
j(j + !)/ 3'

2. Prove that for every complex number z satisfying \z\ < 1,


OO
i
l-z'

3. Let $z$ be a complex number.

(a) Show that there is an $N$ such that $n > N$ implies

$$\Big| \left(1 + \frac{z}{n}\right) e^{-z/n} - 1 \Big| \leq \frac{C}{n^2}$$

for some constant $C$.
(b) Use the estimate in (a) to prove the convergence of the infinite product

$$\prod_{j=1}^{\infty} \left(1 + \frac{z}{j}\right) e^{-z/j}.$$
4. Let $\{a_j\}$ be a sequence of real numbers such that $|a_j| < 1$ for each $j$. Prove
that $\prod_{j=1}^{\infty} (1 + a_j)$ converges if and only if $\sum \ln(1 + a_j)$ converges. Hint:
relate the partial sum and product.

5. Use the result of problem 4 to show that the infinite product


OO

n (-i)j
j

converges but does not converge absolutely.

6. Let $S$ be an infinite subset of the positive integers and let $b_s$ be a function
defined on $S$ so that $0 < b_s < 1$ for all $s \in S$. Show that if the infinite
product

$$\prod_{s \in S} (1 - b_s)$$

diverges to zero for a particular ordering of $S$ then it diverges to zero for
all orderings of $S$.

7. The Riemann zeta function was defined in problem 12 of Section 6.3.


Prove that for all x > 1

C(*)

8. (a) Show that for all nonintegral x, the following infinite product con¬
verges:

(b) Show that the limiting function is continuous and that by defining it
to be zero at the integers it can be extended to be a continuous func¬
tion on R. Hint: show that the partial products converge uniformly
on appropriate sets.
(c) Generate the graphs of several partial products and use the graphs
to guess what the limiting function is.

9. (a) Write a computer program which computes $\pi(n)$ for any given $n$.
(b) Use the program to provide numerical evidence for Theorem 6.6.4
and the Prime Number Theorem.

Projects

1. The purpose of this project is to develop the theory of alternating series.
A series is called an alternating series if the signs of the terms alternate.
For example,

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \cdots$$

is the alternating harmonic series. If the first term is positive, we can
write the alternating series as $\sum_{j=0}^{\infty} (-1)^j c_j$ where $c_j > 0$ for all $j$. We
will suppose that the sequence $\{c_j\}$ is non-increasing; that is, $c_j \geq c_{j+1}$
for all $j$.

(a) Let $S_n = \sum_{j=0}^{n} (-1)^j c_j$. Prove that for each $n \geq 2$, the number $S_n$
lies between $S_{n-1}$ and $S_{n-2}$.
(b) Use the Bolzano-Weierstrass theorem to show that a subsequence
$S_{n_k}$ converges.
(c) Suppose, in addition, that $c_j \to 0$ as $j \to \infty$. Prove that the whole
sequence $\{S_n\}$ converges. This is known as the Alternating Series
Theorem.
(d) Does the alternating harmonic series converge? Does it converge
absolutely?
(e) Let $S = \lim_{n \to \infty} S_n$. Prove that

$$|S - S_n| \leq c_{n+1}.$$

Note that this gives us a very easy way to estimate how close a
partial sum is to the sum of an alternating series. How many terms
of the alternating geometric series do we have to take to be within
$10^{-4}$ of the limit?

(f) Prove that for each $x$ we can choose a $J$ so that the Maclaurin series
for $\sin x$ and $\cos x$ satisfy the hypotheses for the Alternating Series
Theorem for $j > J$. Use the alternating series remainder to estimate
how close $1 - \frac{x^2}{2!} + \frac{x^4}{4!}$ is to $\cos x$ on the interval $[-1, 1]$. Compare
your estimate to the one you get by using Taylor's theorem.

2. Polynomials are easy to integrate and power series are "almost" polynomials.
This gives us a new way to approximate certain integrals.

(a) What is the Maclaurin series for the function $e^{-x^2}$?
(b) Use the first three nonzero terms of the Maclaurin series to approx¬
imate the integral fQ e~x2 dx.
(c) Using the alternating series error estimate from Project 1, estimate
the error in your approximation.
(d) How many terms of the series would you have to take to be sure
that your estimate of the integral is within $10^{-4}$ of the correct answer?
Compare the computational effort involved with the numerical
methods discussed in Section 3.4.
(e) Suppose that we want to estimate the integral $\int_0^{\infty} e^{-x^2}\,dx$ to within
$10^{-4}$. First, choose $N$ so that $\int_N^{\infty} e^{-x^2}\,dx \leq 10^{-4}/2$. Hint: $e^{-x^2} \leq
e^{-x}$ for $x \geq 1$. Then estimate $\int_0^N e^{-x^2}\,dx$. Note: The exact value of
$\int_0^{\infty} e^{-x^2}\,dx$ can be computed analytically. See problem 9 of Section
10.3.

3. The purpose of this project is to show how power series can be used to
find or approximate the solutions of certain differential equations.

(a) Use the idea in problem 13 of Section 6.4 to find a power series solution
of the initial value problem

$$y'(t) = y(t) + t, \qquad y(0) = 2. \qquad (54)$$

After determining the coefficients, you should be able to express
the series in terms of functions that you know. Check to be sure that
your function solves (54).
(b) Use the same idea to find a power series solution of

$$y''(t) + \frac{1}{t} y'(t) + y(t) = 0$$

that satisfies the condition $y(0) = 1$. Verify that the series that you
found converges for all $t$ and satisfies the differential equation. The
solution, $J_0(t)$, is called a Bessel function.
(c) Power series can also be used for nonlinear equations. Use them to find
the first four terms of the solution of

$$y'(t) = y(t)^2, \qquad y(0) = 2.$$


Note that when you square the series for y(t), the lower-order terms
are easy to calculate. Check to make sure that the terms you found
coincide with the series expansion of the solution found in Example
2 of Section 7.1.

4. The purpose of this project is to develop the idea of decimal expansions.
We shall discuss decimal expansions for numbers in the interval $[0, 1]$.
The extension to all real numbers is straightforward.

(a) Let $\{a_n\}_{n=1}^{\infty}$ be a sequence, each of whose terms is an integer
$0, 1, 2, \ldots, 8$, or $9$. Prove that $\sum_{n=1}^{\infty} a_n 10^{-n}$ converges to a real
number $x \in [0, 1]$. We define $.a_1 a_2 a_3 \ldots = \sum_{n=1}^{\infty} a_n 10^{-n}$ and call
$.a_1 a_2 a_3 \ldots$ a decimal expansion for $x$.
(b) Suppose that $x \in [0, 1]$ has the form $x = k10^{-n}$ for some positive
integers $k$ and $n$ such that $k < 10^n$. Prove that $x$ has a decimal
expansion which has only 0's after the $n$th term. Prove that $x$ also
has a decimal expansion which has only 9's after the $n$th term.
(c) Suppose that $x \in [0, 1]$ does not have the form $x = k10^{-n}$ for any
positive integers $k$ and $n$. Then $x$ is in one of the open intervals
$(0, \frac{1}{10}), (\frac{1}{10}, \frac{2}{10}), \ldots, (\frac{9}{10}, 1)$. If $x$ is in the $m$th interval, define $a_1 =
m - 1$ and notice that $0 < (x - .a_1 000\ldots) < \frac{1}{10}$. Now divide the
interval up into 10 parts each of length $\frac{1}{100}$ and use this to choose
$a_2$. Continue in this manner and prove that $.a_1 a_2 a_3 \ldots$ is a decimal
expansion for $x$.
(d) Prove that if two decimal expansions, $.a_1 a_2 a_3 \ldots$ and $.b_1 b_2 b_3 \ldots$, are
distinct and neither ends in a string of 9's, then the numbers which
they represent are different. Hint: let $n$ be the first integer such that
$a_n \neq b_n$. Then, either $a_n > b_n$ or $a_n < b_n$. If the first is true, prove
that $.a_1 a_2 a_3 \ldots > .b_1 b_2 b_3 \ldots$.
(e) Prove that every $x \in (0, 1)$ that is of the form $k10^{-n}$ has exactly two
decimal expansions, one ending in 0's and the other in 9's.
(f) Prove that every $x \in (0, 1)$ that is not of the form $k10^{-n}$ has a unique
decimal expansion that does not end in a string of 0's or 9's.
CHAPTER 7

Differential Equations

In many applications of mathematics, the fundamental assumption
states that the rate of change (i.e., the derivative) of an important function
is equal to an expression involving other quantities and (sometimes)
the function itself. The problem is to "solve" the resulting differential
equation and find the function. Because of the importance of this
problem, it is not surprising that the study of differential equations has
played a central role in mathematics for the past three centuries. Indeed,
the branch of mathematics which we call analysis began when Newton
invented calculus so that he could write down the differential equations
for the motions of the planets.
In the first section we shall see that, under mild assumptions, differential
equations always have unique solutions if we specify the value of
the unknown function at an initial point. This solution may only exist
for a short time, however. Therefore, in Section 7.2 we give conditions
which guarantee that solutions are global, that is, that they exist for all
times. If the differential equation is very simple, one may be able to find
an explicit expression for the solution. More often, one needs machine
computation to solve the differential equation approximately. In Section
7.3, we prove an error estimate for Euler's method, the simplest approximation
algorithm. Throughout, the idea is to show in a simple way the
usefulness of the analytic concepts that we have developed. For further
study of differential equations, see [2], [4], [21], and [7].

7.1 Local Existence

To see what kind of theorem we could try to prove, it is useful to start
with a simple, familiar example.

Example 1 Consider the differential equation

y'(t) = 2y(t).

For every choice of the constant c, the function y(t) = ce^{2t} satisfies
the differential equation. Thus, the differential equation has a whole
family of solutions. Often one refers to both the function y(t) and to
its graph (which is a curve in the t-y plane) as a “solution.” If the
value of y(t) at a particular time t = t_0 is given, then c is determined.
To see this, suppose y(t_0) = y_0. Then y_0 = y(t_0) = ce^{2t_0}, so
c = y_0 e^{-2t_0}. Thus, given any point (t_0, y_0) in the plane, there is
a function y(t) which solves the differential equation and whose graph
passes through the point. The condition y(t_0) = y_0 is called an initial
condition because the value of y is being specified at the “initial”
time t_0. Note, however, that the solution y(t) is determined for times
before t_0 as well as for times after t_0.
Conversely, suppose that y_1(t) = c_1 e^{2t} and y_2(t) = c_2 e^{2t} are
two solutions which are equal at some time t_1. Then
c_1 e^{2t_1} = c_2 e^{2t_1}, from which it follows that c_1 must equal c_2.
Therefore, the solutions are equal for all times t. Thus, distinct
solutions can never have the same value at the same time. In terms of
the geometry of solution curves in the plane, we can restate what we
have proven as follows: every point in the plane has a solution curve
going through it, and distinct solution curves never cross. Several of
these solution curves are shown in Figure 7.1.1.
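
The computation of c in Example 1 can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names `solution` and `derivative` are illustrative assumptions. It builds the solution through a given point (t_0, y_0) and compares a finite-difference derivative with 2y(t).

```python
import math

def solution(t, t0, y0):
    """Solution of y' = 2y through (t0, y0): y(t) = c*e^(2t) with c = y0*e^(-2*t0)."""
    c = y0 * math.exp(-2.0 * t0)
    return c * math.exp(2.0 * t)

def derivative(t, t0, y0, h=1e-6):
    """Central-difference approximation of y'(t); h is an assumed step size."""
    return (solution(t + h, t0, y0) - solution(t - h, t0, y0)) / (2.0 * h)

if __name__ == "__main__":
    t0, y0 = 1.0, 3.0
    for t in (-1.0, 0.0, 1.0, 2.0):
        y = solution(t, t0, y0)
        # The two printed columns should agree up to finite-difference error.
        print(t, derivative(t, t0, y0), 2.0 * y)
```

Note that the formula is evaluated at times before t_0 as well as after, in line with the remark that the initial condition determines the solution in both directions.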

We would like to show that the initial-value problem for a general
first-order differential equation

y'(t) = f(t, y(t)), y(t_0) = y_0 (1)

has the same nice properties as the solutions in Example 1. Here f is a
function of two independent variables, t_0 is a real number, and the
value of y at t_0 is specified. The proof is quite difficult for two
reasons. First, for a general f we have no way of exhibiting explicit
functions which solve the differential equation, as we did in Example 1.
This is even true for quite simple functions f, such as
f(t,y) = sin(2y) or f(t,y) = t^2 + y sin(2y). Thus, we will have to prove
that solutions exist and have the right properties rather than just
checking the properties of a given family of functions, as we did in
Example 1. The second difficulty is that solutions may exist only for
short times, as shown by the following example.

Example 2 Solving the differential equation

y'(t) = y(t)^2, y(0) = y_0

by the method of separation of variables (project 1 of Chapter 4), we
find that

y(t) = 1 / (1/y_0 - t)

if y_0 ≠ 0. If y_0 = 0, the solution is y(t) = 0. For y_0 > 0,
y(t) → +∞ as t ↗ 1/y_0. So, not only does the solution exist for only a
finite time interval, but also the amount of time it exists depends on
the initial condition y_0. Notice, however, that the solution exists for
all negative times. For y_0 < 0, the situation is just the opposite. The
solution exists for all positive times but diverges to -∞ as t ↘ 1/y_0.
The cases y_0 = ±1 are shown in Figure 7.1.2.
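
The blow-up in Example 2 is easy to see numerically. The sketch below is an illustration only; the names `y_exact` and `blow_up_time` are assumptions introduced here. It uses the equivalent form y(t) = y_0/(1 - y_0 t) of the explicit solution and evaluates it close to the time 1/y_0.

```python
def y_exact(t, y0):
    """Explicit solution of y' = y^2, y(0) = y0, written as y0 / (1 - y0*t).

    Only meaningful on the side of t = 1/y0 that contains t = 0.
    """
    if y0 == 0:
        return 0.0
    return y0 / (1.0 - y0 * t)

def blow_up_time(y0):
    """Time at which the solution with y(0) = y0 (y0 != 0) ceases to exist."""
    return 1.0 / y0

if __name__ == "__main__":
    y0 = 2.0
    t_star = blow_up_time(y0)
    for gap in (1e-1, 1e-2, 1e-3, 1e-4):
        # The values grow without bound as t approaches 1/y0 from below.
        print(t_star - gap, y_exact(t_star - gap, y0))
```

Changing y0 changes `blow_up_time(y0)`, which is the point of the example: the length of the existence interval depends on the initial condition.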

□ Theorem 7.1.1 Let f be continuously differentiable in a square

S = [t_0 - δ, t_0 + δ] × [y_0 - δ, y_0 + δ]

centered at (t_0, y_0). Then there is a T < δ and a unique, continuously
differentiable function y(t) defined on [t_0 - T, t_0 + T] so that (1)
holds.

Proof. If y(t) is continuously differentiable, then by the Fundamental
Theorem of Calculus,

y(t) - y_0 = ∫_{t_0}^{t} y'(s) ds,

and so if y(t) satisfies (1),

y(t) = y_0 + ∫_{t_0}^{t} f(s, y(s)) ds. (2)

We will solve (2) by an iteration method and then show that the solution
satisfies (1). Notice that (2) allows us to reformulate both the
differential equation and the initial condition into a single condition
on the function y(t). Let T > 0 be a number such that T < δ, and let
y_1(t) = y_0. We define functions y_n(t) inductively for n ≥ 1 by

y_{n+1}(t) = y_0 + ∫_{t_0}^{t} f(s, y_n(s)) ds. (3)

We shall show that if T is small enough, this definition makes sense and
that the resulting sequence of functions, {y_n(t)}, is a Cauchy sequence
in C[t_0 - T, t_0 + T]. For simplicity, we denote the interval
[t_0 - T, t_0 + T] by I_T. Throughout, we shall denote the sup norm on
C[I_T] by || · ||_{∞,T} to emphasize the dependence on T.

Figure 7.1.3

Suppose that 0 < T < δ/M where M = max_S |f(t,y)|. Since f is
continuous on S, M < ∞ (Theorem 4.6.1). We shall first prove that each
function y_n(t) is continuous on I_T and its values lie in
[y_0 - δ, y_0 + δ]. Since y_1(t) = y_0, this is certainly true for n = 1.
Suppose that it is true for y_n. Then, since the values of y_n lie in
[y_0 - δ, y_0 + δ], we know that f(s, y_n(s)) is well defined on I_T.
Furthermore, f(s, y_n(s)) is continuous on I_T since the composition of
continuous functions is continuous (problem 4 of Section 4.6). Thus, the
integral of f(s, y_n(s)) from t_0 to t is a continuous function of t
(problem 13 of Section 3.3). It follows that y_{n+1} is continuous, and
for t ∈ I_T,

|y_{n+1}(t) - y_0| ≤ | ∫_{t_0}^{t} |f(s, y_n(s))| ds |
                   ≤ M|t - t_0|
                   ≤ MT
                   ≤ δ,

so the values of y_{n+1} lie in [y_0 - δ, y_0 + δ]. By induction, each
function y_n is continuous on I_T and its values on I_T lie in
[y_0 - δ, y_0 + δ].
By hypothesis, f is continuously differentiable on S, so

K ≡ max_S |∂f/∂y| < ∞.

For all β_1 and β_2 in [y_0 - δ, y_0 + δ], the Mean Value Theorem implies
that there is a ξ between β_1 and β_2 so that

|f(t,β_1) - f(t,β_2)| ≤ |∂f/∂y (t,ξ)| |β_1 - β_2| (4)
                      ≤ K|β_1 - β_2|. (5)

Using (5), we estimate

|y_{n+1}(t) - y_n(t)| ≤ | ∫_{t_0}^{t} |f(s, y_n(s)) - f(s, y_{n-1}(s))| ds |
                      ≤ | ∫_{t_0}^{t} K|y_n(s) - y_{n-1}(s)| ds |
                      ≤ K|t - t_0| ||y_n - y_{n-1}||_{∞,T}
                      ≤ KT ||y_n - y_{n-1}||_{∞,T}.

Thus, taking the supremum of the left-hand side over all t ∈ I_T, we
obtain

||y_{n+1} - y_n||_{∞,T} ≤ α ||y_n - y_{n-1}||_{∞,T} (6)

where α = KT. If we choose T < min{1/K, δ}, then α < 1. It follows
in exactly the same way as in the proof of Theorem 5.4.1 that {y_n} is a
Cauchy sequence in the sup norm in C[I_T]. Thus, by Theorem 5.3.3, there
is a continuous function y(t) on I_T such that y_n → y uniformly. One can
also use the contraction mapping principle to prove the existence of y
(see problem 5). Since estimate (5) holds,

|f(s, y_n(s)) - f(s, y(s))| ≤ K|y_n(s) - y(s)|

for all s ∈ I_T. It follows that f(s, y_n(s)) converges uniformly to
f(s, y(s)) on I_T. Therefore, using Theorem 5.2.2 and (3), we find

y(t) = lim_{n→∞} y_{n+1}(t)
     = lim_{n→∞} ( y_0 + ∫_{t_0}^{t} f(s, y_n(s)) ds )
     = y_0 + ∫_{t_0}^{t} f(s, y(s)) ds.

The function y therefore satisfies (2) and y(t_0) = y_0. Since f(s, y(s))
is continuous, the Fundamental Theorem of Calculus implies that y(t) is
continuously differentiable and (1) holds.
To see that y(t) is unique, suppose that z(t) is another continuously
differentiable function on I_T that satisfies (1). Then z(t) satisfies
the integral equation (2) also. Subtracting the integral equation for
y(t) from the integral equation for z(t), we find

z(t) - y(t) = ∫_{t_0}^{t} ( f(s, z(s)) - f(s, y(s)) ) ds. (7)

The same estimates as above show that
||z - y||_{∞,T} ≤ α ||z - y||_{∞,T}. But, since α < 1, this can only be
true if ||z - y||_{∞,T} = 0, which implies that z(t) = y(t) for all t in
the interval. Thus, y(t) is unique. □
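
The iteration (3) can be carried out by hand when f is simple. As an illustration only, the following Python sketch applies it to the equation of Example 1, y'(t) = 2y(t), with t_0 = 0 and y_0 = 1, representing each iterate as a list of exact polynomial coefficients. The function names and the choice of `fractions.Fraction` are assumptions made for this sketch. In this special case the iterates turn out to be the Taylor partial sums of e^{2t}.

```python
from fractions import Fraction

def picard_step(coeffs, y0, rate=2):
    """One step of iteration (3) for f(t, y) = rate*y with t0 = 0.

    coeffs[k] is the coefficient of t^k in y_n; the result represents
    y_{n+1}(t) = y0 + integral from 0 to t of rate * y_n(s) ds.
    """
    integrated = [Fraction(rate) * c / (k + 1) for k, c in enumerate(coeffs)]
    return [Fraction(y0)] + integrated

def picard_iterates(n_steps, y0=1, rate=2):
    """Return [y_1, y_2, ..., y_{n_steps+1}], starting from y_1(t) = y0."""
    y = [Fraction(y0)]
    iterates = [y]
    for _ in range(n_steps):
        y = picard_step(y, y0, rate)
        iterates.append(y)
    return iterates

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at t."""
    return float(sum(c * Fraction(t) ** k for k, c in enumerate(coeffs)))

if __name__ == "__main__":
    for n, y in enumerate(picard_iterates(6), start=1):
        # Successive iterates at t = 0.25 approach e^{0.5}.
        print(n, evaluate(y, 0.25))
```

For a general f the integrals in (3) cannot be computed in closed form, which is why the proof works with estimates in the sup norm rather than with explicit formulas.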

Theorem 7.1.1 is called a local existence theorem because we have
proven the existence of a solution of (1) only on a small time interval
[t_0 - T, t_0 + T] containing t_0. That is, there is a solution curve only
near the point (t_0, y_0) in the plane. We saw in Example 2 that one
cannot expect existence for all times without additional hypotheses on
f. This is the subject of Section 7.2. The hypothesis that f is
continuously differentiable can be weakened considerably. The same proof
shows that a unique y(t) exists locally if f is uniformly Lipschitz
continuous (problem 8). We assumed that f is continuously differentiable
for t larger than t_0 and t smaller than t_0, so we obtained a solution
curve on both sides of t_0. If one knows only that f is continuously
differentiable for t ≥ t_0, then the same proof gives local existence of a
unique solution y(t) on the interval [t_0, t_0 + T].

Note the interesting line of argument in the proof. We showed the
existence of a continuous function y(t) that satisfied the integral
equation (2). It followed automatically (from the Fundamental Theorem of
Calculus) that y(t) was continuously differentiable. This same line of
reasoning can be used to prove that y(t) is, in fact, infinitely often
continuously differentiable if f is infinitely often continuously
differentiable (problem 9). Ordinary differential equations are a special
case of a class of partial differential equations, called elliptic
equations, for which this kind of differentiability holds. Although
Theorem 7.1.1 gives local existence and uniqueness for a single
first-order differential equation, a very similar proof works for
first-order systems of equations; see project 1. Furthermore,
higher-order equations can be reduced to systems of first-order
equations (problem 11), so the ideas of Theorem 7.1.1 are very general.

If f is only continuous, it can be shown that a local solution exists
by other methods. However, in that case, uniqueness may be lost, as the
following example shows.

Example 3 Consider the initial-value problem

y'(t) = √(y(t)), y(0) = 0.

The method of separation of variables yields the solution y(t) = t^2/4 on
[0, ∞), but the function y(t) = 0 is also a solution. The function
f(y) = √y is continuous on [0, ∞). Its derivative, f'(y) = 1/(2√y), is
continuous for y > 0 but not at y = 0. Thus, if the initial condition
were y(0) = y_0 with y_0 > 0, then Theorem 7.1.1 would guarantee a unique
local solution. But if y_0 = 0, the theorem cannot be used because the
hypothesis on f is not satisfied.
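
Both solutions in Example 3 can be checked directly. In the short Python sketch below (an illustration; the names `residual`, `parabola`, and `zero` are assumptions), each candidate is given as a pair consisting of the function and its derivative, and the residual y'(t) - √(y(t)) is evaluated at several times t ≥ 0.

```python
import math

def residual(y, dy, t):
    """Return y'(t) - sqrt(y(t)); this is zero when y solves y' = sqrt(y) at t."""
    return dy(t) - math.sqrt(y(t))

# Two different solutions with the same initial value y(0) = 0.
parabola = (lambda t: t * t / 4.0, lambda t: t / 2.0)
zero = (lambda t: 0.0, lambda t: 0.0)

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, residual(*parabola, t), residual(*zero, t))
```

Both residuals vanish (up to rounding), although the two functions differ for every t > 0, so uniqueness fails for this initial condition.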

We saw in Example 1 that distinct solution curves cannot cross. It is
a very important consequence of Theorem 7.1.1 that this is true in the
general case too. This fact is often used in proofs of global existence;
see Example 3 in Section 7.2.

□ Theorem 7.1.2 Let y(t) and z(t) be continuously differentiable solutions
of (1) on an interval [a, b]. If there is one point t_0 ∈ [a, b] such that
y(t_0) = z(t_0), then y(t) = z(t) for all t ∈ [a, b].

Proof. Suppose y(t_0) = z(t_0) at a point t_0 ∈ [a, b]. Then, both y(t) and
z(t) satisfy (1) and the same initial condition at t_0. Thus, the
uniqueness statement in Theorem 7.1.1 implies that y(t) = z(t) for all t
in some interval I_T containing t_0. See Figure 7.1.4. Let
t_1 = sup{t | y(t) = z(t)} and suppose that t_1 < b. We know that
z(t_1) = y(t_1) since z and y are continuous functions which are equal in
the interval [t_0, t_1). However, the local existence theorem would then
guarantee that y(t) = z(t) for all t in some interval containing t_1,
which would contradict the definition of t_1. Thus, we must have
t_1 ≥ b, so y(t) = z(t) on the interval [t_0, b]. The proof that
y(t) = z(t) on the interval [a, t_0] is similar. □

Figure 7.1.4

Finally, we show that the solution which we have constructed in
Theorem 7.1.1 depends continuously on the initial value y_0.

□ Theorem 7.1.3 Let f, S, M, and K be as in Theorem 7.1.1, and let y(t)
be the solution of (1) on the interval I_T where T < min{1/K, δ/M}.
Suppose that ỹ_0 satisfies |ỹ_0 - y_0| < δ/2, and define T̃ = T/2. Then the
solution ỹ(t) of (1) satisfying ỹ(0) = ỹ_0 exists on the interval I_T̃ and
satisfies

|ỹ(t) - y(t)| ≤ e^{Kt} |ỹ_0 - y_0|. (8)

Proof. By Theorem 7.1.1, the solution ỹ(t) exists in some small time
interval about t_0. Since ỹ(t) is continuous, its graph cannot escape from
S immediately and as long as it remains in S, ỹ(t) satisfies the estimate

|ỹ(t) - ỹ_0| ≤ M|t - t_0|.

Thus,

|ỹ(t) - y_0| ≤ |ỹ(t) - ỹ_0| + |ỹ_0 - y_0|
             ≤ M|t - t_0| + δ/2
             ≤ δ/2 + δ/2

if |t - t_0| ≤ δ/2M. Therefore, the solution ỹ(t) remains in the rectangle
for t ∈ I_T̃. Now define

σ(t) = (ỹ(t) - y(t))^2.

Since both solutions are continuously differentiable, σ(t) is
continuously differentiable and

σ'(t) = 2(ỹ(t) - y(t))(ỹ'(t) - y'(t)) (9)
      = 2(ỹ(t) - y(t))(f(t, ỹ(t)) - f(t, y(t))) (10)
      ≤ 2K|ỹ(t) - y(t)|^2 (11)
      = 2Kσ(t), (12)

where we used the Mean Value Theorem in step (11). Since σ(t) satisfies
this differential inequality, Proposition 7.2.2 (proven in the next
section) guarantees that

σ(t) ≤ e^{2Kt} σ(0).

Taking the square root of both sides yields (8). □
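
Estimate (8) can be illustrated numerically for the equation y'(t) = sin(y(t)), for which K = 1 is an admissible constant since |cos y| ≤ 1. The sketch below is an assumption-laden illustration, not part of the text: it integrates with a hand-written classical Runge-Kutta routine (the hypothetical helper `rk4`, chosen only because it is accurate enough that discretization error is negligible here) and compares the gap between two nearby solutions with the bound e^{Kt}|ỹ_0 - y_0|.

```python
import math

def rk4(f, t0, y0, t_end, steps=1000):
    """Approximate y(t_end) for y' = f(t, y), y(t0) = y0, by classical RK4."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

def rhs(t, y):
    """Right-hand side f(t, y) = sin(y); its y-derivative is bounded by K = 1."""
    return math.sin(y)

if __name__ == "__main__":
    y0, y0_tilde, K = 1.0, 1.05, 1.0
    for t in (0.1, 0.25, 0.5):
        gap = abs(rk4(rhs, 0.0, y0_tilde, t) - rk4(rhs, 0.0, y0, t))
        bound = math.exp(K * t) * abs(y0_tilde - y0)
        print(t, gap, bound)
```

The computed gap stays below the bound, which is typically far from sharp because K is a worst-case constant.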

Problems

1. Use the method of separation of variables to find explicit solutions of
the following initial-value problems. In both cases, sketch the graphs
of the solution curves for several different values of y_0.

(a) y'(t) = t y(t), y(0) = y_0.


(b) y'(t) = -2t y(t)^2, y(0) = y_0.

2. Consider the initial-value problem

y'(t) = sin(y(t)), y(0) = 1.

Let δ = 1. Determine values for the constants M and K used in the proof
of Theorem 7.1.1. Prove that a solution exists on the interval [-1/2, 1/2].

3. Consider the initial-value problem

y'(t) = t^2 + y(t) sin(2y(t)), y(1) = 3.

Let δ = 1. Determine values for the constants M and K used in the proof
of Theorem 7.1.1. On what interval I_T does Theorem 7.1.1 guarantee that
a solution exists?

4. Consider the initial-value problem

y'(t) = y(t)^2, y(0) = 2.

Let δ = 1. Determine values for the constants M and K used in the proof
of Theorem 7.1.1. On what interval I_T does Theorem 7.1.1 guarantee that
a solution exists?

5. Show how to use the contraction mapping principle to avoid mimicking
the convergence argument of Theorem 5.4.1 in the proof of Theorem 7.1.1.

6. Let y(t) be the solution of the initial-value problem in problem 2. Let
ỹ(t) be the solution of the same differential equation with the initial
condition ỹ(0) = 1.05. Estimate |ỹ(t) - y(t)| on the interval [-1/2, 1/2].

7. Suppose that the two solutions, y(t) and ỹ(t), described in problem 6
exist for all times t ∈ R. Use the proof of Theorem 7.1.3 to estimate
|ỹ(t) - y(t)|.

8. Suppose that f(t, y) is uniformly Lipschitz continuous in y in the
square S. That is, assume that the M in the definition of Lipschitz
continuous can be chosen uniformly for all t ∈ I_T. Explain carefully why
the proof of Theorem 7.1.1 still works.

9. Suppose that f is infinitely often continuously differentiable in S.
Prove that the solution y(t), whose existence was shown in Theorem 7.1.1,
is infinitely often continuously differentiable on I_T.

10. For each of the following functions, f(t,y), find the set of initial
values y(0) = y_0 for which Theorem 7.1.1 or problem 8 guarantees a
unique local solution y(t):

(a) f(t,y) = t^2 sin y.


(b) f(t,y) = √(y - 2).
(c) f(t,y) = (y)5.

(d) f(t,y) = (cos t)√(1 - y^2).

(e) f(t,y) = y^{-1} sin y.

11. Let g be a function of n + 1 variables. Show that the nth order
differential equation

y^{(n)}(t) = g(t, y(t), y'(t), ..., y^{(n-1)}(t))

can be converted to a system of first-order equations by the change of
variables z_0(t) = y(t), z_1(t) = y'(t), ..., z_{n-1}(t) = y^{(n-1)}(t).
What hypotheses on g and what type of initial conditions do you think
are necessary to prove a local existence result for the system? What
form do these initial conditions take in terms of y(t)?

12. Suppose that f is a continuously differentiable function on R^2. In
Theorem 7.1.1 a solution of (1) was constructed on the interval
[t_0 - T, t_0 + T]. Take (t_0 + T, y(t_0 + T)) as the initial point and
explain carefully why the same proof shows that there is a T_1 > 0 so
that the solution can be extended to the interval
[t_0 - T, t_0 + T + T_1]. If we repeat this process, does this prove that
the solution can be extended to the whole interval [t_0 - T, ∞)?

7.2 Global Existence


Theorem 7.1.1 guarantees that solutions of differential equations exist
locally, that is, for short times. We would like to derive conditions on
f which guarantee that solutions exist for all times. Example 2 in
Section 7.1 shows that the solution of an ordinary differential equation
may go to +∞ at a finite time t_1 if f grows quickly in y. The solution
ceases to exist at t_1 in the sense that there is no continuously
differentiable function on an interval containing t_1 that equals the
given solution to the left of t_1. Are there other ways in which a
solution could stop being a solution? Perhaps solutions could oscillate
faster and faster or remain bounded and suddenly become
non-differentiable. The following theorem shows that if f is
continuously differentiable everywhere, then the only way solutions stop
being solutions is by going to +∞ or -∞ in finite time.

□ Theorem 7.2.1 Let f be continuously differentiable on R^2, and let y(t)
be the local solution of

y'(t) = f(t, y(t)), y(t_0) = y_0

given by Theorem 7.1.1. Define

t_1 = sup{s | y(t) is a solution on [t_0, s)}

and suppose t_1 < ∞. Then, as t ↗ t_1, either y(t) → ∞ or y(t) → -∞.

Proof. We will show that if y(t) does not converge either to +∞ or -∞,
then y(t) can be extended past t_1, contradicting the definition of t_1.
If y(t) → ∞ as t ↗ t_1, then for every N > 0 there is μ such that
y(t) > N for all t > t_1 - μ. Similarly, if y(t) → -∞ as t ↗ t_1, then for
every N > 0 there is μ so that y(t) < -N for all t > t_1 - μ. Therefore,
if y(t) doesn't converge either to +∞ or to -∞, there is an N > 0 so that
every interval [t_1 - μ, t_1) contains at least one point t_μ such that
-N ≤ y(t_μ) ≤ N.

Consider the initial-value problem

z'(t) = f(t, z(t)), z(t_μ) = y(t_μ). (13)

We will choose μ and the corresponding t_μ below. Let S* be the square
{(t, y) | |t| ≤ |t_1| + 1, |y| ≤ N + 1} and define

M* = max_{S*} |f(t,y)|, K* = max_{S*} |∂f/∂y|.

For each μ, the δ in Theorem 7.1.1 can be taken to be 1 because, by
hypothesis, f is continuously differentiable on the whole plane. If we
do so, then, for each μ the corresponding square S_μ of Theorem 7.1.1 is
contained in S* and thus M_μ and K_μ are no larger than M* and K*,
respectively. Now, the length of time of existence, T, of the local
solution constructed in Theorem 7.1.1 depended only on the constants M
and K. Therefore, if we choose T* so that T* < 1/M* and T* < 1/K*, then
the solution of each of the initial-value problems (13) exists for a
time T* independent of μ (since M* and K* are independent of μ). Choose
μ small enough so that t_1 - μ + T* > t_1. Then, since t_μ ≥ t_1 - μ, we
have

t_μ + T* ≥ t_1 - μ + T* > t_1.

Thus, the solution z(t) of (13) exists on the interval [t_μ, t_μ + T*),
which contains t_1. See Figure 7.2.1. By the uniqueness proven in
Theorem 7.1.1, the solution z(t) coincides with y(t) on the interval
[t_μ, t_1) where both solutions exist. Therefore z(t) extends the solution
y(t) past t_1, which contradicts the definition of t_1. Thus, either
y(t) → ∞ as t ↗ t_1 or y(t) → -∞ as t ↗ t_1. □

Figure 7.2.1

This theorem is so useful because if we can show that a solution can't
go to +∞ or -∞ in finite time, then the solution must exist for all times
t. In this case we say that the differential equation has a global
solution.

Example 1 Consider the initial-value problem

y'(t) = sin(g(t, y(t))), y(t_0) = y_0

where g is some continuously differentiable function of two variables.
Even if g is very simple (for example g(t, y) = y), we can't “solve” this
equation in the sense of writing down a solution in terms of elementary
functions. But we can say that the solution exists for all times t. The
reason is simple. Whatever the function y(t) is, we know from the
equation that

|y'(t)| = |sin(g(t, y(t)))| ≤ 1

for all times t. Thus in a time interval of length T, y(t) cannot
increase or decrease by more than T, so y(t) cannot approach +∞ or -∞ in
finite time. Theorem 7.2.1 implies, therefore, that the solution is
global.

Even when y'(t) grows, y(t) can't go to infinity in finite time if y'(t)
doesn't grow too fast. For example, the solution of y'(t) = y(t) with
y(0) = 1 is the function y(t) = e^t, so y'(t) does indeed grow
exponentially. Nevertheless, the solution exists for all times. Suppose
that we consider the initial-value problem

y'{t) = im = y°- (14)


As long as y_0 > 0, y(t) will increase and remain positive. From (14) it
follows that y'(t) ≤ y(t) for all t, so it seems reasonable to guess that
the solution of (14) is less than e^t and therefore can't go to infinity
in finite time. This is true and is so useful that we state it separately
as a proposition.

Proposition 7.2.2 Let y(t) be a continuously differentiable function on
a finite interval [a, b]. Suppose that y(t) satisfies y'(t) ≤ My(t) for
all t ∈ [a, b]. Then y(t) ≤ y(a)e^{M(t-a)} on [a, b].

Proof. Define x(t) = y(t)e^{-Mt}. Then, for all t ∈ [a, b],

x'(t) = e^{-Mt}(y'(t) - My(t)) ≤ 0,

so, by the Fundamental Theorem of Calculus, x(t) ≤ x(a). Substituting
for x in terms of y gives the desired inequality. □

The proposition enables us to prove an extremely useful theorem that
allows us to compare the solutions of two different equations if we can
compare the functions on the right-hand sides.

□ Theorem 7.2.3 Suppose that f and g are continuously differentiable
functions of two variables that satisfy

f(t,y) ≤ g(t,y) (15)

for all points (t, y) in the strip a ≤ t ≤ b, -∞ < y < ∞. Suppose that
y(t) and z(t) satisfy the differential equations

y'(t) = f(t, y(t)), z'(t) = g(t, z(t))

on the interval [a, b]. Then,

(a) If y(a) ≤ z(a), then y(t) ≤ z(t) for all t ∈ [a, b].

(b) If y(b) ≥ z(b), then y(t) ≥ z(t) for all t ∈ [a, b].

Proof. We shall prove (a); the proof of (b) is similar. Suppose that
y(a) ≤ z(a) and that there is a t_2 in the interval [a, b] such that
y(t_2) > z(t_2). We will show that this leads to a contradiction. Let t_1
be the supremum of the set of t ∈ [a, t_2] such that y(t) ≤ z(t). The
hypothesis y(a) ≤ z(a) shows that the set is nonempty. Since y and z are
continuous functions, we know that y(t_1) = z(t_1). In particular,
t_1 < t_2. On the interval [t_1, t_2], we define h(t) = y(t) - z(t). Since
y and z are continuous on [t_1, t_2], they are bounded. Thus, there is a
constant B so that |y(t)| ≤ B and |z(t)| ≤ B for all t ∈ [t_1, t_2]. Let K
be the supremum of |∂f/∂y| on the rectangle [t_1, t_2] × [-B, B]. Then for
t ∈ [t_1, t_2],

h'(t) = y'(t) - z'(t) (16)
      = f(t, y(t)) - g(t, z(t)) (17)
      ≤ f(t, y(t)) - f(t, z(t)) (18)
      ≤ K|y(t) - z(t)| (19)
      = K(y(t) - z(t)) (20)
      = Kh(t). (21)

We used the hypothesis (15) in step (18) and the Mean Value Theorem in
step (19). Since h'(t) ≤ Kh(t) on [t_1, t_2], Proposition 7.2.2 assures us
that h(t) ≤ h(t_1)e^{K(t - t_1)}. But h(t) is nonnegative and h(t_1) = 0,
so h(t) = 0 for all t ∈ [t_1, t_2]. Thus, y(t_2) ≤ z(t_2). Since this
violates our assumption about t_2, the proof is complete. □

Theorems 7.2.1 and 7.2.3 can be combined to give useful conditions
for existence on long time intervals.

□ Theorem 7.2.4 Let f, g, and h be continuously differentiable functions
of two variables that satisfy g(t,y) ≤ f(t,y) ≤ h(t,y) for all a ≤ t ≤ b
and -∞ < y < ∞. Suppose t_0 ∈ [a, b]. Suppose that solutions x(t) and
z(t) of the initial-value problems

x'(t) = g(t, x(t)), x(t_0) = y_0 (22)

and

z'(t) = h(t, z(t)), z(t_0) = y_0 (23)

exist on the interval [a, b]. Then, the solution of

y'(t) = f(t, y(t)), y(t_0) = y_0 (24)

exists on the interval [a, b]. Furthermore, x(t) ≤ y(t) ≤ z(t) for all
t ∈ [t_0, b] and z(t) ≤ y(t) ≤ x(t) for all t ∈ [a, t_0].

Proof. By Theorem 7.1.1, the equation (24) has a local solution y(t)
which exists near t = t_0. By Theorem 7.2.1, either y(t) exists on [a, b]
or it goes to +∞ or to -∞ as t approaches some t_1 ∈ [a, b]. Suppose
t_1 > t_0. On the interval [t_0, t_1), part (a) of Theorem 7.2.3 guarantees
that y(t) is bounded above by z(t) and bounded below by x(t), so y(t)
cannot go to +∞ or to -∞ as t ↗ t_1. Therefore, the solution exists for
all t ∈ [t_0, b] and the estimate x(t) ≤ y(t) ≤ z(t) holds. A similar
proof, using part (b) of Theorem 7.2.3, shows that the solution exists
for all t ∈ [a, t_0] and that the estimate z(t) ≤ y(t) ≤ x(t) holds
there. □

Example 2 Consider the initial-value problem

y'(t) = 2t^2 (cos t)(sin y(t)) + 5y(t), y(0) = y_0. (25)

Choose M > 0. For t ∈ [-M, M],

-2M^2 + 5y ≤ 2t^2 (cos t)(sin y) + 5y ≤ 2M^2 + 5y.

By separation of variables, one can easily solve each of the equations
y'(t) = -2M^2 + 5y(t) and y'(t) = 2M^2 + 5y(t) with initial condition
y(0) = y_0 and observe that their solutions exist on [-M, M]. By
Theorem 7.2.4, a solution of (25) exists on the interval [-M, M]. Since M
is arbitrary, the solution exists globally.

Example 3 The fact that orbits cannot cross (Theorem 7.1.2) can
sometimes be used to show that solutions exist on infinite time
intervals. Consider the initial-value problem

y'(t) = -y{t)1 2 3 4, 2/(0) = yQ > o.

Since y'(t) ≤ 0, the solution is decreasing and therefore cannot approach
+∞. On the other hand, the t-axis is the orbit of the solution that is
identically zero for all t. Since y(t) starts positive and cannot cross
the t-axis, the solution remains positive. Thus it cannot approach -∞.
Since the solution cannot approach either +∞ or -∞ on the interval
[0, ∞), by Theorem 7.2.1 it exists on the entire interval [0, ∞).

It is worthwhile to emphasize two points. First, even when a general
criterion like Theorem 7.2.4 does not hold, a differential equation may
have global solutions. One must use other, more detailed properties of
f to prove it. Second, a differential equation may have global solutions
for some initial conditions but not for others. Both these points are
illustrated by problem 3.

Problems

1. Solve the following initial-value problem explicitly and determine for
which p > 0 the solution goes to ∞ in finite time.

y'(t) = y(t)^p, y(0) = 1.

Note: for x > 0 and p > 0 we define x^p = e^{p ln x}.

2. Solve the initial-value problem in Example 3 explicitly and verify that
the solution exists on [0, ∞) and satisfies 0 < y(t) ≤ y_0.

3. Consider the initial-value problem

y'(t) = (y(t) - 1)(y(t) - 2), y(0) = y_0.

(a) What are the solutions if y_0 = 1 or y_0 = 2?
(b) Prove that the solution is global if 1 < y_0 < 2.
(c) Prove, without solving explicitly, that the solution goes to +∞ in
finite time if y_0 > 2.

4. Prove global existence for the initial-value problem

y'(t) = sin y(t), y(0) = y_0.

5. Prove global existence for the solution of the initial-value problem

y'(t) = 2 y(t) + ^sin y(t), y(0) = yQ.

6. Without solving explicitly, prove global existence for the
initial-value problem

y'(t) = t + y(t), y(0) = y_0.

Hint: for t ∈ [-N, N], the inequality -N + y ≤ t + y ≤ N + y holds.

7. Let g be a bounded continuously differentiable function on R^2. Show
that the solution, y_ε(t), of

y'(t) = y(t) + εg(t, y(t)), y(0) = 1

exists globally and converges uniformly to e^t on each finite interval as
ε → 0. Hint: estimate y_ε(t) from above and below.

8. Suppose that 0 < ε < y_0. Prove that the solution of the initial-value
problem

y'(t) = (y(t))^2 + ε sin y(t), y(0) = y_0

converges to +∞ in finite time.

9. Suppose that all the hypotheses of Theorem 7.2.3 hold except that the
hypothesis that / and g are continuously differentiable is weakened to the
statement that both / and g are continuous and one of them is uniformly
Lipschitz continuous. Prove that the conclusion still holds.

7.3 The Error Estimate for Euler's Method

Most differential equations cannot be solved analytically in terms of
familiar elementary functions like polynomials, trigonometric functions,
exponentials, and so forth. Analytical expressions are extremely
valuable when they exist because they allow one to see explicitly the
dependence of the solution on the parameters of the problem, for
example, the initial condition or coefficients. When closed-form
expressions do not exist, sometimes power series or transform methods
allow representations of the solution. But, in general, most
differential equations have to be solved approximately by machine
computation, and an important part of the design of algorithms is the
proof of error estimates. In this section we show how to estimate the
error in Euler's method, the simplest numerical method for approximating
the solutions of differential equations.
Suppose that we wish to approximate the solution of the differential
equation

y'(t) = f(t, y(t)), y(a) = y_0 (26)

on the interval [a, b]. We divide the interval into N equal parts of
length h = (b - a)/N by setting t_0 = a, t_1 = a + h, t_2 = a + 2h, ...,
t_N = b. We take the initial value y_0 as the first approximate value.
We know the value of y at t_0, and the differential equation tells us the
value of y' at t_0, namely, f(t_0, y(t_0)). Thus it is natural to
approximate y(t) on the interval [t_0, t_1] by y_0 + (t - t_0)f(t_0, y_0)
since this straight line has the same value and slope as y(t) at t_0.
This gives us the approximation

y_1 = y_0 + hf(t_0, y_0)

for the value of y(t_1) at t_1. Using y_1, we can approximate
y'(t_1) = f(t_1, y(t_1)) ≈ f(t_1, y_1), and this enables us to define the
straight line approximation y_1 + (t - t_1)f(t_1, y_1) on the interval
[t_1, t_2]. This second straight line gives us the approximation

y_2 = y_1 + hf(t_1, y_1)

for the value of y(t_2). Continuing in this manner, we define recursively

y_{n+1} = y_n + hf(t_n, y_n) (27)

for n = 0, ..., N - 1. Connecting the points (t_n, y_n) by straight lines
gives the polygonal approximation to y(t) first used by Euler and known
as Euler's method.

Example 1 We will use Euler's method to approximate the solution of

y'(t) = -2y(t) + 5t, y(0) = 5 (28)

on the interval [0, 2]. We will use 8 subintervals, so h = .25 and
t_n = .25n. The recursion relation (27) for the approximate values at the
points t_n is y_{n+1} = y_n + (.25)(-2y_n + 5t_n). Carrying out the
recursion starting with t_0 = 0 and y_0 = 5, we get the table of values
and the polygonal graph shown in Figure 7.3.1. This equation has a simple
closed-form solution, so for comparison we have listed its values at the
points t_n in the right-hand column and drawn its graph.

n t_n y_n y(t_n)

0 0 5 5
1 .25 2.5 3.17
2 .5 1.56 2.30
3 .75 1.41 2.02
4 1.0 1.64 2.10
5 1.25 2.07 2.39
6 1.5 2.60 2.81
7 1.75 3.17 3.31
8 2.0 3.77 3.86

Figure 7.3.1
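
The table can be reproduced with a few lines of code. The following Python sketch is an illustration only; the function names `euler`, `f`, and `exact` are assumptions introduced here, and `exact` encodes the closed-form solution y(t) = 2.5t - 1.25 + 6.25e^{-2t}, which one can verify by substituting into (28). The sketch applies the recursion y_{n+1} = y_n + h f(t_n, y_n) with h = .25.

```python
import math

def euler(f, a, b, y0, n):
    """Euler's method for y' = f(t, y), y(a) = y0, with n equal steps on [a, b].

    Returns the grid points t_0..t_n and the approximate values y_0..y_n.
    """
    h = (b - a) / n
    ts = [a + k * h for k in range(n + 1)]
    ys = [y0]
    for k in range(n):
        ys.append(ys[k] + h * f(ts[k], ys[k]))
    return ts, ys

def f(t, y):
    """Right-hand side of (28)."""
    return -2.0 * y + 5.0 * t

def exact(t):
    """Closed-form solution of (28) with y(0) = 5."""
    return 2.5 * t - 1.25 + 6.25 * math.exp(-2.0 * t)

if __name__ == "__main__":
    ts, ys = euler(f, 0.0, 2.0, 5.0, 8)
    for n, (t, y) in enumerate(zip(ts, ys)):
        # Columns: n, t_n, Euler value y_n, exact value y(t_n).
        print(n, t, round(y, 2), round(exact(t), 2))
```

Increasing the last argument of `euler` shrinks h and brings the polygonal approximation closer to the exact solution.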

If the solution of (26) is a straight line, Euler's method gives the
solution exactly. By the Fundamental Theorem of Calculus (or by Taylor's
theorem), the deviation of y(t) from a straight line can be estimated if
we can bound the second derivative of y. Thus, we expect that error
estimates for Euler's method should involve bounds on the second
derivative of y(t), that is, on the first derivatives of f.

□ Theorem 7.3.1 Let f be a continuously differentiable function of two
variables on a rectangle R of the form R = [a, b] × [y_0 - c, y_0 + c].
Suppose that the solution y(t) of (26) exists on the interval [a, b] and
that the graph of y(t) lies in the rectangle. Let N, h, {t_n}, and {y_n}
be as defined above and suppose that the points {(t_n, y_n)} all lie
within R. Then, for each n,

|y(t_n) - y_n| ≤ (hL/2K)(e^{(b-a)K} - 1), (29)

where K = sup_R |∂f/∂y| and L = sup_R |∂f/∂t + f ∂f/∂y|.

Proof. The numbers {yn} are defined by the recursion relation (27).
We can also write a recursion relation for the numbers {y{tn)} as fol¬
lows. Since y(t) and / are continuously differentiable, the composition
f(t, y(t)) is also continuously differentiable. By (26), y'(t) is continuously
differentiable, so y(t) is twice continuously differentiable. Therefore, by
Taylor's theorem.

2! y"(&)
y{pn+1) y(tn) + hy'(tn) +
292 Chapter 7. Differential Equations

for some point between tn and tn+\. Since y(t) satisfies (26), we can
rewrite this as

h2_ df_ fdi


y(*n+i) = y{tn) + hf(tn,y(tn)) + (*»,y(6»)). (30)
2! dt 1 dy\

In the third term on the right, all three functions are evaluated at the
point (£my (£«))• Subtracting (27) from (30) and taking absolute values,
we find

h2
|y(*n+i) - y»»+i| < \y{tn) - yn\ + h\f(tn,y{tn)) - f{tn,yn)\ + —L (31)
h2
< \y(tn) - yn\ + hK\y(tn) - yn)\ + ^L, (32)

where we used the Mean Value Theorem in the second step. Throughout,
we used the hypotheses that the points $(t_n, y(t_n))$ and $(t_n, y_n)$ are in the
rectangle. For simplicity, we write the error at the $n$th step as $E_n =
|y(t_n) - y_n|$ and set $A = 1 + hK$ and $B = \frac{h^2}{2}L$. Then (32) can be written

$$E_{n+1} \le A E_n + B.$$

Iterating this inequality, and using $E_0 = 0$ together with the partial sum of
the geometric series, gives

$$E_{n+1} \le B(1 + A + A^2 + \cdots + A^n) = B\,\frac{A^{n+1} - 1}{A - 1} = \frac{hL}{2K}\left((1 + hK)^{n+1} - 1\right).$$

To get an estimate that is independent of $n$, we use the estimate $1 + Kh \le e^{Kh}$,
which follows easily from the power series representation for $e^{Kh}$
or by the Fundamental Theorem of Calculus (problem 3). Since

$$(1 + Kh)^{n+1} \le (e^{Kh})^{n+1} \le (e^{Kh})^N = e^{KhN} = e^{K(b-a)},$$

we obtain

$$|y(t_n) - y_n| \le \frac{hL}{2K}\left(e^{K(b-a)} - 1\right),$$

which is what we set out to prove. □

Euler's method is called a first-order method because the error bound
decreases proportionally to the first power of the step size h. We note that
7.3 The Error Estimate for Euler's Method 293

the right-hand side of (29) is an upper bound for the error; the actual error
may be much less. In addition, we used the rather crude estimate
$1 + x \le e^x$ in order to get a bound independent of $n$. Nevertheless,
it is true that Euler's method is not a very efficient method in that one
must make $h$ very small (thus the number of intervals, $N$, very large) in
order to approximate the solution well. As in the case of the numerical
estimation of integrals discussed in Section 3.4, there are serious drawbacks
to choosing $h$ too small because of the tiny but real round-off error
that may occur with each computational step. This is explored further
in project 2. Thus, the design of higher-order methods and the proof of
error bounds have played (and play) an important role in the study of
ordinary and partial differential equations. The proofs of the error estimates
for higher-order algorithms are more complicated than the one
above but use the same analytical ideas.
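The first-order behavior described above can be observed numerically. The following is a minimal sketch, not part of the original text: the test problem $y' = y$, $y(0) = 1$ on $[0, 1]$ (exact solution $e^t$) and the helper names `euler` and `final_error` are assumptions chosen for illustration. Halving the step size should roughly halve the error at $t = 1$.

```python
import math

def euler(f, t0, y0, t_end, n_steps):
    """Euler's method y_{k+1} = y_k + h*f(t_k, y_k); returns the list of iterates."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    ys = [y0]
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t = t + h
        ys.append(y)
    return ys

def final_error(n_steps):
    """Error at t = 1 for the illustrative problem y' = y, y(0) = 1."""
    ys = euler(lambda t, y: y, 0.0, 1.0, 1.0, n_steps)
    return abs(math.e - ys[-1])

err_coarse = final_error(100)    # h = 0.01
err_fine = final_error(200)      # h = 0.005
ratio = err_coarse / err_fine    # close to 2 for a first-order method
print(err_coarse, err_fine, ratio)
```

A ratio near 2 is consistent with an error proportional to $h$; a second-order method would give a ratio near 4 under the same experiment.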
A natural question has probably come to mind. What is this mysterious
rectangle R which occurs in the hypotheses of Theorem 7.3.1? Since
we are using numerical techniques precisely because we can't solve the
equation explicitly, how can we know what R is? The answer is that we
must derive estimates on the solution by using the ideas of Section 7.2.

Example 2 Suppose that we want to use Euler's method to approximate
the solution of

$$y'(t) = \sin y(t), \qquad y(0) = 3 \tag{33}$$

on the interval $[0, 2]$. Then $f(t, y) = \sin y$, so $\frac{\partial f}{\partial t} = 0$ and $\frac{\partial f}{\partial y} = \cos y$. Thus,
whatever the rectangle $R$ is, the constants $K$ and $L$ in Theorem 7.3.1 can
be taken to equal 1. Therefore, estimate (29) is

$$|y(t_n) - y_n| \le \frac{h}{2}(e^2 - 1).$$

If we want the Euler's method points $(t_n, y_n)$ to be within $10^{-3}$ of the
true solution, we can guarantee that if we choose $h$ small enough so that

$$\frac{h}{2}(e^2 - 1) \le 10^{-3}.$$
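The step-size requirement in Example 2 can be checked with a short computation. This is a sketch under stated assumptions: the function names are hypothetical, and the closed-form solution $y(t) = 2\arctan(\tan(y_0/2)\,e^t)$ (obtained by separating variables) is used only to measure the actual error; it does not appear in the text.

```python
import math

def euler(f, t0, y0, t_end, n_steps):
    """Euler's method; returns the grid points t_k and the iterates y_k."""
    h = (t_end - t0) / n_steps
    ts = [t0 + k * h for k in range(n_steps + 1)]
    ys = [y0]
    for k in range(n_steps):
        ys.append(ys[-1] + h * f(ts[k], ys[-1]))
    return ts, ys

a, b, y0 = 0.0, 2.0, 3.0
K = L = 1.0          # constants from Example 2
tol = 1e-3

# Smallest N for which (h L / 2K)(e^{(b-a)K} - 1) <= tol with h = (b - a)/N.
n_steps = math.ceil((b - a) * L * (math.exp((b - a) * K) - 1) / (2 * K * tol))
h = (b - a) / n_steps
bound = h * L / (2 * K) * (math.exp((b - a) * K) - 1)

ts, ys = euler(lambda t, y: math.sin(y), a, y0, b, n_steps)

def exact(t):
    """Closed-form solution of y' = sin y, y(0) = 3 (used only for checking)."""
    return 2 * math.atan(math.tan(y0 / 2) * math.exp(t))

max_err = max(abs(exact(t) - y) for t, y in zip(ts, ys))
print(n_steps, bound, max_err)
```

The observed error is expected to be well below the bound, which illustrates the remark that (29) is only an upper bound.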

In Example 2 we didn't need to determine a suitable R because the
function f and its derivatives were uniformly bounded everywhere.
That is not the case in the following example.

Example 3 Let's use Euler's method to approximate the solution of

$$y'(t) = t\,y(t)\sin y(t), \qquad y(0) = 3$$

on the interval $[0,2]$. Since $f(t,y) = ty \sin y$, we have $\frac{\partial f}{\partial t} = y\sin y$ and
$\frac{\partial f}{\partial y} = t\sin y + ty\cos y$. Notice that the curve $\{(t, 0) \mid 0 \le t < \infty\}$ is an
orbit, so, by Theorem 7.1.2, the solution, $y(t)$, remains positive. Thus, for
$0 \le t \le 2$,

$$-2y \le ty \sin y \le 2y,$$

so by Theorem 7.2.4, $y(t)$ exists on the interval $[0, 2]$ and

$$3e^{-2t} \le y(t) \le 3e^{2t}.$$

Notice that we do not know what the solution $y(t)$ is, but we have estimates
on it from above and below. Thus, the solution curve $y(t)$
remains in the rectangle

$$R = [0,2] \times [0, 3e^4].$$

To see that the same is true for the Euler's method iterates, we estimate

$$y_{n+1} = y_n + h t_n y_n \sin y_n \le (1 + 2h)y_n,$$

so

$$y_n \le 3(1 + 2h)^n \le 3e^{2hN} \le 3e^4.$$

Similarly,

$$y_{n+1} = y_n + h t_n y_n \sin y_n \ge (1 - 2h)y_n,$$

so if $h < \frac{1}{2}$ we have $y_n > 0$ for all $n$. Thus, the points $(t_n, y_n)$ also remain
in the rectangle $R$. In the rectangle, we know that $|f(t, y)| \le 6e^4$, so we
can estimate

$$K = \sup_R \left|\frac{\partial f}{\partial y}\right| \le 2 + 6e^4$$

and

$$L = \sup_R \left|\frac{\partial f}{\partial t} + f\,\frac{\partial f}{\partial y}\right| \le 3e^4 + 6e^4(2 + 6e^4).$$

Using these estimates for $K$ and $L$, we can determine, by Theorem 7.3.1
and (29), how small we must choose $h$ to guarantee that the Euler's
method approximations are within any given distance from the true values,
$y(t_n)$.
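To see what the estimates of Example 3 imply in practice, one can insert the bounds for $K$ and $L$ into (29) and solve for $h$. The sketch below does this for a tolerance of $10^{-3}$; that tolerance is an assumption borrowed from Example 2 for illustration, since Example 3 does not fix one.

```python
import math

a, b = 0.0, 2.0
tol = 1e-3                    # illustrative tolerance (not specified in Example 3)

e4 = math.exp(4)
K = 2 + 6 * e4                        # bound on sup |df/dy| over R
L = 3 * e4 + 6 * e4 * (2 + 6 * e4)    # bound on sup |df/dt + f df/dy| over R

# (29) reads  error <= (h L / 2K)(e^{(b-a)K} - 1); solve for the largest admissible h.
growth = math.exp((b - a) * K) - 1
h_max = 2 * K * tol / (L * growth)
print(K, L, h_max)
```

The resulting step size is astronomically small, far below anything usable on a real machine. This is consistent with the earlier remark that (29) is only an upper bound and that the actual error may be much less.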

Problems
1. Use Euler's method to approximate the solution to the following initial-
value problem on the interval [0,2]:

y'(t) = ty(t)- 2, y(0) = 3.

2. Use Euler's method to approximate the solution to the following initial-
value problem on the interval [0,2]:

$$y'(t) = t^2 + e^{-y(t)^2}, \qquad y(0) = 0.$$

3. Use the Fundamental Theorem of Calculus to prove that $1 + x \le e^x$ for
all $x \ge 0$.

4. Use Theorem 7.3.1 and the methods of Example 3 to determine how small
$h$ must be chosen so that the Euler points $y_n$ are within $10^{-4}$ of the true
values $y(t_n)$ for the differential equation in problem 1 if $y(0) = 4$.

5. Use Theorem 7.3.1 and the methods of Example 3 to determine how small
$h$ must be chosen so that the Euler points $y_n$ are within $10^{-4}$ of the true
values $y(t_n)$ for the differential equation in problem 2.

6. Use Euler's method to approximate the solution to the following initial-
value problem on the interval [0,2]:

$$y'(t) = y(t)^2, \qquad y(0) = 1.$$

Did you see the blowup of the solution at t = 1? Why not?

7. Use Euler's method to approximate the solution to the following initial-
value problem on the interval [0,6]:

$$y'(t) = \sin(e^t), \qquad y(0) = 1.$$

Do you think that Euler's method gives a good approximation? Why or
why not? What is the size of the error bound given by (29) for your choice
of $h$?
8. Design an Euler's method to solve a pair of coupled ordinary differential
equations:

$$y'(t) = f(t, y(t), z(t)), \qquad y(t_0) = y_0$$
$$z'(t) = g(t, y(t), z(t)), \qquad z(t_0) = z_0.$$

9. In a certain model of competing species, $N_1(t)$ and $N_2(t)$ represent the
amount of species 1 and species 2, respectively, in some normalized units.
Suppose that $N_1(t)$ and $N_2(t)$ satisfy the following differential equations:

K(t) = jV!(t)(i - JV,(f) - ijv2(()), iVi(0) = ni

= w2(t)(i - W2(t) - N2(0) = n2


K(t)

Use the Euler's method scheme from problem 8 to investigate the behavior
of solution curves $(N_1(t), N_2(t))$ in the $N_1$-$N_2$ plane for different
choices of $(n_1, n_2)$.

10. In a certain model of predator-prey interactions, the numbers of the two
species, $N_1(t)$ and $N_2(t)$, satisfy the following differential equations:

$$N_1'(t) = N_1(t) - N_1(t)N_2(t), \qquad N_1(0) = n_1$$
$$N_2'(t) = -N_2(t) + N_1(t)N_2(t), \qquad N_2(0) = n_2.$$

Use the Euler's method scheme from problem 8 to investigate the behavior
of solution curves $(N_1(t), N_2(t))$ in the $N_1$-$N_2$ plane for different
choices of $(n_1, n_2)$. In fact, all solutions are periodic in $t$. Is that what you
found? Why not?

Projects

1. We have seen that some differential equations can be solved explicitly,
and we have analyzed a simple numerical method for finding approximate
solutions to others. We can also obtain information about solutions
by using analytical tools without solving explicitly or using numerical
methods. We will illustrate some of the simplest ideas in the case of the
logistic equation

$$y'(t) = a\,y(t)(b - y(t)), \qquad y(0) = y_0.$$

Here $a$, $b$, and $y_0$ are positive and $y(t)$ represents the population size at
time $t$ in some units.

(a) First we suppose that $y_0 < b$. Explain why $y(t)$ is always increasing.
Explain why there can be no time $t$ at which $y(t) = b$. (Hint: recall
Theorem 7.1.2.) Explain why the solution $y(t)$ exists for all positive
times and is unique.
(b) Again suppose that $y_0 < b$. Explain why $c = \lim_{t \to \infty} y(t)$ exists.
Prove that if $c < b$, then $y(t)$ will be eventually higher than $c$, giving
a contradiction. Conclude that $y(t) \to b$ as $t \to \infty$.
(c) Again suppose that $y_0 < b$. Compute $y''(t)$ in terms of $y(t)$ and use
the result to help you draw an accurate sketch of the solution.
(d) Suppose that $y_0 > b$. Use the ideas in (a), (b), and (c) to show that
the solution exists for all positive times, and draw an accurate graph
of it.

2. The purpose of this project is to analyze the trade-off between small step
size and round-off error in Euler's method. Every time the computer
calculates an Euler iterate, it makes an error whose size is bounded above
Projects 297

by a number $\epsilon$, which depends on the machine being used. Thus, instead
of computing the iterates given by (27) in Section 7.3, the machine is really
computing the iterates $\tilde{y}_n$ that are given by the recursion relation

$$\tilde{y}_{n+1} = \tilde{y}_n + h f(t_n, \tilde{y}_n) + \epsilon_n, \tag{34}$$

where we know that the numbers $\epsilon_n$ satisfy $|\epsilon_n| \le \epsilon$.

(a) Using the terminology and hypotheses of Theorem 7.3.1, except that
we replace (27) in Section 7.3 by (34) above and $y_n$ by $\tilde{y}_n$, prove that

$$E_{n+1} \le (1 + hK)E_n + \frac{h^2}{2}L + \epsilon.$$

(b) Iterate this inequality to prove that for all $n \le N$,

$$E_n \le \left(\frac{Lh}{2K} + \frac{\epsilon}{hK}\right)\left(e^{(b-a)K} - 1\right).$$

(c) Explain why the error bound cannot be made arbitrarily small no
matter how we choose $h$. How should one choose $h$ to make the
error bound as small as possible?
(d) Suppose that $\epsilon = 10^{-8}$. What is the maximum accuracy you can get
by using Euler's method for the differential equations in problems
1 and 2 of Section 7.3?

3. The purpose of this project is to show how a local existence and uniqueness
result analogous to Theorem 7.1.1 can be proved for systems. We
will consider the initial-value problem for the system

$$y'(t) = f(t, y(t), z(t)), \qquad y(t_0) = y_0 \tag{35}$$
$$z'(t) = g(t, y(t), z(t)), \qquad z(t_0) = z_0 \tag{36}$$

where $f$ and $g$ are continuously differentiable functions of three independent
variables. Let $I_T = [t_0 - T, t_0 + T]$, as in Section 7.1. We denote the
set of pairs of continuous functions on $I_T$ with values in $\mathbb{R}$ by $C(I_T : \mathbb{R}^2)$.
If $(f, g) \in C(I_T : \mathbb{R}^2)$, we define

$$\|(f, g)\|_\infty = \|f\|_\infty + \|g\|_\infty.$$

According to problem 7 in Section 5.3, $C(I_T : \mathbb{R}^2)$ is a complete normed
linear space.

(a) Show that if $y(t)$ and $z(t)$ satisfy (35) and (36), then they also satisfy
a pair of integral equations.
(b) Follow the proof of Theorem 7.1.1 to show that if $T$ is small enough,
the integral equations can be solved by iteration.
(c) Show that the solutions of the integral equations are continuously
differentiable and satisfy the differential equations.

(d) Show that the solutions are unique.

4. Suppose that $y(t)$ and $z(t)$ satisfy (35) and (36) on a time interval $a \le t \le b$.
The curve $\{(y(t), z(t)) \mid t \in [a, b]\}$ in the $y$-$z$ plane is called an orbit.
Prove the analogue of Theorem 7.1.2 by showing that if two orbits over
the time interval $[a, b]$ cross, then they are identical.

5. Suppose that $y(t)$ and $z(t)$ satisfy (35) and (36) on a time interval $a < t < b$.
The solution pair $(y(t), z(t))$ is said to go to $\infty$ at finite time $b$ if, for
every $M$, there is a $t_M < b$ so that

$$\sqrt{y(t)^2 + z(t)^2} > M \quad \text{for } t_M < t < b.$$

We define similarly what it means for $(y(t), z(t))$ to go to $\infty$ at finite time
$a$.

(a) Explain geometrically what these definitions mean.
(b) Prove the analogue of Theorem 7.2.1. That is, show that if the local
solution of (35) and (36) does not go to $\infty$ in finite time, then the
solution exists for all times.
(c) Let $n_1 > 0$ and $n_2 > 0$, and let $N_1(t)$ and $N_2(t)$ be the local solutions
of the differential equations in problem 9 of Section 7.3, which are
guaranteed to exist by project 3. Prove that the solution must stay
in the positive orthant. Hint: use the result of project 4.
(d) Prove that the solutions $N_1(t)$ and $N_2(t)$ exist for all positive times.
Hint: what are the signs of $N_1'(t)$ and $N_2'(t)$ if either $N_1(t)$ or $N_2(t)$
is large?
CHAPTER 8

Complex Analysis

In this chapter we develop the rudiments of the theory of analytic functions
of one complex variable. We have three main purposes. The first
is to show that the ideas of classical real analysis (the Riemann integral,
series, interchanging limits and integration, etc.) play a fundamental
role in complex analysis. Second, we want to answer the question about
which functions of a real variable are equal to their Taylor series. Finally,
we want to introduce the class of analytic functions because it plays a
crucial role in both classical and modern analysis and in the applications
of analysis. Throughout, we use the properties of real-valued functions
of two variables, discussed in Section 4.6. For the further development
of the subject and many applications, see [6] and [23].

8.1 Analytic Functions


The analytic functions are a special class of functions which are defined
on (subsets of) the complex numbers and take values in the complex
numbers. The natural domains of definition of analytic functions are
open sets in the complex plane. We say that a set $D$ is open if for every
$z \in D$ there is a circle about $z$ so that every complex number in the interior
of the circle is also in $D$. Thus, for example, the set $\{z \in \mathbb{C} \mid \operatorname{Re}(z) > 0\}$ is
open while the set $\{z \in \mathbb{C} \mid \operatorname{Re}(z) \ge 0\}$ is not. Other examples are given in
problem 1.

Definition. Let $D$ be an open set in $\mathbb{C}$. A function $f$ defined on $D$, taking
values in $\mathbb{C}$, is called analytic if for every $z \in D$ the limit

$$\lim_{h \to 0} \frac{f(z + h) - f(z)}{h} = f'(z) \tag{1}$$

exists.
300 Chapter 8. Complex Analysis

In the definition of analytic, each $h$ is a complex number. Saying that
$h \to 0$ means that $|h| \to 0$. Thus $h$ can approach zero in many different
ways. For example, if $h_n = \frac{1}{n}$, then the sequence $\{h_n\}$ approaches zero
along the positive real axis. If $h_n = -\frac{i}{n}$, then the sequence $\{h_n\}$ approaches
zero along the negative imaginary axis. Let $\theta$ be a fixed small
real number and define $h_n = \frac{1}{n}e^{in\theta}$. Then, the complex numbers $\{h_n\}$
spiral in toward zero. Thus, the statement that the limit in (1) exists
means that the limit is independent of the way in which $h \to 0$. We shall
see that this is a very strong restriction and that analytic functions have
very special properties.
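The dependence on the direction of approach can be seen numerically. The sketch below evaluates the difference quotient in (1) along three small increments of the form $1/n$, $-i/n$, and $e^{in\theta}/n$ (one reading of the three approaches described above). The two test functions, $z^2$ and complex conjugation, and the names used are choices made here for illustration.

```python
import cmath

def diff_quotients(f, z, theta=0.3, n=1000):
    """Difference quotients (f(z+h) - f(z))/h for three small increments h."""
    hs = {
        "real axis": 1 / n,                        # h = 1/n
        "imaginary axis": -1j / n,                 # h = -i/n
        "spiral": cmath.exp(1j * n * theta) / n,   # h = e^{i n theta}/n
    }
    return {name: (f(z + h) - f(z)) / h for name, h in hs.items()}

z = 1 + 2j
square = diff_quotients(lambda w: w * w, z)         # all three close to 2z
conj = diff_quotients(lambda w: w.conjugate(), z)   # value depends on direction

for name in square:
    print(name, square[name], conj[name])
```

For $z^2$ the three quotients agree up to a term of size $|h|$, while for conjugation the quotient equals $\bar{h}/h$, which is $1$ along the real axis and $-1$ along the imaginary axis, so no single limit exists.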
Recall that every complex number $z = x + iy$ is specified by the pair
of real numbers $(x, y)$. Since the value of $f$ at $z$ is a complex number,
it is specified by giving its real and imaginary parts, $u$ and $v$. Since the
value of $f$ depends on $z$ and therefore on $x$ and $y$, $u$ and $v$ will depend
on $x$ and $y$. Thus, specifying the function $f$ is the same as specifying two
real-valued functions, $u$ and $v$, defined on (a subset of) $\mathbb{R}^2$ so that

$$f(z) = u(x, y) + iv(x, y).$$

The functions $u(x, y)$ and $v(x, y)$ are called the real and imaginary parts
of the function $f$.

Example 1 Suppose $f$ is the function on $\mathbb{C}$ which squares each complex
number; that is, $f(z) = z^2$. Then, since $f(z) = (x + iy)^2 = (x^2 - y^2) + 2xyi$,
we have $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$.

As we shall see, analyticity puts certain restrictions on $u$ and $v$. In
what follows, we discuss only functions $f$ whose real and imaginary
parts are continuously differentiable. First, suppose that $h \to 0$ through
real numbers; that is, $h = \lambda$ where $\lambda$ is real. Then,

$$\frac{f(z + h) - f(z)}{h} = \frac{(u(x + \lambda, y) + iv(x + \lambda, y)) - (u(x, y) + iv(x, y))}{\lambda}$$
$$= \frac{u(x + \lambda, y) - u(x, y)}{\lambda} + i\,\frac{v(x + \lambda, y) - v(x, y)}{\lambda}.$$

Thus,

$$\lim_{h \to 0} \frac{f(z + h) - f(z)}{h} = u_x(x, y) + iv_x(x, y), \tag{2}$$

where $u_x$ and $v_x$ are the partial derivatives of $u$ and $v$ with respect to
$x$. On the other hand, suppose that $h \to 0$ through purely imaginary
8.1 Analytic Functions 301

numbers. That is, $h = i\mu$ where $\mu$ is real and $\mu \to 0$. Then,

$$\frac{f(z + h) - f(z)}{h} = \frac{(u(x, y + \mu) + iv(x, y + \mu)) - (u(x, y) + iv(x, y))}{i\mu}$$
$$= \frac{v(x, y + \mu) - v(x, y)}{\mu} - i\,\frac{u(x, y + \mu) - u(x, y)}{\mu},$$

so

$$\lim_{h \to 0} \frac{f(z + h) - f(z)}{h} = v_y(x, y) - iu_y(x, y). \tag{3}$$

According to the definition, if $f$ is analytic, (2) and (3) must be the same,
so

$$u_x(x, y) + iv_x(x, y) = v_y(x, y) - iu_y(x, y).$$

Since two complex numbers are equal if and only if their real and imaginary
parts are equal,

$$u_x(x, y) = v_y(x, y) \tag{4}$$
$$u_y(x, y) = -v_x(x, y). \tag{5}$$

Equations (4) and (5) are known as the Cauchy-Riemann equations.
They are not only necessary but, under suitable hypotheses, also sufficient
for $f$ to be analytic.

□ Theorem 8.1.1 Suppose that $u$ and $v$ are continuously differentiable
real-valued functions defined on a domain $D \subset \mathbb{C}$. Then $f(z) =
u(x, y) + iv(x, y)$ is analytic on $D$ if and only if the Cauchy-Riemann equations
hold in $D$.

Proof. We have already shown that if $f$ is analytic, then the Cauchy-Riemann
equations hold. We shall prove the converse. Suppose that the
Cauchy-Riemann equations hold in $D$. Let $z$ be in $D$ and let $h_n = x_n + iy_n$
be any sequence such that $z + h_n \in D$ for all $n$ and $h_n \to 0$. It follows
that $|h_n| \to 0$, and this implies that $x_n \to 0$ and $y_n \to 0$. For simplicity of
notation, we suppose that $z = 0$; the proof for a general $z$ is the same but
the expressions for the difference quotients are larger.

$$\frac{f(h_n) - f(0)}{h_n} = \frac{u(x_n, y_n) - u(0, 0)}{x_n + iy_n} + i\,\frac{v(x_n, y_n) - v(0, 0)}{x_n + iy_n} \tag{6}$$

$$= \frac{u(x_n, y_n) - u(0, y_n) + u(0, y_n) - u(0, 0)}{x_n + iy_n} \tag{7}$$

$$\quad + i\,\frac{v(x_n, y_n) - v(x_n, 0) + v(x_n, 0) - v(0, 0)}{x_n + iy_n} \tag{8}$$

$$= \frac{u_x(\tau_1, y_n)x_n + u_y(0, \tau_2)y_n}{x_n + iy_n} + i\,\frac{v_y(x_n, \tau_3)y_n + v_x(\tau_4, 0)x_n}{x_n + iy_n} \tag{9}$$

$$= \frac{u_x(\tau_1, y_n)x_n + iu_x(x_n, \tau_3)y_n}{x_n + iy_n} + i\,\frac{v_x(\tau_4, 0)x_n + iv_x(0, \tau_2)y_n}{x_n + iy_n} \tag{10}$$

$$\to u_x(0, 0) + iv_x(0, 0). \tag{11}$$

In going from (7) and (8) to (9), we used the Mean Value Theorem four
times. In going from (9) to (10), we used the Cauchy-Riemann equations
and rearranged the terms. From the Mean Value Theorem, we know that
$\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$ all converge to zero as $x_n \to 0$ and $y_n \to 0$. Using the
continuity of $u_x$ and $v_x$, the convergence in the last step follows from the
result in problem 2. □

Example 2 (polynomials) Suppose that $f(z) = c = c_1 + ic_2$ is a constant
function. Since all the partial derivatives of $u = c_1$ and $v = c_2$ are zero,
the Cauchy-Riemann equations hold and the function is analytic. Now,
suppose $f(z) = z = x + iy$. Then $u(x, y) = x$ and $v(x, y) = y$, and it is
easy to check that the Cauchy-Riemann equations hold; thus, $x + iy$ is
analytic in the whole complex plane, $\mathbb{C}$. Furthermore, sums, products,
quotients, and the composition of analytic functions are again analytic
(except where the denominators vanish). These results can be proved in
the same way in which we proved that sums, products, quotients, and
compositions of differentiable functions are again differentiable (Theorems
4.1.2 and 4.1.3). Thus, polynomials in $z$ and quotients of polynomials
in $z$ are analytic except where the denominators vanish. The fact that
such functions are analytic can also be proved directly; see problem 3.
Note that if $p(x, y)$ and $q(x, y)$ are polynomials in $x$ and $y$, then

$$f(z) = p(x, y) + iq(x, y)$$

will be a well-defined continuously differentiable (in fact, infinitely often
differentiable) function from $\mathbb{C}$ to $\mathbb{C}$. But, in general, $f$ will not be analytic

because the Cauchy-Riemann equations won't hold. For example, the
function $f(z) = x - iy$ is not analytic. In this case, $u(x, y) = x$ and
$v(x, y) = -y$, so $u_x = 1 \ne -1 = v_y$. If, however, $f$ is a polynomial
in $(x + iy)$, then, by the argument above, $f$ is analytic and the Cauchy-Riemann
equations hold. For example, suppose $f(z) = z^2 = x^2 - y^2 +
i(2xy)$. In this case, $u = x^2 - y^2$ and $v = 2xy$, so

$$u_x = 2x = v_y \quad \text{and} \quad u_y = -2y = -v_x.$$

Thus $f(z) = z^2$ is analytic.
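The two examples just discussed can be checked with finite differences. The following sketch estimates the Cauchy-Riemann residuals $u_x - v_y$ and $u_y + v_x$ at a sample point; the function name `cr_residuals`, the sample point, and the step size are assumptions made for illustration.

```python
def cr_residuals(f, x, y, d=1e-5):
    """Central-difference estimates of (u_x - v_y, u_y + v_x) at the point (x, y)."""
    fx = (f(complex(x + d, y)) - f(complex(x - d, y))) / (2 * d)   # u_x + i v_x
    fy = (f(complex(x, y + d)) - f(complex(x, y - d))) / (2 * d)   # u_y + i v_y
    return fx.real - fy.imag, fy.real + fx.imag

r_square = cr_residuals(lambda z: z * z, 0.7, -1.3)          # f(z) = z^2
r_conj = cr_residuals(lambda z: z.conjugate(), 0.7, -1.3)    # f(z) = x - iy
print(r_square, r_conj)
```

Both residuals are essentially zero for $z^2$, while for $x - iy$ the first residual is $u_x - v_y = 2$, in agreement with the computation above.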

Example 3 (series) Since polynomials in $z$ are analytic it is natural to ask
about power series in $z$. According to Theorem 6.5.3, the power series

$$f(z) = \sum_{j=0}^{\infty} a_j(z - z_0)^j \tag{12}$$

converges uniformly in every closed disk about $z_0$ that is smaller than
the radius of convergence $R$ of the series. And according to Theorem
6.4.2, the function to which a power series converges is infinitely often
differentiable inside the radius of convergence, and the derivatives can
be computed by differentiating the series term by term. Although Theorem
6.4.2 was proved for real power series, the same proof shows that
the result holds for series in powers of $z$. Thus, in order to check whether
the limit of the series in (12) is analytic, we can just differentiate the series
term by term. Let $u_j$ and $v_j$ be the real and imaginary parts of $a_j(z - z_0)^j$.
Then $f(z) = \sum u_j + i\sum v_j$ and

$$\Big(\sum u_j\Big)_x = \sum (u_j)_x = \sum (v_j)_y = \Big(\sum v_j\Big)_y$$

since $(u_j)_x = (v_j)_y$ for each $j$ because $a_j(z - z_0)^j$ is analytic. Similarly,
$(\sum u_j)_y = -(\sum v_j)_x$. Thus, the real and imaginary parts of $f$ satisfy the
Cauchy-Riemann equations so the function $f$, defined by (12), is analytic
inside its radius of convergence.
For example, since the power series for $e^z$ has radius of convergence
$R = \infty$, $e^z$ is analytic on $\mathbb{C}$. Similarly,

$$\sin z = \frac{e^{iz} - e^{-iz}}{2i} \quad \text{and} \quad \cos z = \frac{e^{iz} + e^{-iz}}{2}$$

are analytic functions on $\mathbb{C}$ whose restrictions to the real axis are $\sin x$
and $\cos x$.
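As a numerical illustration of the last remarks, the partial sums of the exponential series can be compared with the library value of $e^z$, and the exponential formula for $\sin z$ can be compared with the library sine. This is only a sketch; the sample point and the number of terms are arbitrary choices.

```python
import cmath

def exp_partial_sum(z, n_terms):
    """Partial sum of the series for e^z: sum of z^j / j! for j = 0, ..., n_terms - 1."""
    total, term = 0j, 1 + 0j
    for j in range(n_terms):
        total += term
        term *= z / (j + 1)
    return total

z = 1.5 - 2j
approx_exp = exp_partial_sum(z, 40)

# sin z = (e^{iz} - e^{-iz}) / (2i), with both exponentials computed from the series.
sin_from_exp = (exp_partial_sum(1j * z, 40) - exp_partial_sum(-1j * z, 40)) / 2j

print(abs(approx_exp - cmath.exp(z)), abs(sin_from_exp - cmath.sin(z)))
```

Both differences should be at the level of floating-point round-off, since forty terms are far more than needed for $|z| = 2.5$.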

Problems

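1. Which of the following subsets of $\mathbb{C}$ are open?

(a) $\{z \in \mathbb{C} \mid |z| < 1\}$.
(b) $\{z \in \mathbb{C} \mid |z| > 1\}$.
(c) $\{z \in \mathbb{C} \mid \ldots \text{ and } \operatorname{Im}(z) > 0\}$.
(d) $\{z \in \mathbb{C} \mid \ldots \text{ and } \operatorname{Im}(z) > 0\}$.
(e) $\{z \in \mathbb{C} \mid p(z) \ne 0\}$, where $p$ is a polynomial in $z$.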

2. Suppose that $\{x_n\}$ and $\{y_n\}$ are sequences of real numbers such that $x_n +
iy_n \ne 0$ for any $n$ and $x_n + iy_n \to 0$. Suppose that $\{\alpha_n\}$ and $\{\beta_n\}$ are
sequences of real numbers such that $\alpha_n \to \gamma$ and $\beta_n \to \gamma$. Prove that

$$\lim_{n \to \infty} \frac{\alpha_n x_n + i\beta_n y_n}{x_n + iy_n} = \gamma.$$

Hint: write $\alpha_n = (\alpha_n - \gamma) + \gamma$ and $\beta_n = (\beta_n - \gamma) + \gamma$.

3. For each of the following functions, find the real and imaginary parts in
terms of $x$ and $y$ and verify that the Cauchy-Riemann equations hold on
the indicated domain:

(a) $f(z) = z^3$ on $\mathbb{C}$.
(b) $f(z) = \frac{1}{z}$ on $\{z \in \mathbb{C} \mid z \ne 0\}$.
(c) $f(z) = \sum_{j=0}^{\infty} z^j$ on $\{z \in \mathbb{C} \mid |z| < 1\}$.

4. Use the Cauchy-Riemann equations to determine which of the following
functions are analytic:

(a) $x^2 - y^2 + i(2xy)$ (b) $e^x \cos y + ie^x \sin y$ (c) $x^2 + iy^2$

(d) x — iy (e) In vJx2 + y2 + i arcsin xa (f) x—^—=•


iy o

5. We say that a function, $f$, which takes $\mathbb{C}$ to $\mathbb{C}$ is continuous if $z_n \to z_0$
implies $\lim_{n \to \infty} f(z_n) = f(z_0)$.

(a) Prove that $f$ is continuous on $\mathbb{C}$ if and only if its real and imaginary
parts are continuous functions on $\mathbb{R}^2$.
(b) Prove that if $f$ is analytic, then $f$ is continuous.

6. Suppose that $f$ and $g$ are analytic functions on $\mathbb{C}$. Prove that $f + g$ and
$fg$ are analytic and that $(f + g)'(z) = f'(z) + g'(z)$ and $(f(z)g(z))' =
f'(z)g(z) + f(z)g'(z)$.

7. Suppose that $f$ and $g$ are analytic functions on $\mathbb{C}$. Prove that $f \circ g$ is analytic
and $(f(g(z)))' = f'(g(z))g'(z)$. Hint: write out the real and imaginary
parts of $f \circ g$ in terms of the real and imaginary parts of $f$ and $g$.
8.2 Integration on Paths 305

8. Say where each of the following functions is analytic and compute its
derivative:
(a) $f(z) = (\cos z)^2 + e^{2z}$.
(b) $f(z) = e^{1/z}$.
(c) $f(z) = e^{1/\sin z}$.
9. Let f(z) = Where is / analytic?
(a) Find a power series representation for / in the region {z | \z\ < 1}.
(b) Find representation for / as an infinite series of powers of - valid
in the region {z \ \z\ > 1}.
(c) Find a power series representation for / around the point z = i.
what is its radius of convergence? Hint: write

(d) Find a power series representation for g(z) = j around the


point z = i.
10. What is the radius of convergence of the series

$$(z - 1) - \tfrac{1}{2}(z - 1)^2 + \tfrac{1}{3}(z - 1)^3 - \tfrac{1}{4}(z - 1)^4 + \cdots?$$

Define $\ln z$ to be this series in the region of convergence. Is $\ln z$ analytic
there? Compute $(\ln z)'$.

8.2 Integration on Paths


We can extend the notions of integration and differentiation easily to
complex-valued functions of a real variable. If $g(t)$ is such a function,
then we say that $g$ is continuous if $\lim_{n \to \infty} g(t_n) = g(t_0)$ whenever
$t_n \to t_0$. We say that $g$ is continuously differentiable if the limit of the
difference quotient $h^{-1}(g(t + h) - g(t))$ exists and is continuous for each
$t$. We can write $g(t) = h(t) + ik(t)$, where $h(t)$ and $k(t)$ are the real and
imaginary parts of $g(t)$. It is easy to check (problem 1) that $g$ is continuous
if and only if $h$ and $k$ are continuous and $g$ is differentiable if and
only if $h$ and $k$ are differentiable, in which case $g'(t) = h'(t) + ik'(t)$. If $g$
is continuous, we define its integral on finite intervals by

$$\int_a^b g(t)\,dt = \int_a^b h(t)\,dt + i\int_a^b k(t)\,dt.$$

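Recall that any complex number can be written in polar form $x + iy =
z = e^{i\theta}|z|$. Note that $|x| \le |z|$ and $\operatorname{Re}\{e^{-i\theta}z\} = |z|$. If we let $\theta =
\operatorname{Arg}\left(\int_a^b g(t)\,dt\right)$, then

$$\left|\int_a^b g(t)\,dt\right| = \operatorname{Re}\left\{e^{-i\theta}\int_a^b g(t)\,dt\right\} \tag{13}$$

$$= \int_a^b \operatorname{Re}\{e^{-i\theta}g(t)\}\,dt \tag{14}$$

$$\le \int_a^b |g(t)|\,dt, \tag{15}$$

where we used Theorem 3.3.4 in the last step.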
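The chain (13)-(15) can be traced numerically for a sample function. In the sketch below, $g(t) = (1 + t)e^{3it}$ on $[0, 2]$ is an arbitrary illustrative choice, and integrals are approximated by midpoint Riemann sums; the helper name `riemann_integral` is hypothetical.

```python
import cmath

def riemann_integral(g, a, b, n=2000):
    """Midpoint Riemann sum for a real- or complex-valued function g on [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def g(t):
    return (1 + t) * cmath.exp(3j * t)   # illustrative complex-valued function

a, b = 0.0, 2.0
integral = riemann_integral(g, a, b)
theta = cmath.phase(integral)            # argument of the integral
lhs = abs(integral)                      # left side of (13)

# Integral of Re{e^{-i theta} g(t)}, the quantity in (14)
rotated = riemann_integral(lambda t: (cmath.exp(-1j * theta) * g(t)).real, a, b)
# Integral of |g(t)|, the right side of (15); here |g(t)| = 1 + t
rhs = riemann_integral(lambda t: abs(g(t)), a, b)

print(lhs, rotated, rhs)
```

The first two printed numbers agree, and both are smaller than the third, which equals $\int_0^2 (1 + t)\,dt = 4$.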

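Example 1 (the Fourier transform) Let $f$ be a real-valued continuous
function on $\mathbb{R}$ such that the improper Riemann integral $\int_{-\infty}^{\infty} |f(t)|\,dt$ exists.
According to problem 12 of Section 3.6, it follows that the improper
Riemann integrals $\int_{-\infty}^{\infty} (\sin t)f(t)\,dt$ and $\int_{-\infty}^{\infty} (\cos t)f(t)\,dt$ exist also. For
every $x \in \mathbb{R}$, we define

$$\hat{f}(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-ixt}f(t)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (\cos xt)f(t)\,dt - \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (\sin xt)f(t)\,dt.$$

The function $\hat{f}$ is called the Fourier transform of $f$. The function $f$ can
sometimes be recovered from its Fourier transform by the formula

$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ixt}\hat{f}(x)\,dx.$$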

Detailed analysis of the Fourier transform requires more advanced techniques
(like the Lebesgue integral) than we have at our disposal. However,
we do want to highlight one important property of the Fourier
transform. Suppose that $f$ is zero outside the interval $[a, b]$. For every
complex number $z = x + iy$, we define

$$\hat{f}(z) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-izt}f(t)\,dt.$$

Notice that the integral makes sense since the interval is finite and the
integrand is a continuous function of $t$; thus, $\hat{f}(z)$ is a well-defined function
on $\mathbb{C}$. The real and imaginary parts of $\hat{f}(z)$ are, respectively,

$$u(x, y) = \frac{1}{\sqrt{2\pi}}\int_a^b e^{yt}(\cos xt)f(t)\,dt$$

$$v(x, y) = -\frac{1}{\sqrt{2\pi}}\int_a^b e^{yt}(\sin xt)f(t)\,dt.$$

Using Theorem 5.2.4, one can compute the partial derivatives of $u$ and $v$
by differentiating under the integral sign. When one does so (problem
2), one finds that $u$ and $v$ satisfy the Cauchy-Riemann equations. Thus, $\hat{f}$
is the restriction to the real axis of a function that is analytic in the entire
complex plane. This property of the Fourier transform plays an important
role in the theory of signal transmission in electrical engineering and
in scattering theory in physics.

We shall now define the integral of a complex-valued continuous
function $f$ on a smooth curve $C$ of finite length in the complex plane.
$C$ is called a smooth curve if there is an interval $[a, b]$ in $\mathbb{R}$ and continuously
differentiable real-valued functions $x(t)$ and $y(t)$ on $[a, b]$ so that
$z(t) = x(t) + iy(t)$ sweeps out the curve from its starting point, $\omega = z(a)$,
to its ending point, $\tau = z(b)$, and $|z'(t)| \ne 0$. We call $z(t)$ a parameterization
of the curve. The length of the curve, $L$, is defined to be

$$L = \int_a^b |z'(t)|\,dt = \int_a^b \sqrt{(x'(t))^2 + (y'(t))^2}\,dt,$$

a formula which should be familiar from Calculus. By analogy with the
Riemann integral of real-valued continuous functions on $\mathbb{R}$, we want the
integral of the complex-valued function $f$ over the smooth curve $C$ to be
given approximately by the Riemann sum

$$\int_C f(z)\,dz \approx \sum_{j=1}^{N} f(z_j)(z_j - z_{j-1}),$$

where the points $z_0 = \omega$, $z_N = \tau$, and $z_1, z_2, \ldots, z_{N-1}$ lie along the curve
in order between $z_0$ and $z_N$. See Figure 8.2.1.

Figure 8.2.1

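Let $\{t_0, t_1, \ldots, t_N\}$ be a partition of the interval $[a, b]$ and let $z_j = z(t_j)$.
Then,

$$\sum_{j=1}^{N} f(z_j)(z_j - z_{j-1}) = \sum_{j=1}^{N} f(z(t_j))(z(t_j) - z(t_{j-1})) \tag{16}$$

$$= \sum_{j=1}^{N} f(z(t_j))\,\frac{z(t_j) - z(t_{j-1})}{t_j - t_{j-1}}\,(t_j - t_{j-1}) \tag{17}$$

$$\approx \sum_{j=1}^{N} f(z(t_j))\,z'(t_j)(t_j - t_{j-1}). \tag{18}$$

Notice that (18) is a Riemann sum for $\int_a^b f(z(t))z'(t)\,dt$, which is an integral
of a continuous function on a finite interval on the real line, since $f$
is continuous and $z(t)$ is continuously differentiable. This gives us the
idea for the following definition.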

Definition. Let $C$ be a smooth curve of finite length in the complex
plane and let $f$ be continuous on an open set containing $C$. Let $z(t)$, $a \le
t \le b$, be a parameterization of $C$. Then, we define

$$\int_C f(z)\,dz = \int_a^b f(z(t))z'(t)\,dt. \tag{19}$$

It can be shown that this definition is independent of the parameterization
of the curve $C$ chosen. Note, however, that a parameterization
sweeps out the curve in a given direction (from $\omega$ to $\tau$). Definition (19)
shows that if two parameterizations differ in direction, then the corresponding
integrals will differ by a minus sign. It is straightforward to
prove (see problems 7 and 8) that the integral has the other properties
that we expect. For example, the integral of a sum of functions is the
sum of the integrals and

$$\left|\int_C f(z)\,dz\right| \le \Big(\max_{z \in C} |f(z)|\Big)L. \tag{20}$$

We can also write $\int_C f(z)\,dz$ in terms of real line integrals in $\mathbb{R}^2$. If $u(x, y)$
and $v(x, y)$ are the real and imaginary parts of $f$, then

$$\int_C f(z)\,dz = \int_a^b f(z(t))z'(t)\,dt$$

$$= \int_a^b \big(u(x(t), y(t)) + iv(x(t), y(t))\big)\big(x'(t) + iy'(t)\big)\,dt$$

$$= \int_a^b \big(u(x(t), y(t))x'(t) - v(x(t), y(t))y'(t)\big)\,dt + i\int_a^b \big(u(x(t), y(t))y'(t) + v(x(t), y(t))x'(t)\big)\,dt$$

$$= \int_C u\,dx - v\,dy + i\int_C u\,dy + v\,dx.$$

Example 2 Let $C$ be the straight line from $(0, 0)$ to $(1, 2)$ and let $f(z) =
z^2$. Then, $z(t) = t + 2it$, $0 \le t \le 1$, is a parameterization of $C$. Since
$z'(t) = (1 + 2i)$, we have

$$\int_C f(z)\,dz = \int_0^1 (t + 2it)^2(1 + 2i)\,dt = (1 + 2i)^3\int_0^1 t^2\,dt = \tfrac{1}{3}(1 + 2i)^3.$$
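Definition (19) translates directly into a numerical procedure: sample the parameterization, multiply $f(z(t))$ by $z'(t)$, and form a Riemann sum in $t$. The following sketch applies this to Example 2; the function name `contour_integral` and the midpoint rule are choices made here, not part of the text.

```python
def contour_integral(f, z, dz, a, b, n=2000):
    """Midpoint Riemann sum for the integral of f(z(t)) z'(t) over [a, b], as in (19)."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * h
        total += f(z(t)) * dz(t) * h
    return total

# Example 2: f(z) = z^2 on the segment z(t) = t + 2it, 0 <= t <= 1.
approx = contour_integral(lambda w: w * w,
                          lambda t: t + 2j * t,
                          lambda t: 1 + 2j,
                          0.0, 1.0)
exact = (1 + 2j) ** 3 / 3
print(approx, exact)
```

The two values agree to about six decimal places, and refining the partition (larger `n`) brings them closer.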

Example 3 Let $C$ be the arc of the unit circle ($|z| = 1$) from 1 to $i$ and
let f(z) = 2. We can parameterize $C$ by $z(t) = \cos t + i\sin t = e^{it}$ on the
interval $[0, \frac{\pi}{2}]$. Thus, $z'(t) = ie^{it}$, so

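Example 4 Let $z_0$ be any complex number and let $C$ be the circle of
radius $r$ centered at $z_0$, taken in the counterclockwise direction. We can
parameterize $C$ by choosing $z(\theta) = z_0 + re^{i\theta}$ for $0 \le \theta \le 2\pi$. Here $\theta$ plays
the role of the parameter $t$. Since $z'(\theta) = ire^{i\theta}$,

$$\int_C (z - z_0)^n\,dz = \int_0^{2\pi} (re^{i\theta})^n\,ire^{i\theta}\,d\theta = \int_0^{2\pi} i(e^{i\theta})^{n+1}r^{n+1}\,d\theta = \begin{cases} 0 & \text{if } n \ne -1 \\ 2\pi i & \text{if } n = -1. \end{cases}$$

Thus, if $p(z) = \sum_{j=0}^{N} a_j(z - z_0)^j$ is a polynomial in powers of $z - z_0$,

$$\int_C p(z)\,dz = \int_C \sum_{j=0}^{N} a_j(z - z_0)^j\,dz = \sum_{j=0}^{N} a_j\int_C (z - z_0)^j\,dz = 0.$$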

Since any polynomial in $z$ can be written as a polynomial in $z - z_0$ by
making the substitution $z = (z - z_0) + z_0$, we have shown that the integral
of any polynomial on any circle in $\mathbb{C}$ is zero. Furthermore, this line
of reasoning can also be used for functions that can be represented by
power series in $z - z_0$. If $f(z) = \sum_{j=0}^{\infty} a_j(z - z_0)^j$ and the series has a
radius of convergence larger than $r$, then the series converges uniformly
on $C$. Thus, by the analogue of Theorem 5.2.2 (see problem 11),

$$\int_C f(z)\,dz = \lim_{N \to \infty}\int_C \sum_{j=0}^{N} a_j(z - z_0)^j\,dz = 0.$$

Since we know that a series in powers of $z - z_0$ is analytic inside its radius
of convergence (Example 3 of Section 8.1), this result is a special case of
Cauchy's theorem, which is proved in the next section.
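The circle integrals of Example 4 are easy to reproduce numerically, which gives a concrete check on the dichotomy between $n = -1$ and every other integer power. This is a sketch only; the center, radius, and range of exponents below are arbitrary, and `circle_integral` is a hypothetical helper.

```python
import cmath
import math

def circle_integral(f, z0, r, n=4000):
    """Riemann sum for the integral of f over the circle |z - z0| = r, counterclockwise."""
    h = 2 * math.pi / n
    total = 0j
    for k in range(n):
        theta = (k + 0.5) * h
        w = cmath.exp(1j * theta)
        # z(theta) = z0 + r e^{i theta},  z'(theta) = i r e^{i theta}
        total += f(z0 + r * w) * (1j * r * w) * h
    return total

z0, r = 0.5 - 1j, 2.0
results = {m: circle_integral(lambda z, m=m: (z - z0) ** m, z0, r)
           for m in range(-3, 4)}

for m, value in results.items():
    print(m, value)
```

Only the exponent $-1$ produces a nonzero value, approximately $2\pi i$; all other powers, positive or negative, integrate to zero up to round-off.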

Problems

1. Let $g(t) = h(t) + ik(t)$ be a complex-valued function of a real variable $t$.
Define $\bar{g}(t) = h(t) - ik(t)$ and note that $h(t) = \frac{1}{2}(g(t) + \bar{g}(t))$.

(a) Show that $g$ is continuous if and only if $h$ and $k$ are continuous.
(b) Show that $g$ is continuously differentiable if and only if $h$ and $k$ are
continuously differentiable, in which case $g'(t) = h'(t) + ik'(t)$.

2. Explain carefully why one can differentiate under the integral sign in
computing the partial derivatives of the real and imaginary parts of the
Fourier transform of a continuous function that vanishes outside of a fi¬
nite interval (see Example 1). Verify that the Cauchy-Riemann equations
hold.

3. Let / be the function which equals 1 on the interval [a, b] and equals 0
elsewhere. Compute the Fourier transform of / and verify explicitly that
it is analytic in the entire complex plane.

4. Let $f$ be an analytic function. Show that the real and imaginary parts, $u$
and $v$, satisfy Laplace's equation:

$$u_{xx} + u_{yy} = 0 = v_{xx} + v_{yy}.$$

5. Let $C$ be the path from $(0, 0)$ to $(2, 4)$ in $\mathbb{R}^2$ that follows the graph of the
function $y = x^2$. Compute the following line integrals:

(a) $\int_C x\,dx + y\,dy$ (b) $\int_C x\,dx - y^2\,dy$ (c) $\int_C y^2\,dx - xe^{xy}\,dy$

6. Let $C$ be the straight line from $(1, 1)$ to $(3, 0)$. Compute

(a) $\int_C z\,dz$ (b) $\int_C z^2\,dz$ (c) $\int_C |z|\,dz$

7. Let $f$ and $g$ be continuous functions on $\mathbb{C}$ and let $C$ be a smooth curve of
finite length. Prove that

$$\int_C (f(z) + g(z))\,dz = \int_C f(z)\,dz + \int_C g(z)\,dz.$$

8. Let $C$ be a smooth curve of finite length in $\mathbb{C}$ and $f$ be a continuous function
on $C$. Prove that

$$\left|\int_C f(z)\,dz\right| \le \Big(\max_{z \in C} |f(z)|\Big)L,$$

where $L$ is the arc length of $C$.



9. Let $C$ be the circle of radius 1 with center at the origin. Evaluate the
integral

$$\int_C \frac{1}{z(z - 2)}\,dz.$$

Hint: write $\frac{1}{z(z-2)} = \frac{1}{2}\left\{\frac{1}{z-2} - \frac{1}{z}\right\}$ and use the ideas in Example 4.

10. Let C be the circle of radius R with center at the origin. Find an upper
bound for
[ — dz
Jc z
11. Let $C$ be a smooth curve of finite length in $\mathbb{C}$ and $f$ be a continuous function
on $C$. Suppose that $\{f_n\}$ is a sequence of continuous functions that
converges uniformly to $f$ on $C$. Then

$$\lim_{n \to \infty}\int_C f_n(z)\,dz = \int_C f(z)\,dz.$$

Hint: use problem 8.

8.3 Cauchy's Theorem

In this section we prove two theorems, Cauchy's Theorem and Cauchy's
Integral Formula, which are the basis for the usefulness of analytic function
theory. We define a contour to be a finite collection of smooth curves
attached end to end (for example, the boundary of a rectangle). A contour
is called simple if it does not intersect itself except possibly at the
first and last points, in which case it is called a simple closed contour. An
example of a simple, closed contour is shown in Figure 8.3.1.

Figure 8.3.1
We give a proof of Cauchy's theorem which depends on Green's theorem
from multivariable calculus. Direct proofs are available; see [6] or [23].

□ Theorem 8.3.1 (Cauchy's Theorem) Let $f$ be an analytic function in an
open domain $D \subseteq \mathbb{C}$. Then

$$\int_C f(z)\,dz = 0 \tag{21}$$

for every simple closed contour $C$ in $D$ whose interior contains only
points of $D$.

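Proof. Let $D_C$ denote the region of $\mathbb{C}$ consisting of $C$ and the points
inside $C$; see Figure 8.3.1. Since $f$ is analytic, $u$ and $v$ are continuously
differentiable on $D_C$. Thus, Green's theorem implies that

$$\int_C f(z)\,dz = \int_C (u\,dx - v\,dy) + i\int_C (v\,dx + u\,dy)
= \iint_{D_C} (-v_x - u_y)\,dx\,dy + i\iint_{D_C} (u_x - v_y)\,dx\,dy.$$

Since $f$ is analytic, $u$ and $v$ satisfy the Cauchy-Riemann equations,
$u_x = v_y$ and $u_y = -v_x$, on $D_C$. Thus, each of the last two integrals
equals zero.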

Example 1 If $f$ is a polynomial in $z$, then $f$ is analytic, so (21) is true
for all simple closed contours $C$ in $\mathbb{C}$. If $f(z) = \sum_{j=0}^{\infty} a_j (z - z_0)^j$, then
(21) holds for any simple closed contour inside the radius of convergence
since $f$ is analytic there. We computed these results explicitly in the case
where $C$ is a circle in Example 4 of Section 8.2.

Example 2 The function $f(z) = \frac{1}{z - z_0}$ is analytic everywhere except
$z = z_0$. Let $C$ be any circle with center at $z_0$ traversed in the
counterclockwise direction. We calculated in Example 4 of Section 8.2 that

$$\int_C \frac{dz}{z - z_0} = 2\pi i. \tag{22}$$

This shows that if the interior of a simple closed contour contains a single
point at which $f$ is not analytic, then the conclusion of Cauchy's theorem
may not hold.
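
The contrast between Examples 1 and 2 can be checked numerically. The following is a minimal sketch, not part of the text's argument: it assumes the circle is parametrized by $z(t) = z_0 + re^{it}$ and approximates the contour integral by an equal-weight sum. The helper name `circle_integral`, the center, the radius, and the sample polynomial are all illustrative choices.

```python
import cmath
import math

def circle_integral(f, center, radius, n=2000):
    """Approximate the integral of f over the counterclockwise circle
    |z - center| = radius, parametrized by z(t) = center + radius*e^{it}."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        w = radius * cmath.exp(1j * t)   # w = z(t) - center
        total += f(center + w) * 1j * w  # f(z(t)) * z'(t), since z'(t) = i*w
    return total * (2 * math.pi / n)     # equal-weight rule with step 2*pi/n

z0 = 1 + 2j  # arbitrary sample center
singular = circle_integral(lambda z: 1 / (z - z0), z0, 0.75)
polynomial = circle_integral(lambda z: z**3 - 2 * z + 5, z0, 0.75)

print(singular)    # close to 2*pi*i
print(polynomial)  # close to 0
```

The polynomial is analytic inside the circle, so its integral vanishes as Cauchy's theorem predicts, while the single point $z_0$ where $1/(z - z_0)$ fails to be analytic produces the value $2\pi i$.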

Example 3 The function

$$f(z) = \frac{1}{z(z - 2)}$$

is analytic except at the points $z = 0$ and $z = 2$. The calculation in
Example 4 of Section 8.2 and Cauchy's theorem allow us to evaluate the
following integrals immediately. All curves are traversed in the
counterclockwise direction. If $C_1$ is the circle of radius 1 about the point
$2i$, then by Cauchy's theorem,

$$\int_{C_1} f(z)\,dz = 0.$$

If $C_2$ is the circle of radius 1 about the point 0, then by Example 4 and
Cauchy's theorem,

$$\int_{C_2} f(z)\,dz = -\frac{1}{2}\int_{C_2} \frac{dz}{z} + \frac{1}{2}\int_{C_2} \frac{dz}{z - 2} = -\pi i + 0.$$

If $C_3$ is the circle of radius $\tfrac{1}{2}$ about the point 1, then by Cauchy's theorem,

$$\int_{C_3} f(z)\,dz = 0.$$
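
The three values in Example 3 can be reproduced numerically. This is a sketch under stated assumptions: the function is taken to be $f(z) = 1/(z(z-2))$, whose only singularities are at 0 and 2, and the third circle is given radius $1/2$ (any radius less than 1 about the point 1 encloses neither singularity). The helper `circle_integral` is an illustrative name, not something defined in the text.

```python
import cmath
import math

def circle_integral(f, center, radius, n=2000):
    """Approximate the counterclockwise integral of f over |z - center| = radius."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        w = radius * cmath.exp(1j * t)   # w = z(t) - center
        total += f(center + w) * 1j * w  # f(z(t)) * z'(t)
    return total * (2 * math.pi / n)

def f(z):
    return 1 / (z * (z - 2))

about_2i = circle_integral(f, 2j, 1.0)   # encloses neither 0 nor 2
about_0 = circle_integral(f, 0, 1.0)     # encloses 0 only
about_1 = circle_integral(f, 1, 0.5)     # encloses neither 0 nor 2

print(about_2i, about_0, about_1)  # close to 0, -pi*i, 0
```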

We remark that we are assuming the Jordan Curve Theorem: every
simple closed continuous curve has an interior and an exterior. This
looks obvious for any simple example but is not easy to prove in
general. For a discussion of the Jordan Curve Theorem, see [23].
We now prove the most striking consequence of Cauchy's Theorem.
If $f$ is analytic, its value at any point inside of $C$ is an appropriate
weighted average of its values on $C$. In particular, the values of $f$
inside of $C$ are all determined by the values on $C$. And this is true for any
contour $C$ just as long as $f$ is analytic on and inside of $C$.

□ Theorem 8.3.2 (Cauchy's Integral Formula) Let $f$ be an analytic
function in an open domain $D \subseteq \mathbb{C}$. Let $C$ be a simple closed contour in $D$
traversed in the counterclockwise direction whose interior contains only
points of $D$. Then for every $z_0$ inside $C$,

$$f(z_0) = \frac{1}{2\pi i} \int_C \frac{f(z)}{z - z_0}\,dz. \tag{23}$$

Proof. Suppose that $\varepsilon > 0$ is small enough so that $C(z_0, \varepsilon)$, the circle of
radius $\varepsilon$ centered at $z_0$, lies entirely inside of $C$. We will first show that
the integral of $f(z)/(z - z_0)$ on $C$ is equal to the integral of $f(z)/(z - z_0)$
on $C(z_0, \varepsilon)$. Choose a point $\omega$ on $C(z_0, \varepsilon)$ and let $L$ denote the radial line
segment from $\omega$ to the first intersection point, $\tau$, of the line with $C$. These
curves are indicated in Figure 8.3.2. Let $C_1$ be the contour consisting
of $C$ (from $\tau$ to $\tau$), followed by $-L$, followed by $-C(z_0, \varepsilon)$ (from $\omega$ to
$\omega$), followed by $L$. We indicate that a curve is traversed in the opposite
direction from its defined direction by a minus sign in front of the symbol
for the curve.

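Since $f(z)/(z - z_0)$ is analytic on and everywhere inside of $C_1$,
Cauchy's theorem implies

$$0 = \int_{C_1} \frac{f(z)}{z - z_0}\,dz
= \int_C \frac{f(z)}{z - z_0}\,dz - \int_{C(z_0,\varepsilon)} \frac{f(z)}{z - z_0}\,dz,$$

because the integrals along $-L$ and $L$ cancel. This shows that the integral
of $f(z)/(z - z_0)$ on $C$ and the integral of $f(z)/(z - z_0)$ on $C(z_0, \varepsilon)$ are
equal. Notice that this equality is true for all small $\varepsilon$.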
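We now estimate the difference between the right-hand and left-hand
sides of (23).

$$\left| f(z_0) - \frac{1}{2\pi i}\int_C \frac{f(z)}{z - z_0}\,dz \right|
= \left| f(z_0) - \frac{1}{2\pi i}\int_{C(z_0,\varepsilon)} \frac{f(z)}{z - z_0}\,dz \right|$$

$$= \left| f(z_0) - \frac{f(z_0)}{2\pi i}\int_{C(z_0,\varepsilon)} \frac{dz}{z - z_0}
- \frac{1}{2\pi i}\int_{C(z_0,\varepsilon)} \frac{f(z) - f(z_0)}{z - z_0}\,dz \right|$$

$$= \left| \frac{1}{2\pi i}\int_{C(z_0,\varepsilon)} \frac{f(z) - f(z_0)}{z - z_0}\,dz \right|$$

$$\le \frac{1}{2\pi} \max_{z \in C(z_0,\varepsilon)} \left| \frac{f(z) - f(z_0)}{z - z_0} \right| (2\pi\varepsilon)$$

$$= \max_{z \in C(z_0,\varepsilon)} |f(z) - f(z_0)|.$$

After writing $f(z) = f(z_0) + (f(z) - f(z_0))$ in the integrand, we used the
result of Example 2 to cancel the first two terms on the right. In the
inequality we used the estimate proved in problem 8 of Section 8.2, and in
the last step the fact that $|z - z_0| = \varepsilon$ on $C(z_0, \varepsilon)$. Since $f$ is continuous,
the right-hand side converges to zero as $\varepsilon \to 0$. However, the left-hand side
does not depend on $\varepsilon$, so we conclude that (23) holds. □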
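
Formula (23) lends itself to a quick numerical experiment. The sketch below is only an illustration under assumptions of our choosing: $f(z) = e^z$, the contour is the circle of radius 2 about the origin, and $z_0 = 0.5 + 0.3i$ is a point inside it that is not the center. The helper names are hypothetical.

```python
import cmath
import math

def circle_integral(f, center, radius, n=2000):
    """Approximate the counterclockwise integral of f over |z - center| = radius."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        w = radius * cmath.exp(1j * t)   # w = z(t) - center
        total += f(center + w) * 1j * w  # f(z(t)) * z'(t)
    return total * (2 * math.pi / n)

def cauchy_formula(f, z0, center, radius):
    """Right-hand side of (23) for the circle |z - center| = radius."""
    integral = circle_integral(lambda z: f(z) / (z - z0), center, radius)
    return integral / (2j * math.pi)

z0 = 0.5 + 0.3j
recovered = cauchy_formula(cmath.exp, z0, 0, 2.0)

print(recovered)       # value computed from boundary data only
print(cmath.exp(z0))   # direct evaluation, for comparison
```

Only values of $f$ on the circle enter the computation, which is the sense in which the boundary values determine the values inside.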

In Example 3 of Section 8.1, we showed that inside its radius of
convergence a power series in powers of $z - z_0$ converges to an analytic
function. Now we can show the converse.

□ Theorem 8.3.3 Let $f$ be an analytic function in an open domain $D \subseteq \mathbb{C}$.
Let $z_0$ be a point of $D$, and let $C(z_0, r_0)$ be a circle about $z_0$ such that
the circle and its interior contain only points of $D$. Then $f$ may be
represented by a power series in $z - z_0$ that converges uniformly on any closed
disc $\{z \mid |z - z_0| \le r\}$ such that $r < r_0$.

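Proof. Choose $r_1$ so that it satisfies $r < r_1 < r_0$. Let $z$ be a point in the
disk $\{z \mid |z - z_0| \le r_1\}$ and let $\omega$ be a point on $C(z_0, r_0)$. Since $r_1 < r_0$,
there is an $\alpha$ independent of $z$ and $\omega$ so that

$$\left| \frac{z - z_0}{\omega - z_0} \right| \le \alpha < 1. \tag{24}$$

Thus,

$$\frac{1}{\omega - z} = \frac{1}{\omega - z_0} \cdot \frac{1}{1 - \frac{z - z_0}{\omega - z_0}}
= \frac{1}{\omega - z_0} \sum_{j=0}^{\infty} \left( \frac{z - z_0}{\omega - z_0} \right)^j,$$

where the series converges uniformly for $\omega$ on $C(z_0, r_0)$. Therefore, by
Cauchy's Integral Formula and the result of problem 11 of Section 8.2,

$$f(z) = \frac{1}{2\pi i} \int_{C(z_0,r_0)} \frac{f(\omega)}{\omega - z}\,d\omega \tag{25}$$

$$= \frac{1}{2\pi i} \int_{C(z_0,r_0)} \frac{f(\omega)}{\omega - z_0} \lim_{N \to \infty} \sum_{j=0}^{N} \left( \frac{z - z_0}{\omega - z_0} \right)^j d\omega \tag{26}$$

$$= \lim_{N \to \infty} \sum_{j=0}^{N} \left( \frac{1}{2\pi i} \int_{C(z_0,r_0)} \frac{f(\omega)}{(\omega - z_0)^{j+1}}\,d\omega \right) (z - z_0)^j. \tag{27}$$

Since the series in (27) converges for $z \in \{z \mid |z - z_0| \le r_1\}$, we know, by
Theorem 6.5.3, that the radius of convergence of the series is $\ge r_1$. Thus
the series converges uniformly on $\{z \mid |z - z_0| \le r\}$ if $r < r_1$. □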
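
The coefficients appearing in (27) can be computed numerically for a concrete function. The following sketch assumes the sample function $f(z) = 1/(2 - z)$, which is analytic for $|z| < 2$, with $z_0 = 0$ and $r_0 = 1$; its expansion about 0 has coefficients $2^{-(j+1)}$, which gives something to compare against. The helper names are illustrative.

```python
import cmath
import math

def circle_integral(f, center, radius, n=2000):
    """Approximate the counterclockwise integral of f over |z - center| = radius."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        w = radius * cmath.exp(1j * t)   # w = z(t) - center
        total += f(center + w) * 1j * w  # f(z(t)) * z'(t)
    return total * (2 * math.pi / n)

def series_coefficient(f, z0, r0, j):
    """j-th coefficient in (27): (1/(2*pi*i)) * integral of f(w)/(w - z0)^(j+1)."""
    integral = circle_integral(lambda w: f(w) / (w - z0) ** (j + 1), z0, r0)
    return integral / (2j * math.pi)

def f(z):
    return 1 / (2 - z)

coeffs = [series_coefficient(f, 0, 1.0, j) for j in range(40)]

z = 0.5  # a point with |z - z0| < r0
partial_sum = sum(c * z**j for j, c in enumerate(coeffs))

print(coeffs[:3])          # close to 1/2, 1/4, 1/8
print(partial_sum, f(z))   # the partial sum is close to f(z) = 2/3
```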

Corollary 8.3.4 On its domain of definition, an analytic function is
infinitely often differentiable. Further, the series (27) is the Taylor series of
$f$ about $z_0$, and the derivatives of $f$ are given by the formulas

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \int_C \frac{f(\omega)}{(\omega - z_0)^{n+1}}\,d\omega \tag{28}$$

where $C$ is any simple closed contour about $z_0$ in $D$ whose interior
contains only points of $D$.

Proof. The same proof as in Theorem 6.4.2 shows that a complex power
series is infinitely differentiable inside its radius of convergence and the
derivatives can be computed by differentiating the series term by term.
Differentiating (27) term by term and setting $z = z_0$ gives formula (28)
in the case where $C = C(z_0, r_0)$. Thus, the coefficient of the $j$th power
in (27) is $f^{(j)}(z_0)/j!$, so (27) is just the Taylor series of $f$. Formula (28)
holds for general $C$ satisfying the hypotheses by an argument similar to
the argument at the beginning of the proof of Theorem 8.3.2. □
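
Formula (28) can be tested on a function whose derivatives are known. The sketch below assumes $f(z) = e^z$, so every derivative at $z_0$ equals $e^{z_0}$, and it uses a circle about the origin rather than about $z_0$ to illustrate that the contour need not be centered at $z_0$. All names and sample values are illustrative choices.

```python
import cmath
import math

def circle_integral(f, center, radius, n=2000):
    """Approximate the counterclockwise integral of f over |z - center| = radius."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        w = radius * cmath.exp(1j * t)   # w = z(t) - center
        total += f(center + w) * 1j * w  # f(z(t)) * z'(t)
    return total * (2 * math.pi / n)

def derivative_via_contour(f, z0, n, center, radius):
    """Right-hand side of (28) on the circle |w - center| = radius."""
    integral = circle_integral(lambda w: f(w) / (w - z0) ** (n + 1), center, radius)
    return math.factorial(n) * integral / (2j * math.pi)

z0 = 0.3 - 0.2j
derivatives = [derivative_via_contour(cmath.exp, z0, n, 0, 1.5) for n in range(6)]

for n, value in enumerate(derivatives):
    print(n, value)        # each value is close to exp(z0)
print(cmath.exp(z0))
```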

We have shown that analytic functions of a complex variable have
very special properties. What is not obvious is that analytic function
theory is extremely useful in many problems which at first glance do not
seem to have anything to do with complex variables or analytic
functions. Here are some examples:

(a) In Section 6.4 we asked which real-valued functions of a real
variable, $f(x)$, are equal to their Taylor series, $\sum_{j=0}^{\infty} \frac{1}{j!} f^{(j)}(x_0)(x - x_0)^j$, in an
interval $(x_0 - r, x_0 + r)$. We now know that this is true if and only if $f(x)$
is the restriction to the real axis of a function of the complex variable $z$
that is analytic in the disk $\{z \mid |z - x_0| < r\}$.

(b) Cauchy's theorem can sometimes be used to evaluate the integrals
of real-valued functions on the real line. See project 1.

(c) The Prime Number Theorem, which we stated at the end of Section
6.6, describes the asymptotic behavior of the function $\pi(m)$ whose
value is the number of primes $\le m$. It doesn't seem likely that this could
have anything to do with complex numbers or analytic functions.
However, the Riemann zeta function, which we introduced in problem 12 of
Section 6.3, is the restriction to the set $\{x \mid x > 1\}$ of an analytic
function (see problem 10). The properties of $\zeta(z)$ played a central role in the
original proofs of the Prime Number Theorem.

(d) The Fourier transform, defined in Example 1 of Section 8.2, has
played an important role in mathematics and physics for almost 200
years. Although it was introduced as a technique for solving partial
differential equations, it has many other uses. For example, it is used in
the theory of group representations and is the main tool for proving the
Central Limit Theorem in probability theory. In the example, we showed
that the Fourier transform of a function which is zero outside of an
interval is the restriction to the real axis of an analytic function on $\mathbb{C}$. It turns
out that the size of the region where $f$ is nonzero can be characterized
in terms of the growth properties of the Fourier transform of $f$ in the
imaginary directions. As mentioned in the example, this plays an important
role in the theory of signal transmission in electrical engineering and in
scattering theory in high-energy physics.

(e) Let $A$ be a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. If we choose a
basis for $\mathbb{R}^n$, then $A$ is represented by an $n \times n$ matrix with real entries.
Let $I$ denote the identity matrix. Except for finitely many values of $z$, the
eigenvalues of $A$, the matrix $(zI - A)$ is invertible. It turns out that the
function $z \mapsto (zI - A)^{-1}$ is an analytic matrix-valued function of $z$ except
at the eigenvalues. A generalization of this fact plays an important role
in the analysis of linear transformations on Hilbert and Banach spaces.
These transformations, in turn, play a central role in operations research
and quantum mechanics.

Problems

1. Let $C_1$, $C_2$, and $C_3$ be circles with center the origin and radii 1, 3, and 5,
respectively, traversed counterclockwise. Use Cauchy's Theorem and the
Cauchy integral formula to evaluate the following integrals:

/ci ch dz (b) fCs yr2 dz (c) /Ci dz

<d) Sc, (,_e4-,> dz (e) fCi sp dz (f) fc, £ dz

2. Compute fCi dz where Ci is as defined in problem 1. Hint: use the


power series for sin z.

3. Let / be the function

~ (z — l)(z — 4)'

Use an argument similar to that in Theorem 8.3.2 to show that

$$\int_{C_3} f(z)\,dz = \int_{C_4} f(z)\,dz + \int_{C_5} f(z)\,dz$$

where $C_3$ is as defined in problem 1, and $C_4$ and $C_5$ are small
counterclockwise circles about the points 1 and 4. Use Cauchy's Integral Formula
to evaluate the two integrals on the right.

4. Suppose that $f$ is analytic in the disk $\{z \in \mathbb{C} \mid |z| < R\}$ and that $f'(z) = 0$
for all $z$ in the disk. Prove that $f$ is constant in the disk. Hint: use power
series.
5. Suppose that $f$ is analytic in the entire complex plane and bounded; that
is, $|f(z)| \le M$ for some $M$. Use (28) and (20) to prove that $f^{(n)}(0) = 0$
for all $n \ge 1$. Conclude from this that $f$ is a constant. This is known as
Liouville's Theorem.

6. Let $C_1$ be the straight line path from $i$ to 1 and let $C_2$ be the path that goes
in a straight line from $i$ to 0 and then in a straight line from 0 to 1.

(a) Compute $\int_{C_1} z\,dz$ and $\int_{C_2} z\,dz$ and show that they are the same.
How could you have predicted this by using Cauchy's theorem?
(b) Compute $\int_{C_1} \bar{z}\,dz$ and $\int_{C_2} \bar{z}\,dz$ and show that they are not the same.

7. (a) Suppose that $f$ is analytic in the open unit disk $\{z \mid |z| < 1\}$ and
suppose that $f(z) = 0$ when $z$ is real. Prove that $f(z) = 0$ in the
disk. Hint: what can you say about the Taylor coefficients of $f$ at 0?
(b) Suppose that $f$ is analytic in the unit disk $\{z \mid |z| < 1\}$ and that
$f(z_n) = 0$ at a sequence of points $\{z_n\}$ that have a limit point in the
disk. Prove that $f(z) = 0$ in the disk.
8. Suppose that $f$ and $g$ are both analytic functions on $\mathbb{C}$ and that $f(z) =
g(z)$ if $z$ is real. Prove that $f(z) = g(z)$ for all $z$.

9. Use the fact that $e^z e^w = e^{z+w}$ for $z$ and $w$ real to prove that the same
equality must hold for all $z \in \mathbb{C}$ and all $w \in \mathbb{C}$.
10. If $b > 0$, define $b^z = e^{z \ln b}$. Show that the Riemann $\zeta$ function,

$$\zeta(z) = \sum_{j=1}^{\infty} \frac{1}{j^z},$$

is a well-defined analytic function in the region $\{z \mid \mathrm{Re}\{z\} > 1\}$.

Projects

1. The purpose of this project is to show how the theory of analytic functions
can be used to evaluate improper Riemann integrals on $\mathbb{R}$. The integral

$$\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\,dx$$

can be computed by elementary means. To see how to compute it by
using the theory of analytic functions, notice that $f(x) = (1 + x^2)^{-1}$ is the
restriction to $\mathbb{R}$ of the function $f(z) = (1 + z^2)^{-1}$, which is analytic
everywhere in $\mathbb{C}$ except at $z = i$ and $z = -i$.

[Figure: the contour $\Gamma_R$, a closed semicircle in the upper half-plane through the points $(-R, 0)$, $0$, and $(R, 0)$]

(a) Let $\Gamma_R$ be the contour in the above figure, consisting of a first piece,
$T_R$, traversing the real line from $x = -R$ to $x = R$, and a second
piece, $C_R$, which traverses counterclockwise from $(R, 0)$ to $(-R, 0)$
along the circle of radius $R$ with center at the origin. Suppose $R > 1$.
By writing $f(z) = (z + i)^{-1}(z - i)^{-1}$ and using the Cauchy integral
formula, explain why

$$\int_{\Gamma_R} f(z)\,dz = \pi.$$

(b) Use (20) to prove that $\left|\int_{C_R} f(z)\,dz\right| \to 0$ as $R \to \infty$.
(c) Explain why the improper Riemann integral $\int_{-\infty}^{\infty} \frac{1}{1 + x^2}\,dx$ exists and
equals $\pi$.
(d) Use the same ideas to evaluate

2. The purpose of this project is to prove the Fundamental Theorem of
Algebra, namely, that every polynomial

$$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$$

has at least one root if $n \ge 1$. Since we are assuming that $p$ has order $n$,
we know that $a_n \ne 0$. Note that the coefficients $\{a_j\}$ may be complex. We
shall outline a proof by contradiction. Suppose that there is no $z_0$ so that
$p(z_0) = 0$.

(a) Explain why $\frac{1}{p(z)}$ is analytic in the entire complex plane.

(b) Prove that there exists an $r_0$ so that

(c) Prove that


1
p(z)

if $|z| > r_0$.


(d) Prove that $\frac{1}{p(z)}$ is bounded in the entire complex plane.

(e) Use Liouville's theorem to conclude that $\frac{1}{p(z)}$ is constant and explain
why this gives a contradiction.

Remark: to see the power of analytic function theory try to prove the
Fundamental Theorem directly.
CHAPTER 9

Fourier Series
Fourier analysis has played an important role in mathematics since the
early part of the 19th century. In Section 9.1 we show Fourier's technique
for solving a partial differential equation describing heat flow. His
calculation posed a question for mathematicians which proved to be both
difficult and exceptionally fruitful. Fourier series are formally defined in
Section 9.2, where several examples are given. The theorems on pointwise
convergence and mean-square convergence are proved in Sections
9.3 and 9.4, respectively.

9.1 The Heat Equation

In this section we investigate the flow of heat in a thin metal bar of length
$L$ centimeters. We assume that the temperature, $u$, is constant in each
cross section, so $u$ depends only on the time $t$ and the distance $x$ along
the bar. Heat energy is measured in calories. The specific heat of a
material, $c$, is the number of calories needed to raise 1 gram of the material
1 degree centigrade. Thus the heat per unit length at $x$ is $c\rho A u(t, x)$, where $A$
is the area of the cross section and $\rho$ is the density of the metal. For
simplicity, we assume that the bar is homogeneous and a perfect cylinder,
so $c$, $\rho$, and $A$ are constants. It follows that the total heat in the segment
of the bar between $x = a$ and $x = b$ is

$$H_{a,b}(t) = \int_a^b c\rho A\, u(t, x)\,dx,$$

and thus the rate of change of heat in the segment is

$$H'_{a,b}(t) = \int_a^b c\rho A\, u_t(t, x)\,dx. \tag{1}$$

Figure 9.1.1 [the bar along the $x$-axis from $x = 0$ to $x = L$, with the segment from $a$ to $b$ marked]

We will now calculate this rate of change in another way. We assume
that the bar is insulated on its sides so that heat can only flow along the
bar (or out the ends). A fundamental physical principle is that the rate
of heat flow at any point in a material is proportional to the gradient of
the temperature. In our case, since heat flows only along the $x$-axis, the
rate of heat flow in the positive $x$ direction (in calories per unit time per
unit length per unit cross section) at $x$ is $-\alpha u_x(t, x)$, where the constant
$\alpha > 0$ is determined by the properties of the material. We take the minus
sign in front of $\alpha$ so that heat flows from hot to cold. Returning now
to the segment of bar between $a$ and $b$, we see that the heat flowing into
the segment at $b$ is $\alpha A u_x(t, b)$ and the heat flowing into the segment at
$a$ is $-\alpha A u_x(t, a)$. Thus, the net rate of change of heat in the segment is
$\alpha A u_x(t, b) - \alpha A u_x(t, a)$. By the Fundamental Theorem of Calculus, we
can write this net rate of change as

$$\alpha A u_x(t, b) - \alpha A u_x(t, a) = \int_a^b \alpha A\, u_{xx}(t, x)\,dx. \tag{2}$$

Both the right-hand side of (1) and the right-hand side of (2) represent
the net rate of change of heat in the segment, so setting them equal and
rearranging, we find

$$\int_a^b \Big( u_t(t, x) - \frac{\alpha}{c\rho}\, u_{xx}(t, x) \Big)\,dx = 0. \tag{3}$$

The constant $\kappa = \frac{\alpha}{c\rho}$ is called the diffusivity. Since (3) holds for all choices
of $a$ and $b$, the integrand in (3) must be zero if it is continuous (problem
14 of Section 3.3). Thus,

$$u_t(t, x) - \kappa u_{xx}(t, x) = 0. \tag{4}$$

This partial differential equation is called the heat equation. The
temperatures along the bar are given at time $t = 0$,

$$u(0, x) = f(x), \tag{5}$$

and we suppose that the ends of the bar are held at temperature zero at
all times (for example, by placing them in an ice bath). Thus,

$$u(t, 0) = 0 = u(t, L). \tag{6}$$

The mathematical problem is to find the function, $u(t, x)$, defined on the
half-strip, $0 \le x \le L$, $t \ge 0$, so that the partial differential equation (4),
the initial condition (5), and the boundary conditions (6) all hold.

For the moment we forget about the initial conditions and look for
solutions of (4) and (6) that have the special form $u(t, x) = X(x)T(t)$,
which is a function of $x$ times a function of $t$. If we substitute $X(x)T(t)$
into (4), carry out the differentiations and rearrange algebraically, we
find

$$\frac{T'(t)}{\kappa T(t)} = \frac{X''(x)}{X(x)}. \tag{7}$$

Since $t$ and $x$ are independent variables, this can only be true for all $t$
and all $x$ if both sides are constant; we call the constant $-\lambda$. Thus, $T(t)$
satisfies the differential equation

$$T'(t) + \lambda\kappa T(t) = 0, \tag{8}$$

and $X(x)$ satisfies the differential equation

$$X''(x) + \lambda X(x) = 0. \tag{9}$$

The general solution of (9) is $X(x) = a_1 \cos\sqrt{\lambda}\,x + a_2 \sin\sqrt{\lambda}\,x$. However,
since the boundary conditions (6) must be satisfied, it is easy to see that
we must have $a_1 = 0$ and $\sin\sqrt{\lambda}\,L = 0$. Thus $\lambda$ cannot be any constant
but must satisfy $\sqrt{\lambda}\,L = \pm n\pi$ for some integer $n$. Since the function $\sin x$
is odd, the solutions for $n$ negative are minus the solutions for $n$ positive,
and the solution for $n = 0$ is identically zero. Thus, if $u$ is not identically
zero, the possible $\lambda$'s are $\lambda_n = \left(\frac{n\pi}{L}\right)^2$, $n = 1, 2, 3, \ldots$, and $X(x)$ is one of
the functions

$$X_n(x) = b_n \sin\frac{n\pi x}{L},$$

where $b_n$ is a constant. For each $n$, the solution of (8) is a constant times
$e^{-\lambda_n \kappa t}$. Thus, each of the functions

$$u_n(t, x) = b_n e^{-\lambda_n \kappa t} \sin\frac{n\pi x}{L}$$

satisfies the partial differential equation and the boundary conditions.
This can be checked directly by computing the partial derivatives of $u_n$.
Note that the heat equation is linear; that is, only the first power of the
unknown function $u$ or its derivatives occurs. It follows, since
differentiation is a linear operation, that linear combinations of the functions $u_n$
are again solutions. This suggests that we try to write the general
solution $u(t, x)$ of the heat equation with the boundary conditions (6) in the
form

$$u(t, x) = \sum_{n=1}^{\infty} b_n e^{-\lambda_n \kappa t} \sin\frac{n\pi x}{L}. \tag{10}$$
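
The claim that each separated solution $u_n(t, x) = b_n e^{-\lambda_n \kappa t}\sin(n\pi x/L)$ satisfies the heat equation and the boundary conditions can be spot-checked numerically. The sketch below is only an illustration: the values of $L$, $\kappa$, $n$, and $b_n$ are arbitrary sample choices, and the derivatives are approximated by central differences rather than computed exactly.

```python
import math

L, kappa = 2.0, 0.5     # bar length and diffusivity (arbitrary sample values)
n, b_n = 3, 1.7         # mode number and amplitude (arbitrary sample values)
lam = (n * math.pi / L) ** 2

def u(t, x):
    """The separated solution b_n * exp(-lam*kappa*t) * sin(n*pi*x/L)."""
    return b_n * math.exp(-lam * kappa * t) * math.sin(n * math.pi * x / L)

def residual(t, x, h=1e-4):
    """Central-difference approximation of u_t - kappa*u_xx at (t, x)."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2
    return u_t - kappa * u_xx

# Largest residual at nine interior points of the bar at time t = 0.1.
worst = max(abs(residual(0.1, L * k / 10)) for k in range(1, 10))

print(worst)                     # small, limited by the finite-difference error
print(u(0.1, 0.0), u(0.1, L))    # both boundary values are essentially zero
```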

Assuming that the series converges, such a $u(t, x)$ satisfies the heat
equation and the boundary conditions. But what about the initial condition
(5)? If we set $t = 0$ in (10), we see that

$$f(x) = \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{L}. \tag{11}$$

Here is the question posed by Joseph Fourier (1768-1830): given $f$,
can we choose the coefficients $\{b_n\}$ so that (11) holds? Although Fourier
did not answer the question, he showed that if the answer is yes, then
the coefficients $b_n$ are determined by simple formulas. The family of
functions $\{\sin\frac{n\pi x}{L}\}$ satisfies (problem 1)

$$\int_0^L \sin\frac{n\pi x}{L} \sin\frac{m\pi x}{L}\,dx =
\begin{cases} 0 & \text{if } n \ne m \\ \frac{L}{2} & \text{if } n = m. \end{cases}$$

We shall use these special properties to compute formulas for the
coefficients $\{b_n\}$. Suppose that we know that the series on the right of (11)
converges uniformly to $f(x)$ on the interval $[0, L]$. Multiplying both sides
of (11) by $\sin\frac{m\pi x}{L}$ and integrating, we find

$$\int_0^L f(x) \sin\frac{m\pi x}{L}\,dx = \sum_{n=1}^{\infty} b_n \int_0^L \sin\frac{n\pi x}{L} \sin\frac{m\pi x}{L}\,dx = \frac{L}{2}\, b_m,$$

where we used Theorem 6.3.2 to exchange the sum and the integral. We
conclude that

$$b_n = \frac{2}{L} \int_0^L f(x) \sin\frac{n\pi x}{L}\,dx. \tag{12}$$

Thus, if $f$ can be represented in the form (11), we have a formula for
the coefficients $\{b_n\}$. This, in turn, allows us to write down an explicit
formula (10) for the solution of the heat equation that satisfies the initial
and boundary conditions.
Example 1 Suppose that $f(x) = x(L - x)$. To evaluate the integral (12),
we integrate by parts:

$$b_n = \frac{2}{L} \int_0^L x(L - x) \sin\frac{n\pi x}{L}\,dx$$

$$= \frac{2}{L} \left\{ -x(L - x) \frac{L}{n\pi} \cos\frac{n\pi x}{L} \right\}_0^L + \frac{2}{n\pi} \int_0^L (L - 2x) \cos\frac{n\pi x}{L}\,dx$$

$$= \frac{2L}{(n\pi)^2} \left\{ (L - 2x) \sin\frac{n\pi x}{L} \right\}_0^L + \frac{4L}{(n\pi)^2} \int_0^L \sin\frac{n\pi x}{L}\,dx$$

$$= \frac{4L^2}{n^3\pi^3} \big(1 - (-1)^n\big).$$

Therefore, $b_n = 0$ if $n$ is even and $b_n = \frac{8L^2}{n^3\pi^3}$ if $n$ is odd. Thus $u(t, x)$ is
given by the series

$$u(t, x) = \frac{8L^2}{\pi^3} e^{-(\pi/L)^2 \kappa t} \sin\frac{\pi x}{L} + \frac{8L^2}{3^3\pi^3} e^{-(3\pi/L)^2 \kappa t} \sin\frac{3\pi x}{L} + \frac{8L^2}{5^3\pi^3} e^{-(5\pi/L)^2 \kappa t} \sin\frac{5\pi x}{L} + \cdots$$

Notice that for $t > 0$, the series converges very fast because of the
exponential factors. This is the key fact which is exploited in the theorem
below. When $t = 0$, the series is

$$\frac{8L^2}{\pi^3} \sin\frac{\pi x}{L} + \frac{8L^2}{3^3\pi^3} \sin\frac{3\pi x}{L} + \frac{8L^2}{5^3\pi^3} \sin\frac{5\pi x}{L} + \cdots
= 8L^2 \sum_{n=0}^{\infty} \frac{1}{(2n + 1)^3\pi^3} \sin\frac{(2n + 1)\pi x}{L}.$$

If $u(t, x)$ satisfies the initial condition (5), then this series must converge
to the simple polynomial $x(L - x)$ for each $x$ in $[0, L]$. Because of the cubic
power, the series certainly converges. But does it converge to $x(L - x)$?
We shall see in Section 9.3 that the answer is yes.
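
Both the coefficient formula and the behavior of the series at $t = 0$ can be explored numerically. The following sketch assumes $L = 1$ (an arbitrary choice), computes $b_n$ from formula (12) by Simpson's rule, compares with the closed form $4L^2(1 - (-1)^n)/(n\pi)^3$ found in Example 1, and then compares a partial sum of the $t = 0$ series with $x(L - x)$ on a grid. The function names are illustrative, and the numerical agreement is only evidence, not a proof of convergence.

```python
import math

L = 1.0  # bar length (arbitrary sample value)

def f(x):
    return x * (L - x)

def simpson(g, a, b, panels=1000):
    """Composite Simpson rule on [a, b] with an even number of panels."""
    h = (b - a) / panels
    total = g(a) + g(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def b_numeric(n):
    """Coefficient b_n from formula (12), computed by quadrature."""
    return (2 / L) * simpson(lambda x: f(x) * math.sin(n * math.pi * x / L), 0.0, L)

def b_closed(n):
    """Closed form from Example 1: 4 L^2 (1 - (-1)^n) / (n pi)^3."""
    return 4 * L**2 * (1 - (-1) ** n) / (n * math.pi) ** 3

def partial_sum(x, terms=100):
    """Sum of the first `terms` nonzero (odd n) terms of the series at t = 0."""
    return sum(b_closed(2 * k + 1) * math.sin((2 * k + 1) * math.pi * x / L)
               for k in range(terms))

coefficient_gap = max(abs(b_numeric(n) - b_closed(n)) for n in range(1, 8))
series_gap = max(abs(partial_sum(L * k / 50) - f(L * k / 50)) for k in range(51))

print(coefficient_gap)  # quadrature and closed form agree closely
print(series_gap)       # the partial sum is close to x(L - x) on the grid
```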

As in this example, we can prove, in general, that the function $u(t, x)$
defined by (10) satisfies the partial differential equation and the
boundary condition when $t > 0$. Not only that, the solution $u(t, x)$ is infinitely
often continuously differentiable in both $t$ and $x$ for $t > 0$.

□ Theorem 9.1.1 Suppose that $f$ is piecewise continuous on $[0, L]$, and let
$u(t, x)$ be defined by (10) where the $b_n$ are given by (12). Then for $t > 0$,
$u(t, x)$ is infinitely often continuously differentiable in both $t$ and $x$ and
satisfies the heat equation (4) and the boundary conditions (6).

Proof. According to Theorem 6.3.3, we can differentiate series (10) with
respect to $x$ if each term is continuously differentiable in $x$ and the series
which results from term-by-term differentiation, namely,

$$\sum_{n=1}^{\infty} b_n e^{-\lambda_n \kappa t} \left(\frac{n\pi}{L}\right) \cos\frac{n\pi x}{L},$$

converges uniformly. Since $f$ is piecewise continuous on $[0, L]$, we know
that $|f(x)| \le M$ for some $M$. It follows by estimating the integral in (12)
that $|b_n| \le 2M$. Fix $t_0 > 0$ and define $M_n = |b_n| \frac{n\pi}{L} e^{-\lambda_n \kappa t_0}$. Since

$$e^{-\lambda_n \kappa t_0} = \frac{1}{1 + \lambda_n \kappa t_0 + \frac{(\lambda_n \kappa t_0)^2}{2!} + \cdots} \le \frac{2!}{(\lambda_n \kappa t_0)^2}$$

and $\lambda_n = \left(\frac{n\pi}{L}\right)^2$, we can estimate

$$M_n \le 2M \left(\frac{n\pi}{L}\right) \frac{2!}{(\lambda_n \kappa t_0)^2} \le \frac{C}{n^3}.$$

This estimate shows that $\sum M_n < \infty$. Thus, by the Weierstrass M-test,
the differentiated series converges uniformly for all $x$. Notice that the
estimates hold uniformly for all $t \ge t_0$ and that the terms in the
differentiated series are continuous functions. Thus we have proved that $u(t, x)$
is continuously differentiable in $x$ in the region $-\infty < x < \infty$, $t_0 < t$.
Since $t_0$ was an arbitrary positive number, it follows that $u(t, x)$ is
continuously differentiable in $x$ for $t > 0$ and the derivative can be computed
by differentiating the series for $u(t, x)$ term by term.
Similarly, the term-by-term derivative with respect to $t$,

$$\sum_{n=1}^{\infty} -\lambda_n \kappa\, b_n e^{-\lambda_n \kappa t} \sin\frac{n\pi x}{L},$$

also converges uniformly in $x$ and $t \ge t_0 > 0$. Thus $u$ is continuously
differentiable in both $x$ and $t$ for all $x$ and $t > 0$, and the derivatives can
be computed by differentiating the series term by term.
By repeating these arguments, one can show that $u$ is infinitely often
continuously differentiable in the region $-\infty < x < \infty$, $t > 0$ and the
derivatives can be computed by term-by-term differentiation (problem
2). Differentiating the series for $u$ term by term allows one to verify that
$u$ satisfies the heat equation (4). In addition, $u$ satisfies the boundary
condition (6) since each term in the series does. □

Theorem 9.1.1 expresses a very important "smoothing" property of
the heat equation. Even if the initial temperature distribution, $f$, is only
continuous (or even has jump discontinuities), the solution of the heat
equation will be infinitely often differentiable for $t > 0$. This is a
general property of a class of partial differential equations called parabolic
equations. Note that we have not yet answered the question of whether
the series converges to $f(x)$ when $t = 0$ so that the initial condition (5) is
satisfied.
We derived (4) by analyzing heat flow in a bar, but the same partial
differential equation arises in many other contexts. Albert Einstein
(1879-1955) showed that if $u(t, x)$ is the density of a gas along the real line at
time $t$, then under appropriate physical assumptions, $u(t, x)$ will satisfy
(4). Thus, the equation (4), which also occurs in chemical and biological
contexts, is often called the diffusion equation. A variant of the heat
equation occurs in the Black-Scholes model for stock-option pricing.

Problems

1. Use the expression $\sin\theta = \frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right)$ to prove that

$$\int_0^L \sin\frac{n\pi x}{L} \sin\frac{m\pi x}{L}\,dx =
\begin{cases} 0 & \text{if } n \ne m \\ \frac{L}{2} & \text{if } n = m. \end{cases}$$

2. Fill in the details in the proof of Theorem 9.1.1.

3. Let $f(x) = x(L - x)$, as in Example 1. Verify by explicit differentiation
that the function $u(t, x)$ defined there satisfies the heat equation.
4. Let $f(x) = x(L - x)$, as in Example 1, and suppose, for simplicity, that
$L = \pi$ and $\kappa = 1$. Let $S_n = \sum_{j=1}^{n} b_j e^{-\lambda_j t} \sin jx$, where the $b_j$ are the
same as those computed in Example 1.

(a) Generate the graphs of $f$, $S_1$, $S_3$, and $S_5$ on the interval $[0, \pi]$ when
$t = 0$. Does it look as if the series for $u(0, x)$ is converging to $f$?
(b) Find an upper bound on the error we make if we replace $u$ by $S_5$.
(c) Compare the graphs of $S_5$ for $t = 0, .5, 1, 2$, and 5. Is this the way
you would expect $u(t, x)$ to behave?

5. Let $L = \pi$ and $\kappa = 1$ and define $f(x) = x$ on the interval $[0, \pi]$.

(a) Compute the coefficients $b_n$. Hint: use formula (12) and integrate
by parts.
(b) Show by explicit differentiation that the function $u$ defined by (10)
satisfies the heat equation for $t > 0$ and explain why the boundary
conditions (6) hold for all $t > 0$.
(c) Explain why the series for $u$ at $t = 0$ cannot converge to $f$ for all
$x \in [0, \pi]$.

6. Let $S_n$ be the $n$th partial sum of the series for the solution $u$ of problem 5.

(a) Generate enough graphs of $S_1, S_2, S_3, \ldots$ to convince yourself that,
for $t = 0$, the series converges to $f$ for all $x \in [0, \pi]$ except $x = \pi$.
(b) Compare the graphs of $S_5$ for $t = 0, .5, 1, 2$, and 5. Is this the way
you would expect $u(t, x)$ to behave?

7. Consider heat flow in a bar where we do not prescribe the temperature at
the ends but instead assume that the ends are insulated.

(a) Explain why the right boundary conditions are

$$u_x(t, 0) = 0 = u_x(t, L). \tag{13}$$

(b) Use the methods of the section to derive the formal solution

$$u(t, x) = \sum_{n=0}^{\infty} a_n e^{-\lambda_n \kappa t} \cos\frac{n\pi x}{L},$$

where $a_n = \frac{2}{L} \int_0^L f(x) \cos\frac{n\pi x}{L}\,dx$ and $\lambda_n = \left(\frac{n\pi}{L}\right)^2$, if $n > 0$, and
$a_0 = \frac{1}{L} \int_0^L f(x)\,dx$.
(c) Explain why the same arguments as in Theorem 9.1.1 show that $u$ is
infinitely differentiable in $x$ and $t$ for $t > 0$ and $u$ satisfies the heat
equation and the boundary conditions (13).

(d) By using the partial differential equation, prove that $\int_0^L u(t, x)\,dx$ is
independent of $t$. Why is that reasonable?
(e) What happens to the solution as $t \to \infty$?

8. Suppose that the ends of the bar are kept at temperature zero and that
$\beta c u(t, x)$ units of heat are added to the bar per gram per unit time by
some internal chemical reaction where $\beta$ is a constant. Show that $u$ should
satisfy the partial differential equation

$$u_t(t, x) - \kappa u_{xx}(t, x) - \beta u(t, x) = 0. \tag{14}$$

Using the methods of the section, write down a formal solution to (14)
which satisfies the boundary condition (6) and the initial condition (5).
How does the behavior of the solution as $t \to \infty$ depend on $\beta$?

9. Suppose that the ends of the bar are kept at temperature zero and that
$cg(x)$ units of heat are added to the bar per gram per unit time at time $t$.
Show that $u$ should satisfy the partial differential equation

$$u_t(t, x) - \kappa u_{xx}(t, x) = g(x).$$

Suppose that the initial temperatures are given by $f(x)$ and assume that
$g$ can be written in the form $g(x) = \sum_{n=1}^{\infty} c_n \sin\frac{n\pi x}{L}$. Suppose that the
solution $u(t, x)$ has the form

$$u(t, x) = \sum_{n=1}^{\infty} T_n(t) \sin\frac{n\pi x}{L}$$

for some unknown functions $\{T_n(t)\}$. For each $n$, find and solve the
ordinary differential equation that must be satisfied by $T_n(t)$.
10. Consider a rectangular plate with coordinates $(0, 0)$, $(a, 0)$, $(a, b)$, and
$(0, b)$ for the vertices. Assume that the temperature $u(t, x, y)$ satisfies the
two-dimensional heat equation

$$u_t(t, x, y) - \kappa u_{xx}(t, x, y) - \kappa u_{yy}(t, x, y) = 0.$$

Suppose that the boundaries of the rectangle are kept at temperature zero
and that the initial temperatures are given by a function $f$ that can be
written as

$$f(x, y) = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} c_{n,m} \sin\frac{n\pi x}{a} \sin\frac{m\pi y}{b}.$$

Using the methods of the section, find functions $T_{n,m}(t)$ so that this
initial-boundary value problem has a solution of the form

$$u(t, x, y) = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} T_{n,m}(t) \sin\frac{n\pi x}{a} \sin\frac{m\pi y}{b}.$$

9.2 Definitions and Examples


We want to investigate under what conditions a real or complex-valued
function on $\mathbb{R}$, $f(x)$, can be written in the form

$$f(x) = a_0 + \sum_{m=1}^{\infty} a_m \cos mx + b_m \sin mx. \tag{15}$$

Note that if $\sum |a_m| < \infty$ and $\sum |b_m| < \infty$ then the series on the right
certainly converges for each $x$. If we define

$$S_n(x) = a_0 + \sum_{m=1}^{n} a_m \cos mx + b_m \sin mx,$$

then (15) means that the partial sums $S_n(x)$ converge to $f(x)$. Each
partial sum is a periodic function of period $2\pi$; that is, $S_n(x + 2\pi) = S_n(x)$.
It is easy to prove from this (problem 1) that if the series converges, then
$f$ must be periodic of period $2\pi$. From now on we assume that $f$ has this
property. For calculations, it is often convenient to rewrite the series and
partial sums in terms of complex exponentials. Using the formulas for
$\sin x$ and $\cos x$ in terms of complex exponentials (Example 1 of Section
6.5), we find

$$S_n(x) = a_0 + \sum_{m=1}^{n} a_m \frac{e^{imx} + e^{-imx}}{2} + b_m \frac{e^{imx} - e^{-imx}}{2i}$$

$$= a_0 + \sum_{m=1}^{n} \tfrac{1}{2}(a_m - ib_m) e^{imx} + \sum_{m=1}^{n} \tfrac{1}{2}(a_m + ib_m) e^{-imx}.$$

Thus, if we define

$$c_m = \begin{cases} \tfrac{1}{2}(a_m - ib_m) & \text{if } m > 0 \\ a_0 & \text{if } m = 0 \\ \tfrac{1}{2}(a_{-m} + ib_{-m}) & \text{if } m < 0 \end{cases}$$

then

$$S_n(x) = \sum_{m=-n}^{n} c_m e^{imx}.$$

Therefore, (15) holds if and only if

$$f(x) = \sum_{m=-\infty}^{\infty} c_m e^{imx}, \tag{16}$$

where the infinite sum on the right means the limit of the partial sums
$\sum_{m=-n}^{n} c_m e^{imx}$ as $n \to \infty$. Some of the Fourier series which we shall
study are not absolutely convergent, so it is important to specify which
sequence of partial sums we mean.
If (16) holds and if the series converges uniformly, then the
coefficients $\{c_m\}$ are determined. To see this, multiply both sides of (16) by
$e^{-inx}$ and integrate term by term. Then,

$$\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx = \sum_{m=-\infty}^{\infty} c_m \int_{-\pi}^{\pi} e^{i(m-n)x}\,dx = 2\pi c_n$$

since the integral is zero unless $n = m$. This shows that the coefficient $c_n$
is determined for each integer $n$. If we define

$$c_m = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-imx}\,dx, \tag{17}$$

then the series on the right side of (16) is called the Fourier series of $f$ and
the $c_m$ are called the Fourier coefficients. Since (15) is just a rewriting of
(16), it is also called the Fourier series of $f$. The coefficients $\{a_m\}_{m=0}^{\infty}$
and $\{b_m\}_{m=1}^{\infty}$ are also called Fourier coefficients and can be written in
terms of $\{c_m\}_{m=-\infty}^{\infty}$ by the formulas: $a_0 = c_0$, $a_m = c_{-m} + c_m$, and
$b_m = i^{-1}(c_{-m} - c_m)$, for $m > 0$. From these formulas, it follows easily
(problem 2) that

$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\,dt \tag{18}$$

$$a_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos mt\,dt, \quad m > 0 \tag{19}$$

$$b_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin mt\,dt, \quad m > 0. \tag{20}$$
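
The relations between the complex coefficients $c_m$ and the real coefficients $a_m$, $b_m$ can be checked numerically. The sketch below assumes a sample smooth $2\pi$-periodic function chosen only for illustration, and approximates each integral over $[-\pi, \pi]$ by an average over equally spaced sample points. The function names `c`, `a`, and `b` simply mirror formulas (17) through (20).

```python
import cmath
import math

def f(x):
    # Sample smooth function of period 2*pi, chosen only for illustration.
    return math.exp(math.sin(x)) + 0.5 * math.cos(2 * x)

N = 512  # number of equally spaced sample points on [-pi, pi)
xs = [-math.pi + 2 * math.pi * k / N for k in range(N)]

def mean(values):
    """Average of the samples; approximates (1/(2*pi)) * integral over [-pi, pi]."""
    return sum(values) / N

def c(m):
    """Formula (17)."""
    return mean([f(x) * cmath.exp(-1j * m * x) for x in xs])

def a(m):
    """Formulas (18) and (19); note (1/pi) * integral equals 2 * mean."""
    if m == 0:
        return mean([f(x) for x in xs])
    return 2 * mean([f(x) * math.cos(m * x) for x in xs])

def b(m):
    """Formula (20)."""
    return 2 * mean([f(x) * math.sin(m * x) for x in xs])

for m in range(1, 4):
    print(m, a(m), c(-m) + c(m))          # a_m = c_{-m} + c_m
    print(m, b(m), (c(-m) - c(m)) / 1j)   # b_m = (c_{-m} - c_m) / i
```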

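Example 1 Suppose that $f(x) = (\pi - |x|)^2$ on $[-\pi, \pi]$. Since $f(\pi) =
0 = f(-\pi)$, the function $f$ can be extended to be a continuous function
of period $2\pi$ on $\mathbb{R}$. Since $f(t)$ is even, all the coefficients $b_m$ are equal to
zero (problem 4). We calculate

\[
a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} (\pi - |t|)^2\,dt = \frac{1}{\pi} \int_{0}^{\pi} (\pi - t)^2\,dt = \frac{\pi^2}{3}
\]

and by integration by parts,

\[
\begin{aligned}
a_m &= \frac{2}{\pi} \int_{0}^{\pi} (\pi - t)^2 \cos mt\,dt \\
    &= \frac{2}{\pi m} \left\{ (\pi - t)^2 \sin mt \right\}_{t=0}^{t=\pi} + \frac{2}{\pi m} \int_{0}^{\pi} 2(\pi - t) \sin mt\,dt \\
    &= -\frac{4}{\pi m^2} \left\{ (\pi - t) \cos mt \right\}_{t=0}^{t=\pi} \\
    &= \frac{4}{m^2}.
\end{aligned}
\]

Thus, the Fourier series of $f$ is

\[
f(x) \sim \frac{\pi^2}{3} + \sum_{m=1}^{\infty} \frac{4}{m^2} \cos mx. \tag{21}
\]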
Because of the $m^2$ in the denominator, the series certainly converges, but
we don't know whether it converges to $f(x)$, which is why we write the
$\sim$ in (21). The graphs of $f$ and of several partial sums $S_n(x)$
(see Figure 9.2.1) provide some numerical evidence that the series (21)
converges to $f(x)$.
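The coefficients in Example 1 and the numerical evidence for convergence can be reproduced with a short computation. This is a sketch under stated assumptions: the midpoint rule and the step count are illustrative choices, and the function names are hypothetical.

```python
import math


def f(x):
    return (math.pi - abs(x)) ** 2


def cosine_coefficient(m, steps=4000):
    """a_m from (18) and (19) by the midpoint rule.

    f is even, so the integral over [-pi, pi] is twice the one over [0, pi].
    """
    h = math.pi / steps
    total = sum(
        f((k + 0.5) * h) * math.cos(m * (k + 0.5) * h) for k in range(steps)
    ) * h
    return total / math.pi if m == 0 else 2 * total / math.pi


def partial_sum(x, n):
    """S_n(x) for the series (21)."""
    return math.pi ** 2 / 3 + sum(
        4 / m ** 2 * math.cos(m * x) for m in range(1, n + 1)
    )


for m in (0, 1, 2, 5):
    exact = math.pi ** 2 / 3 if m == 0 else 4 / m ** 2
    print(m, cosine_coefficient(m), exact)

for n in (1, 5, 50, 500):
    print(n, partial_sum(1.0, n), f(1.0))
```

The second loop only suggests convergence at the single point $x = 1$; it proves nothing about convergence in general.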

Figure 9.2.1

Example 2 Let $f$ be the function $f(x) = -1$ on $[-\pi, 0]$ and $f(x) = 1$ on
$(0, \pi)$. $f$ is piecewise continuous on $[-\pi, \pi)$ with a jump discontinuity at
$x = 0$. Thus the $2\pi$ periodic extension of $f$ will have jump discontinuities
at the points $x = 2n\pi$ for integral $n$. In addition, $f(\pi^-) = 1$ and $f(\pi^+) =
-1$, so the periodic extension also has jump discontinuities at the points
$x = (2n+1)\pi$ for integral $n$. See Figure 9.2.2(a). It is not hard to calculate
the Fourier coefficients. Since $f$ is an odd function and $\cos mx$ is even,
$a_m = 0$ for all $m$. Furthermore,

\[
b_m = -\frac{1}{\pi} \int_{-\pi}^{0} \sin mx\,dx + \frac{1}{\pi} \int_{0}^{\pi} \sin mx\,dx
    = \begin{cases}
      \dfrac{4}{\pi m} & m = 1, 3, 5, \ldots \\
      0 & m = 2, 4, 6, \ldots
      \end{cases}
\]

Thus, the Fourier series of $f$ is

\[
f(x) \sim \sum_{n=0}^{\infty} \frac{4}{\pi(2n+1)} \sin (2n+1)x. \tag{22}
\]

Notice that it is not at all evident that this series converges for any $x \neq
0$ since the coefficients decay only like $m^{-1}$. In fact, how could a sum
of infinitely differentiable functions like $\sin mx$ and $\cos mx$ converge to
a function like $f$ which has jumps? Nevertheless, Figure 9.2.2, which
graphs several partial sums $S_n(x)$, provides strong visual
evidence that the partial sums of the series get close to $f$ in some sense.
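The behavior of the partial sums of (22) can also be examined numerically. The sketch below is illustrative only (the function names and sample points are assumptions); it evaluates $S_n$ at the jump $x = 0$ and at the point of continuity $x = \pi/2$.

```python
import math


def b(m):
    """Sine coefficient of the step function: 4/(pi m) for odd m, 0 for even m."""
    return 4 / (math.pi * m) if m % 2 == 1 else 0.0


def partial_sum(x, n):
    """S_n(x) = sum of b_m sin(mx) for 1 <= m <= n."""
    return sum(b(m) * math.sin(m * x) for m in range(1, n + 1))


for n in (1, 9, 99, 999):
    # At x = 0 every term vanishes; at x = pi/2 the values approach f(pi/2) = 1.
    print(n, partial_sum(0.0, n), partial_sum(math.pi / 2, n))
```

At $x = \pi/2$ the series reduces to $\frac{4}{\pi}(1 - \frac13 + \frac15 - \cdots)$, an alternating series, so the error after the terms up to $m = n$ is at most the first omitted term.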

Figure 9.2.2

Example 3 We return briefly to consider initial conditions for the heat
equation considered in Section 9.1. Suppose that $f$ is a continuous function
defined on the interval $[0, L]$ such that $f(0) = 0 = f(L)$. Then
$g(x) = f(\frac{xL}{\pi})$ is a continuous function on the interval $[0, \pi]$. By defining
$g(x) = -g(-x)$ for $x \in [-\pi, 0)$, we can extend $g$ to be a continuous odd function
on $[-\pi, \pi]$, and since $g(0) = 0$, no jump discontinuity is introduced at
$x = 0$. Furthermore, because $g(-\pi) = 0 = g(\pi)$, $g$ extends to a unique
$2\pi$ periodic function on $\mathbb{R}$. Since $g$ is odd, the Fourier coefficients $a_m$ are
all zero and thus the Fourier series of $g$ has only sine terms. Therefore, if
the Fourier series of $g$ converges to $g(x)$ at each $x$, then

\[
f(x) = g\left(\frac{x\pi}{L}\right) = \sum_{m=1}^{\infty} b_m \sin \frac{m\pi x}{L},
\]

so $f(x)$ can indeed be represented by formula (11). Thus the question
posed in Section 9.1 is a special case of the Fourier series question posed
in this section. If $f$ is continuously differentiable and satisfies $f'(0) =
0 = f'(L)$, then the even extension of $g$ to $[-\pi, \pi]$ can be extended to
be a continuously differentiable $2\pi$ periodic function on $\mathbb{R}$ (problem 5).
This extension is used in the case where the ends of the rod are insulated
(problem 7 of Section 9.1) and results in a cosine series for the original
function $f$.
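As an illustration of the odd extension in Example 3, here is a minimal sketch for one concrete initial condition. The rod length `L`, the choice $f(x) = x(L - x)$, and the midpoint rule are assumptions made for the example. Since $g$ is odd, (20) reduces to $b_m = \frac{2}{\pi}\int_0^{\pi} g(t)\sin mt\,dt$.

```python
import math

L = 2.0  # hypothetical rod length


def f(x):
    """A continuous function on [0, L] with f(0) = 0 = f(L)."""
    return x * (L - x)


def sine_coefficient(m, steps=4000):
    """b_m = (2/pi) * integral over [0, pi] of g(t) sin(mt), with g(t) = f(L t / pi)."""
    h = math.pi / steps
    total = sum(
        f(L * (k + 0.5) * h / math.pi) * math.sin(m * (k + 0.5) * h)
        for k in range(steps)
    ) * h
    return 2 * total / math.pi


def sine_series(x, terms):
    """Partial sum of the sine series of f at a point x in [0, L]."""
    return sum(
        sine_coefficient(m) * math.sin(m * math.pi * x / L)
        for m in range(1, terms + 1)
    )


# For this f, direct integration gives b_m = 8 L^2 / (pi^3 m^3) for odd m, 0 for even m.
print(sine_coefficient(1), 8 * L ** 2 / math.pi ** 3)
print(sine_coefficient(2))
print(sine_series(0.5, 41), f(0.5))
```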

Problems

1. Let $\{f_n\}$ be a sequence of functions on $\mathbb{R}$ which satisfy $f_n(x + 2\pi) = f_n(x)$
for all $n$ and $x$. Suppose that $f_n \to f$ pointwise. Prove that $f(x + 2\pi) =
f(x)$ for all $x$.

2. Use (17) to prove formulas (18), (19), and (20).

3. Suppose that the Fourier series of $f$ converges to $f(x)$ everywhere. Find
a condition on the Fourier coefficients $\{c_m\}$ which holds if and only if $f$
is real-valued.

4. Let $f$ be a piecewise continuous function on $[-\pi, \pi]$. Show that $a_m = 0$ for
all $m$ if $f$ is an odd function and $b_m = 0$ for all $m$ if $f$ is an even function.

5. Suppose that $f$ is a continuously differentiable function on the interval
$[0, L]$ that satisfies $f'(0) = 0 = f'(L)$. Show that $g(x) = f(\frac{xL}{\pi})$ defined on
$[0, \pi]$ has a unique even, continuously differentiable, $2\pi$ periodic extension
to $\mathbb{R}$.

6. Assume that the Fourier series for $f(x) = (\pi - |x|)^2$, which we computed
in Example 1, converges to $f(x)$ for each $x$ in $[-\pi, \pi]$. Prove that

\[
\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.
\]

7. Compute the Fourier coefficients of the periodic extension of the function
$f(x) = x$ on $[-\pi, \pi)$. How does the periodic extension of the function
$g(x) = x$ on $(-\pi, \pi]$ differ from the extension of $f$? Do the Fourier
coefficients differ?

8. Show that the Fourier series of the function $f(x) = |x|$ on the interval
$[-\pi, \pi]$ is

\[
f(x) \sim \frac{\pi}{2} - \frac{4}{\pi} \sum_{n=1}^{\infty} \frac{\cos (2n-1)x}{(2n-1)^2}.
\]

9. Suppose that the Fourier coefficients of a piecewise continuous function $f$
satisfy $\sum |a_m| < \infty$ and $\sum |b_m| < \infty$ and that the Fourier series converges
to $f(x)$ for all $x$. Prove that $f$ must be continuous.

10. Suppose that the Fourier coefficients of a piecewise continuous function $f$
satisfy $\sum m^k |a_m| < \infty$ and $\sum m^k |b_m| < \infty$ for some fixed positive integer
$k$ and that the Fourier series converges to $f(x)$ for all $x$. Prove that $f$ is $k$
times continuously differentiable.

11. Let $f$ be a continuous function of period $2\pi$ defined on $\mathbb{R}$.

(a) Prove that $\int_{a}^{a+2\pi} f(x)\,dx$ is independent of $a$.

(b) Let $f_a$ be the function $f_a(x) = f(x - a)$. What is the relationship
between the Fourier coefficients of $f$ and the Fourier coefficients of
$f_a$?

(c) Let $f_k$ be the function $f_k(x) = e^{ikx} f(x)$, where $k$ is an integer.
What is the relationship between the Fourier coefficients of $f$ and
the Fourier coefficients of $f_k$?

9.3 Pointwise Convergence

In this section we give conditions on $f$ near a point $x$ that guarantee
that the Fourier series of $f$ converges to $f(x)$ at $x$. The examples and
problems in the last section suggest that the proof will be quite long and
difficult. If the Fourier coefficients satisfy $\sum |a_m| < \infty$ and $\sum |b_m| < \infty$,
then the Fourier series of $f$ converges uniformly, so $f$ would be continuous
(problem 9 of Section 9.2). Thus, if $f$ is not continuous, even at
one point, we will not have these conditions on the coefficients. On the
other hand, if we don't know that $\sum |a_m| < \infty$ and $\sum |b_m| < \infty$, how
will we ever show that the Fourier series converges, let alone converges
to $f(x)$? Example 2 of Section 9.2 indicates that there is a subtle phenomenon
which we must analyze. In that case the function $f$ has a jump
discontinuity at $x = 0$. Nevertheless, the partial sums look like they are
converging to the values of $f$ to the right and left of $x = 0$. At $x = 0$
itself, the series converges to zero, which is the average of the value of $f$
to the right ($+1$) and the value of $f$ to the left ($-1$) of $x = 0$.
We begin by proving a lemma which generalizes the following explicit
calculation. Suppose that $f$ is a constant function, $f(x) = C$, on an
interval $[a, b]$. Then,

\[
\int_{a}^{b} f(t) e^{int}\,dt = C \int_{a}^{b} e^{int}\,dt = C\,\frac{e^{inb} - e^{ina}}{in} \to 0
\]

as $n \to \infty$. A similar direct calculation (problem 1) shows that the same
is true if $f$ is piecewise constant.
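Lemma 9.3.1 (The Riemann-Lebesgue Lemma) Let $f$ be piecewise
continuous on the finite interval $[a, b]$. Then,

\[
\lim_{n \to \infty} \int_{a}^{b} f(t) e^{\pm int}\,dt = 0. \tag{23}
\]

Proof. Let $a = a_0 < a_1 < \cdots < a_M = b$ be a partition of the
interval $[a, b]$ so that $f$ is continuous on each subinterval $(a_{j-1}, a_j)$. Since
$f$ is piecewise continuous, there is a unique continuous extension $f_j$ of $f$
on $[a_{j-1}, a_j]$, and by Corollary 3.5.4,

\[
\int_{a}^{b} f(t) e^{\pm int}\,dt = \sum_{j=1}^{M} \int_{a_{j-1}}^{a_j} f_j(t) e^{\pm int}\,dt. \tag{24}
\]

If each term in the finite sum on the right of (24) converges to zero as
$n \to \infty$, then the left-hand side converges to zero. Thus, it is sufficient to
prove the result in the case where $f$ is continuous, which we henceforth
assume. If $f$ is constant, then the explicit calculation that we made above
shows that (23) holds. So the idea of the proof is to approximate $f$ by a
piecewise constant function. Let $\varepsilon > 0$ be given. By Theorem 3.2.5, $f$ is
uniformly continuous on $[a, b]$, so we can choose a $\delta > 0$ so that

\[
|f(x) - f(y)| \le \frac{\varepsilon}{b - a} \quad \text{if } |x - y| < \delta. \tag{25}
\]

Choose a partition $a = p_0 < p_1 < p_2 < \cdots < p_N = b$ of $[a, b]$ so that
$|p_i - p_{i-1}| < \delta$ for all $i$, and define $g$ to be the function whose value on
$[p_{i-1}, p_i)$ is $f(p_i)$. Then, by (25), $|f(t) - g(t)| \le \frac{\varepsilon}{b-a}$ for all $t \in [a, b]$. Thus,

\[
\left| \int_{a}^{b} (f(t) - g(t)) e^{\pm int}\,dt \right| \le \int_{a}^{b} |f(t) - g(t)|\,dt \le \varepsilon.
\]

Therefore,

\[
\begin{aligned}
\left| \int_{a}^{b} f(t) e^{\pm int}\,dt \right|
  &= \left| \int_{a}^{b} (f(t) - g(t)) e^{\pm int}\,dt + \int_{a}^{b} g(t) e^{\pm int}\,dt \right| && (26) \\
  &\le \int_{a}^{b} |f(t) - g(t)|\,dt + \left| \sum_{j=1}^{N} f(p_j) \int_{p_{j-1}}^{p_j} e^{\pm int}\,dt \right| && (27) \\
  &\le \varepsilon + \sum_{j=1}^{N} |f(p_j)| \left| \int_{p_{j-1}}^{p_j} e^{\pm int}\,dt \right|. && (28)
\end{aligned}
\]

Each of the integrals in the finite sum goes to zero as $n \to \infty$, so

\[
\limsup_{n \to \infty} \left| \int_{a}^{b} f(t) e^{\pm int}\,dt \right| \le \varepsilon.
\]

Since $\varepsilon > 0$ was arbitrary,

\[
\limsup_{n \to \infty} \left| \int_{a}^{b} f(t) e^{\pm int}\,dt \right| \le 0.
\]

However, since $\left| \int_{a}^{b} f(t) e^{\pm int}\,dt \right| \ge 0$ for all $n$,

\[
\liminf_{n \to \infty} \left| \int_{a}^{b} f(t) e^{\pm int}\,dt \right| \ge 0.
\]

Therefore, by Corollary 6.1.2, $\lim_{n \to \infty} \left| \int_{a}^{b} f(t) e^{\pm int}\,dt \right|$ exists and equals
zero, which implies that $\lim_{n \to \infty} \int_{a}^{b} f(t) e^{\pm int}\,dt = 0$. □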
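The decay asserted by the Riemann-Lebesgue lemma is easy to observe numerically. The sketch below uses an assumed piecewise continuous function on $[0, 2]$ with a jump at $t = 1$; the function, the interval, and the midpoint rule are illustrative choices, not taken from the text.

```python
import cmath


def f(t):
    """A piecewise continuous function on [0, 2] with a jump at t = 1."""
    return t * t if t < 1 else 3 - t


def oscillatory_integral(n, steps=20000):
    """Midpoint approximation of the integral of f(t) e^{int} over [0, 2]."""
    h = 2 / steps
    return sum(
        f((k + 0.5) * h) * cmath.exp(1j * n * (k + 0.5) * h) for k in range(steps)
    ) * h


for n in (1, 10, 100, 200):
    print(n, abs(oscillatory_integral(n)))
```

For this particular $f$, integrating by parts on each piece gives the bound $6/n$, which is stronger than the lemma; the lemma itself promises only convergence to zero, with no rate.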

The functions that we are considering are piecewise continuous, so
the left-hand limit, $f(x^-)$, and the right-hand limit, $f(x^+)$, exist at every
$x$. We say that the difference quotient of $f$ has left-hand and right-hand
limits at $x$ if both of the limits,

\[
\lim_{s \nearrow 0} \frac{f(x + s) - f(x^-)}{s}, \qquad \lim_{s \searrow 0} \frac{f(x + s) - f(x^+)}{s}
\]

exist and are not infinite. Note that the value of $f$ at $x$ may not itself be
involved in these difference quotients since it need not equal $f(x^-)$ or
$f(x^+)$. Of course, if $f$ is continuous at $x$, then $f(x^-) = f(x) = f(x^+)$.
The following theorem gives sufficient conditions for the pointwise
convergence of Fourier series.

□ Theorem 9.3.2 Suppose that $f$ is piecewise continuous on $[-\pi, \pi]$ and
periodic of period $2\pi$ on $\mathbb{R}$. Let $\{c_m\}$ be the Fourier coefficients of $f$ and
define $S_n(x) = \sum_{m=-n}^{n} c_m e^{imx}$. Then, at every point $x$ where the left-hand
and right-hand limits of the difference quotient of $f$ exist,

\[
\lim_{n \to \infty} S_n(x) = \frac{f(x^+) + f(x^-)}{2}. \tag{29}
\]
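Proof. By the definitions of $S_n$ and $c_m$,

\[
\begin{aligned}
S_n(x) &= \sum_{m=-n}^{n} \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-imt}\,dt \right) e^{imx} && (30) \\
       &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \sum_{m=-n}^{n} e^{im(x-t)}\,dt. && (31)
\end{aligned}
\]

To find a simple expression for $D_n(x) = \sum_{m=-n}^{n} e^{imx}$, we note that

\[
(e^{ix} - 1) D_n(x) = e^{i(n+1)x} - e^{-inx}. \tag{32}
\]

Solving (32) for $D_n(x)$ and multiplying the numerator and denominator
by $e^{-ix/2}$, we find that

\[
D_n(x) = \frac{\sin (n + \frac{1}{2})x}{\sin \frac{1}{2}x}.
\]

Substituting in (31) and making the change of variables $s = t - x$, we get

\[
\begin{aligned}
S_n(x) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_n(x - t)\,dt \\
       &= \frac{1}{2\pi} \int_{-\pi - x}^{\pi - x} f(x + s) D_n(-s)\,ds \\
       &= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x + s) D_n(s)\,ds.
\end{aligned}
\]

In the last step we used the fact that $D_n$ is even and that all integrals of
$f(x + s) D_n(s)$ on intervals of length $2\pi$ are the same since both $f(x + s)$
and $D_n(s)$ are periodic of period $2\pi$ in $s$. Because $\frac{1}{2\pi} \int_{-\pi}^{\pi} D_n(s)\,ds = 1$
and $D_n(s)$ is even,

\[
\begin{aligned}
S_n(x) - \frac{f(x^+) + f(x^-)}{2}
  &= \frac{1}{2\pi} \int_{-\pi}^{0} (f(x + s) - f(x^-)) D_n(s)\,ds
   + \frac{1}{2\pi} \int_{0}^{\pi} (f(x + s) - f(x^+)) D_n(s)\,ds \\
  &= \frac{1}{2\pi} \int_{-\pi}^{0} g(s) \sin (n + \tfrac{1}{2})s\,ds
   + \frac{1}{2\pi} \int_{0}^{\pi} h(s) \sin (n + \tfrac{1}{2})s\,ds,
\end{aligned}
\]

where

\[
g(s) = \frac{f(x + s) - f(x^-)}{\sin \frac{1}{2}s}, \qquad
h(s) = \frac{f(x + s) - f(x^+)}{\sin \frac{1}{2}s}.
\]

Since $\sin \frac{1}{2}s$ does not vanish on $[-\pi, 0)$, and $f$ is piecewise continuous, $g$
is piecewise continuous on $[-\pi, 0)$. Furthermore, if we write

\[
g(s) = \frac{f(x + s) - f(x^-)}{s} \cdot \frac{s}{\sin \frac{1}{2}s},
\]

we see that $g$ has a left-hand limit at $s = 0$ because of the assumption
that the difference quotient for $f$ has a left-hand limit at $x$. Therefore, $g$
is piecewise continuous on $[-\pi, 0]$. If we expand $\sin (n + \frac{1}{2})s$, we have

\[
\begin{aligned}
\frac{1}{2\pi} \int_{-\pi}^{0} g(s) \sin (n + \tfrac{1}{2})s\,ds
  &= \frac{1}{2\pi} \int_{-\pi}^{0} \left( g(s) \cos \tfrac{s}{2} \right) \sin ns\,ds \\
  &\quad + \frac{1}{2\pi} \int_{-\pi}^{0} \left( g(s) \sin \tfrac{s}{2} \right) \cos ns\,ds.
\end{aligned}
\]

Now we write $\sin ns$ and $\cos ns$ in terms of $e^{ins}$ and $e^{-ins}$, and we use
the Riemann-Lebesgue lemma to conclude that

\[
\frac{1}{2\pi} \int_{-\pi}^{0} g(s) \sin (n + \tfrac{1}{2})s\,ds \to 0 \quad \text{as } n \to \infty.
\]

A similar proof, using the hypothesis that the difference quotient of $f$
has a right-hand limit at $x$, shows that

\[
\frac{1}{2\pi} \int_{0}^{\pi} h(s) \sin (n + \tfrac{1}{2})s\,ds \to 0 \quad \text{as } n \to \infty,
\]

which completes the proof. □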
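The two facts about $D_n$ used in the proof, the closed form and the normalization $\frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(s)\,ds = 1$, can be spot-checked numerically. This is only an illustrative sketch; the function names, sample points, and node count are assumptions.

```python
import cmath
import math


def dirichlet_sum(n, x):
    """D_n(x) computed directly as the sum of e^{imx} for -n <= m <= n."""
    return sum(cmath.exp(1j * m * x) for m in range(-n, n + 1)).real


def dirichlet_closed(n, x):
    """Closed form sin((n + 1/2) x) / sin(x / 2), valid when sin(x / 2) != 0."""
    return math.sin((n + 0.5) * x) / math.sin(x / 2)


def mean_value(n, steps=1000):
    """(1/2pi) times the midpoint-rule integral of D_n over [-pi, pi]."""
    h = 2 * math.pi / steps
    total = sum(dirichlet_sum(n, -math.pi + (k + 0.5) * h) for k in range(steps)) * h
    return total / (2 * math.pi)


for x in (0.3, 1.0, 2.5, -1.7):
    print(x, dirichlet_sum(5, x), dirichlet_closed(5, x))
print(mean_value(5))
```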

Corollary 9.3.3 If $f$ is $2\pi$ periodic and continuously differentiable, then
the Fourier series of $f$ converges to $f(x)$ at every $x$.

Proof. If $f$ is continuously differentiable then the right-hand and left-hand
limits of the difference quotient exist at every $x \in \mathbb{R}$. Furthermore,
since $f$ is continuous, $f(x^+) = f(x) = f(x^-)$ for all $x$. Thus, by Theorem
9.3.2, $\lim_{n \to \infty} S_n(x) = f(x)$ for all $x$. □

Example 1 The function $f(x) = (\pi - |x|)^2$ introduced in Example 1
of Section 9.2 is continuous on $[-\pi, \pi]$, and since $f(\pi) = 0 = f(-\pi)$,
the periodic extension to $\mathbb{R}$ is continuous. Note that $f$ is the polynomial
$(\pi - x)^2$ on the interval $(0, \pi)$ and is the polynomial $(\pi + x)^2$ on the
interval $(-\pi, 0)$. Thus, the $2\pi$ periodic extension of $f$ is continuously
differentiable except possibly at the points $-\pi + 2\pi n$ and $2\pi n$, where $n$ is an
integer. At the points $-\pi + 2\pi n$, the left-hand and right-hand difference
quotients both have limit zero, so the $2\pi$ periodic extension is continuously
differentiable at these points. At $x = 0$ the derivative of $f$ does not
exist (see Figure 9.2.1). However,

\[
\lim_{s \searrow 0} \frac{f(0 + s) - f(0^+)}{s} = \lim_{s \searrow 0} \frac{(\pi - s)^2 - \pi^2}{s} = -2\pi
\]

and

\[
\lim_{s \nearrow 0} \frac{f(0 + s) - f(0^-)}{s} = \lim_{s \nearrow 0} \frac{(\pi + s)^2 - \pi^2}{s} = 2\pi,
\]

so both the right-hand and left-hand limits of the difference quotient
exist at $x = 0$, although they are not equal. Thus, the right-hand and
left-hand limits of the difference quotient of $f$ exist at all $x$. Therefore,
according to Theorem 9.3.2, the Fourier series of $f$ converges to $f(x)$ at
all $x$. That is,

\[
f(x) = \frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{4}{n^2} \cos nx.
\]

Example 2 The step function, $f$, defined in Example 2 of Section 9.2,
is continuous except for integral multiples of $\pi$. The left-hand and right-hand
limits of the difference quotient of $f$ exist at every point $x \in \mathbb{R}$
and equal zero. Therefore, Theorem 9.3.2 guarantees that the Fourier series
of $f$ will converge at all $x$. Where $f$ is continuous, that is, except
at integral multiples of $\pi$, the Fourier series converges to $f(x)$. At each
jump discontinuity, the average of $f(x^+)$ and $f(x^-)$ is 0, so the Fourier
series will converge to zero at the points $x = n\pi$. This is exactly the
behavior seen in Figure 9.2.2. Note that all the partial sums, $S_n(x)$, equal
zero at $x = 0$, so they converge to zero as $n \to \infty$. The overshoot of
the partial sums near the jump discontinuities is called the Gibbs
phenomenon after the physicist J. W. Gibbs (1839 - 1903).
Let $g$ be the function which equals $-1$ on the open interval $(-\pi, 0)$
and which equals 1 on the open interval $(0, \pi)$. Then, depending on how
we define $g$ at 0 and at one or the other of the points $\pm\pi$, we get different
$2\pi$ periodic extensions of $g$ to the whole line $\mathbb{R}$. All of these different
extensions (the function $f$ above is one of them) have the same Fourier
series since the choice of values at the points 0 and $\pm\pi$ does not affect the
integrals for the Fourier coefficients.
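The Gibbs overshoot can be quantified with a small experiment. The sketch below is an illustration under stated assumptions: it uses the fact, obtained by differentiating the partial sum term by term, that the sum of the first $N$ terms of (22) has its first maximum to the right of $0$ at $x = \pi/(2N)$. The function names are hypothetical.

```python
import math


def square_wave_partial_sum(x, terms):
    """Sum of the first `terms` terms of series (22)."""
    return sum(
        4 / (math.pi * (2 * k + 1)) * math.sin((2 * k + 1) * x) for k in range(terms)
    )


def first_peak(terms):
    """Value of the partial sum at x = pi / (2 * terms), its first maximum right of 0."""
    return square_wave_partial_sum(math.pi / (2 * terms), terms)


for terms in (5, 20, 100, 400):
    print(terms, first_peak(terms))
```

The peak values do not approach $1$. They settle near $1.179$, an overshoot of roughly 9 percent of the jump of size 2, even though the series converges to $f(x)$ at every fixed $x \neq n\pi$. The peak location $\pi/(2N)$ moves toward the jump as $N$ grows, which is why pointwise convergence is not contradicted.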

One reason that Theorem 9.3.2 is hard to prove is that the hypotheses
and conclusion are “local.” If the difference quotient for $f$ has right-hand
and left-hand limits at a point $x$ (which says that $f$ is reasonably regular
near $x$), then the Fourier series of $f$ converges at $x$. This is true no matter
what the behavior of $f$ is away from $x$. Thus the proof must show that
the behavior on the rest of the interval doesn't matter.
A theorem similar to Theorem 9.3.2 was proven in 1829 by P. G. L. Dirichlet
(1805 - 1859). Note that it gives sufficient conditions for convergence
of a Fourier series but does not answer the question of whether
the Fourier series of every continuous function converges. Most
contemporaries of Dirichlet thought the answer was yes, but they were proved
wrong by du Bois-Reymond, who found in 1876 a continuous function
whose Fourier series diverged at one point. The example, whose
construction is too long to reproduce here, led many analysts to believe
that there must be continuous functions whose Fourier series diverge
everywhere. The answer, in fact, lies in between. The Fourier series of
a continuous function can diverge on at most a set of “measure zero.”
Conversely, given a set of measure zero, there is a continuous function
whose Fourier series diverges there. Sets of measure zero are defined in
problem 10. Questions about the pointwise convergence of Fourier series
played an important role in analysis for a century and a half following
Fourier's Théorie analytique de la chaleur [17] and were not satisfactorily
settled until the 1960s. For a beautiful historical discussion, see [29].

Problems

1. A function $f$ is called piecewise constant on $[a, b]$ if there are finitely
many disjoint subintervals $(a_i, b_i)$ such that $[a, b] = \bigcup [a_i, b_i]$ and $f$ is
constant on each $(a_i, b_i)$. Prove by direct calculation that if $f$ is piecewise
constant, then

\[
\int_{a}^{b} f(t) e^{\pm int}\,dt \to 0 \quad \text{as } n \to \infty.
\]

2. Let $f$ be the $2\pi$ periodic extension of the function

\[
f(x) = \begin{cases}
2 + 3x & -\pi < x < 0 \\
x^2 & 0 < x < \pi
\end{cases}
\]

(a) For each $x \in \mathbb{R}$, find the right-hand and left-hand limits of the
difference quotient of $f$.
(b) For each $x \in \mathbb{R}$, to what number does the Fourier series of $f$
converge?

3. Which of the following functions have difference quotients at $x = 0$ that
have finite right-hand limits?

(a) $f(x) = |x|$.

(b) $f(x) = \sqrt{|x|}$.

(c) /(*) = ^
(d) $f(x) = \sin \frac{1}{x}$.
(e) $f(x) = x \sin \frac{1}{x}$.
(f) $f(x) = x^2 \sin \frac{1}{x}$.

4. Let $f$ be the $2\pi$ periodic extension of the function

\[
f(x) = \begin{cases}
0 & -\pi < x < 0 \\
5x & 0 \le x < \pi \\
5\pi/2 & x = \pi.
\end{cases}
\]

Prove that the Fourier series of $f$ converges to $f(x)$ for all $x$.

5. (a) Compute the Fourier coefficients of the periodic extension of the
function $f(x) = x$ on $[-\pi, \pi)$ (problem 7 of Section 9.2).
(b) Compute the graphs of $S_1$, $S_3$, and $S_{10}$ and compare them to the
graph of $f$.
(c) Where does the Fourier series of $f$ converge? To what does it
converge?
(d) Show that the solution of the heat equation discussed in problem
5 of Section 9.1 satisfies the initial condition at all $x \in [0, \pi]$ except
$x = \pi$.

6. Let $f$ be the function

\[
f(x) = \begin{cases}
\pi + x & -\pi < x < 0 \\
\pi - x & 0 < x < \pi.
\end{cases}
\]

Where does the Fourier series of $f$ converge? To what function does it
converge?

7. Let $f$ be a twice continuously differentiable $2\pi$ periodic function. Prove
that the Fourier series of $f$ converges to $f$ uniformly. Hint: use integration
by parts to estimate the coefficients. Note that the same conclusion holds
with a weaker hypothesis (Theorem 9.4.4).

8. Explain carefully why the coefficients $\{a_n\}$ of the solution $u(t, x)$ in
problem 7 of Section 9.1 can be chosen so that the initial condition $u(0, x) =
f(x)$ holds for all $x$ if $f'(0) = 0 = f'(L)$.

9. Let $f$ be a $2\pi$ periodic function that is infinitely often continuously
differentiable. Explain why the derivatives of $f$ are periodic with period
$2\pi$. Prove that the Fourier series for $f$ may be differentiated term by term
as often as we like and that we thereby obtain the Fourier series for the
derivatives of $f$.

10. A subset $E \subset \mathbb{R}$ is said to have measure zero if, given any $\varepsilon > 0$,
there is a countable family of intervals $\{I_n\}_{n=1}^{\infty}$ such that $E \subset \bigcup I_n$ and
$\sum \operatorname{length}(I_n) \le \varepsilon$.

(a) Show that any finite set of points has measure zero.
(b) Show that the set { -
knJ
} has measure zero.
(c) Show that any countable set has measure zero. Remark: There are
uncountable sets of measure zero.

9.4 Mean-square Convergence


In the last section, we saw that the question of the pointwise convergence
of Fourier series is quite delicate. We shall see in this section that
mean-square convergence is fairly easy to establish. We begin from a
more abstract point of view which will allow us to make connections
with other problems in classical and modern analysis. If $f$ and $g$ are
complex-valued piecewise continuous functions on an interval $[a, b]$, we
define the inner product, $(f, g)$, of $f$ and $g$ by

\[
(f, g) = \int_{a}^{b} f(x) \overline{g(x)}\,dx,
\]

where $\overline{g(x)}$ denotes the complex conjugate of $g(x)$. The integration of
complex-valued functions is discussed in Section 8.2. Elementary properties
of the Riemann integral and complex numbers allow one to show
easily that (problem 1)

(a) $(f, f) \ge 0$ and $(f, f) = 0$ if and only if $f = 0$.

(b) $(\alpha_1 f_1 + \alpha_2 f_2, g) = \alpha_1 (f_1, g) + \alpha_2 (f_2, g)$ for all $\alpha_1$ and $\alpha_2$ in $\mathbb{C}$.

(c) $(f, g) = \overline{(g, f)}$.

From (b) and (c), it follows that the inner product is conjugate linear in
the second factor; that is, $(f, \alpha_1 g_1 + \alpha_2 g_2) = \overline{\alpha_1} (f, g_1) + \overline{\alpha_2} (f, g_2)$. The
$L^2$ norm, defined in Section 5.3, can be expressed in terms of the inner
product by

\[
\|f\|_2 = (f, f)^{1/2}.
\]

If $\|f - f_n\|_2 \to 0$, then the sequence of functions $\{f_n\}$ is said to converge
to $f$ in the $L^2$ norm, or in the mean-square sense. The reader is asked
to show in problem 3 that pointwise convergence for all $x$ does not imply
mean-square convergence, nor does mean-square convergence imply
pointwise convergence.
Proposition 9.4.1 (The Cauchy-Schwarz Inequality) For all piecewise
continuous functions $f$ and $g$,

\[
|(f, g)| \le \|f\|_2 \|g\|_2. \tag{33}
\]

Proof. The proof uses only the three simple properties of the inner
product. First suppose that $\|g\|_2 = 1$.

\[
\begin{aligned}
0 &\le \|f - (f, g) g\|_2^2 \\
  &= (f - (f, g) g,\; f - (f, g) g) \\
  &= (f, f) - \overline{(f, g)}\,(f, g) - (f, g)\,\overline{(f, g)} + (f, g)\,\overline{(f, g)}\,(g, g) \\
  &= \|f\|_2^2 - |(f, g)|^2,
\end{aligned}
\]

from which (33) follows in the case $\|g\|_2 = 1$. If $g$ is the zero function,
then (33) certainly holds, and if $g$ is not, then $\|g\|_2 > 0$. Applying what
we have just proven to the functions $f$ and $h = g / \|g\|_2$, we find $|(f, h)| \le
\|f\|_2$ since $\|h\|_2 = 1$. Thus,

\[
\frac{|(f, g)|}{\|g\|_2} = |(f, h)| \le \|f\|_2,
\]

which proves (33) in the general case. □

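Proposition 9.4.2 If $f$ and $g$ are piecewise continuous on $[a, b]$, then

(a) $\|f\|_2 \ge 0$ and $\|f\|_2 = 0$ if and only if $f = 0$.

(b) $\|\alpha f\|_2 = |\alpha| \|f\|_2$ for all $\alpha$ in $\mathbb{C}$.

(c) $\|f + g\|_2 \le \|f\|_2 + \|g\|_2$.

Proof. Properties (a) and (b) follow immediately from similar properties
of the inner product. By the Cauchy-Schwarz inequality,

\[
\begin{aligned}
\|f + g\|_2^2 &= (f, f) + (f, g) + (g, f) + (g, g) \\
              &\le \|f\|_2^2 + 2 \|f\|_2 \|g\|_2 + \|g\|_2^2 \\
              &= (\|f\|_2 + \|g\|_2)^2,
\end{aligned}
\]

so (c) follows by taking the square root of both sides. □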

Thus $\|\cdot\|_2$ has the three properties of a norm (see Section 5.8), which is
why we have been referring to it as the $L^2$ norm. Since this norm comes
from an inner product, we can introduce a notion of orthogonality.

Definition. Two piecewise continuous functions $f$ and $g$ are said to
be orthogonal if $(f, g) = 0$. A set (finite or infinite) of piecewise continuous
functions, $\{\varphi_n\}_{n=1}^{N}$, is said to be an orthonormal family if $\|\varphi_n\|_2 = 1$
for all $n$ and $(\varphi_n, \varphi_m) = 0$ for all $m \neq n$.

It may seem strange to refer to two functions as orthogonal, but there
is a good reason for this terminology. Recall that the “dot product” of
two vectors in $\mathbb{R}^n$, $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, is defined
by

\[
x \cdot y = \sum_{j=1}^{n} x_j y_j.
\]

Notice that the dot product is a function from $\mathbb{R}^n \times \mathbb{R}^n$ to $\mathbb{R}$ that satisfies

(a) $x \cdot x \ge 0$ and $x \cdot x = 0$ if and only if $x = 0$.

(b) $(\alpha_1 x + \alpha_2 y) \cdot z = \alpha_1 x \cdot z + \alpha_2 y \cdot z$.

(c) $x \cdot y = y \cdot x$.

These three properties are completely analogous to the three properties
of the $L^2$ inner product introduced at the beginning of this section, except
for the extra complex conjugation. We say that two vectors in $\mathbb{R}^n$, $x$ and
$y$, are orthogonal if $x \cdot y = 0$; so we use the same terminology in the case
of the $L^2$ inner product. We shall see that the concept of orthogonality
for functions plays as fundamental a role as orthogonality in $\mathbb{R}^n$. This
suggests that there is an underlying idea here which is worth studying
further. In project 4 we introduce the notion of inner product space and
show that the Cauchy-Schwarz inequality holds in general. In project
5, we give examples of complete inner product spaces, called Hilbert
spaces.
Of course, the orthonormal family that we have in mind is

\[
\left\{ \frac{e^{inx}}{\sqrt{2\pi}} \right\}_{n=-\infty}^{\infty} \tag{34}
\]

on the interval $[-\pi, \pi]$. The fact that the functions $e^{inx}$ are orthogonal to
each other enabled us to derive formula (17) for the Fourier coefficients.
Dividing by $\sqrt{2\pi}$ ensures that each has $L^2$ norm equal to 1. Note that it is
only for convenience in the definition that we have indexed the sequence
of functions $\{\varphi_n(x)\}$ from $n = 1$ to $n = N$, where possibly $N = \infty$. The
index can run over some other set (as in the above example), and the
order of the functions in the sequence plays no role in the definition.
Suppose that $f$ is a piecewise continuous function and $\{\varphi_n\}_{n=1}^{N}$ is a
finite orthonormal family on the interval $[a, b]$. We want to find the linear
combination of the functions $\varphi_n$ that gives the best approximation to $f$
in the mean-square sense. That is, we want to choose coefficients $\{c_n\}$ so
that the norm of the difference $\|f - \sum_{n=1}^{N} c_n \varphi_n\|_2$ is as small as possible.
First, we calculate

\[
\begin{aligned}
\Big\| f - \sum_{n=1}^{N} c_n \varphi_n \Big\|_2^2
  &= \Big( f - \sum_{n=1}^{N} c_n \varphi_n,\; f - \sum_{n=1}^{N} c_n \varphi_n \Big) \\
  &= (f, f) - \sum_{n=1}^{N} c_n (\varphi_n, f) - \sum_{n=1}^{N} \overline{c_n} (f, \varphi_n) + \sum_{n=1}^{N} |c_n|^2 \\
  &= \|f\|_2^2 - \sum_{n=1}^{N} |(f, \varphi_n)|^2 + \sum_{n=1}^{N} |c_n - (f, \varphi_n)|^2,
\end{aligned}
\]

where we completed the square in the last step. Since the first two terms
on the right do not depend on the sequence $\{c_n\}$ and the third term is
nonnegative, we see that $\|f - \sum_{n=1}^{N} c_n \varphi_n\|_2$ is smallest if we choose $c_n =
(f, \varphi_n)$. In this case

\[
\Big\| f - \sum_{n=1}^{N} c_n \varphi_n \Big\|_2^2 = \|f\|_2^2 - \sum_{n=1}^{N} |c_n|^2. \tag{35}
\]

The numbers $c_n = (f, \varphi_n)$ are called the generalized Fourier coefficients
of $f$ with respect to the orthonormal family $\{\varphi_n\}_{n=1}^{N}$. Notice that since
the left-hand side of (35) is nonnegative, we have

\[
\Big\| \sum_{n=1}^{N} c_n \varphi_n \Big\|_2^2 = \sum_{n=1}^{N} |c_n|^2 \le \|f\|_2^2. \tag{36}
\]

If $\{\varphi_n(x)\}$ is an infinite orthonormal family, then (36) holds for each finite
$N$. Since the right-hand side doesn't depend on $N$, this shows that
$\sum_{n=1}^{\infty} |c_n|^2$ is finite and

\[
\sum_{n=1}^{\infty} |c_n|^2 \le \|f\|_2^2,
\]

which is known as Bessel's inequality. We summarize what we have
proven in a theorem.

□ Theorem 9.4.3 Let $f$ be a piecewise continuous function and let $\{\varphi_n\}$
be an orthonormal family of piecewise continuous functions on a finite
interval $[a, b]$. Then, choosing $c_n = (f, \varphi_n)$ minimizes $\|f - \sum_{n=1}^{N} c_n \varphi_n\|_2$,
and the sequence $\{c_n\}$ satisfies Bessel's inequality.
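The minimizing property in Theorem 9.4.3 can be seen in a small numerical experiment. The following is a sketch under stated assumptions: the sample function $f(x) = x$, the five-element family $e^{inx}/\sqrt{2\pi}$ with $|n| \le 2$, and the midpoint-rule inner product are illustrative choices, and all names are hypothetical.

```python
import cmath
import math

STEPS = 2000
H = 2 * math.pi / STEPS
NODES = [-math.pi + (k + 0.5) * H for k in range(STEPS)]


def inner(u, v):
    """Midpoint approximation of (u, v) = integral of u * conj(v) over [-pi, pi]."""
    return sum(u(x) * complex(v(x)).conjugate() for x in NODES) * H


def phi(n):
    """The orthonormal function e^{inx} / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * n * x) / math.sqrt(2 * math.pi)


def squared_error(f, coeffs):
    """|| f - sum_n c_n phi_n ||_2^2 for coefficients given as a dict {n: c_n}."""
    def residual(x):
        return f(x) - sum(c * phi(n)(x) for n, c in coeffs.items())
    return inner(residual, residual).real


def f(x):
    return x  # sample function on [-pi, pi]


best = {n: inner(f, phi(n)) for n in range(-2, 3)}  # c_n = (f, phi_n)
perturbed = dict(best)
perturbed[1] += 0.3  # move one coefficient away from (f, phi_1)

print(squared_error(f, best))
print(inner(f, f).real - sum(abs(c) ** 2 for c in best.values()))  # right side of (35)
print(squared_error(f, perturbed) - squared_error(f, best))        # |0.3|^2 = 0.09
```

The last line reflects the completed square in the calculation preceding (35): changing one coefficient by $\delta$ raises the squared error by exactly $|\delta|^2$.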

We can now use these concepts to analyze Fourier series. Since

\[
\int_{-\pi}^{\pi} \frac{e^{inx}}{\sqrt{2\pi}}\, \overline{\left( \frac{e^{imx}}{\sqrt{2\pi}} \right)}\,dx
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(n-m)x}\,dx
  = \begin{cases}
    1 & \text{if } n = m \\
    0 & \text{otherwise,}
    \end{cases}
\]

the set of functions in (34) is an orthonormal family on the interval
$[-\pi, \pi]$. The Fourier coefficients of a function $f$ are defined by the inner
products

\[
c_m = \left( f, (2\pi)^{-1/2} e^{imx} \right) = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x) e^{-imx}\,dx, \tag{37}
\]

and the Fourier series of $f$ is

\[
f(x) \sim \sum_{m=-\infty}^{\infty} c_m \frac{e^{imx}}{\sqrt{2\pi}}.
\]

Note that the Fourier series is the same as (16), but the definition of $c_m$
differs by a factor of $\sqrt{2\pi}$ from (17) since we have put the factor $\sqrt{2\pi}$
under $e^{inx}$ so that $e^{inx}/\sqrt{2\pi}$ has $L^2$ norm equal to 1. If we define the
partial sum, $S_n(f)$, of the Fourier series of $f$ by

\[
S_n(f)(x) = \sum_{m=-n}^{n} c_m \frac{e^{imx}}{\sqrt{2\pi}},
\]

then the analogue of (36) is

\[
\|S_n(f)\|_2^2 \le \sum_{m=-n}^{n} |c_m|^2 \le \|f\|_2^2, \tag{38}
\]

and Bessel's inequality states

\[
\sum_{m=-\infty}^{\infty} |c_m|^2 \le \|f\|_2^2. \tag{39}
\]
We can immediately use Bessel's inequality to improve Corollary 9.3.3.

□ Theorem 9.4.4 Let $f$ be a continuously differentiable function of period
$2\pi$. Then, the Fourier series of $f$ converges to $f$ uniformly.

Proof. We already know from Corollary 9.3.3 that the Fourier series of $f$
converges to $f$ pointwise. By the Weierstrass M-test, we need only show
that $\sum_{m=-\infty}^{\infty} |c_m| < \infty$ to conclude that the series converges uniformly. If
we denote by $\{c'_m\}$ the Fourier coefficients of $f'$, then integration by parts
in (17) shows that $i m c_m = c'_m$ for all $m \neq 0$. For each positive integer $n$,
let $Q_n$ denote the set of integers $Q_n = \{m \mid -n \le m \le n;\ m \neq 0\}$.
Then, the discrete Cauchy-Schwarz inequality (problem 10 in Section 5.8)
implies that

\[
\sum_{m \in Q_n} |c_m| = \sum_{m \in Q_n} \frac{|c'_m|}{|m|}
  \le \left( \sum_{m \in Q_n} |c'_m|^2 \right)^{1/2} \left( \sum_{m \in Q_n} \frac{1}{m^2} \right)^{1/2}.
\]

Since $f'$ is a continuous function, Bessel's inequality for $f'$ implies that
$\sum |c'_m|^2 < \infty$, and we also know that $\sum \frac{1}{m^2} < \infty$. Thus, the right-hand
side has a finite limit as $n \to \infty$. Therefore,

\[
\sum_{m=-\infty}^{\infty} |c_m| < \infty,
\]

from which it follows that the Fourier series converges uniformly. □
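The M-test step of this proof can be illustrated with the series (21) from Example 1 of Section 9.2. That function has a corner at $x = 0$, so Theorem 9.4.4 itself does not apply to it, but its coefficients $4/m^2$ are summable, which is all the M-test needs. The sketch below (grid size and names are assumptions) compares the largest error on a grid with the tail bound $\sum_{m>n} 4/m^2 < 4/n$.

```python
import math


def f(x):
    return (math.pi - abs(x)) ** 2


def partial_sum(x, n):
    """S_n(x) for the cosine series (21)."""
    return math.pi ** 2 / 3 + sum(
        4 / m ** 2 * math.cos(m * x) for m in range(1, n + 1)
    )


def sup_error(n, points=400):
    """Largest value of |f(x) - S_n(x)| over an evenly spaced grid on [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * k / points for k in range(points + 1)]
    return max(abs(f(x) - partial_sum(x, n)) for x in xs)


for n in (5, 10, 50, 100):
    print(n, sup_error(n), 4 / n)
```

The grid maximum only samples the true supremum. The inequality $\sup_x |f(x) - S_n(x)| \le \sum_{m>n} 4/m^2$ is what the M-test argument gives, once we know from Example 1 of Section 9.3 that the series converges to $f$.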

The following lemma plays a crucial role in the main theorem.

Lemma 9.4.5 Every periodic continuous function can be uniformly
approximated by a periodic continuously differentiable function.

Proof. Let $j(x)$ be a continuously differentiable function on $\mathbb{R}$ such that
$j(x) \ge 0$, $j(x) = 0$ outside the interval $[-1, 1]$, and $\int_{-\infty}^{\infty} j(x)\,dx = 1$.
See problem 4 for the construction of such a function. For each $\delta > 0$,
set $j_\delta(x) = \delta^{-1} j(x/\delta)$. Then $j_\delta(x) \ge 0$, $j_\delta(x) = 0$ outside the interval
$[-\delta, \delta]$, and $\int_{-\infty}^{\infty} j_\delta(x)\,dx = 1$. Suppose that $f(x)$ is a periodic continuous
function, and define

\[
g_\delta(x) = \int_{-\infty}^{\infty} j_\delta(x - y) f(y)\,dy.
\]

For each fixed $x$, the integrand $j_\delta(x - y) f(y)$ is a continuous function of
$y$ which is zero outside of the interval $[x - \delta, x + \delta]$, so the integral makes
sense. In fact, for $x$ in any fixed interval $[a, b]$, the only values of $y$ that
contribute to the integral are those in the interval $[a - \delta, b + \delta]$, so

\[
g_\delta(x) = \int_{a - \delta}^{b + \delta} j_\delta(x - y) f(y)\,dy.
\]

Since $j_\delta(x - y) f(y)$ is continuously differentiable in $x$ on $[a, b] \times [a - \delta, b + \delta]$,
Theorem 5.2.4 guarantees that $g_\delta$ is continuously differentiable on $[a, b]$.
However, $[a, b]$ was arbitrary, so $g_\delta$ is continuously differentiable on $\mathbb{R}$.
Furthermore, by making the change of variables $r = x - y$, we can write

\[
g_\delta(x) = \int_{-\infty}^{\infty} j_\delta(r) f(x - r)\,dr,
\]

from which it follows that $g_\delta$ has the same period as $f$. Finally,

\[
\begin{aligned}
|g_\delta(x) - f(x)|
  &= \left| \int_{-\infty}^{\infty} j_\delta(r) (f(x - r) - f(x))\,dr \right| \\
  &= \left| \int_{-\delta}^{\delta} j_\delta(r) (f(x - r) - f(x))\,dr \right| \\
  &\le \sup_{-\delta \le r \le \delta} |f(x - r) - f(x)| \int_{-\infty}^{\infty} j_\delta(r)\,dr \\
  &\le \sup_{-\delta \le r \le \delta} |f(x - r) - f(x)|.
\end{aligned}
\]

Since $f$ is periodic and continuous, $f$ is uniformly continuous. Therefore,
the term on the right goes uniformly to zero as $\delta \to 0$. Thus, $g_\delta \to f$
uniformly as $\delta \to 0$. □
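The smoothing construction in this proof can be tried out numerically. The sketch below makes several assumptions that are not in the text: the kernel $j(x) = \frac{15}{16}(1 - x^2)^2$ on $[-1, 1]$ (one possible function with the required properties), the test function $f(x) = |\sin x|$, and midpoint quadrature.

```python
import math


def j(x):
    """A continuously differentiable bump: nonnegative, zero off [-1, 1], integral 1."""
    return 15 / 16 * (1 - x * x) ** 2 if abs(x) <= 1 else 0.0


def f(x):
    """A continuous periodic function with corners at multiples of pi."""
    return abs(math.sin(x))


def mollify(func, delta, x, steps=400):
    """g_delta(x) = integral over [-delta, delta] of j_delta(r) func(x - r) dr."""
    h = 2 * delta / steps
    total = 0.0
    for k in range(steps):
        r = -delta + (k + 0.5) * h
        total += j(r / delta) / delta * func(x - r)
    return total * h


def sup_error(func, delta, points=200):
    """Largest value of |g_delta(x) - func(x)| over a grid on [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * k / points for k in range(points + 1)]
    return max(abs(mollify(func, delta, x) - func(x)) for x in xs)


for delta in (0.5, 0.1, 0.01):
    print(delta, sup_error(f, delta))
```

Since $|\sin x|$ satisfies $|f(x - r) - f(x)| \le |r|$, the final estimate in the proof predicts an error of at most $\delta$, and the printed values should respect that bound.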

□ Theorem 9.4.6 Let $f$ be a continuous function of period $2\pi$ and let
$S_n(f) = \sum_{m=-n}^{n} c_m e^{imx}/\sqrt{2\pi}$ be the partial sum of the Fourier series of $f$,
with $c_m$ given by (37). Then,

\[
\|f - S_n(f)\|_2 \to 0 \quad \text{as } n \to \infty
\]

and

\[
\|f\|_2^2 = \sum_{m=-\infty}^{\infty} |c_m|^2 \qquad \text{(Parseval's relation).}
\]

Proof. Let $\varepsilon > 0$ be given. By the lemma, we can choose a continuously
differentiable function $g$ which has period $2\pi$ and satisfies $\|f - g\|_2 \le \varepsilon/3$.
By the triangle inequality [part (c) of Proposition 9.4.2],

\[
\|f - S_n(f)\|_2 \le \|f - g\|_2 + \|g - S_n(g)\|_2 + \|S_n(g) - S_n(f)\|_2. \tag{40}
\]

Since the formula for the Fourier coefficients, (17), depends linearly on
the function under the integral, $S_n(g) - S_n(f) = S_n(g - f)$. Furthermore,
by (38), $\|S_n(g - f)\|_2 \le \|g - f\|_2$, so,

\[
\|f - S_n(f)\|_2 \le \frac{\varepsilon}{3} + \|g - S_n(g)\|_2 + \frac{\varepsilon}{3} \tag{41}
\]

for all $n$. To estimate the middle term, notice that

\[
\|g - S_n(g)\|_2^2 = \int_{-\pi}^{\pi} |g(x) - S_n(g)(x)|^2\,dx \le 2\pi \|g - S_n(g)\|_\infty^2.
\]

Since $g$ is continuously differentiable, Theorem 9.4.4 guarantees that
$\|g - S_n(g)\|_\infty \to 0$ as $n \to \infty$. Thus, $\|g - S_n(g)\|_2 \to 0$ as $n \to \infty$. Therefore,
we can choose an $N$ so that $\|g - S_n(g)\|_2 \le \varepsilon/3$ for $n \ge N$. Using this
estimate in (41) gives

\[
\|f - S_n(f)\|_2 \le \varepsilon \quad \text{for } n \ge N,
\]

which proves that $\|f - S_n(f)\|_2 \to 0$ as $n \to \infty$. Furthermore, formula
(35) gives

\[
\|f - S_n(f)\|_2^2 = \|f\|_2^2 - \sum_{m=-n}^{n} |c_m|^2.
\]

The left-hand side goes to zero as $n \to \infty$, which implies that
$\lim_{n \to \infty} \sum_{m=-n}^{n} |c_m|^2 = \|f\|_2^2$. This proves Parseval's relation. □
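Parseval's relation can be checked on the function $f(x) = (\pi - |x|)^2$ of Example 1 in Section 9.2. The sketch below is illustrative and rests on these assumptions: by (17) and the relation $c_m = \frac{1}{2}(a_m - i b_m)$, the coefficients of that example are $\pi^2/3$ for $m = 0$ and $2/m^2$ for $m \neq 0$; the normalized coefficients of (37) are these multiplied by $\sqrt{2\pi}$; and $\|f\|_2^2 = \int_{-\pi}^{\pi} (\pi - |x|)^4\,dx = 2\pi^5/5$.

```python
import math


def norm_squared():
    """||f||_2^2 for f(x) = (pi - |x|)^2 on [-pi, pi], in closed form."""
    return 2 * math.pi ** 5 / 5


def coefficient_energy(n):
    """Sum of |c_m|^2 over |m| <= n, with c_m normalized as in (37)."""
    c0 = math.sqrt(2 * math.pi) * math.pi ** 2 / 3
    total = c0 ** 2
    for m in range(1, n + 1):
        cm = math.sqrt(2 * math.pi) * 2 / m ** 2  # c_m = c_{-m}
        total += 2 * cm ** 2
    return total


for n in (0, 1, 5, 50):
    # By (35), the gap equals ||f - S_n(f)||_2^2, so it is nonnegative and shrinks.
    print(n, norm_squared() - coefficient_energy(n))
```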

It is worthwhile to notice the interesting structure of the proof and
the important role of Lemma 9.4.5. We wanted to show that the left side
of (40) is small for large $n$. The first term on the right is made small by
using the lemma to choose a continuously differentiable $g$ that is close
to $f$. The second term is small for large $n$ because the Fourier series of a
continuously differentiable function ($g$) converges uniformly and therefore
in the mean-square sense. The third term is shown to be small by using
Bessel's inequality. This suggests a way of proving that the conclusions
of Theorem 9.4.6 hold for more general classes of functions. If $f$ is a function
which can be approximated in the mean-square sense by a continuous
function $g$, then the first term on the right of (40) can be made small. The
second term can then be made small since we have just shown that the
Fourier series of a continuous function converges in the mean-square
sense to the function. If Bessel's inequality holds for $f - g$ then the third
term will also be small. This idea can be used to show that the Fourier
series of a piecewise continuous function converges in the mean-square
sense to the function (problem 6) and that the same is true for functions
with even nastier singularities (project 3). If we can find the most general
class of functions which can be approximated in the mean-square sense by
continuous functions, for which $\int_{-\pi}^{\pi} |f(x)|^2\,dx$ makes sense and for which
Bessel's inequality holds, we should be able to prove that the conclusion
of Theorem 9.4.6 holds for that class of functions. To carry out this
program, one needs the Lebesgue integral.

Problems
1. Verify the three properties, (a), (b), and (c), of the inner product.
2. Let $\{f_n\}$ be a sequence of continuous functions on $[a, b]$ that converges to
a function $f$ uniformly. Prove that $f_n \to f$ in the mean-square sense.
3. Let $[a, b]$ be a finite interval.
(a) Construct a sequence of continuous functions on $[a, b]$, $\{f_n\}$, so that
$f_n \to 0$ pointwise but $\|f_n\|_2 \to \infty$. Hint: choose $f_n$ to be a function
which is tall on a small set and zero elsewhere.
(b) Construct a sequence of continuous functions on $[a, b]$, $\{f_n\}$, so that
$\|f_n\|_2 \to 0$ but $\{f_n(x)\}$ does not converge to 0 for any $x \in [a, b]$. Hint:
find $f_n$ with narrow graphs that march back and forth across $[a, b]$.

4. (a) Use pieces of polynomials to construct a continuously differentiable
function $j(x)$ on $\mathbb{R}$ that is nonnegative, vanishes outside the interval
$[-1, 1]$, and satisfies $\int_{-1}^{1} j(x)\,dx = 1$.

(b) Let $f(x) = 0$ for $x \ge 0$ and $f(x) = e^{-1/x^2}$ for $x < 0$. Prove that $f$ is a
$C^\infty$ function on $\mathbb{R}$. Hint: see Example 4 in Section 6.4.

(c) Use the function $f$ and its translates to construct a $C^\infty$ function $j$
that has the properties in (a).

5. For the step function of Example 2 in Section 9.2, compute approximately
$\|f - S_{2n+1}\|_2$ for $n = 0$, $n = 1$, and $n = 4$.

6. (a) Let $f$ be a piecewise continuous function on $[-\pi, \pi]$. Prove that there
is a sequence of continuous functions $f_n$ on $[-\pi, \pi]$ so that $f_n \to f$ in
the mean-square sense. Hint: connect the pieces.

(b) Use the idea of the proof of Theorem 9.4.6 to show that the Fourier
series of a piecewise continuous function / converges to / in the
mean-square sense.

7. Let $f$ and $g$ be continuous functions of period $2\pi$ with Fourier coefficients
$\{c_n\}$ and $\{d_n\}$, respectively. Prove that

8. Use Parseval's relation and the function in Example 2 of Section 9.2 to


prove that

9. Let $f$ be a continuously differentiable function on $[-\pi, \pi]$ such that
$\int_{-\pi}^{\pi} f(x)\,dx = 0$. Prove that

(42)

Prove that equality holds in (42) if and only if $f(x) = a \cos x + b \sin x$
for some constants $a$ and $b$. Hint: use Parseval's relation.

10. Suppose that $f$ is a twice continuously differentiable function which is
periodic of period $2\pi$. Prove that

J
r \f\x)\ux
—TV
<

Inequalities such as this are called Sobolev inequalities.



Projects

1. Consider heat flow in a bar of length $\pi$ with $\kappa = 1$ and insulated ends as
in Problem 7 of Section 9.1. Let the initial temperatures be $f(x) = 0$ on
$[0, \pi/2]$ and $f(x) = 10$ on $[\pi/2, \pi]$.

(a) Compute the coefficients in the expansion $f(x) = \sum_n a_n \cos nx$.
(b) We will approximate the solution, $u(t, x)$, of the heat equation by
the first eight terms of its expansion $v(t, x) = \sum_n a_n e^{-\lambda_n t} \cos nx$.
Graph $v$ at times $t = 0$, $\tfrac{1}{2}$, and $1$.
(c) What properties of the solution discussed in Section 9.1 (and prob¬
lem 7 of that section) can you observe in the graphs?
(d) How close is v to the true solution at t = 1?
(e) Compare $v$ and $5 - \frac{20}{\pi} e^{-t} \cos x$ at $t = 1$. Why are they so close?
(f) Investigate the influence of $\kappa$ by graphing the approximate solutions
at the times $t = 0$, $\tfrac{1}{2}$, and $1$ in the cases $\kappa = 10$ and
$\kappa = 1/10$.

2. The technique used to solve the heat equation in Section 9.1 can also be
used to solve boundary value problems for other partial differential equations.
In the simplest model for a vibrating string of length $L$ that has
fixed ends, the unknown function, $u(t, x)$, which represents the vertical
displacement from equilibrium of the string at position $x$ at time $t$, satisfies
the wave equation

$$u_{tt}(t,x) - c^2 u_{xx}(t,x) = 0, \qquad 0 < x < L$$  (43)

and the boundary conditions

$$u(t, 0) = 0 = u(t, L).$$  (44)

The constant $c$ satisfies $c^2 = T/\rho$, where $T$ is the tension in the string and $\rho$ is the
density. We specify the initial displacement and the initial velocity of
the string at each $x$:

$$u(0,x) = f(x)$$  (45)

$$u_t(0,x) = g(x)$$  (46)

For simplicity, we will assume that $f$ is three times continuously differentiable,
$g$ is twice continuously differentiable, and $g(0) = f(0) = 0 =
f(L) = g(L)$.
(a) Follow the separation of variables method of Section 9.1 to show
that for each positive integer $n$ and any choice of the constants $a_n$
and $b_n$ the function

$$u_n(t,x) = \left\{ a_n \sin\frac{n\pi c t}{L} + b_n \cos\frac{n\pi c t}{L} \right\} \sin\frac{n\pi x}{L}$$

satisfies (43) and (44).



(b) Using Theorem 9.3.2 and the idea in Example 3 of Section 9.2, show
that the constants $\{a_n\}$ and $\{b_n\}$ can be chosen so that the series

$$u(t,x) = \sum_{n=1}^{\infty} u_n(t,x)$$  (47)

converges and (45) and (46) hold.


(c) Prove that u(t, x) is twice continuously differentiable in t and x and
satisfies (43) and (44). Justify any term-by-term differentiations.
(d) The energy, $E$, of the solution, $u$, is given by the following expression

Show that $E$ is independent of $t$. Justify any differentiating under
the integral sign.
(e) Suppose that v is another twice continuously differentiable function
which satisfies (43) - (46). Show that w = u — v is a solution of (43)
and that its energy is zero. Use this to prove that u(t, x) = v(t, x) for
all 0 < x < L, 0 < t, so the solution is unique.
(f) We saw in problem 10 of Section 9.2 that the faster the Fourier coefficients
go to zero as $n \to \infty$, the more differentiable the function
defined by the Fourier series is. This idea was used in Theorem 9.1.1
to show that the solution of the heat equation is always infinitely
differentiable for $t > 0$. Do you think that the same is true for the
wave equation?

3. Let $f(x) = |x|^{-1/4}$ on the interval $[-\pi, \pi]$.

(a) Explain why the improper Riemann integrals $(f, e^{-inx})$ and $\|f\|_2^2$
exist.
(b) If $g$ is continuous, explain why Bessel's inequality will hold for $f - g$.
(c) Show that $f$ can be approximated as closely as we like in the mean-square
sense by a continuous function.
(d) Prove that the Fourier series of $f$ converges to $f$ in the mean-square
sense.
(e) Prove that the Fourier series of $h(x) = \sin\frac{1}{x}$ converges to $h$ in the
mean-square sense on $[-\pi, \pi]$.
(f) Formulate theorems more general than Theorem 9.3.6 that guarantee
mean-square convergence of Fourier series.

4. A vector space $V$ is called an inner product space if there is a function,
$(\cdot, \cdot)$, from $V \times V$ to $\mathbb{C}$ which satisfies the three properties, (a), (b), and (c),
of the $L^2$ inner product listed at the beginning of Section 9.4. For $v \in V$,
we define $\|v\| = (v, v)^{1/2}$.

(a) Prove that $|(v, w)| \le \|v\|\,\|w\|$ for all $v \in V$ and $w \in V$.
(b) Prove that $\|\cdot\|$ is a norm on $V$.
(c) Suppose that $v_n \to v$; that is, $\|v_n - v\| \to 0$. Prove that for every
$w \in V$, $(v_n, w) \to (v, w)$.
(d) Vectors $v$ and $w$ are said to be orthogonal if $(v, w) = 0$. Let $\{\phi_n\}$ be
an orthonormal family of vectors in $V$; that is, $\|\phi_n\| = 1$ for all $n$ and
$(\phi_n, \phi_m) = 0$ if $n \ne m$. Prove that for all $v \in V$,

(e) The family $\{\phi_n\}$ is said to be an orthonormal basis for $V$ if every
vector $v \in V$ can be written $v = \sum c_n \phi_n$ for some $c_n \in \mathbb{C}$, where the
series converges to $v$ in the norm $\|\cdot\|$. If $\{\phi_n\}$ is an orthonormal
basis, show that $c_n = (v, \phi_n)$ and $\|v\|^2 = \sum |(v, \phi_n)|^2$.

5. An inner product space is called a Hilbert space if it is complete, that is,
if every Cauchy sequence in $V$ has a limit in $V$. These spaces are named
after David Hilbert (1862 - 1943).

(a) Let $\mathbb{C}^n$ denote the set of $n$-tuples of complex numbers with the usual
vector addition and scalar multiplication. For any two such vectors,
$z = (z_1, z_2, \ldots, z_n)$ and $w = (w_1, w_2, \ldots, w_n)$, we define the inner
product by
n

Prove that $\mathbb{C}^n$ is a Hilbert space.

(b) Let $\ell^2$ denote the set of sequences $\{a_n\}_{n=1}^{\infty}$ such that $\sum |a_n|^2 < \infty$.
For two sequences $a = \{a_n\}$ and $b = \{b_n\}$ in $\ell^2$, we define the inner
product of $a$ and $b$ by

(c) Use the discrete Cauchy-Schwarz inequality to show that
$\sum |a_n b_n| < \infty$, so the definition of inner product makes sense.
(d) Verify that $(\cdot, \cdot)$ is an inner product on $\ell^2$.
(e) What is an orthonormal basis for $\ell^2$?
(f) Prove that $\ell^2$ is a Hilbert space. Hint: this is difficult; follow the
outline of the proof of Theorem 5.8.1.
CHAPTER 10

Probability Theory
In this chapter we show how many of the analytical tools which we have
developed, such as sequences, series, limit theorems, and metric spaces,
are used in probability theory. In Section 10.1 we introduce discrete random
variables, using the Bernoulli, binomial, and Poisson random variables
as examples. In Section 10.2 we show how simple probabilistic
ideas and the concept of metric are used in coding theory. Continuous
random variables are discussed in Section 10.3. Finally, in Section
10.4 we develop more advanced applications of metric space concepts
to probability theory. Chebyshev's inequality and the weak law of large
numbers are covered in the projects.

10.1 Discrete Random Variables

We begin by reviewing some of the basic concepts of probability theory
using the experiment of rolling two dice as an example. Probability
theory involves the analysis of real-valued functions defined on sets of
possible outcomes of experiments. The set of possible outcomes is called
a sample space and the functions are called random variables.
Suppose that we roll two dice, one green and one red. Let $m$ and
$n$ be the values showing on the green die and the red die, respectively,
after we roll. The set of possible outcomes, $S$, consists of pairs of integers
$(m, n)$ where $m$ and $n$ are between 1 and 6. The following are functions
defined on $S$ that take values in $\mathbb{R}$:

$$X(m,n) = m + n, \qquad Y(m,n) = m, \qquad Z(m,n) = n.$$

Given an outcome $(m, n)$, the value of $X$ is the sum of the dice. The
values of $Y$ and $Z$ are the numbers on the green die and the red die,
respectively. Because $X$, $Y$, and $Z$ are $\mathbb{R}$-valued functions defined on the
set of outcomes of the experiment, they are random variables. $Y$ and $Z$

take values between 1 and 6 and $X$ takes values 2 through 12; that is, the
range of $X$ is the set of integers between 2 and 12.

Definition. A random variable whose values lie in a finite or countable
subset of $\mathbb{R}$ is called discrete.

If $X$ is a discrete random variable, we are very interested in the probabilities
that it takes on different values. We denote the probability that
the value of $X$ is $r$ by
$$P\{X = r\}.$$
If we make assumptions about the experiment, we may be able to calculate
these probabilities.
There are 36 distinct possible outcomes when we roll the two dice.
Assume that each outcome is equally likely. Let's calculate the probabilities
that $X$ takes various values. Since there is only one outcome, (1,1),
so that the sum of the dice is 2, we have $P\{X = 2\} = \frac{1}{36}$. On the other
hand, there are two different outcomes, (1,2) and (2,1), so that the sum
is 3, and therefore $P\{X = 3\} = \frac{2}{36}$. By counting the different ways that
we could have $m + n = r$ for any given $r$, we can compute $P\{X = r\}$ for
any $r$. For example, $P\{X = 7\} = \frac{6}{36} = \frac{1}{6}$ and 7 is the most likely sum. Notice
that $P\{X = 1\} = 0$ and $P\{X = \pi\} = 0$ since 1 and $\pi$ are not values that
$X$ can take.
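
The counting argument for the two-dice experiment can be checked by brute force. The following is a minimal sketch, assuming equally likely outcomes as in the text; the names `outcomes`, `mass_density`, and `density_X` are illustrative and do not come from the book.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (m, n) for the green and red dice.
outcomes = list(product(range(1, 7), repeat=2))

def mass_density(random_variable):
    """Return {value: probability}, assuming all 36 outcomes are equally likely."""
    density = {}
    for m, n in outcomes:
        value = random_variable(m, n)
        density[value] = density.get(value, Fraction(0)) + Fraction(1, len(outcomes))
    return density

# X is the sum of the two faces.
density_X = mass_density(lambda m, n: m + n)

for r in sorted(density_X):
    print(r, density_X[r])
```

Exact fractions are used so that the printed probabilities can be compared directly with the hand count.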
If $X$ is a discrete random variable which can take the values $\{a_n\}$, we
require that
$$\sum_n P\{X = a_n\} = 1,$$
which says simply that $X$ must take one of these values. If $A$ is any
subset of $\mathbb{R}$, then we define the probability that the value of $X$ is in $A$,
denoted $P\{X \in A\}$, by

$$P\{X \in A\} = \sum_{a_n \in A} P\{X = a_n\},$$

where the sum is over all $n$ such that $a_n \in A$. This makes sense because
we are saying that the probability that the value of $X$ lies in $A$ is the sum
of the probabilities that $X$ takes on each of the different numbers in $A$.
The function whose value at $a_n$ is $P\{X = a_n\}$ is called the mass density
of the discrete random variable $X$.
If we denote the complement of $A$ in $\mathbb{R}$ by $A^c$, we note that

$$P\{X \in A\} + P\{X \in A^c\} = \sum_{a_n \in A} P\{X = a_n\} + \sum_{a_n \in A^c} P\{X = a_n\} = 1.$$

Thus, we always know that $P\{X \in A^c\} = 1 - P\{X \in A\}$. We remark that
we often write $P\{a \le X \le b\}$ instead of $P\{X \in [a, b]\}$.
Let $X$ be the random variable in the dice experiment and let $A$ be the
closed interval $A = [-1, 3]$. Then $P\{X \in A\} = \frac{1}{36} + \frac{2}{36} = \frac{1}{12}$ since 2 and
3 are the only possible values of $X$ in the interval $[-1, 3]$. Now, suppose
that $A = [3, \infty)$. Then,

$$P\{3 \le X < \infty\} = 1 - P\{-\infty < X < 3\} = 1 - P\{X = 2\} = 1 - \frac{1}{36}.$$

Two discrete random variables, $X$ and $Y$, are said to be independent
if for any two subsets $A$ and $B$ of $\mathbb{R}$,

$$P\{X \in A \text{ and } Y \in B\} = P\{X \in A\}\,P\{Y \in B\}.$$  (1)

Consider the random variables $Y$ and $Z$ in the dice experiment. Let
$A$ be the set $\{2, 3\}$ and let $B$ be the set $\{4\}$. Then, $P\{Y \in A\} = \frac{1}{3}$ and
$P\{Z \in B\} = \frac{1}{6}$. On the other hand, the only outcomes where both $Y \in A$
and $Z \in B$ are true are (2,4) and (3,4). Thus, $P\{Y \in A \text{ and } Z \in B\} = \frac{2}{36} = \frac{1}{18}$,
so

$$P\{Y \in A \text{ and } Z \in B\} = P\{Y \in A\}\,P\{Z \in B\}$$  (2)

for these two particular sets, $A$ and $B$. With a little more work, one can
show that (2) is true for all choices of $A$ and $B$, so the random variables
$Y$ and $Z$ are independent. On the other hand, consider the random variables
$X$ and $Y$ with the same two sets, $A = \{2, 3\}$ and $B = \{4\}$. Then,
$P\{X \in A\} = \frac{1}{36} + \frac{2}{36} = \frac{1}{12}$ and $P\{Y \in B\} = \frac{1}{6}$. However, there are no
outcomes so that $X$ has the value 2 or 3 and $Y$ has the value 4. Thus,

$$P\{X \in A \text{ and } Y \in B\} = 0 \ne \frac{1}{12} \cdot \frac{1}{6} = P\{X \in A\}\,P\{Y \in B\},$$

so the random variables $X$ and $Y$ are not independent. This makes
sense because the likelihood that $X$ will take any particular value will
“depend” on what the value of $Y$ is.
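
The two independence checks above can be reproduced by enumerating the 36 outcomes. This is only a sketch for the particular sets $\{2, 3\}$ and $\{4\}$ used in the text; the helper `prob` and the variable names are illustrative assumptions, and checking two sets does not by itself prove independence for all sets.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (green face, red face), assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (m, n)."""
    favourable = sum(1 for m, n in outcomes if event(m, n))
    return Fraction(favourable, len(outcomes))

A, B = {2, 3}, {4}

# Y = green face, Z = red face: joint probability versus product.
p_joint_YZ = prob(lambda m, n: m in A and n in B)
p_prod_YZ = prob(lambda m, n: m in A) * prob(lambda m, n: n in B)

# X = sum of faces, Y = green face: joint probability versus product.
p_joint_XY = prob(lambda m, n: (m + n) in A and m in B)
p_prod_XY = prob(lambda m, n: (m + n) in A) * prob(lambda m, n: m in B)

print(p_joint_YZ, p_prod_YZ)   # equal for this A and B
print(p_joint_XY, p_prod_XY)   # not equal
```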
Three special kinds of discrete random variables arise frequently.

Example 1 (Bernoulli) A discrete random variable which takes only
the two values 0 and 1 is called a Bernoulli random variable. Bernoulli
random variables typically arise in experiments with two outcomes. For
example, if one flips a coin and assigns $X$ the value 1 if the result is
heads and the value 0 if the result is tails, then $X$ is a Bernoulli
random variable. In the discussion of Markov chains in Section 2.3, the
state of the phone at a given check is a Bernoulli random variable because
the phone is either free or busy. A Bernoulli random variable is
completely specified by giving the probability $P\{X = 1\} = p$, since then
$P\{X = 0\} = 1 - p$. Then $X$ is said to be Bernoulli with parameter $p$.

Example 2 (Binomial) Let $n > 0$ be a positive integer and suppose that
$0 < p < 1$. A random variable $X$ which takes the values $0, 1, \ldots, n$ with
probabilities

$$P\{X = k\} = \frac{n!}{(n-k)!\,k!}\, p^k (1-p)^{n-k}$$  (3)

is called a binomial random variable (with parameters $n$ and $p$). Using
the binomial theorem, we can check that the probabilities add up to 1:

$$\sum_{k=0}^{n} P\{X = k\} = \sum_{k=0}^{n} \frac{n!}{(n-k)!\,k!}\, p^k (1-p)^{n-k} = (p + (1-p))^n = 1.$$

Binomial random variables typically arise as the sums of independent
Bernoulli random variables. Suppose, for example, that we have a coin
that comes up heads with probability $p$. Let $X_1$ be the random variable
whose value is 1 if the first flip is a head and whose value is 0 if the first
flip is a tail. Similarly, let $X_2$ be the random variable whose value is 1 if
the second flip is a head and whose value is 0 if the second flip is a tail.
Suppose that we assume that the first and second flips are independent.
The probability that they are both heads is

$$P\{X_1 = 1 \text{ and } X_2 = 1\} = P\{X_1 = 1\}\,P\{X_2 = 1\} = p^2.$$

Similarly, the probability that the first flip is a head and the second flip
is a tail is $p(1 - p)$. Now suppose we flip the coin $n$ times and each flip
is independent of the other flips. Let $X_k$ be the random variable that is
1 if the $k$th flip is heads and 0 if the $k$th flip is tails. Define the random
variable $X = X_1 + X_2 + \cdots + X_n$. Then, the value of $X$ is just the number
of heads in $n$ flips of the coin, so the possible values for $X$ are the integers
0 through $n$. To compute $P\{X = k\}$, notice that the probability that any
particular configuration of $k$ heads will occur is $p^k (1-p)^{n-k}$ because the
flips are independent. Since there are $\frac{n!}{(n-k)!\,k!}$ different choices for the
positions of the $k$ heads in $n$ flips, we see that (3) holds; that is, $X$ is a
binomial random variable.
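
The claim that a sum of independent Bernoulli variables has the binomial mass density can be checked numerically for small $n$. The sketch below sums the probability of every one of the $2^n$ head/tail sequences and compares the result with the closed formula; the values `n = 5` and `p = 0.3` and all function names are illustrative assumptions.

```python
from itertools import product
from math import comb, isclose

def binomial_pmf(n, p, k):
    """Closed formula: n!/((n-k)! k!) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pmf_by_enumeration(n, p):
    """Mass density of the number of heads, summing over all 2^n flip sequences."""
    pmf = [0.0] * (n + 1)
    for flips in product((0, 1), repeat=n):
        heads = sum(flips)
        # Independence: the probability of a sequence is the product over flips.
        pmf[heads] += p**heads * (1 - p) ** (n - heads)
    return pmf

n, p = 5, 0.3
enumerated = pmf_by_enumeration(n, p)
formula = [binomial_pmf(n, p, k) for k in range(n + 1)]

for k in range(n + 1):
    print(k, enumerated[k], formula[k])
```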

Example 3 (Poisson) Suppose $\lambda > 0$, and let $X$ be a random variable
that takes the values $0, 1, 2, \ldots$ with probabilities

$$P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!}.$$

$X$ is called a Poisson random variable with parameter $\lambda$. Notice that

$$\sum_{k=0}^{\infty} P\{X = k\} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1$$

as required. A random variable is often assumed to be a Poisson random
variable if it is the number of occurrences of a rare event, such as the
number of radioactive decays in a given period of time or the number of
misprints on a page.

Notice that there is no mention of “sample space” in the definitions of
Bernoulli, binomial, and Poisson random variables. The definitions simply
specify the mass density of the random variables, that is, the range
and the probability of each value in the range. Thus, two experiments
and sample spaces $S_1$ and $S_2$ may be entirely different, but if the random
variables $X_1$ on $S_1$ and $X_2$ on $S_2$ have probabilities

$$P\{X_1 = k\} = e^{-\lambda} \frac{\lambda^k}{k!} = P\{X_2 = k\},$$

then they are both Poisson random variables with parameter $\lambda$.

Definition. Let $X$ be a discrete random variable which takes values
$\{a_n\}$. If $\sum_n |a_n|\, P\{X = a_n\}$ is finite, then the expected value of $X$, denoted
$E(X)$, is defined by

$$E(X) = \sum_n a_n P\{X = a_n\}$$

and $X$ is said to have finite expectation.

The expected value, which is also called the mean, is the weighted average
of the possible values of $X$, with the weight of each value given by
its probability of occurrence. Note that once we know the mass density
of a discrete random variable, we can compute its mean without knowing
what the underlying experiment is or the meaning of $X$. A simple
example of a random variable which does not have finite expectation is
given in problem 6. The means of the three standard random variables
defined above are easy to calculate.

Proposition 10.1.1

(a) If X is Bernoulli with parameter p, then E(X) = p.

(b) If X is binomial with parameters n and p, then E(X) = np.

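(c) If $X$ is Poisson with parameter $\lambda$, then $E(X) = \lambda$.

Proof. If $X$ is Bernoulli with parameter $p$, then

$$E(X) = 0 \cdot P\{X = 0\} + 1 \cdot P\{X = 1\} = p.$$

If $X$ is binomial with parameters $n$ and $p$, then

$$E(X) = \sum_{k=0}^{n} k\, \frac{n!}{(n-k)!\,k!}\, p^k (1-p)^{n-k}$$
$$= np \sum_{k=1}^{n} \frac{(n-1)!}{(n-k)!\,(k-1)!}\, p^{k-1} (1-p)^{n-k}$$
$$= np \sum_{k=0}^{n-1} \frac{(n-1)!}{((n-1)-k)!\,k!}\, p^{k} (1-p)^{(n-1)-k}$$
$$= np.$$

Finally, if $X$ is Poisson with parameter $\lambda$, then

$$E(X) = \sum_{k=0}^{\infty} k\, e^{-\lambda} \frac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = \lambda,$$

which completes the proof. □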
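
As a numerical sanity check of Proposition 10.1.1, one can evaluate the defining sums directly. The sketch below is illustrative only: the parameter values are arbitrary, the function names are not from the text, and the Poisson series is truncated after a fixed number of terms (an assumption that is harmless here because the tail is negligible for small parameters).

```python
from math import comb, exp, factorial, isclose

def binomial_mean(n, p):
    """E(X) = sum over k of k * P{X = k} for a binomial random variable."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))

def poisson_mean(lam, terms=100):
    """Truncated version of sum over k of k * e^{-lam} * lam^k / k!."""
    return sum(k * exp(-lam) * lam**k / factorial(k) for k in range(terms))

print(binomial_mean(10, 0.3))   # close to n * p = 3.0
print(poisson_mean(2.5))        # close to lambda = 2.5
```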

The following example shows how power series arise naturally in
probability theory.

Example 4 Consider an experiment in which we flip a fair coin until we
get heads. Let $X$ be the number of flips which we make. Then, $P\{X = 1\} = \frac{1}{2}$.
If the experiment ends after two flips, then we got tails on the
first flip and heads on the second. This happens with probability $(\frac{1}{2})^2$.
Similarly, if the experiment ends after $k$ flips, we got $k - 1$ tails in a
row followed by a head, so $P\{X = k\} = (\frac{1}{2})^k$. We wish to compute
$E(X) = \sum_{k=1}^{\infty} k (\frac{1}{2})^k$. Since

$$2E(X) = \sum_{k=1}^{\infty} k \left(\tfrac{1}{2}\right)^{k-1} = \sum_{k=0}^{\infty} (k+1) \left(\tfrac{1}{2}\right)^{k},$$

we calculate the power series

$$\sum_{k=0}^{\infty} (k+1) x^k = \frac{d}{dx} \sum_{k=0}^{\infty} x^{k+1} = \frac{d}{dx} \left( \frac{x}{1-x} \right) = \frac{1}{(1-x)^2}$$

for $|x| < 1$. In the first step we used Theorem 6.3.3 and the fact that the
radius of convergence of the geometric series is 1. Substituting $x = \frac{1}{2}$,
we see that $2E(X) = (1 - \frac{1}{2})^{-2} = 4$, so $E(X) = 2$.
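
The value of the expectation in Example 4 can be checked by computing partial sums of the series $\sum k (1/2)^k$. This is a small sketch; the function name is an illustrative assumption.

```python
def expected_flips_partial_sum(terms):
    """Partial sum of k * (1/2)^k for k = 1, ..., terms."""
    return sum(k * 0.5**k for k in range(1, terms + 1))

for terms in (5, 10, 20, 60):
    print(terms, expected_flips_partial_sum(terms))
```

The partial sums increase toward 2, in agreement with the power series computation.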

Let $S$ be a sample space and let $X$ be a discrete random variable on
$S$ that takes values $\{a_n\}$. If $\psi$ is any real-valued function on $\mathbb{R}$, then the
composition $\psi \circ X$ is a discrete random variable on $S$ that takes the values
$\{\psi(a_n)\}$. The following theorem shows how the expectation of $\psi \circ X$ can
be computed.

□ Theorem 10.1.2 Let $X$ be a random variable with range $\{a_n\}$ and let
$\psi$ be a real-valued function on $\mathbb{R}$. If the series $\sum \psi(a_n) P\{X = a_n\}$ converges
absolutely, then $\psi \circ X$ has finite expectation and

$$E(\psi \circ X) = \sum_n \psi(a_n) P\{X = a_n\}.$$  (4)

Proof. The series $\sum \psi(a_n) P\{X = a_n\}$ converges absolutely by hypothesis,
so by Theorem 6.2.5, we may rearrange it in any way that we like
and it will have the same sum. If we denote the values of $\psi \circ X$ by $\{b_j\}$,
then

$$\sum_n \psi(a_n) P\{X = a_n\} = \sum_j \sum_{\{n \,\mid\, \psi(a_n) = b_j\}} b_j P\{X = a_n\}$$
$$= \sum_j b_j \sum_{\{n \,\mid\, \psi(a_n) = b_j\}} P\{X = a_n\}$$
$$= \sum_j b_j P\{\psi \circ X = b_j\}$$
$$= E(\psi \circ X).$$

If we follow exactly the same steps but replace $\psi(a_n)$ by $|\psi(a_n)|$ and $b_j$
by $|b_j|$, we see that $\sum_j b_j P\{\psi \circ X = b_j\}$ converges absolutely. □
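
Theorem 10.1.2 can be illustrated on the dice experiment by computing the expectation of a function of the sum in two ways: once from the mass density of the sum, as in equation (4), and once by first building the mass density of the composed random variable. The choice of function, $x \mapsto (x - 7)^2$, and all names in this sketch are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Mass density of X = sum of two fair dice.
density_X = {}
for m, n in product(range(1, 7), repeat=2):
    density_X[m + n] = density_X.get(m + n, Fraction(0)) + Fraction(1, 36)

def psi(x):
    """An arbitrary real-valued function; not injective on the range of X."""
    return (x - 7) ** 2

# Right side of (4): sum of psi(a_n) * P{X = a_n}.
rhs = sum(psi(a) * p for a, p in density_X.items())

# Left side: group the a_n by the value b_j = psi(a_n), then take the mean.
density_psiX = {}
for a, p in density_X.items():
    density_psiX[psi(a)] = density_psiX.get(psi(a), Fraction(0)) + p
lhs = sum(b * q for b, q in density_psiX.items())

print(lhs, rhs)
```

The grouping step in the code mirrors the rearrangement used in the proof.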

If $X$ and $Y$ are random variables on a sample space $S$, then it makes
sense to add or multiply $X$ and $Y$ since they are functions on $S$ with
values in $\mathbb{R}$. If $X$ and $Y$ have finite expectation, then $E(X + Y) = E(X) +
E(Y)$ (problem 9). However, there is in general no simple relationship
between $E(XY)$ and $E(X)$ and $E(Y)$, except in the case where $X$ and $Y$
are independent.

□ Theorem 10.1.3 Suppose that X and Y are random variables on the


same sample space and that X and Y are independent. Suppose X and
Y have finite expectation. Then XY has finite expectation and

E(XY) = E(X)E(Y). (5)

Proof. Let the values of $X$ be $\{a_i\}$ and the values of $Y$ be $\{b_j\}$. We begin
by considering the double sum

$$\sum_{i,j} |a_i b_j|\, P\{X = a_i \text{ and } Y = b_j\}.$$

All the terms are nonnegative in this double sum. Thus, by Theorem 6.2.6,
if we show that it converges in any rearrangement, then it converges in
all rearrangements and the sum is always the same. Since $X$ and $Y$ are
independent,

$$\sum_{i,j} |a_i b_j|\, P\{X = a_i \text{ and } Y = b_j\} = \sum_{i,j} |a_i|\,|b_j|\, P\{X = a_i\}\, P\{Y = b_j\}$$
$$= \sum_i |a_i|\, P\{X = a_i\} \sum_j |b_j|\, P\{Y = b_j\}$$
$$< \infty.$$

In the last step we used the hypothesis that $X$ and $Y$ have finite expectations.
This proves that $XY$ has finite expectation. Equation (5) is proved
by following the same steps with $|a_i b_j|$ replaced by $a_i b_j$. □
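
The product rule for independent random variables, and its failure without independence, can be seen on the dice experiment. The sketch below uses the green face, the red face, and their sum; the helper `expectation` is an illustrative name and equal likelihood of outcomes is assumed.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (green, red), equally likely

def expectation(random_variable):
    """Expected value of a function of the outcome (m, n)."""
    return sum(Fraction(random_variable(m, n), len(outcomes)) for m, n in outcomes)

E_Y = expectation(lambda m, n: m)             # green face
E_Z = expectation(lambda m, n: n)             # red face
E_X = expectation(lambda m, n: m + n)         # sum of faces
E_YZ = expectation(lambda m, n: m * n)        # product of independent variables
E_XY = expectation(lambda m, n: (m + n) * m)  # product of dependent variables

print(E_YZ, E_Y * E_Z)   # equal
print(E_XY, E_X * E_Y)   # different
```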

In the projects we define the variance and standard deviation of a
discrete random variable and outline the proofs of Chebyshev's inequality
and the weak law of large numbers.

Problems

1. Consider the experiment where two dice are rolled, and let the random
variable X be the sum of the faces.

(a) Compute P{X = r} for all r.


(b) Compute the mean of X.
(c) Compute the standard deviation of X.

2. Consider an experiment where we flip two fair coins simultaneously. Let


X denote the number of heads plus the square of the number of tails.

(a) Compute P{X = r} for all r.


(b) Compute the mean of X.
(c) Compute the standard deviation of X.

3. Let X be a Bernoulli random variable with P{X = 1} = p. Compute the


standard deviation of X.
4. In a bin of oranges each orange is good with probability .8 and bad with
probability .2, independent of the other oranges. You select five oranges
at random. What is the probability that

(a) all five are good?


(b) all five are bad?
(c) exactly three are good?
(d) at least three are good?
(e) fewer than two are good?

5. The expected number of misprints on a page in a manuscript is five. If we


assume that the number of misprints is a Poisson random variable, what
is the probability that a particular page will have:

(a) no misprints?
(b) less than or equal to three misprints?

(c) four or more misprints?

6. Let $X$ be a random variable taking values in $\mathbb{N}$ such that $P\{X = k\} = \frac{6}{\pi^2 k^2}$.

(a) Prove that $\sum_{k=1}^{\infty} P\{X = k\} = 1$. Hint: see problem 6 in Section 9.2.
(b) Prove that X does not have finite expectation.

7. Let 0 < p < 1 and consider an experiment in which we flip a coin which
comes up heads with probability p until we get heads. Let X be the num¬
ber of flips which we make. Compute E(X).
8. Let X be the random variable which is the sum of the faces in the experi¬
ment of rolling two dice. Compute E(X) and E(X2).
9. Let X and Y be discrete random variables on the same sample space and
suppose that both X and Y have finite expectation. Prove that for any c
and d, the random variable cX + dY has finite expectation and

E(cX + dY) = cE(X) + dE{Y).

10. Consider the experiment of rolling two dice, one red and one green. Com¬
pute the expectations of the following random variables:

(a) the sum of the faces?


(b) three times the green face minus the red face?
(c) the product of the faces?

11. Suppose that $Y(\lambda)$ is a Poisson random variable with parameter $\lambda$. Let $A$
be any subset of $\mathbb{R}$ and define $P_A(\lambda) = P\{Y(\lambda) \in A\}$. Prove that $P_A$ is a
continuous function of $\lambda$ on $(0, \infty)$.

10.2 Coding Theory

Probability theory and metrics arise in a natural way in coding theory.
To explain why, we start with a simple example.

Example 1 Suppose that we wish to transmit a binary message which
has two digits, that is, our message is 00, 01, 10, or 11. Suppose that each
digit has a probability $p$ of being transmitted correctly. Assuming that
the digits are transmitted independently, the probability that the message
received will be the same as the one transmitted is $p^2$. Intuitively,
we should be able to improve the reliability of transmission by adding
redundancy to the message. For example, suppose that we simply repeat

each message three times. Instead of sending 00, we send 000000 and
instead of sending 01 we send 010101, and so forth. These four binary
strings
000000 010101 101010 111111
will be our code words. Let $S$ be the set of all strings of 0's and 1's of
length 6, and let $\rho$ be the discrete metric on $S$. That is, if $\{x_i\}_{i=1}^{6}$ and
$\{y_i\}_{i=1}^{6}$ are binary strings of length 6, then

$$\rho(\{x_i\}, \{y_i\}) = \sum_{i=1}^{6} \delta(x_i, y_i)$$

where $\delta(x, x) = 0$ and $\delta(x, y) = 1$ if $x \ne y$. Thus the distance between
two strings is just the number of places in which they differ.
Here is the idea. Notice that each of the four code words which we
want to send is a distance $\ge 3$ from each of the other code words. Suppose
that we send the code word $C_s$ and that it is received as a binary
string $C_r$ with one error in it. Suppose that $C_o$ is a code word different
from $C_s$. Then, by the triangle inequality for metrics,

$$3 \le \rho(C_s, C_o) \le \rho(C_s, C_r) + \rho(C_r, C_o).$$

Since $\rho(C_s, C_r) = 1$ by assumption, we see that $\rho(C_r, C_o) \ge 2$; that is, the
received string is a distance $\ge 2$ from all other code words. Thus, if we
receive a string which has one error in transmission, we can correct the
error by replacing the received string by the unique code word which has
distance 1 from it. Using this scheme, we can correctly decode a received
string if it is in fact correct or if it has exactly one error. The probability
of correct transmission is $p^6$, and the probability for transmission with
exactly one error is $6p^5(1 - p)$. Thus, we will correctly decode the string
with probability $p^6 + 6p^5(1 - p)$. If, for example, $p = .9$, then

$$p^2 = .81 \quad \text{and} \quad p^6 + 6p^5(1 - p) = .886,$$

so by adding redundancy we have achieved an improvement in the reliability
of our transmission channel. On the other hand, if $p = .7$, then

$$p^2 = .49 \quad \text{and} \quad p^6 + 6p^5(1 - p) = .42,$$

so the reliability has been decreased. This raises several natural questions.
For which $p$ does a repetition of three times improve the reliability?
Suppose that we repeat each two-digit message five times. Then the
four code words would be a distance $\ge 5$ apart and we would be able

to correctly decode received strings with no errors, one error, or two errors.
For what $p$ will this code improve reliability? Can we achieve any
desired level of reliability less than 1? For which $p$? These questions are
investigated further in problems 1, 2, and 9. For obvious reasons, these
codes are called repetition codes.
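
The reliability figures quoted in Example 1 are easy to reproduce. The sketch below assumes the decoding rule described in the text: a two-digit message repeated an odd number of times is decoded correctly exactly when the number of transmission errors is at most half of (repeats minus 1), rounded down. The function name and its generalization beyond three and five repetitions are illustrative assumptions, not the book's notation.

```python
from math import comb

def repetition_reliability(p, repeats):
    """Probability of at most (repeats - 1) // 2 errors among 2 * repeats digits."""
    length = 2 * repeats
    correctable = (repeats - 1) // 2
    return sum(
        comb(length, errors) * p ** (length - errors) * (1 - p) ** errors
        for errors in range(correctable + 1)
    )

for p in (0.9, 0.7):
    direct = p**2                          # send the two digits once
    triple = repetition_reliability(p, 3)  # p^6 + 6 p^5 (1 - p)
    print(p, round(direct, 3), round(triple, 3))
```

With one repetition the formula reduces to $p^2$, which is the direct transmission case.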

Figure 10.2.1

Example 1 shows the kind of question that motivated the development
of coding theory. We are given the characteristics of an information
transmission channel, in this case the probability $p$. We know what
kind of information we wish to transmit; in the example the information
comes as a pair of binary digits. The problem is to design an encoding
device and a decoding device to improve the reliability of the channel.
Usually there are other constraints as well. Suppose that there are $N$
binary code words of length $n$. Then

$$R = \frac{\log_2 N}{n}$$

is called the information rate of the channel. In Example 1, if we transmit
the four words 00, 01, 10, and 11 directly, then $R = 1$ since $N = 4$ and
$n = 2$. If we use the repetition code that repeats each word three times,
then $R = \frac{1}{3}$ since $N = 4$ and $n = 6$. It can be shown (problem 9) that if
$p > \frac{1}{2}$ and we allow enough repetitions, then we can achieve any desired
reliability strictly less than 1, but at the cost of a very low information
rate. The challenge is to design codes which have both high reliability
and high information rates. To see that this is an interesting question
with deep connections to linear algebra we will describe the Hamming
(7,4) code, named after one of the founders of the subject, R. Hamming
(1915-).
Suppose that we wish to encode signals consisting of four binary digits.
We put the four digits in the 3rd, 5th, 6th, and 7th positions of a seven
digit binary vector $(x_1, x_2, x_3, x_4, x_5, x_6, x_7)$. The entries in the 1st, 2nd,

and 4th positions are defined by

$$x_1 = x_3 + x_5 + x_7$$  (6)
$$x_2 = x_3 + x_6 + x_7$$  (7)
$$x_4 = x_5 + x_6 + x_7$$  (8)

where we use binary arithmetic, so $1 + 1 = 0$. There are $2^4$ choices for the
3rd, 5th, 6th, and 7th positions, and the other positions are then determined.
Thus, there are 16 code words, each of length 7. Since $x_i + x_i = 0$
for each $i$, we can write the equations which define $x_1$, $x_2$, and $x_4$ as

$$x_1 + x_3 + x_5 + x_7 = 0$$
$$x_2 + x_3 + x_6 + x_7 = 0$$
$$x_4 + x_5 + x_6 + x_7 = 0$$

Thus, among the 128 binary vectors of length 7, the 16 code words are
the binary vectors $v = (x_1, x_2, x_3, x_4, x_5, x_6, x_7)$ such that $Hv = 0$ where
$H$ is the matrix

$$H = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix}.$$

The set of binary strings of length $n$ is the Cartesian product of $\mathbb{Z}_2$ with
itself $n$ times, so it is denoted by $\mathbb{Z}_2^n$. The set $\mathbb{Z}_2^n$ is a vector space because
we can add binary vectors by adding their components (using binary
arithmetic) and we can use $\mathbb{Z}_2$ as the field of scalars (problem 7). These
abstract concepts are not used in our calculations below, just the fact that
$H$ is a linear transformation from $\mathbb{Z}_2^7$ to $\mathbb{Z}_2^3$.
Since $H$ is a linear transformation, the set of vectors $v$ such that $Hv = 0$
is a vector subspace of $\mathbb{Z}_2^7$. That is, if $C_1$ and $C_2$ are code words, then
$C_1 - C_2$ is a code word. This makes it easy to compute the distance
between code words. As in Example 1, we define a metric on the set of
seven-digit binary strings by

$$\rho(\{x_i\}, \{y_i\}) = \sum_{i=1}^{7} \delta(x_i, y_i).$$

Then, $\rho(C_1, C_2) = \rho(C_1 - C_2, 0)$, and $\rho(C_1 - C_2, 0)$ is simply the number
of 1's in the code word $C_1 - C_2$. Suppose that $C_1 \ne C_2$, and let
$(x_1, x_2, x_3, x_4, x_5, x_6, x_7) = C_1 - C_2$. We shall show that $C_1 - C_2$ has at

least three 1's. If there are three or more 1's in the 3rd, 5th, 6th, and 7th
positions, there is nothing more to prove. From equations (6) - (8), it
follows easily that if exactly two out of $x_3, x_5, x_6, x_7$ equal 1, then at least
one of $x_1, x_2, x_4$ equals 1. And, if exactly one out of $x_3, x_5, x_6, x_7$ equals
1, then at least two of $x_1, x_2, x_4$ equal 1. Thus,

$$\rho(C_i, C_j) \ge 3, \quad \text{for all } C_i \ne C_j.$$

If $C$ is a code word we let $B_1(C)$ denote the set of words within a distance
1 of $C$. Now, let $e_i$ denote the string which is all 0's except for a 1 in the
$i$th position. Then $He_i$ is just the $i$th column of $H$; in particular,

$$He_i \ne \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Thus, if $C$ is a code word, the seven distinct strings $C + e_i$, $i = 1, 2, \ldots, 7$,
which are a distance 1 from $C$, are not code words since

$$H(C + e_i) = HC + He_i = He_i \ne \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$

Thus, $B_1(C)$ contains one code word and seven distinct binary strings
which are not code words. Let $C_i$ and $C_j$ be distinct code words and
suppose $x \in B_1(C_i)$ and $y \in B_1(C_j)$. Then, by the triangle inequality for
metrics,

$$3 \le \rho(C_i, C_j) \le \rho(C_i, x) + \rho(x, y) + \rho(y, C_j).$$

Since $\rho(C_i, x) \le 1$ and $\rho(C_j, y) \le 1$, we must have $\rho(x, y) \ge 1$, which
implies that $x \ne y$. Thus, the sets $B_1(C_i)$ are disjoint. Therefore, their
union contains $16 \times 8 = 128$ distinct binary vectors, that is, all the binary
vectors of length 7.
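
The counting facts about the Hamming (7,4) code can be verified by enumeration. The sketch below builds the 16 code words from the three parity equations, computes the smallest distance between distinct code words, and checks that the distance-1 neighbourhoods of the code words fill out all 128 binary strings of length 7. The function names `encode`, `distance`, and `ball` are illustrative and not part of the text.

```python
from itertools import product

def encode(d3, d5, d6, d7):
    """Place data bits in positions 3, 5, 6, 7 and fill positions 1, 2, 4 (mod 2)."""
    x1 = (d3 + d5 + d7) % 2
    x2 = (d3 + d6 + d7) % 2
    x4 = (d5 + d6 + d7) % 2
    return (x1, x2, d3, x4, d5, d6, d7)

def distance(u, v):
    """Number of positions in which two strings differ."""
    return sum(a != b for a, b in zip(u, v))

def ball(word):
    """The word together with the seven strings at distance 1 from it."""
    members = {word}
    for i in range(7):
        flipped = list(word)
        flipped[i] ^= 1
        members.add(tuple(flipped))
    return members

code_words = [encode(*bits) for bits in product((0, 1), repeat=4)]
min_distance = min(
    distance(u, v) for u in code_words for v in code_words if u != v
)
covered = set().union(*(ball(c) for c in code_words))

print(len(code_words), min_distance, len(covered))
```

Because the 16 neighbourhoods each have 8 elements and together contain 128 distinct strings, they must be pairwise disjoint, as argued in the text.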
We choose the decoding algorithm which assigns to each received
signal the unique code word within a distance 1. With this encoding
and decoding scheme, a code word transmitted with zero errors or with
one error is decoded correctly. Thus, the probability of correct transmission
is $p^7 + 7p^6(1 - p)$ as compared to $p^4$ for the direct transmission
of the four original digits. If $p = .9$, for example, then $p^4 = .66$ and
$p^7 + 7p^6(1 - p) = .85$. The information rate is $\frac{4}{7}$, so we have achieved a
dramatic improvement in reliability with only a modest reduction in the
information rate.

The Hamming matrix has the nice property that the $i$th column is
just the binary representation of $i$ reading from bottom to top. Suppose
that the four digits that we wish to transmit are encoded in a code word
$C$ and that a single error occurs in transmission. Then the transmitted
word can be written as $C + e_i$ for some $i$. If we apply $H$ to the received
word, we obtain $H(C + e_i) = He_i$, which is the $i$th column of $H$. Since the
$i$th column is just the binary representation of $i$, we can see immediately,
by applying $H$ to the received word, in which digit the error occurs.
For example, suppose that the sender wishes to transmit the signal 1101.
Thus, $x_3 = 1$, $x_5 = 1$, $x_6 = 0$, and $x_7 = 1$. Using (6) - (8), the sender
determines that $x_1 = 1$, $x_2 = 0$, and $x_4 = 0$ and transmits the signal

1010101. (9)

Suppose that we receive the signal

1000101 (10)

because of an error in the transmission line. Applying $H$ to the column
vector $(1,0,0,0,1,0,1)$, we obtain the vector

$$\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.$$

Since this is not the zero vector we know that there is an error in the
signal, and if there is only one error, it is in the third position since 011
is the binary representation of 3. Therefore, we can correct (10) to obtain
the code word (9), thus recovering the signal that the sender wished to
transmit.
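
The worked example can be reproduced with a short program. This is a sketch under stated assumptions: the three check equations used in `encode` are inferred from the description of $H$ (column $i$ is the binary representation of $i$) and may be numbered differently from equations (6) - (8); the function names are illustrative.

```python
# Parity-check matrix H: column i (1-based) holds the binary digits of i,
# with the least significant digit in the top row.
H = [[(i >> row) & 1 for i in range(1, 8)] for row in range(3)]


def encode(bits):
    """Put the message digits in positions 3, 5, 6, 7 and add check digits.

    The check equations below are an assumption chosen so that H times the
    code word is the zero vector.
    """
    x3, x5, x6, x7 = bits
    x1 = (x3 + x5 + x7) % 2
    x2 = (x3 + x6 + x7) % 2
    x4 = (x5 + x6 + x7) % 2
    return [x1, x2, x3, x4, x5, x6, x7]


def syndrome(word):
    """Apply H to a received word using binary arithmetic."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def correct(word):
    """Flip the digit named by the syndrome (assumes at most one error)."""
    s = syndrome(word)
    position = s[0] + 2 * s[1] + 4 * s[2]  # read the syndrome bottom to top
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed, position


if __name__ == "__main__":
    sent = encode([1, 1, 0, 1])
    received = [1, 0, 0, 0, 1, 0, 1]
    print(sent, syndrome(received), correct(received))
```

With the message 1101 the sketch produces the code word 1010101, and for the received word 1000101 it reports an error in position 3, matching (9) and (10).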
For more information about the history and mathematical development
of coding theory, see [42] or [32].

Problems

1. (a) Generate the graph of the function $f(p) = p^6 + 6p^5(1 - p) - p^2$ and
use it to show that there is a $p_0$ so that the triple repetition code
in Example 1 improves reliability if $p > p_0$ and hurts reliability if
$p < p_0$.
(b) Use Newton's method to get a good estimate of $p_0$.

2. Suppose that we wish to transmit a binary message which has two digits,
00, 01, 10, or 11. We use a repetition code which repeats each message
five times.

(a) How many code words are there? How many possible transmitted
signals are there?
(b) Explain why all the code words are a distance $\ge 5$ apart. Explain
why it follows that we can correctly decode any transmitted signal
with $\le 2$ errors.
(c) Suppose that the transmission channel sends an individual digit
correctly with probability $p$. What is the probability that the transmitted
message will have $\le 2$ errors?
(d) If $p = .95$, compare the reliability of this coding scheme with the
reliability of sending the two digits directly.
(e) If $p = .7$, compare the reliability of this coding scheme with the
reliability of sending the two digits directly.
(f) Find a $p_0$ so that this code improves reliability if $p > p_0$ and hurts
reliability if $p < p_0$.
(g) What is the information rate of this channel?

3. Suppose that we encode two binary digits as the first two digits of a three-
digit binary string and choose the third digit so that the sum of all the
digits is zero. Explain why this code can detect single errors but cannot
correct them. What is the information rate of this code?
4. Prove that the Hamming (7, 4) code has information rate $R = \frac{4}{7}$.
5. Suppose that you are receiving signals which employ the Hamming (7, 4)
code. Decode the following received signals:

(a) 0010111.
(b) 0110111.
(c) 0111101.
(d) 0111100.

6. Let S be the set of all possible words of length n that can be transmitted
by an information channel, and let $\mathcal{C} \subseteq S$ be the set of code words. Let
$\rho$ be the discrete metric on $S$. The set $\mathcal{C}$ is called a perfect code if there
is an integer $m$ so that the union of the sets of radius $m$ centered at the
code words $C \in \mathcal{C}$ equals $S$ and, furthermore, each pair of code words is
a distance at least $2m + 1$ apart. Which of the following codes is perfect?

(a) The repetition code of problem 1.


(b) The parity check code of problem 3.
(c) The (7,4) Hamming code.

7. Let $\mathbb{Z}_2^n$ be the set of $n$-tuples of 0's and 1's. We define addition by

$$(x_1, x_2, \ldots, x_n) + (y_1, y_2, \ldots, y_n) = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$

where the plus signs on the right mean binary addition. For $z \in \mathbb{Z}_2$ we
define scalar multiplication by

$$z \cdot (x_1, x_2, \ldots, x_n) = (z \cdot x_1, z \cdot x_2, \ldots, z \cdot x_n),$$

where on the right $z \cdot x_i$ means multiplication in $\mathbb{Z}_2$. Use the fact that $\mathbb{Z}_2$
is a field (problem 7 of Section 1.1) to show that $\mathbb{Z}_2^n$ satisfies the definition
of vector space (with $\mathbb{R}$ replaced by $\mathbb{Z}_2$) given in Section 5.8.

8. Suppose that we wish to transmit a single binary digit on a channel which
has a probability $p$ of correct transmission for each digit sent. For a given
odd integer $n$ we encode and decode as follows. We transmit the digit $n$
times. If there are more 1's than 0's, we decode the signal as a 1. If there
are more 0's than 1's, we decode the signal as a 0.

(a) What is the probability that the decoded signal is the signal that was
encoded?

(b) Suppose that $p > \frac{1}{2}$. Prove that if we choose $n$ large enough, the
probability of correct transmission can be made larger than any $\beta < 1$.
Hint: use the weak law of large numbers; see project 3.

9. Suppose that we wish to transmit pairs of binary digits as in Example 1
on a channel which has a probability $p$ of correct transmission for each
digit sent. Let $n$ be a given odd integer. We encode each pair of digits by
repeating it $n$ times.

(a) What are the four code words?

(b) Using the discrete metric introduced in Example 1, explain why
each pair of code words is at least a distance $n$ apart.

(c) Explain why a message with $\le (n - 1)/2$ errors can be correctly
decoded.
(d) What is the probability that the decoded signal is the signal that was
encoded?
(e) Suppose that $p > \frac{3}{4}$. Prove that if we choose $n$ large enough, the
probability of correct transmission can be made larger than any $\beta < 1$.
Hint: use the weak law of large numbers; see project 3.

10.3 Continuous Random Variables


In many situations, one wants to analyze random variables whose values
do not lie in a countable set, for example, the height of a randomly
chosen man, or the amount of time that we have to wait before some
chance event occurs. One can force such problems to be discrete by measuring
height or time in discrete units (for example, inches or seconds)
and rounding measurements up or down, but usually this is cumbersome
and unnatural. Often, one assumes that the probability that the
value of $X$ lies in a particular set of real numbers is given by integrating
a “density function” over the set. Suppose that $f$ is a nonnegative
function such that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1. \tag{11}$$

By writing (11), we are assuming implicitly that the improper Riemann
integral exists. We say that $f$ is the density function for a random variable
$X$ if

$$P\{a \le X \le b\} = \int_a^b f(x)\,dx$$

for all $a$ and $b$, including $a = -\infty$ and $b = \infty$. Condition (11) simply
guarantees that the total probability is 1.

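Example 1 Consider the function

$$f(x) = \begin{cases} 0, & \text{if } x < 1 \\ \dfrac{2}{x^3}, & \text{if } x \ge 1. \end{cases}$$

Since

$$\lim_{c \searrow 1} \lim_{d \to \infty} \int_c^d \frac{2}{x^3}\,dx = \lim_{c \searrow 1} \lim_{d \to \infty} \left( \frac{1}{c^2} - \frac{1}{d^2} \right) = 1,$$

we see that (11) is satisfied. If $X$ is a random variable with density $f$,
then

$$P\{1 \le X \le 2\} = \int_1^2 \frac{2}{x^3}\,dx = \frac{3}{4}$$

and

$$P\{-2 \le X \le 1\} = \int_{-2}^{1} 0\,dx = 0.$$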
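
The two probabilities in Example 1 can be checked with a simple numerical integration. This is a sketch only: it assumes the density of Example 1 is $f(x) = 2/x^3$ for $x \ge 1$ and zero otherwise, and it uses a midpoint rule (the helper name `midpoint_integral` is an illustrative choice).

```python
def density(x: float) -> float:
    """Assumed density of Example 1: 2 / x^3 for x >= 1 and 0 otherwise."""
    return 2.0 / x ** 3 if x >= 1 else 0.0


def midpoint_integral(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


if __name__ == "__main__":
    print(midpoint_integral(density, 1, 2))      # close to 3/4
    print(midpoint_integral(density, -2, 1))     # exactly 0
    print(midpoint_integral(density, 1, 1000))   # close to 1 - 1/1000^2
```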

If a random variable $X$ has a density $f$, then the probability that the
value of $X$ lies between $a$ and $b$ is just the area under $f$ between $a$ and $b$.
See Figure 10.3.1. Notice that

$$P\{X = a\} = \int_a^a f(x)\,dx = 0,$$

so if a random variable has a density function, then the probability of
taking on any particular value is zero. If

$$\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty, \tag{12}$$

then we say that $X$ has finite expectation and define

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx. \tag{13}$$

$E(X)$ is called the expected value of $X$ or the mean. We often write
$\mu = E(X)$ for short. Note that by problem 12 of Section 3.6, the improper
integral (13) exists if the improper integral (12) exists. The variance of $X$,
$\sigma^2$, is defined by

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

if the improper integral exists. These definitions are analogous to the
definitions of mean and variance in the case where $X$ is a discrete random
variable (see Section 10.1 and project 1). By using Riemann sums,
the analogy can be made precise (problem 13). As in the discrete case, $\sigma$
is called the standard deviation.

[Figure 10.3.1: graphs of density functions, with the points $c$, $a$, $0$, $b$, $d$ marked on the horizontal axis.]

Three special kinds of density functions arise frequently.

Example 2 (Uniform) Let $f(x) = \frac{1}{d - c}$ on a finite interval $[c, d]$ and
$f(x) = 0$ for $x$ outside the interval. A random variable whose density is
$f$ is said to be uniformly distributed on the interval $[c, d]$.

Example 3 (Exponential) Suppose that $\lambda > 0$ and define $f$ by

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$

A random variable with density $f$ is said to be exponentially distributed
with parameter $\lambda$. Often waiting times, that is, the amount of time until
some event occurs, are assumed to be exponentially distributed. The
mean and standard deviation of a random variable which is exponentially
distributed with parameter $\lambda$ are both equal to $1/\lambda$ (problem 7).

Example 4 (Normal) Let $\mu \in \mathbb{R}$ and $\sigma > 0$, and define $f$ by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

A random variable with density $f$ is said to be normally distributed
with mean $\mu$ and standard deviation $\sigma$. If one computes the mean and
standard deviation by using the definitions above, one obtains $\mu$ and $\sigma$,
respectively, which is why this terminology is used. If $\mu = 0$ and $\sigma = 1$,
then $X$ is called a standard normal random variable.
Note that the probability $P\{a \le X \le b\}$ can easily be evaluated in the
case of a uniform or exponentially distributed random variable because
we can use the Fundamental Theorem of Calculus to evaluate the integral
explicitly. Since there is no elementary function whose derivative
is the normal density, $P\{a \le X \le b\}$ must be evaluated by numerical
methods such as those discussed in Section 3.4. There are tables which
give the approximate values of $P\{a \le X \le b\}$ for a large number of
different $a$'s and $b$'s. Fortunately, the integrals $P\{a \le X \le b\}$ in the case
of general $\mu$ and $\sigma$ can be related to the integrals in the case $\mu = 0$ and
$\sigma = 1$ by a simple change of variables (problem 10). Thus, we only need
tables of values for the case of a standard normal random variable.
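
In place of a printed table, one can evaluate normal probabilities with the error function from Python's standard library. The sketch below is illustrative only (the function names are assumptions, not the book's notation); it reduces a general normal variable to the standard one by the change of variables $z = (x - \mu)/\sigma$.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P{X <= z} for a standard normal X, written in terms of erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_probability(a: float, b: float, mu: float = 0.0,
                       sigma: float = 1.0) -> float:
    """P{a <= Y <= b} for Y normal with mean mu and standard deviation sigma."""
    return (standard_normal_cdf((b - mu) / sigma)
            - standard_normal_cdf((a - mu) / sigma))


if __name__ == "__main__":
    print(normal_probability(-1, 1))                   # about 0.6827
    print(normal_probability(8, 12, mu=10, sigma=2))   # same value after rescaling
```

The second call illustrates why one table suffices: an interval of one standard deviation on either side of the mean has the same probability for every $\mu$ and $\sigma$.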
The fact that many random variables are normal can be explained
by a deep theorem in probability theory called the central limit theorem,
which says that a random variable is approximately normal if it
is the sum of many independent, “similar” random variables. For example,
imagine that a man's total height is the sum of small “boosts”.
Each boost is present or absent depending on whether a particular nucleotide
in his DNA is present or absent in a particular position. If the
nucleotides in different positions are independent, then height should
be approximately normally distributed. This is a gross oversimplification
of course, but it gives the idea of how such an argument might be
constructed.

There is a natural way to unify our treatment of discrete random variables
and random variables with densities. If $X$ is a random variable, we
define

$$F(x) = P\{-\infty < X \le x\}.$$

That is, $F(x)$ is just the probability that the value of $X$ lies in the interval
$(-\infty, x]$. $F$ is called the cumulative distribution function of $X$. As $x$ gets
larger $P\{-\infty < X \le x\}$ cannot decrease so $F$ is a monotone increasing
function. Since the value of $X$ is some real number, we must have

$$\lim_{x \to \infty} F(x) = P\{-\infty < X < \infty\} = 1 \tag{14}$$

and

$$\lim_{x \to -\infty} F(x) = P\{\emptyset\} = 0. \tag{15}$$

Furthermore, since $F$ is monotone increasing, the limits from the left and
right, $F(x^-)$ and $F(x^+)$, exist for each $x$. Thus, the only possible discontinuities
of $F$ are jump discontinuities (problem 7 in Section 3.5). Any
real-valued function on $\mathbb{R}$ that is monotone increasing and satisfies (14)
and (15) is called a cumulative distribution function even if no random
variable is specified.
Suppose that $X$ is a discrete random variable which takes on only
finitely many values, for example, a Bernoulli or binomial random variable.
Let $a_1, a_2, \ldots, a_N$ denote the possible values listed in increasing
order. Then

$$F(x) = P\{-\infty < X \le x\} = \sum_{a_n \le x} P\{X = a_n\}. \tag{16}$$

Thus $F$ is zero on the interval $(-\infty, a_1)$, equal to $P\{X = a_1\}$ on the interval
$[a_1, a_2)$, equal to $P\{X = a_1\} + P\{X = a_2\}$ on the interval $[a_2, a_3)$,
and so forth. See Figure 10.3.2. $F$ is constant except for jump discontinuities
at each $a_n$, and the size of the jump, $F(a_n^+) - F(a_n^-)$, at $a_n$ is equal to
$P\{X = a_n\}$.
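
Formula (16) translates directly into a short computation. The following sketch is an illustration rather than anything from the original text: it builds the mass density of a binomial random variable (the parameters $n = 3$, $p = 1/2$ are arbitrary choices) and evaluates $F$ by summing the probabilities of all values not exceeding $x$.

```python
from math import comb


def binomial_mass(n: int, p: float) -> dict:
    """Mass density of a binomial random variable as a value -> probability map."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


def cdf(mass: dict, x: float) -> float:
    """F(x): the sum of P{X = a} over all possible values a <= x, as in (16)."""
    return sum(prob for a, prob in mass.items() if a <= x)


if __name__ == "__main__":
    mass = binomial_mass(3, 0.5)
    for x in (-1, 0, 0.5, 1, 2, 3):
        print(x, cdf(mass, x))
```

The output is constant between consecutive integers and jumps by $P\{X = k\}$ at each integer $k$, which is the step-function picture described above.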
In general, a discrete random variable $X$ takes on countably many
values $\{a_n\}$. In this case the right-hand side of (16) may be an infinite
series if infinitely many of the numbers $a_n$ are less than or equal to a
particular $x$. The series always converges by the comparison test since it
consists of a subset of the terms of the series $\sum P\{X = a_n\}$, which converges
and sums to 1. If $\{a_n\}$ has no limit points (e.g., a Poisson random
variable), then by the Bolzano-Weierstrass theorem, there can be at most
finitely many values $a_n$ in any given finite interval. These points can be
ordered, and again we get the simple picture in Figure 10.3.2.

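[Figure 10.3.2: graph of a discrete cumulative distribution function, a step function with jumps of size $P\{X = a_{n-1}\}$ at $a_{n-1}$ and $P\{X = a_n\}$ at $a_n$.]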

If $\{a_n\}$ has lots of limit points, there may not be a natural ordering of
$\{a_n\}$. For example, suppose that the values $\{a_n\}$ are the rational numbers.
Formula (16) is true but it is much harder to visualize the graph of
$F$ since, in any interval about a limit point of $\{a_n\}$, $F$ will have infinitely
many steps. As before, the possible values of $X$ are just the points of
discontinuity of $F$, and the probabilities are just the sizes of the jumps at
these points. Thus, the mass density of a discrete random variable can be
recovered from its cumulative distribution function. A cumulative distribution
function which is constant except for finite or countably many
jumps is called discrete.
If the random variable $X$ has a density $f$, then

$$F(x) = P\{-\infty < X \le x\} = \int_{-\infty}^{x} f(t)\,dt.$$

The improper integral on the right exists because $\int_c^x f(t)\,dt$ is monotone
increasing and bounded by 1 as $c \to -\infty$. If $f$ is continuous, as in
the case of a normal random variable, then by the Fundamental Theorem
of Calculus (Theorem 4.2.5), $F$ is continuously differentiable and
$F'(x) = f(x)$. Thus the density $f$ can be recovered from the cumulative
distribution function $F$. If $f$ is only piecewise continuous, as in the
case of uniform or exponential random variables, then $F$ is differentiable
and $F'(x) = f(x)$ except at the points where $f$ jumps. In both cases, the
cumulative distribution function is continuous. A random variable $X$
with a continuous cumulative distribution function is called a continuous
random variable. Note that this does not refer to the continuity of
$X$ as a function from the sample space to $\mathbb{R}$. Of course, a cumulative distribution
function may be neither discrete nor continuous. However, it
can always be written as a convex combination of a discrete cumulative
distribution function and a continuous cumulative distribution function
(problem 12).
If $F$ is the cumulative distribution function of a random variable $X$,
then for any $a < b$,

$$P\{a < X \le b\} = P\{-\infty < X \le b\} - P\{-\infty < X \le a\} = F(b) - F(a).$$

Thus, if $F$ is continuous,

$$\lim_{a \to b} P\{a < X \le b\} = 0.$$

Since $0 \le P\{X = b\} \le P\{a < X \le b\}$ for all $a < b$, we conclude that
$P\{X = b\} = 0$. Therefore, continuous random variables take on specific
values with probability zero. We saw this before in the special case when
$X$ has a density, for example, when $X$ is uniform, exponential, or normal.
This raises the natural question of what kinds of sets can have positive
probability. The following example shows that this question is deeper
than it looks.

Example 5 (the Cantor set and function) We will describe the Cantor
set, $C$, which is a subset of $[0,1]$, by saying which points are not in $C$.
First, we exclude the middle third $(\frac{1}{3}, \frac{2}{3})$ of the interval $[0,1]$. We then
exclude the middle thirds, namely, $(\frac{1}{9}, \frac{2}{9})$ and $(\frac{7}{9}, \frac{8}{9})$, of the two intervals
that remain. Now there are four remaining intervals, and we exclude
their middle thirds, and so forth. The Cantor set is the collection of points
in $[0,1]$ that remain after we have carried out this procedure infinitely
often. A straightforward calculation with geometric series shows that
the sum of the lengths of the excluded intervals equals 1, so in that sense
$C$ is a very small set. On the other hand, there is another characterization
of $C$ which allows one to show that $C$ is uncountable (see below), so in
that sense $C$ is a very large set.
Decimal expansions were discussed in project 4 of Chapter 6. The
word “decimal” is used, of course, because one is writing a given number
as a sum of powers of 10. In similar fashion, one can show that every
number $x$ satisfying $0 \le x \le 1$ can be written

$$x = \sum_{n=1}^{\infty} \frac{a_n}{3^n},$$

where, for each $n$, $a_n = 0$, $1$, or $2$. This is called the ternary expansion
of $x$. Given $x$, the sequence $\{a_n\}$ is uniquely determined except when
$x$ is of the form $q/3^n$ for some integers $n$ and $q$, in which case there are
exactly two expansions, one ending in a string of 0's and the other ending
in a string of 2's. Conversely, if $\{a_n\}$ is any sequence of 0's, 1's, or 2's,
the series converges to a real number $x$ that is in the interval $[0,1]$. The
proofs of these statements are similar to those outlined in project 4 of
Chapter 6.
It is not too difficult to see that the Cantor set is just the set of $x$ in
$[0,1]$ whose ternary expansions have no 1's. For example, the first middle
third we eliminated, $(\frac{1}{3}, \frac{2}{3})$, consists of numbers such that $a_1 = 1$ in their
ternary expansions. Similarly, if $a_1 \ne 1$ but $a_2 = 1$, then $x$ is in the
interval $(\frac{1}{9}, \frac{2}{9})$ or the interval $(\frac{7}{9}, \frac{8}{9})$, depending on whether $a_1 = 0$ or
$a_1 = 2$. If a number $x$ has two ternary expansions, then it is in the Cantor
set if one of the expansions has no 1's. This shows that there is a one-to-one
correspondence between $C$ and the set of all sequences of 0's and
2's. Since a straightforward modification of the proof of Theorem 1.3.6
shows that this set of sequences is uncountable, $C$ must be uncountable
too.
We shall now define a function $g$ on $[0,1]$. If the ternary expansion of
$x$ has no 1's, we set $N = \infty$ and otherwise let $N$ be the index of the first
place in the ternary expansion of $x$ where a 1 occurs. Set $b_n = \frac{1}{2}a_n$ for
$n < N$ and $b_N = 1$, and define

$$g(x) = \sum_{n=1}^{N} \frac{b_n}{2^n}.$$

This function, $g$, is called the Cantor function. Notice that the value of $g$
is $\frac{1}{2}$ for all $x$ in $(\frac{1}{3}, \frac{2}{3})$ since in that case $a_1 = 1$. Similarly, the value of $g$ is $\frac{1}{4}$
on $(\frac{1}{9}, \frac{2}{9})$ and the value of $g$ is $\frac{3}{4}$ on $(\frac{7}{9}, \frac{8}{9})$. Continuing in this fashion, one
can see that $g$ is constant on each of the intervals in the complement of
the Cantor set. Furthermore, $g$ is monotone increasing and continuous.
The monotonicity can be proved by checking cases, and the continuity
holds because two numbers that are very close have ternary expansions
which are identical for a large number of terms.
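
The definition of the Cantor function can be turned into a small program that reads off ternary digits one at a time. This is a sketch under assumptions: it works with floating-point numbers, so only finitely many digits are meaningful (the cutoff `max_digits` is an arbitrary choice), and the function name is illustrative.

```python
def cantor_function(x: float, max_digits: int = 60) -> float:
    """Approximate the Cantor function g at x using ternary digits of x.

    Each digit a_n equal to 0 or 2 contributes (a_n / 2) / 2^n; the first
    digit equal to 1 contributes 1 / 2^n and ends the sum.
    """
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    total = 0.0
    weight = 0.5                 # equals 1 / 2^n for the current digit
    for _ in range(max_digits):
        x *= 3
        digit = int(x)           # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            return total + weight
        total += weight * (digit // 2)
        weight /= 2
    return total


if __name__ == "__main__":
    for x in (0.15, 0.5, 0.8):
        print(x, cantor_function(x))
```

The three sample points lie in $(\frac{1}{9}, \frac{2}{9})$, $(\frac{1}{3}, \frac{2}{3})$, and $(\frac{7}{9}, \frac{8}{9})$, where the text gives the values $\frac{1}{4}$, $\frac{1}{2}$, and $\frac{3}{4}$.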

Define a function $F$ by

$$F(x) = \begin{cases} 0, & x < 0 \\ g(x), & 0 \le x \le 1 \\ 1, & x > 1. \end{cases}$$

Then $F$ is monotone increasing, continuous, and satisfies (14) and (15).
Therefore, it satisfies all the properties of a continuous cumulative distribution
function. If there is a random variable $X$ with cumulative distribution
function $F$, what sets of values would have positive probability?
Well, the intervals $(-\infty, 0)$ and $(1, \infty)$ have zero probability since $F$ is
constant on each. Similarly, each of the intervals in the complement of
the Cantor set in $[0,1]$ has zero probability since $g$ is constant on each of
those intervals. Thus, with probability 1, the value of $X$ must lie in $C$.
This is true even though the probability that $X$ takes on any particular
value in $C$ is zero since $F$ is continuous. Furthermore, away from the
Cantor set $F$ is differentiable but its derivative is zero, so $X$ does not
have a density function $f$.
This example indicates why more advanced applications of analysis
to probability theory usually begin with a thorough study of sets and
measures, functions that assign sizes to sets.

Problems

1. Suppose that after a new car is purchased, the number of years until the
first major repair is a random variable X with density

if x > 0
if x < 0.
(a) Show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
(b) Compute P{0 < X < 2}.
(c) Compute P{X > 2}.
(d) Compute P{—5 < X < 2}.

2. Suppose that $X$ is a random variable with a continuous density $f$ that has
finite variance.

(a) Prove that $X$ has finite expectation.

(b) Prove that $\sigma^2 = E(X^2) - E(X)^2$.

3. Find the mean and variance of the random variable in problem 1.



4. Find the mean of the random variable in Example 1 and show that the
variance is not finite.
5. (a) Suppose that the departure time of a bus is uniformly distributed
between 1 p.m. and 1:10 p.m. If you arrive at 1:03 p.m., what is the
probability that you will have missed the bus?
(b) A point is chosen at random on the interval $[0,4]$. What is the probability
that it will be within | of $\pi$? What is the probability that it
will be within | of an integer?
6. Suppose that X is uniformly distributed on the interval [c,d]. Find the
mean and standard deviation of X. Draw the graph of the cumulative
distribution function of X.
7. Prove that the mean and standard deviation of a random variable that is
exponentially distributed with parameter $\lambda$ are both equal to $1/\lambda$. Compute
explicitly the cumulative distribution function and verify that it has
the right properties.
8. Let $f_n(x)$ be the density of an exponentially distributed random variable
$X_n$ with parameter $\lambda_n$. Suppose that $\lambda_n \to \lambda > 0$.

(a) Prove that $\{f_n\}$ converges uniformly to the density of an exponentially
distributed random variable with parameter $\lambda$.
(b) Let $X$ be an exponentially distributed random variable with parameter
$\lambda$. Prove that for all $a$ and $b$ (finite or infinite),

$$\lim_{n \to \infty} P\{a \le X_n \le b\} = P\{a \le X \le b\}.$$

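9. Let $f$ be the density of a standard normal random variable. Prove that
$\int_{-\infty}^{\infty} f(x)\,dx = 1$ by the following trick:

$$\left( \int_{-\infty}^{\infty} f(x)\,dx \right)^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x) f(y)\,dx\,dy = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)/2}\,dx\,dy.$$

Now use polar coordinates in the plane to do the integral on the right explicitly.
Note that this uses the fact that one can compute double integrals
by iterating the integrals (see project 4 of Chapter 4), as well as a change
of variables formula for multiple integrals.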
10. Let $Y$ be normally distributed with parameters $\mu$ and $\sigma$, and suppose that
$X$ is a standard normal random variable. Prove that

$$P\{a \le Y \le b\} = P\left\{ \frac{a - \mu}{\sigma} \le X \le \frac{b - \mu}{\sigma} \right\}.$$

Hint: make a change of variables and use Theorem 4.4.5.



11. Generate the graphs of the cumulative distribution function of the following
random variables:

(a) A Bernoulli random variable with p = |.


(b) A binomial random variable withp = | and n = 6.
(c) A Poisson random variable with A = |.

12. Let F be a cumulative distribution function.

(a) Prove that the set of points where F is discontinuous is countable.


Hint: at how many points could the jump of F be > A?
(b) For each point of discontinuity $a_n$, let $p_n = F(a_n^+) - F(a_n^-)$, and
define $F_d(x) = \sum_{a_n \le x} p_n$. Show that $F_d$ is monotone increasing and
constant except for countably many jumps.
(c) Define $F_c(x) = F(x) - F_d(x)$. Prove that $F_c$ is continuous and monotone
increasing.
(d) Prove that there is a discrete cumulative distribution function $G_d$, a
continuous cumulative distribution function $G_c$, and a real number
$\alpha$ satisfying $0 \le \alpha \le 1$, so that

$$F(x) = \alpha G_d(x) + (1 - \alpha) G_c(x).$$

13. Let $f$ be the density function for a random variable $X$ on a sample
space $S$. For simplicity we will assume that $f$ is continuous and that
$f(x) = 0$ outside of the finite interval $[a, b]$. Divide $[a, b]$ into $N$ subintervals
$[x_{i-1}, x_i]$, each of length $\delta = \frac{b - a}{N}$, and let $\bar{x}_i$ denote the midpoint of
each interval. Define $Y_N$ to be the random variable on $S$ which takes the
value $\bar{x}_i$ on the set $\{s \in S \mid X(s) \in [x_{i-1}, x_i)\}$.

(a) Prove that $Y_N$ is a discrete random variable with probabilities

$$P\{Y_N = \bar{x}_i\} = \int_{x_{i-1}}^{x_i} f(x)\,dx.$$

(b) Write formulas for the mean, $\mu_N$, and standard deviation, $\sigma_N$, of $Y_N$.
Use Corollary 3.3.2 to prove that $\mu_N \to \mu$ and $\sigma_N \to \sigma$ as $N \to \infty$,
where $\mu$ and $\sigma$ are the mean and standard deviation of $X$.
(c) Prove that for all $c < d$,

$$P\{c \le X \le d\} = \lim_{N \to \infty} P\{c \le Y_N \le d\}.$$

14. Let $f$ be a continuous function on $\mathbb{R}^2$. So that we don't have to consider
improper integrals, we shall assume that $f(x, y) = 0$ for all $(x, y)$ outside
of some square. Let $X$ and $Y$ be random variables on a sample space and
suppose that

$$P\{a \le X \le b \text{ and } c \le Y \le d\} = \int_c^d \int_a^b f(x, y)\,dx\,dy \tag{17}$$

for all $a < b$ and $c < d$. The function $f$ is called the joint density of $X$ and
$Y$.

(a) Explain why the functions $f_X$ and $f_Y$ on $\mathbb{R}$ defined by

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx,$$

are the densities of $X$ and $Y$, respectively.

(b) For $(x, y) \in \mathbb{R}^2$, define

$$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt.$$

$F$ is called the joint cumulative distribution function of $X$ and $Y$.
Using Theorem 4.2.5, Theorem 5.2.4, and Project 4 [part (e)] of Chapter 4,
give a careful justification of the formula

$$\frac{\partial^2}{\partial x\,\partial y} F(x, y) = f(x, y).$$

(c) $X$ and $Y$ are said to be independent if

$$P\{a \le X \le b \text{ and } c \le Y \le d\} = P\{a \le X \le b\}\, P\{c \le Y \le d\}$$

for all $a < b$ and $c < d$. Prove that $X$ and $Y$ are independent if and
only if $f(x, y) = f_X(x) f_Y(y)$.

10.4 The Variation Metric

In many applications of probability theory, it is extremely useful to have
a way of measuring when two random variables are “close”. For simplicity,
we shall restrict our attention to random variables that take values in
$\mathbb{N} \cup \{0\}$. To motivate the mathematical development, we begin by explaining
heuristically why a binomial random variable is approximately
a Poisson random variable if $p$ is small, $n$ is large, and $np$ has moderate
size. Throughout we shall denote by $Y(\lambda)$ a Poisson random variable
with parameter $\lambda$. Recall that $Y(\lambda)$ has mass density

$$P\{Y(\lambda) = k\} = e^{-\lambda} \frac{\lambda^k}{k!}.$$

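Suppose that $X$ is a binomial random variable with parameters $n$ and $p$,
and let $\lambda = np$. Then, for $0 \le k \le n$,

$$P\{X = k\} = \frac{n!}{(n - k)!\,k!}\, p^k (1 - p)^{n - k} \tag{18}$$

$$= \frac{n!}{(n - k)!\,k!} \left( \frac{\lambda}{n} \right)^k \left( 1 - \frac{\lambda}{n} \right)^{n - k} \tag{19}$$

$$= \frac{n(n - 1) \cdots (n - k + 1)}{n^k} \cdot \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^k} \cdot \frac{\lambda^k}{k!}. \tag{20}$$

Now, suppose that $p$ is small and $n$ is large. Then $\lambda/n$ is small. Suppose
that $k$ is small compared to $n$. Then (see problem 1),

$$\left( 1 - \frac{\lambda}{n} \right)^n \approx e^{-\lambda} \tag{21}$$

$$\frac{n(n - 1) \cdots (n - k + 1)}{n^k} \approx 1 \tag{22}$$

$$\left( 1 - \frac{\lambda}{n} \right)^k \approx 1 \tag{23}$$

where we use $\approx$ to mean “approximately equal to.” Substituting (21),
(22), and (23) in (20), we obtain

$$P\{X = k\} \approx e^{-\lambda} \frac{\lambda^k}{k!} = P\{Y(\lambda) = k\}.$$

Though we have used the assumption that $k$ is small compared to $n$, both
the binomial and the Poisson probabilities are small for large $k$ since $p$ is
small and are thus close to each other. This suggests that for any set
$A \subseteq \mathbb{N} \cup \{0\}$,

$$P\{X \in A\} \approx P\{Y(\lambda) \in A\}. \tag{24}$$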

We shall see later that this is true and derive a bound for the difference.
The approximation (24) is the reason that the number of occurrences
of rare events is often assumed to have a Poisson distribution. Here is an
example.

Example 1 Suppose that the average number of earthquakes each year
is 2.5. A reasonable, simple assumption would be that the probability
of a quake on any particular day is $p = 2.5/365$. Let $X_i$ be the random
variable which has the value 1 if there is a quake on the $i$th day of the
year and equals zero otherwise. Then the value of $X = \sum_{i=1}^{365} X_i$ is the
number of quakes during the year. If we assume that the $X_i$ are independent
and set $p = P\{X_i = 1\}$, then $X$ is a binomial random variable with
$n = 365$ and $p = 2.5/365$. Since $n$ is large, $p$ is small, and $\lambda = np = 2.5$
has moderate size, the mass density of $X$ should be well approximated
by the mass density of a Poisson random variable with the same mean,
that is, by $Y(2.5)$. The probability that there will be no earthquakes in
the year is given by $X$ as

$$P\{X = 0\} = \left( \frac{362.5}{365} \right)^{365} = .081$$

and by $Y$ as

$$P\{Y = 0\} = e^{-2.5} = .082.$$

Similarly, the probability that there will be $\le 2$ quakes is given by $X$ as

$$\left( \frac{362.5}{365} \right)^{365} + 365 \left( \frac{362.5}{365} \right)^{364} \left( \frac{2.5}{365} \right) + \frac{(365)(364)}{2!} \left( \frac{362.5}{365} \right)^{363} \left( \frac{2.5}{365} \right)^{2} = .5434$$

and by $Y$ as

$$e^{-2.5} + e^{-2.5}(2.5) + e^{-2.5}\, \frac{(2.5)^2}{2!} = .5438.$$

In both cases, the approximation by the Poisson is very good. It is clear
that the calculations with the Poisson random variable are simpler. What
we need is an error bound.
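
The numbers in this example are easy to reproduce. The sketch below is an illustration, not part of the original text; the function names are assumptions. It evaluates the binomial mass density with $n = 365$, $p = 2.5/365$ and the Poisson mass density with $\lambda = 2.5$, then compares the probabilities of no quakes and of at most two quakes.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P{X = k} for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P{Y = k} for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** k / factorial(k)


N_DAYS = 365
LAM = 2.5
P_DAY = LAM / N_DAYS

if __name__ == "__main__":
    print(binomial_pmf(0, N_DAYS, P_DAY), poisson_pmf(0, LAM))
    print(sum(binomial_pmf(k, N_DAYS, P_DAY) for k in range(3)),
          sum(poisson_pmf(k, LAM) for k in range(3)))
```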

We briefly describe two other situations in which the question of
measuring the closeness of two mass densities is both natural and important.
Mathematical details can be found in the references.

Example 2 Let's consider what it means to “randomize” the order of
a deck of playing cards by repeated shuffling. There are 52! possible
orderings of the cards. A shuffle is a well-defined probabilistic procedure
which, when applied to any particular ordering, gives each of the
52! orderings with various probabilities (some of which could be zero).
For example, a simple (inefficient) shuffle would be to choose a card at
random from the deck and reinsert it at a randomly chosen place. Let's
assume that we have chosen a specific method of shuffling. Since there
are 52! possible orderings of the cards, we can label the orderings by the
numbers $1, 2, \ldots, 52!$. Let the value of the random variable $X_n$ be the label
after $n$ shuffles. Let $U$ be the random variable which takes on each
of the label values $1, 2, \ldots, 52!$ with probability $\frac{1}{52!}$, that is, $U$ takes on
each label value with equal probability. A reasonable mathematical interpretation
of the question “How random is the deck after $n$ shuffles?”
is the question “How close is the mass density of $X_n$ to the mass density
of $U$?” since $U$ assigns equal probabilities to each ordering. If one uses a
“riffle shuffle”, one can show that the densities are quite close if $n \ge 7$.
For an excellent introduction to the mathematics of card shuffling, see
[38], where the riffle and other shuffles are formally defined.

Example 3 In Example 5 of Section 5.6, we discussed the use of metrics
in molecular biology. Even when we have determined how close
two DNA sequences are, difficult questions in probability theory are involved
in the interpretation of the results. Suppose that in two long
sequences we find two short subsequences that match perfectly. How
likely is this event if the letters in each sequence are chosen randomly
(with or without independence in neighboring positions) from the DNA
alphabet A, G, C, T? In order to make calculations, one must typically
argue that, under appropriate assumptions, the distribution of the random
variable of interest (in this case the length of the longest identical
substrings) is approximately given by one of the standard, simple distributions
of probability theory, for example, the Poisson distribution or
the binomial distribution. Thus one wants to know when two discrete
probability distributions are close. See [30] for an excellent discussion
of the applications of metrics, combinatorics, and probability theory to
molecular biology.

We are now ready to define the distance between random variables.

Definition. Let $X$ and $Y$ be two random variables which take values in
$\mathbb{N} \cup \{0\}$. Define

$$\rho_v(X, Y) = \sup_{A \subseteq \mathbb{N} \cup \{0\}} |P\{X \in A\} - P\{Y \in A\}|. \tag{25}$$

$\rho_v(X, Y)$ is called the variation of $X$ and $Y$.



This is a natural way to measure distance since if $\rho_v(X, Y)$ is small
then the probabilities that the values of $X$ and $Y$ will be in any particular
set $A$ will be close to each other. Note, however, that $\rho_v(X, Y)$ does not
compare the random variables themselves, only their mass densities. In
particular, $\rho_v(X, Y) = 0$ if $X$ and $Y$ have the same mass densities. So
$X$ and $Y$ need not even be random variables on the same sample space.
These remarks are made explicit by the following theorem, which shows
how to compute $\rho_v(X, Y)$ in terms of the mass density of $X$ and the mass
density of $Y$.

□ Theorem 10.4.1 Let $X$ and $Y$ be random variables with values in $\mathbb{N} \cup \{0\}$
that have mass densities $p_n = P\{X = n\}$ and $q_n = P\{Y = n\}$,
respectively. Then,

$$\rho_v(X, Y) = \frac{1}{2} \sum_{n=0}^{\infty} |p_n - q_n|. \tag{26}$$

Furthermore, for all such random variables $X$, $Y$, and $Z$,

(a) $\rho_v(X, Y) \ge 0$.

(b) $\rho_v(X, Y) = \rho_v(Y, X)$.

(c) $\rho_v(X, Y) \le \rho_v(X, Z) + \rho_v(Z, Y)$.

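Proof. We shall prove (26), from which the properties (a), (b), and (c)
follow quite easily (problem 2). Let $S^+ = \{n \in \mathbb{N} \cup \{0\} \mid p_n \ge q_n\}$ and
$S^- = \{n \in \mathbb{N} \cup \{0\} \mid p_n < q_n\}$. For any $A \subseteq \mathbb{N} \cup \{0\}$,

\begin{align*}
P\{X \in A\} - P\{Y \in A\} &= \sum_{n \in A} p_n - \sum_{n \in A} q_n \tag{27} \\
&= \sum_{n \in A} (p_n - q_n) \tag{28} \\
&= \sum_{n \in A \cap S^+} (p_n - q_n) + \sum_{n \in A \cap S^-} (p_n - q_n). \tag{29}
\end{align*}

Since the first term in (29) is nonnegative and the second is nonpositive, it follows
that

\[ \sum_{n \in A \cap S^-} (p_n - q_n) \le P\{X \in A\} - P\{Y \in A\} \le \sum_{n \in A \cap S^+} (p_n - q_n), \]

which implies

\[ \sum_{n \in S^-} (p_n - q_n) \le P\{X \in A\} - P\{Y \in A\} \le \sum_{n \in S^+} (p_n - q_n). \]

Subtracting $\sum_n q_n = 1$ from $\sum_n p_n = 1$, we find that

\begin{align*}
0 = \sum_{n \in \mathbb{N} \cup \{0\}} (p_n - q_n) &= \sum_{n \in S^+} (p_n - q_n) + \sum_{n \in S^-} (p_n - q_n) \\
&= \sum_{n \in S^+} (p_n - q_n) - \sum_{n \in S^-} |p_n - q_n|,
\end{align*}

so

\[ \sum_{n \in S^+} (p_n - q_n) = \sum_{n \in S^-} |p_n - q_n|. \tag{30} \]

It follows that

\[ -\sum_{n \in S^+} (p_n - q_n) \le P\{X \in A\} - P\{Y \in A\} \le \sum_{n \in S^+} (p_n - q_n), \]

so

\[ |P\{X \in A\} - P\{Y \in A\}| \le \sum_{n \in S^+} (p_n - q_n). \]

Since we can always choose $A = S^+$, this implies that

\[ \sup_{A \subseteq \mathbb{N} \cup \{0\}} |P\{X \in A\} - P\{Y \in A\}| = \sum_{n \in S^+} (p_n - q_n). \]

It follows from (30) that

\[ \sum_{n \in S^+} (p_n - q_n) = \frac{1}{2} \sum_{n \in \mathbb{N} \cup \{0\}} |p_n - q_n|, \]

which proves (26). □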
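As a numerical sanity check on (25) and (26), the following Python sketch computes the variation both ways for two small mass densities. The function names and the example densities are illustrative assumptions, not taken from the text, and the brute-force supremum is feasible only because the supports are tiny.

```python
from itertools import combinations


def variation_distance(p, q):
    """Half the sum of |p_n - q_n|, as in formula (26).

    p and q are lists with p[n] = P{X = n}; the shorter list is padded with zeros.
    """
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def variation_distance_by_sup(p, q):
    """Supremum over all subsets A of the common support, as in definition (25)."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    best = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            diff = abs(sum(p[i] for i in subset) - sum(q[i] for i in subset))
            best = max(best, diff)
    return best


# Hypothetical densities chosen only for illustration.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.25, 0.25]

print(variation_distance(p, q))
print(variation_distance_by_sup(p, q))
```

In this example the set $S^+$ of the proof is $\{0, 1\}$, and both computations agree up to rounding, as the theorem predicts.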

Note that (a), (b), and (c) are just the properties of a metric except that
$\rho_v(X,Y) = 0$ does not imply that $X = Y$. In fact, $\rho_v(\cdot,\cdot)$ is a metric on the
set of sequences $\{p_n\}_{n=0}^{\infty}$ of nonnegative numbers such that $\sum p_n = 1$.
That is, $\rho_v$ is a metric on the set of mass densities. We follow common
practice and write $\rho_v(X,Y)$ even though $\rho_v$ depends only on the mass
densities of $X$ and $Y$, not on the random variables themselves. We now
prove a theorem that relates the properties of $\rho_v(X,Y)$ to probabilistic
properties of $X$ and $Y$.

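□ Theorem 10.4.2 Let $X$ and $Y$ be random variables with values in
$\mathbb{N} \cup \{0\}$. Then,

(a) $\rho_v(X,Y) \le P\{X \ne Y\}$.

(b) If $Z$ is a random variable with values in $\mathbb{N} \cup \{0\}$ which is
independent of $X$ and $Y$, then

\[ \rho_v(X + Z, Y + Z) \le \rho_v(X,Y). \]

(c) Let $\{X_i\}_{i=1}^{N}$ and $\{Y_i\}_{i=1}^{N}$ be families of mutually independent
random variables that take values in $\mathbb{N} \cup \{0\}$. Then,

\[ \rho_v\Big(\sum_{i=1}^{N} X_i, \sum_{i=1}^{N} Y_i\Big) \le \sum_{i=1}^{N} \rho_v(X_i, Y_i). \]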

Proof. To prove (a), we note that for every $A \subseteq \mathbb{N} \cup \{0\}$,

\begin{align*}
P\{X \in A\} &= P\{X \in A \text{ and } Y \in A\} + P\{X \in A \text{ and } Y \notin A\} \\
&\le P\{Y \in A\} + P\{X \ne Y\},
\end{align*}

so $P\{X \in A\} - P\{Y \in A\} \le P\{X \ne Y\}$. Reversing the roles of $X$ and $Y$
gives $P\{Y \in A\} - P\{X \in A\} \le P\{X \ne Y\}$, which proves (a).
Recall that two random variables $X$ and $Y$ are independent if for all
sets $A \subseteq \mathbb{R}$ and $B \subseteq \mathbb{R}$,

\[ P\{X \in A \text{ and } Y \in B\} = P\{X \in A\} P\{Y \in B\}. \]

For $A \subseteq \mathbb{N} \cup \{0\}$, define
$A - \{n\} = \{m \in \mathbb{N} \cup \{0\} \mid m = j - n \text{ for some } j \in A\}$.
Then, to prove (b), we compute

\begin{align*}
P\{X + Z \in A\} &= \sum_{n=0}^{\infty} P\{X \in A - \{n\} \text{ and } Z = n\} \tag{31} \\
&= \sum_{n=0}^{\infty} P\{X \in A - \{n\}\} P\{Z = n\} \tag{32} \\
&\le \sum_{n=0}^{\infty} \big[ P\{Y \in A - \{n\}\} + \rho_v(X,Y) \big] P\{Z = n\} \tag{33} \\
&= \sum_{n=0}^{\infty} P\{Y \in A - \{n\}\} P\{Z = n\} + \rho_v(X,Y) \tag{34} \\
&= \sum_{n=0}^{\infty} P\{Y \in A - \{n\} \text{ and } Z = n\} + \rho_v(X,Y) \tag{35} \\
&= P\{Y + Z \in A\} + \rho_v(X,Y). \tag{36}
\end{align*}

In step (32) we used the hypothesis that $Z$ is independent of $X$ and in
step (35) that $Z$ is independent of $Y$. In step (34) we used the fact that
$\sum_n P\{Z = n\} = 1$. Combining this inequality with the inequality
obtained by reversing the roles of $X$ and $Y$, we obtain

\[ |P\{X + Z \in A\} - P\{Y + Z \in A\}| \le \rho_v(X,Y), \]

from which (b) follows.

To prove (c), let $N = 2$. Then, by the triangle inequality and the result
of part (b),

\begin{align*}
\rho_v(X_1 + X_2, Y_1 + Y_2) &\le \rho_v(X_1 + X_2, X_2 + Y_1) + \rho_v(X_2 + Y_1, Y_1 + Y_2) \\
&\le \rho_v(X_1, Y_1) + \rho_v(X_2, Y_2).
\end{align*}

Continuing in this manner, we see that part (c) follows by an induction
argument (problem 8). □
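Parts (b) and (c) can be checked numerically on small examples, using the fact that the mass density of a sum of independent random variables is the convolution of the individual densities. The sketch below is only an illustration under that assumption; the helper names and the densities are hypothetical.

```python
def variation_distance(p, q):
    """Half the sum of |p_n - q_n| for mass densities given as lists."""
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def convolve(p, r):
    """Mass density of the sum of two independent variables with densities p and r."""
    out = [0.0] * (len(p) + len(r) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(r):
            out[i + j] += a * b
    return out


# Hypothetical densities for X, Y and an independent Z.
dens_x = [0.5, 0.3, 0.2]
dens_y = [0.25, 0.25, 0.25, 0.25]
dens_z = [0.6, 0.4]

# Part (b): adding an independent Z to both variables cannot increase the variation.
d_xy = variation_distance(dens_x, dens_y)
d_shifted = variation_distance(convolve(dens_x, dens_z), convolve(dens_y, dens_z))
print(d_shifted, "<=", d_xy)

# Part (c) with N = 2: the variation of the sums is at most the sum of the variations.
dens_x2 = [0.9, 0.1]
dens_y2 = [0.8, 0.1, 0.1]
d_sum = variation_distance(convolve(dens_x, dens_x2), convolve(dens_y, dens_y2))
d_parts = d_xy + variation_distance(dens_x2, dens_y2)
print(d_sum, "<=", d_parts)
```

A check like this does not replace the proof; it only confirms that the inequalities hold for these particular densities.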

The proofs of the next three results all depend on the following two
ideas. First, since $\rho_v(X,Y)$ depends only on the mass densities of $X$
and $Y$, its value does not change if $Y$ is replaced by another random
variable $\tilde{Y}$ with the same density. Second, suppose that $Y$ is a Poisson
random variable with mean $\lambda$. Then, for every choice of $\mu_i > 0$ so that
$\mu_1 + \mu_2 + \cdots + \mu_n = \lambda$, there is a sample space $S$ and mutually
independent Poisson random variables $Y(\mu_i)$ so that $\tilde{Y} = \sum Y(\mu_i)$ is a Poisson
random variable with mean $\lambda$. Thus $\rho_v(X,Y) = \rho_v(X,\tilde{Y})$. See problems
12, 13, and 14.

Corollary 10.4.3 Let $Y(\mu_1)$ and $Y(\mu_2)$ be Poisson random variables with
means $\mu_1 < \mu_2$. Then,

\[ \rho_v(Y(\mu_1), Y(\mu_2)) \le \mu_2 - \mu_1. \]

Proof. Let $Y(\mu_1)$ and $Y(\mu_2 - \mu_1)$ be independent Poisson random
variables with means $\mu_1$ and $\mu_2 - \mu_1$ and define $\tilde{Y}(\mu_2) = Y(\mu_1) + Y(\mu_2 - \mu_1)$.
Then $\tilde{Y}(\mu_2)$ is Poisson with mean $\mu_2$. Therefore,

\begin{align*}
\rho_v(Y(\mu_1), Y(\mu_2)) &= \rho_v(Y(\mu_1), \tilde{Y}(\mu_2)) \\
&= \rho_v(Y(\mu_1), Y(\mu_1) + Y(\mu_2 - \mu_1)) \\
&\le P\{Y(\mu_1) \ne Y(\mu_1) + Y(\mu_2 - \mu_1)\} \\
&= P\{Y(\mu_2 - \mu_1) \ne 0\} \\
&= 1 - e^{-(\mu_2 - \mu_1)} \\
&\le \mu_2 - \mu_1.
\end{align*}

In the third step we used part (a) of Theorem 10.4.2. The last step follows
from the Mean Value Theorem. □

Thus we have an upper bound for how close the mass densities of two
Poisson random variables are if the means are close.

Example 4 Suppose that a geologist is asked to predict how likely it
is that there will be $\le 2$ earthquakes in a certain region next year. She
assumes that the number of earthquakes is given by a Poisson random
variable. However, she needs to determine the right mean to use, and
she does so by a statistical analysis of historical data. In such situations
it is not always clear how much of the historical data to use since older
data may be more unreliable. If she uses data since 1950 she gets a mean
of 2.5 (as in Example 1), and if she uses data back to 1900 she gets a
mean of 2.4. How much difference does it make which mean she uses?
According to Corollary 10.4.3,

\[ |P\{Y(2.5) \in A\} - P\{Y(2.4) \in A\}| \le 2.5 - 2.4 = .1 \]

for any set $A$. If $A = \{0,1,2\}$, then we calculated in Example 1 that

\[ P\{Y(2.5) \in \{0,1,2\}\} = e^{-2.5} + (2.5)e^{-2.5} + \frac{(2.5)^2}{2} e^{-2.5} = .5438, \]

and similarly,

\[ P\{Y(2.4) \in \{0,1,2\}\} = e^{-2.4} + (2.4)e^{-2.4} + \frac{(2.4)^2}{2} e^{-2.4} = .5697. \]

Thus we see that the two predictions indeed differ by less than 0.1.
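The arithmetic in Example 4 can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the variation is approximated by truncating the series in (26) after 60 terms, which is an assumption that is harmless here because the Poisson tails are negligible at that point.

```python
import math


def poisson_pmf(k, mean):
    """P{Y(mean) = k} for a Poisson random variable."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def poisson_prob(values, mean):
    """P{Y(mean) in values} for a finite set of nonnegative integers."""
    return sum(poisson_pmf(k, mean) for k in values)


def poisson_variation(mean_1, mean_2, terms=60):
    """Truncated version of formula (26) for two Poisson densities."""
    return 0.5 * sum(
        abs(poisson_pmf(k, mean_1) - poisson_pmf(k, mean_2)) for k in range(terms)
    )


A = {0, 1, 2}
prob_25 = poisson_prob(A, 2.5)
prob_24 = poisson_prob(A, 2.4)
rho = poisson_variation(2.4, 2.5)

print(round(prob_25, 4), round(prob_24, 4))
print(abs(prob_25 - prob_24), rho, 1 - math.exp(-0.1), 0.1)
```

The last line prints, in increasing order of size, the difference for this particular set $A$, the (truncated) variation, the sharper bound $1 - e^{-(\mu_2 - \mu_1)}$ from the proof of Corollary 10.4.3, and the bound $\mu_2 - \mu_1$.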

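□ Theorem 10.4.4 (Le Cam's Inequality) Let $X_1, X_2, \ldots, X_n$ be independent
Bernoulli random variables with probabilities $p_1, p_2, \ldots, p_n$. Suppose
that $Y(\sum_{i=1}^{n} p_i)$ is a Poisson random variable with mean $\sum_{i=1}^{n} p_i$. Then,

\[ \rho_v\Big(\sum_{i=1}^{n} X_i, Y\Big(\sum_{i=1}^{n} p_i\Big)\Big) \le \sum_{i=1}^{n} p_i^2. \tag{37} \]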

Proof. We first prove the result in the case $n = 1$. Let $X$ be a Bernoulli
random variable with $p = P\{X = 1\}$. Then,

\begin{align*}
2\rho_v(X, Y(p)) &= \sum_{n=0}^{\infty} |P\{X = n\} - P\{Y(p) = n\}| \\
&= \sum_{n=0}^{1} |P\{X = n\} - P\{Y(p) = n\}| + \sum_{n=2}^{\infty} |P\{Y(p) = n\}| \\
&= |(1 - p) - e^{-p}| + |p - pe^{-p}| + \sum_{n=2}^{\infty} |P\{Y(p) = n\}| \\
&= (e^{-p} - pe^{-p} + 2p - 1) + (1 - e^{-p} - pe^{-p}) \\
&= 2p(1 - e^{-p}).
\end{align*}

Since $(1 - e^{-p}) \le p$ by the Mean Value Theorem, we conclude that

\[ \rho_v(X, Y(p)) \le p^2. \tag{38} \]

For the general case, we replace $Y(\sum_{i=1}^{n} p_i)$ by the sum $\sum_{i=1}^{n} Y(p_i)$ of
independent Poisson random variables $Y(p_i)$ with corresponding means
$p_i$. And we replace $X_1, X_2, \ldots, X_n$ by independent Bernoulli random
variables with probabilities $p_1, p_2, \ldots, p_n$, so that the families $\{X_i\}$ and
$\{Y(p_i)\}$ are mutually independent. This can be done by a product
construction similar to that described in problem 13. Thus, by part (c) of Theorem
10.4.2 and (38),

\begin{align*}
\rho_v\Big(\sum_{i=1}^{n} X_i, Y\Big(\sum_{i=1}^{n} p_i\Big)\Big) &= \rho_v\Big(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} Y(p_i)\Big) \\
&\le \sum_{i=1}^{n} \rho_v(X_i, Y(p_i)) \\
&\le \sum_{i=1}^{n} p_i^2. \qquad □
\end{align*}

Corollary 10.4.5 Let $\{X_i\}_{i=1}^{n}$ be a sequence of independent Bernoulli
random variables each with mean $p$, and let $X = \sum_{i=1}^{n} X_i$. Let $Y(np)$ be
a Poisson random variable with mean $np$. Then,

\[ \rho_v(X, Y(np)) \le np^2. \]

Proof. This is the special case of (37) when $p_i = p$ for all $i$. □
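To see how the bound $np^2$ compares with the actual variation, one can evaluate formula (26) for a binomial density and the Poisson density with the same mean. The following Python sketch does this under stated assumptions: the values $n = 100$ and $p = 0.03$ are hypothetical, and the Poisson series is truncated a fixed number of terms past $n$.

```python
import math


def binomial_pmf(k, n, p):
    """P{X = k} for a binomial random variable with parameters n and p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """P{Y = k} for a Poisson random variable, computed via logarithms for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def variation_binomial_poisson(n, p, extra_terms=60):
    """Formula (26) for Binomial(n, p) versus Poisson(np), truncated at n + extra_terms."""
    mean = n * p
    total = 0.0
    for k in range(n + extra_terms + 1):
        b = binomial_pmf(k, n, p) if k <= n else 0.0
        total += abs(b - poisson_pmf(k, mean))
    return 0.5 * total


# Hypothetical parameters chosen only for illustration.
n, p = 100, 0.03
rho = variation_binomial_poisson(n, p)
bound = n * p**2

print(rho, "<=", bound)
```

For parameters like these the computed variation is well below $np^2$, which is consistent with the corollary giving an upper bound rather than an exact value.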

Example 1 (revisited) In Example 1, p = |g| and

.2
np

Thus, whatever the set $A \subseteq \mathbb{N} \cup \{0\}$, we should have

\[ |P\{X \in A\} - P\{Y(np) \in A\}| \le .017. \]

This was indeed true for the two cases, $A_1 = \{0\}$ and $A_2 = \{0,1,2\}$, that
we compared explicitly in Example 1.

Problems
1. For fixed $k$, show that the expressions on the left sides of (21), (22), and
(23) converge to the respective right sides as $n \to \infty$.
2. Complete the proof of Theorem 10.4.1 by verifying that $\rho_v$ satisfies the
properties (a), (b), and (c).
3. Suppose that on the average 1 out of every 75 items coming off an
assembly line is defective. Assume that each item coming off has probability $\frac{1}{75}$
of being defective. Use a binomial random variable and a Poisson
random variable to calculate the probability that the next batch of 75 items
will have

(a) no defective items.


(b) two or more defective items.

4. Every day during January you buy a lottery ticket. Each ticket has a
chance of winning equal to Use a binomial random variable and
a Poisson random variable to calculate the probability that you will win

(a) exactly once.


(b) more than once.
(c) not at all.

5. Assume that each of the four letters in the DNA alphabet occurs
independently in each position with probability $\frac{1}{4}$. Suppose that we have a
DNA strand of length 8. Use a binomial random variable and a Poisson
random variable to calculate the probability that the strand contains

(a) no C's.
(b) exactly two C's.
(c) less than or equal to two C's.

6. Let X be a random variable which takes the values 0,1, and 2 with prob¬
abilities po = Pi — \, and p2 = and let Y be a random variable
which takes the values 0,1,2, and 3, each with probability |. Compute
Pv{X,Y).

7. For each integer $n > 0$, let $X_n$ be a random variable which takes the
values $0, 1, 2, \ldots, n$, each with probability $\frac{1}{n+1}$.

(a) Compute $\rho_v(X_n, X_m)$.

(b) Compute $\lim_{n \to \infty} \rho_v(X_n, X_m)$.

8. Provide the details of the induction argument for part (c) of Theorem
10.4.2.
9. Compute the error bound for the difference between the binomial and the
Poisson random variables in the situation described in problem 3.
10. Compute the error bound for the difference between the binomial and the
Poisson random variables in the situation described in problem 4.
11. Compute the error bound for the difference between the binomial and the
Poisson random variables in the situation described in problem 5. What
is the point of this problem?
12. Let $Y(\mu_1)$ and $Y(\mu_2)$ be independent Poisson random variables with
means $\mu_1$ and $\mu_2$. Prove that $Y(\mu_1) + Y(\mu_2)$ is a Poisson random
variable with mean $\mu_1 + \mu_2$. Hint: since $Y(\mu_1)$ and $Y(\mu_2)$ are independent,

\begin{align*}
P\{Y(\mu_1) + Y(\mu_2) = n\} &= \sum_{k=0}^{n} P\{Y(\mu_1) = k \text{ and } Y(\mu_2) = n - k\} \\
&= \sum_{k=0}^{n} P\{Y(\mu_1) = k\} P\{Y(\mu_2) = n - k\}.
\end{align*}

13. Suppose that $\mu_1$ and $\mu_2$ are positive numbers satisfying $\lambda = \mu_1 + \mu_2$.
Let $S = (\mathbb{N} \cup \{0\}) \times (\mathbb{N} \cup \{0\})$, and define random variables $X_1$ and $X_2$ on $S$
by

\[ X_1(m,n) = m, \qquad X_2(m,n) = n. \]

Assume that for each $m$ and $n$,

\[ P\{X_1 = m \text{ and } X_2 = n\} = \frac{\mu_1^m}{m!} \frac{\mu_2^n}{n!} e^{-\lambda}. \]

(a) Prove that $X_1$ is a Poisson random variable with mean $\mu_1$ and $X_2$ is
a Poisson random variable with mean $\mu_2$.
(b) Prove that $X_1$ and $X_2$ are independent.
(c) Prove that $X_1 + X_2$ is a Poisson random variable with mean $\lambda = \mu_1 + \mu_2$.

14. Let $Y(\lambda)$ be a Poisson random variable with mean $\lambda$. Let $\mu_i > 0$ for
$i = 1, 2, \ldots, n$ and suppose that $\lambda = \mu_1 + \mu_2 + \cdots + \mu_n$. Show how to
create a sample space $S$ and random variables $X_1, X_2, \ldots, X_n$ so that

(a) $X_i$ is a Poisson random variable with mean $\mu_i$.

(b) the random variables $X_i$ are mutually independent.
(c) the mass density of $X_1 + X_2 + \cdots + X_n$ is the same as the mass
density of $Y(\lambda)$.

15. Let $\{X_i\}_{i=1}^{n}$ be a sequence of independent Bernoulli random variables
each with mean $p(n)$, which depends on $n$. Define $X(n) = \sum_{i=1}^{n} X_i$. Suppose
that as $n \to \infty$, we have $p(n) \to 0$ and $np(n) \to \beta$. Prove that

\[ \rho_v(X(n), Y(\beta)) \le np(n)^2 + |\beta - np(n)|. \]

Hint: use $Y(np(n))$ and the triangle inequality.

Projects

1. The purpose of this project is to introduce the variance and develop
some of its properties. Let $X$ be a discrete random variable with values
$\{a_n\}$ and suppose that $X^2$ has finite expectation. That is, the series
$\sum a_n^2 P\{X = a_n\}$ converges absolutely. Since $|a_n| \le a_n^2 + 1$, it
follows that $X$ has finite expectation. We denote $E(X)$ by $\mu$. Since
$(X - \mu)^2 = X^2 - 2\mu X + \mu^2$, it follows from problem 9 in Section 10.1
that $(X - \mu)^2$ has finite expectation too. We define

\[ \sigma^2 = E((X - \mu)^2) \]

and call $\sigma^2$ the variance of $X$. It is sometimes denoted Var($X$). Its square
root, $\sigma$, is called the standard deviation.

(a) Suppose that X takes the values 1.5, 2, and 2.5 with probabilities
\, and |, respectively. Compute p and <r2.
(b) Suppose that X takes the values 0, 2, and 4 with probabilities |,
\, and |, respectively. Compute p and <j2. What do you conclude
about the meaning of cr2? ^

(c) Compute the variance of a Bernoulli random variable with parameter $p$.
(d) Show that $\sigma^2 = E(X^2) - E(X)^2$. Hint: expand $(X - \mu)^2$ and use
problem 9 in Section 10.1.
(e) Let $X$ be the sum of the faces in the experiment of rolling two dice.
What is $\sigma^2$?
(f) Let $X_1, X_2, \ldots, X_n$ be independent random variables each having
the same range and probabilities. Suppose $\sigma^2 = \mathrm{Var}(X_i)$ is finite
and set $\mu = E(X_i)$. Define $S_n = X_1 + X_2 + \cdots + X_n$. Prove that
$\mathrm{Var}(S_n) = n\sigma^2$. Hint: note that $E(S_n) = n\mu$ and expand:

\[ (S_n - n\mu)^2 = \Big(\sum_{i=1}^{n} (X_i - \mu)\Big)^2 = \sum_{i=1}^{n} (X_i - \mu)^2 + \sum_{i \ne j} (X_i - \mu)(X_j - \mu); \]

then use problem 6 of Section 10.1 and Theorem 10.1.3.

2. The purpose of this project is to sketch the proofs and uses of Markov's
inequality and Chebyshev's inequality. Throughout we assume that X is
a discrete random variable which takes values {an}.

(a) Suppose that $X$ has finite expectation. Prove that, for each $t > 0$,

\[ P\{|X| \ge t\} \le \frac{E(|X|)}{t}. \]

This is known as Markov's inequality. Hint: by Theorem 10.1.2,
$E(|X|) = \sum |a_n| P\{X = a_n\}$; now eliminate the terms in which
$|a_n| < t$.
(b) Markov's inequality is useful when we have information only about
the mean of a random variable. Suppose that the number of defective
toasters produced by a factory each day is a random variable
with mean 40. What can you say about the probability that more
than 60 defective toasters will be produced on a given day?
(c) Suppose that $X$ has finite variance $\sigma^2$. Let $\mu = E(X)$. Prove that for
each $\delta > 0$,

\[ P\{|X - \mu| \ge \delta\} \le \frac{\sigma^2}{\delta^2}. \]

This is known as Chebyshev's inequality. Hint: notice that
$|X - \mu| \ge \delta$ if and only if $(X - \mu)^2/\delta^2 \ge 1$.
(d) Chebyshev's inequality can be used to provide information when
we have information only about the mean and variance of a random
variable. Suppose that the manager of the toaster factory in part
(b) has the additional information that the standard deviation of
the number of defective toasters is 4. What can you say about the
probability that the number of defective toasters will be between 30
and 50 on a given day?
(e) How would the manager of the toaster factory get this information
about the mean and the standard deviation?

(f) Chebyshev's inequality can also be used to provide an easy estimate
when one knows the density of $X$ but the density is complicated.
Suppose that the manager of an alarm clock factory knows that 3%
of the clocks produced are defective. He is negotiating a special order
of 10,000 clocks, and to sweeten the deal he's going to agree to
give a full refund if more than $m$ clocks are defective. How should
he choose $m$ so that the chance of giving a refund is less than 5%?
Hint: you need to use the fact that the variance of a binomial random
variable with parameters $p$ and $n$ is $np(1 - p)$.
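To get a feeling for how conservative Markov's and Chebyshev's inequalities are, one can compare them with exact tail probabilities for a density that is easy to compute. The sketch below uses a binomial random variable with hypothetical parameters $n = 100$ and $p = 0.25$ (not the values from the parts above) and the variance formula $np(1 - p)$ quoted in the hint; the function names are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """P{X = k} for a binomial random variable with parameters n and p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_tail(n, p, delta):
    """Exact P{|X - mean| >= delta} for X binomial with parameters n and p."""
    mean = n * p
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if abs(k - mean) >= delta)


def exact_upper_tail(n, p, t):
    """Exact P{X >= t} for X binomial with parameters n and p."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k >= t)


# Hypothetical parameters chosen so that the mean and variance are exact in floating point.
n, p = 100, 0.25
mean = n * p                # 25.0
variance = n * p * (1 - p)  # 18.75

for delta in (5, 10, 15):
    # Chebyshev: exact tail probability versus the bound variance / delta^2.
    print(delta, exact_two_sided_tail(n, p, delta), variance / delta**2)

# Markov: exact P{X >= 50} versus the bound mean / 50.
print(exact_upper_tail(n, p, 50), mean / 50)
```

In each case the exact probability is smaller than the bound, often by a wide margin; the inequalities trade sharpness for the fact that they need only the mean or the variance.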

3. The purpose of this project is to introduce the weak law of large numbers.
Let $X_i$ be a sequence of independent discrete random variables all having
the same density and finite variance. Let $\mu = E(X_i)$ and for each $n$ define
$S_n = X_1 + X_2 + \cdots + X_n$. We want to prove that for all $\delta > 0$,

\[ \lim_{n \to \infty} P\Big\{ \Big| \frac{S_n}{n} - \mu \Big| \ge \delta \Big\} = 0. \tag{39} \]

This is known as the weak law of large numbers.

(a) Prove that $E(S_n/n) = \mu$.

(b) Let $\sigma^2 = \mathrm{Var}(X_i)$. Prove that $\mathrm{Var}(S_n/n) = \sigma^2/n$. Hint: use part (f)
of Project 1.
(c) Use Chebyshev's inequality to prove (39).

(d) Suppose that we flip a fair coin repeatedly and that $X_i = 1$ if the
flip is a head and $X_i = 0$ if the flip is a tail. Explain carefully
what the weak law of large numbers means in this case. Why does
this make sense?
(e) Suppose that we flip a coin, which comes up heads with probability
p > \, repeatedly. Let X{ = 1 if the flip is a head and X{ = 0 if
the flip is a tail. Let Sn = Xx + X2 + ... + Xn. Prove that

Why does this make sense?
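For the fair coin, the probability appearing in the weak law of large numbers can be computed exactly from the binomial density, which makes the convergence visible without simulation. The sketch below is an illustration under stated assumptions: the tolerance $\delta = 0.125$ and the values of $n$ are hypothetical choices (powers of 2, so that the comparisons are exact in floating point), and the bound $1/(4n\delta^2)$ is what Chebyshev's inequality gives when the variance of one flip is $1/4$.

```python
import math


def deviation_prob(n, delta, p=0.5):
    """Exact P{|S_n / n - p| >= delta} when S_n counts heads in n independent flips."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) >= delta:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


delta = 0.125
sizes = (16, 64, 256)
probs = [deviation_prob(n, delta) for n in sizes]
bounds = [0.25 / (n * delta**2) for n in sizes]  # Chebyshev bound sigma^2 / (n delta^2)

for n, prob, bound in zip(sizes, probs, bounds):
    print(n, prob, bound)
```

The exact probabilities shrink much faster than the Chebyshev bounds, but the bounds alone already tend to 0, which is all that the proof of the weak law requires.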

4. In this project we outline an example, due to Charles Peskin, which
illustrates why some biological systems are stochastic rather than deterministic.
Suppose that it is the job of a group of 100 neurons to transmit a real
number $p$ satisfying $0 < p < 1$. One can think of $p$ as the scaled intensity
of some variable. Each neuron in our group either fires (produces a 1) or
doesn't fire (produces a 0).

The deterministic scheme. The number $p$ has a binary expansion
$p = .n_1 n_2 n_3 \ldots$ where each $n_i$ is either 0 or 1. We order the cells from 1 to 100.
The $k$th cell senses the $k$th binary digit of $p$ and fires if and only if $n_k = 1$.
Thus, the output of our group of 100 neurons that is read at a higher level
is a string of 0's and 1's giving the first 100 binary digits of $p$.

The stochastic scheme. Each neuron fires independently with probability
$p$. The output that is sensed at a higher level is the average output of the
group of neurons, that is, the number firing divided by the total number
of neurons.
(a) Compare and contrast the accuracy of the two schemes. Hint: for
the stochastic scheme, use Chebyshev's inequality.
(b) Compare and contrast the simplicity of the two schemes.
(c) Compare and contrast the stability of the two schemes. Hint: suppose
that a particular neuron dies.
(d) In each scheme how difficult would it be to improve accuracy and
stability if there were selective pressure to do so?
Bibliography

[1] Bartle, R., and D. Sherbert, Introduction to Real Analysis, 2nd ed., John
Wiley & Sons, Inc., New York, 1992.

[2] Birkhoff, G., and G.-C. Rota, Ordinary Differential Equations, 4th ed.,
John Wiley & Sons, Inc., New York, 1989.

[3] Bottazzini, U., The Higher Calculus: A History of Real and Complex
Analysis from Euler to Weierstrass, Springer-Verlag, New York, 1986.

[4] Braun, M., Differential Equations and Their Applications, 4th ed.,
Springer-Verlag, New York, 1993.

[5] Burden, R., and J. Faires, Numerical Analysis, 5th ed., PWS-Kent,
Boston, 1993.

[6] Churchill, R., and J. Brown, Complex Variables and Applications, 4th
ed., McGraw-Hill, Inc., New York, 1984.

[7] Coddington, E., and N. Levinson, Theory of Ordinary Differential
Equations, McGraw-Hill, Inc., New York, 1955.

[8] Devaney, R., A First Course in Chaotic Dynamical Systems,
Addison-Wesley, Reading, Mass., 1992.

[9] Devaney, R., An Introduction to Chaotic Dynamical Systems, 2nd ed.,
Addison-Wesley, Reading, Mass., 1989.

[10] Edelstein-Keshet, L., Mathematical Models in Biology, Random
House, New York, 1988.

[11] Edgar, G., Measure, Topology, and Fractal Geometry, Springer-Verlag,
New York, 1990.

[12] Edwards, C., Advanced Calculus of Several Variables, Academic Press,
New York, 1973.

[13] Edwards, C., The Historical Development of the Calculus,
Springer-Verlag, New York, 1979.

[14] Ewing, G., Calculus of Variations with Applications, W. W. Norton &
Co., Inc., New York, 1969.

[15] Fauvel, J., and J. Gray, History of Mathematics: A Reader, Macmillan
Education Ltd., London, 1987.

[16] Feller, W., An Introduction to Probability Theory and Its Applications,
Vols. I, II, John Wiley & Sons, New York, 1970, 1971.

[17] Fourier, J., La Theorie Analytique de Chaleur, Didot, Paris, 1822.

[18] Gelfand, I., and S. Fomin, Calculus of Variations, Prentice Hall,
Englewood Cliffs, NJ, 1963.

[19] Goldberg, R., Methods of Real Analysis, 2nd ed., John Wiley & Sons,
Inc., New York, 1976.

[20] Grattan-Guinness, I., The Development of the Foundations of Mathematical
Analysis from Euler to Riemann, MIT Press, Cambridge, Mass.,
1970.

[21] Hirsch, M., and S. Smale, Differential Equations, Dynamical Systems,
and Linear Algebra, Academic Press, New York, 1974.

[22] Hochstadt, H., Integral Equations, John Wiley & Sons, Inc., New
York, 1973.

[23] Hoffman, M., and J. Marsden, Basic Complex Analysis, W. H. Freeman
and Company, New York, 1987.

[24] Hoffman, M., and J. Marsden, Elementary Classical Analysis, W. H.
Freeman and Company, New York, 1987.

[25] Hoppensteadt, F., and C. Peskin, Mathematics in Medicine and the Life
Sciences, Springer-Verlag, New York, 1991.

[26] John, F., Partial Differential Equations, 4th ed., John Wiley & Sons, Inc.,
New York, 1982.

[27] Kincaid, D., and W. Cheney, Numerical Analysis: Mathematics of Scientific
Computing, Brooks/Cole, Pacific Grove, 1991.

[28] Kline, M., Mathematical Thought from Ancient to Modern Times, Oxford
University Press, New York, 1972.

[29] Korner, T., Fourier Analysis, Cambridge University Press, Cambridge,
England, 1988.

[30] Lander, E., and M. S. Waterman, Calculating the Secrets of Life, National
Academy Press, Washington, D.C., 1995.

[31] Lawler, G., Introduction to Stochastic Processes, Chapman & Hall,
1995.

[32] McEliece, R., The Theory of Information and Coding, Addison-Wesley,
Reading, Mass., 1977.

[33] Rosen, K., Elementary Number Theory and Its Applications,
Addison-Wesley, Reading, Mass., 1993.

[34] Ross, K., Elementary Analysis: The Theory of Calculus, Springer-Verlag,
New York, 1980.

[35] Ross, S., A First Course in Probability Theory, 4th ed., MacMillan, New
York, 1994.

[36] Rudin, W., Principles of Mathematical Analysis, 3rd ed., McGraw-Hill
Inc., New York, 1976.

[37] Scharlau, W., and H. Opolka, From Fermat to Minkowski: Lectures on
the Theory of Numbers and Its Historical Development, Springer-Verlag,
New York, 1985.

[38] Snell, J. L., Topics in Contemporary Probability and Applications, CRC
Press, Boca Raton, Fla., 1995.

[39] Steele, J., "Le Cam's Inequality and Poisson Approximation,"
Mathematical Monthly, 91(1994), pp. 116-123.

[40] Stillwell, J., The Geometry of Surfaces, Springer-Verlag, New York,
1992.

[41] Strauss, W., Partial Differential Equations, An Introduction, John Wiley
& Sons, Inc., New York, 1992.

[42] van Lint, J., Introduction to Coding Theory, Springer-Verlag, New
York, 1982.
Symbol Index

$\{a_n\}$    sequence    27
$a_{n_k}$    subsequence    55
$C(D)$    continuous functions on $D$    76
$C[a,b]$, $C(\mathbb{R})$    continuous functions    126
$C^{(n)}[a,b]$, $C^{(n)}(\mathbb{R})$    $n$ times continuously differentiable    127
$C^{(\infty)}[a,b]$, $C^{(\infty)}(\mathbb{R})$    infinitely often continuously differentiable    127
$C([a,b] : \mathbb{R}^2)$    continuous functions with values in $\mathbb{R}^2$    182
$C_0(\mathbb{R})$    continuous functions going to 0 at $\pm\infty$    182
$C_b(\mathbb{R})$    bounded continuous functions on $\mathbb{R}$    181
$\mathbb{C}$    the complex numbers    252
Dom($f$)    domain of $f$    8
$\hat{f}$    Fourier transform    306
$f^{-1}$    inverse function    11
$f \circ g$    composition of functions    12
$f'$    first derivative of $f$    120
$f^{(n)}(x)$    $n$th derivative of $f$    127
inf    infimum    53, 81
$\lim_{n \to \infty}$    limit as $n \to \infty$    29
$\lim_{x \to c}$    limit of $f$ at $c$    78
$\lim_{x \nearrow x_0}$    limit from the left    108
$\lim_{x \searrow x_0}$    limit from the right    108
$\liminf_{n \to \infty}$    limit inferior as $n \to \infty$    224
$\limsup_{n \to \infty}$    limit superior as $n \to \infty$    223
$\ln x$    natural logarithm    137
$\ell_\infty$    the space $\ell_\infty$    213
$\ell_2$    the space $\ell_2$    357
$L_P(f)$    lower sum    87
$\mathbb{N}$    natural numbers    6
$P$    prime numbers    16, 264
$P\{a < X < b\}$    probability $X$ is between $a$ and $b$    361, 376
$P\{X \in A\}$    probability $X$ lies in $A$    360
$\mathcal{P}$    polynomials    14
$\mathbb{Q}$    rational numbers    7
$\mathbb{R}$    real numbers    1
$\mathbb{R}^2$    Euclidean plane    7, 40
$\mathbb{R}^n$    $n$ dimensional Euclidean space    157, 211
$R$    radius of convergence    245
Ran($f$)    range of $f$    8
$S^c$    complement of $S$    7
$\setminus S$    complement of $S$    7
sup    supremum    53, 81
$T^{(n)}(x, x_0)$    $n$th Taylor polynomial    135
$U_P(f)$    upper sum    88
$\mathbb{Z}$    integers    7
$\mathbb{Z}_2$    integers modulo 2    5
$\in$    is contained in (a set)    6
$\notin$    is not contained in (a set)    7
$\pi(n)$    number of primes $\le n$    265
$\rho$    metric    196
$\rho_2$    Euclidean metric on $\mathbb{R}^n$    196
$\rho_v$    variational metric    389
$\emptyset$    empty set    7
$\int_a^b f(x)\,dx$    Riemann integral    89
$\int_C f(z)\,dz$    integral on a contour $C$ in $\mathbb{C}$    308
$<, \le, >, \ge$    order relations    2
$\ne$    not equal to
$\cup$    union    7
$\cap$    intersection    7
$\equiv$    is equivalent to    3
$\to$    function $f$    8
$\to$    converges to (numbers)    29
$\to$    converges to (functions)    168
$\times$    Cartesian product    7
$|\cdot|$    absolute value    3, 254
$\|\cdot\|$    norm    212
$\|\cdot\|_p$    $L^p$ norm    179
$\|\cdot\|_\infty$    sup norm    175
$\sim$    Fourier series of a function    333
$\approx$    approximately equal to    134
□    beginning of theorem, end of proof    3
Index

absolute convergence, 230, 255, 263 sup norm, 177


absolute value, 3, 254 Cauchy-Riemann equations, 301
alternating series, 270, 237 Cauchy-Schwarz ineq., 218, 346
analytic function, 299 Cauchy's integral formula, 314
Archimedian property, 4 Cauchy's theorem, 313
arc length, 119, 307 center of mass, 119
area, 93 chain rule, 125
argument change of variables, 149,150
of a complex number, 258 Chebyshev's inequality, 399
of a function, 8 closed
axiom of completeness, 48 contour, 312
disk, 256
interval, 6
Banach, S., 213 set, 157
Banach space, 213 code words, 369
coding theory, 368 - 375
Bernoulli, J., 189
compact sets, 154
Bernoulli random variable, 361
Bessel function, 271 comparison test, 232
Bessel's inequality, 349, 350 complement, 7
binary addition, 5 completeness
binomial random variable, 362 axiom of, 48
bisection method, 146 too, 213
Bolzano-Weierstrass theorem, 57 t2, 357
boundary conditions, 220, 324 metric spaces, 204
bounded normed linear spaces, 213
function, 80 real numbers, 4, 47
sequence, 35, 59 sup norm, 178
brachistochrone problem, 189, 220 complex conjugate, 254
complex numbers, 252
composition of functions, 12
calculus of variations, 188-195 conditional convergence, 231
Cantor, G., 19 conjugate, 254
Cantor function, 382 continuous function
Cantor set, 381 at a point, 73, 77,152
cardinality, 15 on C, 304
Cartesian product, 7 complex-valued, 305
Cauchy, A.-L., 45 Lipschitz, 85
Cauchy sequence piecewise, 109
complex numbers, 255 uniformly, 87,153
metric spaces, 203 continuously differentiable
normed-linear spaces, 213 complex-valued, 305
real numbers, 45 real-valued, 126,155

continuous random variable, 281 metrics, 201, 203


contour, 312 norms, 217
integration, 308 Euclid, 22
contraction, 205 Euclidean
Contraction Mapping Principle, 205 distance, 40,151, 212
contradiction, 22 metric, 197
contrapositive proof, 21 plane, 7, 40,151
convergence, Euler, L., 191
infinite products, 260, 263 Euler equation, 191
mean-square sense, 346 Euler's method, 289 - 296
in a metric space, 201 expected value, 363, 377
in norm, 213 exponential function, 249, 257
pointwise, 164 exponential random variable, 378
sequences, 29,152, 254 extremal, 191
series, 228, 230, 255
in sup norm, 177
uniform, 167, 256 Fibonacci sequence, 68
converse, 21 field, 1
countable, 16 finite, 15
counterexample, 25 finite dimensional, 213
cumulative dist. function, 379 finite expectation, 363, 377
joint, 386 first-order method, 97, 292
fixed point, 65, 205, 206
Fourier, J. 326
decimal expansion, 18, 46, 272 Fourier series, 323 - 357
dense, 46, 208 coefficients, 333, 340
density function, 376 coefficients, generalized, 348
derivative, 121,154 mean-square conv., 345 - 354
differentiable, 121,155 pointwise conv., 337 - 345
differential equations, 273 - 322 Fourier transform, 306, 318
comparison of solutions, 286 function, 8
continuous dependence, 280 analytic, 299
global existence, 283 - 289 composition, 12
local existence, 275 inverse, 11
diffusion equation, 329 one-to-one, 11
dimension, 217 onto, 11
directional derivative, 156 product, 12
direct proof, 21 sum, 12
Dirichlet, A. 343 of two variables, 151 - 159
functional, 189
discontinuous, 103
discrete Fundamental Theorem
of algebra, 321
distribution function, 380
metric, 202 of arithmetic, 16
random variable, 366 of calculus, 131,132
distribution function, 379
diverges, 32, 228
DNA, 199, 221,389 Gauss, C., 268
domain, 8 generalized Fourier coef., 348
Du-Bois Raymond, R, 343 genetics, 67
geodesic, 199
geometric series, 229
Einstein, A., 329 Gibbs, J. 342
equivalent Gibbs phenomenon, 342
logically, 21 global solution, 284

greatest lower bound, 53 Le Cam's inequality, 394


length, 119, 307
limit
Hadamard, 268 from the left, right, 108
half-open interval, 6 limit inferior, 224
Hamiltonian, 158 limit superior, 223
Hamiltonian system, 158 in a metric space, 201
Hamming, R., 370 sequence, 29
harmonic series, 231 limit point, 56, 202
heat equation, 324 linearly independent, 217
Hilbert, D., 357 linear transformation, 215, 319
Hilbert space, 357 Liouville's theorem, 320
l'Hospital's rule, 138,140 Lipschitz continuous, 85
hypotheses, 20 local existence, 275
logarithm, 137,151
lower sum, 87
if and only if, 21
imaginary part, 253, 301
improper Riemann int., 113,115 Maclaurin series, 245
independent random var., 361, 383 Markov chain, 40 - 44
induction, 23 Markov's inequality, 399
infimum mass density, 360
function, 81 maximal element, 52
set, 53 mean, 363, 377
infinite, 15 mean-square convergence, 345 - 354
infinite products, 260 - 270 Mean Value Theorem, 130
information rate, 370 measure zero, 345
initial conditions, 274, 324 methods of proof, 20 - 25
injective, 11 metric, 196
inner product, 345, 356 discrete, 202
inner product space, 356 equivalent, 201
integers, 6 Euclidean, 197
integral equations, 183-188 geodesic, 199
integral on paths, 309 uniformly equivalent, 203
Intermediate Value Theorem, 82 variation, 389
interval, 6 metric space, 196
invariant probabilities, 43 midpoint rule, 100
inverse function, 11,148 monotone
irrational numbers, 6, 46 function, 103
sequence, 48
V2,22
e, 25 strictly, 147
isometry, 208
isomorphism, 253
natural logarithm, 137,151
joint density, 386 natural numbers, 6
jump discontinuity, 108 Newton's method, 140 - 147
nondifferentiable function, 241
kernel, 219 norm, 212
equivalent, 217
Laplace transform, 174 Euclidean, 212
Laplace's equation, 311 IT,179
sup, 175
lateral inhibition, 210
least upper bound, 52 normal random variable, 378

normed linear spaces, 210 - 219 quadratic map, 60 - 67


complete, 213
numerical methods
differential equations, 281 - 296 radius of convergence, 245, 256
integrals, 95-103 random variables, 259
continuous, 281
discrete, 360
one-to-one independent, 361, 386
correspondence, 15 range, 8
function, 11 rational numbers, 6, 46
onto, 11 ratio test, 233
open real numbers, 1
disk, 256 real part, 253, 300
interval, 6 Riemann, G., 89
set, 299 Riemann integral, 89,161
orbit, 60,158, 298
improper, 113,115
order
numerical method, 97, 292 Riemann-Lebesgue lemma, 338
relation, 2 Riemann sum, 91
ordered field, 6 Riemann zeta func., 244, 269, 320
orthogonal, 347, 357 Rolle's theorem, 130
orthonormal basis, 357 roots of unity, 258
orthonormal family, 347 root test, 233

parabolic equations, 329 sample space, 359


parametrization of curves, 307 separation of variables, 159, 325
sequence,
Parseval's relation, 352
partial derivative, 154 functions, 163 - 221
numbers, 27 - 71, 254
partial sum, 228
subsequence, 55
partition, 87
series
perfect code, 374 absolute convergence, 230, 255
period two, 65 alternating, 237, 270
piecewise constant, 343 conditional convergence, 231
piecewise continuous, 109 convergence, 228, 230, 255
piecewise smooth, 128 divergence, 228
pointwise convergence Fourier, 323 - 357
Fourier series, 337 - 345 functions, 238
functions, 164 geometric, 229
Poisson random variable, 363 harmonic, 231
polar form, 258 rearrangement, 234
power series, 245, 256 shortest distance, 188
complex variables, 256, 317 simple contour, 312
radius of convergence, 245, 256 Simpson's rule, 102
prime numbers, 16, 264 smooth curve, 307
Prime Number Theorem, 268, 318 Sobolev inequalities, 354
probability density function, 376 stable fixed point, 206
product of functions, 12 standard deviation, 377, 398
proof standard normal, 378
by contradiction, 22 strictly contained in, 7
contrapositive, 21 strictly monotone, 147
direct, 21 subfield, 253
by induction, 23 subsequence, 55

subset, 7 uncountable, 19
sum uniform convergence
of functions, 12
sequences, 167
lower, 87
upper, 88 series, 238, 256
supremum uniformly continuous, 84,153
function, 81 uniformly equivalent metrics, 203
norm, 175 uniform random variable, 377
set, 53 upper bound, 52
surjective, 11 upper sum, 88

Taylor polynomial, 135 de la Valleee Poussin, C., 268


Taylor series, 245 value of a function, 8
Taylor's theorem, 136 variables separable equation, 159
ternary expansion, 382 variance, 377, 398
variation metric, 389
trapezoid rule, 100,120
vector space, 210
triangle inequality
Volterra integral equation, 187
complex numbers, 254
metric spaces, 196 wave equation, 159, 355
norms, 212 weak law of large numbers, 400
in the plane, 40
Weierstrass M-test, 237
real numbers, 4
sup norm, 176
trigonometric functions, 249 zeta function, 244, 269, 320


New York
Chichester
Weinheim
Brisbane
Singapore
Toronto

http://www.wiley.com/college

ISBN 0-471-15996-4
