
Shanghai Lectures on Multivariable Analysis

William G. Faris

October 18, 2016


Contents

1 Differentiation 1
1.1 Fixed point iteration (single variable) . . . . . . . . . . . . . . . 2
1.2 The implicit function theorem (single variable) . . . . . . . . . . 5
1.3 Linear algebra review (norms) . . . . . . . . . . . . . . . . . . . . 7
1.4 Linear algebra review (eigenvalues) . . . . . . . . . . . . . . . . . 12
1.5 Differentiation (multivariable) . . . . . . . . . . . . . . . . . . . . 16
1.6 Fixed point iteration (multivariable) . . . . . . . . . . . . . . . . 26
1.7 The implicit function theorem (multivariable) . . . . . . . . . . . 28
1.8 Second order partial derivatives . . . . . . . . . . . . . . . . . . . 32
1.9 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2 Integration 43
2.1 The Riemann integral . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 Jordan content . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3 Approximation of Riemann integrals . . . . . . . . . . . . . . . . 48
2.4 Fubini’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.5 Uniform convergence . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.6 Dominated convergence . . . . . . . . . . . . . . . . . . . . . . . 55
2.7 Differentiating a parameterized integral . . . . . . . . . . . . . . 58
2.8 Approximate delta functions . . . . . . . . . . . . . . . . . . . . . 60
2.9 Linear algebra (determinants) . . . . . . . . . . . . . . . . . . . . 61
2.10 Change of variables . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.11 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3 Differential Forms 71
3.1 Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2 Scalar fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.4 Fluid velocity and the advective derivative . . . . . . . . . . . . . 78
3.5 Differential 1-forms . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.6 Polar coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.7 Integrating factors and canonical forms . . . . . . . . . . . . . . . 84
3.8 The second differential . . . . . . . . . . . . . . . . . . . . . . . . 86
3.9 Regular surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 88


3.10 Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . 91


3.11 Differential k-forms . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.12 The exterior derivative . . . . . . . . . . . . . . . . . . . . . . . . 94
3.13 The Poincaré lemma . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.14 Substitution and pullback . . . . . . . . . . . . . . . . . . . . . . 100
3.15 Pullback of a differential form . . . . . . . . . . . . . . . . . . . . 102
3.16 Pushforward of a vector field . . . . . . . . . . . . . . . . . . . . 104
3.17 Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.18 Integration of top-dimension differential forms . . . . . . . . . . . 108
3.19 Integration of forms over singular surfaces . . . . . . . . . . . . . 109
3.20 Stokes’ theorem for chains of singular surfaces . . . . . . . . . . . 110
3.21 Classical versions of Stokes’ theorem . . . . . . . . . . . . . . . . 113
3.22 Picturing Stokes’ theorem . . . . . . . . . . . . . . . . . . . . . . 115
3.23 Electric and magnetic fields . . . . . . . . . . . . . . . . . . . . . 119

4 The Metric Tensor 127


4.1 The interior product . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.2 Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.3 The divergence theorem . . . . . . . . . . . . . . . . . . . . . . . 130
4.4 Conservation laws . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.5 The metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.6 Twisted forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
4.7 The gradient and divergence and the Laplace operator . . . . . . 139
4.8 Orthogonal coordinates and normalized bases . . . . . . . . . . . 141
4.9 Linear algebra (the Levi-Civita permutation symbol) . . . . . . . 144
4.10 Linear algebra (volume and area) . . . . . . . . . . . . . . . . . . 145
4.11 Surface area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

5 Measure Zero 159


5.1 Outer content and outer measure . . . . . . . . . . . . . . . . . . 160
5.2 The set of discontinuity of a function . . . . . . . . . . . . . . . . 162
5.3 Lebesgue’s theorem on Riemann integrability . . . . . . . . . . . 163
5.4 Almost everywhere . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.5 Mapping sets of measure zero . . . . . . . . . . . . . . . . . . . . 166
5.6 Sard’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.7 Change of variables . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.8 Fiber integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.9 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.10 The co-area formula . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.11 Linear algebra (block matrices) . . . . . . . . . . . . . . . . . . . 175

Mathematical Notation 179


Preface

The book
This book originated in lectures given in Fall 2014 at NYU Shanghai for an
advanced undergraduate course in multivariable analysis. There are chapters
on Differentiation, Integration, Differential Forms, The Metric Tensor, together
with an optional chapter on Measure Zero. The topics are standard, but the
attempt is to present ideas that are often overlooked in this context. The fol-
lowing chapter by chapter summary sketches the approach. The explanations in
the summary are far from complete; they are only intended to highlight points
that are explained in the body of the book.

Differentiation
The main themes of this chapter are standard.

• It begins with fixed point iteration, the basis of the subsequent proofs.

• The central object in this chapter is a smooth (that is, sufficiently differen-
tiable) numerical function f defined on an open subset U of Rk with values
in Rn . Here k is the domain dimension and n is the target dimension. The
function is called “numerical” to emphasize that both inputs and outputs
involve numbers. (See below for other kinds of functions.) Sometimes a
numerical function is denoted by an expression like y ↦ f (y). The choice
of the variable name y is arbitrary.

• When k < n a numerical function f from an open subset U of Rk to Rn
can give an explicit (parametric) representation of a k-dimensional surface
in Rn .

• When k < n a numerical function g from an open subset W of Rn to
Rn−k can give an implicit representation of a family of k-dimensional
surfaces in W .

• The implicit function theorem gives conditions for when an implicit rep-
resentation gives rise to an explicit representation.


• When k = n the function f defines a transformation from an open subset
U of Rn to an open set V in Rn .

• The inverse function theorem gives conditions that ensure that the trans-
formation has an inverse transformation.

• In the passive interpretation the smooth transformation f : U → V has
a smooth inverse: the points in U and the points in V give alternative
numerical descriptions of the same situation.

• In the active interpretation the smooth transformation f : U → U describes
a change of state: the state described by y in U is mapped into
the new state described by f (y) in U . The transformation may be iterated.

Integration
This chapter is about the Riemann integral for functions of several variables.
There are several interesting results.

• The Fubini theorem for Riemann integrals deals with iterated integrals.

• The dominated convergence theorem for Riemann integrals is a result
about pointwise convergence. The setting is a sequence of Riemann in-
tegrable functions defined on a fixed bounded set with a common bound
on their values. The sequence of functions is assumed to converge (in
some sense) to another Riemann integrable function. It is elementary to
prove that if the functions converge uniformly, then the integrals converge.
The dominated convergence theorem says that if the functions converge
pointwise, then the integrals converge. The direct proof of the dominated
convergence involves a somewhat complicated construction, but the result
is spectacularly simple and useful.
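The contrast between uniform and pointwise convergence can be seen in a small numerical experiment. The following sketch (an illustration of ours, not taken from the book) uses the sequence fn(x) = x^n on [0, 1]: the convergence is pointwise but not uniform, the common bound on the values is 1, and the integrals 1/(n + 1) nevertheless converge to the integral of the limit function.

```python
# A numerical illustration of dominated convergence for Riemann integrals:
# f_n(x) = x**n on [0, 1] converges pointwise (not uniformly) to 0 on [0, 1),
# with the common bound 1, and the integrals 1/(n+1) tend to 0.

def riemann_integral(f, a, b, steps=100_000):
    """Midpoint Riemann sum approximation of the integral of f on [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

integrals = [riemann_integral(lambda x, n=n: x**n, 0.0, 1.0) for n in (1, 10, 100)]
# Exact values are 1/2, 1/11, 1/101; the sequence tends to 0 even though
# sup |f_n| = 1 on [0, 1] for every n, so the convergence is not uniform.
print(integrals)
```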

• There is a treatment of approximate delta functions.

• The change of variables formula has an elegant proof via approximate
delta functions and the dominated convergence theorem.

Differential Forms
This chapter and the next are the heart of the book. The central idea is ge-
ometrical: differential forms are intrinsic expressions of change, and thus the
basic mechanism of calculation with differential forms is equally simple for ev-
ery possible choice of coordinate system. (Of course for modeling a particular
system one coordinate system may be more convenient than another.) The fun-
damental result is Stokes’ theorem, which is the natural generalization of the
fundamental theorem of calculus. Both the fundamental theorem in one dimen-
sion and Stokes’ theorem in higher dimensions make no reference to notions of
length and area; they simply describe the cumulative effect of small changes.

Because of this intrinsic nature, the expression of Stokes' theorem is the same
in every coordinate system.

• Example: Here is a simple example from physics that illustrates how nat-
ural it is to have a free choice of coordinate system. Consider an ideal gas
with pressure P , volume V , and temperature T . The number of gas par-
ticles N is assumed constant. Each of the quantities P, V, T is a function
of the state of the gas; these functions are related by the ideal gas law

P V = N kT.

(The constant k transforms temperature units to energy units.) The dif-
ferential form of this relation is

P dV + V dP = N k dT.

This equation is a precise description of how these variables change for
a small change in the state of the gas. A typical use of the fundamental
theorem of calculus is the calculation of work done by the system during
a change of state where the temperature is constant. This is obtained by
integrating −P dV along states of constant temperature T = T0 . Suppose
that in this process the volume changes from V0 to V1 and the pressure
changes from P0 to P1 . Then

−P dV = −N kT0 dV /V = −N kT0 d log(V )

has integral N kT0 log(V0 /V1 ). But nothing depends on using volume as
an independent variable. On the curve where T = T0 there is a relation
P dV + V dP = 0. So on this curve the form is also equal to

V dP = N kT0 dP/P = N kT0 d log(P )

with integral N kT0 log(P1 /P0 ). Since P0 V0 = P1 V1 = N kT0 , this is the
same result.
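The two computations of the work integral above can be checked numerically. The sketch below is our illustration, in units chosen so that N kT0 = 1 (so P V = 1 on the isotherm); it integrates the work form −P dV using first V and then P as the independent variable.

```python
import math

# Numerical check of the ideal gas computation, in units where N*k*T0 = 1,
# so that P*V = 1 on the isotherm T = T0. The work form -P dV is integrated
# with V and then with P as the independent variable; the answers must agree.

def integrate(f, a, b, steps=100_000):
    """Midpoint rule; works for b < a as well (the integral reverses sign)."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

NkT0 = 1.0
V0, V1 = 1.0, 2.0
P0, P1 = NkT0 / V0, NkT0 / V1

# -P dV = -(NkT0/V) dV, integrated from V0 to V1.
work_V = integrate(lambda V: -NkT0 / V, V0, V1)
# On the isotherm P dV + V dP = 0, so -P dV = V dP = (NkT0/P) dP.
work_P = integrate(lambda P: NkT0 / P, P0, P1)

print(work_V, work_P)  # both equal NkT0 * log(V0/V1) = -log 2 = -0.6931...
```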
• While many applications of differential forms have nothing to do with
length or area, others make essential use of these ideas. For instance, the
length of a curve in the plane is obtained by integrating
√((dx/dt)² + (dy/dt)²) dt = √((dx/du)² + (dy/du)²) du.

The motion along the curve may be described either by the t coordinate or
the u coordinate, but the length of the curve between two points is a num-
ber that does not depend on this choice. In practice such integrals involv-
ing square roots of sums of squares can be awkward. Sometimes it helps
to use a different coordinate system in the plane. For instance, in polar
coordinates the same form has the expression √((dr/dt)² + r²(dθ/dt)²) dt.
The theory of differential forms gives a systematic way of dealing with all
such coordinate changes.
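The coordinate independence of arc length is easy to test numerically. In the sketch below (our example, not the book's) the spiral r = t, θ = t is measured once in Cartesian coordinates and once in polar coordinates; both integrands reduce to √(1 + t²) dt, so the two lengths must agree.

```python
import math

# Arc length of the spiral r = t, theta = t for t in [0, 1], computed in
# Cartesian coordinates (x = t cos t, y = t sin t) and in polar coordinates.

def integrate(f, a, b, steps=100_000):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def cartesian_speed(t):
    dxdt = math.cos(t) - t * math.sin(t)
    dydt = math.sin(t) + t * math.cos(t)
    return math.hypot(dxdt, dydt)          # sqrt((dx/dt)^2 + (dy/dt)^2)

def polar_speed(t):
    drdt, r, dthdt = 1.0, t, 1.0
    return math.sqrt(drdt**2 + r**2 * dthdt**2)

len_cart = integrate(cartesian_speed, 0.0, 1.0)
len_polar = integrate(polar_speed, 0.0, 1.0)
print(len_cart, len_polar)  # equal up to numerical error
```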

• The chapter introduces a fundamental organizing principle of mathematical
modeling. It applies in geometry and in most applications of mathe-
matics, and it deserves a serious discussion. In the approach of this book
it is based on the elementary concept of an n-dimensional manifold patch.
This term is used here for a differentiable manifold M modeled on some
open subset of Rn . In more detail, M is a set, and there are certain
one-to-one functions from M onto open subsets of Rn called coordinate
systems. Suppose u = (u1 , . . . , un ) : M → U is a coordinate system, and
suppose that w = (w1 , . . . , wn ) : M → W is another coordinate system,
with u = g(w). It is required that g : W → U be a smooth numerical
function with smooth inverse. There is no preferred coordinate system.
The modeling process may employ several coordinate systems in the same
discussion.

• A scalar field is a smooth function s : M → R from the manifold patch
to the reals. If u = (u1 , . . . , un ) : M → U ⊆ Rn is a coordinate system, then
s = f (u) = f (u1 , . . . , un ) for some smooth numerical function f : U → R.
A scalar field is usually pictured in terms of its contour curves (or contour
surfaces).

• Example: There are many examples of the modeling process from subjects
as varied as physics and economics. There are also examples internal to
mathematics, especially in geometry. Here is one that might occur in
elementary geometry. Let M be the set of geometric rectangles in the
plane. (The rectangles have a common corner and common alignment.)
Each rectangle has a length ` and a width w. So ` : M → R and w :
M → R are both scalar fields. Together (`, w) form a coordinate system
that maps M onto an open quadrant in R2 . This coordinate system is
useful for describing the problem of painting a rectangular surface with
given dimensions. The product A = `w is also a scalar, the area of the
rectangle. The pair (A, w) forms another coordinate system. The new
system is useful for painting a rectangular surface with a given width
using a fixed amount of paint. The numerical function that relates these
two coordinate systems is (x, y) ↦ (xy, y) with inverse (s, t) ↦ (s/t, t).
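The two coordinate systems in the rectangle example can be sketched directly (illustrative code of ours, not from the book); the change of coordinates (x, y) → (xy, y) and its inverse (s, t) → (s/t, t) compose to the identity on the open quadrant.

```python
# The two coordinate systems on the manifold patch of rectangles:
# (length, width) and (area, width). The change of coordinates is
# (x, y) -> (x*y, y) with inverse (s, t) -> (s/t, t).

def to_area_width(length, width):
    return (length * width, width)

def to_length_width(area, width):
    return (area / width, width)

rect = (3.0, 2.0)                               # length 3, width 2
print(to_area_width(*rect))                     # (6.0, 2.0)
print(to_length_width(*to_area_width(*rect)))   # back to (3.0, 2.0)
```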

• The concept of scalar field leads to the concept of exact differential 1-form.
If s is a scalar field, the corresponding exact differential 1-form is written
ds. A general differential 1-form is a linear combination of products of
scalar fields with exact differential 1-forms. These concepts have precise
definitions that are given in the book.

• Example (continued): In the example the differential of area A is the exact
differential 1-form dA given by

dA = w d` + ` dw.

It is the sum of two differential forms neither of which is exact.



• There is another mathematical object that is easy to confuse with a differential
1-form. This is a vector field, often pictured as a field of arrows.
The book includes a full discussion of vector fields. For the present discus-
sion it is only necessary to point out that a vector field is a very different
object from a differential 1-form. For instance, near every point for which
the vector field is non-zero it is possible to find a new coordinate system
with respect to which the vector field is constant. This is not true for
differential forms: the form ` dw is already an example. It makes sense
to say that a differential form is exact or not exact. For vector fields any
such notion depends on a choice of metric. Vector fields are easy to picture
with arrows. Differential forms are pictured in a quite different way, as
explained in the book.
• How does one picture a differential 1-form? For an exact form ds there is
a straightforward method: For each point, zoom in on contour curves (or
contour surfaces) of s until they look linear near the point. In the example
this gives nice pictures for d ` and d w. What about a form like ` dw that
is not exact? The book describes how to draw the picture in this case.
• Example (continued): Scalar fields are fundamental to mathematical mod-
eling, both in geometry and in applied mathematics. The letters that are
used for the scalars are meaningful, as they describe the actual objects that
are being modeled. Thus in the example A is “area” and w is “width”.
Let c > 0, and let Mc be the subset of rectangles for which ` + w = 2c.
On Mc there is a relation d` + dw = 0. A standard problem is to solve
for the point in Mc where dA = ` dw + w d` = 0. The answer is obtained
by eliminating d` and dw from these two equations. The solution to the
problem is not numerical; it is the square where ` = w = c, that is, a point
in the manifold patch M .
• Here is another important distinction. A passive transformation is a
change of coordinates. It gives an alternative description of the same situ-
ation. In the example M may be described equally well in the coordinate
system `, w or in the coordinate system A, w. A passive transformation
is always invertible. An active transformation is a change in the object
being modeled. A simple example is doubling the width of the rectangle
and then flipping it across the diagonal. This transformation is described
by ` ← 2w, w ← `. The effect on the area is to double it:

A(` ← 2w, w ← `) = 2A.

This is a rigorous equality of scalars: the left hand side is the composition
of the scalar function A : M → R with the active transformation ` ←
2w, w ← ` : M → M . An active transformation need not have an inverse:
the transformation ` ← √(`w), w ← √(`w) maps rectangles to squares.
The distinction between passive and active transformations is widely rec-
ognized but not always made explicit. (Some authors use terminology in
which “passive” and “active” are replaced by the awkward terms “alias”
and “alibi”.)
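The active transformation in the example can be sketched as a simultaneous assignment (our illustration), together with a check of the scalar identity A(ℓ ← 2w, w ← ℓ) = 2A.

```python
# The active transformation l <- 2w, w <- l from the example, written as a
# simultaneous assignment on (length, width) pairs. Doubling the width and
# flipping across the diagonal doubles the area.

def area(length, width):
    return length * width

def double_and_flip(length, width):
    # simultaneous assignment: length <- 2*width, width <- length
    return (2 * width, length)

rect = (3.0, 5.0)
print(area(*rect), area(*double_and_flip(*rect)))  # 15.0 30.0
```

Returning a fresh tuple is what makes the assignment simultaneous: both new coordinates are computed from the old state before either is stored, exactly as in the left-arrow notation.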
• Given two manifold patches K and M there is a concept of manifold
map from K to M . The treatment in the book gives a practical notation
that may be used in computations. Suppose K has coordinate system
t = (t1 , . . . , tk ) and M has coordinate system u = (u1 , . . . , un ). Then
there is a smooth numerical function f such that the manifold map may
be represented in the form u ← f (t). What this means is: take the input
point in K, find its coordinate values using t, apply the function f to
these values to get new values, and finally define the output point in M to
be the point that has these values as u coordinates. The notation for an
active transformation is the special case when K = M and there is only
one coordinate system, so the transformation is u ← f (u).
• The left arrow notation for an active transformation is in perfect analogy
with the assignment notation used in computer science. In that case, a
(simultaneous) assignment means to start with the machine state, find
the values of the u variables, do the computation indicated by f (u) and
store the result in the locations indicated by the u variables, thus produc-
ing a new machine state. (In computer science and in mathematics the
equal sign is often used to indicate assignment. Since assignment is not
symmetric, this is a clash of notation.)
• There is a general concept of pullback of a scalar field s or a differential
form by a manifold map. In the case of a scalar field this is just composi-
tion. That is, if the scalar field

s = g(u) : M → R

is composed with the manifold map u ← f (t) : K → M , then the result is

s(u ← f (t)) = g(f (t)) : K → R.

The notational device in the above equation is precise and convenient.
It is also important as substantive mathematics: pullback together with
change of variable is the key to integration of differential forms.
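Pullback of a scalar field as plain composition can be sketched in a few lines (the names below are ours; the book's notation for the map is u ← f(t)).

```python
import math

# Pullback of a scalar field by a manifold map, as composition: if the
# scalar field is s = g(u) on M and the map is u <- f(t) from K to M,
# then the pulled-back scalar field on K is g(f(t)).

def pullback(g, f):
    """Compose: the scalar represented by g, pulled back through the map f."""
    return lambda *t: g(*f(*t))

# Hypothetical example: M has coordinates (r, theta), K has one coordinate t.
g = lambda r, theta: r * math.cos(theta)   # represents a scalar field on M
f = lambda t: (t, t ** 2)                  # represents the map u <- f(t)

s_pulled = pullback(g, f)
print(s_pulled(2.0))  # g(f(2.0)) = 2*cos(4.0)
```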
• There is an exterior product for differential forms, and this leads to the
general notion of differential k-form. Furthermore, the differential of a
(k − 1)-form ω is a k-form dω. The main result of the chapter is Stokes’
theorem. The simplest case of Stokes' theorem is for k = 2; in this case
it is usually known as Green’s theorem. In that special case the 1-form

ω = p du + q dv

has a differential that is the 2-form

dω = (∂q/∂u − ∂p/∂v) du dv.

If R is a two-dimensional region with boundary ∂R, Green’s theorem states that

∫_R (∂q/∂u − ∂p/∂v) du dv = ∫_∂R (p du + q dv).

The intuitive meaning is that circulation within the region produces
change along the boundary. Stokes’ theorem is the corresponding gen-
eral result that works in any number of dimensions, again for arbitrary
coordinate systems.
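Green's theorem can be tested numerically. The sketch below (our example) takes ω = −v du + u dv on the unit square, for which dω = 2 du dv, and compares the double integral over the square with the counterclockwise line integral around its boundary.

```python
# Numerical check of Green's theorem on the unit square R = [0,1]^2 for the
# 1-form omega = -v du + u dv, whose differential is d(omega) = 2 du dv.

def integrate(f, a, b, steps=10_000):
    """Midpoint rule; works for b < a as well (the integral reverses sign)."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

p = lambda u, v: -v
q = lambda u, v: u

# Left side: double integral of dq/du - dp/dv = 2 over the square.
lhs = integrate(lambda u: integrate(lambda v: 2.0, 0, 1, 200), 0, 1, 200)

# Right side: line integral of p du + q dv around the boundary,
# traversed counterclockwise, one edge at a time.
rhs = (integrate(lambda u: p(u, 0.0), 0, 1)      # bottom edge: du
       + integrate(lambda v: q(1.0, v), 0, 1)    # right edge: dv
       + integrate(lambda u: p(u, 1.0), 1, 0)    # top edge: du, reversed
       + integrate(lambda v: q(0.0, v), 1, 0))   # left edge: dv, reversed

print(lhs, rhs)  # both equal 2
```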
• What is the geometrical picture for Stokes’ theorem? It should not be
in terms of a vector field, because that is a very different kind of object
from a differential form. The treatment in the book presents reasonable
pictures, at least in low dimensions.
• The ideas are illustrated by simple computations with static electric and
magnetic fields. Physics texts often describe fields E, D, H, B. The electric
field E may be thought of as a 1-form, while the electric flux density D is
a 2-form. Similarly, the magnetic field intensity H can be a 1-form, while
the magnetic flux density B is a 2-form. The forms E and B are exact
and can be written in terms of potentials as E = −dφ and B = dA. In the
presence of charges and currents the forms D and H fail to be exact.

The metric tensor


Some texts on multivariable analysis treat differential forms, but do not ex-
plain how they relate to notions of length, area, and volume. Or sometimes
they make the connection only in Cartesian coordinates, or only in the three-
dimensional situation. But such metric notions are unavoidable; the Laplace
operator is a central object that makes full use of them.
The treatment in this book gives an account that describes the situation
in any number of dimensions. The central object is the metric tensor. This
may be regarded as a differential 1-form with very special properties that
allow it to define the notion of distance. In the simplest case (two-dimensional
flat space) it is the usual √((dx/dt)² + (dy/dt)²) dt in Cartesian coordinates
or √((dr/dt)² + r²(dθ/dt)²) dt in polar coordinates.

• Sometimes all that is needed is to specify a volume form, which is an
n-form. In two dimensional flat space this is the area 2-form dx dy = r dr dθ.
For uniformity of terminology this is called volume, more precisely, 2-dimensional
volume. The volume form in arbitrary coordinates is conventionally written
in a notation such as √g du dv, where the scalar factor √g depends on the
coordinate system. Thus in Cartesian coordinates √g = 1, while in polar
coordinates √g = r.
• The divergence theorem is a result about vector fields that depends on
knowing the volume form. A vector field, perhaps representing fluid flow,
determines a flux form, which is an (n − 1)-form that describes how the

flow crosses a surface. In the case of dimension n = 2 the vector field has
components a and b (that depend on u and v), and the corresponding flux
form is the 1-form a√g dv − b√g du. The divergence theorem then states that

∫_R (1/√g) [∂(a√g)/∂u + ∂(b√g)/∂v] √g du dv = ∫_∂R (a√g dv − b√g du).

The intuitive meaning is that mass production within the region pro-
duces flow across the boundary. The flow is described by a vector field, an
object that is easier to picture than a differential 1-form. Once the appro-
priate volume form is given, the divergence theorem may be formulated in
any number of dimensions and in arbitrary coordinates. The interpreta-
tion and use of the divergence theorem are illustrated in the book by the
solution of a conservation law.

• The definitions of gradient and Laplace operator require use of the metric
tensor. This leads to a choice of whether to use coordinate bases or or-
thonormal bases. The advantages of both appear when it is possible to use
orthogonal coordinates. The book explains how these ideas are related. In
particular, it makes the connection with treatments found in elementary
textbooks.

• The chapter concludes with formulas for surface area. A central idea is
an amazing generalization of the theorem of Pythagoras. The classical
theorem says that the length of a vector is the square root of the sum of
squares of the lengths of its projections onto the coordinate axes. One
case of the generalization says that the area of a parallelogram is the
square root of the sum of the squares of the areas of its projections onto
coordinate planes. Unfortunately, surface area calculations are not easy.

Measure zero
The notions of Riemann integral and Jordan content are almost obsolete; they
have been surpassed by the far more powerful and flexible theories of Lebesgue
integral and Lebesgue measure. The study of the Riemann integral is still useful,
if only to see how a subtle change in a definition can lead to a new theory that
improves it in every way. Roughly speaking, the Riemann integral or Jordan
content is defined by a one-step limiting process, while the Lebesgue integral or
Lebesgue measure uses a two-step limiting process. This makes all the difference.

• The first part of the chapter is a tiny portion of the Lebesgue theory that
fits in a nice way with the material presented earlier. This is the theory
of sets of Lebesgue measure zero. These sets have nice mapping proper-
ties, and they also are fundamental to the characterization of Riemann
integrable functions.

• The second topic is different. The usual surface area formula works for
an explicitly defined surface (that is, a parameterized surface). There is
another approach that works for a family of implicitly defined surfaces. For
such a family there are two associated objects. The fiber form describes
how to integrate over such a surface, and the co-area factor is a metric
quantity associated with the surface. The co-area formula states that the
surface area form is equal to the co-area factor times the fiber form. This
gives a direct way to do surface area calculations for implicitly defined
surfaces.

General references
For mathematics at this level it is helpful to see multiple approaches. The course
used Rudin [17] as an alternative text; it focused on the two chapters on Func-
tions of Several Variables and on Integration of Differential Forms. The book
by Spivak [18] gives a treatment at about the same level as Rudin. Flanders [6]
presents differential forms along with many applications. For a more advanced
version of the story, the reader may consult Barden and Thomas [2]. Perhaps
the relatively technical books by Morita [12] and by Agricola and Friedrich [1]
could be useful. The subject matter considered here overlaps with tensor anal-
ysis. The book by Lovelock and Rund [9] has a relatively traditional approach;
as a consequence one can find many useful formulas. The notes by Nelson [13]
are more abstract and sophisticated, but there is good information there too.

Acknowledgements
The students in this course were young, enthusiastic, and able; meeting them
over the course of a semester was a delight. Some of them were able to find
time to read the manuscript and make valuable comments. In particular, the
author is happy to thank Hong-Bin Chen and Xiaoyue Gong for their work on
a previous draft.
Chapter 1

Differentiation


1.1 Fixed point iteration (single variable)


The foundational results for the subject of these lectures are the implicit function
theorem and the inverse function theorem. Each of these depends on the fact
that a certain equation has a unique solution. The simplest and most general
technique for proving that an equation has a solution is fixed point iteration.
That is the topic that begins these notes.
Let X be a metric space. Let g : X → X be a function from the space to
itself. A fixed point is a point x∗ in X with g(x∗ ) = x∗ .
Take x0 in X. Define the sequence of points by iteration. That is, define xn
in X by
xn+1 = g(xn ) (1.1)
for n ≥ 0. The set of iterates starting at x0 is called the orbit of x0 .

Proposition 1.1 Suppose that g : X → X is continuous. Let xn be the sequence
of iterates starting at x0 . Suppose that xn → x∗ as n → ∞. Then x∗ is
a fixed point.

Example: As an example, take X to be the real line with g(x) = cos(x). Start
with an arbitrary real number. Then the iterates converge to a fixed point x∗
that is equal to 0.739 in the first three decimal places. This is an experiment
that is easy to do with a scientific calculator. |
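The calculator experiment can be reproduced in a few lines (a sketch of ours, not part of the text): repeatedly pressing the cosine key converges to the fixed point.

```python
import math

# Fixed point iteration x_{n+1} = cos(x_n) from an arbitrary starting point;
# the iterates converge to the fixed point x* = 0.739... of g(x) = cos(x).

x = 1.7               # any real starting point works
for _ in range(100):
    x = math.cos(x)

print(round(x, 3))    # 0.739
```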
When X is the real number line there is a lovely way of picturing the iteration
process. Consider the graph of the linear function y = x. Each iterate is
represented by a point (x, x) on this graph. Consider the graph of the given
function y = g(x). The next iterate is obtained by drawing the vertical line from
(x, x) to (x, g(x)) and then the horizontal line from (x, g(x)) to (g(x), g(x)). The
process is repeated as many times as is needed to show what is going on.
In the following we need stronger notions of continuity. A function g : X →
X is Lipschitz if there is a constant c ≥ 0 such that for all x and y in X we have

d(g(x), g(y)) ≤ cd(x, y). (1.2)

A Lipschitz function is automatically continuous; in fact, it is even uniformly
continuous. If the constant c ≤ 1 then the function is called a contraction. If
the constant c < 1, then the function is called a strict contraction.

Theorem 1.2 (Contraction mapping theorem) Let X be a complete metric
space. Let g : X → X be a function. Suppose that g is a strict contraction.
Then g has a unique fixed point x∗ . Furthermore, if x0 is in X, then the
corresponding sequence of iterates satisfies xn → x∗ as n → ∞.

Proof: There can only be one fixed point. Suppose x = g(x) and y =
g(y) are fixed points. Then d(x, y) ≤ d(x, g(x)) + d(g(x), g(y)) + d(g(y), y) =
d(g(x), g(y)) ≤ cd(x, y). Since c < 1 we must have d(x, y) = 0, so x = y. This
proves the uniqueness of the fixed point.

The existence of the fixed point is shown via iteration. Start with x0 and
define the corresponding iterates xn . Then d(xn+1 , xn ) ≤ c^n d(x1 , x0 ). Hence
for m > n

d(xm , xn ) ≤ d(xm , xm−1 ) + · · · + d(xn+1 , xn ) ≤ [c^(m−1) + · · · + c^n ] d(x1 , x0 ). (1.3)

Hence

d(xm , xn ) ≤ [c^(m−1−n) + · · · + 1] c^n d(x1 , x0 ) ≤ (1/(1 − c)) c^n d(x1 , x0 ). (1.4)
This shows that the xn form a Cauchy sequence. Since the metric space is
complete, this sequence must converge to some x∗ . This is the desired fixed
point. 
Example: Take again the example g(x) = cos(x). Let 1 ≤ r < π/2. Take the
metric space to be the closed interval X = [−r, r]. Since 1 ≤ r the function
g(x) = cos(x) maps X into itself. Furthermore, since r < π/2 there is a c < 1
such that the derivative g′(x) = − sin(x) satisfies |g′(x)| ≤ c. This is enough
to show that g is a strict contraction. So the theorem guarantees the existence
and uniqueness of the fixed point. Although there is no explicit formula for the
solution of cos(x) = x, the theorem defines this number with no ambiguity. |
The reasoning in this one-dimensional case may be formulated in a general
way. The following result gives a way of rigorously proving the existence of fixed
points. The hypothesis requires a bound on the derivative and the existence of
an approximate fixed point p.
Proposition 1.3 Let p be a real number, and consider the closed interval [p −
r, p + r] with r > 0. Suppose that |g 0 (x)| ≤ c < 1 for x in this interval. Fur-
thermore, suppose that |g(p) − p| ≤ (1 − c)r. Then g maps the interval into
itself and is a strict contraction, so it has a unique fixed point in this interval.
Furthermore, iteration starting in this interval converges to the fixed point.
Proof: It is clear that |g(x) − g(y)| ≤ c|x − y|. In order to show that g maps the interval into itself, suppose |x − p| ≤ r. Then |g(x) − p| ≤ |g(x) − g(p)| + |g(p) − p| ≤ c|x − p| + (1 − c)r ≤ r. □
Sometimes it is helpful to have a result where one knows there is a fixed
point, but wants to show that it is stable, in the sense that fixed point iteration
starting near the fixed point converges to it. The following proposition is a
variant that captures this idea.
Proposition 1.4 Let p be a fixed point of g. Suppose that g′ is continuous and that |g′(p)| < 1. Then for c with |g′(p)| < c < 1 there is an r > 0 such that |g′(x)| ≤ c < 1 for x in the closed interval [p − r, p + r]. Then g maps the interval into itself and is a strict contraction. Furthermore, iterates starting in this interval converge to the fixed point.
Fixed point iteration gives a rather general way of solving equations f(x) = 0. If a is an arbitrary non-zero constant, then the fixed points of

g(x) = x − (1/a) f(x)   (1.5)

are the solutions of the equation. The trick is to pick a such that g is a strict contraction. However,

g′(x) = 1 − (1/a) f′(x),   (1.6)

so the strategy is to take a close to the values of f′(x) at points where f(x) is close to zero. This is illustrated in the following result, which is a reformulation of a previous result.
Proposition 1.5 Let p be a real number, and consider the closed interval [p − r, p + r] with r > 0. Suppose that for some c with 0 < c < 1 we have 1 − c < (1/a) f′(x) < 1 + c for x in this interval. Furthermore, suppose that |f(p)| ≤ |a|(1 − c)r. Then the corresponding fixed point iteration starting in this interval converges to the solution of f(x) = 0 in this interval.
Example: Suppose we want to solve the equation x = 2 cos(x). Since 2 cos(π/3) = 1, this is a fixed point equation with a solution near π/3. However, since −2 sin(π/3) = −√3, there is not much hope for stability. So we may write the equation as f(x) = x − 2 cos(x) = 0. The derivative is f′(x) = 1 + 2 sin(x). The value of this derivative at π/3 is 1 + √3, which is fairly close to 3. So we take a = 3, and the iteration function is g(x) = x − (1/3)(x − 2 cos(x)) with derivative g′(x) = 1 − (1/3)(1 + 2 sin(x)). This derivative decreases from 1 − (1/3)(1 + 1) = 1/3 at π/6 to 1 − (1/3)(1 + 2) = 0 at π/2. So we may take c = 1/3. Furthermore, π/3 − g(π/3) = (1/3)(π/3 − 1). It is easy to see that this is bounded above by (1 − c)r = (2/3)(π/6). So π/3 is a sufficiently good approximation to the fixed point. Iteration shows that the fixed point is close to 1.03. |
Example: Even with simple examples like the one above, it is convenient to use
a computer program to do the calculations. Here, for example, is a complete
program written in the computer language R.
f <- function (x) x - 2 * cos(x)
g <- function (x) x - f(x)/3
x <- 1
for (i in 1:10) x <- g(x)
x
[1] 1.029867 |
Example: Fix z with 0 < z < 1/e. Suppose we want to solve the equation w = z e^w for w. We can plot z = w e^{−w} and see that for such z there are always two positive solutions, one less than 1 and the other greater than 1. The smaller solution may be computed by fixed point iteration. On the other hand, the greater solution is an unstable fixed point. If we take the smaller solution, then we have a well-defined function L(z) defined for 0 < z < 1/e with values 0 < L(z) < 1 that satisfies L(z) = z e^{L(z)}. There is no obvious explicit formula for this function. |
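The smaller solution is a stable fixed point, so it can be found by direct iteration. A Python sketch (z = 0.3 is an arbitrary choice for which the two positive solutions exist):

```python
import math

z = 0.3
w = 0.0
for _ in range(200):
    w = z * math.exp(w)   # iterate w <- z e^w

# w is the smaller solution: it lies in (0, 1) and satisfies w = z e^w
print(w)  # approximately 0.4894
```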
1.2 The implicit function theorem (single variable)
If one wants to solve an equation f(x, y) = 0 for x as a function of y (or for y as a function of x), there can be problems. Consider, for example, the equation x − y² = 0. There is no problem in solving for x as a function of y. On the other hand, near the origin there is a real issue of how to solve for y as a function of x. The obvious attempts are y = √x for x ≥ 0 and y = −√x for x ≥ 0. In either case the solution is non-differentiable at x = 0. The implicit function theorem gives insight into this sort of behavior.

Theorem 1.6 (Implicit function theorem) Let f(x, y) be continuous with continuous partial derivatives near some point x = a, y = b with f(a, b) = 0. The goal is to solve x = h(y), so x is to be the dependent variable. Suppose that the partial derivative with respect to the dependent variable ∂f(x, y)/∂x ≠ 0 at x = a, y = b. Then there is a continuous function h(y) defined for y near b with f(h(y), y) = 0. In fact, the function h(y) has a continuous derivative, and

0 = df(h(y), y)/dy = [∂f(x, y)/∂x]|_{x=h(y)} dh(y)/dy + [∂f(x, y)/∂y]|_{x=h(y)}.   (1.7)
In the following we shall sometimes write the partial derivatives in the form

∂f(x, y)/∂x = f′_{,1}(x, y)   (1.8)

and

∂f(x, y)/∂y = f′_{,2}(x, y).   (1.9)

Then the equation has the form

0 = f′_{,1}(h(y), y) h′(y) + f′_{,2}(h(y), y).   (1.10)

This may be solved to give

h′(y) = − f′_{,2}(h(y), y) / f′_{,1}(h(y), y).   (1.11)
Proof: Let A = ∂f(x, y)/∂x at x = a, y = b, that is, A = f′_{,1}(a, b). Consider the iteration function

g(x, y) = x − (1/A) f(x, y).   (1.12)

This has partial derivative

∂g(x, y)/∂x = 1 − (1/A) ∂f(x, y)/∂x.   (1.13)
At x = a, y = b this has the value 0. Pick some convenient value of c with
0 < c < 1. Then for x, y sufficiently close to a, b this partial derivative has
absolute value bounded by c. In particular, this is true in some box |x − a| ≤ r, |y − b| < s. Fix r > 0. We know that g(a, b) − a = 0. Hence, if s > 0 is sufficiently small, |g(a, y) − a| ≤ (1 − c)r. This shows that for all y with |y − b| < s the contraction mapping result works for the function x ↦ g(x, y) on the interval of x with |x − a| ≤ r. The resulting fixed point is given by some function x = h(y).
It is not too difficult to show that h(y) is a continuous function of y. Consider y′ near y. Then g(h(y), y) = h(y) and g(h(y′), y′) = h(y′). So h(y′) − h(y) = g(h(y′), y′) − g(h(y), y) = g(h(y′), y′) − g(h(y), y′) + g(h(y), y′) − g(h(y), y). This gives |h(y′) − h(y)| ≤ c|h(y′) − h(y)| + |g(h(y), y′) − g(h(y), y)|. Write this as (1 − c)|h(y′) − h(y)| ≤ |g(h(y), y′) − g(h(y), y)|. Then as y′ → y we have g(h(y), y′) → g(h(y), y), and hence h(y′) → h(y).
It remains to show that h has a continuous derivative. The derivative of h at y is computed as the limit of the difference quotient (h(y + k) − h(y))/k as k → 0. In order to get a handle on this, compute

0 = 0 + 0 = f(h(y + k), y + k) − f(h(y), y).   (1.14)

Expand

0 = f(h(y + k), y + k) − f(h(y), y) = ∫₀¹ (d/dt) f(t h(y + k) + (1 − t)h(y), t(y + k) + (1 − t)y) dt.   (1.15)

By the chain rule for partial derivatives

0 = ∫₀¹ [f′_{,1}(t h(y + k) + (1 − t)h(y), t(y + k) + (1 − t)y)(h(y + k) − h(y)) + f′_{,2}(t h(y + k) + (1 − t)h(y), t(y + k) + (1 − t)y) k] dt.   (1.16)

This has solution

(h(y + k) − h(y))/k = − [∫₀¹ f′_{,2}(t h(y + k) + (1 − t)h(y), t(y + k) + (1 − t)y) dt] / [∫₀¹ f′_{,1}(t h(y + k) + (1 − t)h(y), t(y + k) + (1 − t)y) dt].   (1.17)
Now let k → 0 in each integral on the right. For each fixed t between 0 and 1 the integrand converges to a limit as k → 0. Furthermore, the integrands are bounded uniformly in k. In this circumstance the dominated convergence theorem (considered later in these notes) shows that the limit of the integral is the integral of the limit. In fact, the integrands become independent of t, and we obtain the desired formula for h′(y), that is,

h′(y) = − f′_{,2}(h(y), y) / f′_{,1}(h(y), y).   (1.18)

The right hand side of this formula is continuous in y, so the left hand side is also continuous in y. This shows that the function h has a continuous derivative. □
Example: As a very simple example, consider the equation f(x, y) = x² + y² − 1 = 0 of a circle. The partial derivatives are ∂f(x, y)/∂x = 2x and ∂f(x, y)/∂y = 2y. Thus for every point on the circle where x ≠ 0 it is possible to solve for x in terms of y near this point. If the point is on the right half of the circle, then x = √(1 − y²), while if the point is on the left half of the circle, then x = −√(1 − y²). In a completely symmetrical way, for every point on the circle where y ≠ 0 it is possible to solve for y in terms of x near this point. If the point is on the upper half of the circle, then y = √(1 − x²), while if the point is on the lower half of the circle, then y = −√(1 − x²). |
Example: For a less trivial example, take f(x, y) = cos(x) + cos(xy) + cos(y) − 2. Then f(π/2, 0) = 0. In order to solve x = h(y) near y = 0 we can use fixed point iteration. Say that, for instance, we want the value h(1/10). Notice that ∂f(x, y)/∂x = − sin(x) − y sin(xy). At x = π/2, y = 0 this has the value −1. So we take the iteration function g(x) = x + f(x, 1/10). There is a fixed point at x = 1.553753, which is near π/2 = 1.570796. This is the value of h(1/10). |
Example: Here is an R program to compute h(1/10) from the previous example.
f <- function (x,y) cos(x) + cos(x * y) + cos(y) - 2
g <- function (x) x + f(x,1/10)
x <- pi/2
for (i in 1:30) x <- g(x)
x
[1] 1.553753 |
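The same computation can be written in Python (mirroring the R program above; the iteration count is just as arbitrary):

```python
import math

def f(x, y):
    return math.cos(x) + math.cos(x * y) + math.cos(y) - 2

def g(x):
    return x + f(x, 1/10)   # iteration function for solving f(x, 1/10) = 0

x = math.pi / 2
for _ in range(30):
    x = g(x)

print(x)  # approximately 1.553753
```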

1.3 Linear algebra review (norms)
This section assumes some acquaintance with linear algebra and matrix theory.
If A is an m by n real matrix, and x is an n-component column vector, then Ax
is an m-component column vector. Thus the matrix defines a linear function that
sends n-component vectors to m-component vectors. If B is a k by m matrix,
then the matrix product BA is a k by n matrix. The product of BA with x is
the same as the product of B with Ax. Thus matrix multiplication coincides
with the composition of the functions.
One can think of an n-component column vector x as an n by 1 matrix. Often
we shall refer to this simply as a vector . An n-component row vector ω is a 1
by n matrix. This will be called a covector or linear form. The product ωx is
then a scalar.
Let A be an m by n matrix. Then its transpose A^T is the n by m matrix with the columns and rows reversed. There is a fundamental identity

(BA)^T = A^T B^T   (1.19)

between n by k matrices.
An n by n square matrix Q has an inverse R if QR = I and RQ = I. The inverse of Q is denoted Q^{−1}. It is a well-known linear algebra fact that if the inverse exists on one side, for instance QR = I, then also RQ = I, and the
inverse exists on both sides. For matrices there are natural notions of addition and subtraction and of (non-commutative) multiplication. However some care must be taken with division, since P Q^{−1} is in general different from Q^{−1} P. Note the important identity

(AB)^{−1} = B^{−1} A^{−1}.   (1.20)
Furthermore, suppose both A and B have inverses. Then we may write B = A + (B − A), multiply to get A^{−1}B = I + A^{−1}(B − A), and multiply again to get A^{−1} = B^{−1} + A^{−1}(B − A)B^{−1}. A square matrix with an inverse is often called invertible or non-singular. Of course it is said to be singular if it does not have an inverse.
An m by n matrix A cannot have an inverse matrix unless m = n. However
there are important related concepts. Such a matrix has a rank r, which is
the dimension of the range, that is, of the space spanned by the columns. It is
always true that r ≤ n and r ≤ m. If r = n ≤ m or r = m ≤ n, then the matrix
is said to be of maximal rank. In the first case this is equivalent to the n by n matrix A^T A being invertible, while in the second case it is equivalent to the m by m matrix A A^T being invertible. A matrix like A^T A is sometimes called a Gram matrix.
If A is an n by n square matrix, then there are several interesting numbers
that one can associate to it. The trace of A is the number tr(A) obtained by
taking the sum of the diagonal entries. If A and B are such matrices, then

tr(A + B) = tr(A) + tr(B) (1.21)

and
tr(AB) = tr(BA). (1.22)
The most interesting number associated to a square matrix A is its determi-
nant det(A). It has a relatively complicated definition, but there is one property
that is particularly striking. If A and B are such matrices, then

det(AB) = det(BA) = det(A) det(B). (1.23)

Also det(I) = 1. (Here I is the n by n identity matrix, with 1s on the main diagonal and 0s elsewhere.) We always have det(A) = det(A^T). A matrix A has an inverse A^{−1} if and only if det(A) ≠ 0. In that case det(A^{−1}) = 1/det(A).
An n by n real matrix H is symmetric if H = H^T. An n by n real matrix P is orthogonal if P^T P = P P^T = I, that is, P^{−1} = P^T. Notice that if P is orthogonal, then P^T = P^{−1} is also orthogonal. An n by n real matrix Λ is diagonal if all entries off the main diagonal are 0.

Theorem 1.7 (Spectral theorem) Suppose H is real and symmetric. Then there is a real orthogonal matrix P and a real diagonal matrix Λ such that

HP = P Λ.   (1.24)
The columns of P form an orthonormal basis, and these are the eigenvectors
of H. The diagonal entries in Λ are the eigenvalues of H. In general it is quite
difficult to compute such eigenvalues and eigenvectors.
The equation HP = P Λ may be written in various ways. Multiplication on the right by P^T gives H = P Λ P^T. If λ_j are the eigenvalues, and p_j are the columns of P, then this gives the spectral representation of the matrix:

H = Σ_j λ_j p_j p_j^T.   (1.25)

Example: Here is a simple example where one can do the computation. The symmetric matrix is

H = [ 13  12   2
      12  13  −2
       2  −2   8 ].   (1.26)

It is easy to see that this matrix has dependent rows, so the determinant is zero. As a consequence at least one eigenvalue is zero. In this situation it is easy to find the other two eigenvalues λ_1, λ_2. Use λ_1 + λ_2 = tr(H) = 34 and λ_1² + λ_2² = tr(H²) = 706. (For a symmetric matrix tr(H²) = tr(H^T H) is the square of the Euclidean norm defined below.) All that remains is to solve a quadratic equation to get the non-zero eigenvalues 25, 9. The corresponding eigenvectors are found by solving linear systems. The eigenvectors form the columns of the matrix

R = [ 1   1   2
      1  −1  −2
      0   4  −1 ].   (1.27)

Since the eigenvalues are distinct, the eigenvectors are automatically orthogonal. This says that R^T R is a diagonal matrix. If we normalize each column to be a vector of length one, then we get a new matrix P such that P^T P is the identity matrix. In particular, we get the representation

H = 25 · (1/2) [1, 1, 0]^T [1, 1, 0] + 9 · (1/18) [1, −1, 4]^T [1, −1, 4].   (1.28)

|
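The computation in this example can be checked numerically. A sketch with NumPy, whose `eigh` routine handles symmetric matrices (eigenvalues come back in increasing order, with orthonormal eigenvectors as columns):

```python
import numpy as np

H = np.array([[13., 12.,  2.],
              [12., 13., -2.],
              [ 2., -2.,  8.]])

lam, P = np.linalg.eigh(H)   # the eigenvalues are 0, 9, and 25

# Rebuild H from the spectral representation H = sum_j lam_j p_j p_j^T
H_rebuilt = sum(lam[j] * np.outer(P[:, j], P[:, j]) for j in range(3))

print(np.allclose(H, H_rebuilt))  # True
```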
Our main goal is to get a number to measure the size of a matrix. There is a particularly simple definition of the size of a real column vector, namely

|x| = √(x^T x).   (1.29)

This will be called the Euclidean norm of the vector. There is a corresponding notion of inner product of two real column vectors:

x · y = x^T y.   (1.30)

The famous Schwarz inequality says that |x · y| ≤ |x||y|.
If A is a real m by n matrix, then we may define the size of the matrix A to be the least number ‖A‖ such that for all real n component column vectors we have

|Ax| ≤ ‖A‖ |x|.   (1.31)

It follows that

|Ax − Ay| = |A(x − y)| ≤ ‖A‖ |x − y|,   (1.32)

so ‖A‖ is the best Lipschitz constant for the linear transformation A. In these lectures it will be called the Lipschitz norm of A (corresponding to the Euclidean norm for the vectors).
It is not difficult to show that the norm behaves reasonably under sum and product. In fact, ‖A + B‖ ≤ ‖A‖ + ‖B‖ and ‖AB‖ ≤ ‖A‖ ‖B‖. The norm also preserves the transpose: ‖A^T‖ = ‖A‖. This has an easy proof that begins with |A^T x|² = A^T x · A^T x = x · A A^T x. The reader will have no difficulty using the Schwarz inequality to complete the proof that ‖A^T‖ ≤ ‖A‖. This is half the proof. But then one can use A = (A^T)^T to prove that ‖A‖ ≤ ‖A^T‖.
For the inverse it is easy to see that ‖A^{−1}‖ ≥ 1/‖A‖. If B is close enough to A and A^{−1} exists, then B^{−1} also exists. All that is required is that B − A be small enough so that ‖A^{−1}‖ ‖B − A‖ < 1. In fact, there is an even better result, given in the following proposition.
Proposition 1.8 Suppose that A^{−1} exists and that ‖A^{−1}(B − A)‖ < 1. Then B^{−1} exists, and

B^{−1} = A^{−1} − A^{−1}(B − A)B^{−1}.   (1.33)

Proof: The space of n by n matrices with the Lipschitz norm is a complete metric space. Define the map g(C) = A^{−1} − A^{−1}(B − A)C. By hypothesis this is a contraction mapping. Hence it has a fixed point satisfying C = A^{−1} − A^{−1}(B − A)C. Then AC = I − (B − A)C = I − BC + AC, and hence BC = I. So C is the inverse of B. □
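The iteration in this proof can be carried out numerically. A sketch (the matrices are arbitrary choices with ‖A⁻¹(B − A)‖ < 1):

```python
import numpy as np

A = np.eye(2)
B = np.array([[1.0, 0.1],
              [0.1, 1.0]])   # B - A is small, so g is a strict contraction

Ainv = np.linalg.inv(A)
C = Ainv.copy()
for _ in range(100):
    C = Ainv - Ainv @ (B - A) @ C   # g(C) = A^{-1} - A^{-1}(B - A)C

# The fixed point is the inverse of B
print(np.allclose(B @ C, np.eye(2)))  # True
```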
This norm ‖A‖ is quite difficult to compute. In fact, write H = A^T A and notice that it is symmetric. So HP = P Λ, or, taking Q = P^{−1}, QH = ΛQ. Thus

|Ax|² = (Ax)^T (Ax) = x^T A^T A x = x^T H x = x^T Q^T Λ Q x = (Qx)^T Λ (Qx).   (1.34)

Note that this expression easily leads to the conclusion that all the eigenvalues λ_i ≥ 0. It can be compared to

|x|² = x^T x = x^T Q^T Q x = (Qx)^T (Qx).   (1.35)

Clearly |Ax|² ≤ λ_max |x|², and this is the least such bound. This is summarized in the following result.

Theorem 1.9 The Lipschitz norm of A has the explicit expression

‖A‖ = √λ_max,   (1.36)

where λ_max is the largest eigenvalue of A^T A.
For a real symmetric matrix A^T A the norm is the largest eigenvalue. So another consequence of this reasoning is

‖A‖ = √(‖A^T A‖).   (1.37)

In other words, to compute the norm of A one computes the norm of the symmetric matrix A^T A and takes the square root.
Example: Consider the matrix

A = [ 3  2   2
      2  3  −2 ].   (1.38)

It is easy to see that ‖A‖_2² = 34. On the other hand, to compute ‖A‖ takes some work. But the matrix A^T A is the matrix H of the previous example, which has eigenvalues 25, 9, 0. So ‖A‖ = √25 = 5.
There is a considerably easier way to do this computation, namely compute ‖A^T‖, which is the square root of the largest eigenvalue of A A^T. The pleasure of this is left to the reader. |
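This example can be checked with NumPy: the Lipschitz norm is what NumPy calls the spectral norm (ord 2), and the Euclidean norm is the Frobenius norm:

```python
import numpy as np

A = np.array([[3., 2.,  2.],
              [2., 3., -2.]])

lip = np.linalg.norm(A, 2)       # square root of the largest eigenvalue of A^T A
euc = np.linalg.norm(A, 'fro')   # square root of the sum of the squared entries

print(lip)   # 5.0
print(euc)   # sqrt(34), about 5.831
```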
Since eigenvalues are difficult to compute, this is bad news. However, there is another norm of A that is easier to compute. This is the Euclidean norm

‖A‖_2 = √(tr(A^T A)).   (1.39)

It has the happy feature that it may be obtained directly from the matrix entries. The j, k entry of A^T A is Σ_i a_{ij} a_{ik}. Thus the diagonal j, j entry is Σ_i a_{ij}², and so

‖A‖_2² = tr(A^T A) = Σ_j Σ_i a_{ij}².   (1.40)

Even better news is that the Euclidean norm is an upper bound for the Lipschitz norm. In fact, by the Schwarz inequality applied to each row,

|Ax|² = Σ_i (Σ_j a_{ij} x_j)² ≤ Σ_i (Σ_j a_{ij}²)(Σ_k x_k²) = (Σ_i Σ_j a_{ij}²)(Σ_k x_k²) = ‖A‖_2² |x|².   (1.41)

This is summarized in the following proposition.

Proposition 1.10 The Lipshitz norm of a matrix is related to the Euclidean


norm of the matrix by
kAk ≤ kAk2 . (1.42)

The other direction is not so precise. There is a bound, but it depends on


the dimension of the m by n matrix. One result is the following.

Proposition 1.11 The Euclidean norm of a matrix is related to the Lipschitz


norm of the matrix by √
kAk2 ≤ nkAk. (1.43)
The proof is easy. Write a_{ij} = Σ_p a_{ip} δ_{pj}. This is the i component of A applied to the vector δ_j that is 0 except for 1 in the jth place. Then

‖A‖_2² = Σ_j Σ_i a_{ij}² = Σ_j Σ_i (Σ_p a_{ip} δ_{pj})² = Σ_j |Aδ_j|² ≤ Σ_j ‖A‖² |δ_j|² = n ‖A‖².   (1.44)

1.4 Linear algebra review (eigenvalues)
This section is devoted to a deeper analysis of the structure of a square matrix
A with real entries. A complex number λ is an eigenvalue if and only if A − λI
is a singular matrix. The set of eigenvalues is sometimes called the spectrum of
A. A real matrix A has either real eigenvalues λk or complex eigenvalue pairs
λk = |λk |(cos(φk ) + i sin(φk )), λ̄k = |λk |(cos(φk ) − i sin(φk )). In the complex
case |λk | > 0 and φk is not a multiple of π. The spectral radius ρ(A) is defined
to be the largest of the |λk |.
A complex eigenvalue pair manifests itself in the form of a real rotation matrix. Recall that a rotation matrix is a matrix of the form

R(φ) = [ cos(φ)  −sin(φ)
         sin(φ)   cos(φ) ].   (1.45)

Again we shall mainly be interested in the cases when R is not diagonal, that
is, φ is not a multiple of π.
One more ingredient is necessary. A square matrix N is said to be nilpotent if some power N^p = 0 for p ≥ 1. Such a matrix is small in the sense that every eigenvalue of N must be zero. In other words, the spectral radius ρ(N) = 0. However the norm ‖N‖ can be quite large.
Much can be learned from the special case of 2 by 2 matrices. A 2 by 2
matrix A defines a linear function from R2 to R2 . For each x in R2 there is a
corresponding Ax in R2 . It is difficult to imagine the graph of such a function.
However there is a nice pictorial representation that is very helpful. One picks
several values of x and draws corresponding vectors Ax − x from x to Ax.
In the 2 by 2 case it is not difficult to compute the eigenvalues. The sum of the eigenvalues is tr(A) and the product of the eigenvalues is det(A). From this it is easy to see that the eigenvalues are the solutions of the quadratic equation λ² − tr(A)λ + det(A) = 0.
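This recipe is easy to test. A Python sketch (the matrix is an arbitrary example with real eigenvalues):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

t = np.trace(A)         # sum of the eigenvalues
d = np.linalg.det(A)    # product of the eigenvalues

# Solve lambda^2 - t*lambda + d = 0 by the quadratic formula
disc = np.sqrt(t**2 - 4*d)
roots = sorted([(t - disc) / 2, (t + disc) / 2])

print(roots)  # the eigenvalues 2 and 5, matching np.linalg.eigvals(A)
```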
If A has positive eigenvalues less than one, then the vectors point in the
general direction of the origin. This is a stable case when the orbits go to
the fixed point at zero. If A has positive eigenvalues greater than one, then
they move away from the origin. If A has a positive eigenvalue greater than
one and another less than one, then the orbits move toward the eigenspace
corresponding to the larger of the two eigenvalues and then outward. If A has
negative eigenvalues, then there is some overshoot. However one can always
look at A2 and get pictures with just positive eigenvalues.
Another situation is the complex eigenvalue case, when A is a multiple of a rotation. If the angle φ is reasonably small, then one gets a picture of a spiral around the origin, moving inward if |λ| < 1 and outward if |λ| > 1.
The case of a nilpotent 2 by 2 matrix A is quite interesting. There is an eigenspace corresponding to eigenvalue 0. The arrows all land in that eigenspace, and the ones starting from the eigenspace lead to the origin.
Example: Consider the matrix

A = (2/5) [ 2  −1
            1   2 ].   (1.46)

This has eigenvalues equal to (2/5)(2 ± i). The magnitude of the eigenvalues is (2/5)√5, which is about 0.8944272. This means the behavior is stable. The orbit should spiral in to zero. |
Example: Here is a computation for the previous example.
x <- vector()
y <- vector()
u <- 100
v <- 0
for (i in 1:100) {
  unew <- (2*u - v) * 2 / 5  # save the new u so that v is updated from the current (u, v)
  v <- (u + 2*v) * 2 / 5
  u <- unew
  x[i] <- u
  y[i] <- v }
frame()
plot(x,y)
lines(x,y) |

Theorem 1.12 Let A be an n by n matrix with real entries. Suppose that it has
n distinct eigenvalues. Then there exists a real matrix Λ with the same (possibly
complex) eigenvalues λk as A. Here Λ is a real matrix whose non-zero entries
are either real diagonal entries λk or two-by-two blocks of the form |λk |R(φk ),
where |λk | > 0 and R(φk ) is a rotation matrix. Furthermore, there exists a real
invertible matrix P such that
AP = P Λ. (1.47)

Suppose for simplicity that the eigenvalues are distinct and real, so Λ is a real diagonal matrix. In this case there is a spectral theorem that directly generalizes the result for the symmetric case. Let P have column vectors p_j and let P^{−1} have row vectors χ_j. Then A and Λ are similar matrices, and

A = P Λ P^{−1} = Σ_j λ_j p_j χ_j.   (1.48)

Example: Consider the matrix

A = [ 11  −18
       6  −10 ].   (1.49)

Then

A = P Λ Q = [ 2  3 ] [ 2   0 ] [  2  −3 ]
            [ 1  2 ] [ 0  −1 ] [ −1   2 ].   (1.50)

Here Q = P^{−1} is the inverse of P. The spectral representation is

A = 2 [2, 1]^T [2, −3] − [3, 2]^T [−1, 2].   (1.51)

In this situation there is usually nothing gained by normalizing the vectors. |
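The factorization in this example is easy to verify numerically:

```python
import numpy as np

A = np.array([[11., -18.],
              [ 6., -10.]])
P = np.array([[2., 3.],
              [1., 2.]])
Lam = np.diag([2., -1.])
Q = np.array([[ 2., -3.],
              [-1.,  2.]])   # Q = P^{-1}

print(np.allclose(P @ Q, np.eye(2)))  # True: Q is the inverse of P
print(np.allclose(A, P @ Lam @ Q))    # True: A = P Lambda P^{-1}

# Spectral representation: A = 2 p_1 chi_1 + (-1) p_2 chi_2
S = 2 * np.outer(P[:, 0], Q[0]) - np.outer(P[:, 1], Q[1])
print(np.allclose(A, S))              # True
```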
Theorem 1.13 Let A be an n by n matrix with real entries. Then there exists a real matrix Λ with the same (possibly complex) eigenvalues λ_k as A. Here Λ is a real matrix whose non-zero entries are either real diagonal entries λ_k or two-by-two blocks of the form |λ_k| R(φ_k), where |λ_k| > 0 and R(φ_k) is a rotation matrix. Furthermore, for every δ > 0 there exists a real invertible matrix P and a real nilpotent matrix N, such that

AP = P J,   (1.52)

where

J = Λ + N   (1.53)

and ‖N‖ ≤ δ.
This theorem says that A may be represented as A = P J P^{−1}, where we know a lot about the size of J = Λ + N. In fact, the norm of Λ is the spectral radius ρ(A) of A. The norm of N is less than δ. So the norm of J is bounded by

‖J‖ ≤ ‖Λ‖ + ‖N‖ ≤ ρ(A) + δ.   (1.54)

Unfortunately, this does not mean that the norm ‖A‖ is particularly small. In fact, all we know is that

‖A‖ = ‖P J P^{−1}‖ ≤ ‖P‖ ‖J‖ ‖P^{−1}‖.   (1.55)

Unfortunately, ‖P‖ ‖P^{−1}‖ can be much larger than one.
There is an artificial trick for getting around this difficulty. This trick is to find a new norm that is adapted to the particular matrix A. It is convenient to write Q = P^{−1}, so that Q is an invertible matrix with

A = Q^{−1} J Q.   (1.56)

The matrix G = Q^T Q is symmetric with eigenvalues > 0. Define a new norm on vectors by

|x|_G = √(x^T G x) = √(x^T Q^T Q x) = √((Qx)^T (Qx)) = |Qx|.   (1.57)

This defines a new Lipschitz norm on matrices. We can compute this norm in the case of A (for which it was designed). We have

|Ax|_G = |QAx| = |JQx| ≤ ‖J‖ |Qx| = ‖J‖ |x|_G.   (1.58)
With respect to this new Lipschitz norm we have

‖A‖_G ≤ ‖J‖ ≤ ρ(A) + δ.   (1.59)

This discussion is summarized in the following theorem. The theorem says that the size of the matrix A is in some more profound sense determined by the spectral radius ρ(A).

Theorem 1.14 For every real square matrix A with spectral radius ρ(A) and for every δ > 0 there is a new Lipschitz norm, defined with respect to a symmetric matrix G with strictly positive eigenvalues, such that

‖A‖_G ≤ ρ(A) + δ.   (1.60)

One can wonder whether one has the right to change norm in this way. The first observation is that notions of continuity do not change. This is because if G = Q^T Q, then

|x|_G ≤ ‖Q‖ |x|,   |x| ≤ ‖Q^{−1}‖ |x|_G.   (1.61)

Furthermore, the geometry only changes in a rather gentle way. Thus, a ball |x|_G² < r² in the new norm is actually an ellipsoid |Qx|² = x^T G x < r² in the original picture.
The weakness of this idea is that the new norm is specially adapted to the
matrix A. If one is dealing with more than one matrix, then the new norm that
is good for one may not be the new norm that is good for the other.
Example: Consider the previous example with distinct eigenvalues 2, −1. The new norm is |x|_G = |Qx| = √((2x − 3y)² + (−x + 2y)²) = √(5x² − 16xy + 13y²). With respect to this norm on vectors the matrix A has norm 2 (the spectral radius). |
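In the example above with eigenvalues 2, −1, the relation QA = ΛQ makes the bound |Ax|_G ≤ 2 |x|_G exact. A numerical check on random vectors:

```python
import numpy as np

A = np.array([[11., -18.],
              [ 6., -10.]])
Q = np.array([[ 2., -3.],
              [-1.,  2.]])   # QA = Lambda Q with Lambda = diag(2, -1)

rng = np.random.default_rng(0)
ok = True
for _ in range(100):
    x = rng.standard_normal(2)
    # |x|_G = |Qx|, and the G-norm of A is the spectral radius 2
    ok = ok and np.linalg.norm(Q @ A @ x) <= 2 * np.linalg.norm(Q @ x) + 1e-9

print(ok)  # True
```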
In the case of a 2 by 2 matrix A with a repeated eigenvalue λ it is easy to find the decomposition. Let u be an eigenvector, so that (A − λI)u = 0. Find some other vector v such that (A − λI)v = δu. Take P = [u, v] to be the matrix with these vectors as columns. Then

AP = P [ λ  δ
         0  λ ].   (1.62)

Example: Let

A = [ −4  4
      −9  8 ]   (1.63)

with repeated eigenvalue 2. Then

A = P J Q = [ 2  −(1/3)δ ] [ 2  δ ] [   0     1/3 ]
            [ 3     0    ] [ 0  2 ] [ −3/δ   2/δ ].   (1.64)

So the new norm |Qx| is determined by

δ² |Qx|² = (−3x + 2y)² + (1/9) δ² y² = 9x² − 12xy + (4 + (1/9)δ²) y².   (1.65)

As long as δ ≠ 0 this expression is only zero at the origin, and so it defines a legitimate norm. |
We conclude this discussion of linear algebra with one more digression. We have been considering a matrix as a linear transformation, in which case the relevant classification is given by matrix similarity transformations. Thus if y = Ax, and we write x = P x′ and y = P y′, we get y′ = P^{−1} A P x′. This means that we transform the matrix A to P^{−1} A P in an attempt to simplify it. However it is also possible to regard a symmetric matrix G as a quadratic form. This means that we associate with G the quadratic form

q(x) = x^T G x.   (1.66)

If we make a transformation x = P y, then the quadratic form becomes

q(x) = x^T G x = (P y)^T G (P y) = y^T P^T G P y.   (1.67)

In other words, the relevant transformation is a matrix congruence transformation that sends G to P^T G P. The matrix P is required to be non-singular, but is not required to be orthogonal. So the eigenvalues are not preserved. Remarkably, the signs of the eigenvalues are preserved. In fact, the matrix may be brought to the diagonal form P^T G P = E, where the diagonal entries are ±1 or 0. The quadratic form is non-degenerate precisely when all these entries are ±1. It is positive definite when all these entries are 1.
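This invariance of the signs (Sylvester's law of inertia) can be illustrated numerically; the matrices G and P below are arbitrary choices, with P non-singular but not orthogonal:

```python
import numpy as np

G = np.diag([2., -3.])      # one positive and one negative eigenvalue
P = np.array([[1., 1.],
              [0., 1.]])    # non-singular, not orthogonal

Gp = P.T @ G @ P            # the congruent matrix P^T G P

def signs(M):
    return sorted(np.sign(np.linalg.eigvalsh(M)))

print(np.linalg.eigvalsh(Gp))   # the eigenvalues change (here -2 and 3)...
print(signs(G) == signs(Gp))    # ...but their signs do not: True
```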
These quadratic form ideas will occur in the context of the Hessian matrix
of second partial derivatives. An even more important application is the metric
tensor that provides the notions of length, area, and volume.

1.5 Differentiation (multivariable)
It is convenient to deal with a rather general class of functions, those defined
on some domain E ⊆ Rn with values in Rm . If x is in E, then there is a
corresponding value
u = f (x) (1.68)
in Rm . This may be written somewhat more explicitly as ui = fi (x) =
fi (x1 , . . . , xn ).
It may help to see an example. Take n = 3 and m = 2. Then the equation
says

u1 = f1 (x1 , x2 , x3 ) (1.69)
u2 = f2 (x1 , x2 , x3 ).
In many practical situations this is written without subscripts, for example

u = f (x, y, z) (1.70)
v = g(x, y, z).

Differential calculus has an important generalization to this situation. The starting point is the matrix of partial derivatives f′(x). We take E to be an open subset of R^n. For each x in E we have the matrix of partial derivatives f′(x) with components

f′(x)_{i,j} = ∂u_i/∂x_j = ∂f_i(x)/∂x_j.   (1.71)

Each partial derivative is just an ordinary derivative in the situation when all variables but one are held constant during the limiting process. The matrix of partial derivatives is also called the Jacobian matrix.
Again it may help to look at an example. We have

f′(x) = [ f′_{1,1}(x)  f′_{1,2}(x)  f′_{1,3}(x)
          f′_{2,1}(x)  f′_{2,2}(x)  f′_{2,3}(x) ].   (1.72)

An alternate notation using variables might be

∂(u_1, u_2)/∂(x_1, x_2, x_3) = [ ∂u_1/∂x_1  ∂u_1/∂x_2  ∂u_1/∂x_3
                                 ∂u_2/∂x_1  ∂u_2/∂x_2  ∂u_2/∂x_3 ].   (1.73)

The reader should be warned that a notation like the one on the left is often used
in situations where the matrix is a square matrix; in that case some authors use
it to denote the matrix, others use it to denote the determinant of the matrix.
If u = f(x, y, z) and v = g(x, y, z) we can also write

[ du ]   [ ∂u/∂x  ∂u/∂y  ∂u/∂z ]   [ dx ]
[ dv ] = [ ∂v/∂x  ∂v/∂y  ∂v/∂z ] · [ dy ]
                                    [ dz ].   (1.74)

One way to interpret this equation is to think of x, y, z as functions of a parameter t. Then replace dx, dy, dz by dx/dt, dy/dt, dz/dt and du, dv by du/dt, dv/dt. Of course one can dispense with matrix notation altogether and write

du = ∂u/∂x dx + ∂u/∂y dy + ∂u/∂z dz   (1.75)
dv = ∂v/∂x dx + ∂v/∂y dy + ∂v/∂z dz.
The existence of the matrix of partial derivatives does not fully capture the notion of differentiability. Here is the actual definition of the derivative f′(x). Write

f(x + h) = f(x) + f′(x)h + r(x, h).   (1.76)

Thus the function is written as the value at x plus a linear term (given by multiplying a matrix times a vector) plus a remainder term. The requirement on the remainder term is that

|r(x, h)| / |h| → 0   (1.77)

as h → 0. Sometimes it is easier to deal with this with an equivalent definition that does not involve fractions:

|r(x, h)| ≤ ε(x, h) |h|,   (1.78)

where ε(x, h) → 0 as h → 0. In other words, the remainder term has to be of higher order, in this precise sense of being bounded by the norm of h with a coefficient that goes to zero. From this definition one can conclude that if the derivative f′(x) exists in this sense, then the function is continuous and the matrix of partial derivatives exists and coincides with f′(x). Since the derivative provides a linear transformation that approximates changes in the function, it is often called the linearization of the function.
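The definition can be tested numerically on a concrete map. The map f(x, y) = (x² + y, xy) is an arbitrary illustrative choice; for it the remainder is in fact exactly quadratic in h:

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2 + y, x * y])

def fprime(v):
    x, y = v
    # Jacobian matrix of partial derivatives
    return np.array([[2*x, 1.],
                     [  y,  x]])

v = np.array([1., 2.])
higher_order = True
for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * np.array([1., -1.])
    r = f(v + h) - f(v) - fprime(v) @ h   # remainder term r(x, h)
    # |r| is of order |h|^2, hence o(|h|)
    higher_order = higher_order and np.linalg.norm(r) <= 10 * np.linalg.norm(h)**2

print(higher_order)  # True
```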
Example: There are examples where the matrix of partial derivatives exists
and yet the derivative does not exist. Take f (x, y) = xy/(x2 + y 2 ) away from
the origin, f (0, 0) = 0. Away from the origin it is easy to compute the partial
derivatives ∂f (x, y)/∂x = xy(y − x)/(x2 + y 2 )2 and ∂f (x, y)/∂y = xy(x −
y)/(x2 + y 2 )2 . Since f (x, 0) = 0 and f (0, y) = 0, it follows that the two partial
derivatives exist also at the origin. However the function is discontinuous at the
origin. |
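A short numerical experiment makes the example concrete. The sketch below is in Python rather than the R used elsewhere in these notes; the function name is mine.

```python
# f(x, y) = xy/(x^2 + y^2) away from the origin, f(0, 0) = 0.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

# Along either axis the function vanishes, so both partial derivatives
# exist at the origin and equal 0.
print(f(1e-8, 0.0), f(0.0, 1e-8))

# Along the diagonal y = x the value is 1/2 at every scale, so f has no
# limit at the origin: the partial derivatives exist but f is discontinuous.
for t in (1e-2, 1e-5, 1e-8):
    print(f(t, t))
```

Along the axes the difference quotients are identically zero, while along the diagonal the value stays at 1/2 however close one comes to the origin.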
There is a theorem that says that if the matrix of partial derivatives exists
and is continuous, then the derivative exists. This theorem is proved below.
In most cases we shall assume that the conditions of the theorem are satisfied.
There is a very convenient notation that is used in such contexts. A function
defined on an open set is called a C k function if all partial derivatives up to order
k exist and are continuous. Thus a continuous function is a C 0 function, and
a differentiable function with continuous derivative is a C 1 function. Often in
theoretical work it is convenient to assume that one is dealing with a C 1 function,
or even a C 2 function. So as not to worry too much about technicalities, it is
quite common to assume that the function under consideration is a C ∞ function.
This just means that it is in C k for every k = 0, 1, 2, 3, . . .. Often a C ∞ function
is called a smooth function. In theoretical discussions one often has various
things to worry about other than exactly how differentiable a function has to
be. So it is common to limit the discussion to smooth functions. Sometimes one
instead talks about “sufficiently smooth” function, which just means C k with k
large enough for the purpose at hand.
The notion of differentiable function is very general, and it is revealing to
see how various cases apply to various important situations. First, consider the
case when n < m. In that case it is common to think of the set of u = f (x) for
x in the domain to be a parameterized surface in Rm of dimension n. Suppose
that the rank of f 0 (x) is n. (The rank is the dimension of the range, spanned
by the columns.) For each x one can look at the tangent space to the surface
at u = f (x). The tangent space is n dimensional and has the parametric form
f (x) + f 0 (x)h, where h is in Rn and parameterizes the tangent space.
In the special case when n = 1 the surface is actually just a curve. The
derivative at a point is the tangent vector. The tangent space at a point on the
curve is a line tangent to the point. For example, take the case when m = 2
and the curve is given by

u = cos(t) (1.79)
v = sin(t)

This is a parametric representation of a circle. The tangent to the circle at a


particular point is given by

ū = cos(t) − sin(t)k (1.80)


v̄ = sin(t) + cos(t)k,

where k is an arbitrary real constant.


When n = 2 one gets a parameterized surface. Here the classic example with
m = 3 is

u = sin(s) cos(t) (1.81)


v = sin(s) sin(t)
w = cos(s)

This describes a spherical surface. The angle s is the co-latitude and the angle
t is the longitude. The tangent plane is given by two parameters h, k as

ū = sin(s) cos(t) + cos(s) cos(t)h − sin(s) sin(t)k (1.82)


v̄ = sin(s) sin(t) + cos(s) sin(t)h + sin(s) cos(t)k
w̄ = cos(s) − sin(s)h.

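One can check this parameterization numerically: the two tangent vectors (the coefficients of h and k in (1.82)) are orthogonal to the radius vector, as they must be for a sphere. A Python sketch (the helper names are mine; the notes' own examples use R):

```python
import math

# Parameterization (1.81) of the sphere and the two tangent vectors,
# read off as the coefficients of h and k in (1.82).
def sphere(s, t):
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def tangent_s(s, t):
    return (math.cos(s) * math.cos(t), math.cos(s) * math.sin(t), -math.sin(s))

def tangent_t(s, t):
    return (-math.sin(s) * math.sin(t), math.sin(s) * math.cos(t), 0.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

s, t = 0.7, 1.3
p = sphere(s, t)
# Each tangent vector is orthogonal to the radius vector p.
print(dot(p, tangent_s(s, t)), dot(p, tangent_t(s, t)))
```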
Next, consider the case when m < n. In this case the set of x with f (x) = c
can be the implicit definition of a surface of dimension n − m in Rn . Suppose
that f 0 (x) has rank m. Then the tangent space to the surface at a point x
should have dimension n − m. It should consist of the points x̄ such that
f 0 (x)(x̄ − x) = 0.
When m = 1 the surface has dimension n − 1 and is called a hypersurface.
If u = f (x1 , . . . , xn ), then the derivative ∂u/∂xi = f′,i (x1 , . . . , xn ) is a covector. We often write this in the differential notation as

du = (∂u/∂x1 ) dx1 + · · · + (∂u/∂xn ) dxn .   (1.83)
A simple example is a sphere given by

x2 + y 2 + z 2 = 1. (1.84)
The differential of the left hand side is 2x dx + 2y dy + 2z dz. The tangent plane
at a point is found by solving the equation
2x(x̄ − x) + 2y(ȳ − y) + 2z(z̄ − z) = 0. (1.85)

When m < n it turns out that it is often possible to convert an implicit


representation f (x) = c into a parametric representation. Information on how
to do this is given by the implicit function theorem.
Yet another story is when m = n. In applications the equation u = f (x)
can have several interpretations. In the passive interpretation some object is
described by coordinates x, and it is found convenient to have an alternative
description in terms of new coordinates u. In this case one would like to be able
to solve for the x in terms of the u, so that the two descriptions are equivalent.
In particular, one would like each matrix f 0 (x) to have an inverse matrix. We
shall see that the inverse function theorem gives information that applies to this
situation.
Example: A typical passive operation is the change from polar coordinates to
cartesian coordinates. This is just
x = r cos(φ) (1.86)
y = r sin(φ).
A given point has two descriptions. |
In the active interpretation there is only one coordinate system. The function
f (x) then describes how the state of the object is changed. This process can be
iterated, so that x at the next stage becomes f (x) from the previous stage. In
this situation fixed points play an important role. If x∗ is a fixed point with
f (x∗ ) = x∗ , then the eigenvalues of f′(x∗ ) give important information about
the stability of the fixed point.
Example: Here is an example called the Hénon map. It occurred in a study of
an extremely simplified model of atmospheric motion. It may be viewed as the
composition of two simpler maps. The first is
p = x   (1.87)
q = 1 − ax² + y.
Here a > 0 is a parameter. This particular transformation preserves area.
Combine this with the transformation

u = q (1.88)
v = bp.
Here b is a parameter with 0 < b < 1, representing some kind of contraction or
dissipation. It decreases area, but in a simple way. The combined transforma-
tion is the Hénon map

u = 1 − ax² + y   (1.89)
v = bx.
It may be thought of as a prediction of the future state of the system from the
present state. Notice that this is an active operation; the state changes. It is
possible to iterate this map many times, in an attempt to predict the state far
into the future. This example has been the subject of much research. It turns
out that reliable prediction far into the future is quite difficult. |
The composition of two functions g and f is the function (g ◦ f ) defined by
(g ◦ f )(x) = g(f (x)). The chain rule describes the derivative of such a function.

Theorem 1.15 (Chain rule) Suppose that the derivatives f 0 (x) and g0 (f (x))
exist. Then
(g ◦ f )0 (x) = g0 (f (x))f 0 (x). (1.90)
The left hand is the derivative of the composition of the two functions, while
the right hand side is the matrix product representing the composition of their
derivatives, evaluated at the appropriate points.

Proof: Suppose f (x + h) = f (x) + f′(x)h + r(x, h) with |r(x, h)| ≤ ε(x, h)|h|. Similarly, suppose g(u + k) = g(u) + g′(u)k + s(u, k) with |s(u, k)| ≤ η(u, k)|k|. Take u = f (x) and k = f′(x)h. Then

g(f (x + h)) = g(u + k + r(x, h)) = g(u) + g′(u)k + g′(u)r(x, h) + s(u, k + r(x, h)).   (1.91)

We need to show that the two remainder terms are appropriately small. First,

|g′(u)r(x, h)| ≤ ‖g′(u)‖ ε(x, h)|h|.   (1.92)

Second,

|s(u, k + r(x, h))| ≤ η(u, k + r(x, h))|k + r(x, h)| ≤ η(u, f′(x)h + r(x, h))(‖f′(x)‖ + ε(x, h))|h|.   (1.93)

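The chain rule can be checked numerically by comparing the matrix product with a difference quotient of the composition. A Python sketch with f from R to R² and g from R² to R (illustrative functions of my choosing):

```python
import math

# f maps R to R^2 and g maps R^2 to R; the composition g o f maps R to R.
def f(t):
    return (math.cos(t), math.sin(t))

def g(u, v):
    return u * u + 3.0 * u * v

def comp(t):
    return g(*f(t))

t = 0.4
u, v = f(t)
# Chain rule: (g o f)'(t) is the 1x2 covector g'(u, v) = (2u + 3v, 3u)
# times the 2x1 vector f'(t) = (-sin t, cos t).
chain = (2.0 * u + 3.0 * v) * (-math.sin(t)) + (3.0 * u) * math.cos(t)

# Centered difference quotient of the composition, for comparison.
h = 1e-6
numeric = (comp(t + h) - comp(t - h)) / (2.0 * h)
print(chain, numeric)
```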
The chain rule has various important consequences. For instance, in the case
when m = n it is possible that f has an inverse function g such that f (g(y)) = y.
It follows from the chain rule that

g0 (y) = f 0 (g(y))−1 . (1.94)

In other words, the derivative of an inverse function is obtained by taking the


inverse of a matrix. In this case when the matrix is square, the Jacobian matrix
has a determinant, called the Jacobian determinant. It follows that when the
function has an inverse, the Jacobian determinant is non-zero.
Another nice consequence of the chain rule is the mean value theorem. There
is a problem with the mean value theorem for vector-valued functions. One
would like a theorem of the following form: There exists a t with 0 ≤ t ≤ 1 and

f (y) − f (x) = f 0 ((1 − t)x + ty) (y − x). (1.95)


This would say that there is a point on the segment between x and y where the
derivative accurately predicts the change. But this can be false!
The following is a statement of a true version of the mean value theorem.
The idea is to average over the segment. (The hypothesis of this particular
version is that f 0 (x) not only exists but is continuous in x. A version requiring
differentiability but not continuous differentiability may be found in Rudin.)

Theorem 1.16 (Mean value theorem) Let E be an open convex set in Rn ,


and let f (x) be differentiable with derivative f 0 (x) continuous in x in E. Then
the difference in function values is predicted by the average of the derivative over
the segment:
f (y) − f (x) = ∫₀¹ f′((1 − t)x + ty) dt (y − x).   (1.96)

Furthermore, suppose that for all z in E we have kf 0 (z)k ≤ M . Then it follows


that
|f (y) − f (x)| ≤ M |y − x|. (1.97)

Proof: Use the fundamental theorem of calculus and the chain rule to
compute
f (y) − f (x) = ∫₀¹ (d/dt) f ((1 − t)x + ty) dt = ∫₀¹ f′((1 − t)x + ty)(y − x) dt.   (1.98)

The continuous differentiability assumption guarantees that the integrand is


continuous in t. This gives the identity.
From the identity we get
|f (y) − f (x)| ≤ ∫₀¹ ‖f′((1 − t)x + ty)‖ |y − x| dt ≤ ∫₀¹ M |y − x| dt.   (1.99)

But the integrand in the last integral does not depend on t. So this is just
M |y − x|, as in the theorem. 
The mean value theorem idea also works to prove the result about continuous
partial derivatives.

Theorem 1.17 Consider a function f (x) that is defined and continuous on


some open set. Then f 0 (x) exists and is continuous in x if and only if the
partial derivatives exist and are continuous in x.

Proof: It is evident that if the derivative exists and is continuous, then the
partial derivatives exist and are continuous. All the work is to go the other way.
The existence and continuity of the partial derivatives implies the following
statement. Let z be in the open set on which the partial derivatives exist and
are continuous. Let h be a vector in one of the coordinate directions. Then
df (z + th)/dt = f 0 (z + th)h exists for sufficiently small t, and the matrix of
partial derivatives f 0 (z) is continuous in z. From here on it is sufficient to work


with this hypothesis.
We need to examine f (x + h) − f (x) when h is not in one of the coordinate
directions. Denote thePorthogonal projection of h on the ith coordinate direction
i
by h(i) . Write h[i] = j=1 h(j) . Then
n
X
f (x + h) − f (x) = [f (x + h[i−1] + h(i) ) − f (x + h[i−1] )] (1.100)
i=1

This represents the total change as the sum of changes resulting from increment-
ing one coordinate at a time. We can use the fundamental theorem of calculus
to write this as
f (x + h) − f (x) = Σ_{i=1}^n ∫₀¹ f′(x + h[i−1] + th(i))h(i) dt.   (1.101)

Notice that each term in the sum only involves one coordinate direction. Fur-
thermore each integrand is continuous in t. It follows that
f (x + h) − f (x) − f′(x)h = Σ_{i=1}^n ∫₀¹ [f′(x + h[i−1] + th(i)) − f′(x)]h(i) dt.   (1.102)

Hence
|f (x + h) − f (x) − f′(x)h| ≤ Σ_{i=1}^n ∫₀¹ ‖f′(x + h[i−1] + th(i)) − f′(x)‖ dt |h|.   (1.103)

Then it is not difficult to show that


ε(x, h) = Σ_{i=1}^n ∫₀¹ ‖f′(x + h[i−1] + th(i)) − f′(x)‖ dt → 0   (1.104)

as h → 0. This is seen as follows. For each t between 0 and 1 we have the


estimate |h[i−1] + th(i) | ≤ |h|. If we take |h| small enough, then from the
continuity assumption each integrand kf 0 (x + h[i−1] + th(i) ) − f 0 (x)k is small
uniformly in t. So each integral is small. 
Here are some chain rule examples. Let u = f (x) and p = g(u), so the
composition is p = g(f (x)). The chain rule says
(g ◦ f )′(x) = [ g′1,1 (u)  g′1,2 (u) ] [ f′1,1 (x)  f′1,2 (x)  f′1,3 (x) ]
               [ g′2,1 (u)  g′2,2 (u) ] [ f′2,1 (x)  f′2,2 (x)  f′2,3 (x) ] .   (1.105)

We can also write

    [ ∂p/∂x  ∂p/∂y  ∂p/∂z ]   [ ∂p/∂u  ∂p/∂v ] [ ∂u/∂x  ∂u/∂y  ∂u/∂z ]
    [ ∂q/∂x  ∂q/∂y  ∂q/∂z ] = [ ∂q/∂u  ∂q/∂v ] [ ∂v/∂x  ∂v/∂y  ∂v/∂z ] .   (1.106)
This matrix notation is just another way of writing six equations. For ex-
ample, one of these equations is

∂q/∂y = (∂q/∂u)(∂u/∂y) + (∂q/∂v)(∂v/∂y).   (1.107)

There are several ambiguities in such an expression. The first is that if q is


a quantity that depends on u and some other variables, then ∂q/∂u depends on the choice of variables. If we specify q = g2 (u, v), and define ∂q/∂u = g′2,1 (u, v), then there is no problem. But if there were some other coordinate system u, w for which u is one coordinate, then the function relating q to u, w might be quite different from the function relating q to u, v. Thus ∂q/∂u holding w constant is
quite different from ∂q/∂u holding v constant. For this reason many scientists
use a notation such as ∂q/∂u|v = const or just ∂q/∂u|v.
Example: Consider the situation where

p = h(x, g(x, y)). (1.108)

Then
∂p/∂x = h′,1 (x, g(x, y)) + h′,2 (x, g(x, y)) g′,1 (x, y).   (1.109)
When the functional relationships are specified there is no ambiguity. How-
ever this could also be written with p = h(x, v), v = g(x, y) and hence p =
h(x, g(x, y)). Then
∂p/∂x = ∂p/∂x + (∂p/∂v)(∂v/∂x).   (1.110)
Now the problem is evident: the expression ∂p/∂x is ambiguous, at least until it
is made clear what other variable or variables are held constant. If we indicate
the variable that is held constant with a subscript, we get a more informative
equation
(∂p/∂x)|y = (∂p/∂x)|v + (∂p/∂v)|x (∂v/∂x)|y .   (1.111)
|
In general, if p = h(u, v), u = f (x, y), v = g(x, y), a more precise notation
for partial derivatives should be

(∂p/∂x)|y = (∂p/∂u)|v (∂u/∂x)|y + (∂p/∂v)|u (∂v/∂x)|y .   (1.112)
In practice one usually does not indicate which variables are held constant unless
there is risk of confusion. But one should be clear that the partial derivative
with respect to a variable depends on the entire coordinate system.
The second ambiguity is special to the chain rule. Say that p = g1 (u, v),
q = g2 (u, v) and u = f1 (x, y), v = f2 (x, y). Then

∂q/∂y = g′2,1 (f1 (x, y), f2 (x, y)) f′1,2 (x, y) + g′2,2 (f1 (x, y), f2 (x, y)) f′2,2 (x, y).   (1.113)
This is usually written


∂q/∂y = (∂q/∂u)(∂u/∂y) + (∂q/∂v)(∂v/∂y).   (1.114)
The right hand side first differentiates q regarded as a function of u (holding v
constant), and only after that one replaces u by f1 (x, y) and v by f2 (x, y). So
a more precise notation might indicate that q is defined with the replacement
u ← f1 (x, y), v ← f2 (x, y) before the differentiation ∂q/∂y. On the other hand,
∂q/∂u is defined with the same replacement, but after the differentiation ∂q/∂u.
It is uncommon to indicate this sort of replacement explicitly, but it should be
kept in mind. In the passive interpretation it is hardly necessary, since in that
case one regards u = f1 (x, y), v = f2 (x, y). There may be other situations where
it is necessary for clarity.
Remark: In virtually every application of calculus, the variables that are em-
ployed have specific meanings, and it is natural to write the formulas in terms
of these variables. Even as strict an author as Rudin eventually introduces
notations that employ variables. In multivariable calculus these notations are
not entirely standard. One wants to use notations that will seem familiar to
the mathematical reader, but that work together in a consistent way. Here are
some suggestions.
If y = f (x), then the m by n Jacobian matrix of partial derivatives may be
denoted
∂y/∂x = f′(x).   (1.115)
This can also be written out even more explicitly in the form

∂y/∂x = ∂(y1 , . . . , ym )/∂(x1 , . . . , xn ).   (1.116)

Warning: When m = n some authors (including Rudin) use the notation on the
right hand side for the determinant of the Jacobian matrix.
If p = g(y), the chain rule says that

∂p/∂x = (∂p/∂y)(∂y/∂x).   (1.117)

The multiplication on the right is matrix multiplication.


If y is a scalar, then we often write

dy = (∂y/∂x1 ) dx1 + · · · + (∂y/∂xn ) dxn .   (1.118)
This will eventually have a rigorous definition in the context of differential forms.
One possible meaning of dx is as a column vector of dxi . In this setting we can
write formulas such as
dy/dt = (∂y/∂x)(dx/dt).   (1.119)
On the right hand side this is a row covector times a column vector.
There is another meaning for dx. This is as a formula that occurs in inte-
grands:
dx = dx1 · · · dxn = dx1 ∧ · · · ∧ dxn . (1.120)
The product ∧ is the exterior product, to be explained in the chapter on differ-
ential forms. We shall see that when m = n it is natural to write
dy/dx = det ∂y/∂x = det ∂(y1 , . . . , yn )/∂(x1 , . . . , xn ).   (1.121)

This is the Jacobian determinant that occurs in change of variable formulas.


(Experts may note that this is suggestive of the notation for the Radon-Nikodym
derivative that occurs in the theory of measure and integration.)
|

1.6 Fixed point iteration (multivariable)


Proposition 1.18 Let p be in Rn and consider the closed ball of radius r about
p. Suppose g(x) is defined and continuous for x in some open set including this
ball and has values in Rn . Suppose that kg0 (x)k ≤ c < 1 for x in this set.
Furthermore, suppose that |g(p) − p| ≤ (1 − c)r. Then g maps the ball into
itself and is a strict contraction, so it has a unique fixed point in this ball.
Furthermore, iteration starting in this ball converges to the fixed point.

Proof: From the mean value theorem it follows that |g(x)−g(y)| ≤ c|x−y|.
In order to show that g maps the ball into itself, suppose |x − p| ≤ r. Then
|g(x) − p| ≤ |g(x) − g(p)| + |g(p) − p| ≤ c|x − p| + (1 − c)r ≤ r. 
Sometimes it is helpful have a result where one knows there is a fixed point,
but wants to show that it is stable, in the sense that fixed point iteration starting
near the fixed point converges to it. The following proposition is a variant that
captures this idea.

Proposition 1.19 Let p be a fixed point of g. Suppose that g′ is continuous and that ‖g′(p)‖ < 1. Then for c with ‖g′(p)‖ < c < 1 there is an r > 0 such that ‖g′(x)‖ ≤ c < 1 for x satisfying |x − p| ≤ r. Then g maps this ball into itself and is a strict contraction. Furthermore, iterates starting in this ball converge to the fixed point.

In the multidimensional case this result need not give a particularly good
account of stability, since the stability should be established by the spectral
radius ρ(g0 (p)), and the norm kg0 (p)k can be much larger. So the following
result is better.

Proposition 1.20 Let p be a fixed point of g. Suppose that g0 is continuous and


that the spectral radius ρ(g′(p)) < 1. Then there is a closed ellipsoid centered at p such that g maps the ellipsoid into itself and iterates starting in this ellipsoid converge to the fixed point.
Proof: For every δ > 0 there is a new norm |x|G so that ‖g′(p)‖G ≤ ρ(g′(p)) + δ. Since ρ(g′(p)) < 1, we can pick the norm so that ‖g′(p)‖G < 1. Then the continuity of g′(x) in x shows that for c with ‖g′(p)‖G < c < 1 there is an r > 0 such that |x − p|G ≤ r implies ‖g′(x)‖G ≤ c < 1. Then g maps this ball into itself and is a strict contraction. Furthermore, iterates starting in this ball converge to the fixed point. Of course with respect to the original Euclidean norm this ball is an ellipsoid. 
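A linear example shows why the spectral radius, not the norm, governs stability. The matrix below (illustrative numbers of my choosing) has a large norm but spectral radius 1/2, so the iteration x ← M x converges to the fixed point 0 even though the first step moves the point far away:

```python
# Linear fixed point iteration x <- M x with M = [[0.5, 10], [0, 0.5]].
# Both eigenvalues are 0.5, so the spectral radius is 0.5 < 1, even though
# the norm of M is larger than 10.
M = ((0.5, 10.0), (0.0, 0.5))
x = (0.0, 1.0)
norms = []
for _ in range(100):
    x = (M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1])
    norms.append((x[0] * x[0] + x[1] * x[1]) ** 0.5)
print(norms[0], max(norms), norms[-1])
# The first step has norm about 10; after that the orbit decays to 0.
```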
When n = 2 there is a lovely way of picturing the function and the iteration
process. The idea is to plot vectors g(x) − x. A sequence of such vectors with
the tail of the next one equal to the tip of the previous one indicates the orbit.
The function itself may be pictured by drawing representative orbits. Near a
fixed point p the function g(x) is close to g(p)+g0 (p)(x−p) = p+g0 (p)(x−p).
Thus g(x) − p is close to g0 (p)(x − p), and so the picture resembles the picture
for the linear transformation g0 (p). In particular, the eigenvalues give insight
into the behavior that is expected.
Example: Define a function by

u = f (x, y) = (1/2)(x² − y²) + 1/2   (1.122)
v = g(x, y) = xy + 1/4

This has a fixed point where x and y are both equal to 1/2. The linearization
at the fixed point is

    [ x  −y ]   [ 1/2  −1/2 ]
    [ y   x ] = [ 1/2   1/2 ] .   (1.123)

This has eigenvalues 1/2 ± (1/2)i. The 2 norm is 1, which is at first alarming, but this is an overestimate. The Lipschitz norm is √2/2 < 1, so the fixed point is certainly stable. In this special case the spectral radius is equal to the Lipschitz norm, so it gives no additional information. |
Example: Here again is a simple computer program in R.
f <- function (x,y) (x^2 - y^2)/2 + 1/2
g <- function (x,y) x * y + 1/4
x <- 1
y <- 0
for (i in 1:20) {
u <- f(x,y)
v <- g(x,y)
x <- u
y <- v }
x
[1] 0.4998379
y
[1] 0.5015273 |
Fixed point iteration gives a rather general way of solving equations f (x) = 0.
If A is an arbitrary non-singular matrix, then the fixed points of

g(x) = x − A−1 f (x) (1.124)

are the solutions of the equation. The trick is to pick A such that g is a strict
contraction. However,
g0 (x) = I − A−1 f 0 (x), (1.125)
so the strategy is to take A close to values of f 0 (x) near a point where f (x) is
close to zero. This is illustrated in the following result, which is a reformulation
of a previous result.

Proposition 1.21 Let p be a point in Rn , and consider the closed ball |x − p| ≤ r with r > 0. Suppose that for some c with 0 < c < 1 we have ‖I − A−1 f′(x)‖ ≤ c for x in this ball. Furthermore, suppose that |A−1 f (p)| ≤ (1 − c)r. Then the corresponding fixed point iteration starting in this ball converges to the solution of f (x) = 0 in this ball.

1.7 The implicit function theorem (multivariable)
Theorem 1.22 (Implicit function theorem) Let m ≤ n and let f (x, y) be
a function from Rn to Rm . Here x is in Rm and y is in Rn−m . Suppose f
is continuous with continuous derivative near some point x = a, y = b. Let
I = {1, . . . , m} and J = {m + 1, . . . , n} be the indices corresponding to x and
y. Suppose that the m by m matrix fI0 (a, b) has an inverse matrix. Then there
is a continuous function h(y) defined for y near b with f (h(y), y) = 0. In fact,
the function h(y) has a continuous derivative, and

0 = fI0 (h(y), y)h0 (y) + fJ0 (h(y), y). (1.126)

The equation in the statement of the theorem may be solved to give

h0 (y) = −fI0 (h(y), y)−1 fJ0 (h(y), y). (1.127)

Proof: Let A = fI0 (a, b). Consider the iteration function

g(x, y) = x − A−1 f (x, y). (1.128)

This has partial derivative

gI0 (x, y) = I − A−1 fI0 (x, y). (1.129)

At x = a, y = b this is the zero matrix. Pick some convenient value of c with


0 < c < 1. Then for x, y sufficiently close to a, b this partial derivative has
absolute value bounded by c. In particular, this is true in some box |x − a| ≤
r, |y − b| < s. In this box kgI0 (x, y)k ≤ c, so by the mean value theorem
|g(x′′, y) − g(x′, y)| ≤ c|x′′ − x′|. Fix r > 0. We know that g(a, b) − a = 0.


Hence, if s > 0 is sufficiently small, |g(a, y) − a| ≤ (1 − c)r. We can put these
results together to show that if |x − a| ≤ r, then

|g(x, y)−a| ≤ |g(x, y)−g(a, y)|+|g(a, y)−a| ≤ c|x−a|+(1−c)r ≤ r. (1.130)

So the map x → g(x, y) is a contraction mapping that sends a complete metric


space (the closed ball of x with |x − a| ≤ r) into itself. This shows that for
all y with |y − b| < s there is a fixed point x = h(y). This proves that
g(h(y), y) = h(y) and hence that f (h(y), y) = 0.
It is easy to see from the contraction mapping principle that the fixed point h(y) is a continuous function of the parameter y. Consider y′ near y. Then g(h(y), y) = h(y) and g(h(y′), y′) = h(y′). So h(y′) − h(y) = g(h(y′), y′) − g(h(y), y) = g(h(y′), y′) − g(h(y), y′) + g(h(y), y′) − g(h(y), y). This gives |h(y′) − h(y)| ≤ c|h(y′) − h(y)| + |g(h(y), y′) − g(h(y), y)|. Write this as (1 − c)|h(y′) − h(y)| ≤ |g(h(y), y′) − g(h(y), y)|. Then as y′ → y we have g(h(y), y′) → g(h(y), y), and hence h(y′) → h(y).
It remains to show that h has a continuous derivative. Let u be a fixed
vector. The directional derivative h0 (y)u is a vector computed as the limit of
the difference quotient (h(y + ku) − h(y))/k as k → 0. In order to get a handle
on this, compute

0 = f (h(y + ku), y + ku) − f (h(y), y). (1.131)

Each term on the right is zero. Then expand


0 = f (h(y + ku), y + ku) − f (h(y), y) = ∫₀¹ (d/dt) f (th(y + ku) + (1 − t)h(y), t(y + ku) + (1 − t)y) dt.   (1.132)
By the chain rule for partial derivatives the right hand side is the sum of two
terms, one from the I derivatives, and one from the J derivatives. The first
term is given by A(k)(h(y + ku) − h(y)), where
A(k) = ∫₀¹ [f′I (th(y + ku) + (1 − t)h(y), t(y + ku) + (1 − t)y)] dt.   (1.133)

The second term is B(k)ku, where


B(k) = ∫₀¹ [f′J (th(y + ku) + (1 − t)h(y), t(y + ku) + (1 − t)y)] dt.   (1.134)

This gives the solution

(h(y + ku) − h(y))/k = −A(k)⁻¹B(k)u.   (1.135)
The quantities A(k) and B(k) are continuous in k, because of the continuity
of h. Now let k → 0 and apply the dominated convergence theorem to justify
taking the limit inside the integral. The integrands become independent of k,
and we obtain the desired formula for this vector, namely

h0 (y)u = −A(0)−1 B(0)u = −fI0 (h(y), y)−1 fJ0 (h(y), y)u. (1.136)

That is, the directional derivative is the matrix

h0 (y) = −fI0 (h(y), y)−1 fJ0 (h(y), y). (1.137)

The right hand side of this is continuous in y. This shows that the left hand side
is continuous in y. As a consequence, h(y) as a function of y is differentiable,
and the derivative h0 (y) as a function of y is continuous. 
The last part of the above proof seems complicated but is actually a straight-
forward application of the technique of the mean value theorem. It follows
unpublished notes of Joel Feldman.
The implicit function theorem has a geometric interpretation. Consider the
case m < n and a function f (x, y) for x in Rm and y in Rn−m , where the
function values are in Rm . A surface of dimension n−m in Rn is given implicitly
by f (x, y) = c , that is, f (x, y) − c = 0. The theorem says that we can write
x = h(y), where h is a function from Rn−m to Rm , such that f (h(y), y) = c.
Thus x = h(t), y = t is a parametric representation of the surface.
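The derivative formula can be tested on the circle x² + y² − 1 = 0 (m = 1, n = 2). Near the point (0.8, 0.6) one can solve for x = h(y) = √(1 − y²), and the theorem's formula gives h′(y) = −(2x)⁻¹(2y) = −y/x. A Python sketch comparing this with a difference quotient:

```python
import math

# Implicit curve f(x, y) = x^2 + y^2 - 1 = 0, solved for x = h(y).
def h(y):
    return math.sqrt(1.0 - y * y)

y = 0.6
x = h(y)                            # 0.8
formula = -(2.0 * y) / (2.0 * x)    # -f_I'^{-1} f_J' = -y/x = -0.75

# Centered difference quotient of h, for comparison.
k = 1e-6
numeric = (h(y + k) - h(y - k)) / (2.0 * k)
print(formula, numeric)
```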
Example: Consider the example from Rudin with

p = f1 (x, y, u, v, w) = 2e^x + yu − 4v + 3   (1.138)
q = f2 (x, y, u, v, w) = y cos(x) − 6x + 2u − w.

The problem is to solve f1 (x, y, u, v, w) = 0, f2 (x, y, u, v, w) = 0 near (0, 1, 3, 2, 7)


for x, y in terms of u, v, w. For this to be possible in principle, one needs to
have x, y involved in the equation in a non-trivial way. It is sufficient to have
the linearization

    [ ∂p/∂x  ∂p/∂y ]   [ 2e^x            u      ]   [  2  3 ]
    [ ∂q/∂x  ∂q/∂y ] = [ −y sin(x) − 6   cos(x) ] = [ −6  1 ]   (1.139)

have an inverse. But the inverse is


 −1  
2 3 1 1 −3
= . (1.140)
−6 1 20 6 2

So one can use the iteration function


r = g1 (x, y; u, v, w) = x − (1/20)(f1 (x, y, u, v, w) − 3f2 (x, y, u, v, w))   (1.141)
s = g2 (x, y; u, v, w) = y − (1/20)(6f1 (x, y, u, v, w) + 2f2 (x, y, u, v, w)).
For each fixed value of u, v, w near (3, 2, 7) this should have a fixed point near
(0,1). |
Example: Here is an R program to carry this out for the input (u, v, w) =
(3, 2, 6), which one hopes is near enough to (u, v, w) = (3, 2, 7) where we know
the solution.
f1 <- function (x,y,u,v,w) 2 * exp(x) + y * u - 4 * v + 3
f2 <- function (x,y,u,v,w) y * cos(x) - 6 * x + 2 * u - w
x <- 0
y <- 1
u <- 3
v <- 2
w <- 6
g1 <- function (x,y) x - ( f1(x,y,u,v,w) - 3 * f2(x,y,u,v,w) )/20
g2 <- function (x,y) y - ( 6 * f1(x,y,u,v,w) + 2 * f2(x,y,u,v,w) )/20
for (n in 1:40) {
r <- g1(x,y)
s <- g2(x,y)
x <- r
y <- s }
x
[1] 0.1474038
y
[1] 0.8941188
This result is for input (3,2,6), and it is reasonably close to the point (0,1)
that one would get for input (3,2,7). Of course to get a better idea of the function
this computation needs to be repeated for a variety of inputs near (3,2,7). Even
then one only gets an idea of the function at inputs sufficiently near this point.
At inputs further away fixed point iteration may fail, and the behavior of the
function is harder to understand. |

Theorem 1.23 (Inverse function theorem) Let y = f (x) be a function from


Rm to Rm . Here x is in Rm and y is in Rm . Suppose f is continuous with
continuous derivative near some point x = a. Suppose that the m by m matrix
f′(a) has an inverse matrix. Then there is a continuous function h(y) defined for
y near b = f (a) with f (h(y)) = y. In fact, the function h(y) has a continuous
derivative, and
h0 (y) = f 0 (h(y))−1 . (1.142)

The inverse function theorem is the special case of the implicit function
theorem when f (x, y) is of the form f (x) − y. It is of great importance. For
example, if a system is described by variables x1 , . . . , xn , and ui = fi (x1 , . . . , xn )
gives new variables, we might want to describe the system by u1 , . . . , un . This
would be a passive transformation. But can one recover the original variables?
If the matrix ∂ui /∂xj is non-singular, then the inverse function theorem says that it
should be possible. That is, we have xj = hj (u1 , . . . , un ).
1.8 Second order partial derivatives


In this section we consider scalar functions of several variables. For instance,
consider u = f (x) = f (x, y, z). The first order partial derivatives
 
[f′,1 (x), f′,2 (x), f′,3 (x)] = (∂u/∂x, ∂u/∂y, ∂u/∂z)   (1.143)

naturally form a covector. How about the second order partial derivatives?
These can be arranged in a matrix
                        [ f″,11 (x)  f″,12 (x)  f″,13 (x) ]   [ ∂²u/∂x²   ∂²u/∂x∂y  ∂²u/∂x∂z ]
f″(x) = f″(x, y, z) =   [ f″,21 (x)  f″,22 (x)  f″,23 (x) ] = [ ∂²u/∂y∂x  ∂²u/∂y²   ∂²u/∂y∂z ]   (1.144)
                        [ f″,31 (x)  f″,32 (x)  f″,33 (x) ]   [ ∂²u/∂z∂x  ∂²u/∂z∂y  ∂²u/∂z²  ]
One ordinarily expects that this is a symmetric matrix, that is,

f″,12 (x) = ∂²u/∂x∂y = ∂²u/∂y∂x = f″,21 (x)   (1.145)

and similarly for the other three cases.


The symmetry of this matrix is due to the equality of mixed partial derivatives. This fact is not completely obvious. Here is a strategy for showing that
it is true. There is little loss of generality in considering the case of a func-
tion f (x, y) of two variables. Define ∆1 (h)f (x, y) = f (x + h, y) − f (x, y) and
∆2 (k)f (x, y) = f (x, y + k) − f (x, y). Then ∆1 (h)f (x, y)/h → f′,1 (x, y) as h → 0 and ∆2 (k)f (x, y)/k → f′,2 (x, y) as k → 0.
The key to the mixed partial derivatives identity is the algebraic identity

∆1 (h)∆2 (k)f (x, y) = ∆2 (k)∆1 (h)f (x, y). (1.146)

Then the obvious attempt at a proof is to write


lim_{h→0} ∆2 (k)∆1 (h)f (x, y)/(hk) = ∆2 (k)f′,1 (x, y)/k   (1.147)

and

lim_{k→0} ∆2 (k)f′,1 (x, y)/k = f″,21 (x, y).   (1.148)
Then

f″,12 (x, y) = lim_{h→0} lim_{k→0} ∆1 (h)∆2 (k)f (x, y)/(hk) = lim_{k→0} lim_{h→0} ∆2 (k)∆1 (h)f (x, y)/(hk) = f″,21 (x, y).   (1.149)
The trouble with this is that it is possible to find clever examples where the in-
terchange of limits does not work. However if we make a continuity assumption,
then this problem goes away.
Theorem 1.24 (Equality of mixed partial derivatives) Assume that f (x, y) and f′,1 (x, y) and f′,2 (x, y) and f″,21 (x, y) are continuous. Then it follows that f″,12 (x, y) = f″,21 (x, y).
Proof: By the mean value theorem

∆2 (k)∆1 (h)f (x, y)/(hk) = ∆2 (k)f′,1 (x + h∗ , y)/k = f″,21 (x + h∗ , y + k∗ ).   (1.150)

Here h∗ is between 0 and h and k∗ is between 0 and k. Consider ε > 0. Choose h and k so small that f″,21 (x + h∗ , y + k∗ ) is a distance less than ε from f″,21 (x, y). Thus ∆2 (k)∆1 (h)f (x, y)/(hk) is a distance less than ε from f″,21 (x, y). Now let k → 0. We conclude that for h sufficiently small (depending on ε) we have that ∆1 (h)f′,2 (x, y)/h is a distance less than or equal to ε from f″,21 (x, y). This proves that f″,12 (x, y) = f″,21 (x, y). 
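The double difference quotient from the proof is also a practical way to approximate a mixed partial numerically. For f (x, y) = x³y², where f″,12 = 6x²y exactly, a Python sketch (the test function is my own choice):

```python
# Approximate the mixed partial by the double difference quotient
# Delta_2(k) Delta_1(h) f(x, y) / (hk) used in the proof.
def f(x, y):
    return x**3 * y**2

def mixed_quotient(x, y, h, k):
    return (f(x + h, y + k) - f(x, y + k) - f(x + h, y) + f(x, y)) / (h * k)

x, y = 1.0, 2.0
exact = 6.0 * x**2 * y        # analytic value of the mixed partial
approx = mixed_quotient(x, y, 1e-4, 1e-4)
print(exact, approx)
```

Incrementing in the other order gives the same quotient exactly, since ∆1(h)∆2(k)f = ∆2(k)∆1(h)f as an algebraic identity.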
The conclusion is that, apart from technicalities, we may safely assume the equality of mixed partial derivatives. This has a number of important consequences.
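Numerically, the equality of mixed partial derivatives can be checked by computing the double difference quotient ∆1(h)∆2(k)f(x, y)/(hk) and comparing it with the exact mixed partial derivative. A minimal Python sketch (the test function sin(xy), the base point, and the step sizes are arbitrary choices):

```python
import math

def f(x, y):
    return math.sin(x * y)

def mixed_quotient(f, x, y, h, k):
    # Delta1(h) Delta2(k) f(x, y) / (h k); by the algebraic identity (1.146)
    # this is symmetric in the order in which the differences are applied.
    return (f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)) / (h * k)

x, y, h, k = 0.5, 0.3, 1e-4, 1e-4
approx = mixed_quotient(f, x, y, h, k)
exact = math.cos(x * y) - x * y * math.sin(x * y)  # d^2/(dx dy) of sin(xy)
print(approx, exact)
```

The two printed values agree to several decimal places, as the continuity hypothesis of the theorem predicts.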
Theorem 1.25 Consider a covector of functions p1 = h1(x1, . . . , xn), . . . , pn = hn(x1, . . . , xn), each of which has continuous derivatives. Suppose that this covector is integrable, in the sense that there is a function u = f(x1, . . . , xn) with continuous second partial derivatives satisfying

    du = p1 dx1 + · · · + pn dxn.    (1.151)

Then for each i, j the integrability condition

    ∂pi/∂xj = ∂pj/∂xi    (1.152)

is satisfied.
Proof: The proof depends on the equality of mixed partial derivatives. For each k we have

    pk = ∂u/∂xk.    (1.153)

Therefore

    ∂pi/∂xj = (∂/∂xj)(∂u/∂xi) = (∂/∂xi)(∂u/∂xj) = ∂pj/∂xi.    (1.154)

□
Another important application is to the first and second derivative tests.
Theorem 1.26 (First derivative test) Assume that u = f(x1, . . . , xn) has continuous first partial derivatives. Suppose that there is a point where there is a local minimum or a local maximum. Then at that point the differential

    du = (∂u/∂x1) dx1 + · · · + (∂u/∂xn) dxn    (1.155)

is zero, that is, each partial derivative is zero at that point.

A point where the differential du vanishes is called a critical point. The value of u at a critical point is called a critical value. At a critical point, the symmetric matrix of second partial derivatives has a special significance. In this context it is called the Hessian matrix.

Theorem 1.27 (Second derivative test) Assume that u = f(x1, . . . , xn) has continuous second partial derivatives. Suppose that there is a point where the derivative covector f′(x) with entries

    f′,i(x) = ∂u/∂xi    (1.156)

vanishes. Consider the second derivative (Hessian) matrix f″(x) with entries

    f″,ij(x) = ∂²u/∂xi∂xj.    (1.157)

This is a symmetric matrix with real eigenvalues. If at the given point all eigenvalues of this matrix are strictly positive, then the function has a local minimum. Similarly, if all eigenvalues of this matrix are strictly negative, then the function has a local maximum.

1.9 Problems
Problems 1: Fixed point iteration
1. Let g(x) = cos(x/2). It has a stable fixed point r > 0 with g(r) = r. Use fixed point iteration to find a numerical value for r. Also find g′(r).

2. There is a theorem that says that if |g′(x)| ≤ c < 1 in an interval of the form [p − r, p + r], and if |g(p) − p| ≤ (1 − c)r, then g has a fixed point in the interval. Use this to give a proof that g(x) = cos(x/2) has a fixed point in the interval [π/6, π/2].

3. Say that g : [a, b] → [a, b] is a continuous function. Suppose that g is


increasing, in the sense that x ≤ y implies g(x) ≤ g(y). Show that fixed
point iteration starting with a always leads to a fixed point r.

4. Let g(x) = (1/2)(x² + 2x³ − x⁴). This has four fixed points r1 < r2 < r3 < r4. Find them, and specify which ones are stable. Compute everything exactly.

5. In the preceding problem, prove that if r1 < x < r3 , then fixed point
iteration starting at x converges to r2 . Give a detailed discussion. Hint:
It may help to carefully draw a graph and use the graphical analysis of
fixed point iteration. Do not make assumptions about the graph that are
not justified.

6. In physics one calculates the spontaneous magnetization m by the fixed point equation

    m = tanh(Jm).    (1.158)

Here J > 0 is a fixed parameter. This equation may be solved by fixed point iteration using the function g(x) = tanh(Jx). Notice that g′(x) = J sech²(Jx) satisfies 0 < g′(x) < J except at x = 0, where g′(0) = J. In particular g is increasing. Similarly, one can compute that g″(x) < 0 for x > 0 and g″(x) > 0 for x < 0. Also remember that g(x) → ±1 as x → ±∞.
(a) Suppose J > 1. Describe the fixed points. Discuss stability. Sketch
orbits.
(b) For each of the fixed points, describe the set of x such that fixed point
iteration starting at x converges to that fixed point.
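A minimal Python sketch of the fixed point iteration asked for in Problem 1 (the starting point and the iteration count are arbitrary choices):

```python
import math

def iterate(g, x, n):
    # repeatedly apply g; near a stable fixed point the iterates converge
    for _ in range(n):
        x = g(x)
    return x

def g(x):
    return math.cos(x / 2)

r = iterate(g, 1.0, 100)
print(r, abs(r - g(r)))  # the fixed point, and a tiny residual
```

Since |g′(r)| < 1 at the fixed point, the iterates converge geometrically.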

Recitation 1
1. Use fixed point iteration to numerically find the largest root r of f(x) = x³ − 5x² + 3x + 1 = 0. Use g(x) = x − f(x)/f′(s), where s is chosen close to the unknown root r. (Since f(4) = −3 is not very large, perhaps s could be near 4.) Start the iteration near the root.
2. Consider a smooth function f(x) with a simple root r, that is, f(r) = 0 and f′(r) ≠ 0. Let g(x) = x − f(x)/f′(x). Find g′(x). Find g′(r).
3. Use the iteration function of the previous problem to numerically find the
largest root for the example of the first problem.
4. Suppose that g : [a, b] → [a, b] is an increasing function: x ≤ y implies
g(x) ≤ g(y). Prove or disprove the following general assertion: There
exists s in [a, b] such that s is not a fixed point and iteration starting at s
converges to a fixed point.
5. Suppose that g : [a, b] → [a, b] is an increasing function: x ≤ y implies
g(x) ≤ g(y). Prove or disprove the following general assertion: The func-
tion g has a fixed point.
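The iteration function of Problem 2 is Newton's method. A minimal Python sketch for the polynomial of Problem 1 (the starting point 4.5 is an arbitrary choice near the largest root):

```python
def f(x):
    return x**3 - 5*x**2 + 3*x + 1

def fprime(x):
    return 3*x**2 - 10*x + 3

# Newton iteration: g(x) = x - f(x)/f'(x), as in Problem 2
x = 4.5
for _ in range(20):
    x = x - f(x) / fprime(x)
print(x, f(x))  # the largest root; f(x) is essentially zero
```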

Problems 2: The size of a matrix


1. Let H be the real symmetric matrix

        [ 10  0  2 ]
    H = [  0 10  4 ] .    (1.159)
        [  2  4  2 ]

This matrix has determinant zero, so one eigenvalue is zero. Find all eigenvalues. Find the corresponding eigenvectors, as column vectors. (Are they orthogonal?) Produce a matrix P with the normalized eigenvectors

as columns. Show that PᵀP = I. Show by explicit computation that HP = PΛ, where Λ is diagonal. Find the spectral representation of H.

2. Let A be the real matrix

    A = [  3  1  1 ] .    (1.160)
        [ −1  3  1 ]

Find the Lipschitz norm of A (the square root of the largest eigenvalue of AᵀA). Find the 2-norm of A (the square root of the sum of squares of entries, or, equivalently, the square root of the trace of AᵀA). Compare them.

3. This problem deals with the Lipschitz norm. Say that A is a real square matrix. The claim is that it is always true that ‖A²‖ = ‖A‖². Prove or disprove.

4. Let R be the real symmetric matrix

        [ 1 1 1 1 1 1 1 ]
        [ 1 1 1 1 1 1 1 ]
        [ 1 1 1 1 1 1 1 ]
    R = [ 1 1 1 1 1 1 1 ] .    (1.161)
        [ 1 1 1 1 1 1 1 ]
        [ 1 1 1 1 1 1 1 ]
        [ 1 1 1 1 1 1 1 ]

Find the Lipschitz norm ‖R‖.

5. Find all real square matrices A such that ‖A‖ = ‖A‖₂. If you need a hint, see below.
Hint: Consider a vector x that is not the zero vector, and another vector a. The Schwarz inequality says that the inner product a · x satisfies |a · x| ≤ |a||x|, with equality only when a = cx. (Since a · x = |a||x| cos(θ), this is when cos(θ) = ±1, so the vectors are either pointing in the same or opposite direction.)
Use the Schwarz inequality for each i to prove

    |Ax|² = Σᵢ (Σⱼ aᵢⱼxⱼ)² ≤ Σᵢ (Σⱼ aᵢⱼ² Σₖ xₖ²) = ‖A‖₂² |x|².    (1.162)

When is this an equality? (Consider the situation for each fixed i.) Once you have the form of the matrix you can calculate AᵀA and evaluate the norms.
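The Lipschitz norm, the square root of the largest eigenvalue of AᵀA, can also be estimated numerically by power iteration on AᵀA. A minimal Python sketch for 2 by 2 matrices (the example matrix is an arbitrary choice, not one of the matrices above):

```python
import math

def lipschitz_norm_2x2(a, iterations=100):
    # Form A^T A; its largest eigenvalue is the square of the Lipschitz norm.
    ata = [[sum(a[k][i] * a[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    x = [1.0, 1.0]
    lam = 0.0
    for _ in range(iterations):
        y = [ata[0][0] * x[0] + ata[0][1] * x[1],
             ata[1][0] * x[0] + ata[1][1] * x[1]]
        lam = math.sqrt(y[0] ** 2 + y[1] ** 2)  # converges to the largest eigenvalue
        x = [y[0] / lam, y[1] / lam]
    return math.sqrt(lam)

# For A = [[2, 1], [0, 1]], A^T A = [[4, 2], [2, 2]] has largest
# eigenvalue 3 + sqrt(5), so the Lipschitz norm is sqrt(3 + sqrt(5)).
print(lipschitz_norm_2x2([[2.0, 1.0], [0.0, 1.0]]))
```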

Recitation 2
1. Describe all 2 by 2 matrices with only one eigenvalue that are not diago-
nalizable.

2. Consider the matrix

    A = [ −3 16 ] .    (1.163)
        [ −1  5 ]

Find the eigenvalue. Let δ ≠ 0. Find an invertible matrix P with AP = PJ, where

    J = [ λ δ ] .    (1.164)
        [ 0 λ ]

The matrix P will depend on δ.

3. Consider 0 ≤ r ≤ s. Must there be a 2 by 2 matrix with spectral radius r


and norm s? Prove that your answer is correct.

4. Let

    R = [ cos 2θ   sin 2θ ]  =  [ cos θ ] [ cos θ  sin θ ]  −  [  sin θ ] [ sin θ  −cos θ ].    (1.165)
        [ sin 2θ  −cos 2θ ]     [ sin θ ]                      [ −cos θ ]

Check this identity. Find the eigenvalues and eigenvectors. Find R².

5. Find the eigenvalues and eigenvectors of

    A = [  1    1/10 ] .    (1.166)
        [ 1/10   1   ]

Use a computer program to show a trajectory heading toward the origin near the stable direction (0 < λ < 1) and then heading out along the unstable direction (1 < λ). This gives a pictorial way of seeing eigenvectors! Here is a sample program. Try it with other initial conditions!
x <- vector()
y <- vector()
u <- 10
v <- -10 + 1/10
for (i in 1:40) {
  # apply the matrix A to (u, v); keep the old u so that both
  # components are computed from the same iterate
  unew <- u + v/10
  v <- u/10 + v
  u <- unew
  x[i] <- u
  y[i] <- v }
frame()
plot(x,y)
lines(x,y)

Problems 3: The derivative as a matrix


1. The problem is to solve f(u, v) = 0 for u = h(v) as a function of v. Suppose f(u0, v0) = 0. If we want to find a solution with u0 = h(v0), we can set m = ∂f(u, v)/∂u evaluated at the point. Assume m ≠ 0. Then we can set g(u, v) = u − (1/m)f(u, v). Then ∂g(u, v)/∂u will be close to zero near the point. (Why?) So fixed point iteration u ↦ g(u, v) with v fixed near v0 should work to produce a solution.
Say that instead we want to solve f(u, v) = 0 for v = k(u) as a function of u. Specify the assumption that is needed, and describe the procedure.

2. This continues the previous problem. Consider f(u, v) = u³ − v². Describe all points u, v for which the above results give a solution for u as a function of v. Describe all points u, v for which the above results give a solution for v as a function of u. Graph the equation f(u, v) = 0 with some care. Indicate the functions u = h(v) and v = k(u) that you get. To what extent are they uniquely specified?

3. Say that u = x³ − 3xy² and v = 3x²y − y³. Find the 2 by 2 derivative matrix. What is the determinant of this matrix? What is its inverse?

4. This continues the previous problem. Say that s = sin(u e^v). Find ∂s/∂u and ∂s/∂v. Find s as a function of x and y. Use the chain rule to evaluate the two entries of the derivative matrix (row covector) ∂s/∂x and ∂s/∂y.

5. Let

    u = f(x, y) = x³/(x² + y²)    (1.167)

with f(0, 0) = 0 at the origin.
a) Show that u is continuous at the origin by direct calculation using the definition of continuity.
b) Evaluate ∂u/∂x = f′,1(x, y) and ∂u/∂y = f′,2(x, y) away from the origin. Evaluate ∂u/∂x and ∂u/∂y at the origin, using the definition of the (one-dimensional) derivative.
c) Is u = f(x, y) a C¹ function? That is, are the partial derivatives ∂u/∂x and ∂u/∂y both continuous? Prove that your answer is correct by direct calculation.
d) The condition for u = f(x, y) to be differentiable at the origin is that

    f(h, k) = f(0, 0) + f′,1(0, 0)h + f′,2(0, 0)k + r(0, 0; h, k)    (1.168)

with |r(0, 0; h, k)|/√(h² + k²) → 0 as √(h² + k²) → 0. Is the function differentiable? Using only this definition, prove that your answer is correct.
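The procedure described in Problem 1 can be sketched in a few lines of Python. Here it is applied to f(u, v) = u³ − v² near (u0, v0) = (1, 1), where m = ∂f/∂u = 3; the particular value of v and the iteration count are arbitrary choices:

```python
def f(u, v):
    return u**3 - v**2

m = 3.0  # df/du at (u0, v0) = (1, 1)

def g(u, v):
    # iteration function: u -> u - (1/m) f(u, v)
    return u - f(u, v) / m

v = 1.1
u = 1.0
for _ in range(100):
    u = g(u, v)
print(u, f(u, v))  # u solves u^3 = v^2, so f(u, v) is essentially zero
```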

Recitation 3
1. Define

g1 (x, y) = xy − 2x − 2y + 6 (1.169)
g2 (x, y) = xy − 2x + 1.

Find all fixed points by explicit computation. (Hint: Start by eliminating


the xy term.) Find the derivative matrix at each fixed point. Discuss
eigenvalues and stability.
2. The obvious general mean value theorem would say that if f (x) is a con-
tinuously differentiable function from a convex domain in Rn to Rm , then
there exists t with 0 ≤ t ≤ 1 such that

f (q) − f (p) = f 0 ((1 − t)q + tp)(q − p). (1.170)

But is this true? Hint: Take n = 1 and m = 2 and f (x) = [x2 , x3 ]T .


3. Say that a function is C k if it has k derivatives, all continuous.
a) For each k, find a function f from the real line to itself that is C k but
not C k+1 . Hint: It may help to take f (x) = 0 for x ≤ 0.
b) A function is C ∞ if it is C k for every k. Find a function with f (x) = 0
for x ≤ 0 that is C ∞ and is not the zero function.
4. a) Let X and Y be metric spaces. Let f : X → Y be a function. Suppose
that f is Lipschitz with Lipschitz constant c. This means that the distance
from f (x) to f (y) is bounded by c times the distance from x to y. (The
Lipschitz norm of f is the least such constant.) Prove that f is uniformly
continuous.
b) Let X = Y = [0, 1]. Find a function f : X √ → Y that is uniformly
continuous but not Lipschitz.
√ Hint: Take f (x) = x. Show that if x < y

and y − x < 2 , then y − x < .

Problems 4: Implicit function theorem


1. Consider the surface in R³ given by

    s = x⁴ + x²y² + y⁴ + y²z² + z⁴ + z²x² = 1.    (1.171)

(a) Calculate the differential of the function defining the surface. For which points on the surface does the differential vanish? (b) For which points on the surface does the implicit function theorem define at least one of the variables as a function of the other two near the point?
2. (a) In the preceding problem it should be possible to solve for y in terms of x, z near the point (0, 1, 0). Find a function g(y; x, z) such that fixed point iteration y ↦ g(y; x, z) with this function (for fixed x, z) gives the corresponding value y. Express this function in terms of s − 1 = f(x, y, z).
(b) Answer the same question if y is to be expressed in terms of x, z near (0, −1, 0).

3. Consider the function given by

    u = f1(x, y) = x⁴ − 6x²y² + y⁴    (1.172)
    v = f2(x, y) = 4x³y − 4xy³.    (1.173)

(a) Find the derivative matrix.
(b) Find the determinant of the derivative matrix. Find all points at which this determinant vanishes.
(c) Show that the four points (x, y) = (±1, 0), (x, y) = (0, ±1) all map to (u, v) = (1, 0). Near which of these points is there an inverse function?

4. The function of the previous problem maps the point (0, 1) to (1, 0). There is an inverse function that sends points (u, v) near (1, 0) to points (x, y) near (0, 1). Find functions g1(x, y; u, v) and g2(x, y; u, v) such that fixed point iteration with these functions (for fixed (u, v)) gives the corresponding inverse values x, y. Express these functions in terms of f1(x, y), f2(x, y) and u, v. (Use the algorithm involving the derivative matrix of f1(x, y), f2(x, y) evaluated at the point (0, 1).)

5. Consider the iteration function g1 (x, y; u, v), g2 (x, y; u, v) found in the pre-
vious problem. Show that if |x| < 1/100 and |y − 1| < 1/100, then the
linearization at such a point has norm bounded by 1/2. (Hint: Bound the
2-norm.)

6. (a) Consider an equation f(p, v, t) = 0. Consider a particular point in R³ that gives a solution of this equation. Describe the condition that guarantees that one can solve for v as a function of p, t near this point.
(b) In physics the quantities p, v, t are related to pressure, volume, and temperature. An important equation relating these quantities is

    f(p, v, t) = (p + 3/v²)(v − 1/3) − (8/3)t = 0.    (1.174)

Show that p = 1, v = 1, t = 1 is a solution.
(c) Can the implicit function theorem be used to specify v as a function of p, t near this point where p = 1, v = 1, t = 1? Justify your answer.
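The kind of iteration described in Problem 2(a) can be sketched in Python. For the surface of Problem 1 one has ∂s/∂y = 4 at (0, 1, 0), which suggests the iteration y ↦ y − (s − 1)/4; the small values of x and z below are arbitrary choices:

```python
def f(x, y, z):
    # s - 1, where s = 1 defines the surface of Problem 1
    return (x**4 + x**2*y**2 + y**4 + y**2*z**2 + z**4 + z**2*x**2) - 1.0

m = 4.0  # ds/dy at the point (0, 1, 0)

x, z = 0.1, 0.05
y = 1.0
for _ in range(100):
    y = y - f(x, y, z) / m   # fixed point iteration y -> g(y; x, z)
print(y, f(x, y, z))  # y near 1 with f essentially zero
```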

Recitation 4
1. (a) Is xy⁴ dx + 2x²y³ dy exact? Justify your answer.
(b) Is 3x²y² dx + 4x³y dy exact? Justify your answer.
(c) Can 3x²y² dx + 4x³y dy be multiplied by a power y^m to produce an exact differential? Justify your answer.
2. Let

    u = f(x, y) = x²y² − 2xy² + 5x² − 10xy + 7y² − 10x + 10y.    (1.175)

There is a point on the line y = 0 where du = 0. Find it, and find the corresponding value of u. Compute the Hessian matrix. Apply the second derivative test to establish whether it is a local minimum, local maximum, or saddle point (or something else).

3. The second derivative. Say that we want to minimize w = h(u, v) by application of the first derivative test and the second derivative test. However, we want to use coordinates x, y related to u, v by u = f(x, y), v = g(x, y). The corresponding 2 by 2 derivative matrix is assumed to be invertible.
Show that at a point where ∂w/∂u = 0 and ∂w/∂v = 0 it is also true that ∂w/∂x = 0 and ∂w/∂y = 0.
Consider such a point where the first partial derivatives vanish. Suppose that at this point the quadratic form

    (∂²w/∂u²) p² + 2 (∂²w/∂u∂v) pq + (∂²w/∂v²) q² > 0    (1.176)

for all p, q other than 0, 0. Show that ∂²w/∂x² > 0.
4. Let g(x, y) = (x² − y²)/(x² + y²) and let g(0, 0) = 0. Then g(x, y) is a bounded function that is C² away from (0, 0). Show that g(x, 0) = 1 for x ≠ 0, g(0, y) = −1 for y ≠ 0. (Thus g is not continuous at (0, 0).) Show also that x ∂g(x, y)/∂x and y ∂g(x, y)/∂y are bounded functions in the region away from (0, 0). In the following use only these general properties of g(x, y).
Let f(x, y) = xy g(x, y). Calculate ∂f(x, y)/∂x and ∂f(x, y)/∂y at the origin. Show that f(x, y) is C¹. Compute ∂f(x, y)/∂x along y = 0. Calculate ∂f(x, y)/∂y along x = 0. Show that your answers also work at the origin. Compute the mixed second partial derivatives at the origin.
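For this particular f the two mixed partial derivatives at the origin can be estimated numerically, and they come out unequal; this illustrates why the continuity hypothesis of Theorem 1.24 matters. A minimal Python sketch (the step sizes are arbitrary choices, with h much smaller than k):

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x*x - y*y) / (x*x + y*y)

def fx(x, y, h):
    # central difference approximation to df/dx
    return (f(x + h, y) - f(x - h, y)) / (2*h)

def fy(x, y, h):
    # central difference approximation to df/dy
    return (f(x, y + h) - f(x, y - h)) / (2*h)

h, k = 1e-8, 1e-3
m21 = (fx(0.0, k, h) - fx(0.0, -k, h)) / (2*k)  # d/dy of df/dx at (0, 0)
m12 = (fy(k, 0.0, h) - fy(-k, 0.0, h)) / (2*k)  # d/dx of df/dy at (0, 0)
print(m21, m12)  # approximately -1 and +1
```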
Chapter 2

Integration


2.1 The Riemann integral


The rest of these lectures centers on integration, and here the foundational result
is the change of variables theorem. In dimension greater than one this is a non-
trivial result; the limiting process that defines the derivative is related to the
limiting process that defines the integral in a rather subtle way. This chapter
on the Riemann integral is directed toward making this relation explicit.
In the following the integral is defined in terms of upper sums and lower
sums. The notions of supremum (least upper bound) and infimum (greatest
lower bound) play a role. For easy reference, here are the definitions.
Let S be a set of real numbers. Then sup S is the least upper bound of S.
That is, it is an upper bound:

∀x ∈ S x ≤ sup S, (2.1)

and it is the least upper bound: For every b we have

∀x ∈ S x ≤ b ⇒ sup S ≤ b. (2.2)

Equivalently,
b < sup S ⇒ ∃x ∈ S b < x. (2.3)
Similarly, inf S is the greatest lower bound of S. That is, it is a lower bound:

∀x ∈ S inf S ≤ x, (2.4)

and it is the greatest lower bound: For every b we have

∀x ∈ S b ≤ x ⇒ b ≤ inf S. (2.5)

Equivalently,
inf S < b ⇒ ∃x ∈ S x < b. (2.6)
If f is a real function defined on some set, then the supremum and infimum of
the function are the supremum and infimum of the set of values of the function.
An interval is a subset of R that is connected. It is degenerate if it is empty or consists of only one point. An n-dimensional cell is a subset of Rⁿ that is a product of n intervals. An n-dimensional cell is bounded if and only if each of the n intervals is bounded. For a bounded cell we may define the n-dimensional volume by

    mₙ(I) = ∏ᵢ₌₁ⁿ ∆xᵢ,    (2.7)

where ∆xᵢ ≥ 0 is the length of side i.


An n-dimensional cell is closed if and only if each of the n intervals is closed. We say that an n-dimensional cell is non-degenerate if each of the n intervals is non-degenerate.

In mathematics a set partition of a set A is a collection of non-empty


non-overlapping subsets whose union is A. In the following we need a slightly
different notion of partition. In this version the subsets are allowed to overlap,
but only in lower dimension. Let C be a subset of Rn . Then P is a partition
(more precisely, cell partition) of C provided that P is a finite set of closed,
bounded, non-degenerate cells, each pair of cells in P has an intersection that
is a closed bounded degenerate cell, and C is the union of the cells in P. From
now on we consider a set C that has at least one partition. Such a set will be
called a rectangular set.
A partition P is finer than partition Q if for each I in P there is a J in Q
with I ⊆ J.
If Q and R are partitions, then there is always a partition P that is finer
than Q and also finer than R. This can be found by taking the cells of P to be
of the form I = J ∩ K, where J is a cell of Q and K is a cell of R.
Let f : C → R be a bounded function. For each partition P of C define the lower sum by

    L(f, P) = Σ_{I∈P} (inf f_I) m(I)    (2.8)

and the upper sum by

    U(f, P) = Σ_{I∈P} (sup f_I) m(I).    (2.9)

Here f_I denotes the restriction of f to the closed bounded non-degenerate cell I.
It is easy to see that when P is a refinement of Q, then

L(f, Q) ≤ L(f, P) ≤ U (f, P) ≤ U (f, Q). (2.10)

Furthermore, we can take more and more refined partitions and get corresponding lower and upper integrals. More precisely, we have the lower integral

    L(f) = sup_P L(f, P)    (2.11)

and the upper integral

    U(f) = inf_P U(f, P).    (2.12)

Here P ranges over all partitions of the set C. It is not hard to show that L(f) ≤ U(f).
If L(f ) = U (f ), then we say that f is Riemann integrable with integral
I(f ) = L(f ) = U (f ). Warning: There are functions that are not Riemann
integrable; for such functions L(f ) < U (f ).
The reader will recall examples from the case n = 1. If f is defined on [a, b]
and f is monotone increasing (or monotone decreasing), then f is Riemann
integrable. On the other hand, if f (x) = 1 for x rational and f (x) = 0 for x
irrational, then L(f ) = 0, while U (f ) = b − a.
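Lower and upper sums are easy to compute directly in a one-dimensional example. For a monotone increasing f on [a, b] and the partition into n equal subintervals, U(f, P) − L(f, P) = (f(b) − f(a))(b − a)/n, which tends to zero. A minimal Python sketch (the function x² on [0, 1] and the value n = 1000 are arbitrary choices):

```python
def lower_upper_sums(f, a, b, n):
    # Lower and upper sums for a monotone increasing f over the partition
    # of [a, b] into n equal subintervals; the inf and sup on each cell
    # are attained at the left and right endpoints.
    h = (b - a) / n
    lower = sum(f(a + i * h) * h for i in range(n))
    upper = sum(f(a + (i + 1) * h) * h for i in range(n))
    return lower, upper

lo, up = lower_upper_sums(lambda x: x * x, 0.0, 1.0, 1000)
print(lo, up, up - lo)  # both near 1/3; the difference is 1/n = 0.001
```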

The lower and upper integrals are always defined, but in general they have a
major defect: they need not be additive. The following proposition states what
is true in general: the lower integral is superadditive (on functions), and the
upper integral is subadditive.

Proposition 2.1 The lower integral is superadditive:

L(f + g) ≥ L(f ) + L(g), (2.13)

while the upper integral is subadditive:

U (f + g) ≤ U (f ) + U (g). (2.14)

Proof: It is sufficient to prove this for the case of the lower integral. We have for each x in I the inequality

    inf_{x∈I} f(x) + inf_{x∈I} g(x) ≤ f(x) + g(x),    (2.15)

so the left hand side is a lower bound. Therefore the greatest lower bound satisfies

    inf_{x∈I} f(x) + inf_{x∈I} g(x) ≤ inf_{x∈I} (f(x) + g(x)).    (2.16)

Consider partitions Q and R. There is a partition P that is finer than either of them. As a consequence, we have that

    L(f, Q) + L(g, R) ≤ L(f, P) + L(g, P) ≤ L(f + g, P) ≤ L(f + g).    (2.17)

So we have an upper bound for the L(f, Q). The least upper bound L(f) must then satisfy

    L(f) + L(g, R) ≤ L(f + g).    (2.18)

Similarly, we have an upper bound for the L(g, R). The least upper bound L(g) must then satisfy

    L(f) + L(g) ≤ L(f + g).    (2.19)

□


Theorem 2.2 The Riemann integral is additive for Riemann integrable functions:

    I(f + g) = I(f) + I(g).    (2.20)

The above theorem is the main reason for defining the Riemann integral
as the common value of the lower and upper integrals. Things can go wrong
when the upper and lower integrals differ. The reader will find it not difficult
to produce an example in one dimension where L(f + g) > L(f ) + L(g).
Let A be an arbitrary bounded subset of Rⁿ. All these notions may be extended to the case of a bounded function f : A → R. We merely have to choose a rectangular set C such that A ⊆ C. Then we can define f̄ to be equal to f on A and to be equal to 0 on the complement of A in C. The integral I(f) of f over A may be defined to be the integral I(f̄) of f̄ over C.
There are various notations for the integral. For instance, if we represent
the function f : A → R by x 7→ f (x) with x ranging over A, then we could
write
I(f ) = I(x 7→ f (x)). (2.21)

A variation that will be convenient in the following is

I(f ) = I(f (x); x). (2.22)

The most common notation is something like

    I(f) = ∫ f(x) dx.    (2.23)

The official definition of the Riemann integral used here is in terms of partitions of a set into finitely many closed bounded non-degenerate cells. As a consequence, every Riemann integrable function is not only bounded, but it also vanishes outside of a bounded set. In practice one often wants to integrate unbounded functions or functions defined on all of Euclidean space. There are two situations that are dramatically different.

• The integral is an absolutely convergent sum of absolutely convergent Riemann integrals. For instance, f has an integral over all of Euclidean space if we write the space as an infinite union of closed bounded non-degenerate cells Cₖ, and the restrictions fₖ to Cₖ satisfy

    Σₖ I(|fₖ|) < +∞.    (2.24)

In that case we can define

    I(f) = Σₖ I(fₖ)    (2.25)

without any ambiguity.

• The integral is represented as a conditionally convergent sum. In this case the question is the order in which the sum is taken. By changing the order one can get any result.

Sometimes people speak about “improper Riemann integrals” in a way that confuses these two notions. But the distinction that matters is between absolute convergence and conditional convergence.
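The order dependence of a conditionally convergent sum can be seen numerically. A minimal Python sketch with the alternating harmonic series, which sums to log 2 in its usual order but to (3/2) log 2 when rearranged as two positive terms followed by one negative term:

```python
import math

def usual_order(n):
    # 1 - 1/2 + 1/3 - 1/4 + ... (n terms)
    return sum((1.0 if k % 2 else -1.0) / k for k in range(1, n + 1))

def rearranged(blocks):
    # two positive terms, then one negative term, repeated
    total, p, q = 0.0, 1, 2
    for _ in range(blocks):
        total += 1.0 / p; p += 2
        total += 1.0 / p; p += 2
        total -= 1.0 / q; q += 2
    return total

print(usual_order(10**6))   # close to log 2
print(rearranged(10**6))    # close to (3/2) log 2
```

Both series use exactly the same terms; only the order differs.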

2.2 Jordan content


Consider a rectangular set C, and A ⊆ C. Let 1_A be the indicator function of A, equal to 1 on A and 0 on C \ A. (The indicator function is sometimes called the characteristic function, but this term has a conflicting use in probability theory.) If 1_A is Riemann integrable, then A is said to be Jordan measurable. The Jordan content (or volume) of A is then defined to be

    m(A) = I(1_A).    (2.26)

Jordan content is additive. This means that if A = ∪ᵢ₌₁ᵏ Aᵢ is a union of disjoint Jordan measurable sets Aᵢ, then A is Jordan measurable, and m(A) = Σᵢ₌₁ᵏ m(Aᵢ).
It is fairly easy to compute the Jordan content of a cell. However the formula for a ball is more difficult. For the record, the volume of a ball of radius r in n dimensions is vₙ rⁿ, where

    vₙ = (1/n) · 2π^{n/2}/Γ(n/2).    (2.27)

The most familiar cases are the interval of length 2r, the disk of radius r with area πr², and the ball in three dimensions of radius r and volume (4/3)πr³. The area of the n − 1 sphere of radius r in n dimensions is given by a_{n−1} r^{n−1} = d(vₙ rⁿ)/dr, so

    a_{n−1} = n vₙ = 2π^{n/2}/Γ(n/2).    (2.28)

The most familiar cases are 2 points, the circle of radius r with length 2πr, and the 2-sphere in three dimensions of radius r and area 4πr². For reference the relevant values of the Γ function are Γ(1/2) = √π, Γ(1) = 1, and Γ(3/2) = (1/2)√π.
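Formula (2.27) can be checked against the familiar low-dimensional cases in a few lines of Python, using the standard library's gamma function:

```python
import math

def ball_volume_coefficient(n):
    # v_n = (1/n) * 2 * pi^(n/2) / Gamma(n/2); the ball of radius r
    # in n dimensions has volume v_n * r^n
    return (1.0 / n) * 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)

print(ball_volume_coefficient(1))  # 2       (interval of length 2r)
print(ball_volume_coefficient(2))  # pi      (disk of area pi r^2)
print(ball_volume_coefficient(3))  # 4*pi/3  (ball of volume (4/3) pi r^3)
```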

2.3 Approximation of Riemann integrals


The notions of interior and boundary will occur; here is a quick review. If A is a subset of a metric space, then the interior int(A) is the largest open subset of A. The boundary bdy(A) is the set of points that do not belong to the interior of A and also do not belong to the interior of the complement of A. Thus the closure of A is the union of int(A) and bdy(A).
If f is a real function on C ⊆ Rⁿ, and A ⊆ C, define the oscillation of f on A by

    osc_A(f) = sup_{z∈A} f(z) − inf_{w∈A} f(w).    (2.29)

The oscillation is a special case of the notion of the diameter of a set. In fact, we have osc_A(f) = sup_{z,w∈A} |f(z) − f(w)|, which shows that the oscillation is the diameter of the image of the restriction of f to A.
Define the oscillation of f at x by

    osc_x(f) = inf{osc_A(f) | x ∈ int(A)}.    (2.30)


It may be shown that the function f is continuous at x if and only if osc_x(f) = 0.
There is a very interesting connection between oscillation and the Riemann integral. In fact we have

    U(f, P) − L(f, P) = Σ_{I∈P} osc_I(f) mₙ(I).    (2.31)

This implies that the integral exists if and only if the infimum over all these oscillation sums is zero.

Theorem 2.3 Suppose that f is Riemann integrable and that h is a Lipschitz function. Then the composition h ◦ f is also Riemann integrable.

Proof: To say that h is Lipschitz is to say that there is a constant M with |h(y) − h(z)| ≤ M|y − z|. It follows that osc_I(h ◦ f) ≤ M osc_I(f). So

    U(h ◦ f, P) − L(h ◦ f, P) ≤ M(U(f, P) − L(f, P)).    (2.32)

If f is integrable, then so is h ◦ f. □
Every C¹ function is Lipschitz on every bounded set. So the above result applies to functions such as h(y) = y².

Theorem 2.4 Suppose that f and g are Riemann integrable. Then so is the product f · g.

Proof: This follows from the identity

    f · g = (1/4) [(f + g)² − (f − g)²].    (2.33)

Since by the previous result the square of a Riemann integrable function is Riemann integrable, the right hand side defines a Riemann integrable function. □

Theorem 2.5 Suppose that f is Riemann integrable and that h is a one-to-one continuous function whose inverse is Lipschitz. Then the composition f ◦ h is also Riemann integrable.

This theorem will be proved in a later chapter. It follows that a change of variables that is C¹ with C¹ inverse preserves the property of being Riemann integrable. Hence it also preserves the property of being Jordan measurable.
Given a function f on Rn , there is a set of x where f (x) 6= 0. The closure of
this set is the support of f . Thus to say that f has compact support is equivalent
to saying that f vanishes outside of a bounded set.
Another important property of lower and upper integrals is approximation by continuous functions. The theorem below applies in particular to Riemann integrals, since every Riemann integral is both a lower and an upper integral.

Theorem 2.6 Let A be a bounded set. Let f be a bounded real function on A. Then for every ε > 0 there exists a continuous function g with g ≤ f such that the lower integral satisfies L(f) − I(g) < ε. There is a similar result for upper integrals.

Proof: Choose a partition P such that

    Σ_{I∈P} (inf f_I) mₙ(I) > L(f) − ε/2.    (2.34)

Let P₊ be the set of I with inf f_I ≥ 0, while P₋ is the set of I with inf f_I < 0. For I in P₊ define continuous g_I with support in I and with 0 ≤ g_I ≤ inf f_I ≤ f on I and with (inf f_I) mₙ(I) − I(g_I) very small. Thus g_I can be a continuous trapezoidal function with very steep sides. For I in P₋ define continuous g_I ≤ 0 with compact support and with g_I ≤ inf f_I ≤ f on I and with (inf f_I) mₙ(I) − I(g_I) very small. Again g_I can be a continuous trapezoidal function with very steep sides; however, now it has constant value on all of I and consequently has a slightly larger support. Let g = Σ_{I∈P} g_I. It is not difficult to show that g ≤ f. Furthermore, we can arrange it so that

    I(g) > Σ_{I∈P} (inf f_I) mₙ(I) − ε/2.    (2.35)

Thus this is the desired function. □

2.4 Fubini’s theorem


There is a famous theorem of integration theory called Fubini’s theorem that has a version for the Riemann integral. It says that a multi-dimensional integral may be written as an iterated integral. The usual formulation says that if f is a bounded function on a bounded subset of R^{m+n}, then under suitable conditions

    ∫∫ f(x, y) dx dy = ∫ ( ∫ f(x, y) dx ) dy    (2.36)

and also

    ∫∫ f(x, y) dx dy = ∫ ( ∫ f(x, y) dy ) dx.    (2.37)

Here x ranges over a subset of Rᵐ, and y ranges over a subset of Rⁿ. The left hand side is an ordinary Riemann integral over a subset of R^{m+n}. The right hand side is an iterated integral. Thus in the first case for each fixed y there is a corresponding m dimensional Riemann integral. These integrals define a function of y, and the n-dimensional integral of this function gives the final iterated integral.
In the following theoretical development we shall use a somewhat different notation. The reason for this is that we shall be comparing lower and upper sums, and the variant notation makes it easy to incorporate these notions. In particular, the above formulas will be written

I(f (x, y); x, y) = I(I(f (x, y); x); y) (2.38)

and
I(f (x, y); x, y) = I(I(f (x, y); y); x). (2.39)

We shall see that in certain circumstances these formulas are true. However
there are technical issues. For example, suppose that the Riemann integral
I(f (x, y); x, y) exists. Then it is not guaranteed that for fixed y the integral
I(f (x, y); x) exists. Nor is it guaranteed that for each fixed x that the integral
I(f (x, y); y) exists.
Example: Consider the following function defined on [0, 1]×[0, 1]. Let f (x, y) =
1 when x is rational and y = 1/2, but f (x, y) = 0 elsewhere. This is Riemann
integrable with integral zero. However for y = 1/2 the function x 7→ f (x, y) is
not Riemann integrable. In fact, its lower integral is 0, while its upper integral
is 1. |

Theorem 2.7 (Fubini’s theorem for lower and upper integrals) For lower integrals

    L(f(x, y); x, y) ≤ L(L(f(x, y); x); y),    (2.40)

while for upper integrals

    U(U(f(x, y); x); y) ≤ U(f(x, y); x, y).    (2.41)
There are similar results where the roles of x and y are reversed.
Proof: It is sufficient to prove the result for lower integrals. The result for
upper integrals is proved in the same way, but with the inequalities reversed.
For the proof, it is useful to have the concept of a product partition. Let
C = C1 × C2 be the cell over which the integration takes place. The x variables
range over C1 , while the y variables range over C2 . If P1 is a partition of C1 ,
and P2 is a partition of C2 , then the product partition P1 × P2 is the partition
of C consisting of all I1 ×I2 with I1 from C1 and I2 from C2 . Given an arbitrary
partition P of C, then there is a product partition that is finer than P. So it is
reasonable to first deal with product partitions.
First we need a simple lemma that only involves sums, not integrals. This
states that
L(f, P1 × P2 ) ≤ L(L(f (x, y); x, P1 ); y, P2 ). (2.42)

The proof of the lemma uses $\inf_{(x,y) \in I_1 \times I_2} f(x, y) = \inf_{y \in I_2} \inf_{x \in I_1} f(x, y)$.
The key ingredient is then the product property $m_{m+n}(I_1 \times I_2) = m_m(I_1)\, m_n(I_2)$.
We have
$$L(f, P_1 \times P_2) = \sum_{I_2 \in P_2} \sum_{I_1 \in P_1} \inf_{y \in I_2} \inf_{x \in I_1} f(x, y)\, m_m(I_1)\, m_n(I_2). \tag{2.43}$$
From the general principle that $\sum_I \inf_y h_I(y) \le \inf_y \sum_I h_I(y)$ we get
$$L(f, P_1 \times P_2) \le \sum_{I_2 \in P_2} \inf_{y \in I_2} \sum_{I_1 \in P_1} \inf_{x \in I_1} f(x, y)\, m_m(I_1)\, m_n(I_2). \tag{2.44}$$

This translates to
$$L(f, P_1 \times P_2) \le \sum_{I_2 \in P_2} \inf_{y \in I_2} L(f(x, y); x, P_1)\, m_n(I_2). \tag{2.45}$$

This leads easily to the statement of the lemma. The proof of the lemma is
complete.
Since lower sums are bounded by the lower integral, the lemma gives

L(f, P1 × P2 ) ≤ L(L(f (x, y); x, P1 ); y, P2 ) ≤ L(L(f (x, y); x); y, P2 ). (2.46)

Again for the same reason

L(f, P1 × P2 ) ≤ L(L(f (x, y); x); y). (2.47)

Given an arbitrary partition P, there is a finer product partition P1 × P2 , so
we must also have
L(f, P) ≤ L(L(f (x, y); x); y). (2.48)
That is, the iterated lower integral is an upper bound for the lower sums. Since
L(f ) is the least of all such upper bounds, we have

L(f ) ≤ L(L(f (x, y); x); y) (2.49)

as desired. 
Example: The theorem above gives a kind of Fubini theorem that works for the
lower and upper integrals. Here is an example that shows that equality is not
guaranteed. Consider the case of the upper integral of a function f defined on
the cell [0, 1] × [0, 1]. Suppose that there is a countable dense set D such that
f is one on that set, zero on its complement. Then U (f (x, y); x, y) = 1. Now
suppose that the set D has the property that for every horizontal line, there
is at most one point on the line that is in D. Then for each y the function
x 7→ f (x, y) has upper integral U (f (x, y); x) = 0. Thus U (U (f (x, y); x); y) = 0.
So the iterated upper integral is smaller than the upper integral.
How can we find such a set D? First consider the set E of points in the
plane with both coordinates rational. Consider all lines in the plane with fixed
angle θ from the x axis, so that the slope is m = tan(θ). Suppose that m is an
irrational number. For instance, we could take lines at an angle θ = π/6 from
the x axis, with slope m = 1/√3. Every such line intersects E in at most one
point. (Why?) Now rotate the picture by angle −θ, so that we get a set D that
consists of E rotated by this angle, and such that the lines become horizontal
lines. |
Theorem 2.8 (Fubini’s theorem for the Riemann integral) Suppose that
the Riemann integral I(f (x, y); x, y) exists. Then for each fixed y the lower in-
tegral and upper integral are automatically defined and satisfy L(f (x, y); x) ≤
U (f (x, y); x). Furthermore, as functions of y these each define Riemann inte-
grable functions. Finally, we have both the formulas

I(f (x, y); x, y) = I(L(f (x, y); x); y) (2.50)

and
I(f (x, y); x, y) = I(U (f (x, y); x); y). (2.51)

The result of course works in the other order. For the sake of completeness here
is an explicit statement.

Theorem 2.9 (Fubini’s theorem for the Riemann integral) Suppose that
the Riemann integral I(f (x, y); x, y) exists. Then for each fixed x the lower in-
tegral and upper integral are automatically defined and satisfy L(f (x, y); y) ≤
U (f (x, y); y). Furthermore, as functions of x these each define Riemann inte-
grable functions. Finally, we have both the formulas

I(f (x, y); x, y) = I(L(f (x, y); y); x) (2.52)

and
I(f (x, y); x, y) = I(U (f (x, y); y); x). (2.53)

Proof: The two preceding theorems are essentially the same; it is sufficient
to prove the first one. The proof uses the results that relate lower integrals to
iterated lower integrals and upper integrals to iterated upper integrals. Once
we have these results, we are almost done. We have

L(f ) ≤ L(L(f (x, y); x); y) ≤ U (L(f (x, y); x); y) ≤ U (U (f (x, y); x); y) ≤ U (f ).
(2.54)
If L(f ) = U (f ), then L(L(f (x, y); x); y) = U (L(f (x, y); x); y). This proves the
integrability of the function that sends y to L(f (x, y); x).
Similarly, we have

L(f ) ≤ L(L(f (x, y); x); y) ≤ L(U (f (x, y); x); y) ≤ U (U (f (x, y); x); y) ≤ U (f ).
(2.55)
If L(f ) = U (f ), then L(U (f (x, y); x); y) = U (U (f (x, y); x); y). This proves the
integrability of the function that sends y to U (f (x, y); x). 
In the above proof it does not seem to matter whether one used the lower
integral or the upper integral. This is clarified by the following remark. Define
the difference function

h(y) = U (f (x, y); x) − L(f (x, y); x). (2.56)

Then h ≥ 0. If f is Riemann integrable, then the analysis in the above proof
shows that h is Riemann integrable and I(h) = 0. Thus h is negligible from
the Riemann integral point of view. For instance, if h is continuous, then it is
identically zero. In such a case the Fubini theorem is true in the form stated
at the outset, that is, without any special consideration of upper and lower
integrals.
The above exposition of Fubini’s theorem follows unpublished notes on the
Riemann integral by Mariusz Wodzicki [20].
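The positive content of the theorem is easy to see numerically. The following sketch (the particular choice f(x, y) = xy on [0, 1] × [0, 1] is an illustrative example, not taken from the text; its exact integral is 1/4) compares a double Riemann sum with the two iterated sums over a product partition of midpoints:

```python
# Double Riemann sum versus the two iterated sums for f(x, y) = x * y on
# [0, 1] x [0, 1] (an illustrative integrable function; the exact value is 1/4).
n = 200
h = 1.0 / n

def f(x, y):
    return x * y

pts = [(i + 0.5) * h for i in range(n)]  # midpoints of the partition cells

double_sum = sum(f(x, y) * h * h for x in pts for y in pts)
iterated_xy = sum(sum(f(x, y) * h for x in pts) * h for y in pts)  # x first, then y
iterated_yx = sum(sum(f(x, y) * h for y in pts) * h for x in pts)  # y first, then x
```

For an integrable f all three sums approximate the same number, in accordance with Theorems 2.8 and 2.9.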

2.5 Uniform convergence


Say that fn is a sequence of functions on a bounded set A, and f is another such
function. We would like conditions that ensure the convergence of the integrals
I(fn ) → I(f ). The requirement of uniform convergence is certainly sufficient.
We review the basic definitions. Here is pointwise convergence. We say that
fn converges to f pointwise on A if for all x in A and for all ε > 0 there is
an N such that for n ≥ N we have that |fn (x) − f (x)| < ε. In other words,
the requirement is that for all x the limit as n → ∞ of fn (x) is equal to f (x).
Sometimes this is written
∀x ∀ε > 0 ∃N ∀n ≥ N |fn (x) − f (x)| < ε. (2.57)
Notice that the ∀x may occur anywhere to the left of the ∃N . This says that
the N may depend on x.
Contrast that with the much stronger requirement of uniform convergence.
We say that fn converges to f uniformly on A if for all ε > 0 there is an N such
that for n ≥ N and for all x in A we have that |fn (x) − f (x)| < ε. In other
words, the requirement is that as n goes to infinity the function fn approaches
the function f in the sense that throughout the set A the deviation of fn from
f becomes arbitrarily small. Sometimes this is written
∀ε > 0 ∃N ∀n ≥ N ∀x |fn (x) − f (x)| < ε. (2.58)
Notice that the ∀x may occur anywhere to the right of the ∃N (but before
the final inequality). Thus the N may not depend on x. Clearly uniform
convergence implies pointwise convergence.
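A standard example (the functions fn(x) = xⁿ on [0, 1], chosen here only for illustration) converges pointwise but not uniformly, and a quick numerical check makes the difference visible:

```python
# f_n(x) = x**n on [0, 1]: the pointwise limit is 0 for x < 1 and 1 at x = 1,
# but the convergence is not uniform: sup |f_n - f| over a fine grid stays near 1.
def f_n(x, n):
    return x ** n

def f_limit(x):
    return 1.0 if x == 1.0 else 0.0

grid = [i / 1000.0 for i in range(1001)]
sup_dev = {n: max(abs(f_n(x, n) - f_limit(x)) for x in grid) for n in (1, 10, 100)}
# at a fixed point x = 1/2 the deviation does go to zero (pointwise convergence)
dev_at_half = [f_n(0.5, n) - f_limit(0.5) for n in (1, 10, 100)]
```

The N in the pointwise definition depends on x: near x = 1 one must take n larger and larger, which is exactly why no single N works uniformly.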
The most famous result about uniform convergence is the following one. It
is proved by a standard three-ε argument.
Theorem 2.10 Suppose that each fn is continuous and that fn → f uniformly.
Then f is continuous.
Sometimes it is necessary to find examples of sequences of functions that do not
converge pointwise or do not converge uniformly. The condition for fn to fail to
converge to f pointwise is
∃x ∃ε > 0 ∀N ∃n ≥ N |fn (x) − f (x)| ≥ ε. (2.59)
The condition for fn to fail to converge to f uniformly is
∃ε > 0 ∀N ∃n ≥ N ∃x |fn (x) − f (x)| ≥ ε. (2.60)
It is easier to prove that a sequence is not uniformly convergent, since the x is
allowed to depend on N . In many cases it is possible to take n = N .
The following theorem involving uniform convergence and integration is el-
ementary.

Theorem 2.11 Suppose that all the fn and f are Riemann integrable on the
bounded set A. If fn converges to f uniformly on A, then I(fn ) converges to
I(f ).

One way to prove this is to first prove that I(|fn − f |) converges to zero.
There is a remarkable theorem of Dini that shows that under certain very
special circumstances uniform convergence is automatic. The context is that of
a sequence of functions that is decreasing. (There is an obvious variant with a
sequence of functions that is increasing).

Theorem 2.12 (Dini’s theorem) Let A be a compact set. Suppose that fn is
a decreasing sequence of continuous functions on A. Suppose that fn ↓ f con-
verges pointwise. Finally, suppose that f is also continuous. Then fn converges
uniformly to f .

Proof: Let hn = fn − f . Then hn → 0 is also decreasing, and it has
pointwise limit zero. Furthermore, hn is continuous. Let ε > 0. By continuity,
for each n the set Un where hn < ε is open. The open sets Un are increasing.
Now we use pointwise convergence. For each x in A, there are n sufficiently
large (depending on x) such that hn (x) < ε. Hence for such n we have x ∈ Un .
This shows that the union of the Un for all n includes A. In other words, the Un
form an open cover of A. Since A is compact, there is a finite subcover. This
implies that there is an N such that UN includes A. Also the Un for n ≥ N
include A. In other words, for all n ≥ N the set where hn < ε includes all of A.
This is uniform convergence. 

2.6 Dominated convergence


It is rather amazing that uniform convergence is not necessary for convergence
of the integrals. This section presents the dominated convergence theorem for
the Riemann integral, first proved by Arzela in 1885.
Let A be a bounded set. Let fn be a sequence of real Riemann integrable
functions on A that are bounded by a fixed constant M . Thus there is an M
such that |fn (x)| ≤ M for all x in A and all n. Then the sequence of functions
is said to be dominated. It is dominated in two ways: by the fixed bounded set
A on which the functions are all defined, and by the fixed constant M by which
they are all bounded.
Suppose that the sequence of functions fn is dominated, and suppose that
fn → f pointwise as n → ∞, where f is also Riemann integrable. This is called
dominated convergence (for the Riemann integral). The theorem says that in
this situation the integrals converge: I(fn ) → I(f ) as n → ∞.
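Before developing the proof, a numerical sanity check may help. The spike functions below (an illustrative choice, not from the text) converge pointwise to zero but are bounded by no single constant M, and their integrals do not converge to zero; the dominated sequence xⁿ behaves as the theorem predicts:

```python
# Why domination matters: the spikes of height m on (0, 1/m] converge pointwise
# to 0 but are not uniformly bounded, and their integrals stay at 1.  The
# dominated sequence f_m(x) = x**m (with |f_m| <= 1) has integrals
# 1/(m+1) -> 0, which is the integral of its pointwise limit.
def riemann(f, a, b, n=10000):
    # midpoint-rule Riemann sum for the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

spike_integrals = [
    riemann(lambda x, m=m: m if 0.0 < x <= 1.0 / m else 0.0, 0.0, 1.0)
    for m in (10, 100)
]
dominated_integrals = [riemann(lambda x, m=m: x ** m, 0.0, 1.0) for m in (10, 100)]
```

The spike integrals remain at 1 even though the pointwise limit has integral 0, so some boundedness hypothesis is genuinely needed.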
We begin with a lemma about approximation of lower integrals via continuous
functions. This is a special case of a result proved earlier, but it is convenient
to record it here.

Lemma 2.13 Let A be a bounded set. Let f ≥ 0 be a bounded real function on
A. Then for every ε > 0 there exists a continuous function g with 0 ≤ g ≤ f
and L(f ) − I(g) < ε.

Proof: From the definition of the lower integral it follows that there is a
step function k with 0 ≤ k ≤ f and with L(f ) − I(k) < ε/2. However one can
approximate each step by a continuous trapezoid with very steep sides, so that
the resulting trapezoidal function g satisfies 0 ≤ g ≤ k and I(k) − I(g) < ε/2.
Then L(f ) − I(g) = (L(f ) − I(k)) + (I(k) − I(g)) < ε/2 + ε/2 = ε. □
We now turn to an important result on monotone convergence; it will be used
to prove the dominated convergence theorem. In the monotone convergence re-
sult there is no assumption that the functions are Riemann integrable. However
they have lower integrals, so the result is stated in terms of lower integrals.

Lemma 2.14 (Monotone convergence for the lower Riemann integral)
Let A be a bounded set. Let pn be a sequence of real functions on A such that
there is a constant M with 0 ≤ pn ≤ M . Suppose that pn ↓ 0 pointwise as
n → ∞. Then the lower integrals converge: L(pn ) → 0 as n → ∞.

Proof: The assumption is that pn is a sequence of bounded functions with
pn ↓ 0, that is, the functions are decreasing pointwise to zero. If Dini’s theorem
applied, then we would have I(pn ) ↓ 0. However, there are obvious problems.
The pn need not be continuous, so Dini’s theorem does not apply. Not only
that, the pn need not be Riemann integrable. In this case I(pn ) is not even
defined, and we must work with L(pn ).
Ultimately the proof will reduce to Dini’s theorem. Dini’s theorem relies on
the assumption that the functions are defined on a compact set, but that set
can be taken to be a closed bounded non-degenerate cell C with A ⊆ C. (In
fact this is how the Riemann integral is defined.) Uniform convergence is hiding
somewhere in the world of pointwise convergence, but it is well hidden. We first
use the fact that we can approximate by continuous functions. The essential
idea is to approximate better and better as we go along the sequence.
Consider ε > 0. Choose gi continuous with 0 ≤ gi ≤ pi and
$$L(p_i) - I(g_i) \le \epsilon\, \frac{1}{2^i}. \tag{2.61}$$
Unfortunately, there is no guarantee that the functions gi are decreasing. To
fix this, let
hn = min(g1 , . . . , gn ). (2.62)
Then the hn ↓ 0 are decreasing pointwise to zero, and each hn is continuous.
Hence by Dini’s theorem I(hn ) ↓ 0. This looks promising.
The problem is that in general the integral of a minimum can be much
smaller than the integrals of the individual functions. So we need to use special
features of the choices made above to ensure that I(gn ) − I(hn ) is small. We
have for each j = 1, . . . , n that
$$g_n - g_j \le \max(g_j, g_n) - g_j \le \sum_{i=1}^{n-1} (\max(g_i, g_n) - g_i) \tag{2.63}$$

since each max(gi , gn )−gi ≥ 0. The sum on the right hand side does not depend
on j, so it is an upper bound for all of the gn − gj . By definition of hn we then
have
$$g_n - h_n \le \sum_{i=1}^{n-1} (\max(g_i, g_n) - g_i). \tag{2.64}$$

The above inequality only involves Riemann integrable functions. Hence we
are allowed to use the additivity of the integral to write the integral of the right
hand side as the sum of the integrals. This gives
$$I(g_n) - I(h_n) \le \sum_{i=1}^{n-1} (I(\max(g_i, g_n)) - I(g_i)). \tag{2.65}$$

For i = 1, . . . , n we have gi ≤ pi and gn ≤ pn ≤ pi , so max(gi , gn ) ≤ pi .


Therefore we have
$$I(g_n) - I(h_n) \le \sum_{i=1}^{n-1} (L(p_i) - I(g_i)). \tag{2.66}$$

Hence,
$$I(g_n) - I(h_n) \le \sum_{i=1}^{n-1} \epsilon\, \frac{1}{2^i} = \epsilon \left(1 - \frac{1}{2^{n-1}}\right). \tag{2.67}$$
This is the result that is needed. We conclude by noting that
$$L(p_n) - I(h_n) = (L(p_n) - I(g_n)) + (I(g_n) - I(h_n)) \le \epsilon\, \frac{1}{2^n} + \epsilon \left(1 - \frac{1}{2^{n-1}}\right). \tag{2.68}$$

This gives
$$L(p_n) \le I(h_n) + \epsilon \left(1 - \frac{1}{2^n}\right) < I(h_n) + \epsilon. \tag{2.69}$$
So when n is so large that I(hn ) is less than ε, then L(pn ) is less than 2ε. □
Remark: In the above proof it could be tempting to use gn − gj ≤ pj − gj
for j ≤ n to prove $g_n - h_n \le \sum_{i=1}^{n} (p_i - g_i)$. The problem would be that the
right hand side only has a lower integral. Furthermore, the lower integral is
only known to be superadditive, so the lower integral of the sum could be much
larger than the sum of the lower integrals. This was avoided in the proof by
using max(gi , gn ) in place of pi . |
Theorem 2.15 (Dominated convergence for the Riemann integral) Let
A be a bounded set. Let fn be a sequence of real Riemann integrable functions
on A that are dominated by a fixed constant M . Thus there is an M such that
|fn (x)| ≤ M for all x in A and all n. Suppose that fn → f pointwise as n → ∞,
where f is also Riemann integrable. Then the integrals converge: I(fn ) → I(f )
as n → ∞.

Remark: It actually suffices to prove an apparently weaker version of the
theorem. This theorem says that if f¯n ≥ 0 and f¯n ≤ 2M and f¯n → 0 pointwise,
then I(f¯n ) → 0. If we set f¯n = |fn − f |, we see that I(|fn − f |) → 0. Hence
|I(fn ) − I(f )| = |I(fn − f )| ≤ I(|fn − f |) → 0. |
Proof: Suppose that f¯n ≥ 0 with f¯n ≤ 2M and f¯n → 0 pointwise. It
is sufficient to show that I(f¯n ) → 0. Here is the strategy. For each n let
pn = supk≥n f¯k be the pointwise supremum of the f¯k for all k ≥ n. This is
an infinite supremum! Then 0 ≤ f¯n ≤ pn . Furthermore, pn ↓ 0 is decreasing
pointwise to zero. (Why?) There is an apparent problem: pn need not be
Riemann integrable. In this case I(pn ) is not defined. However the lemma on
monotone convergence proves that the lower integrals L(pn ) → 0. But then
I(f¯n ) = L(f¯n ) ≤ L(pn ) → 0, so we have the desired result. 
This dominated convergence theorem for the Riemann integral was first
proved by Arzela in 1885. It states that if one has a dominated sequence of
Riemann integrable functions that converges pointwise to a Riemann integrable
function, then the limit of the integrals is the integral of the limit. This was
long before the introduction of the Lebesgue integral; Lebesgue’s thesis was
published in 1902. The Lebesgue theory gives a much more powerful result. It
says that if one has a dominated sequence of Lebesgue integrable functions that
converge pointwise, then the limit is automatically Lebesgue integrable, and the
limit of the integrals is the integral of the limit.
The proof given above is elegant but somewhat strange. The theorem is a
result about a sequence of Riemann integrable functions, but the proof uses a
sequence of functions that need not in general be Riemann integrable. In fact,
the proof uses ideas that are perhaps more natural in the context of the Lebesgue
integral. It is taken from a paper by Luxemburg [10]. In this paper Luxemburg
says that his proof is essentially the same as Hausdorff’s proof published in 1927.
However at one point Hausdorff gives an incorrect argument. This needs to be
replaced by other reasoning; this is supplied by Luxemburg.

2.7 Differentiating a parameterized integral


In rough summary, the dominated convergence theorem says that if A is bounded
and the functions fn (x) are uniformly bounded (|fn (x)| ≤ M ), then the condi-
tion that limn→∞ fn (x) = f (x) pointwise in x implies that
$$\lim_{n \to \infty} \int_A f_n(x)\, dx = \int_A f(x)\, dx. \tag{2.70}$$
There is a variant form of the dominated convergence theorem in which the
functions depend on a continuous parameter. This says that if A is bounded and
the functions f (x, y) are uniformly bounded (|f (x, y)| ≤ M ), then the condition
that limy→y0 f (x, y) = f (x, y0 ) pointwise in x implies that
$$\lim_{y \to y_0} \int_A f(x, y)\, dx = \int_A f(x, y_0)\, dx. \tag{2.71}$$

This variant form is a consequence of the original theorem and of a general
fact about continuity in the metric space setting. The general fact is that if for
every sequence an with limn an = y0 we have limn g(an ) = g(y0 ), then g(y) is
continuous at y0 . We can then apply this to $g(y) = \int_A f(x, y)\, dx$.

Theorem 2.16 Let A be a bounded set that is Jordan measurable, and U be
an open set. Suppose that f (x, y) is $C^1$, and write $f_2'(x, y)$ for the covector of
partial derivatives with respect to the y variables. Suppose also that there is a
constant M with $|f_2'(x, y)| \le M$. Let
$$g(y) = \int_A f(x, y)\, dx. \tag{2.72}$$

Then
$$g'(y) h = \int_A f_2'(x, y) h\, dx. \tag{2.73}$$

Proof: Write
$$g(y + h) - g(y) - \int_A f_2'(x, y) h\, dx = \int_A (f(x, y + h) - f(x, y) - f_2'(x, y) h)\, dx. \tag{2.74}$$
This can also be written
$$g(y + h) - g(y) - \int_A f_2'(x, y) h\, dx = \int_A \int_0^1 (f_2'(x, y + th) - f_2'(x, y)) h\, dt\, dx. \tag{2.75}$$
This has absolute value bounded by
$$\epsilon(y, h) = \int_A \int_0^1 |f_2'(x, y + th) - f_2'(x, y)|\, dt\, dx \tag{2.76}$$

times |h|. All that remains is to show that $\epsilon(y, h) \to 0$ as h → 0. For fixed x
and t the integrand approaches zero as h → 0. The conclusion follows from the
dominated convergence theorem. 
This theorem gives a practical condition for differentiating an integral de-
pending on a parameter with respect to the parameter. For the theorem to
apply it is important that the bound on $f_2'(x, y)$ be independent of x and of y.
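The theorem can be sketched numerically in the scalar case. Here A = [0, 1] and f(x, y) = sin(xy) are hypothetical choices made only for this illustration; the partial derivative in y is x cos(xy), which is bounded independently of x and y:

```python
import math

def riemann(f, a, b, n=20000):
    # midpoint-rule Riemann sum for the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def g(y):
    # g(y) = integral over x in [0, 1] of sin(x * y)
    return riemann(lambda x: math.sin(x * y), 0.0, 1.0)

y0 = 0.7
# the theorem: g'(y0) equals the integral of the partial derivative in y
deriv_integral = riemann(lambda x: x * math.cos(x * y0), 0.0, 1.0)
# compare against a central finite difference of g
eps = 1e-4
deriv_numeric = (g(y0 + eps) - g(y0 - eps)) / (2 * eps)
```

The two numbers agree to several decimal places, as the theorem predicts.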
2.8 Approximate delta functions


Consider a function δ1 (x) ≥ 0 that is positive and has integral one. Suppose
for convenience that it is continuous and has compact support. For instance, it
might vanish outside the closed ball of radius c > 0.
Construct a family of functions $\delta_\epsilon(x) \ge 0$ defined for each ε > 0 by scaling
according to the rule
$$\delta_\epsilon(x) = \delta_1\left(\frac{x}{\epsilon}\right) \frac{1}{\epsilon^n}. \tag{2.77}$$
Then it is easy to see that for each ε > 0 the n-dimensional integral
$$\int \delta_\epsilon(x)\, dx = 1. \tag{2.78}$$
This family of functions (considered for ε > 0 small) will be called a family of
approximate delta functions. Notice that $\delta_\epsilon$ vanishes outside the closed ball of
radius εc.

Theorem 2.17 Let f be a bounded continuous function, and let $\delta_\epsilon$ be a family
of approximate delta functions. Then
$$\lim_{\epsilon \to 0} \int f(x) \delta_\epsilon(x)\, dx = f(0). \tag{2.79}$$

Proof: Write
$$\int f(x) \delta_\epsilon(x)\, dx = \int f(x) \delta_1\left(\frac{x}{\epsilon}\right) \frac{1}{\epsilon^n}\, dx. \tag{2.80}$$
Make the change of variable x = εu. This gives
$$\int f(x) \delta_\epsilon(x)\, dx = \int f(\epsilon u) \delta_1(u)\, du. \tag{2.81}$$
The integrand converges pointwise in u to f (0)δ1 (u) on the closed ball of radius
c and is bounded by a constant independent of ε. By the dominated convergence
theorem the integral converges to f (0). □
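A one-dimensional numerical sketch of this limit, using the triangle function δ1(x) = max(1 − |x|, 0) (an illustrative choice: it is continuous, has integral one, and vanishes outside [−1, 1], so c = 1) and the test function f = cos:

```python
import math

# Approximate delta functions in dimension n = 1 built from the triangle
# function delta_1(x) = max(1 - |x|, 0).  The smeared values of f = cos
# approach f(0) = 1 as eps shrinks.
def delta1(x):
    return max(1.0 - abs(x), 0.0)

def riemann(f, a, b, n=20000):
    # midpoint-rule Riemann sum for the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def smeared(f, eps):
    # integral of f(x) * delta_eps(x), where delta_eps(x) = delta1(x / eps) / eps
    return riemann(lambda x: f(x) * delta1(x / eps) / eps, -1.0, 1.0)

vals = [smeared(math.cos, eps) for eps in (0.5, 0.1, 0.01)]
```

The successive values approach cos(0) = 1, and the error shrinks with ε.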
Remark: The amazing thing about this theorem is that the right hand side
is independent of the particular choice of approximate delta function. For this
reason, it is customary to write it in the form
$$\int f(x) \delta(x)\, dx = f(0). \tag{2.82}$$

Of course, there is no such delta function δ(x) with this property, but it is still
convenient to describe its properties. While the left hand side does not have a
literal meaning, it gives an easy way to remember the result. Furthermore, it
allows one to summarize various useful facts, such as
$$\int \delta(x)\, dx = 1 \tag{2.83}$$
and
f (x)δ(x) = f (0)δ(x). (2.84)
Also, the delta function is even

δ(−x) = δ(x) (2.85)

and transforms under a change of scale a ≠ 0 by
$$|a|^n \delta(ax) = \delta(x). \tag{2.86}$$
More generally, we have
$$|a|^n \delta(ax - y) = \delta\left(x - \frac{y}{a}\right). \tag{2.87}$$
The reader may check that each of these suggests a meaningful statement about
approximate delta functions. |
The integrals involving approximate delta functions are often of the form
$$\int h(y) \delta_\epsilon(z - y)\, dy = \int h(z - x) \delta_\epsilon(x)\, dx. \tag{2.88}$$
The two integral expressions are equivalent after the change of variable y =
z − x. The new feature is that we look at the result as a function of z. If
|h(y)| ≤ M , then each integral above as a function of z has magnitude bounded
by M . Furthermore, it is a continuous function of z. Also, suppose that h has
compact support K. Then this integral as a function of z has compact support
in the set $K_{\epsilon c}$ of points a distance at most εc from K. The result may be stated
in this context as follows.

Theorem 2.18 Let h be a bounded continuous function, and let $\delta_\epsilon$ be a family
of approximate delta functions. Then
$$\lim_{\epsilon \to 0} \int h(y) \delta_\epsilon(z - y)\, dy = \lim_{\epsilon \to 0} \int h(z - x) \delta_\epsilon(x)\, dx = h(z). \tag{2.89}$$

2.9 Linear algebra (determinants)


If A is a square matrix, then there is an associated number det(A), the deter-
minant of A. The determinant of a product of matrices is the product of the
determinants. The determinant of the identity matrix I is det(I) = 1. A square
matrix A has an inverse matrix A−1 if and only if det(A) ≠ 0.
Various linear algebra operations may be expressed in terms of elementary
matrices. These are square matrices E. For each such matrix there is a corre-
sponding linear transformation x 7→ Ex. There are three kinds of elementary
matrix:
1. A scaling matrix that is obtained by multiplying row i of the identity matrix
by a constant a ≠ 0. The corresponding linear transformation sends
xi to axi and leaves the other coordinates unchanged. The determinant
is a.

2. An interchange matrix is obtained by interchanging rows i and j of the
identity matrix. The corresponding linear transformation sends xi to xj
and xj to xi and leaves the other coordinates unchanged. The determinant
is −1.

3. A shear matrix is obtained by modifying row i of the identity matrix by
adding a constant c times row j. The corresponding linear transformation
sends xi to xi + cxj and leaves the other coordinates unchanged. The
determinant is 1.
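The three kinds can be written out concretely in the 3 × 3 case. The values a = 5 and c = 7 below are arbitrary illustrative choices:

```python
# The three elementary 3x3 matrices and their determinants, in pure Python.
def det3(m):
    # cofactor expansion of a 3x3 determinant along the first row
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def identity3():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

def matmul3(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

scale = identity3(); scale[0][0] = 5.0                    # multiply row 0 by a = 5
swap = identity3(); swap[0], swap[1] = swap[1], swap[0]   # interchange rows 0 and 1
shear = identity3(); shear[0][1] = 7.0                    # add c = 7 times row 1 to row 0

dets = (det3(scale), det3(swap), det3(shear))             # determinants a, -1, 1
# the determinant of a product is the product of the determinants
det_of_product = det3(matmul3(matmul3(scale, swap), shear))
```

The computed determinants are 5, −1, and 1, and the product of the three matrices has determinant −5, illustrating the multiplicative property used below.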

Every invertible matrix may be written as a product of elementary matrices.
This gives a way of computing the determinant. The significance of the deter-
minant is that its absolute value gives a factor by which volumes are multiplied.
This is reflected in the ways that integrals are computed.

1. For a scaling the integral transforms by
$$|a| \int f(a x_i)\, dx_i = \int f(y_i)\, dy_i. \tag{2.90}$$
Volumes are multiplied by the absolute value |a| of the scale factor, which
is the absolute value of the determinant.

2. For an interchange the integral transforms by
$$\int \int f(x_i, x_j)\, dx_i\, dx_j = \int \int f(y_j, y_i)\, dy_i\, dy_j. \tag{2.91}$$
The absolute value of the determinant is 1. Volumes are left unchanged.

3. For a shear use Fubini’s theorem to integrate with respect to xi with xj
fixed. Then
$$\int \int f(x_i + c x_j, x_j)\, dx_i\, dx_j = \int \int f(y_i, x_j)\, dy_i\, dx_j = \int \int f(y_i, y_j)\, dy_i\, dy_j. \tag{2.92}$$
The determinant is 1. Volumes are left unchanged.

Theorem 2.19 Under an invertible linear transformation an integral transforms
by
$$\int f(y)\, dy = |\det(A)| \int f(Ax)\, dx. \tag{2.93}$$
Proof: Write A = E1 · · · Ek as a product of elementary matrices. Then
repeated use of the integral identities above gives
$$\int f(y)\, dy = |\det(E_1)| \cdots |\det(E_k)| \int f(E_1 \cdots E_k x)\, dx. \tag{2.94}$$
By the multiplicative properties of absolute value and determinant we have
$$|\det(E_1)| \cdots |\det(E_k)| = |\det(E_1) \cdots \det(E_k)| = |\det(E_1 \cdots E_k)|. \tag{2.95}$$
This gives the result. □


In general a change of variable is not given by a fixed linear transforma-
tion, and so the formula is not quite this simple. However for the case of an
approximate delta function the result is essentially the same.

Theorem 2.20 Consider an open subset V of Rn and a change of variable
function g : V → Rn . Suppose that g is one-to-one and $C^1$. Furthermore,
suppose that $\det g'(x) \ne 0$. Let h be a bounded continuous function. Let $\delta_\epsilon$ be a
family of approximate delta functions. Let K be a compact subset of g(V ) and
consider y in K. Then
$$\lim_{\epsilon \to 0} \int_V h(x) \delta_\epsilon(g(x) - y)\, dx = \frac{1}{|\det g'(g^{-1}(y))|}\, h(g^{-1}(y)). \tag{2.96}$$

Proof: First it is helpful to take some care about the region of integration.
The integral is over the set of x with $|g(x) - y| \le \epsilon c$. Consider the set $K_{\epsilon c}$
consisting of all points with distance no greater than εc from K. There is an
$\epsilon_1$ such that $K_{\epsilon_1 c}$ is a subset of g(V ). Since this is a compact set, the function
$\|(g^{-1})'\|$ is bounded there by some constant λ. Now suppose that y is in K and
$|y' - y| \le \epsilon c$ for some $\epsilon \le \epsilon_1$. Then $|g^{-1}(y') - g^{-1}(y)| \le \lambda |y' - y| \le \lambda \epsilon c$. In
particular for x in the region of integration $|x - g^{-1}(y)| \le \lambda |g(x) - y| \le \lambda \epsilon c$.
Make the change of variable $x = g^{-1}(y) + \epsilon t$. The integration region is now
$|t| \le \lambda c$. This is a fixed bounded set, independent of ε. The integral on the left
hand side is
$$\int_V h(x)\, \delta_1\left(\frac{g(x) - y}{\epsilon}\right) \frac{1}{\epsilon^n}\, dx = \int h(g^{-1}(y) + \epsilon t)\, \delta_1\left(\frac{g(g^{-1}(y) + \epsilon t) - g(g^{-1}(y))}{\epsilon}\right) dt. \tag{2.97}$$
The integrand is bounded, independent of ε. By the dominated convergence
theorem the limit as ε → 0 of this is
$$\int h(g^{-1}(y))\, \delta_1(g'(g^{-1}(y)) t)\, dt = \frac{1}{|\det(g'(g^{-1}(y)))|}\, h(g^{-1}(y)) \int \delta_1(u)\, du. \tag{2.98}$$
The last step is the change of variables $u = g'(g^{-1}(y)) t$. This involves a matrix
$g'(g^{-1}(y))$ that only depends on the parameter y and so may be regarded as
constant. Performing the u integral gives the result on the right hand side. □
Remark: For later use we note that the integral $\int_V h(x) \delta_\epsilon(g(x) - y)\, dx$ as
a function of y is uniformly bounded on K, independent of ε. Furthermore,
for each ε it is continuous in y. (The dominated convergence theorem also applies
to integrals that depend on a real parameter such as y.) |
Remark: Again there is a common abbreviation for this kind of result. In the
present case one could write
$$\int_V h(x) \delta(g(x) - y)\, dx = \frac{1}{|\det g'(g^{-1}(y))|}\, h(g^{-1}(y)). \tag{2.99}$$
Even more radically, one could write
$$\delta(g(x) - y) = \frac{1}{|\det g'(g^{-1}(y))|}\, \delta(x - g^{-1}(y)). \tag{2.100}$$
Even if there is no such thing as a delta function, such identities involving delta
functions make perfectly good sense. |

2.10 Change of variables


Theorem 2.21 (Change of variables for the Riemann integral) Consider
an open subset V of Rn and a change of variable function g : V → Rn . Suppose
that g is one-to-one and $C^1$. Furthermore, suppose that $\det g'(x) \ne 0$. Let f be
a Riemann integrable function with compact support in g(V ). Then
$$\int_{g(V)} f(y)\, dy = \int_V f(g(x)) |\det g'(x)|\, dx. \tag{2.101}$$

Proof: First we give the proof for the case when f is a continuous function.
The plan is to do the proof in two parts: write the right hand side as a limit,
and then write the left hand side as the same limit.
Here is the part dealing with the integral on the right hand side. Let K be
the support of f . We first have a limit of single integrals
$$\int_K f(y) |\det g'(g^{-1}(y))|\, \delta_\epsilon(g(x) - y)\, dy \to f(g(x)) |\det g'(x)| \tag{2.102}$$
as ε → 0. This result uses the continuity of f .


The integral in the above formula vanishes unless g(x) is in $K_{\epsilon c}$. Choose $\epsilon_1$ so
that $K_{\epsilon_1 c}$ is in g(V ). So for $\epsilon \le \epsilon_1$ we may integrate x over the compact set
$L = g^{-1}(K_{\epsilon_1 c})$. Furthermore, these integrals are uniformly bounded, independent
of ε. By the dominated convergence theorem we get a limit of double integrals
$$\int_L \int_K f(y) |\det g'(g^{-1}(y))|\, \delta_\epsilon(g(x) - y)\, dy\, dx \to \int_V f(g(x)) |\det g'(x)|\, dx \tag{2.103}$$
as ε → 0.
Here is the part dealing with the integral on the left hand side. First we have
a limit of single integrals. By the change of variable formula for approximate
delta functions we have
$$\int_L f(y) |\det g'(g^{-1}(y))|\, \delta_\epsilon(g(x) - y)\, dx \to f(y) \tag{2.104}$$
as ε → 0.
As a function of y the above integral is bounded by a constant independent
of ε and is supported on K. The dominated convergence theorem gives a limit
of double integrals
$$\int_K \int_L f(y) |\det g'(g^{-1}(y))|\, \delta_\epsilon(g(x) - y)\, dx\, dy \to \int_K f(y)\, dy \tag{2.105}$$

as ε → 0.
The proof for continuous f is concluded by noting that according to Fubini’s
theorem the two double integrals are the same.
The proof for general Riemann integrable functions requires additional com-
ment. First, the fact that the right hand side is integrable follows from general
properties of the Riemann integral with respect to composition and multipli-
cation. Then one can use approximation by continuous functions. In fact, for
every Riemann integrable f there are continuous functions u ≤ f ≤ v with
integrals arbitrarily close to the integral of f . Since the integrals on the left hand
side are arbitrarily close, it follows that the integrals on the right hand side are
also arbitrarily close. 
This beautiful proof is from a recent paper by Ivan Netuka [15]. (A few
details of the proof have been changed.) The formulation in that paper is for the
Lebesgue integral, but it also works for the Riemann integral if one recognizes
that the Riemann integral also has a dominated convergence theorem.
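A one-dimensional numerical check of the formula (with the illustrative choices g(x) = x², V = (1, 2), f(y) = 1/y, so that both sides equal log 4; these choices are not taken from the text):

```python
import math

# One-dimensional instance of the change of variables formula:
# g(x) = x**2 maps V = (1, 2) one-to-one onto g(V) = (1, 4), with g'(x) = 2x.
def riemann(f, a, b, n=100000):
    # midpoint-rule Riemann sum for the integral of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda y: 1.0 / y
lhs = riemann(f, 1.0, 4.0)                                # integral of f over g(V)
rhs = riemann(lambda x: f(x * x) * abs(2 * x), 1.0, 2.0)  # integral over V with |g'|
```

Both Riemann sums approximate log 4 ≈ 1.3863, in agreement with Theorem 2.21.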
Remark: A physicist or engineer who is familiar with delta functions might
summarize the entire proof by recalling that

$$|\det g'(g^{-1}(y))|\, \delta(g(x) - y) = \delta(x - g^{-1}(y)). \tag{2.106}$$
So
$$\int f(g(x)) |\det g'(x)|\, dx = \int \int f(y) |\det g'(g^{-1}(y))|\, \delta(g(x) - y)\, dy\, dx, \tag{2.107}$$
which in turn is equal to
$$\int \int f(y) \delta(x - g^{-1}(y))\, dx\, dy = \int f(y)\, dy. \tag{2.108}$$

Thus the proof is immediately memorable. |

2.11 Problems
Problems 5: Dominated Convergence
1. (a) Give an example of functions f and g on [0, 1] such that their lower
integrals satisfy L(f + g) > L(f ) + L(g).
(b) Give an example of functions f and g on [0, 1] such that their upper
integrals satisfy U (f + g) < U (f ) + U (g).
2. Consider a function f (x, y) defined for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Suppose
that (x, y) 7→ f (x, y) is Riemann integrable as a function on the square.
Give an example to show that there can be y such that x 7→ f (x, y) is not
Riemann integrable. In your example, calculate the upper integral h(y) of
x 7→ f (x, y). Does the integral of y 7→ h(y) exist? Justify your answer.

3. Consider functions fn (x) with $f_n(x) = n^2 x$ for 0 ≤ x ≤ 1/n and
$f_n(x) = n - n^2 (x - 1/n)$ for 1/n ≤ x ≤ 2/n, zero elsewhere. Find the integral of
the pointwise limit. Find the limit of the integrals.

4. Consider functions defined on [a, b]. Say that fn ↓ f if for each m ≥ n we
have fm (x) ≤ fn (x), and if f (x) is the infimum of the fn (x). (a) Show
that the limit of the fn (x) as n → ∞ is f (x). (b) Suppose each fn is
continuous. Prove or disprove: The convergence must be uniform.

5. Let fn be a sequence of Riemann integrable functions on [a, b] with each
fn (x) ≥ 0, uniformly bounded. We would like to have a sequence pn
of functions such that for each n we have fn (x) ≤ pn (x) and such that
pn (x) ↓ p(x). An obvious device is to take pn (x) = supk≥n fk (x). (a)
Prove that the sequence pn (x) is decreasing. (b) Must each pn be Riemann
integrable? Give a careful discussion and proof.

Recitation 5
1. Evaluate
∫₀² ∫₀³ y e^{−xy} dy dx   (2.109)

without using integration by parts.

2. (a) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y = 1/2, zero otherwise. Is it Riemann integrable? Prove
that your answer is correct.
(b) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y is rational, zero otherwise. Is it Riemann integrable?
Prove that your answer is correct.
(c) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y is rational and x = y, zero otherwise. Is it Riemann
integrable? Prove that your answer is correct.
(d) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y is rational and x ≠ y, zero otherwise. Is it Riemann
integrable? Prove that your answer is correct.
(e) Consider the function f defined on the unit square with f (x, y) = 1
if x is rational or y is rational, zero otherwise. Is it Riemann integrable?
Prove that your answer is correct.

3. Consider the functions fn (x) = xⁿ defined for 0 ≤ x ≤ 1. Find the


integral of the limit and the limit of the integrals. (a) Does the dominated
convergence theorem apply? Explain in detail. (b) Does Dini’s theorem
apply? Explain in detail. (c) Is there uniform convergence? Justify your
answer directly from the definition of uniform convergence.
4. For each t > 0 define the function

δt (x) = max( 1/√t − (1/t)|x − √t|, 0 ).   (2.110)

Define δ0 (x) = 0 and δ−t (x) = δt (x).
Also, for each t define
φ(x, t) = tδt (x). (2.111)
(a) Compute the x integral of δt (x) for each t ≠ 0. Let t → 0. Compute
the x integral of the pointwise limit and the limit of the x integrals.
(b) Compute the x integral of φ(x, t). Compute the t derivative of the x
integral of φ(x, t).
(c) For fixed x, compute the t derivative of φ(x, t) at t = 0. Compute the
x integral of this t partial derivative.
(d) What does this say about differentiating under the integral sign?

Problems 6: Change of Variables


1. (a) For each k = 0, 1, 2, 3, . . . find a C^k approximate delta function that
vanishes outside of a bounded set. (b) Is there a C^∞ approximate delta
function that vanishes outside of a bounded set? Prove or disprove.
2. Let u = x³ − 3xy² and v = 3x²y − y³. Consider the region D of x, y
in the first quadrant such that the corresponding u, v satisfy |u| < 3 and
|v − 2| < 1. Evaluate
∫_D (x² + y²)² dx dy.   (2.112)

3. (a) There is another interesting change of variable formula in one dimension.
It says that if [a, b] may be partitioned into intervals I such that on
each interval g′(x) > 0 or g′(x) < 0, then

∫_a^b h(g(x)) dx = ∫_{−∞}^{∞} h(y) Σ_{t : g(t)=y} 1/|g′(t)| dy.   (2.113)

The t sum is restricted to t between a and b for which g′(t) ≠ 0. If there
are no such t then the sum is zero. Hint: The sum over intervals I of the
y integrals over g(I) is the integral over y of the sum over the intervals I
with y in g(I).
(b) Let a > 0. Compute δ(x2 − a2 ) as a multiple of a sum of two delta
functions of the form δ(x ± a).

4. There is a marvelous formula for computing the n − 1 dimensional surface
area for an implicitly defined surface g(x) = c in some region of Rⁿ. The
rule is to compute the n-dimensional integral of δ(g(x) − c) |dg(x)| dx.
(a) For n = 2 this is the integral of δ(g(x, y) − c) |dg(x, y)| dx dy. Here

dg(x, y) = (∂g(x, y)/∂x) dx + (∂g(x, y)/∂y) dy   (2.114)

and

|dg(x, y)| = √( (∂g(x, y)/∂x)² + (∂g(x, y)/∂y)² ).   (2.115)
Evaluate this in terms of an x integral involving partial derivatives of
g(x, y). These partial derivatives will be evaluated at the (implicitly de-
fined) y satisfying g(x, y) = c. Hint: First do the y integral.
(b) Say that the implicit function theorem applied to g(x, y) = c defines
y = h(x) as a function of x. Express the above result in terms of derivatives
of h(x). Show that this gives the usual formula for arc length.

5. Use the general formula to compute the area of the hemisphere z ≥ 0 for
the sphere x² + y² + z² = a². This is the three-dimensional integral of
δ(x² + y² + z² − a²) 2√(x² + y² + z²) dx dy dz. Hint: First do the z integral
to express the result as an x, y integral. Then it is easy to do this x, y
integral in polar coordinates.

Recitation 6
1. Evaluate
∫₀¹ ∫_{3y}^{3} e^{−x²} dx dy.   (2.116)

2. A previous problem gave a marvelous formula for computing the n − 1


dimensional surface area for an implicitly defined surface g(x) = 0 in
some region of Rn . Explain why this formula is “obvious” by drawing
relevant pictures.

3. Do the integral
∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)} dx dy   (2.117)

by changing to polar coordinates.

4. Do the integral
∫_{−∞}^{∞} e^{−x²} dx   (2.118)

by applying Fubini’s theorem to the previous problem.
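As a numerical sanity check (a Python sketch, not part of the text), one can truncate the Gaussian integral to a finite interval and compare with the expected value √π:

```python
import math

# Midpoint rule on [-6, 6]; the neglected Gaussian tails are below 1e-16.
n = 100000
a, b = -6.0, 6.0
h = (b - a) / n
integral = sum(math.exp(-(a + (i + 0.5) * h) ** 2) for i in range(n)) * h
assert abs(integral - math.sqrt(math.pi)) < 1e-7
```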



5. The Gamma function may be written

Γ(z) = 2 ∫₀^∞ e^{−r²} r^{2z−1} dr.   (2.119)

Show that Γ(1/2) = π^{1/2}.
6. Let a_{n−1} be the area of the n − 1 dimensional unit sphere in Rⁿ. Prove that
π^{n/2} = a_{n−1} (1/2) Γ(n/2). Hint: Express the n-dimensional integral ∫ e^{−x²} dx in
Cartesian coordinates and in polar coordinates.
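The identity can be sanity-checked numerically for the two familiar cases a₁ = 2π (unit circle in R²) and a₂ = 4π (unit sphere in R³); this Python sketch uses the standard-library Gamma function:

```python
import math

# Known areas: a_1 = 2*pi (unit circle in R^2), a_2 = 4*pi (unit sphere in R^3),
# keyed by the ambient dimension n.
areas = {2: 2 * math.pi, 3: 4 * math.pi}
for n, a in areas.items():
    # Check pi^(n/2) = a_{n-1} * (1/2) * Gamma(n/2).
    assert abs(math.pi ** (n / 2) - a * 0.5 * math.gamma(n / 2)) < 1e-12
```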


Chapter 3

Differential Forms


3.1 Coordinates
There are two theories of integration. The first describes how to integrate a
function over a set. The second explains how to integrate a differential form
over an oriented surface. It is the second theory that is the natural setting for
calculus. It is also a thread that runs through much of geometry and even of
applied mathematics. The topic of this chapter is differential forms and their
integrals, culminating with the general form of Stokes’ theorem.
In the following a function is said to be smooth if it is C ∞ . Two open sets
U, V of Rn are said to be diffeomorphic if there is a one-to-one smooth function
f : U → V with smooth inverse function f −1 : V → U .
An n-dimensional manifold patch is a set M together with a collection of
functions defined on M . Each such function is one-to-one from M onto some
open subset of Rn . Such a function is called a coordinate system. There are
two requirements on the set of coordinate system functions.

• If x = (x1 , . . . , xn ) : M → U is a coordinate system with values in the


open set U , and if f : U → V is a diffeomorphism from the open set U to
the open set V , then f (x) : M → V is also a coordinate system.

• If x = (x1 , . . . , xn ) : M → U is a coordinate system mapping M onto U ,


and if u = (u1 , . . . , un ) : M → V is a coordinate system mapping M onto
V , then there is a diffeomorphism f : U → V such that u = f (x).

This definition of manifold patch is not a standard definition; its purpose here
is to capture the idea of a set with many different coordinate systems, each of
which is defined on the same set.
For n = 1 a typical manifold patch is something like a featureless curve.
It does not have a shape, but one could think of something like the letter S,
without the end points. (One should not think of a curve in the shape of an O,
because it does not match with an open interval of numbers. Also one should
not have a curve that looks like the Greek α, because it has a self-intersection
point.) For n = 2 a typical manifold patch is like a patch of cloth, but without
a border. It can have holes. The case n = 0 is a single point.
The coordinate functions serve to attach numbers to the points of M . This
can be done in many ways, and there is no one coordinate system superior to
all the others, at least not without further consideration. Furthermore, in many
treatments (including here) the points in M do not have names. If we want
to specify a point, we say that it is the point in M such that in a particular
coordinate system the coordinate values are certain specified numbers.
The concept of manifold patch is natural in geometry and in many applica-
tions of mathematics. A manifold patch M is (at least at the outset) assumed
to be featureless, except for the fact that it can have various coordinate sys-
tems. These coordinate systems are supposed to be on a democratic status; one
is as good as another. Since the notions of open set, closed set, compact set,
continuous function, smooth function, smooth curve, and so on are independent
of coordinate system, they make sense for a manifold patch. On the other hand,

notions of distance, angle, congruence, and so on are not defined (although they
may be introduced later). It is amazing how much useful mathematics can be
done in this general context.
The manifold patch concept is particularly suited to the local notion of
geometry, that is, it gives a description of what happens near a point. This
is because a manifold patch is modeled on an open subset of Rn . There is a
more general concept of manifold that consists of many manifold patches joined
together. This is a global or big picture notion of geometry, and it is a fascinating
topic in advanced mathematics. Here we focus on the local story, with only brief
mention of possible global issues.
Example: Here is a typical example of a manifold patch in applied mathematics.
Consider a box of gas with pressure p and volume V and temperature T . These
are related by an equation of state f (p, V, T ) = 0. In nice cases this equation
may be solved for one variable in terms of the others via the implicit function
theorem. Finding the equation of state is a major task of physics. But even
after this is determined, it is not clear which coordinates will be most convenient
to use. In this case the manifold patch M is the set of possible states of the
gas. One possible coordinate system is p, V . Another is p, T . Yet another is
V, T . Physicists and chemists and geologists use whichever coordinate system
is appropriate to the problem under consideration. |

3.2 Scalar fields


There are three notions that are important at the outset, that of scalar field,
vector field, and differential 1-form. We look at each of them in turn. In general,
we shall require that these be defined on a manifold patch M .
A scalar field is a function s = h(u) = h(u1 , . . . , un ) from M to R. Here
u = (u1 , . . . , un ) is some coordinate system mapping M onto the open set V ,
and h is a smooth function on V . Usually we picture such a function by drawing
contour lines of s on M . This sketch may indicate maxima and minima and
saddle points of s. Notice that while we can express s = h(u), we can also
express s = h(f (x)) in terms of some other coordinate system. The notion of
scalar field is independent of coordinate system.
Example: Here is a simple example that illustrates the notion of manifold
patch in an elementary application. Consider the problem of making a box with
a given amount of material to contain the maximum volume. The box will have
five sides, a base and four vertical sides. It is open at the top. In this case the
manifold patch is the set M of possible boxes made with this material.
Say that the side lengths of the base are u, v and the height is w. The amount
of material available is a fixed number A. Thus uv + 2uw + 2vw = A. There are
various possible coordinate systems to describe the boxes. One possibility is u, v.
Another is u, w. Yet another is v, w. These each send M to a coordinate patch.
Each coordinate system may be expressed in terms of each other coordinate
system. For instance, we may express u, v as in terms of u, w by noting that
v = (A − 2uw)/(u + 2w) for 0 < 2uw < A.

The volume V = uvw is a function on M that is a scalar field. For every


constant c > 0 the set of points in M (box shapes) where V = c is either a curve
(the set of boxes with volume c), a single point (the box where V assumes its
maximum value), or empty. Later on we shall solve the problem of finding the
box shape for which V assumes its maximum value. The answer turns out to
be that w/u = 1/2 and w/v = 1/2 for this shape. That is, the open top box with
maximal volume has a square base and a height half the length of each side of
the base.
The solution follows a typical strategy for applications of mathematics. Con-
sider the set M of objects of interest. Attach various numerical values to the
points in M and label these by variable names. Use what you know about M to
establish various relations between these variables. Finally, make some (perhaps
arbitrary) choice of which variables will serve as coordinates, and eliminate the
other variables. Then the mathematical calculation begins. |

3.3 Vector fields


A vector field is a differential operator of the form

X = Σ_{j=1}^{n} a_j ∂/∂x_j.   (3.1)

Here each a_j is a scalar field. The differential operator acts on a scalar field s
to give another scalar field

X s = Σ_{j=1}^{n} a_j ∂s/∂x_j.   (3.2)

Again, the notion of vector field is independent of the coordinate system. Thus
we can also write

X = Σ_{i=1}^{n} ā_i ∂/∂u_i.   (3.3)

Here

ā_i = Σ_{j=1}^{n} (∂u_i/∂x_j) a_j = Σ_{j=1}^{n} f′_{i,j}(x) a_j.   (3.4)

One can picture a vector field in the following way. At each point of M one
draws an arrow. This arrow is not to be thought of as a displacement in M , but
as a kind of rate of change at this point of M . For instance, if M is thought
of as a region where there is a fluid flow, then the vector field might be something
like the fluid velocity. More precisely, the components of the vector field are
velocities. The vector field itself describes how quantities change with time as
they move with the fluid.
Giving a vector field is equivalent to giving a system of ordinary differential
equations. More precisely, it is equivalent to giving an autonomous system of

first order ordinary differential equations. (The word autonomous means that
the vector field does not change with time.) The equations corresponding to
the vector field Σ_j g_j(x) ∂/∂x_j are

dx_j/dt = g_j(x_1, . . . , x_n).   (3.5)
Of course this can also be written in the abbreviated form dx/dt = g(x). A
solution of such an equation is given by functions hj (t) with

dh_j(t)/dt = g_j(h_1(t), . . . , h_n(t)).   (3.6)
Again this has a brief form dh(t)/dt = g(h(t)).
Locally, a vector field is a fairly boring object, with one exception. This is
at a point in the manifold patch M where the vector field X vanishes, that is,
where each aj has the value zero. Away from such points the vector field is
doing nothing more interesting than uniform motion.

Theorem 3.1 (Straightening out theorem) If

X = Σ_{i=1}^{n} a_i ∂/∂x_i ≠ 0   (3.7)

is a vector field that is non-zero near some point, then near that point there is
another coordinate system u_1, . . . , u_n in which it has the form

X = ∂/∂u_j.   (3.8)

Proof: Here is the idea of the proof of the straightening out theorem. Say
that a_j ≠ 0. Solve the system of differential equations

dx_i/dt = a_i   (3.9)

with initial conditions on the surface x_j = 0. This can be done locally, by the
existence theorem for systems of ordinary differential equations with smooth
coefficients. The result is that x_i is a function of the coordinates x_i for i ≠ j
restricted to the surface x_j = 0 and of the time parameter t. Furthermore,
since dx_j/dt ≠ 0, the condition t = 0 corresponds to the surface x_j = 0. So if
x_1, . . . , x_n corresponds to a point in M near the given point, we can define for
i ≠ j the coordinates u_i to be the initial value of x_i on the surface x_j = 0, and
we can define u_j = t. In these coordinates the differential equation becomes

du_i/dt = 0,  i ≠ j,   (3.10)
du_j/dt = 1.   (3.11)


Example: Consider the vector field

−y ∂/∂x + x ∂/∂y   (3.12)

away from the origin. The corresponding system is

dx/dt = −y   (3.13)
dy/dt = x.   (3.14)

Take the point to be y = 0, with x > 0. Take the initial condition to be x = r
and y = 0. Then x = r cos(t) and y = r sin(t). So the coordinates in which
the straightening out takes place are polar coordinates r, t. Thus if we write
x = r cos(φ) and y = r sin(φ), we have

−y ∂/∂x + x ∂/∂y = ∂/∂φ,   (3.15)

where the partial derivative with respect to φ is taken with r held fixed. |
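The identity −y ∂/∂x + x ∂/∂y = ∂/∂φ can be checked numerically with finite differences (a Python sketch; the test scalar s below is an arbitrary choice):

```python
import math

def s(x, y):
    # An arbitrary smooth test scalar field.
    return x ** 2 * y + math.sin(y)

def X_s(x, y, h=1e-6):
    # Apply X = -y d/dx + x d/dy using central differences.
    ds_dx = (s(x + h, y) - s(x - h, y)) / (2 * h)
    ds_dy = (s(x, y + h) - s(x, y - h)) / (2 * h)
    return -y * ds_dx + x * ds_dy

def dphi_s(r, phi, h=1e-6):
    # d/dphi at fixed r, applied to s expressed in polar coordinates.
    sp = lambda p: s(r * math.cos(p), r * math.sin(p))
    return (sp(phi + h) - sp(phi - h)) / (2 * h)

r, phi = 1.3, 0.7
x, y = r * math.cos(phi), r * math.sin(phi)
assert abs(X_s(x, y) - dphi_s(r, phi)) < 1e-4
```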
Example: Consider the Euler vector field

x ∂/∂x + y ∂/∂y = r ∂/∂r,   (3.16)

where the partial derivative with respect to r is taken with fixed φ. We need to
stay away from the zero at the origin. If we let t = ln(r), then this is

x ∂/∂x + y ∂/∂y = r ∂/∂r = ∂/∂t,   (3.17)

where the t derivative is taken with φ fixed. |


Say that a vector field defining a system of ordinary differential equations
has an isolated zero. Thus the equation dx/dt = g(x) has a point x∗ where
g(x∗) = 0. Then it becomes interesting to approximate this equation near the
point by

dx̄/dt = g′(x∗) x̄,   (3.18)

where x̄ = x − x∗. This is called the linearization of the vector field at the point
x∗. The behavior of the linearization is studied by finding the eigenvalues of
the matrix g′(x∗).
This story is already of interest in the case n = 2. Here are some common
cases.

Stable node Real eigenvalues with λ1 < 0, λ2 < 0.

Unstable node Real eigenvalues with λ1 > 0, λ2 > 0.



Hyperbolic fixed point (saddle) Real eigenvalues with λ1 < 0 < λ2 .


Stable spiral Nonreal eigenvalues with λ = µ ± iω, µ < 0.
Unstable spiral Nonreal eigenvalues with λ = µ ± iω, µ > 0.
Elliptic fixed point (center) Nonreal eigenvalues λ = ±iω.
There are yet other cases when one of the eigenvalues is zero.
Example: Here is a fairly typical analysis of a vector field via fixed points and
linearizations. Consider the system

du/dt = u(v − 1)   (3.19)
dv/dt = 4 − u² − v².

There are fixed points in the u, v plane at (0, 2), (0, −2), (√3, 1), and (−√3, 1).
We can compute the linearizations at each of these fixed points. The
first two are hyperbolic fixed points with vertical and horizontal eigenspaces.
The eigenvalues at (0, −2) are −3 in the horizontal direction and 4 in the vertical
direction. The eigenvalues at (0, 2) are 1 in the horizontal direction and −4 in
the vertical direction. By setting u = 0 in the original equation one can see that
v moves along the vertical axis from the fixed point at (0, −2) to the fixed point
at (0, 2).
The fixed points at (√3, 1) and (−√3, 1) are both stable spirals, with eigenvalues
−1 ± √5 i. The first one spirals in clockwise, while the other one spirals
in counter-clockwise. There are orbits that approach (0, −2) from either side.
Points below these orbits never reach the spirals. Everything above on the
left gets attracted to the left spiral, while everything above on the right gets
attracted to the right spiral.
Suppose that the same system were viewed in another coordinate system.
Many details would differ. However there would still be four fixed points, and
the eigenvalues of the linearizations would be the same. So the qualitative
picture would be much the same, but perhaps less symmetric. |
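The eigenvalue computations in this example are easy to verify numerically (a Python sketch using NumPy; not part of the text):

```python
import numpy as np

def jacobian(u, v):
    # Jacobian matrix of the right-hand side (u(v - 1), 4 - u^2 - v^2).
    return np.array([[v - 1.0, u], [-2.0 * u, -2.0 * v]])

# Hyperbolic fixed point at (0, 2): eigenvalues 1 and -4.
assert np.allclose(np.sort(np.linalg.eigvals(jacobian(0.0, 2.0))), [-4.0, 1.0])

# Spiral fixed point at (sqrt(3), 1): eigenvalues -1 +/- sqrt(5) i.
ev = np.sort_complex(np.linalg.eigvals(jacobian(np.sqrt(3.0), 1.0)))
assert np.allclose(ev, [-1.0 - np.sqrt(5.0) * 1j, -1.0 + np.sqrt(5.0) * 1j])
```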
Example: A classic example from physics is the pendulum

dq/dt = (1/m) p   (3.20)
dp/dt = −mg sin(q/a).   (3.21)

Here q = aθ represents displacement, and p represents momentum. The zeros
are at θ = nπ. When n is even this is the pendulum at rest in a stable position;
when n is odd this is the pendulum at rest upside down, in a very unstable
position. The linearization at a zero (writing q̃ = q − nπa, p̃ = p) is

dq̃/dt = (1/m) p̃   (3.22)
dp̃/dt = −(mg/a)(−1)ⁿ q̃.   (3.23)

In matrix form this is

d/dt (q̃, p̃)ᵀ = [[ 0, 1/m ], [ −(−1)ⁿ mg/a, 0 ]] (q̃, p̃)ᵀ.   (3.24)

The eigenvalues λ are given by λ² = −(−1)ⁿ g/a. When n is even we get an elliptic
fixed point, while when n is odd we get a hyperbolic fixed point.
The big picture is seen by examining the scalar

H = (1/2m) p² − mga cos(q/a).   (3.25)
This is the energy, and it is constant for each solution. While the energy does
not describe the time dependence of the solutions, it does show the shape of the
solutions in the phase plane. |
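Conservation of H along solutions can be checked numerically by integrating the pendulum equations (a Python sketch; the parameter values m = 1, g = 9.8, a = 2 and the initial condition are arbitrary illustrative choices):

```python
import math

m, g, a = 1.0, 9.8, 2.0  # arbitrary illustrative parameter values

def H(q, p):
    # The energy (3.25); it should be constant along solutions.
    return p * p / (2 * m) - m * g * a * math.cos(q / a)

def rhs(q, p):
    # Right-hand side of the pendulum equations (3.20)-(3.21).
    return p / m, -m * g * math.sin(q / a)

def rk4_step(q, p, dt):
    # One classical fourth-order Runge-Kutta step.
    k1q, k1p = rhs(q, p)
    k2q, k2p = rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = rhs(q + dt * k3q, p + dt * k3p)
    return (q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)

q, p = 1.0, 0.0
e0 = H(q, p)
for _ in range(10000):
    q, p = rk4_step(q, p, 0.001)
assert abs(H(q, p) - e0) < 1e-6
```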
The following question is natural. Suppose that a vector field has an isolated
zero. At that zero it has a linearization. Is it possible to choose coordinates
nearby so that the vector field is given in those new coordinates by its lineariza-
tion? It turns out that this can often be done. The answer to the question is
negative in general. See Nelson [14] for a discussion of this delicate matter.

3.4 Fluid velocity and the advective derivative


It may be puzzling to think of a vector field as a differential operator. This
section is a digression to point out that this kind of construction is quite natural
in the context of fluid dynamics.
The velocity field of a fluid is an important example of a vector field. Con-
sider a fluid with a velocity vector field that is independent of time. For sim-
plicity, consider a two-dimensional case; think of the velocity of the water on
the surface of a river. Just because the velocity does not depend on time, this
does not mean that there is no motion. If one follows a particular particle,
it is transported along a path. If the position of the particle is described by
coordinates u, v, then the motion of the particle is given by

du/dt = a = f(u, v)
dv/dt = b = g(u, v).   (3.26)
Here a, b are the velocity vector field components with respect to the coordinates
u, v.
Now let s = h(u, v) be some time-independent quantity. For instance, it
could be the temperature of the fluid at each point in space. If we follow this
quantity along a particle, then it does change in time, according to
 
ds/dt = (∂s/∂u)(du/dt) + (∂s/∂v)(dv/dt) = ( a ∂/∂u + b ∂/∂v ) s.   (3.27)

In fluid dynamics the differential operator on the right represents the effect of
the fluid flow given by the velocity vector field. It is the derivative following
the motion of the particle. It is so important that it has many names: ad-
vective derivative, particle derivative, material derivative, substantial derivative,
Lagrangian derivative, and so on. The components a, b of the velocity vector
field depend on the coordinate system. If now one changes coordinates, say to
w, z, then the equation becomes
dw/dt = p = m(z, w)
dz/dt = q = n(z, w),   (3.28)
where

p = a ∂w/∂u + b ∂w/∂v
q = a ∂z/∂u + b ∂z/∂v.   (3.29)
Then a straightforward calculation gives

p ∂/∂w + q ∂/∂z = a ( ∂w/∂u ∂/∂w + ∂z/∂u ∂/∂z ) + b ( ∂w/∂v ∂/∂w + ∂z/∂v ∂/∂z ).   (3.30)

By the chain rule this is

p ∂/∂w + q ∂/∂z = a ∂/∂u + b ∂/∂v.   (3.31)
Since the advective derivative represents the rate of change along the motion of
the particle, it is independent of the coordinate system. Specifying the advective
derivative is thus a particularly attractive way of specifying the vector field.
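The coordinate independence of the advective derivative can be checked numerically. The sketch below (Python; the scalar field and the linear change of coordinates w = u + v, z = u − v are illustrative choices) compares a ∂s/∂u + b ∂s/∂v with p ∂s/∂w + q ∂s/∂z, where p, q are given by (3.29):

```python
def s(u, v):
    # An arbitrary smooth scalar field.
    return u ** 3 + u * v ** 2

# Velocity components in the (u, v) coordinates at a chosen point.
u0, v0 = 0.4, 1.1
a, b = 2.0, -0.5

h = 1e-6
ds_du = (s(u0 + h, v0) - s(u0 - h, v0)) / (2 * h)
ds_dv = (s(u0, v0 + h) - s(u0, v0 - h)) / (2 * h)
lhs = a * ds_du + b * ds_dv

# New coordinates w = u + v, z = u - v; transformed components from (3.29).
p, q = a + b, a - b
S = lambda w, z: s((w + z) / 2, (w - z) / 2)  # s expressed in w, z
w0, z0 = u0 + v0, u0 - v0
dS_dw = (S(w0 + h, z0) - S(w0 - h, z0)) / (2 * h)
dS_dz = (S(w0, z0 + h) - S(w0, z0 - h)) / (2 * h)
rhs = p * dS_dw + q * dS_dz

assert abs(lhs - rhs) < 1e-6
```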

3.5 Differential 1-forms

A differential 1-form ω assigns to each vector field X a corresponding scalar
field ⟨ω | X⟩. There is a very important way of constructing a differential 1-
form ds from a scalar field s. This is called the differential of the scalar field.
The definition is

⟨ds | X⟩ = X s.   (3.32)

This is perhaps the fundamental definition in the entire subject. Take

X = Σ_{j=1}^{n} a_j ∂/∂x_j.   (3.33)

If we write out the definition explicitly, we get

⟨ds | X⟩ = Σ_{j=1}^{n} a_j ∂s/∂x_j.   (3.34)

If we apply this to the scalar s = x_i that is one of the coordinates, then we get

⟨dx_i | X⟩ = a_i.   (3.35)

It follows that

⟨ds | X⟩ = Σ_{j=1}^{n} (∂s/∂x_j) ⟨dx_j | X⟩ = ⟨ Σ_{j=1}^{n} (∂s/∂x_j) dx_j | X ⟩.   (3.36)

The final result is that the differential ds is given by

ds = Σ_{j=1}^{n} (∂s/∂x_j) dx_j.   (3.37)

This is the most basic computational tool of the theory. The coordinate basis
forms dxi are sometimes called the dual basis of the coordinate basis vector
fields ∂/∂xj .
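The defining identity ⟨ds | X⟩ = X s can be checked numerically at a point: the pairing computed from (3.34) should agree with the directional derivative of s along the coefficient vector of X at that point (a Python sketch with an arbitrary test field):

```python
import math

def s(x1, x2):
    # An arbitrary scalar field on a two-dimensional patch.
    return math.exp(x1) * math.sin(x2)

# A point, and the coefficients of X = a1 d/dx1 + a2 d/dx2 at that point.
x1, x2 = 0.3, 1.2
a1, a2 = 1.5, -0.7

h = 1e-6
# <ds | X> via (3.34): sum of a_j times ds/dx_j, by central differences.
pairing = (a1 * (s(x1 + h, x2) - s(x1 - h, x2)) / (2 * h)
           + a2 * (s(x1, x2 + h) - s(x1, x2 - h)) / (2 * h))

# X s directly: derivative of s along the line t -> (x1 + t a1, x2 + t a2).
directional = (s(x1 + h * a1, x2 + h * a2)
               - s(x1 - h * a1, x2 - h * a2)) / (2 * h)

assert abs(pairing - directional) < 1e-6
```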
The general 1-form may be written in the form

ω = Σ_{j=1}^{n} p_j dx_j.   (3.38)

Here the p_j are scalar fields. Its value on the vector field X is the scalar field

⟨ω | X⟩ = Σ_{j=1}^{n} p_j a_j.   (3.39)

Remark: There is a dramatic difference between vector field bases and 1-form
bases. The notation ∂/∂z does not make sense unless z is a variable that belongs
to a given coordinate system. For instance, if the coordinate system is q, z, s,
then ∂/∂z means to differentiate with respect to z holding q, s both constant.
On the other hand, a differential dy makes sense for an arbitrary scalar field y,
whether or not it belongs to a coordinate system. |
Example: Here is an illustration of some of these ideas. Consider the problem of
making a box with a given amount of material to contain the maximum volume.
The box will have five sides, a base and four vertical sides. It is open at the
top. In this case the manifold patch is the set M of possible boxes made with
this material.
Say that the side lengths of the base are u, v and the height is w. The
amount of material available is a fixed number A. Thus uv + 2uw + 2vw = A.
Since A is a constant, we have
(v + 2w) du + (u + 2w) dv + 2(u + v) dw = 0. (3.40)
This relation is valid on all of M . We are interested in the point of M (that is,
in the particular shape of box) with the property that the volume V = uvw is
maximized. At this point we have
dV = vw du + uw dv + uv dw = 0. (3.41)

Now it is time to choose a coordinate system to work with, and it is convenient
to choose u, v. Thus we eliminate dw from the system. Multiplication by
2(u + v)/(uv) gives

(2vw/u + 2w) du + (2uw/v + 2w) dv + 2(u + v) dw = 0   (3.42)

at the point. Subtracting the equations gives

v(1 − 2w/u) du + u(1 − 2w/v) dv = 0   (3.43)
at the point. Since u, v is a coordinate system, the coefficients must be zero at
the point. This gives w = u/2 and w = v/2 as the dimensions of the box. The
box with maximal volume has a square base and a height half the length of each
side of the base. |
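A brute-force search confirms the answer (a Python sketch; the value A = 12 is an arbitrary choice of material, for which the optimum is u = v = 2, w = 1, volume 4):

```python
# Grid search over base side u and height w; v is determined by the
# material constraint uv + 2uw + 2vw = A.
A = 12.0  # arbitrary amount of material

best_vol, best_u, best_w = 0.0, 0.0, 0.0
n = 400
for i in range(1, n):
    for j in range(1, n):
        u = 6.0 * i / n
        w = 6.0 * j / n
        v = (A - 2 * u * w) / (u + 2 * w)
        if v <= 0:
            continue  # not a feasible box
        vol = u * v * w
        if vol > best_vol:
            best_vol, best_u, best_w = vol, u, w

# The analytic optimum: square base u = v = 2, height w = 1, volume 4.
assert abs(best_u - 2.0) < 0.05 and abs(best_w - 1.0) < 0.05
assert abs(best_vol - 4.0) < 0.01
```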
The 1-form ω is said to be exact in an open set if ω = ds for some scalar
field s defined on that open set. It is also sometimes called an exact differential.
The 1-form ω is said to be closed if for each j ≠ k we have

∂p_j/∂x_k = ∂p_k/∂x_j.   (3.44)

These are n(n − 1)/2 conditions. The following theorem is both trivial and supremely
important.
Theorem 3.2 If ω is an exact 1-form in some open set, then ω is also a closed
form.
The notion of differential 1-form is coordinate invariant. If

α = Σ_{k=1}^{n} q_k du_k   (3.45)

is a differential 1-form, and if u = f(x), then

α = Σ_{j=1}^{n} q̄_j dx_j   (3.46)

with

q̄_j = Σ_{k=1}^{n} q_k ∂u_k/∂x_j = Σ_{k=1}^{n} q_k f′_{k,j}(x)   (3.47)

expresses the same form in the other coordinate system. It may be shown by
calculation that the criterion for being exact or closed is the same in either
coordinate system.
The coordinate invariance is a reflection of the fact that the pairing of differential
1-form and vector field gives rise to a well-defined scalar. Explicitly, we have

⟨α | X⟩ = Σ_{j=1}^{n} q̄_j a_j = Σ_{k=1}^{n} q_k ā_k.   (3.48)

The x coordinates and the u coordinates give the same result.


An exact differential 1-form ω = ds may be pictured by the contour surfaces
of s. At points where ω ≠ 0 these are hypersurfaces of dimension n − 1. It is
sometimes helpful to indicate which contour lines have larger values of s and
which have smaller values of s. It is harder to picture a differential 1-form that
is not exact. The idea is to draw fragments of contour surfaces. These n − 1
dimensional fragments end in n − 2 dimensional surfaces.
Example: Take an example when n = 2. A typical example of a differential
1-form that is not exact is y dx. The fragmented contour lines are all vertical.
The form indicates increase to the right in the upper half plane and increase to
the left in the lower half plane. As the x axis is approached the density of these
fragmented contour lines must diminish at constant rate. Some of the lines have
end points at their lower ends (in the upper half plane) or at their upper ends
(in the lower half plane). |

3.6 Polar coordinates


Polar coordinates provide a useful example of these ideas. Consider the manifold
patch M that is the Euclidean plane. The points in the plane are not numbers;
they are geometrical objects. Let M • be M with a point removed. This is the
punctured plane. Let M † be M • after removal of a half-line that starts at the
missing point. This is the cut plane.
We may choose a Cartesian coordinate system x, y on M . Suppose that the
point where x = 0 and y = 0 is the point that is removed to make M • . Suppose
that the line where x ≤ 0 and y = 0 is the half-line in the definition of M † .
Then on M † the polar coordinates r, θ are related to the Cartesian coordinates
by

x = r cos(θ)
y = r sin(θ). (3.49)

Here θ ranges from −π to π, with a jump in value across the half-line. These
equations are identities saying that the scalars on the left are equal to the scalars
on the right as real functions on M † . Similarly, we have the identity r² = x² + y².
Another way of thinking of the relation between the two coordinate systems
is to define open subsets U and V of R2 by taking U = {(a, b) | a > 0, −π <
b < π} and V = R2 \ {(p, q) | p ≤ 0, q = 0}. Then the coordinate system (x, y)
maps M † to V , and the coordinate system (r, θ) maps M † to U . The change of
coordinates is a smooth one-to-one function f from U to V . The result is that
(x, y) = f (r, θ) as functions from M † to V . The two coordinate systems provide
two numerical descriptions of the same object M † .
Taking the differential and then eliminating the trig functions gives
dx = (x/r) dr − y dθ   (3.50)
dy = (y/r) dr + x dθ.   (3.51)

It follows that

x dx + y dy = r dr
x dy − y dx = r² dθ.   (3.52)

In particular, the differential form

ω† = (x dy − y dx)/(x² + y²) = dθ   (3.53)

as an identity on M † . This shows that ω † on the cut plane M † is an exact form.


The form on the cut plane is not a particularly natural object. Instead, the
angle form

ω• = (x dy − y dx)/(x² + y²)   (3.54)
makes sense on all of M • . Furthermore, it is a closed form on M • . The price to
pay is that it is not an exact form on the punctured plane M • . The interpre-
tation of this form is that near every point in M • it represents angular change.
In fact, locally we can always define an angle variable such that ω • = dχ. But
there is no such angle variable defined on all of M • .
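The distinction between closed and exact can be seen numerically: the integral of an exact form around a closed loop is zero, while the angle form ω• integrated around the unit circle gives 2π (a Python sketch, not part of the text):

```python
import math

def omega(x, y, dx, dy):
    # The angle form (x dy - y dx)/(x^2 + y^2) applied to a tangent vector.
    return (x * dy - y * dx) / (x * x + y * y)

# Riemann sum for the integral of the angle form around the unit circle.
n = 100000
total = 0.0
for i in range(n):
    t = 2 * math.pi * i / n
    x, y = math.cos(t), math.sin(t)
    dxdt, dydt = -math.sin(t), math.cos(t)  # velocity of the parameterization
    total += omega(x, y, dxdt, dydt) * (2 * math.pi / n)

# An exact form would integrate to zero around a loop; the angle form gives 2*pi.
assert abs(total - 2 * math.pi) < 1e-9
```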
There is a similar computation for vector fields. The chain rule followed by
elimination of trig functions gives

∂/∂r = (x/r) ∂/∂x + (y/r) ∂/∂y
∂/∂θ = −y ∂/∂x + x ∂/∂y   (3.55)

as an identity on the cut plane M † .


Consider the vector fields

E = x ∂/∂x + y ∂/∂y
R = x ∂/∂y − y ∂/∂x.   (3.56)
These are both defined on the plane M . The vector field E is sometimes called
the Euler vector field ; it plays a role in the study of homogeneous functions.
The vector field R is the rotation vector field. It describes rotations of constant
angular frequency. The origin is a zero of the vector field R. Rotations of the
origin just leave it fixed.
If we restrict the vector fields E and R to the punctured plane M • , then they
can be locally straightened out. In fact this is true for arbitrary non-zero vector
fields. For differential forms the condition for local exactness is that the form
be a closed form. There is no such restriction for vector fields.
For the Euler operator we can take r = eᵗ and get E = r ∂/∂r = ∂/∂t in all
of M • . For the rotation operator the straightening out is only local. For every

point in M • we can choose an angular variable χ that is defined near the point,
and then R = ∂/∂χ near the point.
This discussion shows that it is useful to think of Cartesian coordinates and
polar coordinates as scalars defined on a manifold patch. Then the equations
above are identities, either for scalars or for differential forms or for vector fields.
The closed differential form ω defined on the punctured plane is a fundamen-
tal mathematical object; for instance it underlies many of the calculations in
complex variable theory. The rotation operator R has an associated system
of differential equations. These are the equations for a linear oscillator, a ba-
sic system that occurs throughout applied mathematics. These examples also
illustrate that differential forms and vector fields are quite different objects.
The example in this section is not typical in one respect: there are natural
notions of length and angle. Thus in Cartesian coordinates dx and dy are
orthogonal unit forms. In polar coordinates dr and r dθ are orthogonal unit
forms. (Note: The form r dθ is not a closed form.) There is a similar story
for vector fields. In Cartesian coordinates ∂/∂x and ∂/∂y are orthogonal unit
vectors. In polar coordinates ∂/∂r and (1/r) ∂/∂θ are orthogonal unit vectors.
The reason for this is that the underlying manifold M is the Euclidean plane,
which has natural notions of length and angle. Such special structure need not
be present in other manifolds.

3.7 Integrating factors and canonical forms


A classic application of these ideas is ordinary differential equations in the plane.
Such an equation is often written in the form

p dx + q dy = 0. (3.57)

Here p = f (x, y) and q = g(x, y) are functions of x, y. This means that a solution
of the equation is a curve where the differential form p dx + q dy is zero. There
can be many such curves.
The equation is determined by the differential form α = p dx + q dy, but two
different forms may determine equivalent equations. For example, if µ = h(x, y)
is a non-zero scalar, then the form µα = µp dx + µq dy is a quite different form,
but it determines an equivalent differential equation.
If α = p dx + q dy is exact, then p dx + q dy = dz, for some scalar z depending
on x and y. Each solution of the differential equation is then given implicitly
by z = c, where c is the constant of integration.
If α = p dx + q dy is not exact, then one looks for an integrating factor µ
such that
µα = µ(p dx + q dy) = dz (3.58)

is exact. Once this is done, again the general solution of the differential equation
is then given implicitly by z = c, where c is the constant of integration.

Theorem 3.3 Suppose that α = p dx + q dy is a differential form in two di-


mensions that is non-zero near some point. Then α has a non-zero integrating
factor µ near the point, so µα = dv for some scalar field v.
Proof: Consider the non-zero vector field X = q ∂/∂x − p ∂/∂y. By the straightening
out theorem, there is a new coordinate system u, v such that X = ∂/∂u.
This means that ∂x/∂u = q and ∂y/∂u = −p. It is easy to check that

α = p dx + q dy = (p ∂x/∂v + q ∂y/∂v) dv = w dv,    (3.59)

where w is a non-zero scalar. We can then take µ = 1/w. □
Finding an explicit integrating factor may be no easy matter. However, there
is a strategy that may be helpful.
Recall that if a differential form is exact, then it is closed. So if µ is an
integrating factor, then
∂(µp)/∂y − ∂(µq)/∂x = 0.    (3.60)

This condition may be written in the form

p ∂µ/∂y − q ∂µ/∂x + (∂p/∂y − ∂q/∂x) µ = 0.    (3.61)
Say that by good fortune there is an integrating factor µ that depends only
on x. Then this gives a linear ordinary differential equation for µ that may be
solved by integration.
Example: Consider the standard problem of solving the linear differential equation

dy/dx = −ay + b,    (3.62)

where a, b are functions of x. Consider the differential form (ay − b) dx + dy. Look
for an integrating factor µ that depends only on x. The differential equation for
µ is dµ/dx = aµ. This has solution µ = e^A, where A is a function of x with
dA/dx = a. Thus

e^A (ay − b) dx + e^A dy = d(e^A y − S),    (3.63)

where S is a function of x with dS/dx = e^A b. So the solution of the equation is
y = e^{−A}(S + c). |
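The integrating factor can be checked symbolically. The sketch below (using sympy, with the hypothetical coefficients a = x and b = sin x) builds µ = e^A and verifies that the form µα satisfies the closedness condition (3.60).

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b = x, sp.sin(x)              # hypothetical coefficients a(x), b(x)

A = sp.integrate(a, x)           # A with dA/dx = a
mu = sp.exp(A)                   # the integrating factor mu = e^A

# coefficients of mu*alpha = mu*(a*y - b) dx + mu dy
p, q = mu*(a*y - b), mu

# closedness condition (3.60) for the form p dx + q dy
assert sp.simplify(sp.diff(p, y) - sp.diff(q, x)) == 0
```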
The theory of differential forms is extraordinarily different from the theory
of vector fields. A nonzero vector field may always be straightened out locally.
For differential forms this is only possible if the form is closed (and hence locally
exact).

Theorem 3.4 Consider a differential form α = p dx + q dy in two dimensions.


Suppose that near some point α is not zero. Then
• If α is closed near this point, then there is a scalar field z with α = dz.

• If α is not closed, then there is a new coordinate system w, v with


α = w dv.

Proof: Since α = p dx + q dy is not zero, there is a new coordinate system


u, v in which it has the form α = w dv. In this coordinate system the condition
that α is a closed form is that ∂w/∂u = 0.
If α is closed, then w is a function of v. Thus α = w dv has an integral z
that is a function of v.
If α is not closed, then the matrix that expresses the partial derivatives of
w, v in terms of u, v is non-singular. By the inverse function theorem w, u is
also a coordinate system. 
A theorem of Darboux gives a list of standard representations of 1-forms in
higher dimensions. The differential equations book by Ince [7] treats the three
dimensional situation. The treatise by Şuhubi [19] gives a full discussion for n
dimensions.

3.8 The second differential


This section deals with criteria for the critical points of a scalar function. These
concern the coordinate invariant versions of the first derivative test and the
second derivative test. The first derivative test involves first partial derivatives,
that is, the differential. The second derivative test involves the Hessian matrix
of second partial derivatives, which at a critical point gives a second differential.

Theorem 3.5 (Coordinate invariance of first derivative test) Suppose that


M is a manifold patch and z is a scalar field on M . If z has a local maximum
or local minimum at a certain point, then at that point
dz = Σ_{i=1}^{n} (∂z/∂xi ) dxi = 0.    (3.64)

This condition may be expressed in any coordinate system.

Theorem 3.6 (Coordinate invariance of second derivative test) Suppose


that M is a manifold patch and z is a scalar field on M . Consider a point where
dz = 0. Then at that point
d²z = Σ_{i=1}^{n} Σ_{ℓ=1}^{n} (∂²z/∂xi ∂xℓ ) dxi dxℓ .    (3.65)

If the Hessian matrix on the right is positive definite (negative definite), then
the function z has a local minimum (local maximum). This condition may be
expressed in any coordinate system.

The computation that underlies these results begins with


∂z/∂yi = Σ_{j=1}^{n} (∂z/∂xj )(∂xj /∂yi ).    (3.66)

If we differentiate again, we get


∂²z/∂yi ∂yk = Σ_{j=1}^{n} Σ_{ℓ=1}^{n} (∂²z/∂xj ∂xℓ )(∂xj /∂yi )(∂xℓ /∂yk ) + Σ_{j=1}^{n} (∂z/∂xj )(∂²xj /∂yi ∂yk ).    (3.67)

The second derivative in the second term on the right is a rather complicated
factor. But if the first derivatives ∂z/∂xj = 0 for j = 1, . . . , n at a certain
point, then we are left with the Hessian matrix at this point transformed by the
coordinate transformations on left and right. This is a matrix congruence, so it
preserves the positive definite or negative definite property.
In the case of a function of two variables, there is a simple criterion for
application of the second derivative test. Suppose that z = h(x, y) is a smooth
function. Consider a point where the first derivative test applies, that is, the
differential dz = d h(x, y) is zero. Consider the case when the Hessian is non-
degenerate, that is, has determinant not equal to zero. Suppose first that the
determinant of the Hessian matrix is strictly positive. Then the function has
either a local minimum or a local maximum, depending on whether the trace is
positive or negative. Alternatively, suppose that the determinant of the Hessian
matrix is strictly negative. Then the function has a saddle point.
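This two-variable criterion is easy to automate. The sketch below (using sympy; the function name and test points are our own) classifies a critical point at the origin from the determinant and trace of the Hessian matrix.

```python
import sympy as sp

x, y = sp.symbols('x y')

def classify(h):
    # assumes dh = 0 at the origin; uses det and trace of the Hessian there
    H = sp.hessian(h, (x, y)).subs({x: 0, y: 0})
    d, t = H.det(), H.trace()
    if d > 0:
        return 'local minimum' if t > 0 else 'local maximum'
    return 'saddle point' if d < 0 else 'degenerate'

assert classify(x**2 + 3*y**2) == 'local minimum'
assert classify(-x**2 - y**2) == 'local maximum'
assert classify(x**2 - y**2) == 'saddle point'
```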
The case of n dimensions is more complicated. The Hessian matrix may be
transformed by matrix congruence transformations to a diagonal matrix with
entries εj that are +1, −1, or 0. In the non-degenerate case the entries are ±1.
If they are all +1 then we have a local minimum, while if they are all −1 we
have a local maximum. Otherwise we have a saddle.
There is a more powerful insight into these results that comes from changing
to a new coordinate system. The first result states that away from a critical
point nothing interesting happens.

Theorem 3.7 Let z = f (x1 , . . . , xn ) be a smooth function on an n-dimensional


manifold patch such that at a certain point dz 6= 0. Then there is a new coordi-
nate system u1 , . . . , un near the point such that z = u1 .

Proof: We may assume without loss of generality that ∂z/∂x1 ≠ 0. Let


u1 = z and let uj = xj for j = 2, . . . , n. Then the matrix of partial derivatives
∂ui /∂xj is non-singular. So by the inverse function theorem the xj may be
expressed in terms of the uj . 
The next result says that even when the first derivative vanishes, there are
common circumstances when there is nothing interesting going on with the
second derivative. See Milnor [11] for a proof.

Theorem 3.8 (Morse lemma) Let z be a smooth function on an n-dimensional


manifold such that dz vanishes at a certain point. Let z0 be the value of the

function at that point. Suppose that the Hessian is non-degenerate at this point.
Then there is a coordinate system u1 , . . . , un near the point with
z = z0 + Σ_{i=1}^{n} εi ui²,    (3.68)

where the εi are constants that each have the value ±1.

3.9 Regular surfaces


Consider a manifold patch N with coordinates x1 , . . . , xn . It is often of interest
to consider a surface of dimension k < n. The nicest kind of surface is called
a regular surface. A regular k-surface is a subset S ⊆ N with the following
property. Near every point of S there is a coordinate system u1 , . . . , un for N
such that the nearby part of S is defined by uk+1 = 0, . . . , un = 0. (In advanced
treatments a regular surface is sometimes called an embedded manifold.)
The most classical case is a 2-dimensional surface in 3 dimensions. A 1-
dimensional surface is called a curve; in many cases it can be treated with
the same techniques. A regular curve may often be visualized as something
like the letter S, but there is at least one other possibility: it may be like the
letter O. In this latter situation near every point there is a coordinate system
mapping the letter O to an open interval, but at least two such coordinate
systems are required to describe the curve. A 0-dimensional regular surface
consists of isolated points.
An implicit representation of a surface is to give it as the solution of equations
gp (x1 , . . . , xn ) = cp , for p = k + 1, . . . , n. We can also write this as g(x) = c.
The derivative g′(x) is an n − k by n matrix. The largest rank it can have is
n − k, and it is natural to require that it has this rank. The n − k differential
forms

dgp (x) = Σ_{i=1}^{n} g′p,i (x) dxi    (3.69)

are then linearly independent. In this case the surface defined by g(x) = c will
be called a regular implicit surface. It is clear that every regular surface is a
regular implicit surface.
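The rank condition is easy to test in examples. A small sketch (using sympy; the unit sphere is our hypothetical example) checks that g′(x) has the required rank n − k = 1 at a point of the surface.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
g = sp.Matrix([x**2 + y**2 + z**2])   # one equation: a 2-surface in 3 dimensions
J = g.jacobian([x, y, z])             # the 1 by 3 derivative matrix g'(x)

# full rank n - k = 1 at a point on the sphere g = 1
assert J.subs({x: 1, y: 0, z: 0}).rank() == 1
# the rank drops only at the origin, which does not lie on the sphere
assert J.subs({x: 0, y: 0, z: 0}).rank() == 0
```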

Theorem 3.9 Every regular implicit surface is a regular surface.

Proof: Suppose the surface is given by gk+1 (x) = 0, . . . , gn (x) = 0. Without


loss of generality we may assume that the n − k by n − k matrix of partial
derivatives with respect to xk+1 , . . . , xn is non-singular. Set ui = hi (x) = xi
for i = 1, . . . , k and ui = hi (x) = gi (x) for i = k + 1, . . . , n. Then the n by n
matrix of partial derivatives for h′(x) is non-singular. So u = h(x) locally has
an inverse function x = h⁻¹(u). Thus u1 , . . . , un is a coordinate system, and S
is given by setting the last n − k variables equal to zero. 

Consider a manifold patch N of dimension n with coordinates x1 , . . . , xn .


Introduce another manifold patch P of dimension k with coordinates u1 , . . . , uk .
A parameterized k-surface is a mapping φ given by xi ← fi (u1 , . . . , uk ), for
i = 1, . . . , n, from P to N . Such a surface can be singular in various ways; in
particular the mapping need not be one-to-one. For this reason one wants to
think of P and its image in N as quite distinct objects. In dealing with
parameterized surfaces in this generality one mainly works with the mapping
φ. In the case of curves, we can think of a curve where the image looks like
the letter S, but we can also think of it looking like the Greek letter α, with
self-intersection.
There is also a useful notion of regular locally parameterized surface. For
S ⊆ N to be such a surface we require that for every point in S there is an
open subset U ⊆ N , a k-dimensional manifold patch P , and a smooth mapping
φ : P → N given by x ← f (u). Furthermore,
1. The mapping φ is one-to-one and has image φ(P ) = S ∩ U .
2. The derivative mapping f 0 (u) has rank k at every point.
3. The mapping φ sends open sets in P to relatively open sets in φ(P ).
The third requirement is a self-avoiding requirement. It says that for every
open subset W ⊆ P there is an open subset V ⊆ N such that φ(W ) = S ∩ V .
In effect every small piece of the surface is isolated from remote parts of the
surface. Spivak [18] gives an example of a curve in the plane that fails to be
self-avoiding. The parameter space is an open interval, and it maps into a figure
that looks like the number 6. There is one point on the curve that is not isolated
from remote parts of the curve. Small enough parameter intervals around that
point give only the left part of the 6. But the right part of the 6 gets arbitrarily
close to the point.
Every regular surface is a regular locally parameterized surface. In fact,
if the surface is given near the point by setting uk+1 = 0, . . . , un = 0, then
the surface near the point can be parameterized by u1 , . . . , uk . In such a case
one may choose to think of the parameter space as part of the surface, so the
u1 , . . . , uk are coordinates on the surface S ∩ U . In fact, each S ∩ U becomes a
manifold patch in its own right.

Theorem 3.10 Every locally parameterized regular surface is a regular surface.

Proof: Consider a point on the surface. Suppose the parametric represen-


tation near this point is xi = fi (u1 , . . . , uk ). Without loss of generality suppose
that the partial derivatives of f1 , . . . , fk with respect to u1 , . . . , uk form a k
by k invertible matrix. Define a new function xi = gi (u1 , . . . , un ) as follows.
Take xi = gi (u) = fi (u1 , . . . , uk ) for i = 1, . . . , k, and take xi = gi (u) =
fi (u1 , . . . , uk ) + ui for i = k + 1, . . . , n. Then the derivative of g(u) is an in-
vertible n by n matrix. So we may locally express u = g⁻¹(x) by the inverse
function theorem. We would like to show that near the given point on the sur-
face it is obtained by setting uk+1 = 0, . . . , un = 0. Clearly when this is satisfied

we have xi = fi (u1 , . . . , uk ), and so the corresponding point is on the surface.


On the other hand, we have the self-avoiding condition. Consider a parameter
region W around the point so small that it is in the region where the inverse
function theorem applies. Then there is an open subset V in N near the point
such that every x in V that is also in S ∩ U is of the form x = f (u1 , . . . , uk )
for u1 , . . . , uk in W . In other words, x = g(u1 , . . . , uk , 0, . . . , 0). Since g is one-
to-one, this means that nearby points x on the surface S ∩U have ui coordinates
satisfying uk+1 = 0, . . . , un = 0. 
We could also define a regular parameterized surface to be one where for
each point in S the open set U is all of N . Then S = φ(P ) needs only one
coordinate patch. This is a regular surface that happens also to be a manifold
patch. Often this is the natural setting for stating local results.
If s = h(x1 , . . . , xn ) is a scalar, then there is a natural pullback scalar ex-
pressed in terms of u1 , . . . , uk . This is

h(x)(x ← f (u)) = h(f (u)). (3.70)

If we differentiate this equation with respect to uα , we get a quantity


Xα h = (∂/∂uα ) h(f (u)) = Σ_{i=1}^{n} h′,i (f (u)) f′i,α (u).    (3.71)

This suggests that we define vectors that differentiate scalars:


Xα = Σ_{i=1}^{n} f′i,α (u) ∂/∂xi .    (3.72)

The notation requires explanation. To get the proper value for Xα h we have to
perform the partial derivatives to get h′,i (x), but after that we have to substitute
x ← f (u) in the result.
The vectors Xα for α = 1, . . . , k are not the usual kind of vector field; instead
each Xα is a vector field along the parameterized surface. That is, the input to
Xα is given by the u, while the output corresponds to a vector at the point f (u)
on the surface. Each such vector is a tangent vector to the surface.
Consider now a surface with both a parametric and an implicit representa-
tion. In that case we have g(x)(x ← f (u)) = g(f (u)) = c. Explicitly,

gp (f (u)) = cp (3.73)

for p = k + 1, . . . , n. Differentiation with respect to uα gives


Σ_{i=1}^{n} g′p,i (f (u)) f′i,α (u) = 0.    (3.74)

This result may also be written in terms of differential forms and vector fields
in the form
⟨dgp (x) | Xα ⟩ = 0.    (3.75)

The notation does not make this explicit, but there is an assumption that there
is a replacement x = f (u) in the coefficients of the differential forms. The sig-
nificance of this equation is that the differentials of the functions defining the
surface vanish on the tangent vectors to the surface. Since there are k indepen-
dent tangent vectors and n − k independent differential forms, the differential
forms dgp (x), p = k + 1, . . . , n form a basis for the space of differential forms
that vanish on the tangent vectors.
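Equation (3.74) can be confirmed for a concrete surface. The sketch below (using sympy, with a hypothetical parameterization of the unit sphere) checks that dg pairs to zero with each tangent vector Xα.

```python
import sympy as sp

u1, u2, x, y, z = sp.symbols('u1 u2 x y z')

# hypothetical parameterization of the unit sphere g = x^2 + y^2 + z^2 = 1
f = [sp.cos(u1)*sp.cos(u2), sp.sin(u1)*sp.cos(u2), sp.sin(u2)]
g = x**2 + y**2 + z**2
subs = dict(zip((x, y, z), f))

# sum_i g'_{p,i}(f(u)) f'_{i,alpha}(u) = 0 for each parameter u_alpha
for ua in (u1, u2):
    pairing = sum(sp.diff(g, xi).subs(subs)*sp.diff(fi, ua)
                  for xi, fi in zip((x, y, z), f))
    assert sp.simplify(pairing) == 0
```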

3.10 Lagrange multipliers


The topic in this section is constrained optimization. The problem is to maximize
or minimize a function restricted to a surface. This constraint surface is given
implicitly.

Theorem 3.11 (Lagrange multiplier theorem) Consider a regular k-surface
given implicitly by gp (x) = cp , p = k + 1, . . . , n. Suppose that h(x) is a smooth
function whose restriction to the surface has a local minimum or a local maxi-
mum at a certain point given by x. Then there are unique coefficients λk+1 , . . . , λn
such that

dh(x) = Σ_{p=k+1}^{n} λp dgp (x)    (3.76)

at that point.

Proof: Take a parametric representation x ← f (u) near the point. The


function h(x) pulls back to h(f (u)). The first derivative test gives

dh(f (u)) = 0. (3.77)

More explicitly,
Σ_{i=1}^{n} h′,i (f (u)) f′i,α (u) = 0    (3.78)

for α = 1, . . . , k. We can also write this as

⟨dh(x) | Xα ⟩ = 0    (3.79)

for α = 1, . . . , k. This says that dh(x) belongs to the space of forms that vanish
on the tangent vectors. It follows that it is a linear combination of the forms
dgp (x) that form a basis for this space. 
The coefficients λp are called Lagrange multipliers. This result is intuitive.
It says that if h has a local maximum on the surface, then the only way it can
be made larger is by moving off the surface by relaxing the constraint that the
surface is defined by constants. The Lagrange multiplier λp itself is the partial
derivative of the critical value with respect to a change in the parameter cp .

Example: Say that we want to maximize or minimize u = x + y + 2z subject
to v = x² + y² + z² = 1. The manifold in this case is the unit sphere. The
Lagrange multiplier condition says that

du = dx + dy + 2 dz = λ dv = λ(2x dx + 2y dy + 2z dz).    (3.80)

Thus 1 = 2λx, 1 = 2λy, and 2 = 2λz. Insert these in the constraint equation
x² + y² + z² = 1. This gives (1/4) + (1/4) + 1 = λ², or λ = ±√(3/2). So
x = ±√(2/3)/2, y = ±√(2/3)/2, z = ±√(2/3). |
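The same example can be solved mechanically. The sketch below (using sympy; the variable names are our own) solves the Lagrange multiplier equations together with the constraint.

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lam', real=True)

u = x + y + 2*z                  # function to optimize
v = x**2 + y**2 + z**2 - 1       # constraint v = 0 (the unit sphere)

# du = lam dv componentwise, together with the constraint
eqs = [sp.diff(u, s) - lam*sp.diff(v, s) for s in (x, y, z)] + [v]
sols = sp.solve(eqs, [x, y, z, lam], dict=True)

assert len(sols) == 2            # the two critical points
for s in sols:
    # each solution lies on the sphere
    assert sp.simplify(s[x]**2 + s[y]**2 + s[z]**2 - 1) == 0
```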
Example: Say that we want to maximize or minimize u = x − 4y + 3z + z²
subject to v = x − y = 0 and w = y − z = 0. The manifold in this case is just a
line through the origin. The Lagrange multiplier condition says that

dx − 4 dy + (3 + 2z) dz = λ(dx − dy) + µ(dy − dz).    (3.81)

Thus 1 = λ, −4 = −λ + µ, and 3 + 2z = −µ. When we solve we get µ = −3
and so z = 0.
Of course we could also solve this example without Lagrange multipliers.
Since the manifold is x = y = z, the function to be maximized or minimized
is u = z², and this has its minimum at z = 0. The utility of the Lagrange
multiplier technique in more complicated problems is that it is not necessary to
do such a preliminary elimination before solving the problem. |
Example: Here is a simple example to emphasize the point that the Lagrange
multiplier technique is coordinate independent. Say that one wants to maximize
or minimize z subject to x² + y² + z² = 1. The Lagrange multiplier method
says to write dz = λ(2x dx + 2y dy + 2z dz). This says that x = y = 0, and so
z = ±1. In spherical polar coordinates this would be the problem of maximizing
r cos(θ) subject to r² = 1. This would give cos(θ) dr − r sin(θ) dθ = λ 2r dr. Thus
sin(θ) = 0, and the solution is θ = 0 or θ = π. |

3.11 Differential k-forms


The algebra and differential calculus of differential k-forms may be unfamiliar,
but fortunately it is an easy subject. The main properties stated in the following
sections may be proved by checking that the definitions take the same form after
a change of coordinate system. These proofs tend to be dull, and many of them
will be omitted. The book of Rudin [17] takes such a computational approach,
but he does not stress the invariance under coordinate changes, which is the
most wonderful aspect of the subject. There are alternate proofs that are more
interesting and conceptual, but they are also more abstract. More advanced
books [2, 12] give an idea of such an approach.
Example: We begin with a quick summary of the algebraic aspect of differential
forms. Consider the case of a three-dimensional space with arbitrary coordinates
u, v, w. The 0-forms are the scalars. The 1-forms are p du + q dv + r dw. These
will eventually be integrated over curves. The 2-forms are a dv dw + b dw du +
c du dv. These are integrated over surfaces. The 3-forms are s du dv dw. These

are good for integrals over 3-dimensional regions. However for the moment we
are only concerned with the differential forms, not with their integrals.
These forms have an algebra. The fundamental law is the anticommutative
law for 1-forms. Thus for instance dw du = − du dw. Since 1-forms anticommute
with themselves, we have du du = 0, and so on.
The algebra here is called the exterior product. In theoretical discussions it
is denoted by a wedge symbol, so that we would write dv ∧ dw instead of the
shorter form dv dw. Sometimes it is a good idea to use such a notation, since it
reminds us that there is a rather special kind of algebra, different from ordinary
multiplication. Practical computations tend to leave it out. |
A more theoretical approach to the definitions relates differential forms to
vector fields. A differential k-form ω on a coordinate patch is a quantity that
depends on k vector fields. We write it as ⟨ω | X1 , . . . , Xk ⟩. One way to get
such a k-form is to multiply together k 1-forms and then anti-symmetrize. The
multiplication operation that accomplishes this is often written ∧ and is called
the exterior product. In the simplest case of a differential 2-form ω = α ∧ β this
is given by the determinant
 
⟨α ∧ β | X, Y ⟩ = det ( ⟨α | X⟩  ⟨β | X⟩
                        ⟨α | Y ⟩  ⟨β | Y ⟩ )    (3.82)

When we write this out we get

⟨α ∧ β | X, Y ⟩ = ⟨α | X⟩⟨β | Y ⟩ − ⟨α | Y ⟩⟨β | X⟩.    (3.83)

This can be thought of as a kind of signed area attached to the vectors X, Y .


This product anti-commutes: For 1-forms α, β we always have

α ∧ β = −β ∧ α. (3.84)

In particular
α ∧ α = 0. (3.85)
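The determinant formula can be realized concretely when 1-forms and vectors are represented by coefficient lists. The sketch below (using sympy; the helper names are our own) computes ⟨α ∧ β | X, Y⟩ from (3.82) and checks the antisymmetry properties.

```python
import sympy as sp

def pair(alpha, X):
    # <alpha | X> for a 1-form and a vector given by coefficient lists
    return sum(a*v for a, v in zip(alpha, X))

def wedge2(alpha, beta, X, Y):
    # <alpha ^ beta | X, Y> as the 2 by 2 determinant of (3.82)
    return sp.Matrix([[pair(alpha, X), pair(beta, X)],
                      [pair(alpha, Y), pair(beta, Y)]]).det()

alpha, beta = (1, 0, 2), (0, 3, 1)   # hypothetical constant 1-forms
X, Y = (1, 1, 0), (0, 1, 1)          # hypothetical constant vectors

# interchanging X and Y flips the sign
assert wedge2(alpha, beta, X, Y) == -wedge2(alpha, beta, Y, X)
# alpha ^ alpha = 0
assert wedge2(alpha, alpha, X, Y) == 0
```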
The general formula for a product of k 1-forms is also given by a determinant

⟨α1 ∧ · · · ∧ αk | X1 , . . . , Xk ⟩ = det (⟨αi | Xj ⟩) .    (3.86)

This can be thought of as a kind of signed volume attached to the vectors


X1 , . . . , Xk .
The general definition of a k-form ω is that it associates to each X1 , . . . , Xk a
number ⟨ω | X1 , . . . , Xk ⟩. This expression is supposed to be multi-linear, that is,
it is linear in each Xi with the other Xj for j ≠ i held fixed. It is also supposed
to be alternating, in that interchanging two vectors gives a minus sign.
Remark: There is also a general definition of the exterior product of forms. If
θ is a p-form and λ is a q form, then θ ∧ λ is a p + q = k form given by
⟨θ ∧ λ | X1 , . . . , Xk ⟩ = Σσ sign(σ) ⟨θ | Xσ(1) , . . . , Xσ(p) ⟩ ⟨λ | Xσ(p+1) , . . . , Xσ(k) ⟩,    (3.87)

where the sum is over all permutations such that σ(1), . . . , σ(p) are in increasing
order and σ(p + 1), . . . , σ(k) are in increasing order.
A simple example is the product of a 2-form θ with a 1-form β. Then

⟨θ ∧ β | X, Y, Z⟩ = ⟨θ | X, Y ⟩⟨β | Z⟩ − ⟨θ | X, Z⟩⟨β | Y ⟩ + ⟨θ | Y, Z⟩⟨β | X⟩.    (3.88)

|
The multiplicative properties are summarized as follows.

Associative law
(ω ∧ σ) ∧ τ = ω ∧ (σ ∧ τ ). (3.89)
The two sides are equal as forms of degree n + m + p, where n, m, p are the degrees of ω, σ, τ .

Distributive law
ω ∧ (σ + τ ) = ω ∧ σ + ω ∧ τ. (3.90)

Commutative law for even degree forms If either ω or σ is an even degree


form, then
ω ∧ σ = σ ∧ ω. (3.91)

Anticommutative law for odd degree forms If both ω and σ are odd degree
forms, then
ω ∧ σ = −σ ∧ ω. (3.92)

The way to remember this is that even degree forms commute with everything.
On the other hand, odd degree forms anticommute with each other. In partic-
ular, if ω has odd degree, then ω ∧ ω = 0.

3.12 The exterior derivative


Example: Say that we have a three-dimensional space with coordinates x, y, z.
If we have a scalar like x²z, then we know how to take its differential. In this
case we get

d(x²z) = 2xz dx + x² dz.    (3.93)

Say that we have a differential form like x²z dy dz. The rule is to compute
the differential by putting the scalar on the left, as we have done. Then the
derivative is obtained by taking the differential of the scalar part. Thus

d(x²z dy dz) = d(x²z) dy dz.    (3.94)

When we compute this, we get

d(x²z dy dz) = (2xz dx + x² dz) dy dz = 2xz dx dy dz + x² dz dy dz.    (3.95)

But dz dy dz = −dy dz dz = 0 since dz dz = 0. So the final result is

d(x²z dy dz) = 2xz dx dy dz.    (3.96)

|
Now to a more theoretical treatment. We already know that the differential
of a scalar is
du = (∂u/∂x1 ) dx1 + · · · + (∂u/∂xn ) dxn .    (3.97)
Every k form may be written as a sum of forms of the form

ω = u dxi1 ∧ · · · ∧ dxik . (3.98)

We can define the exterior derivative or differential by

dω = du ∧ dxi1 ∧ · · · ∧ dxik . (3.99)

It is important that the du goes on the left.


Here are the main properties of the exterior derivative.

Additivity
d(ω + σ) = dω + dσ. (3.100)

Product property If ω is a k-form and σ is an ℓ-form, then

d(ω ∧ σ) = dω ∧ σ + (−1)k ω ∧ dσ. (3.101)

Differential of a differential
ddω = 0. (3.102)

If we think of d as a degree one quantity, then the sign in the product property
makes sense. Also, in this context the differential of a differential property also
makes sense.
A k-form σ is called closed if dσ = 0. A k-form σ is called exact in an
open set if σ = dα for some k − 1 form α in the open set. It follows from the
differential of a differential property that every exact form is closed.
It is useful to look at these quantities in low dimensions. For instance, in
three dimensions one might have a differential 2-form such as

σ = a dy ∧ dz + b dz ∧ dx + c dx ∧ dy. (3.103)

Here x, y, z are arbitrary coordinates, and a, b, c are smooth functions of x, y, z.


Similarly, in three dimensions a typical 3-form might have the form

τ = s dx ∧ dy ∧ dz. (3.104)

Notice that these forms are created as linear combinations of exterior products
of 1-forms.
96 CHAPTER 3. DIFFERENTIAL FORMS

Since these expressions are so common, it is customary in many contexts


to omit the explicit symbol for the exterior product. Thus the forms might be
written
σ = a dy dz + b dz dx + c dx dy (3.105)
and
τ = s dx dy dz. (3.106)
The exterior derivative of an r-form α is an r + 1-form dα. It is defined by
taking the differentials of the coefficients of the r-form. For instance, for the
1-form
α = p dx + q dy + r dz (3.107)
the differential is
dα = dp dx + dq dy + dr dz. (3.108)
This can be simplified as follows. First, note that
dp = (∂p/∂x) dx + (∂p/∂y) dy + (∂p/∂z) dz.    (3.109)
Therefore
dp dx = (∂p/∂y) dy dx + (∂p/∂z) dz dx = −(∂p/∂y) dx dy + (∂p/∂z) dz dx.    (3.110)
Therefore, the final answer is
     
dα = d(p dx + q dy + r dz) = (∂r/∂y − ∂q/∂z) dy dz + (∂p/∂z − ∂r/∂x) dz dx + (∂q/∂x − ∂p/∂y) dx dy.    (3.111)
Similarly, suppose that we have a 2-form

σ = a dy dz + b dz dx + c dx dy. (3.112)

Then
dσ = da dy dz + db dz dx + dc dx dy = (∂a/∂x) dx dy dz + (∂b/∂y) dy dz dx + (∂c/∂z) dz dx dy.    (3.113)
This simplifies to
 
dσ = d(a dy dz + b dz dx + c dx dy) = (∂a/∂x + ∂b/∂y + ∂c/∂z) dx dy dz.    (3.114)
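In three dimensions these coefficient formulas are easy to implement directly. The sketch below (using sympy; the coefficients p, q, r are hypothetical) computes the coefficients of dα from (3.111) and then applies (3.114) to check that d(dα) = 0.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
p, q, r = y*z, x**2, sp.sin(x)   # hypothetical coefficients of alpha = p dx + q dy + r dz

# coefficients of d(alpha) = a dy dz + b dz dx + c dx dy, per (3.111)
a = sp.diff(r, y) - sp.diff(q, z)
b = sp.diff(p, z) - sp.diff(r, x)
c = sp.diff(q, x) - sp.diff(p, y)

# d(d alpha) = (da/dx + db/dy + dc/dz) dx dy dz must vanish, per (3.114)
assert sp.simplify(sp.diff(a, x) + sp.diff(b, y) + sp.diff(c, z)) == 0
```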
There are ways of picturing differential forms. If s is a 0-form, that is, a
scalar field, then it is a function from some set to the real numbers. The set
could be two-dimensional (or three-dimensional). So it certainly makes sense to
talk about a curve (or a surface) where s has a particular constant value. These
are the contour curves (surfaces) of the scalar field s. The scalar field does not
change along these curves (surfaces). Closely spaced contour curves (surfaces)
indicate a rapid increase or decrease in the values of s.

For an exact 1-form ds one uses the same picture of contour curves (surfaces),
but the philosophy is a bit different. One magnifies the region near a given point,
and one notes that the magnified curves (surfaces) are nearly lines (planes). So
at a small scale they are approximately the contour lines (planes) of linear
functions. The linear functions (forms) do not change much along such lines
(planes).
For a 1-form that is not exact the picture looks almost the same, except
that the the small scale contour lines (planes) can have end points (end curves).
They don’t come from a big scale contour curve (surface) picture. The end points
(end curves) have an orientation that indicates the direction of increase for the
small scale contour lines (planes). The differential of the 1-form corresponds to
this cloud of end points (end curves). A line integral of a 1-form along a curve
represents the cumulative change as the curve crosses the contour lines (planes).
More precisely, a differential 1-form assigns to each point and to each tangent
vector at that point a real number. The form is pictured by indicating those
spaces of tangent vectors at particular points on which the form gives the value
zero. Such a tangent space at a particular point is a line (plane).
If there is a metric, then you can picture a 1-form by vectors that are perpen-
dicular to the lines (planes) defining the form. Many people like to do this. But
it complicates calculations. And for many applications there is not a natural
choice of metric.
Example: Consider the 1-form y dx in two dimensions. This is represented by
vertical contour lines that terminate at points in the plane. The density of these
lines is greater as one gets farther from the x axis. The increase is to the right
above the x axis, and it is to the left below the x axis. The differential of y dx is
dy dx = −dx dy. This 2-form represents the cloud of terminating points, which
has a uniform density. The usual convention is that the positive orientation is
counterclockwise. So the orientations of these source points are clockwise. This
is consistent with the direction of increase along the contours lines. |
To understand the picture for 2-forms, we can look at 3-dimensional space.
If we look at a 1-form, it assigns numbers to little vectors. We picture the
1-form by the vectors (forming little planes) on which it is zero. If we look at
a 2-form, it assigns numbers to little oriented parallelograms. We picture it by
looking at the intersection of all the little parallelograms for which it has the
value zero. These determine a line. So one can think of the 2-form locally as
given by little lines. Globally they form curves, with a kind of spiral orientation.
They may have oriented end points. The differential of a 2-form corresponds to
the cloud of oriented end points. The integral of a 2-form on an oriented surface
depends on what the 2-form assigns to little oriented parallelograms formed by
tangent vectors to the surface. The non-zero contributions correspond to when
the curves representing the 2-form are transverse to the surface.
More precisely, a differential 2-form in 3-dimensional space assigns to each
point and to each ordered pair of tangent vectors at that point a real number.
The form is pictured by those tangent vectors that belong to a pair giving the
value zero. Such a tangent space at a particular point is a line.
When there is a metric, it is possible to picture the 2-form as a vector field
along the direction of the lines. In that case the surface integral represents the
amount by which the vectors penetrate the surface.
Example: Say the 2-form is dx dy. It is zero on the vector pair ∂/∂z, ∂/∂x and on
the vector pair ∂/∂y, ∂/∂z. In other words, it is zero on any pair including the
vector ∂/∂z. So we picture it by where it is zero, that is, by lines in the z direction.
If we integrate it along a surface where z is constant, then the little parallelograms
in the surface are spanned by vectors like ∂/∂x and ∂/∂y, and so we get a non-zero
result. |
Remark: Consider the case of three dimensions. Anyone familiar with vector
analysis will notice that if s is a scalar, then the formula for ds resembles the
formula for the gradient in cartesian coordinates. Similarly, if α is a 1-form, then
the formula for dα resembles the formula for the curl in cartesian coordinates.
The formula d ds = 0 then corresponds to the formula curl grad s = 0.
In a similar way, if σ is a 2-form, then the formula for dσ resembles the
formula for the divergence of a vector field v in cartesian coordinates. The
formula d dα = 0 then corresponds to the formula div curl v = 0.
There are, however, important distinctions. First, the differential form for-
mulas take the same form in arbitrary coordinate systems. This is not true for
the formulas for the gradient, curl, and divergence. The reason is that the usual
definitions of gradient, curl, and divergence are as operations on vector fields,
not on differential forms. This leads to a much more complicated theory, except
for the very special case of cartesian coordinates on Euclidean space. Later on
we shall examine this issue in detail.
Second, the differential form formulas have natural formulations for mani-
folds of arbitrary dimension. While the gradient and divergence may also be
formulated in arbitrary dimensions, the curl only works in three dimensions.
This does not mean that notions such as gradient of a scalar (a vector field)
or divergence of a vector field (a scalar) are not useful and important. Indeed,
in some situations they play an essential role. However one should recognize
that these are relatively complicated objects.
The same considerations apply to the purely algebraic operations, at least
in three dimensions. The exterior product of two 1-forms resembles in some
way the cross product of vectors, while the exterior product of a 1-form and a
2-form resembles a scalar product of vectors. Thus the exterior product of three
1-forms resembles the triple scalar product of vector analysis. Again these are
not quite the same thing. |

3.13 The Poincaré lemma
In the following it will be convenient to have a notion of a set in which nothing
interesting can happen. A subset U of Rn will here be called a nice region if it
is diffeomorphic to an open ball. This is not standard terminology, but it will
be convenient here.

Proposition 3.12 The following are nice regions, that is, diffeomorphic to the
open ball B centered at zero with radius one.

1. The space Rn.
2. The space Rn+ consisting of all points in Rn with strictly positive coordinates.
3. The open cube Cn with all coordinates between 0 and 1.
4. The interior of the simplex ∆n consisting of all points x in Cn with
∑_{i=1}^n x_i < 1.

Proof:
1. There is a map from x in B to y in Rn given by y = x/√(1 − |x|^2). The
inverse map is given by x = y/√(1 + |y|^2).
2. There is a map from Rn to Rn+ given by z_i = e^{y_i}. The inverse map is
y_i = log(z_i).
3. There is a map from Cn to Rn+ given by z_i = u_i/√(1 − u_i^2). The inverse
map is u_i = z_i/√(1 + z_i^2).
4. There is a map from the interior of ∆n to Rn+ given by z = x/(1 − ∑_i x_i).
The inverse map is x = z/(1 + ∑_i z_i). □
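The inverse formulas in this proof are easy to check symbolically. Here is a quick sanity check in sympy (an illustration added here, not part of the text), verifying cases 1 and 2 one coordinate at a time:

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Case 1: a coordinate of the open ball maps to the whole line.
y = x / sp.sqrt(1 - x**2)
x_back = y / sp.sqrt(1 + y**2)

# Check the round trip at several sample points inside (-1, 1).
for val in [sp.Rational(-1, 2), sp.Rational(1, 3), sp.Rational(9, 10)]:
    assert sp.simplify(x_back.subs(x, val) - val) == 0

# Case 2: the exponential maps the line to the positive half-line,
# with the logarithm as inverse.
assert sp.simplify(sp.log(sp.exp(x)) - x) == 0
```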
An n-dimensional local manifold patch is a manifold patch with a coordinate
system x : M → U , where U ⊆ Rn is a nice region. In the case n = 0 a local
manifold patch is a single point. The terminology used here is not standard,
but the idea is that a local manifold patch has no interesting global features. In
fact, for each n it is an essentially unique object.
Example: An example of a 2-dimensional manifold patch with global features is
one modeled on a plane with a point removed. Another example is one modeled
on a plane with two points removed. These last two examples are not only not
diffeomorphic to the plane, but they are also not diffeomorphic to each other.
In fact, they are very different as global objects. |
Example: If ω is exact on a manifold patch, that is, if ω = dα in the region,
then ω is closed: dω = 0. The converse is not true in general. Here is a two-
dimensional example. Let
ω = (1/(x^2 + y^2)) (x dy − y dx). (3.115)
in the plane with the origin removed. Then ω is closed, but not exact. If we
remove a line running from the origin to infinity, then the resulting region is a
local manifold patch. In this smaller region ω is exact, in fact, ω = dφ, where
x = r cos(φ) and y = r sin(φ). |
What is true is that if ω is closed, then ω is locally exact. In fact, it is exact
on every local manifold patch. This will be proved in the following famous
Poincaré lemma.
Theorem 3.13 (Poincaré lemma) Consider a local manifold patch of dimen-
sion n. Let 1 ≤ k ≤ n and suppose that the form ω is a closed k-form, that is,
dω = 0. Then ω is exact on this local manifold patch, that is, there is a form α
defined on it with ω = dα.

Proof: We may as well assume that the coordinate system sends the local
manifold patch to an open ball centered at the origin. This implies that if
x1 , . . . , xn are coordinates of a point in the region, and if 0 ≤ t ≤ 1, then
tx1 , . . . , txn are coordinates of a point in the region.
If ω is a k-form, then we may obtain a form ω̄ by substituting txi for xi
everywhere. In particular, expressions dxi become d(txi ) = xi dt + t dxi . Every
differential form σ involving dt and other differentials may be written σ =
σ1 + σ2 , where σ1 is the static part, depending on t but with no factors of dt,
and σ2 = dt β is the remaining dynamic part, with β depending on t but with
no factors of dt. Define K(σ) = K(σ2) = ∫₀¹ dt β, where σ2 = dt β. The claim
is that
K(dω̄) + dK(ω̄) = ω. (3.116)
This is proved in two parts. The first part is a result for ω̄1.
By definition K(ω̄1) = 0. We show that K(dω̄1) = ω. But (dω̄1)2 only in-
volves t derivatives of the coefficients, so by the fundamental theorem of calculus
K(dω̄1) = K((dω̄1)2) = ω̄1|_{t=1} − ω̄1|_{t=0} = ω. (The t = 0 term vanishes,
since for k ≥ 1 each coefficient of the static part carries a factor t^k.)
The second part is that K(dω̄2 ) = −dK(ω̄2 ). But dω̄2 = −dt dβ = −dt (dβ)1 ,
so K(dω̄2 ) = −K(dt (dβ)1 ) = −dK(ω̄2 ). These two parts establish the claim.
The result follows from the claim. If dω = 0, then dω̄ = 0, and so ω = dK(ω̄).
□
Remark: The algorithm is simple, provided that one can do the integrals. Start
with a closed differential form ω defined in terms of x1 , . . . , xn . Replace xi by
txi everywhere, including in differentials. Collect all terms that begin with dt.
Put the dt in front. Integrate from 0 to 1 (with respect to t, keeping everything
else fixed). The result is a form α with dα = ω. |
Example: Consider the closed 1-form ω = x dy + y dx. Then ω̄ = t2 x dy +
t2 y dx + 2xyt dt. The integral of 2xyt dt is α = xy. |
Example: Consider the closed form ω = dx dy. Then ω̄ = t2 dx dy + tx dt dy −
ty dt dx. The integral of t dt (x dy − y dx) is α = (1/2)(x dy − y dx). |
3.14 Substitution and pullback
There is a useful distinction in analysis between two kinds of objects. The first
kind is a function that sends numbers to numbers. In the one dimensional case

one has examples such as sin and . We often use variables to define such
functions. The sine function may be written u 7→ sin(u) or w 7→ sin(w). The
function
p that squares,√adds one, and then takes the square root may be written
y 7→ y 2 + 1 or t 7→ t2 + 1. In such expressions the variables are only place
markers. In logic such a variable is called a bound variable. The term dummy
variable is sometimes used. Functions may be composed. For instance, the
3.14. SUBSTITUTION AND PULLBACK 101

composition (y ↦ √(y^2 + 1)) ∘ (u ↦ sin(u)) = (w ↦ √(sin^2(w) + 1)). On the
other hand, the composition (u ↦ sin(u)) ∘ (y ↦ √(y^2 + 1)) = (z ↦ sin(√(z^2 + 1))).
In general (y ↦ f(y)) ∘ (u ↦ g(u)) is just another name for t ↦ f(g(t)), which is
itself just the composition f ◦g. In many instances we leave out the composition
symbol and just write f g.
The other kind of object is an expression that explicitly involves variables.
In logic this corresponds to the notion of free variable. For instance, sin(z) and
sin(t) are different expressions. There is an important operation called substi-
tution of an expression for a variable. An example is u ← sin(t). This means
to substitute sin(t) for u. As an example, u2 ◦ (u ← sin(t)) = sin2 (t). √ Substi-
tutions may √ be composed. Thus, for instance,
√ (u ← sin(t)) ◦ (t
√ ← z 2 + 1) =
2 2 2 2 2
(u ← sin( z + 1)). And u ◦ (u ← sin( z + 1)) = sin ( z + 1). Again
we often leave out the composition symbol. There are general identities such
as h(x)(x ← g(t)) = h(g(t)) and (x ← g(t))(t ← f (u)) = (x ← g(f (u))).
Composition of substitutions and composition of functions are clearly closely
related.
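These bookkeeping rules are exactly what a computer algebra system's substitution operation implements. A brief sympy illustration (mine, not the text's):

```python
import sympy as sp

u, t, z = sp.symbols('u t z')

# u^2 composed with the substitution u <- sin(t):
expr = (u**2).subs(u, sp.sin(t))
assert expr == sp.sin(t)**2

# Composing substitutions: applying (u <- sin(t)) and then
# (t <- sqrt(z^2 + 1)) gives the same result as the single
# substitution u <- sin(sqrt(z^2 + 1)).
step_by_step = (u**2).subs(u, sp.sin(t)).subs(t, sp.sqrt(z**2 + 1))
all_at_once = (u**2).subs(u, sp.sin(sp.sqrt(z**2 + 1)))
assert step_by_step == all_at_once
```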
The substitution operation occurs in various forms and has various names,
including replacement and change of variables. In computer science there is a
related notion called assignment. An assignment x ← g(t) makes a change in
the machine state. It takes the number stored under the label t, computes g(t),
and then stores this result under the label x.
There is another notation that is very useful: In place of h(x)(x ← g(t)) =
h(g(t)) one instead writes (x ← g(t))∗ h(x) = h(g(t)). This notation at first
seems strange, but it is very useful. The idea is that the substitution x ← g(t)
is thought of as an operation that acts on expressions h(x), converting them to
other expressions h(g(t)). This kind of operation is called pullback.
If we write (u ← sin(t))∗ u2 = sin2 (t), then it seems natural to define the
expression (u ← sin(t))∗ du2 = d sin2 (t) = 2 sin(t) cos(t) dt. It also seems natural
to define (u ← sin(t))∗ 2u du = 2 sin(t) d sin(t) = 2 sin(t) cos(t) dt. The fact that
these give the same answer should be a source of satisfaction. Substitution
is somewhat more general than first appears; it has a natural application to
differential forms. In this context it is particularly natural to call it a pullback.
The same ideas extend to several variables. Take, for instance, the situation
when we have two variables x, y. A function such as xy 2 is a function on the
plane. Say that we want to perform the substitution ψ given by x ← t2 , y ← t3 .
Then we use ψ to pull back xy 2 to a function t8 on the line. We can write
ψ ∗ xy 2 = t8 . If we think of ψ as a parameterized curve, then the pullback is the
function on the curve expressed in terms of the parameter.
We can also pull back a differential form such as d(xy^2) = y^2 dx + 2xy dy
via the same ψ. The result using the right hand side is t^6 d(t^2) + 2t^5 d(t^3) =
2t^7 dt + 6t^7 dt = 8t^7 dt. Of course using the left hand side we also get d(t^8) = 8t^7 dt.
For a form like ω = y dx + 2xy dy that is not exact, pulling it back by ψ gives
the result 2t^4 dt + 6t^7 dt = (2t^4 + 6t^7) dt = d((2/5)t^5 + (3/4)t^8). Thus ψ∗ω is exact,
though ω is not exact. The pullback is a non-trivial operation on differential
forms.
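The pullback computation for ψ can be checked symbolically. In the sketch below (an added illustration), a 1-form P dx + Q dy is pulled back by substituting for x and y and replacing dx and dy by the differentials of the substituted expressions:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

# The substitution psi: x <- t**2, y <- t**3.
xt, yt = t**2, t**3

# Pull back d(x*y**2) = y**2 dx + 2*x*y dy: substitute for x, y and
# replace dx, dy by d(xt) = xt' dt, d(yt) = yt' dt.
coeff_dt = (y**2).subs({x: xt, y: yt}) * sp.diff(xt, t) \
    + (2*x*y).subs({x: xt, y: yt}) * sp.diff(yt, t)
assert sp.expand(coeff_dt) == 8*t**7

# Consistency: the same answer comes from differentiating the
# pulled-back scalar x*y**2 directly.
assert sp.diff((x*y**2).subs({x: xt, y: yt}), t) == 8*t**7
```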
In the following we will give a precise definition of an operation called manifold
mapping that corresponds to substitution or assignment. With this we can
give a rigorous definition of the pullback of a differential form or the pushforward
of a vector field.

3.15 Pullback of a differential form
Now suppose that N is a manifold patch of dimension n and M is a manifold
patch of dimension m. Suppose φ : N → M is a smooth function. We shall call
such a φ a manifold mapping. Sometimes we may just say mapping.
Here is a framework relating the notion of manifold mapping to other con-
cepts:
• x : N → Rn is a coordinate system on N .
• y : M → Rm is a coordinate system on M.
• g is a function from a nice region in Rn to Rm .
• g(x) : N → Rm is a function from N to Rm .
• φ : N → M is the manifold mapping equal to y ← g(x).
The notation y ← g(x) is a way of defining the manifold mapping φ in
terms of coordinate systems. Thus φ takes a point in N , reads the numbers x,
computes the numbers g(x), and then finds the point in M where y has this
value.
Let u be a scalar field on M . Define the pullback φ∗ u = u ◦ φ as a scalar
field on N . Thus if u = h(y), then the pullback is

(y ← g(x))∗ h(y) = h(g(x)). (3.117)

Similarly, define the pullback of an exact 1-form du by φ∗ du = dφ∗ u. Thus


n X
X m
(y ← g(z))∗ dh(y) = dh(g(x)) = h0,i (g(x))gi,j
0
(x) dxj . (3.118)
j=1 i=1

In particular,
(y ← g(x))∗ dy_i = d(g_i(x)) = ∑_{j=1}^n g′_{i,j}(x) dx_j. (3.119)

Every k-form may be written as a sum of forms of the form

ω = h(y) dyi1 ∧ · · · ∧ dyik . (3.120)

We can define the pullback by

(y ← g(x))∗ ω = h(g(x)) dg_{i1}(x) ∧ · · · ∧ dg_{ik}(x). (3.121)


3.15. PULLBACK OF A DIFFERENTIAL FORM 103

This can then be written in terms of products of the dxj . If the products are
arranged in order, then the resulting coefficient is a determinant.
The result of the above computation may be written more succinctly if we
use the definition dy_I = dy_{i1} ∧ · · · ∧ dy_{ik}, where I = {i1, i2, . . . , ik} and
i1 < i2 < · · · < ik. Then we write

ω = ∑_I h_I(y) dy_I, (3.122)

where the sum is over sets I with k elements. Its pullback is

φ∗ω = ∑_J ∑_I h_I(g(x)) det g′_{I,J}(x) dx_J. (3.123)

The sums are over sets J and I with k elements.


Example: Consider the mapping φ = (z ← x^2 y, w ← x y^3). Then

φ∗(dz dw) = (2xy dx + x^2 dy)(y^3 dx + 3xy^2 dy) = 5x^2 y^3 dx dy. (3.124)

The 5x^2 y^3 factor is the determinant of the derivative of the transformation
that relates z, w to x, y. We have already encountered similar formulas when
we change coordinates. In this special case we think of the mapping φ as an
equality, and we write the result as an equality. However in general we want
to keep the possibility that z, w describe one set of objects and x, y describe
another set of objects. In this case we must write φ explicitly to describe how
one goes from the objects described by x, y to the objects described by z, w. |
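The determinant interpretation can be confirmed directly; the following sympy fragment (added here as an illustration) computes the Jacobian determinant of the map relating z, w to x, y:

```python
import sympy as sp

x, y = sp.symbols('x y')

# z and w as functions of x and y, from the mapping phi.
z = x**2 * y
w = x * y**3

# Jacobian matrix of (z, w) with respect to (x, y):
J = sp.Matrix([[sp.diff(z, x), sp.diff(z, y)],
               [sp.diff(w, x), sp.diff(w, y)]])

# Its determinant is the coefficient 5*x**2*y**3 in (3.124).
assert sp.expand(J.det()) == 5 * x**2 * y**3
```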
The general properties of the pullback are extraordinarily nice.

Additivity
φ∗ (ω + σ) = φ∗ ω + φ∗ σ. (3.125)

Product property
φ∗ (ω ∧ σ) = φ∗ ω ∧ φ∗ σ. (3.126)

Derivative
φ∗ (dω) = d(φ∗ ω). (3.127)

Composition
(φ ◦ χ)∗ ω = χ∗ φ∗ ω. (3.128)

Notice the reversal in the composition property. We have (y ← g(x))(x ←
f(u)) = (y ← g(f(u))). So the identity says that (y ← g(f(u)))∗ = (x ←
f(u))∗ (y ← g(x))∗.
The pullback has two interpretations. In the case of a passive transformation
there are the same number of yi coordinates as xj coordinates, and the sub-
stitution y = g(x) is just giving a different description of the same points. In
the special case of a passive transformation M = N , and the transformation φ
given by y ← g(x) is the identity. All that is happening is that y coordinates
are being expressed in terms of x coordinates. In this case it is appropriate to
write y = g(x) as an actual equality of functions on M. One writes everything
as an equality, for instance,

dy_i = ∑_j (∂y_i/∂x_j) dx_j. (3.129)

The corresponding equation for vector fields is

∂/∂y_p = ∑_q (∂x_q/∂y_p) ∂/∂x_q. (3.130)

The two matrices in the last two equations are inverses of each other. This is
the reason for the coordinate invariance of the pairing of 1-forms with vector
fields.
In the case of an active transformation the coordinates yi are describing
one situation, and the coordinates xj are describing some other situation. For
instance, the xj could be parameters describing a singular surface in the space
described by the yi . There is no reason to require that N and M have the
same dimension. The transformation y ← g(x) takes points of N to points of a
different space M . In this case it is best to write the pullback explicitly.
In the special case of an active transformation with M = N , the trans-
formation φ given by x ← g(x) makes sense. In that case the transforma-
tion can be iterated. This is equivalent to iterating the function g, since
x(x ← g(x))^n = g^n(x). It is important to emphasize that the mapping
x ← g(x) is not the same as the function g. In fact, a common notation
for the function g is x ↦ g(x), with the arrow going the other direction.
There is an intermediate situation that can be confusing. This is when N is a
subset of M , and φ sends each point in N into the same point, but now regarded
as a point in M . In this situation many people write equations y = g(x). There
has to be enough context to indicate that this means that the restriction of y
to N is g(x).

3.16 Pushforward of a vector field
There is a pushforward notion for vector fields, but it is not as nice. The idea
is that if φ : N → M and Z is a vector field on N , then φ∗ Z is a mapping from
N to vectors tangent to the image of φ in M . This is not a vector field in the
ordinary sense, but it is a somewhat more general object, a vector field along the
mapping φ. This takes a rather concrete form in a coordinate representation.
Say that φ is the map x ← f (t). Let

Z = ∑_{α=1}^k a_α ∂/∂t_α (3.131)
be a vector field on N. Then φ∗Z is obtained as follows. Consider a scalar h(x)
on M. Use the chain rule to compute
(φ∗Z) h(x) = ∑_{α=1}^k a_α (∂/∂t_α) h(f(t)) = ∑_{i=1}^n (∑_{α=1}^k f′_{i,α}(t) a_α) h′_{,i}(f(t)). (3.132)

In other words,
φ∗ ∂/∂t_α = ∑_{i=1}^n f′_{i,α}(t) (∂/∂x_i)|_{x←f(t)}. (3.133)

The vectors on the right hand side are tangent to the image, since after the
differentiation is performed one substitutes a point in the image. In other words,
if one regards x ← f (t) as a parameterized surface in M , then these are tangent
vectors at various points on the surface.
Since this is a relatively awkward notion, it is not so common to make it
explicit. Typically one merely talks of the vector with components
∂x_i/∂t_α = f′_{i,α}(t). (3.134)
It is understood that this is a way of talking about certain tangent vectors to
the surface.
Remark: For those interested in theoretical considerations, it is worth noting
that the notions of pushforward and pullback are related. The relationship is
somewhat complicated, so it may be wise to omit the following discussion on a
first reading.
To understand it, we need the notion of a vector field W along a mapping
φ from N to M. This sends each point in N to a differential operator that
differentiates scalars on M, such that at each point in N the derivative is
evaluated at the image point in M under φ. In coordinates this would have
the form W = ∑_j k_j(t) ∂/∂x_j, where k_j(t) is a scalar on N and the x_j are
coordinates on M . If Z is a vector field on N , then the pushforward φ∗ Z is a
vector field along φ. We also need the notion of differential k-form γ along a
mapping φ from N to M . This acts as an antisymmetric multilinear function on
vector fields along φ, so ⟨γ | W1, . . . , Wk⟩ is a scalar on N. In the case of a 1-form,
γ could take the form γ = ∑_i f_i(t) dx_i. We would have ⟨γ | W⟩ = ∑_j f_j(t) k_j(t);
there are corresponding formulas for k-forms. If ω is a differential form on M,
then there is a corresponding object ω ∘ φ that is a differential form along φ. It
may be defined by pulling back the scalar coefficients of ω. Thus in the 1-form
case, if ω = ∑_j h_j(x) dx_j, then ω ∘ φ = ∑_j h_j(f(t)) dx_j.
The object of interest is φ∗ ω, the pullback of the differential form ω. Suppose
ω is a k-form on M and Z1 , . . . , Zk are vector fields on N and φ is a map from
N to M . Then the pullback φ∗ ω is a k-form on N given by
⟨φ∗ω | Z1, . . . , Zk⟩ = ⟨ω ∘ φ | φ∗Z1, . . . , φ∗Zk⟩. (3.135)
This is an equality of scalar fields on N . The φ∗ Zi are vector fields along φ;
they map points in N to tangent vectors at the image points in M . The ω ◦ φ
is a differential k-form along φ. In the case k = 1 the formula says

⟨φ∗ω | Z⟩ = ∑_j h_j(f(t)) ∑_α f′_{j,α}(t) a_α. (3.136)

This is consistent with the usual formula

φ∗ω = ∑_j h_j(f(t)) df_j(t) = ∑_j h_j(f(t)) ∑_α f′_{j,α}(t) dt_α (3.137)

for the pullback. In summary, the pushforward of a vector field is not an ordinary
vector field; it is a vector field along the manifold mapping. However, the
pullback of a differential form is another differential form. It is the pullback
that is most simple and natural. |

3.17 Orientation
An orientation of an n dimensional vector space is determined by a list of n
basis vectors u1 , . . . , un . Two such lists determine the same orientation if they
are related by a matrix with determinant > 0. They determine the opposite
orientation if they are related by a matrix with determinant < 0. There are
always two orientations.
Sometimes it is useful to have new lists of basis vectors related to the old
sets of vectors by a determinant that has the value ±1. This can be done in
various ways, but here is a simple special case. Consider a variant ū_i = s_i u_{τ(i)},
where τ is a permutation of {1, . . . , n} and each s_i = ±1. Then the determinant
is the product of the s_i times the sign of the permutation τ. We shall use the
term variant as a technical term for such a new basis.
In one dimension an orientation is determined by specifying one of two di-
rections. So if u is a vector, every strictly positive multiple of u determines
the same orientation, while every strictly negative multiple of u determines the
opposite orientation. So the two variants u and −u determine the two orienta-
tions.
In two dimensions an orientation is specified by taking two linearly inde-
pendent vectors u, v in order. In fact, the same orientation is specified by the
variants u, v and v, −u and −u, −v and −v, u. The opposite orientation would
be given by any of the variants u, −v and −v, −u and −u, v and v, u. These
two orientations are often called counter-clockwise and clockwise. These are not
absolute notions; a counter-clockwise orientation on a piece of paper becomes
clockwise when the paper is viewed from the other side. Often the orientation
is pictured by a rectangle with vectors as sides. For instance this could be
u, v, −u, −v, which takes you back to the starting point.
In three dimensions an orientation is determined by a list of three vectors
u, v, w. Of course many other triples of vectors determine the same orientation.
If we permute the order and change the signs, we get 3! · 23 = 48 variant lists,
of which 24 have one orientation and 24 the opposite orientation. The two
orientations are usually called right-handed and left-handed. Again this is not
absolute; a mirror will reverse the orientations. Here one draws a cell with six
oriented sides. In the list of three vectors, the first vector goes from a departure
side to a destination side, the second and third vectors give the orientation of
the destination side. For a given orientation there are six faces the vector can
point to, each has its orientation determined. Since there are four ways to find
vectors determining the orientation of a given face, this gives 24 variant triples
of vectors.
In higher dimensions the idea is the same. One can think of the orientation
of a cell as giving a first vector from one side of the cell to the other, then giving
an orientation of the destination side. For dimension zero it is helpful to think
of an orientation as just a choice of a sign + or −.
Pn
Consider a parameterized cell C consisting of the vectors i=1 ti ui , where
0 ≤ ti ≤ 1. This cell has 2n boundary cells. For each k there are 2 corresponding

boundary cells, related to each other by the vector uP k . The cell Ck is obtained
by setting tk = 0 and consists of the combinations i6=k ti ui . PThe cell Ck+ is
obtained by setting tk = 1 and consists of the combinations i6=k ti ui + uk .
Notice that uk takes Ck− to Ck+ , while −uk takes Ck+ to Ck− .
Given the orientation determined by u1 , . . . , un , there is a natural orienta-
tion on each of the 2n boundary cells. The cell Ck+ has orientation obtained
by moving uk to the front of the list, with a sign change for each interchange,
and then removing it. This implies that the orientation of the cell Ck+ is given
by the orientation u1 , . . . , uk−1 , uk+1 , . . . , un when k is odd and by the op-
posite orientation when k is even. The cell Ck− has the opposite orientation.
This implies that the orientation of the cell Ck− is given by the orientation
u1 , . . . , uk−1 , uk+1 , . . . , un when k is even and by the opposite orientation when
k is odd.
As an example, consider the case n = 2 with the list of vectors u, v. The
orientations of the boundary cells C1+ , C2+ are v and −u respectively, while the
orientations of C1− , C2− are −v and u respectively.
A more challenging example is n = 3 with the list of vectors u, v, w. The
orientations of the boundary cells C1+ , C2+ , C3+ are given by v, w and −u, w
and u, v. The orientations of C1− , C2− , C3− are given by −v, w and u, w and
−u, v. Of course one can always use variants of these lists to define the same
orientations.
All these ideas make sense for n = 1 with a single vector u. There are two
boundary cells each consisting of a single point. The convention is that C1+
corresponds to the point at u with positive orientation, while C1− corresponds
to the point at the origin with negative orientation.
There is a corresponding notion of orientation for a connected manifold
patch. Given a coordinate system x1 , . . . , xn , there is a corresponding orien-
tation given by the list of basis vector fields ∂/∂x1 , . . . , ∂/∂xn . Given two
coordinate systems on a connected manifold patch, they either have the same
orientation or the opposite orientation.
There are other ways of giving orientations on a connected manifold patch.
Instead of using a list of basis vector fields, one can use a list of basis 1-forms.
Or one can give a single non-zero n-form. Often we suppose that an
orientation is given. Then for every coordinate system, either the coordinate
system is compatible with that orientation, or it is compatible with the opposite
orientation.
Example: A one-dimensional connected manifold patch is something like a
curve; an orientation gives a direction along the curve. A coordinate system
has only one coordinate u. It is compatible with the orientation if it increases
in the direction given by the orientation.
A two-dimensional connected manifold patch has a clock orientation. For
dimension two there are two coordinates u, v. Consider a change first in u
(keeping v fixed) and then in v (keeping u fixed) that is in the sense of this
orientation. If both these changes are positive, or if both these changes are
negative, then the coordinate system is compatible with the orientation.
The case of a zero-dimensional connected manifold patch is special; it is just
a featureless point. Here an orientation is a + or − sign. A coordinate is a
non-zero number attached to the point. It is compatible with the orientation if
this number has the same sign. |

3.18 Integration of top-dimension differential forms
Suppose that P is a k-dimensional connected manifold patch with a given orien-
tation. Consider a differential k-form ω on P. This is a top-dimensional form,
since the degree of ω is equal to the dimension of P. Also consider a Jordan mea-
surable compact set K ⊆ P. We wish to define the integral of ω over K. To do
this, choose a coordinate system u_1, . . . , u_k that determines the orientation of
P. Then we may write

ω = f(u_1, . . . , u_k) du_1 ∧ · · · ∧ du_k. (3.138)

Furthermore, there is a set B ⊆ R^k such that u(K) = B. The definition is

∫_K ω = ∫_B f(v_1, . . . , v_k) dv_1 · · · dv_k = I_B(f), (3.139)

where I_B(f) is the usual (unoriented) Riemann integral. (The variables v_1, . . . , v_k
are just symbols that are available to define the function f.)
To show that this is well-defined, consider another coordinate system y_1, . . . , y_k
that determines the same orientation. Then u = g(y), where det g′(y) > 0.
Furthermore, B = g(A) for some Jordan measurable set A. The differential
form

ω = f(u) du_1 ∧ · · · ∧ du_k = f(g(y)) det g′(y) dy_1 ∧ · · · ∧ dy_k (3.140)

may be expressed in either coordinate system. The definition in the other co-
ordinate system gives

∫_K f(g(y)) det g′(y) dy_1 ∧ · · · ∧ dy_k = I_A((f ∘ g) det g′). (3.141)
But I_B(f) = I_A((f ∘ g) det g′) by the change of variables theorem. So the
definitions using different coordinate systems are consistent. This is a classic
example of a passive transformation. The same differential form ω is integrated
over the same set K. But the numerical expression of the integral is different
in the two coordinate systems.

3.19 Integration of forms over singular surfaces
Let P be a k-dimensional oriented manifold patch, and let K ⊆ P be a Jordan
measurable compact subset. Let N be an n dimensional manifold patch. We
wish to define a singular parameterized k-surface χ in N . This is defined to
be a smooth function
χ : K → N. (3.142)
In this context to say that χ is smooth is to say that it extends to a manifold
mapping from P to N .
We want to think of the compact set K as a set over which we can perform k-
dimensional integrals. This is called a singular parameterized k-surface because
there is no restriction on the rank, and there is also no requirement that the
mapping be one-to-one. When k = 1 this χ is a parameterized curve, when
k = 2 it is a parameterized surface, and so on. When k = 0 the set consists of
only one point, so the surface merely specifies a point in N .
This is a classic example of an active transformation. In terms of coordi-
nates, u could be a coordinate system on P , while x is a coordinate system
of N . Suppose that ω is a differential k-form on N expressed in terms of the
coordinates x. If χ is given by x ← g(u), then ω pulls back to a differential
k-form χ∗ ω expressed in terms of u. Since P is k-dimensional, we know how to
integrate this form over K.
Suppose ω is a k-form on the n-dimensional space N. Suppose χ : K → N
is a parameterized k-surface in N. We define the integral of ω over χ by

∫_χ ω = ∫_K χ∗ω. (3.143)

The integral on the right is the integral of a top-dimensional form.


Example: Consider the parameterized curve ψ given by x ← t^2, y ← t^3, defined
for 1 ≤ t ≤ 2. We want to integrate the differential form ω = y dx + 2xy dy
along this curve. If we pull it back by ψ we get 2t^4 dt + 6t^7 dt = (2t^4 + 6t^7) dt =
d((2/5)t^5 + (3/4)t^8). Thus the integral has the value ((2/5)2^5 + (3/4)2^8) −
((2/5) + (3/4)) = 4073/20. |
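The value of this line integral can be confirmed symbolically (an added check, not part of the text):

```python
import sympy as sp

t = sp.symbols('t')

# The curve psi: x <- t**2, y <- t**3 on 1 <= t <= 2.
x, y = t**2, t**3

# Coefficient of dt in the pullback of omega = y dx + 2*x*y dy:
integrand = y * sp.diff(x, t) + 2*x*y * sp.diff(y, t)   # 2*t**4 + 6*t**7
value = sp.integrate(integrand, (t, 1, 2))
assert value == sp.Rational(4073, 20)
```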
Example: Consider the cylinder x^2 + y^2 = 25 with 0 ≤ x, 0 ≤ y, 0 ≤ z ≤ 3. Say
that one wants the integral of

α = z dy dz + 2x dz dx − 4x^2 z dx dy (3.144)

on this surface. One way to calculate this is to parameterize the surface by the
corresponding rectangle in the x, z plane, with 0 ≤ x ≤ 5, 0 ≤ z ≤ 3. The
calculation amounts to pulling back by y ← √(25 − x^2). The pullback is the form

(zx/√(25 − x^2) + 2x) dz dx. (3.145)

This can be integrated over the rectangle to give 195/2. |
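A quick symbolic check of this surface integral (added illustration):

```python
import sympy as sp

x, z = sp.symbols('x z', nonnegative=True)

# Coefficient of dz dx in the pullback of alpha by y <- sqrt(25 - x**2):
# dy = -x/sqrt(25 - x**2) dx, so z dy dz = (z*x/sqrt(25 - x**2)) dz dx
# and the dx dy term drops out.
integrand = z*x/sp.sqrt(25 - x**2) + 2*x
value = sp.integrate(integrand, (x, 0, 5), (z, 0, 3))
assert value == sp.Rational(195, 2)
```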
In order to avoid complicated considerations of change of orientation, it is
customary to write −χ and define it in such a way that
∫_{−χ} ω = − ∫_χ ω. (3.146)

Intuitively this corresponds to changing to the opposite orientation. In general,


one defines a k-chain to be a finite linear combination of singular parameterized
k-surfaces with integer coefficients. Thus, for instance, 2χ1 − χ2 consists of two
copies of the chain χ1 and one copy of the chain oppositely oriented from χ2 .
Thus by definition the integral of ω over 2χ1 − χ2 is twice the integral of ω over
χ1 minus the integral of ω over χ2 .

Theorem 3.14 (Change of variables) Let φ be a smooth manifold mapping


of an open set in a n-dimensional manifold patch N into an open set of an m-
dimensional manifold patch M . Let χ be a singular k surface in the open subset
of N . Let ω be a k-form defined on the open set in the manifold patch M . Then
Z Z
ω= φ∗ ω. (3.147)
φχ χ

Proof: The proof is easy.


∫_{φχ} ω = ∫_K (φχ)∗ω = ∫_K χ∗φ∗ω = ∫_χ φ∗ω.   (3.148)


What happens if k = m = n? Then the form is ω = f (y) dy. So, with φ given
by y ← g(x), the theorem says

∫_{φχ} f (y) dy = ∫_χ f (g(x)) det g′(x) dx.   (3.149)

This looks like the classic change of variables theorem, but without the absolute
value sign. The reason the absolute value is not needed is that the integrals are
defined with respect to parameterizations χ and φχ, and these are thus oriented
integrals.

3.20 Stokes’ theorem for chains of singular surfaces
For Stokes’ theorem we use a more restricted notion of singular parameterized k-
surface. Again consider a k-dimensional parameter patch. Consider a compact

subset Q realized as follows. Let u1 , . . . , uk be coordinates on P that map
into a non-degenerate open cell and that are compatible with the orientation.
Then Q ⊆ P consists of the points whose coordinates satisfy aj ≤ uj ≤ bj for
j = 1, . . . , k. In other words, Q maps onto the bounded non-degenerate closed
cell [a1 , b1 ] × · · · × [ak , bk ]. We call this a parameter cell. A singular parameterized
surface χ is a smooth mapping χ : Q → N.
The advantage of restriction to parameter cells is that one can define bound-
aries. For each i there is a manifold patch Pi+ defined by ui = bi and a manifold
patch Pi− defined by ui = ai . The manifold patch Pi+ has coordinates uj for
j ≠ i, in the same order as before.
In Pi+ there is a bounded non-degenerate closed cell Qi+ whose coordinates
satisfy aj ≤ uj ≤ bj for j ≠ i. Similarly, the manifold patch Pi− has coordinates
uj for j ≠ i, and in Pi− there is a bounded non-degenerate closed cell Qi− whose
coordinates satisfy aj ≤ uj ≤ bj for j ≠ i.
The definition of boundaries is via face mappings. For a parameter cell
Q ⊆ P there is a singular parameterized k-surface I : Q → P that sends each
point in Q to itself. Define singular parameterized k − 1 face mapping surfaces
Ii± : Qi± → P sending each point in Qi± to itself. The oriented boundary of a
parameter cell is then defined by
parameter cell is then defined by
∂I = Σ_{i=1}^{k} (−1)^{i−1} (Ii+ − Ii− ).   (3.150)

This is a chain of face mapping surfaces. Given a singular parameterized k-


surface χ : Q → N , the oriented boundary chain is

∂χ = χ ∂I. (3.151)
This just means that ∂χ is the chain Σ_{i=1}^{k} (−1)^{i−1} (χIi+ − χIi− ). This is a chain
of parameterized k − 1 surfaces that each map to N .
If χ is a k-chain (an integer linear combination of parameterized k-surfaces),
then its boundary ∂χ is a k − 1 chain (the corresponding integer linear combi-
nation of the boundaries of the surfaces).
All of these objects have coordinate representations. The face mapping
surfaces are

Ii+ = ((u1 , . . . , uk ) ← (u1 , . . . , ui−1 , bi , ui+1 , . . . , uk )) (3.152)


Ii− = ((u1 , . . . , uk ) ← (u1 , . . . , ui−1 , ai , ui+1 , . . . , uk )).

For a singular parameterized k-surface χ : Q → N given by x ← g(u) the faces


in the i direction, i = 1, . . . , k, are given as the pullbacks

χIi+ = (x ← g(u1 , . . . , ui−1 , bi , ui+1 , . . . , uk )) (3.153)


χIi− = (x ← g(u1 , . . . , ui−1 , ai , ui+1 , . . . , uk )).

Theorem 3.15 (Stokes’ theorem) If ω is a k − 1 form and χ is a k-chain,
then

∫_χ dω = ∫_{∂χ} ω.   (3.154)

Proof: The proof has two steps. The first step is to prove the theorem in
the special case when the cell parameterizes itself, that is, when χ = I : Q → Q
sends every point to itself. The general k − 1 form on a k-dimensional space is
ω = Σ_{j=1}^{k} ωj = Σ_{j=1}^{k} fj (u1 , . . . , uk ) du1 ∧ . . . ∧ duj−1 ∧ duj+1 ∧ . . . ∧ duk .   (3.155)

In that case
dω = Σ_{j=1}^{k} dωj = Σ_{j=1}^{k} (−1)^{j−1} ∂fj (u1 , . . . , uk )/∂uj du1 ∧ . . . ∧ duk .   (3.156)

Hence
∫_I dω = ∫_Q I∗dω = Σ_{j=1}^{k} (−1)^{j−1} ∫_A ∂fj (u1 , . . . , uk )/∂uj du1 . . . duk ,   (3.157)

where A is the coordinate cell [a1 , b1 ] × · · · × [ak , bk ].

We can now use Fubini’s theorem to write the integral on the right hand side
as an iterated integral, with the duj integral as the inside integral. The fundamental
theorem of calculus says that the jth integral is equal to

∫_{Aj} [fj (u1 , . . . , bj , . . . , uk ) − fj (u1 , . . . , aj , . . . , uk )] du1 . . . duj−1 duj+1 . . . duk .   (3.158)
Here Aj is the relevant k − 1 cell. In other words,
∫_I dω = Σ_{j=1}^{k} (−1)^{j−1} [ ∫_{Qj+} Ij+∗ ωj − ∫_{Qj−} Ij−∗ ωj ].   (3.159)

But for i ≠ j both Ij+∗ ωi = 0 and Ij−∗ ωi = 0. So this is


∫_I dω = Σ_{j=1}^{k} (−1)^{j−1} [ ∫_{Qj+} Ij+∗ ω − ∫_{Qj−} Ij−∗ ω ].   (3.160)

Finally, this is
∫_I dω = Σ_{j=1}^{k} (−1)^{j−1} [ ∫_{Ij+} ω − ∫_{Ij−} ω ] = ∫_{∂I} ω.   (3.161)

The second step uses the fact that the integral of a differential form α over
a chain χ may be expressed by pulling back to parameter cells. It also depends

on the result that the pullback of a differential is the differential of the pullback,
that is, χ∗ (dω) = dχ∗ ω. This gives
∫_χ dω = ∫_I χ∗(dω) = ∫_I dχ∗ω = ∫_{∂I} χ∗ω = ∫_{χ∂I} ω = ∫_{∂χ} ω.   (3.162)

So the properly formulated result is rather simple; it follows from the trivial
case of the cell and from the remarkable transformation properties of differential
forms. 
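As a concrete check of Theorem 3.15 (not in the text), both sides of Stokes' theorem can be computed on the unit square Q = [0, 1] × [0, 1] for a made-up 1-form; the form and its boundary parameterizations below are assumptions of this illustration:

```python
# Sketch: check Stokes' theorem on the cell Q = [0,1] x [0,1]
# for the 1-form omega = x^2 y dx + x y dy.
import sympy as sp

x, y, t = sp.symbols('x y t')
p, q = x**2*y, x*y

# left side: integral of d(omega) = (dq/dx - dp/dy) dx dy over the square
lhs = sp.integrate(sp.diff(q, x) - sp.diff(p, y), (x, 0, 1), (y, 0, 1))

# right side: integral of omega over the oriented boundary (counterclockwise)
def edge(xt, yt):
    """Pull omega back along a parameterized edge (x(t), y(t)), 0 <= t <= 1."""
    integrand = (p.subs({x: xt, y: yt})*sp.diff(xt, t)
                 + q.subs({x: xt, y: yt})*sp.diff(yt, t))
    return sp.integrate(integrand, (t, 0, 1))

rhs = (edge(t, sp.Integer(0)) + edge(sp.Integer(1), t)
       + edge(1 - t, sp.Integer(1)) + edge(sp.Integer(0), 1 - t))
assert lhs == rhs == sp.Rational(1, 6)
```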

3.21 Classical versions of Stokes’ theorem


The integral of a 1-form along an oriented curve is called a line integral. In
general such an integral must be computed by pulling back to a parameter
interval. However for an exact form there is a shortcut. The fundamental
theorem of calculus is the case relating scalars to 1-forms. It says that for every
scalar field s and every parameterized curve C we have
∫_C ds = ∆s.   (3.163)

Here C is an oriented path from one point to another point, and ∆s is the value
of s at the final point minus the value of s at the initial point. Notice that the
result does not depend on the choice of path. This is because ds is an exact
form.
Example: Consider the form y 2 dx + 2xy dy. Since it is exact, we have
∫_C (y² dx + 2xy dy) = ∫_C d(xy²) = ∆(xy²)   (3.164)

independent of the path. |
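Path independence for this exact form can be checked directly with sympy; the two paths from (0, 0) to (1, 2) below are made up for the illustration:

```python
# Sketch: the exact form y^2 dx + 2xy dy integrates to Delta(x y^2)
# along any path; compare two different paths from (0, 0) to (1, 2).
import sympy as sp

t = sp.symbols('t')

def line_integral(xt, yt):
    # pull back y^2 dx + 2xy dy along (x(t), y(t)), 0 <= t <= 1
    integrand = yt**2*sp.diff(xt, t) + 2*xt*yt*sp.diff(yt, t)
    return sp.integrate(integrand, (t, 0, 1))

straight = line_integral(t, 2*t)        # straight segment
parabola = line_integral(t, 2*t**2)     # parabolic path
assert straight == parabola == 4        # Delta(x y^2) = 1 * 2^2 - 0
```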


The most common version of Stokes’ theorem relates 1-forms to 2-forms.
The 2-dimensional version of the theorem is Green’s theorem. It says that

∫_R ( ∂q/∂x − ∂p/∂y ) dx dy = ∫_{∂R} (p dx + q dy).   (3.165)

Here R is an oriented region in two dimensional space, and ∂R is the curve that
is its oriented boundary.
Example: A classical application of Green’s theorem is the computation of area
via

∫_R dx dy = (1/2) ∫_{∂R} (x dy − y dx).   (3.166)
In polar coordinates this takes the form
∫_R r dr dθ = (1/2) ∫_{∂R} r² dθ.   (3.167)

In this case a typical parameter region for the integral on the left hand side
may be considered in the r, θ plane as the four sided region where θ ranges from
0 to 2π and r ranges from 0 to some value depending on θ. The integral on
the right is over a chain consisting of four oriented curves. However three of
these curves contribute a total of zero: the contributions from θ = 0 and θ = 2π
take opposite orientations and cancel each other, while at r = 0 the integrand
vanishes. So only one oriented curve on the right hand side contributes to the
calculation of the area. |
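The polar boundary formula can be tested with sympy on a concrete region; the cardioid r = 1 + cos(θ) is a made-up test case, not from the text:

```python
# Sketch: area from the boundary formula (1/2) * integral of r^2 dtheta,
# tested on the cardioid r = 1 + cos(theta).
import sympy as sp

theta = sp.symbols('theta')
r = 1 + sp.cos(theta)
area_boundary = sp.Rational(1, 2)*sp.integrate(r**2, (theta, 0, 2*sp.pi))

# direct double integral of r dr dtheta over the same region
rr = sp.symbols('rr', nonnegative=True)
area_double = sp.integrate(rr, (rr, 0, r), (theta, 0, 2*sp.pi))
assert sp.simplify(area_boundary - area_double) == 0
assert sp.simplify(area_boundary - 3*sp.pi/2) == 0
```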
The integral of a 2-form over a surface is called a surface integral. The
classical Stokes’ theorem says that for an oriented two dimensional surface S
in a three dimensional space with oriented boundary curve ∂S we have

∫_S [ ( ∂r/∂y − ∂q/∂z ) dy dz + ( ∂p/∂z − ∂r/∂x ) dz dx + ( ∂q/∂x − ∂p/∂y ) dx dy ] = ∫_{∂S} (p dx + q dy + r dz).   (3.168)
This result for 2-forms has an obvious analog in n dimensions. This case of
Stokes’ theorem has important consequences for line integrals of closed forms.

Theorem 3.16 (Integral over a boundary) Let ω be a closed differential 1-form.
Let R be a surface on which ω is smooth, and let ∂R be the curve that is
its oriented boundary. Then

∫_{∂R} ω = 0.   (3.169)

Theorem 3.17 (Independence of path for closed forms) Let ω be a closed
differential 1-form. Let R be a surface on which ω is smooth, and let ∂R be the
curve that is its oriented boundary. Suppose that C1 and C2 are oriented curves
such that C1 − C2 = ∂R. Then

∫_{C1} ω = ∫_{C2} ω.   (3.170)

Theorem 3.18 (Exactness of closed forms in special regions) Suppose ω


is a closed differential 1-form that is smooth in an open set U . Suppose U has
the property that whenever C is a closed curve in U , then C = ∂R for some
region R in U . Then ω is exact in U .

Proof: Fix an initial point and a final point, and suppose that the final
point has coordinates x0 . Consider the scalar

s = h(x0 ) = ∫_{initial}^{final(x0 )} ω.   (3.171)

By the property of the region U and the independence of path for closed forms,
this is a well-defined scalar depending only on the final point. It is not too hard
to show that ds = ω in U . 

The result in two dimensions only requires Green’s theorem. Even this case
is significant. Much of what is interesting in complex variables depends on the
fact that

α = (x dy − y dx)/(x² + y²)   (3.172)
is a form (defined in the plane with one point removed) that is closed but
not exact. If one considers the plane with an entire half-line from the origin
removed, then this form is exact in that smaller region, in fact, α = dφ, where
φ is a suitable angle. But the interest is in what happens with curves that
go entirely around the origin. Since such a curve is not a boundary, it is not
surprising that the result can be a non-zero multiple of 2π.
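The winding computation can be checked directly with sympy (a verification sketch, not from the text):

```python
# Sketch: the closed form (x dy - y dx)/(x^2 + y^2) integrates to 2*pi
# around the unit circle, so it cannot be exact in the punctured plane.
import sympy as sp

t = sp.symbols('t')
x, y = sp.cos(t), sp.sin(t)          # unit circle, one counterclockwise loop
integrand = sp.simplify((x*sp.diff(y, t) - y*sp.diff(x, t))/(x**2 + y**2))
assert integrand == 1                # the pullback reduces to dt on the circle

winding = sp.integrate(integrand, (t, 0, 2*sp.pi))
assert winding == 2*sp.pi
```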
Gauss’s theorem is the case relating n − 1 forms to n forms. The classical
case is when n = 3, so that it relates 2-forms to 3-forms. When n = 2 it is
Green’s theorem. Let W be an oriented three dimensional region, and let ∂W
be the oriented surface that forms its boundary. Then the three dimensional
version of Gauss’s theorem states that
∫_W ( ∂a/∂x + ∂b/∂y + ∂c/∂z ) dx dy dz = ∫_{∂W} (a dy dz + b dz dx + c dx dy).   (3.173)
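Gauss's theorem can be verified on a concrete region with sympy; the unit cube and the 2-form below are assumptions of this illustration:

```python
# Sketch: check the Gauss theorem (3.173) on the unit cube for the
# 2-form a dy dz + b dz dx + c dx dy with a = x^2, b = y, c = x z.
import sympy as sp

x, y, z = sp.symbols('x y z')
a, b, c = x**2, y, x*z

# volume integral of the divergence
lhs = sp.integrate(sp.diff(a, x) + sp.diff(b, y) + sp.diff(c, z),
                   (x, 0, 1), (y, 0, 1), (z, 0, 1))

# outward flux through the six faces of the cube
flux = (sp.integrate(a.subs(x, 1) - a.subs(x, 0), (y, 0, 1), (z, 0, 1))
        + sp.integrate(b.subs(y, 1) - b.subs(y, 0), (z, 0, 1), (x, 0, 1))
        + sp.integrate(c.subs(z, 1) - c.subs(z, 0), (x, 0, 1), (y, 0, 1)))
assert lhs == flux == sp.Rational(5, 2)
```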

3.22 Picturing Stokes’ theorem


This section treats the pictures associated with Stokes’ theorem. It will appear
that in n dimensional space it is easy to draw pictures of differential forms of
degree 0, 1, n − 1, and n. Since Stokes’ theorem relates integrals of k forms and
k + 1 forms, we see that we will have nice pictures for k = 0, for k = 1 and
n = 2, 3, and for k = n − 1.
A 0-form s is a scalar field, and a scalar field is pictured by its contour
hypersurfaces (surfaces of dimension n − 1). Such surfaces have no boundaries.
The differential of a scalar field s is an exact 1-form ds. It is pictured by
looking close to each point; near a given point the contour hypersurfaces look
like hyperplanes.
In the case of a 1-form α the idea is to look at each point and draw con-
tour hyperplanes in a region very near the point. These hyperplanes include
the subspaces on which the form vanishes. One can also imagine nearby hyperplanes
on which the form assumes a constant value. The problem is that these
hyperplanes do not necessarily fit together to form a hypersurface. Nevertheless,
the integral ∫_C α over an oriented curve C is well-defined; it is obtained
by integrating the changes in value as the curve crosses the little hyperplanes.
If in a discrete version of the integration the spacing between the hyperplanes
corresponds to constant changes, then the calculation reduces to counting the
number of hyperplanes crossed, keeping track of whether the crossing is upward
or downward. The Stokes’ theorem for a 1-form α says that for an oriented
surface S with oriented boundary ∂S we have
∫_S dα = ∫_{∂S} α.   (3.174)

For a closed 1-form dα = 0 and the integral over every boundary is zero.
A k-form in n dimensions is a much more complicated object. A strategy to
visualize them is to look at a certain subspace where the form vanishes. This
does not completely characterize the form, but it gives at least some intuition
about what it looks like.
Consider the case of dimension n. If X is a vector field, and ω is a k-form,
then we may define the interior product of X with ω to be the k − 1 form Xcω
defined by

⟨Xcω | Y1 , . . . , Yk−1 ⟩ = ⟨ω | X, Y1 , . . . , Yk−1 ⟩.   (3.175)

It follows from the definition that the k − 2 form Xc(Xcω) = 0.


The way to compute with the interior product is to use the fact that

(∂/∂xj ) c (dxj ∧ dxI ) = dxI   (3.176)

when the multi-index I does not contain j.


Thus, for instance, the interior product of ∂/∂y with dx dy is equal to the interior
product of ∂/∂y with −dy dx, which is −dx.
The characteristic subspace of ω is the subspace of all X with Xcω = 0.
The condition for X to be a characteristic vector is that for all vectors
Y1 , . . . , Yk−1 we have ⟨ω | X, Y1 , . . . , Yk−1 ⟩ = 0. If the
k-form is non-zero, then the dimension of the characteristic subspace is ≤ n − k.
If the characteristic subspace of ω has dimension n−r at each point, then the
form is said to have rank r. If the k-form is non-zero, then the rank is ≥ k. It is
not true that every non-zero k-form is of rank k. The simplest counterexample is
in dimension 4 and is ω = dx1 dx2 + dx3 dx4 , which has rank 4. In this case, the
characteristic subspace consists only of the zero vector. It may be shown that
a non-zero k-form is of rank k if and only if it is decomposable, that is, it may
be represented as a product of non-zero 1-forms. The form in the example is
not decomposable. For more on decomposable forms see the books by Crampin
and Pirani [5] and by Şuhubi [19].
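The characteristic subspace can be computed concretely for 2-forms: a 2-form corresponds to an antisymmetric matrix A via ω(X, Y) = XᵀAY, and Xcω = 0 exactly when AX = 0. A small sketch (the matrix encoding is an assumption of this illustration, not notation from the text):

```python
# Sketch: the characteristic subspace of a 2-form omega(X, Y) = X^T A Y
# (A antisymmetric) is the null space of A, and the rank of the form is
# the matrix rank.  Compare dx1^dx2 with dx1^dx2 + dx3^dx4 in dimension 4.
import sympy as sp

def two_form_matrix(pairs, n=4):
    """Antisymmetric matrix of a sum of coeff * dx_i ^ dx_j terms."""
    A = sp.zeros(n, n)
    for (i, j, coeff) in pairs:
        A[i, j] += coeff
        A[j, i] -= coeff
    return A

decomposable = two_form_matrix([(0, 1, 1)])             # dx1 ^ dx2
symplectic = two_form_matrix([(0, 1, 1), (2, 3, 1)])    # dx1^dx2 + dx3^dx4

assert decomposable.rank() == 2      # characteristic subspace has dim 4 - 2 = 2
assert symplectic.rank() == 4        # characteristic subspace is {0}
assert len(symplectic.nullspace()) == 0
```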
If a k-form is of rank k, then it would seem natural to picture it by its
corresponding characteristic subspace of dimension n − k. This may not give
complete information about the form, but it will indicate its general character.

Proposition 3.19 Consider a non-zero differential k-form in n dimensions. If


k = 1 or if k = n − 1, then the form has rank k.

Proof: For the case k = 1 of a 1-form α the equation for a characteristic
vector is

Xcα = ( Σ_{i=1}^{n} ai ∂/∂xi ) c ( Σ_{j=1}^{n} pj dxj ) = Σ_{i=1}^{n} ai pi = 0.   (3.177)

If the form α is non-zero at some point, then some pj 6= 0, and so the corre-
sponding space of vectors X at the given point is n − 1 dimensional.

The case of an n − 1 form ω is more interesting. Let σ = dx1 ∧ · · · ∧ dxn be
a volume form. The general n − 1 form may always be written as an interior
product with a vector field Y :

ω = Y cσ = Σ_{j=1}^{n} pj (∂/∂xj ) c σ = Σ_{j=1}^{n} (−1)^{j−1} pj dx1 · · · dxj−1 dxj+1 · · · dxn .   (3.178)

The characteristic equation is then Xcω = Xc(Y cσ) = 0. Suppose that Y


and hence ω are non-zero. Then for every scalar λ ≠ 0 we have that
X = λY is a non-zero solution of the characteristic equation. On the other hand,
if X and Y are linearly independent, then it is easy to see that Xc(Y cσ) ≠ 0.

It is tempting to represent the n − 1 form ω by a vector field Y satisfying the
characteristic equation, but this choice of vector field is not unique. (In the proof
above the choice of vector field depends on the choice of volume form σ, which
in turn depends on the coordinate system.) Instead, one should think of an n−1
form ω as a field of long very thin cylindrical tubes, such that the vector field
X is tangent to the tubes. The reason for thinking of tubes instead of lines is to
capture the intuition about orientation; the tubes may be thought of as having
a kind of spiral orientation that gives an orientation in the space transverse
to the tube. In a discrete approximation the tubes may be pictured as closely
spaced, and the integral ∫_S ω of the form ω over an oriented hypersurface S is
then proportional to the number of tubes that penetrate the surface, taking into
account a sign at each intersection point from the orientation. In general tubes
will have oriented end points, and the density of these points gives a geometric
representation of the derivative
dω = Σ_{j=1}^{n} ∂pj /∂xj dx1 · · · dxn .   (3.179)

Stokes’ theorem for an n − 1 form ω is then the Gauss theorem

∫_W dω = ∫_{∂W} ω.   (3.180)

The left hand side represents the integral of the oriented end points over the
n-dimensional oriented region W . The right hand side represents the flux of ω
through the boundary surface ∂W , which in a discrete approximation is imagined
as proportional to the number of tubes penetrating the surface, taking
into account orientation. In other words, the output through the boundary is
explained by an integral of the production inside the region. In the discrete
approximation, the Gauss theorem reduces to counting, since the number of
lines passing through the bounding surface ∂W described by the n − 1 form ω
corresponds to the number of points in the interior W described by the n-form
dω, as usual taking signs into account. In the case when ω is closed, there is no
production, and the total flow across the boundary ∂W is zero.

We shall see in the next chapter that when there is a given volume form,
then the n − 1 form in the Gauss theorem may be represented by a vector field.
In this case the Gauss theorem becomes the divergence theorem.
In the case n = 3 a 1-form α has a derivative dα that is a 2-form. The 1-form
may be pictured as surfaces that end in curves, and the 2-form is represented
by thin tubes. The tubes act as hinges on which the surfaces hang. Since dα is
an exact form, the tubes have no end points. Stokes’ theorem relates the 1-form
α represented by the surfaces to the 2-form dα represented by the thin tubes.
The formula represents the integral of α around the closed curve ∂S in terms of
the integrated flux of the tubes given by dα through a surface S that this curve
bounds. The result is independent of the surface. In a discrete approximation
Stokes’ theorem is again the result of counting, since the number of surfaces
with equal increments (taking into account increase or decrease) is equal to the
number of tubes acting as hinges (taking into account orientation).
Example: Take the form α = x dy in three dimensions and an oriented surface
S in a constant z plane bounded by two values of y and by a positive and a
negative value of x. The boundary ∂S is taken oriented so that y increases
for the positive value of x and y decreases for the negative value of x. Then
the integral along the boundary is positive. This is explained by the tubes
representing dα = dx dy, which are vertical, of constant density, and have an
orientation compatible with the orientation of S. |
The case n = 2 gives rise to two equivalent pictures. The first picture is
that of Green’s theorem. Represent the 1-form α = p dx + q dy near each point
by the line where it vanishes. Then α represents the increase or decrease as
one crosses such lines along an oriented curve. It may help to think of the
lines as double lines representing a step up or a step down in a given direction.
The differential dα = (∂q/∂x − ∂p/∂y) dx dy represents the density of hinge
points where the lines begin. So Green’s theorem says that the total increase
or decrease is completely explained by this cloud of hinge points. When the
form α is closed, there are no hinge points, and the integral around every closed
curve is zero.
Example: Take the form α = x dy in two dimensions. This is represented by
lines of constant y whose spacing decreases and reverses sign as x passes through
zero. Consider a region S bounded by two values of y and by a positive and a
negative value of x. The boundary ∂S is taken oriented so that y increases for
the positive value of x and y decreases for the negative value of x. One thinks of
α as indicating some sort of subjective change in vertical distance. The integral
along ∂S is positive, since one is going uphill along the entire closed curve.
In essence this is the famous picture of the Penrose stairs. (The most famous
illustration of these stairs is the Ascending and Descending lithograph print by
Escher.) |
The other picture for n = 2 is closer to the Gauss theorem. It is suggested
by writing α = p dy − q dx and considering it as an (n − 1)-form with n = 2. In dimension 2 the
tubes representing the form have a transverse orientation. These can be thought
of as double lines, where the orientation goes from one line to its neighbor. So
α represents the amount that these lines cross an oriented curve, taking into

account the orientation. The differential dα = (∂p/∂x + ∂q/∂y) dx dy represents


the density of the oriented points at which these tubes begin. This picture
suggests a conservation law, where the amount of flux across the boundary is
explained by the rate of generation within the region. If the form is closed, then
there is no generation within, and so the flow is compensated by the flow out.
Example: It is important that the boundary ∂S is the entire boundary of
the region S where the form is defined. The classical example is the form
α = (x dy − y dx)/(x2 + y 2 ) defined in the plane with the origin removed. This
1-form is described by radial (constant angle) half-lines. Say that S is the
annulus between the circles x2 + y 2 = 1 and x2 + y 2 = 4. Then ∂S consists of
the two circles, with opposite orientations. Each line that enters the annulus on
one circle leaves it on the other circle. So there is complete cancellation, and
the integral of α over ∂S is zero. (Actually, the computation of the boundary
∂S is slightly more complicated. If S is regarded as a chain parameterized by a
rectangular region, then the boundary ∂S is a chain consisting of the two circles
with opposite orientations, plus two segments along y = 0 with 1 < x < 2. These
last two segments have opposite orientations, so their contributions cancel, and
they may be ignored.) |

3.23 Electric and magnetic fields


The power of the differential forms concept is that it works in spaces without
much structure, other than a notion of differentiability. Euclidean space is
quite special in this regard, in that there is considerable structure, in particular
notions such as length and orthogonality. For applications of forms to problems
involving Euclidean space, it is desirable to use basic forms that are of unit
length and are mutually orthogonal. The most obvious example is to take
Cartesian coordinates, where the basic forms are dx, dy, and dz, taken in that
order. Once we have such basic forms, we can convert from 1-forms to vector
fields, just by taking the same components. We can also convert from 1-forms
to 2-forms, by replacing dx, dy and dz by dy dz, dz dx, and dx dy.
This also works in other coordinate systems. The most important example is
spherical polar coordinates r, θ, φ, where x = r cos(φ) sin(θ), y = r sin(φ) sin(θ),
and z = r cos(θ). Here r2 = x2 + y 2 + z 2 . This is the distance from the origin,
so surfaces of constant r are spheres. The quantities θ and φ are co-latitude and
longitude. The basic forms are dr, r dθ, and r sin(θ) dφ.
Sometimes it is useful to use cylindrical coordinates ρ, φ, z. Here x =
ρ cos(φ), y = ρ sin(φ), and z = z. This time ρ2 = x2 + y 2 . This is the dis-
tance from the z axis, so surfaces of constant ρ are cylinders. The basic forms
are dρ, ρ dφ, and dz.
An electric field is most often modeled as a vector field. However in this
section we will think of it as a 1-form. Thus, for example, the electric field of a
point charge at the origin is
E = (1/(4πr²)) (x dx + y dy + z dz)/r.   (3.181)

This can also be written in spherical polar coordinates as


E = (1/(4πr²)) dr.   (3.182)
This 1-form is
E = −dφ, (3.183)
where φ is a 0-form called the electric potential . For the electric field of a point
charge at the origin the electric potential is
φ = 1/(4πr).   (3.184)
The surfaces of constant potential are spheres. In most physics texts the electric
field E is represented by the gradient vector field, which gives vectors in the
radial direction, orthogonal to these spheres.
There is another kind of electric field often considered in physics. It is called
the electric displacement (also called electric flux density ) and is written D.
For our purposes we can think of it as the same field, but considered as a 2-form.
Thus for the electric field of a point charge this is
D = (1/(4πr²)) (x dy dz + y dz dx + z dx dy)/r.   (3.185)
This has a nice expression in spherical polar coordinates as
D = (1/(4πr²)) r dθ ∧ r sin(θ) dφ = (1/(4π)) sin(θ) dθ dφ.   (3.186)
This is represented geometrically by lines coming radially from the origin.
The fundamental equations of electrostatics are

dE = 0 (3.187)

and
dD = R. (3.188)
Here R is a 3-form called the charge-density.
These equations have integral forms. The first equation says that for every
surface W the integral around the closed curve ∂W that is its boundary is zero:
∫_{∂W} E = 0.   (3.189)

The second says that for every region Ω the integral over the surface ∂Ω of D
is equal to the total charge in the region:
∫_{∂Ω} D = ∫_Ω R.   (3.190)

As an illustration of these ideas, here is the computation of the electric


displacement field of a charge that is uniform on a ball of radius ε about the

origin. For D we have the same formula as before for r ≥ ε. However for r ≤ ε
we have

Din = (r³/ε³) D.   (3.191)

One way to see that this works is to compute the exterior derivative. For r ≤ ε
this is

d Din = (1/ε³) 3r² dr (1/(4π)) sin(θ) dθ dφ = (4πε³/3)⁻¹ r² sin(θ) dr dθ dφ = (4πε³/3)⁻¹ dx dy dz.   (3.192)
Indeed this is a constant charge density within the ball with total charge equal
to one. Thus the radial lines representing the electric displacement field begin
inside this ball. If we restrict the form to the region outside the charged ball,
then D is a closed 2-form that is not exact.
The corresponding electric field for r ≤ ε is

Ein = (r³/ε³) E = (1/(4π)) (r/ε³) dr.   (3.193)

The potential is

φin = (1/(4πε³)) (1/2)(3ε² − r²).   (3.194)

The constant term makes the potential continuous at r = ε.
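The charged-ball example can be checked with sympy; ε is kept as a positive symbol (a verification sketch, not part of the text):

```python
# Sketch: check the uniformly charged ball.  Inside r <= eps the density
# d(D_in) integrates to total charge 1, and the potential is continuous
# at r = eps with E_in = -d(phi_in).
import sympy as sp

r, theta, phi, eps = sp.symbols('r theta phi epsilon', positive=True)

# D_in = (r^3/eps^3) (1/(4 pi)) sin(theta) dtheta dphi; take d/dr for d(D_in)
D_in_coeff = (r**3/eps**3)*sp.sin(theta)/(4*sp.pi)
density = sp.diff(D_in_coeff, r)      # coefficient of dr dtheta dphi
total_charge = sp.integrate(density,
                            (r, 0, eps), (theta, 0, sp.pi), (phi, 0, 2*sp.pi))
assert sp.simplify(total_charge - 1) == 0

# potential: 1/(4 pi r) outside, (3 eps^2 - r^2)/(8 pi eps^3) inside
phi_out = 1/(4*sp.pi*r)
phi_in = (3*eps**2 - r**2)/(8*sp.pi*eps**3)
assert sp.simplify(phi_out.subs(r, eps) - phi_in.subs(r, eps)) == 0
# E_in = -d(phi_in) recovers (1/(4 pi)) (r/eps^3) dr
assert sp.simplify(-sp.diff(phi_in, r) - r/(4*sp.pi*eps**3)) == 0
```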
A magnetic field (also called magnetic flux density ) is most often modeled as
a vector field, but in many ways it is more natural to model it as a differential 2-
form. For example, consider the case of a wire bearing current along the z axis.
The magnetic flux density in this case is most easily expressed in cylindrical
coordinates via
B = (1/(2π)) dz (x dx + y dy)/(x² + y²) = (1/(2πρ)) dz dρ.   (3.195)
The lines of magnetic flux density circle around the wire; their density drops off
with distance from the wire.
There is another kind of magnetic field, sometimes called the magnetic field
intensity. It is written H. For our purposes we can think of it as the same field,
but considered as a 1-form. Thus for the magnetic field intensity of the wire we
have
H = (1/(2π)) (x dy − y dx)/(x² + y²) = (1/(2πρ)) ρ dφ = (1/(2π)) dφ   (3.196)
in cylindrical coordinates. The surfaces of constant φ are planes with one side
on the wire. Notice that this is not an exact 1-form, since the integral around a
closed curve surrounding the wire is 1. In physics the magnetic field intensity is
often represented by a vector field orthogonal to the planes, again circling the
wire.
The fundamental equations of magnetostatics are

dB = 0 (3.197)

and
dH = J. (3.198)
Here J is a 2-form called the current-density. The fact that J is exact represents
current conservation: the lines representing the 2-form J have no end points.
These equations have integral forms. The first says that for every region Ω
the integral over the surface ∂Ω of B is zero.
∫_{∂Ω} B = 0.   (3.199)
Magnetic flux lines never end. The second equation says that for every surface
W the integral of H around the closed curve ∂W that is its boundary is the
current passing through the surface.
∫_{∂W} H = ∫_W J.   (3.200)
As an illustration of these ideas, here is the computation of the magnetic
intensity field of a current that is uniform on a cylinder of radius ε about the z
axis. For H we have the same formula as before for ρ ≥ ε. However for ρ ≤ ε
we have

Hin = (ρ²/ε²) H.   (3.201)
One way to see that this works is to compute the exterior derivative. For ρ ≤ ε
this is

d Hin = (1/ε²) 2ρ dρ (1/(2π)) dφ = (πε²)⁻¹ ρ dρ dφ = (πε²)⁻¹ dx dy.   (3.202)
Indeed this is a constant current density within the cylinder with total current
equal to one. Thus the planes representing the H field end in lines inside the
cylinder. If we restrict H to the region outside the wire, then it is an example
of a closed 1-form that is not exact.
The corresponding magnetic flux density for ρ ≤ ε is

Bin = (ρ²/ε²) B = (1/(2π)) (ρ/ε²) dz dρ.   (3.203)
Since dB = 0, it seems reasonable that it should have a 1-form magnetic
potential A with dA = B. Such a potential is of course not unique, since one
may add to it a 1-form of the form ds and get the same magnetic flux density.
As an example, for the case of the wire the vector potential outside the wire
may be taken to be
A = −(1/(2π)) log(ρ/ε) dz.   (3.204)
The reason for writing it with the ε > 0 is that it is convenient to have the magnetic
potential be zero at the surface of the wire. The corresponding expression
inside is then
Ain = −(1/(2π)) (1/2) (ρ²/ε² − 1) dz.   (3.205)
This also is zero at the surface of the wire.
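The wire example can be checked with sympy: both potentials reproduce the flux density under −∂/∂ρ (for A = f(ρ) dz, dA = f′ dρ ∧ dz = −f′ dz ∧ dρ) and both vanish at ρ = ε. A verification sketch, not part of the text:

```python
# Sketch: verify dA = B outside and dA_in = B_in inside the wire of
# radius eps, comparing coefficients of dz drho.
import sympy as sp

rho, eps = sp.symbols('rho epsilon', positive=True)

A_out = -sp.log(rho/eps)/(2*sp.pi)       # coefficient of dz in A
A_in = -(rho**2/eps**2 - 1)/(4*sp.pi)
B_out = 1/(2*sp.pi*rho)                  # coefficient of dz drho in B
B_in = rho/(2*sp.pi*eps**2)

assert sp.simplify(-sp.diff(A_out, rho) - B_out) == 0
assert sp.simplify(-sp.diff(A_in, rho) - B_in) == 0
# both potentials vanish at the surface rho = eps
assert A_out.subs(rho, eps) == 0 and A_in.subs(rho, eps) == 0
```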

Problems 7: Vector fields


1. Straightening out. A vector field that is non-zero at a point can be trans-
formed into a constant vector field near that point by a change of coordi-
nate system. Pick a point away from the origin, and find coordinates u, v
so that
−y/(x² + y²) ∂/∂x + x/(x² + y²) ∂/∂y = ∂/∂u.   (3.206)

2. Linearization. Consider the vector field


u = x(4 − x − y) ∂/∂x + (x − 2)y ∂/∂y.   (3.207)

Find its zeros. At each zero, find its linearization. For each linearization,
find its eigenvalues. Use this information to sketch the vector field.

3. Nonlinearity. Consider the vector field

v = (1 + x² + y²) y ∂/∂x − (1 + x² + y²) x ∂/∂y.   (3.208)

Find its linearization at zero. Show that there is no coordinate system


near 0 in which the vector field is expressed by its linearization. Hint:
Solve the associated system of ordinary differential equations, both for v
and for its linearization. Find the period of a solution in both cases.

4. Nonlinear instability. Here is an example of a fixed point where the lin-


ear stability analysis gives an elliptic fixed point, but changing to polar
coordinates shows the unstable nature of the fixed point:

dx/dt = −y + x(x² + y²)   (3.209)
dy/dt = x + y(x² + y²).   (3.210)
Change the vector field to the polar coordinate representation, and solve
the corresponding system of ordinary differential equations.

5. A predator-prey system. Fix α > 0. In the region with 0 < u and 0 < v
consider the system

du/dt = u(1 − v)   (3.211)
dv/dt = αv(u − 1).
The u variable represents prey; the v variable represents predators. (a)
Sketch this vector field. Find the zero. What kind of linearization is there
at this zero?

(b) Show that each solution satisfies

αv(u − 1) du + u(v − 1) dv = 0. (3.212)

Show that 1/(uv) is an integrating factor for this differential form.


(c) Integrate this form to find an equation for the solution curves in the
u, v plane. Show that these are compatible with your sketch. What value
of the constant of integration corresponds to the fixed point?

Recitation 7
1. Exact differentials. Is (x2 + y 2 ) dx + 2xy dy an exact differential form? If
so, write it as the differential of a scalar.

2. Exact differentials. Is (1 + ex ) dy + ex (y − x) dx an exact differential? If


so, write it as the differential of a scalar.

3. Exact differentials. Is e^y dx + x(e^y + 1) dy an exact differential? If so,
   write it as the differential of a scalar.

4. Constant differential forms. A differential form usually cannot be trans-


formed into a constant differential form, but there are special circum-
stances when that can occur. Is it possible to find coordinates u and v
near a given point (not the origin) such that

−y dx + x dy = du? (3.213)

5. Constant differential forms. A differential form usually cannot be trans-


formed into a constant differential form, but there are special circum-
stances when that can occur. Is it possible to find coordinates u and v
near a given point (not the origin) such that
   −y/(x² + y²) dx + x/(x² + y²) dy = du?   (3.214)

6. Ordinary differential equations. Solve the differential equation
   (xy² + y) dx − x dy = 0 by finding an integrating factor that depends
   only on y.
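The exactness test behind problems 1 to 3 can be automated. This sketch (an addition, assuming sympy) compares ∂p/∂y with ∂q/∂x for each form p dx + q dy.

```python
# Sketch (an addition): the form p dx + q dy is exact on the plane
# exactly when dp/dy = dq/dx; test the three forms above.
import sympy as sp

x, y = sp.symbols('x y')
forms = [
    (x**2 + y**2, 2*x*y),                 # problem 1
    (sp.exp(x)*(y - x), 1 + sp.exp(x)),   # problem 2: dx- and dy-coefficients
    (sp.exp(y), x*(sp.exp(y) + 1)),       # problem 3
]
for p, q in forms:
    print(sp.simplify(sp.diff(p, y) - sp.diff(q, x)) == 0)
# True, True, False: problems 1 and 2 are exact, problem 3 is not.
# For problem 1 one potential is s = x**3/3 + x*y**2.
s = x**3/3 + x*y**2
assert sp.diff(s, x) == x**2 + y**2 and sp.diff(s, y) == 2*x*y
```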

Problems 8: Differential forms


1. Say that the differential 1-form α = p dx + q dy + r dz has an integrating
   factor µ ≠ 0 such that µα = ds. Prove that α ∧ dα = 0. Also, express this
   condition as a condition on p, q, r and their partial derivatives.

2. Show that α = dz − y dx − dy has no integrating factor.

3. Show that the differential 1-form α = yz dx + xz dy + dz passes the test


for an integrating factor.

4. In the previous problem it might be difficult to guess the integrating factor.


Show that µ = e^{xy} is an integrating factor, and find s with µα = ds.
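This can be confirmed symbolically; a sketch (an addition, assuming sympy), with s = z e^{xy} as one choice of potential:

```python
# Sketch (an addition): check that mu = exp(x*y) works, with potential
# s = z*exp(x*y), so that mu*alpha = ds.
import sympy as sp

x, y, z = sp.symbols('x y z')
mu = sp.exp(x*y)
# coefficients of mu*alpha for alpha = yz dx + xz dy + dz:
p, q, r = mu*y*z, mu*x*z, mu
s = z*sp.exp(x*y)

ok = (sp.simplify(sp.diff(s, x) - p) == 0
      and sp.simplify(sp.diff(s, y) - q) == 0
      and sp.simplify(sp.diff(s, z) - r) == 0)
print(ok)   # True
```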
5. The differential 2-form ω = (2xy − x²) dx dy is of the form ω = dα, where
   α is a 1-form. Find such an α. Hint: This is too easy; there are many
   solutions.

Recitation 8
1. The differential 3-form σ = (yz + x²z² + 3xy²z) dx dy dz is of the form
   σ = dω, where ω is a 2-form. Find such an ω. Hint: Many solutions.
2. Let σ = xy²z dy dz − y³z dz dx + (x²y + y²z²) dx dy. Show that this 2-form
   σ satisfies dσ = 0.
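A symbolic check (an addition, not part of the notes): for σ = P dy dz + Q dz dx + R dx dy, one has dσ = (∂P/∂x + ∂Q/∂y + ∂R/∂z) dx dy dz, and that coefficient should vanish here.

```python
# Sketch (an addition): compute the coefficient of dx dy dz in
# d(sigma) for sigma = P dy dz + Q dz dx + R dx dy.
import sympy as sp

x, y, z = sp.symbols('x y z')
P = x*y**2*z
Q = -y**3*z
R = x**2*y + y**2*z**2
coeff = sp.diff(P, x) + sp.diff(Q, y) + sp.diff(R, z)
print(sp.simplify(coeff))   # 0, so d(sigma) = 0
```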
3. The previous problem gives hope that σ = dα for some 1-form α. Find
such an α. Hint: This may require some experimentation. Try α of the
form α = p dx+q dy, where p, q are functions of x, y, z. With luck, this may
work. Remember that when integrating with respect to z the constant of
integration is allowed to depend on x, y.

Problems 9: Stokes’ theorem


1. Let C be the curve x² + y² = 1 in the first quadrant from (1, 0) to (0, 1).
   Evaluate
       ∫_C xy dx + (x² + y²) dy.   (3.215)

2. Let C be a curve from (2, 0) to (0, 3). Evaluate

       ∫_C 2xy dx + (x² + y²) dy.   (3.216)

3. Consider the problem of integrating the differential form

       α = −y/(x² + y²) dx + x/(x² + y²) dy   (3.217)
from (1, 0) to (−1, 0) along some curve avoiding the origin. There is an
infinite set of possible answers, depending on the curve. Describe all such
answers.
4. Let R be the region x² + y² ≤ 1 with x ≥ 0 and y ≥ 0. Let ∂R be its
   boundary (oriented counterclockwise). Evaluate directly
       ∫_∂R xy dx + (x² + y²) dy.   (3.218)

5. This continues the previous problem. Verify Green’s theorem in this spe-
cial case, by explicitly calculating the appropriate integral over the region
R.
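Both sides of Green's theorem for this region can be computed symbolically. This sketch (an addition, assuming sympy) evaluates the boundary integral piece by piece and the interior integral in polar coordinates; both come out to 1/3.

```python
# Sketch (an addition): check Green's theorem for xy dx + (x^2+y^2) dy
# on the quarter disk.
import sympy as sp

t, r, phi = sp.symbols('t r phi')

def line_integral(xp, yp, a, b):
    # pull p dx + q dy back along the parametrized curve (xp(t), yp(t))
    p = xp*yp
    q = xp**2 + yp**2
    return sp.integrate(p*sp.diff(xp, t) + q*sp.diff(yp, t), (t, a, b))

boundary = (line_integral(t, 0*t, 0, 1)                        # (0,0) -> (1,0)
            + line_integral(sp.cos(t), sp.sin(t), 0, sp.pi/2)  # arc to (0,1)
            + line_integral(0*t, 1 - t, 0, 1))                 # (0,1) -> (0,0)

# Interior integral of d(p dx + q dy) = (dq/dx - dp/dy) dx dy = x dx dy,
# in polar coordinates with area element r dr dphi:
interior = sp.integrate(r*sp.cos(phi)*r, (r, 0, 1), (phi, 0, sp.pi/2))

print(boundary, interior)   # 1/3 1/3
```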

Recitation 9
1. Let
       α = −y dx + x dy + xy dz.   (3.219)
   Fix a > 0. Consider the surface S that is the hemisphere x² + y² + z² = a²
   with z ≥ 0. Integrate α over the boundary ∂S of this surface (a
   counterclockwise circle in the x, y plane).

2. This continues the previous problem. Verify Stokes’s theorem in this spe-
cial case, by explicitly calculating the appropriate integral over the surface
S.
3. Let σ = xy²z dy dz − y³z dz dx + (x²y + y²z²) dx dy. Integrate σ over the
   sphere x² + y² + z² = a². Hint: This should be effortless.
Chapter 4

The Metric Tensor


4.1 The interior product


The subject of differential forms is general and beautiful, but it fails to capture
the ideas of length, area, and volume. These all depend on a more complicated
kind of calculation, where, in the spirit of Pythagoras, one takes a square root
of a sum of squares. We begin this part of the notes with an easy case, when
one simply assumes that one has an understanding of volume. This leads to
an important reformulation of Stokes’ theorem called the divergence theorem.
Then we look more seriously at the extra ingredient that is needed to deal with
these topics, that is, with the metric tensor.
There is an operation called interior product (or contraction). In the case
of interest to us, it is a way of defining the product of a vector field u with
a k-form ω to get a k − 1 form ucω. (The exterior product is also called the
wedge product; the interior product is sometimes called the hook product. In
the following we call it the interior product.) The definition of interior product
that is used in practice is very simple and may be illustrated by an example.

Consider a basis vector field ∂/∂uj and a differential form dui duj duk. The
rule is to move the duj to the left and then remove it. So the result in this
case is

    (∂/∂uj )c dui duj duk = −(∂/∂uj )c duj dui duk = −dui duk .   (4.1)

There is a general theoretical definition that is also illuminating. Thus for


k ≥ 1 the interior product of the vector field u with the k-form σ is defined by

    ⟨ucσ | v1 , . . . , vk−1 ⟩ = ⟨σ | u, v1 , . . . , vk−1 ⟩.   (4.2)

When k = 1 this is already familiar. For a 1-form α the interior product ucα is
the scalar ⟨α | u⟩. For a scalar field s we take ucs = 0.
One interesting property of the interior product is that if α is an r-form and
β is an s-form, then

    uc(α ∧ β) = (ucα) ∧ β + (−1)^r α ∧ (ucβ).   (4.3)

This is a kind of triple product identity.


In particular, we may apply this when r = 1 and s = n. Since β is an n-form,
it follows that α ∧ β = 0. Hence we have in this special case

    ⟨α | u⟩β = α ∧ (ucβ).   (4.4)

Another application is with two 1-forms β and γ. In this case it gives

    ac(β ∧ γ) = ⟨β | a⟩γ − ⟨γ | a⟩β.   (4.5)

So the interior product of a vector with β ∧ γ is a linear combination of β and


γ.
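Identity (4.5) is easy to spot-check numerically, representing 1-forms by their coefficient vectors and β ∧ γ by the antisymmetric matrix of its coefficients (a sketch, an addition to the notes; numpy assumed):

```python
# Sketch (an addition): numerical check of (4.5). The 2-form beta ^ gamma
# acts on a pair (a, v) as a^T M v with
# M = outer(beta, gamma) - outer(gamma, beta),
# so the interior product ac(beta ^ gamma) has coefficient vector a^T M.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4)       # a vector
beta = rng.standard_normal(4)    # coefficients of the 1-form beta
gamma = rng.standard_normal(4)   # coefficients of the 1-form gamma

M = np.outer(beta, gamma) - np.outer(gamma, beta)
lhs = a @ M                                    # ac(beta ^ gamma)
rhs = (beta @ a)*gamma - (gamma @ a)*beta      # <beta|a> gamma - <gamma|a> beta
print(np.allclose(lhs, rhs))   # True
```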
Later we shall see the connection with classical vector algebra in three di-
mensions. The exterior product β ∧ γ is an analog of the cross product, while

α ∧ β ∧ γ is an analog of the triple scalar product. The combination −ac(β ∧ γ)


will turn out to be an analog of the triple vector product.
The general formula that is used for computations is that for a vector field v
and 1-forms α1 , . . . , αk we have

    vc(α1 ∧ · · · ∧ αk ) = Σ_{i=1}^k (−1)^{i−1} ⟨αi | v⟩ α1 ∧ · · · ∧ αi−1 ∧ αi+1 ∧ · · · ∧ αk .   (4.6)

In a coordinate representation this implies the following identity for the
interior product of a vector field with an n-form:

    (Σ_{j=1}^n aj ∂/∂uj )c du1 · · · dun = Σ_{i=1}^n ai (−1)^{i−1} du1 · · · dui−1 dui+1 · · · dun .   (4.7)

4.2 Volume
Consider an n-dimensional manifold. The new feature is a given n-form, taken
to be never zero. We denote this volume form by vol. In coordinates it is of the
form

    vol = √g du1 · · · dun .   (4.8)

This coefficient √g depends on the coordinate system. The choice of the
notation √g for the coefficient will be explained in the following chapter.
(Then √g will be the square root of the determinant of the matrix associated
with the Riemannian metric for this coordinate system.) It is typical to make
the coordinate system compatible with the orientation, so that volumes work
out to be positive.
The most common examples of volume forms are the volume form vol = dx dy dz in
Cartesian coordinates and the same volume vol = r² sin(θ) dr dθ dφ in
spherical polar coordinates. The convention we are using for spherical polar
coordinates is that θ is the co-latitude measured from the north pole, while φ
is the longitude. We see from these coordinates that the √g factor for
Cartesian coordinates is 1, while the √g factor for spherical polar
coordinates is r² sin(θ). In two dimensions it is perhaps more natural to
call this area. So in Cartesian coordinates area = dx dy, while in polar
coordinates area = r dr dφ.
For each scalar field s there is an associated n-form s vol. The scalar field
and the n-form determine each other in an obvious way. They are said to be
dual to each other, in a certain special sense.
For each vector field v there is an associated n − 1 form given by vcvol.
The vector field and the n − 1 form are again considered to be dual to each
other, in the same sense. If v is a vector field, then vcvol might be called the
corresponding flux density. It is an n − 1 form that describes how much v is
penetrating a given n − 1 dimensional surface. In coordinates we have

    (Σ_{j=1}^n aj ∂/∂uj )c vol = Σ_{i=1}^n ai (−1)^{i−1} √g du1 · · · dui−1 dui+1 · · · dun .   (4.9)

In two dimensions a vector field is of the form

    u = a ∂/∂u + b ∂/∂v.   (4.10)

The area form is
    area = √g du dv.   (4.11)
The corresponding flux is
    ucarea = √g (a dv − b du).   (4.12)

In three dimensions a vector field is of the form

    u = a ∂/∂u + b ∂/∂v + c ∂/∂w.   (4.13)

The volume form is
    vol = √g du dv dw.   (4.14)
The corresponding flux is

    √g (a dv dw + b dw du + c du dv).   (4.15)

4.3 The divergence theorem


Consider a vector field v in a space with a volume element vol. Define the flux
density of the vector field to be the n − 1 form vcvol. Define the divergence of
a vector field v to be the scalar div v such that

d(vcvol) = div v vol. (4.16)

In other words, it is the dual of the differential of the dual. The general diver-
gence theorem then takes the following form.

Theorem 4.1 Consider an n-dimensional region W for which there is a volume
form. Consider a vector field v with its associated n − 1 form flux density
vcvol, and consider the scalar field div v associated with the exterior
derivative of this form. Then Stokes' theorem gives

    ∫_W div v vol = ∫_{∂W} vcvol.   (4.17)

The right hand side in the divergence theorem is called the flux of the vector
field through the surface. The left hand side suggests that the flux is produced by
the divergence of the vector field in the interior.
Suppose that
    v = Σ_{j=1}^n aj ∂/∂uj .   (4.18)

To compute the integral over the boundary we have to pull back the
differential form vcvol to the parameter space. Say that t1 , . . . , tn−1 are
the parameters. Then the differential form pulls back to

    φ∗ (vcvol) = Σ_{j=1}^n aj νj √g dt1 · · · dtn−1 ,   (4.19)

where νi is (−1)^{i−1} times the determinant of the matrix obtained by
deleting the ith row from the matrix ∂uj /∂tα . The quantity √g must also be
expressed in terms of the ti . The explicit form of the theorem is then

    ∫_W (1/√g) Σ_{j=1}^n ∂(√g aj )/∂uj √g du1 · · · dun = ∫_{∂W} Σ_{j=1}^n aj νj √g dt1 · · · dtn−1 .   (4.20)

If one compares this with other common formulations of the divergence theorem,
one sees that there is no need to normalize the components νj , and there is also
no need to compute the surface area of ∂W . Both of these operations can be a
major nuisance; it is satisfying that they are not necessary.
There is another closely related way of looking at the surface integral in the
divergence theorem. The terminology is not standardized, but here is one
choice. Define the transverse surface element to be the interior pullback of
the volume as

    element = φ∗1 vol = φ∗1 (√g du1 · · · dun ) = Σ_{i=1}^n √g νi dui dt1 · · · dtn−1 .   (4.21)

Notice that this is just the pullback where one freezes the dui and only pulls
back the other duj . Then the flux density pulled back to the surface may be
written

    φ∗ (vcvol) = vcelement = √g Σ_{i=1}^n ai νi dt1 · · · dtn−1 .   (4.22)

The interior product on the left is a vector field with an n-form, while the
interior product on the right is a vector field with a 1-form (freezing the dti ).
So the vector field paired with this surface element is the object that must be
integrated.
In two dimensions the divergence theorem says that

    ∫_R (1/√g) (∂(√g a)/∂u + ∂(√g b)/∂v) area = ∫_{∂R} √g (a dv − b du).   (4.23)

Here the area form is √g du dv, where the particular form of √g is that
associated with the u, v coordinate system. Notice that the coefficients in the vector
field are expressed with respect to a coordinate basis. We shall see in the next
part of this book that this is not the only possible choice. This theorem involves
a kind of line integral, but the starting point is a vector field instead of a 1-form.

A marvellous application of the divergence theorem in two dimensions is the
formula

    ∫_R dx dy = (1/2) ∫_{∂R} x dy − y dx.   (4.24)

This says that one can determine the area by walking around the boundary. It
is perhaps less mysterious when one realizes that x dy − y dx = r² dφ.
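For a polygon traversed counterclockwise, this boundary integral reduces to the classical shoelace formula; here is a minimal sketch (an addition to the notes):

```python
# Sketch (an addition): over a straight edge from (x0,y0) to (x1,y1)
# the integral of x dy - y dx equals x0*y1 - x1*y0, so summing over
# the edges gives (twice) the area: the shoelace formula.
def polygon_area(vertices):
    """Area of a polygon with counterclockwise vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0*y1 - x1*y0
    return total/2.0

print(polygon_area([(0, 0), (2, 0), (2, 3), (0, 3)]))   # 6.0 for a 2-by-3 rectangle
```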
In three dimensions the divergence theorem says that

    ∫_W (1/√g) (∂(√g a)/∂u + ∂(√g b)/∂v + ∂(√g c)/∂w) vol = ∫_{∂W} √g (a dv dw + b dw du + c du dv).   (4.25)

Here the volume form is √g du dv dw, where the particular form of √g is that
associated with the u, v, w coordinate system. Again the coefficients a, b, c
of the vector field are expressed in terms of the coordinate basis vectors
∂/∂u, ∂/∂v, ∂/∂w. This is not the only possible kind of basis for a vector
field, so in some treatments the formulas will appear differently. They will
be ultimately equivalent in terms of their geometrical meaning. This result
involves a kind of surface integral, but the starting point is a vector field
instead of a 2-form.
The divergence theorem says that the integral of the divergence of a vector
field over W with respect to the volume is the integral of the flux of the vector
field across the bounding surface ∂W . A famous application in physics is when
the vector field represents the electric field, and the divergence represents the
density of charge. So the amount of charge in the region determines the flux of
the electric field through the boundary.
An important identity for a differential form ω and a scalar field s is

    d(sω) = ds ∧ ω + s dω.   (4.26)

This gives an integration by parts formula

    ∫_W ds ∧ ω + ∫_W s dω = ∫_{∂W} sω.   (4.27)

Apply this to ω = ucvol and use ds ∧ (ucvol) = ⟨ds | u⟩ vol. This gives the
divergence identity
    div (su) = ⟨ds | u⟩ + s div u.   (4.28)
From this we get another important integration by parts identity

    ∫_W ⟨ds | u⟩ vol + ∫_W s div u vol = ∫_{∂W} s ucvol.   (4.29)

4.4 Conservation laws


A conservation law is an assertion that the total amount of some quantity
(such as mass) does not change. A continuity equation is a local form of a
conservation law. It says that any change in the amount of the quantity

in some region is compensated by flow across the boundary of the region. In
mathematical form it is expressed by an equation of the form

    ∂R/∂t + dJ = 0.   (4.30)

Here R is an n-form (the mass, in kilograms), and J is an n − 1 form (the
mass flux, in kilograms per second). The coefficients of these two forms have
units of kilograms per cubic meter and kilograms per second per square meter.
The integral form of such a conservation law is

    (d/dt) ∫_W R + ∫_{∂W} J = 0.   (4.31)

It says that the rate of change of the amount of substance inside the region W
plus the net (outward minus inward) flow through the boundary ∂W is zero.
Thus, for instance, the amount in the region can only decrease if there is a
compensating outward flow. In fluid dynamics the flux J of mass is J = vcR,
where v is the fluid velocity vector field. Since the coefficients of v are in
meters per second, and the basis vectors are in inverse meters, the units of v
itself are inverse seconds.
Often one writes
    R = ρ vol.   (4.32)
Here the coefficient ρ is a scalar density (in kilograms per cubic meter). In
this case the conservation law reads

    ∂ρ/∂t + div (ρv) = 0.   (4.33)

The corresponding integral form is

    (d/dt) ∫_W ρ vol + ∫_{∂W} ρ vcvol = 0.   (4.34)

The units for this equation are kilograms per second. For a fluid it is the law
of conservation of mass. The theory also applies when v is a time-dependent
vector field.
The conservation law also appears in the equivalent form

    ∂ρ/∂t + vcdρ + (div v) ρ = 0   (4.35)

or, more concretely,

    ∂ρ/∂t + Σ_{i=1}^n vi ∂ρ/∂xi = −(div v) ρ.   (4.36)
Here vi is the component of velocity in the direction corresponding to xi . The
left hand side is a material derivative, that is, the change in density following a
typical particle. This suggests a method of solving the differential equation by

integrating the vector field to find particle trajectories. The change in density
along a particle trajectory is driven by the right hand side, which is a measure
of how fast particles are compressing together.
A solution for this conservation law describes how ρ at t = t0 determines ρ
for later t > t0 . The solution method uses the space-time curves ct′ for
t0 ≤ t′ ≤ t that define the particle trajectories. Fix a space-time point
given by x and t. The curve is required to pass through this point, that is,
the particle will reach x at time t. Then ct′ describes where it came from,
that is, its location in space-time at time t′ ≤ t. The space coordinates of
ct′ are x ◦ ct′ . The time component of ct′ is just t′ . The curve solves the
ordinary differential equations

    d(xi ◦ ct′ )/dt′ = vi ◦ ct′ .   (4.37)

This says that the particle moves according to the fluid velocity.
    From the chain rule and the ordinary differential equation for the
particle trajectory we get

    d(ρ ◦ ct′ )/dt′ = (∂ρ/∂t) ◦ ct′ + Σ_{i=1}^n (d(xi ◦ ct′ )/dt′ ) (∂ρ/∂xi ) ◦ ct′ = (∂ρ/∂t + Σ_{i=1}^n vi ∂ρ/∂xi ) ◦ ct′ .   (4.38)

Using the partial differential equation, we obtain an ordinary differential
equation for the density along the particle trajectory in the form

    d(ρ ◦ ct′ )/dt′ = −(div v ◦ ct′ ) (ρ ◦ ct′ ).   (4.39)

Suppose that we know the restriction ρ0 of ρ to t = t0 . Solving the ordinary
differential equation leads to

    ρ = exp(−∫_{t0}^t div v ◦ ct′ dt′ ) ρ0 ◦ ct0 .   (4.40)

This is the solution for ρ at any point given by x, t. The particles at time t
have been transported from where they were at time t0 . However they may be
more (or less) spread out because they are diverging (or converging)
geometrically, and this will decrease (or increase) the density.
    The object of interest is really the differential form R = ρ vol that
gives the mass in a given region. For each t′ there is a map ct′ from the time
t space to the time t′ space. The solution says that R is the pullback under
ct0 of R0 = ρ0 vol. The exponential factor in ρ comes from pulling back the
volume form. All that happens is that mass is transported to new locations.
Example: Consider an exploding star, at a scale at which the original star is
a point. If the point is the origin, and the explosion was at time zero, then
the particles found at x, t will have travelled from the origin to x in time t
and will then have (constant) velocity v = x/t. If we look at particles at the
same time but at greater or lesser distance, then they will not be the same
particles, so they will have correspondingly greater or lesser velocities.
    The explosion need not be symmetric; the profile at time t > 0 is
ρ = f (x, t). The volume form is dⁿx = dx1 · · · dxn , and so the divergence
of the vector field is div v = n/t. The exponential factor is thus (t0 /t)ⁿ.
(For the exploding star example the dimension of space is n = 3, but it is
just as easy to do the calculation for arbitrary dimension.) For fixed x and t
the solution of the ordinary differential equation dx′ /dt′ = x′ /t′ is
x′ = (x/t)t′ . The space component of the solution curve ct′ is therefore
x ◦ ct′ = (x/t)t′ . So if ρ0 = f (x, t0 ), then ρ0 ◦ ct0 = f ((x/t)t0 , t0 ).
The solution to the partial differential equation is

    ρ = f (x, t) = (t0 /t)ⁿ f ((t0 /t)x, t0 ).   (4.41)

It consists of a profile that is the same for all particles with the same
constant velocity x/t, modified by a factor that says that they are flying
apart.
    For such an explosion the mass form R = ρ dⁿx is the pullback under the
map x ← (t0 /t)x of R0 = f (x, t0 ) dⁿx. This leads to an equivalent solution
formula R = f ((t0 /t)x, t0 ) dⁿ((t0 /t)x) = f ((t0 /t)x, t0 )(t0 /t)ⁿ dⁿx. |
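The solution formula can be checked directly against the continuity equation; here is a symbolic sketch (an addition to the notes), written in one space dimension n = 1 with an arbitrary profile f:

```python
# Sketch (an addition): in one space dimension, check that
# rho = (t0/t) * f(t0*x/t) with v = x/t satisfies the continuity
# equation d(rho)/dt + d(rho*v)/dx = 0 for an arbitrary profile f.
import sympy as sp

x, t, t0 = sp.symbols('x t t0', positive=True)
f = sp.Function('f')

rho = (t0/t)*f(t0*x/t)
residual = sp.diff(rho, t) + sp.diff(rho*x/t, x)
print(sp.simplify(residual))   # 0
```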

4.5 The metric


In geometry one measures distance in a straight line using the Euclidean dis-
tance. In coordinates this is computed by using the theorem of Pythagoras.
However with some calculus one can also measure distance along a curve. It is
even possible to do this in arbitrary coordinate systems. This motivates a very
general notion of squared distance, given by the metric tensor
    g = Σ_{i=1}^n Σ_{j=1}^n gij dxi dxj .   (4.42)

Here the product dxi dxj is not the anti-symmetric exterior product, but instead
is a symmetric tensor product. The metric tensor is determined in a particular
coordinate system by functions gij forming the matrix of coefficients. This
matrix is required to be symmetric and positive definite. The distance along a
regular parameterized curve C is then given by
    distance = ∫_C √( Σ_{i=1}^n Σ_{j=1}^n gij dxi dxj ).   (4.43)

This can be computed via a parametrization of C via


    distance = ∫_{t0}^{t1} √( Σ_{i=1}^n Σ_{j=1}^n gij (dxi /dt)(dxj /dt) ) dt.   (4.44)

Because of the square root the integrals can be nasty. In this expression it
is helpful to think of the dxi /dt as the components of the velocity. The
square root expression is then interpreted as the speed ds/dt. So the
computation of arc length of the curve comes down to integrating ds.
If we have the metric tensor in one set of coordinates, then we have it in any
other set of coordinates. Thus

    g = Σ_{i=1}^n Σ_{j=1}^n gij (Σ_{α=1}^n (∂xi /∂uα ) duα )(Σ_{β=1}^n (∂xj /∂uβ ) duβ ) = Σ_{α=1}^n Σ_{β=1}^n (Σ_{i=1}^n Σ_{j=1}^n gij (∂xi /∂uα )(∂xj /∂uβ )) duα duβ .   (4.45)

So the matrix in the new coordinates is

    gαβ = Σ_{i=1}^n Σ_{j=1}^n (∂xi /∂uα )(∂xj /∂uβ ) gij .   (4.46)

The case gij = δij is the special case when the xi are Cartesian coordinates.
In this case
    g = Σ_{i=1}^n dxi² .   (4.47)

However even if we start with Cartesian coordinates, the coefficients of the
metric tensor take a more complicated form in other coordinate systems. If the
coefficients come from Cartesian coordinates via a change of coordinates, then
the metric is said to be a flat metric.
A familiar example of a flat metric is the metric in three dimensional Eu-
clidean space. Then we have

    g = dx² + dy² + dz² = dr² + r² dθ² + r² sin²(θ) dφ².   (4.48)

The latter is in spherical polar coordinates.


The notion of metric tensor goes at least part way to erasing the distinction
between differential 1-forms and vector fields. It provides a correspondence, one
that is sometimes awkward, but that always exists. Let gij be the coefficients
of the metric tensor for the coordinate system u1 , . . . , un . If
    X = Σ_{k=1}^n ak ∂/∂uk   (4.49)

is a vector field, then the associated differential 1-form is


    gX = Σ_{j=1}^n (Σ_{i=1}^n ai gij ) duj .   (4.50)

One can also go the other direction. If


    ω = Σ_{k=1}^n pk duk   (4.51)

is a differential 1-form, then the associated vector field is


    g⁻¹ω = Σ_{i=1}^n (Σ_{j=1}^n g^{ij} pj ) ∂/∂ui .   (4.52)

Here we are using the perhaps unfamiliar notation that g^{ij} is the inverse
matrix to gij . (This notation is standard in this context.)
Another quantity associated with the metric tensor g is the volume form

    vol = √g du1 · · · dun .   (4.53)

Here g denotes the determinant of the matrix gij . (This notation is also stan-
dard.) The interpretation of this as volume is left to a later section.
There is a very important construction that produces new metrics. Sup-
pose that the n dimensional space has coordinates x1 , . . . , xn , and there is a
k-dimensional regular parametrized surface with coordinates u1 , . . . , uk .
Start with the metric

    g = Σ_{i=1}^n Σ_{j=1}^n gij dxi dxj .   (4.54)

The pullback of the metric to the surface is

    g∗ = Σ_{α=1}^k Σ_{β=1}^k g∗αβ duα duβ .   (4.55)

Here
    g∗αβ = Σ_{i=1}^n Σ_{j=1}^n gij (∂xi /∂uα )(∂xj /∂uβ ).   (4.56)

A simple example is the pullback of the Euclidean metric given above to the
sphere x² + y² + z² = a². The metric pulls back to

    g∗ = a² dθ² + a² sin²(θ) dφ².   (4.57)

This is not a flat metric. Even if one only considers a small open subset of
the sphere, it is still impossible to find coordinates u, v such that g∗ = du² + dv².
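The pullback computation (4.56) can be carried out symbolically. The following sketch (an addition, assuming sympy) recovers the sphere metric as J^T J, where J is the Jacobian of the parametrization and the ambient metric is Euclidean.

```python
# Sketch (an addition): pull the Euclidean metric back to the sphere of
# radius a. With ambient metric g_ij = delta_ij, formula (4.56) reduces
# to G* = J^T J for the Jacobian J of the parametrization.
import sympy as sp

a, theta, phi = sp.symbols('a theta phi', positive=True)
X = sp.Matrix([a*sp.sin(theta)*sp.cos(phi),
               a*sp.sin(theta)*sp.sin(phi),
               a*sp.cos(theta)])
J = X.jacobian([theta, phi])
Gstar = sp.simplify(J.T*J)
print(Gstar)   # the matrix of a^2 dtheta^2 + a^2 sin^2(theta) dphi^2
```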
Remark: The words tensor and tensor field can refer to a number of kinds
of objects. Strictly speaking, a tensor is defined at a particular point, and a
tensor field is a function that assigns to each point a tensor at that point. More
precisely, a tensor of type (p, q) at a point is a real multi-linear function whose
inputs consist of q vectors in the tangent space at the point and p vectors in
the dual space to the tangent space at the point (covectors). When p = 0 and
all the inputs are vectors it is called a covariant tensor. When q = 0 and all
the inputs are covectors it is called a contravariant tensor. When both kinds of
vectors are allowed as inputs, it is called a mixed tensor. A tensor field assigns
(in a smooth way) to each point in a manifold patch a corresponding tensor at

that point. People are often careless and use the word tensor to mean tensor
field.
The most basic tensor fields are scalar fields of type (0,0), vector fields of
type (1,0), and differential 1-forms of type (0,1). There are also more complicated
tensor fields. A differential k-form assigns to each point a real multi-linear
function on k-tuples of tangent vectors at the point, so it is of type (0, k). A
metric tensor field assigns to each point an inner product on tangent vectors
at the point, so it is of type (0, 2). For the more complicated tensors one
can also impose symmetry conditions. Thus one distinguishes between the anti-
symmetric tensor case (differential k-forms) and the symmetric tensor case (the
metric tensor). The metric tensor is required not only to be symmetric, but also
positive definite. The inverse of the metric tensor is a tensor of type (2, 0); it is
also symmetric and positive definite.
The only example in these lectures of a mixed tensor is a (1,1) tensor, that
is, a linear transformation. An example is the linear transformation associated
with a vector field at a zero. This should be distinguished from the quadratic
form associated with a scalar field at a critical point, which is a symmetric
covariant tensor of type (0,2).
The study of tensors at a point is called tensor algebra, while the study of
tensor fields is tensor calculus. The metric tensor field provides a particularly
rich ground to explore. A choice of metric tensor field is the beginning of a
subject called Riemannian geometry. The metric tensor field and related objects
are fundamental to Einstein’s general relativity. |

4.6 Twisted forms


There is a variation on the idea of differential form that comes up in various
contexts, including discussions of volume. The new kind of object is known as
a twisted form or a pseudoform. In the following we consider the notion of
twisted form in top dimension. The main ingredient is that when an integral
involving the twisted form is expressed in terms of a new coordinate system,
the expression for the new integral involves the absolute value of the Jacobian
determinant.
Suppose that P is a k-dimensional manifold patch. Consider a twisted
differential k-form Ω on P. Also consider a Jordan measurable compact set K ⊆ P.
We wish to define the integral of Ω over K. To do this, choose a coordinate
system u1 , . . . , uk . Write
Ω = f (u1 , . . . , uk ) | du1 ∧ · · · ∧ duk |. (4.58)
There is a set B ⊆ R^k such that u(K) = B. The definition is

    ∫_K Ω = ∫_B f (v1 , . . . , vk ) dv1 · · · dvk = I_B (f ),   (4.59)

where I_B (f ) is the usual (unoriented) Riemann integral. (The variables
v1 , . . . , vk are just symbols that are available to define the function f .)

To show that this is well-defined, consider another coordinate system
y1 , . . . , yk . Then u = g(y). Also, B = g(A) for some Jordan measurable set
A. The twisted differential form

    Ω = f (u) | du1 ∧ · · · ∧ duk | = f (g(y)) | det g′(y)| | dy1 ∧ · · · ∧ dyk |   (4.60)

may be expressed in either coordinate system. The definition in the other
coordinate system gives

    ∫_K Ω = ∫_K f (g(y)) | det g′(y)| | dy1 ∧ · · · ∧ dyk | = I_A ((f ◦ g) | det g′|).   (4.61)

But I_B (f ) = I_A ((f ◦ g) | det g′|) by the change of variables theorem. So
the definitions using different coordinate systems are consistent.
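The role of the absolute Jacobian determinant can be seen numerically. This sketch (an addition to the notes) computes the area of the unit disk by integrating |det g′| = r over the parameter rectangle of the polar map.

```python
# Sketch (an addition): I_B(f) = I_A((f o g)|det g'|) for the polar map
# g(r, phi) = (r cos phi, r sin phi), where |det g'| = r. Taking f = 1
# on the unit disk B, the right-hand side is the integral of r over
# A = [0,1] x [0, 2*pi], computed here by the midpoint rule.
import math

n = 400
dr = 1.0/n
dphi = 2*math.pi/n
total = 0.0
for i in range(n):
    r = (i + 0.5)*dr            # midpoint in the r direction
    total += r*dr*dphi*n        # integrand is independent of phi
print(total)                    # approximately pi, the area of the disk
```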
The twisted form concept does not depend on orientation. Furthermore, it makes
sense to say that a twisted form is positive. For this reason, it is natural
to interpret the volume form Vol = √g | du1 ∧ · · · ∧ dun | as a twisted form
instead of as an ordinary form. The same applies in other applications, such
as when the form represents a mass density or a probability density.
The twisted property for the volume may be seen as follows. Suppose that the
matrix for the metric tensor in the u coordinate system is G. Then the matrix
in the y coordinate system is g′(y)^T G g′(y). If the determinant in the u
system is g = det G, then the determinant in the y coordinate system is
g (det g′(y))². The volume may thus be represented in either coordinate
system, with

    Vol = √g | du1 ∧ · · · ∧ dun | = √g | det g′(y)| | dy1 ∧ · · · ∧ dyn |.   (4.62)
This is exactly what one would expect from a twisted form.
It may seem to be a nuisance to have two kinds of differential forms, the usual
ones and the twisted ones. When we are integrating over an oriented region the
distinction is subtle, and it may be easier to deal with usual differential forms.
For example, we can consider the volume form as a usual differential form, but
remember to choose an orientation so that the integral of the volume form
vol = √g dx1 ∧ · · · ∧ dxn over the region is positive. For more information
about twisted forms, see the book of Burke [4].

4.7 The gradient and divergence and the Laplace operator
The gradient of a scalar is the vector field
    ∇s = grad s = g⁻¹ ds.   (4.63)
In coordinates the gradient has the form

    ∇s = grad s = Σ_{j=1}^n (Σ_{k=1}^n g^{jk} ∂s/∂uk ) ∂/∂uj .   (4.64)

The Laplacian of s is the divergence of the gradient. Thus

    ∇²s vol = div grad s vol = d(∇scvol).   (4.65)

In coordinates this is

    ∇²s = (1/√g) Σ_{i=1}^n ∂/∂ui (√g Σ_{k=1}^n g^{ik} ∂s/∂uk ).   (4.66)
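As a sketch of formula (4.66) at work (an addition to the notes, assuming sympy): in polar coordinates √g = r and the inverse metric is diag(1, 1/r²), which yields the familiar polar Laplacian.

```python
# Sketch (an addition): formula (4.66) in polar coordinates, where
# sqrt(g) = r and g^{ik} = diag(1, 1/r**2):
#   laplacian(s) = (1/r) d/dr (r ds/dr) + (1/r**2) d^2 s/dphi^2.
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)

def laplacian(expr):
    return sp.simplify(sp.diff(r*sp.diff(expr, r), r)/r
                       + sp.diff(expr, phi, 2)/r**2)

print(laplacian(sp.log(r)))   # 0: log r is harmonic away from the origin
print(laplacian(r**2))        # 4: agrees with the Laplacian of x**2 + y**2
```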

Theorem 4.2 (Green's first identity) If s and u are scalars defined in the
bounded region Ω, then

    ∫_Ω s ∇²u vol + ∫_Ω ∇s g ∇u vol = ∫_{∂Ω} s ∇ucvol.   (4.67)

Proof: This is just integration by parts. By the product rule we have

    d(s ∇ucvol) = ds ∧ (∇ucvol) + s d(∇ucvol).   (4.68)

However ds ∧ (∇ucvol) = ⟨ds | ∇u⟩ vol = ∇s g ∇u vol. So this is

    d(s ∇ucvol) = ∇s g ∇u vol + s ∇²u vol.   (4.69)

Integrate and use Stokes' theorem. □


This identity is often used in cases when either s vanishes on ∂Ω or ∇u
vanishes on ∂Ω. In that case it says that

    −∫_Ω s ∇²u vol = ∫_Ω ∇s g ∇u vol.   (4.70)

In particular,

    −∫_Ω u ∇²u vol = ∫_Ω ∇u g ∇u vol ≥ 0.   (4.71)

This suggests that in some sense −∇² is a positive operator.


The remaining objects are in three dimensions. The cross product of two
vectors v and w is defined as the unique vector v × w such that

(v × w)cvol = gv ∧ gw. (4.72)

In other words, it is the operation on vectors that corresponds to the exterior


product on forms. The curl of a vector field v is defined by

(curl v)cvol = d(gv). (4.73)

It is easy to see that curl grad f = 0 and that div curl v = 0. In this language
Stokes’s theorem says that
    ∫_S (curl v)cvol = ∫_{∂S} gv.   (4.74)

4.8 Orthogonal coordinates and normalized bases


In many applications it is possible to use orthogonal coordinates to simplify
the calculations. This section presents some of the common formulas for this
case. It is mainly for reference and for comparison with other treatments. All
of the formulas that follow are a consequence of the general theory. It is very
convenient to choose coordinates so that the Riemannian metric is diagonal with
respect to this coordinate system. Such a coordinate system is called a system
of orthogonal coordinates. In this case it has the form

g = h1² du1² + h2² du2² + · · · + hn² dun². (4.75)

Here each coefficient hi is a function of the coordinates u1 , . . . , un .


Consider a manifold with a given Riemannian metric. For instance, it could
be a k dimensional surface in some Euclidean space of larger dimension n. If the
manifold has dimension at most three, then near every point there is always a
new coordinate system that is a system of orthogonal coordinates. In the case of
three dimensions this is not a particularly obvious fact, but it is a consequence
of the Cartan-Kähler theorem. There is a treatment in the book of Bryant and
coauthors [3].
When we have orthogonal coordinates, it is tempting to make the basis
vectors have length one. Thus instead of using the usual coordinate basis vectors
∂/∂ui one uses the normalized basis vectors (1/hi) ∂/∂ui. Similarly, instead of using
the usual coordinate differential forms dui one uses the normalized differentials
hi dui. Then

g( a1 (1/h1) ∂/∂u1 + · · · + an (1/hn) ∂/∂un ) = a1 h1 du1 + · · · + an hn dun. (4.76)

With normalized basis vectors the coefficients of vector fields and the corre-
sponding differential forms are the same. This makes it very easy to confuse
vector fields with differential forms.
In orthogonal coordinates the volume is given in terms of normalized differ-
entials by
vol = h1 du1 ∧ · · · ∧ hn dun . (4.77)
A simple example of orthogonal coordinates is that of polar coordinates r, φ
in the plane. These are related to Cartesian coordinates x, y by

x = r cos(φ) (4.78)
y = r sin(φ) (4.79)

The Riemannian metric is expressed as

g = dr2 + r2 dφ2 . (4.80)



The normalized basis vectors are ∂/∂r and (1/r) ∂/∂φ. The normalized basis forms are
dr and r dφ. The area form is r dr ∧ dφ. Warning: Even though coordinate
forms like dφ are closed forms, a normalized form like r dφ need not be a closed
form. In fact, in this particular case d(r dφ) = dr ∧ dφ ≠ 0.
Another example of orthogonal coordinates is that of spherical polar coordi-
nates r, θ, φ. These are related to Cartesian coordinates x, y, z by

x = r cos(φ) sin(θ) (4.81)


y = r sin(φ) sin(θ) (4.82)
z = r cos(θ) (4.83)

The Riemannian metric is expressed as

g = dr2 + r2 dθ2 + r2 sin2 (θ) dφ2 . (4.84)



The normalized basis vectors are ∂/∂r and (1/r) ∂/∂θ and (1/(r sin(θ))) ∂/∂φ. The normalized
basis forms are dr and r dθ and r sin(θ) dφ. The volume form is r² sin(θ) dr ∧
dθ ∧ dφ.
If f is a scalar field, then its gradient is

∇f = Σ_{i=1}^n (1/hi) (∂f/∂ui) · (1/hi) ∂/∂ui. (4.85)

If u is a vector field, then its divergence ∇ · u is a scalar field. Say that u
has an expression in terms of normalized basis vectors of the form

u = Σ_{i=1}^n ai (1/hi) ∂/∂ui. (4.86)

Then

div u = ∇ · u = (1/(h1 · · · hn)) Σ_{i=1}^n ∂/∂ui ( (h1 · · · hn / hi) ai ). (4.87)
In coordinates the Laplacian has the form

∇²f = (1/(h1 · · · hn)) Σ_{i=1}^n ∂/∂ui ( (h1 · · · hn / hi²) ∂f/∂ui ). (4.88)

For example, in three dimensions with Cartesian coordinates it is

∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z². (4.89)
In spherical polar coordinates it is usually written

∇²f = (1/r²) ∂/∂r ( r² ∂f/∂r ) + (1/(r² sin(θ))) ∂/∂θ ( sin(θ) ∂f/∂θ ) + (1/(r² sin²(θ))) ∂²f/∂φ². (4.90)
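As a sanity check (illustrative, not part of the original notes), the orthogonal-coordinate Laplacian (4.88), specialized to spherical polar coordinates with h_r = 1, h_θ = r, h_φ = r sin(θ), can be compared numerically with the Cartesian Laplacian. The test function below is an arbitrary choice.

```python
# Compare the orthogonal-coordinate Laplacian with scale factors
# (h_r, h_theta, h_phi) = (1, r, r sin(theta)) against the Cartesian
# Laplacian on an arbitrary test function, using finite differences.
import math

h = 1e-4  # finite-difference step

def F_cart(x, y, z):
    return x * x * y + math.exp(0.2 * z) * math.sin(x)

def lap_cart(x, y, z):
    # sum of second central differences in x, y, z
    s = 0.0
    for dx, dy, dz in ((h, 0, 0), (0, h, 0), (0, 0, h)):
        s += (F_cart(x + dx, y + dy, z + dz) - 2 * F_cart(x, y, z)
              + F_cart(x - dx, y - dy, z - dz)) / h**2
    return s

def F_sph(r, th, ph):
    return F_cart(r * math.sin(th) * math.cos(ph),
                  r * math.sin(th) * math.sin(ph),
                  r * math.cos(th))

def lap_sph(r, th, ph):
    # (1/(h1 h2 h3)) sum_i d/du_i ( (h1 h2 h3 / h_i^2) dF/du_i )
    def hprod(r, th):
        return r * r * math.sin(th)   # h1 h2 h3 = r^2 sin(theta)
    def term(i):
        def flux(r, th, ph):
            hi2 = (1.0, r * r, (r * math.sin(th))**2)[i]
            u = [r, th, ph]
            up = list(u); up[i] += h
            um = list(u); um[i] -= h
            dF = (F_sph(*up) - F_sph(*um)) / (2 * h)
            return hprod(r, th) / hi2 * dF
        u = [r, th, ph]
        up = list(u); up[i] += h
        um = list(u); um[i] -= h
        return (flux(*up) - flux(*um)) / (2 * h)
    return (term(0) + term(1) + term(2)) / hprod(r, th)

r, th, ph = 1.3, 0.8, 0.5
x = r * math.sin(th) * math.cos(ph)
y = r * math.sin(th) * math.sin(ph)
z = r * math.cos(th)
Lc = lap_cart(x, y, z)
Ls = lap_sph(r, th, ph)
print(Lc, Ls)   # the two values agree up to discretization error
```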
We conclude by recording the explicit form of the divergence theorem and
Stokes’ theorem in the context of orthogonal coordinates. The first topic is

the divergence theorem in two dimensions. Say that the vector field v has an
expression in terms of normalized basis vectors of the form
v = a (1/hu) ∂/∂u + b (1/hv) ∂/∂v. (4.91)
Recall that the area form is

area = hu hv du dv. (4.92)

Then the corresponding differential 2-form is

vcarea = ahv dv − bhu du. (4.93)

The divergence theorem in two dimensions is obtained by applying Green's
theorem for 1-forms to this particular 1-form. The result is

∫_R (1/(hu hv)) [ ∂/∂u (hv a) + ∂/∂v (hu b) ] hu hv du dv = ∫_∂R (a hv dv − b hu du). (4.94)

The expression in brackets on the left is the divergence of the vector field. On
the right the integrand measures the amount of the vector field crossing normal
to the curve.
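The bracketed divergence can be checked numerically in the familiar case of polar coordinates, where hu = 1 and hv = r. The sketch below (illustrative; the Cartesian components P and Q are arbitrary choices) converts a vector field to the normalized polar basis and compares the two computations of the divergence.

```python
# Check that (1/(h_u h_v)) [ d(h_v a)/du + d(h_u b)/dv ] in polar coordinates
# (h_u = 1, h_v = r) matches the Cartesian divergence dP/dx + dQ/dy.
import math

h = 1e-5  # finite-difference step

def P(x, y):            # arbitrary Cartesian components of the field
    return x * x - y

def Q(x, y):
    return x * y + math.sin(x)

def div_cart(x, y):
    return ((P(x + h, y) - P(x - h, y)) + (Q(x, y + h) - Q(x, y - h))) / (2 * h)

def polar_components(r, phi):
    # components a, b in the normalized basis e_r, e_phi
    x, y = r * math.cos(phi), r * math.sin(phi)
    c, s = math.cos(phi), math.sin(phi)
    return c * P(x, y) + s * Q(x, y), -s * P(x, y) + c * Q(x, y)

def div_polar(r, phi):
    def ra(rr, pp):                      # h_v * a = r * a
        return rr * polar_components(rr, pp)[0]
    def b(rr, pp):                       # h_u * b = b
        return polar_components(rr, pp)[1]
    return ((ra(r + h, phi) - ra(r - h, phi)) / (2 * h)
            + (b(r, phi + h) - b(r, phi - h)) / (2 * h)) / r

r, phi = 1.2, 0.7
x, y = r * math.cos(phi), r * math.sin(phi)
dc = div_cart(x, y)
dp = div_polar(r, phi)
print(dc, dp)   # agree up to discretization error
```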
The next topic is the divergence theorem in three dimensions. Say that the
vector field v has an expression in terms of normalized basis vectors of the form
v = a (1/hu) ∂/∂u + b (1/hv) ∂/∂v + c (1/hw) ∂/∂w. (4.95)
Recall that the volume form is

vol = hu hv hw du dv dw. (4.96)

Then the corresponding differential 2-form is

vcvol = ahv hw dv dw + bhw hu dw du + chu hv du dv. (4.97)

The divergence theorem in three dimensions is obtained by applying Gauss's
theorem for 2-forms to this particular 2-form. The result is

∫_V (1/(hu hv hw)) [ ∂/∂u (hv hw a) + ∂/∂v (hw hu b) + ∂/∂w (hu hv c) ] hu hv hw du dv dw =
∫_∂V (a hv hw dv dw + b hw hu dw du + c hu hv du dv). (4.98)

The expression in brackets is the divergence of the vector field.


The final topic is the classical Stokes’s theorem in three dimensions. Say
that the vector field v has an expression in terms of normalized basis vectors as
above. There is a corresponding differential 1-form

gv = ahu du + bhv dv + chw dw. (4.99)



The classical Stokes's theorem in three dimensions is obtained by applying
Stokes's theorem for 1-forms to this particular 1-form. This gives on the left
hand side

∫_S { (1/(hv hw)) [ ∂(hw c)/∂v − ∂(hv b)/∂w ] hv hw dv dw + (1/(hw hu)) [ ∂(hu a)/∂w − ∂(hw c)/∂u ] hw hu dw du
+ (1/(hu hv)) [ ∂(hv b)/∂u − ∂(hu a)/∂v ] hu hv du dv } (4.100)

and on the right hand side

∫_∂S (a hu du + b hv dv + c hw dw). (4.101)

The terms in square brackets are the components of the curl of the vector field
expressed in terms of normalized basis vectors.

4.9 Linear algebra (the Levi-Civita permutation symbol)
There is a powerful algebraic method to get results in linear and multi-linear
algebra. This is the use of tensor algebra and the Levi-Civita symbol. This
method is usually terrible for numerical calculation, and it is not very useful for
giving geometric insight. The advantage is that it often produces an answer by
a straightforward calculation.
It is convenient to use certain algebraic conventions for manipulating tensor
coefficients. The common practice is to use lower indices for coefficients of
covariant tensors and upper indices for coefficients of contravariant tensors.
Repeated upper and lower indices indicate summation. (This is often called the
Einstein summation convention.) Such a summation is also called a contraction.
Thus if A is a matrix a^i_j and B is a matrix b^h_k, then the product C = AB is the
matrix c^i_k = a^i_j b^j_k obtained by contraction. The trace tr(C) = tr(AB) = tr(BA)
of the matrix is the result of a second contraction, that is, c^i_i = a^i_j b^j_i.
The Levi-Civita permutation symbol may be written in two forms: ε_{j1···jn} =
ε^{j1···jn}. By definition this is equal to 0 if there are repeated indices, equal to
1 if the indices form an even permutation of 1, . . . , n, and equal to −1 if the
indices form an odd permutation of 1, . . . , n.
symbol is that the sums are over all values of the indices; the symbol itself
enforces the permutation condition. This is particularly useful in dealing with
determinants. In this section we state some properties of determinants using
the Levi-Civita symbol. Proofs are indicated in the problems. The following
section gives applications of the Levi-Civita symbol to the formulation of various
expressions for volume and area.
A general definition of determinant is

det(A) = ε^{j1···jn} a^1_{j1} · · · a^n_{jn}. (4.102)



This says that the determinant is a sum of products, each product having coeffi-
cient 0, 1, or −1. There are n^n such products, most of them equal to zero. Each
product with a non-zero coefficient corresponds to picking a distinct element
from each row and multiplying them together. The number of products with
non-zero coefficient is n!, which is still a very large number for computational
purposes.
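For a small matrix the sum (4.102) can be carried out directly. The Python sketch below (illustrative; the matrix A is an arbitrary choice) implements the Levi-Civita symbol as the sign of a permutation and evaluates the determinant by summing over the n! nonzero terms.

```python
# Determinant via the Levi-Civita symbol, as in (4.102):
# det(A) = sum over j1...jn of eps(j1...jn) a_{1 j1} ... a_{n jn}.
from itertools import permutations

def levi_civita(idx):
    # 0 if indices repeat, else the sign of the permutation of (0, ..., n-1)
    idx = list(idx)
    if len(set(idx)) != len(idx):
        return 0
    sign = 1
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            if idx[a] > idx[b]:
                sign = -sign
    return sign

def det(A):
    n = len(A)
    total = 0
    for js in permutations(range(n)):   # only the nonzero terms of the n^n sum
        term = levi_civita(js)
        for row, j in enumerate(js):
            term *= A[row][j]
        total += term
    return total

A = [[2, 1, 0], [1, 3, 4], [0, 5, 6]]
print(det(A))   # -10
```

As the text warns, this is a terrible numerical method (n! terms), but it makes the combinatorics of the definition completely explicit.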
The determinant formula above depends on the fact that the rows are taken
in order 1, . . . , n. If we instead take them in the order i1, . . . , in we get

ε^{j1···jn} a^{i1}_{j1} · · · a^{in}_{jn} = ε^{i1···in} det(A). (4.103)

There is an even more complicated but considerably more symmetric formula
for the determinant:

det(A) = (1/n!) ε_{i1···in} ε^{j1···jn} a^{i1}_{j1} · · · a^{in}_{jn}. (4.104)
This formula leads to particularly straightforward derivations of identities such
as det(A) = det(AT ) and det(AB) = det(A) det(B).
Cramer's rule is a formula for the inverse of a matrix given in terms of
determinants. If a^i_j is a matrix, define its cofactor matrix to be

C^j_k = (1/(n − 1)!) ε_{k i2···in} ε^{j j2···jn} a^{i2}_{j2} · · · a^{in}_{jn}. (4.105)

(This is actually the transpose of the usual matrix of cofactors.) In tensor
algebra language Cramer's rule may be stated as

a^i_j C^j_k = δ^i_k det(A). (4.106)

In the matrix version this says that AC = det(A)I. Cramer’s rule thus has the
succinct statement A−1 = (1/ det(A))C. While Cramer’s rule is quite impracti-
cal for numerical calculation, it does give considerable insight into the structure
of the inverse matrix.
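The tensor-algebra form of Cramer's rule can be verified directly for a small matrix. The Python sketch below (illustrative; the integer matrix A is an arbitrary choice) builds the cofactor matrix from (4.105) and checks the contraction identity (4.106), which in matrix form reads AC = det(A) I.

```python
# Cramer's rule via the Levi-Civita symbol: build C^j_k from (4.105) and
# check a^i_j C^j_k = delta^i_k det(A).
from itertools import permutations, product
from math import factorial

def eps(idx):
    idx = list(idx)
    if len(set(idx)) != len(idx):
        return 0
    s = 1
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            if idx[a] > idx[b]:
                s = -s
    return s

def det(A):
    n = len(A)
    total = 0
    for js in permutations(range(n)):
        term = eps(js)
        for row, j in enumerate(js):
            term *= A[row][j]
        total += term
    return total

def cofactor(A):
    n = len(A)
    C = [[0] * n for _ in range(n)]     # C[k][j] stands for C^j_k
    for k in range(n):
        for j in range(n):
            total = 0
            for iseq in product(range(n), repeat=n - 1):
                for jseq in product(range(n), repeat=n - 1):
                    e = eps((k,) + iseq) * eps((j,) + jseq)
                    if e == 0:
                        continue
                    p = e
                    for i2, j2 in zip(iseq, jseq):
                        p *= A[i2][j2]
                    total += p
            C[k][j] = total // factorial(n - 1)  # sum counts each term (n-1)! times
    return C

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
d = det(A)
C = cofactor(A)
AC = [[sum(A[i][j] * C[k][j] for j in range(3)) for k in range(3)]
      for i in range(3)]
print(d, AC)   # AC equals det(A) times the identity
```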

4.10 Linear algebra (volume and area)


Consider vectors Xα in Rn for α = 1, . . . , n. The convex combinations of these
vectors together with the zero vector form a solid, a kind of n-dimensional par-
allelepiped. There are two formulas for the volume of this solid. The Euclidean
volume is
n−volume(X) = | det X|. (4.107)
This is the absolute value of the determinant of the entries in the vectors. There
is an alternative formula that looks quite different, but is equivalent. This is

n−volume(X) = √( det(X^T X) ). (4.108)

The matrix X T X is the matrix of pairwise Euclidean inner products of the


vectors. Sometimes this is called a Gram matrix.
One can generalize this to the case when there is an inner product on Rn
given by an n by n positive definite matrix G. In that case we have
n−volume_G(X) = √( det(X^T GX) ) = √g | det X|. (4.109)

Here X T GX is the matrix of the pairwise inner products of vectors, a general-


ization of the Gram matrix. Also g = det G is the determinant of G.
Now consider vectors Xα in Rn for α = 1, . . . , k. The convex combinations
of these vectors together with the zero vector form a k-dimensional parallelepiped.
The matrix is no longer square. It is natural to define the k-dimensional area
as

k−area_G(X) = √( det(X^T GX) ). (4.110)
The Gram matrix X T GX is again the matrix of pairwise inner products of the
vectors.
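For k = 2 vectors in R³ with G the identity, this area is the familiar parallelogram area. The Python sketch below (illustrative; v and w are arbitrary choices) compares √det(XᵀX) with the length of the cross product.

```python
# Area of the parallelogram spanned by v, w in R^3: sqrt(det(X^T X))
# versus the Euclidean length of the cross product v x w.
import math

v = (1.0, 2.0, 0.5)
w = (-1.0, 0.5, 2.0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# determinant of the 2 by 2 Gram matrix X^T X of pairwise inner products
gram_det = dot(v, v) * dot(w, w) - dot(v, w) ** 2
area_gram = math.sqrt(gram_det)

cross = (v[1] * w[2] - v[2] * w[1],
         v[2] * w[0] - v[0] * w[2],
         v[0] * w[1] - v[1] * w[0])
area_cross = math.sqrt(dot(cross, cross))
print(area_gram, area_cross)   # equal up to rounding
```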
The following result is a generalization of the Cauchy-Binet theorem.

Theorem 4.3 Let X be an n by k matrix. Let G be an n by n symmetric
matrix. For each sequence i1, . . . , ik of rows of X, let

J^{i1···ik} = ε^{α1···αk} X^{i1}_{α1} · · · X^{ik}_{αk} (4.111)

represent the determinant of the corresponding k by k minor obtained by retain-
ing only those rows. Then

det(X^T GX) = (1/k!) J^{i1···ik} g_{i1 j1} · · · g_{ik jk} J^{j1···jk}. (4.112)
In this formula it is understood that there is summation over repeated indices.

Proof: In the following there are always sums over repeated indices. We
have

det(X^T GX) = (1/k!) ε^{α1···αk} ε^{β1···βk} (X^T GX)_{α1 β1} · · · (X^T GX)_{αk βk}. (4.113)

However

(X^T GX)_{α β} = X^i_α g_{ij} X^j_β. (4.114)

So

det(X^T GX) = (1/k!) ε^{α1···αk} ε^{β1···βk} X^{i1}_{α1} · · · X^{ik}_{αk} g_{i1 j1} · · · g_{ik jk} X^{j1}_{β1} · · · X^{jk}_{βk}. (4.115)

From the definition of J we get the result as stated. □

Corollary 4.4 Let X be an n by k matrix. Let G be an n by n diagonal matrix.
For each sequence i1, . . . , ik of rows of X, let

J^{i1···ik} = ε^{α1···αk} X^{i1}_{α1} · · · X^{ik}_{αk} (4.116)

represent the determinant of the corresponding k by k minor obtained by retain-
ing only those rows. Then

det(X^T GX) = Σ_K g_K (J^K)². (4.117)

Here K = {i1, . . . , ik} is a subset of {1, . . . , n}. We write (J^K)² = (J^{i1···ik})²
and g_K = Π_{i∈K} g_{ii}.

Now consider vectors Xα in Rn for α = 1, . . . , n − 1. This is the special case


of codimension one. In this case, there is a nice simplification of these results.

Theorem 4.5 Let X be an n by n − 1 matrix. Let G be an n by n symmetric
invertible matrix. For each sequence j2, . . . , jn of rows of X, let

J^{j2···jn} = ε^{α1···α_{n−1}} X^{j2}_{α1} · · · X^{jn}_{α_{n−1}} (4.118)

represent the determinant of the corresponding n − 1 by n − 1 minor obtained
by retaining only those rows. Let

ν_j = (1/(n − 1)!) ε_{j j2···jn} J^{j2···jn} (4.119)

be the components of a row vector that represents the determinant of the minor
that does not include row j. Then

det(X^T GX) = g ν G^{−1} ν^T. (4.120)

Here g is the determinant of the matrix G.

Proof: First we need the identity

J^{j2···jn} = ν_j ε^{j j2···jn}. (4.121)

This follows from

ν_j ε^{j j2···jn} = (1/(n − 1)!) ε^{j j2···jn} ε_{j h2···hn} J^{h2···hn} = J^{j2···jn}. (4.122)

The last identity requires some thought. Fix j2, . . . , jn distinct. In the sum
over j the factor ε^{j j2···jn} only contributes when j is the one index distinct
from j2, . . . , jn. The corresponding factor ε_{j h2···hn} then only matters when the
indices h2, . . . , hn are a permutation of the j2, . . . , jn. However, both ε_{j h2···hn}
and J^{h2···hn} are antisymmetric in the indices h2, . . . , hn. It follows that the
product ε_{j h2···hn} J^{h2···hn} is the same for each permutation h2, . . . , hn. When we
sum over these permutations we get (n − 1)! terms all equal to J^{j2···jn}.
We can use the previous theorem to write

det(X^T GX) = (1/(n − 1)!) J^{i2···in} g_{i2 j2} · · · g_{in jn} J^{j2···jn}. (4.123)

The identity shows that this is equal to

det(X^T GX) = (1/(n − 1)!) ν_i ν_j ε^{i i2···in} ε^{j j2···jn} g_{i2 j2} · · · g_{in jn}. (4.124)

However the cofactor C^{ij} of g_{ij} in G is given by a determinant

C^{ij} = (1/(n − 1)!) ε^{i i2···in} ε^{j j2···jn} g_{i2 j2} · · · g_{in jn}. (4.125)
(n − 1)!

Since by Cramer's rule the inverse matrix is g^{ij} = (1/g) C^{ij}, we get

det(X^T GX) = ν_i C^{ij} ν_j = g ν_i g^{ij} ν_j. (4.126)
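A numerical spot-check of Theorem 4.5 for n = 3 (illustrative; the matrices X and G below are arbitrary choices) confirms the identity det(XᵀGX) = g νG⁻¹νᵀ.

```python
# Check det(X^T G X) = g * (nu G^{-1} nu^T) for a 3 by 2 matrix X and a
# symmetric invertible G, where nu_j is (-1)^(j-1) times the 2 by 2 minor
# of X with row j removed, and g = det(G).
X = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
G = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]]

def det2(a, b, c, d):
    return a * d - b * c

def det3(M):
    return (M[0][0] * det2(M[1][1], M[1][2], M[2][1], M[2][2])
            - M[0][1] * det2(M[1][0], M[1][2], M[2][0], M[2][2])
            + M[0][2] * det2(M[1][0], M[1][1], M[2][0], M[2][1]))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

XtGX = matmul(transpose(X), matmul(G, X))
lhs = det2(XtGX[0][0], XtGX[0][1], XtGX[1][0], XtGX[1][1])

nu = [det2(X[1][0], X[1][1], X[2][0], X[2][1]),
      -det2(X[0][0], X[0][1], X[2][0], X[2][1]),
      det2(X[0][0], X[0][1], X[1][0], X[1][1])]

g = det3(G)
Ginv = [[0.0] * 3 for _ in range(3)]    # inverse of G via cofactors
for i in range(3):
    for j in range(3):
        rows = [r for r in range(3) if r != j]
        cols = [c for c in range(3) if c != i]
        m = det2(G[rows[0]][cols[0]], G[rows[0]][cols[1]],
                 G[rows[1]][cols[0]], G[rows[1]][cols[1]])
        Ginv[i][j] = ((-1) ** (i + j)) * m / g

rhs = g * sum(nu[i] * Ginv[i][j] * nu[j] for i in range(3) for j in range(3))
print(lhs, rhs)   # equal up to rounding
```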

Corollary 4.6 Let X be an n by n − 1 matrix. Let G be a diagonal invertible
matrix. For each sequence j2, . . . , jn of rows of X, let

J^{j2···jn} = ε^{α1···α_{n−1}} X^{j2}_{α1} · · · X^{jn}_{α_{n−1}} (4.127)

represent the determinant of the corresponding n − 1 by n − 1 minor obtained
by retaining only those rows. Let

ν_j = (1/(n − 1)!) ε_{j j2···jn} J^{j2···jn} (4.128)

be the components of a row vector that represents the determinant of the minor
that does not include row j. Then

det(X^T GX) = Σ_{i=1}^n (g / g_{ii}) ν_i². (4.129)

This general result is dramatic even in the case n = 3 and G the identity
matrix. In that case it says that the square of the area of a parallelogram is
the sum of the squares of the areas of the three parallelograms obtained by
projecting on the three coordinate planes. This is a remarkable generalization
of the theorem of Pythagoras. [The most common version of this observation
is in the context of the cross product. There are vectors X 1 and X 2 in R3 .
They span a parallelogram with a certain area. The cross product is a vector ν
perpendicular to X 1 and X 2 whose Euclidean length is this area.]
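This "Pythagoras for areas" statement can be verified in a few lines of Python (illustrative; the vectors are arbitrary choices): the squared area of the parallelogram spanned by two vectors in R³ equals the sum of the squared areas of its three coordinate-plane projections.

```python
# Squared area via the Gram determinant versus the sum of squared 2 by 2
# minors (the areas of the projections onto the coordinate planes).
v = (1.0, -2.0, 3.0)
w = (2.0, 0.5, -1.0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

sq_area = dot(v, v) * dot(w, w) - dot(v, w) ** 2   # det(X^T X)

def minor(i, j):
    # determinant of the 2 by 2 minor of the 3 by 2 matrix (v w) on rows i, j
    return v[i] * w[j] - v[j] * w[i]

sq_proj = minor(1, 2) ** 2 + minor(2, 0) ** 2 + minor(0, 1) ** 2
print(sq_area, sq_proj)   # 69.5 69.5
```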

4.11 Surface area


Say that we are in n dimensions with metric

g = Σ_{i=1}^n Σ_{j=1}^n g_{ij} dxi dxj. (4.130)

The n-dimensional volume is given by integrating

vol_n = √g dx1 · · · dxn, (4.131)

where g = det(g_{ij}).
Consider a k-dimensional regular parametrized surface S with parameters
u1 , . . . , uk . This parametrization is one-to-one, so that u1 , . . . , uk may be thought
of as coordinates on the surface. It seems reasonable to compute the k-dimensional
surface area using the pull-back of g to the surface. This is

g* = Σ_{α=1}^k Σ_{β=1}^k g*_{αβ} duα duβ = Σ_{α=1}^k Σ_{β=1}^k ( Σ_{i=1}^n Σ_{j=1}^n g_{ij} (∂xi/∂uα)(∂xj/∂uβ) ) duα duβ. (4.132)

Let X^i_α = ∂xi/∂uα. Then we have a collection of tangent vectors Xα. It follows
that

g*_{αβ} = Xα^T G Xβ. (4.133)

Notice that g*_{αβ} has the form of a Gram matrix. Let g* = det(g*_{αβ}) be the
corresponding Gram determinant. Then the area is given by integrating

area*_k = √(g*) du1 · · · duk. (4.134)

In some sense this is the end of the story. One computes a Gram determinant,
takes the square root, and integrates. Because of the square root the integrals
tend to be quite nasty. In principle, though, we have a nice notion of area and
of integration with respect to area. That is, we have

∫_S h(x) area_k(x) = ∫ h(f(u)) √(g*) du1 · · · duk. (4.135)

This may be computed by any convenient parametrization x ← f(u) of the
surface.
The main complication in discussions of surface area is that there are alter-
native formulas for the Gram determinant. The general formula is hard to work
with, though it is not so bad when the coordinates x1, . . . , xn are orthogonal co-
ordinates. The classic approach is to restrict attention to a hypersurface, when
k = n − 1. In this case it is useful to consider the forms

πi = (−1)^{i−1} dx1 · · · dx_{i−1} dx_{i+1} · · · dxn. (4.136)

These occur in expressions involving the interior product of the vector field
Y = Σ_i a^i ∂/∂xi with the volume form. This interior product is

Y ⌟ vol = √g Σ_{i=1}^n a^i πi. (4.137)

If we pull back πj to the surface, we get a form

πj∗ = νj du1 · · · duk . (4.138)



Here νj is (−1)^{j−1} times the determinant of the n − 1 by n − 1 matrix obtained
by removing the jth row from the n by n − 1 matrix ∂xi/∂uα. So the interior
product pulls back to

(Y ⌟ vol)* = √g Σ_{j=1}^n a^j π*_j = Σ_{j=1}^n a^j νj √g du1 · · · duk. (4.139)

The coefficients νj satisfy for each β the relation

Σ_{j=1}^n (∂xj/∂uβ) νj = 0. (4.140)

This says that the νj are the coefficients of a 1-form that vanishes on the tan-
gent vectors to the surface. In other words, the sum Σ_j a^j νj is giving a numerical
indication of the extent to which the vector field is failing to be tangent, that
is, of the extent to which it is penetrating the surface.
The alternate formula for the area form follows from the linear algebra
treated in the previous section. It says that

area*_{n−1} = √(g*) du1 · · · du_{n−1} = √( Σ_{i=1}^n Σ_{j=1}^n νi g^{ij} νj ) √g du1 · · · du_{n−1}. (4.141)

In this equation g ij is the inverse matrix of the metric matrix gik . The νj is
(−1)j−1 times the determinant of the n−1 by n−1 matrix obtained by removing
the jth row from the n by n − 1 matrix ∂xi /∂uα .
The form coefficients νj define the coefficients of a vector with components
N^i = Σ_{j=1}^n g^{ij} νj. The identity says that this vector is orthogonal to the surface.
This vector of course depends on the parametrization. Sometimes people express
the area in terms of this corresponding vector as

area*_{n−1} = √(g*) du1 · · · du_{n−1} = √( Σ_{i=1}^n Σ_{j=1}^n N^i g_{ij} N^j ) √g du1 · · · du_{n−1}. (4.142)

Somewhat astonishingly, it is common to write the flux in the form

(Y ⌟ vol)* = Σ_{i=1}^n Σ_{j=1}^n a^i g_{ij} N̂^j area*_{n−1}. (4.143)

Here N̂ indicates the normalization of the vector N to have length one. What is
amazing is that if one writes it out, this says

(Y ⌟ vol)* = Σ_{i,j} a^i g_{ij} ( N^j / √( Σ_{i,j} N^i g_{ij} N^j ) ) √( Σ_{i,j} N^i g_{ij} N^j ) √g du1 · · · du_{n−1}. (4.144)

There are two complicated factors involving square roots, one to normalize the
orthogonal vector, the other to calculate the area. These factors cancel. They
never need to be computed in a flux integral.
It may be helpful to summarize the result for hypersurface area in the form
of a theorem. This is stated in a way that makes clear the connection with the
divergence theorem. Recall that the transverse surface element

element = φ₁* vol = φ₁*( √g dx1 · · · dxn ) = √g Σ_{i=1}^n νi dxi du1 · · · du_{n−1} (4.145)

is the interior pullback of the volume. The 1-form part involving the dxi is the
part that was not pulled back. It measures the extent to which a vector field is
transverse to the surface, as in the setting of the divergence theorem. Since this
is a form and not a vector, its norm is computed via the inverse of the metric
tensor.

Theorem 4.7 Consider an n-dimensional manifold patch and a regular param-
eterized n − 1 surface with parameters u1, . . . , u_{n−1}. Consider also the transverse
surface element

element = √g Σ_{i=1}^n νi dxi du1 · · · du_{n−1} (4.146)

that measures the extent to which a vector is transverse to the surface. Then
the area element is the length of the transverse surface element:

area = |element| = √g √( Σ_{i=1}^n Σ_{j=1}^n νi g^{ij} νj ) du1 · · · du_{n−1}. (4.147)

The textbook case is that where n = 3 and k = 2, that is, a surface in


3-space. The most common coordinates are Cartesian coordinates x, y, z for
which the metric is given by dx2 + dy 2 + dz 2 . However we might want some
other set of coordinates, so temporarily we think of x, y, z as some choice of
orthogonal coordinates with metric

g = h_x² dx² + h_y² dy² + h_z² dz². (4.148)

With Cartesian coordinates we simply have hx = hy = hz = 1.


Say that we have a surface parameterized by u, v. Then the metric on this
surface is
g∗ = E du2 + 2F du dv + G dv 2 . (4.149)
Here E, F, G are functions of u, v. They of course depend on the choice of
coordinates. What is required is that E > 0, G > 0 and the determinant
EG − F² > 0. Explicitly, the coefficients are

E = h_x² (∂x/∂u)² + h_y² (∂y/∂u)² + h_z² (∂z/∂u)², (4.150)

and

F = h_x² (∂x/∂u)(∂x/∂v) + h_y² (∂y/∂u)(∂y/∂v) + h_z² (∂z/∂u)(∂z/∂v), (4.151)

and

G = h_x² (∂x/∂v)² + h_y² (∂y/∂v)² + h_z² (∂z/∂v)². (4.152)

The formula for the area of a surface is

A = ∫_S area = ∫_S √(g*) du dv = ∫_S √(EG − F²) du dv. (4.153)

Here g* = EG − F² is the determinant of the metric tensor.


The alternative expression for surface area is sometimes convenient. This is

A = ∫_S area = ∫_S √( h_y² h_z² νx² + h_z² h_x² νy² + h_x² h_y² νz² ) du dv. (4.154)

Here νx = J^{yz}, νy = J^{zx}, and νz = J^{xy}. An expression such as J^{yz} indicates
the determinant of the matrix of partial derivatives of y, z with respect to u, v.
As an example, take the surface given in Cartesian coordinates by z = x² + y²
with z ≤ 1. Use x, y as parameters. Then E = 1 + 4x², F = 4xy, and
G = 1 + 4y². So with these parameters the area form is √(EG − F²) dx dy =
√(1 + 4x² + 4y²) dx dy. This is integrated over the region x² + y² ≤ 1. With the
alternative calculation νx = −2x and νy = −2y and νz = 1. So the area form is
√(νx² + νy² + νz²) dx dy = √(4x² + 4y² + 1) dx dy, exactly the same.

Other parametrizations are possible. Take, for instance, x = r cos(φ), y =
r sin(φ), z = r². Then E = 1 + 4r², F = 0, and G = r². So with these
parameters the area form is √(EG − F²) dr dφ = r √(1 + 4r²) dr dφ. We may as
well go ahead and compute the area. It is 2π times the integral from 0 to 1 of
r √(1 + 4r²) dr. The area is thus (π/6)(5^{3/2} − 1).
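The example can be confirmed numerically (an illustrative sketch, not part of the original notes): integrate the polar-coordinate area form with a midpoint rule and compare with (π/6)(5^{3/2} − 1), and check at a sample point that the two Cartesian area forms agree.

```python
# Numerical check of the paraboloid area and of the agreement of the two
# Cartesian area forms sqrt(EG - F^2) and sqrt(nu_x^2 + nu_y^2 + nu_z^2).
import math

exact = (math.pi / 6) * (5 ** 1.5 - 1)

# area = 2 pi * integral_0^1 r sqrt(1 + 4 r^2) dr, midpoint rule
N = 20000
total = 0.0
for i in range(N):
    r = (i + 0.5) / N
    total += r * math.sqrt(1 + 4 * r * r) / N
area_polar = 2 * math.pi * total

# pointwise agreement of the two area forms at an arbitrary sample point
x, y = 0.3, -0.4
E, F, G = 1 + 4 * x * x, 4 * x * y, 1 + 4 * y * y
form1 = math.sqrt(E * G - F * F)
form2 = math.sqrt((2 * x) ** 2 + (2 * y) ** 2 + 1)   # nu = (-2x, -2y, 1)
print(area_polar, exact, form1, form2)
```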

Problems 10: The divergence theorem


1. Let r² = x² + y² + z², and let

v = (1/r³) ( x ∂/∂x + y ∂/∂y + z ∂/∂z ). (4.155)

Let vol = dx dy dz. Show that the solid angle form

σ = v ⌟ vol = (1/r³) (x dy dz + y dz dx + z dx dy). (4.156)

2. In the preceding problem, show directly that dσ = 0 away from r = 0.



3. Find σ in spherical polar coordinates. Hint: This can be done by blind
computation, but there is a better way. Express v in spherical polar
coordinates, using Euler's theorem

r ∂/∂r = x ∂/∂x + y ∂/∂y + z ∂/∂z. (4.157)

Then use vol = r² sin(θ) dr dθ dφ to calculate σ = v ⌟ vol.
4. In the preceding problem, show that dσ = 0 away from r = 0 by a spherical
polar coordinate calculation.
5. Let S be the sphere of radius a > 0 centered at the origin. Calculate the
integral of σ over S.
6. Let Q be the six-sided cube with side lengths 2L centered at the origin.
Calculate the integral of σ over Q. Prove that your answer is correct.
Hint: Given the result of the previous problem, this should be effortless.

Recitation 10
The setting for these problems is Euclidean space Rn with n ≥ 3 with Cartesian
coordinates x1, . . . , xn. We write r² = x1² + · · · + xn² and vol = dx1 · · · dxn. The
gradient of u is the vector

∇u = grad u = Σ_{i=1}^n (∂u/∂xi) ∂/∂xi. (4.158)

Then

∇u ⌟ vol = Σ_{i=1}^n (−1)^{i−1} (∂u/∂xi) dx1 · · · dx_{i−1} dx_{i+1} · · · dxn. (4.159)

In Cartesian coordinates the Laplacian ∇²u is defined by

∇²u vol = div grad u vol = d(∇u ⌟ vol) = Σ_{i=1}^n (∂²u/∂xi²) vol. (4.160)

Often ∇² = Σ_{i=1}^n ∂²/∂xi² is called the Laplace operator.

1. Define the Euler operator E = ∇(½ r²). Show that

E = Σ_{i=1}^n xi ∂/∂xi. (4.161)

2. Define the form ω = E ⌟ vol. Show that

ω = Σ_{i=1}^n (−1)^{i−1} xi dx1 · · · dx_{i−1} dx_{i+1} · · · dxn. (4.162)

3. Define the solid angle form σ = (1/rⁿ) ω. Show that

r^{n−1} dr σ = (1/r) dr ω = vol. (4.163)

4. Show that dω = n vol.


5. Let Ba be the ball of radius a with volume α(n)aⁿ. Let Sa be the sphere of
radius a with area nα(n)a^{n−1}. (For instance α(3) = (4/3)π occurs in the
volume formula for n = 3, and hence 3α(3) = 4π is in the area formula.)
Show that

∫_{Sa} ω = nα(n)aⁿ. (4.164)

Hint: Apply the divergence theorem.


6. For r ≠ 0 define the fundamental solution by

φ(x) = (1/(nα(n))) (1/(n − 2)) (1/r^{n−2}). (4.165)

(For n = 3 this is φ(x) = (1/(4π)) (1/r).) Show that

−∇φ(x) = (1/(nα(n))) (1/rⁿ) E. (4.166)

7. Show that

−∇φ(x) ⌟ vol = (1/(nα(n))) (1/rⁿ) ω. (4.167)

8. Show that this is a closed form, and hence ∇²φ(x) = 0 away from r = 0.
9. Show that for every a > 0 the flux of the fundamental solution is

∫_{Sa} −∇φ(x) ⌟ vol = 1. (4.168)

Problems 11: The Laplacian


The setting for these problems is Euclidean space Rn with n ≥ 3 with Cartesian
coordinates x1, . . . , xn. We write r² = x1² + · · · + xn² and vol = dx1 · · · dxn. The
gradient of u is the vector

∇u = grad u = Σ_{i=1}^n (∂u/∂xi) ∂/∂xi. (4.169)

Then

∇u ⌟ vol = Σ_{i=1}^n (−1)^{i−1} (∂u/∂xi) dx1 · · · dx_{i−1} dx_{i+1} · · · dxn. (4.170)

In Cartesian coordinates the Laplacian ∇²u is defined by

∇²u vol = div grad u vol = d(∇u ⌟ vol) = Σ_{i=1}^n (∂²u/∂xi²) vol. (4.171)

Often ∇² = Σ_{i=1}^n ∂²/∂xi² is called the Cartesian coordinate Laplace operator.
The fundamental solution of the Laplace equation ∇²u = 0 is

φ(x) = (1/(nα(n))) (1/(n − 2)) (1/r^{n−2}). (4.172)

The goal is to establish the following amazing identity: The fundamental solu-
tion satisfies

−∇²φ(x) = δ(x). (4.173)

This shows that the behavior of the fundamental solution at r = 0 can also
be understood. It turns out that this is the key to solving various problems
involving the Laplace operator. The following problems are intended to give at
least some intuitive feel for how such an identity can come about.
The method is to get an approximation φ_ε(x) whose Laplacian is an approx-
imate delta function δ_ε(x). To this end, define r_ε = √(r² + ε²) and let

φ_ε(x) = (1/(nα(n))) (1/(n − 2)) (1/r_ε^{n−2}). (4.174)

1. Prove that d(r_ε²) = d(r²) and hence dr_ε/dr = r/r_ε.

2. (a) Prove that

−∇φ_ε(x) = (1/(nα(n))) (1/r_εⁿ) E. (4.175)

(b) Prove that

−∇φ_ε(x) ⌟ vol = (1/(nα(n))) (1/r_εⁿ) ω. (4.176)

3. Show that

−∇²φ_ε(x) = δ_ε(x), (4.177)

where δ_ε(x) is a constant times a power of ε times an inverse power of r_ε.
Find the explicit form of δ_ε(x).

4. Show that

δ_ε(x) = (1/εⁿ) δ₁(x/ε). (4.178)

What is the explicit form for the function δ₁(x)?

5. To show that this is an approximate delta function, we need to show that
∫ δ_ε(x) vol = 1. For each a compute ∫_{Ba} δ_ε(x) vol as an explicit function
of a and ε and n. Hint: Use the divergence theorem.

6. Show that for fixed ε > 0 we have

∫_{Ba} δ_ε(x) vol → 1 (4.179)

as a → ∞.
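A numerical illustration of Problems 5 and 6 (a sanity check, not a solution) for n = 3: here we take for granted the n = 3 expression δ_ε(x) = (3/(4π)) ε²/r_ε⁵, which is the form one obtains in Problem 3, and observe that its integral over the ball B_a increases toward 1 as a grows.

```python
# Integrate delta_eps over the ball B_a for n = 3 using spherical shells.
# Here delta_eps(x) = (3/(4 pi)) eps^2 / r_eps^5 with r_eps = sqrt(r^2 + eps^2)
# (the n = 3 case); a midpoint rule in the radial variable is used.
import math

def ball_integral(a, eps, N=100000):
    total = 0.0
    for i in range(N):
        r = (i + 0.5) * a / N
        reps = math.sqrt(r * r + eps * eps)
        density = (3 / (4 * math.pi)) * eps * eps / reps**5
        total += density * 4 * math.pi * r * r * (a / N)   # shell element
    return total

eps = 0.05
vals = [ball_integral(a, eps) for a in (0.5, 1.0, 5.0)]
print(vals)   # increasing toward 1
```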

Recitation 11
1. In the following we use tensor algebra notation where repeated upper and
lower indices indicate summation. Prove the following identity for the
Levi-Civita permutation symbols:

ε^{j1···jn} ε_{j1···jn} = n!. (4.180)

2. Recall that

det(A) = ε^{j1···jn} a^1_{j1} · · · a^n_{jn}. (4.181)

Show that

ε^{i1···in} det(A) = ε^{j1···jn} a^{i1}_{j1} · · · a^{in}_{jn}. (4.182)

3. Show that

det(A) = (1/n!) ε_{i1···in} ε^{j1···jn} a^{i1}_{j1} · · · a^{in}_{jn}. (4.183)
4. Show that det(AB) = det(A) det(B). Hint: Use the preceding identity for
AB.
5. The next three problems are relevant to Cramer's rule. Show that

ε^{j j2···jn} a^i_j a^{i2}_{j2} · · · a^{in}_{jn} = ε^{i i2···in} det(A). (4.184)

6. Show that

(1/(n − 1)!) ε_{k i2···in} ε^{i i2···in} = δ^i_k. (4.185)

7. If a^i_j is a matrix, define its cofactor matrix to be

C^j_k = (1/(n − 1)!) ε_{k i2···in} ε^{j j2···jn} a^{i2}_{j2} · · · a^{in}_{jn}. (4.186)

(This is actually the transpose of the usual matrix of cofactors.) Prove
Cramer's rule

a^i_j C^j_k = δ^i_k det(A). (4.187)

8. Let X1, . . . , Xn be n vectors in Rn forming a matrix X. The Euclidean
volume spanned by these vectors is | det X|. Define the Gram matrix to
be the matrix of inner products A = X^T X. Show that the volume is given
also by

√( det(X^T X) ) = | det X|. (4.188)

9. Let n = 2 in the above result. Let θ be the angle between the two vectors.
Compute the area in terms of this angle and the lengths of the vectors.
10. Let X1, . . . , Xn be n vectors in Rn forming a matrix X. Let G be a
symmetric matrix with positive eigenvalues. Define the volume relative to
G to be √( det(X^T GX) ). Find a formula for this volume in terms of G and
| det(X)|.
11. Let X be an n by k matrix, representing k vectors X^i_α, where i = 1, . . . , n
and α = 1, . . . , k. Let G be an n by n symmetric matrix. We can define
the Gram matrix as the k by k matrix X^T X, or more generally as the k
by k matrix X^T GX. For each sequence i1, . . . , ik of rows of X,

J^{i1···ik} = ε^{α1···αk} X^{i1}_{α1} · · · X^{ik}_{αk} (4.189)

represents the determinant of the corresponding k by k minor obtained by
retaining only those rows. Show

det(X^T GX) = (1/k!) J^{i1···ik} g_{i1 j1} · · · g_{ik jk} J^{j1···jk}. (4.190)

In this formula it is understood that there is summation over repeated
indices.
12. Take G = I (with matrix δ_{ij}) in the above formula, and simplify the result.
This gives a formula for the area √( det(X^T X) ) in terms of areas of projec-
tions. This is a remarkable generalization of the theorem of Pythagoras.
13. Describe what this simplified formula says in the case n = 3 and k = 2.

Problems 12: Surface area


1. Consider an n − 1 surface given implicitly by w = g(x1, . . . , xn) = C and
parametrically by xi = fi(u1, . . . , u_{n−1}). For α = 1, . . . , n − 1 there are n − 1
tangent vectors to the surface with components ∂xi/∂uα. Prove that dw
on the surface is zero on the tangent space, that is, prove that for each β
we have

Σ_{j=1}^n (∂w/∂xj) (∂xj/∂uβ) = 0. (4.191)

2. In the same situation, show that the form on the surface with components
νj equal to (−1)^{j−1} times the determinant of ∂xi/∂uα with row j
deleted satisfies

Σ_{j=1}^n (∂xj/∂uβ) νj = 0. (4.192)
Since it is also zero on the tangent space, it must be a multiple of dw on
the surface. Hint: Consider the matrix with first column ∂xi /∂uβ and
remaining columns ∂xi /∂uα for α = 1, . . . , n − 1. Here β is an arbitrary
choice of one of the α indices.

3. Consider the surface given by x = uv, y = u+v, z = u−v. Find an implicit


equation for this surface. Verify that the two forms of the previous two
problems are multiples of each other.
4. For the same surface, the metric dx2 + dy 2 + dz 2 has a pullback given by
E du2 + 2F du dv + G dv 2 . Calculate it.
5. For the same surface, calculate the area for parameter region u² + v² ≤ 1
by integrating √(EG − F²) du dv over the region.
6. For the same surface, the forms dy dz and dz dx = − dx dz and dx dy
have pullbacks νx du dv and νy du dv and νz du dv. Calculate them.
7. For the same surface, calculate the area for parameter region u² + v² ≤ 1
by integrating √(νx² + νy² + νz²) du dv over the region.
Chapter 5

Measure Zero


5.1 Outer content and outer measure


The Riemann integral is good for many purposes, but the Lebesgue integral is
both more general and easier to work with. The problem is that it takes time
and effort to develop the Lebesgue integral and to appreciate its qualities. In
this part we take only a small step beyond the Riemann integral. We contrast
sets of content zero (a Riemann integral topic) with sets of measure zero (a
Lebesgue integral topic). Every set of content zero is of measure zero. However
there are sets of measure zero that are not of content zero. Sets of measure zero
are a key concept in analysis.
A high point for this course is the theorem due to Lebesgue that characterizes
those bounded functions (on a bounded cell) that have a Riemann integral. The
astonishing result is that they are exactly the functions that are continuous
except on a set of measure zero. If we combine this with the fact that smooth
change of variable functions take sets of measure zero to measure zero, we get
a deeper insight into change of variables for the Riemann integral.
Another striking result is Sard’s theorem, which concerns a smooth change
of variables function having critical points at which the Jacobian determinant
is zero and the inverse function theorem fails. The image of the set of critical
points is the set of critical values. Even though the set of critical points may be
quite large, the theorem says that the corresponding set of critical values has
measure zero.
Recall that a cell I ⊆ Rn is a product of intervals. The n-dimensional volume
of a bounded cell I is denoted m(I). A cell is non-degenerate if each interval
is non-degenerate. Suppose A ⊆ Rn . The fundamental notion in the following
is that of cell cover. SA cell cover of A is a family I of bounded non-degenerate
cells such that A ⊆ I∈I I.
The outer content of A is defined by
\[ \bar m(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ finite},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.1} \]
It follows that A ⊆ B implies m̄(A) ≤ m̄(B). A set A with m̄(A) = 0 is said to have content zero.
The outer measure of A is defined by
\[ \bar\mu(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ countable},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.2} \]
It follows that A ⊆ B implies μ̄(A) ≤ μ̄(B). A set A with μ̄(A) = 0 is said to have measure zero.

Proposition 5.1 For every A ⊆ Rn we have
\[ \bar\mu(A) \le \bar m(A). \tag{5.3} \]

Theorem 5.2 (Finite sub-additivity of outer content) If A is a finite family of subsets of Rn, then
\[ \bar m\Bigl(\bigcup_{A\in\mathcal A} A\Bigr) \le \sum_{A\in\mathcal A} \bar m(A). \tag{5.4} \]

Theorem 5.3 (Countable sub-additivity of outer measure) If A is a countable family of subsets of Rn, then
\[ \bar\mu\Bigl(\bigcup_{A\in\mathcal A} A\Bigr) \le \sum_{A\in\mathcal A} \bar\mu(A). \tag{5.5} \]

Since this theorem is the key to the entire subject, it is worth recording the proof. Let ε > 0. Since A is countable, we may enumerate its elements A_n. By the fact that μ̄(A_n) is defined as a greatest lower bound of sums, there is a cover I_{nk} of A_n with Σ_k m(I_{nk}) < μ̄(A_n) + ε/2^n. Then ∪_n A_n is covered by all the I_{nk}. Furthermore, Σ_{nk} m(I_{nk}) < Σ_n μ̄(A_n) + ε. Since μ̄(∪_n A_n) is the greatest lower bound of such sums, we have μ̄(∪_n A_n) < Σ_n μ̄(A_n) + ε. Since ε > 0 is arbitrary, we must have μ̄(∪_n A_n) ≤ Σ_n μ̄(A_n).
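The ε/2^n bookkeeping in this proof can be made concrete. The sketch below is our own illustration (the function names are ours): it covers the first N rationals of an enumeration by intervals whose total length stays below ε no matter how large N is, which is exactly why a countable set has measure zero.

```python
from fractions import Fraction

def enumerate_rationals(n_terms):
    # First n_terms of an enumeration of the rationals in [0, 1].
    out, q = [], 1
    while len(out) < n_terms:
        for p in range(q + 1):
            f = Fraction(p, q)
            if f not in out:
                out.append(f)
                if len(out) == n_terms:
                    return out
        q += 1
    return out

def cover_total_length(points, eps):
    # Cover the k-th point by an interval of length eps / 2**(k+1).
    # The total is sum_k eps / 2**(k+1) < eps, independent of len(points).
    return sum(eps / 2 ** (k + 1) for k in range(len(points)))
```

Covering 10 points or 200 points makes no difference: the total length of the cover is below ε in both cases.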
Proposition 5.4 Suppose that A, A′ are subsets with m̄(A \ A′) = 0 and also m̄(A′ \ A) = 0. Then
\[ \bar m(A\cap A') = \bar m(A) = \bar m(A') = \bar m(A\cup A'). \tag{5.6} \]

Proposition 5.5 Suppose that A, A′ are subsets with μ̄(A \ A′) = 0 and also μ̄(A′ \ A) = 0. Then
\[ \bar\mu(A\cap A') = \bar\mu(A) = \bar\mu(A') = \bar\mu(A\cup A'). \tag{5.7} \]
Proposition 5.6 The outer content of A may be defined by finite open cell covers:
\[ \bar m(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ finite open},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.8} \]

Proposition 5.7 The outer content of A may be defined by finite closed cell covers:
\[ \bar m(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ finite closed},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.9} \]

Proposition 5.8 The outer content of A may be defined by finite closed non-overlapping cell covers:
\[ \bar m(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ finite closed non-overlapping},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.10} \]

Proposition 5.9 The outer measure of A may be defined by countable open cell covers:
\[ \bar\mu(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ countable open},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.11} \]

Proposition 5.10 The outer measure of A may be defined by countable closed cell covers:
\[ \bar\mu(A) = \inf\Bigl\{ \sum_{I\in\mathcal I} m(I) \;\Big|\; \mathcal I \text{ countable closed},\ A \subseteq \bigcup_{I\in\mathcal I} I \Bigr\}. \tag{5.12} \]

Theorem 5.11 Suppose A ⊆ Rn is compact. Then
\[ \bar\mu(A) = \bar m(A). \tag{5.13} \]
This statement is not quite as elementary as it might look, since a compact set need not be Jordan measurable. It therefore deserves a proof. Say that A is compact. We need to show that m̄(A) ≤ μ̄(A). Let ε > 0. Since μ̄(A) is defined as a greatest lower bound of sums, there is a countable cover I_k of A such that Σ_{k=1}^∞ m(I_k) < μ̄(A) + ε; by Proposition 5.9 the cells I_k may be taken to be open. Since A is compact, there is a finite subcover, say by the first p cells. Then m̄(A) ≤ Σ_{k=1}^p m(I_k) ≤ Σ_{k=1}^∞ m(I_k) < μ̄(A) + ε. Since for every ε > 0 we have m̄(A) < μ̄(A) + ε, we must have m̄(A) ≤ μ̄(A). □

5.2 The set of discontinuity of a function


The purpose of this section is to establish that the set of discontinuity of a
function is the countable union of a certain family of closed subsets.
If f is a real function on C ⊆ Rn , and A ⊆ C, define the oscillation of f on
A by
\[ \operatorname{osc}_A(f) = \sup_{z,w\in A} |f(z)-f(w)| = \sup_{z\in A} f(z) - \inf_{w\in A} f(w). \tag{5.14} \]

Define the oscillation of f at x by

oscx (f ) = inf{oscU (f ) | x ∈ int(U )}. (5.15)

Proposition 5.12 The function f is continuous at x if and only if oscx (f ) = 0.

In the following we write Disc(f ) for the set of points at which f is not contin-
uous.

Proposition 5.13
Disc(f ) = {x | oscx (f ) > 0}. (5.16)
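As a numerical illustration (ours, with hypothetical helper names), the oscillation at a point can be approximated by sampling f on shrinking neighborhoods; for a jump function it converges to the jump size at the discontinuity and to 0 elsewhere, in agreement with Proposition 5.12.

```python
def osc_on_interval(f, a, b, samples=201):
    # crude estimate of sup - inf of f on [a, b] by sampling
    vals = [f(a + (b - a) * i / (samples - 1)) for i in range(samples)]
    return max(vals) - min(vals)

def osc_at_point(f, x, r0=1.0, levels=8):
    # approximate osc_x(f): oscillations over neighborhoods (x - r, x + r)
    # decrease as r shrinks, so keep the smallest value seen
    est = osc_on_interval(f, x - r0, x + r0)
    r = r0
    for _ in range(levels):
        r /= 10.0
        est = min(est, osc_on_interval(f, x - r, x + r))
    return est

step = lambda t: 1.0 if t >= 0.0 else 0.0
```

At the jump, osc_at_point(step, 0.0) equals 1; away from it the estimate is 0, so the point of discontinuity is detected by a positive oscillation.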

Next we want to make the notion of discontinuity quantitative. For each f and each ε > 0 define the set D_ε(f) by the condition
\[ D_\epsilon(f) = \{x \mid \operatorname{osc}_x(f) \ge \epsilon\}. \tag{5.17} \]

Proposition 5.14 The point x is in D_ε(f) if and only if for every U with x ∈ int(U) we have osc_U(f) ≥ ε.

Proposition 5.15 For every ε > 0 the set D_ε(f) ⊆ Disc(f). Furthermore, as ε gets smaller the sets D_ε(f) can only get larger.

Proposition 5.16 For each ε > 0 the set D_ε(f) is a closed subset of Rn.

Proposition 5.17 A point is a point of discontinuity if and only if it belongs to D_ε(f) for some ε > 0. Thus
\[ \mathrm{Disc}(f) = \bigcup_{\epsilon>0} D_\epsilon(f). \tag{5.18} \]

Proposition 5.18 The set of points of discontinuity is a countable union of closed subsets:
\[ \mathrm{Disc}(f) = \bigcup_{n=1}^{\infty} D_{1/n}(f). \tag{5.19} \]

Corollary 5.19 Let f be defined on a bounded set. The set Disc(f) has measure zero if and only if for each ε > 0 the set D_ε(f) has content zero.

5.3 Lebesgue's theorem on Riemann integrability
Theorem 5.20 (Lebesgue (1907)) Let C ⊂ Rn be a bounded closed non-
degenerate cell. Let f be a real function on C. Then f is Riemann integrable if
and only if f is bounded and the set of discontinuities has measure zero, that is,

µ̄(Disc(f )) = 0. (5.20)

The theorem relies on two lemmas. The first one shows that Riemann in-
tegrability implies the set of discontinuities has measure zero. This part relies
heavily on the countable sub-additivity of outer measure. The other one shows
that a bounded function with a set of discontinuities of measure zero is Rie-
mann integrable. Here the remarkable thing is that measure zero is a weaker
requirement than having content zero.
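Before the lemmas, a numerical sanity check (our sketch; the names are ours): for a function whose only discontinuity is a single point, which is a set of measure zero, the gap between upper and lower Darboux sums over uniform partitions goes to 0, consistent with the theorem.

```python
def darboux_gap(f, a, b, n, samples=50):
    # U(f, P) - L(f, P) for the uniform partition of [a, b] into n cells,
    # with sup and inf on each cell estimated by sampling
    h = (b - a) / n
    gap = 0.0
    for i in range(n):
        lo = a + i * h
        vals = [f(lo + h * j / samples) for j in range(samples + 1)]
        gap += (max(vals) - min(vals)) * h
    return gap

step = lambda t: 1.0 if t < 0.5 else 0.0  # discontinuous only at t = 0.5
```

Only the cell containing 0.5 contributes, so the gap is about 1/n and tends to 0: the function is Riemann integrable even though it is not continuous.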

Lemma 5.21 For every ε > 0 and every partition P of C we have
\[ \epsilon\,\bar m(D_\epsilon) \le U(f,\mathcal P) - L(f,\mathcal P). \tag{5.21} \]

The way the lemma is used is to note that U(f) − L(f) is the greatest lower bound of the numbers U(f,P) − L(f,P). The lemma says that ε m̄(D_ε) is a lower bound, so ε m̄(D_ε) ≤ U(f) − L(f). If U(f) − L(f) = 0, then m̄(D_ε) = 0, and hence μ̄(D_ε) = 0. By countable sub-additivity μ̄(Disc(f)) = 0.
Proof: Let P be a partition of C into closed bounded non-degenerate cells. Let P′ be the subset of P consisting of cells I with int(I) ∩ D_ε ≠ ∅. Then for I in P′ we have osc_I(f) ≥ ε. Now let B be the union of the boundary points of the cells in P, and let D′_ε = D_ε \ B. If x is in D′_ε, then x is in D_ε(f), and hence x is in some I in P. Since x is not in B, it must be in int(I). Hence I is in P′. This shows that P′ covers D′_ε. So we have
\[ \epsilon\,\bar m(D'_\epsilon) \le \sum_{I\in\mathcal P'} \epsilon\, m(I) \le \sum_{I\in\mathcal P'} \operatorname{osc}_I(f)\, m(I) = U(f,\mathcal P') - L(f,\mathcal P') \le U(f,\mathcal P) - L(f,\mathcal P). \tag{5.22} \]
Now D_ε \ D′_ε ⊆ B, so m̄(D_ε \ D′_ε) ≤ m̄(B) = 0. This implies that m̄(D_ε) = m̄(D′_ε). □

Lemma 5.22 For every ε > 0 and every κ > 0 there exists a partition P such that
\[ U(f,\mathcal P) - L(f,\mathcal P) \le \operatorname{osc}_C(f)\,\bigl(\bar m(D_\epsilon) + \kappa\bigr) + \epsilon\,\bar m(C). \tag{5.23} \]

The way the lemma is used is to note that for every ε > 0 and every κ > 0 we have U(f) − L(f) ≤ osc_C(f)(m̄(D_ε) + κ) + ε m̄(C). Since κ > 0 is arbitrary we get U(f) − L(f) ≤ osc_C(f) m̄(D_ε) + ε m̄(C). Now suppose that μ̄(Disc(f)) = 0. Then for each ε > 0 we have μ̄(D_ε) = 0. But since D_ε is closed and bounded, hence compact, this says that m̄(D_ε) = 0. So the right hand side is ε m̄(C). Since ε > 0 is arbitrary, this implies that U(f) − L(f) = 0.
Proof: The first term comes from the points with large oscillation, and the second term comes from the points with small oscillation. To deal with the first term, consider κ > 0. From the definition of m̄(D_ε) there is a finite closed cover I of D_ε such that
\[ \sum_{I\in\mathcal I} m(I) < \bar m(D_\epsilon) + \kappa. \tag{5.24} \]
One can thicken the cells in I so that each of them is open and the same estimate is satisfied. The union of the cells in the new I is open, and D_ε is a subset of this open set. One can then take the closures of the cells and get a finite closed cell cover satisfying this same estimate. By removing overlaps we can get a non-overlapping finite closed cell family P′ such that every cell is a subset of C. Let A′ be the union of the cells in P′. Then D_ε is in the interior of the closed set A′ ⊆ C. Furthermore,
\[ \bar m(A') = \sum_{I\in\mathcal P'} m(I) < \bar m(D_\epsilon) + \kappa. \tag{5.25} \]

Let A″ be the closure of C \ A′. It is a compact set whose intersection with D_ε is empty. Thus for every point x in A″ there is a closed bounded cell I with x in its interior and such that osc_I(f) < ε. The interiors of these cells form an open cover of A″. Hence there is a finite subcover I″. The cells I ∩ A″ may be subdivided into subcells in such a way as to form a partition P″ of A″. For each I in P″ we have osc_I(f) < ε.
Now take the partition P = P′ ∪ P″ of C. Then
\[ U(f,\mathcal P) - L(f,\mathcal P) = \sum_{I\in\mathcal P'} \operatorname{osc}_I(f)\, m(I) + \sum_{I\in\mathcal P''} \operatorname{osc}_I(f)\, m(I). \tag{5.26} \]

Now
\[ \sum_{I\in\mathcal P'} \operatorname{osc}_I(f)\, m(I) \le \operatorname{osc}_C(f) \sum_{I\in\mathcal P'} m(I) \le \operatorname{osc}_C(f)\,\bigl(\bar m(D_\epsilon) + \kappa\bigr). \tag{5.27} \]
Also
\[ \sum_{I\in\mathcal P''} \operatorname{osc}_I(f)\, m(I) \le \epsilon \sum_{I\in\mathcal P''} m(I) = \epsilon\,\bar m(A'') \le \epsilon\,\bar m(C). \tag{5.28} \]
This gives the result. □
This treatment relies heavily on unpublished notes on the Riemann integral
by Mariusz Wodzicki [20].
The Lebesgue theorem has implications for functions that are restricted to
complicated subsets. Let 1A be the indicator function of A, equal to 1 on A and
0 on its complement. Then the discontinuity set of the indicator function 1A is
the boundary of A, that is, Disc(1A ) = bdy(A).
If f is a bounded function defined on a bounded closed non-degenerate cell C,
and A ⊆ C, then the Riemann integral of f over A is defined to be the integral
of f 1A , when that Riemann integral exists. (This may also be taken to be the
definition of a function that is defined only on A.) In particular, if f is a bounded
function on C, and if A ⊆ C, then the Riemann integral of f over A exists if and
only if Disc(f 1A ) has measure zero. However Disc(f 1A ) ⊆ Disc(f ) ∪ bdy(A).
So if both Disc(f ) and bdy(A) have measure zero, then f is integrable over A.
All this applies to the case when f = 1 on the bounded cell C. Then 1A
is discontinuous precisely on bdy(A). So 1A is integrable if and only if bdy(A)
has measure zero. This is precisely the situation when A is Jordan measurable.
The integral of 1A is then the content m(A) of A.
While the outer content of A is defined for arbitrary subsets, the content
of A is only defined when A is Jordan measurable. When a set A is Jordan
measurable, its content m(A) is the same as the outer content m̄(A). (And in
this case this is the same as the outer measure µ̄(A).) The point of restricting
to Jordan measurable sets is that the content on Jordan measurable sets is
additive, while the outer content on arbitrary sets is only subadditive. (The
outer measure on arbitrary sets is also only subadditive, but in this case it takes
some effort to find examples where additivity fails.)

5.4 Almost everywhere


Often we say that a property depending on x holds for almost every x if the set
of x for which it fails has measure zero. Thus a function f on a bounded cell
C is Riemann integrable if and only if it is bounded and f (x) is continuous for
almost every x.
Sometimes we just say that the property holds almost everywhere. A func-
tion f on a bounded cell C is Riemann integrable if and only if it is bounded
and also continuous almost everywhere.
Theorem 5.23 Let f ≥ 0 be a function with I(f ) = 0. Then f (x) = 0 for
almost every x.

Proof: Suppose f ≥ 0 and I(f) = 0. Let N be the set of x such that f(x) > 0. If f is continuous at x in N, then I(f) > 0, which is ruled out. So N ⊆ Disc(f). Since μ̄(Disc(f)) = 0, it follows that μ̄(N) = 0. □

Corollary 5.24 Let f, g be two Riemann-integrable functions with I(|f − g|) = 0. Then f(x) = g(x) for almost every x.

5.5 Mapping sets of measure zero


In computing with sets of measure zero it is sometimes useful to use balls instead of cells. In the following we consider closed balls B(a, r) = {x ∈ Rn | |x − a| ≤ r}, with r > 0.

Lemma 5.25 Consider a subset A ⊂ Rn. Then A has measure zero if and only if for every ε > 0 there is a sequence of closed balls B_k with A ⊆ ∪_{k=1}^∞ B_k and Σ_{k=1}^∞ vol_n(B_k) < ε. For every κ > 0, it is possible to impose the additional requirement that the balls all have radius less than κ.

Proof: Up to now measure zero has been defined using covering by non-
degenerate bounded cells. First we show that we could use instead coverings by
closed cubes. Indeed, if we have a non-degenerate cell, then it has a side with
least length L > 0. This cell is a subset of a bigger closed cell all of whose side
lengths are multiples of L. The bigger cell may be taken so that each side length
is no more than 2 times the corresponding side length of the original cell. So
the volume of the bigger cell is bounded by 2n times the volume of the original
cell. Furthermore, the bigger cell may be written as the union of closed cubes
of side length L, and its volume is just the sum of the volumes of the individual
cubes. So if we can cover by cells with small total volume, then we can also
cover by closed cubes of small total volume. By further subdividing the cubes,
one can make them each of side length smaller than (some small multiple) of κ.
Once we have the result for closed cubes, then we have it for closed balls.
This is because a closed ball of radius r is a subset of a closed cube of side length L = 2r, and a closed cube of side length L is a subset of a closed ball of radius r′ = √n L/2. □
A function h is called Lipschitz continuous if there is a constant C such that
it satisfies |h(x0 ) − h(x)| ≤ C|x0 − x|. A function h from an open subset U
of Rn to Rn is called locally Lipschitz continuous if for every x there exists a
neighborhood V ⊆ U of x such that h restricted to V is Lipschitz continuous.
It is not too hard to see that h is locally Lipschitz if and only if it is Lipschitz continuous on every compact subset K of U. A continuous function may map a set of measure zero to a set of non-zero measure. However this is impossible for a C1 function. In fact, a C1 function is always locally Lipschitz. So the following result is relevant.

Theorem 5.26 Let U ⊆ Rn and let h : U → Rn be a locally Lipschitz function.


Then h maps sets of measure zero to sets of measure zero.

Proof: First consider a compact subset K ⊆ U. Consider E ⊆ K with measure zero. There is a κ neighborhood of K that is also a subset of U, whose closure is also a compact subset of U. Let C be the Lipschitz constant for h on this set. By Lemma 5.25, E ⊆ ∪_k B_k, where the B_k are closed balls of radius r_k and total volume Σ_k vol_n(B_k) arbitrarily small; taking the radii less than κ ensures that the balls meeting E lie inside U. If x is in E, then x is in some ball B_k with center a_k and radius r_k, and thus h(x) is in some ball B′_k with center h(a_k) and radius r′_k = C r_k. This shows that h(E) ⊆ ∪_k B′_k, where the B′_k are closed balls of radius r′_k = C r_k. The total volume of the new balls is bounded by C^n times the total volume of the original balls. This can be made arbitrarily small.
Next consider an arbitrary subset F ⊆ U with measure zero. Consider a
sequence of compact subsets Kn with union U . Let Fn be the intersection of F
with Kn . Then each Fn has measure zero. Also, h(F ) is the union of the h(Fn ).
Since each of the h(Fn ) has measure zero, then so does h(F ). 
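The ball-covering argument can be seen in action. The following sketch is our own construction (the map and the crude Lipschitz bound are arbitrarily chosen for illustration): it covers a segment, a set of measure zero in the plane, by n small balls and checks that the image is covered by balls whose total area still tends to 0.

```python
import math, random

def h_map(p):
    # a sample Lipschitz map of the plane (our choice): a shear plus a bend;
    # its derivative [[2, cos y], [1, -1]] has Frobenius norm <= sqrt(7) < 3
    x, y = p
    return (2.0 * x + math.sin(y), x - y)

LIP = 3.0  # crude Lipschitz bound for h_map

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def image_cover_area(n, trials=200):
    # Cover {(t, 0) : 0 <= t <= 1} by n balls of radius 1/(2n); the image is
    # then covered by balls of radius LIP/(2n) around the image centers.
    centers = [((i + 0.5) / n, 0.0) for i in range(n)]
    r, r_img = 1.0 / (2.0 * n), LIP / (2.0 * n)
    random.seed(0)
    for _ in range(trials):
        t = random.random()
        c = centers[min(int(t * n), n - 1)]
        assert dist((t, 0.0), c) <= r + 1e-12          # segment is covered
        assert dist(h_map((t, 0.0)), h_map(c)) <= r_img + 1e-12  # image too
    return n * math.pi * r_img ** 2  # total area of the image cover, ~ 1/n
```

The total area of the image cover scales like LIP²/n, so it can be made as small as desired, just as in the proof.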
One consequence of the theorem is the result that if f is Riemann integrable,
and g is one-to-one continuous with Lipschitz inverse, then f ◦ g is Riemann
integrable. This is seen as follows. Let E = g −1 (Disc(f )). Then E has measure
zero. Suppose that x is not in E. Then g(x) is a continuity point for f , so x
is a continuity point for f ◦ g. This establishes that Disc(f ◦ g) ⊆ E. Since
Disc(f ◦ g) has measure zero, the Lebesgue theorem says that f ◦ g must be
Riemann integrable.

5.6 Sard’s theorem


Sard’s theorem gives a connection between ideas of differentiability and measure.
It says that if g is a sufficiently smooth function from an open subset of Rn to
Rm , then the image of the set of critical points of g has measure zero. In other
words, the set of critical values of g has measure zero. There are three cases,
depending on whether n = m, n < m, or n > m. The first two are relatively
easy. The third one is more difficult and will not be proved here.
Theorem 5.27 (Sard) Let A ⊆ Rn be open, and let g : A → Rn be C1. Let B ⊆ A be the set of x in A for which g′(x) has rank less than n, that is, for which det g′(x) = 0. Then g(B) has measure zero.
Proof: Let C be a closed bounded non-degenerate cube with C ⊆ A. In fact, we may take C to be a cube with side length L. We first show that g(B ∩ C) has content zero. Partition C into N^n small cubes I, each of side length L/N. Suppose that one of these small cubes I intersects B. Then we can choose an x in I and in B. We want to show that when x + h is in the same small cube I, then g(x + h) is close to g(x); in fact, very close in at least one direction.
First we have the mean value theorem estimate
\[ |g(x+h) - g(x)| \le M\,|h| \le M\sqrt{n}\,(L/N). \tag{5.29} \]
Here M is a bound on the norm of g′(x) with x in C. By the C1 hypothesis and compactness M is finite.

Next, let ε > 0. Since g is assumed to be C1, we can take N so large that g′ does not change by more than ε across any of the small cubes I. In particular, we have ‖g′(x + th) − g′(x)‖ ≤ ε for every t with 0 ≤ t ≤ 1. By the integral form of the mean value theorem
\[ g(x+h) - g(x) - g'(x)h = \int_0^1 \bigl(g'(x+th) - g'(x)\bigr)h \, dt. \tag{5.30} \]
Hence with x and x + h in the small cube I, we have
\[ |g(x+h) - g(x) - g'(x)h| \le \epsilon\,|h| \le \epsilon\sqrt{n}\,(L/N). \tag{5.31} \]

Since x is in B, the values g′(x)h range over a subspace V of dimension at most n − 1. This means that when x + h is also in the small cube I, then g(x + h) is within ε√n (L/N) of the plane g(x) + V.
The conclusion of the two estimates is that when x is in B and in the small cube I, and x + h is in the same small cube I, then g(x + h) is contained in a very flat cylinder. The base of this cylinder is an (n − 1)-dimensional ball of radius M√n (L/N). The height of the cylinder is 2ε√n (L/N). So there is a constant c (depending on M and on n) such that the volume of the cylinder is bounded by cε(L/N)^n.
The result is that the volume of the image of each small cube I intersecting B ∩ C is bounded by cε(L/N)^n. There are at most N^n small cubes that intersect C, so the image of B ∩ C is covered by a set of volume not exceeding N^n cε(L/N)^n = cεL^n.
Since this works for arbitrary ε > 0, we may conclude that the content of g(B ∩ C) is zero. In particular the measure of each g(B ∩ C) is zero. However we may write the open set A as the countable union of such cubes C. It follows that g(B) is the union of a countable number of sets of measure zero. Hence g(B) has measure zero. □
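A toy illustration of the theorem (our example): for g(x, y) = (x, xy) the Jacobian determinant is x, so the critical set is the entire line x = 0, a large set, yet every critical point maps to the single point (0, 0), so the set of critical values certainly has measure zero.

```python
def g(x, y):
    return (x, x * y)

def jac_det(x, y):
    # g'(x, y) = [[1, 0], [y, x]], so det g'(x, y) = x
    return x

# The critical set {det g' = 0} is the entire line x = 0.
critical_points = [(0.0, -5.0 + 0.01 * k) for k in range(1001)]
critical_values = {g(x, y) for (x, y) in critical_points}
```

A thousand sampled critical points produce only one critical value: the whole line collapses to the origin.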

Theorem 5.28 (Sard) Let A ⊆ Rn be open. Let n < m, and let g : A → Rm be C1. Let B ⊆ A be the set of points such that g′(x) has rank less than m, that is, such that det(g′(x)g′(x)^T) = 0. Then B = A, and in fact the image set g(A) has measure zero.

Proof: Since g′(x) is an m by n matrix, it has rank at most n. So the rank is surely less than m. Let p be the projection of Rm onto Rn defined by discarding the last m − n components. Let B be the set of y in Rm with p(y) in A. Let h(y) = g(p(y)). Then h : B → Rm has the same image as g. Furthermore, the derivative h′(y) = g′(p(y))p′(y) has rank at most n < m. So by the previous theorem (applied with n replaced by m) h(B) = g(A) has measure zero. □
The theorem just proved deserves respect. Suppose that A ⊆ Rn and g :
A → Rm with n < m is continuous. One might think that the image set g(A)
would have measure zero. However this is not true in general, as is shown by
the existence of space-filling curves.

Theorem 5.29 (Sard) Let A ⊆ Rn be open. Let m < n, and suppose g : A → Rm is C^{n−m+1}. Let B ⊆ A be the set of x in A for which g′(x) has rank less than m, that is, for which det(g′(x)g′(x)^T) = 0. Then g(B) has measure zero.

This last is the most difficult case of Sard’s theorem. It says that the set of
w for which there exists x with g(x) = w and g0 (x) having rank less than m
has measure zero. In other words, for almost every w the surface g(x) = w has
only points with g0 (x) having rank m.

5.7 Change of variables


The Sard theorem is relevant to the general change of variables theorem for
unoriented integrals. Here we review several such theorems, without proofs.
One version of the theorem says that if g is a C1 map of an open subset of Rn to Rn, then
\[ \int f(g(x))\,h(x)\,|\det g'(x)|\, d^n x = \int f(y) \Bigl( \sum_{g(x)=y} h(x) \Bigr) d^n y. \tag{5.32} \]
In particular we can take h to be the indicator function of the set A. This gives
\[ \int_A f(g(x))\,|\det g'(x)|\, dx = \int f(y)\,\#\{x \in A \mid g(x) = y\}\, dy. \tag{5.33} \]

We see that if g is not one-to-one, then the only modification is that we need to
keep track of how many times g assumes a certain value. According to Sard’s
theorem, the critical values do not matter.
Another version of change of variable is the pushforward formula
\[ \int f(g(x))\,h(x)\, d^n x = \int f(y) \Bigl( \sum_{g(x)=y} \frac{h(x)}{|\det g'(x)|} \Bigr) d^n y. \tag{5.34} \]

In this situation Sard's theorem is of no use; we need to require that g′(x) is non-singular. The quantity f(g(x)) on the left is the pullback of f(y). The function of y given by the sum on the right hand side is the pushforward of h(x).
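Formula (5.33) can be checked by quadrature for a map that is not one-to-one. The sketch below is ours: g(x) = x² on A = [−1, 1] is two-to-one onto [0, 1], and the critical value y = 0 is harmless, as Sard's theorem promises.

```python
import math

def trapezoid(fn, a, b, n=4000):
    # composite trapezoid rule on [a, b]
    h = (b - a) / n
    s = 0.5 * (fn(a) + fn(b)) + sum(fn(a + i * h) for i in range(1, n))
    return s * h

f = math.cos
g = lambda x: x * x           # two-to-one on [-1, 1] (except at x = 0)
dg = lambda x: 2.0 * x        # g'(x); the critical point x = 0 is a null set

lhs = trapezoid(lambda x: f(g(x)) * abs(dg(x)), -1.0, 1.0)
# almost every y in (0, 1) has exactly 2 preimages in A = [-1, 1]
rhs = trapezoid(lambda y: f(y) * 2.0, 0.0, 1.0)
```

Both sides equal 2 sin 1, so keeping track of the number of preimages is exactly the right bookkeeping.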

5.8 Fiber integration


There is a very useful variant of the pushforward formula in the case when
y = g(x) defines a function from an open subset of Rn to Rm with 0 < m < n.
Let us assume that g(x) is C 1 and that the derivative g0 (x) has rank m. Then
for each y the equation
g(x) = y (5.35)

defines a regular surface (an embedded manifold) of dimension k = n − m.


The function y = g(x) thus defines a family of disjoint surfaces. The surface
corresponding to a particular value of y is called the fiber over y. It would seem
reasonable to do an iterated integral over y and over the surfaces (the fibers).
It will be convenient to think of g(x) = y as a system gi (x) = yi for i =
k + 1, . . . , n. Suppose we consider a point in the space. Then near that point
there is a change of variables from x1 , . . . , xn to u1 , . . . , uk , yk+1 , . . . , yn . The
u1 , . . . , uk parameterize each surface. We write this as
x = F(u, y). (5.36)
The inverse relation is
\[ u = G_1(x), \qquad y = G_2(x) = g(x). \tag{5.37} \]
Then
\[ \int f(g(x))\,h(x)\, dx = \int f(y) \int h(F(u,y))\,|\det F'(u,y)|\, du\, dy. \tag{5.38} \]

Thus the pushforward is
\[ h_*(y) = \int h(F(u,y))\,|\det F'(u,y)|\, du. \tag{5.39} \]

Define the fiber form
\[ \beta^{(y)} = |\det F'(u,y)|\, du = \frac{1}{|\det G'(F(u,y))|}\, du. \tag{5.40} \]
The answer is now
\[ h_*(y) = \int_{g^{-1}(y)} h^{(y)}\, \beta^{(y)}. \tag{5.41} \]
Here h^{(y)} is the restriction of h(x) to the surface g^{-1}(y).


It is not hard to check that the fiber form is independent of the choice of
coordinates. Suppose that F̄(w, y) = F(φ(w), y). We can think of the right
hand side as F(u, y) with u = φ(w), y = y. We can compute
| det F̄0 (w, y)| dw = | det F0 (φ(w, y))|| det φ0 (w)| dw = | det F0 (u, y)| du.
(5.42)
The differential forms are identical.
Another way of thinking of the fiber integral is in terms of delta functions. The relation is
\[ \int \delta(g(x)-y)\,h(x)\, dx = \int_{g^{-1}(y)} h^{(y)}\, \beta^{(y)}. \tag{5.43} \]

The delta function on the left enforces m equations, leaving a k = n − m dimensional integral on the right.
The discussion in this section has avoided various technical issues. The
reader may consult the fascinating article by Ponomarev [16] for further infor-
mation on this subject.

5.9 Probability
The central notion of probability is that of expectation of a function of a vector
random variable x. A common case is when the expectation is given by a
probability density ρ(x). This is a positive function with integral one. Say that
y = g(x) is a random variable that is a function of x. Then it may or may not
be the case that the expectation of f (y) = f (g(x)) is given by a pushed forward
probability density ρ∗ (y). When this is the case, we should have
\[ \int f(y)\,\rho_*(y)\, dy = \int f(g(x))\,\rho(x)\, dx. \tag{5.44} \]

First consider the case when n random variables are mapped to n random
variables. There ρ(x) is a joint probability density for random variables x, g(x)
is a vector of n random variables, and f (g(x)) is a function of these n random
variables. The right hand side is the expectation. If one wants to write this
expectation in terms of the random variables y = g(x), then one has to push
forward the density. The change of variables formula suggests that the new
density is
\[ \rho_*(y) = \sum_{g(x)=y} \frac{1}{|\det g'(x)|}\,\rho(x). \tag{5.45} \]

This only works when the regions where det g′(x) = 0 can be neglected, and this is not always the case. If, for instance, there is a region C of non-zero volume with g(x) = y∗ for x in C, then the extra contribution f(y∗) ∫_C ρ(x) dx must be added to the left hand side for the identity to be valid. Sard's theorem does nothing to help, since there is no longer a factor that vanishes on the set of critical points. Even though the set of critical values has measure zero, there can be a lot of probability on a set of measure zero.
Example: An example is when n = 1 and the density is ρ(x) = (1/√(2π)) exp(−x²/2). This is the density for a standard normal (Gaussian) distribution. Let y = x². Then ρ∗(y) = (1/√(2π)) (1/√y) exp(−y/2) for y > 0. This density is that of a chi-squared distribution (with one degree of freedom). |
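A Monte Carlo check of this example (our sketch): sample x from a standard normal, set y = x², and compare the empirical probability of an interval with the integral of the pushed-forward density ρ∗.

```python
import math, random

def rho_star(y):
    # pushed-forward density of y = x**2 for x standard normal
    return math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi * y)

random.seed(0)
N = 200_000
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(N)]
empirical = sum(1 for y in ys if 0.5 <= y <= 1.5) / N

# midpoint rule for the integral of rho_star over [0.5, 1.5]
m = 1000
width = 1.0 / m
predicted = sum(rho_star(0.5 + (i + 0.5) * width) for i in range(m)) * width
```

The two numbers agree to Monte Carlo accuracy (both near 0.259).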
Next consider the case when n random variables are mapped to m random variables with m < n by y = g(x). The pushforward formula suggests that
\[ \rho_*(y) = \int_{g^{-1}(y)} \rho^{(y)}\, \beta^{(y)}. \tag{5.46} \]

Example: Here is an example when n = 2 and m = 1. Consider the joint normal (Gaussian) distribution with density ρ(x₁, x₂) = (1/(2π)) exp(−(x₁² + x₂²)/2). Let y = x₂/x₁ be the quotient. The fiber over y is the line x₂ = yx₁. One choice of parameter is u = x₁. Then we have x₁ = u, x₂ = yu. The Jacobian determinant is u, so the fiber form is β^{(y)} = |u| du. Then ρ^{(y)} β^{(y)} = (1/(2π)) exp(−(1 + y²)u²/2) |u| du. The integral of this form over the fiber is ρ∗(y) = (1/π) 1/(1 + y²). This is the density for the Cauchy distribution. |
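The Cauchy example can be checked the same way (our sketch): the ratio of two independent standard normals should match the Cauchy cumulative distribution function 1/2 + arctan(t)/π.

```python
import math, random

random.seed(1)
N = 200_000
count = 0
for _ in range(N):
    x1 = random.gauss(0.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    if x2 / x1 <= 1.0:  # x1 = 0.0 exactly has probability zero
        count += 1
empirical = count / N

# Cauchy CDF at t = 1: 1/2 + arctan(1)/pi = 3/4
cauchy_cdf_at_1 = 0.5 + math.atan(1.0) / math.pi
```

The empirical fraction of ratios below 1 matches the Cauchy value 3/4 to Monte Carlo accuracy.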

Perhaps the moral of the story is that one should calculate with the original
density ρ(x). In probability theory expectations (or measures) push forward in
a routine way. When you try to express them in terms of densities, then the
expressions are less pleasant. Densities are functions. Functions pull back with
ease, but push forward with considerable difficulty.

5.10 The co-area formula


The co-area formula relates fiber integration to area. Suppose that g is a smooth function from an open subset of Rn to Rn−k with k > 0. We are interested in the k-dimensional surfaces g(x) = y. The derivative g′(x) is an n − k by n matrix. Its rows are n − k independent co-vectors (forms) that vanish on the k-dimensional tangent space to the surface. Suppose F(u, y) parameterizes the surfaces y = g(x). Let F′₁(u, y) be the n by k matrix of partial derivatives with respect to the u variables. The columns of this matrix represent a basis of tangent vectors to the surface. From
\[ g(F(u,y)) = y \tag{5.47} \]
we see that
\[ g'(x)\,F'_1(u,y) = 0. \tag{5.48} \]
This expresses in matrix form the fact that the n − k independent row covectors in g′(x) are zero on the k independent column vectors in F′₁(u, y).
Define the co-area factor C(x) by
\[ C(x) = \sqrt{\det\bigl(g'(x)\,g'(x)^T\bigr)}. \tag{5.49} \]
This is the square root of the determinant of an n − k square Gram matrix. Define the area factor by
\[ A(u,y) = \sqrt{\det\bigl(F_1'^T(u,y)\,F'_1(u,y)\bigr)}. \tag{5.50} \]
The determinant here is that of a k square Gram matrix. The simplest form of the co-area formula says that
\[ C(x)\,|\det F'(u,y)| = A(u,y), \tag{5.51} \]
where x = F(u, y). In the language of differential forms, this formula says that
\[ C^{(y)}\,\beta^{(y)} = \mathrm{area}^{(y)}. \tag{5.52} \]
Here C^{(y)} = C(F(u, y)), while β^{(y)} = |det F′(u, y)| du and area^{(y)} = A(u, y) du.
Fiber integration gives an integral version of the co-area formula. This says that
\[ \int f(g(x))\,h(x)\,C(x)\, d^n x = \int f(y) \Bigl( \int_{g(x)=y} h(x)\,\mathrm{area}^{(y)}_{n-m}(x) \Bigr) d^m y. \tag{5.53} \]
In particular we can take h to be the indicator function of the set A. This gives
\[ \int_A f(g(x))\,C(x)\, d^n x = \int f(y)\,\mathrm{area}_{n-m}\bigl(\{x \in A \mid g(x) = y\}\bigr)\, d^m y. \tag{5.54} \]

The co-area formula may also be thought of as a formula for area integrals in terms of delta functions. Thus
\[ \int h(x)\,\delta(g(x)-y)\,C(x)\, d^n x = \int_{g(x)=y} h(x)\,\mathrm{area}^{(y)}_{k}(x). \tag{5.55} \]
In particular
\[ \int_A \delta(g(x)-y)\,C(x)\, d^n x = \mathrm{area}_{n-m}\bigl(\{x \in A \mid g(x) = y\}\bigr). \tag{5.56} \]

Example: The co-area formula may seem unfamiliar, but there is a case when it becomes quite transparent. Consider a single equation g(x) = y that defines a k = n − 1 dimensional hypersurface. In the integral form of the formula take f(y) = H(s − y), where H is the indicator function of the positive real numbers. Also take h(x) = 1. The result is
\[ \int_{g(x)\le s} |g'(x)|\, dx = \int_{-\infty}^{s} \mathrm{area}\bigl(\{x \mid g(x) = y\}\bigr)\, dy. \tag{5.57} \]
It follows that
\[ \frac{d}{ds} \int_{g(x)\le s} |g'(x)|\, dx = \mathrm{area}\bigl(\{x \mid g(x) = s\}\bigr). \tag{5.58} \]
This formula is an elementary relation between volume and area for an implicitly defined surface. |
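Formula (5.58) already makes sense in dimension n = 1, where the "area" of the level set {x | g(x) = s} is just the number of its points. The sketch below (ours) verifies it for g(x) = x²: the derivative with respect to s of ∫_{g≤s} |g′(x)| dx equals 2, the number of solutions of x² = s for s > 0.

```python
import math

def lhs(s, n=2000):
    # integral of |g'(x)| = |2x| over {x : x**2 <= s} = [-sqrt(s), sqrt(s)],
    # by the composite trapezoid rule
    a, b = -math.sqrt(s), math.sqrt(s)
    h = (b - a) / n
    total = 0.5 * (abs(2 * a) + abs(2 * b))
    total += sum(abs(2 * (a + i * h)) for i in range(1, n))
    return total * h

s, ds = 2.0, 1e-4
deriv = (lhs(s + ds) - lhs(s - ds)) / (2.0 * ds)
# the level set {x : x**2 = s} has exactly two points for s > 0
```

The integral evaluates to 2s exactly, so the derivative is 2, matching the count of level-set points.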
We proceed to a more systematic development of the co-area formula. The
assumption is that the family of surfaces y = g(x) has a smooth parametric
representation x = F(u, y). This means that

g(F(u, y)) = y (5.59)

and the function F(u, y) has a smooth inverse function G(x) with components G₁(x) = u and G₂(x) = g(x) = y.

Theorem 5.30 (Co-area formula) Consider an open subset of Rn and k-dimensional regular parameterized surfaces g(x) = y in this region, where the y are in Rn−k. Suppose that this family of surfaces has a smooth parametric representation x = F(u, y). Then
\[ \int f(g(x))\,h(x)\,C(x)\, d^n x = \int f(y) \int h(F(u,y))\,A(u,y)\, d^k u\, d^{n-k} y. \tag{5.60} \]

Proof: By the change of variable x = F(u, y) we have
\[ \int f(g(x))\,h(x)\,C(x)\, d^n x = \int f(y)\,h(F(u,y))\,C(F(u,y))\,|\det F'(u,y)|\, d^k u\, d^{n-k} y. \tag{5.61} \]
So all that remains is to show that C(F(u, y)) |det F′(u, y)| = A(u, y), and this is equivalent to
\[ \det\bigl(g'(x)\,g'(x)^T\bigr)\,\bigl(\det F'(u,y)\bigr)^2 = \det\bigl(F_1'^T(u,y)\,F'_1(u,y)\bigr), \tag{5.62} \]
where x = F(u, y).


The functions G₁(x) and G₂(x) have derivatives G′₁(x) and G′₂(x) that are k by n and n − k by n matrices. The function F(u, y) has partial derivatives with respect to the u and y variables given by matrices F′₁(u, y) and F′₂(u, y) that are n by k and n by n − k matrices. By the chain rule we have G′(F(u, y))F′(u, y) = I. In the following we write this in abbreviated form as
\[ G' F' = I. \tag{5.63} \]
Since these matrices are square, we also have F′G′ = I, and it follows that
\[ G' G'^T F'^T F' = G'\,(F'G')^T F' = G' F' = I. \tag{5.64} \]
More explicitly, we could write the last two equations as
\[ \begin{pmatrix} G'_1 \\ G'_2 \end{pmatrix} \begin{pmatrix} F'_1 & F'_2 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \tag{5.65} \]
and
\[ \begin{pmatrix} G'_1 G_1'^T & G'_1 G_2'^T \\ G'_2 G_1'^T & G'_2 G_2'^T \end{pmatrix} \begin{pmatrix} F_1'^T F'_1 & F_1'^T F'_2 \\ F_2'^T F'_1 & F_2'^T F'_2 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}. \tag{5.66} \]

There is a theorem in linear algebra that gives a relation between the determinants of submatrices of matrices that are inverse to each other. The statement and proof are given below. In this case it says that
\[ \det\bigl(G'_2 G_2'^T\bigr)\,\det\bigl(F'^T F'\bigr) = \det\bigl(F_1'^T F'_1\bigr). \tag{5.67} \]
This can be written
\[ \det\bigl(g'\,g'^T\bigr)\,(\det F')^2 = \det\bigl(F_1'^T F'_1\bigr). \tag{5.68} \]
This gives the required identity. □
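The key identity C(F(u, y)) |det F′(u, y)| = A(u, y), equivalently (5.62), can be checked by hand on a simple fibration (our example, with names of our choosing): g(x₁, x₂) = x₁² + x₂, whose fibers are parameterized by F(u, y) = (u, y − u²).

```python
def g_prime(x1, x2):
    # derivative of g(x1, x2) = x1**2 + x2, a 1 x 2 matrix
    return (2.0 * x1, 1.0)

def F_prime(u, y):
    # full derivative of F(u, y) = (u, y - u**2) with respect to (u, y)
    return ((1.0, 0.0), (-2.0 * u, 1.0))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def both_sides(u, y):
    x1, x2 = u, y - u * u                  # x = F(u, y)
    gp = g_prime(x1, x2)
    det_ggT = gp[0] ** 2 + gp[1] ** 2      # det(g' g'^T), a 1 x 1 determinant
    detF = det2(F_prime(u, y))
    col = (F_prime(u, y)[0][0], F_prime(u, y)[1][0])  # F_1', the u-column
    det_FtF = col[0] ** 2 + col[1] ** 2    # det(F_1'^T F_1'), also 1 x 1
    return det_ggT * detF ** 2, det_FtF
```

Here both sides of (5.62) equal 1 + 4u²: the co-area factor and the area factor are both √(1 + 4u²), while det F′ = 1.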


In the co-area formula the function y = g(x) sends an open subset of R^n
into R^m = R^{n-k}, where 0 < m < n. In the case when g is C^{n-m+1} = C^{k+1},
Sard's theorem gives useful information. It says that the critical set where
C(x) = 0 is sent to a set of y of measure zero. Thus including these
sets in the integration should make no difference. Actually, it turns out that
there are versions of the co-area formula that apply in much greater generality.
An account of these matters may be found in the book by Lin and Yang [8].

5.11 Linear algebra (block matrices)


This section presents a theorem on block matrices that is useful for the co-area
formula. The block matrices considered have four blocks, so they resemble in
some respects 2 by 2 matrices with four entries. Some of the formulas for 2 by
2 matrices carry over to this situation, at least after appropriate modifications.
We begin with a lemma about the determinant of a matrix for which one of the
blocks is the zero matrix. The main theorem relates determinants of blocks of
a matrix and of its inverse matrix.
Lemma 5.31 Consider a block triangular matrix

    A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}.    (5.69)

Then det A = det A11 det A22.

Proof: It is sufficient to consider the case when A11 is non-singular. Decompose

    A = \begin{pmatrix} A_{11} & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & A_{22} \end{pmatrix} \begin{pmatrix} I & A_{11}^{-1} A_{12} \\ 0 & I \end{pmatrix}.    (5.70)

It is easy to work out each of the determinants on the right. So det A =
det A11 · det A22 · 1. □
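The lemma is easy to confirm numerically. A sketch (an aside; NumPy assumed, block sizes 2 and 3 are arbitrary):

```python
import numpy as np

# Numerical check of Lemma 5.31: for a block upper triangular matrix,
# det A = det A11 * det A22.
rng = np.random.default_rng(0)
A11 = rng.normal(size=(2, 2))
A12 = rng.normal(size=(2, 3))
A22 = rng.normal(size=(3, 3))
A = np.block([[A11, A12], [np.zeros((3, 2)), A22]])

assert np.isclose(np.linalg.det(A),
                  np.linalg.det(A11) * np.linalg.det(A22))
```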
Theorem 5.32 Consider a block matrix A with inverse B, so that the product
AB = I has the form

    \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}.    (5.71)

Then det A det B = 1. Furthermore,

    det A22 det B = det B11    (5.72)

and

    det A11 det B = det B22.    (5.73)

Proof: The inverse of A is given by the block Cramer's rule

    \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} & -(A_{22} A_{12}^{-1} A_{11} - A_{21})^{-1} \\ -(A_{11} A_{21}^{-1} A_{22} - A_{12})^{-1} & (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \end{pmatrix}.    (5.74)

There is a triangular factorization

    \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} I & A_{12} \\ 0 & A_{22} \end{pmatrix} \begin{pmatrix} A_{11} - A_{12} A_{22}^{-1} A_{21} & 0 \\ A_{22}^{-1} A_{21} & I \end{pmatrix}.    (5.75)

By the lemma above this gives det A = det A22 det B11^{-1}, which leads to the first
result. We also have the triangular factorization

    \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & 0 \\ A_{21} & I \end{pmatrix} \begin{pmatrix} I & A_{11}^{-1} A_{12} \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix}.    (5.76)

This gives det A = det A11 det B22^{-1}, which is equivalent to the second result. □
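The theorem can also be confirmed numerically. A sketch (an aside; NumPy assumed, with an arbitrary 2 + 3 block splitting of a random 5 by 5 matrix):

```python
import numpy as np

# Numerical check of Theorem 5.32: with B = A^{-1},
# det A22 * det B = det B11 and det A11 * det B = det B22.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 5))          # split as a 2 + 3 block matrix
B = np.linalg.inv(A)
d = np.linalg.det

assert np.isclose(d(A[2:, 2:]) * d(B), d(B[:2, :2]))
assert np.isclose(d(A[:2, :2]) * d(B), d(B[2:, 2:]))
```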
Bibliography

[1] Ilka Agricola and Thomas Friedrich, Global analysis: Differential forms in
analysis, geometry and physics, Graduate Studies in Mathematics, no. 52,
American Mathematical Society, Providence, Rhode Island, 2002.
[2] Dennis Barden and Charles Thomas, An introduction to differentiable man-
ifolds, Imperial College Press, London, 2003.
[3] Robert L. Bryant, S. S. Chern, Robert B. Gardner, Hubert L. Goldschmidt,
and P. A. Griffiths, Exterior differential systems, Mathematical Sciences
Research Institute Publications, no. 18, Springer, New York, 1991, (online).
[4] William L. Burke, Applied differential geometry, Cambridge University
Press, Cambridge, 1985.
[5] M. Crampin and F. A. E. Pirani, Applicable differential geometry, Lon-
don Mathematical Society lecture notes 57, Cambridge University Press,
Cambridge, 1986.
[6] Harley Flanders, Differential forms with applications to the physical sci-
ences, Academic Press, New York, 1963.
[7] E. L. Ince, Ordinary differential equations, Dover Publications, New York,
1956.
[8] Fanghua Lin and Xiaoping Yang, Geometric measure theory—an introduc-
tion, International Press, Boston, MA, 2002.
[9] David Lovelock and Hanno Rund, Tensors, differential forms, and varia-
tional principles, Dover Publications, New York, 1989.
[10] W. A. J. Luxenburg, Arzelà's dominated convergence theorem for the Riemann
integral, American Mathematical Monthly 78 (1971), 970–979.
[11] John Milnor, Morse theory, Annals of Mathematics Studies, no. 51, Prince-
ton University Press, Princeton, New Jersey, 1969.
[12] Shigeyuki Morita, Geometry of differential forms, Translations of Mathe-
matical Monographs, no. 201, American Mathematical Society, Providence,
Rhode Island, 2001.


[13] Edward Nelson, Tensor analysis, Princeton University Press, Princeton,
NJ, 1967.
[14] ——, Topics in dynamics I: Flows, Princeton University Press, Prince-
ton, NJ, 1969.

[15] Ivan Netuka, The change-of-variables theorem for the Lebesgue integral,
Acta Universitatis Matthiae Belii 19 (2011), 37–42.
[16] S. P. Ponomarev, Submersions and preimages of sets of measure zero,
Siberian Mathematical Journal 28 (1987), 153–163.

[17] Walter Rudin, Principles of mathematical analysis, third ed., McGraw-Hill,
New York, 1976.
[18] Michael Spivak, Calculus on manifolds: A modern approach to classical
theorems of advanced calculus, Addison-Wesley, Reading, Massachusetts,
1965.

[19] Erdoğan Şuhubi, Exterior analysis: Using applications of differential forms,


Academic Press, Waltham, Massachusetts, 2013.
[20] Mariusz Wodzicki, Notes on Riemann integral, an annex to H104
etc., Lecture notes posted at https://math.berkeley.edu/~wodzicki/H104.F10/Integral.pdf, December 2, 2010.
Mathematical Notation

Linear Algebra
x, y, z n-component column vectors
ω, µ m-component row vectors
A, B, C m by n matrices
A^T transpose
A^{-1} inverse
tr(A) trace
det(A) determinant
|x| = \sqrt{x^T x} Euclidean norm
‖A‖ Lipschitz norm
‖A‖_2 = \sqrt{tr(A^T A)} Euclidean norm

Multivariable functions
f, g, h functions from E ⊆ Rn to Rm
x ↦ f(x) same as f
f' derivative matrix function from open E ⊆ Rn to m by n matrices
x ↦ f'(x) same as f'
x, y, z variables in Rn
y = f(x) y as a function f(x) of x
yi = fi(x) yi as a function fi(x) of x
∂y/∂x = f'(x) derivative matrix (Jacobian matrix)
∂yi/∂xj = ∂fi(x)/∂xj = f'_{i,j}(x) entry of derivative matrix
dx = dx1 ∧ · · · ∧ dxn exterior product of differentials
dy/dx = det ∂y/∂x = det f'(x) determinant of derivative matrix (Jacobian determinant)
g ◦ f composite function
(g ◦ f)(x) = g(f(x)) composite function of x
(g ◦ f)' = (g' ◦ f) f' chain rule
(g ◦ f)'(x) = g'(f(x)) f'(x) chain rule as a function of x
p = g(u), u = f(x) composite function expressed with variables
∂p/∂x = (∂p/∂u)(∂u/∂x) chain rule expressed with variables
f, g, h functions from E ⊆ Rn to R
f_{,ij}(x) entries of Hessian matrix of second derivatives


Integration
I, m(I) cell, volume of cell
P partition into cells
fI restriction
L(f, P), U (f, P) lower sum, upper sum
L(f ), U (f ), I(f ) lower integral, upper integral, integral
1A indicator function of subset A
m(A) = I(1A ) content (volume) of A
int(A) interior of subset A
bdy(A) boundary of subset A
oscA (f ) oscillation on set A
oscx (f ) oscillation at point x
Disc(f ) {x | oscx (f ) > 0}
δ_ε(x) family of approximate delta functions

Differential Forms
x, y, u coordinate systems
s = h(x) scalar field
X = \sum_{j=1}^n a_j ∂/∂x_j vector field
ds = \sum_{i=1}^n (∂s/∂x_i) dx_i differential of a scalar (an exact 1-form)
ω = \sum_{i=1}^n p_i dx_i differential 1-form
⟨ω | X⟩ = \sum_{i=1}^n p_i a_i scalar field from form and vector
θ differential k-form
⟨θ | X1, . . . , Xk⟩ scalar field from form and vectors
X⌟θ interior product (a k − 1 form)
θ ∧ β exterior product of k-form with ℓ-form
dθ exterior derivative of θ (a k + 1 form)
φ = (x ← g(u)) manifold mapping (parameterized surface)
φ*h(x) = h(g(u)) pullback of a scalar field
φ* dx_i = \sum_{α=1}^k g'_{i,α}(u) du_α pullback of a basis differential
φ*θ pullback of a differential k-form
φ_* ∂/∂u_α = \sum_{i=1}^n g'_{i,α}(u) ∂/∂x_i pushforward of a basis vector field
φ_* Y pushforward of a vector field
χ chain
∂χ boundary of chain
∫_χ θ integral of form over chain

The Metric Tensor
g = \sum_{i=1}^n \sum_{j=1}^n g_{ij} dx_i dx_j metric tensor
g_{ij} matrix entries of metric tensor (inner product on vectors)
G matrix of metric tensor
g^{ij} matrix entries of inverse matrix (inner product on forms)
G^{-1} inverse of metric tensor matrix
√g = √(det G) volume factor
vol = √g dx_1 · · · dx_n volume form
X⌟vol flux form
element = φ*1 vol hypersurface element
φ*(X⌟vol) = X⌟element flux through a hypersurface
∇ · X = div X divergence
∇s = grad s gradient
∇²s = div grad s Laplacian
g* = \sum_{α=1}^k \sum_{β=1}^k g*_{αβ} du_α du_β metric tensor on surface
φ_* ∂/∂u_α = \sum_{i=1}^n (∂x_i/∂u_α) ∂/∂x_i αth tangent vector
X_{iα} = ∂x_i/∂u_α components of αth tangent vector
g*_{αβ} = X_α^T G X_β matrix entries of surface metric tensor
X matrix of tangent vector components
G* = X^T G X Gram matrix of surface metric tensor
√g* = √(det G*) area factor
area = √g* du_1 · · · du_k surface area form
area = |element| hypersurface area form
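The area factor √(det X^T G X) in the table above can be illustrated numerically. The following sketch (an aside; NumPy assumed) uses the unit sphere in R^3 with the Cartesian metric G = I and the spherical parametrization x = (sin u1 cos u2, sin u1 sin u2, cos u1); integrating the area factor over the parameter rectangle recovers the area 4π:

```python
import numpy as np

# For the unit sphere the area factor sqrt(det X^T X) equals sin(u1),
# and its integral over [0, pi] x [0, 2*pi] is the sphere area 4*pi.
def tangent_matrix(u1, u2):
    # Columns are the tangent vectors d x / d u_alpha.
    return np.array([[np.cos(u1) * np.cos(u2), -np.sin(u1) * np.sin(u2)],
                     [np.cos(u1) * np.sin(u2),  np.sin(u1) * np.cos(u2)],
                     [-np.sin(u1),              0.0]])

def area_factor(u1, u2):
    X = tangent_matrix(u1, u2)
    return np.sqrt(np.linalg.det(X.T @ X))

# Midpoint rule over the parameter rectangle [0, pi] x [0, 2*pi].
n = 100
u1 = (np.arange(n) + 0.5) * np.pi / n
u2 = (np.arange(n) + 0.5) * 2 * np.pi / n
du = (np.pi / n) * (2 * np.pi / n)
area = sum(area_factor(a, b) for a in u1 for b in u2) * du

assert abs(area - 4 * np.pi) < 1e-3
```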

Measure Zero
m(A) content of Jordan measurable A
m̄(A) outer content of A
µ̄(A) outer measure of A
v_n = (1/n) 2π^{n/2}/Γ(n/2) volume coefficient
B_n(a, r) open n ball of volume v_n r^n
a_n = n v_n = 2π^{n/2}/Γ(n/2) area coefficient
S_{n−1}(a, r) n − 1 sphere of area a_{n−1} r^{n−1}
Index

C ∞ function, 18 cofactor, 145, 156


C k function, 18 column vector, 7
commutative, 94
active interpretation, 20 composition of functions, 21
active transformation, 104 congruent (matrices), 16
additive (on functions), 46 conservation law, 132
additive (on sets), 48 constrained optimization, 91
advective derivative, 79 constraint surface, 91
almost everywhere, 165 content, 48
angle form, 83 content zero, 160
anticommutative, 94 continuity equation, 132
approximate delta functions, 60 contraction, 144
arc length, 136 contraction (function), 2
area, 146, 149 contraction (vector with form), 128
area factor, 172 contravariant tensor, 137
assignment, 101 coordinate basis, 141
autonomous system, 74 coordinate system, 72
countably subadditive (on sets), 161
ball, 48 covariant tensor, 137
block Cramer’s rule, 175 covector, 7
bound variable, 100 Cramer’s rule, 145, 156
boundary (chain), 111 critical point, 34
boundary (subset), 48 critical value, 34
cross product, 140
Cartesian coordinates, 136 curl, 140
Cauchy-Binet theorem, 146 curve, 88
cell, 44 cut plane, 82
cell partition, 45
chain (of singular surfaces), 110 degenerate cell, 44
chain rule, 21 degenerate interval, 44
change of variables, 64, 101, 110, 169 delta function, 60
characteristic subspace, 116 derivative, 17
classical Stokes’ theorem, 114 determinant, 8, 61, 144, 145, 156
closed 1-form, 81 diagonal matrix, 8
closed form, 95 diffeomorphism, 72
co-area factor, 172 differential, 79, 95
co-area formula, 172 differential k-form, 93


differential 1-form, 79 general relativity, 138


differential form, 93 global, 73
Dini’s theorem, 55 gradient, 139
divergence, 130 Gram matrix, 8, 146, 156
divergence theorem, 130 Green’s first identity, 140
dominated convergence, 55 Green’s theorem, 113
dominated convergence theorem, 55
dual basis, 80 Hénon map, 20
dummy variable, 100 Hessian matrix, 34, 86
hypersurface, 19
eigenvalue, 12 hypersurface element (transverse), 131
Einstein summation convention, 144
electric displacement, 120 implicit function theorem, 28
electric field, 119 implicitly defined surface, 19, 88
electric flux density, 120 indicator function, 48
electric potential, 120 infimum (greatest lower bound), 44
element (transverse), 131 integral (differential form), 108
elementary matrices, 61 integral (function), 45
embedded manifold, 88 integral (twisted differential form), 138
Escher, 118 integrating factor, 84
Euclidean norm (matrix), 11 integration by parts, 132
Euclidean norm (vector), 9 interchange matrix, 62
Euler vector field, 83 interior (subset), 48
exact 1-form, 81 interior product, 116, 128
exact differential, 81 interior pullback, 131
exact form, 95 interval, 44
expectation, 171 inverse function theorem, 31
exterior derivative, 95 inverse matrix, 7
exterior product, 93 invertible matrix, 8
iterated integral, 50
face mapping, 111 iteration, 2
fiber form, 170
fibered manifold, 170 Jacobian determinant, 21
finer (partition), 45 Jacobian matrix, 17
finitely subadditive (on sets), 161 Jordan content, 48
first derivative test, 86 Jordan measurable set, 48
fixed point, 2
fixed point (stable), 3 Lagrange multipliers, 91
flux, 130 Lagrangian derivative, 79
flux density, 129, 130 Laplacian, 140
form, 93 Lebesgue theorem on the Riemann in-
free variable, 101 tegral, 163
Fubini’s theorem, 50 length, 136
fundamental theorem of calculus, 113 Levi-Civita symbol, 144
line integral (form), 113
Gauss’s theorem, 115 line integral (vector field), 131

linear form, 7 parameterized surface, 18, 89


linear transformation, 7 parametrized surface, 109
linearization (function), 18 partial derivative, 17
linearization (vector field), 76 particle derivative, 79
Lipschitz function, 2, 166 partition (cell), 45
Lipschitz norm (matrix), 10 partition (set), 45
local, 73 passive interpretation, 20
local manifold patch, 99 passive transformation, 103
locally Lipschitz function, 166 pendulum, 77
locally parameterized surface, 89 Penrose, 118
lower integral, 45 permutation symbol, 144
lower sum, 45 Poincaré lemma, 99
pointwise convergence, 54
magnetic field, 121 polar coordinates, 82, 141
magnetic field intensity, 121 positive definite quadratic form, 16
magnetic flux density, 121 probability, 171
magnetic potential, 122 probability density, 171
manifold, 73 pseudoform, 138
manifold mapping, 102 pullback, 101
manifold patch, 72 pullback (differential form), 102
mapping, 102 pullback (scalar field), 102
material derivative, 79 punctured plane, 82
matrix, 7 pushforward (function), 169
mean value theorem, 22 pushforward (vector field), 104
measure zero, 160
metric tensor, 135 quadratic form, 16
mixed tensor, 137
Morse lemma, 87 random variable, 171
rank (differential form), 116
nice region, 98
rectangular set, 45
nilpotent matrix, 12
regular implicit surface, 88
non-degenerate quadratic form, 16
regular locally parameterized surface,
non-singular matrix, 8
89
nonlinear pendulum, 77
regular parameterized surface, 90
normalized basis, 141
regular surface, 88
orbit, 2 replacement, 101
orientation, 106 Riemann integral, 45
oriented boundary (chain), 111 Riemannian geometry, 138
orthogonal coordinates, 141 rotation matrix, 12
orthogonal matrix, 8 rotation vector fields, 83
oscillation (at point), 48 row vector (form), 7
oscillation (on set), 48
outer content, 160 Sard’s theorem, 167
outer measure, 160 scalar field, 73
scaling matrix, 62
parameter cell, 111 second derivative matrix, 32

second derivative test, 87 upper integral, 45


second differential, 86 upper sum, 45
set partition, 45
shear matrix, 62 variant basis, 106
similar (matrices), 16 vector (column), 7
singular matrix, 8 vector field, 74
singular parameterized surface, 109 vector field along a mapping, 105
singular surface, 109 vector field along surface, 90
smooth function, 18 velocity, 135
solid angle form, 152, 154 volume, 129, 145, 149
spectral radius, 12 volume of cell, 44
spectrum, 12
speed, 136
sphere, 48
spherical polar coordinates, 142
stable fixed point, 3
Stokes’ theorem, 111
straightening out theorem, 75
strict contraction, 2
subadditive (on functions), 46
subadditive (on sets), 161
substantial derivative, 79
substitution, 101
summation convention, 144
superadditive, 46
support, 49
supremum (least upper bound), 44
surface, 88, 109
surface area, 149
surface element (transverse), 131
surface integral (form), 114
symmetric matrix, 8

tangent space, 19
tangent vector field, 90
tensor, 137
tensor algebra, 138, 144
tensor calculus, 138
tensor field, 137
trace, 8
transpose matrix, 7
transverse hypersurface element, 131
twisted form, 138

uniform convergence, 54
uniform convergence theorem, 55
