Book On Stokes Simpler
William G. Faris
1 Differentiation 1
1.1 Fixed point iteration (single variable) 2
1.2 The implicit function theorem (single variable) 5
1.3 Linear algebra review (norms) 7
1.4 Linear algebra review (eigenvalues) 12
1.5 Differentiation (multivariable) 16
1.6 Fixed point iteration (multivariable) 26
1.7 The implicit function theorem (multivariable) 28
1.8 Second order partial derivatives 32
1.9 Problems 34
2 Integration 43
2.1 The Riemann integral 44
2.2 Jordan content 48
2.3 Approximation of Riemann integrals 48
2.4 Fubini's theorem 50
2.5 Uniform convergence 54
2.6 Dominated convergence 55
2.7 Differentiating a parameterized integral 58
2.8 Approximate delta functions 60
2.9 Linear algebra (determinants) 61
2.10 Change of variables 64
2.11 Problems 65
3 Differential Forms 71
3.1 Coordinates 72
3.2 Scalar fields 73
3.3 Vector fields 74
3.4 Fluid velocity and the advective derivative 78
3.5 Differential 1-forms 79
3.6 Polar coordinates 82
3.7 Integrating factors and canonical forms 84
3.8 The second differential 86
3.9 Regular surfaces 88
The book
This book originated in lectures given in Fall 2014 at NYU Shanghai for an
advanced undergraduate course in multivariable analysis. There are chapters
on Differentiation, Integration, Differential Forms, The Metric Tensor, together
with an optional chapter on Measure Zero. The topics are standard, but the
attempt is to present ideas that are often overlooked in this context. The fol-
lowing chapter by chapter summary sketches the approach. The explanations in
the summary are far from complete; they are only intended to highlight points
that are explained in the body of the book.
Differentiation
The main themes of this chapter are standard.
• It begins with fixed point iteration, the basis of the subsequent proofs.
• The central object in this chapter is a smooth (that is, sufficiently differen-
tiable) numerical function f defined on an open subset U of Rk with values
in Rn . Here k is the domain dimension and n is the target dimension. The
function is called “numerical” to emphasize that both inputs and outputs
involve numbers. (See below for other kinds of functions.) Sometimes a
numerical function is denoted by an expression like y 7→ f (y). The choice
of the variable name y is arbitrary.
• The implicit function theorem gives conditions for when an implicit rep-
resentation gives rise to an explicit representation.
• The inverse function theorem gives conditions that ensure that the trans-
formation has an inverse transformation.
Integration
This chapter is about the Riemann integral for functions of several variables.
There are several interesting results.
• The Fubini theorem for Riemann integrals deals with iterated integrals.
Differential Forms
This chapter and the next are the heart of the book. The central idea is ge-
ometrical: differential forms are intrinsic expressions of change, and thus the
basic mechanism of calculation with differential forms is equally simple for ev-
ery possible choice of coordinate system. (Of course for modeling a particular
system one coordinate system may be more convenient than another.) The fun-
damental result is Stokes’ theorem, which is the natural generalization of the
fundamental theorem of calculus. Both the fundamental theorem in one dimen-
sion and Stokes’ theorem in higher dimensions make no reference to notions of
length and area; they simply describe the cumulative effect of small changes.
Because of this intrinsic nature, the expression of Stokes' theorem is the same
in every coordinate system.
• Example: Here is a simple example from physics that illustrates how nat-
ural it is to have a free choice of coordinate system. Consider an ideal gas
with pressure P , volume V , and temperature T . The number of gas par-
ticles N is assumed constant. Each of the quantities P, V, T is a function
of the state of the gas; these functions are related by the ideal gas law
P V = N kT.

Taking differentials of both sides gives

P dV + V dP = N k dT.
This equation is a precise description of how these variables change for
a small change in the state of the gas. A typical use of the fundamental
theorem of calculus is the calculation of work done by the system during
a change of state where the temperature is constant. This is obtained by
integrating −P dV along states of constant temperature T = T0 . Suppose
that in this process the volume changes from V0 to V1 and the pressure
changes from P0 to P1 . Then the form −P dV = −(N kT0 /V ) dV
has integral N kT0 log(V0 /V1 ). But nothing depends on using volume as
an independent variable. On the curve where T = T0 there is a relation
P dV + V dP = 0. So on this curve the form is also equal to V dP =
(N kT0 /P ) dP , with integral N kT0 log(P1 /P0 ), the same number.
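As a check, the isothermal work integral is easy to verify numerically. Here is a minimal R sketch, with hypothetical values N kT0 = 1, V0 = 1, V1 = 2:

NkT0 <- 1 # hypothetical value of N k T0
V0 <- 1 # hypothetical initial volume
V1 <- 2 # hypothetical final volume
integrate(function (V) -NkT0/V, V0, V1)$value # about -0.6931
NkT0 * log(V0/V1) # the same number, N k T0 log(V0/V1)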
The motion along the curve may be described either by the t coordinate or
the u coordinate, but the length of the curve between two points is a num-
ber that does not depend on this choice. In practice such integrals involv-
ing square roots of sums of squares can be awkward. Sometimes it helps
to use a different coordinate system in the plane.
For instance, in polar
coordinates the same form has the expression \sqrt{(dr/dt)^2 + r^2 (d\theta/dt)^2}\, dt.
The theory of differential forms gives a systematic way of dealing with all
such coordinate changes.
• Example: There are many examples of the modeling process from subjects
as varied as physics and economics. There are also examples internal to
mathematics, especially in geometry. Here is one that might occur in
elementary geometry. Let M be the set of geometric rectangles in the
plane. (The rectangles have a common corner and common alignment.)
Each rectangle has a length ℓ and a width w. So ℓ : M → R and w :
M → R are both scalar fields. Together (ℓ, w) form a coordinate system
that maps M onto an open quadrant in R2 . This coordinate system is
useful for describing the problem of painting a rectangular surface with
given dimensions. The product A = ℓw is also a scalar, the area of the
rectangle. The pair (A, w) forms another coordinate system. The new
system is useful for painting a rectangular surface with a given width
using a fixed amount of paint. The numerical function that relates these
two coordinate systems is (x, y) 7→ (xy, y) with inverse (s, t) 7→ (s/t, t).
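As a small illustration of this numerical function, here is an R sketch (the function names to_Aw and from_Aw are ours, not the book's):

to_Aw <- function (l, w) c(l * w, w) # the map (x, y) -> (xy, y)
from_Aw <- function (A, w) c(A / w, w) # the inverse (s, t) -> (s/t, t)
p <- to_Aw(3, 2) # a rectangle with length 3 and width 2 has area 6
from_Aw(p[1], p[2]) # recovers length 3 and width 2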
• The concept of scalar field leads to the concept of exact differential 1-form.
If s is a scalar field, the corresponding exact differential 1-form is written
ds. A general differential 1-form is a linear combination of products of
scalar fields with exact differential 1-forms. These concepts have precise
definitions that are given in the book.
dA = w dℓ + ℓ dw.

This is a rigorous equality of scalars: the left hand side is the composition
of the scalar function A : M → R with the active transformation ℓ ←
2w, w ← ℓ : M → M . An active transformation need not have an inverse:
the transformation ℓ ← \sqrt{ℓw}, w ← \sqrt{ℓw} maps rectangles to squares.
The distinction between passive and active transformations is widely rec-
ognized but not always made explicit. (Some authors use terminology in
which “passive” and “active” are replaced by the awkward terms “alias”
and “alibi”.)
• Given two manifold patches K and M there is a concept of manifold
map from K to M . The treatment in the book gives a practical notation
that may be used in computations. Suppose K has coordinate system
t = (t1 , . . . , tk ) and M has coordinate system u = (u1 , . . . , un ). Then
there is a smooth numerical function f such that the manifold map may
be represented in the form u ← f (t). What this means is: take the input
point in K, find its coordinate values using t, apply the function f to
these values to get new values, and finally define the output point in M to
be the point that has these values as u coordinates. The notation for an
active transformation is the special case when K = M and there is only
one coordinate system, so the transformation is u ← f (u).
• The left arrow notation for an active transformation is in perfect analogy
with the assignment notation used in computer science. In that case, a
(simultaneous) assignment means to start with the machine state, find
the values of the u variables, do the computation indicated by f (u) and
store the result in the locations indicated by the u variables, thus produc-
ing a new machine state. (In computer science and in mathematics the
equal sign is often used to indicate assignment. Since assignment is not
symmetric, this is a clash of notation.)
• There is a general concept of pullback of a scalar field s or a differential
form by a manifold map. In the case of a scalar field this is just composi-
tion. That is, if the scalar field
s = g(u) : M → R
ω = p du + q dv
flow crosses a surface. In the case of dimension n = 2 the vector field has
components a and b (that depend on u and v), and the corresponding flux
form is the 1-form a\sqrt{g}\, dv − b\sqrt{g}\, du. The divergence theorem then states
that

\int_R \frac{1}{\sqrt{g}} \left( \frac{\partial\, a\sqrt{g}}{\partial u} + \frac{\partial\, b\sqrt{g}}{\partial v} \right) \sqrt{g}\, du\, dv = \int_{\partial R} \left( a\sqrt{g}\, dv - b\sqrt{g}\, du \right).
The intuitive meaning is that mass production within the region pro-
duces flow across the boundary. The flow is described by a vector field, an
object that is easier to picture than a differential 1-form. Once the appro-
priate volume form is given, the divergence theorem may be formulated in
any number of dimensions and in arbitrary coordinates. The interpreta-
tion and use of the divergence theorem are illustrated in the book by the
solution of a conservation law.
• The definitions of gradient and Laplace operator require use of the metric
tensor. This leads to a choice of whether to use coordinate bases or or-
thonormal bases. The advantages of both appear when it is possible to use
orthogonal coordinates. The book explains how these ideas are related. In
particular, it makes the connection with treatments found in elementary
textbooks.
• The chapter concludes with formulas for surface area. A central idea is
an amazing generalization of the theorem of Pythagoras. The classical
theorem says that the length of a vector is the square root of the sum of
squares of the lengths of its projections onto the coordinate axes. One
case of the generalization says that the area of a parallelogram is the
square root of the sum of the squares of the areas of its projections onto
coordinate planes. Unfortunately, surface area calculations are not easy.
Measure zero
The notions of Riemann integral and Jordan content are almost obsolete; they
have been surpassed by the far more powerful and flexible theories of Lebesgue
integral and Lebesgue measure. The study of the Riemann integral is still useful,
if only to see how a subtle change in a definition can lead to a new theory that
improves it in every way. Roughly speaking, the Riemann integral or Jordan
content is defined by a one-step limiting process, while the Lebesgue integral or
Lebesgue measure uses a two-step limiting process. This makes all the difference.
• The first part of the chapter is a tiny portion of the Lebesgue theory that
fits in a nice way with the material presented earlier. This is the theory
of sets of Lebesgue measure zero. These sets have nice mapping proper-
ties, and they also are fundamental to the characterization of Riemann
integrable functions.
• The second topic is different. The usual surface area formula works for
an explicitly defined surface (that is, a parameterized surface). There is
another approach that works for a family of implicitly defined surfaces. For
such a family there are two associated objects. The fiber form describes
how to integrate over such a surface, and the co-area factor is a metric
quantity associated with the surface. The co-area formula states that the
surface area form is equal to the co-area factor times the fiber form. This
gives a direct way to do surface area calculations for implicitly defined
surfaces.
General references
For mathematics at this level it is helpful to see multiple approaches. The course
used Rudin [17] as an alternative text; it focused on the two chapters on Func-
tions of Several Variables and on Integration of Differential Forms. The book
by Spivak [18] gives a treatment at about the same level as Rudin. Flanders [6]
presents differential forms along with many applications. For a more advanced
version of the story, the reader may consult Barden and Thomas [2]. Perhaps
the relatively technical books by Morita [12] and by Agricola and Friedrich [1]
could be useful. The subject matter considered here overlaps with tensor anal-
ysis. The book by Lovelock and Rund [9] has a relatively traditional approach;
as a consequence one can find many useful formulas. The notes by Nelson [13]
are more abstract and sophisticated, but there is good information there too.
Acknowledgements
The students in this course were young, enthusiastic, and able; meeting them
over the course of a semester was a delight. Some of them were able to find
time to read the manuscript and make valuable comments. In particular, the
author is happy to thank Hong-Bin Chen and Xiaoyue Gong for their work on
a previous draft.
Chapter 1
Differentiation
Example: Take X to be the real line with g(x) = cos(x). Start
with an arbitrary real number. Then the iterates converge to a fixed point x∗
that agrees with 0.739 in the first three decimal places. This is an experiment
that is easy to do with a scientific calculator. |
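The same experiment takes a few lines of R:

x <- 1 # any starting value works
for (n in 1:100) x <- cos(x)
x # about 0.7390851
cos(x) - x # essentially zero, so x is numerically a fixed point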
When X is the real number line there is a lovely way of picturing the iteration
process. Consider the graph of the linear function y = x. Each iterate is
represented by a point (x, x) on this graph. Consider the graph of the given
function y = g(x). The next iterate is obtained by drawing the vertical line from
(x, x) to (x, g(x)) and then the horizontal line from (x, g(x)) to (g(x), g(x)). The
process is repeated as many times as is needed to show what is going on.
In the following we need stronger notions of continuity. A function g : X →
X is Lipschitz if there is a constant c ≥ 0 such that for all x and y in X we have
d(g(x), g(y)) ≤ c d(x, y). If this holds with some c < 1, then g is a strict contraction.
Proof: There can only be one fixed point. Suppose x = g(x) and y =
g(y) are fixed points. Then d(x, y) ≤ d(x, g(x)) + d(g(x), g(y)) + d(g(y), y) =
d(g(x), g(y)) ≤ cd(x, y). Since c < 1 we must have d(x, y) = 0, so x = y. This
proves the uniqueness of the fixed point.
The existence of the fixed point is shown via iteration. Start with x0 and
define the corresponding iterates xn . Then d(xn+1 , xn ) ≤ cn d(x1 , x0 ). Hence
for m > n
d(xm , xn ) ≤ d(xm , xm−1 ) + · · · + d(xn+1 , xn ) ≤ [cm + · · · + cn ]d(x1 , x0 ). (1.3)
Hence

d(x_m, x_n) \le [c^{m-n} + \cdots + 1]\, c^n\, d(x_1, x_0) \le \frac{1}{1-c}\, c^n\, d(x_1, x_0). (1.4)
This shows that the xn form a Cauchy sequence. Since the metric space is
complete, this sequence must converge to some x∗ . This is the desired fixed
point.
Example: Take again the example g(x) = cos(x). Let 1 ≤ r < π/2. Take the
metric space to be the closed interval X = [−r, r]. Since 1 ≤ r the function
g(x) = cos(x) maps X into itself. Furthermore, since r < π/2 there is a c < 1
such that the derivative g ′ (x) = − sin(x) satisfies |g ′ (x)| ≤ c. This is enough
to show that g is a strict contraction. So the theorem guarantees the existence
and uniqueness of the fixed point. Although there is no explicit formula for the
solution of cos(x) = x, the theorem defines this number with no ambiguity. |
The reasoning in this one-dimensional case may be formulated in a general
way. The following result gives a way of rigorously proving the existence of fixed
points. The hypothesis requires a bound on the derivative and the existence of
an approximate fixed point p.
Proposition 1.3 Let p be a real number, and consider the closed interval [p −
r, p + r] with r > 0. Suppose that |g 0 (x)| ≤ c < 1 for x in this interval. Fur-
thermore, suppose that |g(p) − p| ≤ (1 − c)r. Then g maps the interval into
itself and is a strict contraction, so it has a unique fixed point in this interval.
Furthermore, iteration starting in this interval converges to the fixed point.
Proof: It is clear that |g(x) − g(y)| ≤ c|x − y|. In order to show that g maps
the interval into itself, suppose |x − p| ≤ r. Then |g(x) − p| ≤ |g(x) − g(p)| +
|g(p) − p| ≤ c|x − p| + (1 − c)r ≤ r.
Sometimes it is helpful to have a result where one knows there is a fixed
point, but wants to show that it is stable, in the sense that fixed point iteration
starting near the fixed point converges to it. The following proposition is a
variant that captures this idea.
Proposition 1.4 Let p be a fixed point of g. Suppose that g 0 is continuous and
that |g 0 (p)| < 1. Then for c with |g 0 (p)| < c < 1 there is an r > 0 such that
|g 0 (x)| ≤ c < 1 for x in the closed interval [p − r, p + r]. Then g maps the
interval into itself and is a strict contraction. Furthermore, iterates starting in
this interval converge to the fixed point.
Fixed point iteration gives a rather general way of solving equations f (x) =
0. If a is an arbitrary non-zero constant, then the fixed points of
g(x) = x - \frac{1}{a} f(x) (1.5)
are the solutions of the equation. The trick is to pick a such that g is a strict
contraction. However,
g'(x) = 1 - \frac{1}{a} f'(x), (1.6)
so the strategy is to take a close to the values of f ′ (x) at points where f (x) is
close to zero. This is illustrated in the following result, which is a reformulation
of a previous result.
Proposition 1.5 Let p be a real number, and consider the closed interval [p −
r, p + r] with r > 0. Suppose that for some c with 0 < c < 1 we have

1 - c < \frac{1}{a} f'(x) < 1 + c

for x in this interval. Furthermore, suppose that |f (p)| ≤
|a|(1 − c)r. Then the corresponding fixed point iteration starting in this interval
converges to the solution of f (x) = 0 in this interval.
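Here is a minimal R sketch of this strategy, under the assumption that we want the root \sqrt{2} of f (x) = x^2 − 2; the constant a = 3 is close to f ′ (x) = 2x near the root:

f <- function (x) x^2 - 2
a <- 3 # close to f'(x) = 2x near the root
x <- 1 # start in an interval around the root
for (n in 1:50) x <- x - f(x)/a
x # about 1.414214, the root sqrt(2)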
The right hand side of this formula is continuous in y, so the left hand is also
continuous in y. This shows that the function h has a continuous derivative.
(BA)T = AT B T . (1.19)
between n by k matrices.
An n by n square matrix Q has an inverse R if QR = I and RQ = I. The
inverse of Q is denoted Q−1 . It is a well-known linear algebra fact that if the
inverse exists on one side, for instance QR = I, then also RQ = I, and the
inverse exists on both sides. For matrices there are natural notions of addition
and subtraction and of (non-commutative) multiplication. However some care
must be taken with division, since P Q−1 is in general different from Q−1 P .
Note the important identity

tr(AB) = tr(BA). (1.22)
The most interesting number associated to a square matrix A is its determi-
nant det(A). It has a relatively complicated definition, but there is one property
that is particularly striking. If A and B are square matrices of the same size, then

\det(AB) = \det(A) \det(B). (1.23)

A real symmetric matrix H may be diagonalized: there is an orthogonal matrix P
and a real diagonal matrix Λ with

HP = P Λ. (1.24)
The columns of P form an orthonormal basis, and these are the eigenvectors
of H. The diagonal entries in Λ are the eigenvalues of H. In general it is quite
difficult to compute such eigenvalues and eigenvectors.
The equation HP = P Λ may be written in various ways. Multiplication on
the right by P T gives H = P ΛP T . If λj are the eigenvalues, and pj are the
columns of P , then this gives the spectral representation of the matrix:
H = \sum_j \lambda_j\, p_j p_j^T . (1.25)
Example: Here is a simple example where one can do the computation. The
symmetric matrix is
H = \begin{pmatrix} 13 & 12 & 2 \\ 12 & 13 & -2 \\ 2 & -2 & 8 \end{pmatrix}. (1.26)
It is easy to see that this matrix has dependent rows, so the determinant is
zero. As a consequence at least one eigenvalue is zero. In this situation it is
easy to find the other two eigenvalues λ1 , λ2 . Use λ1 + λ2 = tr(H) = 34 and
λ1^2 + λ2^2 = tr(H^2) = 706. (For a symmetric matrix A, tr(A^2) = tr(A^T A) is the
square of the Euclidean norm ‖A‖2 .) All that remains is to solve a quadratic equation to get the non-zero
eigenvalues 25, 9. The corresponding eigenvectors are found by solving linear
systems. The eigenvectors form the column matrix
R = \begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & -2 \\ 0 & 4 & -1 \end{pmatrix}. (1.27)
Since the eigenvalues are distinct, the eigenvectors are automatically orthogonal.
This says that RT R is a diagonal matrix. If we normalize each column to be a
vector of length one, then we get a new matrix P such that P T P is the identity
matrix. In particular, we get the representation
H = 25 \cdot \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \end{pmatrix} + 9 \cdot \frac{1}{18} \begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix} \begin{pmatrix} 1 & -1 & 4 \end{pmatrix}. (1.28)
|
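The computation is easy to check in R (the eigen function may order and normalize the eigenvectors differently):

H <- matrix(c(13, 12, 2,
              12, 13, -2,
              2, -2, 8), nrow = 3, byrow = TRUE)
eigen(H)$values # 25 9 0
eigen(H)$vectors # columns proportional to (1,1,0), (1,-1,4), (2,-2,-1)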
Our main goal is to get a number to measure the size of a matrix. There is
a particularly simple definition of the size of a real column vector, namely
|x| = \sqrt{x^T x}. (1.29)
This will be called the Euclidean norm of the vector. There is a corresponding
notion of inner product of two real column vectors:
x · y = xT y. (1.30)
In other words, to compute the norm of A one computes the norm of the sym-
metric matrix AT A and takes the square root.
Example: Consider the matrix
A = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix}. (1.38)
It is easy to see that ‖A‖2 = \sqrt{34}. On the other hand, to compute ‖A‖ takes
some work. But the matrix A^T A is the matrix H of the previous example,
which has eigenvalues 25, 9, 0. So ‖A‖ = \sqrt{25} = 5.
There is a considerably easier way to do this computation, namely compute
kAT k, which is square root of the largest eigenvalue of AAT . The pleasure of
this is left to the reader. |
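In R both norms are one-liners; a sketch:

A <- matrix(c(3, 2, 2,
              2, 3, -2), nrow = 2, byrow = TRUE)
sqrt(sum(A^2)) # Euclidean norm: sqrt(34), about 5.83
sqrt(max(eigen(t(A) %*% A)$values)) # Lipschitz norm from A^T A: 5
sqrt(max(eigen(A %*% t(A))$values)) # the same from the smaller matrix A A^T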
Since eigenvalues are difficult to compute, this is bad news. However, there
is another norm of A that is easier to compute. This is the Euclidean norm
‖A‖_2 = \sqrt{\operatorname{tr}(A^T A)}. (1.39)
Even better news is that the Euclidean norm is an upper bound for the
Lipschitz norm. In fact,
|Ax|^2 = \sum_i \Big( \sum_j a_{ij} x_j \Big)^2 \le \sum_i \Big( \sum_j a_{ij}^2 \Big) \Big( \sum_k x_k^2 \Big) = \sum_i \sum_j a_{ij}^2 \Big( \sum_k x_k^2 \Big) = ‖A‖_2^2 |x|^2. (1.41)
This is summarized in the following proposition.
The proof is easy. Write a_{ij} = \sum_p a_{ip} \delta_{pj} . This is the i component of A
applied to the vector \delta_j that is 0 except for 1 in the jth place. Then

‖A‖_2^2 = \sum_j \sum_i a_{ij}^2 = \sum_j \sum_i \Big( \sum_p a_{ip} \delta_{pj} \Big)^2 = \sum_j |A \delta_j|^2 \le \sum_j ‖A‖^2 |\delta_j|^2 = n ‖A‖^2 . (1.44)
Again we shall mainly be interested in the cases when R is not diagonal, that
is, φ is not a multiple of π.
One more ingredient is necessary. A square matrix N is said to be nilpotent
if some power N^p = 0 with p ≥ 1. Such a matrix is small in the sense that every
eigenvalue of N must be zero. In other words, the spectral radius ρ(N ) = 0.
However the norm kN k can be quite large.
Much can be learned from the special case of 2 by 2 matrices. A 2 by 2
matrix A defines a linear function from R2 to R2 . For each x in R2 there is a
corresponding Ax in R2 . It is difficult to imagine the graph of such a function.
However there is a nice pictorial representation that is very helpful. One picks
several values of x and draws corresponding vectors Ax − x from x to Ax.
In the 2 by 2 case it is not difficult to compute the eigenvalues. The sum of
the eigenvalues is tr(A) and the product of the eigenvalues is det(A). From this
it is easy to see that the eigenvalues are the solutions of the quadratic equation
λ2 − tr(A)λ + det(A) = 0.
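In R the trace and determinant give the eigenvalues directly; a sketch with a hypothetical matrix:

A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE) # hypothetical example
tr <- sum(diag(A))
dt <- det(A)
polyroot(c(dt, -tr, 1)) # roots of lambda^2 - tr*lambda + dt = 0, here 1 and 3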
If A has positive eigenvalues less than one, then the vectors point in the
general direction of the origin. This is a stable case when the orbits go to
the fixed point at zero. If A has positive eigenvalues greater than one, then
they move away from the origin. If A has a positive eigenvalue greater than
one and another less than one, then the orbits move toward the eigenspace
corresponding to the larger of the two eigenvalues and then outward. If A has
negative eigenvalues, then there is some overshoot. However one can always
look at A2 and get pictures with just positive eigenvalues.
Theorem 1.12 Let A be an n by n matrix with real entries. Suppose that it has
n distinct eigenvalues. Then there exists a real matrix Λ with the same (possibly
complex) eigenvalues λk as A. Here Λ is a real matrix whose non-zero entries
are either real diagonal entries λk or two-by-two blocks of the form |λk |R(φk ),
where |λk | > 0 and R(φk ) is a rotation matrix. Furthermore, there exists a real
invertible matrix P such that
AP = P Λ. (1.47)
Suppose for simplicity that the eigenvalues are distinct and real, so Λ is
a real diagonal matrix. In this case there is a spectral theorem that directly
generalizes the result for the symmetric case. Let P have column vectors pj and
let P −1 have row vectors χj . Then A and Λ are similar matrices, and

A = P \Lambda P^{-1} = \sum_j \lambda_j\, p_j \chi_j . (1.48)
Then

A = P \Lambda Q = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 2 & -3 \\ -1 & 2 \end{pmatrix}. (1.50)

Here Q = P^{-1} is the inverse of P . The spectral representation is

A = 2 \begin{pmatrix} 2 \\ 1 \end{pmatrix} \begin{pmatrix} 2 & -3 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} \begin{pmatrix} -1 & 2 \end{pmatrix}. (1.51)
This defines a new Lipschitz norm on matrices. We can compute this norm in
the case of A (for which it was designed). We have
|Ax|G = |QAx| = |JQx| ≤ kJk|Qx| = kJk|x|G . (1.58)
This discussion is summarized in the following theorem. The theorem says that
the size of the matrix A is in some more profound sense determined by the
spectral radius ρ(A).
Theorem 1.14 For every real square matrix A with spectral radius ρ(A) and
for every δ > 0 there is a new Lipschitz norm defined with respect to a symmetric
matrix G with strictly positive eigenvalues such that

‖A‖_G \le \rho(A) + \delta.
One can wonder whether one has the right to change norm in this way. The
first observation is that notions of continuity do not change. This is because if
G = Q^T Q, then |x|_G = |Qx|, and since Q is invertible the two norms bound
each other up to constant factors.
Furthermore, the geometry only changes in a rather gentle way. Thus, a ball
|x|2G < r2 in the new norm is actually an ellipsoid |Qx|2 = xT Gx < r2 in the
original picture.
The weakness of this idea is that the new norm is specially adapted to the
matrix A. If one is dealing with more than one matrix, then the new norm that
is good for one may not be the new norm that is good for the other.
Example: Consider the previous example with distinct eigenvalues 2, −1. The
new norm is |Qx| = \sqrt{(2x - 3y)^2 + (-x + 2y)^2} = \sqrt{5x^2 - 16xy + 13y^2}. With
respect to this norm on vectors the matrix A has norm 2 (the spectral radius).
|
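This is easy to test numerically. In the R sketch below, A is the matrix obtained by multiplying out the spectral representation (1.50) above, and Q = P −1 :

Q <- matrix(c(2, -3,
              -1, 2), nrow = 2, byrow = TRUE) # Q = P^{-1}
A <- matrix(c(11, -18,
              6, -10), nrow = 2, byrow = TRUE) # A = P Lambda Q multiplied out
normG <- function (x) sqrt(sum((Q %*% x)^2)) # the new norm |x|_G = |Qx|
x <- c(1, 1)
normG(A %*% x) / normG(x) # at most 2, the spectral radius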
In the case of a 2 by 2 matrix A with a repeated eigenvalue λ it is easy to
find the decomposition. Let u be an eigenvector, so that (A − λI)u = 0. Find
some other vector v such that (A − λI)v = δu. Take P = [u, v] to be the matrix
with these vectors as columns. Then
AP = P \begin{pmatrix} \lambda & \delta \\ 0 & \lambda \end{pmatrix} . (1.62)
Example: Let

A = \begin{pmatrix} -4 & 4 \\ -9 & 8 \end{pmatrix} (1.63)

with repeated eigenvalue 2. Then

A = P J Q = \begin{pmatrix} 2 & -\frac{1}{3}\delta \\ 3 & 0 \end{pmatrix} \begin{pmatrix} 2 & \delta \\ 0 & 2 \end{pmatrix} \frac{1}{\delta} \begin{pmatrix} 0 & \frac{1}{3}\delta \\ -3 & 2 \end{pmatrix}. (1.64)
u1 = f1 (x1 , x2 , x3 ) (1.69)
u2 = f2 (x1 , x2 , x3 ).
u = f (x, y, z) (1.70)
v = g(x, y, z).
Each partial derivative is just an ordinary derivative in the situation when all
variables but one are held constant during the limiting process. The matrix of
partial derivatives is also called the Jacobian matrix.
Again it may help to look at an example. We have

f'(x) = \begin{pmatrix} f'_{1,1}(x) & f'_{1,2}(x) & f'_{1,3}(x) \\ f'_{2,1}(x) & f'_{2,2}(x) & f'_{2,3}(x) \end{pmatrix}. (1.72)
The reader should be warned that a notation like the one on the left is often used
in situations where the matrix is a square matrix; in that case some authors use
it to denote the matrix, others use it to denote the determinant of the matrix.
If u = f (x, y, z) and v = g(x, y, z) we can also write
" ∂u ∂u ∂u # dx
du
= ∂x ∂v
∂y
∂v
∂z
∂v
dy . (1.74)
dv ∂x ∂y ∂z dz
Thus the function is written as the value at x plus a linear term (given by
multiplying a matrix times a vector) plus a remainder term. The requirement
on the remainder term is that
\frac{|r(x, h)|}{|h|} \to 0 (1.77)
by the columns.) For each x one can look at the tangent space to the surface
at u = f (x). The tangent space is n dimensional and has the parametric form
f (x) + f 0 (x)h, where h is in Rn and parameterizes the tangent space.
In the special case when n = 1 the surface is actually just a curve. The
derivative at a point is the tangent vector. The tangent space at a point on the
curve is the tangent line at that point. For example, take the case when m = 2
and the curve is given by
u = cos(t) (1.79)
v = sin(t)
This describes a spherical surface. The angle s is the co-latitude and the angle
t is the longitude. The tangent plane is given by two parameters h, k as
Next, consider the case when m < n. In this case the set of x with f (x) = c
can be the implicit definition of a surface of dimension n − m in Rn . Suppose
that f 0 (x) has rank m. Then the tangent space to the surface at a point x
should have dimension n − m. It should consist of the points x̄ such that
f 0 (x)(x̄ − x) = 0.
When m = 1 the surface has dimension n − 1 and is called a hypersurface.
If u = f (x1 , . . . , xn ), then the partial derivatives ∂u/∂xi = f ′_{,i}(x1 , . . . , xn ) form
a covector. We often write this in the differential notation as

du = \frac{\partial u}{\partial x_1} dx_1 + \cdots + \frac{\partial u}{\partial x_n} dx_n . (1.83)
A simple example is a sphere given by
x2 + y 2 + z 2 = 1. (1.84)
The differential of the left hand side is 2x dx + 2y dy + 2z dz. The tangent plane
at a point is found by solving the equation
2x(x̄ − x) + 2y(ȳ − y) + 2z(z̄ − z) = 0. (1.85)
u = q (1.88)
v = bp.
Here b is a parameter with 0 < b < 1, representing some kind of contraction or
dissipation. It decreases area, but in a simple way. The combined transforma-
tion is the Hénon map
u = 1 − ax2 + y (1.89)
v = bx.
It may be thought of as a prediction of the future state of the system from the
present state. Notice that this is an active operation; the state changes. It is
possible to iterate this map many times, in an attempt to predict the state far
into the future. This example has been the subject of much research. It turns
out that reliable prediction far into the future is quite difficult. |
The composition of two functions g and f is the function (g ◦ f ) defined by
(g ◦ f )(x) = g(f (x)). The chain rule describes the derivative of such a function.
Theorem 1.15 (Chain rule) Suppose that the derivatives f ′ (x) and g′ (f (x))
exist. Then
(g ◦ f )′ (x) = g′ (f (x)) f ′ (x). (1.90)
The left hand is the derivative of the composition of the two functions, while
the right hand side is the matrix product representing the composition of their
derivatives, evaluated at the appropriate points.
Proof: Suppose f (x+h) = f (x)+f ′ (x)h+r(x, h) with |r(x, h)| ≤ ε(x, h)|h|.
Similarly, suppose g(u+k) = g(u)+g′ (u)k+s(u, k) with |s(u, k)| ≤ η(u, k)|k|.
Take u = f (x) and k = f ′ (x)h. Then

g(f (x+h)) = g(u+k+r(x, h)) = g(u)+g′ (u)k+g′ (u)r(x, h)+s(u, k+r(x, h)). (1.91)

We need to show that the two remainder terms are appropriately small. First,

|g′ (u) r(x, h)| \le ‖g′ (u)‖\, ε(x, h)\, |h|. (1.92)

Second,

|s(u, k+r(x, h))| \le η(u, k+r(x, h))\, |k+r(x, h)| \le η(u, f ′ (x)h+r(x, h))\, (‖f ′ (x)‖+ε(x, h))\, |h|. (1.93)
The chain rule has various important consequences. For instance, in the case
when m = n it is possible that f has an inverse function g such that f (g(y)) = y.
It follows from the chain rule that f ′ (g(y)) g′ (y) = I, so that g′ (y) = f ′ (g(y))−1 .
This would say that there is a point on the segment between x and y where the
derivative accurately predicts the change. But this can be false!
The following is a statement of a true version of the mean value theorem.
The idea is to average over the segment. (The hypothesis of this particular
version is that f 0 (x) not only exists but is continuous in x. A version requiring
differentiability but not continuous differentiability may be found in Rudin.)
Proof: Use the fundamental theorem of calculus and the chain rule to
compute
f (y) − f (x) = \int_0^1 \frac{d}{dt} f ((1 − t)x + ty)\, dt = \int_0^1 f ′ ((1 − t)x + ty)(y − x)\, dt. (1.98)
The norm of the integrand in the last integral is bounded by M |y − x|, uniformly
in t. So the integral is bounded by M |y − x|, as in the theorem.
The mean value theorem idea also works to prove the result about continuous
partial derivatives.
Proof: It is evident that if the derivative exists and is continuous, then the
partial derivatives exist and are continuous. All the work is to go the other way.
The existence and continuity of the partial derivatives implies the following
statement. Let z be in the open set on which the partial derivatives exist and
are continuous. Let h be a vector in one of the coordinate directions. Then
df (z + th)/dt = f 0 (z + th)h exists for sufficiently small t, and the matrix of
This represents the total change as the sum of changes resulting from increment-
ing one coordinate at a time. We can use the fundamental theorem of calculus
to write this as
f (x + h) − f (x) = \sum_{i=1}^{n} \int_0^1 f ′ (x + h[i−1] + th(i) )\, h(i)\, dt. (1.101)
Notice that each term in the sum only involves one coordinate direction. Fur-
thermore each integrand is continuous in t. It follows that
f (x + h) − f (x) − f ′ (x)h = \sum_{i=1}^{n} \int_0^1 [f ′ (x + h[i−1] + th(i) ) − f ′ (x)]\, h(i)\, dt. (1.102)
Hence
|f (x + h) − f (x) − f ′ (x)h| \le \sum_{i=1}^{n} \int_0^1 ‖f ′ (x + h[i−1] + th(i) ) − f ′ (x)‖\, dt\; |h|. (1.103)
This matrix notation is just another way of writing six equations. For ex-
ample, one of these equations is
\frac{\partial q}{\partial y} = \frac{\partial q}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial q}{\partial v}\frac{\partial v}{\partial y} . (1.107)
Then

\frac{\partial p}{\partial x} = h′_{,1}(x, g(x, y)) + h′_{,2}(x, g(x, y))\, g′_{,1}(x, y). (1.109)
When the functional relationships are specified there is no ambiguity. How-
ever this could also be written with p = h(x, v), v = g(x, y) and hence p =
h(x, g(x, y)). Then
\frac{\partial p}{\partial x} = \frac{\partial p}{\partial x} + \frac{\partial p}{\partial v}\frac{\partial v}{\partial x} . (1.110)
Now the problem is evident: the expression ∂p/∂x is ambiguous, at least until it
is made clear what other variable or variables are held constant. If we indicate
the variable that is held constant with a subscript, we get a more informative
equation
\frac{\partial p}{\partial x}\Big|_y = \frac{\partial p}{\partial x}\Big|_v + \frac{\partial p}{\partial v}\Big|_x \frac{\partial v}{\partial x}\Big|_y . (1.111)
|
In general, if p = h(u, v), u = f (x, y), v = g(x, y), a more precise notation
for partial derivatives should be
\frac{\partial p}{\partial x}\Big|_y = \frac{\partial p}{\partial u}\Big|_v \frac{\partial u}{\partial x}\Big|_y + \frac{\partial p}{\partial v}\Big|_u \frac{\partial v}{\partial x}\Big|_y . (1.112)
In practice one usually does not indicate which variables are held constant unless
there is risk of confusion. But one should be clear that the partial derivative
with respect to a variable depends on the entire coordinate system.
The second ambiguity is special to the chain rule. Say that p = g1 (u, v),
q = g2 (u, v) and u = f1 (x, y), v = f2 (x, y). Then
\frac{\partial q}{\partial y} = g′_{2,1}(f_1(x, y), f_2(x, y))\, f ′_{1,2}(x, y) + g′_{2,2}(f_1(x, y), f_2(x, y))\, f ′_{2,2}(x, y). (1.113)
\frac{\partial y}{\partial x} = \frac{\partial (y_1 , \ldots , y_m)}{\partial (x_1 , \ldots , x_n)} . (1.116)
Warning: When m = n some authors (including Rudin) use the notation on the
right hand side for the determinant of the Jacobian matrix.
If p = g(y), the chain rule says that

\frac{\partial p}{\partial x} = \frac{\partial p}{\partial y}\, \frac{\partial y}{\partial x} , (1.117)
dy = \frac{\partial y}{\partial x_1} dx_1 + \cdots + \frac{\partial y}{\partial x_n} dx_n . (1.118)
This will eventually have a rigorous definition in the context of differential forms.
One possible meaning of dx is as a column vector of dxi . In this setting we can
write formulas such as
\frac{dy}{dt} = \frac{\partial y}{\partial x} \frac{dx}{dt} . (1.119)
On the right hand side this is a row covector times a column vector.
There is another meaning for dx. This is as a formula that occurs in inte-
grands:
dx = dx1 · · · dxn = dx1 ∧ · · · ∧ dxn . (1.120)
The product ∧ is the exterior product, to be explained in the chapter on differ-
ential forms. We shall see that when m = n it is natural to write
\frac{dy}{dx} = \det \frac{\partial y}{\partial x} = \det \frac{\partial (y_1 , \ldots , y_n)}{\partial (x_1 , \ldots , x_n)} . (1.121)
Proof: From the mean value theorem it follows that |g(x)−g(y)| ≤ c|x−y|.
In order to show that g maps the ball into itself, suppose |x − p| ≤ r. Then
|g(x) − p| ≤ |g(x) − g(p)| + |g(p) − p| ≤ c|x − p| + (1 − c)r ≤ r.
Sometimes it is helpful to have a result where one knows there is a fixed point,
but wants to show that it is stable, in the sense that fixed point iteration starting
near the fixed point converges to it. The following proposition is a variant that
captures this idea.
In the multidimensional case this result need not give a particularly good
account of stability, since the stability should be established by the spectral
radius ρ(g0 (p)), and the norm kg0 (p)k can be much larger. So the following
result is better.
Proof: For every δ > 0 there is a new norm |x|G so that ‖g′ (p)‖G ≤
ρ(g′ (p)) + δ. Since ρ(g′ (p)) < 1, we can pick the norm so that ‖g′ (p)‖G < 1.
Then the continuity of g′ (x) in x shows that for c with ‖g′ (p)‖G < c < 1 there
is an r > 0 such that |x − p|G ≤ r implies ‖g′ (x)‖G ≤ c < 1. Then g maps
this ball into itself and is a strict contraction. Furthermore, iterates starting
in this ball converge to the fixed point. Of course with respect to the original
Euclidean norm this ball is an ellipsoid.
When n = 2 there is a lovely way of picturing the function and the iteration
process. The idea is to plot vectors g(x) − x. A sequence of such vectors with
the tail of the next one equal to the tip of the previous one indicates the orbit.
The function itself may be pictured by drawing representative orbits. Near a
fixed point p the function g(x) is close to g(p)+g0 (p)(x−p) = p+g0 (p)(x−p).
Thus g(x) − p is close to g0 (p)(x − p), and so the picture resembles the picture
for the linear transformation g0 (p). In particular, the eigenvalues give insight
into the behavior that is expected.
Example: Define a function by
u = f (x, y) = \frac{1}{2}(x^2 − y^2) + \frac{1}{2} (1.122)
v = g(x, y) = xy + \frac{1}{4}
This has a fixed point where x and y are both equal to 1/2. The linearization
at the fixed point is
\begin{pmatrix} x & -y \\ y & x \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} (1.123)
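The eigenvalues of this matrix are 1/2 ± i/2, of modulus \sqrt{1/2} < 1, so iteration near the fixed point spirals into it. A minimal R check:

f <- function (x, y) (x^2 - y^2)/2 + 1/2
g <- function (x, y) x * y + 1/4
x <- 0.4
y <- 0.6
for (n in 1:100) {
  r <- f(x, y) # simultaneous update, as in the text
  s <- g(x, y)
  x <- r
  y <- s }
c(x, y) # converges to the fixed point (1/2, 1/2)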
Fixed point iteration gives a rather general way of solving equations f (x) = 0.
If A is an arbitrary non-singular matrix, then the fixed points of

g(x) = x − A^{-1} f (x) (1.124)

are the solutions of the equation. The trick is to pick A such that g is a strict
contraction. However,
g′ (x) = I − A^{-1} f ′ (x), (1.125)
so the strategy is to take A close to values of f 0 (x) near a point where f (x) is
close to zero. This is illustrated in the following result, which is a reformulation
of a previous result.
Proposition 1.21 Let p be a point in Rn , and consider the closed ball |x−p| ≤
r with r > 0. Suppose that for some c with 0 < c < 1 we have ‖I −A−1 f ′ (x)‖ ≤ c
for x in this ball. Furthermore, suppose that |A−1 f (p)| ≤ (1 − c)r. Then
the corresponding fixed point iteration starting in this ball converges to the
solution of f (x) = 0 in this ball.
taking the limit inside the integral. The integrands become independent of k,
and we obtain the desired formula for this vector, namely
h′ (y)u = −A(0)−1 B(0)u = −f ′_I (h(y), y)−1 f ′_J (h(y), y)u. (1.136)
The right hand side of this is continuous in y. This shows that the left hand side
is continuous in y. As a consequence, h(y) as a function of y is differentiable,
and the derivative h0 (y) as a function of y is continuous.
The last part of the above proof seems complicated but is actually a straight-
forward application of the technique of the mean value theorem. It follows
unpublished notes of Joel Feldman.
The implicit function theorem has a geometric interpretation. Consider the
case m < n and a function f (x, y) for x in Rm and y in Rn−m , where the
function values are in Rm . A surface of dimension n−m in Rn is given implicitly
by f (x, y) = c , that is, f (x, y) − c = 0. The theorem says that we can write
x = h(y), where h is a function from Rn−m to Rm , such that f (h(y), y) = c.
Thus x = h(t), y = t is a parametric representation of the surface.
Example: Consider the example from Rudin with f1 (x, y, u, v, w) = 2e^x + yu −
4v + 3 and f2 (x, y, u, v, w) = y cos(x) − 6x + 2u − w; both vanish at (x, y, u, v, w) =
(0, 1, 3, 2, 7), so near that point the equations f1 = 0, f2 = 0 implicitly define
(x, y) as a function of (u, v, w). |
Example: Here is an R program to carry this out for the input (u, v, w) =
(3, 2, 6), which one hopes is near enough to (u, v, w) = (3, 2, 7) where we know
the solution.
f1 <- function (x,y,u,v,w) 2 * exp(x) + y * u - 4 * v + 3
f2 <- function (x,y,u,v,w) y * cos(x) - 6 * x + 2 * u - w
# start from the known solution (x, y) = (0, 1) at (u, v, w) = (3, 2, 7)
x <- 0
y <- 1
# the new input values
u <- 3
v <- 2
w <- 6
# iteration functions: the matrix (1/20) [[1, -3], [6, 2]] is the inverse of
# the derivative of (f1, f2) with respect to (x, y) at (0, 1, 3, 2, 7)
g1 <- function (x,y) x - ( f1(x,y,u,v,w) - 3 * f2(x,y,u,v,w) )/20
g2 <- function (x,y) y - ( 6 * f1(x,y,u,v,w) + 2 * f2(x,y,u,v,w) )/20
# simultaneous update: compute both new values before assigning either
for (n in 1:40) {
r <- g1(x,y)
s <- g2(x,y)
x <- r
y <- s }
x
[1] 0.1474038
y
[1] 0.8941188
This result is for input (3,2,6), and it is reasonably close to the point (0,1)
that one would get for input (3,2,7). Of course to get a better idea of the function
this computation needs to be repeated for a variety of inputs near (3,2,7). Even
then one only gets an idea of the function at inputs sufficiently near this point.
At inputs further away fixed point iteration may fail, and the behavior of the
function is harder to understand. |
The inverse function theorem is the special case of the implicit function
theorem when f (x, y) is of the form f (x) − y. It is of great importance. For
example, if a system is described by variables x1 , . . . , xn , and ui = fi (x1 , . . . , xn )
gives new variables, we might want to describe the system by u1 , . . . , un . This
would be a passive transformation. But can one recover the original variables?
If the matrix ∂ui /∂xj is non-singular, then the inverse function theorem says
that it should be possible. That is, we have xj = hj (u1 , . . . , un ).
naturally form a covector. How about the second order partial derivatives?
These can be arranged in a matrix

f ′′ (x) = f ′′ (x, y, z) = \begin{pmatrix} f''_{,11}(x) & f''_{,12}(x) & f''_{,13}(x) \\ f''_{,21}(x) & f''_{,22}(x) & f''_{,23}(x) \\ f''_{,31}(x) & f''_{,32}(x) & f''_{,33}(x) \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 u}{\partial x^2} & \frac{\partial^2 u}{\partial x\,\partial y} & \frac{\partial^2 u}{\partial x\,\partial z} \\ \frac{\partial^2 u}{\partial y\,\partial x} & \frac{\partial^2 u}{\partial y^2} & \frac{\partial^2 u}{\partial y\,\partial z} \\ \frac{\partial^2 u}{\partial z\,\partial x} & \frac{\partial^2 u}{\partial z\,\partial y} & \frac{\partial^2 u}{\partial z^2} \end{pmatrix}. (1.144)
One ordinarily expects that this is a symmetric matrix, that is,
f''_{,12}(x) = \frac{\partial^2 u}{\partial x\,\partial y} = \frac{\partial^2 u}{\partial y\,\partial x} = f''_{,21}(x) (1.145)
f ′_{,i}(x) = \frac{\partial u}{\partial x_i} (1.156)
vanishes. Consider the second derivative Hessian matrix f ′′ (x) with entries

f''_{,ij}(x) = \frac{\partial^2 u}{\partial x_i \partial x_j} . (1.157)
This is a symmetric matrix with real eigenvalues. If at the given point all eigen-
values of this matrix are strictly positive, then the function has a local minimum.
Similarly, if all eigenvalues of this matrix are strictly negative, then the function
has a local maximum.
1.9 Problems
Problems 1: Fixed point iteration
1. Let g(x) = cos(x/2). It has a stable fixed point r > 0 with g(r) = r. Use
fixed point iteration to find a numerical value for r. Also find g ′ (r).
4. Let g(x) = (1/2)(x2 + 2x3 − x4 ). This has four fixed points r1 < r2 < r3 <
r4 . Find them, and specify which ones are stable. Compute everything
exactly.
5. In the preceding problem, prove that if r1 < x < r3 , then fixed point
iteration starting at x converges to r2 . Give a detailed discussion. Hint:
It may help to carefully draw a graph and use the graphical analysis of
fixed point iteration. Do not make assumptions about the graph that are
not justified.
Recitation 1
1. Use fixed point iteration to numerically find the largest root r of f (x) =
x3 − 5x2 + 3x + 1 = 0. Use g(x) = x − f (x)/f ′ (s), where s is chosen close
to the unknown root r. (Since f (4) = −3 is not very large, perhaps s
could be near 4.) Start the iteration near the root.
2. Consider a smooth function f (x) with a simple root r, that is, f (r) = 0
and f ′ (r) ≠ 0. Let g(x) = x − f (x)/f ′ (x). Find g ′ (x). Find g ′ (r).
3. Use the iteration function of the previous problem to numerically find the
largest root for the example of the first problem.
4. Suppose that g : [a, b] → [a, b] is an increasing function: x ≤ y implies
g(x) ≤ g(y). Prove or disprove the following general assertion: There
exists s in [a, b] such that s is not a fixed point and iteration starting at s
converges to a fixed point.
5. Suppose that g : [a, b] → [a, b] is an increasing function: x ≤ y implies
g(x) ≤ g(y). Prove or disprove the following general assertion: The func-
tion g has a fixed point.
This matrix has determinant zero, so one eigenvalue is zero. Find all
eigenvalues. Find the corresponding eigenvectors, as column vectors. (Are
they orthogonal?) Produce a matrix P with the normalized eigenvectors
Find the Lipschitz norm of A (the square root of the largest eigenvalue of
AT A). Find the 2 norm of A (the square root of sum of squares of entries,
or, equivalently, the square root of the trace of AT A). Compare them.
3. This problem deals with the Lipschitz norm. Say that A is a real square
matrix. The claim is that it is always true that ‖A^2‖ = ‖A‖^2 . Prove or
disprove.
5. Find all real square matrices A such that kAk = kAk2 . If you need a hint,
see below.
Hint: Consider a vector x that is not the zero vector, and another vector
a. The Schwarz inequality says that the inner product a·x satisfies |a·x| ≤
|a||x| with equality only when a = cx. (Since a · x = |a||x| cos(θ), this
is when cos(θ) = ±1, so the vectors are either pointing in the same or
opposite direction.)
Use the Schwarz inequality for each i to prove

|Ax|^2 = \sum_i \Big( \sum_j a_{ij} x_j \Big)^2 \le \sum_i \Big( \sum_j a_{ij}^2 \Big) \Big( \sum_k x_k^2 \Big) = ‖A‖_2^2 |x|^2 . (1.162)
When is this an equality? (Consider the situation for each fixed i.) Once
you have the form of the matrix you can calculate AT A and evaluate the
norms.
Recitation 2
1. Describe all 2 by 2 matrices with only one eigenvalue that are not diago-
nalizable.
4. Let

R = \begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix} = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} \begin{pmatrix} \cos(\theta) & \sin(\theta) \end{pmatrix} - \begin{pmatrix} \sin(\theta) \\ -\cos(\theta) \end{pmatrix} \begin{pmatrix} \sin(\theta) & -\cos(\theta) \end{pmatrix}. (1.165)
Check this identity. Find the eigenvalues and eigenvectors. Find R2 .
4. This continues the previous problem. Say that s = sin(uev ). Find ∂s/∂u
and ∂s/∂v. Find s as a function of x and y. Use the chain rule to evaluate
the two entries of the derivative matrix (row covector) ∂s/∂x and ∂s/∂y.
5. Let

u = f (x, y) = \frac{x^3}{x^2 + y^2} (1.167)
with f (0, 0) = 0 at the origin.
a) Show that u is continuous at the origin by direct calculation using the
definition of continuity.
b) Evaluate ∂u/∂x = f ′_{,1}(x, y) and ∂u/∂y = f ′_{,2}(x, y) away from the origin.
Evaluate ∂u/∂x and ∂u/∂y at the origin, using the definition of the (one-
dimensional) derivative.
c) Is u = f (x, y) a C 1 function? That is, are the partial derivatives ∂u/∂x
and ∂u/∂y both continuous? Prove that your answer is correct by direct
calculation.
d) The condition for u = f (x, y) to be differentiable at the origin is that

f (h, k) = f (0, 0) + f ′_{,1}(0, 0)h + f ′_{,2}(0, 0)k + r(0, 0; h, k) (1.168)

with |r(0, 0; h, k)|/\sqrt{h^2 + k^2} → 0 as \sqrt{h^2 + k^2} → 0. Is the function differ-
entiable? Using only this definition, prove that your answer is correct.
Recitation 3
1. Define
g1 (x, y) = xy − 2x − 2y + 6 (1.169)
g2 (x, y) = xy − 2x + 1.
s = x4 + x2 y 2 + y 4 + y 2 z 2 + z 4 + z 2 x2 = 1. (1.171)
(a) Calculate the differential of the function defining the surface. For
which points on the surface does the differential vanish? (b) For which
points on the surface does the implicit function theorem define at least
one of the variables as a function of the other two near the point?
2. (a) In the preceding problem it should be possible to solve for y in terms
of x, z near the point (0,1,0). Find a function g(y; x, z) such that fixed
point iteration y 7→ g(y; x, z) with this function (for fixed x, z) gives the
4. The function of the previous problem maps the point (0, 1) to (1, 0). There
is an inverse function that sends points (u, v) near (1, 0) to points (x, y)
near (0, 1). Find functions g1 (x, y; u, v) and g2 (x, y; u, v) such that fixed
point iteration with these functions (for fixed (u, v)) give the correspond-
ing inverse values x, y. Express these functions in terms of f1 (x, y), f2 (x, y)
and u, v. (Use the algorithm involving the derivative matrix of f1 (x, y), f2 (x, y)
evaluated at the point (0, 1).)
5. Consider the iteration function g1 (x, y; u, v), g2 (x, y; u, v) found in the pre-
vious problem. Show that if |x| < 1/100 and |y − 1| < 1/100, then the
linearization at such a point has norm bounded by 1/2. (Hint: Bound the
2-norm.)
Recitation 4
1. (a) Is xy 4 dx + 2x2 y 3 dy exact? Justify your answer.
(b) Is 3x2 y 2 dx + 4x3 y dy exact? Justify your answer.
There is a point on the line y = 0 where du = 0. Find it, and find the
corresponding value of u. Compute the Hessian matrix. Apply the second
derivative test to establish whether it is a local minimum, local maximum,
or saddle point (or something else).
Chapter 2
Integration
∀x ∈ S x ≤ sup S, (2.1)
∀x ∈ S x ≤ b ⇒ sup S ≤ b. (2.2)
Equivalently,
b < sup S ⇒ ∃x ∈ S b < x. (2.3)
Similarly, inf S is the number with the property that it is the greatest lower
bound of S. That is, it is a lower bound:
∀x ∈ S inf S ≤ x, (2.4)
∀x ∈ S b ≤ x ⇒ b ≤ inf S. (2.5)
Equivalently,
inf S < b ⇒ ∃x ∈ S x < b. (2.6)
If f is a real function defined on some set, then the supremum and infimum of
the function are the supremum and infimum of the set of values of the function.
An interval is a subset of R that is connected. It is degenerate if it is empty
or consists of only one point. An n-dimensional cell is a subset of Rn that is a
product of n intervals. An n-dimensional cell is bounded if and only if each of
the n intervals is bounded. For a bounded cell we may define the n-dimensional
volume by

m_n (I) = \prod_{i=1}^{n} \Delta x_i , (2.7)
Furthermore, we can take more and more refined partitions and get correspond-
ing lower and upper integrals. More precisely, we have the lower integral
L(f ) = \sup_P L(f, P) and the upper integral U (f ) = \inf_P U (f, P).
Here P ranges over all partitions of the set A. It is not hard to show that
L(f ) ≤ U (f ).
If L(f ) = U (f ), then we say that f is Riemann integrable with integral
I(f ) = L(f ) = U (f ). Warning: There are functions that are not Riemann
integrable; for such functions L(f ) < U (f ).
The reader will recall examples from the case n = 1. If f is defined on [a, b]
and f is monotone increasing (or monotone decreasing), then f is Riemann
integrable. On the other hand, if f (x) = 1 for x rational and f (x) = 0 for x
irrational, then L(f ) = 0, while U (f ) = b − a.
The lower and upper integrals are always defined, but in general they have a
major defect: they need not be additive. The following proposition states what
is true in general: the lower integral is superadditive (on functions), and the
upper integral is subadditive.
L(f ) + L(g) ≤ L(f + g), (2.13)
U (f + g) ≤ U (f ) + U (g). (2.14)
Proof: It is sufficient to prove this for the case of the lower integral. We
have for each x in I the inequality

\inf_{x \in I} f (x) + \inf_{x \in I} g(x) \le f (x) + g(x), (2.15)

so the left hand side is a lower bound. Therefore the greatest lower bound
satisfies
\inf_{x \in I} f (x) + \inf_{x \in I} g(x) \le \inf_{x \in I} (f (x) + g(x)). (2.16)
So we have an upper bound for the L(f, Q). The least upper bound L(f ) must
then satisfy
L(f ) + L(g, R) ≤ L(f + g). (2.18)
Similarly, we have an upper bound for the L(g, R). The least upper bound L(g)
must then satisfy
L(f ) + L(g) ≤ L(f + g). (2.19)
Theorem 2.2 The Riemann integral is additive for Riemann integrable func-
tions:
I(f + g) = I(f ) + I(g). (2.20)
The above theorem is the main reason for defining the Riemann integral
as the common value of the lower and upper integrals. Things can go wrong
when the upper and lower integrals differ. The reader will find it not difficult
to produce an example in one dimension where L(f + g) > L(f ) + L(g).
Let A be an arbitrary bounded subset of Rn . All these notions may be
extended to the case of a bounded function f : A → R. We merely have to
choose a rectangular set C such that A ⊆ C. Then we can define f¯ to be equal
This implies that the integral exists if and only if the infimum over all these
oscillation sums is zero.
If f is integrable and h is Lipschitz, then so is h ◦ f .
Every C 1 function is Lipschitz on every bounded set. So the above result
applies to functions such as h(y) = y 2 .
Theorem 2.4 Suppose that f and g are Riemann integrable. Then so is the
product f · g.
Let P+ be the set of I with fI ≥ 0, while P− is the set of I with fI < 0. For I
in P+ define continuous gI with support in I and with 0 ≤ gI ≤ fI ≤ f on I
and with fI mn (I) − I(gI ) very small. Thus gI can be a continuous trapezoidal
function with very steep sides. For I in P− define continuous gI ≤ 0 with
compact support and with gI ≤ fI ≤ f on I and with fI mn (I) − I(gI )
very small. Again gI can be a continuous trapezoidal function with very steep
sides; however now it has constant value fI on all of I and consequently has a
slightly larger support. Let g = \sum_{I \in P} gI . It is not difficult to show that g ≤ f .
Furthermore, we can arrange it so that

I(g) > \sum_{I \in P} fI\, mn (I) − \frac{\epsilon}{2} . (2.35)
and also

\int f (x, y)\, dx\, dy = \int \Big( \int f (x, y)\, dy \Big)\, dx. (2.37)
Here x ranges over a subset of Rm , and y ranges over a subset of Rn . The left
hand side is an ordinary Riemann integral over a subset of Rm+n . The right
hand side is an iterated integral. Thus in the first case for each fixed y there
is a corresponding m dimensional Riemann integral. These integrals define a
function of y, and the n-dimensional integral of this function gives the final
iterated integral.
In the following theoretical development we shall use a somewhat different
notation. The reason for this is that we shall be comparing lower and upper
sums, and the variant notation makes it easy to incorporate these notions. In
particular, the above formulas will be written
and
I(f (x, y); x, y) = I(I(f (x, y); y); x). (2.39)
We shall see that in certain circumstances these formulas are true. However
there are technical issues. For example, suppose that the Riemann integral
I(f (x, y); x, y) exists. Then it is not guaranteed that for fixed y the integral
I(f (x, y); x) exists. Nor is it guaranteed that for each fixed x that the integral
I(f (x, y); y) exists.
Example: Consider the following function defined on [0, 1]×[0, 1]. Let f (x, y) =
1 when x is rational and y = 1/2, but f (x, y) = 0 elsewhere. This is Riemann
integrable with integral zero. However for y = 1/2 the function x 7→ f (x, y) is
not Riemann integrable. In fact, its lower integral is 0, while its upper integral
is 1. |
Theorem 2.7 (Fubini’s theorem for lower and upper integrals) For lower
integrals
L(f (x, y); x, y) ≤ L(L(f (x, y); x); y), (2.40)
while for upper integrals

U (U (f (x, y); x); y) ≤ U (f (x, y); x, y). (2.41)
There are similar results where the roles of x and y are reversed.
Proof: It is sufficient to prove the result for lower integrals. The result for
upper integrals is proved in the same way, but with the inequalities reversed.
For the proof, it is useful to have the concept of a product partition. Let
C = C1 × C2 be the cell over which the integration takes place. The x variables
range over C1 , while the y variables range over C2 . If P1 is a partition of C1 ,
and P2 is a partition of C2 , then the product partition P1 × P2 is the partition
of C consisting of all I1 ×I2 with I1 from C1 and I2 from C2 . Given an arbitrary
partition P of C, then there is a product partition that is finer than P. So it is
reasonable to first deal with product partitions.
First we need a simple lemma that only involves sums, not integrals. This
states that
L(f, P1 × P2 ) ≤ L(L(f (x, y); x, P1 ); y, P2 ). (2.42)
The proof of the lemma uses inf (x,y)∈I1 ×I2 f (x, y) = inf y∈I2 inf x∈I1 f (x, y).
The key ingredient is then the product property mm+n (I1 ×I2 ) = mm (I1 )mn (I2 ).
We have

L(f, P_1 \times P_2) = \sum_{I_2 \in P_2} \sum_{I_1 \in P_1} \inf_{y \in I_2} \inf_{x \in I_1} f(x, y)\, m_m(I_1)\, m_n(I_2). (2.43)
From the general principle that \sum_I \inf_y h_I(y) \le \inf_y \sum_I h_I(y) we get

L(f, P_1 \times P_2) \le \sum_{I_2 \in P_2} \inf_{y \in I_2} \sum_{I_1 \in P_1} \inf_{x \in I_1} f(x, y)\, m_m(I_1)\, m_n(I_2). (2.44)
This translates to

L(f, P_1 \times P_2) \le \sum_{I_2 \in P_2} \inf_{y \in I_2} L(f(x, y); x, P_1)\, m_n(I_2). (2.45)
This leads easily to the statement of the lemma. The proof of the lemma is
complete.
Since lower sums are bounded by the lower integral, the lemma gives
L(f, P1 × P2 ) ≤ L(L(f (x, y); x); y). Since every partition is refined by a product
partition, taking the supremum over product partitions gives L(f (x, y); x, y) ≤
L(L(f (x, y); x); y), as desired.
Example: The theorem above gives a kind of Fubini theorem that works for the
lower and upper integrals. Here is an example that shows that equality is not
guaranteed. Consider the case of the upper integral of a function f defined on
the cell [0, 1] × [0, 1]. Suppose that there is a countable dense set D such that
f is one on that set, zero on its complement. Then U (f (x, y); x, y) = 1. Now
suppose that the set D has the property that for every horizontal line, there
is at most one point on the line that is in D. Then for each y the function
x ↦ f(x, y) has upper integral U(f(x, y); x) = 0. Thus U(U(f(x, y); x); y) = 0.
So the iterated upper integral is smaller than the upper integral.
How can we find such a set D? First consider the set E of points in the
plane with both coordinates rational. Consider all lines in the plane with fixed
angle θ from the x axis, so that the slope is m = tan(θ). Suppose that m is an
irrational number. For instance, we could take lines at an angle θ = π/6 from the x axis, with slope m = 1/√3. Every such line intersects E in at most one
point. (Why?) Now rotate the picture by angle −θ, so that we get a set D that
consists of E rotated by this angle, and such that the lines become horizontal
lines. |
Theorem 2.8 (Fubini’s theorem for the Riemann integral) Suppose that
the Riemann integral I(f (x, y); x, y) exists. Then for each fixed y the lower in-
tegral and upper integral are automatically defined and satisfy L(f (x, y); x) ≤
U (f (x, y); x). Furthermore, as functions of y these each define Riemann inte-
grable functions. Finally, we have both the formulas
I(f(x, y); x, y) = I(L(f(x, y); x); y) (2.50)
and
I(f(x, y); x, y) = I(U(f(x, y); x); y). (2.51)
The result of course works in the other order. For the sake of completeness here
is an explicit statement.
Theorem 2.9 (Fubini’s theorem for the Riemann integral) Suppose that
the Riemann integral I(f (x, y); x, y) exists. Then for each fixed x the lower in-
tegral and upper integral are automatically defined and satisfy L(f (x, y); y) ≤
U (f (x, y); y). Furthermore, as functions of x these each define Riemann inte-
grable functions. Finally, we have both the formulas
I(f(x, y); x, y) = I(L(f(x, y); y); x) (2.52)
and
I(f(x, y); x, y) = I(U(f(x, y); y); x). (2.53)
Proof: The two preceding theorems are essentially the same; it is sufficient
to prove the first one. The proof uses the results that relate lower integrals to
iterated lower integrals and upper integrals to iterated upper integrals. Once
we have these results, we are almost done. We have
L(f ) ≤ L(L(f (x, y); x); y) ≤ U (L(f (x, y); x); y) ≤ U (U (f (x, y); x); y) ≤ U (f ).
(2.54)
If L(f ) = U (f ), then L(L(f (x, y); x); y) = U (L(f (x, y); x); y). This proves the
integrability of the function that sends y to L(f (x, y); x).
Similarly, we have
L(f) ≤ L(L(f(x, y); x); y) ≤ L(U(f(x, y); x); y) ≤ U(U(f(x, y); x); y) ≤ U(f).
(2.55)
If L(f) = U(f), then L(U(f(x, y); x); y) = U(U(f(x, y); x); y). This proves the
integrability of the function that sends y to U (f (x, y); x).
In the above proof it does not seem to matter whether one used the lower integral or the upper integral. This is clarified by the following remark. Define the difference function D(y) = U(f(x, y); x) − L(f(x, y); x) ≥ 0. By the two formulas above, D is Riemann integrable with I(D(y); y) = 0, so on average the lower and upper integrals in x agree.

2.5 Uniform convergence

Recall that fn converges to f uniformly on a set A if the supremum over A of |fn − f| converges to zero.
Theorem 2.11 Suppose that all the fn and f are Riemann integrable on the
bounded set A. If fn converges to f uniformly on A, then I(fn ) converges to
I(f ).
One way to prove this is to first prove that I(|fn − f |) converges to zero.
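Here is a minimal numerical sketch of this theorem; the sequence fn(x) = √(x² + 1/n), which converges uniformly to |x| on [−1, 1], is an arbitrary choice.

    import numpy as np

    # f_n(x) = sqrt(x**2 + 1/n) converges uniformly to |x| on [-1, 1], since
    # 0 <= f_n(x) - |x| = (1/n) / (f_n(x) + |x|) <= 1/sqrt(n).
    # Both the uniform error and I(|f_n - f|) should tend to zero.
    x = np.linspace(-1.0, 1.0, 200001)
    dx = x[1] - x[0]
    f = np.abs(x)
    for n in [1, 10, 100, 1000]:
        fn = np.sqrt(x**2 + 1.0 / n)
        print(n, (fn - f).max(), (fn - f).sum() * dx)   # both decrease toward 0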
There is a remarkable theorem of Dini that shows that under certain very special circumstances uniform convergence is automatic. The context is that of a sequence of continuous functions on a compact set that decreases pointwise to a continuous limit; Dini's theorem says that such convergence is automatically uniform. (There is an obvious variant with a sequence of functions that is increasing.)
We also need a lemma on approximation from below by continuous functions: if f ≥ 0 is a bounded function on a cell, then for every ε > 0 there is a continuous function g with 0 ≤ g ≤ f and L(f) − I(g) < ε.
Proof: From the definition of the lower integral it follows that there is a step function k with 0 ≤ k ≤ f and with L(f) − I(k) < ε/2. However one can approximate each step by a continuous trapezoid with very steep sides, so that the resulting trapezoidal function g satisfies 0 ≤ g ≤ k and I(k) − I(g) < ε/2. Then L(f) − I(g) ≤ (L(f) − I(k)) + (I(k) − I(g)) < ε/2 + ε/2 = ε.
We now turn to an important result on monotone convergence; it will be used to prove the dominated convergence theorem. In the monotone convergence result there is no assumption that the functions are Riemann integrable. However they have lower integrals, so the result is stated in terms of lower integrals: if the bounded functions pn ≥ 0 on a cell decrease pointwise to zero, then L(pn) → 0.
Proof: Fix ε > 0. By the lemma, for each i there is a continuous function gi with 0 ≤ gi ≤ pi and
L(pi) − I(gi) ≤ ε/2^i. (2.61)
Unfortunately, there is no guarantee that the functions gi are decreasing. To
fix this, let
hn = min(g1 , . . . , gn ). (2.62)
Then hn ↓ 0 pointwise, since the hn decrease by construction and 0 ≤ hn ≤ gn ≤ pn → 0, and each hn is continuous.
Hence by Dini’s theorem I(hn ) ↓ 0. This looks promising.
For j ≤ n we have
gn − gj ≤ max(gj, gn) − gj ≤ Σ_{i=1}^{n−1} (max(gi, gn) − gi), (2.63)
since each max(gi, gn) − gi ≥ 0. The sum on the right hand side does not depend on j, so it is an upper bound for all of the gn − gj. By definition of hn we then have
gn − hn ≤ Σ_{i=1}^{n−1} (max(gi, gn) − gi). (2.64)
Since the pi decrease, max(gi, gn) ≤ pi, and so I(max(gi, gn) − gi) ≤ L(pi) − I(gi) ≤ ε/2^i. Hence,
I(gn) − I(hn) ≤ Σ_{i=1}^{n−1} ε/2^i = ε (1 − 1/2^{n−1}). (2.67)
This is the result that is needed. We conclude by noting that
L(pn) − I(hn) = L(pn) − I(gn) + I(gn) − I(hn) ≤ ε/2^n + ε (1 − 1/2^{n−1}). (2.68)
This gives
L(pn) ≤ I(hn) + ε (1 − 1/2^n) < I(hn) + ε. (2.69)
So when n is so large that I(hn) is less than ε, then L(pn) is less than 2ε.
Remark: In the above proof it could be tempting to use gn − gj ≤ pj − gj for j ≤ n to prove gn − hn ≤ Σ_{i=1}^{n} (pi − gi). The problem would be that the right hand side only has a lower integral. Furthermore, the lower integral is only known to be superadditive, so the lower integral of the sum could be much larger than the sum of the lower integrals. This was avoided in the proof by using max(gi, gn) in place of pi. |
2.7 Differentiating a parameterized integral

Let f(x, y) be defined for x in a compact set A and for y in an open set, and suppose that the partial derivative f2′(x, y) with respect to y exists, is continuous, and is bounded uniformly in x and y. Define g(y) = ∫_A f(x, y) dx. Then g is differentiable, and
g′(y)h = ∫_A f2′(x, y)h dx. (2.73)
Proof: Write
g(y + h) − g(y) − ∫_A f2′(x, y)h dx = ∫_A (f(x, y + h) − f(x, y) − f2′(x, y)h) dx. (2.74)
This can also be written
g(y + h) − g(y) − ∫_A f2′(x, y)h dx = ∫_A ∫_0^1 (f2′(x, y + th) − f2′(x, y))h dt dx. (2.75)
This has absolute value bounded by
ε(y, h) = ∫_A ∫_0^1 |f2′(x, y + th) − f2′(x, y)| dt dx (2.76)
times |h|. All that remains is to show that ε(y, h) → 0 as h → 0. For fixed x
and t the integrand approaches zero as h → 0. The conclusion follows from the
dominated convergence theorem.
This theorem gives a practical condition for differentiating an integral de-
pending on a parameter with respect to the parameter. For the theorem to
apply it is important that the bound on f20 (x, y) be independent of x and of y.
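A small numerical sketch makes the theorem concrete; the integrand f(x, y) = sin(xy) on A = [0, 1] is an arbitrary choice, and f2′(x, y) = x cos(xy) is bounded by 1 independently of x and y.

    import numpy as np

    # g(y) = integral over [0,1] of sin(x y) dx; the theorem gives
    # g'(y) = integral over [0,1] of x cos(x y) dx.  Compare with a finite difference.
    x = (np.arange(2000) + 0.5) / 2000     # midpoints in [0, 1]

    def g(y):
        return np.mean(np.sin(x * y))      # midpoint rule for the x integral

    y, h = 1.7, 1e-6
    finite_difference = (g(y + h) - g(y - h)) / (2 * h)
    under_the_integral = np.mean(x * np.cos(x * y))
    print(finite_difference, under_the_integral)   # agree to about 6 digits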
2.8 Approximate delta functions

Let δ1 be a positive bounded Riemann integrable function on Rn that vanishes outside the closed ball of radius c and has total integral ∫ δ1(u) du = 1. For ε > 0 define
δε(x) = δ1(x/ε) (1/ε^n).
This family of functions (considered for ε > 0 small) will be called a family of approximate delta functions. Notice that δε vanishes outside the closed ball of radius εc.
The basic result is that if f is a bounded continuous function, then ∫ f(x) δε(x) dx → f(0) as ε → 0.
Proof: Write
∫ f(x) δε(x) dx = ∫ f(x) δ1(x/ε) (1/ε^n) dx. (2.80)
Make the change of variable x = εu. This gives
∫ f(x) δε(x) dx = ∫ f(εu) δ1(u) du. (2.81)
The integrand converges pointwise in u to f (0)δ1 (u) on the closed ball of radius
c and is bounded by a constant independent of ε. By the dominated convergence
theorem the integral converges to f (0).
Remark: The amazing thing about this theorem is that the right hand side
is independent of the particular choice of approximate delta function. For this
reason, it is customary to write it in the form
∫ f(x) δ(x) dx = f(0). (2.82)
Of course, there is no such delta function δ(x) with this property, but it is still
convenient to describe its properties. While the left hand side does not have a
literal meaning, it gives an easy way to remember the result. Furthermore, it
allows one to summarize various useful facts, such as
∫ δ(x) dx = 1 (2.83)
and
f (x)δ(x) = f (0)δ(x). (2.84)
Also, the delta function is even, δ(−x) = δ(x), and it scales according to
δ(a(x − y)) = (1/|a|^n) δ(x − y). (2.87)
The reader may check that each of these suggests a meaningful statement about
approximate delta functions. |
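The basic convergence property is easy to see numerically in one dimension; the bump δ1(u) = (3/4)(1 − u²) on [−1, 1] (so c = 1 and ∫ δ1 = 1) and the test function f = cos are arbitrary choices.

    import numpy as np

    # delta_1(u) = (3/4)(1 - u**2) on [-1, 1], zero outside; total integral 1.
    # delta_eps(x) = delta_1(x / eps) / eps.  Check that the integral of
    # f(x) * delta_eps(x) tends to f(0) as eps tends to 0.
    def delta1(u):
        return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

    f = np.cos                          # bounded continuous test function, f(0) = 1
    x = np.linspace(-1.0, 1.0, 200001)
    dx = x[1] - x[0]
    for eps in [1.0, 0.1, 0.01]:
        val = (f(x) * delta1(x / eps) / eps).sum() * dx
        print(eps, val)                 # tends to f(0) = 1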
The integrals involving approximate delta functions are often of the form
∫ h(y) δε(z − y) dy = ∫ h(z − x) δε(x) dx. (2.88)
The two integral expressions are equivalent after the change of variable y = z − x. The new feature is that we look at the result as a function of z. If |h(y)| ≤ M, then each integral above as a function of z has magnitude bounded by M. Furthermore, it is a continuous function of z. Also, suppose that h has compact support K. Then this integral as a function of z has compact support in the set K_{εc} of points a distance at most εc from K. The result may be stated in this context as follows.
Volumes are multiplied by the absolute value |a| of the scale factor, which
is the absolute value of the determinant.
The following result concerns the composition of an approximate delta function with a smooth map. Let g be a smooth one-to-one function defined on an open set V ⊆ Rn, with smooth inverse defined on g(V). Let h be a bounded continuous function on V, and let K be a compact subset of g(V). Then for y in K
∫_V h(x) δε(g(x) − y) dx → h(g⁻¹(y)) / |det g′(g⁻¹(y))| (2.96)
as ε → 0.
Proof: First it is helpful to take some care about the region of integration. The integral is over the set of x with |g(x) − y| ≤ εc. Consider the set K_{εc} consisting of all points with distance no greater than εc from K. There is an ε1 such that K_{ε1 c} is a subset of g(V). Since this is a compact set, the function ‖(g⁻¹)′‖ is bounded there by some constant λ. Now suppose that y is in K and |y′ − y| ≤ εc for some ε ≤ ε1. Then |g⁻¹(y′) − g⁻¹(y)| ≤ λ|y′ − y| ≤ λεc. In particular, for x in the region of integration, |x − g⁻¹(y)| ≤ λ|g(x) − y| ≤ λεc.
Make the change of variable x = g⁻¹(y) + εt. The integration region is now |t| ≤ λc. This is a fixed bounded set, independent of ε. The integral on the left hand side is
∫ h(x) δ1((g(x) − y)/ε) (1/ε^n) dx = ∫ h(g⁻¹(y) + εt) δ1((g(g⁻¹(y) + εt) − g(g⁻¹(y)))/ε) dt. (2.97)
The integrand is bounded, independent of ε. By the dominated convergence theorem the limit as ε → 0 of this is
∫ h(g⁻¹(y)) δ1(g′(g⁻¹(y))t) dt = h(g⁻¹(y)) (1/|det(g′(g⁻¹(y)))|) ∫ δ1(u) du. (2.98)
The last step is the change of variables u = g′(g⁻¹(y))t. This involves a matrix g′(g⁻¹(y)) that only depends on the parameter y and so may be regarded as constant. Performing the u integral gives the result on the right hand side.
Remark: For later use we note that the integral ∫_V h(x) δε(g(x) − y) dx as a function of y is uniformly bounded on K, independent of ε. Furthermore, as ε → 0 it converges pointwise to h(g⁻¹(y))/|det g′(g⁻¹(y))|.
2.10 Change of variables

We can now state the change of variables theorem: if g is as above and f is a Riemann integrable function whose support is a compact subset K of g(V), then
∫_K f(y) dy = ∫_L f(g(x)) |det g′(x)| dx,
where the integral on the right extends over a compact set L whose interior contains g⁻¹(K).
Proof: First we give the proof for the case when f is a continuous function. The plan is to do the proof in two parts: write the right hand side as a limit, and then write the left hand side as the same limit.
Here is the part dealing with the integral on the right hand side. We first have a limit of single integrals
∫_K f(y) |det g′(g⁻¹(y))| δε(g(x) − y) dy → f(g(x)) |det g′(x)| (2.102)
as ε → 0, by the basic property of approximate delta functions applied in the variable y. As a function of x this integral is bounded by a constant independent of ε and is supported in the fixed compact set L. The dominated convergence theorem gives a limit of double integrals
∫_L ∫_K f(y) |det g′(g⁻¹(y))| δε(g(x) − y) dy dx → ∫_L f(g(x)) |det g′(x)| dx (2.103)
as ε → 0.
Here is the part dealing with the integral on the left hand side. By the preceding theorem (taken with h = 1), for each y in K we have ∫_L δε(g(x) − y) dx → 1/|det g′(g⁻¹(y))|, and therefore a limit of single integrals
f(y) |det g′(g⁻¹(y))| ∫_L δε(g(x) − y) dx → f(y) (2.104)
as ε → 0. As a function of y the above integral is bounded by a constant independent of ε and is supported on K. The dominated convergence theorem gives a limit of double integrals
∫_K ∫_L f(y) |det g′(g⁻¹(y))| δε(g(x) − y) dx dy → ∫_K f(y) dy (2.105)
as ε → 0.
The proof for continuous f is concluded by noting that according to Fubini's theorem the two double integrals are the same.
The proof for general Riemann integrable functions requires additional com-
ment. First, the fact that the right hand side is integrable follows from general
properties of the Riemann integral with respect to composition and multipli-
cation. Then one can use approximation by continuous functions. In fact, for
every Riemann integrable f there are continuous functions f− ≤ f ≤ f+ with integrals arbitrarily close to the integral of f. Since the integrals on the left hand side are arbitrarily close, it follows that the integrals on the right hand side are also arbitrarily close.
This beautiful proof is from a recent paper by Ivan Netuka [15]. (A few details of the proof have been changed.) The formulation in that paper is for the Lebesgue integral, but it also works for the Riemann integral if one recognizes that the Riemann integral also has a dominated convergence theorem.
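In one dimension the theorem is easy to check numerically; the choices g(x) = x³ + x, mapping [0, 1] one-to-one onto [0, 2], and f = cos are arbitrary.

    import numpy as np

    # Check that the integral of f(g(x)) |g'(x)| over [0, 1] equals the
    # integral of f(y) over [0, 2], for g(x) = x**3 + x and f = cos.
    f = np.cos
    x = (np.arange(100000) + 0.5) / 100000          # midpoints in [0, 1]
    left = np.mean(f(x**3 + x) * (3 * x**2 + 1))    # integral of f(g(x)) |g'(x)|
    y = 2.0 * (np.arange(100000) + 0.5) / 100000    # midpoints in [0, 2]
    right = 2.0 * np.mean(f(y))                     # integral of f(y)
    print(left, right)                              # both close to sin(2)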
Remark: A physicist or engineer who is familiar with delta functions might summarize the entire proof by recalling that
∫ δ(g(x) − y) dx = 1/|det g′(g⁻¹(y))|. (2.106)
So
∫ f(g(x)) |det g′(x)| dx = ∫ ∫ f(y) |det g′(g⁻¹(y))| δ(g(x) − y) dy dx, (2.107)
and interchanging the order of integration and performing the x integral via (2.106) gives ∫ f(y) dy.
2.11 Problems
Problems 5: Dominated Convergence
1. (a) Give an example of functions f and g on [0, 1] such that their lower
integrals satisfy L(f + g) > L(f ) + L(g).
(b) Give an example of functions f and g on [0, 1] such that their upper
integrals satisfy U (f + g) < U (f ) + U (g).
Recitation 5
1. Evaluate
∫_0^2 ∫_0^3 y e^{−xy} dy dx. (2.109)
2. (a) Consider the function f defined on the unit square with f(x, y) = 1 if x is rational and y = 1/2, zero otherwise. Is it Riemann integrable? Prove
that your answer is correct.
(b) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y is rational, zero otherwise. Is it Riemann integrable?
Prove that your answer is correct.
(c) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y is rational and x = y, zero otherwise. Is it Riemann
integrable? Prove that your answer is correct.
(d) Consider the function f defined on the unit square with f (x, y) = 1 if
x is rational and y is rational and x ≠ y, zero otherwise. Is it Riemann
integrable? Prove that your answer is correct.
(e) Consider the function f defined on the unit square with f (x, y) = 1
if x is rational or y is rational, zero otherwise. Is it Riemann integrable?
Prove that your answer is correct.
4. (a) The length of the level curve g(x, y) = c may be computed as a two-dimensional integral involving δ(g(x, y) − c) |dg(x, y)|, where
dg(x, y) = (∂g(x, y)/∂x) dx + (∂g(x, y)/∂y) dy (2.114)
and
|dg(x, y)| = √((∂g(x, y)/∂x)² + (∂g(x, y)/∂y)²). (2.115)
Evaluate this in terms of an x integral involving partial derivatives of
g(x, y). These partial derivatives will be evaluated at the (implicitly de-
fined) y satisfying g(x, y) = c. Hint: First do the y integral.
(b) Say that the implicit function theorem applied to g(x, y) = c defines
y = h(x) as a function of x. Express the above result in terms of derivatives
of h(x). Show that this gives the usual formula for arc length.
5. Use the general formula to compute the area of the hemisphere z ≥ 0 for the sphere x² + y² + z² = a². This is the three-dimensional integral of δ(x² + y² + z² − a²) 2√(x² + y² + z²) dx dy dz. Hint: First do the z integral to express the result as an x, y integral. Then it is easy to do this x, y integral in polar coordinates.
Recitation 6
1. Evaluate
∫_0^1 ∫_{3y}^3 e^{−x²} dx dy. (2.116)
3. Do the integral
∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)} dx dy. (2.117)
4. Do the integral
∫_{−∞}^{∞} e^{−x²} dx. (2.118)

5. Show that Γ(1/2) = π^{1/2}.
6. Let a_{n−1} be the area of the unit sphere in dimension n − 1. Prove that π^{n/2} = a_{n−1} (1/2) Γ(n/2). Hint: Express the n-dimensional integral ∫ e^{−|x|²} dx over Rn in polar coordinates.
Differential Forms
3.1 Coordinates
There are two theories of integration. The first describes how to integrate a
function over a set. The second explains how to integrate a differential form
over an oriented surface. It is the second theory that is the natural setting for
calculus. It is also a thread that runs through much of geometry and even of
applied mathematics. The topic of this chapter is differential forms and their
integrals, culminating with the general form of Stokes’ theorem.
In the following a function is said to be smooth if it is C ∞ . Two open sets
U, V of Rn are said to be diffeomorphic if there is a one-to-one smooth function
f : U → V with smooth inverse function f −1 : V → U .
An n-dimensional manifold patch is a set M together with a collection of
functions defined on M . Each such function is one-to-one from M onto some
open subset of Rn. Such a function is called a coordinate system. There are two requirements on the set of coordinate system functions: any two coordinate systems must be smoothly related, in the sense that the change of coordinates between their ranges is a diffeomorphism, and the set must be closed under composition with diffeomorphisms, so that a diffeomorphism applied to a coordinate system yields another coordinate system.
This definition of manifold patch is not a standard definition; its purpose here
is to capture the idea of a set with many different coordinate systems, each of
which is defined on the same set.
For n = 1 a typical manifold patch is something like a featureless curve.
It does not have a shape, but one could think of something like the letter S,
without the end points. (One should not think of a curve in the shape of an O,
because it does not match with an open interval of numbers. Also one should
not have a curve that looks like the Greek α, because it has a self-intersection
point.) For n = 2 a typical manifold patch is like a patch of cloth, but without
a border. It can have holes. The case n = 0 is a single point.
The coordinate functions serve to attach numbers to the points of M . This
can be done in many ways, and there is no one coordinate system superior to
all the others, at least not without further consideration. Furthermore, in many
treatments (including here) the points in M do not have names. If we want
to specify a point, we say that it is the point in M such that in a particular
coordinate system the coordinate values are certain specified numbers.
The concept of manifold patch is natural in geometry and in many applica-
tions of mathematics. A manifold patch M is (at least at the outset) assumed
to be featureless, except for the fact that it can have various coordinate sys-
tems. These coordinate systems are supposed to be on a democratic status; one
is as good as another. Since the notions of open set, closed set, compact set,
continuous function, smooth function, smooth curve, and so on are independent
of coordinate system, they make sense for a manifold patch. On the other hand,
notions of distance, angle, congruence, and so on are not defined (although they
may be introduced later). It is amazing how much useful mathematics can be
done in this general context.
The manifold patch concept is particularly suited to the local notion of
geometry, that is, it gives a description of what happens near a point. This
is because a manifold patch is modeled on an open subset of Rn . There is a
more general concept of manifold that consists of many manifold patches joined
together. This is a global or big picture notion of geometry, and it is a fascinating
topic in advanced mathematics. Here we focus on the local story, with only brief
mention of possible global issues.
Example: Here is a typical example of a manifold patch in applied mathematics.
Consider a box of gas with pressure p and volume V and temperature T . These
are related by an equation of state f (p, V, T ) = 0. In nice cases this equation
may be solved for one variable in terms of the others via the implicit function
theorem. Finding the equation of state is a major task of physics. But even
after this is determined, it is not clear which coordinates will be most convenient
to use. In this case the manifold patch M is the set of possible states of the
gas. One possible coordinate system is p, V . Another is p, T . Yet another is
V, T . Physicists and chemists and geologists use whichever coordinate system
is appropriate to the problem under consideration. |
3.3 Vector fields

A vector field is a first order differential operator of the form
X = Σ_{j=1}^n aj ∂/∂xj. (3.1)
Here each aj is a scalar field. The differential operator acts on a scalar field s to give another scalar field
Xs = Σ_{j=1}^n aj ∂s/∂xj. (3.2)
Again, the notion of vector field is independent of the coordinate system. Thus we can also write
X = Σ_{i=1}^n āi ∂/∂ui. (3.3)
Here
āi = Σ_{j=1}^n (∂ui/∂xj) aj = Σ_{j=1}^n f′i,j(x) aj. (3.4)
One can picture a vector field in the following way. At each point of M one
draws an arrow. This arrow is not to be thought of as a displacement in M , but
as a kind of rate of change at this point of M . For instance, if M is thought
of a region where there is a fluid flow, then the vector field might be something
like the fluid velocity. More precisely, the components of the vector field are
velocities. The vector field itself describes how quantities change with time as
they move with the fluid.
Giving a vector field is equivalent to giving a system of ordinary differential equations. More precisely, it is equivalent to giving an autonomous system of first order ordinary differential equations. (The word autonomous means that the vector field does not change with time.) The equations corresponding to the vector field Σ_j gj(x) ∂/∂xj are
dxj/dt = gj(x1, . . . , xn). (3.5)
Of course this can also be written in the abbreviated form dx/dt = g(x). A
solution of such an equation is given by functions hj(t) with
dhj(t)/dt = gj(h1(t), . . . , hn(t)). (3.6)
Again this has a brief form dh(t)/dt = g(h(t)).
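This correspondence lends itself to numerical computation. Here is a minimal sketch; the rotation field g(x, y) = (−y, x) (which reappears in the example below) and the simple midpoint scheme are arbitrary choices.

    import numpy as np

    # Solve dh/dt = g(h) numerically for the vector field g(x, y) = (-y, x),
    # using a midpoint (second order Runge-Kutta) step.  The solutions are
    # circles, so the radius should stay near 1.
    def g(h):
        x, y = h
        return np.array([-y, x])

    h = np.array([1.0, 0.0])                  # initial condition
    dt = 0.001
    for _ in range(int(2 * np.pi / dt)):      # integrate for one full period
        k = g(h)
        h = h + dt * g(h + 0.5 * dt * k)
    print(h, np.hypot(h[0], h[1]))            # near (1, 0); radius near 1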
Locally, a vector field is a fairly boring object, with one exception. This is
at a point in the manifold patch M where the vector field X vanishes, that is,
where each aj has the value zero. Away from such points the vector field is
doing nothing more interesting than uniform motion.
Theorem (straightening out) Suppose that
X = Σ_{j=1}^n aj ∂/∂xj (3.7)
is a vector field that is non-zero near some point. Then near that point there is another coordinate system u1, . . . , un in which it has the form
X = ∂/∂uj. (3.8)
Proof: Here is the idea of the proof of the straightening out theorem. Say that aj ≠ 0. Solve the system of differential equations
dxi/dt = ai (3.9)
with initial conditions on the surface xj = 0. This can be done locally, by the existence theorem for systems of ordinary differential equations with smooth coefficients. The result is that each xi is a function of the initial coordinates xi, i ≠ j, on the surface xj = 0 and of the time parameter t. Furthermore, since dxj/dt ≠ 0, the condition t = 0 corresponds to the surface xj = 0. So if x1, . . . , xn corresponds to a point in M near the given point, we can define for i ≠ j the coordinates ui to be the initial value of xi on the surface xj = 0, and we can define uj = t. In these coordinates the differential equation becomes
dui/dt = 0, i ≠ j, (3.10)
duj/dt = 1. (3.11)
Example: Consider the vector field
−y ∂/∂x + x ∂/∂y. (3.12)
The corresponding system of differential equations is
dx/dt = −y, (3.13)
dy/dt = x. (3.14)
Take the point to be y = 0, with x > 0. Take the initial condition to be x = r
and y = 0. Then x = r cos(t) and y = r sin(t). So the coordinates in which
the straightening out takes place are polar coordinates r, t. Thus if we write
x = r cos(φ) and y = r sin(φ), we have
−y ∂/∂x + x ∂/∂y = ∂/∂φ, (3.15)
where the partial derivative with respect to φ is taken with r held fixed. |
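The identity (3.15) can be checked symbolically; a sketch with sympy, using the arbitrary test scalar s = x²y.

    import sympy as sp

    # Check that (-y d/dx + x d/dy) s equals ds/dphi (with r held fixed)
    # for the concrete scalar field s(x, y) = x**2 * y.
    r, phi = sp.symbols('r phi', positive=True)
    x, y = r * sp.cos(phi), r * sp.sin(phi)

    def s(X, Y):
        return X**2 * Y

    u, v = sp.symbols('u v')
    lhs = (-v * sp.diff(s(u, v), u) + u * sp.diff(s(u, v), v)).subs({u: x, v: y})
    rhs = sp.diff(s(x, y), phi)        # differentiate holding r fixed
    print(sp.simplify(lhs - rhs))      # 0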
Example: Consider the Euler vector field
x ∂/∂x + y ∂/∂y = r ∂/∂r, (3.16)
where the partial derivative with respect to r is taken with fixed φ. We need to stay away from the zero at the origin. If we let t = ln(r), then this is
x ∂/∂x + y ∂/∂y = r ∂/∂r = ∂/∂t. (3.17)
H = (1/(2m)) p² − mga cos(q/a). (3.25)
This is the energy, and it is constant for each solution. While the energy does
not describe the time dependence of the solutions, it does show the shape of the
solutions in the phase plane. |
The following question is natural. Suppose that a vector field has an isolated
zero. At that zero it has a linearization. Is it possible to choose coordinates
nearby so that the vector field is given in those new coordinates by its lineariza-
tion? It turns out that this can often be done. The answer to the question is
negative in general. See Nelson [14] for a discussion of this delicate matter.
3.4 Fluid velocity and the advective derivative

Consider the flow of a fluid in a region with coordinates u, v. The path of a fluid particle satisfies
du/dt = a = f(u, v),
dv/dt = b = g(u, v). (3.26)
Here a, b are the velocity vector field components with respect to the coordinates
u, v.
Now let s = h(u, v) be some time-independent quantity. For instance, it
could be the temperature of the fluid at each point in space. If we follow this
quantity along a particle, then it does change in time, according to
ds/dt = (∂s/∂u)(du/dt) + (∂s/∂v)(dv/dt) = (a ∂/∂u + b ∂/∂v) s. (3.27)
In fluid dynamics the differential operator on the right represents the effect of
the fluid flow given by the velocity vector field. It is the derivative following
the motion of the particle. It is so important that it has many names: ad-
vective derivative, particle derivative, material derivative, substantial derivative,
Lagrangian derivative, and so on. The components a, b of the velocity vector
field depend on the coordinate system. If now one changes coordinates, say to
w, z, then the equation becomes
dw/dt = p = m(z, w),
dz/dt = q = n(z, w), (3.28)
where
p = a ∂w/∂u + b ∂w/∂v,
q = a ∂z/∂u + b ∂z/∂v. (3.29)
Then a straightforward calculation gives
p ∂/∂w + q ∂/∂z = a (∂w/∂u ∂/∂w + ∂z/∂u ∂/∂z) + b (∂w/∂v ∂/∂w + ∂z/∂v ∂/∂z). (3.30)
By the chain rule this is
p ∂/∂w + q ∂/∂z = a ∂/∂u + b ∂/∂v. (3.31)
Since the advective derivative represents the rate of change along the motion of
the particle, it is independent of the coordinate system. Specifying the advective
derivative is thus a particularly attractive way of specifying the vector field.
3.5 Differential 1-forms

The differential ds of a scalar field s is the 1-form whose value on a vector field X is the scalar field
⟨ds | X⟩ = Xs. (3.34)
If we apply this to the scalar s = xi that is one of the coordinates, then we get
⟨dxi | X⟩ = ai. (3.35)
It follows that
⟨ds | X⟩ = Σ_{j=1}^n (∂s/∂xj) ⟨dxj | X⟩ = ⟨ Σ_{j=1}^n (∂s/∂xj) dxj | X ⟩. (3.36)
In other words,
ds = Σ_{j=1}^n (∂s/∂xj) dxj. (3.37)
This is the most basic computational tool of the theory. The coordinate basis
forms dxi are sometimes called the dual basis of the coordinate basis vector
fields ∂/∂xj .
The general 1-form may be written in the form
ω = Σ_{j=1}^n pj dxj. (3.38)
Here the pj are scalar fields. Its value on the vector field X is the scalar field
⟨ω | X⟩ = Σ_{j=1}^n pj aj. (3.39)
Remark: There is a dramatic difference between vector field bases and 1-form
bases. The notation ∂/∂z does not make sense unless z is a variable that belongs
to a given coordinate system. For instance, if the coordinate system is q, z, s,
then ∂/∂z means to differentiate with respect to z holding q, s both constant.
On the other hand, a differential dy makes sense for an arbitrary scalar field y,
whether or not it belongs to a coordinate system. |
Example: Here is an illustration of some of these ideas. Consider the problem of
making a box with a given amount of material to contain the maximum volume.
The box will have five sides, a base and four vertical sides. It is open at the
top. In this case the manifold patch is the set M of possible boxes made with
this material.
Say that the side lengths of the base are u, v and the height is w. The
amount of material available is a fixed number A. Thus uv + 2uw + 2vw = A.
Since A is a constant, we have
(v + 2w) du + (u + 2w) dv + 2(u + v) dw = 0. (3.40)
This relation is valid on all of M . We are interested in the point of M (that is,
in the particular shape of box) with the property that the volume V = uvw is
maximized. At this point we have
dV = vw du + uw dv + uv dw = 0 (3.41)
whenever du, dv, dw satisfy the constraint relation above. It follows that the coefficient vectors (vw, uw, uv) and (v + 2w, u + 2w, 2(u + v)) are proportional, and comparing them gives u = v = 2w: a square base with height half the side. |
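The proportionality of the two coefficient vectors can be confirmed symbolically; a sketch assuming the optimum u = v = 2w just stated.

    import sympy as sp

    # At the optimal box, dV must vanish whenever the constraint relation holds,
    # so the coefficient vectors of dV and of the constraint form are proportional.
    u, v, w = sp.symbols('u v w', positive=True)
    dV = [v * w, u * w, u * v]                   # coefficients of du, dv, dw in dV
    dA = [v + 2 * w, u + 2 * w, 2 * (u + v)]     # coefficients in the constraint form
    sub = {u: 2 * w, v: 2 * w}                   # the claimed optimal shape
    ratios = [sp.simplify((dV[i] / dA[i]).subs(sub)) for i in range(3)]
    print(ratios)    # [w/2, w/2, w/2]: a common factor, so dV is proportional to dA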
A differential 1-form may be expressed in either coordinate system. Writing α = Σ_k qk duk in the u coordinates, the formula α = Σ_j q̄j dxj with
q̄j = Σ_{k=1}^n qk ∂uk/∂xj = Σ_{k=1}^n qk f′k,j(x) (3.47)
expresses the same form in the other coordinate system. It may be shown by
calculation that the criterion for being exact or closed is the same in either
coordinate system.
The coordinate invariance is a reflection of the fact that the pairing of dif-
ferential 1-form and vector field gives rise to a well-defined scalar. Explicitly,
we have
⟨α | X⟩ = Σ_{j=1}^n q̄j aj = Σ_{k=1}^n qk āk. (3.48)
3.6 Polar coordinates

Let M† be the plane with a closed half-line removed: in Cartesian coordinates, the points {(p, q) | p ≤ 0, q = 0} are deleted. On M† there are polar coordinates r, θ in addition to the Cartesian coordinates x, y, related by
x = r cos(θ),
y = r sin(θ). (3.49)
Here θ ranges from −π to π, with a jump in value across the half-line. These
equations are identities saying that the scalars on the left are equal to the scalars on the right, as real functions on M†. Similarly, we have the identity r² = x² + y².
Another way of thinking of the relation between the two coordinate systems
is to define open subsets U and V of R2 by taking U = {(a, b) | a > 0, −π <
b < π} and V = R2 \ {(p, q) | p ≤ 0, q = 0}. Then the coordinate system (x, y)
maps M † to V , and the coordinate system (r, θ) maps M † to U . The change of
coordinates is a smooth one-to-one function f from U to V . The result is that
(x, y) = f (r, θ) as functions from M † to V . The two coordinate systems provide
two numerical descriptions of the same object M † .
Taking the differential and then eliminating the trig functions gives
dx = (x/r) dr − y dθ, (3.50)
dy = (y/r) dr + x dθ. (3.51)
It follows that
x dx + y dy = r dr,
x dy − y dx = r² dθ. (3.52)
At each point in E• we can choose an angular variable χ that is defined near the point, and then R = ∂/∂χ near the point.
This discussion shows that it is useful to think of Cartesian coordinates and
polar coordinates as scalars defined on a manifold patch. Then the equations
above are identities, either for scalars or for differential forms or for vector fields.
The closed differential form ω defined on the punctured plane is a fundamen-
tal mathematical object; for instance it underlies many of the calculations in
complex variable theory. The rotation operator R has an associated system
of differential equations. These are the equations for a linear oscillator, a ba-
sic system that occurs throughout applied mathematics. These examples also
illustrate that differential forms and vector fields are quite different objects.
The example in this section is not typical in one respect: there are natural
notions of length and angle. Thus in Cartesian coordinates dx and dy are
orthogonal unit forms. In polar coordinates dr and r dθ are orthogonal unit
forms. (Note: The form r dθ is not a closed form.) There is a similar story
for vector fields. In Cartesian coordinates ∂/∂x and ∂/∂y are orthogonal unit
vectors. In polar coordinates ∂/∂r and 1/r ∂/∂θ are orthogonal unit vectors.
The reason for this is that the underlying manifold M is the Euclidean plane,
which has natural notions of length and angle. Such special structure need not
be present in other manifolds.
3.7 Integrating factors and canonical forms

Consider a differential equation given by a 1-form set equal to zero:
p dx + q dy = 0. (3.57)
Here p = f(x, y) and q = g(x, y) are functions of x, y. This means that a solution of the equation is a curve where the differential form p dx + q dy is zero. There can be many such curves.
The equation is determined by the differential form α = p dx + q dy, but two
different forms may determine equivalent equations. For example, if µ = h(x, y)
is a non-zero scalar, then the form µα = µp dx + µq dy is a quite different form,
but it determines an equivalent differential equation.
If α = p dx + q dy is exact, then p dx + q dy = dz, for some scalar z depending
on x and y. Each solution of the differential equation is then given implicitly
by z = c, where c is the constant of integration.
If α = p dx + q dy is not exact, then one looks for an integrating factor µ
such that
µα = µ(p dx + q dy) = dz (3.58)
is exact. Once this is done, again the general solution of the differential equation
is then given implicitly by z = c, where c is constant of integration.
Locally, an integrating factor exists, at least away from points where the form vanishes. Suppose that there is a coordinate system u, v in which the solution curves of the equation are the curves v = c. Along such a curve the form vanishes, so p ∂x/∂u + q ∂y/∂u = 0, and hence
α = p dx + q dy = (p ∂x/∂v + q ∂y/∂v) dv = w dv, (3.59)
where w is a non-zero scalar. We can then take µ = 1/w.
Finding an explicit integrating factor may be no easy matter. However, there
is a strategy that may be helpful.
Recall that if a differential form is exact, then it is closed. So if µ is an
integrating factor, then
∂(µp)/∂y − ∂(µq)/∂x = 0. (3.60)
This condition may be written in the form
p ∂µ/∂y − q ∂µ/∂x + (∂p/∂y − ∂q/∂x) µ = 0. (3.61)
Say that by good fortune there is an integrating factor µ that depends only
on x. Then this gives a linear ordinary differential equation for µ that may be
solved by integration.
Example: Consider the standard problem of solving the linear differential equa-
tion
dy
= −ay + b, (3.62)
dx
where a, b are functions of x. Consider the differential form (ay − b) dx + dy. Look for an integrating factor µ that depends only on x. The differential equation for µ is dµ/dx = aµ. This has solution µ = e^A, where A is a function of x with dA/dx = a. Thus
µα = e^A (ay − b) dx + e^A dy = d(e^A y − B),
where B is a function of x with dB/dx = e^A b. The general solution is given implicitly by e^A y − B = c, that is, y = e^{−A}(B + c). |
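A symbolic check of this example in a concrete instance; the choices a = 1 and b = x are arbitrary.

    import sympy as sp

    # With a = 1, b = x the form is (y - x) dx + dy and mu = e^x.  Then
    # mu*alpha should be closed and equal to d(e^x y - B) with dB/dx = e^x * x.
    x, y = sp.symbols('x y')
    p, q = sp.exp(x) * (y - x), sp.exp(x)               # coefficients of mu*alpha
    print(sp.simplify(sp.diff(p, y) - sp.diff(q, x)))   # 0, so mu*alpha is closed

    B = sp.integrate(sp.exp(x) * x, x)                  # B = e^x (x - 1)
    z = sp.exp(x) * y - B
    print(sp.simplify(sp.diff(z, x) - p), sp.simplify(sp.diff(z, y) - q))   # 0 0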
3.8 The second differential

Near a critical point the local behavior of a smooth function z is governed by the Hessian matrix of its second partial derivatives. If the Hessian matrix is positive definite (negative definite), then the function z has a local minimum (local maximum). This condition may be expressed in any coordinate system.
Under a change of coordinates the Hessian transforms with an extra term involving the second derivatives of the coordinate transformation, a rather complicated factor. But if the first derivatives ∂z/∂xj = 0 for j = 1, . . . , n at a certain point, then we are left with the Hessian matrix at this point transformed by the coordinate transformations on left and right. This is a matrix congruence, so it preserves the positive definite or negative definite property.
In the case of a function of two variables, there is a simple criterion for
application of the second derivative test. Suppose that z = h(x, y) is a smooth
function. Consider a point where the first derivative test applies, that is, the
differential dz = d h(x, y) is zero. Consider the case when the Hessian is non-
degenerate, that is, has determinant not equal to zero. Suppose first that the
determinant of the Hessian matrix is strictly positive. Then the function has
either a local minimum or a local maximum, depending on whether the trace is
positive or negative. Alternatively, suppose that the determinant of the Hessian
matrix is strictly negative. Then the function has a saddle point.
The case of n dimensions is more complicated. The Hessian matrix may be transformed by matrix congruence transformations to a diagonal matrix with entries εj that are +1, −1, or 0. In the non-degenerate case the entries are ±1. If they are all +1 then we have a local minimum, while if they are all −1 we have a local maximum. Otherwise we have a saddle.
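Since congruence preserves the signs of the diagonal entries, the classification can be read off from the signs of the Hessian eigenvalues; a minimal numeric sketch.

    import numpy as np

    # Classify a non-degenerate critical point from the Hessian eigenvalue signs.
    def classify(hessian):
        eig = np.linalg.eigvalsh(hessian)
        if np.all(eig > 0):
            return "local minimum"
        if np.all(eig < 0):
            return "local maximum"
        if np.all(eig != 0):
            return "saddle"
        return "degenerate"

    print(classify(np.array([[2.0, 0.0], [0.0, 6.0]])))    # z = x**2 + 3 y**2
    print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # z = x**2 - y**2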
There is a more powerful insight into these results that comes from changing to a new coordinate system. The first result states that away from a critical point nothing interesting happens: if dz ≠ 0 at a point, then there is a coordinate system u1, . . . , un near the point in which z = z0 + u1. The second result is a form of the Morse lemma. Suppose that dz = 0 at a point, let z0 be the value of the function at that point, and suppose that the Hessian is non-degenerate at this point. Then there is a coordinate system u1, . . . , un near the point with
z = z0 + Σ_{i=1}^n εi ui², (3.68)
where each εi = ±1.
3.9 Regular surfaces

A k-dimensional surface in Rn may be given implicitly, by equations gp(x) = cp for p = k + 1, . . . , n, where the differentials dgp(x) are required to be linearly independent on the surface. In this case the surface defined by g(x) = c will be called a regular implicit surface. It is clear that every regular surface is a regular implicit surface.
A surface may also be given parametrically, by x ← f(u), where f maps an open subset of Rk into Rn. Associated with the parameterization are the operators
Xα = Σ_{i=1}^n f′i,α(u) (∂/∂xi)|x←f(u)
for α = 1, . . . , k. The notation requires explanation. To get the proper value of Xα h we have to perform the partial derivatives to get h′,i(x), but after that we have to substitute x ← f(u) in the result.
The vectors Xα for α = 1, . . . , k are not the usual kind of vector field; instead each Xα is a vector field along the parameterized surface. That is, the input to Xα is given by the u, while the output corresponds to a vector at the point f(u) on the surface. Each such vector is a tangent vector to the surface.
Consider now a surface with both a parametric and an implicit representation. In that case we have g(x)(x ← f(u)) = g(f(u)) = c. Explicitly,
gp(f(u)) = cp (3.73)
for p = k + 1, . . . , n. Differentiating with respect to uα gives
Σ_{i=1}^n g′p,i(f(u)) f′i,α(u) = 0. (3.74)
This result may also be written in terms of differential forms and vector fields in the form
⟨dgp(x) | Xα⟩ = 0. (3.75)
The notation does not make this explicit, but there is an assumption that there
is a replacement x = f (u) in the coefficients of the differential forms. The sig-
nificance of this equation is that the differentials of the functions defining the
surface vanish on the tangent vectors to the surface. Since there are k indepen-
dent tangent vectors and n − k independent differential forms, the differential
forms dgp (x), p = k + 1, . . . , n form a basis for the space of differential forms
that vanish on the tangent vectors.
3.10 Lagrange multipliers

Suppose that a scalar h, restricted to the surface, has a local maximum or minimum at a certain point. Then the derivative of h along each tangent vector to the surface vanishes at that point. More explicitly,
Σ_{i=1}^n h′,i(f(u)) f′i,α(u) = 0 (3.78)
and, in terms of differential forms and vector fields,
⟨dh(x) | Xα⟩ = 0 (3.79)
for α = 1, . . . , k. This says that dh(x) belongs to the space of forms that vanish
on the tangent vectors. It follows that it is a linear combination of the forms
dgp(x) that form a basis for this space. Thus there are coefficients λp with
dh(x) = Σ_{p=k+1}^n λp dgp(x). (3.80)
The coefficients λp are called Lagrange multipliers. This result is intuitive.
It says that if h has a local maximum on the surface, then the only way it can
be made larger is by moving off the surface by relaxing the constraint that the
surface is defined by constants. The Lagrange multiplier λp itself is the partial
derivative of the critical value with respect to a change in the parameter cp .
Example: Say that we want to maximize h = x + y + 2z subject to the constraint x² + y² + z² = 1. The condition dh = λ d(x² + y² + z²) gives 1 = 2λx, 1 = 2λy, and 2 = 2λz. Insert these in the constraint equation x² + y² + z² = 1. This gives (1/4 + 1/4 + 1)/λ² = 1, or λ = ±√(3/2). So x = ±√(2/3)/2, y = ±√(2/3)/2, z = ±√(2/3). |
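The same system can be handed to a symbolic solver; a sketch with sympy.

    import sympy as sp

    # Solve 1 = 2 lam x, 1 = 2 lam y, 2 = 2 lam z with x^2 + y^2 + z^2 = 1.
    x, y, z, lam = sp.symbols('x y z lam')
    solutions = sp.solve([1 - 2*lam*x, 1 - 2*lam*y, 2 - 2*lam*z,
                          x**2 + y**2 + z**2 - 1], [x, y, z, lam], dict=True)
    for s in solutions:
        print(s)    # two solutions, with x = y = z/2 and lam of opposite signs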
Example: Say that we want to maximize or minimize u = x − 4y + 3z + z 2
subject to v = x − y = 0 and w = y − z = 0. The manifold in this case is just a
line through the origin. The Lagrange multiplier condition says that
du = dx − 4 dy + (3 + 2z) dz = λ dv + µ dw = λ(dx − dy) + µ(dy − dz).
Thus λ = 1, −4 = −λ + µ so µ = −3, and 3 + 2z = −µ = 3, so z = 0. The critical point is x = y = z = 0. |
Differential forms of higher degree appear as integrands: 2-forms are suited to surface integrals, while 3-forms are good for integrals over 3-dimensional regions. However for the moment we are only concerned with the differential forms, not with their integrals.
These forms have an algebra. The fundamental law is the anticommutative
law for 1-forms. Thus for instance dw du = − du dw. Since 1-forms anticommute
with themselves, we have du du = 0, and so on.
The algebra here is called the exterior product. In theoretical discussions it
is denoted by a wedge symbol, so that we would write dv ∧ dw instead of the
shorter form dv dw. Sometimes it is a good idea to use such a notation, since it
reminds us that there is a rather special kind of algebra, different from ordinary
multiplication. Practical computations tend to leave it out. |
A more theoretical approach to the definitions relates differential forms to
vector fields. A differential k-form ω on a coordinate patch is a quantity that
depends on k vector fields. We write it as hω | X1 , . . . , Xk i. One way to get
such a k-form is to multiply together k 1-forms and then anti-symmetrize. The
multiplication operation that accomplishes this is often written ∧ and is called
the exterior product. In the simplest case of a differential 2-form ω = α ∧ β this
is given by the determinant
⟨α ∧ β | X, Y⟩ = det ( ⟨α | X⟩  ⟨β | X⟩
                       ⟨α | Y⟩  ⟨β | Y⟩ ) (3.82)
= ⟨α | X⟩⟨β | Y⟩ − ⟨β | X⟩⟨α | Y⟩. (3.83)
It follows that
α ∧ β = −β ∧ α. (3.84)
In particular
α ∧ α = 0. (3.85)
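At a single point this is elementary linear algebra, and the anticommutative law can be checked numerically, with coefficient vectors standing in for the 1-forms and vector fields at that point.

    import numpy as np

    # Numeric instance of the determinant formula: <alpha | X> is a dot product,
    # and <alpha ^ beta | X, Y> is the corresponding 2 x 2 determinant.
    rng = np.random.default_rng(0)
    alpha, beta, X, Y = rng.standard_normal((4, 3))

    def wedge2(a, b, X, Y):
        return np.linalg.det(np.array([[a @ X, b @ X],
                                       [a @ Y, b @ Y]]))

    print(wedge2(alpha, beta, X, Y) + wedge2(beta, alpha, X, Y))   # 0
    print(wedge2(alpha, alpha, X, Y))                              # 0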
The general formula for a product of k 1-forms is also given by a determinant. More generally, the product of a p-form θ with a (k − p)-form β is
⟨θ ∧ β | X1, . . . , Xk⟩ = Σ_σ sgn(σ) ⟨θ | Xσ(1), . . . , Xσ(p)⟩ ⟨β | Xσ(p+1), . . . , Xσ(k)⟩,
where the sum is over all permutations σ such that σ(1), . . . , σ(p) are in increasing order and σ(p + 1), . . . , σ(k) are in increasing order.
A simple example is the product of a 2-form θ with a 1-form β. Then
⟨θ ∧ β | X, Y, Z⟩ = ⟨θ | X, Y⟩⟨β | Z⟩ − ⟨θ | X, Z⟩⟨β | Y⟩ + ⟨θ | Y, Z⟩⟨β | X⟩. |
The multiplicative properties are summarized as follows.
Associative law
(ω ∧ σ) ∧ τ = ω ∧ (σ ∧ τ). (3.89)
If ω, σ, τ are forms of degrees n, m, p, then both sides are equal as n + m + p forms.
Distributive law
ω ∧ (σ + τ ) = ω ∧ σ + ω ∧ τ. (3.90)
Commutative law for even degree forms If ω or σ is of even degree, then
ω ∧ σ = σ ∧ ω. (3.91)
Anticommutative law for odd degree forms If both ω and σ are odd degree forms, then
ω ∧ σ = −σ ∧ ω. (3.92)
The way to remember this is that even degree forms commute with everything. On the other hand, odd degree forms anticommute with each other. In particular, if ω has odd degree, then ω ∧ ω = 0.
|
3.12 The exterior derivative

Now to a more theoretical treatment. We already know that the differential of a scalar is
du = (∂u/∂x1) dx1 + · · · + (∂u/∂xn) dxn. (3.97)
Every k-form may be written as a sum of forms of the form
s dxi1 ∧ · · · ∧ dxik. (3.98)
The differential of such a form is
d(s dxi1 ∧ · · · ∧ dxik) = ds ∧ dxi1 ∧ · · · ∧ dxik. (3.99)
The differential d has the following properties.
Additivity
d(ω + σ) = dω + dσ. (3.100)
Product property For a k-form ω,
d(ω ∧ σ) = dω ∧ σ + (−1)^k ω ∧ dσ. (3.101)
Differential of a differential
ddω = 0. (3.102)
If we think of d as a degree one quantity, then the sign in the product property
makes sense. Also, in this context the differential of a differential property also
makes sense.
A k-form σ is called closed if dσ = 0. A k-form σ is called exact in an
open set if σ = dα for some k − 1 form α in the open set. It follows from the
differential of a differential property that every exact form is closed.
It is useful to look at these quantities in low dimensions. For instance, in three dimensions one might have a differential 2-form such as
σ = a dy ∧ dz + b dz ∧ dx + c dx ∧ dy (3.103)
or a differential 3-form such as
τ = s dx ∧ dy ∧ dz. (3.104)
Notice that these forms are created as linear combinations of exterior products
of 1-forms.
For instance, consider again the 2-form, written without wedges:
σ = a dy dz + b dz dx + c dx dy. (3.112)
Then
dσ = da dy dz + db dz dx + dc dx dy = (∂a/∂x) dx dy dz + (∂b/∂y) dy dz dx + (∂c/∂z) dz dx dy. (3.113)
This simplifies to
dσ = d(a dy dz + b dz dx + c dx dy) = (∂a/∂x + ∂b/∂y + ∂c/∂z) dx dy dz. (3.114)
There are ways of picturing differential forms. If s is a 0-form, that is, a
scalar field, then it is a function from some set to the real numbers. The set
could be two-dimensional (or three-dimensional). So it certainly makes sense to
talk about a curve (or a surface) where s has a particular constant value. These
are the contour curves (surfaces) of the scalar field s. The scalar field does not
change along these curves (surfaces). Closely spaced contour curves (surfaces)
indicate a rapid increase or decrease in the values of s.
For an exact 1-form ds one uses the same picture of contour curves (surfaces),
but the philosophy is a bit different. One magnifies the region near a given point,
and one notes that the magnified curves (surfaces) are nearly lines (planes). So
at a small scale they are approximately the contour lines (planes) of linear
functions. The linear functions (forms) do not change much along such lines
(planes).
For a 1-form that is not exact the picture looks almost the same, except that the small scale contour lines (planes) can have end points (end curves).
They don’t come from a big scale contour curve (surface) picture. The end points
(end curves) have an orientation that indicates the direction of increase for the
small scale contour lines (planes). The differential of the 1-form corresponds to
this cloud of end points (end curves). A line integral of a 1-form along a curve
represents the cumulative change as the curve crosses the contour lines (planes).
More precisely, a differential 1-form assigns to each point and to each tangent
vector at that point a real number. The form is pictured by indicating those
spaces of tangent vectors at particular points on which the form gives the value
zero. Such a tangent space at a particular point is a line (plane).
If there is a metric, then you can picture a 1-form by vectors that are perpen-
dicular to the lines (planes) defining the form. Many people like to do this. But
it complicates calculations. And for many applications there is not a natural
choice of metric.
Example: Consider the 1-form y dx in two dimensions. This is represented by vertical contour lines that terminate at points in the plane. The density of these lines is greater as one gets farther from the x axis. The increase is to the right above the x axis, and it is to the left below the x axis. The differential of y dx is dy dx = −dx dy. This 2-form represents the cloud of terminating points, which has a uniform density. The usual convention is that the positive orientation is counterclockwise. So the orientations of these source points are clockwise. This is consistent with the direction of increase along the contour lines. |
To understand the picture for 2-forms, we can look at 3-dimensional space.
If we look at a 1-form, it assigns numbers to little vectors. We picture the
1-form by the vectors (forming little planes) on which it is zero. If we look at
a 2-form, it assigns numbers to little oriented parallelograms. We picture it by
looking at the intersection of all the little parallelograms for which it has the
value zero. These determine a line. So one can think of the 2-form locally as
given by little lines. Globally they form curves, with a kind of spiral orientation.
They may have oriented end points. The differential of a 2-form corresponds to
the cloud of oriented end points. The integral of a 2-form on an oriented surface
depends on what the 2-form assigns to little oriented parallelograms formed by
tangent vectors to the surface. The non-zero contributions correspond to when
the curves representing the 2-form are transverse to the surface.
More precisely, a differential 2-form in 3-dimensional space assigns to each
point and to each ordered pair of tangent vectors at that point a real number.
The form is pictured by those tangent vectors that belong to a pair giving the
value zero. Such a tangent space at a particular point is a line.
When there is a metric, it is possible to picture the 2-form as a vector field
along the direction of the lines. In that case the surface integral represents the
amount by which the vectors penetrate the surface.
Example: Say the 2-form is dx dy. It is zero on the vector pair ∂/∂z, ∂/∂x and on the vector pair ∂/∂y, ∂/∂z. In other words, it is zero on every pair including the vector ∂/∂z. So we picture it by where it is zero, that is, by lines in the z direction. If we integrate it along a surface where z is constant, then the little parallelograms in the surface are spanned by vectors like ∂/∂x and ∂/∂y, and so we get a non-zero result. |
Remark: Consider the case of three dimensions. Anyone familiar with vector
analysis will notice that if s is a scalar, then the formula for ds resembles the
formula for the gradient in cartesian coordinates. Similarly, if α is a 1-form, then
the formula for dα resembles the formula for the curl in cartesian coordinates.
The formula d ds = 0 then corresponds to the formula curl grad s = 0.
In a similar way, if σ is a 2-form, then the formula for dσ resembles the
formula for the divergence of a vector field v in cartesian coordinates. The
formula d dα = 0 then corresponds to the formula div curl v = 0.
There are, however, important distinctions. First, the differential form for-
mulas take the same form in arbitrary coordinate systems. This is not true for
the formulas for the gradient, curl, and divergence. The reason is that the usual
definitions of gradient, curl, and divergence are as operations on vector fields,
not on differential forms. This leads to a much more complicated theory, except
for the very special case of cartesian coordinates on Euclidean space. Later on
we shall examine this issue in detail.
Second, the differential form formulas have natural formulations for mani-
folds of arbitrary dimension. While the gradient and divergence may also be
formulated in arbitrary dimensions, the curl only works in three dimensions.
This does not mean that notions such as gradient of a scalar (a vector field)
or divergence of a vector field (a scalar) are not useful and important. Indeed,
in some situations they play an essential role. However one should recognize
that these are relatively complicated objects.
The same considerations apply to the purely algebraic operations, at least
in three dimensions. The exterior product of two 1-forms resembles in some
way the cross product of vectors, while the exterior product of a 1-form and a
2-form resembles a scalar product of vectors. Thus the exterior product of three
1-forms resembles the triple scalar product of vector analysis. Again these are
not quite the same thing. |
3.13 The Poincaré lemma

Call an open subset of Rn a nice region if it is diffeomorphic to the open unit ball B in Rn.

Proposition 3.12 The following are nice regions: the space Rn itself, the positive orthant Rn+, the open unit cube Cn, and the open simplex ∆n.

Proof:
1. There is a map from x in B to y in Rn given by y = x/√(1 − |x|²). The inverse map is given by x = y/√(1 + |y|²).
2. There is a map from Rn to Rn+ given by zi = e^{yi}. The inverse map is yi = log(zi).
3. There is a map from Cn to Rn+ given by zi = ui/√(1 − ui²). The inverse map is ui = zi/√(1 + zi²).
4. There is a map from ∆n to Rn+ given by z = x/(1 − Σi xi). The inverse map is x = z/(1 + Σi zi).
An n-dimensional local manifold patch is a manifold patch with a coordinate
system x : M → U , where U ⊆ Rn is a nice region. In the case n = 0 a local
manifold patch is a single point. The terminology used here is not standard,
but the idea is that a local manifold patch has no interesting global features. In
fact, for each n it is an essentially unique object.
Example: An example of a 2-dimensional manifold patch with global features is
one modeled on a plane with a point removed. Another example is one modeled
on a plane with two points removed. These last two examples are not only not
diffeomorphic to the plane, but they are also not diffeomorphic to each other.
In fact, they are very different as global objects. |
Example: If ω is exact in a region, that is, if ω = dα in the region, then ω is closed: dω = 0. The converse is not true in general. Here is a two dimensional example. Let
ω = (x dy − y dx)/(x² + y²) (3.115)
in the plane with the origin removed. Then ω is closed, but not exact. If we
remove a line running from the origin to infinity, then the resulting region is a
local manifold patch. In this smaller region ω is exact, in fact, ω = dφ, where
x = r cos(φ) and y = r sin(φ). |
What is true is that if ω is closed, then ω is locally exact. In fact, it is exact on every local manifold patch. This will be proved in the following famous Poincaré lemma.

Theorem (Poincaré lemma) Let ω be a closed differential k-form, k ≥ 1, on a local manifold patch. Then ω is exact: there is a (k − 1)-form α defined there with dα = ω.
Proof: We may as well assume that the coordinate system sends the local
manifold patch to an open ball centered at the origin. This implies that if
x1 , . . . , xn are coordinates of a point in the region, and if 0 ≤ t ≤ 1, then
tx1 , . . . , txn are coordinates of a point in the region.
If ω is a k-form, then we may obtain a form ω̄ by substituting txi for xi
everywhere. In particular, expressions dxi become d(txi ) = xi dt + t dxi . Every
differential form σ involving dt and other differentials may be written σ =
σ1 + σ2 , where σ1 is the static part, depending on t but with no factors of dt,
and σ2 = dt β is the remaining dynamic part, with β depending on t but with no factors of dt. Define K(σ) = K(σ2) = ∫_0^1 dt β, where σ2 = dt β. The claim
is that
K(dω̄) + dK(ω̄) = ω. (3.116)
This is proved in two parts. The first part deals with the static part ω̄1. By definition K(ω̄1) = 0. We show that K(dω̄1) = ω. But (dω̄1)2 only involves t derivatives of the coefficients, so by the fundamental theorem of calculus K(dω̄1) = K((dω̄1)2) = ω̄1|t=1 − ω̄1|t=0 = ω. (The contribution at t = 0 vanishes, since for k ≥ 1 every term of ω̄1 carries a factor of t.)
The second part is that K(dω̄2 ) = −dK(ω̄2 ). But dω̄2 = −dt dβ = −dt (dβ)1 ,
so K(dω̄2 ) = −K(dt (dβ)1 ) = −dK(ω̄2 ). These two parts establish the claim.
The result follows from the claim. If dω = 0, then dω̄ = 0, and so ω = dK(ω̄).
Remark: The algorithm is simple, provided that one can do the integrals. Start
with a closed differential form ω defined in terms of x1 , . . . , xn . Replace xi by
txi everywhere, including in differentials. Collect all terms that begin with dt.
Put the dt in front. Integrate from 0 to 1 (with respect to t, keeping everything
else fixed). The result is a form α with dα = ω. |
Example: Consider the closed 1-form ω = x dy + y dx. Then ω̄ = t2 x dy +
t2 y dx + 2xyt dt. The integral of 2xyt dt is α = xy. |
Example: Consider the closed form ω = dx dy. Then ω̄ = t2 dx dy + tx dt dy −
ty dt dx. The integral of t dt (x dy − y dx) is α = (1/2)(x dy − y dx). |
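For 1-forms the algorithm is short enough to automate; a sketch with sympy (the helper name poincare_1form is ours, not the text's).

    import sympy as sp

    # For a closed 1-form omega = P dx + Q dy, the algorithm gives
    # alpha(x, y) = integral from 0 to 1 of (x P(tx, ty) + y Q(tx, ty)) dt,
    # and then d(alpha) = omega.
    x, y, t = sp.symbols('x y t')

    def poincare_1form(P, Q):
        Pt = P.subs([(x, t * x), (y, t * y)], simultaneous=True)
        Qt = Q.subs([(x, t * x), (y, t * y)], simultaneous=True)
        return sp.integrate(x * Pt + y * Qt, (t, 0, 1))

    P, Q = y, x                   # omega = x dy + y dx, a closed form
    alpha = poincare_1form(P, Q)
    print(alpha)                                          # x*y
    print(sp.simplify(sp.diff(alpha, x) - P),
          sp.simplify(sp.diff(alpha, y) - Q))             # 0 0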
One kind of object is a function, defined without reference to a particular variable; examples are y ↦ √(y² + 1) and u ↦ sin(u). Functions may be composed: the composition (y ↦ √(y² + 1)) ∘ (u ↦ sin(u)) = (w ↦ √(sin²(w) + 1)). On the other hand, the composition (u ↦ sin(u)) ∘ (y ↦ √(y² + 1)) = (z ↦ sin(√(z² + 1))). In general (y ↦ f(y)) ∘ (u ↦ g(u)) is just another name for t ↦ f(g(t)), which is itself just the composition f ∘ g. In many instances we leave out the composition symbol and just write f g.
The other kind of object is an expression that explicitly involves variables.
In logic this corresponds to the notion of free variable. For instance, sin(z) and
sin(t) are different expressions. There is an important operation called substi-
tution of an expression for a variable. An example is u ← sin(t). This means
to substitute sin(t) for u. As an example, u2 ◦ (u ← sin(t)) = sin2 (t). √ Substi-
tutions may √ be composed. Thus, for instance,
√ (u ← sin(t)) ◦ (t
√ ← z 2 + 1) =
2 2 2 2 2
(u ← sin( z + 1)). And u ◦ (u ← sin( z + 1)) = sin ( z + 1). Again
we often leave out the composition symbol. There are general identities such
as h(x)(x ← g(t)) = h(g(t)) and (x ← g(t))(t ← f (u)) = (x ← g(f (u))).
Composition of substitutions and composition of functions are clearly closely
related.
The substitution operation occurs in various forms and has various names,
including replacement and change of variables. In computer science there is a
related notion called assignment. An assignment x ← g(t) makes a change in
the machine state. It takes the number stored under the label t, computes g(t),
and then stores this result under the label x.
There is another notation that is very useful: In place of h(x)(x ← g(t)) =
h(g(t)) one instead writes (x ← g(t))∗ h(x) = h(g(t)). This notation at first
seems strange, but it is very useful. The idea is that the substitution x ← g(t)
is thought of as an operation that acts on expressions h(x), converting them to
other expressions h(g(t)). This kind of operation is called pullback.
If we write (u ← sin(t))∗ u2 = sin2 (t), then it seems natural to define the
expression (u ← sin(t))∗ du2 = d sin2 (t) = 2 sin(t) cos(t) dt. It also seems natural
to define (u ← sin(t))∗ 2u du = 2 sin(t)d sin(t) = 2 sin(t) cos(t) dt. That fact that
these give the same answer should be a source of satisfaction. Substitution
is somewhat more general than first appears; it has a natural application to
differential forms. In this context it is particularly natural to call it a pullback.
The same ideas extend to several variables. Take, for instance, the situation
when we have two variables x, y. A function such as xy 2 is a function on the
plane. Say that we want to perform the substitution ψ given by x ← t2 , y ← t3 .
Then we use ψ to pull back xy 2 to a function t8 on the line. We can write
ψ ∗ xy 2 = t8 . If we think of ψ as a parameterized curve, then the pullback is the
function on the curve expressed in terms of the parameter.
We can also pull back a differential form such as d(xy²) = y² dx + 2xy dy via the same ψ. The result using the right hand side is t⁶ d(t²) + 2t⁵ d(t³) = 2t⁷ dt + 6t⁷ dt = 8t⁷ dt. Of course using the left hand side we also get d(t⁸) = 8t⁷ dt.
For a form like ω = y dx + 2xy dy that is not exact, pulling it back by ψ gives the result 2t⁴ dt + 6t⁷ dt = (2t⁴ + 6t⁷) dt = d((2/5)t⁵ + (3/4)t⁸). Thus ψ*ω is exact, though ω is not exact. The pullback is a non-trivial operation on differential forms.
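The same pullback computation can be done symbolically; a sketch with sympy.

    import sympy as sp

    # Pull back omega = y dx + 2 x y dy under psi: x <- t**2, y <- t**3,
    # using dx = 2 t dt and dy = 3 t**2 dt.
    t = sp.symbols('t')
    xt, yt = t**2, t**3
    coeff = yt * sp.diff(xt, t) + 2 * xt * yt * sp.diff(yt, t)   # coefficient of dt
    print(sp.expand(coeff))        # 2*t**4 + 6*t**7
    print(sp.integrate(coeff, t))  # 2*t**5/5 + 3*t**8/4, a primitive as in the text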
In particular,
(y ← g(x))* dyi = d(gi(x)) = Σ_{j=1}^n g′i,j(x) dxj. (3.119)
This can then be written in terms of products of the dxj . If the products are
arranged in order, then the resulting coefficient is a determinant.
The result of the above computation may be written more succinctly if we use the definition dyI = dyi1 ∧ · · · ∧ dyik, where I = {i1, i2, . . . , ik} and i1 < i2 < · · · < ik. Then we write
ω = Σ_I hI(y) dyI, (3.122)
Additivity
φ∗ (ω + σ) = φ∗ ω + φ∗ σ. (3.125)
Product property
φ∗ (ω ∧ σ) = φ∗ ω ∧ φ∗ σ. (3.126)
Derivative
φ∗ (dω) = d(φ∗ ω). (3.127)
Composition
(φ ◦ χ)∗ ω = χ∗ φ∗ ω. (3.128)
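The derivative property can be verified in a concrete case; a sketch with the arbitrary map φ: (u, v) ↦ (x, y) = (u², u + v) and the 1-form ω = y dx.

    import sympy as sp

    # Verify phi*(d omega) = d(phi* omega) for phi(u, v) = (u**2, u + v), omega = y dx.
    u, v = sp.symbols('u v')
    xm, ym = u**2, u + v

    # phi* omega = y(u,v) d(x(u,v)); collect its du and dv coefficients.
    A = ym * sp.diff(xm, u)                        # du coefficient
    B = ym * sp.diff(xm, v)                        # dv coefficient
    d_of_pullback = sp.diff(B, u) - sp.diff(A, v)  # du^dv coefficient of d(phi* omega)

    # d omega = dy^dx = -dx^dy; its pullback has du^dv coefficient -det(Jacobian).
    jac = sp.Matrix([[sp.diff(xm, u), sp.diff(xm, v)],
                     [sp.diff(ym, u), sp.diff(ym, v)]])
    pullback_of_d = -jac.det()
    print(sp.simplify(d_of_pullback - pullback_of_d))   # 0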
For a passive transformation, that is, a change of coordinates, the vector fields transform by
∂/∂yp = Σ_q (∂xq/∂yp) ∂/∂xq. (3.130)
The two matrices in the last two equations are inverses of each other. This is
the reason for the coordinate invariance of the pairing of 1-forms with vector
fields.
In the case of an active transformation the coordinates yi are describing
one situation, and the coordinates xj are describing some other situation. For
instance, the xj could be parameters describing a singular surface in the space
described by the yi . There is no reason to require that N and M have the
same dimension. The transformation y ← g(x) takes points of N to points of a
different space M . In this case it is best to write the pullback explicitly.
In the special case of an active transformation with M = N , the trans-
formation φ given by x ← g(x) makes sense. In that case the transforma-
tion can be iterated. This is equivalent to iterating the function g, since
x(x ← g(x))n = gn (x). It is important to emphasize that the mapping
x ← g(x) is not the same as the function g. In fact, a common notation
for the function g is x 7→ g(x) with the arrow going the other direction.
There is an intermediate situation that can be confusing. This is when N is a
subset of M , and φ sends each point in N into the same point, but now regarded
as a point in M . In this situation many people write equations y = g(x). There
has to be enough context to indicate that this means that the restriction of y
to N is g(x).
3.16 Pushforward of a vector field

Let φ be a map from N to M given by x ← f(t). Consider a vector field
Z = Σ_{α=1}^k aα ∂/∂tα (3.131)
on N. The pushforward φ* Z acts on a scalar h on M according to
(φ* Z) h = Z(h(f(t))). (3.132)
In other words,
φ* ∂/∂tα = Σ_{i=1}^n f′i,α(t) (∂/∂xi)|x←f(t). (3.133)
The vectors on the right hand side are tangent to the image, since after the
differentiation is performed one substitutes a point in the image. In other words,
if one regards x ← f (t) as a parameterized surface in M , then these are tangent
vectors at various points on the surface.
Since this is a relatively awkward notion, it is not so common to make it
explicit. Typically one merely talks of the vector with components
∂xi/∂tα = f′i,α(t). (3.134)
It is understood that this is a way of talking about certain tangent vectors to
the surface.
Remark: For those interested in theoretical considerations, it is worth noting
that the notions of pushforward and pullback are related. The relationship is
somewhat complicated, so it may be wise to omit the following discussion on a
first reading.
To understand it, we need the notion of vector field W along a mapping φ from N to M. This sends each point in N to a differential operator that differentiates scalars on M, such that at each point in N the derivative is evaluated at the image point in M under φ. In coordinates this would have the form W = Σ_j kj(t) ∂/∂xj, where each kj(t) is a scalar on N and the xj are coordinates on M. If Z is a vector field on N, then the pushforward φ* Z is a vector field along φ. We also need the notion of differential k-form γ along a mapping φ from N to M. This acts as an antisymmetric multilinear function on vector fields along φ, so ⟨γ | W1, . . . , Wk⟩ is a scalar on N. In the case of a 1-form, γ could take the form γ = Σ_i fi(t) dxi. We would have ⟨γ | W⟩ = Σ_j fj(t) kj(t); there are corresponding formulas for k-forms. If ω is a differential form on M, then there is a corresponding object ω ∘ φ that is a differential form along φ. It may be defined by pulling back the scalar coefficients of ω. Thus in the 1-form case, if ω = Σ_j hj(x) dxj, then ω ∘ φ = Σ_j hj(f(t)) dxj.
The object of interest is φ∗ ω, the pullback of the differential form ω. Suppose
ω is a k-form on M and Z1 , . . . , Zk are vector fields on N and φ is a map from
N to M . Then the pullback φ∗ ω is a k-form on N given by
⟨φ∗ω | Z1, . . . , Zk⟩ = ⟨ω ◦ φ | φ∗Z1, . . . , φ∗Zk⟩. (3.135)
This is an equality of scalar fields on N . The φ∗ Zi are vector fields along φ;
they map points in N to tangent vectors at the image points in M. The ω ◦ φ is a differential form along φ; it supplies the values needed for the pullback. In summary, the pushforward of a vector field is not an ordinary
vector field; it is a vector field along the manifold mapping. However, the
pullback of a differential form is another differential form. It is the pullback
that is most simple and natural. |
3.17 Orientation
An orientation of an n dimensional vector space is determined by a list of n
basis vectors u1 , . . . , un . Two such lists determine the same orientation if they
are related by a matrix with determinant > 0. They determine the opposite
orientation if they are related by a matrix with determinant < 0. There are
always two orientations.
Sometimes it is useful to have new lists of basis vectors related to the old
sets of vectors by a determinant that has the value ±1. This can be done in
various ways, but here is a simple special case. Consider a variant ūi = si uτ (i) ,
where τ is a permutation of {1, . . . , n} and each si = ±1. Then the determinant
is the product of the si times the sign of the permutation τ . We shall use the
term variant as a technical term for such a new basis.
In one dimension an orientation is determined by specifying one of two di-
rections. So if u is a vector, every strictly positive multiple of u determines
the same orientation, while every strictly negative multiple of u determines the
opposite orientation. So the two variants u and −u determine the two orienta-
tions.
In two dimensions an orientation is specified by taking two linearly inde-
pendent vectors u, v in order. In fact, the same orientation is specified by the
variants u, v and v, −u and −u, −v and −v, u. The opposite orientation would
be given by any of the variants u, −v and −v, −u and −u, v and v, u. These
two orientations are often called counter-clockwise and clockwise. These are not
absolute notions; a counter-clockwise orientation on a piece of paper becomes
clockwise when the paper is viewed from the other side. Often the orientation
is pictured by a rectangle with vectors as sides. For instance this could be
u, v, −u, −v, which takes you back to the starting point.
In three dimensions an orientation is determined by a list of three vectors
u, v, w. Of course many other triples of vectors determine the same orientation.
If we permute the order and change the signs, we get 3! · 2³ = 48 variant lists, of which 24 have one orientation and 24 the opposite orientation. The two
orientations are usually called right-handed and left-handed. Again this is not
absolute; a mirror will reverse the orientations. Here one draws a cell with six
oriented sides. In the list of three vectors, the first vector goes from a departure
side to a destination side, the second and third vectors give the orientation of
the destination side. For a given orientation there are six faces the vector can
point to, each has its orientation determined. Since there are four ways to find
vectors determining the orientation of a given face, this gives 24 variant triples
of vectors.
In higher dimensions the idea is the same. One can think of the orientation
of a cell as giving a first vector from one side of the cell to the other, then giving
an orientation of the destination side. For dimension zero it is helpful to think
of an orientation as just a choice of a sign + or −.
Consider a parameterized cell C consisting of the vectors Σ_{i=1}^n ti ui, where 0 ≤ ti ≤ 1. This cell has 2n boundary cells. For each k there are 2 corresponding boundary cells, related to each other by the vector uk. The cell Ck− is obtained by setting tk = 0 and consists of the combinations Σ_{i≠k} ti ui. The cell Ck+ is obtained by setting tk = 1 and consists of the combinations Σ_{i≠k} ti ui + uk. Notice that uk takes Ck− to Ck+, while −uk takes Ck+ to Ck−.
Given the orientation determined by u1 , . . . , un , there is a natural orienta-
tion on each of the 2n boundary cells. The cell Ck+ has orientation obtained
by moving uk to the front of the list, with a sign change for each interchange,
and then removing it. This implies that the orientation of the cell Ck+ is given
by the orientation u1 , . . . , uk−1 , uk+1 , . . . , un when k is odd and by the op-
posite orientation when k is even. The cell Ck− has the opposite orientation.
This implies that the orientation of the cell Ck− is given by the orientation
u1 , . . . , uk−1 , uk+1 , . . . , un when k is even and by the opposite orientation when
k is odd.
As an example, consider the case n = 2 with the list of vectors u, v. The
orientations of the boundary cells C1+ , C2+ are v and −u respectively, while the
orientations of C1− , C2− are −v and u respectively.
A more challenging example is n = 3 with the list of vectors u, v, w. The
orientations of the boundary cells C1+ , C2+ , C3+ are given by v, w and −u, w
and u, v. The orientations of C1− , C2− , C3− are given by −v, w and u, w and
−u, v. Of course one can always use variants of these lists to define the same
orientations.
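As a quick check on this counting, here is a small computational sketch (an illustration, not part of the text) that enumerates all 3! · 2³ = 48 variants of the standard basis of R³ and sorts them by the sign of the determinant.

```python
# Enumerate the variants s1*u_tau(1), s2*u_tau(2), s3*u_tau(3) of the standard
# basis of R^3 and count how many give each orientation.
from itertools import permutations, product
import numpy as np

basis = np.eye(3)
counts = {1: 0, -1: 0}
for tau in permutations(range(3)):
    for signs in product([1, -1], repeat=3):
        variant = np.column_stack([signs[i] * basis[:, tau[i]] for i in range(3)])
        counts[int(np.sign(np.linalg.det(variant)))] += 1
print(counts)  # {1: 24, -1: 24}: 24 variant triples for each orientation
```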
All these ideas make sense for n = 1 with a single vector u. There are two
boundary cells each consisting of a single point. The convention is that C1+
corresponds to the point at u with positive orientation, while C1− corresponds
to the point at the origin with negative orientation.
There is a corresponding notion of orientation for a connected manifold
patch. Given a coordinate system x1 , . . . , xn , there is a corresponding orien-
tation given by the list of basis vector fields ∂/∂x1 , . . . , ∂/∂xn . Given two
coordinate systems on a connected manifold patch, they either have the same
orientation or the opposite orientation.
There are other ways of giving orientations on a connected manifold patch.
Instead of using a list of basis vector fields, one can use a list of basis 1-
forms. Or one can give a single non-zero n-form. Often we suppose that an
orientation is given. Then for every coordinate system, either the coordinate
system is compatible with that orientation, or it is compatible with the opposite
orientation.
Example: A one-dimensional connected manifold patch is something like a
curve; an orientation gives a direction along the curve. A coordinate system
has only one coordinate u. It is compatible with the orientation if it increases
in the direction given by the orientation.
A two-dimensional connected manifold patch has a clock orientation. For dimension two there are two coordinates u, v. Consider a change first in u (keeping v fixed) and then in v (keeping u fixed) that is in the sense of this orientation. If both these changes are positive, or if both these changes are negative, then the coordinate system is compatible with the orientation.
The case of a zero-dimensional connected manifold patch is special; it is just
a featureless point. Here an orientation is a + or − sign. A coordinate is a
non-zero number attached to the point. It is compatible with the orientation if
this number has the same sign. |
Example: Consider the part of the cylinder x² + y² = 25 with x ≥ 0, y ≥ 0, and 0 ≤ z ≤ 3, and consider the integral of the 2-form
α = z dy dz + 2x dz dx − 4x²z dx dy (3.144)
on this surface. One way to calculate this is to parameterize the surface by the corresponding rectangle in the x, z plane, with 0 ≤ x ≤ 5, 0 ≤ z ≤ 3. The
calculation amounts to pulling back by y ← √(25 − x²). The pullback is the form
α∗ = (zx/√(25 − x²) + 2x) dz dx. (3.145)
This can be integrated over the rectangle to give 195/2. |
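The pullback and the value 195/2 can be checked symbolically. The following sketch (using sympy; an illustration, not part of the text) repeats the computation.

```python
# Verify the pullback (3.145) and the integral 195/2 for the example above.
import sympy as sp

x, z = sp.symbols('x z', positive=True)
y = sp.sqrt(25 - x**2)                 # the substitution y <- sqrt(25 - x^2)

# Under the pullback, dy = y'(x) dx, so z dy dz -> -z y'(x) dz dx and dx dy -> 0.
coeff_dz_dx = -z*sp.diff(y, x) + 2*x   # coefficient of dz dx in the pullback
print(sp.simplify(coeff_dz_dx))        # x*z/sqrt(25 - x**2) + 2*x

print(sp.integrate(coeff_dz_dx, (z, 0, 3), (x, 0, 5)))   # 195/2
```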
In order to avoid complicated considerations of change of orientation, it is customary to write −χ and define it in such a way that
∫_{−χ} ω = −∫_χ ω. (3.146)
What happens if k = m = n? Then the form is ω = f(y) dy. So the theorem says
∫_{φχ} f(y) dy = ∫_χ f(g(x)) det g′(x) dx. (3.149)
This looks like the classic change of variables theorem, but without the absolute
value sign. The reason the absolute value is not needed is that the integrals are
defined with respect to parameterizations χ and φχ, and these are thus oriented
integrals.
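A one-dimensional sketch (sympy; illustrative only) shows why no absolute value appears: with a decreasing g the image is traversed backwards, and both sides pick up the same sign.

```python
# Oriented change of variables y <- g(x) with the decreasing map g(x) = 1 - x.
import sympy as sp

x, y = sp.symbols('x y')
f = y**2
g = 1 - x                                          # g'(x) = -1 < 0

lhs = sp.integrate(f.subs(y, g)*sp.diff(g, x), (x, 0, 1))
rhs = sp.integrate(f, (y, 1, 0))                   # oriented: from g(0)=1 to g(1)=0
print(lhs, rhs)                                    # -1/3 -1/3
```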
∂χ = χ ∂I. (3.151)
This just means that ∂χ is the chain Σ_{i=1}^k (−1)^{i−1} (χIi+ − χIi−). This is a chain of parameterized k − 1 surfaces that each map to N.
If χ is a k-chain (an integer linear combination of parameterized k-surfaces),
then its boundary ∂χ is a k − 1 chain (the corresponding integer linear combi-
nation of the boundaries of the surfaces).
All of these objects have coordinate representations. The face mapping
surfaces are
Proof: The proof has two steps. The first step is to prove the theorem in the special case when the cell parameterizes itself, that is, when χ = I : Q → Q sends every point to itself. The general k − 1 form on a k-dimensional space is
ω = Σ_{j=1}^k ωj = Σ_{j=1}^k fj(u1, . . . , uk) du1 ∧ · · · ∧ duj−1 ∧ duj+1 ∧ · · · ∧ duk. (3.155)
In that case
dω = Σ_{j=1}^k dωj = Σ_{j=1}^k (−1)^{j−1} (∂fj(u1, . . . , uk)/∂uj) du1 ∧ · · · ∧ duk. (3.156)
Hence
∫_I dω = ∫_Q I∗ dω = Σ_{j=1}^k (−1)^{j−1} ∫_A (∂fj(u1, . . . , uk)/∂uj) du1 · · · duk. (3.157)
We can now use Fubini’s theorem to write the integral on the right hand side as an iterated integral, with the duj integral as the inside integral. The fundamental theorem of calculus says that the integral is equal to
∫_{Aj} [fj(u1, . . . , bj, . . . , uk) − fj(u1, . . . , aj, . . . , uk)] du1 · · · duj−1 duj+1 · · · duk. (3.158)
Here Aj is the relevant k − 1 cell. In other words,
∫_I dω = Σ_{j=1}^k (−1)^{j−1} [∫_{Qj+} Ij+∗ ωj − ∫_{Qj−} Ij−∗ ωj]. (3.159)
Finally, this is
∫_I dω = Σ_{j=1}^k (−1)^{j−1} [∫_{Ij+} ω − ∫_{Ij−} ω] = ∫_{∂I} ω. (3.161)
The second step uses the fact that the integral of a differential form α over
a chain χ may be expressed by pulling back to parameter cells. It also depends
on the result that the pullback of a differential is the differential of the pullback,
that is, χ∗ (dω) = dχ∗ ω. This gives
∫_χ dω = ∫_I χ∗(dω) = ∫_I dχ∗ω = ∫_{∂I} χ∗ω = ∫_{χ∂I} ω = ∫_{∂χ} ω. (3.162)
So the properly formulated result is rather simple; it follows from the trivial
case of the cell and from the remarkable transformation properties of differential
forms.
Here C is an oriented path from one point to another point, and ∆s is the value
of s at the final point minus the value of s at the initial point. Notice that the
result does not depend on the choice of path. This is because ds is an exact
form.
Example: Consider the form y² dx + 2xy dy. Since it is exact, we have
∫_C (y² dx + 2xy dy) = ∫_C d(xy²) = ∆(xy²). (3.164)
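Path independence is easy to test. The sketch below (sympy; the two paths are arbitrary choices) integrates the form along two different curves from (0, 0) to (1, 2) and gets ∆(xy²) = 4 both times.

```python
# Integrate y^2 dx + 2xy dy along two paths from (0,0) to (1,2).
import sympy as sp

t = sp.symbols('t')

def line_integral(xpath, ypath):
    p, q = ypath**2, 2*xpath*ypath                 # coefficients of dx and dy
    return sp.integrate(p*sp.diff(xpath, t) + q*sp.diff(ypath, t), (t, 0, 1))

print(line_integral(t, 2*t))        # straight line: 4
print(line_integral(t**2, 2*t**3))  # curved path:   4 = Delta(x*y^2)
```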
Here R is an oriented region in two dimensional space, and ∂R is the curve that
is its oriented boundary.
Example: A classical application of Green’s theorem is the computation of area via
∫_R dx dy = (1/2) ∫_{∂R} (x dy − y dx). (3.166)
In polar coordinates this takes the form
∫_R r dr dθ = (1/2) ∫_{∂R} r² dθ. (3.167)
In this case a typical parameter region for the integral on the left hand side
may be considered in the r, θ plane as the four sided region where θ ranges from
0 to 2π and r ranges from 0 to some value depending on θ. The integral on
the right is over a chain consisting of four oriented curves. However three of
these curves contribute a total of zero: the contributions from θ = 0 and θ = 2π
take opposite orientations and cancel each other, while at r = 0 the integrand
vanishes. So only one oriented curve on the right hand side contributes to the
calculation of the area. |
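As a concrete instance of formula (3.167), here is a sketch (sympy; the cardioid is a stock example, not one from the text) computing an area from the single contributing boundary curve.

```python
# Area enclosed by the cardioid r = 1 + cos(theta) via (1/2) * integral of r^2 dtheta.
import sympy as sp

theta = sp.symbols('theta')
r = 1 + sp.cos(theta)
print(sp.Rational(1, 2)*sp.integrate(r**2, (theta, 0, 2*sp.pi)))   # 3*pi/2
```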
The integral of a 2-form over a surface is called a surface integral. The
classical Stokes’s theorem says that for an oriented two dimensional surface S
in a three dimensional space with oriented boundary curve ∂S we have
∫_S [(∂r/∂y − ∂q/∂z) dy dz + (∂p/∂z − ∂r/∂x) dz dx + (∂q/∂x − ∂p/∂y) dx dy] = ∫_{∂S} (p dx + q dy + r dz). (3.168)
This result for 2-forms has an obvious analog in n dimensions. This case of
Stokes’ theorem has important consequences for line integrals of closed forms.
Proof: Fix an initial point and a final point, and suppose that the final
point has coordinates x0 . Consider the scalar
s = h(x0) = ∫_{initial}^{final(x0)} ω. (3.171)
By the property of the region U and the independence of path for closed forms,
this is a well-defined scalar depending only on the final point. It is not too hard
to show that ds = ω in U .
The result in two dimensions only requires Green’s theorem. Even this case
is significant. Much of what is interesting in complex variables depends on the
fact that
α = (x dy − y dx)/(x² + y²) (3.172)
is a form (defined in the plane with one point removed) that is closed but
not exact. If one considers the plane with an entire half-line from the origin
removed, then this form is exact in that smaller region, in fact, α = dφ, where
φ is a suitable angle. But the interest is in what happens with curves that
go entirely around the origin. Since such a curve is not a boundary, it is not
surprising that the result can be a non-zero multiple of 2π.
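Both claims are quick to verify symbolically. The sketch below (sympy; illustrative) checks dα = 0 and evaluates the integral around the unit circle.

```python
# alpha = (x dy - y dx)/(x^2 + y^2): closed, but its circle integral is 2*pi.
import sympy as sp

x, y, t = sp.symbols('x y t')
p = -y/(x**2 + y**2)                       # coefficient of dx
q = x/(x**2 + y**2)                        # coefficient of dy
print(sp.simplify(sp.diff(q, x) - sp.diff(p, y)))    # 0, so d(alpha) = 0

cx, cy = sp.cos(t), sp.sin(t)              # the unit circle
integrand = (p.subs({x: cx, y: cy})*sp.diff(cx, t)
             + q.subs({x: cx, y: cy})*sp.diff(cy, t))
print(sp.integrate(sp.simplify(integrand), (t, 0, 2*sp.pi)))   # 2*pi
```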
Gauss’s theorem is the case relating n − 1 forms to n forms. The classical
case is when n = 3, so that it relates 2-forms to 3-forms. When n = 2 it is
Green’s theorem. Let W be an oriented three dimensional region, and let ∂W
be the oriented surface that forms its boundary. Then the three dimensional
version of Gauss’s theorem states that
∫_W (∂a/∂x + ∂b/∂y + ∂c/∂z) dx dy dz = ∫_{∂W} (a dy dz + b dz dx + c dx dy). (3.173)
For a closed 1-form α we have dα = 0, and the integral over every boundary is zero.
A k-form in n dimensions is a much more complicated object. A strategy to
visualize them is to look at a certain subspace where the form vanishes. This
does not completely characterize the form, but it gives at least some intuition
about what it looks like.
Consider the case of dimension n. If X is a vector field, and ω is a k-form, then we may define the interior product of X with ω to be the k − 1 form Xcω defined by
⟨Xcω | Y1, . . . , Yk−1⟩ = ⟨ω | X, Y1, . . . , Yk−1⟩. (3.175)
(∂/∂xj) c dxj ∧ dxI = dxI. (3.176)
Thus, for instance, the interior product of ∂/∂y with dx dy is equal to the interior product of ∂/∂y with −dy dx, which is −dx.
The characteristic subspace of ω is the subspace of all X with Xcω = 0.
The condition for X to be a characteristic vector is that for every k-tuple
X, Y1, . . . , Yk−1 to which X belongs we have ⟨ω | X, Y1, . . . , Yk−1⟩ = 0. If the
k-form is non-zero, then the dimension of the characteristic subspace is ≤ n − k.
If the characteristic subspace of ω has dimension n−r at each point, then the
form is said to have rank r. If the k-form is non-zero, then the rank is ≥ k. It is
not true that every non-zero k-form is of rank k. The simplest counterexample is
in dimension 4 and is ω = dx1 dx2 + dx3 dx4 , which has rank 4. In this case, the
characteristic subspace consists only of the zero vector. It may be shown that
a non-zero k-form is of rank k if and only if it is decomposable, that is, it may
be represented as a product of non-zero 1-forms. The form in the example is
not decomposable. For more on decomposable forms see the books by Crampin
and Pirani [5] and by Şuhubi [19].
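For this particular example the characteristic subspace can be computed by linear algebra: a 2-form corresponds to an antisymmetric matrix A with ⟨ω | X, Y⟩ = XᵀAY, and Xcω = 0 exactly when AX = 0. The sketch below (sympy; the matrix encoding is the only assumption) confirms that the kernel is trivial.

```python
# The 2-form dx1 dx2 + dx3 dx4 as an antisymmetric matrix; its kernel is the
# characteristic subspace.
import sympy as sp

A = sp.Matrix([[0, 1, 0, 0],
               [-1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, -1, 0]])
print(A.nullspace())   # []: only the zero vector, so the form has rank 4
```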
If a k-form is of rank k, then it would seem natural to picture it by its
corresponding characteristic subspace of dimension n − k. This may not give
complete information about the form, but it will indicate its general character.
Consider, for example, a 1-form α = Σ_j pj dxj. If the form α is non-zero at some point, then some pj ≠ 0, and so the corresponding space of vectors X at the given point is n − 1 dimensional.
The left hand side represents the integral of the oriented end points over the
n-dimensional oriented region W . The right hand side represents the flux of ω
through the boundary surface ∂W, which in a discrete approximation is imag-
ined as proportional to the number of tubes penetrating the surface, taking
into account orientation. In other words, the output through the boundary is
explained by an integral of the production inside the region. In the discrete
approximation, the Gauss theorem reduces to counting, since the number of
lines passing through the bounding surface ∂W described by the n − 1 form ω
corresponds to the number of points in the interior W described by the n-form
dω, as usual taking signs into account. In the case when ω is closed, there is no
production, and the total flow across the boundary ∂W is zero.
We shall see in the next chapter that when there is a given volume form,
then the n − 1 form in the Gauss theorem may be represented by a vector field.
In this case the Gauss theorem becomes the divergence theorem.
In the case n = 3 a 1-form α has a derivative dα that is a 2-form. The 1-form may be pictured as surfaces that end in curves, and the 2-form is represented by thin tubes. The tubes act as hinges on which the surfaces hang. Since dα is an exact form, the tubes have no end points. Stokes’ theorem relates the 1-form α represented by the surfaces to the 2-form dα represented by the thin tubes.
The formula represents the integral of α around the closed curve ∂S in terms of
the integrated flux of the tubes given by dα through a surface S that this curve
bounds. The result is independent of the surface. In a discrete approximation
Stokes’ theorem is again the result of counting, since the number of surfaces
with equal increments (taking into account increase or decrease) is equal to the
number of tubes acting as hinges (taking into account orientation).
Example: Take the form α = x dy in three dimensions and an oriented surface
S in a constant z plane bounded by two values of y and by a positive and a
negative value of x. The boundary ∂S is taken oriented so that y increases
for the positive value of x and y decreases for the negative value of x. Then
the integral along the boundary is positive. This is explained by the tubes
representing dα = dx dy, which are vertical, of constant density, and have an
orientation compatible with the orientation of S. |
The case n = 2 gives rise to two equivalent pictures. The first picture is
that of Green’s theorem. Represent the 1-form α = p dx + q dy near each point
by the line where it vanishes. Then α represents the increase or decrease as
one crosses such lines along an oriented curve. It may help to think of the
lines as double lines representing a step up or a step down in a given direction.
The differential dα = (∂q/∂x − ∂p/∂y) dx dy represents the density of hinge
points where the lines begin. So Green’s theorem says that the total increase
or decrease is completely explained by this cloud of hinge points. When the
form α is closed, there are no hinge points, and the integral around every closed
curve is zero.
Example: Take the form α = x dy in two dimensions. This is represented by
lines of constant y whose spacing decreases and reverses sign as x passes through
zero. Consider a region S bounded by two values of y and by a positive and a
negative value of x. The boundary ∂S is taken oriented so that y increases for
the positive value of x and y decreases for the negative value of x. One thinks of
α as indicating some sort of subjective change in vertical distance. The integral
along ∂S is positive, since one is going uphill along the entire closed curve.
In essence this is the famous picture of the Penrose stairs. (The most famous
illustration of these stairs is the Ascending and Descending lithograph print by
Escher.) |
The other picture for n = 2 is closer to the Gauss theorem. It is suggested
by writing α = p dy − q dx and considering it as a 2 − 1 form. In dimension 2 the
tubes representing the form have a transverse orientation. These can be thought
of as double lines, where the orientation goes from one line to its neighbor. So
α represents the amount that these lines cross an oriented curve, taking into account orientation.
3.23 Electric and magnetic fields
dE = 0 (3.187)
and
dD = R. (3.188)
Here R is a 3-form called the charge-density.
These equations have integral forms. The first equation says that for every surface W the integral around the closed curve ∂W that is its boundary is zero:
∫_{∂W} E = 0. (3.189)
The second says that for every region Ω the integral over the surface ∂Ω of D is equal to the total charge in the region:
∫_{∂Ω} D = ∫_Ω R. (3.190)
As an illustration of these ideas, here is the computation of the electric displacement field of a charge that is uniform on a ball of radius ε about the origin. For D we have the same formula as before for r ≥ ε. However for r ≤ ε we have
Din = (r³/ε³) D. (3.191)
One way to see that this works is to compute the exterior derivative. This is for r ≤ ε
d Din = (3r²/(4πε³)) dr sin(θ) dθ dφ = ((4π/3)ε³)⁻¹ r² sin(θ) dr dθ dφ = ((4π/3)ε³)⁻¹ dx dy dz. (3.192)
Indeed this is a constant charge density within the ball with total charge equal
to one. Thus the radial lines representing the electric displacement field begin
inside this ball. If we restrict the form to the region outside the charged ball,
then D is a closed 2-form that is not exact.
The corresponding electric field for r ≤ ε is
Ein = (r³/ε³) E = (1/4π)(r/ε³) dr. (3.193)
The potential is
φin = (1/(4πε³)) (1/2)(3ε² − r²). (3.194)
The constant term makes the potential continuous at r = ε.
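These formulas are easy to check. The sketch below (sympy; eps stands for the ε of the text) verifies that −dφin/dr reproduces the coefficient of Ein and that the potential is continuous at r = ε.

```python
# Check E_in = -d(phi_in)/dr dr and continuity of the potential at r = eps.
import sympy as sp

r, eps = sp.symbols('r eps', positive=True)
phi_in = (1/(4*sp.pi*eps**3))*sp.Rational(1, 2)*(3*eps**2 - r**2)
E_in_coeff = (1/(4*sp.pi))*r/eps**3            # coefficient of dr in E_in

print(sp.simplify(-sp.diff(phi_in, r) - E_in_coeff))       # 0
print(sp.simplify(phi_in.subs(r, eps) - 1/(4*sp.pi*eps)))  # 0
```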
A magnetic field (also called magnetic flux density ) is most often modeled as
a vector field, but in many ways it is more natural to model it as a differential 2-
form. For example, consider the case of a wire bearing current along the z axis.
The magnetic flux density in this case is most easily expressed in cylindrical
coordinates via
B = (1/2π) dz (x dx + y dy)/(x² + y²) = (1/(2πρ)) dz dρ. (3.195)
The lines of magnetic flux density circle around the wire; their density drops off
with distance from the wire.
There is another kind of magnetic field, sometimes called the magnetic field
intensity. It is written H. For our purposes we can think of it as the same field,
but considered as a 1-form. Thus for the magnetic field intensity of the wire we
have
H = (1/2π) (x dy − y dx)/(x² + y²) = (1/(2πρ)) ρ dφ = (1/2π) dφ (3.196)
in cylindrical coordinates. The surfaces of constant φ are planes with one side
on the wire. Notice that this is not an exact 1-form, since the integral around a
closed curve surrounding the wire is 1. In physics the magnetic field intensity is
often represented by a vector field orthogonal to the planes, again circling the
wire.
The fundamental equations of magnetostatics are
dB = 0 (3.197)
and
dH = J. (3.198)
Here J is a 2-form called the current-density. The fact that J is exact represents
current conservation: the lines representing the 2-form J have no end points.
These equations have integral forms. The first says that for every region Ω
the integral over the surface ∂Ω of B is zero.
∫_{∂Ω} B = 0. (3.199)
Magnetic flux lines never end. The second equation says that for every surface
W the integral of H around the closed curve ∂W that is its boundary is the
current passing through the surface.
∫_{∂W} H = ∫_W J. (3.200)
As an illustration of these ideas, here is the computation of the magnetic intensity field of a current that is uniform on a cylinder of radius ε about the z axis. For H we have the same formula as before for ρ ≥ ε. However for ρ ≤ ε we have
Hin = (ρ²/ε²) H. (3.201)
One way to see that this works is to compute the exterior derivative. This is for ρ ≤ ε
d Hin = (2ρ/ε²)(1/2π) dρ dφ = (πε²)⁻¹ ρ dρ dφ = (πε²)⁻¹ dx dy. (3.202)
Indeed this is a constant current density within the cylinder with total current
equal to one. Thus the planes representing the H field end in lines inside the
cylinder. If we restrict H to the region outside the wire, then it is an example
of a closed 1-form that is not exact.
The corresponding magnetic flux density for ρ ≤ ε is
Bin = (ρ²/ε²) B = (1/2π)(ρ/ε²) dz dρ. (3.203)
Since dB = 0, it seems reasonable that it should have a 1-form magnetic
potential A with dA = B. Such a potential is of course not unique, since one
may add to it a 1-form of the form ds and get the same magnetic flux density.
As an example, for the case of the wire the vector potential outside the wire
may be taken to be
A = −(1/2π) log(ρ/ε) dz. (3.204)
The reason for writing it with the ε > 0 is that it is convenient to have the magnetic potential be zero at the surface of the wire. The corresponding expression inside is then
Ain = −(1/2π)(1/2)(ρ²/ε² − 1) dz. (3.205)
This also is zero at the surface of the wire.
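The two potentials can be checked against (3.195) and (3.203). With A = f(ρ) dz we have dA = f′(ρ) dρ dz = −f′(ρ) dz dρ, so the dz dρ coefficient of dA is −f′(ρ). The sketch below (sympy; illustrative) confirms both cases.

```python
# Check dA = B outside the wire and dA_in = B_in inside, via dz drho coefficients.
import sympy as sp

rho, eps = sp.symbols('rho eps', positive=True)

f_out = -(1/(2*sp.pi))*sp.log(rho/eps)
f_in = -(1/(2*sp.pi))*sp.Rational(1, 2)*(rho**2/eps**2 - 1)

B_out = 1/(2*sp.pi*rho)                  # dz drho coefficient of B
B_in = (1/(2*sp.pi))*rho/eps**2          # dz drho coefficient of B_in

print(sp.simplify(-sp.diff(f_out, rho) - B_out))   # 0
print(sp.simplify(-sp.diff(f_in, rho) - B_in))     # 0
```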
Find its zeros. At each zero, find its linearization. For each linearization,
find its eigenvalues. Use this information to sketch the vector field.
v = (1 + x² + y²) y ∂/∂x − (1 + x² + y²) x ∂/∂y. (3.208)
dx/dt = −y + x(x² + y²) (3.209)
dy/dt = x + y(x² + y²). (3.210)
Change the vector field to the polar coordinate representation, and solve
the corresponding system of ordinary differential equations.
5. A predator-prey system. Fix α > 0. In the region with 0 < u and 0 < v consider the system
du/dt = u(1 − v) (3.211)
dv/dt = αv(u − 1).
The u variable represents prey; the v variable represents predators. (a)
Sketch this vector field. Find the zero. What kind of linearization is there
at this zero?
Recitation 7
1. Exact differentials. Is (x2 + y 2 ) dx + 2xy dy an exact differential form? If
so, write it as the differential of a scalar.
2. Is there a scalar u with −y dx + x dy = du? (3.213)
Recitation 8
1. The differential 3-form σ = (yz + x2 z 2 + 3xy 2 z) dx dy dz is of the form
σ = dω, where ω is a 2-form. Find such an ω. Hint: Many solutions.
2. Let σ = xy 2 z dy dz − y 3 z dz dx + (x2 y + y 2 z 2 ) dx dy. Show that this 2-form
σ satisfies dσ = 0.
3. The previous problem gives hope that σ = dα for some 1-form α. Find
such an α. Hint: This may require some experimentation. Try α of the
form α = p dx+q dy, where p, q are functions of x, y, z. With luck, this may
work. Remember that when integrating with respect to z the constant of
integration is allowed to depend on x, y.
5. This continues the previous problem. Verify Green’s theorem in this spe-
cial case, by explicitly calculating the appropriate integral over the region
R.
Recitation 9
1. Let
α = −y dx + x dy + xy dz. (3.219)
Fix a > 0. Consider the surface S that is the hemisphere x² + y² + z² = a², z ≥ 0.
2. This continues the previous problem. Verify Stokes’s theorem in this spe-
cial case, by explicitly calculating the appropriate integral over the surface
S.
3. Let σ = xy 2 z dy dz − y 3 z dz dx + (x2 y + y 2 z 2 ) dx dy. Integrate σ over the
sphere x2 + y 2 + z 2 = a2 . Hint: This should be effortless.
Chapter 4
The Metric Tensor
(∂/∂uj) c dui duj duk = −(∂/∂uj) c duj dui duk = −dui duk. (4.1)
When k = 1 this is already familiar. For a 1-form α the interior product ucα is the scalar ⟨α | u⟩. For a scalar field s we take ucs = 0.
One interesting property of the interior product is that if α is an r-form and β is an s-form, then
uc(α ∧ β) = (ucα) ∧ β + (−1)^r α ∧ (ucβ).
In a coordinate representation this implies the following identity for the interior product of a vector field with an n-form:
(Σ_{j=1}^n aj ∂/∂uj) c du1 · · · dun = Σ_{i=1}^n ai (−1)^{i−1} du1 · · · dui−1 dui+1 · · · dun. (4.7)
4.2 Volume
Consider an n-dimensional manifold. The new feature is a given n-form, taken
to be never zero. We denote this volume form by vol. In coordinates it is of the
form
vol = √g du1 · · · dun. (4.8)
This coefficient √g depends on the coordinate system. The choice of the notation √g for the coefficient will be explained in the following chapter. (Then √g will be the square root of the determinant of the matrix associated with the Riemannian metric for this coordinate system.) It is typical to make the coordinate system compatible with the orientation, so that volumes work out to be positive.
The most common examples of volume forms are the volume vol =
dx dy dz in Cartesian coordinates and the same volume vol = r2 sin(θ) dr dθ dφ
in spherical polar coordinates. The convention we are using for spherical polar
coordinates is that θ is the co-latitude measured from the north pole, while φ is
the longitude. We see from these coordinates that the √g factor for Cartesian coordinates is 1, while the √g factor for spherical polar coordinates is r² sin(θ).
In two dimensions it is perhaps more natural to call this area. So in Cartesian
coordinates area = dx dy, while in polar coordinates area = r dr dφ.
For each scalar field s there is an associated n-form s vol. The scalar field
and the n-form determine each other in an obvious way. They are said to be
dual to each other, in a certain special sense.
For each vector field v there is an associated n − 1 form given by vcvol.
The vector field and the n − 1 form are again considered to be dual to each
other, in the same sense. If v is a vector field, then vcvol might be called the
corresponding flux density. It is an n − 1 form that describes how much v is
penetrating a given n − 1 dimensional surface. In coordinates we have
(Σ_{j=1}^n aj ∂/∂uj) c vol = Σ_{i=1}^n ai (−1)^{i−1} √g du1 · · · dui−1 dui+1 · · · dun. (4.9)
In other words, it is the dual of the differential of the dual. The general diver-
gence theorem then takes the following form.
The right hand side in the divergence theorem is called the flux of the vector
field through the surface. The left hand side suggests that the flux is produced by
the divergence of the vector field in the interior.
Suppose that
v = Σ_{j=1}^n aj ∂/∂uj. (4.18)
To compute the integral over the boundary we have to pull back the differential
form vcvol to the parameter space. Say that t1 , . . . , tn−1 are the parameters.
Then the differential form pulls back to
φ∗(vcvol) = Σ_{j=1}^n aj νj √g dt1 · · · dtn−1, (4.19)
If one compares this with other common formulations of the divergence theorem,
one sees that there is no need to normalize the components νj , and there is also
no need to compute the surface area of ∂W . Both of these operations can be a
major nuisance; it is satisfying that they are not necessary.
There is another closely related way of looking at the surface integral in
the divergence theorem. The terminology is not standardized, but here is one
choice. Define the transverse surface element to be the interior pullback of
the volume as
element = φ₁∗ vol = φ₁∗ (√g du1 · · · dun) = √g Σ_{i=1}^n νi dui dt1 · · · dtn−1. (4.21)
Notice that this is just the pullback where one freezes the dui and only pulls
back the other duj . Then the flux density pulled back to the surface may be
written
φ∗(vcvol) = vcelement = √g Σ_{i=1}^n ai νi dt1 · · · dtn−1. (4.22)
The interior product on the left is of a vector field with an n-form, while the interior product on the right is of a vector field with a 1-form (freezing the dti).
So the vector field paired with this surface element is the object that must be
integrated.
In two dimensions the divergence theorem says that
∫_R (1/√g)(∂(√g a)/∂u + ∂(√g b)/∂v) area = ∫_{∂R} √g (a dv − b du). (4.23)
Here the area form is √g du dv, where the particular form of √g is that associated with the u, v coordinate system. Notice that the coefficients in the vector
field are expressed with respect to a coordinate basis. We shall see in the next
part of this book that this is not the only possible choice. This theorem involves
a kind of line integral, but the starting point is a vector field instead of a 1-form.
Apply this to ω = ucvol and use ds ∧ ucvol = ⟨ds | u⟩ vol. This gives the
divergence identity
div (su) = ds · u + s div u. (4.28)
From this we get another important integration by parts identity
∫_W ⟨ds | u⟩ vol + ∫_W s div u vol = ∫_{∂W} s ucvol. (4.29)
It says that the rate of change of the amount of substance inside the region W
plus the net (outward minus inward) flow through the boundary ∂W is zero.
Thus, for instance, the amount in the region can only decrease if there is a
compensating outward flow. In fluid dynamics the flux J of mass is J = vcR,
where v is the fluid velocity vector field. Since the coefficients of v are in meters
per second, and the basis vectors are in inverse meters, the units of v itself are in inverse seconds.
Often one writes
R = ρ vol (4.32)
Here the coefficient ρ is a scalar density (in kilograms per cubic meter). In this
case the conservation law reads
∂ρ/∂t + div(ρv) = 0. (4.33)
The corresponding integral form is
(d/dt) ∫_W ρ vol + ∫_{∂W} ρ vcvol = 0. (4.34)
The units for this equation are kilograms per second. For a fluid it is the law
of conservation of mass. The theory also applies when v is a time-dependent
vector field.
The conservation law also appears in the equivalent form
∂ρ/∂t + vcdρ + (div v) ρ = 0 (4.35)
or, more concretely,
∂ρ/∂t + Σ_{i=1}^n vi ∂ρ/∂xi = −(div v) ρ. (4.36)
Here vi is the component of velocity in the direction corresponding to xi . The
left hand side is a material derivative, that is, the change in density following a
typical particle. This suggests a method of solving the differential equation by
integrating the vector field to find particle trajectories. The change in density
along a particle trajectory is driven by the right hand side, which is a measure
of how fast particles are compressing together.
A solution for this conservation law describes how ρ at t = t0 determines ρ for later t > t0. The solution method uses the space-time curves ct0 for t0 ≤ t that define the particle trajectories. Fix a space-time point given by x and t. The curve is required to pass through this point, that is, the particle will reach x at time t. Then ct0 describes where it came from, that is, its location in space-time at time t0 ≤ t. The space coordinates of ct0 are x ◦ ct0. The time component of ct0 is just t0. The curve solves the ordinary differential equations
d(xi ◦ ct0)/dt0 = vi ◦ ct0. (4.37)
This says that the particle moves according to the fluid velocity.
From the chain rule and the ordinary differential equation for the particle trajectory we get
d(ρ ◦ ct0)/dt0 = (∂ρ/∂t) ◦ ct0 + Σ_{i=1}^n (d(xi ◦ ct0)/dt0) (∂ρ/∂xi) ◦ ct0 = (∂ρ/∂t + Σ_{i=1}^n vi ∂ρ/∂xi) ◦ ct0. (4.38)
This is the solution for ρ at any point given by x, t. The particles at time t have
been transported from where they were at time t0 . However they may be more
(or less) spread out because they are diverging (or converging) geometrically,
and this will decrease (or increase) the density.
The object of interest is really the differential form R = ρ vol that gives the
mass in a given region. For each t0 there is a map ct0 from the time t space to the
time t0 space. The solution says that R is the pullback under ct0 of R0 = ρ0 vol.
The exponential factor in ρ comes from pulling back the volume form. All that
happens is that mass is transported to new locations.
Example: Consider an exploding star, at a scale at which the original star is a
point. If the point is the origin, and the explosion was at time zero, then the
particles found at x, t will have travelled from the origin to x in time t and will then
have (constant) velocity v = x/t. If we look at particles at the same time but
at greater or lesser distance, then they will not be the same particles, so they
will have correspondingly greater or lesser velocities.
The explosion need not be symmetric; the profile at time t > 0 is ρ = f (x, t).
The volume form is dⁿx = dx1 · · · dxn, and so the divergence of the vector field
is div v = n/t. The exponential factor is thus (t0 /t)n . (For the exploding
star example the dimension of space is n = 3, but it is just as easy to do
the calculation for arbitrary dimension.) For fixed x and t the solution of
the ordinary differential equation dx0 /dt0 = x0 /t0 is x0 = (x/t)t0 . The space
component of the solution curve ct0 is therefore x ◦ ct0 = (x/t)t0 . So if ρ0 =
f (x, t0 ), then ρ0 ◦ ct0 = f ((x/t)t0 , t0 ). The solution to the partial differential
equation is
ρ = f(x, t) = (t0/t)ⁿ f((x/t)t0, t0). (4.41)
It consists of a profile that is the same for all particles with the same constant
velocity x/t, modified by a factor that says that they are flying apart.
For such an explosion the mass form R = ρ dⁿx is the pullback under the map x ← (t0/t)x of R0 = f(x, t0) dⁿx. This leads to an equivalent solution formula R = f((t0/t)x, t0) dⁿ((t0/t)x) = f((t0/t)x, t0)(t0/t)ⁿ dⁿx. |
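The solution can be checked directly against the conservation law (4.33). The following sketch (sympy, in dimension n = 3; the profile f is generic, and its frozen time slot t0 is suppressed since it is constant in x and t) verifies ∂ρ/∂t + div(ρv) = 0.

```python
# Check that rho = (t0/t)^n f((t0/t) x) with v = x/t satisfies the conservation law.
import sympy as sp

t, t0 = sp.symbols('t t0', positive=True)
x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.Function('f')

n = 3
xs = [x1, x2, x3]
rho = (t0/t)**n * f((t0/t)*x1, (t0/t)*x2, (t0/t)*x3)
v = [xi/t for xi in xs]                      # the exploding-star velocity field

expr = sp.diff(rho, t) + sum(sp.diff(rho*v[i], xs[i]) for i in range(n))
print(sp.simplify(expr))                     # 0
```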
Here the product dxi dxj is not the anti-symmetric exterior product, but instead
is a symmetric tensor product. The metric tensor is determined in a particular
coordinate system by functions gij forming the matrix of coefficients. This
matrix is required to be symmetric and positive definite. The distance along a
regular parameterized curve C is then given by
distance = ∫_C √(Σ_{i=1}^n Σ_{j=1}^n gij dxi dxj). (4.43)
Because of the square root the integrals can be nasty. In this expression it is helpful to think of the dxi/dt as the components of the velocity. The square root
The case gij = δij is the special case when the xi are Cartesian coordinates.
In this case
g = Σ_{i=1}^n dxi². (4.47)
Here we are using the perhaps unfamiliar notation that g ij is the inverse matrix
to gij . (This notation is standard in this context.)
Another quantity associated with the metric tensor g is the volume form
vol = √g du1 · · · dun. (4.53)
Here g denotes the determinant of the matrix gij . (This notation is also stan-
dard.) The interpretation of this as volume is left to a later section.
There is a very important construction that produces new metrics. Sup-
pose that the n dimensional space has coordinates x1 , . . . , xn , and there is a
k-dimensional regular parametrized surface with coordinate u1 , . . . , uk . Start
with the metric
g = Σ_{i=1}^n Σ_{j=1}^n gij dxi dxj. (4.54)
Here
g∗αβ = Σ_{i=1}^n Σ_{j=1}^n gij (∂xi/∂uα)(∂xj/∂uβ). (4.56)
A simple example is the pullback of the Euclidean metric given above to the sphere x² + y² + z² = a². The metric pulls back to
g∗ = a² dθ² + a² sin²(θ) dφ².
This is not a flat metric. Even if one only considers a small open subset of the
sphere, it is still impossible to find coordinates u, v such that g∗ = du² + dv².
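The pullback can be computed mechanically from (4.56). The sketch below (sympy; θ is the co-latitude and φ the longitude, as in the text's conventions) recovers g∗ = a² dθ² + a² sin²(θ) dφ².

```python
# Pull the Euclidean metric back to the sphere of radius a.
import sympy as sp

a = sp.symbols('a', positive=True)
theta, phi = sp.symbols('theta phi')

xs = [a*sp.sin(theta)*sp.cos(phi), a*sp.sin(theta)*sp.sin(phi), a*sp.cos(theta)]
J = sp.Matrix([[sp.diff(w, u) for u in (theta, phi)] for w in xs])  # 3x2 Jacobian
print(sp.simplify(J.T*J))   # Matrix([[a**2, 0], [0, a**2*sin(theta)**2]])
```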
Remark: The words tensor and tensor field can refer to a number of kinds
of objects. Strictly speaking, a tensor is defined at a particular point, and a
tensor field is a function that assigns to each point a tensor at that point. More
precisely, a tensor of type (p, q) at a point is a real multi-linear function whose
inputs consist of q vectors in the tangent space at the point and p vectors in
the dual space to the tangent space at the point (covectors). When p = 0 and
all the inputs are vectors it is called a covariant tensor. When q = 0 and all
the inputs are covectors it is called a contravariant tensor. When both kinds of
vectors are allowed as inputs, it is called a mixed tensor. A tensor field assigns
(in a smooth way) to each point in a manifold patch a corresponding tensor at
that point. People are often careless and use the word tensor to mean tensor
field.
The most basic tensor fields are scalar fields of type (0,0), vector fields of type (1,0), and differential 1-forms of type (0,1). There are also more complicated tensor fields. A differential k-form assigns to each point a real multi-linear function on k-tuples of tangent vectors at the point, so it is of type (0, k). A metric tensor field assigns to each point an inner product on tangent vectors at the point, so it is of type (0, 2). For the more complicated tensors one
can also impose symmetry conditions. Thus one distinguishes between the anti-
symmetric tensor case (differential k-forms) and the symmetric tensor case (the
metric tensor). The metric tensor is required not only to be symmetric, but also
positive definite. The inverse of the metric tensor is a tensor of type (2, 0); it is
also symmetric and positive definite.
The only example in these lectures of a mixed tensor is a (1,1) tensor, that
is, a linear transformation. An example is the linear transformation associated
with a vector field at a zero. This should be distinguished from the quadratic
form associated with a scalar field at a critical point, which is a symmetric
covariant tensor of type (0,2).
The study of tensors at a point is called tensor algebra, while the study of
tensor fields is tensor calculus. The metric tensor field provides a particularly
rich ground to explore. A choice of metric tensor field is the beginning of a
subject called Riemannian geometry. The metric tensor field and related objects
are fundamental to Einstein’s general relativity. |
In coordinates this is
∇²s = (1/√g) Σ_{i=1}^n ∂/∂ui (√g Σ_{k=1}^n g^{ik} ∂s/∂uk). (4.66)
Theorem 4.2 (Green’s first identity) If s and u are scalars defined in the
bounded region Ω, then
∫_Ω s ∇²u vol + ∫_Ω ∇s g ∇u vol = ∫_{∂Ω} s ∇ucvol. (4.67)
In particular,
−∫_Ω u ∇²u vol = ∫_Ω ∇u g ∇u vol ≥ 0. (4.71)
It is easy to see that curl grad f = 0 and that div curl v = 0. In this language
Stokes’s theorem says that
∫_S (curl v)cvol = ∫_{∂S} gv. (4.74)
With normalized basis vectors the coefficients of vector fields and the corre-
sponding differential forms are the same. This makes it very easy to confuse
vector fields with differential forms.
In orthogonal coordinates the volume is given in terms of normalized differ-
entials by
vol = h1 du1 ∧ · · · ∧ hn dun . (4.77)
A simple example of orthogonal coordinates is that of polar coordinates r, φ
in the plane. These are related to Cartesian coordinates x, y by
x = r cos(φ) (4.78)
y = r sin(φ) (4.79)
While coordinate forms like dφ are closed forms, a normalized form like r dφ need not be a closed form. In fact, in this particular case d(r dφ) = dr ∧ dφ ≠ 0.
Another example of orthogonal coordinates is that of spherical polar coordi-
nates r, θ, φ. These are related to Cartesian coordinates x, y, z by
x = r sin(θ) cos(φ), y = r sin(θ) sin(φ), z = r cos(θ).
Then
div u = ∇ · u = Σ_{i=1}^n (1/(h1 · · · hn)) ∂/∂ui ((h1 · · · hn/hi) ai). (4.87)
In coordinates the Laplacian has the form
∇²f = (1/(h1 · · · hn)) Σ_{i=1}^n ∂/∂ui ((h1 · · · hn/hi²) ∂f/∂ui) (4.88)
∇²f = (1/r²) ∂/∂r (r² ∂f/∂r) + (1/(r² sin(θ))) ∂/∂θ (sin(θ) ∂f/∂θ) + (1/(r² sin²(θ))) ∂²f/∂φ². (4.90)
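Formula (4.90) can be recovered mechanically from (4.88). The sketch below (sympy; h_r = 1, h_θ = r, h_φ = r sin(θ) for spherical polar coordinates) checks that the two expressions agree.

```python
# Recover the spherical Laplacian (4.90) from the orthogonal-coordinate formula (4.88).
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
f = sp.Function('f')(r, theta, phi)

hs = {r: 1, theta: r, phi: r*sp.sin(theta)}     # scale factors
H = r**2*sp.sin(theta)                          # h1*h2*h3

lap = sum(sp.diff((H/hs[u]**2)*sp.diff(f, u), u) for u in (r, theta, phi))/H
target = (sp.diff(r**2*sp.diff(f, r), r)/r**2
          + sp.diff(sp.sin(theta)*sp.diff(f, theta), theta)/(r**2*sp.sin(theta))
          + sp.diff(f, phi, 2)/(r**2*sp.sin(theta)**2))
print(sp.simplify(lap - target))                # 0
```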
We conclude by recording the explicit form of the divergence theorem and
Stokes’ theorem in the context of orthogonal coordinates. The first topic is
the divergence theorem in two dimensions. Say that the vector field v has an
expression in terms of normalized basis vectors of the form
v = a (1/hu) ∂/∂u + b (1/hv) ∂/∂v. (4.91)
Recall that the area form is area = hu hv du ∧ dv. Then the divergence theorem reads
∫_R (1/(hu hv)) [∂(hv a)/∂u + ∂(hu b)/∂v] area = ∫_{∂R} (a hv dv − b hu du).
The expression in brackets on the left is the divergence of the vector field. On
the right the integrand measures the amount of the vector field crossing normal
to the curve.
The next topic is the divergence theorem in three dimensions. Say that the
vector field v has an expression in terms of normalized basis vectors of the form
v = a (1/hu) ∂/∂u + b (1/hv) ∂/∂v + c (1/hw) ∂/∂w. (4.95)
Recall that the volume form is vol = hu hv hw du ∧ dv ∧ dw.
The terms in square brackets are the components of the curl of the vector field
expressed in terms of normalized basis vectors.
This says that the determinant is a sum of products, each product having coeffi-
cient 0, 1, or −1. There are nⁿ such products, most of them equal to zero. Each
product with a non-zero coefficient corresponds to picking a distinct element
from each row and multiplying them together. The number of products with
non-zero coefficient is n!, which is still a very large number for computational
purposes.
The determinant formula above depends on the fact that the rows are taken
in order 1, . . . , n. If we instead take them in the order i1 , . . . , in we get
Ckj = (1/(n − 1)!) ε_{k i2···in} ε_{j j2···jn} a^{i2}_{j2} · · · a^{in}_{jn}. (4.105)
In the matrix version this says that AC = det(A)I. Cramer’s rule thus has the
succinct statement A−1 = (1/ det(A))C. While Cramer’s rule is quite impracti-
cal for numerical calculation, it does give considerable insight into the structure
of the inverse matrix.
Proof: In the following there are always sums over repeated indices. We
have
det(X^T GX) = (1/k!) ε^{α1···αk} ε^{β1···βk} (X^T GX)_{α1β1} · · · (X^T GX)_{αkβk}. (4.113)
However
(X^T GX)_{αβ} = X^i_α gij X^j_β. (4.114)
So
det(X^T GX) = (1/k!) ε^{α1···αk} ε^{β1···βk} X^{i1}_{α1} · · · X^{ik}_{αk} g_{i1j1} · · · g_{ikjk} X^{j1}_{β1} · · · X^{jk}_{βk}. (4.115)
From the definition of J we get the result as stated.
be the components of a row vector that represents the determinant of the minor
that does not include row j. Then
The last identity requires some thought. Fix j2, . . . , jn distinct. In the sum over j the factor ε_{j j2···jn} only contributes when j is the one index distinct from j2, . . . , jn. The corresponding factor ε_{j h2···hn} then only matters when the indices h2, . . . , hn are a permutation of the j2, . . . , jn. However, both ε_{j h2···hn} and J^{h2···hn} are antisymmetric in the indices h2, . . . , hn. It follows that the product ε_{j h2···hn} J^{h2···hn} is the same for each permutation h2, . . . , hn. When we sum over these permutations we get (n − 1)! terms all equal to ε_{j j2···jn} J^{j2···jn}.
We can use the previous theorem to write
det(X^T GX) = (1/(n − 1)!) J^{i2···in} g_{i2j2} · · · g_{injn} J^{j2···jn}. (4.123)
This general result is dramatic even in the case n = 3 and G the identity
matrix. In that case it says that the square of the area of a parallelogram is
the sum of the squares of the areas of the three parallelograms obtained by
projecting on the three coordinate planes. This is a remarkable generalization
of the theorem of Pythagoras. [The most common version of this observation
is in the context of the cross product. There are vectors X 1 and X 2 in R3 .
They span a parallelogram with a certain area. The cross product is a vector ν
perpendicular to X 1 and X 2 whose Euclidean length is this area.]
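A numerical sketch (numpy; the two vectors are arbitrary) makes the n = 3 statement concrete: the Gram determinant, the squared length of the cross product, and the sum of squared projected areas all coincide.

```python
# Squared area of a parallelogram in R^3 three ways.
import numpy as np

X1 = np.array([1.0, 2.0, 0.5])
X2 = np.array([-1.0, 0.5, 2.0])
X = np.column_stack([X1, X2])

gram = np.linalg.det(X.T @ X)                     # Gram determinant
cross = np.dot(np.cross(X1, X2), np.cross(X1, X2))
minors = sum(np.linalg.det(X[[i, j], :])**2       # squared 2x2 minors
             for i, j in [(0, 1), (0, 2), (1, 2)])
print(gram, cross, minors)                        # all three agree
```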
where g = det(gij ).
Consider a k-dimensional regular parametrized surface S with parameters
u1 , . . . , uk . This parametrization is one-to-one, so that u1 , . . . , uk may be thought
of as coordinates on the surface. It seems reasonable to compute the k-dimensional
surface area using the pull-back of g to the surface. This is
g∗ = Σ_{α=1}^k Σ_{β=1}^k g∗αβ duα duβ = Σ_{α=1}^k Σ_{β=1}^k (Σ_{i=1}^n Σ_{j=1}^n gij (∂xi/∂uα)(∂xj/∂uβ)) duα duβ. (4.132)
Let X^i_α = ∂xi/∂uα. Then we have a collection of tangent vectors Xα. It follows that
g∗αβ = Xα^T G Xβ. (4.133)
Notice that g∗αβ has the form of a Gram matrix. Let g∗ = det(g∗αβ) be the corresponding Gram determinant. Then the area is given by integrating
area∗k = √(g∗) du1 · · · duk. (4.134)
In some sense this is the end of the story. One computes a Gram determinant,
takes the square root, and integrates. Because of the square root the integrals
tend to be quite nasty. In principle, though, we have a nice notion of area and
of integration with respect to area. That is, we have
∫_S h(x) areak(x) = ∫ h(f(u)) √(g∗) du1 · · · duk. (4.135)
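As a concrete instance, the sketch below (sympy; the sphere of radius a with the Euclidean metric, so G is the identity) computes the Gram determinant and integrates to the familiar area 4πa².

```python
# Area of the sphere of radius a via the Gram determinant (4.134).
import sympy as sp

a = sp.symbols('a', positive=True)
theta, phi = sp.symbols('theta phi')

xs = [a*sp.sin(theta)*sp.cos(phi), a*sp.sin(theta)*sp.sin(phi), a*sp.cos(theta)]
X = sp.Matrix([[sp.diff(w, u) for u in (theta, phi)] for w in xs])
gstar = sp.simplify((X.T*X).det())                 # a**4*sin(theta)**2
area_element = a**2*sp.sin(theta)                  # sqrt(gstar); sin(theta) >= 0 on [0, pi]
print(sp.integrate(area_element, (theta, 0, sp.pi), (phi, 0, 2*sp.pi)))  # 4*pi*a**2
```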
This says that the νj are the coefficients of a 1-form that vanishes on the tangent vectors to the surface. In other words, the Σ_j aj νj is giving a numerical indication of the extent to which the vector field is failing to be tangent, that is, of the extent to which it is penetrating the surface.
The alternate formula for the area form follows from the linear algebra
treated in the previous section. It says that
area∗n−1 = √(g∗) du1 · · · dun−1 = √(Σ_{i=1}^n Σ_{j=1}^n νi g^{ij} νj) √g du1 · · · dun−1. (4.141)
In this equation g^{ij} is the inverse matrix of the metric matrix gik. The νj is
(−1)j−1 times the determinant of the n−1 by n−1 matrix obtained by removing
the jth row from the n by n − 1 matrix ∂xi /∂uα .
The form coefficients νj define the coefficients of a vector with coefficients N^i = Σ_{j=1}^n g^{ij} νj. The identity says that this vector is orthogonal to the surface.
This vector of course depends on the parametrization. Sometimes people express
the area in terms of this corresponding vector as
area∗n−1 = √(g∗) du1 · · · dun−1 = √(Σ_{i=1}^n Σ_{j=1}^n N^i gij N^j) √g du1 · · · dun−1. (4.142)
Here N̂k indicates the normalization of the vector to have length one. What is
amazing is that if one writes it out, this says
(Y cvol)∗ = Σ_{i=1}^n Σ_{j=1}^n ai gij (N^j/√(Σ_{i=1}^n Σ_{j=1}^n N^i gij N^j)) √(Σ_{i=1}^n Σ_{j=1}^n N^i gij N^j) √g du1 · · · dun−1. (4.144)
There are two complicated factors involving square roots, one to normalize the
orthogonal vector, the other to calculate the area. These factors cancel. They
never need to be computed in a flux integral.
It may be helpful to summarize the result for hypersurface area in the form
of a theorem. This is stated in a way that makes clear the connection with the
divergence theorem. Recall that the transverse surface element
element = φ₁∗ vol = φ₁∗ (√g dx1 · · · dxn) = √g Σ_{i=1}^n νi dxi du1 · · · dun−1 (4.145)
is the interior pullback of the volume. The 1-form part involving the dxi is the
part that was not pulled back. It measures the extent to which a vector field is
transverse to the surface, as in the setting of the divergence theorem. Since this
is a form and not a vector, its norm is computed via the inverse of the metric
tensor.
that measures the extent to which a vector is transverse to the surface. Then
the area element is the length of the transverse surface element:
area = |element| = √g √(Σ_{i=1}^n Σ_{j=1}^n νi g^{ij} νj) du1 · · · dun−1. (4.147)
and
F = hx² (∂x/∂u)(∂x/∂v) + hy² (∂y/∂u)(∂y/∂v) + hz² (∂z/∂u)(∂z/∂v), (4.151)
and
G = hx² (∂x/∂v)² + hy² (∂y/∂v)² + hz² (∂z/∂v)². (4.152)
σ = vcvol = (1/r³)(x dy dz + y dz dx + z dx dy). (4.156)
Recitation 10
The setting for these problems is Euclidean space Rⁿ with n ≥ 3 with Cartesian coordinates x1, . . . , xn. We write r² = x1² + · · · + xn² and vol = dx1 · · · dxn. The
gradient of u is the vector
∇u = grad u = Σ_{i=1}^n (∂u/∂xi) ∂/∂xi. (4.158)
Then
∇ucvol = Σ_{i=1}^n (−1)^{i−1} (∂u/∂xi) dx1 · · · dxi−1 dxi+1 · · · dxn. (4.159)
Often ∇² = Σ_{i=1}^n ∂²/∂xi² is called the Laplace operator.
3. Define the solid angle form σ = (1/rⁿ) ω. Show that
rⁿ⁻¹ dr σ = (1/r) dr ω = vol. (4.163)
7. Show that
−∇φ(x)cvol = (1/(nα(n))) (1/rⁿ) ω. (4.167)
8. Show that this is a closed form, and hence ∇²φ(x) = 0 away from r = 0.
9. Show that for every a > 0 the flux of the fundamental solution is
∫_{Sa} −∇φ(x)cvol = 1. (4.168)
Then
∇ucvol = Σ_{i=1}^n (−1)^{i−1} (∂u/∂xi) dx1 · · · dxi−1 dxi+1 · · · dxn. (4.170)
Often ∇² = Σ_{i=1}^n ∂²/∂xi² is called the Cartesian coordinate Laplace operator.
The fundamental solution of the Laplace equation ∇²u = 0 is
φ(x) = (1/(nα(n))) (1/(n − 2)) (1/r^{n−2}). (4.172)
The goal is to establish the following amazing identity: The fundamental solution satisfies
−∇²φ(x) = δ(x). (4.173)
This shows that the behavior of the fundamental solution at r = 0 can also
be understood. It turns out that this is the key to solving various problems
involving the Laplace operator. The following problems are intended to give at
least some intuitive feel for how such an identity can come about.
The method is to get an approximation φε(x) whose Laplacian is an approximate delta function δε(x). To this end, define rε = √(r² + ε²) and let
φε(x) = (1/(nα(n))) (1/(n − 2)) (1/rε^{n−2}). (4.174)
3. Show that
−∇²φε(x) = δε(x), (4.177)
where δε(x) is a constant times a power of ε times an inverse power of rε. Find the explicit form of δε(x).
as a → ∞.
Recitation 11
1. In the following we use tensor algebra notation where repeated upper and
lower indices indicate summation. Prove the following identity for the
Levi-Civita permutation symbols:
ε^{j1···jn} ε_{j1···jn} = n!. (4.180)
2. Recall that
det(A) = ε^{j1···jn} a_{1j1} · · · a_{njn}. (4.181)
Show that
ε^{i1···in} det(A) = ε^{j1···jn} a^{i1}_{j1} · · · a^{in}_{jn}. (4.182)
3. Show that
det(A) = (1/n!) ε_{i1···in} ε^{j1···jn} a^{i1}_{j1} · · · a^{in}_{jn}. (4.183)
4. Show that det(AB) = det(A) det(B). Hint: Use the preceding identity for
AB.
5. The next three problems are relevant to Cramer’s rule. Show that
ε^{j j2···jn} a^i_j a^{i2}_{j2} · · · a^{in}_{jn} = ε^{i i2···in} det(A). (4.184)
6. Show that
(1/(n − 1)!) ε_{k i2···in} ε^{i i2···in} = δ^i_k. (4.185)
9. Let n = 2 in the above result. Let θ be the angle between the two vectors.
Compute the area in terms of this angle and the lengths of the vectors.
10. Let X^1, . . . , X^n be n vectors in Rⁿ forming a matrix X. Let G be a symmetric matrix with positive eigenvalues. Define the volume relative to G to be √(det(X^T GX)). Find a formula for this volume in terms of G and |det(X)|.
11. Let X be an n by k matrix, representing k vectors X^i_α, where i = 1, . . . , n and α = 1, . . . , k. Let G be an n by n symmetric matrix. We can define the Gram matrix as the k by k matrix X^T X, or more generally as the k by k matrix X^T GX. For each sequence i1, . . . , ik of rows of X,
J^{i1···ik} = ε^{α1···αk} X^{i1}_{α1} · · · X^{ik}_{αk} (4.189)
represents the determinant of the corresponding k by k minor obtained by
retaining only those rows. Show
det(X^T GX) = (1/k!) J^{i1···ik} g_{i1j1} · · · g_{ikjk} J^{j1···jk}. (4.190)
In this formula it is understood that there is summation over repeated
indices.
12. Take G = I (with matrix δij) in the above formula, and simplify the result. This gives a formula for the area √(det(X^T X)) in terms of areas of projections. This is a remarkable generalization of the theorem of Pythagoras.
13. Describe what this simplified formula says in the case n = 3 and k = 2.
2. In the same situation, show that the form on the surface with components νj equal to (−1)^{j−1} times the determinant of ∂xi/∂uα with row j deleted satisfies
Σ_{j=1}^n (∂xj/∂uβ) νj = 0. (4.192)
Since it is also zero on the tangent space, it must be a multiple of dw on
the surface. Hint: Consider the matrix with first column ∂xi /∂uβ and
remaining columns ∂xi /∂uα for α = 1, . . . , n − 1. Here β is an arbitrary
choice of one of the α indices.
Chapter 5
Measure Zero
Since this theorem is the key to the entire subject, it is worth recording the proof. Let ε > 0. Since A is countable, we may enumerate its elements An. By the fact that µ̄(An) is defined as a greatest lower bound of sums, there is a cover Ink of An with Σ_k m(Ink) < µ̄(An) + ε/2ⁿ. Then ∪_n An is covered by all the Ink. Furthermore, Σ_{nk} m(Ink) < Σ_n µ̄(An) + ε. Since µ̄(∪_n An) is defined as a greatest lower bound of sums, we have µ̄(∪_n An) < Σ_n µ̄(An) + ε. Since ε > 0 is arbitrary, we must have µ̄(∪_n An) ≤ Σ_n µ̄(An).
Proposition 5.4 Suppose that A, A′ are subsets with m̄(A \ A′) = 0 and also m̄(A′ \ A) = 0. Then
m̄(A ∩ A′) = m̄(A) = m̄(A′) = m̄(A ∪ A′). (5.6)
Proposition 5.5 Suppose that A, A′ are subsets with µ̄(A \ A′) = 0 and also µ̄(A′ \ A) = 0. Then
µ̄(A ∩ A′) = µ̄(A) = µ̄(A′) = µ̄(A ∪ A′). (5.7)
Proposition 5.6 The outer content of A may be defined by finite open cell covers:
m̄(A) = inf{ Σ_{I∈I} m(I) | I finite open, A ⊆ ∪_{I∈I} I }. (5.8)
Proposition 5.7 The outer content of A may be defined by finite closed cell covers:
m̄(A) = inf{ Σ_{I∈I} m(I) | I finite closed, A ⊆ ∪_{I∈I} I }. (5.9)
Proposition 5.8 The outer content of A may be defined by finite closed non-overlapping cell covers:
m̄(A) = inf{ Σ_{I∈I} m(I) | I finite closed non-overlapping, A ⊆ ∪_{I∈I} I }. (5.10)
In the following we write Disc(f ) for the set of points at which f is not contin-
uous.
Proposition 5.13
Disc(f ) = {x | oscx (f ) > 0}. (5.16)
Corollary 5.19 Let f be defined on a bounded set. The set Disc(f) has measure zero if and only if for each ε > 0 the set Dε(f) has content zero.
µ̄(Disc(f)) = 0. (5.20)
The theorem relies on two lemmas. The first one shows that Riemann in-
tegrability implies the set of discontinuities has measure zero. This part relies
heavily on the countable sub-additivity of outer measure. The other one shows
that a bounded function with a set of discontinuities of measure zero is Rie-
mann integrable. Here the remarkable thing is that measure zero is a weaker
requirement than having content zero.
The way the lemma is used is to note that U(f) − L(f) is the greatest lower bound for U(f, P) − L(f, P). The lemma says that ε m̄(Dε) is a lower bound, so ε m̄(Dε) ≤ U(f) − L(f). If U(f) − L(f) = 0, then m̄(Dε) = 0, and hence µ̄(Dε) = 0. By countable subadditivity µ̄(Disc) = 0.
Proof: Let P be a partition of C into closed bounded non-degenerate cells. Let P′ be the subset of P consisting of sets I with int(I) ∩ Dε ≠ ∅. Then for I in P′ we have oscI(f) ≥ ε. Now let B be the union of the boundary points of the cells in P, and let D′ = Dε \ B. If x is in D′, then x is in Dε(f), and hence
Lemma 5.22 For every ε > 0 and every κ > 0 there exists a partition P such that
The way the lemma is used is to note that for every ε > 0 and every κ > 0 we have U(f) − L(f) ≤ oscC(f)(m̄(Dε) + κ) + ε m̄(C). Since κ > 0 is arbitrary we get U(f) − L(f) ≤ oscC(f) m̄(Dε) + ε m̄(C). Now suppose that µ̄(Disc) = 0. Then for each ε > 0 we have µ̄(Dε) = 0. But since Dε is closed, this says that m̄(Dε) = 0. So the right hand side is ε m̄(C). Since ε > 0 is arbitrary, this implies that U(f) − L(f) = 0.
Proof: The first term comes from the points with large oscillation, and the second term comes from the points with small oscillation. To deal with the first term, consider κ > 0. From the definition of m̄(D_ε) there is a finite closed cover I of D_ε such that
\[ \sum_{I \in \mathcal{I}} m(I) < \bar m(D_\epsilon) + \kappa. \tag{5.24} \]
One can thicken the cells in I so that each of them is open and the same estimate is satisfied. The union of the cells in the new I is open, and D_ε is a subset of this open set. One can then take the closures of the cells and get a finite closed cell cover satisfying the same estimate. By removing overlaps we can get a non-overlapping finite closed cell family P′ such that every cell is a subset of C. Let A′ be the union of the cells in P′. Then D_ε is contained in the interior of the closed set A′ ⊆ C. Furthermore,
\[ \bar m(A') = \sum_{I \in \mathcal{P}'} m(I) < \bar m(D_\epsilon) + \kappa. \tag{5.25} \]
Extend P′ to a partition P of C. The remaining cells avoid D_ε, and after further refinement one can arrange that f has oscillation at most ε on each of them; let P″ be this family of remaining cells, with union A″. Now
\[ \sum_{I \in \mathcal{P}'} \mathrm{osc}_I(f)\, m(I) \le \sum_{I \in \mathcal{P}'} \mathrm{osc}_C(f)\, m(I) \le \mathrm{osc}_C(f)\,\big(\bar m(D_\epsilon) + \kappa\big). \tag{5.27} \]
Also
\[ \sum_{I \in \mathcal{P}''} \mathrm{osc}_I(f)\, m(I) \le \epsilon \sum_{I \in \mathcal{P}''} m(I) = \epsilon\, \bar m(A'') \le \epsilon\, \bar m(C). \tag{5.28} \]
This gives the result.
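Numerically, the conclusion is easy to observe. The following sketch (ours; the function, with a single jump at 0.3, is an arbitrary choice, and the sup and inf on each cell are approximated by sampling) shows U(f, P) − L(f, P) tending to 0 under refinement:

```python
import numpy as np

def upper_minus_lower(f, a, b, n):
    # U(f,P) - L(f,P) for the uniform partition of [a,b] into n cells,
    # with sup and inf on each cell approximated by sampling.
    edges = np.linspace(a, b, n + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        v = f(np.linspace(lo, hi, 101))
        total += (v.max() - v.min()) * (hi - lo)
    return total

f = lambda t: np.where(t < 0.3, np.sin(t), 2.0 + np.sin(t))   # one jump, at 0.3
for n in (10, 100, 1000):
    print(n, upper_minus_lower(f, 0.0, 1.0, n))   # tends to 0: f is integrable
```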
This treatment relies heavily on unpublished notes on the Riemann integral
by Mariusz Wodzicki [20].
The Lebesgue theorem has implications for functions that are restricted to
complicated subsets. Let 1A be the indicator function of A, equal to 1 on A and
0 on its complement. Then the discontinuity set of the indicator function 1A is
the boundary of A, that is, Disc(1A ) = bdy(A).
If f is a bounded function defined on a bounded closed non-degenerate cell C,
and A ⊆ C, then the Riemann integral of f over A is defined to be the integral
of f 1_A, when that Riemann integral exists. (This may also be taken as the definition of the integral of a function that is defined only on A.) In particular, if f is a bounded function on C, and if A ⊆ C, then the Riemann integral of f over A exists if and only if Disc(f 1_A) has measure zero. Moreover Disc(f 1_A) ⊆ Disc(f) ∪ bdy(A). So if both Disc(f) and bdy(A) have measure zero, then f is integrable over A.
All this applies to the case when f = 1 on the bounded cell C. Then 1A
is discontinuous precisely on bdy(A). So 1A is integrable if and only if bdy(A)
has measure zero. This is precisely the situation when A is Jordan measurable.
The integral of 1A is then the content m(A) of A.
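A grid computation makes this concrete. The sketch below (ours; the corner test is adequate for a convex set such as a disk) forms inner and outer sums for 1_A where A is the unit disk; both converge to the content m(A) = π:

```python
import numpy as np

def inner_outer(inside, lo, hi, n):
    # Inner sum: cells entirely inside A; outer sum: cells meeting A.
    edges = np.linspace(lo, hi, n + 1)
    cell = (edges[1] - edges[0]) ** 2
    inner = outer = 0
    for i in range(n):
        for j in range(n):
            flags = [inside(edges[i + a], edges[j + b]) for a in (0, 1) for b in (0, 1)]
            if all(flags):   # all corners inside: the cell (their convex hull) lies in A
                inner += 1
            if any(flags):   # some corner inside: the cell certainly meets A
                outer += 1
    return inner * cell, outer * cell

disk = lambda x, y: x * x + y * y <= 1.0
for n in (20, 80, 320):
    print(n, inner_outer(disk, -1.0, 1.0, n))   # both sums approach pi
```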
While the outer content of A is defined for arbitrary subsets, the content
of A is only defined when A is Jordan measurable. When a set A is Jordan
measurable, its content m(A) is the same as the outer content m̄(A). (And in
this case this is the same as the outer measure µ̄(A).) The point of restricting
to Jordan measurable sets is that the content on Jordan measurable sets is
additive, while the outer content on arbitrary sets is only subadditive. (The
outer measure on arbitrary sets is also only subadditive, but in this case it takes
some effort to find examples where additivity fails.)
Lemma 5.25 Consider a subset A ⊂ Rⁿ. Then A has measure zero if and only if for every ε > 0 there is a sequence of closed balls B_k with A ⊆ ∪_{k=1}^∞ B_k and Σ_{k=1}^∞ vol_n(B_k) < ε. For every κ > 0, it is possible to impose the additional requirement that the balls all have radius less than κ.
Proof: Up to now measure zero has been defined using coverings by non-degenerate bounded cells. First we show that we could instead use coverings by closed cubes. Indeed, a non-degenerate cell has a side of least length L > 0. This cell is a subset of a bigger closed cell all of whose side lengths are multiples of L. The bigger cell may be taken so that each side length is no more than 2 times the corresponding side length of the original cell. So the volume of the bigger cell is bounded by 2ⁿ times the volume of the original cell. Furthermore, the bigger cell may be written as a union of closed cubes of side length L, and its volume is the sum of the volumes of the individual cubes. So if we can cover by cells of small total volume, then we can also cover by closed cubes of small total volume. By further subdividing the cubes, one can make each of them of side length smaller than some small multiple of κ.

Once we have the result for closed cubes, we have it for closed balls. This is because a closed ball of radius r is a subset of a closed cube of side length L = 2r, and a closed cube of side length L is a subset of a closed ball of radius r′ = √n L/2.
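For example, a line segment in R² has measure zero. A sketch of the covering (ours; the segment [0, 1] × {0} and the choice of N equally spaced balls are arbitrary):

```python
import math

def segment_cover_volume(N):
    # Cover [0,1] x {0} in R^2 by N closed balls of radius 1/N centered
    # at ((k + 0.5)/N, 0); consecutive centers are 1/N <= 2r apart, so
    # the balls cover the whole segment.
    r = 1.0 / N
    return N * math.pi * r * r   # total area = pi / N

for N in (10, 100, 1000):
    print(N, segment_cover_volume(N))   # total volume tends to 0
```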
A function h is called Lipschitz continuous if there is a constant C such that it satisfies |h(x′) − h(x)| ≤ C|x′ − x|. A function h from an open subset U of Rⁿ to Rⁿ is called locally Lipschitz continuous if for every x there exists a neighborhood V ⊆ U of x such that h restricted to V is Lipschitz continuous. It is not too hard to see that h is locally Lipschitz if and only if it is Lipschitz continuous on every compact subset K of U. A continuous function may map a set of measure zero to a set of non-zero measure. However this is impossible for a C¹ function. In fact, a C¹ function is always locally Lipschitz. So the relevant result is that a locally Lipschitz function maps sets of measure zero to sets of measure zero.

To see this, suppose first that h is Lipschitz with constant K on a compact subset of U, and consider a subset of it with measure zero. Cover this subset by closed balls B_k of radius r_k with small total volume. Then the images h(B_k) are contained in closed balls B_k′, where the B_k′ are closed balls of radius r_k′ = K r_k. The total volume of the balls is bounded by Kⁿ times the total volume of the original balls. This can be made arbitrarily small.
Next consider an arbitrary subset F ⊆ U with measure zero. Consider a sequence of compact subsets K_n with union U. Let F_n be the intersection of F with K_n. Then each F_n has measure zero. Also, h(F) is the union of the h(F_n). Since each of the h(F_n) has measure zero, so does h(F).
One consequence of the theorem is the result that if f is Riemann integrable, and g is one-to-one continuous with Lipschitz inverse, then f ∘ g is Riemann integrable. This is seen as follows. Let E = g⁻¹(Disc(f)). Then E has measure zero. Suppose that x is not in E. Then g(x) is a continuity point for f, so x is a continuity point for f ∘ g. This establishes that Disc(f ∘ g) ⊆ E. Since Disc(f ∘ g) has measure zero, the Lebesgue theorem says that f ∘ g must be Riemann integrable.
This last is the most difficult case of Sard's theorem. It says that the set of w for which there exists x with g(x) = w and g′(x) having rank less than m has measure zero. In other words, for almost every w the surface g(x) = w contains only points where g′(x) has rank m.

With this in hand, the change of variables formula takes the form
\[ \int f(g(x))\, h(x)\, |\det g'(x)|\, d^n x = \int f(y) \sum_{g(x)=y} h(x)\, d^n y. \tag{5.32} \]
In particular we can take h as the indicator function of the set A. This gives
\[ \int_A f(g(x))\, |\det g'(x)|\, d^n x = \int f(y)\, \#\{x \in A \mid g(x) = y\}\, d^n y. \tag{5.33} \]
We see that if g is not one-to-one, then the only modification is that we need to
keep track of how many times g assumes a certain value. According to Sard’s
theorem, the critical values do not matter.
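Formula (5.33) is easy to test numerically. A minimal sketch (ours; the test function cos, the map g(x) = x², and the midpoint quadrature are arbitrary choices): on A = [−1, 1] the map g is two-to-one onto [0, 1], so the right hand side carries the factor #{x ∈ A | g(x) = y} = 2.

```python
import numpy as np

def midpoint(F, a, b, n=200_000):
    # Midpoint-rule approximation of the integral of F over [a, b].
    h = (b - a) / n
    t = a + h * (np.arange(n) + 0.5)
    return np.sum(F(t)) * h

f = lambda y: np.cos(y)      # arbitrary test function
g = lambda x: x * x          # two-to-one from [-1, 1] onto [0, 1]
gp = lambda x: 2.0 * x       # g'(x)

lhs = midpoint(lambda x: f(g(x)) * np.abs(gp(x)), -1.0, 1.0)
rhs = midpoint(lambda y: 2.0 * f(y), 0.0, 1.0)   # multiplicity factor 2
print(lhs, rhs)              # both approximately 2 sin(1) = 1.68294...
```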
Another version of change of variables is the pushforward formula
\[ \int f(g(x))\, h(x)\, d^n x = \int f(y) \sum_{g(x)=y} \frac{h(x)}{|\det g'(x)|}\, d^n y. \tag{5.34} \]
5.9 Probability

The central notion of probability is that of the expectation of a function of a vector random variable x. A common case is when the expectation is given by a probability density ρ(x). This is a positive function with integral one. Say that y = g(x) is a random variable that is a function of x. Then it may or may not be the case that the expectation of f(y) = f(g(x)) is given by a pushed-forward probability density ρ_*(y). When this is the case, we should have
\[ \int f(y)\, \rho_*(y)\, dy = \int f(g(x))\, \rho(x)\, dx. \tag{5.44} \]
First consider the case when n random variables are mapped to n random variables. Here ρ(x) is a joint probability density for the random variables x, g(x) is a vector of n random variables, and f(g(x)) is a function of these n random variables. The right hand side is the expectation. If one wants to write this expectation in terms of the random variables y = g(x), then one has to push forward the density. The change of variables formula suggests that the new density is
\[ \rho_*(y) = \sum_{g(x)=y} \frac{1}{|\det g'(x)|}\, \rho(x). \tag{5.45} \]
This only works when the regions where det g′(x) = 0 can be neglected, and this is not always the case. If, for instance, there is a region C of non-zero volume with g(x) = y_* for x in C, then the extra contribution f(y_*) ∫_C ρ(x) dx must be added to the left hand side for the identity to be valid. Sard's theorem does nothing to help, since there is no longer a factor that vanishes on the set of critical points. Even though the set of critical values has measure zero, there can be a lot of probability on a set of measure zero.
Example: An example is when n = 1 and the density is ρ(x) = (1/√(2π)) exp(−x²/2). This is the density of a standard normal (Gaussian) distribution. Let y = x². Then ρ_*(y) = (1/√(2π)) (1/√y) exp(−y/2) for y > 0. This density is that of a chi-squared distribution (with one degree of freedom). |
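This pushforward can be checked by simulation. A minimal sketch (ours; the sample size, the interval [0.5, 1.5], and the midpoint quadrature are arbitrary choices) compares the empirical distribution of y = x² with the claimed density:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000) ** 2       # y = x^2 for standard normal x

a, b = 0.5, 1.5
empirical = np.mean((a <= y) & (y <= b))      # fraction of samples in [a, b]

# Midpoint-rule integral of rho_*(y) = exp(-y/2) / sqrt(2 pi y) over [a, b].
n = 10_000
h = (b - a) / n
mid = a + h * (np.arange(n) + 0.5)
predicted = np.sum(np.exp(-mid / 2) / np.sqrt(2 * np.pi * mid)) * h

print(empirical, predicted)   # agree to Monte Carlo accuracy (about 1e-3)
```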
Next consider the case when n random variables are mapped to m random variables with m < n by y = g(x). The pushforward formula suggests that
\[ \rho_*(y) = \int_{g^{-1}(y)} \rho\, \beta^{(y)}, \tag{5.46} \]
where β^{(y)} is the fiber form that appears in the co-area formula below.
Perhaps the moral of the story is that one should calculate with the original
density ρ(x). In probability theory expectations (or measures) push forward in
a routine way. When you try to express them in terms of densities, then the
expressions are less pleasant. Densities are functions. Functions pull back with
ease, but push forward with considerable difficulty.
5.10 The co-area formula

Differentiating the identity g(F(u, y)) = y with respect to u, we see that
\[ g'(x)\, F_1'(u, y) = 0. \tag{5.48} \]
This expresses in matrix form the fact that the n − k independent row covectors in g′(x) are zero on the k independent column vectors in F₁′(u, y).
Define the co-area factor C(x) by
\[ C(x) = \sqrt{\det\big( g'(x)\, g'(x)^T \big)}. \tag{5.49} \]
This is the square root of the determinant of an (n − k)-square Gram matrix. Define the area factor by
\[ A(u, y) = \sqrt{\det\big( F_1'(u, y)^T F_1'(u, y) \big)}. \tag{5.50} \]
This determinant is that of a k-square Gram matrix. The simplest form of the co-area formula says that
\[ C(x)\, |\det F'(u, y)| = A(u, y), \tag{5.51} \]
where x = F(u, y). In the language of differential forms, this formula says that
\[ C^{(y)}\, \beta^{(y)} = \mathrm{area}^{(y)}. \tag{5.52} \]
Here C^{(y)} = C(F(u, y)), while β^{(y)} = |det F′(u, y)| du and area^{(y)} = A(u, y) du.
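As a sanity check (ours, for g(x) = |x| on R² with the polar parameterization F(u, y) = (y cos u, y sin u)): here C = |∇g| = 1, |det F′(u, y)| = y, and A(u, y) = y, so the identity C(x)|det F′(u, y)| = A(u, y) holds. A numerical version:

```python
import numpy as np

def check(u, y):
    # g(x) = |x| on R^2, with fiber parameterization F(u, y) = (y cos u, y sin u).
    x = np.array([y * np.cos(u), y * np.sin(u)])
    gprime = (x / np.linalg.norm(x)).reshape(1, 2)       # g'(x), a 1 x 2 row
    C = np.sqrt(np.linalg.det(gprime @ gprime.T))        # co-area factor, here 1
    F1 = np.array([[-y * np.sin(u)], [y * np.cos(u)]])   # dF/du, a 2 x 1 column
    A = np.sqrt(np.linalg.det(F1.T @ F1))                # area factor, here y
    Fprime = np.array([[-y * np.sin(u), np.cos(u)],
                       [ y * np.cos(u), np.sin(u)]])     # F' = [dF/du  dF/dy]
    return C * abs(np.linalg.det(Fprime)), A             # the two sides of (5.51)

print(check(0.7, 2.0))   # (2.0, 2.0)
```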
Fiber integration gives an integral version of the co-area formula. This says that
\[ \int f(g(x))\, h(x)\, C(x)\, d^n x = \int f(y) \left( \int_{g(x)=y} h(x)\, \mathrm{area}^{(y)}_{n-m} \right) d^m y. \tag{5.53} \]
In particular we can take h as the indicator function of the set A. This gives
\[ \int_A f(g(x))\, C(x)\, d^n x = \int f(y)\, \mathrm{area}^{(y)}_{n-m}\big(\{x \in A \mid g(x) = y\}\big)\, d^m y. \tag{5.54} \]
The co-area formula may also be thought of as a formula for area integrals in terms of delta functions. Thus
\[ \int h(x)\, \delta(g(x) - y)\, C(x)\, d^n x = \int_{g(x)=y} h(x)\, \mathrm{area}^{(y)}_{n-m}. \tag{5.55} \]
In particular
\[ \int_A \delta(g(x) - y)\, C(x)\, d^n x = \mathrm{area}^{(y)}_{n-m}\big(\{x \in A \mid g(x) = y\}\big). \tag{5.56} \]
Example: The co-area formula may seem unfamiliar, but there is a case where it becomes quite transparent. Consider a single equation g(x) = y that defines a k = n − 1 dimensional hypersurface. In the integral form of the formula take f(y) = H(s − y), where H is the indicator function of the positive real numbers. Also take h(x) = 1. The result is
\[ \int_{g(x) \le s} |g'(x)|\, dx = \int_{-\infty}^{s} \mathrm{area}(\{x \mid g(x) = y\})\, dy. \tag{5.57} \]
It follows that
\[ \frac{d}{ds} \int_{g(x) \le s} |g'(x)|\, dx = \mathrm{area}(\{x \mid g(x) = s\}). \tag{5.58} \]
This formula is an elementary relation between volume and area for an implicitly defined surface. |
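For g(x) = |x| on R² we have |g′(x)| = 1, so (5.57) says that the area of the disk of radius s equals ∫₀ˢ 2πy dy = πs², and (5.58) recovers the circumference 2πs. A grid sketch (ours; the grid size and the difference step are arbitrary, so the agreement is only up to grid error):

```python
import numpy as np

def disk_integral(s, n=2000):
    # Grid approximation of the integral of |grad g| = 1 over {g(x) <= s},
    # i.e. the area of the disk of radius s.
    t = np.linspace(-s, s, n)
    X, Y = np.meshgrid(t, t)
    h = t[1] - t[0]
    return np.sum(X**2 + Y**2 <= s**2) * h * h

s = 1.0
print(disk_integral(s), np.pi * s**2)                         # (5.57): both about pi
deriv = (disk_integral(s + 0.05) - disk_integral(s - 0.05)) / 0.1
print(deriv, 2 * np.pi * s)                                   # (5.58): both about 2 pi
```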
We proceed to a more systematic development of the co-area formula. The assumption is that the family of surfaces y = g(x) has a smooth parametric representation x = F(u, y). This means that
\[ g(F(u, y)) = y, \]
and the function F(u, y) has a smooth inverse function G(x) with components G₁(x) = u and G₂(x) = g(x) = y. Then
\[ G'\, F' = I. \tag{5.63} \]
It follows (using also F′G′ = I) that
\[ G'\, G'^T\, F'^T\, F' = I. \tag{5.64} \]
More explicitly, we could write the last two equations as
\[ \begin{pmatrix} G_1' \\ G_2' \end{pmatrix} \begin{pmatrix} F_1' & F_2' \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix} \tag{5.65} \]
and
\[ \begin{pmatrix} G_1' G_1'^T & G_1' G_2'^T \\ G_2' G_1'^T & G_2' G_2'^T \end{pmatrix} \begin{pmatrix} F_1'^T F_1' & F_1'^T F_2' \\ F_2'^T F_1' & F_2'^T F_2' \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}. \tag{5.66} \]
There is a theorem in linear algebra that gives a relation between the determinants of submatrices of matrices that are inverse to each other. The statement and proof are given below. In this case it says that
\[ \det\big(G_2' G_2'^T\big)\, \det\big(F'^T F'\big) = \det\big(F_1'^T F_1'\big). \tag{5.67} \]
The theorem says that if B = A⁻¹, written with compatible block decompositions, then det B₁₁ det A = det A₂₂ and det B₂₂ det A = det A₁₁. The proof uses the lemma that the determinant of a block triangular matrix is the product of the determinants of its diagonal blocks:
\[ \det \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} = \det A_{11}\, \det A_{22} = \det \begin{pmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{pmatrix}. \tag{5.74} \]
There is a triangular factorization
\[ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} I & A_{12} \\ 0 & A_{22} \end{pmatrix} \begin{pmatrix} A_{11} - A_{12} A_{22}^{-1} A_{21} & 0 \\ A_{22}^{-1} A_{21} & I \end{pmatrix}. \tag{5.75} \]
By the lemma above this gives det A = det A₂₂ det(B₁₁)⁻¹, since one checks from B = A⁻¹ that B₁₁ = (A₁₁ − A₁₂A₂₂⁻¹A₂₁)⁻¹; this leads to the first result. We also have the triangular factorization
\[ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & 0 \\ A_{21} & I \end{pmatrix} \begin{pmatrix} I & A_{11}^{-1} A_{12} \\ 0 & A_{22} - A_{21} A_{11}^{-1} A_{12} \end{pmatrix}. \tag{5.76} \]
This gives det A = det A₁₁ det(B₂₂)⁻¹, which is equivalent to the second result.
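The determinant relations are easy to confirm numerically. A sketch (ours; random matrices, with block sizes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2                           # A is n x n; A11 is the leading k x k block
A = rng.standard_normal((n, n))
B = np.linalg.inv(A)                  # B = A^{-1}, in the same block decomposition
A11, A22 = A[:k, :k], A[k:, k:]
B11, B22 = B[:k, :k], B[k:, k:]

# det B11 det A = det A22 and det B22 det A = det A11
print(np.linalg.det(B11) * np.linalg.det(A), np.linalg.det(A22))
print(np.linalg.det(B22) * np.linalg.det(A), np.linalg.det(A11))
```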
Bibliography
[1] Ilka Agricola and Thomas Friedrich, Global analysis: Differential forms in
analysis, geometry and physics, Graduate Studies in Mathematics, no. 52,
American Mathematical Society, Providence, Rhode Island, 2002.
[2] Dennis Barden and Charles Thomas, An introduction to differentiable man-
ifolds, Imperial College Press, London, 2003.
[3] Robert L. Bryant, S. S. Chern, Robert B. Gardner, Hubert L. Goldschmidt,
and P. A. Griffiths, Exterior differential systems, Mathematical Sciences
Research Institute Publications, no. 18, Springer, New York, 1991, (online).
[4] William L. Burke, Applied differential geometry, Cambridge University
Press, Cambridge, 1985.
[5] M. Crampin and F. A. E. Pirani, Applicable differential geometry, Lon-
don Mathematical Society lecture notes 57, Cambridge University Press,
Cambridge, 1986.
[6] Harley Flanders, Differential forms with applications to the physical sci-
ences, Academic Press, New York, 1963.
[7] E. L. Ince, Ordinary differential equations, Dover Publications, New York,
1956.
[8] Fanghua Lin and Xiaoping Yang, Geometric measure theory—an introduc-
tion, International Press, Boston, MA, 2002.
[9] David Lovelock and Hanno Rund, Tensors, differential forms, and varia-
tional principles, Dover Publications, New York, 1989.
[10] W. A. J. Luxemburg, Arzelà's dominated convergence theorem for the Riemann integral, American Mathematical Monthly 78 (1971), 970–979.
[11] John Milnor, Morse theory, Annals of Mathematics Studies, no. 51, Prince-
ton University Press, Princeton, New Jersey, 1969.
[12] Shigeyuki Morita, Geometry of differential forms, Translations of Mathe-
matical Monographs, no. 201, American Mathematical Society, Providence,
Rhode Island, 2001.
[15] Ivan Netuka, The change-of-variables theorem for the Lebesgue integral, Acta Universitatis Matthiae Belii 19 (2011), 37–42.
[16] S. P. Ponomarev, Submersions and preimages of sets of measure zero,
Siberian Mathematical Journal 28 (1987), 153–163.
Mathematical Notation

Linear Algebra

x, y, z                n-component column vectors
ω, µ                   m-component row vectors
A, B, C                m by n matrices
A^T                    transpose
A^{-1}                 inverse
tr(A)                  trace
det(A)                 determinant
|x| = √(x^T x)         Euclidean norm
‖A‖                    Lipschitz norm
‖A‖₂ = √(tr(A^T A))    Euclidean norm
Multivariable functions

f, g, h                                  functions from E ⊆ R^n to R^m
x ↦ f(x)                                 same as f
f′                                       derivative matrix function from open E ⊆ R^n to m by n matrices
x ↦ f′(x)                                same as f′
x, y, z                                  variables in R^n
y = f(x)                                 y as a function f(x) of x
y_i = f_i(x)                             y_i as a function f_i(x) of x
∂y/∂x = f′(x)                            derivative matrix (Jacobian matrix)
∂y_i/∂x_j = ∂f_i(x)/∂x_j = f′_{i,j}(x)   entry of derivative matrix
dx = dx₁ ∧ ⋯ ∧ dx_n                      exterior product of differentials
dy/dx = det ∂y/∂x = det f′(x)            determinant of derivative matrix (Jacobian determinant)
g ∘ f                                    composite function
(g ∘ f)(x) = g(f(x))                     composite function of x
(g ∘ f)′ = (g′ ∘ f) f′                   chain rule
(g ∘ f)′(x) = g′(f(x)) f′(x)             chain rule as a function of x
p = g(u), u = f(x)                       composite function expressed with variables
∂p/∂x = (∂p/∂u)(∂u/∂x)                   chain rule expressed with variables
f, g, h                                  functions from E ⊆ R^n to R
f_{,ij}(x)                               entries of Hessian matrix of second derivatives
Integration

I, m(I)               cell, volume of cell
P                     partition into cells
f_I                   restriction of f to the cell I
L(f, P), U(f, P)      lower sum, upper sum
L(f), U(f), I(f)      lower integral, upper integral, integral
1_A                   indicator function of subset A
m(A) = I(1_A)         content (volume) of A
int(A)                interior of subset A
bdy(A)                boundary of subset A
osc_A(f)              oscillation on set A
osc_x(f)              oscillation at point x
Disc(f)               {x | osc_x(f) > 0}
δ_ε(x)                family of approximate delta functions
Differential Forms

x, y, u                                  coordinate systems
s = h(x)                                 scalar field
X = Σ_j a_j ∂/∂x_j                       vector field
ds = Σ_i (∂s/∂x_i) dx_i                  differential of a scalar (an exact 1-form)
ω = Σ_i p_i dx_i                         differential 1-form
⟨ω | X⟩ = Σ_i p_i a_i                    scalar field from form and vector field
θ                                        differential k-form
⟨θ | X₁, …, X_k⟩                         scalar field from form and vector fields
X ⌟ θ                                    interior product (a k − 1 form)
θ ∧ β                                    exterior product of k-form with ℓ-form
dθ                                       exterior derivative of θ (a k + 1 form)
φ = (x ← g(u))                           manifold mapping (parameterized surface)
φ* h(x) = h(g(u))                        pullback of a scalar field
φ* dx_i = Σ_α g′_{i,α}(u) du_α           pullback of a basis differential
φ* θ                                     pullback of a differential k-form
φ_* ∂/∂u_α = Σ_i g′_{i,α}(u) ∂/∂x_i      pushforward of a basis vector field
φ_* Y                                    pushforward of a vector field
χ                                        chain
∂χ                                       boundary of chain
∫_χ θ                                    integral of form over chain
The Metric Tensor

g = Σ_i Σ_j g_{ij} dx_i dx_j     metric tensor
g_{ij}                           matrix entries of metric tensor (inner product on vectors)
G                                matrix of metric tensor
g^{ij}                           matrix entries of inverse matrix (inner product on forms)
G^{-1}                           inverse of metric tensor matrix
√g = √(det G)                    volume factor
vol = √g dx₁ ⋯ dx_n              volume form
Measure Zero

m(A)                               content of Jordan measurable A
m̄(A)                              outer content of A
µ̄(A)                              outer measure of A
v_n = (1/n) 2π^{n/2}/Γ(n/2)        volume coefficient
B_n(a, r)                          open n-ball of volume v_n r^n
a_n = n v_n = 2π^{n/2}/Γ(n/2)      area coefficient
S_{n−1}(a, r)                      (n − 1)-sphere of area a_{n−1} r^{n−1}
Index
tangent space, 19
tangent vector field, 90
tensor, 137
tensor algebra, 138, 144
tensor calculus, 138
tensor field, 137
trace, 8
transpose matrix, 7
transverse hypersurface element, 131
twisted form, 138
uniform convergence, 54
uniform convergence theorem, 55