Advanced Physics Nick Lucid Jan 2023
A Historical Perspective
Nick Lucid
June 2015
Last Updated: January 2023
© Nick Lucid
Contents
Preface
1 Coordinate Systems
1.1 Cartesian
1.2 Polar and Cylindrical
1.3 Spherical
1.4 Bipolar and Elliptic
2 Vector Algebra
2.1 Operators
2.2 Vector Operators
3 Vector Calculus
3.1 Calculus
3.2 Del Operator
3.3 Non-Cartesian Del Operators
3.4 Arbitrary Del Operator
3.5 Vector Calculus Theorems
    The Divergence Theorem
    The Curl Theorem
4 Lagrangian Mechanics
4.1 A Little History...
4.2 Derivation of Lagrange’s Equation
4.3 Generalizing for Multiple Bodies
4.4 Applications of Lagrange’s Equation
4.5 Lagrange Multipliers
4.6 Applications of Lagrange Multipliers
5 Electrodynamics
5.1 Introduction
5.2 Experimental Laws
    Coulomb’s Law
    Biot-Savart Law
5.3 Theoretical Laws
    Ampère’s Law
    Faraday’s Law
    Gauss’s Law(s)
    Ampère’s Law Revisited
5.4 Unification of Electricity and Magnetism
5.5 Electromagnetic Waves
5.6 Potential Functions
    Maxwell’s Equations with Potentials
5.7 Blurring Lines
Preface
• you’re a graduate physics student but feel like you’re missing some-
thing, or
The point being, this book is not intended for anyone without at least some
background in basic calculus and introductory physics.
The chapters of this book correspond to major topics, so some of them
can get rather long. If a particular physics topic requires a lot of mathemat-
ical background, then development of that math will be in its own chapter
preceding the physics topic. For example, vector calculus (Chapter 3) pre-
cedes electrodynamics (Chapter 5) and tensor analysis (Chapter 6) precedes
relativity (Chapters 7 and 8). The topics are also in a somewhat historical
order and include a bit of historical information to put them in context with
each other. Historical context can give you a deeper insight into a topic and
understanding how long it took the scientific community to develop some-
thing can make you feel a little better about maybe not understanding it
immediately.
With the exception of Chapter 1, all chapters contain worked examples
where helpful. Some of those examples also make use of numerical methods
which can be found in Appendix A. Reading textbooks and other trade books
on these topics, I often get frustrated by how many steps are missing from
examples and derivations. As you read this book, you’ll find that I make
a point to include as many steps as possible and clearly explain any steps
I don’t show mathematically. Also, with so many different topics in one
place, there are times where I avoid traditional notation in favor of keeping a
consistent notation throughout the book. Frankly, some traditional choices
for symbols are terrible anyway.
Acknowledgments
I’d like to acknowledge Nicholas Arnold for proofreading this book and Jesse
Mason for asking me that question about Lagrangian mechanics all those
years ago.
Chapter 1
Coordinate Systems
1.1 Cartesian
The Cartesian coordinate system was developed by René Descartes (Latin:
Renatus Cartesius). He published the concept in his work La Géométrie
in 1637. Uniting algebra with geometry, as Descartes did, had drastic
positive consequences for the development of mathematics, particularly
the soon-to-be-invented calculus.
This system of coordinates is the most basic we have consisting of, in
general, three numbers to represent location: x, y, and z. It is a form of
rectilinear coordinates, which is simply a grid of straight lines. We can
represent this position as a position vector,
$$\vec{r} \equiv x\,\hat{x} + y\,\hat{y} + z\,\hat{z}, \tag{1.1.1}$$
where x̂, ŷ, and ẑ represent the directions along each of the axes. This
Figure 1.2: This is the Cartesian plane (i.e. the xy-plane or R²). The left shows the grid
living up to its rectilinear name. The right shows an arbitrary position vector in this
system.
Figure 1.3: This is Cartesian space (i.e. R³). The left shows the grid living up to its
rectilinear name. The right shows an arbitrary position vector in this system.
Figure 1.4: This is cylindrical coordinates. The left shows the curvilinear grid in the xy-
plane (i.e. polar coordinates). The right shows an arbitrary position vector in this system
where the coordinates are also labeled.
coordinate option might make the most sense, but it doesn’t always make
problem solving easier. For those cases, we have some more specialized op-
tions.
or in reverse
$$\begin{aligned}
s &= \sqrt{x^2 + y^2} \\
\phi &= \arctan\left(\frac{y}{x}\right) \\
z &= z
\end{aligned}. \tag{1.2.2}$$
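The transformation in Eq. 1.2.2 and its forward counterpart can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it uses `math.atan2(y, x)` instead of a bare arctan(y/x), which is an assumption on our part so that the angle lands in the correct quadrant when x is negative or zero.

```python
import math

def cartesian_to_cylindrical(x, y, z):
    """Eq. 1.2.2, with atan2 standing in for arctan(y/x) to keep the quadrant."""
    s = math.hypot(x, y)        # s = sqrt(x^2 + y^2)
    phi = math.atan2(y, x)      # angle measured from the positive x-axis
    return s, phi, z

def cylindrical_to_cartesian(s, phi, z):
    """The forward transformation: x = s cos(phi), y = s sin(phi), z = z."""
    return s * math.cos(phi), s * math.sin(phi), z

# A point in the second quadrant, where arctan(y/x) alone would be ambiguous.
point = (-1.0, 1.0, 2.0)
s, phi, z = cartesian_to_cylindrical(*point)
back = cylindrical_to_cartesian(s, phi, z)
print(s, phi, z)
print(back)
```

Converting there and back should return the starting point up to rounding error.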
We can also use Eq. 1.1.1 to find the corresponding unit vectors. Since we
know $\hat{s} = \vec{s}/s$ and φ̂ must be perpendicular to ŝ (initially with a positive
y-component), the cylindrical unit vectors can be written as
ŝ = cos φ x̂ + sin φ ŷ
φ̂ = − sin φ x̂ + cos φ ŷ . (1.2.3)
ẑ = ẑ
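$$\begin{pmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \hat{s} \\ \hat{\phi} \\ \hat{z} \end{pmatrix}
= \begin{pmatrix} \hat{x} \\ \hat{y} \\ \hat{z} \end{pmatrix}. \tag{1.2.6}$$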
Therefore, in equation form, they are
x̂ = cos φ ŝ − sin φ φ̂
ŷ = sin φ ŝ + cos φ φ̂ . (1.2.7)
ẑ = ẑ
This set of coordinates is particularly useful when dealing with cylindrical
symmetry (e.g. rotating rigid bodies, strings of mass, lines of charge, long
straight currents, etc.)
1.3 Spherical
Just as with polar, spherical coordinates are a form of curvilinear coordi-
nates. However, rather than having two straight lines and one curved, it’s
Figure 1.5: This is spherical coordinates. The left shows the grid you’d find on a surface
of constant r and the right shows the arbitrary position vector. The orientation of the
Cartesian system is different than it was in Figure 1.4, so the cylindrical coordinates are
also shown for clarity.
where we have taken the cylindrical coordinates and made the substitutions
of
s = r sin θ
φ = φ , (1.3.2)
z = r cos θ
and
r̂ = sin θ cos φ x̂ + sin θ sin φ ŷ + cos θ ẑ
θ̂ = cos θ cos φ x̂ + cos θ sin φ ŷ − sin θ ẑ . (1.3.5)
φ̂ = − sin φ x̂ + cos φ ŷ
By the matrix method shown in Section 1.2, we can also write the Cartesian
and cylindrical coordinates in terms of the spherical ones. They will be
ŝ = sin θ r̂ + cos θ θ̂
φ̂ = φ̂ (1.3.6)
ẑ = cos θ r̂ − sin θ θ̂
and
x̂ = sin θ cos φ r̂ + cos θ cos φ θ̂ − sin φ φ̂
ŷ = sin θ sin φ r̂ + cos θ sin φ θ̂ + cos φ φ̂ . (1.3.7)
ẑ = cos θ r̂ − sin θ θ̂
If you go through the matrix algebra as I have for all of these, you’ll notice
a pattern. Say for the sake of discussion our coefficient matrix is given as A.
The pattern we will see is that the inverse matrix is equal to the transpose
of the matrix (i.e. $A^{-1} = A^{T}$), where a transpose is simply a flip over the
diagonal. This is not true for all matrices by any stretch, but it is true
of orthonormal matrices (more commonly called orthogonal matrices), which
are matrices whose rows form an orthonormal basis. This is exactly what we
have here because the set of unit vectors for a coordinate system
(e.g. {x̂, ŷ, ẑ}) is referred to as a basis and should always
be orthonormal. This makes finding inverse coordinate transformations very
straightforward.
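The transpose pattern can be checked numerically. The sketch below builds the coefficient matrix of Eq. 1.3.5 (rows r̂, θ̂, φ̂ written in Cartesian components) at an arbitrary pair of angles and multiplies it by its own transpose. The helper names and the sample angles are assumptions chosen for illustration only.

```python
import math

def spherical_basis_matrix(theta, phi):
    """Rows are r-hat, theta-hat, phi-hat in Cartesian components (Eq. 1.3.5)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [
        [st * cp, st * sp, ct],
        [ct * cp, ct * sp, -st],
        [-sp, cp, 0.0],
    ]

def transpose(m):
    """Flip a matrix over its diagonal."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Plain 3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = spherical_basis_matrix(0.7, 2.1)   # arbitrary sample angles in radians
product = matmul(A, transpose(A))      # should be the identity if A^-1 = A^T
for row in product:
    print([round(entry, 12) for entry in row])
```

If the product is the identity matrix, the transpose really is acting as the inverse, so reading Eq. 1.3.7 off Eq. 1.3.5 amounts to swapping rows and columns.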
If we define ~r1 and ~r2 as the position vectors relative to the origins at x = −a
Figure 1.6: This is bipolar coordinates. The circles that intersect the origins at ±a along
the horizontal axis are of constant σ and the circles that do not intersect at all are of
constant τ .
Figure 1.7: This is elliptic coordinates. The ellipses are of constant µ and the hyperbolas
are of constant ν. The points ±a along the horizontal axis represent the foci of both the
ellipses and hyperbolas.
matches the equation for a hyperbola for constant ν. A similar process can
be used to show the grid lines in bipolar are all circles, but that derivation
is much more algebraically and trigonometrically involved.
These two planar coordinate systems can be expanded into a wide variety
of three-dimensional systems. We can project the grids along the z-axis to
form bipolar cylindrical and elliptic cylindrical coordinates. They can be
rotated about various axes to form toroidal, bispherical, oblate spheroidal,
and prolate spheroidal coordinates. We can even take elliptic coordinates
to the next dimension with its own angle definition resulting in ellipsoidal
coordinates.
Chapter 2
Vector Algebra
2.1 Operators
The concept of operators is first introduced to students as children with ba-
sic arithmetic. We learn to add, subtract, multiply, and divide numbers. As
mathematics progresses, we learn about exponents, parentheses, and the or-
der of operations (PEMDAS) where we must use operators in a particular
order. When we start learning algebra, we see that for every operator there
is another operator that undoes it (i.e. an inverse operator like subtraction
is for addition). It is at this point that the nature of operators sometimes
becomes understated. Teachers will introduce the idea of functions on a basic
level which can tend to sweep operators under the rug so to speak.
It isn’t until some of us take classes in college like Abstract Algebra (or
something similar) where we’re reintroduced to operators. The arithmetic
we’ve been doing all our lives is summed up in an algebraic structure known
as a field (not to be confused with quantities we see in physics like the
electric field). A mathematical field is a set of numbers closed under two
operators. In the case of basic arithmetic, these two operations are addition
and multiplication and the set of numbers is the real numbers. We’d write
this as (R, +, ∗).
The other operations (e.g. exponents, subtraction, and division) are in-
cluded through properties of fields such as inverses to maintain generality.
For example, rather than subtract, we add an additive inverse (e.g.
2 − 3 = 2 + (−3) where −3 is also in the set R). Under higher levels of
algebra we have multiplication and division, exponents and logs, sine and
arcsine, etc. In basic algebra they usually refer to sine or log as functions,
but in reality they operate on one function (or number) to make another. All
these operators obey certain properties. For example, operators in (R, +, ∗)
obey the following properties:
• Additive Identity:
For a ∈ R, a + 0 = a.
• Additive Inverse:
For a ∈ R, a + (−a) = 0.
• Multiplicative Identity:
For a ∈ R, a ∗ 1 = a.
• Multiplicative Inverse:
For a ∈ R with a ≠ 0, a ∗ a⁻¹ = 1.
• Associative Property:
For a, b, c ∈ R, a + (b + c) = (a + b) + c and a ∗ (b ∗ c) = (a ∗ b) ∗ c.
• Commutative Property:
For a, b ∈ R, a + b = b + a and a ∗ b = b ∗ a.
• Distributive Property:
For a, b, c ∈ R, a ∗ (b + c) = a ∗ b + a ∗ c.
The properties listed above don’t apply to all algebras. Matrices, for example,
are not commutative under multiplication.
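A quick way to see the matrix claim is to multiply two small matrices in both orders. The matrices below are arbitrary choices for illustration, and `matmul2` is a hypothetical helper written just for this sketch.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]   # swaps columns when on the right, rows when on the left

AB = matmul2(A, B)
BA = matmul2(B, A)
print(AB)   # [[2, 1], [4, 3]]
print(BA)   # [[3, 4], [1, 2]]
```

The two products differ, so the order of the factors matters for matrices even though it never does for real numbers.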
= Ax Bx x̂ • x̂ + Ax By x̂ • ŷ + Ax Bz x̂ • ẑ
+Ay Bx ŷ • x̂ + Ay By ŷ • ŷ + Ay Bz ŷ • ẑ
+Az Bx ẑ • x̂ + Az By ẑ • ŷ + Az Bz ẑ • ẑ
= Ax Bx + Ay By + Az Bz
where we have taken advantage of Eq. 2.2.1 on the unit vectors (having
a magnitude of one, by definition). We can write this more generally
as
$$\vec{A} \bullet \vec{B} = \sum_{i=1}^{n} A_i B_i \tag{2.2.2}$$
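$$\begin{aligned}
&= A_x B_x\,\hat{x}\times\hat{x} + A_x B_y\,\hat{x}\times\hat{y} + A_x B_z\,\hat{x}\times\hat{z} \\
&\quad + A_y B_x\,\hat{y}\times\hat{x} + A_y B_y\,\hat{y}\times\hat{y} + A_y B_z\,\hat{y}\times\hat{z} \\
&\quad + A_z B_x\,\hat{z}\times\hat{x} + A_z B_y\,\hat{z}\times\hat{y} + A_z B_z\,\hat{z}\times\hat{z} \\
&= A_x B_y\,\hat{z} + A_x B_z\,(-\hat{y}) + A_y B_x\,(-\hat{z}) \\
&\quad + A_y B_z\,\hat{x} + A_z B_x\,\hat{y} + A_z B_y\,(-\hat{x}) \\
\vec{A}\times\vec{B} &= (A_y B_z - A_z B_y)\,\hat{x} + (A_z B_x - A_x B_z)\,\hat{y} + (A_x B_y - A_y B_x)\,\hat{z}
\end{aligned}$$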
where we have taken advantage of Eq. 2.2.3 on the unit vectors (having
a magnitude of one, by definition) noting that all our coordinate systems
are right-handed (i.e. x̂ × ŷ = ẑ but ŷ × x̂ = −ẑ). We can write
this more simply as
$$\vec{A}\times\vec{B} = \det\begin{pmatrix} \hat{x} & \hat{y} & \hat{z} \\ A_x & A_y & A_z \\ B_x & B_y & B_z \end{pmatrix} \tag{2.2.4}$$
which can be easily generalized for more dimensions if necessary.
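The component formulas for the two products translate directly into code. The following is a minimal sketch for three-dimensional vectors stored as tuples. The function names `dot` and `cross` are chosen for this illustration and do not come from any particular library.

```python
def dot(a, b):
    """Eq. 2.2.2: the sum of products of matching components."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    """The expanded determinant of Eq. 2.2.4 in Cartesian components."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

x_hat, y_hat, z_hat = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(cross(x_hat, y_hat))            # (0, 0, 1), i.e. z-hat
print(cross(y_hat, x_hat))            # (0, 0, -1), i.e. minus z-hat
print(dot((1, 2, 3), (4, 5, 6)))      # 32
print(cross((1, 2, 3), (4, 5, 6)))    # (-3, 6, -3)
```

The first two lines of output reproduce the right-handed rule for the unit vectors, and the sign flip between them is the anti-commutative behavior of the cross product.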
Figure 2.1: This diagram shows a constant force, F~ , (with components labeled) acting on
a mass, m, affecting a displacement, ∆~s.
Example 2.2.1
Let’s consider our definition of work: “Work is done on a body by a force over
some displacement if that force directly affects the displacement of the body.”
For simplicity, we’ll consider a force which is constant over the displacement.
We can see clearly in Figure 2.1 that only the component of the force parallel
to the displacement will affect the displacement. Taking advantage of Eq.
2.2.1, we have
Figure 2.2: This diagram shows a constant force, F~ , (with components labeled) acting on
a door knob with a lever arm, ~r, labeled.
Example 2.2.2
Let’s consider a basic scenario: A constant force acting on a door knob. We
can see clearly in Figure 2.2 that only the component of the force perpen-
dicular to the door will generate a torque because it’s the only one that can
generate rotation. Taking advantage of Eq. 2.2.3, we have
$$\tau = r F_\perp = r\,(F \sin\theta) = r F \sin\theta$$
$$\vec{\tau} = \vec{r} \times \vec{F}.$$
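The dot product and cross product also obey properties similar to those
found for real numbers and derivatives. For vectors $\vec{A}$, $\vec{B}$, and $\vec{C}$;
and constant $c$;
• Constant Multiple Properties:
$$c\left(\vec{A}\bullet\vec{B}\right) = \left(c\vec{A}\right)\bullet\vec{B} = \vec{A}\bullet\left(c\vec{B}\right) \tag{2.2.5}$$
$$c\left(\vec{A}\times\vec{B}\right) = \left(c\vec{A}\right)\times\vec{B} = \vec{A}\times\left(c\vec{B}\right). \tag{2.2.6}$$
• Distributive Properties:
$$\vec{A}\bullet\left(\vec{B}+\vec{C}\right) = \vec{A}\bullet\vec{B} + \vec{A}\bullet\vec{C} \tag{2.2.7}$$
$$\vec{A}\times\left(\vec{B}+\vec{C}\right) = \vec{A}\times\vec{B} + \vec{A}\times\vec{C}. \tag{2.2.8}$$
• Commutative Properties:
$$\vec{A}\bullet\vec{B} = \vec{B}\bullet\vec{A} \tag{2.2.9}$$
$$\vec{A}\times\vec{B} = -\vec{B}\times\vec{A}. \tag{2.2.10}$$
$$\vec{A}\times\left(\vec{B}\times\vec{C}\right) = \vec{B}\left(\vec{A}\bullet\vec{C}\right) - \vec{C}\left(\vec{A}\bullet\vec{B}\right) \tag{2.2.12}$$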
It should be stated explicitly here that neither the dot product nor the cross
product is associative. That means, when writing triple products, parenthe-
ses must always be present.
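Both Eq. 2.2.12 and the warning about parentheses can be spot-checked with integer vectors, where the arithmetic is exact. This is a sketch under our own assumptions: the three vectors are arbitrary and the helper names are invented for the example. A single numerical case illustrates the identity but does not prove it.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def scale(c, v):
    return tuple(c * vi for vi in v)

def subtract(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

A, B, C = (1, 2, 3), (4, 5, 6), (7, 8, 10)

lhs = cross(A, cross(B, C))                                # A x (B x C)
rhs = subtract(scale(dot(A, C), B), scale(dot(A, B), C))   # B(A.C) - C(A.B)
regrouped = cross(cross(A, B), C)                          # (A x B) x C

print(lhs)         # (-12, 9, -2)
print(rhs)         # (-12, 9, -2)
print(regrouped)   # (84, 9, -66)
```

The first two results agree, as Eq. 2.2.12 requires, while moving the parentheses gives a different vector entirely.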
Chapter 3
Vector Calculus
3.1 Calculus
Early on in calculus, we’re shown how to take a derivative of a function. Then
later, we see the integral (also called an anti-derivative). These are both
operators and they obey certain properties just like the algebraic operators
from Section 2.1. For real-valued functions f (x) and g(x) and real-valued
constant c,
• Fundamental Theorem of Calculus (or Inverse Property):
$$\int_a^b \frac{d}{dx}(f)\,dx = \int_a^b df = f|_{x=b} - f|_{x=a}. \tag{3.1.1}$$
• Chain Rule:
$$\frac{d}{dx}(f) = \frac{d}{du}(f)\,\frac{du}{dx}. \tag{3.1.2}$$
• Distributive Property:
$$\frac{d}{dx}(f+g) = \frac{d}{dx}(f) + \frac{d}{dx}(g). \tag{3.1.4}$$
• Product Rule:
$$\frac{d}{dx}(f*g) = \frac{d}{dx}(f)*g + f*\frac{d}{dx}(g). \tag{3.1.5}$$
• Quotient Rule:
$$\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{\dfrac{d}{dx}(f)*g - f*\dfrac{d}{dx}(g)}{g^2}. \tag{3.1.6}$$
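$$\vec{\nabla} \equiv \hat{x}\,\frac{\partial}{\partial x} + \hat{y}\,\frac{\partial}{\partial y} + \hat{z}\,\frac{\partial}{\partial z} \tag{3.2.1}$$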
where the unit vectors have been placed in front to avoid confusion. This
seems simple enough, but how does it operate? The del operator can operate
on both scalar and vector functions. When it operates on a scalar function,
f (x, y, z), we have
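$$\vec{\nabla} f = \frac{\partial f}{\partial x}\,\hat{x} + \frac{\partial f}{\partial y}\,\hat{y} + \frac{\partial f}{\partial z}\,\hat{z}. \tag{3.2.2}$$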
This is called the gradient and it measures how the scalar function, f ,
changes in space. This means the change of a scalar function is a vector,
which makes sense. We would want to know in what direction it’s changing
the most.
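However, if del operates on another vector, we have two options: dot
product and cross product. Using Eqs. 2.2.2 and 2.2.4 with a vector field,
$\vec{A}(x, y, z)$, results in
$$\vec{\nabla}\bullet\vec{A} = \frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z} \tag{3.2.3}$$
and
$$\vec{\nabla}\times\vec{A} = \det\begin{pmatrix} \hat{x} & \hat{y} & \hat{z} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ A_x & A_y & A_z \end{pmatrix} \tag{3.2.4}$$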
where, again, the dot product results in a scalar and the cross product a
vector. The next question on everyone’s minds: “Sure, but what do they
mean?!”
Eq. 3.2.3 is called the divergence and it measures how a vector field, $\vec{A}$,
diverges or spreads at a single position. In other words, if the divergence (at a
point) is positive, then there are more vectors directed outward surrounding
that point than there are directed inward. The opposite is true for a negative
divergence. By the same reasoning, a divergence of zero implies there is the
same amount outward as inward. This is sounding very abstract, I know, but
we’re keeping definitions as general as possible. This concept could apply to a
multitude of situations (e.g. velocity or flow of fluids, electrodynamics, etc.).
Eq. 3.2.4 is called the curl and it measures how a vector field, $\vec{A}$, curls
or circulates at a single position. This makes perfect sense if applied to fluid
flow, but what about something like electromagnetic fields where nothing is
actually circulating? We can see some of these fields curl, but they certainly
don’t circulate, right? True, and the field itself bending doesn’t always indi-
cate a non-zero curl (a straight field doesn’t indicate zero curl either). It’s
best to think of the curl in terms of how something would respond to the
field when placed inside. In a circulating fluid, a small object might rotate
or revolve. In an electric field, one charge would move toward or away from
another. In a magnetic field, a moving charge will travel in circle-like paths.
We can attain a visual based on how these foreign bodies move as a result
of the field’s influence. Furthermore, the direction of the angular velocity of
the body will be in the same direction as the curl.
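To make the two definitions less abstract, the sketch below estimates the divergence and curl of two simple fields using central finite differences. Everything here is an assumption made for illustration: the step size, the helper names, and the two sample fields (one that purely spreads and one that purely circulates about the z-axis).

```python
def partial(F, component, axis, point, h=1e-5):
    """Central-difference estimate of d F[component] / d x[axis] at a point."""
    plus, minus = list(point), list(point)
    plus[axis] += h
    minus[axis] -= h
    return (F(*plus)[component] - F(*minus)[component]) / (2.0 * h)

def divergence(F, point):
    """Cartesian divergence: dAx/dx + dAy/dy + dAz/dz."""
    return sum(partial(F, i, i, point) for i in range(3))

def curl(F, point):
    """Cartesian curl, expanded from the determinant form."""
    return (
        partial(F, 2, 1, point) - partial(F, 1, 2, point),
        partial(F, 0, 2, point) - partial(F, 2, 0, point),
        partial(F, 1, 0, point) - partial(F, 0, 1, point),
    )

def spreading(x, y, z):
    return (x, y, z)        # points away from the origin everywhere

def swirling(x, y, z):
    return (-y, x, 0.0)     # circulates counterclockwise about the z-axis

p = (0.3, -1.2, 2.0)
print(divergence(spreading, p), curl(spreading, p))
print(divergence(swirling, p), curl(swirling, p))
```

The spreading field should show a divergence near 3 with no curl, while the swirling field should show no divergence and a curl of roughly 2 along ẑ, which is the direction a small paddle wheel placed in that field would rotate about.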
It should be noted that we’re not really dotting or crossing two vectors
together. Yes, the del operator has a vector form, but it’s more an operator
than a vector. We lose the commutative properties of the two products
because del has to operate on something (i.e. $\vec{\nabla}\bullet\vec{A} \neq \vec{A}\bullet\vec{\nabla}$). Because it
doesn’t obey all properties of vectors, the rigorous among us refuse to call
del a vector.
We can also use the del operator to take a second derivative. However,
since this operator is changing the nature of our function between scalar and
vector and vice versa, we have several options mathematically: divergence of
a gradient, gradient of a divergence, curl of a gradient, divergence of a curl,
and curl of a curl. This might seem like a lot, but we can eliminate several of
them. First, the divergence of the gradient, $\vec{\nabla}\bullet\left(\vec{\nabla} f\right)$, has a special name:
the Laplacian. It is short-handed with the symbol $\vec{\nabla}^2$ and is represented in
Cartesian coordinates by
$$\vec{\nabla}^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \tag{3.2.5}$$
The curl of a gradient and the divergence of a curl are both zero, which we
can show mathematically as
$$\vec{\nabla}\times\left(\vec{\nabla} f\right) = 0 \tag{3.2.6}$$
and
$$\vec{\nabla}\bullet\left(\vec{\nabla}\times\vec{A}\right) = 0. \tag{3.2.7}$$
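Both of these can be mathematically proven using Eqs. 3.2.2, 3.2.3, and 3.2.4
by realizing the partial derivatives are commutative:
$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) \;\Rightarrow\; \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) - \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = 0.$$
The gradient of the divergence, $\vec{\nabla}\left(\vec{\nabla}\bullet\vec{A}\right)$, is not zero, but is extremely rare
in physical systems. The curl of the curl obeys the identity
$$\vec{\nabla}\times\left(\vec{\nabla}\times\vec{A}\right) = \vec{\nabla}\left(\vec{\nabla}\bullet\vec{A}\right) - \vec{\nabla}^2\vec{A}, \tag{3.2.8}$$
which contains the gradient of the divergence and the Laplacian, second
derivatives already seen. The Laplacian of a vector is defined in Cartesian
coordinates as
$$\vec{\nabla}^2\vec{A} \equiv \left(\vec{\nabla}^2 A_x\right)\hat{x} + \left(\vec{\nabla}^2 A_y\right)\hat{y} + \left(\vec{\nabla}^2 A_z\right)\hat{z},$$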
Example 3.3.1
The del operator, gradient, divergence, and curl can be found for any coordi-
nate system by performing the following steps. For context, we’ll find them
for cylindrical coordinates (Section 1.2).
1. Find the Cartesian variables in terms of the variables of the new coor-
dinate system.
Eq. 1.2.1 ...check!
2. Find the variables of the new coordinate system in terms of the Carte-
sian variables.
Eq. 1.2.2 ...check!
3. Find the unit vectors in the new coordinate system in terms of the
Cartesian unit vectors.
Eq. 1.2.3 ...check!
4. Find the Cartesian unit vectors in terms of the unit vectors in the new
coordinate system.
Eq. 1.2.7 ...check!
5. Determine the cross product combinations of the new unit vectors using
the right-hand rule.
Based on the order in which we’ve listed the variables, (s, φ, z), and
Eq. 2.2.10, we can conclude
ŝ × φ̂ = ẑ and φ̂ × ŝ = −ẑ
ẑ × ŝ = φ̂ and ŝ × ẑ = −φ̂ .
φ̂ × ẑ = ŝ and ẑ × φ̂ = −ŝ
6. Evaluate all the possible first derivatives of the new variables with re-
spect to the Cartesian variables (there are 9 derivatives total) and then
transform back to the new variables.
Using Eqs. 1.2.2 and then 1.2.1, we see that
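$$\frac{\partial s}{\partial z} = \frac{\partial \phi}{\partial z} = \frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} = 0, \qquad \frac{\partial z}{\partial z} = 1,$$
$$\begin{aligned}
\frac{\partial s}{\partial x} &= \frac{x}{\sqrt{x^2+y^2}} = \frac{s\cos\phi}{s} = \cos\phi \\
\frac{\partial s}{\partial y} &= \frac{y}{\sqrt{x^2+y^2}} = \frac{s\sin\phi}{s} = \sin\phi
\end{aligned},$$
and
$$\begin{aligned}
\frac{\partial \phi}{\partial x} &= \frac{-y}{x^2+y^2} = \frac{-s\sin\phi}{s^2} = \frac{-\sin\phi}{s} \\
\frac{\partial \phi}{\partial y} &= \frac{x}{x^2+y^2} = \frac{s\cos\phi}{s^2} = \frac{\cos\phi}{s}
\end{aligned}.$$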
7. Evaluate all the possible first derivatives of the new unit vectors with
respect to the new variables (there are 9 derivatives total).
Unlike in Cartesian, the direction of cylindrical unit vectors depends
on position in space, so this is necessary (and happens to be the source
of most of our trouble). Using Eq. 1.2.3, we see that
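$$\frac{\partial \hat{s}}{\partial s} = \frac{\partial \hat{s}}{\partial z} = \frac{\partial \hat{\phi}}{\partial s} = \frac{\partial \hat{\phi}}{\partial z} = \frac{\partial \hat{z}}{\partial s} = \frac{\partial \hat{z}}{\partial \phi} = \frac{\partial \hat{z}}{\partial z} = 0,$$
$$\frac{\partial \hat{s}}{\partial \phi} = -\sin\phi\,\hat{x} + \cos\phi\,\hat{y} = \hat{\phi},$$
and
$$\frac{\partial \hat{\phi}}{\partial \phi} = -\cos\phi\,\hat{x} - \sin\phi\,\hat{y} = -\hat{s}.$$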
8. Use the chain rule to expand each Cartesian derivative operator into
the new coordinate operators.
By the chain rule (Eq. 3.1.2) generalized to multi-variable partial deriva-
tives, we have
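$$\begin{aligned}
\frac{\partial}{\partial x} &= \frac{\partial s}{\partial x}\frac{\partial}{\partial s} + \frac{\partial \phi}{\partial x}\frac{\partial}{\partial \phi} + \frac{\partial z}{\partial x}\frac{\partial}{\partial z} \\
\frac{\partial}{\partial y} &= \frac{\partial s}{\partial y}\frac{\partial}{\partial s} + \frac{\partial \phi}{\partial y}\frac{\partial}{\partial \phi} + \frac{\partial z}{\partial y}\frac{\partial}{\partial z} \\
\frac{\partial}{\partial z} &= \frac{\partial s}{\partial z}\frac{\partial}{\partial s} + \frac{\partial \phi}{\partial z}\frac{\partial}{\partial \phi} + \frac{\partial z}{\partial z}\frac{\partial}{\partial z}
\end{aligned}.$$
Substituting the derivatives found in step 6, these become
$$\begin{aligned}
\frac{\partial}{\partial x} &= \cos\phi\,\frac{\partial}{\partial s} - \frac{\sin\phi}{s}\frac{\partial}{\partial \phi} \\
\frac{\partial}{\partial y} &= \sin\phi\,\frac{\partial}{\partial s} + \frac{\cos\phi}{s}\frac{\partial}{\partial \phi} \\
\frac{\partial}{\partial z} &= \frac{\partial}{\partial z}
\end{aligned}.$$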
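$$\vec{\nabla} \equiv \hat{x}\,\frac{\partial}{\partial x} + \hat{y}\,\frac{\partial}{\partial y} + \hat{z}\,\frac{\partial}{\partial z}$$
We can now expand the two binomial products (i.e. using the distributive
property of multiplication) arriving at
$$\begin{aligned}
\vec{\nabla} &= \hat{s}\cos^2\phi\,\frac{\partial}{\partial s} - \hat{\phi}\sin\phi\cos\phi\,\frac{\partial}{\partial s} - \hat{s}\,\frac{\sin\phi\cos\phi}{s}\frac{\partial}{\partial \phi} + \hat{\phi}\,\frac{\sin^2\phi}{s}\frac{\partial}{\partial \phi} \\
&\quad + \hat{s}\sin^2\phi\,\frac{\partial}{\partial s} + \hat{\phi}\sin\phi\cos\phi\,\frac{\partial}{\partial s} + \hat{s}\,\frac{\sin\phi\cos\phi}{s}\frac{\partial}{\partial \phi} + \hat{\phi}\,\frac{\cos^2\phi}{s}\frac{\partial}{\partial \phi} \\
&\quad + \hat{z}\,\frac{\partial}{\partial z}.
\end{aligned}$$
Several terms will cancel and the remaining terms can be simplified by
sin²φ + cos²φ = 1 resulting in a del operator of
$$\vec{\nabla} = \hat{s}\,\frac{\partial}{\partial s} + \hat{\phi}\,\frac{1}{s}\frac{\partial}{\partial \phi} + \hat{z}\,\frac{\partial}{\partial z} \tag{3.3.1}$$
$$\vec{\nabla} f = \frac{\partial f}{\partial s}\,\hat{s} + \frac{1}{s}\frac{\partial f}{\partial \phi}\,\hat{\phi} + \frac{\partial f}{\partial z}\,\hat{z}$$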
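$$\begin{aligned}
\vec{\nabla}\bullet\vec{A} &= \hat{s}\bullet\frac{\partial}{\partial s}\left(A_s\hat{s} + A_\phi\hat{\phi} + A_z\hat{z}\right) \\
&\quad + \hat{\phi}\bullet\frac{1}{s}\frac{\partial}{\partial \phi}\left(A_s\hat{s} + A_\phi\hat{\phi} + A_z\hat{z}\right) \\
&\quad + \hat{z}\bullet\frac{\partial}{\partial z}\left(A_s\hat{s} + A_\phi\hat{\phi} + A_z\hat{z}\right).
\end{aligned}$$
We can now distribute the derivative operators and perform the necessary
product rules (Eq. 3.1.5) resulting in
$$\begin{aligned}
\vec{\nabla}\bullet\vec{A} &= \hat{s}\bullet\left[\frac{\partial}{\partial s}\left(A_s\hat{s}\right) + \frac{\partial}{\partial s}\left(A_\phi\hat{\phi}\right) + \frac{\partial}{\partial s}\left(A_z\hat{z}\right)\right] \\
&\quad + \hat{\phi}\bullet\frac{1}{s}\left[\frac{\partial}{\partial \phi}\left(A_s\hat{s}\right) + \frac{\partial}{\partial \phi}\left(A_\phi\hat{\phi}\right) + \frac{\partial}{\partial \phi}\left(A_z\hat{z}\right)\right] \\
&\quad + \hat{z}\bullet\left[\frac{\partial}{\partial z}\left(A_s\hat{s}\right) + \frac{\partial}{\partial z}\left(A_\phi\hat{\phi}\right) + \frac{\partial}{\partial z}\left(A_z\hat{z}\right)\right]
\end{aligned}$$
$$\begin{aligned}
\vec{\nabla}\bullet\vec{A} &= \hat{s}\bullet\left[\frac{\partial A_s}{\partial s}\hat{s} + A_s\frac{\partial \hat{s}}{\partial s} + \frac{\partial A_\phi}{\partial s}\hat{\phi} + A_\phi\frac{\partial \hat{\phi}}{\partial s} + \frac{\partial A_z}{\partial s}\hat{z} + A_z\frac{\partial \hat{z}}{\partial s}\right] \\
&\quad + \hat{\phi}\bullet\frac{1}{s}\left[\frac{\partial A_s}{\partial \phi}\hat{s} + A_s\frac{\partial \hat{s}}{\partial \phi} + \frac{\partial A_\phi}{\partial \phi}\hat{\phi} + A_\phi\frac{\partial \hat{\phi}}{\partial \phi} + \frac{\partial A_z}{\partial \phi}\hat{z} + A_z\frac{\partial \hat{z}}{\partial \phi}\right] \\
&\quad + \hat{z}\bullet\left[\frac{\partial A_s}{\partial z}\hat{s} + A_s\frac{\partial \hat{s}}{\partial z} + \frac{\partial A_\phi}{\partial z}\hat{\phi} + A_\phi\frac{\partial \hat{\phi}}{\partial z} + \frac{\partial A_z}{\partial z}\hat{z} + A_z\frac{\partial \hat{z}}{\partial z}\right].
\end{aligned}$$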
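$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \hat{s}\times\frac{\partial}{\partial s}\left(A_s\hat{s} + A_\phi\hat{\phi} + A_z\hat{z}\right) \\
&\quad + \hat{\phi}\times\frac{1}{s}\frac{\partial}{\partial \phi}\left(A_s\hat{s} + A_\phi\hat{\phi} + A_z\hat{z}\right) \\
&\quad + \hat{z}\times\frac{\partial}{\partial z}\left(A_s\hat{s} + A_\phi\hat{\phi} + A_z\hat{z}\right).
\end{aligned}$$
We can now distribute the derivative operators and perform the necessary
product rules (Eq. 3.1.5) resulting in
$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \hat{s}\times\left[\frac{\partial}{\partial s}\left(A_s\hat{s}\right) + \frac{\partial}{\partial s}\left(A_\phi\hat{\phi}\right) + \frac{\partial}{\partial s}\left(A_z\hat{z}\right)\right] \\
&\quad + \hat{\phi}\times\frac{1}{s}\left[\frac{\partial}{\partial \phi}\left(A_s\hat{s}\right) + \frac{\partial}{\partial \phi}\left(A_\phi\hat{\phi}\right) + \frac{\partial}{\partial \phi}\left(A_z\hat{z}\right)\right] \\
&\quad + \hat{z}\times\left[\frac{\partial}{\partial z}\left(A_s\hat{s}\right) + \frac{\partial}{\partial z}\left(A_\phi\hat{\phi}\right) + \frac{\partial}{\partial z}\left(A_z\hat{z}\right)\right]
\end{aligned}$$
$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \hat{s}\times\left[\frac{\partial A_s}{\partial s}\hat{s} + A_s\frac{\partial \hat{s}}{\partial s} + \frac{\partial A_\phi}{\partial s}\hat{\phi} + A_\phi\frac{\partial \hat{\phi}}{\partial s} + \frac{\partial A_z}{\partial s}\hat{z} + A_z\frac{\partial \hat{z}}{\partial s}\right] \\
&\quad + \hat{\phi}\times\frac{1}{s}\left[\frac{\partial A_s}{\partial \phi}\hat{s} + A_s\frac{\partial \hat{s}}{\partial \phi} + \frac{\partial A_\phi}{\partial \phi}\hat{\phi} + A_\phi\frac{\partial \hat{\phi}}{\partial \phi} + \frac{\partial A_z}{\partial \phi}\hat{z} + A_z\frac{\partial \hat{z}}{\partial \phi}\right] \\
&\quad + \hat{z}\times\left[\frac{\partial A_s}{\partial z}\hat{s} + A_s\frac{\partial \hat{s}}{\partial z} + \frac{\partial A_\phi}{\partial z}\hat{\phi} + A_\phi\frac{\partial \hat{\phi}}{\partial z} + \frac{\partial A_z}{\partial z}\hat{z} + A_z\frac{\partial \hat{z}}{\partial z}\right].
\end{aligned}$$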
Finally, we can operate with the cross product taking advantage of the
relationships we found in step 5, which results in
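$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \left(\frac{1}{s}\frac{\partial A_z}{\partial \phi} - \frac{\partial A_\phi}{\partial z}\right)\hat{s} + \left(\frac{\partial A_s}{\partial z} - \frac{\partial A_z}{\partial s}\right)\hat{\phi} \\
&\quad + \frac{1}{s}\left(s\,\frac{\partial A_\phi}{\partial s} + (1)\,A_\phi - \frac{\partial A_s}{\partial \phi}\right)\hat{z}.
\end{aligned}$$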
and we can see the quantity represented by the first two terms in the
z-component matches the form of Eq. 3.1.5. Rewriting, we arrive at
our final answer of
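$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \left(\frac{1}{s}\frac{\partial A_z}{\partial \phi} - \frac{\partial A_\phi}{\partial z}\right)\hat{s} + \left(\frac{\partial A_s}{\partial z} - \frac{\partial A_z}{\partial s}\right)\hat{\phi} \\
&\quad + \frac{1}{s}\left(\frac{\partial}{\partial s}\left(s A_\phi\right) - \frac{\partial A_s}{\partial \phi}\right)\hat{z}
\end{aligned}$$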
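• The Gradient:
$$\vec{\nabla} f = \frac{\partial f}{\partial s}\,\hat{s} + \frac{1}{s}\frac{\partial f}{\partial \phi}\,\hat{\phi} + \frac{\partial f}{\partial z}\,\hat{z} \tag{3.3.2}$$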
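• The Divergence:
$$\vec{\nabla}\bullet\vec{A} = \frac{1}{s}\frac{\partial}{\partial s}\left(s A_s\right) + \frac{1}{s}\frac{\partial A_\phi}{\partial \phi} + \frac{\partial A_z}{\partial z} \tag{3.3.3}$$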
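• The Curl:
$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \left(\frac{1}{s}\frac{\partial A_z}{\partial \phi} - \frac{\partial A_\phi}{\partial z}\right)\hat{s} + \left(\frac{\partial A_s}{\partial z} - \frac{\partial A_z}{\partial s}\right)\hat{\phi} \\
&\quad + \frac{1}{s}\left(\frac{\partial}{\partial s}\left(s A_\phi\right) - \frac{\partial A_s}{\partial \phi}\right)\hat{z}
\end{aligned} \tag{3.3.4}$$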
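• The Laplacian:
$$\vec{\nabla}^2 f = \vec{\nabla}\bullet\vec{\nabla} f = \frac{1}{s}\frac{\partial}{\partial s}\left(s\,\frac{\partial f}{\partial s}\right) + \frac{1}{s^2}\frac{\partial^2 f}{\partial \phi^2} + \frac{\partial^2 f}{\partial z^2} \tag{3.3.5}$$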
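A cylindrical formula like Eq. 3.3.5 can be sanity-checked against its Cartesian counterpart (Eq. 3.2.5) by evaluating both numerically at the same physical point. The sketch below does this with finite differences. The test function f = (x² + y²)z + x = s²z + s cos φ, the step size, and all helper names are assumptions picked for this illustration. For this function the Laplacian works out by hand to 4z.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary but reasonable choice)

def second_derivative(g, t):
    """Central-difference estimate of g''(t)."""
    return (g(t + H) - 2.0 * g(t) + g(t - H)) / H**2

def laplacian_cartesian(f, x, y, z):
    """Eq. 3.2.5: sum of the three second partial derivatives."""
    return (second_derivative(lambda t: f(t, y, z), x)
            + second_derivative(lambda t: f(x, t, z), y)
            + second_derivative(lambda t: f(x, y, t), z))

def laplacian_cylindrical(F, s, phi, z):
    """Eq. 3.3.5 evaluated term by term with central differences."""
    def s_dF_ds(t):
        return t * (F(t + H, phi, z) - F(t - H, phi, z)) / (2.0 * H)
    radial = (s_dF_ds(s + H) - s_dF_ds(s - H)) / (2.0 * H) / s
    angular = second_derivative(lambda t: F(s, t, z), phi) / s**2
    axial = second_derivative(lambda t: F(s, phi, t), z)
    return radial + angular + axial

def f_cartesian(x, y, z):
    return (x * x + y * y) * z + x

def f_cylindrical(s, phi, z):
    return s * s * z + s * math.cos(phi)   # the same function written in (s, phi, z)

s, phi, z = 1.5, 0.8, 2.0
x, y = s * math.cos(phi), s * math.sin(phi)
lap_cart = laplacian_cartesian(f_cartesian, x, y, z)
lap_cyl = laplacian_cylindrical(f_cylindrical, s, phi, z)
print(lap_cart, lap_cyl)   # both should be close to 4z = 8
```

Agreement between the two numbers is a useful check that the 1/s and 1/s² factors have landed in the right places.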
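• The Gradient:
$$\vec{\nabla} f = \frac{\partial f}{\partial r}\,\hat{r} + \frac{1}{r}\frac{\partial f}{\partial \theta}\,\hat{\theta} + \frac{1}{r\sin\theta}\frac{\partial f}{\partial \phi}\,\hat{\phi} \tag{3.3.6}$$
• The Divergence:
$$\vec{\nabla}\bullet\vec{A} = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2 A_r\right) + \frac{1}{r\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\,A_\theta\right) + \frac{1}{r\sin\theta}\frac{\partial A_\phi}{\partial \phi} \tag{3.3.7}$$
• The Curl:
$$\begin{aligned}
\vec{\nabla}\times\vec{A} &= \frac{1}{r\sin\theta}\left[\frac{\partial}{\partial \theta}\left(\sin\theta\,A_\phi\right) - \frac{\partial A_\theta}{\partial \phi}\right]\hat{r} \\
&\quad + \frac{1}{r}\left[\frac{1}{\sin\theta}\frac{\partial A_r}{\partial \phi} - \frac{\partial}{\partial r}\left(r A_\phi\right)\right]\hat{\theta} \\
&\quad + \frac{1}{r}\left[\frac{\partial}{\partial r}\left(r A_\theta\right) - \frac{\partial A_r}{\partial \theta}\right]\hat{\phi}
\end{aligned} \tag{3.3.8}$$
• The Laplacian:
$$\begin{aligned}
\vec{\nabla}^2 f &= \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\,\frac{\partial f}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta\,\frac{\partial f}{\partial \theta}\right) \\
&\quad + \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial \phi^2}
\end{aligned} \tag{3.3.9}$$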
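As a rough check on the spherical divergence (Eq. 3.3.7), the sketch below evaluates it with central differences for two fields whose divergence is known from Cartesian coordinates. The first is the position vector, r r̂, whose divergence is 3. The second is the constant field ẑ, which by Eq. 1.3.7 has spherical components (cos θ, −sin θ, 0) and must have zero divergence. The helper names, step size, and sample point are assumptions made for the example.

```python
import math

H = 1e-5  # finite-difference step (an assumed value)

def central(g, t):
    """Central-difference estimate of g'(t)."""
    return (g(t + H) - g(t - H)) / (2.0 * H)

def divergence_spherical(A_r, A_theta, A_phi, r, theta, phi):
    """Eq. 3.3.7, with each component given as a function of (r, theta, phi)."""
    radial = central(lambda t: t * t * A_r(t, theta, phi), r) / r**2
    polar = central(lambda t: math.sin(t) * A_theta(r, t, phi), theta) / (r * math.sin(theta))
    azimuthal = central(lambda t: A_phi(r, theta, t), phi) / (r * math.sin(theta))
    return radial + polar + azimuthal

def zero(r, theta, phi):
    return 0.0

point = (2.0, 0.6, 1.1)  # (r, theta, phi), chosen away from the polar axis

# Position vector field: A = r r-hat
div_position = divergence_spherical(lambda r, th, ph: r, zero, zero, *point)

# Constant field z-hat = cos(theta) r-hat - sin(theta) theta-hat
div_z_hat = divergence_spherical(
    lambda r, th, ph: math.cos(th),
    lambda r, th, ph: -math.sin(th),
    zero,
    *point,
)
print(div_position, div_z_hat)   # close to 3 and 0
```

The second case is the instructive one: the components of ẑ vary with θ in this basis, yet the extra factors of r² and sin θ inside the derivatives make the total come out to zero.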
vectors (i.e. the length of the basis vectors when they’re not unit vectors),
which have the form
$$\vec{e}_i = h_i\,\hat{e}_i = \frac{\partial \vec{r}}{\partial q_i}, \tag{3.4.1}$$
where ~r is defined by Eq. 1.1.1 and the derivative is the result of a simple
coordinate transformation.
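Eq. 3.4.1 suggests a direct numerical recipe for the scale factors: differentiate the position vector with respect to each coordinate and take the length of the result. The sketch below applies that recipe to spherical coordinates, assuming the usual mapping x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ. The function names and step size are illustrative choices, not part of the text.

```python
import math

def position_spherical(r, theta, phi):
    """Cartesian components of the position vector in terms of (r, theta, phi)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def scale_factors(position, q, h=1e-6):
    """h_i = |d r / d q_i| from Eq. 3.4.1, estimated with central differences."""
    factors = []
    for i in range(3):
        plus, minus = list(q), list(q)
        plus[i] += h
        minus[i] -= h
        derivative = [(a - b) / (2.0 * h)
                      for a, b in zip(position(*plus), position(*minus))]
        factors.append(math.sqrt(sum(c * c for c in derivative)))
    return factors

r, theta, phi = 2.0, 0.6, 1.1
h_r, h_theta, h_phi = scale_factors(position_spherical, (r, theta, phi))
print(h_r, h_theta, h_phi)            # numerical estimates
print(1.0, r, r * math.sin(theta))    # expected: 1, r, r sin(theta)
```

Swapping in a different `position` function (cylindrical, elliptic, and so on) would give that system's scale factors the same way.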
We would also like to have some idea of the form of the path (or line)
element in this coordinate system. This can be easily found using the multi-
variable chain rule, which states
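$$\begin{aligned}
df &= \frac{\partial f}{\partial q_1}\,dq_1 + \frac{\partial f}{\partial q_2}\,dq_2 + \frac{\partial f}{\partial q_3}\,dq_3 \\
df &= \sum_{i=1}^{3} \frac{\partial f}{\partial q_i}\,dq_i \\
df &= \sum_{i=1}^{3} \frac{1}{h_i}\frac{\partial f}{\partial q_i}\,h_i\,dq_i
\end{aligned}$$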
for some arbitrary scalar function, f (q1 , q2 , q3 ). If we use Eq. 2.2.2 to write
this as a dot product, then
$$df = \left(\sum_{i=1}^{3} \frac{1}{h_i}\frac{\partial f}{\partial q_i}\,\hat{e}_i\right) \bullet \left(\sum_{i=1}^{3} h_i\,dq_i\,\hat{e}_i\right).$$
The quantity in the first set of parentheses is simply the gradient of f . Since
we have included the scale factors, every term in the second set of parentheses
has a unit of length making this quantity the path element. We can simplify
the notation to get
$$df = \vec{\nabla} f \bullet d\vec{\ell} \tag{3.4.2}$$
where
Figure 3.1: These are the volume elements of the three standard coordinate systems from
Chapter 1. In order from left to right: Cartesian, Cylindrical, Spherical.
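• The Divergence
$$\vec{\nabla}\bullet\vec{A} = \frac{1}{h_1 h_2 h_3}\sum_{i=1}^{3} \frac{\partial}{\partial q_i}\left(H_i A_i\right) \tag{3.4.9}$$
where $\vec{H} = (h_2 h_3)\,\hat{e}_1 + (h_3 h_1)\,\hat{e}_2 + (h_1 h_2)\,\hat{e}_3$ (the even permutations of
the subscripts).
• The Curl
$$\vec{\nabla}\times\vec{A} = \det\begin{pmatrix} \dfrac{1}{h_2 h_3}\hat{e}_1 & \dfrac{1}{h_1 h_3}\hat{e}_2 & \dfrac{1}{h_1 h_2}\hat{e}_3 \\ \dfrac{\partial}{\partial q_1} & \dfrac{\partial}{\partial q_2} & \dfrac{\partial}{\partial q_3} \\ h_1 A_1 & h_2 A_2 & h_3 A_3 \end{pmatrix} \tag{3.4.10}$$
• The Laplacian
$$\vec{\nabla}^2 f = \frac{1}{h_1 h_2 h_3}\sum_{i=1}^{3} \frac{\partial}{\partial q_i}\left(H_i\,\frac{1}{h_i}\frac{\partial f}{\partial q_i}\right) \tag{3.4.11}$$
where $\vec{H} = (h_2 h_3)\,\hat{e}_1 + (h_3 h_1)\,\hat{e}_2 + (h_1 h_2)\,\hat{e}_3$ (the even permutations of
the subscripts).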
$$dV = (h_1\,dq_1)(h_2\,dq_2)(h_3\,dq_3) = h_1 h_2 h_3\,dq_1\,dq_2\,dq_3 \tag{3.5.1}$$
where {h1 , h2 , h3 } are the scale factors and (q1 , q2 , q3 ) are the coordinates.
Now let’s consider the divergence throughout this volume element. From
Eq. 3.4.9 and 3.5.1, we get
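$$\begin{aligned}
\vec{\nabla}\bullet\vec{B}\,dV &= \frac{1}{h_1 h_2 h_3}\sum_{i=1}^{3} \frac{\partial}{\partial q_i}\left(H_i B_i\right)\,h_1 h_2 h_3\,dq_1\,dq_2\,dq_3 \\
\vec{\nabla}\bullet\vec{B}\,dV &= \sum_{i=1}^{3} \frac{\partial}{\partial q_i}\left(H_i B_i\right)\,dq_1\,dq_2\,dq_3.
\end{aligned} \tag{3.5.2}$$
$$\begin{aligned}
\vec{\nabla}\bullet\vec{B}\,dV &= d\left(h_2 h_3 B_1\right)dq_2\,dq_3 + \ldots \\
&= \left(h_2 h_3 B_1\right)|_{q_1+dq_1}\,dq_2\,dq_3 - \left(h_2 h_3 B_1\right)|_{q_1}\,dq_2\,dq_3 + \ldots \\
\vec{\nabla}\bullet\vec{B}\,dV &= B_1|_{q_1+dq_1}\left(h_2 h_3\,dq_2\,dq_3\right)|_{q_1+dq_1} - B_1|_{q_1}\left(h_2 h_3\,dq_2\,dq_3\right)|_{q_1} + \ldots
\end{aligned}$$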
Figure 3.2: This is a representation of an arbitrary volume element. The orthogonal vectors
for each of the surfaces facing the reader are also shown. The back bottom left corner is
labeled (q1 , q2 , q3 ) and the front top right corner is labeled (q1 + dq1 , q2 + dq2 , q3 + dq3 ) to
show that its volume matches that given by Eq. 3.5.1.
Taking a look at Figure 3.2, we can see the first of these two terms
corresponds to the right surface of the volume element located at q1 + dq1
and the second term corresponds to the left surface located at q1 . Each of
these surfaces spans an area of dA1 = (h2 dq2 ) (h3 dq3 ) = h2 h3 dq2 dq3 evaluated
at their location along q1 . This simplifies the above relationship to
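$$\begin{aligned}
\vec{\nabla}\bullet\vec{B}\,dV &= B_1|_{q_1+dq_1}\,dA_1|_{q_1+dq_1} - B_1|_{q_1}\,dA_1|_{q_1} + \ldots \\
\vec{\nabla}\bullet\vec{B}\,dV &= \left.\vec{B}\bullet d\vec{A}_1\right|_{q_1+dq_1} + \left.\vec{B}\bullet d\vec{A}_1\right|_{q_1} + \ldots
\end{aligned} \tag{3.5.3}$$
where we’ve lost our three negative signs because the angle between $\vec{B}$ and
$d\vec{A}$ for those surfaces is 180° (because $d\vec{A}$ always points outward from the
volume enclosed). The cosine in Eq. 2.2.1 takes care of the sign for us.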
Originally, in Eq. 3.5.2, we had three terms. Now, with Eq. 3.5.3, we
have six terms each corresponding to a different surface of the volume element
(which is composed of six surfaces). Since this process has occurred for all six
terms and these six terms together completely enclose the volume element,
we can rewrite Eq. 3.5.3 as
$$\vec{\nabla}\bullet\vec{B}\,dV = \oint_{dV} \vec{B}\bullet d\vec{A}. \tag{3.5.4}$$
If we’re going to make this practical, it should apply to the entire region, not
just the volume element. To do this, we simply add up (with an integral)
all the elements that compose the region. But what happens to the right
side of Eq. 3.5.4? If the region is composed of volume elements, then those
elements are all touching such that they completely fill the region. For the
surface (area) elements in contact with other surface elements within the
region, their $\vec{B}\bullet d\vec{A}$’s will all cancel because all of their $d\vec{A}$’s will be exactly
opposite. This means only the surface elements not in contact with other
surface elements will add to the integral on the right in Eq. 3.5.4. These
surface elements are simply the ones on the outside of the region (i.e. we
only need to integrate over the outside surface of the region). Therefore, Eq.
3.5.4 becomes
$$\int_V \vec{\nabla}\bullet\vec{B}\,dV = \oint \vec{B}\bullet d\vec{A}. \tag{3.5.5}$$
We call this the Divergence Theorem and it is true for any arbitrary region
V enclosed by a surface A.
You may be asking yourself why we didn’t just start with the volume of
the entire region from the beginning. Why did we do all this stuff with the
volume element instead? The answer is simple: We know what the volume
element looks like. We know it has six faces and that these faces have a very
definite size and shape within the coordinate system. The same cannot be
said about the entire region because it’s completely arbitrary. When we say
arbitrary, we don’t just mean that the system we apply this to could have
any configuration. We mean that, even with a particular system, we can
really choose a region with any shape, size, orientation, or location we wish
and Eq. 3.5.5 still applies.
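The theorem is easy to test on a concrete region. The sketch below chops the unit cube into small cells, sums the divergence over the volume, and separately sums the outward flux through the six faces, mirroring the volume-element argument above. The field B = (xy, yz, zx), its hand-computed divergence x + y + z, the grid size, and all names are assumptions chosen for illustration. Both integrals should come out to 3/2.

```python
N = 20                                   # cells per edge (an arbitrary choice)
mids = [(i + 0.5) / N for i in range(N)] # midpoints of the cells along one edge
dA = 1.0 / N**2                          # area of one face cell
dV = 1.0 / N**3                          # volume of one cell

def B(x, y, z):
    """Sample vector field."""
    return (x * y, y * z, z * x)

def div_B(x, y, z):
    """Divergence of B computed by hand: d(xy)/dx + d(yz)/dy + d(zx)/dz."""
    return y + z + x

# Left side of Eq. 3.5.5: add up (div B) dV over every cell in the cube.
volume_integral = sum(div_B(x, y, z) for x in mids for y in mids for z in mids) * dV

# Right side of Eq. 3.5.5: add up B . dA over the six outer faces.
# The outward normal flips sign between opposite faces, hence the subtractions.
flux = 0.0
for u in mids:
    for v in mids:
        flux += (B(1.0, u, v)[0] - B(0.0, u, v)[0]) * dA   # faces at x = 1 and x = 0
        flux += (B(u, 1.0, v)[1] - B(u, 0.0, v)[1]) * dA   # faces at y = 1 and y = 0
        flux += (B(u, v, 1.0)[2] - B(u, v, 0.0)[2]) * dA   # faces at z = 1 and z = 0

print(volume_integral, flux)
```

Only the outer faces appear in the flux sum because the interior faces cancel in pairs, which is exactly the step that turned Eq. 3.5.4 into Eq. 3.5.5.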
where {h1 , h2 , h3 } are the scale factors and ê3 is the vector orthogonal to the
surface element. Now we’ll consider the curl on that surface element given
by
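$$\begin{aligned}
\left(\vec{\nabla} \times \vec{B}\right) \bullet d\vec{A}_3 &= \left(\vec{\nabla} \times \vec{B}\right)_3 dA_3 \\
&= \frac{1}{h_1 h_2}\left[\frac{\partial}{\partial q_1}(h_2 B_2) - \frac{\partial}{\partial q_2}(h_1 B_1)\right] h_1 h_2\, dq_1\, dq_2 \\
&= \left[\frac{\partial}{\partial q_1}(h_2 B_2) - \frac{\partial}{\partial q_2}(h_1 B_1)\right] dq_1\, dq_2 \\
&= \frac{\partial}{\partial q_1}(h_2 B_2)\, dq_1\, dq_2 - \frac{\partial}{\partial q_2}(h_1 B_1)\, dq_1\, dq_2
\end{aligned}$$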
where we have applied Eqs. 3.4.10 and 3.5.6. If we apply the fundamental
theorem of calculus (Eq. 3.1.1), we get
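$$\begin{aligned}
\left(\vec{\nabla} \times \vec{B}\right) \bullet d\vec{A}_3 &= d(h_2 B_2)\, dq_2 - d(h_1 B_1)\, dq_1 \\
&= B_2\big|_{q_1+dq_1} (h_2\, dq_2)\big|_{q_1+dq_1} - B_2\big|_{q_1} (h_2\, dq_2)\big|_{q_1} \\
&\quad - B_1\big|_{q_2+dq_2} (h_1\, dq_1)\big|_{q_2+dq_2} + B_1\big|_{q_2} (h_1\, dq_1)\big|_{q_2}.
\end{aligned}$$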
Figure 3.3: This is a representation of the arbitrary surface element with orthogonal vector
ê3. The corner points between which we integrate are labeled with coordinates given in the
form (q1, q2), and it is assumed that all points in this diagram have the same q3 value.
Taking a look at Figure 3.3, we can see the first term corresponds to
the right part of the curve bounding the surface element. Furthermore, the
second term corresponds to the left part, the third term to the top part, and
the fourth term to the bottom part. This means the entire curve enclosing
the surface element is represented. Just as with the divergence theorem, we
see the terms match the form of the dot product given by Eq. 2.2.2. Since
the direction we assign to this curve is completely arbitrary, let’s keep things
consistent with the right-hand rule and choose counterclockwise. This way
the negative signs in the second and third terms are explained by the direction
of the curve being opposite from the first and fourth terms, respectively (we’re
defining up and to the right as positive). All this considered and defining
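$d\ell_i = h_i\, dq_i$, we can rewrite as
$$\begin{aligned}
\left(\vec{\nabla} \times \vec{B}\right) \bullet d\vec{A}_3 &= \vec{B} \bullet d\vec{\ell}_2\Big|_{q_1+dq_1} + \vec{B} \bullet d\vec{\ell}_2\Big|_{q_1} \\
&\quad + \vec{B} \bullet d\vec{\ell}_1\Big|_{q_2+dq_2} + \vec{B} \bullet d\vec{\ell}_1\Big|_{q_2}.
\end{aligned}$$
$$\left(\vec{\nabla} \times \vec{B}\right) \bullet d\vec{A}_3 = \oint_{dA_3} \vec{B} \bullet d\vec{\ell}. \tag{3.5.7}$$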
The result in Eq. 3.5.7 is only true of areas constructed of surface elements
with orthogonal vector ê3 . However, nothing was really special about this
particular surface element. We could have just as easily (and in exactly the
same way) found the curl on one of the other elements given by
or
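$$d\vec{A}_2 = (h_1\, dq_1)(h_3\, dq_3)\, \hat{e}_2 = h_1 h_3\, dq_1\, dq_3\, \hat{e}_2. \tag{3.5.9}$$
or
$$\left(\vec{\nabla} \times \vec{B}\right) \bullet d\vec{A}_2 = \oint_{dA_2} \vec{B} \bullet d\vec{\ell}, \tag{3.5.11}$$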
respectively.
Eqs. 3.5.7, 3.5.10, and 3.5.11 describe the three possible orthogonal ori-
entations provided by our three-dimensional space. This means any surface
can be constructed of some combination of these surface elements (or pro-
jections onto these elements). This includes the practical area with which
we started our discussion. To do this, we simply add up (with an integral)
all the elements that compose the area. But what happens to the right side
of the equation? If the area is composed of surface elements, then those ele-
ments are all touching such that they completely fill the area. Many of the
curve elements that enclose each surface element are in contact with curve
elements of other surface elements. For those curve elements, their $\vec{B} \bullet d\vec{\ell}$'s
will all cancel because all of their $d\vec{\ell}$'s will be exactly opposite. This means
only the curve elements not in contact with other curve elements will add to
the integral on the right. These curve elements are simply the ones on the
outside of the area (i.e. we only need to integrate over the outside curve that
encloses the area). Therefore, our general equation becomes
$$\int_A \left(\vec{\nabla} \times \vec{B}\right) \bullet d\vec{A} = \oint \vec{B} \bullet d\vec{\ell}. \tag{3.5.12}$$
We call this the Curl Theorem (or often Stokes' Theorem) and it is true
for any arbitrary area A enclosed by a curve $\ell$.
You may be asking yourself why we didn’t just start with the area of
the entire region from the beginning. Why did we do all this stuff with the
surface elements instead? The answer is simple: We know what the surface
elements look like. We know each one has four sides and that these sides have a
very definite size and shape within the coordinate system. The same cannot
be said about the entire area because it’s completely arbitrary. When we say
arbitrary, we don’t just mean that the system we apply this to could have
any configuration. We mean that, even with a particular system, we can
really choose an area with any shape, size, orientation, or location we wish
and Eq. 3.5.12 still applies.
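The same kind of numerical sanity check works for Eq. 3.5.12. The sketch below is an illustration only: the planar field B = (−y², xy) and the flat rectangle in the xy-plane are arbitrary choices of ours, and the closed curve is traversed counterclockwise to match the right-hand rule convention chosen above.

```python
# Numerical check of the Curl Theorem on the rectangle [0,a] x [0,b] in the
# xy-plane. The field is an arbitrary illustrative choice, not from the text.

def B(x, y):
    """Sample planar field B = (-y^2, x*y)."""
    return (-y * y, x * y)

def curl_z(x, y):
    """z-component of curl(B): d(xy)/dx - d(-y^2)/dy = 3y."""
    return 3.0 * y

def area_integral(a, b, n=50):
    """Midpoint-rule sum of curl(B) . dA over all surface elements."""
    dx, dy = a / n, b / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += curl_z((i + 0.5) * dx, (j + 0.5) * dy) * dx * dy
    return total

def line_integral(a, b, n=50):
    """Midpoint-rule sum of B . dl around the outside curve, counterclockwise."""
    dx, dy = a / n, b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        y = (i + 0.5) * dy
        total += B(x, 0.0)[0] * dx   # bottom edge, traversed in +x
        total += B(a, y)[1] * dy     # right edge, traversed in +y
        total -= B(x, b)[0] * dx     # top edge, traversed in -x
        total -= B(0.0, y)[1] * dy   # left edge, traversed in -y
    return total
```

For a = 2 and b = 1, both sums should come out to the exact value 3ab²/2 = 3. Only the four outside edges enter the line integral, just as the cancellation argument predicts.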
Chapter 4
Lagrangian Mechanics
Figure 4.1: These people were important in the development leading up to Lagrange’s
equation.
was not formally written until 1847 by Hermann von Helmholtz. This would
suggest Lagrange didn’t have much background as to the nature of these
scalar quantities, but we know from his own words that he didn’t mind.
“No diagrams will be found in this work. The methods that
I explain in it require neither constructions nor geometrical or
mechanical arguments, but only the algebraic operations inherent
to a regular and uniform process. Those who love Analysis will,
with joy, see mechanics become a new branch of it and will be
grateful to me for having extended this field.”
In Section 4.2, a derivation is presented using our modern understanding
of these quantities. The intent is to present it in a similar fashion to Lagrange,
yet a little less abstract than I expect Lagrange’s presentation was.
$$\delta W = \vec{F} \bullet \delta\vec{r}. \tag{4.2.2}$$
$$\vec{F} = -\vec{\nabla} V \tag{4.2.3}$$
where $\vec{\nabla}$ is the del operator (defined in Chapter 3). In this particular case,
it is called the gradient which measures the change in the scalar quantity V
through space (i.e. it is a vector derivative with respect to space). With this
substitution, virtual work becomes
$$\delta W = -\vec{\nabla} V \bullet \delta\vec{r}. \tag{4.2.4}$$
where n is the number of degrees of freedom of the system (i.e. the number
of generalized coordinates).
The dot product is simply a sum of the products of the vector components
(defined by Eq. 2.2.2) and the components of the gradient are defined by Eq.
The new summation only has three terms because ~r is a position vector in
3-space. We can now cancel out our original coordinate system leaving us
with
$$\delta W = -\sum_{i=1}^{n} \frac{\partial V}{\partial q_i}\, \delta q_i \tag{4.2.5}$$
and we get
$$\delta W = m\ddot{\vec{r}} \bullet \delta\vec{r}.$$
Again, we can write the dot product as a summation and work becomes
$$\delta W = \sum_{j=1}^{3} m\, \ddot{r}_j\, \delta r_j. \tag{4.2.7}$$
We can now take advantage of the product rule (defined by Eq. 3.1.5) and
that time derivatives commute with spatial derivatives
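$$\begin{aligned}
\frac{d}{dt}\left(\dot{r}_j \frac{\partial r_j}{\partial q_i}\right) &= \ddot{r}_j \frac{\partial r_j}{\partial q_i} + \dot{r}_j \frac{\partial \dot{r}_j}{\partial q_i} \\
\Rightarrow\quad \ddot{r}_j \frac{\partial r_j}{\partial q_i} &= \frac{d}{dt}\left(\dot{r}_j \frac{\partial r_j}{\partial q_i}\right) - \dot{r}_j \frac{\partial \dot{r}_j}{\partial q_i}
\end{aligned}$$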
to change the variable with which we’re differentiating and work becomes
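$$\delta W = \sum_{i=1}^{n} \sum_{j=1}^{3} m \left[\frac{d}{dt}\frac{\partial}{\partial \dot{q}_i}\left(\tfrac{1}{2}\dot{r}_j^2\right) - \frac{\partial}{\partial q_i}\left(\tfrac{1}{2}\dot{r}_j^2\right)\right] \delta q_i.$$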
Bringing the m and the summation over the index j inside the derivatives,
we get
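$$\delta W = \sum_{i=1}^{n} \left[\frac{d}{dt}\frac{\partial}{\partial \dot{q}_i}\left(\sum_{j=1}^{3} \tfrac{1}{2} m \dot{r}_j^2\right) - \frac{\partial}{\partial q_i}\left(\sum_{j=1}^{3} \tfrac{1}{2} m \dot{r}_j^2\right)\right] \delta q_i. \tag{4.2.9}$$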
The summation over j is now simply the kinetic energy, K, of the system.
Applying this definition to Eq. 4.2.9, we get
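$$\delta W = \sum_{i=1}^{n} \left[\frac{d}{dt}\left(\frac{\partial K}{\partial \dot{q}_i}\right) - \frac{\partial K}{\partial q_i}\right] \delta q_i. \tag{4.2.10}$$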
Now we can bring Eqs. 4.2.5 and 4.2.10 together and we get
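$$-\sum_{i=1}^{n} \frac{\partial V}{\partial q_i}\, \delta q_i = \sum_{i=1}^{n} \left[\frac{d}{dt}\left(\frac{\partial K}{\partial \dot{q}_i}\right) - \frac{\partial K}{\partial q_i}\right] \delta q_i$$
$$\Rightarrow\quad \sum_{i=1}^{n} \left[\frac{\partial (K - V)}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial K}{\partial \dot{q}_i}\right)\right] \delta q_i = 0. \tag{4.2.11}$$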
If the potential energy, V , is only a function of position (which it is by
definition), then we know
$$\frac{\partial V}{\partial \dot{q}_i} = 0. \tag{4.2.12}$$
This allows us to do something I like to call voodoo math (with a little
foresight; we can add zeros, multiply by ones, add and subtract constants,
etc. to simplify a mathematical expression) and Eq. 4.2.11 becomes
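$$\sum_{i=1}^{n} \left[\frac{\partial (K - V)}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial (K - V)}{\partial \dot{q}_i}\right)\right] \delta q_i = 0.$$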
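Since the virtual displacements $\delta q_i$ are independent and arbitrary, this
statement can only be true for every set of generalized coordinates if each
bracketed term vanishes on its own, so we have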
$$\frac{\partial (K - V)}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial (K - V)}{\partial \dot{q}_i}\right) = 0. \tag{4.2.13}$$
Note that the virtual displacements have disappeared from our equation,
which is exactly what we needed to happen so this could all make sense.
Eq. 4.2.13 is called Lagrange’s equation, but we can do better. Let’s
define a Lagrangian as L = KE − PE = K − V so that Eq. 4.2.13 can be
written simply as
$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) = 0 \tag{4.2.14}$$
where $q_i$ are the generalized coordinates and $\dot{q}_i$ are the generalized velocities.
The index i indicates there are as many of these equations for your system as
you have generalized coordinates, so you will always have as many equations
as unknowns (i.e. a solvable system). If the generalized coordinate is a linear
distance measure, then Eq. 4.2.14 results in force terms. If the generalized
coordinate is an angle measure, then Eq. 4.2.14 results in torque terms. The
solutions are always the equations of motion for a given system.
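To get a feel for what Eq. 4.2.14 is saying, here is a small Python sketch that evaluates its left-hand side numerically along a known motion. Everything specific in it is our own assumption for illustration: the system (a mass on a spring with L = ½mq̇² − ½kq²), the parameter values, and the use of central finite differences for every derivative.

```python
import math

M, K_SPRING, AMPLITUDE = 1.0, 4.0, 0.3  # illustrative values (assumptions)

def lagrangian(q, q_dot):
    """L = K - V for a mass on a spring."""
    return 0.5 * M * q_dot ** 2 - 0.5 * K_SPRING * q ** 2

def dL_dq(q, q_dot, h=1e-6):
    """Partial derivative of L with respect to q (central difference)."""
    return (lagrangian(q + h, q_dot) - lagrangian(q - h, q_dot)) / (2.0 * h)

def dL_dqdot(q, q_dot, h=1e-6):
    """Partial derivative of L with respect to q_dot (central difference)."""
    return (lagrangian(q, q_dot + h) - lagrangian(q, q_dot - h)) / (2.0 * h)

def trajectory(t):
    """Known solution q(t) = A cos(w t) and its velocity."""
    w = math.sqrt(K_SPRING / M)
    return AMPLITUDE * math.cos(w * t), -AMPLITUDE * w * math.sin(w * t)

def lagrange_residual(t, dt=1e-4):
    """Left-hand side of Eq. 4.2.14 evaluated at time t along the trajectory."""
    q, q_dot = trajectory(t)
    p_after = dL_dqdot(*trajectory(t + dt))
    p_before = dL_dqdot(*trajectory(t - dt))
    return dL_dq(q, q_dot) - (p_after - p_before) / (2.0 * dt)
```

Along the true motion the residual should vanish to within finite-difference error. If you swap in a trajectory that is not a solution, the residual will generally not be zero, which is the sense in which Lagrange's equation picks out the equations of motion.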
The parenthetical quantity is simply the kinetic energy of the whole system
and can still be defined as K. Under this definition, our new virtual work
becomes exactly Eq. 4.2.10 and still ultimately results in Eq. 4.2.14 given
that we define L as the Lagrangian of the whole system of N bodies.
1. Determine the best set of generalized coordinates for the system. There
are an infinite number of these sets, but we can make things easier by
making a good choice. The best choice will have the minimum number
of degrees of freedom for the system.
3. Use the coordinate transformations to write out the potential and kinetic
energy of the system in terms of the generalized coordinates. If you have
multiple bodies in the system, then you can find the total by adding
the corresponding energy from all the bodies together.
Example 4.4.1
A solid ball (mass m and radius R), starting from rest, rolls without slipping
down a platform inclined at an angle φ from the floor.
1. We can define x as the distance the ball has traveled down the incline
and θ as the angle through which the ball has rotated. This would
constitute a set of generalized coordinates. However, the ball has a
constraint that it doesn’t slip on the surface of the incline. Therefore,
x and θ are related by x = Rθ, an equation of constraint. This
means only one of them is required. We’ll choose x.
Figure 4.2: The ball in this figure is rolling without slipping down the platform. The
displacement from the top, x, is labeled as well as the angle of inclination, φ, of the
platform.
4. The Lagrangian is
$$L = K - V = \tfrac{7}{10} m\dot{x}^2 - (-mgx \sin\phi) = \tfrac{7}{10} m\dot{x}^2 + mgx \sin\phi.$$
$$mg \sin\phi - \frac{d}{dt}\left(\tfrac{7}{5} m\dot{x}\right) = mg \sin\phi - \tfrac{7}{5} m\ddot{x} = 0$$
$$\ddot{x} = \frac{5}{7}\, g \sin\phi.$$
This is the acceleration of the ball as it travels down the incline. Under
normal circumstances we would integrate this twice to find the function x(t),
but because the acceleration is constant we already know this will result in
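$$\begin{aligned}
x(t) &= \tfrac{1}{2} a_x t^2 + v_{0x} t + x_0 = \tfrac{1}{2} \ddot{x} t^2 + \dot{x}(0)\, t + x(0) \\
&= \tfrac{1}{2} \ddot{x} t^2 = \tfrac{1}{2}\left(\tfrac{5}{7}\, g \sin\phi\right) t^2
\end{aligned}$$
$$x(t) = \frac{5}{14}\, g \sin\phi\; t^2.$$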
If you want to know how the ball is rotating at a given time, then
$$\theta(t) = \frac{x(t)}{R} = \frac{5 g \sin\phi}{14 R}\, t^2.$$
This is exactly the result you would get via Newton’s laws.
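If you'd rather let a computer do the "integrate twice" step, the following Python sketch steps the acceleration from Example 4.4.1 forward in time and compares it with the closed form for x(t). The value g = 9.81 m/s², the 30° incline, and the function names are assumptions made only for this illustration.

```python
import math

G = 9.81  # m/s^2, assumed value of the gravitational acceleration

def acceleration(phi):
    """Result of Lagrange's equation for the rolling ball: (5/7) g sin(phi)."""
    return 5.0 / 7.0 * G * math.sin(phi)

def simulate(phi, t_end, n_steps=1000):
    """Integrate x'' = const from rest at x = 0; returns (x, x_dot) at t_end."""
    dt = t_end / n_steps
    a = acceleration(phi)
    x, v = 0.0, 0.0
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt  # exact update for constant acceleration
        v += a * dt
    return x, v

def closed_form(phi, t):
    """x(t) = (5/14) g sin(phi) t^2 from the example."""
    return 5.0 / 14.0 * G * math.sin(phi) * t * t

def rotation_angle(phi, t, radius):
    """theta(t) = x(t) / R from the rolling constraint x = R theta."""
    return closed_form(phi, t) / radius
```

Calling `simulate(math.radians(30), 2.0)` and `closed_form(math.radians(30), 2.0)` should give the same distance to within rounding error.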
Example 4.4.2
An object with a mass m is moving within the gravitational influence of the
sun ($M_\odot = 1.99 \times 10^{30}$ kg) such that $m \ll M_\odot$.
1. The position of m is represented by (r, θ) in cylindrical coordinates.
Neither of these coordinates is necessarily constant with the informa-
tion provided. If there is any motion at all, then θ is changing. The
value of r is only constant for a circular orbit and, as close as some of
the planets may get to this, most orbits are not circular. Therefore,
our generalized coordinates, qi , are (r, θ).
2. Based on Figure 4.3, we can write the coordinate transformations as
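$$\begin{aligned} x &= r \cos\theta \\ y &= r \sin\theta \end{aligned}$$
and the first time-derivatives are
$$\begin{aligned} \dot{x} &= \dot{r} \cos\theta - r\dot{\theta} \sin\theta \\ \dot{y} &= \dot{r} \sin\theta + r\dot{\theta} \cos\theta. \end{aligned}$$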
Figure 4.3: The sun has been placed at the origin in the coordinate system for convenience.
The position, $\vec{r}$, is arbitrary and the velocity, $\vec{v}$, is shown for that position. Note that $\vec{v}$ is
not perpendicular to $\vec{r}$ since this is only true for a circular path.
$$V = -G \frac{M_\odot m}{r}$$
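$$\begin{aligned}
K &= \tfrac{1}{2} m v^2 = \tfrac{1}{2} m \left(\dot{x}^2 + \dot{y}^2\right) \\
&= \tfrac{1}{2} m \left[\left(\dot{r} \cos\theta - r\dot{\theta} \sin\theta\right)^2 + \left(\dot{r} \sin\theta + r\dot{\theta} \cos\theta\right)^2\right] \\
&= \tfrac{1}{2} m \left[\dot{r}^2 \cos^2\theta - 2 r\dot{r}\dot{\theta} \sin\theta \cos\theta + r^2\dot{\theta}^2 \sin^2\theta \right. \\
&\qquad\quad \left. + \dot{r}^2 \sin^2\theta + 2 r\dot{r}\dot{\theta} \sin\theta \cos\theta + r^2\dot{\theta}^2 \cos^2\theta\right]
\end{aligned}$$
$$K = \tfrac{1}{2} m \left(\dot{r}^2 + r^2\dot{\theta}^2\right).$$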
4. The Lagrangian is
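$$\begin{aligned}
L &= K - V \\
&= \tfrac{1}{2} m \left(\dot{r}^2 + r^2\dot{\theta}^2\right) - \left(-G \frac{M_\odot m}{r}\right) \\
&= \tfrac{1}{2} m \left(\dot{r}^2 + r^2\dot{\theta}^2\right) + G \frac{M_\odot m}{r}.
\end{aligned}$$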
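$$\left\{\begin{aligned}
mr\dot{\theta}^2 - G \frac{M_\odot m}{r^2} - \frac{d}{dt}\left(m\dot{r}\right) &= 0 \\
0 - \frac{d}{dt}\left(mr^2\dot{\theta}\right) &= 0
\end{aligned}\right\}$$
$$\left\{\begin{aligned}
mr\dot{\theta}^2 - G \frac{M_\odot m}{r^2} - m\ddot{r} &= 0 \\
-\frac{d}{dt}\left(mr^2\dot{\theta}\right) &= 0
\end{aligned}\right\}.$$
$$\left\{\begin{aligned}
\ddot{r} - r\dot{\theta}^2 + \frac{G M_\odot}{r^2} &= 0 \\
\frac{d}{dt}\left(r^2\dot{\theta}\right) &= 0
\end{aligned}\right\}.$$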
Figure 4.4: This is the elliptical orbit of Halley’s comet. It has been scaled to make visible
the entire orbit of Earth at 1 AU. The sun still does not have visible size at this scale.
Note: this diagram does not indicate orientation.
Figure 4.5: This is the elliptical orbit of Halley’s comet. It has been scaled to make the
comet’s entire orbit visible.
Figure 4.6: This is a graph of the radius, r, (distance from the sun) as a function of orbital
angle, θ. It also indicates that the radius has a higher rate of change at π rad (i.e. the
aphelion).
a very straightforward equation. If you don’t have the knack for solving
differential equations yet, don’t worry. Solving them really only comes down
to two things: making good guesses and knowing where you’re going. These
will come to you with experience. Well, now that we’ve got our good guess
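out of the way, where are we going? We would like r to be a function of θ rather
than t because that will help us get an idea of the shape of the object’s path
(maybe an ellipse?). Using $\dot{\theta} = \ell/r^2 = \ell u^2$ and the chain rule, we get a first
derivative substitution of
$$\frac{dr}{dt} = \frac{dr}{d\theta}\frac{d\theta}{dt} = \left(-u^{-2}\frac{du}{d\theta}\right) \ell u^2 = -\ell \frac{du}{d\theta}$$
and a second derivative of
$$\frac{d^2 r}{dt^2} = \frac{d}{dt}\left(\frac{dr}{dt}\right) = \frac{d}{d\theta}\left(\frac{dr}{dt}\right)\frac{d\theta}{dt} = \frac{d}{d\theta}\left(-\ell \frac{du}{d\theta}\right) \ell u^2 = -\ell^2 u^2 \frac{d^2 u}{d\theta^2}.$$
If we make these substitutions in our equation of motion, we get
$$-\ell^2 u^2 \frac{d^2 u}{d\theta^2} - \ell^2 u^3 + G M_\odot u^2 = 0$$
and, dividing through by $-\ell^2 u^2$, we get
$$\begin{aligned}
\frac{d^2 u}{d\theta^2} + u - \frac{G M_\odot}{\ell^2} &= 0 \\
\frac{d^2 u}{d\theta^2} + u &= \frac{G M_\odot}{\ell^2}.
\end{aligned}$$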
This has now become a very typical differential equation that we’ll solve
with another guess. Based on its form, the second derivative of u(θ) must
be proportional to the same function as u(θ). This is only true for cos θ and
sin θ. Normally, we’d write the general solution as a linear combination of
these specific ones, but it may be better in our case to just give the cos θ a
phase angle to accommodate the sin θ. Therefore, our general solution is in
the form of
$$u(\theta) = \frac{G M_\odot}{\ell^2} + A \cos(\theta + \theta_0).$$
Since the phase angle, θ0 , just determines orientation in this example, we can
define it as zero giving us a general solution of
$$u(\theta) = \frac{G M_\odot}{\ell^2} + A \cos\theta$$
$$r(\theta) = \frac{\ell^2 / (G M_\odot)}{1 + B \cos\theta}.$$
This matches the form of the equation for conic sections. If we choose r(0) =
r0 (i.e. the perihelion) and B = e (i.e. the eccentricity), then
$$r(\theta) = \frac{r_0 (1 + e)}{1 + e \cos\theta}, \tag{4.4.2}$$
which includes circles (e = 0), ellipses (0 < e < 1), parabolas (e = 1), and
hyperbolas (e > 1). That makes our result a more generalized statement of
Kepler’s first law of planetary motion. This is exactly what we would expect
if we’re analyzing the motions of bodies in a gravitational field.
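A quick way to test Eq. 4.4.2 is to integrate the equations of motion numerically and see whether the path stays on the conic section. The sketch below is an illustration under several assumptions of our own: units are chosen so that GM = 1, the body starts at perihelion (r = r0, ṙ = 0) with tangential speed v0, the integrator is a standard fourth-order Runge-Kutta scheme, and the eccentricity is taken from matching r(0) = r0 in the solution above, which gives e = ℓ²/(GM r0) − 1.

```python
import math

GM = 1.0  # gravitational parameter in arbitrary units (an assumption)

def derivatives(state, ell):
    """Equations of motion with theta_dot = ell / r^2 (constant ell)."""
    r, r_dot, theta = state
    theta_dot = ell / (r * r)
    r_ddot = r * theta_dot ** 2 - GM / (r * r)
    return (r_dot, r_ddot, theta_dot)

def rk4_step(state, ell, dt):
    """Advance (r, r_dot, theta) by one classical Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = derivatives(state, ell)
    k2 = derivatives(shifted(k1, 0.5 * dt), ell)
    k3 = derivatives(shifted(k2, 0.5 * dt), ell)
    k4 = derivatives(shifted(k3, dt), ell)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def conic_radius(theta, r0, e):
    """Eq. 4.4.2: r = r0 (1 + e) / (1 + e cos(theta))."""
    return r0 * (1.0 + e) / (1.0 + e * math.cos(theta))

def max_deviation(r0=1.0, v0=1.2, dt=1e-3, n_steps=20000):
    """Largest |r_numeric - r_conic| seen while integrating from perihelion."""
    ell = r0 * v0                      # r^2 * theta_dot at perihelion
    e = ell ** 2 / (GM * r0) - 1.0     # eccentricity from r(0) = r0
    state = (r0, 0.0, 0.0)
    worst = 0.0
    for _ in range(n_steps):
        state = rk4_step(state, ell, dt)
        r, theta = state[0], state[2]
        worst = max(worst, abs(r - conic_radius(theta, r0, e)))
    return e, worst
```

With the default values the eccentricity comes out as 0.44 (an ellipse), and the deviation from Eq. 4.4.2 should stay tiny over more than one full orbit.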
Example 4.4.3
A double pendulum is constructed as follows: A rigid string (of negligible
mass) of length L1 connects a mass m1 to a perfectly rigid ceiling. Another
rigid string (of negligible mass) of length L2 connects another mass m2 to
the bottom of m1 .
Figure 4.7: The strings are of constant length L1 and L2 and the pendulum bobs are free
to swing in a two-dimensional plane. The angle for each bob is measured from its respective
vertical.
We can see here that the kinetic energy of the system is going to become
a very long equation, so it might be best to consider the two masses
separately for now and bring them back together later. For m1 , we
have
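$$\begin{aligned}
K_1 &= \tfrac{1}{2} m_1 \left[\left(L_1\dot{\theta}_1 \cos\theta_1\right)^2 + \left(L_1\dot{\theta}_1 \sin\theta_1\right)^2\right] \\
&= \tfrac{1}{2} m_1 \left[L_1^2\dot{\theta}_1^2 \cos^2\theta_1 + L_1^2\dot{\theta}_1^2 \sin^2\theta_1\right]
\end{aligned}$$
and, since $\sin^2\theta + \cos^2\theta = 1$,
$$K_1 = \tfrac{1}{2} m_1 L_1^2 \dot{\theta}_1^2$$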
For m2 , we have
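$$K_2 = \tfrac{1}{2} m_2 \left[\left(L_1\dot{\theta}_1 \cos\theta_1 + L_2\dot{\theta}_2 \cos\theta_2\right)^2 + \left(L_1\dot{\theta}_1 \sin\theta_1 + L_2\dot{\theta}_2 \sin\theta_2\right)^2\right]$$
$$\begin{aligned}
K_2 = \tfrac{1}{2} m_2 &\left[L_1^2\dot{\theta}_1^2 \cos^2\theta_1 + 2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \cos\theta_1 \cos\theta_2 + L_2^2\dot{\theta}_2^2 \cos^2\theta_2 \right. \\
&\left. + L_1^2\dot{\theta}_1^2 \sin^2\theta_1 + 2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \sin\theta_1 \sin\theta_2 + L_2^2\dot{\theta}_2^2 \sin^2\theta_2\right]
\end{aligned}$$
and, since $\sin^2\theta + \cos^2\theta = 1$ and $\cos A \cos B + \sin A \sin B = \cos(A - B)$,
$$K_2 = \tfrac{1}{2} m_2 \left[L_1^2\dot{\theta}_1^2 + 2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \cos(\theta_1 - \theta_2) + L_2^2\dot{\theta}_2^2\right].$$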
Bringing these back together to find the total kinetic energy, we get
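$$\begin{aligned}
K &= \tfrac{1}{2} m_1 L_1^2\dot{\theta}_1^2 + \tfrac{1}{2} m_2 \left[L_1^2\dot{\theta}_1^2 + 2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \cos(\theta_1 - \theta_2) + L_2^2\dot{\theta}_2^2\right] \\
&= \tfrac{1}{2} (m_1 + m_2) L_1^2\dot{\theta}_1^2 + \tfrac{1}{2} m_2 L_2^2\dot{\theta}_2^2 + m_2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \cos(\theta_1 - \theta_2).
\end{aligned}$$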
4. The Lagrangian is
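$$L = K - V$$
$$\begin{aligned}
L = &\ \tfrac{1}{2} (m_1 + m_2) L_1^2\dot{\theta}_1^2 + \tfrac{1}{2} m_2 L_2^2\dot{\theta}_2^2 + m_2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \cos(\theta_1 - \theta_2) \\
&- \left[-(m_1 + m_2) g L_1 \cos\theta_1 - m_2 g L_2 \cos\theta_2\right]
\end{aligned}$$
$$\begin{aligned}
L = &\ \tfrac{1}{2} (m_1 + m_2) L_1^2\dot{\theta}_1^2 + \tfrac{1}{2} m_2 L_2^2\dot{\theta}_2^2 + m_2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \cos(\theta_1 - \theta_2) \\
&+ (m_1 + m_2) g L_1 \cos\theta_1 + m_2 g L_2 \cos\theta_2
\end{aligned}$$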
For clarity, we’ll evaluate each term of each equation separately and
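then state it all together at the end. The terms of the equation for θ1 will
be
$$\frac{\partial L}{\partial \theta_1} = -m_2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \sin(\theta_1 - \theta_2) - (m_1 + m_2) g L_1 \sin\theta_1$$
and
$$-\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\theta}_1}\right) = -\frac{d}{dt}\left[(m_1 + m_2) L_1^2\dot{\theta}_1 + m_2 L_1 L_2 \dot{\theta}_2 \cos(\theta_1 - \theta_2)\right]$$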
If we cancel all like terms and divide through by −L1 , the common
factor to all terms, we get
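$$\frac{\partial L}{\partial \theta_2} = m_2 L_1 L_2 \dot{\theta}_1\dot{\theta}_2 \sin(\theta_1 - \theta_2) - m_2 g L_2 \sin\theta_2$$
and
$$\begin{aligned}
-\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\theta}_2}\right) &= -\frac{d}{dt}\left[m_2 L_2^2\dot{\theta}_2 + m_2 L_1 L_2 \dot{\theta}_1 \cos(\theta_1 - \theta_2)\right] \\
&= -m_2 L_2^2\ddot{\theta}_2 - m_2 L_1 L_2 \ddot{\theta}_1 \cos(\theta_1 - \theta_2) \\
&\quad + m_2 L_1 L_2 \dot{\theta}_1 \sin(\theta_1 - \theta_2)\left[\dot{\theta}_1 - \dot{\theta}_2\right]
\end{aligned}$$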
Figure 4.8: This graph shows both θ1 and θ2 as a function of time. The example is given
for γ = 0.2, L1 = L2 = 1 m, θ1 (0) = π/6, and θ2 (0) = 0.
Figure 4.9: This is a representation of the path each pendulum bob has taken in space
under the time interval given by Figure 4.8. The coordinate transformations have been
used to convert back to x and y.
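With this much algebra, it's worth checking the total kinetic energy of Example 4.4.3 against a direct Cartesian calculation. The Python sketch below does that for arbitrary numbers. It assumes the coordinate transformations x1 = L1 sin θ1, y1 = −L1 cos θ1, x2 = x1 + L2 sin θ2, y2 = y1 − L2 cos θ2 (angles measured from the vertical, as in Figure 4.7), which is consistent with the velocity components that appear in K1 and K2; the function names are ours.

```python
import math

def kinetic_cartesian(m1, m2, L1, L2, th1, th2, w1, w2):
    """K from the bob velocities; w1, w2 are the angular velocities."""
    # Time derivatives of the assumed coordinate transformations
    vx1 = L1 * w1 * math.cos(th1)
    vy1 = L1 * w1 * math.sin(th1)
    vx2 = vx1 + L2 * w2 * math.cos(th2)
    vy2 = vy1 + L2 * w2 * math.sin(th2)
    return 0.5 * m1 * (vx1 ** 2 + vy1 ** 2) + 0.5 * m2 * (vx2 ** 2 + vy2 ** 2)

def kinetic_generalized(m1, m2, L1, L2, th1, th2, w1, w2):
    """K written in generalized coordinates, as derived in the example."""
    return (
        0.5 * (m1 + m2) * L1 ** 2 * w1 ** 2
        + 0.5 * m2 * L2 ** 2 * w2 ** 2
        + m2 * L1 * L2 * w1 * w2 * math.cos(th1 - th2)
    )

def potential(m1, m2, L1, L2, th1, th2, g=9.81):
    """V with the zero of height at the ceiling (g is an assumed value)."""
    return -(m1 + m2) * g * L1 * math.cos(th1) - m2 * g * L2 * math.cos(th2)
```

Any set of angles and angular velocities should give the same kinetic energy from both functions, which is a cheap way to catch a dropped factor of 2 or a sign error before taking derivatives of the Lagrangian.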
The first terms can still be written in terms of potential energy just as be-
fore, but the second term needs some attention. We can use the method
of Lagrange Multipliers to also write the constraint force as a gradient
$$\vec{F} = -\vec{\nabla} V + \lambda \vec{\nabla} f \tag{4.5.2}$$
It turns out all we end up doing is carrying through a new term. Furthermore,
the form of work given by Newton’s second law is unaffected by the change.
Therefore, we get
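$$-\sum_{i=1}^{n} \frac{\partial V}{\partial q_i}\, \delta q_i + \sum_{i=1}^{n} \lambda \frac{\partial f}{\partial q_i}\, \delta q_i = \sum_{i=1}^{n} \left[\frac{d}{dt}\left(\frac{\partial K}{\partial \dot{q}_i}\right) - \frac{\partial K}{\partial q_i}\right] \delta q_i$$
$$\Rightarrow\quad \sum_{i=1}^{n} \left[\frac{\partial (K - V)}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial K}{\partial \dot{q}_i}\right) + \lambda \frac{\partial f}{\partial q_i}\right] \delta q_i = 0.$$
$$\Rightarrow\quad \sum_{i=1}^{n} \left[\frac{\partial (K - V)}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial (K - V)}{\partial \dot{q}_i}\right) + \lambda \frac{\partial f}{\partial q_i}\right] \delta q_i = 0.$$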
Some texts prefer to move the λ term to the other side of the equation
thereby answering our original question of what Lagrange’s equation is equal
to. However, I prefer to write as though it still is equal to zero and think of
λ as the constraint force that makes it zero. By the same processes as before,
Eq. 4.2.14 now takes on the form
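$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) + \lambda \frac{\partial f}{\partial q_i} = 0 \tag{4.5.3}$$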
Example 4.6.1
Returning to Example 4.4.1, find the constraint force causing the ball to roll
without slipping.
1. As before, we will define x as the distance the ball has traveled down
the incline and θ as the angle through which the ball has rotated. The
equation of constraint for this example is x = Rθ or f (x, θ) = x − Rθ =
0. The y-direction may still be eliminated because the ball simply being
constrained to the incline is caused by a different force.
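$$\begin{aligned}
K &= \tfrac{1}{2} m v^2 + \tfrac{1}{2} I \omega^2 = \tfrac{1}{2} m\dot{x}^2 + \tfrac{1}{2} I \dot{\theta}^2 \\
&= \tfrac{1}{2} m\dot{x}^2 + \tfrac{1}{5} m R^2 \dot{\theta}^2
\end{aligned}$$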
4. The Lagrangian is
5. This time when we plug this into Lagrange’s equation, there are two
equations because there are two generalized coordinates. We get
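$$\left\{\begin{aligned}
\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) + \lambda \frac{\partial f}{\partial x} &= 0 \\
\frac{\partial L}{\partial \theta} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\theta}}\right) + \lambda \frac{\partial f}{\partial \theta} &= 0
\end{aligned}\right\}$$
$$\left\{\begin{aligned}
mg \sin\phi - \frac{d}{dt}\left(m\dot{x}\right) + \lambda &= 0 \\
0 - \frac{d}{dt}\left(\tfrac{2}{5} m R^2 \dot{\theta}\right) - \lambda R &= 0
\end{aligned}\right\}$$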
We can take note here that if λ is a force acting on the outside edge
of the ball, then λR is the torque that it causes. When you perform
Lagrange’s equation with respect to a distance, the terms that result
are forces. When you perform Lagrange’s equation with respect to an
angle, the terms that result are torques. Simplifying a bit, we get
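$$\left\{\begin{aligned}
mg \sin\phi - m\ddot{x} + \lambda &= 0 \\
-\tfrac{2}{5} m R \ddot{\theta} - \lambda &= 0
\end{aligned}\right\}$$
We are looking for λ, so let's start with the second equation and eliminate
some of the other unknowns. Solving for $R\ddot{\theta}$, we get
$$\begin{aligned}
\tfrac{2}{5} m R \ddot{\theta} &= -\lambda \\
R\ddot{\theta} &= -\frac{5\lambda}{2m}.
\end{aligned}$$
Since, from the equation of constraint, $x = R\theta \Rightarrow \ddot{x} = R\ddot{\theta}$, we get
$$\ddot{x} = -\frac{5\lambda}{2m}$$
and we can now eliminate $\ddot{x}$ from the first equation in our set resulting in
$$\begin{aligned}
mg \sin\phi - m\left(-\frac{5\lambda}{2m}\right) + \lambda &= 0 \\
mg \sin\phi + \tfrac{5}{2}\lambda + \lambda &= 0 \\
mg \sin\phi + \tfrac{7}{2}\lambda &= 0
\end{aligned}$$
$$\lambda = -\frac{2}{7}\, mg \sin\phi.$$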
This final answer is simply the force of static friction acting on the outside
edge of the ball, which is exactly what we would expect and exactly the result
we would find using Newton’s laws.
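The algebra in Example 4.6.1 is really just a pair of linear equations in the two unknowns ẍ and λ (after using ẍ = Rθ̈), so it makes a nice test case for a computer. The sketch below solves them with Cramer's rule; the numerical values of m, g, and φ are assumptions chosen only for illustration.

```python
import math

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ (x1, x2) = (b1, b2) by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - b1 * a21) / det
    return x1, x2

def rolling_ball(m, g, phi):
    """Return (x_ddot, lam) from the two equations of the example.

    Unknowns are ordered (x_ddot, lam):
        -m       * x_ddot + lam = -m g sin(phi)
        -(2/5) m * x_ddot - lam = 0
    """
    return solve_2x2(
        -m, 1.0,
        -0.4 * m, -1.0,
        -m * g * math.sin(phi), 0.0,
    )
```

For any m, g, and φ, the first returned value should match (5/7) g sin φ from Example 4.4.1 and the second should match λ = −(2/7) mg sin φ, with the negative sign indicating the friction force points up the incline.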
Example 4.6.2
Figure 4.10: This figure shows the motions of the ramp and block as well as their respective
coordinate systems. The coordinate transformations are a way to move (within the math)
between these special coordinate systems and the universal xy system.
1. Based on Figure 4.10, we can see that the positions of the objects in the
system are represented by (xB , yB , xR , yR ). If we were just concerned
with the equations of motion, then (xB , xR ) would be enough since
the wedge is constrained to the horizontal surface and the block is
constrained to the ramp. However, if we want the force constraining the
block to the ramp, then we need to keep $y_B$. Therefore, the generalized
coordinates, $q_i$, are $(x_B, y_B, x_R)$. The equation of constraint will be
$f(x_B, y_B, x_R) = y_B = 0$.
$$\begin{aligned}
x &= x_R + x_B \cos\phi + y_B \sin\phi \\
y &= -x_B \sin\phi + y_B \cos\phi
\end{aligned}$$
We can see here that the kinetic energy of the system is going to become
an extremely long equation, so it might be best to consider the two
objects separately for now and bring them back together later. For the
ramp, we have
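$$K_R = \tfrac{1}{2} m_R \left(\dot{x}_R^2 + 0^2\right) = \tfrac{1}{2} m_R \dot{x}_R^2$$
The kinetic energy of the block is where things get nasty. We get
$$K_B = \tfrac{1}{2} m_B \left[\left(\dot{x}_R + \dot{x}_B \cos\phi + \dot{y}_B \sin\phi\right)^2 + \left(-\dot{x}_B \sin\phi + \dot{y}_B \cos\phi\right)^2\right]$$
$$\begin{aligned}
K_B = \tfrac{1}{2} m_B &\left[\dot{x}_R^2 + 2\dot{x}_R\dot{x}_B \cos\phi + 2\dot{x}_R\dot{y}_B \sin\phi \right. \\
&+ \dot{x}_B^2 \cos^2\phi + 2\dot{x}_B\dot{y}_B \sin\phi \cos\phi + \dot{y}_B^2 \sin^2\phi \\
&\left. + \dot{x}_B^2 \sin^2\phi - 2\dot{x}_B\dot{y}_B \sin\phi \cos\phi + \dot{y}_B^2 \cos^2\phi\right]
\end{aligned}$$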
Bringing these back together to find the total kinetic energy, we get
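$$\begin{aligned}
K &= \tfrac{1}{2} m_R \dot{x}_R^2 + \tfrac{1}{2} m_B \left[\dot{x}_R^2 + 2\dot{x}_R\dot{x}_B \cos\phi + 2\dot{x}_R\dot{y}_B \sin\phi + \dot{x}_B^2 + \dot{y}_B^2\right] \\
&= \tfrac{1}{2} m_B \left[2\dot{x}_R\dot{x}_B \cos\phi + 2\dot{x}_R\dot{y}_B \sin\phi + \dot{x}_B^2 + \dot{y}_B^2\right] + \tfrac{1}{2} (m_R + m_B) \dot{x}_R^2.
\end{aligned}$$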
4. The Lagrangian is
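$$\begin{aligned}
L &= K - V \\
&= \tfrac{1}{2} m_B \left[2\dot{x}_R\dot{x}_B \cos\phi + 2\dot{x}_R\dot{y}_B \sin\phi + \dot{x}_B^2 + \dot{y}_B^2\right] \\
&\quad + \tfrac{1}{2} (m_R + m_B) \dot{x}_R^2 - \left[-m_B g \left(x_B \sin\phi - y_B \cos\phi\right)\right] \\
&= \tfrac{1}{2} m_B \left[2\dot{x}_R\dot{x}_B \cos\phi + 2\dot{x}_R\dot{y}_B \sin\phi + \dot{x}_B^2 + \dot{y}_B^2\right] \\
&\quad + \tfrac{1}{2} (m_R + m_B) \dot{x}_R^2 + m_B g \left(x_B \sin\phi - y_B \cos\phi\right)
\end{aligned}$$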
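$$\left\{\begin{aligned}
0 - \frac{d}{dt}\left[(m_R + m_B)\dot{x}_R + m_B\left(\dot{x}_B \cos\phi + \dot{y}_B \sin\phi\right)\right] + 0 &= 0 \\
m_B g \sin\phi - \frac{d}{dt}\left(m_B \dot{x}_R \cos\phi + m_B \dot{x}_B\right) + 0 &= 0 \\
-m_B g \cos\phi - \frac{d}{dt}\left(m_B \dot{x}_R \sin\phi + m_B \dot{y}_B\right) + \lambda &= 0
\end{aligned}\right\}.$$
$$\left\{\begin{aligned}
-(m_R + m_B)\ddot{x}_R - m_B \ddot{x}_B \cos\phi - m_B \ddot{y}_B \sin\phi &= 0 \\
m_B g \sin\phi - m_B \ddot{x}_R \cos\phi - m_B \ddot{x}_B &= 0 \\
-m_B g \cos\phi - m_B \ddot{x}_R \sin\phi - m_B \ddot{y}_B + \lambda &= 0
\end{aligned}\right\}.$$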
All is well in the world again now that yB is gone. Carrying yB through the
problem has resulted in an extra equation and the currently unknown con-
straint force, λ. With a little algebra, we should be able to find it. Logically,
if we want λ, then we need to start with the third equation. That means
but we need ẍR . The first equation contains this, but we’ll need ẍB . The
second equation contains ẍB resulting in
$$\lambda = m_B g \left[\cos\phi - \frac{\gamma \sin^2\phi \cos\phi}{1 - \gamma \cos^2\phi}\right].$$
It may not be clear which force this is, so let’s simplify a bit to get a feel
for it. If we wanted to fix the ramp in place, then we would need to alter
one of the quantities in λ. The easiest way to do this is to make the mass of
the ramp very large so it remains nearly (inertially) unaffected by the block.
This would result in γ = 0 and λ = mB g cos φ. This force is just the normal
force acting on the block due to the ramp. Therefore, our λ above is just the
normal force due to a ramp that moves.
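Since the three equations of motion in Example 4.6.2 are linear in ẍR, ẍB, and λ once we set ÿB = 0, we can hand them to a small linear solver and compare with the closed form for λ. In the Python sketch below, the Gaussian elimination routine, the parameter values, and the identification γ = mB/(mR + mB) are our assumptions; that choice of γ is the one consistent with γ → 0 for a very heavy ramp.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - partial) / M[r][r]
    return x

def ramp_and_block(m_R, m_B, phi, g=9.81):
    """Return (xR_ddot, xB_ddot, lam) from the equations with yB_ddot = 0."""
    s, c = math.sin(phi), math.cos(phi)
    A = [
        [-(m_R + m_B), -m_B * c, 0.0],
        [-m_B * c, -m_B, 0.0],
        [-m_B * s, 0.0, 1.0],
    ]
    b = [0.0, -m_B * g * s, m_B * g * c]
    return tuple(solve_linear(A, b))

def lam_closed_form(m_R, m_B, phi, g=9.81):
    """Closed form for lam, assuming gamma = m_B / (m_R + m_B)."""
    gamma = m_B / (m_R + m_B)
    s, c = math.sin(phi), math.cos(phi)
    return m_B * g * (c - gamma * s * s * c / (1.0 - gamma * c * c))
```

The third value returned by `ramp_and_block` should agree with `lam_closed_form`, and for a very large ramp mass both should approach the familiar fixed-incline normal force mB g cos φ.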
Starting from Eq. 4.7.1, we can see that Eq. 4.5.3 becomes
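$$\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) + \lambda \frac{\partial f}{\partial q_i} + Q_i = 0 \tag{4.7.2}$$
where
$$Q_i = \sum_{j=1}^{3} F_j \frac{\partial r_j}{\partial q_i}$$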
are the generalized forces that include all the non-conservative forces in-
volved in the system transformed to the set of generalized coordinates.
Sometimes generalized forces can be written in terms of a velocity (q̇i )
dependent potential energy. If this is the case, then they will become part
of the Lagrangian and merge with the first two terms in Eq. 4.7.2. However,
when dealing with non-conservative forces, it is usually best to concede to
Newton’s laws of motion for practical purposes.
Chapter 5
Electrodynamics
5.1 Introduction
The concepts of electricity and magnetism have been studied since Ancient
Greece. In fact, there are records indicating Thales of Miletus was rubbing
fur on amber around 600 BCE to generate an attractive force. The Ancient
Greeks also had lodestone, a naturally occurring magnet made of a mineral
now called magnetite. They came up with a wide variety of hypotheses,
but very little progress was made in understanding why these phenomena
occur. Scientific studies today are conducted using the scientific method, a
rigorous process backed by experimental confirmation. In the middle-to-late
19th century, it had become clear that classical mechanics (and, therefore,
the Lagrangian mechanics of Chapter 4) was not sufficient to fully describe
these phenomena and that another form of mechanics would be required to
explain them.
Coulomb’s Law
In 1784, Charles Coulomb was studying the effects of charged objects and
their influence on one another. He published a relationship that governed
the force exerted by one charged object on another. It had the form
$$\vec{F}_E = k_E \frac{q_1 q_2}{r^2}\, \hat{r}, \tag{5.2.1}$$
where q1 and q2 are the charges of the two objects, r is the distance between
their centers, and $k_E$ is a constant of proportionality with a value of
$8.988 \times 10^9$ N·m²/C². We call this Coulomb’s law. This relationship is referred to
as an inverse square law and, as you can see, bears a striking resemblance to
Newton’s universal law of gravitation,
$$\vec{F}_g = -G \frac{m_1 m_2}{r^2}\, \hat{r}, \tag{5.2.2}$$
published by Newton about a century before. The simple appearance of Eqs.
5.2.1 and 5.2.2 is very useful when trying to understand the relationships
between quantities. It is sometimes more useful in practical situations to
write Eq. 5.2.1 in terms of position vectors,
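$$\left\{\begin{aligned}
\vec{F}_{E12} &= k_E \frac{q_1 q_2}{|\vec{r}_1 - \vec{r}_2|^3} \left(\vec{r}_1 - \vec{r}_2\right) \\
\vec{F}_{E21} &= k_E \frac{q_1 q_2}{|\vec{r}_2 - \vec{r}_1|^3} \left(\vec{r}_2 - \vec{r}_1\right)
\end{aligned}\right\}, \tag{5.2.3}$$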
where $\vec{r}_1$ and $\vec{r}_2$ are the positions of $q_1$ and $q_2$, respectively. We have used
$$\hat{r} = \frac{\vec{r}}{r} \tag{5.2.4}$$
to eliminate the unit vector r̂. The subscript of 12 indicates this is the force
on q1 due to q2 and 21 the reverse. However, these equations lack the elegance
found in Eq. 5.2.1.
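The position-vector form in Eq. 5.2.3 may be less elegant, but it translates almost directly into code. The short Python sketch below is an illustration only; the function name and the sample charges and positions are our own choices.

```python
import math

K_E = 8.988e9  # N m^2 / C^2, the constant quoted with Eq. 5.2.1

def coulomb_force(q1, r1, q2, r2):
    """Force on q1 (at position r1) due to q2 (at position r2), per Eq. 5.2.3."""
    separation = [a - b for a, b in zip(r1, r2)]       # r1 - r2
    distance = math.sqrt(sum(c * c for c in separation))
    scale = K_E * q1 * q2 / distance ** 3
    return [scale * c for c in separation]

# Two 1 microcoulomb charges placed 1 m apart along the x-axis
F12 = coulomb_force(1e-6, (0.0, 0.0, 0.0), 1e-6, (1.0, 0.0, 0.0))
F21 = coulomb_force(1e-6, (1.0, 0.0, 0.0), 1e-6, (0.0, 0.0, 0.0))
```

Because the charges have the same sign, the force on the first charge points in the −x direction (away from the second), and swapping the roles of the two charges just flips the sign of every component, as Newton's third law requires.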
Another limitation of both Eqs. 5.2.1 and 5.2.3 is that they only apply
when the objects in question can be approximated as nearly stationary point
charges. Furthermore, situations can arise where we may not know much
about some of the charge involved due to system complexity. It is astronom-
ically more useful to define a quantity known as a field. In this case, we’d
call it an electric field (abbreviated as E-field). This field is a representation
of how electric charge affects the surrounding space. Essentially, we’re cre-
ating a mathematical middle-man. I realize, at first glance, it might seem
more complicated to consider an entirely new quantity, but this E-field has
incredible power (pardon the pun). We can determine the E-field around a
charged object, whatever the shape, and then forget about that object when
predicting its effect on a new charge in the region, as long as this new charge
is small compared to the original so as to not affect its E-field. We can also
measure the E-field in a region while never considering its source.
Starting with Eq. 5.2.1, we can write the basic definition of an E-field as
$$\vec{E} = k_E \frac{q}{r^2}\, \hat{r}, \tag{5.2.5}$$
where q is the charge generating the E-field. The electric force on a new
charge, $q_0$, is then just $\vec{F}_E = q_0 \vec{E}$. Based on this, we can also conceptualize
an E-field as a measure of one charge’s ability to exert a force on another
charge. Again, however, Eq. 5.2.5 still only applies to charges which are
approximately points.
To find an E-field due to a charge distribution, we can write Eq. 5.2.5 as
$$d\vec{E} = k_E \frac{dq}{r^2}\, \hat{r}, \tag{5.2.6}$$
where dq represents an infinitesimal portion of the charge distribution (i.e.
charge element) dependent on $\vec{r}$ (i.e. both r and $\hat{r}$) and $d\vec{E}$ is the E-field
element generated by dq. The value of r now represents the distance from
dq to the point in space that is of interest. Nothing need be at that point,
however, because we’re only discussing how the charge distribution affects
space itself. The total field can be found through superposition by integration
(which is just a sum of an infinite number of infinitesimally small terms).
Figure 5.2: This diagram shows all the quantities used in Eqs. 5.2.6 and 5.2.7 in an
arbitrary coordinate system. We can see clearly here that $\vec{r} = \vec{r}_p - \vec{r}_q$ because $\vec{r}_p = \vec{r}_q + \vec{r}$.
$$d\vec{E} = k_E \frac{dq}{|\vec{r}_p - \vec{r}_q|^3} \left(\vec{r}_p - \vec{r}_q\right) \tag{5.2.7}$$
where $\vec{r}_p$ is the position of the point in space and $\vec{r}_q$ is the position of dq.
We have used Eq. 5.2.4 to eliminate the unit vector. Once again, we lose
elegance, but gain practical usefulness.
Just as with problems in Chapter 4, there is a methodical process for
solving problems like this.
1. Choose an arbitrary dq and find its value in terms of some spatial variable(s).
If you’ve positioned your coordinate system wisely, this should
look relatively simple.
2. Find $\vec{r}_p$ and $\vec{r}_q$ for the system. This shouldn’t be too difficult if you’ve
drawn a good picture with the proper labels similar to Figure 5.2.
3. Find $\vec{r} = \vec{r}_p - \vec{r}_q$ and $r = |\vec{r}_p - \vec{r}_q|$. This takes the guesswork out of
finding $\vec{r}$.
4. Substitute from the previous step into Eq. 5.2.7 and separate into vector
component terms. In order to do the next step, these vector components
should have constant directions. The Cartesian coordinate system is a
common choice.
5. Integrate over whatever variable(s) dq is dependent on. Depending on
the charge distribution this could be 1, 2, or 3 spatial variables.
Example 5.2.1
Find the electric field at an arbitrary point p in the space around a uniformly
charged amber rod.
1. Based on the coordinate system chosen in Figure 5.3, we have charge
distributed uniformly along the x-axis. Therefore,
$$\lambda = \frac{dq}{dx_q} = \text{constant} \quad\Rightarrow\quad dq = \lambda\, dx_q$$
where λ is the linear charge density. It is constant because the dis-
tribution is uniform. Uniformity is not a requirement in general, but
a different distribution would certainly make the rest of this example
rather complicated.
2. The point p chosen is arbitrary, but will remain constant through the
following derivation because we’re integrating along dq (i.e. the rod).
The position of dq is also arbitrary so that we don’t make any premature
judgements about the form of $\vec{r}_q$. Figure 5.3 shows the two position
vectors to clearly be
$$\vec{r}_p = x_p \hat{x} + y_p \hat{y}$$
and
$$\vec{r}_q = x_q \hat{x}$$
Figure 5.3: The amber rod is placed along the x-axis and all vectors from Eq. 5.2.7 are
shown.
which means r is
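\[ |\vec{r}_p - \vec{r}_q| = \sqrt{(x_p - x_q)^2 + y_p^2} = \left[(x_p - x_q)^2 + y_p^2\right]^{1/2}. \]
\[ d\vec{E} = k_E\,\frac{\lambda\, dx_q}{\left[(x_p - x_q)^2 + y_p^2\right]^{3/2}}\left[(x_p - x_q)\,\hat{x} + y_p\,\hat{y}\right] \]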
5. If we want the total E-field due to the amber rod, we must integrate
over all possible dq’s. Using $\vec{E} = \int_0^{\vec{E}} d\vec{E}$, we get
\[ \vec{E} = k_E\lambda\,\hat{x}\int_{-a}^{a}\frac{(x_p - x_q)\,dx_q}{\left[(x_p - x_q)^2 + y_p^2\right]^{3/2}} + k_E\lambda\,\hat{y}\int_{-a}^{a}\frac{y_p\,dx_q}{\left[(x_p - x_q)^2 + y_p^2\right]^{3/2}}. \]
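From this point on, it will be a bit more clear if we discuss the components
separately. If we define $\vec{E}$ as $E_x\,\hat{x} + E_y\,\hat{y}$, then
\begin{align*}
E_x &= k_E\lambda\int_{-a}^{a}\frac{(x_p - x_q)\,dx_q}{\left[(x_p - x_q)^2 + y_p^2\right]^{3/2}} \\
E_y &= k_E\lambda\int_{-a}^{a}\frac{y_p\,dx_q}{\left[(x_p - x_q)^2 + y_p^2\right]^{3/2}}.
\end{align*}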
\[ u = (x_p - x_q)^2 + y_p^2 \quad\Rightarrow\quad (x_p - x_q)\,dx_q = -\frac{1}{2}\,du. \]
This results in an x-component of
\[ E_x = k_E\lambda\int_{u_1}^{u_2}\frac{-1/2}{u^{3/2}}\,du = -\frac{1}{2}\,k_E\lambda\int_{u_1}^{u_2}u^{-3/2}\,du \]
\[ E_x = -\frac{1}{2}\,k_E\lambda\left[\frac{u^{-1/2}}{-1/2}\right]_{u_1}^{u_2} = k_E\lambda\left[\frac{1}{u^{1/2}}\right]_{u_1}^{u_2}. \]
Figure 5.4: This reference triangle is used to transform the integrand of the y-component
in Example 5.2.1. The side opposite the angle θ is labeled $x_q - x_p$ rather than $x_p - x_q$ to
eliminate a negative sign from the transformation. This is mathematically legal because
$(x_q - x_p)^2 = (x_p - x_q)^2$.
\[ E_x = k_E\lambda\left[\frac{1}{\sqrt{(x_p - a)^2 + y_p^2}} - \frac{1}{\sqrt{(x_p + a)^2 + y_p^2}}\right]. \]
\[ \frac{x_q - x_p}{y_p} = \tan\theta \quad\Rightarrow\quad x_q = y_p\tan\theta + x_p \]
arrive at
\[ E_y = \frac{k_E\lambda}{y_p^2}\int_{-a}^{a}\frac{y_p^3}{\left[(x_p - x_q)^2 + y_p^2\right]^{3/2}}\,dx_q. \]
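Suddenly, with another look at Figure 5.4, our integrand simply becomes
$\cos^3\theta$ and $E_y$ becomes
\[ E_y = \frac{k_E\lambda}{y_p^2}\int_{\theta_1}^{\theta_2}\cos^3\theta\; y_p\sec^2\theta\,d\theta \]
\[ E_y = \frac{k_E\lambda}{y_p}\int_{\theta_1}^{\theta_2}\cos\theta\,d\theta = \frac{k_E\lambda}{y_p}\Big[\sin\theta\Big]_{\theta_1}^{\theta_2}. \]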
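\[ E_y = \frac{k_E\lambda}{y_p}\left[\frac{-(x_p - a)}{\sqrt{(x_p - a)^2 + y_p^2}} - \frac{-(x_p + a)}{\sqrt{(x_p + a)^2 + y_p^2}}\right] \]
\[ E_y = \frac{k_E\lambda}{y_p}\left[\frac{x_p + a}{\sqrt{(x_p + a)^2 + y_p^2}} - \frac{x_p - a}{\sqrt{(x_p - a)^2 + y_p^2}}\right] \]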
In summary, we can write the electric field as $\vec{E} = E_x\,\hat{x} + E_y\,\hat{y}$ where the
components are
\begin{align*}
E_x &= k_E\lambda\left[\frac{1}{\sqrt{(x_p - a)^2 + y_p^2}} - \frac{1}{\sqrt{(x_p + a)^2 + y_p^2}}\right] \\
E_y &= \frac{k_E\lambda}{y_p}\left[\frac{x_p + a}{\sqrt{(x_p + a)^2 + y_p^2}} - \frac{x_p - a}{\sqrt{(x_p - a)^2 + y_p^2}}\right].
\end{align*}
This result, because the point p was completely arbitrary, applies to all space
around the rod. With that in mind, we can simplify things by dropping the
p subscript for future uses. The components become
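\begin{align*}
E_x &= k_E\lambda\left[\frac{1}{\sqrt{(x - a)^2 + y^2}} - \frac{1}{\sqrt{(x + a)^2 + y^2}}\right] \\
E_y &= \frac{k_E\lambda}{y}\left[\frac{x + a}{\sqrt{(x + a)^2 + y^2}} - \frac{x - a}{\sqrt{(x - a)^2 + y^2}}\right].
\end{align*}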
This is slightly different from the standard definition only because of the
orientation of the rod along the x-axis. In the xy-plane, the z-direction is
equivalent to the φ-direction. We now have
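\begin{align*}
E_x &= k_E\lambda\left[\frac{1}{\sqrt{(x - a)^2 + s^2}} - \frac{1}{\sqrt{(x + a)^2 + s^2}}\right] \\
E_s &= \frac{k_E\lambda}{s}\left[\frac{x + a}{\sqrt{(x + a)^2 + s^2}} - \frac{x - a}{\sqrt{(x - a)^2 + s^2}}\right] \tag{5.2.8} \\
E_\phi &= 0
\end{align*}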
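As a sanity check on Eq. 5.2.8, the five-step process can also be carried out numerically. The following is only a sketch under stated assumptions: the value of k_E is assumed (it is not quoted in this section), the function names are hypothetical, and the analytic integration of step 5 is swapped for a simple midpoint rule. If the closed form is right, the two results should agree closely.

```python
import math

# Assumed value of the Coulomb constant in N m^2 / C^2 (not quoted in this section).
K_E = 8.9875517923e9

def rod_field_numeric(x, s, a, lam, n=20000):
    """Midpoint-rule version of step 5: sum dE over the rod from -a to +a."""
    dx = 2.0 * a / n
    ex = 0.0
    es = 0.0
    for i in range(n):
        xq = -a + (i + 0.5) * dx           # position of this piece of charge
        r3 = ((x - xq) ** 2 + s ** 2) ** 1.5
        ex += (x - xq) / r3 * dx           # component along the rod
        es += s / r3 * dx                  # component away from the rod
    return K_E * lam * ex, K_E * lam * es

def rod_field_closed(x, s, a, lam):
    """Closed-form components written as in Eq. 5.2.8."""
    r_minus = math.hypot(x - a, s)
    r_plus = math.hypot(x + a, s)
    ex = K_E * lam * (1.0 / r_minus - 1.0 / r_plus)
    es = K_E * lam / s * ((x + a) / r_plus - (x - a) / r_minus)
    return ex, es

if __name__ == "__main__":
    # Hypothetical test point: 1 nC/m rod of half-length 0.5 m.
    print(rod_field_numeric(0.3, 0.2, 0.5, 1e-9))
    print(rod_field_closed(0.3, 0.2, 0.5, 1e-9))
```

The midpoint rule is used only because it needs nothing beyond the standard library; any quadrature routine would serve the same purpose. The sketch breaks down for points on the rod itself (s = 0 with |x| < a), where the integrand is singular.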
Figure 5.5: These people were important in the development of the Biot-Savart law.
Biot-Savart Law
A somewhat similar relationship to Eq. 5.2.6 was discovered for magnetic
fields, but it wouldn’t arrive for almost another 40 years. Together in 1820,
Jean-Baptiste Biot and Félix Savart announced they had discovered the mag-
netic force due to a current carrying conductor was proportional to 1/R and
this force was perpendicular to the wire. This wasn’t much of a result, but
it was a start.
A mathematician named Pierre-Simon Laplace very quickly generalized
this result in terms of a magnetic field $\vec{B}$, much like the electric field. Laplace’s
equation looked something like
\[ d\vec{B} = k_M\,\frac{I\,d\vec{l}\times\hat{r}}{r^2}, \tag{5.2.9} \]
where I is a steady electric current generating the B-field, $d\vec{l}$ is the infinitesimal
section of the conductor in the direction of the current, r is the distance
between $I\,d\vec{l}$ and the point in space being examined, $\hat{r}$ is the unit vector in
the direction of $\vec{r}$, and $k_M$ is a constant of proportionality with a value of
$1.0 \times 10^{-7}$ N/A$^2$. This is what we now call the Biot-Savart law. The cross
product in Eq. 5.2.9 indicates that $d\vec{B}$ is perpendicular to both $I\,d\vec{l}$ and $\hat{r}$,
making it consistent with Biot and Savart’s result. The vector sign is usually
placed on the dl rather than I to emphasize that the current is steady, but it
can really be placed on either.
We can generalize Eq. 5.2.9 much like we did with Eq. 5.2.7 resulting in
Figure 5.6: This diagram shows all the quantities used in Eq. 5.2.9 as well as the quantity
R defined by Biot and Savart’s discovery. The quantity $d\vec{B}$ is indicated as perpendicular
to $\hat{r}$. It is also tangent to the dashed circle indicating it is also perpendicular to $d\vec{l}$.
where ~rp is the position of the point in space and ~rI is the position of Id~l.
Again, we have used Eq. 5.2.4 to eliminate the unit vector just like we did
with Coulomb’s law. The methodical process for solving problems with the
Biot-Savart law is similar to that of Coulomb’s law.
1. Choose an arbitrary $I\,d\vec{l}$ and find its value in terms of some spatial variable(s).
If you’ve positioned your coordinate system wisely, this should
look relatively simple.
2. Find ~rp and ~rI for the system. This shouldn’t be too difficult if you’ve
drawn a good picture with the proper labels similar to Figure 5.6.
3. Find ~r = ~rp − ~rI and r = |~rp − ~rI |. This takes the guesswork out of
finding ~r.
4. Perform the cross product given by Id~l × (~rp − ~rI ). This will save you
writing time and keep things clear in your solution.
5. Substitute from the previous step into Eq. 5.2.10 and separate into vector
component terms. In order to do the next step, these vector components
should have constant directions.
6. Integrate over whatever variable $I\,d\vec{l}$ is dependent on. The form given
in Eq. 5.2.10 is over a single variable, but it can be generalized to more.
The quantity $I\,d\vec{l}$ is simply replaced by $\vec{K}\,dA_I$ (two variables) or $\vec{J}\,dV_I$
(three variables) depending on the type of electric current distribution.
Example 5.2.2
A circular conductor with a radius of R is carrying a steady current I. Find
the magnetic field at an arbitrary point p around this loop.
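\[ I\,d\vec{l} = IR\,d\phi_I\,\hat{\phi}_I = IR\,d\phi_I\left(-\sin\phi_I\,\hat{x} + \cos\phi_I\,\hat{y}\right) \]
\[ \vec{r}_p = x_p\,\hat{x} + y_p\,\hat{y} + z_p\,\hat{z} = x_p\,\hat{x} + z_p\,\hat{z}. \]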
The position of Id~l is also arbitrary so that we don’t make any prema-
ture judgements about the form of ~rI . Figure 5.7 shows
Figure 5.7: The conducting loop is placed in the xy-plane centered at the origin and all
vectors from Eq. 5.2.10 are shown.
This means r is
\[ |\vec{r}_p - \vec{r}_I| = R\sqrt{(x_R - \cos\phi_I)^2 + (-\sin\phi_I)^2 + z_R^2} \]
\[ |\vec{r}_p - \vec{r}_I| = R\sqrt{x_R^2 - 2x_R\cos\phi_I + \cos^2\phi_I + \sin^2\phi_I + z_R^2}. \]
\[ |\vec{r}_p - \vec{r}_I| = R\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{1/2} \]
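\[ = IR^2\,d\phi_I\,\det\begin{pmatrix} \hat{x} & \hat{y} & \hat{z} \\ -\sin\phi_I & \cos\phi_I & 0 \\ (x_R - \cos\phi_I) & -\sin\phi_I & z_R \end{pmatrix} \]
\[ d\vec{B} = \frac{k_M I}{R}\left[\frac{z_R\cos\phi_I\,\hat{x} + z_R\sin\phi_I\,\hat{y} + (1 - x_R\cos\phi_I)\,\hat{z}}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}}\right]d\phi_I. \]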
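From this point on, it will be a bit more clear if we discuss the components
separately. If we define $d\vec{B}$ as $dB_x\,\hat{x} + dB_y\,\hat{y} + dB_z\,\hat{z}$, then
\begin{align*}
dB_x &= \frac{k_M I}{R}\left[\frac{z_R\cos\phi_I\,d\phi_I}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}}\right] \\
dB_y &= \frac{k_M I}{R}\left[\frac{z_R\sin\phi_I\,d\phi_I}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}}\right] \\
dB_z &= \frac{k_M I}{R}\left[\frac{(1 - x_R\cos\phi_I)\,d\phi_I}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}}\right].
\end{align*}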
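6. If we want the total B-field due to the conducting loop, we must integrate
over all possible $d\phi_I$’s. Using $\vec{B} = \int_0^{\vec{B}} d\vec{B}$, we get
\begin{align*}
B_x &= \frac{k_M I}{R}\int_0^{2\pi}\frac{z_R\cos\phi_I\,d\phi_I}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}} \\
B_y &= \frac{k_M I}{R}\int_0^{2\pi}\frac{z_R\sin\phi_I\,d\phi_I}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}} \\
B_z &= \frac{k_M I}{R}\int_0^{2\pi}\frac{(1 - x_R\cos\phi_I)\,d\phi_I}{\left(x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I\right)^{3/2}}
\end{align*}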
where our variable of integration is φI . We cannot replace φI with φ
because φ ≡ φp . The variables φI and φp are two very different things,
so be careful.
Unfortunately, Bx and Bz require numerical integration. However, we can
evaluate By using a change of variable (something the mathematicians like to
call a u-substitution). Choosing how to define the new variable is a bit of an
art, but the desired result is always the same: make the integrand as simple
as possible. This is done by choosing a definition for the new variable that
is as complex as possible such that all forms of the old variable can vanish.
In this case,
\[ u = x_R^2 + z_R^2 + 1 - 2x_R\cos\phi_I \]
would be the best choice. The first derivative of this is
\[ \frac{du}{d\phi_I} = 2x_R\sin\phi_I \quad\Rightarrow\quad \frac{du}{2x_R} = \sin\phi_I\,d\phi_I. \]
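Since the x- and z-component integrals call for numerical integration, a minimal sketch of how that might be done is given here. It is an illustration rather than part of the original solution: the function name and the choice of a midpoint rule are assumptions, and the point p is taken in the xz-plane as in the example. The same routine also evaluates the y-component, which should come out as zero (to rounding) because its integrand is odd about $\phi_I = \pi$.

```python
import math

K_M = 1.0e-7  # magnetic constant in N/A^2, the value quoted with Eq. 5.2.9

def loop_field(xp, zp, radius, current, n=2000):
    """Midpoint-rule evaluation of the three component integrals for a loop
    of the given radius in the xy-plane, at the point (xp, 0, zp)."""
    x_r = xp / radius                      # unitless x_R = x_p / R
    z_r = zp / radius                      # unitless z_R = z_p / R
    dphi = 2.0 * math.pi / n
    bx = by = bz = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        denom = (x_r ** 2 + z_r ** 2 + 1.0 - 2.0 * x_r * math.cos(phi)) ** 1.5
        bx += z_r * math.cos(phi) / denom * dphi
        by += z_r * math.sin(phi) / denom * dphi
        bz += (1.0 - x_r * math.cos(phi)) / denom * dphi
    prefactor = K_M * current / radius
    return prefactor * bx, prefactor * by, prefactor * bz

if __name__ == "__main__":
    # Hypothetical numbers: 10 cm loop carrying 2 A, point off the axis.
    print(loop_field(0.03, 0.05, 0.1, 2.0))
```

The routine fails for a point on the wire itself ($x_R = 1$, $z_R = 0$), where the denominator vanishes. On the axis ($x_p = 0$) the z-component reduces to $2\pi k_M I / [R(z_R^2+1)^{3/2}]$, which gives a convenient check.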
Example 5.2.3
A Helmholtz coil is constructed of two circular coils of radius R separated
by a distance R, each having N loops of wire. The magnetic field it produces
is extremely uniform in between the two coils. To justify this statement,
show that a separation of R results in the most uniform field and sketch the
magnetic field.
• We’ll start with Eq. 5.2.11 to save some time. We can set the point
p along the z-axis for simplicity since it will be sufficient to show the
uniformity along the axis. Under this assumption, the s-component is
\[ B_s = \frac{k_M I}{R}\int_0^{2\pi}\frac{z_R\cos\phi_I\,d\phi_I}{\left(z_R^2 + 1\right)^{3/2}} \]
\[ B_s = \frac{k_M I}{R}\,\frac{z_R}{\left(z_R^2 + 1\right)^{3/2}}\int_0^{2\pi}\cos\phi_I\,d\phi_I = 0. \]
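Therefore, the B-field only has a z-component along the z-axis (i.e.
$\vec{B} = B_z\,\hat{z}$). The field is now
\[ \vec{B} = \frac{k_M I}{R}\int_0^{2\pi}\frac{d\phi_I}{\left(z_R^2 + 1\right)^{3/2}}\,\hat{z} \]
\[ \vec{B} = \frac{k_M I}{R\left(z_R^2 + 1\right)^{3/2}}\,\hat{z}\int_0^{2\pi}d\phi_I \]
\[ \vec{B} = \frac{2\pi k_M I}{R\left(z_R^2 + 1\right)^{3/2}}\,\hat{z} \]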
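• A coil is simply like having N loops in one place. Therefore, the field
is
\[ \vec{B} = \frac{2\pi k_M N I}{R\left(z_R^2 + 1\right)^{3/2}}\,\hat{z} \]
\[ \vec{B} = \frac{2\pi k_M N I}{R\left[(z + a)^2 + 1\right]^{3/2}}\,\hat{z} + \frac{2\pi k_M N I}{R\left[(z - a)^2 + 1\right]^{3/2}}\,\hat{z} \]
\[ \vec{B} = \frac{2\pi k_M N I}{R}\left[\frac{1}{\left[(z + a)^2 + 1\right]^{3/2}} + \frac{1}{\left[(z - a)^2 + 1\right]^{3/2}}\right]\hat{z}. \]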
Figure 5.8: This is a two-coil system in which the coils of radius R are separated by a
distance of 2a. If 2a = R, then this system is called a Helmholtz coil. The coordinate
system used for Eq. 5.2.11 is also shown for each individual coil.
Figure 5.9: This is the magnetic field of the Helmholtz coil (at least in the xz-plane). The
large dots are cross-sections of the coils and field strength is indicated by the thickness of
the arrows.
This is the magnetic field of a Helmholtz coil at any point along the
z-axis given the coils are separated by 2a. Remember, both z and a
are unitless because they’re defined in terms of zR = zp /R.
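• To show uniformity, we first need to know how the field is changing
along the z-axis. That is given by
\[ \frac{d\vec{B}}{dz} = \frac{2\pi k_M N I}{R}\left[\frac{-3(z + a)}{\left[(z + a)^2 + 1\right]^{5/2}} + \frac{-3(z - a)}{\left[(z - a)^2 + 1\right]^{5/2}}\right]\hat{z} \]
\[ \frac{d\vec{B}}{dz} = \frac{-6\pi k_M N I}{R}\left[\frac{z + a}{\left[(z + a)^2 + 1\right]^{5/2}} + \frac{z - a}{\left[(z - a)^2 + 1\right]^{5/2}}\right]\hat{z}. \]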
However, this doesn’t tell us anything about uniformity. For that, we
need to know how the changes are changing, meaning we need a second
derivative. The result is
\[ \frac{d^2\vec{B}}{dz^2} = \frac{-6\pi k_M N I}{R}\left[\frac{1}{\left[(z + a)^2 + 1\right]^{5/2}} + \frac{1}{\left[(z - a)^2 + 1\right]^{5/2}} + \frac{-5(z + a)^2}{\left[(z + a)^2 + 1\right]^{7/2}} + \frac{-5(z - a)^2}{\left[(z - a)^2 + 1\right]^{7/2}}\right]\hat{z}. \]
\[ 0 = \frac{1}{\left(a^2 + 1\right)^{5/2}} + \frac{-5a^2}{\left(a^2 + 1\right)^{7/2}}. \]
Now we can multiply through by $\left(a^2 + 1\right)^{7/2}$ to eliminate the fractions.
We now have
\[ 0 = a^2 + 1 - 5a^2 = -4a^2 + 1 \quad\Rightarrow\quad 2a = 1. \]
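The condition 2a = 1 (in units of R) can be cross-checked numerically. The sketch below is an illustration under stated assumptions: it drops the constant prefactor $-6\pi k_M N I/R$, evaluates the bracketed part of the second derivative at the midpoint z = 0, and locates its root with a plain bisection search. The function names are hypothetical.

```python
def second_derivative_bracket(a):
    """Bracketed factor of d^2B/dz^2 evaluated at z = 0, where the two
    coils at z = -a and z = +a contribute equally (a is in units of R)."""
    return 2.0 * ((a * a + 1.0) ** -2.5 - 5.0 * a * a * (a * a + 1.0) ** -3.5)

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign exactly once on [lo, hi]."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0.0) == (f_mid > 0.0):
            lo, f_lo = mid, f_mid          # root lies in the upper half
        else:
            hi = mid                       # root lies in the lower half
    return 0.5 * (lo + hi)

# Half-separation (in units of R) at which the field is flattest at the center.
a_star = bisect(second_derivative_bracket, 0.0, 1.0)

if __name__ == "__main__":
    print("half-separation a =", a_star, " full separation 2a =", 2.0 * a_star)
```

Because the first derivative already vanishes at z = 0 by symmetry, making the second derivative vanish there as well is what leaves the field so flat between the coils.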
Figure 5.10: These people were important in the development of theoretical electrody-
namics.
Ampére’s Law
Our theoretical understanding begins with André-Marie Ampére in 1820.
Yes, that’s the same year Biot and Savart released their findings. Both
the Biot-Savart team and Ampére were inspired by Hans Christian Ørsted’s
discovery that a compass needle pointed perpendicular to a current carrying
wire. Ørsted had announced his work in April 1820 and one week later
Ampére demonstrated that parallel currents attract and anti-parallel currents
repel. Biot and Savart’s work wasn’t published until October of that year,
so Ampére was already showing promise.
Six years later, Ampére published a memoir in which he presented all
his theory and experimental results on magnetism. Amongst other things,
it included a beautifully simple relationship between current and B-field we
now write as
\[ \oint \vec{B}\bullet d\vec{\ell} = \mu_0 I_\text{enc}, \tag{5.3.1} \]
where $I_\text{enc}$ is the current passing through (i.e. enclosed by) the curve $\ell$ and $\mu_0$
is a theoretical constant with a value of $4\pi k_M = 4\pi \times 10^{-7}$ N/A$^2$. Redefining
the magnetic constant now makes several results in this chapter look much
more elegant. We call Eq. 5.3.1 Ampére’s law. The closed loop given by the
integral is called an Ampérian loop and is arbitrarily chosen very much like
a coordinate system. Eq. 5.3.1 states that, if there is an electric current inside
a closed curve, then there is a magnetic field along that curve. Essentially,
moving charge generates a magnetic field (a concept we’ve already seen).
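To make the statement of Eq. 5.3.1 concrete, the circulation of a known field around a closed loop can be computed numerically. The sketch below is not from the text: it assumes the standard field of a long straight wire along the z-axis, $B = \mu_0 I/(2\pi s)$ in the φ-direction, and walks a square Ampérian loop counterclockwise using a midpoint rule. All function names are hypothetical.

```python
import math

MU_0 = 4.0 * math.pi * 1.0e-7  # N/A^2, the value quoted with Eq. 5.3.1

def wire_field(x, y, current):
    """Assumed field of a long straight wire on the z-axis (x and y parts)."""
    s2 = x * x + y * y
    c = MU_0 * current / (2.0 * math.pi * s2)
    return -c * y, c * x

def circulation(cx, cy, half, current, n=4000):
    """Line integral of B around a square loop centered at (cx, cy) with
    half-width `half`, traversed counterclockwise."""
    corners = [(cx - half, cy - half), (cx + half, cy - half),
               (cx + half, cy + half), (cx - half, cy + half)]
    total = 0.0
    for k in range(4):
        x0, y0 = corners[k]
        x1, y1 = corners[(k + 1) % 4]
        dx, dy = (x1 - x0) / n, (y1 - y0) / n
        for i in range(n):
            t = i + 0.5                     # midpoint of this small step
            bx, by = wire_field(x0 + t * dx, y0 + t * dy, current)
            total += bx * dx + by * dy      # B dotted with the step
    return total

if __name__ == "__main__":
    print(circulation(0.1, -0.05, 0.5, 3.0), MU_0 * 3.0)  # wire inside the loop
    print(circulation(2.0, 0.0, 0.5, 3.0))                # wire outside the loop
```

The loop is deliberately off-center in the first call: the circulation depends only on whether the current passes through the loop, not on where the loop sits or what shape it has.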
In an introductory physics textbook, you might see Eq. 5.3.1 used to find
the magnetic field generated by an infinitely long current carrying wire or
an infinitely long solenoid, but this drastically devalues the law. First, we
may be able to find a scenario that approximates one of these possibilities,
but neither truly exists. Second, other than these few rare occurrences, the
Biot-Savart law is far more practical for finding a B-field. Ampére’s law can
be used to find an electric current given a magnetic field, but it has a higher
purpose. It gives us a much better understanding of how magnetic fields
work, the depth of which was not seen clearly until years later (the majority
of the scientific community initially favored the Biot-Savart law).
To get a feel for the real theoretical power of Ampére’s law, we need to
use something called the Curl Theorem given by Eq. 3.5.12. With it, we can
write Eq. 5.3.1 as
\[ \int \left(\vec{\nabla}\times\vec{B}\right)\bullet d\vec{A} = \mu_0 I_\text{enc} \]
where $\vec{\nabla}$ is the del operator (defined in Chapter 3). We can simplify this by
defining a current density (current per unit area) with
\[ I = \int \vec{J}\bullet d\vec{A}, \tag{5.3.2} \]
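where $\vec{J}$ is the current density and I is the current. If we integrate the current
density over the same area as the one enclosed by the Ampérian loop, then
I becomes $I_\text{enc}$ and we have
\[ \int \left(\vec{\nabla}\times\vec{B}\right)\bullet d\vec{A} = \mu_0\int \vec{J}\bullet d\vec{A} \]
\[ \int \left(\vec{\nabla}\times\vec{B}\right)\bullet d\vec{A} = \int \left(\mu_0\vec{J}\right)\bullet d\vec{A}. \]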
Since the areas of integration are the same, we can just cancel them (using
Eq. 3.1.1) leaving us with
\[ \vec{\nabla}\times\vec{B} = \mu_0\vec{J}, \tag{5.3.3} \]
which is defined at a single arbitrary point. Eq. 5.3.3 tells us the curl of
the magnetic field at a point in space is directly proportional to the current
density at that same point. This is a very powerful idea because it relates
magnetic fields and current in terms of vector calculus as described in Chapter
3.
Example 5.3.1
Show that the Biot-Savart law is consistent with Ampére’s law.
• First, we need to make the Biot-Savart law look a little more convenient.
We’ll start with the current density form which is given by
\[ \vec{B} = k_M\int\frac{\vec{J}\times\hat{r}}{r^2}\,dV_I = k_M\int\vec{J}\times\frac{\vec{r}}{r^3}\,dV_I \]
where we have eliminated the unit vector using Eq. 5.2.4. Generalizing
further, we get
• Now we’re going to make a very creative substitution using the del
operator. Let’s take
\[ \vec{\nabla}_p\frac{1}{r} = \vec{\nabla}_p\frac{1}{|\vec{r}_p - \vec{r}_I|} \]
where the subscript of p on del indicates the derivatives are with respect
to $\vec{r}_p$, not $\vec{r}$. It’s best to evaluate this gradient in Cartesian coordinates,
yet the result will hold for any coordinate system. We’ll be using Eq.
3.2.1 and
\[ |\vec{r}_p - \vec{r}_I| = \sqrt{(x_p - x_I)^2 + (y_p - y_I)^2 + (z_p - z_I)^2} \]
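to get components of
\begin{align*}
\hat{x}\,\frac{\partial}{\partial x_p}\frac{1}{r} &= \frac{-(x_p - x_I)\,\hat{x}}{\left[(x_p - x_I)^2 + (y_p - y_I)^2 + (z_p - z_I)^2\right]^{3/2}} \\
\hat{y}\,\frac{\partial}{\partial y_p}\frac{1}{r} &= \frac{-(y_p - y_I)\,\hat{y}}{\left[(x_p - x_I)^2 + (y_p - y_I)^2 + (z_p - z_I)^2\right]^{3/2}} \\
\hat{z}\,\frac{\partial}{\partial z_p}\frac{1}{r} &= \frac{-(z_p - z_I)\,\hat{z}}{\left[(x_p - x_I)^2 + (y_p - y_I)^2 + (z_p - z_I)^2\right]^{3/2}}
\end{align*}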
\[ \vec{\nabla}_p\frac{1}{r} = -\frac{\vec{r}}{r^3} \tag{5.3.5} \]
\[ \vec{B} = k_M\int -\vec{J}(\vec{r}_I)\times\vec{\nabla}_p\frac{1}{|\vec{r}_p - \vec{r}_I|}\,dV_I. \]
If we use the derivative product rule given by Eq. 3.2.11 (the first term
on the right is our integrand), then the result is
\[ \vec{B} = k_M\int\left[\vec{\nabla}_p\times\frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|} - \frac{1}{|\vec{r}_p - \vec{r}_I|}\,\vec{\nabla}_p\times\vec{J}(\vec{r}_I)\right]dV_I. \]
It’s good to note here that the quantity in square brackets is the mag-
netic vector potential (something we’ll get into a little later in the
chapter). At this point, you might be thinking “Will this solution ever
end?!” I assure you, in being this thorough, the following examples will
be incredibly simple in comparison. It’s important that we get all this
out of the way.
• Ampére’s law given by Eq. 5.3.3 involves the curl of $\vec{B}$, so
\[ \vec{\nabla}_p\times\vec{B} = \vec{\nabla}_p\times\left(\vec{\nabla}_p\times\left[k_M\int\frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|}\,dV_I\right]\right). \]
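• Let’s look at the part of the first term inside the parentheses (call it
O),
\[ O = \vec{\nabla}_p\bullet\left[k_M\int\frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|}\,dV_I\right] \]
\[ O = k_M\int\vec{\nabla}_p\bullet\left[\frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|}\right]dV_I. \]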
If we use the derivative product rule given by Eq. 3.2.10 (taking
$\vec{\nabla}_p\bullet\vec{J}(\vec{r}_I) = 0$ because $\vec{J}(\vec{r}_I)$ is not dependent on $\vec{r}_p$), then the result is
\[ O = k_M\int\vec{J}(\vec{r}_I)\bullet\vec{\nabla}_p\frac{1}{|\vec{r}_p - \vec{r}_I|}\,dV_I. \]
It kind of looks like we’ve just pulled the $\vec{J}(\vec{r}_I)$ out of the derivative,
but if you look close enough you’ll see the divergence changed to a
gradient. Don’t jump to conclusions too quickly. A similar derivation
to the one for Eq. 5.3.5 will give us
\[ \vec{\nabla}_p\frac{1}{r} = -\vec{\nabla}_I\frac{1}{r} \]
as a substitution. Using it, we get
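\[ O = k_M\int\vec{J}(\vec{r}_I)\bullet\left(-\vec{\nabla}_I\frac{1}{|\vec{r}_p - \vec{r}_I|}\right)dV_I \]
\[ O = -k_M\int\vec{J}(\vec{r}_I)\bullet\vec{\nabla}_I\frac{1}{|\vec{r}_p - \vec{r}_I|}\,dV_I. \]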
Now, we’ll use Eq. 3.2.10 again (with a little manipulation) to get
\[ O = -k_M\int\left[\vec{\nabla}_I\bullet\frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|} - \frac{1}{|\vec{r}_p - \vec{r}_I|}\,\vec{\nabla}_I\bullet\vec{J}(\vec{r}_I)\right]dV_I. \]
The $\vec{\nabla}_I\bullet\vec{J}(\vec{r}_I)$ doesn’t go to zero as easily as $\vec{\nabla}_p\bullet\vec{J}(\vec{r}_I)$ did. However,
the Biot-Savart law requires a steady current, which means no charge
This looks a lot like what we started with, but we needed the del to be
with respect to ~rI before we could perform the next step. If we apply
the Divergence Theorem (Eq. 3.5.5), we get
\[ O = -k_M\oint\frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|}\bullet d\vec{A}_I. \]
• Now we need another substitution involving a del, but this one will
take a little more thought. Using Eq. 5.3.5, we get
\[ -\nabla_p^2\frac{1}{r} = \vec{\nabla}_p\bullet\left(-\vec{\nabla}_p\frac{1}{r}\right) \]
\[ -\nabla_p^2\frac{1}{r} = \vec{\nabla}_p\bullet\frac{\vec{r}}{r^3} \]
Just so this doesn’t get too messy, we’re going to assume ~rI is zero
meaning ~r = ~rp (don’t worry, we’ll put it back in later). Now, we have
\[ \vec{\nabla}_p\bullet\frac{\vec{r}_p}{r_p^3} = \vec{\nabla}_p\bullet\frac{\hat{r}_p}{r_p^2}. \]
where both integrals enclose the origin (i.e. $\vec{r}_I$ for our purposes). Since
the volume (and the surface enclosing it) are arbitrary, we’ll choose a
sphere of radius a. The surface integral on the right gives
\[ \oint\frac{\hat{r}_p}{a^2}\bullet a^2\sin\theta\,d\theta\,d\phi\,\hat{r}_p = \oint\sin\theta\,d\theta\,d\phi = 4\pi \]
which is most definitely not zero. The discrepancy comes from the
origin, our ~rI . The divergence goes to infinity at this location, but
is zero everywhere else. There is only one entity that has an infinite
value at one place, a zero value everywhere else, and also has a finite
area underneath: the Dirac delta function. Calling it a “function”
is misleading since a function must have a finite value everywhere by
definition, but the name suffices. The area under this function is 1, but
the area under our function is 4π. Therefore, we can conclude that
\[ \vec{\nabla}_p\bullet\frac{\vec{r}_p}{r_p^3} = 4\pi\,\delta^3(\vec{r}_p). \]
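The claim that the enclosing surface is arbitrary can be tested numerically. The sketch below is only an illustration with hypothetical names: it computes the outward flux of $\vec{r}/r^3$ through a cube (rather than a sphere) using a midpoint rule on each of the six faces. By the Divergence Theorem, a divergence of $4\pi\delta^3(\vec{r})$ should give a flux of 4π whenever the origin is inside the cube and zero when it is outside.

```python
import math

def flux_through_cube(center, half, n=200):
    """Outward flux of r/r^3 through a cube with the given center and
    half-width, using an n-by-n midpoint grid on each face."""
    step = 2.0 * half / n
    total = 0.0
    for axis in range(3):                  # 0, 1, 2 label the x, y, z faces
        for sign in (-1.0, 1.0):           # the two opposite faces on this axis
            for i in range(n):
                u = -half + (i + 0.5) * step
                for j in range(n):
                    v = -half + (j + 0.5) * step
                    offset = [0.0, 0.0, 0.0]
                    offset[axis] = sign * half
                    offset[(axis + 1) % 3] = u
                    offset[(axis + 2) % 3] = v
                    point = [center[k] + offset[k] for k in range(3)]
                    r3 = (point[0] ** 2 + point[1] ** 2 + point[2] ** 2) ** 1.5
                    # Outward normal is sign * (unit vector along this axis).
                    total += sign * point[axis] / r3 * step * step
    return total

if __name__ == "__main__":
    print(flux_through_cube((0.1, -0.2, 0.05), 1.0), 4.0 * math.pi)  # origin inside
    print(flux_through_cube((3.0, 0.0, 0.0), 1.0))                   # origin outside
```

The cube is placed off-center on purpose: the flux depends only on whether the origin is enclosed, which is exactly the behavior the Dirac delta function encodes.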
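• Now the curl of $\vec{B}$ is
\[ \vec{\nabla}_p\times\vec{B} = k_M\int\vec{J}(\vec{r}_I)\,4\pi\,\delta^3(\vec{r}_p - \vec{r}_I)\,dV_I \]
\[ \vec{\nabla}_p\times\vec{B} = 4\pi k_M\int\vec{J}(\vec{r}_I)\,\delta^3(\vec{r}_p - \vec{r}_I)\,dV_I. \]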
Inside an integral, the Dirac delta function “picks out” where it is non-
zero for all other functions in the integrand. For our integral, this would
be
\[ \vec{\nabla}_p\times\vec{B} = 4\pi k_M\,\vec{J}(\vec{r}_p)\int\delta^3(\vec{r}_p - \vec{r}_I)\,dV_I. \]
Faraday’s Law
A British scientist by the name of Michael Faraday had been conducting some
experiments involving electric current and magnetic fields in the 1820s. He
was not formally educated, having learned science while reading books during
a seven-year apprenticeship at a book store in his early twenties. This makes
the set of contributions he made to science (e.g. the electric motor) in his
lifetime very impressive. In 1831, Faraday announced his results regarding
how changing magnetic fields could affect electric current. With his limited
math skills, the relationship he published was very basic in terms of the
application to which he thought it applied.
However, the scope of his relationship was very quickly realized by other
scientists who took it upon themselves to generalize the result to
\[ \oint\vec{E}\bullet d\vec{\ell} = -\frac{\partial\Phi_B}{\partial t}, \tag{5.3.8} \]
which we call Faraday’s law. The quantity being differentiated on the right
is
\[ \Phi_B = \int\vec{B}\bullet d\vec{A}, \tag{5.3.9} \]
which we call the magnetic flux. It is called flux because its form is anal-
ogous to flux from fluid dynamics,
\[ \Phi_\text{fluid} = \int\rho\,\vec{v}\bullet d\vec{A}, \tag{5.3.10} \]
where ρ is the fluid density and ~v is the flow velocity through the area of
integration. In reality, magnetic fields don’t flow, but vector fields can still
be discussed in flow terms even if there isn’t anything flowing as long as there
is a non-zero curl. The curl of the magnetic field is given by Eq. 5.3.3, which
is non-zero (at some points).
Eq. 5.3.8 states that, if a magnetic field changes on some area, then there
is an electric field along the curve enclosing that area. Essentially, a changing
magnetic field generates an electric field. This idea has much more broad a
scope than Michael Faraday had anticipated. It forms the foundation for AC
circuit designs and led the great Nikola Tesla (for whom the standard unit of
magnetic field is named) to design the entire U.S. electricity grid at the
turn of the 20th century.
Just as with Ampére’s law (Eq. 5.3.1), we have a line integral on the left,
so we can get a feel for its theoretical power by applying the Curl Theorem
(Eq. 3.5.12). Doing so, we arrive at
\[ \int\left(\vec{\nabla}\times\vec{E}\right)\bullet d\vec{A} = -\frac{\partial\Phi_B}{\partial t}. \]
Substituting in for magnetic flux with Eq. 5.3.9, we get
\[ \int\left(\vec{\nabla}\times\vec{E}\right)\bullet d\vec{A} = -\frac{\partial}{\partial t}\int\vec{B}\bullet d\vec{A}. \]
The integral operator is over space and the derivative operator is over time,
so these operators are commutative. Applying this property results in
\[ \int\left(\vec{\nabla}\times\vec{E}\right)\bullet d\vec{A} = \int\left(-\frac{\partial\vec{B}}{\partial t}\right)\bullet d\vec{A}. \]
Since the areas of integration are the same, we can just cancel them (using
Eq. 3.1.1) leaving us with
\[ \vec{\nabla}\times\vec{E} = -\frac{\partial\vec{B}}{\partial t}, \tag{5.3.11} \]
which is defined at a single arbitrary point. Eq. 5.3.11 tells us the curl of the
electric field at a point in space is directly proportional to the rate of change
of the magnetic field with respect to time at that same point. This is a very
powerful idea because it relates electric fields and magnetic fields in terms of
vector calculus as described in Chapter 3.
Example 5.3.2
Show that Coulomb’s law is consistent with Faraday’s law.
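Since both the variable of integration and $\rho(\vec{r}_q)$ are independent of $\vec{r}_p$,
we get
\[ \vec{\nabla}_p\times\vec{E} = k_E\int\rho(\vec{r}_q)\left[\vec{\nabla}_p\times\frac{(\vec{r}_p - \vec{r}_q)}{|\vec{r}_p - \vec{r}_q|^3}\right]dV_q. \]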
\[ \vec{\nabla}_p\times\vec{E} = -k_E\int\rho(\vec{r}_q)\,\vec{\nabla}_p\times\vec{\nabla}_p\frac{1}{|\vec{r}_p - \vec{r}_q|}\,dV_q. \]
Since the curl of a gradient is always zero (Eq. 3.2.6), the integrand is
zero. Therefore,
\[ \vec{\nabla}_p\times\vec{E} = 0, \]
Gauss’s Law(s)
The next major discovery came in 1835 with Carl Friedrich Gauss, a German
mathematician and scientist. Gauss formulated relationships for electricity
and magnetism in terms of flux through closed areas. They are formally
written today as
\[ \oint\vec{E}\bullet d\vec{A} = \frac{q_\text{enc}}{\epsilon_0} \tag{5.3.13} \]
and
\[ \oint\vec{B}\bullet d\vec{A} = 0, \tag{5.3.14} \]
where $q_\text{enc}$ is the charge enclosed by the area given in the closed surface
integral and $\epsilon_0$ is a theoretical constant with a value of $(4\pi k_E)^{-1} = 8.854 \times 10^{-12}$
C$^2$/(N m$^2$). Redefining the electric constant now makes several results
in this chapter look much more elegant. We call Eq. 5.3.13 Gauss’s law. Eq.
5.3.14 doesn’t have a formal name, but we sometimes call it Gauss’s law for
Magnetism. The closed area given by the integrals is called a Gaussian
Surface and is arbitrarily chosen very much like a coordinate system.
Eq. 5.3.13 states that, if there is an electric charge inside a closed surface,
then there is a net electric field passing through that surface (i.e. an electric
flux through the surface as analogous to Eq. 5.3.10). Essentially, charge generates
an electric field (a concept we’ve already seen). Eq. 5.3.14 states that
there isn’t a magnetic flux through any closed surface because the integral is
necessarily zero. No matter what shape, size, orientation, or location this ar-
bitrary surface has, there are always as many vectors on the surface directed
inward as there are directed outward. Essentially, this means magnetic fields
always form closed loops (i.e. they always lead back to the source).
Because the integrals in Eqs. 5.3.13 and 5.3.14 are both closed surface
integrals, we can apply something called the Divergence Theorem (Eq. 3.5.5)
to get a feel for their theoretical power. Showing the work for Eq. 5.3.13, we
see that
\[ \int\vec{\nabla}\bullet\vec{E}\,dV = \frac{q_\text{enc}}{\epsilon_0}. \]
We can simplify this by defining a charge density (charge per unit volume)
with
\[ q = \int\rho\,dV, \tag{5.3.15} \]
where ρ is the charge density and q is the charge. If we integrate the charge
density over the same volume as the one enclosed by the Gaussian Surface,
then q becomes qenc and we have
\[ \int\vec{\nabla}\bullet\vec{E}\,dV = \frac{1}{\epsilon_0}\int\rho\,dV \]
\[ \int\vec{\nabla}\bullet\vec{E}\,dV = \int\frac{\rho}{\epsilon_0}\,dV. \]
Since the volumes of integration are the same, we can just cancel them (using
Eq. 3.1.1) leaving us with
\[ \vec{\nabla}\bullet\vec{E} = \frac{\rho}{\epsilon_0}, \tag{5.3.16} \]
which is defined at a single arbitrary point. Eq. 5.3.16 tells us the divergence
of the electric field at a point in space is directly proportional to the charge
density at that same point. This is a very powerful idea because it relates
electric fields and charge in terms of vector calculus as described in Chapter
3.
Similarly, Eq. 5.3.14 can be shown to become
\[ \vec{\nabla}\bullet\vec{B} = 0, \tag{5.3.17} \]
which is defined at a single arbitrary point. Eq. 5.3.17 tells us the divergence
of the magnetic field at any point in space is zero (i.e. magnetic fields don’t
diverge). This is a very powerful idea because it shows the behavior of
magnetic fields in terms of vector calculus as described in Chapter 3.
Example 5.3.3
Show that Coulomb’s law is consistent with Gauss’s law.
Since both the variable of integration and $\rho(\vec{r}_q)$ are independent of $\vec{r}_p$,
we get
\[ \vec{\nabla}_p\bullet\vec{E} = k_E\int\rho(\vec{r}_q)\left[\vec{\nabla}_p\bullet\frac{(\vec{r}_p - \vec{r}_q)}{|\vec{r}_p - \vec{r}_q|^3}\right]dV_q. \]
Inside an integral, the Dirac delta function “picks out” where it is non-
zero for all other functions in the integrand. For our integral, this would
be
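\[ \vec{\nabla}_p\bullet\vec{E} = 4\pi k_E\,\rho(\vec{r}_p)\int\delta^3(\vec{r}_p - \vec{r}_q)\,dV_q. \]
\[ \vec{\nabla}_p\bullet\vec{E} = \frac{\rho(\vec{r}_p)}{\epsilon_0}, \]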
which is exactly Eq. 5.3.16.
Example 5.3.4
Show that the Biot-Savart law is consistent with Gauss’s law for Magnetism.
Another way to think about this is to tap another fluid dynamics concept:
equations of continuity. The basic fluid form of this would be
\[ \frac{\partial\rho}{\partial t} + \vec{\nabla}\bullet(\rho\,\vec{v}) = 0, \tag{5.3.21} \]
which is closely related to the fluid flux given by Eq. 5.3.10. Formulating this
for electrodynamics, we get
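\[ \frac{\partial\rho}{\partial t} + \vec{\nabla}\bullet\vec{J} = 0, \]
or
\[ \vec{\nabla}\bullet\vec{J} = -\frac{\partial\rho}{\partial t}, \tag{5.3.22} \]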
where ρ is the volumetric charge density and $\vec{J}$ is the current density (current
per unit area). This is commonly referred to as Conservation of Charge
because it states the spatial flow of charge (current density) outward from
a point in space is equal to the decrease in the charge density over time at
that same point. Seems logical, right? We can see, using vector calculus,
Ampére’s law given by Eq. 5.3.3 is not consistent with Eq. 5.3.22.
According to Eq. 3.2.7, we know the divergence of a curl is always zero.
If we take the divergence of Eq. 5.3.3, we get
\[ \vec{\nabla}\bullet\left(\vec{\nabla}\times\vec{B}\right) = \vec{\nabla}\bullet\left(\mu_0\vec{J}\right) \]
\[ 0 = \vec{\nabla}\bullet\vec{J}. \]
This doesn’t match Eq. 5.3.22, so there must be something missing from
Ampére’s law. Working this out in terms of vector calculus allows us to
discover the true origin of the displacement current. Taking the divergence
of Eq. 5.3.20, we get
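\[ \vec{\nabla}\bullet\left(\vec{\nabla}\times\vec{B}\right) = \vec{\nabla}\bullet\left(\mu_0\vec{J}\right) + \vec{\nabla}\bullet\left(\mu_0\vec{J}_D\right) \]
\[ 0 = \vec{\nabla}\bullet\vec{J} + \vec{\nabla}\bullet\vec{J}_D. \]
\[ \vec{\nabla}\bullet\vec{J}_D = \frac{\partial\rho}{\partial t}. \]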
From Gauss’s law given by Eq. 5.3.16, we can say
\[ \vec{\nabla}\bullet\vec{J}_D = \frac{\partial}{\partial t}\left(\epsilon_0\vec{\nabla}\bullet\vec{E}\right). \]
The del operator is over space, so it is commutative with the time derivative.
Applying this property results in
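\[ \vec{\nabla}\bullet\vec{J}_D = \vec{\nabla}\bullet\left(\epsilon_0\frac{\partial\vec{E}}{\partial t}\right) \]
\[ \vec{J}_D = \epsilon_0\frac{\partial\vec{E}}{\partial t}. \]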
The so-called displacement current term is simply the result of a changing
electric field! We can substitute this result into Eq. 5.3.20 and we get
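\[ \vec{\nabla}\times\vec{B} = \mu_0\vec{J} + \mu_0\epsilon_0\frac{\partial\vec{E}}{\partial t} \tag{5.3.23} \]
and an integral form of
\[ \oint\vec{B}\bullet d\vec{\ell} = \mu_0 I_\text{enc} + \mu_0\epsilon_0\frac{\partial\Phi_E}{\partial t}, \tag{5.3.24} \]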
Figure 5.11: These people were important in the development of what we call Maxwell’s
equations.
where ΦE is the electric flux passing through the area enclosed by the curve
in the line integral. The integral form of these laws is appealing to some,
but we have seen very clearly in the examples from Section 5.3 and the
immediately preceding work that the del form is far more powerful. It’s also
appropriate at this point in our discussion to stick to the del form because
Maxwell was the first to formally use the notation.
\[ \vec{J}_\text{tot} = \vec{J} + \frac{\partial\vec{D}}{\partial t}, \tag{5.4.1} \]
• “Equation of Magnetic Intensity” (Definition of $\vec{H}$ and $\vec{A}$):
\[ \vec{B} = \mu\vec{H} = \vec{\nabla}\times\vec{A}, \tag{5.4.2} \]
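\[ \vec{\nabla}\times\mu\vec{H} = \mu\vec{J}_\text{tot}, \tag{5.4.3} \]
\[ \vec{v}\times\vec{B} + \vec{E} = \left[\vec{v}\times\mu\vec{H}\right] - \frac{\partial\vec{A}}{\partial t} - \vec{\nabla}\phi, \tag{5.4.4} \]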
• “Equation of Electric Elasticity” (Definition of $\vec{D}$):
\[ \vec{E} = \frac{1}{\epsilon}\vec{D}, \tag{5.4.5} \]
where $\epsilon$ is an electric field constant for the material and $\vec{D}$ is the
displacement field (i.e. the electric field in the material).
\[ \vec{E} = \frac{1}{\sigma}\vec{J}, \tag{5.4.6} \]
where σ is the electric conductivity in the material.
\[ \vec{\nabla}\bullet\vec{D} = \rho, \tag{5.4.7} \]
where $\vec{D}$ is the displacement field (i.e. the electric field in the material)
and ρ is the volumetric charge density in the material.
\[ \vec{\nabla}\bullet\vec{J} = -\frac{\partial\rho}{\partial t}, \tag{5.4.8} \]
where ρ is the volumetric charge density in the material. This is just
Eq. 5.3.22.
The quantities $\vec{D}$, $\vec{H}$, $\epsilon$, $\mu$, and $\sigma$ are all related in some way to materials.
Maxwell was experimental at heart, so he designed the equations for practical
use rather than deeper meaning. In fact, he viewed the electric potential, φ,
and the magnetic potential, $\vec{A}$, as completely meaningless because where you
chose to place the value of zero was irrelevant. Very much like a coordinate
system (see Chapter 1), this choice of zero has no effect on the physical result,
but there are some choices that will simplify the analysis. Both φ and $\vec{A}$ had
been used prior to Maxwell by people like Joseph Louis Lagrange, Pierre-
Simon Laplace, Gustav Kirchhoff, Michael Faraday, and Franz Neumann; all
of whom tried to interpret them physically to no real success. Maxwell, on
the other hand, simply viewed them as a way to simplify his equations.
We could very easily combine several of these equations to simplify the
work required and hopefully make the list look a little more elegant. In fact,
Oliver Heaviside, an English mathematician and physicist, did just that.
Heaviside’s major contributions include formalizing the notation we use in
vector calculus given in Chapter 3, developing methods of solving differential
equations, and incorporating complex numbers into the methods of electric
circuits. In 1885, he published Electromagnetic Induction and its Propagation
where he took Maxwell’s list of 8 down to 4 equations.
Heaviside realized, not only could he combine a few of Maxwell’s equa-
tions to shorten the list, he could eliminate several equations and arbitrarily
defined quantities by including Faraday’s law (Eq. 5.3.11). He felt that, since
Maxwell’s arbitrary quantities had no physical meaning, they should not be
included. In response, Maxwell spent years trying to discover their physical
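\begin{align}
\vec{\nabla}\bullet\vec{E} &= \frac{\rho}{\epsilon_0} \tag{5.4.9a} \\
\vec{\nabla}\bullet\vec{B} &= 0 \tag{5.4.9b} \\
\vec{\nabla}\times\vec{E} &= -\frac{\partial\vec{B}}{\partial t} \tag{5.4.9c} \\
\vec{\nabla}\times\vec{B} &= \mu_0\vec{J} + \mu_0\epsilon_0\frac{\partial\vec{E}}{\partial t} \tag{5.4.9d}
\end{align}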
which are just Eqs. 5.3.16, 5.3.17, 5.3.11 and 5.3.23. These equations are
formulated in terms of just the electric and magnetic fields. Heaviside also
listed the Lorentz Force as
\[ \vec{F} = q\vec{E} + q\,\vec{v}\times\vec{B} \tag{5.4.10} \]
to incorporate how charges were affected by each of these fields. In this case,
the electric field constant, $\epsilon_0$, is referred to as the permittivity of free
space and the magnetic field constant, µ0 , is referred to as the permeability
of free space.
All physics students know this list as Maxwell’s equations. When they
were first published, they were called Heaviside’s equations (or sometimes
the Heaviside-Hertz equations since Heinrich Hertz discovered the same list
simultaneously). Unfortunately, politics tend to play a role in how these
things turn out and Heaviside was somewhat under-appreciated in his time,
very much like Nikola Tesla. Many scientists felt that, since Maxwell was the
first to try to unify electricity and magnetism, he should be given credit and
so then they were called the Heaviside-Maxwell equations. In 1940, Albert
Einstein published an article called The Fundamentals of Theoretical Physics
where he referred to them simply as Maxwell’s equations and, from that point
on, Heaviside’s name has been lost to history.
Since spatial derivative operators are commutative with time derivative op-
erators, we get
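$$\begin{aligned}
\vec{\nabla} \times \left(\vec{\nabla} \times \vec{E}\right) &= -\frac{\partial}{\partial t}\left(\vec{\nabla} \times \vec{B}\right) \\
\vec{\nabla} \times \left(\vec{\nabla} \times \vec{B}\right) &= \mu_0 \epsilon_0 \frac{\partial}{\partial t}\left(\vec{\nabla} \times \vec{E}\right)
\end{aligned}.$$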
Using Eq. 3.2.8, we can substitute on the left side of the equations, which
results in
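$$\begin{aligned}
\vec{\nabla}\left(\vec{\nabla} \cdot \vec{E}\right) - \vec{\nabla}^2 \vec{E} &= -\frac{\partial}{\partial t}\left(\vec{\nabla} \times \vec{B}\right) \\
\vec{\nabla}\left(\vec{\nabla} \cdot \vec{B}\right) - \vec{\nabla}^2 \vec{B} &= \mu_0 \epsilon_0 \frac{\partial}{\partial t}\left(\vec{\nabla} \times \vec{E}\right)
\end{aligned}.$$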
Inside each of the four sets of parentheses, we can substitute from Eq. Set
5.5.1 to arrive at
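$$\begin{aligned}
0 - \vec{\nabla}^2 \vec{E} &= -\frac{\partial}{\partial t}\left(\mu_0 \epsilon_0 \frac{\partial \vec{E}}{\partial t}\right) \\
0 - \vec{\nabla}^2 \vec{B} &= \mu_0 \epsilon_0 \frac{\partial}{\partial t}\left(-\frac{\partial \vec{B}}{\partial t}\right)
\end{aligned}$$
$$\begin{aligned}
\vec{\nabla}^2 \vec{E} &= \mu_0 \epsilon_0 \frac{\partial^2 \vec{E}}{\partial t^2} \\
\vec{\nabla}^2 \vec{B} &= \mu_0 \epsilon_0 \frac{\partial^2 \vec{B}}{\partial t^2}
\end{aligned}. \tag{5.5.2}$$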
These two equations match the form of the standard mechanical wave
equation given by
$$\frac{d^2 y}{dx^2} = \frac{1}{v^2} \frac{d^2 y}{dt^2} \tag{5.5.3}$$
where we have a second derivative with respect to space proportional to a
second derivative with respect to time. The proportionality constant is an
inverse square of the wave-speed. This would suggest we can find the speed
of an electromagnetic wave by stating
$$\frac{1}{c^2} = \mu_0 \epsilon_0 \quad \Rightarrow \quad c = \frac{1}{\sqrt{\mu_0 \epsilon_0}}, \tag{5.5.4}$$
where c has the value of 299,792,458 m/s when you plug in the values of $\mu_0$
and $\epsilon_0$. This is the speed of light! This is also sometimes specified to be
“in a vacuum” or “in free space” because experimentally (or practically) we
Example 5.5.1
Just as with Eq. 5.5.3, there is a multitude of possible solutions to Eq. 5.5.2
involving the superposition of functions (in this case vector functions).
Figure 5.12: This is an example of an electromagnetic wave. Specifically, this type is called
a plane linearly-polarized wave in which all vectors are oriented at 90° to one another. The direction of
propagation is downward to the right along the thin center line in the image.
The simplest of these solutions (worth examining) is for the linearly-polarized
plane wave shown in Figure 5.12. The solutions take the form
$$\begin{aligned}
\vec{E}(\vec{r}, t) &= \vec{E}_0 \cos\left(\omega t - \vec{k} \cdot \vec{r} + \varphi_0\right) \\
\vec{B}(\vec{r}, t) &= \vec{B}_0 \cos\left(\omega t - \vec{k} \cdot \vec{r} + \varphi_0\right)
\end{aligned}, \tag{5.5.5}$$
where $\vec{r}$ is the position vector of the point in space, t is time, $\omega = 2\pi f$ is the
angular frequency of the wave (in radians per second), $\vec{k} = (2\pi/\lambda)\,\hat{k}$ is the
angular wave vector (in radians per meter) in the direction of propagation,
and $\varphi_0$ is the phase angle (in radians). The vector quantities $\vec{E}_0$ and $\vec{B}_0$ are
the corresponding amplitudes (maximum field disturbances) for each type of
field.
Let’s apply Eqs. 5.5.1a and 5.5.1c to these wave solutions. Assuming the
direction of propagation is along the z-axis in Cartesian coordinates, we can
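say $\vec{k} \cdot \vec{r} = kz$ because of Eq. 1.1.1. Starting with Eq. 5.5.1a, we get
$$\begin{aligned}
\vec{\nabla} \cdot \vec{E} &= 0 \\
\vec{\nabla} \cdot \left[\vec{E}_0 \cos(\omega t - kz + \varphi_0)\right] &= 0 \\
0 + 0 + \frac{\partial}{\partial z}\left[E_{0z} \cos(\omega t - kz + \varphi_0)\right] &= 0 \\
E_{0z}\, k \sin(\omega t - kz + \varphi_0) &= 0.
\end{aligned}$$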
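$$\begin{aligned}
\vec{\nabla} \times \left[\vec{E}_0 \cos(\omega t - kz + \varphi_0)\right] &= -\frac{\partial}{\partial t}\left[\vec{B}_0 \cos(\omega t - kz + \varphi_0)\right] \\
-\frac{\partial}{\partial z}\left[E_0 \cos(\omega t - kz + \varphi_0)\right] \hat{x} - 0 + 0 &= -\frac{\partial}{\partial t}\left[\vec{B}_0 \cos(\omega t - kz + \varphi_0)\right] \\
-E_0 k \sin(\omega t - kz + \varphi_0)\, \hat{x} &= \vec{B}_0\, \omega \sin(\omega t - kz + \varphi_0) \\
\vec{B}_0 &= -\frac{E_0 k}{\omega}\, \hat{x}.
\end{aligned}$$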
It’s in the −x̂ direction. Therefore, the direction of the magnetic field dis-
turbance of a plane linearly-polarized light wave is always orthogonal to the
direction of propagation and the direction of the electric field disturbance.
Furthermore,
$$B_0 = \frac{k}{\omega} E_0 = \frac{2\pi/\lambda}{2\pi f} E_0 = \frac{1}{\lambda f} E_0 = \frac{E_0}{c}$$
or, as it is sometimes written, $E_0 = c B_0$. Not only are their directions related, so
are their magnitudes.
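As a quick numerical sanity check on Eq. 5.5.4 and the amplitude relation above, here is a minimal Python sketch. The constants are rounded reference values taken as assumed inputs, and the wavelength and the amplitude $E_0$ = 100 V/m are made-up illustration numbers, not anything from the text.

```python
import math

# Rounded reference values, treated as assumed inputs for this sketch.
MU_0 = 1.25663706212e-6       # permeability of free space (N/A^2)
EPSILON_0 = 8.8541878128e-12  # permittivity of free space (F/m)

def wave_speed(mu, eps):
    """Wave speed implied by Eq. 5.5.4: c = 1 / sqrt(mu * eps)."""
    return 1.0 / math.sqrt(mu * eps)

def magnetic_amplitude(e0, wavelength, frequency):
    """B0 = (k / omega) * E0 with k = 2*pi/lambda and omega = 2*pi*f."""
    k = 2.0 * math.pi / wavelength
    omega = 2.0 * math.pi * frequency
    return e0 * k / omega

c = wave_speed(MU_0, EPSILON_0)
wavelength = 500e-9          # hypothetical wavelength in meters
frequency = c / wavelength   # chosen so that lambda * f = c
b0 = magnetic_amplitude(100.0, wavelength, frequency)  # E0 = 100 V/m (made up)
```

Multiplying `b0` by `c` should hand back the electric amplitude we started with, which is just $E_0 = c B_0$ read in reverse.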
In general, both field disturbances are orthogonal to the direction of prop-
agation, but not necessarily to each other. We represent this fact by some-
thing called the Poynting Vector given by
$$\vec{S} = \frac{1}{\mu_0} \vec{E} \times \vec{B} \tag{5.5.6}$$
which is defined as the energy flux vector (in watts per square meter) of the
EM wave. In other words, it’s the rate of energy transfer per unit area in
the direction of propagation.
Figure 5.13: These people were important in the development of the electric potential.
$$\vec{E} = -\vec{\nabla}\phi - \frac{\partial \vec{A}}{\partial t} \tag{5.6.1}$$
and
$$\vec{B} = \vec{\nabla} \times \vec{A}. \tag{5.6.2}$$
The first term in Eq. 5.6.1 matches what we know about scalar potentials
for conservative fields (just as we saw with Eq. 4.2.3). As we can see, vector
potentials are a bit trickier. The magnetic field is clearly the curl of $\vec{A}$ as
defined in Section 3.2. However, we can also see that a time-varying $\vec{A}$ can
contribute to the overall electric field, a phenomenon that is easily described
by Faraday’s law (Eq. 5.4.9c).
Magnetostatics
If we assume for the moment that $\vec{A}$ is constant in time, then we have what we
call the magnetostatic approximation (i.e. the study of static magnetic
fields). This is an approximation we’ve already made in Section 5.3 without
even realizing it. With this in mind, Eq. 5.6.1 becomes simply
$$\vec{E} = -\vec{\nabla}\phi \tag{5.6.3}$$
and we can say the electric field is a conservative field meaning it is path-
independent. From this special case, we can form an argument for the phys-
ical significance of the electric potential. Evaluating Eq. 5.6.3 over a line
integral from point a to point b, we get
$$\int_a^b \vec{E} \cdot d\vec{\ell} = -\int_a^b \vec{\nabla}\phi \cdot d\vec{\ell}.$$
The right side of this equation is just the fundamental theorem of vector
calculus (Eq. 3.4.4), so
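$$\begin{aligned}
\int_a^b \vec{E} \cdot d\vec{\ell} &= -\int_a^b d\phi \\
\int_a^b \vec{E} \cdot d\vec{\ell} &= -\left[\phi|_b - \phi|_a\right] \\
\int_a^b \vec{E} \cdot d\vec{\ell} &= \phi|_a - \phi|_b.
\end{aligned} \tag{5.6.4}$$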
Therefore, the path integral of the electric field is just the difference in po-
tential (or the potential difference) between the two endpoints a and b.
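Eq. 5.6.4 is easy to test numerically. The sketch below assumes a single static point charge at the origin (so $\phi = k_E q / r$) and integrates $\vec{E} \cdot d\vec{\ell}$ along a straight segment with the midpoint rule. The function names, the charge, and the endpoints are all hypothetical choices for illustration.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant (N m^2 / C^2), rounded

def potential(q, point):
    """Potential of a point charge q at the origin, evaluated at `point`."""
    r = math.sqrt(sum(x * x for x in point))
    return K_E * q / r

def field(q, point):
    """Electric field of the same point charge, as a 3-tuple."""
    r = math.sqrt(sum(x * x for x in point))
    return tuple(K_E * q * x / r**3 for x in point)

def line_integral(q, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of E . dl along the segment a -> b."""
    dl = tuple((bj - aj) / steps for aj, bj in zip(a, b))
    total = 0.0
    for n in range(steps):
        mid = tuple(aj + (n + 0.5) * d for aj, d in zip(a, dl))
        total += sum(e * d for e, d in zip(field(q, mid), dl))
    return total
```

For any path that avoids the charge, `line_integral(q, a, b)` should agree with `potential(q, a) - potential(q, b)` up to the discretization error, regardless of the path chosen.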
Remember Faraday’s law in integral form from 1831? The left side of Eq.
5.3.8 has a very similar integral form, which is no coincidence. A chang-
ing magnetic flux induced what Faraday called an electromotive force (or
emf).
If we substitute Eq. 5.6.3 into Gauss’s law (Eq. 5.4.9a), then we get
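$$\begin{aligned}
\vec{\nabla} \cdot \vec{E} &= \frac{\rho}{\epsilon_0} \\
\vec{\nabla} \cdot \left(-\vec{\nabla}\phi\right) &= \frac{\rho}{\epsilon_0} \\
\vec{\nabla}^2 \phi &= -\frac{\rho}{\epsilon_0},
\end{aligned} \tag{5.6.5}$$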
which is called Poisson’s equation named for Siméon Denis Poisson. In
free space where there is no charge, this takes the form
$$\vec{\nabla}^2 \phi = 0, \tag{5.6.6}$$
which is called Laplace’s equation named for Pierre-Simon Laplace (he did
a lot for electrodynamics). Eq. 5.6.6 is applicable in quite a few unrelated
fields (e.g. Thermodynamics), but is most noted in electrodynamics. The
second space derivative operator on the left of Eqs. 5.6.5 and 5.6.6 is referred
to as the Laplacian (see Section 3.2) for reasons which should now be obvious.
In this magnetostatic case, the solution to Eq. 5.6.5 is given by an equa-
tion similar to Coulomb’s law (Eq. 5.2.7):
$$d\phi = k_E \frac{dq}{r} = k_E \frac{\rho}{r}\, dV \tag{5.6.7}$$
or more specifically
$$d\phi = k_E \frac{\rho(t, \vec{r}_q)}{|\vec{r}_p - \vec{r}_q|}\, dV_q. \tag{5.6.8}$$
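To see Eq. 5.6.7 in action, we can add up the contributions $d\phi = k_E\, dq / r$ numerically. The sketch below uses a one-dimensional analog (a uniformly charged straight rod with $dq = \lambda\, dx$) because it has a simple closed form to compare against. The rod geometry and the function names are assumptions made for this illustration.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant (N m^2 / C^2), rounded

def rod_potential_numeric(total_charge, half_length, distance, pieces=20000):
    """Sum d(phi) = k_E dq / r over small pieces of a uniformly charged rod.

    The rod lies on the x-axis from -half_length to +half_length and the
    field point sits a perpendicular `distance` from the rod's center.
    """
    lam = total_charge / (2.0 * half_length)  # linear charge density
    dx = 2.0 * half_length / pieces
    phi = 0.0
    for n in range(pieces):
        x = -half_length + (n + 0.5) * dx      # midpoint of this piece
        r = math.sqrt(x * x + distance * distance)
        phi += K_E * lam * dx / r
    return phi

def rod_potential_exact(total_charge, half_length, distance):
    """Closed form of the same integral: k_E * lambda * 2 * asinh(L / d)."""
    lam = total_charge / (2.0 * half_length)
    return K_E * lam * 2.0 * math.asinh(half_length / distance)
```

The two functions should agree to many digits, which is a reminder that the "solution" in Eq. 5.6.7 is really an instruction to integrate over the source.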
It may not have been obvious at the time, but a similar relation was found
for $\vec{A}$ in Eq. 5.3.6. Taking note of Eq. 5.6.2, we get
$$d\vec{A} = k_M \frac{\vec{J}}{r}\, dV \tag{5.6.9}$$
or more specifically
$$d\vec{A} = k_M \frac{\vec{J}(\vec{r}_I)}{|\vec{r}_p - \vec{r}_I|}\, dV_I. \tag{5.6.10}$$
These equations are assuming that both charge density, $\rho$, and current
density, $\vec{J}$, go to zero at infinity as they should in the real universe. In
approximations that violate this, we have to be a little more creative.
Gauge Invariance
In 1848, Gustav Kirchhoff showed the electric potential, φ, to be the same as
the “electric pressure” in Georg Simon Ohm’s law regarding electric circuits
(published in 1827). We now refer to this quantity as voltage. This is a fact
Heaviside was well aware of, but still opted for vector fields $\vec{E}$ and $\vec{B}$ because
the value of zero always meant something physical. The same cannot be said
when $\phi$ and $\vec{A}$ have a value of zero.
The potential functions can be shifted by particular terms and still leave the
vector fields $\vec{E}$ and $\vec{B}$ unchanged. This is called gauge invariance. The
act of choosing a gauge is called gauge fixing and it allows us to not only
be speaking the same language, but also simplify equations a bit. The gauge
invariance for electrodynamic potentials is given by
$$\phi \rightarrow \phi - \frac{\partial f}{\partial t} \tag{5.6.11a}$$
$$\vec{A} \rightarrow \vec{A} + \vec{\nabla} f \tag{5.6.11b}$$
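where $f(t, \vec{r})$ is an arbitrary gauge function. We can substitute Eq. Set
5.6.11 into Eq. 5.6.1,
$$\begin{aligned}
\vec{E} &= -\vec{\nabla}\left(\phi - \frac{\partial f}{\partial t}\right) - \frac{\partial}{\partial t}\left(\vec{A} + \vec{\nabla} f\right) \\
\vec{E} &= -\vec{\nabla}\phi + \vec{\nabla}\frac{\partial f}{\partial t} - \frac{\partial \vec{A}}{\partial t} - \frac{\partial}{\partial t}\vec{\nabla} f.
\end{aligned}$$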
Since the del operator and the time derivative are commutative, the mixed
terms cancel leaving us with just Eq. 5.6.1. We can make similar substitutions
in Eq. 5.6.2 arriving at
$$\vec{B} = \vec{\nabla} \times \left(\vec{A} + \vec{\nabla} f\right) = \vec{\nabla} \times \vec{A} + \vec{\nabla} \times \vec{\nabla} f.$$
Since the curl of the gradient is always zero (Eq. 3.2.6), the second term
disappears and we get just Eq. 5.6.2.
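The cancellation argued above can also be checked numerically. The following sketch works in two space dimensions with central finite differences. The particular potentials and the gauge function $f$ are arbitrary smooth functions invented for the test, and the helper names are hypothetical.

```python
import math

H = 1e-4  # step size for central finite differences

def d(func, args, i):
    """Central-difference partial derivative of func with respect to args[i]."""
    up = list(args); up[i] += H
    dn = list(args); dn[i] -= H
    return (func(*up) - func(*dn)) / (2.0 * H)

# Made-up smooth potentials and gauge function of (t, x, y).
def phi(t, x, y): return math.sin(x) * math.cos(t) + x * y
def a_x(t, x, y): return math.exp(-t) * y * y
def a_y(t, x, y): return math.cos(x - t)
def f(t, x, y):   return math.sin(t) * x * x * y

# Gauge-transformed potentials following Eq. Set 5.6.11.
def phi2(t, x, y): return phi(t, x, y) - d(f, (t, x, y), 0)
def a_x2(t, x, y): return a_x(t, x, y) + d(f, (t, x, y), 1)
def a_y2(t, x, y): return a_y(t, x, y) + d(f, (t, x, y), 2)

def fields(p, ax, ay, point):
    """Return (E_x, E_y, B_z) at point = (t, x, y) from Eqs. 5.6.1 and 5.6.2."""
    e_x = -d(p, point, 1) - d(ax, point, 0)
    e_y = -d(p, point, 2) - d(ay, point, 0)
    b_z = d(ay, point, 1) - d(ax, point, 2)
    return e_x, e_y, b_z
```

Calling `fields` with the original and with the transformed potentials at the same point should give the same three numbers up to roundoff, since the mixed derivatives of `f` cancel exactly as in the derivation.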
Gauges in physics are not usually defined by specifying a function f , but
rather by specifying the divergence of $\vec{A}$. Eqs. 5.6.1 and 5.6.2 say nothing
about how $\vec{A}$ diverges and so it is an arbitrary quantity. There are a couple of
very popular gauges: the Coulomb gauge, given by
$$\vec{\nabla} \cdot \vec{A} = 0, \tag{5.6.12}$$
and the Lorenz gauge, given by
$$\vec{\nabla} \cdot \vec{A} = -\frac{1}{c^2} \frac{\partial \phi}{\partial t}. \tag{5.6.13}$$
These have particular uses when applying them to Maxwell’s equations.
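$$\begin{aligned}
\vec{\nabla} \cdot \vec{E} &= \frac{\rho}{\epsilon_0} \\
\vec{\nabla} \cdot \left(-\vec{\nabla}\phi - \frac{\partial \vec{A}}{\partial t}\right) &= \frac{\rho}{\epsilon_0} \\
-\vec{\nabla} \cdot \vec{\nabla}\phi - \vec{\nabla} \cdot \left(\frac{\partial \vec{A}}{\partial t}\right) &= \frac{\rho}{\epsilon_0} \\
-\vec{\nabla}^2 \phi - \frac{\partial}{\partial t}\left(\vec{\nabla} \cdot \vec{A}\right) &= \frac{\rho}{\epsilon_0}.
\end{aligned} \tag{5.6.14}$$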
This is where the gauge fixing comes into play. Under the Coulomb gauge
(Eq. 5.6.12), we get
$$\vec{\nabla}^2 \phi = -\frac{\rho}{\epsilon_0},$$
which is just Poisson’s equation (Eq. 5.6.5), just like with magnetostatics. The
Coulomb gauge does make it particularly easy to find the electric potential,
but $\vec{A}$ is still rather challenging. In this more general case, $\phi$ is not enough to
determine $\vec{E}$ (see Eq. 5.6.1), so $\vec{A}$ must be found. Furthermore, changes in $\phi$
over time propagate through space instantaneously, which is still physically
legal because $\phi$ is not a physically measurable quantity. At this moment,
you might be yelling at this book saying “I’ve measured potential before!”
The truth is you’ve never measured potential. You haven’t even measured
$\vec{E}$. What you do measure is the effect $\vec{E}$ has on physical objects and you
interpret this as a $\phi$ or an $\vec{E}$. Since $\vec{E}$ is also dependent on $\vec{A}$ and changes
in $\vec{A}$ propagate at the speed of light, we’re not violating any physical laws.
Under the Lorenz gauge (Eq. 5.6.13), things are a bit simpler overall. Eq.
5.6.14 becomes
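$$\begin{aligned}
-\vec{\nabla}^2 \phi - \frac{\partial}{\partial t}\left(-\frac{1}{c^2} \frac{\partial \phi}{\partial t}\right) &= \frac{\rho}{\epsilon_0} \\
\vec{\nabla}^2 \phi - \frac{1}{c^2} \frac{\partial^2 \phi}{\partial t^2} &= -\frac{\rho}{\epsilon_0}.
\end{aligned} \tag{5.6.15}$$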
This might seem a bit more complicated, but now changes in φ over time
only propagate at the speed of light, so it makes more sense. The Lorenz
gauge also simplifies Eq. 5.4.9d to
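$$\begin{aligned}
\vec{\nabla} \times \vec{B} &= \mu_0 \vec{J} + \mu_0 \epsilon_0 \frac{\partial \vec{E}}{\partial t} \\
\vec{\nabla} \times \left(\vec{\nabla} \times \vec{A}\right) &= \mu_0 \vec{J} + \mu_0 \epsilon_0 \frac{\partial}{\partial t}\left(-\vec{\nabla}\phi - \frac{\partial \vec{A}}{\partial t}\right).
\end{aligned}$$
Expanding the left side with Eq. 3.2.8, applying the Lorenz gauge (Eq. 5.6.13), and
using $\mu_0 \epsilon_0 = 1/c^2$ (Eq. 5.5.4) on the right gives
$$-\frac{1}{c^2} \frac{\partial}{\partial t} \vec{\nabla}\phi - \vec{\nabla}^2 \vec{A} = \mu_0 \vec{J} - \frac{1}{c^2} \frac{\partial}{\partial t} \vec{\nabla}\phi - \frac{1}{c^2} \frac{\partial^2 \vec{A}}{\partial t^2}.$$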
The first term on the left cancels with the second term on the right.
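$$\begin{aligned}
-\vec{\nabla}^2 \vec{A} &= \mu_0 \vec{J} - \frac{1}{c^2} \frac{\partial^2 \vec{A}}{\partial t^2} \\
\vec{\nabla}^2 \vec{A} - \frac{1}{c^2} \frac{\partial^2 \vec{A}}{\partial t^2} &= -\mu_0 \vec{J}.
\end{aligned} \tag{5.6.16}$$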
Not only do Eqs. 5.6.15 and 5.6.16 retain the beautiful symmetry of Maxwell’s
equations, but they also very quickly show wave equations in free space for
light. The only downside to writing Maxwell’s equations this way is that
we’re dealing with second-order differential equations rather than first-order
ones. Having to keep track of a gauge may be something people like Oliver
Heaviside didn’t want to do, but we’ll see in a later chapter that we can
show the electric potential and magnetic vector potential to be more physical than
the electric and magnetic fields.
Chapter 6
Tensor Analysis
Tensors of higher rank are called dyads (rank-2 $T_{ij}$), triads (rank-3 $T_{ijk}$),
quadads (rank-4 $T_{ijkl}$), etc. However, these names are seldom used. The
number of values each index can take tells us the tensor’s dimension. For
• $T_i$ is a covariant vector.
• $T^i$ is a contravariant vector.
and writing the summation sign can get old fast, so we have a way of imply-
ing the summation instead. For example, let’s take the 3-space dot product
given by Eq. 2.2.2 as
$$\vec{A} \cdot \vec{B} = \sum_{i=1}^{3} A_i B_i = A_1 B_1 + A_2 B_2 + A_3 B_3.$$
Under the notational standards given in this section, however, one of these
vectors should be covariant and the other contravariant. Therefore, the dot
product is really
$$\vec{A} \cdot \vec{B} = \sum_{i=1}^{3} A^i B_i = A^1 B_1 + A^2 B_2 + A^3 B_3,$$
where the index i is repeated (i.e. summed over) and the vectors are dimension-3,
as implied by the use of Latin letters.
Example 6.2.1
When we’re first introduced to the moment of inertia, it’s defined as a mea-
sure of an object’s ability to resist changes in rotational motion. We’re also
given little formulae which all depend on mass and, more importantly, the
mass distribution. However, in general, moment of inertia also depends on
the orientation of the rotational axis and the best way to represent such
ambiguity is with a tensor.
In order to find the form of this tensor in index notation, we’ll start with
the origin of the moment of inertia: spin angular momentum. Spin angular
momentum is given by
$$\vec{L}_{\text{spin}} = \sum \vec{r} \times \vec{p} = \sum m\, \vec{r} \times \vec{v}$$
where $\vec{r}$ and $\vec{v}$ are the position and velocity, respectively, of a point mass m
relative to the center of mass of the body. If the body has enough m’s closely
packed, then we can treat the body as continuous. Under those conditions,
spin angular momentum is
$$\vec{L}_{\text{spin}} = \int \vec{r} \times \vec{v}\, dm$$
Figure 6.1: This is an arbitrary rigid body. Its center of mass (COM), axis of rotation
(AOR), and mass element (dm) have been labeled. The position of dm relative to the
COM is given by $\vec{r}$.
Since both $\vec{\omega}$ and $\vec{r}_{\parallel}$ are parallel to the rotational axis, their cross product is
zero according to Eq. 2.2.3 and the velocity of each mass element becomes
$$\vec{v} = \vec{\omega} \times \vec{r}.$$
What we have here is a triple product which obeys the identity given by Eq.
2.2.12. Now the spin angular momentum can be written as
$$\vec{L}_{\text{spin}} = \int \left[\vec{\omega}\, (\vec{r} \cdot \vec{r}) - \vec{r}\, (\vec{r} \cdot \vec{\omega})\right] dm.$$
Dot products in index notation are given by Eq. 6.2.1, so we can write
$$L_i = \int \left(\omega_i\, r^k r_k - r_i\, r^j \omega_j\right) dm,$$
which is the $i$th component of spin angular momentum. The index i is referred
to as a free index, whereas j and k are each a summation index. All free
indices on the left side of a tensor equation must match those on the right
side in symbol and location.
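We cannot simply pull out the $\omega$ because each one is indexed differently.
The rank of each term must be maintained, so we need to use a special rank-2
mixed tensor given by
$$\delta^i_j = \begin{cases} 1, & \text{when } i = j \\ 0, & \text{when } i \neq j \end{cases} \tag{6.2.2}$$
which is called the Kronecker delta. With this tensor, we can say $\omega_i = \delta_i^j \omega_j$
and spin angular momentum becomes
$$\begin{aligned}
L_i &= \int \left(\delta_i^j \omega_j\, r^k r_k - r_i\, r^j \omega_j\right) dm \\
L_i &= \int \left(\delta_i^j\, r^k r_k - r_i\, r^j\right) dm\; \omega_j.
\end{aligned}$$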
which is the moment of inertia tensor. This leaves us with a spin angular
momentum of $L_i = I_i^j \omega_j$. The moment of inertia tensor is a rank-2 dimension-3
tensor. If the axis of rotation is a principal axis (i.e. an axis of symmetry)
of the rigid body, then all components where $i \neq j$ will be zero.
Example 6.3.1
This matrix vector notation carries over into operations like the dot product
given in Eq. 6.2.1. A common application of the dot product is work (as seen
in Example 2.2.1) defined by
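$$W = \int \vec{F} \cdot d\vec{s} = \int \vec{F} \cdot \vec{v}\, dt.$$
We can also write the vectors $\vec{F}$ and $\vec{v}$ as matrices. In matrix notation, work
becomes
$$W = \int \begin{bmatrix} F^1 & F^2 & F^3 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} dt,$$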
which by matrix operations would have exactly the same result as the stan-
dard dot product.
Don’t be fooled by anyone claiming covariant vectors are always column
matrices (and contravariant vectors are always row matrices). The dot product
given in Eq. 6.2.1 is valid if it’s written $A^i B_i$ or $B_i A^i$ and should still
result in a scalar. In other words, the row matrix must always be written
This tensor is symmetric and has discernible pieces. The lower right 3 × 3 is
the Cauchy stress tensor, $T_{00}$ is the energy density, $[T_{01}, T_{02}, T_{03}]$ is the energy
flux vector, and $[T_{10}, T_{20}, T_{30}]$ is the momentum density vector (which, by
symmetry, is the same as the energy flux vector). We’ll get into the details
later in the book.
Another example is the Kronecker Delta defined by Eq. 6.2.2 and given
in matrix notation as
$$\delta^i_j \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
This is simply the dimension-3 identity matrix. It is important because
it maintains rank when factoring an expression, just as in Example
6.2.1.
Example 6.3.2
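In Example 6.2.1, the final result was the equation $L_i = I_i^j \omega_j$, which in
matrix notation is
$$\begin{bmatrix} L_1 \\ L_2 \\ L_3 \end{bmatrix} = \begin{bmatrix} I_1^1 & I_1^2 & I_1^3 \\ I_2^1 & I_2^2 & I_2^3 \\ I_3^1 & I_3^2 & I_3^3 \end{bmatrix} \begin{bmatrix} \omega_1 \\ \omega_2 \\ \omega_3 \end{bmatrix}.$$
Operating using matrix multiplication results in
$$\begin{bmatrix} L_1 \\ L_2 \\ L_3 \end{bmatrix} = \begin{bmatrix} I_1^1 \omega_1 + I_1^2 \omega_2 + I_1^3 \omega_3 \\ I_2^1 \omega_1 + I_2^2 \omega_2 + I_2^3 \omega_3 \\ I_3^1 \omega_1 + I_3^2 \omega_2 + I_3^3 \omega_3 \end{bmatrix}$$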
which has components in a form that match the original index notation. The
index j is the summation index and each of these components is a summation
over those indices.
If we wanted to isolate the moment of inertia tensor in matrix form, then
we would need to decide on a coordinate system. Let’s keep things simple
and choose Cartesian. Based on Eq. 6.2.3, the moment of inertia is
$$I_i^j \longrightarrow \int \begin{bmatrix} \delta_1^1 x^k x_k - x_1 x^1 & \delta_1^2 x^k x_k - x_1 x^2 & \delta_1^3 x^k x_k - x_1 x^3 \\ \delta_2^1 x^k x_k - x_2 x^1 & \delta_2^2 x^k x_k - x_2 x^2 & \delta_2^3 x^k x_k - x_2 x^3 \\ \delta_3^1 x^k x_k - x_3 x^1 & \delta_3^2 x^k x_k - x_3 x^2 & \delta_3^3 x^k x_k - x_3 x^3 \end{bmatrix} dm.$$
Since only the diagonal components are non-zero in the Kronecker Delta, we
have
$$I_i^j \longrightarrow \int \begin{bmatrix} x^k x_k - x_1 x^1 & -x_1 x^2 & -x_1 x^3 \\ -x_2 x^1 & x^k x_k - x_2 x^2 & -x_2 x^3 \\ -x_3 x^1 & -x_3 x^2 & x^k x_k - x_3 x^3 \end{bmatrix} dm.$$
Now that we’ve performed all the operations associated with the indices, we
can drop that notation entirely arriving at
$$I_i^j \longrightarrow \int \begin{bmatrix} y^2 + z^2 & -xy & -xz \\ -yx & x^2 + z^2 & -yz \\ -zx & -zy & x^2 + y^2 \end{bmatrix} dm$$
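To make the Cartesian matrix above concrete, here is a small Python sketch that replaces the integral over dm with a sum over a few point masses (an assumption made only to keep the example short). It builds the tensor from $\delta_{ij}\, r^2 - x_i x_j$ and compares $L = I\omega$ against the direct sum of $m\, \vec{r} \times (\vec{\omega} \times \vec{r})$. All names and numbers are hypothetical.

```python
def cross(a, b):
    """Ordinary 3D cross product of two tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inertia_tensor(masses):
    """masses is a list of (m, (x, y, z)); returns a 3x3 nested list."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in masses:
        r2 = sum(c * c for c in r)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0   # Kronecker delta
                tensor[i][j] += m * (delta * r2 - r[i] * r[j])
    return tensor

def angular_momentum_from_tensor(masses, omega):
    """L_i = I_ij omega_j, summing over the repeated index j."""
    tensor = inertia_tensor(masses)
    return [sum(tensor[i][j] * omega[j] for j in range(3)) for i in range(3)]

def angular_momentum_direct(masses, omega):
    """Sum of m * r x (omega x r) over the point masses."""
    total = [0.0, 0.0, 0.0]
    for m, r in masses:
        l = cross(r, cross(omega, r))
        total = [t + m * li for t, li in zip(total, l)]
    return total
```

Both routes should give the same vector, and the tensor comes out symmetric, as the matrix form suggests.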
You might think the matrix notation ends with rank-2 tensors. However,
while first learning about number arrays in high school computer program-
ming class, I designed a visual representation for higher rank tensors akin
to matrices. Let’s consider the pattern developing here. A scalar (rank-
0 tensor) has a single component, a vector (rank-1 tensor) has a length of
components, and a rank-2 tensor has a length and width of components. It
stands to reason that a rank-3 tensor should have a length, width, and depth
of components like that given in Figure 6.2.
Rank-4 tensors, like those found all over General Relativity, might seem
impossible under this pattern until you consider the subtle aspects. A rank-1
tensor is a collection of rank-0 tensors, a rank-2 is a collection of rank-1’s,
and a rank-3 is a collection of rank-2’s. Therefore, I would argue that a
rank-4 is simply a collection of rank-3’s like that given in Figure 6.3. Un-
fortunately, we’re beginning to see the problem with matrix notation. How
does something like a rank-4 tensor operate?! It is usually best to yield to
index notation and treat matrix notation as simply a way to visualize the
quantity.
Figure 6.2: These are both rank-3 tensors in matrix notation. The tensor on the left is
dimension-3 ($T_{ijk}$) and the tensor on the right is dimension-4 ($T_{\alpha\beta\gamma}$).
Figure 6.3: This is a rank-4 dimension-3 tensor in matrix notation. In index notation, it
would be represented by $T_{ijkl}$ where the final index l is given by the large axis on the left
(i.e. it tells you which rank-3 you’re in).
Line Element
The simplest, most straight-forward way to represent a coordinate system
with tensors is to use a scalar quantity called a line element. This line
element describes the infinitesimal distance between two consecutive points
in a space and will look different depending on the coordinate system choice.
For example, in Cartesian three-space, the line element is
where the 2’s are exponents. With a careful look at Eq. 3.4.3, we can see
that
where $d\vec{\ell}$, the path element, is written in whatever coordinate system you
may need.
Metric Tensor
Formally, we treat the scale factors (to use terminology from Section 3.4)
separate from the coordinates $x^i$, so we’d like to separate these scale factors
in the definition of the line element as well. This requires defining a new
quantity called the metric tensor, $g_{ij}$. Now, the line element can be written
where $\vec{e}_j$ is a coordinate basis vector and $(\vec{e}_j)^k$ is the $k$th component of that
vector. In Cartesian coordinates, we have
$$g_{ij} = \delta_{ij} \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}; \tag{6.4.5}$$
where the 2’s are exponents. In each case, we see the tensor is diagonal with
components equal to the square of the scale factor (e.g. $g_{\theta\theta} = \vec{e}_\theta \cdot \vec{e}_\theta = h_\theta h_\theta$).
However, this is only the case when the space is described by orthogonal basis
vectors (i.e. $\vec{e}_i \cdot \vec{e}_j = 0$ when $i \neq j$). The metric tensor may not be diagonal
in general, but it is always symmetric since the dot product is commutative.
• $T_i = g_{ij} T^j$.
• $T_{ij} = g_{ik} T^k{}_j$.
This pattern continues for higher rank tensors. Raising indices requires the
inverse metric tensor, which can be found using standard matrix algebra.
For example, it is $g^{ij} = g_{ij}$ in Cartesian coordinates and
$$g^{ij} \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{r^2} & 0 \\ 0 & 0 & \frac{1}{r^2 \sin^2\theta} \end{bmatrix} \tag{6.4.7}$$
in spherical coordinates.
• $T^i = g^{ij} T_j$.
• $T^i{}_j = g^{ik} T_{kj}$.
• $T^{ij} = g^{ik} T_{kl}\, g^{lj}$.
This pattern also continues for higher rank tensors. An interesting result of
all this is $g^i_j = g^{ik} g_{kj} = \delta^i_j$, which makes sense if you think in terms of inverse
matrices. Raising and lowering indices is very useful when writing complex
tensor equations.
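Here is a minimal Python sketch of raising and lowering an index with the spherical metric, assuming the diagonal form with components 1, $r^2$, and $r^2 \sin^2\theta$ (the inverse of Eq. 6.4.7). The function names and the sample vector are hypothetical.

```python
import math

def metric(r, theta):
    """Spherical metric g_ij as a 3x3 nested list (coordinates r, theta, phi)."""
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]

def inverse_metric(r, theta):
    """Inverse metric g^ij, matching the matrix in Eq. 6.4.7."""
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0 / (r * r), 0.0],
            [0.0, 0.0, 1.0 / (r * math.sin(theta)) ** 2]]

def lower_index(g, upper):
    """T_i = g_ij T^j, summing over j."""
    return [sum(g[i][j] * upper[j] for j in range(3)) for i in range(3)]

def raise_index(g_inv, lower):
    """T^i = g^ij T_j, summing over j."""
    return [sum(g_inv[i][j] * lower[j] for j in range(3)) for i in range(3)]

def matmul(a, b):
    """Product of two 3x3 nested lists, i.e. (a b)_ij = a_ik b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
```

Lowering and then raising should return the original components, and `matmul(inverse_metric(r, theta), metric(r, theta))` should come out as the identity, which is the $g^{ik} g_{kj} = \delta^i_j$ statement in matrix form.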
Figure 6.4: This diagram demonstrates how the fundamental nature of a vector remains
unchanged when the coordinate system is rotated (center) or reflected (far right).
where $(\hat{e}_k)^i$ is the $i$th coordinate basis component of the $k$th orthonormal basis
vector (meaning you’ll need to write out the orthonormal basis vectors in the
coordinate basis). Performing this process on the metric tensor always gives
$$g_{\hat{k}\hat{l}} = (\hat{e}_k)^i (\hat{e}_l)^j g_{ij} \longrightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{6.4.9}$$
which is just the metric tensor for Cartesian space. Sometimes, $g_{\hat{k}\hat{l}}$ is written
as $\eta_{kl}$, but I find that much less descriptive and there are already enough
symbols to worry about. This projection is usually a final step in any work,
but must eventually be done to make real sense of your results especially if
those results will be used in another physics discipline.
Figure 6.5: The point mass (the solid blue dot) is traveling along the circular path. Its
velocity $\vec{v}$, position $\vec{r}$, and angular momentum $\vec{L}$ are given at an arbitrary point along the
path.
Example 6.5.1
A point mass m is traveling in uniform circular motion with speed v at a
distance of R from the origin. Find the angular momentum of this object with
the z-axis directed along the axis of rotation. Then, rotate the coordinate
system an angle of θ about the x-axis and find the angular momentum again.
$$\vec{L} = \vec{r} \times \vec{p}$$
where $\vec{r}$ is the position of the object relative to the origin and $\vec{p}$ is the
linear momentum of the object. Any quantity defined as a cross product
between two real vectors is automatically a pseudovector. (Note: If one
of the quantities in the cross product is a pseudovector, then the result
is a real vector. For example, $\vec{v} = \vec{\omega} \times \vec{r}$ where $\vec{\omega}$ is the pseudovector.)
• Now we’ll do the rotation the easy way by operating a Cartesian rota-
tion matrix on the angular momentum vector. We get
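$$\vec{L} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ mvR \end{bmatrix} = \begin{bmatrix} 0 \\ -mvR \sin\theta \\ mvR \cos\theta \end{bmatrix}$$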
which still has a magnitude of mvR. It would appear that the angular
momentum has rotated counterclockwise by an angle θ. However, it is
really the z-axis which has rotated clockwise. The angular momentum
is still directed along the axis of rotation of the point mass and, since
its magnitude hasn’t changed, we can conclude its fundamental nature
hasn’t changed either.
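The rotation in this example is short enough to check by hand, but a few lines of Python make the "magnitude is unchanged" claim explicit. This is only a sketch: the function names are hypothetical and the values of m, v, R, and θ used with it are made up.

```python
import math

def rotate_about_x(vec, theta):
    """Apply the Cartesian rotation matrix about the x-axis used above."""
    x, y, z = vec
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)

def magnitude(vec):
    """Euclidean length of a 3-component vector."""
    return math.sqrt(sum(component * component for component in vec))
```

For example, with m = 2, v = 3, and R = 0.5 the original vector is `(0.0, 0.0, 3.0)`. Rotating it by any angle changes the y and z components, but `magnitude` of the result stays at mvR = 3.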
Example 6.5.2
A point mass m is traveling in uniform circular motion with speed v at
a distance of R from the origin with the z-axis directed along the axis of
rotation. Translate the coordinate system by −R along the y-axis and find
the angular momentum.
Figure 6.6: The point mass (the solid blue dot) is traveling along the circular (gray dashed)
path. Its velocity $\vec{v}$ and position $\vec{r}$ are given at an arbitrary point along the path. A few
useful angles are also shown.
$$r = 2R \sin\phi.$$
where the factor of $\sin\phi$ comes from Eq. 2.2.3 and we’ve realized both
$\vec{r}$ and $\vec{v}$ are always in the xy-plane. Substituting in for r, we get
Not only is this not the mvRẑ we got in Example 6.5.1, but it’s variable!
It changes as the point mass goes around the circle.
$$\vec{\tau} = \vec{r} \times \vec{F}$$
where $\vec{F}$ is the force causing the curved motion. In this case, it’s
uniform circular motion, so this force must always point toward the
center of the circle (i.e. a centripetal force). Since the angle between $\vec{F}$
and $\vec{v}$ is π/2 and the angle between $\vec{v}$ and $\vec{r}$ is θ = φ, we get
$$\vec{\tau} = F r \sin\left(\frac{\pi}{2} + \phi\right) \hat{z}.$$
Substituting in what we know of r and centripetal force results in
$$\begin{aligned}
\vec{\tau} &= m \frac{v^2}{R} (2R \sin\phi) \sin\left(\frac{\pi}{2} + \phi\right) \hat{z} \\
\vec{\tau} &= m v^2 (2 \sin\phi) \sin\left(\frac{\pi}{2} + \phi\right) \hat{z}.
\end{aligned}$$
Since $\sin\left(\frac{\pi}{2} + \phi\right) = \cos\phi$, the torque is
$$\vec{\tau} = m v^2 \sin(2\phi)\, \hat{z},$$
which is also variable. The important point here is the torque in the
original coordinate system was zero at all times, yet one little shift of
the coordinate system (not the physical system) and suddenly there’s
a torque. That’s the weirdness of pseudotensors. If real tensors are
zero in one coordinate system, they must be zero in all of
them.
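We can confirm the translated-frame torque with a direct cross product. The sketch below makes two assumptions that are not spelled out in the surviving text: the new origin sits at (0, −R) in the original coordinates, and φ is the polar angle of the position vector measured from the translated x-axis. The function name and the angle α (the position of the mass around the circle in the original frame) are hypothetical.

```python
import math

def shifted_torque(m, v, R, alpha):
    """Torque about the translated origin for a mass at angle alpha on the circle.

    Returns (tau_z, r, phi): the z-component of r x F, the distance from the
    translated origin, and the polar angle of the position in the new frame.
    """
    # Position on the circle in the original frame.
    x, y = R * math.cos(alpha), R * math.sin(alpha)
    # Same point in the translated frame (assumed origin at (0, -R)).
    xs, ys = x, y + R
    # Centripetal force points at the circle's center.
    force = m * v * v / R
    fx, fy = -force * math.cos(alpha), -force * math.sin(alpha)
    tau_z = xs * fy - ys * fx
    r = math.hypot(xs, ys)
    phi = math.atan2(ys, xs)
    return tau_z, r, phi
```

Under those assumptions, `tau_z` should match $mv^2 \sin(2\phi)$ and `r` should match $2R \sin\phi$ for any point on the circle other than the new origin itself.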
Another way to tell the difference between some tensors and pseudotensors
is by changing the physical system. An easy-to-see example is a magnetic
field (a pseudovector) generated by a current-carrying wire loop like
that shown in Figure 6.7. When the whole scenario is reflected, the magnetic
field doesn’t reflect in the way you’d expect, but points in the opposite
direction. If you’re not convinced the B-field is a pseudovector, take a look at the
Biot-Savart law (Eq. 5.2.10). It’s defined with a cross product of real vectors,
which we’ve already stated makes its status automatic. It turns out that, in
general, both $\vec{E}$ and $\vec{B}$ are pseudovectors, but we’ll leave that development
for a later chapter.
Figure 6.7: On the left, we have a wire loop carrying an electric current in a counterclockwise
direction as viewed from above as well as the magnetic field it generates. On the
right, we have reflected the scenario on the left horizontally (i.e. across a vertical axis).
The direction of the current reflects as we’d expect because its motion is represented by
a vector. However, the magnetic field (a pseudovector) gains an extra reflection vertically
(i.e. across a horizontal axis).
It all really depends on the pseudotensor. Some of them transform just
fine under rotations, but not translations (or vice versa). Some of them
transform fine between rectilinear coordinates, but not curvilinear. Some
of them simply pick up an extra scalar factor when transforming. Others
transform in very complex ways. With experience, you just learn which ones
are tensors and which are pseudotensors. There’s no catch-all rule to figure
it out.
for many but not all coordinate transformations. For example, it doesn’t
work for the coordinate translation of a position vector, but it will work
quite nicely for transforming between the systems described in Chapter 1.
The Jacobian that transforms from cylindrical to Cartesian coordinates is
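$$J = \begin{bmatrix} \frac{\partial x}{\partial s} & \frac{\partial x}{\partial \phi} & \frac{\partial x}{\partial z} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial \phi} & \frac{\partial y}{\partial z} \\ \frac{\partial z}{\partial s} & \frac{\partial z}{\partial \phi} & \frac{\partial z}{\partial z} \end{bmatrix} = \begin{bmatrix} \cos\phi & -s \sin\phi & 0 \\ \sin\phi & s \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
which is similar to something we already saw in Eq. 1.2.6. For any dimensional
space, we can write this in index notation as
$$J_i^j = \frac{\partial x'^j}{\partial x^i} \tag{6.6.1}$$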
which transforms from the unprimed coordinate system to the primed one.
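The cylindrical-to-Cartesian Jacobian can be double-checked by differentiating the coordinate map numerically. This is a minimal sketch with hypothetical function names. It uses central differences and lays the result out with rows labeled by the Cartesian (primed) coordinate and columns by the cylindrical one, matching the matrix above.

```python
import math

def to_cartesian(s, phi, z):
    """Coordinate map (s, phi, z) -> (x, y, z)."""
    return (s * math.cos(phi), s * math.sin(phi), z)

def jacobian_analytic(s, phi, z):
    """The matrix quoted in the text."""
    return [[math.cos(phi), -s * math.sin(phi), 0.0],
            [math.sin(phi),  s * math.cos(phi), 0.0],
            [0.0, 0.0, 1.0]]

def jacobian_numeric(s, phi, z, h=1e-6):
    """Central-difference estimate of d(x'^j)/d(x^i), row j and column i."""
    point = [s, phi, z]
    columns = []
    for i in range(3):
        up = point[:]; up[i] += h
        dn = point[:]; dn[i] -= h
        f_up, f_dn = to_cartesian(*up), to_cartesian(*dn)
        columns.append([(a - b) / (2.0 * h) for a, b in zip(f_up, f_dn)])
    # columns[i][j] holds d(x'^j)/d(x^i); transpose so rows follow j.
    return [[columns[i][j] for i in range(3)] for j in range(3)]
```

The two matrices should agree entry by entry at any point with s > 0.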
When it comes to tensors with multiple indices, each index must be trans-
formed separately. For a contravariant tensor, we have
$$T'^{kl\ldots} = \frac{\partial x'^k}{\partial x^i} \frac{\partial x'^l}{\partial x^j} \cdots T^{ij\ldots} \tag{6.6.2}$$
and, for a covariant tensor, we have
$$T'_{kl\ldots} = \frac{\partial x^i}{\partial x'^k} \frac{\partial x^j}{\partial x'^l} \cdots T_{ij\ldots}, \tag{6.6.3}$$
where the primed coordinates are now on the bottom of the derivative. For a
mixed tensor, you simply transform lower indices using the Jacobians found
in Eq. 6.6.3 and upper indices using those in Eq. 6.6.2 (Note: Upper indices
in the denominator of a derivative are actually lower indices).
Equations involving just tensors are invariant under all coordinate trans-
formations because the transformations are just multiplicative factors which
will cancel on either side. Pseudotensors, on the other hand, do not always
transform according to Eqs. 6.6.2 and/or 6.6.3. This makes equations involv-
ing them a challenge at times. However, if the transformation doesn’t vary
much from that of a tensor, then it isn’t too difficult to adjust.
Example 6.6.1
Figure 6.8: This is the rank-3 Levi-Civita pseudotensor, $\varepsilon_{ijk}$, in matrix notation. Yellow
boxes represent a zero, green a 1, and blue a −1. It is clear only 6 of the $3^3 = 27$
components are non-zero.
where i, j, and k can each take on the. It’s special in that it’s antisymmetric
(i.e. $T_{ij} = -T_{ji}$) and also unit (i.e. composed of unit and/or zero vector
sections, but is not the zero-tensor). Using this, we can write the angular
momentum as
$$L_k = \varepsilon_{ijk}\, r^i p^j$$
in the coordinate basis or
$$L_{\hat{k}} = \varepsilon_{\hat{i}\hat{j}\hat{k}}\, r^{\hat{i}} p^{\hat{j}}$$
in the orthonormal basis, which is probably the more familiar for most of us.
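In the orthonormal basis, the double sum over i and j is easy to spell out in code. The sketch below uses 0-based indices (0, 1, 2 standing in for the three orthonormal directions), and the compact product formula for the Levi-Civita symbol is a common programming shortcut rather than anything taken from the text. The function names are hypothetical.

```python
def levi_civita(i, j, k):
    """+1 for even permutations of (0, 1, 2), -1 for odd ones, 0 if any index repeats."""
    return (i - j) * (j - k) * (k - i) // 2

def cross_levi_civita(r, p):
    """L_k = epsilon_ijk r_i p_j, summing over both i and j."""
    return [sum(levi_civita(i, j, k) * r[i] * p[j]
                for i in range(3) for j in range(3))
            for k in range(3)]
```

With the position along the first direction and the momentum along the second, only the third component of the result survives, which mirrors the cross product we are used to.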
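$$L_{\hat{k}} = \varepsilon_{\hat{s}\hat{j}\hat{k}}\, r^{\hat{s}} p^{\hat{j}} + \varepsilon_{\hat{\phi}\hat{j}\hat{k}}\, r^{\hat{\phi}} p^{\hat{j}} + \varepsilon_{\hat{z}\hat{j}\hat{k}}\, r^{\hat{z}} p^{\hat{j}}$$
$$\begin{aligned}
L_{\hat{k}} = {} & \varepsilon_{\hat{s}\hat{s}\hat{k}}\, r^{\hat{s}} p^{\hat{s}} + \varepsilon_{\hat{s}\hat{\phi}\hat{k}}\, r^{\hat{s}} p^{\hat{\phi}} + \varepsilon_{\hat{s}\hat{z}\hat{k}}\, r^{\hat{s}} p^{\hat{z}} \\
& + \varepsilon_{\hat{\phi}\hat{s}\hat{k}}\, r^{\hat{\phi}} p^{\hat{s}} + \varepsilon_{\hat{\phi}\hat{\phi}\hat{k}}\, r^{\hat{\phi}} p^{\hat{\phi}} + \varepsilon_{\hat{\phi}\hat{z}\hat{k}}\, r^{\hat{\phi}} p^{\hat{z}} \\
& + \varepsilon_{\hat{z}\hat{s}\hat{k}}\, r^{\hat{z}} p^{\hat{s}} + \varepsilon_{\hat{z}\hat{\phi}\hat{k}}\, r^{\hat{z}} p^{\hat{\phi}} + \varepsilon_{\hat{z}\hat{z}\hat{k}}\, r^{\hat{z}} p^{\hat{z}}
\end{aligned}$$
having expanded over both sums (i.e. both i and j). By Eq. 6.6.4, this
simplifies to
$$\begin{aligned}
L_{\hat{s}} &= r^{\hat{\phi}} p^{\hat{z}} - r^{\hat{z}} p^{\hat{\phi}} \\
L_{\hat{\phi}} &= -r^{\hat{s}} p^{\hat{z}} + r^{\hat{z}} p^{\hat{s}} \\
L_{\hat{z}} &= r^{\hat{s}} p^{\hat{\phi}} - r^{\hat{\phi}} p^{\hat{s}}
\end{aligned},$$
which is exactly what you’d expect for the components of a cross product.
We also know $\vec{r} = R\hat{s}$ and $\vec{p} = m\vec{v} = mv\hat{\phi}$, which makes everything
disappear except the first term in the z-component. The angular momentum
is
$$L_{\hat{z}} = r^{\hat{s}} p^{\hat{\phi}} = (R)(mv) = mvR,$$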
• So what happens in the coordinate basis? It’s almost the same process,
except based on Eq. 3.4.1, we have
$$\begin{aligned}
\vec{r} &= R\hat{s} = R\, \vec{e}_s \\
\vec{p} &= m\vec{v} = mv\hat{\phi} = \frac{mv}{R} R\hat{\phi} = \frac{mv}{R}\, \vec{e}_\phi
\end{aligned},$$
which makes linear momentum look a little strange. That’s what we
get for using a coordinate basis. The resulting angular momentum is
$$L_z = r^s p^\phi = (R)\left(\frac{mv}{R}\right) = mv,$$
which doesn’t make much sense. Linear momentum changed its appearance
because $\hat{\phi} \neq \vec{e}_\phi$, so it might not be too surprising at this
point. However, we know $\hat{z} = \vec{e}_z$ because $h_z = 1$ (see Section 3.4), so
it shouldn’t be any different (i.e. $L_z = L_{\hat{z}}$). What the heck happened?!
How did we lose a factor of R?
This actually comes down to the fact that the pseudotensor εijk is what
makes angular momentum (and every other result of a cross product) a pseu-
dovector. The Levi-Civita pseudotensor transforms by
With Eq. 6.6.5 in mind, the equation for angular momentum is actually
\[ L_k = \sqrt{|\det(g)|}\; \varepsilon_{ijk}\, r^i p^j, \tag{6.6.6} \]
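As a quick numerical check of Eq. 6.6.6, here is a minimal Python sketch of the circular-motion example in the cylindrical coordinate basis. The function names and the sample values of m, v, and R are assumptions made purely for illustration; the only physics used is that the cylindrical metric is diag(1, s², 1), so the square root of its determinant is s = R.

```python
import math

def levi_civita(i, j, k):
    """Permutation symbol for indices drawn from 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def angular_momentum(r, p, sqrt_det_g=1.0):
    """Lower-index components L_k = sqrt|det g| * eps_ijk * r^i * p^j."""
    return [
        sqrt_det_g * sum(levi_civita(i, j, k) * r[i] * p[j]
                         for i in range(3) for j in range(3))
        for k in range(3)
    ]

# Sample values (assumed for illustration): mass, speed, orbit radius.
m, v, R = 2.0, 3.0, 4.0

# Coordinate-basis components in the index order (s, phi, z).
r_coord = [R, 0.0, 0.0]
p_coord = [0.0, m * v / R, 0.0]

# Without the metric factor we are missing a factor of R.
naive = angular_momentum(r_coord, p_coord)
# With sqrt|det g| = R, the z-component is restored.
corrected = angular_momentum(r_coord, p_coord, sqrt_det_g=R)

print(naive[2], corrected[2], m * v * R)
```

The first printed number is the "wrong" coordinate-basis result and the last two should agree, which is the whole point of the extra metric factor.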
Figure 6.9: This is a demonstration of the parallel transport of a vector $T^i$. The dashed
blue vector represents the vector $\vec{T}(x^i + dx^i)$ at $x^i$. This move was necessary to subtract
$\vec{T}(x^i)$ from $\vec{T}(x^i + dx^i)$.
in the coordinate basis since we can place the blame entirely on the basis
vector. We know it’s a pseudotensor because it’s the zero-tensor in some
coordinates, but non-zero in others. It is also symmetric over the lower two
indices (i.e. $\Gamma^k_{ij} = \Gamma^k_{ji}$). This means the del operation (sometimes called the
covariant derivative) is actually given by
\[ \nabla_i T^j = \frac{\partial T^j}{\partial x^i} + \Gamma^j_{ik}\, T^k \tag{6.7.3} \]
for a contravariant vector. The Christoffel term represents our small shift
in the vector’s position for the derivative and is, by no means, insignificant.
For a covariant vector, we get
\[ \nabla_i T_j = \frac{\partial T_j}{\partial x^i} - \Gamma^k_{ij}\, T_k, \tag{6.7.4} \]
where we’ve swapped the indices and the sign of the extra term to compensate
for the change.
This can work for higher rank tensors as well, but we need a Christoffel
term for each tensor index. For a contravariant rank-2 tensor, this is
\[ \nabla_i T^{jk} = \frac{\partial T^{jk}}{\partial x^i} + \Gamma^j_{il}\, T^{lk} + \Gamma^k_{il}\, T^{jl}, \tag{6.7.5} \]
where the first Christoffel term sums over the first index on $T^{jk}$ (i.e. it adjusts
the derivative for the first index) and the second Christoffel term sums over
the second index (i.e. it adjusts the derivative for the second index). The
appropriate Christoffel term in Eq. 6.7.5 can be changed as it was in Eq.
6.7.4 to account for a covariant index.
Now we’re only left with one question: “How do we find the Christoffel
symbols for a given space?” Any adjustment we make to a tensor when we
Example 6.7.1
Find the Christoffel symbols in a set of arbitrary orthogonal coordinates,
$(q^1, q^2, q^3)$.
• First, we need to know the metric tensor for the space. If the coordinate
basis vectors are orthogonal, then Eq. 6.4.4 tells us the metric tensor
is diagonal taking the form
\[ g_{ij} \longrightarrow \begin{pmatrix} h_1 h_1 & 0 & 0 \\ 0 & h_2 h_2 & 0 \\ 0 & 0 & h_3 h_3 \end{pmatrix}, \tag{6.7.7} \]
where we’ve avoided using exponents for reasons that should become
clear as we go through the solution. This makes its inverse
\[ g^{ij} \longrightarrow \begin{pmatrix} \dfrac{1}{h_1 h_1} & 0 & 0 \\ 0 & \dfrac{1}{h_2 h_2} & 0 \\ 0 & 0 & \dfrac{1}{h_3 h_3} \end{pmatrix}. \tag{6.7.8} \]
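• If i = j = k = 1, then we get
\[ \Gamma^1_{11} = \frac{1}{2}\, g^{l1} \left( \frac{\partial g_{l1}}{\partial q^1} + \frac{\partial g_{l1}}{\partial q^1} - \frac{\partial g_{11}}{\partial q^l} \right). \]
We still have a summation over l, so there are actually 3 giant terms
that take the above form. However, we know $g^{ij}$ is diagonal, so the
only non-zero term is l = 1. We now get
\[ \Gamma^1_{11} = \frac{1}{2}\, g^{11} \left( \frac{\partial g_{11}}{\partial q^1} + \frac{\partial g_{11}}{\partial q^1} - \frac{\partial g_{11}}{\partial q^1} \right) = \frac{1}{2}\, g^{11}\, \frac{\partial g_{11}}{\partial q^1} \]
Now we can substitute in the components of the metric and its inverse
to get
\[ \Gamma^1_{11} = \frac{1}{2}\, \frac{1}{h_1 h_1}\, \frac{\partial}{\partial q^1} (h_1 h_1) \]
We can use Eq. 4.2.8 to simplify and also do this same process for the
other two values of i = j = k, which gives us
\[ \begin{aligned}
\Gamma^1_{11} &= \frac{1}{h_1} \frac{\partial h_1}{\partial q^1} \\
\Gamma^2_{22} &= \frac{1}{h_2} \frac{\partial h_2}{\partial q^2} \\
\Gamma^3_{33} &= \frac{1}{h_3} \frac{\partial h_3}{\partial q^3}
\end{aligned}. \]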
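Since $g_{ij}$ is also diagonal, the last two terms in parentheses are zero.
Now we can substitute in the components of the metric and its inverse
to get
\[ \Gamma^1_{12} = \frac{1}{2}\, g^{11}\, \frac{\partial g_{11}}{\partial q^2} = \frac{1}{2}\, \frac{1}{h_1 h_1}\, \frac{\partial}{\partial q^2} (h_1 h_1). \]
We can use Eq. 4.2.8 to simplify and also do this same process for
similar index patterns, which gives us
\[ \begin{aligned}
\Gamma^1_{12} = \Gamma^1_{21} &= \frac{1}{h_1} \frac{\partial h_1}{\partial q^2} \\
\Gamma^1_{13} = \Gamma^1_{31} &= \frac{1}{h_1} \frac{\partial h_1}{\partial q^3} \\
\Gamma^2_{21} = \Gamma^2_{12} &= \frac{1}{h_2} \frac{\partial h_2}{\partial q^1} \\
&\text{etc.}
\end{aligned}, \]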
noting that the Christoffel symbol is symmetric over the bottom two
indices. That’s 12 more Christoffel symbols for a total of 15.
We can use Eq. 4.2.8 to simplify and also do this same process for
similar index patterns, which gives us
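\[ \begin{aligned}
\Gamma^2_{11} &= -\frac{h_1}{h_2 h_2} \frac{\partial h_1}{\partial q^2} \\
\Gamma^3_{11} &= -\frac{h_1}{h_3 h_3} \frac{\partial h_1}{\partial q^3} \\
\Gamma^1_{22} &= -\frac{h_2}{h_1 h_1} \frac{\partial h_2}{\partial q^1} \\
&\text{etc.}
\end{aligned}. \]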
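If you'd like to convince yourself these closed forms are right, the sketch below checks a few of them numerically in spherical coordinates, where the scale factors are (1, r, r sin θ). It builds the Christoffel symbols directly from the diagonal metric using central finite differences. The helper names, the step size, and the sample point are assumptions chosen for illustration, and indices run 0, 1, 2 for (r, θ, φ) rather than 1, 2, 3.

```python
import math

def scale_factors(q):
    """Spherical scale factors (h_r, h_theta, h_phi) at q = (r, theta, phi)."""
    r, theta, _ = q
    return (1.0, r, r * math.sin(theta))

def metric(q):
    """Diagonal metric components g_ii = h_i * h_i (Eq. 6.7.7)."""
    return [h * h for h in scale_factors(q)]

def d_metric(q, i, l, step=1e-6):
    """Central-difference estimate of d(g_ii)/d(q^l)."""
    up = list(q)
    up[l] += step
    dn = list(q)
    dn[l] -= step
    return (metric(up)[i] - metric(dn)[i]) / (2 * step)

def christoffel(q, k, i, j):
    """Gamma^k_ij for a diagonal metric; only l = k survives the sum over l."""
    def dg(a, b, c):
        # d(g_ab)/d(q^c), which vanishes off the diagonal
        return d_metric(q, a, c) if a == b else 0.0
    return 0.5 / metric(q)[k] * (dg(k, j, i) + dg(k, i, j) - dg(i, j, k))

q = (2.0, 0.7, 0.3)  # sample point (r, theta, phi), chosen arbitrarily
r, theta, _ = q

# Numerical value next to the closed form from this example.
print(christoffel(q, 1, 1, 0), 1.0 / r)                      # (1/h2) dh2/dq1
print(christoffel(q, 0, 1, 1), -r)                           # -(h2/h1h1) dh2/dq1
print(christoffel(q, 0, 2, 2), -r * math.sin(theta) ** 2)    # -(h3/h1h1) dh3/dq1
print(christoffel(q, 2, 2, 1), 1.0 / math.tan(theta))        # (1/h3) dh3/dq2
```

Each printed pair should match to within the finite-difference error.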
Example 6.7.2
Use tensor analysis to find the divergence of a vector, $A^j$, in a set of arbitrary
orthogonal coordinates, $(q^1, q^2, q^3)$.
\[ \nabla_i A^i = \frac{\partial A^i}{\partial q^i} + \Gamma^i_{ik}\, A^k, \tag{6.7.9} \]
where i is now a summation index and there are no free indices. The
index k also represents its own summation independent from i. If we
expand both summations, then we have
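\[ \nabla_i A^i = \nabla_1 A^1 + \nabla_2 A^2 + \nabla_3 A^3, \]
where
\[ \begin{aligned}
\nabla_1 A^1 &= \frac{\partial A^1}{\partial q^1} + \Gamma^1_{11} A^1 + \Gamma^1_{12} A^2 + \Gamma^1_{13} A^3 \\
\nabla_2 A^2 &= \frac{\partial A^2}{\partial q^2} + \Gamma^2_{21} A^1 + \Gamma^2_{22} A^2 + \Gamma^2_{23} A^3 \\
\nabla_3 A^3 &= \frac{\partial A^3}{\partial q^3} + \Gamma^3_{31} A^1 + \Gamma^3_{32} A^2 + \Gamma^3_{33} A^3
\end{aligned} \]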
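• These are all added together anyway, so let’s consider just the $A^1$ terms
for now. Using the Christoffel symbols we found in Example 6.7.1, we
get
\[ \begin{aligned}
& \frac{\partial A^1}{\partial q^1} + \Gamma^1_{11} A^1 + \Gamma^2_{21} A^1 + \Gamma^3_{31} A^1 \\
&= \frac{1}{h_1 h_2 h_3} \left[ h_1 h_2 h_3 \frac{\partial A^1}{\partial q^1} + h_2 h_3 \frac{\partial h_1}{\partial q^1} A^1 + h_1 h_3 \frac{\partial h_2}{\partial q^1} A^1 + h_1 h_2 \frac{\partial h_3}{\partial q^1} A^1 \right].
\end{aligned} \]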
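The quantity in brackets just looks like one big derivative product rule
(defined by Eq. 3.1.5), so we can simplify this drastically by saying
\[ \frac{1}{h_1 h_2 h_3} \frac{\partial}{\partial q^1} \left( h_1 h_2 h_3 A^1 \right) \]
This may not look familiar since we’re working in the coordinate basis.
Using Eq. 3.4.1, we can say $A^1 \vec{e}_1 = A^1 h_1 \hat{e}_1 = A^{\hat{1}} \hat{e}_1$ means $A^1 h_1 = A^{\hat{1}}$.
This leaves us with
\[ \frac{1}{h_1 h_2 h_3} \frac{\partial}{\partial q^1} \left( h_2 h_3 A^{\hat{1}} \right) \]
• A very similar process happens with the $A^2$ and $A^3$ terms. The total
result is
\[ \nabla_i A^i = \frac{1}{h_1 h_2 h_3} \left[ \frac{\partial}{\partial q^1} \left( h_2 h_3 A^{\hat{1}} \right) + \frac{\partial}{\partial q^2} \left( h_1 h_3 A^{\hat{2}} \right) + \frac{\partial}{\partial q^3} \left( h_1 h_2 A^{\hat{3}} \right) \right] \]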
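The final divergence formula is easy to test numerically. The sketch below evaluates it with central finite differences in spherical coordinates for a sample field whose orthonormal components are (r², r sin θ, 0). That field, the sample point, and the helper names are all assumptions chosen for illustration. Working the formula by hand for this field gives 4r + 2 cos θ, which is what the code compares against.

```python
import math

def scale_factors(q):
    """Spherical scale factors (h_1, h_2, h_3) at q = (r, theta, phi)."""
    r, theta, _ = q
    return (1.0, r, r * math.sin(theta))

def field(q):
    """Orthonormal-basis components of a sample vector field (arbitrary choice)."""
    r, theta, _ = q
    return (r ** 2, r * math.sin(theta), 0.0)

def divergence(q, step=1e-6):
    """(1/h1h2h3) * sum over n of d/dq^n (product of the other two h's * A^n-hat)."""
    h1, h2, h3 = scale_factors(q)
    total = 0.0
    for n in range(3):
        def flux(point):
            h = scale_factors(point)
            return h[(n + 1) % 3] * h[(n + 2) % 3] * field(point)[n]
        up = list(q)
        up[n] += step
        dn = list(q)
        dn[n] -= step
        total += (flux(up) - flux(dn)) / (2 * step)
    return total / (h1 * h2 * h3)

q = (2.0, 0.7, 0.3)  # sample point (r, theta, phi)
expected = 4 * q[0] + 2 * math.cos(q[1])
print(divergence(q), expected)
```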
Example 6.7.3
Use tensor analysis to find the curl of a vector, $A^j$, in a set of arbitrary
orthogonal coordinates, $(q^1, q^2, q^3)$.
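We’d also like to raise the index on the left side so that we’re dealing
with contravariant vector components (because they’re easier to
picture). Operating with the inverse metric, $g^{lm}$, the result is
\[ \left( \vec{\nabla} \times \vec{A} \right)^l = \sqrt{|\det(g)|}\; g^{lm}\, \varepsilon_{mkj}\, g^{ki}\, \nabla_i A^j \tag{6.7.10} \]
• Let’s start by considering the first component of the curl. This is given
by
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = h_1 h_2 h_3\; g^{1m}\, \varepsilon_{mkj}\, g^{ki}\, \nabla_i A^j. \]
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = \frac{h_2 h_3}{h_1}\, \varepsilon_{1kj}\, g^{ki}\, \nabla_i A^j \]
where we’ve made a substitution from Eq. 6.7.8.
• There are two other summations over indices k and j. According to Eq.
6.6.4, the indices of the Levi-Civita pseudotensor must all be different
for a non-zero value. Since we already know m = 1, we know that
kj = 23 and kj = 32 are the only non-zero terms. The result is
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = \frac{h_2 h_3}{h_1} \left( \varepsilon_{123}\, g^{2i}\, \nabla_i A^3 + \varepsilon_{132}\, g^{3i}\, \nabla_i A^2 \right) \]
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = \frac{h_2 h_3}{h_1} \left( g^{2i}\, \nabla_i A^3 - g^{3i}\, \nabla_i A^2 \right). \]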
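Again, $g^{ki}$ is diagonal, so the only non-zero terms in the sum over i are
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = \frac{h_2 h_3}{h_1} \left( g^{22}\, \nabla_2 A^3 - g^{33}\, \nabla_3 A^2 \right). \]
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = \frac{h_2 h_3}{h_1} \left( \frac{1}{h_2 h_2} \nabla_2 A^3 - \frac{1}{h_3 h_3} \nabla_3 A^2 \right). \]
where we’ve made a substitution from Eq. 6.7.8. We can simplify further
to
\[ \left( \vec{\nabla} \times \vec{A} \right)^1 = \frac{1}{h_1 h_2 h_3} \left[ h_3 h_3\, \nabla_2 A^3 - h_2 h_2\, \nabla_3 A^2 \right] \tag{6.7.11} \]
• The two terms in brackets are defined by Eq. 6.7.3. They are
\[ \begin{aligned}
\nabla_2 A^3 &= \frac{\partial A^3}{\partial q^2} + \Gamma^3_{2n} A^n \\
\nabla_3 A^2 &= \frac{\partial A^2}{\partial q^3} + \Gamma^2_{3n} A^n
\end{aligned} \]
\[ \begin{aligned}
\nabla_2 A^3 &= \frac{\partial A^3}{\partial q^2} + \Gamma^3_{21} A^1 + \Gamma^3_{22} A^2 + \Gamma^3_{23} A^3 \\
\nabla_3 A^2 &= \frac{\partial A^2}{\partial q^3} + \Gamma^2_{31} A^1 + \Gamma^2_{32} A^2 + \Gamma^2_{33} A^3
\end{aligned}. \]
Using the Christoffel symbols we found in Example 6.7.1, we get
• These are both added together with their respective coefficients, so let’s
consider just the $A^3$ terms for now. This would be
\[ h_3 h_3 \left( \frac{\partial A^3}{\partial q^2} + \frac{1}{h_3} \frac{\partial h_3}{\partial q^2} A^3 \right) - h_2 h_2 \left( -\frac{h_3}{h_2 h_2} \frac{\partial h_3}{\partial q^2} A^3 \right) \]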
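\[ h_3 h_3 \frac{\partial A^3}{\partial q^2} + 2 h_3 \frac{\partial h_3}{\partial q^2} A^3. \]
We can use Eq. 4.2.8 on the second term to get
\[ h_3 h_3 \frac{\partial A^3}{\partial q^2} + \frac{\partial}{\partial q^2} (h_3 h_3)\, A^3, \]
which is just the derivative product rule (defined by Eq. 3.1.5). Simplifying
further, we arrive at
\[ \frac{\partial}{\partial q^2} \left( h_3 h_3 A^3 \right) \]
which is exactly the $\hat{e}_1$ component in Eq. 3.4.10. The other two components
follow the same pattern.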
Chapter 7
Special Relativity
7.1 Origins
Since the early-to-middle 17th century, we’ve been keenly aware that motion
is relative. Let’s say you’re a baseball outfielder. If you throw the baseball
at 30 mph toward second base while running at 15 mph toward second
base, then the player at second base is going to see the ball approaching
them at 45 mph. Each person has their own perspective known as a frame of
reference. The concept is often called “classical relativity” or sometimes
“Galilean relativity” because it was Galileo who first formalized it.
However, in the late 19th century, the field of electrodynamics had devel-
oped into a very solid theory (See Chapter 5) and with it came a very big
problem. From Eq. 5.5.2, we discovered the speed of light, c, was constant
(defined by Eq. 5.5.4). There is no indication of any dependence on time,
space, or perspective. It is a universal constant and it is finite.
Let’s take another look at our baseball example. You’re running toward
second base again, now at 30 mph, but this time you’re pointing a flashlight
rather than throwing a ball. According to classical relativity, the player at
second base should see the light approaching at c + 30 mph. Mind you, c
is a little under 671 million mph, so 30 mph more isn’t going to change it
much. Fundamentally though, this is still a problem because it still changes
the speed of light regardless of how little. According to electrodynamics,
the speed of light is not dependent on perspective, so the second-base player
should still see the light approaching at exactly c. Therein lies our problem.
It was widely accepted that neither classical relativity nor electrodynamics
could be drastically wrong. Since classical relativity was the least abstract
and easiest to test, it was believed the problem lay with electrodynamics in
some minor way. It was suggested that maybe Maxwell’s equations (Eq. Set
5.4.9) are defined in the rest frame of the medium in which light propagates
(what they called luminiferous aether), so c only takes on the value given by
Eq. 5.5.4 in that frame of reference. It was then a mission for physics to find
out how the aether was moving relative to the Earth.
Many optical experiments were done in the effort (the most famous of
which was performed by Albert Michelson and Edward Morley in 1887). None of the
experiments succeeded in measuring the velocity of the aether, which leaves us
with only four possible conclusions:
1. The Earth is in the rest frame of the aether. This is highly unlikely
since the Earth travels in an ellipse (nearly a circle) around the sun.
The Earth’s motion is continually changing direction, so this can’t be
true all the time.
3. The aether had the power to contract the experimental device in just
the right way to conceal its own existence. This was the conclusion
supported by Hendrik Lorentz. Yes, that’s the same guy we named
the Lorentz force (defined by Eq. 5.7.1) after. He even performed a
mathematical exercise to derive exactly how the aether would have to
do this. It was fundamentally the wrong idea, but we’ll see later in this
chapter that the equations turn out to be correct anyway.
4. The aether does not exist. This was highly unappealing at the time
because it implies light doesn’t need a medium to propagate. It was
immediately, but wrongly, discounted as a possible conclusion.
Figure 7.1: These people were important in the development of special relativity.
Einstein came along in 1905 (at the age of 26) and published a paper enti-
tled On the Electrodynamics of Moving Bodies. In this paper, he presented
a rather controversial solution to the problem described in this section that
he had been pondering for almost a decade (since the age of about 16). He
asked the question that no one else was willing to ask: “What if electrodynamics
is completely accurate, but it’s classical relativity that needs a bit of
reworking?” Needless to say, this solution wasn’t well received at the time.
As all hypotheses do, Einstein’s included some postulates (i.e. fundamen-
tal assumptions). There were only two of these postulates making his idea
more elegant than some could be. They involve the concept of inertial
reference frames (IRFs), which are defined by Newton’s first law to be
traveling at constant velocity (Note: ~v = 0 is constant velocity). Einstein’s
postulates are:
1. The laws of physics are the same in all IRFs. This was nothing new.
Having been stated by people like Galileo and Newton, it was over 200
years old in 1905.
2. The speed of light is constant and the same in all IRFs. This is the
result I mentioned was suggested by Maxwell’s equations. Einstein was
simply the first to be willing to accept it.
The question that now remains is “If neither the laws of physics nor the
speed of light change, then what does change?” The answer is “Almost
everything else!” This thought might be a bit difficult to comprehend or
accept, but hopefully you’ll be able to do both by the end of this chapter.
7.2 Spacetime
When a physics student first learns about special relativity, abstract equa-
tions are often thrown at them with little and/or poor explanation. This is
a cause for much of the confusion regarding the ideas in this theory. I find it
best to build an idea from other ideas a student (or reader) already knows,
which is a philosophy I’ve used in writing this book. We’ve spent a lot of
time focused on coordinate systems and diagrams, so that seems like a
good place to start here as well.
A major implication of special relativity is that time deserves as much
attention as space. Diagrammatically, that means we’ll need to include it in
the coordinate system resulting in a four-dimensional spacetime. With the
new idea of a spacetime comes some new terminology:
Line Element
The best tools we have to describe a space are given in Section 6.4. However,
we have to be very careful when we incorporate time. First, time is not
measured in the same units as space, so a conversion factor of c (the speed
of light) appears. Secondly, by observation, we see that time behaves a little
Figure 7.2: This is a spacetime diagram where the horizontal axis, x, represents space (y
and z are suppressed for simplicity) and the vertical axis, ct, represents time measured in
spatial units (c = 299,792,458 m/s is like a unit conversion between meters and seconds).
• If $(\Delta s)^2 > 0$, then the two events have a space-like separation meaning
the space component dominates. These two events are considered non-
interactive. For an object to travel on a space-like world line, it would
require speeds faster than c. For this reason, it is unlikely the motion
of anything could be represented by a space-like world line.
From a mathematical standpoint, you could write the time component as an
imaginary number since
\[ \sqrt{-c^2 (\Delta t)^2} = \sqrt{-1}\; c\Delta t = ic\Delta t. \]
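To make the sign bookkeeping concrete, here is a minimal Python sketch that computes the squared separation between two events and sorts it into one of the three categories. It assumes the sign convention in which the time component is negative, so a negative result is time-like and a zero result is light-like. The function names and the sample separations are illustrative assumptions.

```python
C = 299_792_458.0  # speed of light in m/s

def interval_squared(dt, dx, dy=0.0, dz=0.0):
    """(Delta s)^2 = -c^2 (Delta t)^2 + (Delta x)^2 + (Delta y)^2 + (Delta z)^2."""
    return -(C * dt) ** 2 + dx ** 2 + dy ** 2 + dz ** 2

def classify(dt, dx, dy=0.0, dz=0.0):
    """Label a separation by the sign of its squared interval."""
    s2 = interval_squared(dt, dx, dy, dz)
    if s2 < 0:
        return "time-like"
    if s2 > 0:
        return "space-like"
    return "light-like"

print(classify(1.0, 0.0))      # one second apart at the same place
print(classify(0.0, 1.0))      # one meter apart at the same time
print(classify(1.0, C))        # connected by a light signal
```

Keep in mind the exact-zero test for a light-like separation only works for tidy inputs like these; with measured values you would compare against a tolerance instead.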
Metric Tensor
We can also write something like Eq. 6.4.3 to generalize the line element.
The result is
where the use of Greek indices indicates four dimensions and repeated indices
indicate a summation. Remember to distinguish between exponents of 2
by matrix methods. There is some debate over whether the time component
or space components should have the negative sign, but in the end it simply
comes down to convention and I’ve chosen to stick with tradition.
We can transform this to other coordinate systems by replacing the lower-
right (spatial) 3 × 3 with the appropriate dimension-3 metric. For example,
in spherical coordinates, we have
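\[ g_{\alpha\delta} \longrightarrow \begin{pmatrix} -c^2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & r^2 & 0 \\ 0 & 0 & 0 & r^2 \sin^2\theta \end{pmatrix} \tag{7.2.6} \]
found by matrix methods. Note that we still get $g^\alpha_\delta = g^{\alpha\mu} g_{\mu\delta} = \delta^\alpha_\delta$, the same
result we got with 3-space in Section 6.4.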
Coordinate Rotations
The ultimate value of a spacetime diagram is going to be in how we can
use it to look at two different IRFs. Remember from Section 7.1, we’re
Figure 7.3: In this spacetime diagram, the coordinate systems of both objects are shown
as well as both their world lines. Both objects line up with their respective time axis
indicating they both consider themselves to be at rest. The diagram on the right shows
the grid lines for the primed frame.
for a hyperbolic rotation. We’ve simply defined β to be v/c (i.e. the fraction
of the speed of light). Using a few trigonometric identities, we can solve for
cosh and sinh. For instance, we know
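\[ \tanh^2\varphi = 1 - \operatorname{sech}^2\varphi. \]
\[ \cosh\varphi = \frac{1}{\sqrt{1 - \beta^2}} \equiv \gamma, \tag{7.2.9} \]
\[ \cosh^2\varphi - \sinh^2\varphi = 1 \]
\[ \sinh\varphi = \sqrt{\cosh^2\varphi - 1}. \]
\[ \sinh\varphi = \sqrt{\frac{1}{1 - \beta^2} - \frac{1 - \beta^2}{1 - \beta^2}} \]
\[ \sinh\varphi = \sqrt{\frac{\beta^2}{1 - \beta^2}} \]
\[ \sinh\varphi = \frac{\beta}{\sqrt{1 - \beta^2}} = \gamma\beta. \tag{7.2.10} \]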
Eqs. 7.2.9 and 7.2.10 are very important relationships that will show up
repeatedly.
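These two relationships are easy to verify numerically. The sketch below assumes the hyperbolic angle is tied to the speed by tanh φ = β (my reading of Eq. 7.2.8) and compares cosh φ and sinh φ against γ and γβ for a few sample speeds. The function names and the sample values of β are illustrative assumptions.

```python
import math

def rapidity(beta):
    """Hyperbolic angle phi with tanh(phi) = beta, valid for |beta| < 1."""
    return math.atanh(beta)

def lorentz_factor(beta):
    """gamma = 1 / sqrt(1 - beta^2), as in Eq. 7.2.9."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

for beta in (0.1, 0.6, 0.99):
    phi = rapidity(beta)
    gamma = lorentz_factor(beta)
    # Both differences should be zero up to rounding error.
    print(beta, math.cosh(phi) - gamma, math.sinh(phi) - gamma * beta)
```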
Furthermore, we can get a little more understanding of the diagram out
of Eq. 7.2.8. Let’s look at our two possible extremes:
but this would only be accurate in a diagram like that drawn in Figures 7.3
and 7.4. An axial rotation of α = 45° is just the diagonal exactly between
the time and space axes. Events on this diagonal have a light-like separation.
Since light is the fastest thing we know of in the universe, we can use this
line to define something called a light cone.
A light cone points away from an event and encompasses the entire realm
of influence of that event on other events in spacetime (and vice versa).
Figure 7.4 shows two different light cones for event 1: future (above event 1
in the diagram) and past (below event 1 in the diagram). Event 2 is also on
the world line for the object in its future light cone, which means whatever
happens there is something the object can come into physical contact with
at some point in the future.
Event 3 is on the edge of the future light cone, which means someone at
event 3 could see the object at event 1, but that’s about it. In fact, event
3 would represent an observation of event 1. Event 4 is on the edge of the
past light cone, which means the object would receive light from that event
(whatever it may be) when it reaches event 1. Also, as time moves forward,
the light cone gets larger indicating the light has traveled farther away from
where the object was at event 1. Light cones are very useful in discussing
the concept of causality (i.e. cause and effect).
Figure 7.4: In this spacetime diagram, the solid blue line is the world line for an object.
The orange dashed lines are world lines for light meaning the shaded triangles represent
the past and future light cones of the object at event 1. The cones only appear as triangles
due to the suppression of the other spatial coordinates.
Figure 7.5: This spacetime diagram is very much like Figure 7.4 except only the z-axis
is suppressed (rather than both y and z). It is clear in this diagram why we call it a
light cone. The event in the center cannot interact with events in the region labeled
“Non-Interactive.”
Figure 7.6: In this type of spacetime diagram, neither set of axes looks orthogonal, but
it’s important to know both sets are orthogonal. The diagram on the left is an exact
reproduction of Figure 7.3. On the right, the orange dashed line represents a light-like
world line, which is still the diagonal between space and time.
Taking Measurements
It’s becoming clear, from diagrams like Figure 7.3, that people in different
IRFs will take different measurements (e.g. time and length) of the same
phenomenon. This raises the question: “So who’s right and who’s wrong?”
Well, no one is wrong even if different observers don’t agree. The concept of
absoluteness is something we need to let go in order to move forward in our
understanding.
If you have two objects (such as those in Figure 7.3) moving at some
relative velocity to one another, then there is no way to determine who is
moving and who is not. Object A will consider themselves at rest and say
object B is moving (and vice versa). A third observer might say both objects
are moving. What we mean is that all IRFs are on equal footing. They are
all correct about measurements made in their own frame and that’s all that
matters because of Einstein’s first postulate. As long as each observer stays
in their own frame, what measurements would be in the other frame is of
little consequence.
In Figures 7.3 and 7.4, the unprimed axes are clearly orthogonal to each
other. We should take note here that the primed axes are also orthogonal to
each other even though it’s not clear in the diagrams. Sometimes spacetime
diagrams are drawn so that neither set of axes looks orthogonal like the one
given in Figure 7.6. This helps keep someone working with the topic from
Figure 7.7: In this spacetime diagram, events 1 and 2 are simultaneous in the unprimed
frame, but not in the primed frame. Simultaneous events occur in a frame along lines
parallel to the spatial axis in that same frame.
More proper quantities can be defined in terms of these, but it’s usually
standard to only define the four listed here and let the rest fall as they may.
Example 7.2.1
The difference in time measurements between IRFs is called time dilation
and we can find it using a spacetime diagram with very little math. For
the sake of discussion, let’s say a high-speed boat is traveling at night on
the ocean at constant v (and, therefore, constant β) in the x-direction. This
boat has a blinking light on its bow (safety regulations and all) that blinks
at very regular intervals.
Figure 7.8 shows the time dilation in action. Events 1 and 2 represent two
consecutive flashes of the boat’s bow light. Someone on the boat would be
in the primed frame (as this frame is attached to the boat). They measure a
spacetime separation of $ic\Delta t'$ between flashes (which is the hypotenuse in the
triangle). This is the smallest possible time measurement between these two
events, which recall is the proper time ($\Delta t' = \Delta t_p$). You might be inclined
to say it’s the longest of the three sides of the triangle based on its physical
length in the diagram, but don’t be fooled! Remember, the time component
in the line element is negative.
The green dashed lines are the components of the same world line, but
measured in the unprimed frame. The component adjacent to α is measured
to be ic∆t because it lines up with the ct-axis and the component opposite
of α is measured to be ∆x because it lines up with the x-axis. It makes sense
there would be a ∆x in this frame since an observer would see the flashing
light move through space.
We can solve this problem one of two ways using the triangle in Figure
7.8. The first instinct might be to use the Pythagorean theorem since the
line element looks a lot like it. In that case, we’d get
\[ (ic\Delta t')^2 = (ic\Delta t)^2 + (\Delta x)^2 \]
\[ -c^2 (\Delta t')^2 = -c^2 (\Delta t)^2 + (\Delta x)^2 \]
\[ c^2 (\Delta t')^2 = c^2 (\Delta t)^2 - (\Delta x)^2, \]
Figure 7.8: In this spacetime diagram, there are two events with time-like separation
demonstrating time dilation. The solid blue line is the line element measured as $-c^2 (\Delta t')^2$.
The two red dashed lines represent the components of this same world line measured in
the unprimed frame as $-c^2 (\Delta t)^2 + (\Delta x)^2$. The triangle has been straightened-out for
clarity.
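where we can see $\Delta t' < \Delta t$ due to the subtraction of $(\Delta x)^2$. This equation
also makes sense because we’ve already said the separation is spacetime
invariant. If we divide through by $c^2 (\Delta t)^2$, then the result is
\[ \frac{(\Delta t')^2}{(\Delta t)^2} = 1 - \frac{(\Delta x)^2}{c^2 (\Delta t)^2} \]
\[ \left( \frac{\Delta t'}{\Delta t} \right)^2 = 1 - \left( \frac{\Delta x / \Delta t}{c} \right)^2. \]
We know $v = \Delta x / \Delta t$ because the boat has traveled a distance of $\Delta x$ in a
time $\Delta t$ in the unprimed frame. With this fact and Eq. 7.2.8, we get
\[ \left( \frac{\Delta t'}{\Delta t} \right)^2 = 1 - \beta^2 \]
\[ \frac{\Delta t'}{\Delta t} = \sqrt{1 - \beta^2} \]
\[ \Delta t = \frac{\Delta t'}{\sqrt{1 - \beta^2}}. \]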
We can use Eq. 7.2.9 and the definition of proper time to simplify to
\[ \Delta t = \cosh\varphi\; \Delta t'. \]
Using Eq. 7.2.9 and the definition of proper time, we arrive again at Eq.
7.2.11. This was a much shorter solution, but don’t feel bad if you didn’t
think to do it. Most people aren’t familiar enough with hyperbolic trigonom-
etry for it to come to mind. It is something you should put in your arsenal
from now on though.
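For a feel of the numbers, here is a short Python sketch of the blinking-boat example. It computes the dilated time between flashes and then confirms the invariance statement c²(Δt′)² = c²(Δt)² − (Δx)² used in the Pythagorean approach. The boat speed, the flash period, and the function names are assumptions picked purely for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lorentz_factor(beta):
    """gamma = 1 / sqrt(1 - beta^2)."""
    return 1.0 / math.sqrt(1.0 - beta * beta)

def dilated_interval(proper_time, beta):
    """Time between the two events in the frame where the boat moves."""
    return lorentz_factor(beta) * proper_time

beta = 0.6          # assumed boat speed as a fraction of c
dt_proper = 1.0     # assumed seconds between flashes, measured on the boat
dt = dilated_interval(dt_proper, beta)
dx = beta * C * dt  # distance the boat covers between flashes

lhs = (C * dt_proper) ** 2           # c^2 (dt')^2
rhs = (C * dt) ** 2 - dx ** 2        # c^2 (dt)^2 - (dx)^2
print(dt, lhs, rhs)
```

The last two printed values should agree, which is just the statement that both frames measure the same spacetime separation.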
Example 7.2.2
The difference in length measurements between IRFs is called length con-
traction and we can find it using a spacetime diagram with very little math.
For the sake of discussion, let’s say a high-speed boat is traveling at night on
the ocean at constant v (and, therefore, constant β) in the x-direction.
If we’re going to measure length, then we need to be clear about what
we mean by “length.” Measurements of length are very closely related to
the concept of simultaneity shown in Figure 7.7. We now define length as
the spacetime separation between two particular events. These two events
represent the two ends of the object (in this case, the boat). For the mea-
surement to be a length, the two events must occur at the same time in the
frame in which you take the measurement.
We’ve already seen that simultaneous events in one IRF are not simultaneous
in any other IRF, so the set of events measuring length in one frame
will not be the same set of events measuring length in the other. Figure
7.9 shows the length contraction in action. Length in the primed frame is
Figure 7.9: In this spacetime diagram, there are two world lines corresponding to the front
and back of an object. Between them, there are two measurements of length corresponding
to the two different frames. Both must connect the two world lines with events which occur
at the same time in the frame of measurement. The top triangle is an enlarged version
of the one in the diagram and the bottom triangle is just a straightened-out version for
clarity.
\[ L = \frac{L'}{\cosh\varphi} \]
Using Eq. 7.2.9 and the definition of proper length, we arrive at
\[ L = \frac{L_p}{\gamma}. \tag{7.2.12} \]
You might be inclined to say L is the longest of the three sides of the triangle
based on its physical length in the diagram, so you’d think it would be the
longer length measurement. Don’t be fooled! Remember, the square of the time
component is negative, so the Pythagorean theorem says
\[ (L)^2 = (ic\Delta t')^2 + (L')^2 \]
\[ (L)^2 = -c^2 (\Delta t')^2 + (L')^2, \]
where we can clearly see $L' > L$ due to the subtraction of $c^2 (\Delta t')^2$. Furthermore,
it’s important to know that both these lengths are measured in the
direction of motion. There is no length contraction along the other orthogo-
nal directions (i.e. the y and z directions).
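Eq. 7.2.12 takes one line of Python to try out. In this minimal sketch the function name and the sample boat length and speed are assumptions for illustration; note that dividing by γ is the same as multiplying by √(1 − β²).

```python
import math

def contracted_length(proper_length, beta):
    """L = L_p / gamma, for a length measured along the direction of motion."""
    return proper_length * math.sqrt(1.0 - beta * beta)

# An assumed 10 m boat moving at 60% of the speed of light.
print(contracted_length(10.0, 0.6))
```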
which looks a lot simpler and is more oriented toward measurable values
(noting again that −β is replaced by β for the inverse transformation). If
you prefer transformation equations over matrices, then we can just perform
a quick matrix multiplication. Eq. 7.3.1 becomes
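\[ \begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \gamma ct - \gamma\beta x \\ -\gamma\beta ct + \gamma x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \gamma (ct - \beta x) \\ \gamma (-\beta ct + x) \\ y \\ z \end{pmatrix}. \]
We can divide the first line by c and use Eq. 7.2.8 to get
\[ \begin{aligned}
t' &= \gamma \left( t - vx/c^2 \right) \\
x' &= \gamma \left( -vt + x \right) \\
y' &= y \\
z' &= z
\end{aligned}, \tag{7.3.2} \]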
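Eq. Set 7.3.2 translates almost directly into code. The sketch below applies it to a single event, checks that the spacetime separation from the origin is unchanged, and confirms that transforming back with −v recovers the original coordinates. The function names, the sample event, and the relative speed are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def boost_x(t, x, y, z, v):
    """Eq. Set 7.3.2: coordinates in a frame moving at v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma * (t - v * x / C ** 2), gamma * (x - v * t), y, z)

def interval_squared(t, x, y, z):
    """Squared separation from the origin with the time component negative."""
    return -(C * t) ** 2 + x ** 2 + y ** 2 + z ** 2

event = (1.0e-6, 100.0, 5.0, -2.0)  # sample event (t in s, x, y, z in m)
v = 0.6 * C                         # sample relative speed

primed = boost_x(*event, v)
round_trip = boost_x(*primed, -v)   # the inverse just flips the sign of v

print(interval_squared(*event), interval_squared(*primed))
print(event, round_trip)
```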
Example 7.3.1
Figure 7.10: A ball is thrown in the top boat. In the unprimed frame (attached to the
bottom boat), the top boat is moving in the positive x-direction at v and the velocity of
the ball is measured to be $\vec{u}$. In the primed frame (attached to the top boat), the bottom
boat is moving in the negative x-direction at v and the velocity of the ball is measured to
be $\vec{u}'$.
You’re on the ocean on a boat at rest (relative to the water) when you see a
high-speed boat zip past you. It is traveling at constant v (and, therefore,
constant β) in the x-direction. At that exact moment, the driver of that
other boat throws a ball in a random direction with a velocity you measure
to be $\vec{u}$ (pun intended). What velocity would the driver of the other boat
measure for the ball?
• We can do this component-wise, so let’s start with the x-direction (the
boat’s direction of motion). The definition of velocity in this direction
is
\[ u'_x = \frac{dx'}{dt'}. \]
We can apply Eq. 7.3.2 to both the numerator and denominator (as
they both change between IRFs). The result is
\[ u'_x = \frac{\gamma \left( -v\, dt + dx \right)}{\gamma \left( dt - v\, dx/c^2 \right)}. \]
Dividing the numerator and denominator by γ dt gives us
\[ u'_x = \frac{-v + dx/dt}{1 - v\, dx / (c^2\, dt)} = \frac{dx/dt - v}{1 - (dx/dt)\, v/c^2}. \]
We know $u_x = dx/dt$, so
\[ u'_x = \frac{u_x - v}{1 - u_x v/c^2}. \]
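\[ u'_y = \frac{dy'}{dt'} = \frac{dy}{\gamma \left( dt - v\, dx/c^2 \right)} \]
\[ u'_y = \frac{dy/dt}{\gamma \left[ 1 - v\, dx / (c^2\, dt) \right]} = \frac{dy/dt}{\gamma \left[ 1 - (dx/dt)\, v/c^2 \right]}. \]
\[ u'_y = \frac{u_y / \gamma}{1 - u_x v/c^2}. \]
In summary,
\[ u'_x = \frac{u_x - v}{1 - u_x v/c^2} \tag{7.3.3a} \]
\[ u'_y = \frac{u_y / \gamma}{1 - u_x v/c^2} \tag{7.3.3b} \]
\[ u'_z = \frac{u_z / \gamma}{1 - u_x v/c^2} \tag{7.3.3c} \]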
where x is the direction of motion of the primed IRF relative to the unprimed
IRF. This is called coordinate velocity since the derivative is taken with
respect to the coordinate time, t. According to the observer on the other
boat, they are at rest and you’re moving in the negative x-direction as shown
in Figure 7.10. That means you can obtain the reverse transformations (i.e.
$\vec{u}' \to \vec{u}$) by replacing v with −v. Note that $\vec{u}$ and $\vec{u}'$ need not have the same
magnitude nor the same direction.
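A nice sanity check on Eq. Set 7.3.3 is to feed it a light beam and see whether the speed stays at c, just as Einstein’s second postulate demands. The Python sketch below does that for a beam along x and a beam along y, and also confirms that replacing v with −v undoes the transformation. The function names and the sample velocities are assumptions for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def transform_velocity(u, v):
    """Eq. Set 7.3.3: velocity components seen from a frame moving at v along x."""
    ux, uy, uz = u
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    denom = 1.0 - ux * v / C ** 2
    return ((ux - v) / denom, uy / (gamma * denom), uz / (gamma * denom))

def speed(u):
    """Magnitude of a 3-velocity."""
    return math.sqrt(sum(component ** 2 for component in u))

v = 0.6 * C  # sample relative speed between the boats

# Light along x and light along y both keep speed c in the primed frame.
print(speed(transform_velocity((C, 0.0, 0.0), v)) / C)
print(speed(transform_velocity((0.0, C, 0.0), v)) / C)

# An ordinary velocity, transformed and then transformed back with -v.
u = (0.5 * C, 0.3 * C, 0.0)
print(u, transform_velocity(transform_velocity(u, v), -v))
```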
Example 7.3.2
You’re on the ocean on a boat at rest (relative to the water) when you see a
high-speed boat zip past you. It is traveling at constant v (and, therefore,
constant β) in the x-direction. At that exact moment, the driver of that other
boat throws a ball in a random direction with an acceleration you measure
to be $\vec{a}$. What acceleration would the driver of the other boat measure for
the ball?
• We can do this component-wise, so let’s start with the x-direction (the
boat’s direction of motion). The definition of acceleration in this direction
is
\[ a'_x = \frac{du'_x}{dt'} = \frac{du'_x / dt}{dt' / dt}. \]
We can apply Eq. 7.3.2 to the denominator and Eq. 7.3.3a to the
numerator. The result is
\[ a'_x = \frac{\dfrac{d}{dt} \left( \dfrac{u_x - v}{1 - u_x v/c^2} \right)}{\dfrac{d}{dt} \left[ \gamma \left( t - \dfrac{vx}{c^2} \right) \right]}. \]
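where γ and v are both constant. Using the derivative quotient rule
(defined by Eq. 3.1.6) on the numerator and distributing the denominator
gives us
\[ a'_x = \frac{\dfrac{du_x}{dt} \left( 1 - \dfrac{u_x v}{c^2} \right) - (u_x - v)\, \dfrac{d}{dt} \left( 1 - \dfrac{u_x v}{c^2} \right)}{\left( 1 - \dfrac{u_x v}{c^2} \right)^2}\; \frac{1}{\gamma \left( \dfrac{dt}{dt} - \dfrac{v}{c^2} \dfrac{dx}{dt} \right)} \]
\[ a'_x = \frac{\dfrac{du_x}{dt} \left( 1 - \dfrac{u_x v}{c^2} \right) - (u_x - v) \left( \dfrac{-v}{c^2} \right) \dfrac{du_x}{dt}}{\left( 1 - \dfrac{u_x v}{c^2} \right)^2}\; \frac{1}{\gamma \left( 1 - \dfrac{v}{c^2} \dfrac{dx}{dt} \right)}. \]
We know $u_x = dx/dt$ and $a_x = du_x/dt$, so
\[ a'_x = \frac{a_x \left( 1 - \dfrac{u_x v}{c^2} \right) - (u_x - v) \left( \dfrac{-v}{c^2} \right) a_x}{\left( 1 - \dfrac{u_x v}{c^2} \right)^2}\; \frac{1}{\gamma \left( 1 - \dfrac{u_x v}{c^2} \right)} \]
\[ a'_x = \frac{\left( 1 - \dfrac{u_x v}{c^2} \right) - (u_x - v) \left( \dfrac{-v}{c^2} \right)}{\gamma \left( 1 - \dfrac{u_x v}{c^2} \right)^3}\; a_x \]
\[ a'_x = \frac{1 - \dfrac{u_x v}{c^2} + \dfrac{u_x v}{c^2} - \dfrac{v^2}{c^2}}{\gamma \left( 1 - \dfrac{u_x v}{c^2} \right)^3}\; a_x = \frac{1 - \dfrac{v^2}{c^2}}{\gamma \left( 1 - \dfrac{u_x v}{c^2} \right)^3}\; a_x \]
and, by Eq. 7.2.9,
\[ a'_x = \frac{a_x}{\gamma^3 \left( 1 - u_x v/c^2 \right)^3} \]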
In summary,
\[ a'_x = \frac{a_x}{\gamma^3 \left( 1 - u_x v/c^2 \right)^3} \tag{7.3.4a} \]
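That derivation had a lot of algebra in it, so it’s worth an independent check. The sketch below compares Eq. 7.3.4a against a brute-force finite-difference version of the starting definition, a′ₓ = (du′ₓ/dt)/(dt′/dt), using Eq. 7.3.3a for the velocity and the derivative of Eq. 7.3.2 for the time. It works in units where c = 1 to keep the numbers well scaled; that choice, the function names, and the sample values are all assumptions for illustration.

```python
import math

C = 1.0  # natural units (an assumption to keep the numbers well scaled)

def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v^2/c^2)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def accel_x_formula(a_x, u_x, v):
    """Closed form from Eq. 7.3.4a."""
    return a_x / (lorentz_factor(v) ** 3 * (1.0 - u_x * v / C ** 2) ** 3)

def accel_x_numeric(a_x, u_x, v, step=1e-6):
    """Finite-difference version of (du'_x/dt) / (dt'/dt)."""
    def u_primed(u):
        # Eq. 7.3.3a
        return (u - v) / (1.0 - u * v / C ** 2)
    # Over a small time step the unprimed velocity changes by a_x * step.
    du_primed_dt = (u_primed(u_x + a_x * step) - u_primed(u_x - a_x * step)) / (2 * step)
    # Derivative of t' = gamma (t - v x / c^2) with dx/dt = u_x.
    dt_primed_dt = lorentz_factor(v) * (1.0 - v * u_x / C ** 2)
    return du_primed_dt / dt_primed_dt

a_x, u_x, v = 0.2, 0.3, 0.6  # sample values in units of c
print(accel_x_formula(a_x, u_x, v), accel_x_numeric(a_x, u_x, v))
```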
Transformation Matrix
The 4 × 4 matrix in Eq. 7.3.1 is called the Lorentz transformation matrix
and is given by
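\[ \Lambda^\alpha_\delta \longrightarrow \begin{pmatrix} \gamma & -\gamma\beta & 0 & 0 \\ -\gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{7.3.5} \]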
Notice, we made the definition $x^0 \equiv ct$ that merges the quantity c into the
time component. We’re now measuring time in spatial units (e.g. meters).
This changes the look of our line element in Cartesian coordinates to
\[ g_{\alpha\delta}\, dx^\alpha dx^\delta = -dx^0 dx^0 + dx^1 dx^1 + dx^2 dx^2 + dx^3 dx^3 \tag{7.3.7} \]
and the spacetime metric tensor to
\[ g_{\alpha\delta} \longrightarrow \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{7.3.8} \]
where we would still replace the lower right 3 × 3 components with the
appropriate 3-space metric. This definition of the time component comes
with its conveniences. First, the metric is not only simpler, but it reflects
clearly our choice of sign convention (−, +, +, +). Secondly, we don’t have to
worry about factors of c appearing in equations and when we raise or lower
indices. The only downside is we must think about time differently, which
isn’t an unreasonable expectation given that spacetime puts space and time
on equal footing. If you think about it, we’re already accustomed to the
reverse, measuring space with time units: “The store is 15 minutes away.”
Inconvenient Coordinates
We can also extend Eq. 7.3.5 a bit. So far, we’ve been assuming that
the relative motion between the two IRFs is solely in the x-direction (i.e.
vy = vz = 0). This wasn’t an unrealistic assumption, mind you, since con-
stant velocity (direction included) implies linear motion. Unfortunately, some
systems are complex enough that another phenomenon may dictate the loca-
tion and orientation of the coordinate system. If that’s the case, we’ll need
to generalize Eq. 7.3.5 to a relative velocity with three non-zero components.
We can see from Eq. 7.3.2 that the directions orthogonal to the direction
of motion are unaffected by the transformation. This should still be true if
we generalize since the orientation of the coordinate system should not affect
physical results. With that in mind, let’s consider the dimension-3 position
vector of an event. We can split this into two components: one parallel to $\vec{v}$
and one perpendicular to $\vec{v}$ such that
\[
\vec{r} = \vec{r}_\parallel + \vec{r}_\perp . \tag{7.3.9}
\]
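Writing Eq. 7.3.2 in terms of these components gives
\[
\begin{aligned}
t' &= \gamma\left(t - \vec{v}\cdot\vec{r}_\parallel/c^2\right) \\
\vec{r}\,'_\parallel &= \gamma\left(\vec{r}_\parallel - \vec{v}\,t\right) \\
\vec{r}\,'_\perp &= \vec{r}_\perp
\end{aligned}
\]
where we’ve replaced $v$ and $x$ with $\vec{v}$ and $\vec{r}_\parallel$, respectively. Since $\vec{v}\cdot\vec{r}_\perp = 0$
by definition of $\vec{r}_\perp$, then we can use Eq. 7.3.9 to get
\[
\vec{v}\cdot\vec{r}_\parallel = \vec{v}\cdot\left(\vec{r} - \vec{r}_\perp\right) = \vec{v}\cdot\vec{r} .
\]
For consistent units, we can also multiply the top equation by c arriving at
\[
ct' = \gamma\left(ct - \vec{v}\cdot\vec{r}/c\right) .
\]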
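We also need a substitution for $\vec{r}_\parallel$ and $\vec{r}_\perp$. The parallel component can be
written as a projection onto $\vec{v}$ by
\[
\vec{r}_\parallel = (\hat{v}\cdot\vec{r})\,\hat{v} = \left(\frac{\vec{v}}{v}\cdot\vec{r}\right)\frac{\vec{v}}{v} = \frac{(\vec{v}\cdot\vec{r})\,\vec{v}}{v^2}
\]
where $\hat{v}$ is the unit vector in the direction of motion and we’ve used something
like Eq. 5.2.4 to get rid of the unit vectors.
It’s going to be simpler in the long run to use β rather than v, so we’ll
define $\vec{\beta} = \vec{v}/c$. We now have
\[
\vec{r}_\parallel = \frac{(\vec{\beta}\cdot\vec{r})\,\vec{\beta}}{\beta^2}
\]
and the perpendicular component follows from Eq. 7.3.9 as
\[
\vec{r}_\perp = \vec{r} - \vec{r}_\parallel = \vec{r} - \frac{(\vec{\beta}\cdot\vec{r})\,\vec{\beta}}{\beta^2} .
\]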
Therefore, the transformation equations are
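\[
\begin{aligned}
ct' &= \gamma\left(ct - \vec{\beta}\cdot\vec{r}\right) \\
\vec{r}\,'_\parallel &= \gamma\left(-\vec{\beta}\,ct + \frac{(\vec{\beta}\cdot\vec{r})\,\vec{\beta}}{\beta^2}\right) \\
\vec{r}\,'_\perp &= \vec{r} - \frac{(\vec{\beta}\cdot\vec{r})\,\vec{\beta}}{\beta^2}
\end{aligned} .
\]
Lastly, we can merge the last two equations by using Eq. 7.3.9 in the primed
system. This gives us
\[
\begin{aligned}
ct' &= \gamma\left(ct - \vec{\beta}\cdot\vec{r}\right) \\
\vec{r}\,' &= -\gamma\vec{\beta}\,ct + \vec{r} + (\gamma - 1)\frac{(\vec{\beta}\cdot\vec{r})\,\vec{\beta}}{\beta^2}
\end{aligned} ,
\]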
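where we’ve used Eq. 2.2.2 to expand the dot products. This makes the
general transformation matrix
\[
\Lambda^\alpha{}_\delta \longrightarrow
\begin{bmatrix}
\gamma & -\gamma\beta_x & -\gamma\beta_y & -\gamma\beta_z \\
-\gamma\beta_x & 1 + (\gamma - 1)\dfrac{\beta_x^2}{\beta^2} & (\gamma - 1)\dfrac{\beta_x\beta_y}{\beta^2} & (\gamma - 1)\dfrac{\beta_x\beta_z}{\beta^2} \\
-\gamma\beta_y & (\gamma - 1)\dfrac{\beta_y\beta_x}{\beta^2} & 1 + (\gamma - 1)\dfrac{\beta_y^2}{\beta^2} & (\gamma - 1)\dfrac{\beta_y\beta_z}{\beta^2} \\
-\gamma\beta_z & (\gamma - 1)\dfrac{\beta_z\beta_x}{\beta^2} & (\gamma - 1)\dfrac{\beta_z\beta_y}{\beta^2} & 1 + (\gamma - 1)\dfrac{\beta_z^2}{\beta^2}
\end{bmatrix} . \tag{7.3.10}
\]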
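The merged transformation equations for a boost in an arbitrary direction can be sketched in code. The helper names below (`boost_matrix`, `apply`, `interval`) are hypothetical, velocities are given in units of c, and the check simply confirms that the spacetime interval with signature (−, +, +, +) is unchanged.

```python
import math


def boost_matrix(beta):
    """4x4 boost for a relative velocity beta = (bx, by, bz), in units of c."""
    b2 = sum(b * b for b in beta)
    lam = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    if b2 == 0.0:
        return lam  # no relative motion: identity
    g = 1.0 / math.sqrt(1.0 - b2)
    lam[0][0] = g
    for i in range(3):
        lam[0][i + 1] = lam[i + 1][0] = -g * beta[i]
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            lam[i + 1][j + 1] = delta + (g - 1.0) * beta[i] * beta[j] / b2
    return lam


def apply(lam, x):
    """Matrix-vector product for a 4-component event (ct, x, y, z)."""
    return [sum(lam[r][c] * x[c] for c in range(4)) for r in range(4)]


def interval(x):
    """Squared interval with the (-, +, +, +) sign convention."""
    return -x[0] ** 2 + x[1] ** 2 + x[2] ** 2 + x[3] ** 2


event = [2.0, 0.3, -1.1, 0.7]
boosted = apply(boost_matrix((0.3, 0.4, 0.5)), event)
print(interval(event), interval(boosted))
```

With `beta = (0.6, 0, 0)` the matrix reduces to the x-direction form of Eq. 7.3.5 with γ = 1.25.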
\[
T'^\alpha = \Lambda^\alpha{}_\delta\, T^\delta \tag{7.4.1}
\]
∆x3 ∆z
where we have used the metric tensor to raise the index on the first vector
in the product. This looks a lot like the line element in Eq. 7.2.3. Using Eq.
7.3.8, we get
\[
\Delta x_\alpha \Delta x^\alpha = -\Delta x^0 \Delta x^0 + \Delta x^1 \Delta x^1 + \Delta x^2 \Delta x^2 + \Delta x^3 \Delta x^3
\]
which is almost exactly the line element in Cartesian coordinates. The only
difference is the ∆ instead of the differential, but you could just as easily
do this for an infinitesimally small displacement: $dx_\alpha dx^\alpha$. This makes the
scalar product of the 4-displacement with itself a spacetime invariant (i.e.
$\Delta x_\alpha \Delta x^\alpha = \Delta x'_\delta \Delta x'^\delta$), which is something we mentioned in Section 7.2.
It is important to note that the covariant 4-displacement is given by
\[
\Delta x_\alpha = g_{\delta\alpha}\Delta x^\delta \longrightarrow
\begin{bmatrix} -\Delta x^0 \\ \Delta x^1 \\ \Delta x^2 \\ \Delta x^3 \end{bmatrix}
=
\begin{bmatrix} -c\Delta t \\ \Delta x \\ \Delta y \\ \Delta z \end{bmatrix} ,
\]
where the only difference between this and the contravariant form is the
negative on the time component. This mathematical phenomenon is true of
all 4-vectors due to the metric tensor in Eq. 7.3.8. Sometimes it is written
shorthand as $(-c\Delta t, \Delta\vec{r})$ where $\Delta\vec{r}$ is the 3-displacement. In general, this
shorthand is essentially (time, $\overrightarrow{\text{space}}$).
Unlike classical physics, time derivatives of 4-vectors are not necessarily
also 4-vectors. Time is measured differently in different IRFs, which poses
issues. You also can’t take a 3-vector, just tack on a fourth component,
and call it a 4-vector. For example, the 3-velocity extended into dimension-4
would have a time component of $dx^0/dt = c$, but it’s a 4-pseudovector.
This is something made clear by the look of the transformations in Example
7.3.1. To make a 4-vector by taking a time derivative, we need to use a time
measurement all frames can agree on. Traditionally, we go with the proper
time, ∆τ .
Four-Velocity
If we’re talking time derivatives, then it makes sense to start with velocity.
The dimension-4 velocity vector of an object is defined as
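\[
u^\delta = \frac{dx^\delta}{d\tau} , \tag{7.4.3}
\]
or, using the chain rule together with time dilation ($dt/d\tau = \gamma$),
\[
u^\delta = \frac{dx^\delta}{dt}\frac{dt}{d\tau} = \gamma\,\frac{dx^\delta}{dt} ,
\]
where
\[
\gamma = \frac{1}{\sqrt{1 - u^2/c^2}} \tag{7.4.4}
\]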
and $\vec{u}$ is the relative velocity between the object and the frame in which
its velocity is measured. The velocity $\vec{u}$ is not the same as the relative
velocity $\vec{v}$ between two observers in two different IRFs. The object itself
would represent a third frame (not necessarily an IRF) independent from
the other two where it measures its own proper time. Its components in
Cartesian coordinates can be shown in matrix notation as
\[
u^\delta \longrightarrow
\begin{bmatrix} \gamma\, dx^0/dt \\ \gamma\, dx^1/dt \\ \gamma\, dx^2/dt \\ \gamma\, dx^3/dt \end{bmatrix}
=
\begin{bmatrix} \gamma c\, dt/dt \\ \gamma\, dx/dt \\ \gamma\, dy/dt \\ \gamma\, dz/dt \end{bmatrix}
=
\begin{bmatrix} \gamma c \\ \gamma u_x \\ \gamma u_y \\ \gamma u_z \end{bmatrix} , \tag{7.4.5}
\]
Figure 7.11: In this spacetime diagram, the world line for an object is shown. Its
4-velocity, $u^\delta$, is indicated by a green arrow and its 4-acceleration, $a^\delta$, is shown with a
purple arrow.
\[
u_\delta u^\delta = u_0 u^0 + u_1 u^1 + u_2 u^2 + u_3 u^3 = u_0 u^0 + \gamma^2\, \vec{u}\cdot\vec{u} ,
\]
where we’ve written the spatial product as the familiar dot product (think
shorthand 4-vector notation). We also know $u_0 = g_{0\mu}u^\mu = g_{00}u^0 = -u^0$
because the metric tensor is diagonal. Now the scalar product is
\[
u_\delta u^\delta = -\gamma^2 c^2 + \gamma^2 u^2 = -\gamma^2 c^2\left(1 - \frac{u^2}{c^2}\right)
\]
\[
u_\delta u^\delta = -c^2 , \tag{7.4.6}
\]
which is constant and true for all 4-velocities in all time-like frames. We
could have also said
\[
u_\delta u^\delta = g_{\delta\mu} u^\mu u^\delta = g_{\delta\mu}\frac{dx^\mu}{d\tau}\frac{dx^\delta}{d\tau} = \frac{g_{\delta\mu}\, dx^\mu dx^\delta}{d\tau\, d\tau} ,
\]
by Eq. 7.4.3. The numerator is just the general definition of the line element,
so
\[
u_\delta u^\delta = \frac{-c^2 d\tau^2}{d\tau\, d\tau} = -c^2 ,
\]
where we’ve assumed $ds^2 = -c^2 d\tau^2$ in the rest frame of the object (see
Example 7.2.1). This is exactly the same result as before.
You could argue the magnitude of the 4-velocity for all particles is $\sqrt{u_\delta u^\delta} = ic$
and it’s only the components that IRFs measure differently. In the rest
frame of the object, the contravariant 4-velocity is (c, 0), which is a fact we
can use to derive the generalized 4-velocity another way. If we use a Lorentz
transformation from the rest frame of the object into an arbitrary IRF, the
result is
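\[
u^\delta \longrightarrow
\begin{bmatrix}
\gamma & \gamma\beta & 0 & 0 \\
\gamma\beta & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} c \\ 0 \\ 0 \\ 0 \end{bmatrix}
=
\begin{bmatrix} \gamma c \\ \gamma\beta c \\ 0 \\ 0 \end{bmatrix}
=
\begin{bmatrix} \gamma c \\ \gamma u \\ 0 \\ 0 \end{bmatrix} , \tag{7.4.7}
\]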
where γ is given by Eq. 7.4.4 and we’re assuming β is positive due to the
direction of transformation. Note: The result would have been exactly Eq.
7.4.5 had we used Eq. 7.3.10 as the transformation matrix instead. You’ll
find this method is a very useful short-cut to have in your mathematical
toolbox.
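As a quick illustration of Eq. 7.4.6, the sketch below builds a 4-velocity from an arbitrary 3-velocity and evaluates its scalar product with the (−, +, +, +) metric. The function names and the sample velocity are assumptions chosen for the example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def four_velocity(u):
    """Contravariant 4-velocity (gamma c, gamma u) for a 3-velocity u in m/s."""
    u2 = sum(ui * ui for ui in u)
    g = 1.0 / math.sqrt(1.0 - u2 / C**2)
    return [g * C] + [g * ui for ui in u]


def scalar_product(p, q):
    """p_delta q^delta using the metric of Eq. 7.3.8."""
    return -p[0] * q[0] + sum(x * y for x, y in zip(p[1:], q[1:]))


U = four_velocity((0.3 * C, -0.2 * C, 0.5 * C))
print(scalar_product(U, U) / C**2)  # ratio to c^2; should be very close to -1
```

The ratio stays at −1 regardless of which sub-light 3-velocity is supplied, which is the sense in which the result is the same for all 4-velocities.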
We can also use the transformation of 4-velocity to write out a transfor-
mation for the coordinate velocity 4-pseudovector. Between two arbitrary
IRFs, the transformation for the 4-velocity is
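\[
u'^\delta = \Lambda^\delta{}_\mu\, u^\mu \tag{7.4.8}
\]
\[
\begin{bmatrix} \gamma' c \\ \gamma' u'_x \\ \gamma' u'_y \\ \gamma' u'_z \end{bmatrix}
=
\begin{bmatrix}
\gamma_T & -\gamma_T\beta_T & 0 & 0 \\
-\gamma_T\beta_T & \gamma_T & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \gamma c \\ \gamma u_x \\ \gamma u_y \\ \gamma u_z \end{bmatrix}
\]
where the T subscript stands for “transformation.” There are three different
gammas: $\gamma$ is between the unprimed frame and the object’s rest frame, $\gamma'$ is
between the primed frame and the object’s rest frame, and $\gamma_T$ is between the
primed and unprimed frames. If we move around the $\gamma$ and $\gamma'$, then we have
\[
\begin{bmatrix} c \\ u'_x \\ u'_y \\ u'_z \end{bmatrix}
=
\frac{\gamma}{\gamma'}
\begin{bmatrix}
\gamma_T & -\gamma_T\beta_T & 0 & 0 \\
-\gamma_T\beta_T & \gamma_T & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} c \\ u_x \\ u_y \\ u_z \end{bmatrix}
\]
and now the column matrices are just the coordinate velocity 4-pseudovectors
in primed and unprimed frames. Now we can write
\[
\frac{dx'^\delta}{dt'} = \frac{\gamma}{\gamma'}\,\Lambda^\delta{}_\mu\,\frac{dx^\mu}{dt} , \tag{7.4.9}
\]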
which is very reminiscent of a pseudovector transformation given that it
simply gains an extra scalar factor.
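The role of the extra factor γ/γ′ can be illustrated numerically. In this sketch (hypothetical function names, units with c = 1, boost along x), the coordinate velocity obtained by transforming the 4-velocity and dividing out γ′ is compared with the ordinary velocity-addition formulas.

```python
import math

C = 1.0  # assumption: units in which c = 1


def gamma(speed):
    return 1.0 / math.sqrt(1.0 - speed**2 / C**2)


def velocity_addition(u, v):
    """Standard 3-velocity transformation for a boost of speed v along x."""
    ux, uy, uz = u
    g_t = gamma(v)
    d = 1.0 - ux * v / C**2
    return ((ux - v) / d, uy / (g_t * d), uz / (g_t * d))


def velocity_via_four_velocity(u, v):
    """Transform the 4-velocity, then read gamma' off the time component."""
    g = gamma(math.sqrt(sum(c * c for c in u)))
    g_t, b_t = gamma(v), v / C
    U = [g * C, g * u[0], g * u[1], g * u[2]]
    U_p = [g_t * U[0] - g_t * b_t * U[1],
           -g_t * b_t * U[0] + g_t * U[1],
           U[2],
           U[3]]
    g_p = U_p[0] / C  # gamma' in the primed frame
    return tuple(comp / g_p for comp in U_p[1:]), g / g_p


u, v = (0.5, 0.2, -0.1), 0.3
print(velocity_addition(u, v))
print(velocity_via_four_velocity(u, v))
```

The second return value of `velocity_via_four_velocity` is the ratio γ/γ′ appearing in Eq. 7.4.9.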
Four-Acceleration
If an object is not only moving but accelerating, then we’ll also need a
second derivative of its 4-position. We call this the 4-acceleration and it is
defined by
\[
a^\delta = \frac{d}{d\tau}\left(\frac{dx^\delta}{d\tau}\right) = \frac{du^\delta}{d\tau} . \tag{7.4.10}
\]
Using the chain rule with $dt/d\tau = \gamma$, this becomes
\[
a^\delta = \frac{du^\delta}{dt}\frac{dt}{d\tau} = \gamma\,\frac{du^\delta}{dt} ,
\]
where γ is defined by Eq. 7.4.4. This γ contains a $\vec{u}$, which is the relative
velocity between the object and the frame in which its acceleration is
measured. The velocity $\vec{u}$ is not the same as the relative velocity $\vec{v}$ between
two observers in two different IRFs. The object itself would represent a third
frame independent from the other two where it measures its own proper time.
Its components in Cartesian coordinates can be shown in matrix notation
as
\[
a^\delta \longrightarrow
\gamma\frac{d}{dt}\begin{bmatrix} u^0 \\ u^1 \\ u^2 \\ u^3 \end{bmatrix}
= \gamma\frac{d}{dt}\begin{bmatrix} \gamma c \\ \gamma u_x \\ \gamma u_y \\ \gamma u_z \end{bmatrix}
= \begin{bmatrix} \gamma\dot{\gamma} c \\ \gamma\dot{\gamma} u_x + \gamma^2\dot{u}_x \\ \gamma\dot{\gamma} u_y + \gamma^2\dot{u}_y \\ \gamma\dot{\gamma} u_z + \gamma^2\dot{u}_z \end{bmatrix} ,
\]
where the dot accent represents a derivative with respect to coordinate time,
d/dt, and we’ve used the derivative product rule (defined by Eq. 3.1.5). We can
also write this in shorthand as $\left(\gamma\dot{\gamma} c,\ \gamma\dot{\gamma}\vec{u} + \gamma^2\dot{\vec{u}}\right)$.
However, we need to get rid of the dots. Let’s start with the most difficult
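dot to remove: $\dot{\gamma}$. It can be evaluated by
\[
\dot{\gamma} = \frac{d\gamma}{dt} = \frac{d}{dt}\left[\frac{1}{\sqrt{1 - u^2/c^2}}\right] = \frac{d}{dt}\left[\left(1 - \frac{u^2}{c^2}\right)^{-1/2}\right]
\]
\[
\dot{\gamma} = -\frac{1}{2}\left(1 - \frac{u^2}{c^2}\right)^{-3/2}\left(-\frac{1}{c^2}\right)\frac{d}{dt}\left(\vec{u}\cdot\vec{u}\right) ,
\]
where we’ve replaced $u^2$ with $\vec{u}\cdot\vec{u}$ for clarity in the next few steps and
$\vec{u} = (u_x, u_y, u_z)$ is the coordinate 3-velocity described in Example 7.3.1. Now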
we’re going to use Eq. 4.2.8 on the dot product. If you’re not convinced it
works for vectors, then use the derivative product rule (defined Eq. 3.1.5) to
get
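\[
\frac{d}{dt}\left(\vec{u}\cdot\vec{u}\right) = \frac{d\vec{u}}{dt}\cdot\vec{u} + \vec{u}\cdot\frac{d\vec{u}}{dt} = 2\,\vec{u}\cdot\frac{d\vec{u}}{dt} . \tag{7.4.11}
\]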
This results in
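\[
\dot{\gamma} = -\frac{1}{2}\left(1 - \frac{u^2}{c^2}\right)^{-3/2}\left(-\frac{2}{c^2}\right)\vec{u}\cdot\frac{d\vec{u}}{dt} = \frac{\gamma^3}{c^2}\,\vec{u}\cdot\frac{d\vec{u}}{dt} ,
\]
where we’ve used Eq. 7.4.4 to simplify. In Example 7.3.2, we defined the
coordinate 3-acceleration as $\vec{a} = d\vec{u}/dt = \dot{\vec{u}}$, so
\[
\dot{\gamma} = \frac{\gamma^3}{c^2}\left(\vec{u}\cdot\vec{a}\right) \tag{7.4.12}
\]
and the 4-acceleration becomes
\[
a^\delta \longrightarrow \left(\gamma\,\frac{\gamma^3}{c^2}\left(\vec{u}\cdot\vec{a}\right)c,\ \ \gamma\,\frac{\gamma^3}{c^2}\left(\vec{u}\cdot\vec{a}\right)\vec{u} + \gamma^2\vec{a}\right)
\]
\[
a^\delta \longrightarrow \left(\frac{\gamma^4}{c}\left(\vec{u}\cdot\vec{a}\right),\ \ \frac{\gamma^4}{c^2}\left(\vec{u}\cdot\vec{a}\right)\vec{u} + \gamma^2\vec{a}\right) . \tag{7.4.13}
\]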
As you can see, this is very complex, but it has very important implications.
The 4-acceleration can be looked at another way. As a proper time deriva-
tive of 4-velocity, it represents the rate of change of the world line tangent
vector. That makes it the curvature vector to the object’s world line (see
Figure 7.11). Also, we can take the scalar product of the 4-acceleration with
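the 4-velocity. Using Eq. 4.2.8 and Eq. 7.4.6 to simplify, the result is
\[
u_\delta a^\delta = u_\delta\frac{du^\delta}{d\tau} = \frac{1}{2}\frac{d}{d\tau}\left(u_\delta u^\delta\right) = \frac{1}{2}\frac{d}{d\tau}\left(-c^2\right) = 0 , \tag{7.4.14}
\]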
which is true for all objects in all frames. Since the scalar product is akin
to the dot product, this says something about their orthogonality. How-
ever, spacetime is a hyperbolic space, so this implies the 4-acceleration and
4-velocity are hyperbolic orthogonal. Mathematically, this means some-
thing very different than what we normally think of as orthogonal. Physically,
since spacetime is hyperbolic and there isn’t any other spacetime, hyperbolic
orthogonal is the only orthogonal. This is one of those “Don’t sweat the
details” moments.
We can also take the scalar product of the 4-acceleration with itself, but
the result isn’t as profound as it was for the 4-velocity. We can still use the
shorthand notation for the scalar product as we did with the 4-velocity. The
general definition of this shorthand is given by
\[
T_\delta T^\delta = T_0 T^0 + T_1 T^1 + T_2 T^2 + T_3 T^3 = T_0 T^0 + \vec{T}\cdot\vec{T} ,
\]
\[
T_\delta T^\delta = -\left(T^t\right)^2 + \vec{T}\cdot\vec{T} , \tag{7.4.15}
\]
where T δ is an arbitrary 4-vector and the negative comes from the metric
tensor. For the 4-acceleration, this would be
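\[
\begin{aligned}
a_\delta a^\delta &= -\frac{\gamma^8}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2 + \left[\frac{\gamma^4}{c^2}\left(\vec{u}\cdot\vec{a}\right)\vec{u} + \gamma^2\vec{a}\right]\cdot\left[\frac{\gamma^4}{c^2}\left(\vec{u}\cdot\vec{a}\right)\vec{u} + \gamma^2\vec{a}\right] \\
&= -\frac{\gamma^8}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2 + \frac{\gamma^8}{c^4}\left(\vec{u}\cdot\vec{a}\right)^2\left(\vec{u}\cdot\vec{u}\right) + \frac{2\gamma^6}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2 + \gamma^4\left(\vec{a}\cdot\vec{a}\right) \\
&= \frac{\gamma^6}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2\left[-\gamma^2 + \frac{\gamma^2}{c^2}\left(\vec{u}\cdot\vec{u}\right) + 2\right] + \gamma^4\left(\vec{a}\cdot\vec{a}\right) \\
&= \frac{\gamma^6}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2\left[-\gamma^2 + \frac{\gamma^2}{c^2}u^2 + 2\right] + \gamma^4 a^2 \\
&= \frac{\gamma^6}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2\left[-\gamma^2\left(1 - \frac{u^2}{c^2}\right) + 2\right] + \gamma^4 a^2
\end{aligned}
\]
and, by Eq. 7.4.4,
\[
a_\delta a^\delta = \frac{\gamma^6}{c^2}\left(\vec{u}\cdot\vec{a}\right)^2 + \gamma^4 a^2 . \tag{7.4.16}
\]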
This scalar product is still spacetime invariant just like any other real scalar
(as opposed to a pseudoscalar), but is not constant like it was for the 4-
velocity.
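Both scalar products involving the 4-acceleration can be checked numerically. The sketch below (hypothetical helper names, units with c = 1, arbitrary sample vectors) builds the 4-acceleration from Eq. 7.4.13 and compares its products against Eqs. 7.4.14 and 7.4.16.

```python
import math

C = 1.0  # assumption: units in which c = 1


def dot3(a, b):
    return sum(x * y for x, y in zip(a, b))


def gamma(u):
    return 1.0 / math.sqrt(1.0 - dot3(u, u) / C**2)


def four_velocity(u):
    g = gamma(u)
    return [g * C] + [g * ui for ui in u]


def four_acceleration(u, a):
    """Components of Eq. 7.4.13 for coordinate velocity u and acceleration a."""
    g, ua = gamma(u), dot3(u, a)
    time_part = g**4 * ua / C
    space_part = [g**4 * ua * ui / C**2 + g**2 * ai for ui, ai in zip(u, a)]
    return [time_part] + space_part


def scalar_product(p, q):
    """Scalar product with the (-, +, +, +) metric."""
    return -p[0] * q[0] + dot3(p[1:], q[1:])


def accel_invariant(u, a):
    """Right-hand side of Eq. 7.4.16."""
    g = gamma(u)
    return g**6 * dot3(u, a) ** 2 / C**2 + g**4 * dot3(a, a)


u, a = (0.3, 0.4, -0.2), (0.1, -0.5, 0.25)
U, A = four_velocity(u), four_acceleration(u, a)
print(scalar_product(U, A))                      # close to zero
print(scalar_product(A, A), accel_invariant(u, a))
```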
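In the rest frame of the object (we’ll call it the double-primed frame), we
know $\vec{u}\,'' = 0$ and $\gamma'' = 1$, so the scalar product reduces to
\[
a_\delta a^\delta = a_p^2 ,
\]
where $a_p$ is the acceleration measured in that rest frame (the proper acceleration). Since the scalar product is invariant, Eq. 7.4.16 with $\vec{u}\cdot\vec{a} = ua\cos\theta$ gives
\[
a_p^2 = \gamma^6\frac{u^2}{c^2}a^2\cos^2\theta + \gamma^4 a^2 = \gamma^4\left(\gamma^2\beta^2\cos^2\theta + 1\right)a^2 ,
\]
where β ≡ u/c. Solving for a, we get
\[
a = \frac{a_p}{\gamma^2\sqrt{\gamma^2\beta^2\cos^2\theta + 1}} . \tag{7.4.18}
\]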
2 2
This simplifies in certain special cases given you know θ, the angle between
$\vec{u}$ and $\vec{a}$. Since the smallest γ ever gets is one and the smallest β ever gets is
zero, the denominator in Eq. 7.4.18 is always greater than or equal to one.
Therefore, $a_{\max} = a_p$.
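The special cases of Eq. 7.4.18 are easy to tabulate. The short sketch below uses a hypothetical function name and sample values; it shows that acceleration parallel to the velocity gives $a = a_p/\gamma^3$ while perpendicular acceleration gives $a = a_p/\gamma^2$.

```python
import math


def coordinate_acceleration(a_p, beta, theta):
    """Eq. 7.4.18: coordinate acceleration from proper acceleration a_p.

    beta is u/c and theta is the angle between velocity and acceleration.
    """
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return a_p / (g**2 * math.sqrt(g**2 * beta**2 * math.cos(theta) ** 2 + 1.0))


a_p, beta = 9.8, 0.6           # sample values; gamma = 1.25
print(coordinate_acceleration(a_p, 0.0, 0.3))           # at rest: equals a_p
print(coordinate_acceleration(a_p, beta, 0.0))          # parallel: a_p / gamma^3
print(coordinate_acceleration(a_p, beta, math.pi / 2))  # perpendicular: a_p / gamma^2
```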
Four-Momentum
If we extend momentum into the 4-vector realm, then it’s called 4-momentum
and is defined very similarly to that of 3-momentum. We have
\[
p^\delta = m_p u^\delta , \tag{7.4.19}
\]
where $m_p$ is the rest mass (or proper mass) and $u^\delta$ is the 4-velocity. As long
as the rest mass isn’t changing, the 4-momentum has all the same properties
as the 4-velocity. Its components can be written in shorthand as
\[
p^\delta \longrightarrow \left(\gamma m_p c,\ \gamma m_p\vec{u}\right)
\]
or even
\[
p^\delta \longrightarrow \left(\frac{E_{rel}}{c},\ \vec{p}_{rel}\right) , \tag{7.4.22}
\]
where $E_{rel} \equiv \gamma E_p$ and $\vec{p}_{rel} \equiv \gamma\vec{p}$ are defined as the relativistic energy and
relativistic 3-momentum.
This is very convenient because we have incorporated conservation of
energy and conservation of 3-momentum into one principle: conservation
of 4-momentum:
where either side includes the entire system. The subscripts “before” and
“after” refer to measurements taken before and after some event in spacetime.
It is also important to distinguish between conserved and invariant using the
following definitions:
which is true for all objects, but is only constant if rest mass doesn’t change.
Upon close inspection, this yields a very familiar and useful invariant equa-
tion. Evaluating the scalar product using Eq. 7.4.15, we get
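\[
\begin{aligned}
-\left(\frac{E_{rel}}{c}\right)^2 + \vec{p}_{rel}\cdot\vec{p}_{rel} &= -m_p^2 c^2 \\
-\left(\frac{E_{rel}}{c}\right)^2 + \left(p_{rel}\right)^2 &= -m_p^2 c^2 \\
-\frac{E_{rel}^2}{c^2} + p_{rel}^2 &= -m_p^2 c^2 \\
-E_{rel}^2 + p_{rel}^2 c^2 &= -m_p^2 c^4
\end{aligned}
\]
\[
E_{rel}^2 = m_p^2 c^4 + p_{rel}^2 c^2 = E_p^2 + p_{rel}^2 c^2 , \tag{7.4.25}
\]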
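Eq. 7.4.25 can be spot-checked for any speed. In this sketch the function names are hypothetical, energies are in MeV, and the relativistic momentum is expressed as $p_{rel}c = \gamma\beta E_p$, which follows from $\vec{p}_{rel} = \gamma m_p\vec{u}$ and $E_p = m_p c^2$.

```python
import math


def relativistic_energy(E_p, beta):
    """E_rel = gamma * E_p for rest energy E_p and speed beta = u/c."""
    return E_p / math.sqrt(1.0 - beta**2)


def relativistic_pc(E_p, beta):
    """p_rel * c = gamma * beta * E_p, in the same energy units as E_p."""
    return beta * E_p / math.sqrt(1.0 - beta**2)


E_p, beta = 105.7, 0.271      # sample values: a muon rest energy in MeV
E_rel = relativistic_energy(E_p, beta)
pc = relativistic_pc(E_p, beta)
print(E_rel**2, E_p**2 + pc**2)  # the two sides of Eq. 7.4.25
```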
Four-Force
If we extend net force into the 4-vector realm, then it’s called 4-force and is
defined very similarly to that of 3-force. We have
\[
F^\delta = \frac{dp^\delta}{d\tau} = m_p a^\delta , \tag{7.4.26}
\]
where mp is the rest mass (or proper mass) and aδ is the 4-acceleration. As
long as the rest mass isn’t changing, the 4-force has all the same properties
as the 4-acceleration. If the rest mass does change, then Eq. 7.4.26 simply
has an extra term due to the derivative product rule (defined by Eq. 3.1.5).
The components of the 4-force can be written in shorthand just as we did
for the 4-momentum using Eq. 7.4.13. The result is
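\[
F^\delta \longrightarrow \left(\frac{\gamma^4}{c}m_p\left(\vec{u}\cdot\vec{a}\right),\ \ \frac{\gamma^4}{c^2}m_p\left(\vec{u}\cdot\vec{a}\right)\vec{u} + \gamma^2 m_p\vec{a}\right) , \tag{7.4.27}
\]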
\[
F^\delta \longrightarrow \left(\gamma\frac{P_{rel}}{c},\ \gamma\vec{F}_{rel}\right) , \tag{7.4.28}
\]
where F~rel ≡ d~prel /dt is the relativistic coordinate 3-force and Prel ≡
dErel /dt is the relativistic coordinate power.
This generalization of net force (essentially Newton’s second law) can be
used to solve problems in terms of Newton’s laws of motion. However, you
must use the 4-vector forms of velocity, acceleration, momentum, and force.
Newton’s first law can be written as
\[
u^\delta = \frac{dx^\delta}{d\tau} = \text{constant} \quad \text{if } F^\delta = 0 , \tag{7.4.29}
\]
which looks just like it did in classical physics. Newton’s third law of motion
does not generalize to special relativity in the sense that we’re used to using
it. In classical physics, it is consistent to replace the words “action” and
“reaction” with the word “force” because they are analogous. This cannot
be done if the motion is relativistic because an “action” is a fundamentally
unique quantity. Mutual opposite forces are not necessarily equal in mag-
nitude. As a result, it is often easier to use Eqs. 7.4.23 and 7.4.25 to solve
problems.
Example 7.4.1
A widely used example of special relativity is the decay of a negative pion.
A negative pion (Ep,π = 139.6 MeV) is a type of massive particle that often
Figure 7.12: This is the before and after picture for the decay of a negative pion into a
muon and a muon-antineutrino. It is shown in the rest frame of the pion. The line above
the ν indicates the “anti” part of the neutrino.
decays into two other massive particles: a muon (Ep,µ = 105.7 MeV) and
a muon-antineutrino. Typically, a neutrino (designated by the symbol ν) is
considered massless because its mass is very small compared to other particles (e.g.
$E_{p,\nu} \ll E_{p,\mu}$), but it is not actually massless. We’re not quite prepared to
deal with massless particles, but unfortunately the neutrino’s mass is only
approximately known. For the purposes of this example, we’ll go with a
“middle of the road” estimate of Ep,ν = 1.5 eV (not MeV).
Let’s start this problem by stating that all measurements will be taken in
the pion’s rest frame. Now we’ll apply conservation of 4-momentum (given
by Eq. 7.4.23) using Figure 7.12, which results in
where δ is a free index capable of taking on four different values (Note that
π, µ, and ν are not indices but just labels for the particles). This is actually
four equations: one for each component of 4-momentum. We can write these
component equations out using Eq. 7.4.21 in shorthand notation as
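\[
\left(\frac{E_{p,\pi}}{c},\ 0\right) = \left(\gamma_\mu\frac{E_{p,\mu}}{c},\ \gamma_\mu p_\mu\hat{x}\right) + \left(\gamma_\nu\frac{E_{p,\nu}}{c},\ -\gamma_\nu p_\nu\hat{x}\right)
\]
or in matrix notation as
\[
\begin{bmatrix} E_{p,\pi}/c \\ 0 \\ 0 \\ 0 \end{bmatrix}
=
\begin{bmatrix} \gamma_\mu E_{p,\mu}/c \\ \gamma_\mu p_\mu \\ 0 \\ 0 \end{bmatrix}
+
\begin{bmatrix} \gamma_\nu E_{p,\nu}/c \\ -\gamma_\nu p_\nu \\ 0 \\ 0 \end{bmatrix} .
\]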
where the y and z components are unnecessary. However, Eq. Set 7.4.30 has
two equations, but four unknowns. We need two other equations to solve
this system and they will come from Eq. 7.4.25. With a little manipulation,
we get
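\[
\gamma^2 E_p^2 = E_p^2 + \gamma^2 p^2 c^2 \quad\Rightarrow\quad \gamma p c = \sqrt{\gamma^2 - 1}\,E_p ,
\]
which we can use on both the muon and the neutrino. This gives
\[
\gamma_\mu p_\mu c = \sqrt{\gamma_\mu^2 - 1}\,E_{p,\mu} \tag{7.4.31a}
\]
\[
\gamma_\nu p_\nu c = \sqrt{\gamma_\nu^2 - 1}\,E_{p,\nu} \tag{7.4.31b}
\]
Since the spatial components require $\gamma_\mu p_\mu = \gamma_\nu p_\nu$, we can equate these and square both sides:
\[
\begin{aligned}
\gamma_\mu p_\mu c &= \gamma_\nu p_\nu c \\
\left(\gamma_\mu^2 - 1\right)E_{p,\mu}^2 &= \left(\gamma_\nu^2 - 1\right)E_{p,\nu}^2 \\
\gamma_\mu^2 E_{p,\mu}^2 - E_{p,\mu}^2 &= \gamma_\nu^2 E_{p,\nu}^2 - E_{p,\nu}^2 .
\end{aligned}
\]
Using Eq. 7.4.30a to solve for $\gamma_\nu E_{p,\nu}$ yields $\gamma_\nu E_{p,\nu} = E_{p,\pi} - \gamma_\mu E_{p,\mu}$, so
\[
\begin{aligned}
\gamma_\mu^2 E_{p,\mu}^2 - E_{p,\mu}^2 &= \left(E_{p,\pi} - \gamma_\mu E_{p,\mu}\right)^2 - E_{p,\nu}^2 \\
\gamma_\mu^2 E_{p,\mu}^2 - E_{p,\mu}^2 &= E_{p,\pi}^2 - 2\gamma_\mu E_{p,\mu}E_{p,\pi} + \gamma_\mu^2 E_{p,\mu}^2 - E_{p,\nu}^2 \\
2\gamma_\mu E_{p,\mu}E_{p,\pi} &= E_{p,\pi}^2 + E_{p,\mu}^2 - E_{p,\nu}^2
\end{aligned}
\]
\[
\gamma_\mu = \frac{E_{p,\pi}^2 + E_{p,\mu}^2 - E_{p,\nu}^2}{2E_{p,\mu}E_{p,\pi}} . \tag{7.4.32}
\]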
Substituting in all the rest energies gives a value of $\gamma_\mu = 1.039$ for our
example. This being close to a value of one implies that the muon is moving
relatively slowly after the decay.
We can now summarize by finding all there is to know about the muon:
Eq. 7.4.4 gives us the speed, γµ Ep,µ is the relativistic energy, γµ Ep,µ − Ep,µ
is the kinetic energy, and Eq. 7.4.31a gives us the relativistic momentum.
Therefore,
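\[
\begin{aligned}
\gamma_\mu &= 1.039 \\
u_\mu &= 0.271c \\
p^\delta_\mu &\longrightarrow \left(109.8\ \tfrac{\text{MeV}}{c},\ 29.78\ \tfrac{\text{MeV}}{c}\,\hat{x}\right) \\
KE_\mu &= 4.1\ \text{MeV}
\end{aligned}
\]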
\[
\gamma_\nu = \frac{E_{p,\pi} - \gamma_\mu E_{p,\mu}}{E_{p,\nu}} ,
\]
which gives a value of γν = 1.99 × 107 corresponding to a speed of uν =
0.999 999 999 999 999 c. That’s 15 nines after the decimal point! We can now
Figure 7.13: This is a spacetime diagram showing the decay of a negative pion into a muon
and a muon-antineutrino. The coordinate system shown is the rest frame of the pion. It
is clear that the antineutrino zips off almost along a light-like world line due to its very
low mass.
summarize by finding all there is to know about the neutrino: Eq. 7.4.4 gives
us the speed, γν Ep,ν is the relativistic energy, γν Ep,ν − Ep,ν is the kinetic
energy, and Eq. 7.4.31b gives us the relativistic momentum. Therefore,
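\[
\begin{aligned}
\gamma_\nu &= 1.99\times 10^{7} \\
u_\nu &= 0.999\,999\,999\,999\,999\,c \\
p^\delta_\nu &\longrightarrow \left(29.8\ \tfrac{\text{MeV}}{c},\ -29.8\ \tfrac{\text{MeV}}{c}\,\hat{x}\right) \\
KE_\nu &= 29.8\ \text{MeV}
\end{aligned}
\]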
where it can be seen that the neutrino’s total energy is entirely kinetic en-
ergy within the significant figures we’ve kept. It’s also apparent from the
neutrino’s 4-momentum that it’s traveling on very nearly a null world line
since its time and space components are the same (See Figure 7.13).
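The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below uses the rest energies quoted above (in MeV); the variable names are arbitrary, and the neutrino speed is reported as $1 - \beta_\nu \approx 1/(2\gamma_\nu^2)$ because the difference from c is too small to resolve by direct subtraction in double precision.

```python
import math

# Rest energies in MeV, as quoted in the example
E_PI, E_MU, E_NU = 139.6, 105.7, 1.5e-6

# Eq. 7.4.32 and the muon quantities that follow from it
gamma_mu = (E_PI**2 + E_MU**2 - E_NU**2) / (2.0 * E_MU * E_PI)
beta_mu = math.sqrt(1.0 - 1.0 / gamma_mu**2)
E_rel_mu = gamma_mu * E_MU
pc_mu = math.sqrt(gamma_mu**2 - 1.0) * E_MU      # Eq. 7.4.31a
KE_mu = E_rel_mu - E_MU

# Neutrino quantities from energy conservation
gamma_nu = (E_PI - E_rel_mu) / E_NU
one_minus_beta_nu = 1.0 / (2.0 * gamma_nu**2)    # approximation for gamma >> 1
E_rel_nu = gamma_nu * E_NU
pc_nu = math.sqrt(gamma_nu**2 - 1.0) * E_NU      # Eq. 7.4.31b

print(f"muon:     gamma={gamma_mu:.3f}  u={beta_mu:.3f}c  "
      f"E={E_rel_mu:.1f}  pc={pc_mu:.2f}  KE={KE_mu:.1f}")
print(f"neutrino: gamma={gamma_nu:.3g}  1-u/c={one_minus_beta_nu:.2e}  "
      f"E={E_rel_nu:.1f}  pc={pc_nu:.2f}")
```

The two momenta agree, as conservation of 3-momentum in the pion's rest frame requires.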
Just as a check, you can add the 4-momenta of the muon and the
muon-antineutrino and you’ll arrive at the 4-momentum of the pion. We
can take note again that rest energy, $E_p$, is not conserved (as expected)
since 139.6 MeV ≠ 105.7 MeV + 1.5 eV, but is invariant since each of those
three measurements is the same in all frames of reference. The missing 33.9
MeV went into the kinetic energy (i.e. the motion) of the muon and
muon-antineutrino. Rest energy isn’t really anything new. For example, the pion is
made of more fundamental particles, so the 139.6 MeV is simply the kinetic
energy of those particles (the pion’s rest frame is the center of mass frame
for those particles) plus the potential energy between those particles (i.e. the
nuclear bonds).
The relativistic energy, $E_{rel}$, is conserved since 139.6 MeV = 109.8 MeV +
29.8 MeV in the rest frame of the pion and 145.0 MeV = 105.7 MeV + 39.3 MeV
in the rest frame of the muon. However, relativistic energy is not invariant
since $E_{rel,\mu}$ = 109.8 MeV in the rest frame of the pion ($\gamma_\mu = 1.039$), but
$E_{rel,\mu}$ = 105.7 MeV in the rest frame of the muon ($\gamma_\mu = 1$). The same
can be shown for the pion ($E_{rel,\pi}$ = 145.0 MeV) and the muon-antineutrino
($E_{rel,\nu}$ = 39.3 MeV). The pion has the extra 5.4 MeV due to its motion in
the muon’s rest frame.
Total charge, on the other hand, is $q = -1.602\times 10^{-19}$ C before and
after the decay, which makes it conserved. It is also measured to be $q =
-1.602\times 10^{-19}$ C in every frame of reference, which makes it invariant. This
is a distinctive quality of charge and is very important in all particle decays.
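\[
\nabla_\alpha T^\delta = \frac{\partial}{\partial x^\alpha}T^\delta = \frac{\partial T^\delta}{\partial x^\alpha} ,
\]
where upper indices in the denominator of a derivative are actually lower
indices. If we want a scalar result, then
\[
\nabla_\alpha T^\alpha = \frac{\partial T^\alpha}{\partial x^\alpha} = \frac{\partial T^0}{\partial x^0} + \frac{\partial T^1}{\partial x^1} + \frac{\partial T^2}{\partial x^2} + \frac{\partial T^3}{\partial x^3} ,
\]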
\[
\frac{1}{c}\frac{\partial\left(c\rho\right)}{\partial t} + \vec{\nabla}\cdot\vec{J} = 0 ,
\]
where we can say cρ represents the time component of the current density
4-vector (or 4-current). Therefore, in shorthand notation, we can write the
4-current as
\[
J^\alpha \longrightarrow \left(c\rho,\ \vec{J}\right) , \tag{7.5.2}
\]
\[
\nabla_\alpha J^\alpha = 0 , \tag{7.5.4}
\]
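\[
\frac{1}{c}\frac{\partial\left(\phi/c\right)}{\partial t} + \vec{\nabla}\cdot\vec{A} = 0 ,
\]
where we can say φ/c represents the time component of the potential 4-vector
(or 4-potential). Therefore, in shorthand notation, we can write the
4-potential as
\[
A^\alpha \longrightarrow \left(\frac{\phi}{c},\ \vec{A}\right) . \tag{7.5.5}
\]
\[
\nabla_\alpha A^\alpha = 0 , \tag{7.5.6}
\]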
where ∇ is the covariant derivative. However, Eqs. 7.5.5 and 7.5.6 don’t
work under any gauges other than the Lorenz gauge. Conveniently, this is
the gauge we used to derive Maxwell’s equations in Section 5.6.
\[
\Box \equiv \nabla_\delta\nabla^\delta = g^{\mu\delta}\nabla_\delta\nabla_\mu ,
\]
\[
\Box \equiv -\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \vec{\nabla}^2 . \tag{7.5.7}
\]
Now we get
\[
\Box\vec{A} = -\mu_0\vec{J} ,
\]
which looks much simpler. Note here that $\vec{A}$ and $\vec{J}$ are the spatial components
of their respective 4-vector counterparts.
Using Eq. 5.6.15, we can get a very similar result. A little rearranging
gives us
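\[
-\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} + \vec{\nabla}^2\phi = -\frac{\rho}{\epsilon_0} .
\]
If we multiply through by $c/c^2 = c\mu_0\epsilon_0$ (we used Eq. 5.5.4), then we get
\[
-\frac{1}{c^2}\frac{\partial^2\left(\phi/c\right)}{\partial t^2} + \vec{\nabla}^2\left(\frac{\phi}{c}\right) = -\mu_0\left(c\rho\right)
\]
\[
\left[-\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \vec{\nabla}^2\right]\left(\frac{\phi}{c}\right) = -\mu_0\left(c\rho\right)
\]
\[
\Box\left(\frac{\phi}{c}\right) = -\mu_0\left(c\rho\right) .
\]
The parenthetical quantities on both sides just represent time components
of the 4-potential and 4-current, respectively. Therefore, we can conclude in
general that
\[
\Box A^\alpha = -\mu_0 J^\alpha ,
\]
where we’ve simplified Maxwell’s equations down to one equation using tensor
analysis.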
\[
-\frac{\vec{E}}{c} = \vec{\nabla}A^t + \frac{1}{c}\frac{\partial\vec{A}}{\partial t} ,
\]
where we’ve multiplied through by −1/c. It sort of looks like a covariant
derivative on the right side, but not quite since the components are mixed.
Let’s get a better look at this through its components, which are given by
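\[
\begin{aligned}
-\frac{E^x}{c} &= \nabla^x A^t + \frac{1}{c}\frac{\partial A^x}{\partial t} \\
-\frac{E^y}{c} &= \nabla^y A^t + \frac{1}{c}\frac{\partial A^y}{\partial t} \\
-\frac{E^z}{c} &= \nabla^z A^t + \frac{1}{c}\frac{\partial A^z}{\partial t}
\end{aligned}
\]
\[
\begin{aligned}
-E^x/c &= \nabla^x A^t - \nabla^t A^x \\
-E^y/c &= \nabla^y A^t - \nabla^t A^y \\
-E^z/c &= \nabla^z A^t - \nabla^t A^z
\end{aligned} , \tag{7.5.9}
\]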
\[
\nabla^\delta = g^{\mu\delta}\nabla_\mu
\]
\[
\nabla^t = g^{tt}\nabla_t + g^{xt}\nabla_x + g^{yt}\nabla_y + g^{zt}\nabla_z = -\nabla_t
\]
to keep the same derivative throughout (in this case, the contravariant deriva-
tive).
We can also perform this same process for Eq. 5.6.2. In index notation
for dimension-3, the magnetic field can be written
Bi = εijk ∇j Ak ,
where εijk is the dimension-3 Levi-Civita tensor defined by Eq. 6.6.4. Since
all three indices must be different, this leaves us with the components of
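\[
\begin{aligned}
B^x &= \nabla^y A^z - \nabla^z A^y \\
B^y &= \nabla^z A^x - \nabla^x A^z \\
B^z &= \nabla^x A^y - \nabla^y A^x
\end{aligned} , \tag{7.5.10}
\]
\[
F^{\alpha\delta} = \nabla^\alpha A^\delta - \nabla^\delta A^\alpha . \tag{7.5.11}
\]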
This represents an antisymmetric dimension-4 rank-2 tensor. As a dimension-4
rank-2 tensor, it has the expected $4^2 = 16$ components. Since it’s antisymmetric
(i.e. $F^{\alpha\delta} = -F^{\delta\alpha}$), the diagonal components must be zero leaving only
12 components, but half of those are just opposite-sign duplicates. That’s
six independent components!
Using Eqs. 7.5.9 and 7.5.10 with Eq. 7.5.11, we get a contravariant form
of
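\[
F^{\alpha\delta} \longrightarrow
\begin{bmatrix}
0 & E_x/c & E_y/c & E_z/c \\
-E_x/c & 0 & B_z & -B_y \\
-E_y/c & -B_z & 0 & B_x \\
-E_z/c & B_y & -B_x & 0
\end{bmatrix} \tag{7.5.12}
\]
and a covariant form of
\[
F_{\alpha\delta} \longrightarrow
\begin{bmatrix}
0 & -E_x/c & -E_y/c & -E_z/c \\
E_x/c & 0 & B_z & -B_y \\
E_y/c & -B_z & 0 & B_x \\
E_z/c & B_y & -B_x & 0
\end{bmatrix} . \tag{7.5.13}
\]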
It’s only the electric field components that change from contravariant to
covariant because they’re the only components to have an index value of
zero (or a value of t depending on how you look at it), which is the index
that experiences sign change according to Eq. 7.3.8. This is a real tensor as
it transforms according to
\[
F_{\alpha\delta}F^{\alpha\delta} = 2\left(\vec{B}\cdot\vec{B} - \frac{\vec{E}\cdot\vec{E}}{c^2}\right) , \tag{7.5.15}
\]
where α and δ are both summation indices (equivalent to taking the trace of
the matrix product). The determinant of the tensor,
\[
\det(F) = \frac{\left(E_x B_x + E_y B_y + E_z B_z\right)^2}{c^2} = \frac{\left(\vec{E}\cdot\vec{B}\right)^2}{c^2} , \tag{7.5.16}
\]
c2 c2
is also spacetime invariant. Even if the electric and magnetic field compo-
nents change between frames, the results of Eqs. 7.5.15 and 7.5.16 will not.
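Both invariants can be checked directly from the matrices in Eqs. 7.5.12 and 7.5.13. The sketch below is plain Python with hypothetical helper names, units with c = 1, and arbitrary sample fields; it evaluates the full contraction (which works out to $2(B^2 - E^2/c^2)$ for these matrices) and the determinant (which works out to $(\vec{E}\cdot\vec{B})^2/c^2$) before and after a boost along x.

```python
import math

C = 1.0  # assumption: units in which c = 1


def field_tensor(E, B):
    """Contravariant field tensor laid out as in Eq. 7.5.12."""
    ex, ey, ez = (e / C for e in E)
    bx, by, bz = B
    return [[0.0, ex, ey, ez],
            [-ex, 0.0, bz, -by],
            [-ey, -bz, 0.0, bx],
            [-ez, by, -bx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def boost_x(beta):
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def contraction(F):
    """F_{alpha delta} F^{alpha delta} with the metric diag(-1, 1, 1, 1)."""
    sign = [-1.0, 1.0, 1.0, 1.0]
    return sum(sign[a] * sign[d] * F[a][d] ** 2
               for a in range(4) for d in range(4))


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, val in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * val * det(minor)
    return total


E, B = (1.0, 2.0, -0.5), (0.3, -0.7, 1.1)
F = field_tensor(E, B)
lam = boost_x(0.6)
F_boosted = matmul(matmul(lam, F), transpose(lam))
print(contraction(F), contraction(F_boosted))
print(det(F), det(F_boosted))
```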
Example 7.5.1
In your IRF, a charge q (which is invariant) is moving to the right with a
constant speed of u (which is not invariant). Determine the E-field at an
arbitrary point around this moving charge.
\[
\vec{E} = \frac{k_E q}{\left(r''\right)^2}\,\hat{r}'' = k_E q\,\frac{\vec{r}\,''_p - \vec{r}\,''_q}{\left|\vec{r}\,''_p - \vec{r}\,''_q\right|^3}
\]
Figure 7.14: There are two IRFs shown: the charge’s rest frame (double-primed) and
another frame (unprimed) in which it is moving with constant velocity ~u in the x-direction.
A position vector of an arbitrary point p is also shown for both frames. This point p appears
closer to the charge along the direction of motion in the unprimed frame due to length
contraction.
where $\vec{r}\,''_p$ is the location of the arbitrary point and $\vec{r}\,''_q$ is the location
of the charge in the rest frame of the charge. For simplicity, since it’s
the only object in the system, we can put the charge at the origin
(i.e. $\vec{r}\,''_q = 0$ and $\vec{r}\,''_p = \vec{r}\,''$) allowing us to drop the more complicated
notation. The result in Cartesian coordinates is
\[
\vec{E} = \frac{k_E q}{\left(r''\right)^3}\,\vec{r}\,'' = \frac{k_E q}{\left[\left(x''\right)^2 + \left(y''\right)^2 + \left(z''\right)^2\right]^{3/2}}\left(x''\hat{x} + y''\hat{y} + z''\hat{z}\right) .
\]
• Before we can transform from this rest frame out to an arbitrary IRF
(just as we did in Eq. 7.4.7), we need to simplify a bit. We’ll be using
the standard Lorentz transformation matrix from this chapter which
assumes the relative motion between the two frames is only in the
x-direction. Since we’re starting from the rest frame, this implies the
motion of the charge in the new frame should be measured in the x-direction.
Just as in Example 5.2.1, this will result in cylindrical symmetry about
the x-axis and, therefore, $(x'', s'', \phi'')$ as a set of generalized coordinates.
The electric field in the rest frame of the charge can now be written as
\[
\vec{E} = \frac{k_E q}{\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}\left(x''\hat{x} + y''\hat{y}\right) ,
\]
where we have suppressed the z-direction making $y'' = s''$ (we’re staying
in the xy-plane and we’ll bring back the z-direction later).
• Now we need to write out this electric field in the form of the EMF
tensor since it won’t obey Lorentz transformations otherwise. As usual,
it’s more convenient to work with contravariant forms, so we’ll use Eq.
7.5.12 to get
\[
F''^{\alpha\delta} \longrightarrow \frac{k_E q}{c\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}
\begin{bmatrix}
0 & x'' & y'' & 0 \\
-x'' & 0 & 0 & 0 \\
-y'' & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
where all the magnetic field components are zero because stationary
charges don’t generate magnetic fields. We’ve also pulled out all
quantities common to all non-zero components.
• The transformation equation is given in index notation by Eq. 7.5.14,
but some of us may still feel more comfortable with matrix notation.
If we intend to write this transformation in terms of matrix multipli-
cation, then we need to be more careful. Because matrices do not
commute, the order will matter. We need to make sure we’re sum-
ming over columns in the first matrix and rows in the second. A little
rearranging gives
\[
F^{\mu\nu} = \Lambda^\mu{}_\alpha\, F''^{\alpha\delta}\, \Lambda^\nu{}_\delta ,
\]
where we define the first index on F as the row index and the second
as the column index (Λ is symmetric so it doesn’t matter which index
is which). Let’s do this step-by-step so we don’t get lost. Starting with
the last two matrices, we get
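\[
F''^{\alpha\delta}\Lambda^\nu{}_\delta \longrightarrow \frac{k_E q}{c\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}
\begin{bmatrix}
0 & x'' & y'' & 0 \\
-x'' & 0 & 0 & 0 \\
-y'' & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
\gamma & \gamma\beta & 0 & 0 \\
\gamma\beta & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]
\[
F''^{\alpha\delta}\Lambda^\nu{}_\delta \longrightarrow \frac{k_E q}{c\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}
\begin{bmatrix}
\gamma\beta x'' & \gamma x'' & y'' & 0 \\
-\gamma x'' & -\gamma\beta x'' & 0 & 0 \\
-\gamma y'' & -\gamma\beta y'' & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
To get the final result, we just multiply another Λ on the front which
gives
\[
F^{\mu\nu} \longrightarrow \frac{k_E q}{c\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}
\begin{bmatrix}
\gamma & \gamma\beta & 0 & 0 \\
\gamma\beta & \gamma & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\gamma\beta x'' & \gamma x'' & y'' & 0 \\
-\gamma x'' & -\gamma\beta x'' & 0 & 0 \\
-\gamma y'' & -\gamma\beta y'' & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
\[
F^{\mu\nu} \longrightarrow \frac{k_E q}{c\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}
\begin{bmatrix}
0 & \gamma^2\left(1 - \beta^2\right)x'' & \gamma y'' & 0 \\
-\gamma^2\left(1 - \beta^2\right)x'' & 0 & \gamma\beta y'' & 0 \\
-\gamma y'' & -\gamma\beta y'' & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
\[
F^{\mu\nu} \longrightarrow \frac{k_E q}{c\left[\left(x''\right)^2 + \left(y''\right)^2\right]^{3/2}}
\begin{bmatrix}
0 & x'' & \gamma y'' & 0 \\
-x'' & 0 & \gamma\beta y'' & 0 \\
-\gamma y'' & -\gamma\beta y'' & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix} .
\]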
\[
F^{\mu\nu} \longrightarrow \frac{\gamma k_E q}{c\left[\gamma^2 x^2 + y^2\right]^{3/2}}
\begin{bmatrix}
0 & x & y & 0 \\
-x & 0 & \beta y & 0 \\
-y & -\beta y & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix} \tag{7.5.17}
\]
\[
\vec{E} = \frac{\gamma k_E q}{\left(\gamma^2 x^2 + y^2\right)^{3/2}}\left(x\hat{x} + y\hat{y}\right) ,
\]
or, better yet,
\[
\vec{E} = \frac{\gamma k_E q}{\left(\gamma^2 x^2 + s^2\right)^{3/2}}\left(x\hat{x} + s\hat{s}\right) , \tag{7.5.18}
\]
where $s = \sqrt{y^2 + z^2}$ is defined in our version of cylindrical coordinates
given by $(x, s, \phi)$. We should take note that the factor of c in
the denominator has disappeared because the EMF tensor components
include it for $\vec{E}$.
We might get a better feel for what this field looks like if we use the
angle θ in Figure 7.14 to generalize. We know ~r = xx̂ + sŝ as well as
x = r cos θ
s = r sin θ
where θ is the angle between ~u and ~r. Now the E-field can be written
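\[
\vec{E} = \frac{\gamma k_E q}{\left[\gamma^2 r^2\cos^2\theta + r^2\sin^2\theta\right]^{3/2}}\,\vec{r}
\]
\[
\vec{E} = \frac{\gamma}{\left[\gamma^2\cos^2\theta + \sin^2\theta\right]^{3/2}}\,k_E\frac{q}{r^3}\,\vec{r} , \tag{7.5.19}
\]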
2 r
which looks a lot like Coulomb’s law (defined by Eq. 5.2.5) but with
a hideous factor out front. This is the generalized form of the electric
field at an arbitrary point around a moving point charge. Along the
Figure 7.15: This is the E-field surrounding a charge moving in the positive x-direction
(horizontal in the figure) at a constant speed of u = 0.7c. You can see very clearly the
compression along the horizontal and the expansion along the vertical.
Figure 7.16: This is the B-field surrounding a charge moving in the positive x-direction
(velocity shown by a green arrow) at a constant speed of u = 0.7c. Each line is of equal
$\vec{B}$ and it is evident how quickly the B-field drops off as the points in question are further
away from the charge.
• The other result from Eq. 7.5.17 is that, in the new frame, there is also
a magnetic field. This makes sense since the charge is moving in the
new frame, but the beautiful thing about this is that we didn’t even
have to think about it. It appeared automatically! This is an example
of the completeness that comes with the EMF tensor. By Eq. 7.5.12,
this relativistic magnetic field is
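\[
\vec{B} = \frac{\gamma k_E q}{c\left(\gamma^2 x^2 + y^2\right)^{3/2}}\,\beta y\,\hat{z}
\]
\[
\vec{B} = \frac{\gamma k_M q u y}{\left(\gamma^2 x^2 + y^2\right)^{3/2}}\,\hat{z} ,
\]
\[
\vec{B} = \frac{\gamma k_M q u s}{\left(\gamma^2 x^2 + s^2\right)^{3/2}}\,\hat{\phi} , \tag{7.5.20}
\]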
where ẑ is just φ̂ in the xy-plane (where all our original math took
place).
We might get a better feel for what this field looks like if we use the
angle θ in Figure 7.14 to generalize. We know $\vec{r} = x\hat{x} + s\hat{s}$ as well as
$$x = r\cos\theta$$
$$s = r\sin\theta$$
where θ is the angle between $\vec{u}$ and $\vec{r}$. Now the B-field can be written
$$\vec{B} = \frac{\gamma\sin\theta}{\left(\gamma^2\cos^2\theta + \sin^2\theta\right)^{3/2}}\,k_M\frac{qu}{r^2}\,\hat{\phi}. \tag{7.5.21}$$
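To get a numerical feel for the factors out front in Eqs. 7.5.19 and 7.5.21, here is a minimal Python sketch that evaluates only the angular factors for the u = 0.7c case drawn in Figures 7.15 and 7.16. The function names are hypothetical (they don't come from the text), and the common factors $k_E q/r^2$ and $k_M qu/r^2$ are deliberately left out.

```python
import math

def e_field_factor(beta, theta):
    """Angular factor multiplying k_E*q/r^2 in Eq. 7.5.19 (theta in radians)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma / (gamma**2 * math.cos(theta)**2 + math.sin(theta)**2) ** 1.5

def b_field_factor(beta, theta):
    """Angular factor multiplying k_M*q*u/r^2 in Eq. 7.5.21 (theta in radians)."""
    return math.sin(theta) * e_field_factor(beta, theta)

beta = 0.7
along_motion = e_field_factor(beta, 0.0)            # reduces to 1/gamma^2
across_motion = e_field_factor(beta, math.pi / 2)   # reduces to gamma

print(f"E factor along the motion:  {along_motion:.3f}")
print(f"E factor across the motion: {across_motion:.3f}")
print(f"B factor along the motion:  {b_field_factor(beta, 0.0):.3f}")
```

Along the motion the E-field factor drops to $1/\gamma^2$ and across the motion it grows to $\gamma$, which is the compression and expansion described for Figure 7.15. The B-field factor vanishes on the line of motion because of the extra $\sin\theta$.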
Example 7.5.2
In one IRF, we observe that two equal positive charges (q1 = q2 = q which is
invariant) are moving in opposite directions with equal constant speed (u1 =
u2 = u which is not invariant) as shown in Figure 7.17. At closest approach,
these charges are separated by a distance R, which does not experience length
contraction since it’s orthogonal to the motion of both charges. Determine
the Lorentz force on q2 due to q1 (i.e. $\vec{F}_{21}$) in this frame at closest approach.
Also, determine the same Lorentz force in the rest frame of q1 and the rest
frame of q2 .
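$$\vec{E}_1 = \gamma_1 k_E\frac{q_1}{r^3}\,\vec{r} = \gamma_1 k_E\frac{q}{R^2}\,\hat{y}$$
because θ = 90°, $q_1 = q$, $r = R$, and $\vec{r} = R\hat{y}$. By the same logic, the
B-field generated by q1 at the location of q2 is given by Eq. 7.5.21 to
be
$$\vec{B}_1 = \gamma_1 k_M\frac{q u_1}{r^2}\,\hat{\phi} = \gamma_1 k_E\frac{q u}{c^2 R^2}\,\hat{z},$$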
Figure 7.17: An IRF is shown in which two positive charges are moving in opposite
directions parallel to the x-axis. On their closest approach they are separated by a distance
R. The Lorentz force on q2 due to q1 is also shown.
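where $k_M = k_E/c^2$, $u_1 = u$ is the speed of $q_1$, and $\hat{\phi} = \hat{z}$ in the xy-plane.
Therefore, the Lorentz force on q2 is given by Eq. 5.7.1 to be
$$\vec{F}_{21} = q_2\left(\vec{E}_1 + \vec{u}_2\times\vec{B}_1\right)$$
$$\vec{F}_{21} = \gamma_1 k_E\frac{q^2}{R^2}\left(\hat{y} + \frac{u^2}{c^2}\left[-\hat{x}\times\hat{z}\right]\right)$$
$$\vec{F}_{21} = \gamma_1\left(1+\beta^2\right)k_E\frac{q^2}{R^2}\,\hat{y}.$$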
$$\vec{E}_1'' = k_E\frac{q_1}{r^3}\,\vec{r} = k_E\frac{q}{R^2}\,\hat{y}$$
because $q_1 = q$ and $\vec{r} = R\hat{y}$. There is no B-field because only moving
charges generate B-fields (i.e. $\vec{B}_1'' = 0$). Therefore, the Lorentz force
on q2 is given by Eq. 5.7.1 to be
$$\vec{F}_{21}'' = q_2\left(\vec{E}_1'' + \vec{u}_2''\times\vec{B}_1''\right)$$
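$$\vec{B}_1' = \gamma_1'\,\frac{k_E q}{c^2 R^2}\,\frac{2u}{1+\beta^2}\,\hat{z} = \frac{2\gamma_1'}{1+\beta^2}\,\frac{k_E q u}{c^2 R^2}\,\hat{z},$$
where $\gamma_1'$ is related to $u_1'$ by Eq. 7.4.4. Therefore, the Lorentz force on
q2 is given by Eq. 5.7.1 to be
$$\vec{F}_{21}' = q_2\left(\vec{E}_1' + \vec{u}_2'\times\vec{B}_1'\right)$$
$$\gamma_1' = \frac{1+\beta^2}{\sqrt{\left(1+\beta^2\right)^2 - \left(2\beta\right)^2}} = \frac{1+\beta^2}{\sqrt{1+2\beta^2+\beta^4-4\beta^2}}$$
$$\gamma_1' = \frac{1+\beta^2}{\sqrt{1-2\beta^2+\beta^4}} = \frac{1+\beta^2}{1-\beta^2} = \gamma_1^2\left(1+\beta^2\right).$$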
$$\vec{F}_{21} = \frac{\vec{F}_{21}'}{\gamma_2}, \tag{7.5.22}$$
$$\vec{F}_{21} = \gamma_1\left(1+\beta^2\right)\vec{F}_{21}'',$$
where neither frame in the transformation is the rest frame of the object
(on which the force acts). This is a much more complicated transformation
because the Lorentz force is a coordinate 3-force, so it doesn’t
transform between frames in the simple way that a 4-force would. Just
for some perspective, if u = 0.5c and R = 10 fm for two protons, then
the three Lorentz forces have the values
$$F_{21} = 3.33\ \text{N}$$
$$F_{21}' = 3.85\ \text{N}$$
$$F_{21}'' = 2.31\ \text{N}$$
and it is clear that $F_{21}'$ is the largest measured force.
• We can also discuss a few things more generally. Eq. 7.5.22 can be
written as
$$\vec{F}_\perp = \frac{\vec{F}_{p\perp}}{\gamma}, \tag{7.5.23}$$
which is true of any force components such that those components are
perpendicular to the motion of the object on which the force acts. The
quantity $\vec{F}_p$ can be called the proper force (the maximum measurable
force), which is measured in the rest frame of the object on which the
force acts. As it turns out, the components parallel to the motion are
measured the same in all frames.
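The three force values quoted in this example can be reproduced with a short Python sketch. It assumes CODATA-style values for the Coulomb constant and the proton charge (the text doesn't state which values were used), and the function name `lorentz_forces` is hypothetical.

```python
import math

K_E = 8.9875517923e9        # Coulomb constant in N m^2 / C^2 (assumed value)
Q_PROTON = 1.602176634e-19  # proton charge in C (assumed value)

def lorentz_forces(beta, separation, charge=Q_PROTON):
    """Return (F21, F21', F21'') in newtons at closest approach."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    f_q1_rest = K_E * charge**2 / separation**2        # F21'': pure Coulomb force
    f_unprimed = gamma * (1.0 + beta**2) * f_q1_rest   # F21 in the symmetric frame
    f_q2_rest = gamma * f_unprimed                     # F21' via Eq. 7.5.22 (gamma2 = gamma1)
    return f_unprimed, f_q2_rest, f_q1_rest

f21, f21_p, f21_pp = lorentz_forces(beta=0.5, separation=10e-15)
print(f"F21 = {f21:.2f} N, F21' = {f21_p:.2f} N, F21'' = {f21_pp:.2f} N")
```

The proper force (measured in the rest frame of q2) comes out largest, in line with Eq. 7.5.23.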
$$\vec{\nabla}\bullet\vec{E} = \frac{\rho}{\epsilon_0} = \mu_0 c^2\rho,$$
where we’ve used Eq. 5.5.4 to eliminate the fraction on the right side. If we
multiply through by 1/c, then
$$\vec{\nabla}\bullet\left(\frac{\vec{E}}{c}\right) = \mu_0\left(c\rho\right)$$
$$\nabla_x\left(\frac{E_x}{c}\right) + \nabla_y\left(\frac{E_y}{c}\right) + \nabla_z\left(\frac{E_z}{c}\right) = \mu_0\left(c\rho\right).$$
On the left, we have three spatial terms from a scalar product, which in
spacetime should also involve time. Since $F^{00} = 0$, we can perform something
I like to call voodoo math (with a little foresight; we can add zeros, multiply by
ones, add and subtract constants, etc. to simplify a mathematical expression).
Using this and Eq. 7.5.2, we can write Gauss’s law as
$$\nabla_\delta F^{0\delta} = \mu_0 J^0, \tag{7.5.24}$$
$$\vec{\nabla}\times\vec{B} = \mu_0\vec{J} + \mu_0\epsilon_0\frac{\partial\vec{E}}{\partial t}$$
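$$\vec{\nabla}\times\vec{B} = \mu_0\vec{J} + \frac{1}{c^2}\frac{\partial\vec{E}}{\partial t},$$
where we’ve used Eq. 5.5.4 to simplify a term on the right side. With a little
manipulation, we get
$$\vec{\nabla}\times\vec{B} - \frac{1}{c^2}\frac{\partial\vec{E}}{\partial t} = \mu_0\vec{J}$$
$$\vec{\nabla}\times\vec{B} - \frac{1}{c}\frac{\partial}{\partial t}\left(\frac{\vec{E}}{c}\right) = \mu_0\vec{J},$$
which gets all the field information on the left side. This equation is actually
three equations, one for each spatial component of the vectors. If we intend
to write this in index notation, then we need to have them all separate leaving
us with
$$\left(\vec{\nabla}\times\vec{B}\right)_x - \frac{1}{c}\frac{\partial}{\partial t}\left(\frac{E_x}{c}\right) = \mu_0 J_x$$
$$\left(\vec{\nabla}\times\vec{B}\right)_y - \frac{1}{c}\frac{\partial}{\partial t}\left(\frac{E_y}{c}\right) = \mu_0 J_y$$
$$\left(\vec{\nabla}\times\vec{B}\right)_z - \frac{1}{c}\frac{\partial}{\partial t}\left(\frac{E_z}{c}\right) = \mu_0 J_z$$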
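noting that $B_i = B^i$ in Cartesian 3-space due to Eq. 6.4.5. Using the definitions
of the cross product (Eq. 2.2.4) and the covariant derivative (Eq. 7.5.1),
we get
$$\begin{aligned}
\left(\nabla_y B_z - \nabla_z B_y\right) - \nabla_t\left(E_x/c\right) &= \mu_0 J_x\\
\left(\nabla_z B_x - \nabla_x B_z\right) - \nabla_t\left(E_y/c\right) &= \mu_0 J_y\\
\left(\nabla_x B_y - \nabla_y B_x\right) - \nabla_t\left(E_z/c\right) &= \mu_0 J_z
\end{aligned}$$
$$\begin{aligned}
\nabla_y B_z + \nabla_z\left(-B_y\right) + \nabla_t\left(-E_x/c\right) &= \mu_0 J_x\\
\nabla_z B_x + \nabla_x\left(-B_z\right) + \nabla_t\left(-E_y/c\right) &= \mu_0 J_y\\
\nabla_x B_y + \nabla_y\left(-B_x\right) + \nabla_t\left(-E_z/c\right) &= \mu_0 J_z.
\end{aligned}$$
Based on the form of the contravariant EMF tensor (given by Eq. 7.5.12),
we can write this as
$$\begin{aligned}
\nabla_2 F^{12} + \nabla_3 F^{13} + \nabla_0 F^{10} &= \mu_0 J^1\\
\nabla_3 F^{23} + \nabla_1 F^{21} + \nabla_0 F^{20} &= \mu_0 J^2\\
\nabla_1 F^{31} + \nabla_2 F^{32} + \nabla_0 F^{30} &= \mu_0 J^3.
\end{aligned} \tag{7.5.25}$$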
The components given in Eq. 7.5.24 and Eq. Set 7.5.25 have an identical
form, so we can combine them into one equation using index notation. This
results in
$$\nabla_\delta F^{\alpha\delta} = \mu_0 J^\alpha, \tag{7.5.26}$$
We can use the covariant EMF tensor (given by Eq. 7.5.13) to write this as
where 123, 231, and 312 are the even permutations of the indices. This one
was probably the easiest so far.
Faraday’s law is a vector equation and, therefore, has three components
like Ampère’s law. Eq. 5.4.9c states
$$\vec{\nabla}\times\vec{E} = -\frac{\partial\vec{B}}{\partial t}$$
$$\vec{\nabla}\times\vec{E} + \frac{\partial\vec{B}}{\partial t} = 0,$$
where we’ve manipulated a bit to get all the field information on the left
side. We can now multiply through by 1/c to achieve the right units and the
result is
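$$\vec{\nabla}\times\left(\frac{\vec{E}}{c}\right) + \frac{1}{c}\frac{\partial\vec{B}}{\partial t} = 0.$$
If we intend to write this in index notation, then we need to have all the
components separate leaving us with
$$\left[\vec{\nabla}\times\left(\frac{\vec{E}}{c}\right)\right]_x + \frac{1}{c}\frac{\partial B_x}{\partial t} = 0$$
$$\left[\vec{\nabla}\times\left(\frac{\vec{E}}{c}\right)\right]_y + \frac{1}{c}\frac{\partial B_y}{\partial t} = 0,$$
$$\left[\vec{\nabla}\times\left(\frac{\vec{E}}{c}\right)\right]_z + \frac{1}{c}\frac{\partial B_z}{\partial t} = 0$$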
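noting that $B_i = B^i$ in Cartesian 3-space due to Eq. 6.4.5. Using the definitions
of the cross product (Eq. 2.2.4) and the covariant derivative (Eq. 7.5.1),
we get
$$\begin{aligned}
\nabla_y\left(E_z/c\right) - \nabla_z\left(E_y/c\right) + \nabla_t B_x &= 0\\
\nabla_z\left(E_x/c\right) - \nabla_x\left(E_z/c\right) + \nabla_t B_y &= 0\\
\nabla_x\left(E_y/c\right) - \nabla_y\left(E_x/c\right) + \nabla_t B_z &= 0
\end{aligned}$$
$$\begin{aligned}
\nabla_y\left(E_z/c\right) + \nabla_z\left(-E_y/c\right) + \nabla_t B_x &= 0\\
\nabla_z\left(E_x/c\right) + \nabla_x\left(-E_z/c\right) + \nabla_t B_y &= 0\\
\nabla_x\left(E_y/c\right) + \nabla_y\left(-E_x/c\right) + \nabla_t B_z &= 0
\end{aligned}$$
Based on the form of the covariant EMF tensor (given by Eq. 7.5.13), we can
write this as
$$\begin{aligned}
\nabla_2 F_{30} + \nabla_3 F_{02} + \nabla_0 F_{23} &= 0\\
\nabla_3 F_{10} + \nabla_1 F_{03} + \nabla_0 F_{31} &= 0\\
\nabla_1 F_{20} + \nabla_2 F_{01} + \nabla_0 F_{12} &= 0
\end{aligned} \tag{7.5.28}$$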
The components given in Eq. 7.5.27 and Eq. Set 7.5.28 have an identical
form, so we can combine them into one equation using index notation. This
results in
where α, ν, and δ are all free indices. This completes our derivation of
Maxwell’s equations, but does it give us a complete description of
electrodynamics? The answer is a resounding “No.” Just as in Section 5.4, we
need to know how charges will respond to these fields and that requires the
Lorentz force.
Lorentz Four-Force
In vector notation, the Lorentz 3-force is given by Eq. 5.7.1 as
$$\vec{F} = q\left(\vec{E} + \vec{u}\times\vec{B}\right)$$
$$F^\delta = q\,u_\alpha F^{\delta\alpha}, \tag{7.5.30}$$
which only differs from the contravariant 4-velocity by the negative sign on
the time component. Checking this 4-vector’s spatial components, we get
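$$\begin{aligned}
F^1 &= q\left(u_0 F^{10} + u_1 F^{11} + u_2 F^{12} + u_3 F^{13}\right)\\
F^2 &= q\left(u_0 F^{20} + u_1 F^{21} + u_2 F^{22} + u_3 F^{23}\right)\\
F^3 &= q\left(u_0 F^{30} + u_1 F^{31} + u_2 F^{32} + u_3 F^{33}\right)
\end{aligned}$$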
or more simply
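$$\begin{aligned}
F^1 &= q\left(u_0 F^{10} + u_2 F^{12} + u_3 F^{13}\right)\\
F^2 &= q\left(u_0 F^{20} + u_1 F^{21} + u_3 F^{23}\right),\\
F^3 &= q\left(u_0 F^{30} + u_1 F^{31} + u_2 F^{32}\right)
\end{aligned}$$
$$\begin{aligned}
F^1 &= \gamma q\left[E_x + u_y B_z - u_z B_y\right]\\
F^2 &= \gamma q\left[E_y - u_x B_z + u_z B_x\right],\\
F^3 &= \gamma q\left[E_z + u_x B_y - u_y B_x\right]
\end{aligned}$$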
which is almost exactly the components of the Lorentz 3-force. The extra
factor of γ is consistent with Eq. 7.4.28 because the original Lorentz 3-force
is a coordinate force (i.e. involved coordinate time, not proper time).
This is only three components. What about the time component of the
Lorentz 4-force? By the same methods as above, it is
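$$F^0 = q\left(u_0 F^{00} + u_1 F^{01} + u_2 F^{02} + u_3 F^{03}\right)$$
$$F^0 = q\left(u_1 F^{01} + u_2 F^{02} + u_3 F^{03}\right)$$
$$F^0 = q\left(\gamma u_x\frac{E_x}{c} + \gamma u_y\frac{E_y}{c} + \gamma u_z\frac{E_z}{c}\right)$$
$$F^0 = \frac{q}{c}\,\gamma\left(u_x E_x + u_y E_y + u_z E_z\right) = \frac{q}{c}\,\gamma\left(\vec{u}\bullet\vec{E}\right),$$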
where we’ve used the definition of the dot product (Eq. 2.2.2). There still
may be some confusion as to what this is, but if we bring in the q, then
$$F^0 = \frac{\gamma}{c}\left(\vec{u}\bullet q\vec{E}\right) = \frac{\gamma}{c}\left(\vec{u}\bullet\vec{F}_E\right). \tag{7.5.31}$$
We know from classical physics that $P = \vec{u}\bullet\vec{F}$. The parenthetical quantity
is just the coordinate electrical power! The factor of γ/c is consistent
with Eq. 7.4.28. It also makes sense that the magnetic field is not involved
in power because it never does work:
$$P = \vec{u}\bullet\vec{F}_B = \vec{u}\bullet\left(q\vec{u}\times\vec{B}\right) = q\,\vec{u}\bullet\left(\vec{u}\times\vec{B}\right) = 0,$$
which is true for any $\vec{u}$ or $\vec{B}$. A clearer way to look at Eq. 7.5.31 than
just calling it electrical power is to say it’s the rate at which energy is added
to the charge q by the electric field.
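If you’d like to see Eq. 7.5.30 at work numerically, here is a minimal Python sketch that contracts the covariant 4-velocity with the contravariant EMF tensor and compares the result against $\gamma$ times the Lorentz 3-force and $\gamma/c$ times the electrical power. The tensor layout is an assumption inferred from the component equations above (e.g. $F^{10} = -E_x/c$ and $F^{12} = B_z$), the function names are hypothetical, and the field and velocity values are arbitrary illustrations, not numbers from the text.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_force(q, u, E, B):
    """Return (gamma, F^delta) with F^delta = q * u_alpha * F^{delta alpha}."""
    (ux, uy, uz), (Ex, Ey, Ez), (Bx, By, Bz) = u, E, B
    gamma = 1.0 / math.sqrt(1.0 - (ux**2 + uy**2 + uz**2) / C**2)
    # Contravariant EMF tensor, signature (-,+,+,+); layout is an assumption
    # inferred from the component equations in the text.
    F = [
        [0.0,      Ex / C,  Ey / C,  Ez / C],
        [-Ex / C,  0.0,     Bz,     -By],
        [-Ey / C, -Bz,      0.0,     Bx],
        [-Ez / C,  By,     -Bx,      0.0],
    ]
    u_lower = [-gamma * C, gamma * ux, gamma * uy, gamma * uz]  # covariant 4-velocity
    return gamma, [q * sum(F[d][a] * u_lower[a] for a in range(4)) for d in range(4)]

def three_force(q, u, E, B):
    """Coordinate Lorentz 3-force q(E + u x B)."""
    (ux, uy, uz), (Ex, Ey, Ez), (Bx, By, Bz) = u, E, B
    return [
        q * (Ex + uy * Bz - uz * By),
        q * (Ey - ux * Bz + uz * Bx),
        q * (Ez + ux * By - uy * Bx),
    ]

# Arbitrary illustrative values (not from the text).
q = 1.602176634e-19
u = (0.3 * C, 0.2 * C, -0.1 * C)
E = (1.0e3, -2.0e3, 5.0e2)
B = (0.01, 0.02, -0.03)

gamma, F4 = four_force(q, u, E, B)
F3 = three_force(q, u, E, B)
power = sum(ui * q * Ei for ui, Ei in zip(u, E))  # u . F_E; the magnetic part adds nothing

spatial_ok = all(math.isclose(F4[i + 1], gamma * F3[i], rel_tol=1e-9) for i in range(3))
time_ok = math.isclose(F4[0], gamma * power / C, rel_tol=1e-9)
print("spatial parts equal gamma * F3:", spatial_ok)
print("time part equals gamma * P / c:", time_ok)
```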
Example 7.5.3
Back in Example 7.5.2, we had two equal positive charges moving in opposite
directions and found the Lorentz 3-force on one due to the other in three different
frames. Find the Lorentz 4-force on the same charge in those same three
frames.
• To keep this short, we’ll be using a lot from Example 7.5.2 (i.e. reference
said example if you feel like there are gaps in this one). We’ve already
gone through a little work with the Lorentz 4-force, so we’ll start from
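$$\begin{aligned}
F^0 = F^t &= \frac{q}{c}\,\gamma\left(u_x E_x + u_y E_y + u_z E_z\right)\\
F^1 = F^x &= \gamma q\left(E_x + u_y B_z - u_z B_y\right)\\
F^2 = F^y &= \gamma q\left(E_y - u_x B_z + u_z B_x\right)\\
F^3 = F^z &= \gamma q\left(E_z + u_x B_y - u_y B_x\right).
\end{aligned}$$
$$\begin{aligned}
F^t &= \frac{q_2}{c}\,\gamma_2\left(u_{2x} E_{1x} + u_{2y} E_{1y} + u_{2z} E_{1z}\right)\\
F^x &= \gamma_2 q_2\left(E_{1x} + u_{2y} B_{1z} - u_{2z} B_{1y}\right)\\
F^y &= \gamma_2 q_2\left(E_{1y} - u_{2x} B_{1z} + u_{2z} B_{1x}\right)\\
F^z &= \gamma_2 q_2\left(E_{1z} + u_{2x} B_{1y} - u_{2y} B_{1x}\right),
\end{aligned}$$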
$$F^y = \gamma_2\,q\left(E_1 - u_2 B_1\right)$$
where q2 = q is invariant and the rest of the terms are zero as we’d
expect. The fields E1 and B1 are given by Eqs. 7.5.19 and 7.5.21,
respectively.
• In the IRF in which the two charges are traveling the same speed (i.e.
the unprimed frame), we know u1 = u and u2 = −u. We also know
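$$E_1 = \gamma_1 k_E\frac{q}{R^2}, \qquad B_1 = \gamma_1 k_E\frac{qu}{c^2 R^2},$$
so
$$F^y = \gamma_2\,q\left(\gamma_1 k_E\frac{q}{R^2} + u\,\gamma_1\frac{k_E qu}{c^2 R^2}\right)$$
$$F^y = \gamma_2\gamma_1\left(1 + \frac{u^2}{c^2}\right)k_E\frac{q^2}{R^2} = \gamma_2\gamma_1\left(1+\beta^2\right)k_E\frac{q^2}{R^2},$$
which is exactly what we got in Example 7.5.2 with the extra factor of
$\gamma_2$ we expect from Eq. 7.4.28.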
$$u_2'' = \frac{u_2 - v}{1 - u_2 v/c^2} = \frac{(-u) - u}{1 - (-u)\,u/c^2} = \frac{-2u}{1+\beta^2},$$
$$E_1'' = k_E\frac{q}{R^2}, \qquad B_1'' = 0,$$
so
$$F''^{\,y} = \gamma_2''\,q\left(k_E\frac{q}{R^2} + 0\right) = \gamma_2''\,k_E\frac{q^2}{R^2}$$
where
$$\gamma_2'' = \frac{1}{\sqrt{1 - \left(u_2''\right)^2/c^2}},$$
which is exactly what we got in Example 7.5.2 with the extra factor of
$\gamma_2''$ we expect from Eq. 7.4.28.
• In the rest frame of q2 (i.e. the single-primed frame), we know $u_2' = 0 \Rightarrow \gamma_2' = 1$
and, from Eq. 7.3.3a, that
$$u_1' = \frac{u_1 - v}{1 - u_1 v/c^2} = \frac{u - (-u)}{1 - u\,(-u)/c^2} = \frac{2u}{1+\beta^2},$$
where β ≡ u/c and v = −u is the relative velocity between the unprimed
and single-primed frames. We also know
$$E_1' = \gamma_1' k_E\frac{q}{R^2}, \qquad B_1' = \gamma_1' k_E\frac{q u_1'}{c^2 R^2},$$
so
$$F'^{\,y} = q\left(\gamma_1' k_E\frac{q}{R^2} + 0\right) = \gamma_1' k_E\frac{q^2}{R^2}$$
where
$$\gamma_1' = \frac{1}{\sqrt{1 - \left(u_1'\right)^2/c^2}},$$
which is exactly what we got in Example 7.5.2 with no extra factor
because $\gamma_2' = 1$.
• Ok, so we got what we expected given our results in Example 7.5.2.
We also plugged in some numbers: u = 0.5c and R = 10 fm for two
protons. This results in
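$$F^\delta \longrightarrow \left(0,\ 3.85\ \text{N}\ \hat{y}\right)$$
$$F'^{\,\delta} \longrightarrow \left(0,\ 3.85\ \text{N}\ \hat{y}\right)$$
$$F''^{\,\delta} \longrightarrow \left(0,\ 3.85\ \text{N}\ \hat{y}\right)$$
using the $(\text{time}, \overrightarrow{\text{space}})$ shorthand. It would appear as though the Lorentz
4-force is invariant. However, they are only the same because the electric
field is orthogonal to the motion of both charges. We can show
this kind of transformation in matrix notation as
$$\begin{pmatrix} 0 \\ 0 \\ F'^{\,y} \\ F'^{\,z} \end{pmatrix} =
\begin{pmatrix} \gamma_T & \pm\gamma_T\beta_T & 0 & 0 \\ \pm\gamma_T\beta_T & \gamma_T & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ F^y \\ F^z \end{pmatrix} =
\begin{pmatrix} 0 \\ 0 \\ F^y \\ F^z \end{pmatrix}$$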
• It should also be noted that the time component will not remain zero
as time passes because q2 (and q1 for that matter) will gradually gain
a $u_y$ component. Furthermore, the moment each of these charges
experiences a 4-force, their rest frames are no longer IRFs. That means
this work is only valid if these charges were held in the same rest frame
by some outside force then instantly shifted into their different frames
at the beginning of the example and even then it still only applies to that
moment. We can only transform between IRFs and the only frame
that remains an IRF is the one in between the two rest frames (i.e.
the unprimed frame). This scenario is highly unlikely to occur in the
real universe.
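As a quick check on why the three 4-force values in this example agree, the Python sketch below evaluates the y-component in each frame in units of $k_E q^2/R^2$, then converts to newtons. It leans on the speed $2u/(1+\beta^2)$ and the gamma factors worked out in the example; the constants are assumed CODATA-style values and the function names are hypothetical.

```python
import math

def gamma_of(beta):
    return 1.0 / math.sqrt(1.0 - beta**2)

def four_force_y(beta):
    """F^y in units of k_E q^2 / R^2 for the unprimed, primed, and double-primed frames."""
    beta_rel = 2.0 * beta / (1.0 + beta**2)   # speed of one charge in the other's rest frame
    unprimed = gamma_of(beta) * gamma_of(beta) * (1.0 + beta**2)  # gamma2 * gamma1 * (1 + beta^2)
    primed = gamma_of(beta_rel)               # gamma1' (since gamma2' = 1)
    double_primed = gamma_of(beta_rel)        # gamma2'' times the bare Coulomb force
    return unprimed, primed, double_primed

K_E = 8.9875517923e9        # N m^2 / C^2 (assumed value)
Q_PROTON = 1.602176634e-19  # C (assumed value)
coulomb = K_E * Q_PROTON**2 / (10e-15) ** 2   # k_E q^2 / R^2 for R = 10 fm

values = four_force_y(0.5)
for label, value in zip(("F^y", "F'^y", "F''^y"), values):
    print(f"{label} = {value * coulomb:.2f} N")
```

All three scale factors reduce to $(1+\beta^2)/(1-\beta^2)$, which is why the spatial part of the 4-force matches across these particular frames.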
7.6 Worldlines
Everything we’ve done so far in this chapter has involved objects traveling along
time-like world lines. This isn’t a horrible place to start an understanding
since almost everything we interact with in our everyday life travels along these.
However, as we’ve mentioned before, not everything does. Addressing these
circumstances requires us to step outside our comfort zone and look at the
universe as objectively as possible.
which easily has a finite value. No problems, right? However, its relativistic
3-momentum is given by
$$\vec{p}_{rel} = \gamma\vec{p} = \frac{m_p c\,\hat{x}}{\sqrt{1-\beta^2}},$$
where we’ve already said β = 1.
Herein lies our problem. If β = 1, then the denominator is zero and
$\vec{p}_{rel} = \infty$. Since nothing can actually have an infinite value in the real
physical universe, we can conclude that β ≠ 1 (a proof by contradiction).
Therefore, we can approach speeds of c, but can never actually accelerate to
exactly c. The muon-antineutrino in Example 7.4.1 got pretty close, but it
still didn’t reach what we consider to be the universal speed limit.
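To see how quickly the relativistic 3-momentum runs away as β approaches 1, here is a minimal Python sketch that tabulates $p_{rel}/(m_p c) = \gamma\beta$. The function name `scaled_momentum` is hypothetical and the sample speeds are arbitrary.

```python
import math

def scaled_momentum(beta):
    """Relativistic momentum in units of m_p*c, i.e. gamma*beta, for beta < 1."""
    return beta / math.sqrt(1.0 - beta**2)

samples = (0.9, 0.99, 0.999, 0.9999)
momenta = [scaled_momentum(beta) for beta in samples]
for beta, p in zip(samples, momenta):
    print(f"beta = {beta}: p_rel/(m_p c) = {p:.2f}")
```

Each extra 9 in β multiplies the momentum by roughly three, with no upper bound, which is the numerical face of the divergence at β = 1.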
You might be thinking “What?! Photons travel at the speed of light!”
and indeed they do. How they do it is the better question. Photons have a
zero rest mass (i.e. mp = 0) resulting in a coordinate 3-momentum of
Figure 7.18: This is a graph of kinetic energy (KE/Ep ) vs. velocity (β) scaled so that both axes are unitless. The blue curve
is the relativistic kinetic energy, which goes to infinity at β = 1 indicating that it requires an infinite amount of energy to
accelerate to v = c. The red curve is the classical version, which visibly begins to deviate from the more accurate relativistic
version at about β = 0.4.
Ok, so massive particles always travel along time-like world lines and zero
rest mass particles always travel along null world lines. What does it mean
for two events to have a null separation? According to the line element (Eq.
7.2.1), this separation would be
implying that the time and space components of 4-vectors along null world
lines will have the same value. We saw this occur approximately with the
4-momentum of the muon-antineutrino in Example 7.4.1. If we take the scalar
product of that 4-momentum with itself (using Eq. 7.4.15), then we’d get
$$p_\delta p^\delta = -\left(29.8\,\frac{\text{MeV}}{c}\right)^2 + \left(-29.8\,\frac{\text{MeV}}{c}\right)^2\hat{x}\bullet\hat{x} \approx 0,$$
which makes sense considering neutrinos are nearly massless. You could also
argue this in general using Eq. 7.4.24, resulting in
$$p_\delta p^\delta = -m_p^2 c^2 \approx 0$$
$$p_{rel} = \frac{E_{rel}}{c} \tag{7.6.2}$$
for all zero rest mass particles (note: $E_{rel} = h f_{rel}$ for a photon). We can also
use this to write the 4-momentum as
$$p^\delta \longrightarrow \left(\frac{E_{rel}}{c},\ \frac{E_{rel}}{c}\,\hat{u}\right) \tag{7.6.3}$$
Figure 7.19: On the left is a spacetime diagram that includes four different IRFs observing
the motion of two photons along the x-axis. The spacing between these photons is defined
as the spacetime separation connecting two simultaneous events. On the right, you can
see how the spacing between the photons gets larger as you approach the rest frame of the
photons. Since there is no maximum value for length, proper length does not exist.
using the $(\text{time}, \overrightarrow{\text{space}})$ shorthand, where û is the direction of motion.
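A minimal Python sketch can confirm that a 4-momentum built the way Eq. 7.6.3 prescribes really is null. It assumes the (−,+,+,+) signature implied by the scalar product above; the function names are hypothetical and the 29.8 MeV/c value is borrowed from the neutrino example purely for illustration.

```python
import math

def null_four_momentum(energy_over_c, direction):
    """Build (E/c, (E/c) * u_hat) for a zero rest mass particle (Eq. 7.6.3)."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    return [energy_over_c] + [energy_over_c * n for n in unit]

def scalar_product(p):
    """p_delta p^delta with the (-,+,+,+) signature."""
    return -p[0] ** 2 + sum(pi ** 2 for pi in p[1:])

p_along_x = null_four_momentum(29.8, (-1.0, 0.0, 0.0))   # MeV/c, moving in -x
p_oblique = null_four_momentum(29.8, (1.0, 2.0, 2.0))    # any direction works

print(scalar_product(p_along_x))
print(abs(scalar_product(p_oblique)) < 1e-9)
```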
Now, let’s shift perspective to the rest frame of this zero rest mass particle.
A particle traveling at c even having a rest frame is a strange concept because
the speed of light is a spacetime invariant, but let’s consider it anyway.
According to the line element (Eq. 7.2.1), the separation between two events
would be
meaning no time passes at all for a zero rest mass particle. This is still
consistent with time dilation because Eq. 7.2.11 says
$$\Delta t = \gamma\,\Delta\tau = \frac{\Delta\tau}{\sqrt{1-\beta^2}} = \frac{0}{0},$$
which again is indeterminate resulting in a finite value for ∆t. This also
implies the entire concept of proper length is meaningless. Two photons
can be spaced by a finite distance in every IRF except the rest frame of the
photons (See Figure 7.19).
Having zero proper time poses a much larger problem for us. In Section
7.4, we defined all the 4-vectors as derivatives with respect to proper time, dτ .
A differential must be a very small number, but not zero, by definition. For
massive particles, we were essentially using τ as a parameter (or independent
variable) to relate the coordinates (ct, x, y, z). We could have chosen anything
really, but τ was convenient because it made sense dimensionally and gave
us the relativistic form of Newton’s first law in Eq. 7.4.29.
For zero rest mass particles, we’ll have to resort to choosing something
else. Choosing this new parameter carefully, we can get
$$u^\delta = \frac{dx^\delta}{d\Omega} = \text{constant} \quad \text{if} \quad F^\delta = 0, \tag{7.6.4}$$
where Ω is called an affine parameter. An affine parameter is simply a
parameter which keeps the form of Newton’s first law, so it isn’t all that
special but it is useful. There is no single value of Ω that will make it
affine, so it’s a bit more abstract than τ. With this in mind, definitions for
4-acceleration and 4-force follow as
$$a^\delta = \frac{du^\delta}{d\Omega} \quad \text{and} \quad F^\delta = \frac{dp^\delta}{d\Omega}$$
If you’re not feeling comfortable with there being an $F^\delta$ on something like a
photon, then recall Compton scattering. When the photon scatters off
a massive particle like an electron, there is most definitely a change in its
4-momentum. If a photon’s frequency changes, then by E = hf its energy
will also change and energy is a part of 4-momentum (See Eq. 7.6.3).
$$\vec{p} = m_p\vec{u} = m_p u\,\hat{x}$$
which easily has an ordinary value. No problems, right? However, its
relativistic 3-momentum is given by
$$\vec{p}_{rel} = \gamma\vec{p} = \frac{m_p u\,\hat{x}}{\sqrt{1-\beta^2}},$$
and herein lies our problem. If β > 1, then the quantity under the square root is
negative and $\vec{p}_{rel}$ is imaginary (as well as many other quantities involving γ).
In an attempt to avoid this problem, we can assume rest mass is imaginary
for tachyons (i.e. $m_p = i z_p$), which gives us
$$\vec{p}_{rel} = \frac{i z_p u\,\hat{x}}{i\sqrt{\beta^2-1}} = \frac{z_p u\,\hat{x}}{\sqrt{\beta^2-1}}, \tag{7.6.5}$$
which is once again real. Furthermore, we can say
$$E_{rel} = \frac{i z_p c^2}{i\sqrt{\beta^2-1}} = \frac{z_p c^2}{\sqrt{\beta^2-1}} \tag{7.6.6}$$
and Eq. 7.4.25 becomes
$$E_{rel}^2 = \left(i z_p\right)^2 c^4 + p_{rel}^2 c^2 = -z_p^2 c^4 + p_{rel}^2 c^2$$
$$E_{rel}^2 + z_p^2 c^4 = p_{rel}^2 c^2. \tag{7.6.7}$$
It’s clear now that, at least mathematically, special relativity doesn’t discount
the existence of such particles, but what kinds of consequences would their
existence present?
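Before moving on, we can sanity-check Eqs. 7.6.5 through 7.6.7 with a minimal Python sketch. It works in units where $c = 1$ and $z_p = 1$ (an assumption made only to keep the numbers tidy), and the function name `tachyon_energy_momentum` is hypothetical.

```python
import math

def tachyon_energy_momentum(z_p, beta, c=1.0):
    """Return (E_rel, p_rel) from Eqs. 7.6.6 and 7.6.5 for a tachyon with beta > 1."""
    if beta <= 1.0:
        raise ValueError("beta must exceed 1 for a tachyon")
    root = math.sqrt(beta**2 - 1.0)
    energy = z_p * c**2 / root
    momentum = z_p * beta * c / root
    return energy, momentum

energy, momentum = tachyon_energy_momentum(z_p=1.0, beta=5.0)
lhs = energy**2 + 1.0   # E_rel^2 + z_p^2 c^4 with z_p = c = 1
rhs = momentum**2       # p_rel^2 c^2
print(energy, momentum, math.isclose(lhs, rhs))
```

Both quantities come out real, and at β = 5 the energy is smaller than $z_p c^2$, which matches the behavior of the tachyon curve described in Figure 7.20.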
Example 7.6.1
Consider two experimenters, Joe and Ashley, moving at a constant relative
velocity v = 0.8c with respect to each other. Joe sends Ashley a message
saying “What’s up?” using a radio wave. Upon receiving Joe’s message,
Ashley replies with “Nothin’ much.” using a radio wave. Assuming they’ve
both accounted for the Doppler effect of light, they can both receive and send
Figure 7.20: This is a graph of energy (E/Ep ) vs. velocity (β) scaled so that both axes are unitless. A horizontal dashed line
indicates the rest energy of each particle. The blue curve is the relativistic total energy of a massive real particle, which goes
to infinity at β = 1 as it did in Figure 7.18. The red curve is the relativistic total energy of an imaginary tachyon, which goes
to infinity at β = 1 showing that energy increases as speed decreases. Another interesting result is the tachyon’s total energy
can be less than its rest energy if it goes fast enough.
Figure 7.21: This is a spacetime diagram of two experimenters, Joe and Ashley, sending
signals to each other. The orange dashed arrows represent radio waves (photons) being
sent between them. The blue dashed arrows represent tachyons being sent between them.
The tachyons travel into the future in one IRF, but the past in another IRF.
a signal at u = c (which they’ll both measure the same since it’s a spacetime
invariant).
Now consider the same two experimenters still moving at a constant rela-
tive velocity v = 0.8c with respect to each other. Joe sends Ashley a message
saying “What’s up?” using tachyons traveling at u = 5c (yes, I said five).
They have both agreed that, upon receiving the message from Joe, Ashley
will reply with “Don’t send your message.” using tachyons of equal speed
(measured relative to her frame, of course). According to the spacetime di-
agram in Figure 7.21, the reply Ashley sends will travel forward in time in
her IRF, but back in time in Joe’s. Joe will receive this reply before he sends
his original message and we have a causality problem.
This might be surpassing the limitation of the spacetime diagram, so let’s
do the problem with Lorentz transformations instead. According to Eq. Set
7.3.2,
$$\Delta t' = \gamma_T\left(\Delta t - \frac{v\,\Delta x}{c^2}\right)$$
If $\beta_T\beta > 1$ as it is for our experimenters, then $c\Delta t'$ has the opposite sign of
$c\Delta t$. The reverse is also true for the reply signal. We can conclude from this
that the spacetime diagram is still a complete geometrical representation of
the Lorentz transformation.
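The sign flip is easy to see numerically. The Python sketch below assumes the signal covers $\Delta x = u\,\Delta t$ in the unprimed frame, so the transformation above reduces to $\Delta t' = \gamma_T\,\Delta t\,(1 - \beta_T\beta)$. The function name is hypothetical and the time interval is in arbitrary units.

```python
import math

def transformed_interval(dt, beta_signal, beta_frame):
    """Delta t' = gamma_T * (Delta t - v*Delta x/c^2), assuming Delta x = u * Delta t."""
    gamma_t = 1.0 / math.sqrt(1.0 - beta_frame**2)
    return gamma_t * dt * (1.0 - beta_frame * beta_signal)

radio = transformed_interval(1.0, beta_signal=1.0, beta_frame=0.8)    # u = c
tachyon = transformed_interval(1.0, beta_signal=5.0, beta_frame=0.8)  # u = 5c

print(f"radio wave: dt' = {radio:.3f}")
print(f"tachyon:    dt' = {tachyon:.3f}")
```

The radio signal keeps a positive $\Delta t'$, while the u = 5c tachyon gets a negative one because $\beta_T\beta = 4 > 1$.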
It gets even weirder when we consider the coordinate velocity transfor-
mation. In Joe’s IRF, the tachyon travels away from him at u = 5c toward
Ashley. However, Ashley will measure the velocity of the tachyon to be
$$u' = \frac{u - v}{1 - uv/c^2} = \frac{\beta - \beta_T}{1 - \beta\beta_T}\,c = \frac{5 - 0.8}{1 - (5)(0.8)}\,c = -1.4c,$$
is the same condition which makes $\Delta t$ and $\Delta t'$ both positive. Again, causality
is maintained.
Mind you, this is all contingent on Joe being able to send information
via tachyons. Recall that tachyons have imaginary mass and would have
to interact with real mass to be sent by Joe. We’re not even sure, given
what we’ve learned so far in this book, how to physically interpret imaginary
matter. It could very well not be capable of interacting with real matter
in the first place. Remember, this is all speculative at this stage. We can
only hope that some newer more advanced theory will explain these strange
particles away.
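The coordinate velocity quoted earlier in this example (a tachyon sent at 5c by Joe being measured at −1.4c by Ashley) can be checked with a minimal Python sketch of the velocity transformation, with all speeds in units of c. The function name `transform_velocity` is hypothetical.

```python
import math

def transform_velocity(beta, beta_frame):
    """Coordinate velocity in the primed frame, in units of c."""
    denominator = 1.0 - beta * beta_frame
    if denominator == 0.0:
        raise ZeroDivisionError("beta * beta_frame = 1: transformed speed is unbounded")
    return (beta - beta_frame) / denominator

tachyon_for_ashley = transform_velocity(5.0, 0.8)
photon_for_ashley = transform_velocity(1.0, 0.8)

print(f"tachyon: u' = {tachyon_for_ashley:.2f} c")
print(f"photon:  u' = {photon_for_ashley:.2f} c")
```

The photon stays at exactly c as it must, while the tachyon's velocity changes sign once $\beta\beta_T$ exceeds 1.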
The former is usually due to some preconceived notion of how the universe
functions based on our personal experience. All we have to do is let go
of it and the problem disappears. The latter, on the other hand, is a bit
more difficult to see. Sometimes it results from simplifying or idealizing the
problem too much, which can be easily rectified. Other times, it can result
from a lack of understanding with respect to the given conditions, which is
much more difficult to resolve. Causality paradoxes, such as the one that
resulted from the use of tachyons in Section 7.6, are a prime example of
this. Carefully constructed problems require carefully constructed solutions.
In this section, we’ll address a few well-known paradoxes and present their
solutions.
Example 7.7.1
Two spaceships of equal proper length are traveling in opposite directions
along the x-axis each with a constant relative speed of v. The ship traveling
to the right is piloted by Joe and the other by Ashley. At the moment
Ashley’s ship’s bow (or front end) lines up with Joe’s ship’s aft (or rear end),
Ashley fires a laser from her ship’s aft in an attempt to hit Joe’s ship’s bow.
You may assume the ships are close enough together along the y-axis to
neglect the travel time of the laser beam.
This presents a paradox if we think about the scenario in the context of
special relativity. In Ashley’s IRF, Joe’s ship experiences a length contrac-
tion. That means she sees her laser miss Joe’s ship because it’s too short.
In Joe’s IRF, Ashley’s ship experiences the length contraction. That means
her laser will hit his ship somewhere toward the middle. Both events cannot
occur, so which is it? A hit or a miss?
Figure 7.22: In this spacetime diagram, the two blue world lines correspond to the front
and back of a spaceship moving to the right (and the red world lines, to the left). Event 1
is the detection of the back of the blue ship by the front of the red ship. Event 2 is the red
ship firing a laser beam, which is simultaneous to event 1 in the unprimed frame. Event 3
is the firing of the same laser, but accounting for the time required for the signal to travel
from the front of the red ship to the back telling the laser to fire.
• In Joe’s frame, event 1 happens after event 2! That means, from his
perspective, Ashley fired the shot before the ends of the ships line up
(i.e. too early). For him, the shot also misses because the bow of his
ship still hasn’t lined up with the aft of hers. Events 1 and 2 are not
the same moment for Joe. Both perspectives are shown in Figure 7.23.
• In fact, we can use a few numbers to see how far apart in time Joe
measures these moments to be. Let’s assume in Ashley’s frame that
events 1 and 2 occur at (0, 0, 0, 0) and (0, 50 m, 0, 0) meaning we’ve
assumed the ship’s proper length to be 50 meters. We’ll also assume
v = 0.5c just for comparison. Using a Lorentz transformation (Eq.
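7.3.1) on the spacetime coordinates, we get (0, 0, 0, 0) = (0, 0, 0, 0) for
event 1 since it’s the zero vector and
$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix} 1.155 & -0.577 & 0 & 0 \\ -0.577 & 1.155 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 \\ 50\ \text{m} \\ 0 \\ 0 \end{pmatrix} =
\begin{pmatrix} -28.87\ \text{m} \\ 57.735\ \text{m} \\ 0 \\ 0 \end{pmatrix}$$
for event 2. The negative time component implies event 2 occurs t =
28.87 m/c = 96.3 ns before event 1. This isn’t much, but it’s enough for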
Figure 7.23: In the unprimed frame (Ashley’s IRF), the laser fire (designated by the
purple beam) misses because Joe’s ship is too short due to length contraction. In the
primed frame (Joe’s IRF), the laser fire misses because the shot was fired too early.
the laser to miss Joe’s ship. We can also see the laser misses Joe’s ship
by 57.735 m − 50 m = 7.735 m. In Ashley’s frame, the shot misses by
$$50\ \text{m} - \frac{50\ \text{m}}{\gamma} = 50\ \text{m} - \frac{50\ \text{m}}{1.155} = 6.7\ \text{m},$$
but it still misses.
• We’ve solved the paradox by letting go of a preconceived notion of time.
Unfortunately, it isn’t a physically accurate solution because we didn’t
consider how the bow of Ashley’s ship communicates with the aft of
her ship. Assuming this communication is instantaneous is a physical
impossibility because the fastest way to send information (under special
relativity) is at c by, perhaps, a radio wave. We’ve taken this into
account in Figure 7.22 by showing the signal as an orange dashed arrow.
By the time the signal to fire reaches the laser weapon at the aft of
Ashley’s ship, both ships have moved enough so that the laser hits
Joe’s ship (in both IRFs).
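The numbers quoted for the instantaneous-signal version of this example can be reproduced with a minimal Python sketch of the Lorentz boost. The helper `boost_x` is hypothetical, lengths are in meters, and the sign convention follows the matrix used in the example.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def boost_x(event, beta):
    """Lorentz boost along x of an event (ct, x, y, z), all in meters."""
    ct, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma * (ct - beta * x), gamma * (x - beta * ct), y, z)

beta, ship_length = 0.5, 50.0
gamma = 1.0 / math.sqrt(1.0 - beta**2)

ct2, x2, _, _ = boost_x((0.0, ship_length, 0.0, 0.0), beta)   # event 2 in Joe's frame
early_ns = -ct2 / C * 1e9                                     # how early the shot is fired
miss_joe = x2 - ship_length                                   # miss distance in Joe's frame
miss_ashley = ship_length - ship_length / gamma               # miss distance in Ashley's frame

print(f"ct' = {ct2:.2f} m, x' = {x2:.3f} m")
print(f"fired {early_ns:.1f} ns early; misses by {miss_joe:.3f} m (Joe), {miss_ashley:.1f} m (Ashley)")
```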
Example 7.7.2
A common example for introductory students is the “pole in the barn” prob-
lem: A farmer holding a 6 meter long pole (perfectly horizontally) is running
toward a small barn. If the barn is 5 meters from front to back and both the
front and back doors are open, then how fast does the farmer have to run to
fit the pole in the barn?
The idea is that the faster the farmer runs, the more contracted the length
of the pole gets. If he runs fast enough, the pole should contract to the length
of the barn. It’s a relatively short calculation using length contraction (Eq.
7.2.12):
$$\gamma = \frac{L_{P,p}}{L_P} = \frac{L_P'}{L_P} = \frac{6\ \text{m}}{5\ \text{m}} = 1.2,$$
noting we only get to use this because one of the frames is the rest frame of
the pole. We recall proper length, Lp , is defined as the maximum possible
length measurement between two events. The two events in question are
1. The front of the pole lining up with the back of the barn and
2. The back of the pole lining up with the front of the barn.
$$L_B' = \frac{L_{B,p}}{\gamma} = \frac{L_B}{\gamma} = \frac{5\ \text{m}}{1.2} = 4.167\ \text{m},$$
noting proper length for the barn is measured in the unprimed frame
(the barn’s IRF). Only the farmer’s son sees the pole fit in the barn.
The farmer sees a minimum 6 m - 4.167 m = 1.833 m of the pole
sticking out of the barn.
Figure 7.24: In the unprimed frame (the barn’s IRF), the pole fits perfectly into the barn
because the pole has length contracted. In the primed frame (the pole’s IRF), the pole
doesn’t fit into the barn because the barn has length contracted.
• Still don’t see a problem with this yet? That’s ok because there really
isn’t a problem yet. Different observers measure different things all the
time. In fact, we can use a Lorentz transformation (Eq. 7.3.1) assuming,
in the farmer’s son’s frame, the spacetime coordinates are (0, 5 m, 0, 0)
and (0, 0, 0, 0) for events 1 and 2 respectively (i.e. the events are 5
meters apart and simultaneous). In the farmer’s frame, event 2 becomes
(0, 0, 0, 0) = (0, 0, 0, 0) since it’s the zero vector and event 1 becomes
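$$\begin{pmatrix} ct' \\ x' \\ y' \\ z' \end{pmatrix} =
\begin{pmatrix} 1.2 & -0.6633 & 0 & 0 \\ -0.6633 & 1.2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 \\ 5\ \text{m} \\ 0 \\ 0 \end{pmatrix} =
\begin{pmatrix} -3.317\ \text{m} \\ 6\ \text{m} \\ 0 \\ 0 \end{pmatrix},$$
where the negative time component implies event 1 occurs
$$t = \frac{3.317\ \text{m}}{c} = 11.06\ \text{ns}$$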
before event 2 as shown in Figure 7.25. This makes perfect sense. If
the pole doesn’t fit in the barn, then the front will exit the barn before
the back enters.
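The pole-barn numbers so far all follow from the two proper lengths, as the minimal Python sketch below shows. The variable names are hypothetical, and the event coordinates are the ones assumed in the son's frame above.

```python
import math

C = 299_792_458.0  # speed of light in m/s

pole_proper, barn_proper = 6.0, 5.0                # meters

gamma = pole_proper / barn_proper                  # pole contracted to the barn's length
beta = math.sqrt(1.0 - 1.0 / gamma**2)             # speed the farmer has to run
barn_contracted = barn_proper / gamma              # barn length in the farmer's frame
overhang = pole_proper - barn_contracted           # pole sticking out in the farmer's frame

# Event 1 is at (ct, x) = (0, 5 m) in the son's frame; boost it to the farmer's frame.
ct_event1 = gamma * (0.0 - beta * barn_proper)
lead_ns = -ct_event1 / C * 1e9                     # event 1 precedes event 2 by this much

print(f"gamma = {gamma}, beta = {beta:.4f}")
print(f"barn = {barn_contracted:.3f} m, overhang = {overhang:.3f} m")
print(f"ct' = {ct_event1:.3f} m, lead = {lead_ns:.2f} ns")
```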
• Everything is just fine until the farmer’s son decides to be a smart alec.
What if he leaves the back door closed and, at the moment he sees the
Figure 7.25: In this spacetime diagram, the two blue world lines correspond to the front
and back of the pole (and, likewise, the red world lines to the barn). You can see those
events are simultaneous in the unprimed frame (the barn’s IRF). However, event 1 occurs
before event 2 in the primed frame (the pole’s IRF) as shown by the gray dashed lines.
back of the pole line up with the front of the barn (i.e. event 2), he
closes the front door. According to the farmer, the pole doesn’t fit, so
is the pole in the barn or not?!
– We saw in Example 7.7.1 that the same set of events must occur
in all frames of reference. Different frames just disagree on how,
and sometimes in what order, those events unfold. If the pole
is enclosed in the barn in the son’s frame, then it must also be
enclosed in the farmer’s frame.
– In the farmer’s frame, event 1 occurs 11.06 ns before event 2, so
the doors don’t close simultaneously for him, but that isn’t quite
enough to reconcile this paradox. We need to let go of one more
thing: the rigidity of the pole.
– Since the back door is closed, it collides with the front of the pole.
Assuming the door and the pole can survive the impact (which
they probably can’t) and the barn keeps moving at β = 0.5528
(which it probably isn’t due to conservation of momentum), the
barn door must start to move the front of the pole. However, the
back of the pole doesn’t notice and stays still because the speed
– You can’t create a photon in one frame and not another. It must
be created in all frames or none, with no exceptions. The problem
in this case is we’ve stepped beyond the scope of the basic circuit
model. The battery generates an electric field to move charges in
a complete circuit. E-fields propagate at the speed of light, which
appears instantaneous most of the time.
– Unfortunately, since the pole is traveling at β = 0.5528, this prop-
agation speed is no longer negligible. For the LED to light in the
unprimed frame (the barn’s IRF), the circuit must be complete
for at least
\[ t = \frac{L_B + L_P}{c} = \frac{5\,\mathrm{m} + 5\,\mathrm{m}}{3 \times 10^{8}\,\mathrm{m/s}} = 33.33\,\mathrm{ns} \]
to allow the E-field to propagate the round trip of the circuit.
This is ignoring any response time the LED itself might need.
Figure 7.26: This is a representation of how event 1 appears to both observers in the
pole-barn circuit paradox. In the unprimed frame (the barn’s IRF), the contacts match
up and the circuit is complete and the LED should light. In the primed frame (the pole’s
IRF), the circuit is not complete and the LED should not light.
– This may still seem like a very small amount of time, so let’s
consider it in context. In 33.33 ns traveling at β = 0.5528, the
pole (or the barn) will have moved a distance of
Example 7.7.3
Probably the most famous of all the paradoxes in special relativity is the
“twin’s paradox.” The paradox itself stems from a common problem given to
introductory students. Here’s the basic idea: You have a set of identical
twins. One of them is an adventurous astronaut and the other a homebody.
On their 25th birthday, the astronaut hops in a spaceship and travels off to a
star 8 ly away (let’s say Wolf 359) at half the speed of light (v = 0.5c). Upon
arriving at the star, the astronaut discovers nothing special and immediately
heads home at the same speed.
The homebody twin observes her sister take 16 years to get to the star
and another 16 years to get home. This makes sense since
\[ v = \frac{\Delta x}{\Delta t} \quad\Rightarrow\quad \Delta t = \frac{\Delta x}{v} = \frac{8\ c\,\mathrm{yrs}}{0.5c} = 16\ \mathrm{yrs} \]
for a one-way trip or 32 years for the roundtrip. That makes her now exactly
57 years old. However, due to time dilation (Eq. 7.2.11), the gamma factor
(Eq. 7.2.9) is
\[ \gamma = \frac{1}{\sqrt{1 - \beta^2}} = \frac{1}{\sqrt{1 - 0.5^2}} = 1.155, \]
so the astronaut twin only experiences
\[ \Delta t_p = \frac{\Delta t}{\gamma} = \frac{16\ \mathrm{yrs}}{1.155} = 13.86\ \mathrm{yrs} \]
for a one-way trip or 27.71 years for the roundtrip. This makes her only
between 52 and 53 years old, 4–5 years younger than the homebody twin.
All of this is perfectly legal in the context of special relativity as long as the
two twins agree how old they each are.
The paradox here arises when we try to examine things from the astro-
naut’s point of view. No frame of reference gets any preference over another,
so the astronaut would consider herself stationary and the Earth moving at
0.5c. According to her, Earth has the shorter time. If the astronaut experi-
ences a total of 27.71 years, then the homebody should experience
\[ \Delta t_p = \frac{\Delta t}{\gamma} = \frac{27.71\ \mathrm{yrs}}{1.155} = 24\ \mathrm{yrs} \]
as opposed to 32 years. It would seem the twins do not agree on how much
time has passed on Earth, so who is correct?
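If you’d like to check the arithmetic above, here is a minimal Python sketch. It only reproduces the numbers already quoted (8 ly, 0.5c); the variable names are my own and not part of any standard library.

```python
import math

beta = 0.5           # spaceship speed as a fraction of c (from the text)
distance_ly = 8.0    # Earth-to-star distance in light-years, Earth's frame

# Gamma factor for the cruising speed
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Earth's frame: time for one leg and for the round trip (years)
earth_one_way = distance_ly / beta
earth_round_trip = 2.0 * earth_one_way

# Astronaut's elapsed time, from time dilation
astronaut_one_way = earth_one_way / gamma
astronaut_round_trip = 2.0 * astronaut_one_way

# The naive "symmetric" claim made from the astronaut's point of view:
# Earth should have aged less than the astronaut by the same gamma factor.
naive_earth_round_trip = astronaut_round_trip / gamma

print(f"gamma                  = {gamma:.3f}")
print(f"Earth round trip       = {earth_round_trip:.2f} yrs")
print(f"astronaut round trip   = {astronaut_round_trip:.2f} yrs")
print(f"naive Earth round trip = {naive_earth_round_trip:.2f} yrs")
```

The last two Earth values (32 years versus 24 years) are the disagreement that the rest of this example has to explain.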
When considering the total time passed during the roundtrip, it turns out
the Earth is correct about the Earth’s time as you might expect. However, the
reasoning behind why is far from straightforward. I’ve found a wide variety
of explanations ranging from incomplete to unnecessarily complicated to just
plain wrong. Here are some common examples:
2. “The reference frames are not symmetric because the spaceship experi-
ences acceleration meaning it isn’t an inertial reference frame. Since
Earth is the only IRF, it gets preference.” First off, any explanation
like this is a cop-out because it dodges any discussion of real physics.
Secondly, we can very easily stop and start the clocks to avoid including
the acceleration in the problem entirely. Doing so does not resolve the
paradox.
3. “The twins cannot observe each other’s clock without seeing light from
each other, which takes time to travel between them.” This statement
is true and it might affect how we’d actually see the time pass between
the beginning and the end. However, it is by no means a resolution to
the twin’s paradox. All observers agree on the speed of light, so we all
know how long it takes and it can be factored out of our calculations.
Some references on special relativity have even resorted to invoking
the Doppler effect, which even further complicates the situation.
4. “When the spaceship turns around, it switches IRFs, which changes the
lines of simultaneity for the spaceship but not the Earth.” This one has
some promise, but is severely incomplete. My guess is someone figured
this out 100 years ago, but it’s been copy/pasted so many times that
we’ve forgotten what the point actually was. No one really understands
it anymore (or at least the ones that do aren’t talking about it).
To get at the real complete solution without getting lost, we’re going to
keep things as simple as possible by removing all unnecessary factors. First,
we’ll assume that both observers can account for light travel time and leave
it out of the discussion. Second, we’re going to remove all accelerations from
the problem by only running the two clocks during constant velocity portions
of the trip. This will involve starting and stopping the clocks a couple times.
Then finally, we’re going to consider the two halves of the trip completely
independently.
• We’ll assume the spaceship has been given time to accelerate to its
cruising speed of 0.5c before the clocks pass each other and are started.
Both clocks clearly start together since this is represented by the same
event (i.e. they happen at the same place and same time).
• The clocks are not stopped until the spaceship reaches its destination
of Wolf 359 (8 ly away as measured from Earth). The spaceship main-
tains its cruising speed until it stops its clock so as to avoid including
accelerations. Also, since we’re not including any signals transmitted
between them, both observers agreed before departure to stop each of
their clocks at the appropriate time.
• Now we bring the spaceship to rest relative to Wolf 359 for a while and
have the astronaut talk to her homebody sister to compare notes. They
begin to argue over how much time they think passed on Earth during
the trip because, at least while the clocks were running, they each think
they were stationary and the other was moving. This discrepancy is
easily resolved with a spacetime diagram (our go-to solution throughout
this chapter).
Figure 7.27: Two clocks start at event 1. An astronaut travels to the star Wolf 359
between events 1 and 3. Her twin sister stays on Earth traveling between events 1 and 2.
Events 2 and 3 represent when each twin stops their clock, which only occurs at the same
time in the unprimed frame (Earth’s IRF). The green dashed line connects all the events
happening simultaneously in the primed frame (spaceship’s IRF). It is clear the astronaut
thinks her twin should have stopped her clock after 12 years (at event 2i) rather than after
16 years.
Figure 7.28: This is the trip home occurring after Figure 7.27. An astronaut travels home
between events 5 and 6 while her twin sister on Earth travels between events 4 and 6.
Events 4 and 5 represent when each twin restarts their clock, which only occurs at the
same time in the unprimed frame (Earth’s IRF). The green dashed line connects all the
events happening simultaneously in the double-primed frame (spaceship’s IRF). It is clear
the astronaut thinks her twin should have started her clock 4 years later (at event 4i).
Figure 7.29: This is the entire trip from Figures 7.27 and 7.28 involving the two twins. It
includes all three reference frames and all six real events.
The weirdest consequence shown in Figures 7.27, 7.28, and 7.29 is
how much time passes for each observer during the accelerations. These four
accelerations are sharp corners, which means the acceleration occurs during
a very short time period for the astronaut. The one-way trip is measured in
years for both observers, so let’s assume each acceleration only took two days.
Yes, I’m aware that corresponds to a very violent proper acceleration (Eq.
7.4.17) of 88.5g (i.e. 88.5 times the gravity of Earth), which is far too high
for any human to survive for two days straight. Unfortunately, a comfortable
1g would require 177 days (or about six months), which is far too long to
ignore. Just go with it.
It doesn’t get weird until we look at how the Earth sees the astronaut
slow down at Wolf 359. According to Figure 7.27, the astronaut switches
IRFs at event 3. By what we just assumed, event 3 is a two-day deceleration
for the astronaut. The beginning of event 3 is simultaneous with event 2i,
but the end of event 3 is simultaneous with event 2 (since it’s now in the
rest frame of Earth). The time between event 2i and event 2 is four years!
Truly understanding what happens during those accelerations would require
general relativity (Chapter 8), but I have yet to see anyone use it to tackle
this particular version of the paradox.
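The four-year jump can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: I’m taking event 3 to sit at (16 yrs, 8 ly) in Earth’s frame as described in the text, working in years and light-years so that c = 1, and using the fact that the Lorentz transformation makes lines of simultaneity in a frame moving at β obey t − βx = constant. The helper name is hypothetical.

```python
# Units: years and light-years, so c = 1.
beta = 0.5             # outbound cruising speed (fraction of c)
t3, x3 = 16.0, 8.0     # event 3 (arrival at the star) in Earth's frame


def earth_time_simultaneous_with(t_event, x_event, frame_beta):
    """Earth-frame time on Earth's world line (x = 0) that a frame moving
    at frame_beta considers simultaneous with the given event.

    Follows from t' = gamma * (t - beta * x): constant t' means
    t - beta * x is constant along the line of simultaneity.
    """
    return t_event - frame_beta * x_event


# Just before decelerating, the astronaut is still in the outbound frame
t_2i = earth_time_simultaneous_with(t3, x3, beta)

# Just after decelerating, the astronaut shares Earth's rest frame
t_2 = earth_time_simultaneous_with(t3, x3, 0.0)

jump = t_2 - t_2i
print(f"event 2i at t = {t_2i} yrs, event 2 at t = {t_2} yrs, jump = {jump} yrs")
```

The 12 years matches the caption of Figure 7.27, and the difference is the four years that get swept over during the two-day deceleration.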
Chapter 8
General Relativity
8.1 Origins
Shortly after publishing his five papers in 1905, Albert Einstein began think-
ing a bit more about his theory of relativity. He had successfully ended
the argument between classical mechanics and electrodynamics, which was
certainly no small feat. However, the solution had one small limitation: it
couldn’t accurately predict measurements taken inside accelerated reference
frames (ARFs). This seems like a small issue, but always taking measurements
in inertial reference frames can be occasionally inconvenient since
the surface of the Earth is only approximately inertial (e.g. it rotates slowly).
It also indicates a gap in our understanding and science has a drive to fill
such gaps. Einstein knew he needed a more general theory of relativity
(hence “general relativity”). This would involve at least one more postulate
to address this issue, so he began performing more thought experiments.
Equivalence Principle
Explaining phenomena in an ARF can be tricky because of fictitious forces
(i.e. forces that do not exist in all frames of reference). The most popular
examples of these are the Coriolis and centrifugal forces which exist in a
rotating reference frame, but disappear in an inertial frame. The rotation
itself is enough to explain the motion in the inertial frame. In 1907, Ein-
stein’s thoughts were on a much simpler type of ARF: a rocket accelerating
in a straight line. He realized if a rocket accelerated at 9.8 m/s² and its
Figure 8.1: On the left, a rocket is accelerating through space at 9.8 m/s². On the
right, an identical rocket is at rest on the surface of the Earth. These two situations are
indistinguishable to the observers inside the rockets.
What he meant was the fictitious force resulting from the acceleration is not
fictitious at all. It is literally gravity! It would appear you can’t explain ac-
celeration without also explaining gravity in the same context. The ultimate
implications of this were, at the time, beyond what anyone could foresee, but
it got the wheels turning for Einstein and a few others.
Spacetime Revisited
As mentioned in Section 7.2, Hermann Minkowski generalized Einstein’s work
in 1908 by describing spacetime itself with tensor analysis. This got Einstein
thinking about his equivalence principle a bit more. “What if spacetime is
something tangible? What if it can be changed?” he asked himself. Not being
Figure 8.2: These people were important in the development of general relativity.
Spacetime Curves?!
We mentioned the use of something called differential geometry, which is very
important in the development of general relativity. It’s a mathematical tool
describing the behavior of not only curves, but surfaces and volumes as well.
The way it’s formulated allows it to apply to any number of dimensions,
including but not limited to the four-dimensional spacetime in which we
live. It’s common to think of spacetime as a “fabric” of sorts that can be
stretched, compressed, bent, twisted, etc. The more that fabric is deformed,
the more energy it contains and, therefore, the more it can influence anything
Figure 8.3: A common visual for curved spacetime is the rubber sheet analogy, featured here.
If we rolled a marble across this mesh sheet, then it would be drawn to the ball in the
center. Unfortunately, spacetime doesn’t actually look like this, so it’s only good for
demonstrating the concept of curvature. We’ll develop a much more accurate diagram
later in Section 8.6.
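\[ R^{\delta}_{\alpha\mu\nu} = \frac{\partial \Gamma^{\delta}_{\alpha\nu}}{\partial x^{\mu}} - \frac{\partial \Gamma^{\delta}_{\alpha\mu}}{\partial x^{\nu}} + \Gamma^{\delta}_{\lambda\mu} \Gamma^{\lambda}_{\alpha\nu} - \Gamma^{\delta}_{\lambda\nu} \Gamma^{\lambda}_{\alpha\mu}, \tag{8.1.1} \]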
which is a rank-4 dimension-4 mixed tensor (see Section 6.2 for more details
on rank and dimension). This tensor isn’t perfectly symmetric, but its last
two indices obey
\[ R^{\delta}_{\alpha\mu\nu} = -R^{\delta}_{\alpha\nu\mu}, \tag{8.1.2} \]
which is called skew symmetry. If you make the Riemann curvature tensor
completely covariant, then we get
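where $R_{\lambda\alpha\mu\nu} = g_{\lambda\delta} R^{\delta}_{\alpha\mu\nu}$ (note: index order is important). Also, performing
this index operation multiple times can switch the sign back to positive (e.g.
$R_{\lambda\alpha\mu\nu} = R_{\alpha\lambda\nu\mu}$ or $R_{\lambda\alpha\mu\nu} = R_{\mu\nu\lambda\alpha}$).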
Because of the many ways a four-dimensional “fabric” can be deformed,
every point in spacetime is assigned $4^4 = 256$ numbers (4 indices, each with a
possible 4 values) to represent the total curvature. Notice the Riemann cur-
vature tensor involves the Christoffel symbols (Eq. 6.7.6), which described the
parallel transport of tensors during covariant derivatives (Eq. 6.7.5). Since
the Riemann tensor describes curvature, it’s actually a second derivative
(i.e. $\nabla_\alpha \nabla_\delta T^{\mu\nu}$ for an arbitrary tensor $T^{\mu\nu}$) and so involves the product of
two Christoffel symbols rather than just one. We can also take covariant
derivatives of the Riemann tensor and get some useful identities. One is
called a Bianchi identity,
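\[ R_{\alpha\nu} = \frac{\partial \Gamma^{\mu}_{\alpha\nu}}{\partial x^{\mu}} - \frac{\partial \Gamma^{\mu}_{\alpha\mu}}{\partial x^{\nu}} + \Gamma^{\mu}_{\lambda\mu} \Gamma^{\lambda}_{\alpha\nu} - \Gamma^{\mu}_{\lambda\nu} \Gamma^{\lambda}_{\alpha\mu}, \tag{8.1.5} \]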
containing $4^2 = 16$ numbers. Furthermore, the Ricci tensor is symmetric
(i.e. $R_{\alpha\nu} = R_{\nu\alpha}$), so this turns out to really be only 10 independent numbers.
Contracting again gives us the Ricci curvature scalar,
which may come in handy since energy is a scalar quantity. The Ricci scalar
contains less information than the Ricci tensor, so we’ll need both as we
describe the behavior of spacetime.
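The component counts quoted above (256 for Riemann, 16 and then 10 for Ricci) are easy to confirm by brute force. Here is a small Python sketch that simply enumerates index combinations in four dimensions; the variable names are mine.

```python
from itertools import product

n = 4  # spacetime dimensions: each index takes 4 possible values

# Rank-4 Riemann tensor: one number per combination of 4 indices
riemann_components = len(list(product(range(n), repeat=4)))

# Rank-2 Ricci tensor: one number per combination of 2 indices
ricci_components = len(list(product(range(n), repeat=2)))

# A symmetric rank-2 tensor only needs the pairs with alpha <= nu,
# since the (nu, alpha) entry repeats the (alpha, nu) entry.
ricci_independent = sum(
    1 for alpha, nu in product(range(n), repeat=2) if alpha <= nu
)

print(riemann_components, ricci_components, ricci_independent)
```

The last count is just n(n + 1)/2 for n = 4, which is where the 10 comes from.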
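On the other hand, by reducing with the Bianchi identity (Eq. 8.1.4), we get
\[ \nabla^{\nu} R_{\alpha\nu} + \nabla^{\mu} R_{\alpha\mu} - \nabla_{\alpha} R = 0, \]
because $R_{\alpha\nu} = R^{\mu}_{\alpha\mu\nu} = g^{\mu\lambda} R_{\lambda\alpha\mu\nu}$ and index order matters because of skew
symmetry (i.e. $R_{\sigma\lambda\mu\nu} = -R_{\sigma\lambda\nu\mu}$). Since the summation index can change
symbols on a whim, the first two terms are the same and this reduces to
\[ \nabla^{\mu} R_{\alpha\mu} = \frac{1}{2} \nabla_{\alpha} R, \tag{8.2.4} \]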
which implies R is constant (since its derivative is zero). This is troublesome
since it means the curvature of spacetime is constant and, by Eq. 8.2.2, that
T (the matter-energy distribution) is also constant throughout the entire
universe.
Given that our universe does not have uniform density, we’ll need a better
option. The easiest way to handle this is to just add a second unknown term
to the left side of Eq. 8.2.2,
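where we just need to solve for $X_{\alpha\nu}$. By conservation of energy (Eq. 8.2.3),
this is
\[ \nabla^{\alpha} R_{\alpha\nu} + \nabla^{\alpha} X_{\alpha\nu} = 0. \]
\[ \nabla^{\alpha} X_{\alpha\nu} = -\frac{1}{2} \nabla_{\nu} R \]
\[ \nabla^{\alpha} X_{\alpha\nu} = -\frac{1}{2} g_{\alpha\nu} \nabla^{\alpha} R. \]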
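Since the covariant derivative of the metric is always zero ($\nabla^{\alpha} g_{\alpha\nu} = g^{\alpha\lambda} \nabla_{\lambda} g_{\alpha\nu} = 0$),
this becomes
\[ \nabla^{\alpha} X_{\alpha\nu} = \nabla^{\alpha} \left( -\frac{1}{2} g_{\alpha\nu} R \right) \]
\[ X_{\alpha\nu} = -\frac{1}{2} g_{\alpha\nu} R, \]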
assuming we’re not adding any constants into the mix. Historical note: In
1917, Einstein tried to add a constant term to keep the universe static in
size. He called it the cosmological constant... and then later called it the
“biggest blunder” of his career. We will not be including such a constant.
If we substitute this back into Eq. 8.2.5, we get
\[ R_{\alpha\nu} - \frac{1}{2} g_{\alpha\nu} R = \kappa\, T_{\alpha\nu}. \]
If we want this to reduce to Eq. 8.2.1 in the weak-field approximation,
then $\kappa = 8\pi G/c^4$ and the final result is called Einstein’s equation,
\[ R_{\alpha\nu} - \frac{1}{2} g_{\alpha\nu} R = \frac{8\pi G}{c^4} T_{\alpha\nu}. \tag{8.2.6} \]
Sometimes this is called “Einstein’s field equations” because there are ac-
tually 10 equations, one for each possible independent component of the
tensors. It should also be noted that Einstein’s equation is defined at a sin-
gle arbitrary position in spacetime (i.e. an event) just like divergence and
curl (see Section 3.2).
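To get a feel for how small the coupling constant κ is in SI units, here is a quick Python sketch. The values of G and c are the ones quoted later in this chapter; everything else is just arithmetic.

```python
import math

G = 6.67408e-11      # gravitational constant in N m^2/kg^2
c = 299_792_458.0    # speed of light in m/s

# Coupling constant in Einstein's equation, kappa = 8 pi G / c^4.
# Its SI unit works out to 1/N (curvature in 1/m^2 per J/m^3 of energy density).
kappa = 8.0 * math.pi * G / c**4

print(f"kappa = {kappa:.4e} 1/N")
```

A number this tiny is one way to see why it takes an enormous energy density to curve spacetime by any noticeable amount.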
which is a line (or path) integral of the Lagrangian, L, between times t1 and
t2 . Recall from Section 4.2, the Lagrangian is defined as the kinetic energy
minus the potential energy and has standard energy units. As a result, in SI
units, the action is measured in joule seconds (J·s).
The principle of stationary action states that an object or a particle
will take a path with no variation in its action. We use the word “stationary”
to mean zero variation like what occurs at a maximum or minimum (or saddle
point on curved surfaces). In mathematical terms, we say
δS = 0, (8.3.2)
where the delta operates on the action, S, to give us the variation. This is
sometimes viewed as an alternate form of Lagrange’s equation (Eq. 4.2.14)
since they both involve the Lagrangian and both give the path taken.
If we intend on using the principle of stationary action in general
relativity, then we’ll have to generalize the definition for an action first. Rather
than being integrated over just time, it should be over all spacetime. Also,
if we include spatial coordinates, then we’ll need a Jacobian multiplier (see
Example 6.6.1) for the spacetime volume element.
\[ S \equiv \int L_{\text{total}} \sqrt{|\det(g)|}\; d^4x, \tag{8.3.3} \]
where g is the metric tensor in matrix form. Keep in mind, from here on
out, we’re sticking with the traditional sign convention for components of the
metric tensor: (−, +, +, +) initially defined in Section 7.2.
When writing the total Lagrangian for the system, it isn’t enough to know
about the matter in the region. In Section 7.5, we examined the relativistic
nature of the electromagnetic field, which contains energy. As a result, the
electromagnetic Lagrangian is
\[ L_{EM} = \frac{1}{4\mu_0} F_{\alpha\delta} F^{\alpha\delta} \tag{8.3.4} \]
where $F_{\alpha\delta}$ is the electromagnetic field tensor given by Eq. 7.5.13. In fact,
the tensor product above is given by Eq. 7.5.15.
Now that spacetime itself is a tangible entity, it too can have energy.
Therefore, the total Lagrangian is
However, since we’re only interested in how spacetime and matter interact,
we’ll ignore the electromagnetic field for now. That means
\[ L_{\text{spacetime}} = \frac{R}{2\kappa} = \frac{c^4}{16\pi G} R, \tag{8.3.5} \]
where κ is just a constant (consistent with Section 8.2). Note that the space-
time Lagrangian is zero when the curvature is zero. This is physically impor-
tant and totally consistent with our “fabric” analogy. If you’d like to add a
cosmological constant like the one mentioned in Section 8.2, then you’d add
it here by giving flat spacetime a non-zero energy.
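We are now in a position to apply the principle of stationary action
(Eq. 8.3.2). The total action can be written from Eq. 8.3.3 as
\[ S = \int \left( L_{\text{matter}} + L_{\text{spacetime}} \right) \sqrt{|\det(g)|}\; d^4x \]
\[ S = \int \left( L + \frac{R}{2\kappa} \right) \sqrt{|\det(g)|}\; d^4x, \]
where $L \equiv L_{\text{matter}}$. Taking the variation of this action and applying the
principle of stationary action, we get
\[ 0 = \delta \int \left( L + \frac{R}{2\kappa} \right) \sqrt{|\det(g)|}\; d^4x \]
\[ 0 = \int \delta \left[ \left( L + \frac{R}{2\kappa} \right) \sqrt{|\det(g)|} \right] d^4x. \]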
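The variation operator works just like a derivative, so by the chain rule (Eq.
3.1.2)
\[ 0 = \int \frac{\delta}{\delta g^{\alpha\nu}} \left[ \left( L + \frac{R}{2\kappa} \right) \sqrt{|\det(g)|} \right] \delta g^{\alpha\nu}\; d^4x \]
Since this statement should be true for any variation in the inverse metric,
$g^{\alpha\nu}$, we get
\[ 0 = \frac{\delta}{\delta g^{\alpha\nu}} \left[ \left( L + \frac{R}{2\kappa} \right) \sqrt{|\det(g)|} \right] \]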
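Let’s take a closer look at the variation in the second term. We know
$\sqrt{|\det(g)|}$ is the same as $\sqrt{-\det(g)}$ in spacetime (i.e. you either have one
negative or three negatives by convention), so
\[ \delta \sqrt{|\det(g)|} = \delta \sqrt{-\det(g)} = -\frac{\delta \left[ \det(g) \right]}{2 \sqrt{-\det(g)}}. \]
\[ \frac{\delta}{\delta g^{\alpha\nu}} \sqrt{|\det(g)|} = -\frac{1}{2} \sqrt{-\det(g)}\; g_{\alpha\nu} = -\frac{1}{2} \sqrt{|\det(g)|}\; g_{\alpha\nu}. \]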
The second term vanishes leaving just $R_{\alpha\nu}$ and Eq. 8.3.8 becomes
\[ R_{\alpha\nu} - \frac{1}{2} g_{\alpha\nu} R = \kappa\, T_{\alpha\nu}, \tag{8.3.9} \]
which is exactly the result we got for Einstein’s equation in Section 8.2.
What was that? Why does the second term vanish?! That was pretty
blatant hand-waving, wasn’t it? Explaining it, though, is going to take a little
bit of careful planning. Remember the Ricci tensor is just a contraction of
the Riemann tensor, so we’ll avoid getting lost in the summation indices by
starting with Riemann. Using the definition (Eq. 8.1.1), we get
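\[ \delta R^{\rho}_{\alpha\mu\nu} = \delta \left( \frac{\partial \Gamma^{\rho}_{\alpha\nu}}{\partial x^{\mu}} - \frac{\partial \Gamma^{\rho}_{\alpha\mu}}{\partial x^{\nu}} + \Gamma^{\rho}_{\lambda\mu} \Gamma^{\lambda}_{\alpha\nu} - \Gamma^{\rho}_{\lambda\nu} \Gamma^{\lambda}_{\alpha\mu} \right) \]
\[ \delta R^{\rho}_{\alpha\mu\nu} = \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\nu} \right)}{\partial x^{\mu}} - \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\mu} \right)}{\partial x^{\nu}} + \delta \left( \Gamma^{\rho}_{\lambda\mu} \Gamma^{\lambda}_{\alpha\nu} \right) - \delta \left( \Gamma^{\rho}_{\lambda\nu} \Gamma^{\lambda}_{\alpha\mu} \right). \]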
Using the product rule (Eq. 3.1.5) on the last two terms gives
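\[ \delta R^{\rho}_{\alpha\mu\nu} = \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\nu} \right)}{\partial x^{\mu}} - \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\mu} \right)}{\partial x^{\nu}} + \delta\Gamma^{\rho}_{\lambda\mu} \Gamma^{\lambda}_{\alpha\nu} + \Gamma^{\rho}_{\lambda\mu}\, \delta\Gamma^{\lambda}_{\alpha\nu} - \delta\Gamma^{\rho}_{\lambda\nu} \Gamma^{\lambda}_{\alpha\mu} - \Gamma^{\rho}_{\lambda\nu}\, \delta\Gamma^{\lambda}_{\alpha\mu}. \]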
Moving some terms around and making sure the variations are always last,
this is
ρ
ρ ∂ δΓ
∂ (δΓ ) αµ
ρ
δRαµν = αν
+ Γρλµ δΓλαν − δΓρλν Γλαµ − − Γρλν δΓλαµ + δΓρλµ Γλαν
∂xµ ∂xν
ρ
∂ (δΓραν ) ∂ δΓαµ
ρ
δRαµν = + Γρλµ δΓλαν − Γλαµ δΓρλν − − Γρλν δΓλαµ + Γλαν δΓρλµ
∂xµ ∂xν
and grouping gives us
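\[ \delta R^{\rho}_{\alpha\mu\nu} = \left( \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\nu} \right)}{\partial x^{\mu}} + \Gamma^{\rho}_{\lambda\mu}\, \delta\Gamma^{\lambda}_{\alpha\nu} - \Gamma^{\lambda}_{\alpha\mu}\, \delta\Gamma^{\rho}_{\lambda\nu} \right) - \left( \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\mu} \right)}{\partial x^{\nu}} + \Gamma^{\rho}_{\lambda\nu}\, \delta\Gamma^{\lambda}_{\alpha\mu} - \Gamma^{\lambda}_{\alpha\nu}\, \delta\Gamma^{\rho}_{\lambda\mu} \right). \]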
Lastly, we can do some voodoo math (with a little foresight; we can add zeros,
multiply by ones, add and subtract constants, etc. to simplify a mathematical
expression) by subtracting a new term from the first parenthetical expression
and adding that same term to the second. This results in
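\[ \delta R^{\rho}_{\alpha\mu\nu} = \left( \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\nu} \right)}{\partial x^{\mu}} + \Gamma^{\rho}_{\lambda\mu}\, \delta\Gamma^{\lambda}_{\alpha\nu} - \Gamma^{\lambda}_{\alpha\mu}\, \delta\Gamma^{\rho}_{\lambda\nu} - \Gamma^{\lambda}_{\mu\nu}\, \delta\Gamma^{\rho}_{\lambda\alpha} \right) - \left( \frac{\partial \left( \delta\Gamma^{\rho}_{\alpha\mu} \right)}{\partial x^{\nu}} + \Gamma^{\rho}_{\lambda\nu}\, \delta\Gamma^{\lambda}_{\alpha\mu} - \Gamma^{\lambda}_{\alpha\nu}\, \delta\Gamma^{\rho}_{\lambda\mu} - \Gamma^{\lambda}_{\mu\nu}\, \delta\Gamma^{\rho}_{\lambda\alpha} \right). \]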
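where ρ has become a summation index. The original term we need to make
vanish is
\[ g^{\alpha\nu} \frac{\delta R_{\alpha\nu}}{\delta g^{\alpha\nu}} = \frac{g^{\alpha\nu}}{\delta g^{\alpha\nu}} \left[ \nabla_{\rho} \left( \delta\Gamma^{\rho}_{\alpha\nu} \right) - \nabla_{\nu} \left( \delta\Gamma^{\rho}_{\alpha\rho} \right) \right], \]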
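but we’ll need to move back a little further in our work to see this happen.
This was originally a term inside an integral (recall Eq. 8.3.3). We also
pulled out a $\delta g^{\alpha\nu}$ and canceled a $\sqrt{|\det(g)|}$ along the way, so
\[ \int g^{\alpha\nu} \frac{\delta R_{\alpha\nu}}{\delta g^{\alpha\nu}}\, \delta g^{\alpha\nu} \sqrt{|\det(g)|}\; d^4x = \int g^{\alpha\nu}\, \delta R_{\alpha\nu} \sqrt{|\det(g)|}\; d^4x \]
is what actually vanishes. Using Eq. 8.3.12, this term is
\[ \int g^{\alpha\nu} \left[ \nabla_{\rho} \left( \delta\Gamma^{\rho}_{\alpha\nu} \right) - \nabla_{\nu} \left( \delta\Gamma^{\rho}_{\alpha\rho} \right) \right] \sqrt{|\det(g)|}\; d^4x, \]
but it still needs just a little more work. We can distribute the $g^{\alpha\nu}$ to get
\[ \int \left[ g^{\alpha\nu} \nabla_{\rho} \left( \delta\Gamma^{\rho}_{\alpha\nu} \right) - g^{\alpha\nu} \nabla_{\nu} \left( \delta\Gamma^{\rho}_{\alpha\rho} \right) \right] \sqrt{|\det(g)|}\; d^4x. \]
Since the symbol used for summation indices is meaningless, we can say
$\nabla_{\rho} \left( \delta\Gamma^{\rho}_{\alpha\nu} \right) = \nabla_{\lambda} \left( \delta\Gamma^{\lambda}_{\alpha\nu} \right)$ and $g^{\alpha\nu} \nabla_{\nu} = g^{\alpha\lambda} \nabla_{\lambda}$. This gives
\[ \int \left[ g^{\alpha\nu} \nabla_{\lambda} \left( \delta\Gamma^{\lambda}_{\alpha\nu} \right) - g^{\alpha\lambda} \nabla_{\lambda} \left( \delta\Gamma^{\rho}_{\alpha\rho} \right) \right] \sqrt{|\det(g)|}\; d^4x. \]
We also know the covariant derivative of the metric is always zero ($\nabla_{\lambda} g^{\alpha\nu} = 0$),
so we can pull out the covariant derivative arriving at
\[ \int \nabla_{\lambda} \left( g^{\alpha\nu}\, \delta\Gamma^{\lambda}_{\alpha\nu} - g^{\alpha\lambda}\, \delta\Gamma^{\rho}_{\alpha\rho} \right) \sqrt{|\det(g)|}\; d^4x \tag{8.3.13} \]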
but that’s a bit general for my taste. Essentially, it says the rate of some ten-
sor T integrated (i.e. infinitesimally summed) over a whole space is equal to
the tensor T integrated (i.e. infinitesimally summed) over the space’s bound-
ary. When applied to Eq. 8.3.13, this tells us we can just sum the contribu-
tions of
\[ g^{\alpha\nu}\, \delta\Gamma^{\lambda}_{\alpha\nu} - g^{\alpha\lambda}\, \delta\Gamma^{\rho}_{\alpha\rho} \]
Stress-Energy Tensor
It turns out that matter just isn’t enough to describe what occupies (and
affects) a space. If we recall that $E_p = m_p c^2$ means that mass is just a type of
energy, then it becomes clear we need to consider all the energy occupying a
space. This is where the stress-energy tensor comes in because it includes
so much more than just mass. We usually work with it in contravariant form:
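\[ T^{\alpha\nu} \longrightarrow \begin{bmatrix} E & \Phi_1 & \Phi_2 & \Phi_3 \\ \Phi_1 & P_1 & \sigma_{12} & \sigma_{13} \\ \Phi_2 & \sigma_{21} & P_2 & \sigma_{23} \\ \Phi_3 & \sigma_{31} & \sigma_{32} & P_3 \end{bmatrix}, \tag{8.4.1} \]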
flux (or momentum density) tells us how the energy is moving. As a result,
more than just the energy’s existence, its interactions and motion can also
affect the curvature of spacetime. Another way to think about this is it’s
both potential energy and kinetic energy that curve spacetime.
This tensor obeys a form of the principle of conservation of energy-
momentum (i.e. 4-momentum, see Eq. 7.4.23):
\[ \nabla_{\nu} T^{\alpha\nu} = 0, \tag{8.4.2} \]
Some Context
A massive body like our Sun can be said to hold onto all the planets, aster-
oids, comets, etc. simply with energy density. That component of Einstein’s
equation (Eq. 8.2.6), namely
\[ R_{tt} - \frac{1}{2} g_{tt} R = \frac{8\pi G}{c^4} T_{tt}, \]
simplifies to Eq. 8.2.1 in the weak-field approximation. Yes, I’m saying
the Sun creates a weak field. For comparison, a strong field is created by
something like a super-giant star or a black hole. Our sun isn’t called a yellow
dwarf for nothing. However, the orbit of Mercury noticeably wobbles being
so close to the Sun, which was a phenomenon we were unable to explain
until general relativity. From a practical point of view, we really only need
Einstein’s equation (Eq. 8.2.6) when classical physics isn’t enough.
Let’s consider something a little more exciting: a black hole. Black holes
(i.e. objects so massive that not even light can escape) had been speculated
for over a century before the publication of general relativity. However, the
term “black hole” wasn’t popularized until physicist John Wheeler began using it
in the late 1960s. Understanding black holes requires all the components in the
stress-energy tensor (Eq. 8.4.1). They curve spacetime by not only existing,
but also traveling through space, rotating, and forming orbits with stars
and other black holes. All of these motions affect spacetime in different
ways. Rotation can twist spacetime into a spiral and it’s even speculated
that wobbles can create waves in spacetime. There’s also a bit of lag since
all these effects only propagate at the speed of light.
Weird Units
Some of the components of the stress-energy tensor (Eq. 8.4.1) seem to have
some units that don’t match, but they do if we’re careful. Energy density
has units of J/m³ in the SI system, so we’ll use that as a reference. Pressure
and stress have a unit of N/m², but we get
\[ \frac{\mathrm{N}}{\mathrm{m}^2} = \frac{\mathrm{N\,m}}{\mathrm{m}^3} = \frac{\mathrm{J}}{\mathrm{m}^3} \]
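with a little manipulation. Energy flux is the rate at which energy passes
through a surface (called “intensity” with regard to waves) and has units of
W/m². With a little manipulation, this becomes
\[ \frac{\mathrm{W}}{\mathrm{m}^2} = \frac{\mathrm{J}}{\mathrm{s\,m}^2} = \frac{\mathrm{J}}{\mathrm{m}^3}\, \frac{\mathrm{m}}{\mathrm{s}}, \]
which varies from the expected unit by m/s. This turns out to be just a
factor of $c = 3 \times 10^{8}$ m/s. A similar unit phenomenon happens to momentum
density with a unit of
\[ \frac{\mathrm{kg\,m/s}}{\mathrm{m}^3} = \frac{\mathrm{kg\,m^2/s^2}}{\mathrm{m}^3}\, \frac{\mathrm{s}}{\mathrm{m}} = \frac{\mathrm{J}}{\mathrm{m}^3}\, \frac{\mathrm{s}}{\mathrm{m}}, \]
which varies from the expected unit by s/m (i.e. a factor of 1/c).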
Recall for Eq. 7.3.6, we introduced a notation changing the contravariant
coordinates from $(ct, x, y, z)$ to $(x^0, x^1, x^2, x^3)$. Specifically, this states
$x^0 \equiv ct$, which means we’d be measuring time in spatial units (e.g. meters). I
know this seems weird, but spacetime fails to distinguish between space and
time, so it’s actually more physical to do the same on paper. As a result of
this, the speed of light becomes
\[ c = 299{,}792{,}458\ \frac{\mathrm{m}}{\mathrm{s}} = 1\ \frac{\mathrm{m}}{\mathrm{m}} = 1, \]
so the unit of the stress-energy tensor (Eq. 8.4.1) becomes J/m³ as expected
for all components. The quantity c is now simply a unit conversion between
meters and seconds. We actually did this without realizing it throughout
Chapter 7 with the use of β = v/c (e.g. half the speed of light was simply
β = 0.5). The only difference now is that we’re openly embracing it.
Traditionally, proponents of general relativity have gone a step further.
Since the quantity $G = 6.674 \times 10^{-11}$ N m²/kg² is in Einstein’s equation (Eq.
8.2.6), it shows up quite often. Physicists get a bit lazy sometimes and stop
writing it. In other words, they set
\[ G = 6.67408 \times 10^{-11}\ \frac{\mathrm{N\,m^2}}{\mathrm{kg}^2} = 1, \]
so that all the G’s disappear. Ok, so maybe it’s not just laziness. Theoretical
physicists tend to be unconcerned with universal constants since they don’t
actually say much about the relationship itself. Their only purpose is to make
the relationships match experiment. What I’m saying is this isn’t really a
new thing to set a constant to one. It’s referred to as natural units.
The consequence of setting both c = 1 and G = 1 is called geometrized
units because the units of all the quantities relevant to general relativity
reduce to variations of only the meter, the unit of geometry. We end up with
unit conversions like
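\[
\begin{aligned}
\frac{G}{c^2} &= 7.42592 \times 10^{-28}\ \frac{\mathrm{m}}{\mathrm{kg}} && \text{; mass} \\
\frac{G}{c^3} &= 2.47702 \times 10^{-36}\ \frac{\mathrm{m}}{\mathrm{N\,s}} && \text{; linear and angular momentum} \\
\frac{G}{c^4} &= 8.26245 \times 10^{-45}\ \frac{1}{\mathrm{N}} && \text{; force, energy, energy density, pressure} \\
\frac{G}{c^5} &= 2.75606 \times 10^{-53}\ \frac{1}{\mathrm{W}} && \text{; power}
\end{aligned}
\tag{8.4.3}
\]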
and the size of these conversions drastically brings the large astronomical
values down to comprehensible ones. For example, the mass of the sun is
now
\[ M_{\odot} = 1.989 \times 10^{30}\ \mathrm{kg} \left( 7.42592 \times 10^{-28}\ \frac{\mathrm{m}}{\mathrm{kg}} \right) = 1477\ \mathrm{m}, \]
which really makes no conceptual sense whatsoever. However, with less to
carry through the math, there is less chance of calculation error.
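The conversion factors in Eq. 8.4.3 are nothing more than G divided by powers of c, so they’re easy to regenerate. Here is a short Python sketch using the values of G and c quoted in this section; the variable names are my own.

```python
G = 6.67408e-11      # gravitational constant in N m^2/kg^2
c = 299_792_458.0    # speed of light in m/s

# Geometrized-unit conversion factors (Eq. 8.4.3)
mass_factor = G / c**2       # m/kg     ; mass
momentum_factor = G / c**3   # m/(N s)  ; linear and angular momentum
energy_factor = G / c**4     # 1/N      ; force, energy, energy density, pressure
power_factor = G / c**5      # 1/W      ; power

# Mass of the Sun expressed in meters
solar_mass_kg = 1.989e30
solar_mass_m = solar_mass_kg * mass_factor

print(f"G/c^2 = {mass_factor:.5e} m/kg")
print(f"G/c^3 = {momentum_factor:.5e} m/(N s)")
print(f"G/c^4 = {energy_factor:.5e} 1/N")
print(f"G/c^5 = {power_factor:.5e} 1/W")
print(f"solar mass = {solar_mass_m:.0f} m")
```

Multiplying any SI quantity by the matching factor gives its geometrized value, and dividing converts it back.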
As you can see in Table 8.1, all the quantities in the stress-energy tensor
now have a unit of 1/m² and we no longer have to worry about the
discrepancy. Furthermore, these new units change all the equations we use as well.
Table 8.1: This is a list of quantities relevant to general relativity and their corresponding
geometrized unit.
\[ R_{\alpha\nu} - \frac{1}{2} g_{\alpha\nu} R = 8\pi\, T_{\alpha\nu}. \tag{8.4.4} \]
Figure 8.4: These people were important in the application of general relativity.
Spherical Symmetry
It is very common for large objects like stars to be spherically symmetric,
which just means there is no angular dependence within the star. Only
changes in radial distance from the center result in changes in the star’s
properties. Furthermore, most stars tend to rotate slowly (e.g. the Sun takes
about a month to make one full rotation), so it’s safe to assume the star is
also static (i.e. has temporal symmetry). This means its properties don’t
change in time.
If the star is spherically symmetric, then its angular terms should be
identical to the standard spherical metric terms (Eq. 7.2.6). If the star is also
static, then none of its terms should be functions of time. Therefore,
the metric tensor takes the form:
\[ g_{\alpha\delta} \longrightarrow \begin{bmatrix} -a(r) & 0 & 0 & 0 \\ 0 & b(r) & 0 & 0 \\ 0 & 0 & r^2 & 0 \\ 0 & 0 & 0 & r^2 \sin^2\theta \end{bmatrix}, \tag{8.5.1} \]
where a and b are arbitrary functions of radial distance. Using Eq. 7.2.3, the
line element takes the form:
\[ ds^2 = -a(r)\, dt^2 + b(r)\, dr^2 + r^2\, d\theta^2 + r^2 \sin^2\theta\, d\phi^2, \tag{8.5.2} \]
for both inside and outside a spherically symmetric (and static) star.
Example 8.5.1
Show that the metric for a spherically symmetric (and static) star is diagonal.
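\[ g'_{\mu\nu} = \frac{\partial x^{\alpha}}{\partial x'^{\mu}} \frac{\partial x^{\delta}}{\partial x'^{\nu}}\, g_{\alpha\delta} \]
\[ g'_{\mu t} = \frac{\partial x^{\alpha}}{\partial x'^{\mu}} \frac{\partial x^{\delta}}{\partial t'}\, g_{\alpha\delta}. \]
If we expand the sum over δ, then
\[ g'_{\mu t} = \frac{\partial x^{\alpha}}{\partial x'^{\mu}} \left[ \frac{\partial t}{\partial t'}\, g_{\alpha t} + \frac{\partial r}{\partial t'}\, g_{\alpha r} + \frac{\partial \theta}{\partial t'}\, g_{\alpha\theta} + \frac{\partial \phi}{\partial t'}\, g_{\alpha\phi} \right] \]
\[ g'_{\mu t} = \frac{\partial x^{\alpha}}{\partial x'^{\mu}} \left[ (-1)\, g_{\alpha t} + (0)\, g_{\alpha r} + (0)\, g_{\alpha\theta} + (0)\, g_{\alpha\phi} \right] = -\frac{\partial x^{\alpha}}{\partial x'^{\mu}}\, g_{\alpha t}, \]
which is still four equations due to the free index µ. For µ = t, this is
\[ g'_{tt} = -\left[ \frac{\partial t}{\partial t'}\, g_{tt} + \frac{\partial r}{\partial t'}\, g_{rt} + \frac{\partial \theta}{\partial t'}\, g_{\theta t} + \frac{\partial \phi}{\partial t'}\, g_{\phi t} \right], \]
\[ g'_{tt} = -\left[ (-1)\, g_{tt} + (0)\, g_{rt} + (0)\, g_{\theta t} + (0)\, g_{\phi t} \right] = +g_{tt}, \]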
\[ g'_{rt} = -\left[ (0)\, g_{tt} + (+1)\, g_{rt} + (0)\, g_{\theta t} + (0)\, g_{\phi t} \right] = -g_{rt}, \]
which is a problem. If the star has temporal symmetry, then $g'_{rt} = g_{rt}$
so we must conclude that $g_{rt} = 0$. In the same way, $g_{\theta t} = 0$ and $g_{\phi t} = 0$.
• We can perform this same process on the spherical symmetry trans-
formations, θ → −θ and/or φ → −φ. Including the work for it here
would be redundant since all we’d be changing would be indices. The
results are as follows:
\[ g'_{\theta\theta} = g_{\theta\theta} \quad \text{and} \quad g'_{\phi\phi} = g_{\phi\phi}, \]
implying these can be non-zero like $g_{tt}$, and all off-diagonal terms are
zero. You can save yourself a little time knowing that the metric tensor
is always symmetric (i.e. $g_{\alpha\delta} = g_{\delta\alpha}$).
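The sign flips in this example can be seen directly by applying the tensor transformation rule to a sample matrix. The Python sketch below is only an illustration: the metric values are made up (with deliberately non-zero time off-diagonal terms), the coordinates are ordered (t, r, θ, φ), and the function name is hypothetical.

```python
def transform_metric(g, jacobian):
    """Apply g'_{mu nu} = (dx^alpha/dx'^mu)(dx^delta/dx'^nu) g_{alpha delta}.

    jacobian[alpha][mu] holds dx^alpha/dx'^mu.
    """
    n = len(g)
    return [
        [
            sum(
                jacobian[alpha][mu] * jacobian[delta][nu] * g[alpha][delta]
                for alpha in range(n)
                for delta in range(n)
            )
            for nu in range(n)
        ]
        for mu in range(n)
    ]


# Time reversal t -> -t leaves r, theta, phi alone
time_reversal = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Made-up symmetric metric with non-zero g_rt, g_theta-t, g_phi-t
g = [
    [-2.0, 0.3, 0.1, 0.2],
    [0.3, 1.5, 0.0, 0.0],
    [0.1, 0.0, 4.0, 0.0],
    [0.2, 0.0, 0.0, 3.0],
]

g_prime = transform_metric(g, time_reversal)

for row in g_prime:
    print(row)
```

The tt component and the purely spatial components come back unchanged, while every component with exactly one time index changes sign. Demanding that the metric look the same before and after the transformation therefore forces those mixed components to be zero.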
Example 8.5.2
Determine the Christoffel symbols and curvature tensors in the space occu-
pied by a spherically symmetric (and static) star where the metric is given
by Eq. 8.5.1.
• There are quite a few components in these quantities and the process
gets a bit repetitive. I’ll save time by deriving only one of each. You can
find an entire list of curvatures for a variety of geometries in Appendix
C.
• Christoffel symbols can be found using Eq. 6.7.6. We’ve done this for
an arbitrary 3-space in Example 6.7.1, but this generalizes to 4-space
with
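\[ \Gamma^\delta_{\mu\nu} = \frac{1}{2}\,g^{\lambda\delta}\left(\frac{\partial g_{\lambda\mu}}{\partial x^\nu} + \frac{\partial g_{\lambda\nu}}{\partial x^\mu} - \frac{\partial g_{\mu\nu}}{\partial x^\lambda}\right) , \]
\[ \Gamma^t_{tr} = \frac{1}{2}\,g^{tt}\,\frac{\partial g_{tt}}{\partial r} \]
Since the metric tensor is diagonal, we know $g^{tt} = 1/g_{tt}$ and we get
\[ \Gamma^t_{tr} = \frac{1}{2g_{tt}}\,\frac{\partial g_{tt}}{\partial r} = \frac{1}{2a}\,\frac{\partial a}{\partial r} \]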
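The Christoffel result above is easy to check numerically. The following is only a minimal sketch: the mass `M`, the sample functions `a(r)` and `b(r)`, and all function names are assumptions made for illustration (any smooth positive `a` and `b` would do). It evaluates the general formula with central differences for the diagonal metric of Eq. 8.5.1 and compares $\Gamma^t_{tr}$ against $\frac{1}{2a}\frac{\partial a}{\partial r}$.

```python
import math

M = 1.0  # hypothetical mass in geometrized units (illustration only)

def a(r):
    return 1.0 - 2.0 * M / r      # sample a(r), an assumption for this sketch

def b(r):
    return 1.0 / a(r)             # sample b(r), an assumption for this sketch

def metric(x):
    """Diagonal entries of g_{alpha delta} (Eq. 8.5.1) at x = (t, r, theta, phi)."""
    t, r, theta, phi = x
    return [-a(r), b(r), r**2, (r * math.sin(theta))**2]

def dmetric(x, i, k, h=1e-6):
    """Central-difference estimate of the derivative of g_{ii} with respect to x^k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (metric(xp)[i] - metric(xm)[i]) / (2.0 * h)

def christoffel(x, d, m, n):
    """Gamma^d_{mn}; for a diagonal metric only lambda = d survives the sum."""
    g = metric(x)

    def dg(i, j, k):
        # derivative of g_{ij} with respect to x^k (zero off the diagonal)
        return dmetric(x, i, k) if i == j else 0.0

    return 0.5 / g[d] * (dg(d, m, n) + dg(d, n, m) - dg(m, n, d))

x = (0.0, 10.0, 1.0, 0.0)                       # sample event (t, r, theta, phi)
numeric = christoffel(x, 0, 0, 1)               # Gamma^t_{tr}
analytic = (2.0 * M / x[1]**2) / (2.0 * a(x[1]))  # (1/2a) da/dr for the sample a(r)
print(numeric, analytic)
```

The two printed numbers should agree to within the finite-difference error. The same `christoffel` function can be pointed at any other index combination if you want to spot-check entries of Section C.5.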
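\[ R^t{}_{rtr} = \frac{\partial \Gamma^t_{rr}}{\partial t} - \frac{\partial \Gamma^t_{rt}}{\partial r} + \Gamma^t_{\lambda t}\Gamma^\lambda_{rr} - \Gamma^t_{\lambda r}\Gamma^\lambda_{rt} , \]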
where λ is a summation index. Since the summation shows up twice,
that’s a total of 8 non-derivative terms. However, judging from the
non-zero Christoffel symbols in Section C.5, we can say only λ = r in
the first summation results in a non-zero value and only λ = t does in
the second. Also, Γtrr = 0, not that it matters since none are functions
of time anyway. Therefore,
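\[ R^t{}_{rtr} = -\frac{\partial \Gamma^t_{rt}}{\partial r} + \Gamma^t_{rt}\Gamma^r_{rr} - \Gamma^t_{tr}\Gamma^t_{rt} \]
\[ R^t{}_{rtr} = -\frac{\partial}{\partial r}\left[\frac{1}{2a}\frac{\partial a}{\partial r}\right] + \left[\frac{1}{2a}\frac{\partial a}{\partial r}\right]\left[\frac{1}{2b}\frac{\partial b}{\partial r}\right] - \left[\frac{1}{2a}\frac{\partial a}{\partial r}\right]\left[\frac{1}{2a}\frac{\partial a}{\partial r}\right] \]
\[ R^t{}_{rtr} = -\left[\frac{\partial}{\partial r}\frac{1}{2a}\right]\frac{\partial a}{\partial r} - \frac{1}{2a}\frac{\partial}{\partial r}\left(\frac{\partial a}{\partial r}\right) + \frac{1}{4ab}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} - \frac{1}{4a^2}\left(\frac{\partial a}{\partial r}\right)^2 \]
\[ R^t{}_{rtr} = \frac{1}{2a^2}\left(\frac{\partial a}{\partial r}\right)^2 - \frac{1}{2a}\frac{\partial^2 a}{\partial r^2} + \frac{1}{4ab}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} - \frac{1}{4a^2}\left(\frac{\partial a}{\partial r}\right)^2 \]
\[ R^t{}_{rtr} = -\frac{1}{2a}\frac{\partial^2 a}{\partial r^2} + \frac{1}{4ab}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} + \frac{1}{4a^2}\left(\frac{\partial a}{\partial r}\right)^2 \]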
• We could repeat this with Eq. 8.1.5 to get the Ricci curvatures. How-
ever, if we have all the Riemann curvatures, then it’s easier to just
contract the Riemann tensor with
\[ R_{\alpha\nu} = R^\mu{}_{\alpha\mu\nu} , \]
where µ is a summation index. Again, we’ll pick just one to solve:
\[ R_{tt} = R^\mu{}_{t\mu t} = R^t{}_{ttt} + R^r{}_{trt} + R^\theta{}_{t\theta t} + R^\phi{}_{t\phi t} \]
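Using the Riemann curvatures from Section C.5, we get
\[ R_{tt} = [0] + \left[-\frac{1}{4ab}\left(\frac{\partial a}{\partial r}\right)^2 - \frac{1}{4b^2}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} + \frac{1}{2b}\frac{\partial^2 a}{\partial r^2}\right] + \frac{1}{2rb}\frac{\partial a}{\partial r} + \frac{1}{2rb}\frac{\partial a}{\partial r} \]
\[ R_{tt} = -\frac{1}{4ab}\left(\frac{\partial a}{\partial r}\right)^2 - \frac{1}{4b^2}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} + \frac{1}{2b}\frac{\partial^2 a}{\partial r^2} + \frac{1}{rb}\frac{\partial a}{\partial r} \]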
• Using Eq. 8.1.6 (just another contraction), the Ricci curvature scalar
is given by
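\[ R = g^{\alpha\nu}R_{\alpha\nu} = g^{\alpha t}R_{\alpha t} + g^{\alpha r}R_{\alpha r} + g^{\alpha\theta}R_{\alpha\theta} + g^{\alpha\phi}R_{\alpha\phi} . \]
Luckily, we know both the metric and the Ricci tensor are diagonal, so
\[ R = g^{tt}R_{tt} + g^{rr}R_{rr} + g^{\theta\theta}R_{\theta\theta} + g^{\phi\phi}R_{\phi\phi} . \]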
Using the Ricci curvatures from Section C.5 and combining like terms,
we get
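\[ R = \frac{2}{r^2}\left(1 - \frac{1}{b}\right) - \frac{2}{rab}\frac{\partial a}{\partial r} + \frac{1}{2a^2 b}\left(\frac{\partial a}{\partial r}\right)^2 + \frac{2}{rb^2}\frac{\partial b}{\partial r} + \frac{1}{2ab^2}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} - \frac{1}{ab}\frac{\partial^2 a}{\partial r^2} \]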
Example 8.5.3
Determine a convenient orthonormal basis for the space occupied by a
spherically symmetric (and static) star where the coordinates are given by the
metric in Eq. 8.5.1.
• A generalization of Eq. 6.4.9 to four-dimensional spacetime is
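\[
g_{\hat{\mu}\hat{\delta}} = (\hat{e}_\mu)^\alpha (\hat{e}_\delta)^\nu g_{\alpha\nu} \longrightarrow
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix} ,
\]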
meaning the metric mimics flat spacetime in the orthonormal basis.
Since we’re building an orthonormal basis from an already orthogonal
coordinate basis, each orthonormal basis vector will only have one non-
zero component in the coordinate basis. This will drastically simplify
our summations.
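• We’ll start with the time component of the time vector, $(\hat{e}_t)^t$:
\[ g_{\hat{t}\hat{t}} = (\hat{e}_t)^\alpha (\hat{e}_t)^\nu g_{\alpha\nu} . \]
However, we already know $\alpha = \nu = t$ is the only non-zero component,
so
\[ g_{\hat{t}\hat{t}} = (\hat{e}_t)^t (\hat{e}_t)^t g_{tt} \]
\[ -1 = \left[(\hat{e}_t)^t\right]^2 g_{tt} \quad\Rightarrow\quad (\hat{e}_t)^t = \frac{1}{\sqrt{-g_{tt}}} \]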
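• The radial component of the radial vector works out in a similar way
as
\[ g_{\hat{r}\hat{r}} = (\hat{e}_r)^\alpha (\hat{e}_r)^\nu g_{\alpha\nu} = (\hat{e}_r)^r (\hat{e}_r)^r g_{rr} \]
\[ +1 = \left[(\hat{e}_r)^r\right]^2 g_{rr} \quad\Rightarrow\quad (\hat{e}_r)^r = \frac{1}{\sqrt{g_{rr}}} . \]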
The angular components of the angular vectors are identical in pattern.
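As a quick sanity check on this pattern, here is a minimal sketch (the function names and sample numbers are assumptions for illustration) that builds the scale factors $1/\sqrt{|g_{\mu\mu}|}$ for a diagonal metric and confirms the metric in the resulting basis looks like flat spacetime.

```python
import math

def orthonormal_scale_factors(g_diag):
    """(e_mu)^mu = 1/sqrt(|g_mumu|) for each diagonal metric entry."""
    return [1.0 / math.sqrt(abs(g)) for g in g_diag]

def metric_in_orthonormal_basis(g_diag):
    """Diagonal of g_{mu-hat mu-hat} = (e_mu)^mu (e_mu)^mu g_{mumu}."""
    scale = orthonormal_scale_factors(g_diag)
    return [e * e * g for e, g in zip(scale, g_diag)]

# Hypothetical sample values of a(r), b(r), r and theta at one event
a_val, b_val, r, theta = 0.8, 1.25, 10.0, 1.0
g_diag = [-a_val, b_val, r**2, (r * math.sin(theta))**2]

print(metric_in_orthonormal_basis(g_diag))  # close to [-1, 1, 1, 1]
```

The absolute value handles the time component, where $g_{tt}$ is negative and the scale factor is $1/\sqrt{-g_{tt}}$.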
Eq. 8.5.2 is nice and simple, but it has its limitations. It assumes the star
never changes. Eventually, every star, rotating or not, is going to collapse. As
long as your star maintains spherical symmetry perfectly during the collapse,
then you can say
where a and b are now arbitrary functions of both radial distance and time.
You just have to be careful about the conditions of your star’s collapse.
Perfect Fluids
A star happens to be made of plasma, but plasma behaves very similarly
to a fluid. If our star is not very viscous, is free of shear stress, and has only
isotropic pressures (i.e. the pressure is independent of direction), then we call
it a perfect fluid. This is common for a spherically symmetric star. Under
these conditions, the stress-energy tensor (Eq. 8.4.1) takes the form:
\[ T^{\alpha\nu} = (\rho + P)\,u^\alpha u^\nu + g^{\alpha\nu} P \tag{8.5.6} \]
where uα is the 4-velocity (Eq. 7.4.3), ρ(r) is the density at r, and P (r)
is the pressure at r. See Section 8.4 for a more general description of the
stress-energy tensor.
If we’re dealing with a star that is also static, then the fluid is not moving
in space (only through time). That means its 4-velocity is
\[
u^\alpha \longrightarrow
\begin{pmatrix}
1/\sqrt{a} \\ 0 \\ 0 \\ 0
\end{pmatrix} , \tag{8.5.7}
\]
where $a(r)$ is from the spherically symmetric line element (Eq. 8.5.2). The
$1/\sqrt{a}$ is due to a scale factor we picked up since we’re working in a coordinate
basis rather than an orthonormal basis (see Example 8.5.3 for more details).
In other words, using Eq. 6.4.8, the components of the 4-velocity are
\[ u^\alpha = (\hat{e}_\lambda)^\alpha\, u^{\hat{\lambda}} , \]
where the orthonormal basis vectors are given in Eq. 8.5.3. The result is
$u^t = 1/\sqrt{a}$, but $u^{\hat{t}} = 1$. It gets really weird, so I do everything I can to stick
with the coordinate basis in general relativity. If we plug Eq. 8.5.7 into Eq.
8.5.6, then we get
\[
\begin{aligned}
T^{tt} &= (\rho + P)\,u^t u^t + g^{tt} P \\
T^{rr} &= g^{rr} P \\
T^{\theta\theta} &= g^{\theta\theta} P \\
T^{\phi\phi} &= g^{\phi\phi} P
\end{aligned}
\]
\[
\begin{aligned}
T^{tt} &= \rho/a \\
T^{rr} &= P/b \\
T^{\theta\theta} &= P/r^2 \\
T^{\phi\phi} &= \frac{P}{r^2 \sin^2\theta}
\end{aligned} \tag{8.5.8}
\]
for the four non-zero components of the stress-energy tensor for a perfect
static fluid.
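If you’d like to see the cancellation in $T^{tt}$ happen with actual numbers, the following minimal sketch evaluates Eq. 8.5.6 directly for a static fluid. The function name and the sample values of $\rho$, $P$, $a$, $b$, $r$, and $\theta$ are assumptions chosen only for illustration.

```python
import math

def perfect_fluid_stress_energy(rho, P, a, b, r, theta):
    """Diagonal components T^{tt}, T^{rr}, T^{thth}, T^{phph} from Eq. 8.5.6."""
    # Inverse of the diagonal metric in Eq. 8.5.1
    g_inv = [-1.0 / a, 1.0 / b, 1.0 / r**2, 1.0 / (r * math.sin(theta))**2]
    # Static 4-velocity from Eq. 8.5.7
    u = [1.0 / math.sqrt(a), 0.0, 0.0, 0.0]
    return [(rho + P) * u[i] * u[i] + g_inv[i] * P for i in range(4)]

# Hypothetical sample values in geometrized units
rho, P, a, b, r, theta = 2.0e-3, 5.0e-4, 0.8, 1.25, 10.0, 1.0
T = perfect_fluid_stress_energy(rho, P, a, b, r, theta)
print(T)
print([rho / a, P / b, P / r**2, P / (r * math.sin(theta))**2])  # Eq. 8.5.8
```

The two printed lists should match: the pressure in $(\rho + P)u^t u^t$ is cancelled by $g^{tt}P = -P/a$, leaving only $\rho/a$.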
Example 8.5.4
Determine the form of b(r) in Eq. 8.5.1 for a spherically symmetric star
composed of perfect static fluid.
• We’ll start with Einstein’s equation (Eq. 8.4.4), but only the
\[ R_{tt} - \frac{1}{2}\,g_{tt} R = 8\pi\, T_{tt} , \tag{8.5.9} \]
component is necessary.
That has 16 terms, but since the metric tensor is diagonal, we know
α = ν = t leaving us with just one non-zero term:
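\[ T_{tt} = g_{tt}\,g_{tt}\,T^{tt} = (-a)(-a)\,\frac{\rho}{a} = a\rho . \]
and
\[ R = \frac{2}{r^2}\left(1 - \frac{1}{b}\right) - \frac{2}{rab}\frac{\partial a}{\partial r} + \frac{1}{2a^2 b}\left(\frac{\partial a}{\partial r}\right)^2 + \frac{2}{rb^2}\frac{\partial b}{\partial r} + \frac{1}{2ab^2}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} - \frac{1}{ab}\frac{\partial^2 a}{\partial r^2} , \]
which are also part of Eq. 8.5.9 along with the metric tensor.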
• Substituting all these into Eq. 8.5.9 and combining like terms results
in
\[ \frac{a}{r^2}\left(1 - \frac{1}{b}\right) + \frac{a}{rb^2}\frac{\partial b}{\partial r} = 8\pi\, a\rho . \]
• It might appear we’re at a standstill, but the integral on the left is
something special. The mass enclosed in a sphere of radius r (centered
at the center of the star) is given by
\[ m(r) = \int_0^{2\pi}\!\int_0^{\pi}\!\int_0^{r} \rho(r)\, r^2 \sin\theta \, dr \, d\theta \, d\phi , \]
\[ \Rightarrow\quad b = \left(1 - \frac{2m}{r}\right)^{-1} . \]
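A numerical spot check can make this result feel less like magic. The sketch below assumes a uniform density (a hypothetical choice, as are the value `RHO` and the function names), builds $b(r)$ from the enclosed mass, and confirms it satisfies the $tt$ equation above after dividing both sides by $a$.

```python
import math

RHO = 1.0e-3  # hypothetical uniform density in geometrized units

def m(r):
    """Mass enclosed within radius r for uniform density."""
    return 4.0 / 3.0 * math.pi * RHO * r**3

def b(r):
    """b = (1 - 2m/r)^(-1)."""
    return 1.0 / (1.0 - 2.0 * m(r) / r)

def tt_equation_lhs(r, h=1e-5):
    """(1/r^2)(1 - 1/b) + (1/(r b^2)) db/dr, with db/dr by central difference."""
    db_dr = (b(r + h) - b(r - h)) / (2.0 * h)
    return (1.0 / r**2) * (1.0 - 1.0 / b(r)) + db_dr / (r * b(r)**2)

r = 2.0
print(tt_equation_lhs(r), 8.0 * math.pi * RHO)
```

Both printed values should agree to within the finite-difference error. Swapping in a different density profile only requires changing `m(r)` to the matching integral.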
The Vacuum
If we limit ourselves to the spacetime outside a star, then we’re in the vac-
uum. This is particularly important if we want to know how the star is
affecting other objects (e.g. planets, comets, people, etc.). We’ve mentioned
the vacuum in the book before and even used it in Section 5.5 to derive the
equations describing electromagnetic waves. A vacuum is just a place (and
time) devoid of matter and energy (i.e. empty spacetime). In the case of
general relativity, we can say Tαν = 0 anywhere in the vacuum.
Recall, we said Einstein’s equation and all quantities in it are defined
at a specific event. What we mean is that it doesn’t matter if there is a
star nearby because $T_{\alpha\nu}$ only has a non-zero value for events inside the star. This
has consequences for the other quantities in Einstein’s equation (Eq. 8.4.4).
Substituting in $T_{\alpha\nu} = 0$, we get
\[ R_{\alpha\nu} - \frac{1}{2}\,g_{\alpha\nu} R = 0 . \]
There are two ways this equation can be zero: either $R_{\alpha\nu} = 0$ or
\[ R_{\alpha\nu} = \frac{1}{2}\,g_{\alpha\nu} R . \]
However, playing a little with the second possibility gives us
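\[ g^{\alpha\nu} R_{\alpha\nu} = g^{\alpha\nu}\left(\frac{1}{2}\,g_{\alpha\nu} R\right) \]
\[ g^{\alpha\nu} R_{\alpha\nu} = \frac{1}{2}\,g^{\alpha\nu} g_{\alpha\nu} R . \]
Since $g^{\alpha\nu} g_{\alpha\nu} = \delta^\nu_\nu = 4$ (using the Kronecker delta) and Eq. 8.1.6 says
$g^{\alpha\nu} R_{\alpha\nu} = R^\nu_\nu = R$, we ultimately get
\[ R = \frac{1}{2}(4R) \quad\Rightarrow\quad 1 = 2 , \]
\[ R_{\alpha\nu} = 0 \tag{8.5.11} \]
\[ ds^2 = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 + r^2\, d\theta^2 + r^2 \sin^2\theta \, d\phi^2 , \tag{8.5.12} \]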
where we’ve replaced a(r) and b(r) with specific functions. This is called
the Schwarzschild solution since Karl Schwarzschild derived it very shortly
after Einstein’s publication of general relativity. It is the most famous of
the “vacuum solutions” and, by solutions, we mean solutions to Einstein’s
equation. All physical line elements are solutions to Einstein’s equation.
Example 8.5.5
Use the Ricci curvatures for a spherically symmetric (and static) star found in
Section C.5 to derive the Schwarzschild line element (Eq. 8.5.12).
• To solve for the line element, we just need to find the specific forms of
a(r) and b(r). We’re going to do this using the vacuum condition Eq.
8.5.11, but we have to do it for at least three of the Ricci curvatures
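to have a solvable system of partial differential equations. Those three
are
\[
\begin{aligned}
R_{tt} = 0 &= \frac{1}{rb}\frac{\partial a}{\partial r} - \frac{1}{4ab}\left(\frac{\partial a}{\partial r}\right)^2 - \frac{1}{4b^2}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} + \frac{1}{2b}\frac{\partial^2 a}{\partial r^2} \\
R_{rr} = 0 &= \frac{1}{4a^2}\left(\frac{\partial a}{\partial r}\right)^2 + \frac{1}{rb}\frac{\partial b}{\partial r} + \frac{1}{4ab}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} - \frac{1}{2a}\frac{\partial^2 a}{\partial r^2} \\
R_{\theta\theta} = 0 &= 1 - \frac{1}{b} - \frac{r}{2ab}\frac{\partial a}{\partial r} + \frac{r}{2b^2}\frac{\partial b}{\partial r}
\end{aligned}
\]
and the $R_{\phi\phi}$ is unnecessary because it’s just $R_{\theta\theta} \sin^2\theta$.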
• From here on out, this is just a math problem. We can clear all the
fractions getting
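\[
\begin{aligned}
0 &= 4ab\,\frac{\partial a}{\partial r} - br\left(\frac{\partial a}{\partial r}\right)^2 - ar\,\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} + 2abr\,\frac{\partial^2 a}{\partial r^2} \\
0 &= br\left(\frac{\partial a}{\partial r}\right)^2 + 4a^2\,\frac{\partial b}{\partial r} + ar\,\frac{\partial a}{\partial r}\frac{\partial b}{\partial r} - 2abr\,\frac{\partial^2 a}{\partial r^2} \\
0 &= 2ab^2 - 2ab - br\,\frac{\partial a}{\partial r} + ar\,\frac{\partial b}{\partial r}
\end{aligned}
\]
by multiplying by $4ab^2 r$, $4a^2 br$, and $2ab^2$, respectively.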
• Adding the first two equations, several terms cancel and we’re left with
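\[ 0 = 4ab\,\frac{\partial a}{\partial r} + 4a^2\,\frac{\partial b}{\partial r} \]
\[ 0 = b\,\frac{\partial a}{\partial r} + a\,\frac{\partial b}{\partial r} \]
\[ 0 = \frac{\partial}{\partial r}(ab) \quad\Rightarrow\quad ab = \text{constant} \]
or, equivalently, $a = k_1/b$ where $k_1$ is not a function of r (i.e. a constant
in the integral over r). Substituting this into the third equation gives
us
\[ 0 = 2\,\frac{k_1}{b}\,b^2 - 2\,\frac{k_1}{b}\,b - br\,\frac{\partial}{\partial r}\left(\frac{k_1}{b}\right) + \frac{k_1}{b}\,r\,\frac{\partial b}{\partial r} \]
\[ 0 = 2\,\frac{k_1}{b}\,b^2 - 2\,\frac{k_1}{b}\,b + br\,\frac{k_1}{b^2}\frac{\partial b}{\partial r} + \frac{k_1}{b}\,r\,\frac{\partial b}{\partial r} \]
\[ 0 = 2b - 2 + \frac{r}{b}\frac{\partial b}{\partial r} + \frac{r}{b}\frac{\partial b}{\partial r} . \]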
Combining like terms and clearing fractions by multiplying by b/2, we
get
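\[ 0 = b^2 - b + r\,\frac{\partial b}{\partial r} . \]
• Since b is only a function of r, this is just a first-order differential
equation we can solve by separation of variables. Rewriting, we get
\[ r\,\frac{db}{dr} = -b\,(b - 1) \quad\Rightarrow\quad \frac{-1}{b\,(b - 1)}\,db = \frac{1}{r}\,dr \]
\[ \Rightarrow\quad \left(\frac{1}{b} - \frac{1}{b - 1}\right) db = \frac{1}{r}\,dr . \]
Now we can integrate to get
\[ \int \left(\frac{1}{b} - \frac{1}{b - 1}\right) db = \int \frac{1}{r}\,dr \]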
and
\[ a = \frac{k_1}{b} = k_1\left(1 - \frac{k_2}{r}\right) . \]
• So now we have the general form of both a(r) and b(r). We just need
to figure out what $k_1$ and $k_2$ look like. We know, as r → ∞, the line
element should approach that of flat spacetime (i.e. $g_{tt} = -a \to -1$, so $a \to 1$). If we
take the limit, then
\[ 1 = \lim_{r\to\infty} a = \lim_{r\to\infty} k_1\left(1 - \frac{k_2}{r}\right) = k_1 , \]
so $k_1 = 1$ and the Schwarzschild solution takes the form
\[ ds^2 = -\left(1 - \frac{k_2}{r}\right) dt^2 + \left(1 - \frac{k_2}{r}\right)^{-1} dr^2 + r^2\, d\theta^2 + r^2 \sin^2\theta\, d\phi^2 . \tag{8.5.13} \]
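It’s worth confirming that this form really does satisfy the vacuum condition (Eq. 8.5.11). The following minimal sketch plugs $a = 1 - k_2/r$ and $b = 1/a$ into the three Ricci curvatures used in this example, with derivatives taken by finite differences. The value of `K2`, the step size, and the function names are assumptions for illustration.

```python
K2 = 2.0  # hypothetical value of the constant k_2

def a(r):
    return 1.0 - K2 / r

def b(r):
    return 1.0 / a(r)

def d1(f, r, h=1e-4):
    """First derivative by central difference."""
    return (f(r + h) - f(r - h)) / (2.0 * h)

def d2(f, r, h=1e-4):
    """Second derivative by central difference."""
    return (f(r + h) - 2.0 * f(r) + f(r - h)) / h**2

def ricci(r):
    """(R_tt, R_rr, R_thth) for the metric of Eq. 8.5.1."""
    A, B = a(r), b(r)
    da, db, dda = d1(a, r), d1(b, r), d2(a, r)
    R_tt = da / (r * B) - da**2 / (4 * A * B) - da * db / (4 * B**2) + dda / (2 * B)
    R_rr = da**2 / (4 * A**2) + db / (r * B) + da * db / (4 * A * B) - dda / (2 * A)
    R_thth = 1.0 - 1.0 / B - r * da / (2 * A * B) + r * db / (2 * B**2)
    return R_tt, R_rr, R_thth

for r in (5.0, 10.0, 50.0):
    print(r, ricci(r))
```

All three components should come out at the level of the finite-difference noise for every radius outside $r = k_2$, no matter what value you pick for `K2`.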
Example 8.5.6
The time component in the Schwarzschild line element (Eq. 8.5.12) is dependent
on r, the distance from the center of the spherically symmetric object.
This implies the passage of time is measured differently for observers in dif-
ferent locations in the spacetime curvature. Determine a transformation for
time between the following observers:
You may ignore the motion of all observers, which is practical assuming the
observer A is on the equator and observer B is in geostationary orbit above
observer A. Observer C is so far away that the motions of observers A and
B don’t matter.
• Recall from Section 7.7 that we have to be very careful when discussing
who measures what and where they measure it. Since the Schwarzschild
line element (Eq. 8.5.12) has no time-dependence, all three observers
will have the same coordinate time as shown in Figure 8.5. Coordinate
time is the time determined by the coordinates we’ve chosen for the
source of curvature (i.e. Earth), which is not something we directly
measure. What we measure is our proper time and each of the observers
has their own because they’re all on different world lines.
• Let’s assume events 1 and 2 in Figure 8.5 are just two bright flashes
of light. These flashes are separated by ∆τA for the Earth observer.
However, those flashes arrive at observer B at events 3 and 4, respec-
tively, separated by ∆τB . Likewise, that’s ∆τC between events 5 and 6
for observer C, the distant observer.
• Assuming none of the observers move through space, Eq. 8.5.12 shows
\[ \Delta s_A^2 = -\Delta\tau_A^2 = -\left(1 - \frac{2M}{r_A}\right) \Delta t^2 \]
\[ \frac{\Delta\tau_A^2}{\Delta\tau_B^2} = \frac{1 - 2M/r_A}{1 - 2M/r_B} \]
Figure 8.5: Shown here, events 1 and 2 both happen on the Earth’s surface. The labels
A, B, and C represent the radial distance, r, for each observer in Example 8.5.6. Light
travels away from events 1 and 2 along null paths, which are only straight far from the
Earth. The curvature has been exaggerated for clarity.
\[ \frac{\Delta\tau_A}{\Delta\tau_B} = \sqrt{\frac{1 - 2M/r_A}{1 - 2M/r_B}} \]
\[ \Delta\tau_B = \Delta\tau_A \sqrt{\frac{1 - 2M/r_B}{1 - 2M/r_A}} \tag{8.5.14} \]
This shows, as you get closer to the source of gravity (i.e. rA < rB ), time
slows down (i.e. ∆τA < ∆τB ). Just be careful! This is in geometrized
units (see Table 8.1), so M is measured in meters.
• Observer C is very far away (i.e. rC → ∞). The light’s world line is
very straight for them because spacetime is nearly flat. Applying this,
Eq. 8.5.14 simplifies to
\[ \Delta\tau_C = \frac{1}{\sqrt{1 - 2M/r_A}}\, \Delta\tau_A . \]
You should never refer to ∆τC as the “gravitational proper time” even
though you may be tempted. Yes, it is an extreme value (i.e. the
longest time measured by any observer), but proper time is the shortest
time measured for a single world line. Remember, we’re measuring
time on three different world lines, so it isn’t the same thing. In fact,
a careful look shows ∆τC = ∆t, which means the distant observer
actually measures coordinate time.
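To get a feel for the size of this effect near Earth, here is a minimal sketch that evaluates Eq. 8.5.14 and its distant-observer limit. The constants are rounded textbook values and the two radii (Earth’s equatorial radius and the geostationary orbital radius) are assumptions I’ve supplied for the example; the motion of the observers is ignored, as in the problem statement.

```python
import math

G = 6.674e-11          # gravitational constant (m^3 kg^-1 s^-2), rounded
C = 2.99792458e8       # speed of light (m/s)
M_EARTH_KG = 5.972e24  # Earth's mass (kg), rounded
R_A = 6.378e6          # assumed radius for observer A: Earth's equator (m)
R_B = 4.2164e7         # assumed radius for observer B: geostationary orbit (m)

M = G * M_EARTH_KG / C**2  # Earth's mass in geometrized units (meters)

def proper_time_ratio(r_from, r_to, mass=M):
    """Delta tau at r_to divided by Delta tau at r_from (Eq. 8.5.14)."""
    return math.sqrt((1.0 - 2.0 * mass / r_to) / (1.0 - 2.0 * mass / r_from))

ratio_BA = proper_time_ratio(R_A, R_B)
ratio_CA = 1.0 / math.sqrt(1.0 - 2.0 * M / R_A)  # limit r_C -> infinity

print(f"M = {M:.4e} m")
print(f"tau_B / tau_A - 1 = {ratio_BA - 1.0:.3e}")
print(f"tau_C / tau_A - 1 = {ratio_CA - 1.0:.3e}")
```

With these inputs, Earth’s mass is a few millimeters and the fractional differences are a few parts in $10^{10}$, which is why nobody notices this in daily life. Real satellite clocks also pick up a velocity effect that this sketch deliberately leaves out.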
8.6 Geodesics
Knowing how spacetime curves is great, but our real interest lies in how an
object or particle will respond to that curvature. In Section 8.3, we even
used the principle of stationary action (Eq. 8.3.2) on a particle to derive
Einstein’s equation (Eq. 8.2.6). We don’t see fields or spacetime curvature,
so we can’t really take direct measurements. It’s the behavior of the matter
that we really study.
Flat Spacetime
Classically, a particle’s behavior is found using either Newton’s second law
(Eq. 4.2.6) or Lagrange’s equation (Eq. 4.2.14) to determine its equations of
motion. We’ve already generalized Newton’s second law for relativity with
4-force (Eq. 7.4.26), which looked a little like this:
\[ F^\delta = m_p a^\delta = m_p \frac{du^\delta}{d\tau} = m_p \frac{d^2 x^\delta}{d\tau^2} , \]
where aδ is 4-acceleration and uδ is 4-velocity. The rest mass, mp (i.e. the
smallest measurable mass), and the proper time, τ (i.e. the shortest mea-
surable time), were first defined at the end of Section 7.2. Usually, if we’re
trying to find equations of motion, then we write this as
\[ \frac{d^2 x^\delta}{d\tau^2} = \frac{F^\delta}{m_p} \tag{8.6.1} \]
so we have just the motions on the left. If the particle or object has no forces
acting on it, then we call it a free particle. In this case, Newton’s second
law reduces to
\[ \frac{d^2 x^\delta}{d\tau^2} = 0 , \tag{8.6.2} \]
which is something akin to Newton’s first law. A particle under these con-
ditions would travel in a straight line (i.e. the shortest distance between two
points) at constant velocity.
Time-like Geodesics
Until general relativity, gravity was always considered a force, but it didn’t
quite behave like the others we knew about. Sure, the mathematical de-
scriptions are similar in form as we saw with Coulomb’s law (Eq. 5.2.1)
and Newton’s universal law of gravitation (Eq. 5.2.2). However, when you
actually apply these in Newton’s second law (Eq. 4.2.6), they behave very
differently. The mass and charge are both important when determining the
electric influence on an object. When it comes to the gravitational influence
though, neither is necessary. All that matters (pun intended) for gravity is
how the object is moving and its distance from the source of the gravity.
It’s weird!!
Figure 8.6: These two diagrams feature the same 11 geodesic paths in a particular region
of space (just a sample of the infinite number of them). On the left, is a flat spacetime
(i.e. the spacetime far away from any sources of curvature). On the right, a massive object
like a star is present, so the geodesics are not what we would consider “straight.” Also,
keep in mind, geodesics are speed dependent, so these curves would be “straighter” for
faster moving objects.
But now, gravity is simply the result of curved spacetime, so its weirdness
makes a lot more sense. Since it’s no longer considered a force, a particle
can be under the influence of gravity and still be considered “free.” Unfor-
tunately, by observation, we know these types of particles do not travel in
what we would think of as “straight” lines as they did in classical physics.
This discrepancy can only be resolved if we relax our definition of the word
“straight.”
To avoid confusion, the new notion of a straight line is called a geodesic.
In flat spacetime, far away from any massive objects, a geodesic is very
straight and obeys Eq. 8.6.2 (see left image in Figure 8.6). However, the
lines (or paths) become curved when the spacetime is curved by a massive
object like the Earth or the Sun (see right image in Figure 8.6). They might
not obey Eq. 8.6.2, but geodesic paths always obey the following definition:
• Geodesic path - Any world line between two events such that the
proper time is extreme (i.e. maximum or minimum),
which is consistent with the classical definition since the shortest distance
takes the least time. Any world line, as defined in Section 7.2, has a proper
time measured in the frame of the particle traveling along the world line. A
geodesic path is simply a world line with the best value for proper time.
Powered by this idea of geodesics, we’ll need to generalize Eq. 8.6.2 so
we can find equations of motion for the particle. The direction of a path is
described by the 4-velocity, uδ , of a particle on that path since that vector is
always tangent to the path. For a geodesic path, we can say
\[ u^\mu \nabla_\mu u^\delta = 0 , \]
where $\nabla_\mu u^\delta$ is the change in the $\delta$ component of the 4-velocity in the $x^\mu$
direction. Multiplying this by $u^\mu$ gives us something like a dot product (Eq.
2.2.1), so, essentially, we’re saying $u^\mu$ is always perpendicular to its change
along a geodesic path. In other words, particles traveling on a geodesic path
don’t change their motion in the direction of their motion.
Using the definition of the covariant derivative on contravariant vectors
(Eq. 6.7.3), we get
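\[ u^\mu \left(\frac{\partial u^\delta}{\partial x^\mu} + \Gamma^\delta_{\mu\nu}\, u^\nu\right) = 0 \]
\[ u^\mu \frac{\partial u^\delta}{\partial x^\mu} + \Gamma^\delta_{\mu\nu}\, u^\mu u^\nu = 0 . \]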
By the chain rule for derivatives (Eq. 3.1.2), we get
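\[ u^\mu \frac{du^\delta}{d\tau} \frac{\partial \tau}{\partial x^\mu} + \Gamma^\delta_{\mu\nu}\, u^\mu u^\nu = 0 \]
\[ u^\mu \frac{du^\delta}{d\tau} \frac{1}{dx^\mu / d\tau} + \Gamma^\delta_{\mu\nu}\, u^\mu u^\nu = 0 \]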
and, since uµ = dxµ /dτ (Eq. 7.4.3), this becomes
\[ \frac{du^\delta}{d\tau} + \Gamma^\delta_{\mu\nu}\, u^\mu u^\nu = 0 . \]
Note that partial and full derivatives with respect to proper time are equiv-
alent (a quality we’ve used a lot in this book). This can be written with
4-acceleration in its familiar form using Eq. 7.4.3 again, arriving at
\[ \frac{d^2 x^\delta}{d\tau^2} + \Gamma^\delta_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0 , \tag{8.6.3} \]
Example 8.6.1
Determine the value of k2 in Eq. 8.5.13 from Example 8.5.5 using the geodesic
equation (Eq. 8.6.3).
• We’re going to keep things as simple as possible without making any
unnecessary approximations. Let’s assume the event we’re consider-
ing (for the geodesic equation) is a release event. The small object
($m_{\text{obj}} \ll m_{\text{star}}$) is being released from rest some distance above the
star. As you would expect, this object would experience an acceleration
radially toward the star, so we’ll consider the δ = r component:
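\[ \frac{d^2 r}{d\tau^2} + \Gamma^r_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0 \]
\[ \frac{d^2 r}{d\tau^2} = -\Gamma^r_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} . \]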
• Since we’re assuming a release event, we know
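\[ \left.\frac{dr}{d\tau}\right|_{\text{event}} = \left.\frac{d\theta}{d\tau}\right|_{\text{event}} = \left.\frac{d\phi}{d\tau}\right|_{\text{event}} = 0 \]
\[ \frac{d^2 r}{d\tau^2} = -\Gamma^r_{tt} \frac{dt}{d\tau} \frac{dt}{d\tau} + 0 + 0 + \ldots , \]
where the other 15 terms in the summations are zero. Multiplying
through by $d\tau^2 / dt^2$, we get
\[ \frac{d^2 r}{dt^2} = -\Gamma^r_{tt} , \]
which represents the acceleration with respect to coordinate time.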
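• This Christoffel symbol is given in Section C.5 and the form of the
metric components is given in Eq. 8.5.13 (i.e. $a = 1 - k_2/r$ and $b = 1/a$), so
\[ \frac{d^2 r}{dt^2} = -\frac{1}{2b}\frac{\partial a}{\partial r} = -\frac{1}{2}\left(1 - \frac{k_2}{r}\right) \frac{\partial}{\partial r}\left(1 - \frac{k_2}{r}\right) \]
\[ \frac{d^2 r}{dt^2} = -\frac{1}{2}\left(1 - \frac{k_2}{r}\right) \frac{k_2}{r^2} = -\frac{k_2}{2r^2}\left(1 - \frac{k_2}{r}\right) \]
\[ \frac{d^2 r}{dt^2} = -\frac{k_2}{2r^2} + \frac{k_2^2}{2r^3} . \]
• Now we’re going to make one approximation in r. This will not affect
the value of $k_2$ because we know it’s not a function of r. We already
made r → ∞ in Example 8.5.5 to find $k_1$. However, $k_2$ contains some
information about gravity, so we don’t want spacetime completely flat,
just close to flat. We’ll assume we’re releasing from a point where r is
large, but not infinite. Since $1/r^3$ approaches zero faster than $1/r^2$, we
can say
\[ \frac{d^2 r}{dt^2} \approx -\frac{k_2}{2r^2} . \]
• Classically, the gravitational force is
given by Eq. 5.2.2 and, when combined with Newton’s second law (Eq.
4.2.6), yields an acceleration of
\[ -ma = -G\,\frac{Mm}{r^2} \quad\Rightarrow\quad a = G\,\frac{M}{r^2} , \]
where a here is the magnitude of the acceleration toward the star (not the metric function).
Converting to geometrized units (see Table 8.1) and writing the acceleration
as a derivative (negative because it points toward the star), that’s
\[ \frac{d^2 r}{dt^2} = -\frac{M}{r^2} \]
and, by comparison to our large r result, we get
\[ \frac{k_2}{2r^2} = \frac{M}{r^2} \quad\Rightarrow\quad k_2 = 2M . \]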
We’ve made no assumption about the fluid nature of the matter of the
star in deriving this result.
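The large r comparison in this example can be illustrated with a few numbers. This is only a sketch: the function names and the value of `M` are assumptions, and it compares magnitudes so the sign convention for the radial direction doesn’t matter.

```python
def release_acceleration_magnitude(r, k2):
    """|d^2 r / dt^2| = (1/2b) da/dr with a = 1 - k2/r and b = 1/a."""
    a = 1.0 - k2 / r
    da_dr = k2 / r**2
    return 0.5 * a * da_dr

def newtonian_magnitude(r, mass):
    """|d^2 r / dt^2| = M / r^2 in geometrized units."""
    return mass / r**2

M = 1.0  # hypothetical mass in geometrized units
for r in (10.0, 100.0, 1.0e3, 1.0e6):
    ratio = release_acceleration_magnitude(r, 2.0 * M) / newtonian_magnitude(r, M)
    print(r, ratio)
```

With $k_2 = 2M$, the ratio is exactly $1 - 2M/r$, so it approaches 1 as r grows. Any other choice of $k_2$ would make the ratio approach $k_2 / 2M$ instead, which is how the comparison pins down the constant.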
Example 8.6.2
What are the conserved quantities in the Schwarzschild geometry (Eq. 8.5.12)?
• Conserved quantities are usually the result of some kind of symmetry.
We know the Schwarzschild geometry is spherically symmetric (in θ
and φ) and symmetric in time (t), but not radially (r). However, we
also know that Schwarzschild geodesics are always in a single plane, so
we’ll simplify matters by sticking to the xy-plane (i.e. θ = π/2). This
leaves us with two possible routes for conserved quantities: the t and
φ components of the geodesic equation (Eq. 8.6.3).
• We’ll start with the t-component, which is
\[ \frac{d^2 t}{d\tau^2} + \Gamma^t_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = 0 . \]
According to the list in Section C.3, there is only one unique Christoffel
symbol with t as the upper index, so
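\[ \frac{d^2 t}{d\tau^2} + 2\,\Gamma^t_{tr} \frac{dt}{d\tau} \frac{dr}{d\tau} = 0 , \]
where the 2 appears because $\Gamma^\delta_{\mu\nu} = \Gamma^\delta_{\nu\mu}$. Substituting from Section C.3,
we get
\[ \frac{d^2 t}{d\tau^2} + 2\left[\frac{M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1}\right] \frac{dt}{d\tau} \frac{dr}{d\tau} = 0 \]
\[ \frac{d^2 t}{d\tau^2} = -\frac{2M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1} \frac{dt}{d\tau} \frac{dr}{d\tau} . \tag{8.6.4} \]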
• To keep things looking simple for the math ahead, we’ll define ṫ ≡ dt/dτ
because we’re not going to find t anyway. This simplifies Eq. 8.6.4 to
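\[ \frac{d\dot{t}}{d\tau} = -\frac{2M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1} \dot{t}\, \frac{dr}{d\tau} \]
\[ \frac{d\dot{t}}{\dot{t}} = -\frac{2M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1} dr . \]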
The right side can be simplified further with a convenient change of
variable. If we say
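\[ u = 1 - \frac{2M}{r} \quad\Rightarrow\quad du = \frac{2M}{r^2}\, dr , \]
then
\[ \frac{d\dot{t}}{\dot{t}} = -\frac{du}{u} \quad\Rightarrow\quad \int \frac{d\dot{t}}{\dot{t}} = -\int \frac{du}{u} \]
\[ \ln \dot{t} = -\ln(u) + \ln(\varepsilon) , \]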
where ε is a unitless constant (i.e. our time-conserved quantity). Mov-
ing some things around using logarithm rules, we get
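\[ \ln \dot{t} = \ln\left(\frac{\varepsilon}{u}\right) \quad\Rightarrow\quad \dot{t} = \frac{\varepsilon}{u} \]
and, substituting back to r,
\[ \dot{t} = \frac{dt}{d\tau} = \varepsilon \left(1 - \frac{2M}{r}\right)^{-1} . \tag{8.6.5} \]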
\[ \frac{d^2 \phi}{d\tau^2} = -\frac{2}{r} \frac{dr}{d\tau} \frac{d\phi}{d\tau} . \tag{8.6.7} \]
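• To keep things looking simple for the math ahead, we’ll define $\dot{\phi} \equiv d\phi/d\tau$
because we’re not going to find φ anyway. This simplifies Eq.
8.6.7 to
\[ \frac{d\dot{\phi}}{d\tau} = -\frac{2}{r}\, \dot{\phi}\, \frac{dr}{d\tau} \]
\[ \frac{d\dot{\phi}}{\dot{\phi}} = -2\,\frac{dr}{r} \quad\Rightarrow\quad \int \frac{d\dot{\phi}}{\dot{\phi}} = -2 \int \frac{dr}{r} \]
\[ \ln \dot{\phi} = -2 \ln(r) + \ln(\ell) , \]
\[ \dot{\phi} = \frac{d\phi}{d\tau} = \frac{\ell}{r^2} . \tag{8.6.8} \]
In classical physics, the components of angular momentum are given by
Eq. 6.6.6, but the general idea is $L = I\omega = mr^2\omega$ so angular momentum
per unit rest mass is
\[ \frac{L}{m_p} = r^2 \omega = r^2 \frac{d\phi}{d\tau} . \]
This is why we called the constant “$\ell$.” Solving Eq. 8.6.8 for the constant,
we get
\[ \ell = r^2 \frac{d\phi}{d\tau} , \tag{8.6.9} \]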
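One way to convince yourself these really are conserved is to integrate the geodesic equation numerically and watch them. The sketch below uses a fourth-order Runge-Kutta step in the plane $\theta = \pi/2$. The t and φ equations are Eqs. 8.6.4 and 8.6.7; the r equation uses the standard Schwarzschild Christoffel symbols $\Gamma^r_{tt}$, $\Gamma^r_{rr}$, and $\Gamma^r_{\phi\phi}$, which I’m supplying as an assumption since Section C.3 isn’t reproduced here. The mass, initial conditions, step size, and function names are all hypothetical choices for illustration.

```python
import math

M = 1.0  # hypothetical mass in geometrized units

def rhs(y):
    """Derivatives of (t, r, phi, tdot, rdot, phidot) with respect to proper time."""
    t, r, phi, tdot, rdot, phidot = y
    f = 1.0 - 2.0 * M / r
    tddot = -(2.0 * M / (r**2 * f)) * tdot * rdot        # Eq. 8.6.4
    rddot = (-(M * f / r**2) * tdot**2                   # -Gamma^r_tt tdot^2 (assumed)
             + (M / (r**2 * f)) * rdot**2                # -Gamma^r_rr rdot^2 (assumed)
             + r * f * phidot**2)                        # -Gamma^r_phph phidot^2 (assumed)
    phiddot = -(2.0 / r) * rdot * phidot                 # Eq. 8.6.7
    return [tdot, rdot, phidot, tddot, rddot, phiddot]

def rk4_step(y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(y)
    k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h * (p + 2.0 * q + 2.0 * s + w) / 6.0
            for yi, p, q, s, w in zip(y, k1, k2, k3, k4)]

def epsilon(y):
    """Time-conserved quantity from Eq. 8.6.5: (1 - 2M/r) dt/dtau."""
    return (1.0 - 2.0 * M / y[1]) * y[3]

def ell(y):
    """Angular momentum per unit rest mass from Eq. 8.6.9: r^2 dphi/dtau."""
    return y[1]**2 * y[5]

def integrate(r0=20.0, phidot0=0.01, h=0.05, steps=10000):
    f0 = 1.0 - 2.0 * M / r0
    # dt/dtau fixed by the 4-velocity normalization with dr/dtau = 0 initially
    tdot0 = math.sqrt((1.0 + r0**2 * phidot0**2) / f0)
    y = [0.0, r0, 0.0, tdot0, 0.0, phidot0]
    start = (epsilon(y), ell(y))
    for _ in range(steps):
        y = rk4_step(y, h)
    return start, (epsilon(y), ell(y)), y

start, end, final_state = integrate()
print("epsilon:", start[0], "->", end[0])
print("ell:    ", start[1], "->", end[1])
```

Even though r, $\dot{t}$, and $\dot{\phi}$ all change along the orbit, the two combinations should stay fixed to within the integrator’s error. That’s the practical payoff of conserved quantities: they let you replace two second-order equations with two first-order ones.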
Null Geodesics
Geodesic paths taken by massless particles are called null geodesics. Yes,
I said massless. Gravity is not the result of a magical force between masses,
although that was a good stepping stone for physics and it worked very well
for a long time. No, gravity is the result of straight lines not being straight,
so everything capable of motion is affected by gravity. This includes massless
particles like photons.
As we saw in Section 7.6, we have issues with using proper time, τ , as
a parameter in our equations when it comes to particles that travel on null
world lines like photons (and all other massless particles). Particles that
travel at exactly c have zero proper time, so we needed to choose a different
parameter. We chose an affine parameter, Ω, which maintained the form
of all our equations. Using the same process here, the geodesic equation (Eq. 8.6.3)
becomes
\[ \frac{d^2 x^\delta}{d\Omega^2} + \Gamma^\delta_{\mu\nu} \frac{dx^\mu}{d\Omega} \frac{dx^\nu}{d\Omega} = 0 , \tag{8.6.10} \]
where we’ve just replaced all the τ ’s with Ω’s. The quantity Ω is not unique
like proper time, so it’s a bit more abstract.
This substitution must be done in all our definitions of 4-vectors. No
matter what parameter you choose, remember we must always get
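\[ u^\delta u_\delta = a^\delta a_\delta = F_\delta F^\delta = p^\delta p_\delta = 0 \]
because they’re all null 4-vectors. It’s clear from Eq. 8.6.10 that 4-velocity
and 4-acceleration are
\[ u^\delta = \frac{dx^\delta}{d\Omega} \quad\text{and}\quad a^\delta = \frac{d^2 x^\delta}{d\Omega^2} . \]
There is also a new form of the 4-force
\[ F^\delta = \frac{dp^\delta}{d\Omega} , \]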
in terms of 4-momentum. With 4-momentum, we have to be a little careful
since rest mass, mp , is zero. It’s usually best to define the 4-momentum
of the massless particle in terms of the energy rather than worry about its
derivative definition like we did with Eq. 7.6.3 (recall that Erel = prel c and
Erel = hfrel for a photon).
Non Geodesics
If, for some reason, your scenario involves more influences than just gravity,
then we need to refer back to Eq. 8.6.1. On the right side of the equation,
we had a force term, which was not necessary before because gravity is no
longer a force. Adding this back in, Eq. 8.6.3 becomes
\[ \frac{d^2 x^\delta}{d\tau^2} + \Gamma^\delta_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = \frac{F^\delta}{m_p} , \tag{8.6.11} \]
where F δ is the total 4-force. We’ve left the Christoffel term on the left side
this whole time because we’re still keeping motions on the left and forces on
the right (as was done in the flat spacetime case).
A common example of a force affecting an object is the electromagnetic
force (assuming it’s also charged). Given Eq. 8.6.11, you’d just refer back to
the Lorentz 4-force (Eq. 7.5.30), given by
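\[ F^\delta = q\, u_\alpha F^{\delta\alpha} , \]
but we need the indices in the correct place. Since $u_\alpha = u^\sigma g_{\alpha\sigma}$ and $F^{\delta\alpha} = g^{\alpha\lambda} F^\delta{}_\lambda$,
we get
\[ F^\delta = q\, u^\sigma g_{\alpha\sigma}\, g^{\alpha\lambda} F^\delta{}_\lambda = q\, u^\sigma \delta^\lambda_\sigma F^\delta{}_\lambda = q\, u^\sigma F^\delta{}_\sigma , \]
where $\delta^\lambda_\sigma = g_{\alpha\sigma}\, g^{\alpha\lambda}$ is the Kronecker delta (Eq. 6.2.2). Now, the equation of
motion (Eq. 8.6.11) becomes
\[ \frac{d^2 x^\delta}{d\tau^2} + \Gamma^\delta_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = \frac{q\, u^\sigma F^\delta{}_\sigma}{m_p} \]
\[ \frac{d^2 x^\delta}{d\tau^2} + \Gamma^\delta_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = \frac{q}{m_p} \frac{dx^\sigma}{d\tau} F^\delta{}_\sigma , \tag{8.6.12} \]
where q is the charge of the particle and $F^\delta{}_\sigma = g_{\alpha\sigma} F^{\delta\alpha}$ is the mixed EM field
tensor (Eq. 7.5.12) affecting the particle.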
Black Holes
In Section 8.4, we mentioned something called a black hole, an object so
massive that not even light can escape. This seems pretty extreme since the
speed of light is the fastest anything can go. They occur when very massive
stars run out of material to fuse and collapse. There is so much mass that
no force known to us is strong enough to overpower gravity. That includes
the forces involved in keeping matter from occupying the same space at the
same time.
If a black hole is static (i.e. non-rotating), then the Schwarzschild solution
(Eq. 8.5.12) is sufficient in describing it. However, it has issues. A singu-
larity is a purely mathematical term describing a place where a function is
undefined. Eq. 8.5.12 happens to have two of these, one at r = 2M in grr
and another at r = 0 in both gtt and grr .
The singularity at r = 2M is only a coordinate singularity, which means it
only exists because of our choice of coordinates. A coordinate transformation
we can use to eliminate it is
\[ t = t^* - 2M \ln\left(\frac{r}{2M} - 1\right) , \tag{8.7.1} \]
where t∗ is replacing t. Its derivative is
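\[ dt = dt^* - 2M \left(\frac{r}{2M} - 1\right)^{-1} \frac{dr}{2M} = dt^* - \left(\frac{r}{2M} - 1\right)^{-1} dr \]
\[ dt = dt^* - \frac{2M}{r}\left(1 - \frac{2M}{r}\right)^{-1} dr \]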
However, it’s dt2 in the line element, so
\[ dt^2 = (dt^*)^2 - \frac{4M}{r}\left(1 - \frac{2M}{r}\right)^{-1} dt^*\, dr + \frac{4M^2}{r^2}\left(1 - \frac{2M}{r}\right)^{-2} dr^2 . \]
\[ r_S = \frac{2GM}{c^2} \tag{8.7.3} \]
in SI units rather than geometrized units (converted using Eq. 8.4.3).
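To put some numbers on Eq. 8.7.3, here is a minimal sketch. The constants and the solar and Earth masses are rounded textbook values that I’ve supplied as assumptions for the example, and the function name is my own.

```python
G = 6.674e-11      # gravitational constant (m^3 kg^-1 s^-2), rounded
C = 2.99792458e8   # speed of light (m/s)

def schwarzschild_radius(mass_kg):
    """r_S = 2GM/c^2 in meters (Eq. 8.7.3)."""
    return 2.0 * G * mass_kg / C**2

M_SUN_KG = 1.989e30    # rounded solar mass
M_EARTH_KG = 5.972e24  # rounded Earth mass

print(f"Sun:   {schwarzschild_radius(M_SUN_KG):.1f} m")
print(f"Earth: {schwarzschild_radius(M_EARTH_KG) * 1000.0:.2f} mm")
```

The Sun comes out at roughly 3 km and the Earth at just under 9 mm, both far smaller than the actual objects, which is why neither one is a black hole.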
Unfortunately, Eq. 8.7.2 still contains the singularity at r = 0. This is
called a physical singularity since it is present with any set of coordinates.
A single counterexample won’t suffice as a proof, so we’ll need something a
little more encompassing. Traditionally, we find the value of
\[ K = g^{\alpha\rho}\, g^{\mu\sigma}\, g^{\nu\eta}\, g_{\lambda\delta}\, R^\lambda{}_{\rho\sigma\eta}\, R^\delta{}_{\alpha\mu\nu} . \]
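At first glance, it may seem as though we’ve made things worse since this
has $4^8 = 65{,}536$ terms. However, we know $g_{\lambda\delta}$ and $g^{\alpha\rho}$ are both diagonal in
the Schwarzschild solution (Eq. 8.5.12). This means α = ρ, µ = σ, ν = η,
and λ = δ; and we get
\[ K = g^{\alpha\alpha}\, g^{\mu\mu}\, g^{\nu\nu}\, g_{\delta\delta}\, R^\delta{}_{\alpha\mu\nu}\, R^\delta{}_{\alpha\mu\nu} \]
\[ K = g^{\alpha\alpha}\, g^{\mu\mu}\, g^{\nu\nu}\, g_{\delta\delta} \left(R^\delta{}_{\alpha\mu\nu}\right)^2 . \]
There are a lot of repeated indices, but only the up/down ones get summed
over. We’ve reduced this back to only four summations or $4^4 = 256$ terms.
Since the Riemann tensor has skew symmetry (Eq. 8.1.2) in the last two
indices and any negative signs cancel due to $\left(R^\delta{}_{\alpha\mu\nu}\right)^2$, we can say
\[ K = 2\, g^{\alpha\alpha}\, g^{\mu\mu}\, g^{\nu\nu}\, g_{\delta\delta} \left(R^\delta{}_{\alpha\mu\nu}\right)^2 , \]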
where the 2 in front accounts for the repeats of µν and we’ve reduced to
128 terms. Based on the Riemann components listed in Section C.3, we also
know the indices always alternate. That means δ ≠ α, µ ≠ ν, δ = µ, and
α = ν; and we get
\[ K = 2 \sum_\nu \sum_\mu g^{\nu\nu}\, g^{\mu\mu}\, g^{\nu\nu}\, g_{\mu\mu} \left(R^\mu{}_{\nu\mu\nu}\right)^2 , \]
where I’ve included the summation signs for clarity at this point. There are
too many repeated indices! Since $g^{\mu\mu} = 1/g_{\mu\mu}$ for any µ (because they are
inverse diagonal tensors), they cancel as well and we’re left with
\[ K = 2 \sum_\nu \sum_\mu \left(g^{\nu\nu}\right)^2 \left(R^\mu{}_{\nu\mu\nu}\right)^2 . \]
Note, in this notation, $R^\mu{}_{\nu\mu\nu}$ is not the Ricci tensor since the summation
occurs after it’s squared.
Only two summations left makes for $4^2 = 16$ terms, but we know µ ≠ ν
meaning we only have 16 − 4 = 12 terms left. This is conveniently the
exact number of Riemann components given in Section C.3. Expanding the
summations gives us
\[ K = 2 \sum_\nu \left(g^{\nu\nu}\right)^2 \left[ \left(R^t{}_{\nu t \nu}\right)^2 + \left(R^r{}_{\nu r \nu}\right)^2 + \left(R^\theta{}_{\nu\theta\nu}\right)^2 + \left(R^\phi{}_{\nu\phi\nu}\right)^2 \right] \]
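\[
\begin{aligned}
K = {} & 2\left(g^{rr}\right)^2 \left(R^t{}_{rtr}\right)^2 + 2\left(g^{\theta\theta}\right)^2 \left(R^t{}_{\theta t\theta}\right)^2 + 2\left(g^{\phi\phi}\right)^2 \left(R^t{}_{\phi t\phi}\right)^2 \\
& + 2\left(g^{tt}\right)^2 \left(R^r{}_{trt}\right)^2 + 2\left(g^{\theta\theta}\right)^2 \left(R^r{}_{\theta r\theta}\right)^2 + 2\left(g^{\phi\phi}\right)^2 \left(R^r{}_{\phi r\phi}\right)^2 \\
& + 2\left(g^{tt}\right)^2 \left(R^\theta{}_{t\theta t}\right)^2 + 2\left(g^{rr}\right)^2 \left(R^\theta{}_{r\theta r}\right)^2 + 2\left(g^{\phi\phi}\right)^2 \left(R^\theta{}_{\phi\theta\phi}\right)^2 \\
& + 2\left(g^{tt}\right)^2 \left(R^\phi{}_{t\phi t}\right)^2 + 2\left(g^{rr}\right)^2 \left(R^\phi{}_{r\phi r}\right)^2 + 2\left(g^{\theta\theta}\right)^2 \left(R^\phi{}_{\theta\phi\theta}\right)^2
\end{aligned}
\]
and, substituting in the components of the Riemann tensor and the inverse
metric, we get
\[ K = \frac{48 M^2}{r^6} , \tag{8.7.5} \]
for the Schwarzschild solution.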
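The 12-term sum is tedious by hand but quick to check with a short script. This is a sketch under stated assumptions: rather than copying Section C.3, it starts from the standard orthonormal-frame magnitudes of the Schwarzschild Riemann components ($2M/r^3$ for the $t$-$r$ and $\theta$-$\phi$ pairs, $M/r^3$ for the other four pairs) and converts each to a coordinate component $R^\mu{}_{\nu\mu\nu}$ by multiplying by $|g_{\nu\nu}|$. The function name and sample values are hypothetical.

```python
import math

def kretschmann(r, mass, theta=math.pi / 2):
    """K = 2 * sum over mu != nu of (g^{nu nu})^2 (R^mu_{nu mu nu})^2."""
    f = 1.0 - 2.0 * mass / r
    g = {"t": -f, "r": 1.0 / f, "th": r**2, "ph": (r * math.sin(theta))**2}
    g_inv = {key: 1.0 / value for key, value in g.items()}

    # Assumed orthonormal-frame magnitudes for each pair of directions
    m3 = mass / r**3
    frame = {
        frozenset(("t", "r")): 2.0 * m3,
        frozenset(("th", "ph")): 2.0 * m3,
        frozenset(("t", "th")): m3,
        frozenset(("t", "ph")): m3,
        frozenset(("r", "th")): m3,
        frozenset(("r", "ph")): m3,
    }

    total = 0.0
    for nu in g:
        for mu in g:
            if mu == nu:
                continue  # the 4 excluded terms
            # magnitude of the coordinate component R^mu_{nu mu nu}
            riemann = frame[frozenset((mu, nu))] * abs(g[nu])
            total += g_inv[nu]**2 * riemann**2
    return 2.0 * total

r, mass = 5.0, 1.0
print(kretschmann(r, mass), 48.0 * mass**2 / r**6)
```

The two printed values should match at any radius and any angle, since the metric factors cancel term by term and only powers of $M/r^3$ survive. Notice nothing special happens at r = 2M in the final answer, only at r = 0.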
Notice, it still contains r = 0 as a singularity even though it’s a spacetime
invariant. No matter how you label that singularity in your coordinates, Eq.
Example 8.7.1
Describe the path of a photon traveling only radially close to a black hole.
• If an object is only traveling along radial lines, then the Schwarzschild
line element (Eq. 8.5.12) simplifies to
\[ ds^2 = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 . \]
Since we’re dealing with photons, which travel on null paths, we can say
\[ 0 = -\left(1 - \frac{2M}{r}\right) dt^2 + \left(1 - \frac{2M}{r}\right)^{-1} dr^2 . \]
• Now we just have to solve this for t in terms of r. Moving some things
around, we get
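\[ \left(1 - \frac{2M}{r}\right) dt^2 = \left(1 - \frac{2M}{r}\right)^{-1} dr^2 \quad\Rightarrow\quad dt^2 = \left(1 - \frac{2M}{r}\right)^{-2} dr^2 \]
\[ \Rightarrow\quad dt = \pm \left(1 - \frac{2M}{r}\right)^{-1} dr , \]
where the square root shows we have two solutions.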
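Before integrating this by hand, it can be reassuring to check the antiderivative numerically. The sketch below integrates the plus solution with Simpson’s rule and compares it with the closed form $r + 2M \ln(r/2M - 1)$, the same expression that shows up when these curves are transformed later in this example. The mass, integration limits, and function names are assumptions for illustration.

```python
import math

M = 1.0  # hypothetical mass in geometrized units

def dt_dr(r):
    """Outgoing (+) solution: dt/dr = (1 - 2M/r)^(-1)."""
    return 1.0 / (1.0 - 2.0 * M / r)

def t_closed_form(r):
    """Antiderivative r + 2M ln(r/2M - 1), valid for r > 2M."""
    return r + 2.0 * M * math.log(r / (2.0 * M) - 1.0)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number of intervals n."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

r_start, r_end = 3.0, 10.0
numeric = simpson(dt_dr, r_start, r_end)
exact = t_closed_form(r_end) - t_closed_form(r_start)
print(numeric, exact)
```

Try moving `r_start` closer to 2M: the coordinate time needed to cover the same stretch grows without bound because of the logarithm, which is the coordinate singularity showing itself.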
Figure 8.7: This is a spacetime diagram showing two possible worldlines for radially trav-
eling photons. Each of the curves is a solution from Eq. 8.7.6. Since the units of the radial
axis are in Schwarzschild radii (i.e. rS = 2M ), the event horizon is located at r = 1 rS .
The physical singularity is still located at r = 0.
Figure 8.8: This is Figure 8.7 transformed into Eddington-Finkelstein coordinates. It fixes
the coordinate singularity at r = 2M and allows us to predict how photons (and anything
else) will cross the event horizon. A light cone has been shown inside the event horizon
for dramatic effect.
so essentially we just need to add that big term to our solutions from
Eq. 8.7.6. This gives us
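\[ t^* = \pm\left[ r + 2M \ln\left(\frac{r}{2M} - 1\right) + \text{constant} \right] + 2M \ln\left(\frac{r}{2M} - 1\right) \]
\[ t^*_1 = r + 4M \ln\left(\frac{r}{2M} - 1\right) + \text{constant} \]
\[ t^*_2 = -r + \text{constant} . \]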
This doesn’t change the shape of the plus curve very much as you
can see in Figure 8.8. However, the minus curve changes dramatically
because it is now a straight line. This minus curve clearly crosses
the event horizon and makes it all the way to the physical singularity
at r = 0. These two curves form light cones (see Section 7.2) that
progressively point toward the black hole. Even weirder, inside the
event horizon time-like world lines become space-like. No one has any
real concept of what that even means. It’s crazy!
Example 8.7.2
A photon can escape a black hole from just above the event horizon if it
travels straight away from it. If there is any angle to its trajectory, then it
will fall back into the black hole. The boundary beyond which a photon can
escape at any angle away from the black hole is the radius at which a photon
can orbit in a circle. What is this radius?
• There’s a lot going on here, so let’s get all our ducks in a row. A
convenient choice for the plane of the circle is the xy-plane. In spherical
coordinates, the xy-plane is defined by θ = π/2, from which it follows
that
\[ \frac{d^2\theta}{dt^2} = \frac{d\theta}{dt} = 0. \]
Furthermore, if the photon’s path is a circular orbit, then
\[ \frac{dr}{dt} = 0 \quad\text{and}\quad \frac{d\phi}{dt} = \text{constant} \]
as well as
\[ \frac{d^2 r}{dt^2} = \frac{d^2\phi}{dt^2} = 0, \]
for all events on the path. Note that we are allowed to take derivatives
with respect to coordinate time for a photon, just not with respect to
proper time.
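Since we’re dealing with photons, which travel on null paths, we can say
\[ 0 = -\left(1 - \frac{2M}{r}\right) dt^2 + r^2\, d\phi^2 , \]
which rearranges to
\[ \left(\frac{d\phi}{dt}\right)^2 = \frac{1}{r^2}\left(1 - \frac{2M}{r}\right). \tag{8.7.7} \]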
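\[ \frac{d^2 r}{d\Omega^2} + \Gamma^r_{\mu\nu}\,\frac{dx^\mu}{d\Omega}\frac{dx^\nu}{d\Omega} = 0, \]
where all derivatives are with respect to an affine parameter Ω. If we
multiply through by dΩ²/dt², then we get
\[ \frac{d^2 r}{dt^2} + \Gamma^r_{\mu\nu}\,\frac{dx^\mu}{dt}\frac{dx^\nu}{dt} = 0, \]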
where all derivatives are now with respect to coordinate time. Expanding
the sum gives a total of 1 + 4 × 4 = 17 terms. However, all the zero
derivatives mentioned earlier bring this down to
\[ \Gamma^r_{tt}\,\frac{dt}{dt}\frac{dt}{dt} + \Gamma^r_{\phi\phi}\,\frac{d\phi}{dt}\frac{d\phi}{dt} = \Gamma^r_{tt} + \Gamma^r_{\phi\phi}\left(\frac{d\phi}{dt}\right)^2 = 0 \]
\[ r_{\text{orbit}} = \frac{3GM}{c^2} \tag{8.7.9} \]
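The result can be checked numerically. The sketch below evaluates the reduced geodesic condition using Eq. 8.7.7 together with the standard Schwarzschild Christoffel symbols in the equatorial plane, Γ^r_tt = (M/r²)(1 − 2M/r) and Γ^r_φφ = −r(1 − 2M/r). These are assumed here to match the ones listed in Section C.3. The function names and the SI values of G, c, and the solar mass are illustrative assumptions.

```python
def radial_residual(r, M=1.0):
    """Gamma^r_tt + Gamma^r_phiphi (dphi/dt)^2 for a circular null orbit (G = c = 1)."""
    f = 1.0 - 2.0 * M / r
    gamma_r_tt = (M / r**2) * f       # assumed standard Schwarzschild value
    gamma_r_phiphi = -r * f           # assumed standard value at theta = pi/2
    dphi_dt_sq = f / r**2             # Eq. 8.7.7
    return gamma_r_tt + gamma_r_phiphi * dphi_dt_sq

def photon_sphere_radius_si(mass_kg):
    """Eq. 8.7.9 in SI units; G and c are approximate textbook values."""
    G = 6.674e-11   # m^3 kg^-1 s^-2
    c = 2.998e8     # m/s
    return 3.0 * G * mass_kg / c**2

print(radial_residual(3.0))              # vanishes (to rounding) at r = 3M
print(radial_residual(4.0))              # non-zero away from the photon sphere
print(photon_sphere_radius_si(1.989e30)) # roughly 4.4 km for one solar mass
```

The residual factors as (1 − 2M/r)(3M − r)/r², so the only root outside the horizon is r = 3M, which is Eq. 8.7.9 in geometrized units.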
Examples 8.7.1 and 8.7.2 show some very convenient special cases, but
you might be wondering what the general case looks like. What if you want
an angled path for a photon or the path of a massive particle? In that case,
you’re going to have to solve several components of the geodesic equation (Eq.
Figure 8.9: These are four different null geodesic paths (i.e. geodesic paths for massless
particles like photons) near a black hole. They were plotted using Eqs. 8.7.10 and 8.7.11
with different sets of initial conditions. One of the paths is on the photon sphere described
by Eq. 8.7.9 (r_i = 3M and ℓ/ε = ±3√3 M).
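\[ 0 = \frac{d^2 r}{d\Omega^2} + \Gamma^r_{tt}\,\frac{dt}{d\Omega}\frac{dt}{d\Omega} + \Gamma^r_{rr}\,\frac{dr}{d\Omega}\frac{dr}{d\Omega} + \Gamma^r_{\phi\phi}\,\frac{d\phi}{d\Omega}\frac{d\phi}{d\Omega} \]
\[ \frac{d^2 r}{d\Omega^2} = -\Gamma^r_{tt}\,\frac{dt}{d\Omega}\frac{dt}{d\Omega} - \Gamma^r_{rr}\,\frac{dr}{d\Omega}\frac{dr}{d\Omega} - \Gamma^r_{\phi\phi}\,\frac{d\phi}{d\Omega}\frac{d\phi}{d\Omega} \]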
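Substituting in the Christoffel symbols from Section C.3 as well as Eqs. 8.6.5
and 8.6.8, this becomes
\[ \frac{d^2 r}{d\Omega^2} = -\frac{M}{r^2}\left(1 - \frac{2M}{r}\right)\left[\varepsilon\left(1 - \frac{2M}{r}\right)^{-1}\right]^2 + \frac{M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1}\left(\frac{dr}{d\Omega}\right)^2 + r\left(1 - \frac{2M}{r}\right)\left(\frac{\ell}{r^2}\right)^2 \]
\[ \frac{d^2 r}{d\Omega^2} = -\frac{M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1}\varepsilon^2 + \frac{M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1}\left(\frac{dr}{d\Omega}\right)^2 + \left(1 - \frac{2M}{r}\right)\frac{\ell^2}{r^3} \]
\[ \frac{d^2 r}{d\Omega^2} = \frac{M}{r^2}\left(1 - \frac{2M}{r}\right)^{-1}\left[-\varepsilon^2 + \left(\frac{dr}{d\Omega}\right)^2\right] + \left(1 - \frac{2M}{r}\right)\frac{\ell^2}{r^3}, \tag{8.7.10} \]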
where ε and ℓ are constants related to energy and angular momentum,
respectively (see Example 8.6.2 for more details). Generalizing Eq. 8.6.8 with
Ω gives us
\[ \frac{d\phi}{d\Omega} = \frac{\ell}{r^2}, \tag{8.7.11} \]
where, again, ℓ is a constant related to angular momentum. Eqs. 8.7.10 and
8.7.11 apply to both massive (Ω = τ) and massless particles.
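Eqs. 8.7.10 and 8.7.11 are straightforward to evaluate numerically. The sketch below codes their right-hand sides and confirms that the photon-sphere initial conditions quoted in Figure 8.9 (r = 3M, dr/dΩ = 0, ℓ/ε = 3√3 M) give zero radial acceleration. The function names, the choice M = 1, and the crude explicit-Euler step are illustrative assumptions, not the method used to produce the figure.

```python
import math

def radial_acceleration(r, dr, eps, ell, M=1.0):
    """Right-hand side of Eq. 8.7.10 in geometrized units (G = c = 1)."""
    f = 1.0 - 2.0 * M / r
    return (M / r**2) / f * (-eps**2 + dr**2) + f * ell**2 / r**3

def euler_step(r, dr, phi, eps, ell, h, M=1.0):
    """One explicit-Euler step in the affine parameter (illustration only)."""
    a = radial_acceleration(r, dr, eps, ell, M)
    return r + h * dr, dr + h * a, phi + h * ell / r**2   # last term is Eq. 8.7.11

eps = 1.0
ell = 3.0 * math.sqrt(3.0) * eps      # photon-sphere value for M = 1
print(radial_acceleration(3.0, 0.0, eps, ell))   # zero up to rounding
print(euler_step(3.0, 0.0, 0.0, eps, ell, 0.1))  # r stays at 3M while phi advances
```

For any real plot, a higher-order integrator such as fourth-order Runge-Kutta would be a safer choice than the single Euler step shown here, since the photon sphere is an unstable orbit and small errors grow quickly.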
where a(t) is the scale factor and the constant k is the overall spatial
curvature of the universe. The scale factor is defined to be a = 1 at the
present time and represents the expansion of the universe. We can see it is
only on the spatial components, so it is only space that expands, not time.
We can think of it as the average distance between galactic superclusters:
so the scale factor is unitless because it’s normalized to the current spacing
of 400 million light years. Since the average supercluster spacing changes
with time, so does the scale factor.
The spatial curvature, k, is a different story. It is constant over space and
time, but its sign has implications:
It is often stated that k can only have three values: +1, 0, and −1; but this
depends on your choice of units. The way Eq. 8.7.14 is written, the quantity
kr² must be unitless, so k must have units of m⁻² like a Gaussian curvature.
It is not restricted to +1, 0, or −1.
Figure 8.10: This is what the universe looks like on the largest scale. Strings of galac-
tic superclusters stretch across space in a cosmic web leaving large voids between them.
(Image credit: Argonne National Laboratory)
The Ricci curvature scalar and the non-zero components of the Ricci curva-
ture tensor for this geometry can be found in Section C.6 of the appendix. We
can see this geometry is spherically symmetric because it matches Eq. 8.5.1
with the exception of the extra factor a(t) (which is fine because a is only
a function of time, not space). In fact, this geometry goes further assuming
a perfectly uniform universe. For Eq. 8.7.14 to apply to a universe perfectly,
that universe must be both homogeneous (the same in every place) and
isotropic (the same in every direction). While this isn’t exactly true for our
universe, we can see from Figure 8.10 the universe is uniform enough on the
largest scale that Eq. 8.7.14 is a very good approximation.
Unfortunately, we have a problem. Since Edwin Hubble in 1929, we’ve
been taking measurements of the expanding universe with increasing accu-
racy. Data shows that we live in a flat universe (k = 0), but there isn’t
enough matter and energy in the universe to flatten it. The consequence is
we need to adjust Einstein’s equation (Eq. 8.4.4) to apply to our universe and
we don’t have a lot of wiggle room for that. The methods used in Sections
8.2 and 8.3 involve derivatives and integrals, so our only real option in the
math is to add a constant, $g_{\alpha\nu}\Lambda$. The generalized Einstein’s equation looks
like this:
\[ R_{\alpha\nu} - \frac{1}{2}\, g_{\alpha\nu} R + g_{\alpha\nu}\Lambda = 8\pi\, T_{\alpha\nu}, \tag{8.7.17} \]
where Λ is called the cosmological constant. This constant has units of
m⁻² like an energy density (see Table 8.1), but its physical source is unknown,
so we call it dark energy (“dark” because we’re in the dark about it).
Now that we have Eqs. 8.7.16 and 8.7.17, we just need to know what
the matter and energy in the universe looks like. On the largest scale of the
universe, the matter is uniform and doesn’t change much, so let’s assume it’s
a perfect static fluid. Using Eq. 8.5.8 and lowering the indices, we get
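\[ \begin{aligned} T_{tt} &= \rho \\ T_{rr} &= \frac{a^2}{1 - kr^2}\, P \\ T_{\theta\theta} &= a^2 r^2 P \\ T_{\phi\phi} &= a^2 r^2 \sin^2\theta\; P \end{aligned} \tag{8.7.18} \]
The αν = tt component of Eq. 8.7.17 then works out as
\[ 3\frac{k}{a^2} + 3\frac{\dot a^2}{a^2} - \Lambda = 8\pi\rho, \tag{8.7.19} \]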
which we’ll save for later. The αν = rr, θθ, and φφ components of Einstein’s
equation are all the same because the curvature tensor components, Rαν , are
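so similar (see Section C.6). The θθ component is the simplest and works
out as
\[ R_{\theta\theta} - \frac{1}{2}\, g_{\theta\theta} R + g_{\theta\theta}\Lambda = 8\pi\, T_{\theta\theta} \]
\[ r^2\left(2k + 2\dot a^2 + a\ddot a\right) - \frac{1}{2}\, a^2 r^2\, \frac{6}{a^2}\left(k + \dot a^2 + a\ddot a\right) + a^2 r^2 \Lambda = 8\pi\, a^2 r^2 P . \]
Dividing through by a²r² and collecting terms gives
\[ -\frac{k}{a^2} - \frac{\dot a^2}{a^2} - 2\frac{\ddot a}{a} + \Lambda = 8\pi P. \tag{8.7.20} \]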
Unfortunately, having so many factors in Eq. 8.7.20 isn’t very convenient. We
can simplify further by adding Eq. 8.7.19 to three of Eq. 8.7.20 (i.e. adding
all the components together: tt + rr + θθ + φφ):
\[ \left[3\frac{k}{a^2} + 3\frac{\dot a^2}{a^2} - \Lambda\right] + 3\left[-\frac{k}{a^2} - \frac{\dot a^2}{a^2} - 2\frac{\ddot a}{a} + \Lambda\right] = \left[8\pi\rho\right] + 3\left[8\pi P\right] \]
\[ 3\frac{k}{a^2} + 3\frac{\dot a^2}{a^2} - \Lambda - 3\frac{k}{a^2} - 3\frac{\dot a^2}{a^2} - 6\frac{\ddot a}{a} + 3\Lambda = 8\pi\left(\rho + 3P\right) \]
\[ -6\frac{\ddot a}{a} + 2\Lambda = 8\pi\left(\rho + 3P\right). \tag{8.7.21} \]
If we move some things around in Eqs. 8.7.19 and 8.7.21, we get something
we can actually interpret. Eq. 8.7.19 determines spatial curvature, k, so
solving for that term gives us
\[ \frac{k}{a^2} = \frac{8\pi}{3}\rho + \frac{\Lambda}{3} - \frac{\dot a^2}{a^2}. \tag{8.7.22} \]
We can see everything inside the universe (regular matter, photons, dark
matter, and even dark energy) makes the curvature more positive. The ȧ2 /a2
term can be thought of as a kinetic energy (density) term for the universe,
which makes the curvature more negative. As mentioned before, our universe
appears to be “flat” (k = 0), so the entire left side equals zero. That leaves
us with
\[ \frac{\dot a^2}{a^2} = \frac{8\pi}{3}\rho + \frac{\Lambda}{3} \tag{8.7.23} \]
for our universe. Eq. 8.7.21 determines the acceleration rate of the universe,
ä, so solving for that term gives us
\[ \frac{\ddot a}{a} = -\frac{4\pi}{3}\left(\rho + 3P\right) + \frac{\Lambda}{3}. \tag{8.7.24} \]
We can see regular matter, photons, and dark matter (ρ and P ) all lower the
acceleration rate. Dark energy (Λ), on the other hand, raises that accelera-
tion rate.
Eqs. 8.7.22 and 8.7.24 together are called the Friedmann equations
(after Alexander Friedmann). Solutions to these differential equations, like
those found in Figure 8.11, are the scale factor, a(t), which show the ex-
pansion of the universe over time. As with any differential equation, those
solutions depend either on initial or boundary conditions. The way we de-
fined the scale factor in Eq. 8.7.15, we know a(now) = 1, so we would just
need to know the current values of ρ (energy density) and P (pressure). The
value of k is constant over time and space and so is the value of Λ (the
cosmological constant), which has some weird consequences. For one, ρ and P
decrease over time, so the Λ term is eventually the big term even though it
doesn’t change. That means, as long as Λ > 0, the last stage of the universe
is guaranteed to be an accelerated expansion.
Figure 8.11: This graph shows various possible solutions to the Friedmann equations. The
pink universe expands for the first half of its life and contracts again in the second half,
ending in a big crunch (or possibly a big bounce). The green universe expands and cools
forever, but the rate of that expansion slows over time ending in a big freeze (i.e. a heat
death). The gray universe expands forever at a constant rate, which is only possible if the
universe is perfectly empty. The blue universe expands forever, slowing for some time then
accelerating for the rest and ending in a big freeze (our own universe). The red universe
reaches an infinite scale factor, a, in a finite amount of time causing a big rip where the
actual fabric of the universe rips to bits.
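The sign argument above is easy to see with numbers. The sketch below evaluates Eqs. 8.7.23 and 8.7.24 for a few values of ρ, P, and Λ in geometrized units. The function names and the sample values are arbitrary illustrations rather than measured cosmological parameters.

```python
import math

def hubble_rate_sq(rho, lam):
    """(a_dot / a)^2 for a flat universe, Eq. 8.7.23."""
    return 8.0 * math.pi * rho / 3.0 + lam / 3.0

def acceleration_over_a(rho, P, lam):
    """a_ddot / a from Eq. 8.7.24."""
    return -4.0 * math.pi * (rho + 3.0 * P) / 3.0 + lam / 3.0

lam = 0.1  # arbitrary positive cosmological constant
for rho in (1.0, 1e-2, 1e-6):
    # Pressure is set to zero here (a dust-like assumption for illustration).
    print(rho, hubble_rate_sq(rho, lam), acceleration_over_a(rho, 0.0, lam))
```

With Λ held fixed, the acceleration is negative while ρ dominates and turns positive once ρ has thinned out enough, which is the late-time accelerated expansion described above.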
However, we do have a problem when we run the clock backwards. In
reverse, the universe gets smaller and smaller until, eventually, the entire
(observable) universe becomes a physical singularity (the size of the singu-
larity at the center of a black hole). Recall, at the black hole’s singularity,
the laws of physics break down because the spacetime curvature is infinite. A
similar problem arises here with the universe, but we have no event horizon
to hide it behind. General relativity accurately explains the universe every
place and every time except the center of a black hole and the beginning of the
universe, so it would still appear just a bit incomplete. We have yet to find
a solution to the problem.
Chapter 9
Basic Quantum Mechanics
Figure 9.1: These people were important in reaching the limits of classical physics.
light, like the Sun). The problem didn’t have a name then, but we now call
it the ultraviolet catastrophe because all the models blew up to infinity
in the ultraviolet wavelength range (see Figure 9.2). However, in 1900, a
German physicist named Max Planck solved this problem with
\[ R_\lambda(\lambda, T) = \frac{2hc^2}{\lambda^5}\,\frac{1}{e^{hc/(\lambda k_B T)} - 1}, \tag{9.1.1} \]
where λ is the wavelength of emitted light, T is the temperature of the
object, h = 6.626 × 10⁻³⁴ J·s (or 4.136 × 10⁻¹⁵ eV·s) is Planck’s constant,
k_B = 1.381 × 10⁻²³ J/K (or 8.617 × 10⁻⁵ eV/K) is Boltzmann’s constant, and
R_λ is the spectral radiance (i.e. intensity per steradian per unit wavelength).
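Eq. 9.1.1 can be evaluated directly. The sketch below codes it with the constants quoted above plus an assumed rounded value for the speed of light, then scans a 1 nm wavelength grid to locate the peak for T = 5778 K. The grid range and the function name are illustrative choices.

```python
import math

H = 6.626e-34    # Planck's constant (J s)
C = 2.998e8      # speed of light (m/s), assumed rounded value
KB = 1.381e-23   # Boltzmann's constant (J/K)

def spectral_radiance(lam, T):
    """Planck's law, Eq. 9.1.1, for wavelength lam (m) and temperature T (K)."""
    x = H * C / (lam * KB * T)
    return (2.0 * H * C**2 / lam**5) / math.expm1(x)   # expm1(x) = e^x - 1

# Scan 100 nm to 3000 nm in 1 nm steps and pick the brightest wavelength.
wavelengths = [n * 1e-9 for n in range(100, 3001)]
peak = max(wavelengths, key=lambda lam: spectral_radiance(lam, 5778.0))
print(peak)   # lands near 500 nm, in the visible range
```

Unlike the classical attempts, the exponential in the denominator forces the radiance back toward zero at short wavelengths, so the curve has a finite peak.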
In the process, Planck had to embrace two ideas that made him extremely
uncomfortable:
1. The second law of thermodynamics was not fundamental to the uni-
verse, but just the result of statistics.
Figure 9.2: This graph shows predicted spectral radiance against wavelength for the sun’s surface (T = 5778 K). The solid
curve is the attempt by John Rayleigh and James Jeans, but fails at low wavelengths like many other attempts. The dashed
curve is Planck’s solution to the problem.
Figure 9.3: On the left is J.J. Thomson’s model of the atom. On the right is a picture
of plum pudding. During an interview with Thomson, a reporter noticed the resemblance
and referred to it as the “plum pudding” model. Thomson hated the name, but it stuck.
1. It didn’t explain the black body radiation Planck had modeled with
statistics.
The second is a major problem because light carries away energy. This
gradually slows down the electrons until they eventually fall into the nucleus.
The orbits in Rutherford’s model are not stable. Why do we still recognize
this as the atom? Probably because it was the last time atomic models looked
simple.
In 1913, a Danish physicist named Niels Bohr tweaked Rutherford’s model
trying to fix these problems. He stated that, unlike with gravity, only some
orbits were possible (see Figure 9.5). Those orbits were given by
\[ r_n = \frac{n^2 h^2 \epsilon_0}{\pi Z q^2 m} = (0.0529\ \text{nm})\,\frac{n^2}{Z} \tag{9.1.2} \]
Figure 9.4: This is what Ernest Rutherford envisioned for the atom. With all the positive
charge concentrated in a nucleus, the spread in alpha scattering in his gold foil experiment
made much more sense.
some orbits are possible, then only some energy and angular momentum
values are possible. The total energy of an electron is given by
\[ E_n = -\frac{Z^2 q^4 m}{8 h^2 \epsilon_0^2 n^2} = (-13.6\ \text{eV})\,\frac{Z^2}{n^2} \tag{9.1.3} \]
Figure 9.5: Bohr’s model of the atom was just a simple tweak of Rutherford’s in Figure
9.4. It explains Planck’s radiation very well, but only for atoms with a single electron (i.e.
Hydrogen, single-ionized Helium, double-ionized Lithium, etc.).
where n_i is the initial orbit number and n_f is the final orbit number. That
energy leaves in the form of a photon, the quantum of light, which has a
wavelength of
\[ hf = \frac{hc}{\lambda} = Z^2\,(13.6\ \text{eV})\left(\frac{1}{n_f^2} - \frac{1}{n_i^2}\right) \]
\[ \lambda = \frac{91.13\ \text{nm}}{Z^2}\left(\frac{1}{n_f^2} - \frac{1}{n_i^2}\right)^{-1} \tag{9.1.5} \]
where Z is the atomic number (i.e. the number of protons in the nucleus).
Bohr’s model explained Planck’s radiation model quite well, but the problem
of the unstable orbit still remained. The model is also limited because it
fails when there is more than one electron in the atom. It was starting to
become clear that classical mechanics and electrodynamics were not sufficient
to describe what was happening. We needed a new quantum mechanics.
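The numerical prefactors in Eqs. 9.1.2, 9.1.3, and 9.1.5 can be reproduced from the constants. The sketch below does so, assuming rounded SI values for ε₀, the electron mass, and the speed of light (these are not quoted in the text above). The function names are illustrative.

```python
import math

H = 6.626e-34      # Planck's constant (J s)
C = 2.998e8        # speed of light (m/s), assumed rounded value
EPS0 = 8.854e-12   # vacuum permittivity (F/m), assumed rounded value
Q = 1.602e-19      # elementary charge (C)
M_E = 9.109e-31    # electron mass (kg), assumed rounded value

def bohr_radius(n, Z=1):
    """Orbit radius from Eq. 9.1.2, in meters."""
    return n**2 * H**2 * EPS0 / (math.pi * Z * Q**2 * M_E)

def bohr_energy_ev(n, Z=1):
    """Total energy from Eq. 9.1.3, converted from joules to eV."""
    joules = -Z**2 * Q**4 * M_E / (8.0 * H**2 * EPS0**2 * n**2)
    return joules / Q

def photon_wavelength(n_i, n_f, Z=1):
    """Wavelength (m) of the photon emitted when dropping from n_i to n_f."""
    delta_e = (bohr_energy_ev(n_i, Z) - bohr_energy_ev(n_f, Z)) * Q
    return H * C / delta_e

print(bohr_radius(1))           # about 5.29e-11 m
print(bohr_energy_ev(1))        # about -13.6 eV
print(photon_wavelength(3, 2))  # about 6.56e-7 m, the red Balmer line of hydrogen
```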
Wave-Particle Duality
Two decades into the 20th century, physicists were still discussing Bohr’s
model. Not only did it leave some questions unanswered, it also raised a
couple more.
Figure 9.6: These people were important in the initial development of quantum mechanics.
• What makes light so special that it can be both a particle and a wave?
These are all questions for which we eventually discovered answers. The first
answer we found was to the last question: What makes light so special? The
answer: Nothing. It isn’t special at all.
In 1924, a French physicist named Louis de Broglie proposed, in his PhD
dissertation, that all particles can behave as waves. In other words, wave-
particle duality was not limited to light. He even proposed a way to predict
the wavelength and frequency of massive particles like electrons. Using the
Planck relation (E = hf ), the frequency is given by
\[ f = \frac{E}{h} = \frac{\gamma m_p c^2}{h}, \tag{9.1.6} \]
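a wave function is
\[ y(\vec r, t) = A\cos\left(-\omega t + \vec k \bullet \vec r + \varphi_0\right), \tag{9.1.7} \]
which looks a lot like the argument of the cosine in Eq. 9.1.7. By simply
matching terms, we can conclude $\vec p = \hbar\vec k$ with a magnitude of
\[ p = \hbar k = \frac{h}{2\pi}\,\frac{2\pi}{\lambda} = \frac{h}{\lambda} \]
\[ \lambda = \frac{h}{p} = \frac{h}{\gamma m_p v}, \tag{9.1.8} \]
\[ v_{\text{phase}} = \lambda f = \frac{h}{p}\,\frac{E}{h} = \frac{h}{\gamma m_p v}\,\frac{\gamma m_p c^2}{h} \]
\[ v_{\text{phase}} = \frac{E}{p} = \frac{c^2}{v_{\text{particle}}} \tag{9.1.9} \]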
Figure 9.7: This diagram shows the 3 lowest de Broglie wavelengths for an electron in
a hydrogen atom. The Bohr orbits (dashed black) are scaled by a factor of n² and the waves
(solid blue) by a factor of n, which results in each orbit containing n wavelengths. For
simplicity, the waves shown are the group waves traveling with group velocity.
constructive interference (i.e. a standing wave). In Figure 9.7, you can see the
waves for the lowest three Bohr orbits of the hydrogen atom. Bohr designed
his model for non-relativistic particles (i.e. particles traveling with v ≪ c),
which is actually pretty common for quantum mechanics, so we’ll say γ ≈ 1.
We also know from Newton’s second law (Eq. 4.2.6) that
\[ \sum \vec F_n = m\vec a_n \;\Rightarrow\; \frac{Zq^2}{4\pi\epsilon_0 r_n^2} = m\frac{v^2}{r_n} \;\Rightarrow\; v = \sqrt{\frac{Zq^2}{4\pi\epsilon_0 m r_n}} \]
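where r_n is the radius of the Bohr orbit (Eq. 9.1.2). Substituting this into
Eq. 9.1.8, we get
\[ \lambda = \frac{h}{mv} = \frac{h}{m}\sqrt{\frac{4\pi\epsilon_0 m r_n}{Zq^2}} = \sqrt{\frac{4\pi^2 h^2 \epsilon_0 r_n}{\pi Z q^2 m}} . \]
Multiplying and dividing by n² inside the root regroups this as
\[ \lambda = \sqrt{\frac{4\pi^2 r_n}{n^2}\left(\frac{n^2 h^2 \epsilon_0}{\pi Z q^2 m}\right)} \]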
and the quantity in parentheses is just r_n (Eq. 9.1.2). This drastically
simplifies the wavelength of the electron to
\[ \lambda = \frac{2\pi r_n}{n} = (0.3324\ \text{nm})\,\frac{n}{Z}, \tag{9.1.10} \]
where 2πr_n is the circumference of the orbit. In other words, you can fit n
wavelengths into each Bohr orbit.
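The standing-wave claim can be checked numerically. The sketch below computes the orbital speed from Newton's second law, the non-relativistic de Broglie wavelength from Eq. 9.1.8 with γ ≈ 1, and the number of wavelengths that fit around each Bohr orbit. The SI values of ε₀ and the electron mass are assumed rounded constants, and the function names are illustrative.

```python
import math

H = 6.626e-34      # Planck's constant (J s)
EPS0 = 8.854e-12   # vacuum permittivity (F/m), assumed rounded value
Q = 1.602e-19      # elementary charge (C)
M_E = 9.109e-31    # electron mass (kg), assumed rounded value

def bohr_radius(n, Z=1):
    """Eq. 9.1.2."""
    return n**2 * H**2 * EPS0 / (math.pi * Z * Q**2 * M_E)

def orbital_speed(r, Z=1):
    """Speed from the Coulomb force acting as the centripetal force."""
    return math.sqrt(Z * Q**2 / (4.0 * math.pi * EPS0 * M_E * r))

def de_broglie_wavelength(v):
    """Eq. 9.1.8 with gamma taken as 1 (non-relativistic)."""
    return H / (M_E * v)

def wavelengths_per_orbit(n, Z=1):
    r = bohr_radius(n, Z)
    return 2.0 * math.pi * r / de_broglie_wavelength(orbital_speed(r, Z))

for n in (1, 2, 3):
    print(n, wavelengths_per_orbit(n))   # each orbit holds n wavelengths
```

The n = 1 speed comes out near 2 × 10⁶ m/s, under 1% of the speed of light, which supports the γ ≈ 1 approximation.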
The result is important because, if the electron isn’t actually orbiting,
then it isn’t accelerating and it’s only going to emit a photon if the mat-
ter wave changes. The Bohr orbits are just the natural wavelengths of the
electron. Louis de Broglie’s solution answered not just one, but three of the
questions from the beginning of this section. Unfortunately, it still leaves
the issue of generalizing for multiple electrons. It also raises a question we
encountered for light in electrodynamics (Section 5.5) and then again in spe-
cial relativity (Section 7.1): What is actually vibrating? The answer is only
a bit further down the rabbit hole.
Schrödinger’s Equation
In Example 5.5.1, we described electromagnetic waves with a wave function
(Eq. 5.5.5). That wave function, in turn, was a solution to a wave equation
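(Eq. 5.5.2). If we assume the wave function of a particle has the same form,
then we get
\[ \psi(\vec r, t) = A\cos\left(\vec k \bullet \vec r - \omega t + \varphi_0\right), \]
which looks a lot like Eq. 9.1.7. The symbol ψ is used to clarify it is a
quantum wave function. For the sake of generality and ease of use, we’ll
write this as a complex exponential
\[ \psi(\vec r, t) = A\, e^{i(\vec k \bullet \vec r - \omega t + \varphi_0)} = A\, e^{i\varphi_0}\, e^{i(\hbar\vec k \bullet \vec r - \hbar\omega t)/\hbar} . \]
With $\vec p = \hbar\vec k$ and E = ħω, this becomes
\[ \psi(\vec r, t) = A\, e^{i(\vec p \bullet \vec r - Et)/\hbar}, \tag{9.2.1} \]
where ħ = 1.055 × 10⁻³⁴ J·s (or 6.582 × 10⁻¹⁶ eV·s) and the phase constant
e^{iφ₀} was merged with the coefficient A.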
Contrary to common practice, we’ve found a general wave function before
ever finding a wave equation that governs it. We can’t conveniently apply
these waves to specific cases without a wave equation, which was a problem on
Austrian physicist Erwin Schrödinger’s mind in 1926. His approach involved
the use of energy. The total energy of a particle moving non-relativistically
(i.e. γ ≈ 1) is given by its Hamiltonian,
\[ H = KE + PE = \frac{1}{2}\, m\vec v \bullet \vec v + V = \frac{m\vec v \bullet m\vec v}{2m} + V = \frac{\vec p \bullet \vec p}{2m} + V, \tag{9.2.2} \]
where $\vec p = m\vec v$ is the 3-momentum of the particle. We also know this H and
the E from Eq. 9.2.1 should be equivalent, so Hψ = Eψ.
Schrödinger knew this wave equation would need to involve time and
space derivatives like any other wave equation, so he reverse-engineered it
from the wave function (Eq. 9.2.1). The first space derivative of the wave
function is
\[ \Rightarrow\; E = -\frac{\hbar}{i}\frac{\partial}{\partial t} = i\hbar\frac{\partial}{\partial t}, \tag{9.2.5} \]
which is also an operator. Substituting Eqs. 9.2.2, 9.2.4, and 9.2.5 into
Hψ = Eψ, we get
\[ \frac{\vec p \bullet \vec p}{2m}\,\psi + V\psi = E\psi \]
\[ -\frac{\hbar^2}{2m}\vec\nabla^2\psi + V\psi = i\hbar\frac{\partial\psi}{\partial t}, \tag{9.2.6} \]
where ħ = 1.055 × 10⁻³⁴ J·s (or 6.582 × 10⁻¹⁶ eV·s), m is the particle’s mass,
and i = √−1 is the imaginary unit. This is called Schrödinger’s equation
and it’s the guiding principle of quantum mechanics. If the system is more
complicated than a single non-relativistic particle, then we just say
\[ H\psi = i\hbar\frac{\partial\psi}{\partial t}, \tag{9.2.7} \]
where H is the Hamiltonian on the wave function ψ. The form of H must
be determined for the specific case.
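As a small consistency check, a free plane wave should satisfy Eq. 9.2.6 with V = 0 as long as ħω = ħ²k²/2m. The sketch below tests this in one dimension with finite differences. The natural units (ħ = m = 1), the wave number, the sample point, and the step size are all arbitrary illustrative assumptions.

```python
import cmath

HBAR = 1.0   # natural units, chosen for illustration
MASS = 1.0

def psi(x, t, k=1.3):
    """1D plane wave with the free-particle relation hbar*omega = (hbar k)^2 / 2m."""
    omega = HBAR * k**2 / (2.0 * MASS)
    return cmath.exp(1j * (k * x - omega * t))

def kinetic_side(x, t, h=1e-4):
    """-(hbar^2 / 2m) d^2 psi / dx^2 via a central second difference."""
    d2 = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h**2
    return -(HBAR**2) / (2.0 * MASS) * d2

def time_side(x, t, h=1e-4):
    """i hbar d psi / dt via a central first difference."""
    dpsi_dt = (psi(x, t + h) - psi(x, t - h)) / (2.0 * h)
    return 1j * HBAR * dpsi_dt

# The two sides of Eq. 9.2.6 (with V = 0) at an arbitrary point.
print(kinetic_side(0.7, 0.2), time_side(0.7, 0.2))
```

Changing the relation between ω and k breaks the agreement, which is one way to see that Schrödinger's equation encodes H = p²/2m for a free particle.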
\[ -\frac{\hbar^2}{2m}\vec\nabla^2\psi + (q\phi)\,\psi = i\hbar\frac{\partial\psi}{\partial t}, \tag{9.2.8} \]
where V = qφ is the electric potential energy of the entire electron. Things
get more interesting if we add in a magnetic field.
Magnetic fields are weird and their potentials are even weirder. We first
saw the magnetic vector potential, $\vec A$, in Section 5.6 along with the electric
scalar potential, φ. However, where qφ has units of energy, $q\vec A$ has units of
momentum. We can generalize this in special relativity with $qA^\delta$, where $A^\delta$ is
the 4-potential (Eq. 7.5.5), since the 4-momentum (Eq. 7.4.22) incorporates
energy and momentum. The consequence of this is that 4-momentum is now
\[ p^\delta = m_p u^\delta + qA^\delta, \tag{9.2.9} \]
\[ \frac{1}{2m}\left(-i\hbar\vec\nabla - q\vec A\right) \bullet \left(-i\hbar\vec\nabla - q\vec A\right)\psi + (q\phi)\,\psi = i\hbar\frac{\partial\psi}{\partial t}, \tag{9.2.11} \]
which applies to a non-relativistic charge q (spin = 0). If you’re dealing
specifically with an electron, just say q = −e = −1.602 × 10⁻¹⁹ C.
We have yet to encounter any problems with our original assumption that
ψ*ψ is the volumetric charge density, ρ, because we have yet to ask the right
question. The charge distribution of our smeared-out electron is bound to
change over time in response to its external influences. This change can be
found by
\[ \frac{\partial\rho}{\partial t} = \frac{\partial}{\partial t}\left(\psi^*\psi\right) = \psi\frac{\partial\psi^*}{\partial t} + \psi^*\frac{\partial\psi}{\partial t}, \]
where we’ve used the derivative product rule (Eq. 3.1.5). We can use this
along with Eq. 5.3.22,
\[ \frac{\partial\rho}{\partial t} = -\vec\nabla \bullet \vec J, \]
to find an electric current density, $\vec J$. If we can show this current density
changes in time, then the charge is accelerating. Since accelerating charges
radiate light and we know that shouldn’t happen in this circumstance, we’ll
have a contradiction. The only conclusion will be that our original
assumption was false.
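We can start by eliminating the time derivatives on the right-hand side
using substitutions from Schrödinger’s equation (Eq. 9.2.11), which results
in
\[ \frac{\partial\rho}{\partial t} = -\frac{\psi}{i\hbar}\left(-i\hbar\frac{\partial\psi^*}{\partial t}\right) + \frac{\psi^*}{i\hbar}\left(i\hbar\frac{\partial\psi}{\partial t}\right) \]
\[ \frac{\partial\rho}{\partial t} = -\frac{\psi}{i\hbar}\left[\frac{1}{2m}\left(i\hbar\vec\nabla - q\vec A\right) \bullet \left(i\hbar\vec\nabla - q\vec A\right)\psi^* + (q\phi)\,\psi^*\right] + \frac{\psi^*}{i\hbar}\left[\frac{1}{2m}\left(-i\hbar\vec\nabla - q\vec A\right) \bullet \left(-i\hbar\vec\nabla - q\vec A\right)\psi + (q\phi)\,\psi\right]. \]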
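We have to take the complex conjugate of Schrödinger’s equation when
operating on ψ*. If we move some factors around,
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[\frac{\psi}{i\hbar}\left(i\hbar\vec\nabla - q\vec A\right) \bullet \left(i\hbar\vec\nabla - q\vec A\right)\psi^* + \frac{2m}{i\hbar}\,\psi\,(q\phi)\,\psi^*\right] - \frac{1}{2m}\left[-\frac{\psi^*}{i\hbar}\left(-i\hbar\vec\nabla - q\vec A\right) \bullet \left(-i\hbar\vec\nabla - q\vec A\right)\psi - \frac{2m}{i\hbar}\,\psi^*(q\phi)\,\psi\right] \]
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[\frac{\psi}{i\hbar}\left(i\hbar\vec\nabla - q\vec A\right) \bullet \left(i\hbar\vec\nabla - q\vec A\right)\psi^*\right] - \frac{1}{2m}\left[-\frac{\psi^*}{i\hbar}\left(-i\hbar\vec\nabla - q\vec A\right) \bullet \left(-i\hbar\vec\nabla - q\vec A\right)\psi\right], \]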
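and expand the two binomial products,
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[\frac{\psi}{i\hbar}\left(i^2\hbar^2\vec\nabla^2 - q\vec A \bullet i\hbar\vec\nabla - i\hbar\vec\nabla \bullet q\vec A + q^2\vec A \bullet \vec A\right)\psi^*\right] - \frac{1}{2m}\left[-\frac{\psi^*}{i\hbar}\left(i^2\hbar^2\vec\nabla^2 + q\vec A \bullet i\hbar\vec\nabla + i\hbar\vec\nabla \bullet q\vec A + q^2\vec A \bullet \vec A\right)\psi\right] \]
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[i\hbar\,\psi\vec\nabla^2\psi^* - \psi\, q\vec A \bullet \vec\nabla\psi^* - \psi\vec\nabla \bullet q\vec A\psi^* + \frac{q^2}{i\hbar}\,\psi\vec A \bullet \vec A\psi^*\right] - \frac{1}{2m}\left[-i\hbar\,\psi^*\vec\nabla^2\psi - \psi^* q\vec A \bullet \vec\nabla\psi - \psi^*\vec\nabla \bullet q\vec A\psi - \frac{q^2}{i\hbar}\,\psi^*\vec A \bullet \vec A\psi\right], \]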
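a total of four terms cancel giving us
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[i\hbar\,\psi\vec\nabla^2\psi^* - \psi\, q\vec A \bullet \vec\nabla\psi^* - \psi\vec\nabla \bullet q\vec A\psi^*\right] - \frac{1}{2m}\left[-i\hbar\,\psi^*\vec\nabla^2\psi - \psi^* q\vec A \bullet \vec\nabla\psi - \psi^*\vec\nabla \bullet q\vec A\psi\right]. \]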
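We need to be a bit more careful with the remaining terms. If we do some
voodoo math by adding in two opposite extra terms ($i\hbar\vec\nabla\psi \bullet \vec\nabla\psi^*$), we get
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[i\hbar\,\psi\vec\nabla^2\psi^* + i\hbar\vec\nabla\psi \bullet \vec\nabla\psi^* - \psi\, q\vec A \bullet \vec\nabla\psi^* - \psi\vec\nabla \bullet q\vec A\psi^*\right] - \frac{1}{2m}\left[-i\hbar\,\psi^*\vec\nabla^2\psi - i\hbar\vec\nabla\psi^* \bullet \vec\nabla\psi - \psi^* q\vec A \bullet \vec\nabla\psi - \psi^*\vec\nabla \bullet q\vec A\psi\right]. \]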
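Regrouping a few things and using the derivative product rule (Eq. 3.1.5) in
reverse results in
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[i\hbar\left(\psi\vec\nabla^2\psi^* + \vec\nabla\psi \bullet \vec\nabla\psi^*\right) - \left(\psi^* q\vec A \bullet \vec\nabla\psi + \psi\vec\nabla \bullet q\vec A\psi^*\right)\right] - \frac{1}{2m}\left[-i\hbar\left(\psi^*\vec\nabla^2\psi + \vec\nabla\psi^* \bullet \vec\nabla\psi\right) - \left(\psi\, q\vec A \bullet \vec\nabla\psi^* + \psi^*\vec\nabla \bullet q\vec A\psi\right)\right] \]
\[ \frac{\partial\rho}{\partial t} = -\frac{1}{2m}\left[i\hbar\vec\nabla \bullet \left(\psi\vec\nabla\psi^*\right) - \vec\nabla \bullet \left(\psi\, q\vec A\psi^*\right)\right] - \frac{1}{2m}\left[-i\hbar\vec\nabla \bullet \left(\psi^*\vec\nabla\psi\right) - \vec\nabla \bullet \left(\psi^* q\vec A\psi\right)\right] \]
\[ \frac{\partial\rho}{\partial t} = -\vec\nabla \bullet \left\{\frac{1}{2m}\left[\psi\left(i\hbar\vec\nabla - q\vec A\right)\psi^* + \psi^*\left(-i\hbar\vec\nabla - q\vec A\right)\psi\right]\right\}, \tag{9.2.12} \]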
which is now written as a divergence. It is clear from Eqs. 9.2.12 and 5.3.22
that the current density is given by
\[ \vec J = \frac{1}{2m}\left[\psi\left(i\hbar\vec\nabla - q\vec A\right)\psi^* + \psi^*\left(-i\hbar\vec\nabla - q\vec A\right)\psi\right]. \tag{9.2.13} \]
\[ \vec J = \frac{1}{2m}\left[\left(\psi^*\,\vec p\,\psi - \psi\,\vec p\,\psi^*\right) - 2q\vec A\,\psi^*\psi\right] \tag{9.2.15} \]
to more clearly separate the magnetic contribution.
This electric current density is dependent on the wave function ψ. According
to the general Schrödinger’s equation (Eq. 9.2.7), ψ has a non-zero
time derivative if it has a non-zero Hamiltonian. This implies the current
density, $\vec J$, will also have a non-zero time derivative meaning the charge
distribution will radiate light. Therefore,
and we are back where we started. What kind of waves are these? Later in
1926, a German physicist named Max Born suggested an answer: waves of
probability. The idea is the electron isn’t actually smeared-out across a Bohr
orbit. As Richard Feynman once said, “The electron is either here, or there,
or somewhere else; but, wherever it is, it is a point charge.”
The quantity ψ*ψ is just the probability density (generally, probability
per unit volume) of finding the electron in any particular place. Another way
to say this is ψ*ψ dx dy dz is the probability of finding the electron in the
infinitesimal cube between (x, y, z) and (x + dx, y + dy, z + dz). In a more
practical sense,
\[ P = \int_{x_1}^{x_2}\int_{y_1}^{y_2}\int_{z_1}^{z_2} \psi^*\psi \; dx\, dy\, dz \tag{9.2.16} \]
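To make Eq. 9.2.16 concrete, the sketch below integrates ψ*ψ numerically for a one-dimensional stand-in. The normalized Gaussian wave function, its width σ, and the midpoint rule are illustrative assumptions chosen so that the answer can be compared against the familiar "68% within one σ" result. A real problem would use the three-dimensional wave function of the system.

```python
import math

def psi(x, sigma=1.0):
    """Normalized real Gaussian wave function (an assumed example state)."""
    return (2.0 * math.pi * sigma**2) ** -0.25 * math.exp(-x**2 / (4.0 * sigma**2))

def probability(x1, x2, sigma=1.0, steps=2000):
    """Midpoint-rule estimate of the integral of psi* psi from x1 to x2."""
    dx = (x2 - x1) / steps
    total = 0.0
    for i in range(steps):
        x = x1 + (i + 0.5) * dx
        total += psi(x, sigma) ** 2 * dx   # psi is real here, so psi* psi = psi^2
    return total

print(probability(-1.0, 1.0))    # about 0.683
print(probability(-10.0, 10.0))  # essentially 1: the particle is somewhere
```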
• position, $\vec r$,
• momentum, $\vec p$,
• energy, H,
• angular momentum, $\vec L$, and
• spin, $\vec S$.
of an arbitrary observable, $Q(\vec r, \vec p, t)$. If you were to perform many
measurements of the observable, Q, on a particle in the state, ψ, then Eq. 9.3.1
predicts the average value of those measurements. Specifically, this is a weighted
average just like the atomic masses on the periodic table. For average atomic
mass, there is a finite number of possibilities (i.e. it’s discrete), so
\[ \langle m \rangle = \sum_i m_i \,(\text{isotope abundance})_i = \sum_i m_i P_i , \]
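The discrete weighted average is simple to compute. The sketch below applies it to chlorine, using approximate isotope masses and natural abundances that are assumed here for illustration (they are not taken from the text above).

```python
def expectation_value(values, probabilities):
    """Discrete weighted average: the sum of m_i * P_i."""
    return sum(m * p for m, p in zip(values, probabilities))

# Approximate data for chlorine (assumed illustrative values).
masses = [34.96885, 36.96590]   # atomic mass units for Cl-35 and Cl-37
abundances = [0.7576, 0.2424]   # fractional abundances, summing to 1

print(expectation_value(masses, abundances))   # close to 35.45 u
```

No single chlorine atom has a mass of 35.45 u, just as a single quantum measurement need not return the expectation value.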
where both sides result in the same expectation value for Q (and we’ve used
“all space” in place of the cumbersome ±∞ limits). We can and will keep
using integrals and functions to make quantum predictions, but they’re not
always the best way. Linear algebra says that functions can also be expressed
as vectors or matrices, which is sometimes more convenient. It’s always nice
to have alternative options.
Bra-Ket Notation
Representing states as functions was certainly favored by Erwin Schrödinger.
However, by the 1930s, it was becoming clear that this method had its draw-
backs. Max Born and Werner Heisenberg had already begun using matrices.
In 1939, Paul Dirac published an article titled A New Notation For Quantum
Mechanics in which he attempted to bridge the gap between these different
methods.
Dirac’s method involved representing states as vectors in a Hilbert space.
A Hilbert space is just a space with more than three dimensions like the one
David Hilbert used for spacetime in general relativity (see Chapter 8). In
quantum mechanics though, the “space” is not necessarily spatial and can
have as many “dimensions” as necessary. Oh, and it’s also complex.
The new notation can be defined by extending the brackets of the
expectation value (Eq. 9.3.1) as
where |ψ⟩ is the state vector (replacing the state function) for the particle.
The |ψ⟩ is called a “ket” vector and the ⟨ψ| is called a “bra” vector, which is
a play on words since we often refer to this notation as “bracket” (bra-ket)
notation. The bra and ket vectors must be complex conjugates,
but what exactly does an operation like this mean? Let’s start by trying to
show how the methods compare. According to Eq. 9.2.17,
\[ \iiint_{\text{all space}} \psi^*(\vec r, t)\,\psi(\vec r, t)\; dx\, dy\, dz = 1, \]
where ψ is written in the position basis. Well, |ψ⟩ is a vector, which is only
projected onto a basis. We can project onto the Cartesian position basis
using
\[ |\psi\rangle = \iiint_{\text{all space}} |\vec r\rangle \langle\vec r|\psi\rangle \; dx\, dy\, dz . \tag{9.3.7} \]
\[ \iiint_{\text{all space}} \langle\psi|\vec r\rangle \langle\vec r|\psi\rangle \; dx\, dy\, dz = 1, \]
which matches Eq. 9.2.17 as long as $\psi(\vec r, t) = \langle\vec r|\psi\rangle$ and $\psi^*(\vec r, t) = \langle\psi|\vec r\rangle$.
We can, therefore, interpret ⟨ψ|ψ⟩ as a probability density. It then
follows that
is a probability of 100%, which makes sense since both states in the operation
are the same. As a less trivial circumstance, consider a particle in a state
|ψ⟩. The probability of finding it in a state |φ⟩ would be
which is only 100% if |ψ⟩ = |φ⟩. The advantage here is we never had to
project onto any particular basis to discover the probability.
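For a finite-dimensional example, the overlap between two states can be computed with plain lists of complex numbers. The sketch below assumes the standard Born-rule form, probability = |⟨φ|ψ⟩|², and uses two made-up normalized states in a two-dimensional space. The state components and function names are illustrative only.

```python
import math

def inner(phi, psi):
    """<phi|psi>: conjugate the components of phi, multiply, and sum."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))

def probability(phi, psi):
    """Assumed Born-rule form: |<phi|psi>|^2 for normalized states."""
    return abs(inner(phi, psi)) ** 2

psi = [1 + 0j, 0j]                                      # a normalized state
phi = [1 / math.sqrt(2) + 0j, 1j / math.sqrt(2)]        # another normalized state

print(probability(psi, psi))   # identical states give 1
print(probability(phi, psi))   # partial overlap gives 1/2
```

The conjugation inside `inner` is what the bra carries: swapping the roles of the two states conjugates the inner product but leaves the probability unchanged.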
Many undergraduate quantum textbooks will focus on functions and in-
tegrals because they’re far more familiar to students. As a result, those
students tend to be at a disadvantage when taking graduate courses or read-
ing articles on their own. Undergraduate courses that also expose students to
Max Born’s matrix method put those students in a slightly better position.
However, matrix operations can still be more complicated (e.g. multiplication
involves a transpose matrix) than bra-ket operations, which may cause some
confusion. I will do my best to expose you to all three methods (functions,
matrices, and bra-ket) as we go.
Since H only has space-derivatives and the right-hand side only has time
derivatives,
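\[ U\, H\Psi = i\hbar\,\Psi\,\frac{\partial U}{\partial t} \]
\[ \frac{1}{\Psi}\, H\Psi = i\hbar\,\frac{1}{U}\frac{\partial U}{\partial t}, \tag{9.3.11} \]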
and we can’t cancel the Ψ’s on the left because H must operate first. The
benefit here is that everything on the left is only a function of space and
everything on the right is only a function of time. We also know that space
and time are independent variables, so changing one will not necessarily
change the other. The only case in which this doesn’t contradict Eq. 9.3.11
is when both sides are constant.
The easy part of solving Eq. 9.3.11 is the time-dependence. Since H has
units of energy, it seems fitting to call the stuff on the right E,
\[ E = i\hbar\,\frac{1}{U}\frac{\partial U}{\partial t}, \]
which is very similar to Eq. 9.2.5 with the exception that this E is a real
constant (rather than an operator). If we move some things around, then we
get
\[ \frac{\partial U}{\partial t} = \frac{E}{i\hbar}\, U = \frac{-iE}{\hbar}\, U . \]
There is only one function with a first derivative proportional to itself: the
natural exponential function. Therefore,
\[ U(t) = e^{-iEt/\hbar} \quad\Rightarrow\quad \psi(\vec r, t) = \Psi(\vec r)\, e^{-iEt/\hbar}, \]
where we now only need to solve for $\Psi(\vec r)$ (the space dependence).
The separable solutions given by Eq. 9.3.13 are called stationary states
because the particle won’t transition out of these states on its own. They are
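stable states. If we substitute them into Schrödinger’s equation (Eq. 9.2.7),
then
\[ H\left(\Psi\, e^{-iEt/\hbar}\right) = i\hbar\frac{\partial}{\partial t}\left(\Psi\, e^{-iEt/\hbar}\right) \]
\[ e^{-iEt/\hbar}\, H\Psi = E\, e^{-iEt/\hbar}\,\Psi \]
\[ H\Psi = E\Psi . \tag{9.3.14} \]
\[ H|\psi\rangle = E|\phi\rangle , \]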
where δ_ij is the Kronecker delta (Eq. 6.2.2). This is useful because it’s a
complete set of orthonormal vectors. That makes it an orthonormal basis for
the Hilbert space, meaning all possible full-states can be written as a linear
combination of those eigenvectors:
\[ |\psi\rangle = \sum_{n=1}^{\infty} c_n\, e^{-iE_n t/\hbar}\, |\Psi_n\rangle , \tag{9.3.18} \]
where c_n are just constant complex coefficients. We already stated this for
functions as Eq. 9.3.10. Using Eq. 9.3.12, we get
\[ \psi(\vec r, t) = \sum_{n=1}^{\infty} c_n\, \Psi_n(\vec r)\, e^{-iE_n t/\hbar} , \tag{9.3.19} \]
All of these observables are operators, so the question is: “Are those
operations commutative?” As it turns out, position and momentum do not commute
(i.e. $\vec r \bullet \vec p\,\psi \neq \vec p \bullet \vec r\,\psi$). Their actual relationship is given by something called
a commutator,
which is an operator itself. To find its non-zero value, we need to first make
it operate on a general state ψ:
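\[ [\vec{r}, \vec{p}]\,\psi = \vec{r} \bullet \vec{p}\,\psi - \vec{p} \bullet \vec{r}\,\psi. \]
Using the chain rule for derivatives (Eq. 3.1.2) on the last term, we get
\[ [\vec{r}, \vec{p}]\,\psi = i\hbar \left[ -\vec{r} \bullet \vec{\nabla}\psi + \left( \vec{\nabla} \bullet \vec{r} \right) \psi + \vec{r} \bullet \vec{\nabla}\psi \right] = i\hbar \left[ \left( \vec{\nabla} \bullet \vec{r} \right) \psi \right]. \]
\[ \vec{\nabla} \bullet \vec{r} = \frac{\partial x}{\partial x} + \frac{\partial y}{\partial y} + \frac{\partial z}{\partial z} = 1 + 1 + 1 = 3 \]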
so
Traditionally, this is written in only one dimension. By the vector dot prod-
uct (Eq. 2.2.2), we can say
Since the orientation of the axes doesn’t matter, all three terms should be
equal, so
\[ [x, p_x] = i\hbar . \tag{9.3.22} \]
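The one-dimensional commutator can be checked numerically. The sketch below is only an illustration: it assumes natural units ($\hbar = 1$), a hypothetical Gaussian test state `psi`, and a central finite difference standing in for the derivative in $p_x = -i\hbar\,\partial/\partial x$.

```python
import cmath

HBAR = 1.0  # natural units (an assumption made for this sketch)

def psi(x):
    """Hypothetical test state: a Gaussian with a plane-wave phase."""
    return cmath.exp(-x * x / 2 + 1.5j * x)

def p_op(f, x, h=1e-4):
    """Momentum operator p = -i*hbar*d/dx, using a central difference."""
    return -1j * HBAR * (f(x + h) - f(x - h)) / (2 * h)

def commutator_on_psi(x):
    """Return ([x, p] psi)(x) = x*(p psi)(x) - (p (x psi))(x)."""
    def x_psi(y):
        return y * psi(y)
    return x * p_op(psi, x) - p_op(x_psi, x)

# Dividing by psi should leave approximately i*hbar, whatever x we pick.
ratio = commutator_on_psi(0.7) / psi(0.7)
print(ratio)
```

The ratio does not depend on the point chosen or on the details of the test state, which is the sense in which the commutator is "an operator itself" rather than a property of a particular state.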
[A, B] ≡ AB − BA , (9.3.29)
then we’ll have something dependent only on what A and B are rather than their
specific expectation values. The complex square on the right-hand side of
Eq. 9.3.28 is necessary because $\langle AB \rangle - \langle A \rangle \langle B \rangle$ isn’t necessarily real. Any
complex number, z, will always obey
\[ \|z\|^2 = \mathrm{Re}(z)^2 + \mathrm{Im}(z)^2 , \]
where Re(z) denotes the real part of z and Im(z) denotes the imaginary part
of z. From the definition of z, we also know
\[ z - z^* = 2i \, \mathrm{Im}(z) \]
\[ \mathrm{Im}(z) = \frac{1}{2i} \left( z - z^* \right) . \tag{9.3.30} \]
Since, with squares, everything is now positive and real, we can say
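Using this in Eq. 9.3.28 with $z = \langle AB \rangle - \langle A \rangle \langle B \rangle$ and $z^* = \langle BA \rangle - \langle B \rangle \langle A \rangle$,
we get
\[ \sigma_A^2 \sigma_B^2 \geq \|z\|^2 \geq \left[ \frac{1}{2i} \left( z - z^* \right) \right]^2 \]
\[ \sigma_A^2 \sigma_B^2 \geq \left[ \frac{1}{2i} \left( \langle AB \rangle - \langle A \rangle \langle B \rangle - \langle BA \rangle + \langle B \rangle \langle A \rangle \right) \right]^2 \]
\[ \sigma_A^2 \sigma_B^2 \geq \left[ \frac{1}{2i} \left( \langle AB \rangle - \langle BA \rangle \right) \right]^2 \]
\[ \sigma_A^2 \sigma_B^2 \geq \left[ \frac{1}{2i} \langle AB - BA \rangle \right]^2 , \]
which just involves the commutator (Eq. 9.3.29). The final result is then
\[ \sigma_A^2 \sigma_B^2 \geq \left[ \frac{1}{2i} \langle [A, B] \rangle \right]^2 , \tag{9.3.31} \]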
\[ \left[ H, L^2 \right] = 0 \tag{9.3.32} \]
\[ \left[ H, L_z \right] = 0 \tag{9.3.33} \]
which will become very important in Chapter 10. If A and B are incompatible
observables, then $[A, B] \neq 0$ and you’ll have to find Eq. 9.3.31 for that specific
pair. For example, the components of angular momentum are incompatible
with each other as shown by
• Angular Momentum along x and Angular Momentum along y:
\[ [L_x, L_y] = i\hbar L_z \tag{9.3.35} \]
2. The more precisely you can predict position, the less precise your prediction
of momentum will be (and vice versa).
It should be emphasized that these precision issues are not due to limits of
our technology. Eq. 9.3.31 was derived using only generic statistics and the
knowledge that matter behaves like waves. It is a fundamental result of the
mechanics of matter waves and, therefore, a fundamental property of the
universe.
Recall that, for any observable, there exists a set of stationary states
(Eq. 9.3.13) that are described partially by eigenstates. These stationary
states are states of definite value. If observables are compatible, then they
share a complete set of eigenstates (i.e. it is possible to find a particle in a
stationary state of both observables at the same time). That means both can
be predicted with precision. However, if observables are incompatible, then
you will never find the particle in a stationary state of both at the same
time. That means if one has a definite value, then the other does not (i.e. it
is less precise).
3. Harmonic Oscillators;
Figure 9.8: This shows the (one-dimensional) infinite square well’s potential energy (Eq.
9.4.1) graphed against position, x.
for a well with an arbitrary width, a. Figure 9.8 shows clearly why we refer to
this as a “well.” There is no potential energy inside, but it’s infinite outside.
Basically, this particle is never getting out because it’s in an infinitely
deep hole. In three dimensions, it’s usually stated as
\[ V(x, y, z) = \begin{cases} 0, & \text{if } 0 \leq x \leq a_x ,\ 0 \leq y \leq a_y , \text{ and } 0 \leq z \leq a_z \\ \infty, & \text{otherwise} \end{cases} \tag{9.4.2} \]
Example 9.4.1
What are the stationary states (and corresponding energies) for a non-relativistic
particle in a one-dimensional infinite square well?
• First, there are no stationary states (i.e. ψ(x, t) = 0) outside the well.
It is impossible to achieve infinite potential energy. All we really need
to find are the stationary states inside the well.
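\[ -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi}{\partial x^2} = E\Psi \tag{9.4.3} \]
where we’ve set V = 0 and $\vec{\nabla}^2$ is only in one dimension, x.
\[ \frac{\partial^2 \Psi}{\partial x^2} = -\frac{2mE}{\hbar^2} \Psi , \]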
which is a very common differential equation. There are only two func-
tions with second derivatives proportional to the negative of themselves:
sin(kx) and cos(kx). The general solution will be a linear combination
of the two,
\[ \frac{\partial^2 \Psi}{\partial x^2} = -k^2 \Psi = -\frac{2mE}{\hbar^2} \Psi \]
\[ \Rightarrow k = \frac{\sqrt{2mE}}{\hbar} , \]
so the eigenstates take the form
\[ \Psi(x) = C_1 \sin\left( \frac{\sqrt{2mE}}{\hbar} x \right) + C_2 \cos\left( \frac{\sqrt{2mE}}{\hbar} x \right) . \]
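\[ \frac{2mE}{\hbar^2} a^2 = n^2 \pi^2 \]
\[ E_n = \frac{n^2 \pi^2 \hbar^2}{2ma^2} , \tag{9.4.5} \]
\[ \psi^* \psi = \Psi^* e^{iEt/\hbar} \, \Psi e^{-iEt/\hbar} = \Psi^* \Psi , \]
we get
\[ \int_{-\infty}^{+\infty} \Psi^* \Psi \, dx = 1 . \]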
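\[ \int_{-\infty}^{0} 0 \, dx + \int_{0}^{a} \left[ C_1 \sin\left( \frac{\sqrt{2mE}}{\hbar} x \right) \right]^2 dx + \int_{a}^{+\infty} 0 \, dx = 1 \]
\[ C_1^2 \int_{0}^{a} \sin^2\left( \frac{\sqrt{2mE}}{\hbar} x \right) dx = 1 . \]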
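we can say
\[ C_1^2 \left[ \frac{x}{2} - \frac{\hbar}{4\sqrt{2mE}} \sin\left( 2 \frac{\sqrt{2mE}}{\hbar} x \right) \right]_0^a = 1 \]
\[ C_1^2 \left[ \frac{a}{2} - 0 - 0 + 0 \right] = 1 \quad \Rightarrow \quad C_1 = \sqrt{\frac{2}{a}} . \]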
Therefore the eigenstates take the form
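\[ \Psi_n(x) = \sqrt{\frac{2}{a}} \sin\left( \frac{\sqrt{2mE_n}}{\hbar} x \right) \]
or, better yet,
\[ \Psi_n(x) = \sqrt{\frac{2}{a}} \sin\left( \frac{n\pi}{a} x \right) , \tag{9.4.7} \]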
Figure 9.9: This shows the first three eigenstates of the (one-dimensional) infinite square
well given by Eq. 9.4.7.
\[ \psi_n(x, t) = \sqrt{\frac{2}{a}} \sin\left( \frac{n\pi}{a} x \right) e^{-i n^2 \pi^2 \hbar \, t/(2ma^2)} , \tag{9.4.8} \]
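The energies (Eq. 9.4.5) and eigenstates (Eq. 9.4.7) of the infinite well are easy to check numerically. The sketch below is only an illustration: it assumes an electron in a well of width a = 0.1 nm (numbers borrowed from Example 9.4.3, not part of this example), and uses a simple midpoint rule to confirm the normalization.

```python
import math

HBAR = 6.582e-16   # eV*s
M_E = 5.686e-30    # electron mass in eV*s^2/nm^2 (assumed particle)

def energy(n, a, m=M_E):
    """Infinite-well energy E_n = n^2 pi^2 hbar^2 / (2 m a^2), in eV."""
    return n**2 * math.pi**2 * HBAR**2 / (2 * m * a**2)

def eigenstate(n, x, a):
    """Eigenstate sqrt(2/a) sin(n pi x / a); zero outside the well."""
    if x < 0 or x > a:
        return 0.0
    return math.sqrt(2 / a) * math.sin(n * math.pi * x / a)

def norm(n, a, steps=2000):
    """Midpoint-rule estimate of the integral of Psi_n^2 over the well."""
    dx = a / steps
    return sum(eigenstate(n, (i + 0.5) * dx, a) ** 2 for i in range(steps)) * dx

a = 0.1  # nm (assumed width)
for n in (1, 2, 3):
    print(n, energy(n, a), norm(n, a))
```

The energies grow as n², and every eigenstate integrates to one regardless of n, which is why the same constant $\sqrt{2/a}$ works for all of them.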
Figure 9.10: This shows the first three stationary states of the (one-dimensional) infinite
square well given by Eq. 9.4.8 at some t ≠ 0. For these stationary states at t = 0, refer to
Figure 9.9.
Example 9.4.2
What are the stationary states (and corresponding energies) for a non-relativistic
particle in a three-dimensional infinite square well?
• Just like with the one-dimensional case, there are no stationary states
(i.e. ψ(x, t) = 0) outside the well. It is impossible to achieve infinite
potential energy. All we really need to find are the stationary states
inside the well.
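\[ -\frac{\hbar^2}{2m} \left( \frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} + \frac{\partial^2 \Psi}{\partial z^2} \right) = E\Psi \tag{9.4.10} \]
where we’ve set V = 0 and $\vec{\nabla}^2$ has been expanded into three dimensions.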
This is not always the case, since different potential energy functions
can cause problems. It only worked this time because V = 0.
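\[ -\frac{\hbar^2}{2m} \left( YZ \frac{\partial^2 X}{\partial x^2} + XZ \frac{\partial^2 Y}{\partial y^2} + XY \frac{\partial^2 Z}{\partial z^2} \right) = E \, XYZ . \]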
Dividing through by XY Z, we get
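\[ -\frac{\hbar^2}{2m} \left( \frac{1}{X} \frac{\partial^2 X}{\partial x^2} + \frac{1}{Y} \frac{\partial^2 Y}{\partial y^2} + \frac{1}{Z} \frac{\partial^2 Z}{\partial z^2} \right) = E \]
\[ -\frac{\hbar^2}{2m} \frac{1}{X} \frac{\partial^2 X}{\partial x^2} - \frac{\hbar^2}{2m} \frac{1}{Y} \frac{\partial^2 Y}{\partial y^2} - \frac{\hbar^2}{2m} \frac{1}{Z} \frac{\partial^2 Z}{\partial z^2} = E . \]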
We can see the three individual terms add to be a constant. However,
since the terms are not codependent (i.e. they’re functions of different
independent variables), then their sum can be constant only if each
term is individually constant. Therefore, this is just three independent
differential equations:
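\[ -\frac{\hbar^2}{2m} \frac{1}{X} \frac{\partial^2 X}{\partial x^2} = E_x , \qquad -\frac{\hbar^2}{2m} \frac{1}{Y} \frac{\partial^2 Y}{\partial y^2} = E_y , \qquad -\frac{\hbar^2}{2m} \frac{1}{Z} \frac{\partial^2 Z}{\partial z^2} = E_z ; \]
where $E = E_x + E_y + E_z$.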
• These three differential equations look identical to the one-dimensional
case (Eq. 9.4.3), so they have the same general solutions. Since the
boundary conditions are also identical, the specific solutions will be
the same. Using these variables, we get eigenstates in the form
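\[ X(x) = \sqrt{\frac{2}{a_x}} \sin\left( \frac{n_x \pi}{a_x} x \right) \tag{9.4.12a} \]
\[ Y(y) = \sqrt{\frac{2}{a_y}} \sin\left( \frac{n_y \pi}{a_y} y \right) \tag{9.4.12b} \]
\[ Z(z) = \sqrt{\frac{2}{a_z}} \sin\left( \frac{n_z \pi}{a_z} z \right) \tag{9.4.12c} \]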
with energies
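\[ E_{n_x} = \frac{n_x^2 \pi^2 \hbar^2}{2m a_x^2} \tag{9.4.13a} \]
\[ E_{n_y} = \frac{n_y^2 \pi^2 \hbar^2}{2m a_y^2} \tag{9.4.13b} \]
\[ E_{n_z} = \frac{n_z^2 \pi^2 \hbar^2}{2m a_z^2} \tag{9.4.13c} \]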
where the appropriate quantities have directional labels.
• If we piece Eq. Sets 9.4.12 and 9.4.13 together, then the eigenstates are
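\[ \Psi_{n_x n_y n_z} = \sqrt{\frac{8}{a_x a_y a_z}} \sin\left( \frac{n_x \pi}{a_x} x \right) \sin\left( \frac{n_y \pi}{a_y} y \right) \sin\left( \frac{n_z \pi}{a_z} z \right) \tag{9.4.14} \]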
with energies
Figure 9.11: This is a representation of the lowest energy level (nx ny nz is 111) for the
three-dimensional infinite well with ax = ay = az = a (boundaries shown). The value of
Ψ(x, y, z) (Eq. 9.4.14) is shown as the size of the blocks. The color red indicates the values
of Ψ are positive.
are all the same energy (shown in Figure 9.12). When this happens, we
say those states are degenerate. The phenomenon of degeneracy is
a consequence of working in a three-dimensional world and is common
in more complex examples as well.
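Degeneracy in the cubic well can be listed by brute force. The sketch below is only an illustration: it assumes $a_x = a_y = a_z = a$, so that summing Eqs. 9.4.13a through 9.4.13c gives an energy proportional to $n_x^2 + n_y^2 + n_z^2$, and the helper name `cubic_levels` is hypothetical.

```python
from collections import defaultdict
from itertools import product

def cubic_levels(n_max):
    """Group (nx, ny, nz) by nx^2 + ny^2 + nz^2, i.e. by the energy in
    units of pi^2 hbar^2 / (2 m a^2) for a cubic well."""
    levels = defaultdict(list)
    for state in product(range(1, n_max + 1), repeat=3):
        levels[sum(n * n for n in state)].append(state)
    return dict(sorted(levels.items()))

levels = cubic_levels(3)
for energy_units, states in list(levels.items())[:4]:
    print(energy_units, len(states), states)
```

The lowest level contains only the 111 state, while the second level contains three states that differ only in which axis carries the larger quantum number.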
Figure 9.12: This is a representation of the second energy level for the three-dimensional
infinite well with ax = ay = az = a (boundaries shown). Each diagram is labeled by
nx ny nz . The value of Ψ(x, y, z) (Eq. 9.4.14) is shown as the size of the blocks. The color
red indicates the values of Ψ are positive and the color blue indicates the values of Ψ are
negative.
for a well with an arbitrary width, a. This is much more realistic, but it
doesn’t make our lives very easy.
In Chapter 1, we said the coordinate system was just a tool and some
choices are better than others. If we shift the coordinate origin to the middle of
the well, then
\[ V(x) = \begin{cases} -V_0 , & \text{if } -\frac{a}{2} \leq x \leq +\frac{a}{2} \\ 0, & \text{otherwise} \end{cases} \tag{9.4.17} \]
for a well with an arbitrary width, a (shown in Figure 9.13). This potential
energy function has symmetry, so the solutions will also have symmetry.
In this case, it’s symmetry over the vertical axis (i.e. the potential energy
function is even), so it has the same value when you transform by x → −x.
If we do the same transformation on the time-independent Schrödinger
equation (Eq. 9.3.14), we get
\[ -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi(-x)}{\partial x^2} + V(-x) \, \Psi(-x) = E \, \Psi(-x) . \]
However, since V (−x) = V (x), this becomes
\[ -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi(-x)}{\partial x^2} + V(x) \, \Psi(-x) = E \, \Psi(-x) , \]
Figure 9.13: This shows the (one-dimensional) finite square well’s potential energy (Eq.
9.4.17) graphed against position, x.
Example 9.4.3
An electron, moving at non-relativistic speeds, is in a one-dimensional finite
square well (V0 = 250 eV, a = 0.1 nm). What are the stationary states (and
corresponding energies) for the electron?
• We’re going to solve this problem by staying as general as possible for
as long as possible. This will allow us to see results common to all
finite wells.
• Unlike the infinite well (see Example 9.4.1), it is possible to find the
particle beyond the walls (i.e. for |x| > a/2). Think of it like probability
\[ -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi}{\partial x^2} = E\Psi , \tag{9.4.19} \]
where $\vec{\nabla}^2$ is only in one dimension, x.
• Moving some things around, we get
\[ \frac{\partial^2 \Psi}{\partial x^2} = \frac{-2mE}{\hbar^2} \Psi , \]
which looks a lot like what we got for the infinite well. However, re-
member E is negative because it’s an energy deficit, so
\[ \alpha = \frac{\sqrt{-2mE}}{\hbar} \tag{9.4.20} \]
must be positive and real. There is only one function with a second
derivative proportional to the positive of itself: the exponential, $e^{\pm \alpha x}$.
Therefore, the eigenstates are a linear combination of the two signs,
\[ \Psi(x) = C_1 e^{\sqrt{-2mE} \, x/\hbar} + C_2 e^{-\sqrt{-2mE} \, x/\hbar} \tag{9.4.21} \]
\[ -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi}{\partial x^2} - V_0 \Psi = E\Psi , \tag{9.4.22} \]
where $\vec{\nabla}^2$ is only in one dimension, x.
\[ \frac{\partial^2 \Psi}{\partial x^2} = \frac{-2m \left( E + V_0 \right)}{\hbar^2} \Psi . \]
must be positive and real (just like in the infinite well). There are only
two functions with second derivatives proportional to the negative of
themselves: sin(kx) and cos(kx). Therefore, the eigenstates are a linear
combination of the two,
\[ \Psi(x) = C_3 \sin\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} x \right) + C_4 \cos\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} x \right) \tag{9.4.24} \]
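\[ \Psi = \begin{cases} C_1 e^{\sqrt{-2mE} \, x/\hbar} , & \text{if } x < -\frac{a}{2} \\ C_3 \sin\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} x \right) + C_4 \cos\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} x \right) , & \text{if } |x| \leq \frac{a}{2} \\ C_6 e^{-\sqrt{-2mE} \, x/\hbar} , & \text{if } x > +\frac{a}{2} \end{cases} \]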
for each of the three regions. The boundaries at the walls of the well
are a bit trickier. We’ll need to reduce these functions a little more
before we can apply them.
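\[ \Psi_{\text{even}} = \begin{cases} C_1 e^{\sqrt{-2mE} \, x/\hbar} , & \text{if } x < -\frac{a}{2} \\ C_4 \cos\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} x \right) , & \text{if } |x| \leq \frac{a}{2} \\ C_1 e^{-\sqrt{-2mE} \, x/\hbar} , & \text{if } x > +\frac{a}{2} \end{cases} \tag{9.4.25} \]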
which still apply to all finite square wells described by Eq. 9.4.17.
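\[ \Rightarrow C_{1,\text{even}} = C_4 \, e^{\sqrt{-2mE} \, a/(2\hbar)} \cos\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} \frac{a}{2} \right) \tag{9.4.27} \]
\[ \Rightarrow C_{1,\text{odd}} = -C_3 \, e^{\sqrt{-2mE} \, a/(2\hbar)} \sin\left( \frac{\sqrt{2m(E + V_0)}}{\hbar} \frac{a}{2} \right) \tag{9.4.28} \]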
However, we know the parts outside the finite well “mirror” each other,
so
\[ 2 \int_{-\infty}^{-a/2} \Psi^2 \, dx + \int_{-a/2}^{+a/2} \Psi^2 \, dx = 1 , \tag{9.4.29} \]
where we’ve doubled the first term to make up for the loss of the third
term.
• We’ll start with the even solutions, but using α (Eq. 9.4.20) and k (Eq.
9.4.23) will make for an easier read. Eq. 9.4.29 becomes
\[ 2 \int_{-\infty}^{-a/2} \left[ C_1 e^{\alpha x} \right]^2 dx + \int_{-a/2}^{+a/2} \left[ C_4 \cos(kx) \right]^2 dx = 1 , \]
where we’ve substituted from Eq. 9.4.25. If we use Eq. 9.4.27, then
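\[ 2 \int_{-\infty}^{-a/2} \left[ C_4 e^{\alpha a/2} \cos\left( \frac{ka}{2} \right) e^{\alpha x} \right]^2 dx + \int_{-a/2}^{+a/2} \left[ C_4 \cos(kx) \right]^2 dx = 1 \]
\[ 2 C_4^2 e^{\alpha a} \cos^2\left( \frac{ka}{2} \right) \int_{-\infty}^{-a/2} e^{2\alpha x} \, dx + C_4^2 \int_{-a/2}^{+a/2} \cos^2(kx) \, dx = 1 . \]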
and
\[ \int_a^b \cos^2(kx) \, dx = \int_a^b \frac{1 + \cos(2kx)}{2} \, dx = \left[ \frac{x}{2} + \frac{\sin(2kx)}{4k} \right]_a^b , \tag{9.4.31} \]
we can say
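\[ 2 C_4^2 e^{\alpha a} \cos^2\left( \frac{ka}{2} \right) \left[ \frac{e^{2\alpha x}}{2\alpha} \right]_{-\infty}^{-a/2} + C_4^2 \left[ \frac{x}{2} + \frac{\sin(2kx)}{4k} \right]_{-a/2}^{+a/2} = 1 \]
\[ 2 C_4^2 e^{\alpha a} \cos^2\left( \frac{ka}{2} \right) \left[ \frac{e^{-\alpha a}}{2\alpha} - 0 \right] + C_4^2 \, 2 \left[ \frac{a}{4} + \frac{\sin(ka)}{4k} \right] = 1 \]
\[ C_4^2 \left[ \frac{1}{\alpha} \cos^2\left( \frac{ka}{2} \right) + \frac{a}{2} + \frac{1}{2k} \sin(ka) \right] = 1 \]
\[ C_4 = \sqrt{ \frac{1}{ \frac{1}{\alpha} \cos^2\left( \frac{ka}{2} \right) + \frac{a}{2} + \frac{1}{2k} \sin(ka) } } , \tag{9.4.32} \]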
where α is given by Eq. 9.4.20 and k by Eq. 9.4.23. The a/2 term is
the same as in the infinite well, but the additional terms (because they
involve α and k) depend on the energy of the state. The consequence
is C4 is not universal for the finite well like it was for the infinite case.
where α is given by Eq. 9.4.20 and k by Eq. 9.4.23. This looks a lot
like Eq. 9.4.32. We’ve just exchanged a cosine for a sine in the first
term and a plus for a minus in the third term.
• Using units to match the givens (nm and eV), $\hbar = 6.582 \times 10^{-16}$ eV s
and
\[ m = 511 \times 10^{3} \, \frac{\text{eV}}{c^2} = 5.686 \times 10^{-30} \, \frac{\text{eV s}^2}{\text{nm}^2} \]
and, if we divide the second equation by the first and move some things
around, we get
\[ -\alpha = -k \tan\left( k \frac{a}{2} \right) \]
\[ \frac{\alpha}{k} = \tan\left( k \frac{a}{2} \right) . \]
We know a = 0.1 nm as well as α (Eq. 9.4.34) and k (Eq. 9.4.35), so
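\[ \frac{ \frac{5.123}{\text{nm}} \sqrt{ \frac{-E}{\text{eV}} } }{ \frac{5.123}{\text{nm}} \sqrt{ \frac{E + 250 \text{ eV}}{\text{eV}} } } = \tan\left( \frac{5.123}{\text{nm}} \sqrt{ \frac{E + 250 \text{ eV}}{\text{eV}} } \; \frac{0.1 \text{ nm}}{2} \right) \]
\[ \sqrt{ \frac{-E}{E + 250 \text{ eV}} } = \tan\left( 0.2562 \sqrt{ \frac{E + 250 \text{ eV}}{\text{eV}} } \right) , \tag{9.4.36} \]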
which is the energy condition for even solutions. Only values of E
that satisfy Eq. 9.4.36 are allowed.
and, if we divide the first equation by the second and move some things
around, we get
\[ -\frac{1}{\alpha} = \frac{1}{k} \tan\left( k \frac{a}{2} \right) \]
\[ -\frac{k}{\alpha} = \tan\left( k \frac{a}{2} \right) . \]
We know a = 0.1 nm as well as α (Eq. 9.4.34) and k (Eq. 9.4.35), so
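\[ -\frac{ \frac{5.123}{\text{nm}} \sqrt{ \frac{E + 250 \text{ eV}}{\text{eV}} } }{ \frac{5.123}{\text{nm}} \sqrt{ \frac{-E}{\text{eV}} } } = \tan\left( \frac{5.123}{\text{nm}} \sqrt{ \frac{E + 250 \text{ eV}}{\text{eV}} } \; \frac{0.1 \text{ nm}}{2} \right) \]
\[ -\sqrt{ \frac{E + 250 \text{ eV}}{-E} } = \tan\left( 0.2562 \sqrt{ \frac{E + 250 \text{ eV}}{\text{eV}} } \right) , \tag{9.4.37} \]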
which is the energy condition for odd solutions. Only values of E
that satisfy Eq. 9.4.37 are allowed.
• Eq. 9.4.36 and 9.4.37 are both transcendental equations (i.e. they “tran-
scend” algebra). This means they are not solvable using algebra, so
we’re forced to use numerical methods. In Figure 9.14, intersections
represent allowed energies and we can see there are only three:
\[ \begin{aligned} E_0 &= -226.0 \text{ eV} \\ E_1 &= -156.1 \text{ eV} \\ E_2 &= -51.28 \text{ eV.} \end{aligned} \tag{9.4.38} \]
The subscripts are arbitrary, but I’ve forced “1” to represent the odd
solution so it matches everyone’s concept of “odd.” If the well is deeper
(i.e. V0 is larger), then there are more possible energies. If the well is
shallower (i.e. V0 is smaller), there are fewer possible energies. However,
there is always an E0 because α/k will always intersect tan(ka/2) at
least once between zero and −V0 .
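The numerical step can be sketched with nothing more than a sign-change scan and bisection. This is only one possible approach, not necessarily the one used to produce Figure 9.14. It assumes the substitution $z = ka/2$, and it multiplies each energy condition through by its denominators (a rearrangement chosen here so the tangent's poles never appear); the function names are hypothetical.

```python
import math

HBAR = 6.582e-16     # eV*s
MASS = 5.686e-30     # electron mass in eV*s^2/nm^2
V0, A = 250.0, 0.1   # well depth (eV) and width (nm)

C = math.sqrt(2 * MASS) / HBAR * A / 2   # z = k*a/2 = C*sqrt(E + V0)
Z0 = C * math.sqrt(V0)                   # z at E = 0; alpha*a/2 = sqrt(Z0^2 - z^2)

def even(z):
    # tan(z) = alpha/k, multiplied through by z*cos(z)
    return z * math.sin(z) - math.sqrt(Z0**2 - z**2) * math.cos(z)

def odd(z):
    # tan(z) = -k/alpha, multiplied through by sqrt(Z0^2 - z^2)*cos(z)
    return math.sqrt(Z0**2 - z**2) * math.sin(z) + z * math.cos(z)

def roots(f, n=2000, tol=1e-12):
    """Scan (0, Z0) for sign changes of f and refine each by bisection."""
    found = []
    zs = [Z0 * (i + 0.5) / n for i in range(n)]
    for lo, hi in zip(zs, zs[1:]):
        if f(lo) * f(hi) < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
    return found

def to_energy(z):
    """Invert z = C*sqrt(E + V0) to get E in eV."""
    return (z / C) ** 2 - V0

energies = sorted(to_energy(z) for z in roots(even) + roots(odd))
print([round(e, 1) for e in energies])
```

Changing `V0` or `A` in this sketch is a quick way to see the number of allowed energies grow for deeper or wider wells.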
Figure 9.14: This graph shows three functions: tan(ka/2) (“tan”), α/k (“even”), and −k/α (“odd”). The values shown are
for the finite square well given in Example 9.4.3. An intersection of α/k with tan(ka/2) represents the energy levels of the
even solutions. An intersection of −k/α with tan(ka/2) represents the energy levels of the odd solutions.
Figure 9.15: These are the only three eigenstates in the finite square well given in Example
9.4.3. The vertical dashed lines represent the boundaries of the well.
• With these energies, we can find the three possible eigenstates for this
finite well. After finding each α, k, C1 , C3 , and C4 ; they are
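\[ \Psi_0 = \begin{cases} \frac{58.11}{\sqrt{\text{nm}}} \, e^{77.01 x/\text{nm}} , & \text{if } x < -0.05 \text{ nm} \\ \frac{3.984}{\sqrt{\text{nm}}} \cos\left( \frac{25.11}{\text{nm}} x \right) , & \text{if } |x| \leq 0.05 \text{ nm} \\ \frac{58.11}{\sqrt{\text{nm}}} \, e^{-77.01 x/\text{nm}} , & \text{if } x > +0.05 \text{ nm} \end{cases} , \tag{9.4.39} \]
\[ \Psi_1 = \begin{cases} -\frac{58.75}{\sqrt{\text{nm}}} \, e^{64.01 x/\text{nm}} , & \text{if } x < -0.05 \text{ nm} \\ \frac{3.903}{\sqrt{\text{nm}}} \sin\left( \frac{49.63}{\text{nm}} x \right) , & \text{if } |x| \leq 0.05 \text{ nm} \\ \frac{58.75}{\sqrt{\text{nm}}} \, e^{-64.01 x/\text{nm}} , & \text{if } x > +0.05 \text{ nm} \end{cases} , \tag{9.4.40} \]
and
\[ \Psi_2 = \begin{cases} -\frac{20.09}{\sqrt{\text{nm}}} \, e^{36.69 x/\text{nm}} , & \text{if } x < -0.05 \text{ nm} \\ \frac{3.597}{\sqrt{\text{nm}}} \cos\left( \frac{72.22}{\text{nm}} x \right) , & \text{if } |x| \leq 0.05 \text{ nm} \\ -\frac{20.09}{\sqrt{\text{nm}}} \, e^{-36.69 x/\text{nm}} , & \text{if } x > +0.05 \text{ nm} \end{cases} . \tag{9.4.41} \]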
Figure 9.16: This is an energy level diagram showing only three energy states in the
finite square well given in Example 9.4.3. The colors match those used in Figure 9.15.
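\[ \psi_1 = \begin{cases} -\frac{58.75}{\sqrt{\text{nm}}} \, e^{64.01 x/\text{nm}} \, e^{i \, 237.2 t/\text{fs}} , & \text{if } x < -0.05 \text{ nm} \\ \frac{3.903}{\sqrt{\text{nm}}} \sin\left( \frac{49.63}{\text{nm}} x \right) e^{i \, 237.2 t/\text{fs}} , & \text{if } |x| \leq 0.05 \text{ nm} \\ \frac{58.75}{\sqrt{\text{nm}}} \, e^{-64.01 x/\text{nm}} \, e^{i \, 237.2 t/\text{fs}} , & \text{if } x > +0.05 \text{ nm} \end{cases} , \tag{9.4.43} \]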
and
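\[ \psi_2 = \begin{cases} -\frac{20.09}{\sqrt{\text{nm}}} \, e^{36.69 x/\text{nm}} \, e^{i \, 77.91 t/\text{fs}} , & \text{if } x < -0.05 \text{ nm} \\ \frac{3.597}{\sqrt{\text{nm}}} \cos\left( \frac{72.22}{\text{nm}} x \right) e^{i \, 77.91 t/\text{fs}} , & \text{if } |x| \leq 0.05 \text{ nm} \\ -\frac{20.09}{\sqrt{\text{nm}}} \, e^{-36.69 x/\text{nm}} \, e^{i \, 77.91 t/\text{fs}} , & \text{if } x > +0.05 \text{ nm} \end{cases} . \tag{9.4.44} \]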
Example 9.4.4
In Example 9.4.3, an electron was in a finite square well with dimensions
V0 = 250 eV and a = 0.1 nm. Suppose that electron is in the stationary
state ψ1 . What is the probability of finding that electron inside the well (i.e.
|x| ≤ 0.05 nm)? What is the probability of finding that electron outside the
well (i.e. |x| > 0.05 nm)?
\[ \psi^* \psi = \Psi^* e^{iEt/\hbar} \, \Psi e^{-iEt/\hbar} = \Psi^* \Psi . \]
in the region between x1 and x2 . Inside the well, the boundaries are
x1 = −a/2 and x2 = +a/2, so
\[ P = \int_{-a/2}^{+a/2} \left[ C_3 \sin(kx) \right]^2 dx = C_3^2 \int_{-a/2}^{+a/2} \sin^2(kx) \, dx , \]
where we’ve replaced the complex square with a real square only because
Ψ is entirely real. Since we’ve already solved an integral like this
in Eq. 9.4.6,
\[ P = C_3^2 \left[ \frac{x}{2} - \frac{1}{4k} \sin(2kx) \right]_{-a/2}^{+a/2} = C_3^2 \left[ 2 \, \frac{a}{4} - \frac{2}{4k} \sin(ka) \right] \]
\[ P = C_3^2 \left[ \frac{a}{2} - \frac{1}{2k} \sin(ka) \right] , \tag{9.4.46} \]
2 2k
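which is true for any odd state in any finite square well. Putting the
numbers back in, we get
\[ P = \left( \frac{3.903}{\sqrt{\text{nm}}} \right)^2 \left[ \frac{0.1 \text{ nm}}{2} - \frac{1}{2 \left( \frac{49.63}{\text{nm}} \right)} \sin\left( \frac{49.63}{\text{nm}} (0.1 \text{ nm}) \right) \right] , \]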
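The arithmetic in Eq. 9.4.46 can be sketched in a few lines. This is only an illustration: the constants are the rounded values quoted in Eq. 9.4.40, and the outside probability is computed from the integral of $C_1^2 e^{-2\alpha x}$ over one tail (doubled for both tails), which is an integral written here rather than one quoted from the example.

```python
import math

A = 0.1                    # well width in nm
C1, C3 = 58.75, 3.903      # nm^(-1/2), rounded values from Eq. 9.4.40
ALPHA, K = 64.01, 49.63    # nm^(-1), rounded values from Eq. 9.4.40

def p_inside():
    """Eq. 9.4.46: C3^2 * (a/2 - sin(k a) / (2 k))."""
    return C3**2 * (A / 2 - math.sin(K * A) / (2 * K))

def p_outside():
    """Both tails: 2 * C1^2 * integral of exp(-2 alpha x) from a/2 to infinity."""
    return 2 * C1**2 * math.exp(-ALPHA * A) / (2 * ALPHA)

print(p_inside(), p_outside(), p_inside() + p_outside())
```

Because the inputs are rounded to four significant figures, the two probabilities add to one only approximately.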
Example 9.4.5
In Example 9.4.3, an electron was in a finite square well with dimensions
V0 = 250 eV and a = 0.1 nm. Suppose that electron is in the stationary
state $\psi_1$. Find $\langle x \rangle$, $\langle x^2 \rangle$, $\langle H \rangle$, $\langle p \rangle$, and $\langle p^2 \rangle$. Is the Heisenberg Uncertainty
Principle (Eq. 9.3.39) satisfied?
where Q is still an arbitrary observable. This will work for all expec-
tation values in this example.
• Luckily, we can avoid solving them using a couple shortcuts. The middle
integrand, $x \sin^2(kx)$, is an odd function (i.e. $f_{\text{odd}}(-x) = -f_{\text{odd}}(x)$).
Odd functions have the property
\[ \int_{-b}^{+b} f_{\text{odd}}(x) \, dx = 0 \tag{9.4.49} \]
for any b and any $f_{\text{odd}}$, so the second integral is also zero. This leaves
us with only
\[ \langle x \rangle = C_1^2 \int_{-\infty}^{-a/2} x e^{2\alpha x} \, dx + C_1^2 \int_{+a/2}^{+\infty} x e^{-2\alpha x} \, dx \]
to solve. If, in the first term, we reverse the limits of integration and
say $x = -x'$, then
\[ \begin{aligned} \int_{-\infty}^{-a/2} x e^{2\alpha x} \, dx &= -\int_{-a/2}^{-\infty} x e^{2\alpha x} \, dx \\ &= -\int_{+a/2}^{+\infty} (-x') \, e^{2\alpha (-x')} \, d(-x') \\ &= -\int_{+a/2}^{+\infty} x' e^{-2\alpha x'} \, dx' . \end{aligned} \]
This means the two remaining terms cancel each other and $\langle x \rangle = 0$,
which is actually true for all the states in any finite square well. Re-
member, an expectation value is just a weighted average of all possible
values. Due to the symmetry of the eigenstate, the electron is just as
likely to be found at a negative value for x as it is a positive value for
x.
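• The same tricks won’t work on $\langle x^2 \rangle$. Using Eq. 9.4.48 gives us
\[ \begin{aligned} \langle x^2 \rangle = {} & \int_{-\infty}^{-a/2} \left[ C_1 e^{\alpha x} \right] x^2 \left[ C_1 e^{\alpha x} \right] dx \\ & + \int_{-a/2}^{+a/2} \left[ C_3 \sin(kx) \right] x^2 \left[ C_3 \sin(kx) \right] dx \\ & + \int_{+a/2}^{+\infty} \left[ -C_1 e^{-\alpha x} \right] x^2 \left[ -C_1 e^{-\alpha x} \right] dx \end{aligned} \]
for any b and any $f_{\text{even}}$. Even and odd functions show up a lot in quantum
mechanics, so you should get used to using Eqs. 9.4.49 and 9.4.50.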
This means the first and last terms are equal, so the expectation value
simplifies to
\[ \langle x^2 \rangle = 2 C_3^2 \int_{0}^{a/2} x^2 \sin^2(kx) \, dx + 2 C_1^2 \int_{a/2}^{\infty} x^2 e^{-2\alpha x} \, dx , \]
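\[ \int_{0}^{a/2} x^2 \sin^2(kx) \, dx = -\frac{a^2 \sin(ka)}{16k} - \frac{a \cos(ka)}{8k^2} + \frac{a^3}{48} + \frac{\sin(ka)}{8k^3} . \tag{9.4.51} \]
Using the tabular method and Eq. 9.4.30, the second integral is
\[ \int_{a/2}^{\infty} x^2 e^{-2\alpha x} \, dx = \left[ \left( +x^2 \right) \left( -\frac{e^{-2\alpha x}}{2\alpha} \right) - \left( +2x \right) \left( \frac{e^{-2\alpha x}}{4\alpha^2} \right) + \left( +2 \right) \left( -\frac{e^{-2\alpha x}}{8\alpha^3} \right) \right]_{a/2}^{\infty} , \]
\[ \int_{a/2}^{\infty} x^2 e^{-2\alpha x} \, dx = \left[ \frac{a^2}{8\alpha} + \frac{a}{4\alpha^2} + \frac{1}{4\alpha^3} \right] e^{-\alpha a} . \tag{9.4.52} \]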
• If we substitute Eqs. 9.4.51 and 9.4.52 back into the expectation value,
then
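\[ \begin{aligned} \langle x^2 \rangle = {} & 2 C_3^2 \left[ -\frac{a^2 \sin(ka)}{16k} - \frac{a \cos(ka)}{8k^2} + \frac{a^3}{48} + \frac{\sin(ka)}{8k^3} \right] \\ & + 2 C_1^2 \left[ \frac{a^2}{8\alpha} + \frac{a}{4\alpha^2} + \frac{1}{4\alpha^3} \right] e^{-\alpha a} \end{aligned} \]
\[ \begin{aligned} \langle x^2 \rangle = {} & C_3^2 \left[ -\frac{a^2 \sin(ka)}{8k} - \frac{a \cos(ka)}{4k^2} + \frac{a^3}{24} + \frac{\sin(ka)}{4k^3} \right] \\ & + C_1^2 \left[ \frac{a^2}{4\alpha} + \frac{a}{2\alpha^2} + \frac{1}{2\alpha^3} \right] e^{-\alpha a} , \end{aligned} \tag{9.4.53} \]
which applies to any odd state in any finite well. If we put all the
numbers in from Eq. 9.4.40, then $\langle x^2 \rangle = 4.990 \times 10^{-4}$ nm².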
• Since Ψ1 is an eigenstate of H, we know the energy is definite (and
discrete), which makes our work much easier. Mathematically, we say
\[ H \left| 1 \right\rangle = E_1 \left| 1 \right\rangle , \]
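so
\[ \langle H \rangle = \langle 1 | H | 1 \rangle = \langle 1 | E_1 | 1 \rangle = E_1 \langle 1 | 1 \rangle = E_1 = -156.1 \text{ eV.} \]
We know the value of this from Eq. 9.4.38. This method works for all
eigenstates,
\[ \langle H \rangle = \langle n | H | n \rangle = E_n , \tag{9.4.54} \]
where n is the state number.
• We should expect $\langle p \rangle = 0$ since the electron would just as likely be
traveling right as it would left, but we’ll write out the integrals just to
be safe. Using Eq. 9.4.48 gives us
\[ \begin{aligned} \langle p \rangle = {} & \int_{-\infty}^{-a/2} \left[ C_1 e^{\alpha x} \right] p \left[ C_1 e^{\alpha x} \right] dx \\ & + \int_{-a/2}^{+a/2} \left[ C_3 \sin(kx) \right] p \left[ C_3 \sin(kx) \right] dx \\ & + \int_{+a/2}^{+\infty} \left[ -C_1 e^{-\alpha x} \right] p \left[ -C_1 e^{-\alpha x} \right] dx \end{aligned} \]
where a = 0.1 nm and all other constants are given in Eq. 9.4.40. If we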
keep the constants general, then our result will apply to any odd state
in any finite square well. Since p (Eq. 9.2.3) is a derivative,
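\[ p = -i\hbar \frac{\partial}{\partial x} \tag{9.4.55} \]
in one dimension, its operation does change Ψ. Pulling out all constants,
we get
\[ \begin{aligned} \langle p \rangle = -i\hbar \Bigg[ & C_1^2 \int_{-\infty}^{-a/2} e^{\alpha x} \frac{\partial}{\partial x} e^{\alpha x} \, dx + C_3^2 \int_{-a/2}^{+a/2} \sin(kx) \frac{\partial}{\partial x} \sin(kx) \, dx \\ & + C_1^2 \int_{+a/2}^{+\infty} e^{-\alpha x} \frac{\partial}{\partial x} e^{-\alpha x} \, dx \Bigg] \end{aligned} \]
and evaluating each of the derivatives gives
\[ \begin{aligned} \langle p \rangle = -i\hbar \Bigg[ & \alpha C_1^2 \int_{-\infty}^{-a/2} e^{2\alpha x} \, dx + k C_3^2 \int_{-a/2}^{+a/2} \sin(kx) \cos(kx) \, dx \\ & - \alpha C_1^2 \int_{+a/2}^{+\infty} e^{-2\alpha x} \, dx \Bigg] . \end{aligned} \]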
We now have three integrals we can solve analytically.
\[ \langle p \rangle = -i\hbar\alpha \left[ C_1^2 \int_{-\infty}^{-a/2} e^{2\alpha x} \, dx - C_1^2 \int_{+a/2}^{+\infty} e^{-2\alpha x} \, dx \right] \]
to solve. The remaining integrals are just the probability of finding the
electron outside the finite well on either side. We know from Example
9.4.4 these two integrals have the same result (0.04474), so they cancel
each other and $\langle p \rangle = 0$, which is actually true for all the states in any
finite square well.
• The only expectation value left is $\langle p^2 \rangle$. We could go through the integrals,
but there’s an easier way. Eqs. 9.2.2 and 9.2.4 tell us that
\[ H = \frac{p^2}{2m} + V = -\frac{\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + V(x) \tag{9.4.56} \]
2m 2m ∂x2
in one dimension for non-relativistic particles. If we take the expecta-
tion value, then
\[ \langle H \rangle = \left\langle \frac{p^2}{2m} + V \right\rangle = \frac{\langle p^2 \rangle}{2m} + \langle V \rangle \]
since the expectation operator is linear. We know V = −V0 is con-
stant and we already found in Eq. 9.4.54 that hHi = En is definite.
Rearranging, we get
\[ E_n = \frac{\langle p^2 \rangle}{2m} - V_0 \]
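Table 9.1: This is a summary of all the calculated values in Example 9.4.5.
Quantity                     Value
$\langle x \rangle$          0
$\langle x^2 \rangle$        $4.990 \times 10^{-4}$ nm²
$\langle H \rangle$          −156.1 eV
$\langle p \rangle$          0
$\langle p^2 \rangle$        2463 $(\hbar/\text{nm})^2$
$\sigma_x$                   0.02234 nm
$\sigma_p$                   49.63 $\hbar/\text{nm}$
$\sigma_x \sigma_p$          1.109 $\hbar$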
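The last three rows of Table 9.1 follow from the first five by simple arithmetic. The sketch below is only an illustration: it assumes the usual definition $\sigma_Q^2 = \langle Q^2 \rangle - \langle Q \rangle^2$ and the usual form of the Heisenberg bound, $\sigma_x \sigma_p \geq \hbar/2$, and takes its inputs straight from the table.

```python
import math

# Inputs copied from Table 9.1 (momenta measured in units of hbar/nm).
x_mean, x2_mean = 0.0, 4.990e-4    # nm, nm^2
p_mean, p2_mean = 0.0, 2463.0      # hbar/nm, (hbar/nm)^2

sigma_x = math.sqrt(x2_mean - x_mean**2)   # nm
sigma_p = math.sqrt(p2_mean - p_mean**2)   # hbar/nm
product = sigma_x * sigma_p                # in units of hbar

print(sigma_x, sigma_p, product)
print("Heisenberg bound satisfied:", product >= 0.5)
```

Working in units of $\hbar/\text{nm}$ for momentum means the product comes out directly as a multiple of $\hbar$, so it can be compared with 1/2 without any unit conversion.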
Harmonic Oscillator
Another unrealistic quality, present in both the infinite and finite square
wells, is discontinuity. The potential energy function changes abruptly at
the boundaries. If we want a more realistic potential energy function, then
we need it to be continuous over space. The upside is we won’t have to write
the quantum states in piecewise form. The downside is they can be tricky or
sometimes even impossible to solve analytically.
The simplest of these continuous models is the harmonic oscillator. In
one dimension, the potential energy function can be written as
\[ V(x) = \frac{1}{2} k x^2 = \frac{1}{2} m \omega^2 x^2 , \tag{9.4.58} \]
where k is the classical elastic constant and $\omega = \sqrt{k/m}$ is the angular frequency.
Note that we’ve chosen V to be positive everywhere, but you can
shift the function up or down as needed without changing the force experienced
by the particle. You can write this in three dimensions as
\[ V(x, y, z) = \frac{1}{2} m \omega^2 \left( x^2 + y^2 + z^2 \right) \tag{9.4.59} \]
in Cartesian coordinates. It can also be written in spherical coordinates as
\[ V(r) = \frac{1}{2} m \omega^2 r^2 , \tag{9.4.60} \]
but only if the oscillation is isotropic (i.e. independent of direction). If the
oscillations are different in different directions, then we’re forced to use Eq.
9.4.59.
Figure 9.17: This shows the (one-dimensional) harmonic oscillator’s potential energy (Eq.
9.4.58) graphed against position. Values on the vertical axis are in units of $\hbar\omega$ and values
on the horizontal axis are for $\sqrt{m\omega/\hbar} \, x$ (no unit) rather than x for generality.
Recall from Section 9.1 that Max Planck solved the black body radiation
problem by assuming the light-emitting object was made of very small
oscillators. The result was that light was emitted in packets called photons,
each with an energy of
Example 9.4.6
What are the stationary states (and corresponding energies) for a non-relativistic
particle behaving as a harmonic oscillator?
\[ -\frac{\hbar^2}{2m} \frac{\partial^2 \Psi}{\partial x^2} + \frac{1}{2} m \omega^2 x^2 \Psi = E\Psi , \tag{9.4.62} \]
• Any smooth continuous function (i.e. any state function) can be written
as an infinite power series, so it is guaranteed
\[ \Psi(x) = \sum_{j=0}^{\infty} a_j x^j \tag{9.4.63} \]
will be a solution to Eq. 9.4.62. This power series could represent any
function in its current form since the constant coefficients, aj , are not
specified. We can do a little better though.
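and
\[ \varepsilon = \frac{E}{\hbar\omega} \tag{9.4.65} \]
just like in Figure 9.17. Neither χ nor ε has a unit, which makes them
very convenient for anything we might have to do numerically. We can
apply the chain rule for derivatives (Eq. 3.1.2),
\[ -\frac{\hbar^2}{2m} \left( \frac{\partial \chi}{\partial x} \frac{\partial}{\partial \chi} \right)^2 \Psi + \frac{1}{2} m \omega^2 x^2 \Psi = E\Psi , \]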
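\[ -\frac{1}{2} \hbar\omega \frac{\partial^2 \Psi}{\partial \chi^2} + \frac{1}{2} \hbar\omega \chi^2 \Psi = \hbar\omega \varepsilon \Psi \]
\[ \frac{\partial^2 \Psi}{\partial \chi^2} - \chi^2 \Psi = -2\varepsilon \Psi . \]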
Moving all non-derivative terms to the right, this becomes
\[ \frac{\partial^2 \Psi}{\partial \chi^2} = \left( \chi^2 - 2\varepsilon \right) \Psi , \tag{9.4.66} \]
where the 2ε is what’s forcing us to use a power series solution.
• Second, we do know a little about what the function looks like. As
χ → ∞,
\[ \frac{\partial^2 \Psi}{\partial \chi^2} \rightarrow \chi^2 \Psi , \tag{9.4.67} \]
since 2ε is constant. That means Ψ(χ) should include a factor of $e^{-\chi^2/2}$,
which dominates over everything else for large χ. Therefore,
\[ \Psi(\chi) = u(\chi) \, e^{-\chi^2/2} , \tag{9.4.68} \]
where u is the “everything else” and must become insignificant for large
χ. Technically, $e^{+\chi^2/2}$ is also a solution to Eq. 9.4.67, but we ignored
it since Ψ must be finite.
• Last, we write Eq. 9.4.66 in terms of u rather than Ψ. Substituting in
Eq. 9.4.68, we get
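\[ \frac{\partial}{\partial \chi} \left[ \frac{\partial}{\partial \chi} \left( u \, e^{-\chi^2/2} \right) \right] = \left( \chi^2 - 2\varepsilon \right) u \, e^{-\chi^2/2} \]
\[ \frac{\partial}{\partial \chi} \left[ \frac{\partial u}{\partial \chi} e^{-\chi^2/2} - u \chi \, e^{-\chi^2/2} \right] = \left( \chi^2 - 2\varepsilon \right) u \, e^{-\chi^2/2} \]
\[ \left[ -\chi \frac{\partial u}{\partial \chi} + \frac{\partial^2 u}{\partial \chi^2} - \chi \frac{\partial u}{\partial \chi} - u + u \chi^2 \right] e^{-\chi^2/2} = \left( \chi^2 - 2\varepsilon \right) u \, e^{-\chi^2/2} . \]
If we cancel the $e^{-\chi^2/2}$ and group like terms, then
\[ -\chi \frac{\partial u}{\partial \chi} + \frac{\partial^2 u}{\partial \chi^2} - \chi \frac{\partial u}{\partial \chi} - u + u \chi^2 = \chi^2 u - 2\varepsilon u \]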
\[ \frac{\partial^2 u}{\partial \chi^2} - 2\chi \frac{\partial u}{\partial \chi} + \left( 2\varepsilon - 1 \right) u = 0 . \tag{9.4.69} \]
Setting everything equal to zero is convenient for what we’re about to
do.
to find u. Judging from Eq. 9.4.69, we’re going to need a first and
second derivative, so
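\[ \frac{\partial u}{\partial \chi} = \sum_{j=1}^{\infty} j a_j \chi^{j-1} \]
and
\[ \frac{\partial^2 u}{\partial \chi^2} = \sum_{j=2}^{\infty} j \left( j - 1 \right) a_j \chi^{j-2} . \]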
\[ \sum_{j=2}^{\infty} j \left( j - 1 \right) a_j \chi^{j-2} + \sum_{j=1}^{\infty} -2j a_j \chi^j + \sum_{j=0}^{\infty} \left( 2\varepsilon - 1 \right) a_j \chi^j = 0 . \]
• Unfortunately, we can’t combine the sums properly until all the powers
of χ are the same and all the lower limits are the same. Since j is just
a label, we could easily replace every j in the first sum with j + 2,
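\[ \sum_{j+2=2}^{\infty} \left( j + 2 \right) \left( j + 2 - 1 \right) a_{j+2} \chi^{j+2-2} + \sum_{j=1}^{\infty} -2j a_j \chi^j + \sum_{j=0}^{\infty} \left( 2\varepsilon - 1 \right) a_j \chi^j = 0 \]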
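\[ \sum_{j=0}^{\infty} \left( j + 2 \right) \left( j + 1 \right) a_{j+2} \chi^j + \sum_{j=1}^{\infty} -2j a_j \chi^j + \sum_{j=0}^{\infty} \left( 2\varepsilon - 1 \right) a_j \chi^j = 0 \]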
to make the powers the same. To make the lower limits the same,
we could just separate the j = 0 terms from the first and last sum.
However, in this case, the second sum has a factor of j, so adding a
j = 0 to that sum would just be adding a zero term (i.e. voodoo math).
This gives us
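\[ \sum_{j=0}^{\infty} \left( j + 2 \right) \left( j + 1 \right) a_{j+2} \chi^j + \sum_{j=0}^{\infty} -2j a_j \chi^j + \sum_{j=0}^{\infty} \left( 2\varepsilon - 1 \right) a_j \chi^j = 0 \]
\[ \sum_{j=0}^{\infty} \left[ \left( j + 2 \right) \left( j + 1 \right) a_{j+2} - 2j a_j + \left( 2\varepsilon - 1 \right) a_j \right] \chi^j = 0 . \]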
• The only way this sum can always be zero is when the coefficients are
all zero. Therefore,
\[ a_{j+2} = \frac{2j + 1 - 2\varepsilon}{\left( j + 2 \right) \left( j + 1 \right)} a_j , \tag{9.4.71} \]
– If $a_0 \neq 0$, then $a_1 = 0$.
– If $a_1 \neq 0$, then $a_0 = 0$.
That means, for any particular solution, you really only have one unknown
coefficient: either $a_0$ or $a_1$, never both. Whichever one is non-zero
is your normalization constant.
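\[ \begin{aligned} u_{\text{even}} &\rightarrow a_0 + \frac{2}{2} a_0 \chi^2 + \frac{2}{4} \frac{2}{2} a_0 \chi^4 + \frac{2}{6} \frac{2}{4} \frac{2}{2} a_0 \chi^6 + \ldots \\ &= a_0 \left[ 1 + \frac{1}{1} \chi^{2 \cdot 1} + \frac{1}{2 \cdot 1} \chi^{2 \cdot 2} + \frac{1}{3 \cdot 2 \cdot 1} \chi^{2 \cdot 3} + \ldots \right] \\ &= a_0 \sum_{\ell=0}^{\infty} \frac{1}{\ell !} \chi^{2\ell} = a_0 \sum_{\ell=0}^{\infty} \frac{1}{\ell !} \left( \chi^2 \right)^{\ell} = a_0 e^{\chi^2} \end{aligned} \]
and, therefore,
\[ \Psi_{\text{even}} \rightarrow a_0 e^{\chi^2} e^{-\chi^2/2} = a_0 e^{+\chi^2/2} . \]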
Figure 9.18: This is an energy level diagram showing the first four energy states in the
(one-dimensional) harmonic oscillator. Values on the vertical axis are in units of $\hbar\omega$ and
values on the horizontal axis are for $\sqrt{m\omega/\hbar} \, x$ (no unit) rather than x for generality. The
colors match those used in Figure 9.19.
\[ 0 = 2 j_{\max} + 1 - 2\varepsilon \quad \Rightarrow \quad \varepsilon = j_{\max} + \frac{1}{2} . \]
We know ε is related to E by Eq. 9.4.65, so this is just the energy in
terms of jmax . The convention we’ve chosen in this chapter is to use n
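to number energy levels, so
\[ E_n = \left( n + \frac{1}{2} \right) \hbar\omega \tag{9.4.72} \]
where $n = j_{\max}$. These energy values all differ by an integer multiple
of $\hbar\omega$, which is consistent with Planck’s result (Eq. 9.4.61).
• Eq. 9.4.72 allows us to write the recursion formula (Eq. 9.4.71) as
\[ a_{j+2} = \frac{2j + 1 - \left( 2n + 1 \right)}{\left( j + 2 \right) \left( j + 1 \right)} a_j \]
\[ a_{j+2} = \frac{-2 \left( n - j \right)}{\left( j + 2 \right) \left( j + 1 \right)} a_j , \tag{9.4.73} \]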
which is in terms of n. Using this one will save us a little time calcu-
lating coefficients.
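The recursion in Eq. 9.4.73 is mechanical enough to hand to a few lines of code. The sketch below is only an illustration: it assumes the starting coefficient ($a_0$ for even n, $a_1$ for odd n) is set to one, uses exact fractions, and the function name `series_coefficients` is hypothetical.

```python
from fractions import Fraction

def series_coefficients(n):
    """Coefficients a_0..a_n of u_n from the recursion
    a_{j+2} = -2 (n - j) / ((j + 2)(j + 1)) * a_j,
    starting from a_0 = 1 (n even) or a_1 = 1 (n odd)."""
    a = [Fraction(0)] * (n + 1)
    a[n % 2] = Fraction(1)
    for j in range(n % 2, n - 1, 2):
        a[j + 2] = Fraction(-2 * (n - j), (j + 2) * (j + 1)) * a[j]
    return a

for n in range(5):
    print(n, [str(c) for c in series_coefficients(n)])
```

The factor $(n - j)$ is what terminates the series: once j reaches n the next coefficient is zero, so $u_n$ is a polynomial of degree n rather than an infinite series.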
• Now that we have the energy levels and a way to determine coefficients,
we should be able to find the eigenstates. Combining Eqs. 9.4.68 and
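9.4.70, we get
\[ \Psi_n = u_n e^{-\chi^2/2} = \left[ \sum_{j=0}^{n} a_j \chi^j \right] e^{-\chi^2/2} , \]
\[ \begin{aligned} u_{\text{even}} &= a_0 + a_2 \chi^2 + a_4 \chi^4 + a_6 \chi^6 + \ldots + a_n \chi^n \\ &= a_0 + \frac{-2n}{2 \cdot 1} a_0 \chi^2 + \frac{-2(n-2)}{4 \cdot 3} \frac{-2n}{2 \cdot 1} a_0 \chi^4 + \frac{-2(n-4)}{6 \cdot 5} \frac{-2(n-2)}{4 \cdot 3} \frac{-2n}{2 \cdot 1} a_0 \chi^6 + \ldots \end{aligned} \]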
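and we can see a few patterns right away. The denominators are $j! = (2\ell)!$
and the numerators have factors of $(-2)^{j/2} = (-2)^{\ell}$. The factors
of $n (n - 2) (n - 4) \ldots$ kind of look like factorials, but they change by
two rather than one and they’re also missing a few. If we pull out a 2
from each of the $j/2 = \ell$ factors, then
\[ n \left( n - 2 \right) \left( n - 4 \right) \ldots = 2^{\ell} \, \frac{n}{2} \left( \frac{n}{2} - 1 \right) \left( \frac{n}{2} - 2 \right) \ldots \]
However, we’re shy $(n/2 - \ell)$ factors of this being $(n/2)!$, so
\[ n \left( n - 2 \right) \left( n - 4 \right) \ldots = 2^{\ell} \, \frac{ \left( \frac{n}{2} \right) ! }{ \left( \frac{n}{2} - \ell \right) ! } . \]
Combining all of this into a coefficient, we get
\[ a_{2\ell} = \left[ \frac{(-2)^{\ell}}{(2\ell)!} \right] \left[ \frac{ 2^{\ell} \left( \frac{n}{2} \right) ! }{ \left( \frac{n}{2} - \ell \right) ! } \right] a_0 = \left[ \frac{(-1)^{\ell} \, 2^{2\ell}}{(2\ell)!} \right] \left[ \frac{ \left( \frac{n}{2} \right) ! }{ \left( \frac{n}{2} - \ell \right) ! } \right] a_0 \]
and, therefore,
\[ u_{\text{even}} = \sum_{\ell=0}^{n/2} \left[ \frac{ (-1)^{\ell} \, 2^{2\ell} \left( \frac{n}{2} \right) ! }{ (2\ell)! \left( \frac{n}{2} - \ell \right) ! } \right] a_0 \chi^{2\ell} \]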
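\[ u_{\text{even}} = a_0 \left( \frac{n}{2} \right) ! \sum_{\ell=0}^{n/2} \frac{ (-1)^{\ell} }{ (2\ell)! \left( \frac{n}{2} - \ell \right) ! } \left( 2\chi \right)^{2\ell} , \tag{9.4.74} \]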
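• The sum in Eq. 9.4.74 looks a lot like the even Hermite polynomials,
which are given by
\[ H_{\text{even}} = n! \sum_{\ell=0}^{n/2} \frac{ (-1)^{\ell - \frac{n}{2}} }{ (2\ell)! \left( \frac{n}{2} - \ell \right) ! } \left( 2\chi \right)^{2\ell} . \tag{9.4.75} \]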
ueven = a0 Hn . (9.4.76)
That’s what I call a simple solution! The odd solutions work out in a
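similar fashion, where odd Hermite polynomials are given by
\[ H_{\text{odd}} = n! \sum_{\ell=0}^{(n-1)/2} \frac{ (-1)^{\ell - \frac{n-1}{2}} }{ (2\ell + 1)! \left( \frac{n-1}{2} - \ell \right) ! } \left( 2\chi \right)^{2\ell + 1} . \tag{9.4.77} \]
and u is
\[ u_{\text{odd}} = a_1 H_n . \tag{9.4.78} \]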
Table 9.2: These are the first ten Hermite polynomials, $H_n(\chi)$. They represent solutions to
the harmonic oscillator in Example 9.4.6.
$H_n$     Hermite Polynomials
$H_0$ = $1$
$H_1$ = $2\chi$
$H_2$ = $4\chi^2 - 2$
$H_3$ = $8\chi^3 - 12\chi$
$H_4$ = $16\chi^4 - 48\chi^2 + 12$
$H_5$ = $32\chi^5 - 160\chi^3 + 120\chi$
$H_6$ = $64\chi^6 - 480\chi^4 + 720\chi^2 - 120$
$H_7$ = $128\chi^7 - 1{,}344\chi^5 + 3{,}360\chi^3 - 1{,}680\chi$
$H_8$ = $256\chi^8 - 3{,}584\chi^6 + 13{,}440\chi^4 - 13{,}440\chi^2 + 1{,}680$
$H_9$ = $512\chi^9 - 9{,}216\chi^7 + 48{,}384\chi^5 - 80{,}640\chi^3 + 30{,}240\chi$
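The entries of Table 9.2 can be regenerated from the standard three-term recurrence $H_{n+1} = 2\chi H_n - 2n H_{n-1}$ with $H_0 = 1$ and $H_1 = 2\chi$. The sketch below is only an illustration: polynomials are stored as hypothetical coefficient lists `[c0, c1, c2, ...]` (lowest power first).

```python
def hermite(n):
    """Coefficient list of H_n (lowest power first), built from
    H_{k+1} = 2*chi*H_k - 2*k*H_{k-1}."""
    prev, cur = [1], [0, 2]            # H_0 and H_1
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + [2 * c for c in cur]   # multiply H_k by 2*chi
        for i, c in enumerate(prev):
            nxt[i] -= 2 * k * c            # subtract 2*k*H_{k-1}
        prev, cur = cur, nxt
    return cur

for n in range(10):
    print(n, hermite(n))
```

Each polynomial contains only even or only odd powers, matching the even and odd families of solutions found in the example.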
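\[ H_{n+1} = (n + 1)! \sum_{\ell=0}^{n/2} \frac{ (-1)^{\ell - \frac{n}{2}} }{ (2\ell + 1)! \left( \frac{n}{2} - \ell \right) ! } \left( 2\chi \right)^{2\ell + 1} \]
\[ -2\chi H_n = -2\chi \left[ n! \sum_{\ell=0}^{n/2} \frac{ (-1)^{\ell - \frac{n}{2}} }{ (2\ell)! \left( \frac{n}{2} - \ell \right) ! } \left( 2\chi \right)^{2\ell} \right] , \]
\[ 2n H_{n-1} = 2n \left[ (n - 1)! \sum_{\ell=0}^{\frac{n}{2} - 1} \frac{ (-1)^{\ell - \frac{n}{2} - 1} }{ (2\ell + 1)! \left( \frac{n}{2} - 1 - \ell \right) ! } \left( 2\chi \right)^{2\ell + 1} \right] \]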
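We can add an $\ell = n/2$ term to the last sum because the factor $\left( \frac{n}{2} - \ell \right)$ would be zero anyway (i.e. voodoo math). Now that the limits on the sums are the same, we can add all three together and we get
\[ n! \sum_{\ell=0}^{n/2} \left[ (n+1) - (2\ell+1) - 2 \left( \frac{n}{2} - \ell \right) \right] \frac{(-1)^{\ell - \frac{n}{2}}}{(2\ell+1)! \left( \frac{n}{2} - \ell \right)!} (2\chi)^{2\ell+1} \]
\[ (n+1) - (2\ell+1) - 2 \left( \frac{n}{2} - \ell \right) = n + 1 - 2\ell - 1 - n + 2\ell = 0, \]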
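The three sums above combine into the recursion H_{n+1} − 2χ H_n + 2n H_{n−1} = 0, which gives a convenient way to generate the polynomials. The sketch below is only an illustration (the name `hermite_by_recursion` and the coefficient-list representation are assumptions, not the author’s method). It starts from H_0 = 1 and H_1 = 2χ and applies H_{n+1} = 2χ H_n − 2n H_{n−1}.

```python
def hermite_by_recursion(n_max):
    """Coefficient lists (lowest power first) for H_0 .. H_{n_max}."""
    polys = [[1], [0, 2]]                          # H_0 = 1, H_1 = 2*chi
    for n in range(1, n_max):
        shifted = [0] + [2 * c for c in polys[n]]  # 2*chi*H_n raises every power by one
        prev = polys[n - 1] + [0] * (len(shifted) - len(polys[n - 1]))
        polys.append([a - 2 * n * b for a, b in zip(shifted, prev)])
    return polys[: n_max + 1]

table = hermite_by_recursion(9)
print(table[9])   # coefficients of H_9, lowest power first
```

Each list can be compared term by term with the corresponding row of Table 9.2.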
If we integrate over all space, then the last two terms will go to zero
because Hermite polynomials are orthogonal functions (they have to
be if they’re eigenfunctions). This gives us
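\[ \int_{-\infty}^{+\infty} (H_n)^2 f(\chi) \, d\chi - 2n \int_{-\infty}^{+\infty} (H_{n-1})^2 f(\chi) \, d\chi = 0 \]
\[ \int_{-\infty}^{+\infty} (H_n)^2 f(\chi) \, d\chi = 2n \int_{-\infty}^{+\infty} (H_{n-1})^2 f(\chi) \, d\chi, \]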
\[ \psi^* \psi = \Psi^* e^{iEt/\hbar} \, \Psi e^{-iEt/\hbar} = \Psi^* \Psi, \]
Notice the use of x rather than χ? That’s important. Using the chain
rule for derivatives (Eq. 3.1.2) and definition of χ (Eq. 9.4.64), this is
actually
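\[ \int_{-\infty}^{+\infty} (\Psi)^2 \frac{dx}{d\chi} \, d\chi = \int_{-\infty}^{+\infty} (\Psi)^2 \sqrt{\frac{\hbar}{m\omega}} \, d\chi = \sqrt{\frac{\hbar}{m\omega}} \int_{-\infty}^{+\infty} (\Psi)^2 \, d\chi = 1 \]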
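\[ C_n^2 \sqrt{\frac{\hbar}{m\omega}} \int_{-\infty}^{+\infty} (H_n)^2 e^{-\chi^2} \, d\chi = 1. \]
This integral matches Eq. 9.4.81 if we set $f = e^{-\chi^2}$, so
\[ C_n^2 \sqrt{\frac{\hbar}{m\omega}} \, (2^n n!) \int_{-\infty}^{+\infty} e^{-\chi^2} \, d\chi = 1 \]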
Figure 9.19: These are the first four eigenstates of the harmonic oscillator given in Example
9.4.6.
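\[ \Rightarrow \quad C_n = \left( \frac{m\omega}{\hbar\pi} \right)^{1/4} \frac{1}{\sqrt{2^n n!}} \tag{9.4.83} \]
and the eigenstates are given by
\[ \Psi_n(\chi) = \left( \frac{m\omega}{\hbar\pi} \right)^{1/4} \frac{1}{\sqrt{2^n n!}} H_n(\chi) \, e^{-\chi^2/2}. \tag{9.4.84} \]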
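The normalization constant in Eq. 9.4.83 can be spot-checked numerically. The sketch below works in units where mω/ħ = 1 (an assumption made only to keep the code short), so that x and χ coincide and the integral of Ψ_n² over χ should come out to 1. The helper names `hermite`, `psi`, and `norm` are illustrative, and the trapezoid rule on a finite interval stands in for the integral over all space.

```python
from math import exp, factorial, pi, sqrt

def hermite(n, chi):
    """Evaluate H_n(chi) with the recursion H_{k+1} = 2*chi*H_k - 2*k*H_{k-1}."""
    h_prev, h = 1.0, 2.0 * chi
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * chi * h - 2.0 * k * h_prev
    return h

def psi(n, chi):
    """Eq. 9.4.84 in units where m*omega/hbar = 1."""
    c_n = pi ** -0.25 / sqrt(2.0**n * factorial(n))
    return c_n * hermite(n, chi) * exp(-chi * chi / 2.0)

def norm(n, limit=10.0, steps=20000):
    """Trapezoid-rule estimate of the integral of psi_n**2 over [-limit, limit]."""
    h = 2.0 * limit / steps
    total = 0.5 * (psi(n, -limit) ** 2 + psi(n, limit) ** 2)
    total += sum(psi(n, -limit + i * h) ** 2 for i in range(1, steps))
    return total * h

for n in range(4):
    print(n, norm(n))
```

The Gaussian factor makes the integrand negligible well before |χ| = 10, which is why a finite interval is adequate for the low-lying states.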
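Transforming this back to x using Eq. 9.4.64, we get
\[ \Psi_n(x) = \left( \frac{m\omega}{\hbar\pi} \right)^{1/4} \frac{1}{\sqrt{2^n n!}} H_n\!\left( \sqrt{\frac{m\omega}{\hbar}} \, x \right) e^{-m\omega x^2/(2\hbar)}, \tag{9.4.85} \]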
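\[ \psi_n(x,t) = \left( \frac{m\omega}{\hbar\pi} \right)^{1/4} \frac{1}{\sqrt{2^n n!}} H_n\!\left[ \sqrt{\frac{m\omega}{\hbar}} \, x \right] e^{-m\omega x^2/(2\hbar) - i \left( n + \frac{1}{2} \right) \omega t}, \tag{9.4.86} \]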
Example 9.4.6 only works this model out in one dimension, so you may
be skeptical about its relevance to Planck’s result (Eq. 9.4.61). However,
the three-dimensional case (Eq. 9.4.59) works out just like it did for the
infinite well (see Example 9.4.2). The differential equation separates into
three independent equations (one of x, y, and z). Therefore, the energy
levels are given by
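\[ E_{n_x n_y n_z} = E_{n_x} + E_{n_y} + E_{n_z} \]
\[ E_{n_x n_y n_z} = \left( n_x + \frac{1}{2} \right) \hbar\omega + \left( n_y + \frac{1}{2} \right) \hbar\omega + \left( n_z + \frac{1}{2} \right) \hbar\omega \]
\[ E_{n_x n_y n_z} = \left( n_x + n_y + n_z + \frac{3}{2} \right) \hbar\omega, \tag{9.4.87} \]
where $n_x$, $n_y$, and $n_z$ are whole numbers (i.e. $n_i = 0, 1, 2, \ldots$). If we define $n = n_x + n_y + n_z$, then
\[ E_n = \left( n + \frac{3}{2} \right) \hbar\omega, \tag{9.4.88} \]
where n is a whole number (i.e. n = 0, 1, 2, . . .).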
There is some degeneracy (i.e. multiple states having the same energy)
just like with the three-dimensional infinite well (see Example 9.4.2), but
that has no effect on our ultimate point. Let’s say an electron transitions
from a higher stationary state ($n_i$) to a lower one ($n_f$). The loss of energy is
\[ -\Delta E = E_i - E_f = (n_i - n_f) \, \hbar\omega, \]
where $(n_i - n_f)$ is a whole number (i.e. 1, 2, 3, . . .). This means the loss of
energy is a whole number multiple of $\hbar\omega$, which we know is emitted as a photon.
Planck’s result (Eq. 9.4.61) is supported even in the three-dimensional
case.
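The degeneracy mentioned above is easy to count by brute force. The sketch below is a minimal illustration, assuming each n_i starts at zero (as the Hermite index does in Table 9.2) and measuring energy in units of ħω. The function names are hypothetical.

```python
def states_with_total(n):
    """All (nx, ny, nz) with nx + ny + nz = n and each ni = 0, 1, 2, ..."""
    return [(nx, ny, n - nx - ny) for nx in range(n + 1) for ny in range(n - nx + 1)]

def energy(n):
    """E_n in units of hbar*omega (Eq. 9.4.88)."""
    return n + 1.5

for n in range(4):
    print(n, energy(n), len(states_with_total(n)))

# Energy lost in a transition, again in units of hbar*omega.
print(energy(3) - energy(1))
```

The number of states at level n works out to (n + 1)(n + 2)/2, while the energy released in any downward transition is still a whole number of ħω.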
Chapter 10
1. Determine the potential energy function for the system. This is what
makes models different from one another.
4. Apply boundary conditions to find any unknowns that may have ap-
peared in the last step. The eigenstates and their derivatives must
be continuous and finite.
5. Normalize using Eq. 9.2.17 to find the one remaining unknown: the
normalization constant. If you have more than one unknown left at
this step, then you didn’t finish the last step.
418 CHAPTER 10. MODERN QUANTUM MECHANICS
10.2. SINGLE-ELECTRON ATOMS 419
Figure 10.1: This is the potential energy function (Eq. 10.2.1) for the hydrogen atom
where r represents the distance from the proton in the nucleus. Values on the vertical axis
are in units of electron volts (eV).
\[ V(r) = -k_E \frac{Zq^2}{r} = -\frac{Zq^2}{4\pi\epsilon_0 r}, \tag{10.2.1} \]
Example 10.2.1
What are the stationary states (and corresponding energies) for a lone non-
relativistic electron bound by a positive nucleus?
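\[ -\frac{\hbar^2}{2m} \left[ \frac{1}{r^2} \frac{\partial}{\partial r} \left( r^2 \frac{\partial \Psi}{\partial r} \right) + \frac{1}{r^2 \sin\theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial \Psi}{\partial \theta} \right) + \frac{1}{r^2 \sin^2\theta} \frac{\partial^2 \Psi}{\partial \phi^2} \right] - \frac{Zq^2}{4\pi\epsilon_0 r} \Psi = E\Psi \tag{10.2.2} \]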
where the integrals are also separable. Since (1) (1) (1) = 1, we could
just as easily say
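\[ \int_0^{\infty} R^* R \, r^2 \, dr = 1 \tag{10.2.4a} \]
\[ \int_0^{\pi} \Theta^* \Theta \, \sin\theta \, d\theta = 1 \tag{10.2.4b} \]
\[ \int_0^{2\pi} \Phi^* \Phi \, d\phi = 1. \tag{10.2.4c} \]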
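• If we substitute Eq. 10.2.3 into 10.2.2, multiply through by $-\frac{2mr^2}{\hbar^2 R\Theta\Phi}$, and collect everything on one side, we get
\[ \frac{1}{R} \frac{\partial}{\partial r} \left( r^2 \frac{\partial R}{\partial r} \right) + \frac{1}{\Theta \sin\theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial \Theta}{\partial \theta} \right) + \frac{1}{\Phi \sin^2\theta} \frac{\partial^2 \Phi}{\partial \phi^2} + \frac{Zmq^2 r}{2\pi\epsilon_0 \hbar^2} + \frac{2mr^2}{\hbar^2} E = 0. \]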
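\[ \frac{1}{R} \frac{\partial}{\partial r} \left( r^2 \frac{\partial R}{\partial r} \right) + \frac{Zmq^2 r}{2\pi\epsilon_0 \hbar^2} + \frac{2mr^2}{\hbar^2} E = \ell (\ell + 1) \tag{10.2.5a} \]
\[ \frac{1}{\Theta \sin\theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial \Theta}{\partial \theta} \right) + \frac{1}{\Phi \sin^2\theta} \frac{\partial^2 \Phi}{\partial \phi^2} = -\ell (\ell + 1) \tag{10.2.5b} \]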
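\[ \frac{\sin\theta}{\Theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial \Theta}{\partial \theta} \right) + \frac{1}{\Phi} \frac{\partial^2 \Phi}{\partial \phi^2} + \ell (\ell + 1) \sin^2\theta = 0. \]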
We can see that all terms dependent on θ are not dependent on φ (and
vice versa), so the only way their sum can always be zero is if each
term is individually constant and they cancel each other. Therefore,
this is just two independent differential equations:
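\[ \frac{\sin\theta}{\Theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial \Theta}{\partial \theta} \right) + \ell (\ell + 1) \sin^2\theta = m_\ell^2 \tag{10.2.6a} \]
\[ \frac{1}{\Phi} \frac{\partial^2 \Phi}{\partial \phi^2} = -m_\ell^2 \tag{10.2.6b} \]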
\[ \frac{\partial^2 \Phi}{\partial \phi^2} = -m_\ell^2 \Phi \tag{10.2.7c} \]
and we need different methods to solve each one.
• Eq. 10.2.7c should be very familiar to you at this point. It appeared
in both the infinite well (Example 9.4.1) and the finite well (Example
9.4.3). In those simple cases, we said the only two functions with second
derivatives proportional to the negative of themselves were $\sin(m_\ell x)$
and $\cos(m_\ell x)$. This is true only if you’re interested in real solutions.
In general, complex solutions are permitted, so we should really apply
Euler’s formula,
\[ e^{i\phi} = \cos\phi + i \sin\phi, \tag{10.2.8} \]
and say $e^{i m_\ell \phi}$ and $e^{-i m_\ell \phi}$ are the functions instead. We did this before
in deriving Eq. 9.2.1 without explicitly stating it. The general solution
will be a linear combination of the two, so
\[ \Phi(\phi) = C_1 e^{i m_\ell \phi} + C_2 e^{-i m_\ell \phi}, \]
where $C_1$ and $C_2$ are just constants. We can simplify further by merging
the sign in the exponent with $m_\ell$, which results in
\[ \Phi^{m_\ell}(\phi) = C_\phi e^{i m_\ell \phi} = \frac{1}{\sqrt{2\pi}} e^{i m_\ell \phi}, \tag{10.2.9} \]
where $m_\ell$ acts as a label (not an exponent) on the variable $\Phi^{m_\ell}$. This
is similar to contravariant indices from Chapter 6, but raising and lowering
is irrelevant because $\Phi^{m_\ell}$ is not a rank-1 tensor. The constant
$C_\phi = 1/\sqrt{2\pi}$ was determined by normalizing using Eq. 10.2.4c.
\[ P_\ell^{m_\ell}(x) = \frac{(-1)^{m_\ell}}{2^\ell \, \ell!} \left( 1 - x^2 \right)^{m_\ell/2} \frac{d^{\ell + m_\ell}}{dx^{\ell + m_\ell}} \left( x^2 - 1 \right)^\ell, \tag{10.2.11} \]
where $\ell$ and $m_\ell$ act as labels (not as exponents) on the variable $P_\ell^{m_\ell}$.
This is similar to contravariant indices from Chapter 6, but raising and
lowering is irrelevant because $P_\ell^{m_\ell}$ is not a rank-2 tensor. Functions for
negative $m_\ell$ values are related to their positive counterpart by
\[ P_\ell^{-m_\ell} = (-1)^{m_\ell} \frac{(\ell - m_\ell)!}{(\ell + m_\ell)!} P_\ell^{m_\ell}, \tag{10.2.12} \]
which might save you some time. If these functions are solutions to
Eq. 10.2.7b, then the general solutions take the form
\[ \Theta_\ell^{m_\ell}(\theta) = C_\theta P_\ell^{m_\ell}(\cos\theta) \]
Table 10.1: These are the first ten associated Legendre functions, $P_\ell^{m_\ell}(x) = P_\ell^{m_\ell}(\cos\theta)$.
They represent solutions for Θ(θ) in Example 10.2.1. Note: Functions for negative $m_\ell$ are
given by Eq. 10.2.12.
\[ \Theta_\ell^{m_\ell}(\theta) = \sqrt{\frac{(2\ell + 1)}{2} \frac{(\ell - m_\ell)!}{(\ell + m_\ell)!}} \, P_\ell^{m_\ell}(\cos\theta), \tag{10.2.13} \]
just like they do for $P_\ell^{m_\ell}$. The constant $C_\theta$ was determined by normalizing
using Eq. 10.2.4b (and some properties of Legendre functions).
\[ m_\ell = -\ell, -\ell + 1, \ldots, \ell - 1, \ell. \tag{10.2.14} \]
These limits occur because both $\ell$ and $m_\ell$ count the number of derivatives
in Eq. 10.2.11. If $m_\ell > \ell$, then $P_\ell^{m_\ell} = 0$. If $m_\ell < -\ell$, then you
have a negative number of derivatives and that doesn’t make sense.
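Both claims above (the functions vanish for m_ℓ > ℓ, and the constant in Eq. 10.2.13 normalizes Θ) can be checked with a short script. The sketch below evaluates Eq. 10.2.11 for m_ℓ ≥ 0 by differentiating the polynomial (x² − 1)^ℓ as a coefficient list, then integrates Θ² sin θ with the midpoint rule. All helper names are illustrative assumptions, not anything defined in the text.

```python
from math import cos, factorial, pi, sin, sqrt

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_diff(a):
    """Differentiate a coefficient list once; a constant becomes the empty list."""
    return [k * c for k, c in enumerate(a)][1:]

def assoc_legendre(l, m, x):
    """P_l^m(x) from Eq. 10.2.11, for m >= 0."""
    p = [1]
    for _ in range(l):
        p = poly_mul(p, [-1, 0, 1])        # build (x**2 - 1)**l
    for _ in range(l + m):
        p = poly_diff(p)                   # take l + m derivatives
    deriv = sum(c * x**k for k, c in enumerate(p))
    return (-1) ** m / (2**l * factorial(l)) * (1.0 - x * x) ** (m / 2) * deriv

def theta(l, m, angle):
    """Eq. 10.2.13 for m >= 0."""
    norm = sqrt((2 * l + 1) / 2 * factorial(l - m) / factorial(l + m))
    return norm * assoc_legendre(l, m, cos(angle))

def theta_norm(l, m, steps=20000):
    """Midpoint-rule estimate of the integral in Eq. 10.2.4b."""
    h = pi / steps
    return sum(theta(l, m, (i + 0.5) * h) ** 2 * sin((i + 0.5) * h) for i in range(steps)) * h

print(assoc_legendre(1, 2, 0.3))   # more derivatives than the polynomial can survive
print(theta_norm(2, 1))
```

Once the number of derivatives exceeds 2ℓ, the coefficient list is empty and the function is identically zero, which is the upper limit in Eq. 10.2.14.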
• The radial equation (Eq. 10.2.7a) is much more tedious than the other
two, so we’re going to work through this as succinctly as possible. Just
like for the harmonic oscillator (see Example 9.4.6), we’ll simplify the
process by changing to a unitless variable:
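\[ \rho \equiv \frac{mq^2 r}{4\pi\hbar^2 \epsilon_0} = \frac{r}{a_0} \tag{10.2.15} \]
where
\[ a_0 \equiv \frac{4\pi\hbar^2 \epsilon_0}{mq^2} = 0.0529 \text{ nm} \tag{10.2.16} \]
is the Bohr radius (see Eq. 9.1.2). With a little foresight from Eq.
9.1.3, we can even define a constant n such that
\[ \frac{2m}{\hbar^2} E = -\frac{Z^2}{n^2 a_0^2}. \tag{10.2.17} \]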
We already know E should be negative because V is negative every-
where. However, it’s important to recognize all we know about n is that
it’s a real number. Any further restrictions (like saying it’s a natural
number) must be proven. Using Eq. 10.2.15, Eq. 10.2.17, and the prod-
uct rule for derivatives (Eq. 3.1.5) on the radial equation (Eq. 10.2.7a),
we get
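\[ \rho^2 \frac{\partial^2 R}{\partial \rho^2} + 2\rho \frac{\partial R}{\partial \rho} + \left[ 2Z\rho - \frac{Z^2}{n^2} \rho^2 - \ell (\ell + 1) \right] R = 0 \tag{10.2.18} \]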
• Also, just like in Example 9.4.6, we can pull out factors to account
for end-behavior because we know what the function should look like
there. As ρ → ∞, the radial equation approaches
\[ \frac{\partial^2 R}{\partial \rho^2} - \frac{Z^2}{n^2} R \approx 0 \]
because the $\rho^2$ terms dominate (i.e. they get bigger faster). That means
R(ρ) should include a factor of $e^{-Z\rho/n}$. Technically, $e^{+Z\rho/n}$ is also a
• We could use a power series solution like we did for the harmonic os-
cillator (see Example 9.4.6), but that’s extremely long and we can do
better. If we substitute Eq. 10.2.19 into our new radial equation (Eq.
10.2.18), then we end up with a 15-term differential equation. Some of
those terms either group or cancel leaving us with a 7-term differential
equation. A couple pages of math later, we get
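\[ \rho \frac{\partial^2 u}{\partial \rho^2} + \left[ (2\ell + 1) + 1 - \frac{2Z}{n} \rho \right] \frac{\partial u}{\partial \rho} + \frac{2Z}{n} \left[ n - \ell - 1 \right] u = 0 \]
or
\[ \frac{2Z}{n} \rho \, \frac{\partial^2 u}{\partial (2Z\rho/n)^2} + \left[ (2\ell + 1) + 1 - \frac{2Z}{n} \rho \right] \frac{\partial u}{\partial (2Z\rho/n)} + \left[ n - \ell - 1 \right] u = 0. \]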
\[ L_\beta^\alpha(x) = \frac{x^{-\alpha} e^{x}}{\beta!} \frac{d^\beta}{dx^\beta} \left( e^{-x} x^{\beta + \alpha} \right), \tag{10.2.21} \]
\[ L_3^\alpha = \frac{1}{6} \left[ (\alpha + 1)(\alpha + 2)(\alpha + 3) - 3 (\alpha + 2)(\alpha + 3) \, x + 3 (\alpha + 3) \, x^2 - x^3 \right] \]
Some examples for the hydrogen atom (Z = 1) are shown in Table 10.4.
Figure 10.2: This is an energy level diagram showing the first four energy states for
the hydrogen atom (i.e. for Example 10.2.1 with Z = 1) where r represents the distance
from the proton in the nucleus. The colors match those used in Figures 10.3 and 10.4.
n = 1, 2, 3, 4, . . . , (10.2.24)
\[ E_n = -\frac{Z^2}{n^2} \frac{\hbar^2}{2m a_0^2} = \frac{Z^2}{n^2} \left( -13.6 \text{ eV} \right) \tag{10.2.25} \]
\[ \ell = 0, 1, \ldots, n - 1. \tag{10.2.26} \]
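The numbers quoted in Eqs. 10.2.16 and 10.2.25 follow directly from the fundamental constants. The sketch below plugs in SI values (the CODATA 2018 figures hard-coded here are an assumption of this example, not values quoted by the text) and converts the energy to electron volts. The function names are illustrative.

```python
from math import pi

# SI values (CODATA 2018), assumed for this example.
HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg, electron mass
Q_E = 1.602176634e-19       # C, elementary charge
EPS0 = 8.8541878128e-12     # F/m

def bohr_radius():
    """a0 from Eq. 10.2.16, in meters."""
    return 4 * pi * HBAR**2 * EPS0 / (M_E * Q_E**2)

def energy_ev(n, z=1):
    """E_n from Eq. 10.2.25, converted from joules to eV."""
    a0 = bohr_radius()
    joules = -(z**2 / n**2) * HBAR**2 / (2 * M_E * a0**2)
    return joules / Q_E

print(bohr_radius() * 1e9, "nm")
print([round(energy_ev(n), 3) for n in (1, 2, 3, 4)])
```

Because the energy scales as Z²/n², the n = 2 level of a Z = 2 ion sits at the same energy as the hydrogen ground state in this model.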
• To find the full eigenstate, we need to combine the parts using Eq.
10.2.3. Using a consistent labeling scheme to keep you from seeing any
implied summations, that is
\[ \Psi_{n\ell}^{m_\ell} = R_{n\ell} \, \Theta_\ell^{m_\ell} \, \Phi^{m_\ell}. \]
where
\[ C = a_0^{-3/2} \sqrt{ \left( \frac{2Z}{n} \right)^{2\ell + 3} \frac{(n - \ell - 1)!}{2n \, (n + \ell)!} \, \frac{(2\ell + 1)}{4\pi} \frac{(\ell - m_\ell)!}{(\ell + m_\ell)!} }. \tag{10.2.28} \]
The possible values for n, $\ell$, and $m_\ell$ are given by Eqs. 10.2.24, 10.2.26,
and 10.2.14, respectively. Furthermore, L is an associated Laguerre
polynomial (Table 10.2) and P is an associated Legendre function (Table 10.1).
\[ L^2 \Psi = \ell (\ell + 1) \, \hbar^2 \, \Psi, \tag{10.2.30} \]
\[ L_z \Psi = m_\ell \, \hbar \, \Psi, \tag{10.2.31} \]
Table 10.3: Based on Eqs. 10.2.24, 10.2.26, and 10.2.14 in Example 10.2.1, these are the
possible values for the three quantum numbers n, $\ell$, and $m_\ell$ in the first four electron shells.
The orbital type is also given along with the number of states for each type.
We can also say a few more things about the separated parts of the
eigenstates: $R_{n\ell}$ (Eq. 10.2.22), $\Theta_\ell^{m_\ell}$ (Eq. 10.2.13), and $\Phi^{m_\ell}$ (Eq. 10.2.9).
The radial part, $R_{n\ell}$, determines scale. In Figure 10.3, you can find graphs
of the radial probability densities, $R^2 r^2$ (the integrand of Eq. 10.2.4a), for
the first four s-orbitals in the hydrogen atom. You can see an n = 1 electron
is dramatically more likely to be found around one Bohr radius, $a_0$ (Eq.
10.2.16), from the nucleus than anywhere else. However, this consistency
with the Bohr model quickly deteriorates since the highest peaks don’t line
up with Eq. 9.1.2. Figure 10.4 shows the same for the p-orbitals in the
hydrogen atom. The radial equations used in Figures 10.3 and 10.4 can be
found in Table 10.4.
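The statement about the n = 1 electron can be checked on a grid. The sketch below assumes the standard hydrogen ground-state radial function, R₁₀ = 2 a₀^(−3/2) e^(−r/a₀), which should correspond to the first entry of Table 10.4 (that table is not reproduced here, so treat the formula as an assumption). Distances are measured in units of a₀, and the names are illustrative.

```python
from math import exp

def radial_probability(r):
    """r**2 * R_10**2 with r in units of a0 and R_10 = 2*exp(-r)."""
    return r * r * (2.0 * exp(-r)) ** 2

step = 0.001
grid = [i * step for i in range(1, 20001)]            # 0 < r <= 20 a0

peak_r = max(grid, key=radial_probability)            # most likely radius
total = sum(radial_probability(r) for r in grid) * step   # check of Eq. 10.2.4a

print(peak_r, total)
```

The peak lands at one Bohr radius and the density integrates to one, in line with Eq. 10.2.4a.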
The angular parts, $\Theta_\ell^{m_\ell}$ (Eq. 10.2.13) and $\Phi^{m_\ell}$ (Eq. 10.2.9), tell you
something about the shape of the orbital. If we combine them, then
\[ Y_\ell^{m_\ell} = \Theta_\ell^{m_\ell} \Phi^{m_\ell} = \sqrt{\frac{(2\ell + 1)}{4\pi} \frac{(\ell - m_\ell)!}{(\ell + m_\ell)!}} \left[ P_\ell^{m_\ell}(\cos\theta) \right] e^{i m_\ell \phi}, \tag{10.2.32} \]
where $P_\ell^{m_\ell}$ is a Legendre function (see Table 10.1). This $Y_\ell^{m_\ell}$ is referred to as
a spherical harmonic and several examples can be found in Figure 10.5. It
should be noted here that there is no Z dependence. The number of protons
in the nucleus has no effect on the shape of these orbitals, only their scale
($R_{n\ell}$). You can actually see what they look like if you graph $\sqrt{Y^* Y}$, which
I’ve done for you in Figure 10.5.
If you’ve taken any classes covering orbitals or have looked any of this up,
then the shapes in Figure 10.5 probably look a little strange to you. That’s
because we tend not to use spherical harmonics as a standard. Atoms are
often connected to others in some kind of crystal lattice, so there tends to be
a convenient set of Cartesian axes we can choose. This allows us to switch
to cubic harmonics, which are much easier to work with because they’re
entirely real.
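Cubic harmonics can be found by taking linear combinations of spherical
harmonics (of the same $\ell$) that eliminate the imaginary parts. For example,
the cubic p-orbital ($\ell = 1$) along the x-axis is
\[ p_x = \frac{1}{\sqrt{2}} \left( Y_1^{-1} - Y_1^{1} \right) = \frac{1}{\sqrt{2}} \sqrt{\frac{3}{8\pi}} \sin\theta \left( e^{-i\phi} + e^{i\phi} \right). \]
Using Euler’s formula (Eq. 10.2.8),
\[ \begin{aligned}
p_x &= \frac{1}{\sqrt{2}} \sqrt{\frac{3}{8\pi}} \sin\theta \left( \cos\phi - i \sin\phi + \cos\phi + i \sin\phi \right) \\
&= \frac{1}{\sqrt{2}} \sqrt{\frac{3}{8\pi}} \sin\theta \, (2 \cos\phi) = \sqrt{\frac{3}{4\pi}} \sin\theta \cos\phi
\end{aligned} \]
and using some coordinate transformations (Eq. 1.3.1), we get
\[ p_x = \sqrt{\frac{3}{4\pi}} \frac{x}{\sqrt{x^2 + y^2 + z^2}}. \]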
Be very careful with your negative signs in this process. It’s easy to forget
the extra negative you have for odd m` values. The cubic harmonics for the
first three orbital types (s, p, and d) are shown in Figure 10.6.
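The sign bookkeeping in that combination is easy to get wrong, so a numerical check helps. The sketch below writes out the two ℓ = 1 spherical harmonics that follow from Eqs. 10.2.32 and 10.2.12, forms the p_x combination, and compares it with the real expression derived above. The function names are illustrative assumptions.

```python
import cmath
from math import cos, pi, sin, sqrt

def y1(m, theta, phi):
    """Spherical harmonic for l = 1 and m = +1 or -1 (from Eq. 10.2.32)."""
    sign = -1.0 if m == 1 else 1.0     # the extra negative for the odd positive m
    return sign * sqrt(3 / (8 * pi)) * sin(theta) * cmath.exp(1j * m * phi)

def p_x(theta, phi):
    """Cubic harmonic built from the two complex spherical harmonics."""
    return (y1(-1, theta, phi) - y1(1, theta, phi)) / sqrt(2)

theta, phi = 0.7, 1.9
combo = p_x(theta, phi)
direct = sqrt(3 / (4 * pi)) * sin(theta) * cos(phi)
print(combo, direct)
```

The imaginary part of the combination cancels to rounding error, leaving the real function sin θ cos φ up to the constant in front.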
A couple of the d-orbitals given in Figure 10.6 are labeled very strangely
because we’re choosing to be as descriptive as possible. The labels tell you
something about what the numerator looks like in Cartesian variables as well
as the orbital’s orientation:
• dxz is in the xz-plane,
It’s important to know what these look like because, as it turns out, they’re
the same shape in multiple-electron atoms.
Table 10.4: These are the first ten radial equations, $R_{n\ell}(r)$, for the hydrogen atom (Z = 1).
They were found using Eqs. 10.2.22 and 10.2.23. These were used in Figures 10.3 and 10.4
by computing $R_{n\ell}^2 \, a^{3/2} \, r^2$.
Figure 10.3: This graph shows the probability densities of the first four R(r) (Eq. 10.2.22)
functions for $\ell = 0$ (i.e. the s-orbitals) in the hydrogen atom.
Figure 10.4: This graph shows the probability densities of the first three R(r) (Eq. 10.2.22)
functions for $\ell = 1$ (i.e. the p-orbitals) in the hydrogen atom. Note: The n = 1 energy
level does not have a p-orbital.
Figure 10.5: This is a visual representation of $\sqrt{Y^* Y}$ for the spherical harmonics (Eq.
10.2.32). Only those for the first 4 values of $\ell$ are shown. Note: $Y_\ell^{m_\ell}$ looks the same as
$Y_\ell^{-m_\ell}$ because all negatives disappear in the complex square.
Figure 10.6: These are the first nine cubic harmonics where $Y_\ell^{m_\ell}$ is a spherical harmonic
from Figure 10.5. All orbitals for the first three types (s, p, and d) are shown. The
transformations in Eq. 1.3.1 were used to get functions of x, y, and z. Each of them is
named for the Cartesian numerator.
ms = −s, −s + 1, . . . , s − 1, s; (10.2.38)
still commutes with $L^2$ or $S^2$, so Eqs. 10.2.30 and 10.2.35 are still valid.
However, H no longer commutes with $L_z$ or $S_z$, making the use of quantum
numbers $m_\ell$ and $m_s$ undesirable. It also invalidates Eq. 10.2.31 because $L_z$
and H no longer share the same eigenstates, Ψ (Eq. 10.2.27). Basically, we
can’t make predictions about the orientation of $\vec{L}$ or $\vec{S}$ at the same time we
make predictions about the energy.
Fortunately, we can solve this problem by adding them together. We’ll
define a full angular momentum,
\[ \vec{J} \equiv \vec{S} + \vec{L}, \tag{10.2.41} \]
and
\[ L_z Y_\ell^{m_\ell} = m_\ell \hbar \, Y_\ell^{m_\ell} \quad \text{or} \quad L_z \left| \ell, m_\ell \right\rangle = m_\ell \hbar \left| \ell, m_\ell \right\rangle, \tag{10.2.50} \]
which are always true. Since $L_z$ and $S_z$ commute with each other, $m_\ell$ and
$m_s$ can be predicted at the same time. However, since neither commutes
with H, there’s no guarantee either can be predicted at the same time as n,
$\ell$, s, j, or $m_j$ (which can all be predicted together). The consequence is that
$|j, m_j\rangle$ is an eigenstate of the Hamiltonian, but $|\ell, m_\ell\rangle |s, m_s\rangle$ is not likely to
be.
If the particle is in an eigenstate of the Hamiltonian, then $m_\ell$ and $m_s$ are
not definite, so $|j, m_j\rangle$ must be some linear combination:
\[ \left| j, m_j \right\rangle = \sum_{m_\ell + m_s = m_j} C_{m_\ell, m_s, m_j}^{\ell, s, j} \left| \ell, m_\ell \right\rangle \left| s, m_s \right\rangle. \tag{10.2.51} \]
Table 10.5: This is a small sample of the Clebsch-Gordan coefficients, $C_{m_\ell, m_s, m_j}^{\ell, s, j}$, corresponding
to a spin-1/2 particle in a p-orbital (ℓ = 1). Remember, m_j = m_ℓ + m_s or the
coefficient is zero.
ℓ = 1, s = 1/2                                      |j, m_j⟩
|ℓ, m_ℓ⟩ |s, m_s⟩      |3/2,+3/2⟩  |3/2,+1/2⟩  |1/2,+1/2⟩  |3/2,−1/2⟩  |1/2,−1/2⟩  |3/2,−3/2⟩
|1,+1⟩ |1/2,+1/2⟩      1           0           0           0           0           0
|1,+1⟩ |1/2,−1/2⟩      0           √(1/3)      √(2/3)      0           0           0
|1, 0⟩ |1/2,+1/2⟩      0           √(2/3)      −√(1/3)     0           0           0
|1, 0⟩ |1/2,−1/2⟩      0           0           0           √(2/3)      √(1/3)      0
|1,−1⟩ |1/2,+1/2⟩      0           0           0           √(1/3)      −√(2/3)     0
|1,−1⟩ |1/2,−1/2⟩      0           0           0           0           0           1
Example 10.2.2
An electron is in a p-orbital for which you’ve already measured the full angular
momentum (j = 1/2 and $m_j$ = −1/2). Expand this state, find the
probabilities for each value of $m_\ell$, and calculate $\langle L_z \rangle$.
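• Since this is bra-ket notation, we use Eq. 9.3.8 to find probability. The
probability of finding the electron in $m_\ell = 0$ is
\[ \begin{aligned}
P &= \left\| \left\langle \ell, m_\ell \right| \left\langle s, m_s \right| \left| j, m_j \right\rangle \right\|^2 = \left\| \left\langle 1, 0 \right| \left\langle \tfrac{1}{2}, -\tfrac{1}{2} \right| \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle \right\|^2 \\
&= \left\| \left\langle 1, 0 \right| \left\langle \tfrac{1}{2}, -\tfrac{1}{2} \right| \left( \sqrt{\tfrac{1}{3}} \left| 1, 0 \right\rangle \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle - \sqrt{\tfrac{2}{3}} \left| 1, -1 \right\rangle \left| \tfrac{1}{2}, +\tfrac{1}{2} \right\rangle \right) \right\|^2 \\
&= \left\| \sqrt{\tfrac{1}{3}} \, (1) - \sqrt{\tfrac{2}{3}} \, (0) \right\|^2 = \frac{1}{3}.
\end{aligned} \]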
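but we need to expand into $|\ell, m_\ell\rangle$. First, we’ll operate $L_z$ on the ket
vector:
\[ \begin{aligned}
L_z \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle &= L_z \left( \sqrt{\tfrac{1}{3}} \left| 1, 0 \right\rangle \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle - \sqrt{\tfrac{2}{3}} \left| 1, -1 \right\rangle \left| \tfrac{1}{2}, +\tfrac{1}{2} \right\rangle \right) \\
&= \sqrt{\tfrac{1}{3}} \, (0) \left| 1, 0 \right\rangle \left| \tfrac{1}{2}, -\tfrac{1}{2} \right\rangle - \sqrt{\tfrac{2}{3}} \, (-\hbar) \left| 1, -1 \right\rangle \left| \tfrac{1}{2}, +\tfrac{1}{2} \right\rangle \\
&= \sqrt{\tfrac{2}{3}} \, \hbar \left| 1, -1 \right\rangle \left| \tfrac{1}{2}, +\tfrac{1}{2} \right\rangle
\end{aligned} \]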
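where we’ve used Eq. 10.2.50 to operate. Operating with the bra vector,
\[ \left\langle j, m_j \right| = \left\langle \tfrac{1}{2}, -\tfrac{1}{2} \right| = \sqrt{\tfrac{1}{3}} \left\langle 1, 0 \right| \left\langle \tfrac{1}{2}, -\tfrac{1}{2} \right| - \sqrt{\tfrac{2}{3}} \left\langle 1, -1 \right| \left\langle \tfrac{1}{2}, +\tfrac{1}{2} \right|, \]
we get
\[ \left\langle L_z \right\rangle = \sqrt{\tfrac{1}{3}} \sqrt{\tfrac{2}{3}} \, \hbar \, (0) - \sqrt{\tfrac{2}{3}} \sqrt{\tfrac{2}{3}} \, \hbar \, (1) = -\frac{2\hbar}{3}. \]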
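The arithmetic in Example 10.2.2 can be reproduced in a few lines. The sketch below stores the two nonzero Clebsch-Gordan coefficients for the |1/2, −1/2⟩ column of Table 10.5 in a dictionary keyed by (m_ℓ, m_s), then reads off the probabilities and ⟨L_z⟩ in units of ħ. The dictionary layout and function names are assumptions made for illustration.

```python
from math import sqrt

# |j = 1/2, m_j = -1/2> expanded in |l = 1, m_l>|s = 1/2, m_s>, read from Table 10.5.
state = {
    (0, -0.5): sqrt(1 / 3),      # |1, 0>|1/2, -1/2>
    (-1, +0.5): -sqrt(2 / 3),    # |1, -1>|1/2, +1/2>
}

def prob_ml(ml):
    """Probability of measuring a given m_l (sum over the spin label)."""
    return sum(abs(c) ** 2 for (m, _), c in state.items() if m == ml)

def expect_lz():
    """<L_z> in units of hbar: each m_l weighted by its probability."""
    return sum(abs(c) ** 2 * m for (m, _), c in state.items())

print(prob_ml(0), prob_ml(-1), prob_ml(+1))
print(expect_lz())
```

Because the product states are orthonormal, the cross terms in ⟨L_z⟩ vanish and only the squared coefficients contribute, which is why a probability-weighted sum is enough here.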
Example 10.2.3
An electron in a $p_z$-orbital has already been measured to be
spin-down. Expand this state into $|j, m_j\rangle$.
• It’s an electron, so s = 1/2. It’s also spin-down, so $m_s$ = −1/2.
Fine Structure
We mentioned several times that an electron in shell n has an energy
given by Eq. 10.2.25. This implies that the electron can have any possible
value for $\ell$ or $m_\ell$ for that n and have exactly the same energy. When more
than one stationary state has the same energy, we say the model has degeneracy.
The three-dimensional infinite well (Eq. 9.4.15) and the three-dimensional
harmonic oscillator (Eq. 9.4.87) had this same problem, so it
might seem commonplace when working in three dimensions.
This isn’t really true though. In deriving the stationary states for single-
electron atoms in Example 10.2.1, we unwittingly made some assumptions
and we all know what happens when we assume. Here is a list of those
assumptions:
1. the nucleus was stationary,
2. the electron was non-relativistic,
3. the electron had no spin,
4. the proton had no spin, and
5. the Coulomb potential energy (Eq. 10.2.1) was continuous.
These were great approximations for getting us simple stationary states like
those in Eq. 10.2.29, but we need to be careful about the conclusions we take
away from approximate results.
Now, if we hadn’t made assumptions 2-5, then Schrödinger’s equation
(Eq. 9.2.7) would have been analytically unsolvable. I’m not suggesting we
start over and do this numerically (although, you could). I’m just saying we
can get closer to reality by adjusting our results a bit. First, we’ll define the
fine structure constant, which is
\[ \alpha = \frac{q^2}{4\pi\hbar c \epsilon_0} = \frac{\hbar}{a_0 m c} = 7.29735257 \times 10^{-3} = \frac{1}{137.036}, \tag{10.2.53} \]
according to 2014 CODATA recommended values. It’s a unitless quantity, so
named because it’s involved in very small adjustments to the energy levels.
Using this for the hydrogen atom (Z = 1), the energy levels (Eq. 10.2.25)
can be written as
\[ E_n = -\frac{1}{n^2} \frac{\hbar^2}{2m a_0^2} = -\frac{\alpha^2 mc^2}{2n^2}, \tag{10.2.54} \]
where m = me is the mass of the electron and c is the speed of light (Eq.
5.5.4). This gives us a basis for comparison.
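Eqs. 10.2.53 and 10.2.54 are simple enough to evaluate directly. The sketch below uses hard-coded SI constants (CODATA 2018 values, which are an assumption of this example and differ slightly in the last digits from the 2014 values the text quotes) to compute α and the hydrogen energy levels in eV. The function names are illustrative.

```python
from math import pi

# SI values (CODATA 2018), assumed for this example.
HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg, electron mass
Q_E = 1.602176634e-19       # C, elementary charge
EPS0 = 8.8541878128e-12     # F/m
C = 299792458.0             # m/s

def fine_structure_constant():
    """alpha from the first form of Eq. 10.2.53."""
    return Q_E**2 / (4 * pi * EPS0 * HBAR * C)

def energy_ev(n):
    """E_n from the second form of Eq. 10.2.54, converted to eV."""
    alpha = fine_structure_constant()
    return -alpha**2 * M_E * C**2 / (2 * n**2) / Q_E

alpha = fine_structure_constant()
print(alpha, 1 / alpha)
print(energy_ev(1), energy_ev(2))
```

Writing the energy as a fraction α²/2 of the electron rest energy mc² is what makes the later corrections, which carry further powers of α, easy to compare in size.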
We’re going to keep things as straightforward as possible by handling one
approximation at a time. The following list is in the same order as the list
above and applies only to hydrogen (Z = 1):
\[ \Delta E_{n,\mu} = \left( \frac{1836}{1837} - 1 \right) E_n = -5.444 \times 10^{-4} \, E_n, \tag{10.2.56} \]
\[ H = KE + PE = \left[ E_{\text{rel}} - E_p \right] + V \]
\[ H = mc^2 \left[ \sqrt{1 + \frac{p_{\text{rel}}^2}{m^2 c^2}} - 1 \right] + V. \tag{10.2.57} \]
\[ \Delta E_{n,\text{rel}} \approx \frac{\alpha^2}{2n^2} \left[ \frac{4n}{2\ell + 1} - \frac{3}{2} \right] E_n, \tag{10.2.58} \]
where the factor in front depends on the state (i.e. the quantum numbers
n and $\ell$). However, if we do another order of magnitude approximation,
we get ≈ $10^{-4}$ to $10^{-6}$ for the states the electron is found in
the most often (i.e. the states near the nucleus).
\[ \vec{S} \bullet \vec{L} = \frac{1}{2} \left( J^2 - S^2 - L^2 \right), \tag{10.2.60} \]
which allows us to avoid using operators that don’t commute with H.
This additional term demands an adjustment in energy of
\[ \Delta E_{n,\text{so}} \approx -\frac{\alpha^2}{2n^2} \left[ \frac{2n \left[ j (j + 1) - \ell (\ell + 1) - 3/4 \right]}{\ell \, (2\ell + 1) (\ell + 1)} \right] E_n, \tag{10.2.61} \]
where the factor in front depends on the state (i.e. the quantum numbers
n, $\ell$, and j). Performing another order of magnitude approximation,
we get ≈ $10^{-4}$ to $10^{-6}$ for the states the electron is found in the
most often (i.e. the states near the nucleus). This is the same order of magnitude as the
relativistic adjustment, so we’ll add Eq. 10.2.58 to Eq. 10.2.61 to get
\[ \Delta E_{n,\text{fs}} \approx \frac{\alpha^2}{2n^2} \left[ \frac{4n}{2j + 1} - \frac{3}{2} \right] E_n. \tag{10.2.62} \]
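It is worth confirming that Eqs. 10.2.58 and 10.2.61 really do add up to Eq. 10.2.62, since the ℓ dependence has to cancel. The sketch below compares the dimensionless factors multiplying E_n for every ℓ ≥ 1 state in the first few shells, with j = ℓ ± 1/2. The function names are illustrative, and ℓ = 0 is skipped because Eq. 10.2.61 divides by ℓ.

```python
ALPHA = 1 / 137.036     # fine structure constant (Eq. 10.2.53)

def rel_factor(n, l):
    """Factor multiplying E_n in Eq. 10.2.58."""
    return ALPHA**2 / (2 * n**2) * (4 * n / (2 * l + 1) - 1.5)

def so_factor(n, l, j):
    """Factor multiplying E_n in Eq. 10.2.61 (requires l >= 1)."""
    top = 2 * n * (j * (j + 1) - l * (l + 1) - 0.75)
    return -ALPHA**2 / (2 * n**2) * top / (l * (2 * l + 1) * (l + 1))

def fs_factor(n, j):
    """Factor multiplying E_n in Eq. 10.2.62."""
    return ALPHA**2 / (2 * n**2) * (4 * n / (2 * j + 1) - 1.5)

for n in range(2, 5):
    for l in range(1, n):
        for j in (l - 0.5, l + 0.5):
            total = rel_factor(n, l) + so_factor(n, l, j)
            print(n, l, j, total, fs_factor(n, j))
```

The two columns agree for both values of j, so the combined shift depends only on n and j, which is the pattern drawn in Figure 10.8.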
where $\vec{S}_p$ is the proton spin, $\vec{S}_e$ is the electron spin, $H_{\text{fs}}$ is given by
Eq. 10.2.59, and $\delta^3(\vec{r})$ is the Dirac delta function (Eq. 5.3.7). This is
called spin-spin coupling because we can no longer consider the spins
separately. The full spin is defined as
\[ \vec{S} \equiv \vec{S}_p + \vec{S}_e, \tag{10.2.64} \]
\[ \vec{S}_p \bullet \vec{S}_e = \frac{1}{2} \left( S^2 - S_p^2 - S_e^2 \right), \tag{10.2.65} \]
which allows us to avoid using operators that don’t commute with H.
A consequence is that s and ms now describe the full spin, not the
individual spins of the particles.
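Since the operations themselves depend on $\ell$, the solution must be
piecewise:
\[ \Delta E_{n,\text{hf}} \approx -\frac{11.18 \, \alpha^2}{n} \frac{m_e}{m_p} E_n \begin{cases} \dfrac{1}{6} (2j \pm 1)(2j \pm 1 + 2) - 1, & \text{if } \ell = 0 \\[2ex] \dfrac{\pm 1}{(2j \pm 1 + 1)(2\ell + 1)}, & \text{if } \ell \neq 0 \end{cases}, \tag{10.2.66} \]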
Table 10.6: This is a summary of the (fine and hyperfine) adjustments to the energy
levels, En (Eq. 10.2.25), in the hydrogen atom. Each is given as an order of magnitude
for simplicity and clarity.
Adjustments like those outlined in Table 10.6 may be small, but they’re
still important. Recall the energy levels in single-electron atoms (Eq. 10.2.25)
were only dependent on n (the shell number), so there was a lot of degeneracy.
However, many of these small adjustments are also dependent on $\ell$
(the orbital number) and j, so different orbital types and spin configurations
can have slightly different energies. That means the degeneracy is broken in
$\ell$ and j (see Figures 10.7 and 10.8). Breaking the degeneracy in orientation
($m_j$) requires an external magnetic field.
We also see the consequences in nature. For example, there is a famous
spectral line of hydrogen called the “21 cm line” observed in interstellar
clouds. Due to spin-spin coupling, the ground state of hydrogen ($\psi_{10}^{0}$) actually
has two possible energy values that differ by
Figure 10.7: This is an energy level diagram for the first shell (n = 1) of the hydrogen
atom. As you move to the right, sensitivity increases until all adjustments from Table 10.6
are included. A transition between the two hyperfine states results in the 21 cm spectral
line observed in interstellar clouds.
Figure 10.8: This is an energy level diagram for the second shell (n = 2) of the hydrogen
atom. As you move to the right, sensitivity increases until all adjustments from Table
10.6 are included. Unlike in Figure 10.7, there are two fine structure states since j can be
either 1/2 or 3/2. The p-orbitals ($\ell = 1$) have been shown in orange for clarity.
10.3. MULTIPLE-ELECTRON ATOMS 453
If the atom transitions from the higher to lower energy, then it will release a
photon with a frequency of
\[ f = \frac{\Delta E}{h} = 1420 \text{ MHz} \]
and a wavelength of
\[ \lambda = \frac{c}{f} = 21.11 \text{ cm} \]
in the microwave range. Hyperfine transitions are also important in Cesium
(atomic) clocks and refinement of nuclear material.
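The two conversions above take one line each. The sketch below uses the rounded frequency quoted in the text (1420 MHz) together with the exact SI values of c, h, and the elementary charge. The function names are illustrative.

```python
C = 299792458.0            # m/s, speed of light
H = 6.62607015e-34         # J s, Planck constant
Q_E = 1.602176634e-19      # C, converts joules to eV

def wavelength_cm(freq_hz):
    """lambda = c / f, expressed in centimeters."""
    return C / freq_hz * 100.0

def photon_energy_ev(freq_hz):
    """Delta E = h f, expressed in electron volts."""
    return H * freq_hz / Q_E

f = 1420e6   # Hz, the rounded value used in the text
print(wavelength_cm(f), photon_energy_ev(f))
```

The photon energy comes out near six millionths of an electron volt, tiny compared to the 13.6 eV ground-state energy, which is why this counts as a hyperfine effect.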
a kinetic energy for each electron and a potential energy for each interaction.
It’s the repulsion between the two electrons, PE1,2 , that causes all the trou-
ble. Without it, the Hamiltonian is made of commuting parts (one for each
electron), the solution would be separable (Ψ = Ψ1 Ψ2 ) and we could carry
everything over from single-electron atoms.
Unfortunately, the PE1,2 term is just as significant as the others, so it
cannot be ignored. Making the quantum substitutions for KE (Eq. 9.2.4)
and PE (Eq. 10.2.1), we get
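\[ H = -\frac{\hbar^2}{2m} \vec{\nabla}_1^2 - \frac{\hbar^2}{2m} \vec{\nabla}_2^2 - \frac{2q^2}{4\pi\epsilon_0 r_1} - \frac{2q^2}{4\pi\epsilon_0 r_2} + \frac{q^2}{4\pi\epsilon_0 \left| \vec{r}_2 - \vec{r}_1 \right|}, \tag{10.3.1} \]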
Figure 10.9: This is helium drawn as a three-body problem. The labels 1 and 2 correspond
to each electron for use in Eq. 10.3.1.
where the first summation represents all the kinetic energies plus interac-
tions with the nucleus and the second summation represents all the electron
Figure 10.10: These are orbital diagrams for the ground state of hydrogen and helium.
The arrows represent spin-up (ms = +1/2) or spin-down (ms = −1/2) electrons. The last
two boxes show impossible cases for helium due to the Pauli exclusion principle.
repulsion energy. Now we’d like to know how these orbitals are filled as the
atoms get larger. At any given time, the electrons could technically be in
any of them. Statistically speaking though, they prefer to be in a state with
as low an energy as possible.
Additionally, an electron cannot simultaneously occupy the same total
quantum state as another electron. It’s called the Pauli exclusion principle,
named for Wolfgang Pauli, and applies to more than just the electron
(see Appendix D for more details). Emphasis is put on the word “total”
because electrons (all s = 1/2) can still have the same n, $\ell$, and $m_\ell$ as long
as $m_s$ is different. Since $m_s$ = ±1/2 for an electron, there can only be two
electrons (one spin-up and one spin-down) in each orbital (i.e. each state
given by n, $\ell$, and $m_\ell$). Figure 10.10 is an orbital diagram showing this
phenomenon for hydrogen and helium.
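Counting the allowed total states shows how many electrons each shell and orbital type can hold. The sketch below enumerates every (n, ℓ, m_ℓ, m_s) combination permitted by Eqs. 10.2.26 and 10.2.14 with m_s = ±1/2. The function names are illustrative assumptions.

```python
def shell_states(n):
    """All (n, l, m_l, m_s) combinations allowed in shell n."""
    return [
        (n, l, ml, ms)
        for l in range(n)                 # l = 0, 1, ..., n - 1
        for ml in range(-l, l + 1)        # m_l = -l, ..., +l
        for ms in (-0.5, +0.5)            # spin-down, spin-up
    ]

def subshell_capacity(l):
    """Two electrons in each of the 2l + 1 orbitals of a given type."""
    return 2 * (2 * l + 1)

print([len(shell_states(n)) for n in (1, 2, 3, 4)])
print([subshell_capacity(l) for l in range(4)])   # s, p, d, f
```

Each tuple is distinct, so the Pauli exclusion principle caps shell n at 2n² electrons.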
Recall in Figure 10.8, there was a p-orbital lower than an s-orbital for
n = 2. This is a phenomenon unique to hydrogen. Since the energy is affected
by electron repulsion (Eq. 10.3.2), it breaks the degeneracy in $\ell$ without fine
structure considerations. The base energy should now be written as $E_{n\ell}$
rather than just $E_n$. In all atoms larger than hydrogen (Z ≥ 2), orbitals
with the same n but a larger $\ell$ will have a higher energy (e.g. 2p is always
higher than 2s, 3d is always higher than 3p, etc.). The same cannot be
said when the values of n are different (e.g. 4p is always higher than 3d, yet
whether 4s or 3d is higher depends on the atom). This occurs because the
energy levels get closer together as they increase (see Figure 10.2). There is a
set of rules for this called Hund’s rules, but they have a ton of exceptions.
I don’t think any guideline with that many exceptions can really be called a
“rule,” so I’ll show you a better way.
We also need to remember that it’s not really the orbital that has energy.
It’s the electrons in those orbitals. Two different electrons in the same orbital
Figure 10.11: This graph shows the energies of the single outermost electron for each atom up to Z = 86 (the 6th row of the
periodic table). They were found by determining the energy required to ionize each atom (i.e. remove that electron). Values
of n (shell number) are indicated by color and values of $\ell$ (orbital type) are indicated by shape. You can clearly see where
orbital types d and f make simple rules impossible.
Periodic Table
All of this information about multiple-electron atoms and their orbitals gives
us the ability to construct the periodic table of elements. As a bit of
history, the periodic table was developed by Dmitri Mendeleev in 1869 CE,
long before we even knew for sure that matter was made of atoms (although
we suspected). It’s not called the periodic table of “atoms” after all. You
only have an “element” when there are enough of the same atom to make
something exist on our scale of the universe. In other words, elements are
macroscopic, but atoms are microscopic.
Mendeleev grouped elements into columns by similar chemical proper-
ties, then (assuming atoms existed) by atomic weight. At this point, the
only thing we could know about atoms was that they were very small (as
Democritus suggested in Section 9.1) because we couldn’t see them under
microscopes. Unfortunately, we didn’t know exactly how small, let alone
what they looked like. Atomic weight was measured relative to hydrogen,
the lightest substance we had discovered, by balancing chemical equations.
Today, after almost two centuries of experiments, we know atoms are
about 1/10 nm in diameter (give or take). They are made of a nucleus (protons
and neutrons) surrounded by a cloud of electrons. Most of the atomic mass
(formerly “atomic weight”) is in the nucleus, but this is no longer a criterion
for periodic table placement. Instead, we use the atomic number, Z, the
number of protons. How the electrons are organized into orbitals (i.e. the
electron configuration) determines the chemical properties of the element
and, therefore, the columns (i.e. “groups”) of the table. Unfortunately, the
d-type and f-type orbitals often behave strangely, so this isn’t as easy as it
sounds.
The energy of electrons in d-type or f-type orbitals is significantly higher
than the corresponding s-type or p-type, so the higher shells (determined
Figure 10.12: This is the orbital diagram for the ground state of Nickel (Z = 28). The
arrows represent spin-up (ms = +1/2) or spin-down (ms = −1/2) electrons and the
energy axis is not to any particular scale. Pairing opposite-spin electrons requires a bit
more energy than lone electrons, so they tend to occupy every individual orbital (of each
type) before pairing. Each box in each orbital-type is a single orbital and corresponds to
a possible value of m` from Table 10.3.
Figure 10.13: This is the orbital diagram for the ground state of Copper (Z = 29) similar
to Figure 10.12. Since the nucleus is larger than Nickel’s, it attracts the electrons more
and all the orbitals are lower on the chart. However, 3d has more electrons in it, so it’s
attracted a little more bringing it lower than the 4s. As a result, a 4s electron falls into
the remaining spot in 3d and the remaining 4s electron is very loose making copper a very
good conductor of electricity.
Figure 10.14: This is the orbital diagram for the ground state of Cerium (Z = 58) similar
to Figures 10.12 and 10.13. We’ve included only the outermost electrons since we don’t
know much about those inner electrons anyway. The 4f and 5d electrons have almost
exactly the same energy, so the 5d electron frequently oscillates between 5d and 4f.
Figure 10.15: This is the orbital diagram for the ground state of Praseodymium (Z = 59)
similar to Figure 10.14. Since the nucleus is larger than Cerium’s, it attracts the electrons
more and all the orbitals are lower on the chart. However, 4f has more electrons in it, so
it’s attracted a little more bringing it lower than the 5d. As a result, the 5d electron falls
into a stable 4f state. Some electrons remain unpaired similar to Figure 10.12.
Figure 10.16: This shows how the periodic table is organized by n (shell number) and ℓ
(orbital type). The elements shown in Figures 10.10 (Hydrogen and Helium), 10.12 (Nickel),
10.13 (Copper), 10.14 (Cerium), and 10.15 (Praseodymium) are also shown here for reference.
Helium is sometimes shown in two different places because it has the chemical properties
of both groups.
and you include one of these terms for each orbital being occupied. We
know each orbital can only hold up to two electrons and we know how many
orbitals each type has (Table 10.3), so
• s-types can hold 2 × 1 = 2,
Table 10.7: These are the electron configurations of the few example atoms from this
section. Noble gases (Argon and Xenon) have been used as shorthand.
Element        Symbol  Full configuration                                                          Shorthand
Hydrogen       H       1s                                                                          1s
Helium         He      1s^2                                                                        1s^2
Nickel         Ni      1s^2 2s^2 2p^6 3s^2 3p^6 4s^2 3d^8                                          [Ar] 4s^2 3d^8
Copper         Cu      1s^2 2s^2 2p^6 3s^2 3p^6 4s^1 3d^10                                         [Ar] 4s^1 3d^10
Cerium         Ce      1s^2 2s^2 2p^6 3s^2 3p^6 4s^2 3d^10 4p^6 5s^2 4d^10 5p^6 6s^2 4f 5d         [Xe] 6s^2 4f 5d
Praseodymium   Pr      1s^2 2s^2 2p^6 3s^2 3p^6 4s^2 3d^10 4p^6 5s^2 4d^10 5p^6 6s^2 4f^3          [Xe] 6s^2 4f^3
orbitals on the outside of the atom. These correspond to the orbitals in that
atom’s row (or “period”) of the periodic table (Figure 10.16), so we swap
the other terms in the configuration with the noble gas symbol (far right of
table) from the row above. A few examples are shown in Table 10.7.
This section might seem like it got a little “wordy” near the end since
there wasn’t much math we could do. For those of you who didn’t bother to
read any of it, here’s a summary of the important bits:
• Each orbital can hold up to two electrons as long as their spin orienta-
tions, ms , are opposite.
10.4. ART OF INTERPRETATION 463
• However, since d-type and f-type orbitals are complicated, the number
of spots in rows of the periodic table is not (2,8,18,32,50,72,98); but
rather (2,8,8,18,18,32,32).
• With no external energy, electrons will fill orbitals from lower energy
to higher energy with no exceptions. This order only very loosely corresponds
to the order of n and ℓ, so thinking in terms of n and ℓ is not
recommended.
• The size of the electron cloud shrinks as you move from left to right (in
the periodic table) because the larger nucleus causes more attraction.
The size grows as you move from top to bottom because more layers
of electrons are added.
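The counting behind the first two summary points can be sketched in a few lines. This is only a sketch: it assumes each orbital type ℓ has 2ℓ + 1 orbitals (the standard count, which is presumably what Table 10.3 lists), and the function names are hypothetical.

```python
# Each orbital holds 2 electrons (opposite spins), and orbital type l is
# assumed to have 2l + 1 orbitals (one per allowed value of m_l).
ORBITAL_TYPES = {"s": 0, "p": 1, "d": 2, "f": 3}

def orbital_capacity(l):
    """Electrons that fit in all orbitals of type l."""
    return 2 * (2 * l + 1)

def shell_capacity(n):
    """Electrons that fit in shell n if every type l = 0..n-1 is counted."""
    return sum(orbital_capacity(l) for l in range(n))

capacities = {name: orbital_capacity(l) for name, l in ORBITAL_TYPES.items()}
shells = [shell_capacity(n) for n in range(1, 8)]

print(capacities)  # per-type capacities
print(shells)      # naive shell-by-shell count, NOT the actual row lengths
```

The naive shell count reproduces the (2,8,18,32,50,72,98) sequence quoted above, which is exactly the sequence the real periodic table does not follow.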
If you didn’t read the paragraphs, I’d recommend you go back and do that
in the future when you have time. Students often miss valuable information
about the actual physics by only reading math, tables, and figures. Physics
isn’t in the math. It’s in the language, concepts, and interpretation.
scale). That’s why we only tend to use statistics (in scientific theory) when
everything else becomes impractical (e.g. when dealing with large numbers
of objects).
However, as we saw in the examples in Sections 9.4 and 10.2, this is not the
case in quantum mechanics where we apply statistics to individual particles.
Why do we do that? We have no choice. As we saw in Section 9.1, when
we try all the other mathematical tools, the whole model fails. Even when
we add other behavior restrictions for no reason (e.g. Bohr’s allowed orbits),
the model falls short of explaining everything. The examples in Sections 9.4
and 10.2 had no real interpretation in them, so they were more like applied
math than physics. The actual physics is a bit of an art and it can drive you
a little crazy. Read forward at your own risk.
Ensemble of Particles
The interpretation that I think makes the most sense to people is that the
wave function doesn’t apply to a single particle, but an ensemble of par-
ticles. The idea is that, if you prepare say 10,000 identical experiments
involving a certain kind of particle, then the wave function tells you how
many of them will turn out a certain way. Recall the electron in the finite
square well from Example 9.4.3:
• In Example 9.4.4, we found that there was a
The same happens for an electron in the p-orbital from Example 10.2.2. If
you set up a bunch of these experiments and measure Lz , then 1/3 of them
will come out Lz = 0 (mℓ = 0) and 2/3 of them will come out Lz = −ħ (mℓ = −1).
However, if you measure the position of the electron within an orbital,
things get a little more visually interesting. The shapes of the orbitals are
1s
2s 2px
3s 3px 3dxy
Figure 10.17: These are probability plots for a few orbitals in the hydrogen atom. Only
the xy-plane cross section is shown for clarity. Orange pixels represent a measurement of
the electron’s position, so more concentrated orange means there is a higher probability.
given in Figure 10.6, but the electron is more likely to be found some places
than others inside the shape. To account for that, we need to include Rnℓ
(Eq. 10.2.22) to get the full eigenstate (Eq. 10.2.27). Let’s say you set up
10,000 electrons in identical hydrogen atoms, measure the positions of the
electrons in each, and make a composite image of all 10,000. The result
would be images like those in Figure 10.17.
All of this is certainly true about identical experiments, but does it ac-
tually mean the wave function doesn’t apply to a single particle? This in-
terpretation makes a lot of sense to people because it assumes it’s just a
problem of our ignorance. It says, somewhere underneath all this statistics,
there is a deterministic theory (i.e. one where anything can be predicted
as long as you know all the variables). Proponents argue there are just some
hidden variables we can’t yet measure. However, history has shown us,
reality doesn’t always lie in our comfort zone.
Bell’s Inequality
In 1964, John Stewart Bell published a paper proving any local hidden variable
theory was impossible. Let’s say you have a neutral pion, $\pi^0$ (not to be
confused with the negative pion, $\pi^-$, used in Example 7.4.1). The neutral
pion is weird since it is its own antiparticle, so it decays into two photons,
\[ \pi^0 \to 2\gamma, \tag{10.4.1} \]
about 98% of the time. This isn’t very useful. Luckily, it decays into a
photon and an electron-positron pair,
\[ \pi^0 \to \gamma + e^- + e^+, \tag{10.4.2} \]
about 1.2% of the time. The electron ($e^-$) and positron ($e^+$) travel in opposite
directions with opposite spins. Unfortunately, each has an equal probability
of being the spin-up ($m_s = +1/2$) particle, so we would say the entangled
pair is in the state
\[ |0,0\rangle = \sqrt{\tfrac{1}{2}}\,\left|\tfrac{1}{2},+\tfrac{1}{2}\right\rangle_{e^-}\left|\tfrac{1}{2},-\tfrac{1}{2}\right\rangle_{e^+} - \sqrt{\tfrac{1}{2}}\,\left|\tfrac{1}{2},-\tfrac{1}{2}\right\rangle_{e^-}\left|\tfrac{1}{2},+\tfrac{1}{2}\right\rangle_{e^+} \tag{10.4.3} \]
\[ \left|\hat{a}\bullet\hat{c} - \hat{a}\bullet\hat{b}\right| \le 1 - \hat{b}\bullet\hat{c}. \tag{10.4.5} \]
This must be true for all $\hat{a}$, $\hat{b}$, and $\hat{c}$ no matter how far apart the detectors;
so one counterexample would show a contradiction. Setting
\[ \hat{a} = \hat{x}, \quad \hat{b} = \hat{y}, \quad \text{and} \quad \hat{c} = \frac{\hat{x}+\hat{y}}{\sqrt{2}} \]
gives us
\[ \left|\hat{x}\bullet\frac{\hat{x}+\hat{y}}{\sqrt{2}} - \hat{x}\bullet\hat{y}\right| \le 1 - \hat{y}\bullet\frac{\hat{x}+\hat{y}}{\sqrt{2}} \]
\[ \left|\frac{1}{\sqrt{2}} - 0\right| \le 1 - \frac{1}{\sqrt{2}} \]
\[ \frac{1}{\sqrt{2}} \le 1 - \frac{1}{\sqrt{2}}, \]
which is not true. This leaves us with only two possibilities.
1. The universe is inherently non-local.
• Neither the electron nor the positron had a definite spin prior to
the measurements. The particles were physically in a superpo-
sition of the two states (Eq. 10.4.3) until the measurement was
made.
Bell’s inequality has since been further generalized and many experiments
have been done verifying all versions. As a result, the physics community
has all but abandoned hidden variable theories.
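The counterexample used against Eq. 10.4.5 is easy to check numerically. The sketch below assumes the left side of the inequality is read as an absolute value; the helper names (`dot`, `bell_holds`) are hypothetical and only illustrate the arithmetic.

```python
import math

def dot(u, v):
    """Ordinary dot product of two 3-component vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def bell_holds(a, b, c):
    """True if |a.c - a.b| <= 1 - b.c for unit vectors a, b, c."""
    return abs(dot(a, c) - dot(a, b)) <= 1.0 - dot(b, c)

x_hat = (1.0, 0.0, 0.0)
y_hat = (0.0, 1.0, 0.0)
c_hat = tuple((xi + yi) / math.sqrt(2.0) for xi, yi in zip(x_hat, y_hat))

lhs = abs(dot(x_hat, c_hat) - dot(x_hat, y_hat))  # 1/sqrt(2)
rhs = 1.0 - dot(y_hat, c_hat)                     # 1 - 1/sqrt(2)
print(lhs, rhs, bell_holds(x_hat, y_hat, c_hat))
```

The left side comes out near 0.707 and the right side near 0.293, so the quantum prediction violates the inequality for this choice of detector orientations.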
Copenhagen Interpretation
Throughout the 1920s, Werner Heisenberg collaborated with Niels Bohr in Copenhagen,
Denmark. They were trying to come to some kind of agreement about
what quantum mechanics was saying. In the end, they agreed on almost ev-
erything. Heisenberg gave a series of lectures in 1929 (and published a book
in 1930) outlining the conclusions. He didn’t coin the term “Copenhagen
interpretation” until the 1950s while criticizing other interpretations. The
term implies a level of historical formality that doesn’t really exist. Still,
I’ll do my best at defining it.
We’ve already seen many of the principal ideas in the Copenhagen interpretation,
but we’ll include them again here in the interest of clarity. As of
1930, the description is as follows:
1. The wave function, ψ(~r, t), completely describes the state of a system.
2. All quantum entities can display either particle properties, wave prop-
erties, or some combination of the two depending on the experiment
being performed.
The difference between what we can predict and what we actually measure
is tricky business, but both have equal footing in physical reality.
1. Subject: Bullets
• As the bullets pass through the slit plate, they ricochet off the
armored walls in all directions. The ones that make it through,
will make their way toward the box of sand and stop. After an
Figure 10.18: This is the basic experimental layout for Feynman’s double-slit thought
experiment. Position, x, along the detector is measured from the bottom edge. The
openings in the slit plate are labeled 1 and 2 for reference.
2. Subject: Water
Source: Piston
Slit plate: Wood
Detector: Chain of floating buoys
Assumptions: Piston and buoys can only move vertically.
• As the piston moves, surface waves are created on the water that
move in all directions. The ones that make it through the slit
plate, will make their way toward the buoys and cause them to
bounce. We measure the maximum displacement of each buoy for
an hour (i.e. the amplitude) and take an average.
3. Subject: Electrons
Source: Filament
Slit plate: Tungsten radiation shielding
Detector: Chain of Geiger counters
We’ll run through each experiment three different ways: once with both slits
open, once with only slit 1 open, and once with only slit 2 open. This will
allow us to examine their true behavior.
The ultimate result of each experiment is going to be a comparison be-
tween how the detector pattern looks from two open slits (labeled I12 ) and
how we expect it to look based on the two single-slit patterns (labeled I1 and
I2 ). If the subject of the experiment is a particle, then they will just build
up independently and the two single-slit patterns will simply add. In terms
of intensity at each value of x on the screen in Figure 10.18, that can be
written as I12 (x) = I1 (x) + I2 (x).
If the subject of the experiment is a wave, then it’s the disturbances (i.e.
amplitudes) of the wave that add. In terms of amplitude at each value of
x, that can be written as
\[ I_{12} \propto (A_{12})^2 \]
\[ I_{12} \propto (A_1 + A_2)^2 \]
\[ I_{12} \propto (A_1)^2 + (A_2)^2 + 2A_1A_2\cos(\varphi_0), \]
having used the law of cosines in the last step. We also know $I_1 \propto (A_1)^2$,
$I_2 \propto (A_2)^2$, and $\varphi_0$ is the phase difference between the two waves; so
\[ I_{12}(x) = I_1(x) + I_2(x) + 2\sqrt{I_1(x)\,I_2(x)}\,\cos\!\left(\frac{2\pi d}{\lambda}\,\frac{x}{z}\right). \tag{10.4.8} \]
Figure 10.19: The first graph shows the intensity from each individual slit when the other
is closed. The second graph shows the intensity when both slits are open if you’re firing
particles (e.g. bullets). The third graph shows the intensity when both slits are open if
you’re firing waves (e.g. water).
where λ is the wavelength of the wave, d is the distance between the slits
(comparable to λ), and z is the distance between the slit plate and detector
(z ≫ x).
The graphs for I1 and I2 will look very similar for both particles and
waves (due to the behavior of waves passing through a single small opening).
However, as you can see in Figure 10.19, the graphs for I12 look very different.
1. Bullets from one slit don’t “interfere” with bullets from the other slit,
so they behave like particles showing the pattern in the second graph
in Figure 10.19.
2. Water waves are a different story. As the waves exit the two slits, they
spread out and overlap. The water must respond to both simultane-
ously, which is what we call interference. By the time the waves get
to the chain of buoys, some parts are adding together and some are
canceling out. This results in the third graph in Figure 10.19.
Both experiments have shown exactly what we would expect and we now
have a basis from which to judge electrons.
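To make the contrast between the bullet and water patterns concrete, here is a minimal sketch that evaluates both at a few detector positions. The Gaussian single-slit shape, the numerical values, and measuring x from the midpoint between the slits are all assumptions made for illustration; only the two combination rules come from the text.

```python
import math

def single_slit(x, center, width=40.0):
    """Hypothetical single-slit intensity: a broad Gaussian bump (assumption)."""
    return math.exp(-((x - center) / width) ** 2)

def particle_pattern(i1, i2):
    """Particles: the two single-slit intensities simply add."""
    return i1 + i2

def wave_pattern(i1, i2, x, d, wavelength, z):
    """Waves: amplitudes add, which leaves the interference term of Eq. 10.4.8."""
    phase = 2.0 * math.pi * d * x / (wavelength * z)
    return i1 + i2 + 2.0 * math.sqrt(i1 * i2) * math.cos(phase)

# Illustrative numbers only (arbitrary units), chosen so that z >> x.
d, wavelength, z = 5.0, 1.0, 100.0

def both_patterns(x):
    """Return (particle, wave) intensity at x, measured from the midpoint."""
    i1 = single_slit(x, +d / 2.0)
    i2 = single_slit(x, -d / 2.0)
    return particle_pattern(i1, i2), wave_pattern(i1, i2, x, d, wavelength, z)

for x in (0.0, 5.0, 10.0, 15.0, 20.0):
    particles, waves = both_patterns(x)
    print(f"x = {x:5.1f}   particles: {particles:.3f}   waves: {waves:.3f}")
```

With these numbers the wave pattern is twice the particle pattern at the center and nearly vanishes at x = 10, while the particle pattern stays smooth throughout.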
According to classical physics, electrons are particles, so there are no
partial electrons and they must travel a certain path (i.e. they take either
slit 1 or slit 2, but never both). Based on this, we expect the electron’s
detector pattern to match the one for bullets (see Figure 10.19). We even
counted electrons just like we counted bullets: hits at each position x per
hour (i.e. I12 = N12 ).
3. Electrons are tricky beasts though. When we perform the experiment,
our measurements match the pattern for waves (i.e. the third graph in
Figure 10.19). The experiment says electrons are waves.
By this point in the book, you’re already well aware of this. According to
Section 9.2, they’re probability waves, but what does that actually mean?
If we’re counting electrons like we count bullets, then let’s take another
look at bullets. We’ll use the total number of bullets fired per hour to nor-
malize the intensity curve:
\[ I_{12}(x) = N_{12}(x) \quad\Rightarrow\quad P_{12}(x) = \frac{I_{12}(x)}{N_{\text{total}}} = \frac{N_{12}(x)}{N_{\text{total}}}, \]
where P12 (x) is the probability of getting a bullet at x when both slits are
open. We’re really just measuring probability. If electrons are interfering
like waves, then we’ll need an analog to amplitude such that its square is the
probability. We’ll call it a probability amplitude and the electron’s wave
function,
\[ \psi(x) = \langle x|\psi\rangle, \]
conveniently fits the criteria. This allows us to use the same kind of math for
electrons. Unfortunately, particle wave functions are complex (i.e. containing
both real and imaginary parts) and the probabilities (Eq. 9.3.8) are complex
squares,
\[ P(x) = \left\|\langle x|\psi\rangle\right\|^2, \]
so we can’t make any physical sense of it like we could for water waves
(i.e. all analogies stop here). In the case of the double-slit experiment, the
probability of an electron arriving at x is
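\[ P_{12}(x) = \left\|\sum_{\text{slits}} \langle x|\text{slit}\rangle\langle\text{slit}|\psi\rangle\right\|^2 \]
\[ P_{12}(x) = \left\|\langle x|1\rangle\langle 1|\psi\rangle + \langle x|2\rangle\langle 2|\psi\rangle\right\|^2 \]
\[ P_{12}(x) = \left\|\psi_1(x) + \psi_2(x)\right\|^2 \]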
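\[ P_{12}(x) = \left\|\langle x|1\rangle\langle 1|\psi\rangle\right\|^2 + \left\|\langle x|2\rangle\langle 2|\psi\rangle\right\|^2 \]
\[ P_{12}(x) = \left\|\psi_1(x)\right\|^2 + \left\|\psi_2(x)\right\|^2 \]
\[ P_{12}(x) = P_1(x) + P_2(x) \]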
and the detector pattern matches the one for particles (see Figure 10.19).
Prior to its detection at the slit plate, the electron was in superposition of slit
1 and slit 2. The act of observing the electron’s path forced the electron to
collapse into a state of slit 1 or slit 2, but not both. It would seem particles
don’t like to be watched by experimenters.
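The difference between adding amplitudes and adding probabilities can be sketched with complex numbers. The amplitude model below (equal magnitude through each slit, phase set by path length) and all numerical values are assumptions for illustration, and the results are relative, unnormalized probabilities.

```python
import cmath
import math

def slit_amplitude(x, slit_position, wavelength, z):
    """Hypothetical amplitude for reaching x through one slit (an assumption):
    magnitude 1/sqrt(2), phase proportional to the straight-line path length."""
    path_length = math.hypot(z, x - slit_position)
    return cmath.exp(2j * math.pi * path_length / wavelength) / math.sqrt(2.0)

def p_unwatched(x, d, wavelength, z):
    """No which-slit detection: add amplitudes first, then square."""
    psi1 = slit_amplitude(x, +d / 2.0, wavelength, z)
    psi2 = slit_amplitude(x, -d / 2.0, wavelength, z)
    return abs(psi1 + psi2) ** 2

def p_watched(x, d, wavelength, z):
    """Which-slit detection: square each amplitude first, then add."""
    psi1 = slit_amplitude(x, +d / 2.0, wavelength, z)
    psi2 = slit_amplitude(x, -d / 2.0, wavelength, z)
    return abs(psi1) ** 2 + abs(psi2) ** 2

d, wavelength, z = 5.0, 1.0, 100.0  # illustrative values, arbitrary units
for x in (0.0, 5.0, 10.0, 15.0, 20.0):
    print(f"x = {x:5.1f}   unwatched: {p_unwatched(x, d, wavelength, z):.3f}"
          f"   watched: {p_watched(x, d, wavelength, z):.3f}")
```

In this toy model the watched pattern is flat, while the unwatched pattern swings between roughly 0 and 2 as the cross term changes sign.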
As with every other thought experiment in this book, this one has limits.
For double-slit diffraction to be noticeable, the slit size and separation both
have to be comparable to the wavelength of the wave. In the case of visible
light, the wavelength is ≈ 10^−7 meters (≈ 0.1 µm), so slit scales can’t be much
larger than 10^−5 meters (10 µm). Electron wavelengths tend to be ≈ 10^−10
Figure 10.20: This is an experimental layout for Feynman’s double-slit thought experiment
(like Figure 10.18), but with a light source and some sensors added to detect which slit
is being used by which electrons. Position, x, along the detector is measured from the
bottom edge. The openings in the slit plate are labeled 1 and 2 for reference.
meters (≈ 0.1 nm), which is 1000 times smaller than visible light. This means
the slit scales also need to be about 1000 times smaller or ≈ 10^−8 meters (≈ 10
nm). That was impossible for decades after Feynman’s proposal, but in 2012
the experiment was finally done in real life and the results given here have
been confirmed. We can no longer treat this as just a thought experiment.
Figure 10.21: This graph represents the state of an electron in an infinite square well before
and after a measurement is made of its position, x. Prior to the measurement, it’s in a
stationary state of the Hamiltonian, H. After, it’s in a stationary state of the position, x.
Furthermore, you can never know any property exactly. This is why, even
after a measurement of x, Figure 10.21 shows the electron is around 0.6a
give or take a little (indicated by the width of the “spike”). It’s not just an
experimental problem. It’s a physical one.
that detects the decay. A Geiger counter would suffice! However, isn’t that
a measurement? There are two possibilities:
• The Geiger counter detects the radiation, activates the poison, and the
cat dies. The superposition state of the atom collapses into a single
state.
• The Geiger counter doesn’t detect the radiation, the poison is inactive,
and the cat lives. The atom remains in a superposition state.
We don’t have this quite figured out yet, but allow me to speculate for a moment.
We know the cat is made of quantum particles, but how many? Just
counting atoms, that would be about 10^27 (order of magnitude approximation).
Those atoms are mostly hydrogen, oxygen, and carbon, so the number
of subatomic particles could easily be about 10^28. That’s a lot of particles!
Those particles interact quite a lot and I’d imagine a fair portion of those
interactions could be considered “measurements,” so those wave functions
must be collapsing a lot.
Recall what happened in Section 9.2 when we tried to argue a single electron
was just charge smeared out across an orbit? It failed. However, the billions
of electrons on a charged surface certainly behave that way. If you modeled
the billions of individual electrons as probability waves and used a big computer
to simulate the whole process, then you just won’t see any of the wave
properties. Some physicists call this quantum decoherence.
This explanation looks great until you remember that not all macro-
scopic things are particle-like. Huge collections of electrons might lose their
wave properties, but huge collections of photons lose their particle properties.
Light behaves like a wave on the large scale. This could have something to
do with mass (i.e. electrons have mass and photons don’t), but no one knows
for sure. I just don’t think you can ask where the macroscopic world begins
because it’s more a continuous gradual process. The more particles there are
and the more space they take up, the less and less duality there appears to
be. It’s always there, but one or the other just becomes significantly more
dominant.
Interpretations or not, quantum mechanics is weird and crazy. It can
make even the most skilled physicist pull out their hair just thinking about
it. We use it though because it works. It can make incredible predictions
that would have been impossible to make without it and we’ve performed
countless real experiments verifying its principles. In the future, it may turn
out that quantum mechanics simultaneously applies to every copy of a particle in an
infinite multiverse (i.e. we don’t know which particle we have in our universe
until we “look”). It might even turn out that our universe is inherently non-
local allowing for other hidden variable theories. Unfortunately, most of us
will just have to wait and see.
Appendix A
Numerical Methods
where
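\[
\begin{aligned}
k_1 &= \dot{y}(t_n,\; y_n) \\
k_2 &= \dot{y}\!\left(t_n + \tfrac{1}{2}\Delta t,\; y_n + \tfrac{1}{2}k_1\Delta t\right) \\
k_3 &= \dot{y}\!\left(t_n + \tfrac{1}{2}\Delta t,\; y_n + \tfrac{1}{2}k_2\Delta t\right) \\
k_4 &= \dot{y}(t_n + \Delta t,\; y_n + k_3\Delta t)
\end{aligned}
\tag{A.1.2}
\]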
482 APPENDIX A. NUMERICAL METHODS
Example A.1.1
Turn the following second-order differential equation into a set of first-order
equations.
ÿ + 2ẏ + 4y = 25
ÿ = 25 − 2ẏ − 4y.
• Yes, it creates an extra variable, but it’s necessary if you intend to solve
the original equation using the Runge-Kutta method.
• This method can be applied to even higher order equations, but for
third-order there will be three equations and for fourth-order there will
be four and so on. Furthermore, y can be either a scalar or a vector
quantity.
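As a rough illustration of why the extra variable is worth it, the sketch below feeds the equation from Example A.1.1 to a fourth-order Runge-Kutta step. It assumes the first-order pair is ẏ = v and v̇ = 25 − 2v − 4y (the name v is my own label) and uses the standard weighted average (k1 + 2k2 + 2k3 + k4)/6, which I am assuming is what Eq. A.1.1 states.

```python
def rk4_step(f, t, y, dt):
    """One classical RK4 step for y' = f(t, y), where y is a list of floats."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * dt, [yi + 0.5 * dt * ki for yi, ki in zip(y, k1)])
    k3 = f(t + 0.5 * dt, [yi + 0.5 * dt * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def derivatives(t, state):
    """First-order system for y'' + 2y' + 4y = 25, with v = y' (assumed label)."""
    y, v = state
    return [v, 25.0 - 2.0 * v - 4.0 * y]

def solve(y0, v0, dt, t_end):
    """March from t = 0 to t_end with a constant iteration step dt."""
    t, state = 0.0, [y0, v0]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(derivatives, t, state, dt)
        t += dt
    return state

final = solve(0.0, 0.0, 0.01, 20.0)
print(final)  # y should settle near 25/4 with y' near zero
```

Because y is handled as a list, the same `rk4_step` works unchanged for third-order or higher equations once they are written as a longer list of first-order equations.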
A.2. NEWTON’S METHOD 483
Taking another look at Eqs. A.1.1 and A.1.2, we can see that the value of
y(t) will not change with each iteration if y(t0 ) and ẏ(t0 ) are both zero. The
method will always result in a zero. However, taking an extra derivative of
your function to include an extra initial value will usually solve this problem.
Just remember that doing so will add an extra set of integration.
This brings us to the iteration step. As long as the iteration step is suffi-
ciently small, the graphical result will be accurate. How small is “sufficiently
small?” Well, that will depend on your differential equation(s) and the level
of desired accuracy. There are also times when you may want to relax the
condition that the iteration step be constant. This is called adaptive iteration
and involves knowing a little something about your function. Using terrain
as an analogy, you should make your step smaller when passing through er-
ratic mountainous regions to guarantee accuracy in those areas and you can
make it larger when passing through smooth countryside to increase speed.
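One common way to automate the terrain idea is step doubling: take one full step and two half steps, and use their disagreement as an error estimate. This is a sketch of that general technique under my own choice of tolerances, not necessarily the adaptive scheme the text has in mind, and it is written for a scalar y to stay short.

```python
import math

def rk4_step(f, t, y, dt):
    """One classical RK4 step for a scalar equation y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * dt, y + 0.5 * dt * k1)
    k3 = f(t + 0.5 * dt, y + 0.5 * dt * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def adaptive_solve(f, t0, y0, t_end, dt=0.1, tol=1e-8):
    """Integrate from t0 to t_end, shrinking or growing dt as needed."""
    t, y, steps = t0, y0, 0
    while t < t_end:
        dt = min(dt, t_end - t)
        full = rk4_step(f, t, y, dt)
        half = rk4_step(f, t + 0.5 * dt, rk4_step(f, t, y, 0.5 * dt), 0.5 * dt)
        error = abs(full - half)
        if error > tol:
            dt *= 0.5   # rough terrain: reject the step and retry smaller
            continue
        t, y, steps = t + dt, half, steps + 1
        if error < tol / 32.0:
            dt *= 2.0   # smooth terrain: try a larger step next time
    return y, steps

# Test problem with a known answer: y' = -y, y(0) = 1, so y(5) = exp(-5).
y_end, n_steps = adaptive_solve(lambda t, y: -y, 0.0, 1.0, 5.0)
print(y_end, math.exp(-5.0), n_steps)
```

The factor of 32 in the growth test reflects that halving the step reduces the RK4 local error by roughly 2^5, so a step that passes by that margin should still pass after doubling.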
Once you have a guess, you step progressively closer to the solution using
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \tag{A.2.2} \]
Example A.2.1
Solve $3e^x + xe^x = 9$ for x.
• First, we set the equation equal to zero to find f :
\[ f(x) = 0 = 3e^x + xe^x - 9. \]
Table A.1: This table contains a few worked out examples of Newton’s method from
Example A.2.1. Notice that all guesses, x0 , arrive at the same result. The better guess
just takes fewer steps.
f(x) f′(x) xn
x0 = 1
1.873127314 13.59140914 0.862182994
0.146904904 11.51523000 0.849425550
0.001124187 11.33942741 0.849326410
0 11.33807149 0.849326404
x0 = 5
1178.305273 1335.718432 4.117849058
428.2279305 498.6549048 3.259082953
153.8967613 188.9224207 2.444480002
53.74521086 74.26976616 1.720831422
17.38554585 31.97471934 1.177103558
4.554541204 16.79950294 0.905991893
0.664927877 12.13931291 0.851217139
0.021461754 11.36395802 0.849328559
0 11.33810087 0.849326404
x0 = 10
286335.0553 308370.5211 9.071457757
105052.5408 113764.8426 8.148039429
38525.26312 41990.85863 7.230571571
14119.53827 15509.54990 6.320194522
5170.055703 5734.736777 5.418661304
1890.055866 2124.632808 4.529069531
688.7361317 790.4084239 3.657702133
249.1334070 296.9055541 2.818602279
88.48147441 114.2348921 2.044044937
29.94900647 46.67078670 1.402337167
8.894130266 21.95881900 0.997300345
1.836494564 13.54744786 0.861740158
0.141806903 11.50908345 0.849418855
0.001048270 11.33933584 0.849326409
0 11.33807148 0.849326404
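The iterations in Table A.1 can be reproduced with a short loop. This is a minimal sketch: the stopping tolerance and the function names are my own choices, and the derivative f′(x) = 4e^x + xe^x follows from the product rule applied to f.

```python
import math

def f(x):
    """The function from Example A.2.1, set equal to zero."""
    return 3.0 * math.exp(x) + x * math.exp(x) - 9.0

def f_prime(x):
    """Derivative of f by the product rule: 3e^x + e^x + x e^x."""
    return 4.0 * math.exp(x) + x * math.exp(x)

def newton(x0, tol=1e-12, max_steps=100):
    """Apply Eq. A.2.2 until successive guesses agree to within tol."""
    x = x0
    for step in range(1, max_steps + 1):
        x_new = x - f(x) / f_prime(x)
        if abs(x_new - x) < tol:
            return x_new, step
        x = x_new
    raise RuntimeError("Newton's method did not converge")

for guess in (1.0, 5.0, 10.0):
    root, steps = newton(guess)
    print(f"x0 = {guess:4.1f}   root = {root:.9f}   steps = {steps}")
```

All three starting guesses land on the same root as the table, with the closest guess needing the fewest steps.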
so 4.4 × 10^4 actually rounds up to 10^5. In fact, any front number bigger than
√10 ≈ 3.162 will round up. It’s a bit strange, but it’s the scientific standard,
so you should know it.
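The √10 threshold falls out naturally if the rounding is done on a logarithmic scale. The helper below is a hypothetical illustration of that rule, not a standard library function.

```python
import math

def order_of_magnitude(value):
    """Exponent of the nearest power of ten, rounding on a logarithmic scale.

    Adding 0.5 to log10 before flooring means a front number above
    sqrt(10) (whose log10 is exactly 0.5) rounds up to the next power."""
    return math.floor(math.log10(abs(value)) + 0.5)

for value in (3.0e4, 3.2e4, 4.4e4):
    print(value, "->", f"10^{order_of_magnitude(value)}")
```

Here 3.0 × 10^4 stays at 10^4, while 3.2 × 10^4 and 4.4 × 10^4 both round up to 10^5.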
Appendix B
Useful Formulas
• Chain Rule:
\[ \frac{d}{dx}(f) = \frac{d}{du}(f)\,\frac{du}{dx} \]
• Constant Multiple Property:
\[ c\,\frac{d}{dx}(f) = \frac{d}{dx}(cf) \]
• Distributive Property:
\[ \frac{d}{dx}(f + g) = \frac{d}{dx}(f) + \frac{d}{dx}(g) \]
• Product Rule:
\[ \frac{d}{dx}(f * g) = \frac{d}{dx}(f) * g + f * \frac{d}{dx}(g) \]
488 APPENDIX B. USEFUL FORMULAS
• Path Element:
• Volume Element:
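• Gradient:
\[ \vec{\nabla} f = \sum_{i=1}^{3} \frac{1}{h_i}\frac{\partial f}{\partial q_i}\,\hat{e}_i = \frac{1}{h_1}\frac{\partial f}{\partial q_1}\,\hat{e}_1 + \frac{1}{h_2}\frac{\partial f}{\partial q_2}\,\hat{e}_2 + \frac{1}{h_3}\frac{\partial f}{\partial q_3}\,\hat{e}_3 \]
• Divergence:
\[ \vec{\nabla} \bullet \vec{A} = \frac{1}{h_1 h_2 h_3} \sum_{i=1}^{3} \frac{\partial}{\partial q_i}(H_i A_i) \]
\[ \vec{\nabla} \bullet \vec{A} = \frac{1}{h_1 h_2 h_3} \left[ \frac{\partial}{\partial q_1}(h_2 h_3 A_1) + \frac{\partial}{\partial q_2}(h_3 h_1 A_2) + \frac{\partial}{\partial q_3}(h_1 h_2 A_3) \right] \]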
B.2. MULTI-VARIABLE CALCULUS 489
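• Curl:
\[ \vec{\nabla} \times \vec{A} = \det \begin{vmatrix} \frac{1}{h_2 h_3}\hat{e}_1 & \frac{1}{h_1 h_3}\hat{e}_2 & \frac{1}{h_1 h_2}\hat{e}_3 \\ \frac{\partial}{\partial q_1} & \frac{\partial}{\partial q_2} & \frac{\partial}{\partial q_3} \\ h_1 A_1 & h_2 A_2 & h_3 A_3 \end{vmatrix} \]
\[
\begin{aligned}
\vec{\nabla} \times \vec{A} = {} & \frac{1}{h_2 h_3}\left[ \frac{\partial}{\partial q_2}(h_3 A_3) - \frac{\partial}{\partial q_3}(h_2 A_2) \right] \hat{e}_1 \\
& - \frac{1}{h_1 h_3}\left[ \frac{\partial}{\partial q_1}(h_3 A_3) - \frac{\partial}{\partial q_3}(h_1 A_1) \right] \hat{e}_2 \\
& + \frac{1}{h_1 h_2}\left[ \frac{\partial}{\partial q_1}(h_2 A_2) - \frac{\partial}{\partial q_2}(h_1 A_1) \right] \hat{e}_3
\end{aligned}
\]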
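• Laplacian:
\[ \vec{\nabla}^2 f = \vec{\nabla} \bullet \vec{\nabla} f = \frac{1}{h_1 h_2 h_3} \sum_{i=1}^{3} \frac{\partial}{\partial q_i}\left( H_i \frac{1}{h_i}\frac{\partial f}{\partial q_i} \right) \]
where $\vec{H} = (h_2 h_3)\,\hat{e}_1 + (h_3 h_1)\,\hat{e}_2 + (h_1 h_2)\,\hat{e}_3$ (the even permutations of
the subscripts).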
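• Divergence Theorem:
\[ \int_V \vec{\nabla} \bullet \vec{B}\; dV = \oint \vec{B} \bullet d\vec{A} \]
• Curl Theorem:
\[ \int_A \left( \vec{\nabla} \times \vec{B} \right) \bullet d\vec{A} = \oint \vec{B} \bullet d\vec{\ell} \]
where $d\vec{\ell}$ is the length element of the path enclosing the area A.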
Now if you’re looking for a particular coordinate system, just use the follow-
ing. They are sorted as (q1 , q2 , q3 ); {ê1 , ê2 , ê3 }; and {h1 , h2 , h3 }.
• Cartesian:
• Cylindrical:
\[ (s, \phi, z)\,;\quad \{\hat{s}, \hat{\phi}, \hat{z}\}\,;\quad \{1, s, 1\} \]
B.3. LIST OF CONSTANTS 491
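• Spherical:
\[ (r, \theta, \phi)\,;\quad \{\hat{r}, \hat{\theta}, \hat{\phi}\}\,;\quad \{1, r, r\sin\theta\} \]
• Bipolar Cylindrical:
\[ (\tau, \sigma, z)\,;\quad \{\hat{\tau}, \hat{\sigma}, \hat{z}\}\,;\quad \left\{ \frac{a}{\cosh\tau - \cos\sigma},\; \frac{a}{\cosh\tau - \cos\sigma},\; 1 \right\} \]
• Elliptic Cylindrical:
\[ (\mu, \nu, z)\,;\quad \{\hat{\mu}, \hat{\nu}, \hat{z}\}\,;\quad \left\{ a\sqrt{\sinh^2\mu + \sin^2\nu},\; a\sqrt{\sinh^2\mu + \sin^2\nu},\; 1 \right\} \]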
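As a sanity check on how the scale factors are used, the sketch below evaluates the general divergence formula numerically with central differences. The function names, the step size, and the test fields are my own choices for illustration; only the formula and the scale factors {1, s, 1} and {1, r, r sin θ} come from the lists above.

```python
import math

def divergence(A, h, q, eps=1e-5):
    """Numerical divergence at point q in orthogonal coordinates.

    A(q) returns the components (A1, A2, A3); h(q) returns the scale
    factors (h1, h2, h3). Derivatives are central finite differences."""
    def H_times_A(point, i):
        h1, h2, h3 = h(point)
        H = (h2 * h3, h3 * h1, h1 * h2)
        return H[i] * A(point)[i]

    total = 0.0
    for i in range(3):
        forward, backward = list(q), list(q)
        forward[i] += eps
        backward[i] -= eps
        total += (H_times_A(forward, i) - H_times_A(backward, i)) / (2.0 * eps)
    h1, h2, h3 = h(q)
    return total / (h1 * h2 * h3)

def spherical_h(q):
    r, theta, phi = q
    return (1.0, r, r * math.sin(theta))

def cylindrical_h(q):
    s, phi, z = q
    return (1.0, s, 1.0)

def radial_field(q):
    """A field pointing along the first unit vector with magnitude q1."""
    return (q[0], 0.0, 0.0)

div_spherical = divergence(radial_field, spherical_h, (2.0, 1.0, 0.5))
div_cylindrical = divergence(radial_field, cylindrical_h, (2.0, 1.0, 0.5))
print(div_spherical, div_cylindrical)
```

The field r r̂ in spherical coordinates should give a divergence of 3 and s ŝ in cylindrical coordinates should give 2, which matches the Cartesian result for the same fields.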
Appendix C
This is a list of all the quantities that are relevant to the spacetime geometries
I used in Chapters 7 and 8. All information is given in geometrized units.
See Table 8.1 for more details on the units.
• Riemann Curvatures: $R^{\delta}_{\ \alpha\mu\nu} = 0$
494 APPENDIX C. USEFUL SPACETIME GEOMETRIES
• Line Element:
ds2 = −dt2 + dr2 + r2 dθ2 + r2 sin2 θ dφ2
• Riemann Curvatures: $R^{\delta}_{\ \alpha\mu\nu} = 0$
C.4. EDDINGTON-FINKELSTEIN GEOMETRY 495
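• Riemann Curvatures ($R^{\delta}_{\ \alpha\mu\nu} = -R^{\delta}_{\ \alpha\nu\mu}$):
\[
\begin{aligned}
R^{t}_{\ rtr} &= \frac{2M}{r^3}\left(1 - \frac{2M}{r}\right)^{-1} & R^{\theta}_{\ t\theta t} &= \frac{M}{r^3}\left(1 - \frac{2M}{r}\right) \\
R^{t}_{\ \theta t\theta} &= -\frac{M}{r} & R^{\theta}_{\ r\theta r} &= -\frac{M}{r^3}\left(1 - \frac{2M}{r}\right)^{-1} \\
R^{t}_{\ \phi t\phi} &= -\frac{M}{r}\sin^2\theta & R^{\theta}_{\ \phi\theta\phi} &= \frac{2M}{r}\sin^2\theta \\
R^{r}_{\ trt} &= -\frac{2M}{r^3}\left(1 - \frac{2M}{r}\right) & R^{\phi}_{\ t\phi t} &= \frac{M}{r^3}\left(1 - \frac{2M}{r}\right) \\
R^{r}_{\ \theta r\theta} &= -\frac{M}{r} & R^{\phi}_{\ r\phi r} &= -\frac{M}{r^3}\left(1 - \frac{2M}{r}\right)^{-1} \\
R^{r}_{\ \phi r\phi} &= -\frac{M}{r}\sin^2\theta & R^{\phi}_{\ \theta\phi\theta} &= \frac{2M}{r}
\end{aligned}
\]
• Kretschmann Invariant: $K = \dfrac{48M^2}{r^6}$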
• Line Element:
\[ ds^2 = -\left(1 - \frac{2M}{r}\right)(dt^*)^2 + \frac{4M}{r}\,dt^*\,dr + \left(1 + \frac{2M}{r}\right)dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2 \]
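\[
\begin{aligned}
R^{t}_{\ trt} &= -\frac{4M^2}{r^4} & R^{\theta}_{\ t\theta r} &= -\frac{2M^2}{r^4} \\
R^{t}_{\ rtr} &= \frac{2M}{r^3}\left(1 + \frac{2M}{r}\right) & R^{\theta}_{\ r\theta t} &= -\frac{2M^2}{r^4} \\
R^{t}_{\ \theta t\theta} &= -\frac{M}{r} & R^{\theta}_{\ r\theta r} &= -\frac{M}{r^3}\left(1 + \frac{2M}{r}\right) \\
R^{t}_{\ \phi t\phi} &= -\frac{M}{r}\sin^2\theta & R^{\theta}_{\ \phi\theta\phi} &= \frac{2M}{r}\sin^2\theta \\
R^{r}_{\ trt} &= -\frac{2M}{r^3}\left(1 - \frac{2M}{r}\right) & R^{\phi}_{\ t\phi t} &= \frac{M}{r^3}\left(1 - \frac{2M}{r}\right) \\
R^{r}_{\ rtr} &= -\frac{4M^2}{r^4} & R^{\phi}_{\ t\phi r} &= -\frac{2M^2}{r^4} \\
R^{r}_{\ \theta r\theta} &= -\frac{M}{r} & R^{\phi}_{\ r\phi t} &= -\frac{2M^2}{r^4} \\
R^{r}_{\ \phi r\phi} &= -\frac{M}{r}\sin^2\theta & R^{\phi}_{\ r\phi r} &= -\frac{M}{r^3}\left(1 + \frac{2M}{r}\right) \\
R^{\theta}_{\ t\theta t} &= \frac{M}{r^3}\left(1 - \frac{2M}{r}\right) & R^{\phi}_{\ \theta\phi\theta} &= \frac{2M}{r}
\end{aligned}
\]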
C.5. SPHERICALLY SYMMETRIC GEOMETRY 497
• Kretschmann Invariant: $K = \dfrac{48M^2}{r^6}$
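• Kretschmann Invariant:
\[
\begin{aligned}
K = {} & \frac{4}{r^4} + \frac{4}{r^4 b^2} - \frac{8}{r^4 b} + \frac{2}{r^2 a^2 b^2}\left(\frac{\partial a}{\partial r}\right)^2 + \frac{1}{4a^4 b^2}\left(\frac{\partial a}{\partial r}\right)^4 \\
& + \frac{1}{2a^3 b^3}\left(\frac{\partial a}{\partial r}\right)^3 \frac{\partial b}{\partial r} + \frac{2}{r^2 b^4}\left(\frac{\partial b}{\partial r}\right)^2 + \frac{1}{4a^2 b^4}\left(\frac{\partial a}{\partial r}\right)^2 \left(\frac{\partial b}{\partial r}\right)^2 \\
& - \frac{1}{a^3 b^2}\left(\frac{\partial a}{\partial r}\right)^2 \frac{\partial^2 a}{\partial r^2} - \frac{1}{a^2 b^3}\frac{\partial a}{\partial r}\frac{\partial b}{\partial r}\frac{\partial^2 a}{\partial r^2} + \frac{1}{a^2 b^2}\left(\frac{\partial^2 a}{\partial r^2}\right)^2
\end{aligned}
\]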
• Line Element:
\[ ds^2 = -dt^2 + [a(t)]^2 \left[ \frac{1}{1 - kr^2}\,dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2 \right] \]
C.6. COSMOLOGICAL GEOMETRY 499
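\[
\begin{aligned}
R^{r}_{\ trt} &= -\frac{1}{a}\frac{\partial^2 a}{\partial t^2} & R^{\phi}_{\ t\phi t} &= -\frac{1}{a}\frac{\partial^2 a}{\partial t^2} \\
R^{r}_{\ \theta r\theta} &= r^2\left[ k + \left(\frac{\partial a}{\partial t}\right)^2 \right] & R^{\phi}_{\ r\phi r} &= \frac{1}{1 - kr^2}\left[ k + \left(\frac{\partial a}{\partial t}\right)^2 \right] \\
R^{r}_{\ \phi r\phi} &= r^2\sin^2\theta\left[ k + \left(\frac{\partial a}{\partial t}\right)^2 \right] & R^{\phi}_{\ \theta\phi\theta} &= r^2\left[ k + \left(\frac{\partial a}{\partial t}\right)^2 \right]
\end{aligned}
\]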
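• Kretschmann Invariant:
\[ K = \frac{12}{a^4}\left[ k^2 + 2k\left(\frac{\partial a}{\partial t}\right)^2 + \left(\frac{\partial a}{\partial t}\right)^4 + a^2\left(\frac{\partial^2 a}{\partial t^2}\right)^2 \right] \]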
Appendix D
Particle Physics
Beyond the use of quantum mechanics (Chapters 9 and 10), the physics of
particles is mostly just lists, tables, and diagrams. I felt it was more fitting
to include them in an appendix rather than an actual chapter.
502 APPENDIX D. PARTICLE PHYSICS
• For example, as shown in Figure 10.10, you can only put two
electrons in one orbital if they have opposite spins (i.e. opposite
values of ms ).
2. Identical bosons, on the other hand, can occupy the same state at the
same time. In fact, there’s no limit to how many you can cram into a
single state.
You can find sample lists of particles in Tables D.1 and D.2.
• Six Leptons
• Six Quarks
You can find the full list (with properties) in Table D.1.
The force carrier particles do exactly what their name suggests. They
facilitate one of the four fundamental forces.
D.3. BUILDING LARGER PARTICLES 503
4. Gravity: Unknown
2. They’re not stable. They randomly switch back and forth between each
other (and between different mass eigenstates).
However, we do know they’re non-zero. We also have upper and lower limits
for those masses, but those change so often it was silly to include them in
something as static as a book.
Table D.1: This is the full list of fundamental particles and their properties. Mass is given
in units of MeV/c^2, charge in units of the elementary charge e, and spin in units of ħ.
Figure D.1: This chart from optics shows how light colors can add together to make other
colors. Red, green, and blue are the primary light colors. Cyan (blue + green), magenta
(red + blue), and yellow (red + green) are the secondary light colors. This pattern is used
as an analog for quantum chromodynamics (Section D.3).
D.1), so we’ve arbitrarily borrowed the labels: red, green, and blue. As a
result, the study of how quarks bond has come to be known as quantum
chromodynamics. However, quarks don’t actually have color. It’s just an
analogy.
Based on Figure D.1, we have a couple ways we can combine quarks using
color charge and they each have names.
• One red, one green, and one blue combine to make white (i.e.
neutral).
• One red and one anti-red (cyan) combine to make white (i.e. neu-
tral).
• One green and one anti-green (magenta) combine to make white
(i.e. neutral).
• One blue and one anti-blue (yellow) combine to make white (i.e.
neutral).
Technically, the model allows for combinations of more than three quarks,
but they’re purely hypothetical (i.e. they’ve never been detected).
There also exists an anti-particle (usually signified by a line over the
symbol) for every particle. When a particle and its anti-particle (e.g. electron
and positron) combine, they annihilate each other to generate one or more
high energy photons (or Z bosons if the energy is high enough). A hadron’s
anti-particle is always made of the opposite quarks (e.g. $uud$ and $\bar{u}\bar{u}\bar{d}$ for the
proton and anti-proton, respectively). Sometimes the anti-particle is itself
(e.g. $s\bar{s}$ for the phi-meson), but it still technically has one. Even weirder, the
neutral pion is in a superposition of two quark doublets:
\[ \pi^0 \to \frac{u\bar{u} - d\bar{d}}{\sqrt{2}}, \]
2. Leptons, quarks, and hadrons are all drawn as straight solid lines with
arrows (i.e. time-like paths).
D.4. FEYNMAN DIAGRAMS 507
Table D.2: This is a sample list of particles and their intrinsic properties. It is by no
means a complete list. Mass is given in units of MeV/c^2, charge in units of the elementary
charge e, and spin in units of ħ. Anti-quarks are signified by a line over the symbol.
Figure D.2: These are three possible results of an electron-positron annihilation. Far left:
The annihilation forms a photon, but then the photon recreates the electron-positron pair.
Middle: The electron and positron have enough kinetic energy to generate a slower muon-
antimuon pair. Far right: The electron-positron pair just creates two photons that move
away in opposite directions (via a virtual fermion).
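The energy requirement in the middle diagram can be estimated with a short
calculation. The sketch below assumes a head-on collision viewed in the
center-of-momentum frame, where the muon pair can be created at rest at
threshold, so each incoming particle must supply at least the difference in
rest energies. The function name and the rounded rest energies (standard
reference values) are assumptions of this sketch.

```python
# Rest energies in MeV: rounded standard reference values, assumed here.
ELECTRON_REST_ENERGY = 0.511
MUON_REST_ENERGY = 105.66


def min_kinetic_energy_per_particle(initial_rest, final_rest):
    """Minimum kinetic energy (MeV) each incoming particle needs in the
    center-of-momentum frame to create a pair whose members each have
    rest energy final_rest. Zero if the final pair is lighter."""
    # Energy conservation at threshold: 2 * (initial_rest + K) = 2 * final_rest.
    return max(0.0, final_rest - initial_rest)


threshold = min_kinetic_energy_per_particle(ELECTRON_REST_ENERGY, MUON_REST_ENERGY)
print(f"{threshold:.2f} MeV per particle")  # 105.15 MeV per particle
```

Any kinetic energy above this threshold is shared by the muon and antimuon.
Because they are about 200 times heavier than the electron, they leave more
slowly than the electron and positron arrived.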
Figure D.3: These are some complex examples of Feynman diagrams. Far left: Decay of
a negative pion, $\pi^-$. Middle: Decay of a neutral pion, $\pi^0$. Far right: Neutron decays into
a proton (i.e. negative beta decay).