Ele Teo Rel
Roberto Casadio
December 4, 2013
Foreword
Two-fold motivation
1. Provide a comprehensive overview of relativistic physics, from mechanics and electromagnetism to the classical theory of gravity, to undergraduate students who will not take on a
Master Course in (theoretical) Physics.
2. Introduce the General Theory of Relativity and gravitation to undergraduate students who
will take on a Master Course in Theoretical Physics.
Outline
1. Review of Special Relativity (principles, kinematics, dynamics, electromagnetism) and old-fashioned tensorial formulation [1]; global symmetries and group theory [2].
Chapter 1 and 2.
2. Introduction to basic geometrical methods in preparation of advanced courses (manifolds,
tensors, Lie and covariant derivatives, differential forms, metric, curvature) [3].
Chapter 3.
3. Introduction to General Relativity (principles, Einstein equations, classical tests) and some
of its main predictions (black holes and cosmology) [4, 5].
Chapter 4.
Bibliography
Complete lecture notes are available from the AMS Campus service and will be constantly updated.
Further suggested bibliography will be reported during the course.
Schedule
1. Wednesdays, 14:00-17:00, Room A.
2. Thursdays, 14:00-16:00, Room A.
Exam
The oral exam will cover selected topics from all three parts of the course, including one short written essay on a topic of the student's choice.
Open schedule: contact teacher.
Contents
1 Special relativity
1.1 Newtonian relativity
1.1.1 Observers and frames
1.1.2 Galilean transformations
1.1.3 Conservative forces
1.1.4 Electromagnetism
1.1.5 Alternative explanations
1.2 Foundations of special relativity
1.2.1 Two new principles
1.2.2 Newtonian space and time
1.2.3 Relativity of simultaneity and space-time
1.3 Relativistic kinematics
1.3.1 Lorentz transformations
1.3.2 Space-time diagrams
1.3.3 Addition of velocities
1.3.4 Invariance of the phase of a wave
1.3.5 Twin paradox
1.4 Relativistic dynamics
1.4.1 Relativistic momentum and mass
1.4.2 Elastic collisions
1.4.3 Inelastic collisions
1.5 Electromagnetism
1.5.1 Electric charge and current
1.5.2 Transformations for $\vec E$ and $\vec B$
1.5.3 Introduction to (old fashioned) covariant formalism
1.5.4 Maxwell equations redux
1.5.5 Nature and relativistic fields
4 General Relativity
4.1 Arbitrary observers and gravity
4.2 Gravitational equations
4.2.1 Gravity and test particles
4.2.2 Source of gravity and Einstein equations
4.2.3 Gravity and geometry
4.2.4 Classical tests of General Relativity
4.3 Black holes
4.3.1 The Schwarzschild metric
4.3.2 Radial geodesics
4.3.3 General geodesics
4.3.4 Gravitational red-shift
4.3.5 The (event) horizon and black holes
4.4 Cosmology
4.4.1 Friedmann-Robertson-Walker metric
4.4.2 Cosmic fluids
4.4.3 Friedmann equations
Chapter 1
Special relativity
Suggested bibliography [1]: R. Resnick, Introduction to special relativity, J. Wiley and Sons (1968).
1.1 Newtonian relativity
We start by briefly reviewing the main concepts at the heart of classical Newtonian mechanics, and the inconsistencies that arise when trying to incorporate Maxwell's electromagnetism. A successful description of the latter will lead us to accept Special Relativity as the new general framework for studying the motion of objects with velocities comparable to the speed of light. However, this comes at the expense of one of Newton's greatest achievements: his universal law of gravity (in terms of fundamental interactions), so that we gain one and lose one.
1.1.1 Observers and frames
One of the key concepts in this course is that of the observer: physicists, more or less implicitly, divide the universe into the specific object of study (for example, a ball moving inside a room, or the planets around the sun) and the observer, all the rest being included in the so-called environment, whose effects on both object and observer are neglected (as a simplifying assumption). Much of the progress achieved in physics during the last century or so can be measured by our increasing ability to describe the object, but its origin is arguably related to improved descriptions of the observer, and of the way the latter interacts with (or affects) the former (the measurement). In fact, mathematically, we are taught to think of an observer as a geometrical reference frame, whereas its physical (experimental) meaning is that of an apparatus to locate objects in space and time. Confusing the two meanings can be hazardous: in many situations the actual apparatus can only cover a small portion of the observed phenomena, and not all mathematical reference frames can be physically realised. One must therefore beware of the physical relevance of computations carried out in mathematically convenient frames.¹
Classical mechanics deals mostly with point-like objects and their motion. In Newtonian physics, extended bodies (which represent more realistic objects) are then just collections of points with mass, and the observer's details (along with the measurement process) are assumed to remain irrelevant: the observer is a space-filling mathematical frame, with the notion of absolute time attached, which can measure everything about the objects without affecting their physical status. In Special Relativity the situation is more complicated, since one starts to consider the observer as an apparatus, with the limitations on measurements which follow, and it becomes even more so in General Relativity, in which the physical localisation of (extended) objects is highly non-trivial.
We will not review the details here, but recall that the three laws of Newtonian mechanics introduce a family of preferred frames: the inertial observers. Their very definition is logically circular: the first principle defines an inertial observer given the notion of (absence of) forces, whereas the second principle defines the (effect of a) force given the notion of inertial observer, namely
$$ \vec F = m\,\vec a , \tag{1.1.1} $$
where $m$ is the mass and $\vec a$ the acceleration of a body as measured by an inertial observer, whereas $\vec F$ stands for the expression describing a specific force (like Newton's law of gravity or the Lorentz electromagnetic force). However, this is typical of the physicist's pragmatism. In practical terms, one considers observers (frames) of suitable size (not too big, nor too small) for the problem at hand and views them as inertial. For example, the bench in a laboratory is a good enough inertial frame for studying collisions of ping-pong balls, whereas the solar system is good enough to study the motion of the Earth around the sun.²
1.1.2 Galilean transformations
Once the notion of inertial frames is accepted, the principle of Newtonian (or Galilean) Relativity
can be phrased as follows:
Galilean Relativity: The laws of mechanics are the same for all inertial observers.
This idea can be made mathematically more precise by introducing sets of suitable coordinate
transformations.
Figure 1.1: Parallel and transverse axes for the frames $S$ and $S'$.
Given two frames $S = \{x, y, z\}$ and $S' = \{x', y', z'\}$, the latter moving with velocity $\vec v = (v, 0, 0)$ with respect to the former, the coordinates transform according to
$$ \begin{cases} x' = x - v\,t \\ y' = y \\ z' = z \end{cases} \qquad\Longleftrightarrow\qquad \begin{cases} x = x' + v\,t' \\ y = y' \\ z = z' , \end{cases} \tag{1.1.2} $$
together with the notion of absolute time,
$$ t' = t . \tag{1.1.3} $$
² Ideally, one would still like to be able to define the concepts at the heart of our physical theories unambiguously. We shall see how General Relativity helps in this respect, at the end of the course.
It follows that, for two events $A$ and $B$ (points in space and time; see Fig. 1.2), one finds
$$ t'_A - t'_B = t_A - t_B \qquad\text{and}\qquad x'_A - x'_B = x_A - x_B - v\,(t_A - t_B) . \tag{1.1.4} $$
Let us note that, if the events are the end-points of a rod, its length is the same in both frames provided the measurements are taken simultaneously, that is
$$ t_A = t_B . $$
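From (1.1.2) and (1.1.3), we immediately obtain
$$ \begin{cases} \dfrac{dx'}{dt'} = \dfrac{dx}{dt} - v \\[2mm] \dfrac{dy'}{dt'} = \dfrac{dy}{dt} \\[2mm] \dfrac{dz'}{dt'} = \dfrac{dz}{dt} \end{cases} \qquad\text{and}\qquad \begin{cases} \dfrac{du'_x}{dt'} = \dfrac{du_x}{dt} \\[2mm] \dfrac{du'_y}{dt'} = \dfrac{du_y}{dt} \\[2mm] \dfrac{du'_z}{dt'} = \dfrac{du_z}{dt} , \end{cases} \tag{1.1.5} $$
that is,
$$ \vec u' = \vec u - \vec v \tag{1.1.6} $$
and
$$ \vec a' = \vec a , \tag{1.1.7} $$
which implies
$$ m\,\vec a' = m\,\vec a . \tag{1.1.8} $$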
(1.1.9)
Note that Eq. (1.1.8) is not yet enough to guarantee the invariance of Newton's second law. For that, we need to show that the laws describing specific forces are also invariant.
1.1.3 Conservative forces
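For conservative forces there exists, by definition, a potential energy $U = U(\vec x)$ such that
$$ \vec F = m\,\frac{d^2 \vec x}{dt^2} = -\vec\nabla U \equiv -m\,\vec\nabla V . \tag{1.1.10} $$
For a potential that depends only on the distance between two particles,
$$ V = V(r) , \tag{1.1.11} $$
with
$$ r = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} = \sqrt{(x'_1 - x'_2)^2 + (y'_1 - y'_2)^2 + (z'_1 - z'_2)^2} , \tag{1.1.12} $$
in which we used the notion of absolute time to compute the distance $r$ between the two particles located at points 1 and 2 at the same time in both frames,
$$ t_1 = t_2 = t'_1 = t'_2 . \tag{1.1.13} $$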
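It then follows that
$$ \vec\nabla V(r) = \frac{dV(r)}{dr}\,\vec\nabla r = \frac{dV(r')}{dr'}\,\vec\nabla r' = \frac{dV(r')}{dr'}\,\vec\nabla' r' = \vec\nabla' V(r') , \tag{1.1.14} $$
where $\vec\nabla = \left(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial y_1}, \frac{\partial}{\partial z_1}\right) = \left(\frac{\partial}{\partial x'_1}, \frac{\partial}{\partial y'_1}, \frac{\partial}{\partial z'_1}\right) = \vec\nabla'$ when Eq. (1.1.13) holds. From Eq. (1.1.8), we can therefore conclude that the second law of classical mechanics is Galilean invariant in form for conservative forces, which means it takes the same mathematical form in all inertial frames,
$$ \vec F = m\,\vec a = m\,\vec a' = \vec F' . \tag{1.1.15} $$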
In particular, the above result implies that Newton's law of gravity is Galilean invariant in form, since it can be derived from the potential
$$ V_G = -\frac{G\,M}{r} . \tag{1.1.16} $$
Given the very good accuracy with which Eq. (1.1.16) describes the motion of planets and other objects in the solar system, and given that gravity was the only known (fundamental) interaction in his time, Newton was justified in assuming that time is an absolute concept.
1.1.4 Electromagnetism
The electrostatic Coulomb force can be derived from a potential of the same form as the gravitational expression in Eq. (1.1.16). However, not all electromagnetic interactions admit a similar description. In fact, we shall show here that Maxwell's equations (in particular, the wave equation for light propagation that follows from them) are not invariant under (Galilean) addition of velocities.
Let us first recall that the speed of light is related to the vacuum electromagnetic constants by
$$ c = \frac{1}{\sqrt{\epsilon_0\,\mu_0}} \simeq 3 \cdot 10^8\ \mathrm{m/s} , \tag{1.1.17} $$
and that the Lorentz force acting on a charge $q$ is
$$ \vec F = q \left( \vec E + \vec u \times \vec B \right) , \tag{1.1.18} $$
where $\vec u$ is the velocity of the charge, and $\vec E$ and $\vec B$ are the electric and magnetic fields in the inertial frame $S$. Naively, in a different inertial frame $S'$, we would expect
$$ \vec F' = q \left[ \vec E + (\vec u - \vec v) \times \vec B \right] \overset{?}{=} \vec F , \tag{1.1.19} $$
where $\vec v$ is the relative velocity between $S$ and $S'$. There is however no way to introduce transformation laws for $\vec E$ and $\vec B$, and the charge $q$, which yield the above equality and preserve the form of Maxwell's equations.
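Let us, for example, consider the wave equation (that follows from Maxwell's equations)
$$ \left( \frac{\partial^2}{\partial t^2} - c^2\,\frac{\partial^2}{\partial x^2} \right) \psi(t, x) = 0 , \tag{1.1.20} $$
where $\psi$ is any of the electromagnetic field components and, for simplicity, we assumed the wave is plane-symmetric (so that it carries no dependence on $y$ and $z$). From the chain rule,
$$ \begin{aligned} \frac{\partial^2}{\partial t^2} &= \frac{\partial}{\partial t} \left( \frac{\partial t'}{\partial t}\,\frac{\partial}{\partial t'} + \frac{\partial x'}{\partial t}\,\frac{\partial}{\partial x'} \right) = \frac{\partial}{\partial t} \left( \frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'} \right) = \left( \frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'} \right) \left( \frac{\partial}{\partial t'} - v\,\frac{\partial}{\partial x'} \right) \\ &= \frac{\partial^2}{\partial t'^2} - 2\,v\,\frac{\partial^2}{\partial t'\,\partial x'} + v^2\,\frac{\partial^2}{\partial x'^2} , \end{aligned} \tag{1.1.21} $$
and
$$ \frac{\partial^2}{\partial x^2} = \frac{\partial}{\partial x} \left( \frac{\partial t'}{\partial x}\,\frac{\partial}{\partial t'} + \frac{\partial x'}{\partial x}\,\frac{\partial}{\partial x'} \right) = \frac{\partial}{\partial x}\,\frac{\partial}{\partial x'} = \frac{\partial^2}{\partial x'^2} . \tag{1.1.22} $$
Substituting into Eq. (1.1.20), we obtain (note that each first-order operator appearing here has dimension of time$^{-1}$)
$$ \left[ \frac{\partial^2}{\partial t'^2} - (c - v)^2\,\frac{\partial^2}{\partial x'^2} \right] \psi(t', x') = 2\,v\,\frac{\partial}{\partial x'} \left[ (c - v)\,\frac{\partial}{\partial x'} + \frac{\partial}{\partial t'} \right] \psi(t', x') . \tag{1.1.23} $$
Because of the non-zero right hand side, the above form differs from (1.1.20) (for $v \neq 0$).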
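The loss of form invariance can also be checked numerically. The following is a minimal sketch, assuming units with $c = 1$, a Gaussian pulse and a relative speed $v = 0.3$ (all illustrative choices, not taken from these notes): a right-moving pulse solves the wave equation in $S$, but the same field written in the Galilean coordinates $x = x' + v\,t'$, $t = t'$ does not solve the wave equation with the same $c$.

```python
import math

C = 1.0  # wave speed in S (units with c = 1)
V = 0.3  # relative speed of S' with respect to S (illustrative value)


def profile(s):
    """Pulse shape f(s); any smooth function would do (assumption)."""
    return math.exp(-s * s)


def psi(t, x):
    """Right-moving solution psi = f(x - c t) of the wave equation in S."""
    return profile(x - C * t)


def psi_prime(t_p, x_p):
    """The same field written in Galilean coordinates x = x' + v t', t = t'."""
    return psi(t_p, x_p + V * t_p)


def wave_operator(field, t, x, h=1e-3):
    """Central-difference estimate of (d^2/dt^2 - c^2 d^2/dx^2) acting on field."""
    d2t = (field(t + h, x) - 2.0 * field(t, x) + field(t - h, x)) / h**2
    d2x = (field(t, x + h) - 2.0 * field(t, x) + field(t, x - h)) / h**2
    return d2t - C**2 * d2x


residual_s = wave_operator(psi, 0.2, 0.5)              # vanishes up to rounding
residual_s_prime = wave_operator(psi_prime, 0.2, 0.5)  # does not vanish
print(residual_s, residual_s_prime)
```

In $S'$ the pulse travels with speed $c - v$, which is why the operator built with $c$ no longer annihilates it.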
1.1.5 Alternative explanations
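return light paths along $L_1$ and $L_2$, and corresponding travel times. Along $L_1$, we have
$$ t_1 = \frac{L_1}{c - v} + \frac{L_1}{c + v} = \frac{2\,L_1}{c}\,\frac{1}{1 - \frac{v^2}{c^2}} . \tag{1.1.24} $$
And, from Pythagoras' theorem (see Fig. 1.4), the travelling time along $L_2$ is given by
$$ c\,t_2 = 2 \sqrt{\left( \frac{v\,t_2}{2} \right)^2 + L_2^2} \quad\Rightarrow\quad t_2 = \frac{2\,L_2}{c}\,\frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} . \tag{1.1.25} $$
The difference between the two travel times is then
$$ \Delta t = t_2 - t_1 = \frac{2}{c} \left[ \frac{L_2}{\sqrt{1 - \frac{v^2}{c^2}}} - \frac{L_1}{1 - \frac{v^2}{c^2}} \right] . \tag{1.1.26} $$
If the apparatus is rotated by 90 degrees, the two arms exchange their roles and the time difference becomes
$$ \Delta t' = \frac{2}{c} \left[ \frac{L_2}{1 - \frac{v^2}{c^2}} - \frac{L_1}{\sqrt{1 - \frac{v^2}{c^2}}} \right] , \tag{1.1.27} $$
so that
$$ \Delta t' - \Delta t = \frac{2}{c}\,(L_1 + L_2) \left[ \frac{1}{1 - \frac{v^2}{c^2}} - \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} \right] \simeq \frac{L_1 + L_2}{c}\,\frac{v^2}{c^2} , \tag{1.1.28} $$
which should result in a shift of the interference pattern on the screen. Since the wave period is $T = \lambda/c$, the shift is given by a (possibly fractional) number of wavelengths equal to
$$ \Delta N = \frac{\Delta t' - \Delta t}{T} \simeq \frac{L_1 + L_2}{\lambda}\,\frac{v^2}{c^2} . \tag{1.1.29} $$
In the original experiment, the two arms had a combined effective length $L_1 + L_2 \simeq 22$ m and light with a wavelength of $5.5 \cdot 10^{-7}$ m was used. For $v/c \simeq 10^{-4}$ (the orbital speed of the Earth), one therefore expected $\Delta N \simeq 0.4$, a fairly large quantity which was however not seen. The conclusion was then that $\vec v = 0$, which implied that either the aether did not exist or the Earth moves along with it.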
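The expected fringe shift can be reproduced with a few lines of code. The sketch below assumes two equal effective arms of 11 m each (so that $L_1 + L_2 = 22$ m) and an aether wind equal to the orbital speed of the Earth, about 30 km/s; the function names are illustrative. It compares the exact difference of travel times, before and after a rotation of the apparatus by 90 degrees, with the leading-order estimate $\Delta N \simeq (L_1 + L_2)\,v^2 / (\lambda\,c^2)$.

```python
import math

C = 2.998e8          # speed of light [m/s]
V_EARTH = 3.0e4      # assumed speed of the Earth through the aether [m/s]
WAVELENGTH = 5.5e-7  # wavelength of the light [m]
L1 = L2 = 11.0       # assumed effective arm lengths [m], so that L1 + L2 = 22 m


def fringe_shift_exact(l1, l2, wavelength, v, c=C):
    """Fringe shift from the exact travel times, before and after the rotation."""
    b2 = (v / c) ** 2
    dt = (2.0 / c) * (l2 / math.sqrt(1.0 - b2) - l1 / (1.0 - b2))
    dt_rotated = (2.0 / c) * (l2 / (1.0 - b2) - l1 / math.sqrt(1.0 - b2))
    period = wavelength / c
    return (dt_rotated - dt) / period


def fringe_shift_approx(l1, l2, wavelength, v, c=C):
    """Leading-order estimate (L1 + L2) / wavelength * (v / c)^2."""
    return (l1 + l2) / wavelength * (v / c) ** 2


shift_exact = fringe_shift_exact(L1, L2, WAVELENGTH, V_EARTH)
shift_approx = fringe_shift_approx(L1, L2, WAVELENGTH, V_EARTH)
print(shift_exact, shift_approx)  # both close to 0.4
```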
It is worth mentioning that the Michelson interferometer has survived as a useful apparatus to the present day. For example, the largest Earth-based gravitational-wave detectors now active (LIGO and Virgo) are just an upscaled (albeit much refined) version of Michelson's design.
Lorentz-Fitzgerald contraction hypothesis
An alternative explanation of Michelson's results was that a body actually shrinks in the direction of motion with respect to the aether. Due to some complicated electron reaction,³ the actual length of a moving body would be related to its rest length by
$$ L_1 = L_1^0\,\sqrt{1 - \frac{v^2}{c^2}} , \tag{1.1.30} $$
so that
$$ \Delta t = \frac{2}{c}\,\frac{L_2^0 - L_1^0}{\sqrt{1 - \frac{v^2}{c^2}}} = \Delta t' . \tag{1.1.31} $$
Michelson's experiment was therefore repeated using different angles. This should (supposedly) yield different speeds, $v_1$ and $v_2$, with respect to the aether along the two arms, so that one expected a corrected shift given by
$$ \Delta N \simeq \frac{L_2^0 - L_1^0}{\lambda} \left( \frac{v_1^2}{c^2} - \frac{v_2^2}{c^2} \right) , \tag{1.1.32} $$
which again was not observed.
³ Quite remarkably, such an effect was meant to be mathematically described by the Lorentz transformations.
Aether dragging hypothesis: aberration of light
Another hypothesis assumed that the Earth was at rest with respect to the aether, the latter therefore being dragged along by (sufficiently large?) moving bodies. This case can be easily ruled out because of the so-called aberration of light coming from distant sources: stars appear to move along circles during a solar year (see Fig. 1.5). If the aether were dragged along with the Earth, so would be light rays, and no such effect should occur.
Figure 1.5: Aberration of light. The angle depends on the relative velocity of the Earth and the distant star.
Figure 1.6: Fizeau experiment. Water enters the pipe from the bottom valve and exits from the top one, running counter-clockwise.
made by several mirrors and a pipe filled with running water (see Fig. 1.6). The experiment was intended to show that water and aether move with the same speed in the laboratory, but the effect was not observed (meaning Fresnel's law held) and Fizeau's hypothesis was discarded.
Alternatively, it was hypothesized that the speed of light depended on the nature of the source and type of mirrors. Observations were conducted of distant binary stars, looking for corresponding variations of their orbits, which however were never confirmed. Finally, Michelson's experiment was repeated using extra-terrestrial light sources, but no evidence of the existence of aether was ever found.
1.2 Foundations of special relativity
1.2.1 Two new principles
interactions at a distance becomes fully questionable, since it would allow information to travel faster than $c$, and conservative forces (described by a space-dependent potential function) are correspondingly banned.
Figure 1.7: Parallel and transverse axes for the frames $S$ and $S'$.
1.2.2 Newtonian space and time
Let us first review the space-time diagrams for Newtonian mechanics. Instead of the snapshot type of diagram of Fig. 1.7, it is often more convenient to consider a diagram like the one in Fig. 1.8, in which the space and time axes of the frame $S$ are orthogonal and the axes of the moving system $S'$ are represented as follows: the axis $t'$ is given by the trajectory of the origin $O'$, that is by the condition $x' = 0$, from which one immediately finds
$$ t = \frac{x}{v} . \tag{1.2.1} $$
Likewise, the axis $x'$ is represented by the condition $t' =$ constant, or, usually, $t' = 0$. According to Galilean transformations, this means $t = 0$, so that the axis $x'$ is parallel to the axis $x$.
Now, consider that the axes $x$ and $x'$ are mathematical representations of a graduated rod with clocks attached. Therefore, as time evolves, both axes shift upward, with the origin $O$ moving along the axis $t$, and the origin $O'$ along $t'$. Note that the axes $t$ and $t'$ do not represent a physical apparatus that moves in the same sense, so that space and time remain physically distinct. In order to determine the coordinates of a given point (or event) $A$, one should move the axis $x$ from its position at the time $t = 0$ until $A$ lies on it. This will determine $x(A)$ and $t(A)$. In practice, it is more convenient to move the point $A$ backward (or forward) in time as if it were at rest in $S$: the point $A$ is projected onto the axis $x$ at $t = 0$ parallel to the axis $t$ in order to determine $x(A)$, whereas $t(A)$ is determined by projecting $A$ onto the axis $t$ parallel to the axis $x$. Likewise, in order to determine $x'(A)$ and $t'(A)$, one projects $A$ parallel to the axes $t'$ and $x'$.
Figure 1.8: Space and time axes for the frames $S$ and $S'$.
We shall next see how such diagrams change according to the principles of Special Relativity.
1.2.3 Relativity of simultaneity and space-time
1.3 Relativistic kinematics
The first step in the development of a relativistic kinematics is to determine the new coordinate
transformations between two inertial observers. From these relations, several interesting consequences will follow.
1.3.1 Lorentz transformations
$$ \begin{cases} x' = a_{11}\,x + a_{12}\,y + a_{13}\,z + a_{14}\,t \\ y' = a_{21}\,x + a_{22}\,y + a_{23}\,z + a_{24}\,t \\ z' = a_{31}\,x + a_{32}\,y + a_{33}\,z + a_{34}\,t \\ t' = a_{41}\,x + a_{42}\,y + a_{43}\,z + a_{44}\,t , \end{cases} \tag{1.3.1} $$
where the coefficients $a_{ij}$ may only depend on $\vec v$. In fact, suppose, for example, that
$$ x' = \alpha\,x^2 , \tag{1.3.2} $$
where the coefficient $\alpha$ has dimensions of length$^{-1}$. The observer $S'$ would therefore see space as endowed with an intrinsic length $\alpha^{-1}$, hence not homogeneous, as can be easily seen by considering displacements in the two frames, for example
$$ x'_1 - x'_2 = \alpha \left( x_1^2 - x_2^2 \right) \neq \alpha\,(x_1 - x_2)^2 . \tag{1.3.3} $$
Note then that the above $a_{ij}$ are all dimensionless, except for the $a_{i4}$ (with $i = 1, 2, 3$), which have dimensions of a velocity, and the $a_{4i}$, which have dimensions of an inverse velocity. But we already know there is a fundamental velocity, the speed of light $c$, in our theory.
For $\vec v = 0$, we require that the off-diagonal coefficients vanish and the $a_{ii} = 1$ (which can always be achieved by a suitable choice of time and length units in the two systems). Moreover, we expect to recover Galilean relativity for small relative speed $v \ll c$. From isotropy, we can always rotate one of the frames so that the axes $x$ and $x'$ are parallel,
$$ \begin{cases} x\text{-axis}: & y = z = 0 \\ x'\text{-axis}: & y' = z' = 0 \end{cases} \quad\Rightarrow\quad \begin{cases} y' = a_{22}\,y + a_{23}\,z \\ z' = a_{32}\,y + a_{33}\,z . \end{cases} \tag{1.3.4} $$
Moreover, the planes $x$-$y$ and $x'$-$y'$ are parallel as well, as are the planes $x$-$z$ and $x'$-$z'$,
$$ \begin{cases} x\text{-}y \text{ plane}: & z = 0 \\ x'\text{-}y' \text{ plane}: & z' = 0 \\ x\text{-}z \text{ plane}: & y = 0 \\ x'\text{-}z' \text{ plane}: & y' = 0 \end{cases} \quad\Rightarrow\quad \begin{cases} y' = a_{22}\,y \\ z' = a_{33}\,z . \end{cases} \tag{1.3.5} $$
From these relations it follows that, if we place in $S$ an object of length $L$ with one end at the origin $O$ and the other end at a point $A$ on the $y$-axis, the coordinate of $A$ in $S'$ is $y' = a_{22}\,L$. If the same object were at rest in $S'$, the coordinate of the second end $A$ would instead be $y = L/a_{22}$ in $S$ (and analogously for objects placed on the $z$ and $z'$-axis).⁵ One must therefore have
$$ a_{22} = \frac{1}{a_{22}} = 1 = a_{33} = \frac{1}{a_{33}} . \tag{1.3.6} $$
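From isotropy, we also expect $t' = a_{41}\,x + a_{44}\,t$ and, from the small velocity agreement with Galilean invariance, $x' = a_{11}\,(x - v\,t)$, so that
$$ \begin{cases} x' = a_{11}\,(x - v\,t) \\ y' = y \\ z' = z \\ t' = a_{41}\,x + a_{44}\,t . \end{cases} \tag{1.3.7} $$
So far we have not yet considered the propagation of light and the principle of constancy of $c$. Suppose then that at $t = t' = 0$, a flash of light is emitted from the coinciding origins $O = O'$. The path of such a pulse is given in the two frames by
$$ \begin{cases} x^2 + y^2 + z^2 = c^2\,t^2 \\ x'^2 + y'^2 + z'^2 = c^2\,t'^2 . \end{cases} \tag{1.3.8} $$
Upon substituting for Eq. (1.3.7) in the second relation, we obtain
$$ \left( a_{11}^2 - c^2\,a_{41}^2 \right) x^2 + y^2 + z^2 - 2 \left( v\,a_{11}^2 + c^2\,a_{41}\,a_{44} \right) x\,t = \left( c^2\,a_{44}^2 - v^2\,a_{11}^2 \right) t^2 , \tag{1.3.9} $$
which must coincide with the first relation in (1.3.8), that is
$$ \begin{cases} a_{11}^2 - c^2\,a_{41}^2 = 1 \\ v\,a_{11}^2 + c^2\,a_{41}\,a_{44} = 0 \\ c^2\,a_{44}^2 - v^2\,a_{11}^2 = c^2 . \end{cases} \tag{1.3.10} $$
Solving for the coefficients yields the Lorentz transformations
$$ \begin{cases} x' = \dfrac{x - v\,t}{\sqrt{1 - \frac{v^2}{c^2}}} \\[3mm] y' = y \\ z' = z \\ t' = \dfrac{t - \frac{v}{c^2}\,x}{\sqrt{1 - \frac{v^2}{c^2}}} . \end{cases} \tag{1.3.11} $$
⁵ Note we are implicitly assuming here that the length of an object measured at rest is an intrinsic property and does not depend on the observer.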
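Note that, as required, these transformation laws reduce to the Galilean ones for non-relativistic speed $v \ll c$ (correspondence principle). For later convenience, we also define a new time variable $w$ with units of length⁶ and a parameter $\beta$,
$$ \begin{cases} w = c\,t \\ \beta = \dfrac{v}{c} \end{cases} \quad\Rightarrow\quad \begin{cases} x' = \dfrac{x - \beta\,w}{\sqrt{1 - \beta^2}} \\[3mm] y' = y \\ z' = z \\ w' = \dfrac{w - \beta\,x}{\sqrt{1 - \beta^2}} , \end{cases} \tag{1.3.12} $$
which makes the symmetry between space and time more apparent.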
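A direct way to get familiar with the boost in the variables $w = c\,t$ and $\beta = v/c$ is to implement it. The following is a minimal sketch (function names are illustrative) which checks three properties numerically: the combination $w^2 - x^2$ is left unchanged, a boost with $-\beta$ is the inverse transformation, and the Galilean law $x' = x - \beta\,w$, $w' = w$ is recovered for $\beta \ll 1$.

```python
import math


def lorentz_boost(w, x, beta):
    """Boost along x: returns (w', x') for w = c t and beta = v / c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (w - beta * x), gamma * (x - beta * w)


def galilean_boost(w, x, beta):
    """Galilean transformation in the same variables: w' = w, x' = x - beta w."""
    return w, x - beta * w


w, x, beta = 2.0, 1.0, 0.6
w_p, x_p = lorentz_boost(w, x, beta)

interval = w**2 - x**2          # computed in S
interval_p = w_p**2 - x_p**2    # computed in S'

# Boosting back with -beta returns the original coordinates.
w_back, x_back = lorentz_boost(w_p, x_p, -beta)

# For a small relative speed the two laws agree to order beta^2.
small = 1e-4
w_l, x_l = lorentz_boost(w, x, small)
w_g, x_g = galilean_boost(w, x, small)

print(interval, interval_p, (w_back, x_back), x_l - x_g)
```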
1.3.2 Space-time diagrams
(1.3.13)
to relate length and time units in the two systems $S$ and $S'$. For example, the point $P = (w = 0, x = 1)$, which represents the unit of length in $S$, is mapped by the hyperbola (with $+$ sign) into a point $P' = (w' = 0, x' = 1)$ with the same meaning in $S'$. The value of $x$ at $P'$ can be easily determined by using the Lorentz transformations
$$ 0 = w'(P') = \frac{w(P') - \beta\,x(P')}{\sqrt{1 - \beta^2}} \quad\Rightarrow\quad x(P') = \frac{w(P')}{\beta} , \tag{1.3.14} $$
and Eq. (1.3.13), which yield
$$ \begin{cases} w(P') = \dfrac{\beta}{\sqrt{1 - \beta^2}} \\[3mm] x(P') = \dfrac{1}{\sqrt{1 - \beta^2}} \end{cases} \quad\Rightarrow\quad x'(P') = \frac{x(P') - \beta\,w(P')}{\sqrt{1 - \beta^2}} = 1 , \tag{1.3.15} $$
as it should be. A similar argument for Eq. (1.3.13) with the $-$ sign leads to an analogous conclusion for the unit of time.
Figure 1.12: Space-time axes for $S$ and $S'$ (left panel) and calibration hyperbolae (right panel).
⁶ Later in these notes, we shall set $c = 1$, corresponding to a more natural choice of units, and use the same symbol $t$ for the time with dimensions of a length.
There are two famous effects predicted by Special Relativity which can now be easily derived
from this graphical construction: length contraction and time dilation.
Length contraction
It is easy to find the expression that describes analytically this effect. In $S$, we have $\Delta t = 0$, since the positions of the two ends of the bar are measured simultaneously therein, and Lorentz transformations (1.3.12) with $\Delta x' = L'$ then yield
$$ \Delta x = L = L'\,\sqrt{1 - \beta^2} , \tag{1.3.16} $$
which, incidentally, represents a correction of order $\beta^2$ with respect to the Galilean result $L = L'$.
Time dilation
$$ T' = \frac{T}{\sqrt{1 - \beta^2}} , \tag{1.3.17} $$
which, analogously, represents a correction of order $\beta^2$ with respect to the Galilean result $T' = T$. Note that, unlike in $S$, the two events marking the time interval do not occur at the same position in $S'$, that is $x'(0) \neq x'(T)$.
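To get a feeling for the size of these two effects, the sketch below tabulates them for a few values of $\beta$. It assumes a bar of rest length $L' = 1$ at rest in $S'$ and measured in $S$, and a time interval $T = 1$ between two events at the same place in $S$ and measured in $S'$; the function names are illustrative.

```python
import math


def contracted_length(rest_length, beta):
    """Length measured in S of a bar at rest in S' with the given rest length."""
    return rest_length * math.sqrt(1.0 - beta**2)


def dilated_time(proper_time, beta):
    """Interval measured in S' between two events at the same place in S."""
    return proper_time / math.sqrt(1.0 - beta**2)


for beta in (0.001, 0.1, 0.6, 0.99):
    print(beta, contracted_length(1.0, beta), dilated_time(1.0, beta))
```

For $\beta = 0.6$ the bar is measured to be 0.8 units long and the interval lasts 1.25 units, whereas for $\beta = 0.001$ both results differ from the Galilean ones only at order $\beta^2 = 10^{-6}$.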
Minkowski diagrams
The graphical representation of the two-dimensional space-time $w$-$x$ is called Minkowski space. Light trajectories with $v = c$ in this diagram are represented by straight lines at $\pm\pi/4$ rad, along which the units of time and length become infinite (in fact, the calibrating hyperbolae approach these lines asymptotically for $x \to \pm\infty$). Each pair of such lines starting from a given event, say the origin $O$ in $S$, forms what is named the light cone of $O$ and divides the space into three regions:
Absolute future: for any given point $P$ inside this region, it is always possible to find a reference frame $S'$ such that $P$ lies on the $w'$-axis. In this frame, $P$ occurs in the same spatial position as $O$ but at a later time $t' > 0$. It is then easy to see that there is no inertial frame in which $P$ occurs before $O$.
Absolute past: this is just the time-reverse of the absolute future: for any given point $P$ inside this region, it is always possible to find a reference frame $S'$ such that $P$ lies on the $w'$-axis at time $t' < 0$. It is likewise easy to see that there is no inertial frame in which $P$ occurs after $O$.
Absolute present: for any given point $P$ inside this region, it is always possible to find a reference frame $S'$, moving with speed $\beta$ with respect to $S$, such that $P$ lies on the $x'$-axis. In this frame, $P$ occurs at the same time as $O$, but at a different place $x' \neq 0$. Moreover, if we chose a frame corresponding to a velocity smaller (larger) than $\beta$, we would obtain a frame in which $P$ occurs after (before) $O$. In other words, there is no fixed temporal order between $P$ and $O$.
Figure 1.15: Minkowski diagram of two-dimensional space-time with the light cone of the origin.
Consider now a generic event $P$ of coordinates $(w, x)$ in a certain frame $S$. For the line segment $OP$, one can have three cases:
$w^2 - x^2 > 0$: $P$ is in the absolute future or past of $O$ and $OP$ is said to be time-like. A physical signal can reach $P$ starting from $O$ (or conversely, depending on the time order between the two events).
$w^2 - x^2 < 0$: $P$ is in the absolute present of $O$ and $OP$ is said to be space-like. A physical signal cannot reach $P$ starting from $O$ (or conversely).
$w^2 - x^2 = 0$
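The classification of the segment $OP$ by the sign of $w^2 - x^2$ is easy to turn into code. In the sketch below the function names are illustrative, and the label "light-like" for the case $w^2 - x^2 = 0$ is the standard name, used here as an assumption. For a space-like segment, the second function returns the speed $\beta = w/x$ of the frame $S'$ in which $P$ is simultaneous with $O$, which follows from requiring $w' \propto w - \beta\,x = 0$.

```python
def classify_interval(w, x):
    """Classify the segment from the origin O to the event P = (w, x)."""
    s2 = w * w - x * x
    if s2 > 0:
        return "time-like"
    if s2 < 0:
        return "space-like"
    return "light-like"  # exact zero; only reliable for exactly representable inputs


def simultaneity_beta(w, x):
    """Speed beta of the frame in which a space-like event P has w' = 0."""
    if classify_interval(w, x) != "space-like":
        raise ValueError("no frame makes P simultaneous with O")
    return w / x


print(classify_interval(2.0, 1.0))   # time-like
print(classify_interval(1.0, 2.0))   # space-like
print(classify_interval(1.0, 1.0))   # light-like
print(simultaneity_beta(1.0, 2.0))   # 0.5
```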
1.3.3 Addition of velocities
$$ x = \frac{x' + v\,t'}{\sqrt{1 - \beta^2}} , \qquad t = \frac{t' + \frac{v}{c^2}\,x'}{\sqrt{1 - \beta^2}} , \tag{1.3.19} $$
(1.3.20)
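$$ u_x \equiv \frac{\Delta x}{\Delta t} = \frac{\Delta x' + v\,\Delta t'}{\Delta t' + \frac{v}{c^2}\,\Delta x'} = \frac{\frac{\Delta x'}{\Delta t'} + v}{1 + \frac{v}{c^2}\,\frac{\Delta x'}{\Delta t'}} = \frac{u'_x + v}{1 + \frac{v}{c^2}\,u'_x} = \frac{u'_x + \beta}{1 + \beta\,u'_x} , \tag{1.3.21} $$
where the last expression holds in units with $c = 1$.⁷ Likewise,
$$ u_y = u'_y\,\frac{\sqrt{1 - \frac{v^2}{c^2}}}{1 + \frac{v}{c^2}\,u'_x} = u'_y\,\frac{\sqrt{1 - \beta^2}}{1 + \beta\,u'_x} , \tag{1.3.22} $$
$$ u_z = u'_z\,\frac{\sqrt{1 - \frac{v^2}{c^2}}}{1 + \frac{v}{c^2}\,u'_x} = u'_z\,\frac{\sqrt{1 - \beta^2}}{1 + \beta\,u'_x} , \tag{1.3.23} $$
which, besides being an effect of order $(v/c)^2$, is still qualitatively different from the Galilean result $u_y = u'_y$ and $u_z = u'_z$.
⁷ Note that in the last expression we actually display $\Delta x/\Delta w$, which equals $u_x$ if we set $c = 1$.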
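The relativistic composition of velocities can be explored numerically. The following sketch works in units with $c = 1$ (so that all speeds are fractions of the speed of light) and uses illustrative function names. It shows that composing the speed of light with any $\beta$ gives again the speed of light, and that a transverse velocity is reduced by the relative motion.

```python
import math


def add_parallel(u_x_prime, beta):
    """u_x in S from u'_x in S', for S' moving with speed beta along x (c = 1)."""
    return (u_x_prime + beta) / (1.0 + beta * u_x_prime)


def add_transverse(u_t_prime, u_x_prime, beta):
    """Transverse component (u_y or u_z) in S from the components in S' (c = 1)."""
    return u_t_prime * math.sqrt(1.0 - beta**2) / (1.0 + beta * u_x_prime)


# Light moving along x' in S' moves at the speed of light in S as well.
light = add_parallel(1.0, 0.5)

# Two sub-luminal speeds never compose to more than the speed of light.
half_plus_half = add_parallel(0.5, 0.5)

# Light emitted along y' in S' still has unit speed in S, but it is tilted.
u_x = add_parallel(0.0, 0.6)
u_y = add_transverse(1.0, 0.0, 0.6)
speed = math.hypot(u_x, u_y)

print(light, half_plus_half, (u_x, u_y), speed)
```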
Finally, the relativistic acceleration takes the following form (in units with $c = 1$)
$$ a_x = a'_x \left( \frac{\sqrt{1 - \beta^2}}{1 + \beta\,u'_x} \right)^3 , \qquad a_y = \frac{1 - \beta^2}{(1 + \beta\,u'_x)^2} \left( a'_y - \frac{\beta\,u'_y}{1 + \beta\,u'_x}\,a'_x \right) , \tag{1.3.24} $$
$$ a_z = \frac{1 - \beta^2}{(1 + \beta\,u'_x)^2} \left( a'_z - \frac{\beta\,u'_z}{1 + \beta\,u'_x}\,a'_x \right) . \tag{1.3.25} $$
Note that $\vec a = 0$ only if $\vec a' = 0$: although the value of the acceleration depends on the frame, it can only vanish in a frame if it is zero in all frames. The fact that an object is accelerated or not (that is, subject to a force or not) is still an absolute concept in Special Relativity as it was in the Galilean framework. This result is crucial, in that it allows for the very existence of inertial observers.
Fresnel formula
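From Eq. (1.3.21), we can derive Fresnel's formula which was previously used to describe Fizeau's experiment. Since light and water move along the same direction, it is sufficient to replace $u' = c/n$ and $v = v_w$ and expand for $v_w/c$ small. This yields
$$ u \simeq \left( \frac{c}{n} + v_w \right) \left( 1 - \frac{v_w}{c\,n} \right) \simeq \frac{c}{n} + v_w \left( 1 - \frac{1}{n^2} \right) , \tag{1.3.26} $$
where $n$ is the refractive index of water.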
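The quality of the Fresnel approximation can be checked against the exact relativistic composition of the velocities $c/n$ and $v_w$. The sketch below assumes $n = 1.333$ for water and a flow speed of 7 m/s (illustrative values, of the order of those used in Fizeau-type experiments); function names are illustrative as well.

```python
C = 2.998e8  # speed of light in vacuum [m/s]


def light_speed_exact(n, v_w, c=C):
    """Relativistic composition of c/n (light in water) with the water speed v_w."""
    u_rest = c / n
    return (u_rest + v_w) / (1.0 + u_rest * v_w / c**2)


def light_speed_fresnel(n, v_w, c=C):
    """Fresnel formula: c/n + v_w (1 - 1/n^2)."""
    return c / n + v_w * (1.0 - 1.0 / n**2)


def light_speed_galilean(n, v_w, c=C):
    """Galilean addition of velocities: c/n + v_w."""
    return c / n + v_w


n_water, v_water = 1.333, 7.0
exact = light_speed_exact(n_water, v_water)
fresnel = light_speed_fresnel(n_water, v_water)
galilean = light_speed_galilean(n_water, v_water)

print(exact - fresnel)   # negligible, of order v_w^2 / c
print(exact - galilean)  # about -v_w / n^2, a few m/s
```

The drag coefficient $1 - 1/n^2$ is therefore a first-order relativistic effect, not a property of the aether.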
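The relativistic addition law for velocities (1.3.21)-(1.3.23) also provides an easy description of the aberration of light. Since a star emits light in all directions, assuming it is at rest in $S'$, the light rays have velocity
$$ \begin{cases} u'_x = c\,\cos\theta' \\ u'_y = c\,\sin\theta' \end{cases} \quad\Rightarrow\quad \begin{cases} u_x = \dfrac{u'_x + v}{1 + \frac{v}{c^2}\,u'_x} \\[3mm] u_y = u'_y\,\dfrac{\sqrt{1 - \frac{v^2}{c^2}}}{1 + \frac{v}{c^2}\,u'_x} , \end{cases} \tag{1.3.27} $$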
1 2
=
c2
c
1 .
2
v
v
(1.3.29)
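1.3.4 Invariance of the phase of a wave
We shall now derive the relativistic Doppler effect from the invariance of the phase of a wave,
$$ \phi = \frac{2\pi}{\lambda} \left( x\,\cos\theta + y\,\sin\theta - \lambda\,\nu\,t \right) = \frac{2\pi}{\lambda'} \left( x'\,\cos\theta' + y'\,\sin\theta' - \lambda'\,\nu'\,t' \right) . \tag{1.3.30} $$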
The invariance of the phase follows, for example, from the requirement that the number of cycles at the source point between two fixed times must be independent of the observer. Let us assume the source is located at $x = y = z = 0$ at all times, and starts to emit at $t = 0$. The number of oscillations of the source at a later time $t > 0$ is thus given by $N(t) = \phi(\vec 0, t)/2\pi$ for the observer $S$ at rest with the source, and by $N'(t) = \phi'(\vec x'(t), t'(t))/2\pi$ for any other inertial observer $S'$, where $\vec x' = \vec x'(t)$ and $t' = t'(t)$ are the coordinates of the source in $S'$ at the later time. If $N(t) \neq N'(t)$, there might exist an observer which does not see any cycle, and determinism would be totally lost, since that observer would not see anything happen.
The phase of a wave is an example of what is called a scalar quantity. By common definition, a scalar is a quantity which, under a change of coordinates $x \to x'$, changes according to
$$ \phi'(x') = \phi(x) , \tag{1.3.31} $$
where $\phi'$ denotes a possibly different functional form with respect to $\phi$. Consider, for example, a quantity represented by the real functions $\phi$ and $\phi'$ of the real axis in two different frames $S$ and $S'$. The coordinates in the two frames are related by the transformation $x \to x'(x)$, meaning that the same point $P$ has coordinate $x$ in $S$ and $x'$ in $S'$ (passive transformation). But we can also consider this transformation as one mapping the point $P$ to a different point $Q$ in one coordinate frame, say $S$ (active transformation), where $x'(P) = x(Q)$. In the passive interpretation, we must then have
$$ \phi'(x'(P)) = \phi(x(P)) , \tag{1.3.32} $$
meaning that the quantity $\phi$ at a given point $P$ has the same value for both observers, $S$ and $S'$. Likewise, in the active interpretation,
$$ \phi'(x'(P)) = \phi'(x(Q)) = \phi(x(P)) , \tag{1.3.33} $$
for exactly the same reason: the quantity we are measuring conserves its value even if the point is moved. Note that both Eqs. (1.3.32) and (1.3.33) can formally be written as the defining Eq. (1.3.31). This might look somewhat confusing, and we shall indeed spend a good deal of time later on in the course to clarify such transformation laws. For now, we just need to consider the passive interpretation and no confusion should arise.
(x , y , t ) = A cos
(x cos + y sin t ) A cos( ) .
(1.3.34)
The invariant number of cycles between two arbitrary points of the wave is represented by the
difference between the arguments of the cosine evaluated at the two points, that is
(x, y, z) = (x , y , z ) ,
(1.3.35)
(1.3.36)
(1.3.37)
t=
2
2
1
1
21
(1.3.38)
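Upon equating the coefficients of $x$, $y$ and $t$, we finally obtain the laws of transformation for the frequency $\nu = c/\lambda$,
\[ \nu = \nu'\, \frac{1 + \beta \cos\theta'}{\sqrt{1 - \beta^2}} , \tag{1.3.39} \]
\[ \frac{\cos\theta}{\lambda} = \frac{\cos\theta' + \beta}{\lambda' \sqrt{1 - \beta^2}} , \qquad \frac{\sin\theta}{\lambda} = \frac{\sin\theta'}{\lambda'} . \tag{1.3.40} \]
The first of the above expressions gives the relativistic Doppler effect, whereas the other two describe the aberration of light.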
Doppler effect
If we set $\theta' = 0$ (source approaching observer, $\cos\theta' = +1$) or $\theta' = \pi$ (source moving away from observer, $\cos\theta' = -1$) in Eq. (1.3.39), we obtain the longitudinal Doppler effect,
\[ \nu_L = \nu'\, \frac{1 \pm \beta}{\sqrt{1 - \beta^2}} = \nu' \sqrt{\frac{1 \pm \beta}{1 \mp \beta}} , \tag{1.3.41} \]
whereas for $\theta' = \pi/2$ (so that $\cos\theta' = 0$), we obtain the transversal Doppler effect,
\[ \nu_T = \frac{\nu'}{\sqrt{1 - \beta^2}} . \tag{1.3.42} \]
Note that the longitudinal effect reproduces the Newtonian result at leading order (first order in $\beta = v/c$),
\[ \nu_L \simeq \nu' (1 \pm \beta) , \tag{1.3.43} \]
whereas the transversal effect is of order $\beta^2$ and, in fact, was not known in Newtonian dynamics.
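The limiting cases of Eq. (1.3.39) can be checked numerically. The following is a minimal sketch, assuming the Doppler factor has the form $(1 + \beta\cos\theta)/\sqrt{1-\beta^2}$; the function name `doppler_factor` and the sample value of $\beta$ are illustrative choices, not part of the notes.

```python
import math

def doppler_factor(beta, theta):
    """Ratio of observed to emitted frequency, (1 + beta*cos(theta)) / sqrt(1 - beta^2)."""
    return (1.0 + beta * math.cos(theta)) / math.sqrt(1.0 - beta ** 2)

beta = 0.1  # illustrative speed in units of c
approaching = doppler_factor(beta, 0.0)          # longitudinal, source approaching
receding = doppler_factor(beta, math.pi)         # longitudinal, source receding
transverse = doppler_factor(beta, math.pi / 2)   # transversal effect

# Deviation of the transversal factor from 1 is of second order in beta
transverse_shift = transverse - 1.0
```

For small $\beta$ the longitudinal factors stay close to $1 \pm \beta$, while the transversal shift is close to $\beta^2/2$.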
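Aberration of light
From Eqs. (1.3.40), we easily obtain
\[ \tan\theta = \frac{\sin\theta' \sqrt{1 - \beta^2}}{\cos\theta' + \beta} . \tag{1.3.44} \]
Setting, for example, $\theta' = \pi/2$ (light emitted by $S'$ straight down), we obtain the angle
\[ \tan\theta = \frac{\sqrt{1 - \beta^2}}{\beta} = \sqrt{\frac{1}{\beta^2} - 1} . \tag{1.3.45} \]
1.3.5 Twin paradox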
(x , y)
cos sin
sin cos
(1.3.47)
so that
(x1 , y1 )
x2
y2
=
=
cos sin
sin cos
(x1 , y1 )
x2
,
y2
(x1 , y1 )
23
cos sin
sin cos
x2
y2
(1.3.48)
since
T
R R = R IR = I =
1 0
0 1
(1.3.49)
(1.3.50)
Introduce next the Lorentz boost, for example along the direction x, which acts on space-time
displacements (x, w) as
and the Minkowski metric
x
w
1
1 2
1 2
1 2
1 0
0 1
1
1 2
x
w
(1.3.51)
(1.3.52)
(1.3.53)
(1.3.54)
We remark once again that results of measurements are scalars: what is measured by a given observer cannot be disputed by others. For example, if $S$ measures a rod's length to be $L$, no observer $S'$ can claim that $S$ saw the rod with a length $L' \neq L$, although they would themselves measure the rod's length to be $L'$. Proper time is a particular case: it is the time measured by a comoving observer. We will see that things become even more complicated in General Relativity, where a measurement involves (at least) two scalars: the quantity measured by a given observer and a scalar that defines the space-time point where the measurement is taken.
Now that we have introduced the Minkowski metric, it is easy to see that, given the two events $A$ and $B$, any physical trajectory connecting them must be represented by a time-like curve, along which the proper time is determined by
\[ \tau_B - \tau_A = \int_A^B \sqrt{dw^2 - dx^2} = \int_A^B \sqrt{1 - \left(\frac{dx}{dw}\right)^2}\, dw \le c\,(t_B - t_A) , \tag{1.3.55} \]
in which we replaced $\Delta x \to dx$ and $\Delta t \to dt$, $x$ and $w$ (or $t$) being the coordinates along the chosen trajectory in the inertial frame where $A$ and $B$ occur at the same position ($x_A = x_B = 0$). It follows that the straight line representing the twin who remained in the origin has the longest possible proper time $t_B - t_A$: the twin who travelled will necessarily be younger by the time they meet again.
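The inequality in Eq. (1.3.55) can be illustrated on piecewise-straight world-lines. The sketch below assumes units with $c = 1$ and a hypothetical round trip at speed $0.8\,c$; the helper `proper_time` and the event lists are illustrative, not taken from the notes.

```python
import math

def proper_time(events, c=1.0):
    """Sum sqrt(dt^2 - dx^2/c^2) over the straight segments joining (t, x) events."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt ** 2 - (dx / c) ** 2)
    return total

# Twin at rest in the origin between events A = (0, 0) and B = (10, 0)
stay_home = [(0.0, 0.0), (10.0, 0.0)]
# Travelling twin: out at speed 0.8 for 5 time units, then back (assumed values)
traveller = [(0.0, 0.0), (5.0, 4.0), (10.0, 0.0)]

tau_home = proper_time(stay_home)
tau_trav = proper_time(traveller)
```

With these numbers the twin at rest ages 10 units and the traveller 6, in line with the straight world-line having the longest proper time.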
1.4 Relativistic dynamics
The very concept of Newtonian force between separate bodies implies an action at a distance
which is incompatible with the principles of Special Relativity. On the other hand, contact interactions, such as those involved in collisions, are perfectly acceptable, since they occur when two
bodies touch at one point in space-time (we shall always neglect the duration of the collision for
simplicity). We can therefore describe collisions in a rather simple way in Special Relativity.
1.4.1
\[ P^\mu = \left( \frac{m_0\, c}{\sqrt{1 - u^2/c^2}} , \frac{m_0\, \vec u}{\sqrt{1 - u^2/c^2}} \right) , \tag{1.4.1} \]
where the index $\mu = 0, 1, 2, 3$ for $t$, $x$, $y$ and $z$-components, respectively. Using the law of velocity composition (1.3.21)-(1.3.23), one can explicitly verify that $P^\mu$ transforms like a space-time displacement under Lorentz transformations, that is
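\[
\begin{cases}
P^x = \dfrac{m_0\, u_x}{\sqrt{1 - u^2/c^2}} = \dfrac{P'^x + \frac{v}{c} \frac{E'}{c}}{\sqrt{1 - v^2/c^2}} \\[2ex]
P^y = P'^y = \dfrac{m_0\, u_y}{\sqrt{1 - u^2/c^2}} \\[2ex]
P^z = P'^z = \dfrac{m_0\, u_z}{\sqrt{1 - u^2/c^2}} \\[2ex]
\dfrac{E}{c} = \dfrac{m_0\, c}{\sqrt{1 - u^2/c^2}} = \dfrac{\frac{E'}{c} + \frac{v}{c} P'^x}{\sqrt{1 - v^2/c^2}} .
\end{cases}
\tag{1.4.2}
\]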
Once we have the necessary mathematical tools at our disposal, it will however be easier to note that
\[ U^\mu = \frac{dx^\mu}{d\tau} \tag{1.4.3} \]
is a four-vector, since the proper time $\tau$ is a scalar, as we have seen previously, and $dx^\mu$ is the displacement four-vector.
Note that the above four-velocity $U^\mu$ is not the quantity measured by an observer. The latter is given by the usual $\vec u$ in the reference frame of the observer: this is the reason some texts prefer to introduce the relativistic mass
\[ m = \frac{m_0}{\sqrt{1 - u^2/c^2}} , \tag{1.4.4} \]
where $m_0$ is the rest (or proper) mass of the particle, as measured by an observer at rest with the particle itself. This choice is equivalent to introducing a relativistic velocity in the expression of the relativistic momentum,
\[ \vec P = m\, \vec u = \frac{m_0\, \vec u}{\sqrt{1 - u^2/c^2}} . \tag{1.4.5} \]
In the particular case in which the particle is at rest in the frame $S$ (that is, $\vec u = 0$), we straightforwardly have that, in $S$,
\[ U^\mu = (1, 0, 0, 0) , \tag{1.4.6} \]
since $dt = d\tau$ and the observer does not move with respect to itself. Further, Eq. (1.4.1) clearly satisfies a correspondence principle, since its spatial components reduce to $\vec p$ for $|u| \ll c$.
In the following, we shall show that it is $P^\mu$ which is conserved, besides being relativistically invariant (in form). We shall also see the meaning of $P^0$, which has no counterpart in the Newtonian momentum $\vec p$.
1.4.2 Elastic collisions
choose the frame $S'$ so that initial total momentum is zero, that is $u'_{Bx} = -u'_{Ax}$ and $u'_{By} = -u'_{Ay}$. In $S$, on the other hand, we assume $A$ does not move along the $x$-axis, so that $v = -u'_{Ax} = u'_{Bx}$, and $U_{Ax} = u_{Ax} = 0$ (see Fig. 1.20). In $S'$, momentum conservation yields
\[ U'_{Ax} = u'_{Ax} = -u'_{Bx} = -U'_{Bx} \tag{1.4.7} \]
\[ U'_{Ay} = -u'_{Ay} = u'_{By} = -U'_{By} , \tag{1.4.8} \]
so that both energy and momentum are indeed conserved (only the $y$-components flip sign and the speeds do not change after the collision).
In Newtonian mechanics, velocities perpendicular to the direction of motion of $S'$ with respect to $S$ do not change. From the above Eq. (1.4.8), we then readily find
\[ U_{Ay} = -u_{Ay} = u_{By} = -U_{By} . \tag{1.4.9} \]
We next note that Lorentz transformations instead affect the components of the velocity along perpendicular directions, and yield
\[ u'_{By} = u_{By}\, \frac{\sqrt{1 - \beta^2}}{1 - \frac{v}{c^2}\, u_{Bx}} , \qquad u'_{Ay} = u_{Ay} \sqrt{1 - \beta^2} , \tag{1.4.10} \]
where we used $u_{Ax} = 0$. Eqs. (1.4.10) imply that Eq. (1.4.8) is not compatible with Eq. (1.4.9) for $v \neq 0$. This is evidence that the Newtonian definition of momentum, $\vec p = m_0\, \vec u$, cannot represent a conserved quantity in all inertial frames, and must be modified. This can be achieved by modifying either its dependence on the speed or the mass, in such a way that it reproduces the usual expression for small velocities $|v| \ll c$. For example, one can introduce the relativistic mass of Eq. (1.4.4) or, simply, the relativistic momentum (1.4.1).
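The incompatibility described above can be made concrete with numbers. The sketch below assumes units with $c = 1$, equal proper masses, and the velocity composition law with $S'$ moving at $+v$ along $x$ in $S$; the helper names and the sample velocities are illustrative assumptions.

```python
import math

C = 1.0  # units with c = 1 (assumption)

def to_S(u_prime, v):
    """Velocity in S from (ux', uy') in S', with S' moving at +v along x in S."""
    ux, uy = u_prime
    denom = 1.0 + v * ux / C ** 2
    return ((ux + v) / denom, uy * math.sqrt(1.0 - v ** 2 / C ** 2) / denom)

def gamma(u):
    return 1.0 / math.sqrt(1.0 - (u[0] ** 2 + u[1] ** 2) / C ** 2)

def total_py(velocities, relativistic, m0=1.0):
    """Total y-momentum: m0*uy (Newtonian) or gamma*m0*uy (relativistic)."""
    return sum((gamma(u) if relativistic else 1.0) * m0 * u[1] for u in velocities)

v, w = 0.6, 0.3                      # assumed speeds in S'
before_Sp = [(-v, -w), (v, w)]       # particles A and B before the collision in S'
after_Sp = [(-v, w), (v, -w)]        # only the y-components flip sign
before_S = [to_S(u, v) for u in before_Sp]
after_S = [to_S(u, v) for u in after_Sp]

newton_change = total_py(after_S, False) - total_py(before_S, False)
rel_change = total_py(after_S, True) - total_py(before_S, True)
```

In this example the Newtonian total $y$-momentum changes in $S$, whereas the relativistic one does not, which is the motivation for the modified definition.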
We conceived the above collision having in mind Newtonian concepts (for which it is clearly elastic). However, we should now show that the collision is indeed elastic by means of truly relativistic quantities. Let us first note that the new momentum satisfies the following relation
\[ m_0^2\, c^2 = m^2 c^2 - \vec P \cdot \vec P = m^2 c^2 - m^2 u^2 , \tag{1.4.11} \]
and, upon differentiating both sides [note that $d(m_0^2\, c^2) = 0$], we obtain
\[ c^2\, dm = m\, u\, du + u^2\, dm . \tag{1.4.12} \]
We can then derive an expression for the (change in) relativistic kinetic energy, by requiring it equals the work done on the particle, that is the rate of change in momentum integrated along the path of the particle. For example, in one spatial dimension, this implies
\[
\begin{aligned}
\Delta K &= \int_{x_A}^{x_B} \frac{d}{dt}(m\, u)\, dx = \int_{x_A}^{x_B} m\, \frac{du}{dt}\, dx + \int_{x_A}^{x_B} \frac{dm}{dt}\, u\, dx \\
&= \int_{t_A}^{t_B} \left( m\, \frac{du}{dt}\, u + \frac{dm}{dt}\, u^2 \right) dt \\
&= \int_{u_A}^{u_B} \left( m\, u + u^2\, \frac{dm}{du} \right) du \\
&= c^2 \int_{m_A}^{m_B} dm = c^2\, (m_B - m_A) ,
\end{aligned}
\tag{1.4.13}
\]
where we repeatedly changed integration variable and finally used Eq. (1.4.12). For a particle initially at rest, we set $m_A = m_0$, $m_B = m$, and we find
\[ K = c^2\, (m - m_0) = m_0\, c^2 \left( \frac{1}{\sqrt{1 - u^2/c^2}} - 1 \right) = m_0\, c^2 \left( 1 + \frac{1}{2} \frac{u^2}{c^2} + \ldots - 1 \right) . \tag{1.4.14} \]
In the non-relativistic limit, $u \ll c$,
\[ K \simeq \frac{1}{2}\, m_0\, u^2 , \tag{1.4.15} \]
and, in general,
\[ K = m_0\, c^2 \left( \frac{1}{\sqrt{1 - u^2/c^2}} - 1 \right) . \tag{1.4.16} \]
(1.4.17)
Note that for $u \to c$, the kinetic energy $K$ diverges. In fact, it is sensible that one needs an infinite amount of energy to accelerate a particle to the speed of light.
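The two regimes of the relativistic kinetic energy can be compared numerically. This is a minimal sketch assuming $c = 1$ and unit proper mass; the function names and sample speeds are illustrative.

```python
import math

def kinetic_energy(m0, u, c=1.0):
    """Relativistic kinetic energy m0 c^2 (1/sqrt(1 - u^2/c^2) - 1)."""
    return m0 * c ** 2 * (1.0 / math.sqrt(1.0 - (u / c) ** 2) - 1.0)

def newtonian_kinetic_energy(m0, u):
    """Newtonian kinetic energy m0 u^2 / 2."""
    return 0.5 * m0 * u ** 2

# Ratio relativistic / Newtonian at a slow and at a fast speed (assumed values)
slow_ratio = kinetic_energy(1.0, 0.01) / newtonian_kinetic_energy(1.0, 0.01)
fast_ratio = kinetic_energy(1.0, 0.99) / newtonian_kinetic_energy(1.0, 0.99)
```

The ratio stays close to 1 for $u \ll c$ and grows without bound as $u$ approaches $c$.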
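The four-momentum can now be rewritten as
\[ P^\mu = \left( \frac{E}{c} , \vec P \right) , \tag{1.4.18} \]
and, from Eq. (1.4.11), it is easy to see that its components satisfy
\[ E^2 - c^2\, \vec P \cdot \vec P = m_0^2\, c^4 . \tag{1.4.19} \]
In the frame $S'$ of the collision considered above, the total spatial momentum vanishes,
\[ \vec P'_{\rm in} = 0 = \vec P'_{\rm fin} , \tag{1.4.20} \]
and the time component
\[ c\, P'^0 = m_0\, c^2 + K'_A + m_0\, c^2 + K'_B = 2\, m_0\, c^2 + K'_A + K'_B = 2 \left( m_0\, c^2 + K'_A \right) , \tag{1.4.21} \]
is also conserved, meaning its initial (before collision) and final (after collision) values are the same, since speeds are again conserved by construction in $S'$,
\[ P'^0_{\rm in} = P'^0_{\rm fin} , \tag{1.4.22} \]
from which
\[ K'_{\rm in} = 2\, K'_A = K'_{\rm fin} . \tag{1.4.23} \]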
We can thus conclude the collision is indeed elastic in the frame $S'$, and we have full relativistic momentum conservation,
\[ P'^\mu_{\rm in} = P'^\mu_{\rm fin} . \tag{1.4.24} \]
Since the relativistic four-momentum transforms according to Eq. (1.4.2), in $S$, we must have
\[ P^\mu_{\rm in} = P^\mu_{\rm fin} , \tag{1.4.25} \]
with
\[ c\, P^0 = 2\, m_0\, c^2 + K_A + K_B , \tag{1.4.26} \]
so that
\[ K_{\rm in} = K_A + K_B = K_{\rm fin} , \tag{1.4.27} \]
which proves that the collision is elastic also in the frame $S$. We therefore conclude the collision is elastic in all inertial frames.
Relativistic force law
In the previous derivation of $K$, we implicitly assumed the force $\vec F$ acting on a particle is given by the time-derivative of the particle's relativistic momentum. If one insists on this interpretation, it immediately follows that $\vec F$ is not parallel to the particle's acceleration,
\[ \vec F = \frac{d}{dt}(m\, \vec u) = \frac{d\vec u}{dt}\, m + \vec u\, \frac{dm}{dt} = \frac{dm}{dt}\, \vec u + m\, \vec a , \tag{1.4.28} \]
since the first term above is parallel to the particle's velocity. Moreover, from Eq. (1.4.17),
\[ \frac{dm}{dt} = \frac{1}{c^2} \frac{dE}{dt} = \frac{1}{c^2} \frac{dK}{dt} = \frac{1}{c^2} \frac{d}{dt} \int \vec F \cdot d\vec l = \frac{1}{c^2}\, \vec F \cdot \vec u , \tag{1.4.29} \]
where we assumed $\vec F$ is constant in the last step. Replacing the above into Eq. (1.4.28) yields
\[ m\, \vec a = \vec F - \frac{\vec F \cdot \vec u}{c^2}\, \vec u , \tag{1.4.30} \]
which shows that the acceleration contains a component parallel to the force and a second component parallel to the particle's velocity.
From the transformation laws for the time $t \to t'$ and the product $m\, \vec u \to m'\, \vec u'$, one could easily derive the transformation laws for the force $\vec F \to \vec F'$. We shall however not need to display them here, since the force is no longer a quantity that transforms nicely under the new coordinate transformations of Special Relativity.
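Eq. (1.4.30) can be cross-checked against the definition of the force as the rate of change of the relativistic momentum. The sketch below assumes $c = 1$, unit proper mass, two spatial dimensions and a small finite time step; all names and numerical values are illustrative.

```python
import math

def gamma(u, c=1.0):
    return 1.0 / math.sqrt(1.0 - sum(x * x for x in u) / c ** 2)

def acceleration(F, u, m0=1.0, c=1.0):
    """a = (F - (F.u) u / c^2) / m, with m = gamma * m0, as in Eq. (1.4.30)."""
    m = gamma(u, c) * m0
    F_dot_u = sum(f * x for f, x in zip(F, u))
    return [(f - F_dot_u * x / c ** 2) / m for f, x in zip(F, u)]

def momentum(u, m0=1.0, c=1.0):
    g = gamma(u, c)
    return [g * m0 * x for x in u]

F = [0.0, 1.0]   # assumed force, along y
u = [0.6, 0.3]   # assumed velocity, not parallel to F
a = acceleration(F, u)

# Finite-difference estimate of dP/dt after one small step of the velocity
dt = 1e-6
u_next = [x + ax * dt for x, ax in zip(u, a)]
dp_dt = [(p1 - p0) / dt for p0, p1 in zip(momentum(u), momentum(u_next))]

# z-component of a x F: non-zero when a is not parallel to F
cross = a[0] * F[1] - a[1] * F[0]
```

The estimate `dp_dt` reproduces `F` up to the step size, while `cross` is clearly non-zero, so the acceleration is not parallel to the force.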
1.4.3 Inelastic collisions
We have seen that the new definition of relativistic momentum hints at a new definition of total energy for a point-like particle, namely
\[ E = c\, P^0 = m_0\, c^2 + K . \tag{1.4.31} \]
This result would be purely academic if there were no ways to transform the new proper energy $m_0\, c^2$ into a different form, for example kinetic energy (or the other way around).
For this purpose, we next consider a totally inelastic collision between two particles $A$ and $B$ with the same proper mass $m_0$ and velocities $u'_B = -u'_A = u'$ in $S'$. In the frame $S$, the particle $A$ is at rest, so that $v = u'$. After the collision, the two particles remain attached and form a single particle $C$ (see Fig. 1.21). From the Lorentz transformations (1.3.12), or velocity composition (1.3.21), we again obtain
\[ u_B = \frac{u' + v}{1 + \frac{u'\, v}{c^2}} = \frac{2\, v}{1 + \frac{v^2}{c^2}} , \qquad U_C = v , \tag{1.4.32} \]
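Since
\[ \frac{m_0}{\sqrt{1 - \frac{u_B^2}{c^2}}} = m_0\, \frac{1 + \frac{v^2}{c^2}}{1 - \frac{v^2}{c^2}} , \tag{1.4.33} \]
momentum conservation in $S$ reads
\[ \frac{m_0\, u_B}{\sqrt{1 - \frac{u_B^2}{c^2}}} = \frac{M_0\, U_C}{\sqrt{1 - \frac{U_C^2}{c^2}}} , \tag{1.4.34} \]
where $M_0$ is the proper mass of $C$. Upon inserting the speeds in Eq. (1.4.32), we obtain
\[ \frac{m_0\, u_B}{\sqrt{1 - \frac{u_B^2}{c^2}}} = m_0\, \frac{1 + \frac{v^2}{c^2}}{1 - \frac{v^2}{c^2}}\, \frac{2\, v}{1 + \frac{v^2}{c^2}} = \frac{2\, m_0\, v}{1 - \frac{v^2}{c^2}} = \frac{M_0\, v}{\sqrt{1 - \frac{v^2}{c^2}}} , \tag{1.4.35} \]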
We do not need to compute K explicitly, as long as the proper mass m0 does not depend on the reference frame.
or
\[ M_0 = \frac{2\, m_0}{\sqrt{1 - \frac{v^2}{c^2}}} , \tag{1.4.36} \]
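which is larger than the sum of the proper masses of $A$ and $B$. In fact, all kinetic energy is converted into mass in $S'$,
\[ M_0 - 2\, m_0 = 2\, m_0 \left( \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} - 1 \right) = \frac{K'_B + K'_A}{c^2} = \frac{2\, K'_A}{c^2} , \tag{1.4.37} \]
and energy is conserved in both frames. In particular, in $S'$, initial and final energies are given by
\[ E'_{\rm in} = 2\, m_0\, c^2 + K' = \frac{2\, m_0\, c^2}{\sqrt{1 - \frac{v^2}{c^2}}} , \qquad E'_{\rm fin} = M_0\, c^2 = \frac{2\, m_0\, c^2}{\sqrt{1 - \frac{v^2}{c^2}}} , \tag{1.4.38} \]
whereas, in $S$,
\[
\begin{aligned}
E_{\rm in} &= m_0\, c^2 + (m_0\, c^2 + K_B) = 2\, m_0\, c^2 + m_0\, c^2 \left( \frac{1}{\sqrt{1 - \frac{u_B^2}{c^2}}} - 1 \right) = \frac{2\, m_0\, c^2}{1 - \frac{v^2}{c^2}} \\
E_{\rm fin} &= M_0\, c^2 + K_C = \frac{2\, m_0\, c^2}{1 - \frac{v^2}{c^2}} .
\end{aligned}
\tag{1.4.39}
\]
Note also that $E > E'$: the energy of the system is the smallest in the frame at rest with the final particle $C$ (see footnote 9).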
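The bookkeeping of the inelastic collision can be verified with explicit numbers. The sketch below assumes $c = 1$, $m_0 = 1$ and $v = 0.6$, and obtains $M_0$ from momentum conservation in $S$; the variable names are illustrative.

```python
import math

def gamma(u, c=1.0):
    return 1.0 / math.sqrt(1.0 - (u / c) ** 2)

c, m0, v = 1.0, 1.0, 0.6                   # assumed units and speed
u_B = 2.0 * v / (1.0 + v ** 2 / c ** 2)    # speed of B in S, Eq. (1.4.32)

# Proper mass of C from momentum conservation in S
M0 = gamma(u_B, c) * m0 * u_B / (gamma(v, c) * v)

# Energies before and after the collision in S (A at rest) and in S'
E_in_S = m0 * c ** 2 + gamma(u_B, c) * m0 * c ** 2
E_fin_S = gamma(v, c) * M0 * c ** 2
E_in_Sprime = 2.0 * gamma(v, c) * m0 * c ** 2
E_fin_Sprime = M0 * c ** 2
```

With these values $M_0 = 2.5$, larger than $2\,m_0$, and the energy is conserved separately in each frame, with the smaller value in $S'$.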
Equivalence of mass and energy
In the previous experiment we saw all kinetic energy turned into mass. This yields the physical meaning of the famous
\[ E = m\, c^2 = m_0\, c^2 , \tag{1.4.40} \]
which holds in the rest frame of a massive particle. Let us emphasise that energy is always conserved in Special Relativity, even in processes which are inelastic from the point of view of Newtonian mechanics. Yet, the actual possibility of converting mass into energy (or vice versa) is subject to restrictions. For example, if the particle $C$ cannot reduce its proper mass (like an elementary particle), then the inverse process of the inelastic collision previously discussed may not occur.
9 Note the duality with the proper time being the longest for an observer at rest.
1.5 Electromagnetism
We already know Special Relativity was designed to comply with Maxwell's laws of electromagnetism. Let us then check that it works.
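1.5.1
\[ \rho_0 = \frac{N\, e}{L_0^3} , \qquad \vec j_0 = 0 . \tag{1.5.1} \]
In $S$, on the other hand, the cube is contracted along the $x$-axis and one has
\[ \rho = \frac{N\, e}{L_0^2\, L_0 \sqrt{1 - \beta^2}} = \frac{\rho_0}{\sqrt{1 - \beta^2}} , \qquad \vec j = \frac{\rho_0\, \vec v}{\sqrt{1 - \beta^2}} . \tag{1.5.2} \]
The four-vector $J^\mu = (c\, \rho, \vec j)$ is called four-current and is mathematically similar to the four-momentum $P^\mu$ (just replace $m_0$ with $\rho_0$ in the latter). In fact, it shares similar properties, such as the mass-shell relation which we recall here
\[ c^2 t^2 - x^2 = c^2 \tau^2 , \qquad m^2 c^2 - p^2 = m_0^2\, c^2 , \qquad c^2 \rho^2 - j^2 = c^2 \rho_0^2 . \tag{1.5.3} \]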
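Let us next consider a current flowing through a thin wire at rest in $S$. The current is composed of electrons moving with velocity $\vec u$ in $S$ (see Fig. 1.23). Let $n$ denote the number of electrons per unit volume (in $S$). Since the wire is electrically neutral, we must have
\[ \rho_- = -n\, e , \qquad \rho_+ = n\, e , \qquad \rho = \rho_+ + \rho_- = 0 , \tag{1.5.4} \]
\[ \vec j_+ = 0 , \qquad \vec j = \vec j_+ + \vec j_- = \rho_-\, \vec u , \tag{1.5.5} \]
where we can assume $\vec u = u\, \hat x$. In a frame $S'$ moving with velocity $\vec v$ parallel to $\vec u$ with respect to $S$, we instead have
\[ \rho'_- = \frac{\rho_- - v\, \frac{j_-}{c^2}}{\sqrt{1 - \beta^2}} = \frac{\rho_- \left( 1 - \frac{u\, v}{c^2} \right)}{\sqrt{1 - \beta^2}} , \qquad \rho'_+ = \frac{\rho_+ - v\, \frac{j_+}{c^2}}{\sqrt{1 - \beta^2}} = \frac{\rho_+}{\sqrt{1 - \beta^2}} , \tag{1.5.6} \]
and
\[ \vec j'_- = \frac{\vec j_- - \rho_-\, \vec v}{\sqrt{1 - \beta^2}} = \frac{\rho_-\, (\vec u - \vec v)}{\sqrt{1 - \beta^2}} , \qquad \vec j'_+ = \frac{\vec j_+ - \rho_+\, \vec v}{\sqrt{1 - \beta^2}} = -\frac{\rho_+\, \vec v}{\sqrt{1 - \beta^2}} . \tag{1.5.7} \]
If, further, we have $u = v$, then $j'_- = 0$, as expected: in $S$ the current is due to the motion of negative charges, whereas positive charges move in $S'$.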
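The transformation of the charge and current densities of the wire can be followed numerically. The sketch below assumes $c = 1$, motion along $x$ only, and sets $n\,e = 1$ in arbitrary units; `boost_current` and the chosen speeds are illustrative.

```python
import math

def boost_current(rho, j, v, c=1.0):
    """Charge and current density in S', moving with speed v along x in S."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (rho - v * j / c ** 2), g * (j - rho * v)

c = 1.0
n_e = 1.0          # n * e in arbitrary units (assumption)
u, v = 0.3, 0.3    # electron speed in S and speed of S', chosen equal

rho_minus, j_minus = -n_e, -n_e * u   # electrons in S
rho_plus, j_plus = n_e, 0.0           # positive charges at rest in S

rho_minus_p, j_minus_p = boost_current(rho_minus, j_minus, v, c)
rho_plus_p, j_plus_p = boost_current(rho_plus, j_plus, v, c)
rho_total_p = rho_plus_p + rho_minus_p

# Invariant c^2 rho^2 - j^2 for the electrons in both frames
inv_S = (c * rho_minus) ** 2 - j_minus ** 2
inv_Sp = (c * rho_minus_p) ** 2 - j_minus_p ** 2
```

For $u = v$ the electron current vanishes in $S'$ and the current there is carried by the positive charges. The same formulas also give a non-zero total charge density in $S'$, although the wire is neutral in $S$.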
1.5.2 Transformations for $\vec E$ and $\vec B$
\[ \vec F = \frac{d}{dt}\, \frac{m_0\, \vec u}{\sqrt{1 - \frac{u^2}{c^2}}} , \tag{1.5.8} \]
does not transform nicely into a new vector $\vec F'$ because of the law of transformation of $t \to t' \neq t$. There is one case which can be dealt with easily, namely the one in which the body subject to the force is (momentarily) at rest in $S'$, so that $dt' = d\tau$. The force acting upon it, as seen in a system $S$ moving with velocity $\vec v$ with respect to $S'$ (and the body itself), is then given by
\[ F_x = F'_x , \qquad F_y = F'_y \sqrt{1 - \frac{v^2}{c^2}} , \qquad F_z = F'_z \sqrt{1 - \frac{v^2}{c^2}} , \tag{1.5.9} \]
which shows that there is no change in the longitudinal component (parallel to $\vec v$), but only in the orthogonal components.
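We can now apply the above result to the case of an electromagnetic force acting on a test charge $q$, which is given by Lorentz's law,
\[ \vec F = q\, (\vec E + \vec u \times \vec B) , \tag{1.5.10} \]
and should transform according to the rules we mentioned previously (albeit never displayed in full). In particular, suppose the test charge $q$ is at rest in $S'$ but moves in $S$, with the velocity
\[ u_x = v , \qquad u_y = u_z = 0 . \tag{1.5.11} \]
In $S'$ the charge is at rest and is only subject to the electric field,
\[ \vec F' = q\, \vec E' , \tag{1.5.12} \]
whereas, in $S$, the $x$-component of Eq. (1.5.10) reads
\[ F_x = q\, E_x , \tag{1.5.13} \]
so that
\[ F_x = F'_x \quad \Rightarrow \quad E'_x = E_x . \tag{1.5.14} \]
With similar arguments, for the $y$ and $z$ components, one can finally obtain the full set of transformation rules of the components of the electric field, namely
\[ E'_x = E_x , \qquad E'_y = \gamma\, (E_y - v\, B_z) , \qquad E'_z = \gamma\, (E_z + v\, B_y) , \tag{1.5.15} \]
where $\gamma = (1 - \beta^2)^{-1/2}$.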
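This result can be generalised for a charged body moving along a generic trajectory as
\[ \vec E'_\parallel = \vec E_\parallel , \qquad \vec E'_\perp = \gamma \left[ \vec E_\perp + \vec v \times \vec B \right] , \tag{1.5.16} \]
\[ \vec B'_\parallel = \vec B_\parallel , \qquad \vec B'_\perp = \gamma \left[ \vec B_\perp - \frac{\vec v}{c^2} \times \vec E \right] , \tag{1.5.17} \]
where $\parallel$ means the component parallel to the relative velocity $\vec v$ and $\perp$ those perpendicular to $\vec v$.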
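The rules (1.5.16)-(1.5.17) can be tried on sample fields. The sketch below assumes a boost along $x$, units with $c = 1$, and arbitrary field values; it also evaluates the combinations $\vec E \cdot \vec B$ and $E^2 - c^2 B^2$, which are known to take the same value in all inertial frames. All names are illustrative.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def boost_fields_x(E, B, v, c=1.0):
    """Fields in S' moving with speed v along x, following Eqs. (1.5.16)-(1.5.17)."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    v_cross_B = cross([v, 0.0, 0.0], B)
    v_cross_E = cross([v, 0.0, 0.0], E)
    E_p = [E[0]] + [g * (E[i] + v_cross_B[i]) for i in (1, 2)]
    B_p = [B[0]] + [g * (B[i] - v_cross_E[i] / c ** 2) for i in (1, 2)]
    return E_p, B_p

E = [1.0, 2.0, 3.0]    # assumed sample fields
B = [0.5, -1.0, 2.0]
v = 0.6
E_p, B_p = boost_fields_x(E, B, v)
g = 1.0 / math.sqrt(1.0 - v ** 2)
```

The $y$-component of `E_p` coincides with $\gamma\,(E_y - v\,B_z)$ of Eq. (1.5.15), and both quadratic combinations are unchanged by the boost.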
Instead of deriving the above transformations, we shall just show that the transformation of the electric field $\vec E$ is indeed in agreement with Maxwell's equations. For this purpose, let us consider again a point-like particle with charge $q$. It will produce a spherically symmetric radial electric field in the frame in which it is at rest. However, in a frame moving with constant velocity, the field lines will shrink along the direction of relative motion (see Fig. 1.24). This result can be easily derived from Gauss' law, namely
~ E
~ .
(1.5.18)
(1.5.19)
which does not depend on the reference frame and the flux of the electric field is therefore invariant. Since we know the area parallel to the direction of motion shrinks, whereas that orthogonal is unaffected, we can immediately conclude that $E_\parallel$ does not change, whereas $E_\perp$ must increase to compensate for the reduced area.
It is clear from the expressions (1.5.16) and (1.5.17) that the fields $\vec E$ and $\vec B$ do not simply transform as Lorentz vectors, unlike the four-current $J^\mu$ they are sourced by. Their geometrical nature will in fact become clear after we introduce a more convenient mathematical formalism, which will also allow us to write Maxwell's equations in a more compact (geometrical) form.
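1.5.3
\[ x^\mu \to x'^\mu = M^\mu{}_\nu\, x^\nu , \tag{1.5.20} \]
\[ x'^\mu = M^\mu{}_\nu\, x^\nu = \frac{\partial x'^\mu}{\partial x^\nu}\, x^\nu , \tag{1.5.21} \]
where repeated indices always appear one up and one down and are implicitly summed over (Einstein's notation), and the matrix $M$ is invertible, that is
\[ \det(M) \neq 0 \quad \Rightarrow \quad (M^{-1})^\mu{}_\nu = \frac{\partial x^\mu}{\partial x'^\nu} . \tag{1.5.22} \]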
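We define the following quantities:
Scalars: Quantities that do not change, like numbers and functions that satisfy
\[ \phi'(x') = \phi(x) . \tag{1.5.23} \]
All scalars we have seen so far are numbers (the proper mass and charge of a particle), except for the phase of a wave (which is a scalar field).
Vectors: Quantities $V^\mu$ that transform like the coordinates:
\[ V'^\mu = M^\mu{}_\nu\, V^\nu = \frac{\partial x'^\mu}{\partial x^\nu}\, V^\nu . \tag{1.5.24} \]
Covectors: Quantities $W_\mu$ that transform with the inverse matrix:
\[ W'_\mu = (M^{-1})^\nu{}_\mu\, W_\nu = \frac{\partial x^\nu}{\partial x'^\mu}\, W_\nu , \tag{1.5.26} \]
since then
\[ W'_\mu\, V'^\mu = \frac{\partial x^\nu}{\partial x'^\mu}\, \frac{\partial x'^\mu}{\partial x^\rho}\, W_\nu\, V^\rho = \delta^\nu_\rho\, W_\nu\, V^\rho = W_\nu\, V^\nu . \tag{1.5.27} \]
Tensors: Quantities with $n$ upper and $m$ lower indices that transform like products of $n$ vectors and $m$ covectors:
\[ T'^{\mu_1 \mu_2 \ldots \mu_n}{}_{\nu_1 \nu_2 \ldots \nu_m} = \frac{\partial x'^{\mu_1}}{\partial x^{\alpha_1}}\, \frac{\partial x'^{\mu_2}}{\partial x^{\alpha_2}} \cdots \frac{\partial x'^{\mu_n}}{\partial x^{\alpha_n}}\, \frac{\partial x^{\beta_1}}{\partial x'^{\nu_1}}\, \frac{\partial x^{\beta_2}}{\partial x'^{\nu_2}} \cdots \frac{\partial x^{\beta_m}}{\partial x'^{\nu_m}}\, T^{\alpha_1 \alpha_2 \ldots \alpha_n}{}_{\beta_1 \beta_2 \ldots \beta_m} . \tag{1.5.28} \]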
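It is now easy to see that all operations defined in a general vector space can be applied to the present case. For example, multiplication of a $(n, m)$ tensor by a scalar $a$ does not change its transformation properties, that is
\[ a\, T^{\mu_1 \mu_2 \ldots \mu_n}{}_{\nu_1 \nu_2 \ldots \nu_m} = R^{\mu_1 \mu_2 \ldots \mu_n}{}_{\nu_1 \nu_2 \ldots \nu_m} , \tag{1.5.29} \]
is still a $(n, m)$ tensor, and tensors of same rank can be added and subtracted.
Further, by multiplying the components of a $(n, m)$ tensor $T$ by the components of a $(p, q)$ tensor $Q$, one obtains a $(n + p, m + q)$ tensor $R$,
\[ T^{\mu_1 \mu_2 \ldots \mu_n}{}_{\nu_1 \nu_2 \ldots \nu_m}\, Q^{\alpha_1 \alpha_2 \ldots \alpha_p}{}_{\beta_1 \beta_2 \ldots \beta_q} = R^{\mu_1 \mu_2 \ldots \mu_n \alpha_1 \alpha_2 \ldots \alpha_p}{}_{\nu_1 \nu_2 \ldots \nu_m \beta_1 \beta_2 \ldots \beta_q} . \tag{1.5.30} \]
On the other hand, by contracting a rank $(n, m)$ tensor $T$ with a $(p < m, q < n)$ tensor $Q$, one obtains a $(n - q, m - p)$ tensor $R$. For example,
\[ T^{\mu_1 \mu_2 \ldots \mu_n}{}_{\nu_1 \nu_2 \ldots \nu_m}\, Q^{\nu_1 \nu_2 \ldots \nu_p}{}_{\mu_1 \mu_2 \ldots \mu_q} = R^{\mu_{q+1} \mu_{q+2} \ldots \mu_n}{}_{\nu_{p+1} \nu_{p+2} \ldots \nu_m} . \tag{1.5.31} \]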
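An example of covector is given by the (four-)gradient. From the chain rule, we in fact have
\[ \frac{\partial}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu}\, \frac{\partial}{\partial x^\nu} = (M^{-1})^\nu{}_\mu\, \frac{\partial}{\partial x^\nu} . \tag{1.5.32} \]
For this reason, we shall often use the more compact notation
\[ \partial_\mu = \frac{\partial}{\partial x^\mu} . \tag{1.5.33} \]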
(1.5.34)
is a vector.
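The above formalism holds for any linear transformation. To make contact with relativity, we require that $M = \Lambda$ be a Lorentz transformation. We shall see later on what this really means, but for now it is enough to know that, from
\[ \eta_{\alpha\beta}\, \Lambda^\alpha{}_\mu\, \Lambda^\beta{}_\nu = \eta_{\mu\nu} \quad \Leftrightarrow \quad \Lambda^T\, \eta\, \Lambda = \eta , \tag{1.5.35} \]
one finds that to each vector $V^\mu$ there can be associated a co-vector by contracting with the Minkowski $(0, 2)$ metric tensor,
\[ V_\mu = \eta_{\mu\nu}\, V^\nu , \tag{1.5.36} \]
where
\[ \eta_{\mu\nu} = {\rm diag}\, (-1, 1, 1, 1) . \tag{1.5.37} \]
In fact,
\[ V'_\mu = \eta_{\mu\nu}\, V'^\nu = \eta_{\mu\nu}\, \Lambda^\nu{}_\rho\, V^\rho = \eta_{\mu\nu}\, \Lambda^\nu{}_\rho\, \eta^{\rho\sigma}\, V_\sigma = \Lambda_\mu{}^\sigma\, V_\sigma = (\Lambda^{-1})^\sigma{}_\mu\, V_\sigma . \tag{1.5.38} \]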
Keep in mind that $V^\mu$ and $V_\mu$ are (mathematically and perhaps physically) different objects. As a simple mnemonic rule, to go from upper indices to lower indices one just needs to change the sign of 0 components. For example,
\[ V_\mu = (-V^0, \vec V) , \qquad T_{\mu\nu} = \begin{pmatrix} T^{00} & -T^{0i} \\ -T^{i0} & T^{ij} \end{pmatrix} . \tag{1.5.39} \]
In general, contracting one index of a $(n, m)$ tensor with the $(0, 2)$ metric produces a $(n - 1, m + 1)$ tensor, just as the (inverse) $(2, 0)$ metric will produce a $(n + 1, m - 1)$ tensor. Note the total number of indices does not change: $(n - 1) + (m + 1) = (n + 1) + (m - 1) = n + m$.
A simple example that shows how useful the covariant formalism can be is given by the following: consider a particle's four-velocity,
\[ u^\mu = \frac{dx^\mu}{d\tau} , \tag{1.5.40} \]
which is a four-vector and therefore transforms as
\[ u'^\mu = \Lambda^\mu{}_\nu\, u^\nu . \tag{1.5.41} \]
As such, $u^\mu$ can be computed in any inertial reference frame and its components in any other inertial frame will be given by Lorentz transformations. In particular, in the (instantaneous) rest frame of the particle, since $dt = d\tau$,
\[ u^\mu = (1, 0, 0, 0) , \tag{1.5.42} \]
\[ u_\mu\, u^\mu = -1 , \tag{1.5.43} \]
the latter being just a re-statement of the mass-shell condition $p_\mu\, p^\mu = -m_0^2$ (having set $c = 1$). By differentiating the above with respect to $\tau$, we obtain a vector relation,
\[ 0 = \frac{du_\mu}{d\tau}\, u^\mu + u_\mu\, \frac{du^\mu}{d\tau} = 2\, u_\mu\, \frac{du^\mu}{d\tau} , \tag{1.5.44} \]
which implies that the 4-acceleration is always orthogonal to the 4-velocity [unlike the usual acceleration defined in Eq. (1.4.30)]. Note though that, besides being a neat result, the above does not have much physical meaning, since the 4-velocity and 4-acceleration are not the quantities we actually measure. Eq. (1.5.44) in the rest frame of the particle simply means the acceleration is purely spatial,
\[ a^\mu = (0, a_x, a_y, a_z) . \tag{1.5.45} \]
Since a vector's type is invariant, we deduce $a^\mu$ is space-like for all observers, like $u^\mu$ is time-like.
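The orthogonality of 4-velocity and 4-acceleration can be checked on an explicit world-line. The sketch below assumes $c = 1$, one spatial dimension, the metric signature $(-, +)$, and a hypothetical uniformly accelerated trajectory with $u^\mu = (\cosh g\tau, \sinh g\tau)$; the values of $g$ and $\tau$ are arbitrary.

```python
import math

def minkowski_dot(a, b):
    """Scalar product with metric diag(-1, +1) in 1+1 dimensions, c = 1 (assumption)."""
    return -a[0] * b[0] + a[1] * b[1]

g, tau = 0.5, 1.3   # proper acceleration and proper time (arbitrary values)

# Four-velocity and four-acceleration (its derivative with respect to tau)
u = (math.cosh(g * tau), math.sinh(g * tau))
a = (g * math.sinh(g * tau), g * math.cosh(g * tau))

u_norm = minkowski_dot(u, u)   # time-like, equal to -1
u_dot_a = minkowski_dot(u, a)  # vanishes: a is orthogonal to u
a_norm = minkowski_dot(a, a)   # space-like, equal to g^2
```

The same three quantities can be evaluated at any other value of $\tau$, since they are scalars along the world-line.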
So, what can we actually learn from the covariant formalism?
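1.5.4
\[ \vec\nabla \cdot \vec E = \frac{\rho}{\epsilon_0} , \qquad \vec\nabla \times \vec B = \mu_0\, \vec J + \mu_0\, \epsilon_0\, \frac{\partial \vec E}{\partial t} \tag{1.5.46} \]
\[ \vec\nabla \cdot \vec B = 0 , \qquad \vec\nabla \times \vec E = -\frac{\partial \vec B}{\partial t} . \tag{1.5.47} \]
The third and fourth equations do not contain sources and allow us to introduce scalar and vector potentials for $\vec E$ and $\vec B$ respectively,
\[ \vec E = -\vec\nabla \phi - \frac{\partial \vec A}{\partial t} , \qquad \vec B = \vec\nabla \times \vec A . \tag{1.5.48} \]
If we define the four-vector potential
\[ A^\mu = (\phi, \vec A) , \tag{1.5.49} \]
we can introduce the tensor
\[ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu , \tag{1.5.50} \]
so that, for example,
\[ F_{12} = \partial_x A_y - \partial_y A_x = B_z . \tag{1.5.51} \]
We now see that the components of the electric and magnetic field form the Maxwell (or field-strength) tensor given by
\[ F_{\mu\nu} = -F_{\nu\mu} = \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{pmatrix} . \tag{1.5.52} \]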
Finally, Eqs. (1.5.46), which contain the sources, take the simple form
F = J ,
(1.5.53)
and encode the true dynamics of the electromagnetic fields. Note that this differential equation is first order in the physical fields $\vec E$ and $\vec B$, but second order in the potential $A^\mu$. Also, since $\vec E$ and $\vec B$ are uniquely determined by the four components of $A^\mu$, the number of degrees of freedom of the electromagnetic field is (at most) four. We will meet the same structure when we study the gravitational interaction later on in the course.
Wave equation and gauge freedom
Putting together Eq. (1.5.50) and (1.5.53), for J = 0, we obtain the wave equation for the
propagation of light signals
A A = A = t2 c2 2 A = 0 ,
(1.5.54)
In fact, we recall the Maxwell tensor does not determine the vector potential uniquely, namely for any gauge transformation of the form
\[ A_\mu \to A'_\mu = A_\mu + \partial_\mu \chi \quad \Rightarrow \quad F_{\mu\nu} \to F'_{\mu\nu} = F_{\mu\nu} , \tag{1.5.55} \]
which follows from the skew-symmetry of $F_{\mu\nu}$ and the fact that partial derivatives commute when applied to a scalar $\chi$. This means that the independent components of $A^\mu$ (or the number of physical degrees of freedom of the electromagnetic field) are not four, but no more than three.
In particular, the Lorentz gauge is determined by a scalar $\chi$ which satisfies (see footnote 12)
\[ \partial_\mu A'^\mu = 0 \quad \Rightarrow \quad \Box \chi = -\partial_\mu A^\mu . \tag{1.5.56} \]
However, this gauge choice does not fix $A^\mu$ completely, since one can always further change the potential as
\[ A'_\mu \to A''_\mu = A'_\mu + \partial_\mu \tilde\chi \quad \text{with} \quad \Box \tilde\chi = 0 , \tag{1.5.57} \]
and the new $A''_\mu$ will still satisfy $\partial_\mu A''^\mu = 0$. It can in fact be shown that $A^\mu$ indeed contains only two independent components (degrees of freedom): the two independent polarizations of light.
Scalars and charge conservation
There are a few things we can learn from the covariant formalism: for example, there is a scalar
quantity
(1.5.58)
F F = 2 B 2 E 2 ,
and a pseudo-scalar
13
quantity
~ E
~ ,
F F = 8 B
(1.5.59)
(1.5.60)
which follows from F = F and implies charge conservation. In fact, in the rest frame of the
charge, this becomes
J = (, 0, 0, 0)
J = t = 0 .
(1.5.61)
P = t m0 = 0 ,
(1.5.62)
which shows that four-momentum conservation is the relativistic consequence of proper mass conservation, exactly like electric current conservation follows from the invariance of the electric charge.
12 Second order partial differential equations always admit a local solution, which proves that the gauge scalar exists for all $A^\mu$ and (at least piecewise) all space-time points.
13 Pseudo-scalars change sign under spatial reflections, $\vec x \to -\vec x$, unlike true scalars, which do not.
1.5.5
Let us conclude this chapter about Special Relativity with a few important observations. One
reason that led us to Special Relativity was precisely the fact that electromagnetism is not Galilean
invariant, which consequently led us to discard interactions at a distance and conservative forces
derived from a potential. Now, we have just seen that the electromagnetic field $F_{\mu\nu}$ can be derived
from a four-potential, and the question comes immediately to mind how this can be consistent.
The fact is that the electromagnetic field does not entail instantaneous interactions at a distance:
changes in the state of sourcing charges generate perturbations in the field which travel at the
speed of light in vacuum [see Eq. (1.5.54)] before affecting test charges.
In the modern view of the physical world, everything is indeed represented by fields, even
matter. Questioning the nature of light (recall the aether theories, and the idea that light is made
of waves in a medium) is therefore the same as wondering what matter is made of. In a sense, they
are both just real.
A more concrete question however comes to mind. If real interactions may (and in fact can) be represented by mediating fields, what is the gravitational field? If it can be derived from a potential like electromagnetism, what is the gravitational potential? And what are the gravitational analogues of electromagnetic waves (light)? Before addressing these questions, we will need to take a rather long detour into mathematics and geometry.
Chapter 2
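2.1 Abstract groups
Mathematical group: let $G$ be a set of objects for which a binary operation $\cdot$ is defined (we shall mostly use multiplicative notation)
\[ \forall\, g_1 , g_2 \in G , \quad \exists\, g_3 \in G : \quad g_1 \cdot g_2 = g_3 . \tag{2.1.1} \]
$G$ is a group if:
1) $\cdot$ is associative
\[ (g_1 \cdot g_2) \cdot g_3 = g_1 \cdot (g_2 \cdot g_3) = g_1 \cdot g_2 \cdot g_3 ; \tag{2.1.2} \]
2) there exists an identity element $I \in G$ such that
\[ I \cdot g = g \cdot I = g , \quad \forall\, g \in G ; \tag{2.1.3} \]
3) each element $g \in G$ admits an inverse
\[ \exists\, g^{-1} \in G : \quad g \cdot g^{-1} = g^{-1} \cdot g = I . \tag{2.1.4} \]
The group is called Abelian if the operation is also commutative,
\[ g_1 \cdot g_2 = g_2 \cdot g_1 , \quad \forall\, g_1 , g_2 \in G . \tag{2.1.5} \]
In additive notation, the same definition reads
\[ \exists\, g_3 \in G : \quad g_1 + g_2 = g_3 . \tag{2.1.6} \]
1) + is associative
\[ (g_1 + g_2) + g_3 = g_1 + (g_2 + g_3) = g_1 + g_2 + g_3 ; \tag{2.1.7} \]
2) there exists a zero element $0 \in G$ such that
\[ 0 + g = g + 0 = g , \quad \forall\, g \in G ; \tag{2.1.8} \]
3) each element $g \in G$ admits an opposite
\[ \exists\, (-g) \in G : \quad g + (-g) = (-g) + g = 0 . \tag{2.1.9} \]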
The above definition has a priori nothing to do with transformations and is therefore more general. For example, the prototype of multiplicative groups is the set of non-zero rational numbers $\mathbb Q \setminus \{0\}$ (zero must be excluded, since it has no multiplicative inverse), whereas the prototype of additive groups is the set of integer numbers $\mathbb Z$. What characterizes a (formal) group is the formal set of elements and the operation between them. If the elements of two groups can be put in correspondence in such a way that the corresponding operations also yield corresponding results, then the two groups are formally the same. On the other hand, the same formal group may be realized in different ways. For example, the formal group $\mathbb Z$ can be realized by any set of elements that can be added (and subtracted) like apples and money (provided we define the negative of an apple by the need of one and the negative of money as a debt).
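For a finite set, the group axioms can be tested by brute force. The sketch below is only an illustration: the function `is_group` and the examples (integers modulo 5 under addition and multiplication) are hypothetical choices, not taken from the notes.

```python
from itertools import product

def is_group(elements, op):
    """Check closure, associativity, identity and inverses on a finite set."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    associative = all(op(op(a, b), c) == op(a, op(b, c))
                      for a, b, c in product(elements, repeat=3))
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not (closed and associative and identities):
        return False
    e = identities[0]
    # Every element needs a two-sided inverse
    return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)

z5_add = is_group(range(5), lambda a, b: (a + b) % 5)             # additive group
z5_mul = is_group(range(5), lambda a, b: (a * b) % 5)             # fails: 0 has no inverse
z5_mul_nonzero = is_group(range(1, 5), lambda a, b: (a * b) % 5)  # group once 0 is removed
```

The multiplicative example mirrors the case of the rational numbers: the axioms hold only after the element without an inverse is excluded.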
2.2
Linear changes of coordinates in a $N$-dimensional space can be represented by matrices with real (or complex) entries. A particular example of multiplicative groups is thus given by square matrices of constant numbers with non-vanishing determinant. Such a group is called $GL(N)$ for General Linear in $N$ dimensions. Note that each matrix is defined by $N \times N$ elements (the determinant condition is an inequality and does not reduce the number of independent elements). The operation defined for matrices is the usual matrix multiplication,
\[ \sum_j A_{ij}\, B_{jk} = C_{ik} . \tag{2.2.1} \]
D(gi ) D(G) ,
(2.2.2)
(2.2.3)
such that
44
where the elements D(g) are usually matrices and act upon vectors V , but one can have different
structures (see below for examples). Notice also that the order of factors is crucial and we used
different symbols for the formal multiplication and multiplication between elements of the realizations, since they are different concepts (in the following we shall instead use the same symbol for
notational simplicity).
Let us also recall that a vector space $\mathcal V$ is a set of objects we can add together and multiply by real (or complex) numbers (scalars):
\[ \forall\, V_1 , V_2 \in \mathcal V \ \text{and} \ a, b \in \mathbb R\, (\mathbb C) \quad \Rightarrow \quad a\, V_1 + b\, V_2 = V_3 \in \mathcal V . \tag{2.2.4} \]
A fundamental property of $\mathcal V$ is that there exists a (finite or infinite) basis of elements $V_i$ which linearly generate all of $\mathcal V$. The number of basis elements is the dimension of the vector space.
Note that by representation of a group one actually means the set $\mathcal V$ of objects $D(g)$ acts upon, where $D(G)$ is a realization of the group $G$. For example, for the Lorentz transformations, the matrices (1.5.21) are the realization of the Lorentz transformation on the vector representation $\mathcal V = \{V^\mu\}$. It is then natural to assume the element obtained by means of the operation (2.2.1) is still a coordinate transformation, since the above mathematical formula simply means we apply the transformation $B$ to a given vector followed by the transformation $A$ on the resulting vector.
Groups can have a finite or infinite number of elements. An example of a finite group is given by the parity transformation,
\[ P : \vec x \to -\vec x , \tag{2.2.5} \]
\[ P^2 = I , \tag{2.2.6} \]
so that the inverse exists and $P = P^{-1}$. The elements of the parity group are therefore just $\{P, I\}$.
Of particular interest are the infinite groups with a finite number of generators, also known as Lie groups. Roughly speaking, a Lie group is a group whose elements can be continuously parametrized by a finite set of real variables $\alpha_i \in \mathbb R$, with $i = 1, 2, \ldots, d$, in such a way that one can write any element of $G$ (or, equivalently, of a realisation of $G$) in the exponentiated form
\[ D(g) \equiv g = e^{\sum_{i=1}^d \alpha_i J_i} = \sum_{n=0}^\infty \frac{\left( \sum_{i=1}^d \alpha_i J_i \right)^n}{n!} , \tag{2.2.7} \]
where $J_i$ are the generators of $G$. It follows that the set $\mathcal G = \{j\}$ of all linear combinations of the $J_i$'s must be endowed with three operations in order to recover the group multiplication and make sense of Eq. (2.2.7):
i) an associative operation $+$ between elements $j \in \mathcal G$;
ii) a multiplicative operation $\cdot$ (usually omitted) by (real or complex) scalars and
iii) a multiplicative operation $\circ$ between elements $j \in \mathcal G$.
This means that $(\mathcal G, +, \cdot, \circ)$ form an algebra (the Lie algebra of the group $G$): $(\mathcal G, +, \cdot)$ is a (real or complex) vector space and $(\mathcal G, \circ)$ is a group. Further, multiplication $\circ$ and addition $+$ are mutually compatible, meaning they satisfy the distribution property
\[ j_1 \circ (j_2 + j_3) = j_1 \circ j_2 + j_1 \circ j_3 , \quad \forall\, j_i \in \mathcal G . \tag{2.2.8} \]
iff
[A, B] 6= 0 ,
(2.2.9)
where
[A, B] = A B B A ,
(2.2.10)
D(g1 ) D(g2 ) = e
(1)
i J i
(2)
j J j
P
(1)
i i J i + j
(2)
j J j +
ij
(1) (2)
i j [Ji ,Jj ]
(2.2.11)
This means that, if the Lie group is not Abelian, $J_i\, J_j \neq J_j\, J_i$ (otherwise the algebra is also called Abelian), one must have
\[ J_i\, J_j - J_j\, J_i = [J_i, J_j] = \sum_k c_{ij}{}^k\, J_k , \tag{2.2.12} \]
where $c_{ij}{}^k$ are the structure constants of the algebra, otherwise the product of two (or more) $J_i$'s would not belong to the algebra, nor would the product on the left hand side of Eq. (2.2.11) belong to the (realisation of the) group $G$. In fact, the above commutation relations imply that products of elements of $\mathcal G$ are linear combinations of the generators, as required by the fact $\mathcal G$ is a vector space generated by the $J_i$. The number $d$ of (linearly independent) $J_i$ is called the dimension of the Lie algebra and group. Eq. (2.2.12) uniquely specifies a Lie algebra, meaning that two algebras with the same commutator structure are the same mathematical object.
Two groups $G_1$ and $G_2$ whose algebras are equal are the same group (at least) near the identity (or locally). However, the two groups can be globally different. For example, the parameters $\alpha_i$ may have different ranges, like $\alpha_i \in \mathbb R$ for $G_1$ and $|\alpha_i| < 1$ for $G_2$ (which is then a compact group; this concept requires the notion of manifold and will be clarified later in the course).
Another important concept is that of irreducible representations. A representation of a group $G$ is said to be irreducible if the corresponding realization $D(G)$ cannot be put in block diagonal form,
\[ D(g) = \begin{pmatrix} D_1(g) & 0 & 0 & \ldots \\ 0 & D_2(g) & 0 & \ldots \\ & \ldots & & \end{pmatrix} . \tag{2.2.13} \]
Note that if such a block diagonal form exists, since each $D_i(G)$ is a realization of $G$, the corresponding representation must be given by a vector space that is the cartesian product of separate vector spaces
\[ \mathcal V = \mathcal V_1 \times \mathcal V_2 \times \ldots , \tag{2.2.14} \]
so that the $D_i$ act on elements of $\mathcal V_i$. An important result is that all the representations of a group can be built out of irreducible ones.
Before considering the Lorentz group, let us see the simpler case of rotations.
2.3 Rotations in N dimensions
$$ R^T R = R^T I\, R = I = \begin{pmatrix} 1 & 0 & 0 & \ldots \\ 0 & 1 & 0 & \ldots \\ \vdots & & & \end{pmatrix} , \tag{2.3.1} $$
$$ x_i\, y^i = \sum_{i=1}^{N} x^i\, y^i , \tag{2.3.2} $$
and
$$ 1 = \det(R^T R) = (\det R)^2 \quad \Rightarrow \quad \det R = \pm 1 . \tag{2.3.3} $$
This group is therefore denoted by O(N ) for Orthogonal in N dimensions. A particular case is
given by orthogonal matrices with positive unit determinant,
det(R) = 1 ,
(2.3.4)
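which is denoted by $SO(N)$ (Special Orthogonal). From (2.3.1) one obtains (note that $[J_i, J_i] = 0$ and $\alpha_i^T = \alpha_i$)
$$ I = e^{\sum_i \alpha_i J_i^T}\, e^{\sum_k \alpha_k J_k} = e^{\sum_i \alpha_i (J_i^T + J_i)} \quad \Rightarrow \quad J_i^T = -J_i , \tag{2.3.5} $$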
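which tells us the generators $J_i$ are realized by skew-symmetric matrices. From (2.3.4), one likewise obtains
$$ 1 = \det\left( e^{\sum_i \alpha_i J_i} \right) = e^{\sum_i \alpha_i\, \mathrm{tr}(J_i)} \quad \Rightarrow \quad \mathrm{tr}(J_i) = 0 , \tag{2.3.6} $$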
that is, the generators are traceless. Of course, this second condition is not new and in fact follows
from the previous (2.3.5) (but not the other way around), which is reminiscent of the argument in
Eq. (2.3.3).
2.3.1
(2.3.7)
(2.3.8)
(2.3.9)
and we identify
$$ R(2 \pi\, n + \theta) = R(\theta) , \qquad \forall\, n \in \mathbb{N} . \tag{2.3.10} $$
This means the group $SO(2)$ is compact, since its Lie parameter $\theta \in (0, 2\pi)$.
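One can easily check that $(R(\theta), \cdot)$ form a group:
$$ \begin{aligned} R(\theta_1)\, R(\theta_2) &= \begin{pmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{pmatrix} \begin{pmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{pmatrix} \\ &= \begin{pmatrix} \cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2 & -(\cos\theta_1 \sin\theta_2 + \sin\theta_1 \cos\theta_2) \\ \sin\theta_1 \cos\theta_2 + \cos\theta_1 \sin\theta_2 & -\sin\theta_1 \sin\theta_2 + \cos\theta_1 \cos\theta_2 \end{pmatrix} \\ &= \begin{pmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{pmatrix} = R(\theta_1 + \theta_2) . \end{aligned} \tag{2.3.11} $$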
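Further,
$$ I = \begin{pmatrix} \cos 0 & -\sin 0 \\ \sin 0 & \cos 0 \end{pmatrix} = R(0) , \tag{2.3.12} $$
and
$$ R^{-1}(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^{-1} = \begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix} = R(-\theta) = R^T(\theta) . \tag{2.3.13} $$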
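The group properties (2.3.11)-(2.3.13) can also be cross-checked numerically. The following is a minimal sketch: the helper names (`rot`, `matmul`, `transpose`, `close`) and the sample angles are illustrative choices, and the counter-clockwise sign convention for $R(\theta)$ is an assumption.

```python
import math

def rot(theta):
    """2x2 rotation matrix R(theta); counter-clockwise convention (assumed)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices up to rounding errors."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

t1, t2 = 0.7, 1.9                      # arbitrary sample angles
identity = [[1.0, 0.0], [0.0, 1.0]]
print(close(matmul(rot(t1), rot(t2)), rot(t1 + t2)))   # closure, Eq. (2.3.11)
print(close(rot(0.0), identity))                        # identity, Eq. (2.3.12)
print(close(matmul(rot(t1), rot(-t1)), identity))       # inverse
print(close(rot(-t1), transpose(rot(t1))))              # R^{-1} = R^T, Eq. (2.3.13)
```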
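The matrices $R$ cannot be put in block diagonal form: this means the 2-vectors $V^i \in \mathbb{R}^2$ are an irreducible representation of $SO(2)$. All other representations can be built from it. For example,
$$ \underbrace{T^{ij}}_{2 \otimes 2 = 4} = V^i W^j = \underbrace{\frac{1}{2}\, V^k W_k\, \delta^{ij}}_{S} + \underbrace{\frac{1}{2} \left( V^i W^j + V^j W^i - V^k W_k\, \delta^{ij} \right)}_{T^{(ij)}} + \underbrace{\frac{1}{2} \left( V^i W^j - V^j W^i \right)}_{T^{[ij]}} = S + T^{(ij)} + T^{[ij]} \tag{2.3.14} $$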
is the (2, 0)-tensor representation of $SO(2)$, which one can show reduces to a combination of one scalar (the trace), the traceless symmetric and skew-symmetric parts. Note that $T^{ij}$ contains 4 free entries, $S$ has 1, $T^{(ij)}$ has 2 and is therefore a 2-vector, $T^{[ij]}$ has 1 and is a (pseudo)-scalar, for a total of 4. Suppose we arrange the 4 elements of $T^{ij}$ into a vector $V^A$ with $A = 1, \ldots, 4$. The action of $SO(2)$ on such vectors would be given by a $4 \times 4$ matrix with block diagonal form
$$ M^A{}_B\, V^B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & R \end{pmatrix} \begin{pmatrix} S \\ T^{[ij]} \\ T^{(ij)} \end{pmatrix} , \tag{2.3.15} $$
where the matrices $M^A{}_B$ therefore realize $SO(2)$ and the vectors $V^A$ represent it.
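Clearly, there is only one parameter in $SO(2)$, the angle of rotation $\theta$, and the group is therefore one-dimensional and Abelian. One can see this by determining the one generator $J$ of the algebra $so(2)$ by means of a Taylor expansion of (2.3.8) about $\theta = 0$,
$$ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} - \frac{\theta^2}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \ldots = \sum_n \frac{\theta^n}{n!}\, J^n , \tag{2.3.16} $$
where
$$ J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \qquad \text{with} \qquad J^2 = -I . \tag{2.3.17} $$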
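The series in Eq. (2.3.16) can be summed numerically to see that the single generator $J$ does reproduce a finite rotation. This is only a sketch: the function name `exp_series`, the truncation at 30 terms and the sign convention of $J$ are assumptions made for illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(theta, terms=30):
    """Truncated sum of theta^n / n! * J^n with J = [[0, -1], [1, 0]]."""
    J = [[0.0, -1.0], [1.0, 0.0]]
    total = [[1.0, 0.0], [0.0, 1.0]]      # n = 0 term (identity)
    power = [[1.0, 0.0], [0.0, 1.0]]      # running value of J^n
    for n in range(1, terms):
        power = matmul(power, J)
        coeff = theta ** n / math.factorial(n)
        total = [[total[i][j] + coeff * power[i][j] for j in range(2)]
                 for i in range(2)]
    return total

theta = 0.5
R = exp_series(theta)
print(R)
print([[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]])
```

The two printed matrices should agree up to rounding, which is the content of Eq. (2.3.16).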
R,
(2.3.18)
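$$ z_1\, z_2 = e^{\theta_1 i}\, e^{\theta_2 i} = e^{(\theta_1 + \theta_2)\, i} , \tag{2.3.19} $$
$$ z^*\, z = 1 . \tag{2.3.20} $$
To this group one can associate a formal generator $J = i$ with the property that
$$ z^{-1} z = z^* z = e^{-\theta\, i + \theta\, i} = 1 \quad \Rightarrow \quad i^* = -i \ \ \text{and} \ \ i^2 = -1 . \tag{2.3.21} $$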
There is therefore a mathematical equivalence between the algebras $so(2)$ and $u(1)$ given by interpreting $\theta$ as an angle. Note that this realization of the group $U(1)$ does not naturally involve operators acting on any vector space (footnote 1) and does not have a naive representation (in classical physics!), unlike the group $SO(2)$ which is realized by matrix transformations and is represented by 2-vectors.
2.3.2
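$$ R_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_1 & -\sin\theta_1 \\ 0 & \sin\theta_1 & \cos\theta_1 \end{pmatrix} , \quad R_2 = \begin{pmatrix} \cos\theta_2 & 0 & \sin\theta_2 \\ 0 & 1 & 0 \\ -\sin\theta_2 & 0 & \cos\theta_2 \end{pmatrix} , \quad R_3 = \begin{pmatrix} \cos\theta_3 & -\sin\theta_3 & 0 \\ \sin\theta_3 & \cos\theta_3 & 0 \\ 0 & 0 & 1 \end{pmatrix} , \tag{2.3.22} $$
$$ R_i(2 \pi\, n + \theta) = R_i(\theta) , \tag{2.3.23} $$
which shows that $SO(3)$ is also compact. The corresponding fundamental representation is given by 3-vectors $V^i \in \mathbb{R}^3$. Since the above rotation matrices cannot be simultaneously put in block diagonal form, $\mathbb{R}^3$ is the fundamental irreducible representation of $SO(3)$.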
All other representations can be built out of vectors, like for SO(2), and they are in general
reducible. For example, the product of two vectors,
$$ \underbrace{T^{ij}}_{3 \otimes 3 = 9} = V^i W^j = \underbrace{S}_{1} + \underbrace{T^{(ij)}}_{5} + \underbrace{T^{[ij]}}_{3} \tag{2.3.24} $$
breaks into a scalar $S$ (the trace), a pseudo-vector $V$ (the skew-symmetric part $T^{[ij]}$) and an irreducible (2, 0) tensor $Q$ (the traceless symmetric part $T^{(ij)}$). Note that $\dim T = 9$, $\dim S = 1$, $\dim V = 3$ and $\dim Q = 5$, with $1 + 3 + 5 = 9$. A representation of such tensors as 9-dimensional vectors $V$ then requires a realization of $SO(3)$ by means of $9 \times 9$ matrices $M$ of the following block diagonal form
$$ M^A{}_B\, V^B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & R^{(3)} & 0 \\ 0 & 0 & R^{(5)} \end{pmatrix} \begin{pmatrix} S \\ V \\ Q \end{pmatrix} , \tag{2.3.25} $$
where $R^{(3)}$ is a usual rotation in 3 dimensions and $R^{(5)}$ a suitable $5 \times 5$ matrix. Again, such examples of different realizations of $SO(3)$ prove the necessity of distinguishing between a formal group and its realizations.
Footnote 1: Of course, one can consider the product $z\, v \in \mathbb{C}$ as the action of $z \in U(1)$ on the vector space $\mathbb{C}$.
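The counting $9 = 1 + 3 + 5$ can be made concrete by splitting the outer product of two sample 3-vectors into its three parts. The sketch below is illustrative only: the function name `decompose` and the sample vectors are arbitrary, and the trace part is normalised with $1/n$ so that the symmetric part is traceless in $n$ dimensions.

```python
def decompose(V, W):
    """Split T^{ij} = V^i W^j into trace, traceless symmetric and skew parts."""
    n = len(V)
    T = [[V[i] * W[j] for j in range(n)] for i in range(n)]
    trace = sum(T[k][k] for k in range(n))
    # scalar part: (trace / n) times the identity
    S = [[trace / n if i == j else 0.0 for j in range(n)] for i in range(n)]
    # traceless symmetric part: n(n+1)/2 - 1 independent entries (5 for n = 3)
    Q = [[0.5 * (T[i][j] + T[j][i]) - S[i][j] for j in range(n)] for i in range(n)]
    # skew-symmetric part: n(n-1)/2 independent entries (3 for n = 3)
    A = [[0.5 * (T[i][j] - T[j][i]) for j in range(n)] for i in range(n)]
    return T, S, Q, A

V, W = (1.0, 2.0, 3.0), (4.0, -1.0, 2.0)   # arbitrary sample vectors
T, S, Q, A = decompose(V, W)
# The skew part carries the cross product: 2 A[1][2] is the first component of V x W.
print(2 * A[1][2], V[1] * W[2] - V[2] * W[1])
```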
Irreducible representations of SO(3) are identified by an integer number s = 0, 1, . . ., with s = 0
for the trivial scalar representation (I = 1 acting on elements of R) and s = 1 for the (fundamental)
vector representation. The integer s can then be related with the angular momentum of a spinning
body, the corresponding realization of SO(3) being given by the operators that generate rotations.
In particular, an object with $s = 0$ will always look the same regardless of the amount and direction of rotation; an object with $s = 1$ will appear the same after a rotation of $\theta_i = 2\pi$ around the $i$th axis; an object with $s = 2$ needs a rotation of $\theta_i = \pi$. In general, an object with a given $s \in \mathbb{N}$ requires a rotation of $\theta_i = 2\pi/s$ to return to the initial configuration.
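From the above matrices (2.3.22) one obtains the (skew-symmetric and traceless) generators of $so(3)$, namely
$$ J_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} , \quad J_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} , \quad J_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} , \tag{2.3.26} $$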
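with commutators (from now on, we shall employ Einstein summation convention on repeated indices)
$$ [J_i, J_j] = \epsilon_{ij}{}^{k}\, J_k , \tag{2.3.27} $$
where $\epsilon_{ijk}$ is the Levi-Civita symbol. For example,
$$ J_1 J_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{2.3.28} $$
and
$$ J_2 J_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} , \tag{2.3.29} $$
so that
$$ [J_1, J_2] = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = J_3 . \tag{2.3.30} $$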
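The remaining commutators in Eq. (2.3.27) can be checked in the same way by brute force. The following sketch uses exact integer arithmetic; the helper names and the zero-based indexing of the generators are illustrative choices, and the matrices assume the sign convention $(J_i)_{jk} = -\epsilon_{ijk}$.

```python
# Generators of so(3) with (J_i)_{jk} = -eps_{ijk}; list index 0, 1, 2 <-> J_1, J_2, J_3.
J = [
    [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
    [[0, -1, 0], [1, 0, 0], [0, 0, 0]],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

def levi_civita(i, j, k):
    """Levi-Civita symbol for three indices taken from 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def rhs(i, j):
    """Right-hand side eps_{ij}^k J_k of Eq. (2.3.27), summed over k."""
    return [[sum(levi_civita(i, j, k) * J[k][r][c] for k in range(3))
             for c in range(3)] for r in range(3)]

ok = all(commutator(J[i], J[j]) == rhs(i, j) for i in range(3) for j in range(3))
print(ok)
```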
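The same algebra (2.3.27) holds for the generators of $SU(2)$, the group of 2-dimensional unitary matrices
$$ U^\dagger\, U = I , \tag{2.3.31} $$
with unit determinant, which can be written as
$$ U = e^{i\, \alpha_i \sigma_i / 2} . \tag{2.3.32} $$
From (2.3.31) one obtains
$$ I = e^{-i\, \alpha_i \sigma_i^\dagger / 2}\, e^{i\, \alpha_i \sigma_i / 2} = e^{i\, \alpha_i (-\sigma_i^\dagger + \sigma_i)/2} \quad \Rightarrow \quad (\sigma_i^T)^* = \sigma_i^\dagger = \sigma_i , \tag{2.3.33} $$
and
$$ 1 = \det(U) = e^{i\, \alpha_i\, \mathrm{tr}(\sigma_i)/2} \quad \Rightarrow \quad \mathrm{tr}(\sigma_i) = 0 , \tag{2.3.34} $$
which defines the Pauli matrices as the traceless Hermitian generators of $su(2)$,
$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} , \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} , \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} . \tag{2.3.35} $$
Note that the matrices $J_i = -i\, \sigma_i / 2$ satisfy Eq. (2.3.27), namely
$$ [\sigma_i, \sigma_j] = 2\, i\, \epsilon_{ij}{}^{k}\, \sigma_k \quad \Rightarrow \quad \left[ -i\, \frac{\sigma_i}{2} , -i\, \frac{\sigma_j}{2} \right] = \epsilon_{ij}{}^{k} \left( -i\, \frac{\sigma_k}{2} \right) , \tag{2.3.36} $$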
which manifests the correspondence between the algebras su(2) and so(3), meaning one can find
common representations (that is, an equivalence between corresponding representations).
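The statement that $J_i = -i\, \sigma_i / 2$ obey the $so(3)$ algebra can be verified directly with complex $2 \times 2$ matrices. The sketch below is a hedged illustration: the dictionary layout and helper names are arbitrary, and the Pauli matrices are taken in their standard form.

```python
sigma = {
    1: [[0, 1], [1, 0]],
    2: [[0, -1j], [1j, 0]],
    3: [[1, 0], [0, -1]],
}
# Candidate generators J_i = -i sigma_i / 2
Jgen = {i: [[-0.5j * x for x in row] for row in s] for i, s in sigma.items()}

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices taken from 1, 2, 3."""
    return (i - j) * (j - k) * (k - i) // 2

def algebra_holds(gens, tol=1e-12):
    """Check [G_i, G_j] = eps_{ij}^k G_k for all pairs of generators."""
    for i in gens:
        for j in gens:
            lhs = commutator(gens[i], gens[j])
            for r in range(2):
                for c in range(2):
                    rhs = sum(levi_civita(i, j, k) * gens[k][r][c] for k in gens)
                    if abs(lhs[r][c] - rhs) > tol:
                        return False
    return True

print(algebra_holds(Jgen))
# Traceless and Hermitian, as required by Eqs. (2.3.33)-(2.3.34)
print(all(s[0][0] + s[1][1] == 0 for s in sigma.values()))
print(all(s[a][b] == complex(s[b][a]).conjugate()
          for s in sigma.values() for a in range(2) for b in range(2)))
```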
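An explicit construction is the following: let us map 3-vectors $\vec x = (x, y, z)$ to $2 \times 2$ complex matrices (the indices $a, b = 1, 2$ in the following)
$$ \vec x \to h_{ab}(\vec x) = \vec x \cdot \vec\sigma = x\, \sigma_1 + y\, \sigma_2 + z\, \sigma_3 = \begin{pmatrix} z & x - i\, y \\ x + i\, y & -z \end{pmatrix} . \tag{2.3.37} $$
For transformations belonging to $SU(2)$, this is a particular (0, 2) tensor, which must transform according to
$$ h'_{ab} = U_a{}^{c}(\alpha_i)\, U_b{}^{d}(\alpha_i)\, h_{cd} \quad \Leftrightarrow \quad h' = U^T\, h\, U , \tag{2.3.38} $$
where $U$ is given in Eq. (2.3.32) and one can then check that $h'$ is equivalent to
$$ \vec x\,' = R_i(\alpha_i)\, \vec x , \tag{2.3.39} $$
since $\vec x\,' \to h'$.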
The fundamental representation of SU (2) is given by 2-dimensional complex vectors (called
spinors),
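$$ \psi = (z_1, z_2) , \qquad z_a \in \mathbb{C} . \tag{2.3.40} $$
We will see the role such vectors play in modern physics later. For now, let us just note that
$$ \sigma_i^2 = 1 , \qquad i = 1, 2, 3 , \tag{2.3.41} $$
$$ \begin{pmatrix} e^{+i \pi} & 0 \\ 0 & e^{-i \pi} \end{pmatrix} = -1 . \tag{2.3.43} $$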
In fact, in order to rotate a spinor (2.3.40) back to its initial value, we need to go around the axis twice, $\alpha = 4\pi$ (unlike a vector!).
Irreducible representations of $SU(2)$ are identified by one parameter $s$ called the spin, which can only take (half-)integer values. The physical meaning of the (non-zero) spin is that $1/s$ equals the number of complete ($2\pi$) rotations needed to map the object into itself. Spin 1/2 objects require 2 full rotations, spin 1 objects are vectors and require 1 full rotation, (2, 0) tensors are spin 2 objects and require 1/2 a full rotation (rotation of an angle $\alpha = \pi$).
Before moving on to the Lorentz group, let us summarize our findings for SO(3) and SU (2):
$SO(3)$ irreducible representations: $s = 0, 1, 2, 3, \ldots$ ; (2.3.44)
$SU(2)$ irreducible representations: $s = 0, \frac{1}{2}, 1, \frac{3}{2}, 2, \ldots$ . (2.3.45)
We have explicitly seen that irreducible representations of the two groups with s = 1 can be put
in correspondence by constructing the (traceless) 2-tensor (2.3.37) of SU (2) which is equivalent to
an SO(3) vector. Since the latter are the fundamental representation of SO(3), it follows that each
irreducible representation of SO(3) is equivalent to the irreducible representation of SU (2) with
the same integer s. Of course, the other way round does not hold, since there is no equivalent of
(half-integer) spinors in SO(3).
In general, both for SO(3) and SU (2), the dimension of the representation is given by
d = 2s + 1 ,
(2.3.46)
as can be easily checked for scalars (s = 0, d = 1), spinors (s = 1/2, d = 2), vectors (s = 1,
d = 3), etc. One also has the following composition rule for the tensor product of two irreducible
representations (which generalizes the cases of the product of two vectors and two spinors we have
seen before):
$$ (s) \otimes (s') = (s - s') + (s - s' + 1) + \ldots + (s + s') , \tag{2.3.47} $$
where $(s)$ denotes the irreducible representation of spin $s$ and we assumed $s \geq s'$. We can easily check the above formula for the cases we saw before:
(2.3.48)
(2.3.49)
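A simple consistency check of the composition rule (2.3.47) is that dimensions must add up: the sum of $2 s'' + 1$ over the spins on the right-hand side has to equal $(2 s + 1)(2 s' + 1)$. The sketch below performs this bookkeeping with exact fractions; the function names `dim` and `compose` are illustrative, and `abs` is used so that the order of the two spins does not matter.

```python
from fractions import Fraction

def dim(s):
    """Dimension d = 2s + 1 of the spin-s representation, Eq. (2.3.46)."""
    return int(2 * s + 1)

def compose(s1, s2):
    """Spins appearing in (s1) x (s2): |s1 - s2|, |s1 - s2| + 1, ..., s1 + s2."""
    spins = []
    s = abs(s1 - s2)
    while s <= s1 + s2:
        spins.append(s)
        s += 1
    return spins

half = Fraction(1, 2)
for a, b in [(half, half), (Fraction(1), Fraction(1)), (Fraction(3, 2), Fraction(1))]:
    parts = compose(a, b)
    print(a, b, [str(s) for s in parts],
          sum(dim(s) for s in parts) == dim(a) * dim(b))
```

For two spinors this gives spins 0 and 1 (dimensions $1 + 3 = 4$), and for two vectors spins 0, 1 and 2 (dimensions $1 + 3 + 5 = 9$), in line with the decompositions discussed above.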
2.4
We are now ready to study the Lorentz group of matrices $\Lambda$ which satisfy
$$ \Lambda^T\, \eta\, \Lambda = \eta , \tag{2.4.1} $$
where the Minkowski metric tensor $\eta$ has replaced $I$ for the rotation group and is symbolized by the notation $SO(4) \to SO(3, 1)$. We shall restrict ourselves to the so-called proper orthochronous subgroup, which consists of matrices that also satisfy
$$ \Lambda^0{}_0 \geq 1 , \qquad \det \Lambda = 1 , \tag{2.4.2} $$
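$$ \gamma = \frac{1}{\sqrt{1 - \beta^2}} = \cosh\phi , \qquad \beta\, \gamma = \sinh\phi , \tag{2.4.3} $$
$$ B_1 = \begin{pmatrix} \cosh\phi_1 & \sinh\phi_1 & 0 & 0 \\ \sinh\phi_1 & \cosh\phi_1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad B_2 = \begin{pmatrix} \cosh\phi_2 & 0 & \sinh\phi_2 & 0 \\ 0 & 1 & 0 & 0 \\ \sinh\phi_2 & 0 & \cosh\phi_2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \quad B_3 = \begin{pmatrix} \cosh\phi_3 & 0 & 0 & \sinh\phi_3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \sinh\phi_3 & 0 & 0 & \cosh\phi_3 \end{pmatrix} , \tag{2.4.4} $$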
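acting on 4-vectors $V^\mu \in \mathbb{R}^4$. The above matrices show a remarkable similarity with rotation matrices in 3 dimensions, except that the angles of rotation are imaginary ($\sin \to \sinh$ and $\cos \to \cosh$). In fact, the above matrices satisfy the defining equations
$$ \cosh^2\phi - \sinh^2\phi = 1 \quad \Rightarrow \quad B\, \eta\, B^T = \eta , \qquad \det B = 1 , \tag{2.4.5} $$
and the same holds for the rotations
$$ R_i = \begin{bmatrix} 1 & 0 \\ 0 & R_i^{(3)} \end{bmatrix} , \qquad R_i^{(3)} \in SO(3) . \tag{2.4.6} $$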
The set {Bi , Ri } therefore generates a realization of the Lorentz group SO(3, 1). Further, since
not all of them can be put in block diagonal form simultaneously, the 4-vectors are an irreducible
representation of SO(3, 1).
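The defining relations (2.4.5) can be tested numerically for boosts along each axis. In the sketch below the metric is taken as $\eta = \mathrm{diag}(-1, 1, 1, 1)$ and the boosts are built with positive off-diagonal entries; both are assumptions about conventions, and the check gives the same answer for the opposite choices. The helper names and sample rapidities are illustrative.

```python
import math

ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]  # assumed signature

def boost(axis, phi):
    """Boost with rapidity phi along spatial axis 1, 2 or 3."""
    b = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b[0][0] = b[axis][axis] = math.cosh(phi)
    b[0][axis] = b[axis][0] = math.sinh(phi)
    return b

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * entry * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, entry in enumerate(m[0]))

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

for axis in (1, 2, 3):
    B = boost(axis, 0.8)
    print(axis, close(matmul(matmul(B, ETA), transpose(B)), ETA), abs(det(B) - 1) < 1e-12)

# Rapidities add for boosts along the same axis, like angles for rotations.
print(close(matmul(boost(1, 0.3), boost(1, 0.5)), boost(1, 0.8)))
```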
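One can immediately obtain the corresponding generators, that is for the rotations
$$ J_i = \begin{bmatrix} 0 & 0 \\ 0 & J_i^{(3)} \end{bmatrix} , \qquad J_i^{(3)} \in so(3) , \tag{2.4.7} $$
and for the boosts
$$ K_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , \quad K_2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , \quad K_3 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} . \tag{2.4.8} $$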
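The generators satisfy the commutation relations
$$ [J_i, J_j] = \epsilon_{ij}{}^{k}\, J_k , \qquad [K_i, K_j] = -\epsilon_{ij}{}^{k}\, J_k , \qquad [J_i, K_j] = \epsilon_{ij}{}^{k}\, K_k . \tag{2.4.9} $$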
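However, if we define
$$ A_i = \frac{1}{2} (J_i + i\, K_i) , \qquad B_i = \frac{1}{2} (J_i - i\, K_i) , \tag{2.4.10} $$
one finds
$$ \begin{aligned} [A_i, B_j] &= \frac{1}{4} \left( [J_i, J_j] - i\, [J_i, K_j] + i\, [K_i, J_j] + [K_i, K_j] \right) \\ &= \frac{1}{4} \left( \epsilon_{ij}{}^{k} J_k - i\, \epsilon_{ij}{}^{k} K_k - i\, \epsilon_{ji}{}^{k} K_k - \epsilon_{ij}{}^{k} J_k \right) = 0 , \end{aligned} \tag{2.4.11} $$
as well as
$$ [A_i, A_j] = \epsilon_{ij}{}^{k}\, A_k , \qquad [B_i, B_j] = \epsilon_{ij}{}^{k}\, B_k . \tag{2.4.12} $$
The two sets of generators therefore belong to two copies of $su(2)$ and we conclude that the Lie algebra of the Lorentz group $so(3, 1) \simeq su(2) \oplus su(2)$.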
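The decoupling of the Lorentz algebra into two commuting copies of $su(2)$ can be checked by explicit matrix multiplication. The sketch below builds $4 \times 4$ generators under the assumptions $(J_i)_{jk} = -\epsilon_{ijk}$ on the spatial block and symmetric boost generators $K_i$ with unit entries in positions $(0, i)$ and $(i, 0)$; all helper names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def comb(a, b, ca=1, cb=1):
    """Linear combination ca*a + cb*b of two matrices."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    return comb(matmul(a, b), matmul(b, a), 1, -1)

def eps(i, j, k):
    """Levi-Civita symbol for indices taken from 1, 2, 3."""
    return (i - j) * (j - k) * (k - i) // 2

def is_zero(m, tol=1e-12):
    return all(abs(x) < tol for row in m for x in row)

idx = (1, 2, 3)
J = {i: [[-eps(i, j, k) if j and k else 0 for k in range(4)] for j in range(4)]
     for i in idx}
K = {i: [[1 if {j, k} == {0, i} else 0 for k in range(4)] for j in range(4)]
     for i in idx}
A = {i: comb(J[i], K[i], 0.5, 0.5j) for i in idx}    # A_i = (J_i + i K_i) / 2
B = {i: comb(J[i], K[i], 0.5, -0.5j) for i in idx}   # B_i = (J_i - i K_i) / 2

def structure_ok(X, Y, Z, sign=1):
    """Check [X_i, Y_j] = sign * eps_{ij}^k Z_k for all i, j."""
    for i in idx:
        for j in idx:
            rhs = [[sum(sign * eps(i, j, k) * Z[k][r][c] for k in idx)
                    for c in range(4)] for r in range(4)]
            if not is_zero(comb(comm(X[i], Y[j]), rhs, 1, -1)):
                return False
    return True

print(structure_ok(J, J, J), structure_ok(K, K, J, -1), structure_ok(J, K, K))  # Eq. (2.4.9)
print(all(is_zero(comm(A[i], B[j])) for i in idx for j in idx))                 # Eq. (2.4.11)
print(structure_ok(A, A, A), structure_ok(B, B, B))                             # Eq. (2.4.12)
```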
2.4.1
The representations of $SO(3, 1)$ can thus be obtained by composing the fundamental representations of $SU(2)$. Let us denote with (1/2, 0) the fundamental representation of the $SU(2)$ generated by the $A_i$'s and with (0, 1/2) the fundamental representation of the $SU(2)$ generated by the $B_i$'s. The reason for this notation is that the dimension of both the representations $(s, 0)$ and $(0, s)$ is of course $d = 2 s + 1$. For a generic representation $(s, s')$ the dimension is given by the product
$$ d = (2 s + 1)(2 s' + 1) . \tag{2.4.13} $$
One then finds from the composition rule (2.3.47) that, for example,
$$ (1/2, 0) \otimes (0, 1/2) = (1/2, 1/2) , \tag{2.4.14} $$
$$ (1/2, 0) \otimes (1/2, 0) = (0, 0) + (1, 0) , \tag{2.4.15} $$
where (0, 0) is a scalar ($d = 1$) and (1, 0) a 3-vector ($d = 3$), that is the skew-symmetric part of a (2, 0) tensor.
We shall not go into further details, however a contact with physics is in order. From the physical point of view, one wants fundamental particles to correspond to irreducible representations of $SO(3, 1)$, meaning a given particle appears to be of the same species to all inertial observers.
Thus far we skipped a detail which is now worth bringing up. Irreducible representations of $SO(3, 1)$ [or rather $SU(2) \times SU(2)$] are uniquely identified by two parameters:
1) the mass $m \geq 0$, and
2) the spin $s = 0, 1/2, 1, 3/2, \ldots$
We already saw $s$, so the question is where does $m$ come from. One can justify this second parameter formally by introducing the notion of Casimir operators for the algebra $su(2) \oplus su(2)$. However, we can just give a simpler physical answer to this question: consider for simplicity the vector representation of $SO(3, 1)$. We already know that the Minkowski modulus of a 4-vector is a scalar, so that, for the 4-momentum we have
P P = m2 ,
(2.4.16)
where $m$ here denotes the proper mass. Since 4-momenta with different $m$ are not transformed into each other by Lorentz transformations, it appears natural to consider that $m$ contributes to distinguishing different particles just as $s$ does, and that the corresponding vector fields $V_{(m, s)}$ and $V_{(m', s')}$ are physically distinct (at least before we allow for interactions).
If we trust the mathematical structure that arises from the principle of relativity we therefore expect that there may exist two kinds of particles:
1) the bosons with integer spin and
2) the fermions with half-integer spin.
As a matter of fact, both such kinds do exist. The historical and physical reasons for their names however go beyond the scope of this course.
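2.4.2 Poincaré group: SO(4, 1)
(2.4.18)
In fact,
$$ P^i{}_j\, V^j = \begin{bmatrix} \Lambda^\mu{}_\nu & a^\mu \\ 0 & 1 \end{bmatrix} \begin{bmatrix} V^\nu \\ 1 \end{bmatrix} = \begin{bmatrix} \Lambda^\mu{}_\nu V^\nu + a^\mu \\ 1 \end{bmatrix} . \tag{2.4.19} $$
One can check that the matrix multiplication actually reproduces the expected action on 4-vectors.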
Chapter 3
x + . . . .
y (x ) = x +
x y =x
Since the x can be viewed as (local) vectors under the linear transformation defined by the matrix
y
M=
x y =x
which does not depend on y , there is hope that we can partly recover our previous construction
at least in a local sense.
This is what differential geometry is all about: apply the mathematical machinery of GL(N )
and RN to more general geometric spaces. In the process of introducing it, we will see many things
change and one loses some and gains some.
Suggested bibliography: B. Schutz, Geometrical methods of mathematical physics, Cambridge
Univ. Press (1980). Selected chapters: 1.1-1.6, 2.1-2.9, 2.12-2.17, 2.19-2.30.
3.1
Differentiable manifolds
In a nutshell, a (differential) manifold is a topological space which locally looks like (a portion of)
the n-dimensional Euclidean space Rn .
Brief review of Rn
In the following, we shall assume most of the properties of the set $\mathbb{R}^n$ of real $n$-tuples $x^i$, with $i = 1, \ldots, n$. In particular, $\mathbb{R}^n$ is a topological space with open balls defined by
$$ ||x - y|| < R , \tag{3.1.1} $$
where, for example, one can use the Euclidean norm given by
$$ ||x - y|| = \sqrt{ \sum_{i=1}^{n} (x^i - y^i)^2 } . \tag{3.1.2} $$
It is also a vector space, with vectors defined by the displacements $v^i = x^i - y^i$, upon which one can act with global linear transformations belonging to $GL(n)$. Finally, we shall assume knowledge of $n$-dimensional (real) calculus.
3.1.1
We start by recalling that a topological space is a set of elements (points) in which the notion of contiguity is defined: two elements of the set are contiguous if they both belong to the same open subset (usually referred to as a neighbourhood of those elements). More precisely, given a set $M$ of points, the topological space $(M, \{A_i\})$ is defined by a family of so-called open sets $\{A_i\}$, such that the empty set and $M$ itself belong to $\{A_i\}$, as do an arbitrary union of (a finite or infinite number of) open sets $\cup_i A_i$, and the intersection of a finite number of open sets $\cap_i A_i$. In particular, we shall be concerned with separable (or Hausdorff) topological spaces, in which, for any two arbitrary elements $P$ and $Q$, there always exist disjoint open sets $U \ni P$ and $V \ni Q$ (neighborhoods of $P$ and $Q$, respectively), with $U \cap V = \emptyset$.
A map is in general an application from an open set $A \subseteq M$ to $\mathbb{R}^n$, that is $\phi : A \to \mathbb{R}^n$. Since $M$ is a topological space, the notion of continuity is also defined and means that, if $\phi$ is continuous, it maps any open set $A$ into an open (sub)set of $\mathbb{R}^n$.
1
1
2 = I .
1 f
or
(3.1.4)
In layman's terms, the application $f$ is just a coordinate transformation in $\mathbb{R}^n$ (see Fig. 3.2), and it immediately follows from the above conditions that the dimension $n$ must be the same for all charts of a given $M$. The integer $n$ is therefore called the dimension of the manifold. Moreover, if all the functions $f \in C^p(\mathbb{R}^n)$, we can also say the manifold is $p$-times differentiable. We shall in general assume $f \in C^\infty(\mathbb{R}^n)$, meaning we can differentiate any function as many times as we need.
Mathematically speaking, a manifold is an equivalence class of atlases: two atlases are equivalent if there exists a bijective correspondence between them. This puts on a firm mathematical basis the idea of a geometric space whose properties do not depend on the choice of coordinates. This however does not mean that the concept of manifold and the tools of differential geometry are restricted to sets with a natural geometrical interpretation. For example, the phase space and configuration space of classical mechanics are manifolds; the three parameters (angles) of $O(3)$ form a three-dimensional manifold; vector spaces are manifolds of dimension equal to the number of basis vectors.
In order to prove that a given space is a manifold, it is sufficient to find one atlas which covers it. We shall now try to define atlases for the 2-dimensional sphere $S^2$ and the cone.
Example: the sphere
We can embed the sphere $S^2$ in $\mathbb{R}^3$ with coordinates $x$, $y$ and $z$ by imposing the (smooth) condition
$$ f_s = x^2 + y^2 + z^2 - R^2 = 0 , \tag{3.1.5} $$
where $R$ is the radius of $S^2$. We can then cover the sphere by means of four charts $(A_i, \phi_i)$. In particular, we cover the northern and southern (open) hemispheres by mapping them into the open disk $x^2 + y^2 < R^2$ of the plane,
$$ A_1 = \{ x^2 + y^2 + z^2 = R^2 ;\ z > 0 \} , \qquad \phi_1 = (x, y) , \tag{3.1.6} $$
$$ A_2 = \{ x^2 + y^2 + z^2 = R^2 ;\ z < 0 \} , \qquad \phi_2 = (x, y) . \tag{3.1.7} $$
3 = (x, )
(3.1.8)
3 = (x, )
where $0 < z_0 < R$, $\tan(\theta) = y/x$ and $0 < x_0 < R$. This shows that $S^2$ is indeed a manifold.
Note that it is common practice to pretend $A_3$ and $A_4$ can be replaced by one strip with periodic boundary condition, namely
$$ \phi = (z, \theta) , \qquad \theta = \arctan(y/x) , $$
where $-\pi \leq \theta < \pi$. However, this is neither an open nor a closed subset of $\mathbb{R}^2$. This periodic boundary condition is also used to define polar coordinates in $\mathbb{R}^2$, or on the torus. However, for the sake of some more mathematical rigor, the plane $\mathbb{R}^2$ should be covered by two infinite (open) punctured disks $||x - x_1|| > 0$ and $||x - x_2|| > 0$ centered around $x_1 \neq x_2$. It is not uncommon to meet with mathematical difficulties, for example when trying to solve differential equations, if such subtleties are overlooked.
Example: the cone
We first note the cone cannot be smoothly embedded in $\mathbb{R}^3$. For example, one could describe the cone by means of the following condition
$$ f_c = z - a \sqrt{x^2 + y^2} = 0 , \tag{3.1.10} $$
Figure 3.4: Mapping the cone into the plane ( is the deficit angle).
where $a > 0$ is a constant, and $x$ and $y$ may therefore look like valid coordinates on the cone. Although $f_c$ is continuous in the three coordinates $x$, $y$ and $z$, its first derivative with respect to $x$ or $y$ is not smooth at the tip ($x = y = 0$), since it shows a cusp there,
fc
a2 |x|
=
.
x
f
(3.1.11)
3.1.2
Curves
(3.1.12)
(3.1.13)
(3.1.14)
where $\lambda$ is the real parameter which identifies points on the curve and $x^i$ are its coordinates in the given chart. If the $n$ functions $x^i = x^i(\lambda) \in C^p(\mathbb{R})$, then the curve is $p$-differentiable. Note that, according to our definition, a reparameterization of $\lambda$, that is
$$ \mu = \mu(\lambda) , \tag{3.1.15} $$
3.1.3
Functions
f = f (xi ) ,
(3.1.16)
(3.1.17)
3.1.4
We recall a vector in $\mathbb{R}^n$ can be viewed as a displacement (an oriented straight path between two points), but also as the tangent to a curve. The first interpretation is difficult to make sense of on a generic manifold, since displacements involve different points (arbitrarily separated) and the notion of a straight path between them is not necessarily given.
Footnote 3: This will allow us to distinguish two particles following the same path at different speeds.
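for all functions $f$ defined in a neighbourhood of $P$, where $\gamma(\lambda_P) = P$ and $f$ here stands for $f = f(\gamma(\lambda))$. Since in any neighborhood of $P$ there is a chart $\phi$ into $\mathbb{R}^n$, we can also write
$$ \begin{aligned} \vec v(f) &= \left. \frac{d(f \circ \gamma)}{d\lambda} \right|_{\lambda = \lambda_P} = \left. \frac{d}{d\lambda} \left( f \circ \phi^{-1} \circ \phi \circ \gamma \right) \right|_{\lambda = \lambda_P} = \left. \frac{d}{d\lambda} f(x^i(\lambda)) \right|_{\lambda = \lambda_P} \\ &= \left. \sum_i \frac{\partial f}{\partial x^i} \frac{dx^i}{d\lambda} \right|_{\lambda = \lambda_P} = \left. \sum_i \frac{dx^i}{d\lambda} \frac{\partial}{\partial x^i} f \right|_{\lambda = \lambda_P} , \end{aligned} \tag{3.1.19} $$
from which
$$ \vec v(f) = \left. \sum_i \frac{dx^i}{d\lambda} \frac{\partial}{\partial x^i} f \right|_{\lambda = \lambda_P} . \tag{3.1.20} $$
This gives a mathematically precise (and coordinate independent) meaning to the naive notion of a vector as the tangent to $\gamma$ at $P$.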
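The chain-rule identity in Eq. (3.1.19) can be illustrated on a concrete curve and function in $\mathbb{R}^2$. The sketch below compares a numerical derivative of $f$ along the curve with the sum $\sum_i (dx^i/d\lambda)\, \partial f / \partial x^i$; the unit circle, the function $f = x^2 y$ and all names are hypothetical choices made only for this example.

```python
import math

def curve(lam):
    """Sample curve gamma(lambda) in a chart of R^2: the unit circle."""
    return (math.cos(lam), math.sin(lam))

def curve_velocity(lam):
    """Components dx^i / dlambda of the tangent vector."""
    return (-math.sin(lam), math.cos(lam))

def f(x, y):
    """Sample function on the manifold, written in the chart."""
    return x * x * y

def grad_f(x, y):
    """Partial derivatives (df/dx, df/dy), computed by hand."""
    return (2 * x * y, x * x)

def v_of_f_along_curve(lam, h=1e-6):
    """d(f o gamma)/dlambda by a central finite difference."""
    return (f(*curve(lam + h)) - f(*curve(lam - h))) / (2 * h)

def v_of_f_components(lam):
    """sum_i dx^i/dlambda * df/dx^i, the last expression in Eq. (3.1.19)."""
    return sum(v * g for v, g in zip(curve_velocity(lam), grad_f(*curve(lam))))

lam_P = 0.4
print(v_of_f_along_curve(lam_P), v_of_f_components(lam_P))
```

Both numbers represent the same object, the vector $d/d\lambda$ applied to $f$ at $P$; only the second expression refers to the coordinate basis.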
Note that the definition (3.1.18) immediately implies that a vector
$$ \vec v = \frac{d}{d\lambda} , \tag{3.1.21} $$
acts linearly on functions, that is
$$ \frac{d}{d\lambda} (a\, f + b\, g) = a\, \frac{df}{d\lambda} + b\, \frac{dg}{d\lambda} , \qquad \forall\, a, b \in \mathbb{R} , \tag{3.1.22} $$
and for all functions $f$ and $g$ defined in a neighbourhood of the point $P$. In fact, one could define a vector $\vec v$ at a point $P$ as a linear functional that acts on all the functions defined in a neighbourhood of the point $P$, and then prove that there exists a curve $\gamma = \gamma(\lambda)$ such that Eq. (3.1.18) holds using Eq. (3.1.22). However, the latter (equivalent) definition is more formal and does not make clear from the very beginning that we are just generalising the notion of tangent to a curve.
We recall that, in the tensor formalism, we defined vectors as objects with special transformation properties under (certain global) coordinate transformations. It is now easy to see the true geometrical meaning of that definition. By omitting the argument in Eq. (3.1.20), we can write
$$ \vec v = \frac{dx^i}{d\lambda} \frac{\partial}{\partial x^i} = v^i\, \vec e_i , \tag{3.1.23} $$
where
$$ \vec e_i = \frac{\partial}{\partial x^i} , \tag{3.1.24} $$
is a coordinate (basis) vector, that is the vector tangent to the coordinate line defined by constant $x^j$ for $j \neq i$ and passing through $P$. Under a general and local change of coordinates in a neighbourhood of $P$,
$$ y^i = y^i(x^j) , \tag{3.1.25} $$
we then have (take note of the position of the indices)
$$ v^i = \frac{dx^i}{d\lambda} = \frac{\partial x^i}{\partial y^j} \frac{dy^j}{d\lambda} = \frac{\partial x^i}{\partial y^j}\, w^j , \qquad \vec e_i = \frac{\partial}{\partial x^i} = \frac{\partial y^j}{\partial x^i} \frac{\partial}{\partial y^j} = \frac{\partial y^j}{\partial x^i}\, \vec e\,'_j \quad \Rightarrow \quad \vec v\,' = w^j\, \vec e\,'_j = \vec v . \tag{3.1.26} $$
In other words, components transform like we discussed in the tensor formalism (and according to the kind of general linear transformation we studied in Chapter 2), but the actual vector remains the same because the basis also changes (inversely).
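$$ a\, \frac{d}{d\lambda} + b\, \frac{d}{d\mu} = a\, \frac{dx^i}{d\lambda} \frac{\partial}{\partial x^i} + b\, \frac{dx^i}{d\mu} \frac{\partial}{\partial x^i} = \left( a\, \frac{dx^i}{d\lambda} + b\, \frac{dx^i}{d\mu} \right) \vec e_i . \tag{3.1.27} $$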
Since there are n coordinates xi , we can have n families of independent curves and parameters.
This defines a vector space at each point P of a manifold M (see Fig 3.9), called the tangent space
TP . Clearly, at each P , we have TP = Rn .
.
values at the tip (namely 1), which is not allowed. And the same is of course true for y
Vector fields
(3.1.28)
where, as usual, the notation does not distinguish between geometrical vectors and their coordinate
representation.
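can be written as $\vec v = v^i \frac{\partial}{\partial x^i} = v^i\, \vec e_i$. This is a necessary condition, but, in order to prove that these $\vec e_i$ form a basis, we need to show they are also linearly independent. The latter can be proven by recalling that the determinant of the Jacobian matrix for a change of coordinates $y^i = y^i(x^j)$ must not vanish, that is
$$ J = \det \begin{pmatrix} \frac{\partial y^1}{\partial x^1} & \ldots & \frac{\partial y^1}{\partial x^n} \\ \frac{\partial y^2}{\partial x^1} & \ldots & \frac{\partial y^2}{\partial x^n} \\ \vdots & & \vdots \\ \frac{\partial y^n}{\partial x^1} & \ldots & \frac{\partial y^n}{\partial x^n} \end{pmatrix} \neq 0 . \tag{3.1.29} $$
It then follows that the $n$ $n$-tuples of row (or column) entries are linearly independent, and so are the $n$ vectors
$$ \vec e_j = \frac{\partial}{\partial x^j} = \frac{\partial y^i}{\partial x^j} \frac{\partial}{\partial y^i} . \tag{3.1.30} $$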
Since all coordinates $\phi = x^i$ are defined in open sets, the above definition of coordinate basis vectors can be naturally extended to define coordinate basis vector fields in the chart of $\phi$. It is important however to remark that coordinate basis vectors at different points, say $P$ and $Q$, belong to different tangent spaces, $T_P$ and $T_Q$, and cannot be composed linearly, that is operations such as $a\, \vec e_i(P) + b\, \vec e_j(Q)$ are not allowed.
Fiber bundles
Figure 3.11: A simple band (left) and the Moebius strip (right).
The set of all tangent spaces to the points of a manifold together with the base manifold itself is called the tangent bundle $TM$. More precisely, $TM = (M, \{T_P : \forall\, P \in M\})$, where the original manifold $M$ is now called the base manifold, and the tangent spaces $T_P$ are the fibers. One can show that $TM$ is also a manifold and that vector fields can be viewed as sections of $TM$.
An example of non-trivial tangent bundle is given by a closed band. Locally (at each point $P$ of the band), the tangent bundle is simply given by $\mathbb{R}^2 \times \mathbb{R}^2$. However, the global tangent bundle is not necessarily the direct product of two manifolds: consider the Moebius strip obtained by cutting the band and twisting the edges before pasting them again (see Fig. 3.11). One therefore needs to travel twice along the strip in order to come back to the starting point. Spinors belong to such manifolds.
3.1.5
(3.1.31)
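$$ \frac{d}{d\lambda} = \vec Y , \qquad \gamma(\lambda_0) = P_0 . \tag{3.1.32} $$
$$ \frac{dx^i(\lambda)}{d\lambda} = Y^i(x^j(\lambda)) , \qquad x^i(\lambda_0) = x^i(P_0) , \tag{3.1.33} $$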
(3.1.34)
Theorems of calculus ensure that the problem (3.1.33) always admits one solution $x^i = x^i(\lambda)$ in a (sufficiently small) neighbourhood of the point $P_0$, therefore integral curves of a vector field $\vec Y$ always exist locally.
Exponential map
The formal solution to the $n$ first order differential equations and initial conditions given in Eq. (3.1.33) can be written as
$$ x^i(\lambda) = \left. e^{(\lambda - \lambda_0)\, \vec Y}\, x^i \right|_{\lambda = \lambda_0} , \tag{3.1.35} $$
which is called the exponential map and describes the flow of velocity $\vec Y$ in a neighborhood of $P_0$ in the coordinate space $\mathbb{R}^n$.
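Let us see in detail how the exponential of a vector field generates integral curves, and the (tangent) vectors $\vec Y$ therefore act as generators of the displacements (footnote 4). Given the vector field $\vec Y = \frac{d}{d\lambda}$, its integral curve $\gamma = \gamma(\lambda)$ across a point $P_0$, and the chart $\phi$ which maps $P_0$ into $\phi(P_0) = x^i(\lambda_0)$, we can Taylor expand the $n$ coordinates in a neighbourhood of $P_0$ along the integral curve as
$$ \begin{aligned} x^i(\lambda_0 + \epsilon) &= x^i(\lambda_0) + \epsilon \left. \frac{dx^i}{d\lambda} \right|_{\lambda_0} + \frac{\epsilon^2}{2} \left. \frac{d^2 x^i}{d\lambda^2} \right|_{\lambda_0} + \ldots \\ &= \left. \left( 1 + \epsilon\, \frac{d}{d\lambda} + \frac{\epsilon^2}{2!} \frac{d^2}{d\lambda^2} + \ldots \right) x^i \right|_{\lambda_0} . \end{aligned} \tag{3.1.36} $$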
Next, note that all terms in the above expansion are well-defined, because the coordinates of any point $P$, $x^i = x^i(P)$, are functions on the manifold. In the language of tensor calculus, coordinates are scalars, in agreement with the fact that the measurements by which an observer assigns coordinates to a point cannot be questioned by other observers (footnote 5). The action of the vector $\vec Y$ on the coordinates is then well-defined by the very definition of a vector and we can rewrite the above as
$$ x^i(\lambda_0 + \epsilon) = \left. \exp\left( \epsilon\, \frac{d}{d\lambda} \right) x^i \right|_{\lambda_0} = \left. e^{\epsilon\, \vec Y}\, x^i \right|_{\lambda_0} . \tag{3.1.37} $$
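To see the exponential map at work, one can compare the series (3.1.36)-(3.1.37) with a direct numerical integration of the integral-curve equations (3.1.33). The sketch below does this for the rotation field $\vec Y = -y\, \partial_x + x\, \partial_y$ in $\mathbb{R}^2$, which is a hypothetical example chosen because its integral curves are circles; the function names, step count and truncation order are also illustrative.

```python
import math

def field(p):
    """Components Y^i(x, y) of the rotation field Y = -y d/dx + x d/dy."""
    x, y = p
    return (-y, x)

def integral_curve(p0, eps, steps=1000):
    """Solve dx^i/dlambda = Y^i(x) with the classical fourth-order Runge-Kutta scheme."""
    h = eps / steps
    x, y = p0
    for _ in range(steps):
        k1 = field((x, y))
        k2 = field((x + 0.5 * h * k1[0], y + 0.5 * h * k1[1]))
        k3 = field((x + 0.5 * h * k2[0], y + 0.5 * h * k2[1]))
        k4 = field((x + h * k3[0], y + h * k3[1]))
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return (x, y)

def exponential_map(p0, eps, terms=25):
    """Sum eps^n / n! * (Y^n x^i) evaluated at P0, for each coordinate x^i.

    For this linear field, Y maps the function a*x + b*y to b*x - a*y,
    so every power of Y acting on a coordinate is again a linear function.
    """
    result = []
    for a, b in ((1.0, 0.0), (0.0, 1.0)):     # coordinate functions x and y
        total = 0.0
        for n in range(terms):
            total += eps ** n / math.factorial(n) * (a * p0[0] + b * p0[1])
            a, b = b, -a                       # apply Y once more
        result.append(total)
    return tuple(result)

p0, eps = (1.0, 0.5), 0.8
print(integral_curve(p0, eps))
print(exponential_map(p0, eps))
```

Both results should agree with a rotation of $P_0$ by the angle $\epsilon$, which is the flow generated by this particular field.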
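A neat example is given by choosing the parameter $\lambda$ along $\gamma$ as one of the coordinates, say $\lambda = x^1$. We then have $\vec Y = \frac{\partial}{\partial x^1}$ and, setting $\epsilon = x^1 - x^1_0$, one easily obtains
$$ \begin{aligned} x^i &= \left. e^{\epsilon\, \frac{\partial}{\partial x^1}}\, x^i \right|_{x^1 = x^1_0} = x^i_0 + (x^1 - x^1_0) \left. \frac{\partial x^i}{\partial x^1} \right|_{x^1 = x^1_0} + \frac{(x^1 - x^1_0)^2}{2} \left. \frac{\partial^2 x^i}{\partial (x^1)^2} \right|_{x^1 = x^1_0} + \ldots \\ &= \begin{cases} x^1_0 + (x^1 - x^1_0) = x^1 , & i = 1 \\ x^i_0 , & i \neq 1 . \end{cases} \end{aligned} \tag{3.1.38} $$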
Using the chain rule for the derivation of composite functions, it is straightforward to generalise the above expression to any function (or field) defined in a neighbourhood of $P$. For example,
$$ f(\lambda_0 + \epsilon) = \left. \exp\left( \epsilon\, \frac{d}{d\lambda} \right) f \right|_{\lambda_0} = \left. e^{\epsilon\, \vec Y} f \right|_{\lambda_0} , \tag{3.1.39} $$
where $f(\lambda) = f(x^i(\lambda)) = (f \circ \phi^{-1})(x^i(\lambda))$.
Footnote 4: The definition (2.2.7) of a Lie group is a special case of the exponential map, that generates all the elements of the group from the identity, and the generators of the Lie algebra play the role of the vectors $\vec Y$.
Footnote 5: Equivalently, since the coordinates identify a specific observer, the very definition of each observer is based on scalar quantities.
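$$ \begin{aligned} [\vec V, \vec W] &= \left[ \frac{d}{d\lambda} , \frac{d}{d\mu} \right] = \frac{d}{d\lambda} \frac{d}{d\mu} - \frac{d}{d\mu} \frac{d}{d\lambda} = v^i \frac{\partial}{\partial x^i} \left( w^j \frac{\partial}{\partial x^j} \right) - w^j \frac{\partial}{\partial x^j} \left( v^i \frac{\partial}{\partial x^i} \right) \\ &= v^i w^j \frac{\partial^2}{\partial x^i \partial x^j} + v^i \frac{\partial w^j}{\partial x^i} \frac{\partial}{\partial x^j} - w^j v^i \frac{\partial^2}{\partial x^j \partial x^i} - w^i \frac{\partial v^j}{\partial x^i} \frac{\partial}{\partial x^j} \\ &= \left( v^i \frac{\partial w^j}{\partial x^i} - w^i \frac{\partial v^j}{\partial x^i} \right) \frac{\partial}{\partial x^j} , \end{aligned} \tag{3.1.40} $$
which is an element of $T_P$. This is a first remarkable result: the commutator of two vectors is still a vector. Moreover, the commutator vanishes if the two fields $\vec V$ and $\vec W$ are coordinate vectors, that is, if there exist coordinates, say $x^1$ and $x^2$, such that $\vec V = \frac{\partial}{\partial x^1}$ and $\vec W = \frac{\partial}{\partial x^2}$. In fact, if this is the case, $v^j = \delta^j_1$ and $w^i = \delta^i_2$ are obviously constant and the bracket in the last line above vanishes.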
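$$ x^i(B) - x^i(A) = \epsilon^2 \left. \left[ \frac{d}{d\lambda} , \frac{d}{d\mu} \right] x^i \right|_P + O(\epsilon^3) . \tag{3.1.42} $$
If the commutator of the two fields does not vanish, $A \neq B$, and the path $P\, A\, B\, P$ does not close.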
Example: polar coordinates in $\mathbb{R}^2$
$$ \hat x = \frac{\partial}{\partial x} , \qquad \hat r = \cos(\theta_0)\, \hat x + \sin(\theta_0)\, \hat y $$
and
$$ \hat y = \frac{\partial}{\partial y} , \qquad \vec\theta = -\sin(\theta_0)\, \hat x + \cos(\theta_0)\, \hat y , \tag{3.1.43} $$
where $\theta_0 = \arctan(y_0/x_0)$. Upon considering the integral curves of these four vectors in a neighbourhood of $P_0$, one finds
$$ [\hat x, \hat y] = 0 , \qquad [\hat r, \vec\theta\,] \neq 0 . \tag{3.1.44} $$
This is obvious if we choose particular curves. For example, let us start from $P_0$ on the $x$-axis and move first along $\hat r$ and then in the direction of $\vec\theta$, and compare the result with the inverted steps (see the left and right panels in Fig. 3.14). This shows that $\{\hat r, \vec\theta\}$ is not a coordinate basis (although it is a basis of $T_{P_0}$).
Of course, the proper coordinate basis for polar coordinates is obtained by rescaling $\vec\theta$,
$$ \vec\theta = \hat\theta / r , \tag{3.1.45} $$
since motion along $\hat\theta$ corresponds to rotation of a given angle $\theta$ around the origin (whereas motion along $\vec\theta$ corresponds to rotation of an arc of length $r\, \theta$).
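The statement (3.1.44), and the effect of the rescaling (3.1.45), can be checked numerically from the component formula (3.1.40), $[\vec V, \vec W]^j = v^i \partial_i w^j - w^i \partial_i v^j$. The sketch below evaluates it with central finite differences in Cartesian components; the function names, the step size and the sample point are illustrative assumptions.

```python
import math

def r_hat(p):
    """Unit radial field, Cartesian components."""
    x, y = p
    r = math.hypot(x, y)
    return (x / r, y / r)

def theta_unit(p):
    """Unit angular field (the non-coordinate vector of the example)."""
    x, y = p
    r = math.hypot(x, y)
    return (-y / r, x / r)

def d_theta(p):
    """Coordinate field d/dtheta = r * theta_unit."""
    x, y = p
    return (-y, x)

def bracket(V, W, p, h=1e-5):
    """Components [V, W]^j = v^i d_i w^j - w^i d_i v^j at the point p."""
    def directional(F, direction):
        # derivative of the components of F along `direction`, central difference
        plus = F((p[0] + h * direction[0], p[1] + h * direction[1]))
        minus = F((p[0] - h * direction[0], p[1] - h * direction[1]))
        return [(a - b) / (2 * h) for a, b in zip(plus, minus)]
    v, w = V(p), W(p)
    return [a - b for a, b in zip(directional(W, v), directional(V, w))]

P0 = (2.0, 0.0)                           # a point on the x-axis
print(bracket(r_hat, theta_unit, P0))     # non-zero: not a coordinate basis
print(bracket(r_hat, d_theta, P0))        # vanishes up to rounding: coordinate basis
```

At a point at distance $r$ from the origin the first bracket equals $-\vec\theta / r$, so it only vanishes asymptotically far from the origin.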
Lie algebra of vector fields
We saw that, if $\vec A$ and $\vec B$ are coordinate vector fields, then $[\vec A, \vec B] = 0$. Let us now prove that the vanishing of the commutator is also a sufficient condition for the two fields to be coordinate, that is $[\vec A, \vec B] = 0$ implies that there exist two coordinates whose lines are tangent to $\vec A$ and $\vec B$.
Figure 3.15: Exponential maps of $\vec A$ and $\vec B$.
Let us consider a two-dimensional manifold for simplicity and assume the vector fields $\vec A$ and $\vec B$ are linearly independent in their domain of definition, with
$$ \vec A = \frac{d}{d\lambda} , \qquad \vec B = \frac{d}{d\mu} . \tag{3.1.46} $$
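Let us start from a point $P$ and first move along $\vec A$,
$$ P \to R : \quad x^i(R) = \left. \exp\left( \lambda_1 \frac{d}{d\lambda} \right) x^i \right|_P \tag{3.1.47} $$
and then along $\vec B$,
$$ R \to Q : \quad x^i(Q) = \left. \exp\left( \mu_1 \frac{d}{d\mu} \right) \exp\left( \lambda_1 \frac{d}{d\lambda} \right) x^i \right|_P , \tag{3.1.48} $$
in which we assumed all relevant points are included in the same chart $(U, \phi = x^i)$ for simplicity. We should be able to look at $\lambda_1$ and $\mu_1$ as coordinates of the final point $Q$,
$$ x^i(\lambda, \mu) = \left. \exp\left( \mu \frac{d}{d\mu} \right) \exp\left( \lambda \frac{d}{d\lambda} \right) x^i \right|_P . \tag{3.1.49} $$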
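Now, if $\lambda$ and $\mu$ are to be coordinates, the corresponding basis vectors should be
$$ \frac{\partial}{\partial\lambda} = \frac{\partial x^i}{\partial\lambda} \frac{\partial}{\partial x^i} \qquad \text{and} \qquad \frac{\partial}{\partial\mu} = \frac{\partial x^i}{\partial\mu} \frac{\partial}{\partial x^i} , \tag{3.1.50} $$
with
$$ J = \det \begin{pmatrix} \frac{\partial x^1}{\partial\lambda} & \frac{\partial x^2}{\partial\lambda} \\ \frac{\partial x^1}{\partial\mu} & \frac{\partial x^2}{\partial\mu} \end{pmatrix} \neq 0 . \tag{3.1.51} $$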
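In fact,
$$ \begin{aligned} \frac{\partial x^i}{\partial\lambda} &= \frac{\partial}{\partial\lambda} \left. \left[ \exp\left( \mu \frac{d}{d\mu} \right) \exp\left( \lambda \frac{d}{d\lambda} \right) x^i \right] \right|_P = \left. \exp\left( \mu \frac{d}{d\mu} \right) \exp\left( \lambda \frac{d}{d\lambda} \right) \frac{dx^i}{d\lambda} \right|_P , \\ \frac{\partial x^i}{\partial\mu} &= \left. \exp\left( \mu \frac{d}{d\mu} \right) \frac{d}{d\mu} \exp\left( \lambda \frac{d}{d\lambda} \right) x^i \right|_P = \left. \exp\left( \mu \frac{d}{d\mu} \right) \exp\left( \lambda \frac{d}{d\lambda} \right) \frac{dx^i}{d\mu} \right|_P , \end{aligned} \tag{3.1.52} $$
where we used
$$ \frac{d}{d\mu} \exp\left( \lambda \frac{d}{d\lambda} \right) = \exp\left( \lambda \frac{d}{d\lambda} \right) \frac{d}{d\mu} , \tag{3.1.53} $$
which holds since
$$ \left[ \frac{d}{d\lambda} , \frac{d}{d\mu} \right] = 0 . \tag{3.1.54} $$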
3.1.6
One-forms
One-forms are linear functionals on vectors and the geometrical counterparts of co-vectors.
Let us consider a point $P$ on the manifold $M$ and the tangent space $T_P$. A 1-form at $P$ is a linear functional $\tilde w$ acting on vectors in $T_P$,
$$ \tilde w : T_P \to \mathbb{R} , \tag{3.1.55} $$
such that
$$ \begin{aligned} \tilde w(\alpha\, \vec v + \beta\, \vec u) &= \alpha\, \tilde w(\vec v) + \beta\, \tilde w(\vec u) \\ (\alpha\, \tilde w)(\vec v) &= \alpha\, \tilde w(\vec v) \\ (\tilde w + \tilde\sigma)(\vec v) &= \tilde w(\vec v) + \tilde\sigma(\vec v) . \end{aligned} \tag{3.1.56} $$
Note that linearity implies the action of a given 1-form on a generic vector is completely defined by its action on a basis of $T_P$. Several equivalent notations are in use, for example
$$ \tilde w(\vec v) = \vec v(\tilde w) = \langle \tilde w, \vec v \rangle = \langle \tilde w\, |\, \vec v \rangle . \tag{3.1.57} $$
One-forms acting on the same $T_P$ form a vector space $T_P^*$, dual to $T_P$, and the collection of all $T_P^*$ forms the cotangent bundle $T^* M$.
A 1-form field is an application which associates a 1-form from $T_P^*$ to each point $P$ of a manifold $M$, and, as usual, we shall always assume such a map is sufficiently smooth.
(3.1.58)
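We can now use the same Eq. (3.1.58) to define the 1-form $\tilde d f$ as the reverse operation, namely the 1-form which associates a real number to any $\vec V = \frac{d}{d\lambda}$, given a fixed function $f$,
$$ \tilde d f \left( \frac{d}{d\lambda} \right) = \vec V(f) = \frac{df}{d\lambda} . \tag{3.1.59} $$
$$ \frac{\partial f(x)}{\partial x^i} = \partial_i f = \tilde d f_i , \tag{3.1.60} $$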
~
df (V ) = V 1 .
x P
(3.1.61)
If the notion of an infinitesimal neighbourhood appears disturbing, one could actually consider the integral curve of $\vec V$ through $P$ with unit parametric length, that is the curve tangent to $\vec V$ that starts at $P$ and ends at $Q$, where
$$ x^i(Q) = e^{\vec V} x^i(P) . $$
(3.1.62)
This yields a more general interpretation of a 1-form in any dimension as a series of surfaces and of its action on a vector as the number of surfaces the vector crosses.
(3.1.63)
that is, the $i$th basis 1-form $\tilde e^i$ associates to a vector its $i$th component. Obviously, there are $n$ such forms and the dimension of $T_P^*$ equals the dimension of $T_P$ and $M$. Note that we can equivalently write Eq. (3.1.63) as
$$ \tilde e^i(\vec e_j) = \delta^i_j . $$
(3.1.64)
It is easy to see that the above $\tilde e^i$ are actually a basis, since, given any 1-form $\tilde q \in T_P^*$, we have
$$ \tilde q(\vec v) = \tilde q(v^i \vec e_i) = v^i\, \tilde q(\vec e_i) = \tilde e^i(\vec v)\, \tilde q(\vec e_i) = q_i\, \tilde e^i(\vec v) = q_i v^i . $$
(3.1.65)
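As a concrete illustration of Eqs. (3.1.64) and (3.1.65), here is a small sketch in two dimensions. It assumes vectors and 1-forms are represented by pairs of Cartesian components, and the function names `dual_basis` and `apply` are hypothetical. The dual basis is read off from the rows of the inverse of the matrix whose columns are the basis vectors.

```python
def dual_basis(e1, e2):
    """Return the two 1-forms dual to the basis vectors e1, e2 of the plane."""
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if det == 0:
        raise ValueError("basis vectors must be linearly independent")
    # rows of the inverse of the matrix with columns e1 and e2
    return ((e2[1] / det, -e2[0] / det), (-e1[1] / det, e1[0] / det))

def apply(form, vec):
    """Action of a 1-form on a vector, both given by Cartesian components."""
    return form[0] * vec[0] + form[1] * vec[1]

e1, e2 = (1.0, 0.0), (1.0, 1.0)        # a non-orthogonal basis
d1, d2 = dual_basis(e1, e2)

v = (5.0, 3.0)                          # v = 2 e1 + 3 e2
v_comp = (apply(d1, v), apply(d2, v))   # components v^i = e^i(v)

q = (4.0, 7.0)                          # an arbitrary 1-form
q_comp = (apply(q, e1), apply(q, e2))   # components q_i = q(e_i)

print(v_comp, q_comp)
print(apply(q, v), q_comp[0] * v_comp[0] + q_comp[1] * v_comp[1])
```

The last line prints the same number twice, which is the content of Eq. (3.1.65): $\tilde q(\vec v) = q_i v^i$ in any basis.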
(3.1.66)
We finally note that under a general change of coordinates, vectors and covectors do not change,
only their components do and in a way that compensates so as to keep the above real number (a
scalar) unchanged.
3.1.7
The general definition of $(n, m)$ tensors at $P$ is that of linear functionals acting on $n$ 1-forms and $m$ vectors,
$$ T : \underbrace{T_P^* \otimes \ldots \otimes T_P^*}_{n} \otimes \underbrace{T_P \otimes \ldots \otimes T_P}_{m} \to \mathbb{R} , $$
(3.1.67)
where $\otimes$ is the usual cartesian product of vector spaces. It is however easier to build them from vectors and 1-forms (covectors) by means of the outer product, like we did when studying group theory.
Tensor components and outer product
We can now define a general tensor as a combination of vectors and covectors, where by combination we mean the outer product, likewise denoted by $\otimes$. For example, by multiplying two vectors and applying the result to dual basis covectors, we obtain
$$ (\vec V \otimes \vec W)(\tilde e^i, \tilde e^j) = \vec V(\tilde e^i)\, \vec W(\tilde e^j) = V^i W^j , $$
(3.1.68)
where we note that the second expression is simply the product of two numbers (for fixed $i$ and $j$). We can therefore write
$$ \vec V \otimes \vec W = V^i W^j\, \vec e_i \otimes \vec e_j , $$
(3.1.69)
where now $V^i$ and $W^j$ are just numbers. This means the outer product of two vectors is a map
$$ \vec V \otimes \vec W : T_P^* \otimes T_P^* \to \mathbb{R} $$
(3.1.70)
(3.1.71)
where the components are in turn defined by the action of the tensor on basis vectors and covectors,
$$ T^{i_1 i_2 \ldots i_n}{}_{j_1 j_2 \ldots j_m} = T(\tilde e^{i_1}, \tilde e^{i_2}, \ldots, \tilde e^{i_n}, \vec e_{j_1}, \vec e_{j_2}, \ldots, \vec e_{j_m}) . $$
(3.1.72)
Basis transformations
We now study changes of basis in TP and TP .
Let us consider a point $P$ on a manifold $M$ and the tangent space $T_P$, with $\{\vec e_i\}$ as a basis. A change of basis in $T_P$, namely $\{\vec e_i\} \to \{\vec e_{i'}\}$, is determined by a non-degenerate $n \times n$ matrix $\Lambda$ (of fixed real entries), that is an element of $GL(n)$. Such a matrix has in general no particular tensorial properties and just specifies a linear transformation
$$ \vec e_{j'} = \Lambda^i{}_{j'}\, \vec e_i . $$
(3.1.73)
(3.1.74)
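$$ \tilde e^i(\vec e_k)\, \Lambda^k{}_{j'} = \delta^i_k\, \Lambda^k{}_{j'} = \Lambda^i{}_{j'} . $$
(3.1.75)
Since 1-forms act linearly, the above expression defines the action of $\tilde e^i$ on the transformed vector basis,
$$ \tilde e^i(\vec e_k)\, \Lambda^k{}_{j'} = \tilde e^i(\vec e_k\, \Lambda^k{}_{j'}) = \tilde e^i(\vec e_{j'}) . $$
(3.1.76)
$$ \Lambda^i{}_{j'}\, \Lambda^{j'}{}_k = \delta^i_k , \qquad \Lambda^{i'}{}_j\, \Lambda^j{}_{k'} = \delta^{i'}_{k'} . $$
(3.1.77)
$$ \Lambda^{k'}{}_i\, \tilde e^i(\vec e_k)\, \Lambda^k{}_{j'} = \Lambda^{k'}{}_i\, \tilde e^i(\vec e_{j'}) , $$
(3.1.78)
$$ \Lambda^{k'}{}_i\, \tilde e^i(\vec e_k)\, \Lambda^k{}_{j'} = \Lambda^{k'}{}_i\, \Lambda^i{}_{j'} = \delta^{k'}_{j'} . $$
(3.1.79)
Equating the two results, we thus see that the transformed dual basis is precisely given by
$$ \tilde e^{k'} = \Lambda^{k'}{}_i\, \tilde e^i , $$
(3.1.80)
that is, basis 1-forms transform according to the inverse matrix $\Lambda^{-1}$. Note that in the present notation, $\Lambda^{-1}$ is also the matrix that transforms vector components, whereas 1-form components transform with $\Lambda$.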
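The statement that vector components transform with $\Lambda^{-1}$ and 1-form components with $\Lambda$, so that their contraction is unchanged, can be illustrated with a short sketch. The $2 \times 2$ matrix, the sample components and the function names below are arbitrary choices made here for illustration, with `L[i][j]` standing for the entry $\Lambda^i{}_{j}$.

```python
def inverse2(L):
    """Inverse of a non-degenerate 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = L
    det = a * d - b * c
    if det == 0:
        raise ValueError("the change of basis must be non-degenerate")
    return ((d / det, -b / det), (-c / det, a / det))

def vector_components(L, v):
    """New vector components: v'^i = (L^{-1})^i_j v^j."""
    Li = inverse2(L)
    return tuple(sum(Li[i][j] * v[j] for j in range(2)) for i in range(2))

def form_components(L, q):
    """New 1-form components: q'_j = q_i L^i_j."""
    return tuple(sum(q[i] * L[i][j] for i in range(2)) for j in range(2))

L = ((2.0, 1.0), (0.0, 1.0))   # columns are the new basis vectors
v = (4.0, 2.0)
q = (3.0, 5.0)

v_new = vector_components(L, v)
q_new = form_components(L, q)

old = sum(q[i] * v[i] for i in range(2))
new = sum(q_new[i] * v_new[i] for i in range(2))
print(v_new, q_new, old, new)
```

The two contractions printed at the end coincide, as expected for a scalar.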
Tensor operations on components
Let us now summarize all operations that map tensors $T$ of type $(n, m)$ into tensors defined at the same point $P$, but of possibly different type:
Scalar multiplication:
$$ T^{(n,m)} \to a\, T^{(n,m)} , \quad a \in \mathbb{R} $$
(3.1.81)
Addition:
$$ T^{(n,m)} + S^{(n,m)} = (T + S)^{(n,m)} $$
(3.1.82)
Outer product:
$$ T^{(n,m)} \otimes S^{(p,q)} = (T \otimes S)^{(n+p,\,m+q)} $$
(3.1.83)
$$ T^{(n,m)}(\ldots, \tilde w, \ldots) = T^{(n-1,m)} $$
(3.1.84)
$$ T^{(n,m)}(\ldots, \vec v, \ldots) = T^{(n,m-1)} . $$
(3.1.85)
The last two operations above can then be easily generalised to any saturation of $(n, m)$ tensors with $(p \le m, q \le n)$ tensors.
Change of coordinates and coordinate basis
Let us now consider a point P on a manifold M, the tangent space TP , and two charts = xi and
= y i , connected by a bijective function f (see Fig. 3.16). We can then introduce two coordinate
xi
y j
(3.1.86)
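$$ \Lambda^k{}_j = \frac{\partial y^k}{\partial x^j} . $$
(3.1.87)
$$ \frac{\partial \Lambda^k{}_j}{\partial x^i} = \frac{\partial^2 y^k}{\partial x^i\, \partial x^j} = \frac{\partial^2 y^k}{\partial x^j\, \partial x^i} = \frac{\partial \Lambda^k{}_i}{\partial x^j} . $$
(3.1.88)
$$ \frac{\partial \Lambda^k{}_j}{\partial x^i} = \frac{\partial \Lambda^k{}_i}{\partial x^j} $$
(3.1.89)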
is necessary for a change of basis in the tangent space to correspond to a change of coordinates on
the manifold. This is in fact a strong restriction, as we shall see better later.
3.2
We shall now introduce distance (length) and angles on a manifold. We first need to define the length (or modulus) of a vector and the angle between two vectors belonging to the tangent space $T_P$ of one point. Both quantities are obtained from a scalar product between vectors in $T_P$, which can in turn be introduced by means of a special tensor. Upon elevating this tensor to a field, we will finally be able to define the length of a path on $M$.
3.2.1
Metric tensor
A metric tensor is a type $(0, 2)$ tensor which maps any two vectors into a real number with the following properties:
1) it is symmetric
$$ g(\vec v, \vec w) = g(\vec w, \vec v) = g_{ij}\, v^i w^j = \vec v \cdot \vec w , $$
(3.2.1)
2) it is non-degenerate
$$ \left[\, g(\vec v, \vec w) = 0 \ \ \forall\, \vec w \in T_P \ \Rightarrow\ \vec v = 0 \,\right] \quad \Longleftrightarrow \quad \det(g_{ij}) \neq 0 . $$
(3.2.2)
Examples of metric tensors are the Euclidean metric $g_{ij} = \delta_{ij}$ and the Minkowski metric.
Any metric tensor automatically defines a scalar product with the expected properties. In particular, the squared modulus of a vector is given by
$$ v^2 = g(\vec v, \vec v) = g_{ij}\, v^i v^j , $$
(3.2.3)
(3.2.4)
although the latter will only be properly defined for Euclidean metrics.
Canonical form and orthonormal bases
The components of any metric $g$ at a point $P$, under a change of basis in the tangent space $T_P$, will change according to the matrix $\Lambda$ we introduced before,
$$ g' = \Lambda^T g\, \Lambda . $$
(3.2.5)
Since in given coordinates $g_{ij}$ is a symmetric matrix, it can always be put in diagonal form. More precisely, we can always write $\Lambda$ as the product of an orthogonal matrix $O^{-1} = O^T$ and a symmetric matrix $D = D^T$, such that
$$ g' = D^T O^T g\, O\, D = D^T g_{(diag)}\, D = D\, g_{(diag)}\, D . $$
(3.2.6)
$$ g'_{ij} = \pm\delta_{ij} . $$
(3.2.7)
The canonical form of the metric implicitly defines the orthonormal basis ~ei for vectors (and dual
ei for 1-forms) at the point P .
What we cannot change arbitrarily is the sign of each diagonal element, whose sum is called the signature of the metric. If all signs are positive (negative), the metric is positive (negative) definite and generically called Riemannian. The Euclidean metric is a special case of Riemannian metric which can be put in canonical form simultaneously at all points of a manifold (see footnote 8). If elements of both signs appear, the metric is said to be pseudo-Riemannian. In particular, if one element is negative (positive) and all the others are positive (negative), then it is said to have Lorentzian signature (like the Minkowski metric).
8
Note that, although it is always possible to diagonalize a symmetric matrix, it might not be possible to diagonalize a metric tensor field simultaneously at different points, since the required matrices $\Lambda$ may not satisfy the condition (3.1.89). More on this in the following.
(3.2.8)
(3.2.9)
(3.2.10)
and this map is independent of any duality relation between $\vec e_i$ and $\tilde e^j$.
Since $g_{ij}$ is invertible, we denote its inverse with
$$ (g_{ij})^{-1} = g^{ij} \quad \Rightarrow \quad g^{ij}\, g_{jk} = \delta^i_k , $$
(3.2.11)
where
$$ g^{-1}(\tilde e^i, \tilde e^j) = g^{ij} . $$
(3.2.12)
(3.2.13)
(3.2.14)
actually represents a map from (the components of) a $(2, 0)$ tensor to (the components of) a $(1, 1)$ tensor. Likewise,
$$ T^{ij}\, g_{ij} = T^{ij}\, g_{jk}\, g^{kl}\, g_{li} = T^i{}_k\, \delta^k_i = T , $$
(3.2.15)
is a map from (the components of) a $(2, 0)$ tensor to scalars (the trace), and can be generalised to tensors of any order $(n, m)$ to produce tensors of order $(n-2, m)$ or $(n, m-2)$ and also $(n-1, m-1)$.
3.2.2
A metric tensor field is an application which maps each point of a manifold M into a metric tensor
g = g(P ). A manifold in which a metric tensor (field) is defined everywhere is called a metric
manifold.
The conclusion is thus that it is always possible, by a change of coordinates, to write a metric tensor field in the form
$$ g'_{ij}(x) = \pm\delta_{ij} + \frac{1}{2} \frac{\partial^2 g'_{ij}}{\partial x^k\, \partial x^l}\, x^k x^l + \ldots , $$
(3.2.18)
around a given point $P$ of coordinates $x(P)$. Equivalently, it is always possible to choose locally orthogonal coordinates at any given point $P$. In general, however, as we move away from $P$, the same coordinates will not be orthogonal, unless the manifold is $\mathbb{R}^n$ or a subset of it: there exists no change of coordinates that can put a general metric tensor in canonical form everywhere on a manifold.
Length of a curve
We can finally define the concept of length of a path on a manifold by considering the integral curve of a vector field $\vec v = \frac{d}{d\lambda}$. We first define the (squared) length of an infinitesimal displacement along the vector field $\vec v$ as
$$ dl^2 = d\vec x \cdot d\vec x = (\vec v\, d\lambda) \cdot (\vec v\, d\lambda) = g(\vec v\, d\lambda, \vec v\, d\lambda) = g(\vec v, \vec v)\, d\lambda^2 , $$
(3.2.19)
which is obviously a scalar quantity (since $\vec v$ is a vector, $d\lambda$ a scalar and $g$ a $(0, 2)$ tensor). Upon integrating along an integral curve of the vector field $\vec v$, we obtain the length of the integral path between two points of parameters $\lambda_1$ and $\lambda_2$,
$$ l(\lambda_1, \lambda_2) = \int_{\lambda_1}^{\lambda_2} \sqrt{g(\vec v, \vec v)}\, d\lambda = \int_{\lambda_1}^{\lambda_2} \sqrt{g_{ij}(\lambda)\, v^i(\lambda)\, v^j(\lambda)}\, d\lambda . $$
(3.2.20)
The above expression can be made more explicit upon introducing coordinates $\phi = x^i$ that cover the region where the integral is performed, namely
$$ l(\lambda_1, \lambda_2) = \int_{\lambda_1}^{\lambda_2} \sqrt{g_{ij}(\lambda)\, \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}}\, d\lambda , $$
(3.2.21)
where we just used the definition of vector components $v^i = \frac{dx^i}{d\lambda}$.
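Equation (3.2.21) lends itself to a direct numerical evaluation. The sketch below is illustrative only: the function names, the midpoint rule and the finite-difference step are assumptions made here. It computes the length of a circle of constant polar angle $\theta_0$ on the unit sphere, with coordinates $(\theta, \varphi)$ and metric $\mathrm{diag}(1, \sin^2\theta)$, for which the exact result is $2\pi \sin\theta_0$.

```python
import math

def curve_length(metric, curve, lam1, lam2, steps=2000):
    """Midpoint-rule estimate of int sqrt(g_ij dx^i/dlam dx^j/dlam) dlam."""
    dlam = (lam2 - lam1) / steps
    h = 1e-6  # step used to differentiate the curve numerically
    total = 0.0
    for s in range(steps):
        lam = lam1 + (s + 0.5) * dlam
        x = curve(lam)
        xp, xm = curve(lam + h), curve(lam - h)
        v = [(a - b) / (2 * h) for a, b in zip(xp, xm)]  # v^i = dx^i/dlam
        g = metric(x)
        n = len(x)
        speed2 = sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))
        total += math.sqrt(speed2) * dlam
    return total

def sphere_metric(x):
    """Metric of the unit sphere in coordinates x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

theta0 = math.pi / 3
circle = lambda lam: (theta0, lam)   # curve of constant theta, phi = lam

length = curve_length(sphere_metric, circle, 0.0, 2 * math.pi)
print(length, 2 * math.pi * math.sin(theta0))
```

Reparametrising the curve (for instance $\varphi = 2\lambda$ with $\lambda \in [0, \pi]$) leaves the result unchanged, in agreement with the scalar nature of $dl^2$.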
9
Technically, Eq. (3.2.17) are first order partial differential equations for the components of (x), with Eq. (3.2.7)
at P playing the role of (initial value) boundary conditions.
3.3
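Among the good tensor operations we have seen so far there is no derivative. An easy way to understand why is to consider how the ordinary partial derivative of a vector field transforms,
$$ \frac{\partial T^{a'}}{\partial x^{c'}} = \frac{\partial}{\partial x^{c'}} \left( \frac{\partial x^{a'}}{\partial x^b}\, T^b \right) = \frac{\partial x^c}{\partial x^{c'}} \frac{\partial}{\partial x^c} \left( \frac{\partial x^{a'}}{\partial x^b}\, T^b \right) $$
$$ = \frac{\partial x^{a'}}{\partial x^b} \frac{\partial x^c}{\partial x^{c'}} \frac{\partial T^b}{\partial x^c} + \frac{\partial^2 x^{a'}}{\partial x^b\, \partial x^c} \frac{\partial x^c}{\partial x^{c'}}\, T^b , $$
(3.3.1)
where we are implicitly decomposing tensor quantities in terms of coordinate basis vectors and the dual 1-form basis. Due to the presence of the second term in the last line, this quantity clearly does not look like a tensor.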
The above argument, besides being inaccurate (as we shall see later on), does not clarify the real
issue at stake here. Derivatives involve comparing quantities at different points, and also require a
way to quantify the difference between those base points. For example, in the case of functions,
we have seen from the onset that the derivative on a generic manifold requires a curve. Even if a
curve is given, for other tensorial quantities, we then need a way to map such quantities between
the tangent spaces at different points, an ingredient that is mathematically arbitrary. In particular,
in this section we shall consider integral curves of a vector field as a flow for points on a manifold,
thus implementing the active interpretation of (auto)diffeomorphisms of a manifold, which will turn
out to represent Lie groups and allow us to introduce symmetry on differentiable manifolds.
Suggested bibliography: B. Schutz, Geometrical methods of mathematical physics, Cambridge
Univ. Press (1980). Selected chapters: 3.1-3.7, 3.10-3.11.
3.3.1
(x) = (x ) ,
(3.3.2)
where x and x represent the same point in different coordinate frames. We therefore see that it
is not the function f that changes under coordinate transformations, but its composition with the
chart . This describes the so-called passive interpretation of a diffeomorphism on a manifold M:
the points remain the same but their coordinates change. Technically, the diffeomorphism we are
considering does not act on M itself, but on the open subsets of Rn that carry the coordinates for
the manifold, that is,
= () ,
(3.3.3)
(3.3.4)
x
x
V i (x ) = i j (x(x )) V j (x(x )) ,
where the notation is meant to highlight that, in the new coordinate system x = x (x), the new
components V i with respect to the new coordinate basis ~ei = xi are linear combinations of the
(3.3.6)
Assuming points are not moved too far, one can use the same chart = x to cover both U M
and its image U = (U ) and, upon composing both f and f with the same chart , Eq. (3.3.6)
implies
(x) = f 1 (x) = f 1 1 (x ) = (x ) (x) = (x ) ,
(3.3.7)
where and clearly represent different composite functions with respect to the and used
in the previous paragraph, since x and x here represent different points in the same coordinate
frame 10 .
In both cases, active and passive mappings, a question remarkably relevant for physics remains
(for now) unanswered: if coordinates are the only quantities that identify points, and we can
change them freely, what is the physical meaning of points themselves? In other words, how can
we say that two points P and Q on the same manifold are really different geometrical locations
and distinguish active from passive transformations? A hint comes from considering coordinates as
scalars: if we could give an operationally invariant meaning to the measurement of positions, points
would be clearly identified by scalar quantities that do not change when we drag them around or
change coordinates in the mathematical sense. In fact, only very selected coordinates can be given
a physical meaning in this sense, whereas most charts will remain a mathematically useful, but
otherwise formal, tool.
10
It is unfortunate that too many text-books do not distinguish these two compositions and the inherently different
geometrical meanings.
3.3.2
(3.3.8)
~ C , the maps ,
Clearly, this map also transforms 0 in a new curve . If the vector field V
with R, become diffeomorphisms (recall the active interpretation of changes of coordinates)
and form a Lie group with respect to the usual composition law,
1 2 = 1 +2 ,
1
= ,
=0 = I .
(3.3.9)
f
(Q) = f (P ) ,
with (P ) = Q .
(3.3.10)
In other words, f takes the same value at the Lie-dragged point Q the original function f takes
~ = d passing through
at the point P . If Eq. (3.3.10) holds for all Q along the integral curve of V
d
as a function of must be constant along such curve. Consequently, if
P , it is clear that f
and f are the same for all values of , the function f must be constant along the lines of the
f
congruence and df = 0.
d
~ applied to any LieThis is in analogy with the case of a function: the pushed forward vector W
dragged function f at the Lie-dragged point Q produces the same real number as the original
~ applied to the original function f at the original point P . This can also be written as
vector W
df
df
,
(3.3.12)
=
d 0 +
d 0
~ = d . If W
~ and V
~ are linearly
that df as a function of is constant along congruences of V
d
d
~ = d , we can use this 0 as the initial curve (with
independent along an integral curve 0 of W
d
~ . This initial curve will then be mapped
= 0 and constant) for defining the congruence of V
11
into a new curve = for each value of . Since = 0 along 0 and = 0 +
11
~
~
V ,W =
,
=0,
(3.3.13)
d d
for all points in a given open subset of M (and, implicitly, for all functions in such a subset). This
~ , as we shall see momentarily.
can indeed be taken as one of the defining equations for W
3.3.3
Lie derivatives
f
(0 ) f (0 )
f (0 + ) f (0 )
df
~ (f ) , (3.3.14)
=V
= lim
=
V~ f = lim
0
0
0
d 0
in which we employed the pull back of f rather than the push forward. In other words, we
~ to map P (0 + ) to P (0 ),
used the flow generated by the congruence of V
(P (0 + )) = P (0 ) ,
(3.3.15)
f
(0 ) = f (0 + ) .
12
~ (f )|
One can easily see this by choosing the function f = along 0 , so that 1 = W
0
(3.3.16)
= df
d
0 +
(3.3.17)
which implies
d d
, =0,
d d
(3.3.19)
Note that Eq. (3.3.18) applied to a given function f and for fixed can be viewed as a proper
~ = d satisfying the first order
set of n initial conditions for the unknown n-dimensional field W
d
partial differential equation (3.3.19). In particular, for = 0 we have f = f and Eq. (3.3.18)
reads
df
df
=
,
(3.3.20)
d 0
d 0
P (0 +)
Upon acting on an arbitrary function f , the pulled back version of the initial condition (3.3.20) at
the starting point P (0 + ) reads
df
df
=
,
(3.3.22)
d 0 +
d 0 +
and Eq. (3.3.19) can explicitly be rewritten as
d d
d d
f=
f ,
d d
d d
in the entire chosen neighborhood of P (0 + ) [thus including P (0 )].
(3.3.23)
~ as 13
We can finally define the Lie derivative of a vector field W
~
~
W
W
~ (f ) = lim
(f ) .
V~ W
0
(3.3.24)
~
=
f
+ O(2 ) .
W
(f )
d 0
d 0 +
d d 0 +
0
(3.3.25)
~
~
W (f ) = W (f ) +
,
f + O(2 ) ,
d d
0
0
0
(3.3.27)
On using the initial condition (3.3.22), expanding around 0 and imposing Eq. (3.3.23), we next
get
df
d d
df
=
f
+ O(2 )
d 0
d 0 +
d d 0 +
df
d d
d d
=
+
f
f
+ O(2 )
d 0
d d 0
d d 0
d d
d d
df
+
+ O(2 ) .
(3.3.26)
f
f
=
d 0
d d
d d 0
from which
~ (f )
V~ W
i
h
d , d f + O(2 )
d d
0
= lim
0
d d
,
f ,
=
d d
0
(3.3.28)
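$$ \mathcal{L}_{\vec V} \vec W = \left[ \vec V, \vec W \right] . $$
(3.3.29)
Note the above expression vanishes for a Lie dragged vector field $\vec W = \vec W^*$, as it reduces to the defining Eq. (3.3.19).
The Lie derivative has the following properties:
1. It vanishes if the components of $\vec W$ are constant along the direction defined by $\vec V = \frac{\partial}{\partial x^1}$:
$$ \left( \mathcal{L}_{\vec V} \vec W \right)^i = V^j\, \partial_j W^i - W^j\, \partial_j V^i = \frac{\partial W^i}{\partial x^1} ; $$
(3.3.30)
2. It satisfies the Leibniz rule:
$$ \mathcal{L}_{\vec V} (f\, \vec W) = (\mathcal{L}_{\vec V} f)\, \vec W + f\, \mathcal{L}_{\vec V} \vec W ; $$
(3.3.31)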
13
~ and W
~ act on the original function f (none act on f ).
Note that both W
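3. It is linear:
$$ \mathcal{L}_{\vec V} + \mathcal{L}_{\vec W} = \mathcal{L}_{\vec V + \vec W} ; $$
(3.3.32)
4. The commutator
$$ \left[ \mathcal{L}_{\vec V}, \mathcal{L}_{\vec W} \right] = \mathcal{L}_{[\vec V, \vec W]} , $$
(3.3.33)
so that:
(3.3.34)
5. It satisfies the Jacobi identity: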
(3.3.35)
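in which we assumed the Leibniz rule. From this, we can obtain the rather formal expression
$$ (\mathcal{L}_{\vec V} \tilde w)(\vec W) = \mathcal{L}_{\vec V} \left( \tilde w(\vec W) \right) - \tilde w\left( \mathcal{L}_{\vec V} \vec W \right) . $$
(3.3.36)
Upon noting that the full saturation of a type $(n, m)$ tensor field is again a real function,
$$ T^{(n)}_{(m)}(\tilde w_1, \tilde w_2 \ldots \tilde w_n, \vec W_1, \vec W_2 \ldots \vec W_m) : M \to \mathbb{R} , $$
(3.3.37)
$$ \mathcal{L}_{\vec V} \left[ T(\tilde w_1, \ldots \tilde w_n, \vec W_1, \ldots \vec W_m) \right] = (\mathcal{L}_{\vec V} T)(\tilde w_1, \ldots \tilde w_n, \vec W_1, \ldots \vec W_m) $$
$$ + T(\mathcal{L}_{\vec V} \tilde w_1, \tilde w_2 \ldots \tilde w_n, \vec W_1, \vec W_2 \ldots \vec W_m) + \ldots $$
$$ + T(\tilde w_1, \mathcal{L}_{\vec V} \tilde w_2 \ldots \tilde w_n, \vec W_1, \vec W_2 \ldots \vec W_m) + \ldots $$
$$ + T(\tilde w_1, \tilde w_2 \ldots \tilde w_n, \mathcal{L}_{\vec V} \vec W_1, \vec W_2 \ldots \vec W_m) + \ldots $$
$$ + T(\tilde w_1, \tilde w_2 \ldots \tilde w_n, \vec W_1, \vec W_2 \ldots \mathcal{L}_{\vec V} \vec W_m) . $$
(3.3.38)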
This is again a formal expression, which we can however simplify by a smart choice of coordinates.
Simple form of Lie derivatives
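Let us choose $\lambda = x^1$ as one of the $n$ coordinates, so that
$$ \frac{d}{d\lambda} = \frac{\partial}{\partial x^1} , $$
(3.3.39)
and review the different cases:
Scalars:
$$ \mathcal{L}_{\vec V} f = \frac{df}{d\lambda} = \frac{\partial f}{\partial x^1} . $$
(3.3.40)
Vectors:
$$ \left( \mathcal{L}_{\vec V} \vec W \right)^i = \left[ \vec V, \vec W \right]^i = \frac{\partial W^i}{\partial x^1} . $$
(3.3.41)
Tensors:
$$ \mathcal{L}_{\vec V} T = \lim_{\epsilon \to 0} \frac{T^*_{-\epsilon} - T}{\epsilon} , $$
(3.3.42)
from which
$$ \left( \mathcal{L}_{\vec V} T \right)^{i_1 i_2 \ldots i_n}{}_{j_1 j_2 \ldots j_m} = \frac{\partial T^{i_1 i_2 \ldots i_n}{}_{j_1 j_2 \ldots j_m}}{\partial x^1} $$
(3.3.43)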
To summarize, the Lie derivative is the coordinate invariant definition of partial derivatives. This result, incidentally, shows that the easy argument for partial derivatives being improper tensorial operations is inaccurate.
Example
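Let us consider cartesian coordinates $\{x, y\}$ on the plane $\mathbb{R}^2$ and the vector field $\vec V(x, y)$ given by
$$ \vec V(x, y) = x^2\, \frac{\partial}{\partial x} + \frac{\partial}{\partial y} , $$
(3.3.44)
which is well-defined in all of $\mathbb{R}^2$. In order to compute any Lie derivative with respect to $\vec V$, we first define new coordinates $\{v = v(x, y), w = w(x, y)\}$, such that
$$ \vec V(v, w) = \frac{\partial}{\partial v} . $$
(3.3.45)
The usual chain rule for partial derivatives then implies
$$ \frac{\partial}{\partial v} = \frac{\partial x}{\partial v} \frac{\partial}{\partial x} + \frac{\partial y}{\partial v} \frac{\partial}{\partial y} = x^2\, \frac{\partial}{\partial x} + \frac{\partial}{\partial y} , $$
(3.3.46)
or
$$ \frac{1}{x^2} \frac{\partial x}{\partial v} = -\frac{\partial (x^{-1})}{\partial v} = 1 , \qquad \frac{\partial y}{\partial v} = 1 , $$
(3.3.47)
which is solved by
$$ x(v, w) = \frac{1}{f(w) - v} $$
(3.3.48)
$$ y(v, w) = v + g(w) , $$
(3.3.49)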
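where the functions $f = f(w)$ and $g = g(w)$ can be chosen freely, provided the transformation of coordinates is not singular [see Eq. (3.1.29)], that is
$$ J \equiv \det \begin{bmatrix} \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \\ \dfrac{\partial x}{\partial w} & \dfrac{\partial y}{\partial w} \end{bmatrix} = x^2 \left( \frac{df}{dw} + \frac{dg}{dw} \right) \neq 0 . $$
(3.3.50)
For example, we can set
$$ f = 0 \qquad \text{and} \qquad g = w , $$
(3.3.51)
so that
$$ x = -\frac{1}{v} , \quad y = v + w , \qquad v = -\frac{1}{x} , \quad w = y + x^{-1} , $$
(3.3.52)
and
$$ \begin{bmatrix} \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w} \\ \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w} \end{bmatrix} = \begin{bmatrix} x^2 & 0 \\ 1 & 1 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \\ \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{x^2} & 0 \\ -\dfrac{1}{x^2} & 1 \end{bmatrix} . $$
(3.3.53)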
Note that the new coordinates become singular for x 0 (where both v and w diverge, since the
above mapping exchanges the origin in one frame with infinity in the other), whereas the Jacobian
J vanishes for x (that is, the origin in {v, w}). As long as we avoid those two regions of R2 ,
the new coordinates v and w are fine.
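Suppose we now want to compute the Lie derivative of the function $f(x, y)$ given by
$$ f(x, y) = x . $$
(3.3.54)
Direct application of the definition (3.3.14) for the vector (3.3.44) yields
$$ \mathcal{L}_{\vec V} f = \vec V(f) = x^2\, \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} = x^2 . $$
(3.3.55)
$$ f = -\frac{1}{v} , $$
(3.3.56)
$$ \frac{\partial f}{\partial v} = \frac{1}{v^2} = x^2(v, w) . $$
(3.3.57)
(3.3.58)
$$ \mathcal{L}_{\vec V} \vec W = \left[ x^2\, \frac{\partial}{\partial x} + \frac{\partial}{\partial y} , \frac{\partial}{\partial x} \right] = -2\, x\, \frac{\partial}{\partial x} . $$
(3.3.59)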
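If we instead wish to apply the simple expression (3.3.41), we first need to express $\vec W$ in the new coordinate system. Its components change with the matrix in Eq. (3.3.53) and the functional dependence on the coordinates according to Eq. (3.3.52), yielding
$$ \vec W = v^2 \left( \frac{\partial}{\partial v} - \frac{\partial}{\partial w} \right) . $$
(3.3.60)
Therefore,
$$ \left( \mathcal{L}_{\vec V} \vec W \right)^v = \frac{\partial (v^2)}{\partial v} = 2\, v = -\frac{2}{x} , $$
(3.3.61)
and
$$ \left( \mathcal{L}_{\vec V} \vec W \right)^w = -\frac{\partial (v^2)}{\partial v} = -2\, v = \frac{2}{x} , $$
(3.3.62)
or
$$ \mathcal{L}_{\vec V} \vec W = 2\, v \left( \frac{\partial}{\partial v} - \frac{\partial}{\partial w} \right) = -2\, x\, v^2 \left( \frac{\partial}{\partial v} - \frac{\partial}{\partial w} \right) = -2\, x\, \frac{\partial}{\partial x} , $$
(3.3.63)
as it should.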
The above example shows, among other things, that it is not always easier to use the simpler
expressions of the Lie derivatives.
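The results of this example can also be cross-checked directly from the limit definitions, by dragging points and vectors along the flow of $\vec V = x^2\, \partial_x + \partial_y$. The sketch below is only illustrative and rests on assumptions made here: the flow is written in closed form as $x \to x/(1 - \epsilon x)$, $y \to y + \epsilon$ (which solves $dx/d\epsilon = x^2$, $dy/d\epsilon = 1$), the second vector field is taken to be $\vec W = \partial_x$ (the choice consistent with Eqs. (3.3.59) and (3.3.60)), and the limits are replaced by central differences with a small $\epsilon$. All function names are hypothetical.

```python
def flow(p, eps):
    """Point reached from p after a parameter eps along V = x^2 d/dx + d/dy.

    Valid as long as 1 - eps * x stays positive.
    """
    x, y = p
    return (x / (1.0 - eps * x), y + eps)

def flow_jacobian_xx(p, eps):
    """d(flow_x)/dx at p; the flow Jacobian is diagonal with yy-entry 1."""
    x, _ = p
    return 1.0 / (1.0 - eps * x) ** 2

def lie_derivative_scalar(f, p, eps=1e-5):
    """Central-difference version of the limit defining the Lie derivative of f."""
    return (f(flow(p, eps)) - f(flow(p, -eps))) / (2 * eps)

def lie_derivative_W(p, eps=1e-5):
    """Lie derivative along V of W = d/dx (assumed), from dragged-back vectors."""
    def dragged_back(e):
        q = flow(p, e)
        # W at q has components (1, 0); map them back to p with the
        # Jacobian of the inverse flow evaluated at q
        return (flow_jacobian_xx(q, -e) * 1.0, 0.0)
    a, b = dragged_back(eps), dragged_back(-eps)
    return ((a[0] - b[0]) / (2 * eps), (a[1] - b[1]) / (2 * eps))

p = (3.0, 1.0)
print(lie_derivative_scalar(lambda q: q[0], p))   # compare with x^2 = 9
print(lie_derivative_W(p))                        # compare with (-2x, 0) = (-6, 0)
```

Both numbers agree with Eqs. (3.3.55) and (3.3.59) at the chosen point, without ever introducing the adapted coordinates $\{v, w\}$.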
3.3.4
We have seen how a vector field generates a flow on the manifold, and how the Lie derivative can
be used to assess whether a given tensorial quantity remains unaffected by such a flow. It is therefore
natural to associate the concept of symmetry to such flows generated by sets of vector fields:
instead of relying on global coordinate transformations, the geometrical meaning of a symmetry
lies in the local behaviour of relevant quantities under suitable displacements of the points. Such
displacements can be further associated to preferred foliations of the manifold, and the latter
then interpreted as preferred observers.
Submanifolds and Lie algebras
We have just seen that one vector field on a manifold M can generate congruences, that is a family
of one-dimensional submanifolds of M. Likewise, sets of vector fields can act as generators of
submanifolds foliating a manifold.
Given a manifold $M$ of dimension $n$, one of its subsets $S$ is a submanifold of dimension $m \le n$
if there exist charts U with coordinates x Rn such that U S M and, for all points of S (see
Fig. 3.22),
$$ x^1 = x^2 = \ldots = x^{n-m} = 0 , \qquad m \le n . $$
(3.3.64)
Given a point $P \in S$, we can define the tangent space $T_P^{(S)}$ and, to each curve or vector on $S$, we can associate a corresponding quantity in $M$ (see Fig. 3.23). We then have the following relations. First of all, we recall that
$$ \dim T_P^{(M)} = n \ \ge\ \dim T_P^{(S)} = m . $$
(3.3.65)
(3.3.66)
and so do vectors,
$$ \vec V_S = (V^1, V^2, \ldots, V^m) \ \to\ \vec V_M = (0, 0, \ldots, 0, V^1, V^2, \ldots, V^m) . $$
(3.3.67)
However, the inverse maps are not uniquely defined: it is always possible to project a curve or vector from $M$ to $S$, but the resulting curve or vector is the image of infinitely many curves and vectors. For example,
$$ \vec V_M = (0, \ldots, 0, V^1, \ldots, V^m) \ \to\ \vec V_S = (V^1, \ldots, V^m) \ \leftarrow\ \vec V'_M = (1, \ldots, 1, V^1, \ldots, V^m) . $$
(3.3.68)
$$ \tilde w_S(\vec V) = \tilde w_M(0, \ldots, 0, \vec V) \in \mathbb{R} , $$
(3.3.69)
(3.3.70)
yield the same result for all $\vec V \in T_P^{(S)}$.
Since one vector field generates congruences (a foliation of $M$ in one-dimensional submanifolds), one could naively think $m \le n$ vector fields define $m$-dimensional submanifolds. The general situation is rather different and stated by the very important
Frobenius theorem:
Given $p$ linearly independent vector fields $\vec V_{(k)}$ ($k = 1, 2, \ldots, p$) on the manifold $M$, such that
$$ \left[ \vec V_{(i)}, \vec V_{(j)} \right] = c_{ij}{}^k\, \vec V_{(k)} , $$
(3.3.71)
with $c_{ij}{}^k$ real constants, the integral curves of these fields form a family of submanifolds (or foliation) of $M$, each of dimension $m \le p$.
The meaning of the theorem is that a family of $p$ vector fields could actually define a submanifold, but its dimension $m$ is in general smaller than $p$. It is also important to stress that the linear independence of $p$ vector fields means that there do not exist $p$ constants $a_i$, $i = 1, \ldots, p$, not all zero, such that
$$ \sum_{i=1}^{p} a_i\, \vec V_{(i)}(P) = 0 , \qquad \forall\, P \in M . $$
(3.3.72)
In particular, this does not mean that at a given point $P$ the corresponding vectors are also linearly independent. In fact, if $p > n$, the manifold's dimension, the above relation must hold at each $P$, but the coefficients then will depend on the point, that is $a_i = a_i(P)$.
Example (a): let us consider the manifold $\mathbb{R}^3$ and the vector fields $\vec V_{(1)} = \frac{\partial}{\partial x}$ and $\vec V_{(2)} = \frac{\partial}{\partial y}$. Integral curves of $\vec V_{(1)}$ are straight lines parallel to the $x$-axis, whereas integral curves of $\vec V_{(2)}$ are lines parallel to the $y$-axis. These two (obviously linearly independent) vector fields together define a foliation of $\mathbb{R}^3$ by the planes of equation $z = $ constant, which are 2-dimensional submanifolds $\mathbb{R}^2$ of $\mathbb{R}^3$. Note that we have
$$ \left[ \vec V_{(1)}, \vec V_{(2)} \right] = 0 , $$
(3.3.73)
Example (b): let us now consider the sphere $S \subset \mathbb{R}^3$. By introducing spherical coordinates with $\phi_z \in (0, 2\pi)$ the angle around the $z$-axis, one can immediately see that the vector field $\vec l_z = \frac{d}{d\phi_z}$ generates circles of constant radius on the $xy$-planes. Likewise, by choosing the axes $x$ and $y$ we can also define the analogous vector fields $\vec l_x = \frac{d}{d\phi_x}$ and $\vec l_y = \frac{d}{d\phi_y}$. In any frame, the three vector fields $\vec l_z = \frac{d}{d\phi_z}$, $\vec l_x = \frac{d}{d\phi_x}$ and $\vec l_y = \frac{d}{d\phi_y}$ generate spheres of constant radius, which are 2-dimensional submanifolds of $\mathbb{R}^3$. Note that this time
$$ \left[ \vec l_{(i)}, \vec l_{(j)} \right] \neq 0 , $$
(3.3.74)
and the three vector fields $\vec l_{(i)}$ do not define proper coordinates in $\mathbb{R}^3$. Moreover, it is obvious that, at each $P$, there must exist real coefficients $a_{ij}$ such that
$$ \vec l_{(i)} = a_{i1}\, \frac{\partial}{\partial x} + a_{i2}\, \frac{\partial}{\partial y} + a_{i3}\, \frac{\partial}{\partial z} , $$
(3.3.75)
but these coefficients differ at different points, so that the $\vec l_{(i)}$ are independent vector fields.
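Example (b) can be made explicit with a short computation. The sketch below assumes the standard Cartesian expressions of the rotation generators, $\vec l_x = y\, \partial_z - z\, \partial_y$, $\vec l_y = z\, \partial_x - x\, \partial_z$, $\vec l_z = x\, \partial_y - y\, \partial_x$ (a convention chosen here, not taken from the notes), and uses the fact that a linear field $V^i = M^i{}_j\, x^j$ is represented by its matrix $M$, with the commutator of two such fields represented by $M_B M_A - M_A M_B$. All names are hypothetical.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def field(M, p):
    """Components of the linear vector field V^i = M^i_j x^j at the point p."""
    return [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]

def commutator_matrix(MA, MB):
    """Matrix of [V_A, V_B] for linear fields: M_B M_A - M_A M_B."""
    AB, BA = matmul(MA, MB), matmul(MB, MA)
    return [[BA[i][j] - AB[i][j] for j in range(3)] for i in range(3)]

# matrices of the three rotation generators (assumed sign convention)
LX = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]   # l_x = (0, -z, y)
LY = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]   # l_y = (z, 0, -x)
LZ = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]   # l_z = (-y, x, 0)

print(commutator_matrix(LX, LY))          # equals minus LZ in this convention

# at any given point the three vectors are linearly dependent:
# x l_x + y l_y + z l_z = 0, with point-dependent coefficients
p = [1, 2, 3]
lx, ly, lz = field(LX, p), field(LY, p), field(LZ, p)
print([p[0] * lx[i] + p[1] * ly[i] + p[2] * lz[i] for i in range(3)])
```

The commutator closes on the third generator, as required by Eq. (3.3.71), while the pointwise relation $x\, \vec l_x + y\, \vec l_y + z\, \vec l_z = 0$ shows why three independent fields can generate submanifolds of dimension two only.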
Invariances and Lie algebras
Let us consider a tensor field $T$ of type $(p, q)$ on a manifold $M$. A vector field $\vec V$ is an invariance, or symmetry, of $T$ if
$$ \mathcal{L}_{\vec V} T = 0 , $$
(3.3.76)
that is, $T$ is constant along congruences of $\vec V$.
The next important result is that vector fields leaving a set of tensors invariant generate a Lie
algebra.
Theorem:
If we have a set of (linearly independent) tensors $T^{(k)}$, $k = 1, \ldots, q$, and a set of (linearly independent) vectors $\vec V_{(i)}$, $i = 1, \ldots, p$, such that
$$ \mathcal{L}_{\vec V_{(i)}} T^{(k)} = 0 , $$
(3.3.77)
(3.3.78)
It is a particularly important case for physics, since the above relation implies that Lie dragging points along congruences of $\vec V$ preserves lengths and angles.
Example (a): the Euclidean metric in $\mathbb{R}^3$,
$$ g = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} , $$
(3.3.79)
x , y , z ,
1 0 0
g= 0 1 0 ,
0 0 1
(3.3.80)
x = x ,
(3.3.81)
(3.3.82)
and introduced tensors as mathematical objects (whose components) transform properly under
the action of SO(3, 1). In differential geometry, tensors come along with the very definition of
a manifold, and do not depend on any choice of coordinates (observers). This however does not
prevent us from describing a family of preferred observers, which can be identified with a space-time
foliation generated by Killing vectors $\vec V$. In this perspective, Eq. (3.3.82) is therefore replaced by the Killing condition
$$ \mathcal{L}_{\vec V}\, g = 0 , $$
(3.3.83)
3.4
Differential forms
So far, we have defined tensors and, by means of a metric, also lengths and angles. We are still
missing two important geometrical quantities, namely volume and area.
Suggested bibliography: B. Schutz, Geometrical methods of mathematical physics, Cambridge
Univ. Press (1980). Selected chapters: 4.1-4.3.
3.4.1 p-forms
(3.4.1)
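$$ \frac{1}{3!} \left( w_{ijk} + w_{jki} + w_{kij} - w_{ikj} - w_{kji} - w_{jik} \right) \equiv w_{[ijk]} , $$
(3.4.2)
$$ \frac{1}{p!} \left( w_{i_1 i_2 \ldots i_p} + \text{permutations} \right) \equiv w_{[i_1 i_2 \ldots i_p]} . $$
(3.4.3)
$$ C^n_p = \frac{n!}{p!\, (n - p)!} , \qquad p \le n , \qquad \text{with} \qquad \sum_{p=0}^{n} C^n_p = 2^n . $$
(3.4.4)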
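The antisymmetrisation in Eq. (3.4.3) and the counting in Eq. (3.4.4) can be spelled out in a few lines. This is a minimal sketch under assumptions made here: components are stored in a dictionary keyed by index tuples (indices starting from 0), missing entries count as zero, and the function names are hypothetical.

```python
import itertools
import math

def parity(perm):
    """+1 for even permutations of range(len(perm)), -1 for odd ones."""
    perm = list(perm)
    sign = 1
    for i in range(len(perm)):
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]   # each swap flips the sign
            sign = -sign
    return sign

def antisymmetrize(w, n, p):
    """Return w_[i1...ip] for components w given as {index tuple: value}."""
    out = {}
    for idx in itertools.product(range(n), repeat=p):
        total = 0.0
        for perm in itertools.permutations(range(p)):
            permuted = tuple(idx[k] for k in perm)
            total += parity(perm) * w.get(permuted, 0.0)
        out[idx] = total / math.factorial(p)
    return out

a = antisymmetrize({(0, 1): 1.0}, n=2, p=2)
print(a[(0, 1)], a[(1, 0)], a[(0, 0)])

# independent components of a p-form in n dimensions: strictly increasing indices
n = 4
counts = [len(list(itertools.combinations(range(n), p))) for p in range(n + 1)]
print(counts, sum(counts))
```

For $n = 4$ the counts are the binomial coefficients $1, 4, 6, 4, 1$, which add up to $2^4 = 16$.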
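Clearly, for $p > n$, this construction fails and there are no $p$-forms of that type. Moreover, there is only one independent $n$-form. It is also easy to see that $p$-forms at a point $P \in M$ form a vector subspace of the space of $(0, p)$ tensors $(T_P^*)^p$, and that a $C^n_p$-dimensional basis is given by
$$ \tilde w_A = \frac{1}{p!}\, w_{i_1 i_2 \ldots i_p}\, \tilde e^{i_1} \wedge \tilde e^{i_2} \wedge \ldots \wedge \tilde e^{i_p} , $$
(3.4.5)
where the wedge $\wedge$ stands for the skew symmetric outer product. For example, a basis of 2-forms is given by
$$ \tilde w_A = \frac{1}{2!}\, w_{ij}\, \tilde e^i \wedge \tilde e^j = \frac{1}{2}\, w_{ij} \left( \tilde e^i \otimes \tilde e^j - \tilde e^j \otimes \tilde e^i \right) . $$
(3.4.6)
The wedge product can be used to compose a $p$-form with a $q$-form, and obtain
$$ (p\text{-form}) \wedge (q\text{-form}) = (p + q)\text{-form} , $$
(3.4.7)
provided $p + q \le n$.
Moreover, upon applying a $p$-form to a vector $\vec V \in T_P$, we obtain
$$ \tilde p(\vec V, \cdot, \ldots, \cdot) = \frac{1}{p!}\, w_{i_1 i_2 \ldots i_p} \left( \tilde e^{i_1} \wedge \tilde e^{i_2} \wedge \ldots \wedge \tilde e^{i_p} \right) (V^k \vec e_k) $$
$$ = \frac{1}{p!} \left( w_{i_1 i_2 \ldots i_p}\, V^k\, \tilde e^{i_1}(\vec e_k)\, \tilde e^{i_2} \wedge \ldots \wedge \tilde e^{i_p} + \text{permutations} \right) $$
$$ = \frac{1}{(p - 1)!}\, V^k\, w_{k i_2 \ldots i_p}\, \tilde e^{i_2} \wedge \tilde e^{i_3} \wedge \ldots \wedge \tilde e^{i_p} , $$
(3.4.8)
which is a $(p - 1)$-form, and we used the dual basis of 1-forms to obtain the final expression.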
3.4.2
There is a reason we used the index $A$ to denote a $p$-form above: they can be used to define the area of a (sub)manifold. For example, given two vectors $\vec v$ and $\vec w$, we can naturally define the area of the parallelogram they identify as
$$ A = \vec v \times \vec w = |\vec v|\, |\vec w| \sin\theta , $$
(3.4.9)
(3.4.10)
and
$$ A(\vec v, \vec w) = -A(\vec w, \vec v) . $$
(3.4.11)
Given the interpretation of $A$ as the area of a parallelogram, the antisymmetric property appears now necessary to ensure that $A(\vec v, a\, \vec v) = a\, A(\vec v, \vec v) = 0$ for all $\vec v$ and $a \in \mathbb{R}$, without imposing further restrictions on the vectors the area 2-form acts upon.
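A two-dimensional toy version of the area 2-form makes these properties explicit. In the sketch below, `area_form` is a hypothetical name for $\tilde e^1 \wedge \tilde e^2$ acting on two vectors given by Cartesian components, and the comparison with $|\vec v|\, |\vec w| \sin\theta$ assumes the Euclidean metric.

```python
import math

def area_form(v, w):
    """Area 2-form of the plane applied to the ordered pair (v, w)."""
    return v[0] * w[1] - v[1] * w[0]

v, w = (2.0, 0.0), (1.0, 1.0)

# antisymmetry, Eq. (3.4.11)
print(area_form(v, w), area_form(w, v))

# parallel vectors span zero area
a = 5.0
print(area_form(v, (a * v[0], a * v[1])))

# comparison with |v| |w| sin(theta) for the Euclidean metric
theta = math.atan2(w[1], w[0]) - math.atan2(v[1], v[0])
print(math.hypot(*v) * math.hypot(*w) * math.sin(theta))
```

Note that the sign of the result keeps track of the orientation of the pair $(\vec v, \vec w)$, which the unsigned area $|\vec v|\, |\vec w| \sin\theta$ with $0 \le \theta \le \pi$ does not.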
Given a manifold $M$ of dimension $n$, a polyhedron is defined by $n$ linearly independent vectors (which, for infinitesimal polyhedra, we can view as belonging to the tangent space of the same point) and its volume is simply a real number. We could therefore associate the volume to a type $(0, n)$ tensor. However, if we do not wish to restrict the $n$ vectors and still ensure that the volume vanishes if (at least) two of them are linearly dependent (roughly speaking, parallel; see footnote 14), we can instead define the volume as an $n$-form. Let us denote these $n$ vectors as $\vec x_{(k)}$, with $k = 1, 2, \ldots, n$. Since they all belong to the same $T_P$, we can expand them on the same coordinate basis,
$$ \vec x_{(k)} = dx^i_{(k)}\, \frac{\partial}{\partial x^i} , $$
(3.4.12)
$$ \tilde\omega = f\, \tilde e^1 \wedge \tilde e^2 \wedge \ldots \wedge \tilde e^n , $$
(3.4.13)
where $f \in \mathbb{R}$. We then define the volume of the infinitesimal polyhedron (or cell) as
xk
(3.4.15)
If the n-form is a field in the chart (U M, = xi ), we can define the volume of U as simply
Z
Z
f dx1 dx2 dxn ,
(3.4.16)
=
V =
U
(U )
where now f = f 1 = f (xi ) for P (xi ) U . It is important to check this expression is actually
a scalar. So let us consider a change of coordinates = xi y i = y i (xi ) = in U and, for
simplicity, assume n = 2. We then have
1 2
Z
Z
Z
x x
x1 x2
1 2
1
2
f (x , x ) dx dx =
=
2 1 f (y 1 , y 2 ) dy 1 dy 2 ,
(3.4.17)
1 y 2
y
y y
(U )
(U )
U
14
=
U
f (y) J(y) dn y ,
(3.4.18)
(U )
/ TP , which
defined. We therefore take the volume form
and apply it to a vector ~v TP
(S)
means that ~v is not a linear combination of vectors of TP 15 . According to the expression (3.4.8),
(S)
this defines the (n 1)-form A =
(~v , , . . . , ), which we can now apply to n 1 vectors w
~ (k) TP ,
and obtain the area of the infinitesimal cell
w
(~v , w
~ (1) , w
~ (2) , . . . , w
~ (n1) ) = A(
~ (1) , . . . , w
~ (n1) )
1
v f e1 (w
~ (1) ) e2 (w
~ (1) ) . . . en1 (w
~ (n1) )
=
(n 1)!
1
v f dx1 dx2 dxn1 dA ,
(3.4.19)
=
(n 1)!
i
in which we assumed ~v = v ~en and w(k)
= ki again for simplicity. The area of a portion S is
then given by the integral
Z
Z
Note that under a change of coordinates, the above quantity does not change, since
A J (n1) A ,
(3.4.21)
where J (n1) is the Jacobian determinant of the transformation restricted on the hypersurface S.
Area and volume from the metric
As we expect, volume and area elements can be made compatible with the metric.
Let us assume there is a metric tensor field g on the manifold M of dimension n, and that g is
given the canonical form at the point P ,
$$ g_{ij}(P) = \pm\delta_{ij} . $$
(3.4.22)
$$ \tilde\omega_g = \tilde e^1 \wedge \ldots \wedge \tilde e^n . $$
(3.4.23)
15
One can naively think of ~v as orthogonal to S, although the notion of orthogonality again requires a metric,
which we do no have in general at our disposal.
(3.4.24)
we obtain that
g transforms according to Eq. (3.4.18), that is
g = J
g = J
1 ...
n ,
(3.4.25)
where ~i = y i , and
i (~j ) = ji . Now, observe that the determinant of a canonical metric is 1,
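and from the transformation law
$$ g' = \Lambda^T g\, \Lambda , \qquad \text{with} \quad \Lambda^i{}_k = \frac{\partial y^i}{\partial x^k} , $$
(3.4.26)
we obtain
$$ \det(g') = \det(\Lambda^T g\, \Lambda) = \det(g) \det(\Lambda^T \Lambda) = \det(g) \det{}^2(\Lambda) = \det(g)\, J^2 = \pm J^2 , $$
(3.4.27)
from which
$$ J = \sqrt{|\det(g')|} . $$
(3.4.28)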
g =
(U )
|det(g )| dy 1 dy 2 dy n ,
(3.4.29)
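The role of the factor $\sqrt{|\det(g')|}$ in the volume integral can be illustrated numerically. The sketch below is not taken from the notes: the function names, the midpoint rule and the two test metrics are choices made here. It integrates $\sqrt{|\det g|}$ over a coordinate rectangle for the flat plane in polar coordinates $(r, \theta)$, where $g = \mathrm{diag}(1, r^2)$, and for the unit sphere in coordinates $(\theta, \varphi)$, where $g = \mathrm{diag}(1, \sin^2\theta)$.

```python
import math

def volume(metric, bounds, steps=200):
    """Midpoint-rule estimate of int sqrt|det g| dy^1 dy^2 over a rectangle."""
    (a1, b1), (a2, b2) = bounds
    d1, d2 = (b1 - a1) / steps, (b2 - a2) / steps
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            y = (a1 + (i + 0.5) * d1, a2 + (j + 0.5) * d2)
            g = metric(y)
            det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
            total += math.sqrt(abs(det)) * d1 * d2
    return total

def polar_metric(y):
    """Euclidean metric of the plane in polar coordinates y = (r, theta)."""
    r = y[0]
    return [[1.0, 0.0], [0.0, r * r]]

def sphere_metric(y):
    """Metric of the unit sphere in coordinates y = (theta, phi)."""
    theta = y[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

disc = volume(polar_metric, ((0.0, 2.0), (0.0, 2 * math.pi)))
sphere = volume(sphere_metric, ((0.0, math.pi), (0.0, 2 * math.pi)))
print(disc, math.pi * 2.0 ** 2)
print(sphere, 4 * math.pi)
```

The first result reproduces the area $\pi R^2$ of a disc of radius $R = 2$, which in Cartesian coordinates would be computed with $\sqrt{|\det g|} = 1$; the second approximates the area $4\pi$ of the unit sphere.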
(3.4.30)
Ag =
()
(3.4.31)
where, for simplicity, we assumed the metric locally takes the form
gij =
"
(n1)
gij
0
1
(3.4.32)
The overall conclusion is that we can use the metric to measure the length of a curve as well as
the volume of any open sets of (sub)manifolds.
3.5
Covariant derivatives
On a manifold without the notion of angles (that is, without a metric), the only definition of
parallelism can be given at a point P : two vectors of $T_P$ are parallel if they are linearly dependent.
But one then needs a way to compare vectors belonging to the tangent spaces at different points.
One is in fact free to define this concept irrespective of the metric. In particular, one can define
how to parallelly transport a vector along a given path.
Let us consider again the example of the sphere S embedded in $\mathbb{R}^3$. Since the latter is a Euclidean
space, there is a natural notion of parallel transport: a vector is parallelly transported if its angles
with the Cartesian coordinate vectors remain constant. Consequently, a vector transported along a
closed path returns to itself. From this notion of parallelism in $\mathbb{R}^3$, we can induce a parallel
transport of vectors on S. However, by transporting vectors along loops, we now find that they do not
return to themselves, in general.
Suggested bibliography: B. Schutz, Geometrical methods of mathematical physics, Cambridge
Univ. Press (1980). Selected chapters: 6.1-6.12.
3.5.1
0
We then define the covariant derivative of the vector field $\vec W$ with respect to $\vec V$ at the point
$P(\lambda_0)$ as the vector given by the limiting process
$$\nabla_{\vec V}\vec W = \lim_{\epsilon\to 0} \frac{\vec W_{-\epsilon}(\lambda_0) - \vec W(\lambda_0)}{\epsilon} ,$$
(3.5.1)
where $\vec W_{-\epsilon}(\lambda_0)$ denotes the vector $\vec W(\lambda_0+\epsilon)$ parallelly transported back to $P(\lambda_0)$. The result is a vector, by definition, and vanishes if the parallelly transported vector coincides
with the original vector in P . Note that, as for the Lie derivative, we are here transporting
the vector $\vec W$ back to the point $P(\lambda_0)$ from $P(\lambda_0+\epsilon)$. However, unlike the Lie derivative, we do not
need a whole congruence here but just one curve.
Since functions do not identify a direction, it is natural to define the covariant derivative of a
scalar to coincide with the Lie derivative, and thus with the total derivative
$$\nabla_{\vec V} f = \frac{df}{d\lambda} .$$
(3.5.2)
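For general vectors and tensors, without specifying the actual transportation rule, we can still
require that the covariant derivative satisfy some formal properties. First of all, we want the following
Leibniz rules to hold:
$$\nabla_{\vec V}(f\,\vec W) = f\,\nabla_{\vec V}\vec W + \vec W\,\frac{df}{d\lambda}$$
(3.5.3)
$$\nabla_{\vec V}(\vec A\otimes\vec B) = \vec A\otimes\nabla_{\vec V}\vec B + \nabla_{\vec V}\vec A\otimes\vec B$$
(3.5.4)
$$\nabla_{\vec V}\left[\tilde\omega(\vec A)\right] = \left(\nabla_{\vec V}\tilde\omega\right)(\vec A) + \tilde\omega\left(\nabla_{\vec V}\vec A\right) .$$
(3.5.5)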
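We also require that a change of parameterization of the curve, that is $\mu = \mu(\lambda)$, does
not affect the derivative. Let $\vec V = d/d\lambda$ and $\vec V' = d/d\mu$ be the tangent vectors for the two parameterizations, respectively,
$$\frac{d}{d\mu} = \frac{d\lambda}{d\mu}\,\frac{d}{d\lambda} \equiv h\,\frac{d}{d\lambda} .$$
(3.5.6)
$$\nabla_{h\vec V}\vec W = h\,\nabla_{\vec V}\vec W ,$$
(3.5.7)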
A
~
~ +W
~
V
P
(3.5.8)
so that
$$\nabla_{f\vec V + g\vec W} = f\,\nabla_{\vec V} + g\,\nabla_{\vec W} .$$
(3.5.9)
(3.5.10)
Affine connection
The formal properties introduced above allow one to obtain the components of the covariant
derivative of a vector field in terms of the so-called Christoffel symbols (or affine connection).
We start by expanding both $\vec V(\lambda_0)$ and the difference between the transported vector and $\vec W(\lambda_0)$ on a basis
of $T_{P(\lambda_0)}$,
$$\nabla_{\vec V}\vec W = \nabla_{V^i\vec e_i}\left(W^j\vec e_j\right) = V^i\,\nabla_{\vec e_i}\left(W^j\vec e_j\right) = V^i\left[\left(\nabla_{\vec e_i}W^j\right)\vec e_j + W^j\left(\nabla_{\vec e_i}\vec e_j\right)\right] .$$
(3.5.11)
The second term in brackets above is called the affine connection (or Christoffel symbols),
$$\nabla_{\vec e_i}\vec e_j = \Gamma^k_{ji}\,\vec e_k ,$$
(3.5.12)
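and, for fixed i and j, is obviously a vector in $T_{P(\lambda_0)}$. However, contrary to what its index notation might
suggest, $\Gamma$ is not a type (1, 2) tensor. In fact, consider a reference frame with coordinates $\{x^i\}$
and coordinate basis $\{\vec e_i = \partial/\partial x^i\}$. We then see that, under a change of coordinates $x^{i'} = \Lambda^{i'}{}_j\,x^j$, the
affine connection transforms according to
$$\Gamma^{k'}_{j'i'} = \Lambda^{k'}{}_k\,\Lambda^i{}_{i'}\,\Lambda^j{}_{j'}\,\Gamma^k_{ji} + \Lambda^{k'}{}_k\,\Lambda^i{}_{i'}\left(\partial_i\Lambda^k{}_{j'}\right) .$$
(3.5.13)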
~ = Vi
V~ W
(3.5.14)
(3.5.15)
Since the vector $\vec V$ enters only multiplicatively (that is, by contraction), it is customary (although
quite improperly) to also call covariant derivative of a vector $\vec W$ the type (1, 1) tensor
(3.5.10), whose components are now given by
$$\nabla_i W^k = \frac{\partial W^k}{\partial x^i} + \Gamma^k_{ji}\,W^j .$$
(3.5.16)
Several different notations are in use for these components, for example
$$\nabla_i W^k = W^k{}_{;i} = W^k{}_{,i} + \Gamma^k_{ji}\,W^j .$$
(3.5.17)
(3.5.19)
and, finally,
$$\nabla_i W_k = \frac{\partial W_k}{\partial x^i} - \Gamma^j_{ki}\,W_j .$$
(3.5.20)
By the same procedure, covariant derivatives of higher rank tensors are obtained.
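As a check of Eq. (3.5.16), here is a minimal worked example, assuming the Euclidean plane in polar coordinates $(r, \theta)$ with the standard non-vanishing connection components $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ (quoted here for illustration, not derived in the text). The constant Cartesian field $\vec W = \partial/\partial x$ has non-constant polar components, yet all of its covariant derivatives vanish:

```latex
% Polar components of the constant field \partial/\partial x (assumed example)
W^r = \cos\theta , \qquad W^\theta = -\frac{\sin\theta}{r} ,

\nabla_r W^r = \partial_r W^r = 0 ,

\nabla_r W^\theta = \partial_r W^\theta + \Gamma^\theta_{\theta r}\,W^\theta
                  = \frac{\sin\theta}{r^2} - \frac{\sin\theta}{r^2} = 0 ,

\nabla_\theta W^r = \partial_\theta W^r + \Gamma^r_{\theta\theta}\,W^\theta
                  = -\sin\theta + \sin\theta = 0 ,

\nabla_\theta W^\theta = \partial_\theta W^\theta + \Gamma^\theta_{r\theta}\,W^r
                       = -\frac{\cos\theta}{r} + \frac{\cos\theta}{r} = 0 .
```

The partial derivatives alone do not vanish; the connection terms compensate for the rotation of the polar basis vectors from point to point.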
Symmetric connection
An affine connection is symmetric if
$$\Gamma^k_{ij} = \Gamma^k_{ji} ,$$
(3.5.21)
which implies the remarkable relation with the Lie derivative
$$\nabla_{\vec V}\vec W - \nabla_{\vec W}\vec V = \left[\vec V, \vec W\right] = \mathcal{L}_{\vec V}\vec W .$$
(3.5.22)
moving along $\vec W$ first and then $\vec V$. If the connection is not symmetric,
$$T_{ji}{}^{k} = \Gamma^k_{ij} - \Gamma^k_{ji} \neq 0 ,$$
(3.5.23)
the two paths yield in general two different images of P , and one says that parallelly transported
vectors are subject to torsion.
3.5.2
Geodesics
A geodesic is a preferred curve along which the tangent vector to the curve itself is parallelly
transported. This notion allows us to extend to a general manifold the concept of straight line and,
eventually, of extremal curve (on metric manifolds).
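Let $\vec V = d/d\lambda$ be the tangent vector to a curve $\gamma$ parameterized by $\lambda \in \mathbb{R}$. Then, $\gamma$ is a geodesic
if $\vec V$ satisfies
$$\nabla_{\vec V}\vec V\,\big|_P = 0 , \qquad \forall\,P \in \gamma ,$$
(3.5.24)
and $\lambda$ is then called an affine parameter. From Eq. (3.5.7) and the Leibniz rule (3.5.3), it follows that this definition is invariant under a change of parameterization with constant h (a general remapping preserves the curve but adds a term proportional to $\vec V$ to the right-hand side), which implies that the same geodesic can be described by different affine parameters. Eq. (3.5.24) can be written in a local coordinate frame, in which $\gamma \subset M$ is mapped into
$x^k = x^k(\lambda) \in \mathbb{R}^n$, as
$$\left(\nabla_{\vec V}\vec V\right)^k = V^j\left(\frac{\partial V^k}{\partial x^j} + \Gamma^k_{ij}\,V^i\right) = \frac{dV^k}{d\lambda} + \Gamma^k_{ij}\,V^i V^j = \frac{d^2x^k}{d\lambda^2} + \Gamma^k_{ij}\,\frac{dx^i}{d\lambda}\frac{dx^j}{d\lambda} = 0 ,$$
(3.5.25)
which is a set of n second-order differential equations for the variables $x^k = x^k(\lambda)$.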
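The geodesic equation can be integrated numerically. The following is a minimal sketch, assuming the unit 2-sphere with coordinates $(\theta, \phi)$ and its standard Christoffel symbols $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$, $\Gamma^\phi_{\theta\phi} = \cot\theta$ (quoted here, not derived in the text). The function names and the Runge-Kutta integrator are illustrative choices. The sketch checks that the solution stays on a great circle and that the norm of the tangent vector is conserved.

```python
import math

def geodesic_rhs(state):
    """First-order form of d^2x^k/dl^2 = -Gamma^k_ij (dx^i/dl)(dx^j/dl) on the unit sphere.

    state = (theta, phi, dtheta, dphi); the Christoffel symbols are the assumed
    standard ones: Gamma^theta_{phi phi} = -sin cos, Gamma^phi_{theta phi} = cot.
    """
    theta, phi, dtheta, dphi = state
    ddtheta = math.sin(theta) * math.cos(theta) * dphi ** 2
    ddphi = -2.0 * (math.cos(theta) / math.sin(theta)) * dtheta * dphi
    return (dtheta, dphi, ddtheta, ddphi)

def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2.0 * b + 2.0 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate_geodesic(state, length, steps):
    """Integrate the geodesic equation over an affine-parameter interval `length`."""
    h = length / steps
    for _ in range(steps):
        state = rk4_step(geodesic_rhs, state, h)
    return state

def embed(theta, phi):
    """Position on the unit sphere in R^3, used only to test the result."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

# Start on the equator with unit speed; the great circle through this point
# with this velocity lies in the plane orthogonal to `normal`.
start = (math.pi / 2.0, 0.0, 0.6, 0.8)
normal = (0.0, 0.6, 0.8)
end = integrate_geodesic(start, 2.0, 2000)

position = embed(end[0], end[1])
offset = sum(a * b for a, b in zip(position, normal))          # distance from the plane
speed2 = end[2] ** 2 + math.sin(end[0]) ** 2 * end[3] ** 2     # g_ij V^i V^j
```

Both `offset` (which should stay close to zero) and `speed2` (which should stay close to one) test the integration: geodesics of the sphere are great circles, and an affine parameter keeps the norm of the tangent vector constant.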
Normal frames
(3.5.26)
(0)
will admit a unique solution with ~ei (P ) = ~ei (see Fig. 3.26). We can then use any of these n
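$$\frac{d^2x^i}{d\lambda_{(j)}^2} = \frac{\partial^2 x^i}{(\partial x^j)^2} = 0 ,$$
(3.5.27)
$$\Gamma^k_{ij}\,\frac{dx^i}{d\lambda_{(l)}}\,\frac{dx^j}{d\lambda_{(l)}} = 0 ,$$
(3.5.28)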
along each of the n geodesics of parameters $\lambda_{(l)}$, $l = 1, \ldots, n$. Since the n directions are arbitrary,
we can infer that at the point P one must have
$$\Gamma^k_{ij}\big|_P = 0 ,$$
(3.5.29)
and the system is then called (Gaussian) normal around P . It is now easier to see that $\Gamma$ defines
how the coordinate basis vectors are parallelly transported along the coordinate directions. In fact,
from Eq. (3.5.29), we obtain that the corresponding coordinate basis at P satisfies
$$\nabla_{\vec e_i}\vec e_j\big|_P = 0 ,$$
(3.5.30)
(3.5.31)
and the covariant derivatives coincide with the Lie derivatives along the coordinate vector fields
(at P ). Further, note that it is in general not possible to define a reference frame in which $\Gamma = 0$
in an open set, or, equivalently, one in general has
$$\frac{\partial\Gamma^k_{ij}}{\partial x^l}\bigg|_P \equiv \Gamma^k_{ij,l}\big|_P \neq 0 ,$$
(3.5.33)
Another useful formula is the one which gives the parallelly transported vector starting from an
initial vector $\vec A$ at $P = P(\lambda)$ along the curve of direction $\vec V = d/d\lambda$. Let $Q = Q(\lambda+\epsilon)$ be a second
point on the curve, then
$$\vec A(Q) = \vec A(P) + \epsilon\,\nabla_{\vec V}\vec A(P) + \frac{1}{2}\,\epsilon^2\,\nabla_{\vec V}\nabla_{\vec V}\vec A(P) + \ldots = e^{\epsilon\nabla_{\vec V}}\vec A(P) .$$
(3.5.34)
It is immediate to understand the above expression if we work in the normal frame around P , in
which Eq. (3.5.32) holds and the geodesic map becomes the exponential map
$$A^i(Q) = e^{\epsilon V^j\partial_j}A^i\big|_P = A^i(P) + \epsilon\,V^j\partial_j A^i\big|_P + O(\epsilon^2) = A^i(P) ,$$
(3.5.35)
where we used the compact notation $\vec e_i = \partial_i$ for the coordinate basis and the fact that partial
derivatives of a vector defined at a point obviously vanish. The exponential map then simply
changes the argument of the vector field $\vec A$ from $\lambda$ to $\lambda+\epsilon$, without affecting the components,
which shows the two vectors are indeed parallel in the naive sense. In a generic reference frame,
we must replace the partial derivative with the covariant derivative, with a connection which will
mix the coordinate basis vectors and the components of the vector according to our rule of parallel
transport.
3.5.3
The Riemann tensor is the mathematical quantity which allows one to define the curvature of a
manifold as the effect of parallel transport of vector fields along loops.
(3.5.36)
We then repeat the same process reversing the order of $\vec V$ and $\vec W$ along which we transport $\vec A$ and
obtain
$$\vec A_{VW} = e^{\epsilon\nabla_{\vec V}}\,e^{\sigma\nabla_{\vec W}}\vec A .$$
(3.5.37)
For infinitesimally small displacements $|\epsilon| \ll 1$ and $|\sigma| \ll 1$, respectively, along $\vec V$ and $\vec W$, the
difference between these two resulting vectors will be the vector
~=A
~ W V A
~ V W = ~ , ~ A
~ + O(3) ,
A
(3.5.38)
W
V
~ ,W
~ )A
~ = ~ , ~ A
~~ ~ A
~ ,
R(V
W
V
[V ,W ]
(3.5.39)
i
h
~ ,W
~ )A
~ R(V
~ ,W
~ )i j Aj ~ei ,
R(V
(3.5.40)
~ = R(V
~ ,W
~ )A
~ + O(3) ,
A
(3.5.41)
i
Ai = Rjkl
V j W k Al + O(3) .
(3.5.42)
or
This yields the precise mathematical meaning of the concept of intrinsic curvature of a manifold:
whenever the Riemann tensor does not vanish, parallelly transporting a vector along a closed path
does not return the vector to its initial value. Conversely, if there exist loops such that vectors
parallelly transported along them do not return to themselves, the manifold is curved. Note that
this definition of curvature is intrinsic since it does not require embedding (viewing) the manifold
M into (from) a larger space. An equivalent definition of intrinsic curvature involves measuring
the sum of the internal angles of a triangle, and was proposed by Gauss long ago as an experiment
to measure the Earth's curvature.
A simple example is again given by the sphere in $\mathbb{R}^3$: one can of course define the extrinsic
curvature radius R (and extrinsic curvature 1/R) from the defining condition $x^2 + y^2 + z^2 = R^2$.
However, the same conclusion can be drawn without referring to $\mathbb{R}^3$ at all, by simply noting that
a vector transported along a loop starting from (say) the North pole, reaching the equator along a
meridian, moving along the equator a distance $R\,\alpha$, and coming back to the North pole along a
meridian, will appear rotated by the angle $\alpha$. In this case, the intrinsic and extrinsic curvature radii
coincide. However, in general, the two quantities may be different.
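The rotation of a vector transported around a loop on the sphere can be checked numerically. The sketch below assumes the unit sphere and, instead of the triangular loop described above, a circle of constant colatitude $\theta_0$ (which avoids the coordinate singularity at the pole). The Christoffel symbols are the standard ones for the round metric and the function names are illustrative. For such a loop the rotation angle should equal the enclosed area $2\pi(1 - \cos\theta_0)$, up to multiples of $2\pi$.

```python
import math

def transport_rhs(theta0, w):
    """d(W^theta, W^phi)/dphi for parallel transport along the circle theta = theta0.

    Solves dW^k/dphi + Gamma^k_{phi j} W^j = 0 with the assumed unit-sphere symbols
    Gamma^theta_{phi phi} = -sin cos and Gamma^phi_{theta phi} = cot.
    """
    w_theta, w_phi = w
    return (math.sin(theta0) * math.cos(theta0) * w_phi,
            -(math.cos(theta0) / math.sin(theta0)) * w_theta)

def transport_around_circle(theta0, w0, steps=4000):
    """Parallelly transport w0 once around the circle (phi from 0 to 2*pi) with RK4."""
    h = 2.0 * math.pi / steps
    w = w0
    for _ in range(steps):
        k1 = transport_rhs(theta0, w)
        k2 = transport_rhs(theta0, (w[0] + 0.5 * h * k1[0], w[1] + 0.5 * h * k1[1]))
        k3 = transport_rhs(theta0, (w[0] + 0.5 * h * k2[0], w[1] + 0.5 * h * k2[1]))
        k4 = transport_rhs(theta0, (w[0] + h * k3[0], w[1] + h * k3[1]))
        w = (w[0] + h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0,
             w[1] + h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0)
    return w

theta0 = 1.0                      # colatitude of the loop, in radians
w_start = (1.0, 0.0)              # unit vector along the theta direction
w_end = transport_around_circle(theta0, w_start)

# Scalar products taken with the metric g = diag(1, sin^2(theta0)).
cos_rotation = w_start[0] * w_end[0] + math.sin(theta0) ** 2 * w_start[1] * w_end[1]
norm2_end = w_end[0] ** 2 + math.sin(theta0) ** 2 * w_end[1] ** 2
enclosed_area = 2.0 * math.pi * (1.0 - math.cos(theta0))
```

Comparing `cos_rotation` with `math.cos(enclosed_area)` tests the statement that the vector is rotated by an angle equal to the enclosed area (in units of the squared radius), while `norm2_end` checks that the length of the vector is preserved.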
From the definition (3.5.40), it is easy to see that the Riemann tensor R has the following
properties
$$R(\vec V, \vec W)(f\vec A) = f\,R(\vec V, \vec W)\vec A$$
(3.5.43)
$$R(f\vec V, \vec W)\vec A = R(\vec V, f\vec W)\vec A = f\,R(\vec V, \vec W)\vec A ,$$
(3.5.44)
(3.5.45)
3.5.4
Metric connection
So far we have not specified any affine connection. But we are really interested in the case in which
parallel transport preserves lengths and angles, which requires that the manifold M be endowed with a
metric tensor g.
Let us then consider two vectors $\vec A$ and $\vec B$, and assume they are parallelly transported along a
curve of tangent $\vec V$, that is $\nabla_{\vec V}\vec A = \nabla_{\vec V}\vec B = 0$. It is natural to demand that the scalar product
between these two vectors does not change along the curve,
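$$\nabla_{\vec V}\left[g(\vec A, \vec B)\right] = 0 , \qquad \forall\,\vec A, \vec B, \vec V \ \ \text{such that} \ \ \nabla_{\vec V}\vec A = \nabla_{\vec V}\vec B = 0 ,$$
(3.5.46)
$$\nabla_{\vec V}\,g = 0 , \qquad \forall\,\vec V ,$$
(3.5.47)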
(3.5.48)
Upon expressing this equation in a specified coordinate frame, one finds that it is tantamount to
an equation for the affine connection, namely
$$\Gamma^k_{ij} = \frac{1}{2}\,g^{kl}\left(g_{il,j} + g_{jl,i} - g_{ij,l}\right) .$$
(3.5.49)
Since g is symmetric, one can immediately see that a metric connection is necessarily symmetric.
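Eq. (3.5.49) lends itself to a direct numerical check. The following is a minimal sketch, assuming a two-dimensional manifold and using the unit-sphere metric $g = \mathrm{diag}(1, \sin^2\theta)$ as a test case. The function names, the central finite differences and the step size are illustrative assumptions, not part of the text.

```python
import math

def sphere_metric(x):
    """Metric components g_ij of the unit 2-sphere at x = (theta, phi)."""
    theta, _phi = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving the components g^ij."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, l, h=1e-5):
    """Central-difference estimate of g_ij,l = d g_ij / d x^l."""
    xp, xm = list(x), list(x)
    xp[l] += h
    xm[l] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(2)] for i in range(2)]

def christoffel(metric, x):
    """Gamma^k_ij = (1/2) g^kl (g_il,j + g_jl,i - g_ij,l), returned as gamma[k][i][j]."""
    n = 2
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, l) for l in range(n)]  # dg[l][i][j] = g_ij,l
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                gamma[k][i][j] = 0.5 * sum(
                    g_inv[k][l] * (dg[j][i][l] + dg[i][j][l] - dg[l][i][j])
                    for l in range(n))
    return gamma

theta = 1.0
gamma = christoffel(sphere_metric, (theta, 0.3))
```

With index 0 for $\theta$ and 1 for $\phi$, `gamma[0][1][1]` should approximate $-\sin\theta\cos\theta$ and `gamma[1][0][1]` should approximate $\cot\theta$, and the result is symmetric in its lower indices, as expected for a metric connection.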
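All expressions can be simplified by assuming the metric is in canonical form at a point P , so
that it can be expanded as
$$g_{ij} = \pm\delta_{ij} + \frac{1}{2}\,\frac{\partial^2 g_{ij}}{\partial x^k\,\partial x^l}\bigg|_P x^k x^l + \ldots .$$
(3.5.50)
Eq. (3.5.49) above then implies that
$$g_{ij,k}\big|_P = 0 \quad\Rightarrow\quad \Gamma^k_{ij}\big|_P = 0 .$$
(3.5.51)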
Starting from P , we can then consider n linearly independent directions and the corresponding
geodesics will form a Gaussian normal reference frame around P (at least in a sufficiently small
neighbourhood of P ). As we showed in Section 3.5.2, in this particular reference frame, covariant
derivatives along the coordinate directions are also Lie derivatives at P . The consistency condition (3.5.49) then implies that the coordinate basis vectors are also Killing vectors at P , and
the metric correspondingly admits (at least) n point isometries at P . It is important to remark
that, strictly speaking, Killing vectors are only defined as fields and the condition (3.5.51) should
therefore hold in an open set of the manifold. Since we have seen that it is in general impossible to
put the metric in canonical form in an open set, we are thus specifying "at P" in order to stress
this fact.
Since in a normal frame, covariant derivatives (at a point) can be replaced by partial derivatives (at the same point), one finds the Riemann tensor has components simply given by second
derivatives of the metric (at a given point),
$$R_{ijkl} = \frac{1}{2}\left(g_{il,jk} - g_{ik,jl} + g_{jk,il} - g_{jl,ik}\right) ,$$
(3.5.52)
$$R = R^k{}_k$$
(3.5.53)
(3.5.54)
$$G_{ij} = R_{ij} - \frac{1}{2}\,R\,g_{ij} .$$
(3.5.55)
All expressions can then be generalised to any frame by simply replacing partial derivatives with
covariant derivatives. For example, the general form of the Riemann tensor is given by 17
$$R_{ijkl} = \frac{1}{2}\left(g_{il;jk} - g_{ik;jl} + g_{jk;il} - g_{jl;ik}\right) ,$$
(3.5.56)
(3.5.57)
Upon generalizing to any frame, we then obtain the important Bianchi identity
$$\nabla^i G_{ij} = 0 ,$$
(3.5.58)
which resembles (and actually is) a conservation law, as we will elucidate further.
Another important expression we will make use of is given by the Killing equation (3.3.78). In
a general reference frame, one can work it out and find that
$$0 = \mathcal{L}_{\vec V}\,g_{ij} = V_{i;j} + V_{j;i} \equiv V_{(i;j)} .$$
(3.5.59)
Note that in a normal frame around the point P , the above becomes
$$V_{(i,j)}\big|_P = 0 .$$
(3.5.60)
17
Chapter 4
General Relativity
Suggested bibliography: B. Schutz, A first course in general relativity, Cambridge Univ. Press
(2009) [4]; S. Carroll, Spacetime and geometry, Addison-Wesley (2004) [5] (see also arXiv:gr-qc/9712019); L. Landau and E. Lifsits, Teoria dei campi, Editori Riuniti (1976) [6].
4.1
We started from Newtonian mechanics and its Galilean invariance, that is a Principle of Relativity
for the laws of mechanics with absolute time, which is compatible with Newton's law of gravity.
The non-invariance of Maxwell's equations led us to replace the Galilean Principle of Relativity
with an enlarged version, the Principle of Special Relativity, that covers electromagnetism and
further requires invariance of the speed of light:
Galilean Relativity: The laws of (Newtonian) mechanics are the same for all inertial
observers (and time is absolute).
1) Newton's law of gravity (conservative forces): consistent and yields a very accurate description of astronomical observations.
2) Maxwell's electromagnetism: incompatible.
Special Relativity: The laws of physics are the same for all inertial observers and
the speed of light in vacuum is invariant.
1) Newton's law of gravity (action at a distance): incompatible.
2) Maxwell's electromagnetism (field-mediated interactions): fully endorsed.
To summarize, Special Relativity has (at least) two drawbacks:
1) it still makes use of the ambiguous concept of inertial coordinate systems;
2) it claims to cover all of physics, but (the very accurately verified Newtonian theory of)
gravitation is excluded.
From the mathematical point of view, Special Relativity is realized by assuming the existence of
global (inertial) reference frames connected by Lorentz (Poincaré) transformations. The requirement that the laws of physics are the same is therefore given the mathematically precise meaning
that physical laws may only involve quantities represented by tensors under the Lorentz (Poincaré)
group and legitimate tensorial operations among them.
Ideally, a mathematical reference frame should be associated with a measuring apparatus. However, all physical measurements are carried out using detectors with finite spatial and temporal
extension (with no a priori guarantee of being inertial), and should therefore be better described
by generic local reference frames. For this reason we undertook the study of differential geometry,
which provided us with mathematical tools (local charts, tensors and new tensorial operations) to
write equations in any coordinate system, inertial or not. These tools turned out to be so powerful that
we may now write equations in the same form in any arbitrary reference frame. It is thus tempting
to speculate that physics can be formulated in a way that is totally independent of the reference frame
or, more physically, in a way that can be adapted to any measuring apparatus, regardless of its
inertial nature. This is in essence the:
Principle of General Relativity: The laws of physics are the same in all reference
frames (for all observers).
Assuming to each physical observer there can be associated a reference frame (and, quite ideally, also
the other way around), the principle of General Relativity can be translated into the mathematical
requirement that all physical laws must involve only tensors and tensorial operations in the sense
of differential geometry (with no a priori connection with Lorentz transformations). We could
actually go as far as saying that without the mathematical machinery of differential geometry,
the principle of General Relativity would have remained an empty statement, as Einstein himself
basically admitted when recognising the works of Ricci Curbastro and Levi-Civita 1 .
Of course, the principle of General Relativity does not tell us what the laws of physics are, but
experiments show that Special Relativity works very well in describing phenomena in our laboratories. The question then naturally arises as to how General Relativity may be compatible with
Special Relativity and solve its problems. Let us first go back to the original issue of consistently
defining an inertial observer, and remove the assumption that reference frames and observers are
equivalent. In fact, it is more realistic to think of observers as (possibly extended) physical apparatuses that move along trajectories (curves) in space-time, starting from which one can then define
mathematical reference frames that cover larger portions of the space-time manifold 2 . In order
to qualify any such apparatus as defining an inertial frame we would then need an independent
way to determine whether an object is subject to a force. If we believe in our present knowledge
of fundamental forces within Special Relativity, this is actually possible for electromagnetism and
nuclear forces, because
Standard model of elementary particles: The strength of gauge (vector) field-mediated interactions 3 is governed by charges of both signs.
Put another way, gauge interactions are both attractive and repulsive. By preparing an object with
zero charge(s), we are therefore guaranteed that the only force acting on it could be gravity.
It is a fact that the gravitational attraction between two bodies cannot be made to vanish.
However, gravitational effects can be eliminated from the picture by considering a freely falling
1
Two of the founding fathers of the then-called absolute calculus, which, in modern terms, amounts to the
introduction of the covariant derivative.
2
Any reference to the exponential map and alike is clearly implied.
3
The gauge vector fields of electroweak and strong interactions.
observer, who will not measure any gravitational acceleration in whatever experiment he carries
out. The latter two observations are encoded in the
Equivalence Principle: For all physical objects, the gravitational charge $m_g$ (mass)
equals the inertial mass $m_i$ 4 .
This was first hypothesised by Galileo, who (presumably) verified it by letting objects fall from the
tower of Pisa and observing they reached the ground at the same time, independently of their mass,
shape or chemical composition. Of course, since the Newtonian description of this experiment is
sufficiently accurate, we can say that this result occurs because, from Newton's second law for a
massive particle in a homogeneous and constant gravitational acceleration field $\vec g$, one has
$$m_g\,\vec g = m_i\,\vec a \quad\Rightarrow\quad \vec a = \vec g ,$$
(4.1.1)
if $m_g = m_i$ for all bodies. In particular, both the observer (a physical apparatus) and the test
bodies will sustain the same acceleration and one cannot devise any local observation that can tell
whether one is not subject to any gravitational attraction at all, or if one is inside an elevator falling
freely towards the ground, which is Einstein's version of Galileo's experiment. Such an example
makes it plainly clear that a freely-falling reference frame cannot be global but must follow the
line of force of gravity, and will therefore be local in space and time in general.
Keep also in mind that non-gravitational forces can be strictly made to vanish only for point-like
objects. In fact, consider for example a ruler we wish to use in order to define a spatial axis of our
reference frame. Internal electromagnetic and nuclear forces will keep this ruler of a fixed length,
so that, if its centre of mass is in free fall, the end-points will not, and the corresponding reference
frame will be strictly inertial only along the trajectory of the centre of mass (a point in space).
Of course, whereas the notion of zero in mathematics is precise, physically a quantity is zero if we
cannot tell its measured value apart from zero within our experimental errors. One can therefore
assume that freely falling, inertial frames can be defined in a sufficiently small neighbourhood UP
of each space-time point P , and the laws of Special Relativity, which may strictly hold only at each
point P , will also be sufficiently good approximations of the true laws inside UP for all inertial
observers defined therein. In fact, since the metric must (locally in space and time) reduce to the
Minkowski form for freely falling observers, there must also exist corresponding Killing vectors at
a point P tangent to the trajectories of each freely falling observer (meaning the Lie derivatives of
the metric tensor are given by partial derivatives and vanish along the orthonormal directions at
each point where the inertial frame is defined). One can make use of such Killing vectors to build
local reference frames starting from each point P of the trajectory of the freely falling observer.
These are the kind of frames we qualified as Gaussian normal, and the symmetries of Special
Relativity will then hold exactly at P and (approximately) in a (sufficiently small neighbourhood)
$U_P$. A particularly neat example is given by a space station orbiting the Earth. Its trajectory can
be (approximately) described by an ellipse in space-time from the point of view of an observer on
the Earth. However, the station is in free fall and one could place rulers on the inner walls of a living
area inside the station to define a triad of space-like vectors and from these generate a local inertial
frame. From the point of view of the Earth observer, these three vectors and the one time-like
vector tangent to the station's trajectory "rotate" along the ellipse, although they truly define
a parallel transport along the station's trajectory 5 . If we next consider a second space station
4
When the latter is not zero. This excludes photons and other massless particles, which cannot be stopped or
accelerated.
5
These four basis vectors are called tetrads or vierbein and must be explicitly introduced to describe spin in
General Relativity.
orbiting the Earth not far from the previous one, we can repeat the same construction and build a
second locally inertial reference frame. We can then "paste" together these two frames smoothly.
However, since the two stations orbit with different angular velocities (from the point of view of
the Earth observer), it is clear that their tetrads will "rotate" with different speeds as well, and
a vector parallelly transported along a closed path in this reference frame will consequently not
coincide with itself 6 . In other words, we know that the Newtonian theory predicts the appearance
of gravitational tidal forces between the two stations (which could be measured, for example, by
means of a spring connecting them). These forces reflect in the non-vanishing of the Riemann tensor
in General Relativity, thus space-time curvature, as a consequence of parallel transport being tied
to local inertial observers.
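The tidal forces mentioned above can be made explicit within Newtonian theory. The following is a minimal sketch, assuming a point-like Earth of mass $M$ with potential $V = -G\,M/r$ (the symbols $G$, $M$, $\delta x^i$ and $n^i$ are introduced here for illustration only). Two freely falling bodies separated by $\delta x^i$ have relative acceleration

```latex
\delta a^i = -\frac{\partial^2 V}{\partial x^i\,\partial x^j}\,\delta x^j
           = -\frac{G\,M}{r^3}\left(\delta^i_j - 3\,n^i n_j\right)\delta x^j ,
\qquad n^i = \frac{x^i}{r} ,

% radial separation is stretched, transverse separation is squeezed
\delta a_{\rm radial} = +\frac{2\,G\,M}{r^3}\,\delta r ,
\qquad
\delta a_{\rm transverse} = -\frac{G\,M}{r^3}\,\delta x_\perp .
```

No choice of freely falling frame removes these second derivatives of the potential, which is the Newtonian counterpart of a non-vanishing Riemann tensor.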
To summarise, from the mathematical point of view, the Equivalence Principle means that
freely falling observers are the true inertial observers, for which one finds
~e ,
(4.1.2)
along normal directions in suitably small neighbourhoods UP , and physics in these frames must
be locally (and, in the worst case, only at a space-time point P ) described, according to Special
Relativity, by tensorial equations (in the sense of the local Lorentz group). (General translations
are of course lost since they connect observers at different locations.) The Lorentz group SO(3, 1)
hence remains a symmetry of physics (strictly speaking) in the tangent space TP at all points
P of the space-time manifold M and approximately in the sufficiently small neighbourhoods UP .
According to the principle of General Relativity, different (non-inertial) observers will then see the
laws of physics of Special Relativity as tensorial equations in the sense of differential geometry, with
the partial derivatives replaced by covariant derivatives. This is precisely encoded in yet another
principle:
Principle of General Covariance: The laws of physics in a general reference frame
are obtained from the laws of Special Relativity by replacing tensor quantities of the
Lorentz group with tensor quantities of the space-time manifold.
In practical terms, this means that one takes a law of physics in the locally inertial frame at a
point P as given by Special Relativity and re-interprets tensorial indices of the Lorentz group as
representing the components of tensors under general coordinate transformations. Further, the
Minkowski metric (used to raise, lower and contract indices) must be replaced by a general metric
tensor with the same signature
$$\eta_{\mu\nu} \to g_{\mu\nu} ,$$
(4.1.3)
where, from now on, we shall only consider four-dimensional space-time manifolds M with coordinates $x^\mu = (x^0, x^1, x^2, x^3) = (x^0, x^i)$ and metric signature $(-, +, +, +)$, unless otherwise specified.
General Covariance and Eq. (4.1.3) then imply that partial derivatives must also be replaced by
the metric covariant derivative
$$\partial_\mu \to \nabla_\mu .$$
(4.1.4)
6
Note that this operation cannot be realised physically, since nothing can travel along a closed path in space-time
without violating causality (perhaps)!
$$\nabla_\mu F^{\mu\nu} = J^\nu .$$
(4.1.5)
From the physical point of view, this seemingly simple mathematical replacement should not obscure the fact that there are now two terms on the left hand side: one representing the usual flat
space gradient, and the second entailing the effect of curved space-time (gravity) on the propagation
of electromagnetic degrees of freedom (the photons).
It really cannot be emphasised enough that, for this construction to work, it is crucial that one
can always put the metric tensor in canonical form locally, with its first derivatives (the metric
connection) locally vanishing at the same time, so that Eq. (4.1.2) holds. Without this general
property of metric manifolds, one could not embed Special Relativity inside General Relativity
and, given the experimental success of the former, the latter could not be made into a physical
theory at all. It is not hard to see that this mathematical property must have been a true source
of inspiration for Einstein's ideas.
4.2
Gravitational equations
The above construction covers all the interactions of the Standard Model of particle physics, but
does not yet explicitly include a description of gravity at all. This means two questions are still
open:
Q1) how do we describe the action of gravity on a test particle?
Q2) what sources gravity and how do we determine gravity from its sources?
There is a natural answer for Q1) that follows from the stated principles (like Newton's force law
naturally follows from the principle of Galilean invariance), whereas Q2) must be addressed as an
independent issue (much like Maxwell's equations and the equations governing any fundamental
interaction do not follow from relativity principles).
4.2.1
First of all, the Newtonian idea that gravity is represented by an acceleration field $\vec g$ cannot work,
since $\vec g$ is a three-vector for which one can hardly conceive a temporal component to build up a
four-vector $g^\mu$. However, Newtonian gravity describes the motion of celestial bodies with very high
accuracy and this implies that General Relativity must reduce to the Newtonian theory in some
suitable limit.
Freely falling observers and test particles
Since a freely falling observer is inertial, the local metric in its own reference frame is the canonical
Minkowski metric all along (of course, only in a sufficiently small neighbourhood of each point P
of the observer's trajectory). Let $u^\mu = dx^\mu/d\tau$ denote the four-velocity of a test particle subject
to no other force (but gravity). In the freely falling frame, it must then move along a straight line,
$$0 = \gamma^2\,\frac{d^2x^\mu}{dt^2} = \frac{d^2x^\mu}{d\tau^2} = \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\alpha\beta}\,\frac{dx^\alpha}{d\tau}\frac{dx^\beta}{d\tau} = u^\nu\nabla_\nu u^\mu ,$$
(4.2.1)
since $\gamma\,\tau = x^0 \equiv t$, where $\gamma = (1 - u^2/c^2)^{-1/2}$ is the usual special relativistic factor for a particle
moving with (constant) velocity $\vec u = d\vec x/dt$, and $g_{\mu\nu,\alpha} = 0$ in the coordinate system of the freely
falling observer. Note that this argument, strictly speaking, only holds at a space-time point, say
P , where the trajectory of the freely-falling observer and the trajectory of the test particle happen
to cross. However, the final result (4.2.1) is frame-independent and we can simply say that test
particles follow geodesics of the given space-time metric. These trajectories are usually referred to
as world-lines.
This argument further implies that the inertial observer itself moves along a geodesic of the
space-time metric, as can be simply deduced by considering the observer as a test particle at rest
with respect to itself. In a different frame (equivalently, for a different observer), the Christoffel
symbols will not be zero at P , and this suggests that the metric $g_{\mu\nu}$ can be viewed as a potential for
the gravitational interaction,
g, ,
(4.2.2)
(4.2.3)
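Both the non-relativistic limit and the weak field limit can now be implemented by Taylor expanding
all of our expressions in $\epsilon$ and keeping only the first order. (We can then formally set $\epsilon = 1$ at the end
of the computation in the truncated expressions.) In particular, the four-velocity becomes
$$u^\mu = \left(1 + O(\epsilon^2),\ \epsilon\,\vec v + O(\epsilon^2)\right) = (1, \vec 0) + \epsilon\,(0, \vec v) + O(\epsilon^2) ,$$
(4.2.4)
so that
$$\frac{d^2x^\mu}{d\tau^2} = \epsilon\left(0,\ \frac{d\vec v}{dt}\right) + O(\epsilon^2) ,$$
(4.2.5)
$$\Gamma^\mu_{\alpha\beta} = \frac{\epsilon}{2}\,g^{\mu\nu}\left(h_{\alpha\nu,\beta} + h_{\beta\nu,\alpha} - h_{\alpha\beta,\nu}\right) = \frac{\epsilon}{2}\,\eta^{\mu\nu}\left(h_{\alpha\nu,\beta} + h_{\beta\nu,\alpha} - h_{\alpha\beta,\nu}\right) + O(\epsilon^2) ,$$
(4.2.6)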
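where the derivatives of the metric are different from zero only if they are not taken with respect to
time, and we must recall that $\eta_{\mu\nu}$ is diagonal. This implies that the only non-trivial components
of the geodesic equation at order $\epsilon$ are given by (no sum over i)
$$\frac{d^2x^i}{d\tau^2} + \Gamma^i_{\mu\nu}\,\frac{dx^\mu}{d\tau}\frac{dx^\nu}{d\tau} \simeq \epsilon\,\frac{d^2x^i}{dt^2} + \Gamma^i_{00} \simeq \epsilon\left(\frac{d^2x^i}{dt^2} - \frac{1}{2}\,\eta^{ii}\,h_{00,i}\right) ,$$
(4.2.7)
in which we used
$$\Gamma^i_{00} \simeq -\frac{\epsilon}{2}\,\eta^{ii}\,h_{00,i} .$$
(4.2.8)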
(4.2.9)
(4.2.10)
We shall see later that V in Eq. (4.2.9) exactly reproduces Newton's potential once a solution for
$g_{\mu\nu}$ is obtained outside a spherically symmetric body.
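The identification of the Newtonian potential can be sketched directly from Eq. (4.2.7). The steps below assume the signature $(-, +, +, +)$ (so that $\eta^{ii} = 1$), units with $c = 1$, the expansion parameter set to one, and a potential $V$ that vanishes where $h_{00}$ does. The point-mass form, with mass $M$ and Newton constant $G$, is only a consistency check and not a solution of the field equations:

```latex
\frac{d^2 x^i}{dt^2} = \frac{1}{2}\,\partial_i h_{00}
\quad\text{compared with Newton's law}\quad
\frac{d^2 x^i}{dt^2} = -\partial_i V
\quad\Rightarrow\quad
h_{00} = -2\,V ,

g_{00} = \eta_{00} + h_{00} = -\left(1 + 2\,V\right) ,

% point mass, V = -G M / r
g_{00} = -\left(1 - \frac{2\,G\,M}{r}\right) .
```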
An important remark is now in order. We have not yet explicitly considered light, that is a
signal which propagates at a speed equal to c and can therefore be associated with a massless
particle. In Special Relativity, light propagates along the null cone, which is a geodesic of the
Minkowski metric, and one can easily show that this result generalises to any space-time metric.
In fact, the modulus $u^\mu u_\mu = C$ of the (parallel transported) tangent vector $u^\mu$ to a geodesic is
conserved along the geodesic itself, since
$$u^\nu\nabla_\nu C = 2\,u^\nu u_\mu\nabla_\nu u^\mu = 2\,u_\mu\left(u^\nu\nabla_\nu u^\mu\right) = 0 .$$
(4.2.11)
Given a point P along a physical geodesic, its four-velocity must satisfy $u^\mu u_\mu = C$, where $C = -1$
for massive particles and $C = 0$ for light, in a locally inertial reference frame at P . The principle of
Covariance then implies that $u^\mu u_\mu = C$ in any reference frame and Eq. (4.2.11) ensures the modulus
is conserved. We can therefore conclude that light also propagates along geodesics, although there
is no (affine) parameter along such geodesic that can be identified with a "proper time" 7 . It
also follows from Eq. (4.2.11) that the metric $g_{\mu\nu}$ encodes much more information than a usual
potential field, like, for example, the four-vector $A^\mu$ of electromagnetism: it determines the causal
structure of space-time by governing the propagation of light and of any other signal. This is the
very essence of Einstein's geometric view of gravity.
7
Keep in mind that geodesics are defined modulo an arbitrary reparametrization along the world-line. Only one
of such affine parameters will coincide with the proper time (for a massive particle).
4.2.2
Now, answering Q2) involves a lot more guesswork. First of all, $g_{\mu\nu}$ is symmetric and contains
(at most) 10 independent components. We therefore need ten equations for such components,
and we would like them to be at most second-order partial differential equations, as is the case for
Maxwell's equations. We therefore need a tensor constructed solely from $g_{\mu\nu}$ and its partial
derivatives up to second order, and there are not many choices: the Riemann tensor and its contractions. One
possibility is the (0, 2) Einstein tensor,
\[
G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2}\, R\, g_{\mu\nu} ,
\tag{4.2.12}
\]
(4.2.13)
G = R R g ,
2
and the above property reduces the number of independent components of the Einstein tensor from $(4 \cdot 5)/2 = 10$
to $10 - 4 = 6$ (which, incidentally, is the number of components of the spatial metric).
If the tensor (4.2.12) is to be the left-hand side of the equation which determines the metric, the
source on the right-hand side must have the same mathematical properties: it must be a symmetric
and covariantly conserved (0, 2) tensor built out of the matter content of the system. One such
tensor is the energy-momentum tensor, which for a perfect fluid with four-velocity $u^\mu$ is given by
\[
T_{\mu\nu} = \rho\, u_\mu u_\nu + p \left( g_{\mu\nu} + u_\mu u_\nu \right) = (\rho + p)\, u_\mu u_\nu + p\, g_{\mu\nu} ,
\tag{4.2.14}
\]
where $\rho$ is the (proper) density and $p$ the (proper) pressure, both measured by an observer comoving
with the fluid (footnote 8), which makes such quantities true scalars. Note that the tensor multiplying the pressure
is orthogonal to $u^\mu$ (recall that $u^\mu u_\mu = -1$),
\[
\left( g_{\mu\nu} + u_\mu u_\nu \right) u^\nu = u_\mu - u_\mu = 0 .
\tag{4.2.15}
\]
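In fact, in the frame comoving with the fluid, the four-velocity is $u^\mu = (1, 0, 0, 0)$ and $d\tau = dt$, which
means $g_{00} = g^{00} = -1$ and $g_{0i} = g^{0i} = 0$. We then have
\[
T_{\mu\nu} = \begin{pmatrix} \rho & 0 \\ 0 & p\, g_{ij} \end{pmatrix} ,
\tag{4.2.16}
\]
which implies
\[
T^\mu{}_\nu = \mathrm{diag}\left( -\rho, p, p, p \right) .
\tag{4.2.17}
\]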
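The structure of the perfect-fluid tensor in the comoving frame can be made concrete with a few lines of code. This is only an illustrative sketch: it assumes the locally Minkowski metric with signature $(-,+,+,+)$, arbitrary sample values for $\rho$ and $p$, and helper names (`lower_index`, `perfect_fluid_tensor`) that are not part of the notes.

```python
def lower_index(g, v_up):
    """Lower a vector index with the metric: v_mu = g_{mu nu} v^nu."""
    return [sum(g[m][n] * v_up[n] for n in range(4)) for m in range(4)]


def perfect_fluid_tensor(rho, p, g, u_up):
    """T_{mu nu} = (rho + p) u_mu u_nu + p g_{mu nu}."""
    u_dn = lower_index(g, u_up)
    return [[(rho + p) * u_dn[m] * u_dn[n] + p * g[m][n] for n in range(4)]
            for m in range(4)]


# Minkowski metric, signature (-,+,+,+); its inverse has the same entries.
ETA = [[-1.0 if m == n == 0 else (1.0 if m == n else 0.0) for n in range(4)]
       for m in range(4)]

rho, p = 2.0, 0.5                 # illustrative density and pressure
u_rest = [1.0, 0.0, 0.0, 0.0]     # four-velocity in the comoving frame
T = perfect_fluid_tensor(rho, p, ETA, u_rest)

# Mixed components T^mu_nu = eta^{mu alpha} T_{alpha nu}
T_mixed = [[sum(ETA[m][a] * T[a][n] for a in range(4)) for n in range(4)]
           for m in range(4)]
mixed_diagonal = [T_mixed[m][m] for m in range(4)]

# The tensor multiplying the pressure, contracted with u^nu, should vanish.
u_dn = lower_index(ETA, u_rest)
projected = [sum((ETA[m][n] + u_dn[m] * u_dn[n]) * u_rest[n] for n in range(4))
             for m in range(4)]

print(mixed_diagonal, projected)
```

The diagonal of the mixed tensor comes out as $(-\rho, p, p, p)$, and the projected vector vanishes identically.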
In order to better understand the meaning of the energy-momentum tensor, let us first consider
a particle of four-momentum $p^\mu$ and an observer with four-velocity $U^\mu$. It is easy to see that the
energy of the particle as measured by the observer is given by
\[
E^{(0)} = -p_\mu\, U^\mu .
\tag{4.2.18}
\]
8
A necessary requirement is therefore that the fluid particles be massive. We shall see later on how to deal with
radiation.
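In a locally inertial frame at $P$ in which the observer is at rest,
\[
U^\mu = (1, 0, 0, 0) \equiv V^\mu_{(0)} ,
\tag{4.2.19}
\]
so that
\[
E^{(0)} = -p_0 = m\, \frac{dt}{d\tau} = \frac{m}{\sqrt{1 - u^2}} ,
\tag{4.2.20}
\]
where $u^i = dx^i/dt$ is the particle's three-velocity in the locally inertial frame of choice and we used
$U_0 = -U^0 = -1$. We can of course complete the vierbein of basis vectors of $T_P$ with three spatial
vectors $V^\mu_{(i)}$, so that
\[
p_\mu\, V^\mu_{(i)} = p_i = \frac{m\, u_i}{\sqrt{1 - u^2}}
\tag{4.2.21}
\]
yield the spatial components of the four-momentum of the particle in the rest frame of the observer.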
Note that quantities measured by a given observer are correctly represented by scalars, namely
\[
E_{(a)} = p_\mu\, V^\mu_{(a)} , \qquad a = 0, 1, 2, 3 ,
\tag{4.2.22}
\]
which, furthermore, form the components of a Lorentz vector (under local Lorentz transformations
of the tetrad at $P$; see footnote 10).
Let us now consider the density term in the energy-momentum tensor, and contract it with
$U^\mu = (1, 0, 0, 0)$ twice,
\[
\left( \rho\, u_\mu u_\nu\, U^\mu \right) U^\nu = \rho \left( u_\mu U^\mu \right)^2 = \frac{\rho}{1 - u^2} .
\tag{4.2.23}
\]
The two factors of $\sqrt{1 - u^2}$ in the denominator are easily explained by first recalling that the proper
density (as measured by an observer comoving with the fluid) is defined by
\[
\rho = \frac{m}{V_0} ,
\tag{4.2.24}
\]
Such a locally inertial observer will coincide with our chosen observer only at the space-time point P . The metric
in P will have the canonical Minkowski form.
10
Incidentally, this further shows how the Lorentz group of Special Relativity is embedded as a local symmetry
into General Relativity.
An important property of fluids is the continuity equation, which in a locally inertial frame
reads
\[
\partial_t \rho + \vec\nabla \cdot \vec p = 0 ,
\tag{4.2.26}
\]
and just tells us that energy is conserved: the loss of energy per unit time $-\partial_t \rho$ of a portion of
fluid inside a proper volume $V_0$ equals the work per unit time done by that fluid to expand $V_0$
(which, for example, is given by $\partial_x p^x$ in the x-direction). Another way to look at Eq. (4.2.26) is
by integrating it over a cell of volume $V_0$. Let us for example assume the cell is a cubic box with
$a_- \le x, y, z \le a_+$, and $\vec p = (p^x(x), 0, 0)$, so that
\[
\int_{V_0} \partial_t \rho \; dx\, dy\, dz = \partial_t \int_{V_0} \rho \; dx\, dy\, dz = \partial_t E ,
\tag{4.2.27}
\]
(4.2.28)
V0
where $A_0 = (a_+ - a_-)^2$ is the area of the square surfaces at $x = a_\pm$ and $F^x = A_0\, p^x$ is the force
acting on such surfaces. If matter is not created or destroyed inside the cell, we can apply the usual
theorem relating kinetic energy to the work of the force, $F^x = \Delta E/\Delta x$, and finally obtain
\[
\partial_t E = \left. \frac{\Delta E}{\Delta x} \right|_{x = a_-} - \left. \frac{\Delta E}{\Delta x} \right|_{x = a_+} .
\tag{4.2.29}
\]
The above means that the rate of energy increase inside the cell equals the amount of energy which
enters from the left (through the surface at $x = a_-$) minus the energy which exits (through the
surface at $x = a_+$). In a general reference frame, (4.2.26) means that the energy-momentum tensor
is covariantly conserved,
\[
\nabla_\mu T^{\mu\nu} = 0 ,
\tag{4.2.30}
\]
where we recall the covariant derivative now contains extra terms (with respect to the Minkowski
case) which represent the effect of gravity on the fluid. This makes $T_{\mu\nu}$ the natural candidate as
the source of gravity.
The only detail that remains to be fixed is the engineering dimensions: for the Einstein tensor
$[G_{\mu\nu}] = L^{-2}$ whereas $[T_{\mu\nu}] = M\, L^{-3}$. This requires a dimensional coupling constant $G_N$, with
$[G_N] = L/M$, so that we can finally write the Einstein (field) equations
\[
R_{\mu\nu} - \frac{1}{2}\, R\, g_{\mu\nu} = 8\pi\, G_N\, T_{\mu\nu} .
\tag{4.2.31}
\]
Incidentally, we note that $G_N$ is Newton's constant, and its physical role is therefore to convert
mass into length (like the speed of light converts time into length).
The above equations entail an interpretation of gravity as the geometry of space-time: matter
determines the (Ricci) curvature of space-time (and, to some approximation, follows geodesics in
such space-time 11 ).
11
The idea of a test particle is simply an approximation: were we able to solve Einsteins equations for all matter
systems, we would not need this concept and everything would be determined by Eq. (4.2.31) solely.
4.2.3
Let us summarise the principles of General Relativity and their connection with the mathematical
background we have developed:
G. R.
E. P.
G. C.
Differential Geometry
$g_{\mu\nu} = \eta_{\mu\nu} + O(2)$
Tensor calculus
Local SO(3, 1)
There is therefore an explicit connection between this description of gravity and the geometry of
space-time, which deserves some clarifications.
By looking at the equation of motion of massive test particles, it is clear that the concept of
straight line is replaced by that of geodesic line. However, the geodesic equation is a second-order
differential equation like Newton's law of mechanics, and one may just look at the connection
term as a force acting on the particle. We have in fact seen that such a term reduces to that of a
conservative force in the weak field limit. Since no particle (massive or massless) sees a (globally)
flat space-time (footnote 12),
\[
\nabla_{\vec v}\, \vec v = \frac{d^2 x^\mu}{d\lambda^2} + \Gamma^\mu_{\alpha\beta}\, \frac{dx^\alpha}{d\lambda}\, \frac{dx^\beta}{d\lambda} = 0 ,
\qquad \text{with} \quad v^\mu v_\mu \le 0 ,
\tag{4.2.32}
\]
we can start to think of gravity in terms of pure geometry.
The above conclusion is further supported by the Einstein field equations, according to which
matter sources determine the space-time curvature, which in turn affects the matter's motion. This
is all encoded in ten (with only six independent) partial differential equations of a highly non-linear
kind: unlike Newton's law, the effect of two gravitational sources is not just their sum. A graphical
picture of this geometrical view of gravity is given by the so-called Einstein's billiard: the space-time
is represented by a sheet of elastic material upon which the sources rest, and test
particles therefore follow curved lines.
4.2.4
There are three historical tests conducted within the Solar system which strongly support General
Relativity.
Perihelion precession of Mercury
By solving the equation of motion for a test particle in the gravitational field of a much more
massive body, one finds almost elliptic orbits, similar to those predicted by Newtonian mechanics.
The difference can be modeled by a rotation of the axes of the ellipse or, equivalently, of the point
of minimum distance from the source, around the source itself. This point, for planets in the solar
system moving around the sun, is called perihelion, and its motion was already predicted in studies
of Newtonian celestial mechanics, due to the presence of the other planets in the solar system.
A small fraction of the observed precession for the perihelion of the orbit of Mercury, however,
remained unexplained until Einstein employed his new theory to find an astonishingly good agreement:
General Relativity was able to explain a difference with respect to the Newtonian theory of
just 43'' (arc seconds) per century.
12
Note can be the proper time for massive particles and is a generic affine parameter for light signals.
(4.2.33)
4.3
Black holes
Soon after Einstein proposed the equations (4.2.31) for the gravitational interaction, Karl Schwarzschild found a spherically symmetric solution which carries his name, and is the prototype of a
black hole space-time. We shall here sketch the derivation and review some of its main features by
studying geodesics.
4.3.1
Let us consider a spherically symmetric source, such as would be the earth or the sun to first
approximation. Of course, modeling the interior of an astrophysical object is anything but easy.
However, if we are just interested in the region outside the source, everything simplifies significantly.
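In fact, outside the source, the space-time is empty and $T_{\mu\nu} = 0$. Upon taking the trace of the
Einstein tensor, we obtain for the curvature scalar
\[
R^\mu{}_\mu - \frac{1}{2}\, R\, g^\mu{}_\mu = R - 2 R = -R = 0 ,
\tag{4.3.1}
\]
so that the Einstein equations in vacuum reduce to
\[
R_{\mu\nu} = 0 .
\tag{4.3.2}
\]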
We can now try and solve Eq. (4.3.2), and, in doing so, we expect the general solution will depend
on free parameters that we could later fix by means of information coming from the region where
$T_{\mu\nu} \neq 0$. In the specific case at hand here, we will in fact see that the assumed symmetry of
the space-time is strong enough to reduce the arbitrariness to just one parameter, whose physical
meaning can be obtained from the weak-field limit, regardless of the details of the source. We shall
call this one-parameter family of spherically symmetric solutions to Eq. (4.3.2) the Schwarzschild
(metric) manifold, or just the Schwarzschild space-time.
Finding solutions to Eq. (4.3.2) can be greatly eased by making use of isometries, that is, by
assuming the existence of Killing vectors. First of all, we shall require that the metric is static, so that
there exists a time-like Killing vector $\vec K_t$, and a suitable coordinate $t$, so that we can write
\[
\vec K_t = \frac{\partial}{\partial t} .
\tag{4.3.3}
\]
Moreover, since the source is spherically symmetric, we also assume the existence of three space-like
Killing vectors corresponding to rotations around axes with origin at the centre of the source,
\[
\vec K_i = \frac{d}{d\theta_i} , \qquad i = 1, 2, 3 .
\tag{4.3.4}
\]
The above three vectors must be conserved in time, which means they must commute with $\vec K_t$,
\[
\left[ \frac{\partial}{\partial t} , \frac{d}{d\theta_i} \right] = 0 .
\tag{4.3.5}
\]
We may therefore assume the metric is such that rotations are orthogonal to $\vec K_t$, like they are in
the Minkowski space-time $\mathbb{R}^{1+3}$, and that we can use the analogue of polar coordinates on surfaces
of constant $t$. This allows us to write the metric in diagonal form
\[
ds^2 = -A(r)\, dt^2 + B(r)\, dr^2 + C(r) \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right)
= -A(r)\, dt^2 + B(r)\, dr^2 + r^2\, d\Omega^2 ,
\tag{4.3.6}
\]
where we have also used the freedom to rescale the radial coordinate $r$ so that $C = r^2$. Since the
metric tensor elements only depend on $r$ (and trivially on $\theta$), one can always redefine the coordinate
$r$ so that the above holds locally. However, one should keep in mind that the rescaling may change
the domain of definition of $r$, if the transformation becomes singular at some point. With this
choice, the area of a sphere of coordinate radius $r$ is given by
\[
\mathcal{A}(r) = \int \sqrt{\det g_{(2)}}\; d\theta\, d\phi = r^2 \int \sin\theta\; d\theta\, d\phi = 4\pi\, r^2 .
\tag{4.3.7}
\]
For this reason $r$ is commonly called the areal radius. Note now that the proper length of the
radius of such a sphere is not $r$,
\[
R(r) = \int_0^r \sqrt{g_{rr}}\; dx = \int_0^r \sqrt{B(x)}\; dx ,
\tag{4.3.8}
\]
unless $B = 1$, and the space-time outside the source is in general curved. Finally, let us note that, in
order for $\{t, r, \theta, \phi\}$ to be a proper chart, we should also define their domain of definition (technically,
the image of the open subset of the Schwarzschild manifold covered by these coordinates). The
assumed isometries again help here, since time-translation invariance means we can suppose
\[
-\infty < t < +\infty ,
\tag{4.3.9}
\]
and the spatial volumes (at fixed $t$) can be foliated by two-dimensional spheres (the submanifolds
generated by the rotations) on which the angular coordinates take their usual values (footnote 13)
\[
0 \le \theta \le \pi , \qquad 0 \le \phi < 2\pi .
\tag{4.3.10}
\]
Not much can yet be said for the radial coordinate r, for which we can only expect it goes up to
infinity, where the metric presumably approaches the Minkowski form. Of course, this does not
assure us that the chosen set of coordinates covers the whole Schwarzschild manifold. In fact, this
point represents a very significant difference with respect to other field equations of physics: unlike
Maxwell equations, for example, which are a priori defined everywhere on a given manifold (such
as $\mathbb{R}^{1+3}$ in Special Relativity), the Einstein equations (4.2.31) [or the vacuum version (4.3.2)] implicitly
define the manifold on which the metric lives. In other words, we can say that the unknown
determined by Eq. (4.2.31) is not just the metric tensor but also the manifold on which it exists.
By inserting the above expression for the metric into Eq. (4.3.2), one obtains only two independent
equations (out of the original four), which are solved by
\[
A = 1 - \frac{2K}{r} , \qquad B = A^{-1} ,
\tag{4.3.11}
\]
where $K \ge 0$ has dimensions of a length and emerges as an integration constant. Note that
the metric reduces to Minkowski for $K = 0$ but, at this point, does not contain any information
about the source, nor the gravitational constant $G_N$ [which in fact does not appear in the vacuum
equation (4.3.2)].
In order to understand the physical meaning of $K$, let us look at the weak field limit, Eq. (4.2.7) (footnote 14),
\[
\frac{d^2 x^i}{dt^2} = \frac{1}{2}\, \epsilon\, \eta^{ii}\, h_{00,i} = -V_{,i} .
\tag{4.3.12}
\]
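Clearly,
\[
g_{00} = \eta_{00} + \frac{2K}{r} \quad \Rightarrow \quad V = -\frac{K}{r} ,
\tag{4.3.13}
\]
which implies
\[
\frac{d^2 r}{dt^2} = -\frac{K}{r^2} ,
\tag{4.3.14}
\]
and comparison with Newton's law gives
\[
K = G_N\, M \equiv \frac{r_H}{2} ,
\tag{4.3.15}
\]
13 We do not need to be particularly picky here about the singularity of such coordinates at the poles.
14 No summation over the index i.
[Figure 4.1: plot of the proper radius $R/r_H$ versus the areal radius $r/r_H$.]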
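where we can now interpret $M$ as the total mass of the source as measured by a distant observer.
Finally,
\[
ds^2 = -\left( 1 - \frac{2\, G_N M}{r} \right) dt^2 + \left( 1 - \frac{2\, G_N M}{r} \right)^{-1} dr^2
+ r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right)
\tag{4.3.16}
\]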
is the famous Schwarzschild metric. The length $r_H$ plays a crucial role in the Schwarzschild space-time
and is called the Schwarzschild radius. Note that for $r \gg r_H$, the Schwarzschild metric
approaches the Minkowski flat metric, a property referred to as asymptotic flatness. Static observers
associated with the Schwarzschild coordinates $\{t, r, \theta, \phi\}$, and placed at fixed $r$, are therefore
asymptotically inertial (at $r \gg r_H$), but they depart from being inertial the more they approach
$r_H$. Note also that the proper radial distance between the Schwarzschild radius and a sphere of area $4\pi r^2$
(the region $x < r_H$ being left out of the integral) is given by
\[
R(r) = \int_{r_H}^{r} \frac{dx}{\sqrt{1 - \frac{r_H}{x}}}
= r \sqrt{1 - \frac{r_H}{r}} + \frac{r_H}{2}\, \log\left[ \frac{2\, r}{r_H} \left( 1 + \sqrt{1 - \frac{r_H}{r}} \right) - 1 \right] ,
\tag{4.3.17}
\]
for $r > r_H$, so that $R(r_H) = 0$ and $R(r) > r$ for $r \gtrsim 3\, r_H/2$ (see Fig. 4.1). However, for $r < r_H$, we
have that $g_{rr} < 0$ and the coordinate $r$ becomes time-like. The above result (4.3.17) is therefore of
little physical meaning, and the geometry of the Schwarzschild space-time will be better understood
by studying its geodesics.
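The properties $R(r_H) = 0$ and $R(r) > r$ for $r \gtrsim 3\,r_H/2$ can be checked with a short numerical sketch. It assumes that $R(r)$ is the integral of $\sqrt{g_{rr}}$ from $r_H$ to $r$ in the Schwarzschild metric (4.3.16), and compares a Simpson-rule evaluation with the closed form $r\sqrt{1 - r_H/r} + (r_H/2)\log[(2r/r_H)(1 + \sqrt{1 - r_H/r}) - 1]$. The function names and the substitution $x = r_H + s^2$, which removes the integrable singularity at $x = r_H$, are choices made here for illustration.

```python
import math


def proper_distance_closed(r, rH=1.0):
    """Closed form of the integral of dx / sqrt(1 - rH/x) from rH to r."""
    s = math.sqrt(1.0 - rH / r)
    return r * s + 0.5 * rH * math.log(2.0 * r / rH * (1.0 + s) - 1.0)


def proper_distance_numeric(r, rH=1.0, n=2000):
    """Same integral by Simpson's rule (n even) after substituting x = rH + s^2.

    With this substitution the integrand becomes 2 * sqrt(rH + s^2),
    which is smooth on the whole interval 0 <= s <= sqrt(r - rH).
    """
    s_max = math.sqrt(r - rH)
    if s_max == 0.0:
        return 0.0
    h = s_max / n

    def f(s):
        return 2.0 * math.sqrt(rH + s * s)

    total = f(0.0) + f(s_max)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0


for r in (1.0, 1.4, 1.5, 3.0):  # areal radii in units of rH
    print(r, proper_distance_closed(r), proper_distance_numeric(r))
```

In units of $r_H$, the printed values show $R < r$ at $r = 1.4$ and $R > r$ at $r = 1.5$, in line with the crossing near $3\,r_H/2$ quoted in the text.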
4.3.2
Radial geodesics
The Schwarzschild metric (4.3.16) does not carry any dependence on the size of the spherically
symmetric source that generates it. We can therefore suppose that the source has a very small
areal radius $r_s \ll r_H$ (ideally reducing to a point), in which case the metric (4.3.16) shows a clearly
disturbing feature:
\[
r \to r_H \quad \Rightarrow \quad g_{tt} \to 0 \quad \text{and} \quad g_{rr} \to \infty .
\tag{4.3.18}
\]
Further, the signs of $g_{tt}$ and $g_{rr}$ change across $r = r_H$, so that inside the Schwarzschild radius $t$
becomes a spatial coordinate and $r$ is a time (but note that the Killing vector $\vec K_t$ is associated with
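\[
S[x^\mu(\tau)] = -m \int_0^\tau \sqrt{-g_{\mu\nu}\, \dot x^\mu\, \dot x^\nu}\; d\tau' \equiv -m \int_0^\tau \sqrt{2\, T}\; d\tau' ,
\tag{4.3.19}
\]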
and recover the geodesic equations as the Euler-Lagrange equations of motion (footnote 15). In Schwarzschild
space-time and using the mass-shell condition for massive particles, namely
\[
2\, T = \left( 1 - \frac{2\, G_N M}{r} \right) \dot t^2 - \left( 1 - \frac{2\, G_N M}{r} \right)^{-1} \dot r^2
- r^2 \left( \dot\theta^2 + \sin^2\theta\, \dot\phi^2 \right) = 1 ,
\tag{4.3.20}
\]
where all coordinates are now functions of the particle proper time $\tau$, with $\dot f = df/d\tau$, one finds
Z
Z
Z
T
S = m
2 T d = m
T d .
(4.3.21)
d = m
2T
0
0
0
d x
x
(4.3.22)
thus avoiding the square root. Since $T$ does not depend on $t$ and $\phi$ (because of the
existence of the Killing vectors $\vec K_t$ for time translations and $\vec K_\phi$ for one of the three rotations),
from Noether's theorem, we expect two integrals of motion, namely
\[
E = \frac{m}{2}\, \frac{\partial (2\, T)}{\partial \dot t} = m \left( 1 - \frac{2\, G_N M}{r} \right) \dot t ,
\tag{4.3.23}
\]
which reduces to the proper mass $m$ in the weak field limit $r \gg r_H$ and $\dot t = 1$, and
\[
J = -\frac{m}{2}\, \frac{\partial (2\, T)}{\partial \dot\phi} = m\, r^2 \sin^2\theta\; \dot\phi ,
\tag{4.3.24}
\]
which gives the angular momentum around the axis that defines the angle $\phi$ if we choose $\theta = \pi/2$.
The latter choice is always possible, since the directions of the axes are arbitrary, due to the
15
The advantage of an action principle is that it makes the effect of symmetries more apparent via Noether theorem.
Note that this implies that the particle motion occurs on a plane passing through the origin of the
reference frame.
We are now left with just the equations of motion for $\phi = \phi(\tau)$ and $r = r(\tau)$, which we consider
for the simple case of radial in-fall, namely $\dot\phi = 0$ and $\phi$ constant. It is then easier to use Eq. (4.3.23)
and write
\[
2\, T = 1 = \left( 1 - \frac{2\, G_N M}{r} \right)^{-1} \left( \frac{E^2}{m^2} - \dot r^2 \right) .
\tag{4.3.26}
\]
m2
This yields
\[
\dot r^2 = \frac{2\, G_N M}{r} - \left( 1 - \frac{E^2}{m^2} \right) ,
\tag{4.3.27}
\]
(4.3.28)
d
r rH
(4.3.29)
and diverges for a trajectory that approaches $r_H$. In particular, an asymptotically inertial observer
placed at $r \gg r_H$ which measures the time $t$ would see a (radially) falling particle take an infinite
amount of (coordinate) time $t$ to reach the Schwarzschild radius. Upon differentiating the above
equation (4.3.27) with respect to $\tau$, we obtain
\[
\ddot r = -\frac{G_N M}{r^2} ,
\tag{4.3.30}
\]
which is again Newton's law but is now valid for all values of $r > 0$. Note that this result does not
imply there is no General Relativity correction to Newtonian trajectories: the radial coordinate $r$
is not the proper distance from the source's centre and $\tau$ is not the observer's time $t$.
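The contrast between finite proper time and divergent coordinate time can be illustrated numerically. The sketch below is restricted to the special case $E = m$ (a particle falling from rest at infinity), which is an assumption made here for simplicity: Eq. (4.3.27) then gives $\dot r = -\sqrt{r_H/r}$, and Eq. (4.3.23) gives $\dot t = (1 - r_H/r)^{-1}$. The function names and the substitution $r = r_H + e^s$ are illustrative choices.

```python
import math


def proper_time(r1, r2, rH=1.0):
    """Proper time to fall from r1 to r2 < r1 when E = m (dr/dtau = -sqrt(rH/r))."""
    return (2.0 / 3.0) * (r1**1.5 - r2**1.5) / math.sqrt(rH)


def coordinate_time(r1, r2, rH=1.0, n=4000):
    """Coordinate time for the same fall, valid for r1 > r2 > rH.

    Integrates dt/dr = -1 / ((1 - rH/r) sqrt(rH/r)) by Simpson's rule (n even)
    after the substitution r = rH + exp(s), under which the integrand becomes
    (rH + exp(s))**1.5 / sqrt(rH), smooth all the way down to the horizon.
    """
    a, b = math.log(r2 - rH), math.log(r1 - rH)
    h = (b - a) / n

    def f(s):
        return (rH + math.exp(s))**1.5 / math.sqrt(rH)

    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


print("proper time 10 rH -> rH:", proper_time(10.0, 1.0))
for eps in (1e-3, 1e-6, 1e-9):
    # the coordinate time keeps growing as the end point approaches rH
    print("coordinate time 10 rH ->", 1.0 + eps, ":", coordinate_time(10.0, 1.0 + eps))
```

Each time the end point moves a factor of 1000 closer to $r_H$, the coordinate time grows by about $r_H \ln 1000$, a logarithmic divergence, whereas the proper time down to $r_H$ stays finite.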
We remark that Eq. (4.3.27) allows for radial trajectories that fall from outside the Schwarzschild
radius to inside it in a finite amount of proper time $\tau$. However, as we mentioned above, due to
Eq. (4.3.28), the static asymptotic observer at $r \gg r_H$ would see such a trajectory approach $r_H$ and never cross it (see
Fig. 4.2). Let us clarify this point further. One can associate with the reference frame $\{t, r, \theta, \phi\}$ a
static observer defined by a set of (distinguished) clocks placed at fixed values of the areal radius
$r$ and synchronised in such a way that the angles between fixed vectors and their moduli remain
constant in time (which explains the meaning of a time-independent metric; see footnote 16). Such an observer
would experimentally determine the trajectory of the in-falling particle by recording the particle's
16
Of course, for a general synchronisation, the Schwarzschild metric would not appear time-independent.
\[
\frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2}} .
\tag{4.3.31}
\]
If we push this argument towards the Schwarzschild radius $r_H$, it seems that the above factor diverges there, and
$v \to 1$ (the speed of light). However, this conclusion relies on the use of a specific observer (the static
one), which might not be physically realisable (footnote 18). We shall therefore need a better (and more
realistic) way to describe the physics we can see for $r \gtrsim r_H$.
4.3.3
General geodesics
(4.3.32)
(4.3.33)
Since we expect a planet orbits at a relatively large $r_0 > r_H$, the right hand side of the above
equation is smaller than 1, and one concludes that bound orbits only exist for
\[
E < m .
\tag{4.3.34}
\]
Otherwise, one will have scattering trajectories for $E \ge m$. The quantity $E$ can thus be viewed as
the sum of the proper particle mass $m$ and its (negative) gravitational potential energy.
General time-like geodesics with $J > 0$ can now be studied. One then usually expresses the
radial coordinate $r = r(\tau)$ in terms of the angle $\phi = \phi(\tau)$ by means of the chain rule, and obtains
dr
d
2
GN M m2 3
E 2 m2 4
2 GN M
=
r 1 2
r 1
r2 ,
2 J2
m
4 J2
r
(4.3.35)
but we shall not go any further here. We just wish to mention that such an analysis explains the
anomalous precession of Mercury's perihelion: in Newtonian gravity, even including the effects
of the other planets, one determines a precession which is about 43'' (arc seconds) per century off
(the total precession is about 5600'' per century). Einstein then showed that this small discrepancy
can be precisely accounted for in General Relativity by making use of the Schwarzschild metric.
Historically, this is recognised as one of the (three) classical tests of General Relativity.
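The quoted figure of about 43'' per century can be reproduced with a back-of-the-envelope computation. The sketch below does not follow from anything derived in these notes: it assumes the standard leading-order result for the relativistic perihelion advance per orbit, $\Delta\phi = 6\pi\, G_N M / [c^2\, a\, (1 - e^2)]$, and approximate orbital data for Mercury (semi-major axis $a$, eccentricity $e$, orbital period), all of which are inputs supplied here for illustration.

```python
import math

# Approximate physical and orbital data (SI units); assumptions of this sketch.
GM_SUN = 1.32712440018e20   # G_N * M_sun, m^3 / s^2
C = 299792458.0             # speed of light, m / s
A_MERCURY = 5.7909e10       # semi-major axis of Mercury's orbit, m
E_MERCURY = 0.2056          # orbital eccentricity
PERIOD_DAYS = 87.969        # orbital period, days


def perihelion_advance_per_orbit(gm, a, e, c=C):
    """Leading-order relativistic perihelion advance per orbit, in radians."""
    return 6.0 * math.pi * gm / (c**2 * a * (1.0 - e**2))


def arcsec_per_century(gm, a, e, period_days):
    """Accumulated advance over one Julian century (36525 days), in arc seconds."""
    orbits = 36525.0 / period_days
    radians = perihelion_advance_per_orbit(gm, a, e) * orbits
    return math.degrees(radians) * 3600.0


print(arcsec_per_century(GM_SUN, A_MERCURY, E_MERCURY, PERIOD_DAYS))
```

With these inputs the result is close to 43 arc seconds per century, a small fraction of the total observed precession.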
Null trajectories can likewise be studied by means of the same action principle (4.3.19), but with
T = 0, from which one can derive an analogue to Eq. (4.3.35). Again, we shall not go into details
and just mention that such an equation could be used to describe the second of the three classical
tests of General Relativity, namely the deflection of light rays around stellar sources. This effect
was first verified by Eddington during a famous expedition to observe a solar eclipse in 1919.
4.3.4
Gravitational red-shift
Let us again consider the particular case of a radial geodesic, but this time for photons, which
means $T = 0$ in Eq. (4.3.19). Without using the explicit form of the geodesic equation, one can
already derive a simple and very general expression for the gravitational red-shift experienced by
a photon which travels in a static space-time, such as a light ray moving radially in
the Schwarzschild metric. This result is particularly important because it will allow us to model
a more realistic kind of observation. In fact, given that the Schwarzschild metric departs
significantly from the flat Minkowski metric for $r \sim r_H$, it is very unlikely that we can place a static
measuring apparatus there (see Footnote 18). What we could instead do more easily is to look at an
in-falling particle from far away. This means receiving light signals from such a freely-falling probe
which, as we shall see momentarily, are increasingly weakened.
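First of all we note that, if there exists a time-like Killing vector $\vec K_{(t)}$, the Killing energy (footnote 19)
\[
E = -K^{(t)}_\mu\, u^\mu
\tag{4.3.36}
\]
is conserved along geodesics. In fact, let $u^\mu$ be the 4-velocity of a particle which moves along a
geodesic, then
\[
\frac{dE}{d\tau} = -u^\alpha \nabla_\alpha \left( K^{(t)}_\mu\, u^\mu \right)
= -\left[ K^{(t)}_\mu \left( u^\alpha \nabla_\alpha u^\mu \right) + u^\alpha u^\mu\, \nabla_\alpha K^{(t)}_\mu \right] = 0 ,
\tag{4.3.37}
\]
19 Properly speaking, this $E$ is not the energy measured by any observer, unless the space-time is everywhere flat.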
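where we used the geodesic equation and the definition of Killing vectors (3.5.59),
\[
K_{\mu;\nu} + K_{\nu;\mu} = 0 ,
\tag{4.3.38}
\]
and $\tau$ parameterizes the geodesic (for time-like geodesics, $\tau$ can be taken to be the proper time; for null
geodesics it is a generic affine parameter). We then observe that the 4-velocity of a static observer
in a static space-time must be proportional to the Killing vector $\vec K_{(t)}$, which implies that
\[
U^\mu = \frac{K^\mu_{(t)}}{\sqrt{-K^{(t)}_\alpha K^\alpha_{(t)}}} \equiv \frac{K^\mu_{(t)}}{|K_{(t)}|} .
\tag{4.3.39}
\]
The energy measured by such an observer for a particle with 4-velocity $u^\mu$ is then
\[
\omega = -U_\mu\, u^\mu = -\frac{K^{(t)}_\mu\, u^\mu}{\sqrt{-K^{(t)}_\alpha K^\alpha_{(t)}}} = \frac{E}{|K_{(t)}|} .
\tag{4.3.40}
\]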
Let us then consider a photon which crosses two different static observers placed at $r = r_1$ and
$r = r_2$ respectively. They will measure the photon's energies $\omega_1$ and $\omega_2$, with ratio
\[
\frac{\omega_1}{\omega_2} = \frac{|K_{(t)}(r_2)|}{|K_{(t)}(r_1)|} .
\tag{4.3.41}
\]
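In the Schwarzschild space-time,
\[
U^\mu = \left( \dot t(r), 0, 0, 0 \right) = \dot t\, K^\mu_{(t)} ,
\tag{4.3.42}
\]
so that
\[
\frac{\omega_1}{\omega_2} = \sqrt{\frac{1 - r_H/r_2}{1 - r_H/r_1}} ,
\tag{4.3.43}
\]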
rH
s ,
rs
(4.3.44)
or
o s
with a given velocity, say vs = v(rs ), with respect to the static observer at the point of emission
r = rs ,
o p
1 vs2
1 + vs2
r
rH
1
.
rs
(4.3.45)
The Doppler contribution is arbitrary, in a sense, since the velocity $v_s$ depends on the specific initial
conditions of the probe's trajectory, whereas the gravitational contribution (4.3.44) is uniquely
determined by the geometry. For this reason, one usually omits the former in discussing the
gravitational redshift.
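The purely gravitational part of the red-shift is easy to evaluate numerically. The following sketch assumes the Schwarzschild metric (4.3.16), for which the norm of the time-like Killing vector is $|K_{(t)}| = \sqrt{1 - r_H/r}$, and uses the ratio (4.3.41) between the energies measured by two static observers. The function names are illustrative, and the Doppler contribution is deliberately left out.

```python
import math


def killing_norm(r, rH=1.0):
    """|K_(t)| = sqrt(-g_tt) = sqrt(1 - rH/r) for a static observer at r > rH."""
    return math.sqrt(1.0 - rH / r)


def frequency_ratio(r_emit, r_obs, rH=1.0):
    """omega_obs / omega_emit for static emitter and observer, from Eq. (4.3.41)."""
    return killing_norm(r_emit, rH) / killing_norm(r_obs, rH)


r_far = 1.0e12  # stand-in for an asymptotic observer, in units of rH
for r_s in (10.0, 2.0, 4.0 / 3.0, 1.01, 1.0001):
    # the received frequency drops towards zero as the emitter approaches rH
    print(r_s, frequency_ratio(r_s, r_far))
```

For an emitter at $r_s = 4\,r_H/3$ the received frequency is half the emitted one, and the ratio tends to zero as $r_s \to r_H$.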
4.3.5
In particular, Eq. (4.3.44) implies that a photon emitted near the Schwarzschild radius would
spend all of its energy to escape. This leads us to interpret the Schwarzschild sphere as an event
horizon. The precise physical nature of the Schwarzschild sphere could in fact be fully understood
by studying light cones starting at different areal radii. One would then discover that, whereas at
r > rH there exists both in-going (contracting) and out-going (expanding) light cones, for r = rH
the out-going light cone is stuck at r = rH (which is therefore a null surface) and for r < rH it
also contracts. This is the defining property of an apparent horizon or trapping horizon: no signal,
including light, can escape from within it. This concept is the General Relativity realization of an
older conjecture made in Newtonian gravity by Michell and Laplace in the 18th century: by simply
equating the Newtonian escape velocity to the speed of light,
\[
\frac{1}{2}\, m\, v^2 = m\, \frac{G_N M}{r} , \qquad \text{with} \quad v = 1 ,
\tag{4.3.46}
\]
one finds a limiting mass $M$ (independent of $m$) above which even a signal travelling at the speed
of light cannot escape from a star with given radius $r_s$. Of course, it is questionable that the above
derivation makes sense for $m = 0$ (footnote 20), but it is quite interesting that this conjecture leads exactly
to $r_s = r_H$, the Schwarzschild radius of the star of mass $M$. But given the previous analysis of
geodesic motion and Eq. (4.3.30) in particular, the coincidence is clear, as it should be clear that
the coordinate $r$ used to describe the proper geodesic motion is not the same as the radius $r$ in the
Newtonian argument above.
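To get a feeling for the numbers, the Michell-Laplace condition can be evaluated in SI units, where it reads $r = 2\, G_N M / c^2$. The sketch below uses approximate values of $G_N M$ for the Sun and the Earth; the constants and function names are inputs chosen here for illustration.

```python
import math

C = 299792458.0              # speed of light, m / s
GM_SUN = 1.32712440018e20    # G_N * M_sun, m^3 / s^2 (approximate)
GM_EARTH = 3.986004418e14    # G_N * M_earth, m^3 / s^2 (approximate)


def michell_laplace_radius(gm, c=C):
    """Radius at which the Newtonian escape velocity equals c: r = 2 G_N M / c^2."""
    return 2.0 * gm / c**2


def escape_velocity(gm, r):
    """Newtonian escape velocity from (1/2) m v^2 = G_N M m / r."""
    return math.sqrt(2.0 * gm / r)


r_sun = michell_laplace_radius(GM_SUN)
r_earth = michell_laplace_radius(GM_EARTH)
print("Sun:  ", r_sun, "m")      # roughly 3 km
print("Earth:", r_earth, "m")    # roughly 9 mm
print("escape velocity at r_sun / c:", escape_velocity(GM_SUN, r_sun) / C)
```

The resulting lengths, about 3 km for the Sun and about 9 mm for the Earth, are far smaller than the actual radii of these bodies, so the exterior Schwarzschild solution never reaches $r_H$ outside them.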
If the horizon remains static, it is then called the event horizon because the region inside it will
never be able to communicate with the outer region. The inner region is then known as a black
hole, of which we now know only a very limited variety may (mathematically) exist. They include
the spherically-symmetric but electrically charged Reissner-Nordström metric (found in 1916-18),
the axially-symmetric and rotating Kerr metric (found in 1963), and the electrically charged and
rotating Kerr-Newman metric (finally discovered in 1965). A common feature of all these solutions
of the Einstein equations is the existence of one (or more) horizons, approaching which photon
frequencies are red-shifted to zero. It is important to remark two aspects:
1) any probe sent toward the horizon would communicate by means of electromagnetic signals
whose frequency would be affected according to Eq. (4.3.44). Since any physically realisable receiver
has a minimum sensitivity, there is always going to be a radius $r_s > r_H$ such that $\omega_o(r_s)$
is lower than the minimum sensitivity. At that point, the probe would simply black out and
become invisible to any static observer.
20 If one takes the limit $m \to 0$ before solving Eq. (4.3.46), one does not obtain any escape velocity.
2) the above is a purely kinematical effect. In fact, tidal forces (that is, geodesic deviation,
in the General Relativity jargon) described by the Riemann tensor remain small for $r \sim r_H$ and
only diverge for $r \to 0$ (thus known as the real singularity, whereas the horizon is also named a
coordinate singularity since it can be removed by a suitable change of coordinates).
Rotating black hole metrics also display a frame-dragging effect, so that the space-time appears
to be dragged by the angular momentum of the source. In particular, if one carries a vector (for
example, a spinning top) along a spatially closed geodesic (in the sense that certain coordinate
positions are the same at the beginning and end of the trajectory), the vector will appear rotated
with respect to its initial direction. This effect is extremely small (of the order of $10^{-13}$), but has
been recently tested around our planet by the Gravity Probe B satellite.
In a dynamical situation, there in general exist dynamical horizons which evolve in time and
may (or not) give rise to an event horizon. We remark the former are defined by the local causal
structure (that is, the light cones around a point), whereas the latter is global in nature: whether a
gravitational system possesses an event horizon or not requires the knowledge of the whole space-time. In particular, for a collapsing body (like a supernova) one, in principle, needs to know the
entire future of the remnant body after the initial explosion.
Black hole space-times would however be just a mathematical curiosity if they were not realized
in nature. In the '40s, Oppenheimer and collaborators described very simple models of collapsing
spheres of dust which ended up forming black holes, thus showing that similar objects might be
the final outcome of supernova explosions. Further, the same authors and Chandrasekhar later
provided limiting masses (of order $3\, M_\odot$) above which neutron stars would not be stable but collapse
to a point-like singularity. In particular, Chandrasekhar found in the early '30s that a collapsing
star will not produce a stable white dwarf if its mass exceeds the Chandrasekhar limit of about
$1.5\, M_\odot$. The gravitational attraction will in fact be strong enough to force electrons to merge
with protons and give rise to a neutron star, kept together by the quantum mechanical pressure of the
degenerate Fermi gas of neutrons. Neutron stars are the best candidates for pulsars. Later, in 1939,
Oppenheimer and Volkoff employed the results of Tolman and further found that neutron stars will
not be stable if their mass exceeds $3\, M_\odot$. At that point, there is no (known) force which could
prevent the collapse of matter to a point. Current estimates of such limits vary, but the general
picture stands and the existence of black holes in our universe is widely accepted. For example,
astronomers have found evidence of very large black holes (with $M \sim 10^6\, M_\odot$) at the centre of
galaxies, including our Milky Way.
Finally, let us mention that, in the early '70s, Hawking discovered that black holes actually
emit particles like black bodies, as a quantum effect, albeit at a very small effective temperature
(smaller than the typical temperature of the CMB radiation). This result makes black holes one
of the most interesting (theoretical) arenas for General Relativity and quantum physics, with a
possible conceptual clash between the two, which produced many interesting speculations about
the possible quantum theory of gravity.
The other case of interest is cosmology which, quite remarkably, like the study of black holes
and General Relativity overall, was for a long time after the initial excitement looked upon as a
mathematical (if not merely metaphysical) subject.
4.4
Cosmology
Figure 4.3: Past and future light-cones originating from us now. $\Sigma_t$ are hypersurfaces of constant
time $t$ (to be defined).
Let us first review these (apparently) simple symmetries. Looking at the sky at night, it is pretty
obvious that it does not appear particularly isotropic: our solar system is pretty much empty space
with a few planets and asteroids, and one can clearly see scattered stars forming constellations
and clusters, and even a strip we call the Milky Way. Nonetheless, one would like to think that,
could we detect all the matter in the Universe, the average distribution (on a suitably sampled area
of the night sky) would be the same in all directions. In brief, isotropy is therefore assumed as much
as it is observed. It is further worth noting that what we see in the sky is not the Universe at
a given instant of time, but the image of it produced by light-cones that reach us at the time of
observation (see Fig. 4.3). Saying that the Universe is homogeneous and isotropic therefore means
that there exists a time $t$ such that the Universe is homogeneous and isotropic on each time slice
$\Sigma_t$. The matter content on each $\Sigma_t$ then affects the light propagation, which shows that the entire
construction needs to be experimentally self-supported. Moreover, a signal traveling along light
cones may have been generated at different times $t$ in the past from different distances $s$, and we
therefore need a separate way of determining either $t$ or $s$ in order to verify specific models.
The Cosmological Principle (CP) may therefore be taken as a working assumption upon which explicit
models of the Universe are built and (hopefully) verified a posteriori. In particular, we expect
that there is a minimum scale above which the Universe appears homogeneous, but in the following we
shall consider an idealized view in which galaxies form a homogeneous fluid filling up the entire space.
4.4.1 Friedmann-Robertson-Walker metric
As in the case of the Schwarzschild metric, the form of the cosmological metric can be partly
fixed by assuming the existence of Killing vectors. In particular, we now have three space-like
Killing vectors generating spatial translations (which mathematically defines homogeneity), and
three space-like Killing vectors generating rotations (which mathematically defines isotropy). It
can also be proven that isotropy with respect to every point implies homogeneity, and that a
three-dimensional (space-like) slice of the space-time cannot admit any further isometry, but we
shall not go into these details. It is just worth pointing out that we have no time-like Killing
vector, since we want to describe an evolving Universe: in 1929 E. Hubble and M. Humason observed
that farther galaxies recede from us faster.
Homogeneity and isotropy uniquely identify the FRW (Friedmann, Robertson and Walker)
metric
$$ ds^2 = -dt^2 + a^2(t) \left[ \frac{dr^2}{1 - k\, r^2} + r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) \right] , \tag{4.4.1} $$
where the origin r = 0 is totally arbitrary, and t is the proper time of an observer moving along
with the homogeneous and isotropic cosmic fluid (the idealized representation of galaxies) at
constant r, $\theta$ and $\phi$. We shall also call
$\{t, r, \theta, \phi\}$    comoving coordinates
$k = 0, \pm 1$    curvature constant
$a(t)$    scale factor    (4.4.2)
Note that k could in fact take any real value. However, if $k \neq 0$, a suitable rescaling of r and a,
$$ r \to \sqrt{|k|}\, r , \qquad a^2 \to \frac{a^2}{|k|} , \tag{4.4.3} $$
always allows one to set $k = \pm 1$ [but no rescaling can map the three values in (4.4.2) into one
another]. Finally, the meaning of the coordinate r is very different from the one it had in
the Schwarzschild space-time. If we write
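$$ ds^2 = -dt^2 + a^2(t)\, d\sigma^2 , \tag{4.4.4} $$
with
$$ d\sigma^2 = \frac{dr^2}{1 - k\, r^2} + r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) , \tag{4.4.5} $$
we see that a sphere of coordinate radius r has proper area $4\pi\, a^2(t)\, r^2$,
and the area of surfaces of constant r therefore depends on time. Likewise, the proper distance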
between two (radially separated) points is given by
$$ dR = a(t)\, \frac{dr}{\sqrt{1 - k\, r^2}} = a(t)\, dR_{(3)} , \tag{4.4.6} $$
Observations suggest that the distance between galaxies increases in time, whereas their typical
size remains the same. We can therefore claim that the Universe is expanding, with farther
galaxies moving away from us faster, like dots on an inflating balloon (see footnote 23). This
picture can be mathematically modeled by a modified FRW metric which locally (around matter sources
such as a galaxy) looks like the Schwarzschild metric: local lengths are mostly affected by the
localized sources and do not appreciably change in time, whereas the distance between sources
increases because of the increasing scale factor (see footnote 24).
Depending on the value of the curvature constant k, we can introduce new coordinates such that
the topology of the hypersurface $\Sigma_t$ is apparent from the line element $d\sigma^2$:
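Flat Universe: for k = 0, the coordinate r is just the usual radial coordinate in $\mathbb{R}^3$,
$$ d\sigma^2 = dr^2 + r^2\, d\Omega^2 = dx^2 + dy^2 + dz^2 , \tag{4.4.7} $$
where $d\Omega^2 = d\theta^2 + \sin^2\theta\, d\phi^2$.
Closed Universe: for k = +1, setting $r = \sin\chi$ one finds the metric of a three-sphere,
$$ d\sigma^2 = d\chi^2 + \sin^2(\chi)\, d\Omega^2 . \tag{4.4.8} $$
Open Universe: for k = -1, setting $r = \sinh\psi$ one finds the metric of a three-dimensional hyperboloid,
$$ d\sigma^2 = d\psi^2 + \sinh^2(\psi)\, d\Omega^2 . \tag{4.4.9} $$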
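The role of the coordinate r in the three geometries can be made concrete with a small numerical sketch. The Python snippet below is not part of the original notes: it evaluates the integral of Eq. (4.4.6) in closed form, assuming the standard substitutions r = sin(chi) for k = +1 and r = sinh(psi) for k = -1, and the function names are purely illustrative.

```python
import math


def comoving_radial_distance(r: float, k: int) -> float:
    """Integral of dr' / sqrt(1 - k r'^2) from 0 to r, for k = 0, +1, -1."""
    if k == 0:
        return r                      # flat space: the distance is r itself
    if k == 1:
        if not 0.0 <= r <= 1.0:
            raise ValueError("for k = +1 the coordinate r must lie in [0, 1]")
        return math.asin(r)           # chi, with r = sin(chi)
    if k == -1:
        return math.asinh(r)          # psi, with r = sinh(psi)
    raise ValueError("k must be 0, +1 or -1")


def proper_radial_distance(a: float, r: float, k: int) -> float:
    """Proper distance at fixed time from the origin to coordinate radius r."""
    return a * comoving_radial_distance(r, k)


if __name__ == "__main__":
    for k in (1, 0, -1):
        print(k, proper_radial_distance(a=1.0, r=0.5, k=k))
```

For the same value of r, the proper distance is larger than a r in the closed case and smaller in the open case, which is one way to see that r is not a proper radial distance unless k = 0.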
4.4.2 Cosmic fluids
As we wrote above, we assume the Universe is filled with a (perfect) fluid of matter and energy.
Its energy-momentum tensor then takes the form
$$ T^\mu{}_\nu = \mathrm{diag}(-\rho, p, p, p) , \tag{4.4.10} $$
where
$$ \rho = \rho(t) \quad \text{and} \quad p = p(t) , \tag{4.4.11} $$
and satisfies the continuity equation $\nabla_\mu T^{\mu\nu} = 0$. The time component of this
equation yields energy conservation (see footnote 25),
$$ \nabla_\mu T^\mu{}_0 = -\partial_0 \rho - 3\, \frac{\dot a}{a}\, (\rho + p) = 0 . \tag{4.4.12} $$
Footnote 23: Alternative scenarios were proposed, in which the Universe is stationary and matter is
produced continuously during the expansion so as to keep the average density constant.
Footnote 24: This picture is still being debated, and is the topic of the so-called Einstein-Straus
problem in General Relativity.
Footnote 25: Note that the Christoffel symbols do not vanish for the FRW metric.
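For a fluid with equation of state
$$ p = \omega\, \rho , \tag{4.4.13} $$
with constant $\omega$, Eq. (4.4.12) becomes
$$ \frac{\dot\rho}{\rho} = -3\, (1 + \omega)\, \frac{\dot a}{a} , \tag{4.4.14} $$
which is solved by
$$ \rho \propto a^{-3 (1 + \omega)} . \tag{4.4.15} $$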
The simplest components of cosmic fluids are given by dust (pressureless matter, or nonrelativistic matter almost exactly at rest with the cosmic frame) and radiation (massless matter,
or highly-relativistic matter).
Dust: in this case no force is present besides gravity, and $\omega = 0$ (so that p = 0). Eq. (4.4.15)
therefore yields
$$ \rho_{\rm dust} = \frac{E}{V} \propto a^{-3} , \tag{4.4.16} $$
which agrees with the fact that the proper mass of dust particles, $E \simeq m_0$, is an invariant
and the volume element scales like
$$ V \propto a \cdot a \cdot a = a^3 . \tag{4.4.17} $$
In this case, one can consider dust particles (stars, galaxies, etc.) as being located at fixed
r, $\theta$ and $\phi$, or the chosen reference frame as comoving with the cosmic fluid. Moreover, since
dust particles are only subject to gravity, lines of constant r, $\theta$ and $\phi$ are also geodesics.
Radiation: since mass is totally negligible, so is the trace of the energy-momentum tensor (see footnote 26),
$$ T^\mu{}_\mu = -\rho + 3\, p = 0 . \tag{4.4.18} $$
We then find
$$ p = \frac{1}{3}\, \rho , \tag{4.4.19} $$
that is $\omega = 1/3$, and
$$ \rho_{\rm radiation} = \frac{E}{V} \propto a^{-4} . \tag{4.4.20} $$
This result can be understood by noting that the volume again scales as in Eq. (4.4.17),
and the photon energy redshifts according to
$$ E \propto a^{-1} . \tag{4.4.21} $$
Of course, it makes no sense to consider the chosen reference frame as comoving with the
photons. The only possible definition of the reference frame in use is then provided by the
CP, namely that the coordinates are such that $\rho_{\rm radiation} = \rho_{\rm radiation}(t)$ and
$p_{\rm radiation} = p_{\rm radiation}(t)$.
Footnote 26: Recall that the trace of a (0, 2) tensor is invariant under rotations. In General
Relativity, this result is lifted to any reference frame. By computing the trace in a suitable
frame, and using the mass-shell condition E = p, where E is the energy and p the momentum, one can
then obtain that $T^\mu{}_\mu = 0$ for photons.
For a long time it was thought that we now live in a matter (dust)-dominated Universe, whereas in
the early stages, the Universe dynamics was controlled by radiation, since the density of the latter
increases faster (going backward in time). We now know that the Universe expansion is presently
accelerating ($\ddot a > 0$), which is not compatible with the effect of dust.
Vacuum or dark energy: among possible sources, we may also include a fluid with equation
of state
$$ \rho = -p = \frac{\Lambda}{8\pi\, G_N} \quad \Rightarrow \quad \omega = -1 , \quad \rho \propto a^0 , \tag{4.4.22} $$
where $\Lambda$ is the famous cosmological constant first introduced by Einstein, who later called
it his biggest mistake (but which is now necessary to explain the current accelerated expansion of
the Universe).
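The three scalings above are all instances of the single power law (4.4.15). As an illustration (not taken from the original notes), the following Python sketch evaluates that power law for the three fluids; the normalisation rho0 at a0 and the dictionary of fluids are hypothetical choices made only for this example.

```python
def density(a: float, omega: float, rho0: float = 1.0, a0: float = 1.0) -> float:
    """Energy density rho(a) = rho0 * (a / a0)**(-3 (1 + omega)), cf. Eq. (4.4.15)."""
    return rho0 * (a / a0) ** (-3.0 * (1.0 + omega))


# Equation-of-state parameters omega = p / rho of the three fluids in the text.
FLUIDS = {"dust": 0.0, "radiation": 1.0 / 3.0, "vacuum": -1.0}

if __name__ == "__main__":
    # Going back in time to half the present size of the Universe:
    for name, omega in FLUIDS.items():
        print(f"{name:10s} rho(a = 0.5) / rho(a = 1) = {density(0.5, omega):.3f}")
```

Halving the scale factor multiplies the dust density by 8 and the radiation density by 16, while the vacuum density is unchanged, which is why radiation dominates at early times and vacuum energy at late times.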
4.4.3 Friedmann equations
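The specific form of the FRW metric (4.4.1) reduces the Einstein equations to just two Friedmann
equations,
$$ G_{00} = 8\pi\, G_N\, T_{00} \quad \Rightarrow \quad 3 \left[ \left( \frac{\dot a}{a} \right)^2 + \frac{k}{a^2} \right] = 8\pi\, G_N\, \rho \tag{4.4.23} $$
$$ G_{ii} = 8\pi\, G_N\, T_{ii} \quad \Rightarrow \quad \frac{\ddot a}{a} = -\frac{4\pi\, G_N}{3}\, (\rho + 3\, p) . \tag{4.4.24} $$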
The first one, Eq. (4.4.23), is technically a constraint, which selects the possible combinations of
initial conditions $a(t_0) = a_0$ and $\dot a(t_0) = \dot a_0$ for the truly dynamical (second order)
Eq. (4.4.24) for the scale factor a = a(t), given a specific matter content. However, since the
constraint is preserved at all times, as can be seen by differentiating Eq. (4.4.23) with respect to
time and using the continuity Eq. (4.4.12) to obtain Eq. (4.4.24), it is easier to just solve the
constraint (4.4.23) at all times $t \geq t_0$.
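The preservation of the constraint can be checked in two lines. The following derivation is a sketch added for illustration; it assumes the standard forms of the constraint and of energy conservation quoted in the comments, and that $\dot a \neq 0$ so that one can divide by $\dot a / a$.

```latex
% Assumed inputs:
%   constraint:    3 (\dot a^2 + k) / a^2 = 8 \pi G_N \rho
%   conservation:  \dot\rho = -3 (\dot a / a) (\rho + p)
\begin{align}
\frac{d}{dt} \left[ 3\, \frac{\dot a^2 + k}{a^2} \right]
  &= 6\, \frac{\dot a}{a} \left[ \frac{\ddot a}{a} - \frac{\dot a^2 + k}{a^2} \right]
   = 8\pi\, G_N\, \dot\rho
   = -24\pi\, G_N\, \frac{\dot a}{a}\, (\rho + p) , \\
\frac{\ddot a}{a}
  &= \frac{\dot a^2 + k}{a^2} - 4\pi\, G_N\, (\rho + p)
   = \frac{8\pi\, G_N}{3}\, \rho - 4\pi\, G_N\, (\rho + p)
   = -\frac{4\pi\, G_N}{3}\, (\rho + 3\, p) .
\end{align}
```

The second line uses the constraint once more to eliminate $(\dot a^2 + k)/a^2$, so the second-order equation carries no information beyond the constraint and energy conservation.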
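We further define
$$ q = -\frac{a\, \ddot a}{\dot a^2} \quad \text{(deceleration parameter)} , \qquad \Omega = \frac{8\pi\, G_N}{3\, H^2}\, \rho = \frac{\rho}{\rho_{\rm critical}} \quad \text{(density parameter)} , \tag{4.4.25} $$
where $H = \dot a / a$ is the Hubble parameter and
$$ \rho_{\rm critical} = \frac{3\, H^2}{8\pi\, G_N} . \tag{4.4.26} $$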
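The first Friedmann equation (4.4.23) can then be written as
$$ \Omega - 1 = \frac{k}{H^2 a^2} , \tag{4.4.27} $$
so that the sign of k is fixed by the energy density:
$\rho < \rho_{\rm critical}$    $\Omega < 1$    k = -1    Open Universe
$\rho = \rho_{\rm critical}$    $\Omega = 1$    k = 0    Flat Universe
$\rho > \rho_{\rm critical}$    $\Omega > 1$    k = +1    Closed Universe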
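To get a feeling for the numbers, the critical density (4.4.26) can be evaluated in SI units. The sketch below is an illustration only: the value H0 = 70 km/s/Mpc is an assumed round figure, not a value quoted in these notes, the constants are rounded, and the result is a mass density (the notes use units with c = 1).

```python
import math

G_N = 6.674e-11            # Newton constant in m^3 kg^-1 s^-2 (rounded)
MPC_IN_M = 3.0857e22       # one megaparsec in metres (rounded)
GYR_IN_S = 3.156e16        # one billion years in seconds (rounded)


def hubble_si(h0_km_s_mpc: float) -> float:
    """Convert a Hubble parameter from km/s/Mpc to 1/s."""
    return h0_km_s_mpc * 1.0e3 / MPC_IN_M


def critical_density(hubble: float) -> float:
    """rho_critical = 3 H^2 / (8 pi G_N), in kg/m^3 for H in 1/s."""
    return 3.0 * hubble ** 2 / (8.0 * math.pi * G_N)


def classify(omega_density: float, tol: float = 1.0e-3) -> str:
    """Type of Universe from the density parameter Omega = rho / rho_critical."""
    if abs(omega_density - 1.0) <= tol:
        return "flat"
    return "closed" if omega_density > 1.0 else "open"


if __name__ == "__main__":
    H0 = hubble_si(70.0)   # assumed illustrative value
    print("critical density [kg/m^3]:", critical_density(H0))
    print("Hubble time 1/H0 [Gyr]:  ", 1.0 / H0 / GYR_IN_S)
    print("Omega = 0.3 ->", classify(0.3))
```

With this assumed H0 the critical density is of order 10^-26 kg/m^3, a few hydrogen atoms per cubic metre, and the Hubble time is of order 10^10 years.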
The spatial curvature k then determines the evolution of the scale factor (see Fig. 4.4, which displays
the cosmic evolution for dust, and similar behaviours would also occur for radiation).
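Observations suggest that our Universe is very close to k = 0. For a flat, matter dominated
Universe, one has
$$ \rho_{\rm dust} \propto \frac{1}{a^3} \ \Rightarrow \ \left( \frac{\dot a}{a} \right)^2 \propto \frac{1}{a^3} \ \Rightarrow \ a\, \dot a^2 \propto 1 \ \Rightarrow \ \sqrt{a}\, da \propto dt \ \Rightarrow \ a^{3/2} \propto t . \tag{4.4.28} $$
Likewise, for a flat, radiation dominated Universe,
$$ \rho_{\rm rad} \propto \frac{1}{a^4} \ \Rightarrow \ \left( \frac{\dot a}{a} \right)^2 \propto \frac{1}{a^4} \ \Rightarrow \ a^2\, \dot a^2 \propto 1 \ \Rightarrow \ a\, da \propto dt \ \Rightarrow \ a^2 \propto t . \tag{4.4.29} $$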
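Finally, for a flat and empty Universe, with only a positive vacuum energy present, one obtains the
exact solution
$$ \rho_{\rm vacuum} = \frac{\Lambda}{8\pi\, G_N} \ \Rightarrow \ \left( \frac{\dot a}{a} \right)^2 = \frac{\Lambda}{3} \ \Rightarrow \ \frac{\dot a}{a} = \sqrt{\frac{\Lambda}{3}} = H_0 \ \Rightarrow \ a \propto e^{H_0 t} \quad \text{(de Sitter Universe)} , \tag{4.4.30} $$
where $H_0$ is now a true cosmological constant.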
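These three behaviours can be cross-checked by integrating the flat (k = 0) Friedmann constraint numerically. The following Python sketch is an illustration under stated assumptions: units are chosen such that the constraint reads da/dt = a^(-(1 + 3 omega)/2), a classical fourth-order Runge-Kutta step is used, and the analytic solution is used only to fix the initial value; none of the function names come from the notes.

```python
import math


def a_dot(a: float, omega: float) -> float:
    """Flat Friedmann constraint in rescaled units: da/dt = a**(-(1 + 3 omega)/2)."""
    return a ** (-(1.0 + 3.0 * omega) / 2.0)


def evolve(a_start: float, omega: float, duration: float, steps: int = 10_000) -> float:
    """Advance the scale factor over `duration` with fixed-step RK4."""
    h = duration / steps
    a = a_start
    for _ in range(steps):
        k1 = a_dot(a, omega)
        k2 = a_dot(a + 0.5 * h * k1, omega)
        k3 = a_dot(a + 0.5 * h * k2, omega)
        k4 = a_dot(a + h * k3, omega)
        a += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return a


def growth_exponent(omega: float, t1: float = 1.0, t2: float = 10.0) -> float:
    """Estimate n in a ~ t**n between t1 and t2 (valid for omega > -1)."""
    if omega <= -1.0:
        raise ValueError("power-law growth requires omega > -1")
    n = 2.0 / (3.0 * (1.0 + omega))       # analytic exponent, used only for a(t1)
    a1 = (t1 / n) ** n                    # solution with a(0) = 0 in these units
    a2 = evolve(a1, omega, t2 - t1)
    return math.log(a2 / a1) / math.log(t2 / t1)


if __name__ == "__main__":
    print("dust      n =", growth_exponent(0.0))        # expected close to 2/3
    print("radiation n =", growth_exponent(1.0 / 3.0))  # expected close to 1/2
    print("vacuum    a(1)/a(0) =", evolve(1.0, -1.0, 1.0))  # expected close to e
```

The measured exponents agree with a^(3/2) proportional to t for dust and a^2 proportional to t for radiation, while the vacuum case grows exponentially.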
4.4.4
Many observations have confirmed that the Universe is filled with an almost homogeneous relic
radiation (CMB). We believe this radiation was generated at very early times, when matter
and radiation decoupled (and photons could therefore start to travel freely) on the surface of last
scattering. Looking back at the energy densities (4.4.16) and (4.4.20), we see that the Universe must
have been much denser and hotter during its early stages. At those high energies, the mean free path
of photons was very short, since they had enough energy to produce pairs of (oppositely charged)
particles, and photons were in (approximate) thermal equilibrium with electrons and positrons
(among others). As the photon energy decreased below the threshold for pair production, photons
became essentially free, and those are the oldest light signals we can detect now, at a temperature
of about 3 K.
The homogeneity of the CMB is surprising. Suppose we look along two opposite directions
in the sky. The light we receive now from those directions will have originated from very distant
places, and one then wonders how such points could have been at the same temperature. In fact,
the two points could never have been in causal contact before (no signal may have yet travelled
between them; see Fig. 4.5). To see this, consider a radial light ray ($ds^2 = 0$) in a flat FRW
Universe, for which
$$ dt = a\, dr \quad \Rightarrow \quad dr = \frac{dt}{a} . \tag{4.4.31} $$
Suppose we place ourselves at r = 0 and integrate the above expression (along the light-cone) from
$t = t_s$ to now (t = 0). We thus find the comoving radial coordinate of the point of origin,
$$ r_s = \int_0^{r_s} dr = \int_{t_s}^{0} \frac{dt}{a(t)} . \tag{4.4.32} $$
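If the Universe is either matter or radiation dominated, we have
$$ a(t) \propto t^\alpha , \quad 0 < \alpha < 1 \quad \Rightarrow \quad r_s \propto t_s^{1 - \alpha} . \tag{4.4.33} $$
The corresponding proper distance is then
$$ R_s = a(t_s)\, r_s \propto t_s , \tag{4.4.34} $$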
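which, quite remarkably, coincides with the Minkowskian result in flat space. One also has
$$ \dot a \propto t^{\alpha - 1} \ \Rightarrow \ H = \frac{\dot a}{a} \propto t^{-1} \ \Rightarrow \ R_H \equiv \frac{1}{H(t_s)} \propto t_s , \tag{4.4.35} $$
so that the Hubble radius $R_H$ grows with time (more of the Universe comes into causal contact with
a given observer).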
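In a vacuum dominated Universe, we instead have
$$ a(t) \propto e^{H_0 t} \quad \Rightarrow \quad r_s \sim \frac{1}{H_0}\, e^{-H_0 t_s} , \tag{4.4.37} $$
which grows exponentially for earlier and earlier emission times, whereas the Hubble radius remains constant,
$$ R_H = \frac{1}{H_0} . \tag{4.4.38} $$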
The latter result can explain the CMB homogeneity: the Universe started out small enough that
all of its parts had time to come into causal contact. It then underwent an early phase of
rapid (almost exponential) expansion, called inflation, during which the initial state of matter was
almost frozen. The CMB originated after the end of inflation, which explains why we do not yet
see the entire Universe, but the CMB is homogeneous.
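The difference between decelerating and exponential expansion can be seen by evaluating the light-cone integral of Eq. (4.4.32) numerically. The Python sketch below is an illustration only: it uses a simple midpoint rule, normalises the scale factors so that a = t^alpha or a = exp(H0 t), and all names and sample values are hypothetical.

```python
import math
from typing import Callable


def comoving_distance(a_of_t: Callable[[float], float],
                      t_start: float, t_end: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of the integral of dt / a(t) from t_start to t_end."""
    h = (t_end - t_start) / steps
    return sum(h / a_of_t(t_start + (i + 0.5) * h) for i in range(steps))


def power_law(alpha: float) -> Callable[[float], float]:
    """Scale factor a(t) = t**alpha (matter: alpha = 2/3, radiation: alpha = 1/2)."""
    return lambda t: t ** alpha


def de_sitter(h0: float) -> Callable[[float], float]:
    """Scale factor a(t) = exp(h0 * t) of a vacuum dominated Universe."""
    return lambda t: math.exp(h0 * t)


if __name__ == "__main__":
    # Power law: the comoving distance covered by light stays finite as t_start -> 0.
    for t_start in (1e-1, 1e-2, 1e-3):
        print("matter, from", t_start, ":",
              comoving_distance(power_law(2.0 / 3.0), t_start, 1.0))
    # De Sitter: it grows without bound as the emission time is pushed back.
    for t_s in (-1.0, -5.0, -10.0):
        print("vacuum, from", t_s, ":", comoving_distance(de_sitter(1.0), t_s, 0.0))
```

In the power-law case the results approach a finite limit, so only a finite comoving region can have been in causal contact, whereas in the exponential case the region grows without bound, which is the property inflation exploits.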
4.4.5 Cosmological redshift
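We would like to assess which type of Universe we live in and the value of $H_0$ by direct observations.
Let us first consider the motion of a particle along a geodesic in a homogeneous and isotropic
space-time. If we denote by $U^\mu = (1, 0, 0, 0)$ the four-velocity of the cosmic fluid, the combination
$$ K_{\mu\nu} = a^2 \left( g_{\mu\nu} + U_\mu U_\nu \right) \tag{4.4.39} $$
is a Killing tensor, that is
$$ \nabla_{(\sigma} K_{\mu\nu)} = 0 . \tag{4.4.40} $$
What matters is that we can use such a quantity like we used the time-like Killing vector of
Schwarzschild geometry to obtain the gravitational redshift formula. If a particle's four-velocity is
$V^\mu = dx^\mu / d\tau$, with $\tau$ an affine parameter along its geodesic, the quantity
$K^2 = K_{\mu\nu} V^\mu V^\nu$ is conserved,
$$ V^\sigma \nabla_\sigma K^2 = 0 . \tag{4.4.42} $$
For a massive particle,
$$ V_\mu V^\mu = -(V^0)^2 + |\vec V|^2 = -1 , \tag{4.4.43} $$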
this implies
$$ K^2 = a^2 \left[ -(V^0)^2 + |\vec V|^2 + (V^0)^2 \right] \quad \Rightarrow \quad |\vec V| = \frac{K}{a} , \tag{4.4.44} $$
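and its three-velocity decreases as the Universe expands. For example, a gas of particles will cool
down as the Universe becomes larger. For photons instead, we have
$$ V_\mu V^\mu = 0 \quad \Rightarrow \quad -U_\mu V^\mu = \frac{K}{a} . \tag{4.4.45} $$
Since the frequency of the photon measured by a comoving observer is
$$ \nu = -U_\mu V^\mu , \tag{4.4.46} $$
a photon emitted at the time $t_1$ and observed at the time $t_2$ is redshifted according to
$$ \frac{\nu(t_2)}{\nu(t_1)} = \frac{a(t_1)}{a(t_2)} \quad \Rightarrow \quad \nu \propto a^{-1} . \tag{4.4.47} $$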
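The same result can be easily obtained from Eq. (4.4.46) and the geodesic equation for the time
component of $V^\mu$, which for the FRW metric reads (with $\tau$ the affine parameter along the
photon trajectory)
$$ \frac{d^2 t}{d\tau^2} = -\frac{\dot a}{a} \left( \frac{dt}{d\tau} \right)^2 \tag{4.4.48} $$
and is solved by
$$ \frac{dt}{d\tau} = \frac{K}{a} , \tag{4.4.49} $$
so that the frequency $\nu = V^0$ measured by comoving observers is
$$ \nu = \frac{K}{a} \propto \frac{1}{a} . \tag{4.4.50} $$
In terms of the wavelength $\lambda \propto 1/\nu$, the redshift z is therefore given by
$$ z \equiv \frac{\lambda_o - \lambda_s}{\lambda_s} = \frac{a_o}{a_s} - 1 , \tag{4.4.51} $$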
where the subscript o means the quantity is taken at the time of observation and s at the (earlier)
time of emission. Unlike the Doppler effect, this redshift is not caused by the relative motion of
emitter and observer, but by the space-time expansion, and can be directly measured.
It is important to remark now that the value of a = a(t) at a given instant of time is not
physically meaningful, since a can always be rescaled by an arbitrary constant. However, the
ratio $a(t_1)/a(t_2)$, for any two times $t_1 \neq t_2$, is instead measurable, in principle, and by
means of Eq. (4.4.51), also in practice. It also gives us a way of measuring distances (indirectly).
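As a small illustration of the last remark (not part of the original notes), the Python sketch below evaluates the redshift from a pair of scale factors, assuming the relation z = a_o/a_s - 1, and checks that only their ratio matters; the function names and sample values are hypothetical.

```python
def redshift(a_obs: float, a_src: float) -> float:
    """Cosmological redshift z = a_obs / a_src - 1."""
    return a_obs / a_src - 1.0


def observed_wavelength(lambda_src: float, a_obs: float, a_src: float) -> float:
    """Wavelength stretched by the expansion: lambda_obs = lambda_src * a_obs / a_src."""
    return lambda_src * a_obs / a_src


def scale_factor_ratio(z: float) -> float:
    """Ratio a_src / a_obs inferred from a measured redshift z."""
    return 1.0 / (1.0 + z)


if __name__ == "__main__":
    # Light emitted when the Universe was half its present size:
    print(redshift(1.0, 0.5), observed_wavelength(500.0, 1.0, 0.5))
    # Rescaling both scale factors by the same constant does not change z:
    print(redshift(3.0, 1.5))
    # Size of the Universe, relative to today, when light with z = 3 was emitted:
    print(scale_factor_ratio(3.0))
```

A measured z therefore fixes a_s/a_o directly, without any knowledge of the normalisation of a.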
4.4.6 Luminosity-distance relation
In astronomy, measuring distances is of course not trivial, but one can measure the apparent
luminosity of an object. One method to estimate distances is then to use the luminosity-distance
relation, which defines the luminosity distance $d_L$, for specific light sources (stars, galaxies,
clusters of galaxies, etc.).
To explain this better, let us denote by F the flux of energy (energy E per unit time T and
area A) measured by an observer and first consider Minkowski space-time. Since the energy E is
conserved during light propagation (there is no gravitational redshift in this case), the total energy
that crosses any sphere centred on the source per unit time, L = E/T, does not depend on the sphere
radius and equals the intrinsic luminosity of the source, $L_0 = E_0 / T_0$. The flux observed on a
portion of unit area of this sphere will then be
$$ F = \frac{L}{A} = \frac{L_0}{4\pi\, R^2} \quad \Rightarrow \quad d_L^2 = R^2 = \frac{L_0}{4\pi\, F} , \tag{4.4.52} $$
which is the trivial luminosity-distance relation for flat space-time, with $d_L$ simply equal to the
sphere's radius.
In a FRW space-time, the energy of each photon is reduced by a factor of (1 + z) while it propagates,
according to Eqs. (4.4.47) and (4.4.51). Moreover, if we assume that the cosmic evolution does not
affect the microscopic mechanisms by which light is emitted, the rate at which the observer
registers the arrival of photons is likewise reduced with respect to the (earlier) rate at
which they were emitted. In fact, let $\delta t$ be the time between two emissions from the source.
In an expanding Universe, the observer will (later) detect these two subsequent signals a time
$(1 + z)\, \delta t$ apart. We therefore have that $L = L_0 / (1 + z)^2$ and the measured flux
$$ F = \frac{L}{A} = \frac{L_0}{4\pi\, (a_0\, r)^2\, (1 + z)^2} \equiv \frac{L_0}{4\pi\, d_L^2} , \tag{4.4.53} $$
where $a_0$ is the scale factor at the time of observation, and $a_0\, r$ the proper radius of the sphere
centred on the source and upon which the observer is placed. From the above, one immediately
obtains
$$ d_L = a_0\, r\, (1 + z) , \tag{4.4.54} $$
which can therefore be used to determine the cosmic scale factor from the measurement of z and
$d_L \propto \sqrt{L_0 / F}$.
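The two factors of (1 + z) can be checked with a short numerical round trip. The Python sketch below is an illustration only: it assumes the flux formula F = L0 / (4 pi (a0 r)^2 (1 + z)^2) and the definition F = L0 / (4 pi d_L^2), with arbitrary sample values and hypothetical function names.

```python
import math


def observed_flux(l0: float, a0: float, r: float, z: float) -> float:
    """Flux from a source of intrinsic luminosity l0 at comoving radius r and redshift z."""
    return l0 / (4.0 * math.pi * (a0 * r) ** 2 * (1.0 + z) ** 2)


def luminosity_distance_from_flux(l0: float, flux: float) -> float:
    """Luminosity distance defined by flux = l0 / (4 pi d_L^2)."""
    return math.sqrt(l0 / (4.0 * math.pi * flux))


if __name__ == "__main__":
    l0, a0, r, z = 3.8e26, 1.0, 2.0e25, 0.5      # arbitrary sample values
    flux = observed_flux(l0, a0, r, z)
    d_l = luminosity_distance_from_flux(l0, flux)
    print(d_l, a0 * r * (1.0 + z))               # the two numbers should agree
```

The distance recovered from the flux is larger than the proper radius a0 r by exactly one factor of (1 + z): the source looks dimmer, hence farther, than it would in a static space.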
In order to measure the redshift, we need to know the original frequency of the light emitted
from the source. Luckily, most astrophysical sources show clear spectral lines of emission or
absorption. We also need to know the intrinsic luminosity $L_0$ of the source. For this purpose,
one can use variable stars which show a specific relation between the period of oscillation of their
(apparent) luminosity and the absolute luminosity (defined as the intrinsic luminosity measured
from a standard distance). For larger distances, one can instead use galaxies with similar properties.
Altogether, these preferred sources are called standard candles, and form the so-called
cosmic distance ladder. Estimating their intrinsic luminosity accurately is therefore very important,
since any error would introduce a systematic bias in the measurement of distances across the
Universe.
4.4.7 Hubble law
Let us start from the redshift (4.4.51),
$$ z = \frac{a_0}{a_s} - 1 , \tag{4.4.55} $$
where $a_0$ and $a_s$ are the scale factors at the time of detection ($t = t_0$, today) and of emission
($t = t_s$), respectively. Then, along a null ray, and for $k\, r^2 \ll 1$, we easily find
$$ ds^2 = -dt^2 + \frac{a^2\, dr^2}{1 - k\, r^2} = 0 \quad \Rightarrow \quad \int_t^{t_0} \frac{dt'}{a(t')} = \int_0^r \frac{dr'}{(1 - k\, r'^2)^{1/2}} \simeq \int_0^r dr' . \tag{4.4.56} $$
Upon expanding the scale factor at the emission time $t_s = t$ around $t_0$,
$$ a(t) = a_0 + \dot a_0\, (t - t_0) + \frac{1}{2}\, \ddot a_0\, (t - t_0)^2 + \ldots , \tag{4.4.57} $$
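we then obtain
$$ r = \frac{1}{a_0} \left[ (t_0 - t) + \frac{1}{2}\, H_0\, (t_0 - t)^2 + \ldots \right] , \tag{4.4.58} $$
and
$$ \frac{a_s}{a_0} = \frac{1}{1 + z} = 1 + H_0\, (t - t_0) - \frac{1}{2}\, q_0\, H_0^2\, (t - t_0)^2 + \ldots , \tag{4.4.59} $$
where $H_0 = \dot a_0 / a_0$ and, for a flat Universe filled with a single fluid with $p = \omega\, \rho$,
$$ q_0 = -\frac{a_0\, \ddot a_0}{\dot a_0^2} = \frac{1 + 3\, \omega}{2} . \tag{4.4.60} $$
Inverting Eq. (4.4.59) for small z yields
$$ t_0 - t = \frac{1}{H_0} \left[ z - \left( 1 + \frac{q_0}{2} \right) z^2 + \ldots \right] . \tag{4.4.61} $$
Replacing the above into the expression for r, and using Eq. (4.4.54), we finally obtain the Hubble law
$$ d_L = \frac{1}{H_0} \left[ z + \frac{1}{2}\, (1 - q_0)\, z^2 + \ldots \right] \simeq \frac{z}{H_0} , \tag{4.4.62} $$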
that is, (not too distant) galaxies recede from us with a velocity $v \simeq z$ (directly) proportional
to the distance $d_L$. The constant of proportionality is the Hubble constant $H_0$, whose inverse is
therefore representative of the age of the Universe.
Needless to say, the distance that appears in the Hubble relation is in fact the
luminosity distance, and the velocity is not quite the same quantity that determines the Doppler
effect in flat space. But for sufficiently small distances, the two sets of concepts can be roughly
interchanged.
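The range of validity of the truncated series can be gauged against a case where the luminosity distance is known in closed form. The Python sketch below is an illustration under stated assumptions: it uses the second-order expansion d_L = [z + (1 - q0) z^2 / 2] / H0, and compares it with the standard exact result for a flat, dust-dominated (Einstein-de Sitter) Universe, d_L = (2 / H0) (1 + z) [1 - (1 + z)^(-1/2)], for which q0 = 1/2. That exact formula is not derived in these notes, and the function names are hypothetical.

```python
import math


def d_l_series(z: float, h0: float, q0: float) -> float:
    """Hubble law to second order in z: d_L = [z + (1 - q0) z^2 / 2] / H0."""
    return (z + 0.5 * (1.0 - q0) * z * z) / h0


def d_l_einstein_de_sitter(z: float, h0: float) -> float:
    """Exact luminosity distance for a flat, dust-dominated Universe (q0 = 1/2)."""
    return 2.0 * (1.0 + z) * (1.0 - 1.0 / math.sqrt(1.0 + z)) / h0


if __name__ == "__main__":
    h0 = 1.0                                   # distances in units of 1/H0
    for z in (0.01, 0.1, 0.5, 1.0):
        exact = d_l_einstein_de_sitter(z, h0)
        approx = d_l_series(z, h0, q0=0.5)
        print(f"z = {z:4.2f}  exact = {exact:.5f}  series = {approx:.5f}  "
              f"linear = {z / h0:.5f}")
```

The linear law is accurate at the percent level only for z well below 0.1, and the quadratic correction, which carries the information on q0, becomes relevant at larger redshifts.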
Bibliography
[1] R. Resnick, Introduction to special relativity, J. Wiley and Sons (1968).
[2] M. Kaku, Quantum field theory: a modern introduction, Oxford Univ. Press (1993).
[3] B. Schutz, Geometrical methods of mathematical physics, Cambridge Univ. Press (1980).
[4] B. Schutz, A first course in general relativity, Cambridge Univ. Press (2009).
[5] S. Carroll, Spacetime and geometry, Addison-Wesley (2004). See also arXiv:gr-qc/9712019.
[6] L. Landau and E. Lifsits, Teoria dei campi, Editori Riuniti (1976).