PHYS30201


Mathematical Foundations of Quantum Mechanics

Original notes by Judith McGovern (December 2018)

Updated by Mike Birse

November 22, 2019


Contents

1 The Fundamentals of Quantum Mechanics
  1.1 Postulates of Quantum Mechanics
  1.2 From the ket to the wave function
  1.3 The propagator or time-evolution operator
  1.4 Simple examples
    1.4.1 Two-state system
    1.4.2 Propagator in free space
  1.5 Ehrenfest's Theorem and the Classical Limit
  1.6 The Harmonic Oscillator Without Tears
  1.7 Product spaces
    1.7.1 Product spaces and entanglement
    1.7.2 Operators in product spaces
    1.7.3 Two or more particle wave functions
    1.7.4 Decoupling

2 Angular momentum
  2.1 General properties of angular momentum
  2.2 Electron spin and the Stern-Gerlach experiment
  2.3 Spin-½
    2.3.1 Spin precession
    2.3.2 Spin and measurement: Stern-Gerlach revisited
  2.4 Higher spins
  2.5 Addition of angular momenta
    2.5.1 Using tables of Clebsch-Gordan coefficients
    2.5.2 Example: Two spin-½ particles
    2.5.3 Angular Momentum of Atoms and Nuclei
  2.6 Vector Operators

3 Approximate methods I: variational method and WKB
  3.1 Approximate methods in Quantum Mechanics
  3.2 Variational methods: ground state
    3.2.1 Variational methods: excited states
    3.2.2 Variational methods: the helium atom
  3.3 WKB approximation
    3.3.1 WKB approximation for bound states
    3.3.2 WKB approximation for tunnelling

4 Approximate methods II: Time-independent perturbation theory
  4.1 Non-degenerate perturbation theory
    4.1.1 Connection to variational approach
    4.1.2 Perturbed infinite square well
    4.1.3 Perturbed harmonic oscillator
  4.2 Degenerate perturbation theory
    4.2.1 Example of degenerate perturbation theory
    4.2.2 Symmetry as a guide to the choice of basis
  4.3 The fine structure of hydrogen
    4.3.1 Pure Coulomb potential and nomenclature
    4.3.2 Fine structure: the lifting of l degeneracy
  4.4 The Zeeman effect: hydrogen in an external magnetic field
  4.5 The Stark effect: hydrogen in an external electric field

5 Quantum Measurement
  5.1 The Einstein-Podolsky-Rosen "paradox" and Bell's inequalities

A Revision and background
  A.1 Index notation—free and dummy indices
  A.2 Vector spaces
    A.2.1 Vectors and operators in finite spaces
    A.2.2 Functions as vectors
    A.2.3 Commutators
  A.3 Recap of 2nd year quantum mechanics
    A.3.1 The wave function and Schrödinger's equation
    A.3.2 Measurement
    A.3.3 Bound states
    A.3.4 Circular and spherical symmetry
    A.3.5 Tunnelling
  A.4 Series Solution of Hermite's equation and the Harmonic Oscillator
  A.5 Angular Momentum in Quantum Mechanics
  A.6 Hydrogen wave functions
  A.7 Properties of δ-functions
  A.8 Gaussian integrals
  A.9 Airy functions
  A.10 Units in EM
1. The Fundamentals of Quantum Mechanics

1.1 Postulates of Quantum Mechanics


Shankar 4.1; Mandl 1; Griffiths 3

Summary: All of quantum mechanics follows from a small set of assumptions, which cannot themselves be derived.

All of quantum mechanics follows from a small set of assumptions, which cannot themselves
be derived. There is no unique formulation or even number of postulates, but all formulations
I’ve seen have the same basic content. This formulation follows Shankar most closely, though
he puts III and IV together. Nothing significant should be read into my separating them (as
many other authors do), it just seems easier to explore the consequences bit by bit.
I: The state of a particle is given by a vector |ψ(t)⟩ in a Hilbert space. The state is normalised: ⟨ψ(t)|ψ(t)⟩ = 1.
This is as opposed to the classical case where the position and momentum can be specified at
any given time.
This is a pretty abstract statement, but more informally we can say that the wave function
ψ(x, t) contains all possible information about the particle. How we extract that information
is the subject of subsequent postulates.
The really major consequence we get from this postulate is superposition, which is behind most
quantum weirdness such as the two-slit experiment.
II: There is a Hermitian operator corresponding to each observable property of the particle. Those corresponding to position x̂ and momentum p̂ satisfy [x̂_i, p̂_j] = iℏδ_ij.
Other examples of observable properties are energy and angular momentum. The choice of these operators may be guided by classical physics (e.g. p̂·p̂/2m for kinetic energy and x̂ × p̂ for orbital angular momentum), but ultimately is verified by experiment (e.g. Pauli matrices for spin-½ particles).
The commutation relation for x̂ and p̂ is a formal expression of Heisenberg's uncertainty principle.
III: Measurement of the observable associated with the operator Ω̂ will result in one of the eigenvalues ω_i of Ω̂. Immediately after the measurement the particle will be in the corresponding eigenstate |ω_i⟩.
This postulate ensures reproducibility of measurements. If the particle was not initially in the state |ω_i⟩ the result of the measurement was not predictable in advance, but for the result of a measurement to be meaningful the result of a subsequent measurement must be predictable. ("Immediately" reflects the fact that subsequent time evolution of the system will change the value of ω unless it is a constant of the motion.)
IV: The probability of obtaining the result ω_i in the above measurement (at time t₀) is |⟨ω_i|ψ(t₀)⟩|².
If a particle (or an ensemble of particles) is repeatedly prepared in the same initial state |ψ(t₀)⟩ and the measurement is performed, the result each time will in general be different (assuming this state is not an eigenstate of Ω̂; if it is, the result will be the corresponding ω_i each time). Only the distribution of results can be predicted. The postulate expressed this way has the same content as saying that the average value of ω is given by ⟨ψ(t₀)|Ω̂|ψ(t₀)⟩. (Note the distinction between repeated measurements on freshly-prepared particles, and repeated measurements on the same particle, which will give the same ω_i each subsequent time.)
Note that if we expand the state in the (orthonormal) basis {|ω_i⟩}, |ψ(t₀)⟩ = Σ_i c_i|ω_i⟩, the probability of obtaining the result ω_i is |c_i|², and ⟨Ω̂⟩ = Σ_i |c_i|² ω_i.
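These two postulates are easy to illustrate numerically. The following is a minimal sketch of my own (not from the notes), assuming numpy is available; the 2×2 observable and the state are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative Hermitian observable and normalised state in a 2-D space.
Omega = np.array([[0.0, 1.0],
                  [1.0, 0.0]])                  # eigenvalues are +1 and -1
psi = np.array([np.sqrt(0.8), np.sqrt(0.2)])    # |c1|^2 + |c2|^2 = 1

# Postulate III: the possible results are the eigenvalues of Omega.
evals, evecs = np.linalg.eigh(Omega)            # columns of evecs are the |omega_i>

# Postulate IV: P(omega_i) = |<omega_i|psi>|^2, and <Omega> = sum_i P_i omega_i.
probs = np.abs(evecs.conj().T @ psi) ** 2
print(evals, probs)                             # outcomes and their probabilities
print(probs @ evals, psi.conj() @ Omega @ psi)  # the two forms of <Omega> agree
```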

V: The time evolution of the state |ψ(t)⟩ is given by iℏ (d/dt)|ψ(t)⟩ = Ĥ|ψ(t)⟩, where Ĥ is the operator corresponding to the classical Hamiltonian.
In most cases the Hamiltonian is just the energy and is expressed as p̂·p̂/2m + V(x̂). (They differ in some cases though—see texts on classical mechanics such as Kibble and Berkshire.) In the presence of velocity-dependent forces such as magnetism the Hamiltonian is still equal to the energy, but its expression in terms of p̂ is more complicated.
VI: The Hilbert space for a system of two or more subsystems (for example, several particles)
is a product space.
This is true whether the subsystems interact or not, i.e. if the states |φ_i⟩ span the space for one subsystem, the states |φ_i⟩ ⊗ |φ_j⟩ will span the space for two. If they do interact, the eigenstates of the Hamiltonian will not be simple products of that form, but will be linear superpositions of such states.

1.2 From the ket to the wave function


Shankar 4.3; Mandl 1.3; Griffiths 3.1,2

Summary: The position-space representation allows us to make contact with quantum mechanics expressed in terms of wave functions.

The position operator in 3-D is x̂, and has eigenkets |r⟩. A state of a particle can therefore be associated with a function of position, the wave function: ψ(r, t) = ⟨r|ψ(t)⟩. Note that position and time are treated quite differently in non-relativistic quantum mechanics. There is no operator corresponding to time, and t is just part of the label of the state. By the fourth postulate, the probability of finding the particle in an infinitesimal volume dV at a position r, ρ(r, t)dV, is given by ρ(r, t) = |⟨r|ψ(t)⟩|² = |ψ(r, t)|². Thus a measurement of position can yield many answers, and as well as an average x-position ⟨ψ|x̂|ψ⟩ there will be an uncertainty ∆x, where (∆x)² = ⟨ψ|x̂²|ψ⟩ − ⟨ψ|x̂|ψ⟩².
Though the notation ⟨ψ|Â|φ⟩ is compact, to calculate it when Â is a function of position and momentum operators we will usually immediately substitute the integral form ∫ ψ*(r) Â φ(r) d³r over all space, where—rather sloppily—Â is here understood as the position-representation of the operator.
The momentum operator p̂ has the representation in the position basis of −iℏ∇. Eigenstates of momentum, in the position representation, are just plane waves. In order that they satisfy the normalisations

Î = ∫ |p⟩⟨p| d³p   and   ⟨p|p′⟩ = δ(p − p′) = δ(p_x − p′_x)δ(p_y − p′_y)δ(p_z − p′_z),

we need

φ_p(r) ≡ ⟨r|p⟩ = (2πℏ)^(−3/2) e^(ip·r/ℏ).

From the time-evolution equation iℏ (d/dt)|ψ(t)⟩ = Ĥ|ψ(t)⟩ we obtain in the position representation

iℏ ∂ψ(r, t)/∂t = Ĥψ(r, t),

which is the Schrödinger equation. The x-representation of Ĥ is usually −ℏ²∇²/2m + V(r). The Hamiltonian is a Hermitian operator, and so its eigenstates |E_n⟩ form a basis in the space.
Together with the probability density ρ(r) = |ψ(r)|², we also have a probability flux or current

j(r) = −(iℏ/2m)(ψ*(r)∇ψ(r) − ψ(r)∇ψ*(r)).

The continuity equation ∇·j = −∂ρ/∂t, which ensures local conservation of probability density, follows from the Schrödinger equation.
A two-particle state has a wave function which is a function of the two positions (6 coordinates), Ψ(r₁, r₂). For states of non-interacting distinguishable particles where it is possible to say that the first particle is in single-particle state |ψ⟩ and the second in |φ⟩, Ψ(r₁, r₂) = ψ(r₁)φ(r₂). But most two-particle states are not factorisable (separable) like that one. The underlying vector space for two particles is called a "tensor direct product space", with separable states written |Ψ⟩ = |ψ⟩ ⊗ |φ⟩, where "⊗" is a separator which is omitted in some texts. We will come back to this soon.

1.3 The propagator or time-evolution operator


Shankar 4.3

Summary: Constructing the time-evolution operator from the Hamiltonian allows us to find the time evolution of a general state.

The Schrödinger equation tells us the rate of change of the state at a given time. From that we can deduce an operator that acts on the state at time t₀ to give that at a subsequent time t: |ψ(t)⟩ = Û(t, t₀)|ψ(t₀)⟩, which is called the propagator or time-evolution operator. We need the identity

lim_{N→∞} (1 + x/N)^N = e^x

(to prove it, take the log of the L.H.S. and use the Taylor expansion for ln(1 + x) about the point x = 0).
An infinitesimal time step Û(t+dt, t) follows immediately from the Schrödinger equation:

iℏ (d/dt)|ψ(t)⟩ = Ĥ|ψ(t)⟩  ⇒  |ψ(t+dt)⟩ − |ψ(t)⟩ = −(i/ℏ)Ĥ dt |ψ(t)⟩
                            ⇒  |ψ(t+dt)⟩ = (1 − (i/ℏ)Ĥ dt)|ψ(t)⟩.

For a finite time interval t − t₀, we break it into N small steps and take the limit N → ∞, in which limit every step is infinitesimal and we can use the previous result N times:

|ψ(t)⟩ = lim_{N→∞} (1 − (i/ℏ)Ĥ(t−t₀)/N)^N |ψ(t₀)⟩ = e^(−iĤ(t−t₀)/ℏ)|ψ(t₀)⟩ ≡ Û(t, t₀)|ψ(t₀)⟩.

We note that this is a unitary operator (the exponential of i times a Hermitian operator always is). Thus, importantly, it conserves the norm of the state; there remains a unit probability of finding the particle somewhere!
If |ψ(t₀)⟩ is an eigenfunction |n⟩ of the Hamiltonian with energy E_n,

|ψ(t)⟩ = Û(t, t₀)|n⟩ = e^(−iE_n(t−t₀)/ℏ)|n⟩.

If we are able to decompose |ψ(t₀)⟩ as a sum of such terms, |ψ(t₀)⟩ = Σ_n c_n|n⟩ with c_n = ⟨n|ψ(t₀)⟩, then

|ψ(t)⟩ = Σ_n c_n e^(−iE_n(t−t₀)/ℏ)|n⟩;

each term evolves with a different phase and non-trivial time evolution takes place. Note that this implies an alternative form for the propagator:

Û(t, t₀) = Σ_n e^(−iE_n(t−t₀)/ℏ)|n⟩⟨n|.
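This alternative form is easily checked numerically. A minimal sketch (mine, not from the notes), assuming numpy and scipy are available and setting ℏ = 1, with a randomly generated Hermitian matrix standing in for Ĥ:

```python
import numpy as np
from scipy.linalg import expm

hbar, t = 1.0, 0.7
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (A + A.conj().T) / 2                  # a random Hermitian "Hamiltonian"

# Direct exponentiation ...
U1 = expm(-1j * H * t / hbar)

# ... versus the spectral form  U = sum_n exp(-i E_n t / hbar) |n><n|
E, V = np.linalg.eigh(H)                  # columns of V are the eigenstates |n>
U2 = V @ np.diag(np.exp(-1j * E * t / hbar)) @ V.conj().T

print(np.allclose(U1, U2))                       # True: the two forms agree
print(np.allclose(U1.conj().T @ U1, np.eye(4)))  # True: unitary, norm conserved
```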

(Aside: If the Hamiltonian depends explicitly on time, we have

Û(t, t₀) = T exp(−(i/ℏ) ∫_{t₀}^{t} Ĥ(t′) dt′),

where the time-ordered exponential denoted by T exp means that in expanding the exponential, the operators are ordered so that Ĥ(t₁) always sits to the right of Ĥ(t₂) (so that it acts first) if t₁ < t₂. This will come up in Advanced Quantum Mechanics.)

1.4 Simple examples

1.4.1 Two-state system


Let us introduce a toy system with which to explore some of the ideas from the postulates. Consider a quantum system in which the states belong to a two-dimensional, rather than infinite-dimensional, vector space, spanned by the two orthonormal states {|a+⟩, |a−⟩} (notation to be explained shortly). We will need two operators in this space, Â and B̂; the basis states are eigenstates of Â with eigenvalues ±1 and the action of B̂ on the basis states is

B̂|a+⟩ = |a−⟩  and  B̂|a−⟩ = |a+⟩.


In this basis the states and the two operators have the following representations:

|a+⟩ → (1, 0)ᵀ,   |a−⟩ → (0, 1)ᵀ,   Â → [ 1  0 ]   B̂ → [ 0  1 ]
                                         [ 0 −1 ],       [ 1  0 ].

The eigenkets of B̂ are

|b±⟩ = √(1/2) (|a+⟩ ± |a−⟩) → √(1/2) (1, ±1)ᵀ

with eigenvalues ±1.

Measurement

The flow-chart below represents an arbitrary series of measurements on a particle (or series of identically prepared particles) in an unknown initial state. We carry out consecutive measurements "immediately", that is, quickly compared with the timescale which characterises the evolution of the system in between measurements. We will talk of "measuring A" when we strictly mean "measuring the physical quantity associated with the operator Â".

[Flow chart: measure A (outcomes a = ±1); after the outcome a = +1, measure B (outcomes b = ±1); after b = −1, measure B again; finally measure A once more.]

A priori, the possible outcomes on measuring A are the eigenvalues of Â, ±1. In general the particle will not start out in an eigenstate of Â, so either outcome is possible, with probabilities that depend on the initial state.
If we obtain the outcome a = +1 and then measure B, what can we get? We know that the state is now no longer the unknown initial state but |a+⟩. The possible outcomes are b = +1 with probability |⟨b+|a+⟩|² and b = −1 with probability |⟨b−|a+⟩|². Both of these probabilities are 1/2: there is a 50:50 chance of getting b = ±1. (Note that the difference between this and the previous measurement of A, where we did not know the probabilities, is that now we know the state before the measurement.)
If we obtain the outcome b = −1 and then measure B again immediately, we can only get b = −1 again. (This is reproducibility.) The particle is in the state |b−⟩ before the measurement, an eigenstate of B̂.
Finally we measure A again. What are the possible outcomes and their probabilities?

Propagation

First let us consider the time-evolution of this system if the Hamiltonian is Ĥ = ℏγB̂. Assume we start the evolution at t = 0 with the system in the state |ψ(0)⟩. Then |ψ(t)⟩ = Û(t, 0)|ψ(0)⟩ with Û(t, 0) = e^(−iĤt/ℏ). Now in general the exponential of an operator can't be found in closed form, but in this case it can, because B̂² = Î and so B̂³ = B̂. So in the power series that defines the exponential, successive terms will be alternately proportional to B̂ and Î:

Û(t, 0) = e^(−iγtB̂) = Î − iγtB̂ − ½γ²t²B̂² + i(1/3!)γ³t³B̂³ + ...
        = (1 − (γt)²/2! + (γt)⁴/4! − ...)Î − i(γt − (γt)³/3! + (γt)⁵/5! − ...)B̂
        = cos γt Î − i sin γt B̂ → [  cos γt   −i sin γt ]
                                  [ −i sin γt   cos γt  ]

So if we start, say, with |ψ(0)⟩ = |b+⟩, an eigenstate of B̂, as expected we stay in the same state: |ψ(t)⟩ = Û(t, 0)|b+⟩ = e^(−iγt)|b+⟩. All that happens is a change of phase. But if we start with |ψ(0)⟩ = |a+⟩,

|ψ(t)⟩ = cos γt |a+⟩ − i sin γt |a−⟩.

Of course we can rewrite this as

|ψ(t)⟩ = √(1/2) (e^(−iγt)|b+⟩ + e^(iγt)|b−⟩)

as expected. The expectation value of Â is not constant: ⟨ψ(t)|Â|ψ(t)⟩ = cos 2γt. The system oscillates between |a+⟩ and |a−⟩ with a frequency 2γ. (This is twice as fast as you might think—but after time π/γ the state of the system is −|a+⟩, which is not distinguishable from |a+⟩.)
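A quick numerical check of this closed form — a sketch of mine assuming numpy and scipy, with Â and B̂ as in the matrix representations above:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1, 0], [0, -1]], dtype=complex)
B = np.array([[0, 1], [1, 0]], dtype=complex)
gamma, t = 1.0, 0.4

U = expm(-1j * gamma * t * B)             # H = hbar*gamma*B, so U = exp(-i gamma t B)
# The closed form derived above: cos(gamma t) I - i sin(gamma t) B
U_closed = np.cos(gamma * t) * np.eye(2) - 1j * np.sin(gamma * t) * B
print(np.allclose(U, U_closed))           # True

psi0 = np.array([1, 0], dtype=complex)    # |a+>
psi_t = U @ psi0
print(np.real(psi_t.conj() @ A @ psi_t), np.cos(2 * gamma * t))  # both equal cos(2 gamma t)
```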

1.4.2 Propagator in free space


One case where the propagator can be calculated even in position space is the case of a free particle, in which the Hamiltonian is Ĥ = p̂²/2m. We want to be able to find ψ(r, t) given ψ(r, 0), using

ψ(r, t) = ⟨r|ψ(t)⟩ = ⟨r|Û(t, 0)|ψ(0)⟩ = ∫ ⟨r|Û(t, 0)|r′⟩ ψ(r′, 0) d³r′.

The object U(r, r′; t, 0) ≡ ⟨r|Û(t, 0)|r′⟩ is the position-space matrix element of the propagator. (Some texts call this the propagator, referring to Û only as the time-evolution operator.) It is the amplitude for finding the particle at position r at time t, given that at time 0 it was at r′. To calculate it we will use the fact that momentum eigenstates |p⟩ are eigenstates of Ĥ:
⟨r|Û(t, 0)|r′⟩ = ∫∫ ⟨r|p⟩⟨p|Û(t, 0)|p′⟩⟨p′|r′⟩ d³p d³p′
             = ∫∫ ⟨r|p⟩⟨p| exp(−ip̂²t/2mℏ) |p′⟩⟨p′|r′⟩ d³p d³p′
             = (1/(2πℏ)³) ∫∫ exp(ip·r/ℏ) exp(−ip²t/2mℏ) δ(p − p′) exp(−ip′·r′/ℏ) d³p d³p′
             = (1/(2πℏ)³) ∫ exp(−ip²t/2mℏ + ip·(r − r′)/ℏ) d³p
             = (m/2πiℏt)^(3/2) exp(im|r − r′|²/2ℏt)
In the last stage, to do the three Gaussian integrals (dp_x dp_y dp_z) we "completed the square", shifted the variables and used the standard result ∫ e^(−αx²) dx = √(π/α), which is valid even if α is imaginary.
Suppose the initial wave function is a spherically symmetric Gaussian wave packet with width ∆:

ψ(r, 0) = N exp(−|r|²/2∆²)  with  N = (π∆²)^(−3/4).

Then the (pretty ghastly) Gaussian integrals give

ψ(r, t) = N (m/2πiℏt)^(3/2) ∫ exp(im|r − r′|²/2ℏt) exp(−|r′|²/2∆²) d³r′
        = N′ exp(−|r|²/(2∆²(1 + iℏt/m∆²)))

where N′ does preserve the normalisation but we do not display it. This is an odd-looking function, but the probability density is more revealing:

P(r, t) = |ψ(r, t)|² = [π(∆² + (ℏt/m∆)²)]^(−3/2) exp(−|r|²/(∆² + (ℏt/m∆)²));

this is a Gaussian wavepacket with width ∆(t) = √(∆² + (ℏt/m∆)²). The narrower the initial wavepacket (in position space), the faster the subsequent spread, which makes sense as the momentum-space wave function will be wide, built up of high-momentum components. On the other hand for a massive particle with ∆ not too small, the spread will be slow. For m = 1 g and ∆(0) = 1 µm, it would take longer than the age of the universe for ∆(t) to double.
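That last claim is a one-line estimate. A sketch assuming numpy, taking the age of the universe to be roughly 4.4 × 10¹⁷ s:

```python
import numpy as np

hbar = 1.054571817e-34   # J s
m = 1e-3                 # 1 g in kg
Delta = 1e-6             # 1 micron in m

# Delta(t) = Delta * sqrt(1 + (hbar t / (m Delta^2))^2) doubles when
# hbar t / (m Delta^2) = sqrt(3):
t_double = np.sqrt(3) * m * Delta**2 / hbar
age_universe = 4.4e17    # s, roughly 13.8 Gyr
print(t_double, t_double / age_universe)   # ~1.6e19 s, i.e. tens of universe-ages
```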

1.5 Ehrenfest’s Theorem and the Classical Limit


Shankar chs 2.7, 6; Mandl 3.2; (Griffiths 3.5.3)

Summary: The form of classical mechanics which inspired Heisenberg's formulation of quantum mechanics allows us to see when particles should behave classically.

Using iℏ (d/dt)|ψ(t)⟩ = Ĥ|ψ(t)⟩, and hence −iℏ (d/dt)⟨ψ(t)| = ⟨ψ(t)|Ĥ, and writing ⟨Ω̂⟩ ≡ ⟨ψ(t)|Ω̂|ψ(t)⟩, we have Ehrenfest's Theorem:

d⟨Ω̂⟩/dt = (1/iℏ)⟨[Ω̂, Ĥ]⟩ + ⟨∂Ω̂/∂t⟩.

The second term is absent if Ω̂ is a time-independent operator (like position, momentum, spin...). Note we are distinguishing between intrinsic time-dependence of an operator, and the time-dependence of its expectation value in a given state.
This is very reminiscent of a result which follows from Hamilton's equations in classical mechanics, for a function Ω(p, x, t) of position, momentum (and possibly time explicitly):

dΩ/dt = (∂Ω/∂x)(dx/dt) + (∂Ω/∂p)(dp/dt) + ∂Ω/∂t
      = (∂Ω/∂x)(∂H/∂p) − (∂Ω/∂p)(∂H/∂x) + ∂Ω/∂t
      ≡ {Ω, H} + ∂Ω/∂t,

where the notation {Ω, H} is called the Poisson bracket of Ω and H, and is simply defined in terms of the expression on the line above which it replaced. (For Ω = x and Ω = p we can in fact recover Hamilton's equations for ṗ and ẋ from this more general expression.)
In fact for Ĥ = p̂²/2m + V(x̂), we can further show that

d⟨x̂⟩/dt = ⟨p̂/m⟩  and  d⟨p̂⟩/dt = −⟨dV(x̂)/dx̂⟩,

which looks very close to Newton's laws. Note though that ⟨dV(x̂)/dx̂⟩ ≠ d⟨V(x̂)⟩/d⟨x̂⟩ in general.
This correspondence is not just a coincidence, in the sense that Heisenberg was influenced by it
in coming up with his formulation of quantum mechanics. It confirms that it is the expectation
value of an operator, rather than the operator itself, which is closer to the classical concept of
the time evolution of some quantity as a particle moves along a trajectory.
A further similarity is that in both quantum and classical mechanics, anything that commutes
with the Hamiltonian (vanishing Poisson bracket in the latter case) is a constant of the motion
(a conserved quantity). Examples are momentum for a free particle and angular momentum
for a particle in a spherically symmetric potential.
In the QM case, we further see that even if [Ω̂, Ĥ] ≠ 0, if the system is in an eigenstate of Ĥ the expectation value of Ω̂ will not change with time. That's why the eigenstates of the Hamiltonian are also called stationary states.
Similarity of formalism is not the same as identity of concepts though. Ehrenfest’s Theorem
does not say that the expectation value of a quantity follows a classical trajectory in general.
What it does ensure is that if the uncertainty in the quantity is sufficiently small, in other words
if ∆x and ∆p are both small (in relative terms) then the quantum motion will approximate
the classical path. Of course because of the uncertainty principle, if ∆x is small then ∆p is
large, and it can only be relatively small if p itself is really large—i.e. if the particle’s mass
is macroscopic. More specifically, we can say that we will be in the classical regime if the de
Broglie wavelength is much less than the (experimental) uncertainty in x. (In the Stern-Gerlach
experiment the atoms are heavy enough that (for a given component of their magnetic moment)
they follow approximately classical trajectories through the inhomogeneous magnetic field.)

1.6 The Harmonic Oscillator Without Tears


Shankar 7.4,5; Mandl 12.5; Griffiths 2.3.1

Summary: Operator methods lead to a new way of viewing the harmonic oscillator in which quanta of energy are primary.
We are concerned with a particle of mass m in a harmonic oscillator potential ½kx² ≡ ½mω²x², where ω is the classical frequency of oscillation. The Hamiltonian is

Ĥ = p̂²/2m + ½mω²x̂²

and we are going to temporarily forget that we know what the energy levels and wave functions are. In order to work with dimensionless quantities, we define the length x₀ = √(ℏ/mω). Then if we define

â = (1/√2)(x̂/x₀ + i(x₀/ℏ)p̂)  and  ↠= (1/√2)(x̂/x₀ − i(x₀/ℏ)p̂)

we can prove the following:


• x̂ = (x₀/√2)(↠+ â);  p̂ = (iℏ/√2x₀)(↠− â)

• [x̂, p̂] = iℏ ⇒ [â, â†] = 1

• Ĥ = ℏω(â†â + ½)

• [Ĥ, â] = −ℏωâ and [Ĥ, â†] = ℏωâ†

Without any prior knowledge of this system, we can derive the spectrum and the wave functions of the energy eigenstates. We start by assuming we know one normalised eigenstate of Ĥ, |n⟩, with energy E_n. Since

E_n = ⟨n|Ĥ|n⟩ = ℏω⟨n|â†â + ½|n⟩ = ℏω⟨n|â†â|n⟩ + ½ℏω

and also ⟨n|â†â|n⟩ = ⟨ân|ân⟩ ≥ 0, we see that E_n ≥ ½ℏω. There must therefore be a lowest-energy state, |0⟩ (not the null state!).
Now consider the state â|n⟩. Using the commutator [Ĥ, â] above we have

Ĥâ|n⟩ = âĤ|n⟩ − ℏωâ|n⟩ = (E_n − ℏω)â|n⟩,

so â|n⟩ is another eigenstate with energy E_n − ℏω. A similar calculation shows that â†|n⟩ is another eigenstate with energy E_n + ℏω. So starting with |n⟩ it seems that we can generate an infinite tower of states with energies higher and lower by multiples of ℏω.
However this contradicts the finding that there is a lowest energy state, |0⟩. Looking more closely at the argument, though, we see there is a get-out: either â|n⟩ is another energy eigenstate or it vanishes. Hence â|0⟩ = 0 (where 0 is the null state or vacuum).
The energy of this ground state is E₀ = ⟨0|Ĥ|0⟩ = ½ℏω. The energy of the state |n⟩, the nth excited state, obtained by n applications of â†, is therefore (n + ½)ℏω. Thus

Ĥ|n⟩ ≡ ℏω(â†â + ½)|n⟩ = (n + ½)ℏω|n⟩

and it follows that â†â is a "number operator", with â†â|n⟩ = n|n⟩. The number in question is the number of the excited state (n = 1—first excited state, etc.) but also the number of quanta of energy in the oscillator.
Up to a phase, which we chose to be zero, the normalisations of the states |n⟩ are:

â|n⟩ = √n |n−1⟩  and  â†|n⟩ = √(n+1) |n+1⟩.

As a result we have

|n⟩ = ((â†)ⁿ/√(n!)) |0⟩.

The operators â† and â are called "raising" and "lowering" operators, or collectively "ladder" operators.
We can also obtain the wave functions in this approach. Writing φ₀(x) ≡ ⟨x|0⟩, from ⟨x|â|0⟩ = 0 we obtain dφ₀/dx = −(x/x₀²)φ₀ and hence

φ₀ = (πx₀²)^(−1/4) e^(−x²/2x₀²)

(where the normalisation has to be determined separately). This is a much easier differential equation to solve than the one which comes direct from the Schrödinger equation!
The wave function for the n-th state is

φ_n(x) = (1/√(2ⁿn!)) (x/x₀ − x₀ d/dx)ⁿ φ₀(x) = (1/√(2ⁿn!)) H_n(x/x₀) φ₀(x),

where here the definition of the Hermite polynomials is H_n(z) = e^(z²/2) (z − d/dz)ⁿ e^(−z²/2). The equivalence of this formulation and the Schrödinger-equation-based approach means that Hermite polynomials defined this way are indeed solutions of Hermite's equation.
This framework makes many calculations almost trivial which would be very hard in the traditional framework; in particular, matrix elements of powers of x̂ and p̂ between general states can be easily found by using x̂ = (â + â†)(x₀/√2) and p̂ = i(↠− â)(ℏ/√2x₀). For example, ⟨m|x̂ⁿ|m′⟩ and ⟨m|p̂ⁿ|m′⟩ will vanish unless |m − m′| ≤ n and |m − m′| and n are either both even or both odd (the last condition being a manifestation of parity, since φ_n(x) is odd/even if n is odd/even).
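Such matrix elements are conveniently generated on a computer, where the ladder operators become matrices in a truncated basis. A sketch of mine assuming numpy; the basis size N is an arbitrary illustrative choice, and the truncation shows up as an artefact in the last row and column of [â, â†]:

```python
import numpy as np

N = 8                                        # truncated oscillator basis |0> .. |N-1>
n = np.arange(1, N)
a = np.diag(np.sqrt(n), k=1)                 # a|n> = sqrt(n)|n-1>
adag = a.T                                   # a^dagger (these matrices are real)

# [a, a^dagger] = 1 except in the last row/column: an artefact of truncation
comm = a @ adag - adag @ a
print(np.allclose(comm[:-1, :-1], np.eye(N - 1)))   # True

# H/(hbar omega) = a^dagger a + 1/2 has eigenvalues n + 1/2
H = adag @ a + 0.5 * np.eye(N)
print(np.diag(H))                            # 0.5, 1.5, 2.5, ...

# <m|x|m'> is nonzero only for |m - m'| = 1, as the parity rule requires for n = 1
x0 = 1.0
x = (x0 / np.sqrt(2)) * (a + adag)
print(np.round(x, 3))
```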
For a particle in a two-dimensional potential ½mω_x²x² + ½mω_y²y², the Hamiltonian is separable: Ĥ = Ĥ_x + Ĥ_y. Defining x₀ = √(ℏ/mω_x) and y₀ = √(ℏ/mω_y), ladder operators â_x and â_x† can be constructed from x̂ and p̂_x as above, and we can construct a second set of operators â_y and â_y† from ŷ and p̂_y (using y₀ as the scale factor) in the same way. It is clear that â_x and â_x† commute with â_y and â_y†, and each of Ĥ_x and Ĥ_y independently has a set of eigenstates just like the ones discussed above.
In fact the space of solutions to the two-dimensional problem can be thought of as a tensor direct product space¹ of the x and y spaces, with energy eigenstates |n_x⟩ ⊗ |n_y⟩, n_x and n_y being integers, the Hamiltonian properly being written Ĥ = Ĥ_x ⊗ Î_y + Î_x ⊗ Ĥ_y, and the eigenvalues being (n_x + ½)ℏω_x + (n_y + ½)ℏω_y. The ground state is |0⟩ ⊗ |0⟩ and it is annihilated by both â_x (= â_x ⊗ Î_y) and â_y (= Î_x ⊗ â_y).
The direct product notation is clumsy though, and we often write the states as just |n_x, n_y⟩. Then for instance

â_x|n_x, n_y⟩ = √(n_x) |n_x−1, n_y⟩  and  â_y†|n_x, n_y⟩ = √(n_y + 1) |n_x, n_y+1⟩.

¹ See section 1.7.
The corresponding wave functions of the particle are given by ⟨r|n_x, n_y⟩ = ⟨x|n_x⟩⟨y|n_y⟩:

φ_{0,0}(x, y) = (πx₀y₀)^(−1/2) e^(−x²/2x₀²) e^(−y²/2y₀²)

φ_{n_x,n_y}(x, y) = (1/√(2^{n_x} n_x!)) (1/√(2^{n_y} n_y!)) H_{n_x}(x/x₀) H_{n_y}(y/y₀) φ_{0,0}(x, y)

In many cases we are interested in a symmetric potential, in which case ω_x = ω_y, x₀ = y₀, and φ_{0,0} ∝ exp(−r²/2x₀²).
This formalism has remarkably little reference to the actual system in question—all the pa-
rameters are buried in x0 . What is highlighted instead is the number of quanta of energy in
the system, with â and ↠annihilating or creating quanta (indeed they are most frequently
termed “creation” and “annihilation” operators). Exactly the same formalism can be used in
a quantum theory of photons, where the oscillator in question is just a mode of the EM field,
and the operators create or destroy photons of the corresponding frequency.

1.7 Product spaces


Shankar pp 248-249 (chapter 10)

Summary: Product states describe multi-particle systems and systems with intrinsic (spin) angular momentum.

1.7.1 Product spaces and entanglement


In quantum mechanics, the concept of the tensor direct product of two (or more) vector spaces occurs in various contexts. It arises when a system has two distinct aspects, described by states in two distinct vector spaces. In order to specify the state of the whole system, the states of both parts need to be given. Examples are states of two or more particles, or two contributions to the angular momentum such as orbital and spin, or the three coordinates of a single particle. The different aspects are sometimes called "degrees of freedom".
Consider a system in which the two aspects are members of the two spaces V_a^M and V_b^N. The notation just means that the first is M-dimensional, the second N-dimensional, and a and b are just labels. If {|a_m⟩ : m = 1 ... M} and {|b_n⟩ : n = 1 ... N} are basis sets for the two spaces, one possible basis for the product space is formed by picking one from each. There are M × N ways of doing this, so the product space has dimension M × N. These states are written |m, n⟩ ≡ |a_m⟩ ⊗ |b_n⟩. The ⊗ is best regarded simply as a separator and indeed is omitted in some texts; it doesn't indicate any operation that is carried out. The combined space is denoted V_a^M ⊗ V_b^N. There is no intrinsic difference between this and V_b^N ⊗ V_a^M, but whichever order we choose, we need to stick to it in a particular problem—we can't chop and change.
We have implied without proof so far that V_a^M ⊗ V_b^N is in fact itself a vector space. For that we need to specify the rules of combination: for linearly-independent vectors |p⟩, |q⟩ ∈ V_a^M and |v⟩, |w⟩ ∈ V_b^N, we can take a sum of two vectors, α|p⟩ ⊗ |v⟩ + β|q⟩ ⊗ |w⟩, to give another vector in the product space. In addition α(|p⟩ ⊗ |v⟩) = (α|p⟩) ⊗ |v⟩ = |p⟩ ⊗ (α|v⟩). It can be shown that all the other rules of a vector space hold (α = −1 generates the additive inverse, α = 0 the null vector), and linearity and the associative and distributive laws hold, e.g.

α|p⟩ ⊗ (β|v⟩ + γ|w⟩) = αβ (|p⟩ ⊗ |v⟩) + αγ (|p⟩ ⊗ |w⟩).

The inner product in the product space is defined as the arithmetic product of the individual inner products: (⟨p| ⊗ ⟨v|)(|q⟩ ⊗ |w⟩) = ⟨p|q⟩⟨v|w⟩, which of course is a scalar. If there is more than one term in the bra or ket the usual linearity applies. If we choose orthonormal bases in each space, then {|a_m⟩ ⊗ |b_n⟩} is an orthonormal basis in the product space.
Note that inner products ⟨p|q⟩ and outer products |p⟩⟨q| are formed between vectors in the same space. The tensor direct product is different: the two vectors involved are in different spaces, even if the two spaces are copies of one another (for two particles, for instance).
Importantly, while all vectors |p⟩ ⊗ |v⟩ are in the product space, as we have just seen not all vectors in the product space can be written in this way. Those that can are called separable, i.e. they have a specified vector in each separate space. Those that cannot, such as α|p⟩ ⊗ |v⟩ + β|q⟩ ⊗ |w⟩, are in the product space but are not separable. This is where the distinction between classical and quantum mechanics comes in: quantum mechanics allows for superposition. A non-separable state is also called an entangled state. In QM, entanglement is routine.
It is a straightforward extension to form direct product spaces from three or more individual spaces.
A common situation is the case of two subsystems, each of which can be in one of two states, |0⟩ and |1⟩. In the context of quantum information or computing, these are known as "Qbits" (or "qubits")—the quantum analogues of classical binary bits. Choosing an order for the Qbits and sticking to it, a possible basis of states for the combined system is

{|A⟩ = |0⟩ ⊗ |0⟩,  |B⟩ = |0⟩ ⊗ |1⟩,  |C⟩ = |1⟩ ⊗ |0⟩,  |D⟩ = |1⟩ ⊗ |1⟩}.

States |B⟩ and |C⟩ differ by which of the two Qbits is in which state. Note that here the label |...⟩ tells us whether a particular state is in the product space or one of the individual spaces, but in general we need to specify. These are all separable, as are the following states (unnormalised, for simplicity):

|A⟩ + |B⟩ = |0⟩ ⊗ (|0⟩ + |1⟩);   |B⟩ + |D⟩ = (|0⟩ + |1⟩) ⊗ |1⟩;
|A⟩ + |B⟩ + |C⟩ + |D⟩ = (|0⟩ + |1⟩) ⊗ (|0⟩ + |1⟩).

But the following are entangled:

|A⟩ + |D⟩ = |0⟩ ⊗ |0⟩ + |1⟩ ⊗ |1⟩;   |B⟩ − |C⟩ = |0⟩ ⊗ |1⟩ − |1⟩ ⊗ |0⟩;
|A⟩ + |C⟩ − |D⟩ = (|0⟩ + |1⟩) ⊗ |0⟩ − |1⟩ ⊗ |1⟩ = |0⟩ ⊗ |0⟩ + |1⟩ ⊗ (|0⟩ − |1⟩).
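Separability can be tested mechanically: writing |ψ⟩ = Σ c_mn |m⟩ ⊗ |n⟩, the state is separable exactly when the coefficient matrix c_mn has rank 1. A sketch of mine assuming numpy, with the basis states built as Kronecker products:

```python
import numpy as np

q0, q1 = np.array([1, 0]), np.array([0, 1])
A = np.kron(q0, q0); B = np.kron(q0, q1)     # |A> = |0>|0>, |B> = |0>|1>
C = np.kron(q1, q0); D = np.kron(q1, q1)     # |C> = |1>|0>, |D> = |1>|1>

def schmidt_rank(state):
    """A two-Qbit state is separable iff the 2x2 coefficient matrix
    c_mn in |psi> = sum c_mn |m> x |n> has rank 1."""
    return np.linalg.matrix_rank(state.reshape(2, 2))

print(schmidt_rank(A + B))           # 1: separable, |0> x (|0>+|1>)
print(schmidt_rank(A + B + C + D))   # 1: separable
print(schmidt_rank(A + D))           # 2: entangled
print(schmidt_rank(B - C))           # 2: entangled
```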

When a measurement of the state of one Qbit is carried out, the system will collapse into those parts of the state compatible with the result, which may or may not specify the state of the other Qbit. Restricting ourselves to states where neither Qbit's state is known in advance (i.e. we might get 0 or 1 whichever of the two we measure), then we have two possibilities. If the initial state is separable,

|ψ_i⟩ = (α|0⟩ + β|1⟩) ⊗ (γ|0⟩ + δ|1⟩)

(normalised, so |α|² + |β|² = |γ|² + |δ|² = 1), then if we measure the state of the first Qbit and get 0 (something that will happen with probability |α|²), the subsequent state of the system is

|ψ_f⟩ = |0⟩ ⊗ (γ|0⟩ + δ|1⟩),

and we still don't know what result we'd get if we measured the state of the second Qbit. If on the other hand the state is not separable, for instance

|ψ_i⟩ = α|0⟩ ⊗ |1⟩ − β|1⟩ ⊗ |0⟩,

then if we measure the state of the first Qbit and get 0, the subsequent state of the system is

|ψ_f⟩ = |0⟩ ⊗ |1⟩

—we now know that the second Qbit is in state |1⟩, and if we measured it we would be certain to get 1. This particular case, where measurement of the first determines the result of the second, is called maximal entanglement.

1.7.2 Operators in product spaces


Let Ĉ_a be an operator in a vector space V_a and D̂_b one in V_b. Then in the product space V_a ⊗ V_b we can form product operators Ĉ_a ⊗ D̂_b, which act on the kets as follows:

(Ĉ_a ⊗ D̂_b)(|p⟩ ⊗ |v⟩) = (Ĉ_a|p⟩) ⊗ (D̂_b|v⟩).

Here it is particularly important to be clear that we are not multiplying Ĉ_a and D̂_b together; they act in different spaces. Once again ⊗ should be regarded as a separator, not a multiplication. Denoting the identity operators in each space as Î_a and Î_b respectively, in the product space the identity operator is Î_a ⊗ Î_b. An operator in which each additive term acts in only one space, such as Ĉ_a ⊗ Î_b + Î_a ⊗ D̂_b, is called a separable operator. Ĉ_a ⊗ Î_b and Î_a ⊗ D̂_b commute.
The inverse of Ĉ_a ⊗ D̂_b is Ĉ_a⁻¹ ⊗ D̂_b⁻¹, and the adjoint is Ĉ_a† ⊗ D̂_b†. (The order is NOT reversed, since each still has to act in the correct space.)
 
Matrix elements work as follows: (⟨p| ⊗ ⟨v|)(Ĉ_a ⊗ D̂_b)(|q⟩ ⊗ |w⟩) = ⟨p|Ĉ_a|q⟩⟨v|D̂_b|w⟩. (This is the arithmetic product of two scalars.)
The labels a and b are redundant, since the order of the operators in the product tells us which acts in which space. The names of the operators might also tell us. Alternatively, if we keep the labels, it is common to write Ĉ_a when we mean Ĉ_a ⊗ Î_b, and Ĉ_a D̂_b (or even, since they commute, D̂_b Ĉ_a) when we mean Ĉ_a ⊗ D̂_b.
If an operator is separable, i.e. it can be written as Ĉ_a ⊗ Î_b + Î_a ⊗ D̂_b, then the eigenvectors are |c_i⟩ ⊗ |d_j⟩ with eigenvalues c_i + d_j. As already mentioned, the operator is often written Ĉ_a + D̂_b, where the label makes clear which space each operator acts in; similarly the eigenstates are often written |c_i, d_j⟩.
An example of a separable operator is the centre-of-mass position of a two-particle system, (m₁x̂₁ + m₂x̂₂)/(m₁ + m₂), which we would never think of writing in explicit direct-product form but which, none-the-less, acts on states of two particles such as ψ(r₁) ⊗ φ(r₂). Another is the total angular momentum Ĵ = L̂ + Ŝ. Here we will sometimes find it useful to write Ĵ_z = L̂_z ⊗ Î + Î ⊗ Ŝ_z to stress the fact that the operators act on different aspects of the state; see later in the course.
Vectors and operators in finite-dimensional direct-product spaces can be represented by column vectors and matrices, just as in any other vector space. In the two-Qbit space above, consider the single-Qbit operator Ω̂ defined by Ω̂|0⟩ = |1⟩ and Ω̂|1⟩ = |0⟩. Taking the states {|A⟩ ... |D⟩} defined above as a basis, so that |A⟩ → (1, 0, 0, 0)ᵀ etc., the operator Ω̂ ⊗ Ω̂ has matrix elements such as

⟨B|Ω̂ ⊗ Ω̂|C⟩ = (⟨0| ⊗ ⟨1|)(Ω̂ ⊗ Ω̂)(|1⟩ ⊗ |0⟩) = ⟨0|Ω̂|1⟩⟨1|Ω̂|0⟩ = 1 × 1 = 1

and

         [ ⟨A|Ω̂⊗Ω̂|A⟩  ⟨A|Ω̂⊗Ω̂|B⟩  ⟨A|Ω̂⊗Ω̂|C⟩  ⟨A|Ω̂⊗Ω̂|D⟩ ]   [ 0 0 0 1 ]
Ω̂ ⊗ Ω̂ → [ ⟨B|Ω̂⊗Ω̂|A⟩  ⟨B|Ω̂⊗Ω̂|B⟩  ⟨B|Ω̂⊗Ω̂|C⟩  ⟨B|Ω̂⊗Ω̂|D⟩ ] = [ 0 0 1 0 ]
         [ ⟨C|Ω̂⊗Ω̂|A⟩  ⟨C|Ω̂⊗Ω̂|B⟩  ⟨C|Ω̂⊗Ω̂|C⟩  ⟨C|Ω̂⊗Ω̂|D⟩ ]   [ 0 1 0 0 ]
         [ ⟨D|Ω̂⊗Ω̂|A⟩  ⟨D|Ω̂⊗Ω̂|B⟩  ⟨D|Ω̂⊗Ω̂|C⟩  ⟨D|Ω̂⊗Ω̂|D⟩ ]   [ 1 0 0 0 ]

Since |0⟩ ± |1⟩ are eigenvectors of Ω̂ with eigenvalues ±1, the four eigenvectors of Ω̂ ⊗ Ω̂ are (|0⟩ ± |1⟩) ⊗ (|0⟩ ± |1⟩) (where the two ± are independent), i.e.

|A⟩ + |B⟩ + |C⟩ + |D⟩ → (1, 1, 1, 1)ᵀ  and  |A⟩ − |B⟩ − |C⟩ + |D⟩ → (1, −1, −1, 1)ᵀ,

both with eigenvalue +1, and

|A⟩ + |B⟩ − |C⟩ − |D⟩ → (1, 1, −1, −1)ᵀ  and  |A⟩ − |B⟩ + |C⟩ − |D⟩ → (1, −1, 1, −1)ᵀ,

both with eigenvalue −1. Of course because there is degeneracy these are not unique; for example the simpler vectors |A⟩ ± |D⟩ and |B⟩ ± |C⟩ are also eigenvectors with eigenvalue ±1. It is easily checked that these are indeed eigenvectors of the corresponding matrix above. They are orthogonal as expected.
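In the column-vector representation the matrix of a product operator is the Kronecker product of the individual matrices, so all of the above can be checked in a few lines. A sketch of mine assuming numpy:

```python
import numpy as np

Omega = np.array([[0, 1], [1, 0]])      # Omega|0> = |1>, Omega|1> = |0>
OO = np.kron(Omega, Omega)              # matrix of Omega x Omega in the {A,B,C,D} basis
print(OO)                               # the anti-diagonal matrix found above

evals, evecs = np.linalg.eigh(OO)
print(evals)                            # -1, -1, +1, +1
# Any combination within a degenerate pair is equally valid; e.g. |A> + |D>
# is also an eigenvector with eigenvalue +1:
eA, eD = np.eye(4)[0], np.eye(4)[3]
print(np.allclose(OO @ (eA + eD), eA + eD))   # True
```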

1.7.3 Two or more particle wave functions


Shankar 10.1; Mandl 1.4; Griffiths 5.1

We can now better understand the states of two particles with wave function Ψ(r₁, r₂) obeying the Schrödinger equation, as mentioned above. The corresponding two-particle state |Ψ⟩ lives in a space which is a direct product of the single-particle spaces, and the basis kets of the position representation are |r₁⟩ ⊗ |r₂⟩, with Ψ(r₁, r₂) = (⟨r₁| ⊗ ⟨r₂|)|Ψ⟩.
For states of non-interacting distinguishable particles the two-particle Hamiltonian will be separable, Ĥ = Ĥ⁽¹⁾ ⊗ Î⁽²⁾ + Î⁽¹⁾ ⊗ Ĥ⁽²⁾ (or just Ĥ⁽¹⁾ + Ĥ⁽²⁾). If |ψ⟩ and |φ⟩ are eigenstates of the single-particle Hamiltonians respectively, separable states such as |ψ⟩ ⊗ |φ⟩ will be eigenstates of the two-particle Hamiltonian, with an energy which is just E⁽¹⁾ + E⁽²⁾ and with wave function ⟨r₁|ψ⟩⟨r₂|φ⟩ = ψ(r₁)φ(r₂). In general though the Hamiltonian will contain interaction terms V(r₁ − r₂), and these separable states will not be eigenstates. But even then the set of all such states will still form a basis in the full space, and solutions can be written as superpositions of these basis states. For identical particles Ĥ⁽¹⁾ and Ĥ⁽²⁾ are the same, with both |ψ⟩ and |φ⟩ as eigenstates. Since we can't label the particles themselves, the only allowed states are those which don't allow us to say which one is in which state, such as √(1/2)(ψ(r₁)φ(r₂) ± φ(r₁)ψ(r₂)).
The extension to more than two particles is (conceptually) straightforward.

1.7.4 Decoupling
One final note: we often encounter systems with many degrees of freedom but ignore all but one for simplicity. This is legitimate if the Hamiltonian does not couple them, or if the coupling is sufficiently weak, because then the states in the space of interest will evolve independently of the other degrees of freedom. For instance we typically ignore the centre-of-mass motion of a two-particle system in the absence of external forces, because it is trivial (it will be at rest or have constant velocity). Centre-of-mass and relative motions generally decouple, as is the case in classical physics too. We often discuss the Schrödinger equation for spatial wave functions and ignore spin, or focus purely on spin for illustrative purposes, which again is fine if spin and spatial motion are only weakly coupled.
2. Angular momentum

We have previously encountered orbital angular momentum via the vector operator L̂ = r̂ × p̂, the components of which obey the commutation relations

[L̂_x, L̂_y] = iℏL̂_z,  [L̂_y, L̂_z] = iℏL̂_x,  [L̂_z, L̂_x] = iℏL̂_y,  [L̂², L̂_i] = 0,

where L̂² = L̂_x² + L̂_y² + L̂_z². We also found that L̂² is identical to the angular part of −ℏ²r²∇², and so occurs in 3D problems; it commutes with the Hamiltonian if the potential V(r) is independent of angle. The commutation relations imply that we can only simultaneously know L² and one component, taken conventionally to be L_z. The common eigenfunctions of L̂² and L̂_z are the spherical harmonics Y_lm(θ, φ), and the requirements that these be single-valued and finite everywhere restrict the eigenvalues to be ℏ²l(l + 1) and ℏm respectively, where l and m are integers which satisfy

l = 0, 1, 2 ...,  m = −l, −l + 1, ... l.

More details, including the forms of the differential operators for the L̂_i and of the first few spherical harmonics, can be found in section A.3.4. A recap of other aspects of angular momentum from last year can be found in section A.5. In this course, we are going to re-derive the properties of angular momentum, and discover some new ones, based only on the commutation relations and nothing else.

2.1 General properties of angular momentum


Shankar 12.5; Griffiths 4.3; Mandl 5.2

Summary: Focussing purely on the commutation relations of the operators, we can derive all the properties of orbital angular momentum without a θ or φ in sight—and discover spin too.

In the case of the harmonic oscillator, we found that an approach which focused on operators and abstract states rather than differential equations was extremely powerful. We are going to do something similar with angular momentum, with the added incentive that we know that orbital angular momentum is not the only possible form; we will need to include spin as well—and that has no classical analogue or position-space description.
Consider three Hermitian operators Ĵ₁, Ĵ₂ and Ĵ₃, components of the vector operator Ĵ, about which we will only assume one thing, their commutation relations:

[Ĵ₁, Ĵ₂] = iℏĴ₃,  [Ĵ₂, Ĵ₃] = iℏĴ₁,  [Ĵ₃, Ĵ₁] = iℏĴ₂  (2.1)

or succinctly, [Ĵ_i, Ĵ_j] = iℏ Σ_k ε_ijk Ĵ_k.¹ It can be shown that the orbital angular momentum operators defined previously satisfy these rules, but we want to be more general, hence the new name Ĵ, and the use of indices 1–3 rather than x, y, z. Note that Ĵ has the same dimensions (units) as ℏ.
From these follows the fact that all three commute with Ĵ² = Ĵ₁² + Ĵ₂² + Ĵ₃²:

[Ĵ², Ĵ_i] = 0.

It follows that we will in general be able to find simultaneous eigenstates of Ĵ² and only one of the components Ĵ_i. We quite arbitrarily choose Ĵ₃. We denote the normalised states |λ, μ⟩, with eigenvalue ℏ²λ of Ĵ² and eigenvalue ℏμ of Ĵ₃. (We've written these so that λ and μ are dimensionless.) All we know about μ is that it is real, but recalling that for any state and Hermitian operator ⟨α|Â²|α⟩ = ⟨Âα|Âα⟩ ≥ 0, we know in addition that λ must be non-negative. Furthermore

ℏ²(λ − μ²) = ⟨λ, μ|(Ĵ² − Ĵ₃²)|λ, μ⟩ = ⟨λ, μ|(Ĵ₁² + Ĵ₂²)|λ, μ⟩ ≥ 0,

so |μ| ≤ √λ. The magnitude of a component of a vector can't be bigger than the length of the vector!
Now let us define raising and lowering operators Ĵ± (appropriateness of the names still to be shown):

Ĵ₊ ≡ Ĵ₁ + iĴ₂;  Ĵ₋ ≡ Ĵ₁ − iĴ₂.

Note these are not Hermitian, but Ĵ₋ = Ĵ₊†. These satisfy the following commutation relations:

[Ĵ₊, Ĵ₋] = 2ℏĴ₃,  [Ĵ₃, Ĵ₊] = ℏĴ₊,  [Ĵ₃, Ĵ₋] = −ℏĴ₋,  [Ĵ², Ĵ±] = 0,
Ĵ² = ½(Ĵ₊Ĵ₋ + Ĵ₋Ĵ₊) + Ĵ₃² = Ĵ₊Ĵ₋ + Ĵ₃² − ℏĴ₃ = Ĵ₋Ĵ₊ + Ĵ₃² + ℏĴ₃.  (2.2)

Since Ĵ± commute with Ĵ², we see that the states Ĵ±|λ, μ⟩ are also eigenstates of Ĵ² with eigenvalue ℏ²λ.
Why the names? Consider the state Ĵ₊|λ, μ⟩:

Ĵ₃(Ĵ₊|λ, μ⟩) = Ĵ₊Ĵ₃|λ, μ⟩ + ℏĴ₊|λ, μ⟩ = ℏ(μ + 1)(Ĵ₊|λ, μ⟩).

So either Ĵ₊|λ, μ⟩ is another eigenstate of Ĵ₃, with eigenvalue ℏ(μ + 1), or it is the null vector. Similarly either Ĵ₋|λ, μ⟩ is another eigenstate of Ĵ₃, with eigenvalue ℏ(μ − 1), or it is the null vector. Leaving aside for a moment the case where the raising or lowering operator annihilates the state, we have Ĵ₊|λ, μ⟩ = C_λμ |λ, μ+1⟩, where

|C_λμ|² = ⟨λ, μ|Ĵ₊†Ĵ₊|λ, μ⟩ = ⟨λ, μ|Ĵ₋Ĵ₊|λ, μ⟩ = ⟨λ, μ|(Ĵ² − Ĵ₃² − ℏĴ₃)|λ, μ⟩ = ℏ²(λ − μ² − μ).

There is an undetermined phase that we can choose to be +1, so C_λμ = ℏ√(λ − μ² − μ).
We can repeat the process to generate more states with quantum numbers μ ± 2, μ ± 3 ... unless we reach states that are annihilated by the raising or lowering operators. All these states are in the λ-subspace of Ĵ².
¹ ε_ijk is 1 if i, j, k is a cyclic permutation of 1, 2, 3, −1 if an anticyclic permutation such as 2, 1, 3, and 0 if any two indices are the same.
However we saw above that the magnitude of the eigenvalue μ of Ĵ₃ must not be greater than √λ. So the process cannot go on indefinitely: there must be a maximum and minimum value, μ_max and μ_min, such that Ĵ₊|λ, μ_max⟩ = 0 and Ĵ₋|λ, μ_min⟩ = 0. Furthermore by repeated action of Ĵ₋ we can get from |λ, μ_max⟩ to |λ, μ_min⟩ in an integer number of steps: μ_max − μ_min is an integer, call it N.
Now the expectation value of Ĵ₋Ĵ₊ in the state |λ, μ_max⟩ must also be zero, but as we saw above that expectation value, for general μ, is |C_λμ|² = ℏ²(λ − μ² − μ). Thus

λ − μ_max(μ_max + 1) = 0.

Similarly, considering the expectation value of Ĵ₊Ĵ₋ in the state |λ, μ_min⟩ gives

λ − μ_min(μ_min − 1) = 0.

Taking these two equations together with μ_min = μ_max − N, we find

μ_max(μ_max + 1) = (μ_max − N)(μ_max − N − 1)  ⇒  (N + 1)(2μ_max − N) = 0  ⇒  μ_max = N/2.

Hence μ_max is either an integer or a half-integer, μ_min = −μ_max, and there are 2μ_max + 1 possible values of μ. Furthermore λ is restricted to values λ = (N/2)(N/2 + 1) for any integer N ≥ 0.


Let’s compare with what we found for orbital angular momentum. There we found that what
we have called λ had to have the form l(l + 1) for integer l, and what we’ve called µ was an
integer m, with −l ≤ m ≤ l. That agrees exactly with the integer case above. From now on we
will use m for µ, and j for µmax ; furthermore instead of writing the state |j(j +1), mi we will use
|j, mi. We refer to it as “a state with angular momentum j” but this is sloppy—if universally
p
understood; the magnitude of the angular momentum is ~ j(j + 1). The component of this
along any axis, though, cannot be greater than ~j.
But there is one big difference between the abstract case and the case of orbital angular mo-
mentum, and that is that j can be half integer 21 , 23 . . .. If these cases are realised in Physics,
the source of the angular momentum cannot be orbital, but something without any parallel in
classical Physics.
We end this section by rewriting the relations we have already found in terms of j and m, noting m can only take one of the 2j + 1 values −j, −j + 1, ..., j − 1, j:

Ĵ²|j, m⟩ = ℏ²j(j+1)|j, m⟩;   Ĵ₃|j, m⟩ = ℏm|j, m⟩;
Ĵ±|j, m⟩ = ℏ√(j(j+1) − m(m±1)) |j, m±1⟩.  (2.3)

[Diagram: five cones showing the possible orientations of an angular momentum vector with length √6 ℏ (i.e. j = 2) and z-component mℏ. The x- and y-components are not fixed, but must satisfy ⟨Ĵ₁² + Ĵ₂²⟩ = (6 − m²)ℏ² > 0 (unless j = 0).]
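Equations (2.3) give an explicit matrix representation in any (2j+1)-dimensional subspace, and the commutation relations (2.1) can then be verified directly. A sketch of mine assuming numpy, with ℏ = 1 and j = 3/2 as an arbitrary example:

```python
import numpy as np

def angular_momentum_matrices(j, hbar=1.0):
    """J3, J+, J- in the basis {|j,m>}, m = j, j-1, ..., -j (eq. 2.3)."""
    m = np.arange(j, -j - 1, -1)                 # descending m, as in the notes
    J3 = hbar * np.diag(m)
    # J+|j,m> = hbar sqrt(j(j+1) - m(m+1)) |j,m+1>: one step up the ladder
    c = hbar * np.sqrt(j * (j + 1) - m[1:] * (m[1:] + 1))
    Jp = np.diag(c, k=1)
    return J3, Jp, Jp.conj().T

J3, Jp, Jm = angular_momentum_matrices(1.5)      # j = 3/2: a 4-dimensional space
J1 = (Jp + Jm) / 2
J2 = -0.5j * (Jp - Jm)                           # J2 = -(i/2)(J+ - J-)
print(np.allclose(J1 @ J2 - J2 @ J1, 1j * J3))   # [J1, J2] = i hbar J3: True
Jsq = J1 @ J1 + J2 @ J2 + J3 @ J3
print(np.allclose(Jsq, 1.5 * 2.5 * np.eye(4)))   # J^2 = j(j+1) hbar^2: True
```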
2.2 Electron spin and the Stern-Gerlach experiment
(Shankar 14.5); Griffiths 4.4; Mandl 5.2

Summary: Electrons have an intrinsic angular momentum—spin—independently of their orbital motion.

From classical physics, we know that charged systems with angular momentum have a magnetic moment μ, which means that they experience a torque μ × B if not aligned with an external magnetic field B, and their interaction energy with the magnetic field is −μ·B. For an electron in a circular orbit with angular momentum L, the classical prediction is μ = −(|e|/2m)L = −(μ_B/ℏ)L, where μ_B = |e|ℏ/2m is called the Bohr magneton and has dimensions of a magnetic moment.
Since the torque is perpendicular to the angular momentum the system is like a gyroscope, and (classically) the direction of the magnetic moment precesses about B, with L_z being unchanged. If the field is not uniform, though, there will also be a net force causing the whole atom to move so as to reduce its energy −μ·B; taking the magnetic field along the +ve z axis, the atom will move to regions of stronger field if μ_z > 0 but to weaker field regions if μ_z < 0. If a beam of atoms enters a region of inhomogeneous magnetic field one (classically) expects the beam to spread out, each atom having a random value of μ_z and so being deflected a different amount.
The Stern-Gerlach experiment, in 1922, aimed to test whether silver atoms have a magnetic moment, and found that they do. The figure below (from Wikipedia) shows the apparatus; the shape of the poles of the magnet ensures that the field is stronger near the upper pole than the lower one.

The first run just showed a smearing of the beam, demonstrating that there was a magnetic moment, but further running showed that atoms were actually deflected either up or down by a fixed amount, indicating that μ_z only had two possible values relative to the magnetic field. The deflection was what would be expected for L_z = ±ℏ. That accorded nicely with Bohr's planetary orbits, and was taken as a confirmation of a prediction of what we now call the "old" quantum theory.
From a post-1926 perspective, however, l = 1 would give three spots (m = −1, 0, 1), not two—and anyway we now know that the electrons in silver atoms have zero net orbital magnetic moment. By that time though other considerations, particularly the so-called anomalous Zeeman splitting of spectroscopic lines in a magnetic field, had caused first Kronig and then, in 1925, Goudsmit and Uhlenbeck, to suggest that electrons could have a further source of angular momentum that they called spin, which would have only two possible values (m = −½, +½) but which couples twice as strongly to a magnetic field as orbital angular momentum (g_s = 2)—hence the Stern-Gerlach result (μ̂ = −(μ_B/ℏ)2Ŝ). We now know that the electron does indeed carry an intrinsic angular momentum, called spin but not mechanical in origin, which is an example of the j = ½ possibility that we deduced above.
Thus the full specification of the state of an electron has two parts, spatial and spin. The vector space is a tensor direct product of the space of square-integrable functions, of which the spatial state is a member—states like |ψ_r(t)⟩, for which ⟨r|ψ_r(t)⟩ = ψ(r, t)—and spin space, containing states |ψ_s(t)⟩, the nature of which we will explore in more detail in the next section. While in non-relativistic QM this has to be put in by hand, it emerges naturally from the Dirac equation, which also predicts g_s = 2.
Because this product space is itself a vector space, sums of vectors are in the space, and not all states of the system are separable (that is, they do not all have the form |ψ_r(t)⟩ ⊗ |ψ_s(t)⟩). We can also have states like |ψ_r(t)⟩ ⊗ |ψ_s(t)⟩ + |φ_r(t)⟩ ⊗ |φ_s(t)⟩. As we will see, spin space is two-dimensional (call the basis {|+⟩, |−⟩} just now), so including spin doubles the dimension of the state space; as a result we never need more than two terms, and can write

|Ψ(t)⟩ = c₁|ψ_r(t)⟩ ⊗ |+⟩ + c₂|φ_r(t)⟩ ⊗ |−⟩.

But this still means that the electron has two spatial wave functions, one for each spin state. In everything we've done so far the spin is assumed not to be affected by the dynamics, in which case we return to a single common spatial state. But that is not general.

2.3 Spin-½

Shankar 14.3; Griffiths 4.4; Mandl 5.3

Summary: A 2-D complex vector space perfectly describes spin.

Whereas with orbital angular momentum we were talking about an infinite-dimensional space which could be considered as a sum of subspaces with l = 0, 1, 2, ..., when we talk about intrinsic angular momentum—spin—we are confined to a single subspace with fixed j. We also use Ŝ in place of Ĵ, but the operators Ŝ_i obey the same rules as the Ĵ_i. The simultaneous eigenstates of Ŝ² and Ŝ_z are |s, m⟩, but as ALL states in the space have the same s, we often drop it in the notation. In this case s = ½, m = −½, +½, so the space is two-dimensional, with a basis variously denoted

{|½, ½⟩, |½, −½⟩} ≡ {|½⟩, |−½⟩} ≡ {|+⟩, |−⟩} ≡ {|ẑ+⟩, |ẑ−⟩}.

In the last case ẑ is a unit vector in the z-direction, so we are making it clear that these are states with spin-up (+) and spin-down (−) in the z-direction. We will also construct states with definite spin in other directions.

In this basis, the matrices representing Ŝ_z (which is diagonal), Ŝ₊ and Ŝ₋ = Ŝ₊† can be written down directly. Recall Ĵ₊|j, m⟩ = ℏ√(j(j+1) − m(m+1)) |j, m+1⟩, so

Ŝ₊|½, −½⟩ = ℏ√(3/4 + 1/4) |½, ½⟩ = ℏ|½, ½⟩,   Ŝ₊|½, ½⟩ = 0,

and so ⟨½, ½|Ŝ₊|½, −½⟩ = ℏ is the only non-vanishing matrix element of Ŝ₊. From these, Ŝ_x = ½(Ŝ₊ + Ŝ₋) and Ŝ_y = −½i(Ŝ₊ − Ŝ₋) can be constructed:

|ẑ+⟩ → (1, 0)ᵀ   |ẑ−⟩ → (0, 1)ᵀ   Ŝ₊ → ℏ[ 0 1 ]   Ŝ₋ → ℏ[ 0 0 ]
                                         [ 0 0 ]         [ 1 0 ]

Ŝ_z → (ℏ/2)[ 1  0 ]   Ŝ_x → (ℏ/2)[ 0 1 ]   Ŝ_y → (ℏ/2)[ 0 −i ]
            [ 0 −1 ]              [ 1 0 ]               [ i  0 ]

The label S_z on the arrows reminds us of the particular basis we are using. It is easily shown that the matrices representing the Ŝ_i obey the required commutation relations.
The matrices

σ₁ = [ 0 1 ]   σ₂ = [ 0 −i ]   σ₃ = [ 1  0 ]
     [ 1 0 ]        [ i  0 ]        [ 0 −1 ]

are called the Pauli matrices. They obey σ_i σ_j = δ_ij I + i Σ_k ε_ijk σ_k. The three together, σ, form a vector (a vector of matrices, as Ŝ is a vector of operators), while for some "ordinary" vector a, a·σ is a single Hermitian matrix:

a·σ = [ a_z          a_x − ia_y ]
      [ a_x + ia_y   −a_z       ]

and (a·σ)(b·σ) = a·b I + i(a × b)·σ. Together with the identity matrix they form a basis (with real coefficients) for all Hermitian 2 × 2 matrices.
The component of Ŝ in an arbitrary direction defined by the unit vector n is Ŝ·n. We can parametrise the direction of n by the polar angles θ, φ, so n = sin θ cos φ e_x + sin θ sin φ e_y + cos θ e_z. Then in the basis of eigenstates of Ŝ_z, Ŝ·n and its eigenstates are given by

Ŝ·n → (ℏ/2)[ cos θ           sin θ e^(−iφ) ]
            [ sin θ e^(iφ)   −cos θ         ]

|n+⟩ → ( cos(θ/2) e^(−iφ/2) )    |n−⟩ → ( −sin(θ/2) e^(−iφ/2) )
       ( sin(θ/2) e^(iφ/2)  )           (  cos(θ/2) e^(iφ/2)  )

The eigenvalues, of course, are ±ℏ/2. (There is nothing special about the z-direction.)
The most general normalised state of a spin-half particle has the form

a|ẑ+⟩ + b|ẑ−⟩ → (a, b)ᵀ,  where |a|² + |b|² = 1.

In fact any such state is an eigenstate of Ŝ·n for some direction n, given by tan(θ/2) = |b/a| and φ = Arg(b) − Arg(a).
Note that (from the matrix representation, or by checking explicitly for the two states of a basis) $(2\hat{\mathbf S}\cdot\mathbf n/\hbar)^2 = \hat I$ for any $\mathbf n$. So
$$\exp\!\big(i(\alpha/\hbar)\,\hat{\mathbf S}\cdot\mathbf n\big) = \cos\tfrac\alpha2\,\hat I + i\sin\tfrac\alpha2\,\tfrac2\hbar\,\hat{\mathbf S}\cdot\mathbf n.$$
The lack of higher powers of the $\hat S_i$ follows from the point about Hermitian operators noted above. Some calculation in the matrix basis reveals the useful fact that $\langle\mathbf n{\pm}|\hat{\mathbf S}|\mathbf n{\pm}\rangle = \pm\frac\hbar2\mathbf n$; that is, the expectation value of the vector operator $\hat{\mathbf S}$ is parallel or antiparallel to $\mathbf n$.
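The exponential identity can likewise be confirmed by brute force; the following sketch (illustrative, $\hbar = 1$) compares scipy's matrix exponential with the closed form:

import numpy as np
from scipy.linalg import expm

hbar, alpha = 1.0, 0.9
theta, phi = 0.7, 1.2
n = np.array([np.sin(theta)*np.cos(phi), np.sin(theta)*np.sin(phi), np.cos(theta)])
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Sn = hbar/2*sum(n[i]*sigma[i] for i in range(3))

lhs = expm(1j*(alpha/hbar)*Sn)
rhs = np.cos(alpha/2)*np.eye(2) + 1j*np.sin(alpha/2)*(2/hbar)*Sn
assert np.allclose(lhs, rhs)
print("rotation identity verified")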
2.3.1 Spin precession
The Hamiltonian of a spin-$\frac12$ electron in a uniform magnetic field is, with $g_s = 2$ and charge $-|e|$,
$$\hat H = -\boldsymbol\mu\cdot\mathbf B = (g_s\mu_B/\hbar)\,\hat{\mathbf S}\cdot\mathbf B \xrightarrow{S_z} \mu_B\,\boldsymbol\sigma\cdot\mathbf B.$$
Consider the case of a field in the $x$-direction, so that $\hat H \xrightarrow{S_z} \mu_B B\sigma_x$, and a particle initially in the state $|\hat z{+}\rangle$. It turns out that we have already done this problem in Section 1.4.1, with the $\gamma$ of that problem being equal to $\mu_B B/\hbar$. Giving the result in terms of $\omega = 2\gamma = 2\mu_B B/\hbar$, which is the frequency corresponding to the energy splitting of the eigenstates of $\hat H$, we obtained
$$|\psi(t)\rangle = \cos(\omega t/2)|\hat z{+}\rangle - i\sin(\omega t/2)|\hat z{-}\rangle, \qquad \langle\psi(t)|\hat S_z|\psi(t)\rangle = (\hbar/2)\cos\omega t.$$
To this we can now add $\langle\psi(t)|\hat S_y|\psi(t)\rangle = -(\hbar/2)\sin\omega t$ and $\langle\psi(t)|\hat S_x|\psi(t)\rangle = 0$. So the expectation value of $\hat{\mathbf S}$ is a vector of length $\hbar/2$ in the $yz$ plane which rotates with frequency $\omega = 2\mu_B B/\hbar$. This is exactly what we would get from Ehrenfest's theorem.
Alternatively, we can take the magnetic field along $\hat z$, so that the energy eigenstates are $|\hat z{\pm}\rangle$ with energies $\pm\mu_B B \equiv \pm\hbar\omega/2$. If the initial state is spin-up in an arbitrary direction $\mathbf n$, that is $|\mathbf n{+}\rangle$, we can decompose this in terms of the energy eigenstates, each with its own energy dependence, and obtain
$$|\psi(t)\rangle = \cos\tfrac\theta2\, e^{-i(\omega t+\phi)/2}|\hat z{+}\rangle + \sin\tfrac\theta2\, e^{i(\omega t+\phi)/2}|\hat z{-}\rangle = |\mathbf n(t){+}\rangle$$
where $\mathbf n(t)$ is a vector which, like the original $\mathbf n$, is oriented at an angle $\theta$ to the $\hat z$ (i.e. $\mathbf B$) axis, but which rotates about that axis so that the azimuthal angle changes with time: $\phi(t) = \phi(0) + \omega t$. The expectation value $\langle\hat{\mathbf S}\rangle$ precesses likewise, following the same behaviour as a classical magnetic moment $\boldsymbol\mu = -(g_s\mu_B/\hbar)\,\mathbf S$.
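A short simulation (again illustrative, with $\hbar = 1$ and $\hat H = \omega\hat S_z$ so that the level splitting is $\hbar\omega$) shows the azimuthal precession of $\langle\hat S_x\rangle$ explicitly:

import numpy as np
from scipy.linalg import expm

hbar, omega = 1.0, 2*np.pi
theta, phi = 1.0, 0.3
psi0 = np.array([np.cos(theta/2)*np.exp(-1j*phi/2),
                 np.sin(theta/2)*np.exp(1j*phi/2)])
Sz = hbar/2*np.array([[1, 0], [0, -1]], dtype=complex)
Sx = hbar/2*np.array([[0, 1], [1, 0]], dtype=complex)
H = omega*Sz

for t in np.linspace(0, 1, 5):
    psi = expm(-1j*H*t/hbar) @ psi0
    # <Sx> should equal (hbar/2) sin(theta) cos(phi + omega t)
    print(t, np.vdot(psi, Sx @ psi).real,
          hbar/2*np.sin(theta)*np.cos(phi + omega*t))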

2.3.2 Spin and measurement: Stern-Gerlach revisited


We now understand, in a way that the original experimenters did not, what the Stern-Gerlach experiment does: for each atom that passes through, it measures the component of the spin angular momentum along the direction $\mathbf n$ of the magnetic field, $\hat{\mathbf S}\cdot\mathbf n$. Each time, the answer is either up or down, $\pm\hbar/2$. With the initial beam being unpolarised, the numbers of up and down will be equal.
The apparatus also gives us access to a beam of particles which are all spin-up in a particular direction, say the $z$-direction. We can then run that beam through a second copy of the apparatus, rotated through some angle $\theta$ relative to the first. The particles exiting from this copy will be either spin-up or spin-down along the new magnetic field axis, and the probability of getting each is $|\langle\mathbf n{\pm}|\hat z{+}\rangle|^2$, that is $\cos^2\frac\theta2$ and $\sin^2\frac\theta2$ respectively. If $\theta = \pi/2$ (new field along the $x$ axis, assuming a beam in the $y$-direction), the probabilities are both 50%.
Successive measurements can be schematically represented below, each block being labelled by
the direction of the magnetic field. It should look very familiar.
2.4 Higher spins

Summary: Angular momentum $s$ is represented by a complex $(2s+1)$-D vector space

The Particle Data Group lists spin-1 particles and spin-$\frac32$ particles; gravitons, if they exist, are spin-2, and nuclei can have much higher spins (at least $\frac92$ for known ground states of stable nuclei).
Furthermore, since in many situations total angular momentum commutes with the Hamiltonian (see later), even when orbital angular momentum is involved we are often only concerned with a subspace of fixed $j$ (or $l$ or $s$). All such subspaces are finite-dimensional, of dimension $N = 2j+1$, and spanned by the basis $\{|j,m\rangle\}$ with $m = j, j{-}1, \ldots, -j{+}1, -j$. It is most usual (though of course not obligatory) to order the states by descending $m$.
In this subspace, with this basis, the operators $\hat J_x$, $\hat J_y$, $\hat J_z$ are represented by three $N\times N$ matrices with matrix elements e.g. $(J_x)_{m'm} = \langle j,m'|\hat J_x|j,m\rangle$. (Because states with different $j$ are orthogonal, and because the $\hat J_i$ only change $m$, not $j$, $\langle j',m'|\hat J_x|j,m\rangle = 0$ if $j' \neq j$: that's why we can talk about non-overlapping subspaces in the first place.) The matrix representation of $\hat J_z$ of course is diagonal, with diagonal elements $j, j{-}1, \ldots, -j{+}1, -j$ (in units of $\hbar$). As with spin-$\frac12$, it is easiest to construct $\hat J_+$ first, then $\hat J_-$ as its transpose (the elements of the former having been chosen to be real), then $\hat J_x = \frac12(\hat J_+ + \hat J_-)$ and $\hat J_y = -\frac12 i(\hat J_+ - \hat J_-)$.
As an example we construct the matrix representation of the operators for spin-1. The three basis states $|s,m\rangle$ are $|1,1\rangle$, $|1,0\rangle$ and $|1,-1\rangle$. Recall $\hat J_+|j,m\rangle = \hbar\sqrt{j(j{+}1)-m(m{+}1)}\,|j,m{+}1\rangle$, so $\hat S_+|1,-1\rangle = \hbar\sqrt{2-0}\,|1,0\rangle$, $\hat S_+|1,0\rangle = \hbar\sqrt{2-0}\,|1,1\rangle$ and $\hat S_+|1,1\rangle = 0$, and the only non-zero matrix elements of $\hat S_+$ are
$$\langle1,1|\hat S_+|1,0\rangle = \langle1,0|\hat S_+|1,-1\rangle = \sqrt2\,\hbar.$$
So:
$$|1,1\rangle \xrightarrow{S_z} \begin{pmatrix}1\\0\\0\end{pmatrix} \qquad |1,0\rangle \xrightarrow{S_z} \begin{pmatrix}0\\1\\0\end{pmatrix} \qquad |1,-1\rangle \xrightarrow{S_z} \begin{pmatrix}0\\0\\1\end{pmatrix}$$
$$\hat S_z \xrightarrow{S_z} \hbar\begin{pmatrix}1&0&0\\0&0&0\\0&0&-1\end{pmatrix} \qquad \hat S_+ \xrightarrow{S_z} \sqrt2\,\hbar\begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix} \qquad \hat S_- \xrightarrow{S_z} \sqrt2\,\hbar\begin{pmatrix}0&0&0\\1&0&0\\0&1&0\end{pmatrix}$$
$$\hat S_x \xrightarrow{S_z} \frac{\hbar}{\sqrt2}\begin{pmatrix}0&1&0\\1&0&1\\0&1&0\end{pmatrix} \qquad \hat S_y \xrightarrow{S_z} \frac{\hbar}{\sqrt2}\begin{pmatrix}0&-i&0\\i&0&-i\\0&i&0\end{pmatrix}$$

Of course this is equally applicable to any system with j = 1, including the l = 1 spherical
harmonics.
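The ladder-operator construction just described is easy to automate for any $j$; the sketch below (an illustration, not a library routine) builds the matrices with $\hbar = 1$ by default and checks one commutation relation:

import numpy as np

def spin_matrices(j, hbar=1.0):
    """Jz, J+, Jx, Jy in the |j,m> basis, ordered m = j, j-1, ..., -j."""
    dim = int(round(2*j)) + 1
    m = j - np.arange(dim)                        # descending m values
    Jz = hbar*np.diag(m).astype(complex)
    Jp = np.zeros((dim, dim), dtype=complex)
    for col in range(1, dim):                     # J+ raises m by one step
        Jp[col - 1, col] = hbar*np.sqrt(j*(j + 1) - m[col]*(m[col] + 1))
    Jx = (Jp + Jp.conj().T)/2
    Jy = (Jp - Jp.conj().T)/(2*1j)
    return Jz, Jp, Jx, Jy

Jz, Jp, Jx, Jy = spin_matrices(1)
print(np.round(Jx*np.sqrt(2), 3))                 # the spin-1 S_x matrix, times sqrt(2)
assert np.allclose(Jx @ Jy - Jy @ Jx, 1j*Jz)      # [Jx, Jy] = i hbar Jz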
Once all possible values of $j$ and $m$ are allowed, any angular momentum operator is represented in the $\{|j,m\rangle\} = \{|0,0\rangle, |\tfrac12,\tfrac12\rangle, |\tfrac12,-\tfrac12\rangle, |1,1\rangle, |1,0\rangle, |1,-1\rangle, \ldots\}$ basis by a block-diagonal matrix. The first block is a single element, zero in fact, since all components of $\hat{\mathbf J}$ in the one-dimensional space of states of $j=0$ are zero. The next block is the appropriate $2\times2$ spin-$\frac12$ matrix, the next a $3\times3$ spin-1 matrix, and so on. This block-diagonal structure reflects the fact that the vector space can be written as a direct sum of spaces with $j = 0, \frac12, 1, \ldots$: $V = V^1 \oplus V^2 \oplus V^3 \oplus \ldots$ (where the superscripts, of course, are the dimensions $2j+1$).
In fact, any given physical system can only have integer or half-integer angular momentum. So the picture would be similar, but with only odd- or even-dimensioned blocks. For orbital angular momentum, for instance, the blocks would be $1\times1$, $3\times3$, $5\times5$, and so on.

2.5 Addition of angular momenta

Shankar 15.1,2; Griffiths 4.4; Mandl 4.4

Summary: Two sources of angular momentum are described by a direct-product vector space. States with definite total angular momentum J are not separable, but form an alternative basis in the space.

Up till now, we have in general spoken rather loosely as if an electron has either orbital or spin angular momentum; more precisely, we've considered cases where only one affects the dynamics, so we can ignore the other. But many cases are not like that. The electron in an excited hydrogen atom may have orbital as well as spin angular momentum, and if it is placed in a magnetic field, both will affect how the energy levels shift, and hence how the spectral lines split. Or the deuteron (heavy hydrogen nucleus) consists of both a proton and a neutron, and both have spin; heavier atoms and nuclei have many components, all with spin and orbital angular momentum. Only the total angular momentum of the whole system is guaranteed by rotational symmetry to be conserved in the absence of external fields. So we need to address the question of the addition of angular momentum.
Because the notation is clearest, we will start with the spin and orbital angular momentum of a single particle. We consider the case where $l$ as well as $s$ is fixed: an electron in a p-wave orbital, for instance. These two types of angular momentum are independent and live in different vector spaces, so this is an example of a tensor direct product space, spanned by the basis $\{|l,m_l\rangle\otimes|s,m_s\rangle\}$ and hence $(2l+1)\times(2s+1)$ dimensional.
Now angular momentum is a vector, and we expect the total angular momentum to be the vector sum of the orbital and spin angular momenta. We can form a new vector operator in the product space
$$\hat{\mathbf J} = \hat{\mathbf L}\otimes\hat I + \hat I\otimes\hat{\mathbf S} \qquad \hat{\mathbf J}^2 = \hat{\mathbf L}^2\otimes\hat I + \hat I\otimes\hat{\mathbf S}^2 + 2\hat{\mathbf L}\otimes\hat{\mathbf S}$$
where the last term represents a scalar product as well as a tensor product, and would more clearly be written $2(\hat L_x\otimes\hat S_x + \hat L_y\otimes\hat S_y + \hat L_z\otimes\hat S_z)$.
In practice the tensor product notation for operators proves cumbersome, and we always just write
$$\hat{\mathbf J} = \hat{\mathbf L} + \hat{\mathbf S} \qquad \hat{\mathbf J}^2 = \hat{\mathbf L}^2 + \hat{\mathbf S}^2 + 2\hat{\mathbf L}\cdot\hat{\mathbf S}.$$
We know that the $\hat L_i$ and $\hat S_i$ act on different parts of the state, and we don't need to stress that when we act with $\hat S_i$ alone we are not changing the orbital state, etc. An alternative form, in which the tensor product notation is again suppressed, is
$$\hat{\mathbf J}^2 = \hat{\mathbf L}^2 + \hat{\mathbf S}^2 + \hat L_+\hat S_- + \hat L_-\hat S_+ + 2\hat L_z\hat S_z.$$

Now in calling the sum of angular momenta $\hat{\mathbf J}$, which we previously used for a generic angular momentum, we are assuming that the $\hat J_i$ do indeed obey the defining commutation rules for angular momentum, and this can easily be demonstrated. For instance
$$[\hat J_x, \hat J_y] = [\hat L_x + \hat S_x, \hat L_y + \hat S_y] = [\hat L_x, \hat L_y] + [\hat S_x, \hat S_y] = i\hbar\hat L_z + i\hbar\hat S_z = i\hbar\hat J_z,$$
where we have used the fact that $[\hat L_i, \hat S_j] = 0$, since they act in different spaces. Hence we expect that an alternative basis in the product space will be $\{|j,m_j\rangle\}$, with allowed values of $j$ not yet determined. The question we want to answer, then, is the connection between the $\{|l,m_l\rangle\otimes|s,m_s\rangle\}$ and $\{|j,m_j\rangle\}$ bases. Both, we note, must have dimension $(2l+1)\times(2s+1)$.
We note some other points about the commutators: $\hat L_z$, $\hat S_z$ and $\hat J_z$ all commute; $\hat J_z$ commutes with $\hat{\mathbf J}^2$ (of course) and with $\hat{\mathbf L}^2$ and with $\hat{\mathbf S}^2$ (because both $\hat L_z$ and $\hat S_z$ do), but $\hat L_z$ and $\hat S_z$ do not commute with $\hat{\mathbf J}^2$. Thus we can, as implied when we wrote down the two bases, always specify $l$ and $s$, but then either $m_l$ and $m_s$ (with $m_j = m_l + m_s$) or $j$ and $m_j$. (We will sometimes write $|l,s;j,m_j\rangle$ instead of just $|j,m_j\rangle$, if we need a reminder of $l$ and $s$ in the problem.) What this boils down to is that the state of a given $j$ and $m_j$ will be a linear superposition of the states of given $m_s$ and $m_l$ that add up to that $m_j$. If there is more than one such state, there must be more than one allowed value of $j$ for that $m_j$.
Let's introduce a useful piece of jargon: the state of maximal $m$ in a multiplet, $|j,j\rangle$, is called the stretched state.
We start with the state of maximal $m_l$ and $m_s$, $|l,l\rangle\otimes|s,s\rangle$, which has $m_j = l+s$. This is clearly the maximal value of $m_j$, and hence of $j$: $j_{\max} = l+s$, and since the state is unique, it must be an eigenstate of $\hat{\mathbf J}^2$. (This can also be seen directly by acting with $\hat{\mathbf J}^2 = \hat J_-\hat J_+ + \hat J_z^2 + \hbar\hat J_z$, since $|l,l\rangle\otimes|s,s\rangle$ is an eigenstate of $\hat J_z$ with eigenvalue $\hbar(l+s)$, and is annihilated by both $\hat L_+$ and $\hat S_+$, and hence by $\hat J_+$.) If we act on this with $\hat J_- = \hat L_- + \hat S_-$, we get a new state with two terms in it; recalling the general rule $\hat J_-|j,m\rangle = \hbar\sqrt{j(j+1)-m(m-1)}\,|j,m{-}1\rangle$, where $j$ can stand for $j$ or $l$ or $s$, we have (using $\bar\jmath$ as a shorthand for $j_{\max} = l+s$)
$$|\bar\jmath,\bar\jmath\rangle = |l,l\rangle\otimes|s,s\rangle \;\Rightarrow\; \hat J_-|\bar\jmath,\bar\jmath\rangle = (\hat L_-|l,l\rangle)\otimes|s,s\rangle + |l,l\rangle\otimes(\hat S_-|s,s\rangle)$$
$$\Rightarrow\; \sqrt{2\bar\jmath}\,|\bar\jmath,\bar\jmath{-}1\rangle = \sqrt{2l}\,|l,l{-}1\rangle\otimes|s,s\rangle + \sqrt{2s}\,|l,l\rangle\otimes|s,s{-}1\rangle$$
From this state we can continue operating with $\hat J_-$; at the next step there will be three terms on the R.H.S. with $\{m_l,m_s\}$ equal to $\{l{-}2,s\}$, $\{l{-}1,s{-}1\}$ and $\{l,s{-}2\}$, then four, but eventually we will reach states which are annihilated by $\hat L_-$ or $\hat S_-$ and the number of terms will start to shrink again, till we finally reach $|\bar\jmath,-\bar\jmath\rangle = |l,-l\rangle\otimes|s,-s\rangle$ after $2\bar\jmath$ steps ($2\bar\jmath+1$ states in all).
Whichever is the smaller of $l$ or $s$ will govern the maximum number of pairs $\{m_l,m_s\}$ that can add up to any given $m_j$; for example, if $s$ is smaller, the maximum number is $2s+1$.
Now the state we found with $m_j = l+s-1$ is not unique: there must be another, orthogonal, combination of the two states with $\{m_l,m_s\}$ equal to $\{l{-}1,s\}$ and $\{l,s{-}1\}$. This cannot be part of a multiplet with $j = \bar\jmath$, because we've "used up" the only state with $m_j = \bar\jmath$. So it must be the highest-$m_j$ state (the stretched state) of a multiplet with $j = \bar\jmath-1$ (i.e. $l+s-1$):
$$|\bar\jmath{-}1,\bar\jmath{-}1\rangle = -\sqrt{\tfrac{s}{l+s}}\,|l,l{-}1\rangle\otimes|s,s\rangle + \sqrt{\tfrac{l}{l+s}}\,|l,l\rangle\otimes|s,s{-}1\rangle$$
Successive operations with $\hat J_-$ will generate the rest of the multiplet ($2\bar\jmath-1$ states in all); all the states will be orthogonal to the states of the same $m_j$ but higher $j$ already found.
However there will be a third linear combination of the states with $\{m_l,m_s\}$ equal to $\{l{-}2,s\}$, $\{l{-}1,s{-}1\}$ and $\{l,s{-}2\}$, which cannot have $j = \bar\jmath$ or $\bar\jmath-1$. So it must be the stretched state of a multiplet with $j = \bar\jmath-2$ ($2\bar\jmath-3$ states in all).
And so it continues, generating multiplets with successively smaller values of $j$. However the process comes to an end. As we saw, the maximum number of terms in any sum is whichever is smaller of $2l+1$ or $2s+1$, so this is also the maximum number of mutually orthogonal states of the same $m_j$, and hence the number of different values of $j$. So $j$ can lie between $l+s$ and the larger of $l+s-2s$ and $l+s-2l$; that is, $l+s \ge j \ge |l-s|$. The size of the $\{|j,m_j\rangle\}$ basis is then $\sum_{j=|l-s|}^{l+s}(2j+1)$, which is equal to $(2l+1)(2s+1)$.

The table below illustrates the process for $l = 2$, $s = 1$; we go down a column by applying $\hat J_-$, and start a new column by constructing a state orthogonal to those in the previous columns. The three columns correspond to $j=3$, $j=2$ and $j=1$, and there are $7+5+3 = 5\times3$ states in total.

$j=3$:
$$|3,3\rangle = |2,2\rangle\otimes|1,1\rangle$$
$$|3,2\rangle = \sqrt{\tfrac23}\,|2,1\rangle\otimes|1,1\rangle + \sqrt{\tfrac13}\,|2,2\rangle\otimes|1,0\rangle$$
$$|3,1\rangle = \sqrt{\tfrac25}\,|2,0\rangle\otimes|1,1\rangle + \sqrt{\tfrac8{15}}\,|2,1\rangle\otimes|1,0\rangle + \sqrt{\tfrac1{15}}\,|2,2\rangle\otimes|1,-1\rangle$$
$$|3,0\rangle = \sqrt{\tfrac15}\,|2,-1\rangle\otimes|1,1\rangle + \sqrt{\tfrac35}\,|2,0\rangle\otimes|1,0\rangle + \sqrt{\tfrac15}\,|2,1\rangle\otimes|1,-1\rangle$$
$$|3,-1\rangle = \sqrt{\tfrac1{15}}\,|2,-2\rangle\otimes|1,1\rangle + \sqrt{\tfrac8{15}}\,|2,-1\rangle\otimes|1,0\rangle + \sqrt{\tfrac25}\,|2,0\rangle\otimes|1,-1\rangle$$
$$|3,-2\rangle = \sqrt{\tfrac13}\,|2,-2\rangle\otimes|1,0\rangle + \sqrt{\tfrac23}\,|2,-1\rangle\otimes|1,-1\rangle$$
$$|3,-3\rangle = |2,-2\rangle\otimes|1,-1\rangle$$

$j=2$:
$$|2,2\rangle = -\sqrt{\tfrac13}\,|2,1\rangle\otimes|1,1\rangle + \sqrt{\tfrac23}\,|2,2\rangle\otimes|1,0\rangle$$
$$|2,1\rangle = -\sqrt{\tfrac12}\,|2,0\rangle\otimes|1,1\rangle + \sqrt{\tfrac16}\,|2,1\rangle\otimes|1,0\rangle + \sqrt{\tfrac13}\,|2,2\rangle\otimes|1,-1\rangle$$
$$|2,0\rangle = -\sqrt{\tfrac12}\,|2,-1\rangle\otimes|1,1\rangle + 0\,|2,0\rangle\otimes|1,0\rangle + \sqrt{\tfrac12}\,|2,1\rangle\otimes|1,-1\rangle$$
$$|2,-1\rangle = -\sqrt{\tfrac13}\,|2,-2\rangle\otimes|1,1\rangle - \sqrt{\tfrac16}\,|2,-1\rangle\otimes|1,0\rangle + \sqrt{\tfrac12}\,|2,0\rangle\otimes|1,-1\rangle$$
$$|2,-2\rangle = -\sqrt{\tfrac23}\,|2,-2\rangle\otimes|1,0\rangle + \sqrt{\tfrac13}\,|2,-1\rangle\otimes|1,-1\rangle$$

$j=1$:
$$|1,1\rangle = \sqrt{\tfrac1{10}}\,|2,0\rangle\otimes|1,1\rangle - \sqrt{\tfrac3{10}}\,|2,1\rangle\otimes|1,0\rangle + \sqrt{\tfrac35}\,|2,2\rangle\otimes|1,-1\rangle$$
$$|1,0\rangle = \sqrt{\tfrac3{10}}\,|2,-1\rangle\otimes|1,1\rangle - \sqrt{\tfrac25}\,|2,0\rangle\otimes|1,0\rangle + \sqrt{\tfrac3{10}}\,|2,1\rangle\otimes|1,-1\rangle$$
$$|1,-1\rangle = \sqrt{\tfrac35}\,|2,-2\rangle\otimes|1,1\rangle - \sqrt{\tfrac3{10}}\,|2,-1\rangle\otimes|1,0\rangle + \sqrt{\tfrac1{10}}\,|2,0\rangle\otimes|1,-1\rangle$$

The coefficients in the table are called Clebsch-Gordan coefficients. They are the inner products $(\langle l,m_l|\otimes\langle s,m_s|)|j,m_j\rangle$, but that is too cumbersome a notation; with a minimal modification Shankar uses $\langle l,m_l;s,m_s|j,m_j\rangle$; Mandl uses $C(l,m_l,s,m_s;j,m_j)$; but $\langle l,s,m_l,m_s|j,m_j\rangle$ and other minor modifications, including dropping the commas, are common. They are all totally clear when symbols are being used, but easily confused when numerical values are substituted! We use the "Condon-Shortley" phase convention, which is the most common; in this convention Clebsch-Gordan coefficients are real, which is why we won't write $\langle l,m_l;s,m_s|j,m_j\rangle^*$ in the second equation of Eq. (2.4) below.
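Rather than working down the table by hand, any of these coefficients can be generated with sympy's CG class; the snippet below (a quick illustration, assuming sympy is available) reproduces a few entries of the $l=2$, $s=1$ table:

from sympy import S
from sympy.physics.quantum.cg import CG

# <2,ml; 1,ms | j,mj> for the l=2, s=1 example above
for ml, ms, j, mj in [(2, 1, 3, 3), (1, 1, 3, 2), (1, 1, 2, 2), (0, 1, 1, 1)]:
    coeff = CG(S(2), ml, S(1), ms, S(j), mj).doit()
    print(f"<2,{ml}; 1,{ms} | {j},{mj}> = {coeff}")
# expected: 1, sqrt(2/3), -sqrt(1/3), sqrt(1/10), matching the table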
All of this has been written for the addition of orbital and spin angular momenta. But we did not actually assume at any point that $l$ was integer. So in fact the same formulae apply for the addition of any two angular momenta of any origin: a very common example is two spin-$\frac12$ particles. The more general form for adding two angular momenta $j_1$ and $j_2$, with $J$ and $M$ being the quantum numbers corresponding to the total angular momentum of the system, is
$$|J,M\rangle = \sum_{m_1,m_2} \langle j_1,m_1;j_2,m_2|J,M\rangle\, |j_1,m_1\rangle\otimes|j_2,m_2\rangle,$$
$$|j_1,m_1\rangle\otimes|j_2,m_2\rangle = \sum_{J,M} \langle j_1,m_1;j_2,m_2|J,M\rangle\, |J,M\rangle. \tag{2.4}$$

For the common case of $s = \frac12$, $j = l\pm\frac12$, we have
$$|l\pm\tfrac12,m_j\rangle = \sqrt{\frac{l\mp m_j+\frac12}{2l+1}}\,|l,m_j{+}\tfrac12\rangle\otimes|\tfrac12,-\tfrac12\rangle \;\pm\; \sqrt{\frac{l\pm m_j+\frac12}{2l+1}}\,|l,m_j{-}\tfrac12\rangle\otimes|\tfrac12,\tfrac12\rangle.$$

To summarise: the states of a system with two contributions to the angular momentum, $j_1$ and $j_2$, can be written in a basis in which the total angular momentum $J$ and its $z$-component $M$ are specified; the values of $J$ range from $|j_1-j_2|$ to $j_1+j_2$ in unit steps. In this basis the total angular momentum operators $\hat J_i$ and $\hat{\mathbf J}^2$ are cast in block-diagonal form, one $(2J{+}1)$-square block for each value of $J$. The vector space, which we started by writing as a product, $V^{2j_1+1}\otimes V^{2j_2+1}$, can instead be written as a direct sum: $V^{2(j_1+j_2)+1}\oplus\ldots\oplus V^{2|j_1-j_2|+1}$. In particular, for some orbital angular momentum $l$ and $s=\frac12$, $V^{2l+1}\otimes V^2 = V^{2l+2}\oplus V^{2l}$. The overall dimension of the space is of course unchanged.

2.5.1 Using tables of Clebsch-Gordan coefficients

Griffiths 4.4

General formulae for the Clebsch-Gordan coefficients are not used (the already-met case of $s=\frac12$ is an exception). One may use the Mathematica function ClebschGordan[{l, ml}, {s, ms}, {j, mj}], or the on-line calculator at Wolfram Alpha. Frequently, though, one consults tables, and this section gives instructions on their use.
In a system with two contributions to angular momentum $j_1$ and $j_2$, Clebsch-Gordan coefficients are used to write states of good total angular momentum $J$ and $z$-component $M$, $|j_1,j_2;J,M\rangle$ or just $|J,M\rangle$, in terms of the $\{m_1,m_2\}$ basis $\{|j_1,m_1\rangle\otimes|j_2,m_2\rangle\}$:
$$|j_1,j_2;J,M\rangle = \sum_{m_1,m_2} \langle j_1,m_1;j_2,m_2|J,M\rangle\, |j_1,m_1\rangle\otimes|j_2,m_2\rangle \qquad\text{and}$$
$$|j_1,m_1\rangle\otimes|j_2,m_2\rangle = \sum_{J,M} \langle j_1,m_1;j_2,m_2|J,M\rangle\, |j_1,j_2;J,M\rangle$$
where the numbers denoted by $\langle j_1,m_1;j_2,m_2|J,M\rangle$ are the Clebsch-Gordan coefficients; they vanish unless $j_1+j_2 \ge J \ge |j_1-j_2|$ and $m_1+m_2 = M$. There is a conventional tabulation which can be found in various places including the Particle Data Group site, but the notation takes some explanation.
There is one table for each $j_1,j_2$ pair. The table consists of a series of blocks, one for each value of $M$. Along the top of a block are the possible values of $J$ and $M$, and at the left are the possible values of $m_1, m_2$. Each block stands for something which could be written like this one for $j_1 = 1$, $j_2 = \frac12$ and $M = m_1+m_2 = \frac12$:

              J    3/2    1/2
              M   +1/2   +1/2
  m1    m2
  +1  -1/2         1/3    2/3
   0  +1/2         2/3   -1/3

For compactness the numbers in the blocks are the coefficients squared, times their sign; thus $-\frac12$ stands for $-\sqrt{\frac12}$.
As an example consider the table for coupling $j_1 = 1$ and $j_2 = \frac12$, to get $J = \frac32$ or $\frac12$. For clarity we will use the notation $|J,M\rangle$ in place of $|j_1,j_2;J,M\rangle$. In red the coefficients of $|1,1\rangle\otimes|\frac12,-\frac12\rangle$ and $|1,0\rangle\otimes|\frac12,\frac12\rangle$ in $|\frac12,\frac12\rangle$ are highlighted:
$$|\tfrac12,\tfrac12\rangle = \sqrt{\tfrac23}\,|1,1\rangle\otimes|\tfrac12,-\tfrac12\rangle - \sqrt{\tfrac13}\,|1,0\rangle\otimes|\tfrac12,\tfrac12\rangle.$$
In green are the components for the decomposition
$$|1,-1\rangle\otimes|\tfrac12,\tfrac12\rangle = \sqrt{\tfrac13}\,|\tfrac32,-\tfrac12\rangle - \sqrt{\tfrac23}\,|\tfrac12,-\tfrac12\rangle.$$

Or for coupling $j_1 = \frac32$ and $j_2 = 1$, the table gives, for example,
$$|\tfrac32,-\tfrac12\rangle = \sqrt{\tfrac8{15}}\,|\tfrac32,\tfrac12\rangle\otimes|1,-1\rangle - \sqrt{\tfrac1{15}}\,|\tfrac32,-\tfrac12\rangle\otimes|1,0\rangle - \sqrt{\tfrac25}\,|\tfrac32,-\tfrac32\rangle\otimes|1,1\rangle.$$

If instead one wants $j_1 = 1$ and $j_2 = \frac32$, we use the relation
$$\langle j_2,m_2;j_1,m_1|J,M\rangle = (-1)^{J-j_1-j_2}\,\langle j_1,m_1;j_2,m_2|J,M\rangle.$$
Note that tables of Clebsch-Gordan coefficients are given for states of $j_1$ and $j_2$ coupling up to total $J$. But as $j$ is a generic angular momentum, that covers $s$ and $l$ coupling to $j$, or $s_1$ and $s_2$ coupling to $S$, etc.
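Both the $\frac32\otimes1$ decomposition above and the reordering phase can be cross-checked with sympy (again an illustration, assuming sympy is available):

from sympy import S
from sympy.physics.quantum.cg import CG

# <3/2,m1; 1,m2 | 3/2,-1/2>: should equal sqrt(8/15), -sqrt(1/15), -sqrt(2/5)
for m1, m2 in [(S(1)/2, -1), (-S(1)/2, 0), (-S(3)/2, 1)]:
    print(m1, m2, CG(S(3)/2, m1, S(1), m2, S(3)/2, -S(1)/2).doit())

# reordering phase: <j2,m2; j1,m1|J,M> = (-1)^(J-j1-j2) <j1,m1; j2,m2|J,M>
swapped = CG(S(1), -1, S(3)/2, S(1)/2, S(3)/2, -S(1)/2).doit()
direct = CG(S(3)/2, S(1)/2, S(1), -1, S(3)/2, -S(1)/2).doit()
print(swapped, S(-1)**(S(3)/2 - S(3)/2 - 1)*direct)   # the two agree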

2.5.2 Example: Two spin-1/2 particles

Here we will call the operators $\hat{\mathbf S}^{(1)}$, $\hat{\mathbf S}^{(2)}$ and $\hat{\mathbf S} = \hat{\mathbf S}^{(1)} + \hat{\mathbf S}^{(2)}$ for the individual and total spin operators, and $S$ and $M$ for the total spin quantum numbers. (The use of capitals is standard in a many-particle system.) Because both systems are spin-$\frac12$, we will omit the label from our states, which we will write in the $\{m_1,m_2\}$ basis as
$$|1\rangle = |+\rangle\otimes|+\rangle, \quad |2\rangle = |+\rangle\otimes|-\rangle, \quad |3\rangle = |-\rangle\otimes|+\rangle, \quad |4\rangle = |-\rangle\otimes|-\rangle.$$
(The $1\ldots4$ are just labels here.) In this basis
$$\hat S_+ \longrightarrow \hbar\begin{pmatrix}0&1&1&0\\0&0&0&1\\0&0&0&1\\0&0&0&0\end{pmatrix} \qquad \hat S_z \longrightarrow \hbar\begin{pmatrix}1&0&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&-1\end{pmatrix} \qquad \hat{\mathbf S}^2 \longrightarrow \hbar^2\begin{pmatrix}2&0&0&0\\0&1&1&0\\0&1&1&0\\0&0&0&2\end{pmatrix}$$
where we use explicit calculation for the matrix elements, e.g.
$$\langle1|(\hat S_+^{(1)} + \hat S_+^{(2)})|2\rangle = \langle+|\hat S_+^{(1)}|+\rangle\langle+|\hat I^{(2)}|-\rangle + \langle+|\hat I^{(1)}|+\rangle\langle+|\hat S_+^{(2)}|-\rangle = 0 + \hbar,$$
then $\hat S_- = (\hat S_+)^\dagger$ and $\hat{\mathbf S}^2 = \hat S_+\hat S_- + \hat S_z^2 - \hbar\hat S_z$.


It is clear that $|1\rangle$ and $|4\rangle$ are eigenstates of $\hat{\mathbf S}^2$ with eigenvalue $2\hbar^2$, and hence $S=1$. They are also eigenstates of $\hat S_z$, with eigenvalues $\pm\hbar$. In the $\{|2\rangle,|3\rangle\}$ subspace, which has $M=0$, $\hat{\mathbf S}^2$ is represented by the matrix $\hbar^2\begin{pmatrix}1&1\\1&1\end{pmatrix}$, which has eigenvalues $2\hbar^2$ and $0$, corresponding to the states $\sqrt{\frac12}(|2\rangle\pm|3\rangle)$. We label these four simultaneous eigenstates of $\hat{\mathbf S}^2$ and $\hat S_z$ as $|S,M\rangle$, and take the ordering for the new basis as $\{|0,0\rangle, |1,1\rangle, |1,0\rangle, |1,-1\rangle\}$. Then the matrix of eigenvectors, $U$, is
$$U = \begin{pmatrix}0&1&0&0\\ 1/\sqrt2&0&1/\sqrt2&0\\ -1/\sqrt2&0&1/\sqrt2&0\\ 0&0&0&1\end{pmatrix}$$
and the transformed matrices $U^\dagger S_i U$ are
$$\hat S_x \longrightarrow \frac{\hbar}{\sqrt2}\begin{pmatrix}0&0&0&0\\0&0&1&0\\0&1&0&1\\0&0&1&0\end{pmatrix} \qquad \hat S_y \longrightarrow \frac{\hbar}{\sqrt2}\begin{pmatrix}0&0&0&0\\0&0&-i&0\\0&i&0&-i\\0&0&i&0\end{pmatrix} \qquad \hat S_z \longrightarrow \hbar\begin{pmatrix}0&0&0&0\\0&1&0&0\\0&0&0&0\\0&0&0&-1\end{pmatrix}$$
where the $1\times1$ plus $3\times3$ block-diagonal structure has been emphasised, and the $3\times3$ blocks are just the spin-1 matrices we found previously.
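The whole construction can be reproduced in a few lines with Kronecker products; the sketch below (illustrative, $\hbar = 1$) diagonalises $\hat{\mathbf S}^2$ on the product space and recovers the singlet and triplet:

import numpy as np

hbar = 1.0
sx = hbar/2*np.array([[0, 1], [1, 0]], dtype=complex)
sy = hbar/2*np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = hbar/2*np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

# total spin S = S(1) + S(2) on the product space, built with Kronecker products
S = [np.kron(s, I2) + np.kron(I2, s) for s in (sx, sy, sz)]
S2 = sum(s @ s for s in S)

evals, evecs = np.linalg.eigh(S2)
print(np.round(evals/hbar**2, 6))   # [0, 2, 2, 2]: one singlet (S=0), one triplet (S=1)
print(np.round(evecs[:, 0], 3))     # the singlet, (|+-> - |-+>)/sqrt(2) up to sign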
2.5.3 Angular Momentum of Atoms and Nuclei
Both atoms and nuclei consist of many spin-$\frac12$ fermions, each of which has both spin and orbital angular momentum. In the independent-particle model we think of each fermion occupying a well-defined single-particle orbital which is an eigenstate of a central potential and hence has well-defined orbital angular momentum $l$. The notation $s$, $p$, $d$, $f$, $g$, \ldots is used for orbitals of $l = 0, 1, 2, 3, 4, \ldots$. For each fermion there is also a total angular momentum $j$. All the angular momenta of all the fermions can be added in a variety of ways, and the following quantum numbers are defined: $L$ for the sum of all the orbital angular momenta (that is, the eigenvalues of $\hat{\mathbf L}_{\rm tot}^2$ are $\hbar^2 L(L+1)$); $S$ for the sum of all the spin angular momenta; and $J$ for the total angular momentum of the atom or nucleus from all sources. The use of capitals for the quantum numbers shouldn't be confused with the operators themselves.
In reality the independent-particle model is only an approximation, and only the total angular momentum $J$ is a conserved quantum number (only $\hat{\mathbf J}_{\rm tot}^2$ commutes with the Hamiltonian of the whole system). For light atoms, it is a good starting point to treat $L$ and $S$ as if they were conserved too, and the notation $^{2S+1}L_J$ is used, with $L$ being denoted by $S$, $P$, $D$, $F$, $G$, \ldots. This is termed LS coupling. So $^3S_1$ has $L=0$, $S=J=1$. Energy levels of light atoms are split according to $J$ by the spin-orbit splitting $\propto \hat{\mathbf L}\cdot\hat{\mathbf S}$ (of which more later). However for heavy atoms and for nuclei, it is a better approximation to sum the individual total angular momenta $j$ to give $J$ as the only good quantum number (j-j coupling).
Somewhat confusingly, $J$ is often called the spin of the atom or nucleus, even though its origin is both spin and orbital angular momentum. This composite origin shows up in a magnetic coupling $g$ which is neither 1 (pure orbital) nor 2 (pure spin). For light atoms $g$ can be calculated from
L, S and J (the Landé g-factor). For nuclei things are further complicated by the fact that
protons and neutrons are not elementary particles, and their “spin” is likewise of composite
origin, something which shows up through their g values of gp = 5.59 and gn = −3.83 rather
than 2 and 0 respectively. Using these the equivalent of the Landé g-factor can be calculated
for individual nucleon orbitals, and hence for those odd-even nuclei for which the single-particle
model works (that is, assuming that only the last unpaired nucleon contributes to the total
angular momentum). Beyond that it gets complicated.

2.6 Vector Operators


Shankar 15.3

This section is not examinable. The take-home message is that vector operators such as $\hat{\mathbf x}$ and $\hat{\mathbf p}$ can change the angular momentum of the state they act on in the same way as coupling in another source of angular momentum with $l=1$. If the components of the vector operator are written in a spherical basis analogously to $\hat J_\pm$, the dependence of the matrix elements on the $m$ quantum numbers is given by Clebsch-Gordan coefficients, with the non-trivial dependence residing only in a single "reduced matrix element" for each pair $j$ and $j'$ of the angular momenta of the initial and final states. This is the Wigner-Eckart theorem of Eq. (2.6).

We have now met a number of vector operators: $\hat{\mathbf x} = (\hat x,\hat y,\hat z)$, $\hat{\mathbf p} = (\hat p_x,\hat p_y,\hat p_z)$, and of course $\hat{\mathbf L}$, $\hat{\mathbf S}$ and $\hat{\mathbf J}$. We have seen, either in lectures or examples, that they all satisfy the following relation: if $\hat{\mathbf V}$ stands for the vector operator,
$$[\hat J_i, \hat V_j] = i\hbar\sum_k \epsilon_{ijk}\hat V_k;$$
for example, $[\hat J_x, \hat y] = i\hbar\hat z$. (We could have substituted $\hat L_x$ for $\hat J_x$ here, as spin and space operators commute.)
We can take this to be the definition of a vector operator: a triplet of operators makes up a vector operator if it satisfies these commutation relations.
Just as it was useful to define $\hat J_+$ and $\hat J_-$, so it is useful to define
$$\hat V_{+1} = -\sqrt{\tfrac12}(\hat V_1 + i\hat V_2) \qquad \hat V_{-1} = \sqrt{\tfrac12}(\hat V_1 - i\hat V_2) \qquad \hat V_0 = \hat V_3$$
where the subscripts are no longer Cartesian coordinates ($1\equiv x$ etc.) but analogous to the $m$ of the spherical harmonics; and indeed
$$\mp\sqrt{\tfrac12}(x\pm iy) = \sqrt{\tfrac{4\pi}3}\,r\,Y_1^{\pm1}(\theta,\phi) \qquad z = \sqrt{\tfrac{4\pi}3}\,r\,Y_1^0(\theta,\phi).$$
Note a slight change of normalisation and sign: $\hat J_{\pm1} = \mp\sqrt{\tfrac12}\,\hat J_\pm$. In terms of these spherical components $\hat V_m$,
$$[\hat J_0, \hat V_m] = m\hbar\hat V_m \qquad [\hat J_\pm, \hat V_m] = \hbar\sqrt{(1\mp m)(2\pm m)}\,\hat V_{m\pm1}.$$

If we compare these to the effects on states,
$$\hat J_3|j,m\rangle = \hbar m|j,m\rangle \qquad \hat J_\pm|j,m\rangle = \hbar\sqrt{(j\mp m)(j\pm m+1)}\,|j,m\pm1\rangle,$$
we see a close parallel, so long as we take $j=1$ for the vector operators. (Note that in this section we use the algebraically equivalent $(j\mp m)(j\pm m+1)$ for $j(j+1)-m(m\pm1)$ in the normalisation of $\hat J_\pm|j,m\rangle$.)
Consider the following two calculations. First, we consider matrix elements of the commutator of the components of a vector operator $\hat V_m$ with $\hat J_\pm$, in which $l=1$, and $p$ and $q$ are magnetic quantum numbers like $m$; in the second line we note that $\langle j,m|\hat J_\pm$ is the bra associated with $\hat J_\mp|j,m\rangle$:
$$\langle j',p|[\hat J_\pm, \hat V_m]|j,q\rangle = \hbar\sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|\hat V_{m\pm1}|j,q\rangle$$
and
$$\langle j',p|\hat J_\pm\hat V_m - \hat V_m\hat J_\pm|j,q\rangle = \hbar\sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|\hat V_m|j,q\rangle - \hbar\sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|\hat V_m|j,q\pm1\rangle$$
$$\Rightarrow\; \sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|\hat V_{m\pm1}|j,q\rangle = \sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|\hat V_m|j,q\rangle - \sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|\hat V_m|j,q\pm1\rangle$$
Secondly we take matrix elements of $\hat J_\pm = \hat J_\pm^{(1)} + \hat J_\pm^{(2)}$, giving us a relation between the Clebsch-Gordan coefficients for $l$ and $j$ coupling up to $j'$:
$$\langle j',p|\hat J_\pm\big(|l,m\rangle\otimes|j,q\rangle\big) = \hbar\sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|l,m;j,q\rangle$$
and
$$\langle j',p|\big(\hat J_\pm^{(1)} + \hat J_\pm^{(2)}\big)\big(|l,m\rangle\otimes|j,q\rangle\big) = \hbar\sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|l,m\pm1;j,q\rangle + \hbar\sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|l,m;j,q\pm1\rangle$$
$$\Rightarrow\; \sqrt{(l\mp m)(l\pm m+1)}\,\langle j',p|l,m\pm1;j,q\rangle = \sqrt{(j'\pm p)(j'\mp p+1)}\,\langle j',p\mp1|l,m;j,q\rangle - \sqrt{(j\mp q)(j\pm q+1)}\,\langle j',p|l,m;j,q\pm1\rangle$$
Comparing the two, we see that the coefficients are identical, but in the first they multiply matrix elements of $\hat{\mathbf V}$ whereas in the second they multiply Clebsch-Gordan coefficients. This can only be true if the matrix elements are proportional to the Clebsch-Gordan coefficients, with a constant of proportionality which must be independent of the magnetic quantum numbers, and which we will write as $\langle j'\|\hat{\mathbf V}\|j\rangle$, the reduced matrix element:
$$\langle j',p|\hat V_m|j,q\rangle = \langle j'\|\hat{\mathbf V}\|j\rangle\,\langle j',p|j,q;l,m\rangle\big|_{l=1}$$
This is a specific instance of the Wigner-Eckart theorem. It says that acting on a state with a vector operator is like coupling in one unit of angular momentum; only states with $|j'-1| \le j \le j'+1$ and with $p = m+q$ will have non-vanishing matrix elements. It also means that if one calculates one matrix element, whichever is the simplest (so long as it is non-vanishing), then the others can be written down directly.
Since $\hat{\mathbf J}$ is a vector operator, it follows that matrix elements of $\hat J_q$ can also be written in terms of a reduced matrix element $\langle j'\|\hat{\mathbf J}\|j\rangle$, but of course this vanishes unless $j' = j$.
Writing $|j_1,j_2;J,M\rangle = \sum_{m_1,m_2}\langle J,M|j_1,m_1;j_2,m_2\rangle\,|j_1,m_1\rangle\otimes|j_2,m_2\rangle$, and using orthonormality of the states $\{|J,M\rangle\}$, allows us to show that
$$\sum_{m_1,m_2}\langle J,M|j_1,m_1;j_2,m_2\rangle\langle J',M'|j_1,m_1;j_2,m_2\rangle = \delta_{JJ'}\delta_{MM'}. \tag{2.5}$$
Noting too that a scalar product of vector operators $\hat{\mathbf P}\cdot\hat{\mathbf Q}$ can be written in spherical components as $\sum_q(-1)^q\hat P_{-q}\hat Q_q$, we can show that
$$\langle j,m|\hat{\mathbf P}\cdot\hat{\mathbf J}|j,m\rangle = \sum_{q,j',m'}(-1)^q\langle j,m|\hat P_{-q}|j',m'\rangle\langle j',m'|\hat J_q|j,m\rangle = \sum_{q,m'}\langle j',m'|\hat P_q|j,m\rangle\langle j',m'|\hat J_q|j,m\rangle = \langle j\|\hat{\mathbf P}\|j\rangle\langle j\|\hat{\mathbf J}\|j\rangle$$
(we insert a complete set of states at the first step, then use the Wigner-Eckart theorem and Eq. (2.5)).
Replacing $\hat{\mathbf P}$ with $\hat{\mathbf J}$ gives us $\langle j\|\hat{\mathbf J}\|j\rangle = \sqrt{j(j+1)}$. Hence we have the extremely useful relation
$$\langle j,m|\hat{\mathbf P}|j,m\rangle = \frac{\langle j,m|\hat{\mathbf P}\cdot\hat{\mathbf J}|j,m\rangle}{j(j+1)}\,\langle j,m|\hat{\mathbf J}|j,m\rangle, \tag{2.6}$$
which we will use in calculating the Landé $g$-factor in the next section.
Finally, we might guess from the way that we used a general symbol $l$ instead of 1 that there are operators which couple in 2 or more units of angular momentum. Simple examples are obtained by writing $r^l Y_l^m$ in terms of $x$, $y$ and $z$, then setting $x\to\hat x$ etc.; so $(\hat x\pm i\hat y)^2$, $(\hat x\pm i\hat y)\hat z$ and $2\hat z^2 - \hat x^2 - \hat y^2$ are the $m=\pm2$, $m=\pm1$ and $m=0$ components of an operator with $l=2$ (a rank-two tensor operator, in the jargon). There are six components of $\hat x_i\hat x_j$, but $\hat x^2 + \hat y^2 + \hat z^2$ is a scalar ($l=0$). This is an example of the tensor product of two $l=1$ operators giving $l=2$ and $l=0$ operators.
3. Approximate methods I:
variational method and WKB

3.1 Approximate methods in Quantum Mechanics


It is often (almost always!) the case that we cannot solve real problems analytically. Only
a very few potentials have analytic solutions, by which I mean we can write down the energy
levels and wave functions in closed form, as we can for the familiar examples of the square well,
harmonic oscillator and Coulomb potential. In fact those are really the only useful ones, as they
do crop up in various physical contexts. In the last century, a number of approximate methods
have been developed to obtain information about systems which can’t be solved exactly.
These days, this might not seem very relevant. Computers can solve differential equations very
efficiently. But:

• It is always useful to have a check on numerical methods.

• They don’t provide much insight on which aspects of the physics are more important.

• Even supercomputers can’t solve the equations for many interacting particles exactly in
a reasonable time (where “many” may be as low as four, depending on the complexity of
the interaction) — ask a nuclear physicist or quantum chemist.

• Quantum field theories are systems with infinitely many degrees of freedom. All approaches to QFT must be approximate.

• If the system we are interested in is close to a soluble one, we might obtain more insight from approximate methods than from numerical ones. This is the realm of perturbation theory. The most accurate prediction ever made, for the anomalous magnetic moment of the electron, which is good to one part in $10^{12}$, is a 4th-order perturbative calculation.

In the next chapter we will consider perturbation theory, which is probably the most widely
used approximate method, and in this chapter we consider two other very useful approaches:
variational methods and WKB.

3.2 Variational methods: ground state
Shankar 16.1; Mandl 8.1; Griffiths 7.1; Gasiorowicz 14.4

Summary: Whatever potential we are considering, we can always obtain an upper bound on the ground-state energy.

Suppose we know the Hamiltonian of a bound system but don't have any idea what the energy of the ground state is, or the wave function. The variational principle states that if we simply guess the wave function, the expectation value of the Hamiltonian in that wave function will be greater than the true ground-state energy:
$$\frac{\langle\Psi|\hat H|\Psi\rangle}{\langle\Psi|\Psi\rangle} \ge E_0$$

This initially surprising result is more obvious if we consider expanding the (normalised) $|\Psi\rangle$ in the true energy eigenstates $|n\rangle$, which gives $\langle\hat H\rangle = \sum_n P_n E_n$. Since all the probabilities $P_n$ are non-negative, and all the $E_n$ greater than or equal to $E_0$, this is obviously not less than $E_0$. It is also clear that the better the guess (in the sense of maximising the overlap with the true ground state), the lower the energy bound, till successively better guesses converge on the true result.
As a very simple example, consider the infinite square well with $V = 0$ for $0<x<a$ and $V=\infty$ otherwise. As a trial function, we use $\Psi(x) = x(a-x)$ for $0<x<a$ and $\Psi(x) = 0$ otherwise. Then
$$\frac{\langle\Psi|\hat H|\Psi\rangle}{\langle\Psi|\Psi\rangle} = 10\,\frac{\hbar^2}{2ma^2} = 1.013\,\frac{\pi^2\hbar^2}{2ma^2}.$$
This is spectacularly good! Obviously it helped that our trial wave function looked a lot like what we'd expect of the true solution: symmetric about the midpoint, obeying the boundary conditions, no nodes....
In general, we will do better if we have an adjustable parameter, because then we can find the value which minimises our upper bound. So we could try $\Psi(x) = x(a-x) + bx^2(a-x)^2$ (with our previous guess corresponding to $b=0$). Letting Mathematica do the dirty work, we get an energy bound which is a function of $b$, which takes its minimum value of $1.00001E_0$ at $b = 1.133/a^2$. Not much room for further improvement here!
Above we have plotted, on the left, the true and approximate wave functions (except that
the true one is hidden under the second approximation, given in blue) and on the right, the
deviations of the approximate wave functions from the true one (except that for the second
approximation the deviation has been multiplied by 5 to render it visible!) This illustrates
a general principle though: the wave function does have deviations from the true one on the
part-per-mil scale, while the energy is good to 1 part in 105 . This is because the error in the
energy is proportional to the coefficients squared of the admixture of “wrong” states, whereas
the error in the wave function is linear in them.
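The integrals involved are simple enough that a computer-algebra system can reproduce these numbers directly; the sketch below (an illustration of the calculation just described, using sympy rather than Mathematica) evaluates the bound and minimises over $b$:

import sympy as sp

x, a, b = sp.symbols('x a b', positive=True)
hbar, m = sp.symbols('hbar m', positive=True)
psi = x*(a - x) + b*x**2*(a - x)**2        # trial function, vanishing at x = 0, a

num = sp.integrate(-hbar**2/(2*m)*psi*sp.diff(psi, x, 2), (x, 0, a))
den = sp.integrate(psi**2, (x, 0, a))
E = sp.simplify(num/den)                   # the variational bound E(b)

# b = 0 recovers the one-parameter result: E/E0 = 10/pi^2 = 1.0132...
print(sp.nsimplify(E.subs(b, 0)/(sp.pi**2*hbar**2/(2*m*a**2))))

# minimise over b, with a = hbar = m = 1
Eb = E.subs({a: 1, hbar: 1, m: 1})
b_min = sp.nsolve(sp.diff(Eb, b), b, 1.0)
print(b_min, sp.N(Eb.subs(b, b_min)/(sp.pi**2/2)))   # b ~ 1.133, ratio ~ 1.00001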
Another example, for which the analytic solution is not known, is the quartic potential, $V(x) = \beta x^4$. Here a Gaussian trial wave function $\Psi(x) = e^{-ax^2/2}$ gives an upper bound for the ground-state energy of $\frac38\,6^{1/3} = 0.68$ in units of $(\hbar^4\beta/m^2)^{1/3}$. (The value obtained from numerical solution of the equation is 0.668.)
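The same machinery handles the quartic case; the following sketch (illustrative) minimises $\langle\hat H\rangle(a)$ for the Gaussian trial function analytically:

import sympy as sp

x, a = sp.symbols('x a', positive=True)
hbar, m, beta = sp.symbols('hbar m beta', positive=True)
psi = sp.exp(-a*x**2/2)

H_psi = -hbar**2/(2*m)*sp.diff(psi, x, 2) + beta*x**4*psi
E = sp.integrate(psi*H_psi, (x, -sp.oo, sp.oo))/sp.integrate(psi**2, (x, -sp.oo, sp.oo))
E = sp.simplify(E)                  # hbar^2 a/(4m) + 3 beta/(4 a^2)

a_min = sp.solve(sp.diff(E, a), a)[0]
E_min = sp.simplify(E.subs(a, a_min))
print(E_min)                        # (3/8) 6^(1/3) (hbar^4 beta/m^2)^(1/3) ~ 0.68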
3.2.1 Variational methods: excited states
Shankar 16.1; Mandl 8.3

Summary: Symmetry considerations may allow us to extend the variational method to certain excited states.

Looking again at the expression $\langle\hat H\rangle = \sum_n P_n E_n$, and recalling that the $P_n$ are the squares of the overlaps between the trial function and the actual eigenstates of the system, we see that we can only find bounds on excited states if we can arrange for the overlap of the trial wave function with all lower states to be zero. Usually this is not possible.
However an exception occurs where the states of the system can be separated into sets with
different symmetry properties or other quantum numbers. Examples include parity and (in 3
dimensions) angular momentum. For example the lowest state with odd parity will automat-
ically have zero overlap with the (even-parity) ground state, and so an upper bound can be
found for it as well.
For the square well, the relevant symmetry is reflection about the midpoint of the well. If we choose a trial function which is antisymmetric about the midpoint, it must have zero overlap with the true ground state. So we can get a good bound on the first excited state, since $\langle\hat H\rangle = \sum_{n>0} P_n E_n \ge E_1$. Using $\Psi_1(x) = x(a-x)(2x-a)$ for $0<x<a$, we get $E_1 \le 42\hbar^2/(2ma^2) = 1.064E_1$.
If we wanted a bound on E2 , we’d need a wave function which was orthogonal to both the
ground state and the first excited state. The latter is easy by symmetry, but as we don’t know
the exact ground state (or so we are pretending!) we can’t ensure the first. We can instead
form a trial wave function which is orthogonal to the best trial ground state, but we will no
longer have a strict upper bound on the energy E2 , just a guess as to its value.
In this case we can choose Ψ(x) = x(a − x) + bx2 (a − x)2 with a new value of b which gives
orthogonality to the previous state, and then we get E2 ∼ 10.3E0 (as opposed to 9 for the
actual value).
3.2.2 Variational methods: the helium atom
Griffiths 7.2; Gasiorowicz 14.2,4; Mandl 7.2, 8.8.2;Shankar 16.1

Summary: The most famous example of the variational principle is the ground state of the two-electron helium atom.

If we could switch off the interactions between the electrons, we would know what the ground state of the helium atom would be: $\Psi(\mathbf r_1,\mathbf r_2) = \phi^{Z=2}_{100}(\mathbf r_1)\,\phi^{Z=2}_{100}(\mathbf r_2)$, where $\phi^Z_{nlm}$ is a single-particle wave function of the hydrogenic atom with nuclear charge $Z$. For the ground state, $n=1$ and $l=m=0$ (spherical symmetry). The energy of the two electrons would be $-2Z^2E_{\rm Ry} = -108.8$ eV. But the experimental energy is only $-78.6$ eV (i.e. it takes 78.6 eV to fully ionise neutral helium). The difference is obviously due to the fact that the electrons repel one another.
The full Hamiltonian (ignoring the motion of the nucleus, a good approximation for the accuracy to which we will be working) is
$$-\frac{\hbar^2}{2m}(\nabla_1^2 + \nabla_2^2) - 2\hbar c\alpha\left(\frac1{|\mathbf r_1|} + \frac1{|\mathbf r_2|}\right) + \hbar c\alpha\,\frac1{|\mathbf r_1 - \mathbf r_2|}$$
where $\nabla_1^2$ involves differentiation with respect to the components of $\mathbf r_1$, and $\alpha = e^2/(4\pi\epsilon_0\hbar c) \approx 1/137$. (See here for a note on units in EM.)
A really simple guess at a trial wave function for this problem would just be $\Psi(\mathbf r_1,\mathbf r_2)$ as written above. The expectation value of the repulsive interaction term is $(5Z/4)E_{\rm Ry}$, giving a total energy of $-74.8$ eV. (Gasiorowicz demonstrates the integral, as do Fitzpatrick and Branson.)
It turns out we can do even better if we use the atomic number $Z$ in the wave function $\Psi$ as a variational parameter (that in the Hamiltonian, of course, must be left at 2). The best value turns out to be $Z = 27/16$, and that gives a better upper bound of $-77.5$ eV, just slightly higher than the experimental value. (Watch the sign: we get a lower bound for the ionization energy.) This effective nuclear charge of less than 2 presumably reflects the fact that to some extent each electron screens the nuclear charge from the other.
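The $Z$-dependence just described can be checked in a few lines. Collecting the terms quoted above, the variational energy in Rydbergs is $E(Z) = 2Z^2 - 8Z + \frac54 Z$ (kinetic energy, attraction to the charge-2 nucleus, and electron-electron repulsion respectively); the sketch below (illustrative, with $E_{\rm Ry} = 13.6$ eV) minimises it:

import sympy as sp

Z = sp.symbols('Z', positive=True)
E_Ry = 13.6   # eV

# <T> = 2 Z^2, <V_ne> = 2*(-2*Z*2) = -8Z, <V_ee> = (5/4) Z, all in Rydbergs
E = 2*Z**2 - 8*Z + sp.Rational(5, 4)*Z

Z_best = sp.solve(sp.diff(E, Z), Z)[0]
print(Z_best, sp.N(E.subs(Z, Z_best)*E_Ry))   # Z = 27/16, E ~ -77.5 eV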

3.3 WKB approximation


Shankar 16.2; Gasiorowicz Supplement 4A, Griffiths 8.1

Summary: The WKB approximation works for potentials which are slowly-varying on the scale of the wavelength of the particle and is particularly useful for describing tunnelling.

The WKB approximation is named for G. Wentzel, H.A. Kramers, and L. Brillouin, who independently developed the method in 1926. There are pre-quantum antecedents due to Jeffreys and Rayleigh, though.
We can always write the one-dimensional time-independent Schrödinger equation as
$$\frac{d^2\psi}{dx^2} = -k(x)^2\,\psi(x)$$
where $k(x) \equiv \sqrt{2m(E-V(x))}/\hbar$. We could think of the quantity $k(x)$ as a spatially-varying wavenumber ($k = 2\pi/\lambda$), though we anticipate that this can only make sense if it doesn't change too quickly with position, or else we can't identify a wavelength at all.
If the potential, and hence $k$, were constant, the plane waves $e^{\pm ikx}$ would be solutions. Let's see under what conditions a solution of the form
$$\psi(x) = A\exp\left(\pm i\int^x k(x')\,dx'\right)$$
might be a good approximate solution when the potential varies with position. Plugging this into the SE above, the LHS reads $-(k^2 \mp ik')\psi$. (Here and hereafter, primes denote differentiation with respect to $x$, except when they indicate an integration variable.) So provided $|k'/k^2| \ll 1$, or equivalently $|\lambda'| \ll 1$, this is indeed a good solution, as the second term can be ignored. And $|\lambda'| \ll 1$ does indeed mean that the wavelength is slowly varying. (One sometimes reads that what is needed is that the potential is slowly varying. But that is not a well-defined statement, because $dV/dx$ is not dimensionless. For any smooth potential, at high-enough energy we will have $|\lambda'| \ll 1$. What is required is that the length-scale of variation of $\lambda$, or $k$, or $V$ (the scales are all approximately equal) is large compared with the de Broglie wavelength of the particle.)
An obvious problem with this form is that the probability current isn't constant: if we calculate it we get $|A|^2\hbar k(x)/m$. A better approximation is
$$\psi(x) = \frac{A}{\sqrt{k(x)}}\exp\left(\pm i\int^x k(x')\,dx'\right)$$
which gives a constant flux. (Classically, the probability of finding a particle in a small region is inversely proportional to the speed with which it passes through that region.) Furthermore one can show that if the error in the first approximation is $O(|\lambda'|)$, the residual error with the second approximation is $O(|\lambda'|^2)$. At first glance there is a problem with the second form when $k(x) = 0$, i.e. when $E = V(x)$. But near these points (the classical turning points) the whole approximation scheme is invalid anyway, because $\lambda\to\infty$ and so the potential cannot be "slowly varying" on the scale of $\lambda$.
For a region of constant potential, of course, there is no difference between the two approximations, and both reduce to a plane wave, since $\int^x k(x')\,dx' = kx$.
For regions where $E < V(x)$, $k(x)$ will be imaginary and there is no wavelength as such. Instead we get real exponentials, which describe tunnelling. But defining $\lambda = 2\pi/|k|$ still, the WKB approximation will continue to be valid if $|\lambda'| \ll 1$ (where $\lambda$ should now be interpreted as a decay length).
Tunnelling and bound-state problems inevitably include regions where $E \approx V(x)$ and the WKB approximation isn't valid. This would seem to be a major problem. However, if such regions are short, the requirement that the wave function and its derivative be continuous can help us to "bridge the gap".

3.3.1 WKB approximation for bound states


Shankar 16.2; Gasiorowicz Supplement 4A; Griffiths 8.3
In a bound-state problem with potential $V(x)$, for a given energy $E$, we can divide space into classically allowed regions, for which $E > V(x)$, and classically forbidden regions, for which $E < V(x)$. For simplicity we will assume that there are only three regions in total: classically forbidden for $x<a$ and $x>b$, and classically allowed for $a<x<b$.
In the classically allowed region $a<x<b$ the wave function will be oscillatory, and we can write it either as an equal superposition of right- and left-moving complex exponentials or as
$$\psi(x) = \frac{A}{\sqrt{k(x)}}\cos\left(\int_{x_0}^x k(x')\,dx' + \phi\right)$$
where $x_0$ is some reference point; if it is changed, $\phi$ will also change.
For the particular case of a well with infinite sides, the solution must vanish at the boundaries. Hence, taking $x_0 = a$, we can choose $\phi = -\frac12\pi$. We then require $\int_a^b k(x')\,dx' + \phi = (n+\frac12)\pi$ so that $\psi(b)$ vanishes; in other words
$$\int_a^b k(x')\,dx' = (n+1)\pi, \qquad\text{for } n = 0, 1, 2, \ldots.$$
(The integral is positive-definite, so $n$ cannot be negative.) Now $k(x)$ depends on $E$, and this condition will not hold for a general $E$, any more than the boundary conditions can be satisfied for an arbitrary $E$ in the infinite square well. Instead we obtain an expression for $E$ in terms of $n$, giving a spectrum of levels $E_0, E_1, \ldots$.

Of course for the infinite square well $k = \sqrt{2mE}/\hbar$ is constant, and the condition gives $k = (n+1)\pi/(b-a)$, which is exact. (Using $n+1$ rather than $n$ allows us to start with $n=0$; starting at $n=1$ is more usual for an infinite well.)
For a more general potential, outside the classically allowed region we will have decaying exponentials. In the vicinity of the turning points these solutions will not be valid, but if we approximate the potential as linear we can solve the Schrödinger equation exactly (in terms of Airy functions). Matching these to our WKB solutions in the vicinity of $x=a$ and $x=b$ gives the surprisingly simple result
$$\int_a^b k(x')\,dx' = (n+\tfrac12)\pi.$$
This is the quantisation condition for a finite well; it is different from the infinite well because the solution can leak into the forbidden region. (For a semi-infinite well, the condition is that the integral equal $(n+\frac34)\pi$. This is the appropriate form for the $l=0$ solutions of a spherically symmetric well.) Unfortunately we can't check this against the finite square well, though, because there the potential is definitely not slowly varying at the edges, nor can it be approximated as linear. But we can try the harmonic oscillator, for which the integral gives $E\pi/(\hbar\omega)$, and hence the quantisation condition gives $E = (n+\frac12)\hbar\omega$! The approximation was only valid for large $n$ (small wavelength), but in fact we've obtained the exact answer for all levels.
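The quantisation condition is also easy to apply numerically to a potential with no analytic solution; the sketch below (illustrative; units $\hbar = m = \beta = 1$) solves $\int_a^b k\,dx = (n+\frac12)\pi$ for the quartic potential $V = \beta x^4$ met in the variational section:

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def action(E):
    b = E**0.25                                   # classical turning point
    integrand = lambda x: np.sqrt(2.0*(E - x**4))
    val, _ = quad(integrand, -b, b, limit=200)
    return val

for n in range(4):
    E_wkb = brentq(lambda E: action(E) - (n + 0.5)*np.pi, 1e-6, 100.0)
    print(n, E_wkb)
# n=0 gives ~0.55 against the exact 0.668 (in units of (hbar^4 beta/m^2)^(1/3));
# the agreement improves rapidly with n, as expected for WKB.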

Matching with Airy Functions

Griffiths 8.3

This subsection gives details of the matching process. The material in it is not examinable. More about Airy functions can be found in section A.9.
If we can treat the potential as linear over a wide-enough region around the turning points that, at the edges of the region, the WKB approximation is valid, then we can match the WKB and exact solutions.
Consider a linear potential $V = \beta x$ as an approximation to the potential near the right-hand turning point $b$. We will scale $x = (\hbar^2/(2m\beta))^{1/3}z$ and $E = (\hbar^2\beta^2/(2m))^{1/3}\mu$, so the turning point is at $z = \mu$. Then the differential equation is $y''(z) - zy(z) + \mu y(z) = 0$, and the solution which decays as $z\to\infty$ is $y(z) = A\,\mathrm{Ai}(z-\mu)$. This has to be matched, for $z$ not too close to $\mu$, to the WKB solution. In these units, $k(x) = (\mu-z)^{1/2}$ and
$$\int_b^x k(x')\,dx' = \int_\mu^z (\mu-z')^{1/2}\,dz' = -\tfrac23(\mu-z)^{3/2},$$
so
$$\psi^{\rm WKB}_{x<b}(z) = \frac{B}{(\mu-z)^{1/4}}\cos\left(-\tfrac23(\mu-z)^{3/2} + \phi\right) \quad\text{and}\quad \psi^{\rm WKB}_{x>b}(z) = \frac{C}{(z-\mu)^{1/4}}\exp\left(-\tfrac23(z-\mu)^{3/2}\right).$$
(We chose the lower limit of integration to be $b$ in order that the constant of integration vanished; any other choice would just shift $\phi$.) Now the asymptotic forms of the Airy function are known:
$$\mathrm{Ai}(z) \xrightarrow{z\to-\infty} \frac{\cos\left(\tfrac23|z|^{3/2} - \tfrac\pi4\right)}{\sqrt\pi\,|z|^{1/4}} \qquad \mathrm{Ai}(z) \xrightarrow{z\to\infty} \frac{e^{-\frac23 z^{3/2}}}{2\sqrt\pi\,z^{1/4}}$$
so
$$\mathrm{Ai}(z-\mu) \xrightarrow{z\to-\infty} \frac{\cos\left(\tfrac23(\mu-z)^{3/2} - \tfrac\pi4\right)}{\sqrt\pi\,(\mu-z)^{1/4}} \qquad \mathrm{Ai}(z-\mu) \xrightarrow{z\to\infty} \frac{e^{-\frac23(z-\mu)^{3/2}}}{2\sqrt\pi\,(z-\mu)^{1/4}}$$
and these will match the WKB expressions exactly provided $B = 2C$ and $\phi = \pi/4$.
At the left-hand turning point $a$, the potential is $V = -\beta'x$ (with $\beta \neq \beta'$ in general) and the solution is $y(z) = A\,\mathrm{Ai}(-z+\nu)$. On the other hand the WKB integral is
$$\int_a^x k(x')\,dx' = \int_\nu^z (z'-\nu)^{1/2}\,dz' = \tfrac23(z-\nu)^{3/2}.$$
So in the classically allowed region we are comparing
$$\mathrm{Ai}(-z+\nu) \xrightarrow{z\to\infty} \frac{\cos\left(\tfrac23(z-\nu)^{3/2} - \tfrac\pi4\right)}{\sqrt\pi\,(z-\nu)^{1/4}} \quad\text{with}\quad \psi^{\rm WKB}_{x>a}(z) = \frac{D}{(z-\nu)^{1/4}}\cos\left(\tfrac23(z-\nu)^{3/2} + \phi'\right)$$
which requires $\phi' = -\pi/4$. (Note that $\phi' \neq \phi$ because we have taken a different lower limit of the integral.)
So now we have two expressions for the solution, valid everywhere except in the vicinity of the boundaries,
$$\psi(x) = \frac{D}{\sqrt{k(x)}}\cos\left(\int_a^x k(x')\,dx' - \pi/4\right) \quad\text{and}\quad \psi(x) = \frac{B}{\sqrt{k(x)}}\cos\left(\int_b^x k(x')\,dx' + \pi/4\right)$$
which can be satisfied only if $D = \pm B$ and $\int_a^b k(x')\,dx' = (n+\frac12)\pi$, as required.
It is worth stressing that although, for a linear potential, the exact (Airy function) and WKB solutions match "far away" from the turning point, they do not do so close in. The $(z-\mu)^{-1/4}$ factors in the latter mean that they blow up, but the former are perfectly smooth. They are shown (for $\mu = 0$) in the figure, in red for the WKB and black for the exact functions. We can see they match very well so long as $|z-\mu| > 1$; in fact $z\to\infty$ is overkill!
So now we can be more precise about the conditions under which the matching is possible: we need the potential to be linear over the region $\Delta x \sim (\hbar^2/(2m\beta))^{1/3}$, where $\beta = dV/dx$. Linearity means that $\Delta V/\Delta x \approx dV/dx$ at the turning point, or $(d^2V/dx^2)\,\Delta x \ll dV/dx$ (assuming the curvature is the dominant non-linearity, as is likely if $V$ is smooth). For the harmonic oscillator, $(d^2V/dx^2)\,\Delta x/(dV/dx) = 2(\hbar\omega/E)^{2/3}$, which is only much less than 1 for very large $n$, making the exact result even more surprising!
3.3.2 WKB approximation for tunnelling
Shankar 16.2; Gasiorowicz Supplement 4B; Griffiths 8.2
For the WKB approximation to be applicable to tunnelling through a barrier, we need as always $|\lambda'| \ll 1$. In practice that means that the barrier function is reasonably smooth and that $E \ll V(x)$. Now it would of course be possible to do a careful calculation, writing down the WKB wave function in the three regions (left of the barrier, under the barrier and right of the barrier), linearising in the vicinity of the turning points in order to match the wave function and its derivatives at both sides. This however is a tiresomely lengthy task, and we will not attempt it. Instead, recall the result for a high, wide square barrier; the transmission coefficient in the limit $e^{-2\kappa L} \ll 1$ is given by
$$T = \frac{16k_1k_2\kappa^2}{(\kappa^2+k_1^2)(\kappa^2+k_2^2)}\,e^{-2\kappa L},$$
where $k_1$ and $k_2$ are the wavenumbers on either side of the barrier (width $L$, height $V$) and $\kappa = \sqrt{2m(V-E)}/\hbar$. (See the revision notes.) All the prefactors are not negligible, but they are weakly energy-dependent, whereas the $e^{-2\kappa L}$ term is very strongly energy-dependent. If we plot $\log T$ against energy, the form will be essentially $\text{const} - 2\kappa(E)L$, and so we can still make predictions without worrying about the constant.
For a barrier which is not constant, the WKB approximation will yield a similar expression for the tunnelling probability:
$$T = [\text{prefactor}] \times \exp\left(-2\int_a^b \kappa(x')\,dx'\right),$$
where $\kappa(x) \equiv \sqrt{2m(V(x)-E)}/\hbar$. The WKB approximation is like treating a non-square barrier as a succession of square barriers of different heights. The need for $V(x)$ to be slowly varying is then due to the fact that we are slicing the barrier sufficiently thickly that $e^{-2\kappa\Delta L} \ll 1$ for each slice.
The classic application of the WKB approach to tunnelling is alpha decay. The potential here is a combination of an attractive short-range nuclear force and the repulsive Coulomb interaction between the alpha particle and the daughter nucleus. Unstable states have energies greater than zero, but they are long-lived because they are classically confined by the barrier. (It takes some thought to see that a repulsive force can cause quasi-bound states to exist!) The semi-classical model is of a pre-formed alpha particle of energy $E$ bouncing back and forth many times ($f$) per second, with a probability of escape each time given by the tunnelling probability, so the decay rate is given by $1/\tau = fT$. Since we can't calculate $f$ with any reliability, we would be silly to worry about the prefactor in $T$, but the primary dependence of the decay rate on the energy of the emitted particle will come from the easily-calculated exponential.
In the figure above the value of $a$ is roughly the nuclear radius $R$, and $b$ is given by $V_C(b) = E$, with the Coulomb potential $V_C(r) = zZ\hbar c\alpha/r$. ($Z$ is the atomic number of the daughter nucleus and $z = 2$ that of the alpha.) The integral in the exponent can be done (see Gasiorowicz Supplement 4B for details, though note there is a missing minus sign between the two terms in square brackets in eq. 4B-4; the substitution $r = b\cos^2\theta$ is used), giving in the limit $b \gg a$
$$2\int_R^b \kappa(r)\,dr = 2\pi zZ\alpha\sqrt{\frac{mc^2}{2E}} = 3.96\,\frac{Z}{\sqrt{E(\text{MeV})}} \;\Rightarrow\; \log_{10}\tau = \text{const} + 1.72\,\frac{Z}{\sqrt{E(\text{MeV})}}.$$
This is the Geiger-Nuttall law. Data for the lifetimes of long-lived isotopes (those with low-energy alphas) fit such a functional form well, but with 1.61 rather than 1.72. In view of the fairly crude approximations made, this is a pretty good result. Note it is independent of the nuclear radius because we used $b \gg R$; we could have kept the first correction, proportional to $\sqrt{R/b}$, to improve the result. Indeed the first estimates of nuclear radii came from exactly such studies.
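The numbers quoted are easy to reproduce; the sketch below (illustrative) evaluates the Gamow exponent in the $b\gg R$ limit, using the standard alpha-particle rest energy:

import numpy as np

alpha_fs = 1/137.036      # fine-structure constant
mc2 = 3727.4              # alpha-particle rest energy in MeV
z = 2                     # charge of the alpha

def gamow_exponent(Z, E_MeV):
    """2 * integral of kappa dr from R to b, in the b >> R limit."""
    return 2*np.pi*z*Z*alpha_fs*np.sqrt(mc2/(2*E_MeV))

# Geiger-Nuttall slope: log10(tau) = const + slope * Z/sqrt(E)
print(gamow_exponent(1, 1.0)/np.log(10))    # ~1.72, as quoted above

# e.g. a Z=90 daughter and E ~ 4.3 MeV gives an exponent ~170, so T ~ 10^-74
print(gamow_exponent(90, 4.3))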
A version of the classic figure of the results is given below on the left (found at https://web-docs.gsi.de/~wolle/TELEKOLLEG/KERN/LECTURE/Fraser/L14.pdf, which may not be the original source). The marked energy scale is non-linear; the linear variable on the $x$-axis is actually $-E^{-1/2}$, so that the slope is negative. Straight lines join isotopes of the same element. On the right is a more recent plot (Astier et al., Eur. Phys. J. A46 (2010) 165) of $\log\tau$ against $E^{-1/2}$, showing deviations from linearity at the high-energy (left-hand) end.
4. Approximate methods II:
Time-independent perturbation theory

Perturbation theory is the most widely used approximate method. It requires we have a set of
exact solutions to a Hamiltonian which is close to the realistic one.

4.1 Non-degenerate perturbation theory


Shankar 17.1; Mandl 7.1; Griffiths 6.1

Summary: First assume the exactly-solvable Hamiltonian has no degeneracies (repeated eigenvalues); then the rules are straightforward.

Perturbation theory is applicable when the Hamiltonian $\hat H$ can be split into two parts, with the first part being exactly solvable and the second part being small in comparison. The first part is always written $\hat H^{(0)}$, and we will denote its eigenstates by $|n^{(0)}\rangle$ and energies by $E_n^{(0)}$ (with wave functions $\phi_n^{(0)}$). These we know, and for now assume to be non-degenerate. The eigenstates and energies of the full Hamiltonian are denoted $|n\rangle$ and $E_n$, and the aim is to find successively better approximations to these. The zeroth-order approximation is simply $|n\rangle = |n^{(0)}\rangle$ and $E_n = E_n^{(0)}$, which is just another way of saying that the perturbation is small and at a crude enough level of approximation we can ignore it entirely.
Nomenclature for the perturbing Hamiltonian $\hat H - \hat H^{(0)}$ varies: $\delta\hat V$, $\hat H^{(1)}$ and $\lambda\hat H^{(1)}$ are all common. It usually is a perturbing potential, but we won't assume so here, so we won't use the first. The second and third differ in that the third has explicitly identified a small, dimensionless parameter (e.g. $\alpha$ in EM), so that the residual $\hat H^{(1)}$ isn't itself small. With the last choice, our expressions for the eigenstates and energies of the full Hamiltonian will be explicit power series in $\lambda$, so $E_n = E_n^{(0)} + \lambda E_n^{(1)} + \lambda^2 E_n^{(2)} + \ldots$ etc. With the second choice the small factor is hidden in $\hat H^{(1)}$, and is implicit in the expansion, which then reads $E_n = E_n^{(0)} + E_n^{(1)} + E_n^{(2)} + \ldots$. In this case one has to remember that anything with a superscript $(1)$ is first order in this implicit small factor, or more generally the superscript $(m)$ denotes something which is $m$th order. For the derivation of the equations we will retain an explicit $\lambda$, but thereafter we will set it equal to one to revert to the other formulation. We will take $\lambda$ to be real so that $\hat H^{(1)}$ is Hermitian.
We start with the master equation
$$(\hat H^{(0)} + \lambda\hat H^{(1)})|n\rangle = E_n|n\rangle.$$
Then we substitute in $E_n = E_n^{(0)} + \lambda E_n^{(1)} + \lambda^2 E_n^{(2)} + \ldots$ and $|n\rangle = |n^{(0)}\rangle + \lambda|n^{(1)}\rangle + \lambda^2|n^{(2)}\rangle + \ldots$ and expand. Then, since $\lambda$ is a free parameter, we have to match terms on each side with the same powers of $\lambda$, to get
$$\hat H^{(0)}|n^{(0)}\rangle = E_n^{(0)}|n^{(0)}\rangle$$
$$\hat H^{(0)}|n^{(1)}\rangle + \hat H^{(1)}|n^{(0)}\rangle = E_n^{(0)}|n^{(1)}\rangle + E_n^{(1)}|n^{(0)}\rangle$$
$$\hat H^{(0)}|n^{(2)}\rangle + \hat H^{(1)}|n^{(1)}\rangle = E_n^{(0)}|n^{(2)}\rangle + E_n^{(1)}|n^{(1)}\rangle + E_n^{(2)}|n^{(0)}\rangle$$
We have to solve these sequentially. The first we assume we have already done. The second will yield $E_n^{(1)}$ and $|n^{(1)}\rangle$. Once we know these, we can use the third equation to yield $E_n^{(2)}$ and $|n^{(2)}\rangle$, and so on. The expressions for the changes in the states, $|n^{(1)}\rangle$ etc., will make use of the fact that the unperturbed states $\{|n^{(0)}\rangle\}$ form a basis, so we can write
$$|n^{(1)}\rangle = \sum_m c_m|m^{(0)}\rangle = \sum_m \langle m^{(0)}|n^{(1)}\rangle\,|m^{(0)}\rangle.$$

In each case, to solve for the energy we take the inner product with $\langle n^{(0)}|$ (i.e. the same state), whereas for the wave function we use $\langle m^{(0)}|$ (another state). We use, of course, $\langle m^{(0)}|\hat H^{(0)} = E_m^{(0)}\langle m^{(0)}|$ and $\langle m^{(0)}|n^{(0)}\rangle = \delta_{mn}$.
At first order we get
$$E_n^{(1)} = \langle n^{(0)}|\hat H^{(1)}|n^{(0)}\rangle \tag{4.1}$$
$$\langle m^{(0)}|n^{(1)}\rangle = \frac{\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}} \quad \forall\, m\neq n. \tag{4.2}$$
The second equation tells us the overlap of $|n^{(1)}\rangle$ with all the other $|m^{(0)}\rangle$, but not with $|n^{(0)}\rangle$. This is obviously not constrained by the eigenvalue equation, because we can add any amount of $|n^{(0)}\rangle$ and the equations will still be satisfied. However we need the state to continue to be normalised, and when we expand $\langle n|n\rangle = 1$ in powers of $\lambda$ we find that $\langle n^{(0)}|n^{(1)}\rangle$ is required to be imaginary. This is just like a phase rotation of the original state and we can ignore it. (Recall that an infinitesimal change in a unit vector has to be at right angles to the original.) Hence
$$|n^{(1)}\rangle = \sum_{m\neq n} \frac{\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}}\,|m^{(0)}\rangle. \tag{4.3}$$

If the spectrum of $\hat H^{(0)}$ is degenerate, there may be a problem with this expression because the denominator can vanish, making the corresponding term infinite. In fact nothing that we have done so far is directly valid in that case, and we have to use "degenerate perturbation theory" instead. For now we assume that for any two states $|m^{(0)}\rangle$ and $|n^{(0)}\rangle$, either $E_n^{(0)} - E_m^{(0)} \neq 0$ (non-degenerate) or $\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle = 0$ (the states are not mixed by the perturbation).

Then at second order,
$$E_n^{(2)} = \langle n^{(0)}|\hat H^{(1)}|n^{(1)}\rangle = \sum_{m\neq n} \frac{\big|\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle\big|^2}{E_n^{(0)} - E_m^{(0)}}. \tag{4.4}$$

The expression for the second-order shift in the wave function $|n^{(2)}\rangle$ can also be found, but it is tedious. The main reason we wanted $|n^{(1)}\rangle$ was to find $E_n^{(2)}$ anyway, and we're not planning to find $E_n^{(3)}$! Note that though the expression for $E_n^{(1)}$ is generally applicable, those for $|n^{(1)}\rangle$ and $E_n^{(2)}$ would need some modification if the Hamiltonian had continuum eigenstates as well as bound states (eg the hydrogen atom). Provided the state $|n\rangle$ is bound, that is just a matter of integrating rather than summing. This restriction to bound states is why Mandl calls chapter 7 “bound-state perturbation theory”. The perturbation of continuum states (eg scattering states) is usually dealt with separately.
Note that the equations above hold whether we have identified an explicit small parameter $\lambda$ or not. So from now on we will set $\lambda$ to one, assume that $\hat H^{(1)}$ has an implicit small parameter within it, and write $E_n = E_n^{(0)} + E_n^{(1)} + E_n^{(2)} + \dots$; the expressions above for $E_n^{(1,2)}$ and $|n^{(1)}\rangle$ are still valid.
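These formulae are easy to check numerically. Below is a minimal sketch (not from the notes; the matrix size and values are illustrative) that compares $E_n^{(0)} + E_n^{(1)} + E_n^{(2)}$ from Eqs. (4.1) and (4.4) with exact diagonalisation for a small non-degenerate problem:

```python
import numpy as np

rng = np.random.default_rng(1)
E0 = np.array([1.0, 2.0, 3.5, 5.0])              # non-degenerate E_n^(0)
V = rng.normal(size=(4, 4))
H1 = 0.01 * (V + V.T)                            # small Hermitian perturbation
exact = np.linalg.eigvalsh(np.diag(E0) + H1)     # ascending, matches order of E0
for n in range(4):
    E1 = H1[n, n]                                # Eq. (4.1)
    E2 = sum(H1[m, n]**2 / (E0[n] - E0[m])       # Eq. (4.4)
             for m in range(4) if m != n)
    print(f"exact {exact[n]:.6f}   perturbative {E0[n] + E1 + E2:.6f}")
```

The two columns agree up to corrections of third order in the perturbation, as expected.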

4.1.1 Connection to variational approach


We have shown that $\langle\psi|\hat H|\psi\rangle \geq E_0$ for all normalised states $|\psi\rangle$ (with equality implying $|\psi\rangle = |0\rangle$). Thus for the (non-degenerate) ground state, $E_0^{(0)} + E_0^{(1)}$ is an upper bound on the exact energy $E_0$, since it is obtained by using the unperturbed ground state as a trial state for the full Hamiltonian. It follows that the sum of all higher corrections $E_0^{(2)} + \dots$ must be negative. We can see indeed that $E_0^{(2)}$ will always be negative, since for every term in the sum the numerator is positive and the denominator negative. (The fact that $\langle\psi|\hat H|\psi\rangle \geq E_0$ is the basis of the variational approach to finding the ground state energy, where we vary a trial state $|\psi\rangle$ to optimise an upper bound on $E_0$.)

4.1.2 Perturbed infinite square well


Probably the simplest example we can think of is an infinite square well with a low step half way across, so that
$$V(x) = \begin{cases} 0 & \text{for } 0 < x < a/2,\\ V_0 & \text{for } a/2 < x < a,\\ \infty & \text{elsewhere.}\end{cases}$$
We treat this as a perturbation on the flat-bottomed well, so $\hat H^{(1)} \underset{x}{\longrightarrow} V_0$ for $a/2 < x < a$ and zero elsewhere.
The ground-state unperturbed wave function is $\psi_0^{(0)} = \sqrt{2/a}\,\sin(\pi x/a)$, with unperturbed energy $E_0^{(0)} = \pi^2\hbar^2/(2ma^2)$. A “low” step will mean $V_0 \ll E_0^{(0)}$. Then we have
$$E_0^{(1)} = \langle\psi_0^{(0)}|\hat H^{(1)}|\psi_0^{(0)}\rangle = \frac{2}{a}\int_{a/2}^{a} V_0\sin^2\frac{\pi x}{a}\,dx = \frac{V_0}{2}.$$

This problem can be solved semi-analytically: in both regions the solutions are sinusoids, but with wavenumbers $k = \sqrt{2mE}/\hbar$ and $k' = \sqrt{2m(E - V_0)}/\hbar$ respectively; satisfying the boundary conditions and matching the wave functions and derivatives at $x = a/2$ gives the condition $k\cot(ka/2) = -k'\cot(k'a/2)$, which can be solved numerically for $E$. Below the exact solution (green, dotted) and $E_0^{(0)} + E_0^{(1)}$ (blue) are plotted; we can see that they start to diverge when $V_0$ is about 5, which is higher than we might have expected (everything is in units of $\hbar^2/(2ma^2) \approx 0.1\,E_0^{(0)}$).
[Plot: ground-state energy $E$ against step height $V_0$, both in units of $\hbar^2/(2ma^2)$: exact solution (green, dotted) and $E_0^{(0)} + E_0^{(1)}$ (blue).]
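As a check, the matching condition can be handed to an off-the-shelf root finder. The following sketch (in units $\hbar^2/(2ma^2) = 1$ and $a = 1$, so $E_0^{(0)} = \pi^2$; the root bracket is chosen by hand for these values of $V_0$) reproduces the behaviour in the plot:

```python
import numpy as np
from scipy.optimize import brentq

def matching(E, V0):
    k, kp = np.sqrt(E), np.sqrt(E - V0)      # valid for E > V0, true here
    return k / np.tan(k / 2) + kp / np.tan(kp / 2)

for V0 in (1.0, 5.0, 10.0):
    E_exact = brentq(matching, V0 + 1e-9, 16.0, args=(V0,))
    print(f"V0 = {V0:4.1f}: exact E = {E_exact:6.3f}, "
          f"first order = {np.pi**2 + V0 / 2:6.3f}")
```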

We can also plot the exact wave functions for different step sizes, and see that for $V_0 = 10$ (the middle picture, well beyond the validity of first-order perturbation theory) the wave function is significantly different from a simple sinusoid.
[Plots: exact wave functions in the stepped well for three step heights, with $V_0 = 10$ in the middle panel.]

4.1.3 Perturbed Harmonic oscillator


Another example is the harmonic oscillator, $\hat H^{(0)} = \frac{\hat p^2}{2m} + \frac12 m\omega^2\hat x^2$, with a perturbing potential $\hat H^{(1)} = \lambda\hat x^2$. The states of the unperturbed oscillator are denoted $|n^{(0)}\rangle$ with energies $E_n^{(0)} = (n + \frac12)\hbar\omega$. Recalling that in terms of creation and annihilation operators (see section 1.6), $\hat x = (x_0/\sqrt2)(\hat a + \hat a^\dagger)$, with $[\hat a, \hat a^\dagger] = 1$ and $x_0 = \sqrt{\hbar/(m\omega)}$, we have
$$E_n^{(1)} = \langle n^{(0)}|\hat H^{(1)}|n^{(0)}\rangle = \frac{x_0^2\lambda}{2}\langle n^{(0)}|(\hat a^\dagger)^2 + \hat a^2 + 2\hat a^\dagger\hat a + 1|n^{(0)}\rangle = \frac{\lambda}{m\omega^2}\,\hbar\omega(n + \tfrac12).$$
The first-order change in the wave function is also easy to compute, as $\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle = 0$ unless $m = n \pm 2$. Thus
$$|n^{(1)}\rangle = \sum_{m\neq n}\frac{\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle}{E_n^{(0)} - E_m^{(0)}}|m^{(0)}\rangle = \frac{\hbar\lambda}{2m\omega}\left(\frac{\sqrt{(n+1)(n+2)}}{-2\hbar\omega}\,|(n+2)^{(0)}\rangle + \frac{\sqrt{n(n-1)}}{2\hbar\omega}\,|(n-2)^{(0)}\rangle\right).$$
We can now also calculate the second-order shift in the energy:
$$E_n^{(2)} = \langle n^{(0)}|\hat H^{(1)}|n^{(1)}\rangle = \sum_{m\neq n}\frac{\bigl|\langle m^{(0)}|\hat H^{(1)}|n^{(0)}\rangle\bigr|^2}{E_n^{(0)} - E_m^{(0)}} = \left(\frac{\hbar\lambda}{2m\omega}\right)^2\left(\frac{(n+1)(n+2)}{-2\hbar\omega} + \frac{n(n-1)}{2\hbar\omega}\right) = -\frac12\left(\frac{\lambda}{m\omega^2}\right)^2\hbar\omega(n + \tfrac12).$$
We can see a pattern emerging, and of course this is actually a soluble problem, as all that the perturbation has done is change the frequency. Defining $\omega' = \omega\sqrt{1 + 2\lambda/(m\omega^2)}$, we see that the exact solution is
$$E_n = (n + \tfrac12)\hbar\omega' = (n + \tfrac12)\hbar\omega\left(1 + \frac{\lambda}{m\omega^2} - \frac12\Bigl(\frac{\lambda}{m\omega^2}\Bigr)^2 + \dots\right)$$
in agreement with the perturbative calculation.
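The same comparison can be made numerically by diagonalising the perturbed Hamiltonian in a truncated oscillator basis; the sketch below (with $\hbar = m = \omega = 1$, so $x_0 = 1$; the basis size and value of $\lambda$ are arbitrary choices) checks the lowest few levels:

```python
import numpy as np

N, lam = 60, 0.05
n = np.arange(N)
a = np.diag(np.sqrt(n[1:]), 1)          # annihilation operator in the n-basis
x = (a + a.T) / np.sqrt(2)              # x = (a + a†)/√2, since x0 = 1
E = np.linalg.eigvalsh(np.diag(n + 0.5) + lam * x @ x)
for k in range(3):
    exact = (k + 0.5) * np.sqrt(1 + 2 * lam)        # (n + 1/2) ħω'
    pt2 = (k + 0.5) * (1 + lam - 0.5 * lam**2)      # through second order
    print(f"n={k}: diagonalised {E[k]:.6f}, exact {exact:.6f}, PT {pt2:.6f}")
```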

4.2 Degenerate perturbation theory


Shankar 17.3; Mandl 7.3; Griffiths 6.6

Summary: If there are degeneracies, we need to first find a set of eigenstates of the unperturbed Hamiltonian which are not mixed by the perturbation. Then we can proceed as above.

None of the formalism that we have developed so far works “out of the box” if $\hat H^{(0)}$ has degenerate eigenstates. To be precise, it is still fine for the non-degenerate states, but it fails to work in a subspace of degenerate states if $\hat H^{(1)}$ is not also diagonal in this subspace. We can see that in Eq. (4.2), where the vanishing energy denominator clearly signals a problem, but even Eq. (4.1) is wrong. The reason is simple: we assumed from the start that the shifts in the states due to the perturbation would be small. But suppose $|1^{(0)}\rangle$ and $|2^{(0)}\rangle$ are degenerate eigenstates of $\hat H^{(0)}$; then so are $\sqrt{\tfrac12}\bigl(|1^{(0)}\rangle \pm |2^{(0)}\rangle\bigr)$. Now the eigenstates of the full Hamiltonian $|1\rangle$ and $|2\rangle$ are not degenerate—but which of the possible choices for the eigenstate of $\hat H^{(0)}$ are they close to? If for example it is the latter (as is often the case in simple examples) then even a tiny perturbation $\hat H^{(1)}$ will induce a big change in the eigenstates.
Consider a potential in two dimensions which is symmetric: $V(x, y) = V_0(r)$. The ground state will be non-degenerate, but all higher states, of energy $E_n$, will be doubly degenerate; a possible but not unique choice has wave functions $\psi_n(r)\sin(n\phi)$ and $\psi_n(r)\cos(n\phi)$ (where $\phi$ is the angle in the plane from the $x$-axis); the form of $\psi_n(r)$ will of course depend on $V_0(r)$. For $n = 1$, the probability-density maps of these two states look like the last two figures of the top line of figure 4.1. Now imagine a perturbation which is not circularly symmetric, say $\lambda x^2$. The potential now rises more steeply along the $x$-axis than along the $y$-axis (the top line of figure 4.1 shows a contour map of the—somewhat exaggeratedly—deformed potential). This will clearly lift the degeneracy, because the first state, which vanishes along the $x$-axis, will “feel” the perturbation less and have a lower energy than the second. The new energy eigenstates will be similar to the original ones, and the first-order energy shifts will indeed be given by the usual formula: a naïve application of perturbation theory is fine.
But what if the perturbation is $\frac12\lambda(x + y)^2$? (See the bottom line of figure 4.1 for the new contour map.) This is symmetric between $x$ and $y$, $\cos\phi$ and $\sin\phi$: do we conclude that the states remain degenerate? No: this is really the same problem as before with a more and a less steep direction, the only difference being that the orientation is rotated by 45°. And so there will again be a pair of solutions split in energy according to whether the probability of being found along the 45° line is larger or smaller. We expect them to look like those shown in the bottom line of figure 4.1. But these are not close to our original pairs of solutions; we can't get from one to the other by perturbation theory.

Figure 4.1: Far left: contour plot of the unperturbed potential. Left: contour plots of the perturbed potentials. Right and far right: density plots of $|\psi|^2$ for the lower- and higher-energy eigenstates with $n = 1$, for each perturbed potential. See text for explanation.

Does that mean that perturbation theory is useless in this case? Surely not: effectively all we have done between the two cases is rotate the
axes by 45°! The resolution comes from recognising that our initial choice of eigenstates of $V_0$ was not unique, and with hindsight was not the appropriate starting point for this perturbation. In fact the circular symmetry of the original potential means that for any angle $\alpha$, the alternative pair of wave functions $\psi_n(r)\cos\bigl(n(\phi - \alpha)\bigr)$ and $\psi_n(r)\sin\bigl(n(\phi - \alpha)\bigr)$ is an equally good choice of orthogonal unperturbed eigenstates. (We can rotate the $x$-axis at will.) For the perturbation $\frac12\lambda(x + y)^2$, the choice of $\alpha = 45°$ is the right one to ensure that the perturbed solutions are close to the unperturbed ones, and the usual formulae can be used if we choose that basis.
To sum up: the perturbations broke the symmetry of the original problem and hence took away the original freedom we had to choose our basis. If we chose inappropriately initially, we need to choose again once we know the form of the perturbation we are dealing with.
Sometimes the right choice of unperturbed eigenstates is immediately obvious, as in the case above, or can be deduced by considerations of symmetry. If not, then starting with our initial choice $\{|n^{(0)}\rangle\}$, we need to find a new set of states in the degenerate space, linear combinations of our initial choice, which we will call $\{|n'^{(0)}\rangle\}$, and which crucially are not mixed by the perturbation; in other words,
$$\langle m'^{(0)}|\hat H^{(1)}|n'^{(0)}\rangle = 0 \quad\text{if } E_m^{(0)} = E_n^{(0)} \text{ for } m\neq n.$$
In the new basis $\hat H^{(1)}$ is diagonal in the degenerate subspace. That will ensure that all state and energy shifts are small, as assumed in the set-up of perturbation theory.
Thus we write down the matrix which is the representation of $\hat H^{(1)}$ in the degenerate subspace of the originally-chosen basis, and find the eigenvectors. These, being linear combinations of the old basis states, are still eigenstates of $\hat H^{(0)}$ and so are equally good new basis states. But because they are eigenstates of $\hat H^{(1)}$ in this restricted space, they are not mixed by $\hat H^{(1)}$. In the process we will probably find the eigenvalues too, and these are the corresponding first-order energy shifts.
We then proceed almost exactly as in the non-degenerate case, having replaced (say) $|1^{(0)}\rangle$ and $|2^{(0)}\rangle$ with the new linear combinations which we can call $|1'^{(0)}\rangle$ and $|2'^{(0)}\rangle$. The expressions for the energy and state shifts, using the new basis, are as before, Eqs. (4.1,4.2,4.3,4.4), except instead of summing over all states $m\neq n$, we sum over all states for which $E_m^{(0)} \neq E_n^{(0)}$. As already mentioned, the first-order energy shifts of the originally-degenerate states, $\langle n'^{(0)}|\hat H^{(1)}|n'^{(0)}\rangle$, are just the eigenvalues of the representation of $\hat H^{(1)}$ in the degenerate subspace.
For example, suppose $\hat H^{(0)}$ has many eigenstates but two, $|1^{(0)}\rangle$ and $|2^{(0)}\rangle$, are degenerate, and that $\hat H^{(1)}|1^{(0)}\rangle = \beta|2^{(0)}\rangle + \dots$ and $\hat H^{(1)}|2^{(0)}\rangle = \beta|1^{(0)}\rangle + \dots$, with $\beta$ real and $\dots$ referring to $|3^{(0)}\rangle$ and higher. Then in the degenerate subspace
$$\hat H^{(1)} \longrightarrow \begin{pmatrix}\langle1^{(0)}|\hat H^{(1)}|1^{(0)}\rangle & \langle1^{(0)}|\hat H^{(1)}|2^{(0)}\rangle\\ \langle2^{(0)}|\hat H^{(1)}|1^{(0)}\rangle & \langle2^{(0)}|\hat H^{(1)}|2^{(0)}\rangle\end{pmatrix} = \beta\begin{pmatrix}0&1\\1&0\end{pmatrix}$$
whose eigenvectors are $\sqrt{\tfrac12}\begin{pmatrix}1\\-1\end{pmatrix}$ and $\sqrt{\tfrac12}\begin{pmatrix}1\\1\end{pmatrix}$, with eigenvalues $\mp\beta$. With these, we define new eigenstates:
$$|1'^{(0)}\rangle = \sqrt{\tfrac12}\bigl(|1^{(0)}\rangle - |2^{(0)}\rangle\bigr),\qquad |2'^{(0)}\rangle = \sqrt{\tfrac12}\bigl(|1^{(0)}\rangle + |2^{(0)}\rangle\bigr).$$
Now that we have the appropriate basis, we can apply Eq. (4.1) to obtain
$$E_{1'}^{(1)} = \langle1'^{(0)}|\hat H^{(1)}|1'^{(0)}\rangle = -\beta,\qquad E_{2'}^{(1)} = \langle2'^{(0)}|\hat H^{(1)}|2'^{(0)}\rangle = \beta.$$
The expressions for the first-order state changes $|n'^{(1)}\rangle$ and second-order energy changes $E_{n'}^{(2)}$ are just given by Eqs. (4.3,4.4) but with primed states where appropriate; in these the higher states will of course enter. However since $\langle2'^{(0)}|\hat H^{(1)}|1'^{(0)}\rangle = 0$ by construction, $|2'^{(0)}\rangle$ does not appear in the sum over states and there is no problem with vanishing denominators.
We should note that, unless the perturbation does not mix the degenerate states with the higher states, we have not solved the problem exactly. At this stage we have just found the correct approximate eigenstates and the first-order energy shifts. Of course that is often all we want.

4.2.1 Example of degenerate perturbation theory


If we have a two-state problem, then the work done to diagonalise $\hat H^{(1)}$ is equivalent to finding the exact solution of the problem. So the simplest non-trivial example requires three states, of which two are degenerate. (If all three are degenerate and mixed by the perturbation, we end up solving the problem exactly.)
Suppose we have a three-state basis and an $\hat H^{(0)}$ whose eigenstates, $|1^{(0)}\rangle$, $|2^{(0)}\rangle$ and $|3^{(0)}\rangle$, have energies $E_1^{(0)}$, $E_2^{(0)}$ and $E_3^{(0)}$ (all initially assumed to be different). A representation of this system is
$$|1^{(0)}\rangle \longrightarrow \begin{pmatrix}1\\0\\0\end{pmatrix},\quad |2^{(0)}\rangle \longrightarrow \begin{pmatrix}0\\1\\0\end{pmatrix},\quad |3^{(0)}\rangle \longrightarrow \begin{pmatrix}0\\0\\1\end{pmatrix},\quad \hat H^{(0)} \longrightarrow \begin{pmatrix}E_1^{(0)}&0&0\\0&E_2^{(0)}&0\\0&0&E_3^{(0)}\end{pmatrix}.$$
To this, we add the perturbation
$$\hat H^{(1)} \longrightarrow a\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}.$$
First, let's consider a non-degenerate case, with $E_1^{(0)} = E_0$, $E_2^{(0)} = 2E_0$ and $E_3^{(0)} = 3E_0$. Then we can show that, to first order in $a$,
$$E_1^{(1)} = E_2^{(1)} = E_3^{(1)} = a,$$
$$|1^{(1)}\rangle = -\frac{a}{E_0}|2^{(0)}\rangle - \frac{a}{2E_0}|3^{(0)}\rangle \longrightarrow \frac{a}{2E_0}\begin{pmatrix}0\\-2\\-1\end{pmatrix},\qquad |2^{(1)}\rangle = \frac{a}{E_0}|1^{(0)}\rangle - \frac{a}{E_0}|3^{(0)}\rangle \longrightarrow \frac{a}{E_0}\begin{pmatrix}1\\0\\-1\end{pmatrix},$$
$$|3^{(1)}\rangle = \frac{a}{2E_0}|1^{(0)}\rangle + \frac{a}{E_0}|2^{(0)}\rangle \longrightarrow \frac{a}{2E_0}\begin{pmatrix}1\\2\\0\end{pmatrix},\qquad E_1^{(2)} = -\frac{3a^2}{2E_0},\quad E_2^{(2)} = 0,\quad E_3^{(2)} = \frac{3a^2}{2E_0}.$$
Note that all of these terms are just the changes in the energies and states, which have to be added to the zeroth-order ones to get expressions which are complete to the given order.
In this case the exact eigenvalues of $\hat H^{(0)} + \hat H^{(1)}$ can only be found numerically. The left-hand plot below shows the energies as a function of $a$, both in units of $E_0$, with the dashed lines being the expansion to second order; the right-hand plot shows the partially degenerate case discussed below.

[Plots: exact energies (solid) and second-order expansions (dashed) against $a$, for the non-degenerate (left) and partially degenerate (right) cases.]
Now we consider a case with two degenerate states, with $E_1^{(0)} = E_2^{(0)} = E_0$ and $E_3^{(0)} = 2E_0$. We note that $|1^{(0)}\rangle$ and $|2^{(0)}\rangle$ are just two of an infinite set of eigenstates with the same energy $E_1^{(0)}$, since any linear combination of them is another eigenstate. We have to make the choice which diagonalises $\hat H^{(1)}$ in this subspace: in this subspace
$$\hat H^{(1)} \longrightarrow a\begin{pmatrix}1&1\\1&1\end{pmatrix}$$
whose eigenstates are $\sqrt{\tfrac12}\begin{pmatrix}1\\-1\end{pmatrix}$ and $\sqrt{\tfrac12}\begin{pmatrix}1\\1\end{pmatrix}$, with eigenvalues 0 and $2a$. So
$$|1'^{(0)}\rangle = \frac{1}{\sqrt2}\bigl(|1^{(0)}\rangle - |2^{(0)}\rangle\bigr) \qquad\text{and}\qquad |2'^{(0)}\rangle = \frac{1}{\sqrt2}\bigl(|1^{(0)}\rangle + |2^{(0)}\rangle\bigr).$$
These new states don't diagonalise the full $\hat H^{(1)}$, of course. To go further we need the matrix elements $\langle3^{(0)}|\hat H^{(1)}|1'^{(0)}\rangle = 0$ and $\langle3^{(0)}|\hat H^{(1)}|2'^{(0)}\rangle = \sqrt2\,a$. Then
$$E_{1'}^{(1)} = 0,\qquad E_{2'}^{(1)} = 2a,\qquad E_3^{(1)} = a,$$
$$|1'^{(1)}\rangle = 0,\qquad |2'^{(1)}\rangle = -\frac{\sqrt2\,a}{E_0}|3^{(0)}\rangle \longrightarrow -\frac{\sqrt2\,a}{E_0}\begin{pmatrix}0\\0\\1\end{pmatrix},\qquad |3^{(1)}\rangle = \frac{\sqrt2\,a}{E_0}|2'^{(0)}\rangle \longrightarrow \frac{a}{E_0}\begin{pmatrix}1\\1\\0\end{pmatrix},$$
$$E_{1'}^{(2)} = 0,\qquad E_{2'}^{(2)} = -\frac{2a^2}{E_0},\qquad E_3^{(2)} = \frac{2a^2}{E_0}.$$
In this particular case it is easy to show that $|1'^{(0)}\rangle$ is actually an eigenstate of $\hat H^{(1)}$, so there will be no change to any order. We can check our results against the exact eigenvalues and see that they are correct, which is left as an exercise for the reader; for that purpose it is useful to write $\hat H^{(1)}$ in the new basis ($\hat H^{(0)}$ of course being unchanged) as:
$$\hat H^{(1)} \longrightarrow a\begin{pmatrix}0&0&0\\0&2&\sqrt2\\0&\sqrt2&1\end{pmatrix}.$$
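That exercise can also be done numerically; the following sketch (with arbitrary illustrative values of $E_0$ and $a$) compares the exact eigenvalues with the perturbative results through second order:

```python
import numpy as np

E0, a = 1.0, 0.05
H0 = np.diag([E0, E0, 2 * E0])
H1 = a * np.ones((3, 3))
exact = np.linalg.eigvalsh(H0 + H1)
pert = np.sort([E0,                              # |1'>: no shift to any order
                E0 + 2 * a - 2 * a**2 / E0,      # |2'>
                2 * E0 + a + 2 * a**2 / E0])     # |3>
print(np.round(exact, 5))    # [1.      1.09477 2.05523]
print(np.round(pert, 5))     # [1.      1.095   2.055  ]
```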

One final comment: we calculated
$$|3^{(1)}\rangle = \frac{\langle1'^{(0)}|\hat H^{(1)}|3^{(0)}\rangle}{E_0}|1'^{(0)}\rangle + \frac{\langle2'^{(0)}|\hat H^{(1)}|3^{(0)}\rangle}{E_0}|2'^{(0)}\rangle.$$
But we could equally have used the un-diagonalised states $|1^{(0)}\rangle$ and $|2^{(0)}\rangle$. This can be seen if we write
$$|3^{(1)}\rangle = \frac{1}{E_0}\bigl(|1'^{(0)}\rangle\langle1'^{(0)}| + |2'^{(0)}\rangle\langle2'^{(0)}|\bigr)\hat H^{(1)}|3^{(0)}\rangle$$
and spot that the term in brackets is the identity operator in the degenerate subspace, which can equally well be written $\bigl(|1^{(0)}\rangle\langle1^{(0)}| + |2^{(0)}\rangle\langle2^{(0)}|\bigr)$. Of course for a problem in higher dimensions, there would be other terms coming from the non-degenerate states $|m^{(0)}\rangle$ as well.

4.2.2 Symmetry as a guide to the choice of basis


Recall that in the majority of cases, degeneracy arises from one or more symmetries, which show
themselves as operators which commute with the Hamiltonian. In the case considered in the
discussion of figure 4.1, the original Hamiltonian commuted with L̂z while the perturbations did
not. In a more complicated case the perturbation may break some but not all of the symmetries:
in 3D for instance full rotational invariance may be broken, but symmetry about the z-axis
might remain, and so the full Hamiltonian will commute with L̂z . Then the perturbation
will not mix initially-degenerate states of the same l but different m, and the usual choice of
spherical harmonics for the wave functions will enable normal, non-degenerate perturbation
theory to be used for the first-order energy shift. We say that m remains a good quantum
number, since L̂z commutes with the full Hamiltonian.
More generally, if eigenstates of $\hat H^{(0)}$ are also eigenstates of some operator $\hat\Omega$, we distinguish the degenerate states with energy $E_n^{(0)}$ by the eigenvalues of that operator: $\{|n, \omega_i\rangle\}$. If $[\hat H^{(1)}, \hat\Omega] = 0$, then from $\langle n, \omega_j|[\hat H^{(1)}, \hat\Omega]|n, \omega_i\rangle = 0$ we immediately have $\langle n, \omega_j|\hat H^{(1)}|n, \omega_i\rangle = 0$ if $\omega_j \neq \omega_i$. $\hat H^{(1)}$ is diagonal in the $E_n^{(0)}$-subspace with this choice of basis; it does not mix the degenerate states.
We are about to move on to perturbations in the hydrogen atom, where $\hat H^{(0)}$ commutes with both $\hat{\mathbf L}$ and $\hat{\mathbf S}$, and we have a choice of quantum numbers to classify the states: our basis can be $\{|l, m_l; s, m_s\rangle\}$ or $\{|l, s; j, m_j\rangle\}$. If $\hat H^{(1)}$ fails to commute with $\hat{\mathbf L}$ or $\hat{\mathbf S}$, while still commuting with $\hat L^2$, $\hat S^2$ and $\hat{\mathbf J}$, then we avoid all problems by simply choosing the second basis from the start.
4.3 The fine structure of hydrogen
Shankar 17.3; Mandl 7.4; Griffiths 6.3; Gasiorowicz 12.1,2,4

Summary: The Schrödinger equation with a pure Coulomb potential does an excellent job of describing hydrogen energy levels, but even 100 years ago spectroscopy could detect deviations from that picture. But first-order perturbation theory is good enough for almost all purposes, since the small parameter is $\alpha_{\mathrm{EM}}^2 \sim 10^{-4}$.

4.3.1 Pure Coulomb potential and nomenclature


Although the Schrödinger equation with a Coulomb potential reproduces the Bohr model and gives an excellent approximation to the energy levels of hydrogen, the true spectrum was known to be more complicated right from the start. The small deviations are termed “fine structure” and they are of order $10^{-4}$ compared with the ground-state energy (though the equivalent terms for many-electron atoms can be sizable). Hence perturbation theory is an excellent framework in which to consider them.
First a reminder of the results of the unperturbed calculation. The Coulomb potential is $V(r) = -\hbar c\alpha/r$ (written in terms of the dimensionless $\alpha = e^2/(4\pi\epsilon_0\hbar c) \approx 1/137$); the energies turn out to depend only on the principal quantum number $n$ and not on $l$: $E_n = -\frac{1}{n^2}E_{\mathrm{Ry}}$, where $E_{\mathrm{Ry}} = \frac12\alpha^2mc^2 = 13.6$ eV (with $m$ the reduced mass of the electron-proton system). For a given $n$ all values of $l < n$ are allowed, giving $\sum_{l=0}^{n-1}(2l + 1) = n^2$ degenerate states. The wave function of the ground state is proportional to $e^{-r/a_0}$, where $a_0$ is the Bohr radius, $a_0 = \hbar c/(mc^2\alpha)$. Results for other hydrogen-like single-electron atoms (with nuclear charge $Z$) can be obtained by replacing $\alpha$ with $Z\alpha$ and $m$ with the appropriate reduced mass. Lists of wave functions are given in A.6.
The states of the system are tensor-direct-product states of spatial and spin states, written $|n, l, m_l\rangle \otimes |\frac12, m_s\rangle$ or $|n, l, m_l, m_s\rangle$ ($s = \frac12$ is usually suppressed). Since the Hamiltonian has no spin dependence, the spatial state is just the one discussed previously. The alternative basis $|n, l; j, m_j\rangle$ is often used for reasons which will become clear; subshells of states of a given $\{l, s, j\}$ are referred to using the notation $^{(2s+1)}l_j$, with $l = s, p, d, f\dots$, so for example $^2f_{5/2}$ (or just $f_{5/2}$) has $s = \frac12$ of course, $l = 3$ and $j = l - s = \frac52$. An example of such a state is
$$|n, 1; \tfrac12,\tfrac12\rangle = \sqrt{\tfrac23}\,|n, 1, 1\rangle \otimes |\tfrac12, -\tfrac12\rangle - \sqrt{\tfrac13}\,|n, 1, 0\rangle \otimes |\tfrac12, \tfrac12\rangle.$$

(Compared to the section on addition of orbital and spin angular momenta, the only difference
is that in calculating matrix elements of the states |n, l, ml i, there is a radial integral to be done
as well as the angular one.) With the pure Coulomb potential, we can use whichever basis we
like.

4.3.2 Fine structure: the lifting of l degeneracy


There are two effects to be considered. One arises from the use of the non-relativistic expression $p^2/2m$ for the kinetic energy, which is only the first term in an expansion of $\sqrt{(mc^2)^2 + (pc)^2} - mc^2$. The first correction term is $-p^4/(8m^3c^2)$, and its matrix elements are most easily calculated using the trick of writing it as $-\frac{1}{2mc^2}(\hat H^{(0)} - V_C(r))^2$, where $\hat H^{(0)}$ is the usual Hamiltonian with a Coulomb potential. Now in principle we need to be careful here, because $\hat H^{(0)}$ is highly degenerate (energies depend only on $n$ and not on $l$ or $m$). However we have $\langle n, l', m_l'|(\hat H^{(0)} - V_C(r))^2|n, l, m_l\rangle = \langle n, l', m_l'|(E_n^{(0)} - V_C(r))^2|n, l, m_l\rangle$, and since in this form the operator is spherically symmetric and spin-independent, it can't link states of different $l$, $m_l$ or $m_s$. So the basis $\{|n, l, m_l, m_s\rangle\}$ already diagonalises $\hat H^{(1)}$ in each subspace of states with the same $n$, and we have no extra work to do here. (We are omitting the superscript $(0)$ on the hydrogenic states, here and below.)
The final result for the kinetic energy effect is
$$\langle n, l, m_l|\hat H^{(1)}_{\mathrm{KE}}|n, l, m_l\rangle = -\frac{1}{2mc^2}\left((E_n^{(0)})^2 + 2E_n^{(0)}\hbar c\alpha\,\langle n, l|\frac1r|n, l\rangle + (\hbar c\alpha)^2\langle n, l|\frac1{r^2}|n, l\rangle\right) = -\frac{\alpha^2|E_n^{(0)}|}{n}\left(\frac{2}{2l+1} - \frac{3}{4n}\right).$$
In calculating this the relation $E_{\mathrm{Ry}} = \hbar c\alpha/(2a_0)$ is useful. The matrix elements involve radial integrals only; tricks for calculating these are explained in Shankar qu. 17.3.4; they are tabulated in A.6. Details of the algebra for this and the following calculation are given here.
The second correction is the spin-orbit interaction:
$$\hat H^{(1)}_{\mathrm{SO}} = \frac{1}{2m^2c^2}\,\frac1r\frac{dV_C}{dr}\,\hat{\mathbf L}\cdot\hat{\mathbf S}.$$
In this expression L̂ and Ŝ are the vector operators for orbital and spin angular momentum
respectively. The usual (somewhat hand-waving) derivation talks of the electron seeing a mag-
netic field from the proton which appears to orbit it; the magnetic moment of the electron then
prefers to be aligned with this field. A slightly more respectable derivation uses the fact that
an EM field which is purely electric in one frame (the lab frame) has a magnetic component
when seen from a moving frame; this at least makes it clear that the spin-orbit interaction is
also a relativistic effect. Both “derivations” give an expression which is too large by a factor of
2; an exact result requires the use of the Dirac equation.
This time we will run into trouble with the degeneracy of $\hat H^{(0)}$ unless we do some work first. Since the Coulomb potential is spherically symmetric, there is no mixing of states of the same $n$ but different $l$. However states of different $\{m_l, m_s\}$ will mix, since $\hat{\mathbf L}\cdot\hat{\mathbf S}$ does not commute with $\hat L_z$ and $\hat S_z$. The trick of writing $2\hat{\mathbf L}\cdot\hat{\mathbf S} = \hat J^2 - \hat L^2 - \hat S^2$, where $\hat{\mathbf J} = \hat{\mathbf L} + \hat{\mathbf S}$, tells us that $\hat{\mathbf L}\cdot\hat{\mathbf S}$ does commute with $\hat J^2$ and $\hat J_z$, so we should work in the basis $|n, l; j, m_j\rangle$ instead. (The label $s = \frac12$ is suppressed.) This basis diagonalises the spin-orbit perturbation (and is an equally acceptable basis, giving the same result, for the relativistic correction term above).
Then
$$\langle n, l; j, m_j|\hat H^{(1)}_{\mathrm{SO}}|n, l; j, m_j\rangle = \frac{1}{2m^2c^2}\,\langle n, l|\frac1r\frac{dV_C}{dr}|n, l\rangle\;\langle l; j, m_j|\tfrac12\bigl(\hat J^2 - \hat L^2 - \hat S^2\bigr)|l; j, m_j\rangle$$
$$= \frac{\hbar c\alpha}{4m^2c^2}\,\langle n, l|\frac1{r^3}|n, l\rangle\;\hbar^2\bigl(j(j+1) - l(l+1) - \tfrac34\bigr) = \frac{\alpha^2|E_n^{(0)}|}{n}\left(\frac{2}{2l+1} - \frac{2}{2j+1}\right),$$
where in the first line we have separated the radial integral from the angular momentum matrix elements, and where a fair amount of algebra links the last two lines. (This expression is only correct for $l \neq 0$. However there is another separate effect, the Darwin term, which only affects s-waves and whose expectation value is just the same as above (with $l = 0$ and $j = \frac12$), so we can use this for all $l$. The Darwin term can only be understood in the context of the Dirac equation.)
So finally
$$E_{nj}^{(1)} = \frac{\alpha^2|E_n^{(0)}|}{n}\left(\frac{3}{4n} - \frac{2}{2j+1}\right).$$
The degeneracy of all states with the same $n$ has been broken. States of given $j$ with $l = j \pm \frac12$ are still degenerate, a result that persists to all orders in the Dirac equation (where in any case orbital angular momentum is no longer a good quantum number). So the eight $n = 2$ states are split by $4.5\times10^{-5}$ eV, with the $^2p_{3/2}$ state lying higher than the degenerate $^2p_{1/2}$ and $^2s_{1/2}$ states.
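The quoted splitting follows directly from this formula; a quick numerical sketch (with approximate values of $\alpha$ and $E_{\mathrm{Ry}}$ plugged in):

```python
alpha, Ry = 1 / 137.036, 13.606   # fine-structure constant; Rydberg in eV

def E_fs(n, j):
    # First-order fine-structure shift E_nj^(1), in eV.
    En = Ry / n**2                # |E_n^(0)|
    return alpha**2 * En / n * (3 / (4 * n) - 2 / (2 * j + 1))

print(f"{E_fs(2, 1.5) - E_fs(2, 0.5):.2e} eV")   # 2p3/2 above 2p1/2: ~4.5e-05 eV
```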
For a brief comment on spin-orbit splitting in light atoms other than hydrogen, see 2.5.3.
Two other effects should be mentioned here. One is the hyperfine splitting. The proton has a
magnetic moment, and the energy of the atom depends on whether the electron spin is aligned
with it or not— more precisely, whether the total spin of the electron and proton is 0 or 1.
The anti-aligned case has lower energy (since the charges are opposite), and the splitting for
the 1s state is 5.9 × 10−6 eV. (It is around a factor of 10 smaller for any of the n = 2 states.)
Transitions between the two hyperfine states of 1s hydrogen give rise to the 21 cm microwave
radiation which is a signal of cold hydrogen gas in the galaxy and beyond.
The final effect is called the Lamb shift. It cannot be accounted for in quantum mechanics, but
only in quantum field theory.

The diagrams above show corrections to the simple Coulomb force, which would be represented by the exchange of a single photon between the proton and the electron. The most notable effect on the spectrum of hydrogen is to lift the remaining degeneracy between the $^2p_{1/2}$ and $^2s_{1/2}$ states, so that the latter is higher by $4.4\times10^{-6}$ eV.
Below the various corrections to the energy levels of hydrogen are shown schematically. The
gap between the n = 1 and n = 2 shells is suppressed, and the Lamb and hyperfine shifts are
exaggerated in comparison with the fine-structure. The effect of the last two on the 2 p3/2 level
is not shown.
4.4 The Zeeman effect: hydrogen in an external magnetic field
(Shankar 14.5); Mandl 7.5; Griffiths 6.4; Gasiorowicz 12.3

Summary: This is an interesting application of degenerate perturbation theory because the appropriate unperturbed basis depends on the strength of the external field compared to the effective internal one which contributes to fine structure.

(Since we will not ignore spin, this whole section is about the so-called anomalous Zeeman
effect. The so-called normal Zeeman effect cannot occur for hydrogen, but is a special case
which pertains in certain multi-electron atoms for which the total spin is zero.)
With an external magnetic field along the $z$-axis, the perturbing Hamiltonian is $\hat H^{(1)} = -\boldsymbol\mu\cdot\mathbf B = (\mu_B B/\hbar)(\hat L_z + 2\hat S_z)$. The factor of 2 multiplying the spin is of course the famous g-factor for spin, as predicted by the Dirac equation. Clearly this is diagonalised in the $\{|n, l, m_l, m_s\rangle\}$ basis ($s = \frac12$ suppressed in the labelling as usual). Then $E^{(1)}_{nlm_lm_s} = \mu_B B(m_l + 2m_s)$. If, for example, $l = 2$ there are 7 possible values of $m_l + 2m_s$ between $-3$ and 3, with $-1$, 0 and 1 being degenerate ($5\times2 = 10$ states in all).
This is the “strong-field Zeeman” or “Paschen-Back” effect. It is applicable if the magnetic field is strong enough that we can ignore the fine structure discussed in the last section. In hydrogen, that means $B \gg 10^{-4}\,\mathrm{eV}/\mu_B \sim 2$ T. But for a weak field, $B \ll 2$ T, the fine structure effects will be stronger, so we will consider them part of $\hat H^{(0)}$ for the Zeeman problem; our basis is then $\{|n, l; j, m_j\rangle\}$ and states of the same $j$ but different $l$ and $m_j$ are degenerate. This degeneracy however is not a problem, because the operator $(\hat L_z + 2\hat S_z)$ does not connect states of different $l$ or $m_j$ (since it commutes with $\hat L^2$ and with $\hat J_z$). So we can use non-degenerate perturbation theory, with
$$E^{(1)}_{nljm_j} = \frac{\mu_B B}{\hbar}\langle n, l; j, m_j|\hat L_z + 2\hat S_z|n, l; j, m_j\rangle = \mu_B Bm_j + \frac{\mu_B B}{\hbar}\langle n, l; j, m_j|\hat S_z|n, l; j, m_j\rangle.$$
If $\hat J_z$ is conserved but $\hat L_z$ and $\hat S_z$ are not, the expectation values of the latter two might be expected to be proportional to the first, modified by the average degree of alignment: $\langle\hat S_z\rangle = \hbar m_j\langle\hat{\mathbf S}\cdot\hat{\mathbf J}\rangle/\langle\hat J^2\rangle$. (This falls short of a proof but is in fact correct, and follows from the Wigner-Eckart theorem as explained in section 2.6. A similar expression holds for $\hat L_z$. A semi-classical derivation may be found here.) Using $2\hat{\mathbf S}\cdot\hat{\mathbf J} = \hat S^2 + \hat J^2 - \hat L^2$ gives
$$E^{(1)}_{nljm_j} = \mu_B Bm_j\left(1 + \frac{j(j+1) - l(l+1) + s(s+1)}{2j(j+1)}\right) = \mu_B Bm_j\,g_{jls}.$$

Of course for hydrogen $s(s+1) = \frac34$, but the expression above, which defines the Landé g-factor, is actually more general and hence I've left it with an explicit $s$. For hydrogen, $j = l \pm \frac12$ and so $g = 1 \pm \frac{1}{2l+1}$.
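As a sketch, the Landé factor is one line of code, and reproduces the hydrogen values just quoted:

```python
def g_lande(j, l, s):
    return 1 + (j * (j + 1) - l * (l + 1) + s * (s + 1)) / (2 * j * (j + 1))

# Hydrogen p states (l = 1, s = 1/2): g = 1 ± 1/(2l+1), i.e. 4/3 and 2/3.
print(g_lande(1.5, 1, 0.5), g_lande(0.5, 1, 0.5))
```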
Thus states of a given j (already no longer degenerate due to fine-structure effects) are further
split into (2j + 1) equally-spaced levels. Since spectroscopy involves observing transitions
between two states, both split but by different amounts, the number of spectral lines can be
quite large.
For heavier (but not too heavy) atoms the fine structure splitting of levels is much greater than
in hydrogen, and the weak-field Zeeman effect is the norm. The expression above can be used
for the Landé g-factor, using the quantum numbers J, L and S for the atom as a whole.
For protons and neutrons the complex internal dynamics of the quarks leads to g-factors,
and hence magnetic moments, which are quite different from 2 (for the proton) and 0 (for the
uncharged neutron); see comments in 2.5.3. The magnetic moments of nuclei with one unpaired
nucleon in an orbital of specified l and j can then be estimated analogously to the procedure
above. (These estimates give the “Schmidt limits” which will be discussed in PHYS40302.)
4.5 The Stark effect: hydrogen in an external electric field
Shankar 17.2,3; Gasiorowicz 11.3; (Griffiths problems 6.35,36)

Summary: In this case we have to get to grips with degenerate and beyond-first-order perturbation theory.

In this section we consider the energy shifts of the levels of hydrogen in an external electric field, taken to be along the $z$-axis: $\mathbf E = \mathcal E\,\mathbf e_z$ (we use $\mathcal E$ for the electric field strength to distinguish it from the energy). We will work in the strong-field limit and ignore fine structure; furthermore the dynamics are then independent of the spin so we can ignore $m_s$; the unperturbed eigenstates can be taken to be $|n, l, m_l\rangle$.
The perturbing Hamiltonian is $\hat H^{(1)} = |e|\mathcal E z$. (In this section we will not write $\hat z$ for fear of confusion with a unit vector.) Now it is immediately obvious that, for any state, $\langle n, l, m_l|z|n, l, m_l\rangle = 0$: the probability density is symmetric on reflection in the $xy$-plane, but $z$ is antisymmetric. So for the ground state, the first-order energy shift vanishes. (We will return to excited states, but think now about why we can't conclude the same for them.) This is not surprising, because an atom of hydrogen in its ground state has no electric dipole moment: there is no $\mathbf p\cdot\mathbf E$ term to match the $\boldsymbol\mu\cdot\mathbf B$ one.
To calculate the second-order energy shift we need $\langle n, l, m_l|z|1, 0, 0\rangle$. We can write $z$ as $r\cos\theta$ or $\sqrt{4\pi/3}\,r\,Y_1^0(\theta, \phi)$. The lack of dependence on $\phi$ means that $m_l$ can't change, and in addition $l$ can only change by one unit, so $\langle n, l, m|z|1, 0, 0\rangle = \delta_{l1}\delta_{m0}\langle n, 1, 0|z|1, 0, 0\rangle$. However this isn't the whole story: there are also states in the continuum, which we will denote $|k\rangle$ (though these are not plane waves, since they see the Coulomb potential). So we have
$$E_{100}^{(2)} = (e\mathcal E)^2\sum_{n>1}\frac{|\langle n, 1, 0|z|1, 0, 0\rangle|^2}{E_1^{(0)} - E_n^{(0)}} + (e\mathcal E)^2\int d^3k\,\frac{|\langle k|z|1, 0, 0\rangle|^2}{E_1^{(0)} - E_k^{(0)}}.$$

(We use $E_1$ for $E_{100}$.) This is a compact expression, but it would be very hard to evaluate directly. We can get a crude estimate of the size of the effect by simply replacing all the denominators by $E_1^{(0)} - E_2^{(0)}$; this overestimates the magnitude of every term but the first, for which it is exact, so it will give an upper bound on the magnitude of the shift. Then (recalling $E_1^{(2)} < 0$),
$$E_1^{(2)} > \frac{(e\mathcal E)^2}{E_1^{(0)} - E_2^{(0)}}\left(\sum_{n\geq1}\sum_{l, m_l}\langle1, 0, 0|z|n, l, m_l\rangle\langle n, l, m_l|z|1, 0, 0\rangle + \int d^3k\,\langle1, 0, 0|z|k\rangle\langle k|z|1, 0, 0\rangle\right)$$
$$= \frac{(e\mathcal E)^2}{E_1^{(0)} - E_2^{(0)}}\,\langle1, 0, 0|z\left(\sum_{n\geq1}\sum_{l, m_l}|n, l, m_l\rangle\langle n, l, m_l| + \int d^3k\,|k\rangle\langle k|\right)z|1, 0, 0\rangle$$
$$= \frac{(e\mathcal E)^2}{E_1^{(0)} - E_2^{(0)}}\,\langle1, 0, 0|z^2|1, 0, 0\rangle = -\frac{4(e\mathcal E a_0)^2}{3E_{\mathrm{Ry}}} = -\frac{8(e\mathcal E)^2a_0^3}{3\hbar c\alpha},$$
where we have included $n = 1$ and other values of $l$ and $m$ in the sum because the matrix elements vanish anyway, and then used the completeness relation involving all the states, bound and unbound, of the hydrogen atom. For details of this integral and the one needed for the next part, see here, p2.
There is a trick for evaluating the exact result, which gives $9/4$ rather than $8/3$ as the constant (see Shankar). So our estimate of the magnitude is fairly good. (For comparison with other ways of writing the shift, note that $(e\mathcal E)^2/\hbar c\alpha = 4\pi\epsilon_0\mathcal E^2$—or, in Gaussian units, just $\mathcal E^2$.)
Having argued above that the hydrogen atom has no electric dipole, how come we are getting a finite effect at all? The answer of course is that the field polarises the atom, and the induced dipole can then interact with the field. We have in fact calculated the polarisability of the hydrogen atom.
Now for the first excited state. We can't conclude that the first-order shift vanishes here, of course, because of degeneracy: there are four states and $\hat H^{(1)}$ is not diagonal in the usual basis $|2, l, m_l\rangle$ (with $l = 0, 1$). In fact as we argued above it only connects $|2, 0, 0\rangle$ and $|2, 1, 0\rangle$, so the states $|2, 1, \pm1\rangle$ decouple and their first-order shifts do vanish. Using $\langle2, 1, 0|z|2, 0, 0\rangle = -3a_0$, we have in this subspace (with $|2, 0, 0\rangle = (1, 0)^\top$ and $|2, 1, 0\rangle = (0, 1)^\top$)
$$\hat H^{(1)} = -3a_0|e|\mathcal E\begin{pmatrix}0&1\\1&0\end{pmatrix},$$
and the eigenstates are $\sqrt{1/2}\,\bigl(|2, 0, 0\rangle \pm |2, 1, 0\rangle\bigr)$ with eigenvalues $\mp3a_0|e|\mathcal E$. So the degenerate quartet is split into a triplet of levels (with the unshifted one doubly degenerate).
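The matrix element $\langle2, 1, 0|z|2, 0, 0\rangle = -3a_0$ quoted above is a one-line numerical integral; a sketch (in units $a_0 = 1$; the angular integral of $Y_1^{0*}\cos\theta\,Y_0^0$ contributes a factor $1/\sqrt3$, and $R_{20}$, $R_{21}$ are the standard hydrogen radial functions tabulated in A.6):

```python
import numpy as np
from scipy.integrate import quad

R20 = lambda r: (2 - r) * np.exp(-r / 2) / (2 * np.sqrt(2))
R21 = lambda r: r * np.exp(-r / 2) / (2 * np.sqrt(6))
radial, _ = quad(lambda r: R21(r) * R20(r) * r**3, 0, np.inf)
print(radial / np.sqrt(3))    # -3.0, i.e. -3 a0
```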
In reality the degeneracy of the $n = 2$ states is lifted by the fine-structure splitting; are these results then actually relevant? They will be approximately true if the field is large; at an intermediate strength both fine-structure and Stark effects should be treated together as a perturbation on the pure Coulomb states. For very weak fields degenerate perturbation theory can be applied in the space of $j = \frac12$ states ($^2s_{1/2}$ and $^2p_{1/2}$), which are shifted by $\pm\sqrt3\,a_0|e|\mathcal E$. The $j = \frac32$ states though have no first-order shift.
5. Quantum Measurement

5.1 The Einstein-Podolsky-Rosen “paradox” and Bell's inequalities
Mandl 6.3; Griffiths 12.2; Gasiorowicz 20.3,4

Summary: This section is about nothing less important than “the nature of reality”!

In 1935 Einstein, along with Boris Podolsky and Nathan Rosen, published a paper entitled “Can quantum-mechanical description of physical reality be considered complete?” By this stage Einstein had accepted that the uncertainty principle did place fundamental restrictions on what one could discover about a particle through measurements conducted on it. The question however was whether the measuring process actually somehow brought the properties into being, or whether they existed all along but without our being able to determine what they were. If the latter was the case there would be “hidden variables” (hidden from the experimenter) and the quantum description—the wave function—would not be a complete description of reality. Till the EPR paper came out many people dismissed the question as undecidable, but the EPR paper put it into much sharper focus. Then in 1964 John Bell presented an analysis of a variant of the EPR paper which showed that the question actually was decidable. Many experiments have been done subsequently, and they have come down firmly in favour of a positive answer to the question posed in EPR's title.
The original EPR paper used position and momentum as the two properties which couldn’t be
simultaneously known (but might still have hidden definite values), but subsequent discussions
have used components of spin instead, and we will do the same. But I will be quite lax about
continuing to refer to “the EPR experiment”.
There is nothing counter-intuitive or unclassical about the fact that we can produce a pair of
particles whose total spin is zero, so that if we find one to be spin-up along some axis, the other
must be spin down. All the variants of the experiment to which we will refer can be considered
like this: such a pair of electrons is created travelling back-to-back at one point, and travel to
distant measuring stations where each passes through a Stern-Gerlach apparatus (an “SG”) of
a certain orientation in the plane perpendicular to the electrons’ momentum.
As I say there is nothing odd about the fact that when the two SGs have the same orientation
the two sequences recorded at the two stations are perfectly anti-correlated (up to measurement
errors). But consider the case where they are orientated at 90◦ with respect to each other as
below: Suppose for a particular pair of electrons, we measure number 1 to be spin up in the
z-direction and number 2 to be spin down in the x-direction. Now let’s think about what would
have happened if we had instead measured the spin in the x-direction of particle 1. Surely, say
EPR, we know the answer. Since particle 2 is spin down in the x-direction, particle 1 would

have been spin up. So now we know that before it reached the detector, particle 1 was spin up
in the z-direction (because that’s what we got when we measured it) and also spin up in the
x-direction (because it is anti-correlated with particle 2 which was spin down). We have beaten
the uncertainty principle, if only retrospectively.
But of course we know we can’t construct a wave function with these properties. So is there
more to reality than the wave function? Bell’s contribution was to show that the assumption
that the electron really has definite values for different spin components—if you like, it has
an instruction set which tells it which way to go through any conceivable SG that it might
encounter—leads to testable predictions.
For Bell’s purposes, we imagine that the two measuring stations have agreed that they will set
their SG to one of 3 possible settings. Setting A is along the z-direction, setting C is along
the x direction, and setting B is at 45◦ to both. In the ideal set-up, the setting is chosen just
before the electron arrives, sufficiently late that no possible causal influence (travelling at not
more than the speed of light) can reach the other lab before the measurements are made. The
labs record their results for a stream of electrons, and then get together to classify each pair
as, for instance, (A ↑, B ↓) or (A ↑, C ↑) or (B ↑, B ↓) (the state of electron 1 being given
first). Then they look at the number of pairs with three particular classifications: (A ↑, B ↑),
(B ↑, C ↑) and (A ↑, C ↑). Bell’s inequality says that, if the way the electrons will go through
any given orientation is set in advance,
$$N(A\uparrow, B\uparrow) + N(B\uparrow, C\uparrow) \geq N(A\uparrow, C\uparrow)$$
where $N(A\uparrow, B\uparrow)$ is the number of $(A\uparrow, B\uparrow)$ pairs etc.


Now let’s prove that.
Imagine any set of objects (or people!) with three distinct binary properties $a$, $b$ and $c$—say blue or brown eyes, right or left handed, and male or female (ignoring messy reality in which there are some people not so easily classified). In each case, let us denote the two possible values as $A$ and $\bar A$ etc ($\bar A$ being “not $A$” in the sense it is used in logic, so if $A$ is blue-eyed, $\bar A$ is brown-eyed). Then every object is classified by its values for the three properties as, for instance, $ABC$ or $A\bar BC$ or $\bar A\bar B\bar C\dots$. The various possibilities are shown on a Venn diagram below (sorry that the bars are through rather than over the letters...). In any given collection of objects, there will be no fewer than zero objects in each subset, obviously: all the $N$s are greater than or equal to zero. Now we want to prove that the number of objects which are $A\bar B$ (irrespective of $c$) plus those that are $B\bar C$ (irrespective of $a$) is greater than or equal to the number which are $A\bar C$ (irrespective of $b$):
$$N(A\bar B) + N(B\bar C) \geq N(A\bar C)$$
This is obvious from the diagram below, in which the union of the blue and green sets fully contains the red set.
A logical proof is as follows:
$$N(A\bar B) + N(B\bar C) = N(A\bar BC) + N(A\bar B\bar C) + N(AB\bar C) + N(\bar AB\bar C)$$
$$= N(A\bar BC) + N(A\bar C) + N(\bar AB\bar C) \geq N(A\bar C).$$

To apply this to the spins we started with, we identify $A$ with $A\uparrow$ and $\bar A$ with $A\downarrow$. Now if an electron is $A\uparrow B\downarrow$ (whatever $C$ might be) then its partner must be $A\downarrow B\uparrow$, and so the result of a measurement $A$ on the first and $B$ on the second will be $(A\uparrow, B\uparrow)$. Hence the inequality for the spin case is a special case of the general one. We have proved Bell's inequality assuming, remember, that the electrons really do have these three defined properties even if, for a single electron, we can only measure one of them.
Now let's consider what quantum mechanics would say. A spin-0 state of two identical particles is
$$|S = 0\rangle = \sqrt{\tfrac12}\bigl(|{\uparrow}\rangle\otimes|{\downarrow}\rangle - |{\downarrow}\rangle\otimes|{\uparrow}\rangle\bigr)$$
and this is true whatever the axis we have chosen to define “up” and “down”. As expected, if we choose the same measurement direction at the two stations (eg both A), the first measurement selects one of the two terms and so the second measurement, on the other particle, always gives the opposite result. (Recall this is the meaning of the 2-particle wave function being non-separable or entangled.)
What about different measurement directions at the two stations (eg A and B)? Recall the relation between the spin-up and spin-down states for two directions in the $xz$-plane, where $\theta$ is the angle between the two directions:
$$|\theta, \uparrow\rangle = \cos\tfrac\theta2|0, \uparrow\rangle + \sin\tfrac\theta2|0, \downarrow\rangle, \qquad |0, \uparrow\rangle = \cos\tfrac\theta2|\theta, \uparrow\rangle - \sin\tfrac\theta2|\theta, \downarrow\rangle,$$
$$|\theta, \downarrow\rangle = -\sin\tfrac\theta2|0, \uparrow\rangle + \cos\tfrac\theta2|0, \downarrow\rangle, \qquad |0, \downarrow\rangle = \sin\tfrac\theta2|\theta, \uparrow\rangle + \cos\tfrac\theta2|\theta, \downarrow\rangle.$$
(We previously showed this for the first axis being the $z$-axis, but, up to overall phases, it is true for any pair.) For A and B or for B and C, $\theta = 45°$; for A and C it is $90°$.
Consider randomly-oriented spin-zero pairs and settings A, B and C equally likely. If the first SG is set to A and the second to B (which happens 1 time in 9), there is a probability of 1/2 of getting $A\uparrow$ at the first station. But then we know that the state of the second electron is $|A\downarrow\rangle$ and the probability that we will measure spin-up in the B direction is $|\langle B\uparrow|A\downarrow\rangle|^2 = \sin^2\frac\pi8$. (This probability doesn't change if we start with the second measurement.) Thus the fraction of pairs which are $(A\uparrow, B\uparrow)$ is $\frac12\sin^2 22.5° = 0.073$, and similarly for $(B\uparrow, C\uparrow)$. But the fraction which are $(A\uparrow, C\uparrow)$ is $\frac12\sin^2 45° = 0.25$. So the prediction of quantum mechanics for $9N_0$ measurements is
$$N(A\uparrow, B\uparrow) + N(B\uparrow, C\uparrow) = 0.146N_0 < N(A\uparrow, C\uparrow) = 0.25N_0$$
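The numbers quoted here are quick to reproduce; a minimal sketch:

```python
import numpy as np

# Singlet pairs: fraction of runs giving (X up, Y up) for SG settings
# an angle theta apart is (1/2) sin^2(theta/2).
frac = lambda deg: 0.5 * np.sin(np.radians(deg) / 2) ** 2
lhs = frac(45) + frac(45)        # (A up, B up) + (B up, C up), per N0 pairs
rhs = frac(90)                   # (A up, C up), per N0 pairs
print(f"{lhs:.3f} >= {rhs:.3f}?  {lhs >= rhs}")   # 0.146 >= 0.250?  False
```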

So in quantum mechanics, Bell's inequality is violated. The experiment has been done many times, starting with the pioneering work of Alain Aspect, and every time the predictions of quantum mechanics are upheld and Bell's inequality is violated. (Photons rather than electrons are often used. Early experiments fell short of the ideal in many ways, but as loopholes have been successively closed the result has become more and more robust.)
It seems pretty inescapable that the electrons have not “decided in advance” how they will
pass through any given SG. Do we therefore have to conclude that the measurement made at
station 1 is responsible for collapsing the wave function at station 2, even if there is no time
for light to pass between the two? It is worth noting that no-one has shown any way to use
this set-up to send signals between the stations; on their own they both see a totally random
succession of results. It is only in the statistical correlation that the weirdness shows up...

In writing this section I found this document by David Harrison of the University of Toronto
very useful.
As well as the textbook references given at the start, further discussions can be found in N.
David Mermin’s book Boojums all the way through (CUP 1990) and in John S. Bell’s Speakable
and unspeakable in quantum mechanics (CUP 1987).
A. Revision and background

A.1 Index notation—free and dummy indices


This is a brief reminder about index notation. When we have a set of $N$ objects, we often label them with the subscript $1, 2\dots N$. Components of a vector $|v\rangle$ in some basis, for instance, are $(v_1, v_2, \dots v_N)$. When we write $v_3$, the index 3 is definite: we are talking about the third component. But often we want to make more general statements, for instance
$$|u\rangle + |v\rangle = |w\rangle \;\Rightarrow\; u_1 + v_1 = w_1 \;\text{ and }\; u_2 + v_2 = w_2 \;\text{ and }\; u_3 + v_3 = w_3 \;\text{ and}\dots$$
or
$$u_i + v_i = w_i.$$
Here $i$ is a free index; it stands for 1 or 2 or 3 or $\dots N$. It allows us to write $N$ equations in one, and is completely equivalent in content to the vector equation. It has to occur in each additive term on both sides of the equation. There is nothing special about the choice of $i$: provided we make the substitution everywhere in the equation we can use $j$ or $k$ or $m$ or $n$ or.... Often more than one free index is needed, e.g. to label the $i$th row and $j$th column of a matrix, and they must be chosen to be different (unless one actually means the diagonal elements). Free indices
can also be used to label basis vectors, as in $\{|i\rangle\}$. If these basis vectors are orthonormal, they satisfy
$$\langle i|j\rangle = \delta_{ij} \equiv \begin{cases}1 & \text{if } i = j,\\ 0 & \text{if } i \neq j.\end{cases}$$
This is an example of an equation with two free indices. $\delta_{ij}$ is the Kronecker delta. (In a different context it might also be used for the general element of the unit matrix or identity operator.)
Another use of indices is to represent summation:
$$\langle u|v\rangle = u_1^*v_1 + u_2^*v_2 + u_3^*v_3 + \dots + u_N^*v_N = \sum_{i=1}^N u_i^*v_i.$$
Here $i$ is called a dummy index and represents all indices. This is a completely different use from free indices; we only have one equation, not a set of $N$, and we can't replace $i$ with a definite index of our choice. Again, though, we can use another symbol, say $j$, for the dummy index. (In this course we will never omit the $\sum$—we won't use the Einstein summation convention, because it causes problems in eigenvalue equations. But the limits 1 to $N$ will often be implied and omitted.)
The crucial rule is that if dummy and free indices, or more than one dummy index, occur in the same equation, they must all be different. So for instance suppose we have an orthonormal basis $\{|i\rangle\}$ in terms of which we write $|v\rangle = \sum_i v_i|i\rangle$. If we want to find the value of $\langle i|v\rangle$, where $i$ is a free index, we have to choose something different for the dummy index in the sum, say $j$:
$$\langle i|v\rangle = \langle i|\left(\sum_j v_j|j\rangle\right) = \sum_j v_j\langle i|j\rangle = \sum_j v_j\delta_{ij} = v_i.$$
In the penultimate step we still had a sum over $j$, but only the $i$th term was non-zero. Suppose instead we'd used $i$ as the dummy index; then we would end up with something like
$$\sum_i v_i\langle i|i\rangle = \sum_i v_i = v_1 + v_2 + \dots + v_N$$
which is not right!


Similarly if we also have $|u\rangle = \sum_i u_i|i\rangle$ and we want $\langle u|v\rangle$, we need to use an index other than $i$ for at least one of the two sums (it doesn't matter which):
$$\langle u|v\rangle = \left(\sum_i u_i^*\langle i|\right)\left(\sum_j v_j|j\rangle\right) = \sum_{ij}u_i^*v_j\delta_{ij} = \sum_i u_i^*v_i, \quad\text{or equivalently}\quad \sum_j u_j^*v_j.$$
The Kronecker delta collapses the double sum to a single sum, but it doesn't matter which of the two dummy indices we use in that single sum.
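For readers who compute, this bookkeeping is exactly what numpy does; a tiny illustrative sketch:

```python
import numpy as np

u = np.array([1 + 1j, 2, 3j])
v = np.array([0.5, -1j, 2])
# <u|v> = sum_i u_i^* v_i: the summed-over i is the dummy index.
print(np.sum(u.conj() * v), np.vdot(u, v))   # two ways, same number
```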
A.2 Vector spaces
This section is intended as revision for those who did PHYS20672 last semester. Those who
did not should consult the longer guides here and here, as well as chapter 1 of Shankar.

A.2.1 Vectors and operators in finite spaces


Vectors in a vector space are members of a set for which addition and scalar multiplication both yield new members of the set. The familiar displacement and velocity vectors in real 3-D space are only some examples of vectors, and many more abstract instances occur in physics. In particular, the state of a quantum system is a vector in an infinite-dimensional vector space, and the possibility of superposition, which is one of the main ways in which classical and quantum descriptions of objects differ, follows. We denote vectors as $|\cdots\rangle$, eg $|v\rangle$, $|3\rangle$, $|\psi\rangle$, $|+\rangle$, $|\heartsuit\rangle$, where the text between the “$|$” and the “$\rangle$” is just a name or label for the ket, which can take many forms. The dimension of the space is the size of the largest set of vectors which can be linearly independent, and such a set is called a basis. Any vector in the space can be written as a sum over basis vectors
$$|v\rangle = \sum_{n=1}^N v_n|n\rangle$$
and the numbers $v_n$ are called the coefficients or components of the vector in that basis. For a given basis, specifying the components specifies the vector.
For a given basis, specifying the components specifies the vector.
Multiplying any vector by zero gives the null vector, which properly should be written |0i but
is often written simply as 0. Indeed in QM, |0i may denote the ground state of a system, and
in quantum field theory it may denote the vacuum.
We are usually concerned with spaces in which two vectors can be combined to give a complex number; this is the inner product, which is written $\langle w|v\rangle = \langle v|w\rangle^*$. Note that if $|w\rangle = \alpha|a\rangle + \beta|b\rangle$, $\alpha$ and $\beta$ being complex numbers, then $\langle w|v\rangle = \alpha^*\langle a|v\rangle + \beta^*\langle b|v\rangle$. This is called conjugate- or skew-linearity.

We may write the vectors of a basis as $\{|v_1\rangle, |v_2\rangle\dots|v_N\rangle\}$ or simply as $\{|1\rangle, |2\rangle\dots|N\rangle\}$. It is very useful to work with orthonormal bases, for which $\langle m|n\rangle = \delta_{mn}$.

When the vectors we are talking about are ordinary vectors in real 3-D space, we will tend not
to use Dirac notation. Cartesian unit vectors forming an orthonormal basis in that space will
be written {ex , ey , ez }. Here the inner product is the familiar scalar product.

Operators act on vectors to produce new vectors: $\hat Q|v\rangle = |w\rangle$. The matrix element of $\hat Q$ between two vectors is defined as $\langle u|\hat Q|v\rangle = \langle u|w\rangle$. The identity operator $\hat I$ leaves vectors unchanged.
The object $\hat A = |u\rangle\langle v|$ is an operator, since it can act on a vector to give another (which will always be proportional to $|u\rangle$): $\hat A|w\rangle = (\langle v|w\rangle)|u\rangle$. If the vectors $\{|n\rangle\}$ form an orthonormal basis, then
$$\sum_{n=1}^N|n\rangle\langle n| = \hat I \quad\text{since}\quad \left(\sum_{n=1}^N|n\rangle\langle n|\right)|v\rangle = \sum_{n=1}^N|n\rangle\langle n|v\rangle = \sum_{n=1}^N v_n|n\rangle = |v\rangle.$$
This is called the completeness relation.
An operator is fully defined by what it does to the vectors of a basis, since then we can find what it does to any other vector. For each basis vector $|n\rangle$, $\hat Q|n\rangle$ is a new vector which can itself be expanded in the basis: $\hat Q|n\rangle = \sum_m Q_{mn}|m\rangle$. These $N^2$ numbers $Q_{mn}$ fully define the operator, in the same way that the components of a vector fully define it (always with respect to a given basis of course). With an orthonormal basis, we have
$$v_n = \langle n|v\rangle,\qquad Q_{mn} = \langle m|\hat Q|n\rangle \qquad\text{and}\qquad w_m = \sum_n Q_{mn}v_n.$$

The final equation is reminiscent of matrix multiplication. We can write the components of a vector as a vertical list (or column vector), and of an operator as a matrix, to give:
$$|v\rangle \longrightarrow \begin{pmatrix}v_1\\v_2\\\vdots\\v_N\end{pmatrix} = \begin{pmatrix}\langle1|v\rangle\\\langle2|v\rangle\\\vdots\\\langle N|v\rangle\end{pmatrix} \equiv \mathbf v,\qquad \langle v| \longrightarrow (v_1^*, v_2^*, \dots, v_N^*) = (\langle v|1\rangle, \langle v|2\rangle, \dots, \langle v|N\rangle) \equiv \mathbf v^\dagger,$$
$$\hat Q \longrightarrow \begin{pmatrix}Q_{11}&Q_{12}&\dots&Q_{1N}\\Q_{21}&Q_{22}&\dots&Q_{2N}\\\vdots&\vdots&\ddots&\vdots\\Q_{N1}&Q_{N2}&\dots&Q_{NN}\end{pmatrix} = \begin{pmatrix}\langle1|\hat Q|1\rangle&\langle1|\hat Q|2\rangle&\dots&\langle1|\hat Q|N\rangle\\\langle2|\hat Q|1\rangle&\langle2|\hat Q|2\rangle&\dots&\langle2|\hat Q|N\rangle\\\vdots&\vdots&\ddots&\vdots\\\langle N|\hat Q|1\rangle&\langle N|\hat Q|2\rangle&\dots&\langle N|\hat Q|N\rangle\end{pmatrix} \equiv \mathsf Q.$$
The $Q_{mn}$ are called the matrix elements of $\hat Q$ in this basis. So
$$\langle u|\hat Q|v\rangle = (u_1^*, u_2^*, \dots, u_N^*)\begin{pmatrix}Q_{11}&\dots&Q_{1N}\\\vdots&\ddots&\vdots\\Q_{N1}&\dots&Q_{NN}\end{pmatrix}\begin{pmatrix}v_1\\\vdots\\v_N\end{pmatrix} = \mathbf u^\dagger\mathsf Q\mathbf v.$$

The symbol $\underset{\text{name}}{\longrightarrow}$ means “is represented by”, with “name” being a name or label for the basis, which will be omitted if the basis is obvious. In different bases, the components and matrix elements will be different. The corresponding column vectors and matrices are different representations of the same vector/operator. (Note though that $\langle u|\hat Q|v\rangle$ is just a number, and independent of the representation.)
Note that in their own basis, the basis vectors themselves have extremely simple representations: in a 3-D space, if we use the symbol $\underset{\{1,2,3\}}{\longrightarrow}$ to mean “is represented in the $\{|1\rangle, |2\rangle, |3\rangle\}$ basis by”, then
$$|1\rangle \underset{\{1,2,3\}}{\longrightarrow} \begin{pmatrix}1\\0\\0\end{pmatrix},\qquad |2\rangle \underset{\{1,2,3\}}{\longrightarrow} \begin{pmatrix}0\\1\\0\end{pmatrix},\qquad |3\rangle \underset{\{1,2,3\}}{\longrightarrow} \begin{pmatrix}0\\0\\1\end{pmatrix}.$$

If we choose a new orthonormal basis $\{|n'\rangle\}$, vectors and operators will have new coefficients. With $v_n = \langle n|v\rangle$, $v_n' = \langle n'|v\rangle$, $Q_{mn} = \langle m|\hat Q|n\rangle$ and $Q'_{mn} = \langle m'|\hat Q|n'\rangle$, and where $\mathsf S$ is a unitary matrix (not one representing an operator) defined as $S_{mn} = \langle m|n'\rangle$, we have the following relations between the two representations:
$$v_n' = \sum_m S^*_{mn}v_m \;\Rightarrow\; \mathbf v' = \mathsf S^\dagger\mathbf v;\qquad Q'_{ij} = \sum_{kl}S^*_{ki}Q_{kl}S_{lj} \;\Rightarrow\; \mathsf Q' = \mathsf S^\dagger\mathsf Q\mathsf S.$$

For instance the vectors
$$|1'\rangle = \tfrac12|1\rangle + \tfrac{i}{\sqrt2}|2\rangle - \tfrac12|3\rangle,\qquad |2'\rangle = \sqrt{\tfrac12}\bigl(|1\rangle + |3\rangle\bigr),\qquad |3'\rangle = \tfrac12|1\rangle - \tfrac{i}{\sqrt2}|2\rangle - \tfrac12|3\rangle$$

are orthonormal and so also form a basis. But in this new basis, the column vectors and matrices which represent states and operators will be different. For instance if $|v\rangle = |1\rangle - |3\rangle = |1'\rangle + |3'\rangle$ we write
$$|v\rangle \underset{\{1,2,3\}}{\longrightarrow} \begin{pmatrix}1\\0\\-1\end{pmatrix} \equiv \mathbf v,\qquad |v\rangle \underset{\{1',2',3'\}}{\longrightarrow} \begin{pmatrix}1\\0\\1\end{pmatrix} \equiv \mathbf v',$$
and
$$\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = \begin{pmatrix}\tfrac12&\sqrt{\tfrac12}&\tfrac12\\ \tfrac{i}{\sqrt2}&0&-\tfrac{i}{\sqrt2}\\ -\tfrac12&\sqrt{\tfrac12}&-\tfrac12\end{pmatrix}\begin{pmatrix}v_1'\\v_2'\\v_3'\end{pmatrix}.$$
The matrix is $\mathsf S$ as defined above. We observe that its columns are just the representations of the new states $\{|1'\rangle, |2'\rangle, |3'\rangle\}$ in the old basis $\{|1\rangle, |2\rangle, |3\rangle\}$: $S_{23} = \langle2|3'\rangle$ etc.
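A short numerical sketch of this example, checking that $\mathsf S$ is unitary and that $\mathbf v' = \mathsf S^\dagger\mathbf v$ gives $(1, 0, 1)^\top$:

```python
import numpy as np

s = 1 / np.sqrt(2)
S = np.array([[0.5,     s,  0.5],       # columns are |1'>, |2'>, |3'>
              [1j * s,  0, -1j * s],    # in the old {|1>,|2>,|3>} basis
              [-0.5,    s, -0.5]])
v = np.array([1, 0, -1])                # |v> = |1> - |3>
print(np.allclose(S.conj().T @ S, np.eye(3)))   # True: S is unitary
print(S.conj().T @ v)                           # [1, 0, 1]: |v> = |1'> + |3'>
```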

The adjoint of an operator is defined by $\langle u|\hat Q|v\rangle = \langle v|\hat Q^\dagger|u\rangle^*$. A unitary operator satisfies $\hat Q^\dagger\hat Q = \hat I$.
A Hermitian operator is its own adjoint: $\langle u|\hat Q|v\rangle = \langle v|\hat Q|u\rangle^*$. In practice that means that $\hat Q$ can act backwards on $\langle u|$ or forwards on $|v\rangle$, whichever is more convenient. In an orthonormal basis, $\hat Q$ will be represented by a matrix which equals its adjoint (transposed complex-conjugate): $Q_{mn} = Q^*_{nm}$.
Hermitian operators have real eigenvalues and orthogonal eigenvectors which span the space. (If eigenvalues are repeated, all linear combinations of the corresponding eigenvectors are also eigenvectors—they form a degenerate subspace—but an orthogonal subset can always be chosen.) Thus the normalised eigenvectors of Hermitian operators are often chosen as a basis, typically labelled by the eigenvalues: $|\lambda_n\rangle$. Two Hermitian operators which commute will have a common set of eigenvectors, which might be labelled by both eigenvalues: $|\mu_m, \lambda_n\rangle$.
In its own eigenbasis, a Hermitian operator will be diagonal, with the eigenvalues as the di-
agonal elements. Hence the process of finding the eigenvalues and eigenvectors is often called
diagonalisation. The unitary matrix S whose columns are the normalised eigenvectors can
be used to transform other vectors and operators to this basis.

Since we can add and multiply operators and multiply them by scalars, we can form power series of an operator and hence define more general functions via their power-series expansion. The most important function of an operator is the exponential:
$$e^{\hat Q} \equiv \sum_{n=0}^\infty \frac{\hat Q^n}{n!}.$$
Since the corresponding power series for $e^\lambda$ converges for all finite numbers, this is defined for all Hermitian operators, and its eigenvalues are $e^{\lambda_i}$. (In the eigenbasis of a Hermitian operator, any function of the operator is also represented by a diagonal matrix whose elements are the function of the eigenvalues.)
The exponential of $i$ times a Hermitian operator is a unitary operator.

A.2.2 Functions as vectors


$p$th-order polynomials in the real variable $x$ (with complex coefficients) form a $(p+1)$-D vector space. For $p = 3$, one example of a basis in this space would be $\{1, x, x^2, x^3\}$, and the representation of $|v\rangle = v_0 + v_1x + v_2x^2 + v_3x^3$ in that basis is just the column vector $(v_0, v_1, v_2, v_3)^\top$. Another possible basis would be the first four Hermite polynomials
$$\{H_0(x) = 1,\; H_1(x) = 2x,\; H_2(x) = 4x^2 - 2,\; H_3(x) = 8x^3 - 12x\},$$
in which basis $|v\rangle \longrightarrow (v_0 + v_2/2,\; v_1/2 + 3v_3/4,\; v_2/4,\; v_3/8)^\top$.


More general sets of functions can also form vector spaces, but typically infinite-dimensional ones, with basis sets involving infinitely many functions. An example would be the set of all smooth functions $f(x)$ for which $\int_{-\infty}^\infty |f(x)|^2\,dx$ is finite. We take this as the definition of $\langle f|f\rangle$, with
$$\langle f|g\rangle = \int_{-\infty}^\infty f^*(x)g(x)\,dx.$$
An example of an orthonormal basis for these functions is the set
$$\{|n\rangle = N_nH_n(x)e^{-x^2/2}\} \quad\text{for } n = 0, 1, 2\dots$$
where $N_n$ is a normalisation constant. Then any such function can be represented by an (infinitely long) list of numbers $|f\rangle \longrightarrow \bigl(\langle0|f\rangle, \langle1|f\rangle, \dots\bigr)^\top$.
If we shift our perspective, we can consider the vectors in an infinite-dimensional space as primary, and the functions as just another representation—the position-space representation, in which $|f\rangle \underset{x}{\longrightarrow} f(x)$. With that viewpoint, the value $f(x_0)$ of the function at some value $x = x_0$ is like a component of the vector, and can be found by taking the inner product with a vector that picks out just that value, $|x_0\rangle$: $f(x_0) = \langle x_0|f\rangle$. If we don't want to specify a particular value, we have $f(x) = \langle x|f\rangle$ for the variable $x$.
With an eye on QM, we will often refer to vectors in a general vector space as states, which
also helps to distinguish them from position and momentum vectors (of which more later).
Operators act on functions to turn one function into another; two simple examples are multiplication by $x$, and differentiation with respect to $x$. For their action on the abstract states, we use $\hat x$ and $\hat D$, and we need¹
$$\langle x|\hat x|f\rangle = xf(x),\qquad \langle x|\hat D|f\rangle = \frac{df}{dx}.$$

Since $\int f^*xg\,dx = \bigl(\int g^*xf\,dx\bigr)^*$, $\hat x$ is Hermitian. So we see that $|x\rangle$ is an eigenstate of $\hat x$:
$$\hat x|x_0\rangle = x_0|x_0\rangle \qquad\text{and}\qquad \hat x|x\rangle = x|x\rangle.$$

¹Shankar uses $\hat X$ for a dimensionless position variable, and $\hat K = -i\hat D$ as a dimensionless version of $\hat p$, but we stick with the QM notation.
These position eigenstates satisfy (where $x$ and $x'$ are both values of the position variable)
$$\langle x'|x\rangle = \delta(x' - x),\qquad \int_{-\infty}^\infty |x\rangle\langle x|\,dx = \hat I,\qquad \langle f|g\rangle = \langle f|\hat I|g\rangle = \int_{-\infty}^\infty f^*(x)g(x)\,dx.$$
Also, since $\int f^*\frac{dg}{dx}\,dx = -\bigl(\int g^*\frac{df}{dx}\,dx\bigr)^*$, $\hat D$ is anti-Hermitian, so $i\hat D$ is Hermitian. In QM we work with $\hat p = -i\hbar\hat D$, and we can see that $[\hat x, \hat p] = i\hbar$. In the abstract vector space, this commutation relation defines $\hat p$. In position space, these operators are represented by²³
$$\hat x \underset{x}{\longrightarrow} x,\qquad \hat p \underset{x}{\longrightarrow} -i\hbar\frac{d}{dx}.$$

We can define eigenstates of p̂, |p⟩, which have the following representation in position space:4

    |p⟩ → ⟨x|p⟩ = (1/√(2πħ)) e^{ipx/ħ},

and which satisfy

    ⟨p′|p⟩ = δ(p′ − p),    ∫_{−∞}^{∞} |p⟩⟨p| dp = Î,    ⟨f|g⟩ = ⟨f|Î|g⟩ = ∫_{−∞}^{∞} f̃*(p) g̃(p) dp.

Up to factors of ħ, ⟨p|f⟩ = f̃(p) is the Fourier transform of f(x), and is an equally valid
representation—in what we call momentum space—of the abstract state |f⟩. The numerical
equality of ⟨f|g⟩ calculated in the position and momentum representations is a reflection of
Parseval's theorem.
We note that the states |ni defined above whose position-space representation is a Hermite
polynomial times a Gaussian are actually eigenstates of x̂2 − D̂2 , with eigenvalues λn = 2n + 1.
In this basis x̂ and p̂ are represented by infinite-dimensional matrices, and it can be shown that
for both, only matrix elements where m and n differ by ±1 are non-zero.
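That structure can be checked by building truncated matrices for x̂ and p̂ from the ladder-operator construction (a sketch assuming NumPy; units with ħ = M = ω = 1, and the truncation to 6×6 is an arbitrary choice):

    import numpy as np

    N = 6
    a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # lowering operator: <n-1|a|n> = sqrt(n)

    x = (a + a.T)/np.sqrt(2)                     # dimensionless position
    p = (a - a.T)/(1j*np.sqrt(2))                # dimensionless momentum

    print(np.round(x, 3))                        # non-zero only where |m - n| = 1
    # x^2 + p^2 is diagonal with eigenvalues 2n+1 (the last entry is a truncation artefact):
    print(np.round(np.real(np.diag(x @ x + p @ p)), 3))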

We can extend the discussion to functions of three coordinates x, y and z. (Our notation for
a point in space in a particular Cartesian coordinate system is r = x ex + y ey + z ez .) There
are operators associated with each coordinate, x̂, ŷ and ẑ, which commute, and corresponding
momentum operators p̂x, p̂y and p̂z, which also commute. Between the two sets the only
non-vanishing commutators are [x̂, p̂x] = [ŷ, p̂y] = [ẑ, p̂z] = iħ.
The position operator, x̂, is5 x̂ ex + ŷ ey + ẑ ez, and similarly p̂. Boldface-and-hat now indicates
a vector operator, i.e. a triplet of operators. The eigenstate of position is |x, y, z⟩ ≡ |r⟩:

    x̂|r⟩ = (x̂ ex + ŷ ey + ẑ ez)|r⟩ = (x ex + y ey + z ez)|r⟩ = r|r⟩,
    p̂|p⟩ = (p̂x ex + p̂y ey + p̂z ez)|p⟩ = (px ex + py ey + pz ez)|p⟩ = p|p⟩.

2 Actually as these are operators, it is more accurate to give the matrix elements x̂ → ⟨x′|x̂|x⟩ = xδ(x′ − x)
and p̂ → ⟨x′|p̂|x⟩ = −iħ dδ(x − x′)/dx′, which then are integrated over x′ in any expression, but as this has just the
net effect of setting x′ to x we never bother with this more correct version.
3 The lecturer of PHYS20672 used |ek⟩ → ek(x) for eigenkets of K̂ = p̂/ħ. The labelling of the kets used here
is more common and is more symmetric between the x and p representations. I don't find it causes problems
in practice. There is no common convention for the name of the function ⟨x|p⟩ which represents a plane wave.
4 This can be derived from the differential equation obtained by combining ⟨x|p̂|p⟩ = p⟨x|p⟩ together with
the position-space representation ⟨x|p̂|p⟩ = −iħ (d/dx)⟨x|p⟩.
5 We do not use r̂ since that is reserved for the unit vector r/r!
In position space, x̂ → r and p̂ → −iħ∇. Momentum eigenstates are

    |p⟩ → ⟨r|p⟩ = (1/(2πħ))^{3/2} e^{ip·r/ħ},

which is a plane wave travelling in the direction of p. Also


    ⟨f|g⟩ = ∫ f*(r) g(r) d³r,    ⟨r|r′⟩ = δ(r − r′) = δ(x − x′)δ(y − y′)δ(z − z′).

A.2.3 Commutators
Let Â, B̂ and Ĉ be arbitrary operators in some space. Then the following relations are very
useful:

    ÂB̂ = B̂Â + [Â, B̂],
    [Â, B̂Ĉ] = [Â, B̂]Ĉ + B̂[Â, Ĉ],
    [ÂB̂, Ĉ] = [Â, Ĉ]B̂ + Â[B̂, Ĉ],
    [Â, B̂ⁿ] = n[Â, B̂]B̂ⁿ⁻¹  provided [Â, B̂] commutes with B̂,
    e^Â e^B̂ = e^{Â+B̂+[Â,B̂]/2}  provided [Â, B̂] commutes with Â and B̂.

Let Q(x) be a polynomial with derivative R(x). Then

    [p̂x, Q(x̂)] = −iħR(x̂) → −iħ dQ(x)/dx  (in position space).

Similarly if V(r) is a function of position in 3-D,

    [p̂, V(x̂)] → −iħ ∇V(r).
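These identities can be spot-checked with random matrices standing in for the operators (a sketch, assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

    def comm(X, Y):
        # Commutator [X, Y] = XY - YX
        return X @ Y - Y @ X

    print(np.allclose(comm(A, B @ C), comm(A, B) @ C + B @ comm(A, C)))  # True
    print(np.allclose(comm(A @ B, C), comm(A, C) @ B + A @ comm(B, C)))  # True

    # [A, B^n] = n[A, B]B^{n-1} needs [A, B] to commute with B; for the shift
    # matrix S and diagonal D = diag(0,1,2,3) one finds [D, S] = -S, which does:
    D = np.diag([0., 1., 2., 3.])
    S = np.diag(np.ones(3), k=1)
    print(np.allclose(comm(D, S @ S @ S), 3*comm(D, S) @ S @ S))         # True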
A.3 Recap of 2nd year quantum mechanics

A.3.1 The wave function and Schrödinger’s equation


The insight of de Broglie was that Bohr’s atomic model with its quantised energy levels could
be understood if the electron acted like a wave, with wave vector and frequency related to
the momentum and energy by p = ~k and E = ~ω. Schrödinger found the appropriate wave
equation to describe this wave, and quantum mechanics (also called wave mechanics in this
formulation) was born. A brief recap of the theory follows.6 We will focus on the problem of
a single spinless particle of mass M moving in an external potential V (r).7 Our notation is
r = x ex + y ey + z ez .
The maximum knowledge that we can have about a quantum particle is the wave function
Ψ(r, t), a complex function of position and time. The particle in general has no definite position
or momentum; instead a range of possible values may be obtained if these or other aspects of
the system such as the energy are measured. In particular the probability of finding a particle
in an infinitesimal volume dV around position r is

P (r, dV ) = |Ψ(r, t)|2 dV

and as a result the wave function must be normalised (it must be somewhere in space):8
    ∫ |Ψ(r, t)|² dV = 1,

where the integral here and everywhere below is over all space unless otherwise specified. The
wave function is a solution of the time-dependent Schrödinger equation (TDSE)
    −(ħ²/2M) ∇²Ψ + V(r)Ψ = iħ ∂Ψ/∂t    or equivalently    ĤΨ = iħ ∂Ψ/∂t,

where Ĥ is the Hamiltonian or energy operator. As we often do when faced with a partial
differential equation, we can look for solutions which are separable in space and time,

Ψ(r, t) = ψ(r)T (t),

(this will not work if the potential depends on time, but for many interesting cases it does not).
Substituting in the Schrödinger equation and dividing by Ψ(r, t) gives

    −(ħ²/2M) (∇²ψ/ψ) + V(r) = (iħ/T) dT/dt = E,

using the usual argument that if two functions of different variables are equal for all values
of those variables, they both must be equal to a constant which we denote as E. The time
equation is trivial and independent of the potential, and gives

    T(t) = e^{−iEt/ħ};
6 These notes are designed to be read at the start of the current course, and so do not use vector space
terminology. They also do not distinguish between operators and their position-space representations.
7 m is usually used for the mass, but that may be confused with the azimuthal quantum number.
8 This means that the wave function is dimensioned, with units of inverse length to the power D/2, D being
the number of spatial dimensions.
this is just a phase factor with a constant frequency satisfying de Broglie's relation ħω = E.
The allowed values of E, though, are not determined at this stage.
The spatial equation is the time-independent Schrödinger equation (TISE)
    −(ħ²/2M) ∇²ψ + V(r)ψ = Eψ    or equivalently    Ĥψ = Eψ.

Though the form of the equation is universal, the solutions depend on the potential. In solving
the TISE, at a minimum we need to ensure the solutions are finite everywhere—we don’t want
the probability of finding the particle in some region to be infinite. If the potential is constant,
V(r) = V0, the solutions are plane waves characterised by a wave vector k:

    ψ(r) ∝ e^{ik·r},    where E = V0 + ħ²|k|²/(2M).

This is very suggestive: it looks like E equals the potential energy plus the kinetic energy, if
the momentum is given by de Broglie's relation p = ħk. And the only allowable solutions are
those with E > V0, which makes sense classically. Indeed we identify E with the energy of the
system in all cases.
If the potential varies with position, we have two possible types of solution. There are those for
which the energy is greater than the potential over most of space, in which case the solutions
are not localised and the particle may be found anywhere; these are called scattering solutions.
They will look like plane waves in regions where the potential is constant. The other type of
solution may exist if the potential has a well, a region of space where the potential is lower
than its value at large distances. These solutions are called bound states; they have energies
that are not large enough for the particle to climb out of the well, and the wave function is
concentrated within the well—the probability of finding the particle large distances away from
the well vanishes.
Elementary QM is almost exclusively concerned with bound states. The extra requirement that
the wave function must vanish at infinity means that, for arbitrary values of E, no solution to
the TISE exists; only for certain discrete values can acceptable solutions be found. The energy
is quantised ; there is a single lowest (ground-state) energy state and a number of higher-energy
or excited states. If the potential grows without bound at large distances (an infinite well)
there will be infinitely-many states, but more realistically the potential will level off eventually
(a finite well); there will be a maximum energy of the bound states, and scattering states will
exist as well. (In this case it is usual to set V (|r| → ∞) = 0, with V < 0 in the well.) For
simplicity, we often concentrate on infinite wells.
In this case we have infinitely many states with energies En, n = 1, 2 . . . and solutions satisfying

    Ψn(r, t) = ψn(r) e^{−iEn t/ħ}    where    −(ħ²/2M) ∇²ψn(r) + V(r)ψn(r) = En ψn(r).

Note that while the form of the time-dependence is the same for all solutions, the frequency
ωn = En /~ is different for each. The TISE is an eigenfunction equation, with the energies En
being the eigenvalues. The states are taken to be normalised, and it can be shown that they
are also orthogonal:9

    ∫ ψ*m(r) ψn(r) dV = δmn.

9 In more than one dimension there may be distinguishable states with the same energy, which are termed
degenerate; in that case any superposition of degenerate solutions is also a solution, but we are always able to
choose an orthogonal set.
The general solution of the TDSE is a superposition of these states:

    Ψ(r, t) = ∑_{n=1}^∞ cn ψn(r) e^{−iEn t/ħ},    where    ∑_{n=1}^∞ |cn|² = 1,

the restriction on the sum of the magnitudes of the coefficients being the normalisation con-
dition. In any specific case a particular wave function is determined by the coefficients, which
(since the functions ψn(r) are orthogonal) can be obtained from the initial state of the system
by
    cn = ∫ ψ*n(r) Ψ(r, 0) dV.

Thus in a very simple case, if the initial state is Ψ(r, 0) = √(1/3) ψ1(r) + √(2/3) ψ3(r), the subsequent
state would be

    Ψ(r, t) = √(1/3) ψ1(r) e^{−iE1 t/ħ} + √(2/3) ψ3(r) e^{−iE3 t/ħ}.

In practical situations (such as atomic physics) the quantisation of the energy levels shows up
primarily in the existence of discrete excitation or de-excitation energies—spectral lines.
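As a concrete illustration of extracting the cn (a sketch, not from the original notes, assuming NumPy; the 1D infinite well of section A.3.3 and the initial shape Ψ(x, 0) ∝ x(a − x) are just convenient choices):

    import numpy as np

    a = 1.0
    x = np.linspace(0, a, 2001)
    dx = x[1] - x[0]

    psi0 = x*(a - x)
    psi0 /= np.sqrt(np.sum(psi0**2)*dx)           # normalise Psi(x, 0)

    def eigenstate(n):
        return np.sqrt(2/a)*np.sin(n*np.pi*x/a)   # infinite-well eigenfunctions

    # c_n = integral of psi_n(x) Psi(x, 0) dx
    c = np.array([np.sum(eigenstate(n)*psi0)*dx for n in range(1, 8)])
    print(np.round(c**2, 5))   # |c_n|^2: nearly all in n = 1; even n vanish by symmetry
    print(np.sum(c**2))        # close to 1: the normalisation condition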

A.3.2 Measurement
It is a postulate of QM that for every “observable” or measurable property of a quantum system
there is a corresponding Hermitian operator. Denoting such an operator Q̂, “Hermitian” means
that if ψ(r) and φ(r) are (complex) normalisable functions of position,

    ∫ ψ*(r) Q̂ φ(r) dV = (∫ φ*(r) Q̂ ψ(r) dV)*.

Similarly to the case with Hermitian matrices, Hermitian operators have real eigenvalues and
orthogonal eigenfunctions which form a complete set, that is any other normalisable function
can be expressed as a superposition of them.
At the start we noted that a measurement of position (for which the operator is just the
position vector, denoted x̂ but simply equal to r in an integral) will give an answer which is not
known in advance, with the probability of the possible results being governed by the modulus-
squared of the wave function. So, in general, is the case with other possible measurements. If
some observable is associated with an operator Q̂, the average value of a measurement of that
observable for a particle with wave function Ψ(r, t) is given by10

    ⟨Q̂⟩ = ∫ Ψ*(r, t) Q̂ Ψ(r, t) dV.

Since, as is well known, measurement in quantum mechanics changes the system, we can only
talk about average results if we make repeated measurements on identically-prepared copies of
the system. So this average is an ensemble average, also called the expectation value.
As well as an average value, these measurements will have a spread or uncertainty ∆Q which
is given through the usual definition of the standard deviation by

    (∆Q)² = ⟨Q̂²⟩ − ⟨Q̂⟩².


10 ⟨Q̂⟩ may be written ⟨Ψ|Q̂|Ψ⟩.
If many measurements are done (each on a fresh copy), but a single result qi is obtained every
time, then Ψ is an eigenfunction of Q̂ with eigenvalue qi : Ψ(r, t) ∝ φi (r) where Q̂φi (r) = qi φi (r).
This ensures that ∆Q = 0.
After a measurement is done, and a particular result q is obtained, subsequent measurements
carried out quickly enough on the same system will again yield the same value q; this is
reproducibility. It is only possible if, after a measurement is made, the system is no longer in
its original state but is in an eigenstate of Q̂ with q as the corresponding eigenvalue. And so
the only possible results of a measurement of Q̂ are its eigenvalues.
Any position is an eigenvalue of the position operator, likewise with momentum, and likewise
(above some minimum) for energy in scattering states. Some operators though, like the energy
for bound states, have discrete values.
There is a common misconception among students that making a measurement on a system is
somehow related to operating with the corresponding operator on the wave function. This is
not true! In general an operator turns a normalised wave function into another, quite different,
unnormalised function that doesn’t even have the dimensions of a wave function.
What we can say is that the probability of getting a particular eigenvalue of Q̂ when we make
a measurement of the corresponding observable is given by11

    P(qi) = |∫ φ*i(r) Ψ(r, t) dV|².

And immediately after the measurement, the system is in the state φi(r).
Some frequently-met operators are momentum, p̂ = −i~∇, kinetic energy p̂ · p̂/2M , energy Ĥ
as given above, and angular momentum L̂ = r × p̂.
For energy, once we have found the eigenfunctions and eigenvalues for the relevant particular
potential, as we have already seen any wave function can be expanded

    Ψ(r, t) = ∑_{n=1}^∞ cn ψn(r) e^{−iEn t/ħ},    where    ∑_{n=1}^∞ |cn|² = 1    and    cn = ∫ ψ*n(r) Ψ(r, 0) dV.

We now see that |cn |2 is the probability that, if we measure the energy, we will get the value En .
If only one cn is non-zero, Ψ(r, t) is an energy eigenstate. The following terms are synonymous:
separable solution of the TDSE; solution of the TISE; eigenstate of the Hamiltonian; state of def-
inite energy; stationary state. The last requires some explanation: for Ψ(r, t) = ψn(r) e^{−iEn t/ħ},
the expectation value ⟨Q̂⟩ is independent of time for any operator Q̂ that does not itself depend
on time.
For the case we considered above,

    Ψ(r, t) = √(1/3) ψ1(r) e^{−iE1 t/ħ} + √(2/3) ψ3(r) e^{−iE3 t/ħ},

a measurement of energy would yield the result E1 one third of the time and E3 two-thirds of
the time.
Given any wave function except a plane wave, the uncertainty in any component of the mo-
mentum ∆pi will be non-zero. This is related to the Fourier transform: to make up a wave
packet of finite spatial extent ∆x requires a superposition of waves of a spread of wavelengths.
(Similarly, a signal of finite duration consists of a spread of frequencies.) The bandwidth theo-
rem relates the two; the narrower the wave packet, the wider the spread of wave numbers. In
QM language, this is Heisenberg's uncertainty relation:

    ∆x ∆px ≥ ħ/2.

11 If the eigenvalues are continuous, this should be interpreted as a probability density instead.

If we make a measurement that narrows down the region in which the particle exists, a subse-
quent measurement of the momentum is much more unpredictable, and vice versa. It is obvious
therefore that we cannot find a state of well-defined position and momentum; position and mo-
mentum can’t have common eigenfunctions. In fact if operators have common eigenfunctions,
they must commute, and x̂ and p̂x do not:

(x̂p̂x − p̂x x̂)ψ(x) = i~ψ(x) ⇒ [x̂, p̂x ] = i~.

Heisenberg, with Born and Jordan, derived his formulation of QM, called matrix mechanics,
starting from this relation; Schrödinger subsequently showed that it was in fact equivalent to
his wave mechanics.
Other operators which do not commute are the three components of angular momentum, so
that in QM we can only know one of them exactly (usually but not necessarily taken to be Lz ).
In fact the commutation relations are rather interesting:

[L̂x , L̂y ] = i~L̂z , [L̂y , L̂z ] = i~L̂x , [L̂z , L̂x ] = i~L̂y .

However all three separately commute with L̂2 = L̂2x + L̂2y + L̂2z , so we can know the magnitude
of the angular momentum and one of its components at the same time. For more on angular
momentum operators see section A.5. For more on commutators see section A.2.3.

A.3.3 Bound states
We now turn to the problem of finding the energy levels for a variety of potentials.

Infinite square well

An unrealistic but useful model potential is one which has a constant value, taken to be zero,
over some region, but which is infinite elsewhere. Where a potential is discontinuous like this
we have to solve the TISE in each region separately, and we match them by requiring that ψ
is continuous at the boundaries (the probability density has to be unambiguous). Since the
TISE is a second-order differential equation we usually need the derivatives of ψ to match as
well. However in the unique case of an infinite potential step there is no solution possible in
the classically-forbidden region, so in this case the condition is just that ψ vanishes at the
boundaries.
In 1D, with V(x) = 0 for 0 < x < a, the general solution is Ae^{ikx} + Be^{−ikx} or equivalently
C sin(kx) + D cos(kx), with ħk = √(2ME), but requiring ψ(0) = ψ(a) = 0 restricts the (nor-
malised) solution to √(2/a) sin(kn x), with kn = nπ/a and En = n²ħ²π²/(2Ma²). The quantisation
of the energies is completely analogous to the “quantisation” of the vibrational frequencies of
a string fixed at both ends.
If we’d chosen the well to stretch from −a/2 to a/2 the values kn and hence the energies would
(of course) be unchanged but the wave functions would alternate cosines and sines. (I only
mention this because the finite square well is much neater with that choice, see below.)
In 2D and 3D, for a rectangular or cuboidal well, the spatial wave function is separable. The
solution in 3D, with sides Lx etc., is just

    ψ_{nx ny nz} = √(8/(Lx Ly Lz)) sin(nx πx/Lx) sin(ny πy/Ly) sin(nz πz/Lz)

    and    E_{nx ny nz} = (ħ²π²/2M) (nx²/Lx² + ny²/Ly² + nz²/Lz²).

A cuboidal well (albeit better modelled as a finite than an infinite well) is also called a quantum
dot. If all the lengths are the same, some energy levels will correspond to more than one
wave function. Denoting the quantum numbers of the state by (nx, ny, nz), the
first excited energy level is in fact three states, with quantum numbers (2, 1, 1), (1, 2, 1) and
(1, 1, 2). These are said to be three-fold degenerate.
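Counting these degeneracies is easy to automate (a sketch in Python, not part of the notes; energies are in units of ħ²π²/(2ML²) for a cube of side L):

    from collections import defaultdict

    # Cubic well: E = nx^2 + ny^2 + nz^2 in units of hbar^2 pi^2/(2 M L^2)
    levels = defaultdict(list)
    for nx in range(1, 5):
        for ny in range(1, 5):
            for nz in range(1, 5):
                levels[nx**2 + ny**2 + nz**2].append((nx, ny, nz))

    for E in sorted(levels)[:5]:
        print(E, levels[E])
    # 3 [(1,1,1)]; 6 [(1,1,2),(1,2,1),(2,1,1)]: the first excited level is three-fold degenerate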
Quantum mechanics allows us to make systems which act as if they have fewer than 3 dimensions
in a way that classical physics would not. If we have a situation where one length is much shorter
than the others, say Lz ≪ Lx, Ly, then all the low-lying states will have nz = 1. If we probe
such a system with energies less than the energy needed to excite the first state with nz = 2,
it will be effectively two-dimensional; we say the third degree of freedom is frozen out. Modern
so-called 2D systems are often of this nature. Similarly if Lx ≫ Ly, Lz we have an effectively
1D system or quantum wire.
For a 2D circular well, the radial solutions are Bessel functions with argument kr. Again k is
chosen so that the wave function vanishes at the boundary r = R, ie kR is a zero of the Bessel
function. For a 3-D spherical well we get spherical Bessel functions. More on this below.
Finite square well

A somewhat more realistic well has one region of constant low potential with a constant higher
potential elsewhere. We can either take these to be −V0 and 0, or 0 and +V0 ; the former allows
the potential at infinity to be zero but the latter is closer to the infinite well so we will go with
that.
In 1D, with V(x) = 0 for −a/2 < x < a/2 and V(x) = V0 > 0 elsewhere, the solutions are
again Ae^{ikx} + Be^{−ikx} inside the well. But now there are solutions to the equation outside the well
as well: for x > a/2, ψ = Ce^{−κx}, and for x < −a/2, ψ = De^{κx}, with ħκ(k) = √(2M(V0 − E)).
(These were not allowable for a constant potential over all space as they blow up, but they are
OK here as they hold only in restricted regions.) Matching both ψ and ψ′ at x = ±a/2 again
restricts the values of k: either ψ is symmetric (cosine inside the well) and k has to satisfy
κ(k) = k tan(ka/2), or antisymmetric (sine inside the well) with κ(k) = −k cot(ka/2). The
graphical solution for the allowed values of k, and the first few solutions, are shown below; a
numerical version follows. A 1D well, no matter how shallow, always has at least one bound state.
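Here is that numerical version (a sketch assuming SciPy; units with ħ = 2M = 1 so that E = k², and illustrative values a = 1, V0 = 50):

    import numpy as np
    from scipy.optimize import brentq

    a, V0 = 1.0, 50.0
    kappa = lambda k: np.sqrt(V0 - k**2)
    even = lambda k: k*np.tan(k*a/2) - kappa(k)     # symmetric states
    odd  = lambda k: -k/np.tan(k*a/2) - kappa(k)    # antisymmetric states

    def roots(f):
        ks, out = np.linspace(1e-6, np.sqrt(V0) - 1e-6, 2000), []
        for k1, k2 in zip(ks[:-1], ks[1:]):
            f1, f2 = f(k1), f(k2)
            # a crude guard to skip the sign changes at the poles of tan/cot
            if np.isfinite(f1) and np.isfinite(f2) and f1*f2 < 0 and abs(f1 - f2) < V0:
                out.append(brentq(f, k1, k2))
        return out

    for label, f in (("even", even), ("odd", odd)):
        print(label, [round(k**2, 3) for k in roots(f)])   # bound-state energies E = k^2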
The most important part of this analysis (apart again from the quantisation of the energy) is
that there is a finite probability of finding the particle in the classically-forbidden regions. The
higher the energy, the higher this probability is. On the other hand we note that as V0 increases,
the tails of the low-lying states get smaller and shorter, until we reach the point where simply
setting ψ = 0 at the boundary, as we did in the infinite well, is a good approximation.

There is a standard piece of iconography used in the picture on the right. We draw a well and
the energy levels. Then we take the lines representing the energy levels as the x-axis for a graph
of the corresponding wave function. These therefore have off-set y-axes, and the two y scales,
for the potential and for the wave functions, are not related.
Harmonic Oscillator

A particle in a 1D harmonic potential V(x) = ½ks x², where ks plays the role of a spring
constant, is a good model for many practical situations. (Essentially all wells are harmonic
at their base!) The classical oscillation frequency is ω = √(ks/M) and the potential is often
written V(x) = ½Mω²x². It is useful to note that x0 = √(ħ/Mω) has dimensions of length, and
of course ħω has dimensions of energy. For any given energy there will be a classically allowed
region for which E > V(x), between x = ±x0√(2E/(ħω)), and the rest is classically forbidden, in
which the wave function falls to zero as x → ±∞. We expect the solutions to look somewhat
like the finite well, oscillatory (though not specifically sinusoidal) within the well, with decaying
tails outside, as indeed they do; again the requirement of finite, normalisable wave functions
restricts the possible energies, this time to (n + ½)ħω, for n = 0, 1, 2 . . . (Hence we often use E0
rather than E1 for the ground state.) The ground state is just a Gaussian, φ0(x) ∝ e^{−x²/(2x0²)}, and
higher states multiply this by Hermite polynomials; see section A.4 for details of the solution.

A potential which is harmonic in three directions, ½M(ωx²x² + ωy²y² + ωz²z²), will have solutions
which (like the square well) are just products of the 1D states, and energies which are the sum
of the corresponding 1D energies. If all the “spring constants” are equal there will be a high
degree of degeneracy among the excited states; for instance (7/2)ħω is the energy of the states
with (nx, ny, nz) = (2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1) and (0, 1, 1).
In the symmetric case, though, the potential depends only on r and can be written V(r) = ½Mω²r², and we
can work instead in spherical polars: see the next section. The energies, degeneracies, and non-
degenerate wave functions such as the ground state must turn out the same in both coordinate
systems, but the degenerate ones need only be linear combinations of one another.

Linear potential

In a region of space where the potential is linear, V (x) ∝ x, the solutions are Airy functions
(see section A.9). To form a well, this potential would have to have a hard wall somewhere, say
V = ∞ for x < 0, or it might be part of a V-shaped potential V (x) ∝ |x|. The energy levels
have to be found numerically.
A.3.4 Circular and spherical symmetry
In 2D we will use r2 = x2 + y 2 and y/x = tan φ; in 3D, r2 = x2 + y 2 + z 2 , y/x = tan φ and
z/r = cos θ. The double meaning of r will not cause problems so long as we don’t consider 3D
problems with cylindrical geometry.
In a 2D problem with a symmetric potential V (r) = V (r), we can write the wave function
in a form which is separable in plane polar coordinates: ψ(x, y) = R(r)Φ(φ). Skipping some
detail we find that the angular dependence is just of the form eimφ where, in order for the wave
function to be single valued, we need m to be an integer (not to be confused with the mass!).
Then the radial equation is

    −(ħ²/2M)(d²R/dr² + (1/r) dR/dr) + (ħ²m²/(2Mr²))R + V(r)R = ER.

In a 3D spherically-symmetric potential V(r) = V(r), we can write the wave function in a form
which is separable in spherical polar coordinates: ψ(r) = R(r)Y(θ, φ). Then

    −(ħ²/2M) ∇²ψ(r) + V(r)ψ(r) = Eψ(r)

    ⇒  −ħ²[(1/sin θ) ∂/∂θ(sin θ ∂Y/∂θ) + (1/sin²θ) ∂²Y/∂φ²] = ħ² l(l+1) Y    (A.1)

    and    −(ħ²/2Mr) d²(rR)/dr² + (ħ²l(l+1)/(2Mr²))R + V(r)R = ER,

where l(l + 1) is the constant of separation. The radial equation depends on the potential, and
so differs from problem to problem. However the angular equation is universal: its solutions
do not depend on the potential. It is further separable into an equation in θ and one in φ
with separation constant m2 ; the latter is the same as in 2D with solution eimφ for integer
m. Finally the allowable solutions of the θ equation are restricted to those which are finite
for all θ, which is only possible if l is an integer greater than or equal to |m|; the solutions
are associated Legendre polynomials Plm (cos θ). The combined angular solutions are called
spherical harmonics Yl^m(θ, φ):

    Y0^0(θ, φ) = √(1/(4π))                         Y1^{±1}(θ, φ) = ∓√(3/(8π)) sin θ e^{±iφ}
    Y1^0(θ, φ) = √(3/(4π)) cos θ                   Y2^{±2}(θ, φ) = √(15/(32π)) sin²θ e^{±2iφ}
    Y2^{±1}(θ, φ) = ∓√(15/(8π)) sin θ cos θ e^{±iφ}    Y2^0(θ, φ) = √(5/(16π)) (3cos²θ − 1)
These are normalised and orthogonal:

    ∫ (Yl′^{m′})* Yl^m dΩ = δll′ δmm′    where    dΩ = sin θ dθ dφ.
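This orthonormality can be checked with SciPy's spherical harmonics (a sketch; note the hedge that scipy.special.sph_harm takes the azimuthal angle first, the opposite of the order used here, and newer SciPy releases provide sph_harm_y instead):

    import numpy as np
    from scipy.special import sph_harm
    from scipy.integrate import dblquad

    def overlap(l1, m1, l2, m2):
        def integrand(theta, phi):   # theta = polar angle here
            y = np.conj(sph_harm(m1, l1, phi, theta))*sph_harm(m2, l2, phi, theta)
            return np.real(y)*np.sin(theta)
        val, _ = dblquad(integrand, 0, 2*np.pi, 0, np.pi)
        return val

    print(round(overlap(2, 1, 2, 1), 6))   # 1.0: normalised
    print(round(overlap(2, 1, 1, 1), 6))   # 0.0: orthogonal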

The physical significance of the quantum numbers l and m is not clear from this approach.
However if we look at the radial equation, we see that the potential has been effectively modified
by an extra term ~2 l(l + 1)/(2M r2 ). Recalling classical mechanics, this is reminiscent of the
centrifugal potential which enters the equation for the radial motion of an orbiting particle,
where ~2 l(l + 1) is taking the place of the (conserved) square of the angular momentum. Now
we have already defined the angular momentum operator L̂ = −i~r × ∇. If we cast this in
spherical polar coordinates, we find
    L̂z = −iħ ∂/∂φ,    L̂² = −ħ²[(1/sin θ) ∂/∂θ(sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²].

But L̂2 is just the angular part of −~2 r2 ∇2 . So its eigenfunctions, the spherical harmonics, are
in fact states of definite squared angular momentum, which is quantised and takes the values
~2 l(l + 1) for integer l. They are also states of definite Lz , which takes the values ~m. It is
good to see that as |m| ≤ l, L2z ≤ L2 , with equality only if both are zero.
We have already noted that we can simultaneously know the total angular momentum and its
z component, since the corresponding operators commute. (This should be obvious as L̂2 is
independent of φ.) Since L̂2 commutes with the Hamiltonian for a system with a symmetric
potential, states may be fully classified by their energy, the square of their angular momentum,
and the z-component of angular momentum.
For completeness we note that the formulae for L̂x and L̂y are rather lengthy (the z-axis has a
special role in spherical polars), but they can be expressed more succinctly as

    L̂x = ½(L̂+ + L̂−)  and  L̂y = (1/2i)(L̂+ − L̂−),

where

    L̂+ = ħe^{iφ}(∂/∂θ + i cot θ ∂/∂φ),    L̂− = L̂+† = ħe^{−iφ}(−∂/∂θ + i cot θ ∂/∂φ).
− + i cot θ .
∂θ ∂φ ∂θ ∂φ
The operators L̂± both commute with L̂2 , though it is somewhat easier to show it for L̂x and
L̂y using Cartesian coordinates.
There is more on angular momentum in section A.5.

Square wells again

For a 2D circular infinite square well with V = 0 for r < R the solutions are Bessel functions
as mentioned above; actually there is a different Bessel function for each value of m so

ψ(r, φ) ∝ Jm (kr)eimφ .

The ground state is circularly symmetric, m = 0, and there is a tower of m = 0 states with k,
and hence E, determined by J0 (kR) = 0. However the (doubly-degenerate) first excited states
have m = ±1 and k fixed by requiring kR to be the first zero of J1 .
Similarly for a 3D spherical well, the solutions are spherical Bessel functions,

    ψ(r, θ, φ) ∝ jl(kr) Yl^m(θ, φ).

This time the energy levels depend on l but not m (as expected since the z-axis is arbitrary).
Similar remarks as in the previous paragraph can be made about the ordering of levels, first
the l = 0 ground state, then the l = 1 excited states, etc.
Interestingly for 3D the radial part of ∇²R simplifies if we set R(r) = u(r)/r, becoming u″/r.
Even in the absence of an external potential there is the centrifugal potential to stop the
equation for the radial wave function being trivial, but for the special case of l = 0 we simply
have u(r) = sin(kr). And indeed j0(z) = sin z/z. So the l = 0 states of a spherical square well
have energies E_n^{l=0} = n²ħ²π²/(2MR²).
Finite circular and spherical square wells can also be solved numerically. A 2D circular well, like
a 1D well, no matter how shallow, always has at least one bound state, but in 3D a sufficiently-shallow
spherical well will have no bound states. (The l = 0 states, having the form within the well
of sin(kr)/r and outside of e^{−κr}/r, have the same energies as the antisymmetric states of a 1D
well of the same diameter; as we saw when we considered 1D finite wells above, such a state
may not exist if the well is too shallow.)

Symmetric harmonic oscillator

If we solve the symmetric harmonic oscillator V(r) = ½Mω²r² in 2 or 3D, we find the lowest
energy solution to be e^{−r²/(2x0²)}, with energy ħω or (3/2)ħω respectively. Higher-energy solutions take
the form of the same Gaussian multiplied by polynomials in r and the appropriate angular form
e^{imφ} or Yl^m(θ, φ). The first excited states in the Cartesian basis are just x or y (or z) times the
Gaussian; the relation between the two forms comes from noting that x ± iy = re^{±iφ} (2D) or,
up to a constant, rY1^{±1}(θ, φ) in 3D, while z ∝ rY1^0. Though it becomes increasingly tiresome
to verify, the degeneracies and energies do all match as they must. The three Cartesian-basis
states with E = (5/2)ħω all have l = 1. From the 6 states with E = (7/2)ħω discussed above,
we now find that from linear combinations we can make one state with l = 0 and five with
l = 2 (and m = 2, 1, 0, −1, −2).

Hydrogen atom

The details of finding the radial solutions for the Coulomb potential are extremely lengthy, but
are covered in all the course textbooks. The results are quite simple though, and are given in
section A.6.

A.3.5 Tunnelling
As we saw with the finite square well, there is a finite probability of finding a particle in a
classically-forbidden region where V > E. This means that a particle incident on a barrier of
finite height and thickness has a finite chance of tunnelling through and continuing on the other
side. (See picture below.) To find how likely this is, we need to solve the Schrödinger equation
and look at the probability density beyond the barrier. For an individual particle we would need
to set up a localised incoming wave packet with, necessarily, a spread of momenta and energies,
which would be a complicated, time-dependent problem. Instead we consider the steady-state
problem of a beam of mono-energetic particles incident from the left on the barrier; there will
also be a reflected beam travelling in the opposite direction to the left of the barrier and a
transmitted beam beyond the barrier. (This is an example of a scattering problem). Without
loss of generality we can take the potential to be zero on the left. The wave function
for the incident beam, with momentum ħk and energy E = ħω = ħ²k²/(2M), is Ie^{i(kx−ωt)};
the probability density is |I|² and the flux of particles in the beam is (p/M)|I|² particles per
second. We are looking for a single solution to the TISE, albeit one with different functional
forms in different regions, so the energy is the same everywhere and the time-dependence e^{−iωt}
is likewise universal, so we will drop it.
Beyond the barrier the potential need not be the same as on the left, so we will let it be Vf;
the momentum of particles that tunnel through will be p′ = √(2M(E − Vf)); the wave function
is Te^{ik′x} and the transmitted flux (p′/M)|T|². Finally the reflected wave is Re^{−ikx} and the
reflected flux (p/M)|R|². The transmission and reflection probabilities are

    t = (p′/p)|T/I|²,    r = |R/I|²;    t + r = 1.

We will focus on a single square barrier of height Vb > E between x = 0 and x = L. In
that region the wave function is Ae^{−κx} + Be^{κx} where ħκ = √(2M(Vb − E)), that is, there will
be exponentially decaying and exponentially growing parts. As in the finite square well we
require the wave function and its derivative to be continuous at the boundaries, which gives
four equations to determine A, B, T, and R. The rather unenlightening answer is

    t = 4κ²kk′ / [(κ² − kk′)² sinh²(κL) + κ²(k′ + k)² cosh²(κL)].

But if κL ≫ 1, cosh(κL) and sinh(κL) both tend to ½e^{κL} and the expression simplifies to

    t = [16κ²kk′ / ((κ² + k²)(κ² + k′²))] e^{−2κL} = [4ħ⁴κ²kk′ / (M²Vb(Vb − Vf))] e^{−2κL}.

This is equivalent to ignoring the exponentially-growing wave within the barrier, so that the
ratio of the wave function at either end of the barrier is just e−κL .
Recall that k, k′ and κ are all functions of energy. By far the most rapidly-varying part of
the expression is the exponential, and a rough estimate of the dependence of the tunnelling
probability on energy for a high, wide barrier is given simply by t ∼ exp(−2L√(2M(Vb − E))/ħ).
In the figure below we can note the approximately exponential decay within the barrier, the
continuity and smoothness of the wave function, and the much smaller amplitude and (since
Vf is higher than the initial potential) the longer wavelength on the right.
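The exact and approximate expressions are easy to compare (a sketch in Python; units with ħ = 2M = 1 so that E = k², with illustrative barrier parameters):

    import numpy as np

    Vb, Vf, L = 10.0, 2.0, 1.5     # barrier height, final potential, width

    def waveparams(E):
        return np.sqrt(E), np.sqrt(E - Vf), np.sqrt(Vb - E)   # k, k', kappa

    def t_exact(E):
        k, kp, ka = waveparams(E)
        return 4*ka**2*k*kp/((ka**2 - k*kp)**2*np.sinh(ka*L)**2
                             + ka**2*(kp + k)**2*np.cosh(ka*L)**2)

    def t_approx(E):
        k, kp, ka = waveparams(E)
        return 16*ka**2*k*kp/((ka**2 + k**2)*(ka**2 + kp**2))*np.exp(-2*ka*L)

    for E in (4.0, 6.0, 8.0):                 # Vf < E < Vb
        print(E, t_exact(E), t_approx(E))     # the two agree well when kappa*L >> 1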

It may be noted that there will also be some reflection even if E > Vb , or indeed from a well
rather than a barrier.
A.4 Series Solution of Hermite's equation and the Harmonic Oscillator
Shankar 7.3
Griffiths 2.3.2

We consider a particle moving in a 1D quadratic potential V(x) = ½mω²x², like a mass on a
spring. The Hamiltonian operator is

    Ĥ = p̂²/(2m) + ½mω²x̂².    (A.2)

We will work in rescaled dimensionless coordinates, defining the length scale x0 = √(ħ/mω), so
x̂ → x0 y and p̂ → (−iħ/x0) d/dy. The energy scale is ½mω²x0² = ½ħω. We are looking for wave
functions φ(y), of energy E = ½ħωℰ, which satisfy

    −d²φ/dy² + y²φ = ℰφ.    (A.3)

If we write φ(y) ≡ f(y)e^{−y²/2}, this can be rewritten as

    d²f/dy² − 2y df/dy + (ℰ − 1)f = 0.    (A.4)
This is Hermite's differential equation. If we look for a series solution of the form
f(y) = ∑_{j=0}^∞ cj y^j, we get

    ∑_{j=2}^∞ j(j−1)cj y^{j−2} − 2∑_{j=1}^∞ j cj y^j + (ℰ − 1)∑_{j=0}^∞ cj y^j = 0

    ⇒  ∑_{j=0}^∞ [(j+1)(j+2)c_{j+2} + (ℰ − 1 − 2j)cj] y^j = 0    (A.5)

where we have changed the summation index in the first sum before relabelling it j. The only
way a polynomial can vanish for all y is if all the coefficients vanish, so we have a recurrence
relation:
(j + 1)(j + 2)cj+2 + (E − 1 − 2j)cj = 0. (A.6)
Given c0 and c1 , we can construct all other coefficients from this equation, for any E. We can
obtain two independent solution, as expected for a second order differential equation: even
solutions with c1 = 0 and odd ones with c0 = 0.
However, we need the wave function to be normalisable (square integrable), which means that
it tends to 0 as x → ±∞. In general an infinite polynomial times a Gaussian will not satisfy
this, and these solutions are not physically acceptable. If we look again at equation (A.6),
though, we see that if ℰ = 1 + 2n for some integer n ≥ 0, then c_{n+2}, c_{n+4}, c_{n+6} . . . are all zero.
Thus for ℰ = 1, 5, 9 . . . we have finite even polynomials, and for ℰ = 3, 7, 11 . . . we have finite
odd polynomials. These are called the Hermite polynomials.
Rewriting (A.6) with ℰ = 1 + 2n as

    c_{j+2} = [2(j − n)/((j+1)(j+2))] cj,    (A.7)

we have for instance, for n = 5,

    c3 = 2(1−5)c1/(2·3) = −4c1/3,    c5 = 2(3−5)c3/(4·5) = −c3/5 = 4c1/15,    c7 = c9 = . . . = 0,    (A.8)

and H5(y) = c1(4y⁵ − 20y³ + 15y)/15. The conventional normalisation uses 2ⁿ for the coefficient
of the highest power of y, which would require c1 = 120, and H5(y) = 32y⁵ − 160y³ + 120y.
The first few are:

    H0(y) = 1;  H1(y) = 2y;  H2(y) = 4y² − 2;  H3(y) = 8y³ − 12y;  H4(y) = 16y⁴ − 48y² + 12.    (A.9)
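The recurrence (A.7) translates directly into code (a sketch, not part of the notes, compared against NumPy's built-in physicists' Hermite polynomials):

    import numpy as np
    from numpy.polynomial import hermite as H

    def hermite_coeffs(n):
        # Monomial coefficients of H_n from c_{j+2} = 2(j-n) c_j / ((j+1)(j+2))
        c = np.zeros(n + 1)
        c[n % 2] = 1.0                       # seed c0 (n even) or c1 (n odd)
        for j in range(n % 2, n - 1, 2):
            c[j + 2] = 2*(j - n)*c[j]/((j + 1)*(j + 2))
        return c*(2**n/c[n])                 # normalise: coefficient of y^n is 2^n

    print(hermite_coeffs(5))                 # [0, 120, 0, -160, 0, 32]
    print(H.herm2poly([0, 0, 0, 0, 0, 1.0])) # NumPy's H_5 in monomial form: the same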

The corresponding solutions of the original Hamiltonian, returning to unscaled coordinates, are

    φn(x) = (2ⁿ n!)^{−1/2} Hn(x/x0) × (πx0²)^{−1/4} exp(−x²/(2x0²)),

with energies En = (n + ½)ħω.


Just as in the square well, the restriction to solutions which satisfy the boundary conditions
has resulted in quantised energy levels.
The wave functions and probability densities are illustrated in section A.3.3.
A.5 Angular Momentum in Quantum Mechanics
The following section was prepared in another context and so does not use Dirac notation; I
would use |↑⟩ etc. for spin states. See section A.3.4 for angular momentum in the context of
the Schrödinger equation in spherical polar coordinates, including the forms of the differential
operators for the L̂i.

Orbital angular momentum


We start with the classical definition of orbital angular momentum. In quantum mechanics the
position and momentum vectors become operators, so

    L = r × p  ⇒  L̂z = −iħ(x ∂/∂y − y ∂/∂x) = −iħ ∂/∂φ  etc.;

    [L̂x, L̂y] = iħL̂z etc.;    [L̂², L̂i] = 0.

The commutation relations imply that we can only simultaneously know L2 and one component,
taken conventionally to be Lz . The common eigenfunctions of L̂2 and L̂z are the spherical
harmonics, Ylm (θ, φ):

    L̂² Yl^m(θ, φ) = ħ² l(l+1) Yl^m(θ, φ),    L̂z Yl^m(θ, φ) = ħm Yl^m(θ, φ)

From requirements that the wave function must be finite everywhere, and single-valued under
φ → φ + 2π, it emerges that l and m are integers and must satisfy

l = 0, 1, 2 . . . , m = −l, −l + 1, . . . l.

These have definite parity of (−1)l , since under r → −r,

Ylm (θ, φ) → Ylm (π − θ, φ + π) = (−1)l Ylm (θ, φ).

See the end of these notes for some explicit forms of spherical harmonics.

Intrinsic and total angular momentum


Orbital angular momentum is not the only source of angular momentum, particles may have
intrinsic angular momentum or spin. The corresponding operator is Ŝ. The eigenvalues of Ŝ2
have the same form as in the orbital case, ~2 s(s + 1), but now s can be integer or half integer;
similarly the eigenvalues of Ŝz are ~ms , with

    s = 0, ½, 1, 3/2, . . . ,    ms = −s, −s + 1, . . . , s.

s = ½ for an electron, s = 1 for a photon or W boson. This means that the magnitude of the
spin vector of an electron is (√3/2)ħ, but we always just say “spin-½”.
If a particle has both orbital and spin angular momentum, we talk about its total angular
momentum, with operator
Ĵ = L̂ + Ŝ.
As with spin, the eigenvalues of Ĵ2 are ~2 j(j + 1),

    j = 0, ½, 1, 3/2, . . . ,    mj = −j, −j + 1, . . . , j.

Systems composed of more than one particle (hadrons, nuclei, atoms) will have many contribu-
tions to their total angular momentum. It is sometimes useful to add up all the spins to give a
total spin, and now, confusingly, we denote the quantum numbers by S and MS , so it is really
important to distinguish operators and the corresponding quantum numbers. Then

Ŝtot = Ŝ(1) + Ŝ(2) + . . . ,

where the superscripts (1), (2) refer to the individual particles.


Similarly we use L̂tot with quantum numbers L and ML , and Ĵtot with quantum numbers J and
MJ . When talking about angular momentum generally, we often use Ĵ to refer to any angular
momentum, whether single or multiple particle, pure spin, pure orbital or a combination.
The following rules are obeyed by any angular momentum (eg Ĵ can be replaced by L̂ or Ŝ, for
a single particle of composite system):

[Jˆx , Jˆy ] = i~Jˆz etc; [Ĵ2 , Jˆi ] = 0;

It follows that the eigenvalues of (L̂tot)², (Ŝtot)² and (Ĵtot)² have exactly the same form, with
the same restrictions on the quantum numbers, as those for a single particle. So for instance
the eigenvalues of (Ŝtot)² are ħ²S(S + 1), and of Ŝz^tot are ħMS, and

    L = 0, 1, 2, . . . ,    S = 0, ½, 1, 3/2, . . . ,    J = 0, ½, 1, 3/2, . . . ,
    ML = −L, −L + 1, . . . , L,    MS = −S, −S + 1, . . . , S,    MJ = −J, −J + 1, . . . , J.

Addition of angular momentum


The rules for the addition of angular momentum are as follows: we start with adding orbital
angular momentum and spin for a composite system with quantum numbers L and S. Angular
momentum is a vector, and so the total can be smaller as well as greater that the parts; however
the z-components just add. The allowed values of the total angular momentum quantum
numbers are

J = |L − S|, |L − S| + 1, . . . , L + S, MJ = ML + MS .

However since L̂z and Ŝz do not commute with Ĵ2 , we cannot know J, ML and MS simultane-
ously. For a single-particle system, replace J, L, and S with j, l, and s.
More generally, for the addition of any two angular momenta with quantum numbers J1 , M1
and J2 , M2 , the rules are

J = |J1 − J2 |, |J1 − J2 | + 1, . . . , J1 + J2 , MJ = M1 + M2

and again we cannot know J, M1 and M2 simultaneously.


Confusingly, when referring to a composite particle (eg a hadron or nucleus), the total angular
momentum is often called its “spin” but given the quantum number J. Sometimes this usage
even extends to elementary particles. For the electron and proton, s is more common though.
For the case of a spin-½ particle, the eigenvalues of Ŝz are ±½ħ, and here we will just denote
these states by ↑ and ↓ (αz and βz are also often used); hence

    Ŝ²↑ = (3/4)ħ²↑,    Ŝ²↓ = (3/4)ħ²↓,
    Ŝz↑ = ½ħ↑,    Ŝz↓ = −½ħ↓.

For two such particles there are four states ↑↑, ↓↓, ↑↓ and ↓↑. The first two states have MS = 1
and −1 respectively, and we can show, using Ŝtot = Ŝ(1) + Ŝ(2) , that they are also eigenstates of
(Ŝtot )2 with S = 1. However the second two, though they have MS = 0, are not eigenstates of
(Ŝtot )2 . To make those, we need linear combinations, tabulated below:

              S = 1                     S = 0
    M = 1     ↑↑
    M = 0     (1/√2)(↑↓ + ↓↑)          (1/√2)(↑↓ − ↓↑)
    M = −1    ↓↓

The S = 1 states are symmetric under exchange of particles; the S = 0 states are antisymmetric.
For a system of N spin- 21 particles, S will be integer if N is even and half-integer if N is odd.
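The two-particle table is easy to verify in the four-dimensional product space (a sketch with NumPy; ħ = 1 and the basis is ordered ↑↑, ↑↓, ↓↑, ↓↓):

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    I2 = np.eye(2)

    # Total spin S = S(1) + S(2), with single-particle S_i = sigma_i/2
    S = [(np.kron(s, I2) + np.kron(I2, s))/2 for s in (sx, sy, sz)]
    S2 = sum(Si @ Si for Si in S)

    vals, vecs = np.linalg.eigh(S2)
    print(np.round(vals, 6))         # [0, 2, 2, 2]: S(S+1) for the singlet and triplet
    print(np.round(vecs[:, 0], 3))   # singlet: (up-down minus down-up)/sqrt(2), up to a phase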

Bosons and Fermions


Particles with half-integer spin (electrons, baryons) are called fermions, those with integer spin,
including J = 0, (mesons, photons, Higgs) are called bosons. The “Pauli exclusion principle”
applies to fermions, but it is a special case of the “spin-statistics theorem” which says that the
overall quantum state of a system of identical fermions must be antisymmetric under exchange
of any pair, while that of a system of identical bosons must be symmetric. There may be several
components to the state (spatial wave function, spin state...).
Examples of the consequences of the spin-statistics theorem are:

• If two electrons in an atom are in the same orbital (thus their spatial wave function is
symmetric under exchange of the two), they must be in an S = 0 state.

• Thus the ground state of helium has S = 0, but the excited states can have S = 0
(parahelium) or S = 1 (orthohelium).

• Two π 0 mesons must have even relative orbital angular momentum L (they are spinless,
so this is the only contribution to their wave function).

• Two ρ0 mesons (spin-1 particles) can have odd or even relative orbital angular momentum
L, but their spin state must have the same symmetry as their spatial state. (In this case,
S = 2 and 0 are even, S = 1 is odd.)

Note that in the last two, in the centre-of-momentum frame the spatial state only depends on
the relative coordinate r. So interchanging the particles is equivalent to r → −r, ie the parity
operation.
Spherical Harmonics
In spherical polar coordinates the orbital angular momentum operators are

    L̂x = ½(L̂+ + L̂−)  and  L̂y = (1/2i)(L̂+ − L̂−),  where

    L̂+ = ħe^{iφ}(∂/∂θ + i cot θ ∂/∂φ),    L̂− = L̂+† = ħe^{−iφ}(−∂/∂θ + i cot θ ∂/∂φ);

    L̂z = −iħ ∂/∂φ,    L̂² = −ħ²[(1/sin θ) ∂/∂θ(sin θ ∂/∂θ) + (1/sin²θ) ∂²/∂φ²];

    ∇²ψ = (1/r) ∂²(rψ)/∂r² − (1/(ħ²r²)) L̂²ψ.

The spherical harmonics Yl^m(θ, φ) are eigenfunctions of L̂² and L̂z; the first few are as follows:

    Y0^0(θ, φ) = √(1/(4π))                         Y1^{±1}(θ, φ) = ∓√(3/(8π)) sin θ e^{±iφ}
    Y1^0(θ, φ) = √(3/(4π)) cos θ                   Y2^{±2}(θ, φ) = √(15/(32π)) sin²θ e^{±2iφ}
    Y2^{±1}(θ, φ) = ∓√(15/(8π)) sin θ cos θ e^{±iφ}    Y2^0(θ, φ) = √(5/(16π)) (3cos²θ − 1)
A.6 Hydrogen wave functions
The solutions of the Schrödinger equation for the Coulomb potential V(r) = −ħcα/r have
energy En = −ERy/n², where ERy = ½α²mc² = 13.6 eV (with m the reduced mass of the
electron-proton system). (Recall α = e²/(4πε0ħc) ≈ 1/137.) The spatial wavefunctions are
ψnlm(r) = Rn,l(r) Yl^m(θ, φ).
The radial wavefunctions are as follows, where a0 = ħc/(mc²α):

 
    R1,0(r) = (2/a0^{3/2}) exp(−r/a0),

    R2,0(r) = (2/(2a0)^{3/2}) (1 − r/(2a0)) exp(−r/(2a0)),

    R2,1(r) = (1/(√3 (2a0)^{3/2})) (r/a0) exp(−r/(2a0)),

    R3,0(r) = (2/(3a0)^{3/2}) (1 − 2r/(3a0) + 2r²/(27a0²)) exp(−r/(3a0)),

    R3,1(r) = (4√2/(9 (3a0)^{3/2})) (r/a0) (1 − r/(6a0)) exp(−r/(3a0)),

    R3,2(r) = (2√2/(27√5 (3a0)^{3/2})) (r/a0)² exp(−r/(3a0)).
They are normalised, so ∫₀^∞ (Rn,l(r))² r² dr = 1. Radial wavefunctions of the same l but different
n are orthogonal (the spherical harmonics take care of orthogonality for different l's).
The following radial integrals can be proved:

    ⟨r²⟩ = (a0² n²/2)(5n² + 1 − 3l(l + 1)),
    ⟨r⟩ = (a0/2)(3n² − l(l + 1)),
    ⟨1/r⟩ = 1/(n² a0),
    ⟨1/r²⟩ = 1/((l + ½) n³ a0²),
    ⟨1/r³⟩ = 1/(l(l + ½)(l + 1) n³ a0³).

For hydrogen-like atoms (single-electron ions with nuclear charge |e| Z) the results are obtained
by substituting α → Zα (and so a0 → a0 /Z).
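These results can be checked numerically (a sketch with SciPy, in units where a0 = 1, using R3,1 as an example):

    import numpy as np
    from scipy.integrate import quad

    # R_{3,1} with a0 = 1
    R31 = lambda r: (4*np.sqrt(2)/(9*3**1.5))*r*(1 - r/6)*np.exp(-r/3)

    norm, _ = quad(lambda r: R31(r)**2*r**2, 0, np.inf)
    r_av, _ = quad(lambda r: R31(r)**2*r**3, 0, np.inf)

    print(norm)    # 1.0: normalised
    print(r_av)    # 12.5 = (1/2)(3*3^2 - 1*2), i.e. <r> for n = 3, l = 1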
A.7 Properties of δ-functions
The δ-function is defined by its behaviour in integrals:

    ∫_a^b δ(x − x0) dx = 1;    ∫_a^b f(x) δ(x − x0) dx = f(x0),

where the limits a and b satisfy a < x0 < b; the integration simply has to span the point on
which the δ-function is centred. The second property is called the sifting property because
it picks out the value of f at x = x0.
The following equivalences may also be proved by changing variables in the corresponding
integral (an appropriate integration range is assumed for compactness of notation):

    δ(ax − b) = (1/|a|) δ(x − b/a),    since    ∫ f(x) δ(ax − b) dx = (1/|a|) f(b/a);

    δ(g(x)) = ∑_i δ(x − xi)/|g′(xi)|,    where the xi are the (simple) real roots of g(x).

Note that the dimensions of a δ-function are the inverse of those of its argument, as should be
obvious from the first equation.
Though the δ-function is not well defined as a function (technically it is a distribution rather
than a function), it can be considered as the limit of many well-defined functions. For instance
the “top-hat” function which vanishes outside a range a and has height 1/a tends to a δ-
function as a → 0. Similarly a Gaussian with width and height inversely proportional tends to
a δ-function as the width tends to zero. These are shown in the first two frames below.

Two less obvious functions which tend to a δ-function, shown in the next two frames, are the
following:

    (1/2π) ∫_{−L}^{L} e^{i(k−k′)x} dx = (L/π) sinc((k − k′)L)  →  δ(k − k′)  as L → ∞;

    (L/π) sinc²((k − k′)L)  →  δ(k − k′)  as L → ∞.

The first of these does not actually vanish away from the peak, but it oscillates so rapidly
that there will be no contribution to any integral over k′ except from the point k′ = k. This
is the integral which gives the orthogonality of two plane waves with different wavelengths:
⟨k|k′⟩ = δ(k − k′). It also ensures that the inverse Fourier transform of a Fourier transform
recovers the original function.
That the normalisation (for integration over k) is correct follows from the following two integrals:
∫_{−∞}^{∞} sinc(t) dt = π and ∫_{−∞}^{∞} sinc²(t) dt = π. The second of these follows from the first via
integration by parts. The integral ∫_{−∞}^{∞} sinc(t) dt = Im I, where I = ∫_{−∞}^{∞} (e^{it}/t) dt, may be done
via the contour integral below:
As no poles are included by the contour, the full contour integral is zero. By Jordan’s lemma
the integral round the outer circle tends to zero (as R → ∞, eiz decays exponentially in the
upper half plane). So the integral along the real axis is equal and opposite to the integral over
the inner circle, namely −πi times the residue at x = 0, so I = iπ. So the imaginary part, the
integral of sinc(x), is π.
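Both integrals can also be confirmed symbolically (a quick sketch with SymPy):

    import sympy as sp

    t = sp.symbols('t', real=True)
    print(sp.integrate(sp.sin(t)/t, (t, -sp.oo, sp.oo)))         # pi
    print(sp.integrate((sp.sin(t)/t)**2, (t, -sp.oo, sp.oo)))    # pi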
A.8 Gaussian integrals
The following integrals will be useful:

    ∫_{−∞}^{∞} e^{−αx²} dx = √(π/α)    and    ∫_{−∞}^{∞} x^{2n} e^{−αx²} dx = (−1)ⁿ (dⁿ/dαⁿ) √(π/α).

These work even for complex α, so long as Re[α] ≥ 0.


Often we are faced with a somewhat more complicated integral, which can be cast in Gaussian
form by “completing the square” in the exponent and then shifting integration variable x →
x − β/(2α):

    ∫_{−∞}^{∞} e^{−αx² − βx} dx = e^{β²/(4α)} ∫_{−∞}^{∞} e^{−α(x+β/(2α))²} dx = √(π/α) e^{β²/(4α)}.

This works even if β is imaginary.
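A quick numerical check, including an imaginary β (a sketch with SciPy; the parameter values are arbitrary):

    import numpy as np
    from scipy.integrate import quad

    alpha, beta = 1.3, 0.7j      # imaginary beta, as in the Fourier transform of a Gaussian

    f = lambda x: np.exp(-alpha*x**2 - beta*x)
    re, _ = quad(lambda x: f(x).real, -np.inf, np.inf)
    im, _ = quad(lambda x: f(x).imag, -np.inf, np.inf)

    print(re + 1j*im)                                       # numerical integral
    print(np.sqrt(np.pi/alpha)*np.exp(beta**2/(4*alpha)))   # closed form: they agree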
The two contours below illustrate the two results for complex parameters α or β. For the first,
in (a), we rewrite αx2 as |α|z 2 where z = x exp(iArg [α]/2), so the integral we want is along
the blue line, with R → ∞. Since there are no poles, by Cauchy’s theorem the integral along
the blue contour must equal the sum of those along the red and black contours. As R → ∞
2
the red one gives the known real integral. Since e−|α|z tends to zero faster than 1/R as R → ∞
providing |x| > |y|, the contribution from the black paths is zero as R → ∞. Hence the red
and blue integrals are the same, provided Arg [α] ≤ π/2.
For the second, in (b), the blue contour is the desired integral one after the variable change
(for β imaginary). Again the red and black paths together must equal the blue and again the
contribution from the black paths is zero. Hence the two integrals must be the same.

(Figure: the two integration contours in the complex z plane for cases (a) and (b).)
A.9 Airy functions
Airy functions are the solutions of the differential equation:

    d²f/dz² − zf = 0.
There are two solutions, Ai(z) and Bi(z); the first tends to zero as z → ∞, while the second
blows up. Both are oscillatory for z < 0. The Mathematica functions for obtaining them are

AiryAi[z] and AiryBi[z].


The asymptotic forms of the Airy functions are:

    Ai(z) → e^{−(2/3)z^{3/2}} / (2√π z^{1/4})  as z → ∞,    Ai(z) → cos((2/3)|z|^{3/2} − π/4) / (√π |z|^{1/4})  as z → −∞;

    Bi(z) → e^{(2/3)z^{3/2}} / (√π z^{1/4})  as z → ∞,    Bi(z) → cos((2/3)|z|^{3/2} + π/4) / (√π |z|^{1/4})  as z → −∞.

The Schrödinger equation for a linear potential V(x) = βx in one dimension can be cast in the
following form:

    −(ħ²/2m) d²ψ/dx² + βxψ − Eψ = 0.

Defining z = x/x0, with x0 = (ħ²/(2mβ))^{1/3}, and E = (ħ²β²/(2m))^{1/3} µ, and with y(z) ≡ ψ(x),
this can be written

    d²y/dz² − zy + µy = 0.

The solution is

    y(z) = C Ai(z − µ) + D Bi(z − µ),  or  ψ(x) = C Ai((βx − E)/(βx0)) + D Bi((βx − E)/(βx0)),

where D = 0 if the solution has to extend to x = ∞. The point z = µ, x = E/β is the point
at which E = V and the solution changes from oscillatory to decaying / growing.
The equation for a potential with a negative slope is given by substituting z → −z in the
defining equation. Hence the general solution is ψ(x) = C Ai(−x/x0 − µ) + D Bi(−x/x0 − µ),
with D = 0 if the solution has to extend to x = −∞.
The first few zeros of the Airy functions are given in Wolfram MathWorld.
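As an application, for a linear potential with a hard wall (V = βx for x > 0, V = ∞ for x < 0), ψ(0) = 0 forces Ai(−µ) = 0, so the energies follow directly from those zeros (a sketch with SciPy; this “quantum bouncer” example is not in the notes above):

    from scipy.special import ai_zeros

    # psi(x) = C Ai(x/x0 - mu) with Ai(-mu) = 0, so mu_n = -a_n, where a_n are
    # the (negative) zeros of Ai, and E_n = mu_n (hbar^2 beta^2/(2m))^{1/3}
    a_n, _, _, _ = ai_zeros(4)
    print(-a_n)    # [2.338, 4.088, 5.521, 6.787]: E_n in units of (hbar^2 beta^2/(2m))^{1/3}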
A.10 Units in EM
There are several systems of units in electromagnetism. We are familiar with SI units, but
Gaussian units are still very common and are used, for instance, in Shankar.
In SI units the force between two currents is used to define the unit of current, and hence the
unit of charge. (Currents are much easier to calibrate and manipulate in the lab than charges.)
The constant µ0 is defined as 4π × 10⁻⁷ N A⁻², with the magnitude chosen so that the Ampère
is a “sensible” sort of size. Then Coulomb's law reads

    F = q1 q2 / (4πε0 |r1 − r2|²)

and ε0 has to be obtained from experiment. (Or, these days, as the speed of light now has a
defined value, ε0 is obtained from 1/(µ0c²).)
However one could in principle equally decide to use Coulomb’s law to define charge. This is
what is done in Gaussian units, where by definition
    F = q1 q2 / |r1 − r2|².

Then there is no separate unit of charge; charges are measured in N^{1/2} m (or the non-SI equiv-
alent): e = 4.803 × 10⁻¹⁰ g^{1/2} cm^{3/2} s⁻¹. (You should never need that!) In these units,
µ0 = 4π/c². Electric and magnetic fields are also measured in different units.
The following translation table can be used:

    Gauss:   e             E              B
    SI:      e/√(4πε0)     √(4πε0) E      √(4π/µ0) B

Note that eE is the same in both systems of units, but eB in SI units is replaced by eB/c in
Gaussian units. Thus the Bohr magneton µB is eħ/2m in SI units, but eħ/2mc in Gaussian
units, and µB B has dimensions of energy in both systems.
The fine-structure constant α is a dimensionless combination of fundamental units, and as such
takes on the same value (≈ 1/137) in all systems. In SI it is defined as α = e²/(4πε0ħc), in
Gaussian units as α = e²/(ħc). In all systems, therefore, Coulomb's law between two particles
of charge z1e and z2e can be written

    F = z1 z2 ħcα / |r1 − r2|²

and this is the form I prefer.
