Classical Mechanics

Physics 6210/Spring 2007/Lecture 1
Lecture 1
Relevant sections in text: 1.1
What is a theory?
Students often consider quantum mechanics to be rather strange compared to other
theories they have encountered (e.g., classical mechanics, electromagnetic theory). And
I guess it is a little unsettling to find position and momentum being represented as differential operators, particles being described by complex waves, dynamics being defined
by the Schrodinger equation, measurements characterized by probabilities, and so forth.
In particular, the rules of quantum mechanics, at first sight, seem rather strange compared to the rules of Newtonian mechanics (essentially just Newtons 3 laws). But a lot
of the intuitive appeal of Newtons laws comes from years of familiarity. Unfortunately,
the strangeness of quantum mechanics, largely bred from lack of familiarity, often leads
to the feeling that the subject is very difficult. Of course, the subject is not easy; nor
should it be - it is one our most advanced descriptions of nature! But even if the basics
seem a little difficult, it is only because one is having to use new mathematical models for
familiar physical structures. After all, wasnt it a little unsettling when you first started
using vectors in a systematic way to describe displacements, velocities, and so forth? (Can
you remember that far back?) So, I would like to begin by emphasizing that, in many
ways, quantum mechanics is a theory like any other. To do this I must give a very coarse
grained description of what a theory is in general, and quantum mechanics in particular.
Of course, the devil and the physics is in the details.
Essentially, a physical theory is a set of rules (i.e., postulates) that can be used to
explain and/or predict the behavior of the world around us in particular, the outcome
of experiments. Which experiments can be explained, how to characterize the physical
ingredients in these experiments, what information needs to be specified in advance, etc.
are part of the rules of the theory. By the way, keep in mind that a theory can never be
said to be true, but only that it agrees with experiment. It is always possible that the
next experiment will falsify the theory. And it is possible that more than one theory will be
able to explain a given set of results. Insofar as these results are all we have experimental
access to, the theories are equally good. Usually, though, there is one theory that explains
the most results with the simplest postulates. This theory is usually considered the
best. So, while classical mechanics can be used to explain a wide variety of macroscopic
observations, quantum mechanics can also explain these results and a host of other results
from microscopic physics (atomic physics, nuclear physics, etc. ) that cannot readily be
explained using classical mechanics.
It should be emphasized that the word theory has a number of connotations, and this
1
can be confusing. For example, quantum mechanics is a theory, but the use of quantum
mechanics to model the hydrogen atom as, say, a non-relativistic electron moving in a fixed
Coulomb field is also a theory in some sense a theory of the hydrogen atom. Clearly
these two notions of theory have somewhat different logical standings in the sense that
one can build a variety of theories of the hydrogen atom (by adding, e.g., spin-orbit
coupling, special relativity, finite-size nucleus, etc. ) within the framework of the theory of
quantum mechanics. Given this slightly confusing state of affairs, I will try (but may fail)
to call quantum mechanics a theory, and I will call models the various theories built
from quantum mechanics (of, e.g., the hydrogen atom).
What are some successful physical theories? There are many, of course. Some examples are: Newtons theory of matter and its interactions, valid at large length scales,
weak gravitational fields, and small velocities (classical mechanics); Maxwells theory
of the electromagnetic field and its interaction with charged sources; Einsteins theory
of gravitation; and, of course, the theory of matter and its interactions at small length
scales, which we call quantum mechanics, along with its descendant, quantum field theory. One confusing feature of all this is that theories really come to us in overlapping
hierarchies. For example, using the classical Maxwell theory of electrodynamics we can
create a theory of atoms as bound states of charges. These theories are, ultimately,
incorrect (being classical). A correct theory of electromagnetic phenomena in general,
and atoms in particular, arises via quantum electrodynamics in which the theory of quantum mechanics (better: quantum field theory) is melded with Maxwells theory and then
used to build a theory of interacting charges, atoms, etc. Thus we can discuss the physical
theory called quantum mechanics and using the framework defined by this theory we
can build theories of various physical phenomena (e.g., crystalline solids). The theory
of the phenomena can be wrong without quantum mechanics being wrong, or perhaps one
is unable to build a satisfactory theory of the phenomena owing to a failure of the parent
theory (quantum mechanics). Similar comments apply to Einsteins theories of relativity
and the various physical theories that are formulated in the context of Einsteins relativity.
Here again, we are drawing a conceptual distinction between the idea of a theory and a
parcticular model built within the confines of that theory.
After all these general, philosophical-sounding statements it is time to get down to
business. What does it mean to have a set of rules that can be used to explain and/or
predict the outcome of experiments? Of course useful theories are necessarily somewhat
intricate, but taking a very coarse-grained view of things, the structure of a generic theory
can be characterized in a pretty simple way. Basically, one can view a theory as a way of
describing observables, states, and dynamics. Below we will describe each of these three.
As you know, mathematics has, for some centuries now, been the language and tool of
choice in building a physical theory. Consequently, the initial job in this course will be to
2
give a mathematical representation for these 3 basic ingredients.
Observables
Measurable aspects of the experimentally accessible world are the observables. Any
theory is to provide a means of assigning a mathematical representation to the observables.
By specifying the observables, we are going a long way toward specifying the kinds of
physical situations that our theory is meant to cover. Quantum mechanics postulates a
universal ground rule for observables: they must be self-adjoint operators on a Hilbert
space. The way in which we implement this rule may vary from physical model to physical
model.
For example, using the theory of Newtonian mechanics we can build a model of a particle in which some important observables are position, momentum, angular momentum,
energy, etc. In fact, our usual model of a (Newtonian) point particle supposes that all
observables can be viewed as functions of position and momentum, which we can call the
basic observables. Mathematically, the basic observables for a single particle in Newtonian
theory are represented by 6 numbers, (x, y, z, px , py , pz ), that is, the observables are functions on the six-dimensional phase space. Normally, these 6 numbers are actually viewed,
mathematically speaking, as a pair of vectors. The behavior of a particle is documented
by monitoring the behavior of its observables and we build our theory using the mathematical representation (e.g., vectors) for these quantities. Other quantities, like mass, electric
charge, and time are in some sense observable too, but in Newtonian mechanics these
quantities appear as parameters in various equations, not as quantities which one measures
to document the behavior of the system. In other words, while we may consider the way
in which the position of a particle changes, we normally dont include in our model of a
particle a time-varying mass or electric charge. Of course, more sophisticated models
of matter may attempt to give a better, or more fundamental description of mass and
electric charge in which these quantities become observables in the sense described above.
As another example, consider the electromagnetic field as it is described in Maxwells
theory. Observables are, of course, the electric and magnetic field strengths. Also we
have, polarization of waves, energy density, momentum density, etc. All electromagnetic
observables (in Maxwells theory) are built from electric and magnetic field strengths,
which are the basic observables of the theory. Mathematically, they are represented as
vector fields. Another measurable quantity is the speed of light c. Like mass and charge
in Newtonian mechanics, this quantity appears as a parameter in Maxwells theory and
is not something that can change with the configuration of the system. We dont use the
term observable for the speed of light in the language we are developing.
As my final example, consider the theory of bulk matter known as statistical mechan3
ics. This theory has a number of similarities with quantum mechanics. For now, let us
note that some of the observables are things like free energy, entropy, critical exponents,
etc. All of these quantities can be computed from the partition function, which is, in some
sense, the basic observable. Of course, statistical mechanics is normally built from classical
and/or quantum mechanics in which case the partition function itself is built from more
basic observables. By the way, temperature is, of course, a measurable quantity. But it
plays the role of a parameter in statistical mechanics in a canonical ensemble and so in this
case temperature is treated as a parameter just like mass and electric charge in classical
electrodynamics.
So, we see that different physical phenomena require different kinds of observables, and
different theories use different mathematical representations for the observables. One of
our two main goals this semester is to get a solid understanding of how quantum mechanics
represents observables.
Let us remark that in all of our examples indeed, in most theories the observable
called time enters as an adjustable parameter. It is normally not modeled as an observable in the same sense as we model, say, the energy of a particle. In quantum mechanics
time is not an observable in the sense described above.
States
A physical system can be in a variety of configurations, that is, it can display a
variety of possible values for its observables. When we speak of the state of the system
we are referring to a mathematical object which can completely characterize the outcomes
of all possible measurements of the observables of the system. Thus the mathematical
representation of state is intimately tied to the representation of observables. Let us
illustrate the notion of state with our previous examples.
In the Newtonian mechanics of a particle, we can determine the state of the system
by measuring the basic observables (the coordinates and momenta). Indeed, all other
observables are functions of these observables (and the time). In electromagnetic theory,
the state of affairs is quite similar. In source-free electrodynamics, the electric and magnetic
fields at one time determine all electromagnetic observables and the state of the system.
In these two examples the notions of state and observables are closely intertwined, which
gives the impression that they are really the same thing. But this is not always the case. In
statistical mechanics we have a more subtle way of defining the state. To be explicit, let us
focus on classical statistical mechanics of a system of particles. From the point of view
of classical mechanics, the underlying, basic observables can be viewed as the coordinates
and momenta of all the particles making up the system (the phase space) and specifying
a point in phase space determines the state of the system. In statistical mechanics the
4
state is specified by giving a probability distribution on the phase space (which is a single
function), rather than a point in phase space as is done in Newtonian mechanics. From the
probability distribution one can compute all observables of statistical mechanics. We see
that given the state of the system, all observables are determined, but the way we specify
the state can be rather different.
To get a better handle on the distinction between states and observables, you can think
as follows. The state of a system reflects the way it has been prepared, which normally
is a reflection of the particular initial conditions used. Given a particular preparation
procedure (performed by various measurements and/or filtering processes) the system will
behave in a particular indeed, unique way, as reflected in the behavior of its observables.
A physical model for a system principally involves an identification of the observables
needed to describe the system. This is done once and for all. The states of the system
represent various ways the system can be started off and can be adjusted by experimental
procedures.
Dynamics
The measured values of observables of a system will usually change in time. Normally, a
theory will contain a means of describing time evolution of the system, that is, a dynamical
law, or a law of motion. Assuming that we use a time-independent mathematical model
for the observables, we can view dynamics as a continuous change (in time) of the state
of the system according to some system of equations. This way of formulating dynamics
is what is often called the Schr
odinger picture of dynamics, and a famous example of the
dynamical law is provided by the Schr
odinger equation. In statistical mechanics, the state
of the system is determined by a probability distribution on phase space. This distribution
evolves in time according to the Liouville equation.
In classical mechanics and electrodynamics, the state of the system is known once one
specifies the values of the basic observables. For example, if you give the positions and
velocities of a Newtonian particle at an instant of time, these quantities will be uniquely
determined for all time by a dynamical law (i.e., Newtons second law). In these theories
one can therefore think of dynamics as a time evolution in the value of the observables
according to some system of equations (F = ma, Maxwell equations). One important
aspect of dynamics that is usually incorporated in any theory is a very basic notion of
causality. In the Schrodinger picture, given the state of the system at one time, the
dynamical law should determine the state uniquely at any other time. Granted this, you
see that the state at a given time (along with the dynamical law - which is part of the
specification of the theory) will determine the outcomes of all measurements at any time.
To summarize: A theory requires (1) A mathematical representation of observables;
5
(2) A mathematical representation of states and a prescription for determining the values
of the observables the physical output of the theory from any given state; (3) A
specification of a dynamical law, which tells us how to extract physical output as a function
of time. Our goal this semester will be to see how quantum mechanics takes care of (1),
(2), and (3) and uses them to build models of a number of physical systems.
A Word of Caution
One prejudice that arises from, say, classical mechanics that must be dispelled is as
follows. In classical mechanics, knowing the state is the same as fixing the values of all
observables at one time. So, if we know the state of a Newtonian particle at one time,
we know the values of its coordinates and momenta and every other observable. Other
theories may be set up differently and this kind of result need not apply. For example,
in quantum mechanics (and in classical statistical mechanics), the state of the system will
provide probability distributions for all observables. One may completely determine/specify
the state by assigning values to some of the observables (the energy of a simple harmonic
oscillator is 5 ergs with probability one), but this may leave some statistical uncertainty in
other observables. As we shall see, for example, (roughly speaking) specifying the position
of a particle will completely determine the state of the particle in quantum mechanics. This
state will allow for a large statistical uncertainty (a very broad probability distribution)
for momentum. Likewise, specifying the energy of a particle will, in general, imply a
statistical uncertainty in the values of position and momentum.
Stern-Gerlach experiment
We now describe an experiment conducted by Stern and Gerlach in the early 1920s.
It gives us a valuable demonstration of the kind of phenomenon that needs quantum
mechanics to explain it. It also provides an example of what is probably the simplest
possible quantum mechanical model. As you probably know, this experiment involves
the property of particles known (perhaps misleadingly) as their spin, which is an intrinsic
angular momentum possessed by the particles. Note, though, at the time of the experiment,
neither intrinsic spin nor quantum mechanics was very well understood! Our goal in
studying this important experiment is to introduce the basic rules of quantum mechanics
in what is probably the simplest possible mathematical setting.
The Stern-Gerlach experiment amounts to passing a beam of particles through a region
with a magnetic field which has the same direction everywhere, but a varying magnitude.
Recall that the classical potential energy of interaction of a magnetic moment with a
magnetic field B is B. Thus the force (as opposed to the torque) exerted on the
6
magnetic moment is non-zero if the magnetic field varies in space. We have

F = ( B).
If the magnetic field varies only in one direction, then the force is parallel or anti-parallel
to that direction, depending upon the orientation of the magnetic moment relative to the
fixed direction of B. Thus the force experienced by the magnetic moment allows us to
measure the component of along the direction of B; we simply have to observe which
way the particles are deflected. Of course, this reasoning was purely classical, so we are
initially confident it will work with macroscopic objects, but it can be justified/explained
quantum mechanically. For now we just note that the correct quantum model of the
interaction of a magnetic moment with a magnetic field does indeed use the potential
energy function shown above, and since the atom is sufficiently massive its motion can be
well approximated using a classical mechanics description. Here we have a nice example
of how classical, macroscopic reasoning gives us important clues to modeling a quantum
mechanical, microscopic system.
Stern and Gerlach passed a beam of silver atoms through such an apparatus. These
atoms have a magnetic moment (thanks to the electron spin) and so are deflected by the
apparatus. Based upon a classical model of the atoms magnetic moment as coming from
motion of the charge distribution, and given that the atoms in the beam have random
orientations of their magnetic moments, one expects a continuous deflection of the beam,
reflecting a continuous spread in the projections of the magnetic moment along the inhomogeneity axis. Instead, what was observed was that the beam splits into two parts. The
explanation for this phenomenon is that the magnetic moment vector of the atom is due
to the presence of an electron which carries intrinsic angular momentum S its spin
and hence a magnetic moment. (Here is proportional to S.) The electron is in an
atomic state which does not have any orbital angular momentum, so its motion about the
atom does not provide a contribution to the magnetic moment.* In this (initially bizarre)
explanation, the electron can be in one of two spin states relative to any given direction.
More precisely, the projection of the spin of an electron along any axis can only take two
values (
h/2). Such particles are said to have spin 1/2, an intrinsic (state-independent)
property of the electron. If each atom is randomly selected, one expects that these two
alternatives occur with 50-50 probability, and this is what is observed. Half the beam gets
deflected in each direction. To explain this discreteness indeed, two-valued nature of
an angular momentum is one challenge faced by any putative theory. But there is much
more...
Using Stern-Gerlach apparatuses, we can measure any component of the magnetic
moment equivalently, the spin vector of a spin 1/2 particle by aligning the SG magnetic
* Of course, there are 47 electrons in a silver atom. However 46 of them are in a state with
no net angular momentum and hence no net contribution to the magnetic moment.
7
field direction along the axis of interest, then passing the beam of particles through and
seeing which way the particles are deflected, corresponding to spin up or down along
that direction. Let us therefore try to model the behavior of the spin vector S in such an
experiment, ignoring all the other degrees of freedom that the atoms might have. Thus
the atom is modeled as a spin 1/2 particle. Let us call an SG apparatus that measures
the spin along an axis characterized by a unit vector n SGn . Thus the apparatus SGn
measures S n. The empirical fact is that if you measure S n you always get
h/2. Let us
pass a beam of spin 1/2 particles through SGn and keep, say, only the particles that deflect
according to S n having the value + h2 . If we pass this filtered beam through another such
SGn /filter device we see that 100% of the beam passes through. We say that we have
determined the spin along n with certainty for all the particles in the filtered beam. We
model this situation by saying that all the particles in the (filtered) beam are in the state
|S n, +i. We can say that we have prepared many particles all in the same state by
passing a beam of particles through an SG apparatus and only keeping those deflected up
or down.
Suppose we pass a beam through the apparatus SGz and only keep one spin projection.
We now have many electrons prepared in the state |Sz , +i. Let us try to pin down the
value of Sx that these electrons possess. Pass the beam (all particles in the state |Sz , +i)
through another Stern-Gerlach apparatus SGx . Particles are now deflected according to
the projection of their magnetic moments (or spin vectors) along the x direction. What
you find in this experiment is that the beam splits in half. This is perfectly reasonable; we
have already decided that any component of the spin has just two projections along any
given axis. Since there is nothing special about the x or z directions; we should get similar
behavior for both. In the SGz filtered beam we did not prepare Sx in any special way,
so it is not too surprising that we get the beam to split in half.
Let us continue our investigation as follows. We have passed our beam through SGz
and kept the spin up particles. We then pass these spin up particles through SGx ; let
us focus on the beam that gave h
/2 for the Sx measurement. Therefore, roughly half of
the beam that entered the SGx apparatus is kept, and we now have 1/4 of the original
particles left in our prepared beam. After this filtering process we can, if we like, verify
that a repeated filtering process with apparata SGx keeps all the beam intact - evidently
the state of the particles could be represented by |Sx , +i.*
Now we have a beam of electrons that have been measured to have the following
properties (1) Sz is +h/2, (2) Sx is +h/2. Given (1) and (2) above, it is reasonable to
* Of course, one naturally prefers to write the state as something like |Sz , +; Sx , +i, but we
shall see that this is not appropriate.
We could now go and measure Sy in this doubly filtered beam; you will find that half the
beam has spin up along y, half has spin down (exercise). But let us not even bother with
this.
8
believe that the electrons all have definite values for Sz and Sx since we have filtered out
the only other possibilities. This point of view is not tenable. Suppose you go back to
check on the value of Sz . Take the beam that came out of SGz with value h/2 and then
SGx with the value +h/2 and pass it through SGz again. You may expect that all of the
beam is found to have a value +h/2 for Sz , but instead you will find that beam splits in
two! This is despite the fact that we supposedly filtered out the spin down components
along z.
So, if you measure Sz and get, say, h/2, and then you measure it again, you will get h/2
with probability one (assuming no other interactions have taken place). If you measure
Sz and get, say, h/2, then measure Sx and then measure Sz , the final measurement will
be
h/2 with a 50-50 probability. This should get your attention: the values that you
can get for the observable Sz in two measurements depends upon whether or not you have
determined the value of Sx in between the Sz measurements.
Given this state of affairs, it is hard to make sense of the classical picture in which one
imagines the electron to have given, definite values of all its observables, e.g., Sx and Sz .
One sometimes says that the measurement of Sx has somehow disturbed the value of
Sz . This point of view is not incorrect, but is not a perfect description of what is going
on. For example, as we shall see, the quantum mechanical prediction is unambiguously
independent of the way in which we make the measurements. Nowhere do we really need
to know how the SG devices worked. Moreover, the disturbance in Sz due to the Sx
measurement is not a function of how carefully we make the Sx measurement, that is,
one cannot blame the strange behavior as coming from some experimental error, the
measurements can, ideally, be perfect and we still get the same result. The fact of the
matter is that one shouldnt think of observables (such as Sz and Sx ) has having given,
fixed, values that exist in the object of interest. This may be philosophically a bit sticky
(and psychologically a bit disturbing), but it seems to be quite alright as a description of
how nature actually works.
If all this seems perfectly reasonable to you, then you probably dont understand it
too well. Our macroscopic experience with matter just doesnt give any hint that this is
the way nature works.
Electrons (and other elementary particles) are not like tiny baseballs following classical
trajectories with tiny spin angular momentum arrows attached to them, and there is no
reason (experimentally) to believe that they are. It is a purely classical prejudice that
a particle has definite values for all observables that we can measure. Try to think this
way: what is a particle? It has mass, (total) spin, charge, etc. and other intrinsic, real
properties that do not change with the state of the particle. Based upon experiment, one
may want to assign other observable properties such as position, energy, orbital angular
momentum, spin component along an axis to the particle. But according to experiment,
9
these properties change with the state of the particle and cannot be viewed as existing
in the particle independently of the measuring process (which changes the state). As it
turns out, according to the quantum mechanical explanation of this sort of phenomenon,
all you are guaranteed to be able to assign to a particle is probability distributions for
its various observables. Our next task is to build up the quantum mechanical model of the
spin 1/2 system using the rules of quantum mechanics.
10
Lecture 2
Quantum theory of spin 1/2
We now try to give a quantum mechanical description of electron spin which matches
the experimental facts described previously.
Let us begin by stating very briefly the rules of quantum mechanics. We shall show
what they mean as we go along. But it is best to know the big picture at the outset.
Rule 1
Observables are represented by self-adjoint operators on a (complex) Hilbert space H.
Rule 2
States are represented by unit vectors in H. The expectation value hAi of the observable
A in the state |i is given by the diagonal matrix element
hAi = h|A|i.
Rule 3
Time evolution is a continuous unitary transformation on H.
We will now use Rules 1-2 to create a model of a spin 1/2 particle. We will not need
Rule 3 for a while (until Chapter 2). We suppose that a spin 1/2 system is completely
described by its spin observable S, which defines a vector in 3-d Euclidean space. As such,
S is really a collection of 3 observables, which we label as usual by Sx , Sy , Sz , each of
which is to be a (self-adjoint) linear operator on a (Hilbert) vector space. We have seen
that the possible outcomes of a measurement of any component of S is
h/2. As we will
see, because the set of possible outcomes of a measurement of one these observables has
two values, we should build our Hilbert space of state vectors to be two-dimensional. A
two dimensional Hilbert space* is a complex vector space with a Hermitian scalar product.
Let us explain what all this means.
I wont bother with the formal definition of a vector space since you have no doubt
seen it before. You might want to review it. We denote the elements of the vector space
* A Hilbert space is a vector space with scalar product and a certain completeness requirement. As it happens, this completeness requirement is redundant for finite dimensional
vector spaces, so we dont need to worry with it just yet.
1
by the symbols |i, |i, etc. . Of course, they can be added to make new vectors, which
we denote by
|i + |i = |i.
We denote the set of scalars (complex numbers) by the symbols a, b, c, etc. Scalar multiplication of a vector |i by c is another vector and is denoted by
c|i |ic.
We denote the Hermitian scalar product of two vectors |i and |i by the notation
h|i = h|i .
The scalar product also satisfies
h|(a|i) h|a|i = ah|i.
If all this leaves you feeling a bit dazed, then your linear algebra background probably
needs strengthening. A quick fix is to study some simple examples to give meaning to the
symbols, which we shall do now.
Vector spaces elementary examples
Here are a couple of simple examples of the foregoing material.
First, consider the set of position vectors in ordinary space. They form a real vector
space with addition being defined component-wise or via the parallelogram rule. Scalar
multiplication is defined component-wise or by scaling of the length of the vector (and
reversing its direction if the scalar is negative). The scalar product of two vectors is just
the familiar dot product. Since this is a real vector space, the complex conjugation business
is trivial.
Second, and much more importantly for our present purposes, consider the complex
vector space C2 , the set of 2-tuples of complex numbers. Elements of this vector space
can be defined by column vectors with complex entries, e.g.,

a
|i
.
b
Addition is defined component-wise in the familiar way. Scalar multiplication is also defined
component-wise, by multiplication of each element of the column, e.g.,

ca
.
c|i
cb
2
The scalar product of two vectors,

|1 i
a1
b1

,
|2 i
a2
b2
= a1 a2 + b1 b2 .
is given by

h1 |2 i =
a1
b1

a2
b2

=
a1

a2
a2
b2
In each of these two examples you should verify the various abstract properties listed
above. Pay particular attention to the scalar product.
Note that we have refrained from literally identifying the vectors with the column
vectors. The reason for this is that we will be adopting the point of view that a column
vector is really defined as the components of a vector in a given basis. One can change
the basis and thereby change the column vector which corresponds to one and the same
vector. For now, you can safely ignore this subtlety and just think of the notation |i,
etc. as a fancy way to manipulate column and row vectors. Later we will tighten up the
interpretation of the notation.
As it happens, every complex 2-dimensional vector space can be expressed in the above
form (column vectors, etc. ). So the vector space used to model the spin 1/2 system has
this mathematical representation. We shall make use of this extensively in what follows.
Bras and Kets

We have already used the notation |i for the elements of a vector space. Of course,
we can call these things vectors, but it is also customary following Dirac to call them
kets, which we shall explain in a moment. Associated to every ket in a Hilbert space
there is another kind of vector which lives in a different dual vector space to H. We
call this dual vector space H and the elements of this space are called dual vectors or
bras (!). If you think of kets as column vectors, then the bras are row vectors and the
correspondence between a ket and bra is that induced by the Hermitian adjoint (complexconjugate-transpose). More generally, every vector |i determines a linear function F on
H, which is defined using the scalar product by
F (|i) = h|i.
(1)
The set of all linear functions on a vector space is itself a vector space the dual vector
space. For a Hilbert space the dual vector space can be identified with the original vector
space in the manner indicated above. Instead of F , we use the notation h| for the linear
3
function defined by (1). Thus a ket |i defines a bra h| which is a linear function on H
via
|i h|i.
Sometimes we write
h| = (|i) ,
which makes good sense if you are thinking in terms of column and row vectors. Note that
because the bras form a complex vector space they can be added and scalar multiplied as
usual. We use the obvious notation:
h| + h| = h|,
c(h|) = ch| = h|c,
and so forth. Note in particular, though, that the bra corresponding to the ket c|i involves
a complex conjugation:
(c|i) = h|c .
To see this we consider the linear function defined by |i = c|i. Evaluated on some vector
|i we get
h|i = h|i = (h|c|i) = c h|i = c h|i.
The origin of the terminology bra and ket comes from the pairing between vectors
and linear functions via the scalar product h|i, which uses a bracket notation. Get it?
This terminology was introduced by Dirac, and the notation we are using is called Diracs
bra-ket notation.
Lecture 3
Relevant sections in text: 1.2, 1.3
Spin states
We now model the states of the spin 1/2 particle. As before, we denote a state of the
particle in which the component of the spin vector S along the unit vector n is
h/2 by
|S n, i. We define our Hilbert space of states as follows. We postulate that H is spanned
by |S n, i for any choice of n, with different choices of n just giving different bases.
Thus, every vector |i H can be expanded via
|i = a+ |S n, +i + a |S n, i.
(1)
We define the scalar product on H by postulating that each set |S n, i forms an orthonormal basis:
hS n, |S n, i = 1, hS n, |S n, i = 0.
Since every vector can be expanded in terms of this basis, this defines the scalar product
of any two vectors (exercise). Note that the expansion coefficients in (1) can be computed
by
a = hS n, |i.
This is just an instance of the general result for the expansion of a vector |i in an
orthonormal (ON) basis |ii, i = 1, 2, . . . , n, where the ON property takes the form
hi|ji = ij .
We have (exercise)
|i =
ci |ii,
ci = hi|i.
We can therefore write

|i =
|iihi|i.
If we choose one of the spin bases, say, |S n, i, and we represent components of

vectors as columns, then the basis has components

1
0
hS n, |S n, +i =
hS n, |S n, i =
.
0
1
More generally, a vector with expansion
|i = a+ |S n, +i + a |S n, i
1
is represented by the column vector

hS n, |i =
a+
a
The bra h| corresponding to |i has components forming a row vector:

h|S n, i = (a+
a ).
Linear operators
Our next step in building a model of a spin 1/2 system using the rules of quantum
mechanics is to represent the observables (Sx , Sy , Sz ) by self-adjoint operators on the
foregoing two-dimensional vector space. To do this we need to explain how to work with
linear operators in our bra-ket notation.
A linear operator A is a linear mapping from H to itself, that is, it associates to each
vector |i a vector A|i. This association (mapping, operation) is to be linear:
A(a|i + b|i) = aA|i + bA|i.
If you think of vectors as columns, then a linear operator is represented as a square matrix.
I will explain this in detail momentarily. As you know, if you take a row vector and multiply
it on the left with a column vector of the same size, you will get a square matrix, that
is, a linear operator. More generally, given a ket |i and a bra h| we can define a linear
operator via
A = |ih|.
What this means is
A|i = |ih|i.
You can easily check as an exercise that this is a linear operator. This operator is called
the outer product or tensor product operator.
You can easily see that the sum of two linear operators, defined by,
(A + B)|i = A|i + B|i,
is a linear operator as is the scalar multiple:
(cA)|i = cA|i.
Thus the set of linear operators forms a vector space! Moreover, you can check that the
product of two operators, defined by
(AB)|i = AB|i
2
is a linear operator. Thus the set of linear operators forms an algebra.

It is not hard to see that every linear operator can be written in terms of an orthonormal
basis (ONB) |ii as
X
A=
Aij |iihj|,
i
where
Aij = hi|A|ji
are called the matrix elements of A in the basis provided by |ii.* To see this, simply
expand the vectors |i and A|i in the ONB:
X
X
X
A|i =
|iihi|A|i =
|iihi|A|jihj|i =
Aij |iihj|i.
i
ij
ij
A very important example of this is the identity operator, defined by,

I|i = |i,
|i.
It has the decomposition (good exercise!)

I=
|iihi|.
(2)
This resolution of the identity is used all the time to manipulate various equations.
Dont forget it! As a simple example, you can use (2) to view the expansion in a basis
formula as a pretty trivial identity:
!
X
X
|iihi|i !
|i = I|i =
|iihi| |i =
i
The array Aij is in fact the matrix representation of the linear operator A in the given
basis. To see how this works, we consider the action of a linear operator on a vector and
see the familiar rules of matrix multiplication coming into play when we expand in an
orthonormal basis |ii. Watch:
X
hi|A|i = hi|
Ajk |jihk|i
jk
Ajk hi|jihk|i
jk
Aik hk|i.
* More generally, any scalar of the form h|A|i is called a matrix element.
3
The final line shows how the ith component of A|i the ith entry of the column vector
representing A|i is given by matrix multiplication of the array Aik with the column
vector hk|i. We can equally well see how matrix multiplication gets defined via the
product of linear operators. Consider the matrix elements of the operator AB:
X
hi|AB|ki =
hi|A|jihj|B|ki
j
Aij Bjk .
Given a linear operator A we recall the notion of eigenvectors and eigenvalues. They
are solutions and |i to the equation
A|i = |i.
The zero vector is not considered an eigenvector. Note that the eigenvector |i corresponding to an eigenvalue is not unique, since any scalar multiple of the eigenvector will also
be an eigenvector with the same eigenvalue. However, note that such eigenvectors are not
linearly dependent. It may or may not happen that there is more than one linearly independent eigenvector for a given eigenvalue. When there is, one says that the eigenvalue is
degenerate. Note also that the number of eigenvalues and the dimension of the eigenspace
can range from 0 to the (still finite) dimension of the Hilbert space.
Finally, let us note that a linear operator A on H also defines a linear operation on the
dual vector space, i.e., the space of bras. This operation is denoted
h| h|A.
To define h|A we should tell how it acts (as a linear function) on kets; the definition is
(h|A)|i = h|A|i.
As an exercise you should check that h|A so-defined is indeed a linear function, i.e., a
bra and that A is a linear operation on bras.
Self-adjoint operators
A linear operator A on a finite-dimensional Hilbert space defines an operator A called
the adjoint of A. It is defined by demanding that, for all kets, we have
h|A |i = h|A|i .
Note that this is equivalent to defining A as the operator whose matrix elements are the
complex conjugate-transpose of those of A:
(A )ij = (Aji ) .
4
If
A = A
we say that A is self-adjoint or Hermitian.
It is a standard result from linear algebra that a Hermitian operator has real eigenvalues
and that eigenvectors corresponding to distinct eigenvalues must be orthogonal. See your
text for the elementary proofs. An extremely important theorem from linear algebra says
that a Hermitian operator always admits an orthonormal basis of eigenvectors. Typically,
we will use a notation such as
A|ii = ai |ii
to denote the eigenvalues and eigenvectors.
Lecture 4
Relevant sections in text: 1.2, 1.3, 1.4
The spin operators
Finally we can discuss the definition of the spin observables for a spin 1/2 system. We
will do this by giving the expansion of the operators in a particular basis. We denote the
basis vectors representing states in which the z component of spin is known by
|i := |Sz , i,
and define

h
|+ih| + |ih+|
2

h
Sy = i |ih+| |+ih|
2

h
Sz =
|+ih+| |ih| .
2
Note that we have picked a direction, called it z, and used the corresponding spin states
for a basis. Of course, any other direction could be chosen as well. You can now check
that, with the above definition of Sz , |i are in fact the eigenvectors of Sz with eigenvalues
h/2. Labeling matrix elements in this basis as

h+|A|+i h+|A|i
Aij =
,
h|A|+i h|A|i
Sx =
you can also verify the following matrix representations in the |i basis:

h 0 1
h 0 i
h 1 0
(Sx )ij =
(Sy )ij =
(Sz )ij =
.
2 1 0
2 i 0
2 0 1
Finally, you should check that all three spin operators are self-adjoint. It will be quite
a while before you get a deep understanding of why these particular operators are chosen.
For now, let us just take them as given and see what we can do with them.
As a good exercise you can verify that
1
|Sy , i = (|+i i|i)
2
1
|Sx , i = (|+i |i),
2
are eigenvectors of Sx and Sy , respectively, with eigenvalues

h/2. Note that these eigenvectors are normalized to have norm (length) unity. The fact that these eigenvectors
are distinct from those of Sz will be dealt with a little later. For now, just note that the
three operators do not share any eigenvectors. Note also that the eigenvalues of the spin
operators are all non-degenerate exercise.
1
Spectral decomposition
The spin operators have the general form
X
A=
Aij |iihj|,
ij
which we discussed earlier. Note, though, that Sz has an especially simple, diagonal form.
This is because it is being represented by an expansion in a basis of its eigenvectors. It is
not hard to see that this result is quite general. If |ii is an ON basis of eigenvectors of A
with eigenvalues ai :
A|ii = ai |ii,
then the matrix elements are
Aij = hi|A|ji = aj hi|ji = ai ij ,
so that (exercise)
A=
ai |iihi|.
This representation of an operator is called its spectral decomposition. The name comes
from the terminology that the set of eigenvalues forms the spectrum of an operator (on a
finite dimensional Hilbert space). You can easily see that the definition of Sz is its spectral
decomposition.
Because every self-adjoint operator admits an ON basis of eigenvectors, each such
operator admits a spectral decomposition. Of course, different operators will, in general,
provide a different ON basis.
Probability interpretation
We now use the third quantum mechanics postulate (relating expectation values to
diagonal matrix elements) to physically interpret some elementary state vectors. Let us
start with the eigenvector |+i of Sz . In the state represented by this vector we have
(exercise)
h
hSz i = h+|Sz |+i = , hSx i = hSy i = 0.

2
This makes perfect sense since |+i |Sz , +i is supposed to be a state where Sz is known
with certainty (probability unity) to have the value +h/2. On the other hand, we saw in
the Stern-Gerlach experiment that such states have equal probability of finding
h/2 for
Sx and Sy whence their expectation value vanishes. You can verify as an exercise that
similar comments (with appropriate permutation of x, y, and z) can be made for |Sz , i,
|Sx , i, etc.
2
The third postulate of QM is the place where the mathematical representation of

a physical system makes contact with reality. It provides the predictions that can be
tested/compared with experiment. Note that the third postulate gives the physical output
of QM in terms of probabilities (specifically, expectation values). In fact, as we shall see,
all the physical predictions of quantum mechanics are intrinsically probabilistic.
How can we see directly the probabilities for the various outcomes of a measurement
of something like spin when the third postulate only gives statistical averages, i.e., expectation values? We proceed as follows. Consider a function f (x) that takes the value 1
at, say, x = h/2 and vanishes otherwise.* Consider the observable, say, f (Sx ) not yet
viewed as an operator, but just as the experimentally accessible observable. So, f (Sx ) is
like a detector for spin-up along the x-axis it yields the value one when the spin along
x is h2 and yields zero otherwise. If you think about repeatedly setting up an experimental
state and measuring f (Sx ) you will see that (in the limit of an arbitrarily large number of
experiments) the expectation value of f (Sx ) is precisely equal to the probability for finding
Sx to have the value h/2 (exercise). More generally, the probability of finding Sx to have
a value in any range R of real numbers is just the expectation value of the characteristic
function of the set R. Clearly we can do the same for any other component of the spin.
Thus we can extract a probability by computing an expectation value, which we know how
to do once we figure out how to represent f (Sx ). Lets see how to do this.
In general, given a Hermitian operator A with an ON basis of eigenvectors |ii and
eigenvalues ai , and a real-valued function h(x), we can define a self-adjoint operator h(A)
using its spectral decomposition. We want the eigenvectors of h(A) to be the same as for A,
and we want the eigenvalues to be h(ai ). Evidently we desire the spectral decomposition
X
h(A) =
h(ai )|iihi|.
i
You can easily check that |ii constitute (a basis of) eigenvectors of h(A) with eigenvalues
h(ai ). Thus we define functions of observables by their spectral decomposition. In particular, given the characteristic function f for the point x = h
/2, we define the operator
f (Sx ) by its spectral decomposition:
h
h
f (Sx ) = f ( )|Sx , +ihSx , +| + f ( )|Sx , ihSx , | = |Sx , +ihSx , +|.
2
2
It is now easy to see, by computing expectation values of f (Sx ) according to the third
postulate, that the following probability distributions arise (good exercise!):
State: |Sz , i P rob(Sx = h/2) = 1/2,
P rob(Sx =
h/2) = 1/2
* Such a function is called a characteristic function for the set x = h2 .

3
State: |Sx , i P rob(Sx = h/2) = 1,
P rob(Sx =
h/2) = 0
State: |Sy , i P rob(Sx = h/2) = 1/2,
P rob(Sx =
h/2) = 1/2.
You can easily play similar games with other components of S. You can also compute the
probabilities (via expectation values) in any state you like just by expanding the state in
the |i basis and computing the expectation values using the orthonormality of the basis.
As a nice exercise you should be able to prove that if a state vector takes the form
|i = a|+i + b|i,
|a|2 + |b|2 = 1,
then the probability of getting h

/2 for Sz is given by |a|2 while the probability for getting
h/2 is given by |b|2 . Note that the normalization condition

1 = h|i = |a|2 + |b|2
guarantees that the probabilities add up to unity.
This implies that the probability for getting any other values for Sz must vanish. Let
us prove this directly. Let g(x) be a function that vanishes at
h/2, that is, that vanishes
at the eigenvalues of any of the spin operators. For any component of the spin, Sk , and
for any state |i we have that

h
h
hg(Sk )i = h| g( )|Sk , +ihSk , +| + g( )|Sk , ihSk , | |i = 0.
2
2
In particular, if you pick g to be a characteristic function of any set not including the
spectrum of Sk , then the expectation value which is the probability for finding Sk to be
in that set vanishes. Thus we see that the only possible outcome of a measurement of an
observable is an element of its spectrum, i.e., one of its eigenvalues.
The foregoing results are important enough to state in all generality. The only possible
outcome of a measurement of an observable represented by the self-adjoint operator A is
one of the eigenvalues of A. Given a state represented by the (unit) vector |i and given
an observable (represented by) A, we can write
X
|i =
hi|i|ii,
i
where
A|ii = ai |ii.
If there is no degeneracy, the probability for getting the value ai upon measurement of (the
observable represented by) A is |hi|i|2 . If there is degeneracy the probability for getting
P
ai is given by j |hj, ai |i|2 , where the sum runs over an ON basis |j, ai i of the subspace
associated with ai .
4
Lecture 5
Relevant sections in text: 1.21.4
An alternate third postulate
Here is an equivalent statement of the third postulate.*
Alternate Third Postulate
Let A be a Hermitian operator with an ON basis of eigenvectors |ii:
A|ii = ai |ii.
The only possible outcome of a measurement of the observable represented by A is one of
its eigenvalues aj . The probability for getting aj in a state |i is
P rob(A = aj ) =
D
X
|hi, aj |i|2 ,
i=1
where the sum ranges over the D-dimensional space of eigenvectors |i, aj i,i = 1, . . . , D,
with the given eigenvalue aj . Note that if the eigenvalue ak is non-degenerate then
P rob(A = aj ) = |hj|i|2 .
Note also that hj|i is the component of |i along |ji.
Let us prove the probability formula in the case where there is no degeneracy. Allowing
for degeneracy is no problem; think about it as an exercise. Let f (x) be the characteristic
function of aj . We have (exercise)
f (A) = |jihj|.
Then, in the state |i we have
P rob(A = aj ) = hf (Ai) = h| (|jihj|) |i = |hj|i|2 .
Because the state vectors have unit norm, we have
X
X
1 = h|i =
h|jihj|i =
|hj|i|2 ,
j
* For now we state it in a form that is applicable to finite-dimensional Hilbert spaces. We

will show how to generalize it to infinite-dimensional ones (such as occur with a particle)
in a little while.
1
that is, the probabilities P rob(A = aj ) add up to unity when summed over all the eigenvalues. This indicates that the probability vanishes for finding a value for A which is not
one of its eigenvalues.
We can write the expectation value of an observable so that the probabilities feature
explicitly. As usual, we have
A|ii = ai |ii,
hi|ji = ij .
If a system is in the state |i, we compute (by inserting the identity operator twice good
exercise)
hAi = h|A|i
X
=
h|iihi|A|jihj|i
ij
ai |hi|i|2 .
We see that the expectation value is just the sum of all possible outcomes weighted by the
probability for each outcome just as it should be!
Note the special role played by (normalized) eigenvectors: If the state is an eigenvector of some observable A with eigenvalue a, then the probability for getting a is unity.
Eigenstates are states where you know one (or more) observables with certainty.
Heres a good exercise for you: Show that every state vector is an eigenvector of some
Hermitian operator. Therefore, at least in principle, every state can be determined by a
suitable measurement.
The third postulate makes very specific predictions about what are the possible outcomes of a measurement of a given observable. In essence, all the physical output of
quantum mechanics appears via this postulate. I have to emphasize that the predictions
are always of a probabilistic nature. (Of course, some probabilities are 1 or 0, so this does
not mean that one can never make any statements with certainty.)
Let us apply this new form of the third postulate to a couple of simple examples coming
from the spin 1/2 system. Consider the state
1
|i = |Sy , +i = (|+i + i|i).
2
What is the probability for finding, say, Sy to have the value
h/2? Since
1
|Sy , i = (|+i i|i)
2
is orthogonal to |Sy , +i,
hSy , |Sy , +i = 0,
2
this probability is zero. In the same state, what is the probability for finding Sz to have
the value h
/2? We have
i
1
|hSz , +|Sy , +i|2 = | |2 = .
2
2
And so forth . . .
Stern-Gerlach revisited
Let us now derive from our model of the spin 1/2 system some of the salient features
of the Stern-Gerlach experiment. View our beam of spin 1/2 system as a large number of
identical systems. When we pass the beam through an apparatus SGz , and keep only the
spin up beam, we have a large collection of particles, all in the state |Sz , +i. If we pass
the beam through another SGz apparatus, we find all the particles are still spin up. In
our model, this result is simply the statement that the probability for finding Sz = h
/2 in
a the state |i = |Sz , +i is unity:
|hSz , +|i|2 = hSz , +|Sz , +i|2 = 1.
Now consider passing the filtered beam through an SGx apparatus. We know that the
beam splits in half. In our model this is seen by computing the probability distribution
for Sx when |i = |Sz , +i. We get
1
|hSx , |i|2 = |hSx , |Sz , +i|2 = .
2
Now we come to a crucial aspect of the use of our model to describe nature. Suppose we
take all the particles that had Sx = h
/2 and pass them through an SGz apparatus, what
does our model predict? The key observation is that the beam of particles that are going
to be passed through SGz are now all in the state
1
|i = |Sx , +i = (|Sz , +i + |Sz , i).
2
The measurement/preparation/filtering process using SGx has determined a new state
vector for the system! To verify this, we just pass the SGx -filtered beam through another
SGx apparatus and see that the particles have Sx = h/2 with probability unity. This
means that they are in the corresponding eigenstate, |Sx , +i. So, when we pass this beam
through SGz we find the 50-50 probability distribution:
1
|h|i|2 = |h|Sx , +i|2 = .
2
We now want to spend a little time understanding this sort of result in general.
3
Compatible and Incompatible Observables

Let us generalize the unusual result found in the Stern-Gerlach experiment as follows.
Suppose we have two observables A and B. If we measure A we will get one of its eigenvalues, say, a. For simplicity (only) we assume that this eigenvalue is non-degenerate. The
state of the system is now the eigenvector |ai, where
A|ai = a|ai.
If we then measure the observable B we get an eigenvalue, say b, and the system is in the
state |bi, where
B|bi = b|bi
(for simplicity, we also assume that b is a non-degenerate eigenvalue). The probability for
getting a is now
|ha|i|2 = |ha|bi|2
Unless there is a real number c such that |ai = eic |bi this probability is less than unity.*
When do we have |ai = eic |bi? This condition means that A and B have a common
eigenvector. ( Note that there are no common eigenvectors for the spin operators exercise.) As long as A and B do not have all of their eigenvectors in common, there will
be situations where knowing one of the observables with certainty will preclude knowing
the other with certainty. Such observables are called incompatible. Under what conditions
will two observables be compatible? Evidently, compatible observables are represented by
operators that have a common (basis) set of eigenvectors so that the argument just given
above cannot apply. This implies that the observables must commute,
[A, B] AB BA = 0.
To see this, let us denote by |a, bi the common set of eigenvectors, which form an ON basis:
A|a, bi = a|a, bi,
B|a, bi = b|a, bi,
ha, b|a0 , b0 i = aa0 bb0 .
We have
AB|a, bi = bA|a, bi = ba|a, bi = aB|a, bi = BA|a, bi.
Since the operator [A, B] maps each element of a basis to the zero vector, and since every
vector can be expanded in this basis, we conclude that [A, B] must be the zero operator.
* To see this, use the Schwarz inequality exercise:
|ha|bi|2 ha|aihb|bi,
equality iff |ai = (const.)|bi.
So, if a pair of Hermitian operators have a common basis of eigenvectors then they
must commute. It is a fundamental result from linear algebra (for finite dimensional vector
spaces) that the converse also holds: If two Hermitian operators commute, then they
admit a common, ON basis of eigenvectors. Thus compatible observables are represented
by commuting operators. Physically, it is possible to have states in which the values of
compatible observables are determined with certainty unlike different components of the
spin. By the same token, incompatible observables are represented by operators that do
not commute. For incompatible observables there will exist states in which their values
cannot be determined with statistical certainty. We see then that the unusual feature of
nature embodied in the Stern-Gerlach experiment is encoded in the algebraic structure
associated to the observables by the commutator of the corresponding linear operators.
Lecture 6
Complete set of commuting observables
Only in the simplest physical systems will the measurement of a single observable
suffice to determine the state vector of the system. Of course, the spin 1/2 system has
this property: if you measure any component of the spin the state vector is fixed (up to
an irrelevant phase factor*). More complicated systems need more observables to characterize the state. For example, recall that the energy levels of a hydrogen atom can be
uniquely specified by the energy, orbital angular momentum magnitude, one component
of the angular momentum and the electron spin state. Other choices are possible. After
measuring an observable and getting a particular outcome an eigenvalue the state of
the system is the corresponding eigenvector. It may happen that more than one state
vector is associated with that measurement outcome. Mathematically, the eigenvalue is
degenerate. To pin down the state uniquely, other observables must be measured. Evidently, if we have a set of observables whose measurement uniquely determines the state
(vector up to a phase), then that state vector must be a simultaneous eigenvector of all
the observables in question. If we demand that the possible outcomes of measurements of
this set of observables defines a basis for the Hilbert space, then the observables must be
compatible, i.e., their operator represntatives must all commute. Such a set of variables
is called a complete set of commuting observables (CSCO). For a hydrogen atom, modeled
as an electron in a Coulomb potential, the energy, orbital angular momentum magnitude,
one component of the orbital angular momentum and one component of the electron spin
constitute a CSCO. For a single spin 1/2 system, any component of spin defines a CSCO.
Evidently, a CSCO is not unique.
Given a CSCO, A, B, . . . we can label the elements of the ON basis it determines by the
eigenvalues of the CSCO, |a, b, . . .i. These are states in which the elements of the CSCO
are known with certainty to have the values a, b, . . .. The vectors satisfy
ha, b, . . . |a0 , b0 , . . .i = aa0 bb0 . . . .
Incompatible observables. The uncertainty principle.

Incompatible observables are represented by operators which do not commute. As we
have seen, incompatible observables are such that there exist states where both observables
* Two states that differ by a phase factor (eia ) will give the same expectation value to all
observables exercise. This means that all probability distributions are insensitive to the
phase factors, and hence all physical predictions are as well.
1
cannot be determined with certainty. We will now make this more precise and give a
very general version of the celebrated uncertainty principle probably better called the
uncertainty theorem.
Given an observable A and a state |i, we define the dispersion of A in the state |i
to be
h(A)2 i := h(A hAi)2 i = h|A2 |i h|A|i2 = hA2 i hAi2 .
The dispersion (also called the variance) is a non-negative real number which characterizes
the statistical uncertainty of the observable A in the state |i. To see that the dispersion
is indeed non-negative, note that the expectation value of the square of any Hermitian
operator C is a positive number:
h|C 2 |i = h|C C|i;
the right hand side is just the length-squared of the vector C|i, which must be nonnegative. Setting C = A hAiI, and noting that hAi must be a real number (exercise),
we conclude that h(A)2 i 0.
Note the dispersion vanishes if and only if the state |i is an eigenvector of A. You
can easily verify the if part. Let me show you the only if part. Write
h(A)2 i = h|(A hAiI)2 |i
= h|(A hAiI) (A hAiI)|i
= ||(A hAiI)|i||2 .
The norm of a vector vanishes if and only if the vector is the zero vector, so if the dispersion
vanishes we have
(A hAiI)|i = 0,
which is the eigenvector condition (exercise).
Using the Schwarz inequality the following relation between the dispersions of 2 obervables can be established (see your text for the proof):
1
hA2 ihB 2 i |h[A, B]i|2 .
4
This is the general form of the uncertainty relation. In a given state it relates the product
of the statistical uncertainty of a pair of observables to the expectation value of the commutator of the observables. If the commutator vanishes, or if its expectation value does
in the given state, then the uncertainty relation has no content. Otherwise, it provides
information about the effect of the incompatibility of the observables. In general, this
effect depends upon the state that is chosen. You can see this from the fact that the
expectation values occurring in the inequality above are defined by a given state. This is
2
important to keep in mind. In certain cases (e.g., the position-momentum relation, to be

studied later) the uncertainty relation turns out to be state independent and hence it is
much more dramatic and famous.
For a spin 1/2 system it is straightforward to compute the uncertainty relation for
various observables. To do it, you need the following commutation relations (which you
derive in your homework):
[Sx , Sy ] = ihSz ,
[Sy , Sz ] = i
hSx ,
[Sz , Sx ] = ihSy .
Let let |i = a|+i + b|i, |a|2 + |b|2 = 1 (which is a general state vector). We consider the
uncertainty relation for Sx and Sz . We have (exercise)
hSz i2 hSx i2
h4 [Im(ab )]2 .
Of course, if a or b vanish the state is an eigenstate of Sz and the uncertainty relation
has no import. Otherwise, though, you can see how the lower limit for the product of
uncertainties varies with the choice of state. For example, if a = 1/ 2 and b = i 2 (so
that the state is an eigenstate of Sy ) we have
hSz i2 hSx i2
4
h
.
4
Lecture 7
Observables with continuous and/or unbounded values

We are now ready to turn to the quantum mechanical description of a (non-relativistic)
particle. We shall define a (spinless) particle as a system that is completely characterized
by the position and linear momentum, which are the basic observables in this model.
This means that all observables are functions of position and momentum. While it is
possible that the position and momentum variables take a discrete set of values (as angular
momentum and often energy do), there is currently no experimental evidence of this.
We therefore create a model in which these observables can take a continuous, unbounded
set of values. Evidently, we need to define a Hilbert space that admits self-adjoint operators
with a continuous, unbounded spectrum. Neither of these features are possible on finitedimensional vector spaces, and so here we are forced into the infinite-dimensional setting
(i.e., spaces of functions). This leads to some mathematical subtleties that we need to be
wary of. I shall not try to be perfectly rigorous in our discussion since that would take us
too far afield. But I will try to give you a reasonably fool-proof if somewhat formal
treatment. First, let me give you a recipe for dealing with the situation. Then let me give
you a flavor of the underlying mathematics which makes the formal recipe work (and also
which shows where it can become tricky).
Formal recipe
We want to define an observable A with a continuous, unbounded set of values a R,
say. We postulate the existence of a self-adjoint linear operator, again denoted A, and a
continuous set of vectors |ai such that
A|ai = a|ai.
We say that A has a continuous spectrum. These vectors are to be orthonormal in the
delta-function sense:
ha|a0 i = (a, a0 ).
You can think of this as a continuum generalization of the usual orthonormality expressed
via the Kronecker delta. The vectors |ai are to form a basis, that is, they provide a
continuous resolution of the identity:
Z
I=
da |aiha|,
so that we can write
Z
|i =
da |aiha|i.
Using quotation marks to indicate where the underlying mathematics is considerably

more subtle than the words indicate, we can say the following. We interpret the eigenvalues a as the possible values of an outcome of the measurement of A, the vectors |ai are
states in which A has the value a with certainty. The scalar |ha|i|2 da is the probability
for finding A to have the value in the range [a, a + da] in the state |i. In particular, the
probability that the observable A is found to have a value in a region [a1 , a2 ] is given by
Z a2
P (A [a1 , a2 ]) =
da |ha|i|2 .
a1
Thus in this setting we call |ha|i|2 the probability density for A in the state . You
can see that, given the operator A with continuous spectrum, the abstract state vector is
completely characterized by the complex-valued function
Z
(a) = ha|i,
da |(a)|2 = h|i = 1.
(a) represents the components of the vector |i along the basis provided by |ai. Its
the continuous analog of the column vector representation! This function (a) = ha|i is
called the wave function associated to the observable A. The Hilbert space can thus be
identified with the set of all square-integrable functions (a). Of course, you have seen
this before in the position and momentum representations for a particle.
Subtleties. A glimpse of the real story.
For the most part, using the above formalism you can treat operators with continuous
spectra much the same as we did linear operators on finite-dimensional Hilbert spaces. It
is worth warning you, however, that there are mathematical subtleties that can come into
play.
To begin with, note that the eigenvectors |ai cannot really be elements of the Hilbert
space since they do not have a norm (a, a) is not defined! To see this concretely in a
familiar example, consider the familiar momentum operator (to be justified soon), acting
upon the Hilbert space of square-integrable functions of 1-variable:
p(x) =
d
h
(x).
i dx
Its eigenfunctions are of the form

(x) = N ex ,
2
with eigenvalues hi . No matter how you choose ,

Z
dx |(x)|2 .
This means that (x) does not actually correspond to an element of the Hilbert space.
This difficulty can be traced back to the fact that p has continuous spectrum.* Our formal
prescription (delta-function normalization) etc. works if we pick = ik, where k is a real
number.
If the spectrum of an observable is unbounded, then another difficulty arises: not all
elements of the Hilbert space are in the domain of the operator. This difficulty which
can only occur if the Hilbert space is infinite-dimensional arises whenever the spectrum
is unbounded, irrespective of whether the spectrum is discrete or continuous. To illustrate
this, return to the momentum operator. It is clear that, say, a square wave pulse defines
a normalizable wave function, that is, an element of the Hilbert space of a particle.**
However, the derivative of a square wave is not defined at the corners, so one doesnt
get a function after applying the operator let alone a square-integrable function. We say
that the square wave function is not in the domain of the momentum operator. Likewise,
it is easy to construct wave functions that are not in the domain of the position operator.
For example, you can easily check that
x
(x) = 2
x +1
is a square integrable (normalizable) function of x. However, the position operator X acts
on such functions via
X(x) = x(x),
so that, in our example,
x2
,
X(x) = 2
x +1
which is not square-integrable, i.e., not an element of the Hilbert space for a particle.
In a more rigorous treatment of observables with continuous and/or unbounded spectrum, one handles these issues as follows. One accepts the limited domain of unbounded
operators. One adds to the definition of self-adjoint operators the requirement that the operator and its adjoint have the same domain. The domain of an observable with unbounded
* Since we dont have any eigenvectors, how do we define spectrum? Roughly, the spectrum of an operator A is the set of scalars such that the linear operator A I has no
inverse.
** Physically, such a wave function describes a particle that is known with certainty to be
somewhere in the region in which the square wave function is non-vanishing. For example,
this would be the state of a particle after it has passed into a particle detector of finite
size.
3
spectrum forms a dense subset of the Hilbert space. This means that every vector can
be arbitrarily well approximated (relative to the Hilbert space norm) by elements in the
domain. If the spectrum is discrete (if unbounded), an orthonormal basis of eigenvectors
for the Hilbert space can be found inside this dense domain. If the spectrum is continuous,
the eigenvectors are not in the Hilbert space, but are generalized eigenvectors. They
live in the vector space of distributions, which are linear functions on a dense subspace of
the Hilbert space. Thus the generalized eigenvectors have a well-defined scalar product
with elements from a dense subset of the Hilbert space, but not with the whole Hilbert
space or among themselves. It can be shown that a self-adjoint operator with continuous
spectrum admits a delta-function normalized set of generalized eigenvectors which can be
used to define a generalized basis much as indicated in our formal treatment above.
If all this mathematical detail leaves you a little dazed, dont feel bad. I only gave
you vague hints as to how the story goes. I did not attempt to give a complete, coherent description. Still, now you at least know where some of the subtleties lie. In what
follows all of our formal manipulations will have a justification within the more complete
mathematical formulation that I hinted at above.
Position Operator
We have already mentioned that a particle is defined as a system in which the only
relevant observables are position and momentum. How do we implement this idea in
quantum mechanics? We need to define self-adjoint operators representing position and
momentum. As mentioned earlier, we do this so that the spectrum is continuous and
unbounded, ranging over all real numbers. We begin with the position operator, which
can be constructed formally as we did the generic operator A above. We will denote by
X the position operator and the spectral values will be denoted x, so that we have the
generalized eigenvectors |xi.
X|xi = x|xi.
We have (on a suitable dense domain) X = X . The position wave function is defined by
Z
(x) = hx|i,
dx |(x)|2 = h|i.
A vector is completely specified by its position wave function and vice versa. The position
wave function is a continuous version of a column vector! A linear operator maps vectors
to vectors and so defines and is defined by a linear mapping from wave functions to
wave functions. This mapping can be computed via
(x) A(x) hx|A|i.
4
In particular, the position operator maps functions to functions via

(x) X(x) = hx|X|i = xhx|i = x(x).
Here we have used
X|xi = x|xi = hx|X = xhx|.
The wave function (x) is the probability amplitude for position. This means that
|(x)|2 is the probability density for position. This means that
Z b
P rob(X [a, b]) =
dx |(x)|2 .
Note the unit vector condition implies normalization of the wave functions:
Z
Z
dx |(x)|2 .
dx h|xihx|i =
1 = h|i =
Finally we note that the generalized eigenvector |x0 i represents a state in which the
particle has definite position. Formally, the wave function for this state is then
(x) = hx|x0 i = (x, x0 ).
Of course this wave function is not normalizable except in the delta function sense. (Try
it!)
Remarks: Note that all the foregoing results are in fact valid for any observable with
continuous spectrum. We simply select one and use it to represent position and adapt
our notation accordingly. Also note that, as is the tradition when presenting material
at this level, we have been completely cavalier with domains. The idea is that the formal
manipulations given above will make sense provided the domains are sufficiently restricted.
These restrictions are usually not explicitly made since we are only trying to understand
the formal structure of things. When performing concrete computations one must usually
pay more attention to such things. But its amazing how far one can get without doing so!
Lecture 8
Momentum
How shall we view momentum in quantum mechanics? Should it be mass times velocity, or what? Our approach to the definition of momentum in quantum mechanics
will rely on a rather fundamental understanding of what is momentum. To motivate
our definition, let me remind you that the principal utility of the quantity called momentum is due to its conservation for a closed system. One can then understand the
motion of interacting systems via an exchange of momentum. Next, recall the intimate
connection between symmetries of laws of physics and corresponding conservation laws.
In particular, symmetry under spatial translations corresponds to conservation of linear
momentum. In the Hamiltonian formulation of the classical limit of mechanics this correspondence becomes especially transparent when it is seen that the momentum is the
infinitesimal generator of translations, viewed as canonical transformations. In the Hamiltonian framework, the conservation of momentum is identified with the statement that the
Hamiltonian is translationally invariant, that is, is unchanged by the canonical transformation generated by the momentum. We shall see that this same logic applies in quantum
mechanics. Indeed, nowadays momentum is mathematically identified by definition as
the generator of translations. Let us see how all this works.
Having defined the position (generalized) eigenvectors, which represent (idealized)
states in which the position of the particle is known with certainty, we can define a translation operator Ta via
Ta |xi = |x + ai.
Since the |xi span the Hilbert space, this defines the operator. Note that (exercise)
Ta Tb = Ta+b ,
T0 = I.
Physically, we interpret this operator as taking the state of the particle and transforming
it to the state in which the particle has been moved by an amount a in the positive x
direction. Since we demand that this operator map states into states, we must require
that they stay of unit length:
1 = h|i = h|Ta Ta |i.

It can be shown that this requirement (for all vectors) forces (for all a, which we suppress
here)
T T = T T = I, T = T 1 .
1
We say that the operator T satisfying this last set of relations is unitary. Note that a
unitary operator preserves all scalar products (exercise).
Note that for position wave functions we have (good exercise)
Ta (x) = hx|Ta |i = hx a|i = (x a).
So, moving the system to the right by an amount a shifts the argument of the wave function
to the left. To see that this makes sense, consider for example a particle with a Gaussian
wave function (exercise). The unitarity of Ta is expressed in the position wave function
representation via (exercise)
Z
Z
2
|(x a)| =
|(x)|2 = 1.
As we mentioned above, momentum is identified with the infinitesimal generator of

translations. Thus, consider an infinitesimal translation, T , << 1. We assume that Ta
is continuous in a so that we may expand the operator in a Taylor series
i
T = I P + O(2 ).
h
The mathematical definition of the Taylor series of an operator needs a fair amount of
discussion, which we will suppress. For our purposes you can just interpret the expansion
as meaning that any matrix elements of the operator can be so expanded. The factor of hi
has been inserted for later convenience. Here P is a linear operator, called the infinitesimal
generator of translations.* The unitarity and continuity of T implies that P is self-adjoint.
Indeed, considering the O() terms you can easily see (exercise)
i
i
I = T T = (I + P + O(2 ))(I P + O(2 )) = P = P .

h
h
In fact, the self-adjointness of P is also sufficient for T to be unitary. This can be seen by
representing Ta as an infinite product of infinitesimal transformations:
Ta = lim (I
N
i
i a N
P ) = e h aP .
hN
It is not hard to check that any operator of the form eiA with A = A is unitary (exercise).
Thus P represents an observable, which we identify with the momentum of the particle.
The canonical commutation relations
We now consider the commutation relation between position and momentum. We can
derive this relation by studying the commutator between position and translation operators
* Technically, the infinitesimal generator is hi P , but it is a convenient abuse of terminology
and it is customary to call P the infinitesimal generator.
2
and then considering the limit in which the translation is infinitesimal. Check the following
computations as an exercise.
XT |xi = (x + )|x + i.
T X|xi = x|x + i.
Subtracting these two relations, taking account of the definition of momentum, and working
consistently to first-order in we have (exercise)
i
i
X( P )|xi ( P )X|xi = |xi.
h
h
This implies keep in mind that |xi is a (generalized) basis

[X, P ] = i
hI.
This relation, along with the (trivial) relations
[X, X] = 0 = [P, P ],
constitute the canonical commutation relations for a particle moving in one dimension.
Note that these commutation relations show us that position and momentum are incompatible observables. They thus satisfy an uncertainty relation:
1 2
hX 2 ihP 2 i
h .
4
This is the celebrated position-momentum uncertainty relation. It shows that if you try
to construct a state with a very small dispersion in X (or P ) then the dispersion in P
(or X) must become large. Note also that the uncertainty relation shows the dispersion
in position or and/or momentum can never vanish! However, either of them can be made
arbitrarily small provided the other observable has a sufficiently large dispersion.
Momentum as a derivative
It is now easy to see how the traditional representation arises in which momentum is
a derivative operator acting on position wave functions. Consider the change in a position
wave function under an infinitesimal translation. We have
i
d(x)
T (x) = (I P )(x) = (x ) = (x)
+ O(2 ).
h
dx
Comparing terms of order we see that
d(x)
h
.
i dx
Of course, you can now easily verify the position wave function representation of the
canonical commutation relations:
P (x) hx|P |i =
[X, P ](x) = i
h(x).
3
Lecture 9
Momentum wave functions

We have already indicated that one can use any continuous observable to define a class
of wave functions. We have used the position observable to this end. Now let us consider
using the momentum observable. We define, as usual,*
P |pi = p|pi, p R
Z
I=
dp|pihp|,
(p) = hp|i,
P (p) = p(p),
and so forth. The interpretation of the momentum wave function is that |(p)|2 dp is the
probability to find the momentum in the range [p, p + dp]. In other words,
Z b
P rob(P [a, b]) =
dp |(p)|2 .
Note that a translation of a momentum wave function is a simple phase transformation

(exercise):
i
Ta (p) = hp|e h aP |i = eipa (p).
Physically, this means that a translation has no effect on the momentum probability distribution (exercise). Exercise: use the momentum wave function representation to check
the unitarity of Ta .
A very useful and important relationship exists between the position and momentum
(generalized) eigenvectors. To get at it, we study the scalar product hx|pi, which can be
viewed as the position wave function representing a momentum eigenvector. It can also be
viewed as the complex conjugate of the momentum wave function representing a position
eigenvector. This complex function of x must satisfy (for each p)
i
hx |pi = hx|T |pi = e h p hx|pi.

This implies (to first order in ) (exercise)
i
d
hx|pi = phx|pi.
dx
h
* Of course, these properties, e.g., the spectrum of P , ought to be derived directly from the
definition of the translation operator. This can be done, but we wont do it explicitly here.
1
The solution to this equation is

i
hx|pi = (const.)e h px .
The constant can be determined by the normalization condition:
Z
0
0
(p, p ) = hp|p i =
dxhp|xihx|p0 i.
Using the Fourier representation of the delta function,

Z
0
1
0
(p, p ) =
dx eix(p p) ,
2
we see that (exercise)
hx|pi =
i
1
e h px .
2
h
Thus we have recovered a familiar result from wave mechanics: the position space wave
function for a particle in a state such that the momentum has (with probability 1) the
h
value p is a (complex) plane wave* with wavelength 2
p . Because the absolute value of
the wave function is unity, the particle has an equal probability of being found anywhere
(think: uncertainty relation). Note also that since the energy of a free particle of mass m
is
P2
H=
,
2m
this wave function describes a free particle with energy p2 /2m.
Because
hp|xi =
i
1
e h px ,
2
h
we see that the momentum space probability amplitude for a particle in an idealized state
with a definite position is also a plane wave. For an arbitrarily well-localized particle, all
momenta are equally likely (uncertainty relation again).
With these results in hand we can give an explicit relation between the position and
momentum bases:
Z
Z
i
1
dp e h px |pi,
|xi =
dp|pihp|xi =
2
h
and
Z
i
1
|pi =
dx|xihx|pi =
dx e h px |xi.
2
h
* Of course, plane waves are not normalizable, but we have already discussed this subtlety.
2
If we set
(p)
= hp|i,
(x) = hx|i,
we get (exercise)
Z
i
1
(x) =
dp e h px (p),
2
h
and
Z
i
1
(p)
=
dx e h px (x).
2
h
Thus the position wave functions and momentum wave functions are related by Fourier
transforms. Note that (exercise)
Z
Z
h|i =
dx (x)(x) =
dp (p)(p).
In the momentum representation the momentum operator is a multiplication operator:
P (p)
= hp|P |i = php|i = p(p),
while the position operator is a differentiation operator:

Z
i
1
px
X (p) = X
dx e h (x)
2
h
Z
i
1
=
dx e h px x(x)
2
h
Z
i
h
dx e h px (x)
=
i p 2
h
h (p)
=
.
i p
Expectation values
Expectation values in the position and momentum representation are easy to compute.
Using the notation
(x) = hx|i, (p)

= hp|i,
we have, in particular, the following
Z
Z
h d
2
h|f (X)|i =
dxf (x)|(x)| =
)(p),
dp (p)f (
i dp
and
Z
h|f (P )|i =
2=
dpf (p)|(p)|
dx (x)f (
h d
)(x),
i dx
You should prove these as a very good exercise.

Particle in 3 dimensions
The generalization to a particle in 3 dimensions is done by, essentially, tripling our
previous constructions. Here we briefly describe this generalization. Afterward, I will
show you how to view in general this way of adding degrees of freedom to a quantum
mechanical model.
In 3-d, we now have position vectors X with components X i = (X, Y, Z), and momentum vectors P with components Pi = (Px , Py , Pz ). Each pair (X i , Pi ) is represented by
self-adjoint operators exactly as before. We demand that they have the following canonical
commutation relations:
[X i , X j ] = [Pi , Pj ] = 0,
[X i , Pj ] = i
hji I.
We have position (generalized) eigenvectors |xi and momentum (generalized) eigenvectors |pi,
X i |xi = xi |xi, Pi |pi = pi |pi.
These form a (generalized) basis:
Z
Z
3
d x |xihx| = I = d3 p |pihp|.
Here it is understood that the integrals run over all of position/momentum space.
The self-adjoint momentum operators generate unitary transformations corresponding
to 3-d translations:
i
Ta = e h aP , Ta |xi = |x + ai.
The canonical commutation relations reflect the fact that translations are commutative
operations:
i
Ta Tb = Tb Ta = e h (a+b)P = Ta+b .
Note that the commutation relations allow us to choose a basis of simultaneous (generalized) eigenvectors of the position or momentum. The relation between the two bases
is
i
1
e h px .
hx|pi =
(2
h)3/2
The position wave functions and momentum wave functions are defined as usual by
taking the components of a state vector along the corresponding basis:
(p)
= hp|i.
(x) = hx|i,
4
We then have
X i (x) = xi (x),
and
X i (p)
=
Pi (x) =

h
(p),
i pi

h
(x),
i xi
Pi (p)
= pi (p),
The probability distributions for finding position/momentum in a volume V /V are

Z
Z
3
2
2.
d x |(x)| , P rob(P V ) =
d3 p |(p)|
P rob(X V ) =
V
Lecture 10
Gaussian state
Here we consider the important example of a Gaussian state for a particle moving in
1-d. Our treatment is virtually identical to that in the text, but this example is sufficiently
instructive to give it again here.
We define this state by giving its components in the position basis, i.e., its wave
function:

1
x2
(x) = hx|i =
exp ikx 2 .
2d
1/4 d
You can check as a good excercise that this wave function is normalized:
Z
Z
1 = h|i =
dx h|xihx|i =
dx |(x)|2 .
Roughly speaking, the wave function is oscillatory with wavelength 2

k but sitting in a
Gaussian envelope centered at the origin. The probability density for position is a Gaussian
centered at the origin with width determined by d. Thus this state represents a particle
localized near the origin within a statistical uncertainty specified by d.
Let us make this more precise and exhibit some properties of this state. As an exercise
you can check the following results.
Z
hXi = h|X|i =
dx x|(x)|2 = 0.
so that the mean location of the particle is at the origin in this state. Next we have
Z
d2
2
2
hX i = h|X |i =
dx x2 |(x)|2 = .
2
Thus the dispersion in position is

d2
2
hX i = ,
2
i.e., d is the standard deviation of the probability distribution for position. Next we have
2
Z
hP i = h|P |i
dx (x)
h d
(x) = hk,
i dx
telling us that, on the average, this state has the particle moving with momentum h
k, and
Z
h2
d2
2
2
hP i = h|P |i
dx (x)(
h2 ) 2 (x) = 2 + h
2 k2 ,
dx
2d
so that the momentum uncertainty is

hP 2 i =
2
h
.
2d2
Thus the momentum uncertainty varies reciprocally with d relative to the position uncertainty. The product of position and momentum uncertainties is as small as allowed by the
uncertainty relation. One sometimes calls |i a minimum uncertainty state.
Because the Fourier transform of a Gaussian function is another Gaussian, it happens
that the momentum probability distribution is a Gaussian:
Z
(p) = hp|i =
dx hp|xihx|i
Z
i
1
e h px (x)
=
dx
2
h
s

1
d
2
2
exp 2 (p hk) d .
=
h
2
h
You can see that this Gaussian is peaked about the expected momentum value, as it
should be, and that its width varies like 1/d, i.e., reciprocal to the position uncertainty,
as expected.
In summary, the Gaussian state we have defined corresponds to a particle which (on
the average) is moving, and has a Gaussian spread of position and momentum values such
that the uncertainty product is minimized. One can use such states to model macroscopic
objects. Just put in reasonable values for d and k and you will find that the quantum
uncertainties are sufficiently small to be negligible.
Combining systems in quantum mechanics
In our generalization from a particle moving in 1-d to a particle moving in 3-d we
tacitly took advantage of a general quantum mechanical scheme for combining systems to
make composite systems. In particular, we viewed a particle moving in 3-d as essentially
3 copies of a particle moving in 1-d. We do this sort of thing all the time in physics. For
example, if we have a satisfactory kinematic model of a particle and we want to consider
a kinematic model for 2 particles, we naturally try to use two copies of the model that we
used for one particle. As another example, we have presented a model of a spin 1/2 system,
what if we want to describe 2 spin 1/2 systems? Or what if we want to describe a particle
with spin 1/2? All these situations require us to know how to combine quantum mechanical
models to make composite models. Here we outline the scheme using our generalization
from a 1-d particle to a 3-d particle as illustration. Later we will have occasion to use this
construction again.
2
The tensor product.

Suppose we wish to combine two systems described by Hilbert spaces H1 and H2 ,
with observables represented by linear operators A1 , B1 , . . . and A2 , B2 , . . ., which are, of
course, only defined on their respective Hilbert spaces. We can build up a Hilbert space
for the combined system as follows. We first consider the set consisting of all ordered pairs
of vectors from each Hilbert space. Typical elements are denoted
|, i := |i |i,
|i H1 , |i H2 .
These vectors are called product vectors. Physically, a product vector |i |i is a state
of the composite system in which system 1 is in state |i and system 2 is in state |i.
Product vectors admit a natural scalar multiplication inherited from that existing on
H1 and H2 :
c|, i := (c|i) |i |i (c|i).
It is possible to define a notion of addition on special product vectors. We have
|i |i + |i |i := (|i + |i) |i,
and
|i |i + |i |i = |i (|i + |i).
Obviously, this does not define an addition operation on all product vectors. For additions
that are not of the above special form we simply define
|i |i + |i |i,
to be a new element of our set. We then simply take the set of all formal linear combinations
of product vectors to be the set used in the new, composite Hilbert space. This set is a
vector space, called the tensor product of H1 and H2 , and is denoted by H1 H2 . Our
text uses the terminology direct product for this construction.
A basis for the tensor product can be obtained by taking all product vectors built from
a basis for each of H1 and H2 (exercise). If |ei i are a basis for H1 and |fi i are a basis for
H2 , then every |i H1 H2 can be written as
X
|i =
aij |ei i |fj i.
i,j
Note that not every basis for the tensor product is going to be built from product vectors.
Note also that the dimension of the product space is the product of the dimensions of
the individual Hilbert spaces. In other words, if n1 is the dimension of H1 and n2 is the
dimension of H2 , then H1 H2 has dimension n1 n2 . To see this, simply note that there
3
are n1 n2 scalars in the array of numbers aij in the above expansion of an arbitrary vector
in the product basis.
The tensor product space is to be a Hilbert space. The scalar product is defined as
follows. For product vectors we have
h, |, i = h|ih|i.
For general vectors we expand them in a basis of product vectors and then define the scalar
product via the usual linearity/anti-linearity of the scalar product (exercise).
Let us note that, since the space of states must be a vector space this forces us to
consider as state vectors all possible (normalizable) linear combinations of the product
state vectors. Such linear combinations will not be interpretable as saying that each
subsystem is in a definite state. Rather, in such linear combinations each subsystem will
have various probabilities to be in various states. Moreover, there will exist correlations
between the probability distributions for each subsystem. This is the essence of quantum
entanglement; it is responsible for very striking physical behavior of composite systems
compared to their classical analogs. Hopefully we will have a little time to discuss this
further at the end of the semester. As an example, consider a system consisting of 2 spin
1/2 systems (e.g, an electron and a positron with all degrees of freedom except for spin
ignored). A (product) basis for the Hilbert space of states could be the four combinations
of |i |i. Where the first vector denotes Sz eigenvectors of particle 1 and the second
denotes Sz eigenvectors of particle 2. These vectors represent states in which Sz for each
particle is known with certainty. A vector such as
1
|i = (|+i |+i + |i |i)
2
represents a state in which neither particle has statistically certain values for Sz .
You can easily check that all the constructions above can be implemented in an identical
fashion for the space of bras. In particular we have
h| h| = (|i |i) .
As an example, let us consider a particle moving in 2-d as built from a product of 2
particles (labeled by x and y) moving in 1-d. A basis for the particle moving in x is |xi
and a basis for the particle moving in y is |yi. The corresponding product basis is then
|xi |yi. A general vector in the product space can be written as
Z
|i = d2 x (|xi |yi)(hx| hy|)|i.
4
We define
(x, y) = (hx| hy|)|i.
You see that vectors are characterized by their position wave functions which are complexvalued functions of two variables. These wave functions must be square-integrable as you
can see by setting
Z
|i =
Z
h| =
dx dy (x, y)|xi |yi,

dx0 dy 0 (x, y)hx| hy|,
and computing (exercise)

Z
h|i =
d2 x |(x, y)|2 .
Of course, with unit normalization, one can intepret |(x, y)|2 to be the probability density
for finding position at (x, y).
Lecture 11
Product observables
We have seen how to build the Hilbert space for a composite system via the tensor
product construction. Let us now see how to build the observables. Let A1 be an observable
for system 1 and B2 an observable for system 2. We define the corresponding observables
for the composite system as follows. Consider the following linear operators on product
vectors:
A1 I|i |i = (A1 |i) |i,
I B2 |i |i = |i (B2 |i).
We extend the definitions of these operators to general states by expanding the state in a
product basis and demanding linearity of the operator. Thus defined, these operators are
Hermitian. Moreover, they commute (good exercises!):
[A1 I, I B2 ] = 0,
which means that one can still determine with statistical certainty the subsystem observables within the composite system. For example, if we are looking at 2 particles, it is
possible to ascertain both particle 1s position and particle 2s momentum with arbitrarily
good statistical accuracy. Likewise, when using this contruction to build a model for a
particle moving in 3-d from our 1-d model, we end up with all the position variables commuting among themselves and all the momentum variables commuting among themselves.
Of course, the commutation relations obtained in the subsystems appear in the combined
system.
Exercise
Show that if [A1 , B1 ] = C1 then
[A1 I, B1 I] = C1 I,
thus explaining the generalization of the canonical commutation relations we obtained for
a particle in 3-d.
Please take note that it is conventional in many references to simply drop the I or I
factors when denoting the extension of the subsystem operators to the composite system.
Indeed, it is common to also drop the symbol entirely!
Also note that not every observable of the system is of the above form. As usual, any
self-adjoint operator can represent an observable. Thus there will be observables that refer
1
to the composite system as a whole (e.g., the total energy) and are not observables of
either of the subsystems alone.
Returning to our example of a particle moving in 2-d. We have position operators for
each of the two particles: X I and I Y . For example, we have
X I|x, yi = X I|xi |yi = x|xi |yi.
Let us see how these operators act on position wave functions:
Y (x, y) = hx, y|Y |i = yhx, y|i = y(x, y),
where we used
Y |x, yi = y|x, yi = hx, y|Y = yhx, y|.
Similarly, we can define the momentum. For example you can check as a nice exercise that
P1 (x, y) = (hx| hy|)Px I|i =
(x, y).
x
(Hint: Px acts on hx| via infinitesimal translations.)

As an example of an observable that can only be defined on the whole system, consider
the total energy function for two particles moving in 1-d (equivalently, one particle moving
in 2-d).
1
H=
(Px2 + Py2 ) + V (X, Y ).
2m
It acts on position wave functions as (exercise)
2

h2
2
H(x, y) =
+
(x, y) + V (x, y)(x, y).
2m x2 y 2
Dynamics: the Schr

odinger picture
We now have enough tools to formulate quantum dynamics. Dynamics are characterized by the final postulate: Time evolution is a continuous unitary transformation. We
have, of course, just studied an example of a continuous unitary transformation: Translation by an amount a. In that application a is not identified with time, though. The idea of
dynamical evolution is that measureable aspects of the system are changing in time.* In
* For us, time will be modeled in its Newtonian form as a non-dynamical, a priori way of
ordering events, valid in any inertial reference frame. In particular, in this framework time
is not treated as an observable whose value depends upon the state of the quantum system,
but instead as a special kind of parameter. Of course, there are many interesting issues
here, but we will not be able to explore them further in this course. We simply assume
that some suitable standard of temporal reference has been chosen once and for all.
2
quantum mechanics the characterization of the system at any given time can be viewed as
the totality of probability distributions for all observables at that time. Time evolution,
therefore, ought to correspond to a time varying change in the probability distributions.
As we have noted already, all probability distributions can be computed by taking expectation values of suitable observables (e.g., characteristic functions, etc. ). Therefore, time
evolution can be defined in terms of the evolution of expectation values. The rules of the
game say that given an observable A the expectation value is to be computed via
hAi = h|A|i,
where |i is a unit vector in Hilbert space representing the state of the system and A is the
operator representing the observable. We now consider doing this as time varies. At each
instant of time we need to have a system of vectors and operators that can be used to make
physical predictions via, say, expectation values. Our strategy will be to assume we have a
single Hilbert space for all time, but we allow the mathematical identification of states and
observables to vary in time. To describe mathematically how expectation values change in
time we then have a number of options. Two such options are often convenient. First, we
can let the operator representing the observable change in time, with the vector |i fixed
for all time. This point of view of dynamics is called the Heisenberg picture, and will be
discussed later. Another way of viewing dynamics, known as the Schr
odinger picture, is
based upon letting the vector |i change in time while holding the operator representatives
of the observables fixed in time. There are infinitely many different pictures intermediate
to these two. We shall look first at the postulate on dynamics from the point of view of
the Schr
odinger picture.
The Schr
odinger picture: The time evolution operator
Much as we did with spatial translations, we assume that the state vector at time t,
denoted by |, ti, is related by a continuous, unitary transformation from the state at any
earlier time t0 . Therefore we write
|, ti = U (t, t0 )|, t0 i.
Here |, t0 i can in principle be any unit vector. U is unitary so that the state vector
remains normalized as time progresses:
h, t|, ti = h, t0 |U U |, t0 i = h, t0 |, t0 i = 1.
(It is not hard to show that a (bounded) operator on a complex Hilbert space vanishes
if and only if all its diagonal matrix elements vanish (exercise). Since |, t0 i is arbitrary
we see that the state vector remains normalized if and only if U = U 1 , that is, U is
unitary.) Given,
U (t, t0 ) = U 1 (t, t0 ),
3
we naturally associate the inverse transformation as time evolution from t back to t0 , so

that
U 1 (t, t0 ) = U (t0 , t).
Finally, we assume that time evolution from t0 to t can be viewed as time evolution from
t0 to t1 and then from t1 to t, with t0 < t1 < t, so that
U (t, t1 )U (t1 , t0 ) = U (t, t0 ).
The operator U is called the time evolution operator. The principal issue when studying
dynamics using quantum mechanics is to uncover the nature of the time evolution operator
for the given system. Generally speaking, this is not easy to do if one proceeds in a direct
fashion. After all, the time evolution operator has to be a pretty complicated object, given
that it knows how to evolve any initial state. A deep principle of physics/mathematics
going back to Lie is the following. When considering continuous transformations it is always
easier to work with infinitesimal transformations. This is because the finite transformation
is completely characterized by the infinitesimal transformation, which is a much simpler
structure than the finite transformation. This is one of the reasons that you usually find
dynamical evolution (whether in Newtonian dynamics, fluid dynamics, electrodynamics,
etc.) expressed in terms of differential equations.
Thus we consider the infinitesimal generator of time evolution, viewed as a continuous
unitary transformation. The infinitesimal generator of the unitary time evolution will
be an observable called the Hamiltonian (in analogy with classical mechanics where the
Hamiltonian is the generating function of a canonical transformation corresponding to
motion in time). Normally, the Hamiltonian represents the energy, which is conserved
provided H does not depend upon the time. Indeed, just as we defined the momentum as
the generator of spatial translations, we define the Hamiltonian/energy as the generator
of time translations. One subtlety which occurs here, unlike what occurs with spatial
translations, is that the Hamiltonian may be time dependent, that is, it may be a family of
operators paramterized by the time. This demands a slightly more sophisticated analysis
than we used for the spatial translations.
Lecture 12
The Hamiltonian and the Schr
odinger equation
Consider time evolution from t to t + . We have

i
U (t + , t) = I + H(t) + O(2 ).
h
As usual, the unitarity of U implies that H(t) is Hermitian, i.e., it represents an observable
the Hamiltonian.
There is one significant difference between the spatial momentum and the Hamiltonian,
however. The spatial momentum is defined once and for all by its geometrical nature (in
the Schr
odinger picture). The Hamiltonian depends on the details of the interactions
within the system and with its environment. Thus there can be many useful Hamiltonians
for, say, a particle moving in 3-d, but we always use the same momentum operator (in the
Schr
odinger picture).
Let us extract the Hamiltonian in a different way. We have that
U (t + , t0 ) = U (t + , t)U (t, t0 ).
Substitute our result above for U (t + , t), divide both sides by and take the limit as
0, thereby defining the derivative of U . We get (exercise)
ih
dU (t, t0 )
= H(t)U (t, t0 ).
dt
Let us turn the logic of this around. Let us suppose that, given a self-adjoint Hamiltonian,
H(t), we can define U (t, t0 ) by the above differential equation.* When solving the differential equation an initial condition will have to specified in order to get a unique solution.
The initial condition we need is that
U (t0 , t0 ) = I.
Thus we can say that, given a Hamiltonian, the time evolution of the system is determined.
By focusing attention on H rather than U we get a considerable advantage in our ability
to describe physical systems. Indeed, we shall always define the dynamics of a system by
specifying its Hamiltonian. Note that it is much easier to give a formula for the energy of
* This supposition can be proved rigorously when the Hamiltonian doesnt depend upon
time. One will have to make additional hypotheses in the more general case, but we wont
worry with those technical details.
1
a dynamical system than to explicitly display its dynamical behavior. Indeed, rarely will
we be able to explicitly compute the time evolution operator.
The relationship we just derived between the time evolution operator and the Hamiltonian is an abstract version of the Schr
odinger equation. To see this, simply apply both
sides of this operator relation to an arbitrary state vector, representing the initial state of
a system at time t0 . We have
ih
dU (t, t0 )
|, t0 i = H(t)U (t, t0 )|, t0 i.
dt
This is
d
|, ti = H(t)|, ti,
dt
which is the traditional form of the Schr
odinger equation in terms of abstract vectors.
Perhaps you would be more familiar with its coordinate wave function form in the case
where the Hamiltonian is of the kinetic+potential form for a particle:
ih
H=
P2
~
+ V (X).
2m
We then get
P2
~
+ V (X)|i
=
H(~x) = h~x|
2m
!
2 2
h
+ V (~x) (~x).
2m
To see this, you should verify that

h~x|Pi |i =

h
(~x),
i xi
h~x|(Pi )2 |i =
h2
2
(~x),
xi2
and, using the definition,

~ =
V (X)
d3 x V (~x)|~xih~x|,
that
~
h~x|V (X)|i
= V (~x)(~x).
We also have
d
|, ti = i
h (~x, t).
dt
t
So, the Schrodinger equation is (after taking components in the position basis)
!
h2 2
+ V (~x) (~x, t) = i
h (~x, t).
2m
t
h~x|i
h
This does not mean U does not exist, of course, but rather it means the dynamical evolution
of the system is sufficiently complicated that no simple formula will suffice to describe it.
2
Solving the Schrodinger equation for the state vector at time t given any initial
state vector is equivalent to determining the time evolution operator. You see that the
Schr
odinger equation simply captures the fact that the Hamiltonian is the generator of
time evolution. To see what that time evolution actually is, one needs to get information about the solutions to the Schr
odinger equation. But a key observation here is that
the solutions are, ultimately, determined by the choice of Hamiltonian. Determining the
Hamiltonian is the key step in making a model of a physical system.
Formal solutions to the Schr

odinger equation
It is possible to give formulas for the time evolution operator analogous to the exponential form of the spatial translation operator. There are 3 cases to consider. First suppose
that H doesnt depend upon time. Then we are in an identical setting, mathematically
speaking, as with the spatial translations. We have
i
U (t, t0 ) = e h (tt0 )H .
You can easily check that this operator satisfies the operator version of the Schr
odinger
equation including the initial condition.
Second, suppose that H = H(t), but that for any times t, t0 we have that
[H(t), H(t0 )] = 0.
Then it is not hard to check that
hi
Rt
U (t, t0 ) = e
t0
dt0 H(t0 )
Note that this formula includes the previous result as a special case. To check this result
just note that one can manipulate the operators as if they were ordinary functions since
all the different operators H(t) commute.
Finally, suppose that H = H(t), but that the operators at different times do not
commute. This case is somewhat harder and we shall take a crack at it much later. For
completeness, let me just say that the resulting evolution operator is given in terms of the
time ordered exponential,
hi
U (t, t0 ) = Te
Rt
t0
dt0 H(t0 )
For a formula, see your text. We wont be using this last case for a while so we defer its
discussion.
3
State vector evolution when H

t = 0
From now on, let us focus on the common case where H
t = 0. We have seen how to
build U (t, t0 ) in this case. Let us have a look at how state vectors evolve in time (in the
Schr
odinger picture). Given H, let us denote its orthonormal basis of energy eigenvectors
by |Ei i. Any state can be expanded in this basis, particularly the initial state:
|, t0 i =
hEj |, t0 i|Ej i.
The state at time t is given by

i
|, ti = e h (tt0 )H
hEj |, t0 i|Ej i
hEj |, t0 ie h (tt0 )Ej |Ej i.
As a good exercise you should check directly that this formula gives the solution to the
Schr
odinger equation matching the initial state |, t0 i. So, the effect of time evolution can
be viewed as a transformation of the components in the energy basis:
i
hEj |, t0 i hEj |, t0 ie h (tt0 )Ej .

This is a very important result. If you want to find the state vector at time t (in
the Schr
odinger picture) given the initial state vector you must perform the following
computations.
(1) Find the energy eigenvectors and eigenvalues.
(2) Expand the initial state vector in the energy basis.
(3) The components of the state vector at time t in the energy basis are the original
i
components times the phase e h (tt0 ) .
You can now see why the energy eigenvectors are so important. Finding them is the
key to understanding time evolution.
Stationary states
Suppose the Hamiltonian does not depend upon time. If the initial state vector for a
system is an energy eigenvector,
H|Ek i = Ek |Ek i,
4
(so that the energy is known with certainty at t = t0 ) then the this state vector changes
in a trivial fashion as time progresses. This can be seen in 2 ways. Firstly, simply apply
the time evolution operator:
i
U (t, t0 )|(t0 )i = e h (tt0 )H |Ek i = e h (tt0 )Ek |Ek i.

You can also check this result by using the 3 step process described above (exercise). The
energy eigenvectors change only by a phase factor, hence all physical predictions (probability
distributions) will not change in time. Thus energy eigenvectors are states in which the
system exhibits stationary behavior. Such states are naturally called stationary states.
Nothing ever happens in a stationary state.
Often times we prepare the initial state by a filtering process based upon measurement
of one (or more commuting) observables. If these observables commute with the Hamiltonian, then one can arrange that the prepared/filtered states are energy eigenvectors, thus
stationary states.
Let me point out a paradox here. Surely you have heard of the process called spontaneous emission. You know, where you put an atomic electron in an excited state and
even if you do notthing to the atom it will decay to a lower energy state? And I know
you have learned about energy levels of atoms in a previous course in quantum mechanics.
The energy levels are eigenvectors of the Hamiltonian for the atom, i.e., stationary states.
Now the paradox appears: how can an excited state of the atom a stationary state
decay into anything if nothing every happens in a staionary state? Think about it!
Conservation of energy
The components of a state vector along the energy eigenvector basis determine the
probability distribution for the energy in that state. Time evolution amounts to multiplying
these coefficients by phase factors. In general, this changes the vector in a non-trivial way;
this causes probability distributions for various observables to change in time. However,
since the probabilities for energy arise via the absolute values of the components, you
can easily see that the probability distribution for energy does not change in time (still
assuming that the Hamiltonian is time independent). This is how energy is conserved in
quantum mechanics.
Of course, in classical mechanics we usually consider conservation of energy to be
characterized by (i) measure the energy initially, (ii) measure the energy at some later
time, (iii) energy is conserved if these results are the same. One can do that here as well,
but if you measure the energy initially you will then have prepared/filtered your system
state to be an energy eigenvector. Then, of course, the probability distribution is quite
simple! As we have seen, if the system is in an energy eigenvector at one time, it remains
5
there for all time. Thus you will always get the same result for any subsequent measurement
of energy. That is certainly much like the classical way of viewing conservation of energy.
However, as we have seen, in a stationary state all probability distributions are time
independent. In non-stationary states, where the dynamical evolution is non-trivial, one
cannot say with statistical certainty what the initial energy is one has a non-trivial
probability distribution. What one can say is that, while other probability distributions
will, in general, change in time (energy-compatible observables being the exceptions), the
energy probability distribution will not. This is what conservation of energy means in
quantum mechanics.
Lecture 13
Example: Spin 1/2 in a uniform magnetic field.
Let us consider the dynamical evolution of an electronic spin in a (uniform) magnetic
field. We ignore the translational degrees of freedom of the electron. The electron magnetic
moment is an observable which we represent as
e
=
S,
mc
(Here e > 0 is the magnitude of the electron electric charge.) The Hamiltonian for a
magnetic moment in a (uniform, static) magnetic field B is taken to be
e
H = B =
S B.
mc
Let us choose the z axis along B so that

H=
eB
mc

Sz .
Evidently, Sz eigenvectors are energy eigenvectors. Thus the Sz eigenvectors are stationary
states.
Let us consider the time evolution of a general state,
|(t0 )i = a|+i + b|i,
We have
|a|2 + |b|2 = 1.
i eB
|(t)i = e h mc Sz |(t0 )i
= aeit/2 |+i + beit/2 |i,
where
eB
.
mc
From this formula you can easily see that if the initial state is an Sz eigenvector, then it
remains so for all time. To see dynamical evolution, we pick an initial state that is not an
energy eigenvector. For example, suppose that at t = 0, a = b = 1 , i.e., |(0)i = |Sx , +i.
=
What is the probability for getting Sx = h2 at time t? We have (exercise)

P rob(Sx =
h
t
) = |hSx , +|(t)i|2 = cos2 ( ),
2
2
t
P rob(Sx = ) = |hSx , |(t)i|2 = sin2 ( ).
2
2
1
Thus an effect of the magnetic field is to cause the x-component of the spin to periodically
flip. One can visualize the behavior of the spin by following its expectation value in
time. Still using |(0)i = |Sx , +i we have (good exercise)
hSx i(t) = h(t)|Sx |(t)i =
h
cos t,
2
hSy i(t) = h(t)|Sx |(t)i =
h
sin t,
2
hSz i(t) = 0.
Thus, on the average, the spin vector precesses about the axis determined by the magnetic
eB .
field, and in the plane orthogonal to it, with angular velocity = mc
Time rate of change of expectation values
Let us consider how expectation values change in time. We have (assuming that the
observable of interest does not depend upon time, so that its operator representative does
not change in time)
d
d
hAi(t) = h(t)|A|(t)i.
dt
dt
Using
1
d
1
d
|(t)i = H|(t)i,
h(t)| = h(t)|H,
dt
ih
dt
ih
we get (exercise)
d
1
hAi(t) = h[A, H]i(t).
dt
i
h
Thus, at least on the average, dynamical evolution of an observable is controlled by its
incompatibility with H. Of course, if A and H are compatible, then they admit a common
basis of eigenvectors whence the probability distribution for A is time independent for the
same reason it is for H (see below). As an exercise you can use the result shown above to
derive the time rate of change of the expectation values that we studied in the spin 1/2
precession example in the last lecture.
Conservation laws
We have already seen (assuming H
t = 0) that H (usually meaning the energy) is
conserved in the sense that the probability distribution for H is time-independent. In a
special class of states, the stationary states, the energy is known with certainty for all
time and all other probability distributions are time independent. However, It is possible
to have conserved quantities besides the energy of course without choosing a stationary
state. We say that an observable A is conserved if its probability distribution is always
time independent irrespective of the initial state of the system. It is not hard to see that
A is conserved if and only if it is compatible with H:
[A, H] = 0.
2
The if part of the statement is proved as follows. If the commutator vanishes, then
the basis of energy eigenvectors can be chosen to also be A eigenvectors. Let us denote
those eigenvectors by |ki, where
A|ki = ak |ki,
H|ki = Ek |ki.
As before, the initial state can be expanded as

|, t0 i =
ck |ki,
where |ck |2 is the probability for getting ak and/or Ek upon measurement of the observable
(represented by) A and/or H. Because the basis is built from H eigenvectors we still have
|, ti =
ck e h Ek (tt0 ) |ki,
where
i
|ck e h Ek (tt0 ) |2 = |ck |2

is the probability for getting ak at time t. Thus the probability distribution for A is
time-independent.
The only if part of the statement is proved as follows. Note that if the probability
distribution for A is to be time-independent then, using the results of the previous section,
we must have
h, t0 |[A, H]|, t0 i = 0
for all possible initial state vectors |, t0 i. As we have mentioned before, if an operator
on a complex Hilbert space has vanishing diagonal matrix elements then it is the zero
operator.
Time-Energy Uncertainty Principle

We have seen that (for a time-independent Hamiltonian) energy eigenvectors define
stationary states, so that if the energy is known with certainty at one time, then the state
is unchanged for all time, i.e., all probability distributions are time independent. Physical attributes of the system will evolve in time only if the initial state of the system is a
superposition of energy eigenvectors with different energies. Of course, in such an initial
state the energy is not known with certainty, but rather has a non-trivial probability distribution. Thus a the statistical uncertainty in energy is related to the time rate of
change of (probability distributions of) observables. The infamous time-energy uncertainty
principle relates the time scale t for a significant change of a given (initial) state to the
3
statistical uncertainty of the energy E in that state (other misleading slogans notwithstanding). You will see that this uncertainty principle is a little different than, e.g., the
position-momentum uncertainty principle, though it does take the form
tE
h.
To be continued . . .
Lecture 14
Time-Energy Uncertainty Principle (cont.)
Suppose the energy is discrete, for simplicity, with values Ek and eigenvectors |ki. Any
state can be written as
X
|i =
ck |ki.
k
Assuming this is the initial state at t = t0 , the state at time t is given by

X
i
U (t, t0 )|i =
ck e h Ek (tt0 ) |Ek i.
k
Let us use an observable A to characterize the change in the system in time (which is, after
all, what we actually do). Let us denote the standard deviation of A (or H) in the initial
state |i by A ( or E). From the uncertainty relation we have in the initial state
1
AE |h[A, H]i|.
2
Recall our previous result which relates time evolution of expectation values to commutators; we get
1
hd
h[A, H]i =
hAi.
2
2 dt
Therefore:
h d
AE | hAi|.
2 dt
If we want to use A to characterize the time scale for a significant change in the system
we can do this by comparing the rate of change of the average value of A to the initial
uncertainty in A:
A
t = d
.
| dt hAi|
With t so-defined we then have
h
.
2
So, the shortest possible time scale that characterizes a significant change in the system is
given by
tE
h.
tE
Of course, if the (initial) state is stationary that is, an energy eigenvector, then E = 0,
which forces t , which makes sense since the physical attributes of the state never
change.
1
The time-energy uncertainty principle is then a statement about how the statistical
uncertainty in the energy (which doesnt change in time since the energy probability distribution doesnt change in time) controls the time scale for a change in the system. In
various special circumstances this fundamental meaning of the time-energy uncertainty
principle can be given other interpretations, but they are not as general as the one we have
given here. Indeed, outside of these special circumstances, the alternative interpretations of the time-energy uncertainty principle can become ludicrous. What I am speaking
of here are things like the oft-heard You can violate conservation of energy if you do it
for a short enough time, or The uncertainty of energy is related to the uncertainty in
time. We shall come back to these bizarre sounding statements and see what they really
mean a little bit later. For now, beware of such slogans.
As a nice example of the time-energy uncertainty principle, consider the spin precession
problem we studied last time. Recall that we had a uniform, static magnetic field B = B k
along the z axis. The Hamiltonian is
H=
eB
Sz .
mc
We studied the time dependence of the spin observables when the initial state was an Sx
eigenvector. It is not hard to compute the standard deviation of energy in the initial state
|Sx , +i. Using

eB
h 2
2
H =
I,
2mc
we have (exercise)
1
eB
h
=
h,
2 mc
2
so that we expect a significant change in the state when
E =
1
t 1.
2
Thus the frequency controls the time scale for changes in the system, as you might have
already ascertained from, e.g., the probability distributions

h
cos2 ( 2 t)
P rob(Sx = )(t) =
.
2
sin2 ( 2 t)
Heisenberg picture
Let us now see how to describe dynamics using the Heisenberg picture, in which we
encode the time evolution into the operator representatives of the observables rather than
in the state vectors. The idea is that time evolution is mathematically modeled by allowing
2
the correspondence between physical observables and self-adjoint operators to change in

time. To see how this works is straightforward, given our previous work in the Schr
odinger
picture.
It is useful to use a very explicit notation. We denote the physical observable by A.
We denote its time-independent, self-adjoint operator representative by A. We have*
hAi(t) = h(t)|A|(t)i = h(t0 )|U (t, t0 )AU (t, t0 )|(t0 )i.
If we define
A(t) = U (t, t0 )AU (t, t0 ),
then we can view time evolution as occurring, mathematically speaking, through the time
dependent identification of a self-adjoint operator A(t) to the observable A:
hAi(t) = h(t0 )|A(t)|(t0 )i.
We call A the Schrodinger picture operator/observable and we call A(t) the Heisenberg
picture operator/observable. You can see that at t = t0 the Schr
odinger and Heisenberg
representatives of A are the same.
In the Heisenberg picture the unit vector representing the state of the system is fixed
once and for all,
|(t)i = |(t0 )i,
while the operator-observables evolve in time. In the Schr
odinger picture the states evolve
in time while the operator-observables are held fixed.
As you know, one of the most basic predictions of quantum mechanics is that the set of
possible outcomes of a measurement of A is the spectrum of its operator representative. In
the Schr
odinger picture, we thus consider the spectrum of A to get at the possible outcomes
of a measurement of A. In the Heisenberg picture, we have to, evidently, consider a different
operator at each time to see what are the possible outcomes of a measurement of A at
each time. This looks bad. If, in the Heisenberg picture, the operator representing A can,
in principle, be different at different times, we are in danger of saying that the possible
outcomes of a measurement of A which is a fixed set for all time in the Schr
odinger
picture is different at different times. For example, perhaps it is possible that a particle
could, at one instant of time, have the whole real line to move on while it could only move
on some subset of the real line at some other time. This would be a serious inconsistency.
* For simplicity, we shall always restrict our attention to observables A which do not involve
the time t explicitly in their definition. What I mean here is that we shall exclude from our
discussion observables like the energy multiplied by the time. It is easy to allow for such
observables, but it might be confusing at first glance. Your text shows how to generalize
our results to the case of explicitly time dependent observables.
3
Of course, there is no inconsistency. You can easily check that if |ai i are the eigenvectors
of A,
A|ai i = ai |ai i,
then
|ai (t)i := U (t, t0 )|ai i
are eigenvectors of A(t) with the same eigenvalue:
A|ai (t)i = ai |ai (t)i.
In fact, it is not hard to prove that the spectrum of A(t) is identical to the spectrum of
A.* Thus the possible outcomes of a measurement of A are the same in both pictures.
What does change between the two pictures is the mathematical representation of the
states in which the observable A is fixed with statistical certainty. As you know, the states
where A are known with certainty are the eigenvectors of the operator representative. In
the Heisenberg picture we then get, in general, a different eigenvector at each time. This
is exactly what is needed to get the proper time evolution of probability distributions:
P rob(A = ai ) = |hai (t)|(t0 )i|2 = |hai |U (t, t0 )|(t0 )i|2 = |hai |(t)|i|2 ,
where the last expression is the Schr
odinger picture formula for the probability.
This result on the eigenvectors changing in time can lead to confusion, so let me belabor
the point a bit. The state vector in the Heisenberg picture is the same for all time, but
the (basis of) eigenvectors of an observable will, in general, change in time. If you start
your system off in an eigenvector of A(t0 ), this means that the observable A is known
with certainty at time t = t0 and we have |(t)i = |(t0 )i = |a(t0 )i. At some other time
t, the state vector is still |a(t0 )i, but this is no longer a state in which A is known with
certainty since that state vector would be |a(t)i, not |a(t0 )i. One sometimes summarizes
this situation with the slogan: In the Schr
odinger picture the basis vectors (provided by the
observables) are fixed, while the state vector evolves in time. In the Heisenberg picture the
state vectors are held fixed, but the basis vectors evolve in time (in the inverse manner).
This is an instance of the active vs. passive representation of a transformation in this
case time evolution.
* This is true for any two operators related by a unitary transformation A U AU .

4
Lecture 15
Compatible/Incompatible observables in the Heisenberg picture
Let us also note that the notion of compatible/incompatible observables is left undisturbed by the transition to the Heisenberg picture. This is because the commutator transforms as (exercise)
[A(t), B(t)] = [U AU, U BU ] = U [A, B]U.
(Note any two operators related by a unitary/similarity transformation will have this
commutator property.) Thus, if A and B are (in)compatible in the Schr
odinger picture
they will be (in)compatible (at each time) in the Heisenberg picture. Note also that the
commutator of two observables in the Schr
odinger picture which is i times another
observable makes the transition to the Heisenberg picture just as any other Schr
odinger
observable, namely, via the unitary transformation
A U AU.
Unitary transformations in general

Let us note that while our discussion was phrased in the context of time evolution,
the same logic can be applied to any unitary transformation. For example, for a particle
moving in one dimension one can view the effect of translations as either redefining the
state vector, leaving the operator-observables unchanged:
|i Ta |i,
AA
or equivalently as redefining the observables, with the state vectors unchanged:
A Ta ATa ,
|i |i.
Note, in particular, that the position and momentum operators change in the expected
way under a translation (exercise you played with this stuff in the homework):
Ta XTa = x + a,
Ta P Ta = p.
Heisenberg equations
We saw that the conventional Schr
odinger equation is really just a consequence of the
relation between the time evolution operator and its infinitesimal generator in the context
of the Schrodinger picture:
d
d
ih U (t, t0 ) = H(t)U (t, t0 ) i
h |(t)i = H(t)|(t)i.
dt
dt
1
Given a Hamiltonian, this equation is the starting point for investigating quantum dynamics in the Schrodinger picture. We can now ask: what is the analog of this in the Heisenberg
picture? In the Heisenberg picture, dynamical evolution occurs through the operatorobservables, which for simplicity we assume to be time independent in the Schr
odinger
picture. We have
A(t) = U (t, t0 )AU (t, t0 ).
Differentiating both sides and using our basic differential equation for U (t, t0 ) we get
d
1
1
A(t) = U (t, t0 )H(t)AU (t, t0 ) + U (t, t0 )A H(t)U (t, t0 )
dt
ih
i
h
1
1
= U (t, t0 )H(t)U (t, t0 )U (t, t0 )AU (t, t0 ) + U (t, t0 )AU (t, t0 )U (t, t0 )H(t)U (t, t0 )
ih
i
h
1
= [A(t), HHeis (t)].
ih
Here we have introduced the Heisenberg picture version of the Hamiltonian:
HHeis (t) = U (t, t0 )H(t)U (t, t0 ).
If the (Schrodinger ) Hamiltonian is time independent, as is often the case, then we have
(exercise)
H
HHeis (t) = H(t) = H,
= 0.
t
This is pretty important to keep in mind.
The equation
d
1
A(t) = [A(t), HHeis (t)]
dt
i
h
is the Heisenberg equation of motion for the Heisenberg operator A(t). Given a Hamiltonian
it is a differential equation that, in principle, can be solved to find the Heisenberg operator
corresponding to an observable at time t, given initial conditions
A(t0 ) = A.
Given A(t) one gets the time dependence of probability distributions in the usual way.
The outcome of a measurement of the observable represented (at time t) by A(t) is one of
the eigenvalues ai . The probability for getting ai (assuming no degeneracy) at time t is
P (ai , t) = |hai , t|, t0 i|2 = |hai , t0 |U (t, t0 )|, t0 i|2 ,
where |ai , ti is the eigenvector of A(t) with eigenvalue ai and |, t0 i is the state vector for
the system.
2
The Heisenberg equation can make certain results from the Schr
odinger picture quite
transparent. For example, just by taking expectation values on both sides of the equation
using the (single, time-independent) state vector it is apparent that (exercise)
d
1
hAi(t) = h[A, H]i.
dt
i
h
Note that here the notation [A, H] means the observable corresponding to the commutator,
which is well defined in either picture. Similarly, it is easy to see that operators that
commute with the Hamiltonian (assumed time-independent for simplicity, so that HHeis =
H) at one time t0 are constants of the motion:
1
d
A(t) = [A(t), H] = U [A(t0 ), H]U = 0.
dt
ih
Thus, conserved quantities satisfy
A(t) = A(t0 ) = A.
You can also see this result directly. If an observable commutes with H at one time, t = t0
say, then it will not change in time since
A(t) = U (t, t0 )A(t0 )U (t, t0 ) = U (t, t0 )U (t, t0 )A(t0 ) = A(t0 ).
Evidently, a conserved quantity has a time independent probability distribution since the
operator in the Heisenberg picture does not change in time.
Functions of observables
We have seen that the time evolution operator defines, via a similarity transformation,
the Heisenberg operators at time t:
A(t) = U AU.
If we have an observable that is a function of A, F (A) say, we have, of course
F (A)(t) = U F (A)U.
It is important to note that one can also express this as
F (A)(t) = F (A(t)) F (U AU ).
To see this, we use the spectral decomposition:
X
F (A) =
F (a)|aiha|,
a
so that
U F (A)U =
F (a)U |aiha|U =
F (a)|a(t)iha(t)| = F (A(t)).
Heisenberg Picture Dynamics of a Particle in a potential

For a particle with position (at one time, say t = 0) X and momentum P (these are
spatial vectors made of operators we are going back to our original notation), we consider
a Hamiltonian of the form
P2
H=
+ V (X).
2m
Note that this operator does not depend upon time, so it is both the Schr
odinger and
Heisenberg Hamiltonian. In particular, we have that
P (t)2
+ V (X(t)).
2m
This result can be viewed as a mathematical version of the conservation of energy in the
Heisenberg picture.
H(t) = H(0) = H =
Let us compute the Heisenberg equations for X(t) and momentum P(t). Evidently, to
do this we will need the commutators of the position and momentum with the Hamiltonian.
To begin, let us consider the canonical commutation relations (CCR) at a fixed time in
the Heisenberg picture. Using the general identity
[A(t), B(t)] = U (t, t0 )[A(t0 ), B(t0 )]U (t, t0 ),
we get (exercise)
[X i (t), X j (t)] = 0 = [Pi (t), Pj (t)],
[X i (t), Pj (t)] = i
hji I.
In other words, the Heisenberg position and momentum operators obey the CCR at any
fixed time. Note that this means the uncertainty relation between position and momentum
is time independent a fact you can also prove in the Schr
odinger picture (exercise).
It is now straightforward to compute
[X i (t), H] =
ih i
P (t),
m
so that
1
dX i (t)
= P i (t).
dt
m
Thus one relates the momentum and velocity of the particle; a result that is a bit more
tricky to establish in the Schrodinger picture. To compute the Heisenberg equations for
the momentum we need to compute (exercise)
[Pi (t), H] = [Pi (t), V (X(t))].
4
Probably the simplest way to do this is to use

[Pi (t), V (X(t))] = U (t, 0)[Pi , V (X)]U (t, 0) = i
hU (t, 0)
V
V
(X)U (t, 0) = i
h i (X(t)).
i
x
x
Here we have used
V
(X),
xi
which can be verified by checking it on the position eigenvector basis using the definition
of Pi as the generator of infinitesimal translations (good exercise). All together we get
[Pi , V (X)] = i
h
V
dPi (t)
= i (X(t)).
dt
x
Using the Heisenberg equations for X(t), we can write the Heisenberg equation for P(t) as
d2 X i (t)
= F i (X(t)),
dt2
where
V
(X(t)),
xi
can be viewed as the quantum representation of the force at time t in the Heisenberg
picture. This is a quantum version of Newtons second law.
F i (X(t)) =
From this result it is tempting to believe that a quantum particle is actually behaving
just like a classical particle. After all, the basic observables obey the same equations of
motion in the two theories. Of course this is not true, if only because it is not possible to
know both position and momentum with statistical certainty in the quantum theory. In
the next section we will take a closer look at this issue.
As a elementary example, let us consider a free particle in 1-d, V = 0. The Heisenberg
equations are
dX(t)
P (t)
dP (t)
=
,
= 0,
dt
m
dt
with solution (setting t0 = 0)
X(t) = X(0) +
P (0)
t,
m
P (t) = P (0).
Here X(0) = X and P (0) = P are the operators we discussed previously in the Schr
odinger
picture. The momentum is evidently a constant of the motion: the momentum probability
distribution for a free particle is time independent. The position probability distribution
changes in time, e.g.,
1
hXi(t) = hXi(0) + hP i(0).
m
5
Lecture 16
Ehrenfests Theorem
The Heisenberg equations are appealing because they make formal contact with the
Hamilton equations of classical mechanics. In classical mechanics functions on phase space
represent the observables, and the time rate of change of an observable A is controlled by
the Poisson bracket with the Hamiltonian:
dA
= {A, H}.
dt
The formal correspondence with quantum mechanics is made via
{A, B}
1
[A, B],
i
h
where the observables are represented by functions on phase space on the left and operators
on the right. This formal correspondence implies that expectation values will, in a suitable
approximation, follow classical trajectories, a result known as Ehrenfests theorem.
To derive this theorem in the Heisenberg picture is quite easy. Take the expectation
value of the quantum form of Newtons second law,
d2 X i (t)
V
=
(X(t)),
dt2
xi
and use the time independence of the state vector to obtain (exercise)
d2
hXi(t) = hFi(t),
dt2
where F is the force. This result is Ehrenfests theorem.
Exercise: How would you derive this equation in the Schr
odinger picture?
It is often said that Ehrenfests theorem shows that expectation values obey the classical dynamical laws. This slogan is not quite true. In particular, the expectation value
of position does not necessarily obey Newtons second law. A true version of Newtons
second law for the expectation value would read
m
~
d2 hX i i(t)
V (hXi(t))
=
.
dt2
hX i i
1
Of course, this latter equation is not what quantum mechanics gives, in general. To get
this last result we need
~
~
V (X(t))
V (hXi(t))
h
i
=
,
X i
hX i i
whose validity depends upon the state vector being used as well as on the form of the
potential energy operator. A simple example where this equality does hold is provided by
the harmonic oscillator potential in one dimension (to be discussed below in more detail)
for which
1
V (X) = kX 2 ,
2
so that
V (X(t))
= kX,
X
which satisfies for any state vector
hkXi(t) =
V (hXi(t))
= khXi(t).
hXi
The harmonic oscillator example is exceptional. Generally, Ehrenfests theorem does

not imply that expectation values obey classical equations of motion. For example, consider
the potential
1
V (X) = kX 3 .
3
We have in this case
V (hXi(t))
= khXi2 (t),
hXi
which does not, in general, equal
V (X(t))
i = khX 2 i(t).
X
Indeed, these two expressions agree if and only if the dispersion in X(t) vanishes which it
never does, strictly speaking. (On the other hand, one can find states which make the position dispersion arbitrarily small at the expense of a large dispersion in momentum.) Thus,
according to Ehrenfests theorem, the expectation values of position for this cubic potential
will only agree with the classical behavior insofar as the dispersion in position is negligible
(for all time) in the chosen state. This result illustrates a general rule: classical behavior
for particle motion arises when the statistical uncertainties in the basic observables are
sufficiently small.
h
Harmonic Oscillator: Definitions, the Hamiltonian

We now begin a survey of key properties of one of the workhorse models of quantum
mechanics: the simple harmonic oscillator (SHO). This model is useful because it is analytically quite tractable, it illustrates a wide variety of quantum mechanical features both
2
physically and at the level of the formalism itself, and it provides a very nice first approximation to many simple models of real physical systems in the vicinity of stable equilibrium.
It is remarkable how often the idea of a harmonic oscillator appears in physics. For example, the harmonic oscillator illustrates generic features of bound state behavior, it
is used to model the confinement of quarks in nucleons, it is used to model the spectral
properties of molecules, it is the elementary building block of the free quantum field.
One can use harmonic oscillators to understand photons!
The SHO can be viewed mathematically as a particle moving in one dimension under
the influence of a quadratic potential (although, physically, it doesnt usually arise that
way). The position is X, the momentum is P , which we have already defined in some
detail (in the Heisenberg pictures these are the position and momentum operators at the
initial time). The dynamics are generated by the Hamiltonian (in either the Schr
odinger
or Heisenberg picture)
P2
1
H=
+ m 2 X 2 .
2m 2
Classically, this would be the energy function for a mass on a spring dynamical system
where m is the mass and is the natural frequency of the spring according to Hookes
law. We retain that terminology here.
The position and momentum observables we have studied in some detail. For this
simple one-dimensional system, the Hamiltonian is the center of attention. Of course, we
would like to understand its spectrum and eigenvectors, since these characterize the possible energies and the states where the energy is known with statistical certainty. Moreover,
the spectral properties of H will determine the dynamical evolution of the oscillator in the
Schr
odinger picture. The algebraic (commutator) properties of H, X and P will control
the dynamics in the Heisenberg picture. In fact, as you will see, the algebraic properties
of the operators H, X and P essentially tell us everything we want to know.
Spectrum of the SHO Hamiltonian

To analyze H we follow an elegant algebraic approach pioneered (I believe) by Dirac.
Rather than working directly with the position and momentum operators, a more convenient set of operators can be defined as follows. Set
r
r

m
i
m
i
a=
X+
P , a =
X
P ,
2h
m
2h
m
so that (exercises)
r
X=
h
(a + a ),
2m
1
P =
i
3
h
(a a ).
2m
Note that a and a can be viewed as quantum versions of the complex amplitudes featuring in the general solution of the equations of motion of a classical oscillator (exercise).
We will say a little more about this later.
From the CCR for position and momentum, we have (exercise)
[a, a ] = I.
A straightforward computation reveals (exercise)
1
1
H = h(aa + a a) = h(a a + I).
2
2
Evidently, the gist of the Hamiltonian is to be found in the number operator
N := a a.
This operator is self-adjoint, and we denote its eigenvectors, for now, by |i,
N |i = |i,
so that the energy eigenvectors are |i with
1
h.
E = ( + )
2
Note that the eigenvalues are dimensionless (exercise).
H|i = E |i,
It is not hard to check that*

[N, a ] = a .
[N, a] = a,
This implies that (exercise)
N (a |i) = ( + 1)(a |i),

and
N (a|i) = ( 1)(a|i).
Therefore a (a) takes an eigenvector of N and turns it into an eigenvector with eigenvalue
raised (lowered) by one unit. Correspondingly, the action of a (a) on an energy eigenvector
yields an eigenvector with energy raised (lowered) by one quantum, h
. For this reason
a and a are often called creation and annihilation operators, or ladder operators or
raising and lowering operators. So, if one has a single eigenvector of N (or H), then one
will have an infinity of such eigenvectors with eigenvalues all differing by integers (or by
integer multiples of h). This result follows merely from the commutation relations for
position and momentum (or a and a ).
To be continued. . .
* One way to check this is to use the identity
[AB, C] = A[B, C] + [A, C]B,
which you should prove as an exercise.
4
Lecture 17
Spectrum of the SHO Hamiltonian (cont.)
It is shown in detail in the text that the eigenvalues of N are non-degenerate and are
non-negative integers,
n = 0, 1, 2, . . . .
The essence of the proof is to note that the number operator must have a non-negative
expectation value:
h|N |i = h|a a|i = (h|a )(a|i) 0,
where
h|N |i = 0
a|i = 0.
On the other hand, since a lowers the eigenvalue of N by one unit, there must be a unique
(up to normalization) lowest-eigenvalue eigenvector |0i such that
a|0i = 0.
One then easily infers that the spectrum of N is non-degenerate, consisting of the nonnegative integers:
N |ni = n|ni, n = 0, 1, 2, . . . .
Therefore, the energy spectrum is purely discrete:
1
h,
En = (n + )
2
n = 0, 1, 2, . . .
and the energy eigenvectors are labeled |ni. These results arise from the assumption that
X and P are self-adjoint operators on a Hilbert space (or, equivalently, that a and a are
adjoints of each other with respect to the Hilbert space scalar product).
Finally, it is shown in the text that we have
a|ni = n|n 1i, a |ni = n + 1|n + 1i,
hn|mi = nm .
Note that the ground state |0i has energy 12

h and satisfies
a|0i = 0.
Energy eigenfunctions
We have defined the simple harmonic oscillator and computed the spectrum of its
Hamiltonian. Now we explore some properties of the energy eigenvectors, that is, the
1
stationary states. Of course, each of these vectors |ni represents a state in which the
energy is known with certainty to have the value En = (n + 21 )
h. These states also
define probability distributions for all other observables, in particular the position and
momentum. Let us consider the position probability distribution, which is controlled by
the position wave functions
un (x) = hx|ni.
It is easy enough to compute these functions. For example, consider the ground state wave
function u0 ; it satisfies
au0 (x) = hx|a|0i = 0.
Since
r
aun (x) =
we have that
m
2h
i
X+
P
m

un (x) =
d
h
x+
m dx
m
2h
d
h
x+
m dx

un (x),

u0 (x) = 0.
This equation is easily solved and normalized to give a Gaussian (exercise)

2
1
x
1
2 x0
,
u0 (x) =
e
1/4 x0
where
h
m
is a length scale set by the problem and, as you can see, it determines the width of the
Gaussian.
x0 =
Here we can see one way in which the quantum and classical mechanics regimes are
related. Of course, a classical mechanics description of an oscillator implies that in the
ground state the oscillator has position x = 0 with certainty. Quantum mechanics provides
instead a Gaussian probability distribution about x = 0. However, provided that x0 is
small, the width of the Gaussian is negligible and the quantum description starts to
coalesce with the classical description in this regard. I used quotation marks about the
word small since x0 has dimensions of length; whether or not you consider it to be small
depends on comparing it to some other length scale. When we speak of macroscopic
phenomena we usually are interested in length scales on the order of, say, centimeters,
masses on the order of grams, and times on the order of seconds. In such a regime x0 is
indeed very, very small. But, of course, in a microscopic regime, x0 can be appreciable.
The excited states (n > 0) are easily obtained from the identity (exercise)
1
|ni = (a )n |0i,
n!
2
so that
n 1 x 2
2 x
d
0
x20
+x e
un (x) =
.
n+1/2
dx
1/4
n
2 n!x0
As you may know, this formula represents (up to the dimensionful constants) one of the
standard generating function methods for defining the Hermite polynomials. Thus,
un (x) is a Hermite polynomial times the ground state Gaussian. See your text for detailed
formulas.
Expectation values
It is instructive to see how to compute stationary state expectation values and, of
course, to see what they look like. To begin, we observe that stationary state expectation
values of observables that are linear in position and/or momentum will vanish:
hn|X|ni = 0 = hn|P |ni.
To see this, just note that such observables are linear in the ladder operators and we have
(by orthogonality)
hn|a |ni hn|n + 1i = 0.
hn|a|ni hn|n 1i = 0,
Another way to see that the expectation values of position and momentum vanish in
stationary states is to note that (exercise)
X [P, H],
P [X, H].
From which the result easily follows (exercise).

On the other hand, quadratic functions of position and momentum need not have
vanishing expectation values. For example, in the ground state (exercise)
x2
x2
h0|X 2 |0i = 0 h0|(a2 + a2 + a a + aa )|0i = 0 .
2
2
Which gibes with our earlier comment about the width of the ground state Gaussian
position probability distribution. A similar computation for hP 2 i shows that (exercise)
h0|P 2 |0i =
2
h
.
2x20
We see that the dispersions in position and momentum satisfy (in the ground state)
2
h
,
4
which is in accord with the uncertainty principle, but also shows that the ground state is
a minimum uncertainty state. The excited states are not minimum uncertainty states;
a straightforward computation reveals (exercise)
1
hX 2 ihP 2 i = (n + )2
h2 .
2
hX 2 ihP 2 i =
Lecture 18
Stationary states and classical mechanics
Here we use the oscillator to illustrate a very key point about the relation between
classical and quantum mechanics.
The stationary states we just studied do not provide any explicitly dynamical behavior.
This is not a specific feature of the oscillator, but a general feature of stationary states in
quantum mechanics. This is, at first sight, a little weird compared to classical mechanics.
Think about it: the probability distributions for position and momentum are time independent in any state of definite energy. In classical mechanics the position and momentum
(and the energy) can, at each moment of time, be determined with certainty; the values
for the position and momentum oscillate in time in every state but the ground state.*
In light of the incompatibility of the position, momentum and energy observables in the
quantum description, one cannot directly compare the classical and quantum predictions
in the excited states. The quantum predictions are purely statistical, involving repeated
state preparations states specified only by their energy and measurements of various
observables. If we want to compare the quantum and classical descriptions we need to ask
the right questions of classical mechanics this means statistical questions. Let us pause
for a moment to expand on this.
In classical mechanics every solution of the equations of motion is a state of definite
energy (assuming energy is conserved). Fix an energy E. Let us ask the same question
that we ask in quantum theory: Given the energy what is the probability for finding
the particle at various locations? To answer this question we define the probability for
finding the classical particle to be in the interval [x, x + dx] to be proportional to the time
spent in that region. (The proportionality constant is used to normalize the probability
distribution.) For a classical oscillator at the point x, a displacement dx takes place in the
time dt where (exercise)
dx
dt = q
.
2E 2 x2
m
Here it is understood that x lies between the two classical turning points, where
1
E = m 2 x2 .
2
* In the ground state, the classical motion is of course trivial the position (relative to
the equilibrium position) and the momentum vanish. In the quantum ground state the
position and momentum have Gaussian probability distributions centered at the vanishing
values. For macroscopic values of m and the widths of these distributions are negligible.
1
Up to a constant normalization factor, this defines the probability density P (x) for the
classical oscillator:
1
P (x) = (const.) q
.
2E 2 x2
m
(The probability density vanishes outside the turning points.) For E 6= 0, the resulting
probability density is strongly peaked near the classical turning points of the motion and
relatively small and flat near the equilibrium position. This just reflects the fact that,
because the oscillator moves slowly near the turning points and rapidly at the equilibrium
position, it is more likely to find the particle near the turning points. In general, a quantum
oscillator does not have anything like this classical probability distribution. First of all, it
oscillates in a non-trivial way because of the Hermite polynomials. Furthermore, while the
probability density is exponentially damped outside of the turning points, the probability
distribution is not identically zero there. However, it can be shown that the quantum
probability distribution does approximate the classical distribution in the limit of large
energy. Here, large means sufficiently macroscopic, i.e., En >>
h so that n >> 1.
2
To see this one simply computes |un | for a large n. The result is a very rapidly oscillating
wavefunction; the oscillations occur about an average curve which approaches the classical
probability distribution as n gets larger and larger. For any finite interval of x, a large
enough n will result in a probability for finding the particle in that interval to agree with
the classical prediction to any desired accuracy. To see this quickly, simply ask your favorite
computer math software to plot the graphs!
We have seen that the probability distribution for position predicted by quantum
mechanics approaches that predicted by classical statistical mechanics in the limit of large
quantum numbers. This result is satisfying, but is far from the complete story of the
relation of the classical and quantum features of the oscillator. A much better way to model
the classical oscillator does not use stationary states but instead minimum uncertainty
coherent states (see the exercises at the end of chapter 2). Recall that the excited
states are not minimum uncertainty states in position and momentum. The coherent
states are minimum uncertainty states; for macroscopic mass and frequency the oscillator
has a small enough uncertainty in energy, position and momentum and a suitable time
dependence to model a classical oscillator. In particular, because the coherent states
arent stationary (except for the ground state), the position and momentum probability
distributions can mimic their classical counterparts sinusoidal oscillations. We will not
explore the coherent states here. We do note that the ground state of the oscillator is one
such state. For macroscopic values of m and the width of the ground state Gaussian
h is truly negligible compared to
position probability distribution controlled by x0 = m
macroscopic length scales, so you can see at least in this state that classical behavior is
recovered.
2
Oscillator Dynamics
To explicitly see the oscillatory behavior of an oscillator in quantum mechanics one
needs to use non-stationary states, that is, superpositions of energy eigenstates, as the
initial state. Let us examine this feature using the Heisenberg picture. The principal job is
to get the basic observables at time t in terms of their representation at some fixed initial
time, say, t0 = 0. We have
X(0) X, P (0) P.
We need to compute X(t) and P (t). We can do this directly using
i
X(t) = e h Ht X(0)e h Ht ,
P (t) = e h Ht P (0)e h Ht ,
and
P 2 (t) 1
P 2 (0) 1
+ m 2 X 2 (t) = H(0) =
+ m 2 X 2 (0).
2m
2
2m
2
To evaluate X(t) and P (t) you can either expand the exponentials in power series and, by
studying the general term, try to deduce a closed form expression for the result. There are
other tricks as well for manipulating the similarity transformation. However, a somewhat
easier way to get expressions for X(t) and P (t) is to directly solve the Heisenberg equations.
Using (exercise)
[X(t), P (t)] = i
hI,
H = H(t) =
we have
ih
d
P (t)
X(t) = [X(t), H] = i
h
,
dt
m
d
P (t) = [P (t), H] = i
hm 2 X(t),
dt
So that the Heisenberg equations can be written as
ih
P (t)
d
d
X(t) = [X(t), H] =
,
P (t) = [P (t), H] = m 2 X(t).
dt
m
dt
These are mathematically the same as the classical Hamilton equations of motion, and are
as easily solved. Taking account the initial conditions at t = 0, we have
X(t) = (cos t)X(0) + (
1
sin t)P (0),
m
P (t) = (cos t)P (0) (m sin t)X(0).

In the Heisenberg picture, the time dependence in physical predictions (i.e., probability
distributions) is obtained using these observables and a fixed state vector. As an example,
let us compute the statistical mean of the position and momentum at time t. Taking
diagonal matrix elements of the Heisenberg operators, we get
hXi(t) = (cos t)hXi(0) + (
3
1
sin t)hP i(0),
m
hP i(t) = (cos t)hP i(0) (m sin t)hXi(0).

From the above equations you can see that we get the usual sort of behavior expected of a
harmonic oscillator with mass m and frequency . Indeed, we see that the expectation values behave exactly as do the classical position and momentum, in accord with Ehrenfests
theorem.
Of course, if the system is in a stationary state |ni then expectation values do not
change in time. This seems paradoxical in light of our result shown above. The resolution
of this paradox in the above example is that, for stationary states, the initial expectation
values vanish, e.g.,
hXi(t) = hXi(0) = 0,
hP i(t) = hP i(0) = 0.
As an exercise see if you can see how the expectation values of, say, X 2 manages to be
time independent in a stationary state.
Lecture 19
Charged particle in an electromagnetic field
We now turn to another extremely important example of quantum dynamics. Let us
describe a non-relativistic particle with mass m and electric charge q moving in a given
electromagnetic field. This system has obvious physical significance.
~ and
We use the same position and momentum operators (in the Schr
odinger picture) X
P~ (although there is a subtlety concerning the meaning of momentum, to be mentioned
later). To describe the electromagnetic field we need to use the electromagnetic scalar and
~ x, t). They are related to the familiar electric and magnetic
vector potentials (~x, t), A(~
~ B)
~ by
fields (E,
~
~ = A.
~
~ = 1 A , B
E
c t
The dynamics of a particle with mass m and charge q is determined by the Hamiltonian
2
1 ~ q ~ ~
~ t).
H=
P A(X, t)) + q(X,
2m
c
This Hamiltonian takes the same form as the classical expression in Hamiltonian mechanics.
We can see that this is a reasonable form for H by computing the Heisenberg equations of
motion, and seeing that they are equivalent to the Lorentz force law, which we shall now
demonstrate.
For simplicity we assume that the potentials are time independent, so that the Heisenberg and Schrodinger picture Hamiltonians are the same, taking the form
1 ~ q ~ ~ 2
~
H=
P A(X)) + q(X).
2m
c
For the positions we get (exercise)
1 ~
1
q~ ~
d ~
X(t) = [X(t),
X(t))}.
H] = {P~ (t) A(
dt
ih
m
c
We see that (just as in classical mechanics) the momentum defined as the generator of
translations is not necessarily given by the mass times the velocity, but rather
~
dX(t)
q~ ~
X(t)).
P~ (t) = m
+ A(
dt
c
As in classical mechanics we sometimes call P~ the canonical momentum, to distinguish it
from the mechanical momentum
~
~ = m dX(t) = P~ q A(
~ X(t))
~
dt
c
1
Note that the mechanical momentum has a direct physical meaning, while the canonical
momentum depends upon the non-unique form of the potentials. We will discuss this in
detail soon.
While the components of the canonical momenta are compatible,
[Pi , Pj ] = 0,
the mechanical momenta are not (!):
Aj
q A
q
) = i
h ijk B k .
[i , j ] = i
h ( ji
i
c x
c
x
Thus, in the presence of a magnetic field, the mechanical momenta obey an uncertainty
relation! This is a surprising, non-trivial and quite robust prediction of quantum mechanics. In particular, if the field is uniform, then two components of mechanical momentum
will obey a state independent uncertainty relation rather like ordinary position and momentum. Can this prediction be verified? As you will see in your homework problems, this
incompatibility of the mechanical momentum components in the presence of a magnetic
field is responsible for the Landau levels for the energy of a charged particle in a uniform
magnetic field. These levels are well-known in condensed matter physics.
The remaining set of Heisenberg equations are most simply expressed using the mechanical momentum. Starting with
H=
2
~
+ q(X),
2m
using the commutation relations between components of the mechanical momentum (above),
and using
[X i , j ] = i
hji I,
we have (exercise)

d~
1 ~
1
~ X(t))
~
~
~ X(t))
~
~ X(t))
~
~
(t) = [(t), H] = q E(
+
(t)
B(
B(
(t)
.
dt
ih
2mc
~ and B,
~ this is the usual Lorentz force law
Except for the possible non-commutativity of
for the operator observables.
The Schr
odinger equation
Dynamics in the Schrodinger picture is controlled by the Schr
odinger equation. If we
compute it for position wave functions then we get (exercise)

2
1
h
q~
A(~x) (~x, t) + q(~x, t)(~x, t) = i

h (~x, t).
2m i
c
t
2
The left hand side represents the action of the Hamiltonian as a linear operator on position
wave functions. We have in detail

qh ~
1
q 2 2
h2 2
~

A + ( A) + [
A + q].
H =
2m
imc
2
c
As you may know, one can always arrange (by making a gauge transformation if
necessary) to use a vector potential that satisfies the Coulomb gauge:
~ = 0.
A
In this case the Hamiltonian on position wave functions takes the form
H =
q 2
2 2
h
q
h ~

A + [
A2 + q].
2m
imc
c
Some typical electromagnetic potentials that are considered are the following.
(i) The Coulomb field, with
=
k
,
|~x|
~ = 0,
A
which features in a simple model of the hydrogen atom; the spectrum and stationary states
should be familiar to you. We will soon study it a bit in the context of angular momentum
issues.
~ where
(ii) A uniform magnetic field B,
= 0,
~ = 1 B ~x.
A
2
The vector potential is not unique, of course. This potential is in the Coulomb gauge. You
will explore this system in your homework. The results for the stationary states are interesting. One has a continuous spectrum coming from the motion along the magnetic field;
but for a given momentum value there is a discrete spectrum of Landau levels coming
~ To see this one massages the Hamiltonian into
from motion in the plane orthogonal to B.
the mathematical form of a free particle in one dimension added to a harmonic oscillator;
this is the gist of your homework problem.
(iii) An electromagnetic plane wave, in which
= 0,
~=A
~ 0 cos(~k ~x kct),
A
~k A
~ 0 = 0.
Of course, this latter example involves a time dependent potential. This potential is used
to study the very important issue of interaction of atoms with a radiation field; maybe we
will have time to study this toward the end of the semester.
3
Gauge transformations
There is a subtle issue lurking behind the scenes of our model of a charged particle in
a prescribed EM field. It has to do with the explicit appearance of the potentials in the
operators representing various observables. For example, the Hamiltonian which should
represent the energy of the particle depends quite strongly on the form of the potentials.
The issue is that there is a lot of mathematical ambiguity in the form of the potentials
and hence operators like the Hamiltonian are not uniquely defined. Let me spell out the
source of this ambiguity.
~ define a given EM
You may recall from your studies of electrodynamics that, if (, A)
~ B),
~ then the potentials (0 , A
~ 0 ), given by
field (E,
0 =
1 f
,
c t
~0 = A
~ + f,
A
~ B)
~ for any choice of f = f (t, ~x). Because all the physics in classical
define the same (E,
~ and B,
~ we declare that all potentials related by such
electrodynamics is determined by E
gauge transformations are physically equivalent in the classical setting. In the quantum
setting, we must likewise insist that this gauge ambiguity of the potentials does not affect
physically measurable quantities. Both the Hamiltonian and the mechanical momentum
are represented by operators which change their mathematical form when gauge-equivalent
potentials are used. The issue is how to guarantee the physical predictions are nonetheless
gauge invariant.
Let us focus on the Hamiltonian for the moment. The eigenvalues of H define the
allowed energies; the expansion of a state vector in the eigenvectors of H defines the
probability distribution for energy; and the Hamiltonian defines the time evolution of the
system. The question arises whether or not these physical aspects of the Hamiltonian
operator are in fact influenced by a gauge transformation of the potentials. If so, this
would be a Very Bad Thing. Fortunately, as we shall now show our model for a particle in
an EM field can be completed so that the physical output of quantum mechanics (spectra,
probabilities) are unaffected by gauge transformations.
For simplicity (only) we still assume that H is time-independent and we only consider
gauge transformations for which f
t = 0. The key observation is the following. Consider
two charged particle Hamiltonians H and H 0 differing only by a gauge transformation of
the potentials, so that they should be physically equivalent. Our notation is that if H is
~ then H 0 is defined by the gauge transformed potentials
defined by (, A)
0 = ,
~0 = A
~ + f (~x),
A
It is now straightforward to verify (see below) that if |Ei satisfies

H|Ei = E|Ei,
4
then
iq
|Ei0 = e h c f (X) |Ei

satisfies
H 0 |Ei0 = E|Ei0 .
iq
Note that the eigenvalue is the same in each case. The operator e h c f (X) is unitary, and
this implies the spectra of H and H 0 are identical. Thus one can say that the spectrum of
the Hamiltonian is unaffected by a gauge transformation, that is, the spectrum is gauge invariant. Thus one can use whatever potentials one wishes to compute the energy spectrum
and the prediction is always the same.
Lecture 20
Gauge transformations (cont.)
Our proof that the spectrum of the Hamiltonian does not change when the potentials
are redefined by a gauge transformation also indicates how we are to use our model so that
all probabilities are unaffected by gauge transformations. We decree that if |i is the state
~ then
vector of a particle in an EM field described by the potentials (, A),
iq
|i0 = e h c f (X) |i
~ 0 ).
is the state vector of the particle when using the gauge transformed potentials (0 , A
Note that this is a unitary transformation.
Let us now see why this prescription works. For a particle, all observables are functions
of the position and momentum operators. Here momentum means either canonical or
mechanical. The position observable is represented (in the Schr
odinger picture) by the
~ no matter the gauge. Any observable function G of the position has an
usual operator X,
expectation value which does not change under a gauge transformation:
iq
~
~
0 h|G(X)|i
0 = h|e hiqc f (X)
~
~ h c f (X)
~
G(X)e
|i = h|G(X)|i.
The momentum operator is where things get more interesting. The mechanical momentum
is a gauge-invariant observable. But it is represented by an operator which changes under
a gauge transformation! Indeed, we have
~ = p~ q A,
~
~ 0 = p~ q (A
~ + f ).
However, it is straightforward to check that (exercise)

~
~ ~
~ 0 e h c f (X)
|i = e h c f (X) |i.
iq
iq
Put differently, we have the operator representing the mechanical momentum which is
a gauge-invariant observable transforming under a gauge transformation as a unitary
transformation:
iq
~ ~ iq f (X)
~
~ 0 = e h c f (X)
e h c
.
Any function of the position and (mechanical) momentum will have a similar transformation law. In particular, the Hamiltonian can be expressed as (exercise)
H=
1 2
+ q,
2m
1
so it follows that (exercise)

iq
iq
H 0 e h c f (X) |i = e h c f (X) H|i,

that is,
iq
iq
H 0 = e h c f (X) He h c f (X) .
The physical output of quantum mechanics is not changed by a unitary transformation
of the state vectors and a unitary (similarity transformation) of the observables. This is
because the expectation values will not change in this case:
h|C|i = h 0 |C 0 | 0 i,
where
| 0 i = U |i,
C 0 = U CU .
It is now easy to see that if you compute the expectation value of (any function of the)
~ or you can use the vector
mechanical momentum you can use the state |i and operator ,
0
0
~ , and get the same answer. In this way one says that the physical
|i and operator
output of quantum mechanics is suitably gauge invariant. Different choices of potentials
lead to unitarily equivalent mathematical representations of the same physics.
It is not hard to generalize all this to time dependent gauge transformationsf = f (t, ~x).
Here we simply observe that if |, ti is a solution to the Schr
odinger equation for one set
of potentials then (exercise)
iq
|, ti0 = e h c f (t,X) |, ti
is the solution for potentials obtained by a gauge transformation defined by f . Thus
one gets gauge invariant results for the probability distributions as functions of time.
This result also shows that position wave function solutions to the Schr
odinger equation
transform as
iq
(~x, t) e h c f (t,~x) (~x, t)
under a gauge transformation.
Aharonov-Bohm effect
The Aharonov-Bohm effect involves the effect of a magnetic field on the behavior of
a particle even when the particle has vanishing probability for being found where the
magnetic field is non-vanishing. Of course, classically the Lorentz force law would never
lead to such behavior. Nevertheless, the AB effect has been seen experimentally. You will
explore one version of this effect in a homework problem. Here let me just show you how,
technically, such a result can occur.
2
The key to the AB effect is to cook up a physical situation where the magnetic field is
non-vanishing in a region (from which the charged particle will be excluded) and vanishing
in a non-simply connected region where the particle is allowed to be. Since the magnetic
field vanishes in that region we have that
~ = 0.
A
In a simply connected, contractible region of space such vector fields must be the gradient
of a function. In this case the potential can be gauge transformed to zero, and there will
be no physically observable influence of the magnetic field in this region. However, if the
~ is a gradient, i.e., pure gauge.
region is not simply connected it need not be true that A
As an example (relevant to your homework), we study the following scenario.
Consider a cylindrical region with uniform magnetic field (magnitude B) along the axis
of the cylinder. You an imagine this being set up via an (idealized) solenoid. Outside of
the cylinder the magnetic field vanishes, but the vector potential outside the cylinder must
~ cannot be the gradient of a function everyhwere outside the
be non-trivial. In particular A
cylinder. To see this, we have from Stokes theorem:
Z
Z
~
~
~ d~s,
A dl =
B
C
where C is a closed contour enclosing the cylinder and S is a surface with boundary C,
~ through S is non-zero (which it
so that the right-hand side is never zero if the flux of B
~ is a gradient then the left-hand side vanishes (exercise)
isnt in our example). But if A
contradiction. In fact, the vector potential can be taken to be (exercise)
2
~ = 1 BR e ,
A
2 2r
r>R
where R is the radius of the cylinder, r > R is the cylindrical radial coordinate and e
~ is necessarily not
is a unit vector in the direction of increasing cylindrical angle. Since A
(gauge-equivalent to) zero, it can affect the energy spectrum and it does.
Angular momentum - introductory remarks
The theory of angular momentum in quantum mechanics is important in many ways.
The myriad of results of this theory, which follow from a few simple principles, are used
extensively in applications of quantum mechanics to atomic, molecular, nuclear and other
subatomic systems. The mathematical strategies involved have a number of important
generalizations to other types of symmetries and conservation laws in quantum mechanics.
The quantum mechanical theory of angular momentum leads naturally to the concept of
intrinsic spin. Just as we saw for spin 1/2, a general feature of angular momentum in
3
quantum mechanics is the incompatibility of the observables corresponding to any two

components of spin. The nature of this incompatibility is at the heart of virtually all
features of angular momentum.
Just as linear momentum is intimately connected with the notion of translation of the
physical system, so angular momentum is deeply tied to the theory of rotations of the
physical system being considered. We shall use this geometric interpretation of angular
momentum as the starting point for our discussion.
Lecture 21
Rotations in three dimensions
We now begin our discussion of angular momentum using its geometric interpretation
as the generator of rotations in space. I should emphasize at the outset that our discussion
can be a little confusing because we will be studying vectors and linear transformations
in 2 distinct spaces: (i) the 3-d (Euclidean) space we live in, and (ii) the Hilbert space of
quantum state vectors. The 3-d rotations are, of course, going to be related to corresponding transformations on the space of quantum states, but it is not too hard to get mixed
up about which space various quantities are associated with. So watch out!
We begin by summarizing some elementary results concerning rotations in three dimensions. This part of the discussion is completely independent of quantum mechanical
considerations. Until you are otherwise notified, everything we do will only refer to properties of rotations of observables in the 3-d space live in.
~ , responds to a rotation according to
A vector observable for some physical system, V
a (special) orthogonal transformation:
~ RV
~.
V
Here R is a linear transformation of 3-d vectors such that
~ ) (RW
~ )=V
~ W
~.
(RV
Evidently, magnitudes of vectors as well as their relative angles are invariant under this
transformation.
~, W
~ as column vectors V , W relative to some Cartesian
If you represent vectors V
basis, the dot product is
~ W
~ = V T W = W T V.
V
You can then represent R as a 3 3 matrix, also denoted R for convenience, acting on the
~ and satisfying (exercise)
(Cartesian) components of V
RT = R1 ,
that is
RT R = I = RRT ,
where the superscript T means transpose and I is the 3 3 identity matrix. Such
matrices are called orthogonal (do you know why?). As a simple example, a rotation about
1
the z axis by an angle is represented by

Rz () =
cos
sin
0
sin
cos
0
0
0
1
!
.
In general, rotations are defined relative to an origin, which is fixed by any rotation.
The rotation is then defined by giving an axis of rotation through the origin and an angle
of rotation about that axis. The axis itself can be specified by a unit vector n
. We will
write R(
n, ) for the orthogonal transformation so-defined by the axis through the origin
along n
and by the angle . The sense of the rotation (counterclockwise or clockwise)
is determined by the right-hand rule. For any single rotation we can always choose the
z-axis to be along n
and then the rotation matrix takes the form given above. Of course,
when considering different rotations about different axes one cannot put them all into this
simple form. You can see that it takes 3 numbers (two for n
and one for ) to specify a
rotation. The set of all rotations about a point forms a three-dimensional group (since
it has 3 parameters). This means, in particular, that every rotation has an inverse, and
that the product of two rotations is equivalent to third rotation. This group is called the
rotation group and denoted by SO(3). The 3 means rotations in 3 dimensions. The
O means orthogonal. And the S means special. This latter adjective arises since
not all orthogonal transformations are rotations, they also include discrete transformations:
reflections and inversions. All orthogonal matrices have determinants of 1. To see this,
recall that det(AB) = det(A) det(B) and det AT = det A, so that
OOT = I = [det(O)]2 = 1.
The rotations are described by matrices with unit determinant, while the discrete transformations (that are not rotations in disguise) have negative determinant. For example,
the transformation
~ V
~
V
is given by the 3 3 orthogonal matrix O = I, which has determinant 1. The rotation
group is non-Abelian, which means non-commutative, since successive rotations commute
if and only if they are about the same axis. The ways in which successive rotations combine
to make a third rotation is somewhat intricate. However, this complicated behavior can
be fruitfully analyzed by studying infinitesimal rotations.
Infinitesimal Rotations
Our goal is to view angular momentum as the infinitesimal generator of rotations on the
space of quantum states, so we need to understand rotations from the infinitesimal point
of view. Since rotations depend continuously on the angle of rotation, we can consider
2
rotations that are infinitesimal, that is, nearly the identity. An infinitesimal rotation in
3-d space about an axis n
and angle << 1 can be written as
R(
n, ) I + G,
where the linear transformation G is the generator of rotations, a 3 3 matrix, and we are
ignoring terms of order 2 . (I emphasize that, we are presently considering rotations in
3-d space; we havent yet moved to the representation of rotations on state vectors.) Note
that if R(
n, ) is to be orthogonal then G must be an antisymmetric matrix (exercise):
GT = G.
For example, if n
is along z, we have that (exercise)
Gz =
1
0
0
0
1
0
0
0
0
!
.
We can define a rotation generator for rotation about any axis. We write
~
R(
n, ) I +
n G,
where
~ = (G1 , G2 , G3 ) = (Gx , Gy , Gz )
G
are a basis for the 3-dimensional vector space of anti-symmetric matrices. Gz is displayed
above; you can easily compute the forms for Gx and Gy by expanding the rotation matrices
about the indicated axes to first order in the rotation angle (exercise):
!
!
0 0 1
0 0 0
0 0 0 .
Gx = 0 0 1 , Gx =
1 0 0
0 1 0
A straightforward computation reveals (exercise):
[Gi , Gj ] = ijk Gk .
These are the commutation relations of infinitesimal rotations. They give a complete
(albeit infinitesimal) account of the way in which successive rotations combine to give a
net rotation. In essence, the generators and their commutation relations define the group
of rotations. Indeed, just as with translations, we can build up a finite rotation about an
axis along n
by an infinite number of infinitesimal rotations according to
R(
n, ) = lim (I + n
G)N = enG .
N
N
The commutation relations encode the relationships between different rotations.
Note that these commutation relations, which are for the generators of rotations in
3-d space, look a lot like those you encountered for the components of the spin operators.
Of course, these vector observables constitute operators on a 2-d Hilbert space. But the
similarity in the commutation relations is no accident as we shall see.
3
Lecture 22
Rotations in quantum mechanics

Now we will discuss what the preceding considerations have to do with quantum mechanics. In quantum mechanics transformations in space and time are implemented or
represented by unitary transformations on the Hilbert space for the system. The idea
is that if you apply some transformation to a physical system in 3-d, the state of the
system is changed and this should be mathematically represented as a transformation of
the state vector for the system. We have already seen how time translations and spatial
translations are described in this fashion. Following this same pattern, to each rotation
R we want to define a unitary transformation, D(R), such that if |i is the state vector
for the system, then D(R)|i represents the state vector after the system has undergone a
rotation characterized by R. The key requirement here is that the pattern for combining
two rotations to make a third rotation is mimicked by the unitary operators. For this
we require that the unitary operators D(R) depend continuously upon the rotation axis
and angle and satisfy
D(R1 )D(R2 ) = ei12 D(R1 R2 ),
where 12 is a real number, which may depend upon the choice of rotations R1 and R2 ,
as its notation suggests. This phase freedom is allowed since the state vector D(R1 R2 )|i
cannot be physically distinguished from ei12 D(R1 R2 )|i.
If we succeed in constructing this family of unitary operators D(R), we say we have
constructed a unitary representation of the rotation group up to a phase, or a projective
unitary representation of the rotation group. You can think of all the parameters as
simply specifying, in part, some of the freedom one has in building the unitary representatives of rotations. (If the representation has all the parameters vanishing we speak
simply of a unitary representation of the rotation group.)
This possible phase freedom in the combination rule for representatives of rotations is
a purely quantum mechanical possibility and has important physical consequences. Incidentally, your text book fails to allow for this phase freedom in the general definition of
representation of rotations. This is a pedagogical error, and an important one at that.
This error is quite ironic: the first example the text gives of the D(R) operators is for a
spin 1/2 system where the phase factors are definitely non-trivial, as we shall see.
Infinitesimal Rotations and Angular Momentum

Since D(R) depends continuously upon an axis and angle, we can consider its infinites1
imal form. For a fixed axis n

and infinitesimal rotation angle we have
i
~
D(R) I
n J,
h
where
J~ = (J1 , J2 , J3 ) = (Jx , Jy , Jz )
are self-adjoint operators, Ji = Ji with dimensions of angular momentum (in the sense that
their matrix elements and eigenvalues have these dimensions). The operator Ji generates
transformations on the Hilbert space corresponding to rotations of the system about the xi
axis. We identify the operators Ji with the angular momentum observables for the system.
Of course, the physical justification of this mathematical model of angular momentum
relies upon the unequivocal success of this strategy in describing physical systems. In
particular, the Ji will (under appropriate circumstances) be conserved.
By demanding that the unitary transformations on the Hilbert space properly mimic
(more precisely, projectively represent) the rotations of 3-d space, it can be shown (see
text for a version in which the phase factors are omitted) that the angular momentum
operators satisfy the commutation relations
[Jk , Jl ] = ihklm Jm .
While the proof takes a little work and is omitted here, the result is very reasonable. Indeed,
the commutation relations of infinitesimal generators in 3-d space encode the geometrical
relationship between various rotations. It is therefore not surprising that the generators of
rotations on the space of state vectors must obey the same commutation relations as the
generators of rotations in 3-d space (up to the i
h, which is there because of the way we
~
defined J).
You have seen in your homework that the spin observables for a spin 1/2 system satisfy
these commutation relations. Thus we identify the spin observables as a kind of angular
momentum. This is not just a matter of terminology. In a closed system (e.g., an atomic
electron and a photon), angular momentum is conserved. However the angular momentum
of a subsystem (e.g., the electron) need not be conserved since it can exchange angular
momentum with the rest of the system (e.g., the photon) so long as the total angular
momentum is conserved. The bookkeeping thus provided by conservation of angular
momentum requires the spin angular momentum contribution to be included in order to
balance the books. Spin angular momentum provides a contribution to the conserved
angular momentum of a closed system.
Using the same mathematical technology as we did for time and space translations, it
is not hard to see that a finite (as opposed to infinitesimal) rotation can formally be built
2
out of many infinitesimal rotations, leading to the formula:

D(
n, ) = lim (I
N
i
~ = e hi nJ~.
n
J)
hN
The detailed form of this exponential operator, like that of the Ji depends upon the specific
physical system being studied. The most familiar form of angular momentum is probably
that of a particle moving in 3-d. However, spin also is a form of angular momentum
(according to the above type of analysis) and it is the simplest, mathematically speaking,
so we shall look at it first.
Spin as angular momentum

You will recall that the spin 1/2 observables Si have the dimensions of angular momentum and satisfy the angular momentum commutation relations:
[Sk , Sl ] = i
hklm Sm .
We therefore have that rotations of the spin 1/2 system are accomplished by unitary
operators of the form
i
D(
n, ) = e h nS .
Let us have a look at an example.
Consider a rotation about the z-axis. Using the Sz eigenvectors as a basis we have the
matrix elements

h 1 0
,
Sz =
2 0 1
so that the matrix elements of the rotation operator are (exercise)

D(
z , ) =
e 2
0
0i
e2
You can do this calculation by using the power series definition of the exponential you
will quickly see the pattern. You can also use the spectral decomposition definition. Recall
that for any self-adjoint operator A we have
A=
ai |iihi|,
where A|ii = ai |ii. Similarly, as you saw in the homework,

f (A) =
X
i
f (ai )|iihi|.
Let
h
(|+ih+| |ih|),
2
and let f be the appropriate exponential function and you will get
A = Sz =
D(
z , ) = e h Sz = e 2 |+ih+| + e 2 |ih|.
Notice that this family of unitary operators satisfies
D(
z , 1 )D(
z , 2 ) = D(
z , 1 + 2 ),
as it should. On the other hand, right away you should notice that something interesting
has happened. The unitary transformation of a spin 1/2 system corresponding to a rotation
by 2 is not the identity, but rather minus the identity! Thus, if you rotate a spin 1/2
system by 2 its state vector |i transforms to
i
|i e h (2)Sz |i = |i.
Indeed, it is only after a rotation by 4 that the spin 1/2 state vector returns to its
original value. This looks bad; how can such a transformation rule agree with experiment?
Actually, everything works out fine since the expectation values are insensitive to this
change in sign:
i
h|A|i h|e h (2)Sz Ae h (2)Sz |i = h|(1)A(1)|i = h|A|i.

What is happening here is that the spin 1/2 system is taking advantage of the phase
freedom in the projective representation. On the one hand we have
R(
n, )R(
n, 2 ) = R(
n, 2) = R(
n, 0) = I.
For the spin 1/2 representation we have
D(
n, )D(
n, 2 ) = I = D(
n, 0).
Let us note that the phase freedom in the representation of rotations is a rather subtle,
intricate feature of the way in which quantum mechanics describes the physical world.
Because of this subtlety in the theory we are able to properly accommodate spin 1/2
systems. This is one of the great successes of quantum mechanics.
How do observable quantities change in general when we make a rotation about z?
Under a rotation about z the expectation value transforms via
i
hAi = h|A|i h|e h Sz Ae h Sz |i

4
Choose, for example, A = Sx . By expanding the exponentials in a Taylor series or by

using the spectral decompositions, and using
h
h
Sz = (|+ih+| |ih|), Sx = (|+ih| + |ih+|),

2
2
we have that (exercise)
i
i
h
e h Sz Sx e h Sz = (ei |+ih| + ei |ih+|)

2
= cos Sx sin Sy .
Note that this is exactly how the x-component of a vector transforms under a rotation
about z. Because of this we get
hSx i cos hSx i sin hSy i.
Similarly, it follows that
hSy i cos hSy i + sin hSx i,
and you can easily see that
hSz i hSz i
when the state vector is transformed. You can see that the unitary representative of
rotations does indeed do its job as advertised.
Rotations represented on operators
Let us follow up on one result from above. We saw that
i
e h Sz Sx e h Sz = cos Sx sin Sy .
In the above equation the left hand side has the product of 3 Hilbert space operators
appearing, corresponding to what happens to the spin vector observable when you change
the state of the system via a unitary transformation corresponding to a rotation. The
operator on the right hand side of the equation the linear combination of spin operators that
you get by rotating them as if they are components of a vector in 3-d space.. This reflects
a general rule which connects the rotations of 3-d space and their unitary representatives
on the space of state vectors:
~ n, ) = R(
~
D (
n, )SD(
n, )S.
~ will behave
You can see immediately from this relation that the expectation values of S
~ is any trio of self-adjoint operators
like the components of a vector. More generally, if V
on Hilbert space representing a vector observable, then
~ D(
~.
D (
n, )V
n, ) = R(
n, )V
You can think of this as analogous to the Heisenberg picture, but now for rotations. A
rotation of the system can be mathematically viewed as a transformation of the state
vector or, equivalently, as a transformation of the observables (but not both!).
5
Lecture 23
Spin precession as a rotation

It is enlightening to return to the dynamical process of spin precession in light of our
new results on rotations. You will recall that a spin system with magnetic moment
~ when
~
placed in a uniform magnetic field B can be described by the Hamiltonian
~
H = ~
B,
where
~ = S.
You will recall that the behavior of the spin observables could be viewed as precession
~ i.e., a continuously developing ( in time) rotation about an axis along B.
~ We
about B,
~ so that
can now see this result immediately. Let n
be a unit vector along B,
H = B
n S.
This means that the time evolution operator is
i
U (t, t0 ) = e h (tt0 )BnS .

This operator represents a rotation about n
by an angle B(t t0 ), which is exactly our
previous result for the dynamics.
Note that, while the physical observables are precessing with frequency B, the state
vector itself has is precessing at half the frequency since, e.g., it takes a 4 rotation to get
the state vector to return to its initial value. It is possible to experimentally see this
difference in frequencies (thereby confirming the projective representation being used) by
a pair of spin 1/2 systems, one of which propagates freely and one of which travels through
a region with a magnetic field. The latter spin will precess according to the time it spends
in the magnetic field. The two particles can be brought together to form an interference
pattern. The interference pattern depends upon the relative phase of the two particles. If
the magnetic field region is set up just right, you can arrange for the second particle to
change its state vector by a minus sign by the time it leaves the magnetic field region. The
interference pattern that you see confirms this fact. See your text for details.
Angular momentum in general

We can deduce quite a lot about the angular momentum J~ of any system just knowing
that it is represented by 3 self-adjoint operators satisfying the angular momentum commutation relations. To begin with, it is clear that the 3 components of J~ are not compatible
1
so that, generally speaking, one will not be able to determine more than one component
with certainty. Indeed, the only state in which 2 or more components of J~ are known
with certainty is an eigenvector of all components with eigenvalue zero, i.e., a state with
vanishing angular momentum. To see this, suppose that |i is an eigenvector of Jx and
Jy , then it is easy to see from
[Jx , Jy ] = ihJz ,
that |i is an eigenvector of Jz with eigenvalue 0. You can easily see that this same
argument can now be used to show that |i has zero eigenvalue for all three components.
Thus, if there is any angular momentum in the system at all, at most one component can
be known with certainty in any state. When we consider states with a definite value for
~ we usually call that component Jz , by convention. But it is important
a component of J,
to realize that there is nothing special about the z-direction; one can find eigenvectors for
any one component of J~ (cf. spin 1/2).
We next observe that the (squared) magnitude of the angular momentum,
J 2 = Jx2 + Jy2 + Jz2
is a Hermitian operator that is compatible with any component Ji . To see this is a very
simple computation:
X
X
[J 2 , Jk ] =
(Jl [Jl , Jk ] + [Jl , Jk ]Jl ) = i
h
lkm (Jl Jm + Jm Jl ) = 0,
l
l,m
where the last equality follows from the antisymmetry of lkm .* We will assume that J 2 is
self-adjoint. Consequently, there exists an orthonormal basis of simultaneous eigenvectors
of J 2 and any one component of J~ (usually denoted Jz ). Physically, this means that while
the 3 components of angular momentum are not compatible, there exists a complete set
of states in which the magnitude of angular momentum and one component of angular
momentum are known with certainty.
Angular momentum eigenvalues and eigenvectors
Of course, given an observable represented as an operator, the most pressing business
is to understand the spectral properties of the operator since its spectrum determines the
possible outcomes of a measurement of the observable and the (generalized) eigenvectors
are used to compute the probability distribution of the observable in a given state. In our
case we have defined angular momentum as operators satisfying
J~ = J~ ,
[Jl , Jm ] = ihlmn Jn .
* The quantity in parenthesis is symmetric under l m while lkm is anti-symmetric when

this interchange is performed. This guarantees that each term in the double sum will be
canceled by another term in the double sum.
2
Just from these relations alone there is a lot we can learn about the spectral properties of
angular momentum.
We assume that each of the operators Ji and J 2 admit eigenvectors. Let us study the
angular momentum eigenvalues and eigenvectors, the latter being simultaneous eigenvectors of Jz and J 2 . We write
J 2 |a, bi = a|a, bi,
Jz |a, bi = b|a, bi,
The possible values of a and b can be deduced much in the same way as the spectrum of
the Hamiltonian for an oscillator can be deduced using the raising and lowering operators.
To this end we define the angular momentum ladder operators
J = Jx iJy ,
J = J .
Of course, these two operators contain the same physical information as Jx and Jy . In terms
of the ladder operators, the angular momentum commutation relations can be expressed
as (exercise)
[Jz , J ] = hJ , [J , J 2 ] = 0, [J , J ] = 2hJz .
From these relations we can see that the vector J |a, bi satisfies (exercise)
J 2 (J |a, bi) = a(J |a, bi),
Jz (J |a, bi) = (b
h)(J |a, bi).
Thus, when acting on angular momentum eigenvectors (eigenvectors of J 2 and Jz ), the

ladder operators preserve the magnitude of the angular momentum but increase/decrease
the z component by a quantum of angular momentum h
.
Lecture 24
Angular momentum eigenvalues and eigenvectors (cont.)

Next we show that the eigenvalues of J 2 are non-negative and bound the magnitude
of the eigenvalues of Jz . One way to see this arises by studying the relation
1
1
J 2 Jz2 = (J+ J + J J+ ) = (J J + J+ J+ ).
2
2
Now, for any operator A and vector |i we have that (exercise)
h|A A|i 0,
so that for any vector |i (in the domain of the squared angular momentum operators)
(exercise)
h|J 2 Jz2 |i 0.
Assuming the eigenvectors |a, bi are not of the generalized type, i.e., are normalizable,
we have
0 ha, b|J 2 Jz2 |a, bi = a b2 ,
and hence
a 0,
a b a.
The ladder operators increase/decrease the b value of the eigenvector with out changing
a. Thus by repeated application of these operators we can violate the inequality above
unless there is a maximum and minimum value for b such that application of J+ and J ,
respectively, will result in the zero vector. Moreover, if we start with an eigenvector with
a minimum (maximum) value for b, then by successively applying J+ (J ) we must hit
the maximum (minimum) value. As shown in your text, these requirements lead to the
following results. The eigenvalues a can only be of the form
a = j(j + 1)
h2 ,
where j 0 can be a non-negative integer or a half integer only:
j = 0, 1/2, 1, 3/2, . . . .
For an eigenvector with a given value of j, the eigenvalues b are given by
b = m
h,
1
where
m = j, j + 1, . . . , j 1, j.
Note that if j is an integer then so is m, and if j is a half-integer, then so is m. Note also
that for a fixed value of j there are 2j + 1 possible values for m. The usual notational
convention is to denote angular momentum eigenvectors by |j, mi, with j and m obeying
the restrictions described above.
The preceding arguments show how the self-adjointness and commutation relations of
angular momentum give plenty of information about their spectrum. We note that these
are necessary conditions, e.g., the magnitude of angular momentum must be determined
via an integer or half-integer, but this does not mean that all these possibilities will occur.
As we shall see, for orbital angular momentum only the integer possibility is utilized. For
the spin 1/2 system, a single value j = 1/2 is utilized. We will discuss this in a little more
detail next.
Spin systems in general

Let us note that the spin 1/2 observables, being angular momentum operators, must
have eigenvectors/eigenvalues obeying the general results we have just derived. Indeed,
you can easily see that with j = 1/2 we reproduce the standard results on the spectrum
of the spin operators. For example we have
3 2
J 2 S2 = h
I,
4
which has eigenvalues
3
1 1
( + 1)
h2 = h2 .
2 2
4
Given that j = 12 we have
1 1
m= , ,
2 2
so that the eigenvalues for Jz Sz are
h, as they should be.
The 1/2 in spin 1/2 comes from the fact that j = 1/2 for all states of interest in
this physical system. We can generalize this to other values of j. We speak of a particle
or system having spin s if it admits angular momentum operators which act on a Hilbert
space of states all of which have the same eigenvalue for S 2 , that is, all of which have
the same value for j = s. For a system with spin-s and no other degrees of freedom the
Hilbert space of states has dimension 2s + 1 (exercise) and the operator representing the
squared-magnitude of the spin is given by (exercise)
S 2 = s(s + 1)
h2 I.
2
Orbital angular momentum

In nature it appears that angular momentum comes in two types when we use a particle description of matter. First there is the intrinsic spin angular momentum carried
by an elementary particle. The spin of a particle is fixed once and for all (although the
spin state is not) and is part of the essential attribute that makes a particle what it is.
Second, there is the orbital angular momentum which arises due to the motion of the
particle in space. Both of these types of angular momentum are to some extent unified
when using the presumably more fundamental description of matter and its interactions
afforded by quantum field theory.
The orbital angular momentum of a particle is represented by the operator
~ =X
~ P~ ,
L
~ is the position operator relative to some fixed origin. Let us note that the ordering
where X
~ and P~ is unambiguous since the cross product only brings commuting
of the operators X
operators into play. For example,
Lz = XPy Y Px .
~ is familiar from classical mechanics, but it can be justified
The formula above for L
using the angular momentum commutation relations. For example, you can check by a
straightforward computation that
[Lx , Ly ] = [Y Pz ZPy , ZPx XPz ] = i
h(XPy Y Px ) = i
hLz .
The other angular momentum commutation relations follow in a similar fashion.
Orbital angular momentum and rotations
To further justify this form of the orbital angular momentum, we can study its role
as infinitesimal generator of rotations. Let us consider an infinitesimal rotation about the
z-axis. The putative generator is
Lz = XPy Y Px .
We can study the action of Lz on states by computing its action on the positions basis,
~ xi = ~x|~xi.
R|~
|~xi = |x, y, zi,
An infinitesimal rotation by an angle is given by

i
D() = I Lz + O(2 ),
h
so that, to first order in ,

i
D()|x, y, zi = [I (xPy yPx )]|x, y, zi + O(2 )
h
= |x y, y + x, zi + O(2 ).

Here we used the fact that momentum generates translations. Now recall the following
geometric fact: under an infinitesimal rotation about an axis n
by an angle the position
vector (indeed, any vector) transforms as
~x ~x +
n ~x + O(2 ).
Choosing n
along the z axis, we can compare this formula with the change of the position eigenvector under the infinitesimal transformation generated by Lz . We see that the
position eigenvectors eigenvalue rotates properly (at least infinitesimally).
It is not hard to see that under a finite (i.e., non-infinitesimal) rotation about the
z-axis we have
i
e h Lz |x, y, zi = |x cos y sin , y cos + x sin , zi

i ~xi.
= |R(k,
You can prove this by iterating the infinitesimal transformation, for example. Since the z
axis is arbitrary, we have in fact proved (exercise)
i
e h nL |~xi = |R(
n, i ~xi.
This implies that (exercise)
i
~ ~ i
~
~
h
nL = R(
e h nL Xe
n, ) X,
which can be checked by evaluating it on the position basis. Therefore we have that
(exercise)
i
X(e h nL |~xi) = R(
n, ) ~x(e h nL |~xi),
so that the rotation operator on the Hilbert space maps eigenvectors of position to eigenvectors with the rotated position:
i
e h nL |~xi = |R(
n, ) ~xi.
From this result we have the position wave functions rotating properly (exercise):
i
e h nL (~x) = h~x|e h nL |i = (R1 (

n, ) ~x).
An identical set of results can be obtained for the momentum operators and their
eigenvectors and momentum wave functions. This is a satisfactory set of results since the
momentum vector should behave in the same way as the position vector under rotations.
We have, in particular
~
D(
n, )|~
pi = e h nL |~
pi = |R(
n, )~
pi,
i
i
~
~
e h nL P~ e h nL = R(
n, )P~ .
Lecture 25
Position representation of angular momentum operators
We have seen that the position operators act on position wave functions by multiplication and the momentum operators act by differentiation. We can combine these two
results and, using spherical polar coordinates (r, , ), get a useful position wave function
representation for the angular momentum operators. We have

h
Lx (r, , ) =
sin cot cos (r, , ),
i

h
Ly (r, , ) =
cos cot sin (r, , )
i
h
Lz (r, , ) = (r, , ).
i
You can see that Lz is particularly simple it clearly generates translations in ,
~ also
which are rotations about the z axis, of course. The other two components of L
generate rotations about their respective axes. They do not take such a simple form
because spherical polar coordinates give the z axis special treatment.
Combining these results we have, in addition,

1
1
2
2
2
L (r, , ) = h
(sin ) (r, , ).
+
sin2 sin
You may recognize that this last result is, up to a factor of h2 r2 , the angular part of the
Laplacian. This result arises from the identity (see text)
~ P~ )2 + ihX
~ P~ ,
L2 = r2 P 2 (X
where r2 = X 2 + Y 2 + Z 2 , so that (exercise)
1
2
P 2 (r, , ) = h2 2 (r, , ) = h2 ( 2 L2 + r2 + r )(r, , ).
r
h r2
Thus we get, in operator form, the familiar decomposition of kinetic energy into a radial
part and an angular part.
Orbital angular momentum eigenvalues and eigenfunctions; spherical harmonics
A good way to see what is the physical content of the orbital angular momentum
eigenvectors is to study the position probability distributions in these states. Thus we
consider the position wave functions
lm = h~x|l, mi
1
corresponding to orbital angular momentum eigenvectors. These are simultaneous eigenfunctions of L2 and Lz , so they satisfy
Lz lml = ml
hlml ,
L2 lml = l(l + 1)h2 lml ,
where on general grounds

1
l = 0, , 1, . . . ,
2
and
ml = l, l + 1, . . . l 1, l.
We note that the angular momentum eigenfunctions will always involve an arbitrary
multiplicative function of the radius r. This is because the angular momentum differential
operators only take derivatives in the angular directions. What this means physically is
that the states of definite angular momentum will always have a degeneracy. This should
not surprise you: just specifying the angular momentum of a state of a particle is not
expected to completely determine the particles state.*
We now argue that the half-integer possibility does not occur for orbital angular momentum. First, we note that it is quite easy to find Lz eigenfunctions in spherical polar
coordinates since Lz = ih . Evidently,
lml = flml (r, )eiml .
Immediately we see that ml can only be an integer otherwise lm will not be a continuous
function. Now, discontinuous wave functions are not inherently evil. Indeed, there are
plenty of discontinuous functions in the Hilbert space. But these functions will fail to be
differentiable and hence will not in the domain of the momentum and angular momentum
operators. This is a contradiction since, of course, we are trying to construct the angular
momentum eigenfunctions which, by definition, are in the domain of the operators. Thus
we can conclude immediately that, for orbital angular momentum, we can only have (at
most)
l = 0, 1, 2, . . . , and ml = l, l + 1, . . . l 1, l.
As we shall see, all of the indicated values do in fact arise. (By the way, in your homework
assignment you will find a problem which gives an alternative argument for this result.)
Orbital angular momentum eigenvalues and eigenfunctions; spherical harmonics
Having solved the Lz equation we now must solve the L2 equation, which is an ordinary
* For example, consider the classical motion of a particle with vanishing angular momentum.
The motion is such that position and momentum vectors are parallel, but are otherwise
arbitrary.
2
differential equation for flml (r, ):

1 2
L flml =
h2
!
1
(sin ) fl,ml = l(l + 1)flml .

sin2 sin
m2l
The solutions of this equation are the associated Legendre polynomials Pl,ml (cos ) and
the angular momentum eigenfunctions are thus of the form
l,ml (r, , ) = fl,ml (r)Yl,ml (, ),
where the Yl,m (, ) are the spherical harmonics and the functions flml (r) are the integration constants for the solution to the purely angular differential equations. See your text
for detailed formulas for the spherical harmonics. Note that all non-negative integer values
are allowed for l. As discussed earlier, the functions flml (r) are not determined by the
angular momentum eigenvalue problem. Typically these functions are fixed by requiring
the wave function to be also an eigenfunction of another observable which commutes with
L2 and Lz , e.g., the energy in a central force problem. In any case, we will assume that
Z
dr r2 |fl,ml (r)|2 = 1.
0
This way, with the conventional normalization of the spherical harmonics:

Z
Z 2
d
d sin2 Yl0 ,m0 (, )Yl,ml (, ) = ll0 ml m0 ,
0
we have that
hl0 , m0l |l, ml i
Z
=
0
dr r2
Z 2
d
d, sin2 l0 ,m0 (r, , )l,ml (r, , ) = ll0 ml m0 .

l
For a state of definite angular momentum |l, ml i we see that the angular dependence
of the probability distribution is completely determined by the spherical harmonics. The
radial dependence of the probability distribution is not determined by the value of angular
momentum unless other requirements are made upon the state.
Addition of angular momentum: Two spin 1/2 systems
We now will have a look at a rather important and intricate part of angular momentum
theory involving the combination of two (or more) angular momenta. We will primarily
focus on the problem of making a quantum model for a system consisting of two distinguishable spin 1/2 particles (ignoring all but their spin degrees of freedom). Again, the
idea is simply to combine two copies of our existing model of a spin 1/2 system. The technology we shall need has already been introduced in our discussion of the direct product
3
construction. We shall take this opportunity to review the construction in the context of
the problem of combining or adding two spin 1/2 angular momenta.
For a system of two spin 1/2 particles, e.g., an electron and a positron, we can imagine
measuring the component of spin for each particle along a given axis, say the z axis.
Obviously there are 4 possible outcomes (exercise). Having made these measurements, we
can denote the states in which these spin values are known with certainty by
|Sz , +i |Sz , +i, |Sz , +i |Sz , i, |Sz , i |Sz , +i, |Sz , i |Sz , i.
Here the first factor of the pair always refers to particle 1 and the second factor refers to
particle 2. We view these vectors as an orthonormal basis for the direct product Hilbert
space of states of 2 spin 1/2 particles. We thus consider the 4-d Hilbert space of formal
linear combinations of these 4 basis vectors. An arbitrary vector |i is given by
|i = a++ |Sz , +i|Sz , +i+a+ |Sz , +i|Sz , i+a+ |Sz , i|Sz , +i+a |Sz , i|Sz , i.
Here the scalar multiplication is assigned to the pair as a whole, but by definition it can
be assigned to either of the factors in the pair as well. If you wish you can view the scalars
a as forming a column vector with 4 rows; the squares of these scalars give the various
probabilities for the outcome of the Sz measurement for each particle. Other bases are
possible, corresponding to other experimental arrangements, e.g., Sx for particle 1 and Sy
for particle 2.
Lecture 26
Two spin 1/2 systems: observables
We have constructed the 4-d Hilbert space of states for a system consisting of two
spin 1/2 particles. We built the space from the basis of product states corresponding to
knowing the spin along z for each particle with certainty. General states were, however,
not necessarily products but rather superpositions of such. How are the observables to
be represented as Hermitian operators on this space? To begin, let us consider the spin
~i = (S
~1 , S
~2 ). We define them on product
observables for each of the particles. Call them S
states via
~1 (|i |i) = (S|i)
~
S
|i,
and
~2 (|i |i) = |i (S|i).
~
S
~ are the usual spin 1/2 operators (acting on a two-dimensional Hilbert
Here the operators S
space) that we have already discussed in some detail.
If |i is an eigenvector of spin along some axis, then so is |i |i for any |i. This
means that if we know the spin component along the chosen axis with certainty for particle
~1 , as we should. The
one then we get an eigenvector of the corresponding component of S
~1 and S
~2 are defined on general vectors
same remarks apply to particle 2. The action of S
by expanding those vectors in a product basis, such as we considered above, and then
using linearity to evaluate the operator term by term on each vector in the expansion.
Sometimes one writes
~1 = S
~ I, S
~2 = I S
~
S
to summarize the above definition.
~1 and S
~2 commute (exercise) and have the same eigenvalues
The two spin operators S
as their 1-particle counterparts (exercise). In this way we recover the usual properties of
each particle, now viewed as subsystems.
Total angular momentum
There are other observables that can be defined for the two particle system as a whole.
Consider the total angular momentum ~S, defined by
~S = S
~1 + S
~2 .
You can easily check that this operator is Hermitian and that
[Sk , Sl ] = ihklm Sm ,
1
so it does represent the angular momentum. Indeed, this operator generates rotations of
~1 and S
~2 only generate
the two particle system as a whole. The individual spin operators S
rotations of their respective subsystems.
Using our general theory of angular momentum we know that we can find a basis of
common eigenvectors of S2 and any one component, say, Sz . Let us write these as |s, ms i,
where
S2 |s, ms i = s(s + 1)h2 |s, ms i, Sz |s, ms i = mh|s, ms i.
Let us define
|, i = |Sz , i |Sz , i.
This product basis physically corresponds to states in which the z component of spin
for each particle is known with certainty. In the following we will find the total angular
momentum eigenvalues and express the eigenvectors in the product basis |, i.
To begin with, it is clear that eigenvectors of Sz are in fact the basis of product vectors
since (with m1 = 21 , m2 = 21 )
Sz |m1 , m2 i = (S1z + S2z )|m1 , m2 i = (m1 + m2 )h|m1 , m2 i.
We see that m = 1, 0, 1 with m = 0 being doubly degenerate (exercise). From our general
results on angular momentum it is clear that the only possible values for the total spin
quantum number are s = 0, 1. From this we can infer that the m = 1 eigenvectors
must be S2 eigenvectors with s = 1, but we may need linear combinations of the m = 0
product eigenvectors to get S2 eigenvectors. To see why the vectors | + +i and | i
must be also S2 eigenvectors one reasons as follows. Our general theory guarantees us the
existence of a basis of simultaneous S2 and Sz eigenvectors. It is easy to see that the | + +i
and | i are the only eigenvectors (up to normalization) with m = 1, since any other
vectors can be expanded in the product basis and this immediately rules out any other
linear combinations (exercise). Therefore, these two vectors must be the S2 eigenvectors.
Because they have m = 1 and we know that s = 0, 1 it follows that the | + +i and | i
vectors are S2 eigenvectors with s = 1.
To determine the linear combinations of the m = 0 product vectors | + i and | +i
that yield S2 eigenvectors we use the angular momentum ladder operators:
S = Sx iSy = S1 + S2 .
If we apply S to the eigenvector
|s = 1, m = 1i = | + +i,
we get (exercise)
1
|s = 1, m = 0i = S |s = 1, m = 1i = S | + +i = (S1 + S2 )| + +i = (| +i + | + i).
2
2
The other eigenket |0, 0i must be orthogonal to this vector as well as to the other eigenkets,
| + +i and | i, from which its formula follows (exercise). All together, we find the total
angular momentum eigenvectors, |s, mi, are related to the individual angular momentum
(product) eigenkets by:
|1, 1i = | + +i
1
|1, 0i = (| + i + | +i),
2
|1, 1i = | i,
1
|0, 0i = (| + i | +i).
2
These vectors form an orthonormal basis for the Hilbert space, so they are all the linearly
independent eigenvectors of S2 and Sz . The eigenstates with s = 1 are called the triplet
states and the eigenstate with s = 0 is the singlet state. Notice that by combining two
systems with half-angular momentum we end up with a system that allows integer angular
momentum only.
A lengthier but more straightforward derivation of the eigenvectors |s, ms i arises
by simply writing the 4 4 matrix for S2 in the basis of product vectors | i, and solving
its eigenvalue problem. This is a good exercise. To get this matrix, you use the formula
S2 = S2z + +hSz + S+ S .
It is straightforward to deduce the matrix elements of this expression among the product
states since each of the operators has a simple action on those vectors.
Notice that the states of definite total angular momentum, |s, ms i, are not all the same
as the states of definite individual angular momentum, say, | i. This is because the
total angular momentum is not compatible with the individual angular momentum. For
example,
[S2 , S1i ] = [S12 + S22 + 2(S1x S2x + S1y S2y + S1z S2z ), S1i ] 6= 0.
A couple of complete sets of commuting observables are given by (S12 , S22 , S1z , S2z )) and
(S12 , S22 , S2 , Sz ). The eigenvectors of the first set are the product basis |m1 , m2 i = | i,
representing states in which each individual spin angular momentum state is known with
certainty. The eigenvectors of the second set are given by |s, ms i, representing states in
which the total angular momentum is known with certainty.
A remark on identical particles
Let us remark that there is yet another postulate in quantum mechanics that deals with
identical particles. These are particles that are intrincsically alike (same mass, spin, electric
charge, etc. ). Thus, for example, all electrons are identical, though of course they can be in
3
different states. This does not mean that electrons cannot be distinguished literally, since
we can clearly distinguish between an electron here on earth and one on the sun. These are
two electrons in different (position) states. But we view these particles as interchangeable
in the sense that if one took the electron from the sun and replaced it with the one here on
Earth (putting them in the respective states) when you werent looking, then you couldnt
tell. This intrinsic indistinguishability of identical particles opens up the possibility of
having the states of multi-particle systems reflect this symmetry under particle interchange.
This symmetry is modeled as a discrete, unitary transformation which exchanges particles.
The postulate of quantum mechanics (which can more or less be derived from relativistic
quantum field theory) is that particles with integer spin (bosons) should be invariant
under this unitary transformation (even under exchange) and they should change sign
(odd under exchange) if they particles have half-integer spin (fermions).
You can see that the total spin states of two spin 1/2 systems are in fact even and
odd under particle interchange. If the two particles are identical and no other degrees of
freedom are present, then one must use the anti-symmetric singlet state only. Of course,
real particles have translational degrees of freedom and the state will reflect that. Using
position wave functions to characterize these degrees of freedom, one again can consider
the symmetric and anti-symmetric combinations. Only the total state vector must have
the appropriate symmetry. For example, consider the ground state for two electrons in a
Helium atom. The position space ground wave function is symmetric under particle interchange. Thus the ground state must be a singlet. Excited states can, however by described
by symmetric spin states if the position part of the state vector is anti-symmetric under
particle interchange.
Lecture 27
Angular momentum addition in general
We can generalize our previous discussion of 2 spin 1/2 systems as follows. Suppose we
are given two angular momenta J~1 and J~2 (e.g., two spins, or a spin and an orbital angular
momentum, or a pair of orbital angular momenta). We can discuss both angular momenta
at once using the direct product space as before, with a product basis |j1 , m1 i |j2 , m2 i.
We represent the operators on product vectors as
~
J~1 (|i |i) = (J|i)
|i,
and
~
J~2 (|i |i) = |i (J|i),
and extend to general vectors by linearity. The product basis |j1 , m1 i |j2 , m2 i is the
basis corresponding to the commuting observables provided by (J12 , J22 , J1z , J2z )).
The total angular momentum is defined by
~J = J~1 + J~2 .
A set of commuting observables that includes the total angular momentum is provided
by the operators (J12 , J22 , J 2 , Jz ). Note that both bases are eigenvectors of J12 and J22
since these commute with all components of the individual and total angular momentum
(exercise). We also note that product eigenvectors |j1 , j2 , m1 , m2 i are in fact eigenvectors
of Jz with eigenvalues given by m = m1 + m2 since
Jz |j1 , j2 , m1 , m2 i = (J1z + J2z )|j1 , j2 , m1 , m2 i = (m1 + m2 )h|j1 , j2 , m1 , m2 i.
But we will have to take linear combinations of product basis vectors to get total angular
momentum vectors eigenvectors of J 2 .
The basis of total angular momentum eigenvectors are denoted |j1 , j2 , j, mi. For given
values of j1 and j2 , it can be shown that (see the text)
j = |j1 j2 |, |j1 j2 | + 1, . . . , j1 + j2 1, j1 + j2
with (as usual)
m = j, j + 1, . . . j 1, j.
The two sets of commuting observables defining each kind of basis are not all compatible.
In particular, J1z and J2z do not commute with J 2 . So the set of total angular momentum
1
eigenvectors will be distinct from the eigenvectors of the individual angular momenta. Total
angular momentum eigenvectors, |j1 , j2 , j, mi can be expressed as linear combinations of
|j1 , j2 , m1 , m2 i where the superposition will go over various m1 , m2 values. Of course one
can also expand |j1 , j2 , m1 , m2 i in terms of |j1 , j2 , j, mi where the superposition will be
over various j, m values. The coefficients in these superpositions are known as the ClebschGordan coefficients. We have worked out a very simple example of all this in the case of
a pair of spin 1/2 systems. There is a general theory of Clebsch-Gordan coefficients which
we shall not have time to explore. Instead we will briefly visit another, relatively simple,
and relatively important example.
Example: A particle in 3-d with spin 1/2
We now apply these ideas and obtain some formulas describing the angular momentum
of a particle moving in 3-d with spin 1/2. Of course this is a model of a non-relativistic
electron so it is extremely important.
To begin, we specify the Hilbert space of states. Using the tensor product construction,
we can view it as the space of formal linear combinations of the product basis:
|~x, i = |~xi |Sz , i.
A general state is of the form
Z
|i =
d3 x (a+ (~x)|~x, +i + a (~x)|~x, i) .
As usual, |a |(~x)|2 is the probability density for finding the particle at ~x with spin up/down
along the z axis. Note that the normalization condition is
Z

d3 x |a+ |2 + |a |2 = 1.
Alternatively, we can characterize the state in terms of its components in the basis defined
above. In this case we organize the information into a 2-component column vector whose
entries are complex valued functions. This gadget is known as a spinor field:

h~x, +|i
a+ (~x)
=
.
h~x, +|i
a (~x)
The position, momentum and spin operators are defined on the product basis as follows:
~ xi |Sz , i) = ~x|~x, i,
X(|~
P~ (|~xi |Sz , i) = (P~ |~xi |Sz , i,
~ xi |Sz , i) = |~xi (S|S

~ z , i).
S(|~
2
This implies that, on the spinor fields the position and momentum operators do their usual
~ multiplies, P~ differentiates) on each component function while the spin operators
thing (X
do their usual thing via 2 2 matrices. For example, the orbital angular momentum is
~ =X
~ P~ and acts as
L
!

h ~
x
a
(~
x
)
a
(~
x
)
+
+
~ P~
X
= hi
,
a (~x)
x a (~x)
i~
and

Sx
a+ (~x)
a (~x)
h
=
2
a (~x)
a+ (~x)

.
We can now define the total angular momentum of the system as the operator
~ + S.
~
J~ = L
~ and S
~ commute and satisfy the angular momentum commutation
As usual, because L
relations, we have
[Ja , Jb ] = ihabc Jc ,
so the total angular momentum has all the general properties we deduced previously. For
~ only generates rotations
example, J~ generates rotations of the system as a whole, while L
~ only generates rotations of the spin. We can only
of position and momentum, and while S
~ and
simultaneously diagonalize L2 , S 2 , J 2 and one component, say, Jz . Setting J~1 = L
~ we have j1 = l = 0, 1, 2, . . . and j2 = s = 1 . For a state specified by |l, s = 1/2, j, mi
J~2 = S
2
we then have that the possible values for j are
1
1
j = l ,l + .
2
2
How are the total angular momentum eigenvectors related to the original product
eigenvectors (eigenvectors of Lz and Sz )? We will sketch the construction. Begin with the
eigenvectors of total angular momentum with the maximum value of angular momentum:
1
1
1
|j1 = l, j2 = , j = l + , m = l + i.
2
2
2
Given j1 and j2 , there is only one (linearly independent) product eigenvector which has
the appropriate eigenvalue for Jz , namely
1
1
|j1 = l, j2 = , m1 = l, m2 = i,
2
2
so we have
1
1
1
1
1
|j1 = l, j2 = , j = l + , m = l + i = |j1 = l, j2 = , m1 = l, m2 = i.
2
2
2
2
2
3
To get the lower eigenvalues of Jz we can apply the ladder operator

J = L + S
to this ket and normalize it. The details can be found in the text. Similarly, we can also
start with the minimum values for j = l 12 and m = l + 21 and construct the states with
the other m values using the raising operator. In this way we can express all the total
angular momentum eigenvectors in terms of the product eigenenvectors. The result is not
too complicated. We have, for j = l 21 ,
r
h r
1
1
1
1
1
1
1
l m + |l, m i|+i+ l m + |l, m+ i|i.
|j1 = l, j2 = , j = l , mi =
2
2
2
2
2
2
2l + 1
In terms of spinor fields we get the total angular momentum spinors Yj,m (, ), representing the total angular momentum eigenvectors by taking components in the product
basis:
q
1
l
(, )
l m + 2Y
1
ml =m 21
q
Yj,m =
.
1
l
2l + 1
l m + 2Y
(,
1
ml =m+ 2
The upper (lower) component of this column vector determines the probability density for
finding a particle at various angular locations and with a spin up (down) along z, given
that its total angular momentum is known with certainty to have the values specified by
j = l 21 and m.
Lecture 28
Spin correlations and quantum weirdness: The EPR argument
Recall the results of adding two spin 1/2 angular momenta. The fact that the total spin
magnitude is not compatible with the individual spin observables leads to some somewhat
dramatic consequences, from a classical physics point of view. This drama was already
noted by Einstein-Podolsky and Rosen (EPR) in a famous critique of the completeness of
quantum mechanics. Much later, Bell showed that the basic classical locality assumption
of EPR must be violated, implying in effect that nature is truly as weird as quantum
mechanics makes it out to be. Here we give a brief discussion of some of these ideas.
The original EPR idea did not deal with spins, but with a pair of spinless particles. We
shall in a moment, following Bohm, deal with a spin system. But it is worth first describing
the EPR argument, which goes as follows. Consider two spinless particles characterized
by positions x1 , x2 and momenta p1 , p2 . It is easy to see that the relative position
x = x1 x2
and the total momentum
p = p1 + p2
commute, so there is a basis B of states in which one can specify these observables with
arbitrary accuracy.
Suppose the system is in such a state. Suppose that an observer measures the position
of particle 1. Assuming that particles 1 and 2 are well separated, there is no way this
experiment on particle 1 can possibly affect particle 2. (This is essentially the EPR locality
idea.) Then one has determined particle 2s position with arbitrary accuracy without
disturbing particle two. Thus particle 2s position is known with certainty it is an
element of reality. Alternatively, one could arrange to measure p1 . By locality, the
value of p2 is undisturbed and is determined with arbitrary accuracy. One concludes
that, given the locality principle, particle 2s position and momentum exist with certainty
as an element of reality. But, of course, quantum mechanics prohibits this situation.
Thus either quantum mechanics is incomplete as a theory (unable to give all available
information about things like position and momentum), or the theory is non-local in some
sense because the locality idea was used to argue that a measurement on particle 1 has no
effect on the outcome of measurements on particle 2. The loss of this type of locality was
deemed unpalatable by EPR, and so this thought experiment was used to argue against
the completeness of quantum mechanics. In fact, the correct conclusion is that quantum
mechanics is, in a certain sense, non-local.
1
Lecture 29
Spin correlations and quantum weirdness: Spin 1/2 systems
Consider a pair of spin 1/2 particles created in a spin singlet state. (Experimentally
speaking, this can be done in a variety of ways; see your text.) Thus the state of the
system is defined by the state vector
1
|i = (| + i | +i).
2
Let us suppose that the particles propagate undisturbed and not interacting until
they are well-separated. Particle 1 has a component of spin measured by Observer 1 and
particle 2 has a component of spin measured by Observer 2. To begin, suppose both
observers measures spin along the z axis. If observer 1 sees spin up, what does observer
2 see? You probably will correctly guess: spin down. But can you prove it? Well, the
reason for this result is that the state of the system is an eigenvector of Sz with eigenvalue
zero. So, the two particles are known with certainty to have opposite values for their
z-components of spin. Alternatively, you can see from the expansion of |i in the product
basis that the only states that occur (with equal probability) are states with opposite spins.
Let us see how to prove this systematically; its a good exercise.
We can ask the question as the following sequence of simple questions. What is the
probability P (S1z = h2 ) that observer 1 gets spin up? Thats easy:*
P (S1z =
1
h
) = |h+ + |i|2 + |h+ |i|2 = .
2
2
Of course, there is nothing special about particle 1 compared to particle 2; the same result
applies to particle 2. Whats the probability for getting particle 1 with spin up and particle
2 with spin down? We have
P (S1z =
h
h
1
, S2z = ) = |h+ |i|2 = .
2
2
2
* Well, its easy if you realize that, in a state |i, the probability for getting an eigenvalue
a of an operator A is (exercise)
P (a) =
d
X
|hi|i|2 ,
i=1
where |ii are a basis for the d-dimensional subspace of vectors with eigenvalue a:
A|ii = a|ii,
i = 1, 2, . . . d.
You can now easily infer that in the singlet state when particle 1 is determined to have
spin up along z, then particle 2 will have spin down along z with certainty. We say that the
Sz variables for particles 1 and 2 are completely correlated. Another way to state this
is the following. If in the singlet state particle 1 is determined to have spin up along
z, then the state vector of the system for the purposes of all subsequent measurements is
given by | + i. A subsequent measurement of the z component of spin for particle 2 is
then going to give spin-down with certainty. The foregoing discussion will work for any
choice of axis, that is, it holds for any component of the spin (the same component being
measured by each observer).
We imagine repeating this experiment many times. If only one observer measures
Sz , then he/she will see a random sequence of spin up or spin down results with a 5050 probability. Quantum mechanics says that this is the best you can do you cannot
predict with certainty the Sz value for each particle because the individual particle spin
is not compatible with the magnitude of the total spin of the system, which is what is
known with certainty. The (singlet) state vector has been specified and there is no more
information to be had about the system. How do the particles know that they have to
have a perfect correlation in their Sz components, given that they can be separated by
arbitrarily large distances? EPR would like to say: each of the particles does have a definite
value of Sz when they are created, and these values are correlated as described above, its
just that quantum mechanics is unable to predict the Sz values for each particles in the
first place and assigns them a random probability distribution.
EPR go on to argue that if, instead, quantum mechanics is the ultimate description
then a paradox arises. To see this, reason as follows. If both observers make measurements,
then observers 1 and 2 each see a random distribution of spin up and spin down results,
with a 50-50 probability. But when they compare their data they see that each observer 1
spin up is matched with an observer 2 spin down and vice versa. Thus, in this particular
situation, one might say that particle 2 has in each experimental run a definite value of Sz .
(Of course, the choice of the z axis is completely arbitrary.) Now, suppose that observer
1 chooses to measure Sx instead of Sz . When observer 1 gets spin up/down along x,
then the state of the system is (exercise) |S1x = h/2; S2x = h/2i and observer 2s Sz
measurement will now yield spin up and spin down with a 50-50 probability. To see this,
the probability P (S2z = h
/2) for getting S2z = h
/2 in either state |S1x = h/2; S2x , h/2i
can be computed by
P (S2z = h
/2) = |hS1x = h
/2; S2z = h
/2|S1x = h/2; S2x , h/2|2
+ |hS1x = h/2; S2z = h
/2|S1x = h/2; S2x , h/2|2
1
= .
2
Evidently, in this case particle 2 does not have a definite value for Sz it has a 50-50
2
probability distribution. On the other hand, if observer 2 chose to measure Sx , with

certainty the result will be opposite to the outcomes of observer 1s measurement. In other
words, when observer 1 measures Sx , particle 2 can be viewed as having a definite value
for Sx , but not of Sz .
Put differently we can say: (i) A measurement of one spin component yields random
(50-50) values. (ii) When observer 1 measures Sz observer 2s Sz value is perfectly correlated with observer 1s measurement. (iii) When observer 1 measures Sx , observer 2s Sz
values are random uncorrelated with observer 1s results.
From this discussion it is clear that according to the description provided by quantum
mechanics particle 2s reality depends upon how observer 1 does his/her measurement.
One cannot assert that the spin observables were a part of the reality of each particle
independent of the measurements. This contradicts the assumption of locality, i.e., that
for suitably separated particles the experiments done upon one should not affect the other.
The preceding is a pretty run-of-the-mill quantum mechanical result, ultimately stemming from the incompatibility of observables. But it is not consistent with EPRs principle
of locality, which prohibits the observable properties of particle 2 to depend upon those
of particle 1 when the two can be separated by an arbitrary distance. Thus we arrive at
the EPR paradox. EPR argue that the only way out is to suppose that all the spin
components of each particle are in fact well-defined, determined with certainty by the
state preparation process, and quantum mechanics is just an incomplete theory with a
concomitant bit of non-locality in its predictions.
However, experiments have in fact supported the view that quantum mechanics is in
fact non-local in the sense described above. In what follows we sketch the analysis that
supports this claim.
Bells theorem
Here we briefly describe Bells attempt to follow up on the EPR conclusion. We will
derive a certain prediction (an inequality satisfied by correlation functions) based upon
the EPR proposal that all spin components are in fact determined with certainty and that
they exist locally. We shall then show that this prediction disagrees with the quantum
mechanical prediction. Finally we then point out that experimental results support the
quantum mechanical prediction.
Suppose that when the two particles are created all their spin components are in fact
defined with certainty and that the values of the spin components of one particle are not
influenced by any experiments done on the second particle. Let us consider the relation
between the outcomes of measurements made by observers 1 and 2 in the case where
they measure spin components about arbitrarily chosen axes n
1 and n
2 , respectively. We
3
report the outcomes of the measurements in units of h

/2 and we suppose that observer 1s
result is given by a function A(
n1 , ) = 1 and observer 2s result is given by a function
B(
n2 , ) = 1, where is any set of (hidden) variables needed to uniquely determine the
outcome.* Of course quantum mechanics does not have access to , but if the EPR idea is
correct, there should be some set of variables that determine the spin components for each
particle when they are created. These variables will, in general, change from one creation
to the next; after all we know that each of the spin components is observed to randomly
fluctuate between the two possible values with a 50-50 probability. We characterize the
changing over many experiments in a very general way: using a probability density ()
(whose specific form will be irrelevant for the purposes of this discussion). So, for example,
the average value of A(
n, ) over many experimental runs is
Z
hA(
n)i = d()A(
n, ).
We next introduce the fact that when the two observers detectors are both aligned
(
n1 = n
2 ) we get equal and opposite values for A and B with certainty:
A(
n, ) = B(
n, ).
We first focus on the average of the product of the two measurements over many
experimental runs:
Z
hA(
n1 )B(
n2 )i = d()A(
n1 , )B(
n2 , )
Z
= d()A(
n1 , )A(
n2 , ).
Notice that this is essentially the correlation function of the two spin measurements.
Next, recall that the unusual non-local behavior appeared when we considered measurements of the spin of one particle in one direction only but varied the direction of spin
being measured for the other particle. So, introducing yet another direction, n
3 , we have
(exercise)
Z
hA(
n1 )B(
n2 )i hA(
n1 )B(
n3 )i = d()[1 A(
n2 , )A(
n3 , )]A(
n1 , )A(
n2 ),
where we used the fact that (because we measure in units of h/2)
[A(
n, )]2 = 1.
* The locality assumption is used when we say that A doesnt depend upon n
2 and that B
doesnt depend upon n
1.
4
It is now easy to establish the estimate (exercise):

Z
|hA(
n1 )B(
n2 )i hA(
n1 )B(
n3 )i| d()[1 A(
n2 , )A(
n3 , )].
You can prove this estimate by observing that for any f ()
Z
Z
| df () | d|f ()|.
Next, using
Z
d() = 1,
we get
|hA(
n1 )B(
n2 )i hA(
n1 )B(
n3 )i| 1 + hA(
n2 )B(
n3 )i.
This is a version of the Bell inequality. It holds for a very wide range of local, hidden
variable theories in which all observables are compatible. We now show that quantum
mechanics, of course, violates this inequality so that one can, in effect, experimentally
determine whether quantum mechanics is complete.
Lecture 30
Bells theorem (cont.)

Assuming suitable hidden variables coupled with an assumption of locality to determine
the spin observables with certainty we found that correlation functions must satisfy
|hA(
n1 )B(
n2 )i hA(
n1 )B(
n3 )i| 1 + hA(
n2 )B(
n3 )i.
We now show that quantum mechanics is not compatible with this inequality. We compute
the expectation value of the product of the two observers measurements (in units of h/2)
using quantum mechanics:

hA(
n1 )B(
n2 )i =

1 2
4
~1 )(
~2 )|i.
hS1 S2 i = 2 h|(
n1 S
n2 S
h/2
h
(Again, this quantity is the correlation function of the two spins.) With z chosen along
n
1 , this quantity is easily computed (exercise):
~1 )(
~2 )|i = h
~2 (| + i + | +i)
h|(
n1 S
n2 S
(h+ | h + |)
n2 S
4
h2
= cos ,
4
where is the angle between n
1 and n
2 . To get the last equality we assume that n
1 and
n
2 are in the x-z plane with z along and n
1 . Using to denote the angle between n
1 and
n
2 we have
~2 (|+i+|+i) = (h+|h+|)[cos S2z +sin S2x ](|+i+|+i) = cos
(h+|h+|)
n2 S
Of course the result is geometric and does not depend upon the choice of coordinates.
Thus, defining ij = n
i n
j , Bells inequality if it applied in quantum mechanics
would imply
| cos 13 cos 12 | 1 cos 23 ,
which is not true. Thus quantum mechanics is not consistent with all observables
having local definite values based upon some (unknown) hidden variables. On the other
hand, if reality is such that all observables for the individual particles are compatible
and locally defined (with QM just giving an incomplete statistical description), then this
To see this, just let n
1 point along y, let n
3 point along x, and let n
2 lie at 45 from x (or
y) in the x-y plane, so that 12 = /4 = 23 , 13 = /2.
1
inequality should be valid, experimentally speaking. (Assuming of course that the correct
description can be obtained using some hidden variables as described above.)
Experiments to check the Bell inequality have been performed since the 1960s. Many
regard the Aspect experiment of the early 1980s as definitive. It clearly showed that the
Bell inequality was violated, while being consistent with quantum mechanical predictions.
The experiment actually used spin-1 particles (photons) arising from atomic transitions,
rather than spin 1/2 systems, but the ideas are the same as described above.
Approximation methods
We now begin studying various approximation methods in quantum mechanics. Approximation methods have a practical and a conceptual value. On the practical side, we
use such methods to get useful, approximations to wave functions, energies, spectra, as
well as transition probabilities and other dynamical quantities. On the conceptual side
we shall see that some of our most cherished ways of thinking about energy levels and
dynamics stem principally from the point of view of approximation methods.
The need for approximation methods arises from the simple fact that almost all realistic
physical systems one wants to study are too complicated for explicit analytic solutions to
be available. (This is no less true in classical mechanics.) So, for example, while we can
analytically handle the hydrogen atom (when modeled as a charged particle in a Coulomb
field), we cannot handle helium or more complicated atoms in the same fashion let alone
dynamical processes involving the interaction of these atoms with electromagnetic fields.
In fact, even more realistic models of the hydrogen atom (including things like, spin-orbit
coupling, hyperfine interaction, finite size of nucleus, etc. ) are not exactly soluble. Thus
the only way we can understand these systems is to find methods of approximation.
We shall study two of several possible approximation techniques. First we shall look
at what is usually called time independent perturbation theory (TIPT), which gives approximate solutions to eigenvalue problems. But this is also called stationary state perturbation theory (since one is usually studying the eigenvalue problem for the Hamiltonian).
Then we shall study time-dependent perturbation theory (TDPT), which is designed to
give approximate solutions to the Schrodinger equation.
For the most part I am going to explain the results of the theory with essentially no
derivations. Then we will look at some important applications.
Time independent perturbation theory

This approximation method is designed to approximate the eigenvalues and eigenvectors of a given observable. This observable is usually the Hamiltonian (whence the
2
alternate name stationary state perturbation theory), but the techniques and results are
not restricted to just the Hamiltonian; any observable will do.
The basic idea is that one is attempting to view a given observable of interest as in
some sense close to a simpler, well-understood observable. One then approximates the
eigenvalues and eigenvectors of the given observable in terms of those of the simpler observable. For example, one could be interested in the energies of an anharmonic oscillator,
with Hamiltonian
P2
1
H=
+ m 2 X 2 + X 4 .
2m 2
Assuming that the anharmonicity (described by is suitably small), one can usefully
approximate the eigenvalues and eigenvectors of H in terms of those of the harmonic
oscillator. Let us now make this more precise.
We suppose that the observable of interest H admits eigenvalues and can be expressed
as
H = H0 + V,
where V is a small perturbation of H0 , e.g., its matrix elements in the basis of eigenvectors of H0 are small compared to the eigenvalues of H0 . For simplicity, we assume that H0
has discrete spectrum. We assume that all the operators in question are sufficiently well
behaved such that the eigenvectors and eigenvalues of H can be obtained via a 1-parameter
family of operators* beginning with H0 :
H() = H0 + V,
H(0) = H0 ,
H(1) = H.
For each value of we have

H()|E()i = E()|E()i.
The idea is that if the effect of V is small compared to that of H0 then one should be able
to approximate the eigenvectors and eigenvalues of H by expanding them in a power series
in , keeping just the first term or so, and then evaluating at = 1. This is equivalent
to approximating the eigenvectors and eigenvalues using an expansion in powers of the
matrix elements of the potential (which can be taken to be V ). Thus we assume that
(0)
(1)
(2)
En () = En + En + 2 En + ,
|En ()i = |En i(0) + |En i(1) + 2 |En i(2) + .
The plan is to solve the equation
(H0 +V )(|En i(0) + |En i(1) + 2 |En i(2) + )
(0)
(1)
(2)
= (En + En + 2 En + ) (|En i(0) + |En i(1) + 2 |En i(2) + )

* The parameter is not essential; it is just a convenient means of bookkeeping, as you will
see.
3
order by order in to derive the corresponding perturbative approximations to the eigenvectors and eigenvalues. Let us note that eigenvectors are only determined up to multiplication by a scalar. This is, of course, what allows us to normalize them and view them as
state vectors. Therefore, when we solve the eigenvalue problem perturbatively we will still
need to normalize the result.
At zeroth order, it is easy to see by inspection that the relevant equation is
(0)
H0 |En i(0) = En |En i(0) ,

which just says that the zeroth vector and scalar are the unperturbed eigenvector and
eigenvalue. This should come as no surprise. The higher order equations constitute a triangular system of linear, inhomogeneous equations. Each set of equations depends only
upon the results of the lower order equations. So one can successively solve these equations from lower to higher order. In the end, any perturbative correction can be expressed
completely in terms of the unperturbed eigenvalues and eigenvectors. Of course, the idea
is that for a sufficiently small perturbation a good approximation can be obtained by just
sticking to relatively low orders. The details of the solution process (which constitutes a
very nice application of linear algebra techniques) can be found in your text. Here I will
just state the results for first-order perturbation theory (O() corrections) and show you
how to use them.
As it turns out, the solutions of the first-order equations are rather simple when the
(0)
unperturbed ( = 0) eigenvalue is non-degenerate. We begin with that case. If En is
non-degenerate, then we have
(1)
En = Vnn (0) hEn |V |En i(0) ,

and
|En i(1) =
|Ek i(0)
X
k6=n
Vkn
(0)
En
(0)
Ek
where
Vkn = (0) hEk |V |En i(0) .
The first-order approximation to the energy eigenvalue is then
(0)
En En + Vnn .
The un-normalized eigenvector is, to first-order,
|En i |En i(0) +
X
k6=n
|Ek i(0)
Vkn
(0)
(0)
En Ek
Thus the correct eigenvector is a superposition of the old eigenvectors, which form a basis.
If we wish to approximate the state of the system using this vector, we must normalize
it. It is a straightforward exercise to see that, to first-order in perturbation theory,
p
|En inormalized = Zn |En i,
where
Z =1
|Vkn |2
(0)
(0)
2
k6=n (En Ek )
(0)
You can easily see that Zn can be interpreted as the probability for getting En when
measuring H0 in the state where H is known to have the value En with certainty. Put
differently, we can say that Zn is the probability for finding the state |En i(0) when the
system is in the state |En i. In general, assuming that at least one of the Vkn 6= 0, we have
that Zn < 1.
Lecture 31
Example: finite size of the atomic nucleus

One improvement on the simple particle-in-a-potential model of an atom takes account
of the fact that the atomic nucleus is not truly point-like but instead exhibits finite-size
structure in both its mass and charge distributions. It is possible to use perturbation
theory to get a quick look into the effect on atomic spectra of this feature. Of course, this
effect is one of a myriad of small corrections that need to be added to the atomic model.
Let us model the nucleus as a (very massive) uniform ball of total charge Ze with
radius r0 . As a nice exercise in elementary electrostatics, you can check that the potential
energy
V (r) = e
for such a charge distribution takes the form
V (r) =
2
Ze
r
2
Ze
2r
for r r0
2
r
r0

3 , for r r0 .
I do not know if a closed form solution to the eigenvalue problem for the Hamiltonian
H=
P2
+V
2m
is known, but I doubt that such a solution exists. We well treat the potential energy due
to the finite size r0 of the nucleus as a perturbation of the usual Coulomb potential. To
this end we write
V (r) = V0 (r) + B(r),
where
V0 (r) =
and
Ze2
,
r
for r > 0,

Ze2 r 2
2r0
3+ r
for 0 r r0 ,
r0
2r0
B(r) =
0
for r r0 .
The idea is then that, since the unperturbed energy eigenfunctions are non-trivial over a
range corresponding to the Bohr radius a (for the given Z), as long as r0 << a we expect
that the effect of the perturbation B will be small. We will make this more precise in a
moment.
1
Recall that the energy eigenstates |n, l, mi have position wave functions given by
hr|n, l, mi nlm (r, , ) = Rnl (r)Ylm (, ),
where the Ylm are the usual spherical harmonics and
"
#1/2
l
r
2 3 (n l 1)!
2r
2r
Rnl (r) =
L2l+1
( ),
e na
nl1
3
na
na
na
2n{(n + l)!}
with
a=
2
h
,
Zme e2
m = electron mass,
and Lq is the associated Laguerre polynomial. See the text for details. Note that a is the
Bohr radius for the hydrogenic atom with a given Z; it sets the scale for the atomic size.
Let us consider the first-order correction to the ground state of the atom due to this
perturbation. We choose the ground state for simplicity and since the ground state energy
is non-degenerate. The shift in the ground state energy is given by
E = h1, 0, 0|B|1, 0, 0i.
Using the fact that
r
2
e a ,
100 =
4a3
we find that (exercise)

2Ze2
E =
dr r2
r0 a3
0
Z r0
"
#

2r
r 2
2r0
e a .
3+
r0
r
This integral can be computed explicitly. Try it! (I did it using Maple.) I am going to
give you the answer after it has been simplified with the assumption that r0 << a, which
is needed in any case to ensure that the perturbation is sufficiently small to render our
approximation scheme viable. In this case we get
E
r 2
2Ze2 r02
4
0
=
|E
|
,
5 ground a
5a3
r0
<< 1.
a
(I got this expression by having Maple perform a Taylor series expansion.) Thus the effect
of the finite size is to shift the energy upward and the scale of the shift is determined by
r0 2
a .
To get a feel for the size of the correction, let us consider a couple of examples. For
hydrogen, we note that the charge radius of a proton is on the order of 1015 m and that
the Bohr radius is on the order of 1010 m so that the the perturbative correction is on
2
the order of one part in 1010 . Very small! There are physical systems in which the finite
size of the nucleus has a more pronounced effect. For example, one can consider muonic
hydrogen, consisting of a proton and a muon. Muons are like electrons only much
more massive. For muonic hydrogen the Bohr radius is smaller by a factor of 102 , leading
to an energy shift on the order of 1 part in 106 . At the other extreme, consider a muonic
lead atom (a negatively charged muon bound to a lead nucleus). There we have that both
r0 and a are on the order of 1015 m. Hence here the finite size effect is as important as the
Coulombic effect. t Note, however, that our perturbative approximation is no longer valid
since the effect of the perturbation is not small so this is at best a qualitative statement.
Let me finish this example with a few comments about the perturbative approximation to the ground state. To compute it we need to compute the matrix element
hn, l, m|B|1, 0, 0i, n 6= 1. Since the perturbation commutes with angular momentum, it is
clear that this matrix element will be non-zero only if l = m = 0. We then have (exercise)
Z r0
hn, l, m|B|1, 0, 0i = l0 m0
dr r2 Rn0 (r)R10 (r)B(r).
0
Assuming that r0 << a, we can approximate Rnl (r) Rnl (0). We then get (exercise)
Ze2 2
r R (0)R10 (0)l0 m0 ,
10 0 n0
which can be used to compute the approximate ground state wave function as a superposition of unperturbed l = 0 = m hydrogenic stationary states. Note that this superposition
will lead to a spherically symmetric ground state wave function which is non-vanishing for
r < r0 . Thus there is a non-zero probability for finding the electron (or muon) within the
nucleus!
hn, l, m|B|1, 0, 0i
Degenerate Perturbation Theory

We now consider the case where the unperturbed eigenvalue is degenerate, that is,
(0)
there are d linearly independent eigenvectors |En ii , i = 1, 2, . . . , d for the unperturbed
(0)
eigenvalue En . These eigenvectors span the degenerate subspace D, which is a finitedimensional vector space sitting inside the full Hilbert space of state vectors. Degeneracy
is associated with a symmetry of the (unperturbed, in this case) Hamiltonian. The full
Hamiltonian (with the perturbation included) will typically not have all the symmetry of
the unperturbed Hamiltonian. Thus the true eigenvalues that are approximated by the
unperturbed eigenvalue will usually not all be degenerate. Put differently, as the perturbation is turned on, by mathematically varying from 0 to 1, some of the unperturbed
eigenvectors with the same unperturbed eigenvalue become eigenvectors with a distinct
eigenvalue, so that the degeneracy can be lifted by the perturbation. One says that the
energy levels split as the perturbation is turned on.
3
Consider for example an atom modeled as a particle moving in a central potential. Its
excited states are degenerate because the (unperturbed ) Hamiltonian H0 is rotationally
~ it is easy to see that all states differing
invariant. In particular, since H0 commutes with L,
only by their m values must have the same energy. In detail, if |n, l, mi is an eigenvector
of H0 , then so is L |n, l, mi. Thus all such states will have the same energy. Suppose this
atom is put in a uniform electric field (Stark effect), so that the perturbation is
~ X.
~
V = eE
~ so that the degenThis potential breaks the rotational symmetry (to just that about E),
eracy is lifted. States constructed as outlined above will have differing energies.
But now there is a subtlety in the perturbative account of this phenomenon. In the
unperturbed system any basis for the degenerate subspace could be used to define the
unperturbed states. But in the perturbed theory, the energy eigenvectors that are no
longer degenerate cannot be linearly superimposed to get eigenvectors. This means that,
in fact, the perturbation theory must select a preferred basis of unperturbed eigenvectors,
namely, the ones that the correct eigenvectors collapse to as 0. We will see this
happening in what follows. The problem is now that in perturbation theory one begins
with the unperturbed states. But which states do we choose? The results of degenerate
perturbation theory take care of this, as you will see.
Lecture 32
Degenerate Perturbation Theory (cont.)
Degenerate perturbation theory leads to the following conclusions (see the text for
details of the derivation). To compute the first-order corrections to the energies and eigenvectors when there is degeneracy in the unperturbed eigenvalue one proceeds as follows.
Step 1
Consider the restriction, V of the perturbation V to the d-dimensional degenerate
subspace D. V is defined as follows. The action of V on a vector from D is some other
vector in the Hilbert space. Take the component of this vector along D, i.e., project this
vector back into D. This process defines a Hermitian linear mapping V from D to itself.
In practice, the most convenient way to compute V is to pick a basis for D. Compute the
matrix elements of V in this basis. One now has a d d Hermitian matrix representing V .
Step 2
Find the eigenvectors and eigenvalues of V . Again, it is most convenient to do this
using the matrix representation of V .
Step 3
(0)
The first-order energy corrections to En are the eigenvalues of V . The zeroth order
limit of the true eigenvectors that correspond to the true eigenvalues are the eigenvectors
of V that correspond to the first-order eigenvalue corrections.
Step 4
The first-order correction to the eigenvectors are given by the same formula as before,
but now one excludes vectors from D in the summation. The zeroth order eigenvectors
are those identified in Step 3.
These steps are, at first sight, rather complicated looking. It is not really that bad,
though. An example will help clear things up.
Example: Hyperfine structure
The hyperfine structure results from the interaction of the nuclear and electronic spin.
Specializing to a hydrogen atom we can model it as follows. We model the electron as a
1
particle with spin 1/2, so it has both translational and spin degrees of freedom. We will
model the nucleus (proton) as fixed in space, so we model it as a spin 1/2 system. The
total system then is defined on the Hilbert space which is a direct product of the space of
states of the electron and another spin 1/2 system. A basis for this space is given by the
product states of the form
|; S(e)z , ; S(p)z , i := |i |S(e)z , i |S(p)z i,
where |i runs over a basis for the Hilbert space of a spinless particle (e.g., the energy
eigenfunctions for the hydrogen atom Hamiltonian), |S(e)z , i is the usual basis for the
spin 1/2 system here used for the electron, and |S(p)z , i is the usual basis for the spin
1/2 system here used for the proton.
The magnetic moments for the electron and proton are given by
e
ge
e =
S(e) , p =
S ,
me
2mp (p)
where g is the proton g-factor, which is about 5.6.
It is a nice exercise to check that a pure point magnetic dipole (which is how we are
modeling the proton) is the source of the following magnetic field
B(r) =
3n(n p ) p 8
+
p (r),
3
r3
where
r
n= .
r
Here the first term represents the familiar dipole term in a multipole expansion of the
magnetic field outside a localized current distribution. The second term is used to model
the limit in which one assumes the spatial dimensions of the current distribution vanish
while keeping the magnetic moment p fixed. The energy of the electron in the presence
of this field is given by
U := e B.
Let us treat this energy as a perturbation of the usual hydrogenic energy and study the
effects of this perturbation on the ground state of hydrogen in our current model.
The unperturbed ground state of hydrogen does not know about the magnetic field
of the proton. Therefore its energy is the usual 13.6 eV . However, since we have taken
account of the spin degrees of freedom of both the proton and the electron we now have
a four-fold degeneracy corresponding to the different possible states of orientation of the
spins, and superpositions thereof. The degenerate subspace of the ground state energy is
spanned by the four orthonormal vectors
|ground ; S(e)z , , S(p)x , i,
2
where
|ground i = |n = 1, l = 0, ml = 0i.
According to degenerate perturbation theory our first step is to compute the matrix
elements of the perturbation in the degenerate subspace. For each of the basis states we
have a matrix element of the form

Z
1
3n(n a) a 8
3
2
(constant) d x |R10 (r)|
a(r) b,
+
4
3
r3
where the vectors a and b represent the matrix elements of the magnetic moment vectors:
a = hS(p)z , |p |S(p)z , i,
b = hS(e)z , |e |S(e)z , i.
Now, it is a standard result from E&M that the quadrupole tensor,
1
Q(a, b) = (r a)(r b) r2 a b,
3
here evaluated on a pair of constant vectors, has a vanishing average over the unit sphere:
Z 2
Z
1
d Q(a, b) = 0.
d
4 0
0
One way to check this is to write out this tensor in spherical polar coordinates. You will
find that the angular dependence of this tensor is that of the spherical harmonic Yl=2,m ,
which integrates to zero over a sphere (since it is orthogonal to constants, i.e., Y00 ). This
result implies (exercise) that only the delta function portion of B plays a role in the matrix
of the perturbation in the ground states.
To be continued...
Lecture 33
Example: Hyperfine structure (cont.)
We are evaluating the matrix elements of the perturbation, which now takes the form
e B =
8
e p (r)
3
in the degenerate subspace spanned by

|1, 0, 0i |i |i.
The translational part of the unperturbed degenerate subspace is just the usual
ground state of hydrogen and so when computing the matrix elements of the perturbation
we get a common factor of:
Z
h1, 0, 0|(r)|1, 0, 0i = d3 x|100 (r)|2 (r) = |100 (0)|2 .
Using |100 (0)|2 = a1 3 , the 4 4 the perturbation takes the form
4ge2
(Sp Se )ij ,
3mp me a
where a is the Bohr radius and the i j refer to the basis |S(e)z , S(p)z i = | + +i, | + i, |
+i, | i. So, for example,
(Sp Se )12 = h+|Se |+i h+|Sp |i = 0.
A very straightforward computation of the matrix elements yields
1 0
0 0
2
h 0 1 2 0
(Sp Se )ij =
0 2 1 0 .
4
0 0
0 1
The eigenvalues and (normalized) eigenvectors of this matrix are (exercise)

1
0
0
2
h
0 1 1 0
eigenvalue :
, eigenvectors : , , ,
0
0
4
2 1
0
0
1
0
1 1
3h2
, eigenvectors :
eigenvalue :
.
4
2 1
0
1

2
Of course, the eigenvalue h4 is triply degenerate; any linear combination of its three eigenvectors is also suitable.
You can also get this result by setting

S = Sp + Se
and computing
1
3
1
Sp Se = (S 2 Sp2 Se2 ) = S 2 h2 I.
2
2
4
Recall that the singlet and triplet states are eigenvectors of S 2 with eigenvalues of 0 and
2h2 respectively. Thus the singlet state is an eigenvector of Sp Se with eigenvalue 34
h2
and the triplet states all have the eigenvalue 41 h2 . The components of the eigenvectors
in the product basis which we found above are indeed the components of the singlet and
triplet states and the eigenvalues then follow. Our particular choice of basis for the triply
degenerate eigenvalue corresponds to states of total spin of 2h and Sz = h, 0, of course.

We thus see that the spin-spin interaction of the electron and proton is such that (to
first order in the perturbation) the triplet spin states are increased in energy by an
amount
gh4
,
Etriplet =
3mp me a3
and the singlet spin state has its energy decreased:
Esinglet =
gh4
.
mp me a3
Taking account of the hyperfine interaction, we see that the singlet state is the correct
(zeroth order approximation to the) ground state. The difference in energy between the
singlet and triplet states is given by
Etriplet Esinglet = 5.9 106 eV.
If we consider transitions between these states associated with emission or absorption of
a photon this energy difference corresponds to a photon wavelength of about 21 cm. This
leads to an explanation for the famous 21 centimeter spectral line that is observed in the
microwave spectrum by radio telescopes. It is attributed to vast amounts of interstellar
hydrogen undergoing transitions from the triplet to the singlet state.
Time-dependent perturbation theory
Time-dependent perturbation theory (TDPT) is an extremely important approximation technique for extracting dynamical information from a quantum system when the
2
Schrodinger equation cannot be solved explicitly.* TDPT can be viewed as a technique

for iteratively approximating solutions to differential equations. Many key physics results
appear via TDPT. It leads, for example, to the fundamental picture of quantum dynamics
as a sequence of transitions between (formerly) stationary states, e.g., , for atoms interacting with electromagnetic radiation. It leads to Fermis Golden Rule, it leads to the idea
of forbidden transitions, and it yields an amusing form of the time energy uncertainty
principle. And theres more.
The basic idea of TDPT is quite simple, we suppose that the Hamiltonian for a given
quantum system can be decomposed into two parts,
H = H0 + V,
where H0 describes physics that is well-understood and V represents the interactions that
we are trying to understand. So, for example, H0 could be the Hamiltonian for an electron
in the hydrogen atom, and V could represent the interaction with an incident electromagnetic wave an example I hope to get to. The key assumption is that the effect of V on
the dynamics is suitably small (compared to H0 ) so that the dynamics generated by H
can be expressed in terms of some small modifications due to the perturbation V to the
dynamics generated by H0 . The technique of TDPT is valid whether or not H, H0 and/or
V depend explicitly on time. For simplicity we will restrict attention to situations where
H0 can be chosen to be time independent; V may be time dependent.
The basic scheme is the following (see your text for an alternative description). In the
Schrodinger picture the state vector at time t satisfies
ih
d
|, ti = (H0 + V )|, ti.
dt
Our goal is to find (an approximation scheme for) the state vector at time t given the
initial state. We expand |, ti in the basis of eigenvectors of H0 :
X
i
|, ti =
cn (t)e h En t |ni,
n
where
H0 |ni = En |ni,
and we assume the spectrum is discrete only for simplicity in our general development.
Note that we have inserted a convenient phase factor into the definition of the expansion
coefficients cn . This phase factor is such that (1) the cn (0) are the expansion coefficients
at t = 0, and (2) if V = 0 (i.e., we turn off or neglect the effect of the perturbation) then
* The usual comments apply about non-trivial physical systems and the lack of explicit
solubility.
3
the cn are constant in time (exercise). Thus, the time dependence of the cn (t) is solely due
to the perturbation.
The Schrodinger equation can be viewed as a system of ODEs for the cn (t). To see
this, substitute the expansion of |, ti into the Schrodinger equation and take components
in the basis |ni. We get (exercise)
ih
X i
d
cn (t) =
e h (En Em )t Vnm (t)cm (t),
dt
m
Vnm = hn|V (t)|mi.
Up until now everything we have done has involved no approximations. The system of
ODEs displayed above is equivalent to the Schrodinger equation.
Now we suppose that the effect of the matrix elements Vnm is suitably small and we
make a sequence of approximations to the right-hand side of the differential equations.
First, if we ignore Vnm altogether then we can approximate the differential equation as
dcn
0.
dt
This is the zeroth order approximation in that the right hand side of the differential
equation is correct to zeroth order in the perturbation (i.e., accurate up to terms of zeroth
order in the matrix elements of the perturbation) and, by the same token, the expansion
coefficients are correct to zeroth order. Thus we find that the cn (t) are constant in time
in this lowest order approximation. You will recall that we defined the phases of the cn so
this would happen. Of course, physically this means that the probability distribution for
H0 is time-independent. We call this the zeroth-order approximation and we denote the
(0)
expansion coefficients in this approximation as cn .
We can now get a better approximation to the right hand side of the equation by
substituting the zeroth order solution there. This gives an equation which approximates
the Schrodinger equation accurately to first order in the potential. As we shall see, it
is easy to solve this approximate form of the equation by simply integrating both sides.
We then have an approximate solution which involves some time dependence because of
the potential. The potential appears linearly in this first order approximation. We can
substitute the first-order approximation into the right hand side of the equation to get
a better approximation to the equation. Solving this equation gives the second order
approximation; the solution is quadratic in V . The idea is that if V is sufficiently small
one can continue this iterative process to get better and better approximations to the
solution of the Schrodinger equation. We will content ourselves with studying the firstorder approximation, which has a number of important physical features and applications.
Lecture 34
Time-dependent perturbation theory (cont.)
We are constructing an approximation scheme for solving
X i
d
ih cn (t) =
e h (En Em )t Vnm (t)cm (t), Vnm = hn|V (t)|mi.
dt
m
For simplicity we shall suppose (as is often the case) that the initial state is an eigenstate of H0 . Setting |(0)i = |ii, i.e., taking the initial state to be one of the unperturbed
energy eigenvectors, we get as our zeroth-order approximation:
(0)
cn (t) cn = ni .
(1)
We get our first-order approximation cn (t) by improving our approximation of the

right-hand side of the differential equations to be accurate to first order. This we do by
(0)
using the zeroth order approximation of the solution, cn in the right hand side of the
equation. So, we now approximate the differential equation as
ih
i
d
cn (t) e h (En Ei )t Vni (t),
dt
with solution
cn (t)
(0)
cn
(1)
+ cn (t)
Z
1 t 0 i (En Ei )t0
= ni +
dt e h
Vni (t0 ).
ih 0
Successive approximations are obtained by iterating this procedure. We shall only deal with
the first non-trivial approximation, which defines first-order time-dependent perturbation
theory.
We assumed that the system started off in the (formerly stationary) state |ii defined
by H0 . Of course, generally the perturbation will be such that |ii is not a stationary state
for H, so that at times t > 0 the state vector will change. We can still ask what is the
probability for finding the system in an eigenstate |ni of H0 . Assuming that n 6= i this is
Z
1 t 0 i (En Ei )t0
P (i n, i 6= n) = |
dt e h
Vni (t0 )|2 .
h 0
This is the transition probability to first-order in TDPT. The probability for no transition
is, in this approximation,
X
P (i i) = 1
P (i n, i 6= n).
n6=i
TDPT is valid so long as the transition probabilities are all much less than unity. Otherwise, our approximation begins to fail.
The transition probability for i n is only non-zero if there is a non-zero matrix
element Vni connecting the initial and final states. Otherwise, we say that the transition
is forbidden. Of course, it is only forbidden in the leading order approximation. If a
particular transition i n is forbidden in the sense just described, then physically this
means that the transition may occur with very small probability compared to other, nonforbidden transitions. To calculate this very small probability one would have to go to
higher orders in TDPT.
Example: time-independent perturbation
As our first application of TDPT we consider what happens when the perturbation V
does not in fact depend upon time. This means, of course, that the true Hamiltonian H0 +V
is time independent, too (since we assume H0 is time independent). If we can solve the
energy eigenvalue problem for H0 +V we can immediately write down the exact solution to
the Schrodinger equation and we dont need to bother with approximation methods. But
we are assuming, of course, that the problem is not exactly soluble. Now, in the current
example of a time-independent perturbation, we could use the approximate eigenvalues
and eigenvectors from time independent perturbation theory to get approximate solutions
to the Schrodinger equation. This turns out to be equivalent to the results we will obtain
below using time dependent perturbation theory. Its actually quite a nice exercise to prove
this, but we wont bother.
Assuming V is time independent, the integral appearing in the transition probability
can be easily computed and we get (exercise)

4|Vni |2
(En Ei )t
2
P (i n, i 6= n) =
sin
.
2
h
(En Ei )2
Assuming this matrix element Vni does not vanish, we see that the transition probability
oscillates in time. The amplitude of this oscillation is small provided the magnitude of
the transition matrix element of the perturbation is small compared to the unperturbed
energy difference between the initial and final states. This number had better be small or
the perturbative approximation is not valid.
Let us consider the energy dependence of the transition probability. The amplitude
of the temporally oscillating transition probability decreases quickly as the energy of the
final state differs from the energy of the initial state. The dominant transition probabilities
occur when En = Ei , i 6= n. For this to happen, of course, the states |ni and |ii must
correspond to a degenerate or nearly degenerate eigenvalue. When En Ei we have
1
P (i n, i 6= n, En = Ei ) = 2 |Vni |2 t2 .
h
2
Because of the growth in time, our approximation breaks down eventually in this case.
Still, you can see that the transition probability is peaked about those states such that
En Ei , with the peak having height proportional to t2 and width proportional to 1/t.
Thus as the time interval becomes large enough the principal probability is for transitions
of the energy conserving type, En Ei . For shorter times the probabilities for energy
non-conserving transitions are less negligible. Indeed, the probability for a transition with
an energy difference E at time t is appreciable once the elapsed time satisfies
t
h
,
E
as you can see by inspecting the transition probability. This is just a version of the timeenergy uncertainty principle expressed in terms of the unperturbed stationary states. One
sometimes characterizes the situation by saying that one can violate energy conservation
by an amount E provided you do it over a time interval t such that tE h.
All this talk of energy non-conservation is of course purely figurative. The true
energy, represented by H0 + V is conserved, as always in quantum mechanics when the
Hamiltonian is time independent. It is only because we are considering the dynamics in
terms of the unperturbed system, using the unperturbed energy defined by H0 , that we
can speak of energy non-conservation from the unperturbed point of view. You can see,
however, why slogans like you can violate conservation of energy if you do it over a short
enough time might arise and what they really mean. You can also see that such slogans
can be very misleading.
Lecture 35
Fermis Golden Rule
First order perturbation theory gives the following expression for the transition probability:

(En Ei )t
4|Vni |2
2
P (i n, i 6= n) =
sin
.
2
h
(En Ei )2
We have seen that the energy conserving transitions (if there are any available) become
dominant after a sufficiently large time interval. Indeed, the probabilties for energy nonconserving transitions are bounded in time, while the probability for energy conserving
transitions grow quadratically with time (for as long as the approximation is valid). Here
large time means that the elapsed time is much larger than the period of oscillation of
the transition probability for energy-non-conserving transitions
T :=
2h
.
|En Ei |
Note that the typical energy scale for atomic structure is on the order of electron-volts.
This translates into a typical time scale T 1015 s, so large times is often a very good
approximation.
In the foregoing we have been tacitly assuming the final state is an energy eigenstate
coming from the discrete part of the energy spectrum. If the final state energies lie in a
continuum (at least approximately) we get a qualitatively similar picture, but the details
change. We shall see that the transition probability at large times still favors energy
conserving transitions, but it will only grow linearly with time because the width of the
probability distribution about such transitions is becoming narrower with time. We can see
this by supposing the probability is a continuous function of E = En Ei (see discussion
below) and considering the large t limit via the identities
sin2 (x)
lim
= (x),
x2
(ax) =
1
(x)
|a|
Then we have (exercise)

P (i n, i 6= n)
2
|Vni |2 (E)t,
h
t >> T.
This implies that in this case (quasi-continuum of final states) the transition rate dP
dt is
constant at late times:
d
2
P (i n, i 6= n)
|Vni |2 (E), t >> T.
dt
h
1
a result which is one version of Fermis Golden Rule.

An example of a continuum of final states appears in the Auger ionization of a Helium
atom. Here two electrons in the 2S (unperturbed) stationary state make a transition
(thanks to their electrostatic repulsion) to a state in which one electron is in the 1S state
and the other is ejected. Another example would be elastic scattering of an electron off of
an ion. In such situations there is a continuum (or near continuum) of (nearly) degenerate
states that can serve as the state denoted by |ni. In this case one will have a density of
states factor (E). The quantity (E)dE is the number of states with energies between
E and E + dE. The transition rate from the initial state |ii into states with energy E E
is then
Z
d
2
2
P (i E) =
dEn (En ) |Vni |2 (Ei , En ) =
(Ei )|Vni |2 .
dt
h
E
This is probably the most commonly used version of Fermis Golden Rule. We will consider
an example of this a bit later.
Harmonic Perturbations
We now consider perturbations which depend upon time. Usually we can Fourier analyze the time dependence and view the perturbation as a superposition of many perturbations that each depend harmonically on the time. Therefore, the next thing to consider are
such harmonically varying perturbations. As a good example, which we shall consider later,
consider the effect of an electromagnetic wave incident upon an atomic electron. If the
wave is nearly monochromatic then one can model the time dependence of the associated
perturbation as harmonic.
Suppose, then, that the perturbation is of the form
V (t) = Veit + V eit ,
where V is some time-independent operator and is a given frequency. (Note that there
are now two time scales in the problem: that associated with the perturbation, and that
naturally defined by the difference in unperturbed energies.) We again suppose that at the
initial time t = 0 the system is in an eigenstate |ii of H0 . We ask what is the probability
for finding the system in the state |ni at time t (n 6= i). Using our previous results we
have that (exercise)
2
Z t

1
0
0
En Ei
0
it
it
i
t
dt (Vni e
+ Vni e
)e ni , ni :=
.
P (i n; i 6= n) =
h 0
h
The integral is the sum of two terms, where each is of the form encountered in the timeindependent perturbation case, except that now the frequency in the exponents is shifted
via
ni ni .
2
This allows us to use our previous analysis for the late time transition rates, provided the
frequencies are modified as shown above in order to handle the current situation.
In detail, after taking the absolute-square in the above formula for the probability, we
have 3 terms: two direct terms and a cross term. Each has a vanishing denominator
on resonance, that is, when the initial and final states are such that ni = 0. As
before, the limit as the denominator vanishes for each of these terms is finite and grows
with time. Thus, as before, the principal transitions at late times are such that one of
these frequency combinations vanishes. For a given initial and final state only one of these
two options () can occur. The direct terms have the square of the growing probability,
while the cross term only involves this quantity linearly. The cross term can therefore be
ignored compared to the direct terms.
In summary, at large enough times, the only transitions with appreciable probability
will be those for which either
ni + = 0
or
ni = 0.
Of course, only one of these two situations can occur for a given choice of frequency and
for a given initial and final state. In the first case we have
ni + = 0 En = Ei h,
and in the second case we have
ni = 0 En = Ei + h.
So, using the unperturbed energy observable to interpret these results, we have the following situation. When En = Ei h we speak of stimulated emission, since the perturbation
has (from the unperturbed point of view) caused the system to emit a quantum of energy
h. When En = Ei + h we speak of stimulated absorption, since the perturbation has
(from the unperturbed point of view) caused the system to absorb a quantum of energy
h.
To get a formula for the late time transition rates we simply have to make the appropriate shift
ni ni
in our previous formulas. In the case of a (quasi-)continuum of final states we get
i
d
2 h
2
2
|Vni | (En Ei + h) + |Vni | (En Ei h) .
P (i n, i 6= n) =
dt
h
3
This is yet another version of Fermis Golden Rule. Note that, since
V = V (V ) = |V |2
|Vni |2 = Vni
ni
in in
in
the transition rate (per unit energy) for i n is the same as that for n i, a phenomenon
often called detailed balancing.
Lecture 36
What happens when you shine light on an atom?

You will have noticed that up to this point in our discussion of time-dependent perturbation theory I have assiduously avoided much in the way of concrete, specific applications
of our results. This was partially because you will do the relatively easy applications in
your homework, while the more involved applications would distract too much from the
general features of the perturbative approach. Also, I wanted to use the perturbative point
of view not just to solve this or that example situation, but rather to give a general point
of view on quantum dynamics that is available whenever the perturbative approximation
is valid. Now, though, we should have a look at a good example where this technology is
useful.
We shall study the perturbative dynamics of an atomic electron when exposed to a
(weak) plane electromagnetic wave. This is a simple model which can be used to answer
(at least in part) the question posed in the title of this section. The story is a little long,
but I think it is useful and instructive.
We are principally interested in the case where the electron is, in the absence of the
perturbation, in a bound stationary state of some potential, with the electromagnetic
field serving to stimulate transitions between the bound or ionized states. Our strategy
is as follows. We model the atom as a spinless particle bound by some potential V0 . Of
course, more sophisticated models of the atom are available. We could specialize V0 to be
a central potential, or even the Coulomb potential, but we wont have to make any such
choices for a while. Thus, the unperturbed Hamiltonian is of the form
H0 =
P2
~
+ V0 (R).
2m
The unperturbed stationary states are just the energy eigenstates of H0 these are the
energy levels of the atom.
Now we want to introduce the electromagnetic radiation. The total electromagnetic
field that the particle interacts with will then be the superposition of the field represented
by V0 (e.g., an electrostatic field) with the field of the incident radiation. Let us describe
~ r, t)). It is a standard result of electhe radiation using the vector potentials ((~r, t), A(~
tromagnetic theory that, in regions of space free of electromagnetic sources, it is always
possible to choose the electromagnetic potentials so that they are in the radiation gauge:
= 0,
~ = 0.
A
1
In this gauge the electric and magnetic fields are determined by the potentials according
to
~
~ = A.
~
~ = 1 A , B
E
c t
We wish to use this description to incorporate the electromagnetic perturbation only.
We could, if desired, use this description also to include the V0 contribution to the total
electromagnetic field. In this case, though, one is not easily able to make the split between
the unperturbed system and the perturbation.
Note that we are using a physical model in which the EM radiation is specified once and
for all, ignoring all electromagnetic sources in our system. Of course, the atomic electron
itself also serves as a source of an EM field. Thus we are in an approximation where this
effect can be ignored. Nonetheless, at the end of the day we will interpret the dynamics as
energy level transitions accompanied by absorption or emission of radiation by the electron!
This is a typical physical analysis in which we make suitable approximations but use
our knowledge of physical behavior to interpret the results, even if our interpretation
goes beyond the validity of the mathematical model. In physics, this kind of reasoning is
something akin to an art form.
Next, I remind you of the form of the Hamiltonian for a particle of charge q and mass
m moving in a given electromagnetic field. We have
H=
1 ~ q~ 2
(P A) + q,
2m
c
~ are operators, possibly depending upon time, obtained by taking the scalar
where (, A)
~ r, t)) and viewing them as operators via
and vector potentials ((~r, t), A(~
~ t),
= (R,
~ = A(
~ R,
~ t).
A
The radiation field has = 0. If you like, you can think of V0 as being obtained from the
~
q contribution to H. In any case, the perturbation comes from the terms involving A.
We are, of course, assuming that the components of the vector potential are suitably small
so that their contribution to the matrix elements of H are small compared to the matrix
elements of H0 . Since the vector potential is to be small, in a first approximation we can
~ in the above Hamiltonian. We are left with the following
neglect the term quadratic in A
(approximate) Hamiltonian:
H = H0 + V,
where
q ~ ~
A P.
mc
We will use this Hamiltonian, in the context of perturbation theory, to investigate the
behavior of an atomic electron in the presence of an electromagnetic wave.
V =
The radiation field

We have seen that we can describe the radiation field by a vector potential. The
Maxwell equations determine the vector potential, in the radiation gauge, according to
2~
~ 1 A = 0,
2 A
c2 t2
~ = 0.
A
It is easy to write down the general solution to this equation. We will specialize to a
solution representing a plane electromagnetic wave traveling in the direction of the unit
vector n
:
Z
1
~
A(~r, t) = e
d A()ei(t c n ~r) ,
where A() = A () determines the frequency composition of the wave. Because the
~ = 0, we have that
vector potential is transverse, A
e n
= 0.
Evidently, the vector e determines the plane polarization.
A general radiation field can be viewed as a superposition of plane waves of the type
shown above. The superposition is over the propagation directions and polarizations. If
we have time, later we will consider this case. For now we will stick with plane waves of a
fixed polarization and propagation direction.
Transitions
Let us now compute the probability P (i f ) for transition from an initial state,
assumed to be an unperturbed stationary state at time t0 , to a final (unperturbed stationary) state at time t. To keep things a little more concrete, we can suppose that the
atomic electron is bound by a central force so that the unperturbed stationary states can
be described by a principal quantum number n labeling the energies along with the
usual orbital angular momentum quantum numbers l and m. The initial and final states
are |ni , li , mi i at t = 0 |nf , lf , mf i at time t. The final state is labeled as if it is a bound
state, but it will be straightforward to adjust our results to apply to unbound final states
whose energy lies in a continuum. Also, you will easily see that most of what we do is
insensitive to the details of the particular model we use for the atomic electron. We have
Z t
2
Z
1

0
0
0
0
i
t
it
it
f
i

]
P (i f )
dt
d e
[Vf i ()e
+ Vif ()e
h t0
Here
Vf i () =
1
q
~
A()hnf , lf , mf |ei c n R e P~ |ni , li , mi i.
mc
The frequency f i depends upon our model of the atom, i.e., our choice for V0 . If we
choose the Coulomb potential (so that we have a hydrogenic atom), then
f i =
1
1
Z 2q2 1
(Ef Ei ) =
( 2 2 ),
h
2a ni
nf
with Z the nuclear charge number, a the Bohr radius, and n = 1, 2, . . . is the principal
quantum number labeling the energy levels. (But, again, this particular formula for f i
formula is contingent upon the simplest model of a hydrogenic atom.)
Let us suppose that the electromagnetic radiation is a pulse of some finite duration
such that t0 is before the pulse arrives and t is after the pulse has passed. We can then
easily compute the transition probability by letting the range of time integration run from
to . In this limit the time integral yields a delta function in frequency space. This
lets us perform the frequency integral. We get
P (i f ) |
1
2q
~
A(|f i |)hnf , lf , mf |ei|f i | c n R e P~ |ni , li , mi i|2 .
hmc
This formula applies to either stimulated emission and absorption.

Note that the only Fourier component of the radiation field that contributes to the
transition is at the frequency f i of the transition. If the wave has no frequency component
at f i then the transition probability vanishes (in this approximation). This jibes with the
picture of transitions being accompanied by emission or absorption of discrete quanta
of size hf i . Indeed, let us express the transition probability in terms of the net energy
delivered by the pulse. It is not hard to see that the energy contained in the wave per unit
area orthogonal to the wave propagation (the Poynting flux) is given by
Z
Z
c ~
1
~
E=
dt (E B) ~n =
d 2 |A()|2 .
4
c
0
Let us define N () as the energy carried by the wave (for all time) through unit area per
unit frequency:
2
N () =
|A()|2 .
c
In terms of N () we have
1
4 2
~
P (i f ) 2 2 |hnf , lf , mf |ei|f i | c n R e P~ |ni , li , mi i|2 N (f i ),
m hf i
where
=
q2
1
hc
137
is the fine structure constant.

4
This probability comes as the product of 3 factors: (1) the intensity of the electromagnetic radiation (embodied in N ()), which reflects the adjustable role of the outside
influence that stimulates the transitions; (2) the fine structure constant, which characterizes the strength (fixed by nature) of the electromagnetic interaction; (3) the matrix
element between the initial and final states, which reflects the role played by the atomic
structure itself. We next spend some time analyzing this matrix element.
Lecture 37
Electric dipole transitions
Our transition probability (to first order in perturbation theory) is
P (i f )
4 2
~
i|f i | 1c n
X
N
(
)|hn
,
l
,
m
|e
e P~ |ni , li , mi i|2 ,
f
i
f
f
f
2
2
m hf i
where
=
1
q2
hc
137
is the fine structure constant and

2
|A()|2
c
is the energy per unit area per unit frequency carried by the EM pulse characterized by
the vector potential in the radiation gauge with frequency components A().
N () =
We now want to analyze the matrix element which appears. This factor reflects the
atomic structure and characterizes the response of the atom to the electromagnetic wave.
Let us begin by noting that the wavelength of the radiation absorbed/emitted is on
the order of 2c/f i 106 m, while the atomic size is on the order of the Bohr radius
108 m. Thus one can try to expand the exponential in the matrix element:
1
1
~
~ + . . .)
hnf , lf , mf |ei|f i | c n X e P~ |ni , li , mi i = hnf , lf , mf |(1 ip|f i | n
X
e P~ |ni , li , mi i.
c
The first term in this expansion, if non-zero, will be the dominant contribution to the
matrix element. Thus we can approximate
1
hnf , lf , mf |ei|f i | c n X e P~ |ni , li , mi i hnf , lf , mf |

e P~ |ni , li , mi i,
which is known as the electric dipole approximation. Transitions for which this matrix element is non-zero have the dominant probability; they are called electric dipole transitions.
We shall see why in a moment. Transitions for which the dipole matrix element vanishes
are often called forbidden transitions. This does not mean that they cannot occur, but
only that the probability is much smaller than that of transitions of the electric dipole
type, so they do not arise at the level of the approximation we are using.
If we restrict attention to the electric dipole approximation, the transition probability
is controlled by the matrix element hnf , lf , mf |P~ |ni , li , mi i. To compute it, we use the fact
that
~
~ H0 ] = ih P ,
[X,
m
1
and that
H0 |n, l, mi = En |n, l, mi.
We get
m
~ 0 H0 X|n
~ i , li , mi i
hnf , lf , mf |P~ |ni , li , mi i = hnf , lf , mf |XH
ih
~ i , li , mi i.
= imf i hnf , lf , mf |X|n
Now perhaps you can see why this is called a dipole transition: the transition only occurs
according to whether or not the matrix elements of the (component along e of the) dipole
~ are non-vanishing.
moment operator, q X,
Selection rules for Electric Dipole Transitions
We have seen that the dominant transitions are of the electric dipole type. We now
consider some details of the dipole matrix elements
~ i , li , mi i|.
hnf , lf , mf |q
e X|n
In particular, we derive necessary conditions on l and m such that the dipole matrix element
is non-zero and hence electric dipole transitions can occur. These conditions are usually
called selection rules for the (first-order) electric dipole transitions. Transitions which do
not obey these selection rules are usually called forbidden transitions. Of course they are
only forbidden insofar as our approximations are valid. The forbidden transitions may very
well occur, but they will be far less likely than the (first-order) electric dipole transitions
being considered here.
The selection rules we shall derive are determined solely by the angular momentum
properties of the unperturbed stationary states. Thus, the selection rules rely upon the
fact that the stationary states can be chosen to be orbital angular momentum eigenstates,
~ (see
which requires that the atomic potential V0 be a central potential: V0 = V0 (|X|)
below). On the other hand, the selection rules do not depend upon any further properties
of this potential.
Digression: Rotational Symmetry
Here we briefly explain how angular momentum conservation is tied to rotational symmetry. Recall the unitary rotation operator:
i
U (
n, ) = e h nJ .
For a spinless particle (which is how we are modeling the electron) we have
~ =X
~ P~ .
J~ = L
2
On position/momentum eigenvectors |~xi, |~

pi we have
~
e h nJ |~xi = |R(
n, )~xi,
e h nJ |~
pi = |R(
n, )~
pi
where R(
n, ) is the 3-d orthogonal transformation rotating about the axis n
by the angle
. From this it follows that (exercise)
i
~ ~ i
~
~
h
nL = R(
e h nL Xe
n, )X,
From this it follows that

i
i
i
~
~
e h nL P~ e h nL = R(
n, )P~ .
e h nL P 2 e h nL = P 2 ,
and
i
~
~
~ hi nL
~
= V (R(
n, )X).
e h nL V (X)e
~ or by verifying
(The last relation can be seen using the spectral decomposition of V (X),
this relation in the position basis.)
If the potential is rotationally invariant, i.e., is spherically symmetric, i.e., depends
only upon the distance from the center of rotation, i.e., describes a central force, then
i
e h nL V (|X|)e h nL = V (|X|).
The Hamiltonian is then rotationally invariant:
i
e h nL (
i
P2
P2
~
+ V (|X|)e h nL =
+ V (|X|).
2m
2m
By considering an infinitesimal transformation it easily follows that

~ = 0.
[H, L]
This is just infinitesimal rotation invariance. But it also means that angular momentum is
conserved its probability distribution is unchanged in time. This is because stationary
states can be chosen to be angular momentum eigenstates. Thus we see how a symmetry
corresponds to a conservation law. For us, however, the key thing here is that for a central
potential the energy eigenvectors can be chosen to be also angular momentum eigenvectors.
Selection rules involving m
We now consider restrictions on mi and mf needed so that the three components of
~ have non-vanishing matrix elements. Our main identity arises because X
~ is a vector
X
operator; we have that
[Ri , Lj ] = ihijk Rk .
3
In particular,
[X, Lz ] = ihY,
[Y, Lz ] = ihX,
[Z, Lz ] = 0.
These formulas simply give the infinitesimal change of the position vector under rotations
(exercise) and are the infinitesimal versions of the formulas given above. You can easily
check them explicitly.
From these identities we have
0 = hnf , lf , mf |[Z, Lz ]|ni , li , mi i = (mi mf )
hhnf , lf , mf |Z|ni , li , mi i,
so that the z matrix element vanishes unless mi = mf . Next we have
(mi mf )
hhnf , lf , mf |X|ni , li , mi i = hnf , lf , mf |[X, Lz ]|ni , li , mi i = ihhnf , lf , mf |Y |ni , li , mi i
and
(mi mf )
hhnf , lf , mf |Y |ni , li , mi i = hnf , lf , mf |[Y, Lz ]|ni , li , mi i = ihhnf , lf , mf |X|ni , li , mi i
from which it follows that either
(mi mf )2 = 1,
or
hnf , lf , mf |Y |ni , li , mi i = hnf , lf , mf |X|ni , li , mi i = 0.
Thus, for the X and Y matrix elements to be non-vanishing we must have
mf = mi 1.
In short, no electric dipole transitions occur unless
m = 0, 1.
If m = 0, then only radiation with polarization with a component along z will stimulate
a transition (in this approximation). If m = 1, then polarization in the x y plane will
stimulate transitions. Likewise, these are the polarizations that feature in the respective
emission processes, that is, the emitted radiation will have this polarization structure.
Selection rules involving l
We have seen how the electric dipole transitions require m = 0, 1. We can get selection rules involving l by playing a similar game as above, but now using the commutator
~ = 2
~ + RL
~ 2 ).
[L2 , [L2 , X]]
h2 (L2 X
4
Take the initial-final matrix element of both sides of this equation and use the fact that
vectors defining the matrix element are L2 eigenvectors. You will find that this identity
implies (exercise)
~ i , li , mi i = 0.
2[lf (lf + 1) + li (li + 1)] [lf (lf + 1) li (li + 1)]2 hnf , lf , mf |X|n
Therefore, if
~ i , li , mi i =
hnf , lf , mf |X|n
6 0,
then
[lf (lf + 1) + li (li + 1)] [lf (lf + 1) li (li + 1)]2 = 0.
This condition can be factored into the form (exercise)
[(lf + li + 1)2 1][(lf li )2 1] = 0.
Keeping in mind that l is non-negative, you can see that the first term vanishes if and only
if li = lf = 0. The second term vanishes if and only if lf li = 1. We conclude that the
transition is forbidden unless
l = 1
or
lf = li = 0,
In fact, the second case is excluded: an explicit calculation easily shows that the dipole
matrix element actually vanishes if the initial and final states are zero angular momentum
states. To see this, simply recall that each of (X, Y, Z) is a linear combination of l = 1
spherical harmonics. If lf = li = 0, then the angular integrals in the inner products vanish
by orthogonality of l = 1 spherical harmonics with l = 0 spherical harmonics (exercise).
To summarize, the electric dipole selection rules are
l = 1,
m = 0, 1.
These conditions are necessary for a transition to occur, given our approximations. These
selection rules are compatible with an interpretation in terms of emission and absorption
of photons by the atom. Using this point of view, the photon will have frequency |f i |
(interpretable as conservation of energy) and the photon carries angular momentum
2
h, or as one says, the photon must have spin-1 (by conservation of angular momentum). Indeed, if the photon carries spin-1, then our previous discussion of addition of
angular momentum coupled with an assumption of conservation of angular momentum for
the atom-photon system imply that the angular momentum of the atom after the transition must be l, l 1 (exercise). The first possibility doesnt occur in the electric dipole
approximation.
5
This is all true, but it is a mistake to think that this picture can be obtained from our
treatment of an atom in a radiation field. There are two reasons why our current description
is inadequate. First, we have not treated the electromagnetic field as dynamical we have
simply postulated its form. Because the electromagnetic field configuration is specified
once and for all, there is no way to describe emission and/or absorption of energy and
angular momentum from the electromagnetic field, which is not allowed to change. This
is reflected in the fact that the Hamiltonian we have been using does not give the total
energy of the combined system of electron and electromagnetic field, rather it just gives
~ =X
~ P~
the non-conserved energy of the electron. Similarly, the angular momentum L
that we are speaking of is not the total, conserved angular momentum of the charge and
electromagnetic field, but rather just the unconserved angular momentum of the charge
alone. To include the electromagnetic field in the bookkeeping of conservation laws we
must include the electromagnetic degrees of freedom into the system and include suitable
terms in the Hamiltonian to describe the dynamics of these degrees of freedom and their
coupling to the charge. This leads us to the second difficulty with our previous treatment.
Our model was semi-classical since the charge was given a quantum treatment but the
electromagnetic field was given a classical treatment. It does not appear possible to have a
consistent theory of charges and electromagnetic fields in which the former are described via
quantum mechanics and the latter treated via classical physics. This was realized early on
in the history of quantum mechanics. What is needed, then, is a method for incorporating
electromagnetic degrees of freedom into the system using the rules of quantum mechanics.
It was (I think) Dirac who first showed a way to quantize the electromagnetic field and
then consider a quantum dynamical system of charges and fields. Thus QED was born. A
healthy dividend was paid for this quantum description of electrodynamics: one could now
explain spontaneous emission, which is the phenomenon where an atom (or other quantum
system) in an exited bound state will spontaneously emit radiation and drop to a lower
energy state even in the absence of a perturbation.

Classical Mechanics

Uploaded by

Copyright:

Available Formats

Classical Mechanics

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Classical Mechanics

Uploaded by

Copyright:

Available Formats

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

give a mathematical representation for these 3 basic ingredients.

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

magnetic moment is non-zero if the magnetic field varies in space. We have

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 1

Physics 6210/Spring 2007/Lecture 2

Physics 6210/Spring 2007/Lecture 2

Physics 6210/Spring 2007/Lecture 2

The scalar product of two vectors,

Bras and Kets

Physics 6210/Spring 2007/Lecture 2

c(h|) = ch| = h|c,

Physics 6210/Spring 2007/Lecture 3

We can therefore write

If we choose one of the spin bases, say, |S n, i, and we represent components of

Physics 6210/Spring 2007/Lecture 3

is represented by the column vector

The bra h| corresponding to |i has components forming a row vector:

Physics 6210/Spring 2007/Lecture 3

is a linear operator. Thus the set of linear operators forms an algebra.

A very important example of this is the identity operator, defined by,

It has the decomposition (good exercise!)

Physics 6210/Spring 2007/Lecture 3

Physics 6210/Spring 2007/Lecture 3

Physics 6210/Spring 2007/Lecture 4

are eigenvectors of Sx and Sy , respectively, with eigenvalues

Physics 6210/Spring 2007/Lecture 4

hSz i = h+|Sz |+i = , hSx i = hSy i = 0.

Physics 6210/Spring 2007/Lecture 4

The third postulate of QM is the place where the mathematical representation of

* Such a function is called a characteristic function for the set x = h2 .

Physics 6210/Spring 2007/Lecture 4

State: |Sx , i P rob(Sx = h/2) = 1,

State: |Sy , i P rob(Sx = h/2) = 1/2,

then the probability of getting h

h/2 is given by |b|2 . Note that the normalization condition

Physics 6210/Spring 2007/Lecture 5

* For now we state it in a form that is applicable to finite-dimensional Hilbert spaces. We

Physics 6210/Spring 2007/Lecture 5

Physics 6210/Spring 2007/Lecture 5

Physics 6210/Spring 2007/Lecture 5

Compatible and Incompatible Observables

B|a, bi = b|a, bi,

ha, b|a0 , b0 i = aa0 bb0 .

equality iff |ai = (const.)|bi.

Physics 6210/Spring 2007/Lecture 5

Physics 6210/Spring 2007/Lecture 6

Incompatible observables. The uncertainty principle.

Physics 6210/Spring 2007/Lecture 6

Physics 6210/Spring 2007/Lecture 6

important to keep in mind. In certain cases (e.g., the position-momentum relation, to be

Physics 6210/Spring 2007/Lecture 7

I = T T = (I + P + O(2 ))(I P + O(2 )) = P = P .

hx |pi = hx|T |pi = e h p hx|pi.