PURIFICATION
Lemma 1.5.1. Let ρ be a positive operator which has a finite trace. Then there exists a positive
operator $\sqrt{\rho}$ such that $(\sqrt{\rho})^2 = \rho$. In particular, a density operator ρ admits such a positive square
root $\sqrt{\rho}$.
Proof. Density operators admit orthonormal bases of eigenvectors (due to the trace condition), and
so we let $\{\phi_j\}$ be an orthonormal basis of eigenvectors for ρ: $\rho\phi_j = p_j\phi_j$. Since the eigenvalues $p_j$ are
not negative, we can define $\sqrt{\rho}$ by setting
$$\sqrt{\rho}\,\phi_j = \sqrt{p_j}\,\phi_j.$$
Then
$$(\sqrt{\rho})^2\phi_j = (\sqrt{p_j})^2\phi_j = p_j\phi_j,$$
showing that $(\sqrt{\rho})^2$ and ρ have the same effect on a basis and so must be the same. Since $\sqrt{p_j}$ is positive
and so real, $\sqrt{\rho}$ is positive. □
We note that without the positivity condition it would be possible to set $\sqrt{\rho}\,\phi_j = \pm\sqrt{p_j}\,\phi_j$, with
arbitrary choices of sign for each j. In an n-dimensional space this would permit $2^n$ choices of square
root, but only one is a positive operator.
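The construction in the proof can be tried numerically. The following is a minimal sketch, assuming a real symmetric 2×2 density matrix (the function name `sqrt_density_2x2` and the example matrix are hypothetical, chosen only for illustration): it finds the eigenvalues and an orthonormal eigenbasis in closed form, takes the non-negative square root of each eigenvalue, and checks that the result squares back to ρ.

```python
import math

def sqrt_density_2x2(a, b, d):
    """Positive square root of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    # Eigenvalues p_j; clamp tiny negative rounding errors to zero.
    p = [max(mean + radius, 0.0), max(mean - radius, 0.0)]
    # Orthonormal eigenvectors phi_j, obtained from the rotation angle theta.
    theta = 0.5 * math.atan2(2 * b, a - d)
    phi = [(math.cos(theta), math.sin(theta)),
           (-math.sin(theta), math.cos(theta))]
    # sqrt(rho) = sum_j sqrt(p_j) |phi_j><phi_j|, choosing the + sign each time.
    root = [[0.0, 0.0], [0.0, 0.0]]
    for p_j, v in zip(p, phi):
        for r in range(2):
            for c in range(2):
                root[r][c] += math.sqrt(p_j) * v[r] * v[c]
    return root

def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

rho = [[0.75, 0.25], [0.25, 0.25]]   # hypothetical density matrix with trace 1
root = sqrt_density_2x2(0.75, 0.25, 0.25)
square = matmul(root, root)          # should reproduce rho up to rounding
```

Replacing `math.sqrt(p_j)` by `-math.sqrt(p_j)` for one of the eigenvalues would give another square root of ρ, but not a positive one.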
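We also note that if ρ were pure, that is a projection $P_\psi$ onto a vector ψ, then we could choose a
basis with $\phi_1 = \psi$ and the other vectors $\phi_j \perp \psi$. Then $\rho\phi_1 = \phi_1$ and $\rho\phi_j = 0$ for $j > 1$, so every eigenvalue is 0 or 1 and equals its own square root, giving $\sqrt{\rho} = \rho$.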
Theorem 1.5.2. Let ρ be a density operator on the space H. The Hilbert–Schmidt operators
Proof. We shall only sketch the proof of this. The first observation is that the linear operators on a
vector space themselves form a vector space. It is true, though not obvious, that the subset of A such
that $\mathrm{tr}[A^*A]$ is finite forms a subspace, and we shall simply quote that result. Given that, the linear
combinations $(A + i^nB)$ have finite $\mathrm{tr}[(A + i^nB)^*(A + i^nB)]$, so that we can define
$$\langle A|B\rangle_{\mathrm{tr}} = \frac{1}{4}\sum_{n=0}^{3} i^{-n}\,\mathrm{tr}[(A + i^nB)^*(A + i^nB)] = \mathrm{tr}[A^*B].$$
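The polarisation formula above can be checked on small matrices. This is a minimal sketch with two arbitrary (hypothetical) 2×2 complex matrices, using the fact that tr[A*B] is the sum of conj(a_ij) b_ij over all entries; the helper names are illustrative only.

```python
def hs_inner(a, b):
    """Trace inner product tr[A* B] for matrices stored as nested lists."""
    return sum(complex(x).conjugate() * y
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def polarisation(a, b):
    """(1/4) * sum over n = 0..3 of i^(-n) tr[(A + i^n B)* (A + i^n B)]."""
    total = 0
    for n in range(4):
        phase = 1j ** n
        combo = [[x + phase * y for x, y in zip(row_a, row_b)]
                 for row_a, row_b in zip(a, b)]
        total += hs_inner(combo, combo) / phase
    return total / 4

# Two arbitrary matrices, chosen only for illustration.
A = [[1 + 2j, 0.5], [-1j, 3]]
B = [[2, 1 - 1j], [0.5j, -1]]

lhs = polarisation(A, B)
rhs = hs_inner(A, B)
```

Here `lhs` and `rhs` agree up to rounding, which is the content of the second equality in the displayed formula.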
Definition 1.5.1. The normalised pure state $\sqrt{\rho}$ ∈ H(2) is called the purification of the mixed
state defined by the density operator ρ.
Note. If $\rho = P_\psi$ is already pure, then we have seen that $\sqrt{\rho} = \rho$ so that the purification does not change
the state.
This result is not really that surprising. We originally derived density operators and mixed states to
cope with situations where some of the information about a system is being ignored or is inaccessible.
For example, we started by showing that
totally ignoring the second system by looking only at observables A for the first, in statistical mechanics
one likewise ignores large quantities of information about the environment (the details of molecular
movements in the surrounding gas) to concentrate on the system of interest. It should therefore be
possible to regard our system with its mixed state as part of a larger system in a pure state; in effect,
we are showing that it is possible to reconstruct enough of the missing part of the system to get a pure
state.
In the case of the purification the observable A acted by multiplication on the left, $A : \sqrt{\rho} \mapsto A\sqrt{\rho}$, but
it would also be possible to multiply $\sqrt{\rho}$ on the right, sending it to $\sqrt{\rho}B$. This is not compatible with
our calculation linking the mixed and pure states, and represents the missing part of the system which
the mixed state has ignored (either the system on $H_2$, or the missing details about the environment).
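For reference, the link between the pure state $\sqrt{\rho}$ and the mixed state ρ can be written out in one line. This is a sketch under two assumptions: that the inner product is the trace form tr[A*B], and that the observable A acts by left multiplication. It uses only $(\sqrt{\rho})^* = \sqrt{\rho}$ and the cyclic property of the trace.

```latex
\begin{align*}
\langle \sqrt{\rho}\,|\,A\sqrt{\rho}\rangle_{\mathrm{tr}}
  &= \mathrm{tr}[(\sqrt{\rho})^{*} A \sqrt{\rho}]
   = \mathrm{tr}[\sqrt{\rho}\, A \sqrt{\rho}]
   = \mathrm{tr}[A \sqrt{\rho}\sqrt{\rho}]
   = \mathrm{tr}[A\rho], \\
\langle \sqrt{\rho}\,|\,\sqrt{\rho}\rangle_{\mathrm{tr}}
  &= \mathrm{tr}[\rho] = 1 .
\end{align*}
```

The second line confirms that $\sqrt{\rho}$ is normalised whenever ρ has trace 1.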
Originally density operators appeared in the description of the marginal distribution on one part of
a larger system:
$$\langle\Psi|(A \otimes 1)\Psi\rangle = \mathrm{tr}[A\rho_1].$$
One might ask whether the purification of ρ1 could help us reconstruct the vector Ψ. In fact the same
sort of ideas do enable us to get a very neat description of Ψ.
We have already seen that for any given orthonormal basis $\{\eta_\alpha\}$ of $H_2$, any vector Ψ in $H_1 \otimes H_2$ can
be written in the form
$$\Psi = \sum_j |\xi_j\rangle|\widetilde{\eta}_j\rangle,$$
We see that this can match the expression for $\mathrm{tr}[A\rho_1]$ if and only if
$$\sqrt{p_jp_k}\,\langle\widehat{\eta}_j|\widehat{\eta}_k\rangle = p_j\delta_{jk}.$$
The Schmidt decomposition enables us to generalise the notion of Bell states $\frac{1}{\sqrt{2}}(|v\rangle|v\rangle + |h\rangle|h\rangle)$, and
to see that they are in some ways typical of systems having two parts. The Schmidt decomposition
gives an elegant form for the entanglement of vectors describing states of a system formed from two
subsystems. (If there is only a single term in the Schmidt decomposition then Ψ is not entangled, but
otherwise it is entangled and the number of terms in the sum, the Schmidt rank, gives a measure of
how entangled it is.) Unfortunately there is no such expansion of general vectors when three or more
subsystems are combined, and good measures of entanglement are harder to find.
We note at this point that we could have interchanged the roles of H1 and H2 to get a similar expansion
in terms with eigenvectors ηj of ρ2 , and the same probabilities. So one particularly striking consequence
of this proof is the fact that the two partial traces ρ1 and ρ2 have the same non-zero eigenvalues. This is
all the more surprising when one recalls that H1 and H2 need not have the same dimension. Nonetheless
they must have the same non-zero eigenvalues. This means that if dim(H1 ) < dim(H2 ) then, because
ρ1 cannot have more than dim(H1 ) positive eigenvalues, neither can ρ2 . Overall we can deduce that the
rank of Ψ (the number of terms in its expansion) is the same as the ranks rk(ρ1 ) and rk(ρ2 ).
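The equality of the non-zero eigenvalues of the two partial traces can be illustrated numerically even when the dimensions differ. The sketch below assumes a hypothetical vector Ψ with dim(H1) = 2 and dim(H2) = 3, stored as a coefficient matrix c with Ψ = Σ c[a][b] e_a ⊗ f_b. Rather than computing eigenvalues directly, it compares the traces of the first three powers of each partial trace; for matrices of size at most 3 these power sums determine the non-zero eigenvalues.

```python
def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def dagger(x):
    """Adjoint (conjugate transpose)."""
    return [[complex(x[r][c]).conjugate() for r in range(len(x))]
            for c in range(len(x[0]))]

def trace_of_power(m, k):
    """tr[m^k] for a square matrix m and integer k >= 1."""
    power = m
    for _ in range(k - 1):
        power = matmul(power, m)
    return sum(power[i][i] for i in range(len(m)))

# Hypothetical coefficients of Psi, normalised so that <Psi|Psi> = 1.
raw = [[1, 1j, 0], [0.5, 0, -1]]
norm = sum(abs(z) ** 2 for row in raw for z in row) ** 0.5
c = [[z / norm for z in row] for row in raw]

rho1 = matmul(c, dagger(c))      # 2x2 partial trace over H2
rho2_t = matmul(dagger(c), c)    # 3x3 transpose of rho2 (same eigenvalues as rho2)

moments = [(trace_of_power(rho1, k), trace_of_power(rho2_t, k)) for k in (1, 2, 3)]
```

Each pair in `moments` agrees up to rounding, so the 3×3 operator has the same two non-zero eigenvalues as the 2×2 one, together with one extra zero eigenvalue.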
We conclude this part by noting that one does not need to start with a pure state Ψ ∈ H1 ⊗ H2 in
order to get density operators ρ1 and ρ2 on the two subsystems H1 and H2 .
We shall now consider the behaviour of density operators under (Schrödinger) time evolution, and as
a result of the von Neumann–Lüders projection during a measurement.
Theorem 1.7.1. The Schrödinger time evolution of a density operator satisfies the evolution
equation
$$i\hbar\frac{d\rho}{dt} = [H, \rho].$$
Proof. In the Heisenberg picture an observable A evolves according to
$$i\hbar\frac{dA}{dt} = [A, H],$$
so that
$$i\hbar\frac{d}{dt}\mathrm{tr}[A\rho] = \mathrm{tr}[[A, H]\rho] = \mathrm{tr}[AH\rho] - \mathrm{tr}[HA\rho] = \mathrm{tr}[AH\rho] - \mathrm{tr}[A\rho H] = \mathrm{tr}[A[H, \rho]].$$
On the other hand the Schrödinger and Heisenberg equations give the same expectations and in the
Schrödinger picture we have
$$i\hbar\frac{d}{dt}\mathrm{tr}[A\rho] = i\hbar\,\mathrm{tr}\Big[A\frac{d\rho}{dt}\Big].$$
Since this should be true for all A we must have
$$i\hbar\frac{d\rho}{dt} = [H, \rho].$$
(It is important to note that this has the opposite sign from the Heisenberg evolution of observables.) □
One consequence is that if ρ commutes with the Hamiltonian H then dρ/dt vanishes and ρ is constant.
This can be achieved, for example, by taking ρ to be a function of H, such as exp(−βH).
This can be illustrated with an example which arose in statistical mechanics. When Maxwell first
developed the subject he suggested that in its equilibrium state the velocities of the molecules moving
freely in a gas follow a normal distribution. (This was essentially a consequence of the law of large numbers,
that the large numbers of collisions average out momentum transfer to give a normal distribution.)
Since the velocities are proportional to the momenta, this can be restated as saying that the probability
density is proportional to $\exp(-\beta|p|^2/2m)$ for some β which turns out to be inversely proportional to
the temperature T in Kelvin (that is degrees above absolute zero), β = 1/kT where k is Boltzmann’s
constant. Gibbs generalised this to suggest that when there are interactions the density is proportional
to $\exp(-\beta(P^2/2m + V))$.
Definition 1.7.1. For a quantum mechanical system with Hamiltonian H, suppose that exp(−βH)
has a finite trace. The Gibbs state density operator at inverse temperature β is
$$\rho_\beta = \exp(-\beta H)/\mathrm{tr}[\exp(-\beta H)].$$
We have already noted that the condition for a finite trace is quite demanding, and is not
satisfied by many Hamiltonians. For example, gas molecules in free space have Hamiltonian $P^2/2m$ and,
as $P^2$ does not have any (normalisable) eigenvectors, neither does exp(−βH), so that it cannot have a
finite trace. This is not surprising, because in free space the molecules just fly off and never come into
equilibrium. If there is some restraining potential as in a harmonic oscillator then $P^2/2m + \tfrac{1}{2}m\omega^2X^2$
has discrete eigenvalues $(n + \tfrac{1}{2})\hbar\omega$ for $n \in \{0, 1, 2, \ldots\}$, and one gets a Gibbs state.
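For the harmonic oscillator the trace of exp(−βH) is a geometric series, so the Gibbs state can be written down explicitly. The following minimal sketch (the function name and the parameter values β = 1, ħω = 1 are assumptions made for illustration) computes the normalising trace in closed form and the probabilities of the lowest levels, then lets one check that they sum to 1.

```python
import math

def oscillator_gibbs(beta, hbar_omega, levels):
    """Gibbs probabilities of the first `levels` oscillator states, and tr[exp(-beta H)]."""
    x = beta * hbar_omega
    # Geometric series: sum over n of exp(-x (n + 1/2)) = exp(-x/2) / (1 - exp(-x)).
    z = math.exp(-x / 2) / (1 - math.exp(-x))
    probs = [math.exp(-x * (n + 0.5)) / z for n in range(levels)]
    return probs, z

# Illustrative units in which beta = 1 and hbar * omega = 1.
probs, z = oscillator_gibbs(beta=1.0, hbar_omega=1.0, levels=60)
total = sum(probs)   # the neglected tail beyond 60 levels is far below rounding error
```

The probability of level n works out to exp(−nβħω)(1 − exp(−βħω)), so the ground state dominates at low temperature (large β).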
When doing a measurement on a system described by $H_1$ we typically use some kind of measuring
instrument with its own space $H_2$, and states $\{|j\rangle \in H_2\}$ labelled by the possible outcomes. In order
to distinguish these states we choose these vectors to be orthonormal. If the measurement indicates
the result j then the projection postulate says that the system vector should be projected by the
corresponding projection operator $P_j$.
Suppose now that we start with the system described by a vector $\psi \in H_1$ and the measuring instrument
in some rest state Ω. In the course of the measurement the system and instrument interact, and
for a good measurement $|\psi\rangle|\Omega\rangle$ should evolve to a state $M|\psi\rangle|\Omega\rangle$ which is the sum of the projections
$P_j|\psi\rangle\langle j|\Omega\rangle|j\rangle$, as projection of $|\psi\rangle$ to $P_j|\psi\rangle$ and of $|\Omega\rangle$ into the measurement vector $|j\rangle$ happen together:
$$|\psi\rangle|\Omega\rangle \mapsto \sum_j P_j|\psi\rangle\langle j|\Omega\rangle|j\rangle.$$
In other words, the measurement should send the system and measuring instrument into an entangled
state. Writing $M_j = \langle j|\Omega\rangle P_j$, ideally the $M_j\psi$ would be eigenvectors of the measured observable, corresponding to eigenvalues labelled
by the measuring instrument state $|j\rangle$. (If we arrange that $\langle j|\Omega\rangle$ is the same for all j, then the operators
$M_j$ are all the same multiple of the projections $P_j$, and, since multiples define the same physical state,
we could just take $M_j = P_j$.)
We therefore have
$$|\psi\rangle|\Omega\rangle \mapsto \sum_j M_j|\psi\rangle \otimes |j\rangle.$$
A subsequent measurement of an observable A on the system will have an expectation given by this state.
Now
$$(A \otimes 1)\sum_j M_j|\psi\rangle \otimes |j\rangle = \sum_j AM_j|\psi\rangle \otimes |j\rangle,$$
so the expectation is
$$E[A \otimes 1] = \sum_{k,j}\langle M_k\psi|AM_j\psi\rangle\langle k|j\rangle.$$
The measuring apparatus is something we construct and over which we have complete control, so we
may ensure that the $|j\rangle$ are orthonormal, giving
$$E[A \otimes 1] = \sum_j\langle M_j\psi|AM_j\psi\rangle = \sum_j\langle\psi|M_j^*AM_j\psi\rangle = \mathrm{tr}\Big[\sum_j M_j^*AM_jP_\psi\Big].$$
The net effect of the measurement is therefore to turn a system density operator ρ into
$$\sum_j M_j\rho M_j^*.$$
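The passage from the sum of expectations to the transformed density operator relies on the cyclic property of the trace. As a minimal numerical sketch, assume a single qubit with the two Kraus operators taken to be the projections onto |0⟩ and |1⟩, together with a hypothetical vector ψ and observable A chosen only for illustration. The code compares the sum of ⟨M_jψ|A M_jψ⟩ with tr[A Σ M_j P_ψ M_j*].

```python
def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def dagger(x):
    """Adjoint (conjugate transpose)."""
    return [[complex(x[r][c]).conjugate() for r in range(len(x))]
            for c in range(len(x[0]))]

def apply(m, v):
    """Matrix acting on a column vector."""
    return [sum(m[r][k] * v[k] for k in range(len(v))) for r in range(len(m))]

def inner(u, v):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# Kraus operators M_j = P_j: projections onto |0> and |1>.
kraus = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
psi = [0.6, 0.8j]                        # hypothetical normalised system vector
obs = [[1, 2 - 1j], [2 + 1j, -1]]        # hypothetical self-adjoint observable A

# Sum over j of <M_j psi | A M_j psi>.
lhs = sum(inner(apply(m, psi), apply(obs, apply(m, psi))) for m in kraus)

# tr[A rho'] with rho' = sum over j of M_j P_psi M_j*.
p_psi = [[a * complex(b).conjugate() for b in psi] for a in psi]
rho_after = [[0, 0], [0, 0]]
for m in kraus:
    term = matmul(matmul(m, p_psi), dagger(m))
    rho_after = [[rho_after[r][c] + term[r][c] for c in range(2)] for r in range(2)]
rhs = sum(matmul(obs, rho_after)[i][i] for i in range(2))
trace_after = sum(rho_after[i][i] for i in range(2))
```

In this example the off-diagonal entries of the density matrix are removed by the measurement, while its trace stays equal to 1 because the two projections sum to the identity.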
Definition 1.8.1. A positive operator-valued measurement (POVM) takes a density operator $\rho_1$
on the system to $\sum_j M_j\rho_1M_j^*$, with $\sum_j M_j^*M_j = 1$. The operators $M_j$ are called Kraus operators.
A special case occurs for von Neumann measurements with $\psi \mapsto P\psi$, where P is a projection (usually
onto some eigenstate of the observable A which is being measured). The effect on density operators is to
send ρ to $P\rho P^*$, showing that we have a measurement with a single Kraus operator $M_1 = P$. However,
we have $M_1^*M_1 = P^2 = P$, which is usually not 1. For normal unitary evolutions, described by solving
Schrödinger’s equation, probabilities are conserved and we should end up with 1, but projections,
representing von Neumann measurements, do not preserve norms and that permits us to get answers
which may be smaller.
In other words our measuring apparatus may not be perfect, indeed its states $|j\rangle$ may not cover all the
possible outcomes, and in general we can only expect $\sum_j \mathrm{tr}[M_j^*M_j\rho_1] \le 1 = \mathrm{tr}[\rho_1]$. If there is inequality
$\sum_j M_j^*M_j \le 1$ (in the sense that $1 - \sum_j M_j^*M_j$ is a positive operator) then it is said to be a generalised
positive operator-valued measurement. If the Kraus operators are projections then one says that it is a
projection-valued (or projective) measurement.
Originally only projective measurements were used, but it turned out that positive operator-valued
measurements could also be implemented and, in some situations, offered real advantages. (One early
suggested application was for detecting and decoding signals from deep space probes, where only a few
photons a second might arrive back.)
In fact one can arrive at generalised positive operator-valued measurements by simply looking for
the most general transformation of density operators which is linear, preserves positivity, and does not
increase the trace. (The interaction which occurs should be linear, either the unitary evolution described
by solving Schrödinger’s equation, or a projection coming from a von Neumann measurement.)
In normal computers and normal digital communications information is encoded as a string of bits,
that is 0s and 1s. Quantum computers replace each bit by a vector in the two-dimensional space spanned
by two vectors $e_0 = |0\rangle$ and $e_1 = |1\rangle$.
Definition 2.1.1. A quantum bit or qubit is encoded by a two-dimensional quantum state, that
is a vector (up to multiples) in a two-dimensional space $V \cong \mathbb{C}^2$.
In a classical computer the bits are processed using a sequence of digital logic gates, but in a quantum
computer one carries out quantum transformations, that is unitary transformations of the spaces.
The simplest classical gates work on a single bit of information. There are only four possible such
gates, since each must take 0 to either 0 or 1, and likewise take 1 to 0 or 1, two choices of two possible
outcomes. Two of these four are rather boring since they either take both 0 and 1 to 0, or both 0 and
1 to 1. Another is just the identity which takes 0 to 0 and 1 to 1. The only really interesting classical
gate is the NOT gate which takes 0 to 1 and 1 to 0. (In logical applications we could think of 0 as false
and 1 as true, and these are reversed by the NOT gate, whence its name.)
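The count of four one-bit gates can be made concrete by listing every truth table. This is a small illustrative sketch; the dictionary names are hypothetical.

```python
from itertools import product

# Each one-bit gate is fixed by its pair of outputs (f(0), f(1)).
gates = {outputs: {0: outputs[0], 1: outputs[1]}
         for outputs in product((0, 1), repeat=2)}

names = {(0, 0): "constant 0", (1, 1): "constant 1",
         (0, 1): "identity", (1, 0): "NOT"}

for outputs, table in gates.items():
    print(names[outputs], table)
```

Two choices for each of the two inputs give 2 × 2 = 4 tables, and only the one labelled NOT exchanges 0 and 1.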
In quantum information processing even one qubit allows many more possibilities, since, as already
remarked, any linear transformation T on $\mathbb{C}^2$ can operate on a qubit, sending $|\psi\rangle \mapsto T|\psi\rangle$. Since vectors
differing only by scalar multiples define the same quantum state, we may as well restrict our attention
to normalised vectors and unitary transformations T, but there are still a lot of those.
The NOT gate also works for quantum systems, taking $|0\rangle$ to $|1\rangle$, and $|1\rangle$ to $|0\rangle$. This linear transformation
defined by its effect on the basis has matrix
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \sigma_1,$$
where $\sigma_1$ is the first Pauli spin matrix. As an indication of the greater flexibility available in quantum
computing we note that since
$$\big[\tfrac{1}{2}(1 + i)(1 - i\sigma_1)\big]^2 = \tfrac{1}{4}(1 - 1 + 2i)(1 - \sigma_1^2 - 2i\sigma_1) = \tfrac{1}{4}(2i)(-2i\sigma_1) = \sigma_1,$$
the NOT gate in quantum computers has a square root, with matrix
$$\tfrac{1}{2}(1 + i)(1 - i\sigma_1) = \tfrac{1}{2}(1 + i)\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}.$$
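The square root of NOT can be verified by direct matrix multiplication. The sketch below builds the matrix ½(1 + i)(1 − iσ1) entry by entry, squares it, and also checks that it is unitary; the helper names are illustrative.

```python
def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

def dagger(x):
    """Adjoint (conjugate transpose)."""
    return [[complex(x[r][c]).conjugate() for r in range(len(x))]
            for c in range(len(x[0]))]

sigma1 = [[0, 1], [1, 0]]
half = (1 + 1j) / 2
# (1/2)(1 + i)(1 - i sigma1), written out entry by entry.
root_not = [[half * (1 if r == c else 0) - half * 1j * sigma1[r][c]
             for c in range(2)] for r in range(2)]

square = matmul(root_not, root_not)             # should equal sigma1
unitary_check = matmul(root_not, dagger(root_not))   # should equal the identity
```

Applying `root_not` once to |0⟩ gives an equal-weight superposition of |0⟩ and |1⟩; applying it twice gives |1⟩, which no classical one-bit gate can do.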
Another useful and easily implemented gate is the Hadamard gate with matrix
$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{\sqrt{2}}(\sigma_1 + \sigma_3).$$
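As a quick check of the Hadamard matrix, the following sketch builds it from the Pauli matrices σ1 and σ3, squares it, and reads off the image of |0⟩. The variable names are illustrative.

```python
import math

def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]

sigma1 = [[0, 1], [1, 0]]
sigma3 = [[1, 0], [0, -1]]
s = 1 / math.sqrt(2)

# H = (sigma1 + sigma3) / sqrt(2)
hadamard = [[s * (sigma1[r][c] + sigma3[r][c]) for c in range(2)] for r in range(2)]

h_squared = matmul(hadamard, hadamard)        # identity up to rounding
image_of_zero = [hadamard[r][0] for r in range(2)]   # first column: H|0>
```

Since H is real, symmetric and squares to the identity, it is both self-adjoint and unitary, and it sends |0⟩ to the equal superposition (|0⟩ + |1⟩)/√2.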
Of course, one expects computers to process more than just one bit or qubit. In classical computers
the controlled not or CNOT gate provides an example. Here the first bit controls what happens to the
second: if the first bit is 0 then the second bit is left alone, if the first bit is one then NOT is applied
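to the second. The quantum analogue fixes $e_0 \otimes \psi$ but sends $e_1 \otimes e_j$ to $e_1 \otimes e_{1-j}$. In terms of the basis
$\{e_{00}, e_{01}, e_{10}, e_{11}\}$, where $e_{jk} = e_j \otimes e_k$, the matrix is
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
In terms of the Pauli matrices this is
$$\tfrac{1}{2}(1 \otimes 1 + 1 \otimes \sigma_1 + \sigma_3 \otimes 1 - \sigma_3 \otimes \sigma_1).$$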
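The Pauli-matrix expression for the CNOT gate can be checked against the 4×4 matrix by forming the tensor (Kronecker) products explicitly. In this sketch the function `kron` is a hand-written helper, and the basis ordering e00, e01, e10, e11 is the one used above.

```python
def kron(a, b):
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

one = [[1, 0], [0, 1]]
sigma1 = [[0, 1], [1, 0]]
sigma3 = [[1, 0], [0, -1]]

terms = [kron(one, one), kron(one, sigma1), kron(sigma3, one), kron(sigma3, sigma1)]
signs = [1, 1, 1, -1]

# (1/2)(1 x 1 + 1 x sigma1 + sigma3 x 1 - sigma3 x sigma1)
cnot = [[sum(sign * t[r][c] for sign, t in zip(signs, terms)) / 2
         for c in range(4)] for r in range(4)]

expected = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 1, 0]]
```

One way to read the formula is as ½(1 + σ3) ⊗ 1 + ½(1 − σ3) ⊗ σ1: the projection onto |0⟩ of the control leaves the target alone, and the projection onto |1⟩ applies NOT.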
Definition 2.1.2. A set of quantum gates is called universal for a system if they generate the
whole group of unitary transformations of the state space of the system, in the sense that given any
unitary transformation U and any $\epsilon > 0$, there is a finite product of the unitary gates $U_1U_2\cdots U_n$
which is within a distance ε of U, that is $\|U - U_1U_2\cdots U_n\| < \epsilon$.
It can be shown that the phase shift gate with φ = π/4 and the Hadamard gate form a universal set
for processing a single qubit, that is for the unitary operators on $\mathbb{C}^2$, and those two operators on each
of two qubits, together with the controlled NOT, form a universal set for two qubits.
The proof of this falls outside the scope of this course but for those who are interested we outline how
the proof works.
Theorem 2.2.1. In the two dimensional space V any two noncommuting unitary transformations
neither of which has finite order form a universal set of gates.
Proof. Any unitary operator on V can be diagonalised, by the same procedure used for self-adjoint
operators. (Alternatively, we can derive the unitary result from the self-adjoint result. We may as well
assume that 1 is not an eigenvalue of U, since we can always multiply U by a suitable number of unit
modulus to ensure this. Then we can form $A = i(U + 1)(U - 1)^{-1}$, which is self-adjoint, since $U^* = U^{-1}$ gives
$$A^* = -i(U^{-1} + 1)(U^{-1} - 1)^{-1} = -i(1 + U)(1 - U)^{-1} = A.$$
This self-adjoint operator admits a basis of eigenvectors $\{e_j\}$, and since $U = (iA - 1)(iA + 1)^{-1}$ these are
also eigenvectors of U.) If U is a unitary operator which does not have finite order then its eigenvalues
have the form $\exp(i\theta_j)$ with $\theta_j/2\pi$ irrational. The powers of these come arbitrarily close to every complex number of unit
modulus. We can therefore assume that we have unitary operators of the form
$$U_\lambda = \begin{pmatrix} \lambda & 0 \\ 0 & \bar{\lambda} \end{pmatrix}$$
for all λ such that $\lambda\bar{\lambda} = 1$. We shall call these U-type operators. The second unitary operator V
generates a similar group but with respect to a different basis (since they do not commute). Now any
unitary operator can be expressed as a product of a U-type operator then a V-type then another U-type,
completing the proof. □
The phase shift $T = T_{\pi/4}$ and Hadamard gate H actually do have finite order, but their products TH
and HT do not, so that this theorem applies, and shows that they give a universal set.
Theorem 2.2.2. In a two qubit space a universal set is provided by the single bit unitary transformations
and the CNOT gates linking any two.
Proof. By the previous result we know that we can get all unitary operators on each single qubit
space, and one then shows that with CNOT these generate the unitary group on the four-dimensional
two qubit space.
Together these show that a very small number of gates is sufficient to give a universal set, for example,
a two qubit system needs only two gates for each bit plus a CNOT gate.
2.3. No-cloning
In classical information processing we are always making copies, whether on paper using the photocopier,
or electronically in our computer. The No-cloning Theorem says that it is not possible to make
exact quantum copies of arbitrary quantum states, that is there is no good quantum copier which
can start with an arbitrary quantum state $|\psi\rangle \in \mathbb{C}^2$ and reliably turn out a copy so that we end up with
$|\psi\rangle|\psi\rangle$.
The No-cloning Theorem 2.3.1. Let V be a finite-dimensional vector space. There is no linear
map $C : V \to V \otimes V$ such that $C|\psi\rangle$ is a multiple of $|\psi\rangle|\psi\rangle$ for all ψ ∈ V.
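Proof. Let us suppose that for some $\lambda \in \mathbb{C} - \{0\}$ we have $C|\psi\rangle = \lambda^{-1}|\psi\rangle|\psi\rangle$ for all ψ ∈ V. We
certainly have for any two linearly independent vectors $|\xi_0\rangle$ and $|\xi_1\rangle$ in V, applying this to $|\psi\rangle = c_0|\xi_0\rangle + c_1|\xi_1\rangle$:
$$\lambda C(c_0|\xi_0\rangle + c_1|\xi_1\rangle) = c_0^2|\xi_0\rangle|\xi_0\rangle + c_0c_1(|\xi_1\rangle|\xi_0\rangle + |\xi_0\rangle|\xi_1\rangle) + c_1^2|\xi_1\rangle|\xi_1\rangle.$$
By linearity the left-hand side is also $c_0\lambda C|\xi_0\rangle + c_1\lambda C|\xi_1\rangle = c_0|\xi_0\rangle|\xi_0\rangle + c_1|\xi_1\rangle|\xi_1\rangle$.
Comparing coefficients of the independent vectors $|\xi_0\rangle|\xi_0\rangle$, $|\xi_1\rangle|\xi_0\rangle$, $|\xi_0\rangle|\xi_1\rangle$ and $|\xi_1\rangle|\xi_1\rangle$ shows that
$$c_0^2 = c_0, \quad c_1^2 = c_1, \quad c_0c_1 = 0.$$
The last equality shows that one of the coefficients $c_0$, $c_1$ must vanish, so that $|\psi\rangle$ is actually a multiple
of either $|\xi_0\rangle$ or $|\xi_1\rangle$. □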
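The obstruction can be seen in a concrete candidate copier. The sketch below is an illustration chosen here, not a construction from the proof: it applies a CNOT gate to |ψ⟩|0⟩, which is a linear map that does copy the two basis vectors, and then compares its output on a superposition with the desired |ψ⟩|ψ⟩. Amplitudes are listed in the order |00⟩, |01⟩, |10⟩, |11⟩, and the function names are hypothetical.

```python
def tensor(u, v):
    """Amplitudes of u (x) v in the order 00, 01, 10, 11."""
    return [a * b for a in u for b in v]

def cnot_copy(psi):
    """Apply CNOT to psi (x) |0>, with psi as the control qubit."""
    state = tensor(psi, [1, 0])
    # CNOT exchanges the |10> and |11> amplitudes and fixes the others.
    state[2], state[3] = state[3], state[2]
    return state

# The two basis vectors are copied correctly.
basis_ok = (cnot_copy([1, 0]) == tensor([1, 0], [1, 0])
            and cnot_copy([0, 1]) == tensor([0, 1], [0, 1]))

# A superposition is not: linearity forces an entangled output instead.
s = 2 ** -0.5
superposition = [s, s]
copied = cnot_copy(superposition)               # (|00> + |11>) / sqrt(2)
wanted = tensor(superposition, superposition)   # (|00> + |01> + |10> + |11>) / 2
```

The cross terms |01⟩ and |10⟩ present in `wanted` are missing from `copied`, which is exactly the $c_0c_1$ term that the proof shows cannot be produced by a linear map.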
The compensation for this is that, although quantum theory forbids making copies of a state, it
allows us to reproduce a state wherever we like, at the expense of losing the original. This is the basis
of quantum teleportation.