Legendre Transform
Markus Deserno
Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
(Dated: December 26, 2012)
A Legendre transform is a procedure for expressing the information content of some function by
using a different independent variable, namely, the derivative of this function with respect to (one
of) its argument(s). These notes explain how this is done and why simply performing some sort of
algebraic substitution instead would destroy information.
I. INTRODUCTION

A. Information content of functions

Functions are mappings, often from some set of real (or complex) numbers into another such set. They tell us about some relationship and thus contain information. In these notes we will be concerned with the question of how to represent that information content, without necessarily being able to quantify it. This sounds very vague, so let us be (marginally) more specific.

How much information is contained in a function? It turns out that this is a nontrivial question with many subtle ramifications. Consider for instance the simple function

f : R → R₀⁺ ,  x ↦ x² .   (1)

It takes little to write this down, so the information content appears small. And yet, we could decide to instead "implement" this function by a lookup table. Since single precision floating point numbers require 32 bits of information, a computer can store 2^32 ≈ 4.29 × 10^9 different single precision floating point numbers. A lookup table would thus contain twice as many numbers, each with 32 bits, requiring 32 GB of data (and note that we can't actually store all squares in single precision...). But then, how much information is there in that function?
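A quick back-of-the-envelope check of these numbers, as a minimal Python sketch (the factor of two simply counts storing both the argument and its square, as in the text):

    n_floats = 2**32                               # number of distinct 32-bit patterns
    bytes_per_number = 4                           # 32 bits each
    table_bytes = 2 * n_floats * bytes_per_number  # store both x and x**2
    print(f"{n_floats:.3g} numbers, {table_bytes / 2**30:.0f} GiB")   # 4.29e+09 numbers, 32 GiB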
It turns out that below we will be interested in ways to represent the same information content in different ways, hence we must only answer the question "do two different functions contain the same information?" This is a simpler question, because it only requires us to compare information content, not actually to quantify it. This is quite analogous to the situation that permits us to decide whether two sets have the same size ("magnitude") without actually counting elements: A sack of apples and a sack of oranges contain the same number of elements (and thus have the same magnitude) if we can pair up the apples and oranges without any of them remaining unpaired. And as you surely know, by such means Georg Cantor first arrived at the quite unexpected and nontrivial conclusion that the two sets of natural numbers and rational numbers actually have the same size.
B. Transforms

We are well used to the fact that the information contained in a function can be represented in different ways. Let us consider two examples. Take the function y = f(x) and let us assume it's invertible. Then, clearly, the inverse function x = f^(-1)(y) contains the same information. A simple way to convince yourself that this is true is to recall that the graph of the inverse function is just the mirror image of the graph of the original function, mirrored at the line y = x. And clearly, mirroring preserves all the information.

The second, maybe more interesting, example is that we can transform functions into other functions. For instance, we can Fourier transform f(x) into the function f~(k). For a suitable set of starting functions, Fourier transforms can be inverted, and hence we can recover f(x) from f~(k). This would then let us surmise that the function and its Fourier transform contain the same information. Indeed, in functional analysis we would learn that functions are members of abstract Hilbert spaces and that they can be represented in different ways (meaning, using different basis sets), but that it's always the same "function" we're talking about and that a change of basis set doesn't change the information content of the function.
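That the Fourier transform can be undone is easy to check numerically; here is a minimal sketch using NumPy's discrete transform, with an arbitrary sample function chosen only for illustration:

    import numpy as np

    x = np.linspace(0.0, 1.0, 256, endpoint=False)
    f = np.exp(-20.0 * (x - 0.5)**2)       # some sample function f(x)
    f_tilde = np.fft.fft(f)                # the transformed representation "f~(k)"
    f_back = np.fft.ifft(f_tilde).real     # transform back
    print(np.allclose(f, f_back))          # True: nothing was lost in the round trip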
Legendre transforms, the topic of these notes, are yet another way to transform one function into another function while preserving the information content.
C. Functions and variables

Physicists love streamlined notation. The detailed notation in equation (1) is rarely found in the physics literature. Physicists often don't even write y = f(x); they just talk of the function y(x). And when they refer to the inverse f^(-1)(y), they call that the function x(y). However, what seems awfully sleek in fact has the danger of confusing three distinct concepts. What, for instance, do we now mean by "x"? It could be any one of the following three:

1. the independent variable x.

2. the (inverse) function f^(-1).

3. the value f^(-1)(y) resulting from inserting the independent variable y into the inverse function f^(-1).
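Since most programming languages keep these three things apart by construction, a small Python sketch may help; the concrete square-root example is an assumption, chosen to match the squaring function of equation (1) restricted to x ≥ 0:

    import math

    def x(y):                  # 2. the (inverse) function "x"
        return math.sqrt(y)

    y = 9.0                    # an independent variable "y"
    x_of_y = x(y)              # 3. the value "x(y)" obtained by inserting y into the function x
    x_var = 3.0                # 1. the independent variable "x"
    print(x, x_of_y, x_var)    # a function object and two plain numbers: three distinct objects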
Usually, we don't need to distinguish between these three things very carefully. However, when it comes to Legendre transforms, the whole point is to change a function by representing it through a different independent variable, which in turn is defined through (a derivative of) the function itself. If we are now sloppy, we might miss the whole concept and everything is somehow redundant and mysterious. Hence, please make sure that before you read on, you do understand that there is a difference between the three concepts above.

Now, the trick is to distinguish between them without making the notation clumsy. In fact, we would like to avoid using the seemingly unnecessary extra symbol "f". I therefore suggest the following: When we speak of independent variables, we use their italic symbols. When we speak of functions, we use roman type, and if we speak of values of functions, we add the independent variable in parentheses. This way, the three concepts from above are distinguished as such: What we formerly all called "x" is now:

1. the independent variable "x".

2. the (inverse) function "x".

3. the value "x(y)" resulting from inserting the independent variable y into the inverse function x.

This means we now have y = y(x) and x = x(y). This might look unusual. If you feel it is unnecessary, then please go back and convince yourself that these three concepts are different things.
Suppose now we want to use the derivative p = y'(x) as the new independent variable: we solve p = y'(x) for x = x(p) and simply insert this into the original function, thereby defining a new function ỹ : p ↦ y(x(p)). The crucial question is: Does this new function ỹ contain the same information as the original function y? The quick answer is: No, it doesn't. And the best way to see this is by a simple example.

Take for instance the function y : x ↦ ½ (x − x0)². The derivative is p = y'(x) = x − x0, and this can be uniquely solved for x as a function of p, giving x = p + x0. If we insert this back into our original function y(x), we get

y(x(p)) = ½ (x(p) − x0)² = ½ p² .   (4)

Notice that the value of x0 has dropped out! Functions with different values of x0 would map to the same final function ½ p². Hence, if someone tells us that they have obtained ½ p², we do not know which function they started with. Information is lost.
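This calculation is easily reproduced symbolically; a minimal sketch with SymPy, using the specific function from the text:

    import sympy as sp

    x, p, x0 = sp.symbols('x p x0', real=True)
    y = sp.Rational(1, 2) * (x - x0)**2               # y(x) = (1/2)(x - x0)^2
    x_of_p = sp.solve(sp.Eq(sp.diff(y, x), p), x)[0]  # p = y'(x) = x - x0  =>  x(p) = p + x0
    print(sp.simplify(y.subs(x, x_of_p)))             # p**2/2, with no trace of x0 left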
There is a subtlety here that is worth understanding fully. You might object that the value of x0 is contained in the transformation equation, and of course I could transform ½ p² back, together with the right value of x0, if I use the right transformation equation. I'd just have to memorize which transformation I did, and that might be different for different initial functions (here: different values of x0). This is true, but this is not the point. The point is that I do not want to memorize the transformed function together with the transformation equation. I only want to know by what general procedure the transformation was accomplished.

There is a way to solve this problem, but not for all possible functions. It turns out that we will only be able to do things nicely if our original function is of a special form, and this requires one more interlude:

3. If a convex function y(x) is everywhere differentiable twice, then y''(x) ≥ 0.
TABLE I: Legendre transform pairs and useful relations in the ⋆y- and y⋆-conventions, both for convex and concave functions. Notice that for the ⋆y-convention the Legendre transform is its own inverse, while in the y⋆-convention there is an additional minus-to-plus switch in the xp-term. Also notice that in the ⋆y-convention convexity and concavity do not change upon transformation, and hence the "min" and "max" are the same for both directions. In the y⋆-convention convexity and concavity swap, and so do the "min" and "max". Notice that when one actually computes the Legendre transforms, one anyway ends up searching for the derivative in the min- or max-terms, and hence it does not greatly matter whether it actually is a min- or a max-procedure that is to be implemented. Notice also that the differential relations are rather easy to memorize, and indeed they are frequently used in thermodynamics.
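A minimal numerical sketch of the two conventions on a grid, assuming the max/min forms implied by the table, for one sample convex function (the function and the grid ranges are arbitrary choices made for illustration):

    import numpy as np

    y = lambda x: 0.5 * x**2 + 0.1 * x**4          # a sample convex function
    x = np.linspace(-5.0, 5.0, 4001)
    p_values = np.linspace(-3.0, 3.0, 13)

    star_y = np.array([np.max(x * p - y(x)) for p in p_values])   # *y(p) = max_x [xp - y(x)]
    y_star = np.array([np.min(y(x) - x * p) for p in p_values])   # y*(p) = min_x [y(x) - xp]

    print(np.allclose(y_star, -star_y))    # True: the two conventions differ only by an overall sign

On such a grid one also sees the convexity statements of the table directly: ⋆y(p) comes out convex in p, while y⋆(p) comes out concave.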
What is the derivative of the Legendre transform y⋆ with respect to its independent variable? Assuming differentiability and convexity of y, we know that y⋆(p) = y(x(p)) − x(p) p, where x(p) = y'^(-1)(p). Hence,

∂y⋆(p)/∂p = [∂y(x)/∂x] (∂x/∂p) − x(p) − p (∂x/∂p) = −x(p) ,   (14b)

since ∂y(x)/∂x = p, so that the first and the last term cancel.

Since evidently p(x) and x(p) are inverses of each other, Eqns. (14a) and (14b) show that, up to a minus sign, y' and y⋆' are also inverse functions of each other:

y⋆' = −y'^(-1) .   (15)

Using a slightly more sloppy notation (which, however, is popular in thermodynamics), we can state this as follows: If we have a function y(x) and its Legendre transform q(p) (using the thermodynamics convention (8)), then we have the following matching pair of differentials:

dy = p dx ←→ dq = −x dp .   (16)
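Relations (14b) and (15) can be checked symbolically for any convenient convex function; a small sketch, where the exponential is an arbitrary choice:

    import sympy as sp

    x = sp.Symbol('x', real=True)
    p = sp.Symbol('p', positive=True)
    y = sp.exp(x)                                     # a sample convex function
    x_of_p = sp.solve(sp.Eq(sp.diff(y, x), p), x)[0]  # x(p) = y'^(-1)(p) = log(p)
    y_star = y.subs(x, x_of_p) - x_of_p * p           # y*(p) = y(x(p)) - x(p) p
    print(sp.simplify(sp.diff(y_star, p) + x_of_p))   # 0, i.e. dy*/dp = -x(p), as in (14b)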
Recall that in the ⋆y-convention the transform reads ⋆y(p) = max_x [xp − y(x)] = x(p) p − y(x(p)), where p(x) = y'(x) and hence x(p) = y'^(-1)(p). Notice that this is the point where we need convexity: If y(x) is convex, then y'(x) is monotonic and hence we can uniquely solve for x(p).

Now, Legendre transforming one more time (and remembering that ⋆y is also convex if y is convex) we get

⋆⋆y(q) = max_p [pq − ⋆y(p)] = p(q) q − ⋆y(p(q)) .   (19)

Inserting ⋆y(p), we get

⋆⋆y(q) = p(q) q − ⋆y(p(q))
       = p(q) q − [ x(p(q)) p(q) − y(x(p(q))) ]
       = p(q) q − q p(q) + y(q)
       = y(q) ,

where we have used that x(p(q)) = q, since p(x) and x(p) are inverses of each other. This is what we wanted to prove.

As it turns out, in the y⋆-convention it is also necessary to swap the sign of the "xp"-term. This can be easily checked by the same type of calculation with which we proved what the inverse transform was for the ⋆y-case.

Table I summarizes the Legendre transform rules in both conventions, for convex and concave functions.
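The involution property proved above can also be verified numerically on a grid; a minimal sketch, where the function and the grid ranges are arbitrary choices, wide enough that all relevant slopes are covered:

    import numpy as np

    y = lambda x: np.cosh(x)                    # a sample convex function
    x = np.linspace(-4.0, 4.0, 4001)
    p = np.linspace(-np.sinh(4.0), np.sinh(4.0), 8001)   # covers all slopes y'(x) on the x-grid
    q = np.linspace(-2.0, 2.0, 9)

    star_y = np.array([np.max(x * pi - y(x)) for pi in p])        # *y(p)  = max_x [xp - y(x)]
    star_star_y = np.array([np.max(p * qi - star_y) for qi in q]) # **y(q) = max_p [pq - *y(p)]
    print(np.max(np.abs(star_star_y - y(q))))   # small (discretization error only): **y = y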
In thermodynamics we find that the entropy S as a function of energy E, volume V, and particle number N is a thermodynamic potential, meaning that it contains all the thermodynamic information we can hope for. We also learn that certain derivatives of the entropy are of thermodynamic interest; for instance, we know that

1/T = (∂S/∂E)_{V,N} ,   (20)

where T is the temperature. It is then natural to ask whether we can construct other thermodynamic potentials that contain such derivatives as their natural independent variables, and whether we can find them starting from the entropy. Since we want to replace some variables by others which are derivatives of the function we start with, and since we certainly don't want to lose precious thermodynamic information along the way, a Legendre transform appears to be the winning ticket.

To look at the best known example, let us first of all suppress the volume and particle number dependence and only look at the energy. Next, instead of looking at S = S(E), let us look at the inverse thermodynamic potential E = E(S). (Warning: Some people use a roman "E" to denote the "exergy", a different thermodynamic potential. We don't.) Since the entropy is monotonic over the ranges over which (canonical) thermal equilibrium can be achieved, this inverse actually exists and we hence do not lose information. Moreover, we of course then also have

T = (∂E/∂S)_{V,N} .   (21)

For the reasons outlined above, simply solving this equation for S as a function of T and then inserting this into E(S) will not solve the problem. We do get a function of T that way, but, as we have seen, information is lost along the road. What does preserve the information is the Legendre transform

E⋆(T) = min_S [ E(S) − T S ] ,   (22a)

or, equivalently,

E⋆(T) = min_E [ E − T S(E) ] .   (22b)

The difference between Eqn. (22a) and Eqn. (22b) is subtle, but well visible in our notation: In both formulas we have essentially "E − T S", but in the first we view the energy as a function of entropy and minimize over all values of the entropy, in the second we view the entropy as a function of energy and hence minimize over all values of the energy.

Of course, in thermodynamics we don't write the Legendre transform as E⋆(T) but instead give it a new symbol. Usually it's F, but sometimes one also finds A. So people write F(T) (with a roman "F" for the function, in the notation of these notes), or simply F(T), or just F. And they call it the (Helmholtz) free energy. From what we have learned about Legendre transforms, the free energy contains the same thermodynamic information as the entropy. Our discussion above also shows that it is a concave function of temperature.

Other Legendre transforms are evidently possible, for instance replacing the volume by the pressure, or the particle number by the chemical potential, thus leading to all kinds of other equivalent thermodynamic potentials with all kinds of names, such as "enthalpy" or "grand potential".
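As a concrete numerical illustration of Eqn. (22b), here is a minimal Python sketch with a toy entropy: the form S(E) = a√E, the constant a, the grid, and k_B = 1 are all assumptions made purely for illustration; for this choice the minimization can also be done by hand, giving F(T) = −a²T²/4.

    import numpy as np

    a = 2.0
    S = lambda E: a * np.sqrt(E)                 # toy concave, increasing entropy S(E), k_B = 1
    E = np.linspace(1e-9, 50.0, 200001)

    for T in (0.5, 1.0, 2.0):
        F_numeric = np.min(E - T * S(E))         # Eqn. (22b): F(T) = min_E [E - T S(E)]
        F_exact = -(a * T)**2 / 4.0              # analytic minimum for this toy S(E)
        print(T, F_numeric, F_exact)

One can also read off from the result that F(T) obtained this way is indeed a concave function of T, as stated above.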
B. Relation to Laplace transforms and partition functions

The entropy S(E) is the logarithm of a function Ω(E), which is essentially the density of states with a multiplicity N! divided out:

S(E) = k_B ln Ω(E) .   (23)

The canonical partition function is the Laplace transform of Ω(E), and the free energy essentially the logarithm of
the canonical partition function:

e^{−βF(T)} = Z(T) = ∫ dE Ω(E) e^{−βE} .   (24)

It is easy to see that the Laplace transform relation between partition functions translates to a Legendre transform relation between the thermodynamic potentials in the thermodynamic limit. To see this, we need to Laplace-evaluate the Laplace transform.

First, we use (23) to rewrite (24):

e^{−βF(T)} = ∫ dE e^{−β[E − T S(E)]} .   (25)

We next need to make extensivity explicit. We will write E = Nε and S = N s_N(ε):

e^{−βF(T)} = ∫ dε N e^{−βN[ε − T s_N(ε)]} .   (26)

Notice that s_N(ε) still depends on N. All we know is that, if the thermodynamic limit exists, it will converge to an N-independent value s_∞(ε). We can thus write s_N(ε) = s_∞(ε) + δs_N(ε), where the latter is a function that decays to zero in the thermodynamic limit. And thus we are in the position to perform a saddle-point (or "Laplace"-) evaluation of the integral, writing F(T) = N f_N(T):

e^{−βN f_N(T)} = ∫ dε N e^{−βN[ε − T s_∞(ε) − T δs_N(ε)]} ∼ N e^{−βN min_ε[ε − T s_∞(ε)]}   (N → ∞) .   (27)

Taking the logarithm and dividing by −βN, we get

f(T) = min_ε [ ε − T s(ε) ] ,   (28)

which expresses the now well-known Legendre transform between the specific energies, entropies, and free energies (for which the limit N → ∞ exists).
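The convergence expressed by Eqns. (26) to (28) can be illustrated numerically; a minimal sketch with an assumed toy specific entropy s_∞(ε) and an assumed finite-size correction δs_N(ε) that decays like 1/N (k_B = 1; all functional forms and parameter values here are illustrative assumptions, not taken from the text):

    import numpy as np

    T, beta = 0.5, 2.0
    s_inf = lambda e: 1.0 + np.log(e)            # toy specific entropy s_inf(eps)
    ds_N = lambda e, N: np.sin(5.0 * e) / N      # toy correction that vanishes as N -> infinity

    eps = np.linspace(0.01, 5.0, 5000)
    d_eps = eps[1] - eps[0]
    f_legendre = np.min(eps - T * s_inf(eps))    # Eqn. (28): f(T) = min_eps [eps - T s(eps)]

    for N in (10, 100, 1000, 10000):
        a = eps - T * (s_inf(eps) + ds_N(eps, N))
        m = a.min()                              # shift the exponent to avoid underflow
        Z = np.sum(N * np.exp(-beta * N * (a - m))) * d_eps   # the integral in Eqns. (26)/(27)
        f_N = m - np.log(Z) / (beta * N)         # free energy per particle from the full integral
        print(N, f_N, f_legendre)                # f_N approaches the Legendre-transform value

As N grows, the value obtained from the full integral approaches the Legendre transform of the limiting specific entropy, which is the content of the saddle-point argument above.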