The Method of Lagrange Multipliers
The problem is to find

    \min f(x) \quad \text{for} \quad x = (x_1, x_2, \dots, x_n)    (1.1a)

subject to p constraints

    g_1(x) = c_1, \quad g_2(x) = c_2, \quad \dots, \quad \text{and} \quad g_p(x) = c_p.    (1.1b)

Typical examples are

    \min_{x_1,\dots,x_n} \sum_{i=1}^{n} x_i^2 \quad \text{subject to} \quad \sum_{i=1}^{n} x_i = 1    (1.2a)

or

    \min_{x_1,\dots,x_5} \sum_{i=1}^{5} x_i^2 \quad \text{subject to} \quad x_1 + 2x_2 + x_3 = 1 \quad \text{and} \quad x_3 - 2x_4 + x_5 = 6.    (1.2b)

A first guess for (1.1) (with f(x) = \sum_{i=1}^{n} x_i^2 in (1.2)) might be to look for solutions of the n equations

    \frac{\partial}{\partial x_i} f(x) = 0, \quad 1 \le i \le n.    (1.3)
However, this leads to x_i = 0 in (1.2), which does not satisfy the constraint.
Lagrange's solution is to introduce p new parameters (called Lagrange
Multipliers) and then solve a more complicated problem:
Theorem (Lagrange). Assuming appropriate smoothness conditions, a minimum or maximum of f(x) subject to the constraints (1.1b) that is not on
the boundary of the region where f(x) and g_j(x) are defined can be found
by introducing p new parameters \lambda_1, \lambda_2, \dots, \lambda_p and solving the system

    \frac{\partial}{\partial x_i} \Big( f(x) + \sum_{j=1}^{p} \lambda_j g_j(x) \Big) = 0, \quad 1 \le i \le n    (1.4a)

    g_j(x) = c_j, \quad 1 \le j \le p.    (1.4b)
This amounts to solving n + p equations for the n + p real variables in x and \lambda.
In contrast, (1.3) has n equations for the n unknowns in x. Fortunately, the
system (1.4) is often easy to solve, and is usually much easier than using the
constraints to substitute for some of the x_i.
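To illustrate how mechanical solving (1.4) can be, the system for the example (1.2a) can be handed to a computer algebra system. The sketch below uses SymPy with n = 4; both the library and the particular value of n are illustrative choices, not part of the method.

    import sympy as sp

    n = 4
    x = sp.symbols(f"x1:{n + 1}")        # the variables x1, ..., x4
    lam = sp.symbols("lam")              # the single Lagrange multiplier (p = 1)

    f = sum(xi**2 for xi in x)           # objective in (1.2a)
    g = sum(x)                           # constraint function; g(x) = 1

    # (1.4a): d/dx_i [ f + lam * g ] = 0 for each i, plus (1.4b): g(x) = 1.
    eqs = [sp.diff(f + lam * g, xi) for xi in x] + [sp.Eq(g, 1)]
    print(sp.solve(eqs, list(x) + [lam], dict=True))
    # [{lam: -1/2, x1: 1/4, x2: 1/4, x3: 1/4, x4: 1/4}], i.e. x_i = 1/n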
(1) A Simple Example: For (1.2a) there is p = 1 constraint, and the equations (1.4a) are

    \frac{\partial}{\partial x_i} \Big( \sum_{k=1}^{n} x_k^2 + \lambda \sum_{k=1}^{n} x_k \Big) = 2x_i + \lambda = 0, \quad 1 \le i \le n.

Thus x_i = -\lambda/2 for 1 \le i \le n. The constraint \sum_{i=1}^{n} x_i = 1 in (1.4b) then implies
\sum_{i=1}^{n} x_i = -n\lambda/2 = 1, or \lambda = -2/n, from which x_i = 1/n for 1 \le i \le n.
For x_i = 1/n, f(x) = n/n^2 = 1/n. One can check that this is a minimum
(as opposed to a maximum or saddle point) by noting that f(x) = 1 if x_1 = 1 and
x_i = 0 for 2 \le i \le n.
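The same conclusion can be cross-checked numerically with any constrained optimizer; the sketch below uses SciPy's SLSQP solver with n = 5, an arbitrary choice made only for illustration.

    import numpy as np
    from scipy.optimize import minimize

    n = 5
    res = minimize(
        lambda x: np.sum(x**2),                               # f(x) = sum of x_i^2
        x0=np.random.rand(n),                                 # arbitrary start
        constraints={"type": "eq", "fun": lambda x: np.sum(x) - 1},
        method="SLSQP",
    )
    print(res.x, res.fun)    # approx. [0.2 0.2 0.2 0.2 0.2] and 1/n = 0.2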
(2) A System with Two Constraints: There are p = 2 constraints in (1.2b),
which is to find

    \min_{x_1,\dots,x_5} \sum_{i=1}^{5} x_i^2 \quad \text{subject to} \quad x_1 + 2x_2 + x_3 = 1 \quad \text{and} \quad x_3 - 2x_4 + x_5 = 6.    (2.1)

In this case, the equations (1.4a) are

    \frac{\partial}{\partial x_i} \Big( \sum_{k=1}^{5} x_k^2 + \lambda_1 (x_1 + 2x_2 + x_3) + \lambda_2 (x_3 - 2x_4 + x_5) \Big) = 0, \quad 1 \le i \le 5.    (2.2)
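Rather than writing out the five equations in (2.2) by hand, one can again let a computer algebra system do the bookkeeping; the choice of SymPy here is illustrative.

    import sympy as sp

    x = sp.symbols("x1:6")                    # x1, ..., x5
    l1, l2 = sp.symbols("l1 l2")              # one multiplier per constraint

    f = sum(xi**2 for xi in x)
    g1 = x[0] + 2*x[1] + x[2]                 # constrained to equal 1
    g2 = x[2] - 2*x[3] + x[4]                 # constrained to equal 6

    eqs = [sp.diff(f + l1*g1 + l2*g2, xi) for xi in x]   # the system (2.2)
    eqs += [sp.Eq(g1, 1), sp.Eq(g2, 6)]                  # the constraints (1.4b)
    print(sp.solve(eqs, list(x) + [l1, l2], dict=True))
    # x = (0, 0, 1, -2, 1) with l1 = 0, l2 = -2, giving f(x) = 6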
(3) The Best Linear Unbiased Estimator (BLUE): Let X_1, X_2, \dots, X_n be
uncorrelated observations with common mean \mu and variances \mathrm{Var}(X_i) = \sigma_i^2.
A linear unbiased estimator of \mu has the form \sum_{i=1}^{n} a_i X_i with
\sum_{i=1}^{n} a_i = 1, and its variance is \sum_{i=1}^{n} a_i^2 \sigma_i^2. The best
(minimum-variance) such estimator solves

    \min_{a_1,\dots,a_n} \sum_{i=1}^{n} a_i^2 \sigma_i^2 \quad \text{subject to} \quad \sum_{i=1}^{n} a_i = 1,

and by (1.4) the solution is the estimator \sum_{i=1}^{n} a_i X_i where

    a_i = c/\sigma_i^2 \quad \text{for} \quad c = 1 \Big/ \sum_{k=1}^{n} (1/\sigma_k^2).    (2.4)
If \sigma_i^2 = \sigma^2 for all i, then a_i = 1/n and \sum_{i=1}^{n} a_i X_i = (1/n) \sum_{i=1}^{n} X_i = \bar{X} is
the BLUE for \mu. Conversely, if \mathrm{Var}(X_i) = \sigma_i^2 varies with i, then the BLUE \sum_{i=1}^{n} a_i X_i
for \mu puts relatively less weight on the noisier (higher-variance) observations
(that is, the weight a_i is smaller), but still uses the information in the noisier
observations. Formulas like (2.4) are often used in survey sampling.
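As a small numerical illustration of (2.4), the sketch below computes the weights for some hypothetical variances (the values of sigma2 are made up) and compares the variance of the BLUE with that of the unweighted mean.

    import numpy as np

    sigma2 = np.array([1.0, 2.0, 4.0, 8.0])   # hypothetical Var(X_i) = sigma_i^2
    c = 1.0 / np.sum(1.0 / sigma2)
    a = c / sigma2                            # the weights (2.4); a.sum() == 1
    print(a)                                  # noisier observations get less weight

    # Var(sum a_i X_i) = sum a_i^2 sigma_i^2 = c, versus Var(X-bar) = sum sigma_i^2 / n^2:
    print(np.sum(a**2 * sigma2), np.sum(sigma2) / len(sigma2)**2)   # 0.533... < 0.9375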
3. A Short Proof of Lagrange's Theorem. The extremal condition
(1.3) (without any constraints) can be written in vector form as
    \nabla f(x) = \Big( \frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x), \dots, \frac{\partial}{\partial x_n} f(x) \Big) = 0,    (3.1)

where \nabla f(x) denotes the gradient of f. By Taylor's theorem, for any vector y and small \epsilon,

    f(x + \epsilon y) = f(x) + \epsilon \, y \cdot \nabla f(x) + O(\epsilon^2).    (3.2)

Now consider the constrained problem for p = 1, which is to find

    \max f(x) \quad \text{subject to} \quad g(x) = c.    (3.3)
If x_1(t) is any path in the surface defined by g(x) = c, then g(x_1(t)) = c for all t, so differentiating with respect to t at t = 0 gives

    \frac{d}{dt} g(x_1(0)) = \frac{d}{dt} x_1(0) \cdot \nabla g(x_1(0)) = 0.    (3.4)

This implies that \nabla g(x_1(0)) is orthogonal to the tangent vector (d/dt)x_1(0)
for any path x_1(t) in the surface defined by g(x) = c.
Conversely, if x is any point in the surface g(x) = c and y is any vector
such that y \cdot \nabla g(x) = 0, then it follows from the Implicit Function Theorem that
there exists a path x_1(t) in the surface g(x) = c such that x_1(0) = x and
(d/dt)x_1(0) = y. This result and (3.4) imply that the gradient vector \nabla g(x)
is always orthogonal to the surface defined by g(x) = c at x.
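This orthogonality is easy to see numerically on a concrete surface. The sketch below (the sphere and the path are illustrative choices, not part of the proof) checks (3.4) for g(x) = x_1^2 + x_2^2 + x_3^2 and the path x_1(t) = (cos t, sin t, 0) on the surface g(x) = 1.

    import numpy as np

    t = 0.3                                         # any parameter value works
    x = np.array([np.cos(t), np.sin(t), 0.0])       # a point on the surface g = 1
    dx = np.array([-np.sin(t), np.cos(t), 0.0])     # the tangent vector (d/dt) x1(t)
    grad_g = 2 * x                                  # the gradient of g at x
    print(np.dot(grad_g, dx))                       # 0.0 up to rounding, as in (3.4)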
Now let x be a solution of (3.3). I claim that \nabla f(x) = \lambda \nabla g(x) for some
scalar \lambda. First, we can always write \nabla f(x) = c \nabla g(x) + y where y \cdot \nabla g(x) = 0
(here c is just a scalar coefficient, not the constraint value).
If x(t) is a path in the surface with x(0) = x and (d/dt)x(0) \cdot \nabla f(x) \ne 0,
it follows from (3.2) with y = (d/dt)x(0) that there are values of f(x) for
x = x(t) in the surface that are both larger and smaller than f(x).
Thus, if x is a maximum or minimum of f(x) in the surface and \nabla f(x) =
c \nabla g(x) + y with y \cdot \nabla g(x) = 0, then y is a tangent vector at x, so
y \cdot \nabla f(x) = c \, y \cdot \nabla g(x) + y \cdot y = y \cdot y = 0
and y = 0. This means that \nabla f(x) = c \nabla g(x), which completes the proof of
Lagrange's Theorem for one constraint (p = 1).
Next, suppose that we want to solve

    \max f(x) \quad \text{subject to} \quad g_1(x) = c_1, \dots, g_p(x) = c_p    (3.5)

for p constraints. Let x be a solution of (3.5). Recall that each vector
\nabla g_j(x) is orthogonal to the surface g_j(x) = c_j at x. Let L be the linear
space

    L = \mathrm{span}\{\, \nabla g_j(x) : 1 \le j \le p \,\}.

I claim that \nabla f(x) \in L. This would imply

    \nabla f(x) = \sum_{j=1}^{p} \lambda_j \nabla g_j(x).
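This claim can be checked numerically at, say, the solution x = (0, 0, 1, -2, 1) of example (2.1) found earlier; the least-squares solve below is just one illustrative way to exhibit the coefficients \lambda_j (note the sign convention here is \nabla f = \sum \lambda_j \nabla g_j, opposite in sign to the multipliers in (1.4a)).

    import numpy as np

    x = np.array([0.0, 0.0, 1.0, -2.0, 1.0])         # solution of example (2.1)
    grad_f = 2 * x                                   # gradient of sum of x_i^2
    grad_g1 = np.array([1.0, 2.0, 1.0, 0.0, 0.0])    # gradient of x1 + 2x2 + x3
    grad_g2 = np.array([0.0, 0.0, 1.0, -2.0, 1.0])   # gradient of x3 - 2x4 + x5

    A = np.column_stack([grad_g1, grad_g2])
    lam = np.linalg.lstsq(A, grad_f, rcond=None)[0]  # solve grad_f = A @ lam
    print(lam, np.linalg.norm(A @ lam - grad_f))     # [0. 2.] and residual 0.0,
                                                     # so grad f(x) does lie in L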