The Method of Lagrange Multipliers

S. Sawyer, October 25, 2002


1. Lagrange's Theorem. Suppose that we want to maximize (or minimize) a function of n variables

    f(x) = f(x_1, x_2, ..., x_n)   for   x = (x_1, x_2, ..., x_n)                (1.1a)

subject to p constraints

    g_1(x) = c_1,   g_2(x) = c_2,   ...,   and   g_p(x) = c_p                    (1.1b)
As an example for p = 1, find

    min_{x_1,...,x_n} { Σ_{i=1}^n x_i²  :  Σ_{i=1}^n x_i = 1 }                   (1.2a)
or

    min_{x_1,...,x_5} Σ_{i=1}^5 x_i²   subject to   x_1 + 2x_2 + x_3 = 1   and   x_3 − 2x_4 + x_5 = 6      (1.2b)

A first guess for (1.1) (with f(x) = Σ_{i=1}^n x_i² in (1.2)) might be to look for solutions of the n equations

    ∂/∂x_i f(x) = 0,   1 ≤ i ≤ n                                                 (1.3)
However, this leads to x_i = 0 in (1.2), which does not satisfy the constraint.
Lagrange's solution is to introduce p new parameters (called Lagrange Multipliers) and then solve a more complicated problem:

Theorem (Lagrange). Assuming appropriate smoothness conditions, a minimum or maximum of f(x) subject to the constraints (1.1b) that is not on the boundary of the region where f(x) and g_j(x) are defined can be found by introducing p new parameters λ_1, λ_2, ..., λ_p and solving the system

    ∂/∂x_i ( f(x) + Σ_{j=1}^p λ_j g_j(x) ) = 0,   1 ≤ i ≤ n                      (1.4a)

    g_j(x) = c_j,   1 ≤ j ≤ p                                                    (1.4b)
This amounts to solving n + p equations for the n + p real variables in x and λ. In contrast, (1.3) has n equations for the n unknowns in x. Fortunately, the system (1.4) is often easy to solve, and is usually much easier than using the constraints to substitute for some of the x_i.
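As a concrete illustration of how the system (1.4) is assembled and solved in practice, the following sketch builds the n + p equations symbolically for example (1.2a) with n = 3 (the same problem is worked by hand for general n in Section 2). It is not part of the original notes; the use of sympy and all variable names are illustrative choices.

    # A minimal sketch (assuming sympy is available): build and solve the
    # system (1.4) for example (1.2a) with n = 3.
    import sympy as sp

    x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam', real=True)
    x = [x1, x2, x3]

    f = sum(xi**2 for xi in x)      # objective f(x) = x1^2 + x2^2 + x3^2
    g = sum(x) - 1                  # g(x) - c = 0 enforces x1 + x2 + x3 = 1

    F = f + lam * g                 # f(x) + lambda*g(x), as differentiated in (1.4a)
    eqs = [sp.diff(F, xi) for xi in x] + [g]    # n + p = 4 equations in x and lambda
    print(sp.solve(eqs, x + [lam], dict=True))
    # [{x1: 1/3, x2: 1/3, x3: 1/3, lam: -2/3}], i.e. x_i = 1/n and lambda = -2/n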



2. Examples. (1) There is p = 1 constraint in (1.2a), so that (1.4a) becomes

    ∂/∂x_i ( Σ_{k=1}^n x_k² + λ Σ_{k=1}^n x_k ) = 2x_i + λ = 0,   1 ≤ i ≤ n

Thus x_i = −λ/2 for 1 ≤ i ≤ n. The constraint Σ_{i=1}^n x_i = 1 in (1.4b) implies Σ_{i=1}^n x_i = −nλ/2 = 1 or λ = −2/n, from which x_i = 1/n for 1 ≤ i ≤ n.
For x_i = 1/n, f(x) = n/n² = 1/n. One can check that this is a minimum as opposed to a maximum or saddle point by noting that f(x) = 1 if x_1 = 1,
x_i = 0 for 2 ≤ i ≤ n.
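As a cross-check that is not in the original notes, a general-purpose constrained optimizer should reproduce x_i = 1/n numerically; the sketch below uses scipy.optimize.minimize with n = 5, and the starting point is an arbitrary choice.

    # Numerical check of Example (1): minimize sum(x_i^2) subject to sum(x_i) = 1.
    # Assumes numpy and scipy are available; n and the starting point are arbitrary.
    import numpy as np
    from scipy.optimize import minimize

    n = 5
    res = minimize(lambda x: np.sum(x**2),
                   x0=np.full(n, 0.7),                                      # arbitrary start
                   constraints=[{'type': 'eq', 'fun': lambda x: np.sum(x) - 1}])
    print(res.x)    # approximately [0.2, 0.2, 0.2, 0.2, 0.2], i.e. x_i = 1/n
    print(res.fun)  # approximately 0.2 = 1/n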
(2) A System with Two Constraints: There are p = 2 constraints in (1.2b), which is to find

    min Σ_{i=1}^5 x_i²   subject to   x_1 + 2x_2 + x_3 = 1   and   x_3 − 2x_4 + x_5 = 6        (2.1)

The method of Lagrange multipliers says to look for solutions of

    ∂/∂x_i ( Σ_{k=1}^5 x_k² + λ(x_1 + 2x_2 + x_3) + μ(x_3 − 2x_4 + x_5) ) = 0                  (2.2)

where we write λ, μ for the two Lagrange multipliers λ_1, λ_2.


The equations (2.2) imply 2x_1 + λ = 0, 2x_2 + 2λ = 0, 2x_3 + λ + μ = 0,
2x_4 − 2μ = 0, and 2x_5 + μ = 0. Combining the first three equations with
the first constraint in (2.1) implies 2 + 6λ + μ = 0. Combining the last three
equations in (2.2) with the second constraint in (2.1) implies 12 + λ + 6μ = 0.
Thus

    6λ + μ = −2
    λ + 6μ = −12

Adding these two equations implies 7(λ + μ) = −14 or λ + μ = −2.
Subtracting the equations implies 5(λ − μ) = 10 or λ − μ = 2. Thus
(λ + μ) + (λ − μ) = 2λ = 0 and λ = 0, μ = −2. This implies x_1 = x_2 = 0,
x_3 = x_5 = 1, and x_4 = −2. The minimum value in (2.1) is 6.
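Because the equations (2.2) and the constraints in (2.1) are all linear in (x_1, ..., x_5, λ, μ), they can also be solved as a single 7 × 7 linear system. The check below is my own illustration (numpy is an assumed dependency), and it reproduces the solution found above.

    # Solve the linear system from (2.2) and (2.1); unknowns are [x1..x5, lam, mu].
    import numpy as np

    A = np.array([
        [2, 0, 0, 0, 0, 1,  0],    # 2*x1 + lam      = 0
        [0, 2, 0, 0, 0, 2,  0],    # 2*x2 + 2*lam    = 0
        [0, 0, 2, 0, 0, 1,  1],    # 2*x3 + lam + mu = 0
        [0, 0, 0, 2, 0, 0, -2],    # 2*x4 - 2*mu     = 0
        [0, 0, 0, 0, 2, 0,  1],    # 2*x5 + mu       = 0
        [1, 2, 1, 0, 0, 0,  0],    # x1 + 2*x2 + x3  = 1
        [0, 0, 1, -2, 1, 0, 0],    # x3 - 2*x4 + x5  = 6
    ], dtype=float)
    b = np.array([0, 0, 0, 0, 0, 1, 6], dtype=float)

    sol = np.linalg.solve(A, b)
    x, lam, mu = sol[:5], sol[5], sol[6]
    print(x, lam, mu)      # [0. 0. 1. -2. 1.], lam = 0.0, mu = -2.0
    print(np.sum(x**2))    # 6.0, the minimum value in (2.1)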
(3) A BLUE problem: Let X_1, ..., X_n be independent random variables
with E(X_i) = μ and Var(X_i) = σ_i². Find the coefficients a_i that minimize

    Var( Σ_{i=1}^n a_i X_i )   subject to   E( Σ_{i=1}^n a_i X_i ) = μ                         (2.3)


This asks us to find the Best Linear Unbiased Estimator Σ_{i=1}^n a_i X_i (abbreviated BLUE) for μ for given values of σ_i².

Since Var(aX) = a² Var(X) and Var(X + Y) = Var(X) + Var(Y) for independent random variables X and Y, we have Var( Σ_{i=1}^n a_i X_i ) = Σ_{i=1}^n a_i² Var(X_i) = Σ_{i=1}^n a_i² σ_i². Thus (2.3) is equivalent to finding

    min Σ_{i=1}^n a_i² σ_i²   subject to   Σ_{i=1}^n a_i = 1

Using one Lagrange multiplier λ for the constraint leads to the equations
2a_i σ_i² + λ = 0 or a_i = −λ/(2σ_i²). The constraint Σ_{i=1}^n a_i = 1 then implies
that the BLUE for μ is

    Σ_{i=1}^n a_i X_i   where   a_i = c/σ_i²   for   c = 1 / Σ_{k=1}^n (1/σ_k²)                (2.4)

If σ_i² = σ² for all i, then a_i = 1/n and Σ_{i=1}^n a_i X_i = (1/n) Σ_{i=1}^n X_i = X̄ is
the BLUE for μ.

Conversely, if Var(X_i) = σ_i² is variable, then the BLUE Σ_{i=1}^n a_i X_i
for μ puts relatively less weight on the noisier (higher-variance) observations
(that is, the weight a_i is smaller), but still uses the information in the noisier
observations. Formulas like (2.4) are often used in survey sampling.
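To see how the weights in (2.4) behave, the short sketch below (not from the notes; the variances are made-up values and numpy is an assumed dependency) computes a_i for one choice of σ_i² and compares the variance of the BLUE with that of the unweighted mean.

    # Weights from (2.4): a_i = c / sigma_i^2 with c = 1 / sum_k (1 / sigma_k^2).
    import numpy as np

    sigma2 = np.array([1.0, 1.0, 4.0, 9.0, 25.0])     # Var(X_i): illustrative values only
    a = (1.0 / sigma2) / np.sum(1.0 / sigma2)         # BLUE weights
    print(a, a.sum())                                 # weights sum to 1 (unbiasedness)

    # Var(sum a_i X_i) = sum a_i^2 sigma_i^2 for independent X_i
    var_blue = np.sum(a**2 * sigma2)
    var_mean = np.sum(sigma2) / len(sigma2)**2        # plain average, a_i = 1/n
    print(var_blue, var_mean)                         # about 0.42 versus 1.6 here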
3. A Short Proof of Lagrange's Theorem. The extremal condition
(1.3) (without any constraints) can be written in vector form as

    ∇f(x) = ( ∂f(x)/∂x_1, ∂f(x)/∂x_2, ..., ∂f(x)/∂x_n ) = 0                                    (3.1)

Now by Taylor's Theorem

    f(x + hy) = f(x) + h y·∇f(x) + O(h²)                                                       (3.2)

where h is a scalar, O(h²) denotes terms that are bounded by h², and x·y is
the dot product. Thus, by (3.2), ∇f(x) gives the vector direction in which f(x) changes
the most per unit change in x, where unit change is measured in terms of
the length of the vector x.

In particular, if y = ∇f(x) ≠ 0, then

    f(x − hy) < f(x) < f(x + hy)


for sufficiently small values of h, and the only way that x can be a local
minimum or maximum would be if x were on the boundary of the set of
points where f(x) is defined. This implies that ∇f(x) = 0 at non-boundary
minimum and maximum values of f(x).
Now consider the problem of finding

    max f(x)   subject to   g(x) = c                                                           (3.3)

for one constraint. If x = x_1(t) is a path in the surface defined by g(x) = c,
then by the chain rule

    (d/dt) g(x_1(0)) = (d/dt)x_1(0) · ∇g(x_1(0)) = 0                                           (3.4)

This implies that ∇g(x_1(0)) is orthogonal to the tangent vector (d/dt)x_1(0)
for any path x_1(t) in the surface defined by g(x) = c.
Conversely, if x is any point in the surface g(x) = c and y is any vector
such that y·∇g(x) = 0, then it follows from the Implicit Function Theorem that
there exists a path x_1(t) in the surface g(x) = c such that x_1(0) = x and
(d/dt)x_1(0) = y. This result and (3.4) imply that the gradient vector ∇g(x)
is always orthogonal to the surface defined by g(x) = c at x.

Now let x be a solution of (3.3). I claim that ∇f(x) = λ∇g(x) for some
scalar λ. First, we can always write ∇f(x) = c∇g(x) + y where y·∇g(x) = 0.
If x(t) is a path in the surface with x(0) = x and (d/dt)x(0) · ∇f(x) ≠ 0,
it follows from (3.2) with y = (d/dt)x(0) that there are values of f(x) for
x = x(t) in the surface that are both larger and smaller than f(x).

Thus, if x is a maximum or minimum of f(x) in the surface and ∇f(x) =
c∇g(x) + y for y·∇g(x) = 0, then y·∇f(x) = c y·∇g(x) + y·y = y·y = 0
and y = 0. This means that ∇f(x) = c∇g(x), which completes the proof of
Lagrange's Theorem for one constraint (p = 1).
Next, suppose that we want to solve

    max f(x)   subject to   g_1(x) = c_1, ..., g_p(x) = c_p                                    (3.5)

for p constraints. Let x be a solution of (3.5). Recall that each vector
∇g_j(x) is orthogonal to the surface g_j(x) = c_j at x. Let L be the linear
space

    L = span{ ∇g_j(x) : 1 ≤ j ≤ p }

I claim that ∇f(x) ∈ L. This would imply

    ∇f(x) = Σ_{j=1}^p λ_j ∇g_j(x)



for some choice of scalar values λ_j, which would prove Lagrange's Theorem.

To prove that ∇f(x) ∈ L, first note that, in general, we can write
∇f(x) = w + y where w ∈ L and y is perpendicular to L, which means that
y·z = 0 for any z ∈ L. In particular, y·∇g_j(x) = 0 for 1 ≤ j ≤ p. Now find a
path x_1(t) through x in the intersection of the surfaces g_j(x) = c_j such that
x_1(0) = x and (d/dt)x_1(0) = y. (The existence of such a path for sufficiently
small t follows from a stronger form of the Implicit Function Theorem.) It
then follows from (3.2) and (3.5) that y·∇f(x) = 0. Since ∇f(x) = w + y
where y·w = 0, it follows that y·∇f(x) = y·w + y·y = y·y = 0 and y = 0.
This implies that ∇f(x) = w ∈ L, which completes the proof of Lagrange's
Theorem.
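The conclusion ∇f(x) ∈ L can be checked numerically at the solution of Example (2). The least-squares check below is my own illustration (assuming numpy); note that the coefficients it recovers are −λ and −μ from Example (2), since (1.4a) puts a plus sign on the λ_j g_j(x) term.

    # Check that grad f lies in L = span{grad g_1, grad g_2} at x = (0, 0, 1, -2, 1).
    import numpy as np

    x = np.array([0.0, 0.0, 1.0, -2.0, 1.0])
    grad_f  = 2 * x                                   # gradient of sum(x_i^2)
    grad_g1 = np.array([1.0, 2.0, 1.0, 0.0, 0.0])     # gradient of x1 + 2*x2 + x3
    grad_g2 = np.array([0.0, 0.0, 1.0, -2.0, 1.0])    # gradient of x3 - 2*x4 + x5

    G = np.column_stack([grad_g1, grad_g2])
    coef, residual, _, _ = np.linalg.lstsq(G, grad_f, rcond=None)
    print(coef)        # [0. 2.]: grad f = 0*grad g1 + 2*grad g2, so grad f is in L
    print(residual)    # essentially zero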
4. Warnings. The same warnings apply here as for most methods for
finding a maximum or minimum:

The system (1.4) does not look for a maximum (or minimum) of f(x)
subject to the constraints g_j(x) = c_j, but only a point x on the set of values
determined by g_j(x) = c_j whose first-order changes in x are zero. This condition is
satisfied by a value x = x_0 that provides a minimum or maximum of f(x) in a
neighborhood of x_0, but that point may only be a local minimum or
maximum. There may be several local minima or maxima, each yielding a
solution of (1.4). The criterion (1.4) also holds for saddle points of f(x)
that are local maxima in some directions or coordinates and local minima in
others. In these cases, the different values of f(x) at the solutions of (1.4) have
to be evaluated individually to find the global maximum.
A particular situation to avoid is looking for a maximum value of f(x)
by solving (1.4) or (1.3) when f(x) takes arbitrarily large values when any of
the components of x are large (as is the case for f(x) in (1.2)) and (1.4) has a
unique solution x_0. In that case, x_0 is probably the global minimum of f(x)
subject to the constraints, and not a maximum, so rather than finding the best
possible value of f(x), one may end up with the worst possible value. After
solving (1.3) or (1.4), one often has to look at the problem more carefully to
see whether the solution is a global maximum, a global minimum, or neither.
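For example (this small check is my own addition, assuming numpy), the objective in (1.2a) has no maximum at all on its constraint set: points of the form (t, 1 − t, 0, ..., 0) satisfy the constraint for every t, and f grows without bound, so the unique solution of (1.4) can only be the minimum.

    # On the constraint set sum(x_i) = 1 of (1.2a), f(x) = sum(x_i^2) is unbounded above.
    import numpy as np

    n = 5
    for t in [1.0, 10.0, 100.0]:
        x = np.zeros(n)
        x[0], x[1] = t, 1.0 - t               # still satisfies sum(x) = 1
        print(t, np.sum(x), np.sum(x**2))     # constraint holds while f keeps growing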
Another situation to avoid is when the maximum or minimum is on the
boundary of the set of values for which f(x) is defined. In that case, the maximum
or minimum is not an interior value, and the first-order changes in f(x) (that
is, the partial derivatives of f(x)) may not be zero at that point. An example
is f(x) = x on the unit interval 0 ≤ x ≤ 1. The minimum of f(x) = x
on the interval is at x = 0 and the maximum is at x = 1, but neither is a solution
of f′(x) = 0.
