Analytical Methods


Contents

1 Theory of maxima and minima
  1.1 Statement of an optimization problem. Terminology
  1.2 Unconstrained optimization
    1.2.1 Necessary conditions for maxima and minima
    1.2.2 Sufficient conditions for maxima and minima
      1.2.2.1 Sufficient conditions for single variable functions
      1.2.2.2 Sufficient conditions for two independent variables
      1.2.2.3 Sufficient conditions for n independent variables
  1.3 Constrained optimization
    1.3.1 Problem formulation
    1.3.2 Handling inequality constraints
    1.3.3 Analytical methods for optimization problems with equality constraints. Solution by substitution
    1.3.4 Lagrange multipliers
    1.3.5 Sufficient conditions for constrained optimization
    1.3.6 Karush-Kuhn-Tucker conditions
  1.4 Exercises

Chapter 1
Theory of maxima and minima

The classical theory of maxima and minima provides analytical methods, based
on differential calculus, for finding solutions of optimization problems involving
continuous and differentiable functions. Applications of these techniques may be
limited, since many practical problems involve functions that are not continuous
or differentiable. Nevertheless, the theory of maxima and minima is the fundamental
starting point for numerical methods of optimization and a basis for advanced
topics such as the calculus of variations and optimal control.
This chapter presents necessary and sufficient conditions for locating the
optimum solutions of unconstrained and constrained optimization problems, for
single-variable and multivariable functions.

1.1 Statement of an optimization problem. Terminology

Given a vector of n independent variables:

x = (x1  x2  . . .  xn)^T    (1.1)

and a scalar function:


f : ℝ^n → ℝ    (1.2)


an optimization problem (P) can be formulated as follows:



min_x f(x)   or   max_x f(x)    (1.3)

subject to:

gi(x) = 0,  i = 1, ..., m    (1.4)
hj(x) ≤ 0,  j = 1, ..., p    (1.5)

The objective is to find the vector of parameters x which minimizes (or
maximizes) the given scalar function f, possibly subject to some restrictions
on the allowed parameter values. The function f to be optimized is the objec-
tive function; the elements of vector x are the control or decision variables; the
restrictions (1.4) and (1.5) are the equality or inequality constraints.
The value x* of the variables which solves the problem is a minimizer (or
maximizer) of the function f subject to the constraints (1.4) and (1.5), and f(x*)
is the minimum (or maximum) value of the function subject to the same
constraints.
If the number of constraints m + p is zero, the problem is called an uncon-
strained optimization problem.
The admissible set or feasible region of (P), denoted S, is defined as:

S = {x ∈ ℝ^n : gi(x) = 0, i = 1, ..., m;  hj(x) ≤ 0, j = 1, ..., p}    (1.6)

Example 1.1 Consider the problem:

min_{(x1,x2)}  (1 − x1)² + (1 − x2)²    (1.7)

subject to

x1 + x2 − 1 ≤ 0,    (1.8)
x1³ − x2 ≤ 0    (1.9)

The function to be minimized is f(x1, x2) = (1 − x1)² + (1 − x2)² and the
constraints are: h1(x1, x2) = x1 + x2 − 1 and h2(x1, x2) = x1³ − x2.


Figure 1.1: Feasible region

Figure 1.1 shows the contour plot of the objective function, i.e. the curves in two
dimensions on which the value of the function f (x1 , x2 ) is constant. The feasible
region, obtained as the area for which the constraints (1.8) and (1.9) hold, is shown
in the same figure.
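
As an illustrative aside (not part of the original text), the following Python sketch evaluates the objective of Example 1.1 and checks feasibility at a few trial points; the helper names f and is_feasible are chosen here only for illustration.

    def f(x1, x2):
        # Objective of Example 1.1
        return (1 - x1)**2 + (1 - x2)**2

    def is_feasible(x1, x2):
        # Both inequality constraints of Example 1.1 must hold: h1 <= 0 and h2 <= 0
        h1 = x1 + x2 - 1
        h2 = x1**3 - x2
        return h1 <= 0 and h2 <= 0

    for point in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]:
        print(point, f(*point), is_feasible(*point))
    # (0.0, 0.0) and (0.5, 0.5) are feasible; (1.0, 1.0) violates h1 <= 0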

As shown in Figure 1.2, the minima of a function f are the maxima of −f.
Therefore, optimization problems will be stated, in general, as minimization
problems. The term extremum includes both maximum and minimum.

Figure 1.2: min f(x) = −max(−f(x))


A point x0 is a global minimum of f(x) if:

f(x0) < f(x),  ∀x ∈ S, x ≠ x0    (1.10)

A point x0 is a strong local minimum if there exists some ε > 0, such that:

f(x0) < f(x),  when 0 < |x − x0| < ε    (1.11)

A point x0 is a weak local minimum if there exists some ε > 0, such that:

f(x0) ≤ f(x),  when |x − x0| < ε    (1.12)

Figure 1.3: Minimum points. x1: weak local minimum, x2: global minimum,
x3: strong local minimum

1.2 Unconstrained optimization

1.2.1 Necessary conditions for maxima and minima

The existence of a solution to an optimization problem (P), for a continuous
function f, is guaranteed by the extreme value theorem of Weierstrass, which
states:

Theorem 1.1 If a function f(x) is continuous on a closed interval [a, b], then f(x)
has both a maximum and a minimum on [a, b]. If f(x) has an extremum on an open
interval (a, b), then the extremum occurs at a critical point (Renze and Weisstein,
2004).

A single variable function f(x) has critical points at all points x0 where
the first derivative is zero (f'(x0) = 0), or where f(x) is not differentiable.
A function of several variables f(x) has critical points where the gradient
is zero or the partial derivatives are not defined.
In general, a stationary point of a function f(x) is a point for which the
gradient vanishes:

∇f(x0) = 0    (1.13)

where

∇f(x0) = [∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn]^T    (1.14)

A stationary point of a single variable function is a point where the first
derivative f'(x0) equals zero.

Example 1.2 Consider the cases in Figure 1.4.

Figure 1.4: Stationary points

a) In Figure 1.4 a) the point x1 is the global minimizer.

b) In Figure 1.4 b) there are three stationary points: x2 is a global minimizer,
x3 is a local maximizer and x4 is a local minimizer.

c) In Figure 1.4 c) x5 is a stationary point because the first derivative of f
vanishes, but it is neither a maximizer nor a minimizer; x5 is an inflection point.
For f : [a, b] → ℝ the point x = a is the global minimizer.


For a continuously differentiable function f, the only case when a minimizer
(or maximizer) is not a stationary point is when the point is an endpoint of the
interval [a, b] on which f is defined. That is, any extremum inside this interval
must be a stationary point.
The first-order condition: The necessary condition for a point x* to be a minimizer
(or a maximizer) of the continuously differentiable function f : [a, b] → ℝ is: if
x* ∈ (a, b), then x* is a stationary point of f.

Local extrema of a function f : [a, b] → ℝ, f continuously differentiable,
may occur only at:

- boundaries

- stationary points (points where the first derivative of f is zero)

If the function is non-differentiable at some points in [a, b], the function
may also have extreme values at the points where it is not differentiable.
For a continuously differentiable function of n independent variables
f : ℝ^n → ℝ, the necessary condition for a point x0 = (x10 x20 . . . xn0)^T to be an
extremum is that the gradient equals zero.
For continuously differentiable functions of n variables, the stationary
points can be: minima, maxima or saddle points.

1.2.2 Sufficient conditions for maxima and minima

Since not all stationary points are necessarily minima or maxima (they can
also be inflection or saddle points), we can determine their character by
examining the second derivative of the function at the stationary point. These
sufficient conditions will be developed for single variable functions and then
extended for two or n variables based on the same concepts. The global min-
imum or maximum has to be located by comparing all local maxima and
minima.

1.2.2.1 Sufficient conditions for single variable functions

Second-order conditions for optimum. Let f be a single variable function
with continuous first and second derivatives, defined on an interval S, f : S → ℝ,
and let x0 be a stationary point of f, so that f'(x0) = 0.


The Taylor series expansion about the stationary point x0 is one possibility
to justify the second-order conditions:

f(x) = f(x0) + f'(x0)(x − x0) + (1/2)f''(x0)(x − x0)² + higher order terms    (1.15)

For points x sufficiently close to x0 the higher order terms become neg-
ligible compared to the second-order terms. Knowing the first derivative is
zero at a stationary point, the equation (1.15) becomes:

f(x) = f(x0) + (1/2)f''(x0)(x − x0)²    (1.16)

Since (x − x0)² is always positive, we can determine whether x0 is a local
maximum or minimum by examining the value of the second derivative f''(x0):

- If f''(x0) > 0, the term (1/2)f''(x0)(x − x0)² will add to f(x0) in equation
  (1.16), so the value of f at the neighboring points x is greater than f(x0).
  In this case x0 is a local minimum.

- If f''(x0) < 0, the term (1/2)f''(x0)(x − x0)² will subtract from f(x0) and
  the value of f at the neighboring points x is less than f(x0). In this case
  x0 is a local maximum.

- If f''(x0) = 0, it is necessary to examine the higher order derivatives.


In general if f 00 (x0 ) = ... = f (k1) (x0 ) = 0, the Taylor series expansion
becomes
1
f (x) = f (x0 ) + f (n) (x0 )(x x0 )k (1.17)
k!
If k is even, then (x x0 )k is positive. Thus, if f (k) (x0 ) > 0, then
f (x0 ) is a minimum; if f (k) (x0 ) < 0 then f (x0 ) is a maximum (Fig-
ure 1.5).
If k is odd, then (x x0 )k changes sign. It is positive for x > x0
and negative for x < x0 . If f (k) (x0 ) > 0, the second term in the
equation (1.17) is positive for x > x0 and negative for x < x0 . If
f (k) (x0 ) < 0, the second term in the equation (1.17) is negative for
x > x0 and positive for x < x0 . The stationary point is an inflection
point (Figure 1.5).


Figure 1.5: Minimum, maximum and inflection points

These results can be summarized in the following rules:

- If f''(x0) < 0, then x0 is a local maximizer.

- If f''(x0) > 0, then x0 is a local minimizer.

- If f''(x0) = ... = f^(k−1)(x0) = 0 and:
  - k is even:
    - if f^(k)(x0) < 0, then x0 is a local maximizer;
    - if f^(k)(x0) > 0, then x0 is a local minimizer.
  - k is odd: x0 is an inflection point.

Example 1.3 Locate the extreme points of the following function:

f(x) = x⁵/5 − x³/3    (1.18)

The first derivative is:

f'(x) = x⁴ − x² = x²(x² − 1) = x²(x − 1)(x + 1)    (1.19)

The stationary points are obtained by setting the first derivative equal to zero:

x1 = 1;  x2 = −1;  x3 = x4 = 0    (1.20)

The second derivative:

f''(x) = 4x³ − 2x    (1.21)


calculated at x1,2,3,4 is:

f''(1) = 2,  f''(−1) = −2,  f''(0) = 0    (1.22)

Because f''(1) > 0 and f''(−1) < 0, the stationary point x1 = 1 is a local minimum
and x2 = −1 is a local maximum. Since the second derivative is zero at x3,4 = 0, an
analysis of higher order derivatives is necessary. The third derivative of f:

f'''(x) = 12x² − 2    (1.23)

is nonzero at x3,4 = 0. Since the order of the first nonzero derivative is 3, i.e., it is
odd, the stationary points x3 = x4 = 0 are inflection points. A plot of the function
showing the local minimum, maximum and inflection points is shown in Figure 1.6.

Figure 1.6: Plot of f(x) = x⁵/5 − x³/3
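
As a hedged illustration of the rules above, the following SymPy sketch (assuming Python with SymPy is available; the variable names are illustrative) finds the stationary points of the function from Example 1.3 and classifies each one by the first nonvanishing higher-order derivative.

    import sympy as sp

    x = sp.symbols('x')
    f = x**5/5 - x**3/3

    # Stationary points: zeros of the first derivative
    stationary = sp.solve(sp.diff(f, x), x)

    for x0 in stationary:
        # Find the order k of the first nonvanishing derivative at x0 (k >= 2)
        k = 2
        while sp.diff(f, x, k).subs(x, x0) == 0:
            k += 1
        value = sp.diff(f, x, k).subs(x, x0)
        if k % 2 == 1:
            kind = 'inflection point'
        elif value > 0:
            kind = 'local minimum'
        else:
            kind = 'local maximum'
        print(x0, kind)
    # -1: local maximum, 0: inflection point, 1: local minimum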

Example 1.4 Consider the function

f(x) = 4x − x³/3    (1.24)

Determine the maximum and minimum points in the region:

−3 ≤ x ≤ 1    (1.25)

Compute the first derivative of f(x) and set it equal to zero:

f'(x) = 4 − x² = 0    (1.26)


The stationary points of the function are x10 = −2 and x20 = 2. Since the variable
is constrained by (1.25) and x20 is out of the bounds, we shall analyze the other
stationary point and the boundaries. The second derivative:

f''(x10) = −2x10 = −2·(−2) = 4 > 0    (1.27)

is positive at x10, thus −2 is a local minimum, as shown in Figure 1.7. According to
the theorem of Weierstrass, the function must have a maximum value in the interval
[−3, 1]. On the boundaries, the function takes the values: f(−3) = −3 and f(1) =
11/3. Thus, the point x = 1 is the maximizer in [−3, 1].

Figure 1.7: Plot of f(x) = 4x − x³/3
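
A minimal numerical sketch of the procedure used in Example 1.4, assuming Python: collect the stationary points that fall inside the interval together with the endpoints, and compare the function values.

    f = lambda x: 4*x - x**3/3

    # Candidates: stationary points inside [-3, 1] plus the interval endpoints
    candidates = [x for x in (-2.0, 2.0) if -3 <= x <= 1] + [-3.0, 1.0]
    values = {x: f(x) for x in candidates}

    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    print(values)         # {-2.0: -5.33..., -3.0: -3.0, 1.0: 3.66...}
    print(x_min, x_max)   # -2.0 is the minimizer, 1.0 the maximizer in [-3, 1]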

1.2.2.2 Sufficient conditions for two independent variables

Second derivative test

Consider the function f(x1, x2) and denote the first-order partial derivatives
of f with respect to x1 and x2 by:

∂f(x1, x2)/∂x1 = fx1,   ∂f(x1, x2)/∂x2 = fx2    (1.28)

and the second-order partial derivatives:

∂²f(x1, x2)/∂xi∂xj = fxixj,   i, j = 1, 2    (1.29)

If f(x1, x2) has a local extremum at a point (x10, x20) and has continuous
partial derivatives at this point, then

fx1(x10, x20) = 0,   fx2(x10, x20) = 0    (1.30)

The second partial derivatives test classifies the point as a local maximum
or local minimum.
Define the second derivative test discriminant as:

D2 = fx1x1 fx2x2 − (fx1x2)²    (1.31)

Then, (Weisstein, 2004):

- If D2(x10, x20) > 0 and fx1x1(x10, x20) > 0, the point is a local minimum

- If D2(x10, x20) > 0 and fx1x1(x10, x20) < 0, the point is a local maximum

- If D2(x10, x20) < 0, the point is a saddle point

- If D2(x10, x20) = 0, the test is inconclusive and higher order tests must
  be used.

Note that the second derivative test discriminant, D2, is the determinant
of the Hessian matrix:

H2 = [ fx1x1  fx1x2 ]
     [ fx1x2  fx2x2 ]    (1.32)
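
A small SymPy sketch of the two-variable second derivative test (the helper name classify_2d is illustrative, not from the text): it solves the stationarity conditions, evaluates the discriminant D2 = det(H2) and fx1x1 at each stationary point, and applies the rules above.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')

    def classify_2d(f):
        """Second derivative test for f(x1, x2) at each stationary point."""
        fx1, fx2 = sp.diff(f, x1), sp.diff(f, x2)
        points = sp.solve([fx1, fx2], [x1, x2], dict=True)
        H = sp.hessian(f, (x1, x2))
        for p in points:
            D2 = H.det().subs(p)
            fx1x1 = H[0, 0].subs(p)
            if D2 > 0:
                kind = 'local minimum' if fx1x1 > 0 else 'local maximum'
            elif D2 < 0:
                kind = 'saddle point'
            else:
                kind = 'test inconclusive'
            print((p[x1], p[x2]), kind)

    classify_2d(x1**2 - x2**2)   # saddle point at (0, 0); the function of Example 1.5 below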

Example 1.5 Locate the stationary points of the function:

f(x1, x2) = x1² − x2²    (1.33)

and determine their character.

The stationary points are computed by setting the gradient equal to zero:

fx1 = 2x1 = 0    (1.34)
fx2 = −2x2 = 0    (1.35)

The function has only one stationary point: (x10, x20) = (0, 0). Compute the second
derivatives:

fx1x1 = 2,  fx1x2 = 0,  fx2x2 = −2    (1.36)


and the determinant of the Hessian matrix is:

D2 = | fx1x1  fx1x2 | = fx1x1 fx2x2 − (fx1x2)² = 2·(−2) − 0² = −4 < 0    (1.37)
     | fx1x2  fx2x2 |

According to the second derivative test, the point (0, 0) is a saddle point. A mesh
and contour plot of the function is shown in Figure 1.8.

Figure 1.8: Mesh and contour plot of f(x1, x2) = x1² − x2²

Example 1.6 Locate the stationary points of the function:

f(x1, x2) = 1 − x1² − x2²    (1.38)

and determine their character.

Compute the stationary points from:

fx1 = −2x1 = 0    (1.39)
fx2 = −2x2 = 0    (1.40)

The function has only one stationary point: (x10, x20) = (0, 0). The second
derivatives are:

fx1x1 = −2 < 0,  fx1x2 = 0,  fx2x2 = −2    (1.41)

and the discriminant:

D2 = | fx1x1  fx1x2 | = fx1x1 fx2x2 − (fx1x2)² = (−2)·(−2) − 0² = 4 > 0    (1.42)
     | fx1x2  fx2x2 |


Thus, the function has a maximum at (0, 0), because fx1x1 < 0 and D2 > 0. The
graph of the function is shown in Figure 1.9.

Figure 1.9: Mesh plot of f(x1, x2) = 1 − x1² − x2²

Example 1.7 Locate the stationary points of the function:

f(x1, x2) = x1² + 2x1x2 + x2⁴ − 2    (1.43)

and determine their character.

Compute the stationary points from:

fx1 = 2x1 + 2x2 = 0    (1.44)
fx2 = 2x1 + 4x2³ = 0    (1.45)

From (1.44) and (1.45):

x1 = −x2,   x1 − 2x1³ = 0,   or   x1(1 − √2·x1)(1 + √2·x1) = 0

and the stationary points are: (0, 0), (1/√2, −1/√2) and (−1/√2, 1/√2).
The second derivatives are:

fx1x1 = 2,  fx1x2 = 2,  fx2x2 = 12x2²    (1.46)

and the discriminant:

D2 = | fx1x1  fx1x2 | = | 2   2      | = 24x2² − 4    (1.47)
     | fx1x2  fx2x2 |   | 2   12x2²  |


- For x2 = 0, D2 = −4 < 0 and (0, 0) is a saddle point

- For x2 = −1/√2, D2 = 24/2 − 4 = 8 > 0 and (1/√2, −1/√2) is a minimum

- For x2 = 1/√2, D2 = 8 > 0 and (−1/√2, 1/√2) is also a minimum

A plot of the function and the contour lines are shown in Figure 1.10.

Figure 1.10: Mesh and contour plot of f(x1, x2) = x1² + 2x1x2 + x2⁴ − 2

1.2.2.3 Sufficient conditions for n independent variables

Second derivative test

Let f : ℝ^n → ℝ be a function of n independent variables, and x0 =
[x10 x20 . . . xn0]^T a stationary point. The Taylor series expansion about x0 is:

f(x) = f(x0) + ∇f(x0)^T (x − x0) + (1/2)(x − x0)^T H2 (x − x0) + higher order terms    (1.48)

where H2 is the Hessian matrix defined by:

H2 = [ fx1x1  fx1x2  . . .  fx1xn ]
     [ fx2x1  fx2x2  . . .  fx2xn ]    (1.49)
     [ ...    ...    . . .  ...   ]
     [ fxnx1  fxnx2  . . .  fxnxn ]

If x is sufficiently close to x0, the terms containing (xi − xi0)^k, k > 2,
become very small and the higher order terms can be neglected. The first
derivatives of f are zero at a stationary point, thus the relation (1.48) can be
written as:

f(x) = f(x0) + (1/2)(x − x0)^T H2 (x − x0)    (1.50)
The sign of the quadratic form which occurs in (1.50) as the second term
on the right-hand side will decide the character of the stationary point x0 ,
similarly to single variable functions.
According to (Hancock, 1960), we can determine whether the quadratic
form is positive or negative by evaluating the signs of the determinants of
the upper-left sub-matrices of H2 :

fx1 x1 fx1 x2 . . . fx1 xi


fx2 x1 fx2 x2 . . . fx2 xi
Di = , i = 1, n
(1.51)
... ... ... ...

fxi x1 fxi x2 . . . fxi xi

- If Di(x0) > 0, i = 1, ..., n, or H2 is positive definite, the quadratic form is
  positive and x0 is a local minimizer of f.

- If Di(x0) > 0, i = 2, 4, ..., and Di(x0) < 0, i = 1, 3, ..., or H2 is negative
  definite, the quadratic form is negative and x0 is a local maximizer of f.

- If H2 has both positive and negative eigenvalues, x0 is a saddle point.

- Otherwise, the test is inconclusive.

Note that if the Hessian is positive semidefinite or negative semidefinite, the
test is inconclusive.
The sufficient conditions presented above are the same as the ones stated
for two independent variables, when n = 2. For example, a stationary
point (x10 , x20 ) is a local maximizer of a function f when fx1 x1 (x10 , x20 ) =
D1 (x10 , x20 ) < 0 and D2 (x10 , x20 ) > 0.

Example 1.8 Compute the stationary points of the function:

f(x1, x2, x3) = x1³ − 3x1 + x2² + x3²    (1.52)

and classify them.


The first derivatives are:

fx1 = 3x1² − 3,  fx2 = 2x2,  fx3 = 2x3    (1.53)

If they are set to zero we obtain two stationary points:

x10 = 1, x20 = 0, x30 = 0
x10 = −1, x20 = 0, x30 = 0    (1.54)

The second-order derivatives:

fx1x1 = 6x1,  fx2x2 = 2,  fx3x3 = 2,  fx1x2 = fx1x3 = fx2x3 = 0    (1.55)

will form the Hessian matrix:

H2 = [ 6x1  0  0 ]
     [ 0    2  0 ]    (1.56)
     [ 0    0  2 ]

For (1, 0, 0) the determinants of the upper-left submatrices:

D1 = 6 > 0,   D2 = | 6  0 | = 12 > 0,
                   | 0  2 |

D3 = | 6  0  0 |
     | 0  2  0 | = 24 > 0    (1.57)
     | 0  0  2 |

are all positive, so the stationary point is a minimizer of f(x1, x2, x3).

For (−1, 0, 0) the determinants D1, D2, D3 are all negative, as results from:

D1 = −6 < 0,   D2 = | −6  0 | = −12 < 0,
                    |  0  2 |

D3 = | −6  0  0 |
     |  0  2  0 | = −24 < 0    (1.58)
     |  0  0  2 |

The Hessian matrix is diagonal, so its eigenvalues are easily determined as −6, 2, 2.


Because they do not have the same sign and are nonzero, (−1, 0, 0) is a saddle point.
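
A short NumPy sketch of the eigenvalue form of this test (assuming Python with NumPy; the helper name is illustrative), applied to the diagonal Hessians obtained in Example 1.8.

    import numpy as np

    def classify_stationary(H):
        """Classify a stationary point from its (symmetric) Hessian H."""
        eig = np.linalg.eigvalsh(H)
        if np.all(eig > 0):
            return 'local minimum'      # H positive definite
        if np.all(eig < 0):
            return 'local maximum'      # H negative definite
        if np.any(eig > 0) and np.any(eig < 0):
            return 'saddle point'
        return 'test inconclusive'      # semidefinite case

    # Hessians of Example 1.8 at (1, 0, 0) and (-1, 0, 0)
    print(classify_stationary(np.diag([6.0, 2.0, 2.0])))    # local minimum
    print(classify_stationary(np.diag([-6.0, 2.0, 2.0])))   # saddle point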

1.3 Constrained optimization


1.3.1 Problem formulation

A typical constrained optimization problem can be formulated as:

min_x f(x)    (1.59)

subject to

gi(x) = 0,  i = 1, ..., m    (1.60)
hj(x) ≤ 0,  j = 1, ..., p    (1.61)

where x is a vector of n independent variables, [x1 x2 . . . xn]^T, and f, gi and
hj are scalar multivariate functions.
In practical problems, the values of independent variables are often lim-
ited since they usually represent physical quantities. The constraints on the
variables can be expressed in the form of equations (1.60) or inequalities
(1.61). A solution of a constrained optimization problem will minimize the
objective function while satisfying the constraints.
In a well formulated problem, the number of linearly independent equality
constraints must be less than or equal to the number of variables, m ≤ n. The
constraints (1.60) form a system of m (in general) nonlinear equations with
n variables. When m > n, the problem is overdetermined and a solution
may not exist because there are more equations than variables. If m = n, the
values of the variables are uniquely determined and there is no optimization
problem.
At a feasible point x an inequality constraint is said to be active (or tight)
if hj (x) = 0 and inactive (or loose) if the strict inequality hj (x) < 0 is satisfied.

Example 1.9 Minimize

f(x1, x2) = x1² + x2²    (1.62)

subject to

g(x1, x2) = x1 + 2x2 + 4 = 0    (1.63)


The plot of the function and the constraint are illustrated in Figure 1.11, as a 3D
surface and contour lines.

Figure 1.11: Mesh and contour plot of f(x1, x2) = x1² + x2² and the constraint
x1 + 2x2 + 4 = 0

The purpose now is to minimize the function x1² + x2² subject to the condition that
the variables x1 and x2 lie on the line x1 + 2x2 + 4 = 0. Graphically, the minimizer
must be located on the curve obtained as the intersection of the function surface and
the vertical plane that passes through the constraint line. In the x1x2 plane, the
minimizer can be found as the point where the line x1 + 2x2 + 4 = 0 and a level
curve of x1² + x2² are tangent.

1.3.2 Handling inequality constraints

Optimization problems with inequality constraints can be converted into
equality-constrained problems and solved using the same approach. In this
course, the method of slack variables is presented.
The functions hj(x) from the inequalities (1.61) can be brought to zero by
adding a nonnegative quantity:

hj(x) + sj² = 0,  j = 1, ..., p    (1.64)

Notice that the slack variables sj are squared, so the added quantity is nonnegative.
They are additional variables, so the number of unknown variables increases to n + p.
An inequality given in the form:

hj(x) ≥ 0,  j = 1, ..., p    (1.65)


can also be changed into an equality if a nonnegative quantity is subtracted from
the left hand side:

hj(x) − sj² = 0,  j = 1, ..., p    (1.66)

Example 1.10 Maximize

f(x1, x2) = −x1² − x2²    (1.67)

subject to

x1 + x2 ≤ 4    (1.68)

The inequality (1.68) is converted into an equality by introducing a slack variable s1²:

x1 + x2 − 4 + s1² = 0    (1.69)

The unknown variables are now: x1, x2 and s1.

1.3.3 Analytical methods for optimization problems with equality constraints. Solution by substitution

This method can be applied when it is possible to solve the constraint equations
for m of the variables, where the number of constraints is less than the total
number of variables, m < n. The solution of the constraint equations is then
substituted into the objective function. The new problem will have n − m
unknowns, it will be unconstrained, and the techniques for unconstrained
optimization can be applied.

Example 1.11 Let w, h, d be the width, height and depth of a box (a rectangular
parallelepiped). Find the optimal shape of the box to maximize the volume, when the
sum w + h + d is 120.
The problem can be formulated as:

max_{(w,h,d)} whd    (1.70)

subject to

w + h + d − 120 = 0    (1.71)


We solve (1.71) for one of the variables, for example d, and then substitute the
result into (1.70):

d = 120 − w − h    (1.72)

The new problem is:

max_{(w,h)} wh(120 − w − h)    (1.73)

Let:

f(w, h) = wh(120 − w − h) = 120wh − w²h − wh²    (1.74)

Compute the stationary points from:

∂f/∂w = 120h − 2wh − h² = h(120 − 2w − h) = 0    (1.75)
∂f/∂h = 120w − w² − 2wh = w(120 − w − 2h) = 0    (1.76)

The solutions are: w = h = 0 (not convenient) and w = h = 40.

Determine whether (40, 40) is a minimum or maximum point. Write the
determinant:

D2 = | ∂²f/∂w²    ∂²f/∂w∂h | = | −2h            120 − 2w − 2h |
     | ∂²f/∂w∂h   ∂²f/∂h²  |   | 120 − 2w − 2h  −2w           |

   = | −80  −40 |    (1.77)
     | −40  −80 |

Since

D1 = ∂²f/∂w² = −80 < 0   and   D2 = 4800 > 0

the point (40, 40) is a maximum. From (1.72) we have: d = 120 − 40 − 40 = 40,
thus the box should have all sides equal: w = d = h = 40.
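
The substitution step of Example 1.11 can be reproduced symbolically; the following SymPy sketch (illustrative, not part of the original text) finds the stationary points of the reduced two-variable volume and checks the second derivative test at (40, 40).

    import sympy as sp

    w, h = sp.symbols('w h')

    # Substitute d = 120 - w - h into the volume and work with (w, h) only
    volume = w * h * (120 - w - h)
    stationary = sp.solve([sp.diff(volume, w), sp.diff(volume, h)], [w, h], dict=True)
    print(stationary)   # degenerate solutions with w = 0 or h = 0, plus {w: 40, h: 40}

    H = sp.hessian(volume, (w, h)).subs({w: 40, h: 40})
    print(H.det(), H[0, 0])   # 4800 and -80: D2 > 0 with f_ww < 0, so (40, 40) is a maximum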

Example 1.12 Maximize

f(x1, x2) = −x1² − x2²    (1.78)

subject to:

x1 + x2 = 4    (1.79)

Figure 1.12: Mesh and contour plot of f(x1, x2) = −x1² − x2² and the constraint
x1 + x2 − 4 = 0

As shown in Figure 1.12, the constrained maximum of f(x1, x2) must be located
on the intersection of the function surface and the vertical plane that passes through
the constraint line. In the plot showing the level curves of f(x1, x2), the point that
maximizes the function is located where the line x1 + x2 − 4 = 0 is tangent to one
level curve.
Analytically, this point can be determined by the method of substitution, as
follows:
Solve (1.79) for x2:

x2 = 4 − x1    (1.80)

and replace it into the objective function. The new unconstrained problem is:

max_{x1}  −x1² − (4 − x1)²    (1.81)

The stationary point of f(x1) = −x1² − (4 − x1)² is calculated by setting the first
derivative to zero:

−2x1 − 2(4 − x1)(−1) = 0    (1.82)

which gives x10 = 2.

The second derivative f''(x10) = −4 is negative, thus x10 = 2 is a maximizer
of f(x1). From (1.80), x20 = 4 − x10 = 2, so the solution of the constrained
optimization problem is (2, 2).

Example 1.13 Minimize

f(x1, x2) = 20 − x1x2    (1.83)

subject to:

x1 + x2 = 6    (1.84)

Substitute x2 = 6 − x1 from (1.84) into (1.83) and obtain the unconstrained
problem:

min_{x1}  20 − x1(6 − x1)    (1.85)

The stationary point is computed from the first derivative of f(x1):

f'(x1) = −6 + 2x1 = 0,  x10 = 3    (1.86)

Because the second derivative f''(x10) = 2 is positive, the stationary point is a
minimizer of f(x1). Because x2 = 6 − x1, the point (x10 = 3, x20 = 3) minimizes
the function f(x1, x2) subject to the constraint (1.84). As shown in Figure 1.13,
the minimum obtained is located on the parabola resulting from the intersection of
the function surface and the vertical plane containing the constraint line, or, in the
contour plot, it is the point where the constraint line is tangent to a level curve.

Figure 1.13: Mesh and contour plot of f(x1, x2) = 20 − x1x2 and the constraint
x1 + x2 − 6 = 0

1.3.4 Lagrange multipliers

In case the direct substitution cannot be applied, the method of Lagrange
multipliers provides a strategy for finding the minimum or maximum value
of a function subject to equality constraints. The general problem can be
formulated as:
min_x f(x)    (1.87)

subject to

gi(x) = 0,  i = 1, ..., m    (1.88)

As an example, consider the problem of finding the minimum of a
real-valued function f(x1, x2) subject to the constraint g(x1, x2) = 0. Let
f(x1, x2) = 20 − x1x2 and g(x1, x2) = x1 + x2 − 6 = 0, as shown in Figure 1.14.
The gradient direction of f(x1, x2) is also shown, as arrows, in the same figure.

Figure 1.14: Contour plot of f(x1, x2) = 20 − x1x2 and the constraint
g(x1, x2) = x1 + x2 − 6 = 0

Our goal is to minimize f (x1 , x2 ), or, in other words, to find the point
which lies on the level curve with the smallest possible value and which sat-
isfies g(x1 , x2 ) = 0. If we are at the point (x1 = 3, x2 = 3), the value of the
function is f (3, 3) = 11 and the constraint is satisfied. The constraint line is
tangent to the level curve at this point. If we move on the line g(x1 , x2 ) = 0
left or right from this point, the value of the function increases. Thus, the
solution of the problem is (3, 3).
In the general case, consider the point x where a level curve of a function
f(x) is tangent to the constraint curve g(x) = 0. At this point, the gradient of f, ∇f(x),

is parallel to the gradient of the constraint, ∇g(x). For two vectors to be
parallel, they must be linearly dependent. Thus, there is a scalar value λ, such
that:

∇f(x) = −λ∇g(x)    (1.89)

The equation (1.89), written as:

∇f(x) + λ∇g(x) = 0    (1.90)

will provide a necessary condition for optimization of f subject to the
constraint g = 0.

The method of Lagrange multipliers can be generalized to the case of n
independent variables and m constraints, as defined by (1.87) and (1.88)
(Avriel, 2003; Hiriart-Urruty, 1996).

Define the Lagrangian (or augmented) function:

L(x, λ) = f(x) + λ^T g(x)    (1.91)

where g is a column vector containing the m constraints gi(x), and λ is a
column vector of m unknown values, called the Lagrange multipliers. The
function above can be written in an expanded form as:

L(x1, x2, . . . , xn, λ1, λ2, . . . , λm) = f(x1, x2, . . . , xn) + λ1 g1(x1, x2, . . . , xn)
    + . . . + λm gm(x1, x2, . . . , xn)    (1.92)

To locate the stationary points, the gradient of the Lagrangian function is
set to zero:

∇L(x, λ) = ∇f(x) + λ^T ∇g(x) = 0    (1.93)

The necessary conditions for optimum are obtained by setting the first partial
derivatives of the Lagrangian function with respect to xi, i = 1, ..., n, and λj,
j = 1, ..., m, equal to zero. There are n + m nonlinear algebraic equations to be
solved for n + m unknowns, as follows:

∂L(x, λ)/∂x1 = ∂f(x)/∂x1 + Σ_{j=1}^{m} λj ∂gj(x)/∂x1 = 0
∂L(x, λ)/∂x2 = ∂f(x)/∂x2 + Σ_{j=1}^{m} λj ∂gj(x)/∂x2 = 0
...
∂L(x, λ)/∂xn = ∂f(x)/∂xn + Σ_{j=1}^{m} λj ∂gj(x)/∂xn = 0    (1.94)
∂L(x, λ)/∂λ1 = g1(x) = 0
...
∂L(x, λ)/∂λm = gm(x) = 0

Example 1.14 Find the stationary points of the function f(x1, x2) = 20 − x1x2
subject to the constraint x1 + x2 = 6 using the method of Lagrange multipliers.

Define the Lagrangian function:

L(x1, x2, λ) = 20 − x1x2 + λ(x1 + x2 − 6)    (1.95)

Obtain the stationary points from:

∂L(x1, x2, λ)/∂x1 = −x2 + λ = 0    (1.96)
∂L(x1, x2, λ)/∂x2 = −x1 + λ = 0    (1.97)
∂L(x1, x2, λ)/∂λ = x1 + x2 − 6 = 0    (1.98)

From (1.96) and (1.97), x1 = x2 = λ, which replaced in (1.98) gives the stationary
point: x1 = x2 = λ = 3.
This point is the solution of the constrained minimization problem from example
1.13, but the sufficient conditions for constrained optimization will be discussed in
the following section.
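
The stationarity conditions (1.96)-(1.98) of Example 1.14 can also be solved symbolically; a minimal SymPy sketch (assuming SymPy is available; the names are illustrative):

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lambda')

    f = 20 - x1*x2
    g = x1 + x2 - 6
    L = f + lam * g

    # Stationary points of the Lagrangian: grad L = 0 in (x1, x2, lambda)
    eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
    print(sp.solve(eqs, [x1, x2, lam], dict=True))   # [{x1: 3, x2: 3, lambda: 3}]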


1.3.5 Sufficient conditions for constrained optimization

The sufficient conditions for minima or maxima in constrained optimization
problems are completely described and demonstrated in (Avriel, 2003) and
(Hiriart-Urruty, 1996). The results in the case of equality constraints can be
summarized as follows:

Corollary (Avriel, 2003): Let f, g1, g2, ..., gm be twice continuously
differentiable real-valued functions. If there exist vectors x0 ∈ ℝ^n, λ0 ∈ ℝ^m,
such that:

∇L(x0, λ0) = 0    (1.99)

and if

            | ∂²L(x0,λ0)/∂x1∂x1  ...  ∂²L(x0,λ0)/∂x1∂xp   ∂g1(x0)/∂x1  ...  ∂gm(x0)/∂x1 |
            | ...                ...  ...                 ...          ...  ...          |
(−1)^m det  | ∂²L(x0,λ0)/∂xp∂x1  ...  ∂²L(x0,λ0)/∂xp∂xp   ∂g1(x0)/∂xp  ...  ∂gm(x0)/∂xp |  > 0    (1.100)
            | ∂g1(x0)/∂x1        ...  ∂g1(x0)/∂xp         0            ...  0            |
            | ...                ...  ...                 ...          ...  ...          |
            | ∂gm(x0)/∂x1        ...  ∂gm(x0)/∂xp         0            ...  0            |

for p = m + 1, . . . , n, then f has a strict local minimum at x0, such that

gi(x0) = 0,  i = 1, ..., m    (1.101)

A similar result for strict local maxima is obtained by changing (−1)^m in
(1.100) to (−1)^p, (Avriel, 2003).

For p = n, the matrix from (1.100) is the bordered Hessian matrix of the
problem. Its elements are in fact the second derivatives of the Lagrangian
with respect to all n + m variables, xi and λj. The columns on the right and
the last rows are more easily recognized as second derivatives of L if we notice
that:

∂L(x, λ)/∂λj = gj(x)   and   ∂²L(x, λ)/∂λj∂xi = ∂gj(x)/∂xi    (1.102)

Since gj(x) do not depend on λ, the zeros in the lower-right corner of the
matrix result from:

∂L(x, λ)/∂λj = gj(x)   and   ∂²L(x, λ)/∂λj∂λi = 0    (1.103)

When p < n, the corresponding matrices are obtained by excluding the rows and
columns associated with x_{p+1}, . . . , xn.

Example 1.15 For Example 1.14, we shall prove that the stationary point is a
minimum, according to the sufficient condition given above. The function to be
minimized is f(x1, x2) = 20 − x1x2 and the constraint is g(x1, x2) = x1 + x2 − 6 = 0.

The number of variables in this case is n = 2, the number of constraints is m = 1
and p = m + 1 = 2. The only matrix we shall analyze is:

H2 = | ∂²L/∂x1²    ∂²L/∂x1∂x2   ∂²L/∂x1∂λ |
     | ∂²L/∂x2∂x1  ∂²L/∂x2²     ∂²L/∂x2∂λ |    (1.104)
     | ∂²L/∂λ∂x1   ∂²L/∂λ∂x2    ∂²L/∂λ²   |

or

H2 = | ∂²L/∂x1²    ∂²L/∂x1∂x2   ∂g/∂x1 |
     | ∂²L/∂x2∂x1  ∂²L/∂x2²     ∂g/∂x2 |    (1.105)
     | ∂g/∂x1      ∂g/∂x2       0      |

where L = L(x1, x2, λ) and g = g(x1, x2).
Using the results we have obtained in Example 1.14, the second derivatives of the
Lagrangian function are:

∂²L(x1, x2, λ)/∂x1² = ∂(−x2 + λ)/∂x1 = 0
∂²L(x1, x2, λ)/∂x2² = ∂(−x1 + λ)/∂x2 = 0
∂²L(x1, x2, λ)/∂x1∂x2 = ∂(−x2 + λ)/∂x2 = −1    (1.106)
∂g(x1, x2)/∂x1 = ∂(x1 + x2 − 6)/∂x1 = 1
∂g(x1, x2)/∂x2 = ∂(x1 + x2 − 6)/∂x2 = 1


and the bordered Hessian:

H2 = |  0  −1  1 |
     | −1   0  1 |    (1.107)
     |  1   1  0 |

The relation (1.100) is written as:

(−1)¹ det |  0  −1  1 |
          | −1   0  1 | = (−1)·(−2) = 2 > 0    (1.108)
          |  1   1  0 |

thus, the stationary point (3, 3) is a minimizer of the function f subject to the
constraint g = 0.
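
A small numerical check of the sufficient condition used in Example 1.15, assuming Python with NumPy: build the (constant) bordered Hessian and evaluate the sign of (−1)^m det(H2).

    import numpy as np

    # Bordered Hessian of L(x1, x2, lambda) = 20 - x1*x2 + lambda*(x1 + x2 - 6),
    # rows/columns ordered (x1, x2, lambda); it is constant for this problem
    H_b = np.array([[ 0.0, -1.0, 1.0],
                    [-1.0,  0.0, 1.0],
                    [ 1.0,  1.0, 0.0]])

    m = 1                                  # number of equality constraints
    value = (-1)**m * np.linalg.det(H_b)   # test (1.100) with p = n = 2
    print(value)                           # approx 2 > 0, so (3, 3) is a constrained minimum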

Example 1.16 Find the highest and lowest points on the surface f(x1, x2) = x1x2 +
25 over the circle x1² + x2² = 18.
Figure 1.15 shows a graphical representation of the problem. The constraint on
x1 and x2 places the variables on the circle with the center in the origin and with a
radius equal to √18. On this circle, the values of the function f(x1, x2) are located
on the curve shown in Figure 1.15 on the mesh plot. It is clear from the picture that
there are two maxima and two minima, which will be determined using the method of
Lagrange multipliers.

Figure 1.15: Mesh and contour plot of f(x1, x2) = 25 + x1x2 and the constraint
x1² + x2² = 18

The problem may be reformulated as: optimize f(x1, x2) = x1x2 + 25 subject to
the constraint g(x1, x2) = x1² + x2² − 18 = 0.


Write the Lagrange function:

L(x1, x2, λ) = x1x2 + 25 + λ(x1² + x2² − 18)    (1.109)

Compute the first partial derivatives of L and set them equal to zero:

Lx1 = x2 + 2λx1 = 0
Lx2 = x1 + 2λx2 = 0    (1.110)
Lλ = x1² + x2² − 18 = 0

There are four stationary points:

x10 = 3,  x20 = −3,  λ0 = 1/2
x10 = −3,  x20 = 3,  λ0 = 1/2    (1.111)
x10 = 3,  x20 = 3,  λ0 = −1/2
x10 = −3,  x20 = −3,  λ0 = −1/2

Build the bordered Hessian matrix and check the sufficient conditions for maxima
and minima.

H2 = | Lx1x1  Lx1x2  gx1 | = | 2λ   1    2x1 |
     | Lx1x2  Lx2x2  gx2 |   | 1    2λ   2x2 |    (1.112)
     | gx1    gx2    0   |   | 2x1  2x2  0   |

Because the number of constraints is m = 1, the number of variables is n = 2
and p = 2, the sufficient condition for a stationary point to be a minimizer of f
subject to g = 0 is:

(−1)¹ det(H2) > 0,  or  det(H2) < 0    (1.113)

and for a maximizer:

(−1)² det(H2) > 0,  or  det(H2) > 0    (1.114)


For all points (x10, x20, λ0) given in (1.111) compute det(H2) and obtain:

(3, −3, 1/2):   det(H2) = | 1   1   6 |
                          | 1   1  −6 | = −144 < 0    (1.115)
                          | 6  −6   0 |

(−3, 3, 1/2):   det(H2) = | 1   1  −6 |
                          | 1   1   6 | = −144 < 0    (1.116)
                          | −6  6   0 |

(3, 3, −1/2):   det(H2) = | −1   1   6 |
                          |  1  −1   6 | = 144 > 0    (1.117)
                          |  6   6   0 |

(−3, −3, −1/2): det(H2) = | −1   1  −6 |
                          |  1  −1  −6 | = 144 > 0    (1.118)
                          | −6  −6   0 |

Thus, the function f subject to g = 0 has two minima, at (3, −3, 1/2) and
(−3, 3, 1/2), and two maxima, at (3, 3, −1/2) and (−3, −3, −1/2).

Example 1.17 Find the point on the sphere x² + y² + z² = 4 closest to the point
P(3, 4, 0) (Figure 1.16).

Figure 1.16: The sphere x² + y² + z² = 4 and the point P


We shall find the point (x, y, z) that minimizes the distance between P and
the sphere. In a 3-dimensional space, the distance between any point (x, y, z) and
P(3, 4, 0) is given by:

D(x, y, z) = √((x − 3)² + (y − 4)² + z²)    (1.119)

Since the point (x, y, z) must be located on the sphere, the variables are constrained
by the equation x² + y² + z² = 4. The calculus will be easier if instead of (1.119) we
minimize the function under the square root. The problem to be solved is then:

min_{x,y,z}  (x − 3)² + (y − 4)² + z²    (1.120)

subject to:

g(x, y, z) = x² + y² + z² − 4 = 0    (1.121)

Write the Lagrangian function first:

L(x, y, z, λ) = (x − 3)² + (y − 4)² + z² + λ(x² + y² + z² − 4)    (1.122)

and set the partial derivatives equal to zero to compute the stationary points:

Lx = 2x − 6 + 2λx = 0
Ly = 2y − 8 + 2λy = 0    (1.123)
Lz = 2z + 2λz = 0
Lλ = x² + y² + z² − 4 = 0

The system (1.123) has two solutions:

(S1):  x10 = 6/5,  y10 = 8/5,  z10 = 0,  λ10 = 3/2    (1.124)

and

(S2):  x10 = −6/5,  y10 = −8/5,  z10 = 0,  λ10 = −7/2    (1.125)

It is clear from Figure 1.16 that there are a minimum and a maximum distance
between the point P and the sphere, thus the sufficient conditions for maximum or
minimum have to be checked.
The number of variables is n = 3, the number of constraints is m = 1 and p from
(1.100) has two values: p = 2, 3. We must analyze the sign of the determinants of
the following matrices:

For p = 2:

H22 = | Lxx  Lxy  Lxλ | = | Lxx  Lxy  gx |
      | Lxy  Lyy  Lyλ |   | Lxy  Lyy  gy |    (1.126)
      | Lλx  Lλy  Lλλ |   | gx   gy   0  |

For p = 3:

H23 = | Lxx  Lxy  Lxz  Lxλ | = | Lxx  Lxy  Lxz  gx |
      | Lxy  Lyy  Lyz  Lyλ |   | Lxy  Lyy  Lyz  gy |    (1.127)
      | Lxz  Lyz  Lzz  Lzλ |   | Lxz  Lyz  Lzz  gz |
      | Lλx  Lλy  Lλz  Lλλ |   | gx   gy   gz   0  |

The sufficient conditions for minimum in this case are:

(−1)¹ det(H22) > 0   and   (−1)¹ det(H23) > 0    (1.128)

and the sufficient conditions for maximum are:

(−1)² det(H22) > 0   and   (−1)³ det(H23) > 0    (1.129)

The second derivatives of the Lagrangian:

Lxx = 2 + 2λ,  Lyy = 2 + 2λ,  Lzz = 2 + 2λ
Lxy = 0,  Lxz = 0,  Lxλ = gx = 2x    (1.130)
Lyz = 0,  Lyλ = gy = 2y,  Lzλ = gz = 2z

From (1.126) and (1.127) we obtain:

H22 = | 2 + 2λ  0       2x |
      | 0       2 + 2λ  2y |    (1.131)
      | 2x      2y      0  |



H23 = | 2 + 2λ  0       0       2x |
      | 0       2 + 2λ  0       2y |    (1.132)
      | 0       0       2 + 2λ  2z |
      | 2x      2y      2z      0  |

For the first stationary point, (S1), the determinants of H22 and H23 are:

det H22 = | 5     0     12/5 |
          | 0     5     16/5 | = −80    (1.133)
          | 12/5  16/5  0    |

det H23 = | 5     0     0  12/5 |
          | 0     5     0  16/5 | = −400    (1.134)
          | 0     0     5  0    |
          | 12/5  16/5  0  0    |

The point (6/5, 8/5, 0) is then a minimizer of the problem because:

(−1)¹ det(H22) = 80 > 0   and   (−1)¹ det(H23) = 400 > 0    (1.135)

For the second stationary point, (S2), the determinants are:

det H22 = | −5     0      −12/5 |
          | 0      −5     −16/5 | = 80,    (1.136)
          | −12/5  −16/5  0     |

det H23 = | −5     0      0   −12/5 |
          | 0      −5     0   −16/5 | = −400    (1.137)
          | 0      0      −5  0     |
          | −12/5  −16/5  0   0     |

The point (−6/5, −8/5, 0) is a maximizer of the problem because:

(−1)² det(H22) = 80 > 0   and   (−1)³ det(H23) = 400 > 0    (1.138)

The point on the sphere closest to P is (6/5, 8/5, 0).
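
A hedged numerical cross-check of Example 1.17, using SciPy's general-purpose constrained solver (an assumption of this sketch, not a method used in the text): minimize the squared distance subject to the sphere constraint.

    import numpy as np
    from scipy.optimize import minimize

    P = np.array([3.0, 4.0, 0.0])
    objective = lambda x: np.sum((x - P)**2)                 # squared distance to P
    sphere = {'type': 'eq', 'fun': lambda x: np.sum(x**2) - 4.0}

    res = minimize(objective, x0=np.array([1.0, 1.0, 1.0]), constraints=[sphere])
    print(res.x)   # approximately [1.2, 1.6, 0.0], i.e. (6/5, 8/5, 0)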


Example 1.18 Find the extrema of the function:

f(x, y, z) = x + 2y + z    (1.139)

subject to:

g1(x, y, z) = x² + y² + z² − 1 = 0    (1.140)
g2(x, y, z) = x + y + z − 1 = 0    (1.141)

In this case we have two constraints and 3 variables. Two new unknowns will be
introduced and the Lagrange function is written as:

L(x, y, z, λ1, λ2) = x + 2y + z + λ1(x² + y² + z² − 1) + λ2(x + y + z − 1)    (1.142)

The stationary points are computed from:

Lx = 1 + 2λ1x + λ2 = 0
Ly = 2 + 2λ1y + λ2 = 0
Lz = 1 + 2λ1z + λ2 = 0    (1.143)
Lλ1 = x² + y² + z² − 1 = 0
Lλ2 = x + y + z − 1 = 0

The solutions of (1.143) are:

(S1):  x0 = 0,  y0 = 1,  z0 = 0,  λ10 = −1/2,  λ20 = −1    (1.144)
(S2):  x0 = 2/3,  y0 = −1/3,  z0 = 2/3,  λ10 = 1/2,  λ20 = −5/3    (1.145)

The number p from (1.100) is 3 in this case, therefore we have to analyze the sign
of the determinant:

det H2 = | Lxx  Lxy  Lxz  g1x  g2x | = | 2λ1  0    0    2x  1 |
         | Lxy  Lyy  Lyz  g1y  g2y |   | 0    2λ1  0    2y  1 |
         | Lxz  Lyz  Lzz  g1z  g2z |   | 0    0    2λ1  2z  1 |    (1.146)
         | g1x  g1y  g1z  0    0   |   | 2x   2y   2z   0   0 |
         | g2x  g2y  g2z  0    0   |   | 1    1    1    0   0 |


The sufficient condition for a minimum (when m = 2 and p = 3) is:

(−1)² det H2 > 0,  or  det H2 > 0    (1.147)

and for a maximum:

(−1)³ det H2 > 0,  or  det H2 < 0    (1.148)

For (S1):

det H2 = | −1  0   0   0  1 |
         | 0   −1  0   2  1 |
         | 0   0   −1  0  1 | = −8 < 0    (1.149)
         | 0   2   0   0  0 |
         | 1   1   1   0  0 |

thus, the point (x0 = 0, y0 = 1, z0 = 0) is a maximizer of f subject to g1 = 0 and
g2 = 0.

For (S2):

det H2 = | 1    0     0    4/3   1 |
         | 0    1     0    −2/3  1 |
         | 0    0     1    4/3   1 | = 8 > 0    (1.150)
         | 4/3  −2/3  4/3  0     0 |
         | 1    1     1    0     0 |

thus, the point (x0 = 2/3, y0 = −1/3, z0 = 2/3) minimizes f subject to g1 = 0 and
g2 = 0.

1.3.6 Karush-Kuhn-Tucker conditions

The Karush-Kuhn-Tucker (KKT) conditions are an extension of the Lagrangian
theory to include nonlinear optimization problems with inequality constraints.
If x is a vector of n variables, x = [x1 x2 . . . xn]^T, and f is a nonlinear
real-valued function, f : ℝ^n → ℝ, consider the constrained minimization
problem:

(P):  min_x f(x)    (1.151)


subject to

gi(x) = 0,  i = 1, ..., m    (1.152)
hj(x) ≤ 0,  j = 1, ..., p    (1.153)

where f, gi, hj are twice differentiable real-valued functions.
The Lagrangian function is written as:

L(x, λ, μ) = f(x) + Σ_{i=1}^{m} λi gi(x) + Σ_{j=1}^{p} μj hj(x)
           = f(x) + λ^T g(x) + μ^T h(x)    (1.154)

where:

- λ = [λ1 λ2 . . . λm]^T and μ = [μ1 μ2 . . . μp]^T are vector multipliers,

- g = [g1(x) g2(x) . . . gm(x)]^T and h = [h1(x) h2(x) . . . hp(x)]^T are
  vector functions.

The necessary conditions for a point x0 to be a local minimizer of f are:

∇f(x0) + Σ_{i=1}^{m} λi ∇gi(x0) + Σ_{j=1}^{p} μj ∇hj(x0) = 0    (1.155)
gi(x0) = 0,  i = 1, ..., m    (1.156)
hj(x0) ≤ 0,  j = 1, ..., p    (1.157)
μj hj(x0) = 0,  j = 1, ..., p    (1.158)
μj ≥ 0,  j = 1, ..., p    (1.159)
λi unrestricted in sign,  i = 1, ..., m    (1.160)

The relations (1.155)-(1.160) are called the Karush-Kuhn-Tucker conditions
(Boyd and Vandenberghe, 2004). For any optimization problem with
differentiable objective and constraint functions, any optimal points must
satisfy the KKT conditions.
the KKT conditions.

- The first condition (1.155) is a system of n nonlinear equations with
  n + m + p unknowns, obtained by setting equal to zero all the partial
  derivatives of the Lagrangian with respect to x1, x2, . . . , xn, i.e.
  ∇L(x0, λ, μ) = 0.

- The following two conditions, (1.156) and (1.157), are the equality and
  inequality constraints which must be satisfied by the minimizer of the
  constrained problem; they are called the feasibility conditions.

- The relation (1.158) is called the complementary slackness condition.

- The relation (1.159) is the non-negativity condition for the multipliers.

The KKT conditions are necessary conditions for optimum. Not all the
points that satisfy (1.155)-(1.160) are optimal points. On the other hand, a
point is not optimal if the KKT conditions are not satisfied.
If the objective and inequality constraint functions (f and hj ) are convex
and gi are affine, the KKT conditions are also sufficient for a minimum point.
A function hj is affine if it has the form:

hj = A1x1 + A2x2 + . . . + Anxn + b    (1.161)

In a few cases, it is possible to solve the KKT conditions (and therefore
the optimization problem) analytically.

Example 1.19 Minimize

f(x1, x2) = e^(−3x1) + e^(−2x2)    (1.162)

subject to:

x1 + x2 ≤ 2    (1.163)
x1 ≥ 0    (1.164)
x2 ≥ 0    (1.165)

The constraints are re-written in the standard form:

x1 + x2 − 2 ≤ 0    (1.166)
−x1 ≤ 0    (1.167)
−x2 ≤ 0    (1.168)


The Lagrangian of the problem is:

L(x1, x2, μ1, μ2, μ3) = e^(−3x1) + e^(−2x2) + μ1(x1 + x2 − 2) +
    + μ2(−x1) + μ3(−x2)    (1.169)

and the KKT conditions are:

∂L/∂x1 = −3e^(−3x1) + μ1 − μ2 = 0    (1.170)
∂L/∂x2 = −2e^(−2x2) + μ1 − μ3 = 0    (1.171)
μ1(x1 + x2 − 2) = 0    (1.172)
μ2(−x1) = 0    (1.173)
μ3(−x2) = 0    (1.174)
μ1 ≥ 0    (1.175)
μ2 ≥ 0    (1.176)
μ3 ≥ 0    (1.177)

First we may notice that x1 ≥ 0 and x2 ≥ 0, thus each of them can be either zero
or strictly positive. Therefore, we have four cases:

1.) x1 = 0, x2 = 0. The relations (1.170), (1.171), (1.172) become:

−3 + μ1 − μ2 = 0    (1.178)
−2 + μ1 − μ3 = 0    (1.179)
μ1(−2) = 0    (1.180)

and μ1 = 0, μ2 = −3, μ3 = −2. The conditions (1.176), (1.177) are violated,
so x1 = x2 = 0 is not a solution of the problem.

2.) x1 = 0, x2 > 0. Because x2 is strictly positive, from (1.174) we obtain μ3 = 0,
and the relations (1.170), (1.171), (1.172) are recalculated:

−3 + μ1 − μ2 = 0    (1.181)
−2e^(−2x2) + μ1 = 0    (1.182)
μ1(x2 − 2) = 0    (1.183)


From (1.182) we obtain μ1 = 2e^(−2x2) ≠ 0, so the relation (1.183) is satisfied
only for x2 = 2. Then μ1 = 2e^(−4). From (1.181) we obtain: μ2 = −3 + 2e^(−4) <
0 and the constraint (1.176) is not satisfied.

This case will not give a solution of the problem.

3.) x1 > 0, x2 = 0. Because x1 is strictly positive, from (1.173) we obtain μ2 = 0,
and the relations (1.170), (1.171), (1.172) are recalculated:

−3e^(−3x1) + μ1 = 0    (1.184)
−2 + μ1 − μ3 = 0    (1.185)
μ1(x1 − 2) = 0    (1.186)

From (1.184) we obtain μ1 = 3e^(−3x1) ≠ 0, so the relation (1.186) is satisfied
only for x1 = 2. Then μ1 = 3e^(−6). From (1.185) we obtain: μ3 = −2 + 3e^(−6) <
0 and the constraint (1.177) is not satisfied.

This situation is not a solution of the problem either.

4.) x1 > 0, x2 > 0. Since x1 and x2 cannot be zero, from (1.173) and (1.174) we
obtain: μ2 = 0 and μ3 = 0.

The relations (1.170), (1.171), (1.172) become:

−3e^(−3x1) + μ1 = 0    (1.187)
−2e^(−2x2) + μ1 = 0    (1.188)
μ1(x1 + x2 − 2) = 0    (1.189)

The value of μ1 cannot be zero, because this would require the exponentials
in (1.187) and (1.188) to be zero, which is impossible. Then x1 + x2 − 2 = 0, or
x2 = 2 − x1.

From (1.187) and (1.188) we obtain:

μ1 = 3e^(−3x1) = 2e^(−2x2)    (1.190)

and then:

3e^(−3x1) = 2e^(−2(2−x1)),   or   e^(−5x1+4) = 2/3    (1.191)

The solution is:

x1 = (1/5)(4 − ln(2/3)) ≈ 0.88,   x2 = 2 − x1 ≈ 1.12    (1.192)

and μ1 ≈ 0.21 > 0.

The necessary conditions for the point (0.88, 1.12) to be a minimizer of the
constrained optimization problem are satisfied.
The contour plot of f and the linear constraint are shown in Figure 1.17. The
constraint x1 + x2 = 2 is tangent to a level curve at the point P(0.88, 1.12),
thus it is the global minimizer of f subject to the constraints.

Figure 1.17: Contour plot of f and the constraint x1 + x2 − 2 = 0
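
As a numerical cross-check of Example 1.19 (a sketch assuming SciPy is available, not part of the original text), the same problem can be passed to a general-purpose constrained solver; the reported minimizer should agree with the KKT solution.

    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: np.exp(-3*x[0]) + np.exp(-2*x[1])
    constraints = [{'type': 'ineq', 'fun': lambda x: 2 - x[0] - x[1]}]   # x1 + x2 <= 2
    bounds = [(0, None), (0, None)]                                       # x1, x2 >= 0

    res = minimize(f, x0=[1.0, 1.0], bounds=bounds, constraints=constraints)
    print(res.x)   # approximately [0.88, 1.12], matching the KKT solution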

1.4 Exercises

1. Locate the stationary points of the following functions and determine
   their character:

   a) f(x) = x⁷
   b) f(x) = 8x − x⁴/4
   c) f(x) = 50 − 6x + x³/18

2. Consider the functions f : ℝ² → ℝ:

   a) f(x1, x2) = x1x2
   b) f(x1, x2) = x1²/2 + x2² − 3x1 + 2x2 − 5
   c) f(x1, x2) = x1² + x2³ + 6x1 − 12x2 + 5
   d) f(x1, x2) = 4x1x2 − x1⁴ − x2⁴
   e) f(x1, x2) = x1x2 e^(−(x1² + x2²)/2)

   Compute the stationary points and determine their character using the
   second derivative test.

3. Find the global minimum of the function:

   f(x1, x2) = (x1 − 2)² + (x2 − 1)²

   in the region

   0 ≤ x1 ≤ 1
   0 ≤ x2 ≤ 2

4. Find the extrema of the function f : ℝ³ → ℝ:

   f(x1, x2, x3) = 2x1² + 3x2² + 4x3² − 4x1 − 12x2 − 16x3

5. Use the method of substitution to solve the constrained optimization
   problem:

   min_{(x1,x2)} x1² + x2² − 49

   subject to

   x1 + 3x2 − 10 = 0

6. Use the method of Lagrange multipliers to find the maximum and
   minimum values of f subject to the given constraints:

   a) f(x1, x2) = 3x1 − 2x2,  x1² + 2x2² = 44
   b) f(x1, x2, x3) = x1² − 2x2 + 2x3³,  x1² + x2² + x3² = 1
   c) f(x1, x2) = x1² − x2²,  x1² + x2² = 1

7. Minimize the surface area of a cylinder with a given volume.

Bibliography

Avriel, M. (2003). Nonlinear Programming: Analysis and Methods. Courier
Dover Publications.

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge
University Press.

Hancock, H. (1960). Theory of Maxima and Minima. Ginn and Company.

Hiriart-Urruty, J.-B. (1996). L'Optimisation. Que sais-je? Presses
Universitaires de France.

Renze, J. and Weisstein, E. (2004). Extreme value theorem. From
MathWorld - A Wolfram Web Resource.
http://mathworld.wolfram.com/ExtremeValueTheorem.html.

Weisstein, E. (2004). Second derivative test. From
MathWorld - A Wolfram Web Resource.
http://mathworld.wolfram.com/SecondDerivativeTest.html.
