Chapter 4: Unconstrained Optimization
Outline:
Part I: one-dimensional unconstrained optimization
Analytical method
Newton's method
Golden-section search method
Part II: multidimensional unconstrained optimization
Analytical method
Gradient method: steepest ascent (descent) method
Newton's method
PART I: One-Dimensional Unconstrained Optimization Techniques
1 Analytical Method
At a stationary point, $F'(x) = 0$. The point is a local maximum if $F''(x) < 0$ there, and a local minimum if $F''(x) > 0$.
2 Newton's Method
F(x^*) = \min_x F(x) \;\Longleftrightarrow\; \min_p F(x_k + p)

\approx \min_p \left[ F(x_k) + p\,F'(x_k) + \frac{p^2}{2} F''(x_k) \right]

Let

\frac{\partial}{\partial p}\left[ F(x_k) + p\,F'(x_k) + \frac{p^2}{2} F''(x_k) \right] = F'(x_k) + p\,F''(x_k) = 0

we have

p = -\frac{F'(x_k)}{F''(x_k)}

Newton's iteration:

x_{k+1} = x_k + p = x_k - \frac{F'(x_k)}{F''(x_k)}
Example: find the maximum value of $f(x) = 2\sin x - \frac{x^2}{10}$ with an initial guess of $x_0 = 2.5$.
Solution:
f'(x) = 2\cos x - \frac{2x}{10} = 2\cos x - \frac{x}{5}

f''(x) = -2\sin x - \frac{1}{5}

x_{i+1} = x_i - \frac{2\cos x_i - \frac{x_i}{5}}{-2\sin x_i - \frac{1}{5}}
x0 = 2.5, x1 = 0.995, x2 = 1.469.
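A minimal Python sketch of this iteration (the function and derivatives follow the example above; the function names, tolerance, and iteration cap are illustrative choices, not part of the original notes):

```python
import math

def newton_opt(df, d2f, x0, tol=1e-6, max_iter=50):
    """Newton's method for 1-D optimization: x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - df(x) / d2f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: f(x) = 2 sin(x) - x^2/10, starting from x0 = 2.5
df  = lambda x: 2 * math.cos(x) - x / 5       # f'(x)
d2f = lambda x: -2 * math.sin(x) - 1 / 5      # f''(x)
print(newton_opt(df, d2f, 2.5))               # iterates 0.995, 1.469, ...; converges to about 1.4276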
Comments:
Same as the Newton-Raphson method applied to solving $F'(x) = 0$.
Quadratic convergence: $|x_{k+1} - x^*| \propto |x_k - x^*|^2$.
May diverge.
Requires both the first and second derivatives.
The solution can be either a local minimum or a local maximum.
3 Golden-section search for optimization in 1-D
Figure 2: Golden-section search: updating the search range. If $f(x_1) > f(x_2)$, the new interval is $[x_2, x_u]$ ($x_2$ becomes the new $x_l$); if $f(x_1) < f(x_2)$, the new interval is $[x_l, x_1]$ ($x_1$ becomes the new $x_u$).
The choice of d
Define $r = \frac{d_0}{l_0} = \frac{d_1}{l_1} = \frac{l_2}{l_1}$. Then $r^2 + r - 1 = 0$, and $r = \frac{\sqrt{5} - 1}{2} \approx 0.618$.
$d = r(x_u - x_l) \approx 0.618(x_u - x_l)$ is referred to as the golden value.
Relative error:

\epsilon_a = \left| \frac{x_{\text{new}} - x_{\text{old}}}{x_{\text{new}}} \right| \times 100\%
Consider $F(x_2) < F(x_1)$. That is, the new bounds are $x_l = x_2$ and $x_u = x_u$ (unchanged), and the current estimate of the optimum is $x_1$.
For case (a), $x^* > x_2$ and $x^*$ is closer to $x_2$:

\Delta x_a = x_1 - x_2 = (x_l + d) - (x_u - d) = (x_l - x_u) + 2d = (x_l - x_u) + 2r(x_u - x_l) = (2r - 1)(x_u - x_l) \approx 0.236(x_u - x_l)

For case (b), $x^* > x_2$ and $x^*$ is closer to $x_u$:

\Delta x_b = x_u - x_1 = x_u - (x_l + d) = x_u - x_l - d = (x_u - x_l) - r(x_u - x_l) = (1 - r)(x_u - x_l) \approx 0.382(x_u - x_l)

Therefore, the maximum absolute error is $(1 - r)(x_u - x_l) \approx 0.382(x_u - x_l)$.
\epsilon_a \approx \left| \frac{\Delta x}{x^*} \right| \times 100\% \le \frac{(1 - r)(x_u - x_l)}{|x^*|} \times 100\% = \frac{0.382(x_u - x_l)}{|x^*|} \times 100\%
Example: Find the maximum of $f(x) = 2\sin x - \frac{x^2}{10}$ with $x_l = 0$ and $x_u = 4$ as the starting search range.
Solution:
Iteration 1: $x_l = 0$, $x_u = 4$, $d = \frac{\sqrt{5}-1}{2}(x_u - x_l) = 2.472$, $x_1 = x_l + d = 2.472$, $x_2 = x_u - d = 1.528$. $f(x_1) = 0.63$, $f(x_2) = 1.765$.
Since $f(x_2) > f(x_1)$, $x^* \approx x_2 = 1.528$, $x_l = x_l = 0$ and $x_u = x_1 = 2.472$.
Iteration 2: $x_l = 0$, $x_u = 2.472$, $d = \frac{\sqrt{5}-1}{2}(x_u - x_l) = 1.528$, $x_1 = x_l + d = 1.528$, $x_2 = x_u - d = 0.944$. $f(x_1) = 1.765$, $f(x_2) = 1.531$.
Since $f(x_1) > f(x_2)$, $x^* \approx x_1 = 1.528$, $x_l = x_2 = 0.944$ and $x_u = x_u = 2.472$.
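A short Python sketch of the golden-section search for a maximum (the bracket update follows Figure 2; the function name and the interval-width stopping rule are illustrative choices):

```python
import math

def golden_section_max(f, xl, xu, tol=1e-5):
    """Golden-section search for the maximum of f on [xl, xu]."""
    r = (math.sqrt(5) - 1) / 2            # golden value, about 0.618
    x1 = xl + r * (xu - xl)
    x2 = xu - r * (xu - xl)
    while (xu - xl) > tol:
        if f(x1) > f(x2):                 # maximum lies in [x2, xu]
            xl = x2
        else:                             # maximum lies in [xl, x1]
            xu = x1
        x1 = xl + r * (xu - xl)
        x2 = xu - r * (xu - xl)
    return (xl + xu) / 2

f = lambda x: 2 * math.sin(x) - x**2 / 10
print(golden_section_max(f, 0, 4))        # about 1.4276
```

For clarity this sketch recomputes both interior points on every pass; in practice one interior point and its function value can be reused, so only one new function evaluation is needed per iteration.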
PART II: Multidimensional Unconstrained Optimization Techniques
4 Analytical Method
Definitions:
If f (x, y) < f (a, b) for all (x, y) near (a, b), f (a, b) is a local maximum;
If f (x, y) > f (a, b) for all (x, y) near (a, b), f (a, b) is a local minimum.
If f (x, y) has a local maximum or minimum at (a, b), and the first order partial
derivatives of f (x, y) exist at (a, b), then
\frac{\partial f}{\partial x}\Big|_{(a,b)} = 0, \quad \text{and} \quad \frac{\partial f}{\partial y}\Big|_{(a,b)} = 0
If

\frac{\partial f}{\partial x}\Big|_{(a,b)} = 0 \quad \text{and} \quad \frac{\partial f}{\partial y}\Big|_{(a,b)} = 0,

then $(a, b)$ is a critical point or stationary point of $f(x, y)$.
If

\frac{\partial f}{\partial x}\Big|_{(a,b)} = 0 \quad \text{and} \quad \frac{\partial f}{\partial y}\Big|_{(a,b)} = 0
and the second order partial derivatives of f (x, y) are continuous, then
When $|H| > 0$ and $\frac{\partial^2 f}{\partial x^2}\big|_{(a,b)} < 0$, $f(a, b)$ is a local maximum of $f(x, y)$.
When $|H| > 0$ and $\frac{\partial^2 f}{\partial x^2}\big|_{(a,b)} > 0$, $f(a, b)$ is a local minimum of $f(x, y)$.
When $|H| < 0$, $f(a, b)$ is a saddle point.
Hessian of $f(x, y)$:

H = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}

|H| = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \frac{\partial^2 f}{\partial x \partial y}\frac{\partial^2 f}{\partial y \partial x}

When $\frac{\partial^2 f}{\partial x \partial y}$ is continuous, $\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$.
When $|H| > 0$, $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} > 0$.
Example: Locate the critical point of $f(x, y) = 2xy + 2x - x^2 - 2y^2$ and determine whether it is a maximum, a minimum, or a saddle point.
Solution:
\frac{\partial f}{\partial x} = 2y + 2 - 2x, \quad \frac{\partial f}{\partial y} = 2x - 4y.

Let $\frac{\partial f}{\partial x} = 0$: $-2x + 2y = -2$.
Let $\frac{\partial f}{\partial y} = 0$: $2x - 4y = 0$.
Then $x = 2$ and $y = 1$, i.e., $(2, 1)$ is a critical point.

\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}(2y + 2 - 2x) = -2

\frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}(2x - 4y) = -4

\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}(2x - 4y) = 2, \quad \text{or}
(Figure: surface plot of $z = x^2 - y^2$, illustrating a saddle point.)
\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}(2y + 2 - 2x) = 2

|H| = \frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \frac{\partial^2 f}{\partial x \partial y}\frac{\partial^2 f}{\partial y \partial x} = (-2)(-4) - 2^2 = 4 > 0

$\frac{\partial^2 f}{\partial x^2} = -2 < 0$, so $(x^*, y^*) = (2, 1)$ is a local maximum.
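The same classification can be checked symbolically. A minimal sketch using SymPy (a third-party library, not part of these notes):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 2*x*y + 2*x - x**2 - 2*y**2

# First-order conditions: solve df/dx = 0 and df/dy = 0
fx, fy = sp.diff(f, x), sp.diff(f, y)
crit = sp.solve([fx, fy], [x, y])      # {x: 2, y: 1}

# Second-order test: Hessian determinant and f_xx at the critical point
H = sp.hessian(f, (x, y))              # [[-2, 2], [2, -4]]
detH = sp.det(H)                       # 4 > 0 and f_xx = -2 < 0 -> local maximum
print(crit, H, detH)
```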
5 Gradient Method: Steepest Ascent (Descent) Method
Idea: starting from an initial point, find the function maximum (minimum) along the steepest direction so that the shortest search path is required.
Steepest direction: the direction in which the directional derivative is maximum, i.e., the gradient direction.
Directional derivative:

D_h f(x, y) = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta = \left\langle \begin{bmatrix}\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}\end{bmatrix}', \begin{bmatrix}\cos\theta & \sin\theta\end{bmatrix}' \right\rangle

$\langle \cdot,\cdot \rangle$: inner product
Gradient:
When $[\cos\theta \ \sin\theta]'$ points in the same direction as $[\frac{\partial f}{\partial x} \ \frac{\partial f}{\partial y}]'$, the directional derivative is maximized. This direction is called the gradient of $f(x, y)$.
The gradient of a 2-D function is represented as $\nabla f(x, y) = \frac{\partial f}{\partial x}\vec{i} + \frac{\partial f}{\partial y}\vec{j}$, or $[\frac{\partial f}{\partial x} \ \frac{\partial f}{\partial y}]'$.
The gradient of an n-D function is represented as $\nabla f(\vec{X}) = \begin{bmatrix}\frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \dots & \frac{\partial f}{\partial x_n}\end{bmatrix}'$, where $\vec{X} = [x_1 \ x_2 \ \dots \ x_n]'$.
Example: $f(x, y) = xy^2$. Use the gradient to evaluate the path of steepest ascent at $(2, 2)$.
Solution:

\frac{\partial f}{\partial x} = y^2, \quad \frac{\partial f}{\partial y} = 2xy.

\frac{\partial f}{\partial x}\Big|_{(2,2)} = 2^2 = 4, \quad \frac{\partial f}{\partial y}\Big|_{(2,2)} = 2 \cdot 2 \cdot 2 = 8

Gradient: $\nabla f(x, y) = \frac{\partial f}{\partial x}\vec{i} + \frac{\partial f}{\partial y}\vec{j} = 4\vec{i} + 8\vec{j}$

\theta = \tan^{-1}\frac{8}{4} = 1.107 \ \text{rad, or } 63.4^\circ.

\cos\theta = \frac{4}{\sqrt{4^2 + 8^2}}, \quad \sin\theta = \frac{8}{\sqrt{4^2 + 8^2}}.

Directional derivative at $(2, 2)$: $\frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta = 4\cos\theta + 8\sin\theta = 8.944$
If $\theta' \ne \theta$, for example, $\theta' = 0.5325$, then

D_{h'} f\big|_{(2,2)} = \frac{\partial f}{\partial x}\cos\theta' + \frac{\partial f}{\partial y}\sin\theta' = 4\cos\theta' + 8\sin\theta' = 7.608 < 8.944
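A quick numeric check of this arithmetic (variable names are illustrative):

```python
import math

# f(x, y) = x*y**2; gradient components at (2, 2)
fx, fy = 2**2, 2 * 2 * 2                              # df/dx = y^2 = 4, df/dy = 2xy = 8

theta = math.atan2(fy, fx)                            # steepest-ascent direction, ~1.107 rad (63.4 deg)
d_max = fx * math.cos(theta) + fy * math.sin(theta)   # 8.944, equals the gradient magnitude
print(theta, d_max)                                   # any other angle gives a smaller derivative
```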
Steepest ascent method
Ideally:
Start from $(x_0, y_0)$ and evaluate the gradient at $(x_0, y_0)$.
Walk a tiny distance along the gradient direction to reach $(x_1, y_1)$.
Re-evaluate the gradient at $(x_1, y_1)$ and repeat the process.
Pros: always follows the steepest direction and walks the shortest distance.
Cons: not practical, due to the continuous re-evaluation of the gradient.
Practically:
Start from $(x_0, y_0)$.
Evaluate the gradient $h$ at $(x_0, y_0)$.
Evaluate $f(x, y)$ along the direction $h$.
Find the maximum function value along this direction, at $(x_1, y_1)$.
Repeat the process until $(x_{i+1}, y_{i+1})$ is close enough to $(x_i, y_i)$.
Find $\vec{X}_{i+1}$ from $\vec{X}_i$:
For an n-D function $f(\vec{X})$,

g(\alpha) = f(\vec{X}_i + \alpha \nabla f|_{\vec{X}_i})

Let $g'(\alpha) = 0$ and find the solution $\alpha = \alpha^*$.
Update $x_{i+1} = x_i + \alpha^* \frac{\partial f}{\partial x}\big|_{(x_i, y_i)}$, $y_{i+1} = y_i + \alpha^* \frac{\partial f}{\partial y}\big|_{(x_i, y_i)}$.
Figure 5: Illustration of steepest ascent
Figure 6: Relationship between an arbitrary direction h and x and y coordinates
Example: $f(x, y) = 2xy + 2x - x^2 - 2y^2$, $(x_0, y_0) = (-1, 1)$.
First iteration:
$x_0 = -1$, $y_0 = 1$.

\frac{\partial f}{\partial x}\Big|_{(-1,1)} = 2y + 2 - 2x\Big|_{(-1,1)} = 6, \quad \frac{\partial f}{\partial y}\Big|_{(-1,1)} = 2x - 4y\Big|_{(-1,1)} = -6

\nabla f = 6\vec{i} - 6\vec{j}

g(\alpha) = f\left(x_0 + \alpha \frac{\partial f}{\partial x}\Big|_{(x_0,y_0)},\ y_0 + \alpha \frac{\partial f}{\partial y}\Big|_{(x_0,y_0)}\right)
= f(-1 + 6\alpha,\ 1 - 6\alpha)
= 2(-1 + 6\alpha)(1 - 6\alpha) + 2(-1 + 6\alpha) - (-1 + 6\alpha)^2 - 2(1 - 6\alpha)^2
= -180\alpha^2 + 72\alpha - 7

$g'(\alpha) = -360\alpha + 72 = 0$, $\alpha^* = 0.2$.
Second iteration:

x_1 = x_0 + \alpha^* \frac{\partial f}{\partial x}\Big|_{(x_0,y_0)} = -1 + 6 \cdot 0.2 = 0.2, \quad y_1 = y_0 + \alpha^* \frac{\partial f}{\partial y}\Big|_{(x_0,y_0)} = 1 - 6 \cdot 0.2 = -0.2

\frac{\partial f}{\partial x}\Big|_{(0.2,-0.2)} = 2y + 2 - 2x\Big|_{(0.2,-0.2)} = 2 \cdot (-0.2) + 2 - 2 \cdot 0.2 = 1.2

\frac{\partial f}{\partial y}\Big|_{(0.2,-0.2)} = 2x - 4y\Big|_{(0.2,-0.2)} = 2 \cdot 0.2 - 4 \cdot (-0.2) = 1.2

\nabla f = 1.2\vec{i} + 1.2\vec{j}

g(\alpha) = f\left(x_1 + \alpha \frac{\partial f}{\partial x}\Big|_{(x_1,y_1)},\ y_1 + \alpha \frac{\partial f}{\partial y}\Big|_{(x_1,y_1)}\right)
= f(0.2 + 1.2\alpha,\ -0.2 + 1.2\alpha)
= 2(0.2 + 1.2\alpha)(-0.2 + 1.2\alpha) + 2(0.2 + 1.2\alpha) - (0.2 + 1.2\alpha)^2 - 2(-0.2 + 1.2\alpha)^2
= -1.44\alpha^2 + 2.88\alpha + 0.2

$g'(\alpha) = -2.88\alpha + 2.88 = 0$, $\alpha^* = 1$.
Third iteration:

x_2 = x_1 + \alpha^* \frac{\partial f}{\partial x}\Big|_{(x_1,y_1)} = 0.2 + 1.2 \cdot 1 = 1.4, \quad y_2 = y_1 + \alpha^* \frac{\partial f}{\partial y}\Big|_{(x_1,y_1)} = -0.2 + 1.2 \cdot 1 = 1

...
$(x^*, y^*) = (2, 1)$
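A Python sketch of this practical steepest-ascent iteration. Instead of solving $g'(\alpha) = 0$ analytically as above, it picks $\alpha^*$ from a simple grid search; the grid range, step, and function names are illustrative choices, not part of the notes:

```python
def f(x, y):
    return 2*x*y + 2*x - x**2 - 2*y**2

def grad(x, y):
    return 2*y + 2 - 2*x, 2*x - 4*y            # (df/dx, df/dy)

def best_alpha(x, y, gx, gy, alphas):
    """Pick alpha maximizing g(alpha) = f(x + alpha*gx, y + alpha*gy) over a grid."""
    return max(alphas, key=lambda a: f(x + a*gx, y + a*gy))

x, y = -1.0, 1.0                                # starting point (x0, y0)
alphas = [i * 0.001 for i in range(2001)]       # alpha in [0, 2], illustrative grid
for _ in range(20):
    gx, gy = grad(x, y)
    a = best_alpha(x, y, gx, gy, alphas)
    x, y = x + a*gx, y + a*gy
print(x, y)                                     # approaches the maximum at (2, 1)
```

For this quadratic $f$ the exact steps $\alpha^* = 0.2$ and $\alpha^* = 1$ lie on the grid, so the first iterates match the hand calculation; in general a finer line search (or the analytic solution of $g'(\alpha) = 0$) would be used.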
6 Newton's Method
f(\vec{X}) \approx f(\vec{X}_i) + \nabla f(\vec{X}_i)'(\vec{X} - \vec{X}_i) + \frac{1}{2}(\vec{X} - \vec{X}_i)' H_i (\vec{X} - \vec{X}_i)

where $H_i$ is the Hessian matrix

H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}
At the maximum (or minimum) point, $\frac{\partial f(\vec{X})}{\partial x_j} = 0$ for all $j = 1, 2, \dots, n$, or $\nabla f = \vec{0}$. Then

\nabla f(\vec{X}_i) + H_i(\vec{X} - \vec{X}_i) = 0

If $H_i$ is non-singular,

\vec{X} = \vec{X}_i - H_i^{-1} \nabla f(\vec{X}_i)
Iteration: $\vec{X}_{i+1} = \vec{X}_i - H_i^{-1} \nabla f(\vec{X}_i)$
Example: $f(\vec{X}) = 0.5x_1^2 + 2.5x_2^2$

\nabla f(\vec{X}) = \begin{bmatrix} x_1 \\ 5x_2 \end{bmatrix}

H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}

\vec{X}_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}, \quad \vec{X}_1 = \vec{X}_0 - H^{-1}\nabla f(\vec{X}_0) = \begin{bmatrix} 5 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
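A NumPy sketch of this iteration (the gradient and Hessian match the example above; the function names, tolerance, and iteration cap are illustrative):

```python
import numpy as np

def newton_nd(grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton's iteration X_{i+1} = X_i - H^{-1} grad f(X_i)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(x), grad(x))   # solve H p = grad f rather than inverting H
        x_new = x - step
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: f(X) = 0.5*x1^2 + 2.5*x2^2
grad = lambda x: np.array([x[0], 5.0 * x[1]])
hess = lambda x: np.array([[1.0, 0.0], [0.0, 5.0]])
print(newton_nd(grad, hess, [5.0, 1.0]))           # reaches the minimum [0, 0] in one step
```

Solving the linear system $H p = \nabla f$ instead of forming $H^{-1}$ is the usual numerically preferred choice; for this quadratic example a single step lands exactly on the optimum, as in the hand calculation.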