Lecture 9

Gradient search

Quasi-Newton methods

• Idea: Instead of using the true Hessian ∇²f(x_k) in the Newton direction, use an
  approximation B_k (symmetric positive definite):

      f(x_k + p) ≈ f(x_k) + ∇f(x_k)ᵀ p + ½ pᵀ B_k p

• B_k is constructed at each step from B_{k-1} and the gradient information
  ∇f(x_k) and ∇f(x_{k-1})
• It is common to develop the iteration formula directly in terms of B_k⁻¹ = A_k:

      A_{k+1} = A_k + ΔA_k   (ΔA_k is the update term)

• Direction:
      p_k = -A_k ∇f(x_k)
• Eventually B_k converges to ∇²f(x_k)
• Quasi-Newton methods give superlinear convergence (a generic sketch of the
  iteration in code follows below)
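To make the iteration concrete, here is a minimal Python/NumPy sketch of the generic quasi-Newton loop. It is my own illustration, not from the slides: the gradient `grad`, starting point `x0`, and `update` rule are supplied by the caller, and the fixed step `alpha` stands in for a proper line search.

```python
import numpy as np

def quasi_newton(grad, x0, update, alpha=1.0, tol=1e-6, max_iter=200):
    """Generic quasi-Newton loop using an inverse-Hessian approximation A_k.

    `update(A, dx, dg)` returns A_{k+1} (e.g. a DFP or BFGS update).
    `alpha` is kept fixed here for brevity; a real code would use a line search.
    """
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)                       # A_0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:          # stop when the gradient is small
            break
        p = -A @ g                           # search direction p_k = -A_k grad f(x_k)
        x_new = x + alpha * p                # take the step
        g_new = grad(x_new)
        A = update(A, x_new - x, g_new - g)  # build A_{k+1} from dx_k, dg_k
        x, g = x_new, g_new
    return x
```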

Quasi-Newton methods
• DFP (Davidon-Fletcher-Powell) method¹,²

      A_{k+1} = A_k + (Δx_k Δx_kᵀ)/(Δx_kᵀ Δg_k) - (A_k Δg_k Δg_kᵀ A_k)/(Δg_kᵀ A_k Δg_k)

• BFGS (Broyden-Fletcher-Goldfarb-Shanno) method³⁻⁶

      A_{k+1} = (Δx_k Δx_kᵀ)/(Δx_kᵀ Δg_k)
                + (I - (Δx_k Δg_kᵀ)/(Δx_kᵀ Δg_k)) A_k (I - (Δg_k Δx_kᵀ)/(Δx_kᵀ Δg_k))

with A_0 = I, Δx_k = x_{k+1} - x_k, Δg_k = ∇f(x_{k+1}) - ∇f(x_k)


1. Davidon WC, AEC Res. Develop. Rep. ANL-5990, 1959
2. Fletcher R and Powell MJD, Computer J., 163, 1963
3. Broyden CG, J. Inst. Math. Appl., 76-90, 1970
4. Fletcher R, Computer J., 317-322, 1970
5. Goldfarb D, Math. Prog., 94-110, 1977
6. Shanno DF, Math. Comput., 647-657, 1970
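As a companion to the loop sketched earlier, here is a NumPy rendering of the two inverse-Hessian updates above. It is a sketch of my own, assuming `dx` and `dg` are the NumPy arrays Δx_k and Δg_k.

```python
import numpy as np

def dfp_update(A, dx, dg):
    """DFP update of the inverse-Hessian approximation A_k."""
    Adg = A @ dg
    return A + np.outer(dx, dx) / (dx @ dg) - np.outer(Adg, Adg) / (dg @ Adg)

def bfgs_update(A, dx, dg):
    """BFGS update of the inverse-Hessian approximation A_k."""
    rho = 1.0 / (dx @ dg)
    V = np.eye(len(dx)) - rho * np.outer(dx, dg)   # I - dx dg^T / (dx^T dg)
    return V @ A @ V.T + rho * np.outer(dx, dx)
```

Either function can be passed as the `update` argument of the `quasi_newton` sketch above, with A_0 = I supplied automatically by the loop.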
Quasi-Newton methods
• Both updating formulas preserve symmetry and positive definiteness.
• They satisfy the descent property.
• They require storage of the N×N matrix A_k.
• The methods are robust and work on a wide variety of practical problems.
• The DFP method suffers from the practical difficulty of A_{k+1} becoming
  ill-conditioned. The work-around is to restart with A_k = I.
• BFGS requires fewer restarts and is less dependent on exact line searches.

Application to motivating example
DFP method
After 3 iterations: x* = [14.11; 8.58], f* = -7.36
Analytical solution: x* = [14.14; 8.58], f* = -7.36

1. Initial guess x_0 = [4; 4]
2. s_1 = -∇f(x_0) = [0.65; 0.39]
3. α_1 = 13.16
4. x_1 = [12.58; 9.14]
5. A_1 = [10.12 5.31; 5.31 4.04]
6. s_2 = [0.09; -0.10]
7. α_2 = 7.00
8. x_2 = [13.20; 8.43]
9. A_2 = [9.11 -2.14; -2.14 4.37]
10. s_3 = [0.27; 0.05]
11. α_3 = 3.35
12. x_3 = [14.11; 8.58]

(Figure: contour plot of the objective with the DFP iterates overlaid.)
Application to motivating example
BFGS method
After 3 iterations: x* = [14.11; 8.62], f* = -7.36
Analytical solution: x* = [14.14; 8.58], f* = -7.36

1. Initial guess x_0 = [4; 4]
2. s_1 = -∇f(x_0) = [0.65; 0.39]
3. α_1 = 13.16
4. x_1 = [12.58; 9.14]
5. A_1 = [10.13 5.29; 5.29 4.06]
6. s_2 = [0.09; -0.105]
7. α_2 = 6.79
8. x_2 = [13.20; 8.43]
9. A_2 = [18.62 0.07; 0.07 4.89]
10. s_3 = [0.68; 0.15]
11. α_3 = 1.33
12. x_3 = [14.11; 8.62]

(Figure: contour plot of the objective with the BFGS iterates overlaid.)
Selection of step length α_k
• Exact line search: Minimize f(x_k + α_k p_k) using univariate optimization
  (see the code sketch below)
  • Can be computationally expensive
• Inexact line search: Select α_k so that it achieves a 'reasonable' reduction
  in f along the search direction p_k
• Trade-off: substantial decrease in f versus the computational effort involved

Need conditions on α_k to select an appropriate step length
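For the exact-line-search option, one minimal sketch is to hand φ(α) = f(x_k + α p_k) to a univariate optimizer; here scipy.optimize.minimize_scalar is used, and the bracket (0, alpha_max) and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search(f, x, p, alpha_max=20.0):
    """Exact line search: minimize phi(alpha) = f(x + alpha*p) over [0, alpha_max]."""
    phi = lambda alpha: f(x + alpha * p)
    result = minimize_scalar(phi, bounds=(0.0, alpha_max), method="bounded")
    return result.x                        # step length alpha_k
```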

Conditions on step length
• Simple condition: require a reduction, f(x_k + α_k p_k) < f(x_k)
• Armijo condition (AC): If p_k is a descent direction, α_k should give a
  sufficient decrease in f, i.e.

      f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)ᵀ p_k

  for some c_1 ∈ (0, 1)
• The reduction in f should be proportional to both the step length and the
  directional derivative ∇f(x_k)ᵀ p_k
• In practice, c_1 is taken as a small value (say 10⁻⁴)
• The AC alone is not sufficient to ensure reasonable progress, since it is
  satisfied for all sufficiently small values of α_k

Armijo condition – graphical interpretation
• f(x) = (x_1 - 2)⁴ + (x_1 - 2)² x_2² + (x_2 + 1)²,  x_k = [1; 1],  c_1 = 0.2
• p_k = -∇f(x_k) = [6; -6]
• The Armijo condition is satisfied for α ≤ 0.32 (checked numerically in the
  sketch below)
• Any small step α → 0 satisfies this condition
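A quick numerical check of this slide's numbers (a sketch; the objective is the reconstruction written above, and the grid search is purely illustrative):

```python
import numpy as np

# Example objective and its gradient (as reconstructed above)
f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])

x = np.array([1.0, 1.0])
p = -grad(x)                       # steepest-descent direction, [6, -6]
c1 = 0.2
slope = grad(x) @ p                # directional derivative (negative for descent)

alphas = np.linspace(1e-4, 1.0, 10001)
armijo = np.array([f(x + a*p) <= f(x) + c1*a*slope for a in alphas])
print("Armijo condition holds up to alpha ~", alphas[armijo].max())   # ~0.32
```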


Curvature condition

      ∇f(x_k + α_k p_k)ᵀ p_k ≥ c_2 ∇f(x_k)ᵀ p_k,   c_2 ∈ (c_1, 1)

• Rules out very small steps.
• Typically c_2 = 0.9 for Newton-type methods and 0.1 for conjugate gradient methods
• For the example above with c_1 = 0.2, c_2 = 0.4: the curvature condition gives
  α ≥ 0.07, so together with the Armijo condition 0.07 ≤ α ≤ 0.32 (verified in the
  sketch below)
• Exact line search: α* = 0.25
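The curvature bound and the combined interval quoted on this slide can be checked the same way. This is a self-contained sketch that repeats the example setup from above.

```python
import numpy as np

f = lambda x: (x[0] - 2)**4 + (x[0] - 2)**2 * x[1]**2 + (x[1] + 1)**2
grad = lambda x: np.array([4*(x[0] - 2)**3 + 2*(x[0] - 2)*x[1]**2,
                           2*(x[0] - 2)**2 * x[1] + 2*(x[1] + 1)])

x = np.array([1.0, 1.0]); p = -grad(x)
c1, c2 = 0.2, 0.4
slope0 = grad(x) @ p

alphas = np.linspace(1e-4, 1.0, 10001)
armijo    = np.array([f(x + a*p) <= f(x) + c1*a*slope0 for a in alphas])
curvature = np.array([grad(x + a*p) @ p >= c2*slope0 for a in alphas])
both = alphas[armijo & curvature]
print("curvature: alpha >= %.2f;  both: %.2f <= alpha <= %.2f"
      % (alphas[curvature].min(), both.min(), both.max()))   # ~0.07 and ~[0.07, 0.32]
```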
Wolfe conditions
• Armijo condition and curvature condition together:

      f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)ᵀ p_k
      ∇f(x_k + α_k p_k)ᵀ p_k ≥ c_2 ∇f(x_k)ᵀ p_k

  with 0 < c_1 < c_2 < 1

• Strong Wolfe conditions: to bring α_k closer to a local minimizer

      f(x_k + α_k p_k) ≤ f(x_k) + c_1 α_k ∇f(x_k)ᵀ p_k
      |∇f(x_k + α_k p_k)ᵀ p_k| ≤ c_2 |∇f(x_k)ᵀ p_k|

  with 0 < c_1 < c_2 < 1

Small c_1 and large c_2 lead to a large bound on α (a generic check is sketched below)
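A compact check of the (strong) Wolfe conditions, written as a sketch of my own: `f`, `grad`, `x`, `p` are assumed to be supplied by the caller, and the defaults c_1 = 10⁻⁴, c_2 = 0.9 follow the values quoted earlier for Newton-type methods.

```python
import numpy as np

def wolfe_ok(f, grad, x, p, alpha, c1=1e-4, c2=0.9, strong=False):
    """Return True if `alpha` satisfies the Wolfe (or strong Wolfe) conditions."""
    slope0 = grad(x) @ p                                   # grad f(x_k)^T p_k, < 0 for descent
    sufficient = f(x + alpha*p) <= f(x) + c1*alpha*slope0  # Armijo condition
    slope_a = grad(x + alpha*p) @ p                        # grad f(x_k + alpha p_k)^T p_k
    if strong:
        curvature = abs(slope_a) <= c2 * abs(slope0)       # strong Wolfe curvature
    else:
        curvature = slope_a >= c2 * slope0                 # standard curvature condition
    return bool(sufficient and curvature)
```

With the example objective defined earlier, wolfe_ok(f, grad, x, p, 0.25, c1=0.2, c2=0.4) should return True, consistent with the exact-line-search step α* = 0.25.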

Strong Wolfe conditions
• For the example above with c_1 = 0.2, c_2 = 0.4: 0.07 ≤ α ≤ 0.31
• The strong Wolfe conditions eliminate the region 0.31 ≤ α ≤ 0.32, where
  ∇f(x_k + α_k p_k)ᵀ p_k is large and positive
Why c_1 < c_2?
• With c_1 = 0.5 and c_2 = 0.2 (i.e. c_1 > c_2), the Armijo condition requires
  α ≤ 0.12 while the curvature condition requires α ≥ 0.14
• No feasible α exists

Backtracking algorithm
• Based on the Armijo condition (see the sketch after this list)
1. Choose ᾱ > 0 and ρ, c_1 ∈ (0, 1)
2. Set α = ᾱ
3. Is f(x_k + α p_k) ≤ f(x_k) + c_1 α ∇f(x_k)ᵀ p_k satisfied?
   a. Yes: set α_k = α
   b. No: set α = ρα and go back to step 3
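A minimal sketch of the backtracking loop above; the defaults ᾱ = 1, ρ = 0.5, c_1 = 10⁻⁴ are illustrative choices, not prescribed by the slide, and a cap on the number of reductions is added as a safeguard.

```python
import numpy as np

def backtracking(f, grad, x, p, alpha_bar=1.0, rho=0.5, c1=1e-4, max_reductions=50):
    """Backtracking line search based on the Armijo condition."""
    alpha = alpha_bar
    slope = grad(x) @ p                                  # grad f(x_k)^T p_k
    for _ in range(max_reductions):
        if f(x + alpha*p) <= f(x) + c1*alpha*slope:      # Armijo satisfied?
            break
        alpha *= rho                                     # shrink the step and retry
    return alpha
```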

How to select ᾱ
• For Newton-type methods, ᾱ = 1
• Choice 1: Assume that the first-order change in the function across two
  iterates is the same:

      ᾱ ∇f(x_k)ᵀ p_k = α_{k-1} ∇f(x_{k-1})ᵀ p_{k-1}

      ᾱ = α_{k-1} ∇f(x_{k-1})ᵀ p_{k-1} / (∇f(x_k)ᵀ p_k)

• Choice 2: Fit a quadratic through f(x_{k-1}), f(x_k) and ∇f(x_k)ᵀ p_k, and
  take ᾱ as its minimizer (both choices are sketched in code below):

      ᾱ = 2 (f(x_k) - f(x_{k-1})) / (∇f(x_k)ᵀ p_k)
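Both choices can be written down directly. This is a sketch with my own argument names: `slope_prev` and `slope_curr` stand for ∇f(x_{k-1})ᵀ p_{k-1} and ∇f(x_k)ᵀ p_k.

```python
def initial_step(alpha_prev, slope_prev, slope_curr, f_prev=None, f_curr=None, choice=1):
    """Initial trial step for an inexact line search (use 1.0 for Newton-type methods)."""
    if choice == 1:
        # Choice 1: equate the first-order change in f across two iterates
        return alpha_prev * slope_prev / slope_curr
    # Choice 2: minimizer of the quadratic fit through f(x_{k-1}), f(x_k) and slope_curr
    return 2.0 * (f_curr - f_prev) / slope_curr
```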

Newton with inexact line search
After 4 iterations: x* = [14.13; 8.58], f* = -7.36
Analytical solution: x* = [14.14; 8.58], f* = -7.36

1. Initial guess x_0 = [4; 4]
2. α_0 = 0.25
3. x_1 = [9.64; 9.09]
4. α_1 = 1.00
5. x_2 = [12.39; 8.71]
6. α_2 = 1.00
7. x_3 = [13.83; 8.61]
8. α_3 = 1.00
9. x_4 = [14.13; 8.58]
