
MAT3007

Homework 8

Xiying Lu 122090371

Question 1

(a) Proof: Since d = −∇f(x)_j · e_j, where e_j is the j-th standard basis vector, substituting the definition of e_j gives

∇f(x)^T d = ∇f(x)^T (−∇f(x)_j e_j) = −∇f(x)_j · ∇f(x)_j = −(∇f(x)_j)^2,

which is strictly negative provided ∇f(x)_j ≠ 0. Therefore, d is a descent direction of f at x.

(b) Proof: Since d = −∇f(x)/‖∇f(x)‖ (well-defined because ∇f(x) ≠ 0),

∇f(x)^T d = ∇f(x)^T (−∇f(x)/‖∇f(x)‖) = −‖∇f(x)‖^2/‖∇f(x)‖ = −‖∇f(x)‖ < 0.

Therefore, d is a descent direction of f at x.

(c) Proof: Since d_i = −∇f(x)_i / max{∇²f(x)_ii, ε} for each i = 1, …, n,

∇f(x)^T d = Σ_{i=1}^n ∇f(x)_i d_i = −Σ_{i=1}^n (∇f(x)_i)^2 / max{∇²f(x)_ii, ε}.

Since max{∇²f(x)_ii, ε} ≥ ε > 0 for every i ∈ {1, …, n}, each denominator is positive, so d is well-defined. Moreover, because ∇f(x) ≠ 0, every term (∇f(x)_i)^2 / max{∇²f(x)_ii, ε} is nonnegative and at least one of them is strictly positive. Thus

∇f(x)^T d = −Σ_{i=1}^n (∇f(x)_i)^2 / max{∇²f(x)_ii, ε} < 0.

Therefore, 𝒅 is well-defined and a descent direction of 𝑓 at 𝒙.

Question 2

(a) f(x) = (1/2)‖Ax − b‖² + (λ/2)‖x‖²

①Gradient: ∇𝑓(𝒙) = 𝑨𝑻 (𝑨𝒙 − 𝒃) + 𝜆𝒙

②Hessian: ∇2 𝑓(𝒙) = 𝑨𝑻 𝑨 + 𝜆𝑰

where 𝑰 is the 𝑛 × 𝑛 identity matrix.
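
As a quick sanity check of these formulas, here is a minimal MATLAB sketch (using the data A, b, λ from part (c)) that compares the analytic gradient with a central finite-difference approximation:

% Regularized least squares: f(x) = 0.5*||Ax-b||^2 + (lambda/2)*||x||^2
A = [2 1; 1 2]; b = [1; 1]; lambda = 3;      % data from part (c)

f    = @(x) 0.5*norm(A*x - b)^2 + (lambda/2)*norm(x)^2;
grad = @(x) A'*(A*x - b) + lambda*x;         % gradient from part (a)
H    = A'*A + lambda*eye(size(A,2));         % Hessian (constant in x)

% Central finite-difference check of the gradient at a random point
x0 = randn(2,1); h = 1e-6; g_fd = zeros(2,1);
for i = 1:2
    e = zeros(2,1); e(i) = 1;
    g_fd(i) = (f(x0 + h*e) - f(x0 - h*e)) / (2*h);
end
disp(norm(grad(x0) - g_fd))                  % should be close to zero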

(b) Denote the eigenvalues of A^T A as {μ1, …, μn}. Since A^T A is not positive definite, at least one eigenvalue satisfies μi ≤ 0.

The Hessian of f is H_f(x) = A^T A + λI (where I is the n × n identity matrix), so the eigenvalues of H_f(x) are {μ1 + λ, …, μn + λ}.

①Strong Convexity: 𝐻𝑓 (𝒙) is positive definite (all eigenvalues > 0).



This requires μi + λ > 0 for all i, i.e. λ > −min_i(μi), where min_i(μi) is the smallest eigenvalue of A^T A.

②Convexity: 𝐻𝑓 (𝒙) is positive semi-definite (all eigenvalues ≥ 0).

This requires μi + λ ≥ 0 for all i, i.e. λ ≥ −min_i(μi).

③Non-convexity: H_f(x) is not positive semi-definite (at least one eigenvalue < 0).

This happens when λ < −min_i(μi).
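
A minimal MATLAB sketch of this classification, using the data from part (c) purely as an illustration (for that A, the smallest eigenvalue of A^T A is 1, so the thresholds below print as −1):

% Classify f by the eigenvalues of A'A, following the conditions above
A = [2 1; 1 2];                    % data from part (c), used here as an illustration
mu_min = min(eig(A'*A));           % smallest eigenvalue of A'A (here mu_min = 1)

fprintf('strongly convex if lambda >  %g\n', -mu_min);
fprintf('convex          if lambda >= %g\n', -mu_min);
fprintf('non-convex      if lambda <  %g\n', -mu_min);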

(c) (1) Compute the gradient at x0:

A = [2 1; 1 2], b = [1; 1], λ = 3, x0 = [1/2; 1/2], γ = 1/2, σ = 1/2
∇f(x) = A^T(Ax − b) + λx

d0 = −∇f(x0) = −[A^T(Ax0 − b) + λx0] = −[2 1; 1 2]([3/2; 3/2] − [1; 1]) − 3·[1/2; 1/2] = [−3; −3]

f(x0) = (1/2)‖Ax0 − b‖² + (λ/2)‖x0‖² = 1/4 + 3/4 = 1
Backtracking Line Search (Armijo Condition):

The Armijo condition for backtracking line search is given by:

𝑓(𝒙𝟎 + 𝛼0 𝒅𝟎 ) ≤ 𝑓(𝒙𝟎 ) + 𝜎𝛼0 ∇𝑓(𝒙𝟎 ) 𝑇 𝒅𝟎


Choose d0 = [−3; −3]; then ∇f(x0)^T d0 = [3 3][−3; −3] = −18.
So, the Armijo condition becomes:

𝑓(𝒙𝟎 + 𝛼0 𝒅𝟎 ) ≤ 1 − 9𝛼0
Since φ(α) = f(x0 + α d0) − f(x0), we have φ′(α) = ∇f(x0 + α d0)^T d0 and φ′(0) = ∇f(x0)^T d0 = −18,

so we need φ(α0) = f(x0 + α0 d0) − f(x0) ≤ σ α0 φ′(0), with σ = 1/2.

According to the figure, let α0 = 0.05.

x1 = x0 + α0 d0 = [0.5; 0.5] + [−0.15; −0.15] = [0.35; 0.35]
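
A minimal MATLAB sketch of this single backtracking step (the initial trial stepsize 0.1 is an assumption of the sketch; with it, the first accepted step is α0 = 0.05, matching the value above):

% One backtracking (Armijo) step for f(x) = 0.5*||Ax-b||^2 + (lambda/2)*||x||^2
A = [2 1; 1 2]; b = [1; 1]; lambda = 3;
f    = @(x) 0.5*norm(A*x - b)^2 + (lambda/2)*norm(x)^2;
grad = @(x) A'*(A*x - b) + lambda*x;

x0 = [0.5; 0.5];
gamma = 0.5; sigma = 0.5;                % backtracking factor and Armijo parameter
d0 = -grad(x0);                          % steepest-descent direction, d0 = [-3; -3]

alpha = 0.1;                             % initial trial stepsize (assumed)
while f(x0 + alpha*d0) > f(x0) + sigma*alpha*grad(x0)'*d0
    alpha = gamma*alpha;                 % shrink until the Armijo condition holds
end
x1 = x0 + alpha*d0                       % alpha = 0.05, x1 = [0.35; 0.35]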

(2) From the previous parts of the problem, we know the gradient of the objective function at x0 is ∇f(x0) = [3; 3].

We are using a constant step size α = 1/12, so the next iterate is:

x1 = x0 + α d0 = [1/2; 1/2] + [−1/4; −1/4] = [1/4; 1/4]

Now, let's compute the direction 𝒅𝟏 = −∇𝑓(𝒙𝟏 )

d1 = −∇f(x1) = −[A^T(Ax1 − b) + λx1] = −[2 1; 1 2]([3/4; 3/4] − [1; 1]) − 3·[1/4; 1/4] = [0; 0]

∇f(x1)^T d1 = 0
d1 is not a descent direction because the gradient at x1 is zero, indicating that x1 is a stationary point.
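
In fact, x1 = [1/4; 1/4] is the global minimizer: with λ = 3 > 0 the objective is strongly convex, so the unique minimizer solves the first-order condition (A^T A + λI)x = A^T b, and indeed

(A^T A + 3I) x1 = [8 4; 4 8][1/4; 1/4] = [3; 3] = A^T b,

which is why the gradient vanishes at x1.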

Question 3

(a) f(x) = e^(1−x1−x2) + e^(x1+x2−1) + x1² + x1·x2 + x2² + x1 − 3x2

Compute the Gradient:

The gradient of the function 𝑓(𝒙) is the vector of partial derivatives with respect

to 𝑥1 and 𝑥2 :

∂f/∂x1 = −e^(1−x1−x2) + e^(x1+x2−1) + 2x1 + x2 + 1

∂f/∂x2 = −e^(1−x1−x2) + e^(x1+x2−1) + x1 + 2x2 − 3
MATLAB Code:
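
A minimal sketch of the gradient method with constant stepsize (the starting point, the stopping tolerance of 1e-5 on the gradient norm, and the iteration cap are assumptions of this sketch):

% Gradient method with constant stepsize alpha for the objective in 3(a)
f    = @(x) exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + x(1)^2 + x(1)*x(2) + x(2)^2 + x(1) - 3*x(2);
grad = @(x) [-exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + 2*x(1) + x(2) + 1;
             -exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + x(1) + 2*x(2) - 3];

alpha = 0.1;                 % stepsize; rerun with alpha = 1 for case (1)
x     = [0; 0];              % starting point (assumed; use the one given in the problem)
tol   = 1e-5; maxit = 10000; % stopping tolerance and iteration cap (assumed)

k = 0; fs = f(x);
while norm(grad(x)) > tol && k < maxit
    x = x - alpha*grad(x);   % gradient step
    k = k + 1;
    fs(k+1) = f(x);          % record objective values for the plot
end

fprintf('iterations: %d, x* = [%f, %f], f = %f, ||grad|| = %e\n', ...
        k, x(1), x(2), f(x), norm(grad(x)));
plot(0:k, fs, '-o'); xlabel('iteration k'); ylabel('f(x_k)');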

①α=1

The algorithm ends after 6 iterations

Converged point 𝑥 ∗ = [𝑁𝑎𝑁, 𝑁𝑎𝑁]

The final objective function value 𝑓(𝑥 ∗ ) = 𝑁𝑎𝑁

The gradient norm at the converged point = 𝑁𝑎𝑁

Plot:

②α=0.1

The algorithm ends after 120 iterations

Converged point 𝑥 ∗ = [−1.571284, 2.428703]

The final objective function value 𝑓(𝑥 ∗ ) = −2.285680

The gradient norm at the converged point = 9.133687𝑒 −6

Plot:

(b) MATLAB Code:
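
A minimal sketch, assuming part (b) asks for the gradient method with backtracking (Armijo) line search; the parameters σ, γ, the initial trial step, the starting point, and the tolerance are assumptions of this sketch:

% Gradient method with backtracking (Armijo) line search for the same objective
f    = @(x) exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + x(1)^2 + x(1)*x(2) + x(2)^2 + x(1) - 3*x(2);
grad = @(x) [-exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + 2*x(1) + x(2) + 1;
             -exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + x(1) + 2*x(2) - 3];

sigma = 0.1; gamma = 0.5; s = 1;   % Armijo parameter, shrink factor, initial trial step (assumed)
x = [0; 0]; tol = 1e-5; k = 0;     % starting point and tolerance (assumed)

while norm(grad(x)) > tol
    d = -grad(x);                  % steepest-descent direction
    alpha = s;
    while f(x + alpha*d) > f(x) + sigma*alpha*grad(x)'*d
        alpha = gamma*alpha;       % backtrack until the Armijo condition holds
    end
    x = x + alpha*d;
    k = k + 1;
end
fprintf('iterations: %d, x* = [%f, %f], ||grad|| = %e\n', k, x(1), x(2), norm(grad(x)));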



The algorithm ends after 42 iterations

Converged point 𝑥 ∗ = [−1.571285, 2.428706]

The final objective function value 𝑓(𝑥 ∗ ) = −2.285680

The gradient norm at the converged point = 8.239375𝑒 −6

Plot:

(c) The code for both methods is given in parts (a) and (b).

Question 4

MATLAB Code:
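
A minimal sketch, assuming Question 4 applies a (globalized) Newton method with Armijo backtracking to the same objective as Question 3; the Hessian follows from the partial derivatives in 3(a), while the starting point, tolerance, and line-search parameters are assumptions:

% Globalized Newton's method for the objective from Question 3
f    = @(x) exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + x(1)^2 + x(1)*x(2) + x(2)^2 + x(1) - 3*x(2);
grad = @(x) [-exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + 2*x(1) + x(2) + 1;
             -exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1) + x(1) + 2*x(2) - 3];
hess = @(x) (exp(1-x(1)-x(2)) + exp(x(1)+x(2)-1))*ones(2) + [2 1; 1 2];

x = [0; 0]; tol = 1e-5;            % starting point and tolerance (assumed)
sigma = 0.1; gamma = 0.5; k = 0;   % Armijo parameters (assumed)

while norm(grad(x)) > tol
    d = -hess(x) \ grad(x);        % Newton direction (Hessian is positive definite here)
    alpha = 1;                     % try the full Newton step first
    while f(x + alpha*d) > f(x) + sigma*alpha*grad(x)'*d
        alpha = gamma*alpha;       % backtrack if the full step is not acceptable
    end
    x = x + alpha*d;
    k = k + 1;
end
fprintf('iterations: %d, x* = [%f, %f], ||grad|| = %e\n', k, x(1), x(2), norm(grad(x)));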

Results:

The algorithm ends after 14 iterations



Converged point 𝑥 ∗ = [−1.571290, 2.428710]

The final objective function value 𝑓(𝑥 ∗ ) = −2.285680

The gradient norm at the converged point = 6.050006𝑒 −11

Plot:
