
Lecture 10: Subgradient method and projected gradient descent


In topic 6, we will study numerical methods for solving constrained optimization problems:
Subgradient method
Projected gradient descent
The quadratic penalty method
Augmented Lagrangian method
Barrier function method
Problem setting

(P)   min  f(x)
      s.t. g(x) = 0,
           h(x) ≤ 0,
           x ∈ X ⊆ R^n

(D)   max  θ(λ, µ)  over  λ ∈ R^m, µ ∈ R^p_+,
      where θ(λ, µ) := inf_{x ∈ X} f(x) + λᵀg(x) + µᵀh(x)

Assumptions:
X is a compact set in R^n
f : R^n → R, g : R^n → R^m, and h : R^n → R^p are continuous
For solving the dual, consider gradient ascent on θ:

[λ; µ]^(k+1) = [λ; µ]^(k) + t_k ∇θ(λ^(k), µ^(k))

t_k : step length
Difficulty: θ(·, ·) is often not differentiable!
E.g. in example 10.11,
θ(λ) = 5λ − 4   if λ ≤ −1,
       λ − 8    if −1 ≤ λ ≤ 2,
       −3λ      if λ ≥ 2.

Another motivating example

Consider the LASSO model where we apply ℓ1 regularization to linear regression, i.e.

f(x) = (1/m) Σ_{i=1}^m ‖b_i − a_iᵀx‖² + λ‖x‖_1

Here, ‖x‖_1 := Σ_{i=1}^n |x_i| is defined to be the vector one-norm of x, and λ is a regularization parameter.
Compared to the usual regression, this model promotes sparsity in the optimal solution x.
Again, the difficulty in solving this problem is that the objective function is not differentiable.
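As an illustration (not from the notes): the least-squares part of f is differentiable, and any vector whose i-th entry is sign(x_i) (or any value in [−1, 1] where x_i = 0) is a subgradient of ‖x‖_1, so one subgradient of the whole objective is easy to compute. A minimal NumPy sketch, with A holding the rows a_iᵀ:

```python
import numpy as np

def lasso_subgradient(x, A, b, lam):
    """One subgradient of f(x) = (1/m)*||Ax - b||^2 + lam*||x||_1.

    The smooth part contributes its gradient (2/m) A^T (Ax - b); for the
    l1 term we take sign(x_i) in each coordinate (0 where x_i = 0, which
    is a valid choice since any value in [-1, 1] works there).
    """
    m = A.shape[0]
    return (2.0 / m) * A.T @ (A @ x - b) + lam * np.sign(x)
```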
Subgradients

Definition 10.1
Suppose
S ⊆ R^n is a nonempty convex set,
f : S → R is a convex function.
A vector ξ ∈ R^n is a subgradient of f at x̄ ∈ S if

f(x) ≥ f(x̄) + ξᵀ(x − x̄),  ∀ x ∈ S.

The set of all subgradients of f at x̄ is called the subdifferential of f at x̄, denoted by ∂f(x̄), i.e.

∂f(x̄) = {ξ : ξ is a subgradient of f at x̄}.

If f is concave, then ξ ∈ R^n is a subgradient of f at x̄ ∈ S if

f(x) ≤ f(x̄) + ξᵀ(x − x̄),  ∀ x ∈ S.

Example 10.2
Find the subdifferential of f defined by f(x) = |x| for all x ∈ R.

Solution. Note that f(x) = x if x ≥ 0 and f(x) = −x if x < 0, and it is differentiable for all x ≠ 0. Hence

∂f(x) = {1}       if x > 0,
        {−1}      if x < 0,
        [−1, 1]   if x = 0.
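As a small computational aside (my own representation, not from the notes): in one dimension the subdifferential above is an interval, so it can be stored as a pair [lo, hi], and testing whether 0 ∈ ∂f(x) is a simple comparison.

```python
def subdiff_abs(x):
    """Subdifferential of |x| at x, returned as an interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)  # at x = 0, every slope in [-1, 1] supports |x|

lo, hi = subdiff_abs(0.0)
print(lo <= 0.0 <= hi)  # True: 0 is in the subdifferential, so x = 0 minimizes |x|
```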
Subgradient method (part 2)
Some useful propositions

Proposition 1
∂f(x) is a convex set. If f is differentiable at x, then

∂f(x) = {∇f(x)}.

Proposition 2
If f is continuous and convex, then min_{x ∈ R^n} f(x) is attained at x* if and only if 0 ∈ ∂f(x*).

Proposition 3
Let f, g be two convex functions. Under some mild conditions (which are assumed to hold in this course), the subdifferential of f + g is given by

∂(f + g)(x) = ∂f(x) + ∂g(x)

Convex hull

Definition 10.3
Given a set S, C = conv(S) is the convex hull of S if C is the smallest convex set that contains S.
Proposition 4
If S = {v_1, · · · , v_n}, then

conv(S) = { v = Σ_{i=1}^n λ_i v_i : λ_i ≥ 0, Σ_{i=1}^n λ_i = 1 }.
Proposition 5
Suppose f(x) = max{f_1(x), · · · , f_m(x)}, where the f_i are all convex and continuously differentiable functions. Suppose f(x*) = f_1(x*) = · · · = f_j(x*). Then

∂f(x*) = conv({∇f_1(x*), · · · , ∇f_j(x*)})

Example 10.4
Consider f(x) = x² + |x − 1|. Find the subgradient and use it to find the optimal solution.
Solution. x* = 0.5.
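For the record, the calculation behind this answer (using Proposition 3, Example 10.2, and the optimality condition 0 ∈ ∂f(x*)):

∂f(x) = 2x + ∂|x − 1| = {2x − 1}   if x < 1,
                        [1, 3]     if x = 1,
                        {2x + 1}   if x > 1.

Only the first case can contain 0, namely at 2x* − 1 = 0, i.e. x* = 0.5.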
Example 10.5 (C.f. example 10.11)
Try to maximize
θ(λ) = 5λ − 4   if λ ≤ −1,
       λ − 8    if −1 ≤ λ ≤ 2,
       −3λ      if λ ≥ 2.
Note that θ(λ) = min{5λ − 4, λ − 8, −3λ}.
Solution. When λ < −1, θ is continuously differentiable and θ(λ) = 5λ − 4, so ∂θ(λ) = {5}.

When λ = −1, θ(λ) = −max{4 − 5λ, 8 − λ}, and 4 − 5λ = 8 − λ. By proposition 5, ∂θ(−1) = −conv({−5, −1}) = [1, 5].

When −1 < λ < 2, θ is continuously differentiable and θ(λ) = λ − 8, so ∂θ(λ) = {1}.

When λ = 2, θ(λ) = −max{8 − λ, 3λ}, and 8 − λ = 3λ. By proposition 5, ∂θ(2) = −conv({−1, 3}) = [−3, 1].

When λ > 2, θ is continuously differentiable and θ(λ) = −3λ, so ∂θ(λ) = {−3}.

Running through all cases, the only possibility where 0 ∈ ∂θ(λ*) is when λ* = 2.
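This case analysis can be automated: since θ is a pointwise minimum of affine functions, its superdifferential at λ is the convex hull of the slopes of the active pieces, i.e. an interval. A minimal sketch of that check (the tolerance is my own implementation choice):

```python
# theta(lambda) = min{5l - 4, l - 8, -3l}, stored as (slope, intercept) pairs
PIECES = [(5.0, -4.0), (1.0, -8.0), (-3.0, 0.0)]

def superdiff(lam, tol=1e-9):
    """Superdifferential of theta at lam, as an interval (lo, hi)."""
    values = [a * lam + b for a, b in PIECES]
    theta = min(values)
    active = [a for (a, b), v in zip(PIECES, values) if v <= theta + tol]
    return min(active), max(active)

for lam in (-2.0, -1.0, 0.0, 2.0, 3.0):
    lo, hi = superdiff(lam)
    print(lam, (lo, hi), lo <= 0.0 <= hi)
# Only lam = 2 gives an interval containing 0, matching lambda* = 2.
```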
Example 10.6
Find the subgradient of

f(x) = |x1 + x2| + |x1|

at (1, 0), (0, 1), (1, −1), (0, 0). Which of them is a global minimizer?

Solution. (0, 0), since f ≥ 0 everywhere and f(0, 0) = 0 (equivalently, 0 ∈ ∂f(0, 0)).
Subgradient descent/ascent method
In gradient descent,

x^(k+1) = x^(k) − t_k ∇f(x^(k))

For a nonsmooth function, since the gradient may not be defined, we can replace ∇f with an element of the subdifferential.
However, the full subdifferential can be difficult to find. Normally, we just need to choose one subgradient to replace ∇f.

Specify some initial guess x^(0).
For k = 0, 1, · · · ,
    If 0 ∈ ∂f(x^(k)), then stop;
    otherwise,
        Pick v^(k) ∈ −∂f(x^(k))
        x^(k+1) = x^(k) + t_k v^(k)
    End
End
The last x^(k+1) will be the approximate minimizer.
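A minimal Python sketch of this loop in one dimension, applied to Example 10.4's f(x) = x² + |x − 1|; the diminishing step rule t_k = 1/(k + 1), the stopping tolerance, and the iteration cap are my own choices, not part of the notes:

```python
def subgradient_method(subgrad, x0, max_iter=200, tol=1e-8):
    """Subgradient descent: x <- x - t_k * g, with g any element of the subdifferential."""
    x = x0
    for k in range(max_iter):
        g = subgrad(x)            # pick one subgradient at the current point
        if abs(g) < tol:          # 0 is (approximately) in the subdifferential: stop
            break
        x = x - g / (k + 1)       # diminishing step length t_k = 1/(k+1)
    return x

# Example 10.4: f(x) = x^2 + |x - 1|; one subgradient is 2x + sign(x - 1).
def subgrad_f(x):
    return 2.0 * x + (1.0 if x > 1 else -1.0 if x < 1 else 0.0)

print(subgradient_method(subgrad_f, x0=3.0))  # converges to x* = 0.5
```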
Projected gradient method
In topic 6, we will study numerical methods for solving constrained optimization problems:
Subgradient method
Projected gradient descent
The quadratic penalty method
Augmented Lagrangian method
Barrier function method
Problem setting

min f(x),  x ∈ S

f is a convex, differentiable function
S is a convex set, e.g. S = {x | Ax = b, h_j(x) ≤ 0, j = 1, . . . , p}, where the h_j are convex functions.
Difficulty in applying the usual gradient descent algorithm: the iterates may go out of the feasible region.

The projected gradient method is a method that pulls the iterate back to the feasible region, using the basic template of the usual gradient descent algorithm.
Projection on a convex set
Theorem 10.7 (Projection theorem)
Let C be a closed convex set in R^n.
(a) For every z ∈ R^n, there exists a unique minimizer (denoted by Π_C(z) and called the projection of z onto C) of

min { (1/2)‖x − z‖² | x ∈ C }

where ‖ · ‖ is the Euclidean norm.
(b) x* := Π_C(z) is the projection of z onto C if and only if

⟨z − x*, x − x*⟩ ≤ 0  ∀ x ∈ C.

(c) For any z, w ∈ R^n,

‖Π_C(z) − Π_C(w)‖ ≤ ‖z − w‖.


Example 10.8
For y ∈ R², find Π_S(y) for S = {‖x‖ ≤ 1}.

Solution. Π_S(y) is the KKT solution of the problem

min_x (1/2)‖x − y‖²  s.t. ‖x‖² ≤ 1.

The KKT system gives

x − y + µ(2x) = 0,  ‖x‖² ≤ 1,  µ ≥ 0,  µ(‖x‖² − 1) = 0.

Case 1. y ∈ S. Then Π_S(y) = y.

Case 2. y ∉ S. We have x = y / (1 + 2µ). Also, µ ≠ 0. This implies the inequality constraint must be active, i.e.

‖x‖ = 1  ⇔  ‖y‖ = 1 + 2µ  ⇒  x = y / ‖y‖.

Hence, Π_S(y) = y          if ‖y‖ ≤ 1,
                y / ‖y‖    otherwise.
Example 10.9
For y ≥ 0 ∈ R², find Π_S(y) for S = {0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1}.

Solution. Π_S(y) = y          if y ∈ S,
                   (y1, 1)    if 0 ≤ y1 ≤ 1, 1 < y2,
                   (1, y2)    if 1 < y1, 0 ≤ y2 ≤ 1,
                   (1, 1)     otherwise.
Example 10.10
Given a ≠ 0. For y ∈ R^d, find Π_S(y) for S = {aᵀx + b ≤ 0}.

Solution. Π_S(y) = y                           if y ∈ S,
                   y − ((aᵀy + b)/‖a‖²) a      otherwise.
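The three projection formulas above translate directly into code. A minimal NumPy sketch (the function names are mine):

```python
import numpy as np

def proj_unit_ball(y):
    """Projection onto {x : ||x|| <= 1} (example 10.8)."""
    nrm = np.linalg.norm(y)
    return y if nrm <= 1.0 else y / nrm

def proj_box(y, lo=0.0, hi=1.0):
    """Projection onto the box {lo <= x_i <= hi} (example 10.9): componentwise clipping."""
    return np.clip(y, lo, hi)

def proj_halfspace(y, a, b):
    """Projection onto {x : a^T x + b <= 0} (example 10.10)."""
    viol = a @ y + b
    return y if viol <= 0.0 else y - (viol / (a @ a)) * a
```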
Steepest descent method vs projected gradient descent

Steepest descent method, for solving min_{x ∈ R^n} f(x):

[Step 0] Select x^(0), and ε > 0.
[Step k] For k = 0, 1, 2, 3, · · · ,
  (a) d^(k) := −∇f(x^(k)).
  (b) If ‖d^(k)‖ < ε, stop.
  (c) Else,
      (i) Choose fixed t_k, or solve t_k = arg min_{t ≥ 0} f(x^(k) + t d^(k)).
      (ii) x^(k+1) = x^(k) + t_k d^(k).

Projected gradient descent, for solving min_{x ∈ S} f(x):

[Step 0] Select x^(0), and ε > 0.
[Step k] For k = 0, 1, 2, 3, · · · ,
  (a) d^(k) := −∇f(x^(k)).
  (b) If ‖x^(k+1) − x^(k)‖ < ε, stop.
  (c) Else,
      (i) Choose fixed t_k, or solve t_k = arg min_{t ≥ 0} f(x^(k) + t d^(k)).
      (ii) x^(k+1) = Π_S(x^(k) + t_k d^(k)).

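A minimal Python sketch of the projected gradient descent template with a fixed step length; the step size, tolerance, and iteration cap are placeholder choices of mine:

```python
import numpy as np

def projected_gradient_descent(grad, proj, x0, t=0.1, eps=1e-8, max_iter=1000):
    """Projected gradient descent: x <- Proj_S(x - t * grad f(x))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = proj(x - t * grad(x))   # gradient step, then pull back into S
        if np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x

# Example 10.11 below: min x1 + x2 over the unit ball.
grad = lambda x: np.array([1.0, 1.0])
proj = lambda y: y if np.linalg.norm(y) <= 1.0 else y / np.linalg.norm(y)
print(projected_gradient_descent(grad, proj, x0=np.zeros(2)))
# converges to -(1, 1)/sqrt(2), the minimizer on the unit ball
```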
Example 10.11
min f(x) := x1 + x2,  x1² + x2² ≤ 1
Suppose x^(0) = 0; find x^(1) using the projected gradient descent method with step size t0 = 1.
 
Solution. x^(1) = Π_S(x^(0) − t0 ∇f(x^(0))), where S = {x | ‖x‖ ≤ 1}.

x^(0) − t0 ∇f(x^(0)) = 0 − [1; 1] = −[1; 1].

Since −[1; 1] ∉ S, the projection formula given by example 10.8 shows:

x^(1) = Π_S(−[1; 1]) = −(1/√2) [1; 1].
Example 10.12

min f(x) := (1/2)‖Ax − b‖²,  0 ≤ x1, x2 ≤ 1.

Suppose x^(0) = 0; find x^(1) using the projected gradient descent method with the optimal step size if

A = [2 0; 0 1],  b = [1; 2].

Solution. x^(1) = [0.8; 0.8].
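As a check on this answer (the calculation is mine, not in the notes): ∇f(x^(0)) = Aᵀ(Ax^(0) − b) = [−2; −2], exact line search along d^(0) = [2; 2] gives t0 = 0.4, and x^(0) + t0 d^(0) = [0.8; 0.8] already lies in the box, so the projection leaves it unchanged. A short numerical verification:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 2.0])
x0 = np.zeros(2)

d = -(A.T @ (A @ x0 - b))                              # steepest descent direction [2, 2]
t0 = (d @ A.T @ (b - A @ x0)) / (d @ (A.T @ A @ d))    # exact line search for a quadratic
x1 = np.clip(x0 + t0 * d, 0.0, 1.0)                    # projection onto the box [0, 1]^2
print(t0, x1)                                          # 0.4 [0.8 0.8]
```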
