exam2018
ID
STUDENT NAME
Wait for the start of the exam before turning to the next page. This document is
printed double-sided, 18 pages.
• Place on your desk: your student ID, writing utensils, one double-sided A4 page cheat sheet
(handwritten or 11pt min font size) if you have one; place all other personal items below your
desk or on the side.
• For technical reasons, use only black or blue pens for the MCQ part, no pencils! Use
white corrector if necessary.
Newton-Raphson method
An easy method for computing the square root of a real number y > 0 by hand is as follows: starting from a guess $x_0$, repeatedly average the current guess with y divided by it, $x_{t+1} := \frac{1}{2}(x_t + y/x_t)$. For example, for $y = 17$ and $x_0 = 4$ this gives $x_1 = \frac{1}{2}(4 + 17/4) = 4.125$.
This is an instance of the Newton-Raphson method, which defines the sequence {xt }t≥0 of real numbers
by the following equation:
$$x_{t+1} := x_t - \frac{f(x_t)}{f'(x_t)}. \qquad (1)$$
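As a sanity check, recursion (1) is easy to run numerically; a minimal sketch in Python (the function names are illustrative, and the choice $f(z) = z^2 - y$ for extracting square roots is an assumption of this sketch):

```python
# Generic Newton-Raphson iteration, Equation (1): x_{t+1} = x_t - f(x_t)/f'(x_t).
def newton_raphson(f, fprime, x0, steps):
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

# Illustrative choice (an assumption of this sketch): a zero of f(z) = z^2 - y
# is sqrt(y), so the recursion computes square roots.
y = 17.0
x = newton_raphson(lambda z: z * z - y, lambda z: 2.0 * z, x0=4.0, steps=5)
```

Starting from $x_0 = 4$ with $y = 17$, the first iterate is 4.125.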
Question 1 What is the function f(z) of which we aim to find a zero in the example above?
$\sqrt{z} - 17$
$z^2$
$z^2 - 17$
$\sqrt{z}$
Question 2 Now, suppose we are not happy with the solution $x_1 = 4.125$, because $x_1^2 = 17.015625$ is not accurate enough. What is the next iterate $x_2$ in the sequence (for $y = 17$, $x_0 = 4$ as above)?
Use the following values: $\frac{17.015625}{4.125} = 4.125$, $\frac{0.015625}{4.125} \approx 0.0038$.
4.1288
4.1269
4.1212
4.1231
Question 3 How many iterations do you (roughly) have to perform to compute the correct 16 significant digits in the above example ($y = 17$, $x_0 = 4$)?
$10^{16}$
$10^{\sqrt{16}} = 10^4$
$16$
$16/2 = 8$
$\sqrt{16} = 4$
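Since Newton's method converges quadratically near a simple root, the number of correct digits roughly doubles each iteration. A small experiment (the stopping threshold, chosen for 64-bit floats, is an implementation assumption) counts the iterations for $y = 17$, $x_0 = 4$:

```python
import math

# Newton iteration for f(z) = z^2 - 17, counting steps until the iterate
# matches sqrt(17) to roughly 16 significant digits (double precision).
target = math.sqrt(17.0)
x, iterations = 4.0, 0
while abs(x - target) > 2e-15:
    x = x - (x * x - 17.0) / (2.0 * x)
    iterations += 1
```

The successive errors are roughly 1.2e-1, 1.9e-3, 4.4e-7, 2.3e-14, then machine precision, so only a handful of iterations are needed.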
$x_{t+1} := x_t - \gamma f(x_t)$
$x_{t+1} := x_t - \gamma f'(x_t)$
$x_{t+1} := x_t - \gamma x_t$
$x_{t+1} := x_t - \gamma (f'(x_t))^{-1} f(x_t)$
$x_{t+1} := x_t - \gamma (f''(x_t))^{-1} f'(x_t)$
For n = 1, how does this optimization method relate to the Newton-Raphson method from Equation (1) from the previous section?
$f = g'$
$f'' = g$
$f' = g$
$f = g''$
Question 6 Consider a quadratic function $g : \mathbb{R}^n \to \mathbb{R}$ of the form $g(x) = -\frac{1}{2} x^\top A x + b^\top x + c$, where $A \in \mathbb{R}^{n \times n}$ is a symmetric matrix. What are necessary and sufficient conditions for g to be convex?
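Since $\nabla^2 g(x) = -A$ is constant here, a candidate condition can be probed numerically through the eigenvalues of $-A$; a small sketch (the test matrices are illustrative assumptions):

```python
import numpy as np

def is_convex_quadratic(A, tol=1e-10):
    # g(x) = -0.5 x^T A x + b^T x + c has constant Hessian -A (A symmetric),
    # so g is convex exactly when -A is positive semidefinite.
    return bool(np.all(np.linalg.eigvalsh(-np.asarray(A)) >= -tol))

A_convex = -np.eye(3)      # -A = I is PSD, so g is convex
A_nonconvex = np.eye(3)    # -A = -I is negative definite, so g is not convex
```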
Coordinate Descent
Question 7 Consider the least squares objective function
$$f(x) := \tfrac{1}{2}\,\|Ax - b\|^2, \qquad (2)$$
for an $m \times n$ matrix $A = [a_1, \dots, a_n]$ with columns $a_i$.
What is the gradient $\nabla f(x)$?
$A$
$A^\top A x$
$A^\top (Ax - b)$
$A (A^\top x - b)$
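Any of the candidate expressions can be validated against finite differences on a random instance; the sketch below tests $A^\top(Ax - b)$, one of the listed options (sizes and data are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A, b = rng.normal(size=(m, n)), rng.normal(size=m)
x = rng.normal(size=n)

f = lambda z: 0.5 * np.linalg.norm(A @ z - b) ** 2
candidate = A.T @ (A @ x - b)            # candidate gradient expression

# Central finite differences, coordinate by coordinate.
eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
```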
Question 8 We are now interested in the complexity of computing the gradient of f as in Equation (2). Each addition or multiplication of two real numbers counts as one operation. How expensive is it to compute the full gradient, given x?
(Note that here $\Theta(k)$ refers to a function growing at least and at most as fast as k in the variables of concern.)
$\Theta(n + m)$
$\Theta(n^2 m^2)$
$\Theta(n)$
$\Theta(mn)$
$\Theta(m^2 n)$
$\Theta(mn^2)$
$\Theta(m)$

$\Theta(n)$
$\Theta(m^2 n)$
$\Theta(m)$
$\Theta(n^2 m^2)$
$\Theta(n + m)$
$\Theta(mn)$
$\Theta(mn^2)$
$L_i = \lambda_{\max}(A^\top A)$
$L_i = \|a_i\|^2$
$L_i = \lambda_{\max}(A^\top A)/n$
$L_i = A^\top A$
$L_i = \|a_i\|$
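For intuition, here is a sketch of coordinate descent on objective (2), using the column norms $\|a_i\|^2$ as coordinate-wise stepsizes; treating $\|a_i\|^2$ as the coordinate-wise smoothness constant is an assumption of this sketch (it is one of the candidates above):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A, b = rng.normal(size=(m, n)), rng.normal(size=m)

x = np.zeros(n)
col_sq = np.sum(A * A, axis=0)        # ||a_i||^2 for each column i
for t in range(1000):
    i = t % n                         # cyclic sweep over the coordinates
    grad_i = A[:, i] @ (A @ x - b)    # i-th partial derivative of (2)
    x[i] -= grad_i / col_sq[i]        # stepsize 1/L_i with L_i = ||a_i||^2

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
```

On this well-conditioned random instance the iterates approach the least-squares solution.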
Frank-Wolfe
Consider the linear minimization oracle (LMO) for matrix completion, that is, for
$$\min_{Y \in \mathcal{X} \subseteq \mathbb{R}^{n \times m}} \sum_{(i,j) \in \Omega} (Z_{ij} - Y_{ij})^2$$
when $\Omega \subseteq [n] \times [m]$ is the set of observed entries from a given matrix Z. Our optimization domain $\mathcal{X}$ is the unit ball of the trace norm (or nuclear norm), which is known to be the convex hull of the rank-1 matrices
$$\mathcal{X} := \operatorname{conv}(\mathcal{A}) \quad \text{with} \quad \mathcal{A} := \left\{ u v^\top \;\middle|\; u \in \mathbb{R}^n,\ \|u\|_2 = 1,\; v \in \mathbb{R}^m,\ \|v\|_2 = 1 \right\}.$$
Question 11 Consider the LMO for this set $\mathcal{X}$ for a gradient at iterate $Y \in \mathbb{R}^{n \times m}$ (derive it if necessary). Compare the computational cost needed to compute the LMO with that of computing the projection onto $\mathcal{X}$.
Hint: Assume that the Singular Value Decomposition of an $n \times m$ matrix takes time $\Theta(n^2 m)$, and computing the top singular vector takes time $\Theta(nm)$.
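For reference, over the trace-norm ball one has $\min_{S \in \mathcal{X}} \langle S, G\rangle = -\sigma_{\max}(G)$, attained at $S = -u_1 v_1^\top$ for the top singular pair of G. A sketch (it uses a full SVD for simplicity; a power method computing only the top pair would match the $\Theta(nm)$ cost in the hint):

```python
import numpy as np

def lmo_trace_norm_ball(G):
    # argmin of <S, G> over {S : ||S||_tr <= 1} is -u1 v1^T,
    # where (u1, v1) is the top singular pair of G.
    U, s, Vt = np.linalg.svd(G)
    return -np.outer(U[:, 0], Vt[0, :])

rng = np.random.default_rng(2)
G = rng.normal(size=(4, 6))      # stand-in for a gradient at some iterate Y
S = lmo_trace_norm_ball(G)
```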
Smoothness and Strong Convexity
Consider an iterative optimization procedure.
Question 12 Which one of the following three inequalities is valid for a smooth convex function f:
$f(x_{t+1}) - f(x_t) \le \nabla f(x_t)^\top (x_{t+1} - x_t) + \frac{L}{2} \|x_{t+1} - x_t\|^2$
$f(x_{t+1}) - f(x_t) \le \nabla f(x_t)^\top (x_{t+1} - x_t) - \frac{L}{2} \|x_{t+1} - x_t\|^2$
$f(x_{t+1}) - f(x_t) \le \nabla f(x_t)^\top (x_t - x_{t+1}) + \frac{L}{2} \|x_{t+1} - x_t\|^2$
Question 13 Which one of the following three inequalities is valid for a strongly convex function f:
$f(x_t) - f(x^\star) \ge \nabla f(x_t)^\top (x_t - x^\star) + \frac{\mu}{2} \|x_t - x^\star\|^2$
$f(x_t) - f(x^\star) \le \nabla f(x_t)^\top (x_t - x^\star) - \frac{\mu}{2} \|x_t - x^\star\|^2$
$f(x_t) - f(x^\star) \le \nabla f(x_t)^\top (x_t - x^\star) + \frac{\mu}{2} \|x_t - x^\star\|^2$
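Both defining bounds can be probed numerically on a function with known constants, e.g. $f(x) = \frac{1}{2} x^\top Q x$ with $L = \lambda_{\max}(Q)$ and $\mu = \lambda_{\min}(Q)$; the sketch below spot-checks them at random point pairs (an illustration of the definitions, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
Q = np.diag([0.5, 1.0, 4.0])      # f(x) = 0.5 x^T Q x, grad f(x) = Qx
L, mu = 4.0, 0.5                  # largest and smallest eigenvalues of Q
f = lambda z: 0.5 * (z @ Q @ z)
grad = lambda z: Q @ z

for _ in range(100):
    x, y = rng.normal(size=3), rng.normal(size=3)
    quad = (y - x) @ (y - x)
    # Smoothness upper bound and strong-convexity lower bound at (x, y):
    assert f(y) - f(x) <= grad(x) @ (y - x) + 0.5 * L * quad + 1e-9
    assert f(y) - f(x) >= grad(x) @ (y - x) + 0.5 * mu * quad - 1e-9
```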
Random search
Question 14 Consider derivative-free random search, with line-search, as discussed in the lecture.
Figure 1: Performance of different optimization algorithms.
None
Gradient Descent (with correct stepsize)
Accelerated Gradient Method (with correct parameters)
Newton’s optimization method
None
Newton’s optimization method
Accelerated Gradient Method (with correct parameters)
Gradient Descent (with correct stepsize)
TRUE FALSE
Question 20 (Convexity) The triangle inequality and homogeneity of a norm together imply that
any norm is convex.
TRUE FALSE
TRUE FALSE
Question 22 (Differentiability) The function $\max(0, x)$ is differentiable over $\mathbb{R}$.
TRUE FALSE
Question 23 (Coordinate Descent) Consider Coordinate Descent on a strongly convex and smooth
objective function. depending on the coordinate-wise smoothness constants Li (i.e. different Lipschitz
TRUE FALSE
Question 24 (Coordinate Descent) In the same setting, if we sample coordinate i uniformly and use stepsize $1/L_i$, convergence is typically faster than CD with fixed stepsize.
TRUE FALSE
However we do not care about solving this problem exactly, but have a small leeway of magnitude
ε ≥ 0. To make this more mathematically precise, let us define some notation.
The distance between a set $C \subseteq \mathbb{R}^d$ and any point $y \in \mathbb{R}^d$ is defined as
$$d(C, y) \overset{\text{def}}{=} \min_{w \in C} \|w - y\|_2.$$
We only want to distinguish between the following two cases for any $\varepsilon > 0$:
(N) The intersection of the sets is non-empty, i.e. $\bigcap_{i=1}^{n} C_i \neq \emptyset$.
We want to make as few calls to the projection oracle as possible. Our strategy will be to i) define
a loss function and ii) run gradient descent. Then using our knowledge of convergence of gradient
descent, we can argue about the number of oracle calls required.
First Approach.
Inspired by the condition in case (E), let us define the following loss function:
Question 25: 5 points. What is the sub-gradient of g? How many calls to the gradient oracle are needed to compute $\partial g(x)$ and $g(x)$?
Hint: Show that for two convex functions $g_1(x)$ and $g_2(x)$, any element of $\partial g_i(x)$ is a subgradient in the set $\partial \max(g_1(x), g_2(x))$, where the index $i$ is chosen such that $g_i(x) = \max(g_1(x), g_2(x))$.
T
AF
Question 26: 5 points. Assume you are given a starting point $x_0$ and a constant R such that $\|x_0 - x^\star\|_2 \le R$. Give the update step of gradient descent with an appropriate step-size. Show, using the convergence of gradient descent we proved in class, that for any optimum $x^\star$ of g,
$$\min_{t \in \{0, \dots, T\}} g(x_t) - g(x^\star) \le \frac{R}{\sqrt{T}}.$$
Question 27: 4 points. Using the result from the previous question, show that $O(n/\varepsilon^2)$ calls to the projection oracle are sufficient to distinguish between case (N) and case (E) for our problem.
The convergence of the Frank-Wolfe algorithm was analyzed in class only for smooth functions. In this question we will examine whether smoothness is necessary. Consider the following non-smooth function $f : \mathbb{R}^2 \to \mathbb{R}$:
$$f(w, v) := \max\{w, v\},$$
restricted to a ball of radius 2 around the origin. We are then interested in finding $\min_{\|(w, v)\|_2 \le 2} f(w, v)$.
Suppose we start at the origin (0, 0) and run the Frank-Wolfe algorithm (with any step size rule). Since the function is not smooth, we will call the LMO using an arbitrary subgradient instead of the gradient. Does this algorithm converge to the optimum?
Hint: First show that the iterates of Frank-Wolfe always lie in the convex hull of the starting point and the solutions of the LMO.
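A short simulation illustrates the hint: starting from the origin with ties broken towards the subgradient (1, 0), every LMO answer is either (-2, 0) or (0, -2), so all iterates stay in conv{(0, 0), (-2, 0), (0, -2)}, on which $\max\{w, v\} \ge -1$, while the true minimum is $-\sqrt{2}$ at $(-\sqrt{2}, -\sqrt{2})$. (The tie-breaking rule and step size below are illustrative choices.)

```python
import numpy as np

def subgradient(w, v):
    # A valid subgradient of max{w, v}; ties broken towards (1, 0).
    return np.array([1.0, 0.0]) if w >= v else np.array([0.0, 1.0])

x = np.zeros(2)                        # start at the origin
best = float("inf")                    # best objective value seen so far
for t in range(10000):
    g = subgradient(x[0], x[1])
    s = -2.0 * g                       # LMO over the radius-2 ball: -2 g/||g||, ||g|| = 1 here
    gamma = 2.0 / (t + 2.0)            # standard Frank-Wolfe step size
    x = (1.0 - gamma) * x + gamma * s
    best = min(best, float(x.max()))
```

The run stalls near objective value -1 and never approaches the optimal value $-\sqrt{2} \approx -1.414$.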
Question 29: 2 points. What happens when Newton’s optimization method is run on a convex
quadratic function? Explain.
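A numeric sketch of the situation: for a strictly convex quadratic, the Newton step solves the linear optimality condition exactly, so a single iteration reaches the minimizer (the matrix and starting point below are arbitrary assumptions):

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])    # positive definite Hessian
b = np.array([1.0, -1.0])

# f(x) = 0.5 x^T Q x - b^T x has grad f(x) = Qx - b and constant Hessian Q.
x0 = np.array([10.0, -7.0])
x1 = x0 - np.linalg.solve(Q, Q @ x0 - b)  # a single Newton step
```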
Question 30: 2 points. Affine invariance of Newton's method
Consider $h(x) := g(Mx)$, where $M \in \mathbb{R}^{n \times n}$ is invertible and g is some convex function. Show that the Newton steps for h and g are also related by the same linear transformation, i.e., $\Delta x_t = M \Delta y_t$, where $\Delta x_t$ and $\Delta y_t$ are the Newton steps at the t-th iteration for h and g respectively. We assume $x_0 = M y_0$ are the starting iterates for h and g respectively.
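A numeric sanity check of this invariance, with an illustrative smooth strictly convex g; note that under the pairing $y = Mx$ used below, the step of g at y equals M times the step of h at x (which side carries the factor M depends on how the iterates are paired):

```python
import numpy as np

M = np.triu(np.ones((3, 3)))     # a fixed invertible matrix (det = 1)

# Illustrative convex function g(y) = sum(exp(y)): smooth, strictly convex.
grad_g = lambda y: np.exp(y)
hess_g = lambda y: np.diag(np.exp(y))

# h(x) := g(Mx), so grad_h(x) = M^T grad_g(Mx) and hess_h(x) = M^T hess_g(Mx) M.
x = np.array([0.3, -0.7, 1.1])
y = M @ x                        # paired iterates, y = Mx

dx = -np.linalg.solve(M.T @ hess_g(y) @ M, M.T @ grad_g(y))  # Newton step for h at x
dy = -np.linalg.solve(hess_g(y), grad_g(y))                  # Newton step for g at y
```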
Coordinate Descent
Question 31: 2 points. Given a matrix A, we define $\lambda_{\min}(A^\top A)$ and $\lambda_{\max}(A^\top A)$ to be the smallest and largest eigenvalues of $A^\top A$.
Show that for any $x, y \in \mathbb{R}^n$,
$$\lambda_{\min}(A^\top A)\,\|x - y\|^2 \le \|A(x - y)\|^2 \le \lambda_{\max}(A^\top A)\,\|x - y\|^2.$$
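A quick numeric spot-check of the two bounds on a random instance (sizes and data are arbitrary assumptions of the sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(5, 3))
eigs = np.linalg.eigvalsh(A.T @ A)    # ascending eigenvalues of A^T A
lam_min, lam_max = eigs[0], eigs[-1]

x, y = rng.normal(size=3), rng.normal(size=3)
d = x - y
lower = lam_min * (d @ d)             # lambda_min ||x - y||^2
middle = (A @ d) @ (A @ d)            # ||A(x - y)||^2
upper = lam_max * (d @ d)             # lambda_max ||x - y||^2
```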
Question 32: 3 points. Show that for any $x, y \in \mathbb{R}^n$, for any $b \in \mathbb{R}^n$:
Question 33: 2 points. For $f(x) := \|Ax - b\|^2$, we now perform one step of coordinate descent, i.e. for a given point $x_t \in \mathbb{R}^n$ we do a step of the form $x_{t+1} := x_t + \gamma_t e_i$, where $e_i \in \mathbb{R}^n$ denotes a standard unit vector. For i fixed, compute the best $\gamma_t$.
Smooth strongly convex SGD
We consider a function $f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x)$ on $\mathbb{R}^d$, and we assume that the functions $f_i$ are convex and differentiable.
We furthermore assume that f is L-smooth, that is, that $\nabla f$ is L-Lipschitz.
We consider SGD defined as the following algorithm: let $x_0 \in \mathbb{R}^d$, and for any $t \ge 1$, for a sequence of step sizes $(\gamma_t)$, set $x_{t+1} := x_t - \gamma_t g_t$ for a stochastic gradient estimate $g_t$.
We first consider $g_t := \nabla f_{i_t}(x_t)$, with $i_t$ uniformly and independently sampled from $\{1, \dots, n\}$.
Question 34: 2 points. Show that gt is an unbiased estimator of the gradient ∇f (xt ).
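Unbiasedness says that averaging $\nabla f_i(x)$ over a uniformly random index reproduces $\nabla f(x)$; for a finite sum this can be verified exactly (the quadratic $f_i$ below are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 10, 4
C = rng.normal(size=(n, d))
x = rng.normal(size=d)

# Illustrative finite sum: f_i(x) = 0.5 (c_i^T x)^2, so grad f_i(x) = (c_i^T x) c_i
# and f(x) = (1/n) sum_i f_i(x) has grad f(x) = (1/n) C^T C x.
per_sample_grads = (C @ x)[:, None] * C       # row i is grad f_i(x)
expected_gt = per_sample_grads.mean(axis=0)   # E[g_t] for i_t ~ Uniform{1,...,n}
full_grad = C.T @ C @ x / n                   # grad f(x) computed directly
```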
Question 35: 6 points. Combining the two valid inequalities of smoothness and strong convexity (as also stated in Questions 12 and 13), prove in detailed steps that, if $\gamma_t \le \frac{1}{L}$, SGD in this setting converges as
$$\mathbb{E}\left[f(x_{t+1}) - f(x^\star)\right] \le \gamma_t\, \mathbb{E}\left[\|g_t - \nabla f(x_t)\|^2\right] + \frac{(1 - \gamma_t \mu)\, \mathbb{E}\left[\|x_t - x^\star\|^2\right] - \mathbb{E}\left[\|x_{t+1} - x^\star\|^2\right]}{2 \gamma_t}. \qquad (3)$$
For comparison, recall the following result from Lecture 6 (slide 6):
$$\mathbb{E}\left[f(x_{t+1}) - f(x^\star)\right] \le \frac{\gamma_t B^2}{2} + \frac{(1 - \gamma_t \mu)\, \mathbb{E}\left[\|x_t - x^\star\|^2\right] - \mathbb{E}\left[\|x_{t+1} - x^\star\|^2\right]}{2 \gamma_t}, \qquad (4)$$
under the bounded gradient assumption $\mathbb{E}[\|g_t\|^2] \le B^2$.
How do the two results compare?
Question 36: 4 points. Recall the possible choices of learning rate $(\gamma_t)$ in the situation of the previous question. What is the resulting rate of convergence? Which estimator do we eventually consider?
Comment on the assumption $\gamma_t \le \frac{1}{L}$. Is it a restriction? Which choice of step size could be used,
a) for getting $O(\log(t)/t)$ convergence, and
b) for getting $O(1/t)$?