
ENGG 5501: Foundations of Optimization                                          2018–19 First Term

Handout 1: Introduction
Instructor: Anthony Man–Cho So                                                  September 3, 2018

1 Basic Notions in Optimization


In this course we will consider a class of mathematical optimization problems that can be
expressed in the form
    (P)        minimize    f(x)
               subject to  x ∈ X.
Here, f : Rn → R is called the objective function, and X ⊆ Rn is called the feasible region.
Thus, x = (x1 , . . . , xn ) is an n–dimensional vector, and we shall agree that it is represented in
column form. The entries x1 , . . . , xn are called the decision variables of (P ). As the above
formulation suggests, we are interested in a global minimizer of (P ), which is defined as a point
x∗ ∈ X such that f (x∗ ) ≤ f (x) for all x ∈ X. Note that such an x∗ may not exist—take, for
example, f (x) = 1/x and X = R++ ≡ {x ∈ R : x > 0}. The optimal value of (P ) is defined to be
the greatest lower bound or infimum of the set {f(x) : x ∈ X} (see Handout C) and is denoted by

    v∗ = inf{f(x) : x ∈ X}.

Note that if x∗ is a global minimizer of (P ), then we naturally have v ∗ = f (x∗ ). Nevertheless, v ∗ can
be finite even if problem (P ) does not have any global minimizer. For instance, when f (x) = 1/x
and X = R++ , we have v ∗ = 0.
A related notion is that of a local minimizer, which is defined as a point x0 ∈ X such that for
some ε > 0, we have f(x0) ≤ f(x) for all x ∈ X ∩ B(x0, ε). Here,

    B(x0, ε) = { x ∈ R^n : ‖x − x0‖_2 ≤ ε }

is the Euclidean ball of radius ε > 0 centered at x0 (recall that for x ∈ R^n, the 2–norm of x
is defined via ‖x‖_2^2 = Σ_{i=1}^n x_i^2 ≡ x^T x). Note that a global minimizer is automatically a local
minimizer, but the converse is not necessarily true. In this course we shall devote a substantial
amount of time to characterize the minimizers of (P ) and study how the structures of f and X
affect our ability to solve (P ). Before we do that, however, let us observe that problem (P ) is quite
general. For example, when X = Rn , we have an unconstrained optimization problem; when
X is discrete (i.e., for every x ∈ X, there exists an ε > 0 such that X ∩ B(x, ε) = {x}), we have a
discrete optimization problem. Other important classes of optimization problems include:

• Linear Programming (LP) Problems: Here, f is a linear function; i.e., a function of the
  form
                    f(x) = c_1 x_1 + c_2 x_2 + · · · + c_n x_n ≡ c^T x

  with c = (c_1, . . . , c_n) ∈ R^n, and X is a set defined by linear inequalities; i.e., it takes the
  form
                    X = { x ∈ R^n : a_i^T x ≤ b_i for i = 1, . . . , m }                        (1)

  with a_1, . . . , a_m ∈ R^n and b_1, . . . , b_m ∈ R. In more compact notation, we may write a linear
  programming problem as follows:

                    minimize    c^T x
                    subject to  Ax ≤ b,

  where A is the m × n matrix whose i–th row is a_i^T, and b = (b_1, . . . , b_m) is an m–dimensional
  column vector (for any two vectors u, v ∈ R^n, the inequality u ≤ v means u_i ≤ v_i for
  i = 1, . . . , n). We remark that LP problems can be solved very efficiently; a small numerical
  sketch is given after this list of problem classes.

• Quadratic Programming (QP) Problems: Here, X is as in (1), and f is a (homogeneous)
quadratic function; i.e., a function of the form
                    f(x) = Σ_{i=1}^n Σ_{j=1}^n Q_ij x_i x_j ≡ x^T Q x,

  where Q = [Q_ij] is an n × n matrix. Note that we may assume without loss of generality that Q
  is symmetric. This follows from the fact that

                    x^T Q x = x^T ( (Q + Q^T)/2 ) x.

• Semidefinite Programming (SDP) Problems: Given an n × n symmetric matrix Q
  (denoted by Q ∈ S^n), we say that Q is positive semidefinite (denoted by Q ⪰ 0 or Q ∈ S^n_+
  if we want to make the dimension explicit) if x^T Q x ≥ 0 for all x ∈ R^n. Let A_1, . . . , A_m, C ∈ S^n
  and b_1, . . . , b_m ∈ R. Consider the optimization problem

                    minimize    b^T y
                    subject to  C − Σ_{i=1}^m y_i A_i ⪰ 0,                                      (2)
                                y ∈ R^m.

  The constraint in problem (2), which can be written as −Σ_{i=1}^m y_i A_i ⪰ −C, is called a
  linear matrix inequality (LMI), as the matrix–valued function M : R^m → S^n defined
  by M(y) = −Σ_{i=1}^m y_i A_i is linear (i.e., M satisfies M(αy + βz) = αM(y) + βM(z) for any
  y, z ∈ R^m and α, β ∈ R, as can be easily verified). Problem (2) is a so–called semidefinite
  programming (SDP) problem. Its feasible region is given by

                    X = { y ∈ R^m : C − Σ_{i=1}^m y_i A_i ⪰ 0 }.

It is a routine exercise to show that the optimization problem


                    minimize    C • Z ≡ Σ_{i=1}^n Σ_{j=1}^n C_ij Z_ij
                    subject to  A_i • Z = b_i   for i = 1, . . . , m,                           (3)
                                Z ⪰ 0

can be cast into the form (2). Hence, problem (3) is an instance of SDP. To determine
the feasible region of (3), observe that since the matrix Z is symmetric, it is completely
determined by, say, the entries on and above the diagonal. Hence, the feasible region of (3)
can be expressed as
                    X = { (Z_11, Z_12, . . . , Z_nn) ∈ R^{n(n+1)/2} : A_i • Z = b_i for i = 1, . . . , m; Z ⪰ 0 }.

  Similar to LP problems, SDP problems can also be solved efficiently; a toy numerical sketch is
  given after this list of problem classes.

• Polynomial Optimization (PO) Problems: Here, f is a real–valued polynomial of
  degree, say, d. In other words, it can be expressed as

                    f(x) = Σ_{|α|≤d} f_α x_1^{α_1} x_2^{α_2} · · · x_n^{α_n},

  where α = (α_1, . . . , α_n) ∈ N^n, |α| = Σ_{i=1}^n α_i, and f_α is the coefficient of the term
  x_1^{α_1} x_2^{α_2} · · · x_n^{α_n}. The set X is defined by polynomial inequalities; i.e., it takes the form

                    X = { x ∈ R^n : g_i(x) ≥ 0 for i = 1, . . . , m },

  where g_1, . . . , g_m : R^n → R are also real–valued polynomials. In general, PO problems
  are very difficult to solve. However, under some mild assumptions, they can be efficiently
  approximated by a series of SDP problems, at least in theory. We shall not delve into
  polynomial optimization in this course. We refer the interested reader to [3] for further
  reading.
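
To make the LP class above concrete, here is a minimal numerical sketch (not part of the original handout) that feeds the data (c, A, b) of a tiny made-up instance to an off-the-shelf solver; it assumes the SciPy library is available:

    # Solve: minimize c^T x subject to A x <= b, for a made-up 2-variable instance.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([1.0, 2.0])                       # objective: x1 + 2 x2
    A = np.array([[-1.0,  0.0],                    # -x1       <= 0   (i.e., x1 >= 0)
                  [ 0.0, -1.0],                    #      -x2  <= 0   (i.e., x2 >= 0)
                  [-1.0, -1.0]])                   # -x1 - x2  <= -1  (i.e., x1 + x2 >= 1)
    b = np.array([0.0, 0.0, -1.0])

    res = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))  # free variables; all constraints live in (A, b)
    print(res.x, res.fun)                          # expect x = (1, 0) with optimal value 1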
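
Similarly, SDPs of the form (2) can be passed almost verbatim to a modeling package. The sketch below is purely illustrative: it assumes the CVXPY package together with an SDP-capable solver, and uses a tiny instance (n = 2, m = 1) whose answer can be checked by hand:

    # Problem (2): minimize b^T y  subject to  C - sum_i y_i A_i being positive semidefinite.
    # Tiny instance: A_1 = diag(1, -1), C = I, b = (1).  The LMI reads diag(1 - y, 1 + y) >= 0,
    # i.e. -1 <= y <= 1, so the optimal value is -1, attained at y = -1.
    import numpy as np
    import cvxpy as cp

    A1 = np.diag([1.0, -1.0])
    C = np.eye(2)
    b = np.array([1.0])

    y = cp.Variable(1)
    lmi = C - y[0] * A1                            # matrix-valued affine function of y
    prob = cp.Problem(cp.Minimize(b @ y), [lmi >> 0])
    prob.solve()                                   # requires an SDP-capable solver, e.g. SCS
    print(prob.value, y.value)                     # expect roughly -1 and y = [-1]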

The aforementioned classes of problems capture a wide range of applications. However, in order
to convert a particular application into a problem of the form (P ), we need to first identify the
data and decision variables and then formulate the objective function and constraints. Let us now
illustrate this process via some examples.

2 Formulating Optimization Problems


2.1 An Air Traffic Control Problem
Suppose that n airplanes are trying to land at the Hong Kong International Airport. Airplane i
will arrive at the airport within the time interval [ai , bi ], where i = 1, . . . , n. For simplicity, we shall
assume that the airplanes arrive in the order 1, 2, . . . , n. Due to safety concerns, the control tower
of the airport would like to maximize the so–called shortest metering time; i.e., the minimum over
all inter–arrival times between two consecutive airplanes. How then should the airport assign the
arrival time of each airplane?
Here, the decision variables are the arrival times of the airplanes, which we denote by t1 , . . . , tn .
Then, we have the following optimization problem:

                    maximize    min_{1≤j≤n−1} (t_{j+1} − t_j)
                    subject to  a_i ≤ t_i ≤ b_i       for i = 1, . . . , n,                     (4)
                                t_i ≤ t_{i+1}         for i = 1, . . . , n − 1.

It is not immediately clear that (4) can be formulated as an LP, but it can be done as follows. Let
z be a new decision variable. Then, we may rewrite (4) as

                    maximize    z
                    subject to  t_{i+1} − t_i ≥ z      for i = 1, . . . , n − 1,
                                a_i ≤ t_i ≤ b_i        for i = 1, . . . , n,
                                t_i ≤ t_{i+1}          for i = 1, . . . , n − 1,

which is an LP. We should point out that the above reformulation works only because we are
maximizing instead of minimizing the quantity min1≤j≤n−1 (tj+1 − tj ). In particular, the following
problems:
                    minimize    min_{1≤j≤n−1} (t_{j+1} − t_j)
                    subject to  a_i ≤ t_i ≤ b_i       for i = 1, . . . , n,                     (5)
                                t_i ≤ t_{i+1}         for i = 1, . . . , n − 1

and

                    minimize    z
                    subject to  t_{i+1} − t_i ≥ z      for i = 1, . . . , n − 1,
                                a_i ≤ t_i ≤ b_i        for i = 1, . . . , n,                    (6)
                                t_i ≤ t_{i+1}          for i = 1, . . . , n − 1
are not equivalent, since the optimum value of (5) is finite (in fact, it is always non–negative), while
the optimum value of (6) is −∞.
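
For concreteness, the following sketch (assuming CVXPY; the arrival windows are made up) solves the LP reformulation of (4) for three airplanes:

    # LP reformulation of (4): maximize z  s.t.  t_{i+1} - t_i >= z,  a_i <= t_i <= b_i,  t_i <= t_{i+1}.
    import numpy as np
    import cvxpy as cp

    a = np.array([0.0, 2.0, 3.0])                  # earliest arrival times (made up)
    b = np.array([4.0, 6.0, 10.0])                 # latest arrival times (made up)
    n = len(a)

    t = cp.Variable(n)
    z = cp.Variable()
    constraints = [t[i + 1] - t[i] >= z for i in range(n - 1)]
    constraints += [t >= a, t <= b]
    constraints += [t[i] <= t[i + 1] for i in range(n - 1)]
    prob = cp.Problem(cp.Maximize(z), constraints)
    prob.solve()
    print(prob.value, t.value)                     # for these windows: z = 5, schedule t = (0, 5, 10)
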
For another air traffic control application that utilizes optimization techniques, see [1].

2.2 A Data Fitting Problem


The previous example shows that sometimes one may be able to convert an optimization problem
into an LP via some transformations. Here is another illustration of such a possibility. Suppose that
we are given m data pairs (ai , bi ), where ai ∈ Rn and bi ∈ R for i = 1, . . . , m, with m ≥ n + 1. We
suspect that these pairs are essentially generated by an affine function; i.e., a function f : Rn → R
of the form f (y) = xT y + t for some x ∈ Rn and t ∈ R. However, the output of the function is
usually corrupted by an additive noise. Thus, the relationship between ai and bi is better described
as bi = aTi x + t + i , where x ∈ Rn and t ∈ R are the parameters of the affine function, and i ∈ R
is the noise in the i–th measurement. Our goal then is to determine the parameters x ∈ Rn and
t ∈ R of the affine function that best fits the data. To measure the goodness of fit, we can try to
minimize some sort of error measure. One popular measure is the 1–norm of the residual errors,
which is defined as
                    Δ_1 = Σ_{i=1}^m |b_i − a_i^T x − t| = ‖b − Ax − te‖_1,

where A is the m × n matrix whose i–th row is aTi , and e ∈ Rm is the vector of all ones. In other
words, our optimization problem is simply
                    min_{x∈R^n, t∈R}  Σ_{i=1}^m |b_i − a_i^T x − t|.                            (7)

Here, the objective function is nonlinear. However, we can turn problem (7) into an LP as follows.
We first introduce m new decision variables z1 , . . . , zm ∈ R. Then, it is not hard to see that (7) is

equivalent to the following LP:

                    minimize    Σ_{i=1}^m z_i
                    subject to  b_i − a_i^T x − t ≤ z_i        for i = 1, . . . , m,
                                −b_i + a_i^T x + t ≤ z_i       for i = 1, . . . , m.
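
As an illustration (again beyond the handout proper), the sketch below builds this LP explicitly with the stacked decision vector (x, t, z) ∈ R^{n+1+m} and solves it with SciPy's linprog; the data are randomly generated:

    # LP reformulation of (7): minimize sum(z)  s.t.  |b_i - a_i^T x - t| <= z_i, written as two
    # one-sided linear inequalities per data point, using the stacked variable w = (x, t, z).
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    m, n = 20, 3
    A = rng.standard_normal((m, n))
    b = A @ rng.standard_normal(n) + 0.5 + 0.1 * rng.standard_normal(m)   # noisy affine data

    e = np.ones((m, 1))
    I = np.eye(m)
    c = np.concatenate([np.zeros(n + 1), np.ones(m)])   # objective: sum of the z_i
    #  b_i - a_i^T x - t <= z_i   <=>   -a_i^T x - t - z_i <= -b_i
    # -b_i + a_i^T x + t <= z_i   <=>    a_i^T x + t - z_i <=  b_i
    A_ub = np.block([[-A, -e, -I],
                     [ A,  e, -I]])
    b_ub = np.concatenate([-b, b])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    x_hat, t_hat = res.x[:n], res.x[n]
    print(x_hat, t_hat)                                  # 1-norm estimates of (x, t)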

Now, what if we want to minimize the 2–norm of the residual errors? In other words, we would
like to solve the following problem:
                    min_{x∈R^n, t∈R}  Δ_2 = ‖b − Ax − te‖_2^2 = Σ_{i=1}^m (b_i − a_i^T x − t)^2.          (8)

It turns out that this is a particularly simple QP. In fact, since (8) is an unconstrained optimization
problem with a differentiable objective function, we can solve it using calculus techniques. Indeed,
suppose for simplicity that Ā has full column rank, so that ĀT Ā is invertible. Then, the (unique)
optimal solution (x∗ , t∗ ) ∈ Rn × R to (8) is given by
                    (x∗, t∗) = (Ā^T Ā)^{−1} Ā^T b,

where Ā ∈ R^{m×(n+1)} is the matrix whose i–th row is (a_i^T, 1).

On the other hand, if Ā does not have full column rank, then it can be shown that for any z ∈ Rn+1 ,
the vector

                    (x∗, t∗) = (Ā^T Ā)^† Ā^T b + ( I − (Ā^T Ā)^† Ā^T Ā ) z

is optimal for (8). It is worth noting that the matrix I − (ĀT Ā)† ĀT Ā is simply the orthogonal
projection onto the nullspace of ĀT Ā. In particular, when Ā does not have full column rank, the
nullspace of ĀT Ā is non–trivial.
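
A minimal numerical sketch of the closed-form solution (assuming NumPy): append a column of ones to A to form Ā and apply the pseudoinverse, which covers both the full-column-rank case and, with z = 0, the rank-deficient case:

    # Closed-form least-squares fit for (8): (x*, t*) = pinv(A_bar) @ b, where A_bar has rows (a_i^T, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 20, 3
    A = rng.standard_normal((m, n))
    b = A @ rng.standard_normal(n) + 0.5 + 0.1 * rng.standard_normal(m)

    A_bar = np.hstack([A, np.ones((m, 1))])        # the matrix called A-bar in the text
    sol = np.linalg.pinv(A_bar) @ b                # equals (A_bar^T A_bar)^{-1} A_bar^T b when A_bar has full column rank
    x_star, t_star = sol[:n], sol[n]
    print(x_star, t_star)
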
In the above discussion, we assume that the number of observations m exceeds the number of
parameters n; specifically, m ≥ n + 1. However, in many modern applications (such as biomedical
imaging and gene expression analyses), the number of observations is much smaller than the number
of parameters. Thus, one can typically find infinitely many parameter pairs (x̄, t̄) ∈ Rn × R that fit
the data perfectly; i.e., bi = aTi x̄ + t̄ for i = 1, . . . , m. To make the data fitting problem meaningful,
it is then necessary to impose additional assumptions. An intuitive and popular one is that the
actual number of parameters responsible for the input–output relationship is small. In other words,
most of the entries in the parameter vector x ∈ Rn should be zero, though we do not know a priori
where those entries are. There are several ways to formulate the data fitting problem under such
an assumption. For instance, one can consider the following constrained optimization approach:

                    minimize    ‖b − Ax − te‖_2^2
                    subject to  ‖x‖_0 ≤ K,                                                      (9)
                                x ∈ R^n, t ∈ R.

Here, ‖x‖_0 is the number of non–zero entries in the parameter vector x ∈ R^n, and K ≥ 0 is a
user–defined threshold that controls the sparsity of x. Alternatively, one can consider the following
penalty approach:

                    min_{x∈R^n, t∈R}  ‖b − Ax − te‖_2^2 + µ‖x‖_0,                               (10)

where µ > 0 is a penalty parameter. However, due to the combinatorial nature of the function
x ↦ ‖x‖_0, both of the above formulations are computationally difficult to solve. In fact, it can be
shown, in a formal sense, that an efficient algorithm for solving problems (9) and (10) is unlikely to
exist. To obtain more tractable formulations, a widely used approach is to replace ‖·‖_0 by ‖·‖_1.
We will see later in the course why this is a good idea from a computational perspective. For
now, we should note that such an approach changes the original problems, and a natural question
is whether there is any correspondence between the solutions to the original problems and those to
the modified problems. This question has been extensively studied in the fields of high–dimensional
statistics and compressive sensing over the past decade or so. We refer the interested reader to the
book [2] for details and further pointers to the literature.
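
As a forward pointer, here is a sketch (assuming CVXPY; the data and the value of µ are illustrative only) of the 1-norm surrogate of the penalty formulation (10), i.e., with ‖x‖_0 replaced by ‖x‖_1:

    # 1-norm surrogate of (10): minimize ||b - A x - t e||_2^2 + mu * ||x||_1.
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    m, n = 10, 50                                  # far fewer observations than parameters
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[:3] = [2.0, -1.5, 1.0]                  # a sparse ground truth
    b = A @ x_true + 0.3 + 0.05 * rng.standard_normal(m)

    x = cp.Variable(n)
    t = cp.Variable()
    mu = 0.5                                       # penalty parameter (illustrative choice)
    obj = cp.sum_squares(b - A @ x - t) + mu * cp.norm1(x)
    prob = cp.Problem(cp.Minimize(obj))
    prob.solve()
    print(np.sum(np.abs(x.value) > 1e-4))          # number of (numerically) non-zero entries of the estimate
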
At this point let us reflect a bit on the above examples. Intuitively, a linear problem (say, an
LP) should be easier than a nonlinear problem, and a differentiable problem should be easier than a
non–differentiable one. However, the above examples show that this need not be the case. Indeed,
even though the 2–norm problem (8) is a QP, its optimal solution has a nice characterization, while
the corresponding 1–norm problem (7) does not have such a feature. On the other hand, even
though the objective function in (7) is non–differentiable, the problem can still be solved easily
via LP. Also, we have seen from problems (9) and (10) that the inclusion of a seemingly simple
constraint or objective may render an originally easy optimization problem (namely, Problem (8))
intractable.
From the above discussion, it is natural to ask what makes an optimization problem difficult.
While it is hard to answer such a question without over–generalizing, let us at least identify
a possible source of difficulty. What distinguishes the seemingly very similar problems (8) and (9)
is that the former is a so–called convex optimization problem, while the latter is not. We shall
define the notion of convexity and study it in more detail later.

2.3 Eigenvalue Optimization


Suppose that we are given k symmetric n × n matrices A_1, . . . , A_k. Consider the function A : R^k → S^n
defined by

                    A(x) = Σ_{i=1}^k x_i A_i.

Note that by definition, the matrix A(x) is symmetric for any x ∈ Rk . Now, a problem that is
frequently encountered in practice is that of choosing an x ∈ Rk so that the largest eigenvalue of
A(x) is minimized (see, e.g., [4] for details). It turns out that such a problem can be formulated as
an SDP. To prove this, we need the following result:

Proposition 1 Let A be an arbitrary n × n symmetric matrix, and let λmax(A) denote the largest
eigenvalue of A. Then, we have tI ⪰ A if and only if t ≥ λmax(A).

Proof  Suppose that tI ⪰ A, or equivalently, tI − A ⪰ 0. Then, for any u ∈ R^n \ {0}, we have
u^T (tI − A) u = t u^T u − u^T A u ≥ 0, or equivalently,

                    t ≥ (u^T A u) / (u^T u).

Since this holds for an arbitrary u ∈ R^n \ {0}, we have

                    t ≥ max_{u ∈ R^n \ {0}} (u^T A u) / (u^T u).                                (11)

By the Courant–Fischer theorem, the right–hand side of (11) is precisely λmax(A).
The converse can be established by reversing the above arguments. This completes the proof. ⊔⊓
Proposition 1 allows us to formulate the above eigenvalue optimization problem as

                    minimize    t
                    subject to  tI − A(x) ⪰ 0.                                                  (12)

As the function R^k × R ∋ (x, t) ↦ tI − A(x) is linear in (x, t), the constraint is an LMI. Hence,
problem (12) is an SDP.
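
To see Proposition 1 and formulation (12) at work numerically, here is a minimal sketch (assuming CVXPY and an SDP-capable solver): for a fixed symmetric matrix M, the SDP min{t : tI − M ⪰ 0} recovers λmax(M); replacing the fixed M by the variable-dependent A(x) and minimizing jointly over (x, t) gives exactly (12).

    # Proposition 1 as an SDP: for a fixed symmetric M,  min{ t : t I - M >= 0 } = lambda_max(M).
    # Problem (12) is the same program with the fixed M replaced by A(x) = sum_i x_i A_i and the
    # minimization taken jointly over (x, t).
    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n = 5
    M = rng.standard_normal((n, n))
    M = (M + M.T) / 2                              # a fixed symmetric test matrix

    t = cp.Variable()
    prob = cp.Problem(cp.Minimize(t), [t * np.eye(n) - M >> 0])
    prob.solve()                                   # requires an SDP-capable solver, e.g. SCS
    print(prob.value, np.linalg.eigvalsh(M).max()) # the two numbers should agree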

References
[1] D. Bertsimas, M. Frankovich, and A. Odoni. Optimal Selection of Airport Runway Configura-
tions. Operations Research, 59(6):1407–1419, 2011.

[2] P. Bühlmann and S. van de Geer. Statistics for High–Dimensional Data: Methods, Theory and
Applications. Springer Series in Statistics. Springer–Verlag, Berlin/Heidelberg, 2011.

[3] J. B. Lasserre. Moments, Positive Polynomials and Their Applications, volume 1 of Imperial
College Press Optimization Series. Imperial College Press, London, United Kingdom, 2009.

[4] A. S. Lewis and M. L. Overton. Eigenvalue Optimization. Acta Numerica, 5:149–190, 1996.
