Linear and Integer Optimization (V3C1/F4C1): Lecture Notes

Ulrich Brenner
Research Institute for Discrete Mathematics, University of Bonn

Summer term 2019
July 11, 2019
Preface
Continuous updates of these lecture notes can be found on the following webpage:
http://www.or.uni-bonn.de/lectures/ss19/lgo_ss19.html
These lecture notes are based on a number of textbooks and lecture notes from earlier courses. See
e.g. the lecture notes by Tim Nieberg (winter term 2012/2013) and Stephan Held (winter term
2013/2014 and 2017/18) that are available online on the teaching web pages of the Research Insti-
tute for Discrete Mathematics, University of Bonn (http://www.or.uni-bonn.de/lectures).
Recommended textbooks:
• Chvátal [1983]: Still a good introduction into the field of linear programming.
• Korte and Vygen [2018]: Chapters 3–5 contain the most important results of this lecture
course. Very compact description.
• Matoušek and Gärtner [2007]: Very good description of the linear programming part. For
some results, proofs are missing, and the book does not consider integer programming.
• Schrijver [1986]: Comprehensive textbook covering both linear and integer programming.
Proofs are short but precise.
Prerequisites of this course are the lectures “Algorithmische Mathematik I” and “Lineare
Algebra I/II”. The lecture “Algorithmische Mathematik I” is covered by the textbook by
Hougardy and Vygen [2018]. The results concerning Linear Algebra that are used in this course
can be found, e.g., in the textbooks by Anthony and Harvey [2012], Bosch [2007], and Fischer
[2009].
We also make use of some basic results from complexity theory as they are taught in the lecture course “Einführung in die Diskrete Mathematik”. These results on complexity theory can be found e.g. in Chapter 15 of the textbook by Korte and Vygen [2018].
The notation concerning graphs is based on the notation proposed in the textbook by Korte
and Vygen [2018].
Please report any errors in these lecture notes to brenner@or.uni-bonn.de.
Contents

1 Introduction
  1.1 A First Example
  1.2 Optimization Problems
  1.3 Possible Outcomes
  1.4 Integrality Constraints
  1.5 Modeling of Optimization Problems as (Integral) Linear Programs
  1.6 Polyhedra

2 Duality
  2.1 Dual LPs
  2.2 Fourier-Motzkin Elimination
  2.3 Farkas' Lemma
  2.4 Strong Duality
  2.5 Complementary Slackness

3 The Structure of Polyhedra

4 Simplex Algorithm
  4.1 Feasible Basic Solutions
  4.2 The Simplex Method
  4.3 Efficiency of the Simplex Algorithm
  4.4 Dual Simplex Algorithm
  4.5 Network Simplex

5 Sizes of Solutions
  5.1 Gaussian Elimination

6 Ellipsoid Method
  6.1 Idealized Ellipsoid Method
  6.2 Error Analysis
  6.3 Ellipsoid Method for Linear Programs
  6.4 Separation and Optimization
1 Introduction

1.1 A First Example
Assume that a farmer has 10 hectares of land where he can grow two kinds of crops: maize and
wheat (or a combination of both). For each hectare of maize he gets a revenue of 2 units of
money and for each hectare of wheat he gets 3 units of money. Planting maize in an area of one
hectare takes him 1 day while planting wheat takes him 2 days per hectare. In total, he has
16 days for the work on his field. Moreover, each hectare planted with maize needs 5 units of
water and each hectare planted with wheat needs 2 units of water. In total he has 40 units of
water. How can he maximize his revenue?
If x1 is the number of hectares planted with maize and x2 is the number of hectares planted with wheat, we can write the corresponding optimization problem in the following compact way:

max 2x1 + 3x2
s.t.  x1 +  x2 ≤ 10
      x1 + 2x2 ≤ 16
     5x1 + 2x2 ≤ 40
      x1 , x2 ≥ 0
This is what we call a linear program (LP). In such an LP, we are given a linear objective function (in our case (x1 , x2 ) ↦ 2x1 + 3x2 ) that has to be maximized or minimized under a
number of linear constraints. These constraints can be given by linear inequalities (but not
strict inequalities “<”) or by linear equations. However, a linear equation can easily be replaced
by a pair of inequalities (e.g. 4x1 + 3x2 = 7 is equivalent to 4x1 + 3x2 ≤ 7 and 4x1 + 3x2 ≥ 7),
so we may assume that all constraints are given by linear inequalities.
In our example, there were only two variables, x1 and x2 . In this case, linear programs can be
solved graphically. Figure 1 illustrates the method. The grey area is the set

{(x1 , x2 ) ∈ R2 | x1 + x2 ≤ 10, x1 + 2x2 ≤ 16, 5x1 + 2x2 ≤ 40, x1 , x2 ≥ 0},

which is the set of all feasible solutions of our problem. We can solve the problem by moving the green line, which is orthogonal to the cost vector (2, 3)t (shown in red), in the direction of (2, 3)t as long as it intersects the feasible area. We end up with x1 = 4 and x2 = 6, which is in fact an
optimum solution.
(Figure 1: the feasible region in the (x1 , x2 )-plane bounded by the lines x1 + x2 = 10, x1 + 2x2 = 16 and 5x1 + 2x2 = 40, together with the cost vector and a level line of the objective function.)
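The graphical solution can also be checked with a solver. The following is a small sketch, not part of the original notes, assuming SciPy is available; since scipy.optimize.linprog minimizes, the objective 2x1 + 3x2 is negated.

```python
# Sketch (assumes SciPy): solve the farmer's LP numerically.
from scipy.optimize import linprog

c = [-2, -3]            # negated objective: maximize 2*x1 + 3*x2
A_ub = [[1, 1],         # x1 +  x2 <= 10  (land)
        [1, 2],         # x1 + 2x2 <= 16  (working days)
        [5, 2]]         # 5x1 + 2x2 <= 40 (water)
b_ub = [10, 16, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.x, -res.fun)  # expected: x1 = 4, x2 = 6 with revenue 26
```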
1.2 Optimization Problems

In this lecture course, we consider optimization problems with linear objective functions and linear constraints. The constraints can be written in a compact way using matrices:
Linear Programming
Instance: A matrix A ∈ Rm×n , vectors c ∈ Rn and b ∈ Rm .
Task: Find a vector x ∈ Rn with Ax ≤ b maximizing ct x.
Notation: Unless stated differently, always let A = (aij )i=1,...,m; j=1,...,n ∈ Rm×n , b = (b1 , . . . , bm ) ∈ Rm and c = (c1 , . . . , cn ) ∈ Rn .
Remark: Real vectors are simply ordered sets of real numbers. But when we multiply vectors
with each other or with matrices, we have to interpret them as n × 1-matrices (column vectors)
or as 1 × n-matrices (row vectors). By default, we consider vectors as column vectors in this
context, so if we want to use them as row vectors, we have to transpose them (“ct ”).
We often write linear programs in the following way:
max ct x
s.t. Ax ≤ b        (1)
This is the standard inequality form. A linear program in standard equation form is written as

max ct x
s.t. Ax = b        (2)
     x ≥ 0

Both standard forms can be transformed into each other: If we are given a linear program in standard equation form, we can replace each equation by a pair of inequalities and the constraint x ≥ 0 by −In x ≤ 0 (where In is always the n × n-identity matrix). This leads to a formulation of the same linear program in standard inequality form.
The transformation from the standard inequality form into the standard equation form is slightly
more complicated: Assume we are given the following linear program in standard inequality
form
max ct x
s.t. Ax ≤ b        (3)
We replace each variable xi by two variables zi and z̄i . Moreover, for each of the m constraints
we introduce a new variable x̃i (a so-called slack variable). With variables z = (z1 , . . . , zn ),
z̄ = (z̄1 , . . . , z̄n ) and x̃ = (x̃1 , . . . , x̃m ), we state the following LP in standard equation form:

max ct z − ct z̄
s.t. [A | −A | Im ] (z, z̄, x̃) = b        (4)
     z, z̄, x̃ ≥ 0
Note that [A | −A | Im ] is the m × (2n + m)-matrix that we get by concatenating the matrices A, −A and Im . Any solution z, z̄ and x̃ of the LP (4) gives a solution of the LP (3) with the same cost by setting xj := zj − z̄j (for j ∈ {1, . . . , n}).
On the other hand, if x is a solution of LP (3), then we get a solution of LP (4) with the same cost by setting zj := max{xj , 0}, z̄j := − min{xj , 0} (for j ∈ {1, . . . , n}) and x̃i := bi − Σ_{j=1}^n aij xj (for i ∈ {1, . . . , m}, where Σ_{j=1}^n aij xj ≤ bi is the i-th constraint of Ax ≤ b).
Note that (in contrast to the first transformation) this second transformation (from the standard
inequality form into the standard equation form) leads to a different solution space because we
have to introduce new variables.
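As a small illustration of this second transformation, the following sketch (my own helper, not from the notes; it assumes NumPy) builds the data of LP (4) from the data of LP (3):

```python
import numpy as np

def inequality_to_equation_form(A, b, c):
    """Transform max{c^t x | Ax <= b} into max{c_eq^t w | A_eq w = b, w >= 0}
    with w = (z, zbar, slack) and x = z - zbar, as in LP (4)."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    m, n = A.shape
    A_eq = np.hstack([A, -A, np.eye(m)])            # [A | -A | I_m]
    c_eq = np.concatenate([c, -c, np.zeros(m)])     # objective ignores the slack variables
    return A_eq, np.asarray(b, dtype=float), c_eq
```

A solution w = (z, z̄, x̃) of the transformed LP is turned back into a solution of (3) via x = z − z̄.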
1.3 Possible Outcomes

There are three possible outcomes for a linear program max{ct x | Ax ≤ b}:

• The LP is infeasible, i.e. there is no x ∈ Rn with Ax ≤ b.

• The LP is unbounded, i.e. for every K ∈ R there is a feasible solution x with ct x > K.

• The LP has an optimum solution, i.e. a feasible solution x∗ with ct x∗ ≥ ct x for all feasible solutions x.

We will see that deciding if a linear program is feasible is as hard as computing an optimum solution to a feasible and bounded linear program (see Section 2.4).
1.4 Integrality Constraints

In many applications, we need an integral solution. This leads to the following class of problems:

Integer Linear Programming
Instance: A matrix A ∈ Rm×n , vectors c ∈ Rn and b ∈ Rm .
Task: Find a vector x ∈ Zn with Ax ≤ b maximizing ct x.
Replacing the constraint x ∈ Rn by x ∈ Zn makes a huge difference. We will see that
there are polynomial-time algorithms for Linear Programming while Integer Linear
Programming is NP-hard.
Of course, one can also consider optimization problems where we have integrality constraints
only for some of the variables. These linear optimization problems are called Mixed Integer
Linear Programs.
1.5 Modeling of Optimization Problems as (Integral) Linear Programs

We consider some examples how optimization problems can be modeled as LPs or ILPs. Many flow problems can easily be formulated as linear programs:
Definition 2 Let G be a directed graph with capacities u : E(G) → R>0 and let s and t
be two vertices of G. A feasible s-t-flow in (G, u) is a mapping f : E(G) → R≥0 with f(e) ≤ u(e) for all e ∈ E(G) and Σ_{e∈δG+(v)} f(e) − Σ_{e∈δG−(v)} f(e) = 0 for all v ∈ V(G) \ {s, t}. The value of f is Σ_{e∈δG+(s)} f(e) − Σ_{e∈δG−(s)} f(e).
Maximum-Flow Problem
Instance: A directed graph G, capacities u : E(G) → R>0 , vertices s, t ∈ V (G) with s ≠ t.
Task: Find an s-t-flow f : E(G) → R≥0 of maximum value.
max Σ_{e∈δG+(s)} xe − Σ_{e∈δG−(s)} xe
s.t. xe ≥ 0                                              for e ∈ E(G)
     xe ≤ u(e)                                           for e ∈ E(G)              (7)
     Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe = 0               for v ∈ V(G) \ {s, t}
It is well known that the value of a maximum s-t-flow equals the capacity of a minimum cut
separating s from t. We will see in Section 2.5 that this result also follows from properties of the
linear program formulation. Moreover, if the capacities are integral, there is always a maximum
flow that is integral (see Section 8.4).
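The LP formulation (7) can be fed directly to an LP solver. The sketch below (my own example; the graph, capacities and vertex names are made up, and SciPy is assumed) builds and solves (7) for a small digraph; the bounds 0 ≤ xe ≤ u(e) are passed as variable bounds.

```python
import numpy as np
from scipy.optimize import linprog

edges = [('s', 'a'), ('s', 'b'), ('a', 'b'), ('a', 't'), ('b', 't')]
u = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
s, t = 's', 't'

m = len(edges)
c = np.zeros(m)                                # negated objective (linprog minimizes)
for j, (v, w) in enumerate(edges):
    if v == s: c[j] -= 1                       # edge leaves s
    if w == s: c[j] += 1                       # edge enters s

inner = sorted({x for e in edges for x in e} - {s, t})
A_eq = np.zeros((len(inner), m))               # flow conservation at all v != s, t
for i, vtx in enumerate(inner):
    for j, (v, w) in enumerate(edges):
        if v == vtx: A_eq[i, j] += 1
        if w == vtx: A_eq[i, j] -= 1
b_eq = np.zeros(len(inner))

bounds = [(0, u[e]) for e in edges]            # 0 <= x_e <= u(e)
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(dict(zip(edges, res.x)), -res.fun)       # maximum flow value (5 for this data)
```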
In some cases, we first have to modify a given optimization problem slightly in order to get a
linear program formulation. See the following example of a modified version of the Maximum-
Flow Problem where we have two sources and want to maximize the minimal out-flow of
both sources.
The objective function here is not a linear function but the minimum of two linear functions. To see how such a problem can be written as an LP, we assume slightly more generally that we are given the following optimization problem:
max min{ct x + d, et x + f }
s.t. Ax ≤ b

for some c, e ∈ Rn and d, f ∈ R. With an additional variable σ, this problem can be written equivalently as the linear program

max σ
s.t. σ − ct x ≤ d
     σ − et x ≤ f
     Ax ≤ b

Another problem of this kind is

min |ct x + d|
s.t. Ax ≤ b
for some c ∈ Rn and d ∈ R. Again the problem can be written equivalently as a linear program
in the following form:
max −σ
s.t. −σ − ct x ≤ d
−σ + ct x ≤ −d
Ax ≤ b
The two additional constraints on σ ensure that we have σ ≥ max{ct x + d, −ct x − d} = |ct x + d|.
Other problems allow a formulation as an ILP but presumably not an LP formulation. An example is the following problem:

Vertex Cover Problem
Instance: An undirected graph G and vertex weights c : V(G) → R≥0 .
Task: Find a set X ⊆ V(G) with X ∩ {v, w} ≠ ∅ for all {v, w} ∈ E(G) minimizing Σ_{v∈X} c(v).
This problem is known to be NP-hard (see standard textbooks like Korte and Vygen [2018]),
so we cannot hope for a polynomial-time algorithm. Nevertheless, the problem can easily be
formulated as an integer linear program:
min Σ_{v∈V(G)} xv c(v)
s.t. xv + xw ≥ 1     for {v, w} ∈ E(G)        (8)
     xv ∈ {0, 1}     for v ∈ V(G)
For each vertex v ∈ V (G), we have a 0-1-variable xv which is 1 if and only if v should be in the
set X, i.e. if (xv )v∈V (G) is an optimum solution to (8), the set X = {v ∈ V (G) | xv = 1} is an
optimum solution to the Vertex Cover Problem.
This example shows that Integer Linear Programming itself is an NP-hard problem. By
skipping the integrality constraints (xv ∈ {0, 1}) we get the following linear program:
min Σ_{v∈V(G)} xv c(v)
s.t. xv + xw ≥ 1     for {v, w} ∈ E(G)
     xv ≥ 0          for v ∈ V(G)             (9)
     xv ≤ 1          for v ∈ V(G)
We call this linear program an LP-relaxation of (8). In this particular case, the relaxation
gives a 2-approximation of the Vertex Cover Problem: For any solution x of the relaxed
problem, we get an integral solution x̃ by setting
x̃v := 1 if xv ≥ 1/2 and x̃v := 0 if xv < 1/2.

It is easy to check that this yields a feasible solution of the ILP with Σ_{v∈V(G)} x̃v c(v) ≤ 2 Σ_{v∈V(G)} xv c(v).
Obviously, in minimization problems relaxing some constraints can only decrease the value of
an optimum solution. We call the supremum of the ratio between the values of the optimum
solutions of an ILP and its LP-relaxation the integrality gap of the relaxation. The rounding
procedure described above also proves that in this case the integrality gap is at most 2. Indeed, 2 is exactly the integrality gap, as the example of a complete graph with weights c(v) = 1 for all vertices v shows. For the Maximum-Flow Problem with integral edge capacities, the integrality gap is 1 because there is always an optimum flow that is integral.
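The rounding argument can be tried out directly. The sketch below (my own illustration with a made-up example graph; SciPy assumed) solves the LP-relaxation (9) and applies the threshold rounding:

```python
from scipy.optimize import linprog

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
c = [1.0, 1.0, 1.0, 1.0]                        # vertex weights c(v)

# x_v + x_w >= 1 for every edge, written as -x_v - x_w <= -1
A_ub = [[-1 if i in edge else 0 for i in V] for edge in E]
b_ub = [-1] * len(E)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
cover = {v for v in V if res.x[v] >= 0.5}       # rounding: x~_v = 1 iff x_v >= 1/2
print(res.x, cover, sum(c[v] for v in cover))   # LP value <= cost of cover <= 2 * LP value
```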
The following problem, the Stable Set Problem, is NP-hard as well: given a graph G and vertex weights c : V(G) → R≥0 , find a set of pairwise non-adjacent vertices of maximum total weight. Skipping the integrality constraints of the corresponding ILP leads to the following LP-relaxation:
max Σ_{v∈V(G)} xv c(v)
s.t. xv + xw ≤ 1     for {v, w} ∈ E(G)
     xv ≥ 0          for v ∈ V(G)             (11)
     xv ≤ 1          for v ∈ V(G)
Unfortunately, in this case, the LP-relaxation is of no use. Even if G is a complete graph (where a feasible solution of the Stable Set Problem can contain at most one vertex), setting xv = 1/2 for all v ∈ V(G) would be a feasible solution of the LP-relaxation. This example shows that the integrality gap is at least n/2. Hence, this LP-relaxation does not provide any useful information about a good ILP solution.
1.6 Polyhedra
Remark: It is easy to check that the convex hull of a set X ⊆ Rn is the (inclusion-wise)
minimal convex set containing X.
Definition 5 Let X ⊆ Rn for some n ∈ N.
• X = Rn
X = ∩_{j=1,...,m} {x ∈ Rn | atj x ≤ bj } = ∩_{j=1,...,m : aj ≠ 0} {x ∈ Rn | atj x ≤ bj },
Definition 6 The dimension of a set X ⊆ Rn is

dim(X) := n − max{rank(A) | A is a matrix with A(x − y) = 0 for all x, y ∈ X}.

In other words, the dimension of X ⊆ Rn is n minus the maximum size of a set of linearly independent vectors that are orthogonal to any difference of elements in X. For example, the empty set and sets consisting of exactly one vector have dimension 0. The set Rn has dimension n.
Observation: The dimension of a set X ⊆ Rn is the largest d for which X contains elements
v0 , v1 , . . . , vd such that v1 − v0 , v2 − v0 , . . . , vd − v0 are linearly independent.
Observation: A non-empty set X ⊆ Rn is a convex cone if and only if X is convex and for all
x ∈ X and λ ∈ R≥0 we have λx ∈ X.
cone({x1 , . . . , xm }) := { Σ_{i=1}^m λi xi | λ1 , . . . , λm ≥ 0 }.
A convex cone C is called finitely generated if there are vectors x1 , . . . , xm ∈ Rn with
C = cone({x1 , . . . , xm }).
It is easy to check that cone({x1 , . . . , xm }) is indeed a convex cone. We will see in Section 3.5
that a cone is polyhedral if and only if it is finitely generated.
2 Duality

2.1 Dual LPs

Consider the following linear program (P):

max 12x1 + 10x2
s.t.  4x1 +  2x2 ≤ 5
      8x1 + 12x2 ≤ 7
      2x1 −  3x2 ≤ 1

How can we find upper bounds on the value of an optimum solution? By combining the first two constraints we can get the following bound for any feasible solution (x1 , x2 ):
12x1 + 10x2 = 2 · (4x1 + 2x2 ) + (1/2) · (8x1 + 12x2 ) ≤ 2 · 5 + (1/2) · 7 = 13.5.
We can even do better by combining the last two inequalities:
12x1 + 10x2 = (7/6) · (8x1 + 12x2 ) + (4/3) · (2x1 − 3x2 ) ≤ (7/6) · 7 + (4/3) · 1 = 9.5.
More generally, for computing upper bounds we ask for non-negative numbers u1 , u2 , u3 such
that
12x1 + 10x2 = u1 · (4x1 + 2x2 ) + u2 · (8x1 + 12x2 ) + u3 · (2x1 − 3x2 ).
Then, 5 · u1 + 7 · u2 + 1 · u3 is an upper bound on the value of any solution of (P), so we want to choose u1 , u2 , u3 in such a way that 5 · u1 + 7 · u2 + 1 · u3 is minimized.
This leads us to the following linear program (D):

min 5u1 + 7u2 + u3
s.t. 4u1 +  8u2 + 2u3 = 12
     2u1 + 12u2 − 3u3 = 10
     u1 , u2 , u3 ≥ 0
This linear program is called the dual linear program of (P). Any solution of (D) yields an upper bound on the optimum value of (P), and in this particular case it turns out that u1 = 0, u2 = 7/6, u3 = 4/3 (the second solution from above) with value 9.5 is an optimum solution of (D) because x1 = 11/16, x2 = 1/8 is a solution of (P) with value 9.5.
For a general linear program (P)
max ct x
s.t. Ax ≤ b
in standard inequality form we define its dual linear program (D) as
min bt y
s.t. At y = c
y ≥ 0
In this context, we call the linear program (P) primal linear program.
Remark: Note that the dual linear program does not only depend on the objective function
and the solution space of the primal linear program but on its description by linear inequalities.
For example adding redundant inequalities to the system Ax ≤ b will lead to more variables in
the dual linear program.
Indeed, weak duality holds: if x is a feasible solution of (P) and y is a feasible solution of (D), then

ct x = (At y)t x = yt Ax ≤ yt b.  □
Remark: The term “dual” implies that applying the transformation from (P) to (D) twice
yields (P) again. This is not exactly the case but it is not very difficult to see that dualizing (D)
(after transforming it into standard equational form) gives a linear program that is equivalent
to (P) (see the exercises).
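Weak and strong duality can also be observed numerically. The following sketch (my own check, assuming SciPy; the data is the example pair (P), (D) from the beginning of this section) solves both programs and compares their optimum values:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[4, 2], [8, 12], [2, -3]], dtype=float)
b = np.array([5, 7, 1], dtype=float)
c = np.array([12, 10], dtype=float)

primal = linprog(-c, A_ub=A, b_ub=b, bounds=(None, None))   # max{c^t x | Ax <= b}, x free
dual = linprog(b, A_eq=A.T, b_eq=c, bounds=(0, None))       # min{b^t y | A^t y = c, y >= 0}

print(-primal.fun, dual.fun)   # both values are 9.5, as claimed above
```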
2.2 Fourier-Motzkin Elimination

Consider the following system of linear inequalities:

3x + 2y + 4z ≤ 10
3x + 2z ≤ 9
2x − y ≤ 5
(12)
−x + 2y − z ≤ 3
−2x ≤ 4
2y + 2z ≤ 7
Assume that we just want to decide if a feasible solution x, y, z exists. The goal is to get rid of
the variables one after the other. To get rid of x, we first reformulate the inequalities such that
we can easily see lower and upper bounds for x:

x ≤ 10/3 − (2/3)y − (4/3)z
x ≤ 3 − (2/3)z
x ≤ 5/2 + (1/2)y
x ≥ −3 + 2y − z                    (13)
x ≥ −2
2y + 2z ≤ 7
This system of inequalities has a feasible solution if and only if the following system (that does
not contain x) has a solution:
−3 + 2y − z ≤ 10/3 − (2/3)y − (4/3)z
−3 + 2y − z ≤ 3 − (2/3)z
−3 + 2y − z ≤ 5/2 + (1/2)y
−2 ≤ 10/3 − (2/3)y − (4/3)z
−2 ≤ 3 − (2/3)z
−2 ≤ 5/2 + (1/2)y
2y + 2z ≤ 7
Note that this method, which is called Fourier-Motzkin elimination, is in general very inefficient. If m is the number of inequalities in the initial system, it may be necessary to state m²/4 inequalities in the system with one variable less (this is the case if there are m/2 inequalities that gave an upper bound on the variable we got rid of and m/2 inequalities that gave a lower bound).
Nevertheless, the Fourier-Motzkin elimination can be used to get a certificate that a given
system of inequalities does not have a feasible solution. In the proof of the following theorem
we give a general description of one iteration of the method:
Theorem 4 Let A ∈ Rm×n and b ∈ Rm (with n ≥ 1). Then there are Ã ∈ Rm̃×(n−1) and b̃ ∈ Rm̃ with m̃ ≤ max{m, m²/4} such that
(a) Each inequality in the system Ãx̃ ≤ b̃ is a positive linear combination of inequalities
from Ax ≤ b
(b) The system Ax ≤ b has a solution if and only if Ãx̃ ≤ b̃ has a solution.
Proof: Denote the entries of A by aij , i.e. A = (aij )i=1,...,m; j=1,...,n . We will show how to get rid of the variable with index 1. To this end, we partition the index set {1, . . . , m} of the rows into three disjoint sets U , L, and N :

U := {i | ai1 > 0},  L := {i | ai1 < 0},  N := {i | ai1 = 0}.
We can assume that |ai1 | = 1 for all i ∈ U ∪ L (otherwise we divide the corresponding inequality
by |ai1 |).
For vectors ãi = (ai2 , . . . , ain ) and x̃ = (x2 , . . . , xn ) (that are empty if n = 1), we replace the inequalities that correspond to indices in U and L by

ãtu x̃ + ãtl x̃ ≤ bu + bl     for u ∈ U, l ∈ L. (17)
Obviously, each of these |U |·|L| new inequalities is simply the sum of two of the given inequalities
(and hence a positive linear combination of them).
The inequalities with index in N are rewritten as
ãtl x̃ ≤ bl l ∈ N. (18)
The inequalities in (17) and (18) form a set of inequalities Ãx̃ ≤ b̃ with n − 1 variables, and each
solution of Ax ≤ b gives a solution of Ãx̃ ≤ b̃ by restricting x = (x1 , . . . , xn ) to (x2 , . . . , xn ).
On the other hand, if x̃ = (x2 , . . . , xn ) is a solution of Ãx̃ ≤ b̃, then we can set x̃1 to any value in the (non-empty) interval

[ max_{l∈L} (ãtl x̃ − bl ) , min_{u∈U} (bu − ãtu x̃) ],

where we set the minimum of an empty set to ∞ and the maximum of an empty set to −∞.
Then, x = (x̃1 , x2 , . . . , xn ) is a solution of Ax ≤ b. 2
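One elimination step as in the proof of Theorem 4 can be written down in a few lines. The following sketch (my own helper, assuming NumPy) eliminates the first variable of a system Ax ≤ b:

```python
import numpy as np

def fourier_motzkin_step(A, b):
    """Return (A_new, b_new) such that Ax <= b has a solution iff
    A_new (x_2, ..., x_n) <= b_new has one (one Fourier-Motzkin step)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    U = [i for i in range(len(b)) if A[i, 0] > 0]   # rows giving upper bounds on x_1
    L = [i for i in range(len(b)) if A[i, 0] < 0]   # rows giving lower bounds on x_1
    N = [i for i in range(len(b)) if A[i, 0] == 0]  # rows without x_1
    rows, rhs = [], []
    for iu in U:
        for il in L:
            # scale both rows so that the coefficient of x_1 is +1 resp. -1, then add them
            ru, bu = A[iu] / A[iu, 0], b[iu] / A[iu, 0]
            rl, bl = A[il] / -A[il, 0], b[il] / -A[il, 0]
            rows.append((ru + rl)[1:]); rhs.append(bu + bl)
    for i in N:
        rows.append(A[i, 1:]); rhs.append(b[i])
    return np.array(rows), np.array(rhs)
```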
2.3 Farkas' Lemma

Theorem 5 (Farkas' Lemma) For A ∈ Rm×n and b ∈ Rm , exactly one of the two following systems has a feasible solution:

System 1: Ax ≤ b

System 2: yt A = 0t , yt b < 0, y ≥ 0

Theorem 5 follows from Theorem 4: eliminating all variables by Fourier-Motzkin elimination either keeps the system solvable (and a solution of Ax ≤ b can be recovered), or it produces an inequality 0 ≤ b′ with b′ < 0 that, by Theorem 4 (a), is a positive linear combination of the inequalities of Ax ≤ b.

Theorem 6 (Farkas' Lemma, most general case) For A ∈ Rm1 ×n1 , B ∈ Rm1 ×n2 , C ∈
Rm2 ×n1 , D ∈ Rm2 ×n2 , a ∈ Rm1 and b ∈ Rm2 exactly one of the two following systems has
a feasible solution:
System 1:
Ax + By ≤ a
Cx + Dy = b (19)
x ≥ 0
System 2:
ut A + vt C ≥ 0t
ut B + vt D = 0t
u ≥ 0                    (20)
ut a + vt b < 0
Proof: The first system is equivalent to
Ax + By ≤ a
Cx + Dy ≤ b
−Cx − Dy ≤ −b
−In1 x ≤ 0
By Theorem 5, this system has a solution if and only if the following system (with variables u ∈ Rm1 , v1 , v2 ∈ Rm2 and w ∈ Rn1 ) does not have a solution:

ut A + v1t C − v2t C − wt = 0t
ut B + v1t D − v2t D = 0t
ut a + v1t b − v2t b < 0
u, v1 , v2 , w ≥ 0

Obviously, this system has a solution if and only if the second system of the theorem has a solution (set v := v1 − v2 , respectively v1 := max{v, 0}, v2 := max{−v, 0} and w := (ut A + vt C)t ). □
Corollary 7 (Farkas' Lemma, further variants) For A ∈ Rm×n and b ∈ Rm , the following statements hold:

(a) Either there is a vector x ≥ 0 with Ax = b, or there is a vector u ∈ Rm with ut A ≥ 0t and ut b < 0 (but not both).

(b) Either there is a vector x with Ax = b, or there is a vector u ∈ Rm with ut A = 0t and ut b < 0 (but not both).

Proof: Restrict the statement of Theorem 6 to the vector b and the matrix C (for part (a)) or D (for part (b)). □
Remark: Statement (a) of Corollary 7 has a nice geometric interpretation. Let C be the cone
generated by the columns of A. Then, the vector b is either in C or there is a hyperplane (given
by the normal u) that separates b from C.
As an example consider A = [ 2 3 ; 1 1 ], b1 = (5, 2)t and b2 = (1, 3)t (see Figure 2). The vector b1 is in the cone generated by the columns of A (because (5, 2)t = (2, 1)t + (3, 1)t ) while b2 can be separated from the cone by a hyperplane orthogonal to u = (1, −2)t .
(Figure 2: the cone generated by the columns of A, the vectors b1 and b2 , and the hyperplane orthogonal to u that separates b2 from the cone.)
2.4 Strong Duality
max ct x (P )
s.t. Ax ≤ b
and
min bt y (D)
s.t. At y = c
y ≥ 0
Theorem 8 For the pair of linear programs (P) and (D) above, exactly one of the following statements holds:

1. Neither (P) nor (D) has a feasible solution.

2. (P) is unbounded and (D) is infeasible.

3. (D) is unbounded and (P) is infeasible.

4. Both (P) and (D) have a feasible solution. Then both have an optimal solution, and for an optimal solution x̃ of (P) and an optimal solution ỹ of (D), we have ct x̃ = bt ỹ.
ũ := z1 u. This implies At ũ = c and ũ ≥ 0, so ũ is a feasible solution of (D). Therefore (D) is
feasible. It is bounded as well because of the weak duality.
It remains to show that there are feasible solutions x of (P) and y of (D) such that ct x ≥ bt y.
This is the case if (and only if) the following system has a feasible solution:
Ax ≤ b
At y = c
−ct x + bt y ≤ 0
y ≥ 0
By Theorem 6, this is the case if and only if the following system (with variables u ∈ Rm ,
v ∈ Rn and w ∈ R) does not have a feasible solution:
ut A − wct = 0t
vt At + wbt ≥ 0t
ut b + vt c < 0        (23)
u ≥ 0
w ≥ 0
Hence, assume that system (23) has a feasible solution u, v and w.
Case 1: w = 0. Then (again by Farkas’ Lemma) the system
Ax ≤ b
At y = c
y ≥ 0
does not have a feasible solution, which is a contradiction because both (P) and (D) have a
feasible solution.
Case 2: w > 0. Then
0 > wut b + wv t c ≥ ut (−Av) + v t (At u) = 0,
which is a contradiction. 2
Remark: Theorem 8 shows in particular that if a linear program max{ct x | Ax ≤ b} is feasible and bounded, then there is a vector x̃ with Ax̃ ≤ b such that ct x̃ = sup{ct x | Ax ≤ b}.
The following table gives an overview of the possible combinations of states of the primal and dual LPs (“X” means that the combination is possible, “x” means that it is not possible):

                                      (D)
                        feasible,   feasible,   infeasible
                        bounded     unbounded
(P)  feasible, bounded     X           x            x
     feasible, unbounded   x           x            X
     infeasible            x           X            X
Remark: The previous theorem can be used to show that computing a feasible solution of
a linear program is in general as hard as computing an optimum solution. Assume that we
want to compute an optimum solution of the program (P) in the theorem. To this end, we can
compute any feasible solution of the following linear program:
max ct x
s.t. Ax ≤ b
At y = c (24)
ct x ≥ bt y
y ≥ 0
Here x and y are the variables. We can ignore the objective function in the modified LP because
we just need any feasible solution. The constraints At y = c, ct x ≥ bt y and y ≥ 0 guarantee that
any vector x from a feasible solution of the new LP is an optimum solution of (P).
                      Primal LP                      Dual LP
Variables             x1 , . . . , xn                y1 , . . . , ym
Matrix                A                              At
Right-hand side       b                              c
Objective function    max ct x                       min bt y
Constraints           Σ_{j=1}^n aij xj ≤ bi          yi ≥ 0
                      Σ_{j=1}^n aij xj ≥ bi          yi ≤ 0
                      Σ_{j=1}^n aij xj = bi          yi ∈ R
                      xj ≥ 0                         Σ_{i=1}^m aij yi ≥ cj
                      xj ≤ 0                         Σ_{i=1}^m aij yi ≤ cj
                      xj ∈ R                         Σ_{i=1}^m aij yi = cj

In particular, this gives the following primal-dual pairs:

max{ct x | Ax ≤ b, x ≥ 0}        min{bt y | yt A ≥ c, y ≥ 0}
max{ct x | Ax ≥ b, x ≥ 0}        min{bt y | yt A ≥ c, y ≤ 0}
max{ct x | Ax = b, x ≥ 0}        min{bt y | yt A ≥ c}
2.5 Complementary Slackness

Let x be a feasible solution of (P) and y a feasible solution of (D) (with (P) and (D) as in Theorem 8). Then the following statements are equivalent:

(a) Both x and y are optimum solutions of (P) and (D), respectively.

(b) ct x = bt y.

(c) yt (b − Ax) = 0.
Proof: The equivalence of the statements (a) and (b) follows from Theorem 8. To see the
equivalence of (b) and (c) note that y t (b − Ax) = y t b − y t Ax = y t b − ct x, so ct x = bt y is
equivalent to y t (b − Ax) = 0. 2
With the notation of the theorem, let at1 , . . . , atm be the rows of A and b = (b1 , . . . , bm ). Then, the theorem implies that for an optimum primal solution x and an optimum dual solution y and i ∈ {1, . . . , m} we have yi = 0 or ati x = bi (since Σ_{i=1}^m yi (bi − ati x) must be zero and yi (bi − ati x) cannot be negative for any i ∈ {1, . . . , m}).
Similarly, for a feasible solution x of max{ct x | Ax ≤ b, x ≥ 0} and a feasible solution y of its dual min{bt y | At y ≥ c, y ≥ 0}, the following statements are equivalent:

(a) Both x and y are optimum solutions of the respective LPs.

(b) ct x = bt y.

(c) yt (b − Ax) = 0 and xt (At y − c) = 0.
Proof: The equivalence of the statements (a) and (b) follows again from Theorem 8. To
see the equivalence of (b) and (c) note that 0 ≤ y t (b − Ax) and 0 ≤ xt (At y − c). Hence
y t (b − Ax) + xt (At y − c) = y t b − y t Ax + xt At y − xt c = y t b − xt c is zero if and only if
0 = y t (b − Ax) and 0 = xt (At y − c). 2
Corollary: A feasible linear program max{ct x | Ax ≤ b} is bounded if and only if c is contained in the cone generated by the rows of A.

Proof: The linear program is bounded if and only if its dual linear program is feasible. This is the case if and only if there is a vector y ≥ 0 with yt A = ct , which is equivalent to the statement that c is in the cone generated by the rows of A. □
Theorem 10 allows us to strengthen the statement of the previous Corollary. Let x be an
optimum solution of the linear program max{ct x | Ax ≤ b} and y an optimum solution of its
dual min{bt y | At y = c, y ≥ 0}. Denote the row vectors of A by at1 , . . . , atm . Then yi = 0 if
ati x < bi (for i ∈ {1, . . . , m}), so c is in fact in the cone generated only by these rows of A where
ati x = bi (see Figure 3 for an illustration).
(Figure 3: a two-dimensional polyhedron {x ∈ R2 | Ax ≤ b} with rows a1 , a2 , a3 of A; at an optimum solution, c lies in the cone generated by those rows ai whose inequalities ati x ≤ bi are tight.)
Theorem 13 Let max{ct x | Ax ≤ b} and its dual min{bt y | At y = c, y ≥ 0} both have optimum solutions, and let i ∈ {1, . . . , m}. Then exactly one of the following two statements holds:

(a) The primal LP max{ct x | Ax ≤ b} has an optimum solution x∗ with ati x∗ < bi .

(b) The dual LP min{bt y | At y = c, y ≥ 0} has an optimum solution y∗ with yi∗ > 0.
Proof: By complementary slackness, at most one of the statements can be true. Let δ =
max{ct x | Ax ≤ b} be the value of an optimum solution. Assume that (a) does not hold. This
means that
max −ati x
Ax ≤ b
−ct x ≤ −δ
has an optimum solution with value at most −bi . Hence, also its dual LP
min bt y − δu
At y − uc = −ai
y ≥ 0
u ≥ 0
must have an optimum solution of value at most −bi . Therefore, there are y ∈ Rm and u ∈ R
with y ≥ 0 and u ≥ 0 with y t A − uct = −ati and y t b − uδ ≤ −bi . Let ỹ = y + ei (i.e. ỹ arises from
y by increasing the i-th entry by one). If u = 0, then ỹ t A = y t A + ati = 0 and ỹ t b = y t b + bi ≤ 0,
so if y∗ is an optimal dual solution, y∗ + ỹ is also an optimum solution and has a positive i-th entry. If u > 0, then (1/u)ỹ is an optimum dual solution (because (1/u)ỹt A = (1/u)yt A + (1/u)ati = ct and (1/u)ỹt b = (1/u)yt b + (1/u)bi ≤ δ) and has a positive i-th entry. □
Proof: By Theorem 13, for any inequality ati x ≤ bi there is a pair of optimum solutions x(i) ∈ Rn , y(i) ∈ Rm such that ati x(i) < bi or the i-th entry of y(i) is positive. Since the convex combination of optimum LP solutions is again an optimum solution, we can set x∗ := (1/m) Σ_{i=1}^m x(i) and y∗ := (1/m) Σ_{i=1}^m y(i) and get a pair of optimum solutions fulfilling the conditions of the theorem. □
max Σ_{e∈δG+(s)} xe − Σ_{e∈δG−(s)} xe
s.t. xe ≥ 0                                              for e ∈ E(G)
     xe ≤ u(e)                                           for e ∈ E(G)              (25)
     Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe = 0               for v ∈ V(G) \ {s, t}
min Σ_{e∈E(G)} u(e) ye
s.t. ye ≥ 0 for e ∈ E(G)
ye + zv − zw ≥ 0 for e = (v, w) ∈ E(G), {s, t} ∩ {v, w} = ∅
ye + zv ≥ 0 for e = (v, t) ∈ E(G), v 6= s
ye − zw ≥ 0 for e = (t, w) ∈ E(G), w 6= s (26)
ye − zw ≥ 1 for e = (s, w) ∈ E(G), w 6= t
ye + zv ≥ −1 for e = (v, s) ∈ E(G), v 6= t
ye ≥ 1 for e = (s, t) ∈ E(G)
ye ≥ −1 for e = (t, s) ∈ E(G)
In a simplified way its dual LP can be written with two dummy variables zs = −1 and zt = 0:
min Σ_{e∈E(G)} u(e) ye
s.t. ye ≥ 0 for e ∈ E(G)
ye + zv − zw ≥ 0 for e = (v, w) ∈ E(G) (27)
zs = −1
zt = 0
We will use the dual LP to show the Max-Flow-Min-Cut-Theorem. We call a set δ + (R) with
R ⊂ V(G), s ∈ R and t ∉ R an s-t-cut, and Σ_{e∈δG+(R)} u(e) its capacity.

Theorem (Max-Flow-Min-Cut Theorem) The maximum value of an s-t-flow in (G, u) equals the minimum capacity of an s-t-cut.
Proof: If x is a feasible solution of the primal problem (25) (i.e. x encodes an s-t-flow) and
δ + (R) is an s-t-cut, then
Σ_{e∈δG+(s)} xe − Σ_{e∈δG−(s)} xe = Σ_{v∈R} ( Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe ) = Σ_{e∈δG+(R)} xe − Σ_{e∈δG−(R)} xe ≤ Σ_{e∈δG+(R)} u(e).

The first equation follows from the flow conservation rule (i.e. Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe = 0) applied to all vertices in R \ {s} and the second one from the fact that flow values on edges inside R cancel out in the sum. The last inequality follows from the fact that flow values are between 0 and u.
Thus, the capacity of any s-t-cut is an upper bound for the value of an s-t-flow. We will show
that for any maximum s-t-flow there is an s-t-cut whose capacity equals the value of the flow.
Let x̃ be an optimum solution of the primal problem (25) and ỹ, z̃ be an optimum solution of
the dual problem (27). In particular x̃ defines a maximum s-t-flow. Consider the set R := {v ∈
V (G) | z̃v ≤ −1}. Then s ∈ R and t 6∈ R.
If e = (v, w) ∈ δG+(R), then z̃v < z̃w , so ỹe ≥ z̃w − z̃v > 0. By complementary slackness this implies x̃e = u(e). On the other hand, if e = (v, w) ∈ δG−(R), then z̃v > z̃w and hence ỹe + z̃v − z̃w ≥ z̃v − z̃w > 0, so again by complementary slackness x̃e = 0. This leads to:
Σ_{e∈δG+(s)} x̃e − Σ_{e∈δG−(s)} x̃e = Σ_{v∈R} ( Σ_{e∈δG+(v)} x̃e − Σ_{e∈δG−(v)} x̃e ) = Σ_{e∈δG+(R)} x̃e − Σ_{e∈δG−(R)} x̃e = Σ_{e∈δG+(R)} u(e).  □
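The construction of the cut from an optimum dual solution can be reproduced numerically. The sketch below (my own illustration with made-up data, assuming SciPy) solves the dual LP (27) for a small digraph and reads off R = {v | z̃v ≤ −1}:

```python
import numpy as np
from scipy.optimize import linprog

edges = [('s', 'a'), ('s', 'b'), ('a', 'b'), ('a', 't'), ('b', 't')]
u = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
vertices = ['s', 'a', 'b', 't']
m, n = len(edges), len(vertices)

# variable order: (y_e for all edges, z_v for all vertices)
c = np.concatenate([np.array([u[e] for e in edges], float), np.zeros(n)])

A_ub = np.zeros((m, m + n)); b_ub = np.zeros(m)       # -y_e - z_v + z_w <= 0 for e = (v, w)
for j, (v, w) in enumerate(edges):
    A_ub[j, j] = -1
    A_ub[j, m + vertices.index(v)] = -1
    A_ub[j, m + vertices.index(w)] = 1

A_eq = np.zeros((2, m + n)); b_eq = np.array([-1.0, 0.0])   # z_s = -1, z_t = 0
A_eq[0, m + vertices.index('s')] = 1
A_eq[1, m + vertices.index('t')] = 1

bounds = [(0, None)] * m + [(None, None)] * n         # y >= 0, z free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
z = res.x[m:]
R = {v for v in vertices if z[vertices.index(v)] <= -1 + 1e-9}
print(res.fun, R)    # optimum dual value = minimum cut capacity (5 here), and a minimum cut
```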
3 The Structure of Polyhedra
is a polyhedron.
Proof: Exercise. 2
Remark: The set P = {x ∈ Rn | ∃y ∈ Rk : A (x, y) ≤ b} (where (x, y) denotes the vector in Rn+k obtained by stacking x and y) is called a projection of {z ∈ Rn+k | Az ≤ b} to Rn .
More generally, the image of a polyhedron {x ∈ Rn | Ax ≤ b} under an affine linear mapping f : Rn → Rk , which is given by D ∈ Rk×n , d ∈ Rk and x ↦ Dx + d, is also a polyhedron:

{y ∈ Rk | ∃x ∈ Rn : Ax ≤ b and y = Dx + d}

is a polyhedron. Indeed,

{y ∈ Rk | ∃x ∈ Rn : Ax ≤ b and y = Dx + d}
  = {y ∈ Rk | ∃x ∈ Rn : Ax ≤ b, Dx − y ≤ −d, −Dx + y ≤ d},

and the set on the right-hand side is the projection to Rk of a polyhedron in Rn+k .
3.2 Faces
(a) F is a face of P .
Proof:
Set z := x∗ + ε(x∗ − y) for a suitably small ε > 0 (see Figure 4). Then ct z > δ, so z ∉ P . Therefore, there must be an inequality at x ≤ β of the system Ax ≤ b such that at z > β. We claim that this inequality cannot belong to Ãx ≤ b̃. To see this assume that at x ≤ β belongs to Ãx ≤ b̃. If at x∗ ≤ at y then

at z = at x∗ + ε at (x∗ − y) ≤ at x∗ < β.

But if at x∗ > at y then

at z = at x∗ + ε at (x∗ − y) < at x∗ + [(β − at x∗)/(at (x∗ − y))] · at (x∗ − y) = β.

In both cases, we get a contradiction, so the inequality at x ≤ β belongs to A0 x ≤ b0 .
Therefore, at y = at (x∗ + (1/ε)(x∗ − z)) = (1 + 1/ε)β − (1/ε) at z < β, which means that A0 y ≠ b0 .
(Figure 4: the polyhedron P , the face F , and the points x∗ , y and x∗ + ε(x∗ − y).)
(a) Let c ∈ Rn be a vector such that max{ct x | x ∈ P } < ∞. Then the set of all vectors
x where the maximum of ct x over P is attained is a face of P .
(b) F is a polyhedron.
We are in particular interested in the largest and the smallest faces of a polyhedron.
3.3 Facets
Proof: If P = {x ∈ Rn | Ax = b}, then P does not have a facet (the only face of P is P itself,
see Corollary 19 (d)), so both statements are trivial.
Hence assume that P 6= {x ∈ Rn | Ax = b}.
Let A0 x ≤ b0 be a minimal system of inequalities such that P = {x ∈ Rn | Ax = b, A0 x ≤ b0 }.
Let at x ≤ β be an inequality in A0 x ≤ b0 , and let A00 x ≤ b00 be the rest of the system A0 x ≤ b0
without at x ≤ β.
We will show that at x ≤ β is facet-defining.
Let y ∈ Rn be a vector with Ay = b, A00 y ≤ b00 and at y > β. Such a vector exists because otherwise A00 x ≤ b00 would be a smaller system of inequalities than A0 x ≤ b0 with P = {x ∈ Rn | Ax = b, A00 x ≤ b00 }, which is a contradiction to the definition of A0 x ≤ b0 .
Moreover, let ỹ ∈ P be a vector with A0 ỹ < b0 (such a vector ỹ exists because P is full-dimensional
in the linear subspace {x ∈ Rn | Ax = b}). Consider the vector
z = ỹ + [(β − at ỹ)/(at y − at ỹ)] (y − ỹ).

Then, at z = at ỹ + [(β − at ỹ)/(at y − at ỹ)] · (at y − at ỹ) = β. Furthermore, 0 < (β − at ỹ)/(at y − at ỹ) < 1. Thus, z is the convex
that is met by all elements of F with equality (e.g. the vector z ∈ F fulfills all inequalities in
A00 x ≤ b00 with strict inequality).
On the other hand, by Proposition 18 any facet is defined by an inequality of A0 x ≤ b0 . 2
Corollary 21 Let P ⊆ Rn be a polyhedron.
In particular, this means that the smallest possible representation of a full-dimensional polyhe-
dron P = {x ∈ Rn | Ax ≤ b} is unique (up to swapping inequalities and multiplying inequalities
with positive constants). If possible, we want to describe any polyhedron by facet-defining inequalities because, according to Theorem 20, this gives the smallest possible description of the polyhedron (with respect to the number of inequalities).
Proof: “⇒:” Let F be a minimal face of P . By Proposition 18, we know that there is a
subsystem A0 x ≤ b0 of Ax ≤ b with F = {x ∈ P | A0 x = b0 }. Choose A0 x ≤ b0 maximal with
this property. Let Ãx ≤ b̃ be a minimal subsystem of Ax ≤ b such that F = {x ∈ Rn | A0 x =
b0 , Ãx ≤ b̃}.
We have to show the following claim:
Claim: Ãx ≤ b̃ is an empty system of inequalities.
Proof of the Claim: Assume that at x ≤ β is an inequality in Ãx ≤ b̃. The inequality at x ≤ β is not redundant, so by Theorem 20, F 0 = {x ∈ Rn | A0 x = b0 , Ãx ≤ b̃, at x = β} is a facet of F , and hence, by Corollary 19, F 0 is a face of P . On the other hand, we have F 0 ≠ F , because at x = β is not valid for all elements of F (otherwise we could have added at x ≤ β to the set of inequalities A0 x ≤ b0 ). This is a contradiction to the minimality of F . This proves the claim.
“⇐:” Assume that F = {x ∈ Rn | A0 x = b0 } ⊆ P (for a subsystem A0 x ≤ b0 of Ax ≤ b) is
non-empty.
Then, F cannot contain a proper subset as a face (see Corollary 19 (d)).
Moreover, F = {x ∈ Rn | A0 x = b0 } = {x ∈ P | A0 x = b0 }, so by Proposition 18 the set F is a
face of P . Since any proper subset of F that is a face of P would also be a face of F and we
know that F does not contain proper subsets as faces, F is a minimal face of P . 2
(a) x0 is a vertex of P .
Proof:
i ∈ {1, . . . , k}, then at x0 = Σ_{i=1}^k λi at x(i) < β, which is a contradiction. But then, we have x(i) ∈ {x ∈ P | A0 x = b0 } = {x0 } for all i ∈ {1, . . . , k}, which is a contradiction, too.
“(d) ⇒ (b)”: Let A0 x ≤ b0 be a maximal subsystem of Ax ≤ b such that A0 x0 = b0 . Assume that A0 does not contain n linearly independent rows. Then, there is a vector d ≠ 0 that is orthogonal to all rows in A0 . Hence, for any ε > 0, we have A0 (x0 + εd) = A0 (x0 − εd) = b0 . For any inequality at x ≤ β that is in Ax ≤ b but not in A0 x ≤ b0 , we have at x0 < β. Therefore, if ε > 0 is sufficiently small, at (x0 + εd) ≤ β and at (x0 − εd) ≤ β are valid for inequalities at x ≤ β in Ax ≤ b but not in A0 x ≤ b0 . In other words, we have (x0 + εd) ∈ P and (x0 − εd) ∈ P . □
Examples:
• Polytopes are pointed.
To see this, consider a non-empty polytope P = {x ∈ Rn | Ax ≤ b}. If rank(A) < n, then there is a vector x̃ ∈ Rn \ {0} such that Ax̃ = 0. But then for any x ∈ P and K ∈ R, we have x + K x̃ ∈ P , which is a contradiction to the assumption that P fits into a ball of finite radius. Hence we have rank(A) = n, so P is pointed.
• Polyhedra P that can be written as P = {x ∈ Rn | Ax = b, x ≥ 0} are pointed.
  This can be seen by writing P as P = {x ∈ Rn | [A; −A; −In ] x ≤ (b; −b; 0)}, where the matrix [A; −A; −In ] arises by stacking A, −A and −In . Obviously, this matrix has rank n, hence P is pointed.
Corollary 25 If the linear program max{ct x | Ax ≤ b} is feasible and bounded and the
polyhedron P = {x ∈ Rn | Ax ≤ b} is pointed, then there is a vertex x0 of P such that
ct x0 = max{ct x | Ax ≤ b}. 2
Proof: Choose a minimal set {a1 , . . . , ak } of the given vectors together with λ1 , . . . , λk ≥ 0 such that c = Σ_{i=1}^k λi ai . We show that the vectors a1 , . . . , ak are linearly independent. If this is not the case, there are numbers γ1 , . . . , γk , not all zero, such that Σ_{i=1}^k γi ai = 0. We can assume that at least one γi is positive. Choose σ maximal such that λi − σγi ≥ 0 for all i ∈ {1, . . . , k}. Then, in particular, for at least one i ∈ {1, . . . , k}, we have λi − σγi = 0. Therefore, c = Σ_{i=1}^k (λi − σγi )ai is a representation of c with fewer vectors, which is a contradiction to the minimality of the set {a1 , . . . , ak }. □
Proof: Obviously, at most one of the statements can be valid. Let A be the matrix with rows
at1 , . . . , atm .
If c ∈ cone({a1 , . . . , am }) then by the previous theorem, c can be written as a non-negative
combination of linearly independent vectors from at1 , . . . , atm .
Hence, assume that c 6∈ cone({a1 , . . . , am }), so there is no vector v ∈ Rm , v ≥ 0 such that
ct = v t A. By Farkas’ Lemma (Theorem 6), this implies that there is a vector ũ ∈ Rn such that
Aũ ≥ 0 and ct ũ < 0. This implies that the following LP (with u ∈ Rn as variable vector) has a
feasible solution:
max ct u
s.t. ct u ≤ −1
−ct u ≤ 1
−Au ≤ 0
Moreover, the LP is bounded (-1 is the value of an optimum solution). Hence, the optimum is
attained on a face of the solution polyhedron. By Theorem 22, we can write a minimal face
where the optimum solution value is attained as a set F = {u ∈ Rn | A0 u = b0 } where A0 u ≤ b0
is a subsystem of ct u ≤ −1, −ct u ≤ 1, −Au ≤ 0 consisting of t linearly independent vectors.
Hence, any vector u ∈ F fulfills the condition of (b). 2
3.5 Cones
Proof: “⇐:” Let a1 , . . . , am ∈ Rn be vectors. We have to show that cone({a1 , . . . , am }) is
polyhedral. W.l.o.g. we can assume that the vectors a1 , . . . , am span the vector space Rn .
Consider the set H of half-spaces Hu = {x ∈ Rn | ut x ≤ 0} such that for each Hu ∈ H the
following conditions hold:
• {a1 , . . . , am } ⊆ Hu , and
• There are n − 1 linearly independent vectors ai1 , . . . , ain−1 in {a1 , . . . , am } such that
ut aij = 0 for j ∈ {1, . . . , n − 1}
The set H is finite because there are at most (m choose n−1) such half-spaces, and by Theorem 27 the set cone({a1 , . . . , am }) is the intersection of these half-spaces. Hence, cone({a1 , . . . , am }) is a polyhedron.
“⇒:” Let C = {x ∈ Rn | Ax ≤ 0} be a polyhedral cone. We have to show that C is finitely
generated. Let CA be the cone generated by the rows of A. By the first part of the proof, we
know that CA (as any other finitely generated cone) is polyhedral. Hence, there are vectors
d1 , . . . , dk ∈ Rn such that CA = {x ∈ Rn | dt1 x ≤ 0, . . . , dtk x ≤ 0}. Let CB = cone({d1 , . . . , dk })
be the cone generated by d1 , . . . , dk .
Claim: C = CB .
Proof of the claim: “CB ⊆ C”: Every row vector of A is contained in CA . Hence Adi ≤ 0 for all
i ∈ {1, . . . , k}. Therefore, di ∈ C (for i ∈ {1, . . . , k}) and thus (as C is a cone) CB ⊆ C.
“C ⊆ CB ”: Assume that there is a y ∈ C \ CB . Again by the first part, CB is polyhedral. Thus,
there must be a vector w ∈ Rn with wt di ≤ 0 (for i = 1, . . . , k) and wt y > 0. This implies
w ∈ CA , and therefore wt x ≤ 0 for all x ∈ C. Obviously, together with wt y > 0 this is a
contradiction to the assumption y ∈ C. 2
Remark: For a set S ⊆ Rn we call the set S o = {x ∈ Rn | xt y ≤ 0 for all y ∈ S}, the polar
cone of S (in particular it obviously is a convex cone). For a polyhedral cone C = {x ∈ Rn |
Ax ≤ 0} its polar cone C o is the cone generated by the rows of A (see exercises). We have just
seen in the proof that C oo = C for a polyhedral cone C.
3.6 Polytopes
where

C = { (x, λ) ∈ Rn+1 | λ ≥ 0, Ax − λb ≤ 0 }.

The set C is a polyhedral cone, so by Theorem 28 it is finitely generated by a set of vectors (x1 , λ1 ), . . . , (xk , λk ). Hence, we can assume that all λi are positive (for i ∈ {1, . . . , k}). We can even assume that we have λi = 1 for all i ∈ {1, . . . , k} because otherwise we could scale all vectors by the factor 1/λi . Thus, we have

x ∈ X ⇔ ∃µ1 , . . . , µk ≥ 0 : (x, 1) = µ1 (x1 , 1) + · · · + µk (xk , 1).
Proof: Let P be a polytope with vertex set X. Since P is convex and X ⊆ P , we have
conv(X) ⊆ P . It remains to show that P ⊆ conv(X). Theorem 29 implies that conv(X) is a
polytope, so in particular a polyhedron. Assume that there is a vector y ∈ P \ conv(X). Then,
there is a half-space Hy = {x ∈ Rn | ct x ≤ δ} such that conv(X) ⊆ Hy and y 6∈ Hy . This means
that ct y > ct x for all x ∈ X, so the maximum of the function ct x over P will not be attained at
a vertex. This is a contradiction to Corollary 25. 2
Notation: For two vector sets X, Y ⊆ Rn , we define their Minkowski sum as:
X + Y := {z ∈ Rn | ∃x ∈ X ∃y ∈ Y : z = x + y}.
Theorem 31 Let P = {x ∈ Rn | Ax ≤ b} be a polyhedron. Then, there are finite sets
V, E ⊆ Rn such that
P = conv(V ) + cone(E).
P = conv(V ) + cone(E).
4 Simplex Algorithm
The Simplex Algorithm by Dantzig [1951] is the oldest algorithm for solving general linear
programs. Geometrically it works as follows: Given a polyhedron P and a linear objective
function, we start with any vertex of P . Then we walk along a one-dimensional face of P to
another vertex and repeat this until we found a vertex where the objective function attains a
maximum.
If we want to have a chance to follow this main strategy, we need a pointed polyhedron. That
is why in this section we consider linear programs in standard equation form:
max ct x
s.t. Ax = b (28)
x ≥ 0
max ct x
s.t. Ãx ≤ b (29)
x ≥ 0
4.1 Feasible Basic Solutions
Notation: We denote the index set of the columns of a matrix A ∈ Rm×n by {1, . . . , n}. For a
subset B ⊆ {1, . . . , n}, we denote by AB the sub-matrix of A containing exactly the columns
with index in B. Similarly, for a vector x ∈ Rn , we denote by xB the sub-vector of x containing
the entries with index in B. Note that xB is a vector of length |B| but its entries are not indexed
from 1 to |B|, but the indices are the elements of B, so for example for B = {2, 4, 9} we have
xB = (x2 , x4 , x9 ).

Definition Let A ∈ Rm×n be a matrix with rank(A) = m and let b ∈ Rm .

(a) A set B ⊆ {1, . . . , n} with |B| = m such that AB is non-singular is called a basis. With N := {1, . . . , n} \ B, the vector x ∈ Rn with xN = 0 and xB = A−1B b is called the basic solution of Ax = b for B.

(b) If x is a basic solution of Ax = b for B, then the variables xj with j ∈ B are called
basic variables and the variables xj with j ∈ N are called non-basic variables.
(c) A basic solution x is called feasible if x ≥ 0. A basis is called feasible if its basic
solution is feasible.
Remark: We also use the above definition for inequality systems of the type Ãx ≤ b, x ≥ 0
(with à ∈ Rm×ñ ). E.g. we call a vector x∗ ∈ Rñ with Ãx∗ ≤ b and x∗ ≥ 0 a basic solution if x∗ , s∗
with s∗ := b − Ãx∗ is a basic solution for Ãx + Im s = b, x ≥ 0, s ≥ 0 (with n := ñ + m variables).
In particular, in a feasible basic solution of Ãx ≤ b, x ≥ 0, the number of tight constraints
(including non-negativity constraints) must be at least n − m = ñ, and in a non-degenerated
feasible basic solution, the number of tight constraints must be exactly ñ. This is because
each positive non-slack variable and each positive slack variable is associated with a non-tight
constraint.
Example: Consider the following system of equations:
x 1 + x2 + s 1 = 1
2x1 + x2 + s2 = 2 (30)
x1 , x 2 , s 1 , s 2 ≥ 0
The variables are x1 , x2 , s1 , and s2 . We denoted the last two variables by s1 and s2 because
they can be interpreted as slack variables for the following system of inequalities: x1 + x2 ≤
1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0.
If we write the system of equations in matrix notation, we get Ax = b with

A = [ 1 1 1 0 ; 2 1 0 1 ],  x = (x1 , x2 , s1 , s2 )t ,  b = (1, 2)t .
For B = {1, 2}, we get AB = [ 1 1 ; 2 1 ] with feasible basic solution (1, 0, 0, 0). So in particular this basic feasible solution is degenerated. If we choose instead B = {2, 3}, we get AB = [ 1 1 ; 1 0 ] and the corresponding basic solution is (0, 2, −1, 0), which is, of course, infeasible.
Figure 5 illustrates these two basic solutions. However, note that the figure does not show the solution space (which is 4-dimensional) but only the solution space of the problem without the slack variables s1 and s2 , i.e. the solution space of the system x1 + x2 ≤ 1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0. So the two points (1, 0) and (0, 2) are basic solutions only in the sense of the remark stated after the last definition.
(Figure 5: the feasible region of x1 + x2 ≤ 1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0 with the degenerated basic solution (1, 0) and the infeasible basic solution (0, 2).)
In this example we could easily make the degenerated basic solution non-degenerated by skipping the redundant constraint 2x1 + x2 ≤ 2. This is always possible if we only have two non-slack variables, but already in three dimensions there are instances where we cannot get rid of degenerated basic solutions. As an example consider Figure 6. If the pyramid defines the set of all feasible solutions, the marked vector is a degenerated basic solution, because four constraints are fulfilled with equality while there are only three non-slack variables.
Note that the example (30) shows that the same vertex of a polyhedron can belong to a
degenerated or a non-degenerated basic solution, depending on how we describe the polyhedron
by a system of inequalities.
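The basic solutions of the small example (30) can be enumerated mechanically. The following sketch (my own illustration, assuming NumPy) runs over all choices of two columns, checks linear independence and computes the corresponding basic solution:

```python
import numpy as np
from itertools import combinations

A = np.array([[1, 1, 1, 0],
              [2, 1, 0, 1]], dtype=float)
b = np.array([1, 2], dtype=float)

for B in combinations(range(4), 2):
    AB = A[:, B]
    if abs(np.linalg.det(AB)) < 1e-12:
        continue                                   # columns linearly dependent: no basis
    x = np.zeros(4)
    x[list(B)] = np.linalg.solve(AB, b)
    status = "feasible" if (x >= -1e-12).all() else "infeasible"
    print(B, x, status)                            # e.g. (0, 1) -> (1, 0, 0, 0), feasible
```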
(Figure 6: a pyramid whose apex is a degenerated basic solution: four constraints are fulfilled with equality while there are only three non-slack variables.)
Proof: The vector x0 is a vertex of P if and only if it is a feasible solution of the following
system and fulfills n linearly independent inequalities of the system with equality:
Ax ≤ b
−Ax ≤ −b
−In x ≤ 0
This is the case if and only if x0 ≥ 0, Ax0 = b and x0N = 0 for a set N ⊆ {1, . . . , n} with
|N | = n − m such that with B = {1, . . . , n} \ N the matrix AB has full rank. This is equivalent
to being a feasible basic solution. 2
Before we describe the algorithm in general, we will present some examples (which are taken
from Matoušek and Gärtner [2007]).
Consider the following linear program:
max x1 + x2
s.t. −x1 + x2 + x3 = 1
x1 + x4 = 3
x2 + x5 = 2
x1 , x 2 , x 3 , x 4 , x 5 ≥ 0
In matrix notation, the constraints read

[ −1 1 1 0 0 ; 1 0 0 1 0 ; 0 1 0 0 1 ] (x1 , x2 , x3 , x4 , x5 )t = (1, 3, 2)t .
We first need a basis to start with. We simply choose B = {3, 4, 5}, which gives us the basic
solution x = (0, 0, 1, 3, 2). We write the constraints and the objective function in a so-called
simplex tableau:
x3 = 1 + x1 − x2
x4 = 3 − x1
x5 = 2 − x2
z = x1 + x2
The first three rows describe an equation system that is equivalent to the given one but each
basic variable is written as a combination of the non-basic variable. The last line describes the
objective function.
We will try to increase non-basic variables (which are zero in the current solution) with a
positive coefficient in the objective function. Hence, here we could use x1 or x2 , and we choose
x2 . The constraint x3 = 1 + x1 − x2 is the critical constraint that prevents us from increasing x2 to something
bigger than 1 (without increasing x1 ). If we set x2 to something bigger than 1, x3 would become
negative. The constraint x5 = 2 − x2 only gives an upper bound of 2 for the value of x2 . Since the
bound induced by non-negativity of x3 is tighter (so the constraint x3 = 1 + x1 − x2 is critical),
we replace 3 in the basis by 2. The new basic variable x2 can be written as a combination of the
non-basic variables by using the first constraint: x2 = 1 + x1 − x3 . The new base is B = {2, 4, 5}
with a new basic solution x = (0, 1, 0, 3, 1). This is the new simplex tableau:
x2 = 1 + x1 − x3
x4 = 3 − x1
x5 = 1 − x1 + x3
z = 1 + 2x1 − x3
Increase x1 . x5 = 1−x1 +x3 is critical. x1 = 1+x3 −x5 . New base B = {1, 2, 4}. x = (1, 2, 0, 2, 0).
x1 = 1 + x3 − x5
x2 = 2 − x5
x4 = 2 − x3 + x5
z = 3 + x3 − 2x5
Increase x3 . x4 = 2−x3 +x5 is critical. x3 = 2−x4 +x5 . New base B = {1, 2, 3}. x = (3, 2, 2, 0, 0).
x1 = 3 − x4
x2 = 2 − x5
x3 = 2 − x4 + x5
z = 5 − x4 − x5
The value of the objective function for any feasible solution (x1 , . . . , x5 ) is 5 − x4 − x5 . Since we
have found a solution where x4 = x5 = 0 and we have the constraint that xi ≥ 0 (i = 1, . . . , 5),
our solution is an optimum solution.
Unbounded instance:
As a second example, consider:
max x1
s.t. x 1 − x2 + x3 = 1
−x1 + x2 + x4 = 2
x1 , x 2 , x 3 , x 4 ≥ 0
Quite obviously this LP is unbounded (one can choose x1 arbitrarily large and set x2 = x1 ,
x3 = 1, and x4 = 2).
Again we use the “slack variables” (here x3 and x4 ) for a first basis. This gives B = {3, 4} and
x = (0, 0, 1, 2).
x3 = 1 − x1 + x2
x4 = 2 + x1 − x2
z = x1
Increase x1 . x3 = 1 − x1 + x2 is critical. x1 = 1 + x2 − x3 . New base B = {1, 4}. x = (1, 0, 0, 3).

x1 = 1 + x2 − x3
x4 = 3 − x3
z = 1 + x2 − x 3
We can increase x2 as much as we want (provided that we increase x1 by the same amount).
Thus the simplex tableau shows that the linear program is unbounded.
Degeneracy:
A final example shows what may happen if we get a degenerated basic solution.
max x2
s.t. −x1 + x2 + x3 = 0
x1 + x4 = 2
x1 , x 2 , x 3 , x 4 ≥ 0
x3 = x1 − x2
x4 = 2 − x1
z = x2
Increase x2 . x3 = x1 − x2 is critical (we cannot increase x2 at all). x2 = x1 − x3 . New base B = {2, 4}. x = (0, 0, 0, 2).

x2 = x1 − x3
x4 = 2 − x1
z = x1 − x3
Increase x1 . x4 = 2 − x1 is critical. x1 = 2 − x4 . New base B = {1, 2}. x = (2, 2, 0, 0).

x1 = 2 − x4
x2 = 2 − x3 − x4
z = 2 − x3 − x4
Again, we have found an optimum solution because all coefficients of the non-basic variables in
the objective function z = 2 − x3 − x4 are negative.
After these three examples, we will now describe the simplex method in general.
For a feasible basis B, the simplex tableau is a system T (B) of m + 1 linear equations with
variables x1 , . . . , xn and z with this form
xB = p + QxN
(31)
z = z0 + rt xN
• xB is the vector of the basic variables, N = {1, . . . , n} \ B, and xN is the vector of the non-basic variables,

• p = A−1B b and Q = −A−1B AN ,

• z0 = ctB A−1B b and r = cN − (ctB A−1B AN )t .
Note that the entries of p are not necessarily numbered from 1 to m but that p uses B as the
set of indices (and for r, we have a corresponding statement). In particular, the rows of Q are
indexed by B and the columns by N . We denote the entries of Q by qij (where i ∈ B and
j ∈ N ).
Then xB = A−1B b − A−1B AN xN , which is equivalent to AB xB = b − AN xN and Ax = b.
Remark: It is easy to check that there is only one simplex tableau for every feasible basis B. The cost function z0 + rt xN does not directly depend on the basic variables but only on the non-basic variables. Their impact on the overall cost is given by the vector r = cN − (ctB A−1B AN )t . An entry of r is called the reduced cost of its corresponding non-basic variable.
If all reduced costs are non-positive, we have already found an optimum solution:
Lemma 34 Let T(B) be a simplex tableau for a feasible basis B. If r ≤ 0, then the basic solution of B is optimum.

Proof: Let x be the basic solution of B; its value is ct x = z0 . Every feasible solution x′ satisfies x′N ≥ 0 and hence ct x′ = z0 + rt x′N ≤ z0 . □

Lemma 35 Let T(B) be a simplex tableau for a feasible basis B. If there is an index α ∈ N with rα > 0 and qiα ≥ 0 for all i ∈ B, then the linear program is unbounded.

Proof: Let x be the feasible basic solution for B. Let K ∈ R with K > ct x be a constant. Define a new feasible solution x̃ as follows: x̃α := (K − ct x)/rα , x̃i := xi for i ∈ N \ {α}, and x̃j := pj + qjα x̃α
for j ∈ B. It is easy to check that x̃ is a feasible solution with ct x̃ ≥ K. Hence, the linear
program is unbounded. 2
In the following, we denote the entries of A by aij (i ∈ {1, . . . , m}, j ∈ {1, . . . , n}). The column
of A with index j is denoted by a·j .
Lemma 36 Let T(B) be a simplex tableau for a feasible basis B, let α ∈ N be an index with rα > 0 and qiα < 0 for at least one i ∈ B, and let β ∈ B with qβα < 0 and pβ /qβα = max{pi /qiα | qiα < 0, i ∈ B}. Then B̃ := (B \ {β}) ∪ {α} is a feasible basis.

Proof: We have to show that AB̃ has full rank and that it is feasible, i.e. that its basic solution is non-negative.
(i) All but one of the columns of AB̃ belong to AB . Hence, the matrix A−1B AB̃ contains all unit vectors ei with the possible exception of eβ because we removed the β-th column from AB . However, this removed column has been replaced by the α-th column a·α of A, so the remaining column of A−1B AB̃ is A−1B a·α . But this is exactly the column with index α of −Q = A−1B AN . By construction, qβα ≠ 0, so all columns of A−1B AB̃ are linearly independent.
(ii) We have to show that the basic solution of B̃ is non-negative. We increase xα to −pβ /qβα and set the basic variables xB to p − q·α · (pβ /qβα ), where q·α is the column with index α of Q. For i ∈ B with qiα ≥ 0 (so in particular i ≠ β) we have pi − qiα · (pβ /qβα ) ≥ pi ≥ 0. For i ∈ B with qiα < 0 we have pβ /qβα ≥ pi /qiα , so pi ≥ qiα · (pβ /qβα ), with equality for i = β. This leads to xβ = 0 and xB ≥ 0, so we get a feasible basic solution for B̃. □
   Compute the simplex tableau T(B)

       xB = p + QxN
       z  = z0 + rt xN

   for the basis B;                  // See equation (31) and the following notation.
5  if r ≤ 0 then
       return x̃ = x;                // x̃ is optimum (see Lemma 34).
6  Choose an index α ∈ N with rα > 0;
                                     // Here we can apply different pivot rules.
7  if qiα ≥ 0 for all i ∈ B then
       return “unbounded”;           // By Lemma 35, the LP is unbounded.
8  Choose an index β ∈ B with qβα < 0 and pβ /qβα = max{pi /qiα | qiα < 0, i ∈ B};
                                     // Again, we can apply different pivot rules.
9  Set B = (B \ {β}) ∪ {α};
                                     // See Lemma 36 proving that we get a new feasible basis.
10 go to line 3
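The tableau computations can be condensed into a few lines of NumPy. The following sketch is my own compact implementation of the iteration above (using Bland's rule, discussed below, for both choices); it is meant as an illustration, not as a robust solver:

```python
import numpy as np

def simplex(A, b, c, B):
    """max c^t x s.t. Ax = b, x >= 0, starting from a feasible basis B (list of column indices)."""
    A, b, c = (np.asarray(M, dtype=float) for M in (A, b, c))
    B = list(B)
    while True:
        N = [j for j in range(A.shape[1]) if j not in B]
        AB_inv = np.linalg.inv(A[:, B])
        p = AB_inv @ b                              # basic solution: x_B = p, x_N = 0
        Q = -AB_inv @ A[:, N]                       # tableau: x_B = p + Q x_N
        r = c[N] - c[B] @ AB_inv @ A[:, N]          # reduced costs
        if (r <= 1e-9).all():                       # line 5: current basis is optimum
            x = np.zeros(A.shape[1]); x[B] = p
            return x, float(c @ x)
        alpha = min(N[k] for k in range(len(N)) if r[k] > 1e-9)   # line 6 (Bland's rule)
        q = Q[:, N.index(alpha)]
        if (q >= -1e-9).all():                      # line 7: the LP is unbounded
            return None, float('inf')
        ratios = [(p[i] / q[i], B[i]) for i in range(len(B)) if q[i] < -1e-9]
        best = max(t for t, _ in ratios)
        beta = min(j for t, j in ratios if t >= best - 1e-12)     # line 8 (Bland's rule)
        B[B.index(beta)] = alpha                    # line 9: exchange step
```

Called on the first example above with the starting basis {3, 4, 5} (0-based indices [2, 3, 4]), this returns the optimum solution (3, 2, 2, 0, 0) with value 5.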
To find an initial feasible basis for a linear program max{ct x | Ax = b, x ≥ 0}, we can assume b ≥ 0 (otherwise multiply the corresponding equations by −1), add one artificial variable per constraint, and set Ã := [A | Im ] and x̃ := (x1 , . . . , xn , xn+1 , . . . , xn+m ). Then, consider the auxiliary linear program

max −(xn+1 + xn+2 + · · · + xn+m )
s.t. Ãx̃ = b        (32)
     x̃ ≥ 0
For this linear program, it is trivial to find a feasible basis ({n + 1, . . . , n + m} will work), so
we can solve it by the Simplex Algorithm. If the value of its optimum solution is negative,
this means that the original linear program does not have a feasible solution. Otherwise, the
Simplex Algorithm will provide a basic solution for the original linear program. In this case,
the solution of the new LP computed by the Simplex Algorithm could contain variables
from xn+1 , . . . , xn+m as basic variables but their value must be 0 and hence they can be replaced
easily by variables from x1 , . . . , xn .
In lines 6 and 8, we may have a choice between different candidates to enter or leave the basis.
The elements chosen in these steps are called pivot elements, and the rules by which we choose
them are called pivot rules. Several different pivot rules for the entering variable have been
proposed:
• Largest coefficient rule: For the entering variable choose α such that rα is maximized.
This is the rule that was proposed by Dantzig in his first description of the Simplex
Algorithm.
• Largest increase rule: Choose the entering variable such that the increase of the
objective function is maximized. Finding an α with that property takes more time because
it is not sufficient to consider the vector r only.
• Steepest edge rule: Choose the entering variable in such a way that we move the
feasible basic solution in a direction as close to the direction of the vector c as possible.
This means we maximize
ct (xnew − xold ) / ||xnew − xold ||

where xold is the basic feasible solution of the current basis and xnew is the basic feasible solution of the basis after the exchange step. This rule is even more time-consuming, but
in many practical experiments it turned out to lead to a small number of exchange steps.
Here, we only analyze a pivot rule that is quite inefficient in practice but has the nice property that we can show that the Simplex Algorithm terminates at all if we follow that rule. If all
exchange steps improve the value of the current solution, we can be sure that the algorithm will
terminate because we can never visit the same basic solution twice, and there is only a finite
(though exponential) number of basic solutions. However, exchange steps do not necessarily
change the value of the solution. Therefore, depending on the pivot rules, it is possible that
the Simplex Algorithm runs in an endless loop by considering the same sequence of bases
forever. This behavior is called cycling (see page 30 ff. of Chvátal [1983] for an example that
this can really happen). The good news is that we can avoid cycling by using an appropriate
pivot rule.
If the algorithm does not terminate, it has to consider the same basis B twice. The computation
between two occurrences of B is called a cycle. Let F ⊆ {1, . . . , n} be the indices of the variables
that have been added to (and hence removed from) the basis during one cycle. We call xF the
cycle variables.
Lemma 37 If the Simplex Algorithm cycles, all basic solutions during the cycling
are the same and all cycle variables are 0.
Proof: The value of a solution considered in Simplex Algorithm never decreases, so during
cycling it cannot increase either. Let B be a feasible basis that occurs in the cycle, and let
B 0 = (B ∪ {α}) \ {β} be the next basis. The only non-basic variable that could be increased is
xα . However, if it indeed was increased, then, because rα > 0, this would increase the value of
the solution. This shows that the non-basic variables remain zero. But then, all variables remain
unchanged because the basic variables are determined uniquely by the non-basic variables. 2
A pivot rule that is able to avoid cycling is Bland’s rule (Bland [1977]) that can be described
as follows: In line 6 of the Simplex Algorithm, we choose α among all elements in N with
rα > 0 such that α is minimal. In line 8, we choose β among all elements in B with qβα < 0
and pβ /qβα = max{pi /qiα | qiα < 0, i ∈ B} such that β is minimal.
Theorem 38 With Bland’s rule as pivot rule in lines 6 and 8, the Simplex Algorithm
terminates after a finite number of steps.
Proof: Assume that the algorithm cycles while using Bland’s rule. We use the notation from
above and consider the set F of the indices of the cycle variables. Let π be the largest element
of F , and let B be the basis just before π enters the basis. Let p, Q, r and z0 be the entries of the simplex tableau T(B). Let B′ be the basis just before π leaves it. Let p′, Q′, r′ and z′0 be the entries of the simplex tableau T(B′).
Let N = {1, . . . , n} \ B be the set of the non-basic variables (so in particular π ∈ N ). According
to Bland’s rule we choose the smallest index and π = max(F ), so when B is considered, π is
the only candidate in F to enter the basis. In other words:

rπ > 0  and  rj ≤ 0 for all j ∈ N ∩ (F \ {π}). (33)
Let α be the index entering B′. Again by Bland's rule, π must have been the only candidate among all elements of F to leave B′. Since p′j = 0 for all j ∈ B′ ∩ F , this means that

q′πα < 0  and  q′jα ≥ 0 for j ∈ B′ ∩ (F \ {π}). (34)
Roughly speaking, we will get a contradiction because (33) says that in a feasible basic solution
increasing a non-basic variable in xF \{π} or decreasing xπ (to something negative!) will not
improve the result. On the other hand, (34) says that increasing xα while decreasing xπ (again
to something negative) will improve the result.
We will formalize this statement by considering the following auxiliary linear program:
max ct x
s.t. Ax = b
xF \{π} ≥ 0 (35)
xπ ≤ 0
xN \F = 0
≥0 if j ∈ F \ {π}
xj
≤0 if j = π
Therefore, by statement (33), rj xj ≤ 0 for all j ∈ F . With the condition xN \F = 0 this leads to
rt xN ≤ 0 for any solution x of (35). Therefore, the value of any such solution is at most z0 ,
and thus x̃ is an optimum solution of (35). This proves Claim 1.
Claim 2: The LP (35) is unbounded.
The bases are changed during the cycling but we always have the same basic solution. Hence,
if x̃ is a feasible basic solution of the original LP for basis B is also a feasible basic solution
for the basis B 0 . We choose a positive number K and set x0α = K. For j ∈ N 0 \ {α} (with
N 0 = {1, . . . , n} \ B 0 ), we set x0j = x̃j = 0. Moreover, we set xB 0 = p0 + Q0 x0N 0 . By (34), this
defines a feasible solution of the auxiliary LP (35). Since α was a candidate for entering the
basis B 0 , we have rα0 > 0. Hence, we get a solution with value ct x0 = z00 + r0t x0N 0 = z00 + K · rα0 .
As we can choose K arbitrarily large, this shows that LP (35) is unbounded. 2
56
4.3 Efficiency of the Simplex Algorithm
We have seen that Bland’s rule guarantees that the Simplex Algorithm will terminate. What
can we say about the running time? Consider for some with 0 < < 12 the following example:
max xn
−x1 ≤ 0
x1 ≤ 1
xj−1 − xj ≤ 0 for j ∈ {2, . . . , n}
xj−1 + xj ≤ 1 for j ∈ {2, . . . , n}
Of course, adding non-negativity constraints for all variables would not change the problem.
The polyhedron defined by these inequalities is called Klee-Minty cube (Klee and Minty
[1972]). It turns out that the Simplex Algorithm with Bland’s rule (depending on the initial
solution) may consider 2n bases before finding the optimum solution. In particular, this example
shows that we don’t get a polynomial-time algorithm.
The bad news is that for any of the above pivot rules instances have been found where the
Simplex Algorithm with that particular pivot rule has exponential running time.
Assume that you are given an optimum pivot rule that guides you to an optimum solution
with a smallest possible number of iterations. Then, the number of iterations depends on the
following property of the instances:
Obviously, if we don’t make any assumptions on the starting solution, the number of iterations
performed by the Simplex Algorithm optimizing over a polyhedron P will be at least the
combinatorial diameter of P , even with an optimum pivot rule.
It is an open question what the largest combinatorial diameter of a d-dimensional polyhedron
with n facets is. In 1957, W. Hirsch conjectured that the combinatorial diameter could be at
most n − d. This conjecture was open for decades but it has been disproved by Santos [2011] who
showed that there is a 20-dimensional polyhedron with 40 facets and combinatorial diameter
21. More generally, he proved that there are counter-examples to the Hirsch conjecture with
arbitrarily many facets. Nevertheless, it is still possible that the combinatorial diameter is
always polynomially (or even linearly) bounded in the dimension and the number of facets. The
best known upper bound for the combinatorial diameter is O(n2+log d ) and was proven by Kalai
and Kleitman [1992]. For an overview of this topic see Section 3.3 of Ziegler [2007].
57
In practical experiments, the Simplex Algorithm typically turns out to be very efficient. It
could also be proved that the average running time (with a specified probabilistic model) is
polynomial (see Borgwardt [1982]). Moreover, Spielmann and Teng [2005] have shown that the
expected running time on a slight perturbation of a worst-case instance can be bounded by a
polynomial.
If the linear program max{ct x | Ax = b, x ≥ 0} is feasible and bounded then the Simplex
Algorithm does not only provide an optimum primal solution but we can also get an optimum
solution of the dual linear program min{bt y | At y ≥ c}. To see this, let B the feasible basis
corresponding to the optimum computed by the Simplex Algorithm. Set ỹ = A−t B cB (where
−t t −1 t t t −t
AB = (AB ) ). This leads to AB ỹ = cB and AN ỹ = AN AB cB ≥ cN where the last inequality
follows from the fact that in T (B) we have 0 ≥ r = cN − (ctB A−1 t
B AN ) . So the vector ỹ is feasible
for the dual LP, and it is an optimum solution because together with the (primal) basic solution
x̃ for the basis B, it satisfies the complementary slackness condition (ỹ t A − ct )x̃ = 0.
In fact, the condition r ≤ 0 in the simplex tableau T (B) guarantees the existence of a dual
solution y with y t AB = ctB . In the Dual Simplex Algorithm, we start with a feasible basic
dual solution, i.e. a feasible dual solution for which a basis B exists with y t AB = ctB . If ctB A−1
B
is a feasible dual solution, we call B a dual feasible basis. Then, we compute the corresponding
simplex tableau T (B) (which exists for any basis not just a feasible basis). Thus the vector r
will have no positive entry. Note that B may not be feasible, so entries of p can be negative.
Now the algorithm swaps elements between the basis and the rest of the variables similarly to
the simplex algorithm but instead of keeping p non-negative it keeps r non-positive.
For any basis B such that in T (B) the vector r has no positive entry, the following properties
(that are easy to prove) are the basis of the Dual Simplex Algorithm:
58
• If there is a β ∈ B with pβ < 0 such that qβj ≤ 0 for all j ∈ N , then the primal LP is
infeasible.
r
• For β ∈ B with pβ < 0 and α ∈ N with qβα > 0 with qrβα α
≥ qβjj for all j ∈ N with qβj > 0,
then (B \ {β}) ∪ {α} is a dual feasible basis. Then the value of the dual solution is changed
−p
by qβαβ rα . In particular, if rα 6= 0 then the value of the dual solution gets smaller.
The Dual Simplex Algorithm simply applies the exchange steps in the last item until we
get a feasible basis. The algorithm can be considered as the Simplex Algorithm applied to
the dual LP. Thus it can also run into cycling and its efficiency is not better then the efficiency
of the Simplex Algorithm.
However, in some applications, the Dual Simplex Algorithm is very useful: If you add
an additional constraint to the primal LP, then a primal solution can become infeasible, so in
the Primal Simplex Algorithm we have to start from scratch. However, the dual solution
is still feasible. It is possibly not optimal but often it can be made optimal with just some
iterations of the Dual Simplex Algorithm.
The Network Simplex Algorithm can be seen as the Simplex Algorithm applied to
Min-Cost-Flow-Problems. Even for this special case, we cannot prove a polynomial running
time but it turns out that, in practice, the Network Simplex Algorithm is among the
fastest algorithms for Min-Cost-Flow-Problems. Though it is a variant of the Simplex
Algorithm, it can be described as a pure combinatorial algorithm.
Notation: We call b(v) the balance of v. If b(v) > 0, we call it the supply of v, and if b(v) < 0,
we call it the demand of v. Nodes v of G with b(v) > 0 are called sources, nodes v with
b(v) < 0 are called sinks.
During this chapter, n is always the number of nodes and m the number of edges of the graph
G.
59
Minimum-Cost Flow Problem
↔ ↔
Definition 16 Let G be a directed graph. We define the graph G by V (G) = V (G) and
↔ ← ←
E(G) = E(G)∪{ ˙ e | e ∈ E(G)} where e is an edge from w to v if e is an edge from v to
← ↔
w. e is called the reverse edge of e. Note that G may have parallel edges even if G does
not contain any parallel edges. If we have edge costs c : E(G) → R these are extended
↔ ←
canonically to edges in E(G) by setting c( e ) = −c(e).
Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem and let f be a
b-flow in (G, u). Then, the residual graph Gu,f is defined by V (Gu,f ) := V (G) and
← ↔
E(Gu,f ) := {e ∈ E(G) | f (e) < u(e)}∪{ ˙ e ∈ E(G) | f (e) > 0}. For e ∈ E(G) we define
←
the residual capacity of e by uf (e) = u(e) − f (e) and the residual capacity of e by
←
uf ( e ) = f (e).
The residual graph contains the edges where flow can be increased as forward edges and edges
where flow can be reduced as reverse edges. In both cases, the residual capacity is the maximum
value by which the flow can be modified. If P is a subgraph of the residual graph, then an
augmentation along P by γ means that we increase the flow on forward edges in P (i.e. edges
in E(G) ∩ E(P )) by γ and reduce it on reverse edges in P by γ. Note that the resulting mapping
is only a flow if γ is at most the minimum of the residual capacity of the edges in P .
60
Lemma 39 Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem. A
b-flow f is a spanning tree solution if and only if x̃ ∈ RE(G) with x̃e = f (e) is a vertex of
the polytope
X X
E(G)
x∈R | 0 ≤ xe ≤ u(e) (e ∈ E(G)), xe − xe = b(v) (v ∈ V (G)) . (36)
+ −
e∈δ (v) e∈δ (v)
Proof: “⇒:” Let f be a spanning tree solution and x̃ ∈ RE(G) with x̃e = f (e). Consider
all inequalities xe ≥ 0 with f (e) = 0, xe ≤ u(e) with f (e) = u(e) and for each connected
component
P (V (G), {e ∈ E(G) | 0 < f (e) < u(e)}) for all but one vertex the equation
ofP
e∈δ + (v) xe − e∈δ − (v) xe = b(v). These are |E(G)| linearly independent inequalities that are
fulfilled with equality by x̃. Hence x̃ is a vertex.
“⇐:” Let f by a b-flow. Assume that x̃ ∈ RE(G) with x̃e = f (e) is a vertex of the polytope (36).
Assume that (V (G), {e ∈ E(G) | 0 < f (e) < u(e)}) contains an undirected cycle C. Choose
an > 0 such that ≤ min{min{f (e), u(e) − f (e)} | e ∈ E(C)}. Fix one of the two possible
orientations of C. We call an edge of C a forward edge if its orientation is the same as the
chosen orientation, otherwise it is called backward edge. Set x0e = for all forward edges and
x0e = − for all backward edges. For all edges e ∈ E(G) \ E(C), we set x0e = 0. Then x̃ + x0
and x̃ − x0 belong to the polytope (36) and x̃ = 12 ((x̃ + x0 ) + (x̃ − x0 )), so by Proposition 24, x̃
cannot be a vertex. Hence, we have a contradiction. 2
Proof: Since the polyhedron (36) is in fact a polytope, it is pointed, so there is an optimum
solution that is a vertex. Together with Lemma 39, this proves the statement. 2
61
Definition 18 Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem
where we assume that G is connected. A spanning tree structure is a quadruple
(r, T, L, U ) where r ∈ V (G), E(G) = T ∪˙ L ∪˙ U , |T | = |V (G)| − 1, and (V (G), T ) does
not contain any undirected cycle.
The b-flow f associated to the spanning tree structure (r, T, L, U ) is defined by
• f (e) = 0 for e ∈ L,
• f (e) = v∈Ce b(v) + e0 ∈U ∩δ− (Ce ) u(e0 ) − e0 ∈U ∩δ+ (Ce ) u(e0 ) for e ∈ T where we
P P P
G G
denote by Ce vertex set of the the connected component of (V (G), T \ {e}) containing
v (for e = (v, w)).
Let (r, T, L, U ) be a spanning tree structure and f the b-flow associated to it. The structure
(r, T, L, U ) is called feasible if 0 ≤ f (e) ≤ u(e) for all e ∈ E(T ).
An edge (v, w) ∈ E(T ) is called downward if v is on the undirected r-w-path in T ,
otherwise is is called upward.
A feasible spanning tree structure (r, T, L, U ) is called strongly feasible if 0 < f (e) for
every downward edge e ∈ E(T ) and f (e) < u(e) for every upward edge e ∈ E(T ) (where f
is again the b-flow associated to (r, T, L, U )).
We call the unique function π : V (G) → R with π(r) = 0 and cπ (e) := c(e)+π(v)−π(w) = 0
for all e = (v, w) ∈ T the potential associated to the spanning tree structure
(r, T, L, U ).
62
Remarks:
• Obviously, the b-flow associated to the spanning tree structure (r, T, L, U ) fulfills the flow
conservation rule, but it may be infeasible.
↔ ↔
• π(v) is the length of the r-v-path in (G, c ) consisting of edges of T and their reverse
edges, only.
• In a strongly feasible tree structure, we can send a positive flow from each vertex v to r
along tree edges such that that the new flow remains non-negative and fulfills the capacity
constraints.
Proof: Since the potential π just encodes the distances to r in T , a breadth-first search in
the edges of T and the reverse edges of T is sufficient.
We can compute f by scanning the vertices in an order of non-increasing distance to r in T . 2
Proposition 42 Let (r, T, L, U ) be a feasible spanning tree structure and π the potential
associated to it. If cπ (e) ≥ 0 for all e ∈ L and cπ (e) ≤ 0 for all e ∈ U , then the b-flow
associated to (r, T, L, U ) is optimum.
Proof: The flow associated to (r, T, L, U ) is a basic solution of the standard linear program-
ming formulation for the minimum-cost flow problem. The criterion in the proposition is
equivalent to the statement that the reduced costs of all non-basic variables are non-positive.
This is equivalent to the optimality of the solution. 2
↔ ←
For an edge e = (v, w) ∈ E(G) \ T with e 6∈ T , we call e together with the w-v path consisting
of edges of T and reverse edges of edges of T only, the fundamental circuit of e. The vertex
closest to r in the fundamental circuit is called the peak of e.
Algorithm 2 gives a summary of the Network Simplex Algorithm. As an input, we need
a strongly feasible tree structure. However, even if there is a feasible b-flow, such a strongly
feasible tree structure may not exist. But we can modify the instance such that we can easily
find a strongly feasible tree structure (r, T, L, U ). We add artificial expensive edges between r
and all other nodes. For each sink v ∈ V (G) \ {r}, we add an edge (r, v) with u((r, v)) = −b(v).
For all other nodes v ∈ V (G) \ {r} we add an edges (v, r) with u((v, r)) = b(v) + 1. Then,
we get a strongly feasible spanning tree structure by setting L to the set of all old edges (i.e.
without the artificial edges connecting r) and by setting U = ∅. If the weight on the artificial
63
edges is high enough (1 + n maxe∈E(G) |c(e)| would be sufficient) and there is a solution that
does not use these edges at all, no optimum solution will send flow along these new edges, so
the new instance is equivalent.
Proof: It is easy to check that after the modification in the lines 11 to 14 f and π are still
the b-flow and the potential associated to (r, T, L, U ).
We will show that the spanning tree structure (r, T, L, U ) remains strongly feasible. By the
choice of γ in line 5 it remains feasible.
For an edge e = (v, w) on T let ẽ = (v, w) if e is an upward edge and ẽ = (w, v) if e is a
downward edge. We have to show that after an iteration of the algorithm, for all edges e ∈ E(T ),
the edge ẽ has a positive residual capacity. This is obvious for all edges outside C. For the edge
64
on the path on C from the head of e0 to the peak of C, this is also obvious because we augment
by γ = uf (e0 ) which is smaller than the residual capacities on this path (by the choice of e0 ).
For the remaining edges e on C − e0 , the residual capacity uf (ẽ) is, after the augmentation, at
least γ. Thus, is if γ > 0, we are done. But if γ = 0, then e0 must be on the path from the
peak to e0 , so for the edges e on the path from the peak to the tail of e0 we had uf (ẽ) before
the augmentation (because (r, T, L, U ) was strongly feasible), so this is still the case after the
augmentation.
We will show that we never consider the same spanning tree structure twice. In each iteration,
the cost of the flow is reduced by γ|ρ|, so if γ > 0, then we P are done. Hence assume that
− +
γ = 0. If e0 =6 e1 , then e0 ∈ L ∩ δ (X) or e0 ∈ U ∩ δ (X), so v∈V (G) π(v) will get larger (and
it
P will never get smaller). Thus, we assume in addition that e0 = e1 . Then X = V (G) and
v∈V (G) π(v) remains unchanged. But then |{e ∈ L | cπ (e) < 0}| + |{e ∈ U | cπ (e) > 0}| is
strictly decreased. This shows that we can never get the same spanning tree structure twice.
Since there is only a finite number of spanning tree structures, this proves that the algorithm
will terminate after a finite number of iterations.
By Proposition 42, the output of the algorithm is optimal when the algorithm terminates. 2
65
66
5 Sizes of Solutions
Before we will describe polynomial-time algorithms for solving linear programs we have to make
sure that we can store the output and all intermediate results with numbers whose sizes are
polynomial in the input size. To this end we have to define the size of numbers. Assuming that
all numbers are given in a binary representation, we define for
Remark: In order to get a description of a fraction r of with size(r) bits, we have to write r
as pq for numbers p, q ∈ Z that are relatively prime. Therefore, in any computation, when a
fraction pq arises, we apply the Euclidean Algorithm to p and q and divide p and q by their
greatest common divisor. The Euclidean Algorithm has polynomial running time, so during
any algorithm, we can assume that any fraction r is stored by using just size(r) bits.
Proof: Both statements of obvious if the numbers r1 , . . . , rn are integers. Hence assume that
ri = pqii for non-zero numbers pi and qi that are relatively prime (i = 1, . . . , n).
n
n
n
n n n
Q Q Q P P P
(a) size ri ≤ size pi + size qi ≤ size(pi ) + size(qi ) = size(ri ).
i=1 i=1 i=1 i=1 i=1 i=1
!
n
Q n
P n
P n
P Q
(b) We have size qi ≤ size(qi ) ≤ size(ri ), and size pi qj ≤
i=1 i=1 i=1 i=1 j∈{1,...,n}\{i}
!
n n n n n
1
P Q P P P Q
size |pi | qj ≤ size(ri ). Since ri = n
Q pi qj , this proves the
i=1 j=1 i=1 i=1 qi i=1 j∈{1,...,n}\{i}
i=1
claim. 2
67
Proposition 45 For x, y ∈ Qn , we have
Proof:
(a) We have
n
X n
X n
X
size(x + y) = n + size(xi + yi ) ≤ n + 2 size(xi ) + 2 size(yi ) = 2(size(x) + size(y)) − 3n.
i=1 i=1 i=1
(b) We have
n
! n n n
!
X X X X
t
size(x y) = size xi yi ≤2 size(xi yi ) ≤ 2 size(xi ) + size(yi )
i=1 i=1 i=1 i=1
= 2(size(x) + size(y)) − 4n.
2
p
Proof: Write the entries aij of A as aij = qijij where pij and qij are relatively prime (i, j =
1, . . . , n). Let det(A) = pq where p and q are relatively prime, too.
Then |det(A)| ≤ ni=1 nj=1 (|pij | + 1) and |q| ≤ ni=1 nj=1 |qij |. Therefore,
Q Q Q Q
size(q) ≤ size(A)
Qn Qn
and |p| = |det(A)||q| ≤ i=1 j=1 (|pij | + 1)|qij | . We can conclude
n X
X n
size(p) ≤ (size(pij ) + 1 + size(qij )) = size(A).
i=1 j=1
68
Proof: By Corollary 19 the maximum of ct x over P = {x ∈ Rn | Ax ≤ b} must be attained in
a minimal face of P . Let F be a minimal face where the maximum is attained. By Proposition 22,
we can write F = {x ∈ Rn | Ãx = b̃} for some subsystem Ãx ≤ b̃ of Ax ≤ b. We can assume
that the rows of à are linearly independent. Choose B ⊆ {1, . . . , n} such that ÃB is a regular
square matrix. Then x ∈ Rn with xB = Ã−1 B b̃ and xN = 0 (with N = {1, . . . , n} \ B) is an
optimum solution of the linear program. By Cramer’s rule the entries of xB can be written
det(Ã )
as xj = det(Ã j ) where Ãj arises from ÃB by replacing the j-th column by b̃. Thus, we have
B
Proof: According to the proof of the previous proposition there is an optimum solution x
such that for each entry xj of x we have size(xj ) ≤ 4n(size(A) + size(b)). Since every positive
number smaller than 2−4n(size(A)+size(b)) has a size larger than 4n(size(A) + size(b)), this proves
the claim. 2
Assume that we want solve an equation system Ax = b. We can do this by applying the Gaussian
Elimination. This algorithm performs three kinds of operations to the matrix A:
It should be well-known (see e.g. textbooks Hougardy and Vygen [2018] or Korte and Vygen
[2018]) that with these steps O(mn(rank(A) + 1)) elementary arithmetical operations are
sufficient to transform A into an upper (right) triangular matrix. Then it is easy to check if the
equation system is feasible, and, in case that it is feasible, to compute a solution. However, in
order to show that Gaussian Elimination is a polynomial-time algorithm, we have to show that
the numbers that arise during the algorithm aren’t too big.
The intermediate matrices that occur during the algorithm are of the type
B C
, (37)
0 D
69
where B is an upper triangular matrix. Then, an elementary step of the Gaussian Elimination
consist of choosing a non-zero entry of D (called pivot element; if no such entry exists, we are
done) and to swap rows and/or columns such that this element is at position (1, 1) of D. Then
we add a multiple of the first row of D to the other rows of D such that the entry at position
(1, 1) is the only non-zero entry of the first column of D.
We want to prove that the numbers that occur during the algorithm can be encoded using a
polynomial number of bits. We can assume that we don’t need any swapping operation because
swapping columns or rows doesn’t change the numbers in the matrix.
B C
Assume that our current matrix is à = where B is a k × k-matrix. Then for each
0 D
entry dij of D we have
det(Ã1,...,k,k+i 1,...,k
1,...,k,k+j ) = dij · det(Ã1,...,k ). (38)
where Mji11,...,j
,...,it
t
denotes the submatrix of a matrix M induced by the rows i1 , . . . , it and the
columns j1 , . . . , jt . To see the correctness of (38), apply Laplace’s formula to the last row of
Ã1,...,k,k+i
1,...,k,k+j which contains dij as the only non-zero element. Since the determinant does not
change if we add the multiple of a row to another row, this leads to
det(A1,...,k,k+i
1,...,k,k+j )
dij =
det(A1,...,k
1,...,k )
By Proposition 46 and Proposition 44, this implies size(dij ) ≤ 4 size(A). Since all entries of the
matrix occur as entries of such a matrix D, this shows that the sizes of all numbers that are
considered during the Gaussian Elimination are bounded by 4size(A).
Note that we have to apply the Euclidean Algorithm to any intermediate result in order to
get small representations of the numbers. But this is not a problem because the Euclidean
Algorithm is polynomial as well.
Finally, we get the result:
In particular this result shows that the following problems can be solved with a polynomial
running time:
• Solving a system of linear equations.
• Computing the determinant of a matrix.
• Computing the rank of a matrix.
• Computing the inverse of a regular matrix.
• Checking if a set of rational vectors is linearly independent.
70
6 Ellipsoid Method
The Ellipsoid Method (proposed by Khachiyan [1979]) was the first polynomial-time algorithm
for linear programming. The algorithm solves the problem of finding a feasible solution of a
linear program. As we have seen in Section 2.4, this is sufficient to solve as well the optimization
problem.
E = {M x + s | x ∈ B n }
But (using the previous remark) this is equivalent to the statement that there is a positive
definite n × n-matrix Q and a vector s ∈ Rn such that E = {x ∈ Rn | (x − s)t Q−1 (x − s) ≤ 1}.
2
The Ellipsoid Algorithm just finds an element in an polytope or ends with the assertion
that the polytope is empty. On the other hand, it can be applied to more general sets K ⊆ Rn
71
provided that K is a compact convex set and that for any x ∈ Rn \ K we can find a half-space
containing K such that x is on the border of the half-space.
Basically, the algorithms works as follows: We always keep track of an ellipsoid containing K.
Then we check if the center c of the ellipsoid is contained in K. If this is the case, we are done.
Otherwise, we compute the intersection X of the ellipsoid and a half-space containing K such
that c is on the border of the half-space. Then, we find a new (smaller) ellipsoid containing X.
For the 1-dimensional space, the ellipsoid method contains the binary search as a special case.
However, for technical reasons, we assume in the following that the dimension of our solution
space is at least 2.
We start with a special case that is easier to handle: We assume that our given ellipsoid is
the ball B n (with radius 1 and center 0). We want to find a small ellipsoid E covering the
intersection of B n with the half-space {x ∈ Rn | x1 ≥ 0} (the gray area in Figure 7).
(0, 1)
B2
(c, 0) (1, 0)
E
(0, −1)
For symmetry reasons, we choose the center of the new smaller ellipsoid on the vector e1 at a
position c · e1 (where c is still to be determined). Our candidates for the ellipsoid are of the form
( n
)
X
E = x ∈ Rn | α2 (x1 − c)2 + β 2 x2i ≤ 1
i=2
1
where we also have to choose α and β. The matrix Q is then a diagonal matrix with entry α2
at position (1, 1) and β12 on all other diagonal positions.
To keep E small, we want e1 to lie on the border of E. This condition leads to α2 (1 − c)2 = 1
and hence
1
α2 = . (39)
(1 − c)2
72
be on the border of E. This condition leads to α2 c2 + β 2 = 1 and thus
c2 1 − 2c
β 2 = 1 − α 2 c2 = 1 − 2
= . (40)
(1 − c) (1 − c)2
p
The volume of an ellipsoid E = {x ∈ Rn | (x−s)t Q−1 (x−s) ≤ 1} is vol(E) = det(Q)×vol(B n )
(a result from measure theory, see e.g. Proposition 6.1.2 in Cohn [1980]).
p
Therefore, our goal is to choose α, β and c in such a way that det(Q) = α−1 β −(n−1) is
minimized.
(1−c)2n
Thus, we want to find a c minimizing (1−2c)n−1
.
2n 2n 2n−1
d (1−c)
We have dc (1−2c)n−1
= 2(n−1)(1−c)
(1−2c)n
− 2n(1−c)
(1−2c)n−1
which is zero if 2(n−1)(1−c)
1−2c
= 2n. This leads to
2(n − 1) − 2c(n − 1) = 2n − 4cn and c(2n − (n − 1)) = 1. Thus, we minimize the volume by
1
setting c = n+1 .
(n+1)2 n2 −1
Then, α2 = n2
and β 2 = n2
.
1 + x ≤ ex for any x ∈ R. 2
73
Lemma 52 (Half-Ellipsoid Lemma) Let E = p + {x ∈ Rn | xt Q−1 x ≤ 1} be an ellipsoid
and a ∈ Rn with at Qa = 1. Then,
2
n t t 1 0 n n −1 t −1 2 t
E ∩{x ∈ R | a x ≥ a p} ⊆ E = p+ Qa+ x ∈ R | x Q + aa x ≤ 1 .
n+1 n2 n−1
1
vol(E 0 )
Moreover, vol(E)
≤ e− 2(n+1) .
E ∩ {x ∈ Rn | at x ≥ at p}
= (p + M B n ) ∩ {x ∈ Rn | at x ≥ at p}
= p + (M B n ∩ {x ∈ Rn | at (x + p) ≥ at p})
= p + (M B n ∩ {x ∈ Rn | at x ≥ 0})
= p + M (B n ∩ M −1 {x ∈ Rn | at x ≥ 0})
= p + M (B n ∩ {x ∈ Rn | at M x ≥ 0})
= p + M (B n ∩ {x ∈ Rn | et1 x ≥ 0})
2
1 n n −1 t 2 t
⊆ p+ M e1 + M x ∈ R | x In + e1 e x ≤ 1
n+1 n2 n−1 1
2
1 n n −1 −1 t 2 t −1
= p+ M e1 + x ∈ R | (M x) In + e1 e M x ≤ 1
n+1 n2 n−1 1
2
1 n n −1 t −1 2 t
= p+ Qa + x ∈ R | x Q + aa x ≤ 1
n+1 n2 n−1
n o
We can write the ellipsoid E 0 in standard form as E 0 = p + 1
n+1
Qa + x ∈ Rn | xt Q̃−1 x ≤ 1
2
with Q̃ = n2n−1 Q − n+1
2
Qaat Qt because
n2 − 1 n2
2 −1 t 2 t t
Q + aa Q− Qaa Q
n2 n−1 n2 − 1 n+1
2 2 4
= In − aat Qt + aat Q − 2 a at Qa at Qt
n+1 n−1 n − 1 | {z }
=1
= In .
q
vol(E 0 )
Therefore, vol(E) = det( Q̃)
det(Q)
.
2 n n
n2 n2
We have det( Q̃) n 2 2
det(Q)
= det n2 −1
In − n+1
aat Qt = n2 −1
det In − n+1
aat Qt = n2 −1
(1 −
2
n+1
). To see the last equality note that the matrix aat Qt has eigenvalue 1 for the eigenvector
74
a (because at Qt a = 1) while all other eigenvalues are zero (the rank of aat Qt is 1).q Since the
determinant is the product of all eigenvalues, this implies the last equation. Hence, det( Q̃)
det(Q)
≤
2 n2 n−1
1
≤ e− 2(n+1) (see the proof of the Half-Ball Lemma for
n 2 1 n n2 2
n2 −1
(1 − n+1 ) 2 = n+1 n2 −1
details of the last steps). 2
n o 2
Remark: The ellipsoid E 0 = p+ n+1 1
Qa+ x ∈ Rn | xt Q̃−1 x ≤ 1 with Q̃ = n2n−1 Q − n+12
Qaat Qt
is called Löwner-John ellipsoid. It is in fact the smallest ellipsoid containing E ∩ {x ∈ Rn |
at x ≥ at p}.
A separation oracle for a convex set K ⊆ Rn is a black-box algorithm which, given x ∈ Rn ,
either returns an a ∈ Rn with at y > at x for all y ∈ K or asserts x ∈ K.
Observation: Given A ∈ Qm×n and b ∈ Qm , a separation oracle for {x ∈ Rn | Ax ≤ b} can be
implemented in O(mn) arithmetical operations.
Proof: As an invariant, we will prove that during the k-th iteration of the algorithm, the
set K is contained in the set pk + {x ∈ Rn | xt A−1k x ≤ 1}. For k = 0, this is true because R is
big enough. For the step from k to k + 1, we apply the Half-Ellipsoid Lemma (Lemma 52) to
t
Q = Ak and a = √ āt (this scaling leads to at Ak a = āāt A k ā
Ak ā
= 1).
ā Ak ā
75
We have vol({x ∈ Rn | xt x ≤ R2 }) ≤ vol([−R, R]n ) = 2n Rn , and in each iteration, the
1
− 2(n+1)
volume of Ek = {x ∈ Rn | xt A−1
k x ≤ 1} is reduced at least by the factor e , so we get
k
− 2(n+1)
vol(Ek ) ≤ e 2n Rn .
k
Thus, we have to find a smallest k such that e− 2(n+1) 2n Rn ≤ which is equivalent to 2(n+1)k
≥
2n Rn 1 1
ln and k ≥ 2(n + 1)(n ln(2R) + ln( )). This shows that O(n(n ln(R) + ln( ))) iterations
are sufficient. 2
We cannot compute square roots exactly, so during the algorithm, we have to work with rounded
intermediate solutions. Let pek and Ãk be the exact values and pk and Ak be the rounded values
(and the same for the corresponding ellipsoids Ẽk and Ek ). Note that pek and Ãk are based on
the rounded values pk−1 and Ak−1 .
Let δ be an upper bound on the maximum absolute rounding error for the entries in pek and
Ãk , so kpk − pek k∞ ≤ δ and kAk − Ãk k∞ ≤ δ. So δ (that will be defined later) describes the
precision of the rounding. When we round the entries in Ãk , we do it in such a way that the
matrix remains symmetric. Let Γk = Ak − Ãk and ∆k = pk − pek .
In the following, we write by kk̇ the Euclidean norm for vectors and the induced operator norm
for the matrices. When considering matrices, we often make use of the fact that the Frobenius
norm is an upper bound for the operator norm induced by the Euclidean norm.
−1
For any x ∈ K we can assume that (x − pek )t Ãk (x − pek ) ≤ 1 and we want to prove the same
for pk and Ak . To this end, we have to increase the ellipsoid slightly by scaling Ãk .
−1 −1
We have (x − pk )t A−1 t
k (x − pk ) = (x − pk ) Ãk (x − pk ) + (x − pk )t (A−1
k − Ãk )(x − pk ). We
analyze the two summands separately:
−1 −1 −1 −1
(x − pk )t Ãk (x − pk ) = (x − p˜k )t Ãk (x − p˜k ) + |2∆tk Ãk (x − p˜k )| + ∆tk Ãk ∆k
−1 −1
≤ 1 + 2k∆k k · kÃk k (R + kp˜k k) + k∆k k2 · kÃk k (41)
√ −1 −1
≤ 1 + 2 nδ kÃk k (R + kp˜k k) + nδ 2 kÃk k.
And:
−1 −1
(x − pk )t (A−1
k − Ãk )(x − pk ) ≤ kx − pk k2 · kA−1
k − Ãk k
−1 −1
≤ (R + kpk k)2 kAk (Ak − Ãk )Ãk k
−1 (42)
≤ (R + kpk k)2 kA−1
k k · kÃk k · kΓk k
2 −1 −1
≤ (R + kpk k) kAk k · kÃk k · nδ
1
We adjust Ãk by multiplying it by µ = 1 + 2n(n+1)
, so we replace Ãk by µÃk (which we call Ãk
again). Then
−1 1 2n(n + 1) 1
(x − p̃k )t Ãk (x − p̃k ) = 1 = < 1 − . (43)
1+ 2n(n+1)
2n2 + 2n + 1 4n2
76
and (E
]k+1 also refers to the scaled version of Ãk ):
n2
vol(E
] k+1 ) 1
− 2(n+1) 1 1 1 1
≤e 1+ ≤ e− 2(n+1) e 4(n+1) = e− 4(n+1) . (44)
vol(Ek ) 2n(n + 1)
Thus,
q
vol(Ek+1 ) vol(E
]k+1 ) vol(Ek+1 ) 1 −1
= ≤ e− 4(n+1) det(Ak+1 A
]k+1 ) (45)
vol(Ek ) vol(Ek ) vol(E]k+1 )
We have
−1 −1
det(Ak+1 A
] k+1 ) = det In + (Ak+1 − A
] k+1 )A]k+1
(∗) −1
n
≤ kIn + (Ak+1 − A ] k+1 )Ak+1 k
]
−1
n
≤ (1 + kΓk+1 k · kA
] k+1 k)
−1
n
≤ (1 + nδkA
]k+1 k)
−1
2 δkA k
≤ en
^ k+1 ,
Qn
where inequality (∗) follows from Hadamard’s inequality (| det(A)| ≤ i=1 kai k for an n × n-
matrix with columns a1 , . . . , an , see exercises).
This implies
vol(Ek+1 ) 1 1 2
−1
≤ e− 4(n+1) · e 2 n δkAk+1 k .
^
vol(Ek )
−1 1
Hence, if we had 12 δkA
] k+1 k <
1
8(n+1)3
, then we had vol(Ek+1 )
vol(Ek )
< e− 8(n+1) .
Therefore, and by equations (41) and (42) our goal is to choose δ such that we get the following
inequalities:
√ fk −1 k (R + kp˜k k) + nδ 2 kA
fk −1 k + (R + kpk k)2 kA−1 k · kA
fk −1 knδ ≤ 1
• 2 nδ kA k 4n2
−1
1
• δkA
]k+1 k ≤ 4(n+1)3
1
Proposition 54 Assume that δ is chosen such that δ ≤ 12n4k
in iteration k of the
Ellipsoid Method. Then, we have:
(c) kAk k ≤ R2 2k , kA
fk k ≤ R2 2k .
77
Proof: We have
n2 − 1 1 āāt
−1 2
A
]k+1 = A−1
k + .
n2 µ n − 1 āt Ak ā
−1
Thus, as a sum of a positive definite matrix and a positive semidefinite matrix A
]k+1 is positive
n2 2 t
definite. Therefore Ak+1 = n2 −1 µ(Ak − n+1 bk bk ) is positive definite.
]
Thus,
n2 − 1 1 āāt
−1 2
kA
]k+1 k≤ kA−1
k k+ k t k ≤ 3kAk −1 k
n2 µ n − 1 ā Ak ā
Let λ be a smallest eigenvalue of Ak+1 and v a vector with kvk = 1 such that λ = v t Ak+1 v.
Then:
v t Ak+1 v ≥ v t A
]k+1 v − nδ
≥ min{ut A ] n
k+1 u | u ∈ R , kuk = 1} − nδ
1
≥ −1 − nδ
kAk+1 k
]
1
≥ − nδ
3kAk −1 k
1 1
≥ − nδ
3 R−2 4k
1
≥ ,
R−2 4k+1
provided that:
R2
1 1
nδ ≤ − . (46)
3 4 4k
n2
kAk+1 k ≤ kA
] k+1 k + kΓk+1 k ≤ 2−1
µ kAk k + nδ ≤ R2 2k+1
n
| {z }
≤ 32
n2
We also get kA
]k+1 k ≤ n2 −1
µkAk k ≤ R2 2k+1 , so we have proved (c).
78
We can write Ak = M M t with a regular matrix M . Then,
r s
kAk āk t
ā Ak Ak ā (M t ā)t Ak (M t ā) p k
kbk k = √ t = t
= t t t
≤ kAk k ≤ R2 2 , (47)
ā Ak ā ā Ak ā (M ā )(M ā)
where the first inequality follows from the fact that kAk k = max{xT Ak x | kxk = 1} because Ak
is positive semidefinite (see exercises).
Therefore, we get by induction (using the fact that p0 = 0)
1 √ k √ k 1
kpk+1 k ≤ kpk k + kbk k + nδ ≤ kpk k + R2 2 + nδ ≤ R2k + R2 2 + √ k ≤ R2k+1 .
n+1 3 n4
−1
Lemma 55 Let δ be positive with δ < 26(N (R,)+1) 16n3 where N (R, ) :=
d8(n + 1)(n ln(2R) + ln( 1 ))e. Then, in iteration k of the Ellipsoid Algorithm, we
k
have K ⊆ pk + Ek and vol(Ek ) < e− 8(n+1) 2n Rn .
1 1
R2
Proof: By the choice of δ, we have nδ ≤ 3
− 4 4k
.
Moreover,
√ fk −1 k (R + kp˜k k) + nδ 2 kA
fk −1 k +(R + kpk k)2 kA−1 k · kA
fk −1 k nδ ≤ δn26k ≤ 1
• 2 nδ kA k 4n2
| {z } |{z} | {z } |{z} | {z } | {z }
≤R−2 4k ≤R2k ≤R−2 4k ≤R2k ≤R−2 4k ≤R−2 4k
79
−1
1
• δ kA
] k≤
| k+1
{z } 4(n+1)3
≤R−2 4k
Hence, by the above analysis, Ek (with rounded numbers) always contains the set K, and
1
is reduced at least by a factor of e− 8(n+1) in each iteration, so after
the volume of Ek
O n n ln R + ln 1 iterations, the algorithm terminates with a correct output. 2
There number of calls of the separation oracle can be reduced to O(n ln( nR
)) (see Lee, Sidford,
and Wong [2015] for an algorithm that only needs O(n ln( )) oracle calls and O(n3 lnO(1) ( nR
nR
))
additional time).
We first want to use the Ellipsoid Algorithm just to check if a given polyhedron P is empty.
This can be done directly, provided that P is in fact a polytope and if we have the assertion
that if P is non-empty, its volume cannot be arbitrarily small. The following proposition implies
that we can assume these properties:
(a) P = ∅ ⇔ PR, = ∅.
2
n
(b) If P 6= ∅, then vol(PR, ) ≥ n2size(A)
.
Proof:
80
y t A = 0 and y t b = −1. Then, by Proposition 47
min 11t y
At y = 0
bt y = −1
y ≥ 0
has an optimum solution y such that the absolute value of any entry of y is at most
24nsize(A)+size(b) . Thus, y t (b + 11) < −1 + (n + 1)24n(size(A)+size(b)) < 0. Again by Farkas’
Lemma, this implies that Ax ≤ b + 11 does not have a feasible solution. In particular,
there is no feasible solution in [−R, R]n , so PR, = ∅.
(b) If P 6= ∅, then PR−1,0 6= ∅ (with the same proof as in (a) for R). But for any z ∈ PR−1,0 , we
have {x ∈ Rn | ||x − z||∞ < n2size(A) } ⊆ PR, . Hence vol(PR, ) ≥ vol{x ∈ Rn | ||x − z||∞ <
n
2
2
n2size(A)
} = n2size(A) .
√
Proof: We can apply the Ellipsoid Algorithm to K = PR, with R = d n(1+24n(size(A)+size(b)) )e
n −1
and 0 = n2size(A)
2
(for = 2n24n(size(A)+size(b)) ) as a lower bound for the volume. We need
N (R, 0 ) = O(n(n ln(R) + ln( 10 ))) iterations, which is polynomial in the input size.
Moreover, it is sufficient to set the bound on the absolute rounding error to any value δ <
0 −1
26(N (R, )+1) 16n3 , so also the number of bits that we have to compute during the algorithm
is polynomial. 2
Proof: By Theorem 58, we can check in polynomial time if a given linear program has a
feasible solution. We will show that this is sufficient for computing a feasible solution if one exists.
Assume that we are given m inequalities ati x ≤ bi with ai ∈ Qn and bi ∈ Q (i ∈ {1, . . . , m}). First
check if the system is feasible. If it infeasible, we are done. Otherwise, perform for i = 1, . . . , m
the following steps: Check if the system remains feasible if we replace ati x ≤ bi by ati x = bi . If
this is the case, replace ati x ≤ bi by ati x = bi . Otherwise, the inequality is redundant, and we
can skip it. We end up with a feasible system of equations with the property that any solution
of this system of equations is also a solution of the given system of inequalities. However, the
system of equations can be solved in polynomial time by using Gaussian Elimination (see
81
Section 5.1). Hence, for any linear program, we can compute in polynomial-time a feasible
solution if one exists.
In Section 2.4 we have seen that the task of computing an optimum solution for a bounded
feasible linear program can be reduced to the computation of a feasible solution of a modified
linear program (see the LP (24)). Thus, we can also compute an optimum solution. 2
Remark: By Proposition 22, the method described in the previous proof computes a solution
in a minimal face of the solution polyhedron P . In particular, if P is pointed, we compute a
vertex of P .
An advantage of the Ellipsoid Algorithm is that it does not necessarily need a complete
description of a solution space K ⊆ Rn but only needs a separation oracle that provides a linear
inequality satisfied by all elements of K but not by a given vector x ∈ Rn \ K. This allows us
to use the method e.g. for linear program with an exponential number of constraints.
Example: Consider the Maximum-Matching Problem. A matching in an undirected
graph is a set M ⊆ E(G) such that |δG (v) ∪ M | ≤ 1 for all v ∈ V (G). In the Maximum-
Matching Problem we are given an undirected graph G and ask for a matching with
maximum cardinality. It can be formulated as the following integer linear program:
P
max x
P e∈E(G) e
e∈δG (v) xe ≤ 1 v ∈ V (G)
xe ∈ {0, 1} e ∈ E(G)
In the LP-relaxation, we simply replace the constraint “xe ∈ {0, 1}” by “xe ≥ 0”. However, this
allows us e.g. in the graph K3 (i.e. the complete graph on three vertices) to set all values xe to
1
2
. To avoid such solutions, we may add the following constraints:
P |U |−1
e∈E(G[U ]) xe ≤ 2
U ⊆ V (G), |U | odd
are indeed the convex combinations of the solutions of the ILP formulation. In other words, the
vertices of the solution polyhedron of the LP are the integer solutions. We won’t prove this
statement here, see Edmonds [1965] for a proof. Hence, solving the linear program would be
sufficient to solve the matching problem. The number of constraints is exponential in the size
of the graph, but the good news is that there is a separation oracle with polynomial running
82
time for this linear program (see Padberg and Rao [1982]). We will see how such a separation
oracle can be used for solving the optimization problem.
In the remainder of this chapter, we always consider closed convex sets K for which numbers r
and R with 0 < r < R2 exist such that rB n ⊆ K ⊆ RB n . We call sets for which such numbers r
and R exist, r-R-sandwiched sets.
We will consider relaxed versions both of linear optimization problems and of separation
problems. In the weak optimization problem we are given a set K ⊆ Rn , a number > 0
and a vector c ∈ Qn . The task is to find an x ∈ K with ct x ≥ max{ct z | z ∈ K} − .
In order to apply the Ellipsoid Algorithm directly to an optimization problem, we need the
property that the set of almost optimum solutions cannot have an arbitrarily small volume.
The following lemma guarantees this for r-R-sandwiched sets:
n−1
1
rn−1 vol(Bn−1 )
2 ct z
n−1 1
0 n−1
vol(conv(A ∪ {z})) ≥ r vol(Bn−1 )
2ct z 2kck n
n−1
1 1
≥ rn−1 n .
2kckR| n 2kck n
Here we use the fact that conv(A0 ∪ {z}) is an n-dimensional pyramid with height at least
2kck
n−1 n−1
and a base of ((n − 1)-dimensional) volume 2ct z r vol(Bn−1 ). 2
This result allows us to find a polynomial-time algorithm for the weak optimization problem
provided that we can solve the corresponding separation problem efficiently:
83
Proposition 61 Given a polynomial-time separation oracle for an r-R-sandwiched
convex set K ⊆ Rn with running time polynomial in size(R), size(r) and size(x)
(where x is the input vector for the oracle), a number > 0 and a vector c, there is a
polynomial-time algorithm (w.r.t. size(R), size(r), size(c) and size()) that computes a
vector v ∈ K with ct v ≥ sup{ct x | x ∈ K} − .
Proof: Apply the Ellipsoid Algorithm to find an almost optimum vector in K. Use the
previous lemma that shows that the set of almost optimum vectors in K cannot be arbitrarily
small. 2
A weak separation oracle for a convex set K ⊆ Rn is an algorithm which, given x ∈ Rn
and η with 0 < η < 21 , either asserts x ∈ K or finds v ∈ Rn with v t z ≤ 1 for all z ∈ K and
v t x ≥ 1 − η.
Remark: For the previous proposition, it would be enough to have a weak separation oracle
for K.
Notation: For K ⊆ Rn , we define K ∗ := {y ∈ Rn | y t x ≤ 1 for all x ∈ K}.
Proof: Claim: K ∗∗ = K
Proof of the claim: For x ∈ K, we have y t x ≤ 1 for all y ∈ K ∗ which implies x ∈ K ∗∗ . Therefore,
we have K ⊆ K ∗∗ .
Now let z ∈ Rn \ K. And let w ∈ K be a vector such that kz − wk2 is smallest possible over
vectors in K (w exists because K is convex and closed). Let u = z − w. Then, for all x ∈ K, we
have ut x ≤ ut w < ut z. Moreover, since 0 ∈ K, we have ut w ≥ 0. By scaling u, we can assume
that ut z > 1 while ut x ≤ 1 for all x ∈ K. But then u ∈ K ∗ and ut z > 1 which implies z 6∈ K ∗∗ .
Thus K ∗∗ ⊆ K. This prove the claim.
Now, let x ∈ Rn be an instance for the weak separation oracle. If x = 0, we can assert x ∈ K,
x
and if kxk > R we can choose v = kxk 2 . Therefore, we can assume that 0 < kxk ≤ R.
We can solve the (strong) separation problem for K ∗ (see the exercises). Since K ∗ is a closed
convex R1 - 1r -sandwiched set, we can apply the previous observation to it, and thus, we can
η
solve the weak optimization problem for K ∗ with c = kxk x
2 and = R in polynomial time.
xt t η xt η
Thus, we get a vector v0 ∈ K ∗ with v
kxk 0
x
≥ max{ kxk v | v ∈ K ∗} − R
. If v
kxk 0
≥ 1
kxk
− R
,
then v0t x ≥ 1 − η kxk
R
≥ 1 − η, and v0 z ≤ 1 for all z ∈ K (since v0 ∈ K ). Otherwise ∗
84
t
x
max{ kxk v | v ∈ K ∗ } ≤ kxk
1
, so max{xt v | v ∈ K ∗ } ≤ 1, which implies x ∈ K ∗∗ . Together with
the above claim, this implies x ∈ K. Therefore, we have a weak separation oracle for K in
polynomial running time. 2
It turns out that for rational r-R-sandwiched polyhedra P an exact polynomial-time separation
algorithm also provides an exact polynomial-time optimization algorithm, and vice versa,
provided that appropriate bounds on the sizes of the vertices of P are given (see textbooks, e.g.
Theorems 4.21 and Theorem 4.23 of Korte and Vygen [2018]).
85
86
7 Interior Point Methods
Note that this section was not covered by the lecture course given in summer term 2019.
The Ellipsoid Algorithm gives a polynomial-time algorithm for solving linear programs but
in practice it is typically much less efficient than the Simplex Algorithm. In contrast, the
algorithm that we will describe in this section is efficient both in theory and practice.
The term “interior point method” refers to several quite different algorithms. They all have in
common that during the algorithm we always consider vectors in the interior of the polyhedron
of feasible solutions (in contrast to the Simplex Algorithm where we always have vectors
on the border of the polyhedron). Here, we restrict ourselves to one variant and follow the
description by Mehlhorn and Saxena [2015]. The first version of the algorithm has been proposed
by Karmakar [1984].
We consider an LP max{ct x | Ax ≤ b} in standard inequality form.
To simplify the notation, we write the slack variables s explicitly, so we consider the following
problem:
max ct x
s.t. Ax + s = b (48)
s ≥ 0
87
(dual) value of the dual solution y and the (primal) value of the primal solution x, s because
bt y − ct x = xt At y + st y − ct x = xt c + st y − ct x = st y.
The system (50) has a solution only if both the primal and the dual linear program are feasible
and bounded, so for the moment we assume that this is the case. In Section 7.1, we will see
what to do to enforce these properties.
In the interior point methods, one generally considers vectors in the interior of the solution
space. In the system (50), the only inequalities are y ≥ 0 and s ≥ 0, so during the algorithm,
we always have solutions x, s, y with y > 0 and s > 0. We will replace the condition y t s = 0 by
2
the condition σ 2 := m yi s i
− 1 ≤ 14 for some number µ > 0. During the iterations of the
P
i=1 µ
algorithm, we will decrease µ more and more towards 0.
To summarize, during the algorithm, we have a number µ > 0 and vectors x, s, y meeting the
following invariants
Ax + s = b
At y = c
Pm yi si 2
1 (51)
i=1 µ
−1 ≤ 4
y > 0
s > 0
(II) Reduce µ by a constant factor and adapt x, y and s to this new value of µ such that we
again get a solution of (51). Iterate this step until µ is small enough (Section 7.2).
We will show how we can modify (51) to an equivalent problem that can be solved easily,
provided that we are allowed to choose µ. This modification will in particular make both the
primal and the dual LP feasible. This is equivalent to the statement that one of them is feasible
and bounded. We will show how to modify the dual LP (49) such that the modified version is
feasible and bounded.
In a first step, we make the LP (49) bounded (in such a way that we do not change the
problem if the given LP was bounded). By Theorem 47, we know that if (49) is feasible and
bounded, then there is a W with W ∈ 2Θ(m(size(A)+size(c))) such that there is an optimum solution
y = (y1 , . . . , ym ) ≥ 0 with yi ≤ W (i = 1, . . . , m). So in this case there is a vector y ≥ 0 with
11t y ≤ mW and At y = c. Equivalently (after dividing everything by W ), we can ask for a vector
y ≥ 0 with 11t y ≤ m and At y = W1 c. By relaxing the constraint 11t y ≤ m to 11t y ≤ m + 1 and
88
by adding a slack variable ym+1 ≥ 0 this leads to the following LP which is equivalent to (49)
provided that (49) is bounded:
min bt y
s.t. At y = W1 c
11t y + ym+1 = m+1 (52)
y ≥ 0
ym+1 ≥ 0
In a second step, we will make the LP feasible. To this end, we add a new variable ym+2
such that setting all variables to 1 will get us a feasible solution. Let H be a constant (to be
determined later). Then, we state the following LP:
min bt y + Hy
m+2
1
t
s.t. A y + W
c − A 11 ym+2 = W1 c
t
t
11 y + ym+1 + ym+2 = m + 2
(53)
y ≥ 0
ym+1 ≥ 0
ym+2 ≥ 0
The goal is to choose H that big that if this LP has a feasible solution with ym+2 = 0 at all,
then in any optimum solution ym+2 = 0 will hold. In fact, by Corollary 48 we know that there
is a constant l such that if there is an optimum solution of (53) with ym+2 > 0, then there is an
optimum solution with ym+2 ≥ 2−4ml(size(A)+size(c)+size(W )) . On the other hand, bt y ≤ kbk1 (m + 2)
in any feasible solution of (53), so if we set H = (kbk1 (m + 2) + 1)24ml(size(A)+size(c)+size(W )) , then
we enforce that ym+2 = 0 in any optimum solution (if a solution with ym+2 = 0 exists).
The linear program (53) is obviously feasible and bounded. In addition, we can use an optimum
solution of it, to check if the initial dual LP was feasible and bounded, and if this is the case,
we can find an optimum solution of it: Let y1 , . . . , ym+2 be an optimum solution of (53). If
ym+2 > 0, then we know that (52) has no feasible solution (otherwise there was a feasible
solution of (53) with ym+2 = 0 which is cheaper). Thus, the LP (49) has no feasible solution
either. On the other hand, if ym+2 = 0, then the initial dual LP must be feasible. Assume that
this is the case, then we still have to check if the initial dual LP was bounded. If ym+1 > 0, the
initial dual program must be bounded. If ym+1 = 0, then the initial dual LP can be bounded or
unbounded. To decide if it is bounded, we can replace c by the all-zero vector and first solve
this new problem. Then, by Farkas’ Lemma, the LP (49) is bounded if and only if the value of
an optimum solution of the new problem is non-negative.
If we dualize the LP (53), we get the following LP (with variables x ∈ Rn , s ∈ Rm and additional
89
variables xn+1 , sm+1 , and sm+2 ):
max W1 ct x + (m + 2)xn+1
Ax + xn+1 11 + s = b
1 t t
W
c − 11 A x + xn+1 + sm+2 = H
xn+1 + sm+1 = 0 (54)
s ≥ 0
sm+1 ≥ 0
sm+2 ≥ 0
Instead of the primal-dual pair (48) and (49), we will consider the pair (53) and (54). Due to
the modification, both LPs are feasible and bounded.
For the new pair
2 of LPs we can easily find feasible solutions and a number µ such that
Pm+2 yi si
i=1 µ
− 1 ≤ 41 : We set y1 = y2 = · · · = ym = ym+1 = ym+2 = 1 which is obviously
µ
feasible for (53). For (54), we set x1 = x2 = · · · = xn = 0. Moreover, we choose sm+1 = ym+1 =µ
(where µ itself is still to be determined). This leads to xn+1 = −µ, sm+2 = H + µ, and
si = bi − xn+1 = bi + µ (i = 1, . . . m).
As a consequence of this choice, we get:
y i si bi
−1 = i = 1, . . . , m
µ µ
ym+1 sm+1
−1 = 0
µ
ym+2 sm+2 H
−1 =
µ µ
Therefore, !
m+2
X 2 m
yi si 1 X
σ2 = −1 = 2 H2 + b2i .
i=1
µ µ i=1
p Pm 2 2 1
Hence, by choosing µ = 2 H 2 + i=1 bi , we enforce σ ≤ 4
. Moreover, since µ > |bi |, we have
si = bi + µ > 0 for i ∈ {1, . . . , m}.
So what did we get so far? We have replaced the primal-dual pair (48) and (49) by the pair (53)
and (54) such that optimum solutions of these modified problems directly lead to a solution of
the original problem. Moreover, the new primal-dual pair consists of two feasible and bounded
problems.
We will write (53) as
min b̃t y
s.t. Ãt y = c̃ (55)
y ≥ 0
90
and (54) as
max c̃t x
s.t. Ãx + s = b̃ (56)
s ≥ 0
Ãx + s = b̃
Ãt y = c̃
Pm+2 yi si 2
i=1 µ
− 1 ≤ 14 (57)
y > 0
s > 0
In this section, we will describe a solution for the following problem: Given a solution
µ(k) , x(k) , y (k) , s(k) of (57) we want to compute a new solution µ(k+1) , x(k+1) , y (k+1) , s(k+1) of
(57) where µ(k+1) = (1 − δ)µ(k) for some δ that does not depend on the solution (to be
determined later).
In a first version, we describe the step without considering the sizes of the numbers that occur
during the computation. Afterwards, we will show how we can round intermediate solutions in
such a way that the numbers can be written with a polynomial number of bits.
We write x(k+1) = x(k) + f , y (k+1) = y (k) + g, and s(k+1) = s(k) + h. Think of the entries of f , g
and h as relatively small values. Assuming that µ(k+1) is fixed, we describe how to compute
appropriate values for f , g and h. The first two conditions of (57) lead to Ãf + h = 0 and
(k) (k)
Ãt g = 0. In addition we want to choose f and h such that (yi + gi )(si + hi ) is close to µ(k+1)
(k) (k) (k) (k) (k) (k)
(i = 1 . . . , m + 2). Since (yi + gi )(si + hi ) = yi si + gi si + yi hi + gi hi and the product gi hi
(k) (k) (k) (k)
is small (provided that gi and hi are small) we simply demand yi si + gi si + yi hi = µ(k+1)
(i = 1 . . . , m + 2). Hence, we want to compute f , g and h such that
Ãt g = 0
Ãf + h = 0 (58)
(k) (k) (k) (k)
si gi + yi hi = µ(k+1) − yi si i = 1, . . . , m + 2
Note that y (k) and s(k) are constant in this context. In this formulation, we skipped the
constraints that y (k+1) > 0 and s(k+1) > 0. We will see what we can do to get positive values,
anyway.
91
Let f , g and h be a solution of (58). By construction, we have
g t h = −g t Ãf = 0t f = 0. (60)
This implies
t
b̃t y (k+1) − c̃t x(k+1) = Ã(x(k) + f ) + (s(k) + h) (y (k) + g) − c̃t (x(k) + f )
t
= Ã(x(k) + f ) (y (k) + g) + (m + 2)µ(k+1) − c̃t (x(k) + f ) (61)
t
= x(k) + f Ãt y (k) + (m + 2)µ(k+1) − c̃t (x(k) + f )
= (m + 2)µ(k+1)
(k)
Proof: Let S be an (m + 2) × (m + 2)-diagonal matrix with si as entry at position (i, i)
(k)
and Y be an (m + 2) × (m + 2)-diagonal matrix with yi as entry at position (i, i).
Then, the last condition of (58) is equivalent to
which is equivalent to
g + S −1 Y h = S −1 µ(k+1) 11m+2 − y (k) .
This implies
Ãt g + Ãt S −1 Y h = Ãt S −1 µ(k+1) 11m+2 − Ãt y (k) , (62)
and hence
Ãt S −1 Y h = Ãt S −1 µ(k+1) 11m+2 − c̃. (63)
With h = −Ãf this leads to
However, the matrix Ãt S −1 Y Ã is invertible, so f = (Ãt S −1 Y Ã)−1 (c̃ − Ãt S −1 µ(k+1) 11m+2 ) is the
unique solution of this last inequality. In particular, if (58) has a solution, this is the only
choice for f . By setting h = −Ãf , we fulfill the second constraint of (58). Finally, we set
g = S −1 µ(k+1) 11m+2 − y (k) − S −1 Y h (again the only choice) satisfying the third constraint of
(58).
Since we have chosen g and h such that (62) and (63) are met, we also have Ãt g = 0, so the
solution satisfies the first condition of (58). 2
92
In the above proof we have to solve an equation system −Ãt S −1 Y Ãf = Ãt S −1 µ(k+1) 11m+2 − c̃
in order to compute f . This equation system depends on the previous solutions s(k) and y (k) , so
here the sizes of the numbers to store the intermediate solutions could get too big. At the end
of this section, we will describe how to handle such issues.
s 2 s 2 sm+2
m+2 (k) (k) m+2 (k+1) (k+1) P gi hi 2
yi s i y s
We have σ (k) = − 1 and σ (k+1) =
P P
µ(k)
i i
µ(k+1)
− 1 = µ(k+1)
.
i=1 i=1 i=1
It remains to show that y (k+1) > 0 and s(k+1) > 0 and σ (k+1) ≤ 21 .
We first show that for an appropriate choice of µ(k+1) we get σ (k+1) ≤ 21 .
µ(k) 1
Lemma 64 (a) For i = 1, . . . , m + 2 we have (k) (k) ≤ 1−σ (k)
.
yi s i
√
m+2
(k) (k)
P
1 − yi s i
≤ σ (k) m + 2.
(b) µ(k)
i=1
Proof:
2 2
(k) (k) (k) (k)
Pm+2 yi s i yi s i
(a) We have (σ (k) )2 = i=1 µ(k)
− 1 , so µ(k)
−1 ≤ (σ (k) )2 which implies
(k) (k) (k) (k)
1 − yi (k)
si yi s i
(k)
µ ≤ σ and µ(k)
≥ 1 − σ (k) for i = 1, . . . , m + 2. This proves the claim.
(b) The statement is simply a special case of the Cauchy-Schwarz inequality that can be
proved as follows:
(k) (k) 2
!
m+2
y s
X
(σ (k) )2 (m + 2) − 1 − i (k)i
i=1
µ
!2
(k) (k) 2
m+2
m+2
(k) (k)
X yi si X yi si
= (m + 2) 1 − (k)
− 1 − (k)
µ µ
i=1 i=1
2
(k) (k)
m+2 (k) (k) m+2 m+2 (k) (k)
X y s X X y s yj s j
= (m + 1) 1 − i (k)i − 2 1 − i (k)i · 1 −
µ µ µ (k)
i=1 i=1 j=i+1
m+2
(k) (k)
!2
X m+2 X (k) (k)
yi si yj sj
= 1 − − 1 −
µ(k) µ(k)
i=1 j=i+1
≥ 0
93
Lemma 65 If δ = √1 (i.e. µ(k+1) = (1 − √1 )µ(k) ) then σ (k+1) < 21 .
8 m+2 8 m+2
r r
(k) (k)
si yi
Proof: Let Gi := gi (k) and Hi := hi (k) (for i ∈ {1, . . . , m + 2}).
yi µ(k+1) si µ(k+1)
v v
um+2 um+2
u X gi hi 2 uX
σ (k+1) = t
(k+1)
= t (Gi Hi )2
i=1
µ i=1
v !
u 1 m+2 m+2
u
X X
= t (G2 + Hi2 )2 − (G2i − Hi2 )2
4 i=1 i i=1
v
um+2 m+2
1u X 1X 2
≤ t 2
(G + Hi )2 2
≤ (G + Hi2 )
2 i=1 i 2 i=1 i
m+2 m+2
g t h=0 1X 1X 1 (k) (k) 2
= (Gi + Hi )2 = g i s i + hi y i
2 i=1 2 i=1 yi(k) s(k)
i µ
(k+1) | {z }
(k) (k)
=µ(k+1) −yi si
m+2 2 !2
(k) (k)
1X µ(k) µ(k+1) yi si
= −
2 i=1 yi(k) s(k)
i µ
(k+1) µ(k) µ(k)
m+2
!2
(k) (k)
1 X µ(k) 1 y s
= (k) (k)
−δ + 1 − i (k)i
2 i=1 yi si 1 − δ µ
| {z }
1
≤
1−σ (k)
(k) (k) 2
m+2
! m+2
! !
(k) (k)
1 X y s yi si
X
≤ (m + 2)δ 2
− 2δ 1 − i (k)i
+ 1−
2(1 − δ)(1 − σ (k) ) i=1
µ i=1
µ(k)
(k) (k) 2
! !
m+2 (k) (k) m+2
1 X y s X y s
≤ (m + 2)δ 2 + 2δ 1 − i (k)i + 1 − i (k)i
2(1 − δ)(1 − σ (k) ) µ µ
|i=1 {z
√
} i=1
| {z }
2
≤σ (k) m+2 =(σ )
(k)
1 √ 2
≤ (k)
(m + 2)δ 2 + 2δσ (k) m + 2 + σ (k)
2(1 − δ)(1 − σ )
1 √ 2
(k)
= m + 2δ + σ
2(1 − δ)(1 − σ (k) )
σ (k) ≤ 12
2 √ 2
√
1 1 8 m+2 1 1
≤ m + 2δ + = √ +
1−δ 2 8 m+2−1 8 2
1
≤ .
2
2
94
Lemma 66 We have y (k+1) > 0 and s(k+1) > 0.
(k+1) (k+1)
Proof: Claim: We have yi si > 0 for i = 1, . . . , m + 2.
Proof of the Claim:
(k+1) (k+1)
Assume that yj sj ≤ 0 for a j ∈ {1, . . . , m + 2}. Then,
m+2
!2 (k+1) (k+1)
!2
(k+1) (k+1)
(k+1) 2
X yi si yj sj
σ = (k+1)
−1 ≥ (k+1)
−1 ≥ 1,
i=1
µ µ
(k) (k)
which is a contradiction to the fact that si , yi , and µ(k+1) are positive.
2
95
7.3 Finding an Optimum Solution
Pm+2 yi si 2
Proof: By the condition i=1 µ
− 1 ≤ 41 , we get
µ 3µ
≤ y i si ≤ < 2µ
2 2
for all i ∈ {1, . . . , m + 2}. Moreover, st y = m+2
P
i=1 yi si ≤ 2(m + 2)µ.
(a) Since y ∗ is an optimum and y a feasible solution of the dual LP, we have b̃t y ≥ b̃t y ∗ and
thus
st y = b̃t y − xt Ãt y = b̃t y − c̃t x ≥ b̃t y ∗ − c̃t x = b̃t y ∗ − xt Ãt y ∗ = st y ∗ .
η
Let i ∈ {1, . . . , m + 2} with yi < 4(m+2)
. We have
µ 2(m + 2)µ st y
si ≥ > ≥ .
2yi η η
Assume that yi∗ > 0, so yi∗ ≥ η. This implies
st y
st y ∗ ≥ si yi∗ > · η = st y ≥ st y ∗ ,
η
which is a contradiction. Therefore, yi∗ = 0.
(b) The case is very similar to part (a): Since x∗ , s∗ is an optimum and x, s a feasible solution
of the primal LP, we have c̃t x ≤ c̃t x∗ and thus
µ 2(m + 2)µ st y
yi ≥ > ≥ .
2si η η
96
Assume that s∗i > 0, so s∗i ≥ η. This implies
st y
y t s∗ ≥ s∗i yi > η · = st y ≥ y t s∗ ,
η
There are several ways to find an optimum solution. Before we describe a method to round
an interior point directly to an optimum solution, we will present a simpler but less efficient
η2
method: We choose k big enough such that µ(k) < 32(m+2) 2 . Then, for each i ∈ {1, . . . , m + 2},
(k) η (k) η
we have yi < 4(m+2)
or si <4(m+2)
. Let Āt y = c̄ be the subsystem of Ãt y = c̃ consisting of
(k) η
the rows with indices i for which si
< 4(m+2) , so s∗i = 0. For all other rows, we know that
yi∗ = 0, so we can ignore them when computing an optimum solution for the dual LP. If Āt y = c̄
has only one solution, we compute it and get an optimum solution of the modified dual LP
(k) η
(53) (provided that the result is non-negative). Otherwise, we check if yi0 < 4(m+2) for some
i0 ∈ {1, . . . , m}. In this case we know that if the initial dual LP has an optimal solution, then
there is one with yi0 = 0. Hence we can start the whole process again but now without the
variable yi0 , without the row of A with index i0 and without the entry of b with index i0 . Hence
we have reduced the instance size, so this method will terminate after at most m iterations.
What can we do if there is no i ∈ {1, . . . , m} with y_i^{(k)} < η/(4(m+2))? To handle this case, we first make sure that the system Ãx = b̃ does not have a feasible solution. If it has a feasible solution (which can be checked by Gaussian Elimination), we modify b̃ slightly to a vector b^* such that Ãx = b^* has no feasible solution. To this end choose n linearly independent rows of A. These rows will define the solution of Ãx = b̃. Then, any modification of b outside these rows will make the system Ãx = b̃ infeasible. We simply add an ε > 0 to one of these entries of b. If ε is small enough, then an optimum solution of the dual LP with respect to b^* will still be an optimum solution of the original dual LP. To see that we can write ε with a polynomial number of bits, observe that the absolute value of the difference between the costs of two basic solutions of an LP is either 0 or can be bounded from below by some value 2^{−L} where L is polynomial in the input size. This follows from the fact that any basic solution can be written with a polynomial number of bits. Thus, the same is true for any difference u of two basic solutions and for the scalar product b̃^t u. Hence, b̃^t u is either zero or its absolute value is at least 2^{−L}. This implies that we can choose ε in such a way that it can be written with polynomially many bits and that no suboptimal solution can become optimal by the modification.
Now assume that the initial dual LP is bounded and feasible. Then, we can compute optimum solutions x^*, y^*, s^* of the modified LPs (53) and (54) by expanding optimum solutions of the initial primal and dual problems in a canonical way. In particular, we will set x_{n+1} to 0. Then Ax^* + s^* = b^* but Ax = b^* has no feasible solution. Hence, there must be an i_0 ∈ {1, . . . , m} with s_{i_0}^* > 0, so y_{i_0}^{(k)} < η/(4(m+2)) and y_{i_0}^* = 0. Again, we get rid of at least one dual variable and can restart the whole procedure on a smaller instance.
Now, we describe how we can avoid iterating the whole process:
Consider again the two problems (55) and (56). Theorem 13 implies that we can partition the index set {1, . . . , m + 2} of the dual variables into {1, . . . , m + 2} = B ∪̇ N such that for i ∈ B there is an optimum dual solution y^* with y_i^* > 0 and for i ∈ N there is an optimum primal solution x^*, s^* with s_i^* > 0. Any optimum solution can be written as a convex combination of basic solutions. Hence, in Lemma 67, for any i ∈ {1, . . . , m + 2} we can either have y_i < η/(4(m+2)) or s_i < η/(4(m+2)) but not both. Now we choose k big enough such that μ^{(k)} < η²/(32(m+2)²Δ²) for some Δ ≥ 1 that will be determined later. Then, for each i ∈ {1, . . . , m + 2}, exactly one of the inequalities y_i < η/(4(m+2)Δ) and s_i < η/(4(m+2)Δ) holds. Therefore, we can find the partitioning {1, . . . , m + 2} = B ∪̇ N. In particular, we have y_i ≥ η/(4(m+2)) for each i ∈ B and y_i < η/(4(m+2)Δ) for each i ∈ N.
Let A_B be the submatrix of Ã consisting of the rows with indices in B, and A_N be the submatrix of Ã consisting of the remaining rows. By y_B^{(k)}, y_N^{(k)}, b_B, b_N we denote the corresponding subvectors of the vectors y^{(k)} and b. As in the description of the Simplex Algorithm, the entries of e.g. y_B^{(k)} are not necessarily indexed from 1 to |B| but their index set is the set B ⊆ {1, . . . , m + 2}. We can assume that A_B has full column rank.
In the following, the vector norm is the Euclidean norm k · k2 and the matrix norm is the norm
induced by the Euclidean norm.
Theorem 68 Set Δ := max{√(m+2) · ‖A_B(A_B^t A_B)^{-1} A_N^t‖, 1}. Let k be big enough such that μ^{(k)} < η²/(32(m+2)²Δ²). Let Y_B be a diagonal matrix whose rows and columns are indexed with B such that the entry at position (i, i) is y_i^{(k)}. Define
\[ d_y := Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_N^t y_N^{(k)} \]
and ỹ_B = Y_B d_y + y_B^{(k)}. Then:
(a) Ã^t ỹ = c̃, where ỹ ∈ R^{m+2} denotes the vector which arises from ỹ_B by adding zeros for the entries with index in N.
(b) ỹ_B > 0.
(c) The vector ỹ is an optimum dual solution.
Proof:
(b) We have
\begin{align*}
\|d_y\| &= \left\|Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_N^t y_N^{(k)}\right\|\\
&= \left\|Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_B^t Y_B \, Y_B^{-1} A_B \left(A_B^t A_B\right)^{-1} A_N^t y_N^{(k)}\right\|\\
&\le \underbrace{\left\|Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_B^t Y_B\right\|}_{=1} \cdot \left\|Y_B^{-1} A_B \left(A_B^t A_B\right)^{-1} A_N^t y_N^{(k)}\right\|\\
&\le \underbrace{\|Y_B^{-1}\|}_{\le \frac{4(m+2)}{\eta}} \cdot \underbrace{\left\|A_B \left(A_B^t A_B\right)^{-1} A_N^t\right\|}_{\le \frac{\Delta}{\sqrt{m+2}}} \cdot \underbrace{\|y_N^{(k)}\|}_{< \sqrt{m+2}\cdot\frac{\eta}{4(m+2)\Delta}}\\
&\le 1,
\end{align*}
where the second equality holds because A_B^t Y_B Y_B^{-1} A_B (A_B^t A_B)^{-1} = I_n.
(c) By (a), we have Ãt ỹ = c̃, and by (b), we know that ỹB > 0, so we have ỹ ≥ 0. Hence ỹ
is a feasible dual solution. Moreover, we know that there is a feasible primal solution in
which the slack variables si are zero for i ∈ B. Hence, by complementary slackness, ỹ is
an optimum dual solution. 2
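The rounding step of Theorem 68 is easy to carry out numerically. The following is a minimal sketch (added as an illustration): it assumes that Ã is given as a numpy array with one row per dual variable and that B and N are index arrays forming the partition described above; the function name is made up.

```python
import numpy as np

def round_to_optimal_dual(A_tilde, y_k, B, N):
    """Compute the vector y-tilde of Theorem 68 from an interior point y_k:
    d_y := Y_B A_B (A_B^t Y_B^2 A_B)^{-1} A_N^t y_N  and  y~_B = Y_B d_y + y_B,
    with zeros for the entries indexed by N."""
    A_B, A_N = A_tilde[B, :], A_tilde[N, :]
    Y_B = np.diag(y_k[B])                 # diagonal matrix with entries y_i^(k), i in B
    M = A_B.T @ Y_B @ Y_B @ A_B           # A_B^t (Y_B)^2 A_B
    d_y = Y_B @ A_B @ np.linalg.solve(M, A_N.T @ y_k[N])
    y_tilde = np.zeros_like(y_k)
    y_tilde[B] = Y_B @ d_y + y_k[B]       # entries with index in N stay zero
    return y_tilde
```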
8 Integer Linear Programming
Imposing integrality constraints on all or some variables of a linear program makes it possible to model many conditions that cannot be described by linear constraints alone. For example, even if we only consider Binary Linear Programs (i.e. all integrality constraints are of the type x ∈ {0, 1}), we can easily model the following kinds of conditions for variables x, y:
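For instance (these are standard examples of such conditions, given here as an illustration), for binary variables x, y ∈ {0, 1} one can model
\[ x \le y \;\;(\text{"if } x = 1 \text{ then } y = 1\text{"}), \qquad x + y \le 1 \;\;(\text{"at most one of } x, y \text{ equals } 1\text{"}), \qquad x + y = 1 \;\;(\text{"exactly one of } x, y \text{ equals } 1\text{"}). \]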
On the other hand, we have already seen that there are NP-hard optimization problems that
can be modeled as (mixed) integer linear programs. Hence, we cannot hope for polynomial-time
algorithms to solve general ILPs.
Fig. 8: A polyhedron P (given by the red hyperplanes) and its integer hull PI (green). The
black dots indicate the integral vectors.
Observations:
• PI is not necessarily a polyhedron.
Definition 22 A polyhedron P is called integral if P = PI .
In this section, our goal is to find a certificate that a given system of equations does not have
any integral solution (which will be the result of Corollary 73).
The following operations on matrices are called elementary unimodular column operations:
• exchanging two columns,
• multiplying a column by −1,
• adding an integral multiple of one column to another column.
Proof: We may assume that A is integral. Assume that we have already transformed A into a matrix
\[ \begin{pmatrix} F & 0\\ G & H \end{pmatrix} \]
where F is a lower triangular matrix with positive diagonal. Let h_{11}, . . . , h_{1k} be the first row of H. Apply elementary unimodular column operations to H such that all h_{1j} are non-negative and such that \(\sum_{j=1}^k h_{1j}\) is as small as possible. We may assume that h_{11} ≥ h_{12} ≥ · · · ≥ h_{1k}. Then, h_{11} > 0 because A has rank m. Moreover, h_{1j} = 0 for j ∈ {2, . . . , k} because otherwise subtracting h_{1j} from h_{11} would reduce \(\sum_{j=1}^k h_{1j}\). Hence, we have obtained a larger lower triangular matrix F'.
We iterate this step and end up with a matrix [B 0] where B is a lower triangular matrix with positive diagonal. Denote the entries of B by b_{ij} (i = 1, . . . , m, j = 1, . . . , m). Finally, we perform for i = 2, . . . , m the following steps: For j = 1, . . . , i − 1 add an integer multiple of the i-th column of B to the j-th column of B such that b_{ij} is non-negative and less than b_{ii}. 2
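The construction in this proof can be carried out algorithmically. Below is a minimal sketch (added as an illustration), assuming an integral input matrix of full row rank; numpy is used only for array handling, and the function name is made up.

```python
import numpy as np

def hermite_normal_form(A):
    """Column-operation sketch of the proof: transform an integral matrix A of
    full row rank into [B 0] with B lower triangular, positive diagonal, and
    0 <= b_ij < b_ii for j < i. Returns the transformed matrix."""
    H = np.array(A, dtype=int)
    m, n = H.shape
    for i in range(m):
        # make the entries of row i in columns i..n-1 non-negative
        for j in range(i, n):
            if H[i, j] < 0:
                H[:, j] *= -1
        # Euclidean-style reduction until only one non-zero entry remains in row i
        while np.count_nonzero(H[i, i:]) > 1:
            cols = [j for j in range(i, n) if H[i, j] > 0]
            j_min = min(cols, key=lambda j: H[i, j])
            for j in cols:
                if j != j_min:
                    H[:, j] -= (H[i, j] // H[i, j_min]) * H[:, j_min]
        # move the remaining non-zero column to position i
        j_piv = i + int(np.nonzero(H[i, i:])[0][0])
        H[:, [i, j_piv]] = H[:, [j_piv, i]]
        # reduce the earlier columns of row i (final normalization step)
        for j in range(i):
            H[:, j] -= (H[i, j] // H[i, i]) * H[:, i]
    return H
```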
Proof: “⇒:” If x and y^t A are integral vectors and Ax = b, then y^t Ax = y^t b is also integral.
“⇐:” Assume that b^t y is integral for each y ∈ Q^m for which A^t y is integral. Then, Ax = b must have a (fractional) solution, since otherwise, by Farkas’ Lemma (Corollary 7), there would be a vector y ∈ Q^m with y^t A = 0 and y^t b = −1/2. Thus, we may assume that the rows of A are linearly independent, so A has rank m.
It is easy to check that the statement to be proved holds for A if and only if it holds for any matrix Ã where Ã arises from A by applying an elementary unimodular column operation. Hence, we can assume that A is in Hermite normal form [B 0]. Thus B^{-1}[B 0] = [I_m 0] is an integral matrix. Therefore, by our assumption (applied to the rows of B^{-1}), B^{-1}b is an integral vector. Since
\[ [B\; 0]\begin{pmatrix} B^{-1}b\\ 0 \end{pmatrix} = b, \]
the vector x := (B^{-1}b, 0)^t is an integral solution of [B 0] x = b. 2
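As a small worked example of this certificate (added for concreteness): for A = (2) and b = (1), the system Ax = b has no integral solution, and the vector y = 1/2 certifies this, since A^t y = 1 is integral while y^t b = 1/2 is not.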
8.3 TDI Systems
Theorem 74 Let P = {x ∈ Rn | Ax ≤ b} be a rational polyhedron. Then the following statements are equivalent:
(a) P is integral.
(b) Each face of P contains at least one integral vector.
(c) Each minimal face of P contains at least one integral vector.
(d) Each supporting hyperplane of P contains at least one integral vector.
(e) Each rational supporting hyperplane of P contains at least one integral vector.
(f ) max{ct x | x ∈ P } is attained by an integral vector for each c for which the maximum
is finite.
(g) max{ct x | x ∈ P } is an integer for each integral vector c for which the maximum is
finite.
Proof: The following implications are obvious: “(b) ⇔ (c)”, “(b) ⇒ (d)”, “(d) ⇒ (e)”, and
“(f) ⇒ (g)”
“(a) ⇒ (b):” Assume that P is integral. Let F = P ∩ H be a face of P where H = {x ∈ Rn | c^t x = δ} is a supporting hyperplane of P. Then, any z ∈ F is a convex combination of integral vectors v_1, . . . , v_k of P. If v_i ∈ P \ F (so c^t v_i < δ) for some i ∈ {1, . . . , k}, then (since c^t z = δ) there must be a j ∈ {1, . . . , k} with c^t v_j > δ, which is a contradiction to v_j ∈ P. Thus, all v_i must be in F, so in particular F contains an integral vector.
“(c) ⇒ (f):” Follows from Corollary 19.
“(f) ⇒ (a):” Assume that (f) holds but P 6= PI . Then, there is an x∗ ∈ P \ PI . By Theorem 70,
PI is a polyhedron, so there is an inequality at x ≤ β that is valid for PI but not for x∗ , so
at x∗ > β. This is a contradiction to (f) because max{at x | x ∈ P } is finite (by Proposition 71)
but is not attained by any integral vector.
So far, we have proved that (a),(b),(c), and (f) are equivalent.
“(e) ⇒ (c):” We may assume that A and b are integral. Let F = {x ∈ Rn | A'x = b'} be a minimal face of P (where A'x ≤ b' is a subsystem of Ax ≤ b). If there is no integral vector x with A'x = b', then, by Corollary 73, there must be a rational vector y such that c := (A')^t y is integral while δ := y^t b' is not an integer. Moreover, we may assume that all entries of y are positive (otherwise we add an appropriate integral vector to y). Since c is integral but δ is not integral, the rational hyperplane H := {x ∈ Rn | c^t x = δ} does not contain any integral vector.
We will show that H ∩ P = F which implies that H is a supporting hyperplane. By construction, we have F ⊆ H, so we have to show that H ∩ P ⊆ F. Let x ∈ H ∩ P. Then, y^t A'x = c^t x = δ = y^t b', so y^t(A'x − b') = 0. Thus, since all components of y are positive, A'x = b', so x ∈ F.
Now, we know that (a),(b),(c),(d),(e), and (f) are equivalent.
“(g) ⇒ (e):” Let H = {x ∈ Rn | ct x = δ} be a rational supporting hyperplane of P , so
max{ct x | x ∈ P } = δ. Assume that H does not contain any integral vector. Then, by
Corollary 73, there is a positive number γ for which γc is integral but γδ is not integral. Then
max{(γc)t x | x ∈ P } = γ max{ct x | x ∈ P } = γδ 6∈ Z, so the statement of (g) is false.
Since “(f) ⇒ (g)” is trivial, this shows the equivalence of all statements. 2
Note that this Theorem implies that for any rational polyhedron P ⊆ Rn with P = PI and
any rational vector c there is a polynomial-time algorithm computing a vector x ∈ P ∩ Zn
maximizing ct x over P ∩ Zn , provided that there is an optimum solution. To this end, we only
have to compute an integral element of a minimal face F consisting of optimum solutions only
(for finding F we can apply the Ellipsoid Method). This can be done by computing an
integral solution of an equation system, which is possible in polynomial time by the method
described in the proof of Corollary 73.
Moreover, by the equivalence of (f) and (g), the existence of an integral solution can be deduced
from the integrality of the solution value. This motivates the following definition:
Note that total dual integrality is in fact a property of the system of inequalities, not just of
the polyhedron that is defined by them. For example the systems
\[ \begin{pmatrix} 1 & 1\\ 1 & 0\\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \le \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \le \begin{pmatrix} 0\\ 0 \end{pmatrix} \]
define the same polyhedron. But it is easy to check that the first system of inequalities is TDI
while the second one is not TDI.
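To see the difference concretely (an added check), consider the integral vector c = (1, 0)^t: for the first system, the dual LP min{0 | y_1 + y_2 + y_3 = 1, y_1 − y_3 = 0, y ≥ 0} has the integral optimum solution y = (0, 1, 0)^t, whereas for the second system the only feasible solution of min{0 | y_1 + y_2 = 1, y_1 − y_2 = 0, y ≥ 0} is y = (1/2, 1/2)^t, which is not integral.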
Theorem 75 Let A ∈ Qm×n and b ∈ Zm such that Ax ≤ b is totally dual integral. Then,
the polyhedron P = {x ∈ Rn | Ax ≤ b} is integral.
Proof: Let c be an integral vector for which max{c^t x | Ax ≤ b} is finite. Since Ax ≤ b is totally dual integral, the minimum in the LP-duality equation max{c^t x | Ax ≤ b} = min{b^t y | A^t y = c, y ≥ 0} is attained by an integral vector y, and since b is integral, the common optimum value b^t y is an integer. Thus, max{c^t x | x ∈ P} is an integer for each integral vector c for which the maximum is finite, so by the implication “(g) ⇒ (a)” of Theorem 74, P is integral. 2
Proof: Exercise. 2
Hence, if a system Ax ≤ b is not TDI, then no proper subsystem A0 x ≤ b0 with {x ∈ Rn | Ax ≤
b} = {x ∈ Rn | A0 x ≤ b0 } can be TDI. We call a system Ax ≤ b minimally TDI if it is TDI
but no proper subsystem of Ax ≤ b defining the same polyhedron is TDI.
\[ \max\{c^tx \mid Ax \le b,\ a^tx = \beta\} \;=\; \min\{b^ty + \beta(\lambda-\mu) \mid y \ge 0,\ \lambda,\mu \ge 0,\ A^ty + (\lambda-\mu)a = c\} \tag{64} \]
is finite. Let x^*, y^*, λ^*, µ^* be optimum primal and dual solutions. Set c̃ := c + ⌈µ^*⌉a. Then,
\[ \max\{\tilde c^tx \mid Ax \le b,\ a^tx \le \beta\} \;=\; \min\{b^ty + \beta\lambda \mid y \ge 0,\ \lambda \ge 0,\ A^ty + \lambda a = \tilde c\} \tag{65} \]
is finite because x^* is feasible for the maximum and y^* and λ^* + ⌈µ^*⌉ − µ^* are feasible for the minimum.
Since Ax ≤ b, a^tx ≤ β is a TDI-system, the minimum in equation (65) has an integer optimum solution ỹ, λ̃. Then, y := ỹ, λ := λ̃, µ := ⌈µ^*⌉ is an integer optimum solution for the minimum in (64): it is obviously feasible, and its cost is
\[ b^t\tilde y + \beta(\tilde\lambda - \lceil\mu^*\rceil) \;=\; \big(b^t\tilde y + \beta\tilde\lambda\big) - \beta\lceil\mu^*\rceil \;\le\; \big(b^ty^* + \beta(\lambda^* + \lceil\mu^*\rceil - \mu^*)\big) - \beta\lceil\mu^*\rceil \;=\; b^ty^* + \beta(\lambda^* - \mu^*). \]
The inequality follows from the fact that y ∗ , λ∗ + dµ∗ e − µ∗ is feasible for the minimum in (65)
and ỹ, λ̃ is an optimum solution for the minimum in (65). Hence, the minimum in (64) has an
integral optimum solution, so Ax ≤ b, at x = β is TDI. 2
Theorem 78 Every rational polyhedral cone is generated by an integral Hilbert basis.
is integral and an element of P. Thus (since {b_1, . . . , b_k} ⊆ H), b can be written as a non-negative integral combination of the elements of H. This shows that H is a Hilbert basis. 2
Notation: For a system of inequalities Ax ≤ b and a face F of {x ∈ Rn | Ax ≤ b}, we call a row of A active if the corresponding inequality in Ax ≤ b is satisfied with equality for all x ∈ F.
Proof: “⇒:” Suppose that Ax ≤ b is TDI. Let F be a minimal face of P and let a1 , . . . , at be
the rows of A that are active for F . We have to show that {a1 , . . . , at } is a Hilbert basis. Let
c be an integral vector in cone({a1 , . . . , at }). We have to write c as an integral non-negative
combination of a1 , . . . , at . The maximum in the LP-duality equation
is attained by every vector x in F . Since Ax ≤ b is TDI, the dual problem has an integral
optimum solution y. By complementary slackness, the entries of y at positions corresponding
to rows that are not active in F are 0. Thus, c is an integral non-negative combination of
a1 , . . . , a t .
“⇐:” Assume that for each minimal face F of P , the rows that are active in F form a Hilbert
basis. Let c be an integral vector for which the optima in (66) are finite. We have to show that
the minimum is attained by an integral vector. Let F be a minimal face of P such that each
vector in F attains the maximum in the duality equation. Let a1 , . . . , at be rows of A that are
active in F. Then, by complementary slackness, c ∈ cone({a_1, . . . , a_t}). Since a_1, . . . , a_t form a Hilbert basis, we can write c = \(\sum_{i=1}^t \lambda_i a_i\) for certain non-negative integral numbers λ_1, . . . , λ_t. We can extend (λ_1, . . . , λ_t) with zero-components to a vector y ∈ Zm with y ≥ 0, A^t y = c and
bt y = xt At y = ct x for all x ∈ F . In other words, y is an integral optimum solution of the dual
LP. 2
Theorem 80 The rational system of inequalities Ax ≤ 0 is TDI if and only if the rows
of A form a Hilbert basis.
Proof: Follows from the previous Theorem with b = 0 (note that in the unique minimal face
of {x ∈ Rn | Ax ≤ 0} all rows of A are active). 2
Theorem 81 (Giles and Pulleyblank [1979]) For each rational polyhedron P ⊆ Rn there
exists a rational TDI-system Ax ≤ b with A ∈ Zm×n and P = {x ∈ Rn | Ax ≤ b}. The
vector b can be chosen to be integral if and only if P is integral.
Proof: We can assume w.l.o.g. that P ≠ ∅. For each minimal face F of P, we define
\[ C_F := \{c \in \mathbb{R}^n \mid c^tz = \max\{c^tx \mid x \in P\} \text{ for all } z \in F\}. \]
Then, C_F is a polyhedral cone. To see this, assume that P = {x ∈ Rn | Ãx ≤ b̃} is some description of P. Then C_F is generated by the rows of Ã that are active in F.
Let F be a minimal face, and let a1 , . . . , at be an integral Hilbert basis generating CF . Choose
x0 ∈ F , and define βi := ati x0 for i = 1, . . . , t. Then, βi = max{ati x | x ∈ P } (i = 1, . . . , t). Let
SF be the system at1 x ≤ β1 , . . . , att x ≤ βt . All inequalities in SF are valid for P . Let Ax ≤ b
be the union of the systems SF over all minimal faces F of P . Then, P ⊆ {x ∈ Rn | Ax ≤ b}.
On the other hand, if x∗ ∈ Rn \ P , then there is a supporting hyperplane of P separating x∗
from P , and this supporting hyperplane touches P in a minimal face, so there is an inequality
in Ax ≤ b that is violated by x∗ . Hence, P = {x ∈ Rn | Ax ≤ b}. Moreover, by Theorem 79,
Ax ≤ b is TDI.
If P is integral, then all the β_i can be chosen to be integral because we can choose the vectors x_0 ∈ F as integral vectors. On the other hand, if b is integral, then by Theorem 75, P is integral.
2
For the primal-dual pair max{c^t x | Ax ≤ b} = min{b^t y | A^t y = c, y ≥ 0} we know (by the Simplex Algorithm) that if both optima are finite, the minimization problem has an optimum solution y with at most rank(A) non-zero entries. If we ask for an optimum integral solution (with Ax ≤ b TDI and b integral), this is not necessarily the case: see
\[ A = \begin{pmatrix} 2\\ -3 \end{pmatrix}, \qquad b = \begin{pmatrix} 0\\ 0 \end{pmatrix}, \qquad c = (1). \]
Nevertheless, for full-dimensional solution spaces, we get the following bound on the number of
non-zero entries:
max{ct x | Ax ≤ b} = min{bt y | At y = c, y ≥ 0}
are finite. Then, the minimization problem has an integral optimum solution y with at
most 2r − 1 positive components where r := rank(A).
Since C is pointed, the maximum is finite (check that the dual LP min{c^t y | y^t a_i ≥ 1 for i ∈ {1, . . . , t}} is feasible). We can assume that at most k of the λ_i are non-zero. Define
\[ c' := c - \sum_{i=1}^t \lfloor\lambda_i\rfloor a_i = \sum_{i=1}^t (\lambda_i - \lfloor\lambda_i\rfloor) a_i. \]
Then, c' is an integral vector in C, so we can write it as c' = \(\sum_{i=1}^t \mu_i a_i\) for some integral numbers µ_1, . . . , µ_t ≥ 0. Since λ_1, . . . , λ_t was an optimum solution of (67) and µ_1 + ⌊λ_1⌋, . . . , µ_t + ⌊λ_t⌋ is a feasible solution, we have \(\sum_{i=1}^t \mu_i + \sum_{i=1}^t \lfloor\lambda_i\rfloor \le \sum_{i=1}^t \lambda_i\), so
\[ \sum_{i=1}^t \mu_i \;\le\; \sum_{i=1}^t \lambda_i - \sum_{i=1}^t \lfloor\lambda_i\rfloor \;<\; k \]
because at most k of the λ_i are non-zero. Thus, at most k − 1 of the µ_i are non-zero. Therefore, the decomposition
\[ c = \sum_{i=1}^t (\lfloor\lambda_i\rfloor + \mu_i) a_i \]
Thus there would be inequalities v t x ≤ β1 and −v t x ≤ β2 (for some numbers β1 , β2 ) that can
be written as non-negative combinations of inequalities in Ax ≤ b corresponding to rows of A
that are active in F . For x ∈ F we have v t x = β1 and −v t x = β2 , so this would imply β1 = −β2
and P would be contained in {x ∈ Rn | v t x = β1 }, which is a contradiction to the assumption
that P is full-dimensional.
By Theorem 79, the rows that are active for a minimal face consisting of optimum solutions of
max{ct x | Ax ≤ b} form a Hilbert basis (because Ax ≤ b is TDI). 2
In this section, we want to identify integral matrices A such that Ax ≤ b, x ≥ 0 is TDI for
any vector b. It will turn out that these are exactly the totally unimodular matrices (see
Corollary 87).
In particular, a regular square matrix is unimodular if and only if it is integral and its determinant
is −1 or 1. Moreover, by Cramer’s rule, the inverse of any unimodular square matrix is an
integral matrix.
Exercise: Check that any series of elementary unimodular column operations, applied to a
matrix A (see Chapter 8.2), can be performed by multiplying A from the right by an appropriate
regular unimodular square matrix.
Theorem 83 Let A be a totally unimodular matrix, and let b be an integral vector. Then,
the polyhedron P = {x ∈ Rn | Ax ≤ b} is integral.
Proof: Let F be a minimal face of P . We will show that F contains an integral vector. By
the implication “(c) ⇒ (a)” of Theorem 74 this is sufficient to prove that P is integral.
By Proposition 22, we can write the minimal face as F = {x ∈ Rn | A'x = b'} where A'x ≤ b' is a subsystem of Ax ≤ b. We can assume that A' has full row rank. By permuting coordinates, we can write A' = [U V] for some matrix U with det(U) ∈ {−1, 1}. Thus
\[ x := \begin{pmatrix} U^{-1}b'\\ 0 \end{pmatrix} \]
is an integral vector in F. 2
Theorem 84 Let A ∈ Zm×n be a matrix with rank m. Then A is unimodular if and only
if for each integral vector b the polyhedron {x ∈ Rn | Ax = b, x ≥ 0} is integral.
Proof: “⇒:” Assume that A is unimodular, and let b be an integral vector. Let x0 be a vertex
of {x ∈ Rn | Ax = b, x ≥ 0}. This means that there are n linearly independent constraints
in the system Ax ≤ b, −Ax ≤ −b, −In x ≤ 0 that are satisfied by x0 with equality. Thus, the
columns of A corresponding to non-zero entries of x0 are linearly independent. This set of
columns can be extended to a regular m × m-submatrix B of A. Then, the restriction of x0 to
coordinates corresponding to B is B −1 b. This is integral (because det(B) ∈ {−1, 1}). The other
entries of x0 are zero, so x0 is integral.
“⇐:” Suppose that {x ∈ Rn | Ax = b, x ≥ 0} is integral for every integral vector b. Let B be
a regular m × m-submatrix of A. We have to show that det(B) ∈ {−1, 1}. To this end, it
is sufficient to show that B −1 u is integral for every integral vector u (by Cramer’s rule). So
let u be an integral vector. Then, there is an integral vector y such that z := y + B −1 u ≥ 0.
Then, b := Bz is integral. Let z 0 be a vector with Az 0 = Bz = b that arises from z by adding
zero-entries. Then, z 0 is a feasible (i.e. non-negative) basic solution of Ax = b, so it is a vertex
of {x ∈ Rn | Ax = b, x ≥ 0}. Therefore z 0 is integral, which also shows that z is integral. This
implies that B −1 u = z − y is integral. 2
Proof: The matrix A is totally unimodular if and only if [I_m A] is unimodular. Let b be an integral vector. Then, the vertices of {x ∈ Rn | Ax ≤ b, x ≥ 0} are integral if and only if the vertices of {z ∈ R^{m+n} | [I_m A] z = b, z ≥ 0} are integral. Thus, the statement follows from Theorem 84. 2
Corollary 86 An integral matrix A is totally unimodular if and only if for all integral vectors b and c both optima in the duality equation
\[ \max\{c^tx \mid Ax \le b,\ x \ge 0\} = \min\{b^ty \mid A^ty \ge c,\ y \ge 0\} \]
are attained by integral vectors (if they are finite).
Proof: Follows directly from Hoffman and Kruskal’s Theorem (Theorem 85) using the fact that a matrix is totally unimodular if and only if its transposed matrix is totally unimodular. 2
Proof: “⇒:” If A is totally unimodular, then also At is totally unimodular. Thus, by Theo-
rem 85, min{bt y | At y ≥ c, y ≥ 0} is attained by an integral vector for each vector b and each
integral vector c for which the minimum is finite. This implies that the system Ax ≤ b, x ≥ 0 is
TDI for each vector b.
“⇐:” Suppose that Ax ≤ b, x ≥ 0 is TDI for each vector b. By Theorem 75 this implies that the
polyhedron {x ∈ Rn | Ax ≤ b, x ≥ 0} is integral for each integral vector b. By Theorem 85, this
means that A is totally unimodular. 2
The following theorem provides a certificate for showing that a matrix is totally unimodular.
Proof: “⇒:” Let A be totally unimodular and R ⊆ {1, . . . , n}. Let d ∈ {0, 1}^n be the characteristic vector of R, i.e.
\[ d_r = \begin{cases} 1 & \text{for } r \in R\\ 0 & \text{for } r \in \{1, \dots, n\} \setminus R \end{cases} \]
Since A is totally unimodular, the matrix \(\begin{pmatrix} A\\ -A\\ I_n \end{pmatrix}\) is also totally unimodular. Thus, the polytope
\[ P := \left\{x \in \mathbb{R}^n \;\middle|\; Ax \le \left\lceil\tfrac{1}{2}Ad\right\rceil,\ Ax \ge \left\lfloor\tfrac{1}{2}Ad\right\rfloor,\ x \le d,\ x \ge 0\right\} \]
is integral. It is non-empty since it contains \tfrac{1}{2}d, so it contains an integral vector z, and 0 ≤ z ≤ d implies z ∈ {0, 1}^n.
Define R_1 := {r ∈ R | z_r = 0} and R_2 := {r ∈ R | z_r = 1}. For i ∈ {1, . . . , m}, this yields
\[ \sum_{j \in R_1} a_{ij} - \sum_{j \in R_2} a_{ij} = \sum_{j=1}^n a_{ij}(d_j - 2z_j) \in \{-1, 0, 1\}, \]
so R_1, R_2 is a partition of R as required.
“⇐:” Assume that for each R ⊆ {1, . . . , n} there are sets R_1, R_2 ⊆ R with R = R_1 ∪̇ R_2 as described in the theorem. We show by induction on k that every k × k-submatrix of A has determinant −1, 0, or 1. For k = 1 this follows from the criterion for |R| = 1.
Let k > 1. Let B = (b_{ij})_{i,j∈{1,...,k}} be a submatrix of A. We can assume that B is non-singular because otherwise its determinant is 0.
By Cramer’s rule, each entry of B^{-1} is of the form det(B')/det(B) where B' arises from B by replacing a column by a unit vector. By the induction hypothesis det(B') ∈ {−1, 0, 1}. Hence, all entries of the matrix B^* := det(B) · B^{-1} are in {−1, 0, 1}.
Let b^* be the first column of B^*. Then, Bb^* = det(B) e_1 where e_1 is the first unit vector. We define R := {j ∈ {1, . . . , k} | b^*_j ≠ 0}. For i ∈ {2, . . . , k}, we have 0 = (Bb^*)_i = \(\sum_{j \in R} b_{ij} b^*_j\), so |{j ∈ R | b_{ij} ≠ 0}| is even.
Let R = R_1 ∪̇ R_2 such that \(\sum_{j \in R_1} b_{ij} - \sum_{j \in R_2} b_{ij} \in \{-1, 0, 1\}\) for all i ∈ {1, . . . , k}. Thus, for i ∈ {2, . . . , k}, we have (since |{j ∈ R | b_{ij} ≠ 0}| is even): \(\sum_{j \in R_1} b_{ij} - \sum_{j \in R_2} b_{ij} = 0\). If we also had \(\sum_{j \in R_1} b_{1j} - \sum_{j \in R_2} b_{1j} = 0\), then the columns of B would not be linearly independent. Hence, \(\sum_{j \in R_1} b_{1j} - \sum_{j \in R_2} b_{1j} \in \{-1, 1\}\) and thus, Bx ∈ {e_1, −e_1} where the vector x ∈ {−1, 0, 1}^k is defined by
\[ x_j = \begin{cases} 1 & \text{for } j \in R_1\\ -1 & \text{for } j \in R_2\\ 0 & \text{for } j \in \{1, \dots, k\} \setminus R \end{cases} \]
Therefore, b∗ = det(B)B −1 e1 ∈ {det(B)x, −det(B)x}. But both b∗ and x are non-zero vectors
with entries -1,0,1 only, so we can conclude that det(B) ∈ {−1, 1}. 2
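For small matrices, the criterion of Theorem 88 can be checked directly by brute force. The following sketch (an added illustration; the function name is made up, and the enumeration is exponential in the number of columns) tries all column subsets R and all sign patterns:

```python
from itertools import combinations, product
import numpy as np

def is_totally_unimodular(A):
    """Brute-force check of the criterion of Theorem 88: for every column
    subset R there must be a partition R = R1 u R2 such that every entry of
    sum_{j in R1} a_j - sum_{j in R2} a_j lies in {-1, 0, 1}."""
    A = np.asarray(A)
    n = A.shape[1]
    for r in range(1, n + 1):
        for R in combinations(range(n), r):
            found = False
            for signs in product([1, -1], repeat=r):
                v = sum(s * A[:, j] for s, j in zip(signs, R))
                if np.all(np.abs(v) <= 1):
                    found = True
                    break
            if not found:
                return False
    return True

# Example: the incidence matrix of a path on 3 vertices (a bipartite graph) is TU.
print(is_totally_unimodular([[1, 0], [1, 1], [0, 1]]))   # True
```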
This result allows us to prove total unimodularity for some quite important matrices: The incidence matrix of an undirected graph G is the matrix A_G = (a_{v,e})_{v∈V(G), e∈E(G)} which is defined by
\[ a_{v,e} = \begin{cases} 1, & \text{if } v \in e\\ 0, & \text{if } v \notin e \end{cases} \]
The incidence matrix of a directed graph G is the matrix A_G = (a_{v,e})_{v∈V(G), e∈E(G)} which is defined by
\[ a_{v,(x,y)} = \begin{cases} -1, & \text{if } v = x\\ 1, & \text{if } v = y\\ 0, & \text{if } v \notin \{x, y\} \end{cases} \]
Proof: Let G be an undirected graph and A_G its incidence matrix. Since a matrix is TU if and only if its transposed matrix is TU, we can apply Theorem 88 to the rows of A_G: A_G is TU if and only if for each X ⊆ V(G) there is a partition X = A ∪̇ B with E(G[A]) = E(G[B]) = ∅. The last condition is satisfied if and only if G is bipartite. 2
Applications:
• The previous theorem can be used to show König’s Theorem: The maximum cardinality
of a matching in a bipartite graph equals the minimum cardinality of a vertex cover.
To see this, let G be a bipartite graph and A_G its incidence matrix. Then, a maximum matching is given by an integral solution of max{\mathbb{1}_m^t x | A_G x ≤ \mathbb{1}_n, x ≥ 0} and a minimum vertex cover by an integral solution of min{\mathbb{1}_n^t y | A_G^t y ≥ \mathbb{1}_m, y ≥ 0}. By the previous theorem, A_G is TU, so by Corollary 86 both optima are attained by integral vectors.
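A quick computational check of this (an added illustration using scipy; the bipartite graph below is a toy example, not taken from the lecture):

```python
import numpy as np
from scipy.optimize import linprog

# Fractional matching LP  max{1^t x | A_G x <= 1, x >= 0}  for the bipartite
# graph with vertices {1,2,3,4} and edges {1,3}, {1,4}, {2,4}.
A_G = np.array([[1, 1, 0],    # vertex 1 lies on edges 1 and 2
                [0, 0, 1],    # vertex 2 lies on edge 3
                [1, 0, 0],    # vertex 3 lies on edge 1
                [0, 1, 1]])   # vertex 4 lies on edges 2 and 3
res = linprog(c=-np.ones(3), A_ub=A_G, b_ub=np.ones(4), bounds=(0, None))
print(res.x)      # optimal vertex, here (1, 0, 1): the matching {1,3}, {2,4}
print(-res.fun)   # 2 = maximum matching size = minimum vertex cover size
```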
Proof: Again, we apply Theorem 88 to the transpose of the incidence matrix. For any set R ⊆ {1, . . . , m} we can choose R_1 := R and R_2 := ∅, which satisfies the constraints of Theorem 88. 2
Remark: This result explains the existence of integral optimum solutions of flow problems. These results can be extended to more general linear objective functions on the edges of directed graphs (see exercises).
The general strategy of cutting-plane methods can be described as follows: Assume that we are given a polyhedron P and we want to optimize a linear function over the integral vectors in P. To this end, we first find an optimum solution x^* over P. If it belongs to P_I, we are done, because then we can also easily compute an integral solution of the same cost. Otherwise we look for a hyperplane separating x^* from P_I, so we ask for a vector c and a number δ such that c^t x ≤ δ for all x ∈ P_I but c^t x^* > δ. Then, we add the constraint c^t x ≤ δ, solve the linear program again, and iterate these steps until we get an integral solution.
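This loop can be written down generically. The sketch below is only an illustration: `separate` is a hypothetical user-supplied separation routine, and scipy's LP solver is used for the relaxations.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(c, A, b, separate, max_iter=100):
    """Generic cutting-plane loop: maximize c^t x over Ax <= b and, as long as
    the LP optimum x* is not integral, ask separate(x*) for a cut (d, delta)
    with d^t x <= delta valid for P_I but violated by x*."""
    A, b = np.array(A, dtype=float), np.array(b, dtype=float)
    for _ in range(max_iter):
        res = linprog(-np.asarray(c, dtype=float), A_ub=A, b_ub=b, bounds=(None, None))
        if not res.success:
            return None
        x = res.x
        if np.all(np.abs(x - np.round(x)) < 1e-9):   # integral optimum found
            return np.round(x)
        d, delta = separate(x)                       # e.g. a Gomory-Chvatal cut
        A = np.vstack([A, d])
        b = np.append(b, delta)
    return None
```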
How can we find half-spaces that contain P_I but not necessarily P? An easy observation is that if H is a half-space that contains P, then P_I is contained in H_I. This motivates the following definition:
Definition 28 Let P ⊆ Rn be a convex set. Let M be the set of all rational half-spaces H = {x ∈ Rn | c^t x ≤ δ} with P ⊆ H. Then, we define
\[ P' := \bigcap_{H \in M} H_I. \]
We set P^{(0)} := P and P^{(i+1)} := (P^{(i)})' for i ≥ 0. P^{(i)} is the i-th Gomory-Chvátal-truncation of P.
\[ x^* = \frac{1}{\alpha}\left(\alpha x^* - (\alpha-1)y\right) + \frac{\alpha-1}{\alpha}\,y. \]
Since c^t(αx^* − (α − 1)y) ≤ c^t y = ⌊δ⌋, this shows that x^* is a convex combination of two integral vectors in H, so x^* ∈ H_I. 2
Proposition 92 Let P = {x ∈ Rn | Ax ≤ b} be a rational polyhedron. Then
Let ũ be an optimum solution of the minimum. Since ũ^t A = c^t is integral, this leads to ũ^t Az ≤ ⌊ũ^t b⌋, so
\[ c^tz = \tilde u^tAz \le \lfloor \tilde u^tb \rfloor \le \lfloor\delta\rfloor. \]
By the previous lemma, this implies z ∈ H_I. Since this is true for any half-space H containing P, it also shows z ∈ P'. 2
Cuts that are given by inequalities of the type u^t Ax ≤ ⌊u^t b⌋ (for some vector u ≥ 0 with u^t A integral) are called Gomory-Chvátal cuts. They were used in the first cutting-plane algorithms for integer linear programming (see Gomory [1963]).
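As a small added example: from the single inequality 2x_1 + 2x_2 ≤ 3 and u = (1/2) we get u^t A = (1, 1), which is integral, and hence the Gomory-Chvátal cut x_1 + x_2 ≤ ⌊3/2⌋ = 1. It is valid for all integral points of the polyhedron, although, e.g., the fractional point (3/4, 3/4) violates it.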
Since Ax ≤ b is TDI, the minimum is attained by an integral vector ỹ. Thus,
Proof: Follows from the previous theorem and the fact that any rational polyhedron can be
described by a TDI-system with integral matrix (Theorem 81). 2
\begin{align*}
P' \cap F &= \{x \in \mathbb{R}^n \mid Ax \le \lfloor b\rfloor,\ a^tx = \beta\}\\
&= \{x \in \mathbb{R}^n \mid Ax \le \lfloor b\rfloor,\ a^tx \le \lfloor\beta\rfloor,\ a^tx \ge \lceil\beta\rceil\}\\
&= F'.
\end{align*}
Now assume in addition that P is rational. Since U is unimodular, Ux is integral if and only if x is integral. This implies
\begin{align*}
(f(P))_I &= \operatorname{conv}(\{y \in \mathbb{Z}^n \mid y = Ux,\ x \in P\})\\
&= \operatorname{conv}(\{y \in \mathbb{R}^n \mid y = Ux,\ x \in P,\ x \in \mathbb{Z}^n\})\\
&= \operatorname{conv}(\{y \in \mathbb{R}^n \mid y = Ux,\ x \in P_I\})\\
&= f(P_I).
\end{align*}
By Theorem 81, we can assume that Ax ≤ b is TDI, A is integral and b is rational. Then, for any integral vector c for which min{b^t y | y^t AU^{-1} = c^t, y ≥ 0} is feasible and bounded, also min{b^t y | y^t A = c^t U, y ≥ 0} is feasible and bounded and c^t U is integral. Hence AU^{-1}x ≤ b is TDI. Thus, Theorem 93 implies
\[ (f(P))' = \{x \in \mathbb{R}^n \mid AU^{-1}x \le b\}' = \{x \in \mathbb{R}^n \mid AU^{-1}x \le \lfloor b\rfloor\} = f(P').
\]
2
Remark: This shows as well that (f(P))^{(i)} = f(P^{(i)}) for a rational polyhedron P and i ∈ N.
P_I = {x ∈ Rn | Cx ≤ d} with some integral matrix C and some rational vector d. If P_I = ∅, we choose C = A and d = b − A'\mathbb{1}_n where A' arises from A by taking the absolute value of each entry. Note that {x ∈ Rn | Ax + A'\mathbb{1}_n ≤ b} = ∅ because any vector x^* with Ax^* + A'\mathbb{1}_n ≤ b could be rounded down to an integral vector x with Ax ≤ b.
Let c^t x ≤ δ be an inequality in Cx ≤ d. Then, we claim that there is an s ∈ N with P^{(s)} ⊆ H := {x ∈ Rn | c^t x ≤ δ}. The theorem is a direct consequence of this claim.
Proof of the claim: Observe that there is a number β ≥ δ with P ⊆ {x ∈ Rn | c^t x ≤ β}. If P_I = ∅, this is true by construction. In the case P_I ≠ ∅, it follows from the fact that c^t x is bounded over P if and only if it is bounded over P_I (Proposition 71).
Assume that the claim is false, so there is an integer γ with δ < γ ≤ β for which there is an s_0 ∈ N with P^{(s_0)} ⊆ {x ∈ Rn | c^t x ≤ γ} but there is no s ∈ N with P^{(s)} ⊆ {x ∈ Rn | c^t x ≤ γ − 1}. Then, max{c^t x | x ∈ P^{(s)}} = γ for all s ≥ s_0. To see this, assume that max{c^t x | x ∈ P^{(s)}} < γ for some s. Then there is an ε > 0 with P^{(s)} ⊆ {x ∈ Rn | c^t x ≤ γ − ε}. This implies max{c^t x | x ∈ P^{(s+1)}} ≤ γ − 1 because {x ∈ Rn | c^t x ≤ γ − ε}_I ⊆ {x ∈ Rn | c^t x ≤ γ − 1}.
Define F := P^{(s_0)} ∩ {x ∈ Rn | c^t x = γ}. Then, dim(F) < n = dim(P), so we can apply the induction hypothesis to F, which implies that there is a number s_1 with F^{(s_1)} = F_I. Thus, F^{(s_1)} = F_I ⊆ P_I ∩ {x ∈ Rn | c^t x = γ} = ∅.
Note that this section was not covered by the lecture course given in summer term 2019.
Branch-and-Bound Methods (they are also called Divide-and-Conquer Algorithms
or Backtracking Algorithms) are a quite simple approach to integer linear programming.
Nevertheless, they are of great practical relevance. Algorithm 5 describes the approach for
integer linear programs but it can be applied to mixed integer linear programs, too. The
algorithm stores a number L which is the cost of the best integral solution found so far (so in
the beginning it is −∞). In each iteration of the main loop, the algorithm chooses a polyhedron
Pj , which is a subset of the given polyhedron P0 , and solves the corresponding linear program. If
this LP is bounded and feasible, the algorithm first checks if the value c∗ of an optimum solution
x∗ is larger than L. If this is not the case, the algorithm can reject the polyhedron Pj because
it cannot contain a better integral solution than the best current solution (this is the bounding
part). If c∗ > L and x∗ is integral, we have found a better integral solution and can update L.
Otherwise, we choose a non-integral component x∗i of x∗ and compute sub-polyhedra P2j+1 and
P2j+2 of Pj with additional constraints that arise by rounding x∗i up or down (branching step).
Algorithm 5: Branch-and-Bound Algorithm
Input: A matrix A ∈ Qm×n , a vector b ∈ Qm , and a vector c ∈ Qn such that the LP
max{ct x | Ax ≤ b} is feasible and bounded.
Output: A vector x̃ ∈ {x ∈ Zn | Ax ≤ b} maximizing ct x or the message that there is no
optimum solution.
1 L := −∞;
2 P0 := {x ∈ Rn | Ax ≤ b};
3 K := {P0 };
4 while K 6= ∅ do
5 Choose a Pj ∈ K;
6 K := K \ {Pj };
7 if Pj 6= ∅ then
8 Solve max{ct x | x ∈ Pj };
9 Let x∗ be an optimum solution and c∗ = ct x∗ ;
10 if c∗ > L then
11 if x∗ ∈ Zn then
12 L := c∗ ;
13 x̃ := x∗ ;
14 else
15 Choose i ∈ {1, . . . , n} with x∗i 6∈ Z;
16 P2j+1 := {x ∈ Pj | xi ≤ bx∗i c};
17 P2j+2 := {x ∈ Pj | xi ≥ dx∗i e};
18 K := K ∪ {P2j+1 } ∪ {P2j+2 };
19 if L > −∞ then
20 return x̃;
21 else
22 return “There is no feasible solution”;
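A compact implementation along these lines might look as follows (a sketch added for illustration: it uses scipy's LP solver, processes K in last-in-first-out order, and branches on a most fractional component; none of these choices are prescribed by Algorithm 5 itself):

```python
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, tol=1e-6):
    """Sketch of Algorithm 5: maximize c^t x over Ax <= b with x integral.
    Assumes the LP relaxation max{c^t x | Ax <= b} is feasible and bounded."""
    L, best = -np.inf, None
    stack = [([], [])]                      # each node: extra rows/rhs added by branching
    while stack:
        rows, rhs = stack.pop()             # last-in-first-out, i.e. depth-first search
        A_j = np.vstack([A] + rows) if rows else np.asarray(A, dtype=float)
        b_j = np.concatenate([b, rhs]) if rhs else np.asarray(b, dtype=float)
        res = linprog(-np.asarray(c, dtype=float), A_ub=A_j, b_ub=b_j, bounds=(None, None))
        if not res.success:                 # P_j is empty
            continue
        x, val = res.x, -res.fun
        if val <= L:                        # bounding step: P_j cannot improve on L
            continue
        frac = np.abs(x - np.round(x))
        if np.all(frac < tol):              # integral solution found: update L
            L, best = val, np.round(x)
            continue
        i = int(np.argmax(frac))            # branching variable (most fractional entry)
        e = np.zeros(len(x)); e[i] = 1.0
        stack.append((rows + [e],  rhs + [np.floor(x[i])]))    # x_i <= floor(x_i*)
        stack.append((rows + [-e], rhs + [-np.ceil(x[i])]))    # x_i >= ceil(x_i*)
    return best
```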
Figure 9 illustrates what the algorithm may do on this instance. Since the optimum solution of the LP-relaxation is not integral, we create in the first branching step two sub-polytopes P1 = {(x1, x2) | x2 ≤ 2} ∩ P0 and P2 = {(x1, x2) | x2 ≥ 3} ∩ P0 = ∅. In P1 we still do not find an integral optimum solution, so we branch again and get the polytopes P3 and P4. In P4 we get an integral optimum x^* = (1, 2) with cost 3. In P3 we get a non-integral optimum solution
(0, 1.5) whose cost is not better than the best integral solution found so far (provided that we
considered P4 before P3 ), so the algorithm will stop here.
A branch-and-bound computation is often represented by a so-called branch-and-bound tree.
This is in fact rather an arborescence than a tree. Its nodes are the polyhedra Pj that are
considered during the computation, and P0 is the root. For any Pj , the nodes P2j+1 and P2j+2
are its children (if they exist).
In line 5 of the algorithm, we have to choose the next LP to be solved, and in line 15 we have to
decide which non-integral component is used for creating new sub-problems. There are different
strategies for these steps (branching rules). For example, it is often reasonable to store the
elements of K in a last-in-first-out queue and to choose the last element that has been added to
K. In the branch-and-bound tree, this corresponds to a leaf with the biggest distance to the
root. This strategy can reduce the time until the first feasible solution has been found. Another
reasonable branching rule consists in choosing a polyhedron Pj for which max{c^t x | x ∈ Pj} is
as large as possible. Note that the maximum over all these values for all Pj ∈ K gives an upper
bound U on the best possible solution that can still be computed. Hence, by choosing a Pj with
max{ct x | x ∈ Pj } = U , we get a chance to reduce U . This can be useful if we do not want to
compute an exact optimum solution but we stop as soon as U − L is small enough.
For the choice of x^*_i a common strategy is to choose x^*_i such that |x^*_i − ⌊x^*_i⌋ − 1/2| is minimized.
Another, more time-consuming approach is to choose x∗i such that the effect on the objective
function is maximized (strong branching).
Further remarks:
• In order to get at least a finite algorithm, we have to guarantee that in line 8 we always find an integral optimum solution if Pj is integral.
• Instead of initializing L with −∞, it is often possible to compute some reasonable integral
solution by some heuristics. In particular this is often the case for combinatorial problems.
• The branch-and-bound strategy can be combined with a cutting-plane algorithm (see the
previous section). For each sub-polyhedron Pj , one can try to find hyperplanes separating
some non-integral vectors in Pj from (Pj )I . This combination is called branch-and-cut
method. For example, this approach has been used for solving quite large Traveling Salesman Problems (see Padberg and Rinaldi [1991]).
Fig. 9: Illustration of the Branch-and-Bound Algorithm on the example instance: the LP relaxation P0 (bounded by the lines −4x1 + 6x2 = 9 and x1 + x2 = 4), the sub-polytope P1 (with x2 ≤ 2, further bounded by x1 = 0 and x2 = 1) and the empty sub-polytope P2 (with x2 ≥ 3); the respective LP optima are marked by x∗.
Bibliography
Adler, I., Karp, R.M., Shamir, R. [1987]: A simplex variant solving an m × d linear program in
O(min(m2 , d2 )) expected number of steps. Journal of Complexity, 3, 372–387, 1987.
Ahuja, R.K., Magnanti, T.L., and Orlin, J.B. [1993]: Network Flows: Theory, Algorithms, and
Applications. Prentice Hall, 1993.
Anthony, M., and Harvey, M. [2012]: Linear Algebra: Concepts and Methods. Cambridge
University Press, 2012.
Bárány, I., Howe, R., and Lovász, L. [1992]: On integer points in polyhedra: a lower bound.
Combinatorica, 12, 135–142, 1992.
Bertsimas, D., and Tsitsiklis, J.N. [1997]: Introduction to Linear Optimization. Athena Scientific,
1997.
Bertsimas, D., and Weismantel, R. [2005]: Optimization over Integers. Dynamic Ideas, 2005.
Bland, R.G. [1977]: New finite pivoting rules for the simplex method. Mathematics of Operations
Research, 2, 103–107, 1977.
Borgwardt, K. [1982]: The average number of pivot steps required by the simplex method is
polynomial. Zeitschrift für Operations Research, 26, 157–177, 1982.
Chvátal, V. [1983]: Linear programming. Series of books in the mathematical sciences, W.H.
Freeman, 1983.
Cunningham, W.H. [1976]: A network simplex method. Mathematical Programming, 11, 105–116,
1976.
Dantzig, G.B. [1951]: Maximization of a linear function of variables subject to linear inequalities.
In: Koopmans, T.C (ed.), Activity Analysis of Production and Allocation, 359–373, Wiley,
1951.
Edmonds, J. [1965]: Maximum matching and polyhedron with (0,1) vertices. Journal of Research
of the National Bureau of Standards, B, 69, 125–130, 1965.
Eisenbrand, F. [2003]: Fast integer programming in fixed dimension. Lecture Notes in Computer
Science, 2832, 196–207, 2003.
Fischer, G. [2009]: Lineare Algebra: Eine Einführung für Studienanfänger. 18th edition, Springer,
2013.
Ghoulia-Houri, A. [1962]: Charactérisation des matrices totalement unimodulaires. Comptes
Rendus Hebdomadaires des Séances de l’Académie des Sciences (Paris), 254, 1192-1194, 1962.
Giles, F.R. and Pulleyblank, W.R. [1979]: Total dual integrality and integer polyhedra. Linear
Algebra and Its Applications, 25, 191–196, 1979.
Gomory, R.E. [1963]: An algorithm for integer solutions of linear programs. In: Recent Advances
in Mathematical Programing (R.L. Graves, P. Wolfe, eds.), McGraw-Hill, 269–302, 1963.
Grötschel, M., Lovász, L. and Schrijver, A. [1981]: The ellipsoid method and its consequences
in combinatorial optimization. Combinatorica, 1, 169–197, 1981.
Guenin, B., Könemann, J., and Tunçel, L. [2014]: A Gentle Introduction to Optimization.
Cambridge University Press, 2014.
Hoffman, A. and Kruskal, J. [1956]: Integral boundary points of convex polyhedra. Linear
Inequalities and Related Systems (H. Kuhn, A. Tucker, eds.), Annals of Mathematics Studies,
38, 223–246, 1956.
Hougardy, S., and Vygen, J. [2018]: Algorithmische Mathematik. Second edition, Springer, 2018.
Kalai, G., and Kleitman, D. [1992]: A quasi-polynomial bound for the diameter of graphs of
polyhedra. Bulletin of the American Mathematical Society, 26, 315–316, 1992.
Klee, V., and Minty, G.J. [1972]: How good is the simplex algorithm? In: Inequalities III (O.
Shisha, ed.), Academic Press, 159–175, 1972.
Korte, B., and Vygen, J. [2018]: Combinatorial Optimization: Theory and Algorithms. Sixth
edition, Springer, 2018.
Lee, T., Sidford, A., Wong, S.C. [2015]: A Faster Cutting Plane Method and its Implications
for Combinatorial and Convex Optimization. arxiv.org/abs/1508.04874, Symposium on
Foundations of Computer Science, 2015.
Lenstra, H.W. [1983]: Integer programming with a fixed number of variables. Mathematics of
Operations Research, 8, 538–548, 1983.
Matoušek, J., and Gärtner, B. [2007]: Understanding and Using Linear Programming. Springer,
2007.
Megiddo, N. [1984]: Linear programming in linear time when the dimension is fixed. Journal of
the ACM, 31, 114–127, 1984.
Mehlhorn, K., and Saxena, S. [2015]: A still simpler way of introducing the interior-point
method for linear programming. Computer Science Review, 22, 1–11, 2016.
Padberg, M. [1999]: Linear Optimization and Extensions. Second edition, Springer, 1999
Padberg, M., and Rao, M. [1982]: Odd minimum cut-sets and b-matchings. Mathematics of
Operations Research, 7, 67–80, 1982.
Padberg, M., and Rinaldi, G. [1991]: A Branch-and-Cut Algorithm for the Resolution of
Large-Scale Symmetric Traveling Salesman Problems. SIAM Review, 33, 1, 60–100, 1991.
Panik, M.J. [1996]: Linear Programming: Mathematics, Theory and Algorithms. Kluwer
Academic Publishers, 1996.
Roos, C., Terlaky, T., Vial, J.-P. [2005]: Interior Point Methods for Linear Optimization.
Second edition, Springer, 2005.
Rubin, D. [1970]: On the unlimited number of faces in integer hulls of linear programs with a
single constraint. Operations Research, 18, 5, 940 – 946, 1970.
Sierksma, G., and Zwols, Y. [2015]: Linear and Integer Optimization. Theory and Practice.
Third edition, CRC Press, 2015.
Spielmann, D.A., and Teng, S.-H. [2005]: Smoothed analysis of algorithms: Why the simplex
algorithm usually takes polynomial time. Journal of the ACM, 51, 3, 385 – 463, 2004.
Strang, G. [1980]: Linear Algebra and Its Applications. Second edition, Academic Press, 1980.
Tardos, É. [1986]: A strongly polynomial algorithm to solve combinatorial linear programs.
Operations Research, 34, 2, 250 – 256, 1986.
Terlaky, R.J. [2001]: An easy way to teach interior point methods. European Journal of
Operational Research, 130, 1–19, 2001.
Vanderbei, R.J. [2014]: Linear Programming: Foundations and Extensions. Fourth edition,
Springer, 2014.
Ye, Y. [1992]: On the finite convergence of interior-point algorithms for linear programming.
Mathematical Programming, 57, 325–335, 1992.
Ye, Y. [1997]: Interior Point Algorithms. Theory and Analysis. Wiley, 1997.
Index
Active row, 108
Affine linear mapping, 33
Augmentation of flow, 60
b-flow, 59
b-flow associated to a spanning tree structure, 62
Basic solution, 34
Basic variables, 46
Basis, 46
Binary Linear Programs, 101
Bland’s rule, 55
Bounded optimization problem, 6
Branch-and-Bound Algorithm, 121
Branch-and-bound tree, 122
Branch-and-cut method, 122
Branching rule, 122
Face, 34
Facet, 36
Facet-defining, 36
Farkas’ Lemma, 21
Farkas-Minkowski-Weyl Theorem, 40
Feasible basic solution, 46
Feasible basis, 46
Feasible spanning tree structure, 62
Finitely generated cone, 15
Fourier-Motzkin elimination, 20
Fundamental circuit, 63
Fundamental Theorem of Linear Inequalities, 40
Gaussian Elimination, 45, 69
Gomory-Chvátal cut, 117
Gomory-Chvátal-truncation, 116
Matching, 82
Max-Flow-Min-Cut-Theorem, 31
Maximization problem, 6
Maximum-Flow Problem, 9
Minimal face, 37
Minimally TDI, 107
Minimization problem, 6
Minimum-Cost Flow Problem, 60
Minkowski sum, 42
Mixed Integer Linear Program, 9
Network Simplex Algorithm, 64
Non-basic variables, 46
Non-degenerated feasible basic solution, 46
Normal, 13
Objective function, 6
Optimization problem, 6
Peak, 63
Permutation matrix, 115
Pivot rule, 54
Pointed polyhedron, 39
Polar cone, 41
Polyhedral cone, 14
Polyhedron, 13
Polytope, 13
Positive definite matrix, 71
Positive semidefinite matrix, 71
Potential associated to a spanning tree structure, 62
Primal LP, 18
Projection of a polyhedron, 33
r-R-sandwiched set, 83
Reduced cost, 52
Residual capacity, 60
Residual graph, 60
Revised Simplex Algorithm, 58
s-t-cut, 31
s-t-flow, 9
Separation oracle, 75
Simplex Algorithm, 53
Simplex tableau, 51
Slack variable, 7
Spanning tree solution, 60
Spanning tree structure, 62
Stable Set Problem, 12
Standard equation form, 7
Standard inequality form, 7
Steepest edge rule, 54
Strict complementary slackness, 29
Strongly feasible spanning tree structure, 62
Supporting hyperplane, 34
TDI-system, 106
Totally dual integral, 106
Totally unimodular matrix, 111
Unbounded optimization problem, 6
Unimodular matrix, 111
Vertex, 34
Vertex Cover Problem, 11
Weak duality, 18
Weak optimization problem, 83
Weak separation oracle, 84