Linear and Integer Optimization (V3C1/F4C1): Lecture Notes

Ulrich Brenner
Research Institute for Discrete Mathematics, University of Bonn

Summer term 2019
July 11, 2019
Preface
Continuous updates of these lecture notes can be found on the following webpage:
http://www.or.uni-bonn.de/lectures/ss19/lgo_ss19.html
These lecture notes are based on a number of textbooks and lecture notes from earlier courses. See
e.g. the lecture notes by Tim Nieberg (winter term 2012/2013) and Stephan Held (winter term
2013/2014 and 2017/18) that are available online on the teaching web pages of the Research Insti-
tute for Discrete Mathematics, University of Bonn (http://www.or.uni-bonn.de/lectures).
Recommended textbooks:
• Chvátal [1983]: Still a good introduction into the field of linear programming.
• Korte and Vygen [2018]: Chapters 3–5 contain the most important results of this lecture
course. Very compact description.
• Matoušek and Gärtner [2007]: Very good description of the linear programming part. For
some results, proofs are missing, and the book does not consider integer programming.
• Schrijver [1986]: Comprehensive textbook covering both linear and integer programming.
Proofs are short but precise.
Prerequisites of this course are the lectures “Algorithmische Mathematik I” and “Lineare
Algebra I/II”. The lecture “Algorithmische Mathematik I” is covered by the textbook by
Hougardy and Vygen [2018]. The results concerning Linear Algebra that are used in this course
can be found, e.g., in the textbooks by Anthony and Harvey [2012], Bosch [2007], and Fischer
[2009].
We also make use of some basic results from complexity theory as they are taught in the lecture course “Einführung in die Diskrete Mathematik”. These results on complexity theory can be found e.g. in Chapter 15 of the textbook by Korte and Vygen [2018].
The notation concerning graphs is based on the notation proposed in the textbook by Korte
and Vygen [2018].
Please report any errors in these lecture notes to brenner@or.uni-bonn.de.
Contents

1 Introduction
  1.1 A First Example
  1.2 Optimization Problems
  1.3 Possible Outcomes
  1.4 Integrality Constraints
  1.5 Modeling of Optimization Problems as (Integral) Linear Programs
  1.6 Polyhedra

2 Duality
  2.1 Dual LPs
  2.2 Fourier-Motzkin Elimination
  2.3 Farkas' Lemma
  2.4 Strong Duality
  2.5 Complementary Slackness

3 The Structure of Polyhedra

4 Simplex Algorithm
  4.1 Feasible Basic Solutions
  4.2 The Simplex Method
  4.3 Efficiency of the Simplex Algorithm
  4.4 Dual Simplex Algorithm
  4.5 Network Simplex

5 Sizes of Solutions
  5.1 Gaussian Elimination

6 Ellipsoid Method
  6.1 Idealized Ellipsoid Method
  6.2 Error Analysis
  6.3 Ellipsoid Method for Linear Programs
  6.4 Separation and Optimization
1 Introduction

1.1 A First Example
Assume that a farmer has 10 hectares of land where he can grow two kinds of crops: maize and
wheat (or a combination of both). For each hectare of maize he gets a revenue of 2 units of
money and for each hectare of wheat he gets 3 units of money. Planting maize in an area of one
hectare takes him 1 day while planting wheat takes him 2 days per hectare. In total, he has
16 days for the work on his field. Moreover, each hectare planted with maize needs 5 units of
water and each hectare planted with wheat needs 2 units of water. In total he has 40 units of
water. How can he maximize his revenue?
If x1 is the number of hectares planted with maize and x2 is the number of hectares planted with wheat, we can write the corresponding optimization problem in the following compact way:

max 2x1 + 3x2
s.t.  x1 +  x2 ≤ 10
      x1 + 2x2 ≤ 16
     5x1 + 2x2 ≤ 40
      x1 , x2 ≥ 0
This is what we call a linear program (LP). In such an LP, we are given a linear objective function (in our case (x1 , x2 ) ↦ 2x1 + 3x2 ) that has to be maximized or minimized under a
number of linear constraints. These constraints can be given by linear inequalities (but not
strict inequalities “<”) or by linear equations. However, a linear equation can easily be replaced
by a pair of inequalities (e.g. 4x1 + 3x2 = 7 is equivalent to 4x1 + 3x2 ≤ 7 and 4x1 + 3x2 ≥ 7),
so we may assume that all constraints are given by linear inequalities.
In our example, there were only two variables, x1 and x2 . In this case, linear programs can be
solved graphically. Figure 1 illustrates the method. The grey area is the set

{(x1 , x2 ) ∈ R2 | x1 + x2 ≤ 10, x1 + 2x2 ≤ 16, 5x1 + 2x2 ≤ 40, x1 , x2 ≥ 0},

which is the set of all feasible solutions of our problem. We can solve the problem by moving the green line, which is orthogonal to the cost vector (2, 3)t (shown in red), in the direction of (2, 3)t as long as it intersects the feasible area. We end up with x1 = 4 and x2 = 6, which is in fact an
optimum solution.
(Figure 1: the feasible region in the (x1 , x2 )-plane bounded by the lines x1 + x2 = 10, x1 + 2x2 = 16 and 5x1 + 2x2 = 40, together with the cost vector and a level line of the objective function.)
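The graphical solution can also be checked with a solver. The following is a small sketch, not part of the original notes, assuming SciPy is available; since scipy.optimize.linprog minimizes, the objective 2x1 + 3x2 is negated.

```python
# Sketch (assumes SciPy): solve the farmer's LP numerically.
from scipy.optimize import linprog

c = [-2, -3]            # negated objective: maximize 2*x1 + 3*x2
A_ub = [[1, 1],         # x1 +  x2 <= 10  (land)
        [1, 2],         # x1 + 2x2 <= 16  (working days)
        [5, 2]]         # 5x1 + 2x2 <= 40 (water)
b_ub = [10, 16, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.x, -res.fun)  # expected: x1 = 4, x2 = 6 with revenue 26
```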
1.2 Optimization Problems

In this lecture course, we consider optimization problems with linear objective functions and linear constraints. The constraints can be written in a compact way using matrices:
Linear Programming
Instance: A matrix A ∈ Rm×n , vectors c ∈ Rn and b ∈ Rm .
Task: Find a vector x ∈ Rn with Ax ≤ b maximizing ct x.
Notation: Unless stated differently, always let A = (aij )i=1,...,m; j=1,...,n ∈ Rm×n , b = (b1 , . . . , bm ) ∈ Rm and c = (c1 , . . . , cn ) ∈ Rn .
Remark: Real vectors are simply ordered sets of real numbers. But when we multiply vectors
with each other or with matrices, we have to interpret them as n × 1-matrices (column vectors)
or as 1 × n-matrices (row vectors). By default, we consider vectors as column vectors in this
context, so if we want to use them as row vectors, we have to transpose them (“ct ”).
We often write linear programs in the following way:
max ct x
s.t. Ax ≤ b        (1)
This is the standard inequality form. A linear program in standard equation form is written as

max ct x
s.t. Ax = b        (2)
     x ≥ 0

Both standard forms can be transformed into each other: If we are given a linear program in standard equation form, we can replace each equation by a pair of inequalities and the constraint x ≥ 0 by −In x ≤ 0 (where In is always the n × n-identity matrix). This leads to a formulation of the same linear program in standard inequality form.
The transformation from the standard inequality form into the standard equation form is slightly
more complicated: Assume we are given the following linear program in standard inequality
form
max ct x
s.t. Ax ≤ b        (3)
We replace each variable xi by two variables zi and z̄i . Moreover, for each of the m constraints
we introduce a new variable x̃i (a so-called slack variable). With variables z = (z1 , . . . , zn ),
z̄ = (z̄1 , . . . , z̄n ) and x̃ = (x̃1 , . . . , x̃m ), we state the following LP in standard equation form:

max ct z − ct z̄
s.t. [A | −A | Im ] (z, z̄, x̃) = b        (4)
     z, z̄, x̃ ≥ 0
Note that [A | −A | Im ] is the m × (2n + m)-matrix that we get by concatenating the matrices A, −A and Im . Any solution z, z̄ and x̃ of the LP (4) gives a solution of the LP (3) with the same cost by setting xj := zj − z̄j (for j ∈ {1, . . . , n}).
On the other hand, if x is a solution of LP (3), then we get a solution of LP (4) with the same cost by setting zj := max{xj , 0}, z̄j := − min{xj , 0} (for j ∈ {1, . . . , n}) and x̃i := bi − Σ_{j=1}^n aij xj (for i ∈ {1, . . . , m}, where Σ_{j=1}^n aij xj ≤ bi is the i-th constraint of Ax ≤ b).
Note that (in contrast to the first transformation) this second transformation (from the standard
inequality form into the standard equation form) leads to a different solution space because we
have to introduce new variables.
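As a small illustration of this second transformation, the following sketch (my own helper, not from the notes; it assumes NumPy) builds the data of LP (4) from the data of LP (3):

```python
import numpy as np

def inequality_to_equation_form(A, b, c):
    """Transform max{c^t x | Ax <= b} into max{c_eq^t w | A_eq w = b, w >= 0}
    with w = (z, zbar, slack) and x = z - zbar, as in LP (4)."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    m, n = A.shape
    A_eq = np.hstack([A, -A, np.eye(m)])            # [A | -A | I_m]
    c_eq = np.concatenate([c, -c, np.zeros(m)])     # objective ignores the slack variables
    return A_eq, np.asarray(b, dtype=float), c_eq
```

A solution w = (z, z̄, x̃) of the transformed LP is turned back into a solution of (3) via x = z − z̄.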
1.3 Possible Outcomes

There are three possible outcomes for a linear program max{ct x | Ax ≤ b}:

• The LP is infeasible, i.e. there is no x ∈ Rn with Ax ≤ b.

• The LP is unbounded, i.e. for every K ∈ R there is a feasible solution x with ct x > K.

• The LP has an optimum solution, i.e. a feasible solution x∗ with ct x∗ ≥ ct x for all feasible solutions x.

We will see that deciding if a linear program is feasible is as hard as computing an optimum solution to a feasible and bounded linear program (see Section 2.4).
1.4 Integrality Constraints

In many applications, we need an integral solution. This leads to the following class of problems:

Integer Linear Programming
Instance: A matrix A ∈ Rm×n , vectors c ∈ Rn and b ∈ Rm .
Task: Find a vector x ∈ Zn with Ax ≤ b maximizing ct x.
Replacing the constraint x ∈ Rn by x ∈ Zn makes a huge difference. We will see that
there are polynomial-time algorithms for Linear Programming while Integer Linear
Programming is NP-hard.
Of course, one can also consider optimization problems where we have integrality constraints
only for some of the variables. These linear optimization problems are called Mixed Integer
Linear Programs.
1.5 Modeling of Optimization Problems as (Integral) Linear Programs

We consider some examples how optimization problems can be modeled as LPs or ILPs. Many flow problems can easily be formulated as linear programs:
Definition 2 Let G be a directed graph with capacities u : E(G) → R>0 and let s and t
be two vertices of G. A feasible s-t-flow in (G, u) is a mapping f : E(G) → R≥0 with f(e) ≤ u(e) for all e ∈ E(G) and Σ_{e∈δG+(v)} f(e) − Σ_{e∈δG−(v)} f(e) = 0 for all v ∈ V(G) \ {s, t}. The value of f is Σ_{e∈δG+(s)} f(e) − Σ_{e∈δG−(s)} f(e).
Maximum-Flow Problem
Instance: A directed graph G, capacities u : E(G) → R>0 , vertices s, t ∈ V (G) with s ≠ t.
Task: Find an s-t-flow f : E(G) → R≥0 of maximum value.
max Σ_{e∈δG+(s)} xe − Σ_{e∈δG−(s)} xe
s.t. xe ≥ 0                                              for e ∈ E(G)
     xe ≤ u(e)                                           for e ∈ E(G)              (7)
     Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe = 0               for v ∈ V(G) \ {s, t}
It is well known that the value of a maximum s-t-flow equals the capacity of a minimum cut
separating s from t. We will see in Section 2.5 that this result also follows from properties of the
linear program formulation. Moreover, if the capacities are integral, there is always a maximum
flow that is integral (see Section 8.4).
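The LP formulation (7) can be fed directly to an LP solver. The sketch below (my own example; the graph, capacities and vertex names are made up, and SciPy is assumed) builds and solves (7) for a small digraph; the bounds 0 ≤ xe ≤ u(e) are passed as variable bounds.

```python
import numpy as np
from scipy.optimize import linprog

edges = [('s', 'a'), ('s', 'b'), ('a', 'b'), ('a', 't'), ('b', 't')]
u = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
s, t = 's', 't'

m = len(edges)
c = np.zeros(m)                                # negated objective (linprog minimizes)
for j, (v, w) in enumerate(edges):
    if v == s: c[j] -= 1                       # edge leaves s
    if w == s: c[j] += 1                       # edge enters s

inner = sorted({x for e in edges for x in e} - {s, t})
A_eq = np.zeros((len(inner), m))               # flow conservation at all v != s, t
for i, vtx in enumerate(inner):
    for j, (v, w) in enumerate(edges):
        if v == vtx: A_eq[i, j] += 1
        if w == vtx: A_eq[i, j] -= 1
b_eq = np.zeros(len(inner))

bounds = [(0, u[e]) for e in edges]            # 0 <= x_e <= u(e)
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(dict(zip(edges, res.x)), -res.fun)       # maximum flow value (5 for this data)
```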
In some cases, we first have to modify a given optimization problem slightly in order to get a
linear program formulation. See the following example of a modified version of the Maximum-
Flow Problem where we have two sources and want to maximize the minimal out-flow of
both sources.
The objective function here is not a linear function but the minimum of two linear functions. To see how such a problem can be written as an LP, we assume slightly more generally that we are given the following optimization problem:
max min{ct x + d, et x + f }
s.t. Ax ≤ b

for some c, e ∈ Rn and d, f ∈ R. With an additional variable σ, this problem can be written equivalently as the linear program

max σ
s.t. σ − ct x ≤ d
     σ − et x ≤ f
     Ax ≤ b

Another problem of this kind is

min |ct x + d|
s.t. Ax ≤ b
for some c ∈ Rn and d ∈ R. Again the problem can be written equivalently as a linear program
in the following form:
max −σ
s.t. −σ − ct x ≤ d
−σ + ct x ≤ −d
Ax ≤ b
The two additional constraints on σ ensure that we have σ ≥ max{ct x + d, −ct x − d} = |ct x + d|.
Other problems allow a formulation as an ILP but presumably not an LP formulation. An example is the following problem:

Vertex Cover Problem
Instance: An undirected graph G and vertex weights c : V(G) → R≥0 .
Task: Find a set X ⊆ V(G) with X ∩ {v, w} ≠ ∅ for all {v, w} ∈ E(G) minimizing Σ_{v∈X} c(v).
This problem is known to be NP-hard (see standard textbooks like Korte and Vygen [2018]),
so we cannot hope for a polynomial-time algorithm. Nevertheless, the problem can easily be
formulated as an integer linear program:
min Σ_{v∈V(G)} xv c(v)
s.t. xv + xw ≥ 1     for {v, w} ∈ E(G)        (8)
     xv ∈ {0, 1}     for v ∈ V(G)
For each vertex v ∈ V (G), we have a 0-1-variable xv which is 1 if and only if v should be in the
set X, i.e. if (xv )v∈V (G) is an optimum solution to (8), the set X = {v ∈ V (G) | xv = 1} is an
optimum solution to the Vertex Cover Problem.
This example shows that Integer Linear Programming itself is an NP-hard problem. By
skipping the integrality constraints (xv ∈ {0, 1}) we get the following linear program:
min Σ_{v∈V(G)} xv c(v)
s.t. xv + xw ≥ 1     for {v, w} ∈ E(G)
     xv ≥ 0          for v ∈ V(G)             (9)
     xv ≤ 1          for v ∈ V(G)
We call this linear program an LP-relaxation of (8). In this particular case, the relaxation
gives a 2-approximation of the Vertex Cover Problem: For any solution x of the relaxed
problem, we get an integral solution x̃ by setting
x̃v := 1 if xv ≥ 1/2 and x̃v := 0 if xv < 1/2.

It is easy to check that this yields a feasible solution of the ILP with Σ_{v∈V(G)} x̃v c(v) ≤ 2 Σ_{v∈V(G)} xv c(v).
Obviously, in minimization problems relaxing some constraints can only decrease the value of
an optimum solution. We call the supremum of the ratio between the values of the optimum
solutions of an ILP and its LP-relaxation the integrality gap of the relaxation. The rounding
procedure described above also proves that in this case the integrality gap is at most 2. Indeed, 2 is exactly the integrality gap, as the example of a complete graph with weights c(v) = 1 for all vertices v shows. For the Maximum-Flow Problem with integral edge capacities, the integrality gap is 1 because there is always an optimum flow that is integral.
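The rounding argument can be tried out directly. The sketch below (my own illustration with a made-up example graph; SciPy assumed) solves the LP-relaxation (9) and applies the threshold rounding:

```python
from scipy.optimize import linprog

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
c = [1.0, 1.0, 1.0, 1.0]                        # vertex weights c(v)

# x_v + x_w >= 1 for every edge, written as -x_v - x_w <= -1
A_ub = [[-1 if i in edge else 0 for i in V] for edge in E]
b_ub = [-1] * len(E)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))
cover = {v for v in V if res.x[v] >= 0.5}       # rounding: x~_v = 1 iff x_v >= 1/2
print(res.x, cover, sum(c[v] for v in cover))   # LP value <= cost of cover <= 2 * LP value
```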
The following problem, the Stable Set Problem, is NP-hard as well: given a graph G and vertex weights c : V(G) → R≥0 , find a set of pairwise non-adjacent vertices of maximum total weight. Skipping the integrality constraints of the corresponding ILP leads to the following LP-relaxation:
max Σ_{v∈V(G)} xv c(v)
s.t. xv + xw ≤ 1     for {v, w} ∈ E(G)
     xv ≥ 0          for v ∈ V(G)             (11)
     xv ≤ 1          for v ∈ V(G)
Unfortunately, in this case, the LP-relaxation is of no use. Even if G is a complete graph (where a feasible solution of the Stable Set Problem can contain at most one vertex), setting xv = 1/2 for all v ∈ V(G) would be a feasible solution of the LP-relaxation. This example shows that the integrality gap is at least n/2. Hence, this LP-relaxation does not provide any useful information about a good ILP solution.
1.6 Polyhedra
Remark: It is easy to check that the convex hull of a set X ⊆ Rn is the (inclusion-wise)
minimal convex set containing X.
Definition 5 Let X ⊆ Rn for some n ∈ N.
• X = Rn
X = ∩_{j=1,...,m} {x ∈ Rn | atj x ≤ bj } = ∩_{j=1,...,m : aj ≠ 0} {x ∈ Rn | atj x ≤ bj },
Definition 6 The dimension of a set X ⊆ Rn is

dim(X) := n − max{rank(A) | A is a matrix with A(x − y) = 0 for all x, y ∈ X}.

In other words, the dimension of X ⊆ Rn is n minus the maximum size of a set of linearly independent vectors that are orthogonal to any difference of elements in X. For example, the empty set and sets consisting of exactly one vector have dimension 0. The set Rn has dimension n.
Observation: The dimension of a set X ⊆ Rn is the largest d for which X contains elements
v0 , v1 , . . . , vd such that v1 − v0 , v2 − v0 , . . . , vd − v0 are linearly independent.
Observation: A non-empty set X ⊆ Rn is a convex cone if and only if X is convex and for all
x ∈ X and λ ∈ R≥0 we have λx ∈ X.
cone({x1 , . . . , xm }) := { Σ_{i=1}^m λi xi | λ1 , . . . , λm ≥ 0 }.
A convex cone C is called finitely generated if there are vectors x1 , . . . , xm ∈ Rn with
C = cone({x1 , . . . , xm }).
It is easy to check that cone({x1 , . . . , xm }) is indeed a convex cone. We will see in Section 3.5
that a cone is polyhedral if and only if it is finitely generated.
2 Duality

2.1 Dual LPs

Consider the following linear program (P):

max 12x1 + 10x2
s.t.  4x1 +  2x2 ≤ 5
      8x1 + 12x2 ≤ 7
      2x1 −  3x2 ≤ 1

How can we find upper bounds on the value of an optimum solution? By combining the first two constraints we can get the following bound for any feasible solution (x1 , x2 ):
12x1 + 10x2 = 2 · (4x1 + 2x2 ) + (1/2) · (8x1 + 12x2 ) ≤ 2 · 5 + (1/2) · 7 = 13.5.
We can even do better by combining the last two inequalities:
12x1 + 10x2 = (7/6) · (8x1 + 12x2 ) + (4/3) · (2x1 − 3x2 ) ≤ (7/6) · 7 + (4/3) · 1 = 9.5.
More generally, for computing upper bounds we ask for non-negative numbers u1 , u2 , u3 such
that
12x1 + 10x2 = u1 · (4x1 + 2x2 ) + u2 · (8x1 + 12x2 ) + u3 · (2x1 − 3x2 ).
Then, 5 · u1 + 7 · u2 + 1 · u3 is an upper bound on the value of any solution of (P), so we want to choose u1 , u2 , u3 in such a way that 5 · u1 + 7 · u2 + 1 · u3 is minimized.
This leads us to the following linear program (D):

min 5u1 + 7u2 + u3
s.t. 4u1 +  8u2 + 2u3 = 12
     2u1 + 12u2 − 3u3 = 10
     u1 , u2 , u3 ≥ 0
This linear program is called the dual linear program of (P). Any solution of (D) yields an upper bound on the optimum value of (P), and in this particular case it turns out that u1 = 0, u2 = 7/6, u3 = 4/3 (the second solution from above) with value 9.5 is an optimum solution of (D) because x1 = 11/16, x2 = 1/8 is a solution of (P) with value 9.5.
For a general linear program (P)
max ct x
s.t. Ax ≤ b
in standard inequality form we define its dual linear program (D) as
min bt y
s.t. At y = c
y ≥ 0
In this context, we call the linear program (P) primal linear program.
Remark: Note that the dual linear program does not only depend on the objective function
and the solution space of the primal linear program but on its description by linear inequalities.
For example adding redundant inequalities to the system Ax ≤ b will lead to more variables in
the dual linear program.
Indeed, weak duality holds: if x is a feasible solution of (P) and y is a feasible solution of (D), then

ct x = (At y)t x = yt Ax ≤ yt b.  □
Remark: The term “dual” implies that applying the transformation from (P) to (D) twice
yields (P) again. This is not exactly the case but it is not very difficult to see that dualizing (D)
(after transforming it into standard equational form) gives a linear program that is equivalent
to (P) (see the exercises).
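Weak and strong duality can also be observed numerically. The following sketch (my own check, assuming SciPy; the data is the example pair (P), (D) from the beginning of this section) solves both programs and compares their optimum values:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[4, 2], [8, 12], [2, -3]], dtype=float)
b = np.array([5, 7, 1], dtype=float)
c = np.array([12, 10], dtype=float)

primal = linprog(-c, A_ub=A, b_ub=b, bounds=(None, None))   # max{c^t x | Ax <= b}, x free
dual = linprog(b, A_eq=A.T, b_eq=c, bounds=(0, None))       # min{b^t y | A^t y = c, y >= 0}

print(-primal.fun, dual.fun)   # both values are 9.5, as claimed above
```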
2.2 Fourier-Motzkin Elimination

Consider the following system of linear inequalities:

3x + 2y + 4z ≤ 10
3x + 2z ≤ 9
2x − y ≤ 5
(12)
−x + 2y − z ≤ 3
−2x ≤ 4
2y + 2z ≤ 7
Assume that we just want to decide if a feasible solution x, y, z exists. The goal is to get rid of
the variables one after the other. To get rid of x, we first reformulate the inequalities such that
we can easily see lower and upper bounds for x:

x ≤ 10/3 − (2/3)y − (4/3)z
x ≤ 3 − (2/3)z
x ≤ 5/2 + (1/2)y
x ≥ −3 + 2y − z                    (13)
x ≥ −2
2y + 2z ≤ 7
This system of inequalities has a feasible solution if and only if the following system (that does
not contain x) has a solution:
−3 + 2y − z ≤ 10/3 − (2/3)y − (4/3)z
−3 + 2y − z ≤ 3 − (2/3)z
−3 + 2y − z ≤ 5/2 + (1/2)y
−2 ≤ 10/3 − (2/3)y − (4/3)z
−2 ≤ 3 − (2/3)z
−2 ≤ 5/2 + (1/2)y
2y + 2z ≤ 7
Note that this method, which is called Fourier-Motzkin elimination, is in general very inefficient. If m is the number of inequalities in the initial system, it may be necessary to state m²/4 inequalities in the system with one variable less (this is the case if there are m/2 inequalities that gave an upper bound on the variable we got rid of and m/2 inequalities that gave a lower bound).
Nevertheless, the Fourier-Motzkin elimination can be used to get a certificate that a given
system of inequalities does not have a feasible solution. In the proof of the following theorem
we give a general description of one iteration of the method:
Theorem 4 Let A ∈ Rm×n and b ∈ Rm (with n ≥ 1). Then there are Ã ∈ Rm̃×(n−1) and b̃ ∈ Rm̃ with m̃ ≤ max{m, m²/4} such that
(a) Each inequality in the system Ãx̃ ≤ b̃ is a positive linear combination of inequalities
from Ax ≤ b
(b) The system Ax ≤ b has a solution if and only if Ãx̃ ≤ b̃ has a solution.
Proof: Denote the entries of A by aij , i.e. A = (aij )i=1,...,m; j=1,...,n . We will show how to get rid of the variable with index 1. To this end, we partition the index set {1, . . . , m} of the rows into three disjoint sets U , L, and N :

U := {i | ai1 > 0},  L := {i | ai1 < 0},  N := {i | ai1 = 0}.
We can assume that |ai1 | = 1 for all i ∈ U ∪ L (otherwise we divide the corresponding inequality
by |ai1 |).
For vectors ãi = (ai2 , . . . , ain ) and x̃ = (x2 , . . . , xn ) (that are empty if n = 1), we replace the inequalities that correspond to indices in U and L by

ãtu x̃ + ãtl x̃ ≤ bu + bl     for u ∈ U, l ∈ L. (17)
Obviously, each of these |U |·|L| new inequalities is simply the sum of two of the given inequalities
(and hence a positive linear combination of them).
The inequalities with index in N are rewritten as
ãtl x̃ ≤ bl l ∈ N. (18)
The inequalities in (17) and (18) form a set of inequalities Ãx̃ ≤ b̃ with n − 1 variables, and each
solution of Ax ≤ b gives a solution of Ãx̃ ≤ b̃ by restricting x = (x1 , . . . , xn ) to (x2 , . . . , xn ).
On the other hand, if x̃ = (x2 , . . . , xn ) is a solution of Ãx̃ ≤ b̃, then we can set x̃1 to any value in the (non-empty) interval

[ max_{l∈L} (ãtl x̃ − bl ) , min_{u∈U} (bu − ãtu x̃) ],

where we set the minimum of an empty set to ∞ and the maximum of an empty set to −∞.
Then, x = (x̃1 , x2 , . . . , xn ) is a solution of Ax ≤ b. 2
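One elimination step as in the proof of Theorem 4 can be written down in a few lines. The following sketch (my own helper, assuming NumPy) eliminates the first variable of a system Ax ≤ b:

```python
import numpy as np

def fourier_motzkin_step(A, b):
    """Return (A_new, b_new) such that Ax <= b has a solution iff
    A_new (x_2, ..., x_n) <= b_new has one (one Fourier-Motzkin step)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    U = [i for i in range(len(b)) if A[i, 0] > 0]   # rows giving upper bounds on x_1
    L = [i for i in range(len(b)) if A[i, 0] < 0]   # rows giving lower bounds on x_1
    N = [i for i in range(len(b)) if A[i, 0] == 0]  # rows without x_1
    rows, rhs = [], []
    for iu in U:
        for il in L:
            # scale both rows so that the coefficient of x_1 is +1 resp. -1, then add them
            ru, bu = A[iu] / A[iu, 0], b[iu] / A[iu, 0]
            rl, bl = A[il] / -A[il, 0], b[il] / -A[il, 0]
            rows.append((ru + rl)[1:]); rhs.append(bu + bl)
    for i in N:
        rows.append(A[i, 1:]); rhs.append(b[i])
    return np.array(rows), np.array(rhs)
```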
2.3 Farkas' Lemma

Theorem 5 (Farkas' Lemma) For A ∈ Rm×n and b ∈ Rm , exactly one of the two following systems has a feasible solution:

System 1: Ax ≤ b

System 2: yt A = 0t , yt b < 0, y ≥ 0

Theorem 5 follows from Theorem 4: eliminating all variables by Fourier-Motzkin elimination either keeps the system solvable (and a solution of Ax ≤ b can be recovered), or it produces an inequality 0 ≤ b′ with b′ < 0 that, by Theorem 4 (a), is a positive linear combination of the inequalities of Ax ≤ b.

Theorem 6 (Farkas' Lemma, most general case) For A ∈ Rm1 ×n1 , B ∈ Rm1 ×n2 , C ∈
Rm2 ×n1 , D ∈ Rm2 ×n2 , a ∈ Rm1 and b ∈ Rm2 exactly one of the two following systems has
a feasible solution:
System 1:
Ax + By ≤ a
Cx + Dy = b (19)
x ≥ 0
System 2:
ut A + vt C ≥ 0t
ut B + vt D = 0t
u ≥ 0                    (20)
ut a + vt b < 0
Proof: The first system is equivalent to
Ax + By ≤ a
Cx + Dy ≤ b
−Cx − Dy ≤ −b
−In1 x ≤ 0
By Theorem 5, this system has a solution if and only if the following system (with variables u ∈ Rm1 , v1 , v2 ∈ Rm2 and w ∈ Rn1 ) does not have a solution:

ut A + v1t C − v2t C − wt = 0t
ut B + v1t D − v2t D = 0t
ut a + v1t b − v2t b < 0
u, v1 , v2 , w ≥ 0

Obviously, this system has a solution if and only if the second system of the theorem has a solution (set v := v1 − v2 , respectively v1 := max{v, 0}, v2 := max{−v, 0} and w := (ut A + vt C)t ). □
Corollary 7 (Farkas' Lemma, further variants) For A ∈ Rm×n and b ∈ Rm , the following statements hold:

(a) Either there is a vector x ≥ 0 with Ax = b, or there is a vector u ∈ Rm with ut A ≥ 0t and ut b < 0 (but not both).

(b) Either there is a vector x with Ax = b, or there is a vector u ∈ Rm with ut A = 0t and ut b < 0 (but not both).

Proof: Restrict the statement of Theorem 6 to the vector b and the matrix C (for part (a)) or D (for part (b)). □
Remark: Statement (a) of Corollary 7 has a nice geometric interpretation. Let C be the cone
generated by the columns of A. Then, the vector b is either in C or there is a hyperplane (given
by the normal u) that separates b from C.
As an example consider A = [ 2 3 ; 1 1 ], b1 = (5, 2)t and b2 = (1, 3)t (see Figure 2). The vector b1 is in the cone generated by the columns of A (because (5, 2)t = (2, 1)t + (3, 1)t ) while b2 can be separated from the cone by a hyperplane orthogonal to u = (1, −2)t .
(Figure 2: the cone generated by the columns of A, the vectors b1 and b2 , and the hyperplane orthogonal to u that separates b2 from the cone.)
2.4 Strong Duality
max ct x (P )
s.t. Ax ≤ b
and
min bt y (D)
s.t. At y = c
y ≥ 0
Theorem 8 For the pair of linear programs (P) and (D) above, exactly one of the following statements holds:

1. Neither (P) nor (D) has a feasible solution.

2. (P) is unbounded and (D) is infeasible.

3. (D) is unbounded and (P) is infeasible.

4. Both (P) and (D) have a feasible solution. Then both have an optimal solution, and for an optimal solution x̃ of (P) and an optimal solution ỹ of (D), we have ct x̃ = bt ỹ.
ũ := z1 u. This implies At ũ = c and ũ ≥ 0, so ũ is a feasible solution of (D). Therefore (D) is
feasible. It is bounded as well because of the weak duality.
It remains to show that there are feasible solutions x of (P) and y of (D) such that ct x ≥ bt y.
This is the case if (and only if) the following system has a feasible solution:
Ax ≤ b
At y = c
−ct x + bt y ≤ 0
y ≥ 0
By Theorem 6, this is the case if and only if the following system (with variables u ∈ Rm ,
v ∈ Rn and w ∈ R) does not have a feasible solution:
ut A − wct = 0t
vt At + wbt ≥ 0t
ut b + vt c < 0        (23)
u ≥ 0
w ≥ 0
Hence, assume that system (23) has a feasible solution u, v and w.
Case 1: w = 0. Then (again by Farkas’ Lemma) the system
Ax ≤ b
At y = c
y ≥ 0
does not have a feasible solution, which is a contradiction because both (P) and (D) have a
feasible solution.
Case 2: w > 0. Then
0 > wut b + wv t c ≥ ut (−Av) + v t (At u) = 0,
which is a contradiction. 2
Remark: Theorem 8 shows in particular that if a linear program max{ct x | Ax ≤ b} is feasible and bounded, then there is a vector x̃ with Ax̃ ≤ b such that ct x̃ = sup{ct x | Ax ≤ b}.
The following table gives an overview of the possible combinations of states of the primal and dual LPs (“X” means that the combination is possible, “x” means that it is not possible):

                                      (D)
                        feasible,   feasible,   infeasible
                        bounded     unbounded
(P)  feasible, bounded     X           x            x
     feasible, unbounded   x           x            X
     infeasible            x           X            X
Remark: The previous theorem can be used to show that computing a feasible solution of
a linear program is in general as hard as computing an optimum solution. Assume that we
want to compute an optimum solution of the program (P) in the theorem. To this end, we can
compute any feasible solution of the following linear program:
max ct x
s.t. Ax ≤ b
At y = c (24)
ct x ≥ bt y
y ≥ 0
Here x and y are the variables. We can ignore the objective function in the modified LP because
we just need any feasible solution. The constraints At y = c, ct x ≥ bt y and y ≥ 0 guarantee that
any vector x from a feasible solution of the new LP is an optimum solution of (P).
                      Primal LP                      Dual LP
Variables             x1 , . . . , xn                y1 , . . . , ym
Matrix                A                              At
Right-hand side       b                              c
Objective function    max ct x                       min bt y
Constraints           Σ_{j=1}^n aij xj ≤ bi          yi ≥ 0
                      Σ_{j=1}^n aij xj ≥ bi          yi ≤ 0
                      Σ_{j=1}^n aij xj = bi          yi ∈ R
                      xj ≥ 0                         Σ_{i=1}^m aij yi ≥ cj
                      xj ≤ 0                         Σ_{i=1}^m aij yi ≤ cj
                      xj ∈ R                         Σ_{i=1}^m aij yi = cj

In particular, this gives the following primal-dual pairs:

max{ct x | Ax ≤ b, x ≥ 0}        min{bt y | yt A ≥ c, y ≥ 0}
max{ct x | Ax ≥ b, x ≥ 0}        min{bt y | yt A ≥ c, y ≤ 0}
max{ct x | Ax = b, x ≥ 0}        min{bt y | yt A ≥ c}
2.5 Complementary Slackness

Let x be a feasible solution of (P) and y a feasible solution of (D) (with (P) and (D) as in Theorem 8). Then the following statements are equivalent:

(a) Both x and y are optimum solutions of (P) and (D), respectively.

(b) ct x = bt y.

(c) yt (b − Ax) = 0.
Proof: The equivalence of the statements (a) and (b) follows from Theorem 8. To see the
equivalence of (b) and (c) note that y t (b − Ax) = y t b − y t Ax = y t b − ct x, so ct x = bt y is
equivalent to y t (b − Ax) = 0. 2
With the notation of the theorem, let at1 , . . . , atm be the rows of A and b = (b1 , . . . , bm ). Then, the theorem implies that for an optimum primal solution x and an optimum dual solution y and i ∈ {1, . . . , m} we have yi = 0 or ati x = bi (since Σ_{i=1}^m yi (bi − ati x) must be zero and yi (bi − ati x) cannot be negative for any i ∈ {1, . . . , m}).
Similarly, for a feasible solution x of max{ct x | Ax ≤ b, x ≥ 0} and a feasible solution y of its dual min{bt y | At y ≥ c, y ≥ 0}, the following statements are equivalent:

(a) Both x and y are optimum solutions of the respective LPs.

(b) ct x = bt y.

(c) yt (b − Ax) = 0 and xt (At y − c) = 0.
Proof: The equivalence of the statements (a) and (b) follows again from Theorem 8. To
see the equivalence of (b) and (c) note that 0 ≤ y t (b − Ax) and 0 ≤ xt (At y − c). Hence
y t (b − Ax) + xt (At y − c) = y t b − y t Ax + xt At y − xt c = y t b − xt c is zero if and only if
0 = y t (b − Ax) and 0 = xt (At y − c). 2
Corollary: A feasible linear program max{ct x | Ax ≤ b} is bounded if and only if c is contained in the cone generated by the rows of A.

Proof: The linear program is bounded if and only if its dual linear program is feasible. This is the case if and only if there is a vector y ≥ 0 with yt A = ct , which is equivalent to the statement that c is in the cone generated by the rows of A. □
Theorem 10 allows us to strengthen the statement of the previous Corollary. Let x be an
optimum solution of the linear program max{ct x | Ax ≤ b} and y an optimum solution of its
dual min{bt y | At y = c, y ≥ 0}. Denote the row vectors of A by at1 , . . . , atm . Then yi = 0 if
ati x < bi (for i ∈ {1, . . . , m}), so c is in fact in the cone generated only by these rows of A where
ati x = bi (see Figure 3 for an illustration).
(Figure 3: a two-dimensional polyhedron {x ∈ R2 | Ax ≤ b} with rows a1 , a2 , a3 of A; at an optimum solution, c lies in the cone generated by those rows ai whose inequalities ati x ≤ bi are tight.)
Theorem 13 Let max{ct x | Ax ≤ b} and its dual min{bt y | At y = c, y ≥ 0} both have optimum solutions, and let i ∈ {1, . . . , m}. Then exactly one of the following two statements holds:

(a) The primal LP max{ct x | Ax ≤ b} has an optimum solution x∗ with ati x∗ < bi .

(b) The dual LP min{bt y | At y = c, y ≥ 0} has an optimum solution y∗ with yi∗ > 0.
Proof: By complementary slackness, at most one of the statements can be true. Let δ =
max{ct x | Ax ≤ b} be the value of an optimum solution. Assume that (a) does not hold. This
means that
max −ati x
Ax ≤ b
−ct x ≤ −δ
has an optimum solution with value at most −bi . Hence, also its dual LP
min bt y − δu
At y − uc = −ai
y ≥ 0
u ≥ 0
must have an optimum solution of value at most −bi . Therefore, there are y ∈ Rm and u ∈ R
with y ≥ 0 and u ≥ 0 with y t A − uct = −ati and y t b − uδ ≤ −bi . Let ỹ = y + ei (i.e. ỹ arises from
y by increasing the i-th entry by one). If u = 0, then ỹ t A = y t A + ati = 0 and ỹ t b = y t b + bi ≤ 0,
so if y∗ is an optimal dual solution, y∗ + ỹ is also an optimum solution and has a positive i-th entry. If u > 0, then (1/u)ỹ is an optimum dual solution (because (1/u)ỹt A = (1/u)yt A + (1/u)ati = ct and (1/u)ỹt b = (1/u)yt b + (1/u)bi ≤ δ) and has a positive i-th entry. □
Proof: By Theorem 13, for any inequality ati x ≤ bi there is a pair of optimum solutions x(i) ∈ Rn , y(i) ∈ Rm such that ati x(i) < bi or the i-th entry of y(i) is positive. Since the convex combination of optimum LP solutions is again an optimum solution, we can set x∗ := (1/m) Σ_{i=1}^m x(i) and y∗ := (1/m) Σ_{i=1}^m y(i) and get a pair of optimum solutions fulfilling the conditions of the theorem. □
max Σ_{e∈δG+(s)} xe − Σ_{e∈δG−(s)} xe
s.t. xe ≥ 0                                              for e ∈ E(G)
     xe ≤ u(e)                                           for e ∈ E(G)              (25)
     Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe = 0               for v ∈ V(G) \ {s, t}
min Σ_{e∈E(G)} u(e) ye
s.t. ye ≥ 0 for e ∈ E(G)
ye + zv − zw ≥ 0 for e = (v, w) ∈ E(G), {s, t} ∩ {v, w} = ∅
ye + zv ≥ 0 for e = (v, t) ∈ E(G), v 6= s
ye − zw ≥ 0 for e = (t, w) ∈ E(G), w 6= s (26)
ye − zw ≥ 1 for e = (s, w) ∈ E(G), w 6= t
ye + zv ≥ −1 for e = (v, s) ∈ E(G), v 6= t
ye ≥ 1 for e = (s, t) ∈ E(G)
ye ≥ −1 for e = (t, s) ∈ E(G)
In a simplified way its dual LP can be written with two dummy variables zs = −1 and zt = 0:
min Σ_{e∈E(G)} u(e) ye
s.t. ye ≥ 0 for e ∈ E(G)
ye + zv − zw ≥ 0 for e = (v, w) ∈ E(G) (27)
zs = −1
zt = 0
We will use the dual LP to show the Max-Flow-Min-Cut-Theorem. We call a set δ + (R) with
R ⊂ V(G), s ∈ R and t ∉ R an s-t-cut, and Σ_{e∈δG+(R)} u(e) its capacity.

Theorem (Max-Flow-Min-Cut Theorem) The maximum value of an s-t-flow in (G, u) equals the minimum capacity of an s-t-cut.
Proof: If x is a feasible solution of the primal problem (25) (i.e. x encodes an s-t-flow) and
δ + (R) is an s-t-cut, then
Σ_{e∈δG+(s)} xe − Σ_{e∈δG−(s)} xe = Σ_{v∈R} ( Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe ) = Σ_{e∈δG+(R)} xe − Σ_{e∈δG−(R)} xe ≤ Σ_{e∈δG+(R)} u(e).

The first equation follows from the flow conservation rule (i.e. Σ_{e∈δG+(v)} xe − Σ_{e∈δG−(v)} xe = 0) applied to all vertices in R \ {s} and the second one from the fact that flow values on edges inside R cancel out in the sum. The last inequality follows from the fact that flow values are between 0 and u.
Thus, the capacity of any s-t-cut is an upper bound for the value of an s-t-flow. We will show
that for any maximum s-t-flow there is an s-t-cut whose capacity equals the value of the flow.
Let x̃ be an optimum solution of the primal problem (25) and ỹ, z̃ be an optimum solution of
the dual problem (27). In particular x̃ defines a maximum s-t-flow. Consider the set R := {v ∈
V (G) | z̃v ≤ −1}. Then s ∈ R and t 6∈ R.
If e = (v, w) ∈ δG+(R), then z̃v < z̃w , so ỹe ≥ z̃w − z̃v > 0. By complementary slackness this implies x̃e = u(e). On the other hand, if e = (v, w) ∈ δG−(R), then z̃v > z̃w and hence ỹe + z̃v − z̃w ≥ z̃v − z̃w > 0, so again by complementary slackness x̃e = 0. This leads to:
Σ_{e∈δG+(s)} x̃e − Σ_{e∈δG−(s)} x̃e = Σ_{v∈R} ( Σ_{e∈δG+(v)} x̃e − Σ_{e∈δG−(v)} x̃e ) = Σ_{e∈δG+(R)} x̃e − Σ_{e∈δG−(R)} x̃e = Σ_{e∈δG+(R)} u(e).  □
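The construction of the cut from an optimum dual solution can be reproduced numerically. The sketch below (my own illustration with made-up data, assuming SciPy) solves the dual LP (27) for a small digraph and reads off R = {v | z̃v ≤ −1}:

```python
import numpy as np
from scipy.optimize import linprog

edges = [('s', 'a'), ('s', 'b'), ('a', 'b'), ('a', 't'), ('b', 't')]
u = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
vertices = ['s', 'a', 'b', 't']
m, n = len(edges), len(vertices)

# variable order: (y_e for all edges, z_v for all vertices)
c = np.concatenate([np.array([u[e] for e in edges], float), np.zeros(n)])

A_ub = np.zeros((m, m + n)); b_ub = np.zeros(m)       # -y_e - z_v + z_w <= 0 for e = (v, w)
for j, (v, w) in enumerate(edges):
    A_ub[j, j] = -1
    A_ub[j, m + vertices.index(v)] = -1
    A_ub[j, m + vertices.index(w)] = 1

A_eq = np.zeros((2, m + n)); b_eq = np.array([-1.0, 0.0])   # z_s = -1, z_t = 0
A_eq[0, m + vertices.index('s')] = 1
A_eq[1, m + vertices.index('t')] = 1

bounds = [(0, None)] * m + [(None, None)] * n         # y >= 0, z free
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
z = res.x[m:]
R = {v for v in vertices if z[vertices.index(v)] <= -1 + 1e-9}
print(res.fun, R)    # optimum dual value = minimum cut capacity (5 here), and a minimum cut
```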
3 The Structure of Polyhedra
is a polyhedron.
Proof: Exercise. 2
Remark: The set P = {x ∈ Rn | ∃y ∈ Rk : A (x, y) ≤ b} (where (x, y) denotes the vector in Rn+k obtained by stacking x and y) is called a projection of {z ∈ Rn+k | Az ≤ b} to Rn .
More generally, the image of a polyhedron {x ∈ Rn | Ax ≤ b} under an affine linear mapping f : Rn → Rk , which is given by D ∈ Rk×n , d ∈ Rk and x ↦ Dx + d, is also a polyhedron:

{y ∈ Rk | ∃x ∈ Rn : Ax ≤ b and y = Dx + d}

is a polyhedron. Indeed,

{y ∈ Rk | ∃x ∈ Rn : Ax ≤ b and y = Dx + d}
  = {y ∈ Rk | ∃x ∈ Rn : Ax ≤ b, Dx − y ≤ −d, −Dx + y ≤ d},

and the set on the right-hand side is the projection to Rk of a polyhedron in Rn+k .
3.2 Faces
(a) F is a face of P .
Proof:
Set z := x∗ + ε(x∗ − y) for a suitably small ε > 0 (see Figure 4). Then ct z > δ, so z ∉ P . Therefore, there must be an inequality at x ≤ β of the system Ax ≤ b such that at z > β. We claim that this inequality cannot belong to Ãx ≤ b̃. To see this assume that at x ≤ β belongs to Ãx ≤ b̃. If at x∗ ≤ at y then

at z = at x∗ + ε at (x∗ − y) ≤ at x∗ < β.

But if at x∗ > at y then

at z = at x∗ + ε at (x∗ − y) < at x∗ + [(β − at x∗)/(at (x∗ − y))] · at (x∗ − y) = β.

In both cases, we get a contradiction, so the inequality at x ≤ β belongs to A0 x ≤ b0 .
Therefore, at y = at (x∗ + (1/ε)(x∗ − z)) = (1 + 1/ε)β − (1/ε) at z < β, which means that A0 y ≠ b0 .
(Figure 4: the polyhedron P , the face F , and the points x∗ , y and x∗ + ε(x∗ − y).)
(a) Let c ∈ Rn be a vector such that max{ct x | x ∈ P } < ∞. Then the set of all vectors
x where the maximum of ct x over P is attained is a face of P .
(b) F is a polyhedron.
We are in particular interested in the largest and the smallest faces of a polyhedron.
3.3 Facets
Proof: If P = {x ∈ Rn | Ax = b}, then P does not have a facet (the only face of P is P itself,
see Corollary 19 (d)), so both statements are trivial.
Hence assume that P 6= {x ∈ Rn | Ax = b}.
Let A0 x ≤ b0 be a minimal system of inequalities such that P = {x ∈ Rn | Ax = b, A0 x ≤ b0 }.
Let at x ≤ β be an inequality in A0 x ≤ b0 , and let A00 x ≤ b00 be the rest of the system A0 x ≤ b0
without at x ≤ β.
We will show that at x ≤ β is facet-defining.
Let y ∈ Rn be a vector with Ay = b, A00 y ≤ b00 and at y > β. Such a vector exists because otherwise A00 x ≤ b00 would be a smaller system of inequalities than A0 x ≤ b0 with P = {x ∈ Rn | Ax = b, A00 x ≤ b00 }, which is a contradiction to the definition of A0 x ≤ b0 .
Moreover, let ỹ ∈ P be a vector with A0 ỹ < b0 (such a vector ỹ exists because P is full-dimensional
in the linear subspace {x ∈ Rn | Ax = b}). Consider the vector
z = ỹ + [(β − at ỹ)/(at y − at ỹ)] (y − ỹ).

Then, at z = at ỹ + [(β − at ỹ)/(at y − at ỹ)] · (at y − at ỹ) = β. Furthermore, 0 < (β − at ỹ)/(at y − at ỹ) < 1. Thus, z is the convex
that is met by all elements of F with equality (e.g. the vector z ∈ F fulfills all inequalities in
A00 x ≤ b00 with strict inequality).
On the other hand, by Proposition 18 any facet is defined by an inequality of A0 x ≤ b0 . 2
Corollary 21 Let P ⊆ Rn be a polyhedron.
In particular, this means that the smallest possible representation of a full-dimensional polyhe-
dron P = {x ∈ Rn | Ax ≤ b} is unique (up to swapping inequalities and multiplying inequalities
with positive constants). If possible, we want to describe any polyhedron by facet-defining inequalities because, according to Theorem 20, this gives the smallest possible description of the polyhedron (with respect to the number of inequalities).
Proof: “⇒:” Let F be a minimal face of P . By Proposition 18, we know that there is a
subsystem A0 x ≤ b0 of Ax ≤ b with F = {x ∈ P | A0 x = b0 }. Choose A0 x ≤ b0 maximal with
this property. Let Ãx ≤ b̃ be a minimal subsystem of Ax ≤ b such that F = {x ∈ Rn | A0 x =
b0 , Ãx ≤ b̃}.
We have to show the following claim:
Claim: Ãx ≤ b̃ is an empty system of inequalities.
Proof of the Claim: Assume that at x ≤ β is an inequality in Ãx ≤ b̃. The inequality at x ≤ β is not redundant, so by Theorem 20, F 0 = {x ∈ Rn | A0 x = b0 , Ãx ≤ b̃, at x = β} is a facet of F , and hence, by Corollary 19, F 0 is a face of P . On the other hand, we have F 0 ≠ F , because at x = β is not valid for all elements of F (otherwise we could have added at x ≤ β to the set of inequalities A0 x ≤ b0 ). This is a contradiction to the minimality of F . This proves the claim.
“⇐:” Assume that F = {x ∈ Rn | A0 x = b0 } ⊆ P (for a subsystem A0 x ≤ b0 of Ax ≤ b) is
non-empty.
Then, F cannot contain a proper subset as a face (see Corollary 19 (d)).
Moreover, F = {x ∈ Rn | A0 x = b0 } = {x ∈ P | A0 x = b0 }, so by Proposition 18 the set F is a
face of P . Since any proper subset of F that is a face of P would also be a face of F and we
know that F does not contain proper subsets as faces, F is a minimal face of P . 2
(a) x0 is a vertex of P .
Proof:
i ∈ {1, . . . , k}, then at x0 = Σ_{i=1}^k λi at x(i) < β, which is a contradiction. But then, we have x(i) ∈ {x ∈ P | A0 x = b0 } = {x0 } for all i ∈ {1, . . . , k}, which is a contradiction, too.
“(d) ⇒ (b)”: Let A0 x ≤ b0 be a maximal subsystem of Ax ≤ b such that A0 x0 = b0 . Assume that A0 does not contain n linearly independent rows. Then, there is a vector d ≠ 0 that is orthogonal to all rows in A0 . Hence, for any ε > 0, we have A0 (x0 + εd) = A0 (x0 − εd) = b0 . For any inequality at x ≤ β that is in Ax ≤ b but not in A0 x ≤ b0 , we have at x0 < β. Therefore, if ε > 0 is sufficiently small, at (x0 + εd) ≤ β and at (x0 − εd) ≤ β are valid for inequalities at x ≤ β in Ax ≤ b but not in A0 x ≤ b0 . In other words, we have (x0 + εd) ∈ P and (x0 − εd) ∈ P . □
Examples:
• Polytopes are pointed.
To see this, consider a non-empty polytope P = {x ∈ Rn | Ax ≤ b}. If rank(A) < n, then there is a vector x̃ ∈ Rn \ {0} such that Ax̃ = 0. But then for any x ∈ P and K ∈ R, we have x + K x̃ ∈ P , which is a contradiction to the assumption that P fits into a ball of finite radius. Hence we have rank(A) = n, so P is pointed.
• Polyhedra P that can be written as P = {x ∈ Rn | Ax = b, x ≥ 0} are pointed.
  This can be seen by writing P as P = {x ∈ Rn | [A; −A; −In ] x ≤ (b; −b; 0)}, where the matrix [A; −A; −In ] arises by stacking A, −A and −In . Obviously, this matrix has rank n, hence P is pointed.
Corollary 25 If the linear program max{ct x | Ax ≤ b} is feasible and bounded and the
polyhedron P = {x ∈ Rn | Ax ≤ b} is pointed, then there is a vertex x0 of P such that
ct x0 = max{ct x | Ax ≤ b}. 2
Proof: Choose a minimal set {a1 , . . . , ak } of the given vectors together with λ1 , . . . , λk ≥ 0 such that c = Σ_{i=1}^k λi ai . We show that the vectors a1 , . . . , ak are linearly independent. If this is not the case, there are numbers γ1 , . . . , γk , not all zero, such that Σ_{i=1}^k γi ai = 0. We can assume that at least one γi is positive. Choose σ maximal such that λi − σγi ≥ 0 for all i ∈ {1, . . . , k}. Then, in particular, for at least one i ∈ {1, . . . , k}, we have λi − σγi = 0. Therefore, c = Σ_{i=1}^k (λi − σγi )ai is a representation of c with fewer vectors, which is a contradiction to the minimality of the set {a1 , . . . , ak }. □
Proof: Obviously, at most one of the statements can be valid. Let A be the matrix with rows
at1 , . . . , atm .
If c ∈ cone({a1 , . . . , am }) then by the previous theorem, c can be written as a non-negative
combination of linearly independent vectors from at1 , . . . , atm .
Hence, assume that c 6∈ cone({a1 , . . . , am }), so there is no vector v ∈ Rm , v ≥ 0 such that
ct = v t A. By Farkas’ Lemma (Theorem 6), this implies that there is a vector ũ ∈ Rn such that
Aũ ≥ 0 and ct ũ < 0. This implies that the following LP (with u ∈ Rn as variable vector) has a
feasible solution:
max ct u
s.t. ct u ≤ −1
−ct u ≤ 1
−Au ≤ 0
Moreover, the LP is bounded (-1 is the value of an optimum solution). Hence, the optimum is
attained on a face of the solution polyhedron. By Theorem 22, we can write a minimal face
where the optimum solution value is attained as a set F = {u ∈ Rn | A0 u = b0 } where A0 u ≤ b0
is a subsystem of ct u ≤ −1, −ct u ≤ 1, −Au ≤ 0 consisting of t linearly independent vectors.
Hence, any vector u ∈ F fulfills the condition of (b). 2
3.5 Cones
Proof: “⇐:” Let a1 , . . . , am ∈ Rn be vectors. We have to show that cone({a1 , . . . , am }) is
polyhedral. W.l.o.g. we can assume that the vectors a1 , . . . , am span the vector space Rn .
Consider the set H of half-spaces Hu = {x ∈ Rn | ut x ≤ 0} such that for each Hu ∈ H the
following conditions hold:
• {a1 , . . . , am } ⊆ Hu , and
• There are n − 1 linearly independent vectors ai1 , . . . , ain−1 in {a1 , . . . , am } such that
ut aij = 0 for j ∈ {1, . . . , n − 1}
The set H is finite because there are at most (m choose n−1) such half-spaces, and by Theorem 27 the set cone({a1 , . . . , am }) is the intersection of these half-spaces. Hence, cone({a1 , . . . , am }) is a polyhedron.
“⇒:” Let C = {x ∈ Rn | Ax ≤ 0} be a polyhedral cone. We have to show that C is finitely
generated. Let CA be the cone generated by the rows of A. By the first part of the proof, we
know that CA (as any other finitely generated cone) is polyhedral. Hence, there are vectors
d1 , . . . , dk ∈ Rn such that CA = {x ∈ Rn | dt1 x ≤ 0, . . . , dtk x ≤ 0}. Let CB = cone({d1 , . . . , dk })
be the cone generated by d1 , . . . , dk .
Claim: C = CB .
Proof of the claim: “CB ⊆ C”: Every row vector of A is contained in CA . Hence Adi ≤ 0 for all
i ∈ {1, . . . , k}. Therefore, di ∈ C (for i ∈ {1, . . . , k}) and thus (as C is a cone) CB ⊆ C.
“C ⊆ CB ”: Assume that there is a y ∈ C \ CB . Again by the first part, CB is polyhedral. Thus,
there must be a vector w ∈ Rn with wt di ≤ 0 (for i = 1, . . . , k) and wt y > 0. This implies
w ∈ CA , and therefore wt x ≤ 0 for all x ∈ C. Obviously, together with wt y > 0 this is a
contradiction to the assumption y ∈ C. 2
Remark: For a set S ⊆ Rn we call the set S o = {x ∈ Rn | xt y ≤ 0 for all y ∈ S}, the polar
cone of S (in particular it obviously is a convex cone). For a polyhedral cone C = {x ∈ Rn |
Ax ≤ 0} its polar cone C o is the cone generated by the rows of A (see exercises). We have just
seen in the proof that C oo = C for a polyhedral cone C.
3.6 Polytopes
where

C = { (x, λ) ∈ Rn+1 | λ ≥ 0, Ax − λb ≤ 0 }.

The set C is a polyhedral cone, so by Theorem 28 it is finitely generated by a set of vectors (x1 , λ1 ), . . . , (xk , λk ). Hence, we can assume that all λi are positive (for i ∈ {1, . . . , k}). We can even assume that we have λi = 1 for all i ∈ {1, . . . , k} because otherwise we could scale all vectors by the factor 1/λi . Thus, we have

x ∈ X ⇔ ∃µ1 , . . . , µk ≥ 0 : (x, 1) = µ1 (x1 , 1) + · · · + µk (xk , 1).
Proof: Let P be a polytope with vertex set X. Since P is convex and X ⊆ P , we have
conv(X) ⊆ P . It remains to show that P ⊆ conv(X). Theorem 29 implies that conv(X) is a
polytope, so in particular a polyhedron. Assume that there is a vector y ∈ P \ conv(X). Then,
there is a half-space Hy = {x ∈ Rn | ct x ≤ δ} such that conv(X) ⊆ Hy and y 6∈ Hy . This means
that ct y > ct x for all x ∈ X, so the maximum of the function ct x over P will not be attained at
a vertex. This is a contradiction to Corollary 25. 2
Notation: For two vector sets X, Y ⊆ Rn , we define their Minkowski sum as:
X + Y := {z ∈ Rn | ∃x ∈ X ∃y ∈ Y : z = x + y}.
Theorem 31 Let P = {x ∈ Rn | Ax ≤ b} be a polyhedron. Then, there are finite sets
V, E ⊆ Rn such that
P = conv(V ) + cone(E).
P = conv(V ) + cone(E).
4 Simplex Algorithm
The Simplex Algorithm by Dantzig [1951] is the oldest algorithm for solving general linear
programs. Geometrically it works as follows: Given a polyhedron P and a linear objective
function, we start with any vertex of P . Then we walk along a one-dimensional face of P to
another vertex and repeat this until we found a vertex where the objective function attains a
maximum.
If we want to have a chance to follow this main strategy, we need a pointed polyhedron. That
is why in this section we consider linear programs in standard equation form:
max ct x
s.t. Ax = b (28)
x ≥ 0
max ct x
s.t. Ãx ≤ b (29)
x ≥ 0
4.1 Feasible Basic Solutions
Notation: We denote the index set of the columns of a matrix A ∈ Rm×n by {1, . . . , n}. For a
subset B ⊆ {1, . . . , n}, we denote by AB the sub-matrix of A containing exactly the columns
with index in B. Similarly, for a vector x ∈ Rn , we denote by xB the sub-vector of x containing
the entries with index in B. Note that xB is a vector of length |B| but its entries are not indexed
from 1 to |B|, but the indices are the elements of B, so for example for B = {2, 4, 9} we have
xB = (x2 , x4 , x9 ).

Definition Let A ∈ Rm×n be a matrix with rank(A) = m and let b ∈ Rm .

(a) A set B ⊆ {1, . . . , n} with |B| = m such that AB is non-singular is called a basis. With N := {1, . . . , n} \ B, the vector x ∈ Rn with xN = 0 and xB = A−1B b is called the basic solution of Ax = b for B.

(b) If x is a basic solution of Ax = b for B, then the variables xj with j ∈ B are called
basic variables and the variables xj with j ∈ N are called non-basic variables.
(c) A basic solution x is called feasible if x ≥ 0. A basis is called feasible if its basic
solution is feasible.
Remark: We also use the above definition for inequality systems of the type Ãx ≤ b, x ≥ 0
(with à ∈ Rm×ñ ). E.g. we call a vector x∗ ∈ Rñ with Ãx∗ ≤ b and x∗ ≥ 0 a basic solution if x∗ , s∗
with s∗ := b − Ãx∗ is a basic solution for Ãx + Im s = b, x ≥ 0, s ≥ 0 (with n := ñ + m variables).
In particular, in a feasible basic solution of Ãx ≤ b, x ≥ 0, the number of tight constraints
(including non-negativity constraints) must be at least n − m = ñ, and in a non-degenerated
feasible basic solution, the number of tight constraints must be exactly ñ. This is because
each positive non-slack variable and each positive slack variable is associated with a non-tight
constraint.
Example: Consider the following system of equations:
x 1 + x2 + s 1 = 1
2x1 + x2 + s2 = 2 (30)
x1 , x 2 , s 1 , s 2 ≥ 0
The variables are x1 , x2 , s1 , and s2 . We denoted the last two variables by s1 and s2 because
they can be interpreted as slack variables for the following system of inequalities: x1 + x2 ≤
1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0.
If we write the system of equations in matrix notation, we get Ax = b with

A = [ 1 1 1 0 ; 2 1 0 1 ],  x = (x1 , x2 , s1 , s2 )t ,  b = (1, 2)t .
For B = {1, 2}, we get AB = [ 1 1 ; 2 1 ] with feasible basic solution (1, 0, 0, 0). So in particular this basic feasible solution is degenerated. If we choose instead B = {2, 3}, we get AB = [ 1 1 ; 1 0 ] and the corresponding basic solution is (0, 2, −1, 0), which is, of course, infeasible.
Figure 5 illustrates these two basic solutions. However, note that the figure does not show the solution space (which is 4-dimensional) but only the solution space of the problem without the slack variables s1 and s2 , i.e. the solution space of the system x1 + x2 ≤ 1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0. So the two points (1, 0) and (0, 2) are basic solutions only in the sense of the remark stated after the last definition.
(Figure 5: the feasible region of x1 + x2 ≤ 1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0 with the degenerated basic solution (1, 0) and the infeasible basic solution (0, 2).)
In this example we could easily make the degenerated basic solution non-degenerated by skipping the redundant constraint 2x1 + x2 ≤ 2. This is always possible if we only have two non-slack variables, but already in three dimensions there are instances where we cannot get rid of degenerated basic solutions. As an example consider Figure 6. If the pyramid defines the set of all feasible solutions, the marked vector is a degenerated basic solution, because four constraints are fulfilled with equality while there are only three non-slack variables.
Note that the example (30) shows that the same vertex of a polyhedron can belong to a
degenerated or a non-degenerated basic solution, depending on how we describe the polyhedron
by a system of inequalities.
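The basic solutions of the small example (30) can be enumerated mechanically. The following sketch (my own illustration, assuming NumPy) runs over all choices of two columns, checks linear independence and computes the corresponding basic solution:

```python
import numpy as np
from itertools import combinations

A = np.array([[1, 1, 1, 0],
              [2, 1, 0, 1]], dtype=float)
b = np.array([1, 2], dtype=float)

for B in combinations(range(4), 2):
    AB = A[:, B]
    if abs(np.linalg.det(AB)) < 1e-12:
        continue                                   # columns linearly dependent: no basis
    x = np.zeros(4)
    x[list(B)] = np.linalg.solve(AB, b)
    status = "feasible" if (x >= -1e-12).all() else "infeasible"
    print(B, x, status)                            # e.g. (0, 1) -> (1, 0, 0, 0), feasible
```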
(Figure 6: a pyramid whose apex is a degenerated basic solution: four constraints are fulfilled with equality while there are only three non-slack variables.)
Proof: The vector x0 is a vertex of P if and only if it is a feasible solution of the following
system and fulfills n linearly independent inequalities of the system with equality:
Ax ≤ b
−Ax ≤ −b
−In x ≤ 0
This is the case if and only if x0 ≥ 0, Ax0 = b and x0N = 0 for a set N ⊆ {1, . . . , n} with
|N | = n − m such that with B = {1, . . . , n} \ N the matrix AB has full rank. This is equivalent
to being a feasible basic solution. 2
Before we describe the algorithm in general, we will present some examples (which are taken
from Matoušek and Gärtner [2007]).
Consider the following linear program:
max x1 + x2
s.t. −x1 + x2 + x3 = 1
x1 + x4 = 3
x2 + x5 = 2
x1 , x 2 , x 3 , x 4 , x 5 ≥ 0
In matrix notation, the constraints read

[ −1 1 1 0 0 ; 1 0 0 1 0 ; 0 1 0 0 1 ] (x1 , x2 , x3 , x4 , x5 )t = (1, 3, 2)t .
We first need a basis to start with. We simply choose B = {3, 4, 5}, which gives us the basic
solution x = (0, 0, 1, 3, 2). We write the constraints and the objective function in a so-called
simplex tableau:
x3 = 1 + x1 − x2
x4 = 3 − x1
x5 = 2 − x2
z = x1 + x2
The first three rows describe an equation system that is equivalent to the given one but each
basic variable is written as a combination of the non-basic variable. The last line describes the
objective function.
We will try to increase non-basic variables (which are zero in the current solution) with a
positive coefficient in the objective function. Hence, here we could use x1 or x2 , and we choose
x2 . The constraint x3 = 1 + x1 − x2 is the critical constraint that prevents us from increasing x2 to something
bigger than 1 (without increasing x1 ). If we set x2 to something bigger than 1, x3 would become
negative. The constraint x5 = 2 − x2 only gives an upper bound of 2 for the value of x2 . Since the
bound induced by non-negativity of x3 is tighter (so the constraint x3 = 1 + x1 − x2 is critical),
we replace 3 in the basis by 2. The new basic variable x2 can be written as a combination of the
non-basic variables by using the first constraint: x2 = 1 + x1 − x3 . The new base is B = {2, 4, 5}
with a new basic solution x = (0, 1, 0, 3, 1). This is the new simplex tableau:
x2 = 1 + x1 − x3
x4 = 3 − x1
x5 = 1 − x1 + x3
z = 1 + 2x1 − x3
Increase x1 . x5 = 1−x1 +x3 is critical. x1 = 1+x3 −x5 . New base B = {1, 2, 4}. x = (1, 2, 0, 2, 0).
x1 = 1 + x3 − x5
x2 = 2 − x5
x4 = 2 − x3 + x5
z = 3 + x3 − 2x5
Increase x3 . x4 = 2−x3 +x5 is critical. x3 = 2−x4 +x5 . New base B = {1, 2, 3}. x = (3, 2, 2, 0, 0).
x1 = 3 − x4
x2 = 2 − x5
x3 = 2 − x4 + x5
z = 5 − x4 − x5
The value of the objective function for any feasible solution (x1 , . . . , x5 ) is 5 − x4 − x5 . Since we
have found a solution where x4 = x5 = 0 and we have the constraint that xi ≥ 0 (i = 1, . . . , 5),
our solution is an optimum solution.
Unbounded instance:
As a second example, consider:
max x1
s.t. x 1 − x2 + x3 = 1
−x1 + x2 + x4 = 2
x1 , x 2 , x 3 , x 4 ≥ 0
Quite obviously this LP is unbounded (one can choose x1 arbitrarily large and set x2 = x1 ,
x3 = 1, and x4 = 2).
Again we use the “slack variables” (here x3 and x4 ) for a first basis. This gives B = {3, 4} and
x = (0, 0, 1, 2).
x3 = 1 − x1 + x2
x4 = 2 + x1 − x2
z = x1
Increase x1 . x3 = 1 − x1 + x2 is critical. x1 = 1 + x2 − x3 . New base B = {1, 4}. x = (1, 0, 0, 3).

x1 = 1 + x2 − x3
x4 = 3 − x3
z = 1 + x2 − x 3
We can increase x2 as much as we want (provided that we increase x1 by the same amount).
Thus the simplex tableau shows that the linear program is unbounded.
Degeneracy:
A final example shows what may happen if we get a degenerated basic solution.
max x2
s.t. −x1 + x2 + x3 = 0
x1 + x4 = 2
x1 , x 2 , x 3 , x 4 ≥ 0
x3 = x1 − x2
x4 = 2 − x1
z = x2
Increase x2 . x3 = x1 − x2 is critical (we cannot increase x2 at all). x2 = x1 − x3 . New base B = {2, 4}. x = (0, 0, 0, 2).

x2 = x1 − x3
x4 = 2 − x1
z = x1 − x3
Increase x1 . x4 = 2 − x1 is critical. x1 = 2 − x4 . New base B = {1, 2}. x = (2, 2, 0, 0).

x1 = 2 − x4
x2 = 2 − x3 − x4
z = 2 − x3 − x4
Again, we have found an optimum solution because all coefficients of the non-basic variables in
the objective function z = 2 − x3 − x4 are negative.
After these three examples, we will now describe the simplex method in general.
For a feasible basis B, the simplex tableau is a system T (B) of m + 1 linear equations with
variables x1 , . . . , xn and z with this form
xB = p + QxN
(31)
z = z0 + rt xN
• xB is the vector of the basic variables, N = {1, . . . , n} \ B, and xN is the vector of the non-basic variables,

• p = A−1B b and Q = −A−1B AN ,

• z0 = ctB A−1B b and r = cN − (ctB A−1B AN )t .
Note that the entries of p are not necessarily numbered from 1 to m but that p uses B as the
set of indices (and for r, we have a corresponding statement). In particular, the rows of Q are
indexed by B and the columns by N . We denote the entries of Q by qij (where i ∈ B and
j ∈ N ).
Then xB = A−1B b − A−1B AN xN , which is equivalent to AB xB = b − AN xN and Ax = b.
Remark: It is easy to check that there is only one simplex tableau for every feasible basis B. The cost function z0 + rt xN does not directly depend on the basic variables but only on the non-basic variables. Their impact on the overall cost is given by the vector r = cN − (ctB A−1B AN )t . An entry of r is called the reduced cost of its corresponding non-basic variable.
If all reduced costs are non-positive, we have already found an optimum solution:
Lemma 34 Let T(B) be a simplex tableau for a feasible basis B. If r ≤ 0, then the basic solution of B is optimum.

Proof: Let x be the basic solution of B; its value is ct x = z0 . Every feasible solution x′ satisfies x′N ≥ 0 and hence ct x′ = z0 + rt x′N ≤ z0 . □

Lemma 35 Let T(B) be a simplex tableau for a feasible basis B. If there is an index α ∈ N with rα > 0 and qiα ≥ 0 for all i ∈ B, then the linear program is unbounded.

Proof: Let x be the feasible basic solution for B. Let K ∈ R with K > ct x be a constant. Define a new feasible solution x̃ as follows: x̃α := (K − ct x)/rα , x̃i := xi for i ∈ N \ {α}, and x̃j := pj + qjα x̃α
for j ∈ B. It is easy to check that x̃ is a feasible solution with ct x̃ ≥ K. Hence, the linear
program is unbounded. 2
In the following, we denote the entries of A by aij (i ∈ {1, . . . , m}, j ∈ {1, . . . , n}). The column
of A with index j is denoted by a·j .
Lemma 36 Let T(B) be a simplex tableau for a feasible basis B, let α ∈ N be an index with rα > 0 and qiα < 0 for at least one i ∈ B, and let β ∈ B with qβα < 0 and pβ /qβα = max{pi /qiα | qiα < 0, i ∈ B}. Then B̃ := (B \ {β}) ∪ {α} is a feasible basis.

Proof: We have to show that AB̃ has full rank and that it is feasible, i.e. that its basic solution is non-negative.
(i) All but one of the columns of AB̃ belong to AB . Hence, the matrix A−1B AB̃ contains all unit vectors ei with the possible exception of eβ because we removed the β-th column from AB . However, this removed column has been replaced by the α-th column a·α of A, so the remaining column of A−1B AB̃ is A−1B a·α . But this is exactly the column with index α of −Q = A−1B AN . By construction, qβα ≠ 0, so all columns of A−1B AB̃ are linearly independent.
(ii) We have to show that the basic solution of B̃ is non-negative. We increase xα to −pβ /qβα and set the basic variables xB to p − q·α · (pβ /qβα ), where q·α is the column with index α of Q. For i ∈ B with qiα ≥ 0 (so in particular i ≠ β) we have pi − qiα · (pβ /qβα ) ≥ pi ≥ 0. For i ∈ B with qiα < 0 we have pβ /qβα ≥ pi /qiα , so pi ≥ qiα · (pβ /qβα ), with equality for i = β. This leads to xβ = 0 and xB ≥ 0, so we get a feasible basic solution for B̃. □
   Compute the simplex tableau T(B)

       xB = p + QxN
       z  = z0 + rt xN

   for the basis B;                  // See equation (31) and the following notation.
5  if r ≤ 0 then
       return x̃ = x;                // x̃ is optimum (see Lemma 34).
6  Choose an index α ∈ N with rα > 0;
                                     // Here we can apply different pivot rules.
7  if qiα ≥ 0 for all i ∈ B then
       return “unbounded”;           // By Lemma 35, the LP is unbounded.
8  Choose an index β ∈ B with qβα < 0 and pβ /qβα = max{pi /qiα | qiα < 0, i ∈ B};
                                     // Again, we can apply different pivot rules.
9  Set B = (B \ {β}) ∪ {α};
                                     // See Lemma 36 proving that we get a new feasible basis.
10 go to line 3
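The tableau computations can be condensed into a few lines of NumPy. The following sketch is my own compact implementation of the iteration above (using Bland's rule, discussed below, for both choices); it is meant as an illustration, not as a robust solver:

```python
import numpy as np

def simplex(A, b, c, B):
    """max c^t x s.t. Ax = b, x >= 0, starting from a feasible basis B (list of column indices)."""
    A, b, c = (np.asarray(M, dtype=float) for M in (A, b, c))
    B = list(B)
    while True:
        N = [j for j in range(A.shape[1]) if j not in B]
        AB_inv = np.linalg.inv(A[:, B])
        p = AB_inv @ b                              # basic solution: x_B = p, x_N = 0
        Q = -AB_inv @ A[:, N]                       # tableau: x_B = p + Q x_N
        r = c[N] - c[B] @ AB_inv @ A[:, N]          # reduced costs
        if (r <= 1e-9).all():                       # line 5: current basis is optimum
            x = np.zeros(A.shape[1]); x[B] = p
            return x, float(c @ x)
        alpha = min(N[k] for k in range(len(N)) if r[k] > 1e-9)   # line 6 (Bland's rule)
        q = Q[:, N.index(alpha)]
        if (q >= -1e-9).all():                      # line 7: the LP is unbounded
            return None, float('inf')
        ratios = [(p[i] / q[i], B[i]) for i in range(len(B)) if q[i] < -1e-9]
        best = max(t for t, _ in ratios)
        beta = min(j for t, j in ratios if t >= best - 1e-12)     # line 8 (Bland's rule)
        B[B.index(beta)] = alpha                    # line 9: exchange step
```

Called on the first example above with the starting basis {3, 4, 5} (0-based indices [2, 3, 4]), this returns the optimum solution (3, 2, 2, 0, 0) with value 5.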
To find an initial feasible basis for a linear program max{ct x | Ax = b, x ≥ 0}, we can assume b ≥ 0 (otherwise multiply the corresponding equations by −1), add one artificial variable per constraint, and set Ã := [A | Im ] and x̃ := (x1 , . . . , xn , xn+1 , . . . , xn+m ). Then, consider the auxiliary linear program

max −(xn+1 + xn+2 + · · · + xn+m )
s.t. Ãx̃ = b        (32)
     x̃ ≥ 0
For this linear program, it is trivial to find a feasible basis ({n + 1, . . . , n + m} will work), so
we can solve it by the Simplex Algorithm. If the value of its optimum solution is negative,
this means that the original linear program does not have a feasible solution. Otherwise, the
Simplex Algorithm will provide a basic solution for the original linear program. In this case,
the solution of the new LP computed by the Simplex Algorithm could contain variables
from xn+1 , . . . , xn+m as basic variables but their value must be 0 and hence they can be replaced
easily by variables from x1 , . . . , xn .
In lines 6 and 8, we may have a choice between different candidates to enter or leave the basis.
The elements chosen in these steps are called pivot elements, and the rules by which we choose
them are called pivot rules. Several different pivot rules for the entering variable have been
proposed:
• Largest coefficient rule: For the entering variable choose α such that rα is maximized.
This is the rule that was proposed by Dantzig in his first description of the Simplex
Algorithm.
• Largest increase rule: Choose the entering variable such that the increase of the
objective function is maximized. Finding an α with that property takes more time because
it is not sufficient to consider the vector r only.
• Steepest edge rule: Choose the entering variable in such a way that we move the
feasible basic solution in a direction as close to the direction of the vector c as possible.
This means we maximize
ct (xnew − xold ) / ||xnew − xold ||

where xold is the basic feasible solution of the current basis and xnew is the basic feasible solution of the basis after the exchange step. This rule is even more time-consuming, but
in many practical experiments it turned out to lead to a small number of exchange steps.
Here, we only analyze a pivot rule that is quite inefficient in practice but has the nice property that we can show that the Simplex Algorithm terminates at all if we follow that rule. If all
exchange steps improve the value of the current solution, we can be sure that the algorithm will
terminate because we can never visit the same basic solution twice, and there is only a finite
(though exponential) number of basic solutions. However, exchange steps do not necessarily
change the value of the solution. Therefore, depending on the pivot rules, it is possible that
the Simplex Algorithm runs in an endless loop by considering the same sequence of bases
forever. This behavior is called cycling (see page 30 ff. of Chvátal [1983] for an example that
this can really happen). The good news is that we can avoid cycling by using an appropriate
pivot rule.
If the algorithm does not terminate, it has to consider the same basis B twice. The computation
between two occurrences of B is called a cycle. Let F ⊆ {1, . . . , n} be the indices of the variables
that have been added to (and hence removed from) the basis during one cycle. We call xF the
cycle variables.
Lemma 37 If the Simplex Algorithm cycles, all basic solutions during the cycling
are the same and all cycle variables are 0.
Proof: The value of a solution considered in Simplex Algorithm never decreases, so during
cycling it cannot increase either. Let B be a feasible basis that occurs in the cycle, and let
B 0 = (B ∪ {α}) \ {β} be the next basis. The only non-basic variable that could be increased is
xα . However, if it indeed was increased, then, because rα > 0, this would increase the value of
the solution. This shows that the non-basic variables remain zero. But then, all variables remain
unchanged because the basic variables are determined uniquely by the non-basic variables. 2
A pivot rule that is able to avoid cycling is Bland’s rule (Bland [1977]) that can be described
as follows: In line 6 of the Simplex Algorithm, we choose α among all elements in N with
rα > 0 such that α is minimal. In line 8, we choose β among all elements in B with qβα < 0
and pβ /qβα = max{pi /qiα | qiα < 0, i ∈ B} such that β is minimal.
Theorem 38 With Bland’s rule as pivot rule in lines 6 and 8, the Simplex Algorithm
terminates after a finite number of steps.
Proof: Assume that the algorithm cycles while using Bland’s rule. We use the notation from
above and consider the set F of the indices of the cycle variables. Let π be the largest element
of F , and let B be the basis just before π enters the basis. Let p, Q, r and z0 be the entries of the simplex tableau T(B). Let B′ be the basis just before π leaves it. Let p′, Q′, r′ and z′0 be the entries of the simplex tableau T(B′).
Let N = {1, . . . , n} \ B be the set of the non-basic variables (so in particular π ∈ N ). According
to Bland’s rule we choose the smallest index and π = max(F ), so when B is considered, π is
the only candidate in F to enter the basis. In other words:

rπ > 0  and  rj ≤ 0 for all j ∈ N ∩ (F \ {π}). (33)
Let α be the index entering B′. Again by Bland's rule, π must have been the only candidate among all elements of F to leave B′. Since p′j = 0 for all j ∈ B′ ∩ F , this means that

q′πα < 0  and  q′jα ≥ 0 for j ∈ B′ ∩ (F \ {π}). (34)
Roughly speaking, we will get a contradiction because (33) says that in a feasible basic solution
increasing a non-basic variable in xF \{π} or decreasing xπ (to something negative!) will not
improve the result. On the other hand, (34) says that increasing xα while decreasing xπ (again
to something negative) will improve the result.
We will formalize this statement by considering the following auxiliary linear program:
max ct x
s.t. Ax = b
xF \{π} ≥ 0 (35)
xπ ≤ 0
xN \F = 0
≥0 if j ∈ F \ {π}
xj
≤0 if j = π
Therefore, by statement (33), rj xj ≤ 0 for all j ∈ F . With the condition xN \F = 0 this leads to
rt xN ≤ 0 for any solution x of (35). Therefore, the value of any such solution is at most z0 ,
and thus x̃ is an optimum solution of (35). This proves Claim 1.
Claim 2: The LP (35) is unbounded.
The bases are changed during the cycling but we always have the same basic solution. Hence,
if x̃ is a feasible basic solution of the original LP for basis B is also a feasible basic solution
for the basis B 0 . We choose a positive number K and set x0α = K. For j ∈ N 0 \ {α} (with
N 0 = {1, . . . , n} \ B 0 ), we set x0j = x̃j = 0. Moreover, we set xB 0 = p0 + Q0 x0N 0 . By (34), this
defines a feasible solution of the auxiliary LP (35). Since α was a candidate for entering the
basis B 0 , we have rα0 > 0. Hence, we get a solution with value ct x0 = z00 + r0t x0N 0 = z00 + K · rα0 .
As we can choose K arbitrarily large, this shows that LP (35) is unbounded. 2
56
4.3 Efficiency of the Simplex Algorithm
We have seen that Bland’s rule guarantees that the Simplex Algorithm will terminate. What
can we say about the running time? Consider for some with 0 < < 12 the following example:
max xn
−x1 ≤ 0
x1 ≤ 1
xj−1 − xj ≤ 0 for j ∈ {2, . . . , n}
xj−1 + xj ≤ 1 for j ∈ {2, . . . , n}
Of course, adding non-negativity constraints for all variables would not change the problem.
The polyhedron defined by these inequalities is called Klee-Minty cube (Klee and Minty
[1972]). It turns out that the Simplex Algorithm with Bland’s rule (depending on the initial
solution) may consider 2n bases before finding the optimum solution. In particular, this example
shows that we don’t get a polynomial-time algorithm.
The bad news is that for any of the above pivot rules instances have been found where the
Simplex Algorithm with that particular pivot rule has exponential running time.
Assume that you are given an optimum pivot rule that guides you to an optimum solution
with a smallest possible number of iterations. Then, the number of iterations depends on the
following property of the instances:
Obviously, if we don’t make any assumptions on the starting solution, the number of iterations
performed by the Simplex Algorithm optimizing over a polyhedron P will be at least the
combinatorial diameter of P , even with an optimum pivot rule.
It is an open question what the largest combinatorial diameter of a d-dimensional polyhedron
with n facets is. In 1957, W. Hirsch conjectured that the combinatorial diameter could be at
most n − d. This conjecture was open for decades but it has been disproved by Santos [2011] who
showed that there is a 20-dimensional polyhedron with 40 facets and combinatorial diameter
21. More generally, he proved that there are counter-examples to the Hirsch conjecture with
arbitrarily many facets. Nevertheless, it is still possible that the combinatorial diameter is
always polynomially (or even linearly) bounded in the dimension and the number of facets. The
best known upper bound for the combinatorial diameter is O(n2+log d ) and was proven by Kalai
and Kleitman [1992]. For an overview of this topic see Section 3.3 of Ziegler [2007].
57
In practical experiments, the Simplex Algorithm typically turns out to be very efficient. It
could also be proved that the average running time (with a specified probabilistic model) is
polynomial (see Borgwardt [1982]). Moreover, Spielmann and Teng [2005] have shown that the
expected running time on a slight perturbation of a worst-case instance can be bounded by a
polynomial.
If the linear program max{ct x | Ax = b, x ≥ 0} is feasible and bounded then the Simplex
Algorithm does not only provide an optimum primal solution but we can also get an optimum
solution of the dual linear program min{bt y | At y ≥ c}. To see this, let B the feasible basis
corresponding to the optimum computed by the Simplex Algorithm. Set ỹ = A−t B cB (where
−t t −1 t t t −t
AB = (AB ) ). This leads to AB ỹ = cB and AN ỹ = AN AB cB ≥ cN where the last inequality
follows from the fact that in T (B) we have 0 ≥ r = cN − (ctB A−1 t
B AN ) . So the vector ỹ is feasible
for the dual LP, and it is an optimum solution because together with the (primal) basic solution
x̃ for the basis B, it satisfies the complementary slackness condition (ỹ t A − ct )x̃ = 0.
In fact, the condition r ≤ 0 in the simplex tableau T (B) guarantees the existence of a dual
solution y with y t AB = ctB . In the Dual Simplex Algorithm, we start with a feasible basic
dual solution, i.e. a feasible dual solution for which a basis B exists with y t AB = ctB . If ctB A−1
B
is a feasible dual solution, we call B a dual feasible basis. Then, we compute the corresponding
simplex tableau T (B) (which exists for any basis not just a feasible basis). Thus the vector r
will have no positive entry. Note that B may not be feasible, so entries of p can be negative.
Now the algorithm swaps elements between the basis and the rest of the variables similarly to
the simplex algorithm but instead of keeping p non-negative it keeps r non-positive.
For any basis B such that in T (B) the vector r has no positive entry, the following properties
(that are easy to prove) are the basis of the Dual Simplex Algorithm:
58
• If there is a β ∈ B with pβ < 0 such that qβj ≤ 0 for all j ∈ N , then the primal LP is
infeasible.
r
• For β ∈ B with pβ < 0 and α ∈ N with qβα > 0 with qrβα α
≥ qβjj for all j ∈ N with qβj > 0,
then (B \ {β}) ∪ {α} is a dual feasible basis. Then the value of the dual solution is changed
−p
by qβαβ rα . In particular, if rα 6= 0 then the value of the dual solution gets smaller.
The Dual Simplex Algorithm simply applies the exchange steps in the last item until we
get a feasible basis. The algorithm can be considered as the Simplex Algorithm applied to
the dual LP. Thus it can also run into cycling and its efficiency is not better then the efficiency
of the Simplex Algorithm.
However, in some applications, the Dual Simplex Algorithm is very useful: If you add
an additional constraint to the primal LP, then a primal solution can become infeasible, so in
the Primal Simplex Algorithm we have to start from scratch. However, the dual solution
is still feasible. It is possibly not optimal but often it can be made optimal with just some
iterations of the Dual Simplex Algorithm.
The Network Simplex Algorithm can be seen as the Simplex Algorithm applied to
Min-Cost-Flow-Problems. Even for this special case, we cannot prove a polynomial running
time but it turns out that, in practice, the Network Simplex Algorithm is among the
fastest algorithms for Min-Cost-Flow-Problems. Though it is a variant of the Simplex
Algorithm, it can be described as a pure combinatorial algorithm.
Notation: We call b(v) the balance of v. If b(v) > 0, we call it the supply of v, and if b(v) < 0,
we call it the demand of v. Nodes v of G with b(v) > 0 are called sources, nodes v with
b(v) < 0 are called sinks.
During this chapter, n is always the number of nodes and m the number of edges of the graph
G.
59
Minimum-Cost Flow Problem
↔ ↔
Definition 16 Let G be a directed graph. We define the graph G by V (G) = V (G) and
↔ ← ←
E(G) = E(G)∪{ ˙ e | e ∈ E(G)} where e is an edge from w to v if e is an edge from v to
← ↔
w. e is called the reverse edge of e. Note that G may have parallel edges even if G does
not contain any parallel edges. If we have edge costs c : E(G) → R these are extended
↔ ←
canonically to edges in E(G) by setting c( e ) = −c(e).
Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem and let f be a
b-flow in (G, u). Then, the residual graph Gu,f is defined by V (Gu,f ) := V (G) and
← ↔
E(Gu,f ) := {e ∈ E(G) | f (e) < u(e)}∪{ ˙ e ∈ E(G) | f (e) > 0}. For e ∈ E(G) we define
←
the residual capacity of e by uf (e) = u(e) − f (e) and the residual capacity of e by
←
uf ( e ) = f (e).
The residual graph contains the edges where flow can be increased as forward edges and edges
where flow can be reduced as reverse edges. In both cases, the residual capacity is the maximum
value by which the flow can be modified. If P is a subgraph of the residual graph, then an
augmentation along P by γ means that we increase the flow on forward edges in P (i.e. edges
in E(G) ∩ E(P )) by γ and reduce it on reverse edges in P by γ. Note that the resulting mapping
is only a flow if γ is at most the minimum of the residual capacity of the edges in P .
60
Lemma 39 Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem. A
b-flow f is a spanning tree solution if and only if x̃ ∈ RE(G) with x̃e = f (e) is a vertex of
the polytope
X X
E(G)
x∈R | 0 ≤ xe ≤ u(e) (e ∈ E(G)), xe − xe = b(v) (v ∈ V (G)) . (36)
+ −
e∈δ (v) e∈δ (v)
Proof: “⇒:” Let f be a spanning tree solution and x̃ ∈ RE(G) with x̃e = f (e). Consider
all inequalities xe ≥ 0 with f (e) = 0, xe ≤ u(e) with f (e) = u(e) and for each connected
component
P (V (G), {e ∈ E(G) | 0 < f (e) < u(e)}) for all but one vertex the equation
ofP
e∈δ + (v) xe − e∈δ − (v) xe = b(v). These are |E(G)| linearly independent inequalities that are
fulfilled with equality by x̃. Hence x̃ is a vertex.
“⇐:” Let f by a b-flow. Assume that x̃ ∈ RE(G) with x̃e = f (e) is a vertex of the polytope (36).
Assume that (V (G), {e ∈ E(G) | 0 < f (e) < u(e)}) contains an undirected cycle C. Choose
an > 0 such that ≤ min{min{f (e), u(e) − f (e)} | e ∈ E(C)}. Fix one of the two possible
orientations of C. We call an edge of C a forward edge if its orientation is the same as the
chosen orientation, otherwise it is called backward edge. Set x0e = for all forward edges and
x0e = − for all backward edges. For all edges e ∈ E(G) \ E(C), we set x0e = 0. Then x̃ + x0
and x̃ − x0 belong to the polytope (36) and x̃ = 12 ((x̃ + x0 ) + (x̃ − x0 )), so by Proposition 24, x̃
cannot be a vertex. Hence, we have a contradiction. 2
Proof: Since the polyhedron (36) is in fact a polytope, it is pointed, so there is an optimum
solution that is a vertex. Together with Lemma 39, this proves the statement. 2
61
Definition 18 Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem
where we assume that G is connected. A spanning tree structure is a quadruple
(r, T, L, U ) where r ∈ V (G), E(G) = T ∪˙ L ∪˙ U , |T | = |V (G)| − 1, and (V (G), T ) does
not contain any undirected cycle.
The b-flow f associated to the spanning tree structure (r, T, L, U ) is defined by
• f (e) = 0 for e ∈ L,
• f (e) = v∈Ce b(v) + e0 ∈U ∩δ− (Ce ) u(e0 ) − e0 ∈U ∩δ+ (Ce ) u(e0 ) for e ∈ T where we
P P P
G G
denote by Ce vertex set of the the connected component of (V (G), T \ {e}) containing
v (for e = (v, w)).
Let (r, T, L, U ) be a spanning tree structure and f the b-flow associated to it. The structure
(r, T, L, U ) is called feasible if 0 ≤ f (e) ≤ u(e) for all e ∈ E(T ).
An edge (v, w) ∈ E(T ) is called downward if v is on the undirected r-w-path in T ,
otherwise is is called upward.
A feasible spanning tree structure (r, T, L, U ) is called strongly feasible if 0 < f (e) for
every downward edge e ∈ E(T ) and f (e) < u(e) for every upward edge e ∈ E(T ) (where f
is again the b-flow associated to (r, T, L, U )).
We call the unique function π : V (G) → R with π(r) = 0 and cπ (e) := c(e)+π(v)−π(w) = 0
for all e = (v, w) ∈ T the potential associated to the spanning tree structure
(r, T, L, U ).
62
Remarks:
• Obviously, the b-flow associated to the spanning tree structure (r, T, L, U ) fulfills the flow
conservation rule, but it may be infeasible.
↔ ↔
• π(v) is the length of the r-v-path in (G, c ) consisting of edges of T and their reverse
edges, only.
• In a strongly feasible tree structure, we can send a positive flow from each vertex v to r
along tree edges such that that the new flow remains non-negative and fulfills the capacity
constraints.
Proof: Since the potential π just encodes the distances to r in T , a breadth-first search in
the edges of T and the reverse edges of T is sufficient.
We can compute f by scanning the vertices in an order of non-increasing distance to r in T . 2
Proposition 42 Let (r, T, L, U ) be a feasible spanning tree structure and π the potential
associated to it. If cπ (e) ≥ 0 for all e ∈ L and cπ (e) ≤ 0 for all e ∈ U , then the b-flow
associated to (r, T, L, U ) is optimum.
Proof: The flow associated to (r, T, L, U ) is a basic solution of the standard linear program-
ming formulation for the minimum-cost flow problem. The criterion in the proposition is
equivalent to the statement that the reduced costs of all non-basic variables are non-positive.
This is equivalent to the optimality of the solution. 2
↔ ←
For an edge e = (v, w) ∈ E(G) \ T with e 6∈ T , we call e together with the w-v path consisting
of edges of T and reverse edges of edges of T only, the fundamental circuit of e. The vertex
closest to r in the fundamental circuit is called the peak of e.
Algorithm 2 gives a summary of the Network Simplex Algorithm. As an input, we need
a strongly feasible tree structure. However, even if there is a feasible b-flow, such a strongly
feasible tree structure may not exist. But we can modify the instance such that we can easily
find a strongly feasible tree structure (r, T, L, U ). We add artificial expensive edges between r
and all other nodes. For each sink v ∈ V (G) \ {r}, we add an edge (r, v) with u((r, v)) = −b(v).
For all other nodes v ∈ V (G) \ {r} we add an edges (v, r) with u((v, r)) = b(v) + 1. Then,
we get a strongly feasible spanning tree structure by setting L to the set of all old edges (i.e.
without the artificial edges connecting r) and by setting U = ∅. If the weight on the artificial
63
edges is high enough (1 + n maxe∈E(G) |c(e)| would be sufficient) and there is a solution that
does not use these edges at all, no optimum solution will send flow along these new edges, so
the new instance is equivalent.
Proof: It is easy to check that after the modification in the lines 11 to 14 f and π are still
the b-flow and the potential associated to (r, T, L, U ).
We will show that the spanning tree structure (r, T, L, U ) remains strongly feasible. By the
choice of γ in line 5 it remains feasible.
For an edge e = (v, w) on T let ẽ = (v, w) if e is an upward edge and ẽ = (w, v) if e is a
downward edge. We have to show that after an iteration of the algorithm, for all edges e ∈ E(T ),
the edge ẽ has a positive residual capacity. This is obvious for all edges outside C. For the edge
64
on the path on C from the head of e0 to the peak of C, this is also obvious because we augment
by γ = uf (e0 ) which is smaller than the residual capacities on this path (by the choice of e0 ).
For the remaining edges e on C − e0 , the residual capacity uf (ẽ) is, after the augmentation, at
least γ. Thus, is if γ > 0, we are done. But if γ = 0, then e0 must be on the path from the
peak to e0 , so for the edges e on the path from the peak to the tail of e0 we had uf (ẽ) before
the augmentation (because (r, T, L, U ) was strongly feasible), so this is still the case after the
augmentation.
We will show that we never consider the same spanning tree structure twice. In each iteration,
the cost of the flow is reduced by γ|ρ|, so if γ > 0, then we P are done. Hence assume that
− +
γ = 0. If e0 =6 e1 , then e0 ∈ L ∩ δ (X) or e0 ∈ U ∩ δ (X), so v∈V (G) π(v) will get larger (and
it
P will never get smaller). Thus, we assume in addition that e0 = e1 . Then X = V (G) and
v∈V (G) π(v) remains unchanged. But then |{e ∈ L | cπ (e) < 0}| + |{e ∈ U | cπ (e) > 0}| is
strictly decreased. This shows that we can never get the same spanning tree structure twice.
Since there is only a finite number of spanning tree structures, this proves that the algorithm
will terminate after a finite number of iterations.
By Proposition 42, the output of the algorithm is optimal when the algorithm terminates. 2
65
66
5 Sizes of Solutions
Before we will describe polynomial-time algorithms for solving linear programs we have to make
sure that we can store the output and all intermediate results with numbers whose sizes are
polynomial in the input size. To this end we have to define the size of numbers. Assuming that
all numbers are given in a binary representation, we define for
Remark: In order to get a description of a fraction r of with size(r) bits, we have to write r
as pq for numbers p, q ∈ Z that are relatively prime. Therefore, in any computation, when a
fraction pq arises, we apply the Euclidean Algorithm to p and q and divide p and q by their
greatest common divisor. The Euclidean Algorithm has polynomial running time, so during
any algorithm, we can assume that any fraction r is stored by using just size(r) bits.
Proof: Both statements of obvious if the numbers r1 , . . . , rn are integers. Hence assume that
ri = pqii for non-zero numbers pi and qi that are relatively prime (i = 1, . . . , n).
n
n
n
n n n
Q Q Q P P P
(a) size ri ≤ size pi + size qi ≤ size(pi ) + size(qi ) = size(ri ).
i=1 i=1 i=1 i=1 i=1 i=1
!
n
Q n
P n
P n
P Q
(b) We have size qi ≤ size(qi ) ≤ size(ri ), and size pi qj ≤
i=1 i=1 i=1 i=1 j∈{1,...,n}\{i}
!
n n n n n
1
P Q P P P Q
size |pi | qj ≤ size(ri ). Since ri = n
Q pi qj , this proves the
i=1 j=1 i=1 i=1 qi i=1 j∈{1,...,n}\{i}
i=1
claim. 2
67
Proposition 45 For x, y ∈ Qn , we have
Proof:
(a) We have
n
X n
X n
X
size(x + y) = n + size(xi + yi ) ≤ n + 2 size(xi ) + 2 size(yi ) = 2(size(x) + size(y)) − 3n.
i=1 i=1 i=1
(b) We have
n
! n n n
!
X X X X
t
size(x y) = size xi yi ≤2 size(xi yi ) ≤ 2 size(xi ) + size(yi )
i=1 i=1 i=1 i=1
= 2(size(x) + size(y)) − 4n.
2
p
Proof: Write the entries aij of A as aij = qijij where pij and qij are relatively prime (i, j =
1, . . . , n). Let det(A) = pq where p and q are relatively prime, too.
Then |det(A)| ≤ ni=1 nj=1 (|pij | + 1) and |q| ≤ ni=1 nj=1 |qij |. Therefore,
Q Q Q Q
size(q) ≤ size(A)
Qn Qn
and |p| = |det(A)||q| ≤ i=1 j=1 (|pij | + 1)|qij | . We can conclude
n X
X n
size(p) ≤ (size(pij ) + 1 + size(qij )) = size(A).
i=1 j=1
68
Proof: By Corollary 19 the maximum of ct x over P = {x ∈ Rn | Ax ≤ b} must be attained in
a minimal face of P . Let F be a minimal face where the maximum is attained. By Proposition 22,
we can write F = {x ∈ Rn | Ãx = b̃} for some subsystem Ãx ≤ b̃ of Ax ≤ b. We can assume
that the rows of à are linearly independent. Choose B ⊆ {1, . . . , n} such that ÃB is a regular
square matrix. Then x ∈ Rn with xB = Ã−1 B b̃ and xN = 0 (with N = {1, . . . , n} \ B) is an
optimum solution of the linear program. By Cramer’s rule the entries of xB can be written
det(Ã )
as xj = det(Ã j ) where Ãj arises from ÃB by replacing the j-th column by b̃. Thus, we have
B
Proof: According to the proof of the previous proposition there is an optimum solution x
such that for each entry xj of x we have size(xj ) ≤ 4n(size(A) + size(b)). Since every positive
number smaller than 2−4n(size(A)+size(b)) has a size larger than 4n(size(A) + size(b)), this proves
the claim. 2
Assume that we want solve an equation system Ax = b. We can do this by applying the Gaussian
Elimination. This algorithm performs three kinds of operations to the matrix A:
It should be well-known (see e.g. textbooks Hougardy and Vygen [2018] or Korte and Vygen
[2018]) that with these steps O(mn(rank(A) + 1)) elementary arithmetical operations are
sufficient to transform A into an upper (right) triangular matrix. Then it is easy to check if the
equation system is feasible, and, in case that it is feasible, to compute a solution. However, in
order to show that Gaussian Elimination is a polynomial-time algorithm, we have to show that
the numbers that arise during the algorithm aren’t too big.
The intermediate matrices that occur during the algorithm are of the type
B C
, (37)
0 D
69
where B is an upper triangular matrix. Then, an elementary step of the Gaussian Elimination
consist of choosing a non-zero entry of D (called pivot element; if no such entry exists, we are
done) and to swap rows and/or columns such that this element is at position (1, 1) of D. Then
we add a multiple of the first row of D to the other rows of D such that the entry at position
(1, 1) is the only non-zero entry of the first column of D.
We want to prove that the numbers that occur during the algorithm can be encoded using a
polynomial number of bits. We can assume that we don’t need any swapping operation because
swapping columns or rows doesn’t change the numbers in the matrix.
B C
Assume that our current matrix is à = where B is a k × k-matrix. Then for each
0 D
entry dij of D we have
det(Ã1,...,k,k+i 1,...,k
1,...,k,k+j ) = dij · det(Ã1,...,k ). (38)
where Mji11,...,j
,...,it
t
denotes the submatrix of a matrix M induced by the rows i1 , . . . , it and the
columns j1 , . . . , jt . To see the correctness of (38), apply Laplace’s formula to the last row of
Ã1,...,k,k+i
1,...,k,k+j which contains dij as the only non-zero element. Since the determinant does not
change if we add the multiple of a row to another row, this leads to
det(A1,...,k,k+i
1,...,k,k+j )
dij =
det(A1,...,k
1,...,k )
By Proposition 46 and Proposition 44, this implies size(dij ) ≤ 4 size(A). Since all entries of the
matrix occur as entries of such a matrix D, this shows that the sizes of all numbers that are
considered during the Gaussian Elimination are bounded by 4size(A).
Note that we have to apply the Euclidean Algorithm to any intermediate result in order to
get small representations of the numbers. But this is not a problem because the Euclidean
Algorithm is polynomial as well.
Finally, we get the result:
In particular this result shows that the following problems can be solved with a polynomial
running time:
• Solving a system of linear equations.
• Computing the determinant of a matrix.
• Computing the rank of a matrix.
• Computing the inverse of a regular matrix.
• Checking if a set of rational vectors is linearly independent.
70
6 Ellipsoid Method
The Ellipsoid Method (proposed by Khachiyan [1979]) was the first polynomial-time algorithm
for linear programming. The algorithm solves the problem of finding a feasible solution of a
linear program. As we have seen in Section 2.4, this is sufficient to solve as well the optimization
problem.
E = {M x + s | x ∈ B n }
But (using the previous remark) this is equivalent to the statement that there is a positive
definite n × n-matrix Q and a vector s ∈ Rn such that E = {x ∈ Rn | (x − s)t Q−1 (x − s) ≤ 1}.
2
The Ellipsoid Algorithm just finds an element in an polytope or ends with the assertion
that the polytope is empty. On the other hand, it can be applied to more general sets K ⊆ Rn
71
provided that K is a compact convex set and that for any x ∈ Rn \ K we can find a half-space
containing K such that x is on the border of the half-space.
Basically, the algorithms works as follows: We always keep track of an ellipsoid containing K.
Then we check if the center c of the ellipsoid is contained in K. If this is the case, we are done.
Otherwise, we compute the intersection X of the ellipsoid and a half-space containing K such
that c is on the border of the half-space. Then, we find a new (smaller) ellipsoid containing X.
For the 1-dimensional space, the ellipsoid method contains the binary search as a special case.
However, for technical reasons, we assume in the following that the dimension of our solution
space is at least 2.
We start with a special case that is easier to handle: We assume that our given ellipsoid is
the ball B n (with radius 1 and center 0). We want to find a small ellipsoid E covering the
intersection of B n with the half-space {x ∈ Rn | x1 ≥ 0} (the gray area in Figure 7).
(0, 1)
B2
(c, 0) (1, 0)
E
(0, −1)
For symmetry reasons, we choose the center of the new smaller ellipsoid on the vector e1 at a
position c · e1 (where c is still to be determined). Our candidates for the ellipsoid are of the form
( n
)
X
E = x ∈ Rn | α2 (x1 − c)2 + β 2 x2i ≤ 1
i=2
1
where we also have to choose α and β. The matrix Q is then a diagonal matrix with entry α2
at position (1, 1) and β12 on all other diagonal positions.
To keep E small, we want e1 to lie on the border of E. This condition leads to α2 (1 − c)2 = 1
and hence
1
α2 = . (39)
(1 − c)2
72
be on the border of E. This condition leads to α2 c2 + β 2 = 1 and thus
c2 1 − 2c
β 2 = 1 − α 2 c2 = 1 − 2
= . (40)
(1 − c) (1 − c)2
p
The volume of an ellipsoid E = {x ∈ Rn | (x−s)t Q−1 (x−s) ≤ 1} is vol(E) = det(Q)×vol(B n )
(a result from measure theory, see e.g. Proposition 6.1.2 in Cohn [1980]).
p
Therefore, our goal is to choose α, β and c in such a way that det(Q) = α−1 β −(n−1) is
minimized.
(1−c)2n
Thus, we want to find a c minimizing (1−2c)n−1
.
2n 2n 2n−1
d (1−c)
We have dc (1−2c)n−1
= 2(n−1)(1−c)
(1−2c)n
− 2n(1−c)
(1−2c)n−1
which is zero if 2(n−1)(1−c)
1−2c
= 2n. This leads to
2(n − 1) − 2c(n − 1) = 2n − 4cn and c(2n − (n − 1)) = 1. Thus, we minimize the volume by
1
setting c = n+1 .
(n+1)2 n2 −1
Then, α2 = n2
and β 2 = n2
.
1 + x ≤ ex for any x ∈ R. 2
73
Lemma 52 (Half-Ellipsoid Lemma) Let E = p + {x ∈ Rn | xt Q−1 x ≤ 1} be an ellipsoid
and a ∈ Rn with at Qa = 1. Then,
2
n t t 1 0 n n −1 t −1 2 t
E ∩{x ∈ R | a x ≥ a p} ⊆ E = p+ Qa+ x ∈ R | x Q + aa x ≤ 1 .
n+1 n2 n−1
1
vol(E 0 )
Moreover, vol(E)
≤ e− 2(n+1) .
E ∩ {x ∈ Rn | at x ≥ at p}
= (p + M B n ) ∩ {x ∈ Rn | at x ≥ at p}
= p + (M B n ∩ {x ∈ Rn | at (x + p) ≥ at p})
= p + (M B n ∩ {x ∈ Rn | at x ≥ 0})
= p + M (B n ∩ M −1 {x ∈ Rn | at x ≥ 0})
= p + M (B n ∩ {x ∈ Rn | at M x ≥ 0})
= p + M (B n ∩ {x ∈ Rn | et1 x ≥ 0})
2
1 n n −1 t 2 t
⊆ p+ M e1 + M x ∈ R | x In + e1 e x ≤ 1
n+1 n2 n−1 1
2
1 n n −1 −1 t 2 t −1
= p+ M e1 + x ∈ R | (M x) In + e1 e M x ≤ 1
n+1 n2 n−1 1
2
1 n n −1 t −1 2 t
= p+ Qa + x ∈ R | x Q + aa x ≤ 1
n+1 n2 n−1
n o
We can write the ellipsoid E 0 in standard form as E 0 = p + 1
n+1
Qa + x ∈ Rn | xt Q̃−1 x ≤ 1
2
with Q̃ = n2n−1 Q − n+1
2
Qaat Qt because
n2 − 1 n2
2 −1 t 2 t t
Q + aa Q− Qaa Q
n2 n−1 n2 − 1 n+1
2 2 4
= In − aat Qt + aat Q − 2 a at Qa at Qt
n+1 n−1 n − 1 | {z }
=1
= In .
q
vol(E 0 )
Therefore, vol(E) = det( Q̃)
det(Q)
.
2 n n
n2 n2
We have det( Q̃) n 2 2
det(Q)
= det n2 −1
In − n+1
aat Qt = n2 −1
det In − n+1
aat Qt = n2 −1
(1 −
2
n+1
). To see the last equality note that the matrix aat Qt has eigenvalue 1 for the eigenvector
74
a (because at Qt a = 1) while all other eigenvalues are zero (the rank of aat Qt is 1).q Since the
determinant is the product of all eigenvalues, this implies the last equation. Hence, det( Q̃)
det(Q)
≤
2 n2 n−1
1
≤ e− 2(n+1) (see the proof of the Half-Ball Lemma for
n 2 1 n n2 2
n2 −1
(1 − n+1 ) 2 = n+1 n2 −1
details of the last steps). 2
n o 2
Remark: The ellipsoid E 0 = p+ n+1 1
Qa+ x ∈ Rn | xt Q̃−1 x ≤ 1 with Q̃ = n2n−1 Q − n+12
Qaat Qt
is called Löwner-John ellipsoid. It is in fact the smallest ellipsoid containing E ∩ {x ∈ Rn |
at x ≥ at p}.
A separation oracle for a convex set K ⊆ Rn is a black-box algorithm which, given x ∈ Rn ,
either returns an a ∈ Rn with at y > at x for all y ∈ K or asserts x ∈ K.
Observation: Given A ∈ Qm×n and b ∈ Qm , a separation oracle for {x ∈ Rn | Ax ≤ b} can be
implemented in O(mn) arithmetical operations.
Proof: As an invariant, we will prove that during the k-th iteration of the algorithm, the
set K is contained in the set pk + {x ∈ Rn | xt A−1k x ≤ 1}. For k = 0, this is true because R is
big enough. For the step from k to k + 1, we apply the Half-Ellipsoid Lemma (Lemma 52) to
t
Q = Ak and a = √ āt (this scaling leads to at Ak a = āāt A k ā
Ak ā
= 1).
ā Ak ā
75
We have vol({x ∈ Rn | xt x ≤ R2 }) ≤ vol([−R, R]n ) = 2n Rn , and in each iteration, the
1
− 2(n+1)
volume of Ek = {x ∈ Rn | xt A−1
k x ≤ 1} is reduced at least by the factor e , so we get
k
− 2(n+1)
vol(Ek ) ≤ e 2n Rn .
k
Thus, we have to find a smallest k such that e− 2(n+1) 2n Rn ≤ which is equivalent to 2(n+1)k
≥
2n Rn 1 1
ln and k ≥ 2(n + 1)(n ln(2R) + ln( )). This shows that O(n(n ln(R) + ln( ))) iterations
are sufficient. 2
We cannot compute square roots exactly, so during the algorithm, we have to work with rounded
intermediate solutions. Let pek and Ãk be the exact values and pk and Ak be the rounded values
(and the same for the corresponding ellipsoids Ẽk and Ek ). Note that pek and Ãk are based on
the rounded values pk−1 and Ak−1 .
Let δ be an upper bound on the maximum absolute rounding error for the entries in pek and
Ãk , so kpk − pek k∞ ≤ δ and kAk − Ãk k∞ ≤ δ. So δ (that will be defined later) describes the
precision of the rounding. When we round the entries in Ãk , we do it in such a way that the
matrix remains symmetric. Let Γk = Ak − Ãk and ∆k = pk − pek .
In the following, we write by kk̇ the Euclidean norm for vectors and the induced operator norm
for the matrices. When considering matrices, we often make use of the fact that the Frobenius
norm is an upper bound for the operator norm induced by the Euclidean norm.
−1
For any x ∈ K we can assume that (x − pek )t Ãk (x − pek ) ≤ 1 and we want to prove the same
for pk and Ak . To this end, we have to increase the ellipsoid slightly by scaling Ãk .
−1 −1
We have (x − pk )t A−1 t
k (x − pk ) = (x − pk ) Ãk (x − pk ) + (x − pk )t (A−1
k − Ãk )(x − pk ). We
analyze the two summands separately:
−1 −1 −1 −1
(x − pk )t Ãk (x − pk ) = (x − p˜k )t Ãk (x − p˜k ) + |2∆tk Ãk (x − p˜k )| + ∆tk Ãk ∆k
−1 −1
≤ 1 + 2k∆k k · kÃk k (R + kp˜k k) + k∆k k2 · kÃk k (41)
√ −1 −1
≤ 1 + 2 nδ kÃk k (R + kp˜k k) + nδ 2 kÃk k.
And:
−1 −1
(x − pk )t (A−1
k − Ãk )(x − pk ) ≤ kx − pk k2 · kA−1
k − Ãk k
−1 −1
≤ (R + kpk k)2 kAk (Ak − Ãk )Ãk k
−1 (42)
≤ (R + kpk k)2 kA−1
k k · kÃk k · kΓk k
2 −1 −1
≤ (R + kpk k) kAk k · kÃk k · nδ
1
We adjust Ãk by multiplying it by µ = 1 + 2n(n+1)
, so we replace Ãk by µÃk (which we call Ãk
again). Then
−1 1 2n(n + 1) 1
(x − p̃k )t Ãk (x − p̃k ) = 1 = < 1 − . (43)
1+ 2n(n+1)
2n2 + 2n + 1 4n2
76
and (E
]k+1 also refers to the scaled version of Ãk ):
n2
vol(E
] k+1 ) 1
− 2(n+1) 1 1 1 1
≤e 1+ ≤ e− 2(n+1) e 4(n+1) = e− 4(n+1) . (44)
vol(Ek ) 2n(n + 1)
Thus,
q
vol(Ek+1 ) vol(E
]k+1 ) vol(Ek+1 ) 1 −1
= ≤ e− 4(n+1) det(Ak+1 A
]k+1 ) (45)
vol(Ek ) vol(Ek ) vol(E]k+1 )
We have
−1 −1
det(Ak+1 A
] k+1 ) = det In + (Ak+1 − A
] k+1 )A]k+1
(∗) −1
n
≤ kIn + (Ak+1 − A ] k+1 )Ak+1 k
]
−1
n
≤ (1 + kΓk+1 k · kA
] k+1 k)
−1
n
≤ (1 + nδkA
]k+1 k)
−1
2 δkA k
≤ en
^ k+1 ,
Qn
where inequality (∗) follows from Hadamard’s inequality (| det(A)| ≤ i=1 kai k for an n × n-
matrix with columns a1 , . . . , an , see exercises).
This implies
vol(Ek+1 ) 1 1 2
−1
≤ e− 4(n+1) · e 2 n δkAk+1 k .
^
vol(Ek )
−1 1
Hence, if we had 12 δkA
] k+1 k <
1
8(n+1)3
, then we had vol(Ek+1 )
vol(Ek )
< e− 8(n+1) .
Therefore, and by equations (41) and (42) our goal is to choose δ such that we get the following
inequalities:
√ fk −1 k (R + kp˜k k) + nδ 2 kA
fk −1 k + (R + kpk k)2 kA−1 k · kA
fk −1 knδ ≤ 1
• 2 nδ kA k 4n2
−1
1
• δkA
]k+1 k ≤ 4(n+1)3
1
Proposition 54 Assume that δ is chosen such that δ ≤ 12n4k
in iteration k of the
Ellipsoid Method. Then, we have:
(c) kAk k ≤ R2 2k , kA
fk k ≤ R2 2k .
77
Proof: We have
n2 − 1 1 āāt
−1 2
A
]k+1 = A−1
k + .
n2 µ n − 1 āt Ak ā
−1
Thus, as a sum of a positive definite matrix and a positive semidefinite matrix A
]k+1 is positive
n2 2 t
definite. Therefore Ak+1 = n2 −1 µ(Ak − n+1 bk bk ) is positive definite.
]
Thus,
n2 − 1 1 āāt
−1 2
kA
]k+1 k≤ kA−1
k k+ k t k ≤ 3kAk −1 k
n2 µ n − 1 ā Ak ā
Let λ be a smallest eigenvalue of Ak+1 and v a vector with kvk = 1 such that λ = v t Ak+1 v.
Then:
v t Ak+1 v ≥ v t A
]k+1 v − nδ
≥ min{ut A ] n
k+1 u | u ∈ R , kuk = 1} − nδ
1
≥ −1 − nδ
kAk+1 k
]
1
≥ − nδ
3kAk −1 k
1 1
≥ − nδ
3 R−2 4k
1
≥ ,
R−2 4k+1
provided that:
R2
1 1
nδ ≤ − . (46)
3 4 4k
n2
kAk+1 k ≤ kA
] k+1 k + kΓk+1 k ≤ 2−1
µ kAk k + nδ ≤ R2 2k+1
n
| {z }
≤ 32
n2
We also get kA
]k+1 k ≤ n2 −1
µkAk k ≤ R2 2k+1 , so we have proved (c).
78
We can write Ak = M M t with a regular matrix M . Then,
r s
kAk āk t
ā Ak Ak ā (M t ā)t Ak (M t ā) p k
kbk k = √ t = t
= t t t
≤ kAk k ≤ R2 2 , (47)
ā Ak ā ā Ak ā (M ā )(M ā)
where the first inequality follows from the fact that kAk k = max{xT Ak x | kxk = 1} because Ak
is positive semidefinite (see exercises).
Therefore, we get by induction (using the fact that p0 = 0)
1 √ k √ k 1
kpk+1 k ≤ kpk k + kbk k + nδ ≤ kpk k + R2 2 + nδ ≤ R2k + R2 2 + √ k ≤ R2k+1 .
n+1 3 n4
−1
Lemma 55 Let δ be positive with δ < 26(N (R,)+1) 16n3 where N (R, ) :=
d8(n + 1)(n ln(2R) + ln( 1 ))e. Then, in iteration k of the Ellipsoid Algorithm, we
k
have K ⊆ pk + Ek and vol(Ek ) < e− 8(n+1) 2n Rn .
1 1
R2
Proof: By the choice of δ, we have nδ ≤ 3
− 4 4k
.
Moreover,
√ fk −1 k (R + kp˜k k) + nδ 2 kA
fk −1 k +(R + kpk k)2 kA−1 k · kA
fk −1 k nδ ≤ δn26k ≤ 1
• 2 nδ kA k 4n2
| {z } |{z} | {z } |{z} | {z } | {z }
≤R−2 4k ≤R2k ≤R−2 4k ≤R2k ≤R−2 4k ≤R−2 4k
79
−1
1
• δ kA
] k≤
| k+1
{z } 4(n+1)3
≤R−2 4k
Hence, by the above analysis, Ek (with rounded numbers) always contains the set K, and
1
is reduced at least by a factor of e− 8(n+1) in each iteration, so after
the volume of Ek
O n n ln R + ln 1 iterations, the algorithm terminates with a correct output. 2
There number of calls of the separation oracle can be reduced to O(n ln( nR
)) (see Lee, Sidford,
and Wong [2015] for an algorithm that only needs O(n ln( )) oracle calls and O(n3 lnO(1) ( nR
nR
))
additional time).
We first want to use the Ellipsoid Algorithm just to check if a given polyhedron P is empty.
This can be done directly, provided that P is in fact a polytope and if we have the assertion
that if P is non-empty, its volume cannot be arbitrarily small. The following proposition implies
that we can assume these properties:
(a) P = ∅ ⇔ PR, = ∅.
2
n
(b) If P 6= ∅, then vol(PR, ) ≥ n2size(A)
.
Proof:
80
y t A = 0 and y t b = −1. Then, by Proposition 47
min 11t y
At y = 0
bt y = −1
y ≥ 0
has an optimum solution y such that the absolute value of any entry of y is at most
24nsize(A)+size(b) . Thus, y t (b + 11) < −1 + (n + 1)24n(size(A)+size(b)) < 0. Again by Farkas’
Lemma, this implies that Ax ≤ b + 11 does not have a feasible solution. In particular,
there is no feasible solution in [−R, R]n , so PR, = ∅.
(b) If P 6= ∅, then PR−1,0 6= ∅ (with the same proof as in (a) for R). But for any z ∈ PR−1,0 , we
have {x ∈ Rn | ||x − z||∞ < n2size(A) } ⊆ PR, . Hence vol(PR, ) ≥ vol{x ∈ Rn | ||x − z||∞ <
n
2
2
n2size(A)
} = n2size(A) .
√
Proof: We can apply the Ellipsoid Algorithm to K = PR, with R = d n(1+24n(size(A)+size(b)) )e
n −1
and 0 = n2size(A)
2
(for = 2n24n(size(A)+size(b)) ) as a lower bound for the volume. We need
N (R, 0 ) = O(n(n ln(R) + ln( 10 ))) iterations, which is polynomial in the input size.
Moreover, it is sufficient to set the bound on the absolute rounding error to any value δ <
0 −1
26(N (R, )+1) 16n3 , so also the number of bits that we have to compute during the algorithm
is polynomial. 2
Proof: By Theorem 58, we can check in polynomial time if a given linear program has a
feasible solution. We will show that this is sufficient for computing a feasible solution if one exists.
Assume that we are given m inequalities ati x ≤ bi with ai ∈ Qn and bi ∈ Q (i ∈ {1, . . . , m}). First
check if the system is feasible. If it infeasible, we are done. Otherwise, perform for i = 1, . . . , m
the following steps: Check if the system remains feasible if we replace ati x ≤ bi by ati x = bi . If
this is the case, replace ati x ≤ bi by ati x = bi . Otherwise, the inequality is redundant, and we
can skip it. We end up with a feasible system of equations with the property that any solution
of this system of equations is also a solution of the given system of inequalities. However, the
system of equations can be solved in polynomial time by using Gaussian Elimination (see
81
Section 5.1). Hence, for any linear program, we can compute in polynomial-time a feasible
solution if one exists.
In Section 2.4 we have seen that the task of computing an optimum solution for a bounded
feasible linear program can be reduced to the computation of a feasible solution of a modified
linear program (see the LP (24)). Thus, we can also compute an optimum solution. 2
Remark: By Proposition 22, the method described in the previous proof computes a solution
in a minimal face of the solution polyhedron P . In particular, if P is pointed, we compute a
vertex of P .
An advantage of the Ellipsoid Algorithm is that it does not necessarily need a complete
description of a solution space K ⊆ Rn but only needs a separation oracle that provides a linear
inequality satisfied by all elements of K but not by a given vector x ∈ Rn \ K. This allows us
to use the method e.g. for linear program with an exponential number of constraints.
Example: Consider the Maximum-Matching Problem. A matching in an undirected
graph is a set M ⊆ E(G) such that |δG (v) ∪ M | ≤ 1 for all v ∈ V (G). In the Maximum-
Matching Problem we are given an undirected graph G and ask for a matching with
maximum cardinality. It can be formulated as the following integer linear program:
P
max x
P e∈E(G) e
e∈δG (v) xe ≤ 1 v ∈ V (G)
xe ∈ {0, 1} e ∈ E(G)
In the LP-relaxation, we simply replace the constraint “xe ∈ {0, 1}” by “xe ≥ 0”. However, this
allows us e.g. in the graph K3 (i.e. the complete graph on three vertices) to set all values xe to
1
2
. To avoid such solutions, we may add the following constraints:
P |U |−1
e∈E(G[U ]) xe ≤ 2
U ⊆ V (G), |U | odd
are indeed the convex combinations of the solutions of the ILP formulation. In other words, the
vertices of the solution polyhedron of the LP are the integer solutions. We won’t prove this
statement here, see Edmonds [1965] for a proof. Hence, solving the linear program would be
sufficient to solve the matching problem. The number of constraints is exponential in the size
of the graph, but the good news is that there is a separation oracle with polynomial running
82
time for this linear program (see Padberg and Rao [1982]). We will see how such a separation
oracle can be used for solving the optimization problem.
In the remainder of this chapter, we always consider closed convex sets K for which numbers r
and R with 0 < r < R2 exist such that rB n ⊆ K ⊆ RB n . We call sets for which such numbers r
and R exist, r-R-sandwiched sets.
We will consider relaxed versions both of linear optimization problems and of separation
problems. In the weak optimization problem we are given a set K ⊆ Rn , a number > 0
and a vector c ∈ Qn . The task is to find an x ∈ K with ct x ≥ max{ct z | z ∈ K} − .
In order to apply the Ellipsoid Algorithm directly to an optimization problem, we need the
property that the set of almost optimum solutions cannot have an arbitrarily small volume.
The following lemma guarantees this for r-R-sandwiched sets:
n−1
1
rn−1 vol(Bn−1 )
2 ct z
n−1 1
0 n−1
vol(conv(A ∪ {z})) ≥ r vol(Bn−1 )
2ct z 2kck n
n−1
1 1
≥ rn−1 n .
2kckR| n 2kck n
Here we use the fact that conv(A0 ∪ {z}) is an n-dimensional pyramid with height at least
2kck
n−1 n−1
and a base of ((n − 1)-dimensional) volume 2ct z r vol(Bn−1 ). 2
This result allows us to find a polynomial-time algorithm for the weak optimization problem
provided that we can solve the corresponding separation problem efficiently:
83
Proposition 61 Given a polynomial-time separation oracle for an r-R-sandwiched
convex set K ⊆ Rn with running time polynomial in size(R), size(r) and size(x)
(where x is the input vector for the oracle), a number > 0 and a vector c, there is a
polynomial-time algorithm (w.r.t. size(R), size(r), size(c) and size()) that computes a
vector v ∈ K with ct v ≥ sup{ct x | x ∈ K} − .
Proof: Apply the Ellipsoid Algorithm to find an almost optimum vector in K. Use the
previous lemma that shows that the set of almost optimum vectors in K cannot be arbitrarily
small. 2
A weak separation oracle for a convex set K ⊆ Rn is an algorithm which, given x ∈ Rn
and η with 0 < η < 21 , either asserts x ∈ K or finds v ∈ Rn with v t z ≤ 1 for all z ∈ K and
v t x ≥ 1 − η.
Remark: For the previous proposition, it would be enough to have a weak separation oracle
for K.
Notation: For K ⊆ Rn , we define K ∗ := {y ∈ Rn | y t x ≤ 1 for all x ∈ K}.
Proof: Claim: K ∗∗ = K
Proof of the claim: For x ∈ K, we have y t x ≤ 1 for all y ∈ K ∗ which implies x ∈ K ∗∗ . Therefore,
we have K ⊆ K ∗∗ .
Now let z ∈ Rn \ K. And let w ∈ K be a vector such that kz − wk2 is smallest possible over
vectors in K (w exists because K is convex and closed). Let u = z − w. Then, for all x ∈ K, we
have ut x ≤ ut w < ut z. Moreover, since 0 ∈ K, we have ut w ≥ 0. By scaling u, we can assume
that ut z > 1 while ut x ≤ 1 for all x ∈ K. But then u ∈ K ∗ and ut z > 1 which implies z 6∈ K ∗∗ .
Thus K ∗∗ ⊆ K. This prove the claim.
Now, let x ∈ Rn be an instance for the weak separation oracle. If x = 0, we can assert x ∈ K,
x
and if kxk > R we can choose v = kxk 2 . Therefore, we can assume that 0 < kxk ≤ R.
We can solve the (strong) separation problem for K ∗ (see the exercises). Since K ∗ is a closed
convex R1 - 1r -sandwiched set, we can apply the previous observation to it, and thus, we can
η
solve the weak optimization problem for K ∗ with c = kxk x
2 and = R in polynomial time.
xt t η xt η
Thus, we get a vector v0 ∈ K ∗ with v
kxk 0
x
≥ max{ kxk v | v ∈ K ∗} − R
. If v
kxk 0
≥ 1
kxk
− R
,
then v0t x ≥ 1 − η kxk
R
≥ 1 − η, and v0 z ≤ 1 for all z ∈ K (since v0 ∈ K ). Otherwise ∗
84
t
x
max{ kxk v | v ∈ K ∗ } ≤ kxk
1
, so max{xt v | v ∈ K ∗ } ≤ 1, which implies x ∈ K ∗∗ . Together with
the above claim, this implies x ∈ K. Therefore, we have a weak separation oracle for K in
polynomial running time. 2
It turns out that for rational r-R-sandwiched polyhedra P an exact polynomial-time separation
algorithm also provides an exact polynomial-time optimization algorithm, and vice versa,
provided that appropriate bounds on the sizes of the vertices of P are given (see textbooks, e.g.
Theorems 4.21 and Theorem 4.23 of Korte and Vygen [2018]).
85
86
7 Interior Point Methods
Note that this section was not covered by the lecture course given in summer term 2019.
The Ellipsoid Algorithm gives a polynomial-time algorithm for solving linear programs but
in practice it is typically much less efficient than the Simplex Algorithm. In contrast, the
algorithm that we will describe in this section is efficient both in theory and practice.
The term “interior point method” refers to several quite different algorithms. They all have in
common that during the algorithm we always consider vectors in the interior of the polyhedron
of feasible solutions (in contrast to the Simplex Algorithm where we always have vectors
on the border of the polyhedron). Here, we restrict ourselves to one variant and follow the
description by Mehlhorn and Saxena [2015]. The first version of the algorithm has been proposed
by Karmakar [1984].
We consider an LP max{ct x | Ax ≤ b} in standard inequality form.
To simplify the notation, we write the slack variables s explicitly, so we consider the following
problem:
max ct x
s.t. Ax + s = b (48)
s ≥ 0
87
(dual) value of the dual solution y and the (primal) value of the primal solution x, s because
bt y − ct x = xt At y + st y − ct x = xt c + st y − ct x = st y.
The system (50) has a solution only if both the primal and the dual linear program are feasible
and bounded, so for the moment we assume that this is the case. In Section 7.1, we will see
what to do to enforce these properties.
In the interior point methods, one generally considers vectors in the interior of the solution
space. In the system (50), the only inequalities are y ≥ 0 and s ≥ 0, so during the algorithm,
we always have solutions x, s, y with y > 0 and s > 0. We will replace the condition y t s = 0 by
2
the condition σ 2 := m yi s i
− 1 ≤ 14 for some number µ > 0. During the iterations of the
P
i=1 µ
algorithm, we will decrease µ more and more towards 0.
To summarize, during the algorithm, we have a number µ > 0 and vectors x, s, y meeting the
following invariants
Ax + s = b
At y = c
Pm yi si 2
1 (51)
i=1 µ
−1 ≤ 4
y > 0
s > 0
(II) Reduce µ by a constant factor and adapt x, y and s to this new value of µ such that we
again get a solution of (51). Iterate this step until µ is small enough (Section 7.2).
We will show how we can modify (51) to an equivalent problem that can be solved easily,
provided that we are allowed to choose µ. This modification will in particular make both the
primal and the dual LP feasible. This is equivalent to the statement that one of them is feasible
and bounded. We will show how to modify the dual LP (49) such that the modified version is
feasible and bounded.
In a first step, we make the LP (49) bounded (in such a way that we do not change the
problem if the given LP was bounded). By Theorem 47, we know that if (49) is feasible and
bounded, then there is a W with W ∈ 2Θ(m(size(A)+size(c))) such that there is an optimum solution
y = (y1 , . . . , ym ) ≥ 0 with yi ≤ W (i = 1, . . . , m). So in this case there is a vector y ≥ 0 with
11t y ≤ mW and At y = c. Equivalently (after dividing everything by W ), we can ask for a vector
y ≥ 0 with 11t y ≤ m and At y = W1 c. By relaxing the constraint 11t y ≤ m to 11t y ≤ m + 1 and
88
by adding a slack variable ym+1 ≥ 0 this leads to the following LP which is equivalent to (49)
provided that (49) is bounded:
min bt y
s.t. At y = W1 c
11t y + ym+1 = m+1 (52)
y ≥ 0
ym+1 ≥ 0
In a second step, we will make the LP feasible. To this end, we add a new variable ym+2
such that setting all variables to 1 will get us a feasible solution. Let H be a constant (to be
determined later). Then, we state the following LP:
min bt y + Hy
m+2
1
t
s.t. A y + W
c − A 11 ym+2 = W1 c
t
t
11 y + ym+1 + ym+2 = m + 2
(53)
y ≥ 0
ym+1 ≥ 0
ym+2 ≥ 0
The goal is to choose H that big that if this LP has a feasible solution with ym+2 = 0 at all,
then in any optimum solution ym+2 = 0 will hold. In fact, by Corollary 48 we know that there
is a constant l such that if there is an optimum solution of (53) with ym+2 > 0, then there is an
optimum solution with ym+2 ≥ 2−4ml(size(A)+size(c)+size(W )) . On the other hand, bt y ≤ kbk1 (m + 2)
in any feasible solution of (53), so if we set H = (kbk1 (m + 2) + 1)24ml(size(A)+size(c)+size(W )) , then
we enforce that ym+2 = 0 in any optimum solution (if a solution with ym+2 = 0 exists).
The linear program (53) is obviously feasible and bounded. In addition, we can use an optimum
solution of it, to check if the initial dual LP was feasible and bounded, and if this is the case,
we can find an optimum solution of it: Let y1 , . . . , ym+2 be an optimum solution of (53). If
ym+2 > 0, then we know that (52) has no feasible solution (otherwise there was a feasible
solution of (53) with ym+2 = 0 which is cheaper). Thus, the LP (49) has no feasible solution
either. On the other hand, if ym+2 = 0, then the initial dual LP must be feasible. Assume that
this is the case, then we still have to check if the initial dual LP was bounded. If ym+1 > 0, the
initial dual program must be bounded. If ym+1 = 0, then the initial dual LP can be bounded or
unbounded. To decide if it is bounded, we can replace c by the all-zero vector and first solve
this new problem. Then, by Farkas’ Lemma, the LP (49) is bounded if and only if the value of
an optimum solution of the new problem is non-negative.
If we dualize the LP (53), we get the following LP (with variables x ∈ Rn , s ∈ Rm and additional
89
variables xn+1 , sm+1 , and sm+2 ):
max W1 ct x + (m + 2)xn+1
Ax + xn+1 11 + s = b
1 t t
W
c − 11 A x + xn+1 + sm+2 = H
xn+1 + sm+1 = 0 (54)
s ≥ 0
sm+1 ≥ 0
sm+2 ≥ 0
Instead of the primal-dual pair (48) and (49), we will consider the pair (53) and (54). Due to
the modification, both LPs are feasible and bounded.
For the new pair
2 of LPs we can easily find feasible solutions and a number µ such that
Pm+2 yi si
i=1 µ
− 1 ≤ 41 : We set y1 = y2 = · · · = ym = ym+1 = ym+2 = 1 which is obviously
µ
feasible for (53). For (54), we set x1 = x2 = · · · = xn = 0. Moreover, we choose sm+1 = ym+1 =µ
(where µ itself is still to be determined). This leads to xn+1 = −µ, sm+2 = H + µ, and
si = bi − xn+1 = bi + µ (i = 1, . . . m).
As a consequence of this choice, we get:
y i si bi
−1 = i = 1, . . . , m
µ µ
ym+1 sm+1
−1 = 0
µ
ym+2 sm+2 H
−1 =
µ µ
Therefore, !
m+2
X 2 m
yi si 1 X
σ2 = −1 = 2 H2 + b2i .
i=1
µ µ i=1
p Pm 2 2 1
Hence, by choosing µ = 2 H 2 + i=1 bi , we enforce σ ≤ 4
. Moreover, since µ > |bi |, we have
si = bi + µ > 0 for i ∈ {1, . . . , m}.
So what did we get so far? We have replaced the primal-dual pair (48) and (49) by the pair (53)
and (54) such that optimum solutions of these modified problems directly lead to a solution of
the original problem. Moreover, the new primal-dual pair consists of two feasible and bounded
problems.
We will write (53) as
min b̃t y
s.t. Ãt y = c̃ (55)
y ≥ 0
90
and (54) as
max c̃t x
s.t. Ãx + s = b̃ (56)
s ≥ 0
Ãx + s = b̃
Ãt y = c̃
Pm+2 yi si 2
i=1 µ
− 1 ≤ 14 (57)
y > 0
s > 0
In this section, we will describe a solution for the following problem: Given a solution
µ(k) , x(k) , y (k) , s(k) of (57) we want to compute a new solution µ(k+1) , x(k+1) , y (k+1) , s(k+1) of
(57) where µ(k+1) = (1 − δ)µ(k) for some δ that does not depend on the solution (to be
determined later).
In a first version, we describe the step without considering the sizes of the numbers that occur
during the computation. Afterwards, we will show how we can round intermediate solutions in
such a way that the numbers can be written with a polynomial number of bits.
We write x(k+1) = x(k) + f , y (k+1) = y (k) + g, and s(k+1) = s(k) + h. Think of the entries of f , g
and h as relatively small values. Assuming that µ(k+1) is fixed, we describe how to compute
appropriate values for f , g and h. The first two conditions of (57) lead to Ãf + h = 0 and
(k) (k)
Ãt g = 0. In addition we want to choose f and h such that (yi + gi )(si + hi ) is close to µ(k+1)
(k) (k) (k) (k) (k) (k)
(i = 1 . . . , m + 2). Since (yi + gi )(si + hi ) = yi si + gi si + yi hi + gi hi and the product gi hi
(k) (k) (k) (k)
is small (provided that gi and hi are small) we simply demand yi si + gi si + yi hi = µ(k+1)
(i = 1 . . . , m + 2). Hence, we want to compute f , g and h such that
Ãt g = 0
Ãf + h = 0 (58)
(k) (k) (k) (k)
si gi + yi hi = µ(k+1) − yi si i = 1, . . . , m + 2
Note that y (k) and s(k) are constant in this context. In this formulation, we skipped the
constraints that y (k+1) > 0 and s(k+1) > 0. We will see what we can do to get positive values,
anyway.
91
Let f , g and h be a solution of (58). By construction, we have
g t h = −g t Ãf = 0t f = 0. (60)
This implies
t
b̃t y (k+1) − c̃t x(k+1) = Ã(x(k) + f ) + (s(k) + h) (y (k) + g) − c̃t (x(k) + f )
t
= Ã(x(k) + f ) (y (k) + g) + (m + 2)µ(k+1) − c̃t (x(k) + f ) (61)
t
= x(k) + f Ãt y (k) + (m + 2)µ(k+1) − c̃t (x(k) + f )
= (m + 2)µ(k+1)
(k)
Proof: Let S be an (m + 2) × (m + 2)-diagonal matrix with si as entry at position (i, i)
(k)
and Y be an (m + 2) × (m + 2)-diagonal matrix with yi as entry at position (i, i).
Then, the last condition of (58) is equivalent to
which is equivalent to
g + S −1 Y h = S −1 µ(k+1) 11m+2 − y (k) .
This implies
Ãt g + Ãt S −1 Y h = Ãt S −1 µ(k+1) 11m+2 − Ãt y (k) , (62)
and hence
Ãt S −1 Y h = Ãt S −1 µ(k+1) 11m+2 − c̃. (63)
With h = −Ãf this leads to
However, the matrix Ãt S −1 Y Ã is invertible, so f = (Ãt S −1 Y Ã)−1 (c̃ − Ãt S −1 µ(k+1) 11m+2 ) is the
unique solution of this last inequality. In particular, if (58) has a solution, this is the only
choice for f . By setting h = −Ãf , we fulfill the second constraint of (58). Finally, we set
g = S −1 µ(k+1) 11m+2 − y (k) − S −1 Y h (again the only choice) satisfying the third constraint of
(58).
Since we have chosen g and h such that (62) and (63) are met, we also have Ãt g = 0, so the
solution satisfies the first condition of (58). 2
92
In the above proof we have to solve an equation system −Ãt S −1 Y Ãf = Ãt S −1 µ(k+1) 11m+2 − c̃
in order to compute f . This equation system depends on the previous solutions s(k) and y (k) , so
here the sizes of the numbers to store the intermediate solutions could get too big. At the end
of this section, we will describe how to handle such issues.
s 2 s 2 sm+2
m+2 (k) (k) m+2 (k+1) (k+1) P gi hi 2
yi s i y s
We have σ (k) = − 1 and σ (k+1) =
P P
µ(k)
i i
µ(k+1)
− 1 = µ(k+1)
.
i=1 i=1 i=1
It remains to show that y (k+1) > 0 and s(k+1) > 0 and σ (k+1) ≤ 21 .
We first show that for an appropriate choice of µ(k+1) we get σ (k+1) ≤ 21 .
µ(k) 1
Lemma 64 (a) For i = 1, . . . , m + 2 we have (k) (k) ≤ 1−σ (k)
.
yi s i
√
m+2
(k) (k)
P
1 − yi s i
≤ σ (k) m + 2.
(b) µ(k)
i=1
Proof:
2 2
(k) (k) (k) (k)
Pm+2 yi s i yi s i
(a) We have (σ (k) )2 = i=1 µ(k)
− 1 , so µ(k)
−1 ≤ (σ (k) )2 which implies
(k) (k) (k) (k)
1 − yi (k)
si yi s i
(k)
µ ≤ σ and µ(k)
≥ 1 − σ (k) for i = 1, . . . , m + 2. This proves the claim.
(b) The statement is simply a special case of the Cauchy-Schwarz inequality that can be
proved as follows:
(k) (k) 2
!
m+2
y s
X
(σ (k) )2 (m + 2) − 1 − i (k)i
i=1
µ
!2
(k) (k) 2
m+2
m+2
(k) (k)
X yi si X yi si
= (m + 2) 1 − (k)
− 1 − (k)
µ µ
i=1 i=1
2
(k) (k)
m+2 (k) (k) m+2 m+2 (k) (k)
X y s X X y s yj s j
= (m + 1) 1 − i (k)i − 2 1 − i (k)i · 1 −
µ µ µ (k)
i=1 i=1 j=i+1
m+2
(k) (k)
!2
X m+2 X (k) (k)
yi si yj sj
= 1 − − 1 −
µ(k) µ(k)
i=1 j=i+1
≥ 0
93
Lemma 65 If δ = √1 (i.e. µ(k+1) = (1 − √1 )µ(k) ) then σ (k+1) < 21 .
8 m+2 8 m+2
r r
(k) (k)
si yi
Proof: Let Gi := gi (k) and Hi := hi (k) (for i ∈ {1, . . . , m + 2}).
yi µ(k+1) si µ(k+1)
v v
um+2 um+2
u X gi hi 2 uX
σ (k+1) = t
(k+1)
= t (Gi Hi )2
i=1
µ i=1
v !
u 1 m+2 m+2
u
X X
= t (G2 + Hi2 )2 − (G2i − Hi2 )2
4 i=1 i i=1
v
um+2 m+2
1u X 1X 2
≤ t 2
(G + Hi )2 2
≤ (G + Hi2 )
2 i=1 i 2 i=1 i
m+2 m+2
g t h=0 1X 1X 1 (k) (k) 2
= (Gi + Hi )2 = g i s i + hi y i
2 i=1 2 i=1 yi(k) s(k)
i µ
(k+1) | {z }
(k) (k)
=µ(k+1) −yi si
m+2 2 !2
(k) (k)
1X µ(k) µ(k+1) yi si
= −
2 i=1 yi(k) s(k)
i µ
(k+1) µ(k) µ(k)
m+2
!2
(k) (k)
1 X µ(k) 1 y s
= (k) (k)
−δ + 1 − i (k)i
2 i=1 yi si 1 − δ µ
| {z }
1
≤
1−σ (k)
(k) (k) 2
m+2
! m+2
! !
(k) (k)
1 X y s yi si
X
≤ (m + 2)δ 2
− 2δ 1 − i (k)i
+ 1−
2(1 − δ)(1 − σ (k) ) i=1
µ i=1
µ(k)
(k) (k) 2
! !
m+2 (k) (k) m+2
1 X y s X y s
≤ (m + 2)δ 2 + 2δ 1 − i (k)i + 1 − i (k)i
2(1 − δ)(1 − σ (k) ) µ µ
|i=1 {z
√
} i=1
| {z }
2
≤σ (k) m+2 =(σ )
(k)
1 √ 2
≤ (k)
(m + 2)δ 2 + 2δσ (k) m + 2 + σ (k)
2(1 − δ)(1 − σ )
1 √ 2
(k)
= m + 2δ + σ
2(1 − δ)(1 − σ (k) )
σ (k) ≤ 12
2 √ 2
√
1 1 8 m+2 1 1
≤ m + 2δ + = √ +
1−δ 2 8 m+2−1 8 2
1
≤ .
2
2
94
Lemma 66 We have y (k+1) > 0 and s(k+1) > 0.
(k+1) (k+1)
Proof: Claim: We have yi si > 0 for i = 1, . . . , m + 2.
Proof of the Claim:
(k+1) (k+1)
Assume that yj sj ≤ 0 for a j ∈ {1, . . . , m + 2}. Then,
m+2
!2 (k+1) (k+1)
!2
(k+1) (k+1)
(k+1) 2
X yi si yj sj
σ = (k+1)
−1 ≥ (k+1)
−1 ≥ 1,
i=1
µ µ
(k) (k)
which is a contradiction to the fact that si , yi , and µ(k+1) are positive.
2
95
7.3 Finding an Optimum Solution
Pm+2 yi si 2
Proof: By the condition i=1 µ
− 1 ≤ 41 , we get
µ 3µ
≤ y i si ≤ < 2µ
2 2
for all i ∈ {1, . . . , m + 2}. Moreover, st y = m+2
P
i=1 yi si ≤ 2(m + 2)µ.
(a) Since y ∗ is an optimum and y a feasible solution of the dual LP, we have b̃t y ≥ b̃t y ∗ and
thus
st y = b̃t y − xt Ãt y = b̃t y − c̃t x ≥ b̃t y ∗ − c̃t x = b̃t y ∗ − xt Ãt y ∗ = st y ∗ .
η
Let i ∈ {1, . . . , m + 2} with yi < 4(m+2)
. We have
µ 2(m + 2)µ st y
si ≥ > ≥ .
2yi η η
Assume that yi∗ > 0, so yi∗ ≥ η. This implies
st y
st y ∗ ≥ si yi∗ > · η = st y ≥ st y ∗ ,
η
which is a contradiction. Therefore, yi∗ = 0.
(b) The case is very similar to part (a): Since x∗ , s∗ is an optimum and x, s a feasible solution
of the primal LP, we have c̃t x ≤ c̃t x∗ and thus
µ 2(m + 2)µ st y
yi ≥ > ≥ .
2si η η
96
Assume that s∗i > 0, so s∗i ≥ η. This implies
st y
y t s∗ ≥ s∗i yi > η · = st y ≥ y t s∗ ,
η
There are several ways to find an optimum solution. Before we describe a method to round
an interior point directly to an optimum solution, we will present a simpler but less efficient
η2
method: We choose k big enough such that µ(k) < 32(m+2) 2 . Then, for each i ∈ {1, . . . , m + 2},
(k) η (k) η
we have yi < 4(m+2)
or si <4(m+2)
. Let Āt y = c̄ be the subsystem of Ãt y = c̃ consisting of
(k) η
the rows with indices i for which si
< 4(m+2) , so s∗i = 0. For all other rows, we know that
yi∗ = 0, so we can ignore them when computing an optimum solution for the dual LP. If Āt y = c̄
has only one solution, we compute it and get an optimum solution of the modified dual LP
(k) η
(53) (provided that the result is non-negative). Otherwise, we check if yi0 < 4(m+2) for some
i0 ∈ {1, . . . , m}. In this case we know that if the initial dual LP has an optimal solution, then
there is one with yi0 = 0. Hence we can start the whole process again but now without the
variable yi0 , without the row of A with index i0 and without the entry of b with index i0 . Hence
we have reduced the instance size, so this method will terminate after at most m iterations.
What can we do if there is no i ∈ {1, . . . , m} with y_i^{(k)} < η/(4(m+2))? To handle this case, we first make sure that the system Ãx = b̃ does not have a feasible solution. If it has a feasible solution (which can be checked by Gaussian Elimination), we modify b̃ slightly to a vector b^* such that Ãx = b^* has no feasible solution. To this end choose n linearly independent rows of A. These rows will define the solution of Ãx = b̃. Then, any modification of b outside these rows will make the system Ãx = b̃ infeasible. We simply add an ε > 0 to one of these entries of b. If ε is small enough, then an optimum solution of the dual LP with respect to b^* will still be an optimum solution of the original dual LP. To see that we can write ε with a polynomial number of bits, observe that the absolute value of the difference between the costs of two basic solutions of an LP is either 0 or can be bounded from below by some value 2^{−L} where L is polynomial in the input size. This follows from the fact that any basic solution can be written with a polynomial number of bits. Thus, the same is true for any difference u of two basic solutions and for the scalar product b̃^t u. Hence, b̃^t u is either zero or its absolute value is at least 2^{−L}. This implies that we can choose ε in such a way that it can be written with polynomially many bits and that no suboptimal solution can become optimal by the modification.
Now assume that the initial dual LP is bounded and feasible. Then, we can compute optimum solutions x^*, y^*, s^* of the modified LPs (53) and (54) by expanding optimum solutions of the initial primal and dual problems in a canonical way. In particular, we will set x_{n+1} to 0. Then Ax^* + s^* = b^* but Ax = b^* has no feasible solution. Hence, there must be an i_0 ∈ {1, . . . , m} with s_{i_0}^* > 0, so y_{i_0}^{(k)} < η/(4(m+2)) and y_{i_0}^* = 0. Again, we get rid of at least one dual variable and can restart the whole procedure on a smaller instance.
Now, we describe how we can avoid iterating the whole process:
Consider again the two problems (55) and (56). Theorem 13 implies that we can partition the index set {1, . . . , m + 2} of the dual variables into {1, . . . , m + 2} = B ∪̇ N such that for i ∈ B there is an optimum dual solution y^* with y_i^* > 0 and for i ∈ N there is an optimum primal solution x^*, s^* with s_i^* > 0. Any optimum solution can be written as a convex combination of basic solutions. Hence, in Lemma 67, for any i ∈ {1, . . . , m + 2} we can either have y_i < η/(4(m+2)) or s_i < η/(4(m+2)) but not both. Now we choose k big enough such that μ^{(k)} < η²/(32(m+2)²Δ²) for some Δ ≥ 1 that will be determined later. Then, for each i ∈ {1, . . . , m + 2}, exactly one of the inequalities y_i < η/(4(m+2)Δ) and s_i < η/(4(m+2)Δ) holds. Therefore, we can find the partitioning {1, . . . , m + 2} = B ∪̇ N. In particular, we have y_i ≥ η/(4(m+2)) for each i ∈ B and y_i < η/(4(m+2)Δ) for each i ∈ N.
Let A_B be the submatrix of Ã consisting of the rows with indices in B, and A_N be the submatrix of Ã consisting of the remaining rows. By y_B^{(k)}, y_N^{(k)}, b_B, b_N we denote the corresponding subvectors of the vectors y^{(k)} and b. As in the description of the Simplex Algorithm, the entries of e.g. y_B^{(k)} are not necessarily indexed from 1 to |B| but their index set is the set B ⊆ {1, . . . , m + 2}. We can assume that A_B has full column rank.
In the following, the vector norm is the Euclidean norm k · k2 and the matrix norm is the norm
induced by the Euclidean norm.
Theorem 68 Set Δ := max{√(m+2) · ‖A_B(A_B^t A_B)^{-1} A_N^t‖, 1}. Let k be big enough such that μ^{(k)} < η²/(32(m+2)²Δ²). Let Y_B be a diagonal matrix whose rows and columns are indexed with B such that the entry at position (i, i) is y_i^{(k)}. Define
\[ d_y := Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_N^t y_N^{(k)} \]
and ỹ_B = Y_B d_y + y_B^{(k)}. Then:
(a) Ã^t ỹ = c̃, where ỹ ∈ R^{m+2} denotes the vector which arises from ỹ_B by adding zeros for the entries with index in N.
(b) ỹ_B > 0.
(c) The vector ỹ is an optimum dual solution.
Proof:
(b) We have
\begin{align*}
\|d_y\| &= \left\|Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_N^t y_N^{(k)}\right\|\\
&= \left\|Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_B^t Y_B \, Y_B^{-1} A_B \left(A_B^t A_B\right)^{-1} A_N^t y_N^{(k)}\right\|\\
&\le \underbrace{\left\|Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_B^t Y_B\right\|}_{=1} \cdot \left\|Y_B^{-1} A_B \left(A_B^t A_B\right)^{-1} A_N^t y_N^{(k)}\right\|\\
&\le \underbrace{\|Y_B^{-1}\|}_{\le \frac{4(m+2)}{\eta}} \cdot \underbrace{\left\|A_B \left(A_B^t A_B\right)^{-1} A_N^t\right\|}_{\le \frac{\Delta}{\sqrt{m+2}}} \cdot \underbrace{\|y_N^{(k)}\|}_{< \sqrt{m+2}\cdot\frac{\eta}{4(m+2)\Delta}}\\
&\le 1,
\end{align*}
where the second equality holds because A_B^t Y_B Y_B^{-1} A_B (A_B^t A_B)^{-1} = I_n.
(c) By (a), we have Ãt ỹ = c̃, and by (b), we know that ỹB > 0, so we have ỹ ≥ 0. Hence ỹ
is a feasible dual solution. Moreover, we know that there is a feasible primal solution in
which the slack variables si are zero for i ∈ B. Hence, by complementary slackness, ỹ is
an optimum dual solution. 2
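The rounding step of Theorem 68 is easy to carry out numerically. The following is a minimal sketch (added as an illustration): it assumes that Ã is given as a numpy array with one row per dual variable and that B and N are index arrays forming the partition described above; the function name is made up.

```python
import numpy as np

def round_to_optimal_dual(A_tilde, y_k, B, N):
    """Compute the vector y-tilde of Theorem 68 from an interior point y_k:
    d_y := Y_B A_B (A_B^t Y_B^2 A_B)^{-1} A_N^t y_N  and  y~_B = Y_B d_y + y_B,
    with zeros for the entries indexed by N."""
    A_B, A_N = A_tilde[B, :], A_tilde[N, :]
    Y_B = np.diag(y_k[B])                 # diagonal matrix with entries y_i^(k), i in B
    M = A_B.T @ Y_B @ Y_B @ A_B           # A_B^t (Y_B)^2 A_B
    d_y = Y_B @ A_B @ np.linalg.solve(M, A_N.T @ y_k[N])
    y_tilde = np.zeros_like(y_k)
    y_tilde[B] = Y_B @ d_y + y_k[B]       # entries with index in N stay zero
    return y_tilde
```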
8 Integer Linear Programming
Imposing integrality constraints on all or some variables of a linear program makes it possible to model many conditions that cannot be described by linear constraints alone. For example, even if we only consider Binary Linear Programs (i.e. all integrality constraints are of the type x ∈ {0, 1}), we can easily model the following kinds of conditions for variables x, y:
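For instance (these are standard examples of such conditions, given here as an illustration), for binary variables x, y ∈ {0, 1} one can model
\[ x \le y \;\;(\text{"if } x = 1 \text{ then } y = 1\text{"}), \qquad x + y \le 1 \;\;(\text{"at most one of } x, y \text{ equals } 1\text{"}), \qquad x + y = 1 \;\;(\text{"exactly one of } x, y \text{ equals } 1\text{"}). \]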
On the other hand, we have already seen that there are NP-hard optimization problems that
can be modeled as (mixed) integer linear programs. Hence, we cannot hope for polynomial-time
algorithms to solve general ILPs.
Fig. 8: A polyhedron P (given by the red hyperplanes) and its integer hull PI (green). The
black dots indicate the integral vectors.
Observations:
• PI is not necessarily a polyhedron.
Definition 22 A polyhedron P is called integral if P = PI .
In this section, our goal is to find a certificate that a given system of equations does not have
any integral solution (which will be the result of Corollary 73).
The following operations on matrices are called elementary unimodular column operations:
• exchanging two columns,
• multiplying a column by −1,
• adding an integral multiple of one column to another column.
Proof: We may assume that A is integral. Assume that we have already transformed A into a matrix
\[ \begin{pmatrix} F & 0\\ G & H \end{pmatrix} \]
where F is a lower triangular matrix with positive diagonal. Let h_{11}, . . . , h_{1k} be the first row of H. Apply elementary unimodular column operations to H such that all h_{1j} are non-negative and such that \(\sum_{j=1}^k h_{1j}\) is as small as possible. We may assume that h_{11} ≥ h_{12} ≥ · · · ≥ h_{1k}. Then, h_{11} > 0 because A has rank m. Moreover, h_{1j} = 0 for j ∈ {2, . . . , k} because otherwise subtracting h_{1j} from h_{11} would reduce \(\sum_{j=1}^k h_{1j}\). Hence, we have obtained a larger lower triangular matrix F'.
We iterate this step and end up with a matrix [B 0] where B is a lower triangular matrix with positive diagonal. Denote the entries of B by b_{ij} (i = 1, . . . , m, j = 1, . . . , m). Finally, we perform for i = 2, . . . , m the following steps: For j = 1, . . . , i − 1 add an integer multiple of the i-th column of B to the j-th column of B such that b_{ij} is non-negative and less than b_{ii}. 2
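The construction in this proof can be carried out algorithmically. Below is a minimal sketch (added as an illustration), assuming an integral input matrix of full row rank; numpy is used only for array handling, and the function name is made up.

```python
import numpy as np

def hermite_normal_form(A):
    """Column-operation sketch of the proof: transform an integral matrix A of
    full row rank into [B 0] with B lower triangular, positive diagonal, and
    0 <= b_ij < b_ii for j < i. Returns the transformed matrix."""
    H = np.array(A, dtype=int)
    m, n = H.shape
    for i in range(m):
        # make the entries of row i in columns i..n-1 non-negative
        for j in range(i, n):
            if H[i, j] < 0:
                H[:, j] *= -1
        # Euclidean-style reduction until only one non-zero entry remains in row i
        while np.count_nonzero(H[i, i:]) > 1:
            cols = [j for j in range(i, n) if H[i, j] > 0]
            j_min = min(cols, key=lambda j: H[i, j])
            for j in cols:
                if j != j_min:
                    H[:, j] -= (H[i, j] // H[i, j_min]) * H[:, j_min]
        # move the remaining non-zero column to position i
        j_piv = i + int(np.nonzero(H[i, i:])[0][0])
        H[:, [i, j_piv]] = H[:, [j_piv, i]]
        # reduce the earlier columns of row i (final normalization step)
        for j in range(i):
            H[:, j] -= (H[i, j] // H[i, i]) * H[:, i]
    return H
```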
Proof: “⇒:” If x and y^t A are integral vectors and Ax = b, then y^t Ax = y^t b is also integral.
“⇐:” Assume that b^t y is integral for each y ∈ Q^m for which A^t y is integral. Then, Ax = b must have a (fractional) solution, since otherwise, by Farkas’ Lemma (Corollary 7), there would be a vector y ∈ Q^m with y^t A = 0 and y^t b = −1/2. Thus, we may assume that the rows of A are linearly independent, so A has rank m.
It is easy to check that the statement to be proved holds for A if and only if it holds for any matrix Ã where Ã arises from A by applying an elementary unimodular column operation. Hence, we can assume that A is in Hermite normal form [B 0]. Thus B^{-1}[B 0] = [I_m 0] is an integral matrix. Therefore, by our assumption (applied to the rows of B^{-1}), B^{-1}b is an integral vector. Since
\[ [B\; 0]\begin{pmatrix} B^{-1}b\\ 0 \end{pmatrix} = b, \]
the vector x := (B^{-1}b, 0)^t is an integral solution of [B 0] x = b. 2
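As a small worked example of this certificate (added for concreteness): for A = (2) and b = (1), the system Ax = b has no integral solution, and the vector y = 1/2 certifies this, since A^t y = 1 is integral while y^t b = 1/2 is not.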
8.3 TDI Systems
Theorem 74 Let P = {x ∈ Rn | Ax ≤ b} be a rational polyhedron. Then the following statements are equivalent:
(a) P is integral.
(b) Each face of P contains at least one integral vector.
(c) Each minimal face of P contains at least one integral vector.
(d) Each supporting hyperplane of P contains at least one integral vector.
(e) Each rational supporting hyperplane of P contains at least one integral vector.
(f ) max{ct x | x ∈ P } is attained by an integral vector for each c for which the maximum
is finite.
(g) max{ct x | x ∈ P } is an integer for each integral vector c for which the maximum is
finite.
Proof: The following implications are obvious: “(b) ⇔ (c)”, “(b) ⇒ (d)”, “(d) ⇒ (e)”, and
“(f) ⇒ (g)”
“(a) ⇒ (b):” Assume that P is integral. Let F = P ∩ H be a face of P where H = {x ∈ Rn | c^t x = δ} is a supporting hyperplane of P. Then, any z ∈ F is a convex combination of integral vectors v_1, . . . , v_k of P. If v_i ∈ P \ F (so c^t v_i < δ) for some i ∈ {1, . . . , k}, then (since c^t z = δ) there must be a j ∈ {1, . . . , k} with c^t v_j > δ, which is a contradiction to v_j ∈ P. Thus, all v_i must be in F, so in particular F contains an integral vector.
“(c) ⇒ (f):” Follows from Corollary 19.
“(f) ⇒ (a):” Assume that (f) holds but P 6= PI . Then, there is an x∗ ∈ P \ PI . By Theorem 70,
PI is a polyhedron, so there is an inequality at x ≤ β that is valid for PI but not for x∗ , so
at x∗ > β. This is a contradiction to (f) because max{at x | x ∈ P } is finite (by Proposition 71)
but is not attained by any integral vector.
So far, we have proved that (a),(b),(c), and (f) are equivalent.
“(e) ⇒ (c):” We may assume that A and b are integral. Let F = {x ∈ Rn | A'x = b'} be a minimal face of P (where A'x ≤ b' is a subsystem of Ax ≤ b). If there is no integral vector x with A'x = b', then, by Corollary 73, there must be a rational vector y such that c := (A')^t y is integral while δ := y^t b' is not an integer. Moreover, we may assume that all entries of y are positive (otherwise we add an appropriate integral vector to y). Since c is integral but δ is not integral, the rational hyperplane H := {x ∈ Rn | c^t x = δ} does not contain any integral vector.
We will show that H ∩ P = F which implies that H is a supporting hyperplane. By construction, we have F ⊆ H, so we have to show that H ∩ P ⊆ F. Let x ∈ H ∩ P. Then, y^t A'x = c^t x = δ = y^t b', so y^t(A'x − b') = 0. Thus, since all components of y are positive, A'x = b', so x ∈ F.
Now, we know that (a),(b),(c),(d),(e), and (f) are equivalent.
“(g) ⇒ (e):” Let H = {x ∈ Rn | ct x = δ} be a rational supporting hyperplane of P , so
max{ct x | x ∈ P } = δ. Assume that H does not contain any integral vector. Then, by
Corollary 73, there is a positive number γ for which γc is integral but γδ is not integral. Then
max{(γc)t x | x ∈ P } = γ max{ct x | x ∈ P } = γδ 6∈ Z, so the statement of (g) is false.
Since “(f) ⇒ (g)” is trivial, this shows the equivalence of all statements. 2
Note that this Theorem implies that for any rational polyhedron P ⊆ Rn with P = PI and
any rational vector c there is a polynomial-time algorithm computing a vector x ∈ P ∩ Zn
maximizing ct x over P ∩ Zn , provided that there is an optimum solution. To this end, we only
have to compute an integral element of a minimal face F consisting of optimum solutions only
(for finding F we can apply the Ellipsoid Method). This can be done by computing an
integral solution of an equation system, which is possible in polynomial time by the method
described in the proof of Corollary 73.
Moreover, by the equivalence of (f) and (g), the existence of an integral solution can be deduced
from the integrality of the solution value. This motivates the following definition:
Note that total dual integrality is in fact a property of the system of inequalities, not just of
the polyhedron that is defined by them. For example the systems
\[ \begin{pmatrix} 1 & 1\\ 1 & 0\\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \le \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \le \begin{pmatrix} 0\\ 0 \end{pmatrix} \]
define the same polyhedron. But it is easy to check that the first system of inequalities is TDI
while the second one is not TDI.
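To see the difference concretely (an added check), consider the integral vector c = (1, 0)^t: for the first system, the dual LP min{0 | y_1 + y_2 + y_3 = 1, y_1 − y_3 = 0, y ≥ 0} has the integral optimum solution y = (0, 1, 0)^t, whereas for the second system the only feasible solution of min{0 | y_1 + y_2 = 1, y_1 − y_2 = 0, y ≥ 0} is y = (1/2, 1/2)^t, which is not integral.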
Theorem 75 Let A ∈ Qm×n and b ∈ Zm such that Ax ≤ b is totally dual integral. Then,
the polyhedron P = {x ∈ Rn | Ax ≤ b} is integral.
Proof: Let c be an integral vector for which max{c^t x | Ax ≤ b} is finite. Since Ax ≤ b is totally dual integral, the minimum in the LP-duality equation max{c^t x | Ax ≤ b} = min{b^t y | A^t y = c, y ≥ 0} is attained by an integral vector y, and since b is integral, the common optimum value b^t y is an integer. Thus, max{c^t x | x ∈ P} is an integer for each integral vector c for which the maximum is finite, so by the implication “(g) ⇒ (a)” of Theorem 74, P is integral. 2
Proof: Exercise. 2
Hence, if a system Ax ≤ b is not TDI, then no proper subsystem A0 x ≤ b0 with {x ∈ Rn | Ax ≤
b} = {x ∈ Rn | A0 x ≤ b0 } can be TDI. We call a system Ax ≤ b minimally TDI if it is TDI
but no proper subsystem of Ax ≤ b defining the same polyhedron is TDI.
\[ \max\{c^tx \mid Ax \le b,\ a^tx = \beta\} \;=\; \min\{b^ty + \beta(\lambda-\mu) \mid y \ge 0,\ \lambda,\mu \ge 0,\ A^ty + (\lambda-\mu)a = c\} \tag{64} \]
is finite. Let x^*, y^*, λ^*, µ^* be optimum primal and dual solutions. Set c̃ := c + ⌈µ^*⌉a. Then,
\[ \max\{\tilde c^tx \mid Ax \le b,\ a^tx \le \beta\} \;=\; \min\{b^ty + \beta\lambda \mid y \ge 0,\ \lambda \ge 0,\ A^ty + \lambda a = \tilde c\} \tag{65} \]
is finite because x^* is feasible for the maximum and y^* and λ^* + ⌈µ^*⌉ − µ^* are feasible for the minimum.
Since Ax ≤ b, a^tx ≤ β is a TDI-system, the minimum in equation (65) has an integer optimum solution ỹ, λ̃. Then, y := ỹ, λ := λ̃, µ := ⌈µ^*⌉ is an integer optimum solution for the minimum in (64): it is obviously feasible, and its cost is
\[ b^t\tilde y + \beta(\tilde\lambda - \lceil\mu^*\rceil) \;=\; \big(b^t\tilde y + \beta\tilde\lambda\big) - \beta\lceil\mu^*\rceil \;\le\; \big(b^ty^* + \beta(\lambda^* + \lceil\mu^*\rceil - \mu^*)\big) - \beta\lceil\mu^*\rceil \;=\; b^ty^* + \beta(\lambda^* - \mu^*). \]
The inequality follows from the fact that y ∗ , λ∗ + dµ∗ e − µ∗ is feasible for the minimum in (65)
and ỹ, λ̃ is an optimum solution for the minimum in (65). Hence, the minimum in (64) has an
integral optimum solution, so Ax ≤ b, at x = β is TDI. 2
Theorem 78 Every rational polyhedral cone is generated by an integral Hilbert basis.
is integral and an element of P. Thus (since {b_1, . . . , b_k} ⊆ H), b can be written as a non-negative integral combination of the elements of H. This shows that H is a Hilbert basis. 2
Notation: For a system of inequalities Ax ≤ b and a face F of {x ∈ Rn | Ax ≤ b}, we call a row of A active if the corresponding inequality in Ax ≤ b is satisfied with equality for all x ∈ F.
Proof: “⇒:” Suppose that Ax ≤ b is TDI. Let F be a minimal face of P and let a1 , . . . , at be
the rows of A that are active for F . We have to show that {a1 , . . . , at } is a Hilbert basis. Let
c be an integral vector in cone({a1 , . . . , at }). We have to write c as an integral non-negative
combination of a1 , . . . , at . The maximum in the LP-duality equation
is attained by every vector x in F . Since Ax ≤ b is TDI, the dual problem has an integral
optimum solution y. By complementary slackness, the entries of y at positions corresponding
to rows that are not active in F are 0. Thus, c is an integral non-negative combination of
a1 , . . . , a t .
“⇐:” Assume that for each minimal face F of P , the rows that are active in F form a Hilbert
basis. Let c be an integral vector for which the optima in (66) are finite. We have to show that
the minimum is attained by an integral vector. Let F be a minimal face of P such that each
vector in F attains the maximum in the duality equation. Let a1 , . . . , at be rows of A that are
active in F. Then, by complementary slackness, c ∈ cone({a_1, . . . , a_t}). Since a_1, . . . , a_t form a Hilbert basis, we can write c = \(\sum_{i=1}^t \lambda_i a_i\) for certain non-negative integral numbers λ_1, . . . , λ_t. We can extend (λ_1, . . . , λ_t) with zero-components to a vector y ∈ Zm with y ≥ 0, A^t y = c and
bt y = xt At y = ct x for all x ∈ F . In other words, y is an integral optimum solution of the dual
LP. 2
Theorem 80 The rational system of inequalities Ax ≤ 0 is TDI if and only if the rows
of A form a Hilbert basis.
Proof: Follows from the previous Theorem with b = 0 (note that in the unique minimal face
of {x ∈ Rn | Ax ≤ 0} all rows of A are active). 2
Theorem 81 (Giles and Pulleyblank [1979]) For each rational polyhedron P ⊆ Rn there
exists a rational TDI-system Ax ≤ b with A ∈ Zm×n and P = {x ∈ Rn | Ax ≤ b}. The
vector b can be chosen to be integral if and only if P is integral.
Proof: We can assume w.l.o.g. that P ≠ ∅. For each minimal face F of P, we define
\[ C_F := \{c \in \mathbb{R}^n \mid c^tz = \max\{c^tx \mid x \in P\} \text{ for all } z \in F\}. \]
Then, C_F is a polyhedral cone. To see this, assume that P = {x ∈ Rn | Ãx ≤ b̃} is some description of P. Then C_F is generated by the rows of Ã that are active in F.
Let F be a minimal face, and let a1 , . . . , at be an integral Hilbert basis generating CF . Choose
x0 ∈ F , and define βi := ati x0 for i = 1, . . . , t. Then, βi = max{ati x | x ∈ P } (i = 1, . . . , t). Let
SF be the system at1 x ≤ β1 , . . . , att x ≤ βt . All inequalities in SF are valid for P . Let Ax ≤ b
be the union of the systems SF over all minimal faces F of P . Then, P ⊆ {x ∈ Rn | Ax ≤ b}.
On the other hand, if x∗ ∈ Rn \ P , then there is a supporting hyperplane of P separating x∗
from P , and this supporting hyperplane touches P in a minimal face, so there is an inequality
in Ax ≤ b that is violated by x∗ . Hence, P = {x ∈ Rn | Ax ≤ b}. Moreover, by Theorem 79,
Ax ≤ b is TDI.
If P is integral, then all the β_i can be chosen to be integral because we can choose the vectors x_0 ∈ F as integral vectors. On the other hand, if b is integral, then by Theorem 75, P is integral.
2
For the primal-dual pair max{c^t x | Ax ≤ b} = min{b^t y | A^t y = c, y ≥ 0} we know (by the Simplex Algorithm) that if both optima are finite, the minimization problem has an optimum solution y with at most rank(A) non-zero entries. If we ask for an optimum integral solution (with Ax ≤ b TDI and b integral), this is not necessarily the case: see
\[ A = \begin{pmatrix} 2\\ -3 \end{pmatrix}, \qquad b = \begin{pmatrix} 0\\ 0 \end{pmatrix}, \qquad c = (1). \]
Nevertheless, for full-dimensional solution spaces, we get the following bound on the number of
non-zero entries:
max{ct x | Ax ≤ b} = min{bt y | At y = c, y ≥ 0}
are finite. Then, the minimization problem has an integral optimum solution y with at
most 2r − 1 positive components where r := rank(A).
Since C is pointed, the maximum is finite (check that the dual LP min{c^t y | y^t a_i ≥ 1 for i ∈ {1, . . . , t}} is feasible). We can assume that at most k of the λ_i are non-zero. Define
\[ c' := c - \sum_{i=1}^t \lfloor\lambda_i\rfloor a_i = \sum_{i=1}^t (\lambda_i - \lfloor\lambda_i\rfloor) a_i. \]
Then, c' is an integral vector in C, so we can write it as c' = \(\sum_{i=1}^t \mu_i a_i\) for some integral numbers µ_1, . . . , µ_t ≥ 0. Since λ_1, . . . , λ_t was an optimum solution of (67) and µ_1 + ⌊λ_1⌋, . . . , µ_t + ⌊λ_t⌋ is a feasible solution, we have \(\sum_{i=1}^t \mu_i + \sum_{i=1}^t \lfloor\lambda_i\rfloor \le \sum_{i=1}^t \lambda_i\), so
\[ \sum_{i=1}^t \mu_i \;\le\; \sum_{i=1}^t \lambda_i - \sum_{i=1}^t \lfloor\lambda_i\rfloor \;<\; k \]
because at most k of the λ_i are non-zero. Thus, at most k − 1 of the µ_i are non-zero. Therefore, the decomposition
\[ c = \sum_{i=1}^t (\lfloor\lambda_i\rfloor + \mu_i) a_i \]
Thus there would be inequalities v t x ≤ β1 and −v t x ≤ β2 (for some numbers β1 , β2 ) that can
be written as non-negative combinations of inequalities in Ax ≤ b corresponding to rows of A
that are active in F . For x ∈ F we have v t x = β1 and −v t x = β2 , so this would imply β1 = −β2
and P would be contained in {x ∈ Rn | v t x = β1 }, which is a contradiction to the assumption
that P is full-dimensional.
By Theorem 79, the rows that are active for a minimal face consisting of optimum solutions of
max{ct x | Ax ≤ b} form a Hilbert basis (because Ax ≤ b is TDI). 2
In this section, we want to identify integral matrices A such that Ax ≤ b, x ≥ 0 is TDI for
any vector b. It will turn out that these are exactly the totally unimodular matrices (see
Corollary 87).
In particular, a regular square matrix is unimodular if and only if it is integral and its determinant
is −1 or 1. Moreover, by Cramer’s rule, the inverse of any unimodular square matrix is an
integral matrix.
Exercise: Check that any series of elementary unimodular column operations, applied to a
matrix A (see Chapter 8.2), can be performed by multiplying A from the right by an appropriate
regular unimodular square matrix.
Theorem 83 Let A be a totally unimodular matrix, and let b be an integral vector. Then,
the polyhedron P = {x ∈ Rn | Ax ≤ b} is integral.
Proof: Let F be a minimal face of P . We will show that F contains an integral vector. By
the implication “(c) ⇒ (a)” of Theorem 74 this is sufficient to prove that P is integral.
By Proposition 22, we can write the minimal face as F = {x ∈ Rn | A'x = b'} where A'x ≤ b' is a subsystem of Ax ≤ b. We can assume that A' has full row rank. By permuting coordinates, we can write A' = [U V] for some matrix U with det(U) ∈ {−1, 1}. Thus
\[ x := \begin{pmatrix} U^{-1}b'\\ 0 \end{pmatrix} \]
is an integral vector in F. 2
Theorem 84 Let A ∈ Zm×n be a matrix with rank m. Then A is unimodular if and only
if for each integral vector b the polyhedron {x ∈ Rn | Ax = b, x ≥ 0} is integral.
Proof: “⇒:” Assume that A is unimodular, and let b be an integral vector. Let x0 be a vertex
of {x ∈ Rn | Ax = b, x ≥ 0}. This means that there are n linearly independent constraints
in the system Ax ≤ b, −Ax ≤ −b, −In x ≤ 0 that are satisfied by x0 with equality. Thus, the
columns of A corresponding to non-zero entries of x0 are linearly independent. This set of
columns can be extended to a regular m × m-submatrix B of A. Then, the restriction of x0 to
coordinates corresponding to B is B −1 b. This is integral (because det(B) ∈ {−1, 1}). The other
entries of x0 are zero, so x0 is integral.
“⇐:” Suppose that {x ∈ Rn | Ax = b, x ≥ 0} is integral for every integral vector b. Let B be
a regular m × m-submatrix of A. We have to show that det(B) ∈ {−1, 1}. To this end, it
is sufficient to show that B −1 u is integral for every integral vector u (by Cramer’s rule). So
let u be an integral vector. Then, there is an integral vector y such that z := y + B −1 u ≥ 0.
Then, b := Bz is integral. Let z 0 be a vector with Az 0 = Bz = b that arises from z by adding
zero-entries. Then, z 0 is a feasible (i.e. non-negative) basic solution of Ax = b, so it is a vertex
of {x ∈ Rn | Ax = b, x ≥ 0}. Therefore z 0 is integral, which also shows that z is integral. This
implies that B −1 u = z − y is integral. 2
Proof: The matrix A is totally unimodular if and only if [I_m A] is unimodular. Let b be an integral vector. Then, the vertices of {x ∈ Rn | Ax ≤ b, x ≥ 0} are integral if and only if the vertices of {z ∈ R^{m+n} | [I_m A] z = b, z ≥ 0} are integral. Thus, the statement follows from Theorem 84. 2
Corollary 86 An integral matrix A is totally unimodular if and only if for all integral vectors b and c both optima in the duality equation
\[ \max\{c^tx \mid Ax \le b,\ x \ge 0\} = \min\{b^ty \mid A^ty \ge c,\ y \ge 0\} \]
are attained by integral vectors (if they are finite).
Proof: Follows directly from Hoffman and Kruskal’s Theorem (Theorem 85) using the fact that a matrix is totally unimodular if and only if its transposed matrix is totally unimodular. 2
Proof: “⇒:” If A is totally unimodular, then also At is totally unimodular. Thus, by Theo-
rem 85, min{bt y | At y ≥ c, y ≥ 0} is attained by an integral vector for each vector b and each
integral vector c for which the minimum is finite. This implies that the system Ax ≤ b, x ≥ 0 is
TDI for each vector b.
“⇐:” Suppose that Ax ≤ b, x ≥ 0 is TDI for each vector b. By Theorem 75 this implies that the
polyhedron {x ∈ Rn | Ax ≤ b, x ≥ 0} is integral for each integral vector b. By Theorem 85, this
means that A is totally unimodular. 2
The following theorem provides a certificate for showing that a matrix is totally unimodular.
Proof: “⇒:” Let A be totally unimodular and R ⊆ {1, . . . , n}. Let d ∈ {0, 1}^n be the characteristic vector of R, i.e.
\[ d_r = \begin{cases} 1 & \text{for } r \in R\\ 0 & \text{for } r \in \{1, \dots, n\} \setminus R \end{cases} \]
Since A is totally unimodular, the matrix \(\begin{pmatrix} A\\ -A\\ I_n \end{pmatrix}\) is also totally unimodular. Thus, the polytope
\[ P := \left\{x \in \mathbb{R}^n \;\middle|\; Ax \le \left\lceil\tfrac{1}{2}Ad\right\rceil,\ Ax \ge \left\lfloor\tfrac{1}{2}Ad\right\rfloor,\ x \le d,\ x \ge 0\right\} \]
is integral. It is non-empty since it contains \tfrac{1}{2}d, so it contains an integral vector z, and 0 ≤ z ≤ d implies z ∈ {0, 1}^n.
Define R_1 := {r ∈ R | z_r = 0} and R_2 := {r ∈ R | z_r = 1}. For i ∈ {1, . . . , m}, this yields
\[ \sum_{j \in R_1} a_{ij} - \sum_{j \in R_2} a_{ij} = \sum_{j=1}^n a_{ij}(d_j - 2z_j) \in \{-1, 0, 1\}, \]
so R_1, R_2 is a partition of R as required.
“⇐:” Assume that for each R ⊆ {1, . . . , n} there are sets R_1, R_2 ⊆ R with R = R_1 ∪̇ R_2 as described in the theorem. We show by induction on k that every k × k-submatrix of A has determinant −1, 0, or 1. For k = 1 this follows from the criterion for |R| = 1.
Let k > 1. Let B = (b_{ij})_{i,j∈{1,...,k}} be a submatrix of A. We can assume that B is non-singular because otherwise its determinant is 0.
By Cramer’s rule, each entry of B^{-1} is of the form det(B')/det(B) where B' arises from B by replacing a column by a unit vector. By the induction hypothesis det(B') ∈ {−1, 0, 1}. Hence, all entries of the matrix B^* := det(B) · B^{-1} are in {−1, 0, 1}.
Let b^* be the first column of B^*. Then, Bb^* = det(B) e_1 where e_1 is the first unit vector. We define R := {j ∈ {1, . . . , k} | b^*_j ≠ 0}. For i ∈ {2, . . . , k}, we have 0 = (Bb^*)_i = \(\sum_{j \in R} b_{ij} b^*_j\), so |{j ∈ R | b_{ij} ≠ 0}| is even.
Let R = R_1 ∪̇ R_2 such that \(\sum_{j \in R_1} b_{ij} - \sum_{j \in R_2} b_{ij} \in \{-1, 0, 1\}\) for all i ∈ {1, . . . , k}. Thus, for i ∈ {2, . . . , k}, we have (since |{j ∈ R | b_{ij} ≠ 0}| is even): \(\sum_{j \in R_1} b_{ij} - \sum_{j \in R_2} b_{ij} = 0\). If we also had \(\sum_{j \in R_1} b_{1j} - \sum_{j \in R_2} b_{1j} = 0\), then the columns of B would not be linearly independent. Hence, \(\sum_{j \in R_1} b_{1j} - \sum_{j \in R_2} b_{1j} \in \{-1, 1\}\) and thus, Bx ∈ {e_1, −e_1} where the vector x ∈ {−1, 0, 1}^k is defined by
\[ x_j = \begin{cases} 1 & \text{for } j \in R_1\\ -1 & \text{for } j \in R_2\\ 0 & \text{for } j \in \{1, \dots, k\} \setminus R \end{cases} \]
Therefore, b∗ = det(B)B −1 e1 ∈ {det(B)x, −det(B)x}. But both b∗ and x are non-zero vectors
with entries -1,0,1 only, so we can conclude that det(B) ∈ {−1, 1}. 2
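For small matrices, the criterion of Theorem 88 can be checked directly by brute force. The following sketch (an added illustration; the function name is made up, and the enumeration is exponential in the number of columns) tries all column subsets R and all sign patterns:

```python
from itertools import combinations, product
import numpy as np

def is_totally_unimodular(A):
    """Brute-force check of the criterion of Theorem 88: for every column
    subset R there must be a partition R = R1 u R2 such that every entry of
    sum_{j in R1} a_j - sum_{j in R2} a_j lies in {-1, 0, 1}."""
    A = np.asarray(A)
    n = A.shape[1]
    for r in range(1, n + 1):
        for R in combinations(range(n), r):
            found = False
            for signs in product([1, -1], repeat=r):
                v = sum(s * A[:, j] for s, j in zip(signs, R))
                if np.all(np.abs(v) <= 1):
                    found = True
                    break
            if not found:
                return False
    return True

# Example: the incidence matrix of a path on 3 vertices (a bipartite graph) is TU.
print(is_totally_unimodular([[1, 0], [1, 1], [0, 1]]))   # True
```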
This result allows us to prove total unimodularity for some quite important matrices: The incidence matrix of an undirected graph G is the matrix A_G = (a_{v,e})_{v∈V(G), e∈E(G)} which is defined by
\[ a_{v,e} = \begin{cases} 1, & \text{if } v \in e\\ 0, & \text{if } v \notin e \end{cases} \]
The incidence matrix of a directed graph G is the matrix A_G = (a_{v,e})_{v∈V(G), e∈E(G)} which is defined by
\[ a_{v,(x,y)} = \begin{cases} -1, & \text{if } v = x\\ 1, & \text{if } v = y\\ 0, & \text{if } v \notin \{x, y\} \end{cases} \]
Proof: Let G be an undirected graph and A_G its incidence matrix. Since a matrix is TU if and only if its transposed matrix is TU, we can apply Theorem 88 to the rows of A_G: A_G is TU if and only if for each X ⊆ V(G) there is a partition X = A ∪̇ B with E(G[A]) = E(G[B]) = ∅. The last condition is satisfied if and only if G is bipartite. 2
Applications:
• The previous theorem can be used to show König’s Theorem: The maximum cardinality
of a matching in a bipartite graph equals the minimum cardinality of a vertex cover.
To see this, let G be a bipartite graph and A_G its incidence matrix. Then, a maximum matching is given by an integral solution of max{\mathbb{1}_m^t x | A_G x ≤ \mathbb{1}_n, x ≥ 0} and a minimum vertex cover by an integral solution of min{\mathbb{1}_n^t y | A_G^t y ≥ \mathbb{1}_m, y ≥ 0}. By the previous theorem, A_G is TU, so by Corollary 86 both optima are attained by integral vectors.
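A quick computational check of this (an added illustration using scipy; the bipartite graph below is a toy example, not taken from the lecture):

```python
import numpy as np
from scipy.optimize import linprog

# Fractional matching LP  max{1^t x | A_G x <= 1, x >= 0}  for the bipartite
# graph with vertices {1,2,3,4} and edges {1,3}, {1,4}, {2,4}.
A_G = np.array([[1, 1, 0],    # vertex 1 lies on edges 1 and 2
                [0, 0, 1],    # vertex 2 lies on edge 3
                [1, 0, 0],    # vertex 3 lies on edge 1
                [0, 1, 1]])   # vertex 4 lies on edges 2 and 3
res = linprog(c=-np.ones(3), A_ub=A_G, b_ub=np.ones(4), bounds=(0, None))
print(res.x)      # optimal vertex, here (1, 0, 1): the matching {1,3}, {2,4}
print(-res.fun)   # 2 = maximum matching size = minimum vertex cover size
```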
Proof: Again, we apply Theorem 88 to the transpose of the incidence matrix. For any set R ⊆ {1, . . . , m} we can choose R_1 := R and R_2 := ∅, which satisfies the constraints of Theorem 88. 2
Remark: This result explains the existence of integral optimum solutions of flow problems. These results can be extended to more general linear objective functions on the edges of directed graphs (see exercises).
The general strategy of cutting-plane methods can be described as follows: Assume that we are given a polyhedron P and we want to optimize a linear function over the integral vectors in P. To this end, we first find an optimum solution x^* over P. If it belongs to P_I, we are done, because then we can also easily compute an integral solution of the same cost. Otherwise we look for a hyperplane separating x^* from P_I, so we ask for a vector c and a number δ such that c^t x ≤ δ for all x ∈ P_I but c^t x^* > δ. Then, we add the constraint c^t x ≤ δ, solve the linear program again, and iterate these steps until we get an integral solution.
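This loop can be written down generically. The sketch below is only an illustration: `separate` is a hypothetical user-supplied separation routine, and scipy's LP solver is used for the relaxations.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(c, A, b, separate, max_iter=100):
    """Generic cutting-plane loop: maximize c^t x over Ax <= b and, as long as
    the LP optimum x* is not integral, ask separate(x*) for a cut (d, delta)
    with d^t x <= delta valid for P_I but violated by x*."""
    A, b = np.array(A, dtype=float), np.array(b, dtype=float)
    for _ in range(max_iter):
        res = linprog(-np.asarray(c, dtype=float), A_ub=A, b_ub=b, bounds=(None, None))
        if not res.success:
            return None
        x = res.x
        if np.all(np.abs(x - np.round(x)) < 1e-9):   # integral optimum found
            return np.round(x)
        d, delta = separate(x)                       # e.g. a Gomory-Chvatal cut
        A = np.vstack([A, d])
        b = np.append(b, delta)
    return None
```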
How can we find half-spaces that contain P_I but not necessarily P? An easy observation is that if H is a half-space that contains P, then P_I is contained in H_I. This motivates the following definition:
Definition 28 Let P ⊆ Rn be a convex set. Let M be the set of all rational half-spaces H = {x ∈ Rn | c^t x ≤ δ} with P ⊆ H. Then, we define
\[ P' := \bigcap_{H \in M} H_I. \]
We set P^{(0)} := P and P^{(i+1)} := (P^{(i)})' for i ≥ 0. P^{(i)} is the i-th Gomory-Chvátal-truncation of P.
\[ x^* = \frac{1}{\alpha}\left(\alpha x^* - (\alpha-1)y\right) + \frac{\alpha-1}{\alpha}\,y. \]
Since c^t(αx^* − (α − 1)y) ≤ c^t y = ⌊δ⌋, this shows that x^* is a convex combination of two integral vectors in H, so x^* ∈ H_I. 2
Proposition 92 Let P = {x ∈ Rn | Ax ≤ b} be a rational polyhedron. Then
Let ũ be an optimum solution of the minimum. Since ũ^t A = c^t is integral, this leads to ũ^t Az ≤ ⌊ũ^t b⌋, so
\[ c^tz = \tilde u^tAz \le \lfloor \tilde u^tb \rfloor \le \lfloor\delta\rfloor. \]
By the previous lemma, this implies z ∈ H_I. Since this is true for any half-space H containing P, it also shows z ∈ P'. 2
Cuts that are given by inequalities of the type u^t Ax ≤ ⌊u^t b⌋ (for some vector u ≥ 0 with u^t A integral) are called Gomory-Chvátal cuts. They were used in the first cutting-plane algorithms for integer linear programming (see Gomory [1963]).
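As a small added example: from the single inequality 2x_1 + 2x_2 ≤ 3 and u = (1/2) we get u^t A = (1, 1), which is integral, and hence the Gomory-Chvátal cut x_1 + x_2 ≤ ⌊3/2⌋ = 1. It is valid for all integral points of the polyhedron, although, e.g., the fractional point (3/4, 3/4) violates it.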
Since Ax ≤ b is TDI, the minimum is attained by an integral vector ỹ. Thus,
Proof: Follows from the previous theorem and the fact that any rational polyhedron can be
described by a TDI-system with integral matrix (Theorem 81). 2
\begin{align*}
P' \cap F &= \{x \in \mathbb{R}^n \mid Ax \le \lfloor b\rfloor,\ a^tx = \beta\}\\
&= \{x \in \mathbb{R}^n \mid Ax \le \lfloor b\rfloor,\ a^tx \le \lfloor\beta\rfloor,\ a^tx \ge \lceil\beta\rceil\}\\
&= F'.
\end{align*}
Now assume in addition that P is rational. Since U is unimodular, Ux is integral if and only if x is integral. This implies
\begin{align*}
(f(P))_I &= \operatorname{conv}(\{y \in \mathbb{Z}^n \mid y = Ux,\ x \in P\})\\
&= \operatorname{conv}(\{y \in \mathbb{R}^n \mid y = Ux,\ x \in P,\ x \in \mathbb{Z}^n\})\\
&= \operatorname{conv}(\{y \in \mathbb{R}^n \mid y = Ux,\ x \in P_I\})\\
&= f(P_I).
\end{align*}
By Theorem 81, we can assume that Ax ≤ b is TDI, A is integral and b is rational. Then, for any integral vector c for which min{b^t y | y^t AU^{-1} = c^t, y ≥ 0} is feasible and bounded, also min{b^t y | y^t A = c^t U, y ≥ 0} is feasible and bounded and c^t U is integral. Hence AU^{-1}x ≤ b is TDI. Thus, Theorem 93 implies
\[ (f(P))' = \{x \in \mathbb{R}^n \mid AU^{-1}x \le b\}' = \{x \in \mathbb{R}^n \mid AU^{-1}x \le \lfloor b\rfloor\} = f(P').
\]
2
Remark: This shows as well that (f(P))^{(i)} = f(P^{(i)}) for a rational polyhedron P and i ∈ N.
P_I = {x ∈ Rn | Cx ≤ d} with some integral matrix C and some rational vector d. If P_I = ∅, we choose C = A and d = b − A'\mathbb{1}_n where A' arises from A by taking the absolute value of each entry. Note that {x ∈ Rn | Ax + A'\mathbb{1}_n ≤ b} = ∅ because any vector x^* with Ax^* + A'\mathbb{1}_n ≤ b could be rounded down to an integral vector x with Ax ≤ b.
Let c^t x ≤ δ be an inequality in Cx ≤ d. Then, we claim that there is an s ∈ N with P^{(s)} ⊆ H := {x ∈ Rn | c^t x ≤ δ}. The theorem is a direct consequence of this claim.
Proof of the claim: Observe that there is a number β ≥ δ with P ⊆ {x ∈ Rn | c^t x ≤ β}. If P_I = ∅, this is true by construction. In the case P_I ≠ ∅, it follows from the fact that c^t x is bounded over P if and only if it is bounded over P_I (Proposition 71).
Assume that the claim is false, so there is an integer γ with δ < γ ≤ β for which there is an s_0 ∈ N with P^{(s_0)} ⊆ {x ∈ Rn | c^t x ≤ γ} but there is no s ∈ N with P^{(s)} ⊆ {x ∈ Rn | c^t x ≤ γ − 1}. Then, max{c^t x | x ∈ P^{(s)}} = γ for all s ≥ s_0. To see this, assume that max{c^t x | x ∈ P^{(s)}} < γ for some s. Then there is an ε > 0 with P^{(s)} ⊆ {x ∈ Rn | c^t x ≤ γ − ε}. This implies max{c^t x | x ∈ P^{(s+1)}} ≤ γ − 1 because {x ∈ Rn | c^t x ≤ γ − ε}_I ⊆ {x ∈ Rn | c^t x ≤ γ − 1}.
Define F := P^{(s_0)} ∩ {x ∈ Rn | c^t x = γ}. Then, dim(F) < n = dim(P), so we can apply the induction hypothesis to F, which implies that there is a number s_1 with F^{(s_1)} = F_I. Thus, F^{(s_1)} = F_I ⊆ P_I ∩ {x ∈ Rn | c^t x = γ} = ∅.
Note that this section was not covered by the lecture course given in summer term 2019.
Branch-and-Bound Methods (they are also called Divide-and-Conquer Algorithms
or Backtracking Algorithms) are a quite simple approach to integer linear programming.
Nevertheless, they are of great practical relevance. Algorithm 5 describes the approach for
integer linear programs but it can be applied to mixed integer linear programs, too. The
algorithm stores a number L which is the cost of the best integral solution found so far (so in
the beginning it is −∞). In each iteration of the main loop, the algorithm chooses a polyhedron
Pj , which is a subset of the given polyhedron P0 , and solves the corresponding linear program. If
this LP is bounded and feasible, the algorithm first checks if the value c∗ of an optimum solution
x∗ is larger than L. If this is not the case, the algorithm can reject the polyhedron Pj because
it cannot contain a better integral solution than the best current solution (this is the bounding
part). If c∗ > L and x∗ is integral, we have found a better integral solution and can update L.
Otherwise, we choose a non-integral component x∗i of x∗ and compute sub-polyhedra P2j+1 and
P2j+2 of Pj with additional constraints that arise by rounding x∗i up or down (branching step).
Algorithm 5: Branch-and-Bound Algorithm
Input: A matrix A ∈ Qm×n , a vector b ∈ Qm , and a vector c ∈ Qn such that the LP
max{ct x | Ax ≤ b} is feasible and bounded.
Output: A vector x̃ ∈ {x ∈ Zn | Ax ≤ b} maximizing ct x or the message that there is no
optimum solution.
1 L := −∞;
2 P0 := {x ∈ Rn | Ax ≤ b};
3 K := {P0 };
4 while K 6= ∅ do
5 Choose a Pj ∈ K;
6 K := K \ {Pj };
7 if Pj 6= ∅ then
8 Solve max{ct x | x ∈ Pj };
9 Let x∗ be an optimum solution and c∗ = ct x∗ ;
10 if c∗ > L then
11 if x∗ ∈ Zn then
12 L := c∗ ;
13 x̃ := x∗ ;
14 else
15 Choose i ∈ {1, . . . , n} with x∗i 6∈ Z;
16 P2j+1 := {x ∈ Pj | xi ≤ bx∗i c};
17 P2j+2 := {x ∈ Pj | xi ≥ dx∗i e};
18 K := K ∪ {P2j+1 } ∪ {P2j+2 };
19 if L > −∞ then
20 return x̃;
21 else
22 return “There is no feasible solution”;
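A compact implementation along these lines might look as follows (a sketch added for illustration: it uses scipy's LP solver, processes K in last-in-first-out order, and branches on a most fractional component; none of these choices are prescribed by Algorithm 5 itself):

```python
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, tol=1e-6):
    """Sketch of Algorithm 5: maximize c^t x over Ax <= b with x integral.
    Assumes the LP relaxation max{c^t x | Ax <= b} is feasible and bounded."""
    L, best = -np.inf, None
    stack = [([], [])]                      # each node: extra rows/rhs added by branching
    while stack:
        rows, rhs = stack.pop()             # last-in-first-out, i.e. depth-first search
        A_j = np.vstack([A] + rows) if rows else np.asarray(A, dtype=float)
        b_j = np.concatenate([b, rhs]) if rhs else np.asarray(b, dtype=float)
        res = linprog(-np.asarray(c, dtype=float), A_ub=A_j, b_ub=b_j, bounds=(None, None))
        if not res.success:                 # P_j is empty
            continue
        x, val = res.x, -res.fun
        if val <= L:                        # bounding step: P_j cannot improve on L
            continue
        frac = np.abs(x - np.round(x))
        if np.all(frac < tol):              # integral solution found: update L
            L, best = val, np.round(x)
            continue
        i = int(np.argmax(frac))            # branching variable (most fractional entry)
        e = np.zeros(len(x)); e[i] = 1.0
        stack.append((rows + [e],  rhs + [np.floor(x[i])]))    # x_i <= floor(x_i*)
        stack.append((rows + [-e], rhs + [-np.ceil(x[i])]))    # x_i >= ceil(x_i*)
    return best
```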
Figure 9 illustrates what the algorithm may do on this instance. Since the optimum solution of the LP-relaxation is not integral, we create in the first branching step two sub-polytopes P1 = {(x1, x2) | x2 ≤ 2} ∩ P0 and P2 = {(x1, x2) | x2 ≥ 3} ∩ P0 = ∅. In P1 we still do not find an integral optimum solution, so we branch again and get the polytopes P3 and P4. In P4 we get an integral optimum x^* = (1, 2) with cost 3. In P3 we get a non-integral optimum solution
(0, 1.5) whose cost is not better than the best integral solution found so far (provided that we
considered P4 before P3 ), so the algorithm will stop here.
A branch-and-bound computation is often represented by a so-called branch-and-bound tree.
This is in fact rather an arborescence than a tree. Its nodes are the polyhedra Pj that are
considered during the computation, and P0 is the root. For any Pj , the nodes P2j+1 and P2j+2
are its children (if they exist).
In line 5 of the algorithm, we have to choose the next LP to be solved, and in line 15 we have to
decide which non-integral component is used for creating new sub-problems. There are different
strategies for these steps (branching rules). For example, it is often reasonable to store the
elements of K in a last-in-first-out queue and to choose the last element that has been added to
K. In the branch-and-bound tree, this corresponds to a leaf with the biggest distance to the
root. This strategy can reduce the time until the first feasible solution has been found. Another
reasonable branching rule consists in choosing a polyhedron Pj for which max{c^t x | x ∈ Pj} is
as large as possible. Note that the maximum over all these values for all Pj ∈ K gives an upper
bound U on the best possible solution that can still be computed. Hence, by choosing a Pj with
max{ct x | x ∈ Pj } = U , we get a chance to reduce U . This can be useful if we do not want to
compute an exact optimum solution but we stop as soon as U − L is small enough.
For the choice of x^*_i a common strategy is to choose x^*_i such that |x^*_i − ⌊x^*_i⌋ − 1/2| is minimized.
Another, more time-consuming approach is to choose x∗i such that the effect on the objective
function is maximized (strong branching).
Further remarks:
• In order to get at least a finite algorithm, we have to guarantee that in line 8 we always find an integral optimum solution if Pj is integral.
• Instead of initializing L with −∞, it is often possible to compute some reasonable integral
solution by some heuristics. In particular this is often the case for combinatorial problems.
• The branch-and-bound strategy can be combined with a cutting-plane algorithm (see the
previous section). For each sub-polyhedron Pj , one can try to find hyperplanes separating
some non-integral vectors in Pj from (Pj )I . This combination is called branch-and-cut
method. For example, this approach has been used for solving quite large Traveling Salesman Problems (see Padberg and Rinaldi [1991]).
Fig. 9: Illustration of the Branch-and-Bound Algorithm on the example instance: the LP relaxation P0 (bounded by the lines −4x1 + 6x2 = 9 and x1 + x2 = 4), the sub-polytope P1 (with x2 ≤ 2, further bounded by x1 = 0 and x2 = 1) and the empty sub-polytope P2 (with x2 ≥ 3); the respective LP optima are marked by x∗.
Bibliography
Adler, I., Karp, R.M., Shamir, R. [1987]: A simplex variant solving an m × d linear program in
O(min(m2 , d2 )) expected number of steps. Journal of Complexity, 3, 372–387, 1987.
Ahuja, R.K., Magnanti, T.L., and Orlin, J.B. [1993]: Network Flows: Theory, Algorithms, and
Applications. Prentice Hall, 1993.
Anthony, M., and Harvey, M. [2012]: Linear Algebra: Concepts and Methods. Cambridge
University Press, 2012.
Bárány, I., Howe, R., and Lovász, L. [1992]: On integer points in polyhedra: a lower bound.
Combinatorica, 12, 135–142, 1992.
Bertsimas, D., and Tsitsiklis, J.N. [1997]: Introduction to Linear Optimization. Athena Scientific,
1997.
Bertsimas, D., and Weismantel, R. [2005]: Optimization over Integers. Dynamic Ideas, 2005.
Bland, R.G. [1977]: New finite pivoting rules for the simplex method. Mathematics of Operations
Research, 2, 103–107, 1977.
Borgwardt, K. [1982]: The average number of pivot steps required by the simplex method is
polynomial. Zeitschrift für Operations Research, 26, 157–177, 1982.
Chvátal, V. [1983]: Linear programming. Series of books in the mathematical sciences, W.H.
Freeman, 1983.
Cunningham, W.H. [1976]: A network simplex method. Mathematical Programming, 11, 105–116,
1976.
Dantzig, G.B. [1951]: Maximization of a linear function of variables subject to linear inequalities.
In: Koopmans, T.C (ed.), Activity Analysis of Production and Allocation, 359–373, Wiley,
1951.
Edmonds, J. [1965]: Maximum matching and polyhedron with (0,1) vertices. Journal of Research
of the National Bureau of Standards, B, 69, 125–130, 1965.
Eisenbrand, F. [2003]: Fast integer programming in fixed dimension. Lecture Notes in Computer
Science, 2832, 196–207, 2003.
Fischer, G. [2009]: Lineare Algebra: Eine Einführung für Studienanfänger. 18th edition, Springer,
2013.
Ghoulia-Houri, A. [1962]: Charactérisation des matrices totalement unimodulaires. Comptes
Rendus Hebdomadaires des Séances de l’Académie des Sciences (Paris), 254, 1192-1194, 1962.
Giles, F.R. and Pulleyblank, W.R. [1979]: Total dual integrality and integer polyhedra. Linear
Algebra and Its Applications, 25, 191–196, 1979.
Gomory, R.E. [1963]: An algorithm for integer solutions of linear programs. In: Recent Advances
in Mathematical Programing (R.L. Graves, P. Wolfe, eds.), McGraw-Hill, 269–302, 1963.
Grötschel, M., Lovász, L. and Schrijver, A. [1981]: The ellipsoid method and its consequences
in combinatorial optimization. Combinatorica, 1, 169–197, 1981.
Guenin, B., Könemann, J., and Tunçel, L. [2014]: A Gentle Introduction to Optimization.
Cambridge University Press, 2014.
Hoffman, A. and Kruskal, J. [1956]: Integral boundary points of convex polyhedra. Linear
Inequalities and Related Systems (H. Kuhn, A. Tucker, eds.), Annals of Mathematics Studies,
38, 223–246, 1956.
Hougardy, S., and Vygen, J. [2018]: Algorithmische Mathematik. Second edition, Springer, 2018.
Kalai, G., and Kleitman, D. [1992]: A quasi-polynomial bound for the diameter of graphs of
polyhedra. Bulletin of the American Mathematical Society, 26, 315–316, 1992.
Klee, V., and Minty, G.J. [1972]: How good is the simplex algorithm? In: Inequalities III (O.
Shisha, ed.), Academic Press, 159–175, 1972.
Korte, B., and Vygen, J. [2018]: Combinatorial Optimization: Theory and Algorithms. Sixth
edition, Springer, 2018.
Lee, T., Sidford, A., Wong, S.C. [2015]: A Faster Cutting Plane Method and its Implications
for Combinatorial and Convex Optimization. arxiv.org/abs/1508.04874, Symposium on
Foundations of Computer Science, 2015.
Lenstra, H.W. [1983]: Integer programming with a fixed number of variables. Mathematics of
Operations Research, 8, 538–548, 1983.
Matoušek, J., and Gärtner, B. [2007]: Understanding and Using Linear Programming. Springer,
2007.
Megiddo, N. [1984]: Linear programming in linear time when the dimension is fixed. Journal of
the ACM, 31, 114–127, 1984.
Mehlhorn, K., and Saxena, S. [2015]: A still simpler way of introducing the interior-point
method for linear programming. Computer Science Review, 22, 1–11, 2016.
Padberg, M. [1999]: Linear Optimization and Extensions. Second edition, Springer, 1999
Padberg, M., and Rao, M. [1982]: Odd minimum cut-sets and b-matchings. Mathematics of
Operations Research, 7, 67–80, 1982.
Padberg, M., and Rinaldi, G. [1991]: A Branch-and-Cut Algorithm for the Resolution of
Large-Scale Symmetric Traveling Salesman Problems. SIAM Review, 33, 1, 60–100, 1991.
Panik, M.J. [1996]: Linear Programming: Mathematics, Theory and Algorithms. Kluwer
Academic Publishers, 1996.
Roos, C., Terlaky, T., Vial, J.-P. [2005]: Interior Point Methods for Linear Optimization.
Second edition, Springer, 2005.
Rubin, D. [1970]: On the unlimited number of faces in integer hulls of linear programs with a
single constraint. Operations Research, 18, 5, 940 – 946, 1970.
Sierksma, G., and Zwols, Y. [2015]: Linear and Integer Optimization. Theory and Practice.
Third edition, CRC Press, 2015.
Spielmann, D.A., and Teng, S.-H. [2005]: Smoothed analysis of algorithms: Why the simplex
algorithm usually takes polynomial time. Journal of the ACM, 51, 3, 385 – 463, 2004.
Strang, G. [1980]: Linear Algebra and Its Applications. Second edition, Academic Press, 1980.
Tardos, É. [1986]: A strongly polynomial algorithm to solve combinatorial linear programs.
Operations Research, 34, 2, 250 – 256, 1986.
Terlaky, R.J. [2001]: An easy way to teach interior point methods. European Journal of
Operational Research, 130, 1–19, 2001.
Vanderbei, R.J. [2014]: Linear Programming: Foundations and Extensions. Fourth edition,
Springer, 2014.
Ye, Y. [1992]: On the finite convergence of interior-point algorithms for linear programming.
Mathematical Programming, 57, 325–335, 1992.
Ye, Y. [1997]: Interior Point Algorithms. Theory and Analysis. Wiley, 1997.
Index
Active row, 108
Affine linear mapping, 33
Augmentation of flow, 60
b-flow, 59
b-flow associated to a spanning tree structure, 62
Basic solution, 34
Basic variables, 46
Basis, 46
Binary Linear Programs, 101
Bland’s rule, 55
Bounded optimization problem, 6
Branch-and-Bound Algorithm, 121
Branch-and-bound tree, 122
Branch-and-cut method, 122
Branching rule, 122
Face, 34
Facet, 36
Facet-defining, 36
Farkas’ Lemma, 21
Farkas-Minkowski-Weyl Theorem, 40
Feasible basic solution, 46
Feasible basis, 46
Feasible spanning tree structure, 62
Finitely generated cone, 15
Fourier-Motzkin elimination, 20
Fundamental circuit, 63
Fundamental Theorem of Linear Inequalities, 40
Gaussian Elimination, 45, 69
Gomory-Chvátal cut, 117
Gomory-Chvátal-truncation, 116
Matching, 82
Max-Flow-Min-Cut-Theorem, 31
Maximization problem, 6
Maximum-Flow Problem, 9
Minimal face, 37
Minimally TDI, 107
Minimization problem, 6
Minimum-Cost Flow Problem, 60
Minkowski sum, 42
Mixed Integer Linear Program, 9
Network Simplex Algorithm, 64
Non-basic variables, 46
Non-degenerated feasible basic solution, 46
Normal, 13
Objective function, 6
Optimization problem, 6
Peak, 63
Permutation matrix, 115
Pivot rule, 54
Pointed polyhedron, 39
Polar cone, 41
Polyhedral cone, 14
Polyhedron, 13
Polytope, 13
Positive definite matrix, 71
Positive semidefinite matrix, 71
Potential associated to a spanning tree structure, 62
Primal LP, 18
Projection of a polyhedron, 33
r-R-sandwiched set, 83
Reduced cost, 52
Residual capacity, 60
Residual graph, 60
Revised Simplex Algorithm, 58
s-t-cut, 31
s-t-flow, 9
Separation oracle, 75
Simplex Algorithm, 53
Simplex tableau, 51
Slack variable, 7
Spanning tree solution, 60
Spanning tree structure, 62
Stable Set Problem, 12
Standard equation form, 7
Standard inequality form, 7
Steepest edge rule, 54
Strict complementary slackness, 29
Strongly feasible spanning tree structure, 62
Supporting hyperplane, 34
TDI-system, 106
Totally dual integral, 106
Totally unimodular matrix, 111
Unbounded optimization problem, 6
Unimodular matrix, 111
Vertex, 34
Vertex Cover Problem, 11
Weak duality, 18
Weak optimization problem, 83
Weak separation oracle, 84