Nam - Convex Analysis -1
Series Editors
Thomas V. Mikosch, Københavns Universitet, Copenhagen, Denmark
Sidney I. Resnick, Cornell University, Ithaca, USA
Stephen M. Robinson, University of Wisconsin-Madison, Madison, USA
Editorial Board
Torben G. Andersen, Northwestern University, Evanston, USA
Dmitriy Drusvyatskiy, University of Washington, Seattle, USA
Avishai Mandelbaum, Technion - Israel Institute of Technology, Haifa, Israel
Jack Muckstadt, Cornell University, Ithaca, USA
Per Mykland, University of Chicago, Chicago, USA
Philip E. Protter, Columbia University, New York, USA
Claudia Sagastizábal, IMPA – Instituto Nacional de Matemática Pura e Aplicada, Rio de Janeiro, Brazil
David B. Shmoys, Cornell University, Ithaca, USA
David Glavind Skovmand, Københavns Universitet, Copenhagen, Denmark
Josef Teichmann, ETH Zürich, Zürich, Switzerland
Bert Zwart, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
The Springer Series in Operations Research and Financial Engineering publishes
monographs and textbooks on important topics in theory and practice of Operations
Research, Management Science, and Financial Engineering. The Series is distin-
guished by high standards in content and exposition, and special attention to timely
or emerging practice in industry, business, and government. Subject areas include:
Linear, integer, and nonlinear programming, including applications; dynamic
programming and stochastic control; interior point methods; multi-objective
optimization; supply chain management, including inventory control, logistics,
planning, and scheduling; game theory; risk management and risk analysis,
including actuarial science and insurance mathematics; queuing models, point
processes, extreme value theory, and heavy-tailed phenomena; networked systems,
including telecommunication, transportation, and many others; quantitative
finance: portfolio modeling, options, and derivative securities; revenue
management and quantitative marketing; innovative statistical applications such
as detection and inference in very large and/or high-dimensional data streams;
computational economics.
With 42 Figures
Boris S. Mordukhovich
Department of Mathematics
Wayne State University
Detroit, MI, USA

Nguyen Mau Nam
Fariborz Maseeh Department of Mathematics and Statistics
Portland State University
Portland, OR, USA
Mathematics Subject Classification: 46A03, 46A55, 47L07, 47N10, 49J52, 49J53, 52A07, 52A41,
90C26
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To the memory of
and
Convexity has a long history that dates back to the geometers of Ancient
Greece. Probably the first definition of convexity was given by Archimedes
of Syracuse in the third century BC: There are in a plane certain terminated
bent lines, which either lie wholly on the same side of the straight lines joining
their extremities, or have no part of them on the other side.
Over the subsequent centuries, convexity theory has been developed in
various geometric frameworks with outstanding contributions of many great
mathematicians. The most active period in the study of convex sets was in the
late nineteenth century and the early twentieth century with the
quintessential work done by Hermann Minkowski that was summarized in the
books [219, 220] published in 1910–1911 after his death. Minkowski laid the
foundations of the general theory of convex sets in finite-dimensional spaces.
In particular, he established there the fundamental convex separation theo-
rem, which since has played a crucial role in convex analysis and its
applications.
A systematic study of convex functions was initiated by Werner
Fenchel, who discovered, in particular, seminal results on the conjugacy
correspondence and convex duality contained in his very influential mimeographed
lecture notes [131] from a course given at Princeton University in 1951.
Although some constructions and results on generalized differentiation of
convex functions can be found in Fenchel [131], the fundamental notion of
subdifferential (a collection of subgradients) for an extended-real-valued
convex function should be attributed to Jean-Jacques Moreau [264] and
R. Tyrrell Rockafellar [302], who introduced this notion independently in
1963. The revolutionary idea of a set-valued generalized derivative satisfying
rich calculus rules has given rise to convex analysis, a new area of mathematics
where analytic and geometric ideas are so nicely interrelated and jointly
produce beautiful results for sets, set-valued mappings, and functions.
A milestone in the consolidation of the new discipline, at least in
finite-dimensional spaces, was Rockafellar’s monograph “Convex Analysis”
[306] published in 1970, which coined the name of this area of mathematics.
Over the subsequent years, numerous strong results have been discovered in
this area and many excellent books have been published on various aspects of
convex analysis and its applications in finite and infinite dimensions. Among
them, we mention the books by Bauschke and Combettes [34], Bertsekas et al.
[37], Borwein and Lewis [48], Boyd and Vandenberghe [62], Castaing and
Valadier [71], Ekeland and Temam [122], Hiriart-Urruty and Lemaréchal
[164] and its abridged version [165], Ioffe and Tikhomirov [174], Nesterov [279],
Pallaschke and Rolewicz [285], Phelps [290], Pshenichnyi [294], and Zălinescu
[361].
It has been well recognized that convex analysis provides the mathematical
foundations for numerous applications, among which convex optimization is
the first to name. The presence of convexity makes it possible not only to
investigate qualitative properties of optimal solutions and derive efficient
optimality conditions, but also to develop and justify numerical algorithms for
solving convex optimization problems with smooth and nonsmooth data.
Convex analysis and optimization have an increasing impact on many areas of
mathematics and applications including control systems, estimation and
signal processing, communications and networks, electronic circuit design,
data analysis and modeling, statistics, economics and finance, etc. In recent
times, convex analysis has become more and more important for applications
to some new fields of mathematical sciences and practical modeling such as
computational statistics, machine learning, sparse optimization, location
sciences, etc.
Despite an extensive literature on diverse aspects of convex analysis and
applications, our book has a lot to offer to researchers, students, and practi-
tioners in these fascinating areas. We split the book into two volumes, and
now present to the reader’s attention the first volume, which is mainly
devoted to theoretical aspects of convex analysis and related fields where
convexity plays a crucial role. The second volume [240] addresses various
applications of convex analysis, including the areas listed above.
The first volume is devoted to developing a unified theory of convex sets,
set-valued mappings, and functions in vector and topological vector spaces
with its specifications to Banach and finite-dimensional settings. These
developments and expositions are based on the powerful geometric approach of
variational analysis, which resides on set extremality with its characterizations
and significant modifications in the presence of convexity. This approach
allows us to consolidate the device of fundamental facts of generalized differ-
ential calculus and obtain novel results for convex sets, set-valued mappings,
and functions in finite-dimensional and infinite-dimensional settings.
Some aspects of the geometric approach to convex analysis in finite-
dimensional spaces were given in our previous short book [237] in which the
reader was provided with an easy path to access generalized differentiation of
convex objects in finite dimensions and its applications to theoretical and
algorithmic topics of convex optimization and facility location. Now we largely
extend in various directions the previous developments in both
finite-dimensional and infinite-dimensional spaces while covering a much
1 FUNDAMENTALS
1.1 Topological Spaces
1.1.1 Definitions and Examples
1.1.2 Topological Interior and Closure of Sets
1.1.3 Continuity of Mappings
1.1.4 Bases for Topologies
1.1.5 Topologies Generated by Families of Mappings
1.1.6 Product Topology and Quotient Topology
1.1.7 Subspace Topology
1.1.8 Separation Axioms
1.1.9 Compactness
1.1.10 Connectedness and Disconnectedness
1.1.11 Net Convergence in Topological Spaces
1.2 Topological Vector Spaces
1.2.1 Basic Concepts in Topological Vector Spaces
1.2.2 Weak Topology and Weak∗ Topology
1.2.3 Quotient Spaces
1.3 Some Fundamental Theorems of Functional Analysis
1.3.1 Hahn–Banach Extension Theorem
1.3.2 Baire Category Theorem
1.3.3 Open Mapping Theorem
1.3.4 Closed Graph Theorem and Uniform Boundedness Principle
1.4 Exercises for Chapter 1
1.5 Commentaries to Chapter 1
2 BASIC THEORY OF CONVEXITY
2.1 Convexity of Sets
2.1.1 Basic Definitions and Elementary Properties
2.1.2 Operations on Convex Sets and Convex Hulls
This chapter collects fundamental notions and results in vector spaces, topo-
logical spaces, topological vector spaces, and their specifications that are
widely used in the subsequent chapters of the book to build the basic the-
ory of convexity and its applications. For the reader’s convenience, we present
here proofs and discussions, which allow us to make the book self-contained
and helpful for a broad spectrum of students, researchers, and practitioners.
Example 1.3 (metrics and metric spaces) Let X be a set. A real-valued func-
tion d : X × X → R is called a metric on X if the following conditions hold
for all elements x, y, z ∈ X:
(a) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.
(b) d(x, y) = d(y, x).
(c) d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality).
The set X together with the metric d is called a metric space and is denoted
by (X, d). If the metric d is specified on X so that no confusion occurs, we
can simply say that X is a metric space.
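For a concrete illustration (our addition; a standard example not spelled out in the text), the discrete metric satisfies all three axioms on any set X:

```latex
% Discrete metric on an arbitrary nonempty set X (illustrative example).
\[
d(x,y) :=
\begin{cases}
0, & x = y,\\
1, & x \neq y.
\end{cases}
\]
% Axioms (a) and (b) are immediate. For the triangle inequality (c):
% if d(x,z) = 0, there is nothing to prove; if d(x,z) = 1, then x \neq z,
% so y cannot coincide with both x and z, whence d(x,y) + d(y,z) \ge 1 = d(x,z).
```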
Given a point x0 ∈ X and a number r > 0, the open ball in X centered at
x0 with radius r is defined by
B(x0 ; r) := {x ∈ X | d(x, x0 ) < r}.
To proceed with other examples, let us first define the class of vector/linear
spaces, which generally does not relate to topologies. This very large class is
used in this book to obtain some important results of convex analysis without
topologies, while the most interesting developments require the combination
of linear and topological structures.
The most common choices of the field F in Definition 1.4 (and the only
fields considered in this book) are the field R of real numbers and the field C
of complex numbers. In the first case, X is called a real vector/linear space,
and in the second case it is called a complex one.
Example 1.5 (normed spaces and Banach spaces) Let X be a vector space
over a field F (either R or C). A function ‖·‖ : X → R is called a norm if the
following properties hold for all vectors x, y ∈ X and all numbers α ∈ F:
(a) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
(b) ‖αx‖ = |α| · ‖x‖.
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
If ‖·‖ is a norm on X, then the pair (X, ‖·‖) is called a normed space. We
can verify that a normed space X is a metric space with the metric
d(x, y) := ‖x − y‖ for x, y ∈ X.
If in addition X is complete, then it is called a Banach space. Thus both
classes of normed and Banach spaces are examples of topological spaces.
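To spell out the verification claimed above (a standard one-line argument, added here for convenience): properties (a)–(c) of the norm translate directly into the metric axioms for d(x, y) := ‖x − y‖; in particular, the triangle inequality reads

```latex
\[
d(x,z) = \|x - z\| = \|(x - y) + (y - z)\| \le \|x - y\| + \|y - z\| = d(x,y) + d(y,z),
\]
% while symmetry follows from norm axiom (b) with \alpha = -1:
% d(y,x) = \|y - x\| = |-1|\,\|x - y\| = d(x,y).
```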
Example 1.6 (inner product spaces and Hilbert spaces) Let H be a vector
space over a field F (either R or C). An inner product on H is a function
⟨·, ·⟩ : H × H → F that satisfies the following properties for all vectors x, y, z ∈
H and all numbers λ ∈ F:
(a) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.
(b) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(c) ⟨λx, y⟩ = λ⟨x, y⟩.
(d) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩; in particular, ⟨x, y⟩ = ⟨y, x⟩ when F = R.
If ⟨·, ·⟩ is an inner product on H, then the pair (H, ⟨·, ·⟩) is called an inner
product space.
‖x‖ := √⟨x, x⟩ for x ∈ H.
Then ‖·‖ is a norm on H and (H, ‖·‖) is a normed space. If in addition this
normed space is a Banach space, then H is called a Hilbert space. Thus inner
product spaces and Hilbert spaces are also examples of topological spaces. In
particular, on Rn (as a real vector space) with the usual addition and scalar
multiplication, we define
⟨x, y⟩ := ∑_{k=1}^{n} xk yk for x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ).
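The norm induced by this inner product is then the familiar Euclidean norm, with respect to which Rn is complete, so Rn is a Hilbert space. Spelled out (our addition):

```latex
\[
\|x\| = \sqrt{\langle x, x\rangle}
      = \Bigl(\sum_{k=1}^{n} x_k^2\Bigr)^{1/2}
\quad\text{for } x = (x_1,\dots,x_n) \in \mathbb{R}^n.
\]
```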
Ā := ⋂_{F closed, A ⊂ F} F.
Proof. Since X and ∅ are open with ∅ = X c and X = ∅c , the first property
in the proposition is satisfied.
To verify the second property, we get by the De Morgan law that
(F1 ∪ F2 )ᶜ = F1ᶜ ∩ F2ᶜ.
This is an open set, and thus F1 ∪ F2 is closed.
To check the third property, we use the De Morgan law again and get
(⋂_{i∈I} Fi )ᶜ = ⋃_{i∈I} Fiᶜ,
Proof. We start by verifying the first property. It follows from the definition of
Ā that Ā is closed as the intersection of a collection of closed sets. In addition
we have A ⊂ Ā. Suppose now that A is closed and get
Ā = ⋂_{F closed, A ⊂ F} F ⊂ A.
To proceed with the second property, observe that the interior is an open
set because it is the union of a family of open sets. In addition we have
int(A) ⊂ A. Supposing that A is open gives us
A ⊂ ⋃_{G open, G ⊂ A} G = int(A),
It has been well recognized that the concept of continuity for mappings is one
of the most fundamental in analysis and applications. Here we present some
basic results in the framework of general topological spaces.
(a) f is continuous.
(b) f⁻¹(G) is open for every open set G in Y .
(c) f⁻¹(F ) is closed for every closed set F in Y .
(a) f is continuous on X.
(b) f(Ā) ⊂ cl(f(A)) for every subset A of X.
(c) f⁻¹(int(B)) ⊂ int(f⁻¹(B)) for every subset B of Y .
The next proposition follows from the definition and Theorem 1.12.
Proposition 1.14 Let X and Y be topological spaces, and let f : X → Y be
a homeomorphism. Then the following properties hold:
In this subsection, we define and study bases for topological spaces. The main
idea involves using a collection of open sets in a topological space to represent
the whole topology.
Definition 1.15 Let (X, τ ) be a topological space, and let B ⊂ τ . We say that
B is a basis for the space (X, τ ) (or for the topology τ ) if for every set G ∈ τ ,
there exists a subcollection B′ ⊂ B such that
G = ⋃_{V ∈ B′} V.
Proposition 1.16 Let (X, τ ) be a topological space, and let B ⊂ τ . The collec-
tion B is a basis for the topological space (X, τ ) if and only if for every x ∈ X
and every open set G containing x, there exists V ∈ B such that x ∈ V ⊂ G.
Proof. =⇒: Suppose that B is a basis for (X, τ ). Fix G ∈ τ and x ∈ G, and then
find by definition a subcollection B′ ⊂ B such that
x ∈ G = ⋃_{V ∈ B′} V.
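A standard instance of this criterion (added here for illustration): in any metric space (X, d) with the topology generated by the metric, the open balls form a basis.

```latex
% If G is open and x \in G, then by definition of the metric topology there is
% r > 0 with B(x;r) \subset G, and clearly x \in B(x;r). Hence the collection
\[
\mathcal{B} := \{\, B(x;r) \mid x \in X,\ r > 0 \,\}
\]
% satisfies the criterion of Proposition 1.16, and so it is a basis
% for the metric topology.
```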
The next theorem shows how to determine a topology with a given basis.
Observe from the definition of τ that G ∈ τ if and only if for any x ∈ G, there
exists V ∈ B such that x ∈ V ⊂ G. By the second condition of the theorem we
have that X = ⋃_{V ∈ B} V, and so X ∈ τ . We also see that ∅ ∈ τ since B′ could
be an empty collection. Obviously, τ is closed under arbitrary unions. Let us
show that τ is closed under finite intersections. Fix G1 and G2 in τ and pick
x ∈ G1 ∩ G2 . Then there exists Vix ∈ B such that x ∈ Vix ⊂ Gi for i = 1, 2.
Since x ∈ V1x ∩ V2x , we find V x ∈ B with x ∈ V x ⊂ V1x ∩ V2x ⊂ G1 ∩ G2 .
Thus G1 ∩ G2 ∈ τ by the observation at the beginning of the proof. From the
definition of τ , we conclude that B is a basis for (X, τ ).
Theorem 1.19 Let X be a set, and let {Xα }α∈I be a collection of topological
spaces. For each α ∈ I, let fα be a mapping from X to Xα , and let τα denote
the topology in Xα . Consider the family of sets given by
B := { ⋂_{i=1}^{m} fαi⁻¹(Gαi ) | αi ∈ I, Gαi ∈ ταi , m ∈ N }.
The next result, which also relates to Theorem 1.19, shows how to check
the continuity of a mapping via its compositions with fα for α ∈ I.
In this subsection, we study the product topology and the quotient topology
using topologies generated by families of mappings considered in the previous
subsection.
where Vα = Xα for α ∉ {α1 , . . . , αm }. Then B is a basis for the product
topology τ on X.
Proof. For any subset {α1 , . . . , αm } ⊂ I with m ∈ N and any Vαi ∈ ταi for
i = 1, . . . , m, we have
⋂_{i=1}^{m} pαi⁻¹(Vαi ) = ∏_{α∈I} Vα ,
Let us now discuss the case where I is a finite set. For simplicity we confine
ourselves to the case where I = {1, 2}.
Corollary 1.25 Let X1 and X2 be topological spaces, and let X = X1 × X2 .
Then the collection of sets
B := { V1 × V2 | V1 is open in X1 , V2 is open in X2 }
is a basis for the product topology on X.
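As a familiar special case (our illustration, not part of the text), Corollary 1.25 recovers the usual topology of the plane:

```latex
% For X_1 = X_2 = \mathbb{R} with the usual topology, the basis of
% Corollary 1.25 contains all open rectangles
\[
(a_1, b_1) \times (a_2, b_2), \qquad a_i < b_i,
\]
% and every Euclidean open set in \mathbb{R}^2 is a union of such rectangles,
% while each rectangle is Euclidean open; hence the product topology on
% \mathbb{R}^2 coincides with the usual Euclidean topology.
```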
Vi = Gi ∩ Y for i = 1, 2,
which clearly gives us the equalities
V1 ∩ V2 = (G1 ∩ Y ) ∩ (G2 ∩ Y ) = (G1 ∩ G2 ) ∩ Y.
Since G1 ∩ G2 ∈ τ , we get V1 ∩ V2 ∈ τY and confirm that τY is a topology.
It follows directly from the definition that for a topological space X, the
following implications hold:
[X is T4 ] =⇒ [X is T3 ] =⇒ [X is T2 ] =⇒ [X is T1 ].
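None of these implications can be reversed in general. A standard example separating T1 from T2 (our addition):

```latex
% Let X be an infinite set with the cofinite topology
\[
\tau := \{\emptyset\} \cup \{\, G \subset X \mid X \setminus G \text{ is finite} \,\}.
\]
% Every singleton is closed, so X is T1; but any two nonempty open sets have
% finite complements and hence must intersect, so X is not T2 (not Hausdorff).
```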
Proposition 1.37 Let X be a topological space such that every singleton set
is closed. Then X is T3 if and only if for any x ∈ X and for any open set G
containing this point, there exists an open set V such that x ∈ V ⊂ V̄ ⊂ G.
One more proposition is needed before the proof of the next theorem.
Proposition 1.39 In the setting of Theorem 1.19, suppose that the space Xα
is Hausdorff for every α ∈ I, and that {fα }α∈I separates points in the sense
that for any x ≠ y in X there is α ∈ I with fα (x) ≠ fα (y). Then the weak
topology τw generated by {fα }α∈I is Hausdorff.
Proof. To verify the first property, suppose that X is a T1 space and that Y
is a subspace of X. Fix x, y ∈ Y with x ≠ y and choose open sets U, V ⊂ X
such that x ∈ U, y ∈ V, x ∉ V, and y ∉ U. Let UY := Y ∩ U and VY := V ∩ Y.
Then UY and VY are open in Y with x ∉ VY and y ∉ UY, and so Y is T1.
Suppose now that X is a T2 space and that Y is a subspace of X. Fix
x, y ∈ Y with x ≠ y and choose open sets U, V ⊂ X such that x ∈ U, y ∈ V,
and U ∩ V = ∅. Let UY := Y ∩ U and VY := V ∩ Y. Then the sets UY and VY
are open in Y with
UY ∩ VY = U ∩ V ∩ Y = ∅, which verifies that Y is T2. Further, let X be a T3
space, and let Y be a subspace of X. Then Y is T2 , so every singleton in Y
is closed. Fix x ∈ Y and a closed set FY ⊂ Y such that x ∉ FY. Then there
exists a closed set F ⊂ X with FY = Y ∩ F. This shows that x ∉ F. By the
regularity property of X, we find open sets V containing x and W containing
F such that V ∩ W = ∅. Defining
VY := V ∩ Y and WY := W ∩ Y
tells us that VY and WY are open in Y with x ∈ VY , that FY ⊂ WY , and
that VY ∩ WY = ∅. Hence we verify that Y is T3 , which completes the proof
of the first assertion of the theorem.
To justify next the second assertion, let {Xα }α∈I be a family of T1 spaces.
Fix any x = (xα ) and y = (yα ) in the product space X := ∏_{α∈I} Xα with
x ≠ y. Then there exists α0 ∈ I such that xα0 ≠ yα0. Since Xα0 is T1, we find
open sets Vα0 and Wα0 in Xα0 satisfying
xα0 ∈ Vα0, yα0 ∈ Wα0, xα0 ∉ Wα0, and yα0 ∉ Vα0.
Define now the inverse image sets
V := pα0⁻¹(Vα0 ) and W := pα0⁻¹(Wα0 ),
This shows that X is T3 and thus completes the proof of the theorem.
1.1.9 Compactness
The concept of compactness and its modifications are crucial in many aspects
of mathematical analysis. In this subsection, we discuss basic facts about the
compactness in general topological spaces with specifications in metric spaces
and finite dimensions.
Proof. Let X be a compact topological space. Fix a closed subset K ⊂ X and
show that K is compact. Taking an open covering K ⊂ ⋃_{α∈I} Gα of K, we get
X = (⋃_{α∈I} Gα ) ∪ Kᶜ.
Since f is continuous, we have that the preimage set f −1 (Gα ) is open for
every α ∈ I, and so there exist α1 , . . . , αm ∈ I for which
K ⊂ ⋃_{i=1}^{m} f⁻¹(Gαi ).
Since Acα is open for every α ∈ I, the compactness of X yields the existence
of α1 , . . . , αm ∈ I such that
X = ⋃_{i=1}^{m} Aαiᶜ = X \ ⋂_{i=1}^{m} Aαi .
Recalling that ⋂_{i=1}^{m} Aαi ≠ ∅ brings us to a contradiction, and so ⋂_{α∈I} Aα ≠ ∅.
⇐=: To verify the converse statement, suppose that
X = ⋃_{α∈I} Aα ,
where all Aα are open, and so ⋂_{α∈I} Aαᶜ = ∅. Since the collection {Aαᶜ }α∈I
consists of closed sets, there are α1 , . . . , αm ∈ I with ⋂_{i=1}^{m} Aαiᶜ = ∅ and
X = ⋃_{i=1}^{m} Aαi ,
The next lemma is a preparation for the proof of a deep theorem of topo-
logical space theory known as the Tikhonov theorem; see Theorem 1.48.
Lemma 1.47 Let X be a set, and let A := {Aα }α∈I be a collection of subsets
of X having the finite intersection property. Then there is a maximal collection
F of subsets of X that contains A and satisfies the finite intersection property.
In addition, the following properties hold:
(a) F is closed under finite intersections.
Proof. It follows from Proposition 1.44 and the construction of the projection
mapping that if X is compact, then Xα is compact for every α ∈ I. Thus we
only need to prove the converse implication. To proceed, consider a collection
A of closed subsets of X satisfying the finite intersection property and then
verify that ⋂_{A∈A} A ≠ ∅. By Lemma 1.47, there exists a maximal collection
F of subsets of X satisfying the finite intersection property that contains A
such that the properties (a) and (b) therein are satisfied. Note that
⋂_{A∈F} Ā ⊂ ⋂_{A∈A} Ā = ⋂_{A∈A} A.
⋂_{B∈Fα} cl(B) = ⋂_{A∈F} cl(pα (A)) ≠ ∅.
For each α ∈ I pick aα ∈ ⋂_{A∈F} cl(pα (A)), denote a := (aα )α∈I , and show that
a ∈ ⋂_{A∈F} Ā.
a ∈ ⋂_{i=1}^{m} pαi⁻¹(Vαi ) ⊂ V.
Thus for each i = 1, . . . , m we have Vαi ∩ pαi (A) ≠ ∅, and so pαi⁻¹(Vαi ) ∩ A ≠ ∅.
The second assertion of Lemma 1.47 tells us that pαi⁻¹(Vαi ) ∈ F. Then it follows
from the first assertion of Lemma 1.47 that W := ⋂_{i=1}^{m} pαi⁻¹(Vαi ) ∈ F. The
finite intersection property of F implies that W ∩ A ≠ ∅. Since W ∩ A ⊂ V ∩ A,
we get a ∈ Ā, and so a ∈ ⋂_{A∈F} Ā ≠ ∅ as claimed.
To continue the study of compactness, we recall the following notion.
Definition 1.49 Let X be a topological space, and let A ⊂ X. An element x0
in X (not necessarily in A) is called a cluster point of A if any neighbor-
hood of x0 contains an infinite number of elements of A.
Example 1.50 For illustration we list the following trivial examples:
(a) Let X = R, and let A = [0, 1). Then x0 = 0 is a cluster point of A and
u0 = 1 is also a cluster point of A. In fact, the set of cluster points of A
is the interval [0, 1].
(b) Let X = R, and let A = Z. Then A does not have any cluster point.
(c) Let X = R, and let A = {1/n | n ∈ N}. Then x0 = 0 is the only cluster
point of the set A.
The next result is classical in analysis and goes back to Bolzano and Weier-
strass in the case of real numbers.
Theorem 1.51 Any infinite subset of a compact topological space has at least
one cluster point.
Proof. Suppose on the contrary that A is an infinite subset of a compact
topological space X and that A does not have any cluster point. Then for
each x ∈ X, there is an open set Vx containing x such that Vx contains only a
finite number of elements of A. Clearly we have X = ⋃_{x∈X} Vx . The compactness
of X gives us xi ∈ X for i = 1, . . . , m such that X = ⋃_{i=1}^{m} Vxi . It follows that
A = A ∩ X = ⋃_{i=1}^{m} (A ∩ Vxi ).
Proof. Let X be a compact metric space. Fix any sequence {xk } in X and show
that it has a convergent subsequence. Consider the following two cases:
Case 1: The set A := {xk | k ∈ N} is infinite. Theorem 1.51 ensures that in
this case A has a cluster point x0 . Let us show that there exists a subsequence
of xk that converges to x0 . Indeed, it follows from the cluster point definition
that each open ball centered at x0 contains an infinite number of elements of
A. For l = 1, the open ball B(x0 ; 1) contains an element xk1 of A. For l = 2,
the open ball B(x0 ; 1/2) contains an infinite number of elements of A, and
thus there is k2 > k1 such that xk2 ∈ B(x0 ; 1/2). Continuing this process, we
find a sequence of positive integers k1 < k2 < . . . with xkl ∈ B(x0 ; 1/l) for
every l ∈ N. It shows that the sequence {xkl } converges to x0 .
Case 2: The set A = {xk | k ∈ N} is finite. In this case we get x0 ∈ A such
that xk = x0 for infinitely many k, say k1 < k2 < . . .. Hence the constant
subsequence {xkl } converges to x0 .
need to check that every sequence {xk } has a Cauchy subsequence since the
completeness of X yields in this case the convergence of this subsequence. It
follows from the total boundedness of X that this space can be covered by
finitely many balls of radius r = 1. Thus there exists at least one of these
balls, denoted by B1 , such that xk ∈ B1 for infinitely many k ∈ N. Let {xk⁽¹⁾}
be a subsequence of {xk } formed by those elements. Likewise X is covered by
finitely many open balls of radius r = 1/2, and there exists such a ball, denoted
by B2 , which contains infinitely many xk⁽¹⁾. Denote by {xk⁽²⁾} a subsequence of
{xk⁽¹⁾} formed by those elements, and then continue this process. Now fix an
element xk1 from {xk⁽¹⁾} and in {xk⁽²⁾} choose xk2 with k2 > k1 , in {xk⁽³⁾}
choose xk3 with k3 > k2 , and so on. In this way we construct a subsequence
{xkl } with the property that if p ≥ l, then xkl , xkp ∈ Bl . Thus d(xkl , xkp ) ≤
2/l → 0 as l, p → ∞. Since the space X is complete, this subsequence is
convergent.
The material of this subsection plays an essential role in the subsequent study
of convex geometry and related issues.
Proof. Suppose that X is disconnected. Then there exist two nonempty open
sets U and V such that U ∩ V = ∅ and U ∪ V = X. Since U = V c = X \ V ,
we see that U is both open and closed being a nonempty proper subset of X.
Conversely, suppose that there exists a nonempty proper subset U of X that
is both open and closed. Set V := U c . Then V is nonempty and open with
X = U ∪ V and U ∩ V = ∅. Thus X is disconnected.
Proof. Suppose that X is disconnected. Then there exist two nonempty open
sets U and V such that U ∩ V = ∅ and U ∪ V = X. Considering the charac-
teristic function χU , we see that it is nonconstant and continuous.
Conversely, suppose that there is a set A ⊂ X whose characteristic function f = χA is nonconstant and continuous.
Define U := f −1 ((−∞, 1/2)) and V := f −1 ((1/2, ∞)). Then these sets are
nonempty, open, and disjoint with U ∪ V = X. Thus X is disconnected.
Proof. Let us verify the converse implication first. Suppose that there exist
two nonempty sets A and B such that D = A ∪ B, Ā ∩ B = ∅, and A ∩ B̄ = ∅.
To show that D is disconnected, we only need to check that A and B are open
in D. Indeed, it is easy to see that A ⊂ (B̄)ᶜ, where the set U := (B̄)ᶜ is open in
X. Moreover, we get the equalities
D ∩ U = D ∩ (B̄)ᶜ = (A ∪ B) ∩ (B̄)ᶜ = (A ∩ (B̄)ᶜ) ∪ (B ∩ (B̄)ᶜ) = A ∪ ∅ = A,
and thus A is open in D. The verification of openness for B in D is similar.
Now suppose that D is disconnected. Then there exist two open sets U and
V in X such that D ⊂ U ∪ V, U ∩ D ≠ ∅, V ∩ D ≠ ∅, and D ∩ U ∩ V = ∅. It
follows that D = D ∩ (U ∪ V ) = (D ∩ U ) ∪ (D ∩ V ). Setting A := D ∩ U and
B := D ∩ V, we get D = A ∪ B and thus conclude that
Ā ∩ B = cl(D ∩ U ) ∩ (D ∩ V ) ⊂ V ᶜ ∩ (D ∩ V ) = ∅.
The proof of A ∩ B̄ = ∅ is similar.
Example 1.66 It is easy to check that any connected subset of the real line
is an interval of R.
The next important result shows that connectedness is preserved under
taking images of continuous mappings.
Theorem 1.67 Let f : X → Y be a continuous mapping between topological
spaces. Suppose that X is connected. Then f (X) is a connected subset of Y .
Proof. Suppose on the contrary that f (X) is disconnected. Then there exist
two open sets U and V in Y such that f (X) ⊂ U ∪ V, U ∩ f (X) ≠ ∅,
V ∩ f (X) ≠ ∅, and f (X) ∩ U ∩ V = ∅. Defining now the disjoint sets G1 :=
f⁻¹(U ) and G2 := f⁻¹(V ), observe that G1 and G2 are nonempty and open
in X with X = G1 ∪ G2 and G1 ∩ G2 = ∅. Thus we arrive at a contradiction,
which therefore verifies the claim of the theorem.
We finish this subsection by showing that connectedness is preserved under
taking unions of connected sets. The following lemma is useful for this purpose.
Lemma 1.68 Assume that U and V are nonempty open subsets of X such
that U ∩ V = ∅ and X = U ∪ V . If C is a connected set in X, then either
C ⊂ U or C ⊂ V .
Proof. Suppose on the contrary that C is not a subset of U and that C is also
not a subset of V . Then U1 := C ∩ U ≠ ∅ and V1 := C ∩ V ≠ ∅. Moreover, these
sets are open in C with U1 ∪ V1 = C and U1 ∩ V1 = ∅. This is a contradiction
since C is assumed to be connected.
Proposition 1.69 Let {Aα }α∈I be a collection of connected subsets of X with
at least one point in common. Then the set C := ⋃_{α∈I} Aα is connected.
Proof. Supposing the contrary, we deduce this statement directly from the
definition of connectedness and Lemma 1.68.
This subsection addresses the concepts of nets, subnets, and net convergence
in general topological spaces. The net language has been recognized as a
convenient tool to deal with convergence, compactness, and related issues
in topological spaces that are not metrizable; in particular, in dual spaces
to nonseparable Banach spaces endowed with the weak∗ topology that are
considered below. In such settings, net convergence is strictly different from
the sequential one while playing a significant role in many aspects of convex
and variational analysis.
To proceed, we consider a nonempty set I with a binary relation ≼ on I.
The binary relation ≼ is a preorder if it is reflexive and transitive in the sense
that for any α, β, and γ in I the following properties hold:
(a) α ≼ α (reflexivity).
(b) If α ≼ β and β ≼ γ, then α ≼ γ (transitivity).
Then (I, ≼) is called a preordered set. We say that a preordered set (I, ≼) is a
directed set if for any α, β ∈ I, there exists γ ∈ I such that α ≼ γ and β ≼ γ.
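A typical directed set that is not totally ordered (our illustration, not part of the text): the finite subsets of an arbitrary set S ordered by inclusion.

```latex
% I := all finite subsets of S, with F \preceq G :\iff F \subset G.
% Reflexivity and transitivity of \subset make \preceq a preorder; for any
% F, G \in I, the set
\[
H := F \cup G
\]
% is again finite and satisfies F \preceq H and G \preceq H,
% so (I, \preceq) is a directed set.
```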
Now we are ready to define the notions of nets and net convergence.
Definition 1.70 Let X be a topological space, and let I be a directed set.
Consider a function x : I → X. Given any α ∈ I, denote xα := x(α) and say
that {xα }α∈I (or {xα }) is a net in X.
Definition 1.71 Let {xα }α∈I be a net in a topological space X. We say that
the net {xα }α∈I converges to x ∈ X and write it as lim xα = x if for any
neighborhood V of x, there exists an element α0 ∈ I such that
xα ∈ V whenever α0 ≼ α.
The following two examples describe particular cases of nets. The first one
treats sequences as nets with directed sets of natural numbers.
Example 1.72 Let {xk }k∈N be a sequence in a topological space X, and let
I := N, which is a directed set with the usual “less than or equal to” relation.
Then it is obvious that {xk }k∈N is a net.
Example 1.73 Let X be a topological space, and let x ∈ X. Denote by Nx
the collection of all open sets containing this point x and define the following
preorder ≼ on Nx :
U ≼ V if and only if U ⊃ V.
It is easy to verify that (Nx , ≼) is a directed set. For each V ∈ Nx , we pick
a point xV ∈ V and observe that {xV } is a net. Let us now show that {xV }
converges to x. Indeed, fix any neighborhood V of x and choose V0 := V . If
W ∈ Nx and V0 ≼ W , then xW ∈ W ⊂ V and thus lim xV = x by the above
definitions.
Proof. Take x ∈ cl(A) and denote by Nx the collection of all open sets containing
x. Given U, V ∈ Nx, define the relation
U ⪯ V if and only if U ⊃ V.
Then I := Nx and the preorder ⪯ form a directed set. For any U ∈ Nx, we
have U ∩ A ≠ ∅ and thus choose xU ∈ U ∩ A. To verify the net convergence
of {xU}U∈I to x, fix any neighborhood V of x and let U0 := V. Then for any
U ∈ I with U0 ⪯ U we get U ⊂ U0 = V and so xU ∈ U ⊂ V, which justifies
the claimed convergence. To verify the converse statement, suppose that there
exists a net {xα}α∈I ⊂ A that converges to x. Fix any neighborhood V of x
and find by the definition α0 ∈ I with xα0 ∈ V. It follows that xα0 ∈ V ∩ A and
so V ∩ A ≠ ∅. Thus we have x ∈ cl(A), while the rest of the proof is straightforward.
Proof. Suppose that f is continuous at x0 and that lim xα = x0 . Fix any open
set V that contains f (x0 ) and then find an open set U containing x0 such that
f (U ) ⊂ V . Since lim xα = x0 , there exists α0 ∈ I with
xα ∈ U whenever α0 ⪯ α.
It follows furthermore that
f(xα) ∈ V whenever α0 ⪯ α,
and thus the net {f (xα )} converges to f (x0 ).
To verify the converse statement, suppose that for any net {xα} that
converges to x0 the corresponding net of values {f(xα)}α∈I converges to f(x0).
If f is not continuous at x0, find an open set V that contains f(x0) and such
that for any open set U containing x0 we have f(U) ⊄ V. Denote by Nx0 the
collection of all open sets containing x0 and recall from the discussions above
that I := Nx0 is a directed set with the relation ⪯ defined by the set inclusions
as in Proposition 1.74. Fix now U ∈ I and choose xU ∈ U with f(xU) ∉ V.
Proof. =⇒: Suppose that the space X is compact and take an arbitrary net
{xα}α∈I in X. Define the collection of sets {Cα}α∈I as in Lemma 1.78 and
choose x ∈ ⋂_{α∈I} Cα. We need to show that there exists a subnet of {xα}α∈I
which converges to x. Denote by Nx the collection of all open sets containing
x. Given U ∈ Nx and α ∈ I, we have U ∩ {xγ | α ⪯ γ} ≠ ∅. Thus it is possible
to choose γU,α ∈ I for which we have
α ⪯ γU,α and xγU,α ∈ U.
Let further J := Nx × I and for any (U, α) and (V, β) in J define the relation
(U, α) ⪯ (V, β) if and only if U ⊃ V and γU,α ⪯ γV,β,
which is clearly a preorder on J. For such (U, α) and (V, β), let W := U ∩ V
and choose γ ∈ I satisfying γU,α ⪯ γ and γV,β ⪯ γ. Then we easily see that
(U, α) ⪯ (W, γ) and (V, β) ⪯ (W, γ). Thus (J, ⪯) is a directed set. Next we
define ϕ : J → I by ϕ(U, α) := γU,α and note that {xϕ(U,α)}(U,α)∈J is a subnet
of {xα}α∈I. To show that lim xϕ(U,α) = x, fix an open set V containing x,
let V0 := V, and pick α0 ∈ I. If (W, β) ∈ J and (V0, α0) ⪯ (W, β), then
W ⊂ V0 = V. It follows that xϕ(W,β) ∈ W ⊂ V0 = V, which readily justifies
that lim xϕ(U,α) = x.
⇐=: Suppose that every net in X has a convergent subnet. To verify that X
is compact, it suffices to show by Theorem 1.46 that ⋂_{C∈C} C ≠ ∅ whenever
C is a collection of closed subsets of X that satisfies the finite intersection
property. Define the set
I := { ⋂_{i=1}^m Ci | C1, . . . , Cm ∈ C, m ∈ N }
Note that if (a) and (b) are satisfied, then X is Hausdorff; see Proposition
1.92 presented below.
The next proposition is a direct consequence of continuity of the addition
and the scalar multiplication in a topological vector space. Note that we con-
sider the usual topology on F and the product topologies on X × X and F × X
as defined earlier.
Proposition 1.81 Let X be a topological vector space. Then
(a) For any x0 , y0 ∈ X and any open set W that contains x0 + y0 , there
exist open sets U and V containing x0 and y0 , respectively, such that
U + V ⊂ W.
(b) For any scalar λ0 ∈ F, x0 ∈ X and for any open set W that contains λ0 x0 ,
there exist δ > 0 and an open set U containing x0 such that λU ⊂ W for
all λ ∈ F with |λ − λ0 | < δ.
Proof. The mapping f : X × X → X given by f (x, y) := x + y for x, y ∈ X
is continuous with f (x0 , y0 ) ∈ W . Then there is an open set G with respect
to the product topology on X × X such that f (G) ⊂ W . We can find further
Corollary 1.83 Let X be a topological vector space. Then for any a ∈ X and
α ∈ F with α = 0, we have the properties:
cl(A) + cl(B) = f(cl(A) × cl(B)) = f(cl(A × B)) ⊂ cl(f(A × B)) = cl(A + B).
(d) Fix any x ∈ int(A) + int(B) and find a ∈ int(A) and b ∈ int(B) with
x = a + b. Choose neighborhoods U and V of the origin such that a + U ⊂ A
and b + V ⊂ B. Then x + (U + V ) = (a + U ) + (b + V ) ⊂ A + B. Since U + V
is also a neighborhood of the origin, x ∈ int(A + B).
The following two properties of sets are often used in the theory of topo-
logical vector spaces.
As seen from the definition, any balanced set Ω is symmetric, i.e., Ω = −Ω,
but the converse is not true in general; see Figures 1.1 and 1.2.
is absorbing while 0 ∉ int(Ω); see Figure 1.3. Note that this set is nonconvex.
It is possible to construct an example of a convex and absorbing set Ω in
a topological vector space such that 0 ∉ int(Ω). Consider, e.g., the collection
X of measurable and essentially bounded functions f : [0, 1] → R equipped with
the norm ‖f‖₁ of L¹[0, 1], while the norm ‖f‖∞ of L∞[0, 1] is also used in
what follows. Let Ω := {f ∈ X | ‖f‖∞ ≤ 1} be a convex subset of the space
above. To show that Ω is absorbing, fix any f ∈ X and let α := ‖f‖∞. We
can easily see that tf ∈ Ω whenever |t| < 1/(α + 1), which verifies the absorbing
property of Ω. Furthermore, we have 0 ∉ int(Ω). Suppose on the contrary
that 0 ∈ int(Ω) and find δ > 0 such that B(0; δ) ⊂ Ω. Define
f(t) := 2χ[0,δ/2](t) for all t ∈ [0, 1]
via the characteristic function (1.2) of the set A := [0, δ/2] ⊂ [0, 1]. Then
‖f‖₁ = δ, and thus f ∈ B(0; δ). On the other hand, we get ‖f‖∞ = 2, and so
f ∉ Ω. The obtained contradiction shows that 0 ∉ int(Ω).
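The two norms in the example above are easy to approximate numerically. The following sketch (my own illustration, not from the text; the grid size and the choice δ = 0.2 are arbitrary) checks that f = 2χ[0,δ/2] has L¹-norm δ while its sup-norm is 2:

```python
import numpy as np

# Numerical check of the example above (a sketch, not part of the book):
# f = 2 * chi_[0, delta/2] on [0, 1] has L1-norm delta but sup-norm 2, so it
# lies in the closed L1-ball B(0; delta) while escaping
# Omega = {g : ||g||_inf <= 1}.
delta = 0.2
t = np.linspace(0.0, 1.0, 200_001)      # fine grid on [0, 1]
f = 2.0 * (t <= delta / 2)              # f = 2 * chi_[0, delta/2]

l1_norm = np.trapz(np.abs(f), t)        # approximates 2 * (delta/2) = delta
sup_norm = np.max(np.abs(f))            # equals 2

print(l1_norm, sup_norm)
```

The small trapezoidal-rule error at the jump of f is the only source of discrepancy in the L¹-norm.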
The next result provides a sufficient condition for the preservation of the
closedness properties in topological vector spaces under summation.
Proposition 1.93 Let X be a topological vector space. If A is a closed subset
of X and B is a compact subset of X, then A + B is closed.
Proof. We need to show that (A + B)c is open. Fix any z ∈ (A + B)c and
pick b ∈ B. Then z − b ∉ A, and so z − b ∈ Ac, where the latter set is open.
Thus it is possible to choose an open set Ub containing z and an open set Vb
containing b for which we have
(Ub − Vb) ∩ A = ∅.
Since B is compact, find vectors b1, . . . , bm ∈ B such that B ⊂ ⋃_{i=1}^m Vbi.
Denote U := ⋂_{i=1}^m Ubi and check that
U ∩ (A + B) = ∅.
Indeed, suppose by contradiction that there exists x ∈ U ∩ (A + B), i.e., x ∈ U
and x = a + b for some a ∈ A and b ∈ B. Since b ∈ B, we get b ∈ Vbi for some
i ∈ {1, . . . , m}, which shows that x ∈ Ubi and x − b ∈ Ubi − Vbi. It follows that
a = x − b ∉ A, a contradiction verifying that the set A + B is closed.
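The compactness of B in Proposition 1.93 cannot be dropped: the classical counterexample in R takes A = {n | n ∈ N} and B = {−n + 1/n | n ∈ N}, both closed but not compact, for which A + B contains every 1/n and hence has 0 as a limit point that is not a sum. A small numerical sketch (my own illustration, not from the text):

```python
# Sketch showing why compactness of B matters in Proposition 1.93:
# A = {n} and B = {-n + 1/n} are closed subsets of R, neither compact,
# yet A + B contains 1/n for every n, so 0 lies in cl(A + B) \ (A + B).
N = 100
A = [n for n in range(1, N + 1)]
B = [-n + 1.0 / n for n in range(1, N + 1)]

sums = {a + b for a in A for b in B}
closest = min(abs(s) for s in sums)  # realized by n + (-n + 1/n) = 1/N

print(closest)
```

As N grows, `closest` tends to 0, while 0 itself never appears among the finite sums.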
Now we turn to some properties of linear mappings between topological
vector spaces. The first observation contains an elementary albeit useful fact.
Proof. The first statement is obvious. To verify the second statement, suppose
that U is a neighborhood of x0 in X for the weak topology. Then there exists
a weakly open set W such that
x0 ∈ W ⊂ U.
Without loss of generality, assume that there exist open sets Gi in F and
linear functions fi ∈ X∗ for i = 1, . . . , m ensuring the representation
W = ⋂_{i=1}^m fi⁻¹(Gi).
The next statement reveals one of the main differences between finite-
dimensional and infinite-dimensional spaces. Remember that B stands for the
closed unit ball of the space in question.
Proposition 1.105 Let X be an infinite-dimensional normed space. Then the unit
sphere S := {x ∈ X | ‖x‖ = 1} is never closed in the weak topology σ(X, X∗).
More precisely, we have the equality cl_w(S) = B for the weak closure cl_w(S) of S.
To verify the opposite inclusion, it suffices to show that B ⊂ cl_w(S), since it
is obvious that S ⊂ cl_w(S) and that B = B̊ ∪ S, where B̊ is the open unit ball in
X. We proceed by fixing any x0 ∈ X with ‖x0‖ < 1 and any open set G in
the weak topology that contains x0. Then G contains a set of the form
V := {x ∈ X | |fi(x − x0)| < ε for all i = 1, . . . , m}
for some ε > 0 and f1, . . . , fm ∈ X∗. Thus there exists y0 ∈ X with y0 ≠ 0 such that fi(y0) = 0 for all i =
1, . . . , m. Indeed, suppose on the contrary that for every y ∈ X \ {0} we have
fi(y) ≠ 0 for some i = 1, . . . , m. Then the mapping T : X → Fm defined by T(x) :=
(f1(x), . . . , fm(x)) is one-to-one, which is a contradiction since X is infinite-dimensional.
Based on the continuity of ϕ(t) := ‖x0 + ty0‖ on R, select t ∈ R such that
x0 + ty0 ∈ S. It clearly implies that x0 + ty0 ∈ S ∩ V, and hence G ∩ S ≠ ∅.
This tells us that B̊ ⊂ cl_w(S), and thus B ⊂ cl_w(S) due to S ⊂ cl_w(S).
The obtained result easily yields the following observation.
Corollary 1.106 The open unit ball B̊ ⊂ X is never open in the weak topology
σ(X, X∗) if the normed space X is infinite-dimensional.
Proof. Suppose on the contrary that B̊ is open in the weak topology of X.
Then its complement {x ∈ X | ‖x‖ ≥ 1} is weakly closed. Since S = B ∩ {x ∈ X | ‖x‖ ≥ 1}
and the closed unit ball B is obviously weakly closed, S is weakly closed as well. This
contradicts Proposition 1.105 and therefore verifies the claim.
Return now to the general setting where X is a topological vector space
over F. For each x ∈ X, we define the function φx : X ∗ → F by φx (f ) := f (x)
whenever f ∈ X ∗ . The next notion is crucial in many aspects of infinite-
dimensional analysis.
Definition 1.107 The weak∗ topology on X ∗ , denoted by σ(X ∗ , X), is
the topology generated by the collection of functions {φx }x∈X . This amounts
to saying that the weak∗ topology on X ∗ is the weakest topology on X ∗ such
that each function φx for x ∈ X is continuous.
In the case where X is a normed space, we have the three topologies on
X ∗ : the strong topology τ , the weak topology σ(X ∗ , X ∗∗ ), and the weak∗
topology σ(X ∗ , X). Since X ⊂ X ∗∗ , it follows that
σ(X ∗ , X) ⊂ σ(X ∗ , X ∗∗ ) ⊂ τ.
The next proposition and its corollary are consequences of the definitions.
Proof. Suppose that {x∗k } converges to x∗ in σ(X ∗ , X). Take any x ∈ X and
ε > 0. Then the set
V∗(x, x∗; ε) := {z∗ ∈ X∗ | |(z∗ − x∗)(x)| < ε}
Theorem 1.112 Let X be a real topological vector space, and let U be a neigh-
borhood of 0 ∈ X. Then U ◦ is compact in X ∗ equipped with the weak∗ topology.
Given any x ∈ V, consider the family of sets {Ix}x∈V, where Ix := [−1, 1] ⊂ R
for x ∈ V, and form the set
Z := ∏_{x∈V} Ix
equipped with the product topology. It follows from Theorem 1.48 by Tikhonov
that Z is a compact topological space. Define the mapping F : V° → Z by
F(f ) := (f (x))x∈V and observe that F is one-to-one. Indeed, if F(f ) = F(g)
for some f, g ∈ V ◦ , then we have
f (v) = g(v) whenever v ∈ V.
Taking any x ∈ X, we find t > 0 such that tx ∈ V . Thus tf (x) = f (tx) =
g(tx) = tg(x), which implies that f (x) = g(x), i.e., F(f ) = F(g) =⇒ f = g.
Now consider the range of F denoted by C := F(V ◦ ). Then the mapping
The main goal of this subsection is to discuss some basic facts about quotient
spaces, which are needed for the subsequent parts of this book. Given a linear
subspace L of a topological vector space X, recall that the quotient space X/L
is defined by
X/L := {x + L | x ∈ X}.
The addition and the scalar multiplication on X/L are given by
(x + L) + (y + L) := (x + y) + L and α(x + L) := αx + L
for x, y ∈ X and scalar α. Since L is a linear subspace, both operations above
are well-defined. It is easy to check that X/L endowed with these operations
is a vector space.
Definition 1.114 Let L be a linear subspace of a vector space X. The codi-
mension of L in X, denoted by codim(L), is the dimension of the quotient
space X/L, i.e.,
codim(L) := dim(X/L).
The following two propositions deal with vector spaces of codimension one.
Proposition 1.115 Let X be a vector space, and let f : X → F be a nonzero
linear function. Then we have
codim(ker f ) = 1.
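In finite dimensions, Proposition 1.115 can be checked via the rank–nullity theorem. The sketch below (my own illustration; the vector c defining the functional is an arbitrary choice) confirms that for a nonzero linear functional f(x) = ⟨c, x⟩ on Rⁿ the kernel has dimension n − 1, so codim(ker f) = 1:

```python
import numpy as np

# Sanity check of codim(ker f) = 1 for a nonzero linear functional on R^n:
# viewing f as a 1 x n matrix, rank-nullity gives dim(ker f) = n - rank(f),
# and codim(ker f) = dim(X / ker f) = n - dim(ker f).
n = 5
c = np.array([1.0, -2.0, 0.0, 3.0, 0.5])        # f(x) = c @ x, f != 0

rank = np.linalg.matrix_rank(c.reshape(1, n))   # rank of f as a matrix
dim_ker = n - rank                              # rank-nullity theorem
codim = n - dim_ker                             # dimension of the quotient

print(dim_ker, codim)
```

Any nonzero choice of c gives the same answer, since a nonzero 1 × n matrix always has rank 1.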
(b) Given a subset Ω ⊂ X/L, observe that Ω is closed in X/L if and only
if π −1 (Ω) is closed in X. Indeed, if Ω is closed, then π −1 (Ω) is closed by
the continuity of π. Supposing now that π −1 (Ω) is closed implies that its
complement (π −1 (Ω))c = π −1 (Ω c ) is open. Since π is surjective and open,
π(π −1 (Ω c )) = Ω c is open, which shows that Ω is closed.
To proceed further, fix any singleton Ω := {z + L} in X/L and get z + L =
π(z). Then we have π −1 (Ω) = π −1 (π(z)) = z + L, which is a closed subset of
X, and so Ω is closed in the quotient space X/L. Now fix a neighborhood V
of the origin in X/L and observe that π −1 (V ) is a neighborhood of the origin
in X by the continuity of π. Thus there exists a neighborhood U of the origin
in X such that U + U ⊂ π −1 (V ), which implies that
π(U ) + π(U ) ⊂ V.
Since π is an open map, the set W := π(U ) is a neighborhood of the origin
in X/L with W + W ⊂ V , and hence the addition in X/L is continuous. It is
similar to prove the continuity of the scalar multiplication in X/L.
(c) Let B be a basis of neighborhoods of the origin in X. Fix any neighborhood
V of the origin in X/L and get that π⁻¹(V) is a neighborhood of the origin
in X. Thus there exists B ∈ B such that B ⊂ π⁻¹(V), which tells us that
π(B) ⊂ V. Since π(B) belongs to the collection π(B) := {π(D) | D ∈ B}, we conclude
therefore that π(B) is a basis of neighborhoods of the origin in X/L.
Recall first the notion of a binary relation, which provides a partial ordering
on the set in question and is widely used in what follows.
Definition 1.120 Given a nonempty set A, we say that a binary relation
“≤” defines a partial order on A if the following properties are satisfied:
(a) For all a ∈ A we have a ≤ a.
(b) If a ≤ b and b ≤ a, then a = b.
(c) If a ≤ b and b ≤ c, then a ≤ c.
A set ordered by a binary relation ≤ is called a partially ordered set.
It is instructive to discuss several typical settings of partial (and full/total)
ordering. The first Example 1.121 is rather simple, while Example 1.122 provides
a more involved partial ordering that is used in the proof of the Hahn-Banach
theorem given below.
Example 1.121 We have the following illustrations of Definition 1.120:
(a) Let Ω be a nonempty set, and let P denote the collection of all the subsets
of Ω. For two sets A, B ∈ P, define the binary relation A ≤ B by A ⊂ B.
It is easy to verify that ≤ is a partial ordering on P.
(b) The standard ordering relation ≤ on the set of all real numbers R
clearly makes R a totally ordered set.
Example 1.122 Let Y be a subspace of a real vector space X, and let f : Y →
R be a linear mapping. We say that g is an extension of f if it is defined on
a subspace Dg containing Y with f (y) = g(y) for all y ∈ Y . Denote by F the
collection of all the extensions of f . The set F is obviously nonempty because
it contains f itself. For any g1 , g2 ∈ F, we define the following relation: g1 ≤ g2
if and only if
Dg1 ⊂ Dg2 and g2 (x) = g1 (x) for all x ∈ Dg1 .
We can directly check that F is a partially ordered set.
It is natural to have certain notions of upper bound and maximality with
respect to partial orders. Here we use the standard ones.
Definition 1.123 Let A be a partially ordered set, and let S ⊂ A.
(a) S is called a totally ordered set if for any a, b ∈ S, we have that
a ≤ b or b ≤ a with respect to the ordering binary relation.
(b) An element a ∈ A is called an upper bound of S if s ≤ a for all s ∈ S.
(c) An element q ∈ A is called a maximal element of S if q ∈ S and the
following implication holds:
[a ∈ A, q ≤ a] =⇒ [a = q].
= d(x; Y) = ⟨x0∗, x⟩,
which completes the proof since the opposite inequalities are obvious.
Example 1.133 Both the set of all rational numbers and the set of all irra-
tional numbers are dense in R. Moreover, Q is countable, so R is separable.
Before the proof of Baire’s theorem, we present the following simple lemma.
Now we are ready to formulate and prove the Baire category theorem.
Theorem 1.137 Let X be a complete metric space represented as
X = ⋃_{n=1}^∞ An.
Then there exists n0 ∈ N such that int(cl(An0)) ≠ ∅.
for all n ∈ N. The classical Cantor intersection theorem (see Exercise 1.156)
gives us a common point a ∈ ⋂_{n=1}^∞ Bn. This is a clear contradiction because
a ∉ An for every n ∈ N, and thus we complete the proof of the theorem.
Now we present another classical result of analysis known as the open mapping
theorem. It concerns the following fundamental property of mappings.
Definition 1.138 A mapping f : X → Y between two topological spaces is
open if the image set f (G) is open in Y for any open subset G of X.
Here is the main open mapping result with a complete proof. Recall that a
linear mapping A : X → Y between vector spaces X and Y is called surjective,
or onto, if the image of X under A covers the entire space Y , i.e., AX = Y .
Theorem 1.139 Let X and Y be Banach spaces, and let A : X → Y be a
surjective continuous linear mapping. Then A is open.
Proof. Denote Xr := B(0; r) = {x ∈ X | ‖x‖ < r} and observe that X =
⋃_{n=1}^∞ Xn. Since A is surjective, we have the equalities
Y = AX = A(⋃_{n=1}^∞ Xn) = ⋃_{n=1}^∞ AXn.
The Baire category theorem tells us that there exists n0 ∈ N such that
int(cl(AXn0)) ≠ ∅.
Thus we can find y0 ∈ Y and r > 0 for which
B(y0; r) = {y ∈ Y | ‖y − y0‖ ≤ r} ⊂ cl(AXn0).
The rest of the proof is split into the following three steps.
Step 1: The origin is an interior point of cl(AXp) for all p > 0.
Pick y ∈ B(0; r) and get y0 + y ∈ B(y0; r) ⊂ cl(AXn0), which tells us that
y ∈ −y0 + cl(AXn0) ⊂ cl(AX2n0).
The linearity of A yields the inclusion
B(0; γ) ⊂ cl(AX2n0γ/r) for all γ > 0. (1.4)
Step 2: The origin is an interior point of AXp for all p > 0.
For y ∈ Y with ‖y‖ ≤ 1, we deduce from (1.4) with γ = 1 that there exists a
vector x1 ∈ X2n0/r such that
‖y − Ax1‖ ≤ 1/2.
Since y − Ax1 ∈ B(0; 1/2), we apply (1.4) again and choose x2 ∈ Xn0/r with
‖y − Ax1 − Ax2‖ ≤ 1/2².
Following this process, for each m ∈ N we find xm ∈ X2n0/(2^{m−1}r) satisfying
‖y − Σ_{i=1}^m Axi‖ ≤ 1/2^m.
Since ‖xm‖ ≤ 2n0/(2^{m−1}r), the series Σ_{m=1}^∞ xm is absolutely convergent in the
Banach space X. Denoting its sum by x := Σ_{m=1}^∞ xm, we get
‖x‖ ≤ Σ_{m=1}^∞ 2n0/(2^{m−1}r) = 4n0/r.
Corollary 1.141 Let X be a linear space, and let ‖·‖₁ and ‖·‖₂ be two norms
on X such that (X, ‖·‖₁) and (X, ‖·‖₂) are Banach spaces. If there exists a
number α > 0 with
‖x‖₂ ≤ α‖x‖₁ whenever x ∈ X,
then we can find a constant β > 0 ensuring the inequality
‖x‖₁ ≤ β‖x‖₂ for all x ∈ X.
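A finite-dimensional illustration of this two-norm situation (my own sketch, not from the text; in Rⁿ both norms are automatically complete, so Corollary 1.141 applies): the bound ‖x‖₂ ≤ ‖x‖₁ holds with α = 1, and the reverse bound holds with β = √n by the Cauchy-Schwarz inequality.

```python
import numpy as np

# Check the norm-equivalence bounds ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2
# on random vectors in R^n (alpha = 1, beta = sqrt(n) in the corollary).
rng = np.random.default_rng(0)
n = 10
beta = np.sqrt(n)

ok = True
for _ in range(1000):
    x = rng.standard_normal(n)
    n1, n2 = np.abs(x).sum(), np.sqrt((x * x).sum())
    ok = bool(ok and n2 <= n1 + 1e-12 and n1 <= beta * n2 + 1e-12)

print(ok)
```

The corollary itself is about infinite-dimensional Banach spaces, where the existence of β is far from automatic and rests on the open mapping theorem.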
‖P(z)‖ = ‖x‖ ≤ ‖x‖ + ‖Ax‖ = ‖z‖.
Using Corollary 1.140 ensures that the inverse mapping P −1 : X → Z is
continuous as well. In particular, if {xn } converges to x, then {P −1 (xn )}
converges to P −1 (x), which implies that the sequence {Axn } converges to Ax
and thus verifies the continuity of A.
For the second result of this subsection we need the following notions.
Definition 1.144 Let X and Y be normed spaces, and let {Aα }α∈I be a fam-
ily of linear mappings from X to Y . Then we say that the family {Aα }α∈I is
pointwise bounded on X if
sup_{α∈I} ‖Aα(x)‖ < ∞ for each x ∈ X.
‖Aα(x̄ + r x/‖x‖)‖ ≤ n0 for all α ∈ I.
This yields the estimate
‖Aα(r x/‖x‖)‖ ≤ n0 + ‖Aα(x̄)‖ ≤ 2n0 for all α ∈ I,
which tells us in turn that
‖Aα(x)‖ ≤ (2n0/r)‖x‖ whenever α ∈ I.
It shows that ‖Aα‖ ≤ 2n0/r for all α ∈ I and thus completes the proof.
Prove that B is a basis for a topology τ on R. Then show that τ is weaker than the
usual topology on R.
Exercise 1.154 In the setting of Proposition 1.32, denote by int(A)Y the interior
of A in Y and clarify the fulfillment of the representation
int(A)Y = int(A) ∩ Y.
Exercise 1.155 Prove that any metric space is a Hausdorff topological space.
is a closed subset of X.
back to Banach [26] who proved the sequential weak∗ compactness of the closed
unit ball in dual spaces of separable normed spaces. The general result on the
topological weak∗ compactness of dual balls in arbitrary normed spaces given in
Corollary 1.113 is due to Leonidas Alaoglu (1914–1981); see [2]. This result is also
known as the Banach-Alaoglu theorem. The full statement of Theorem 1.112 in topo-
logical vector spaces is due to Bourbaki [61]. We present here a simplified proof of
this result. The first version of the fundamental Hahn-Banach extension theorem
was proved by Edward Helly (1884–1943) in his paper [156] published in 1912 for
the case of continuous functions defined in closed intervals of the real line. In his
paper [149] published in 1927, Hans Hahn (1879–1934) proved the Hahn-Banach
theorem in real Banach spaces by extending Helly’s techniques with the usage of
transfinite induction instead of the standard one. Independently of Hahn, but also
using transfinite induction, Banach proved in his paper [25] published in 1929 the
version of the Hahn-Banach theorem (Theorem 1.125) in real vector spaces that we
know nowadays. The Kuratowski-Zorn lemma, which we use in the refined proof
of Theorem 1.125, was established independently in the paper [195] by Kuratowski
published in 1922 and by Max Zorn (1906–1993) in his paper [368] published in 1935.
The open mapping theorem from Theorem 1.139, known also as the Banach-Schauder
principle, was obtained independently by Banach [25, Part II] in 1929 and by Juliusz
Schauder (1899–1943) in his paper [321] published in 1930. The related closed graph
theorem from Theorem 1.143 appeared (together with other fundamental results) in
the classical monograph by Banach [26] on the foundations of functional analysis
published in 1932. The uniform boundedness principle from Theorem 1.145, known
as the Banach-Steinhaus theorem, was also given in Banach’s book while being based
on the joint paper [28] by Banach and his teacher Hugo Steinhaus (1887–1972) pub-
lished in 1927. The Baire category theorem, which is crucially used in the proofs of
the basic results presented in Subsections 1.3.3 and 1.3.4, was obtained in the doc-
toral dissertation [22] by René Baire (1874–1932) published in 1899. More complete
theories of topological spaces and functional analysis presented in this chapter can
be found in the books by Bourbaki [60, 61], Dieudonné [107], Dunford and Schwartz
[114], Kelley [180], and Rudin [318] among other references.
2
BASIC THEORY OF CONVEXITY
This chapter is devoted to basic convexity theory dealing with sets and functions
defined in various space frameworks consisting of linear/vector spaces, topolog-
ical vector spaces, locally convex topological spaces and their subclasses, and
also specific results in finite dimensions. Developing the geometric approach to
convex analysis, we start with convex sets, establish fundamental separation the-
orems for them, and then proceed with the study of convex functions. Further
topics on convexity, including duality and generalized differentiation theories,
are considered in the subsequent chapters. Unless otherwise stated, we consider
real vector spaces in this chapter and the subsequent ones.
Proof. To prove the “only if” part, suppose that B is an affine mapping. Then
there exist a linear mapping A : X → Y and a vector b ∈ Y such that (2.2) is
satisfied. Given any x1 , x2 ∈ X and λ ∈ R, it follows that
B(λx1 + (1 − λ)x2) = A(λx1 + (1 − λ)x2) + b
= λA(x1) + (1 − λ)A(x2) + λb + (1 − λ)b
= λ(A(x1) + b) + (1 − λ)(A(x2) + b)
= λB(x1) + (1 − λ)B(x2),
which therefore verifies the validity of (2.3).
To prove the opposite implication, suppose that B : X → Y satisfies (2.3)
for all λ ∈ R and x1 , x2 ∈ X. Letting b := B(0), define the mapping
A(x) := B(x) − b for x ∈ X (2.4)
and show that it is linear. Indeed, for any x1 , x2 ∈ X and λ ∈ R we employ
(2.3) and (2.4) to verify that
A(λx1 + (1 − λ)x2) = λA(x1) + (1 − λ)A(x2) and A(0) = 0.
Given any x ∈ X, observe that
A(λx) = A(λx + (1 − λ)0) = λA(x) + (1 − λ)A(0) = λA(x).
We have furthermore that
A(x1 + x2) = A(2 · (x1 + x2)/2) = 2A((1/2)x1 + (1/2)x2) = 2((1/2)A(x1) + (1/2)A(x2))
= A(x1) + A(x2),
which justifies the linearity of A. It follows from (2.4) that B(x) = A(x) + b
whenever x ∈ X, and thus the mapping B is affine.
It is easy to verify that the convexity of sets is preserved under taking their
direct and inverse images/preimages by affine mappings.
Proof. We only prove the first assertion and leave the proof of the second
one as an exercise for the reader. Fix any a, b ∈ B(Ω) and λ ∈ (0, 1). Then
a = B(x) and b = B(y) for some x, y ∈ Ω. Proposition 2.2 tells us that
λa + (1 − λ)b = λB(x) + (1 − λ)B(y) = B(λx + (1 − λ)y).
Since Ω is convex, we get λx + (1 − λ)y ∈ Ω, and hence λa + (1 − λ)b ∈ B(Ω).
This verifies the convexity of the image B(Ω).
Next we proceed with Cartesian products. Given two vector spaces X and
Y , their product X × Y is a vector space with the operations
(x1 , y1 ) + (x2 , y2 ) := (x1 + x2 , y1 + y2 ),
λ(x1 , y1 ) := (λx1 , λy1 )
for (x1 , y1 ), (x2 , y2 ) ∈ X × Y and λ ∈ R.
Finally in this subsection, we observe that the notion of convexity for sets
can be directly extended to set-valued mappings via passing to their graphs.
By a set-valued mapping/multifunction between vector spaces X and Y we
understand a mapping F defined on X with values in the collection of all the
subsets of Y, i.e., with F(x) ⊂ Y; see Figure 2.2. The notation F : X ⇉ Y
is used for set-valued mappings instead of the usual notation F : X → Y
for single-valued ones. As we see below in the text and commentaries, set-
for single-valued ones. As we see below in the text and commentaries, set-
valued mappings play a highly important role in many aspects of convex and
variational analysis as well as in their numerous applications.
It follows from the definition that any vector of the form x = λa + (1 − λ)b,
where a, b ∈ X and 0 ≤ λ ≤ 1, is a convex combination of a and b.
Here is a useful characterization of convexity for arbitrary nonempty sets
in general (real) vector spaces via convex combinations of their elements; see
the illustration of this result in Figure 2.3.
Proposition 2.6 A subset Ω of a vector space X is convex if and only if it
contains all the convex combinations of its elements.
Proof. The sufficiency part is trivial. To justify the necessity, we show by
induction that any convex combination x = Σ_{i=1}^m λiωi of elements in Ω is
also an element of Ω. This conclusion follows directly from the definition for
m = 1, 2. Fix now a positive integer m ≥ 2 and suppose that every convex
combination of m elements from Ω belongs to Ω. Form the convex combination
y = Σ_{i=1}^{m+1} λiωi with Σ_{i=1}^{m+1} λi = 1 and λi ≥ 0
Proof. Taking any a, b ∈ ⋂_{α∈I} Ωα and λ ∈ (0, 1), we get that a, b ∈ Ωα for
all α ∈ I. The convexity of each Ωα ensures that λa + (1 − λ)b ∈ Ωα. Thus
λa + (1 − λ)b ∈ ⋂_{α∈I} Ωα, and the intersection ⋂_{α∈I} Ωα is convex.
The next result follows from the definition and Proposition 2.7.
Theorem 2.10 For any subset Ω of a vector space X, its convex hull co(Ω)
admits the representation
co(Ω) = { Σ_{i=1}^m λiai | Σ_{i=1}^m λi = 1, λi ≥ 0, ai ∈ Ω, m ∈ N }.
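This representation of the convex hull is easy to probe numerically. The sketch below (my own illustration, not from the text) takes Ω to be the four corners of the unit square, whose convex hull is the square itself, and checks that random convex combinations of the corners always land inside [0, 1] × [0, 1]:

```python
import numpy as np

# Numerical sketch of the convex-hull representation: every finite convex
# combination of points of Omega belongs to co(Omega). Here Omega consists of
# the corners of the unit square, so co(Omega) = [0, 1] x [0, 1].
rng = np.random.default_rng(1)
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

inside = True
for _ in range(1000):
    lam = rng.random(4)
    lam /= lam.sum()              # lambda_i >= 0 with sum lambda_i = 1
    point = lam @ corners         # convex combination of the four corners
    inside = inside and bool((point >= 0.0).all() and (point <= 1.0).all())

print(inside)
```

By Carathéodory's theorem (proved later for finite dimensions), m = 3 points would already suffice in the plane, but the representation above allows any finite m.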
xλ + εV = λa + (1 − λ)b + εV
⊂ λa + (1 − λ)(Ω + εV) + εV
= λa + (1 − λ)Ω + (1 − λ)εV + εV
= λa + (1 − λ)εV + εV + (1 − λ)Ω
= λ(a + ((1 − λ)ε/λ)V + (ε/λ)V) + (1 − λ)Ω
⊂ λ(a + V + V) + (1 − λ)Ω
⊂ λ(a + W) + (1 − λ)Ω
⊂ λΩ + (1 − λ)Ω ⊂ Ω.
This shows that xλ ∈ int(Ω) and thus verifies that [a, b) ⊂ int(Ω).
Next we discuss some relationships between the closure and interior oper-
ations applied to the convex set in question and its closure.
Proof. The first assertion requires us to show that cl(Ω) ⊂ cl(int(Ω)), since the
opposite inclusion is obvious. Picking b ∈ cl(Ω), a ∈ int(Ω), and t ∈ (0, 1), we set
xt := ta + (1 − t)b = b + t(a − b).
For any neighborhood V of the origin, choose t > 0 so small that t(b − a) ∈ V,
and hence xt ∈ (b + V) ∩ int(Ω). This yields b ∈ cl(int(Ω)), and we are done.
To prove the second assertion, we need to verify that int(cl(Ω)) ⊂ int(Ω); the
opposite inclusion is obvious. Fix any vectors b ∈ int(cl(Ω)) and a ∈ int(Ω) and
then take ε > 0 sufficiently small such that c := b + ε(b − a) ∈ cl(Ω). Using
Lemma 2.12 brings us to the inclusion
b = (ε/(1 + ε))a + (1/(1 + ε))c ∈ (a, c) ⊂ int(Ω),
which justifies that int(cl(Ω)) ⊂ int(Ω) and thus completes the proof.
The next result also employs Lemma 2.12 to calculate the closure of convex
set intersections under the interiority qualification condition.
lin(Ω) := {x ∈ X | ∃ w ∈ Ω : [w, x) ⊂ Ω}. (2.6)
When X is a topological vector space and Ω is a convex subset of X, it is
easy to check that
int(Ω) ⊂ core(Ω) ⊂ Ω ⊂ lin(Ω) ⊂ cl(Ω), (2.7)
where all the inclusions can be strict.
The next proposition follows directly from the definitions.
The following result shows that the operation of taking cores always pre-
serves set convexity.
Proposition 2.16 The core of every convex subset of a vector space X is also
a convex subset of X.
Proof. Fix any a, b ∈ core(Ω) and 0 < λ < 1. It follows from (2.5) that for
any v ∈ X there exists δ > 0 such that
a + γv ∈ Ω and b + γv ∈ Ω whenever |γ| < δ.
For each such number γ we have the relationships
λa + (1 − λ)b + γv = λ(a + γv) + (1 − λ)(b + γv) ∈ λΩ + (1 − λ)Ω ⊂ Ω.
This implies that λa + (1 − λ)b ∈ core(Ω), and hence core(Ω) is convex.
The next observation presents a useful way to check that a point belongs
to the core of a given convex set.
Proof. The inclusion int(Ω) ⊂ core(Ω) was mentioned in (2.7); see Exer-
cise 2.200. To verify the opposite one, consider the following two cases.
Case 1: 0 ∈ int(Ω). Fix any x ∈ core(Ω) and by the definition of cores find
t > 0 with x + tx ∈ Ω. Then we have x = (1/(1 + t))w for some w ∈ Ω. Employing
Lemma 2.12 tells us that
x = (1/(1 + t))w + (t/(1 + t))0 ∈ int(Ω),
which shows that core(Ω) ⊂ int(Ω).
Case 2: 0 ∉ int(Ω). Choose a ∈ int(Ω) and define Θ := Ω − a. Then 0 ∈ int(Θ),
and we get therefore that
core(Ω) − a = core(Θ) ⊂ int(Θ) = int(Ω) − a.
This yields core(Ω) ⊂ int(Ω), which completes the proof.
Proof. Pick a, b ∈ lin(Ω) and λ ∈ (0, 1). Then there are vectors u, v ∈ Ω with
[u, a) ⊂ Ω and [v, b) ⊂ Ω.
Denoting xλ := λa + (1 − λ)b and wλ := λu + (1 − λ)v ∈ Ω, we see that
[wλ , xλ ) ⊂ Ω, and so xλ ∈ lin(Ω). This verifies the convexity of lin(Ω).
Proof. Fix λ ∈ (0, 1), define xλ := λa + (1 − λ)b, and then verify that xλ ∈
core(Ω). Since a ∈ core(Ω), there exists δ > 0 such that
a + tv ∈ Ω whenever |t| < δ.
Now taking such t and using the convexity of Ω readily imply that
xλ + tλv = λa + (1 − λ)b + tλv = λ(a + tv) + (1 − λ)b ∈ Ω,
which yields xλ ∈ core(Ω).
The next result in this direction addresses the Banach space setting and
employs the Baire category theorem. First we present a characterization of
absorbing sets in Banach spaces, which is certainly of its own interest.
Lemma 2.23 Let X be a Banach space, and let Ω be a closed convex subset
of X. Then Ω is absorbing if and only if 0 ∈ int(Ω).
Proof. We only need to show that the absorbing property of Ω yields 0 ∈
int(Ω); the converse is obvious. If Ω is absorbing, we have
X = ⋃_{n=1}^∞ nΩ.
Then the Baire category theorem ensures the existence of n0 ∈ N such that
int(n0Ω) = int(cl(n0Ω)) ≠ ∅.
This implies that int(Ω) ≠ ∅. Picking any x0 ∈ int(Ω) and using the absorbing
property of Ω, we find ε > 0 such that [−εx0, εx0] ⊂ Ω. In particular, it shows
that −εx0 ∈ Ω, and thus 0 ∈ (−εx0, x0) ⊂ int(Ω) by Lemma 2.12.
Proof. It suffices to verify the inclusion core(Ω) ⊂ int(Ω) whenever the set Ω
is nonempty, closed, and convex. Fix any x ∈ core(Ω) and observe that the
shifted set Ω − x is closed and absorbing. Thus 0 ∈ int(Ω − x) = int(Ω) − x
by Lemma 2.23. It shows that x ∈ int(Ω), and we are done.
Proof. Fix any x0 ∈ core(Ω) and show that A(x0 ) ∈ core(A(Ω)). Picking
v ∈ Y gives us v = A(u) for some u ∈ X. Choose δ > 0 such that x0 + tu ∈ Ω
whenever |t| < δ. Thus we get
A(x0 ) + tv = A(x0 + tu) ∈ A(Ω) whenever |t| < δ.
It follows therefore that A(x0 ) ∈ core(A(Ω)).
If core(Ω) ≠ ∅, let us further check that
core(A(Ω)) ⊂ A(core(Ω)).
First consider the case where 0 ∈ core(Ω). Choose any y ∈ core(A(Ω)) and
find t > 0 such that y + ty ∈ A(Ω), which tells us that
y ∈ (1/(1 + t))A(Ω) = A((1/(1 + t))Ω).
Since 0 ∈ core(Ω), it follows that (1/(1 + t))Ω ⊂ core(Ω); see Lemma 2.21. Thus
y ∈ A(core(Ω)), which justifies the equality in this case.
In the general case of the proposition, take any a ∈ core(Ω) and get
0 ∈ core(Ω − a). Then
A core(Ω − a) = core A(Ω − a) ,
which clearly yields the equality in (2.8).
Proof. Since Ω is absorbing, for any x ∈ X there exists δ > 0 such that
tx ∈ Ω whenever |t| < δ,
which shows that the value pΩ (x) is a real number. Let us now verify each of
the listed properties of pΩ .
(a) To check the subadditivity of pΩ , for any x, y ∈ X pick ε > 0 and find
numbers s, t > 0 such that
s < pΩ (x) + ε, t < pΩ (y) + ε, and x ∈ sΩ, y ∈ tΩ.
Since Ω is convex, we get x + y ∈ sΩ + tΩ = (s + t)Ω, and so
pΩ (x + y) ≤ s + t < pΩ (x) + pΩ (y) + 2ε.
This implies that pΩ (x + y) ≤ pΩ (x) + pΩ (y) and thus shows that pΩ is
subadditive. Taking further x ∈ X and λ > 0, we have
pΩ(λx) = inf{t > 0 | λx ∈ tΩ} = inf{t > 0 | x ∈ (t/λ)Ω}
= λ inf{s > 0 | x ∈ sΩ} = λpΩ(x),
which justifies the positive homogeneity of the Minkowski function.
(b) Pick any x ∈ X with pΩ (x) < 1 and find λ ∈ (0, 1) such that x ∈ λΩ.
Since Ω is absorbing, for any v ∈ X there exists γ > 0 with αv ∈ Ω whenever
|α| < γ. Thus (1 − λ)αv ∈ (1 − λ)Ω for all α ∈ R with |α| < γ. It follows from
the convexity of Ω that
x + (1 − λ)αv ∈ λΩ + (1 − λ)Ω = Ω whenever |α| < γ.
Letting δ := (1 − λ)γ yields |ε/(1 − λ)| < γ if |ε| < δ, and so
x + (1 − λ)(ε/(1 − λ))v = x + εv ∈ Ω,
which verifies the inclusion x ∈ core(Ω).
Conversely, suppose that x ∈ core(Ω) and find γ > 0 with x + γx ∈ Ω.
Then we get pΩ(x) ≤ 1/(1 + γ) < 1, which completes the proof of (b).
(c) Fix any x ∈ X with p(x) ≤ 1 and any λ ∈ (0, 1). Then p(λx) < 1 and
therefore λx ∈ core(Ω) ⊂ Ω. Since 0 ∈ Ω, it follows that (1 − λ)0 + λx ∈ Ω
for all λ ∈ (0, 1), and hence [0, x) ⊂ Ω. Thus we arrive at x ∈ lin(Ω).
For any open set V of R containing 0, find ε > 0 such that [0, ε] ⊂ V. Then
εΩ ⊂ pΩ^{-1}(V).
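The defining formula pΩ(x) = inf{t > 0 | x ∈ tΩ} can be evaluated numerically whenever a membership oracle for Ω is available. The following Python sketch (the function name `minkowski_gauge` and the choice Ω = {x | ‖x‖∞ ≤ 1} are ours, purely for illustration) computes the Minkowski function by bisection and spot-checks the subadditivity and positive homogeneity established in the theorem above.

```python
import numpy as np

def minkowski_gauge(x, in_omega, t_max=1e6, tol=1e-9):
    """p_Omega(x) = inf{t > 0 : x in t*Omega} via bisection on t,
    assuming Omega is convex and absorbing with membership oracle in_omega."""
    x = np.asarray(x, dtype=float)
    hi = 1.0
    while not in_omega(x / hi):        # grow hi until x lies in hi*Omega
        hi *= 2.0
        if hi > t_max:
            return float("inf")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid > 0.0 and in_omega(x / mid):
            hi = mid
        else:
            lo = mid
    return hi

# Omega = unit ball of the sup-norm, so p_Omega(x) = max(|x1|, |x2|)
box = lambda z: np.max(np.abs(z)) <= 1.0
x, y = np.array([3.0, -1.0]), np.array([0.5, 2.0])
px, py, pxy = (minkowski_gauge(v, box) for v in (x, y, x + y))
assert abs(px - 3.0) < 1e-6                                   # p(x) = 3
assert pxy <= px + py + 1e-6                                  # subadditivity
assert abs(minkowski_gauge(2.5 * x, box) - 2.5 * px) < 1e-5   # homogeneity
```

The oracle-plus-bisection design works for any convex absorbing set for which membership can be tested, not only for norm balls.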
Proof. (a) Fix any x ∈ B and any scalar t with |t| ≤ 1. Then p(tx) = |t|p(x) ≤
p(x) < 1. It tells us that tx ∈ B, and hence B is balanced. For any x, y ∈ B
and λ ∈ (0, 1) we get the relationships
p(λx + (1 − λ)y) ≤ p(λx) + p((1 − λ)y) = |λ|p(x) + |1 − λ|p(y) < λ + (1 − λ) = 1,
which show that λx + (1 − λ)y ∈ B, and so B is convex. If p(x) = 0, then
x ∈ B. For p(x) > 0 we take t := 1/(p(x) + 1) and get p(λx) = |λ|p(x) < 1
whenever |λ| < t, which tells us that B is absorbing. We can similarly verify
that the set C is also balanced, convex, and absorbing.
(b) Due to Theorem 2.26 it remains to show that pΩ (λx) = |λ|pΩ (x) for all
x ∈ X and scalar λ ∈ R. Fixing any x ∈ X and λ with |λ| = 1, we deduce
from the balanced property of Ω, and hence of the set tΩ with t > 0, that
λx ∈ tΩ if and only if x ∈ tΩ.
It follows furthermore that
pΩ(λx) = inf{t > 0 | λx ∈ tΩ} = inf{t > 0 | x ∈ tΩ} = pΩ(x).
Taking now λ ∈ R \ {0}, we have the condition
pΩ((λ/|λ|)x) = pΩ(x).
Finally, Theorem 2.26(a) yields pΩ (λx) = |λ|pΩ (x) for any scalar λ.
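To see concretely why balancedness yields only a seminorm rather than a norm, consider the illustrative set Ω = {x ∈ R² : |x₁| ≤ 1} (our choice): it is convex, balanced, and absorbing, and its Minkowski function p(x) = |x₁| vanishes on the whole x₂-axis. A quick numerical check of the seminorm properties:

```python
import numpy as np

# Omega = {x in R^2 : |x1| <= 1}: convex, balanced, absorbing.
# Its Minkowski function is p(x) = |x1|, a seminorm that is not a norm.
p = lambda x: abs(x[0])

x, y = np.array([2.0, -1.0]), np.array([-0.5, 4.0])
assert p(3 * x) == 3 * p(x) and p(-3 * x) == 3 * p(x)  # p(lam*x) = |lam|*p(x)
assert p(x + y) <= p(x) + p(y)                          # subadditivity
assert p(np.array([0.0, 7.0])) == 0.0                   # nonzero vector, p = 0
```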
Then it follows that pi (x0 ) ∈ Ii , and so there exists ε > 0 with (pi (x0 ) −
ε, pi (x0 ) + ε) ⊂ Ii for all i = 1, . . . , m. We claim that
x0 ∈ V (x0 , p1 , . . . , pm ; ε) ⊂ G.
Indeed, pick any vector x ∈ V (x0 , p1 , . . . , pm ; ε) and thus get pi (x − x0 ) < ε
for all i = 1, . . . , m. Then we have
|pi (x) − pi (x0 )| ≤ pi (x − x0 ) < ε.
It follows that pi(x) ∈ Ii for all i = 1, . . . , m, and thus x ∈ ⋂_{i=1}^m pi^{-1}(Ii) ⊂ G,
which verifies that B is a basis of neighborhoods of x0 in (X, τw ).
Let us now show that the addition operation + : X ×X → X is continuous
on X ×X. By its linearity we only need to prove that it is continuous at (0, 0).
Fix any neighborhood V := V (0, p1 , . . . , pm ; ε) of the origin meaning that
V = {x ∈ X | pi(x) < ε for all i = 1, . . . , m},
and let U := V (0, p1 , . . . , pm ; ε/2). It is easy to see that U + U ⊂ V , which
readily verifies the continuity of the addition on X × X.
To verify further that the scalar multiplication · : R × X → X is con-
tinuous, fix any number λ0 ∈ R, point x0 ∈ X, and neighborhood V :=
V (λ0 x0 , p1 , . . . , pm ; ε) of λ0 x0 . Take δ > 0 so small that
|λ0|δ + pi(x0)δ + δ² < ε for all i = 1, . . . , m.
If |λ − λ0| < δ and x ∈ V (x0, p1, . . . , pm; δ), then |λ| < |λ0| + δ and pi(x − x0) <
δ for all i = 1, . . . , m. It follows that
pi(λx − λ0x0) ≤ |λ|pi(x − x0) + |λ − λ0|pi(x0) < (|λ0| + δ)δ + δpi(x0) = |λ0|δ + pi(x0)δ + δ² < ε,
and so λx ∈ V, which verifies the continuity of the scalar multiplication.
2.2 Cores, Minkowski Functions, and Seminorms 83
The next theorem shows that the topology of any LCTV space is generated
by a family of seminorms.
Theorem 2.36 Let X be an LCTV space. Then its topology is generated
by some family of seminorms. This means that there exists a family P of
seminorms on X such that the weakest topology, which makes each p ∈ P
continuous, agrees with the given topology on X.
Proof. Since X is an LCTV space, there exists a basis of neighborhoods B of
the origin in X consisting of open balanced convex sets. Then P := {pΩ | Ω ∈
B} is a family of seminorms on X. Let τ be the given topology on X, and let
τw be the topology generated by P as in Theorem 1.19. To show that τ = τw ,
it suffices to verify that for any V ∈ B there exists a neighborhood U of the
origin in (X, τw ) such that U ⊂ V , and that for any neighborhood U of the
origin in (X, τw ) there exists V ∈ B with V ⊂ U . Indeed, picking V ∈ B gives
us pV ∈ P with V = {x ∈ X | pV (x) < 1}. Defining now U := V (0, pV ; 1), we
see that U is a neighborhood of the origin in (X, τw ) with U = V . Conversely,
take a neighborhood U of the origin in (X, τw ) and suppose without loss of
generality that
U = V (0, pV1 , . . . , pVm ; ε) = {x ∈ X | pVi (x) < ε for all i = 1, . . . , m},
where V1, . . . , Vm ∈ B and ε > 0. Then U = ε⋂_{i=1}^m Vi. Finally, letting V :=
ε⋂_{i=1}^m Vi yields V ∈ B with V = U. It verifies that τ = τw.
Next we proceed with the definition and properties of affine sets; see the
illustration in Figure 2.5. Given two elements a and b in a vector space X,
the line connecting them is
L[a, b] := {λa + (1 − λ)b | λ ∈ R}.
Note that if a = b, then L[a, b] reduces to a singleton.
It follows from the definition that the intersection of any collection of affine
sets is also affine. This leads us to the construction of the affine hull of a set,
which is illustrated by Figure 2.6.
Now we consider further relationships between affine sets and linear sub-
spaces.
Proof. Suppose that a nonempty set Ω ⊂ X is affine. It follows from the last
assertion of Proposition 2.40 that the set Ω − ω is a linear subspace for any
ω ∈ Ω. Conversely, fix ω ∈ Ω and suppose that Ω − ω is a linear subspace,
which we denote by L. Then the set Ω = ω + L is obviously affine.
Now we define the major separation properties studied and applied in the
book. These definitions do not require any topology. Appropriate topological
structures will be imposed below when they are needed. The definition of the
simplest separation property is illustrated by Figure 2.7.
Proof. Invoking Proposition 2.48, it suffices to verify that (b) and (c) are
equivalent. Suppose first that x0 and Ω can be properly separated by a hyper-
plane. Let f : X → R be a nonzero linear function such that
f (x) ≤ f (x0 ) for all x ∈ Ω,
and let x̄ ∈ Ω satisfy the condition
f(x̄) < f(x0).
Suppose on the contrary that x0 ∈ core(Ω). Then we choose t > 0 such that
x0 + t(x0 − x̄) ∈ Ω, and therefore
f(x0 + t(x0 − x̄)) ≤ f(x0).
This yields f(x0) ≤ f(x̄), a contradiction, which verifies that x0 ∉ core(Ω).
Let us now prove the converse implication. Observe that core(Ω) is a
nonempty convex subset of X by Proposition 2.19 with core(core(Ω)) =
core(Ω) ≠ ∅ by Corollary 2.22. Since x0 ∉ core(Ω), we deduce from Theorem 2.49 that the sets {x0} and core(Ω) can be properly separated. It gives
us a nonzero linear function f : X → R such that
f(x) ≤ f(x0) for all x ∈ core(Ω)
and also a vector w ∈ core(Ω) ⊂ Ω with f (w) < f (x0 ). Fix further any
u ∈ Ω and observe by Proposition 2.21 that tw + (1 − t)u ∈ core(Ω) whenever
0 < t ≤ 1. It follows therefore that
tf(w) + (1 − t)f(u) = f(tw + (1 − t)u) ≤ f(x0).
Passing to the limit as t ↓ 0 brings us to f(u) ≤ f(x0), which shows that the
sets {x0} and Ω can be properly separated.
Proof. Fix λ ∈ (0, 1), define xλ := λa + (1 − λ)b, and then verify that xλ ∈
core(Ω). Suppose on the contrary that xλ ∉ core(Ω). Then the single-point
set {xλ} and the set Ω can be separated by Theorem 2.50, i.e., there exists a
nonzero linear function f : X → R such that
f (x) ≤ f (xλ ) = λf (a) + (1 − λ)f (b) for all x ∈ Ω. (2.16)
Since b ∈ lin(Ω), the definition of the linear closure from (2.6) shows that
there exists w ∈ Ω such that [w, b) ⊂ Ω. Thus for all n ∈ N we have
xn := b + (1/n)(w − b) ∈ Ω.
Then (2.16) tells us that
f(xn) ≤ f(xλ) ⟺ (1/n)f(w) − (1/n)f(b) + λf(b) ≤ λf(a).
Passing to the limit as n → ∞, we arrive at
f (b) ≤ f (a). (2.17)
Since a ∈ core(Ω), for all large m ∈ N we have
xm := a + (1/m)(a − b) ∈ Ω.
Then (2.16) also ensures that
f(xm) = f(a) + (1/m)f(a) − (1/m)f(b) ≤ λf(a) + (1 − λ)f(b).
Passing there to the limit as m → ∞ and taking into account that λ ∈ (0, 1)
bring us to the equivalence
(1 − λ)f (a) ≤ (1 − λ)f (b) ⇐⇒ f (a) ≤ f (b). (2.18)
Combining now (2.17) and (2.18), we arrive at the equality f (a) = f (b).
Finally, it follows from the inclusion a ∈ core(Ω) that for any v ∈ X there
exists t > 0 such that a + tv ∈ Ω, which implies by (2.16) and the equality
f(a) = f(b) that f(v) = 0. The obtained contradiction verifies that xλ ∈ core(Ω),
and thus we complete the proof of the proposition.
In the last part of this subsection we are going to show that the Hahn-
Banach extension theorem in the full generality of Theorem 1.125 can be
derived from the “extreme” version of Theorem 2.49 that provides the separation of a singleton x0 ∉ Ω from a convex set Ω in the case where Ω = core(Ω).
To proceed, we first present the following two simple lemmas.
Proof. The convexity of the set Ω immediately follows from the definition.
Since core(Ω) ⊂ Ω, only the opposite inclusion needs to be verified. To
this end, fix any x0 ∈ Ω and pick an arbitrary vector v ∈ X. If p(v) = 0, then
for any number 0 < λ < 1 we have
p(x0 + λv) ≤ p(x0) + λp(v) = p(x0) < 1.
In the case where p(v) ≠ 0, define δ := (1 − p(x0))/p(v). If 0 < λ < δ, then
p(x0 + λv) ≤ p(x0) + λp(v) < p(x0) + ((1 − p(x0))/p(v))p(v) = p(x0) + 1 − p(x0) = 1,
which shows that x0 + λv ∈ Ω for all such λ. It tells us therefore that x0 ∈
core(Ω), and thus core(Ω) = Ω. Observing that 0 ∈ Ω = core(Ω), we also see
that the set Ω is absorbing.
To prove further that pΩ = p, fix x ∈ X and check first that pΩ (x) ≤ p(x).
Considering any λ > p(x), we have p(x/λ) < 1, which yields x/λ ∈ Ω and
so x ∈ λΩ. By the definition of the Minkowski function from (2.9) we get
pΩ (x) ≤ λ and hence pΩ (x) ≤ p(x). Taking now λ > 0 with x ∈ λΩ gives us
x = λw for some w ∈ Ω, and so p(w) < 1. Then p(x) = p(λw) = λp(w) < λ,
which ensures that p(x) ≤ pΩ (x). Since x was chosen arbitrarily, it verifies
that p = pΩ and thus completes the proof of the lemma.
Now we are ready to derive the Hahn-Banach theorem from the aforemen-
tioned extreme version of convex separation in vector spaces.
Theorem 2.54 (Hahn-Banach theorem from convex separation). Let X be a
vector space. Then the case of Theorem 2.49 where Ω = core(Ω) implies the
Hahn-Banach theorem formulated in Theorem 1.125.
This subsection continues the study of convex separation, but now in the
setting of vector spaces endowed with topological structures. In preparation
to establish major separation theorems for convex sets in (real) topological
vector spaces, we first present a simple lemma.
Based on Proposition 2.56, we say that two nonempty convex sets Ω1 and Ω2
in a topological vector space X can be separated by a closed hyperplane if
there exists a nonzero continuous linear function f : X → R such that
sup{f(x) | x ∈ Ω1} ≤ inf{f(x) | x ∈ Ω2}.
The proper separation by a closed hyperplane is defined in a similar way.
Now we are ready to derive a topological counterpart of Theorem 2.49.
Proof. It follows from Proposition 2.48 and Theorem 2.49 that there exists a
nonzero linear function f : X → R for which we have
sup{f(x) | x ∈ Ω} ≤ f(x0) and inf{f(x) | x ∈ Ω} < f(x0). (2.21)
It suffices to show that f is continuous. Indeed, it is obvious that f (x) ≤ α :=
f (x0 ) for all x ∈ Ω. Choose a ∈ Ω and a balanced neighborhood V of the
origin such that a + V ⊂ Ω. Then f (x) ≤ γ := α − f (a) for all x ∈ V . Since
0 ∈ V and V is balanced, we get that γ ≥ 0 and
f (V ) ⊂ [−γ, γ],
which shows that f (εV ) ⊂ [−εγ, εγ] for all ε > 0 and hence implies the
continuity of f at the origin. The linearity of f yields its continuity on X.
Assuming finally that Ω is open, we get
Ω = int(Ω) = core(Ω),
and thus (2.20) follows from (2.14) in Theorem 2.49.
The next theorem concerns the proper separation of two convex sets in
topological vector spaces under an interiority condition.
Next we define the notion of strict separation and establish the correspond-
ing version of convex separation theorems in LCTV spaces.
Note that the following theorem does not require the nonempty interior
assumption of either one of the sets in question while replacing it by the
compactness assumption imposed on one of the sets.
γ ≤ sup{f(x) | x ∈ V} ≤ inf{f(x) | x ∈ Ω}.
It follows furthermore that
sup{f(x) | x ∈ Ω1} < sup{f(x) | x ∈ Ω1} + γ ≤ inf{f(x) | x ∈ Ω2}.
The latter tells us that the sets Ω1 and Ω2 can be strictly separated.
Corollary 2.62 Let X be an LCTV space. Then X with the weak topology
σ(X, X ∗ ) is an LCTV space.
Consider the sets Θ1 and Θ2 constructed as in Example 1.100. These sets are
disjoint and open with respect to the weak topology on X. We obviously have
xi ∈ Θi for i = 1, 2, and thus the weak topology σ(X, X∗) on X is Hausdorff.
Next we show that the addition operation in (X, σ(X, X ∗ )) is continuous.
Fix any weakly open set W containing the origin. Then there exist elements
f1 , . . . , fm ∈ X ∗ and a number ε > 0 such that
V (f1 , . . . , fm ; ε) ⊂ W.
Letting U := V (f1 , . . . , fm ; ε/2), it is not hard to check that U is a weakly open
set containing the origin, and that U + U ⊂ W . This verifies the continuity
of the addition operation in (X, σ(X, X ∗ )). The proof of the continuity of
the scalar multiplication in (X, σ(X, X∗)) is left as an exercise for the reader.
It follows from Corollary 1.102 that the space X endowed with the weak
topology σ(X, X ∗ ) is an LCTV space.
Corollary 2.63 Any closed convex set in an LCTV space is weakly closed.
Proof. Let Ω be a nonempty, closed, convex set in an LCTV space X, and let
x0 ∉ Ω. Theorem 2.61 ensures the existence of f ∈ X∗ and γ ∈ R such that
f(x) < γ < f(x0) for all x ∈ Ω.
Then the set V := f −1 ((γ, ∞)) is a weakly open set with x0 ∈ V and Ω ∩ V =
∅. Thus the given set Ω is weakly closed in X.
2.3 Convex Separation Theorems 99
Corollary 2.64 Let X be an LCTV space, and let x ∈ X. If f (x) = 0 for all
f ∈ X ∗ , then x = 0. In particular, X ∗ is nonzero whenever X is nonzero.
Proof. Suppose on the contrary that x ≠ 0. Employing Theorem 2.61 applied
to the convex sets Ω1 := {x} and Ω2 := {0} allows us to find f ∈ X∗ and
γ ∈ R such that
f(x) < γ < 0.
This contradiction justifies the first claim.
Now suppose that X is nonzero and pick an element x ≠ 0 in X. By the
first claim, there exists f ∈ X∗ with f(x) ≠ 0. Thus f is a nonzero element
in X∗, i.e., X∗ ≠ {0}.
The strict separation result of Corollary 2.64 and the next lemma allow
us to establish a major result on the duality relationship between the given
topology on an LCTV space X and the weak∗ topology on X ∗ .
Recall that the kernel of f : X → F is
ker(f) := {x ∈ X | f(x) = 0}.
Lemma 2.66 Let X be a vector space over a field F, and let fi : X → F for
i = 1, . . . , m and f : X → F be linear functions satisfying the condition
⋂_{i=1}^m ker(fi) ⊂ ker(f).
Suppose further that the conclusion holds for some positive integer n with the
inductive hypothesis
⋂_{i=1}^{n+1} ker(fi) ⊂ ker(f).
and get γx∗ ∈ V for γ ∈ R; so f(x∗) = 0. Employing now Lemma 2.66 ensures
the existence of λ1, . . . , λm ∈ R such that f = ∑_{i=1}^m λi fi ∈ X∗, which yields
f(x∗) = ∑_{i=1}^m λi fi(x∗) = ∑_{i=1}^m λi ⟨x∗, xi⟩ = ⟨x∗, ∑_{i=1}^m λi xi⟩.
Remark 2.69 Recall that the separation and related results presented in this
and preceding subsections concern real vector and topological vector spaces.
On the other hand, we may consider their counterparts over the field of complex numbers F = C by taking into account that the definition of convexity,
which is given via real numbers λ ∈ (0, 1), remains unchanged in such spaces since R ⊂ C.
However, since the formulation of separation theorems requires the number
ordering, we need to use the real part “Re” of the complex-valued separating
functions f : X → C.
(b) If the space X is locally convex and its subsets Ω1 and Ω2 are compact
and closed, respectively, then there exist f ∈ X ∗ and α, β ∈ R such that
Re f (x) < α < β < Re f (y) for all x ∈ Ω1 and y ∈ Ω2 .
Recall that the span of some set Ω, span Ω, is the linear subspace generated
by Ω. The following two propositions are simple albeit useful below.
which justifies the reverse inclusion and hence completes the proof.
x = ∑_{i=0}^m λi vi and ∑_{i=0}^m λi = 0.
where μi := 1/(m + 1) + αi ≥ 0 for i = 0, . . . , m. Since ∑_{i=0}^m μi = 1, it ensures
that x ∈ Δm. Thus (v + δB) ∩ aff(Δm) ⊂ Δm and therefore v ∈ ri(Δm).
Proof. To verify (a), denote by m the dimension of Ω. Observe first that the
case where m = 0 is trivial, since in this case Ω is a singleton and ri(Ω) = Ω.
Suppose that m ≥ 1 and find m + 1 affinely independent elements v0 , . . . , vm
in Ω as in Lemma 2.82. Consider further the m-simplex
Δm := co{v0, . . . , vm}
and get that aff(Δm ) = aff(Ω). Take v ∈ ri(Δm ), which exists by Theo-
rem 2.81. For any small ε > 0 we have
B(v, ε) ∩ aff(Ω) = B(v, ε) ∩ aff(Δm ) ⊂ Δm ⊂ Ω.
This verifies that v ∈ ri(Ω) by the definition of relative interior.
To prove (b), let L be the linear subspace of Rn parallel to aff(Ω), and let
m := dim(L). Then there is a bijective linear mapping A : L → Rm such
that A and A−1 are continuous. Fix x0 ∈ aff(Ω) and define f : aff(Ω) →
Rm by f (x) := A(x − x0 ). It is easy to check that f is a bijective affine
mapping and that both f and f −1 are continuous. We see that a ∈ ri(Ω) if
and only if f (a) ∈ int(f (Ω)), and that b ∈ Ω if and only if f (b) ∈ f (Ω). Then
[f (a), f (b)) ⊂ int(f (Ω)) by Lemma 2.12. This shows that [a, b) ⊂ ri(Ω).
Theorem 2.84 Let Ω be a nonempty convex subset of Rn. Then the sets ri(Ω)
and cl(Ω) are also convex, and we have:
(a) cl(ri(Ω)) = cl(Ω).
(b) ri(cl(Ω)) = ri(Ω).
Proof. Note that the convexity of ri(Ω) follows from Theorem 2.83, while the
convexity of cl(Ω) was proved in Proposition 2.11. To justify (a), observe that
the inclusion cl(ri(Ω)) ⊂ cl(Ω) is obvious. For the reverse inclusion, pick b ∈ cl(Ω)
and a ∈ ri(Ω) and then form the sequence
xk := (1/k)a + (1 − 1/k)b, k ∈ N,
which converges to b as k → ∞. Since xk ∈ ri(Ω) by Theorem 2.83, we have
b ∈ cl(ri(Ω)). Thus cl(Ω) ⊂ cl(ri(Ω)), which verifies the first assertion of the theorem.
To verify (b), we need to show that ri(cl(Ω)) ⊂ ri(Ω). Pick x ∈ ri(cl(Ω)) and
x̄ ∈ ri(Ω). It follows from Proposition 2.73 that z := x + t(x − x̄) ∈ cl(Ω) if t > 0
is small, and so we get x = z/(1 + t) + tx̄/(1 + t) ∈ (z, x̄) ⊂ ri(Ω).
Proof. Pick y ∈ B(ri(Ω)) and find x ∈ ri(Ω) such that y = B(x). Then
take vectors ỹ ∈ ri(B(Ω)) ⊂ B(Ω) and x̃ ∈ Ω with ỹ = B(x̃). If x = x̃, then
y = ỹ ∈ ri(B(Ω)). Consider the case where x ≠ x̃. We can find x̂ ∈ Ω such that
x ∈ (x̂, x̃) and define ŷ := B(x̂) ∈ B(Ω). Thus y = B(x) ∈ (B(x̂), B(x̃)) =
(ŷ, ỹ), and so we get y ∈ ri(B(Ω)). To complete the proof, it remains to show
that cl(B(Ω)) = cl(B(ri(Ω))) and then obtain by using Corollary 2.85 the inclusion
ri(B(Ω)) = ri(B(ri(Ω))) ⊂ B(ri(Ω)).
Since B(ri(Ω)) ⊂ B(Ω), the continuity of B and Theorem 2.84 imply that
B(Ω) ⊂ B(cl(Ω)) = B(cl(ri(Ω))) ⊂ cl(B(ri(Ω))).
Thus we have cl(B(Ω)) ⊂ cl(B(ri(Ω))), which ends the proof of the theorem.
Proof. To verify that there exists v ∈ Rn such that (2.24) is satisfied, we only
need to apply the separation result of Theorem 2.61 to the singleton {x} and
the closure Ω of the convex set Ω.
This contradicts the condition ‖v‖ = 1 and so justifies the proper separation.
To prove the converse statement of the theorem, assume that Ω and {0}
can be properly separated and thus find v ∈ Rn such that
sup{⟨v, x⟩ | x ∈ Ω} ≤ 0 while ⟨v, x̄⟩ < 0 for some x̄ ∈ Ω.
If on the contrary 0 ∈ ri(Ω), it follows from Proposition 2.73 that
Now we are ready to establish the main separation theorem for convex sets
in finite-dimensional convex analysis.
We can choose 0 < t < 1 so small that u := x̄ + t(x̄ − x) ∈ B(x̄; δ). Since
u ∈ aff(Ω), it follows from (2.26) that u ∈ Ω, and hence we have
x̄ = (t/(1 + t))x + (1/(1 + t))u ∈ (x, u) = ri([u, x]).
To verify the opposite implication, we use the assumption that for any x ∈ Ω
with x ≠ x̄ there exists u ∈ Ω such that x̄ ∈ ri([u, x]). Suppose on the contrary
that x̄ ∉ ri(Ω). Then Theorem 2.92 tells us that the sets {x̄} and Ω can be
properly separated. Choose v ∈ Rn such that ⟨v, x⟩ ≤ ⟨v, x̄⟩ for all x ∈ Ω
and also a vector x0 ∈ Ω satisfying ⟨v, x0⟩ < ⟨v, x̄⟩. Picking u ∈ Ω with
x̄ ∈ ri([x0, u]), we see that {x̄} and [x0, u] can be properly separated. This
yields x̄ ∉ ri([x0, u]), a contradiction, which completes the proof.
Proof. We first verify the fulfillment of the inclusion “⊂”. Consider the pro-
jection mapping P : Rm × Rn → Rm given by
P(x, y) := x for (x, y) ∈ Rm × Rn .
It follows from Theorem 2.86 that
P(ri(gph(F))) = ri(P(gph(F))) = ri(dom(F)). (2.27)
Now, take any (x̄, ȳ) ∈ ri(gph(F)) and get from (2.27) that x̄ ∈ ri(dom(F)).
Since (x̄, ȳ) ∈ ri(gph(F)) ⊂ gph(F), we have ȳ ∈ F(x̄). Fix any y ∈ F(x̄) with
y ≠ ȳ. Then (x̄, y) ∈ gph(F) with (x̄, y) ≠ (x̄, ȳ). By Corollary 2.93, there
exists (u, z) ∈ gph(F) and t ∈ (0, 1) such that
(x̄, ȳ) = t(x̄, y) + (1 − t)(u, z).
Then x̄ = tx̄ + (1 − t)u, which implies (1 − t)x̄ = (1 − t)u and so x̄ = u. In
addition, ȳ = ty + (1 − t)z ∈ (y, z), where z ∈ F(x̄). Using the equivalence in
Corollary 2.93 again yields ȳ ∈ ri(F(x̄)).
To prove the opposite inclusion, fix x̄ ∈ ri(dom(F)) and ȳ ∈ ri(F(x̄)). Suppose
on the contrary that (x̄, ȳ) ∉ ri(gph(F)). Then Theorem 2.92 allows us to find
(u, v) ∈ Rm × Rn such that
⟨u, x⟩ + ⟨v, y⟩ ≤ ⟨u, x̄⟩ + ⟨v, ȳ⟩ whenever y ∈ F(x). (2.28)
Furthermore, there exists (x0 , y0 ) ∈ gph(F ) satisfying
Although the precise definition of a convex function with its domain and
epigraph is given only in Subsection 2.4.1, it is tempting to present right now
a direct consequence of Theorem 2.94 on the representation of relative interiors
for epigraphs of extended-real-valued convex functions in finite dimensions. An
infinite-dimensional version of this result is given below in Subsection 2.4.4.
The next example illustrates some typical cases of extreme points in the
plane; see also Figure 2.11.
Example 2.97 (a) Let Ω be the closed unit ball in R2 equipped with the
standard Euclidean or ℓ2-norm. Then ext(Ω) is the closed unit sphere
given by
ext(Ω) = {u ∈ R2 | ‖u‖ = 1}.
(b) Let Ω be the closed unit ball in R2 equipped with the sum or ℓ1-norm,
i.e.,
Ω := {u = (u1, u2) ∈ R2 | ‖u‖1 ≤ 1}, where ‖u‖1 := |u1| + |u2|.
Then we have the set of extreme points
ext(Ω) = {(0, 1), (0, −1), (−1, 0), (1, 0)}.
(c) Let Ω be the closed unit ball in R2 equipped with the maximum or
ℓ∞-norm, i.e.,
Ω := {u = (u1, u2) ∈ R2 | ‖u‖∞ ≤ 1}, where ‖u‖∞ := max{|u1|, |u2|}.
Then we have the set of extreme points
ext(Ω) = {(1, 1), (1, −1), (−1, 1), (−1, −1)}.
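The extreme points in cases (b) and (c) can be probed numerically: a point e ∈ Ω fails to be extreme exactly when it is the midpoint of a nondegenerate segment contained in Ω. The sketch below (the helper `is_extreme`, the tolerance, and the sampled directions are our illustrative choices; the direction sweep is a heuristic, not a proof) confirms the claims for the ℓ₁ and ℓ∞ balls in R².

```python
import numpy as np

def is_extreme(e, in_set, t=1e-3, n_dirs=360):
    """Heuristic 2D test: e in Omega fails to be extreme iff e is the
    midpoint of a small segment [e - t*d, e + t*d] lying in Omega."""
    for k in range(n_dirs):
        ang = 2 * np.pi * k / n_dirs
        d = np.array([np.cos(ang), np.sin(ang)])
        if in_set(e + t * d) and in_set(e - t * d):
            return False
    return True

l1_ball = lambda z: abs(z[0]) + abs(z[1]) <= 1 + 1e-12
linf_ball = lambda z: max(abs(z[0]), abs(z[1])) <= 1 + 1e-12

assert is_extreme(np.array([1.0, 0.0]), l1_ball)        # vertex of the l1 ball
assert not is_extreme(np.array([0.5, 0.5]), l1_ball)    # edge midpoint
assert is_extreme(np.array([1.0, 1.0]), linf_ball)      # corner of the square
assert not is_extreme(np.array([1.0, 0.0]), linf_ball)  # edge point
```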
Proposition 2.99 Let Ω be a convex set in a vector space X, and let {Fα }α∈I
be a family of faces of Ω such that F := ⋂_{α∈I} Fα is nonempty. Then the set
F is also a face of Ω.
Proof. The convexity of the set F is obvious. Fix now any x, y ∈ Ω and λ ∈
(0, 1) and then suppose that λx+(1−λ)y ∈ F . Then we have λx+(1−λ)y ∈ Fα
for all α ∈ I. Since each Fα is a face of Ω, we get x, y ∈ Fα for all α ∈ I. Thus
x, y ∈ F , which shows that F is a face of Ω.
The next proposition involves the transitivity of the face notion and its
interaction with extreme points.
Proof. Assuming that (2.30) holds, fix any pairs (x, s), (y, t) ∈ epi(f ) and a
number λ ∈ (0, 1). Then we have
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λs + (1 − λ)t,
which immediately implies that
λ(x, s) + (1 − λ)(y, t) = (λx + (1 − λ)y, λs + (1 − λ)t) ∈ epi(f)
and shows that the epigraph epi(f ) is a convex subset of X × R.
Conversely, suppose that f is convex and pick x, y ∈ dom(f ) and λ ∈ (0, 1).
Then (x, f (x)), (y, f (y)) ∈ epi(f ). Definition 2.104 tells us that
(λx + (1 − λ)y, λf(x) + (1 − λ)f(y)) = λ(x, f(x)) + (1 − λ)(y, f(y)) ∈ epi(f)
and therefore ensures the inequality
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
The latter also holds if x ∉ dom(f) or y ∉ dom(f), and so (2.30) is satisfied.
To verify now the equivalence with the extended Jensen inequality, observe
first that (b) implies (a). Thus the only thing we need to show is that the convexity of f implies (b). To proceed, fix xi ∈ X and λi > 0 for i = 1, . . . , m
with ∑_{i=1}^m λi = 1. It suffices to consider the case where xi ∈ dom(f) for
i = 1, . . . , m. Then (xi , f (xi )) ∈ epi(f ) for every i = 1, . . . , m. Using Proposi-
tion 2.6, we conclude that
∑_{i=1}^m λi (xi, f(xi)) = (∑_{i=1}^m λi xi, ∑_{i=1}^m λi f(xi)) ∈ epi(f),
which verifies (2.31) and hence completes the proof of the theorem.
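The extended Jensen inequality (2.31) is easy to test numerically. The sketch below (our illustrative choice of f(x) = eˣ and of random sample points and weights) checks that f(∑ᵢ λᵢxᵢ) ≤ ∑ᵢ λᵢf(xᵢ) on many random convex combinations.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.exp                                # a convex function on R

for _ in range(1000):
    x = rng.normal(size=5)                # sample points x_1, ..., x_5
    lam = rng.random(5)
    lam /= lam.sum()                      # convex weights summing to 1
    # Jensen: f(sum_i lam_i x_i) <= sum_i lam_i f(x_i)
    assert f(lam @ x) <= lam @ f(x) + 1e-12
```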
The next two examples describe classes of functions associated with given
sets that are highly important in convex analysis and its extensions.
Proof. Fix a < b with a, b ∈ I and assume that the function f is convex. Then
we get from Lemma 2.115 that
(f(x) − f(a))/(x − a) ≤ (f(b) − f(a))/(b − a) for every x ∈ (a, b).
This implies by the derivative definition that
f′(a) ≤ (f(b) − f(a))/(b − a).
Similarly we arrive at the estimate
(f(b) − f(a))/(b − a) ≤ f′(b)
and conclude that f′(a) ≤ f′(b), i.e., f′ is a nondecreasing function.
To prove the converse implication, suppose that f′ is nondecreasing on I
and fix x1 < x2 with x1, x2 ∈ I and t ∈ (0, 1). Then
x1 < xt < x2 for xt := tx1 + (1 − t)x2.
Using the classical mean value theorem gives us numbers c1 , c2 with x1 <
c1 < xt < c2 < x2 such that we have the equalities
f(xt) − f(x2) = f′(c2)(xt − x2) = f′(c2)t(x1 − x2) and
f(xt) − f(x1) = f′(c1)(xt − x1) = f′(c1)(1 − t)(x2 − x1),
which can be equivalently rewritten as
tf(xt) − tf(x1) = f′(c1)t(1 − t)(x2 − x1) and
(1 − t)f(xt) − (1 − t)f(x2) = f′(c2)t(1 − t)(x1 − x2).
Summing up these equalities and using f′(c1) ≤ f′(c2) give us the estimate
f (xt ) ≤ tf (x1 ) + (1 − t)f (x2 ),
and thus justifies the convexity of the function f .
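The criterion can be observed numerically: for a smooth convex function the (numerically differentiated) derivative is nondecreasing on the interval. The grid and the choice f(x) = x ln(x) below are ours, purely for illustration.

```python
import numpy as np

# Convexity criterion: smooth f is convex on an interval iff f' is nondecreasing.
xs = np.linspace(0.1, 4.0, 400)
f = lambda x: x * np.log(x)           # convex on (0, inf); f'(x) = ln(x) + 1
fp = np.gradient(f(xs), xs)           # numerical derivative on the grid
assert np.all(np.diff(fp) >= -1e-9)   # f' is (numerically) nondecreasing
```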
Example 2.119 Each of the functions below is convex on the given domain:
(a) f (x) := eax on R, where a ∈ R.
(b) f (x) := xq on [0, ∞), where q ≥ 1 is a constant.
(c) f (x) := − ln(x) on (0, ∞).
(d) f (x) := x ln(x) on (0, ∞).
(e) f (x) := 1/x on (0, ∞).
(f) f(x1, x2) := x1^{2n} + x2^{2n} on R2, where n ∈ N.
(g) f(x) := ⟨Ax, x⟩ + ⟨b, x⟩ + c on Rn, where A is a positive-semidefinite
matrix, b ∈ Rn, and c ∈ R.
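The one-dimensional items of this list can be spot-checked with a simple midpoint test f((x+y)/2) ≤ (f(x)+f(y))/2 on a grid; the helper name, grid, and tolerance below are our illustrative choices, and the sweep is a numerical check rather than a proof. As a sanity check, the concave function √x fails the same test.

```python
import numpy as np

def midpoint_convex(f, xs, tol=1e-9):
    """Check f((x+y)/2) <= (f(x) + f(y))/2 on all pairs from the sample xs."""
    return all(f((x + y) / 2) <= (f(x) + f(y)) / 2 + tol
               for x in xs for y in xs)

xs = np.linspace(0.1, 5.0, 50)
assert midpoint_convex(np.exp, xs)                   # (a) with a = 1
assert midpoint_convex(lambda x: x**2, xs)           # (b) with q = 2
assert midpoint_convex(lambda x: -np.log(x), xs)     # (c)
assert midpoint_convex(lambda x: x * np.log(x), xs)  # (d)
assert midpoint_convex(lambda x: 1 / x, xs)          # (e)
assert not midpoint_convex(np.sqrt, xs)              # concave, so the test fails
```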
Proposition 2.120 For every a, b ≥ 0 and 0 < θ < 1 we have the inequality
a^θ b^{1−θ} ≤ θa + (1 − θ)b. (2.36)
Proof. It suffices to consider the case where a > 0 and b > 0. It follows from
the convexity of the function f (x) := − ln(x) on (0, ∞) that
− ln(θa + (1 − θ)b) ≤ −θ ln(a) − (1 − θ) ln(b),
which implies in turn that
ln(a^θ b^{1−θ}) ≤ ln(θa + (1 − θ)b).
Then (2.36) is satisfied since the function ln is increasing.
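A random spot-check of the weighted arithmetic-geometric mean inequality (2.36); the sampling ranges are arbitrary illustrative choices.

```python
import numpy as np

# random spot-check of the weighted AM-GM inequality (2.36)
rng = np.random.default_rng(1)
for _ in range(1000):
    a, b = rng.uniform(0.0, 10.0, size=2)
    theta = rng.uniform(0.0, 1.0)
    assert a**theta * b**(1 - theta) <= theta * a + (1 - theta) * b + 1e-9
```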
Proof. If either (∫_a^b |f|^p dγ)^{1/p} = 0 or (∫_a^b |g|^q dγ)^{1/q} = 0, then f = 0 a.e. or
g = 0 a.e., respectively. Thus inequality (2.38) is satisfied in this case because
its left-hand side is zero. Consider now the case where 0 < (∫_a^b |f|^p dγ)^{1/p} < ∞
and 0 < (∫_a^b |g|^q dγ)^{1/q} < ∞. For each x ∈ [a, b] we define the numbers
a := |f(x)|^p / ∫_a^b |f|^p dγ,   b := |g(x)|^q / ∫_a^b |g|^q dγ,
and then let θ := 1/p. It follows from (2.36) that
|f(x)g(x)| / ((∫_a^b |f|^p dγ)^{1/p} (∫_a^b |g|^q dγ)^{1/q}) ≤ |f(x)|^p / (p ∫_a^b |f|^p dγ) + |g(x)|^q / (q ∫_a^b |g|^q dγ).
Integrating both sides of this inequality, we arrive at (2.38).
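Hölder's inequality can be checked numerically by replacing the integrals with Riemann sums (the discrete Hölder inequality then holds exactly). The grid and the sample functions f, g below are our arbitrary illustrative choices.

```python
import numpy as np

# spot-check of Holder's inequality with Riemann sums in place of integrals
p = 3.0
q = p / (p - 1.0)                 # conjugate exponent: 1/p + 1/q = 1
x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]
f = np.sin(7 * x) + 0.3
g = np.cos(3 * x) ** 2

lhs = np.sum(np.abs(f * g)) * dx
rhs = ((np.sum(np.abs(f) ** p) * dx) ** (1 / p)
       * (np.sum(np.abs(g) ** q) * dx) ** (1 / q))
assert lhs <= rhs + 1e-9
```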
The next classical result is known as Young’s inequality.
Proposition 2.123 Let p, q > 0 be numbers such that 1/p + 1/q = 1. Then
we have the estimate
|xy| ≤ |x|^p/p + |y|^q/q whenever x, y ∈ R.
Proof. It suffices to apply (2.36) with a := |x|p , b := |y|q , and θ = 1/p.
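A direct numerical spot-check of Young's inequality for a conjugate pair of exponents (the sample pairs are arbitrary):

```python
# Young's inequality |xy| <= |x|**p/p + |y|**q/q for conjugate p, q
p, q = 3.0, 1.5          # 1/3 + 2/3 = 1
for x, y in [(2.0, 3.0), (-1.5, 0.5), (4.0, -2.0), (0.0, 9.9)]:
    assert abs(x * y) <= abs(x)**p / p + abs(y)**q / q + 1e-12
```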
Now we come back to the general setting of vector spaces and note first that
the convexity of functions is obviously a unilateral notion, meaning that −f
may not be convex for a convex function f as, e.g., for f (x) = |x| on R.
Furthermore, convexity is not preserved under some simple operations even
over linear functions such as taking the minimum; see, e.g., min{x, −x} =
−|x|. However, many operations particularly important in convex analysis
and applications preserve convexity. We discuss them in this subsection.
2.4 Convexity of Functions 129
The next result deals with the supremum of convex functions over an
arbitrary index set. It largely extends the statement of Proposition 2.124(c).
Proposition 2.127 Let X be a vector space, and let fi : X → R for i ∈ I
be a collection of convex functions with a nonempty index set I. Then the
supremum function f (x) := supi∈I fi (x) is convex on X.
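A classical instance of Proposition 2.127 is the pointwise maximum of finitely many affine functions, which is piecewise linear and convex. The coefficients and the midpoint check below are our illustrative choices.

```python
import numpy as np

# the pointwise supremum of (affine, hence convex) functions is convex:
# f(x) = max_i (a_i * x + b_i) is a piecewise-linear convex function
a = np.array([-2.0, 0.5, 3.0])
b = np.array([1.0, 0.0, -2.0])
f = lambda x: np.max(a * x + b)

xs = np.linspace(-3.0, 3.0, 61)
assert all(f((u + v) / 2) <= (f(u) + f(v)) / 2 + 1e-12 for u in xs for v in xs)
```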
In this section we assume that μ(x) > −∞ for every x ∈ X and also use the
convention that inf(∅) := ∞ in this definition and throughout the book.
The following theorem shows that convexity is preserved in the general
settings of marginal functions.
Proof. Pick x1 , x2 ∈ dom(μ), λ ∈ (0, 1), and ε > 0. Then find yi ∈ F (xi ) with
ϕ(xi , yi ) < μ(xi ) + ε, i = 1, 2.
It directly implies the inequalities
λϕ(x1 , y1 ) < λμ(x1 ) + λε, (1 − λ)ϕ(x2 , y2 ) < (1 − λ)μ(x2 ) + (1 − λ)ε.
Summing up these inequalities and employing the convexity of ϕ yield
ϕ(λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) ≤ λϕ(x1, y1) + (1 − λ)ϕ(x2, y2)
< λμ(x1) + (1 − λ)μ(x2) + ε.
Furthermore, the convexity of gph(F ) gives us
(λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) = λ(x1, y1) + (1 − λ)(x2, y2) ∈ gph(F),
and therefore λy1 + (1 − λ)y2 ∈ F (λx1 + (1 − λ)x2 ). This implies that
μ(λx1 + (1 − λ)x2) ≤ ϕ(λx1 + (1 − λ)x2, λy1 + (1 − λ)y2)
< λμ(x1) + (1 − λ)μ(x2) + ε.
Letting now ε ↓ 0 ensures the convexity of the optimal value function μ.
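The theorem can be illustrated with a concrete marginal function. Below we take the convex cost φ(x, y) = x² + y² and the mapping F(x) = [x, ∞), whose graph {(x, y) | y ≥ x} is convex; discretizing the infimum over a grid (grid sizes and tolerance are our choices) exhibits the resulting μ as convex. In this example μ(x) = x² for x ≤ 0 and μ(x) = 2x² for x > 0.

```python
import numpy as np

# optimal value function mu(x) = inf{ phi(x, y) : y in F(x) } with the convex
# cost phi(x, y) = x**2 + y**2 and the convex-graph mapping F(x) = [x, inf)
ygrid = np.linspace(-10.0, 10.0, 4001)

def mu(x):
    feas = ygrid[ygrid >= x]          # discretized feasible set F(x)
    return np.min(x**2 + feas**2)

xs = np.linspace(-3.0, 3.0, 25)
vals = [mu(x) for x in xs]
for i in range(len(xs)):
    for j in range(len(xs)):
        m = 0.5 * (xs[i] + xs[j])
        assert mu(m) <= 0.5 * (vals[i] + vals[j]) + 1e-3   # midpoint convexity
```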
Let us now define a property that provides a sufficient condition for the
equality epi(fG ) = G established in the next proposition.
It follows from the definition of fG that fG(x) ≥ (f1 □ ··· □ fm)(x). To verify
next that fG(x) ≤ (f1 □ ··· □ fm)(x), take any xi ∈ X for i = 1, . . . , m with
x = ∑_{i=1}^m xi. If xi ∈ dom(fi) for all i = 1, . . . , m, then (x, ∑_{i=1}^m fi(xi)) =
∑_{i=1}^m (xi, fi(xi)) ∈ G, and so fG(x) ≤ ∑_{i=1}^m fi(xi). This inequality also holds
if xi ∉ dom(fi) for some i = 1, . . . , m. Employing (2.40) and (2.41) tells us
that fG(x) ≤ (f1 □ ··· □ fm)(x). Hence it follows from Corollary 2.134 that fG
is convex, which verifies the convexity of the infimal convolution (2.41).
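On the real line the infimal convolution can be approximated by taking the infimum over a grid. The sketch below (grid choice ours) recovers the well-known fact that the infimal convolution of |·| with (·)²/2 is the Huber function, which is convex as the proposition asserts.

```python
import numpy as np

# discrete infimal convolution (f1 [] f2)(x) = inf_y { f1(y) + f2(x - y) }
ys = np.linspace(-5.0, 5.0, 2001)
f1 = np.abs                    # f1(y) = |y|
f2 = lambda t: t**2 / 2        # f2(t) = t^2/2
conv = lambda x: np.min(f1(ys) + f2(x - ys))

# the result is the Huber function: x^2/2 for |x| <= 1, |x| - 1/2 otherwise
assert abs(conv(0.5) - 0.125) < 1e-3
assert abs(conv(3.0) - 2.5) < 1e-3
```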
Proof. Fix x ∈ X and define the following subsets of the real line:
A := {t ∈ R | (x, t) ∈ co(epi(f))} and
B := {∑_{i=1}^m λi f(xi) | λi ≥ 0, ∑_{i=1}^m λi = 1, xi ∈ X, ∑_{i=1}^m λi xi = x, m ∈ N}.
In this way we arrive at inf(A) ≥ inf(B) and thus complete the proof.
Now we define the closure of a function on a topological vector space and
study some of its properties prior to the subsequent convexification.
Suppose now that g(x) ≤ f(x) on X and that the set epi(g) is closed. Then
we have epi(f) ⊂ epi(g), and so cl(epi(f)) ⊂ epi(g). It follows therefore that
g(x) = f_{epi(g)}(x) ≤ f_{cl(epi(f))}(x) = f̄(x) for all x ∈ X.
Since cl(epi(f)) ⊂ epi(f̄), we only need to check the inclusion epi(f̄) ⊂ cl(epi(f)).
Fix any (x, γ) ∈ epi(f̄) and get f̄(x) ≤ γ. Then Proposition 2.130 implies that
(x, f̄(x)) ∈ cl(epi(f)), and thus (x, γ) ∈ cl(epi(f)).
Our next topic in this section is the continuity of convex functions defined
on topological vector spaces. We say that an extended-real-valued function
f : X → R defined on a topological vector space X is continuous at x̄ ∈ X if
for any ε > 0 there exists a neighborhood U ⊂ dom(f) of x̄ such that
|f(x) − f(x̄)| < ε whenever x ∈ U.
Observe from the definition that if f is continuous at x̄, then x̄ ∈ int(dom(f)).
First we show that the continuity of a convex function follows from its
local boundedness from above.
0 = f(0) ≤ (ε/(1 + ε)) f(−x/ε) + (1/(1 + ε)) f(x) ≤ (ε/(1 + ε)) c + (1/(1 + ε)) f(x),
which yields −cε ≤ f (x). This shows therefore that condition (2.43) is satis-
fied, and hence the function f is continuous at the origin.
If x̄ ≠ 0, let U := V − x̄ and g(x) := f(x + x̄) − f(x̄). Then g is bounded
from above on the neighborhood U of the origin with g(0) = 0. Thus g is
continuous at 0, which implies the continuity of f at x̄.
The next theorem provides characterizations of continuity of convex func-
tions on general topological vector spaces.
Theorem 2.144 Let X be a topological vector space, and let f : X → R be a
convex function. The following properties are equivalent:
(a) f is continuous at some point x ∈ X.
(b) f is bounded from above on a nonempty open subset of X.
(c) int(epi(f)) ≠ ∅.
(d) int(dom(f)) ≠ ∅ and f is continuous on int(dom(f)).
Proof. We split the proof into several steps.
Proof. Denote η := f(x̄) and take M > 0 with f(x) ≤ M for all x ∈ B(x̄; δ).
Picking any u ∈ B(x̄; δ), consider the point x := 2x̄ − u ∈ B(x̄; δ), for which
η = f(x̄) = f((x + u)/2) ≤ (1/2) f(x) + (1/2) f(u).
It yields f(u) ≥ 2η − f(x) ≥ 2η − M, and thus f is bounded on B(x̄; δ).
The next theorem shows that for convex functions on normed spaces the
local boundedness yields their local Lipschitz continuity.
x = x̄ + (ε/n) u = x̄ + (ε/n) ∑_{i=1}^n λi ei = x̄ + (1/n) ∑_{i=1}^n ελi ei.
Denoting finally γi := ελi implies that |γi| ≤ ε. It follows from (a) that
x̄ + ελi ei = x̄ + γi ei ∈ co(A), and thus x ∈ co(A) since this point is defined
as a convex combination of some elements of co(A).
The obtained lemma leads us to yet another striking consequence of The-
orem 2.144 in the case of finite-dimensional spaces.
Corollary 2.152 Any extended-real-valued convex function f : Rn → R is
locally Lipschitz continuous on the interior of its domain int(dom(f )).
Proof. Let {ei | i = 1, . . . , n} be the standard orthonormal basis of Rn . Pick
x ∈ int(domf ) and choose ε > 0 such that x ± εei ∈ dom(f ) for every i =
1, . . . , n. Considering the set A from Lemma 2.151, we get B(x; ε/n) ⊂ co(A)
by the second assertion of the lemma. Denote M := max{f (a) | a ∈ A} < ∞
over the finite set A. Using the representation
x = ∑_{i=1}^m λi ai with λi ≥ 0, ∑_{i=1}^m λi = 1, ai ∈ A,
valid for every x ∈ B(x̄; ε/n) ⊂ co(A), we get from Jensen's inequality that
f(x) ≤ ∑_{i=1}^m λi f(ai) ≤ M, and so f is bounded from above on B(x̄; ε/n).
Then Theorem 2.144 tells us that f is locally Lipschitz continuous on int(dom(f)).
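The geometric heart of this proof is the inclusion B(x̄; ε/n) ⊂ co(A) from Lemma 2.151. A numerical sketch of that inclusion (an illustration only; the values n = 5, ε = 0.3 and the centering x̄ = 0 are assumptions, under which co(A) for A = {±ε ei} is the cross-polytope {y : ∑_i |y_i| ≤ ε}):

```python
# Sketch check: if ||x||_2 <= eps/n, then sum_i |x_i| <= eps, so x lies in the
# cross-polytope co({±eps * e_i}). (Cauchy-Schwarz even gives the bound eps/sqrt(n).)
import math
import random

n, eps = 5, 0.3
random.seed(0)
for _ in range(1000):
    v = [random.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    r = random.random() * eps / n          # a radius inside B(0; eps/n)
    x = [t * r / norm for t in v]          # random point of the ball
    assert sum(abs(t) for t in x) <= eps + 1e-12
```

This is exactly why boundedness of f on the finite set A propagates to the whole ball B(x̄; ε/n).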
142 2 BASIC THEORY OF CONVEXITY
Proof. Suppose that f is l.s.c. at x̄. Since λ := f(x̄) − ε < f(x̄), there exists
a neighborhood V of x̄ with
f(x̄) − ε = λ < f(x) for all x ∈ V.
Conversely, suppose that for any ε > 0 there is a neighborhood V of x̄ such
that (2.46) holds. Fix λ < f(x̄) and choose ε > 0 with λ < f(x̄) − ε. Then
λ < f(x̄) − ε < f(x) whenever x ∈ V
on some neighborhood V of x̄, and so we complete the proof.
Θ := {x ∈ X | f(x) ≤ α}.
Since Θ is closed and since x̄ ∉ Θ, there exists a neighborhood V of x̄ satisfying
the inclusion V ⊂ Θᶜ. This tells us that
f(x) > α for all x ∈ V,
which justifies the lower semicontinuity of f at x and completes the proof of
the proposition.
a contradiction, which thus completes the proof. Observe from the proof that
the implication =⇒ above holds whenever X is a topological space.
The next lemma is needed to derive the main result of this subsection.
Lemma 2.162 Let X be a topological space, and let {Ωk }k∈N be a sequence
of nonempty closed compact subsets of X such that Ωk+1 ⊂ Ωk for all k ∈ N.
Then
⋂_{k=1}^∞ Ωk ≠ ∅.
Proof. Suppose on the contrary that ⋂_{k=1}^∞ Ωk = ∅. Then
Ω1 = Ω1 \ ⋂_{k=1}^∞ Ωk ⊂ X \ ⋂_{k=1}^∞ Ωk = ⋃_{k=1}^∞ (X \ Ωk) = ⋃_{k=1}^∞ Ωkᶜ.
The collection of open sets {Ωkc }k∈N clearly covers Ω1 , which is a compact
subset of X. Thus there exist k1 < k2 < . . . < km such that
Ω1 ⊂ ⋃_{i=1}^m Ωkiᶜ = X \ ⋂_{i=1}^m Ωki = X \ Ωkm,
which is impossible since ∅ ≠ Ωkm ⊂ Ω1.
Proof. Choose a strictly decreasing sequence {αk } of real numbers for which
inf{f (x) | x ∈ X} < αk < α whenever k ∈ N and limk→∞ αk = inf{f (x) | x ∈
X}. Define the sets
Ωk := {x ∈ X | f(x) ≤ αk}, k ∈ N,
and note that each Ωk is compact as a closed subset of a compact set.
Thus {Ωk }k∈N satisfies the assumptions of Lemma 2.162, which tells us that
⋂_{k=1}^∞ Ωk ≠ ∅. Picking further any x̄ ∈ ⋂_{k=1}^∞ Ωk, we get that
Proof. Equip the space X with the weak topology. Since the sublevel set Lα
is a closed and bounded convex set, it is weakly compact in X. We know
from Proposition 2.161 that any convex l.s.c. function is weakly l.s.c. Then
the conclusion of this corollary follows directly from Theorem 2.164.
The function f is obviously l.s.c. and convex as the sum of two l.s.c. convex
functions. Fixing α > inf w∈X f (w) = d(x0 ; Ω), we get
Ωα = {w ∈ X | f(w) ≤ α} = {w ∈ Ω | f(w) ≤ α} = {w ∈ Ω | ‖x0 − w‖ ≤ α} ⊂ Ω ∩ B(0; r),
where r := ‖x0‖ + α. This tells us that the set Ωα is bounded. Invoking now
Theorem 2.165, we find w0 ∈ X such that
f(w0) = inf_{w∈X} f(w) = d(x0; Ω).
Proof. To verify the "if" part, observe that relatively absorbing points x̄ of Ω
can be equivalently described as follows: for any x ∈ Ω \ {x̄} there exists α > 1
such that (1 − α)x + αx̄ ∈ Ω. Pick now any nonzero vector v ∈ cone(Ω − x̄)
and find λ > 0 with λv + x̄ ∈ Ω. The relative absorbing property of x̄ gives
us a number α > 1 such that
(1 − α)(λv + x̄) + αx̄ = λ(α − 1)(−v) + x̄ ∈ Ω.
This yields −v ∈ cone(Ω − x̄), and so cone(Ω − x̄) is a linear subspace of X.
To justify the converse implication, take any x̄ ∈ iri(Ω) and x ∈ Ω with
x ≠ x̄. Since cone(Ω − x̄) is a linear subspace of X,
x − x̄ ∈ cone(Ω − x̄) and x̄ − x ∈ cone(Ω − x̄).
Choose t > 0 such that x̄ − x = t(w − x̄) with w ∈ Ω and denote α := 1 + 1/t.
Since α > 1, this tells us that
(α − 1)(x̄ − x) + x̄ = (1 − α)x + αx̄ = w ∈ Ω,
which means that x̄ is a relatively absorbing point of Ω.
Proof. Suppose first that x̄ ∈ qri(Ω). It follows from the definition that the set
Θ̄, where Θ := cone(Ω − x̄), is a linear subspace of X. An easy exercise shows
that Θ° is also a linear subspace of X∗. Then Proposition 2.172 tells us that
N(x̄; Ω) = Θ° is a linear subspace of X∗. In the other direction, suppose that
N(x̄; Ω) is a linear subspace of X∗. Then we deduce from Proposition 2.171
and Proposition 2.172 that
N(x̄; Ω)° = (Θ°)° = Θ̄.
Since N(x̄; Ω) is a linear subspace of X∗, the set N(x̄; Ω)° is also a linear
subspace of X. Thus Θ̄ is a linear subspace of X, and therefore x̄ ∈ qri(Ω).
Proof. We first show that ri(Ω) ⊂ iri(Ω). Take x̄ ∈ ri(Ω) and fix x ∈ Ω with
x ≠ x̄. It follows from (2.49) that x̄ ∈ Ω and there exists a neighborhood U
of x̄ for which we have the inclusion
U ∩ aff(Ω) ⊂ Ω. (2.57)
Choose 0 < t < 1 so small that u := x̄ + t(x̄ − x) ∈ U. Then u ∈ aff(Ω), and
we get from (2.57) that u ∈ Ω. It follows that
x̄ = (t/(1 + t)) x + (1/(1 + t)) u ∈ (x, u),
which therefore verifies by Proposition 2.169 that x̄ ∈ iri(Ω). This tells us
that ri(Ω) ⊂ iri(Ω). The other inclusion iri(Ω) ⊂ qri(Ω) in (2.55) is trivial,
since the subspace property of cone(Ω − x̄) clearly implies that the closure
cl(cone(Ω − x̄)) is also a linear subspace of X.
To prove the equalities in (2.56), it is sufficient to show that if ri(Ω) ≠ ∅
and x̄ ∈ qri(Ω), then x̄ ∈ ri(Ω). Suppose on the contrary that x̄ ∉ ri(Ω)
and begin with the case where x̄ = 0. If 0 ∉ Ω̄ in this case, then the strict
separation theorem yields the existence of x∗ ∈ X∗ such that
⟨x∗, x⟩ < 0 for all x ∈ Ω. (2.58)
In the complementary setting where 0 ∈ Ω̄ \ ri(Ω), denote X0 := cl(aff(Ω))
and get 0 ∈ X0, telling us that X0 is a closed linear subspace of X. It is easy
to see that 0 ∉ ri(Ω) = int_{X0}(Ω), where int_{X0}(Ω) is the interior of Ω with respect
to the space X0. Applying the separation result from Corollary 2.59 to the
sets Ω and {0} in the topological space X0, we find x∗₀ ∈ X0∗ ensuring that
⟨x∗₀, x⟩ ≤ 0 for all x ∈ Ω, and ⟨x∗₀, u⟩ < 0 for some u ∈ Ω. (2.59)
Then the Hahn-Banach extension theorem from Theorem 2.37 shows that
there exists an extension x∗ ∈ X ∗ of x∗0 such that
⟨x∗, x⟩ ≤ 0 for all x ∈ Ω.
In either case there exists x∗ ∈ X∗ such that ⟨x∗, x⟩ ≤ 0 for all x ∈ Ω and
hence for all x ∈ cone(Ω). Since 0 ∈ qri(Ω), we have that cl(cone(Ω)) is a
linear subspace, and therefore
2.5 Extended Relative Interiors in Infinite Dimensions 153
It is well known that the condition ri(Ω) ≠ ∅ ensures that the equalities
in Theorem 2.174 always hold for convex sets in finite dimensions. However, it
is not the case in many important infinite-dimensional settings. In particular,
it has been well recognized that the natural ordering/positive cones in the
standard Lebesgue spaces of sequences ℓp and functions Lp[0, 1] for any p ∈
[1, ∞) have empty relative interiors. Thus the usage of (2.49) significantly
restricts the spectrum of applications of infinite-dimensional convex analysis
in various optimization problems (particularly of its vector-valued and set-
valued aspects), equilibria, economic modeling, etc.; see Section 2.7 for more
discussions.
As we see below, in the case where ri(Ω) = ∅ the inclusions in (2.55) may
be strict in the simplest infinite-dimensional Hilbert space of sequences ℓ²,
with both sets iri(Ω) and qri(Ω) being nonempty.
It follows from (2.62) that z ∈ N(x̄; Ω), which is a linear subspace of X. This
tells us that −z ∈ N(x̄; Ω), and so ⟨−z, 0 − x̄⟩ ≤ 0, which contradicts the
conditions ⟨−z, 0 − x̄⟩ = ⟨z, x̄⟩ = 1. The obtained contradiction shows that x̄
belongs to the set on the right-hand side of (2.60).
To proceed further, fix any x̄ from the set on the right-hand side of (2.60)
and suppose on the contrary that x̄ ∉ qri(Ω). Then N(x̄; Ω) is not a linear
subspace of X. Thus we can find z ≠ 0 such that z ∈ N(x̄; Ω). It follows from
(2.62) that ⟨x̄, z⟩ = ‖z‖∞ ≠ 0, which yields
‖z‖∞ = ⟨x̄, z⟩ = ∑_{k=1}^∞ x̄k zk ≤ ∑_{k=1}^∞ |x̄k| · |zk| ≤ ‖z‖∞ ∑_{k=1}^∞ |x̄k| = ‖z‖∞ ‖x̄‖1 ≤ ‖z‖∞.
This implies that ‖x̄‖1 = 1 and |zk| = ‖z‖∞ > 0 whenever x̄k ≠ 0. Since
z ∈ ℓ², we see that there exists k0 ∈ N such that x̄k = 0 for all k ≥ k0, a
contradiction that gives us x̄ ∈ qri(Ω) and completes the proof of (2.60).
The next example demonstrates that the intrinsic relative interior may be
empty for convex subsets of ℓ².
Example 2.176 Let X := ℓ², and let Ω ⊂ X be given by
Ω := {x = (x1, x2, . . .) ∈ X | ‖x‖2 := (∑_{k=1}^∞ |xk|²)^{1/2} ≤ 1, xk ≥ 0 for all k ∈ N}.
We are going to show that iri(Ω) = ∅ for this set. Assume on the contrary
that there exists x ∈ iri(Ω). Following the arguments in Example 2.175, we
have ‖x̄‖ < 1. To show now that x̄k > 0 for all k ∈ N, suppose, e.g., that
x̄1 = 0 and then let x := (1, 0, 0, . . .) ∈ Ω. Since x̄ ∈ iri(Ω), there exists z ∈ Ω
such that x̄ = t x + (1 − t)z for some t ∈ (0, 1), which readily implies that
z1 = −t/(1 − t) < 0. This contradiction shows that x̄k > 0 for all k ∈ N.
Proposition 2.169 tells us that whenever x̃ ∈ Ω we have (1 − α)x̃ + αx̄ ∈ Ω
for some α > 1. Fix ε > 0 and select an increasing sequence of natural numbers
{k_n} with 0 < x̄_{k_n} ≤ ε/4ⁿ. Defining x̃ ∈ Ω by x̃_{k_n} := ε/2ⁿ and x̃_k := 0 for
all other k ∈ N, let us check that (1 − α)x̃ + αx̄ ∉ Ω whenever α > 1. Indeed,
we have the estimate
((1 − α)x̃ + αx̄)_{k_n} ≤ (1 − α) ε/2ⁿ + α ε/4ⁿ < 0
for n sufficiently large, which justifies the claim of this example.
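The final estimate here reduces to elementary arithmetic: (1 − α)ε/2ⁿ + αε/4ⁿ = ε[(1 − α)2ⁿ + α]/4ⁿ, which is negative as soon as 2ⁿ > α/(α − 1). A quick script confirming this threshold (the sample values of α and ε are illustrative only):

```python
# Arithmetic check of the estimate above: for any fixed alpha > 1 and eps > 0,
# (1 - alpha) * eps / 2**n + alpha * eps / 4**n < 0 once 2**n > alpha / (alpha - 1).
eps = 1.0
for alpha in [1.01, 1.5, 2.0, 10.0]:
    threshold = alpha / (alpha - 1.0)
    n = 0
    while 2 ** n <= threshold:   # smallest n with 2**n > threshold
        n += 1
    value = (1 - alpha) * eps / 2 ** n + alpha * eps / 4 ** n
    assert value < 0, (alpha, n, value)
```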
It can be directly checked that for the set Ω from Example 2.176 we
have qri(Ω) ≠ ∅. In fact, this remarkable property holds for any nonempty,
closed, and convex subset of every separable Banach space, and even in more
generality; see Theorem 2.178 below. To prove this major theorem, we first
present the following useful proposition, which provides a characterization of
qri(Ω) via nonsupport points of Ω. Recall that x ∈ Ω is a nonsupport point
of this set if any closed supporting hyperplane to Ω at x contains Ω.
On the other hand, it follows from ⟨x∗, −v⟩ < 0 and v ∈ cone(Ω − x̄) that
there exist x ∈ Ω and λ ≥ 0 satisfying ⟨x∗, −λ(x − x̄)⟩ < 0. The latter implies
that ⟨x∗, x − x̄⟩ > 0, which is a contradiction telling us that x̄ ∈ qri(Ω) and
thus completing the proof of the proposition.
Proof. For brevity we present the proof for the case where Ω is closed and
bounded. The reader can consult [47, Theorem 2.19] for a more general version
of this theorem in separable Fréchet spaces, where the boundedness assumption
on Ω is dropped and the set in question is CS-closed.
To prove the theorem under the imposed assumptions, we use the sepa-
rability of X and select an arbitrary sequence {xk } ⊂ Ω that is dense in Ω.
Consider further the vector
x̄ := ∑_{k=1}^∞ (1/2ᵏ) xk,
where γ := (1 + λα)/(1 + α) ∈ (0, 1), and thus x̄ ∈ (y, ȳ). Applying Proposition 2.169
shows that x̄ ∈ iri(Ω).
To prove the final assertion (c), fix x̄ ∈ qri(Ω) and x̃ ∈ Ω, and then let
ȳ := λx̄ + (1 − λ)x̃ with λ ∈ (0, 1]. Using Theorem 2.173, we intend to show
that N(ȳ; Ω) is a linear subspace. Indeed, fix any x∗ ∈ N(ȳ; Ω) and then get
⟨x∗, x − ȳ⟩ ≤ 0 for all x ∈ Ω. (2.64)
Plugging x = x̄ into (2.64) gives us ⟨x∗, x̄ − x̃⟩ ≤ 0, while plugging x = x̃ into
(2.64) yields ⟨x∗, x̃ − x̄⟩ ≤ 0. Unifying the above, we arrive at the equality
⟨x∗, x̄⟩ = ⟨x∗, x̃⟩. Furthermore, it follows from (2.64) that
⟨x∗, x − ȳ⟩ = ⟨x∗, x − x̃⟩ − λ⟨x∗, x̄ − x̃⟩ = ⟨x∗, x − x̄⟩ ≤ 0 for all x ∈ Ω,
which verifies that x∗ ∈ N(x̄; Ω). Since x̄ ∈ qri(Ω), Theorem 2.173 brings us
to −x∗ ∈ N(x̄; Ω). Taking into account that ⟨x∗, x̄⟩ = ⟨x∗, x̃⟩, we have
⟨−x∗, x − ȳ⟩ = λ⟨−x∗, x − x̄⟩ + (1 − λ)⟨−x∗, x − x̃⟩ ≤ 0 whenever x ∈ Ω,
and hence −x∗ ∈ N(ȳ; Ω). This tells us that N(ȳ; Ω) is a linear subspace, and
the result follows from Theorem 2.173.
Proof. The conclusion follows directly from Proposition 2.177. Here we provide
an alternative proof by employing the normal cone characterization of
the quasi-relative interior. Indeed, by (2.54) we get that x̄ ∉ qri(Ω) if and
only if there exists x∗ ∈ N(x̄; Ω) with −x∗ ∉ N(x̄; Ω). It follows from the
normal cone construction (2.50) for convex sets that ⟨x∗, x⟩ ≤ ⟨x∗, x̄⟩ for
all x ∈ Ω. Then the inclusion −x∗ ∉ N(x̄; Ω) gives us x0 ∈ Ω such that
⟨−x∗, x0⟩ > ⟨−x∗, x̄⟩, which reads as ⟨x∗, x0⟩ < ⟨x∗, x̄⟩ and hence justifies the
statement of the proposition.
Proof. Since x̄ ∉ Ω, we see that the sets {x̄} and Ω are strictly separated in
X, which means that there exists a vector v ∈ X such that
sup{⟨v, x⟩ | x ∈ Ω} < ⟨v, x̄⟩. (2.65)
It is well known that any Hilbert space X can be represented as the direct
sum X = L ⊕ L⊥, where
L⊥ := {w ∈ X | ⟨w, x⟩ = 0 for all x ∈ L}.
If v ∈ L⊥, then (2.65) immediately gives us a contradiction. Thus v ∈ X is
represented as v = u + w with some 0 ≠ u ∈ L and w ∈ L⊥. This implies that
for each x ∈ Ω ⊂ L we have
⟨u, x⟩ = ⟨u, x⟩ + ⟨w, x⟩ = ⟨v, x⟩ ≤ sup{⟨v, x⟩ | x ∈ Ω} < ⟨v, x̄⟩ = ⟨u + w, x̄⟩ = ⟨u, x̄⟩,
which shows that sup{⟨u, x⟩ | x ∈ Ω} < ⟨u, x̄⟩.
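The orthogonal decomposition used in this proof can be illustrated in coordinates. The sketch below (the choices X = R³ and L = span{e1, e2} are illustrative assumptions, not from the text) verifies that the L⊥-component of v is invisible to inner products against vectors of L:

```python
# Illustration of the splitting v = u + w with u in L and w in L-perp,
# for L = span{e1, e2} in R^3: inner products with vectors of L see only u.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

v = (3.0, -2.0, 5.0)
u = (3.0, -2.0, 0.0)   # component of v in L
w = (0.0, 0.0, 5.0)    # component of v in L-perp

assert all(vi == ui + wi for vi, ui, wi in zip(v, u, w))
for x in [(1.0, 0.0, 0.0), (0.5, -4.0, 0.0)]:  # arbitrary vectors of L
    assert dot(w, x) == 0.0                    # w is orthogonal to L
    assert dot(v, x) == dot(u, x)              # <v, x> = <u, x> on L
```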
Before establishing the main convex separation theorem for two nonsolid
sets in terms of their extended relative interiors in LCTV spaces, we present
some calculus rules involving both intrinsic relative and quasi-relative interiors
of convex sets. These rules are of independent interest while being instrumental
for deriving the aforementioned convex separation theorem.
Proof. First we verify assertion (a). Fix any x̄ ∈ iri(Ω) and get that cone(Ω −
x̄) is a linear subspace of X. The linearity of A shows that
cone(A(Ω) − Ax̄) = A(cone(Ω − x̄))
is a linear subspace of Y. Thus Ax̄ ∈ iri(A(Ω)). This justifies the first inclusion
in (2.66). Now, pick any x̄ ∈ qri(Ω) and deduce from (2.54) that N(x̄; Ω) is a
linear subspace of X∗. Then take y∗ ∈ N(Ax̄; A(Ω)), meaning that ⟨y∗, Ax −
Ax̄⟩ ≤ 0 for all x ∈ Ω, which tells us that A∗y∗ ∈ N(x̄; Ω). By the subspace
property of N(x̄; Ω) we get that −A∗y∗ ∈ N(x̄; Ω), which is equivalent to
−y∗ ∈ N(Ax̄; A(Ω)). Thus the normal cone N(Ax̄; A(Ω)) is a linear subspace
of Y∗, and so Ax̄ ∈ qri(A(Ω)) by (2.54). This verifies (2.66).
To justify (b), it suffices to check the inclusion "⊃" in (2.67) under the
assumption that iri(Ω) ≠ ∅. Fix x̄ ∈ iri(Ω) and set ȳ := Ax̄. It follows from
(2.66) that ȳ ∈ iri(A(Ω)). Fix further any y ∈ iri(A(Ω)). If y = ȳ, then
y ∈ A(iri(Ω)). In the case where y ≠ ȳ, Proposition 2.169 shows that there is
u ∈ A(Ω) such that y ∈ (u, ȳ). Pick x̃ ∈ Ω with Ax̃ = u and get
y = tu + (1 − t)ȳ = tA(x̃) + (1 − t)Ax̄ = A(tx̃ + (1 − t)x̄)
for some t ∈ (0, 1). Since x̄ ∈ iri(Ω) and x̃ ∈ Ω, Proposition 2.180(b) shows
that (x̃, x̄] ⊂ iri(Ω). Thus xt := tx̃ + (1 − t)x̄ ∈ iri(Ω) satisfies y = Axt. It
follows that y ∈ A(iri(Ω)).
Finally, we prove the inclusion "⊃" in assertion (c) under the assumptions
that qri(Ω) ≠ ∅ and A(Ω) is quasi-regular. Fix x̄ ∈ qri(Ω) and set ȳ := Ax̄.
By the second inclusion in (a) we have ȳ = Ax̄ ∈ A(qri(Ω)) ⊂ qri(A(Ω)).
Fix any y ∈ qri(A(Ω)) = iri(A(Ω)). If y = ȳ, then y ∈ A(qri(Ω)). If y ≠ ȳ,
by Proposition 2.169 there exists u ∈ A(Ω) such that y ∈ (u, ȳ). Pick x̃ ∈ Ω
such that Ax̃ = u and get
y = tu + (1 − t)ȳ = tA(x̃) + (1 − t)Ax̄ = A(tx̃ + (1 − t)x̄)
for some t ∈ (0, 1). Then y = Axt, where xt := tx̃ + (1 − t)x̄ ∈ qri(Ω) by
Proposition 2.180(c). Thus y ∈ A(qri(Ω)), which completes the proof.
The next theorem presents the major separation result for two nonsolid
convex sets in arbitrary LCTV spaces.
Proof. First we verify that the assumptions of the theorem ensure that
qri(Ω1 − Ω2 ) = qri(Ω1 ) − qri(Ω2 ). (2.70)
Indeed, define a linear continuous mapping A : X ×X → X by A(x, y) := x−y
and let Ω := Ω1 × Ω2 . It is easy to check that qri(Ω) = qri(Ω1 ) × qri(Ω2 ),
and thus qri(Ω) ≠ ∅ under the assumptions made. Applying formula (2.68)
from Theorem 2.183(c) to these objects A and Ω gives us
qri(Ω1 − Ω2) = A(qri(Ω)) = qri(A(Ω)) = qri(Ω1) − qri(Ω2),
and thus we arrive at the claimed equality (2.70).
Consider further the set difference Ω := Ω1 − Ω2 and get from (2.70) that
condition (2.69) reduces to
0 ∉ qri(Ω1 − Ω2) = qri(Ω1) − qri(Ω2),
and hence 0 ∈ / qri(Ω1 − Ω2 ) = qri(Ω) under the fulfillment of (2.69). Then
Proposition 2.181 tells us that the sets Ω and {0} can be properly separated,
which clearly ensures the proper separation of the sets Ω1 and Ω2 .
To verify the converse implication, suppose that Ω1 and Ω2 can be properly
separated, which implies that the sets Ω = Ω1 − Ω2 and {0} can be properly
separated as well. Then using Proposition 2.181 and Theorem 2.183 yields
0 ∉ qri(Ω) = qri(Ω1 − Ω2) = qri(Ω1) − qri(Ω2),
and thus qri(Ω1 ) ∩ qri(Ω2 ) = ∅, which completes the proof.
As seen above and will be seen in the sequel, the quasi-regularity of convex
sets is needed for the fulfillment of many important results. Theorem 2.174
tells us that the quasi-regularity of a convex set Ω holds in LCTV spaces if
ri(Ω) ≠ ∅ (in particular, for nonempty convex sets in finite dimensions), and of course if
Ω is a solid convex set. Next we reveal yet another general infinite-dimensional
setting where convex sets are quasi-regular.
Before establishing this result, let us present the following useful technical
lemma on intrinsic relative interiors.
Remark 2.187 The SNC property (2.71) is taken from [228] and investigated
therein for general nonconvex sets in Banach spaces. In the case of closed and
convex subsets Ω ⊂ X of such spaces, this property can be characterized
as follows [228, Theorem 1.21]: If a closed and convex set Ω has nonempty
relative interior, then it is SNC at every x ∈ Ω if and only if the closure of
the span of Ω is of finite codimension.
Proof. First we verify that in the case where 0 ∉ iri(Ω) the sets Ω and {0}
can be properly separated, i.e., there exists a nonzero vector a ∈ X such that
sup{⟨a, x⟩ | x ∈ Ω} ≤ 0 and inf{⟨a, x⟩ | x ∈ Ω} < 0. (2.72)
If 0 ∉ Ω, this statement is trivial. Suppose now that 0 ∈ Ω \ iri(Ω). Letting
L := aff(Ω) and employing Lemma 2.185 tell us that L is a linear subspace
of X, and that there is a sequence {xk} ⊂ L for which xk ∉ Ω and xk → 0 as
k → ∞. By Proposition 2.182 we find a sequence {vk} ⊂ L with vk ≠ 0 and
sup{⟨vk, x⟩ | x ∈ Ω} < ⟨vk, xk⟩ whenever k ∈ N.
Denote wk := vk/‖vk‖ ∈ L with ‖wk‖ = 1 for k ∈ N and observe that
aj = ∑_{i=1}^{mj} λji ωij with ∑_{i=1}^{mj} λji = 1 and ωij ∈ Ω for i = 1, . . . , mj,
(b) The quasi-regularity of the domain dom(F) yields the opposite inclusion
qri(gph(F)) ⊃ {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ qri(F(x))}.
(c) If both sets gph(F) and dom(F) are quasi-regular, then we have
qri(gph(F)) = {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ qri(F(x))}.
Since f(x̄) < λ̄, we get α ≥ 0. Using now x = x̄ and λ = λ̄ + 1 > λ̄ > f(x̄) in
(2.79) shows that α ≤ 0, which yields α = 0. Thus it follows from (2.79) and
(2.80) that the sets {x̄} and dom(f) can be properly separated, and hence
x̄ ∉ qri(dom(f)). This contradiction shows that (x̄, λ̄) ∈ qri(epi(f)).
To verify the inclusion "⊂" in (2.78), define F : X ⇉ R by F(x) :=
[f(x), ∞) and observe that gph(F) = epi(f) and dom(F) = dom(f). Furthermore,
for all x ∈ dom(f) we easily get that qri(F(x)) = (f(x), ∞). The
imposed quasi-regularity of gph(F) = epi(f) ensures that the inclusion "⊂"
in (2.78) follows directly from assertion (a) of Theorem 2.189.
The next result, which is based on Theorems 2.189(a) and 2.190, provides
a precise formula for calculating the quasi-relative interior of epigraphs for
an important class of extended-real-valued convex functions on LCTV spaces
without quasi-regularity or any additional assumptions.
Proposition 2.191 Let Ω be a nonempty convex set in an LCTV space X.
Given x∗ ∈ X ∗ and b ∈ R, define the extended-real-valued function
f(x) := ⟨x∗, x⟩ + b for x ∈ Ω and f(x) := ∞ for x ∉ Ω.
Proof. The inclusion “⊃” in (2.81) follows from (2.77) in Theorem 2.190. To
verify the opposite inclusion “⊂” in (2.81), pick any (x0 , λ0 ) ∈ qri(epi(f ))
and show that (x0 , λ0 ) belongs to the set on the right-hand side of (2.81).
Defining F : X ⇉ R by F(x) := [f(x), ∞) for all x ∈ X, we get that
dom(F ) = dom(f ) = Ω and gph(F ) = epi(f ). Following the lines in the proof
of Theorem 2.189(a), where the quasi-regularity of gph(F ) was not used, gives
us x0 ∈ qri(dom(F )) = qri(Ω). Thus it remains to show that λ0 > f (x0 ).
Suppose on the contrary that λ0 ≤ f (x0 ) and deduce from the inclu-
sion (x0 , λ0 ) ∈ epi(f ) that λ0 = f (x0 ), which ensures that (x0 , f (x0 )) ∈
qri(epi(f )). Remembering the definition of quasi-relative interior tells us that
the set L := cl(cone(epi(f) − (x0, f(x0)))) is a linear subspace of X × R. Hence for
a := (x0, f(x0) + 2) − (x0, f(x0)) = (0, 2) ∈ L we have −a = (0, −2) ∈ L.
Therefore, there exists a net {γi}i∈I ⊂ cone(epi(f) − (x0, f(x0))) such that
γi → −a = (0, −2), where
γi = μi((xi, λi) − (x0, f(x0)))
Proof. Denote by Ω the set on the right-hand side of (2.82) and show that
iri(epi(f)) = Ω. (2.83)
To verify the inclusion "⊂" in (2.83), pick any (x̄, λ̄) ∈ iri(epi(f)) and check
that x̄ ∈ iri(dom(f)). Indeed, fixing x ∈ dom(f) with x ≠ x̄, we get (x, λ) ∈
epi(f) with λ := f(x). Then we deduce from Proposition 2.177 the existence
of (u, γ) ∈ epi(f) such that (x̄, λ̄) ∈ ((x, λ), (u, γ)), which gives us x̄ ∈ (x, u).
Applying Proposition 2.177 again yields x̄ ∈ iri(dom(f)).
Next we show that f(x̄) < λ̄. Suppose on the contrary that λ̄ = f(x̄) and
take any (x̄, λ̃) ∈ epi(f) with λ̃ > f(x̄), meaning that
(x̄, λ̃) ≠ (x̄, λ̄) = (x̄, f(x̄)).
By Proposition 2.177 we find (u, γ) ∈ epi(f) with (x̄, f(x̄)) ∈ ((x̄, λ̃), (u, γ)).
Hence there exists t0 ∈ (0, 1) such that
x̄ = t0 x̄ + (1 − t0)u and λ̄ = t0 λ̃ + (1 − t0)γ.
Since λ̃ > λ̄, it follows that
λ̄ = t0 λ̃ + (1 − t0)γ > t0 λ̄ + (1 − t0)γ,
which yields γ < λ̄ = f(x̄) = f(u) and therefore (u, γ) ∉ epi(f). The obtained contradiction
tells us that λ̄ > f(x̄) and results in the inclusion iri(epi(f)) ⊂ Ω.
Now we turn to the proof of the opposite inclusion in (2.83). Fix (x̄, λ̄) ∈ Ω,
giving us x̄ ∈ iri(dom(f)) and λ̄ > f(x̄). Picking any (x, λ) ∈ epi(f) with
(x, λ) ≠ (x̄, λ̄), we intend to verify the existence of (y, β) ∈ epi(f) for which
(x̄, λ̄) ∈ ((x, λ), (y, β)).
To proceed, consider the following two cases:
Case 1: x ≠ x̄. Since x̄ ∈ iri(dom(f)) and x̄ ≠ x ∈ dom(f), there exists
u ∈ dom(f) such that x̄ ∈ (x, u). Choose γ ∈ R satisfying
(x̄, λ̄) ∈ ((u, γ), (x, λ))
and check that there exists (y, β) ∈ ((u, γ), (x̄, λ̄)) with (y, β) ∈ epi(f).
Suppose on the contrary that for every (y, β) ∈ ((u, γ), (x̄, λ̄)) we have
(y, β) ∉ epi(f). Fix any t ∈ (0, 1) and define the t-dependent elements
yt := tu + (1 − t)x̄ and βt := tγ + (1 − t)λ̄,
for which we get (yt, βt) ∈ ((u, γ), (x̄, λ̄)). The convexity of f ensures that
tγ + (1 − t)λ̄ = βt < f(yt) ≤ tf(u) + (1 − t)f(x̄) ≤ tf(u) + (1 − t)λ̄.
Letting t ↓ 0 shows that λ̄ ≤ f(x̄), a contradiction justifying the existence
of (y, β) with (y, β) ∈ ((u, γ), (x̄, λ̄)) and (y, β) ∈ epi(f). It yields (x̄, λ̄) ∈
((x, λ), (y, β)), and so (x̄, λ̄) ∈ iri(epi(f)) by Proposition 2.177.
Case 2: x = x̄. Then we have λ ≠ λ̄, and it follows from λ̄ > f(x̄) that
there exists (x̄, λ̃) ∈ epi(f) with (x̄, λ̄) ∈ ((x̄, λ), (x̄, λ̃)). This justifies the
representation in (2.82) and completes the proof of the theorem.
Finally, we derive a remarkable consequence of the obtained results show-
ing that the quasi-regularity of epigraphs of convex functions yields this prop-
erty of the corresponding domains.
Exercise 2.195 Prove a more general result in comparison with Lemma 2.55,
namely, that any balanced subset Ω of a topological vector space X is connected.
Hint: Considering the set
Ω = ⋃_{x∈Ω} [−x, x],
note that each line segment [a, b] ⊂ X is connected due to the representation [a, b] =
ϕ([0, 1]), where ϕ : R → X is defined by ϕ(t) := ta + (1 − t)b for all t ∈ [0, 1].
Ω1 + Ω2 = co{Ω1 ∪ Ω2 }.
Exercise 2.199 Let Ω be a nonempty convex cone in a vector space X. Show that
Ω is a linear subspace of X if and only if Ω = −Ω.
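The symmetry criterion Ω = −Ω in this exercise admits a tiny finite illustration (the two sample cones below are assumptions of this sketch, not part of the exercise): a convex cone that is symmetric under negation is a linear subspace, and one that is not symmetric is not.

```python
# Illustration: testing symmetry of two convex cones in R^2 on a probe grid.
def in_halfplane_cone(p):      # the cone {(x, y) : y >= 0}, not a subspace
    return p[1] >= 0

def in_line_cone(p):           # the cone {(x, y) : y = 0}, the x-axis subspace
    return p[1] == 0

probes = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
symmetric = lambda cone: all(cone((-p[0], -p[1])) == cone(p) for p in probes)

assert not symmetric(in_halfplane_cone)   # fails Omega = -Omega: not a subspace
assert symmetric(in_line_cone)            # satisfies Omega = -Omega: a subspace
```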
(b) Construct examples showing that each inclusion above can be strict.
(c) Establish sufficient conditions for each inclusion to hold as an equality.
Exercise 2.203 Let Ω be an affine set in a vector space X, and let L be the linear
subspace parallel to Ω. Prove that L = Ω − Ω.
Exercise 2.204 Let Ω1 , Ω2 be nonempty convex sets in a vector space. Prove that:
(a) If Ω1 ∩ Ω2 = ∅ and core(Ω1) ≠ ∅, then Ω1 and Ω2 can be separated by a
hyperplane. Hint: Let Ω := Ω1 − Ω2. Then core(Ω1) − Ω2 ⊂ core(Ω), and so
core(Ω) ≠ ∅. Since 0 ∉ Ω, it suffices to apply Theorem 2.49.
(b) If core(Ω1) ∩ Ω2 = ∅ and core(Ω1) ≠ ∅, then Ω1 and Ω2 can be separated by a
hyperplane. Hint: Let A := core(Ω1) and B := Ω2. Then A and B are disjoint
with core(A) = core(core(Ω1)) = core(Ω1) ≠ ∅. By part (a), the sets A and B
can be separated by a hyperplane. Then use Proposition 2.21 to show that Ω1
and Ω2 can be separated by a hyperplane.
Exercise 2.207 Let X be a real vector space. Endow it with the topology τc generated
by the family of all seminorms on X, and label this topology the core
convex topology on X; it is the strongest topology which makes X a locally convex
topological vector space. Prove the following:
(a) Every absorbing convex set in X is a neighborhood of the origin.
(b) Every linear subspace of X is closed in the core convex topology.
(c) The dual space X∗ := (X, τc)∗ is the collection of all linear functionals f : X → R.
(d) intτc (Ω) = core(Ω) for any convex set Ω ⊂ X.
Hint: Compare with [167, Exercise 2.10] and [181, Proposition 6.3.1].
Exercise 2.208 Using Exercise 2.207, clarify which results of Subsection 2.3.1 can
be derived from the corresponding ones obtained in Subsection 2.3.2.
Exercise 2.209 Derive Theorem 2.84 from Theorem 2.92. Hint: Observe first that
the proper separation of the sets Θ := {x} and Ω is equivalent to that for Θ and Ω.
Exercise 2.210 (a) Find two convex sets Ω1 and Ω2 in Rn such that Ω1 ⊂ Ω2
while ri(Ω1) ⊄ ri(Ω2).
(b) Let Ω1 and Ω2 be two subsets of Rn such that Ω1 ⊂ Ω2 and Ω1 ∩ ri(Ω2) ≠ ∅.
Prove that ri(Ω1) ⊂ ri(Ω2).
Exercise 2.213 Verify the convexity of all the functions listed in Example 2.119.
Hint: Employ the second-order characterizations of convexity given in Corollary 2.117
and Theorem 2.118.
2.6 Exercises for Chapter 2 171
Exercise 2.214 Let f : [a, b] → R with a < b be a continuous function such that
f((x + y)/2) ≤ (f(x) + f(y))/2 for all x, y ∈ [a, b].
Prove that f is convex.
Exercise 2.215 Let f : [a, b] → R with a < b be a convex function. Prove that
f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (f(a) + f(b))/2.
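For a numerical illustration of this Hermite–Hadamard-type inequality (the choice f = exp on [0, 1] and the midpoint-rule discretization are assumptions of this sketch, not part of the exercise):

```python
# Check of the two inequalities for the convex function f(x) = exp(x) on [0, 1],
# with the integral approximated by the midpoint rule.
import math

f = math.exp
a, b, N = 0.0, 1.0, 100000
integral = sum(f(a + (k + 0.5) * (b - a) / N) for k in range(N)) * (b - a) / N

mean_value = integral / (b - a)
assert f((a + b) / 2) <= mean_value + 1e-9          # midpoint lower bound
assert mean_value <= (f(a) + f(b)) / 2 + 1e-9       # chord upper bound
# Comparison with the exact value of the integral, e - 1:
assert abs(integral - (math.e - 1)) < 1e-6
```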
Exercise 2.217 Find the Hessian of the following function and prove that the func-
tion is convex on Rn :
Exercise 2.221 Justify Corollary 2.145 by employing the arguments similar to the
proof of Theorem 2.94 with the usage of Theorem 2.58 on the separation of convex
sets in topological vector spaces.
Exercise 2.222 Verify the description of relative absorbing points of convex sets
given in the proof of Proposition 2.169 and provide their geometric interpretation.
modeling, and optimal control. This period started with linear programming, which
involves optimizing linear functions on polyhedral convex sets. The invention of com-
puters and the development of computational technology made it possible to address
practically important optimization problems with many inequality constraints. It
was not the case with the classical optimization theory, chiefly due to Joseph-Louis
Lagrange (1736–1813), dealing with constraints given by equations/equalities. Novel
solution methods for linear programs, particularly exploiting the geometry of convex
polyhedra, were suggested and implemented by Leonid Kantorovich (1912–1986),
Tjalling Koopmans (1910–1985), George Dantzig (1914–2005), and their followers,
and then were successfully applied to solving a great variety of problems appearing
in military, economics, engineering, transportation, manufacturing, energy, etc. All
of this has constituted one of the most monumental applications of optimization
(and of mathematics in general) to society.
Extending models of linear programming, Harold Kuhn (1925–2014) and Albert
Tucker (1905–1995), in their pioneering paper [193] published in 1951, formulated
problems of convex programming with many inequality constraints given by smooth
convex functions and, by using standard convex separation, derived necessary opti-
mality conditions for such problems as a saddle point theorem [193]. Then these
conditions were reformulated in a Lagrangian form with the additional sign and complementary
slackness relations that are valid for more general nonlinear programming
problems described by smooth while not necessarily convex data. It was revealed later
that the obtained conditions had also been discovered in 1939 by William Karush
(1917–1997) in his unpublished master's thesis [179], written under the
supervision of Lawrence Graves (1896–1973). Since then, the aforementioned necessary
optimality conditions have been called the Karush-Kuhn-Tucker (KKT) conditions
for problems of nonlinear programming.
Another source of profound interest in convexity was provided by economic
modeling, particularly by the general (economic) equilibrium theory. In this vein,
the fundamental results for the most remarkable model of microeconomics, known
as the model of welfare economics, were developed independently in 1951 by Kenneth
Arrow (1921–2017) in his paper [8] (published in the same conference proceedings as
the aforementioned paper by Kuhn and Tucker) and by Gérard Debreu (1921–2004)
in his paper [96] under the convexity assumptions. One of the two major results of
the Arrow-Debreu model is the so-called Second Fundamental Theorem of Welfare
Economics ensuring the support of efficient allocations of a convex economy by a
decentralized equilibrium price, where each consumer minimizes his/her expendi-
tures and each firm maximizes its profit. The proof of this result is fully based on
convex separation, which therefore allowed Arrow and Debreu to rigorously justify
for convex economies the “invisible hand” principle by Adam Smith (1723–1790).
A powerful impulse for further developments and applications of convexity came
at the end of the 1950s from the then-new area of optimal control. The central result of this
theory, the Pontryagin maximum principle, formulated by Lev Pontryagin (1908–
1988) and named after him, was proved for nonlinear systems of ordinary differen-
tial equations by Vladimir Boltyanskii (1925–2019) in [41]. A crucial element of this
proof, as well as of the preceding proof of the maximum principle given by Gamkre-
lidze [136] for linear systems, was the usage of convex separation. However, in con-
trast to linear systems, no convexity assumptions were made in the nonlinear case,
where a certain “hidden convexity” was revealed for ODE control systems. All of
this was fully understood and largely extended in the theory of necessary conditions
2.7 Commentaries to Chapter 2 175
The notion of extreme points of convex sets is broadly used in convex analysis,
functional analysis, and various applications; in particular, to optimization prob-
lems. In linear programming, extreme points are called vertices (of convex polyhe-
dra) while playing a pivotal role in Dantzig’s Simplex Method. The fundamental
Theorem 2.103 is due to Mark Krein (1907–1989) and David Milman (1912–1982) and was published in their paper [185]. Its proof is based on convex separation.
Section 2.4 mainly collects well-known results on general properties of extended-
real-valued convex functions by following the aforementioned classical works of
Fenchel, Moreau, and Rockafellar. Appealing to convex geometry, we use the geometric definition of convex functions while providing their equivalent analytic description via Jensen’s inequalities in Theorem 2.105. In particular, the remarkable operation of infimal convolution (known also as “epi-addition” and “inf-convolution”) for convex functions, which preserves convexity, goes back to Fenchel [131] and has been
largely investigated by Moreau [265, 266]. The class of optimal value/marginal func-
tions defined in (2.39) plays a highly important role in convex analysis and numerous
applications as seen in the subsequent chapters of the book. The quasiconvex extension of convex functions given in Definition 2.113 has been widely recognized in mathematics after the publication of the fundamental monograph by von Neumann and Morgenstern on game theory and economic behavior [347].
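The infimal convolution mentioned above can be illustrated numerically. The sketch below is our own discretized example (the grid and the two functions are hypothetical choices, not taken from the text): it evaluates (f □ g)(x) = inf_y {f(y) + g(x − y)} for f(x) = |x| and g(x) = x²/2, which produces the Huber-type convex smoothing of the absolute value.

```python
# A brute-force discrete sketch of infimal convolution
#   (f box g)(x) = inf_y { f(y) + g(x - y) },
# computed over a fixed grid (our own illustration).

import numpy as np

grid = np.linspace(-3.0, 3.0, 601)  # search grid for the minimizing y

def inf_conv(f, g, x):
    """Approximate (f box g)(x) by minimizing over the grid."""
    return min(f(y) + g(x - y) for y in grid)

f = abs
g = lambda t: 0.5 * t * t

# Closed form of |.| box (.)^2/2: x^2/2 for |x| <= 1, |x| - 1/2 otherwise.
for x in (0.0, 0.5, 2.0):
    huber = 0.5 * x * x if abs(x) <= 1 else abs(x) - 0.5
    assert abs(inf_conv(f, g, x) - huber) < 1e-3
```

Note that the result is again convex, in line with the convexity-preservation property stated above.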
While lower semicontinuous functions play a rather peripheral role in real anal-
ysis, the situation is completely different for extended-real-valued convex functions.
Indeed, the continuity of convex functions on general topological vector spaces pre-
cisely corresponds to interior points of the domain (see Theorem 2.144), and this
actually ensures by Corollary 2.150 the Lipschitz continuity of the functions in ques-
tion defined on normed spaces. On the other hand, lower semicontinuity allows us
to deal with boundary points of the domain, which is crucial, e.g., for applications
to constrained optimization. Furthermore, lower semicontinuity, in contrast to con-
tinuity, is preserved under major operations over convex functions.
Subsection 2.5.2 presents rather recent results (most of them have never been
published in the monographic literature), which address the systematic relaxation
of the nonempty interior assumptions in convex analysis and applications. It has
been well recognized that the nonempty interior requirements on convex sets (and
the corresponding continuity assumptions on convex functions) imposed in many
results of infinite-dimensional convex analysis are often quite restrictive. This issue
has been resolved in finite dimensions by replacing interior with relative interior,
which is always nonempty for nonempty convex sets. However, it is not the case in
infinite-dimensional spaces, where convex sets may have empty relative interiors in
common settings including those highly important in applications. In particular, the
ordering/positive cones in the classical Lebesgue spaces lp and Lp for 1 ≤ p < ∞,
which appear, e.g., in economic modeling and general equilibria, have empty relative
interiors. Since the positive cones in Lebesgue spaces have nonempty interiors only
for p = ∞, this pushes economists in infinite-dimensional modeling to work in the
complicated and inconvenient space L∞ in order to use convex separation and reach
price equilibria; see, e.g., the classical book by Mas-Collel et al. [216] as well as the
more recent book by Mordukhovich [228], where further developments and detailed
commentaries can be found in Chapter 8 with the references therein.
To the best of our knowledge, a more appropriate version of the relative interior
from Definition 2.168(a) first appeared in Holmes [167] under the name “intrin-
sic core.” This name may be confusing as argued in Borwein and Goebel [46]
Generalized differentiation lies at the very heart of convex analysis and its
applications. Since the most useful and even the simplest convex functions
are nondifferentiable at the points of interest, the now flourishing generalized differentiation theory oriented toward optimization-related problems has
started from convex analysis and then has been extended to more general
variational frameworks. It concerns not only nondifferentiable functions but
also sets with nonsmooth boundaries as well as set-valued mappings. Calculus
rules of generalized differentiation have been the central issue of the theory
and applications from the very beginning of convex analysis.
Developing a geometric approach to generalized differentiation, which is
mainly based on set extremality and convex separation, we start with the study
of normals to convex sets, proceed with coderivatives of set-valued mappings
having convex graphs, and finally turn to subgradients of extended-real-valued
convex functions. In this chapter we mainly present basic results of general-
ized differential calculus in locally convex topological spaces with their refine-
ments and improvements in finite dimensions, while we also discuss in the
exercise and commentary parts some extended versions in vector spaces with-
out topologies. This study is continued in Chapter 4, where we combine it
with developing calculus rules for Fenchel conjugates and deriving enhanced
results on generalized differential calculus in Banach spaces.
Unless otherwise stated, all the spaces under consideration in this chapter
are real topological vector spaces.
Proof. The separation of the two convex sets Ω1 and Ω2 by a closed hyperplane means the existence of x∗ ∈ X∗ with x∗ ≠ 0 such that
⟨x∗, x⟩ ≤ ⟨x∗, y⟩ whenever x ∈ Ω1, y ∈ Ω2. (3.3)
It implies, in particular, that we have
⟨x∗, x⟩ ≤ ⟨x∗, x̄⟩ whenever x ∈ Ω1,
which tells us by definition (3.1) that x∗ ∈ N(x̄; Ω1). In the same way we can check that −x∗ ∈ N(x̄; Ω2). The proof of the opposite implication is also straightforward.
Definition 3.6 We say that two nonempty (not necessarily convex) subsets
Ω1 , Ω2 of a topological vector space X form an extremal system in X if
for any neighborhood V of the origin there exists a vector a ∈ X such that
a ∈ V and (Ω1 + a) ∩ Ω2 = ∅. (3.4)
Note that Definition 3.6 and the following theorem do not generally require
that the sets in question have a common point.
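For instance (our illustration, not one from the text), two closed half-lines in the real line form an extremal system even though they meet at the origin:

```latex
% Example (ours): an extremal system of two convex sets in X = \mathbb{R}.
\Omega_1 := (-\infty, 0], \qquad \Omega_2 := [0, \infty).
% Given any neighborhood V of the origin, choose \varepsilon > 0 with
% a := -\varepsilon \in V. Then
(\Omega_1 + a) \cap \Omega_2 = (-\infty, -\varepsilon] \cap [0, \infty)
  = \emptyset,
% so condition (3.4) holds, although \Omega_1 \cap \Omega_2 = \{0\}
% is nonempty.
```

This shows how extremality captures the possibility of pushing the sets apart by arbitrarily small translations.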
Proof. To verify assertion (a), assume that the sets Ω1 and Ω2 form an
extremal system in X. Suppose on the contrary that 0 ∈ int(Ω1 − Ω2 ). Then
there exists a balanced neighborhood V of 0 ∈ X such that
V ⊂ Ω1 − Ω 2 .
If a ∈ V, then −a ∈ V ⊂ Ω1 − Ω2, and so (Ω1 + a) ∩ Ω2 ≠ ∅, a contradiction.
Now suppose that 0 ∉ int(Ω1 − Ω2) and thus get
V ∩ [X \ (Ω1 − Ω2)] ≠ ∅
for any neighborhood V of the origin. Assume without loss of generality that V is balanced, and so (−V) ∩ [X \ (Ω1 − Ω2)] ≠ ∅. Then choose a ∈ V with
184 3 CONVEX GENERALIZED DIFFERENTIATION
−a ∈ X \ (Ω1 − Ω2 ) ,
and hence (Ω1 + a) ∩ Ω2 = ∅. This verifies that the convex sets Ω1 and Ω2
form an extremal system.
For the second part of (a), suppose on the contrary that int(Ω1) ∩ Ω2 ≠ ∅.
Then there exists a vector x ∈ int(Ω1) with x ∈ Ω2. We can always choose a balanced neighborhood V of the origin such that x + V ⊂ Ω1. For any vector a ∈ V we have −a ∈ V and x − a ∈ Ω1. Hence (a + Ω1) ∩ Ω2 ≠ ∅, which is a clear contradiction.
To verify (b), observe that if Ω1 and Ω2 form an extremal system, then 0 ∉ int(Ω1 − Ω2) by (a). The assumption in (b) on the solidness of the difference
Ω1 − Ω2 allows us to use the convex separation theorem, which yields (3.6).
The last statement of (b) is justified in Proposition 3.5.
To prove (c), suppose that (3.6) holds, which gives us v ∈ X with ⟨x∗, v⟩ > 0. Fix any neighborhood V of the origin. Since V is always absorbing, we can select k ∈ N so large that a := −v/k ∈ V. Let us show that (3.4) is satisfied with this vector a. Indeed, the negation of this means that there exists x ∈ Ω2 with x − a ∈ Ω1. By the separation property from (3.6) we have
⟨x∗, x − a⟩ ≤ sup{⟨x∗, u⟩ | u ∈ Ω1} ≤ inf{⟨x∗, u⟩ | u ∈ Ω2} ≤ ⟨x∗, x⟩,
which yields ⟨x∗, v⟩/k = ⟨x∗, −a⟩ ≤ 0, a contradiction.
In this subsection we apply the set extremality and convex extremal princi-
ple of Theorem 3.7 to obtain the normal cone intersection rule under differ-
ent qualification conditions for convex sets in topological vector spaces. This
intersection rule is crucial for deriving major rules of coderivative and subdif-
ferential calculus by using the geometric approach. The proof of the following
fundamental theorem under the classical qualification condition in topological
vector spaces is based on set extremality and convex separation (Figure 3.3).
⟨x∗, x − x̄⟩ ≤ 0 whenever x ∈ Ω1 ∩ Ω2.
Define further the convex sets
Θ1 := Ω1 × [0, ∞) and Θ2 := {(x, μ) ∈ X × R | x ∈ Ω2, μ ≤ ⟨x∗, x − x̄⟩}.
It follows from the constructions of Θ1 and Θ2 that for any α > 0 we get
(Θ1 + (0, α)) ∩ Θ2 = ∅,
and thus these sets form an extremal system in the sense of Definition 3.6.
To proceed, observe first that int(Θ1) ≠ ∅, and so the set Θ1 − Θ2 is solid.
Applying Theorem 3.7 to the sets Θ1 and Θ2 gives us y∗ ∈ X∗ and γ ∈ R with (y∗, γ) ≠ (0, 0) and
⟨y∗, x⟩ − γλ1 ≤ ⟨y∗, y⟩ − γλ2 whenever (x, λ1) ∈ Θ1, (y, λ2) ∈ Θ2. (3.10)
Using (3.10) with (x̄, 1) ∈ Θ1 and (x̄, 0) ∈ Θ2 yields γ ≥ 0. If γ = 0, then
⟨y∗, x⟩ ≤ ⟨y∗, y⟩ for all x ∈ Ω1, y ∈ Ω2.
Since int(Ω1) ∩ Ω2 ≠ ∅, this readily gives us y∗ = 0. This contradiction shows that γ > 0. Employing now (3.10) with (x, 0) ∈ Θ1 for any x ∈ Ω1 and with (x̄, 0) ∈ Θ2 tells us that
⟨y∗, x⟩ ≤ ⟨y∗, x̄⟩ for all x ∈ Ω1, and so y∗ ∈ N(x̄; Ω1).
Using (3.10) with (x̄, 0) ∈ Θ1 and (y, ⟨x∗, y − x̄⟩) ∈ Θ2 for y ∈ Ω2 shows that
⟨y∗, x̄⟩ ≤ ⟨y∗, y⟩ − γ⟨x∗, y − x̄⟩ for all y ∈ Ω2.
Dividing both sides of the obtained inequality by γ > 0, we arrive at
⟨x∗ − y∗/γ, y − x̄⟩ ≤ 0 for all y ∈ Ω2,
which verifies by (3.1) the fulfillment of the inclusions
x∗ ∈ y∗/γ + N(x̄; Ω2) ⊂ N(x̄; Ω1) + N(x̄; Ω2)
and hence proves that N(x̄; Ω1 ∩ Ω2) ⊂ N(x̄; Ω1) + N(x̄; Ω2).
Next we show that the usage of the set extremality device implemented in
the proof of Theorem 3.10 allows us to obtain the normal cone intersection
rule in topological vector spaces under a refined qualification condition that
is weaker than the standard one (3.9) in any normed space and may strictly
improve the latter even in finite dimensions; see more discussions below (Fig-
ure 3.4).
Recall that a subset Ω of a topological vector space X is bounded if for any
neighborhood V of the origin there exists γ > 0 such that we have Ω ⊂ αV
whenever |α| > γ.
3.1 The Normal Cone and Set Extremality 187
Proof. We need to verify the inclusion “⊂” in (3.12) under the fulfillment
of the qualification condition (3.11), which is called the bounded extremality
condition at x. We will mainly follow the proof lines of Theorem 3.10 with the
corresponding modifications due to the usage of (3.11). Denote A := Ω1 and
B := Ω2 ∩ V and then observe that 0 ∈ int(A − B) and B is bounded. Fixing an arbitrary normal x∗ ∈ N(x̄; A ∩ B), we get by (3.1) that ⟨x∗, x − x̄⟩ ≤ 0 for all x ∈ A ∩ B. Consider the convex sets
Θ1 := A × [0, ∞), Θ2 := {(x, μ) ∈ X × R | x ∈ B, μ ≤ ⟨x∗, x − x̄⟩}. (3.13)
Following the proof of Theorem 3.10, we observe that the sets Θ1 and Θ2
form an extremal system. Next let us check that the set Θ1 − Θ2 is solid, i.e., int(Θ1 − Θ2) ≠ ∅. Take a neighborhood U of the origin such that U ⊂ A − B. The boundedness of the set B allows us to choose λ ∈ R satisfying
λ ≥ sup{⟨−x∗, x − x̄⟩ | x ∈ B}. (3.14)
The next remark shows that the qualification condition (3.11) holds in any
normed space under the fulfillment of the standard interiority condition from
Theorem 3.10, and the latter may fail even in R2 while (3.11) is satisfied.
ri(ri(Ω1) ∩ ri(Ω2)) = ri(Ω1 ∩ Ω2) ⊂ ri(Ω1) ∩ ri(Ω2).
This justifies representation (3.19) for m = 2.
To verify (3.19) under the fulfillment of (3.18) in the general case of m > 2,
we proceed by induction and assume that the result holds for m − 1 sets.
Considering m sets Ωi, represent their intersection as
⋂_{i=1}^{m} Ωi = Ω ∩ Ωm with Ω := ⋂_{i=1}^{m−1} Ωi. (3.21)
Now we are ready to derive the major representation of the normal cone
to intersections of convex sets. Note that the proof of this result, as well as
the subsequent calculus rules for functions and set-valued mappings, follows
the geometric pattern of variational analysis as in [228] while the specific
features of convexity allow us to essentially simplify the proof and to avoid
the closedness requirement on sets and the corresponding lower semiconti-
nuity assumptions on functions in subdifferential calculus. Furthermore, the
developed geometric approach works in the convex setting under the corre-
sponding relative interior qualification conditions in Rn , which are weaker
than the qualification conditions employed in [228, 237]; see below.
Proof. Proceeding by induction, let us first prove the claimed statement for
the case of m = 2. Since the inclusion “⊃” in (3.23) trivially holds even with-
out imposing (3.22), the real task is to verify the opposite inclusion therein.
Fixing x̄ ∈ Ω1 ∩ Ω2 and v ∈ N(x̄; Ω1 ∩ Ω2), we get by the normal cone definition that
⟨v, x − x̄⟩ ≤ 0 for all x ∈ Ω1 ∩ Ω2.
Denote Θ1 := Ω1 × [0, ∞) and Θ2 := {(x, λ) | x ∈ Ω2, λ ≤ ⟨v, x − x̄⟩}. It follows from Corollary 2.95 that ri(Θ1) = ri(Ω1) × (0, ∞) and that
ri(Θ2) = {(x, λ) | x ∈ ri(Ω2), λ < ⟨v, x − x̄⟩}.
Following the proof of Theorem 3.10, we observe that γ ≥ 0. Let us now show by using (3.22) that γ > 0. Supposing again on the contrary that γ = 0, we get the conditions
⟨w, x⟩ ≤ ⟨w, y⟩ for all x ∈ Ω1, y ∈ Ω2,
⟨w, x⟩ < ⟨w, y⟩ for some x ∈ Ω1, y ∈ Ω2.
This means that Ω1 and Ω2 can be properly separated, which tells us by Theorem 2.92 that ri(Ω1) ∩ ri(Ω2) = ∅, a contradiction verifying that γ > 0.
Deduce from (3.24), by taking into account that (x, 0) ∈ Θ1 if x ∈ Ω1 and that (x̄, 0) ∈ Θ2, the inequality
⟨w, x⟩ ≤ ⟨w, x̄⟩ for all x ∈ Ω1.
This implies therefore that w ∈ N(x̄; Ω1) and so w/γ ∈ N(x̄; Ω1). Moreover, we get from (3.24), due to (x̄, 0) ∈ Θ1 and (y, α) ∈ Θ2 for all y ∈ Ω2 with α := ⟨v, y − x̄⟩, that
⟨w, x̄⟩ ≤ ⟨w, y⟩ − γ⟨v, y − x̄⟩ whenever y ∈ Ω2.
Dividing both sides therein by γ, we arrive at the relationship
⟨v − w/γ, y − x̄⟩ ≤ 0 for all y ∈ Ω2,
and thus v − w/γ ∈ N(x̄; Ω2). This gives us
v ∈ w/γ + N(x̄; Ω2) ⊂ N(x̄; Ω1) + N(x̄; Ω2)
and therefore completes the proof of (3.23) in the case of m = 2.
Considering now the case of intersections for any finite number of sets,
suppose by induction that the intersection rule (3.23) holds under (3.22) for
m − 1 sets and verify that it continues to hold for the intersection of m > 2
sets ⋂_{i=1}^{m} Ωi. Represent the latter intersection as Ω ∩ Ωm as in (3.21) and get from the imposed relative interior condition (3.22) and Lemma 3.14 that
ri(Ω) ∩ ri(Ωm) = ⋂_{i=1}^{m} ri(Ωi) ≠ ∅.
Applying the intersection rule (3.23) to the two sets Ω and Ωm and then employing the induction assumption for m − 1 sets give us the equalities
N(x̄; ⋂_{i=1}^{m} Ωi) = N(x̄; Ω ∩ Ωm) = N(x̄; Ω) + N(x̄; Ωm) = Σ_{i=1}^{m} N(x̄; Ωi),
which justify (3.23) for m sets and thus complete the proof.
Finally, compare Theorem 3.15 derived under the relative interior qualifi-
cation condition (3.22) with the corresponding result obtained in [237, Corol-
lary 2.16] for m = 2 under the so-called basic/normal qualification condition
N(x̄; Ω1) ∩ (−N(x̄; Ω2)) = {0}, (3.25)
which was introduced and applied earlier for establishing the intersection rule
and related calculus results in nonconvex variational analysis; see, e.g., [228,
229, 317], and the references therein. Let us first show that (3.25) yields (3.22)
in the general convex setting.
The next example shows that (3.26) may be strictly weaker than (3.25).
Example 3.18 Consider the two planar convex sets defined by Ω1 := R×{0}
and Ω2 := (−∞, 0] × {0}. Condition (3.26) obviously holds and thus ensures
the fulfillment of the normal cone intersection rule by Theorem 3.15. On the
other hand, it is easy to check that
N(x̄; Ω1) = {0} × R and N(x̄; Ω2) = [0, ∞) × R with x̄ = (0, 0),
i.e., the normal qualification condition (3.25) fails. This shows that the result
of [237, Corollary 2.16] is not applicable in this case.
Let us start with the sum rule for coderivatives under several qualification
conditions induced by the application of the normal cone intersection rule
from Theorem 3.10. For simplicity of formulations we confine ourselves to the
coderivative implementation of the standard qualification condition (3.9).
Given two set-valued mappings F1, F2 : X ⇉ Y between topological vector spaces, consider their sum (F1 + F2) : X ⇉ Y defined for each x ∈ X by
(F1 + F2)(x) = F1(x) + F2(x) := {y ∈ Y | ∃ yi ∈ Fi(x) with y = y1 + y2}.
If F1 and F2 have convex graphs, then their sum F1 + F2 has the same property, and we get dom(F1 + F2) = dom(F1) ∩ dom(F2). To formulate the coderivative sum rule for F1 + F2 at any (x̄, ȳ) ∈ gph(F1 + F2), define the set
S(x̄, ȳ) := {(ȳ1, ȳ2) ∈ Y × Y | ȳ = ȳ1 + ȳ2 with ȳi ∈ Fi(x̄), i = 1, 2}. (3.28)
It is easy to see that if F and G have convex graphs, then G ◦ F also has this
property. In the next theorem we use the notation
T(x, z) := F(x) ∩ G⁻¹(z) and rge(F) := ⋃_{x∈X} F(x).
Proof. First we verify the inclusion “⊂” in (3.37). For every ȳ ∈ (F1 ∩ F2)(x̄), y∗ ∈ Y∗, and x∗ ∈ D∗(F1 ∩ F2)(x̄, ȳ)(y∗) it follows that
(x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 ∩ F2)) = N((x̄, ȳ); gph(F1) ∩ gph(F2)).
Applying Theorem 3.10 under the qualification condition (3.9), which reduces to (3.36) in this case, tells us that
(x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 ∩ F2)) = N((x̄, ȳ); gph(F1)) + N((x̄, ȳ); gph(F2)).
Thus (x∗, −y∗) = (x1∗, −y1∗) + (x2∗, −y2∗) for some (xi∗, −yi∗) ∈ N((x̄, ȳ); gph(Fi)). Therefore x∗ ∈ D∗F1(x̄, ȳ)(y1∗) + D∗F2(x̄, ȳ)(y2∗) and y∗ = y1∗ + y2∗, which justify the claimed inclusion “⊂” in the coderivative representation (3.37).
To verify the opposite inclusion in (3.37), take y1∗, y2∗ ∈ Y∗ with y1∗ + y2∗ = y∗. Picking now x∗ ∈ D∗F1(x̄, ȳ)(y1∗) + D∗F2(x̄, ȳ)(y2∗), we get x∗ = x1∗ + x2∗ for some x1∗ ∈ D∗F1(x̄, ȳ)(y1∗) and x2∗ ∈ D∗F2(x̄, ȳ)(y2∗). This shows that
(x∗, −y∗) = (x1∗, −y1∗) + (x2∗, −y2∗) ∈ N((x̄, ȳ); gph(F1)) + N((x̄, ȳ); gph(F2))
= N((x̄, ȳ); gph(F1 ∩ F2)),
and thus x∗ ∈ D∗(F1 ∩ F2)(x̄, ȳ)(y∗), which completes the proof.
Proof. The very construction (3.39) immediately yields this property. Note
that for convex functions the notions of local and global minimizers agree.
Proposition 3.30 Let X be a topological vector space. For any convex function f : X → R and any x̄ ∈ dom(f) we have the representation
∂f(x̄) = {x∗ ∈ X∗ | (x∗, −1) ∈ N((x̄, f(x̄)); epi(f))}. (3.40)
3.3 Subgradients of Convex Functions 203
Proof. Fix x∗ ∈ ∂f(x̄) and (x, λ) ∈ epi(f). Since λ ≥ f(x), we deduce from (3.39) the upper estimates
⟨(x∗, −1), (x, λ) − (x̄, f(x̄))⟩ = ⟨x∗, x − x̄⟩ − (λ − f(x̄))
≤ ⟨x∗, x − x̄⟩ − (f(x) − f(x̄)) ≤ 0.
This readily implies that (x∗, −1) ∈ N((x̄, f(x̄)); epi(f)).
To verify the opposite inclusion in (3.40), take an arbitrary x∗ ∈ X∗ with (x∗, −1) ∈ N((x̄, f(x̄)); epi(f)). For any x ∈ dom(f) we have (x, f(x)) ∈ epi(f). Therefore, it follows that
⟨x∗, x − x̄⟩ − (f(x) − f(x̄)) = ⟨(x∗, −1), (x, f(x)) − (x̄, f(x̄))⟩ ≤ 0,
which yields x∗ ∈ ∂f(x̄) and thus justifies representation (3.40).
Proposition 3.30 tells us that subgradients of f at x̄ correspond to “non-horizontal” normals to the epigraph epi(f). As a complement to them, we introduce the following collection of x∗ ∈ X∗ corresponding to horizontal normals to epi(f), which plays an independent role in the study of convex functions.
Definition 3.31 Let X be a topological vector space. Given a convex function f : X → R and a point x̄ ∈ dom(f), we say that x∗ ∈ X∗ is a singular or horizon subgradient of f at x̄ if (x∗, 0) ∈ N((x̄, f(x̄)); epi(f)). The collection of such subgradients is called the singular/horizon subdifferential of f at x̄ and is denoted by
∂∞f(x̄) := {x∗ ∈ X∗ | (x∗, 0) ∈ N((x̄, f(x̄)); epi(f))}. (3.41)
We put ∂∞f(x̄) := ∅ if x̄ ∉ dom(f).
Remark 3.32 It follows directly from (3.40), (3.41), and the coderivative construction (3.27) that both the subdifferential and the singular subdifferential of f : X → R at x̄ ∈ dom(f) can be expressed as
∂f(x̄) = D∗Ef(x̄, f(x̄))(1) and ∂∞f(x̄) = D∗Ef(x̄, f(x̄))(0) (3.42)
via the coderivative of the epigraphical multifunction Ef : X ⇉ R, which is associated with the function f by the formula
Ef(x) := {α ∈ R | (x, α) ∈ epi(f)} = [f(x), ∞), x ∈ X.
There are deeper relationships between ∂f (x), ∂ ∞ f (x), and the coderivative
D∗ f (x) of f itself provided that f is finite around x ∈ dom(f ). It is proved
in [228, Theorem 1.80], even without the convexity of f , that
∂f (x) = D∗ f (x)(1) and ∂ ∞ f (x) ⊂ D∗ f (x)(0) (3.43)
provided that X is a Banach space and that f is continuous around x. Fur-
thermore, the first relationship in (3.43) is proved in [229, Theorem 1.23] for
functions f : Rn → R that are merely l.s.c. around this point. Note however
that, in contrast to (3.42), the equality in (3.43) does not allow us to apply
coderivative results for convex-graph mappings to the study of the subdiffer-
ential of f , since the graph of f is not convex unless f is an affine function.
Employing Proposition 3.30 and Theorem 3.33 gives us the following sub-
differential conditions for locally Lipschitzian/continuous functions.
Theorem 3.34 Let X be a normed space, and let f : X → R be a convex
function with x ∈ dom(f ). Then we have the following assertions:
(a) The local ℓ-Lipschitz continuity of f around x̄ implies that
∥x∗∥ ≤ ℓ for any x∗ ∈ ∂f(x̄) and ∂∞f(x̄) = {0}, (3.45)
where the subdifferential ∂f(x̄) is a weak∗ compact subset of X∗.
(b) All the conditions in (a) hold with some ℓ ≥ 0 if f is continuous at x̄.
Proof. In case (a) both properties (3.45) follow directly from representation
(3.40) and construction (3.41), respectively, due to (3.44). The weak∗ compact-
ness of the subdifferential ∂f (x) is a consequence of (3.45) and the Alaoglu-
Bourbaki theorem in normed spaces; see Corollary 1.113. Assertion (b) follows
from (a) due to the equivalence in Theorem 3.33.
Example 3.36 Let X be a normed space, and let f(x) := ∥x∥ be the norm function on X. Then the subdifferential of f at x̄ ∈ X is calculated by
∂f(x̄) = B∗ if x̄ = 0, and ∂f(x̄) = {x∗ ∈ X∗ | ⟨x∗, x̄⟩ = ∥x̄∥, ∥x∗∥ = 1} otherwise. (3.48)
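This two-case structure of (3.48) is easy to probe numerically. The sketch below (our own finite sampling in R² with the Euclidean norm) tests the subgradient inequality over a batch of trial points: at the origin the whole dual unit ball works, while at a nonzero point the normalized vector x/∥x∥ does.

```python
# Numeric sanity check of the subdifferential of the Euclidean norm
# on R^2 (our own discretized sketch, not the book's proof).

import numpy as np

def is_subgradient(xs, x, f, samples):
    """Check the subgradient inequality f(y) >= f(x) + <xs, y - x>
    over a finite sample of test points y (up to float tolerance)."""
    return all(f(y) >= f(x) + xs @ (y - x) - 1e-12 for y in samples)

norm = np.linalg.norm
rng = np.random.default_rng(0)
# Random test points plus the coordinate directions as deterministic probes.
samples = np.vstack([rng.standard_normal((200, 2)), np.eye(2)])

# At x = 0: any vector in the dual unit ball is a subgradient...
assert is_subgradient(np.array([0.6, -0.3]), np.zeros(2), norm, samples)
# ...while a vector of norm > 1 fails the inequality at some sample.
assert not is_subgradient(np.array([1.5, 0.0]), np.zeros(2), norm, samples)

# At x != 0: the unit vector x/||x|| is a subgradient.
x = np.array([3.0, 4.0])
assert is_subgradient(x / norm(x), x, norm, samples)
```

The failing case relies on the probe point (1, 0): there the affine minorant with slope (1.5, 0) overshoots the norm.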
The next theorem, which is based on the previous results, provides a full
description of the normal cone to the epigraph of a convex function.
Proof. To verify (a), take (x∗, −μ) ∈ N((x̄, f(x̄)); epi(f)) and get that
⟨x∗, x − x̄⟩ − μ(λ − f(x̄)) ≤ 0 whenever (x, λ) ∈ epi(f).
Since (x̄, f(x̄) + 1) ∈ epi(f), it follows that
⟨x∗, x̄ − x̄⟩ − μ(f(x̄) + 1 − f(x̄)) ≤ 0,
which clearly implies that μ ≥ 0.
To proceed with (b), pick x∗ ∈ X∗ such that (x∗, 0) ∈ N((x̄, f(x̄)); epi(f)) and deduce from the normal cone definition that
⟨x∗, x − x̄⟩ + 0 · (λ − f(x̄)) ≤ 0 whenever (x, λ) ∈ epi(f). (3.50)
Taking any x ∈ dom(f) and applying (3.50) to (x, f(x)) ∈ epi(f) give us the inequality ⟨x∗, x − x̄⟩ ≤ 0, which implies that x∗ ∈ N(x̄; dom(f)). The proof of the opposite implication is also straightforward.
Note that the result of Theorem 3.37(b) provides the following description of the singular subdifferential (3.41) of convex functions:
∂∞f(x̄) = N(x̄; dom(f)). (3.51)
Having this in mind and using the corresponding results above lead us to the following characterization of local Lipschitz continuity and mere continuity in the case of finite-dimensional spaces.
Proof. The equivalence between the first three properties follows from Corollary 2.152 and Theorem 3.33. Furthermore, they yield (3.52) by Theorem 3.34. It remains to verify that (3.52) implies the local Lipschitz continuity of f around x̄ in finite dimensions. Assuming the contrary, we get x̄ ∉ int(dom(f)) and thus x̄ ∈ bd(dom(f)). It follows from Theorem 3.3(c) that N(x̄; dom(f)) ≠ {0}, which contradicts (3.51) and ends the proof.
Now we derive general conditions ensuring the subdifferentiability of convex
functions (i.e., the existence of a subgradient) both in topological vector spaces and in finite dimensions. As above, the finite-dimensional geometry offers a
less restrictive assumption in comparison with that in infinite dimensions. We
use different approaches to justify the subdifferentiability.
Proof. To verify (a), observe that int(dom(f)) ≠ ∅ under the imposed continuity assumption. Then taking any x̄ ∈ int(dom(f)) and using Proposition 2.145 give us (x̄, f(x̄)) ∈ bd(epi(f)). By Theorem 3.3(b) we find a nonzero element (x∗, −μ) ∈ N((x̄, f(x̄)); epi(f)) with x∗ ∈ X∗ and μ ∈ R. Furthermore, Theorem 3.37(a) tells us that μ ≥ 0. We want to show that μ > 0. Suppose on the contrary that μ = 0, which yields x∗ ≠ 0. By Theorem 3.37(b) we get that x∗ ∈ N(x̄; dom(f)). Then it follows from Theorem 3.3(b) that x̄ ∈ bd(dom(f)), a contradiction. Thus μ > 0 and (x∗/μ, −1) ∈ N((x̄, f(x̄)); epi(f)). Applying finally Proposition 3.30 gives us x∗/μ ∈ ∂f(x̄), which completes the proof of (a).
To prove (b), pick any x̄ ∈ ri(dom(f)). Since (x̄, f(x̄)) ∉ ri(epi(f)) by Corollary 2.95, we can separate (x̄, f(x̄)) and epi(f) properly by a hyperplane. This gives us (v, −μ) ∈ Rn × R such that
Proof. For any x∗ ∈ R+∂f(x̄) there are λ ≥ 0 and u∗ ∈ ∂f(x̄) with x∗ = λu∗ and
⟨u∗, x − x̄⟩ ≤ f(x) − f(x̄) whenever x ∈ X.
It follows therefore that
⟨x∗, x − x̄⟩ = λ⟨u∗, x − x̄⟩ ≤ λ(f(x) − f(x̄)) for all x ∈ X.
Our next goal in this subsection is to establish a precise formula for cal-
culating the subdifferential of convex functions of one variable. The following
lemma is useful in the sequel.
The second lemma uses the slope function to calculate the left and right derivatives of f, which always exist for convex functions under consideration.
Proof. Lemma 3.42 tells us that the slope function ϑx̄ is nondecreasing on the interval (x̄, ∞) and is bounded from below by ϑx̄(x̄ − 1). Then the limit
lim_{x→x̄⁺} ϑx̄(x) = lim_{x→x̄⁺} (f(x) − f(x̄))/(x − x̄)
exists as a finite number. Furthermore, we get
lim_{x→x̄⁺} ϑx̄(x) = inf_{x>x̄} ϑx̄(x).
This ensures that f′₊(x̄) exists and is calculated by
f′₊(x̄) = inf_{x>x̄} ϑx̄(x).
Similarly we establish the existence of f′₋(x̄) with the formula
f′₋(x̄) = sup_{x<x̄} ϑx̄(x).
v ≥ (f(x) − f(x̄))/(x − x̄) whenever x < x̄,
which ensures that v ≥ f′₋(x̄). Hence
∂f(x̄) ⊂ [f′₋(x̄), f′₊(x̄)].
To verify the opposite inclusion, take v ∈ [f′₋(x̄), f′₊(x̄)] and get
sup_{x<x̄} ϑx̄(x) = f′₋(x̄) ≤ v ≤ f′₊(x̄) = inf_{x>x̄} ϑx̄(x)
by Lemma 3.43. It follows from the upper estimate of v by f′₊(x̄) that
v ≤ ϑx̄(x) = (f(x) − f(x̄))/(x − x̄) whenever x > x̄,
which implies therefore the inequality
v(x − x̄) ≤ f(x) − f(x̄) when x ≥ x̄.
Proceeding similarly shows that
v(x − x̄) ≤ f(x) − f(x̄) for all x < x̄.
Thus v ∈ ∂f(x̄) and (3.56) is verified.
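The interval formula (3.56) can be sampled numerically. The sketch below is our own test with f(x) = |x| at x̄ = 0, where the one-sided derivatives are −1 and 1, so the subdifferential is the whole interval [−1, 1].

```python
# Numeric illustration of the one-variable subdifferential formula
# (3.56) for f(x) = |x| at xbar = 0 (our own example function).

def slope(f, xbar, x):
    """The slope function  theta(x) = (f(x) - f(xbar)) / (x - xbar)."""
    return (f(x) - f(xbar)) / (x - xbar)

f = abs
xbar = 0.0

# Right derivative = inf of right slopes; left derivative = sup of
# left slopes (the slope of |.| is constant on each side of 0).
right = min(slope(f, xbar, xbar + t) for t in (0.01, 0.1, 1.0))
left = max(slope(f, xbar, xbar - t) for t in (0.01, 0.1, 1.0))
assert left == -1.0 and right == 1.0

# Every v in [left, right] satisfies the subgradient inequality.
for v in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert all(v * (x - xbar) <= f(x) - f(xbar)
               for x in (-2.0, -0.3, 0.4, 2.0))
```

For this piecewise linear f the slope function is constant on each side, so the finite sample already gives the exact one-sided derivatives.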
Proof. Observing that the inclusion “⊃” in (3.61) holds by Corollary 3.49, we
proceed with the proof of the opposite inclusion. Consider first the case of
m = 2 and pick any v ∈ ∂(f1 + f2 )(x). Then we have
Example 3.52 Let a1 < a2 < . . . < an, and let μi > 0 for i = 1, . . . , n. Define the convex function
f(x) := Σ_{i=1}^{n} μi |x − ai|, x ∈ R.
Then 0 ∈ ∂f(x̄) if and only if x̄ ∈ [ak, ak+1], and thus f attains its global minimum at any point from the interval [ak, ak+1].
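A concrete numeric instance of Example 3.52 (the data below are our own choice): with breakpoints a = (0, 1, 2, 3) and weights μ = (1, 2, 2, 1), the weights balance between a2 = 1 and a3 = 2 (since 1 + 2 = 2 + 1), so the minimum is attained on the whole interval [1, 2].

```python
# Weighted sum of absolute values from Example 3.52 with our own data;
# the balance of cumulative weights makes f constant (and minimal)
# on the interval [a_2, a_3] = [1, 2].

a = [0.0, 1.0, 2.0, 3.0]
mu = [1.0, 2.0, 2.0, 1.0]

def f(x):
    return sum(m * abs(x - ai) for m, ai in zip(mu, a))

# f is constant on [1, 2]...
assert f(1.0) == f(1.5) == f(2.0) == 5.0
# ...and strictly larger just outside that interval.
assert f(0.9) > f(1.0) and f(2.1) > f(2.0)
```

This is the weighted-median phenomenon: on [1, 2] every subgradient sum contains 0, exactly as the example asserts.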
Proof. It is obvious that the graph gph(B) is an affine set. Furthermore, the
inclusion (x∗ , y ∗ ) ∈ N ((x, y); gph(B)) amounts to saying that
The next theorem is the main subdifferential chain rule for convex com-
positions in the general topological vector space setting.
Proof. It suffices to verify the inclusion “⊂” under the imposed qualifica-
tion condition in finite dimensions. Form Ω1 , Ω2 by (3.67) and observe that
ri(Ω1) = Ω1 = gph(B) × R. Using now Corollary 2.95 shows that
ri(Ω2) = {(x, y, λ) ∈ Rn × Rp × R | x ∈ Rn, y ∈ ri(dom(f)), λ > f(y)}.
Thus the assumption of this theorem ensures that ri(Ω1) ∩ ri(Ω2) ≠ ∅. The rest of the proof follows the proof lines of Theorem 3.55 by applying the normal cone intersection rule in finite dimensions given in Theorem 3.15.
Here we present yet another consequence of the normal cone intersection rule of Theorem 3.10, now applied to calculating subgradients of maxima of finitely many convex functions (Figure 3.8). Given fi : X → R, define the maximum function f : X → R by
f(x) := max{fi(x) | i = 1, . . . , m}, x ∈ X. (3.68)
If in addition all the functions fi are continuous at x̄, then we have the following subdifferential maximum rule:
∂f(x̄) = co ⋃_{i∈I(x̄)} ∂fi(x̄). (3.71)
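The maximum rule (3.71) admits a simple numerical check. In the sketch below (our own smooth example) f = max(f1, f2) with f1(x) = x and f2(x) = x²; at x̄ = 1 both functions are active with derivatives 1 and 2, so (3.71) predicts ∂f(1) = co{1, 2} = [1, 2].

```python
# Numeric check of the subdifferential maximum rule (3.71) for
# f(x) = max(x, x^2) at xbar = 1 (our own example functions).

def f(x):
    return max(x, x * x)

xbar = 1.0

def subgradient_ok(v):
    """Test the subgradient inequality f(x) >= f(xbar) + v*(x - xbar)
    on a fixed grid of probe points."""
    pts = [-1.0, 0.0, 0.5, 1.05, 1.1, 2.0]
    return all(f(x) >= f(xbar) + v * (x - xbar) for x in pts)

# Slopes inside co{1, 2} = [1, 2] pass the test...
assert all(subgradient_ok(v) for v in (1.0, 1.5, 2.0))
# ...while slopes just outside the interval fail at some probe point.
assert not subgradient_ok(0.9) and not subgradient_ok(2.1)
```

The probe 1.05 lies in the narrow region where a slope slightly above 2 overshoots x², and 0.5 witnesses the failure of slopes below 1.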
(x∗, −1) ∈ N((x̄, f(x̄)); ⋂_{i=1}^{m} epi(fi)) = Σ_{i=1}^{m} N((x̄, λ̄); epi(fi)) with λ̄ := f(x̄).
This allows us to write (x∗, −1) = Σ_{i=1}^{m} (xi∗, −λi) with (xi∗, −λi) ∈ N((x̄, λ̄); epi(fi)). It follows from Theorem 3.37 that λi ≥ 0 and xi∗ ∈ λi ∂fi(x̄) for i ∈ I(x̄). Since Σ_{i∈I(x̄)} λi = 1, we see that the vector x∗ = Σ_{i∈I(x̄)} xi∗ belongs to the set on the right-hand side of (3.71). This completes the proof.
The next two corollaries of Theorem 3.59 are rather straightforward. The
first one follows from the automatic continuity of real-valued convex functions.
Corollary 3.60 Let all the functions fi in (3.68) be real-valued and convex on Rn. Then we have the subdifferential maximum rule (3.71).
Let us finally illustrate the usage of Corollary 3.60 to determine all the
subgradients of the maximum function.
Proof. Suppose that d(x; Ω) = 0. Then for any number k ∈ N there exists a vector ωk ∈ Ω satisfying the inequality ∥x − ωk∥ < 1/k. Thus the sequence {ωk} converges to x, and hence x ∈ Ω̄.
Conversely, suppose that x ∈ Ω̄. Then there exists a sequence {ωk} in Ω that converges to x. Since d(x; Ω) ≤ ∥x − ωk∥ for every k ∈ N, we get d(x; Ω) = 0 by letting k → ∞.
Proof. Fix any vectors ω ∈ Ω and x, y ∈ X. Then it follows from the distance function definition that
d(x; Ω) ≤ ∥x − ω∥ = ∥(x − y) + (y − ω)∥ ≤ ∥x − y∥ + ∥y − ω∥.
This readily yields the estimate
d(x; Ω) ≤ ∥x − y∥ + inf{∥y − ω∥ | ω ∈ Ω} = ∥x − y∥ + d(y; Ω).
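Swapping x and y in the estimate above gives |d(x; Ω) − d(y; Ω)| ≤ ∥x − y∥, i.e., the distance function is nonexpansive. A quick numeric check with our own toy set Ω = [0, 1] ⊂ R:

```python
# The distance function to an interval is 1-Lipschitz (nonexpansive);
# Omega = [0, 1] is our own toy example.

def d(x, omega=(0.0, 1.0)):
    """Distance from x to the interval omega = [lo, hi]."""
    lo, hi = omega
    return max(lo - x, 0.0, x - hi)

pts = [-2.0, -0.5, 0.0, 0.3, 1.0, 1.7, 4.0]
for x in pts:
    for y in pts:
        # |d(x) - d(y)| <= |x - y| holds for every pair of points.
        assert abs(d(x) - d(y)) <= abs(x - y) + 1e-12
```

The one-line formula max(lo − x, 0, x − hi) covers the three cases: x left of the interval, inside it, or to the right of it.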
Proof. Using the definition of the distance function and the triangle inequality for the norm on X yields
d(x + y; Ω1 + Ω2) = inf{∥(x + y) − (ω1 + ω2)∥ | ω1 ∈ Ω1 and ω2 ∈ Ω2}
≤ ∥(x + y) − (ω1 + ω2)∥ ≤ ∥x − ω1∥ + ∥y − ω2∥
for all ω1 ∈ Ω1 and ω2 ∈ Ω2. This readily implies that
d(x + y; Ω1 + Ω2) ≤ inf{∥x − ω1∥ | ω1 ∈ Ω1} + inf{∥y − ω2∥ | ω2 ∈ Ω2}
= d(x; Ω1) + d(y; Ω2),
which is what we claimed in this proposition.
Let us further study the notion of projections to sets associated with the distance function (3.72). Given an element x ∈ X and a subset Ω of X, the projection from x to Ω is defined by
Π(x; Ω) := {ω ∈ Ω | d(x; Ω) = ∥x − ω∥}. (3.74)
Note that in general the mapping x → Π(x; Ω) is set-valued on X and
may take empty values. The next proposition lists some sufficient conditions
ensuring the nonemptiness of this mapping.
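In the simplest convex cases the projection (3.74) is a singleton with a closed form. A small sketch with our own examples (the clipping and radial-scaling formulas are the standard closed forms for a box and for the Euclidean ball):

```python
# Closed-form projections onto two elementary convex sets in R^n
# (our own illustrative examples of the projection mapping (3.74)).

import numpy as np

def project_box(x, lo, hi):
    """Unique projection of x onto the box [lo, hi], componentwise."""
    return np.clip(x, lo, hi)

def project_ball(x, r=1.0):
    """Unique projection of x onto the Euclidean ball ||x|| <= r."""
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

p = project_box(np.array([2.0, -3.0]),
                np.array([0.0, 0.0]), np.array([1.0, 1.0]))
assert np.allclose(p, [1.0, 0.0])

q = project_ball(np.array([3.0, 4.0]))
assert np.allclose(q, [0.6, 0.8])
# Defining property of (3.74): no spot-checked point of the ball is
# closer to (3, 4) than the projection q.
assert (np.linalg.norm(np.array([3.0, 4.0]) - q)
        <= np.linalg.norm(np.array([3.0, 4.0]) - np.array([0.0, 1.0])))
```

Both sets are closed and convex in a finite-dimensional space, so the nonemptiness (indeed uniqueness) of the projection is guaranteed by the proposition that follows.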
Proof. In case (a) the result follows immediately from the Weierstrass exis-
tence theorem due to the continuity of the distance function.
To proceed in case (b), for every k ∈ N find ωk ∈ Ω such that
d(x; Ω) ≤ ‖x − ωk‖ < d(x; Ω) + 1/k.
The sequence {ωk} ⊂ Ω is bounded in the finite-dimensional space X since
‖ωk‖ ≤ ‖x‖ + d(x; Ω) + 1.
Thus it has a convergent subsequence whose limit belongs to Ω due
to the assumed closedness of the set.
Finally, consider the remaining case (c). As is well known and discussed
above, every closed, bounded, and convex set in a reflexive Banach space is
weakly sequentially compact, and the norm function is weakly lower semicontinuous.
Thus in this case we deduce the nonemptiness of Π(x; Ω) by applying the
Weierstrass existence theorem in the weak topology of X.
Let us present an example showing that the convexity assumption on Ω is
essential for the fulfillment of Proposition 3.67(c).
The next example shows that the reflexivity assumption on the Banach
space X in Proposition 3.67 is also essential, even when Ω is a closed subspace
of X.
Example 3.69 Consider the (nonreflexive) Banach space C[0, 1] of all the
real-valued continuous functions defined on [0, 1] with the norm given by
‖f‖ := max{|f(t)| | t ∈ [0, 1]} for f ∈ C[0, 1].
Define the closed subspace Ω of C[0, 1] by
Ω := {f ∈ C[0, 1] | f(1) = 0 and ∫₀¹ f(t) dt = 0}.
Proof. To verify the convexity of d(·; Ω), fix x1, x2 ∈ X and t ∈ (0, 1). Given
any ε > 0, we find ωi ∈ Ω for i = 1, 2 such that
‖xi − ωi‖ < d(xi; Ω) + ε for i = 1, 2.
The convexity of Ω ensures that tω1 + (1 − t)ω2 ∈ Ω, and thus
d(tx1 + (1 − t)x2; Ω) ≤ ‖tx1 + (1 − t)x2 − [tω1 + (1 − t)ω2]‖
≤ t‖x1 − ω1‖ + (1 − t)‖x2 − ω2‖
< t d(x1; Ω) + (1 − t)d(x2; Ω) + ε.
Letting now ε ↓ 0 implies that
d(tx1 + (1 − t)x2; Ω) ≤ t d(x1; Ω) + (1 − t)d(x2; Ω),
which verifies the convexity of the distance function (3.72).
Conversely, suppose that d(·; Ω) is a convex function while Ω is a closed
set. Fixing ω1, ω2 ∈ Ω and λ ∈ (0, 1), we get
d(λω1 + (1 − λ)ω2; Ω) ≤ λ d(ω1; Ω) + (1 − λ)d(ω2; Ω) = 0.
The closedness of Ω yields λω1 + (1 − λ)ω2 ∈ Ω, and thus Ω is convex.
Proof. Suppose that ω̄ ∈ Π(x; Ω). Then for any ω ∈ Ω and λ ∈ (0, 1), we have
ω̄ + λ(ω − ω̄) ∈ Ω by convexity of Ω. Applying the properties of inner products
and norms in Hilbert spaces gives us
[d(x; Ω)]² = ‖x − ω̄‖² ≤ ‖x − [ω̄ + λ(ω − ω̄)]‖²
= ⟨x − ω̄ − λ(ω − ω̄), x − ω̄ − λ(ω − ω̄)⟩
= ‖x − ω̄‖² − 2λ⟨x − ω̄, ω − ω̄⟩ + λ²‖ω − ω̄‖².
It follows therefore that
2⟨x − ω̄, ω − ω̄⟩ ≤ λ‖ω − ω̄‖²,
which verifies the “only if” part (3.75) by letting λ ↓ 0.
Conversely, suppose that (3.75) holds. Then take ω ∈ Ω to get
‖x − ω‖² = ‖(x − ω̄) + (ω̄ − ω)‖²
= ‖x − ω̄‖² + 2⟨x − ω̄, ω̄ − ω⟩ + ‖ω̄ − ω‖²
= ‖x − ω̄‖² − 2⟨x − ω̄, ω − ω̄⟩ + ‖ω̄ − ω‖²
≥ ‖x − ω̄‖² + ‖ω̄ − ω‖² ≥ ‖x − ω̄‖².
Since this holds for all ω ∈ Ω, we have ω̄ ∈ Π(x; Ω).
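When the projection is known in closed form, the characterization ⟨x − ω̄, ω − ω̄⟩ ≤ 0 for all ω ∈ Ω can be tested directly. A sketch in Python with Ω the closed unit disk in R² (an illustrative convex set, with Monte Carlo sampling of ω):

```python
import math
import random

def proj(x):
    # closed-form projection of x onto Omega = closed unit disk in R^2
    n = math.hypot(x[0], x[1])
    return x if n <= 1.0 else (x[0] / n, x[1] / n)

x = (3.0, 4.0)
w_bar = proj(x)  # the unique projection (0.6, 0.8)

# check <x - w_bar, w - w_bar> <= 0 over random points w of Omega
random.seed(1)
ok = True
for _ in range(1000):
    w = (random.uniform(-1, 1), random.uniform(-1, 1))
    if math.hypot(w[0], w[1]) > 1.0:
        continue  # keep only samples belonging to Omega
    ip = (x[0] - w_bar[0]) * (w[0] - w_bar[0]) + (x[1] - w_bar[1]) * (w[1] - w_bar[1])
    ok = ok and ip <= 1e-9
print(w_bar, ok)
```

Geometrically, x − ω̄ here is an outward normal direction to the disk at ω̄, so the inner product with any direction ω − ω̄ into the set is nonpositive.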
Proof. It follows from the estimate (3.76) in Proposition 3.74 by applying the
Cauchy-Schwarz inequality to the right-hand side of (3.76).
for any x ∈ X via the indicator function δΩ (x) := δ(x; Ω). If x ∈ Ω, then
d(x; Ω) = 0 = δΩ (x) + p(0), and therefore we arrive at
∂d(x; Ω) = ∂δΩ (x) ∩ ∂p(0) = N (x; Ω) ∩ B∗
by Theorem 3.76 and the above subdifferential computations for the indicator
and norm functions. This verifies (a). The proof of (b) is similar.
Proof. The above formula is proved in Proposition 3.77 in the case where
x ∈ Ω. Consider now the case where x ∉ Ω and observe by Proposition 3.70
that in this case the metric projection Π(x; Ω) is a (nonempty) singleton that
Proof. Since x ∉ Ωr, we get d(x; Ω) > r. Fix u ∈ Ωr and for any ε > 0 find
uε ∈ Ω satisfying the estimates
‖u − uε‖ ≤ d(u; Ω) + ε ≤ r + ε.
This obviously implies that
‖u − x‖ ≥ ‖uε − x‖ − ‖u − uε‖ ≥ d(x; Ω) − ‖uε − u‖ ≥ d(x; Ω) − r − ε.
Since the estimate ‖u − x‖ ≥ d(x; Ω) − r − ε holds for all u ∈ Ωr and all ε > 0,
we arrive at the inequality
d(x; Ωr) ≥ d(x; Ω) − r.
To verify the opposite inequality in (3.82), fix u ∈ Ω and define the con-
tinuous function f : R+ → R by
f(t) := d(tx + (1 − t)u; Ω), t ≥ 0.
Since f(0) = 0 and f(1) = d(x; Ω) > r > 0, there exists t0 ∈ (0, 1) with f(t0) = r
by the classical intermediate value theorem. Putting now v := t0 x + (1 − t0)u,
we have d(v; Ω) = r and ‖x − u‖ = ‖x − v‖ + ‖v − u‖, which implies in turn by
using u ∈ Ω and v ∈ Ωr that
‖x − u‖ ≥ ‖x − v‖ + d(v; Ω) = ‖x − v‖ + r.
The latter yields ‖x − u‖ ≥ d(x; Ωr) + r for all u ∈ Ω and thus verifies (3.82).
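With Ωr standing for the enlargement {u ∈ X | d(u; Ω) ≤ r} as above, formula (3.82) is easy to confirm on sets where both distances are explicit. For the closed unit disk in R² (an illustrative choice), Ωr is simply the disk of radius 1 + r:

```python
import math

# Omega = closed unit disk in R^2, so that Omega_r = disk of radius 1 + r
def d_omega(x):
    return max(math.hypot(x[0], x[1]) - 1.0, 0.0)

def d_omega_r(x, r):
    return max(math.hypot(x[0], x[1]) - (1.0 + r), 0.0)

x, r = (3.0, 4.0), 0.5          # ||x|| = 5, so x lies outside Omega_r
print(d_omega_r(x, r))          # 3.5
print(d_omega(x) - r)           # 3.5, matching d(x; Omega_r) = d(x; Omega) - r
```

Both quantities equal 3.5 here, in agreement with (3.82) for points outside the enlargement.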
Proof. Fix any u∗ ∈ ∂d(x̄; Ω) and get (3.83) by the subdifferential definition.
Thus d(x; Ω) ≤ r = d(x̄; Ω) whenever x ∈ Ωr, and hence
⟨u∗, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) ≤ 0 for all x ∈ Ωr,
i.e., u∗ ∈ N(x̄; Ωr). The fact that u∗ ∈ S∗ is proved in Lemma 3.81. The proof
of the reverse inclusion is left as an exercise; see Theorem 6.63.
Theorem 3.84 Let P and Ω be two nonempty convex subsets of Rn such that P
is a convex polyhedron. Then there exists a (closed) hyperplane separating P
and Ω and not containing Ω if and only if
P ∩ ri(Ω) = ∅. (3.85)
Proof. First we verify that the separation property formulated in the theorem
implies that condition (3.85) is satisfied. To this end, assume that there exists
a hyperplane H defined by
H := {x ∈ Rn | ⟨v, x⟩ = α} for some v ∈ Rn, v ≠ 0, and α ∈ R
that separates P and Ω and does not contain Ω. Since H separates P and Ω,
we have without loss of generality that
⟨v, x⟩ ≤ α ≤ ⟨v, p⟩ for all x ∈ Ω and p ∈ P. (3.86)
Remembering that Ω ⊄ H gives us a vector x̄ ∈ Ω such that
⟨v, x̄⟩ < α ≤ ⟨v, p⟩ for all p ∈ P.
Observe furthermore the relationships
⟨v, x⟩ ≤ α = ⟨v, u⟩ for all x ∈ Ω and u ∈ H
and also that the above vector x̄ ∈ Ω satisfies the strict inequality
Since H0 does not contain Ω and since Ω ⊂ Θ, we have that H0 also does not
contain Θ. This tells us by arguing as in the case of (3.87) that the sets H0
and Θ can be properly separated. Thus
H0 ∩ ri(Θ) = ri(H0 ) ∩ ri(Θ) = ∅
by the characterization of Theorem 2.92. Assuming now that D ∩ ri(Θ) ≠ ∅,
pick any x ∈ D ∩ ri(Θ) and then obtain that
⟨v0, x⟩ ≤ α0 and α0 ≤ ⟨v0, x⟩,
which yields x ∈ H0. This contradicts the condition H0 ∩ ri(Θ) = ∅ and hence
shows that D ∩ ri(Θ) = ∅. If furthermore x ∈ P ∩ ri(Θ), then x ∈ P ∩ aff(Ω) = D
since ri(Θ) ⊂ aff(Ω). This means that x ∈ D ∩ ri(Θ), contradicting the fact
that D ∩ ri(Θ) = ∅. Thus we arrive at
P ∩ ri(Θ) = ∅. (3.88)
If moreover P ∩Θ = ∅, then—remembering that both these sets are polyhedral
and arguing as above in case (3.87)—we find a hyperplane H, which strictly
separates Θ and P and provides the inclusions
Ω ⊂ Θ ⊂ int (H − ) and P ⊂ int (H + ).
We see that this hyperplane H separates P and Ω and does not contain Ω.
Consider next the remaining case where P ∩ Θ ≠ ∅ and suppose without
loss of generality that 0 ∈ P ∩ Θ. Since Θ ⊂ aff(Ω), we have
0 ∈ P ∩ Θ ⊂ P ∩ aff(Ω) = D, and so 0 ∈ Θ ∩ D ⊂ H0⁻ ∩ H0⁺ = H0.
Using 0 ∈ H0 gives us α0 = 0, i.e., H0 = {x ∈ Rn | ⟨v0, x⟩ = 0}.
Now we verify that 0 ∉ int(P). Indeed, suppose on the contrary that
0 ∈ int(P) and then get by 0 ∈ Θ that [x, 0) ⊂ ri(Θ) for each x ∈ ri(Θ); see
Theorem 2.83. Since 0 ∈ int(P) clearly yields int(P) ∩ [x, 0) ≠ ∅, this shows
that int(P) ∩ ri(Θ) ≠ ∅. The latter contradicts condition (3.88)
and hence confirms that 0 ∉ int(P). Furthermore, taking into account that
P is a convex polyhedron with 0 ∈ P \ int(P), we represent P in the form
P = {x ∈ Rn | ⟨ui, x⟩ ≤ 0, i = 1, …, m} ∩ {x ∈ Rn | ⟨uj, x⟩ ≤ βj, j = m + 1, …, m̃},
[Figure: the hyperplane H0 with its half-spaces H0⁺ and H0⁻, together with the sets D and M]
K := {x ∈ Rn | ⟨ui, x⟩ ≤ 0, i = 1, …, m} + M = cone(P) + M, (3.89)
we claim that K ∩ ri(Θ) = ∅. Indeed, suppose on the contrary that there exists
a vector x ∈ K ∩ ri(Θ) and deduce from x ∈ K and x ∉ M that
x = γw + u for some w ∈ P, u ∈ M, and γ > 0.
It follows therefore that
w = γ⁻¹x − γ⁻¹u ∈ P.
Using x ∈ ri(Θ) and −u ∈ M ⊂ Θ together with the subspace property of M, we obtain
λx − (1 − λ)u ∈ ri(Θ) whenever λ ∈ (0, 1). (3.90)
Since Θ is a cone, ri(Θ) is a cone as well. Combining this with (3.90) yields
(λ/(1 − λ))x − u ∈ ri(Θ) for all λ ∈ (0, 1),
which is equivalent to the inclusion
3.4 Generalized Differentiation under Polyhedrality 237
is polyhedral is that relative interiors of polyhedral sets are not needed in the
characterization condition (3.85).
Next we establish an extension of Theorem 3.84 to infinite dimensions. To
proceed, we need the following useful lemma that holds in LCTV spaces.
Proof. We provide here a direct proof of the inclusion “⊂” without requiring
that qri(Ω) ≠ ∅ and without the LCTV assumption on X. Fix any x ∈ qri(Ω) and
get from the definition that cl cone(Ω − x) is a linear subspace of X. Then
A(cl cone(Ω − x)) is a linear subspace of Rn, and hence it is closed. Thus
cl cone(A(Ω − x)) = cl A(cone(Ω − x)) ⊂ A(cl cone(Ω − x)).
Since the opposite inclusion is satisfied by the continuity of A, we have
cl cone(A(Ω − x)) = A(cl cone(Ω − x)),
which implies that Ax ∈ qri(A(Ω)) = ri(A(Ω)). This justifies the inclusion
“⊂” in (3.91). Since A(Ω) ⊂ Rn is obviously quasi-regular, the opposite inclu-
sion follows directly from Theorem 2.183.
Proof. Assume that the sets P and Ω can be separated by a closed hyperplane
H with Ω ⊂ H. Then there exist f ∈ X ∗ and α ∈ R such that
sup{f(x) | x ∈ P} ≤ α ≤ inf{f(x) | x ∈ Ω} and α < sup{f(x) | x ∈ Ω}. (3.92)
We see that the functions fi are well-defined with fi ∈ (X/L)∗ for all i =
1, . . . , m. Remembering the construction of the quotient map π : X → X/L
from (1.3) gives us easily that
π(P) = {[x] ∈ X/L | fi([x]) ≤ bi, i = 1, …, m},
and hence the set π(P) is a convex polyhedron in X/L. Since X/L is finite-
dimensional, by Lemma 3.85 we have π(qri(Ω)) = ri(π(Ω)). Assuming now
that π(P) ∩ ri(π(Ω)) ≠ ∅, i.e., that there is x ∈ qri(Ω) with [x] ∈ π(P), yields
fi (x) = fi ([x]) ≤ bi for all i = 1, . . . , m,
Having in hand the polyhedral separation result from Theorem 3.86 in LCTV
spaces, we now utilize it to develop calculus rules of generalized differentiation.
Following the geometric approach of this book, we first focus on
establishing the normal cone intersection rule for two convex subsets of an
LCTV space, one of which is polyhedral. The obtained result extends
the polyhedral intersection rule in finite dimensions by replacing the relative
interior with its quasi-relative counterpart.
Theorem 3.87 Let P and Ω be nonempty convex subsets of an LCTV space
X, where P is a convex polyhedron. Assuming that
P ∩ qri(Ω) ≠ ∅, (3.93)
we have the following normal cone intersection rule:
N(x̄; P ∩ Ω) = N(x̄; P) + N(x̄; Ω) for all x̄ ∈ P ∩ Ω. (3.94)
Proof. Fix x̄ ∈ P ∩ Ω and x∗ ∈ N(x̄; P ∩ Ω), and then get by definition that
⟨x∗, x − x̄⟩ ≤ 0 for all x ∈ P ∩ Ω.
Define further two convex sets in the product space X × R by
Q := {(x, λ) ∈ X × R | x ∈ P, λ ≤ ⟨x∗, x − x̄⟩} and Θ := Ω × [0, ∞). (3.95)
It is easy to see that qri(Θ) = qri(Ω) × (0, ∞) and that the set Q is a convex
polyhedron. Moreover, we can deduce from the constructions of Q, Θ in
(3.95) and the choice of x∗ that Q ∩ qri(Θ) = ∅. The separation results of
Theorem 3.86 give us a nonzero pair (w∗, γ) ∈ X∗ × R and α ∈ R such that
⟨w∗, x⟩ + λ1γ ≤ α ≤ ⟨w∗, y⟩ + λ2γ for all (x, λ1) ∈ Q, (y, λ2) ∈ Θ. (3.96)
x∗ = w∗/γ + (x∗ − w∗/γ) ∈ N(x̄; P) + N(x̄; Ω),
which verifies the inclusion “⊂” in (3.94) and thus completes the proof of the
theorem, since the opposite inclusion is trivial.
In this section we establish major calculus rules for coderivatives of convex set-
valued mappings and subdifferentials of convex extended-real-valued functions
in LCTV spaces under certain polyhedrality assumptions. These assumptions
allow us to significantly improve the previous qualification conditions used
for coderivative and subdifferential calculi in LCTV spaces in nonpolyhedral
settings. According to the geometric approach, the driving force of our deriva-
tions here is the application of the polyhedral intersection rule for normal
cones to convex sets obtained above in Theorem 3.87.
We start with the following notions of polyhedral set-valued mappings and
extended-real-valued functions in topological vector spaces.
Definition 3.88 Let X and Y be topological vector spaces.
(a) A mapping F : X ⇒ Y is said to be a polyhedral set-valued map-
ping/multifunction if its graph is a convex polyhedron in X × Y.
(b) An extended-real-valued function f : X → R is said to be a polyhedral
function if its epigraph epi(f ) is a convex polyhedron in X × R.
The first result of this subsection presents a coderivative sum rule for two
set-valued mappings, one of which is polyhedral. The proof is based on the
application of the polyhedral normal cone intersection rule from Theorem 3.87.
Theorem 3.89 Consider two convex set-valued mappings F1, F2 : X ⇒ Y
between LCTV spaces and impose the following graphical quasi-relative inte-
rior qualification condition: there exists a triple (x, y1, y2) ∈ X × Y × Y such
that
(x, y1) ∈ gph(F1) and (x, y2) ∈ qri(gph(F2)). (3.97)
Assuming in addition that the set-valued mapping F1 is polyhedral, we have
the coderivative sum rule
D∗(F1 + F2)(x̄, ȳ)(y∗) = D∗F1(x̄, ȳ1)(y∗) + D∗F2(x̄, ȳ2)(y∗) (3.98)
valid for all (x̄, ȳ) ∈ gph(F1 + F2), for all y∗ ∈ Y∗, and for all (ȳ1, ȳ2) ∈
S(x̄, ȳ), where the set-valued mapping S is defined in (3.28).
Proof. Fix any x∗ ∈ D∗(F1 + F2)(x̄, ȳ)(y∗) and get by Definition 3.19 that
(x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 + F2)). For every (ȳ1, ȳ2) ∈ S(x̄, ȳ) form two
convex subsets of the LCTV space X × Y × Y by
Ω1 := {(x, y1, y2) ∈ X × Y × Y | y1 ∈ F1(x)},
Ω2 := {(x, y1, y2) ∈ X × Y × Y | y2 ∈ F2(x)}.
Our next goal here is to obtain efficient conditions that ensure the fulfill-
ment of the graphical quasi-relative interior qualification condition (3.97) and
hence of the coderivative sum rule (3.98) derived in Theorem 3.89. Observe to
this end that it would be much easier to check relative interior-type conditions
for domains of mappings than for their graphs. To achieve this goal, we use in
what follows the corresponding relationships for generalized relative interiors
established in Section 2.5.
The next theorem provides a refined coderivative chain rule in the poly-
hedrality setting of LCTV spaces.
Exercise 3.96 Check that the normal cone inclusion (3.7) holds for any convex sets
in arbitrary topological vector spaces.
0 ∈ int(Ω1 − Ω2 ) (3.106)
ensures the validity of the bounded extremality condition (3.11) provided that
Ω2 is bounded.
(b) Clarify whether the bounded extremality condition (3.11) is implied by the
standard qualification condition (3.9) in arbitrary topological vector spaces.
Exercise 3.99 Verify that in the proof of Theorem 3.22 we have the following:
(a) Ω1 − Ω2 = (dom(F1) − dom(F2)) × Y × Y.
(b) (x∗, −y∗, −y∗) ∈ N((x̄, ȳ1, ȳ2); Ω1 ∩ Ω2).
Exercise 3.100 Verify that in the proof of Theorem 3.23 we have the following:
(a) Ω1 − Ω2 = X × (rge(F) − dom(G)) × Z.
(b) (x∗, 0, −z∗) ∈ N((x̄, ȳ, z̄); Ω1 ∩ Ω2).
Exercise 3.101 Give a direct proof of (3.43) for convex functions and investigate
the possibility to extend this to topological vector spaces.
and also the symmetric subdifferential ∂⁰f(x) := ∂f(x) ∪ ∂⁺f(x). Prove that
Hint: Compare with the proof of [228, Theorem 1.93] in the case of normed spaces.
Exercise 3.104 Prove the subdifferential sum rule for finitely many functions for-
mulated in Corollary 3.49.
Exercise 3.105 (a) Prove the subdifferential sum rules
∂(f1 + f2)(x) = ∂f1(x) + ∂f2(x) and ∂∞(f1 + f2)(x) = ∂∞f1(x) + ∂∞f2(x)
for both the basic subdifferential (3.39) and the singular subdifferential (3.41) under
the singular subdifferential qualification condition
∂∞f1(x) ∩ (−∂∞f2(x)) = {0}. (3.109)
(b) Obtain an extension of (a) to the case of finitely many convex functions.
(c) Compare the qualification condition in (a) and (b) with the relative interior
qualification condition (3.60) in Theorem 3.50.
(d) Derive the subdifferential sum rules of Theorem 3.50 and of parts (a) and (b)
of this exercise from the corresponding sum rules for coderivatives by using the
relationships in (3.42).
(e) Obtain appropriate versions of the results in (a)–(d) for convex functions on
topological vector spaces.
Exercise 3.106 Find the subdifferential formulas for the following functions:
(a) f (x) := 3|x|, x ∈ R.
(b) f (x) := |x − 2| + |x + 2|, x ∈ R.
(c) f(x) := max{e^{−2x}, e^{2x}}, x ∈ R.
Exercise 3.107 Let f (x1 , x2 ) := max{|x1 |, |x2 |} for (x1 , x2 ) ∈ R2 . Calculate the
subdifferentials ∂f (0, 0) and ∂f (1, 1).
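A numeric way to sanity-check answers to Exercises 3.106 and 3.107 is to test candidate subgradients against the defining inequality f(x) ≥ f(x̄) + ⟨v, x − x̄⟩ at randomly sampled points. The sketch below does this for the function of Exercise 3.107; the sampling box and the candidate vectors are illustrative assumptions, and a “True” outcome is only consistent with, not a proof of, v ∈ ∂f(x̄):

```python
import random

def f(x1, x2):
    return max(abs(x1), abs(x2))

def looks_like_subgradient(v, x_bar, trials=2000, tol=1e-9):
    # v in the subdifferential of f at x_bar requires
    # f(x) >= f(x_bar) + <v, x - x_bar> for all x; we test sampled x only
    random.seed(0)
    for _ in range(trials):
        x = (random.uniform(-5, 5), random.uniform(-5, 5))
        lin = f(*x_bar) + v[0] * (x[0] - x_bar[0]) + v[1] * (x[1] - x_bar[1])
        if f(*x) < lin - tol:
            return False
    return True

print(looks_like_subgradient((0.5, 0.5), (0.0, 0.0)))  # True
print(looks_like_subgradient((1.0, 1.0), (0.0, 0.0)))  # False
print(looks_like_subgradient((0.5, 0.5), (1.0, 1.0)))  # True
```

These outcomes agree with what the subdifferential maximum rule of Corollary 3.60 predicts at the two points in question.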
Exercise 3.108 Let X be a normed space, and let F be a closed bounded convex
set containing the origin in its interior. Prove the following subdifferential formula
for the Minkowski gauge function pF :
∂pF(x) = F° if x = 0, and
∂pF(x) = {x∗ ∈ X∗ | pF°(x∗) = 1, ⟨x∗, x⟩ = pF(x)} otherwise,
where F° is the polar set of F.
Exercise 3.109 Derive chain rules for both basic and singular subdifferentials from
the corresponding chain rules for coderivatives in topological vector spaces and finite-
dimensional spaces. Hint: Use the relationships in (3.42).
(b) Specify the results of (a) in the case of finite-dimensional spaces X and Y .
Hint: Use the subdifferential chain rules from Theorems 3.55 and 3.56 together with
their singular subdifferential counterparts.
(b) Extend the general (not subdifferential) properties of the basic distance function
(3.72) obtained in Subsection 3.3.5 to the generalized one (3.110) under suitable
assumptions on Θ. Hint: Compare with [315, Theorem 2.3] in finite dimensions
and with [229, Theorem 1.41] in normed spaces.
(c) Find conditions on Θ ensuring the convexity of (3.110) in both variables.
(d) Assuming that (x, y) → d(x; Θ(y)) is convex, establish appropriate counterparts
of the subdifferential results from Subsection 3.3.5 in the case of the generalized
distance function (3.110).
(e) Under the convexity of (3.110) in both variables (x, y), derive upper estimates
and precise formulas for calculating the singular subdifferential ∂ ∞ d(x; Θ(y))
of (3.110) at (x, y) ∈ X × Y considering both the in-set case (x, y) ∈ gph(Θ)
and the out-of-set case (x, y) ∉ gph(Θ).
Hint: For (d) and (e), specify and improve the corresponding results of [231, 233]
provided that the function (x, y) → d(x; Θ(y)) is convex.
Exercise 3.114 Let all the spaces under consideration in this chapter be real vector
spaces without topology with their algebraic dual spaces defined as in (3.107).
(a) Prove the intersection rule for the normal cone (3.108) in real vector spaces by
using the characterization of set extremality from Exercise 3.98.
(b) Formulate and verify vector space counterparts of the calculus results given in
Subsections 3.2.2, 3.3.2, 3.3.3, and 3.3.4 by proceeding similarly to the proofs
therein and replacing set interiors with cores.
(c) Clarify the possibility to derive the results from (b) in vector spaces by reducing
them to those obtained in topological vector spaces with usage of convex core
topology discussed in Exercise 2.207.
Exercise 3.115 Let Ω1 , Ω2 ⊂ Rn be two nonempty polyhedral sets such that Ω1 ∩
Ω2 = ∅. Prove that they can be strictly separated. Hint: Reduce to the description of
strict separation in [306, Theorem 11.4] and compare with Corollary 19.3.3 therein.
Exercise 3.116 Let Ω be a nonempty convex polyhedron in a topological vector
space X. Prove that ri(Ω) ≠ ∅. Hint: Proceed by the definition of polyhedral sets
and cf. [42, Proposition 2.197].
Exercise 3.117 Obtain an extension of Theorem 3.87 to the case of intersections
of finitely many convex sets in topological vector spaces, where all but one of the
sets are polyhedral. Hint: Proceed by induction, first deriving the intersection
rule for two sets that are both convex polyhedra.
3.6 Commentaries to Chapter 3 249
Exercise 3.119 Extend the coderivative and subdifferential sum rules from Theo-
rem 3.89 and Corollary 3.50, respectively, to the cases of finitely many terms in
the corresponding summation.
Exercise 3.120 Derive efficient conditions for the fulfillment of the polyhedral chain
rule in Theorem 3.92 without using the quasi-relative interiors of the graphs. Hint:
Proceed similarly to the proof of Theorem 3.91.
Exercise 3.121 Develop calculus rules of convex generalized differentiation for nor-
mals to sets, coderivatives of set-valued mappings, and subgradients of extended-
real-valued functions in LCTV spaces under extended relative interior qualification
conditions without any polyhedrality assumptions. Hint: Start with the normal cone
intersection rule and employ the proper separation result of Theorem 2.184 as in
the proof of Theorem 3.15.
The theory of generalized functions by Sergei Sobolev (1908–1989) and the theory
of distributions by Laurent Schwartz (1915–2002) develop and apply appropriate
notions of generalized derivatives. However, those notions have nothing to do with
what is needed for optimization. Indeed, the generalized derivatives in the sense of
Sobolev and Schwartz concern equivalence classes of functions and are defined up to
sets of measure zero. On the other hand, the main interest in convex optimization is
drawn to individual points, where the minimum value of a function is attained. For
instance, the function ϕ(x) = |x| is nondifferentiable only at the point 0 ∈ R, but
this is precisely the point where ϕ attains its minimum value.
The concept of the subdifferential as the collection of subgradients and thus of
set-valued subgradient mappings has been a breakthrough idea in mathematics that
was not accepted right from the beginning (as Rockafellar told the first author)
but was later recognized as powerful machinery to investigate and solve numerous
problems appearing in optimization and many other areas of applied mathematics.
Of course, this became possible only after the development of an adequate subdifferential
calculus and of subgradient computations. Note that the set-valuedness of subgradient
mappings is an ultimate indication of the function's nonsmoothness and geometrically
corresponds to multiple normals to epigraphs at the reference points, i.e., to
multiple supporting hyperplanes to convex sets at kink points of the boundary.
In most publications on convex analysis, as well as those concerning more general
forms of nonsmooth and variational analysis, major properties of normal cones and
particularly calculus rules for them are derived as consequences of the corresponding
functional results of subdifferential calculus. We implement here the opposite strat-
egy following the dual-space geometric approach to variational analysis initiated by
Mordukhovich and then developed by many researchers in finite-dimensional and
Banach spaces; see the books [226, 228, 229] with the references and commentaries
therein. The underlying ingredients of this approach are the notion of set extremality
for systems of closed sets at their common points introduced by Kruger and Mor-
dukhovich in [191] and its dual descriptions via the extremal principles; see [228, 229]
for more discussions. The extension of set extremality to arbitrary set systems (with
possibly empty intersections without closedness requirements) in topological vector
spaces is given in Definition 3.6, which is taken from our recent paper [239]. Neces-
sary and sufficient conditions for set extremality and its relationships with convex
separation are also taken from [239] on which the entire Section 3.1 (except Subsec-
tion 3.1.4) is based; see also [88, 89, 242] for further developments and applications.
Subsection 3.1.4 presents the classical intersection rule for normal cones to convex
sets in finite dimensions, which is generally different from the above intersection rule
in topological vector spaces. The proof of the finite-dimensional intersection rule of
Theorem 3.15 is taken from our publications [237, 238].
For some reason, generalized derivatives and coderivatives of single-valued and
set-valued mappings were not introduced and utilized in basic convex analysis. To
the best of our knowledge, the first notions of this type for convex-graph multifunc-
tions were defined by Pshenichnyi [295] (see also his book [296]) via the tangent
cone to the associated graphical set and considering then the dual object called the
locally conjugate mapping at the reference point of the graph. Similar constructions
were introduced by Aubin [11] under the names of the graphical derivative and
codifferential, respectively; see also the books by Aubin and Ekeland [13] and Aubin
and Frankowska [14] for further studies and applications. The coderivative notion
from Definition 3.19 was introduced by Mordukhovich [224] via his (limiting) nor-
mal cone defined earlier in [223] without any appeal to tangential approximations of
the graph. In fact, this coderivative cannot be dual to a graphical derivative, since
Mordukhovich’s normal cone is generally nonconvex and hence cannot be generated
in duality by tangential approximations.
In contrast to the graphical derivatives and their dual constructions initiated
by Pshenichnyi and Aubin, the coderivative of Mordukhovich is robust and enjoys
full coderivative calculus for general multifunctions between finite-dimensional and
Banach (mainly Asplund) spaces under unrestrictive qualification conditions; see
the aforementioned books [226, 228, 229] and the book by Rockafellar and Wets
[317] in finite dimensions with the references and discussions therein. The reader is
also referred to the developments of Borwein and Zhu [54], Ioffe [171], Jourani and
Thibault [177], and Penot [288] on calculus rules for coderivatives defined by scheme
(3.27) via other normal cones in suitable Banach space frameworks.
The proofs of the coderivative calculus rules given in Section 3.2 follow the
pattern of the geometric dual-space approach from [226, 228, 229] but lead us to
essentially stronger results for coderivatives of convex multifunctions in comparison
with those obtained for general nonconvex mappings. First of all, we cover arbi-
trary topological vector spaces without any closedness assumptions on the involved
multifunctions, and—most importantly—the obtained calculus rules and the corre-
sponding qualification conditions for convex multifunctions significantly strengthen
known nonconvex rules and cannot be derived from the latter by specifying them
to convex-graph mappings even in finite dimensions. The topological vector space
material of Section 3.2 is based on our paper with Rector and Tran [242], while
the finite-dimensional Subsection 3.2.3 follows our previous publication [238]. The
recent papers [88, 89] provide extensions of the coderivative calculus for convex mul-
tifunctions between general topological vector spaces with qualification conditions
formulated in terms of cores.
Subdifferential theory for convex functions is one of the most understood and
complete parts of convex analysis with a profound influence on developing general-
ized differentiation of nonconvex functions in different settings of variational analysis
and its applications. We have discussed above the origin of convex subdifferentia-
tion, which is now available for extended-real-valued functions in finite-dimensional
and infinite-dimensional frameworks. Various aspects of convex subdifferentiation
can be found in the books by Bauschke and Combettes [34], Borwein and Lewis
[48], Ekeland and Temam [122], Hiriart-Urruty and Lemaréchal [164, 165], Ioffe and
Tikhomirov [174], Kusraev and Kutateladze [197], Mordukhovich and Nam [237],
Phelps [290], Rockafellar [306], and Zălinescu [361] among other publications.
It seems, however, that the notion of the singular/horizon subdifferential from
Definition 3.31 did not appear in the aforementioned publications on convex anal-
ysis, although some ideas on the behavior of convex sets and functions at infinity had
been explored in the study of convexity via horizon cones and functions; see, e.g.,
Rockafellar and Wets [317, Chapter 3]. Probably the main reason for the absence of this
construction in basic convex analysis was the intrinsic Lipschitz continuity of a
convex function on the interior of its domain, where the singular subdifferential is
trivial, which is not the case for the domain boundary; see Theorems 3.34 and 3.37.
Singular subgradients of extended-real-valued functions (both convex and non-
convex) naturally appear while considering normal cones to epigraphs that are
decomposed into nonhorizontal and horizontal normals; the latter generate the
singular subgradients. To the best of our knowledge, this has been first explored
in the early work by Kruger and Mordukhovich [190] and Mordukhovich [224]; see
also the books [226, 228, 229] for more details. The constructions of the subdiffer-
ential and singular subdifferential can be unified by using the coderivative of the
graphical multifunction as in (3.43), which allows us to conduct their parallel stud-
ies. In another way, singular subgradients were defined in finite-dimensional spaces
by Rockafellar [313, 316] as “singular proximal limiting subgradients” while being
equivalent to the pattern of (3.41) in finite dimensions.
Subsection 3.3.1 contains standard material of convex analysis, except the results
involving singular subgradients. Subsections 3.3.2–3.3.4 present basic rules of subd-
ifferential calculus in topological vector spaces and finite-dimensional spaces, which
can be found, e.g., in the books by Rockafellar [306] in finite dimensions and by
Ioffe and Tikhomirov [174] in LCTV spaces. Note that in this chapter the local
convexity of the spaces in question is not needed in our developments. The underly-
ing results of subdifferential calculus are the subdifferential sum rules whose origin
has been discussed above in this commentary section. The proofs of the results
presented here are based on the geometric approach from variational analysis by
reducing the subdifferential sum rules to the corresponding results for the normal
cone to set intersections. This is the pattern developed in [226, 228, 229] in gen-
eral nonconvex settings with significant specifications in the case of convexity by
following our publications [237, 238] in finite-dimensional spaces and [242] in the
LCTV setting. The given results and their singular subdifferential counterparts can
be deduced from the corresponding rules of the coderivative calculus. Note that the
singular subdifferential qualification condition (3.109) was introduced independently
by Mordukhovich [225] and Rockafellar [316] for different subdifferentials in general
variational analysis and was used in our book [237] for deriving both subdifferential
sum rules for convex functions in Exercise 3.105. The recent papers [88, 89] contain
geometric derivations of subdifferential calculus rules in general vector spaces with-
out topology by using qualification conditions formulated via algebraic interiors of
sets instead of topological ones.
Subsection 3.3.5 deals with distance functions, which are intrinsically nondif-
ferentiable and play a highly important role in various aspects of convex and
variational analysis as well as in their numerous applications; see, e.g., the books
[34, 48, 54, 76, 77, 171, 226, 228, 229, 237, 317] and the references therein. The
general properties of the distance functions and projection operators presented in
this subsection are well known. Theorem 3.76 on subdifferentiation of infimal con-
volutions goes back to Moreau [265, 266]. Subdifferential properties of the basic dis-
tance function (3.72) associated with closed sets in finite-dimensional and Banach
spaces have been largely investigated in the literature for major subdifferential
constructions by using variational principles in infinite dimensions. The reader can
consult the books by Mordukhovich [228, 229] with various developments, detailed
commentaries, and references for both in-set and out-of-set cases; the latter one is
much more involved. Among many publications in these directions, we particularly
mention the papers by Bounkhel and Thibault [59], Jourani and Thibault [176],
Kruger [186, 188], and Mordukhovich and Nam [231, 233] for both in-set and out-
of-set cases, as well as by Ioffe [169] and Thibault [333] for the in-set case in diverse
space settings. Note finally that our papers [231, 233] contain extended
3.6 Commentaries to Chapter 3 253
subdifferential evaluations for the generalized distance functions of type (3.110) with moving
sets Θ(y) at both in-set points (x, y) ∈ gph(Θ) and out-of-set ones (x, y) ∉ gph(Θ).
Section 3.4 addresses polyhedral convexity, whose role has been well recognized
in convex analysis. Definition 3.83 of polyhedral sets via systems of linear inequalities (3.84) is equivalent, in finite dimensions,
to the classical topological definition of convex polyhedra as finitely generated con-
vex sets due to the Minkowski-Weyl theorem; see [219, 352] and a comprehensive
account of finite-dimensional polyhedrality in the book of Rockafellar [306] with
further references. Note that the definition of polyhedrality (3.84) in finite dimensions
is equivalent to the “generalized polyhedrality” introduced and studied by Bonnans
and Shapiro [42] (see Exercise 3.118), but the equivalence fails in infinite-dimensional
spaces.
Rockafellar's fundamental separation theorem, which is given in Theorem 3.84
with a somewhat different proof, is the key to developing many issues of convex polyhedrality in finite dimensions. Its infinite-dimensional extension to LCTV spaces
is due to Kung Fu Ng and Wen Song [280], who in fact reduced the LCTV setting
to finite dimensions by using the quotient topology. The proof of Theorem 3.86 is a
certain elaboration of [280, Theorem 3.1].
The rest of Section 3.4 follows our recent paper with Cuong and Sandine [91],
where we develop the geometric approach of this book to derive calculus rules for
normals to sets and then for coderivatives of set-valued mappings and subgradients of
extended-real-valued functions. To furnish this approach in the polyhedral setting of
LCTV spaces requires using the results on generalized relative interiors presented in
Chapter 2. Observe, in particular, that the subdifferential chain rule of Corollary 3.93
not only removes the standard continuity assumption on the outer function f , but is
also free of any generalized relative interior construction. The obtained result goes
far beyond the one with the qualification condition
dom(f) ∩ sqri(AX) ≠ ∅.
The next corollary follows directly from Proposition 4.5 by using the function f defined in Example 4.4(c).
Proof. It follows from the given assumptions that 0 ∈ dom(f) with f(0) = 0.
Fix any x∗ ∈ Ω = ∂f(0) and get
⟨x∗, x⟩ ≤ f(x) for all x ∈ X.
Thus ⟨x∗, x⟩ − f(x) ≤ 0 for all x ∈ X, and ⟨x∗, 0⟩ − f(0) = 0. Then we have
f∗(x∗) = sup{⟨x∗, x⟩ − f(x) | x ∈ X} = 0.
Consider the case where x∗ ∉ Ω = ∂f(0) and find x̂ ∈ dom(f) such that
⟨x∗, x̂⟩ > f(x̂).
In this case we have
f∗(x∗) ≥ sup{⟨x∗, tx̂⟩ − f(tx̂) | t > 0} = sup{t(⟨x∗, x̂⟩ − f(x̂)) | t > 0} = ∞.
This implies that f∗(x∗) = ∞ and completes the proof.
4.1 Fenchel Conjugates 259
Proof. Taking into account Proposition 4.12, it suffices to verify the opposite
inequality therein. Fix x∗ ∈ ∂f(x) and get ⟨x∗, x⟩ = f(x) + f∗(x∗) by
Theorem 4.10. This readily implies that
f(x) = ⟨x∗, x⟩ − f∗(x∗) ≤ sup{⟨x, x∗⟩ − f∗(x∗) | x∗ ∈ X∗} = f∗∗(x),
which therefore verifies the claimed assertion.
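The biconjugate relation f∗∗ = f at points of subdifferentiability lends itself to a quick numerical sanity check. The sketch below is an illustration, not part of the text: both suprema are discretized over a finite grid for the convex function f(x) = x², for which f∗(x∗) = (x∗)²/4 and f∗∗ = f; the grid and tolerances are arbitrary choices.

```python
# Discretized Fenchel conjugation: sup's become max's over a finite grid,
# so values are accurate only up to the grid resolution.

def conjugate(h, grid):
    """Return x* -> max over the grid of x* * x - h(x) (a discretized sup)."""
    return lambda xs: max(xs * x - h(x) for x in grid)

grid = [i / 100.0 for i in range(-500, 501)]   # [-5, 5] with step 0.01

f = lambda x: x * x
f_star = conjugate(f, grid)                    # approximates (x*)^2 / 4
f_star_star = conjugate(f_star, grid)          # approximates f itself

assert abs(f_star(2.0) - 1.0) < 1e-2           # f*(2) = 2^2/4 = 1
assert abs(f_star_star(1.0) - f(1.0)) < 1e-2   # f**(1) = f(1)
```

The same check works for any proper convex function once the grid covers the region where the suprema are attained.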
The next statement immediately follows from Proposition 4.13 due to the
basic subdifferential theory of convex analysis.
Proof. First we show that A(f) ≠ ∅ provided that f is l.s.c. and convex.
Fix any x0 ∈ dom(f) and choose λ0 < f(x0). Then (x0, λ0) ∉ epi(f), where
the epigraph is a nonempty, closed, and convex subset of X × R. The convex
separation result from Theorem 2.61 ensures the existence of a pair (x∗ , γ) ∈
X ∗ × R and a positive number ε such that
⟨x∗, x⟩ + γλ < ⟨x∗, x0⟩ + γλ0 − ε for all (x, λ) ∈ epi(f). (4.5)
Since (x0, f(x0) + α) ∈ epi(f) if α ≥ 0, we get
γ(f(x0) + α) < γλ0 − ε whenever α ≥ 0.
This yields γ < 0 since otherwise we let α → ∞ and arrive at a contradiction.
Taking any x ∈ dom(f), it follows from (4.5) with (x, f(x)) ∈ epi(f) that
⟨x∗, x⟩ + γf(x) < ⟨x∗, x0⟩ + γλ0 − ε for all x ∈ dom(f).
This allows us to conclude that
f(x) > ⟨x∗/γ, x0 − x⟩ + λ0 − ε/γ if x ∈ dom(f).
Defining now ϕ(x) := ⟨x∗/γ, x0 − x⟩ + λ0 − ε/γ, we get ϕ ∈ A(f) and thus
verify the claimed nonemptiness A(f) ≠ ∅.
Let us further prove that (a)=⇒(b) meaning that the properties in (a)
ensure that for any λ0 < f (x0 ) there exists ϕ ∈ A(f ) with λ0 < ϕ(x0 ).
Since (x0, λ0) ∉ epi(f), we apply again the aforementioned convex separation
theorem to obtain (4.5). In the case where x0 ∈ dom(f), it is proved above
that ϕ ∈ A(f ). Moreover, we have ϕ(x0 ) = λ0 − ε/γ > λ0 since γ < 0.
Consider now the case where x0 ∉ dom(f). It follows from (4.5), by taking
any x ∈ dom(f ) and letting λ → ∞, that γ ≤ 0. If γ < 0, the same arguments
as above verify (b). Hence we only need to consider the case where γ = 0.
Employing (4.5) in this case tells us that
⟨x∗, x − x0⟩ + ε < 0 whenever x ∈ dom(f).
Since A(f) ≠ ∅, choose ϕ0 ∈ A(f) and define
ϕk(x) := ϕ0(x) + k(⟨x∗, x − x0⟩ + ε) for k ∈ N.
We obviously have ϕk ∈ A(f ) and ϕk (x0 ) = ϕ0 (x0 ) + kε > λ0 for all large k,
which justifies the representation in (b).
To verify now implication (b)=⇒(c), pick any ϕ ∈ A(f ). Since ϕ(x) ≤ f (x)
for all x ∈ X, we have ϕ∗∗ (x) ≤ f ∗∗ (x) on X by Proposition 4.3 and the
262 4 ENHANCED CALCULUS AND FENCHEL DUALITY
which hold by (c) and thus complete the proof of the theorem.
Theorem 4.15 easily implies that the conjugate of any proper, convex, and
lower semicontinuous function is proper.
Corollary 4.16 Let X be an LCTV space, and let f : X → R be a proper,
convex, and l.s.c. function. Then we have dom(f∗) ≠ ∅.
Proof. Theorem 4.15 tells us that there are x∗ ∈ X ∗ and b ∈ R such that
⟨x∗, x⟩ + b ≤ f(x) whenever x ∈ X.
It yields ⟨x∗, x⟩ − f(x) ≤ −b for all x ∈ X, and hence f∗(x∗) ≤ −b < ∞. Thus
we verify the inclusion x∗ ∈ dom(f∗).
Let X be an LCTV space, and let its dual space X∗ be equipped with the
weak∗ topology. For a function f : X∗ → R with u∗ ∈ dom(f), subgradients
of f at u∗ are given by
∂f(u∗) := {x ∈ X | ⟨x∗ − u∗, x⟩ ≤ f(x∗) − f(u∗) for all x∗ ∈ X∗}.
The last assertion of this subsection establishes close relationships between
subgradients of a convex function and its conjugate.
This ensures that f∗(u∗) = ⟨u∗, x⟩ − f(x) < ∞, and so u∗ ∈ dom(f∗). Applying
now Proposition 4.9 tells us that
⟨x∗, x⟩ ≤ f(x) + f∗(x∗) for all x∗ ∈ X∗. (4.7)
Unifying now (4.6) and (4.7), we arrive at the inequality
⟨x∗ − u∗, x⟩ ≤ f∗(x∗) − f∗(u∗) for all x∗ ∈ X∗,
which verifies the claimed inclusion x ∈ ∂f∗(u∗).
Suppose finally that f is l.s.c. and that x ∈ ∂f ∗ (u∗ ). Then we have f ∗∗ = f
by Theorem 4.15, and thus u∗ ∈ ∂f ∗∗ (x) = ∂f (x).
Here we start investigating the support function associated with a given subset
of a topological vector space. This extended-real-valued and always convex
function plays a highly important role in the subsequent material. The main
result of this subsection gives us a precise formula for representing the support
function for the intersection of two convex sets in topological vector spaces via
the infimal convolution under certain qualification conditions with a further
improvement in finite dimensions. The proof is based on the convex extremal
principle in topological vector spaces derived in Subsection 3.1.2.
Proof. To verify the properties listed in (a), observe first that σΩ (0) = 0, and
hence σΩ is proper. Furthermore, it follows from (4.8) that
epi(σΩ) = {(x∗, λ) ∈ X∗ × R | λ ≥ σΩ(x∗)}
= {(x∗, λ) ∈ X∗ × R | λ ≥ ⟨x∗, x⟩ for all x ∈ Ω}
= ⋂_{x∈Ω} {(x∗, λ) | λ ≥ ⟨x∗, x⟩}.
This representation clearly yields the positive homogeneity and convexity (and
hence sublinearity) as well as the weak∗ lower semicontinuity of σΩ .
To prove (b), we have σΩ(x∗) ≤ σcl(Ω)(x∗) for all x∗ ∈ X∗ due to Ω ⊂ cl(Ω).
Taking now x ∈ cl(Ω), find a net {xα} ⊂ Ω that converges to x. For any x∗ ∈ X∗
we obviously get the relationships
⟨x∗, x⟩ = lim⟨x∗, xα⟩ ≤ σΩ(x∗),
which implies that σcl(Ω)(x∗) ≤ σΩ(x∗). Furthermore, σΩ ≤ σco(Ω) since Ω ⊂
co(Ω). To verify the reverse inequality, fix any x ∈ co(Ω) and then find λi ≥ 0
and xi ∈ Ω for i = 1, . . . , m with some m ∈ N and Σ_{i=1}^m λi = 1 such that
x = Σ_{i=1}^m λi xi. It shows that
⟨x∗, x⟩ = ⟨x∗, Σ_{i=1}^m λi xi⟩ = Σ_{i=1}^m λi⟨x∗, xi⟩ ≤ Σ_{i=1}^m λi σΩ(x∗) = σΩ(x∗)
for x∗ ∈ X∗. The latter yields σco(Ω)(x∗) ≤ σΩ(x∗) and verifies (b).
It remains to check (c). To proceed with the first equality therein, we have
σΩ1+Ω2(x∗) = sup{⟨x∗, x1 + x2⟩ | x1 ∈ Ω1, x2 ∈ Ω2}
= sup{⟨x∗, x1⟩ | x1 ∈ Ω1} + sup{⟨x∗, x2⟩ | x2 ∈ Ω2}
= σΩ1(x∗) + σΩ2(x∗) for all x∗ ∈ X∗.
The second equality is verified similarly.
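For finite sets in R², where all suprema become maxima, the sum formula in (c) can be checked directly; the two sample sets and the dual test vectors below are illustrative choices, not from the text.

```python
# sigma_{Omega1 + Omega2} = sigma_{Omega1} + sigma_{Omega2} on finite sets:
# the support function of a Minkowski sum splits into the sum of supports.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def support(omega, xs):
    """sigma_Omega(x*) = max over the finite set Omega of <x*, x>."""
    return max(dot(xs, x) for x in omega)

omega1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
omega2 = [(2.0, 1.0), (-1.0, 3.0)]
msum = [(a[0] + b[0], a[1] + b[1]) for a in omega1 for b in omega2]

for xs in [(1.0, 0.0), (0.5, -2.0), (-3.0, 1.5)]:
    assert abs(support(msum, xs)
               - (support(omega1, xs) + support(omega2, xs))) < 1e-12
```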
Example 4.20 Let X be an LCTV space. Since (δΩ)∗ = σΩ, we have
(δΩ)∗∗ = δcl(co(Ω)) for any nonempty subset Ω ⊂ X. This yields (δΩ)∗∗ = δΩ provided that
Ω is a nonempty, closed, and convex set, which is consistent with the general
result of Theorem 4.15.
Lemma 4.21 Let X be an LCTV space, and let Ω be a nonempty, closed,
and convex subset of X. Then we have the formula
∂σΩ (0) = Ω. (4.9)
Proof. Fix any x ∈ ∂σΩ(0) and get by the definition that
⟨x∗ − 0, x⟩ = ⟨x∗, x⟩ ≤ σΩ(x∗) = σΩ(x∗) − σΩ(0) for all x∗ ∈ X∗.
Suppose on the contrary that x ∉ Ω. The strict separation theorem ensures
that there exists x∗ ∈ X∗ such that
sup_{w∈Ω} ⟨x∗, w⟩ < ⟨x∗, x⟩,
which gives us a contradiction. Thus we have the inclusion “⊂” in (4.9). The
proof of the opposite inclusion is straightforward.
The following proposition calculates the Fenchel conjugate for infimal con-
volutions of support functions to convex sets in LCTV spaces.
Proposition 4.22 Let the sets Ω1 and Ω2 be nonempty, closed, and convex
in an LCTV space X with Ω1 ∩ Ω2 ≠ ∅. Then we have the representation
(σΩ1 □ σΩ2)∗(x) = δΩ1∩Ω2(x) for all x ∈ X. (4.10)
Proof. To verify the inequality “≤” in (4.12), fix x∗ ∈ X∗ and pick x∗1, x∗2
with x∗ = x∗1 + x∗2. Then we have
⟨x∗, x⟩ = ⟨x∗1, x⟩ + ⟨x∗2, x⟩ ≤ σΩ1(x∗1) + σΩ2(x∗2) for every x ∈ Ω1 ∩ Ω2.
Taking the infimum on the right-hand side above with respect to all such
elements x∗1, x∗2 implies that
⟨x∗, x⟩ ≤ (σΩ1 □ σΩ2)(x∗) whenever x ∈ Ω1 ∩ Ω2,
which justifies the inequality “≤” in (4.12) for arbitrary sets Ω1 and Ω2 without
imposing the assumptions in (a)–(d).
Now we prove the inequality “≥” in (4.12), considering first case (b). Fix
any x∗ ∈ dom(σΩ1∩Ω2) and denote α := σΩ1∩Ω2(x∗) ∈ R, for which
⟨x∗, x⟩ ≤ α whenever x ∈ Ω1 ∩ Ω2,
and define two nonempty convex subsets of X × R by
Let us check that U × (λ̄, ∞) ⊂ Θ1 − Θ2, and thus (3.5) holds. Indeed, taking
any pair (x, λ) ∈ U × (λ̄, ∞) gives us x ∈ U ⊂ Ω1 − Ω2 and λ > λ̄. Hence we
represent x = w1 − w2 for some w1 ∈ Ω1, w2 ∈ Ω2 and therefore obtain
(x, λ) = (w1, λ − λ̄) − (w2, −λ̄).
It follows from λ − λ̄ > 0 that (w1, λ − λ̄) ∈ Θ1. Then (3.13) and the choice
of λ̄ tell us that (w2, −λ̄) ∈ Θ2, and thus int(Θ1 − Θ2) ≠ ∅. Applying Theorem 3.7(b) ensures the existence of (0, 0) ≠ (y∗, β) ∈ X∗ × R for which
The main results here are the conjugate sum and chain rules that give us exact representations of the Fenchel conjugate for sums and compositions. To establish these
results, we develop a geometric approach based on the reduction to the inter-
section rule for support functions of sets and using eventually the convex
extremal principle, or some form of convex set separation. Similar to our pro-
cedure above, we concentrate in this subsection on the general topological
vector space setting with finite-dimensional specifications while postponing
until Subsection 4.2.2 the case of l.s.c. functions on Banach spaces. In the
case of topological vector spaces, refined results are derived under certain
polyhedrality assumptions by using the corresponding machinery developed in
Section 3.4.
Let us start with simple rules, which directly follow from the definition.
Proposition 4.24 Let f : X → R be an arbitrary function on a topological
vector space X. Then we have the equalities:
(a) (λf)∗(x∗) = λf∗(x∗/λ) for any λ > 0.
(b) (f + c)∗(x∗) = f∗(x∗) − c for any c ∈ R.
(c) (fa)∗(x∗) = f∗(x∗) − ⟨x∗, a⟩, where fa(x) := f(x + a).
Proof. To verify (a), we get by definition that
(λf)∗(x∗) = sup{⟨x∗, x⟩ − λf(x) | x ∈ X} = λ sup{⟨x∗/λ, x⟩ − f(x) | x ∈ X} = λf∗(x∗/λ).
The proofs of (b) and (c) are also straightforward.
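Rules (a)–(c) above can be sanity-checked with discretized conjugation; the test function f(x) = |x| + x², the constants λ, c, a, and the grid below are illustrative choices.

```python
# Each conjugate rule is compared at a few dual points; the tolerance
# absorbs the grid-discretization error of the sup.

GRID = [i / 100.0 for i in range(-400, 401)]   # [-4, 4], step 0.01

def conj(h, xs):
    return max(xs * x - h(x) for x in GRID)

f = lambda x: abs(x) + x * x                   # convex, nonsmooth at 0
lam, c, a = 2.0, 3.0, 0.5

for xs in [-1.5, 0.0, 2.0]:
    # (a) (lam * f)*(x*) = lam * f*(x*/lam)
    assert abs(conj(lambda x: lam * f(x), xs) - lam * conj(f, xs / lam)) < 1e-2
    # (b) (f + c)*(x*) = f*(x*) - c
    assert abs(conj(lambda x: f(x) + c, xs) - (conj(f, xs) - c)) < 1e-2
    # (c) (f_a)*(x*) = f*(x*) - <x*, a> with f_a(x) = f(x + a)
    assert abs(conj(lambda x: f(x + a), xs) - (conj(f, xs) - xs * a)) < 1e-2
```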
The next proposition evaluates the conjugate of the infimal convolution in
topological vector spaces without any convexity assumptions.
Proposition 4.25 Let X be a topological vector space. For any proper func-
tions f, g : X → R we have
(f □ g)∗(x∗) = f∗(x∗) + g∗(x∗) whenever x∗ ∈ X∗.
Proof. The properness of f, g implies that the sum f∗ + g∗ is well defined for
all x∗ ∈ X∗. Then fixing any x, u ∈ X and x∗ ∈ X∗ gives us
(f □ g)∗(x∗) ≥ ⟨x∗, x⟩ − (f □ g)(x)
= ⟨x∗, x⟩ − inf{f(x1) + g(x2) | x1 + x2 = x}
≥ ⟨x∗, u + (x − u)⟩ − f(u) − g(x − u)
= [⟨x∗, u⟩ − f(u)] + [⟨x∗, x − u⟩ − g(x − u)].
Taking the supremum on the rightmost side with respect to x ∈ X first and
then with respect to u ∈ X yields (f □ g)∗(x∗) ≥ f∗(x∗) + g∗(x∗). The opposite
inequality therein can also be verified easily.
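The formula (f □ g)∗ = f∗ + g∗ can likewise be checked on a grid with f(x) = x² and g(x) = |x|, where f∗(x∗) = (x∗)²/4 and g∗ is the indicator of [−1, 1]; all numeric choices below are illustrative.

```python
# Both the infimal convolution and the conjugates are discretized over
# the same grid; the test points keep x* inside dom(g*) = [-1, 1].

GRID = [i / 50.0 for i in range(-200, 201)]    # [-4, 4], step 0.02

def conj(h, xs):
    return max(xs * x - h(x) for x in GRID)

f = lambda x: x * x
g = lambda x: abs(x)

def inf_conv(x):
    # (f □ g)(x) = inf over u of f(u) + g(x - u), inf taken over the grid
    return min(f(u) + g(x - u) for u in GRID)

for xs in [-0.8, 0.0, 0.7]:
    assert abs(conj(inf_conv, xs) - (conj(f, xs) + conj(g, xs))) < 0.05
```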
The next observation makes a bridge between the conjugate of an arbi-
trary function and the support function for its epigraph. It is essential in the
implementation of our geometric approach to conjugate calculus.
Lemma 4.26 Let X be a topological vector space. For any proper function
f : X → R we have
f∗(x∗) = σepi(f)(x∗, −1) whenever x∗ ∈ X∗.
Proof. It follows from the definitions that
f∗(x∗) = sup{⟨x∗, x⟩ − f(x) | x ∈ dom(f)} = sup{⟨x∗, x⟩ − λ | (x, λ) ∈ epi(f)}
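The mechanism behind Lemma 4.26 is easy to watch numerically: for the second component −1, the supremum over λ ≥ f(x) in the support function of epi(f) is attained at λ = f(x), which reduces it to the conjugate. The grid and the test function below are illustrative.

```python
# sigma_{epi(f)}(x*, -1) vs f*(x*) for f(x) = x^2, with epi(f) represented
# through its boundary points (x, f(x)); the sup over lam collapses there.

GRID = [i / 100.0 for i in range(-300, 301)]

f = lambda x: x * x

def conj(xs):
    return max(xs * x - f(x) for x in GRID)

def sigma_epi(xs, mu):
    # for mu < 0, sup over (x, lam) in epi(f) of xs*x + mu*lam is
    # attained at lam = f(x), so only boundary points matter
    assert mu < 0
    return max(xs * x + mu * f(x) for x in GRID)

for xs in [-2.0, 0.0, 1.3]:
    assert abs(conj(xs) - sigma_epi(xs, -1)) < 1e-9
```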
Here is the conjugate sum rule in topological vector space and finite-
dimensional settings.
Theorem 4.27 Let f, g : X → R be proper convex functions on a topological
vector space X, and let one of the following conditions be satisfied:
(a) There exists a point x ∈ dom(f )∩dom(g) such that either f is continuous
at x, or g is continuous at this point.
(b) X is an LCTV space, and f is polyhedral under the fulfillment of the
qualification condition
dom(f) ∩ qri(dom(g)) ≠ ∅. (4.16)
Let us prove that (f∗ □ g∗)(x∗) ≤ (f + g)∗(x∗) under (a). We only need to
consider the case where (f + g)∗(x∗) < ∞. Define two convex sets by
Ω1 := {(x, λ1, λ2) ∈ X × R × R | λ1 ≥ f(x)} = epi(f) × R,
Ω2 := {(x, λ1, λ2) ∈ X × R × R | λ2 ≥ g(x)}. (4.19)
Similar to Lemma 4.26 we get the representation
This justifies the sum rule (4.17) together with the last statement of the
theorem under the assumptions in (a). The verifications of (4.17) under the
assumptions in (b) and (c) are similar to the above arguments by applying
Theorem 4.23 in cases (c) and (d) therein, respectively.
Next we establish the major conjugate chain rule, the proof of which is also
based on the intersection results of Theorem 4.23 in both cases of topological
vector spaces and finite-dimensional spaces under different assumptions.
Theorem 4.28 Let A : X → Y be a linear continuous mapping between topo-
logical vector spaces, and let g : Y → R be a proper convex function. Suppose
that one of the following conditions is satisfied:
(a) The function g is finite and continuous at some point of the set AX.
(b) X is an LCTV space, and the function g is polyhedral with AX ∩
dom(g) ≠ ∅.
(c) X = Rn, Y = Rp, and AX ∩ ri(dom(g)) ≠ ∅.
Then we have the conjugate chain rule
(g ◦ A)∗(x∗) = inf{g∗(y∗) | y∗ ∈ (A∗)−1(x∗)}, x∗ ∈ X∗. (4.21)
Furthermore, for any x∗ ∈ dom(g ◦ A)∗ there exists y ∗ ∈ (A∗ )−1 (x∗ ) such that
(g ◦ A)∗ (x∗ ) = g ∗ (y ∗ ).
Proof. As above, we verify the results only in case (a), since the other two
cases can be considered similarly by using the corresponding versions of The-
orem 4.27. Picking y ∗ ∈ (A∗ )−1 (x∗ ) gives us by definition that
g∗(y∗) = sup{⟨y∗, y⟩ − g(y) | y ∈ Y}
≥ sup{⟨y∗, Ax⟩ − g(Ax) | x ∈ X}
= sup{⟨A∗y∗, x⟩ − (g ◦ A)(x) | x ∈ X}
= sup{⟨x∗, x⟩ − (g ◦ A)(x) | x ∈ X} = (g ◦ A)∗(x∗).
Proof. Let us first check that the inequality “≤” always holds in (4.23) when-
ever λ ∈ [0, 1]. Indeed, it follows directly from the definitions that
(λf + (1 − λ)g)∗(x∗) = sup_{x∈X} [⟨x∗, x⟩ − λf(x) − (1 − λ)g(x)]
≥ sup_{x∈X} [⟨x∗, x⟩ − λ(f ∨ g)(x) − (1 − λ)(f ∨ g)(x)]
= sup_{x∈X} [⟨x∗, x⟩ − (f ∨ g)(x)] = (f ∨ g)∗(x∗),
where the last inequality follows from the convention 0 · f := δdom(f ) . Since
the latter obviously holds if (f ∨ g)∗ (x∗ ) = ∞, we complete the proof.
We start with the intersection rule for support functions of convex sets,
which improves in the Banach space setting the previous one from Theo-
rem 4.23 obtained for arbitrary convex sets in general topological vector spaces
under interiority assumptions. Here we can deal with nonsolid but closed
and convex subsets of Banach spaces and arrive at the same result under a
less demanding qualification condition by using a different proof. The obtained
version of the support function intersection rule is employed for developing
enhanced rules of subdifferential and conjugate calculi in the next subsection.
First we establish the following two lemmas that are used below in the
proof of the main result.
Lemma 4.30 Let Ω1 and Ω2 be nonempty subsets of a Banach space X.
Suppose that cone(Ω1 − Ω2 ) = X. Then for any α, β ≥ 0 the set
K = Kα,β := {(x∗1, x∗2) ∈ X∗ × X∗ | σΩ1(x∗1) + σΩ2(x∗2) ≤ α, ‖x∗1 + x∗2‖ ≤ β}
is compact in the weak∗ topology of the dual product space X∗ × X∗.
Proof. The closedness of the set K in the weak∗ topology of X ∗ × X ∗ is obvi-
ous. By the Alaoglu-Bourbaki theorem (Corollary 1.113), it remains to show
that this set is norm-bounded in X ∗ × X ∗ . Remembering the uniform bound-
edness principle, we need to verify that the collection of continuous linear func-
tionals from K is bounded pointwise. To proceed, take any (x1 , x2 ) ∈ X × X
and find by cone(Ω1 − Ω2 ) = X such λ ≥ 0, w1 ∈ Ω1 , and w2 ∈ Ω2 for which
x1 − x2 = λ(w1 − w2 ). Then we have
⟨x∗1, x1⟩ + ⟨x∗2, x2⟩ = λ⟨x∗1, w1⟩ + λ⟨x∗2, w2⟩ + ⟨x∗1 + x∗2, x2 − λw2⟩
≤ λ(σΩ1(x∗1) + σΩ2(x∗2)) + ‖x∗1 + x∗2‖ · ‖x2 − λw2‖ ≤ λα + β‖x2 − λw2‖.
Since this also holds for (−x1, −x2), we arrive at the claimed conclusion.
Lemma 4.31 Under the assumptions of Lemma 4.30, the infimal convolution
(σΩ1 σΩ2 ) : X ∗ → R is a lower semicontinuous function with respect to the
weak∗ topology of X ∗ .
Proof. Applying the Fenchel conjugate to both sides of formula (4.10) from
Proposition 4.22 and then using Lemma 4.31 give us the equalities
σΩ1∩Ω2(x∗) = (δΩ1∩Ω2)∗(x∗) = (σΩ1 □ σΩ2)∗∗(x∗) = (σΩ1 □ σΩ2)(x∗)
for all x∗ ∈ X∗. This justifies (4.12) when the assumption cone(Ω1 − Ω2) = X
is satisfied. Denote further by L := cone(Ω1 − Ω2 ) the closed subspace of X in
question. Since Ω1 ∩ Ω2 ≠ ∅ by (4.24), we can always translate the situation to
the case where 0 ∈ Ω1 ∩ Ω2 and hence suppose that Ω1 , Ω2 ⊂ L. This reduces
the general setting under (4.24) to the one with cone(Ω1 − Ω2 ) = X treated
above. Thus (4.12) is verified.
Finally, representation (4.13) for x∗ ∈ dom(σΩ1 □ σΩ2) follows from the
weak∗ compactness of the set Kα,β in Lemma 4.30 with α := (σΩ1 □ σΩ2)(x∗) +
ε, where ε > 0 is chosen arbitrarily, and where β := ‖x∗‖.
Remark 4.34 If Ω is a nonempty convex set in a vector space X such that
cone(Ω) = ⋃_{λ≥0} λΩ is a linear subspace of X, then 0 ∈ Ω and cone(Ω) =
⋃_{λ>0} λΩ. Indeed, fix a ∈ Ω and find λ ≥ 0 and b ∈ Ω such that −a = λb.
This implies that
0 = (1/(1 + λ)) a + (λ/(1 + λ)) b ∈ Ω,
which verifies the equality cone(Ω) = ⋃_{λ>0} λΩ.
In this subsection, we first derive from Theorem 4.33 the normal cone intersec-
tion rule for closed and convex subsets of Banach spaces under the Attouch-
Brezis qualification condition (4.24). This geometric result generates enhanced
calculus rules for coderivatives and subgradients. We also present the corre-
sponding elaborations of conjugate calculus rules in the Banach space setting.
As stated in the discussion in the preceding subsection, the next theorem
significantly improves the intersection rule of Theorem 3.10 for the case where
Ω1 and Ω2 are closed subsets of a Banach space.
Theorem 4.35 Let the sets Ω1 , Ω2 ⊂ X be convex, and let x ∈ Ω1 ∩ Ω2 .
Suppose that the space X is Banach, that both sets Ω1 and Ω2 are closed,
and that the Attouch-Brezis qualification condition from Definition 4.32 is
satisfied. Then we have the normal cone intersection rule
N (x; Ω1 ∩ Ω2 ) = N (x; Ω1 ) + N (x; Ω2 ). (4.25)
Proof. It follows from the normal cone definition that x∗ ∈ N (x; Ω) for x ∈ Ω
if and only if σΩ (x∗ ) = x∗ , x. Then pick any x∗ ∈ N (x; Ω1 ∩ Ω2 ) and get
x∗ , x = σΩ1 ∩Ω2 (x∗ ). Theorem 4.33 yields the existence of x∗1 , x∗2 ∈ X ∗ such
that x∗ = x∗1 + x∗2 and that
x∗1 , x + x∗2 , x = x∗ , x = σΩ1 ∩Ω2 (x∗ ) = σΩ1 (x∗1 ) + σΩ2 (x∗2 ).
This clearly ensures that x∗1 , x = σΩ1 (x∗1 ) and x∗2 , x = σΩ2 (x∗2 ). Thus
we have the inclusions x∗1 ∈ N (x; Ω1 ) and x∗2 ∈ N (x; Ω2 ), which show that
N (x; Ω1 ∩Ω2 ) ⊂ N (x; Ω1 )+N (x; Ω2 ). This verifies the inclusion “⊂” in (4.25).
The opposite inclusion is obvious.
Proof. Following the lines of the proof of Theorem 3.22, observe that the
equality in (3.32) for the sets Ω1 , Ω2 ⊂ X × Y × Y defined in (3.31) yields
cone(Ω1 − Ω2) = cone(dom(F1) − dom(F2)) × Y × Y.
Thus cone(Ω1 − Ω2 ) is a closed subspace of X × Y × Y under the qualification
condition imposed in the corollary. Then we proceed similarly to the proof of
Theorem 3.22 by applying Theorem 4.35 instead of Theorem 3.10.
Proof. Following the proof of Theorem 3.23, consider the closed and convex
subsets Ω1, Ω2 ⊂ X × Y × Z therein. Taking x∗ ∈ D∗(G ◦ F)(x, z)(z∗) with
z∗ ∈ Z∗, we have the relationships
(x∗, 0, −z∗) ∈ N((x, y, z); Ω1 ∩ Ω2) and
Ω1 − Ω2 = X × (rge(F) − dom(G)) × Z.
The latter tells us that the qualification condition imposed in this corollary
ensures the fulfillment of the Attouch-Brezis qualification condition (4.24) for
the sets Ω1 , Ω2 . Then we proceed as in the proof of Theorem 3.23 by replacing
the application of Theorem 3.10 with that of Theorem 4.35.
Proof. It follows from the proof of Theorem 3.48 by applying the normal cone
intersection rule from Theorem 4.35 instead of that from Theorem 3.10. We
can easily check that the imposed closed subspace property of cone(dom(f1) −
dom(f2)) corresponds to the Attouch-Brezis qualification condition (4.24) for
the sets Ω1 , Ω2 introduced in the proof of Theorem 3.48.
Proof. Proceeding as in the proof of Theorem 3.55, consider the sets Ω1 and
Ω2 defined in (3.67). Observe the equality
cone(Ω1 − Ω2) = X × cone(AX − dom(g)) × R.
Thus the imposed assumption ensures the fulfillment of the Attouch-Brezis
qualification condition (4.24) for these sets in the intersection rule of Theo-
rem 4.35. The rest of the proof follows the one of Theorem 3.55.
Next we proceed with deriving Banach space versions of the main results
of conjugate calculus improving those from Subsection 4.1.3 in this setting.
The first result concerns the conjugate sum rule.
Proof. Observe first that the convex sets Ω1 , Ω2 from (4.19) are closed by the
l.s.c. assumption on f, g and then check that
cone(Ω1 − Ω2) = cone(dom(f) − dom(g)) × R × R. (4.26)
Indeed, consider u ∈ cone(Ω1 − Ω2 ) and find t ≥ 0, v ∈ (Ω1 − Ω2 ) such that
u = tv. This gives us elements v = (x1 , λ1 , λ2 )−(x2 , γ1 , γ2 ) with (x1 , λ1 , λ2 ) ∈
Ω1 and (x2 , γ1 , γ2 ) ∈ Ω2 . Note that x1 ∈ dom(f ) and x2 ∈ dom(g) due to
f (x1 ) ≤ λ1 < ∞ and g(x2 ) ≤ γ2 < ∞. Hence we arrive at the inclusion
tv = t(x1 − x2, λ1 − γ1, λ2 − γ2) ∈ cone(dom(f) − dom(g)) × R × R.
To verify now the opposite inclusion, fix x ∈ cone(dom(f) − dom(g)) × R × R
and find, by taking into account Remark 4.34, such t > 0, x1 ∈ dom(f),
x2 ∈ dom(g), γ1, γ2 ∈ R, and λ1, λ2 for which we have
x = (t(x1 − x2), γ1, γ2) = t(x1 − x2, λ1, λ2)
= t[(x1, f(x1), λ2 + g(x2)) − (x2, f(x1) − λ1, g(x2))].
This readily yields x ∈ t(Ω1 − Ω2) ⊂ cone(Ω1 − Ω2). Applying Theorem 4.33,
we arrive at the claimed conclusions under the assumptions made.
The two final consequences provide the improved versions of the conjugate
chain and maximum rules obtained geometrically from the corresponding ver-
sion of the support function intersection rule in Banach spaces.
Corollary 4.42 Suppose that in the setting of Theorem 4.28 the spaces X
and Y are Banach, the function g is convex and l.s.c., and the set cone(AX −
dom(g)) is a closed subspace of Y . Then we have the conjugate chain rule
(4.21), where the infimum is achieved for any x∗ ∈ dom(g ◦ A)∗ .
Proof. Considering the sets Ω1 and Ω2 from (4.22), it is easy to check that
cone(Ω1 − Ω2) = X × cone(AX − dom(g)) × R.
Then we proceed as in the proof of Theorem 4.28 by employing Theorem 4.33
instead of Theorem 4.23.
Corollary 4.43 Suppose that in the setting of Theorem 4.29 the space X
is Banach, the functions f, g : X → R are convex and l.s.c., and the set
cone(dom(f ) − dom(g)) is a closed subspace of X. Then we have the con-
jugate maximum rule (4.23), where the infimum is achieved provided that the
value (f ∨ g)∗ (x∗ ) is a real number.
f′−(x; v) := lim_{t↑0} (f(x + tv) − f(x))/t.
It is easy to see from the definitions that
f′−(x; v) = −f′(x; −v) for all v ∈ X,
and thus properties of the left directional derivative f′−(x; v) reduce to those
of the right one (4.27), which we study below.
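The reduction of the left to the right directional derivative can be illustrated by finite differences on f(x) = |x| at its kink, where the two one-sided derivatives differ; the step size below is an illustrative small number, not part of the text.

```python
# One-sided difference quotients for f(x) = |x|: at x = 0 the right
# directional derivative in direction v = 1 is 1, while the left one is -1.

f = lambda x: abs(x)

def right_dd(f, x, v, t=1e-8):
    return (f(x + t * v) - f(x)) / t

def left_dd(f, x, v, t=1e-8):
    # t increases to 0 from below in the definition, which is the same
    # as -f'(x; -v) computed with a positive step
    return -right_dd(f, x, -v, t)

assert abs(right_dd(f, 0.0, 1.0) - 1.0) < 1e-6
assert abs(left_dd(f, 0.0, 1.0) + 1.0) < 1e-6
# at a point of differentiability both one-sided derivatives coincide
assert abs(right_dd(f, 2.0, 1.0) - left_dd(f, 2.0, 1.0)) < 1e-6
```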
Directional derivatives of convex functions enjoy remarkable properties,
some of which are presented in what follows.
= lim_{t↓0} (f((x + 2tv1)/2 + (x + 2tv2)/2) − f(x))/t
≤ lim_{t↓0} ((f(x + 2tv1) − f(x))/(2t) + (f(x + 2tv2) − f(x))/(2t))
= ψ(v1) + ψ(v2).
To verify (d), choose by using x ∈ int(dom(f )) a number α > 0 to be so small
that x + αv ∈ dom(f ). It follows from Proposition 4.47 that
ψ(αv) = f′(x; αv) ≤ f(x + αv) − f(x) < ∞.
Employing (c) gives us ψ(v) < ∞. Furthermore, we get from (a) and (b) that
0 = ψ(0) = ψ(v + (−v)) ≤ ψ(v) + ψ(−v), v ∈ X,
which implies that ψ(v) ≥ −ψ(−v). This tells us that ψ(v) > −∞ and thus
verifies assertion (d).
It remains to prove (e). We get from the continuity of f at x that x ∈
int(dom(f )), and so ψ is finite on X. There exists γ > 0 such that
f(x + v) − f(x) < 1 whenever ‖v‖ < γ.
By Proposition 4.47 we have the estimate
ψ(v) = f′(x; v) ≤ f(x + v) − f(x) < 1 if ‖v‖ < γ.
Then ψ is a convex function bounded from above on bounded sets, and hence
it is locally Lipschitz continuous on dom(ψ) = X.
This section is devoted to the study of the class of supremum functions over
infinite index sets. Supremum functions, which are intrinsically nonsmooth,
have been highly recognized in convex and variational analysis due to their
remarkable features and numerous applications; in particular, to nonstandard
problems of constrained optimization and optimal control. The main result
here is a precise formula for calculating subgradients of the supremum of
convex functions over compact index sets.
Let X be a topological vector space, and let T be an index set that is assumed
here to be a compact topological space. Given a real-valued function g : T ×
X → R, define the corresponding supremum function f : X → R by
f(x) := sup{g(t, x) | t ∈ T}, x ∈ X. (4.29)
We associate with the supremum function (4.29) the following set-valued mapping S : X ⇉ T of active indices given by
S(x) := {t ∈ T | f(x) = g(t, x)}, x ∈ X. (4.30)
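A minimal sketch of (4.29)–(4.30), assuming the compact index set T = [0, 1] sampled on a grid and the illustrative data g(t, x) = tx − t²: here f(x) = x²/4 for 0 ≤ x ≤ 2, with active index t = x/2, and f(x) = 0 with active index t = 0 for x ≤ 0.

```python
# Supremum function f(x) = sup over t in T of g(t, x) and the active-index
# set S(x), both discretized over a grid on T = [0, 1].

T = [i / 1000.0 for i in range(1001)]

def g(t, x):
    return t * x - t * t

def f(x):
    return max(g(t, x) for t in T)

def S(x, tol=1e-9):
    fx = f(x)
    return [t for t in T if abs(g(t, x) - fx) < tol]

assert abs(f(1.0) - 0.25) < 1e-6           # sup_t (t - t^2) = 1/4
assert 0.5 in S(1.0)                       # attained at t = 1/2
assert f(-1.0) == 0.0 and 0.0 in S(-1.0)   # for x <= 0 the sup sits at t = 0
```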
Proof. This follows from the easily verifiable fact that the supremum of a
family of convex functions is always convex.
The next proposition is rather technical while being important for verifying
the subdifferential formula for (4.29) in the next subsection.
To proceed with (b), fix any γ > 0 and choose λk < γ for all k ∈ N. Then
ϕ(λk) = f(x0 + λk v) = g(tk, x0 + λk v) = g(tk, (λk/γ)(x0 + γv) + (1 − λk/γ)x0)
≤ (λk/γ) g(tk, x0 + γv) + (1 − λk/γ) g(tk, x0)
due to the convexity of g(t, ·). The continuity of ϕ and the upper semiconti-
nuity of g(·, x) ensure the relationships
4.4 Subgradients of Supremum Functions 285
Now we are ready to present the main result of this section, which calculates
the subdifferential of (4.29) via those for the generating convex functions
g(t, ·). In what follows we use the notation gt(x) := g(t, x). Given a topological
vector space X and a subset Ω ⊂ X∗ of its dual space, denote by co∗(Ω) the
weak∗ closed convex hull of Ω, i.e., the smallest among all the weak∗ closed
convex subsets of X∗ containing Ω.
Theorem 4.54 Let X be an LCTV space, let T be a compact topological space,
and let g(t, x) satisfy all the assumptions of Proposition 4.53. Assume in addi-
tion that for every t ∈ T , the function g(t, ·) is continuous at x. Then we have
the subdifferential representation
∂f(x) = co∗(⋃_{t∈S(x)} ∂gt(x)). (4.31)
Proof. Observe that the assumptions imposed in the theorem ensure that
S(x) ≠ ∅ and the supremum function f is convex by Proposition 4.51 and
Proposition 4.52, respectively. Then the inclusion “⊃” in (4.31) follows directly
from the definitions.
To verify the inclusion “⊂” in (4.31), pick any x∗ ∈ ∂f(x), denote
C := co∗(⋃_{t∈S(x)} ∂gt(x)),
and suppose on the contrary that x∗ ∉ C. Applying to the sets {x∗} and C
the strict separation result of Theorem 2.61 in the space X∗ equipped with
the weak∗ topology allows us to find a vector u ∈ X and α > 0 such that
for all (x, y) ∈ X × Y , which yields the inclusion (x∗ , 0) ∈ ∂(ϕ + δgph(F ) )(x, y).
To verify the opposite inclusion, pick (x∗ , 0) ∈ ∂(ϕ + δgph(F ) )(x, y). Then
(4.36) is satisfied for all (x, y) ∈ X × Y . This tells us that
x∗ , x − x ≤ ϕ(x, y) − ϕ(x, y) = ϕ(x, y) − μ(x) whenever y ∈ F (x).
Taking the infimum on the right-hand side above with respect to y ∈ F (x)
gives us (4.35), and so we arrive at x∗ ∈ ∂μ(x).
Corollary 4.57 In the framework of Theorem 4.56, suppose that the cost
function ϕ in (4.33) does not depend on x. Then we have
∂μ(x) = ⋃_{y∗∈∂ϕ(y)} D∗F(x, y)(y∗) for any (x, y) ∈ gph(M)
with M (·) from (4.34) under the fulfillment of one of the assumptions (a)–(c).
Proof. This follows directly from (4.37) with ϕ(x, y) = ϕ(y). Indeed, in this
case we obviously have ∂ϕ(x, y) = {0} × ∂ϕ(y), and so x∗ = 0.
It is clear that the claimed chain rule (4.38) follows from the formula
D∗F(x̄, f(x̄))(y∗) = ∂(y∗ ◦ f)(x̄) whenever y∗ ∈ ∂ϕ(f(x̄)), (4.39)
which we are going to prove now. To verify the inclusion “⊂” in (4.39), pick
(y∗, x∗) ∈ gph(D∗F(x̄, f(x̄))) and get by definition that
⟨y∗, y⟩ ≥ ⟨y∗, f(x̄)⟩ + ⟨x∗, x − x̄⟩ for all x ∈ X, f(x) ≺ y.
Fix h ∈ X and select x := x̄ + h with y := f(x). Since f(x) ≺ y, we have that
⟨y∗, f(x̄ + h)⟩ ≥ ⟨y∗, f(x̄)⟩ + ⟨x∗, h⟩,
which yields x∗ ∈ ∂(y∗ ◦ f)(x̄) and hence justifies the inclusion “⊂” in (4.39).
To verify the opposite inclusion in (4.39), pick x∗ ∈ ∂(y∗ ◦ f)(x̄) with
y∗ ∈ ∂ϕ(f(x̄)) and get by the subgradient definition that
⟨y∗, f(x̄ + h) − f(x̄)⟩ ≥ ⟨x∗, h⟩ whenever h ∈ X.
Taking any x ∈ X and y ∈ Y with f(x) ≺ y, denote h := x − x̄. Then
Proposition 4.59 tells us that ⟨y∗, y⟩ ≥ ⟨y∗, f(x)⟩. This shows that
⟨y∗, y − f(x̄)⟩ ≥ ⟨y∗, f(x) − f(x̄)⟩ ≥ ⟨x∗, x − x̄⟩,
and therefore (x∗, −y∗) ∈ N((x̄, f(x̄)); gph(F)). It gives us x∗ ∈
D∗F(x̄, f(x̄))(y∗) and thus verifies the inclusion “⊃” in (4.39), which completes the proof.
Proposition 4.62 Consider the optimization problem (4.42) and its dual
(4.43) in topological vector spaces, where the functions f and g are not
assumed to be convex. Define the optimal values of these problems by
p := inf_{x∈X} {f(x) + g(Ax)},
d := sup_{y∗∈Y∗} {−f∗(−A∗y∗) − g∗(y∗)}.
(c) X and Y are Banach spaces, f and g are l.s.c., and the set
Z := cone(dom(g) − A(dom(f)))
is a closed subspace of Y .
(d) X = Rn, Y = Rm, and 0 ∈ ri(dom(g) − A(dom(f))).
Then we have the equality p = d. Furthermore, if the number p is finite, then
the supremum in the definition of d is attained.
which also ensure that the supremum in the definition of d is attained. Finally,
it is easy to observe from the definitions that the qualification condition (4.44)
holds provided that the simpler one (4.45) is satisfied. This, therefore, com-
pletes the proof of the theorem.
Corollary 4.64 Similar to the above, denote the optimal values of the primal and dual problems (4.46) and (4.47) by, respectively,
p := inf_{x∈X} { f(x) + g(Ax − b) },

d := sup_{y∗∈Y∗} { −f∗(A∗y∗) − g∗(−y∗) + ⟨y∗, b⟩ }.
Proof. Define g̃(y) := g(y − b) for y ∈ Y and apply Proposition 4.62 and Theorem 4.63 to problem (4.42), where g is replaced by g̃. Calculating the conjugate function of g̃ verifies all the conclusions of this corollary.
For another application of the duality theorem for (4.48), consider the
problem of finding the Euclidean distance from a point x̄ ∈ Rn to the set

Ω := { x ∈ Rn | ⟨a, x⟩ = b } with 0 ≠ a ∈ Rn, b ∈ R.

This can be written as the optimization problem

minimize f(x) := (1/2) ‖x − x̄‖² subject to ⟨a, x⟩ = b.

Since f∗(y∗) = (1/2) ‖y∗‖² + ⟨y∗, x̄⟩ for all y∗ ∈ Rn, the Fenchel dual problem (4.49) is written as follows:

maximize −f∗(ta) + tb = −(1/2) ‖ta‖² − ⟨ta, x̄⟩ + tb subject to t ∈ R.

This is a simple maximization problem on R, and the duality theorem yields

d = (⟨a, x̄⟩ − b)² / (2 ‖a‖²) = p.

Thus we arrive at the well-known formula for the Euclidean distance function:

d(x̄; Ω) = √(2p) = |⟨a, x̄⟩ − b| / ‖a‖.
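This computation is easy to check numerically. The following sketch (an illustration added here, not part of the text) solves the one-dimensional dual in closed form and compares √(2p) with the distance formula; the data a, b, x̄ below are arbitrary choices.

```python
import numpy as np

# Illustrative check (not from the text): the dual optimal value equals
# p = (1/2) d(xbar; Omega)^2, so d(xbar; Omega) = |<a, xbar> - b| / ||a||.
a = np.array([3.0, 4.0])
b = 2.0
xbar = np.array([1.0, -1.0])

# Closed-form maximizer of the concave dual t |-> -0.5 t^2 ||a||^2 - t<a,xbar> + t b
t_opt = (b - a @ xbar) / (a @ a)
d = -0.5 * t_opt**2 * (a @ a) - t_opt * (a @ xbar) + t_opt * b

dist_formula = abs(a @ xbar - b) / np.linalg.norm(a)
assert abs(np.sqrt(2 * d) - dist_formula) < 1e-12
```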
The next example concerns the class of unconstrained optimization prob-
lems with nondifferentiable objectives in arbitrary normed spaces.
• X and Y are Banach spaces, f is l.s.c., Ω is closed, and the set cone(Ω −
A(dom(f ))) is a closed subspace of Y .
We apply the obtained duality result in the last case to finding the distance from a point x̄ ∈ Rn to the set

Θ := { x ∈ Rn | ⟨a, x⟩ ≤ b } with 0 ≠ a ∈ Rn and b ∈ R.

This problem can be rewritten in the optimization form of (4.52) as

minimize f(x) := (1/2) ‖x − x̄‖² subject to Ax ∈ Ω (4.53)

with A(x) := ⟨a, x⟩ for x ∈ Rn and Ω := (−∞, 0] + b. The dual problem of (4.53) is the one-dimensional one written as

maximize −σ_{(−∞,0]}(t) − bt − (1/2) t² ‖a‖² + ⟨ta, x̄⟩ subject to t ∈ R.

The latter problem can be easily solved giving us the optimal value

d = 0 if ⟨a, x̄⟩ ≤ b, and d = (⟨a, x̄⟩ − b)² / (2 ‖a‖²) otherwise.

Thus the duality theorem tells us that the distance in question is calculated by

d(x̄; Θ) = 0 if ⟨a, x̄⟩ ≤ b, and d(x̄; Θ) = (⟨a, x̄⟩ − b) / ‖a‖ otherwise.
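As a numerical illustration (added here, not part of the text), the piecewise formula can be compared against the explicit projection onto the boundary hyperplane; the data below are arbitrary.

```python
import numpy as np

# Illustrative check (not from the text): distance to the half-space
# Theta = {x : <a, x> <= b} via the piecewise dual formula vs. direct projection.
def dist_halfspace(a, b, xbar):
    s = a @ xbar - b
    return 0.0 if s <= 0 else s / np.linalg.norm(a)

a, b = np.array([1.0, 2.0]), 1.0
inside = np.array([-1.0, 0.5])      # <a, inside> = 0 <= 1, so the distance is 0
outside = np.array([2.0, 1.0])      # <a, outside> = 4 > 1

# Projection of the outside point onto the boundary hyperplane <a, x> = b
proj = outside - ((a @ outside - b) / (a @ a)) * a
assert dist_halfspace(a, b, inside) == 0.0
assert abs(dist_halfspace(a, b, outside) - np.linalg.norm(outside - proj)) < 1e-12
```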
convex. Note that (4.54) is a minimization problem with the convex objective f + (−g).
Along with the Fenchel (convex) conjugate f ∗ from Definition 4.1, define
the concave conjugate of g by
g∗(x∗) := inf_{x∈X} { ⟨x∗, x⟩ − g(x) } for all x∗ ∈ X∗ (4.55)
and observe that, while in general g∗ ≠ (−g)∗, we always have the relationship

g∗(x∗) = −(−g)∗(−x∗) whenever x∗ ∈ X∗.
Recall also that the concavity of a function g : X → [−∞, ∞) can be fully
characterized by the convexity of its hypograph
hypo(g) := { (x, α) ∈ X × R | α ≤ g(x) }.
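The relationship g∗(x∗) = −(−g)∗(−x∗) can be illustrated numerically. The sketch below (an illustration added here, not from the text) discretizes the real line and checks the identity for the concave quadratic g(x) = −x²/2, for which both sides equal −(x∗)²/2.

```python
import numpy as np

# Illustrative check (not from the text) of g_*(x*) = -(-g)^*(-x*)
# for the concave function g(x) = -x^2/2, on a discretized real line.
xs = np.linspace(-50, 50, 100001)
g = -xs**2 / 2

def concave_conj(xstar):          # g_*(x*) = inf_x { x* x - g(x) }
    return np.min(xstar * xs - g)

def convex_conj(f, xstar):        # f^*(x*) = sup_x { x* x - f(x) }
    return np.max(xstar * xs - f)

for xstar in (-2.0, 0.0, 1.5):
    lhs = concave_conj(xstar)
    rhs = -convex_conj(-g, -xstar)
    assert abs(lhs - rhs) < 1e-6          # both equal -x*^2/2 here
```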
Before the formulation and proof of the main duality theorem given below,
we present the following simple lemma about some properties of intrinsic
relative and quasi-relative interiors as well as quasi-regularity of convex sets
that are taken from Definition 2.168.
Lemma 4.68 Let Ω be a convex subset of a topological vector space X, and
let q ∈ X. Then we have:
(a) iri(q + Ω) = q + iri(Ω).
(b) qri(q + Ω) = q + qri(Ω).
(c) Ω is quasi-regular if and only if Ω + q is quasi-regular.
Proof. Fix any x ∈ Ω and observe easily that
cone(q + Ω − x) = cone(Ω − (x − q)) and cl cone(q + Ω − x) = cl cone(Ω − (x − q)).
Then we deduce from the definitions of iri and qri that x ∈ iri(q + Ω) if and
only if x − q ∈ iri(Ω), and that x ∈ qri(q + Ω) if and only if x − q ∈ qri(Ω).
This readily verifies both assertions (a) and (b). Assertion (c) follows directly
from (a) and (b) and the definition of quasi-regularity.
Now we are ready to establish the aforementioned duality theorem for
problem (4.54) written in the difference form.
Theorem 4.69 Let f : X → (−∞, ∞] be a proper convex function, and let
g : X → [−∞, ∞) be a proper concave function defined on an LCTV space X.
Then we have the duality relationship
inf { f(x) − g(x) | x ∈ X } = sup { g∗(x∗) − f∗(x∗) | x∗ ∈ X∗ } (4.56)
provided that the following conditions are satisfied simultaneously:
(a) qri(dom(f)) ∩ qri(dom(g)) ≠ ∅.
(b) All the three convex sets dom(f ) − dom(g), epi(f ), and epi(f ) − hypo(g)
are quasi-regular.
300 4 ENHANCED CALCULUS AND FENCHEL DUALITY
Proof. Observe first that for any x ∈ X and x∗ ∈ X ∗ we have the inequalities
f(x) + f∗(x∗) ≥ ⟨x∗, x⟩ ≥ g(x) + g∗(x∗),
which immediately yield the estimate
inf { f(x) − g(x) | x ∈ X } ≥ sup { g∗(x∗) − f∗(x∗) | x∗ ∈ X∗ }.
Denoting α := inf{f (x) − g(x) | x ∈ X}, it is easy to see that (4.56) holds if
α = −∞. Considering the case where α is finite, we are going to show that
there exists x∗ ∈ X ∗ such that g∗ (x∗ ) − f ∗ (x∗ ) ≥ α, which would readily
justify (4.56). To proceed, define the sets
Ω1 := epi(f) and Ω2 := { (x, μ) ∈ X × R | μ ≤ g(x) + α }.
Since the set epi(f ) is quasi-regular, we get by Theorem 2.190 that
qri(Ω1) = { (x, λ) ∈ X × R | x ∈ qri(dom(f)), f(x) < λ },

qri(Ω2) ⊃ { (x, μ) ∈ X × R | x ∈ qri(dom(g)), μ < g(x) + α }.
It follows from the qualification condition qri(dom(f)) ∩ qri(dom(g)) ≠ ∅ in (a) that qri(Ω1) ≠ ∅ and qri(Ω2) ≠ ∅. Thus qri(Ω1 × Ω2) = qri(Ω1) × qri(Ω2) ≠ ∅; see Exercise 2.226. Using f(x) ≥ g(x) + α for all x ∈ X yields qri(Ω1) ∩ Ω2 = ∅, and so qri(Ω1) ∩ qri(Ω2) = ∅.
Observing further that Ω2 = hypo(g) + {(0, α)}, we get
Ω1 − Ω2 = epi(f ) − hypo(g) − {(0, α)}.
It follows from Lemma 4.68 and the imposed assumptions in (b) that the
set Ω1 − Ω2 is quasi-regular. This allows us to apply Theorem 2.184, which
ensures that the sets Ω1 and Ω2 can be properly separated. Thus there exists
a pair (u∗ , β) ∈ X ∗ × R satisfying the following two conditions:
inf_{(x,λ)∈Ω1} { ⟨u∗, x⟩ + βλ } ≥ sup_{(y,μ)∈Ω2} { ⟨u∗, y⟩ + βμ },

sup_{(x,λ)∈Ω1} { ⟨u∗, x⟩ + βλ } > inf_{(y,μ)∈Ω2} { ⟨u∗, y⟩ + βμ }.
sup_{x∈dom(f)} ⟨u∗, x⟩ > inf_{y∈dom(g)} ⟨u∗, y⟩.
Thus the sets dom(f ) and dom(g) can be properly separated, which implies
by the characterization of Theorem 2.184 that
qri(dom(f)) ∩ qri(dom(g)) = ∅, which contradicts the qualification condition in (a).
Proof. Theorem 2.188 tells us that the sets dom(f ) − dom(g), epi(f ), and
epi(f ) − hypo(g) are quasi-regular under the imposed assumptions. Applying
Theorem 4.69, we arrive at the conclusion of the corollary.
The next consequence of Theorem 4.69 involves the relative interior notion
for convex sets in LCTV spaces defined in (2.49). Recall that, in contrast to
the case of finite-dimensional spaces, nonempty convex sets may have empty
relative interiors in infinite dimensions.
Corollary 4.71 Let X be an LCTV space, and let f : X → (−∞, ∞] and
g : X → [−∞, ∞) be as in Theorem 4.69. Suppose that the sets dom(f ) −
dom(g), epi(f ), and epi(f ) − hypo(g) have nonempty relative interiors and
that the qualification condition
ri dom(f ) ∩ ri dom(g) = ∅ (4.59)
is satisfied. Then we have the Fenchel duality (4.56).
Proof. Since the sets dom(f ) − dom(g), epi(f ), and epi(f ) − hypo(g) have
nonempty relative interiors, we apply Theorem 2.174 and conclude that they
are quasi-regular. The duality result now follows from Theorem 4.69.
(b) Clarify the possibility of avoiding the polyhedrality assumption on the function f in case (b) of Theorem 4.27 by replacing the qualification condition (4.16) with

qri(dom(f)) ∩ qri(dom(g)) ≠ ∅. (4.60)
Exercise 4.86 Formulate and prove the counterparts of Theorem 4.60 in the
cases of Banach and finite-dimensional spaces.
Exercise 4.87 In the setting of Theorem 4.63 do the following:
(a) Give a detailed proof of the theorem under the assumptions in (b).
The main focus of this chapter is on duality, which is at the core of convex analy-
sis and its applications. As mentioned in the commentaries to Chapter 3 (see also
Chapter 7 and the commentaries therein), the generalized differential concepts
and calculus rules of convex analysis have been widely extended to general frame-
works of nonconvex variational analysis. However, this is not the case for convex
duality and its applications based on conjugate functions and their calculus.
The notion of conjugate functions from Definition 4.1 was introduced and
largely developed by Fenchel in [130] and [131] for convex functions on finite-
dimensional spaces. In more special contexts, similar notions were originated by
Adrien-Marie Legendre (1752–1833) for gradient mappings in classical
mechanics and by William Henry Young (1863–1942) in the framework of non-
decreasing convex functions defined on the set of nonnegative numbers; see [357].
In some publications, conjugate functions are called the Legendre-Fenchel and
Young-Fenchel transforms. After the seminal work by Fenchel [130, 131] in finite
dimensions, various properties of conjugates were studied by Fenchel’s student
Brøndsted [63], Moreau [262, 263], and Rockafellar [304] in infinite-dimensional
spaces. Subsequent developments on Fenchel conjugates and their applications
can be found in the books by Bauschke and Combettes [34], Borwein and Lewis
[48], Boţ [55], Castaing and Valadier [71], Hiriart-Urruty and Lemaréchal [164,
165], Ioffe and Tikhomirov [174], Rockafellar [306, 309], Rockafellar and Wets
[317], Simons [326], Zălinescu [361], and the references therein.
The fundamental biconjugate results of Theorem 4.15 and the conjugate sub-
differential relationships in Theorem 4.17 are due to Fenchel [131] in finite dimen-
sions and to Moreau [263] in LCTV spaces.
Support functions, which were originally defined by Minkowski [220] for
bounded convex sets in finite dimensions, have been largely studied and
4.8 Commentaries to Chapter 4 305
rules and/or their specifications can be found in one or another place in the
literature under various qualification conditions; see, e.g., the books [34, 42,
54, 55, 174, 210, 237, 294, 309, 361] and the references therein.
Section 4.3 contains classical material on directional derivatives of convex
functions that is used in Section 4.4 for the subdifferential study of supremum
functions. The main result of Section 4.4, Theorem 4.54, is also well known.
We refer the reader to Valadier [341], Pshenichnyi [294], Ioffe and Levin [172],
Ioffe and Tikhomirov [174], and Zălinescu [361] with the bibliographies therein
for the results of this type under the continuity assumption on g(·, x). Note
that subdifferentiation of the maxima of infinitely many convex functions
over compact index sets is significantly more challenging than in the case
when only a finite number of functions are involved in the maximization. The
latter class of maximum functions was first investigated in subdifferential
theory by Dubovitskii and Milyutin [113] and Danskin [95]; see Demyanov
and Malozemov [97] for related results and also Demyanov and Rubinov [101]
for further extensions to the class of quasidifferentiable functions.
In more recent years, strong attention has been paid to subdifferentiation
of convex supremum functions (4.29), where great progress has been made
in the following two directions: (1) the function g(t, ·) under the supremum
operation is discontinuous at the reference point x of its domain, and (2) the
index set T is not compact and may be an arbitrary set with no topology
involved. The results in these extended settings added the normal cone to
the domain of the supremum function to the subdifferential formula (4.29)
and also used small perturbations of the index set T and ε-expansions of the
convex subgradient mappings for g(·, x). More details can be found in, e.g.,
Correa et al. [82], Hantoute et al. [151], and the bibliographies therein with
applications to convex semi-infinite programs (SIPs).
Versions of subdifferential results for nonconvex maximum and supremum
functions have also been developed in nonsmooth analysis. Namely, Clarke
evaluated in [74, 76] his generalized gradients for pointwise maxima of Lips-
chitz continuous functions over compact sets under appropriate assumptions
that allowed him to reduce the situation to convex analysis. Subsequent devel-
opments in this direction were given by Hiriart-Urruty [159] (see also his book
with Lemaréchal [164]) and by Zheng and Ng [367]. Later on, Borwein and
Zhu [53] established some “fuzzy” upper estimates of regular/Fréchet subgra-
dients for pointwise maxima of Lipschitzian functions. In the other lines of
developments, Mordukhovich and Nghia [249, 250] evaluated Clarke’s general-
ized gradients of Lipschitzian supremum functions (4.29) over metrizable (not
generally compact) index sets T as well as Mordukhovich’s limiting subgradi-
ents of (4.29) over arbitrary index sets with applications to nonconvex SIPs
described by Lipschitzian functions. The reader can find further results and
discussions in [249, 250] and in Mordukhovich’s book [229, Chapter 8] with
the commentaries therein. Quite recently, Pérez-Aros [289] obtained upper
estimates of the regular and limiting subdifferentials of pointwise suprema of
l.s.c. functions.
and their conjugates were established by Rockafellar [305, 308, 309], Castaing
and Valadier [71], and their followers. We refer the reader to, e.g., the recent
paper by Correa et al. [83] with the bibliography therein for the current stage
of developments in this and related directions.
The first nonconvex extension of the Leibniz rule (4.63) was obtained by
Clarke [75] (see also [76]) in terms of his generalized gradients of Lipschitzian
functions by replacing the equality in (4.63) with the inclusion “⊂”, while
the equality was proved therein under an additional regularity condition. The
construction of the generalized gradients (discussed in Chapter 7 below) was
instrumental to reduce the nonconvex case to the convex one resolved by Ioffe
and Levin [172].
Mordukhovich obtained in [228, Lemma 6.18] the first version of the Leib-
niz rule for his limiting subdifferential of locally Lipschitzian functions defined
on separable and reflexive Banach spaces in the inclusion form with the
replacement of right-hand side in (4.63) by its norm closure for the Bochner
integral on T = [0, 1] in the case of infinite-dimensional spaces. Far-going
extensions of this result in various directions, with and without the closure
operation, were established by Mordukhovich and Sagara [255] by replacing
the Bochner integral with the more suitable Gelfand integration in duals of
Banach spaces. While the main motivation and applications in [228] came from
optimal control, those in [255] were strongly addressed in stochastic dynamic
programming in Banach spaces and economic modeling; see also [256]. Note
that the fundamental Lyapunov convexity theorem on the range of nonatomic
vector measures (due to Alexey Lyapunov [213]) and its infinite-dimensional
versions (see, e.g., [106]) play an important role in integration of set-valued
mappings.
Various results on subdifferentiation of nonconvex integral functionals have
been subsequently obtained in the literature over the years. We refer the reader
to, e.g., Chieu [72], Correa et al. [84], Giner and Penot [142], Mordukhovich
and Pérez-Aros [252], and the bibliographies therein. Observe that the results
of [84] address both Lipschitzian and non-Lipschitzian functionals and estab-
lish, in particular, a limiting subdifferential version of the extended Leibniz
rule (4.64) in the latter case. Note also that the quite recent paper by Mor-
dukhovich and Pérez-Aros [251] applies new extremal principles and integra-
tion of normal cone mappings to evaluate normal cones to essential intersec-
tions of random constraint sets defined on measure spaces. Finally, the same
authors establish in [253], for the first time in the literature, Leibniz-type rules
for coderivatives of expected-integral multifunctions related, in particular, to
two-stage stochastic programming. The results obtained in [251–253] are new
even in convex frameworks. We also refer the reader to the survey article by
Hess [158] for a broad overview on the integration of random set-valued map-
pings and set-valued probability theory with numerous applications to various
specific classes of stochastic problems.
The vast majority of publications on generalized differentiation of integral
functionals have been motivated by applications to various classes of problems
different way; we derive it here from the general result for marginal functions.
Observe finally that the subdifferential formulas for convex marginal functions
have been recently extended in [88, 89] to arbitrary vector spaces without
topological structures.
Much has been written in the literature about Fenchel duality, which is
at the heart of convex optimization and its applications. The fundamental
contributions by Fenchel, Moreau, and Rockafellar on conjugate functions, as
well as of their followers mentioned at the beginning of the commentaries to
this chapter, are strongly related to Fenchel duality. The results presented in
Subsection 4.6.1 can be found, in different versions, in various publications
with our further elaborations in the presented examples. Theorem 4.63 on
strong duality is known as the Fenchel-Rockafellar theorem. The polyhedral
version (b) of this theorem is taken from [91].
Subsection 4.6.2 addresses the usage of generalized interior notions in
duality theory for convex optimization problems in LCTV spaces. This
line of research has been initiated by Borwein and Lewis [47] whose intro-
duction of quasi-relative interiors was largely motivated by extensions of
Fenchel duality. Subsequent developments on duality theory and its appli-
cations for various classes of infinite-dimensional convex problems under
appropriate quasi-relative interior constraint qualifications can be found in
[55, 56, 94, 143, 144, 181, 280, 360, 362] among other publications. It seems
that the result of Theorem 4.69 is new; see [90] for further elaborations, dis-
cussions, and applications.
5
VARIATIONAL TECHNIQUES AND
FURTHER SUBGRADIENT STUDY
We start this chapter with the study of variational structures for func-
tions and sets in complete metric and normed spaces. The major varia-
tional and extremal principles, being held even in nonconvex frameworks, are
largely related to and motivated by the developments on convexity. Varia-
tional/extremal principles and variational techniques elaborated in this chap-
ter in complete spaces are then applied to establishing density results for ε-
subgradients of convex functions and to developing ε-subdifferential calculus
with the further applications to convex mean value theorems, subdifferential
monotonicity, characterizations of the Fréchet and Gâteaux differentiability
together with their generic properties, and finally to deriving subgradient for-
mulas for spectral and singular functions in convex analysis. Our major results
hold in Banach spaces, but some results and proofs are valid in general set-
tings of complete metric and topological vector spaces, while those for spectral
and singular functions are primarily finite-dimensional.
Proof. Since x ∈ F (x) for every x ∈ X, we first observe that F (xk ) = ∅ for
all k ∈ N. The lower semicontinuity of f ensures that the set F (xk ) is closed
for each k ∈ N. To check next that F (xk+1 ) ⊂ F (xk ) for all k, pick any
y ∈ F (xk+1 ) and get
f(y) + (ε/λ) d(xk+1, y) ≤ f(xk+1), k ∈ N.
It follows by the triangle inequality that
f(y) + (ε/λ) d(xk, y) ≤ f(y) + (ε/λ) d(xk+1, y) + (ε/λ) d(xk, xk+1)
≤ f(xk+1) + (ε/λ) d(xk, xk+1) ≤ f(xk),
5.1 Variational Principles and Convex Geometry 313
where the last estimate holds due to xk+1 ∈ F (xk ). Employing the classical
Cantor intersection theorem allows us to complete the proof by showing that
diam F (xk ) → 0 as k → ∞. (5.3)
To proceed, fix any y ∈ F (xk+1 ) and observe that
To proceed, fix any y ∈ F(xk+1) and observe that

(ε/λ) d(y, xk+1) ≤ f(xk+1) − f(y)
≤ inf_{x∈F(xk)} f(x) + 1/k − f(y)
≤ inf_{x∈F(xk+1)} f(x) + 1/k − f(y) ≤ 1/k.
Invoking again the triangle inequality tells us that d(y, u) ≤ 2λ/(kε) for all
y, u ∈ F (xk+1 ), which therefore finishes the proof of the lemma.
where the last one follows from the inclusion z ∈ F (yk ) whenever k ∈ N. This
yields x ∈ F(yk) for all k ∈ N, which contradicts the condition x ≠ z.
It follows from z ∈ Y and the construction of Y that (a) is satisfied. Then
we deduce from (a) and the above estimates that
f(z) + (ε/λ) d(z, x0) ≤ f(x0) ≤ inf_{x∈Y} f(x) + ε ≤ f(z) + ε,

and thus d(z, x0) ≤ λ, which verifies (b). It remains to show that (c) is satisfied for x ∈ X \ {z} with x ∉ Y. Indeed, for such x we have
f(x) + (ε/λ) d(x, x0) > f(x0) ≥ f(z) + (ε/λ) d(z, x0).
Using finally the triangle inequality gives us

f(x) > f(z) + (ε/λ)( d(z, x0) − d(x, x0) ) ≥ f(z) − (ε/λ) d(x, z),
which verifies (c) and thus completes the proof of the theorem.
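Conditions (a) and (b) can be illustrated in one dimension. The sketch below (an illustration added here, not from the text) takes f(x) = eˣ on R, whose infimum 0 is not attained, starts from an ε-minimizer x0, and locates a minimizer z of the perturbed function f(x) + (ε/λ)|x − x0| on a grid.

```python
import math
import numpy as np

# Illustrative 1-D sketch (not from the text): Ekeland's variational principle
# for f(x) = exp(x) on R, whose infimum 0 is not attained. Starting from an
# eps-minimizer x0, the perturbed function f(x) + (eps/lam)|x - x0|
# attains its minimum at some z close to x0.
eps, lam = 0.1, 1.0
x0 = math.log(eps)                              # f(x0) = eps <= inf f + eps
xs = np.linspace(x0 - 10, x0 + 10, 200001)      # grid around x0
phi = np.exp(xs) + (eps / lam) * np.abs(xs - x0)
z = xs[np.argmin(phi)]

# Conditions (a) and (b) of the theorem, up to grid tolerance
assert np.exp(z) + (eps / lam) * abs(z - x0) <= np.exp(x0) + 1e-9
assert abs(z - x0) <= lam + 1e-6
```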
Observe that for Banach spaces X the distance function in Theorem 5.2
is induced by the norm function, which is intrinsically nonsmooth while being
convex. Applying first the generalized Fermat rule to the minimizer z in con-
dition (c) of the theorem and then using the subdifferential sum rule in the
case of convex functions f , we arrive at the following result, which can be
treated as the subdifferential variational principle of convex analysis.
Proof. Take the element z ∈ X satisfying all the conclusions of Theorem 5.2
in the case of the Banach space X. Then we immediately get the conditions
in (a) and (b) of this theorem and observe that condition (c) of Theorem 5.2
means that z is a minimizer of the function
ϕ(x) := f(x) + (ε/λ) ‖x − z‖ for x ∈ X.
The generalized Fermat rule from Proposition 3.29 tells us that
0 ∈ ∂ϕ(z) = ∂( f(·) + (ε/λ) ‖ · − z‖ )(z). (5.4)
Applying now the subdifferential sum rule from Theorem 3.48 to the summa-
tion function in (5.4) taking into account the calculation of subgradients of
the norm function in (3.48), we arrive at
0 ∈ ∂f(z) + (ε/λ) B∗.

This gives us a subgradient z∗ ∈ ∂f(z) with ‖z∗‖ ≤ ε/λ and therefore completes the proof of the theorem.
Now we are ready to establish both approximate and exact versions of the
extremal principle for closed and convex subsets in arbitrary Banach spaces
without any nonempty interior assumptions.
Theorem 5.5 Let Ω1 and Ω2 be closed and convex subsets of a Banach space
X, and let x be any common point of Ω1 , Ω2 . Consider the following three
assertions concerning these sets and the normal cones to them:
(a) The sets Ωi for i = 1, 2 form an extremal system in X.
(b) For each ε > 0 we have the conditions
∃ xiε ∈ B(x; ε) ∩ Ωi and x∗iε ∈ N(xiε; Ωi) + εB∗, i = 1, 2, (5.7)

such that x∗1ε + x∗2ε = 0 with ‖x∗1ε‖ = ‖x∗2ε‖ = 1.
necessary but also sufficient for set extremality. Furthermore, the exact
extremal principle (5.8) agrees in this setting with convex separation, and thus
we arrive at a refined separation theorem for closed convex subsets of Banach
spaces without imposing any interiority assumptions; see Remark 2.187 and
more discussions in Section 2.7.
Proof. It is obvious from (5.9) and the definition of boundary points that for
any x ∈ bd(Ω) the sets Ω1 := {x} and Ω2 := Ω form an extremal system
in X. Then both assertions of the theorem follow from the corresponding
statements of Theorem 5.5 due to the normal cone constructions for convex
sets.
We obviously have that ∂0 f (x) = ∂f (x). Note that the ε-subdifferential for
any ε > 0 is also known as the approximate subdifferential of convex analysis.
Observe directly from the definition that for any ε ≥ 0 the ε-subgradient set
∂ε f (x) is always convex and weak∗ closed in X ∗ .
Let us verify that whenever ε > 0 every proper, l.s.c., convex function
is ε-subdifferentiable (i.e., its ε-subgradient set is nonempty) at any point of
its domain. This is indeed different from the exact subdifferential case where ε = 0; see Theorem 3.39. The reason is that taking ε > 0 allows us to apply the strict separation theorem instead of the proper separation argument used in the proof of Theorem 3.39.
Proof. Pick any x̄ ∈ dom(f) and ε > 0. Then (x̄, f(x̄) − ε) ∉ epi(f). By the strict separation result from Theorem 2.61 there is (x∗, −γ) ∈ X∗ × R with

⟨x∗, x⟩ − γλ < ⟨x∗, x̄⟩ − γ( f(x̄) − ε ) for all (x, λ) ∈ epi(f).

Letting x := x̄ and λ := f(x̄) yields γ > 0. Then for any x ∈ dom(f) we have (x, f(x)) ∈ epi(f). This leads us to the inequality

⟨x∗, x⟩ − γ f(x) ≤ ⟨x∗, x̄⟩ − γ( f(x̄) − ε ).

Dividing both sides of the inequality by γ and rearranging the terms give us

⟨x∗/γ, x − x̄⟩ ≤ f(x) − f(x̄) + ε

and therefore verify the claimed condition x∗/γ ∈ ∂εf(x̄) ≠ ∅.
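For a concrete illustration (added here, not from the text), take f(x) = |x| on R and x̄ = 1; a direct computation from the definition gives ∂εf(1) = [1 − ε, 1], which the following sampling sketch confirms.

```python
import numpy as np

# Illustrative check (not from the text): for f(x) = |x| and xbar = 1,
# the eps-subdifferential is [1 - eps, 1], i.e. xstar is an
# eps-subgradient iff  xstar*(x - 1) <= |x| - 1 + eps  for all x.
eps, xbar = 0.5, 1.0
xs = np.linspace(-100, 100, 400001)

def is_eps_subgrad(xstar):
    return bool(np.all(xstar * (xs - xbar) <= np.abs(xs) - abs(xbar) + eps + 1e-12))

assert is_eps_subgrad(1.0) and is_eps_subgrad(1.0 - eps)    # endpoints
assert is_eps_subgrad(0.75)                                 # interior point
assert not is_eps_subgrad(1.0 + 0.01)                       # above 1 fails
assert not is_eps_subgrad(1.0 - eps - 0.01)                 # below 1 - eps fails
```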
Proof. Taking any x ∈ dom(f), we conclude from Theorem 5.9 that ∂εf(x) ≠ ∅ whenever ε > 0. Then the existence of x̄ ∈ dom(f) with ∂f(x̄) ≠ ∅ follows directly from Theorem 5.10.
It suffices to show that Z∗ ⊂ cl(F∗). To proceed, fix z∗ ∈ Z∗ and ε > 0, and then suppose that ε < ‖z∗‖. Choosing x ∈ Ω with

σΩ(z∗) − ε² < ⟨z∗, x⟩,

observe from the definition of σΩ in (4.8) that

⟨z∗, z − x⟩ < ε² = δΩ(z) − δΩ(x) + ε² for all z ∈ Ω,

which yields z∗ ∈ ∂ε² δΩ(x). Applying Theorem 5.10 with λ := ε gives us elements x̄ ∈ dom(δΩ) = Ω and x∗ ∈ ∂δΩ(x̄) = N(x̄; Ω) satisfying

‖x̄ − x‖ < ε and ‖z∗ − x∗‖ < ε.

The second inequality implies that ‖x∗‖ > ‖z∗‖ − ε > 0, and so x∗ ≠ 0. Hence x∗ ∈ F∗, which verifies the density of the set F∗ in Z∗.
This subsection mainly concerns deriving the basic sum rule for ε-gradients of
extended-real-valued convex functions in different space settings. Let us begin
5.2 Calculus Rules for ε-Subgradients 323
Remark 5.20 In the proof of Theorem 5.19 we use the obvious fact that if
a, b ∈ R and a + b ≤ ε with ε ≥ 0, then there exist ε1 , ε2 ≥ 0 such that a ≤ ε1 ,
b ≤ ε2 , and ε1 + ε2 = ε. Indeed, it suffices to take ε1 := a + (ε − a − b)/2 and
ε2 := b + (ε − a − b)/2.
Proof. The inclusion “⊃” in (5.14) is obvious. To verify the opposite one, pick
any x∗ ∈ ∂ε (g ◦ A)(x) and deduce from Proposition 5.18 that
(g ◦ A)(x) + (g ◦ A)∗ (x∗ ) ≤ x∗ , x + ε.
By the conjugate chain rule, which is valid in each of the cases (a)–(c) due
to Theorem 4.28 and its enhanced Banach space version from Corollary 4.42,
we find y∗ ∈ Y∗ such that A∗y∗ = x∗ and that

(g ◦ A)∗(x∗) = g∗(y∗).
This implies therefore that
g(Ax) + g∗(y∗) = (g ◦ A)(x) + (g ◦ A)∗(x∗)
≤ ⟨x∗, x⟩ + ε = ⟨A∗y∗, x⟩ + ε
= ⟨y∗, Ax⟩ + ε.
Employing Proposition 5.18 tells us that y ∗ ∈ ∂ε g(Ax), and so we conclude
that x∗ = A∗ y ∗ ∈ A∗ ∂ε g(Ax), which completes the proof.
In this subsection we derive sum and chain rules of the asymptotic type for
ε-subgradients of convex functions on topological vector spaces that are formu-
lated via the closure operation without imposing any qualification conditions.
As usual, we start with the (asymptotic) ε-subdifferential sum rule. The
theorem below uses the fact that for a proper convex function ψ : X ∗ →
R, where X is an LCTV space, we always have the well-known biconjugate
relationship discussed in Exercise 5.98:
ψ∗∗ = ψ̄, (5.15)

where ψ̄ is the weak∗ l.s.c. convex function on X∗ whose epigraph is the weak∗ closure of epi(ψ) in X∗ × R.
326 5 VARIATIONAL TECHNIQUES . . .
x∗ ∈ { x∗ ∈ X∗ | ϕ(x∗) ≤ γ + ε }.

This implies in turn that whenever ϕ(x∗) ≤ γ + ε/2, it follows that

ϕ(x∗) − γ = inf_{x∗ = x∗1 + x∗2} { f∗(x∗1) + g∗(x∗2) − ⟨x∗1, x⟩ − ⟨x∗2, x⟩ + f(x) + g(x) }
= inf_{x∗ = x∗1 + x∗2} { [ f∗(x∗1) − ⟨x∗1, x⟩ + f(x) ] + [ g∗(x∗2) − ⟨x∗2, x⟩ + g(x) ] } ≤ ε/2.

Thus we find x∗1 and x∗2 such that x∗ = x∗1 + x∗2 and

[ f∗(x∗1) − ⟨x∗1, x⟩ + f(x) ] + [ g∗(x∗2) − ⟨x∗2, x⟩ + g(x) ] < ε.

Since each of the bracketed terms above is nonnegative, we conclude that x∗1 ∈ ∂εf(x) and x∗2 ∈ ∂εg(x). Hence

{ x∗ ∈ X∗ | ϕ(x∗) ≤ γ + ε/2 } ⊂ ∂εf(x) + ∂εg(x),

which completes the proof of the theorem.
which completes the proof of the theorem.
Our next result is the following asymptotic chain rule for ε-subgradients of
compositions in topological vector spaces without any qualification conditions.
Combining now the results of Lemma 5.25 and Lemma 5.26 gives us the
following mean value theorem for convex continuous functions on topological
vector spaces.
Theorem 5.27 In the setting of Lemma 5.26 there exists c ∈ (a, b) such that
f(b) − f(a) ∈ ⟨∂f(c), b − a⟩ := { ⟨x∗, b − a⟩ | x∗ ∈ ∂f(c) }.
Proof. Take the function ϕ : R → R from (5.23) and observe that ϕ(0) = f (a)
and ϕ(1) = f (b). Applying first Lemma 5.25 and then Lemma 5.26 to ϕ gives
us a number t0 ∈ (0, 1) for which
f(b) − f(a) = ϕ(1) − ϕ(0) ∈ ∂ϕ(t0) = { ⟨x∗, b − a⟩ | x∗ ∈ ∂f(c) }
with c := t0 a + (1 − t0 )b, and thus we are done with the proof.
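For a smooth convex function the relation reduces to the classical mean value equality, which can be checked numerically. The sketch below (an illustration added here, not from the text) takes f(x) = x⁴ on [0, 2], where ∂f(c) = {4c³}, and locates c by bisection.

```python
# Illustrative check (not from the text) of the mean value relation
# f(b) - f(a) in <∂f(c), b - a> for the smooth convex f(x) = x^4,
# where ∂f(c) = {4 c^3}. Solve 4 c^3 (b - a) = f(b) - f(a) by bisection.
def f(x):
    return x**4

a, b = 0.0, 2.0
target = f(b) - f(a)                  # = 16
lo, hi = a, b
for _ in range(200):                  # bisection on the increasing map c -> 4 c^3 (b - a)
    c = (lo + hi) / 2
    if 4 * c**3 * (b - a) < target:
        lo = c
    else:
        hi = c

assert a < c < b
assert abs(4 * c**3 * (b - a) - target) < 1e-9
assert abs(c - 2 ** (1 / 3)) < 1e-9   # the exact mean value point here
```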
To deal with advanced issues of convex and variational analysis and their
broad applications to constrained optimization and other areas, we need to
consider not only continuous but lower semicontinuous extended-real-valued
functions. The versions of the mean value theorem given below are largely
different from the classical one and its subdifferential counterparts. Their main feature is the approximate structure, and thus such results are unified under the name of approximate mean value theorems.
To proceed, we present first the next simple albeit useful lemma involving
the distance function and the projection operator.
5.3 Mean Value Theorems for Convex Functions 331
Lemma 5.28 Let X be a normed space, and let a, b ∈ X with a ≠ b. For any x ∈ X, v∗ ∈ ∂d(x; [a, b]), and w ∈ Π(x; [a, b]) we have the estimate ‖v∗‖ ≤ 1 together with the following properties:
(a) ⟨v∗, x − w⟩ ≤ 0 for all x ∈ [a, b].
(b) ⟨v∗, b − a⟩ ≤ 0 if w ≠ b.
(c) ⟨v∗, b − a⟩ = 0 if w ≠ b and w ≠ a.
(d) ⟨v∗, w − x⟩ ≤ −‖w − x‖.
Proof. We know from the Weierstrass theorem that g attains its absolute
minimum on the compact interval [a, b] at some point c ∈ [a, b]. Since g(a) ≤
g(b), we can always suppose that c ∈ [a, b). The l.s.c. property of g ensures
the existence of r > 0 for which
γ := inf_{x∈Ω} g(x) > −∞ with Ω := [a, b] + rB.
The final result of this subsection is the main version of the approximate
mean value theorem; see the commentaries to this chapter.
(‖b − c‖ / ‖b − a‖)( β − f(a) ) ≤ lim inf_{k→∞} ⟨x∗k, b − xk⟩.

Since c ∈ [a, b), we have c = ta + (1 − t)b for some t ∈ (0, 1]; hence b − c = t(b − a) and t = ‖b − c‖ / ‖b − a‖. Then

lim inf_{k→∞} ⟨x∗k, b − xk⟩ ≥ lim inf_{k→∞} ⟨x∗k, t(b − a)⟩ ≥ (‖b − c‖ / ‖b − a‖)( f(b) − f(a) ).
The last statement of the theorem in the case where β = f (b), c ∈ (a, b), and
g(a) = g(b) for the function g in (5.25) follows directly from Theorem 5.29.
To complete the proof of the theorem, it remains to examine the case
where β < f (b). In this case we choose x∗ ∈ X ∗ such that
⟨x∗, b − a⟩ = β − f(a)
and consider the function g(x) from (5.25). Then g(a) < g(b), which allows
us to proceed as above with the usage of Theorem 5.29.
It is easy to check that G is monotone with the graph gph(F ) being a proper
subset of gph(G), which gives us a contradiction.
Assume now that (b) is satisfied. Suppose on the contrary that F is not maximal monotone and find a monotone mapping G : X →→ X∗ such that gph(F) is a proper subset of gph(G). Choose (u, u∗) ∈ gph(G) with (u, u∗) ∉ gph(F). Then for any (x, x∗) ∈ gph(F) we have (x, x∗) ∈ gph(G), which readily implies the inequality

⟨x∗ − u∗, x − u⟩ ≥ 0

and thus contradicts the choice of u∗ ∉ F(u). This verifies that (b) =⇒ (a).
The equivalence between (b) and (c) is obvious.
Proof. Having in hand Proposition 5.33, we employ Lemma 5.32 to verify the maximal monotonicity of F = ∂f. Fix any (u, u∗) ∈ X × X∗ with u∗ ∉ ∂f(u). By Lemma 5.32 it suffices to show that there exists a pair (x, x∗) ∈ X × X∗ with x∗ ∈ ∂f(x) such that

⟨x∗ − u∗, x − u⟩ < 0. (5.26)

To verify this, observe by the choice of u∗ ∉ ∂f(u) that there is z ∈ X with

⟨u∗, z − u⟩ > f(z) − f(u).
Taking into account that z = u, we apply the subdifferential mean value result
from Theorem 5.27 to find a vector xt := tz + (1 − t)u with some t ∈ [0, 1]
and a subgradient x∗t ∈ ∂f (xt ) such that
f (z) − f (u) = x∗t , z − u .
This immediately implies that
u∗ , z − u > f (z) − f (u) = x∗t , z − u ,
and hence x∗t − u∗ , z − u < 0. It shows in turn that
x∗t − u∗ , xt − u = x∗t − u∗ , t(z − u) < 0.
Thus x∗t − u∗ , xt − u < 0, which completes the proof of the theorem.
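The monotonicity inequality at the heart of this proof is easy to probe numerically. The following sketch is an illustration only, not part of the text: it assumes the simple convex function f(x) = |x| + x² on the real line, whose subdifferential can be written down by hand, and checks that ⟨x* − u*, x − u⟩ ≥ 0 for sampled pairs of subgradients.

```python
import itertools
import random

def subgradients(x):
    # ∂f(x) for the convex function f(x) = |x| + x^2:
    # equal to {sign(x) + 2x} for x != 0, and to the interval
    # [-1, 1] at x = 0 (sampled here at its endpoints).
    if x != 0:
        return [(1.0 if x > 0 else -1.0) + 2 * x]
    return [-1.0, 1.0]

random.seed(0)
points = [random.uniform(-5, 5) for _ in range(100)] + [0.0]
for x, u in itertools.product(points, repeat=2):
    for xs in subgradients(x):
        for us in subgradients(u):
            # monotonicity of the subgradient mapping ∂f
            assert (xs - us) * (x - u) >= 0
print("monotonicity verified on all sampled pairs")
```

The choice of f here is arbitrary; any convex function with an explicitly known subdifferential would serve the same purpose.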
\[
0 < \frac{\|u-c\|}{\|u-z\|}\bigl(\beta - g(z)\bigr) \le \liminf_{k\to\infty}\,\langle x_k^*, u - x_k\rangle.
\]
Remark 5.36 The results formulated below shed additional light on the rela-
tionships between the convexity of an extended-real-valued function and the
maximal monotonicity of its subgradient mapping:
(a) The assertion of Theorem 5.35 admits the following reverse statement in
Banach spaces. Namely, if a set-valued mapping F : X ⇒ X* is maximal
monotone, then there exists a proper, l.s.c., and convex function f : X →
R such that F = ∂f . Furthermore, F determines f uniquely up to an
additive constant; see Exercise 5.109 with the hints and references therein
and also the corresponding commentaries in Section 5.9. Summarizing, we
have that a proper l.s.c. function f : X → R is convex if and only if ∂f
is monotone, in which case ∂f is maximal monotone.
(b) In Chapter 7 we discuss major subgradient mappings generated by non-
convex Lipschitzian functions and show that their monotonicity is equiv-
alent to the convexity of these functions, in which case the subgradient
mapping is maximal monotone.
\[
\begin{aligned}
f'_-(x; v) &= \lim_{t\uparrow 0}\frac{f(x+tv)-f(x)}{t}\\
&= -\lim_{t\uparrow 0}\frac{f(x+(-t)(-v))-f(x)}{-t}\\
&= -\lim_{t\downarrow 0}\frac{f(x+t(-v))-f(x)}{t}\\
&= -f'(x; -v) = \langle x^*, v\rangle.
\end{aligned}
\]
340 5 VARIATIONAL TECHNIQUES . . .
Example 5.42 Consider the normed space ℓ¹ containing all the sequences
x = (x_k) of real numbers x₁, x₂, . . . endowed with the norm
\[
\|x\|_1 := \sum_{k=1}^{\infty} |x_k| < \infty.
\]
To proceed with verifying (5.29), observe first the obvious fact that for any
a = (a_k) and b = (b_k) in ℓ¹ we have
\[
\sum_{k=1}^{\infty} a_k - \sum_{k=1}^{\infty} b_k \le \sum_{k=1}^{n} (a_k - b_k) + \sum_{k=n+1}^{\infty} |a_k| + \sum_{k=n+1}^{\infty} |b_k| \quad \text{if } n \in \mathbb{N}, \tag{5.30}
\]
where the last estimate holds due to (|x_k + tv_k| − |x_k|)/t ≤ |v_k| for all k ∈ N.
To proceed further, fix any ε > 0 and find k0 ∈ N such that
\[
2 \sum_{k=k_0+1}^{\infty} |v_k| < \frac{\varepsilon}{2}.
\]
\[
\lim_{\|h\|\to 0} \frac{f(x+h)+f(x-h)-2f(x)}{\|h\|} = 0.
\]
Choose the sequence {h_k} ⊂ ℓ¹ as follows:
\[
h_1 = (2x_1, 0, 0, \ldots),\quad h_2 = (0, 2x_2, 0, \ldots),\quad \ldots
\]
and observe by \(\sum_{k=1}^{\infty}|x_k| < \infty\) that ‖h_k‖ = 2|x_k| → 0 as k → ∞. We have
\[
\lim_{k\to\infty}\frac{f(x+h_k)+f(x-h_k)-2f(x)}{\|h_k\|} = \lim_{k\to\infty}\frac{\|x+h_k\|+\|x-h_k\|-2\|x\|}{\|h_k\|},
\]
which implies by a direct calculation that
\[
\lim_{k\to\infty}\frac{\|x+h_k\|+\|x-h_k\|-2\|x\|}{\|h_k\|} = \lim_{k\to\infty}\frac{3|x_k|+|x_k|-2|x_k|}{2|x_k|} = 1.
\]
This contradiction shows that f is not Fréchet differentiable at x and thus
confirms that the finite dimension of X is essential in Theorem 5.40.
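The computation in Example 5.42 can be replayed numerically on a finite truncation of ℓ¹. The sketch below is an illustration, not part of the text; it assumes the concrete point x_k = 2^{-k}. Along each perturbation h_k, the second-order difference quotient equals 1 exactly, so it cannot tend to 0 as Fréchet differentiability would require.

```python
# Finite truncation of Example 5.42: x_k = 2^{-k} in (a truncated) ℓ¹,
# and h_k the sequence with 2*x_k in slot k and zeros elsewhere.
N = 30
x = [2.0 ** (-k) for k in range(1, N + 1)]

def norm1(v):
    # the ℓ¹ norm of a (finite) sequence
    return sum(abs(t) for t in v)

for k in range(N):
    h = [0.0] * N
    h[k] = 2 * x[k]
    xp = [a + b for a, b in zip(x, h)]  # x + h_k
    xm = [a - b for a, b in zip(x, h)]  # x - h_k
    ratio = (norm1(xp) + norm1(xm) - 2 * norm1(x)) / norm1(h)
    assert abs(ratio - 1.0) < 1e-9
print("the ratio is identically 1 along h_k, as in the example")
```

The powers of two are exactly representable in floating point, so the ratio comes out as exactly 1 here; any summable sequence of nonzero terms would exhibit the same limit.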
which tells us therefore that g is Fréchet differentiable at any point in L2 [0, π].
It remains to show that f is nowhere Fréchet differentiable on L1 [0, π]. To
proceed, we first prove that for each x ∈ L¹[0, π] there exists v ∈ L¹[0, π] such
that the Lebesgue measure of the set {t ∈ [0, π] | sin(x(t) + v(t)) −
sin(x(t)) − v(t) cos(x(t)) ≠ 0} is positive. Indeed, suppose on the contrary that
for all v ∈ L¹[0, π] the measure of this set is zero. Take any rational number
q and define v_q(t) = q for all t ∈ [0, π]. Then the set
\[
S_q := \bigl\{t \in [0, \pi] \bigm| \sin(x(t) + q) - \sin(x(t)) - q\cos(x(t)) \ne 0\bigr\}
\]
is of measure zero and so is the set S := ∪_{q∈Q} S_q. This ensures that for all
rational numbers q and t ∉ S we get sin(x(t) + q) − sin(x(t)) = q cos(x(t)),
and therefore, by continuity in the increment, sin(x(t) + r) − sin(x(t)) = r cos(x(t)) whenever r ∈ R. We come
to a contradiction, since the function r ↦ r cos(x(t)) is linear in r while r ↦ sin(x(t) + r) −
sin(x(t)) is not. To proceed further, choose v₀ ∈ L¹[0, π] such that
\[
\mu\bigl(\bigl\{t \in [0, \pi] \bigm| \sin\bigl(x(t)+v_0(t)\bigr) - \sin\bigl(x(t)\bigr) - v_0(t)\cos\bigl(x(t)\bigr) \ne 0\bigr\}\bigr) > 0,
\]
where μ denotes the Lebesgue measure. Without loss of generality, suppose
that there exists α > 0 for which
\[
\mu\bigl(\bigl\{t \in [0, \pi] \bigm| \sin\bigl(x(t)+v_0(t)\bigr) - \sin\bigl(x(t)\bigr) - v_0(t)\cos\bigl(x(t)\bigr) \ge \alpha\bigr\}\bigr) > 0.
\]
Considering further the set
\[
T := \bigl\{t \in [0, \pi] \bigm| \sin\bigl(x(t) + v_0(t)\bigr) - \sin\bigl(x(t)\bigr) - v_0(t)\cos\bigl(x(t)\bigr) \ge \alpha\bigr\},
\]
observe that |v₀| ∈ L¹[0, π] due to v₀ ∈ L¹[0, π]. Thus there exist T₀ ⊂ T and
β > 0 such that μ(T₀) > 0 and |v₀(t)| ≤ β for all t ∈ T₀. Choose a sequence {T_k} ⊂ T₀ such
that T_{k+1} ⊂ T_k, μ(T_k) > 0 for all k ∈ N, and ∩_{k=1}^{∞} T_k = ∅, and then define the
sequence of functions {hk } ⊂ L1 [0, π] by
\[
h_k(t) := \begin{cases} v_0(t) & \text{if } t \in T_k,\\ 0 & \text{if } t \notin T_k. \end{cases}
\]
Then ‖h_k‖ → 0 as k → ∞ together with
\[
\frac{\displaystyle\int_0^{\pi} \bigl(\sin\bigl(x(t) + h_k(t)\bigr) - \sin\bigl(x(t)\bigr) - h_k(t)\cos\bigl(x(t)\bigr)\bigr)\,dt}{\|h_k\|} \ge \frac{\alpha\,\mu(T_k)}{\beta\,\mu(T_k)} = \frac{\alpha}{\beta}.
\]
This readily implies that the condition
\[
\lim_{\|h\|\to 0} \frac{\displaystyle\int_0^{\pi} \bigl(\sin\bigl(x(t) + h(t)\bigr) - \sin\bigl(x(t)\bigr) - h(t)\cos\bigl(x(t)\bigr)\bigr)\,dt}{\|h\|} = 0
\]
fails, which shows that f is not Fréchet differentiable at any point in L¹[0, π].
Proof. The fact that (a) and (b) are equivalent follows from Theorem 5.40.
To verify implication (a)⟹(c), denote v := f'_G(x) and let e_i be the ith vector
in the standard orthonormal basis of Rⁿ. By Theorem 5.38 we have
\[
\lim_{t\to 0}\frac{f(x + te_i) - f(x)}{t} = \langle v, e_i\rangle.
\]
It shows that ∂f/∂x_i(x) = ⟨v, e_i⟩ for all i = 1, . . . , n.
To proceed with the proof of (c)⟹(d), let v_i := ∂f/∂x_i(x) for i = 1, . . . , n
and, remembering that ∂f(x) ≠ ∅, pick any u ∈ ∂f(x). Then
\[
\langle u, h\rangle \le f(x + h) - f(x) \quad \text{for all } h \in \mathbb{R}^n.
\]
In particular, we get the inequality
\[
\langle u, te_i\rangle \le f(x + te_i) - f(x) \quad \text{for all } t \in \mathbb{R}, \tag{5.32}
\]
which implies therefore that for any t > 0 it holds
\[
\langle u, e_i\rangle \le \frac{f(x + te_i) - f(x)}{t}.
\]
Letting now t ↓ 0 gives us ⟨u, e_i⟩ ≤ v_i, and using (5.32) with t < 0 yields
⟨u, e_i⟩ ≥ v_i, and hence ⟨u, e_i⟩ = v_i whenever i = 1, . . . , n. This verifies that
u = v, and thus ∂f(x) = {v}.
Finally, implication (d)=⇒(a) follows from Theorem 5.44 by observing
that f is always continuous at x.
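Implication (a)⟹(c) above says that the Gâteaux derivative is assembled from the n partial derivatives. The following numerical sketch is illustrative only: it assumes the smooth convex log-sum-exp function as a hypothetical example, and the finite-difference step eps is an arbitrary choice. It checks that the vector of partials acts as the unique subgradient.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
x = rng.standard_normal(n)

def f(z):
    # a smooth convex function on R^n: log-sum-exp
    return np.log(np.sum(np.exp(z)))

# Assemble the partial derivatives into a vector v, as in (a) => (c),
# using central differences with an arbitrary small step.
eps = 1e-6
v = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
              for e in np.eye(n)])

# v is the (unique) subgradient: f(x + h) - f(x) >= <v, h> for all h.
for _ in range(300):
    h = rng.standard_normal(n)
    assert f(x + h) - f(x) >= v @ h - 1e-4  # slack for the FD error
print("the vector of partials satisfies the subgradient inequality")
```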
Proof. Fix any x ∈ dom(f ) and suppose on the contrary that ∂f is not upper
semicontinuous at x. Then there exists a weak∗ open subset V of X ∗ con-
taining ∂f (x) and such that we can find a sequence {xk } ⊂ dom(f ) that
converges to x with x∗k ∈ ∂f (xk ) \ V . Since f is convex and continuous at x,
it is Lipschitz continuous around x with some constant ℓ ≥ 0; thus we have
‖x_k*‖ ≤ ℓ for all k ∈ N. The Alaoglu-Bourbaki theorem ensures that {x_k*} has
a weak∗ cluster point x∗ . Then we get x∗ ∈ ∂f (x) \ V by Proposition 5.49.
The obtained contradiction verifies the claimed result.
The following refinement of the above result for the case of Gâteaux dif-
ferentiable functions on normed spaces is rather straightforward.
Proof. We get from Theorem 5.44 that the subgradient mapping of f reduces
to its single-valued Gâteaux derivative counterpart. Furthermore, it is easy to
see that the upper semicontinuity of a single-valued mapping means in fact
its continuity. This verifies the claimed statement.
\[
f(x) - f(\bar{x}) = \langle u^*, x - \bar{x}\rangle.
\]
Then we have ‖u* − v*‖ < ε and therefore arrive at
\[
\frac{f(x) - f(\bar{x}) - \langle v^*, x - \bar{x}\rangle}{\|x - \bar{x}\|} = \frac{\langle u^* - v^*, x - \bar{x}\rangle}{\|x - \bar{x}\|} \le \|u^* - v^*\| < \varepsilon.
\]
This tells us that f is Fréchet differentiable at x̄.
We first recall the definitions of Asplund and weak Asplund spaces, which are
closely related to Fréchet and Gâteaux generic differentiability of continuous
convex functions f : X → R on Banach spaces, i.e., the corresponding differ-
entiability on dense Gδ subsets, where the notation Gδ signifies a countable
intersection of open subsets.
Definition 5.58 Let X be a Banach space.
(a) X is an Asplund space if for every convex continuous f : X → R, the
set of Fréchet differentiability points contains a dense Gδ set in X.
(b) X is a weak Asplund space if for every convex continuous f : X → R,
the set of Gâteaux differentiability points contains a dense Gδ set in X.
These remarkable subclasses of Banach spaces were introduced by Edgar
Asplund [9] in the geometric theory of Banach spaces under the names of
“strong differentiability spaces” and “weak differentiability spaces,” respec-
tively; see the commentaries in Section 5.9 for more discussions. We can see
that the only difference between the definitions of Asplund and weak Asplund
spaces is replacing the dense Fréchet differentiability by its Gâteaux coun-
terpart. But the available results for these two classes of Banach spaces are
dramatically different. While Asplund spaces admit many beautiful character-
izations and useful properties (see more details at the end of Subsection 5.6.2
and also the exercises and commentaries in Sections 5.8 and 5.9), it is not
the case for weak Asplund spaces. However, the latter class contains each
separable Banach space, which is proved below in this subsection.
To begin with, we present a useful result on the almost everywhere (a.e.,
with respect to the Lebesgue measure) differentiability of convex functions
on open intervals of the real line. We actually prove even more: the set of
nondifferentiability points is countable.
\[
\psi(t) := f'_+(t) = \lim_{s\downarrow t}\frac{f(s) - f(t)}{s - t} \quad \text{for } t \in I.
\]
It follows from Lemma 2.115 that ψ : I → R is well-defined and nondecreasing
on I. In addition, for any t0 ∈ I we have
\[
\lim_{t\uparrow t_0}\psi(t) \le f'_-(t_0) \le f'_+(t_0) = \psi(t_0). \tag{5.34}
\]
Proof. Fix any sequence {x_k} ⊂ Ω_α that converges to some x ∈ Ω and show
that x ∈ Ω_α. Choose u_k*, x_k* ∈ ∂f(x_k) so that
\[
\langle x_k^* - u_k^*, a\rangle \ge \alpha.
\]
Remembering that f is locally Lipschitzian around x, suppose without loss
of generality that {x∗k } and {u∗k } are bounded sequences in X ∗ . Since X is
separable, it is known from classical analysis that bounded sets in X ∗ are
metrizable with respect to the weak∗ topology. This allows us to find subse-
quences of {x∗k } and {u∗k } that weak∗ converge to some x∗ and u∗ , respectively.
It is easy to see that x*, u* ∈ ∂f(x) with ⟨x* − u*, a⟩ ≥ α. This verifies that
x ∈ Ω_α, and so the set Ω_α is closed while U_α is an open set in Ω.
Further we show that Uα is dense in Ω. Fixing any x ∈ Ω and a nonzero
element a ∈ X, define the real-valued function
ϕ(t) := f(x + ta) for t ∈ I := {γ ∈ R | x + γa ∈ Ω}.
It follows from Lemma 5.59 that the function ϕ is differentiable everywhere
except a countable set. Hence for any ε > 0 we find a point x₀ := x + t₀a
such that ‖x − x₀‖ < ε and ϕ′(t₀) exists. Pick x*, u* ∈ ∂f(x₀) and observe
that the restrictions of these linear functionals to the line x + aR give us
subgradients of ϕ at t₀. Since ϕ is differentiable at t₀, these restrictions
must agree with each other on the entire line x + aR. In particular, we get
⟨x*, a⟩ = ⟨u*, a⟩. This implies that x₀ ∈ Ω \ Ω_α = U_α, and therefore the set
Uα is dense in Ω, which completes the proof of the lemma.
Based on the above lemmas, we are now ready to derive the following
theorem, which tells us that any separable Banach space is weak Asplund.
Theorem 5.61 Let X be a separable Banach space, and let Ω be a nonempty,
open, and convex subset of X. Then for every convex function f : Ω → R,
which is continuous on Ω = dom(f ), there exists a Gδ dense subset U ⊂ Ω
such that f is Gâteaux differentiable on U .
Proof. Consider an arbitrary dense subset {x_k} of X, which exists due to the
separability of this space. Given any m ∈ N, define the collection of sets
\[
F_{m,k} := \Bigl\{x \in \Omega \Bigm| \langle x^* - u^*, x_k\rangle \ge \frac{1}{m} \text{ for some } x^*, u^* \in \partial f(x)\Bigr\}.
\]
Theorem 5.44 tells us that f is Gâteaux differentiable at x ∈ Ω if and only
if ∂f (x) is a singleton. It follows that f is Gâteaux differentiable at x ∈ Ω if
and only if x ∉ ∪_{m,k} F_{m,k}. Thus f is Gâteaux differentiable at every x ∈ U :=
∩_{m,k} (Ω \ F_{m,k}), which is a Gδ set according to Lemma 5.60.
The next lemma, which is based on Proposition 5.62, is needed for deriving
the main result on the generic Fréchet differentiability presented below.
Lemma 5.64 Let X be a normed space, and let Ω be an α-cone meager set
for some α ∈ (0, 1). Then Ω is nowhere dense in X, i.e., int(Ω) = ∅.
Proof. Suppose on the contrary that int(Ω) ≠ ∅ and then find x ∈ Ω and
ε > 0 such that B(x; ε) ⊂ Ω. It follows from Definition 5.63 due to Proposition 5.62 that for any x₀ ∈ B(x; ε) and x* ∈ X* \ {0} the set K(x₀, x*, α) is
a neighborhood of x₀, and thus
\[
\Omega \cap K(x_0, x^*, \alpha) \ne \emptyset.
\]
This contradicts the α-cone meagerness of Ω and therefore shows that Ω is nowhere dense in X.
The following lemma employs the monotonicity notion for set-valued map-
pings from Definition 5.31. The proof of this lemma contains the main argu-
ments to verify the generic Fréchet differentiability of convex continuous functions on a major subclass of Banach spaces.
Let {u∗k } be a dense set in X ∗ , which exists due to the assumed separability
of the dual space. Fix 0 < α < 1 and define the family of the sets
\[
C_{k,m} := \Bigl\{x \in A_k \Bigm| d\bigl(u_m^*; F(x)\bigr) < \frac{\alpha}{4k}\Bigr\} \quad \text{for all } k, m \in \mathbb{N}.
\]
Then we clearly have the representation
\[
A = \bigcup_{k\in\mathbb{N}} A_k \quad \text{and} \quad A_k = \bigcup_{m\in\mathbb{N}} C_{k,m} \quad \text{for each } k \in \mathbb{N}.
\]
It follows from Lemmas 5.64 and 5.65 that each set Ck,m is nowhere dense.
Define further U = Ω\A and deduce from the classical Baire category theorem
5.7 Spectral and Singular Functions in Convex Analysis 359
The above theorem presents just one result from Asplund space the-
ory. In generality, Asplund spaces constitute a nice and broad subclass of
Banach spaces, which includes—besides spaces with separable duals as in
Theorem 5.66—every reflexive Banach space. One of the most beautiful char-
acterizations of Asplund spaces is as follows: a Banach space where any sep-
arable subspace has a separable dual. We discuss some other important facts
about Asplund spaces in Sections 5.8 and 5.9 with the references therein.
Throughout the whole section, Sn denotes the set of symmetric n×n matrices
with real entries, Sn+ stands for the subset of Sn consisting of positive semidef-
inite matrices, and On denotes the set of n × n real orthonormal matrices.
That is, U ∈ On if and only if U T U = I, where the symbol U T indicates the
transpose of the square matrix U . Given a matrix A ∈ Rn×n , recall that its
trace is defined by
\[
\mathrm{tr}(A) := \sum_{i=1}^{n} \langle Ae_i, e_i\rangle, \tag{5.36}
\]
\[
\mathrm{tr}(A) = \sum_{i=1}^{n} \langle Au_i, u_i\rangle. \tag{5.37}
\]
Proof. Let U ∈ Rⁿˣⁿ be the basis transition matrix such that Ue_i = u_i for
all i = 1, . . . , n. Then U is an orthonormal matrix, and therefore
tr(A) = tr(U T AU ),
which holds since the trace is a commutative operation, i.e., tr(AB) = tr(BA).
It readily yields due to (5.36) that
\[
\mathrm{tr}(A) = \mathrm{tr}(U^T A U) = \sum_{i=1}^{n} \langle U^T A U e_i, e_i\rangle = \sum_{i=1}^{n} \langle A U e_i, U e_i\rangle = \sum_{i=1}^{n} \langle A u_i, u_i\rangle.
\]
This gives us (5.37) and thus completes the proof of the lemma.
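The basis independence of the trace established in this lemma is easy to confirm numerically. The sketch below is an illustration only: it draws a random square matrix and a random orthonormal basis produced by a QR factorization, then checks (5.37) against (5.36).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

# Build a random orthonormal basis {u_1, ..., u_n} as the columns
# of the Q factor from a QR factorization.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

# tr(A) computed in the basis {u_i}: sum of <A u_i, u_i>.
trace_in_basis = sum(A @ U[:, i] @ U[:, i] for i in range(n))
assert np.isclose(np.trace(A), trace_in_basis)
print("tr(A) agrees in every orthonormal basis")
```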
Now we are ready to derive a basic version of von Neumann’s trace inequal-
ity for symmetric positive semidefinite matrices. It is said that the matrices
A, B ∈ Sn+ admit a simultaneously ordered spectral decomposition if there exist
an orthonormal matrix U ∈ On together with diagonal matrices A1 , B1 ∈ Sn+
having decreasing entries on the diagonal such that
A = U T A1 U and B = U T B1 U.
and hence verify the von Neumann trace inequality (5.39) for A, B ∈ Sn+ . The
equality statement of the theorem follows from the above proof.
This clearly verifies (5.39) and the equality statement therein for this case
under consideration, and thus we are done with the proof.
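Von Neumann's trace inequality tr(AB) ≤ ⟨λ(A), λ(B)⟩ in (5.39) can be spot-checked on random positive semidefinite matrices. The sketch below is illustrative only: A and B are generated as XXᵀ to ensure they lie in Sⁿ₊, and the eigenvalues are sorted in decreasing order as the theorem requires.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
for _ in range(200):
    X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    A, B = X @ X.T, Y @ Y.T                       # symmetric PSD matrices
    lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]  # eigenvalues, decreasing
    lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]
    # von Neumann trace inequality for A, B in S^n_+
    assert np.trace(A @ B) <= lam_A @ lam_B + 1e-8
print("tr(AB) <= <lambda(A), lambda(B)> on all samples")
```

Equality in the check above would require a simultaneously ordered spectral decomposition, which random pairs almost never have.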
Yet another version of von Neumann’s trace inequality, which is needed to
study singular functions, is presented in Subsection 5.7.3.
Let us begin with the following definitions of the two major classes of functions
that are studied in this subsection. Now A and B stand for matrix variables.
Definition 5.70 We say that F : Sn → R is a spectral function if it is
On -invariant in the sense that
F (U AU T ) = F (A) for all A ∈ Sn and U ∈ On .
The next proposition shows that a general class of spectral functions can
be reduced to the eigenvalue ones via compositions with symmetric functions.
The next theorem establishes a precise calculus formula for computing the
conjugates of compositions of type (5.40) that contain spectral functions.
Theorem 5.75 Let f : Rn → R be a symmetric function, and let λ : Sn →
Rn be the eigenvalue mapping defined in Proposition 5.72. Then the Fenchel
conjugate of the composite function (f ◦ λ) : Sn → R is the composition of the
Fenchel conjugate of f with λ, i.e.,
(f ◦ λ)∗ (B) = (f ∗ ◦ λ)(B) for all B ∈ Sn . (5.41)
Proof. It is based on the von Neumann trace inequality for symmetric matrices
from Theorem 5.69, which tells us that
\[
\mathrm{tr}(AB) \le \langle \lambda(A), \lambda(B)\rangle \quad \text{for all } A, B \in \mathbb{S}^n. \tag{5.42}
\]
It follows from (5.42) and the definition of conjugates that
\[
\begin{aligned}
(f\circ\lambda)^*(B) &= \sup\bigl\{\mathrm{tr}(AB) - f(\lambda(A)) \bigm| A \in \mathbb{S}^n\bigr\}\\
&\le \sup\bigl\{\langle \lambda(A), \lambda(B)\rangle - f(\lambda(A)) \bigm| A \in \mathbb{S}^n\bigr\}\\
&\le \sup\bigl\{\langle x, \lambda(B)\rangle - f(x) \bigm| x \in \mathbb{R}^n\bigr\} = (f^*\circ\lambda)(B)
\end{aligned}
\]
for all B ∈ Sⁿ. This justifies the inequality "≤" in (5.41).
To verify the opposite inequality in (5.41), we get from the spectral decomposition that B = Uᵀ(Diag λ(B))U for some U ∈ Oⁿ. Recalling the trace
commutation tr(BA) = tr(AB) for all B, A ∈ Sⁿ gives us the relationships
\[
\begin{aligned}
(f^*\circ\lambda)(B) &= \sup\bigl\{\langle x, \lambda(B)\rangle - f(x) \bigm| x \in \mathbb{R}^n\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl((\mathrm{Diag}\,x)(\mathrm{Diag}\,\lambda(B))\bigr) - f(x) \bigm| x \in \mathbb{R}^n\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl((\mathrm{Diag}\,x)\,U B U^T\bigr) - (f\circ\lambda)(\mathrm{Diag}\,x) \bigm| x \in \mathbb{R}^n\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl(U^T(\mathrm{Diag}\,x)U\,B\bigr) - (f\circ\lambda)\bigl(U^T(\mathrm{Diag}\,x)U\bigr) \bigm| x \in \mathbb{R}^n\bigr\}\\
&\le \sup\bigl\{\mathrm{tr}(BA) - (f\circ\lambda)(A) \bigm| A \in \mathbb{S}^n\bigr\} = (f\circ\lambda)^*(B),
\end{aligned}
\]
which therefore justifies the claimed matrix conjugate rule (5.41).
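The conjugate rule (5.41) can be illustrated for the particular symmetric function f(x) = ½‖x‖², for which f* = f, so that (f∘λ)*(B) should equal ½‖λ(B)‖². The sketch below (an illustration with this specific f, not the general proof) checks that random matrices A never exceed this value in the supremum defining the conjugate, and that the choice A = B, which shares eigenvectors with B, attains it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

def f(x):
    # symmetric function f(x) = 0.5 * ||x||^2, which is self-conjugate
    return 0.5 * np.sum(x ** 2)

def sym(X):
    return (X + X.T) / 2

B = sym(rng.standard_normal((n, n)))
lam_B = np.linalg.eigvalsh(B)
target = f(lam_B)  # (f* o lambda)(B), the claimed value of (f o lambda)*(B)

# The supremum defining (f o lambda)*(B) never exceeds (f* o lambda)(B) ...
for _ in range(500):
    A = sym(rng.standard_normal((n, n)))
    value = np.trace(A @ B) - f(np.linalg.eigvalsh(A))
    assert value <= target + 1e-8
# ... and is attained at A = B.
assert np.isclose(np.trace(B @ B) - f(lam_B), target)
print("conjugate formula (5.41) confirmed for f = 0.5*||.||^2")
```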
Proof. The equivalence between (a) and (b) follows from Theorem 5.76 with
ε = 0. The equivalence between (b) and (c) is a consequence of the equality
statement in Theorem 5.69.
In the rest of this subsection we always let p := min{m, n}. Also, the inner
product on the matrix space Rm×n is defined via the matrix trace
\[
\langle A, B\rangle := \mathrm{tr}(A^T B) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ij}, \tag{5.45}
\]
Proof. Fixing any B ∈ R^{m×n} and applying the von Neumann trace inequality
from Theorem 5.81, we get
\[
\begin{aligned}
(f\circ\sigma)^*(B) &= \sup\bigl\{\mathrm{tr}(A^T B) - f(\sigma(A)) \bigm| A \in \mathbb{R}^{m\times n}\bigr\}\\
&\le \sup\bigl\{\langle \sigma(A), \sigma(B)\rangle - f(\sigma(A)) \bigm| A \in \mathbb{R}^{m\times n}\bigr\}\\
&\le \sup\bigl\{\langle x, \sigma(B)\rangle - f(x) \bigm| x \in \mathbb{R}^p\bigr\} = (f^*\circ\sigma)(B),
\end{aligned}
\]
which verifies the inequality "≤" in (5.47). To prove the opposite inequality therein, we employ the singular value decomposition of B given by B =
U(Diag σ(B))Vᵀ for some U ∈ Oᵐ and V ∈ Oⁿ, where Diag σ(B) is the
m × n matrix obtained by placing the components of σ(B) on its diagonal.
Using now Lemma 5.83 and the facts that tr(AᵀB) = tr(BᵀA) and that
tr(AB) = tr(BA) whenever the products are defined tells us that
\[
\begin{aligned}
(f^*\circ\sigma)(B) &= \sup\bigl\{\langle \sigma(B), x\rangle - f(x) \bigm| x \in \mathbb{R}^p\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl((\mathrm{Diag}\,\sigma(B))^T\,\mathrm{Diag}\,x\bigr) - f(x) \bigm| x \in \mathbb{R}^p\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl(V^T B^T U\,\mathrm{Diag}\,x\bigr) - f(x) \bigm| x \in \mathbb{R}^p\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl(B^T U(\mathrm{Diag}\,x)V^T\bigr) - f(x) \bigm| x \in \mathbb{R}^p\bigr\}\\
&= \sup\bigl\{\mathrm{tr}\bigl((U(\mathrm{Diag}\,x)V^T)^T B\bigr) - (f\circ\sigma)\bigl(U(\mathrm{Diag}\,x)V^T\bigr) \bigm| x \in \mathbb{R}^p\bigr\}\\
&\le \sup\bigl\{\mathrm{tr}(A^T B) - (f\circ\sigma)(A) \bigm| A \in \mathbb{R}^{m\times n}\bigr\} = (f\circ\sigma)^*(B),
\end{aligned}
\]
which therefore completes the proof of the theorem.
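The singular value analogue of the trace inequality used in this proof, tr(AᵀB) ≤ ⟨σ(A), σ(B)⟩ from Theorem 5.81, can likewise be spot-checked on random rectangular matrices. The sketch below is illustrative only; note that NumPy already returns singular values in decreasing order, matching the convention of this subsection.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 6  # rectangular case, so p = min(m, n) = 4
for _ in range(200):
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((m, n))
    s_A = np.linalg.svd(A, compute_uv=False)  # sigma(A), decreasing
    s_B = np.linalg.svd(B, compute_uv=False)  # sigma(B), decreasing
    # von Neumann trace inequality for rectangular matrices
    assert np.trace(A.T @ B) <= s_A @ s_B + 1e-8
print("tr(A^T B) <= <sigma(A), sigma(B)> on all samples")
```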
The next corollary and its proof are parallel to the case of spectral func-
tions considered in Subsection 5.7.2.
Proof. The equivalence between (a) and (b) follows directly from Theo-
rem 5.85 with ε = 0. The equivalence between (b) and (c) is a consequence of
the equality statement in Theorem 5.81.
Proof. Assuming (a), we get by Corollary 5.86 that the matrices A and B
have a simultaneously ordered singular value decomposition. This gives us matrices U ∈ Oᵐ and V ∈ Oⁿ such that A = U(Diag σ(A))Vᵀ and B = U(Diag σ(B))Vᵀ.
Denoting x := σ(A) and y := σ(B) implies that A = U (Diag x)V T and
B = U (Diag y)V T with y = σ(B) ∈ ∂f (σ(A)) = ∂f (x). This yields (b) and
thus verifies implication (a)=⇒(b).
Finally, we check that (b)=⇒(a). Having (b), let A := U (Diag x)V T and
B := U(Diag y)Vᵀ for some matrices U ∈ Oᵐ, V ∈ Oⁿ and vectors x, y ∈
Rp with y ∈ ∂f (x). It is easy to see that the components |xi | and |yi | are
the singular values of A and B, respectively. Employing Lemma 5.83 and
remembering that both functions f and f ∗ are absolutely symmetric, as well
as that y ∈ ∂f (x), we arrive at the conditions
\[
\langle x, y\rangle = f(x) + f^*(y) = f(\sigma(A)) + f^*(\sigma(B)) \le \langle \sigma(A), \sigma(B)\rangle.
\]
This ensures that σ(B) ∈ ∂f(σ(A)) and thus completes the proof.
Corollary 5.88 Under the assumptions of Theorem 5.87 we have the subdif-
ferential representation for singular functions:
\[
\partial(f\circ\sigma)(A) = \bigl\{U(\mathrm{Diag}\,v)V^T \bigm| v \in \partial f(\sigma(A)),\ U \in \mathbb{O}^m,\ V \in \mathbb{O}^n\bigr\}. \tag{5.49}
\]
Proof. Pick B ∈ ∂(f∘σ)(A) and deduce from Theorem 5.87 that there exist
matrices U ∈ Oᵐ and V ∈ Oⁿ such that A = U(Diag σ(A))Vᵀ and B =
U(Diag σ(B))Vᵀ with σ(B) ∈ ∂f(σ(A)). It shows that B belongs to the set
on the right-hand side of (5.49).
Suppose now that B := U (Diag v)V T with v ∈ ∂f (σ(A)), U ∈ Om , and
V ∈ On . Denoting x := σ(A) and y := v we deduce the opposite inclusion in
(5.49) directly from Theorem 5.87.
Exercise 5.91 Give a direct proof of Theorem 5.10 without using the Ekeland vari-
ational principle. Hint: Use variational arguments similar to those in [228, Theo-
rem 2.10] in Fréchet smooth spaces together with the result of Lemma 5.4.
Exercise 5.92 Give a direct proof of Corollary 5.11 without using the Brøndsted-
Rockafellar density theorem.
Exercise 5.93 Let X be a Banach space. Deduce the Ekeland variational principle
and the Brøndsted-Rockafellar density theorem from the Bishop-Phelps result of
Corollary 5.12. Hint: Compare with the variational proofs in [125].
Prove that Λ is dense in the space X ∗ equipped with the strong topology.
Exercise 5.98 Verify the fulfillment of the relationship in (5.15). Hint: Use the
proof of the biconjugate part of Theorem 4.15.
Exercise 5.100 Let Ω be a nonempty convex subset of a normed space X, and let
ε ≥ 0. Prove that the following hold:
(a) ∂_ε p(x) = {x* ∈ B* | ‖x‖ ≤ ⟨x*, x⟩ + ε}, where p(x) := ‖x‖ for x ∈ X.
(b) ∂_ε δ_Ω(x) = {x* ∈ X* | σ_Ω(x*) ≤ ⟨x*, x⟩ + ε}, x ∈ Ω.
Exercise 5.103 (a) Consider the marginal function given in the form
μ(x) := inf{ϕ(y) | Ay = x},
Here we assume that μ(x) > −∞ for all x ∈ X. Hint: Proceed based on the def-
initions and the conjugate representations of subgradients and ε-subgradients.
Compare with the proof of [166, Theorem 4.1].
(b) Derive an extension of the results from (a) to the more general class of marginal
functions considered in Theorem 4.56.
Hint: Proceed similarly to the proof of the mean value result of Theorem 5.27
with replacing in the proof therein the usage of the usual subdifferential chain
rule by its asymptotic ε-subdifferential counterpart from Theorem 5.24.
(b) Does the result of (a) go back to Theorem 5.27 when f is continuous?
(c) Compare the asymptotic mean value theorem from (a) with the approximate
mean valued results from Theorems 5.29 and 5.30 in the case of l.s.c. convex
functions on Banach spaces.
Exercise 5.108 Clarify the possibility of using the asymptotic mean value theorem
from Exercise 5.107 to relax the continuity assumption on f in the characterization of
maximal monotone subdifferential mappings in the topological vector space setting
of Theorem 5.34.
Exercise 5.109 Let X be a Banach space.
(a) Give a proof of the result formulated in Remark 5.36(a) by using conjugate
calculus and directional derivatives. Hint: Consider first the case where X is a
reflexive Banach space and compare with the proof of [307, Theorem B].
(b) Simplify the proof of the result in (a) when X is a finite-dimensional space
and also when X is a Hilbert space. Hint: Compare with the proofs in [317,
Theorem 12.17] and in [34, Theorem 22.24].
Exercise 5.110 Give an example of a convex function f : R2 → R the derivative of
which has an uncountable set of discontinuities.
Exercise 5.111 Let X be a normed space, let f : X → R be a proper extended-
real-valued function, and let x ∈ dom(f ).
(a) Show that the Fréchet differentiability of f at x yields the continuity of f at
this point.
(b) Clarify the question in (a) for the case of Gâteaux differentiability.
Exercise 5.112 Let F : X ⇒ Y be a set-valued mapping between topological vector
spaces, and let x ∈ dom(F ).
(a) Clarify the relationships between the upper semicontinuity and topological outer
semicontinuity of F at x.
(b) When do the topological and sequential outer semicontinuity agree?
(c) Compare the outer semicontinuity notions from Definition 5.48 with the closed-
graph property of F .
Exercise 5.114 Recall that the two norms ‖·‖₁ and ‖·‖₂ are equivalent on a vector
space X if there exist positive constants α and β such that α‖x‖₁ ≤ ‖x‖₂ ≤ β‖x‖₁ for all x ∈ X.
Exercise 5.115 Let X be a Banach space. Prove that X is Asplund if and only
if every separable subspace of X has a separable dual. Hint: Verify first that a
separable Banach space X is Asplund if and only if X ∗ is separable. Compare with
the proofs of [105, Theorem 5.7] and [290, Theorem 2.34].
(c) The space c₀ of numerical sequences converging to zero is Asplund, while the
spaces ℓ¹, ℓ^∞, and C[0, 1] are not.
Hint: Use the Asplund space characterization from Exercise 5.115 for the proofs of
all the assertions in (a)–(c).
Exercise 5.118 Verify the equality statements in Theorems 5.68 and 5.69.
Exercise 5.119 Verify that for any matrices A, B ∈ Sn the trace of their product
AB given in (5.38) agrees with definition (5.36).
Exercise 5.122 Give a detailed proof of the version of von Neumann’s trace
inequality in Theorem 5.81. Hint: Proceed similarly to the proof of Theorem 5.68.
Exercise 5.123 Define the function F : Rm×n → R by F (A) := σmax (A) for A ∈
Rm×n and find its subdifferential ∂F (A).
The latter fundamental result of variational analysis was discovered by Ivar Eke-
land (born in 1944) in 1972 [118]. A complete proof of this result was given in
[119] by following Bishop-Phelps’ path with the usage of Zorn’s lemma. A signifi-
cantly simpler proof of the Ekeland variational principle was suggested by Michael
Crandall as a personal communication to Ekeland and was reproduced in [120]. We
mainly follow this path in the proof of Theorem 5.2. The scope of applications of
the Ekeland variational principle, its equivalents, modifications, and extensions is
enormous. Among less expected applications of this fundamental result and its set-
valued extensions, let us mention those to models of behavioral sciences initiated by
Antoine Soubeyran, where not only the formulation but mainly the proofs of the
variational results play a crucial role in reaching the practical conclusions; see, e.g.,
the papers by Bao et al. [30], Mordukhovich and Soubeyran [259], and the references
therein.
We also refer the reader to the so-called smooth variational principles, notably
to the Borwein-Preiss [50] and Deville-Godefroy-Zizler [105] ones. Detailed relation-
ships between such variational principles for l.s.c. functions and the extremal prin-
ciples for closed sets can be found in the book by Mordukhovich [229, Section 2.3]
with more references and discussions therein. The subdifferential variational princi-
ple of Theorem 5.3, which is derived there as a consequence of Ekeland’s principle
and the subdifferential sum rule, was originally established (and named so) by Mor-
dukhovich and Wang [260] in a nonconvex setting as a characterization of Asplund
spaces by using the approximate extremal principle; see also [229, Subsection 2.3.2].
The convex extremal principle for closed and convex sets in Banach spaces pre-
sented in Theorem 5.5 was obtained in our paper [239], where the proof did not
provide, however, all the details. Both approximate and exact versions of The-
orem 5.5 are significantly different in important aspects from the corresponding
results obtained in Chapter 3 in the topological vector space setting, as well as from
the nonconvex counterparts given in [229, Chapter 2] and the references therein.
Indeed, the topological vector space results in Theorem 3.7 do not contain any
approximate version, which is given nevertheless in [229, Theorem 2.20] for noncon-
vex sets in Asplund spaces. It should be emphasized that the latter result establishes
only necessary conditions for set extremality (with no sufficiency even in the con-
vex case), while Theorem 5.5(b) gives us necessary and sufficient conditions of this
type for extremality of convex sets. Moreover, the Asplund space structure is not
just sufficient for the fulfillment of the approximate extremal principle in [229, The-
orem 2.20] for any closed sets, but also necessary for this property. On the other
hand, Theorem 5.5(b) holds in an arbitrary Banach space. A specific feature of con-
vexity is exploited in Lemma 5.4, which does not require a smooth renorming as in
the proofs of [229, Theorems 2.10 and 2.20].
The exact version of the convex extremal principle in Theorem 5.5(c) yields the
convex set separation in Banach spaces without any nonempty interior assumptions
as in the topological vector space setting of Theorem 3.7; see more discussions in
Remark 2.187 on the SNC characterization for convex sets. Observe that the sequen-
tial normal compactness property is utilized in the proof of Theorem 5.5(c) in general
Banach spaces, with no sequential compactness of the dual ball as for the case of
Asplund spaces in [229, Theorem 2.22].
The notion of ε-subdifferentials (known also as approximate subdifferentials) for
convex functions given in Definition 5.8 was introduced by Brøndsted and Rock-
afellar in [64] who proved there the Brøndsted-Rockafellar density theorem (Theo-
rem 5.10) and established other topological properties of ε-subdifferentials. It has
5.9 Commentaries to Chapter 5 377
been realized later on that ε-subgradient mappings for ε > 0 exhibit some better
properties in comparison with the classical case of ε = 0. This made it possible
to use ε-subgradients in constructing efficient numerical algorithms, which were
started from the paper by Bertsekas and Mitter [36]. In contrast to convex sub-
gradient mappings ∂f (·), their ε-subgradient expansions ∂ε f (·) with fixed ε > 0
possess continuity and certain local Lipschitzian properties as was first shown by
Nurminskii [281]; see also Hiriart-Urruty [161] and more discussions in his book with
Lemaréchal [164]. The latter two-volume book, as well as the more recent monograph
by Zălinescu [361], contain a variety of theoretical and algorithmic developments and
many applications of ε-subdifferentials of convex functions.
The exact calculus rules for ε-subdifferentials presented in Subsection 5.2.1 under
the imposed qualification conditions in finite and infinite dimensions are similar
to the corresponding results for the classical subdifferentials, while the asymptotic
ε-subdifferential calculus rules of Subsection 5.2.2 are significantly different since
they do not require any qualification conditions. The results of this type have been
initiated by Hiriart-Urruty and Phelps [166] and then have been largely developed
and applied in the literature; see, e.g., Penot [286], Thibault [334], and Zălinescu
[361] with further references and discussions.
It has been well recognized in mathematics that the classical (Lagrange) mean
value theorem for differentiable functions plays a fundamental, crucial role in many
aspects of real analysis and numerous applications. Note that the proof of the clas-
sical mean value theorem is based on the two optimization results; namely, on the
Weierstrass theorem ensuring that any continuous real function attains its mini-
mum and maximum on compact intervals and on the Fermat stationary rule for
local extrema of differentiable functions. The proofs of the mean value results in
Subsection 5.3.1 are based on the same ideas with the additional usage of the subd-
ifferential chain rule in the device of Theorem 5.27 for convex continuous functions
on topological vector spaces. The approximate mean value results presented in Sub-
section 5.3.2 are highly different in formulations and proofs from the “continuous”
mean value version of the preceding subsection. First of all, they address extended-
real-valued, lower semicontinuous, convex functions and thus can be applied, e.g., to
constrained optimization. As we see, the proof of these results is based on variational
principles replacing the classical Weierstrass existence theorem.
The approximate mean value theorems do not have any preceding counterparts
in convex analysis. Both Theorems 5.29 and 5.30 and their proofs are due to Dariusz
Zagrodny who established them [358] for nonconvex l.s.c. functions on Banach spaces
in terms of the Clarke subdifferential; we’ll discuss this more in Chapter 7.
Mean value theorems for convex functions are instrumental in the study of var-
ious aspects of convex analysis and its applications. In particular, in Section 5.4
we present applications of both continuous and approximate versions of the mean
value theorem to the maximal monotonicity of subgradient mappings associated with
convex functions on topological vector spaces and Banach spaces, respectively. The
main result, Theorem 5.35, is proved by employing Zagrodny’s approximate mean
value theorem taken from Theorem 5.30; cf. Borwein [45]. The fundamental maximal
monotonicity result was first established by Rockafellar [303, 307] by using varia-
tional arguments involving the Brøndsted-Rockafellar theorem. We refer the reader
to the book by Simons [326] for other proofs of Theorem 5.35, with all of them being
based on certain variational techniques. Various results on maximal monotone oper-
ators, their enlargements, and algorithmic applications can be found in the book
378 5 VARIATIONAL TECHNIQUES . . .
by Burachik and Iusem [66]. In the case of Hilbert spaces, a comprehensive study
and numerous applications of monotone operators and subdifferential mappings of
convex analysis are given in the book by Bauschke and Combettes [34].
The notion of Gâteaux derivative was introduced by René Gâteaux (1889–
1914) in [138], while the notion of Fréchet derivative in infinite dimensions was
defined and studied earlier in [135]. Both Fréchet and Gâteaux were students of
Jacques Hadamard (1865–1963) who also introduced his derivative notion in infinite-
dimensional spaces; see Chapter 7 for more details. Note that the Gâteaux derivative,
being a directional derivative, is naturally defined on topological vector spaces. On
the other hand, the Fréchet derivative is an infinite-dimensional extension of the
classical derivative in finite dimensions and is uniform in directions while requiring
in its definition the normed space structure.
The main results on differentiability and generic differentiability presented in
Sections 5.5 and 5.6 are well known; most of them can be found scattered in the
book by Phelps [290], while we present here more elaborations. Example 5.43 is
taken from [141]. Note that the usage of the subdifferential mean value theorem in
the proof of Theorem 5.52 follows the paper by Cuong and Nam [93] and that the
result of Theorem 5.55 on the Fréchet differentiability of conjugates is taken from Borwein and Vanderwerff [52].
The classes of Asplund and weak Asplund spaces were introduced by Edgar
Asplund in [9] as “strong differentiability” and “weak differentiability” Banach
spaces, respectively; these spaces were renamed in honor of Asplund by Namioka
and Phelps [274] after his death in 1974. Note that Asplund spaces are strongly
related to Fréchet geometric structures of Banach spaces. In particular, it has been
proved by Ekeland and Lebourg [121], by using Ekeland’s variational principle, that
any Banach space admitting an equivalent norm that is Fréchet differentiable at
nonzero points is Asplund. On the other hand, Haydon [154] constructed an example showing that an Asplund space may fail to have even a Gâteaux differentiable norm off the origin.
The class of Asplund spaces constitutes one of the most beautiful objects in
mathematics that has been well investigated in functional analysis and applications.
This class covers, in particular, reflexive Banach spaces and those with separable
duals. Furthermore, Asplund spaces admit a variety of nice characterizations some of
which have been mentioned in the text (see Sections 5.6 and 5.8); their proofs can be
found, e.g., in the books [105, 290]. Besides this, we refer the reader to the excellent
survey by Yost [355], which collects basic facts and proofs from the Asplund space
theory. Observe that, contrary to Asplund spaces, their weak Asplund counterpart
exhibits a modest number of satisfactory results presented in Fabian’s book [126].
The broad usage of Asplund spaces in variational analysis and generalized dif-
ferentiation, with their novel variational characterizations and applications, can be
found in the two-volume book by Mordukhovich [229] with detailed references.
Section 5.7 deals with some functions depending on square matrices and their
eigenvalues. The importance of these issues for a variety of applications, including
algorithmic design, is difficult to overstate. A characteristic feature of such functions
is their intrinsic nonsmoothness, which naturally calls for employing subgradients
and their calculus. Pioneering work in this direction has been done by Michael
Overton; see his paper [283] and subsequent publications with various collaborators
over the years, e.g., [69, 146, 147, 284] and the references therein. Note also the
influential paper by Adrian Lewis [203] on nonsmooth analysis of eigenvalues.
5.9 Commentaries to Chapter 5 379
(b) We say that f is strongly smooth on Ω if there exists γ > 0 such that
f(λx + (1 − λ)y) + γλ(1 − λ)‖x − y‖²/2 ≥ λf(x) + (1 − λ)f(y)
for all x, y ∈ Ω and all λ ∈ (0, 1). When γ is prescribed a priori, f is called γ-strongly smooth, or strongly smooth on Ω with modulus γ.
The next assertion characterizes the strong convexity of a function via the
usual convexity of its quadratic shift.
Proof. Due to the inner product structure of the space X, for any x, y ∈ X and λ ∈ (0, 1) we can easily check the identity
σλ(1 − λ)‖x − y‖²/2 = (σ/2)(λ‖x‖² + (1 − λ)‖y‖² − ‖λx + (1 − λ)y‖²).
This implies that definition (6.1) admits the rearrangement
f(λx + (1 − λ)y) − (σ/2)‖λx + (1 − λ)y‖² ≤ λ(f(x) − (σ/2)‖x‖²) + (1 − λ)(f(y) − (σ/2)‖y‖²),
which clearly verifies the claimed equivalence.
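The identity in this proof is easy to confirm numerically. The sketch below (plain Python on R² with the Euclidean norm; the helper names are ours, not the book's) evaluates both sides on random data:

```python
import random

def norm_sq(v):
    # squared Euclidean norm ||v||^2
    return sum(c * c for c in v)

def combo(lam, x, y):
    # convex combination lam*x + (1 - lam)*y
    return [lam * a + (1 - lam) * b for a, b in zip(x, y)]

def identity_gap(sigma, lam, x, y):
    # |LHS - RHS| of the identity used in the proof of Proposition 6.2
    lhs = sigma * lam * (1 - lam) * norm_sq([a - b for a, b in zip(x, y)]) / 2
    rhs = (sigma / 2) * (lam * norm_sq(x) + (1 - lam) * norm_sq(y)
                         - norm_sq(combo(lam, x, y)))
    return abs(lhs - rhs)

random.seed(0)
for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(2)]
    y = [random.uniform(-5, 5) for _ in range(2)]
    lam = random.uniform(0.01, 0.99)
    assert identity_gap(2.0, lam, x, y) < 1e-9
```

The identity expands ‖λx + (1 − λ)y‖² via the inner product, which is exactly where the Hilbert-space structure enters.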
Proof. Consider the function g(x) := f(x) − (σ/2)‖x‖² for x ∈ Ω, the convexity of which is equivalent, by Proposition 6.2, to the σ-strong convexity of f on Ω. Since ∇²g(x) = ∇²f(x) − σIn, the claimed characterization (6.2) of the σ-strong convexity of f follows from the characterization of convexity for C²-smooth functions in Theorem 2.118.
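For C²-smooth functions this Hessian test is directly computable. A small sketch (Python on R²; the sample function f(x, y) = x² + xy + y² is our own choice, not from the text): the smallest Hessian eigenvalue is the best strong-convexity modulus, and the inequality of definition (6.1) then holds on random points.

```python
import random

def f(p):
    x, y = p
    return x * x + x * y + y * y  # Hessian is the constant matrix [[2, 1], [1, 2]]

def eigs_sym_2x2(a, b, c):
    # eigenvalues of the symmetric matrix [[a, b], [b, c]]
    tr, det = a + c, a * c - b * b
    d = (tr * tr / 4 - det) ** 0.5
    return tr / 2 - d, tr / 2 + d

sigma = eigs_sym_2x2(2.0, 1.0, 2.0)[0]  # smallest eigenvalue = best modulus

random.seed(1)
for _ in range(200):
    x = [random.uniform(-3, 3) for _ in range(2)]
    y = [random.uniform(-3, 3) for _ in range(2)]
    lam = random.uniform(0.05, 0.95)
    z = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
    gap = lam * f(x) + (1 - lam) * f(y) - f(z)
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, y))
    # sigma-strong convexity in the sense of definition (6.1)
    assert gap >= sigma * lam * (1 - lam) * dist_sq / 2 - 1e-9
```

Here the Hessian has eigenvalues 1 and 3, so the best modulus is σ = 1, attained along the eigenvector direction (1, −1).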
Denoting z*_λ := λx* + (1 − λ)u* and z_λ := λx + (1 − λ)u and then taking into account (6.4) yield in turn the estimate
λf*(x*) + (1 − λ)f*(u*) ≥ ⟨z*_λ, z_λ⟩ − f(z_λ) + λ(1 − λ)(⟨x* − u*, x − u⟩ − γ‖x − u‖²/2).
With v := x − u, this gives us the following chain of inequalities:
λf*(x*) + (1 − λ)f*(u*) ≥ sup_{x,u∈X} {⟨z*_λ, u + λ(x − u)⟩ − f(u + λ(x − u)) + λ(1 − λ)(⟨x* − u*, x − u⟩ − γ‖x − u‖²/2)}
≥ sup_{v∈X} sup_{u∈X} {⟨z*_λ, u + λv⟩ − f(u + λv) + λ(1 − λ)(⟨x* − u*, v⟩ − γ‖v‖²/2)}
≥ sup_{v∈X} {f*(z*_λ) + λ(1 − λ)(⟨x* − u*, v⟩ − γ‖v‖²/2)}
≥ f*(λx* + (1 − λ)u*) + (λ(1 − λ)/γ)‖x* − u*‖²/2,
which verifies the 1/γ-strong convexity of f* on X* by Definition 6.1(a).
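This duality can be sanity-checked on the simplest possible example. With f(x) = (γ/2)x², which is γ-strongly smooth with equality in the definition, one has f*(s) = s²/(2γ), and the 1/γ-strong convexity inequality for f* is verifiable on random points; a minimal Python sketch under these assumptions (a grid search stands in for the exact supremum in the conjugate):

```python
import random

gamma = 2.0

def f(x):
    return gamma * x * x / 2          # gamma-strongly smooth, with equality

def f_star(s):
    return s * s / (2 * gamma)        # its Fenchel conjugate

def conj_numeric(s, lo=-100.0, hi=100.0, n=200001):
    # brute-force sup_x { s*x - f(x) } on a grid, confirming the formula for f*
    step = (hi - lo) / (n - 1)
    return max(s * (lo + i * step) - f(lo + i * step) for i in range(n))

assert abs(conj_numeric(1.0) - f_star(1.0)) < 1e-6

random.seed(0)
for _ in range(100):
    s, t = random.uniform(-3, 3), random.uniform(-3, 3)
    lam = random.uniform(0.05, 0.95)
    lhs = f_star(lam * s + (1 - lam) * t)
    # (1/gamma)-strong convexity inequality from Definition 6.1(a)
    rhs = (lam * f_star(s) + (1 - lam) * f_star(t)
           - (1 / gamma) * lam * (1 - lam) * (s - t) ** 2 / 2)
    assert lhs <= rhs + 1e-9
```

For this quadratic pair the inequality holds with equality, which is the borderline case of the theorem.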
Proof. Pick any x ∈ dom(f) and suppose on the contrary that f(x1) = ∞ for some x1 ∈ X. Choose further x2 ∈ X such that x = (x1 + x2)/2. Then we have by the imposed γ-strong smoothness of f that
∞ = (f(x1) + f(x2))/2 ≤ f((x1 + x2)/2) + (γ/8)‖x1 − x2‖² = f(x) + (γ/8)‖x1 − x2‖² < ∞,
which is clearly a contradiction.
f(tx + (1 − t)u) + γt(1 − t)‖x − u‖²/2 ≥ tf(x) + (1 − t)f(u) for all t ∈ (0, 1),
which readily yields the inequality
(f(u + t(x − u)) − f(u))/t + γ(1 − t)‖x − u‖²/2 ≥ f(x) − f(u).
Letting t ↓ 0 therein gives us the conclusion in (a).
To proceed further with checking (b), we get from (a) that
f(x) ≤ f(u) + ⟨f′_G(u), x − u⟩ + (γ/2)‖x − u‖²,
f(u) ≤ f(x) + ⟨f′_G(x), u − x⟩ + (γ/2)‖x − u‖².
Summing up these two inequalities justifies the claimed conclusion in (b).
Next we verify property (c). Fix any v ∈ X and observe that
f(x + v) ≤ f(x) + ⟨f′_G(x), v⟩ + (γ/2)‖v‖²,
⟨f′_G(u), x + v − u⟩ ≤ f(x + v) − f(u),
which leads us to the estimate
⟨f′_G(u), x + v − u⟩ ≤ f(x) + ⟨f′_G(x), v⟩ + (γ/2)‖v‖² − f(u).
Rearranging the terms therein gives us
⟨f′_G(u) − f′_G(x), v⟩ − (γ/2)‖v‖² + ⟨f′_G(u), x − u⟩ ≤ f(x) − f(u).
Taking the supremum with respect to v ∈ X, i.e., passing to the Fenchel conjugate on the left-hand side, clearly justifies (c).
To proceed with verifying (d), we get from (c) that
f(x) ≥ f(u) + ⟨f′_G(u), x − u⟩ + (1/(2γ))‖f′_G(x) − f′_G(u)‖²,
f(u) ≥ f(x) + ⟨f′_G(x), u − x⟩ + (1/(2γ))‖f′_G(x) − f′_G(u)‖².
Summing up these inequalities and rearranging the terms give us (d). The last property (e) easily follows from (d), and thus the proof is complete.
Remark 6.12 In the proof (a)=⇒(b) of Theorem 6.11, we use the fact that
f ∗ is proper, convex, and l.s.c. on X ∗ equipped with the strong topology. We
also use the obvious inclusion X ⊂ X ∗∗ and the equality (f ∗ )∗ (x) = f (x) for
x ∈ X in the proof (b)=⇒(a).
Proof. The general assumptions imposed on the function f ensure the existence of v* ∈ X* and η ∈ R for which
η + ⟨v*, x⟩ ≤ f(x) whenever x ∈ X.
The coercivity condition (6.6) allows us to find ν > 0 such that
‖x‖(‖v*‖ + 1) ≤ f(x) if ‖x‖ ≥ ν.
It follows furthermore that
sup{⟨x*, x⟩ − f(x) | ‖x‖ ≥ ν} ≤ −ν.
6.2 Derivatives of Conjugates and Nesterov’s Smoothing 391
Proposition 6.14 Let f be taken from (6.7), where g is a proper l.s.c. function, and where Y is a Banach space. If g is σ-strongly convex with some modulus σ > 0 on Y, then f is Fréchet differentiable and ∇f is Lipschitz continuous on X with constant ‖A‖²/σ.
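For a concrete check of the constant ‖A‖²/σ, take g(y) = (σ/2)‖y‖², so that f(x) = g*(Ax) = ‖Ax‖²/(2σ) and ∇f(x) = AᵀAx/σ. The Python sketch below (our own sample 2×2 operator and helper names; the spectral norm is estimated by power iteration) confirms the Lipschitz estimate numerically:

```python
import random

sigma = 0.5
A = [[2.0, 1.0], [0.0, 1.0]]  # a sample bounded linear operator on R^2

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def grad_f(x):
    # with g(y) = (sigma/2)||y||^2 one gets f(x) = ||Ax||^2/(2*sigma),
    # whose gradient is A^T A x / sigma
    return [c / sigma for c in matvec(transpose(A), matvec(A, x))]

def op_norm(M, iters=200):
    # power iteration on M^T M, estimating the operator norm ||M||
    v = [1.0, 1.0]
    for _ in range(iters):
        w = matvec(transpose(M), matvec(M, v))
        s = max(abs(c) for c in w)
        v = [c / s for c in w]
    w = matvec(transpose(M), matvec(M, v))
    rayleigh = sum(a * b for a, b in zip(w, v)) / sum(c * c for c in v)
    return rayleigh ** 0.5

L = op_norm(A) ** 2 / sigma  # the claimed Lipschitz constant ||A||^2 / sigma
random.seed(0)
for _ in range(100):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    u = [random.uniform(-5, 5), random.uniform(-5, 5)]
    dg = [a - b for a, b in zip(grad_f(x), grad_f(u))]
    du = [a - b for a, b in zip(x, u)]
    assert sum(c * c for c in dg) ** 0.5 <= L * sum(c * c for c in du) ** 0.5 + 1e-7
```

In this quadratic case the constant ‖A‖²/σ is sharp: it is attained along the top singular direction of A.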
392 6 MISCELLANEOUS TOPICS ON CONVEXITY
Using the obtained result together with the conjugate sum rule, we derive
now efficient conditions ensuring the Fréchet differentiability of the con-
strained version of the function (6.7).
Proof. Fix a positive number μ and observe from (6.9) that the function fμ
admits the representation
fμ (x) = (ϕ + μp)∗ (Ax) for all x ∈ X.
It is easy to check that the σ-strong convexity of p clearly yields this property
of g(y) := ϕ(y) + μp(y) for y ∈ Y with modulus σμ. Thus assertion (a) follows
from Proposition 6.14.
Since p(y) ≥ 0 for all y ∈ Y and dom(ϕ) ⊂ dom(p), it holds that
fμ(x) = sup{⟨Ax, y⟩ − ϕ(y) − μp(y) | y ∈ Y}
= sup{⟨Ax, y⟩ − ϕ(y) − μp(y) | y ∈ dom(ϕ) ∩ dom(p)}
= sup{⟨Ax, y⟩ − ϕ(y) − μp(y) | y ∈ dom(ϕ)}
≤ sup{⟨Ax, y⟩ − ϕ(y) | y ∈ dom(ϕ)} = f(x).
Furthermore, we get the relationships
f(x) = sup{⟨Ax, y⟩ − ϕ(y) | y ∈ dom(ϕ)}
≤ sup{⟨Ax, y⟩ − ϕ(y) − μp(y) | y ∈ dom(ϕ)} + μ sup_{y∈dom(ϕ)} p(y)
= fμ(x) + μM.
Therefore, the claimed estimate is completely verified.
Next we consider the constrained version of (6.8). Given a nonempty set Ω ⊂ Y, consider the function fΩ: X → R defined by
fΩ(x) := sup{⟨Ax, y⟩ − φ(y) | y ∈ Ω}, x ∈ X, (6.10)
where φ: Y → R is a real-valued function defined on a normed space Y, and where A: X → Y* is a bounded linear operator defined on a normed space X.
Consider a function p: Y → R and then, given μ > 0, define the μ-approximation function fμ for (6.10) by
fμ(x) := sup{⟨Ax, y⟩ − φ(y) − μp(y) | y ∈ Ω}, x ∈ X. (6.11)
Theorem 6.19 Let A : X → Y ∗ be a bounded linear operator between a
normed space X and the dual to a Banach space Y , let φ : Y → R be a
real-valued l.s.c. convex function, and let Ω ⊂ Y be a nonempty closed convex
set. Consider the function f : X → R from (6.10), and let fμ be taken from
(6.11) for some μ > 0. Suppose that
(a) p is proper, l.s.c., and σ-strongly convex on Ω with some constant σ > 0.
(b) Ω ⊂ dom(p).
(c) p(y) ≥ 0 for all y ∈ Ω.
(d) sup_{y∈Ω} p(y) < ∞.
Then we have:
(a) fμ is a C^{1,1}-smooth function on X with the uniform Lipschitz constant of the gradient ∇fμ calculated by ‖A‖²/(σμ).
(b) We have the estimate
fμ(x) ≤ fΩ(x) ≤ fμ(x) + μM for all x ∈ X with M := sup_{y∈Ω} p(y).
Proof. The conclusions follow directly from Theorem 6.19 with the observa-
tion that the prox-function p(y) is σ-strongly convex on Y with σ = 1.
Example 6.21 Consider the function f(x) := |x|, which can be represented as f(x) = sup{xy | |y| ≤ 1} on R. Using the prox-function p(y) := y²/2 for y ∈ R gives us the family of smooth approximations
fμ(x) = x²/(2μ) if |x| ≤ μ, and fμ(x) = |x| − μ/2 if |x| > μ,
for all μ > 0, which is depicted in Figure 6.1(a). Consider another choice of the prox-function given by
p(y) := 1 − √(1 − y²) if |y| ≤ 1, and p(y) := ∞ if |y| > 1
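The first family above is a Huber-type function, and both its closed form and the two-sided bounds fμ ≤ f ≤ fμ + μM from Theorem 6.19 (here M = sup_{|y|≤1} y²/2 = 1/2) can be verified by brute force; a small Python sketch, with a grid search standing in for the exact supremum:

```python
def f_mu(x, mu):
    # closed form of the smoothed absolute value (a Huber-type function)
    return x * x / (2 * mu) if abs(x) <= mu else abs(x) - mu / 2

def f_mu_bruteforce(x, mu, n=200001):
    # direct evaluation of sup over |y| <= 1 of x*y - mu*y^2/2 on a fine grid
    best = float("-inf")
    for i in range(n):
        y = -1.0 + 2.0 * i / (n - 1)
        best = max(best, x * y - mu * y * y / 2)
    return best

for mu in (0.1, 0.5, 1.0):
    for x in (-2.0, -0.3, 0.0, 0.2, 1.5):
        assert abs(f_mu(x, mu) - f_mu_bruteforce(x, mu)) < 1e-6
        # the general bounds f_mu <= f <= f_mu + mu*M with M = sup p = 1/2
        assert f_mu(x, mu) <= abs(x) <= f_mu(x, mu) + mu / 2 + 1e-12
```

The closed form follows by maximizing the concave quadratic y ↦ xy − μy²/2: the unconstrained maximizer y = x/μ is feasible exactly when |x| ≤ μ, and otherwise the supremum is attained at y = ±1.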
observe that the Hessian ∇2 p+ (y) is positive definite for such y, and thus the
function p+ is strongly convex on the positive orthant. This yields the strong
convexity of p on Rm . The positivity of this function on R can be checked by
applying the method of Lagrange multipliers; see Exercise 6.94.
The main object of our study is the following notion that allows us to char-
acterize the set boundedness and efficiently deal with unbounded sets.
Note that the horizon cone is also known in the literature as the asymptotic cone of Ω at the point in question. Another equivalent definition of Ω∞(x) is
Ω∞(x) = ⋂_{t>0} (Ω − x)/t. (6.13)
The following proposition shows that Ω∞ (x) is the same for any x ∈ Ω
provided that the set Ω is closed and convex. Thus in this case, we can simply
use the notation Ω∞ for the horizon cone of Ω.
Proof. The convexity of Ω∞ (x) follows from Definition 6.25 and the convexity
of Ω. Since Ω is closed, the set Ω∞ (x) is closed as well due to (6.13). To prove
(6.14), it suffices to verify that Ω∞ (x1 ) ⊂ Ω∞ (x2 ) whenever x1 , x2 ∈ Ω.
Taking any direction d ∈ Ω∞ (x1 ) and any number t > 0, we show that
x2 + td ∈ Ω. Indeed, consider the sequence
xk := (1/k)(x1 + ktd) + (1 − 1/k)x2, k ∈ N,
and observe that xk ∈ Ω for every k because d ∈ Ω∞ (x1 ) and Ω is convex.
We also have xk → x2 + td, and so x2 + td ∈ Ω since Ω is closed. Thus
d ∈ Ω∞ (x2 ), which completes the proof of the proposition.
A nice and useful consequence of the above result is as follows:
Corollary 6.27 Let Ω ⊂ X be a closed and convex subset of a topological
vector space X, and let 0 ∈ Ω. Then we have
Ω∞ = ⋂_{t>0} tΩ.
Proof. Suppose that the set Ω is bounded and pick any element d ∈ Ω∞ .
Proposition 6.28 ensures the existence of sequences {tk } ⊂ [0, ∞) with tk → 0
and {xk } ⊂ Ω such that tk xk → d as k → ∞. It follows from the boundedness
of Ω that tk xk → 0, which shows that d = 0.
To verify the converse implication, suppose on the contrary that Ω is
unbounded while Ω∞ = {0}. Then there exists a sequence {xk } ⊂ Ω with
‖xk‖ → ∞. This allows us to construct a sequence of unit vectors by
dk := xk/‖xk‖, k ∈ N.
Passing to a subsequence if necessary ensures in the finite-dimensional setting under consideration that dk → d as k → ∞ with ‖d‖ = 1. Fix any x ∈ Ω and observe that for all t > 0 and k ∈ N sufficiently large, we have
uk := (1 − t/‖xk‖)x + (t/‖xk‖)xk ∈ Ω
due to the convexity of Ω. Since uk → x + td as k → ∞ and since Ω is closed,
it yields x + td ∈ Ω which ensures, therefore, that d ∈ Ω∞ . The obtained
contradiction completes the proof of the theorem.
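The dichotomy just proved is easy to visualize on a standard unbounded example. For the closed, convex, unbounded parabolic epigraph Ω = {(x, y) ∈ R² : y ≥ x²} one expects Ω∞ = {0} × [0, ∞); the sketch below (Python; a finite range of t stands in for "all t > 0", and the function names are ours) tests candidate horizon directions from two base points:

```python
def in_omega(p):
    # Omega = {(x, y) : y >= x^2}: closed, convex, and unbounded
    x, y = p
    return y >= x * x - 1e-12

def looks_like_horizon_dir(d, base=(0.0, 0.0), ts=(1.0, 10.0, 100.0, 1000.0)):
    # necessary test for d in Omega_inf: base + t*d stays in Omega for large t
    bx, by = base
    return all(in_omega((bx + t * d[0], by + t * d[1])) for t in ts)

assert looks_like_horizon_dir((0.0, 1.0))        # vertical directions survive
assert not looks_like_horizon_dir((1.0, 0.0))    # horizontal ones do not
assert not looks_like_horizon_dir((1.0, 1.0))
assert looks_like_horizon_dir((0.0, 1.0), base=(2.0, 5.0))  # base-point independence
```

Since Ω∞ = {0} × [0, ∞) ≠ {(0, 0)}, the theorem correctly classifies this set as unbounded.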
Proof. We can easily check that (t, x, λ) ∈ epi(Pf) if and only if t > 0 and (x, λ) ∈ t epi(f). This leads us to the representation
epi(Pf) = {ta | t > 0, a ∈ A} with A := {1} × epi(f).
Since the set A is convex by the assumed convexity of f , we conclude that
the epigraph epi(Pf ) is convex as well. This confirms the convexity of Pf .
Observe from Lemma 6.33 and its proof that f∞ : X → R is exactly the
function whose epigraph is epi(f )∞ . The next lemma gives us an equivalent
analytic description of the horizon function f∞ .
Lemma 6.35 Let f : X → R be a proper, l.s.c., and convex function defined
on a topological vector space X. Then f∞ : X → R is also a proper, l.s.c., and
convex function on this space that admits the representation
f∞(v) = sup{f(x + v) − f(x) | x ∈ dom(f)}.
Proof. Since the set epi(f )∞ is closed and convex, the horizon function f∞
is l.s.c. and convex as well. Fix any pair (v, λ) ∈ epi(f∞ ) = epi(f )∞ and any
vector x ∈ dom(f ). Then we have (x, f (x)) ∈ epi(f ), and therefore
(x, f(x)) + (v, λ) ∈ epi(f).
It follows that f(x + v) ≤ f(x) + λ; hence f(x + v) − f(x) ≤ λ and
sup{f(x + v) − f(x) | x ∈ dom(f)} ≤ λ.
Conversely, supposing that sup{f(x + v) − f(x) | x ∈ dom(f)} ≤ λ yields
(x, f(x)) + (v, λ) ∈ epi(f) whenever x ∈ dom(f),
which tells us in turn that
(x, γ) + (v, λ) ∈ epi(f ) for all (x, γ) ∈ epi(f ).
In this way, we arrive at the inclusion epi(f ) + (v, λ) ⊂ epi(f ), and thus
(v, λ) ∈ epi(f )∞ . This ensures the equivalence
6.3 Convex Sets and Functions at Infinity 403
(v, λ) ∈ epi(f∞) = epi(f)∞ ⟺ sup{f(x + v) − f(x) | x ∈ dom(f)} ≤ λ.
The latter readily implies that
f∞(v) = sup{f(x + v) − f(x) | x ∈ dom(f)}
and also that f∞ (v) > −∞ for all v ∈ X. Thus the horizon function f∞ is
proper, and we are done with the proof of the lemma.
In the case where the function in question is l.s.c. on a topological vector
space, additional representations of its horizon function are given below.
Theorem 6.36 Let f : X → R be a proper, l.s.c., and convex function defined
on a topological vector space X. Then the horizon function f∞ : X → R is also
proper, l.s.c., and convex. Furthermore, for any x ∈ dom(f ) we have
f∞(v) = sup_{t>0} (f(x + tv) − f(x))/t = lim_{t→∞} (f(x + tv) − f(x))/t. (6.16)
Proof. The properness and convexity of f∞ are proved in Lemma 6.35. Since
f is l.s.c., the set epi(f )∞ is closed, and hence the horizon function f∞ is
l.s.c. Let us now verify the fulfillment of the representations in (6.16). To do
this, pick v ∈ X and x ∈ dom(f ), and then observe that f∞ is positively
homogeneous since epi(f )∞ is a cone. It follows from Lemma 6.35 that
tf∞(v) = f∞(tv) = sup{f(x + tv) − f(x) | x ∈ dom(f)} ≥ f(x + tv) − f(x)
for any t > 0. Dividing both sides of the obtained inequality by t and taking
the supremum with respect to t > 0 tell us that
f∞(v) ≥ sup_{t>0} (f(x + tv) − f(x))/t.
To complete the proof of the theorem, it remains to show that
f∞(v) ≤ sup_{t>0} (f(x + tv) − f(x))/t.
This inequality is obvious if the expression on the right-hand side is infinity.
Thus it suffices to consider the case where
γ := sup_{t>0} (f(x + tv) − f(x))/t ∈ R.
In this case, we have that (x, f (x)) + t(v, γ) ∈ epi(f ) for all t > 0. Since epi(f )
is closed, it follows from Proposition 6.26 that
(v, γ) ∈ epi(f )∞ .
The latter ensures the conditions
γ ≥ inf{λ ∈ R | (v, λ) ∈ epi(f)∞} = f∞(v),
which, therefore, complete the proof of the theorem.
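Both expressions in (6.16) can be probed numerically. For the softplus function f(x) = ln(1 + eˣ), our own choice of test function, the horizon function is v ↦ max(v, 0); a Python sketch approximating the limit by a large value of t:

```python
import math

def f(x):
    # softplus, a smooth convex function; for large x use the asymptote
    # to avoid floating-point overflow in exp
    return math.log1p(math.exp(x)) if x < 30 else x

def horizon_quotient(v, x=0.0, t=1e6):
    # the difference quotient (f(x + t*v) - f(x)) / t from (6.16)
    return (f(x + t * v) - f(x)) / t

# for convex f the quotient is nondecreasing in t, so the sup equals the limit
assert horizon_quotient(1.0, t=10.0) <= horizon_quotient(1.0, t=1000.0) + 1e-12

for v in (-2.0, -0.5, 0.0, 0.7, 3.0):
    # horizon function of softplus: f_inf(v) = max(v, 0)
    assert abs(horizon_quotient(v) - max(v, 0.0)) < 1e-5
```

The monotonicity of the difference quotient in t is exactly why the supremum and the limit in (6.16) coincide.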
The last result here expresses the horizon function of the Fenchel conjugate
via the support function of the domain for the function in question.
Theorem 6.37 Let f : X → R be a proper, l.s.c., and convex function on an
LCTV space X. Then we have the relationships
(f*)∞ = σ_{dom(f)} and f∞ = σ_{dom(f*)}. (6.17)
Proof. Fix v* ∈ X* and x*_0 ∈ dom(f*). Then for any t > 0, we get
f*(x*_0 + tv*) = sup{⟨x*_0 + tv*, x⟩ − f(x) | x ∈ dom(f)}
≤ sup{⟨x*_0, x⟩ − f(x) | x ∈ dom(f)} + t sup{⟨v*, x⟩ | x ∈ dom(f)}
= f*(x*_0) + t σ_{dom(f)}(v*).
It follows, therefore, that
(f*(x*_0 + tv*) − f*(x*_0))/t ≤ σ_{dom(f)}(v*) whenever t > 0.
Applying Theorem 6.36 tells us that (f*)∞(v*) ≤ σ_{dom(f)}(v*). It remains to prove that σ_{dom(f)}(v*) ≤ (f*)∞(v*). As discussed above, it suffices to consider the case where γ := (f*)∞(v*) ∈ R. Then Theorem 6.36 yields the estimate
f*(x*_0 + tv*) ≤ f*(x*_0) + tγ for all t > 0.
Using the Fenchel-Young inequality gives us
f(x) ≥ ⟨x*_0 + tv*, x⟩ − f*(x*_0 + tv*)
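A one-dimensional illustration of (6.17), under our own choice of test function: for f(x) = max(x, 0) the conjugate is the indicator function of [0, 1], so dom(f*) = [0, 1] and σ_{dom(f*)}(v) = max(v, 0), which matches the horizon function computed from (6.16):

```python
def f(x):
    # f(x) = max(x, 0); its Fenchel conjugate is the indicator of [0, 1]
    return max(x, 0.0)

def horizon(v, x=0.0, t=1e9):
    # horizon function via the limit formula (6.16), t taken very large
    return (f(x + t * v) - f(x)) / t

def support_dom_conj(v):
    # support function of dom(f*) = [0, 1]
    return max(v, 0.0)

for v in (-3.0, -0.2, 0.0, 0.4, 2.5):
    assert abs(horizon(v) - support_dom_conj(v)) < 1e-9
```

Here f is piecewise linear, so the difference quotient stabilizes at its limit for every sufficiently large t.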
Proof. Assertion (a) is obvious. To verify (b), observe from (6.19) that
{x ∈ X | ϑ(x; Ω) ≤ 0} = Ω.
To prove the other property in (b), suppose first that ϑ(x; Ω) < 0. Then x ∈ Ω
and γ := d(x; Ω c ) > 0. We get B(x; γ/2) ⊂ Ω, and hence x ∈ int(Ω). Now
pick x ∈ int(Ω) and find γ > 0 such that B(x; γ) ⊂ Ω. Then for any w ∈ Ωᶜ, we have ‖x − w‖ ≥ γ, which shows that d(x; Ωᶜ) ≥ γ > 0.
To verify (c), take any α ∈ R and get for the α-sublevel set of (6.19) that
Lα := {x ∈ X | ϑ(x; Ω) ≤ α} = {x ∈ Ω | d(x; Ωᶜ) ≥ −α} = Ω ∩ {x ∈ X | d(x; Ωᶜ) ≥ −α}.
This shows that the closedness of Ω yields the closedness of Lα due to the
continuity of d(·; Ω c ). Thus ϑ(·; Ω) is l.s.c. on X.
Now we are ready to show that the signed distance function has the same
global Lipschitz continuity as its standard counterpart (2.33).
The previous theorem and the convexity of the infimal convolution (2.41)
for convex functions yield the convexity of the signed distance function (6.18)
associated with a convex set.
and so inf w∈Ω pF (w − x) ≤ TΩF (x). Let further γ := inf w∈Ω pF (w − x) and,
given ε > 0, find w ∈ Ω satisfying
pF (w − x) < γ + ε.
Then there exists t ≥ 0 such that t < γ + ε and w − x ∈ tF . This implies that
TΩF (x) ≤ t < γ + ε,
and hence TΩF (x) ≤ γ = inf w∈Ω pF (w − x), which completes the proof.
Now we present two consequences of this theorem that give us sufficient
conditions on the constant dynamics F generating the Minkowski gauge (6.23)
that ensure the continuity of (6.21) in arbitrary topological vector spaces X
as well as its Lipschitz continuity if the space in question is normed.
Corollary 6.51 Under the general assumptions of Theorem 6.50 the minimal
time function (6.21) is continuous on X.
Proof. Using representation (6.24), we fix x ∈ X and get
TΩF (x) = inf pF (w − x).
w∈Ω
Let V := int(F ) and w0 ∈ X. It follows from Corollary 2.27 that pF (w0 −x) <
1 if and only if w0 − x ∈ V , i.e., x ∈ w0 − V , which is a neighborhood of w0 .
Thus the minimal time function is bounded on a nonempty, open, and convex
set. This yields the continuity of (6.21) on X by Theorem 2.144.
Corollary 6.52 Under the general assumptions of Theorem 6.50, we have
TΩF (x) − TΩF (y) ≤ pF (y − x) for all x, y ∈ X. (6.25)
If X is a normed space, then TΩF (·) is Lipschitz continuous on X.
Proof. Fix any x, y ∈ X and q ∈ Ω. Then it follows from Theorem 6.50 that
TΩF (x) = inf pF (w − x)
w∈Ω
≤ pF (q − x) = pF (q − y + y − x)
≤ pF (q − y) + pF (y − x).
Taking the infimum above with respect to q ∈ Ω gives us (6.25).
When X is a normed space, fix r > 0 such that B(0; r) ⊂ F . Then
pF(x) = inf{t > 0 | x/t ∈ F}
≤ inf{t > 0 | x/t ∈ B(0; r)}
≤ inf{t > 0 | ‖x‖/t ≤ r}
= inf{t > 0 | ‖x‖/r ≤ t} = ‖x‖/r,
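These estimates can be checked directly in one dimension. The sketch below (Python; names are ours) uses the nonsymmetric dynamics F = [−1, 2], the same set as in Example 6.70 later, for which pF(x) = −x if x < 0 and pF(x) = x/2 if x ≥ 0, together with the target Ω = {5} and r = 1, since B(0; 1) = [−1, 1] ⊂ F:

```python
def p_F(x):
    # Minkowski gauge of F = [-1, 2] on the real line: inf{t > 0 | x/t in F}
    return -x if x < 0 else x / 2

def T(x, target):
    # minimal time function via representation (6.24): inf over w in the target
    return min(p_F(w - x) for w in target)

target = [5.0]  # Omega = {5}
r = 1.0         # B(0; r) = [-1, 1] is contained in F, so p_F(x) <= |x|/r

pts = [i / 10 for i in range(-50, 101)]
for x in pts:
    assert p_F(x) <= abs(x) / r + 1e-12
for x in pts:
    for y in pts:
        # the Lipschitz-type estimate (6.25): T(x) - T(y) <= p_F(y - x)
        assert T(x, target) - T(y, target) <= p_F(y - x) + 1e-12
```

Both bounds follow from the subadditivity and positive homogeneity of the gauge, which the grid check merely illustrates.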
Proof. Suppose that the target set Ω is convex and show that in this case for
any x1 , x2 ∈ X and for any λ ∈ (0, 1), we have
TΩF(λx1 + (1 − λ)x2) ≤ λTΩF(x1) + (1 − λ)TΩF(x2). (6.30)
Let t1 := TΩF (x1 ) and t2 := TΩF (x2 ). Then for any ε > 0 there exist numbers
γ1 and γ2 such that
ti ≤ γi < ti + ε and Ω ∩ (xi + γiF) ≠ ∅, i = 1, 2.
Take wi ∈ Ω ∩ (xi + γi F ) and by the convexity of the sets Ω and F obtain
the inclusions λw1 + (1 − λ)w2 ∈ Ω and
λw1 + (1 − λ)w2 ∈ λx1 + (1 − λ)x2 + λγ1F + (1 − λ)γ2F ⊂ λx1 + (1 − λ)x2 + (λγ1 + (1 − λ)γ2)F.
The latter implies the inequalities
TΩF(λx1 + (1 − λ)x2) ≤ λγ1 + (1 − λ)γ2 ≤ λTΩF(x1) + (1 − λ)TΩF(x2) + ε,
which in turn justify (6.30) due to the arbitrary choice of ε > 0.
shows that condition (6.31) does not hold under the assumptions made, and
thus the function TΩF is concave on Ω c .
Finally in this subsection, we present some results involving the closure
operation in the framework of the minimal time function. Having Ω and F
from (6.21), define the closure of Ω relative to F by
clF(Ω) := ⋂_{ε>0} (Ω − εF). (6.32)
The next three propositions involving (6.32) deal with (6.21) in topological vector spaces. The first statement shows that the boundedness of F ensures that (6.32) reduces to the topological closure Ω̄ of Ω, independently of F.
Recall that a subset Θ of a topological vector space is bounded if for any
neighborhood V of the origin there exists t > 0 such that Θ ⊂ tV .
Proof. Fix any x ∈ Ω̄ and choose a neighborhood V of the origin such that V ⊂ F. Then for any ε > 0, we get x ∈ Ω − εV ⊂ Ω − εF. It follows from (6.32) that x ∈ clF(Ω), and thus clF(Ω) ⊃ Ω̄.
To verify the opposite inclusion, pick any x ∈ clF(Ω) and get x ∈ Ω − εF for all ε > 0. Taking any neighborhood V of the origin and using the boundedness of F, we have F ⊂ tV for some t > 0, and so εF ⊂ V for ε := 1/t. This tells us that x ∈ Ω − V, i.e., (x + V) ∩ Ω ≠ ∅, which implies that x ∈ Ω̄.
The final two statements do not assume the boundedness of the constant dynamics set F. We begin by showing that the usage of (6.32) allows us to characterize the roots of the equation TF(x; Ω) = 0.
Proposition 6.59 Let F be a subset of a topological vector space such that the
interiority condition 0 ∈ int(F ) is satisfied. Then x ∈ X solves the equation
TF (x; Ω) = 0 if and only if we have x ∈ clF (Ω).
Proof. Suppose that TF(x; Ω) = 0 and then for any ε > 0 find 0 < t < ε with
(x + tF) ∩ Ω ≠ ∅.
Since the condition 0 ∈ int(F) ensures that tF ⊂ εF, we have (x + εF) ∩ Ω ≠ ∅,
and thus x ∈ Ω − εF . The latter implies in turn that x ∈ clF (Ω).
To verify the opposite implication, pick x ∈ clF(Ω) and get by (6.32) that x ∈ Ω − εF for every ε > 0, which yields (x + εF) ∩ Ω ≠ ∅ for all positive ε. This tells us that TF(x; Ω) < ε whenever ε > 0, and thus TF(x; Ω) = 0.
The last assertion elaborates the result of Proposition 6.58 in the case where the set F is unbounded. In this case, we express the relative closure
(6.32) in terms of Ω and the horizon cone of F defined in (6.12). Recall from
Proposition 6.26 that the horizon cone of a closed and convex set is the same
for all points of the set in question.
Proof. To verify first the inclusion “⊃” in (6.33), pick any x ∈ Ω − F∞ and
get x = w − d for some w ∈ Ω and d ∈ F∞ . Using definition (6.12) of F∞
and the condition 0 ∈ int(F ) implies that t(w − x) ∈ F for all t > 0 and that
x ∈ Ω − εF for all ε > 0. This tells us by (6.32) that x ∈ clF (Ω).
To check the opposite inclusion in (6.33), fix x ∈ clF (Ω) and for any k ∈ N
find wk ∈ Ω and vk ∈ F such that x = wk − vk /k. Since Ω is sequentially
compact, suppose without loss of generality that the sequence {wk } converges
to some w ∈ Ω. Hence vk /k → w − x as k → ∞, and we arrive at w − x ∈ F∞ ,
which yields x ∈ Ω − F∞ and thus completes the proof.
⟨x*, (x̄ − tf) − x̄⟩ ≤ TΩF(x̄ − tf) ≤ t,
where the last inequality holds due to the condition ((x̄ − tf) + tF) ∩ Ω ≠ ∅ in the definition of the minimal time function. This ensures, therefore, that
⟨x*, −f⟩ ≤ 1 for all f ∈ F,
and so x* ∈ C* by the construction of C* and definition (4.8) of the support function. Thus we arrive at the inclusion ∂TΩF(x̄) ⊂ N(x̄; Ω) ∩ C*.
To verify the opposite inclusion in (6.34), pick any x* ∈ N(x̄; Ω) ∩ C* and then get ⟨x*, −f⟩ ≤ 1 for all f ∈ F together with
⟨x*, x − x̄⟩ ≤ 0 for all x ∈ Ω.
Fix further any u ∈ X and for every ε > 0 find t > 0, f ∈ F, and ω ∈ Ω with
TΩF(u) ≤ t < TΩF(u) + ε and u + tf = ω.
This readily gives us the relationships
⟨x*, u − x̄⟩ = ⟨x*, u − ω⟩ + ⟨x*, ω − x̄⟩ ≤ ⟨x*, −tf⟩ ≤ t < TΩF(u) + ε = TΩF(u) − TΩF(x̄) + ε,
which yield (6.37) since ε was chosen arbitrarily, and thus justify (6.34).
Let us finally prove the fulfillment of (6.36). We obviously have the inclusion N(x̄; clF(Ω)) ∩ C* ⊂ N(x̄; Ω) ∩ C* due to Ω ⊂ clF(Ω). To verify the opposite one, pick any x* ∈ N(x̄; Ω) ∩ C* and x ∈ clF(Ω). Then we get from (6.32) that x ∈ Ω − tF for all t > 0. Representing x = ωt − tdt with some ωt ∈ Ω and dt ∈ F for each t > 0 tells us that
⟨x*, x − x̄⟩ = ⟨x*, ωt − tdt − x̄⟩ = ⟨x*, ωt − x̄⟩ + t⟨x*, −dt⟩ ≤ t,
which ensures that x* ∈ N(x̄; clF(Ω)) ∩ C* by passing to the limit as t ↓ 0 and thus completes the proof of the theorem.
The next theorem derives a similar subdifferential formula for TΩF at points x̄ ∈ clF(Ω) of the F-relative closure of Ω defined in (6.32).
Theorem 6.62 Let x̄ ∈ clF(Ω) in the setting of Theorem 6.61. Then we have
∂TΩF(x̄) = N(x̄; clF(Ω)) ∩ C*, (6.38)
where the set C* ⊂ X* is taken from (6.35).
Proof. Picking any x* ∈ ∂TΩF(x̄), deduce from (6.37) and the equality TΩF(x) = 0 for all x ∈ clF(Ω) that x* ∈ N(x̄; clF(Ω)). Since clF(Ω) ⊂ Ω − F, we represent x̄ as w − f̄ with w ∈ Ω and f̄ ∈ F. Fixing any f ∈ F and t > 0, denote x := w − tf and obtain similarly to the proof of Theorem 6.61 that
⟨x*, (w − tf) − (w − f̄)⟩ = ⟨x*, −tf + f̄⟩ ≤ TΩF(w − tf) ≤ t,
which clearly ensures that
satisfied. It follows from Theorem 6.61 that x∗ ∈ ∂TΩFr (x), and thus
Proof. Let us first observe that the generalized projection (6.28) is a nonempty
set. Indeed, it is well known (see Chapter 2) that the Minkowski gauge pF is a
continuous function on X under the imposed convexity of F with 0 ∈ int(F ).
The convexity of pF ensures that pF is weakly lower semicontinuous on X.
Due to the semireflexivity of X and the assumptions imposed on Ω, it is not
hard to show that Ω is weakly compact in X; see Exercise 6.110(a). Thus
the infimum in (6.24) is realized by the Weierstrass existence theorem in the
weak topology of X. This justifies the nonemptiness of ΠF (x; Ω). Arguing now
similarly to the proof of Theorem 6.61, we arrive at the claimed representation
(6.42) and thus complete the proof of the theorem.
Similar to the signed distance function (6.18) studied in Section 6.4, we consider here the signed minimal time function, which is the corresponding version of the minimal time function with constant dynamics (6.21). Recall that a subset Ω of a topological vector space X is called nontrivial if both Ω and its complement Ωᶜ are nonempty.
Definition 6.65 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), let Ω be a nontrivial subset of X, and let Ωᶜ be the complement of Ω in X. The signed minimal time function associated with the target set Ω and the constant dynamics set F is given by
ΔΩF(x) := TΩF(x) − TΩᶜF(x), x ∈ X. (6.43)
To investigate the signed minimal time function (6.43) in what follows, consider first the auxiliary function μΩF : X → R defined by
μΩF(x) := −TΩᶜF(x) if x ∈ Ω, and μΩF(x) := ∞ if x ∈ Ωᶜ, (6.44)
and reveal some of its properties in the next proposition.
Proposition 6.66 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), and let Ω, Ω1, Ω2 be nontrivial subsets of X. Then the following properties of μΩF are satisfied:
(a) If Ω1 ⊂ Ω2, then μΩ2F(x) ≤ μΩ1F(x) for all x ∈ X.
Proof. Properties (a) and (b) can be checked directly from definition (6.44). To verify (c), it suffices to prove that TΩᶜF(x) > 0 if and only if x ∈ int(Ω). Indeed, assume that TΩᶜF(x) > 0 and get x ∈ Ω. We need to show that x ∉ bd(Ω). Suppose on the contrary that x ∈ bd(Ω). Due to the definitions and the interiority assumption 0 ∈ int(F), we have
(x + (1/k)F) ∩ Ωᶜ ≠ ∅ for all k ∈ N.
It follows that TΩFc (x) ≤ 1/k whenever k ∈ N, and hence TΩFc (x) = 0. Using
the imposed boundedness of F and employing Proposition 6.58 tell us that
x ∈ Ω c . This is a contradiction, which justifies the “only if” in the statement.
To verify further the “if” part therein, take x ∈ int(Ω) and suppose on the
contrary that TΩFc (x) = 0. Then select a neighborhood V of the origin in X
6.5 Minimal Time Functions 421
Then we find 0 < t < λTΩᶜF(x1) + (1 − λ)TΩᶜF(x2) such that
(λx1 + (1 − λ)x2 + tF) ∩ Ωᶜ ≠ ∅.
This gives us a vector w ∈ F satisfying the inclusion
λx1 + (1 − λ)x2 + tw ∈ Ωᶜ.
Define further the number γ := λTΩᶜF(x1) + (1 − λ)TΩᶜF(x2) and the points
u1 := x1 + (tw/γ)TΩᶜF(x1), u2 := x2 + (tw/γ)TΩᶜF(x2).
Since t < γ, it follows that u1, u2 ∈ Ω. Indeed, when TΩᶜF(x1) = 0 we immediately get u1 = x1 ∈ Ω. In the remaining case where TΩᶜF(x1) > 0, suppose on the contrary that u1 ∉ Ω, and so u1 ∈ Ωᶜ. Then it follows from definition (6.21) of the minimal time function that
TΩᶜF(x1) ≤ (t/γ)TΩᶜF(x1) < TΩᶜF(x1),
which is nonsense, and hence we get u1 ∈ Ω. Using finally the assumed con-
vexity of the set Ω tells us that
λu1 + (1 − λ)u2 = λx1 + (1 − λ)x2 + tw ∈ Ω.
The obtained contradiction verifies the concavity of TΩᶜF in (6.46), which yields the convexity of μΩF in (6.44) and thus completes the proof of the lemma.
Now we are ready to establish the main result of this subsection, which provides a representation of the signed minimal time function (6.43) as the infimal convolution of the function μΩF and the Minkowski gauge of F and, as a consequence, gives us sufficient conditions for the convexity of ΔΩF.
Theorem 6.69 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), and let Ω be a nontrivial subset of X. Then the following hold:
(a) We have the estimate
which ensures by (6.43) the equality in (6.47) and thus justifies (b).
Finally, we verify (c) by using Lemma 6.68 on the convexity of μΩF, Theorem 2.26 on the convexity of the Minkowski gauge, and Proposition 2.135 on the convexity of the infimal convolution of convex functions.
The following example shows that the symmetry of the set F is essential
for the equality in (6.47) even in the case of one-dimensional problems.
Example 6.70 Consider the signed minimal time function (6.43) for $x \in \mathbb{R}$, $F := [-1, 2]$, and $\Omega := \{5\}$. Then
$$p_F(x) = \begin{cases} -x & \text{if } x < 0,\\ x/2 & \text{if } x \ge 0, \end{cases}$$
$T^F_\Omega(x) = p_F(5 - x)$, and $\mu^F_\Omega(x) = \delta_\Omega(x)$. In this case the signed minimal time function (6.43) equals $p_F(5 - x)$, while $\big(\mu^F_\Omega \,\square\, p_F\big)(x) = p_F(x - 5)$, which shows the failure of the equality in (6.47).
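The closed forms in Example 6.70 can be checked numerically. The following is a small sketch (the function names are ours, not the book's notation): it evaluates the gauge $p_F$, the minimal time function, and the infimal convolution, confirming that the two sides of (6.47) disagree off $\Omega$.

```python
# A numerical sketch of Example 6.70 (function names are ours).  Here
# F = [-1, 2] and Omega = {5}, so p_F(x) = inf{t > 0 : x in tF} = max(-x, x/2).

def p_F(x):
    # x in t*[-1, 2]  iff  -t <= x <= 2t  iff  t >= max(-x, x/2)
    return max(-x, x / 2)

def T(x):
    # minimal time: T(x) = inf{t >= 0 : (x + tF) meets {5}} = p_F(5 - x)
    return p_F(5 - x)

def conv(x):
    # infimal convolution (mu_Omega [] p_F)(x) with mu_Omega = delta_{5}:
    # the only admissible point is u = 5, giving p_F(x - 5)
    return p_F(x - 5)

for x in [0.0, 3.0, 7.0]:
    print(x, T(x), conv(x))   # the two values disagree off Omega
```

For the symmetric choice $F = [-1, 1]$ the two functions would coincide, which is exactly the role of the symmetry assumption in (6.47).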
Proof. We know from Corollary 6.52 that both functions $T^F_\Omega$ and $T^F_{\Omega^c}$ are Lipschitz continuous on $X$, and their common Lipschitz constant is calculated by $\ell = \|F^\circ\|$; see Exercise 6.109(b). To verify the conclusion of this proposition, it suffices to show that—due to the structure of (6.43)—for any fixed points $x \in \Omega$ and $u \in \Omega^c$ we have the estimate (writing $\mathcal{T}^F_\Omega$ for the signed minimal time function (6.43))
$$\big|\mathcal{T}^F_\Omega(x) - \mathcal{T}^F_\Omega(u)\big| \le \|F^\circ\| \cdot \|x - u\|. \qquad (6.50)$$
To proceed with the verification of (6.50), observe first that
$$\big|\mathcal{T}^F_\Omega(x) - \mathcal{T}^F_\Omega(u)\big| = \big|T^F_\Omega(x) - T^F_{\Omega^c}(x) - T^F_\Omega(u) + T^F_{\Omega^c}(u)\big| \le \cdots \le \|F^\circ\| \cdot \|z - u\| = \|F^\circ\| \cdot \|x - u\|,$$
which justifies (6.50) and thus completes the proof.
$$x = \lambda \sum_{i=1}^{m} \frac{\lambda_i}{\lambda}\, a_i \in K,$$
and deduce from Proposition 6.73 that there exist $\lambda_i \ge 0$ and $(1, a_i) \in \Theta$ for $i = 0, \ldots, m$ with $m \le n$ such that we have
$$(1, x) = \sum_{i=0}^{m} \lambda_i (1, a_i).$$
Proof. Since the inclusion "⊃" in (6.52) is obvious, it remains to verify the reverse inclusion. To proceed, pick any $x \in K_\Omega$ and get by definition that
$$x = \sum_{i=1}^{m} \mu_i a_i \quad \text{with } \mu_i \ge 0, \; i = 1, \ldots, m.$$
Without loss of generality, suppose that $x \ne 0$ and find by the linear dependence of $a_i$ numbers $\gamma_i \in \mathbb{R}$, not all zeros, such that
$$\sum_{i=1}^{m} \gamma_i a_i = 0.$$
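The elimination step behind this argument is constructive: a linear dependence among the generators lets us shift the coefficients until one of them vanishes while the represented point is unchanged. The sketch below (our own illustration; `reduce_once` is a hypothetical name, and extracting the dependence from the SVD null space is an implementation choice) performs one such step.

```python
import numpy as np

# Given a conic combination x = sum mu_i a_i with linearly dependent columns
# a_i, use a dependence sum gamma_i a_i = 0 to drop one generator while
# keeping all coefficients nonnegative (the step used in the proof above).

def reduce_once(A, mu):
    """A: generators as columns (n x m); mu: nonnegative coefficients.
    Returns (A', mu') with one fewer column representing the same point."""
    n, m = A.shape
    gamma = np.linalg.svd(A)[2][-1]      # a nonzero vector with A @ gamma = 0
    if gamma.max() <= 0:                  # ensure some gamma_i > 0
        gamma = -gamma
    # largest step t keeping mu - t*gamma >= 0; one coefficient hits zero
    t = min(mu[i] / gamma[i] for i in range(m) if gamma[i] > 0)
    new_mu = mu - t * gamma
    keep = [i for i in range(m) if new_mu[i] > 1e-12]
    return A[:, keep], new_mu[keep]

A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])   # three generators in R^2
mu = np.array([1.0, 2.0, 3.0])
x = A @ mu
A2, mu2 = reduce_once(A, mu)
print(A2.shape[1], np.allclose(A2 @ mu2, x))        # two generators suffice
```

Iterating the step until the remaining generators are linearly independent gives the conic version of Carathéodory's theorem.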
Proof. We begin with considering the case where the vectors $a_1, \ldots, a_m$ are linearly independent in $\mathbb{R}^n$. Take any sequence $\{x_k\} \subset K_\Omega$ converging to $x$ as $k \to \infty$ and find by definition of $K_\Omega$ nonnegative numbers $\alpha_{ki}$ for $i = 1, \ldots, m$ and $k \in \mathbb{N}$ such that
$$x_k = \sum_{i=1}^{m} \alpha_{ki} a_i, \quad k \in \mathbb{N}.$$
Since $K_\Omega$ is a cone, this tells us that $0 = \langle 0, x \rangle < \langle q, x \rangle$ and that $tu \in K_\Omega$ whenever $t > 0$ and $u \in K_\Omega$. Hence we have
$$t \sup\big\{\langle u, x \rangle \;\big|\; u \in K_\Omega\big\} < \langle q, x \rangle \quad \text{whenever } t > 0.$$
Now divide both sides of the above inequality by $t$ and let $t \to \infty$. This yields
$$\sup\big\{\langle u, x \rangle \;\big|\; u \in K_\Omega\big\} \le 0$$
and implies in turn that $\langle a_i, x \rangle \le 0$ for all $i = 1, \ldots, m$. On the other hand, we get $\langle q, x \rangle > 0$, which contradicts (b) and thus completes the proof.
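Alternatives of Farkas type are easy to illustrate on a concrete instance. Below is a toy example of our own (not taken from the text): the system $Ax = b$, $x \ge 0$ is infeasible, and an explicit vector $y$ certifies this via $A^*y \le 0$ and $\langle b, y \rangle > 0$.

```python
# A toy check of the Farkas alternative (our own example): either Ax = b
# with x >= 0 is solvable, or some y satisfies A^T y <= 0 and <b, y> > 0.

A = [[1.0, 1.0]]          # one equation: x1 + x2 = b
b = [-1.0]

# (a) is infeasible: x1 + x2 >= 0 for every x >= 0, while b = -1.
# A certificate for the alternative:
y = [-1.0]
ATy = [sum(A[k][i] * y[k] for k in range(len(y))) for i in range(len(A[0]))]
by = sum(b[k] * y[k] for k in range(len(b)))
print(ATy, by)            # [-1.0, -1.0] and 1.0: A^T y <= 0, <b, y> > 0
```

The vector $y$ plays the role of the separating functional constructed in the proof above.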
In the last subsection of this section, we present two other remarkable results of
convex finite-dimensional geometry known as Helly’s theorem and Radon’s the-
orem. Although generally different from Carathéodory's theorem and Farkas' lemma discussed above, they are closely related to both results. In this subsection, we first prove
Radon’s theorem and then use it for the derivation of Helly’s result; see his-
torical comments in Section 6.9.
Let us begin with the following simple observation. Recall that the notions
of affinely dependent vectors is taken from Definition 2.74.
Lemma 6.78 Any w1 , . . . , wm ∈ Rn with m ≥ n + 2 are affinely dependent.
Letting $\lambda := \sum_{i \in I_1} \lambda_i$ gives us the equalities
$$\sum_{i \in I_1} \lambda_i w_i = -\sum_{i \in I_2} \lambda_i w_i \quad \text{and} \quad \sum_{i \in I_1} \frac{\lambda_i}{\lambda}\, w_i = \sum_{i \in I_2} \frac{-\lambda_i}{\lambda}\, w_i,$$
and thus conclude that $\operatorname{co}(\Omega_1) \cap \operatorname{co}(\Omega_2) \ne \emptyset$, which completes the proof.
Finally, we are ready to formulate the classical Helly theorem and present
its proof by following Radon’s approach.
We clearly have that all Ωi are convex and obey the inclusions Ωi ⊂ Ωj
whenever j = i and i, j = 1, . . . , m + 1. The induction assumption tells us
that Ωi = ∅, and hence we pick wi ∈ Ωi for every i = 1, . . . , m + 1. Using the
Radon theorem for the chosen vectors $w_1, \ldots, w_{m+1}$ gives us two nonempty subsets $I_1, I_2 \subset I := \{1, \ldots, m+1\}$ such that $I_1 \cap I_2 = \emptyset$, $I = I_1 \cup I_2$, and
$$\operatorname{co}(W_1) \cap \operatorname{co}(W_2) \ne \emptyset \quad \text{for } W_1 := \{w_i \mid i \in I_1\} \text{ and } W_2 := \{w_i \mid i \in I_2\}.$$
Select now $w \in \operatorname{co}(W_1) \cap \operatorname{co}(W_2)$ and verify that $w \in \cap_{i=1}^{m+1} \Omega_i$. Indeed, fix an index $i \in \{1, \ldots, m+1\}$ and consider the case where $i \in I_1$. Since $i \ne j$ for every $j \in I_2$, it follows that $w_j \in \Omega_j \subset \Omega_i$ for every $j \in I_2$. This yields $w \in \operatorname{co}(W_2) = \operatorname{co}(\{w_j \mid j \in I_2\}) \subset \Omega_i$ by the convexity of $\Omega_i$, and therefore $w \in \Omega_i$ for every $i \in I_1$. We can similarly check that $w \in \Omega_i$ for every $i \in I_2$ and thus complete the proof.
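Radon's construction is fully algorithmic in $\mathbb{R}^n$. The sketch below (our own illustration; obtaining the affine dependence from an SVD null space is an implementation choice) carries it out for four points in the plane and exhibits a common point of the two convex hulls.

```python
import numpy as np

# Radon's argument for four points in R^2: find an affine dependence,
# split the indices by the sign of its coefficients, and check that the
# resulting convex combinations over the two groups coincide.

W = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # rows w_i

# affine dependence: sum lam_i w_i = 0 and sum lam_i = 0 with lam != 0
M = np.vstack([W.T, np.ones(4)])       # 3 x 4 homogeneous system
lam = np.linalg.svd(M)[2][-1]          # a null-space vector of M

I1 = [i for i in range(4) if lam[i] > 0]
I2 = [i for i in range(4) if lam[i] <= 0]
s = sum(lam[i] for i in I1)            # equals -sum of lam over I2
p1 = sum(lam[i] / s * W[i] for i in I1)    # convex combination over I1
p2 = sum(-lam[i] / s * W[i] for i in I2)   # convex combination over I2
print(np.allclose(p1, p2))             # the two hulls meet at this point
```

Here $w_4 = (1,1)$ lies on the segment between $w_2$ and $w_3$, so the two groups produced by the sign split have intersecting convex hulls, as the theorem guarantees.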
432 6 MISCELLANEOUS TOPICS ON CONVEXITY
Let us begin with the case where Ω is a nonempty convex cone. The dual/polar
cone to Ω is defined by the duality correspondence
$$\Omega^* := \big\{x^* \in X^* \;\big|\; \langle x^*, x \rangle \le 0 \text{ for all } x \in \Omega\big\}. \qquad (6.55)$$
Note that often in the literature on convex analysis the notation for the polar cone is $\Omega^\circ$, while the dual cone $\Omega^*$ is defined as the "positive polar cone" $\{x^* \in X^* \mid \langle x^*, x \rangle \ge 0 \text{ for all } x \in \Omega\}$. However, in this book, we follow the
tradition of modern variational analysis to not distinguish between the polar
and dual cones for convex sets and use notation (6.55) in what follows. This
also allows us to avoid confusion with the polar set Ω ◦ in (6.48).
Applying the polar operation to Ω ∗ and using the strict separation of
convex sets give us the following bipolarity relationship. Recall that the symbol
“cl Ω” stands as usual for the closure of the set Ω.
Now we define the tangent cone notion for convex sets by using the conic
hull construction from (6.51).
6.7 Approximations of Sets and Geometric Duality 433
The next theorem establishes the full duality correspondence between the
tangent and normal cones for convex sets; see Figure 6.5.
Theorem 6.83 Let $\Omega \subset \mathbb{R}^n$ be a convex set with $\bar{x} \in \Omega$. Then
$$N(\bar{x}; \Omega) = T(\bar{x}; \Omega)^* \quad \text{and} \quad T(\bar{x}; \Omega) = N(\bar{x}; \Omega)^*. \qquad (6.58)$$
Proof. First we check that $N(\bar{x}; \Omega) \subset T(\bar{x}; \Omega)^*$. Indeed, pick $x^* \in N(\bar{x}; \Omega)$ and get by definition that $\langle x^*, x - \bar{x} \rangle \le 0$ for all $x \in \Omega$. This yields
$$\langle x^*, t(x - \bar{x}) \rangle \le 0 \quad \text{whenever } t \ge 0 \text{ and } x \in \Omega.$$
For any $w \in T(\bar{x}; \Omega)$ we find by definition sequences of $t_k \ge 0$ and $x_k \in \Omega$ with $t_k(x_k - \bar{x}) \to w$ as $k \to \infty$. This tells us that
$$\langle x^*, w \rangle = \lim_{k \to \infty} \langle x^*, t_k(x_k - \bar{x}) \rangle \le 0, \quad \text{and so } x^* \in T(\bar{x}; \Omega)^*.$$
To check now the opposite inclusion in (6.58), pick $x^* \in T(\bar{x}; \Omega)^*$ and get from the polarity that $\langle x^*, w \rangle \le 0$ for any $w \in T(\bar{x}; \Omega)$ and that $x - \bar{x} \in T(\bar{x}; \Omega)$ for any $x \in \Omega$ due to (6.57). Thus we arrive at
$$\langle x^*, x - \bar{x} \rangle \le 0 \quad \text{whenever } x \in \Omega,$$
which means that $x^* \in N(\bar{x}; \Omega)$ and hence verifies the first equality in (6.58). The second equality in (6.58) follows directly from the first one by applying the bipolarity relationship
$$N(\bar{x}; \Omega)^* = T(\bar{x}; \Omega)^{**} = \operatorname{cl} T(\bar{x}; \Omega) = T(\bar{x}; \Omega),$$
which is valid due to Proposition 6.81 by taking into account that the tangent cone $T(\bar{x}; \Omega)$ is a closed set by Definition 6.82.
The next theorem establishes effective descriptions of the tangent and normal
cones for convex polyhedral sets defined by linear inequalities. The usage of
Farkas’ lemma from Theorem 6.77 is crucial for deriving this result.
$$N(\bar{x}; \Omega) = \operatorname{cone}\big\{x^*_i \;\big|\; i \in I(\bar{x})\big\}, \qquad (6.60)$$
where the convention $\operatorname{cone}(\emptyset) := \{0\}$ is used.
Proof. First we verify the tangent cone representation (6.59). Consider the closed and convex cone
$$K := \big\{v \in \mathbb{R}^n \;\big|\; \langle x^*_i, v \rangle \le 0 \text{ for all } i \in I(\bar{x})\big\}$$
and observe that whenever $x \in \Omega$ and $i \in I(\bar{x})$, we get
$$\langle x^*_i, t(x - \bar{x}) \rangle = t\big(\langle x^*_i, x \rangle - \langle x^*_i, \bar{x} \rangle\big) \le t(b_i - b_i) = 0 \quad \text{as } t \ge 0,$$
which tells us that $\operatorname{cone}(\Omega - \bar{x}) \subset K$. Invoking the closedness of $K$ yields
$$T(\bar{x}; \Omega) = \operatorname{cl}\big[\mathbb{R}_+(\Omega - \bar{x})\big] \subset K,$$
which justifies the inclusion "⊂" in (6.59). To check the opposite inclusion therein, take any $v \in K$ and deduce from $\langle x^*_i, \bar{x} \rangle < b_i$ as $i \notin I(\bar{x})$ that
$$\langle x^*_i, \bar{x} + tv \rangle \le b_i \quad \text{for any } i \notin I(\bar{x}) \text{ and small } t > 0.$$
Combining this with $\langle x^*_i, v \rangle \le 0$ as $i \in I(\bar{x})$ implies that $\langle x^*_i, \bar{x} + tv \rangle \le b_i$ for all $i = 1, \ldots, m$ and that
6.8 Exercises for Chapter 6 435
$$\bar{x} + tv \in \Omega, \quad \text{i.e., } v \in \frac{1}{t}(\Omega - \bar{x}) \subset T(\bar{x}; \Omega),$$
which completes the verification of the tangent cone representation (6.59).
To continue now with the proof of the normal cone representation (6.60), deduce from the first equality in (6.58) the equivalence
$$x^* \in N(\bar{x}; \Omega) \iff \langle x^*, v \rangle \le 0 \text{ for all } v \in T(\bar{x}; \Omega),$$
which implies in turn that $x^* \in N(\bar{x}; \Omega)$ if and only if
$$\big[\langle x^*_i, v \rangle \le 0 \text{ for all } i \in I(\bar{x})\big] \Longrightarrow \langle x^*, v \rangle \le 0$$
for all $v \in X$. Then we are in a position to employ the Farkas lemma from Theorem 6.77 and conclude that $x^* \in N(\bar{x}; \Omega)$ if and only if $x^* \in \operatorname{cone}\{x^*_i \mid i \in I(\bar{x})\}$. This verifies (6.60) and completes the proof of the theorem.
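For a concrete polyhedron these representations are directly checkable. The sketch below (our own example; `in_tangent` is a hypothetical helper name) tests membership in $T(\bar{x}; \Omega)$ via the active constraints, as in (6.59).

```python
# Sketch of (6.59) for Omega = {x in R^2 : x1 <= 1, x2 <= 1} at
# xbar = (1, 1), where both constraints are active.

xs = [[1.0, 0.0], [0.0, 1.0]]    # the functionals x*_i
bs = [1.0, 1.0]
xbar = [1.0, 1.0]

active = [i for i in range(2)
          if abs(sum(xs[i][j] * xbar[j] for j in range(2)) - bs[i]) < 1e-9]

def in_tangent(v):
    # v in T(xbar; Omega) iff <x*_i, v> <= 0 for every active index i
    return all(sum(xs[i][j] * v[j] for j in range(2)) <= 1e-9 for i in active)

print(active)                    # [0, 1]: both constraints are active
print(in_tangent([-1.0, -0.5])) # True: this direction points back into Omega
print(in_tangent([1.0, -1.0]))  # False: it violates the active x1 <= 1
```

By (6.60), the normal cone at this corner is $\operatorname{cone}\{(1,0), (0,1)\}$, i.e., the nonnegative quadrant, which is the polar of the tangent cone computed here.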
Proof. The set $\Omega$ is obviously represented via only the inequality constraints
$$\Omega = \big\{x \in X \;\big|\; \langle x^*_i, x \rangle \le b_i \text{ for } i = 1, \ldots, m, \text{ and } \langle u^*_j, x \rangle \le c_j,\; \langle -u^*_j, x \rangle \le -c_j \text{ for } j = 1, \ldots, r\big\}.$$
Then both the tangent and normal cone representations follow from the cor-
responding results of Theorem 6.84.
Exercise 6.88 Explore the possibility of extending the equivalence in Corollary 6.9
to the case where X is a normed space.
Exercise 6.90 Let $X$ be a topological vector space. Recall that a set-valued mapping $F \colon X \rightrightarrows X^*$ is strictly monotone if
is strongly convex on R.
Exercise 6.95 Prove the equivalence between Definition 6.25 of the horizon cone
and its representation (6.13).
Exercise 6.96 Let Ω ⊂ X be a nonempty convex subset of a vector space X. Verify
that the horizon cone Ω∞ is also convex.
Exercise 6.97 Let Ω j ⊂ X for j ∈ J be nonempty subsets of a vector space X,
and let J be an arbitrary index set. Prove that
$$\Big(\bigcap_{j \in J} \Omega^j\Big)_\infty \subset \bigcap_{j \in J} \Omega^j_\infty \quad \text{and} \quad \Big(\bigcup_{j \in J} \Omega^j\Big)_\infty \supset \bigcup_{j \in J} \Omega^j_\infty,$$
where the first inclusion holds as an equality if the sets Ω j have a common point,
while the second inclusion becomes an equality when the index set J is finite. Hint:
Apply Definition 6.25 in both cases of intersections and unions.
Exercise 6.98 Let Ω j ⊂ X, j = 1, . . . , s, be nonempty subsets of a vector space X.
Do the following:
(a) Check by definition that
$$\big(\Omega^1 \times \ldots \times \Omega^s\big)_\infty = \Omega^1_\infty \times \ldots \times \Omega^s_\infty.$$
Exercise 6.102 Calculate the perspective and horizon functions of elementary func-
tions: (a) polynomials; (b) exponential functions; (c) logarithmic functions.
Exercise 6.104 Let X be a topological vector space. Consider the constraint set
$$\Theta := \big\{x \in \Omega \;\big|\; f_j(x) \le 0 \text{ for } j \in J\big\},$$
and show that it holds as an equality provided that Θ = ∅ and that the set Ω
is closed. Hint: Combine the results from Exercises 6.97 and 6.103(a).
(b) Clarify whether the results of (a) are valid when dim(X) = ∞.
Exercise 6.105 Provide detailed calculations in the three examples given at the
beginning of Section 6.4.
Exercise 6.107 Prove representation (6.22) of the minimal time function, where $\Omega$ is a nonempty and closed subset of a normed space $X$.
Exercise 6.108 Prove that the conclusions of Theorem 6.50 and Proposition 6.56
hold under the fulfillment of all the assumptions of Theorem 6.50 and Proposi-
tion 6.56 but the boundedness of the set F , which is replaced by 0 ∈ int(F ). Hint:
Proceed similarly to the proofs of the aforementioned results.
Exercise 6.109 Consider the results presented in Corollaries 6.51 and 6.52.
(a) For the case where X = Rn , construct examples showing that each of the
imposed assumptions is essential for the corresponding conclusions.
(b) In the framework of Corollary 6.52, show that for any Ω ⊂ X a Lipschitz
constant of the minimal time function TΩF on X is calculated by F ◦ via
the norm (6.49) of the polar set (6.55). Hint: Deduce this from the proof of
Corollary 6.52.
(c) For the case where X = Rn , compare the Lipschitz constant from (b) with the
exact Lipschitzian modulus calculated in [317, Theorem 9.13]. Hint: Use the
subdifferential calculation for (6.21).
Exercise 6.110 Consider the minimal time function in the setting of Theorem 6.64.
(a) Prove that a subset Ω of a semireflexive space X is weakly compact in X under
the assumptions imposed on Ω in the theorem. Hint: Compare with the proof
of [218, Proposition 23.18].
(b) Finish the proof of Theorem 6.64 by using arguments similar to the proof of
6.61. Hint: Compare with the proof of [235, Theorem 4.3].
Exercise 6.111 Give detailed proofs of assertions (a) and (b) in Proposition 6.66
and of Corollary 6.67.
Exercise 6.113 Let TΩv be a directional minimal time function defined on a normed
space X. Verify the following assertions:
(a) If $v \in \operatorname{int}(\Omega_\infty)$, then the function $T^v_\Omega$ is Lipschitz continuous on $X$ and its Lipschitz constant is calculated by
$$\ell = \inf\Big\{\frac{1}{r} \;\Big|\; r > 0, \; B(v; r) \subset \Omega_\infty\Big\}.$$
Hint: Use the definitions and compare with the proof of [273, Proposition 4.1].
(b) The function TΩv is finite-valued and Lipschitz continuous on X if and only if
v ∈ Ω∞ . Hint: Use (a) and compare with the proof of [273, Proposition 4.2].
Exercise 6.114 Let Ω be a nonempty compact subset of Rn . Prove that the set
co(Ω) is compact as well. Hint: Use Carathéodory’s theorem.
$$Ax = b, \quad x \ge 0,$$
$$A^* y \le 0, \quad \langle b, y \rangle > 0$$
Exercise 6.117 Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Prove that one and only one of the following assertions is true:
(a) There exists $x \in \mathbb{R}^n$ such that $Ax = b$.
(b) There exists $y \in \mathbb{R}^m$ such that $A^* y = 0$ and $\langle b, y \rangle \ne 0$.
Exercise 6.118 Deduce Carathéodory’s theorem from the proof of Radon’s theo-
rem.
Exercise 6.120 Let $\Omega_1$ and $\Omega_2$ be convex sets in a topological vector space $X$ with $\operatorname{int}(\Omega_1) \cap \Omega_2 \ne \emptyset$. Show that
and let $\bar{x} = (\bar{x}_1, \ldots, \bar{x}_n) \in \Omega$. Verify the tangent and normal cone formulas:
$$T(\bar{x}; \Omega) = \big\{u = (u_1, \ldots, u_n) \in \mathbb{R}^n \;\big|\; Au = 0, \; u_i \ge 0 \text{ for any } i \text{ with } \bar{x}_i = 0\big\},$$
$$N(\bar{x}; \Omega) = \big\{v \in \mathbb{R}^n \;\big|\; v = A^* y - z, \; y \in \mathbb{R}^p, \; z \in \mathbb{R}^n, \; z_i \ge 0, \; \langle z, \bar{x} \rangle = 0\big\}.$$
Hint: Use the results of Corollary 6.85.
Exercise 6.122 Give an example of a closed and convex set Ω ⊂ R2 with a point
x ∈ Ω such that the cone cone(Ω − x) is not closed.
6.9 Commentaries to Chapter 6 441
with showing that polyhedral structures of sets and functions in finite dimensions
make it possible to significantly improve qualification conditions in calculus results,
duality relations, etc. Note that polyhedrality also plays a highly important role in
more general modes of variational analysis. We refer the reader to the very influential
paper by Stephen Robinson [301] and the book by Rockafellar and Wets [317].
7
CONVEXIFIED LIPSCHITZIAN ANALYSIS
Observe that we always have $df(\bar{x}; v) \le f^\circ(\bar{x}; v)$ and that the subderivative (7.2) reduces to the classical (one-sided) directional derivative
$$f'(\bar{x}; v) = \lim_{t \downarrow 0} \frac{f(\bar{x} + tv) - f(\bar{x})}{t}, \quad v \in X, \qquad (7.4)$$
defined and studied in Subsection 4.3.1, provided that the limit in (7.4) exists.
In contrast to (7.4), the lower limit in (7.2) always exists as a real number
due to the local Lipschitz continuity of f around x. It is easy to deduce
from the definitions that the directional regularity (7.3) is equivalent to the
requirement that the classical directional derivative $f'(\bar{x}; v)$ exists and agrees with the generalized one $f^\circ(\bar{x}; v)$ for all $v \in X$.
Let us highlight the two major differences between the generalized direc-
tional derivative f ◦ (x; v) and the constructions in (7.4) and (7.2).
(i) In contrast to (7.4) and (7.2), the initial point x in (7.1) is not fixed
while being included into the limiting process.
(ii) In contrast to the full limit in (7.4), provided that it exists, and to the
lower limit in (7.2), the upper limit is used in (7.1).
The aforementioned observations are crucial for the fulfillment of many
useful properties of (7.1), which are not generally shared by (7.4) and (7.2).
Now we present some elementary properties of both generalized direc-
tional derivative (7.1) and subderivative (7.2), which easily follow from the
definitions. The first proposition gives us an extended representation of sub-
derivatives that is useful not only for Lipschitzian functions; see Section 7.10.
Proposition 7.2 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x ∈ X. Given a direction v ∈ X, the
contingent derivative (7.2) admits the equivalent representation
$$df(\bar{x}; v) = \liminf_{\substack{w \to v \\ t \downarrow 0}} \frac{f(\bar{x} + tw) - f(\bar{x})}{t}. \qquad (7.5)$$
Proof. For any $v \in X$, we know that the classical directional derivative $f'(\bar{x}; v)$ exists as a real number for any convex and locally Lipschitzian function. It immediately follows from the definitions that
$$f'(\bar{x}; v) = \lim_{t \downarrow 0} \frac{f(\bar{x} + tv) - f(\bar{x})}{t} \le \limsup_{\substack{x \to \bar{x} \\ t \downarrow 0}} \frac{f(x + tv) - f(x)}{t} = f^\circ(\bar{x}; v),$$
7.1 Generalized Directional Derivatives 449
which gives us the inequality "≤" in (7.6). To verify the opposite inequality therein, let $\ell \ge 0$ be a Lipschitz constant of $f$ around $\bar{x}$. Fixing $x, v \in X$ and $\alpha > 0$, observe by Proposition 4.45 that the function $\psi(t) := \big(f(x + tv) - f(x)\big)/t$ is monotonically increasing on $(0, \infty)$. Then we get the relationships
$$\begin{aligned}
f^\circ(\bar{x}; v) &= \limsup_{\substack{\gamma \downarrow 0,\, \|x - \bar{x}\| < \alpha\gamma \\ 0 < t < \gamma}} \frac{f(x + tv) - f(x)}{t} \le \limsup_{\gamma \downarrow 0,\, \|x - \bar{x}\| < \alpha\gamma} \frac{f(x + \gamma v) - f(x)}{\gamma}\\
&\le \limsup_{\gamma \downarrow 0,\, \|x - \bar{x}\| < \alpha\gamma} \frac{f(x + \gamma v) - f(\bar{x} + \gamma v) + f(\bar{x} + \gamma v) - f(\bar{x}) + f(\bar{x}) - f(x)}{\gamma}\\
&\le \limsup_{\gamma \downarrow 0,\, \|x - \bar{x}\| < \alpha\gamma} \frac{\ell\|x - \bar{x}\| + f(\bar{x} + \gamma v) - f(\bar{x}) + \ell\|x - \bar{x}\|}{\gamma}\\
&\le \lim_{\gamma \downarrow 0} \Big(2\alpha\ell + \frac{f(\bar{x} + \gamma v) - f(\bar{x})}{\gamma}\Big) = 2\alpha\ell + f'(\bar{x}; v).
\end{aligned}$$
Since $\alpha > 0$ is arbitrary, we arrive at $f^\circ(\bar{x}; v) \le f'(\bar{x}; v)$ and conclude the proof by appealing to Proposition 7.3.
The next proposition establishes representations of both extended direc-
tional derivatives from Definition 7.1 in the case of differentiable functions.
Proposition 7.5 Let $X$ be a normed space, and let $f \colon X \to \mathbb{R}$ be a locally Lipschitzian function around some point $\bar{x} \in X$. Suppose that $f$ is Gâteaux differentiable at $\bar{x}$ with the Gâteaux derivative $f'_G(\bar{x})$. Then for all $v \in X$ we have the representation
$$f'(\bar{x}; v) = df(\bar{x}; v) = \langle f'_G(\bar{x}), v \rangle. \qquad (7.7)$$
If furthermore $f$ is a $C^1$-smooth function around $\bar{x}$ with the (Fréchet) derivative $f'_F(\bar{x})$ at $\bar{x}$, then for all $v \in X$ we have
$$f^\circ(\bar{x}; v) = \langle f'_F(\bar{x}), v \rangle, \qquad (7.8)$$
and thus $f$ is directionally regular at $\bar{x}$ in the latter case.
Proof. The representations in (7.7) easily follow from the definitions. To verify (7.8), by (7.1) we choose a sequence $\{t_k\}$ of positive numbers and a sequence $\{x_k\} \subset X$ converging to $\bar{x}$ such that
$$f^\circ(\bar{x}; v) = \lim_{k \to \infty} \frac{f(x_k + t_k v) - f(x_k)}{t_k}.$$
Then the classical mean value theorem tells us that
$$f(x_k + t_k v) - f(x_k) = \langle f'_F(u_k), t_k v \rangle$$
for some $u_k \in (x_k, x_k + t_k v)$ for every $k \in \mathbb{N}$. Thus we get
$$f^\circ(\bar{x}; v) = \lim_{k \to \infty} \frac{f(x_k + t_k v) - f(x_k)}{t_k} = \lim_{k \to \infty} \langle f'_F(u_k), v \rangle = \langle f'_F(\bar{x}), v \rangle,$$
which verifies (7.8) and thus completes the proof.
Propositions 7.4 and 7.5 expectedly verify the directional regularity for the
classical classes of convex and smooth functions. More results on directional
regularity are given in Subsection 7.2.2 in the calculus framework.
Let us now consider two examples illustrating the calculation of the gener-
alized directional derivative (7.1) and the subderivative (7.2) in simple albeit
rather instructive settings of nonconvex functions.
Proof. The positive homogeneity of both functions $f^\circ(\bar{x}; \cdot)$ and $df(\bar{x}; \cdot)$ immediately follows from the definitions. Furthermore, it follows from the local Lipschitz continuity of $f$ around $\bar{x}$ that
$$\frac{f(\bar{x} + tv) - f(\bar{x})}{t} \le \ell\|v\|, \quad v \in X,$$
which readily yields the second estimate in (7.11). Since $\ell$ is a Lipschitz constant of $f$ around $\bar{x}$, there exists $\eta > 0$ with
$$|f(x) - f(u)| \le \ell\|x - u\| \quad \text{for all } x, u \in B(\bar{x}; \eta),$$
which clearly implies that
$$\frac{f(x + tv) - f(x)}{t} \le \frac{\ell\|tv\|}{t} = \ell\|v\|, \quad v \in X,$$
whenever $x$ is near $\bar{x}$ and $t > 0$ is near zero. Passing there to the limit superior as $t \downarrow 0$ and $x \to \bar{x}$, we arrive at the first estimate in (7.11). Note finally that the conditions in (7.11) ensure that both functions $f^\circ(\bar{x}; \cdot)$ and $df(\bar{x}; \cdot)$ are finite on the entire space $X$.
We leave as exercises for the reader (see Section 7.9) to show that all
the properties for f ◦ (x; v), which are obtained in Proposition 7.9 and The-
orem 7.10, do not hold in general for the directional derivative $f'(\bar{x}; v)$ and
the subderivative $df(\bar{x}; v)$. Furthermore, even for (Lipschitzian) functions in
finite dimensions, the generalized directional derivative (7.1) may not agree
with the classical one (7.4) when the latter exists.
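A one-line numerical experiment (our own example, not one of the book's) makes this disagreement concrete for $f(x) = -|x|$ at $\bar{x} = 0$: the classical derivative (7.4) gives $-1$ in the direction $v = 1$, while allowing the base point to move as in (7.1) produces difference quotients near $+1$.

```python
# Classical directional derivative (7.4) versus a quotient realizing the
# upper limit in Clarke's construction (7.1) for f(x) = -|x| at xbar = 0.

f = lambda x: -abs(x)

# (7.4): base point fixed at 0, t down to 0
classical = (f(0 + 1e-8) - f(0)) / 1e-8        # close to -1

# (7.1): base point x = -t moves to 0 together with t; this choice
# realizes the upper limit for this function
t = 1e-8
generalized = (f(-t + t * 1.0) - f(-t)) / t    # close to +1

print(round(classical), round(generalized))    # -1 1
```

So $f'(0; 1) = -1$ while $f^\circ(0; 1) = 1$, illustrating item (i) above: including the base point in the limiting process changes the value.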
Yet another remarkable property of Clarke’s generalized directional deriva-
tive is its plus-minus symmetry. This drastically distinguishes (7.1) from (7.2)
and the classical one-sided directional derivative (7.4), which both are con-
structions of unilateral analysis in contrast to (7.1).
Proof. (a) Take any x ∈ Lim supt↓0 Ωt and find by the definition a sequence
{tk } of positive numbers with xk ∈ Ωtk for all k ∈ N such that
$$\lim_{k \to \infty} t_k = 0 \quad \text{and} \quad \lim_{k \to \infty} x_k = x.$$
Given each $k \in \mathbb{N}$, find $\widetilde{x}_k \in \Omega_{t_k}$ such that $\|x - \widetilde{x}_k\| < d(x; \Omega_{t_k}) + 1/k$. Then $\{\widetilde{x}_k\}$ converges to $x$, and so $x \in \operatorname{Lim\,sup}_{t \downarrow 0} \Omega_t$ as claimed.
(b) Take any $x \in \operatorname{Lim\,inf}_{t \downarrow 0} \Omega_t$ and any sequence $\{t_k\}$ of positive numbers that converges to $0$, and find by the definition $x_k \in \Omega_{t_k}$ for $k \in \mathbb{N}$ such that $\{x_k\}$ converges to $x$. It follows that
$$0 \le d(x; \Omega_{t_k}) \le \|x - x_k\| \to 0 \quad \text{as } k \to \infty,$$
which implies therefore that $\lim_{t \downarrow 0} d(x; \Omega_t) = 0$.
Now suppose that $\lim_{t \downarrow 0} d(x; \Omega_t) = 0$. Then for any sequence $\{t_k\}$ of positive numbers that converges to $0$, we have
$$\lim_{k \to \infty} d(x; \Omega_{t_k}) = 0.$$
Given each $k \in \mathbb{N}$, find $x_k \in \Omega_{t_k}$ such that $0 \le \|x - x_k\| < d(x; \Omega_{t_k}) + 1/k \to 0$ as $k \to \infty$. Thus $\{x_k\}$ converges to $x$, which completes the proof.
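The distance characterization in part (b) is easy to test numerically; the following is our own moving-set example with $\Omega_t = \{t, 2\}$.

```python
# For Omega_t = {t, 2}: d(0; Omega_t) = t -> 0, so 0 is in Liminf Omega_t,
# while d(1; Omega_t) -> 1 > 0, so 1 is not.

def d(x, t):
    # distance from x to the two-point set Omega_t = {t, 2}
    return min(abs(x - t), abs(x - 2))

print(d(0.0, 1e-6) < 1e-5, d(1.0, 1e-6) > 0.5)   # True True
```

The curve $x(t) := t$ realizes the selection described in Remark 7.14 for the point $x = 0$.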
Remark 7.14 It follows from Proposition 7.13 that x ∈ Lim inf t↓0 Ωt if and
only if there exists a curve $x(\cdot) \colon [0, \varepsilon] \to X$ for some $\varepsilon > 0$ satisfying $x(t) \in \Omega_t$ for all $t \in (0, \varepsilon]$ and $\lim_{t \downarrow 0} x(t) = x = x(0)$. Indeed, suppose that $x \in \operatorname{Lim\,inf}_{t \downarrow 0} \Omega_t$. Then for any $t \in (0, \varepsilon]$ choose $x_t \in \Omega_t$ such that $\|x - x_t\| < d(x; \Omega_t) + t$. Then we simply set $x(t) := x_t$ for $t \in (0, \varepsilon]$ and $x(0) := x$. The
converse implication is also straightforward.
Using (7.16), we now formulate the following two properties, which play a
significant role in our subsequent analysis.
$\widetilde{v} \colon [0, \varepsilon] \to X$ be the path taken from (7.20). Defining the curve $\xi \colon [0, \varepsilon] \to X \times \mathbb{R}$ by
$$\xi(t) := \Big(\bar{x} + t\widetilde{v}(t),\; f\big(\bar{x} + t\widetilde{v}(t)\big) + t\big(\nu - q(v)\big)\Big), \quad t \in [0, \varepsilon],$$
it is easy to check that $\xi(t) \in \operatorname{epi}(f)$ for all $t \in [0, \varepsilon]$ with $\xi(0) = (\bar{x}, f(\bar{x}))$ and $\xi'_+(0) = (v, \nu)$. We conclude that $f$ is epi-differentiable at $\bar{x}$.
Now we are in a position to derive the following equality-type sum rule for
subderivatives of locally Lipschitzian functions.
Now we apply the subderivative chain rule of Theorem 7.12 to the com-
position in ϕ ◦ g from (7.22). This shows by ϕ ◦ g = f1 + f2 and (7.23) that
$$d(f_1 + f_2)(\bar{x}; v) = d(\varphi \circ g)(\bar{x}; v) = d\varphi\big((\bar{x}, \bar{x}); \nabla g(\bar{x})v\big) = d\varphi\big((\bar{x}, \bar{x}); (v, v)\big) = df_1(\bar{x}; v) + df_2(\bar{x}; v)$$
for all v ∈ X, and so we get (7.21). Finally, the epi-differentiability of the sum
f1 + f2 at x follows from the epi-differentiability at (x, x) of the function ϕ
in (7.22) proved above, which yields therefore the epi-differentiability of the
composition ϕ ◦ g at x. The latter can be verified by employing Lemma 7.16
similar to the case of ϕ in (7.22); see Exercise 7.122.
This subsection deals with deriving important calculus rules for Clarke’s gen-
eralized directional derivatives (7.1) of locally Lipschitzian functions, which
are generally different from the subderivative ones obtained in the preceding
subsection. First we observe the obvious albeit useful relationship
$$(\alpha f)^\circ(\bar{x}; v) = \alpha f^\circ(\bar{x}; v) \quad \text{for all } \alpha \ge 0 \text{ and } v \in X, \qquad (7.24)$$
which is similar to the subderivative case in (7.13).
The next theorem provides sum rules for the generalized derivatives. Note
that its main statement establishes the inequality-type sum rule of the most
useful form with no additional assumptions. This is different from the case of
subderivatives, where the opposite inequality holds for free while it does not
bring any calculus information. On the other hand, Theorem 7.17 provides
an equality-type sum rule for (7.2) under epi-differentiability of the functions
in question, which is weaker than their directional regularity ensuring the
equality sum rule for (7.1) in the following result.
where equality holds provided that all the functions fi , i = 1, . . . , m, are direc-
tionally regular at x. In this case the summation function f1 + . . . + fm is also
directionally regular at this point.
Proof. It suffices to verify the claimed statements for the case where m = 2
while observing that the general case easily follows by induction. The local
Lipschitz continuity of the sum in question is obvious.
To justify estimate (7.25) for m = 2, we fix any direction v ∈ X and get
by using Definition 7.1 the relationships
The next result provides a chain rule for generalized directional derivatives
of compositions involving locally Lipschitzian outer functions and smooth
inner mappings between Banach spaces. The completeness of the spaces in
question is needed for the application of the classical open mapping theorem
for linear continuous operators discussed in Subsection 1.3.3.
Proof. The local Lipschitz continuity of ϕ ◦ g around x easily follows from the
definitions. To verify (7.27), observe by the surjectivity of the linear continuous
operator ∇g(x) that the open mapping theorem ensures that the image set
g(B(x; η)) is a neighborhood of y for any η > 0. It follows therefore that for
each fixed $v \in X$, we get the equalities
$$\varphi^\circ\big(\bar{y}; \nabla g(\bar{x})v\big) = \limsup_{\substack{y \to \bar{y} \\ t \downarrow 0}} \frac{\varphi\big(y + t\nabla g(\bar{x})v\big) - \varphi(y)}{t} = \limsup_{\substack{x \to \bar{x} \\ t \downarrow 0}} \frac{\varphi\big(g(x) + t\nabla g(\bar{x})v\big) - \varphi\big(g(x)\big)}{t}.$$
Assume that the functions fi are C 1 -smooth around a given point x for all
i = 1, . . . , m. Then the maximum function (7.28) is locally Lipschitzian around
x, and we have the representation
$$f^\circ_{\max}(\bar{x}; v) = \max_{i \in I(\bar{x})} \langle \nabla f_i(\bar{x}), v \rangle, \quad v \in X, \qquad (7.29)$$
Combining the latter with (7.29) ensures the directional regularity of the
maximal function (7.28) under the imposed assumptions.
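Formula (7.29) can be sanity-checked on the simplest nonsmooth maximum (our own example): $f_1(x) = x$ and $f_2(x) = -x$ give $f_{\max} = |x|$ with both indices active at $\bar{x} = 0$, so (7.29) yields $\max\{v, -v\} = |v|$.

```python
# Checking (7.29) for fmax(x) = max{x, -x} = |x| at xbar = 0, where both
# functions are active and their gradients are 1 and -1.

grads = [1.0, -1.0]                  # gradients of f1, f2 at xbar = 0

def fmax_circ(v):
    # formula (7.29): max over the active gradients paired with v
    return max(g * v for g in grads)

print(fmax_circ(2.0), fmax_circ(-3.0))   # 2.0 3.0, i.e. |v| in each case
```

This matches the direct computation $f^\circ_{\max}(0; v) = |v|$ for the absolute value, consistent with the directional regularity claimed above.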
It is obvious from the duality scheme in (7.33) and (7.34) that both sub-
gradient sets are convex and weak∗ closed in X ∗ , and that both these sets
reduce to the subdifferential of convex analysis for locally Lipschitzian convex
functions. However, the crucial difference between them in the general Lips-
chitzian setting is that ∂f (x) is defined via the generalized directional deriva-
tive f ◦ (x; v), which is convex in the direction variable v, while the function
$v \mapsto df(\bar{x}; v)$ is commonly nonconvex. The unconditional convexity of $f^\circ(\bar{x}; \cdot)$
enables us to study the generalized gradient (7.33) by employing powerful
tools of convex analysis.
It is easy to check by the definitions that we always have
∂ − f (x) ⊂ ∂f (x) for all x ∈ X, (7.35)
where equality holds for a fixed x if and only if f is directionally regular
at x. As seen below, important properties and calculus rules for ∂f in the
absence of regularity are significantly better than those for ∂ − f . On the other
hand, the contingent subdifferential may be much smaller than the generalized
gradient even for differentiable Lipschitzian functions on the real line as in
Example 7.23. Furthermore, the set ∂ − f (x) may be empty (which is not the
case of ∂f (x)) for simple nonconvex functions as in Examples 7.22 and 7.25.
Before deriving general results on the above subdifferential constructions,
let us calculate the generalized gradient and contingent subdifferential of the
Lipschitzian functions on the real line taken from Examples 7.6 and 7.7. The
obtained calculations illustrate important features of these notions.
Since $f^\circ(0; v) = |v|$ for $v \in \mathbb{R}$, we have $\partial f(0) = [-1, 1]$. On the other hand, it is obvious that $\partial^- f(0) = \{f'(0)\} = \{0\}$.
Proceeding further with a parallel study of the generalized gradient and the
contingent subdifferential from Definition 7.21, we first clarify some structural
properties of these sets. The next result describes the situation for ∂f (x).
Theorem 7.24 Let $X$ be a normed space, and let $f \colon X \to \mathbb{R}$ be a locally Lipschitzian function around some point $\bar{x} \in X$. Suppose that $\ell \ge 0$ is a Lipschitz constant of $f$ around $\bar{x}$. Then the set $\partial f(\bar{x})$ is nonempty, convex, weak* compact, and bounded in $X^*$ with the bound estimate
$$\|x^*\| := \sup\big\{\langle x^*, v \rangle \;\big|\; v \in X, \; \|v\| \le 1\big\} \le \ell \quad \text{for all } x^* \in \partial f(\bar{x}). \qquad (7.36)$$
It follows from Theorem 7.8 that the linear functional x∗ belongs to X ∗ , and
thus we get x∗ ∈ ∂f (x) by the subdifferential definition.
Remembering that the assumed Lipschitz continuity of f around x yields
this property for each x near x, we finally deduce (7.37) from the upper
semicontinuity of f ◦ in Proposition 7.9. Indeed, suppose that f is locally
Lipschitzian on a neighborhood $W$ of $\bar{x} \in X$. Take any nets $\{x_\nu\}$ in $X$ and $\{x^*_\nu\}$ in $X^*$ satisfying the conditions
$$x_\nu \to \bar{x}, \quad x^*_\nu \xrightarrow{\,w^*\,} x^* \quad \text{with } x^*_\nu \in \partial f(x_\nu).$$
Then we have the inequality
$$\langle x^*_\nu, v \rangle \le f^\circ(x_\nu; v) \quad \text{for all } v \in X \text{ and all indices } \nu.$$
Fix finally any $v \in X$ and take an arbitrary $\varepsilon > 0$. By the aforementioned upper semicontinuity of $f^\circ$, find a neighborhood $V \subset W$ of $\bar{x}$ for which
$$f^\circ(z; v) < f^\circ(\bar{x}; v) + \varepsilon \quad \text{whenever } z \in V.$$
This allows us to assume without loss of generality that
$$\langle x^*_\nu, v \rangle \le f^\circ(x_\nu; v) < f^\circ(\bar{x}; v) + \varepsilon \quad \text{for all } \nu.$$
Passing to the limit as $x^*_\nu \xrightarrow{\,w^*\,} x^*$ and then as $\varepsilon \downarrow 0$ gives us $\langle x^*, v \rangle \le f^\circ(\bar{x}; v)$. Since $v$ is arbitrary, we get $x^* \in \partial f(\bar{x})$ and thus complete the proof.
As mentioned above, the contingent subdifferential ∂ − f (x) shares with
(7.33) the convexity property. Also it follows from definition (7.34) and Propo-
sition 7.8 for the subderivative df (x; v) that
$$\|x^*\| \le \ell \quad \text{for any } x^* \in \partial^- f(\bar{x}) \qquad (7.38)$$
via the Lipschitz constant of f around x. However, the major assertions on
the nonemptiness and robustness (7.37) of ∂f obtained in Theorem 7.24 fail
for (7.34) as illustrated next by a simple while instructive example.
Furthermore, it is obvious that ∂f1 (x) = ∂ − f1 (x) = {1} if x > 0 and ∂f2 (x) =
∂ − f2 (x) = {−1} if x < 0. In particular, this example shows that for f := f2
the nonemptiness and robustness properties of ∂f in Theorem 7.24 fail to hold
for the contingent subdifferential ∂ − f at x = 0.
Similar results for the generalized gradient (7.33) of smooth and convex
functions follow from the corresponding properties of the generalized direc-
tional derivative (7.1) obtained in Subsection 7.1.1. More delicate relationships
between the generalized gradient and various kinds of (strict) differentiability
are discussed below in Section 7.5.
Proof. It suffices to verify (a) for $x = \bar{x}$. The latter readily follows from definition (7.33) and Proposition 7.5 telling us that for $C^1$-smooth functions, the generalized directional derivative (7.1) agrees with the classical one $f'(\bar{x}; v)$, which in turn reduces in this case to $\langle \nabla f(\bar{x}), v \rangle$. Assertion (b) is a direct consequence of (7.33) and Proposition 7.4.
We conclude this subsection by presenting some elementary properties of
the contingent subdifferential that are not shared by the generalized gradient.
Proposition 7.29 Let X be a normed space, and let f1 , f2 : X → R be locally
Lipschitzian functions around a given point x. The following assertions hold:
(a) We always have the inclusion
∂ − (f1 + f2 )(x) ⊃ ∂ − f1 (x) + ∂ − f2 (x). (7.40)
(b) If f1 (x) = f2 (x) and f1 (x) ≤ f2 (x) for all x around x, then
∂ − f1 (x) ⊂ ∂ − f2 (x).
Note that both assertions in Proposition 7.29 fail for the generalized gra-
dients of f1 and f2 . Indeed, for (a) choose f1 (x) := |x| and f2 (x) := −|x|,
while for (b) choose f1 (x) := −|x| and f2 (x) := 0 on X := R.
7.3.2 Calculus Rules for Generalized Gradients
Proof. Fix any v ∈ X and observe first that the maximum in (7.42) is achieved
due to the nonemptiness and weak∗ compactness of ∂f (x) by Theorem 7.24.
Then it follows from definition (7.33) that the inequality “≥” holds in (7.42).
To verify the opposite inequality, suppose on the contrary that
$$f^\circ(\bar{x}; v) > \max\big\{\langle x^*, v \rangle \;\big|\; x^* \in \partial f(\bar{x})\big\}. \qquad (7.43)$$
Define $Y := \operatorname{span}\{v\}$ and consider the linear function $y^* \colon Y \to \mathbb{R}$ given by $\langle y^*, tv \rangle := t f^\circ(\bar{x}; v)$. Employing the Hahn-Banach theorem as in the proof of Theorem 7.24 ensures the existence of a linear continuous functional $x^* \in X^*$ such that we have the relationships
$$\langle x^*, w \rangle \le f^\circ(\bar{x}; w) \text{ for all } w \in X \quad \text{and} \quad \langle x^*, v \rangle = f^\circ(\bar{x}; v),$$
which yield $x^* \in \partial f(\bar{x})$. The latter implies together with (7.43) that
$$f^\circ(\bar{x}; v) > \langle x^*, v \rangle = f^\circ(\bar{x}; v),$$
a contradiction verifying (7.42). Note that, by the convexity of $\partial f(\bar{x})$, representation (7.42) can also be obtained by using convex separation.
The next chain rule for generalized gradients in the above composition
framework of ϕ ◦ g does not impose the rather restrictive surjectivity assump-
tion on ∇g(x) as in Theorem 7.33 while it generally provides only the inclusion
“⊂” in (7.45), which becomes an equality if ϕ is directionally regular at y.
Note that in contrast to Theorem 7.33, we do not assume now that the spaces
X and Y in question are Banach spaces.
and a similar one for ∂ y f (x, y), where equalities hold in both inclusions if f
is directionally regular at (x, y). Furthermore, in the latter case we have
∂f (x, y) ⊂ ∂ x f (x, y) × ∂ y f (x, y). (7.50)
7.3 Directionally Generated Subdifferentials 475
Proof. To verify (7.49) and the equality therein under the directional reg-
ularity assumption, we apply Theorem 7.34 to the composition ϕ ◦ g with
g : X → X × Y and ϕ : X × Y → R defined by
g(x) := (x, y) and ϕ(x, y) := f (x, y).
To get (7.50), we use full duality between the generalized gradient and direc-
tional derivative from Theorem 7.31 and the fact that the latter reduces to
the classical directional derivative for directionally regular functions.
The next theorem provides calculus rules for evaluating generalized gradi-
ents of pointwise maxima of finitely many Lipschitzian functions. Recall that
such maximum functions are intrinsically nonsmooth even when those under
maximization are linear as for |x| = max{x, −x}.
Theorem 7.36 Let fi : X → R, i = 1, . . . , m, be locally Lipschitzian around
x on a normed space X. Consider the maximum function fmax from (7.28)
and the active index set I(x) defined in Theorem 7.20. Then we have
∂f_max(x) ⊂ co ⋃_{i∈I(x)} ∂f_i(x). (7.51)
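To make the inclusion (7.51) concrete, the following Python sketch numerically tests its dual form for the intrinsically nonsmooth example |x| = max{x, −x} at the origin. The grid-based approximation of the generalized directional derivative, the sample directions, and the tolerances are illustrative assumptions, not part of the theory.

```python
def f_max(x):
    # pointwise maximum of f1(x) = x and f2(x) = -x, i.e. f_max(x) = |x|
    return max(x, -x)

def clarke_dd(f, x, v, eps=1e-6, grid=20):
    # crude grid approximation of the generalized directional derivative
    # f°(x; v) = limsup_{y -> x, t -> 0+} [f(y + t*v) - f(y)] / t
    best = -float("inf")
    for i in range(-grid, grid + 1):
        y = x + i * eps / grid          # base points y near x
        for k in range(1, grid + 1):
            t = eps * k / grid          # small positive steps t
            best = max(best, (f(y + t * v) - f(y)) / t)
    return best

# inclusion (7.51): both indices are active at 0, so the generalized
# gradient of f_max at 0 lies in co{-1, 1} = [-1, 1]; dually, the value
# f_max°(0; v) must not exceed max{s * v : s in [-1, 1]} = |v|
for v in (1.0, -1.0, 0.5):
    assert clarke_dd(f_max, 0.0, v) <= abs(v) + 1e-9
```

Here the inclusion holds with equality, since both pieces x and −x are smooth and hence directionally regular.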
Using now Proposition 7.30 and the classical formula of convex analysis for
the subdifferentiation of maximum functions (see Theorem 3.59) gives us
x* ∈ ∂(max_{i∈I(x)} ψ_i)(0) = co ⋃_{i∈I(x)} ∂ψ_i(0) = co ⋃_{i∈I(x)} ∂f_i(x),
which justifies the generalized gradient inclusion (7.51). To verify finally the
equality in (7.51) under the imposed directional regularity of fi , observe that
we have the opposite inequality
df_max(x; v) ≥ max_{i∈I(x)} df_i(x; v) for all v ∈ X,
which holds by (7.2) without any regularity assumptions. This implies by (7.34) that
∂⁻f_max(x) ⊃ co ⋃_{i∈I(x)} ∂⁻f_i(x) (7.53)
for the contingent subdifferential. Combining (7.51) and (7.53) with the
assumed directional regularity of fi for i ∈ I(x) justifies the equality in (7.51)
and the directional regularity of the maximum function fmax at x.
Before presenting the next result, let us discuss its essence.
Remark 7.37 The above calculus rules for the generalized gradients (7.33)
belong to the major part of full calculus that holds not only for (7.33) but
also for some other subdifferential constructions for extended-real-valued l.s.c.
functions. However, the following classical plus-minus symmetry
∂(−f )(x) = −∂f (x) (7.54)
strongly distinguishes the generalized gradient of locally Lipschitzian functions
from any other known subdifferentials in nonsmooth analysis, including those
that do not possess adequate calculus rules. This is properly reflected by
the very name “generalized gradient” versus subdifferential. In particular, the
symmetry property does not hold for the subdifferential of convex analysis
(simply because the negative convex function is not convex), for the contingent
subdifferential, and for Rockafellar’s extension of Clarke’s constructions to
l.s.c. functions among other subdifferentials discussed in Section 7.10.
The plus-minus symmetry (7.54) excludes the generalized gradient (7.33)
from the realm of unilateral, one-sided analysis the starting point of which is
convex analysis. The proof of (7.54) given below is based on definition (7.1) of
the generalized directional derivative for locally Lipschitzian functions that is
the driving force of full calculus and other important properties of generalized
gradients. However, there is a price to pay for (7.54), including an undesirably
large size of ∂f(x) and especially the fact that the generalized gradient does
not distinguish between essentially different nonsmooth functions f and −f ,
between maxima and minima, between inequalities with “≤” and “≥” signs,
etc. This yields, in particular, the equivalence between the directional regular-
ity of f and −f at x in the above calculus rules. The mentioned drawbacks can
be avoided by implementing the other, “unconvexified” approach to general-
ized differentiation, suggested by the first author and then developed in many
publications, which leads us to even better full subdifferential calculus while
sacrificing the duality scheme (7.31) and hence the subdifferential convexity;
see Section 7.10 for more discussions and references.
Here is the scalar multiplication rule for the generalized gradient (7.33)
that includes the aforementioned symmetry property.
Proposition 7.38 For any function f : X → R defined on a normed space X
which is locally Lipschitzian around x, we have the multiplication rule
∂(αf )(x) = α∂f (x) whenever α ∈ R. (7.55)
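The symmetric case α = −1 of (7.55) can be checked numerically through the support-function duality: if ∂(−f)(x) = −∂f(x) and ∂f(x) is symmetric, then f and −f have the same generalized directional derivative. The sketch below does this for f = |·| at 0; the grid approximation and the tolerances are illustrative assumptions.

```python
def clarke_dd(f, x, v, eps=1e-6, grid=20):
    # grid approximation of f°(x; v) = limsup_{y -> x, t -> 0+} [f(y + t*v) - f(y)]/t
    best = -float("inf")
    for i in range(-grid, grid + 1):
        y = x + i * eps / grid
        for k in range(1, grid + 1):
            t = eps * k / grid
            best = max(best, (f(y + t * v) - f(y)) / t)
    return best

f = abs                               # f(x) = |x| with ∂f(0) = [-1, 1]
def neg_f(x):                         # -f, whose generalized gradient at 0 is
    return -abs(x)                    # -∂f(0) = [-1, 1] by the symmetry (7.55)

# the two generalized gradients coincide here, so the support functions
# (the generalized directional derivatives) of f and -f agree at 0
for v in (1.0, -0.7, 0.3):
    assert abs(clarke_dd(f, 0.0, v) - clarke_dd(neg_f, 0.0, v)) < 1e-6
```

Note that the contingent subdifferentials of f and −f at 0 are [−1, 1] and ∅, respectively, which illustrates the failure of (7.54) for one-sided constructions.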
In this subsection we obtain two major calculus rules of the equality type for
contingent subgradients of Lipschitzian functions on normed spaces that are
not shared by generalized gradients in the absence of directional regularity.
Proof. Proceed as in the proof of Corollary 7.35 with the usage of Theo-
rem 7.41 instead of Theorem 7.34.
Finally, we derive the following exact sum rule for contingent subgradients
and the contingent regularity statement, which follow from Theorem 7.41.
Here we derive the aforementioned versions of the mean value theorem for
Lipschitz continuous functions defined on normed spaces. Then this theorem
is applied to characterizing function convexity via the monotonicity property
of the generalized gradient and contingent subgradient mappings.
Theorem 7.44 Let f : X → R be a function defined on a normed space X,
and let a, b ∈ X. Assume that f is Lipschitz continuous on an open set con-
taining the line segment [a, b]. Then the following assertions hold:
7.4 Mean Value Theorems and More Calculus 481
Remark 7.45 The following observations shed light on the independent ver-
sions of the mean value theorem obtained above:
(a) Although assertion (a) of Theorem 7.44 does not require any additional
assumptions on f, the set involved in the inclusion (7.62) may be significantly
larger than the one in (7.63) as for the function f : R → R taken
from Example 7.7.
(b) The contingent regularity assumption of Theorem 7.44(b) is satisfied for
any Gâteaux differentiable function f : X → R, which may not be directionally
regular, and thus (b) does not reduce to (a) for such functions.
(c) It is not hard to deduce from the definitions that for a locally Lipschitzian
function f around x the sets ∂ − f (x) and ∂ − (−f )(x) are nonempty simul-
taneously if and only if f is Gâteaux differentiable at x.
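A numerical illustration of the mean value inclusion (7.62) may be helpful here. For f = |·| on [a, b] = [−1, 2], the slope (f(b) − f(a))/(b − a) = 1/3 must belong to ∂f(c) for some intermediate point c; the grid scan, step size, and tolerances below are illustrative assumptions.

```python
def clarke_interval(f, c, h=1e-7):
    # for a piecewise-smooth f on R, ∂f(c) is the interval between the
    # one-sided derivatives, approximated here by difference quotients
    right = (f(c + h) - f(c)) / h
    left = (f(c) - f(c - h)) / h
    return min(left, right), max(left, right)

f, a, b = abs, -1.0, 2.0
slope = (f(b) - f(a)) / (b - a)          # = (2 - 1) / 3 = 1/3

# scan for points c in (a, b) with slope ∈ ∂f(c), as predicted by (7.62)
found = []
for i in range(1, 3000):
    c = a + (b - a) * i / 3000.0
    lo, hi = clarke_interval(f, c)
    if lo - 1e-6 <= slope <= hi + 1e-6:
        found.append(c)

assert found                              # a mean value point exists ...
assert any(abs(c) < 1e-3 for c in found)  # ... here it is the kink c = 0
```

No smooth or one-sided mean value statement can produce such a point for this f, since its classical derivative only takes the values ±1.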
Using the obtained versions of the mean value theorem, we provide now the
following characterizations of function convexity via the monotonicity prop-
erty (5.31) for the subgradient mappings under consideration.
such that f(x) − f(u) = ⟨u*, x − u⟩. Note that the locally Lipschitzian property
of f guarantees its Lipschitz continuity on an open set containing [u, x] by
the compactness of this line segment. Then the assumed monotonicity of ∂f
ensures right away that the inclusion “⊂” in (7.66) is satisfied.
Having representation (7.66), let us show that f is convex on U . Taking
arbitrary vectors u, x ∈ U , form their convex combination w := λu + (1 − λ)x
for some λ ∈ [0, 1]. Choose w* ∈ ∂f(w) and get by (7.66) that
⟨w*, x − w⟩ ≤ f(x) − f(w),
⟨w*, u − w⟩ ≤ f(u) − f(w).
Multiplying the first inequality by λ, multiplying the second inequality by
1 − λ, and adding the resulting inequalities give
0 ≤ λf (x) + (1 − λ)f (u) − f (w),
which justifies the convexity of f .
To proceed further with case (b) of ∂₋f from (7.65), observe that the
above proof in case (a) ensures the convexity of f if the mapping x ↦ ∂₋f(x)
is monotone on U . Indeed, we just need to repeat the previous arguments
with the usage of Theorem 7.44(b) instead of assertion (a) therein, which
thus completes the proof of this theorem.
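As a quick numerical sanity check of the monotonicity characterization just proved, the sketch below samples the gradient mapping of the smooth convex function f(x) = x⁴; the choice of f, the sampling range, and the random seed are illustrative assumptions.

```python
import random

random.seed(0)

def grad(x):
    # gradient of the smooth convex function f(x) = x**4 on R
    return 4.0 * x ** 3

# convexity corresponds to the monotonicity (5.31) of the subgradient
# mapping: <x* - u*, x - u> >= 0 for x* in ∂f(x) and u* in ∂f(u)
for _ in range(1000):
    x, u = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    assert (grad(x) - grad(u)) * (x - u) >= -1e-12
```

For a nonconvex function such as f(x) = −x², the same product equals −2(x − u)², which is strictly negative whenever x ≠ u, so monotonicity fails as the theorem predicts.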
We conjecture that ∂₋f can be replaced by ∂⁻f in Theorem 7.46(b) provided
that X is a Banach space admitting a Gâteaux smooth renorming, in
particular, if X is any separable Banach space. The reader is referred to Exer-
cise 7.136(b) with the hint therein for more details. In Section 7.10 we also
discuss a general version of Theorem 7.46 for the class of extended-real-valued
l.s.c. functions by using the approximate mean value theorem instead of its
Lipschitzian counterpart in Theorem 7.44.
In this subsection we obtain yet another major chain rule for generalized gra-
dients, which—in contrast to the previous chain rules for generalized gradients
and contingent subgradients—addresses compositions where inner mappings
are Lipschitzian while not necessarily differentiable. The proof is based on
the mean value Theorem 7.44(a) and the robustness of generalized gradients,
which fails for contingent subgradients even under contingent regularity as for
the function f : R → R from Example 7.7. The established chain rule allows
us to derive product and quotient rules for generalized gradients by reducing
these operations to compositions with nonsmooth inner mappings.
Let us first present the following monotonicity relation for support func-
tions of closed convex sets that is of its own interest.
Lemma 7.47 Let Ω1 , Ω2 ⊂ X be two nonempty, closed, and convex subsets
of an LCTV space X, and let σΩ1 and σΩ2 be their support functions (4.8).
Then the support function relationship
Proof. We only need to verify that (7.67) yields Ω1 ⊂ Ω2 since the opposite
implication is obvious. Suppose on the contrary that Ω1 is not a subset of Ω2 ,
i.e., there exists x ∈ Ω₁ with x ∉ Ω₂. Applying the strict separation theorem
to the sets Ω₂ and {x} and using the support function definition give us a
linear functional x* ∈ X* such that
σ_{Ω₂}(x*) = sup{⟨x*, u⟩ | u ∈ Ω₂} < ⟨x*, x⟩ ≤ σ_{Ω₁}(x*).
This contradicts (7.67) and thus completes the proof of the lemma.
Now we are ready to derive the aforementioned chain rule of the inclusion
type for generalized gradients of Lipschitzian compositions.
and apply to it the mean value theorem from (7.62) on the interval [xₖ, xₖ + tₖv].
This gives us uₖ ∈ (xₖ, xₖ + tₖv) and xₖ* ∈ ∂⟨y*, g⟩(uₖ) such that
⟨y*, (g(xₖ + tₖv) − g(xₖ))/tₖ⟩ = ⟨xₖ*, v⟩ for every k ∈ N,
where {uₖ} converges to x as k → ∞. Since the numerical sequence {⟨xₖ*, v⟩}
is bounded, we suppose without loss of generality that there exists α ∈ R such
that lim_{k→∞} ⟨xₖ*, v⟩ = α. By the boundedness of the sequence {xₖ*} ⊂ X*
due to the boundedness of generalized gradients, it follows from the Alaoglu-
Bourbaki theorem that there exists a subnet of {x∗k } that converges to some
x∗ ∈ X ∗ in the weak∗ topology of X ∗ . This allows us to conclude that
lim_{k→∞} ⟨xₖ*, v⟩ = α = ⟨x*, v⟩,
The final result of this subsection is the following quotient rule for gener-
alized gradients of Lipschitzian functions.
Corollary 7.50 Let f₂(x) ≠ 0 in the setting of Corollary 7.49. Then the
quotient f₁/f₂ is locally Lipschitzian around x, and we have the inclusion
∂(f₁/f₂)(x) ⊂ [f₂(x)∂f₁(x) − f₁(x)∂f₂(x)] / f₂²(x).
Proof. Proceed as in the proof of Corollary 7.49 with ϕ(y1 , y2 ) := y1 /y2 .
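The quotient-rule inclusion of Corollary 7.50 can be tested numerically through support functions. The sketch below takes f₁ = |·| and the smooth nonvanishing f₂(x) = 1 + x² at x = 0; the grid approximation of the generalized directional derivative and all tolerances are illustrative assumptions.

```python
def clarke_dd(f, x, v, eps=1e-6, grid=20):
    # grid approximation of the generalized directional derivative f°(x; v)
    best = -float("inf")
    for i in range(-grid, grid + 1):
        y = x + i * eps / grid
        for k in range(1, grid + 1):
            t = eps * k / grid
            best = max(best, (f(y + t * v) - f(y)) / t)
    return best

f1 = abs                          # ∂f1(0) = [-1, 1] and f1(0) = 0
def f2(x):                        # smooth and nonvanishing near 0,
    return 1.0 + x * x            # with f2(0) = 1 and ∂f2(0) = {0}
def q(x):
    return f1(x) / f2(x)

# quotient-rule bound: ∂q(0) ⊂ [f2(0)∂f1(0) - f1(0)∂f2(0)] / f2(0)**2 = [-1, 1],
# hence q°(0; v) <= max{s * v : s in [-1, 1]} = |v|
for v in (1.0, -1.0, 0.4):
    assert clarke_dd(q, 0.0, v) <= abs(v) + 1e-6
```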
Remark 7.52 It follows from the definitions (see also Exercise 7.139) that
Proof. It is instructive to show first that the claimed local Lipschitz continuity
of f easily follows from its strict Fréchet differentiability. Indeed, supposing
the latter and denoting x* := f′_F(x) allow us to find δ > 0 such that
|f(x) − f(u) − ⟨x*, x − u⟩| ≤ ‖x − u‖ whenever x, u ∈ B(x; δ).
With ℓ := ‖x*‖ + 1, we immediately get the local Lipschitz property
|f(x) − f(u)| ≤ ℓ‖x − u‖ for all x, u ∈ B(x; δ).
Consider further the general case where f is strictly Hadamard differentiable
at x with x* := f′_H(x). Suppose on the contrary that f is not
locally Lipschitzian around x and then for any k ∈ N find sequences of points
xₖ, uₖ ∈ B(x; 1/k) such that
|f(xₖ) − f(uₖ)| = |f(uₖ + tₖ(xₖ − uₖ)/tₖ) − f(uₖ)| > k‖xₖ − uₖ‖. (7.70)
tk
Choosing now tₖ := √k ‖xₖ − uₖ‖ gives us 0 < tₖ ≤ 2/√k → 0 as k → ∞.
Furthermore, with vₖ := (xₖ − uₖ)/tₖ we get from (7.70) that
√k < (f(uₖ + tₖvₖ) − f(uₖ))/tₖ, k ∈ N.
Observe that ‖vₖ‖ = 1/√k → 0 as k → ∞ and form the compact set
C := {vₖ | k ∈ N} ∪ {0}.
It follows from the definition of strict Hadamard differentiability that
7.5 Strict Differentiability and Generalized Gradients 489
(f(uₖ + tₖvₖ) − f(uₖ))/tₖ − ⟨x*, vₖ⟩ → 0 as k → ∞.
This readily leads us to a contradiction due to the estimate
√k − ‖x*‖·‖vₖ‖ ≤ |(f(uₖ + tₖvₖ) − f(uₖ))/tₖ − ⟨x*, vₖ⟩|
and the fact that √k − ‖x*‖·‖vₖ‖ → ∞ as k → ∞.
The next theorem establishes the relationship between strict Gâteaux dif-
ferentiability and strict Hadamard differentiability of real-valued functions.
Theorem 7.54 Let X be a normed space. Given f : X → R and x ∈ X, the
following are equivalent:
(a) f is strictly Hadamard differentiable at x.
(b) f is locally Lipschitz continuous around x and strictly Gâteaux differen-
tiable at this point.
Proof. Implication (a)=⇒(b) follows from Remark 7.52 and Theorem 7.53.
To verify (b)=⇒(a), suppose on the contrary that f is not strictly Hadamard
differentiable at x. Then there exist a compact set C ⊂ X and a number
ε0 > 0 along with sequences vk ∈ C, tk ↓ 0, and xk → x as k → ∞ such that
ε₀ < |(f(xₖ + tₖvₖ) − f(xₖ))/tₖ − ⟨x*, vₖ⟩| for all k ∈ N.
By the compactness of C, assume without loss of generality that the sequence
{vk } converges to some v ∈ C as k → ∞. Then for k sufficiently large we have
|(f(xₖ + tₖv) − f(xₖ))/tₖ − ⟨x*, v⟩|
= |(f(xₖ + tₖvₖ) − f(xₖ))/tₖ − ⟨x*, vₖ⟩ + (f(xₖ + tₖv) − f(xₖ + tₖvₖ))/tₖ − (⟨x*, v⟩ − ⟨x*, vₖ⟩)|
≥ |(f(xₖ + tₖvₖ) − f(xₖ))/tₖ − ⟨x*, vₖ⟩| − (ℓ + ‖x*‖)‖vₖ − v‖
≥ ε₀ − (ℓ + ‖x*‖)‖vₖ − v‖,
where ℓ is a Lipschitz constant of f around x. This leads us to a contradiction
and thus completes the proof of the theorem due to the imposed strict Gâteaux
differentiability of f at x and the fact that ‖vₖ − v‖ → 0 as k → ∞.
We conclude this subsection with two useful observations and leave
their verifications as exercises for the reader.
Remark 7.55 Observe the following:
(a) In finite-dimensional spaces, the strict Fréchet differentiability of f at x
is equivalent to the strict Hadamard differentiability of f at this point.
(b) For locally Lipschitzian functions, a natural question is whether the strict
Hadamard differentiability of f at x implies the strict Fréchet differen-
tiability of f at this point. The answer is negative: consider the norm
function f(x) := ‖x‖ in the space of sequences X := ℓ¹ at x = 0.
The next result is a direct consequence of Theorem 7.56 for convex func-
tions. Note that instead of dealing with f : X → R, we may consider extended-
real-valued functions f : X → R while assuming that x ∈ int(dom(f )). This
in fact does not make any difference.
Here we give a simple proof of the classical theorem on the a.e. differentia-
bility of Lipschitzian functions in finite dimensions, which goes back to Hans
Rademacher [298]. The theorem is formulated for vector functions f : U → Rm
on open subsets of Rn , but it obviously suffices to verify it for real-valued func-
tions, which is actually needed in the next subsection.
Theorem 7.60 Let f : U → Rm be a Lipschitz continuous mapping defined
on a nonempty open subset U of Rn . Then f is differentiable a.e. on U .
Proof. We can assume without loss of generality that m = 1 and split the
proof into the verification of the following four claims:
Claim 1: The result holds for n = 1.
This follows from the classical result on the a.e. differentiability of absolutely
continuous (and hence of Lipschitz continuous) functions on the real line.
Claim 2: For any vector v ∈ S from the unit sphere S ⊂ Rn , the classical
directional derivative f′(x; v) exists at almost every point x ∈ U.
To verify this claim, denote by S_v the set of all the points x ∈ U where f′(x; v)
exists. Then the one-dimensional Lebesgue measure of the intersection of the
set U \ Sv with any line that is parallel to v equals zero by Claim 1. It is
easy to see that the set Sv is measurable. Then the classical Fubini theorem
implies that μ(U \ Sv ) = 0 for the Lebesgue measure μ(Ω) of a set Ω ⊂ Rn .
Claim 3: Denote f_{xᵢ} := ∂f/∂xᵢ for i = 1, …, n. Given v = (v₁, …, vₙ) ∈ S,
consider the set U_v ⊂ U on which f′(x; v) and f_{x₁}(x), …, f_{xₙ}(x) exist with
f′(x; v) = v₁f_{x₁}(x) + … + vₙf_{xₙ}(x). (7.71)
Then we have μ(U \ Uv ) = 0.
To prove this claim, recall from Claims 1 and 2 that all the derivatives f′(x; v)
and fxi (x), i = 1, . . . , n, exist for a.e. x ∈ U . Thus it remains to verify that
equality (7.71) holds almost everywhere on U . To this end, take any real-
valued function g ∈ C∞(U) with compact support. Using Lebesgue’s dominated
convergence theorem and then integration by parts gives us
7.6 Generalized Gradients in Finite Dimensions 495
∫_U f′(x; v)g(x) dx = lim_{t↓0} ∫_U [(f(x + tv) − f(x))/t] g(x) dx
= lim_{t↓0} ∫_U f(x) [(g(x − tv) − g(x))/t] dx
= −∫_U f(x) Σ_{i=1}^n vᵢ (∂g/∂xᵢ)(x) dx = Σ_{i=1}^n vᵢ ∫_U f_{xᵢ}(x)g(x) dx.
Since the latter holds for an arbitrary function g of the above class, we con-
clude that equality (7.71) is satisfied for a.e. x ∈ U .
Claim 4: Let Q ⊂ S be an arbitrary set that is dense on the unit sphere
S ⊂ Rⁿ, and let Ω := ∩_{v∈Q} U_v ⊂ U, where U_v is defined in Claim 3. Then the
function f is differentiable at any point x ∈ Ω.
To verify this statement, fix x ∈ Ω and then for any v ∈ S and t > 0 define
r(v, t) := [f(x + tv) − f(x) − t Σ_{i=1}^n vᵢf_{xᵢ}(x)] / t.
The Lipschitz continuity of f allows us to find a constant ℓ > 0 for which
|r(w, t) − r(v, t)| ≤ ℓ‖w − v‖ whenever v, w ∈ S.
It follows from Claim 3 that for every finite subset W ⊂ S and every ε > 0
there is a number t̄ = t̄(W, ε) > 0 with
|r(w, t)| < ε/2 for all w ∈ W whenever 0 < t < t̄(W, ε).
2
The density of Q on the (compact) unit sphere S ensures the existence of a
finite subset W ⊂ Q with the distance estimate d(v; W) < ε/(2ℓ) for all v ∈ S.
Thus for any such v, there exists w ∈ W satisfying ‖v − w‖ < ε/(2ℓ) and
|r(v, t)| ≤ |r(w, t)| + |r(v, t) − r(w, t)| < ε if 0 < t < t̄(W, ε).
The latter inequality tells us that for any ε > 0, we have
|f(x + v) − f(x) − Σ_{i=1}^n vᵢf_{xᵢ}(x)| / ‖v‖ < ε
provided that ‖v‖ is sufficiently small. This verifies the claimed differentiability
of f at x and thus completes the proof of the theorem.
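Rademacher's theorem can be illustrated numerically. The sketch below uses the 1-Lipschitz distance-to-the-nearest-integer function on R, whose one-sided difference quotients disagree only at the half-integers, a Lebesgue-null set; the choice of test points, the step size, and the tolerances are illustrative assumptions.

```python
import math

def f(x):
    # distance from x to the nearest integer: a 1-Lipschitz function on R,
    # nondifferentiable exactly at the half-integers (a Lebesgue-null set)
    return min(x - math.floor(x), math.ceil(x) - x)

def one_sided(x, h=1e-9):
    # left and right difference quotients of f at x
    right = (f(x + h) - f(x)) / h
    left = (f(x) - f(x - h)) / h
    return left, right

# at the kink x = 0.5 the one-sided derivatives disagree ...
l, r = one_sided(0.5)
assert abs(l - r) > 1.5

# ... but at irrational shifts, which keep a safe distance from every
# half-integer, the two quotients agree, in line with a.e. differentiability
for k in range(1, 1000):
    x = (k * math.sqrt(2)) % 10.0
    l, r = one_sided(x)
    assert abs(l - r) < 1e-3
```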
Proof. Using again Theorem 7.60, we conclude that the set on the right-hand
side of (7.72) is nonempty. It follows from (7.35) and Proposition 7.27 that
∇f (xk ) ∈ ∂f (xk ) for each k ∈ N. Combining this with the uniform bound-
edness of ∂f (x) on U by the Lipschitz constant in (7.36) from Theorem 7.24
allows us to deduce from the classical Bolzano-Weierstrass theorem that the
sequence {∇f (xk )} contains a convergent subsequence. The limits of such
subsequences belong to ∂f (x) due to the robustness property (7.37) of the
generalized gradient verified in Theorem 7.24. Since the set ∂f (x) is convex,
we get the inclusion “⊃” in (7.72). Note that the set on the right-hand side of (7.72)
is compact, which follows from the Carathéodory theorem; see Exercise 6.114.
It remains to prove the inclusion “⊂” in (7.72). Taking into account the
convexity and compactness of both sets in (7.72) and the monotonicity prop-
erty of the support function established in Lemma 7.47, it suffices to show
that the support function of the set on the right-hand side of (7.72) is not less
than the one for ∂f (x). But the latter is exactly f ◦ (x; v) by the full duality
in Theorem 7.42. Thus we have to prove that the inequality
f◦(x̄; v) − ε ≤ lim sup{⟨∇f(x), v⟩ | x → x̄, x ∈ D, x ∉ O} (7.73)
holds for any v ≠ 0 and ε > 0. To proceed, denote by γ ∈ R the value on the
right-hand side of (7.73). Thus there exists η > 0 such that
⟨∇f(x), v⟩ ≤ γ + ε whenever ‖x − x̄‖ ≤ η with x ∈ D, x ∉ O,
where the set (U \ D) ∪ O is of measure zero in x̄ + ηB. This implies by using
the aforementioned Fubini theorem that for a.e. x ∈ x̄ + (η/2)B the intersection
of the interval Iₓ := {x + tv | 0 < t < η/(2‖v‖)} with the set (U \ D) ∪ O is of
one-dimensional measure zero. This ensures in turn that for any x ∈ x̄ + (η/2)B
having this property and for any t ∈ (0, η/(2‖v‖)), we get
f(x + tv) − f(x) = ∫₀ᵗ ⟨∇f(x + τv), v⟩ dτ.
The last subsection here concerns some issues related to the fundamental
theorem of calculus about differentiation of integrals with variable bounds of
integration that are known as antiderivatives. However, instead of the classical
setting of continuous functions under integration, we now consider those which
are taken from the space L∞ [a, b] of essentially bounded functions on a given
interval [a, b] ⊂ R. In such a case, the obtained antiderivative functions are
not differentiable but merely Lipschitz continuous, and this calls therefore for
their generalized differentiation. It is shown below that the above properties
of generalized gradients (which are not shared by contingent subgradients)
and the one-dimensional version of Rademacher’s theorem lead us to precise
calculations of generalized gradients of Lipschitzian antiderivatives. Recall
that a (Lebesgue) measurable function f : [a, b] → R is essentially bounded if
there exists a constant M ≥ 0 such that
|f (x)| ≤ M a.e. on [a, b]. (7.74)
For functions of this class, define the essential supremum of f on [a, b] by
ess sup{f(x) | x ∈ [a, b]} := inf_{E⊂[a,b], μ(E)=0} sup_{x∈[a,b]\E} f(x), (7.75)
where the notation μ(E) stands for the Lebesgue measure on the real line.
Similarly, we define the essential infimum of f on [a, b] by
ess inf{f(x) | x ∈ [a, b]} := sup_{E⊂[a,b], μ(E)=0} inf_{x∈[a,b]\E} f(x).
Then the classical Lebesgue space L∞ [a, b] is the collection of measurable and
essentially bounded functions f : [a, b] → R endowed with the norm
‖f‖_∞ := inf_{E⊂[a,b], μ(E)=0} sup_{x∈[a,b]\E} |f(x)|.
f⁺(x) := lim_{ε↓0} ess sup{f(u) | u ∈ [x − ε, x + ε] ∩ [a, b]}, (7.77)
f⁻(x) := lim_{ε↓0} ess inf{f(u) | u ∈ [x − ε, x + ε] ∩ [a, b]}.
Lemma 7.62 Let f ∈ L∞ [a, b], and let x ∈ [a, b]. The following hold:
(a) The function F defined in (7.76) is Lipschitz continuous on [a, b].
(b) Considering the set
Ω_F := {x ∈ [a, b] | F is not differentiable at x},
we have that F′(x) = f(x) for all x ∈ [a, b] \ Ω_F.
(c) There exists a sequence {xk } ⊂ [a, b] \ ΩF that converges to x with
f (xk ) → f + (x) as k → ∞, where f + (x) is defined in (7.77).
(d) There exists a sequence {xk } ⊂ [a, b] \ ΩF that converges to x with
f (xk ) → f − (x) as k → ∞, where f − (x) is also defined in (7.77).
ess sup_{u∈[x−1/k, x+1/k]} f(u) − 1/k < f(xₖ) < ess sup_{u∈[x−1/k, x+1/k]} f(u) + 1/k.
Passing to the limit as k → ∞ and using the definition of f⁺(x) in (7.77)
yield f(xₖ) → f⁺(x), which verifies (c).
The proof of (d) is similar to that of (c), and thus we are done.
Proof. For simplicity we confine ourselves to the case where x ∈ (a, b). It
follows from Lemma 7.62(b) and the previous observations that F′(x) =
f(x) ∈ ∂F(x) for all x ∈ [a, b] \ Ω_F. Then Lemma 7.62(c) gives us a sequence
{xk } ⊂ [a, b] \ ΩF that converges to x with f (xk ) → f + (x) as k → ∞. Since
f (xk ) ∈ ∂F (xk ), we have f + (x) ∈ ∂F (x), and similarly f − (x) ∈ ∂F (x). Then
the convexity of ∂F (x) ensures that [f − (x), f + (x)] ⊂ ∂F (x).
To verify the opposite inclusion in (7.78), fix a small number ε > 0 and
deduce from the definitions that, whenever t > 0 is sufficiently small, we get
F(x + t) − F(x) = ∫ₓ^{x+t} f(u) du ≤ ∫ₓ^{x+t} ess sup_{u∈[x−ε,x+ε]} f(u) du
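The two-sided inclusion just verified, ∂F(x) = [f⁻(x), f⁺(x)], can be seen numerically for the essentially bounded "sign" function, whose antiderivative is |·|. In the sketch below the midpoint-rule integrator and all tolerances are illustrative assumptions.

```python
def f(u):
    # an essentially bounded "sign" function on [-1, 1], viewed in L∞[-1, 1]
    return 1.0 if u > 0 else -1.0

def F(x, n=2000):
    # antiderivative F(x) = ∫_0^x f(u) du via a midpoint rule; here F(x) = |x|
    if x == 0.0:
        return 0.0
    h = x / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

# near x = 0 the essential bounds of f are f⁻(0) = -1 and f⁺(0) = 1, so the
# formula above predicts ∂F(0) = [-1, 1]; the extreme slopes of F near 0
# indeed approach these bounds
assert abs(F(0.1) - 0.1) < 1e-9        # F(x) = |x| for x > 0
assert abs(F(-0.1) - 0.1) < 1e-9       # ... and for x < 0
right = (F(1e-3) - F(0.0)) / 1e-3      # -> f⁺(0) = 1
left = (F(0.0) - F(-1e-3)) / 1e-3      # -> f⁻(0) = -1
assert abs(right - 1.0) < 1e-6 and abs(left + 1.0) < 1e-6
```

The values of f on null sets are irrelevant here: redefining f at finitely many points changes neither F nor the essential bounds f^±(0).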
We remind the reader that the standing assumptions of this section address
locally Lipschitzian functions on normed spaces. The Lipschitz continuity was
essential in the definitions and results of the previous subsections concerning
generalized derivatives and gradients as well as subderivatives and contingent
subdifferentials. However, it is not the case here. In fact, all the definitions
and results presented in this subsection until Theorem 7.69 do not require
the Lipschitz continuity, or just need simple modifications for the case of
extended-real-valued functions; see Section 7.10. Nevertheless, we keep the
standing assumptions to avoid the reader’s confusion while mentioning them
explicitly only when they are really needed; see Section 7.10.
Definition 7.64 Let X be a normed space, and let f : X → R be locally
Lipschitzian around x. Consider the following constructions:
(a) The (Fréchet) regular subdifferential of f at x is
∂̂f(x) := {x* ∈ X* | lim inf_{u→x} [f(u) − f(x) − ⟨x*, u − x⟩]/‖u − x‖ ≥ 0}. (7.79)
via the sequential outer limit of the mapping ∂̂f : X ⇉ X*, where X* is
equipped with the weak* topology.
It follows directly from definitions (7.79) and (7.80) that ∂̂f(x) ⊂ ∂f(x).
Furthermore, we always have ∂f(x) ≠ ∅ in arbitrary Asplund spaces, while the
regular subdifferential ∂̂f(x) may be empty even for very simple nonsmooth and
nonconvex functions on R. As expected nevertheless, both subdifferentials
from Definition 7.64 agree with the subdifferential of convex analysis if f
is convex, and they reduce to the classical derivatives when the function is
differentiable in the appropriate sense; see Exercise 7.144 for more details.
Next we reveal relations between regular and contingent subgradients.
df(x; v) ≤ lim inf_{k→∞} [f(x + tₖvₖ) − f(x)]/tₖ ≤ lim inf_{k→∞} ⟨x*, vₖ⟩ − ε = ⟨x*, v⟩ − ε.
∂̂f(x) = {x* ∈ X* | (x*, −1) ∈ N̂((x, f(x)); epi(f))}. (7.85)
without loss of generality that u*_{mₖ} → u* in the weak* topology as k → ∞.
Hence u* ∈ ∂f(x), and then by passing to the limit in (7.92) we arrive at
f◦(x; v) ≤ ⟨u*, v⟩ ≤ sup{⟨x*, v⟩ | x* ∈ ∂f(x)},
which readily yields (7.91). It follows from (7.91) that
∂̄f(x) ⊂ cl* co(∂f(x)),
and thus we get (7.88) due to (7.90).
To verify the last statement (7.89) of the theorem, we easily deduce from
(7.80) and the imposed local Lipschitz continuity of f that the subgradient
set ∂f (x) is bounded in X ∗ ; see Exercise 7.144. If X = Rn , then this set
is clearly closed, and so is its convex hull co(∂f (x)). Thus (7.88) reduces to
(7.89), which completes the proof of the theorem.
Proof. It follows from (7.89) that the limiting subdifferential ∂f(x) is a
singleton if and only if the generalized gradient ∂̄f(x) is a singleton. Theorem 7.56
tells us that the latter is equivalent, in general normed spaces, to the strict
Hadamard differentiability of f at x, which is the same as the strict Fréchet
differentiability of f at x in finite dimensions; see Proposition 7.53. This
completes the proof.
The main goal of this subsection is to provide precise formulas for calculating
the regular and limiting subgradient sets of the (Lipschitz continuous) distance
function
d(x; Ω) := inf{‖x − w‖ | w ∈ Ω}, x ∈ X, (7.93)
associated with a nonempty set Ω in a normed space X. This function is
convex for a convex set Ω, and the convex subdifferential of (7.93) has
been comprehensively studied in Subsection 3.3.5 of Chapter 3. Considering
now general nonconvex settings of (7.93) in both finite and infinite dimensions,
we aim at obtaining subgradient results of their own interest and preparing to
use them in deriving subdifferential formulas for calculating subgradient sets
of convex signed distance functions in the next section.
The first result addresses regular subgradients of distance functions asso-
ciated with subsets of normed spaces in both in-set and out-of-set cases.
(b) If x ∉ Ω and the projection set Π(x; Ω) is nonempty, then for every
w ∈ Π(x; Ω) we have the inclusion
∂̂d(x; Ω) ⊂ ∂p(x − w) ∩ N̂(w; Ω), (7.95)
where p(x) := ‖x‖ for x ∈ X.
Proof. To verify the in-set formula (7.94) in (a), fix x* ∈ ∂̂d(x; Ω) and for any
ε > 0 find by (7.79) a δ > 0 such that
⟨x*, u − x⟩ ≤ d(u; Ω) − d(x; Ω) + ε‖u − x‖ = d(u; Ω) + ε‖u − x‖ whenever
u ∈ B(x; δ).
Since d(u; Ω) = 0 when u ∈ Ω, we have
⟨x*, u − x⟩ ≤ ε‖u − x‖ for all u ∈ B(x; δ) ∩ Ω,
which yields x* ∈ N̂(x; Ω) by definition (7.82). The inclusion x* ∈ B* follows
from the Lipschitz continuity of d(·; Ω) on X with modulus ℓ = 1.
To prove the opposite inclusion in (7.94), pick x* ∈ N̂(x; Ω) ∩ B* and for
any ε > 0 find by (7.82) a number δ > 0 with
⟨x*, u − x⟩ ≤ ε‖u − x‖ whenever u ∈ B(x; δ) ∩ Ω.
Using this and picking any u ∈ B(x; δ) give us
[d(u; Ω) − d(x; Ω) − ⟨x*, u − x⟩]/‖u − x‖ ≥ −2ε − (1 + ε)‖u − x‖.
Taking above the lower limit as u → x yields x* ∈ ∂̂d(x; Ω) since ε > 0 was
chosen arbitrarily. This completes the proof of assertion (a).
To verify the claimed inclusion (7.95) in the out-of-set case (b), fix a regular
subgradient x* ∈ ∂̂d(x; Ω) and a projection vector w ∈ Π(x; Ω). Then for any
ε > 0 there exists δ > 0 such that
⟨x*, u − x⟩ ≤ d(u; Ω) − d(x; Ω) + ε‖u − x‖ = d(u; Ω) − ‖x − w‖ + ε‖u − x‖
if ‖u − x‖ < δ. Pick any u ∈ Ω with ‖u − w‖ = ‖(u − w + x) − x‖ < δ and get
⟨x*, u − w⟩ = ⟨x*, (u − w + x) − x⟩ ≤ d(u − w + x; Ω) − ‖x − w‖ + ε‖u − w‖
≤ ‖u − w + x − u‖ − ‖x − w‖ + ε‖u − w‖ = ε‖u − w‖.
Thus we arrive at x* ∈ N̂(w; Ω), which finishes the proof of the theorem since
the remaining inclusion x* ∈ ∂p(x − w) is obvious.
Our next goal here is to establish a precise calculation formula for the reg-
ular subdifferential of the distance function at out-of-set points of nonempty
subsets Ω ⊂ X of normed spaces by using the set enlargements
Ω_r := {x ∈ X | d(x; Ω) ≤ r}, r > 0. (7.96)
We begin with the following lemma that is needed for our calculations.
Lemma 7.72 Let X be a normed space, and let Ω be a nonempty subset of
X. For any x ∈ X with r := d(x; Ω) > 0, we have the properties:
(a) If x* ∈ ∂̂f(x), then ‖x*‖ = 1. Here f(x) := d(x; Ω) for x ∈ X.
(b) d(x; Ω_r) = d(x; Ω) − r whenever x ∉ Ω_r.
Proof. To verify (a), pick x* ∈ ∂̂f(x) and fix any ε > 0. By definition (7.79)
there exists δ > 0 such that
⟨x*, u − x⟩ ≤ d(u; Ω) − d(x; Ω) + ε‖u − x‖ for all u ∈ B(x; δ). (7.97)
Taking a sequence {tk } of positive numbers with limk→∞ tk = 0, for each
k ∈ N find wk ∈ Ω such that
‖x − wₖ‖ < d(x; Ω) + tₖ².
Let xₖ := x + tₖ(wₖ − x) and get ‖xₖ − x‖ = tₖ‖wₖ − x‖ < tₖ(d(x; Ω) + tₖ²) → 0
as k → ∞. Thus we have ‖xₖ − x‖ < δ for sufficiently large k and hence
Now we are in a position to obtain a precise formula for the regular sub-
differential of the distance function (7.93) at out-of-set points.
Theorem 7.73 Let X be a normed space, and let Ω be a nonempty subset of
X. For any x ∈ X with r := d(x; Ω) > 0 we have the representation
∂̂d(x; Ω) = N̂(x; Ω_r) ∩ S*, (7.98)
where S* := {x* ∈ X* | ‖x*‖ = 1}.
Proof. Fix any x* ∈ ∂̂d(x; Ω) and for each ε > 0 find δ > 0 such that (7.97)
holds. Since d(u; Ω) − d(x; Ω) = d(u; Ω) − r ≤ 0 whenever u ∈ Ω_r, we have
⟨x*, u − x⟩ ≤ ε‖u − x‖ for all u ∈ B(x; δ) ∩ Ω_r,
which implies that x* ∈ N̂(x; Ω_r). It follows from Lemma 7.72(a) that ‖x*‖ =
1, and thus we get the inclusion “⊂” in (7.98).
To prove the opposite inclusion in (7.98), fix any x* ∈ N̂(x; Ω_r) with
‖x*‖ = 1. It follows from Theorem 7.71 that x* ∈ ∂̂d(x; Ω_r). Taking any
ε > 0 and 0 < η < ε/2, we find δ₁ > 0 for which
⟨x*, u − x⟩ ≤ d(u; Ω_r) − d(x; Ω_r) + ε‖u − x‖ whenever ‖u − x‖ < δ₁.
Then Lemma 7.72(b) gives us the inequality
⟨x*, u − x⟩ ≤ d(u; Ω) − d(x; Ω) + ε‖u − x‖ if ‖u − x‖ < δ₁, u ∉ Ω_r. (7.99)
The last theorem here presents a precise formula for calculating the limit-
ing subdifferential (7.80) of the distance function associated with a closed set
in finite dimensions for the most challenging case of out-of-set points.
We conclude this subsection with the following remark about the smoothness
of the distance function at out-of-set points.
Now we are ready to establish a precise formula for calculating the convex
subdifferential of the signed distance function by using the above tools of
nonconvex generalized differentiation. For the reader’s convenience, recall that
the signed distance function associated with a nontrivial set Ω is defined as
d̂(x; Ω) := d(x; Ω) if x ∉ Ω, and d̂(x; Ω) := −d(x; Ω^c) if x ∈ Ω, (7.103)
by using the complement Ω c of the set Ω. The signed distance function was
studied in Section 6.4 in the general setting of normed spaces. Although the
function d̂(x; Ω) is convex when the set Ω is convex, no results on its subdifferentiation
were given in that section. To proceed now in this direction, we
7.7 Subgradient Analysis of Distance Functions 511
confine ourselves to the finite-dimensional setting and show below that the
main result may fail in infinite dimensions.
First we provide a subdifferential study of the extended-real-valued function
ϑ(·; Ω) : X → R defined in (6.19), which is associated with d̂(x; Ω).
Proof. To verify (a), it suffices to consider the case where x ∈ bd(Ω). Observe
that the closedness of Ω guarantees that bd(Ω) ⊂ Ω = dom(ϑ_Ω). Note also
that the signed distance function d̂(·; Ω) is convex and continuous on X
due to Corollary 6.48 and Proposition 6.45, respectively. Thus we know that
∂d̂(x; Ω) ≠ ∅. Pick any x* ∈ ∂d̂(x; Ω) and get by the definition that
⟨x*, u − x⟩ ≤ d̂(u; Ω) − d̂(x; Ω) = d̂(u; Ω) for all u ∈ X.
The next result provides a precise formula for representing the subdifferential of $\tilde d(x; \Omega)$ via that of $\vartheta(x; \Omega)$ at boundary points of $\Omega$.
Proof. Fix any $\bar x \in \mathrm{bd}(\Omega)$. It follows from Theorem 6.47 that $\tilde d(x; \Omega) = (\vartheta_\Omega \,\square\, p)(x)$ for all $x \in X$, where $p(x) := \|x\|$ for $x \in X$, and where the infimal convolution is exact at the reference point $\bar x$, in the sense that $\tilde d(\bar x; \Omega) = \vartheta(\bar x; \Omega) + p(0)$, since all the terms therein are zero. Applying the subdifferential rule for infimal convolutions of convex functions, we get the equalities
$$\partial\tilde d(\bar x; \Omega) = \partial(\vartheta_\Omega \,\square\, p)(\bar x) = \partial\vartheta(\bar x; \Omega) \cap \partial p(0) = \partial\vartheta(\bar x; \Omega) \cap \mathbb{B}^*$$
and thus verify the claimed subdifferential representation. $\square$
512 7 CONVEXIFIED LIPSCHITZIAN ANALYSIS
$$Q_\Omega(x) := \begin{cases} \Pi(x; \Omega) & \text{if } x \in \Omega^c,\\[2pt] \Pi(x; \Omega^c) & \text{if } x \in \Omega. \end{cases} \tag{7.104}$$
The following lemma exploits elementary properties of the Euclidean projections in the construction of (7.104).
Lemma 7.78 Let $\Omega$ be a nontrivial, closed, and convex subset of $\mathbb{R}^n$. If $w \in Q_\Omega(\bar x)$ with $\bar x \in \mathrm{int}(\Omega)$, then we have the inclusion $w - \bar x \in N(w; \Omega)$.

Proof. Fix any $x \in \Omega^c$ and write for the Euclidean norm that
$$\|\bar x - x\|^2 = \|x - w\|^2 - 2\langle \bar x - w, x - w\rangle + \|\bar x - w\|^2.$$
Since $w \in Q_\Omega(\bar x) = \Pi(\bar x; \Omega^c)$, we get $\|\bar x - x\|^2 - \|\bar x - w\|^2 \ge 0$, which yields
$$\langle \bar x - w, x - w\rangle \le \frac12\|x - w\|^2 \quad\text{for all } x \in \Omega^c.$$
Now let us pick any $x \in \mathrm{int}(\Omega)$ and verify that $w + t(w - x) \notin \Omega$ whenever $t > 0$. Indeed, supposing on the contrary that $w + t(w - x) = \omega \in \Omega$ yields
$$w = \frac{t}{t+1}\,x + \frac{1}{t+1}\,\omega \in \mathrm{int}(\Omega),$$
a contradiction. Thus we arrive at the estimate
$$t\,\langle \bar x - w, w - x\rangle = \big\langle \bar x - w, \big(w + t(w - x)\big) - w\big\rangle \le \frac12\big\|\big(w + t(w - x)\big) - w\big\|^2 = \frac12\,t^2\|w - x\|^2 \quad\text{for all } t > 0.$$
Letting $t \downarrow 0$ gives us $\langle w - \bar x, x - w\rangle \le 0$ for all $x \in \mathrm{int}(\Omega)$, and hence for all $x \in \Omega$ by continuity. This shows that $w - \bar x \in N(w; \Omega)$, which is claimed in the lemma. $\square$
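Lemma 7.78 can be sanity-checked numerically for the unit ball in $\mathbb{R}^2$, where both the projection $Q_\Omega$ and the normal cone are available in closed form (this concrete instance is our illustration, not part of the text):

```python
import numpy as np

# For the unit ball Omega in R^2 and xbar in int(Omega) \ {0}:
# Q_Omega(xbar) = xbar/||xbar|| (nearest point of the complement), and
# N(w; Omega) = {t*w : t >= 0} at the boundary point w.
rng = np.random.default_rng(0)
for _ in range(100):
    xbar = rng.uniform(-0.7, 0.7, size=2)
    if np.linalg.norm(xbar) < 1e-3:
        continue                       # projection not single-valued at the center
    w = xbar / np.linalg.norm(xbar)    # Q_Omega(xbar)
    d = w - xbar                       # direction claimed by Lemma 7.78
    t = d @ w                          # coefficient along the outward normal w
    assert t >= 0 and np.allclose(d, t * w, atol=1e-12)
```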
Here is the main result on the convex subdifferentiation of $\tilde d(x; \Omega)$ and its “nonconvex” proof. Recall that $S$ stands for the unit sphere of $\mathbb{R}^n$.
Theorem 7.79 Let $\Omega$ be a nontrivial, closed, and convex subset of $\mathbb{R}^n$. Then
$$\partial\tilde d(\bar x; \Omega) = \begin{cases} \dfrac{\bar x - Q_\Omega(\bar x)}{d(\bar x; \Omega)} & \text{if } \bar x \in \Omega^c,\\[10pt] \dfrac{\bar x - \mathrm{co}\,Q_\Omega(\bar x)}{\tilde d(\bar x; \Omega)} & \text{if } \bar x \in \mathrm{int}(\Omega),\\[10pt] \mathrm{co}\big(S \cap N(\bar x; \Omega)\big) & \text{if } \bar x \in \mathrm{bd}(\Omega), \end{cases} \tag{7.105}$$
where the set $Q_\Omega(\bar x)$ is taken from (7.104).
$$\nabla f(x_k) = \frac{x_k - w_k}{\tilde d(x_k; \Omega)} \quad\text{with } w_k := \mathrm{co}\,Q_\Omega(x_k),$$
where the set $\mathrm{co}\,Q_\Omega(x_k)$ is a singleton, and where $\|\nabla f(x_k)\| = 1$ for each $k \in \mathbb{N}$. The latter ensures that $v \in S$. Furthermore, by Lemma 7.78 we have
$$\frac{x_k - \mathrm{co}\,Q_\Omega(x_k)}{\tilde d(x_k; \Omega)} = \frac{w_k - x_k}{d(x_k; \Omega^c)} \in N(w_k; \Omega).$$
Since $w_k \to \bar x$ as $k \to \infty$, it follows from the above constructions that $v \in N(\bar x; \Omega)$, and hence $v \in N(\bar x; \Omega) \cap S$. Now we deduce from the representation in (7.106) that $\partial\tilde d(\bar x; \Omega) \subset \mathrm{co}\big(N(\bar x; \Omega) \cap S\big)$.
To verify the opposite inclusion in (7.105) in this case, pick any $v \in N(\bar x; \Omega) \cap S$ and let $x_k := \bar x + v/k$ for all $k \in \mathbb{N}$. It is not hard to check that $\nabla f(x_k) = v$ for all $k$, and hence $v \in \partial\tilde d(\bar x; \Omega)$. This justifies formula (7.105) for $\bar x \in \mathrm{bd}(\Omega)$ and thus completes the proof of the theorem. $\square$
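For the unit ball the formula of Theorem 7.79 can be compared against a finite-difference gradient of the signed distance $\tilde d(x) = \|x\| - 1$, both for out-of-set and interior points (the example and tolerances are our choices):

```python
import numpy as np

def dtilde(x, r=1.0):
    return np.linalg.norm(x) - r   # signed distance to the unit ball (both cases)

def num_grad(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

for x in (np.array([2.0, 1.0]), np.array([0.3, 0.1])):   # out of set / interior
    w = x / np.linalg.norm(x)        # Q_Omega(x): nearest boundary point of the ball
    analytic = (x - w) / dtilde(x)   # the single-point version of (7.105)
    assert np.allclose(num_grad(dtilde, x), analytic, atol=1e-4)
    assert np.allclose(analytic, x / np.linalg.norm(x), atol=1e-12)
```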
We conclude this section with an example showing that the finite dimen-
sion of the Euclidean space X = Rn is essential for the fulfillment of Theo-
rem 7.79. In fact, the following example demonstrates that the subdifferential
formula (7.105) fails in any infinite-dimensional separable Hilbert space.
optimization that will be considered in the second volume of our book [240]
together with numerical algorithms and a variety of applications. Here we
present some basic properties of DC functions including their subdifferential
and conjugate calculi. A special structure of DC functions makes it possible
to broadly use the machinery of convex analysis, while their intrinsic noncon-
vexity requires also employing nonconvex subdifferentiation studied above.
Since the main part of this chapter deals with Lipschitzian functions, we
restrict ourselves to differences of two continuous convex functions, where Lip-
schitz continuity comes from convexity. Real-valued functions of this type are
known as continuous DC functions. We also consider their local counterparts
and derive for them major rules of subdifferential and conjugate calculi. More
general extended-real-valued DC functions without any continuity assump-
tions are discussed in the commentary Section 7.10.
Unless otherwise stated, all the functions under consideration in this sec-
tion are defined on normed spaces.
Proof. It follows from Definition 7.81 that there exist continuous convex functions $g_i, h_i\colon X \to \mathbb{R}$ such that $f_i = g_i - h_i$ on $X$ for $i = 1, \ldots, m$. Then
$$\sum_{i=1}^m f_i(x) = \sum_{i=1}^m g_i(x) - \sum_{i=1}^m h_i(x) \quad\text{for all } x \in X,$$
which readily implies that the sum $\sum_{i=1}^m f_i$ is a continuous DC function. $\square$
We see below that both maximum and minimum operations over finitely
many continuous DC functions keep the resulting functions in this class.
7.8 Differences of Convex Functions 517
Proof. Definition 7.81 tells us that there exist continuous convex functions $g_i, h_i\colon X \to \mathbb{R}$ such that $f_i = g_i - h_i$ for $i = 1, \ldots, m$ on $X$, and hence
$$f_i(x) = g_i(x) + \sum_{j=1,\, j \ne i}^m h_j(x) - \sum_{j=1}^m h_j(x), \quad x \in X,$$
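The decomposition displayed above makes $\max_i f_i$ a difference of convex functions, since a pointwise maximum and a finite sum of convex functions are convex; the underlying algebraic identity can be checked on random data (a sketch of ours, not the book's):

```python
import random

# f_i = g_i - h_i, so  max_i f_i = max_i (g_i + sum_{j != i} h_j) - sum_j h_j,
# a difference of two convex functions.
random.seed(2)
for _ in range(100):
    g = [random.uniform(-5, 5) for _ in range(4)]
    h = [random.uniform(-5, 5) for _ in range(4)]
    f = [gi - hi for gi, hi in zip(g, h)]
    lhs = max(f)
    rhs = max(g[i] + sum(h) - h[i] for i in range(4)) - sum(h)
    assert abs(lhs - rhs) < 1e-9
```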
The last two results in this subsection follow from the above lemma.
Proof. It suffices to prove the proposition for the case where $m = 2$. We have
$$(f_1 f_2)(x) = \frac12\Big[\big(f_1(x) + f_2(x)\big)^2 - f_1^2(x) - f_2^2(x)\Big], \quad x \in X.$$
It follows from Proposition 7.88 that $(f_1 + f_2)^2$, $f_1^2$, and $f_2^2$ are continuous DC functions, and thus $f_1 f_2$ also belongs to this class.
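The product formula used in this proof is the polarization identity $f_1 f_2 = \frac12[(f_1 + f_2)^2 - f_1^2 - f_2^2]$, which a quick numerical check confirms:

```python
import random

# Polarization identity behind the DC representation of a product:
# a*b = ((a + b)^2 - a^2 - b^2) / 2
random.seed(0)
for _ in range(100):
    a, b = random.uniform(-5, 5), random.uniform(-5, 5)
    assert abs(a * b - ((a + b) ** 2 - a ** 2 - b ** 2) / 2) < 1e-9
```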
The following notion plays a significant role in the rest of this subsection.
Definition 7.92 Let X be a normed space, and let f : X → R be a continuous
DC function. We say that ϕ : X → R is a control function for f if ϕ is
continuous and both functions −f + ϕ and f + ϕ are convex.
It follows directly from the definition that any control function for a given DC function is always convex and continuous; indeed, $\varphi = \frac12\big[(f + \varphi) + (-f + \varphi)\big]$ is a sum of two convex functions.
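For instance, if $f = g - h$ with convex $g, h$, then $\varphi := g + h$ is a control function, since $f + \varphi = 2g$ and $-f + \varphi = 2h$ are convex. A midpoint-convexity check for one such pair (our illustrative choice $g(x) = x^4$, $h(x) = x^2$):

```python
import random

g = lambda x: x ** 4          # convex
h = lambda x: x ** 2          # convex
f = lambda x: g(x) - h(x)     # DC function
phi = lambda x: g(x) + h(x)   # candidate control function

random.seed(3)
# Both f + phi (= 2g) and -f + phi (= 2h) should be midpoint convex:
for F in (lambda x: f(x) + phi(x), lambda x: -f(x) + phi(x)):
    for _ in range(200):
        a, b = random.uniform(-3, 3), random.uniform(-3, 3)
        assert F((a + b) / 2) <= (F(a) + F(b)) / 2 + 1e-9
```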
Proof. Given $\varepsilon > 0$, define the function $g_\varepsilon(x) := f(x) + \varepsilon x^2$ for $x \in \mathbb{R}$.
To verify the convexity of f , it suffices to check that gε is convex for every
ε > 0. Suppose on the contrary that gε is not convex for some ε > 0 and find
x1 , x2 ∈ R such that
gε (tx1 + (1 − t)x2 ) > tgε (x1 ) + (1 − t)gε (x2 ) for some 0 < t < 1.
As parts of the proof of the main result, we now present the following two lemmas, which are of independent interest.
Proof. Pick $x, v \in X$ with $\|v\| = 1$ and $\delta > 0$. Lemma 7.96 ensures the existence of $x_1, x_2 \in B(x; \delta)$, $0 < \lambda < 1$, and $r, s \in \{1, \ldots, m\}$ with
$$h(x) := \frac12\big(\varphi_r(x) + \varphi_s(x)\big) + \sum_{i,j=1,\, i \ne r,\, j \ne s}^{m} \Big(\varphi_i(x) + \varphi_j(x) + \frac12\,|f_i(x) - f_j(x)|\Big)$$
for all $x \in X$. Using the above property (c), for any $\alpha, \beta \in \mathbb{R}$ we have
$$\begin{aligned} &|\lambda f(x_1) + (1-\lambda) f(x_2) - f(x)|\\ &\quad= |\lambda f_r(x_1) + (1-\lambda) f_s(x_2) - f(x)|\\ &\quad= \Big|\lambda f_r(x_1) + (1-\lambda) f_s(x_2) + \frac12\alpha - \frac12\alpha + \frac12\beta - \frac12\beta - f(x)\Big|\\ &\quad\le \frac12|\lambda f_r(x_1) + \alpha - f_r(x)| + \frac12|(1-\lambda) f_s(x_2) + \beta - f_s(x)|\\ &\qquad+ \frac12|\lambda f_r(x_1) - \beta| + \frac12|(1-\lambda) f_s(x_2) - \alpha|. \end{aligned}$$
Specify now the numbers $\alpha, \beta$ as $\alpha := (1-\lambda) f_r(x_2)$ and $\beta := \lambda f_s(x_1)$ and then deduce from Proposition 7.91 that
$$\frac12|\lambda f_r(x_1) + \alpha - f_r(x)| = \frac12|\lambda f_r(x_1) + (1-\lambda) f_r(x_2) - f_r(x)| \le \frac12\big(\lambda \varphi_r(x_1) + (1-\lambda)\varphi_r(x_2) - \varphi_r(x)\big).$$
Similarly we get the relationships
$$\frac12|(1-\lambda) f_s(x_2) + \beta - f_s(x)| = \frac12|\lambda f_s(x_1) + (1-\lambda) f_s(x_2) - f_s(x)| \le \frac12\big(\lambda \varphi_s(x_1) + (1-\lambda)\varphi_s(x_2) - \varphi_s(x)\big).$$
Since $|f_r(x) - f_s(x)| = 0$, it follows that
$$\frac12|\lambda f_r(x_1) - \beta| + \frac12|(1-\lambda) f_s(x_2) - \alpha| = \frac12\lambda|f_r(x_1) - f_s(x_1)| + \frac12(1-\lambda)|f_r(x_2) - f_s(x_2)| + \frac12|f_r(x) - f_s(x)|.$$
$$f'_+(x_1) \le f'_-(u_1) \le f'_+(u_1) \le f'_-(u_2) \le f'_+(u_2) \le f'_-(x_2).$$
Now we are ready to show that the convexity of a function f around each
point of a convex set in a normed space is equivalent to the convexity of this
function on the entire set.
Remark 7.103 Theorem 7.102 provides a convenient way to check the con-
vexity of a given function. For instance, if f : X → R is convex on a nonempty
convex set Ω ⊂ X and if for any x ∈ bd(Ω) ∪ Ω c there exists a convex neigh-
borhood V of x on which f is convex, then Theorem 7.102 tells us that f is
convex on the entire space $X$. Furthermore, assume that $\Omega \subset \bigcup_{i=1}^m V_i$, where every $V_i$ is a nonempty open convex set on which $f$ is convex. Take any $x \in \Omega$ and get that $x \in V_i$ for some $i \in \{1, \ldots, m\}$. Since $f$ is convex on $V_i$, it is convex on $V_i \cap \Omega$, and thus $f$ is convex on the set $\Omega$ by Theorem 7.102.
Observe that ϕ is convex on W and that max{f (x), g(x)} = g(x) whenever x
is near bd(W ), which implies by Theorem 7.102 that ϕ is convex on Rn . By
the construction we have ϕ(x) = max{f (x), g(x)} = f (x) for all x ∈ V , which
therefore completes the proof of the lemma.
The next lemma justifies the possibility to study locally DC functions by
using the convexity of functions on compact convex sets.
Lemma 7.105 Let f : Rn → R be a locally DC function, and let K be a
nonempty, convex, and compact subset of Rn . Then there exists a convex func-
tion h : Rn → R such that the sum f + h is convex on K.
Proof. Fix any x ∈ Rn and, by using Definition 7.99 and Lemma 7.104, choose
a convex neighborhood Ux of x together with convex functions gx , hx : Rn → R
such that f (z) = gx (z) − hx (z) for all z ∈ Ux . It follows that the function
$f + h_x$ is convex on $U_x$. Since $K$ is compact, we have
$$K \subset \bigcup_{i=1}^m U_{x_i} \quad\text{for some } x_1, \ldots, x_m \in \mathbb{R}^n.$$
Define $g(x) := f(x) + \sum_{i=1}^m h_{x_i}(x)$ for $x \in \mathbb{R}^n$ and observe that
$$g(x) = (f + h_{x_j})(x) + \sum_{i=1,\, i \ne j}^m h_{x_i}(x), \quad x \in U_{x_j}$$
The last lemma here collects technical results, which play a crucial role in
the proof of the main theorem given below.
Lemma 7.106 Let f : Rn → R be a locally DC function. For each k ∈ N
define the sets Ck := B(0; k) ⊂ Rn and Dk := B(0; k + 1/2) ⊂ Rn , and
let hk : Rn → R be convex functions such that f + hk is convex on Dk ; these
functions exist by Lemma 7.105. Then there exists a convex function ϕ1 : Rn →
R satisfying the following conditions:
(a) ϕ1 (x) = h1 (x) for all x ∈ C1 and
(b) f + ϕ1 is convex on D2 .
$$\psi(x) := \begin{cases} 0 & \text{if } \|x\| \le 1,\\[2pt] \alpha(\|x\| - 1) & \text{otherwise,} \end{cases}$$
and deduce that ϕ1 (x) = h1 (x) for all x ∈ C1 by the definition of ϕ1 and
(7.113). In addition, we get from (7.114) that
ϕ1 (x) = h2 (x) − γ + ψ(x) whenever x is near bd(D1 ). (7.115)
It also follows from the definition of ϕ1 that this function is convex on D1 .
Furthermore, ϕ1 is convex around any point in Rn \ D1 , and thus it is convex
on the entire space Rn by Theorem 7.102. This verifies (a).
It remains to show that (b) is satisfied, i.e., the function f + ϕ1 is convex
on D2 . Observe to this end that f + h1 is convex on D1 , and that f + h2 is
convex on D2 . It follows that the function f + h2 − γ + ψ is also convex on
D2 . Since D1 ⊂ D2 and
$$f(x) + \varphi_1(x) = \max\big\{f(x) + h_1(x),\; f(x) + h_2(x) - \gamma + \psi(x)\big\} \quad\text{for all } x \in D_1,$$
we see that f + ϕ1 is convex on D1 . Then deduce from the definition of ϕ1
and (7.115) that f + ϕ1 is convex around every point of D2 \ D1 . Hence f + ϕ1
is convex on D2 by Theorem 7.102, which therefore completes the proof.
where $\|\nabla^2 f(x)\|$ stands for some matrix norm of $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ (e.g., for the Frobenius norm). Whenever $\rho \ge M$, we have the obvious representation
$$f(x) = \frac12\Big[\Big(\frac{\rho}{2}\|x\|^2 + f(x)\Big) - \Big(\frac{\rho}{2}\|x\|^2 - f(x)\Big)\Big], \quad x \in \mathbb{R}^n.$$
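For $f(x) = \sin x$ on $\mathbb{R}$ one has $\|f''\|_\infty = 1 =: M$, and with $\rho = 1$ both halves of the representation above are convex, as a second-difference check confirms (the test function, grid, and tolerances are our choices):

```python
import math

# f(x) = sin(x) has |f''| <= M = 1. With rho >= M the two halves
# rho/2 * x^2 + f(x) and rho/2 * x^2 - f(x) both have nonnegative
# second derivative, hence are convex, and their half-difference is f.
rho, h = 1.0, 1e-4
def g_plus(x):  return rho / 2 * x * x + math.sin(x)
def g_minus(x): return rho / 2 * x * x - math.sin(x)

def second_diff(f, x):
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

for k in range(-30, 31):
    x = k / 5.0
    assert second_diff(g_plus, x) >= -1e-5
    assert second_diff(g_minus, x) >= -1e-5
    # the identity f = (g_plus - g_minus) / 2:
    assert abs(math.sin(x) - (g_plus(x) - g_minus(x)) / 2) < 1e-12
```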
Proof. Using Theorem 7.107, it suffices to show that the composition $g \circ f$ is a locally DC function. First consider the case where $g$ is a convex function. Fix
any x0 ∈ Rn and let y0 := f (x0 ) ∈ I. Pick δ > 0 such that J := [y0 −δ, y0 +δ] ⊂
I and choose a neighborhood V of x0 with f (V ) ⊂ J. It is not hard to check
(see Exercise 7.159) that $g$ can be represented in the form
$$g(y) = \sup\{\varphi_t(y) \mid t \in T\} \quad\text{for all } y \in J, \tag{7.116}$$
where each $\varphi_t\colon J \to \mathbb{R}$ is an affine function of the type $\varphi_t(y) = a_t y + b_t$
with M := supt∈T |at | < ∞ and supt∈T bt < ∞. Since f is a continuous
DC function, there exist continuous convex functions f1 and f2 such that
$f = f_1 - f_2$ on $\mathbb{R}^n$. By (7.116) we have
$$\begin{aligned} g\big(f(x)\big) &= \sup_{t \in T}\big[a_t\big(f_1(x) - f_2(x)\big) + b_t\big]\\ &= \sup_{t \in T}\big[a_t f_1(x) - a_t f_2(x) + b_t\big]\\ &= \sup_{t \in T}\big[(M + a_t) f_1(x) + (M - a_t) f_2(x) + b_t\big] - M\big(f_1(x) + f_2(x)\big) \end{aligned}$$
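The last equality rewrites $g \circ f$ as a difference of convex functions: the supremum has nonnegative coefficients $M \pm a_t$ on the convex $f_1, f_2$, and $M(f_1 + f_2)$ is convex. For $g(y) = |y| = \sup\{a y \mid a \in \{-1, 1\}\}$ the identity can be checked numerically (our toy instance):

```python
import random

A = [-1.0, 1.0]                   # |y| = sup_t a_t * y with a_t in {-1, 1}, b_t = 0
M = max(abs(a) for a in A)        # M = sup |a_t| = 1
random.seed(1)
for _ in range(100):
    f1, f2 = random.uniform(0, 5), random.uniform(0, 5)
    lhs = abs(f1 - f2)            # g(f(x)) with f = f1 - f2
    rhs = max((M + a) * f1 + (M - a) * f2 for a in A) - M * (f1 + f2)
    assert abs(lhs - rhs) < 1e-9
```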
Proof. Suppose without loss of generality that f2 (x) > 0 for all x ∈ Rn . By
Proposition 7.89, it suffices to show that 1/f2 is a continuous DC function.
Since f2 : Rn → I := (0, ∞) is a continuous DC function as well as g : I → R
with g(y) := 1/y for y ∈ I, it follows from Proposition 7.109 that 1/f2 is also
a continuous DC function on Rn .
(c) In the setting of assertion (b) we have the same upper estimate for the generalized gradient of $f$ at $\bar x$:
$$\partial f(\bar x) \subset \partial g(\bar x) - \partial h(\bar x). \tag{7.119}$$
To prove the opposite inequality in (7.121), it suffices to consider the case where $\gamma := \sup_{z^* \in \mathrm{dom}(h^*)}\big[g^*(x^* + z^*) - h^*(z^*)\big] \in \mathbb{R}$. Then the definition of $\gamma$ directly yields
$$g^*(x^* + z^*) - h^*(z^*) \le \gamma \quad\text{for all } z^* \in \mathrm{dom}(h^*).$$
It follows therefore that
$$g^*(x^* + z^*) \le h^*(z^*) + \gamma \quad\text{whenever } z^* \in X^*.$$
Since $h^{**} = h$ by the biconjugate theorem in our setting, applying the Fenchel conjugate operation to both sides above gives us
$$g^{**}(z) - \langle x^*, z\rangle \ge h^{**}(z) - \gamma = h(z) - \gamma \quad\text{for all } z \in X.$$
Using the estimate $g^{**}(z) \le g(z)$ on $X$, we have
$$g(z) - \langle x^*, z\rangle \ge h(z) - \gamma \quad\text{whenever } z \in X,$$
which readily implies that $\gamma \ge \langle x^*, z\rangle - (g - h)(z)$ for all such $z$. Taking finally the supremum with respect to $z \in X$ verifies that $\gamma \ge (g - h)^*(x^*)$ and thus completes the proof of the theorem. $\square$
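The conjugate formula just proved can be tested in one dimension with $g(x) = x^2$ and $h(x) = |x|$, where $g^*(y) = y^2/4$, $h^*$ is the indicator of $[-1, 1]$, and both sides are computed by brute force on a grid (functions and grids are our choices):

```python
import numpy as np

# Difference-conjugate formula checked for g(x) = x^2, h(x) = |x| on R:
# (g - h)^*(p) = sup_{q in dom h^*} [ g^*(p + q) - h^*(q) ],
# with dom h^* = [-1, 1], h^* = 0 there, and g^*(y) = y^2 / 4.
xs = np.linspace(-100, 100, 400001)
qs = np.linspace(-1, 1, 2001)
for p in (-2.0, -0.5, 0.0, 1.5, 3.0):
    lhs = np.max(p * xs - (xs**2 - np.abs(xs)))   # (g - h)^*(p) by brute force
    rhs = np.max((p + qs)**2 / 4)                 # sup over dom h^*
    assert abs(lhs - rhs) < 1e-3
```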
and thus we get $x^* \in \partial_\varepsilon g(\bar x)$. This verifies the inclusion $\partial_\varepsilon h(\bar x) \subset \partial_\varepsilon g(\bar x)$ as a necessary condition for global optimality.
The proof of the sufficiency of (7.122) for global minima is more involved. To proceed, suppose that $\partial_\varepsilon h(\bar x) \subset \partial_\varepsilon g(\bar x)$ for all $\varepsilon \ge 0$. Suppose on the contrary that $\bar x$ is not a global minimizer of $g - h$ and find $\hat x \in X$ such that
$$g(\hat x) - h(\hat x) < g(\bar x) - h(\bar x).$$
Pick an arbitrary positive number $\varepsilon$ satisfying
$$0 < \varepsilon < g(\bar x) + h(\hat x) - g(\hat x) - h(\bar x).$$
Choose $x^* \in \partial_\varepsilon h(\hat x)$ by Theorem 5.9 and get
$$\langle x^*, x - \hat x\rangle \le h(x) - h(\hat x) + \varepsilon \quad\text{for all } x \in X.$$
Then for any $x \in X$, we have
$$\langle x^*, x - \bar x\rangle \le h(x) - h(\bar x) + \langle x^*, \hat x - \bar x\rangle + h(\bar x) - h(\hat x) + \varepsilon.$$
Let $\hat\varepsilon := \langle x^*, \hat x - \bar x\rangle + h(\bar x) - h(\hat x) + \varepsilon \ge 0$ and deduce from the definition that $x^* \in \partial_{\hat\varepsilon} h(\bar x)$. Using (7.122) gives $x^* \in \partial_{\hat\varepsilon} g(\bar x)$. Observe that
$$\langle x^*, \hat x - \bar x\rangle = h(\hat x) - h(\bar x) + \hat\varepsilon - \varepsilon > g(\hat x) - g(\bar x) + \hat\varepsilon,$$
which implies that $x^* \notin \partial_{\hat\varepsilon} g(\bar x)$, a clear contradiction. $\square$
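The global optimality criterion $\partial_\varepsilon h(\bar x) \subset \partial_\varepsilon g(\bar x)$ for all $\varepsilon \ge 0$ can be watched in action for $g(x) = x^2$, $h(x) = |x|$, where $g - h$ attains its global minimum at $\bar x = \pm 1/2$; the $\varepsilon$-subdifferentials below are approximated on grids (a rough numerical sketch, our construction):

```python
import numpy as np

g = lambda x: x * x
h = lambda x: np.abs(x)
xs = np.linspace(-50, 50, 20001)     # grid for the defining supremum
ss = np.linspace(-3, 3, 601)         # candidate subgradients

def eps_subdiff(f, xbar, eps):
    # s lies in the eps-subdifferential iff sup_x [s(x - xbar) - f(x) + f(xbar)] <= eps
    vals = np.array([np.max(s * (xs - xbar) - f(xs) + f(xbar)) for s in ss])
    return set(np.round(ss[vals <= eps + 1e-9], 6))

# At the global minimizer xbar = 0.5 of g - h the inclusion holds for all eps tested:
for eps in (0.0, 0.1, 0.5, 2.0):
    assert eps_subdiff(h, 0.5, eps) <= eps_subdiff(g, 0.5, eps)

# At xbar = 0 (not a global minimizer) it fails already for small eps:
assert not (eps_subdiff(h, 0.0, 0.1) <= eps_subdiff(g, 0.0, 0.1))
```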
(a) Give an example showing that the function (x, v) → df (x, v) may not be upper
semicontinuous at (x, v).
(b) Give an example showing that the function (x, v) → f ◦ (x, v) may not be lower
semicontinuous at (x, v).
Exercise 7.124 Consider the setting of Theorem 7.19 under the assumptions made.
(a) Verify whether both statements of Theorem 7.19 remain valid if the smoothness
of g around x is replaced by the Fréchet differentiability of g at x.
(b) Dropping the surjectivity condition on ∇g(x) in the general assumptions of
Theorem 7.19, show that we have the inequality
$$(\varphi \circ g)^\circ(\bar x; v) \le \varphi^\circ\big(\bar y; \nabla g(\bar x)v\big), \tag{7.123}$$
where equality holds if ϕ is directionally regular at y, which yields in turn the
directional regularity of ϕ ◦ g at x. Hint: To verify the first inequality in (7.123),
proceed similarly to the corresponding part in the proof of [76, Theorem 2.3.9].
The other statements can be deduced directly from the definitions.
Exercise 7.125 Assume in the setting of Theorem 7.20 that all the functions fi are
locally Lipschitzian around x.
(a) Prove that in this case we have the inequality
$$f_{\max}^\circ(\bar x; v) \le \max_{i \in I(\bar x)} f_i^\circ(\bar x; v), \quad v \in X.$$
Hint: Proceed similarly to the proof of this part of Theorem 7.20. This can also be deduced from the chain rule in (7.123).
(b) Show that in this case the subderivative formula (7.30) is replaced by
Exercise 7.129 Clarify whether the counterpart of Theorem 7.33 holds for the
contingent subdifferential (7.34) under all the assumptions imposed therein but the
directional regularity of ϕ at y.
Exercise 7.134 Consider the minimum function fmin from (7.56), where fi : X →
R, i = 1, . . . , m, are locally Lipschitzian around x on a normed space X.
(a) Evaluate the subderivative of the minimum function fmin at x.
(b) Show that the contingent subdifferential counterpart of formula (7.57) holds
without taking the convex hull on the right-hand side therein. Hint: Use (a)
and also compare with [228, Proposition 1.113].
(c) Specify the results of (a) and (b) in the case where the functions fi are C 1 -
smooth around x for all i ∈ I(x).
(c) Show that the contingent regularity assumptions on f and −f are essential for
the fulfillment of Theorem 7.44(b).
Exercise 7.136 Consider the setting of Theorem 7.46 and do the following:
(a) Present all the details in the proof of this theorem.
(b) Clarify the possibility to replace the symmetric construction ∂− f in Theo-
rem 7.46(b) by the contingent subdifferential ∂ − f provided that X is a Banach
space admitting a Gâteaux smooth renorming. Hint: Use Exercise 7.135(a) and
proceed similarly to the proof of [85] and [228, Theorem 3.56] (specifying this
for the case of Lipschitzian function) by taking into account that X is “trust-
worthy” with respect to ∂ − f due to [171, Theorem 4.31].
Exercise 7.138 Derive inclusion (7.51) in Theorem 7.36 from the chain rule of
Theorem 7.48. Hint: Represent the maximum function (7.28) as the composition
ϕ ◦ g with g(x) := (f1 (x), . . . , fm (x)) and ϕ(y1 , . . . , ym ) := max{yi | i = 1, . . . , m}.
$$\lim_{\substack{x \to \bar x\\ t \downarrow 0}} \frac{f(x + tv) - f(x) - t\langle x^*, v\rangle}{t} = 0,$$
Exercise 7.140 Prove both assertions formulated in Remark 7.55. Hint: To verify
assertion (a) therein, use the description of the strict Fréchet differentiability of
functions taken from Exercise 7.139(b).
Exercise 7.141 Show that in all the results of Sections 7.1–7.3 and the correspond-
ing exercises, the C 1 -smoothness assumption on the function in question around the
reference point can be replaced by the strict Hadamard differentiability of the func-
tions at this point. Hint: Use Theorem 7.53.
7.9 Exercises for Chapter 7 541
Exercise 7.142 Provide all the details in the proof of Theorem 7.61. Hint: Compare
with the proof of [317, Theorem 9.61].
Exercise 7.143 Consider the antiderivative F : [a, b] → R defined in (7.76). Then
(a) Give a detailed proof of Lemma 7.62 in case (d).
(b) Verify Lemma 7.62 and Theorem 7.63 when x = a and x = b.
Exercise 7.144 Let f : X → R be a locally Lipschitzian function around x ∈ X on
a normed space X. Verify the following statements:
(a) If f is convex, then both ∂f (x) and ∂f (x) agree with the subdifferential of
convex analysis. Hint: Use the definitions.
(b) If $f$ is Fréchet differentiable at $\bar x$, then $\widehat\partial f(\bar x)$ is a singleton collapsing to the Fréchet derivative of $f$ at $\bar x$. Does this hold for the limiting subdifferential?
(c) Prove that both sets ∂f (x) and ∂f (x) are bounded in X ∗ by the Lipschitz
constant of f around x.
(d) Prove that ∂f (x) = ∅ provided that the space X is Asplund. Hint: Compare
with [229, Corollary 2.25].
(e) Show that $\widehat\partial f(\bar x) = \emptyset$ for $f(x) := -|x|$ at $\bar x := 0 \in \mathbb{R}$, and $\partial f(\bar x) = \emptyset$ for $f(x) := -\|x\|$ at $\bar x := 0 \in C[0,1]$.
Exercise 7.145 Let f : X → R be a locally Lipschitzian function around some point
x ∈ X, where X is a normed space.
(a) Give an example of $f$ and $X$ such that the regular subgradient set $\widehat\partial f(\bar x)$ is empty while the contingent one $\partial^- f(\bar x)$ is not.
(b) Verify the following monotonicity property of the regular subdifferential: if $f_1(\bar x) = f_2(\bar x)$ and $f_1(x) \le f_2(x)$ in a neighborhood of $\bar x$, then $\widehat\partial f_1(\bar x) \subset \widehat\partial f_2(\bar x)$. Hint: Proceed by the definition.
(c) Does the monotonicity property in (b) hold for the contingent subdifferential,
generalized gradient, and the limiting subdifferential?
(d) Assuming that $X$ admits an equivalent Fréchet smooth renorming, prove the following smooth variational description of regular subgradients: for every $x^* \in \widehat\partial f(\bar x)$ there exist a neighborhood $U$ of $\bar x$ and a concave, continuously Fréchet differentiable function $g\colon U \to \mathbb{R}$ such that $\nabla g(\bar x) = x^*$, and
Hint: Use Theorem 7.71(a) together with the Ekeland variational principle and
compare with [333] and [229, Theorem 1.97].
(b) Clarify whether representation (7.124) holds for closed sets in Banach spaces
with the limiting constructions defined in (7.80) and (7.83).
Exercise 7.152 Let Ω be a nonempty and closed subset of Rn .
(a) Show that the counterpart of Theorem 7.73 fails for the limiting normal cone
and subdifferential.
(b) Modify the construction of the limiting subdifferential (7.80) to ensure a rela-
tionship of types (7.98) and (7.124) (by replacing Ω by Ωr ) at out-of-set points.
Hint: Compare with [229, Theorem 1.101] for the latter representation.
(c) Find and justify appropriate relationships between the limiting subdifferential of
the distance function at out-of-set points of closed sets and the limiting normal
cone to the projections of these points onto the sets. Hint: Compare with [229,
Theorem 1.104] under certain well-posedness conditions.
Exercise 7.153 Verify which parts of Theorem 7.79 and its proof hold in infinite-
dimensional Hilbert spaces.
7.10 Commentaries to Chapter 7 543
Exercise 7.154 Consider the signed minimal time function defined in (6.43), where
X = Rn , and where both sets Ω and F are nonempty and convex. Determine
appropriate conditions on the constant dynamic set F that ensure subdifferential
formulas for (6.43) similar to those obtained in Subsection 7.7.3, where F = B. Hint:
Proceed as in the proof of Theorem 7.79 with appropriate modifications.
Exercise 7.156 Using the mixing property given in Theorem 7.97, prove that the
class of continuous DC functions on normed spaces is closed under taking maxima
of finitely many functions.
Exercise 7.157 Verify the representation (7.111) in the proof of Lemma 7.101.
Hint: Proceed similarly to the proof of Theorem 7.63.
Exercise 7.159 Verify representation (7.116) of the convex function g in the proof
of Proposition 7.109. Hint: Use the subdifferential of g.
Exercise 7.160 Clarify whether Theorem 7.107 and the associated results of Sub-
section 7.8.3 hold in infinite dimensions.
Exercise 7.162 Prove Theorem 7.113. Hint: Proceed similarly to the proof of The-
orem 7.112 with x∗ = 0 therein.
Exercise 7.163 Clarify whether Theorem 7.114 holds for any DC function without
the continuity assumptions on the convex functions g, h in the DC decomposition.
where the pair of convex sets [∂f (x), ∂f (x)] is called the quasi-differential of f at x.
The construction of quasi-differentials and basic machinery of convex analysis made
it possible to develop an extensive quasi-differential calculus important for various
applications; see, e.g., the collection of papers [103] among other publications on
this and related topics.
Since it is not the case that every (even continuous) function has the directional derivative in each direction (the simplest example is $f(x) := x\sin(1/x)$ for $x \ne 0$ with $f(0) := 0$
Sarabi [222] for a general chain rule of this type in finite dimensions. The notion
of epi-differentiability used in the obtained calculus rules is taken from the book by
Rockafellar and Wets [317].
Both the generalized gradient and the contingent subdifferential from Defini-
tion 7.21 are obtained by the same duality scheme of convex analysis (7.125) by
replacing the classical directional derivative with the corresponding extended direc-
tional derivatives (7.1) and (7.2). But these two directionally generated subdiffer-
entials are essentially different from each other for nonconvex functions. Indeed,
Clarke’s generalized directional derivative f ◦ (x; v) can be fully determined by the
generalized gradient via representation (7.31), which fails for the Dini/contingent
constructions even in the simplest case where f (x) := −|x| at x = 0 ∈ R. Note
that the full duality relationship (7.31) means in fact that any locally Lipschitzian
function is extendedly quasi-differentiable (in the line of Pshenichnyi) by replacing
the classical directional derivative in (7.125) by its generalized Clarke counterpart.
The calculus rules for generalized gradients presented in Subsections 7.3.2 and
7.4.2 mainly follow Clarke’s book [76]. Note that the upper estimate of the general-
ized gradient of the minimum function is the same as for the maximum one, which is
due to the two-sided symmetry property (7.54); see the discussions in Remark 7.37.
The calculus rules for the contingent subdifferential established in Subsection 7.3.3
in the equality form have been recently obtained by Mohammadi and Mordukhovich
[221] under the notion of contingent regularity introduced therein. The meaning of
this property (7.58) is to postulate the full duality (quasi-differentiability) rela-
tionship of type (7.125) for the contingent constructions. As illustrated by simple
examples in the text, the contingent regularity of locally Lipschitzian functions may
be strictly weaker than its (Clarke) directional regularity counterpart even for dif-
ferentiable functions on R. It is also different from other regularity notions of this
type, which were comprehensively studied by Bounkhel and Thibault [59]. Note also
that paper [221] contains appropriate versions of the contingent calculus results to
extended-real-valued functions via the modified subderivative (7.5) and the corre-
sponding subdifferential (7.34) in arbitrary (incomplete) normed spaces.
The mean value theorem in terms of the generalized gradients of Lipschitz con-
tinuous functions in Theorem 7.44(a) is due to Gérald Lebourg [200], a student
of Ivar Ekeland. Its contingent counterpart in Theorem 7.44(b) is new. Note that
the parallel proof of both results given here is different from those in [200] and in
Clarke’s book [76] for generalized gradients. As a consequence of Theorem 7.44, it
is shown in Theorem 7.46 that if either one of the symmetric constructions ∂f and
∂− f from (7.65) are monotone mappings for a Lipschitz continuous function f , then
f must be convex; see also Exercise 7.136(b) for further extensions. The results of
Theorems 7.44(a) and 7.46(a) go back to [76], while assertions (b) of these theorems
are new. Far-going generalizations of Theorem 7.46(a) are obtained by Correa et al.
[85] for axiomatically defined subdifferentials of extended-real-valued l.s.c. functions
on Banach spaces by employing Zagrodny’s approximate mean value theorem [358]
that has been already used in Chapter 5 and discussed in the commentaries therein.
One of the most crucial differences between the contingent subdifferential and
generalized gradient from Definition 7.21 is that the former is a nonsmooth extension
of the classical derivative at the reference point, while the latter extend the strict
derivative therein; we have discussed this at the beginning of Section 7.5. Formally
the notion of (Fréchet) strict derivative at a point was introduced by Leach [199]
in 1961, but in fact this notion was already used in 1950 in the fundamental paper
[145] by Lawrence Graves, from the famous Chicago School of the calculus of vari-
ations, to prove what is now known as the Lyusternik-Graves theorem of nonlinear
analysis. The notions of strict derivatives, with respect to various bornologies, play
an important role in many aspects of convex and variational analysis. Our exposition
in Subsection 7.5.1 mainly follows the book by Phelps [290].
Theorem 7.56 is due to Clarke [76], while Theorem 7.58 can be found in Borwein
[44] with a different proof. Let us mention to this end the notions of “weak differ-
entiability” and “strict-weak differentiability” of single-valued mappings between
Banach spaces with respect to various bornologies. These notions, introduced by
Mordukhovich and Wang [261], may be weaker than even the classical Gâteaux differentiability for Lipschitzian mappings with values in the sequence space $\ell^2$;
see [261] and Mordukhovich’s book [228] with applications to graphical regularity
of such mappings and other issues.
Section 7.6 first presents a simple proof of the fundamental Rademacher’s the-
orem from geometric measure theory in finite-dimensional spaces [298]. This result
discovered in 1919 by Hans Rademacher (1892–1969), a student of Constantin
Carathéodory, is often used in convex and variational analysis without giving a
proof. The only exception known to us is the second edition of the book by Jonathan
Borwein and Adrian Lewis [48] with the proof different from the one given in Theo-
rem 7.60. After numerous erroneous attempts and false counterexamples, an exten-
sion of this result to the a.e. Fréchet differentiability of Lipschitz continuous func-
tions defined on Asplund spaces was established by David Preiss in [293] with a very
involved and delicate proof.
Theorem 7.61 on the limiting gradient representation of the generalized gradient
is due to Clarke [74, 76], which can be taken (and often is) as the definition of
generalized gradient of a locally Lipschitzian function. The set under the convex
hull in (7.72) was defined as early as 1972 by Naum Shor (1937–2006) under the name of the set of almost-gradients of a locally Lipschitzian function, while the convex hull of this set was called in [324] the set of generalized almost-gradients; see
also Shor’s book [325] and further algorithmic developments in, e.g., [128, 182, 297]
along with many subsequent publications.
Theorem 7.63 on subdifferentiation of indefinite integrals is taken from Clarke
[76], while Subsection 7.6.3 contains some additional material and treats all of this
differently; cf. the paper by Chieu [73] for similar elaborations for the limiting subd-
ifferential of Mordukhovich. Let us also mention the opposite direction in integration
theory of variational analysis concerning the possibility to determine a function from
its subdifferential up to a constant. The results in this direction go back to Rock-
afellar [307] for convex functions and then were largely extended by Thibault and
Zagrodny [335] and by Thibault and Zlateva [336] to subdifferentials of nonconvex
functions based on Zagrodny’s approximate mean value theorem.
Recall again that the definitions and major properties of Clarke’s generalized
directional derivative and generalized gradient discussed above are strongly related
to the local Lipschitz continuity of the function in question. The following Lipschitz-
backed procedure to extend the generalized gradient construction to non-Lipschitzian
functions was developed in [74, 76]: define first the tangent cone to a set by using
Clarke’s generalized directional derivative of the Lipschitz continuous distance func-
tion associated with the set, then define the normal cone to the set by using
the duality/polarity relation with tangents, and finally define the subdifferential of
f : X → R at x via the normal cone to the epigraph epi(f ) at (x, f (x)). However,
proceeding in this way does not provide an appropriate convex generalized direc-
tional (sub)derivative extending f ◦ (x, v) in such a way that the full duality corre-
spondence with ∂f (x) holds for non-Lipschitzian functions.
This has been done by Rockafellar in the series of publications [310–312],
where he introduced appropriate subderivative constructions for general classes of
extended-real-valued functions ensuring the non-Lipschitzian counterparts of (7.33)
and (7.42). If f : X → R is finite at x and lower semicontinuous around this point,
Rockafellar’s generalized directional derivative, or upper subderivative, is defined by
$$f^{\uparrow}(\bar x; v) := \sup_{\gamma > 0}\ \limsup_{\substack{x \xrightarrow{f} \bar x\\ t \downarrow 0}}\ \inf_{\|z - v\| \le \gamma} \frac{f(x + tz) - f(x)}{t},$$
where the symbol $x \xrightarrow{f} \bar x$ means that $x \to \bar x$ with $f(x) \to f(\bar x)$, which is of course
redundant for continuous functions. The obtained full duality relationships
$$\partial f(\bar x) = \big\{x^* \in X^* \,\big|\, \langle x^*, v\rangle \le f^{\uparrow}(\bar x; v),\ v \in X\big\}, \qquad f^{\uparrow}(\bar x; v) = \sup_{x^* \in \partial f(\bar x)} \langle x^*, v\rangle \tag{7.128}$$
are the key to achieve adequate calculus rules for both constructions in (7.128) under
appropriate qualification conditions by using the machinery of convex analysis.
Note that some important properties of Clarke’s constructions for Lipschitz func-
tions are lost for their non-Lipschitzian counterparts. This includes the symmetry
property (7.54), the nonemptiness and boundedness of ∂f (x), and the robustness
$$
\partial f(\bar{x}) = \mathop{\mathrm{Lim\,sup}}_{x \xrightarrow{f} \bar{x}} \partial f(x),
\tag{7.129}
$$
which fails for l.s.c. functions on Rn while being valid for the convex subdifferential.
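For a concrete convex example, the duality in (7.128) can be sanity-checked numerically. The sketch below is ours, not from the text; the grid resolution and step size t are arbitrary choices. It takes f(x) = |x| on the real line at x̄ = 0, where the upper subderivative reduces to the ordinary directional derivative |v| and ∂f(0) = [-1, 1]:

```python
# Numeric sanity check (ours, not from the book) of the duality (7.128) for the
# convex function f(x) = |x| at xbar = 0: the upper subderivative reduces here
# to the directional derivative f'(0; v) = |v|, and the support function of
# the subdifferential ∂f(0) = [-1, 1] is also |v|.
import numpy as np

def directional_derivative(f, xbar, v, t=1e-8):
    # one-sided difference quotient approximating f'(xbar; v)
    return (f(xbar + t * v) - f(xbar)) / t

def support_of_subdiff(v):
    # sup of x* . v over x* in ∂f(0) = [-1, 1], sampled on a grid
    grid = np.linspace(-1.0, 1.0, 2001)
    return float(np.max(grid * v))

for v in (2.0, -3.0, 0.5):
    lhs = directional_derivative(abs, 0.0, v)
    rhs = support_of_subdiff(v)
    assert abs(lhs - rhs) < 1e-6
```

For genuinely non-Lipschitzian functions such a raw difference quotient no longer suffices, which is precisely why the limiting constructions discussed above are needed.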
As seen above, the distance functions associated with nonempty subsets of
normed spaces play a very important role in many aspects of convex and varia-
tional analysis. In Chapter 3 we conducted a comprehensive study of the distance
functions for convex sets in normed spaces by calculating their subgradients at in-set
and out-of-set points and providing detailed commentaries on the obtained results and
related issues. Furthermore, Chapter 6 presented results and commentaries on the
class of signed distance functions without, however, studying their subdifferentials. Now
Section 7.7 provides such a study in the case of convex sets.
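As a concrete illustration of the objects involved, consider the closed unit ball Ω of R2. The short numeric sketch below is ours, not taken from the theorems of Section 7.7; the finite-difference step is an arbitrary choice. It confirms that the signed distance d(x; Ω) − d(x; Ω^c) equals ‖x‖ − 1 both inside and outside Ω and is differentiable away from the origin with gradient x/‖x‖:

```python
# A small numeric illustration (ours, not from the book): for the closed unit
# ball Ω in R^2, the signed distance equals ||x|| - 1 both inside and outside
# the ball, so away from the origin it is differentiable with gradient x/||x||.
import numpy as np

def signed_dist_ball(x):
    # signed distance to the closed unit ball of R^2
    return np.linalg.norm(x) - 1.0

def signed_dist_gradient(x):
    # gradient of ||x|| - 1 for x != 0
    return x / np.linalg.norm(x)

for x in (np.array([3.0, 4.0]), np.array([0.3, 0.4])):
    g = signed_dist_gradient(x)
    for i in range(2):  # central finite-difference check per coordinate
        e = np.zeros(2)
        e[i] = 1e-6
        fd = (signed_dist_ball(x + e) - signed_dist_ball(x - e)) / 2e-6
        assert abs(fd - g[i]) < 1e-5
```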
The subdifferential calculations for the signed distance function given in Theo-
rem 7.79 were originally obtained by Luo et al. [212]. We present here another proof
taken from our recent paper with Cuong and Wells [92]. This proof involves the con-
structions and results of nonconvex generalized differentiation, which are certainly of independent interest in both finite and infinite dimensions.
To the best of our knowledge, the regular subdifferential construction (7.79)
and the corresponding regular normal cone were introduced in 1974 by Bazaraa,
Goode and Nashed [35] motivated by deriving necessary optimality conditions in
minimax problems. The original name for (7.79) was “the set of ≥ gradients.” The
definition in (7.79) is a one-sided lower limit version of the classical Fréchet derivative, which explains the use of the name “Fréchet subdifferential” for this construction although Fréchet himself had nothing to do with it. The name “regular subdiffer-
ential” comes from Rockafellar and Wets [317]. Note that the subdifferential (7.79)
has been actively used in the theory of viscosity solutions of Hamilton-Jacobi partial
differential equations starting with the fundamental paper by Crandall and Lions
[86]. We refer the reader to the survey paper by Kruger [189] on Fréchet-type con-
structions.
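The one-sided lower-limit character of definition (7.79) can be tested numerically in the simplest setting. The sketch below is ours; the sampling radius and tolerance are arbitrary choices. It checks membership in the regular subdifferential of f(x) = |x| at x̄ = 0 via the defining liminf inequality:

```python
# A sampled check (ours, not from the book) of the regular subdifferential
# condition for f(x) = |x| at xbar = 0 on the real line: xstar belongs to the
# regular subdifferential iff liminf_{x -> 0} (|x| - xstar * x)/|x| >= 0.
import numpy as np

def in_regular_subdiff_abs(xstar, radius=1e-3, samples=4001):
    # sample the difference quotient on a punctured neighborhood of 0
    xs = np.linspace(-radius, radius, samples)
    xs = xs[xs != 0.0]
    q = (np.abs(xs) - xstar * xs) / np.abs(xs)
    return q.min() >= -1e-9

# the regular subdifferential of |.| at 0 is [-1, 1]
assert in_regular_subdiff_abs(0.5)
assert not in_regular_subdiff_abs(1.5)
```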
Note that the regular subdifferential may not be directionally generated, but this is always the case in finite-dimensional spaces, where the regular subdifferential agrees with the contingent one; see Proposition 7.65, whose proof clearly holds for extended-real-valued functions by using the Dini-Hadamard definition (7.5).
Theorem 7.71(a) on regular subgradients of the distance function at in-set points
is due to Kruger [186]; see also Ioffe [169]. The out-of-set case in Theorem 7.71(b)
via projections and related results can be found in our paper [231]. The out-of-set
representation (7.98) in Theorem 7.73 is again due to Kruger [186] whose proof was
clarified by Bounkhel and Thibault [59].
The limiting subdifferential (7.80) and the limiting normal cone (7.83) were
introduced in 1976 by the first author [223] in the equivalent forms for finite-
dimensional spaces. The infinite-dimensional extensions were given in the subse-
quent (starting with 1980) papers by Mordukhovich and his student Alexander
Kruger [187, 188, 190, 191, 226] by using the weak∗ sequential limit of Fréchet ε-
subdifferential/ε-normal constructions in Banach spaces admitting a Fréchet smooth
renorming. Other infinite-dimensional extensions of the constructions in [223] to var-
ious types of Banach spaces were developed in the series of papers by Alexander Ioffe
starting with his note [168] of 1981; see more references in, e.g., [171, 226, 228, 317].
The way of defining the limiting subdifferential and limiting normal cone in
(7.80) and (7.83), respectively, follows the pattern introduced by Mordukhovich
and his student Yongheng Shao [257], where the symbol x →_f x̄ should be used
when f is discontinuous. This is different from the previous attempt by Kruger and
Mordukhovich involving ε-constructions. Employing definitions (7.80) and (7.83)
in the case where f is l.s.c. and Ω is closed around x, and where the space X is
Asplund (this is more general than the Fréchet smooth renorming of X assumed in
[187, 190]), a complete generalized differential theory with numerous applications
was developed in Mordukhovich’s two-volume monographs [228] and the references
therein. Here we introduce these definitions in arbitrary normed spaces and see that
some results presented in the text and exercises hold in this general framework, while the others require closedness and Asplund space assumptions, which allow us to employ variational/extremal principles and techniques. Observe to this end
a nonconvex extension of Lebourg’s mean value theorem to continuous functions
on Asplund spaces obtained by replacing the generalized gradient of Lipschitzian
functions in (7.62) by the symmetric limiting subdifferential $\partial^{0} f(x) := \partial f(x) \cup (-\partial(-f)(x))$ introduced and utilized by the first author in [224]. This extended mean value theorem is given in [228, Theorem 3.47] in full generality, while its Lipschitzian
versions in finite-dimensional and Fréchet smooth spaces go back to the earlier work
by Kruger and Mordukhovich; see [187, 225, 226].
The regular subdifferential counterpart of the approximate mean value theorem
in Asplund spaces and its l.s.c. extension is due to Mordukhovich and Shao [257]
(see also [228, Theorem 3.49] for a more detailed result), while the previous results
in this direction in Fréchet smooth spaces can be found in Borwein and Preiss [50]
and Loewen [208]. Theorem 7.69 on relationships between the generalized gradient
and limiting subdifferential of locally Lipschitzian functions is taken from [228, 257].
We refer the reader to our papers [231, 233] and the book [228] for various
results on the limiting subdifferential of the distance function and its modifications
in different space settings. Note to this end the remarkable result by Lionel Thibault
[333] who proved the limiting normal cone representation (7.124) via the limiting
subdifferential of the distance function associated with a closed subset of an arbi-
trary Banach space at in-set points. It is interesting to observe that a counterpart
of Thibault’s relationship, with the replacement of Ω by its enlargement Ωr from
(7.96), fails at out-of-set points even for simple nonconvex sets in R2. This was observed
in our paper [231], where the notion of the right-sided limiting subdifferential was
introduced to obtain the desired relationship in Banach spaces; see [228, Theo-
rem 1.101]. The result of Theorem 7.74 can be found in the books by Mordukhovich
[226] and Rockafellar and Wets [317], while here we provide a new simple proof.
Differences of convex (DC) functions, studied in Section 7.8, are also known as delta-convex functions. To the best of our knowledge, such functions defined on finite-dimensional spaces were first considered as early as 1949 by the famous geometer Alexander Alexandrov (1912–1999) in his papers [3, 4] written in Russian. A systematic investigation of DC functions on Rn was started in 1959 by Philip Hartman (1915–2015), who defined continuous DC functions together with their local counterparts and then established their basic properties in [152].
Recent years have witnessed a profound interest in DC optimization, from both
theoretical and algorithmic viewpoints, with numerous applications to practical models of machine learning, facility location, etc. A strong motivation came from
the simple and efficient DC algorithm (DCA) to solve minimization problems with
DC objectives suggested by Pham Dinh Tao and Le Thi Hoai An in [332]. Further
developments on this algorithm and related issues can be found in the papers by
Aragón Artacho and Phan Tu Vuong [7], Bajaj et al. [23], Geremew et al. [140],
Nam et al. [269, 272], and the references therein. These topics with a variety of
applications will be developed in the second volume of our book [240].
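To give a flavor of the DCA iteration mentioned above, here is a minimal one-dimensional sketch. It is ours, not the general algorithm of [332]; the DC splitting and the iteration count are arbitrary choices. It minimizes f(x) = x**4/4 - x**2 written as g - h with the convex parts g(x) = x**4/4 and h(x) = x**2:

```python
# A minimal 1-D sketch (ours, not the general method of [332]) of the DCA step
# x_{k+1} in argmin_x { g(x) - y_k * x } with y_k in ∂h(x_k), applied to
# f(x) = g(x) - h(x), g(x) = x**4/4, h(x) = x**2 (both convex).
import numpy as np

def dca(x0, iters=60):
    x = x0
    for _ in range(iters):
        y = 2.0 * x     # y_k = h'(x_k), a subgradient of h
        x = np.cbrt(y)  # the convex subproblem min g(x) - y*x solves x**3 = y
    return x

# starting from x0 = 1 the iterates converge to sqrt(2), a global minimizer of f
x = dca(1.0)
assert abs(x - np.sqrt(2.0)) < 1e-10
```

Each step linearizes h at the current iterate, so the convex subproblem minimizes a majorant of f; here it has the closed form x = (2 x_k)**(1/3), and the iterates converge linearly to the global minimizer.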
At the other end of the spectrum of developments in DC optimization, we mention applications to theoretical aspects of semi-infinite and bilevel programming given by
Dinh et al. [109, 110] and to the major stability properties of convex multifunctions
obtained by Mordukhovich and Nghia in [248]; see also Mordukhovich’s book [229,
Chapter 7]. Various branches of mathematics, where DC results and ideas are very
fruitful, were overviewed and further developed by Miroslav Bačák and Jonathan
Borwein [20] and by Libor Veselý and Luděk Zajı́ček [342] who also considered DC
mappings with values in normed spaces.
Subsections 7.8.1–7.8.3 are mainly based on the fundamental paper by Hartman
[152] and its infinite-dimensional extensions given in the aforementioned papers
[20, 342]. We specifically mention the remarkable result of Theorem 7.97 known as
the mixing lemma, which is due to Veselý and Zajı́ček [342, Lemma 4.8].
Subsection 7.8.4 presents major calculus rules for subdifferentials of nonconvex
DC functions in terms of the corresponding constructions for convex functions in
DC compositions. The obtained calculus leads to optimality conditions for local and
global minimizers and to nonconvex duality for DC functions.
The regular subdifferential estimates in (7.117) of Theorem 7.111 follow from
more general results of our paper with Nguyen Dong Yen [246]. The limiting subd-
ifferential estimate (7.118) of that theorem seems to appear here for the first time,
while the one for the generalized gradient in (7.119) immediately follows from the
latter.
The conjugate calculation for DC functions from Theorem 7.112 and its extended-real-valued version are due to Jean-Baptiste Hiriart-Urruty [163]; see also [123]. The
latter version immediately implies the nonconvex duality result of Theorem 7.113,
which goes back to John Toland [338] and independently to Ivan Singer [327, 328]
being known as Toland-Singer duality. Among various results related to this duality,
we mention a calculation formula for ε-subdifferentials of DC functions on Banach
spaces obtained by Juan Enrique Martı́nez-Legaz and Alberto Seeger in [214].
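The Toland-Singer identity inf(g - h) = inf(h* - g*) is easy to verify numerically for quadratic data. In the sketch below (ours; the particular g, h, and grid are arbitrary choices) both infima equal -1/4:

```python
# Grid-based numeric check (ours, not from the book) of Toland-Singer duality
# inf_x (g - h)(x) = inf_y (h* - g*)(y) for g(x) = 2x^2 and h(x) = x^2 + x,
# whose Fenchel conjugates are g*(y) = y^2/8 and h*(y) = (y - 1)^2/4.
import numpy as np

t = np.linspace(-5.0, 5.0, 200001)
primal = np.min(2.0 * t**2 - (t**2 + t))        # inf (g - h) = -1/4 at x = 1/2
dual = np.min((t - 1.0)**2 / 4.0 - t**2 / 8.0)  # inf (h* - g*) = -1/4 at y = 2
assert abs(primal - dual) < 1e-8
```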
The necessary optimality condition of Theorem 7.114(a) for local minimizers
in unconstrained DC optimization problems in normed spaces immediately follows
from the subdifferential difference rule of Theorem 7.111(a). We refer the reader to
the paper by Mirjam Dür [115] for an interesting ε-subdifferential extension of this
result in Banach spaces. The ε-subdifferential characterization of global minimizers
of DC functions from Theorem 7.114(b) was established by Hiriart-Urruty in Banach
spaces. The proof of this theorem given in our book can be found in the recent paper
by Burachik, Dao, and Lindstrom [65].
Glossary of Notation and Acronyms
SPACES
R := (−∞, ∞) real line
R+ := [0, ∞) collection of nonnegative numbers
R̄ := (−∞, ∞] extended real line
C collection of complex numbers
R^n n-dimensional Euclidean space
R^n_+ and R^n_- nonnegative and nonpositive orthants of R^n
© Springer Nature Switzerland AG 2022
B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9
SETS
∅ empty set
N set of natural numbers
Ωc complement of Ω ⊂ X in X
Ω◦ polar set of Ω
Ωr r-expansion of Ω
x →_Ω x̄ x converges to x̄ with x ∈ Ω
B(x; r) = Br (x) open ball centered at x with radius r
B̄(x; r) = B̄r(x) closed ball centered at x with radius r
B and B∗ closed unit balls of space and dual space in
question
S and S∗ unit spheres of space and dual space
in question
int(Ω) and ri(Ω) interior and relative interior of Ω
qri(Ω) and iri(Ω) quasi-relative and intrinsic relative interiors
of a convex set Ω
core(Ω) algebraic interior/core of a convex set Ω
lin(Ω) linear closure of Ω
Ω̄ = cl(Ω) closure of Ω
cl∗ (Ω) = co∗ (Ω) weak∗ topological closure of Ω
clF (Ω) closure of Ω relative to F
bd(Ω) set boundary
co(Ω) and clco (Ω) = co(Ω) convex hull and closed convex hull of Ω
cone(Ω) = R+ Ω conic hull of Ω
KΩ convex cone generated by Ω
Ω∞ (x) horizon/asymptotic cone of set Ω at x
aff(Ω) and aff(Ω) affine hull and closed affine hull of Ω
proj x Ω = proj X Ω x-projection of sets in product spaces
Π(x; Ω) = ΠΩ (x) projector of x into Ω
N (x; Ω) normal cone of convex analysis to Ω at x
when Ω is convex
N̂ε(x; Ω) set of ε-normals to Ω at x
N(x; Ω) (Mordukhovich) limiting normal cone to Ω at x
N̂(x; Ω) (Fréchet) regular normal cone to Ω at x
FUNCTIONS
δ(·; Ω) = δΩ (·) indicator function of Ω
ξΩ (·) characteristic function of Ω
σ(·; Ω) = σΩ (·) support function of Ω
dist(·; Ω) = d(·; Ω) = dΩ (·) distance function for Ω
d̂(·; Ω) = d̂Ω(·) signed distance function for Ω
pΩ Minkowski gauge function associated with set Ω
T_Ω^F = T_F(·; Ω) minimal time function associated with dynamic F and target Ω
T̂_Ω^F = T̂_F(·; Ω) signed minimal time function associated with dynamic F and target Ω
dom(f) domain of f : X → R̄
epi(f), hypo(f), gph(f) epigraph, hypograph, graph of f
cont(f ) set of points where function f is continuous
f ∗ and f ∗∗ Fenchel conjugate and biconjugate of f
f∗ concave conjugate of f
Pf perspective function associated with f
f∞ horizon function associated with f
x →_f x̄ x → x̄ with f(x) → f(x̄)
f′_+(x) and f′_-(x) right and left derivatives of f : R → R at x
f′(x) = f′_F(x) = ∇f(x) (Fréchet) derivative/gradient of f at x
f′_G(x) Gâteaux derivative of f at x
f′_H(x) Hadamard derivative of f at x
f′(x; v) directional derivative of f at x in direction v
df (x; v) (Dini, Dini-Hadamard) contingent derivative/
subderivative of f at x in direction v
f ◦ (x; v) (Clarke) generalized directional derivative
f ↑ (x; v) (Rockafellar upper) directional derivative of
f at x in direction v
∂f (x) subdifferential of convex function f at x
∂x f(x, y) partial subdifferential of f(x, y) with respect to x at (x, y)
∂ε f (x) ε-subdifferential/approximate subdifferential
of convex function f at x
∂ − f (x) (Dini) contingent subdifferential
of f at x
∂f (x) (Clarke) generalized gradient of f at x
∂̂f(x) (Fréchet) regular subdifferential of f at x
∂f (x) (Mordukhovich) limiting subdifferential of
f at x
∂ ∞ f (x) singular/horizon subdifferential of f at x
SET-VALUED MAPPINGS
F : X ⇉ Y set-valued mapping/multifunction from X to Y
dom(F ) domain of F
rge(F ) range of F
gph(F ) graph of F
ker(F ) kernel of F
‖F‖ norm of a positively homogeneous mapping
F⁻¹ : Y ⇉ X inverse mapping of F : X ⇉ Y
F (Ω) and F −1 (Ω) image and inverse image/preimage of Ω
under F
F ◦G composition of mappings
Ef epigraphical multifunction associated with
function f
D∗ F (x, y) coderivative of F at (x, y) ∈ gph(F )
ACRONYMS
CEL compactly epi-Lipschitzian (sets)
DC difference of convex functions
TVS topological vector space
LCTV locally convex topological vector (spaces)
l.s.c. lower semicontinuous (functions)
u.s.c. upper semicontinuous (functions)
SNC sequentially normally compact
References
[15] R.J. Aumann, Integrals of set-valued functions. J. Math. Anal. Appl. 12, 1–12
(1965)
[16] A. Auslender, Differential stability in nonconvex and nondifferentiable pro-
gramming. Math. Program. Stud. 10, 29–41 (1979)
[17] A. Auslender, M. Teboulle, Asymptotic Cones and Functions in Optimization
and Variational Inequalities (Springer, New York, 2003)
[18] S. Axler, Measure, Integration & Real Analysis, Graduate Texts in Mathemat-
ics (Springer, 2020)
[19] D. Azé, J.-P. Penot, Uniformly convex and uniformly smooth convex functions.
Ann. Facul. Sci. Toulouse, Sér. 6, 4, 705–730 (1995)
[20] M. Bačák, J.M. Borwein, On difference convexity of locally Lipschitz functions.
Optimization 60, 961–978 (2011)
[21] R. Baier, E. Farkhi, V. Roshchina, On computing the Mordukhovich subd-
ifferential using directed sets of two dimensions, in Variational Analysis and
Generalized Differentiation in Optimization and Control, ed. by R.S. Burachik,
J.-C. Yao (Springer, New York, 2010), pp. 59–94
[22] R. Baire, Sur les fonctions de variables réelles. Ann. Math. 3, 1–123 (1899)
[23] A. Bajaj, B.S. Mordukhovich, N.M. Nam, T. Tran, Solving a continuous mul-
tifacility location problem by DC algorithms. Optim. Meth. Softw. (2020).
https://doi.org/10.1080/10556788.2020.1771335
[24] S. Banach, Sur les opérations dans les ensembles abstraits et leur application
aux équations intégrales. Fund. Math. 3, 133–181 (1922)
[25] S. Banach, Sur les fonctionnelles linéaires, I, II. Stud. Math. 1, 211–216, 223–
229 (1929)
[26] S. Banach, Théorie des Opérations Linéaires (Monografje Matematyczne, I,
Warszawa, 1932)
[27] S. Banach, S. Mazur, Zur Theorie der linearen Dimension. Stud. Math. 4,
100–112 (1933)
[28] S. Banach, H. Steinhaus, Sur le principe de la condensation de singularités.
Fund. Math. 9, 50–61 (1927)
[29] T.Q. Bao, B.S. Mordukhovich, Relative Pareto minimizers for multiobjective
problems: existence and optimality conditions. Math. Program. 122, 301–347
(2010)
[30] T.Q. Bao, B.S. Mordukhovich, A. Soubeyran, Variational analysis in psycho-
logical modeling. J. Optim. Theory Appl. 164, 290–315 (2015)
[31] I. Bárány, A generalization of Carathéodory’s theorem. Disc. Math. 40, 141–152
(1982)
[32] M. Bardi, A boundary value problem for the minimal-time function. SIAM J.
Control Optim. 27, 776–785 (1989)
[33] H.H. Bauschke, J.M. Borwein, W. Li, Strong conical hull intersection property,
bounded linear regularity, Jameson’s property (G), and error bounds in convex
optimization. Math. Program. 86, 135–160 (1999)
[34] H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator
Theory in Hilbert Spaces, 2nd edn. (Springer, New York, 2017)
[35] M.S. Bazaraa, J.J. Goode, M.Z. Nashed, On the cones of tangents with appli-
cations to mathematical programming. J. Optim. Theory Appl. 13, 389–426
(1974)
References 561
[36] D.P. Bertsekas, S.K. Mitter, A descent numerical method for optimization prob-
lems with nondifferentiable cost functionals. SIAM J. Control 11, 637–652
(1973)
[37] D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization
(Athena Scientific, Boston, MA, 2003)
[38] G. Birkhoff, E. Kreyszig, The establishment of functional analysis. Historia
Math. 11, 258–321 (1984)
[39] E. Bishop, R.R. Phelps, A proof that every Banach space is subreflexive. Bull.
Amer. Math. Soc. 67, 97–98 (1961)
[40] N.N. Bogolyubov, Sur quelques méthodes nouvelles dans le calcul des varia-
tions. Ann. Mat. Pura Appl. 7, 249–271 (1929)
[41] V.G. Boltyanskii, The maximum principle in the theory of optimal processes.
Dokl. Akad. Nauk SSSR 119, 1070–1073 (1958)
[42] J.F. Bonnans, A. Shapiro, Perturbation Analysis of Optimization Problems
(Springer, New York, 2000)
[43] T. Bonnesen, W. Fenchel, Theorie der konvexen Körper (Springer, Berlin,
1934)
[44] J.M. Borwein, Minimal cuscos and subgradients of Lipschitz functions, in Fixed
Point Theory and Its Applications, ed. by J.-B. Baillion, M. Thera (Longman,
Essex, UK, 1991), pp. 57–82
[45] J.M. Borwein, Maximal monotonicity via convex analysis. J. Convex Anal.
13, 561–586 (2006)
[46] J.M. Borwein, R. Goebel, Notions of relative interior in Banach spaces. J.
Math. Sci. 115, 2542–2553 (2003)
[47] J.M. Borwein, A.S. Lewis, Partially finite convex programming, Part I: quasi-
relative interiors and duality theory. Math. Program. 57, 15–48 (1992)
[48] J.M. Borwein, A.S. Lewis, Convex Analysis and Nonlinear Optimization, 2nd
edn. (Springer, New York, 2006)
[49] J.M. Borwein, Y. Lucet, B.S. Mordukhovich, Compactly epi-Lipschitzian con-
vex sets and functions in normed spaces. J. Convex Anal. 7, 375–393 (2000)
[50] J.M. Borwein, D. Preiss, A smooth variational principle with applications
to subdifferentiability and differentiability of convex functions. Trans. Amer.
Math. Soc. 303, 517–527 (1987)
[51] J.M. Borwein, H.M. Strójwas, Tangential approximations. Nonlinear Anal. 9,
1347–1366 (1985)
[52] J.M. Borwein, J. Vanderwerff, Differentiability of conjugate functions and per-
turbed minimization principles. J. Convex Anal. 16, 707–711 (2009)
[53] J.M. Borwein, Q.J. Zhu, A survey of subdifferential calculus with applications.
Nonlinear Anal. 38, 687–773 (1999)
[54] J.M. Borwein, Q.J. Zhu, Techniques of Variational Analysis (Springer, New
York, 2005)
[55] R.I. Boţ, Conjugate Duality in Convex Optimization (Springer, Berlin, 2010)
[56] R.I. Boţ, E.R. Csetnek, G. Wanka, Regularity condition via quasi-relative
interior in convex programming. SIAM J. Optim. 19, 217–233 (2008)
[57] R.I. Boţ, G. Wanka, The conjugate of the pointwise maximum of two convex
functions revisited. J. Global Optim. 41, 625–632 (2008)
[58] T. Botts, On convex sets in linear normed spaces. Bull. Amer. Math. Soc. 48,
150–152 (1942)
[123] R. Ellaia, J.-B. Hiriart-Urruty, The conjugate of the difference of convex func-
tions. J. Optim. Theory Appl. 49, 493–498 (1986)
[124] E. Ernst, M. Théra, On the necessity of the Moreau-Rockafellar-Robinson
qualification condition in Banach spaces. Math. Program. 117, 149–161 (2009)
[125] M. Fabian, On minimum principles. Acta Polytech. 20, 109–118 (1983)
[126] M. Fabian, Gâteaux Differentiability of Convex Functions and Topology. Weak
Asplund Spaces (Wiley, New York, 1997)
[127] M. Fabian, B.S. Mordukhovich, Sequential normal compactness versus topo-
logical normal compactness in variational analysis. Nonlinear Anal. 54, 1057–
1067 (2003)
[128] F. Facchinei, J.-S. Pang, Finite-Dimensional Variational Inequalities and
Complementarity Problems, published in two volumes (Springer, New York,
2003)
[129] J. (Gyula) Farkas, Theorie der einfachen Ungleichungen. J. Reine Angew.
Math. 124, 1–27 (1902)
[130] W. Fenchel, On conjugate convex functions. Canad. J. Math. 1, 73–77 (1949)
[131] W. Fenchel, Convex Cones, Sets and Functions, Mimeographed Lecture Notes
(Princeton University, Princeton, NJ, 1951)
[132] W. Fenchel, Convexity through the ages, in Convexity and Its Applications,
ed. by P.M. Gruber, J.M. Wills (Birkhäuser, Basel, 1983), pp. 120–130
[133] R.A. Fisher, Theory of statistical estimation. Proc. Cambridge Philos. Soc.
22, 700–725 (1925)
[134] F. Flores-Bazán, G. Mastroeni, Strong duality in cone constrained nonconvex
optimization. SIAM J. Optim. 23, 153–169 (2013)
[135] M. Fréchet, Sur quelques points du calcul fonctionnel. Rend. Circ. Matem.
Palermo 22, 1–72 (1906)
[136] R.V. Gamkrelidze, On the theory of optimal processes in linear systems. Dokl.
Akad. Nauk SSSR 116, 9–11 (1957)
[137] R.V. Gamkrelidze, On sliding optimal regimes. Soviet Math. Dokl. 3, 559–561
(1962)
[138] R. Gâteaux, Sur les fonctionnelles continues et les fonctionnelles analytiques.
C. R. Acad. Sci. Paris 157, 325–327 (1913)
[139] J. Gauvin, The generalized gradient of a marginal function in mathematical
programming. Math. Oper. Res. 4, 458–463 (1979)
[140] W. Geremew, N.M. Nam, A. Semenov, V. Boginski, E. Pasiliao, A DC pro-
gramming approach for solving multicast network design problems via the
Nesterov smoothing technique. J. Global Optim. 72, 705–729 (2018)
[141] M. Gieraltowska-Kedzierska, F.S. Van Vleck, Fréchet vs. Gâteaux differentia-
bility of Lipschitzian functions. Proc. Amer. Math. Soc. 114, 905–907 (1992)
[142] E. Giner, J.-P. Penot, Subdifferentiation of integral functionals. Math. Pro-
gram. 168, 401–431 (2018)
[143] M.S. Gowda, M. Teboulle, A comparison of constraint qualifications in infinite-
dimensional convex programming. SIAM J. Control Optim. 28, 925–935
(1990)
[144] M.-S. Grad, Vector Optimization and Monotone Operators via Convex Duality
(Springer, Cham, Switzerland, 2015)
[145] L.M. Graves, Some mapping theorems. Duke Math. J. 17, 111–114 (1950)
[146] A. Greenbaum, A.S. Lewis, M.L. Overton, Variational analysis of the Crouzeix
ratio. Math. Program. 164, 229–243 (2017)
[193] H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Sec-
ond Berkeley Symposium on Mathematical Statistics and Probability (Univer-
sity of California Press, Berkeley, CA), pp. 481–492
[194] Y.S. Kupitz, H. Martini, M. Spirova, The Fermat-Torricelli problem, Part I: a
discrete gradient-method approach. J. Optim. Theory Appl. 158, 305–327 (2013)
[195] K. Kuratowski, Une méthode d’élimination des nombres transfinis des raison-
nements mathématiques. Fund. Math. 3, 76–108 (1922)
[196] K. Kuratowski, Topology, I, II (Academic Press, New York, 1966, 1968)
[197] A.G. Kusraev, S.S. Kutateladze, Subdifferentials: Theory and Applications
(Kluwer, Dordrecht, The Netherlands, 1995)
[198] S.R. Lay, Convex Sets and Their Applications (Wiley, New York, 1982)
[199] E.B. Leach, A note on inverse function theorem. Proc. Amer. Math. Soc. 12,
694–697 (1961)
[200] G. Lebourg, Valeur moyenne pour gradient généralisé. C. R. Acad. Sci. Paris
281, 795–798 (1975)
[201] B. Lemaire, Applications of a subdifferential of a convex composite func-
tional to optimal control in variational inequalities, in Nondifferentiable Opti-
mization: Motivations and Applications, vol. 255, ed. by V.F. Demyanov, D.
Pallaschke, Lecture Notes Economics and Mathematical Systems (Springer,
Berlin, 1985), pp. 103–117
[202] E.S. Levitin, B.T. Polyak, Convergence of minimizing sequences in conditional
extremum problems. Soviet Math. Dokl. 7, 764–767 (1966)
[203] A.S. Lewis, Nonsmooth analysis of engenvalues. Math. Program. 84, 1–24
(1999)
[204] A.S. Lewis, H.S. Sendov, Nonsmooth analysis of singular values, Part I: theory.
Set-Valued Anal. 13, 213–241 (2005)
[205] A.S. Lewis, H.S. Sendov, Nonsmooth analysis of singular values, Part II: appli-
cations. Set-Valued Anal. 13, 243–264 (2005)
[206] C. Li, K.F. Ng, Subdifferential calculus rules for supremum functions in convex
analysis. SIAM J. Optim. 21, 782–797 (2011)
[207] C. Li, K.F. Ng, T.K. Pong, The SECQ, linear regularity, and the strong CHIP
for an infinite system of closed convex sets in normed linear spaces. SIAM. J.
Optim. 18, 643–665 (2007)
[208] P.D. Loewen, Limits of Fréchet normals in nonsmooth analysis, in Optimiza-
tion and Nonlinear Analysis, ed. by A. Ioffe, L. Marcus, S. Reich, Pitman Res.
Notes Math. Ser. 244 (Longman, Harlow, Essex, UK, 1992), pp. 178–188
[209] N.N. Luan, J. Yao, N.D. Yen, On some generalized polyhedral convex con-
structions. Numer. Funct. Anal. Optim. 29, 537–570 (2017)
[210] R. Lucchetti, Convexity and Well-Posed Problems (Springer, New York, 2006)
[211] Y. Lucet, J.J. Ye, Sensitivity analysis of the value function for optimization
problems with variational inequality constraints. SIAM J. Control Optim. 40,
699–723 (2001). Erratum in SIAM J. Control Optim. 41, 1315–1319 (2002)
[212] H. Luo, X. Wang, B. Lukens, Variational analysis on the signed distance func-
tions. J. Optim. Theory Appl. 180, 751–774 (2019)
[213] A.A. Lyapunov, Sur les fonctions-vecteurs complètement additives. Izvest.
Akad. Nauk SSSR, Ser. Mat. 3, 465–478 (1940)
[214] J.-E. Martínez-Legaz, A. Seeger, A formula on the approximate subdifferential
of the difference of convex functions. Bull. Austral. Math. Soc. 45, 37–41
(1992)
[299] J. Radon, Mengen konvexer Körper, die einen gemeinsamen Punkt enthalten.
Math. Ann. 83, 113–115 (1921)
[300] S.M. Robinson, Regularity and stability for convex multivalued functions.
Math. Oper. Res. 1, 130–143 (1976)
[301] S.M. Robinson, Some continuity properties of polyhedral multifunctions.
Math. Program. Stud. 14, 206–214 (1981)
[302] R.T. Rockafellar, Convex Functions and Dual Extremum Problems, Doctoral
dissertation, Harvard University, Cambridge, MA (1963)
[303] R.T. Rockafellar, Characterization of the subdifferentials of convex functions.
Pacific J. Math. 17, 497–510 (1966)
[304] R.T. Rockafellar, Extension of Fenchel’s duality theorem for convex functions.
Duke Math. J. 33, 81–89 (1966)
[305] R.T. Rockafellar, Integrals which are convex functionals. Pacific J. Math. 24,
525–539 (1968)
[306] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ,
1970)
[307] R.T. Rockafellar, On the maximal monotonicity of subdifferential mappings.
Pacific J. Math. 33, 209–216 (1970)
[308] R.T. Rockafellar, Integrals which are convex functionals, II. Pacific J. Math.
39, 439–469 (1971)
[309] R.T. Rockafellar, Conjugate Duality and Optimization (SIAM, Philadelphia,
PA, 1974)
[310] R.T. Rockafellar, Directional Lipschitzian functions and subdifferential calcu-
lus. Proc. London Math. Soc. 39, 331–355 (1979)
[311] R.T. Rockafellar, Generalized directional derivatives and subgradients of non-
convex functions. Canad. J. Math. 32, 257–280 (1980)
[312] R.T. Rockafellar, The Theory of Subgradients and Its Applications to Problems
of Optimization: Convex and Nonconvex Functions (Helderman Verlag, Berlin,
1981)
[313] R.T. Rockafellar, Proximal subgradients, marginal values and augmented
Lagrangians in nonconvex optimization. Math. Oper. Res. 6, 424–436 (1981)
[314] R.T. Rockafellar, Lagrange multipliers and subderivatives of optimal value
functions in nonlinear programming. Math. Program. Study 17, 28–66 (1982)
[315] R.T. Rockafellar, Lipschitzian properties of multifunctions. Nonlinear Anal.
9, 867–885 (1985)
[316] R.T. Rockafellar, Extensions of subgradient calculus with applications to opti-
mization. Nonlinear Anal. 9, 665–698 (1985)
[317] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis (Springer, Berlin, 1998)
[318] W. Rudin, Functional Analysis, 2nd edn. (McGraw-Hill, New York, 1991)
[319] A. Ruszczyński, Nonlinear Optimization (Princeton University Press, Prince-
ton, NJ, 2006)
[320] P.A. Samuelson, Foundations of Economic Analysis (Harvard University
Press, Cambridge, Massachusetts, 1947)
[321] J. Schauder, Über die Umkehrung linearer, stetiger Funktionaloperationen.
Stud. Math. 2, 1–6 (1930)
[322] A. Seeger, Convex analysis of spectrally defined matrix functions. SIAM J.
Optim. 7, 679–696 (1997)
[323] N. Shioji, On uniformly convex functions and uniformly smooth functions.
Math. Japonica 41, 641–655 (1995)
[367] X.Y. Zheng, K.F. Ng, Subsmooth semi-infinite and infinite optimization prob-
lems. Math. Program. 134, 365–393 (2012)
[368] M. Zorn, A remark on method in transfinite algebra. Bull. Amer. Math. Soc.
41, 667–670 (1935)
Subject Index
convexity
  of a Cartesian product, 67
  of a direct image, 67
  of a distance function, 225
  of a sum of sets, 69
  of algebraic interior, 75
  of an intersection, 70
  of an inverse image, 67
  of linear closure, 76
  of scalar multiplication with sets, 69
  of set-valued mapping, 69
Correa, Rafael, 306, 308
Crandall, Michael, 549
Cuong, Dang Van, 378, 442

D
Dür, Mirjam, 551
Danskin, John, 306
Dantzig, George, 174
Dao, Minh, 551
Debreu, Gérard, 174
Demyanov, Vladimir, 306, 544
Dentcheva, Darinka, 309
Deville, Robert, 376
Dieudonné, Jean, 64
Dinh, Nguyen, 550
Dini contingent derivative, 446
Dini subderivative, 446, 456
Dini, Ulisse, xii, 445, 545
Dini-Hadamard directional derivative/subderivative, 549
directional derivative, 279, 339
directional regularity, 447
discrete topology, 2
distance function, 222, 505, 507
  convexity, 225
  Lipschitz property, 222
  subdifferential, 228, 232
  subdifferential in a Hilbert space, 229
domain
  of a set-valued mapping, 68
  of extended-real-valued function, 118
dual space, 39, 100
Dubovitskii, Abram, 175, 306
Dunford, Nelson, 64

E
Ekeland variational principle, 312, 313
Ekeland, Ivar, viii, 250, 251, 312, 376
enlargement set, 230
epi-differentiability, 458
Ernst, Emil, 305
essential boundedness, 497
Euclid of Alexandria, 173
Euler, Leonard, 173
extremal principle, 315, 316, 549
  convex, 311, 315, 316, 376
extremal system, 183, 315
extreme point, 116

F
Fabian, Marian, 177, 378
face of a convex set, 115
Farkas lemma, 427, 434
Farkas, Gyula, 442
Fenchel conjugate, 179, 255
  of marginal function, 290
  of singular functions, 368
  chain rule, 270, 278
  maximum rule, 271, 278
  of a spectral composition, 364
  sum rule, 269, 277
Fenchel dual problem, 292
Fenchel strong duality, 293, 299
Fenchel weak duality, 292
Fenchel, Werner, vii, 173, 179, 249, 304, 310
Fenchel-Young inequality, 259
Fermat stationary rule, 202
finite complement topology, 2
finite intersection property, 20
Fréchet differentiability, 340
  generic, 358
  of Fenchel conjugate, 352
  subdifferential characterization, 350
Fréchet strict differentiability, 487, 489
Fréchet, Maurice, 63
Frankowska, Hélène, 250
function
  absolutely symmetric, 367
  characteristic, 26
  conjugate, 255
580 Subject Index
continuous, 7, 136 H
continuous DC, 515 Hadamard strict differentiability, 486,
control, 519 487
convex, 118 Hadamard, Jacques, 378
difference of convex (DC), 514 Hadjisavvas, Nicolas, 177
distance, 122, 222, 505 Hahn, Hans, 64
horizon, 402 Hahn-Banach theorem, 51, 52, 84, 93
indicator, 121, 205 Hantoute, Abderrahim, 306, 308
Lipschitz continuous, 138 Hartman, Philip, 550
locally DC, 525 Hausdorff topological space, 15, 37
locally Lipschitzian, 138, 204, 446 Hausdorff, Felix, 63
lower semicontinuous, 142 Heine, Eduard, 63
marginal, 287 Heine-Borel theorem, 24
Helly theorem, 430
maximum, 129
Helly, Edward, 64, 442
Minkowski gauge, 78, 410
Henrion, René, 308
optimal value, 287
Hilbert space, 5
perspective, 400
Hilbert, David, 63
polyhedral, 241
Hiriart-Urruty, Jean-Baptiste, viii,
positively homogeneous, 51 251, 304, 377, 442, 550
proper, 118 Holmes, Richard, 175
quasiconvex, 123 homeomorphism, 9
signed distance, 510, 548 horizon cone, 398
signed minimal time, 420 hyperplane, 88
strictly convex, 122 closed, 95
subadditive, 51
sublinear, 51 I
support, 263 indicator function, 205
supremum, 283 indiscrete topology, 2
symmetric, 362 inner product, 4
inner product space, 4
interior, 5
G
algebraic, 74
Gâteaux differentiability, 311, 338,
intrinsic relative, 149
345
of a convex set, 73
generic, 355
of convex epigraphs, 138
subdifferential characterization, of convex sets, 72
346 relative, 102, 147
Gâteaux strict differentiability, 487 Ioffe, Alexander, viii, 177, 251, 304,
Gâteaux, Réne, 378 308, 544, 549
Gamkrelidze, Revaz, 173 Iusem, Alfredo, 378
generalized derivative, 446
generic Fréchet differentiability, 358 J
generic Gâteaux differentiability, 355 Jensen, Johan, 173
geometric derivability, 458 Jeyakumar, Jeya, 306
Giner, Emmanuel, 308 Jourani, Abderrahim, 251
Godefroy, Gilles, 376
Goebel, Rafal, 177 K
Graves, Lawrence, 174, 547 Kantorovich, Leonid, 174
Subject Index 581
W Y
Wang, Bingwu, 547 Yen, Nguyen Dong, xii, 550
Wang, Xianfu (Shawn), 548 Young, Laurence Chisholm (L.C.),
Warga, Jack, 173 173
Young, William Henry, 304
weak Asplund space, 354
weak convergence, 41 Z
weak topology, 10, 39, 40 Zălinescu, Constantin, viii, xii, 175,
weak∗ compactness, 45 251, 304, 306, 377, 441
weak∗ convergence, 44 Zagrodny, Darinsz, 377, 547
weak∗ topology, 43 Zajı́ček, Luděk, 550
Zarantonello, Eduardo, 177
weakly closed set, 98
Zheng, Xi Yin, 306
Weierstrass theorem, 25, 377
Zhu, Jim, 175, 251, 306
Weierstrass, Karl, 63, 173 Zizler, Vaclav, 376
Wells, Mike, 442 Zorn’s lemma, 51
Wets, Roger, 251, 304, 548 Zorn, Max, 64