
Springer Series in Operations Research

and Financial Engineering

Series Editors
Thomas V. Mikosch, Københavns Universitet, Copenhagen, Denmark
Sidney I. Resnick, Cornell University, Ithaca, USA
Stephen M. Robinson, University of Wisconsin-Madison, Madison, USA

Editorial Board
Torben G. Andersen, Northwestern University, Evanston, USA
Dmitriy Drusvyatskiy, University of Washington, Seattle, USA
Avishai Mandelbaum, Technion - Israel Institute of Technology, Haifa, Israel
Jack Muckstadt, Cornell University, Ithaca, USA
Per Mykland, University of Chicago, Chicago, USA
Philip E. Protter, Columbia University, New York, USA
Claudia Sagastizábal, IMPA – Instituto Nacional de Matemática Pura e Aplicada,
Rio de Janeiro, Brazil
David B. Shmoys, Cornell University, Ithaca, USA
David Glavind Skovmand, Københavns Universitet, Copenhagen, Denmark
Josef Teichmann, ETH Zürich, Zürich, Switzerland
Bert Zwart, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
The Springer Series in Operations Research and Financial Engineering publishes
monographs and textbooks on important topics in theory and practice of Operations
Research, Management Science, and Financial Engineering. The Series is distin-
guished by high standards in content and exposition, and special attention to timely
or emerging practice in industry, business, and government. Subject areas include:
Linear, integer and non-linear programming including applications; dynamic
programming and stochastic control; interior point methods; multi-objective
optimization; Supply chain management, including inventory control, logistics,
planning and scheduling; Game theory; Risk management and risk analysis, including
actuarial science and insurance mathematics; Queuing models, point processes,
extreme value theory, and heavy-tailed phenomena; Networked systems, including
telecommunication, transportation, and many others; Quantitative finance: portfolio
modeling, options, and derivative securities; Revenue management and quantitative
marketing; Innovative statistical applications such as detection and inference in very
large and/or high dimensional data streams; Computational economics

More information about this series at https://link.springer.com/bookseries/3182


Boris S. Mordukhovich · Nguyen Mau Nam

Convex Analysis and Beyond


Volume I: Basic Theory

With 42 Figures

Boris S. Mordukhovich
Department of Mathematics
Wayne State University
Detroit, MI, USA

Nguyen Mau Nam
Fariborz Maseeh Department of Mathematics and Statistics
Portland State University
Portland, OR, USA

ISSN 1431-8598 ISSN 2197-1773 (electronic)


Springer Series in Operations Research and Financial Engineering
ISBN 978-3-030-94784-2 ISBN 978-3-030-94785-9 (eBook)
https://doi.org/10.1007/978-3-030-94785-9

Mathematics Subject Classification: 46A03, 46A55, 47L07, 47N10, 49J52, 49J53, 52A07, 52A41,
90C26

© Springer Nature Switzerland AG 2022


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To the memory of

JON BORWEIN (1951–2016)

and

DIETHARD PALLASCHKE (1940–2020),

our dear friends

and prominent researchers in convex analysis and beyond


Preface

Convexity has a long history that dates back to the geometers of Ancient
Greece. Probably the first definition of convexity was given by Archimedes
of Syracuse in the third century BC: "There are in a plane certain terminated
bent lines, which either lie wholly on the same side of the straight lines joining
their extremities, or have no part of them on the other side."
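In modern terms, Archimedes' condition corresponds to the now-standard definition, written here in our own notation rather than quoted from the book:

```latex
% A subset $C$ of a vector space is convex if it contains the whole
% segment between any two of its points:
\lambda x + (1 - \lambda) y \in C
\quad \text{whenever } x, y \in C \text{ and } \lambda \in [0, 1].
```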
Over the subsequent centuries, convexity theory has been developed in
various geometric frameworks with outstanding contributions of many great
mathematicians. The most active period in the study of convex sets was in the
late nineteenth century and the early twentieth century with the
quintessential work done by Hermann Minkowski that was summarized in the
books [219, 220] published in 1910–1911 after his death. Minkowski laid the
foundations of the general theory of convex sets in finite-dimensional spaces.
In particular, he established there the fundamental convex separation theorem,
which has since played a crucial role in convex analysis and its
applications.
A systematic study of convex functions was initiated by Werner Fenchel,
who discovered, in particular, seminal results on conjugacy correspondence
and convex duality contained in his very influential mimeographed
lecture notes [131] from a course given at Princeton University in 1951.
Although some constructions and results on generalized differentiation of
convex functions can be found in Fenchel [131], the fundamental notion of
subdifferential (a collection of subgradients) for an extended-real-valued
convex function should be attributed to Jean-Jacques Moreau [264] and
R. Tyrrell Rockafellar [302], who introduced this notion independently in
1963. The revolutionary idea of a set-valued generalized derivative satisfying
rich calculus rules has given rise to convex analysis, a new area of mathematics
where analytic and geometric ideas are so nicely interrelated and jointly
produce beautiful results for sets, set-valued mappings, and functions.
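The notion of subdifferential recalled above can be stated, in standard notation that is ours rather than a quotation, as the set of all subgradients:

```latex
% Subdifferential of an extended-real-valued convex function
% $f\colon X \to (-\infty, +\infty]$ at $\bar{x}$ with $f(\bar{x})$ finite:
\partial f(\bar{x}) := \bigl\{ x^* \in X^* \;\big|\;
  \langle x^*, x - \bar{x} \rangle \le f(x) - f(\bar{x})
  \ \text{for all } x \in X \bigr\}.
```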
A milestone in the consolidation of the new discipline, at least in
finite-dimensional spaces, was Rockafellar’s monograph “Convex Analysis”
[306] published in 1970, which coined the name of this area of mathematics.
Over the subsequent years, numerous strong results have been discovered in

this area and many excellent books have been published on various aspects of
convex analysis and its applications in finite and infinite dimensions. Among
them, we mention the books by Bauschke and Combettes [34], Bertsekas et al.
[37], Borwein and Lewis [48], Boyd and Vandenberghe [62], Castaing and
Valadier [71], Ekeland and Temam [122], Hiriart-Urruty and Lemaréchal
[164] and its abridged version [165], Ioffe and Tikhomirov [174], Nesterov [279],
Pallaschke and Rolewicz [285], Phelps [290], Pshenichnyi [294], and Zălinescu
[361].
It has been well recognized that convex analysis provides the mathematical
foundations for numerous applications, among which convex optimization is
the first to name. The presence of convexity makes it possible not only to
investigate qualitative properties of optimal solutions and derive efficient
optimality conditions, but also to develop and justify numerical algorithms
for solving convex optimization problems with smooth and nonsmooth data.
Convex analysis and optimization have an increasing impact on many areas of
mathematics and applications including control systems, estimation and
signal processing, communications and networks, electronic circuit design,
data analysis and modeling, statistics, economics and finance, etc. In recent
times, convex analysis has become increasingly important for applications
to some new fields of mathematical sciences and practical modeling such as
computational statistics, machine learning, sparse optimization, location
sciences, etc.
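The algorithmic side of nonsmooth convex optimization mentioned above can be illustrated by the classical subgradient method. The following minimal Python sketch is our own illustration, not code from the book; it minimizes the nonsmooth convex function f(x) = |x - 3| with diminishing step sizes.

```python
# Minimal sketch of the subgradient method for the nonsmooth convex
# function f(x) = |x - 3|. Its subdifferential is {sign(x - 3)} for
# x != 3 and the whole interval [-1, 1] at the kink x = 3.

def subgradient(x):
    """Return one subgradient of f(x) = |x - 3| at x."""
    if x > 3:
        return 1.0
    if x < 3:
        return -1.0
    return 0.0  # any value in [-1, 1] is a valid subgradient at the kink


def subgradient_method(x0, iterations=1000):
    """Run the subgradient method with diminishing steps 1/k."""
    x = x0
    for k in range(1, iterations + 1):
        x = x - (1.0 / k) * subgradient(x)
    return x


x_best = subgradient_method(x0=-2.0)
print(x_best)  # close to the minimizer x = 3
```

With diminishing steps the iterates cross the kink and then oscillate around it with ever-smaller amplitude, which is the standard convergence behavior for this method.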
Despite an extensive literature on diverse aspects of convex analysis and
applications, our book has a lot to offer to researchers, students, and practi-
tioners in these fascinating areas. We split the book into two volumes, and
now present to the reader’s attention the first volume, which is mainly
devoted to theoretical aspects of convex analysis and related fields where
convexity plays a crucial role. The second volume [240] addresses various
applications of convex analysis, including the areas listed above.
The first volume is devoted to developing a unified theory of convex sets,
set-valued mappings, and functions in vector and topological vector spaces
with its specifications to Banach and finite-dimensional settings. These
developments and expositions are based on the powerful geometric approach of
variational analysis, which rests on set extremality with its characterizations
and significant modifications in the presence of convexity. This approach
allows us to unify the derivation of fundamental facts of generalized differential
calculus and obtain novel results for convex sets, set-valued mappings,
and functions in finite-dimensional and infinite-dimensional settings.
Some aspects of the geometric approach to convex analysis in finite-
dimensional spaces were given in our previous short book [237] in which the
reader was provided with an easy path to access generalized differentiation of
convex objects in finite dimensions and its applications to theoretical and
algorithmic topics of convex optimization and facility location. Now we
largely extend the previous developments in various directions in both
finite-dimensional and infinite-dimensional spaces while covering a much
broader spectrum of topics and results related to convexity and its
applications.
Besides major topics of convex analysis, we present in this book several
important developments, which are either motivated by extensions
of their convexity prototypes or largely based on convexity methods and
results. The first group includes variational principles of Ekeland’s type for
lower semicontinuous functions that are strongly related to the Bishop–Phelps
density theorems and their proofs for nonsolid convex sets. The second group
concerns convexified nonsmooth analysis of nonconvex functions admitting
convex generalized directional derivatives for which calculus rules and other
properties are based on methods and results of convex analysis. All of this
allows us to title our book as “Convex Analysis and Beyond.”
The book consists of seven chapters and is organized as follows. Chapter 1
addresses the mathematical foundations of convex analysis. For the reader’s
convenience, we make the book self-contained and present here basic concepts
and results on topological spaces and topological vector spaces that are scattered
in the literature, giving short yet rather detailed proofs. The selected
concepts and results are used further to study algebraic, geometric, and
topological properties of convex sets and functions. Also, this chapter includes
the fundamental theorems of functional analysis largely employed in the book,
accompanied by their simplified albeit complete proofs.
Chapter 2 is devoted to basic theory of convexity for sets and functions in
linear spaces and topological vector spaces with some specifications in finite
dimensions. We pay special attention to convex sets and derive for them
various versions of convex separation theorems, which play a pivotal role in
further developments. For extended-real-valued functions, convexity is
defined geometrically; we then give analytical representations, describe
operations on functions that preserve convexity, and study major topological
properties of convex functions. The last section of this chapter contains
more recent material on generalized relative interiors of convex sets in
infinite-dimensional spaces, including new results in this direction.
In Chapter 3 we start the exposition and development of a unified theory of
generalized differentiation for convex sets, set-valued mappings, and
extended-real-valued functions based on the aforementioned variational
geometric approach. We mainly concentrate here on the general setting of
topological vector spaces, while also presenting important finite-dimensional
specifications. The essence of our approach is the notion of set extremality and
the corresponding convex extremal principle for systems of sets, which goes
along with the general extremal principle of variational analysis while
significantly reflecting the specific features of convex sets and being closely
related to convex separation. The main calculus result obtained is the normal cone
intersection rule derived under various qualification conditions. Based on this
result, we establish comprehensive rules for coderivatives of set-valued
mappings and subgradients of extended-real-valued functions in topological
vector space and finite-dimensional settings with special elaborations for
classes of maximum and distance functions. Finally, in this chapter, we present
recent and new results on polyhedral calculus rules in topological vector
spaces that essentially extend their prototypes in finite dimensions.
Chapter 4 turns to Fenchel conjugates of extended-real-valued
functions, which are among the strongest tools of convex analysis and
are closely related to convex duality. Based on the previous study of set
extremality and convex separation, we first develop here a comprehensive
conjugate calculus under appropriate qualification conditions in three major
settings: in general topological vector spaces, using nonempty interiors of sets
and the continuity of functions; in topological vector spaces under polyhedrality
assumptions, using quasi-relative interiors of sets; and in finite dimensions,
using relative interiors.
Furthermore, enhanced rules of conjugate and generalized differential calculus
are developed in this chapter under relaxed qualification conditions by
employing variational techniques in Banach spaces. Special attention is paid
to subdifferentiation of the pointwise suprema of convex functions over
infinite sets, using relationships between subgradients and directional
derivatives, and to computing subgradients and conjugates of marginal/
optimal value functions, which are highly important for numerous applica-
tions. Finally, Chapter 4 presents major developments on Fenchel duality
including quite recent and new results on this topic in various space frame-
works and under diverse qualification conditions.
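For reference, the Fenchel conjugate discussed in this chapter is defined by the standard supremum formula below, together with the Fenchel–Young inequality it immediately yields; the notation here is ours, not quoted from the book:

```latex
% Fenchel conjugate of $f\colon X \to (-\infty, +\infty]$ and the
% resulting Fenchel--Young inequality:
f^*(x^*) := \sup_{x \in X} \bigl\{ \langle x^*, x \rangle - f(x) \bigr\},
\qquad
f(x) + f^*(x^*) \ge \langle x^*, x \rangle
\quad \text{for all } x \in X, \ x^* \in X^*.
```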
Chapter 5 contains complete proofs of major variational principles of
variational analysis and their convex counterparts. We highlight, in partic-
ular, novel approximate and exact versions of the extremal principle for closed
convex sets in general Banach spaces. These results give necessary and suf-
ficient conditions for set extremality that are different in several aspects from
the corresponding versions of the extremal principle in nonconvex settings
given in the book of Mordukhovich [228]. As a consequence of the extremal
principle, we establish new approximate and exact characterizations of convex
separation for closed convex subsets of Banach spaces without any (gener-
alized) relative interiority assumptions. Among the topics presented in this
chapter, where the developed variational principles and arguments play a
large role, we mention calculus of ε-subgradients with and without qualifi-
cation conditions, mean value theorems for continuous and lower semicon-
tinuous convex functions, maximal monotonicity of subgradient mappings,
and subdifferential characterizations of Gâteaux and Fréchet differentiability
together with their generic versions. This chapter is concluded by considering
matrix-dependent spectral and singular functions, giving a simple proof
of the seminal von Neumann trace inequality and a subsequent subdifferential
study of these functional classes, which are highly important in
applications.
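The classical statement of Ekeland's variational principle, central to this chapter, reads in standard form (our paraphrase, not a quotation from the book):

```latex
% Ekeland's variational principle: let $(X, d)$ be a complete metric
% space and let $f\colon X \to (-\infty, +\infty]$ be proper, lower
% semicontinuous, and bounded below. If $f(x_0) \le \inf_X f + \varepsilon$
% for some $\varepsilon > 0$, then for every $\lambda > 0$ there exists
% $\bar{x} \in X$ such that
f(\bar{x}) \le f(x_0), \qquad d(\bar{x}, x_0) \le \lambda, \qquad
f(x) + \frac{\varepsilon}{\lambda}\, d(x, \bar{x}) > f(\bar{x})
\ \text{ for all } x \ne \bar{x}.
```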
In Chapter 6 we address miscellaneous topics of convex analysis that
combine classical results with more recent themes that play a crucial role in a
variety of theoretical and algorithmic applications to practical models
considered in the second volume of our book. Classical topics include
the Carathéodory, Helly, and Radon theorems, the Farkas lemma, duality relationships
between tangent and normal cones and their calculations for polyhedral sets,
horizon cones and horizon functions at infinity, etc. More recent develop-
ments concern Nesterov’s smoothing techniques and related topics on strong
convexity and strong monotonicity in finite and infinite dimensions, the study
of perspective functions at infinity as well as of signed distance and minimal
time functions together with the new class of signed minimal time functions.
Although the main emphasis in the study of these classes of functions is on
their convex analysis, some important results do not require any convexity
assumptions.
The final Chapter 7 addresses nondifferentiable and nonconvex functions,
which distinguishes it from the previous chapters. However, the study of such
functions is mainly based on the machinery of convex analysis. This truly
goes beyond convex analysis by using certain convexification procedures. The
major attention is paid in this chapter to the parallel investigation of two
convex-valued directionally generated subdifferentials of locally Lipschitzian
functions on normed spaces that are associated with Clarke’s generalized
directional derivative and the Dini contingent derivative/subderivative. We
present comprehensive calculus rules and other results for these and related
constructions including quite recent and new developments. Some properties
of regular and limiting subgradients are also reviewed here with their usage in
deriving precise subdifferential formulas for the signed distance functions
associated with convex sets. Finally, we include major results for a very
important class of DC (difference of convex) functions with applications to
nonconvex duality.
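The directionally generated constructions mentioned above are built from generalized directional derivatives; for a locally Lipschitzian f, Clarke's version and the associated convex-valued subdifferential read, in standard notation of our own choosing:

```latex
% Clarke's generalized directional derivative of a locally Lipschitzian
% function $f$ at $\bar{x}$ in the direction $d$, and the corresponding
% convex-valued generalized gradient:
f^{\circ}(\bar{x}; d) := \limsup_{x \to \bar{x},\; t \downarrow 0}
  \frac{f(x + t d) - f(x)}{t},
\qquad
\partial_C f(\bar{x}) := \bigl\{ x^* \;\big|\;
  \langle x^*, d \rangle \le f^{\circ}(\bar{x}; d)
  \ \text{for all } d \bigr\}.
```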
Each chapter contains an exercise section, where we formulate numerous
exercises with different levels of difficulty and provide hints for some of them.
Many figures and examples are given throughout the whole text. Furthermore,
the last section of each chapter presents extensive commentaries, which play a
highly significant role in the book. Besides detailed historical information and
reviewing the genesis of major ideas and motivations, we provide in the
commentaries rather elaborate discussions of some related topics that are
not included in the basic text; e.g., relationships with similar results of non-
convex variational analysis, subdifferentiation of integral functionals, direc-
tionally generated subgradient mappings for nonconvex extended-real-valued
functions, etc. All of this makes the book more complete and leads the
reader to additional advanced studies.
Different parts of this book aim at their primary groups of readers. The
entire book should be of interest to experts in convex and variational anal-
ysis, optimization, and their numerous applications as well as for mathe-
maticians and applied scientists in other areas who wish to learn more about this
subject. Based on our own experience in teaching some parts of this book at
Portland State University and Wayne State University, we envision that the
book with the exercises therein will be useful for teaching graduate classes in
mathematical sciences that are also accessible to advanced students in
economics, engineering, and other applied areas. Large parts of the book
concerning convex analysis in finite-dimensional and normed spaces present
readily accessible material for upper undergraduate students.
Over the years of our work on the book, we have enjoyed fruitful discus-
sions with many prominent experts in convex and variational analysis, opti-
mization, and their applications whose publications are included in the
reference list. Our special thanks go to Terry Rockafellar, to the late Jon
Borwein and Diethard Pallaschke, and also to Yurii Nesterov, Nguyen Dong
Yen, and Constantin Zălinescu. We are very grateful to all our collaborators
on the papers and projects that are used in the book. Our students Anuj Bajaj,
Liam Jemison, Scott Lindstrom, Will Maxwell, Dao Nguyen, Trang Nguyen,
Nguyen Xuan Quy, and Gary Sandine helped us in the book preparation and
proofreading. Finally, both authors thank the National Science Foundation,
while the first author also thanks the Air Force Office of Scientific Research
for their continuing support.

Ann Arbor, MI, USA        Boris S. Mordukhovich
Portland, OR, USA         Nguyen Mau Nam
April 2022
Contents

1 FUNDAMENTALS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Topological Interior and Closure of Sets . . . . . . . . . . . 5
1.1.3 Continuity of Mappings . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Bases for Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.5 Topologies Generated by Families of Mappings . . . . . 10
1.1.6 Product Topology and Quotient Topology . . . . . . . . . 12
1.1.7 Subspace Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.8 Separation Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.1.9 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.10 Connectedness and Disconnectedness . . . . . . . . . . . . . 25
1.1.11 Net Convergence in Topological Spaces . . . . . . . . . . . . 28
1.2 Topological Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.2.1 Basic Concepts in Topological Vector Spaces . . . . . . . 32
1.2.2 Weak Topology and Weak* Topology . . . . . . . . . . . . . 39
1.2.3 Quotient Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.3 Some Fundamental Theorems of Functional Analysis . . . . . . 49
1.3.1 Hahn-Banach Extension Theorem . . . . . . . . . . . . . . . . 50
1.3.2 Baire Category Theorem . . . . . . . . . . . . . . . . . . . . . . . 54
1.3.3 Open Mapping Theorem . . . . . . . . . . . . . . . . . . . . . . . 56
1.3.4 Closed Graph Theorem and Uniform Boundedness
Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 58
1.4 Exercises for Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 60
1.5 Commentaries to Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . .. 63
2 BASIC THEORY OF CONVEXITY . . . . . . . . . . . . . . . . . . . . . . . 65
2.1 Convexity of Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.1.1 Basic Definitions and Elementary Properties . . . . . . . 65
2.1.2 Operations on Convex Sets and Convex Hulls . . . . . . 69


2.2 Cores, Minkowski Functions, and Seminorms . . . . . . . . . . . . 74


2.2.1 Algebraic Interior and Linear Closure . . . . . . . . . . . . . 74
2.2.2 Minkowski Gauges . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.2.3 Seminorms and Locally Convex Topologies . . . . . . . . . 80
2.3 Convex Separation Theorems . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.3.1 Convex Separation in Vector Spaces . . . . . . . . . . . . . . 85
2.3.2 Convex Separation in Topological Vector Spaces . . . . 95
2.3.3 Convex Separation in Finite Dimensions . . . . . . . . . . . 102
2.3.4 Extreme Points of Convex Sets . . . . . . . . . . . . . . . . . . 114
2.4 Convexity of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
2.4.1 Descriptions and Properties of Convex Functions . . . . 118
2.4.2 Convexity under Differentiability . . . . . . . . . . . . . . . . 124
2.4.3 Operations Preserving Convexity of Functions . . . . . . 128
2.4.4 Continuity of Convex Functions . . . . . . . . . . . . . . . . . 136
2.4.5 Lower Semicontinuity and Convexity . . . . . . . . . . . . . 142
2.5 Extended Relative Interiors in Infinite Dimensions . . . . . . . . 147
2.5.1 Intrinsic Relative and Quasi-Relative Interiors . . . . . . 147
2.5.2 Convex Separation via Extended Relative
Interiors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2.5.3 Extended Relative Interiors of Graphs and
Epigraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.6 Exercises for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
2.7 Commentaries to Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . 173
3 CONVEX GENERALIZED DIFFERENTIATION . . . . . . . . . . . . 179
3.1 The Normal Cone and Set Extremality . . . . . . . . . . . . . . . . . 179
3.1.1 Basic Definition and Normal Cone Properties . . . . . . . 180
3.1.2 Set Extremality and Convex Extremal Principle . . . . . 182
3.1.3 Normal Cone Intersection Rule in Topological
Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
3.1.4 Normal Cone Intersection Rule in Finite
Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
3.2 Coderivatives of Convex-Graph Mappings . . . . . . . . . . . . . . . 194
3.2.1 Coderivative Definition and Elementary
Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
3.2.2 Coderivative Calculus in Topological Vector
Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
3.2.3 Coderivative Calculus in Finite Dimensions . . . . . . . . 200
3.3 Subgradients of Convex Functions . . . . . . . . . . . . . . . . . . . . . 201
3.3.1 Basic Definitions and Examples . . . . . . . . . . . . . . . . . . 201
3.3.2 Subdifferential Sum Rules . . . . . . . . . . . . . . . . . . . . . . 212
3.3.3 Subdifferential Chain Rules . . . . . . . . . . . . . . . . . . . . . 216
3.3.4 Subdifferentiation of Maximum Functions . . . . . . . . . . 219
3.3.5 Distance Functions and Their Subgradients . . . . . . . . 222

3.4 Generalized Differentiation under Polyhedrality . . . . . . . . . . 232


3.4.1 Polyhedral Convex Separation . . . . . . . . . . . . . . . . . . . 232
3.4.2 Polyhedral Normal Cone Intersection Rule . . . . . . . . . 239
3.4.3 Polyhedral Calculus for Coderivatives and
Subdifferentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
3.5 Exercises for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
3.6 Commentaries to Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . 249
4 ENHANCED CALCULUS AND FENCHEL DUALITY . . . . . . . 255
4.1 Fenchel Conjugates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
4.1.1 Definitions, Examples, and Basic Properties . . . . . . . . 255
4.1.2 Support Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
4.1.3 Conjugate Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
4.2 Enhanced Calculus in Banach Spaces . . . . . . . . . . . . . . . . . . . 273
4.2.1 Support Functions of Set Intersections . . . . . . . . . . . . 273
4.2.2 Refined Calculus Rules . . . . . . . . . . . . . . . . . . . . . . . . 275
4.3 Directional Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
4.3.1 Definitions and Elementary Properties . . . . . . . . . . . . 279
4.3.2 Relationships with Subgradients . . . . . . . . . . . . . . . . . 280
4.4 Subgradients of Supremum Functions . . . . . . . . . . . . . . . . . . 283
4.4.1 Supremum of Convex Functions . . . . . . . . . . . . . . . . . 283
4.4.2 Subdifferential Formula for Supremum Functions . . . . 285
4.5 Subgradients and Conjugates of Marginal Functions . . . . . . . 286
4.5.1 Computing Subgradients and Another Chain Rule . . . 287
4.5.2 Conjugate Calculations for Marginal Functions . . . . . 290
4.6 Fenchel Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
4.6.1 Fenchel Duality for Convex Composite Problems . . . . 292
4.6.2 Duality Theorems via Generalized Relative
Interiors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
4.7 Exercises for Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
4.8 Commentaries to Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . 304
5 VARIATIONAL TECHNIQUES AND FURTHER
SUBGRADIENT STUDY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
5.1 Variational Principles and Convex Geometry . . . . . . . . . . . . 311
5.1.1 Ekeland’s Variational Principle and Related
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
5.1.2 Convex Extremal Principles in Banach Spaces . . . . . . 315
5.1.3 Density of ε-Subgradients and Some
Consequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
5.2 Calculus Rules for ε-Subgradients . . . . . . . . . . . . . . . 322
5.2.1 Exact Sum and Chain Rules for ε-Subgradients . . . . . 322
5.2.2 Asymptotic ε-Subdifferential Calculus . . . . . . . . . . . . . 325
5.3 Mean Value Theorems for Convex Functions . . . . . . . . . . . . . 328

5.3.1 Mean Value Theorem for Continuous Functions . . . . . 328


5.3.2 Approximate Mean Value Theorem . . . . . . . . . . . . . . . 330
5.4 Maximal Monotonicity of Subgradient Mappings . . . . . . . . . . 335
5.5 Subdifferential Characterizations of Differentiability . . . . . . . 338
5.5.1 Gâteaux and Fréchet Differentiability . . . . . . . . . . . . . 338
5.5.2 Characterizations of Gâteaux Differentiability . . . . . . . 346
5.5.3 Characterizations of Fréchet Differentiability . . . . . . . 350
5.6 Generic Differentiability of Convex Functions . . . . . . . . . . . . 353
5.6.1 Generic Gâteaux Differentiability . . . . . . . . . . . . . . . . 354
5.6.2 Generic Fréchet Differentiability . . . . . . . . . . . . . . . . . 356
5.7 Spectral and Singular Functions in Convex Analysis . . . . . . . 359
5.7.1 Von Neumann Trace Inequality . . . . . . . . . . . . . . . . . 359
5.7.2 Spectral and Symmetric Functions . . . . . . . . . . . . . . . 362
5.7.3 Singular Functions and Their Subgradients . . . . . . . . 366
5.8 Exercises for Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
5.9 Commentaries to Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . 375
6 MISCELLANEOUS TOPICS ON CONVEXITY . . . . . . . . . . . . . . 381
6.1 Strong Convexity and Strong Smoothness . . . . . . . . . . . . . . . 381
6.1.1 Basic Definitions and Relationships . . . . . . . . . . . . . . . 381
6.1.2 Strong Convexity/Strong Smoothness via
Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
6.2 Derivatives of Conjugates and Nesterov’s Smoothing . . . . . . 391
6.2.1 Differentiability of Conjugate Compositions . . . . . . . . 391
6.2.2 Nesterov’s Smoothing Techniques . . . . . . . . . . . . . . . . 393
6.3 Convex Sets and Functions at Infinity . . . . . . . . . . . . . . . . . . 398
6.3.1 Horizon Cones and Unboundedness . . . . . . . . . . . . . . . 398
6.3.2 Perspective and Horizon Functions . . . . . . . . . . . . . . . 400
6.4 Signed Distance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
6.4.1 Basic Definition and Elementary Properties . . . . . . . . 405
6.4.2 Lipschitz Continuity and Convexity . . . . . . . . . . . . . . 407
6.5 Minimal Time Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
6.5.1 Minimal Time Functions with Constant Dynamics . . . 410
6.5.2 Subgradients of Minimal Time Functions . . . . . . . . . . 416
6.5.3 Signed Minimal Time Functions . . . . . . . . . . . . . . . . . 420
6.6 Convex Geometry in Finite Dimensions . . . . . . . . . . . . . . . . . 424
6.6.1 Carathéodory Theorem on Convex Hulls . . . . . . . . . . . 425
6.6.2 Geometric Version of Farkas Lemma . . . . . . . . . . . . . 427
6.6.3 Radon and Helly Theorems on Set Intersections . . . . . 430
6.7 Approximations of Sets and Geometric Duality . . . . . . . . . . . 432
6.7.1 Full Duality between Tangent and Normal Cones . . . 432
6.7.2 Tangents and Normals for Polyhedral Sets . . . . . . . . . 434
6.8 Exercises for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
6.9 Commentaries to Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . 441

7 CONVEXIFIED LIPSCHITZIAN ANALYSIS . . . . . . . . . . . . . . . 445


7.1 Generalized Directional Derivatives . . . . . . . . . . . . . . . . . . . . 446
7.1.1 Definitions and Relationships . . . . . . . . . . . . . . . . . . . 446
7.1.2 Properties of Extended Directional Derivatives . . . . . . 452
7.2 Generalized Derivative and Subderivative Calculus . . . . . . . . 455
7.2.1 Calculus Rules for Subderivatives . . . . . . . . . . . . . . . . 456
7.2.2 Calculus of Generalized Directional Derivatives . . . . . 461
7.3 Directionally Generated Subdifferentials . . . . . . . . . . . . . . . . 465
7.3.1 Basic Definitions and Some Properties . . . . . . . . . . . . 465
7.3.2 Calculus Rules for Generalized Gradients . . . . . . . . . . 470
7.3.3 Calculus of Contingent Subgradients . . . . . . . . . . . . . . 477
7.4 Mean Value Theorems and More Calculus . . . . . . . . . . . . . . . 480
7.4.1 Mean Value Theorems for Lipschitzian Functions . . . 480
7.4.2 Additional Calculus Rules for Generalized
Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
7.5 Strict Differentiability and Generalized Gradients . . . . . . . . . 486
7.5.1 Notions of Strict Differentiability . . . . . . . . . . . . . . . . 487
7.5.2 Single-Valuedness of Generalized Gradients . . . . . . . . 490
7.6 Generalized Gradients in Finite Dimensions . . . . . . . . . . . . . 493
7.6.1 Rademacher Differentiability Theorem . . . . . . . . . . . . 494
7.6.2 Gradient Representation of Generalized Gradients . . . 495
7.6.3 Generalized Gradients of Antiderivatives . . . . . . . . . . 497
7.7 Subgradient Analysis of Distance Functions . . . . . . . . . . . . . 500
7.7.1 Regular and Limiting Subgradients of Lipschitzian
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
7.7.2 Regular and Limiting Subgradients of Distance
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
7.7.3 Subgradients of Convex Signed Distance
Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
7.8 Differences of Convex Functions . . . . . . . . . . . . . . . . . . . . . . . 514
7.8.1 Continuous DC Functions . . . . . . . . . . . . . . . . . . . . . . 515
7.8.2 The Mixing Property of DC Functions . . . . . . . . . . . . 518
7.8.3 Locally DC Functions . . . . . . . . . . . . . . . . . . . . . . . . . 525
7.8.4 Subgradients and Conjugates of DC Functions . . . . . . 532
7.9 Exercises for Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
7.10 Commentaries to Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . 543
Glossary of Notation and Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559

Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577


1
FUNDAMENTALS

This chapter collects fundamental notions and results in vector spaces, topo-
logical spaces, topological vector spaces, and their specifications that are
widely used in the subsequent chapters of the book to build the basic the-
ory of convexity and its applications. For the reader’s convenience, we present
here proofs and discussions, which allow us to make the book self-contained
and helpful for a broad spectrum of students, researchers, and practitioners.

1.1 Topological Spaces


This section contains an introduction to the theory of topological spaces.
Although there are many excellent books on topological spaces, we present
here important concepts and results from the topological space theory that
are needed for the main parts of the book.

1.1.1 Definitions and Examples

Topological spaces provide a general framework for considering many particular classes of spaces such as the real line, Rn, Hilbert spaces, normed spaces,
topological vector spaces, and metric spaces. Let us start this section with the
definitions of a topology and a topological space.

Definition 1.1 Let X be a set. A collection τ of subsets of X is called a


topology on X if the following conditions are satisfied:
(a) ∅ ∈ τ and X ∈ τ.
(b) If {Gα}α∈I is a collection of elements of τ, then ⋃α∈I Gα ∈ τ.
(c) If G1, G2 ∈ τ, then G1 ∩ G2 ∈ τ.
If τ is a topology on X, then (X, τ ) is called a topological space. Each
element of τ is called an open set. A subset F of X is said to be closed if
its complement F c is open. For two topologies τ1 and τ2 on X, we say that τ1
is weaker than τ2 if τ1 ⊂ τ2 .

© Springer Nature Switzerland AG 2022
B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9_1

Example 1.2 Given a set X, we define below some topological structures on


X.
(a) Let τ1 := P(X) be the collection of all subsets of X. The topology τ1 is
called the discrete topology on X. In (X, τ1 ), every set is both open and
closed.
(b) Let τ2 := {∅, X}. The topology τ2 is called the indiscrete topology on X.
In (X, τ2 ), the empty set and X are the only open sets.
(c) Given a subset A of X, define τ3 := {∅, X, A, Ac }. We can easily verify
that τ3 is a topology on X.
(d) Finally, consider the set collection

τ4 := {A ⊂ X | X \ A is finite} ∪ {∅}

and verify that τ4 is a topology on X called the finite complement topology.
Obviously, ∅ ∈ τ4. Since X \ X = ∅ is a finite set, we have X ∈ τ4. Fix
any two sets G1, G2 ∈ τ4 and show that G1 ∩ G2 ∈ τ4. Indeed, if G1 = ∅
or G2 = ∅, then G1 ∩ G2 = ∅ ∈ τ4. Consider the case where G1 ≠ ∅ and
G2 ≠ ∅. Then X \ G1 and X \ G2 are finite, and hence

X \ (G1 ∩ G2) = (X \ G1) ∪ (X \ G2)

is a finite set. Thus G1 ∩ G2 ∈ τ4. To show that τ4 is closed under arbitrary
unions, fix any collection of sets {Gα}α∈I ⊂ τ4. If Gα = ∅ for every α ∈ I,
then ⋃α∈I Gα = ∅ ∈ τ4. Consider the case where there exists α0 ∈ I such
that Gα0 ≠ ∅. In this case, we get

X \ ⋃α∈I Gα = ⋂α∈I (X \ Gα) ⊂ X \ Gα0.

Since X \ Gα0 is a finite set, X \ (⋃α∈I Gα) is also a finite set, and hence
⋃α∈I Gα belongs to τ4. This tells us that τ4 is a topology on X.
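For a finite set X, the axioms of Definition 1.1 can be checked exhaustively. The following Python sketch (the function name and sample sets are ours, for illustration only) verifies them for the topology τ3 of part (c) above and rejects a collection that is not closed under unions and intersections.

```python
from itertools import chain, combinations

def is_topology(X, tau):
    """Check the axioms of Definition 1.1 for a collection `tau` of
    frozensets over a finite set X."""
    tau = set(tau)
    # (a) the empty set and X itself belong to tau
    if frozenset() not in tau or frozenset(X) not in tau:
        return False
    # (b) closure under unions; for a finite tau it suffices to test
    # the union of every subcollection
    for r in range(1, len(tau) + 1):
        for sub in combinations(tau, r):
            if frozenset(chain.from_iterable(sub)) not in tau:
                return False
    # (c) closure under pairwise (hence finite) intersections
    for G1, G2 in combinations(tau, 2):
        if G1 & G2 not in tau:
            return False
    return True

X = {1, 2, 3, 4}
A = frozenset({1, 2})
Ac = frozenset(X) - A                        # complement of A in X
tau3 = [frozenset(), frozenset(X), A, Ac]    # Example 1.2(c)
print(is_topology(X, tau3))                  # True

bad = [frozenset(), frozenset(X), A, frozenset({2, 3})]
print(is_topology(X, bad))                   # False: e.g. {1,2} ∪ {2,3} is missing
```

Since X is finite, checking the union of every subcollection suffices for axiom (b).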

Example 1.3 (metrics and metric spaces) Let X be a set. A real-valued func-
tion d : X × X → R is called a metric on X if the following conditions hold
for all elements x, y, z ∈ X:
(a) d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.
(b) d(x, y) = d(y, x).
(c) d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality).
The set X together with the metric d is called a metric space and is denoted
by (X, d). If the metric d is specified on X so that no confusion occurs, we
can simply say that X is a metric space.
Given a point x0 ∈ X and a number r > 0, the open ball in X centered at
x0 with radius r is defined by

B(x0; r) := {x ∈ X | d(x, x0) < r},

and the closed ball in X centered at x0 with radius r is defined by

B̄(x0; r) := {x ∈ X | d(x, x0) ≤ r}.

A subset G of X is called open if for each a ∈ G, there is r > 0 such that
B(a; r) ⊂ G.
We can prove the following properties:

(a) ∅ and X are open.


(b) The intersection of a finite number of open subsets of X is open.
(c) The union of any collection of open subsets of X is open.

Indeed, the first property is trivial. Suppose that Gi for i = 1, . . . , m are
open subsets of X. Let us show that the set G := G1 ∩ · · · ∩ Gm is also open. Fix
any a ∈ G. Then a ∈ Gi for every i = 1, . . . , m. Since each Gi is open, there
exists ri > 0 such that

B(a; ri ) ⊂ Gi for all i = 1, . . . , m.

Set r := min{ri | i = 1, . . . , m}. Then r > 0 and B(a; r) ⊂ G. Thus G is
open, and so the second property is satisfied. Suppose next that {Gα}α∈I is
an arbitrary collection of open sets and verify that G := ⋃α∈I Gα is open.
Fix any a ∈ G. Then there is α0 ∈ I such that a ∈ Gα0 . Since Gα0 is open,
there exists r > 0 such that
B(a; r) ⊂ Gα0 .
Thus B(a; r) ⊂ G due to Gα0 ⊂ G, and so the third property is satisfied. Now
consider the collection τ of all open sets in X. These three properties tell us
that τ is a topology called the metric topology on X. Therefore, (X, τ ) is a
topological space.
We say that a sequence {xk } in X converges if there exists an element
x ∈ X such that limk→∞ d(xk , x) = 0. A sequence {xk } in X is called a
Cauchy sequence if for any ε > 0, there exists k0 ∈ N such that
d(xk , xl ) < ε for all k, l ≥ k0 .
The metric space (X, d) is complete if every Cauchy sequence in X converges.
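As a simple illustration of these definitions (a hypothetical sketch, not taken from the text), the discrete metric d(x, y) = 0 for x = y and d(x, y) = 1 otherwise satisfies conditions (a)–(c), and its open balls show that every singleton is open, so the metric topology it generates is the discrete topology of Example 1.2(a).

```python
def discrete_metric(x, y):
    """The discrete metric: 0 on the diagonal, 1 elsewhere."""
    return 0 if x == y else 1

def ball(X, d, center, r):
    """The open ball B(center; r) = {x in X : d(x, center) < r}."""
    return {x for x in X if d(x, center) < r}

X = {"a", "b", "c"}

# The triangle inequality d(x, z) <= d(x, y) + d(y, z), checked exhaustively.
assert all(discrete_metric(x, z) <= discrete_metric(x, y) + discrete_metric(y, z)
           for x in X for y in X for z in X)

print(ball(X, discrete_metric, "a", 1) == {"a"})  # True: singletons are open
print(ball(X, discrete_metric, "a", 2) == X)      # True: the whole space
```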

To proceed with other examples, let us first define the class of vector/linear
spaces, which generally does not relate to topologies. This very large class is
used in this book to obtain some important results of convex analysis without
topologies, while the most interesting developments require the combination
of linear and topological structures.

Definition 1.4 A nonempty set X is called a vector or linear space over


a field F if it is endowed with a binary operation + : X × X → X (vector
addition) and a scalar multiplication · : F × X → X such that the following
properties are satisfied for all x, y, z ∈ X and α, β ∈ F:
(a) (x + y) + z = x + (y + z).
(b) There exists 0 ∈ X such that x + 0 = x.
(c) There exists −x ∈ X such that x + (−x) = 0.
(d) x + y = y + x.
(e) (αβ)x = α(βx).
(f) α(x + y) = αx + αy.
(g) (α + β)x = αx + βx.
(h) 1x = x.

The most common choices of the field F in Definition 1.4 (and the only
fields considered in this book) are the field R of real numbers and the field C
of complex numbers. In the first case, X is called a real vector/linear space,
and in the second case it is called a complex one.
Example 1.5 (normed spaces and Banach spaces) Let X be a vector space
over a field F (either R or C). A function ‖·‖ : X → R is called a norm if the
following properties hold for all vectors x, y ∈ X and all numbers α ∈ F:
(a) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
(b) ‖αx‖ = |α| · ‖x‖.
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖.
If ‖·‖ is a norm on X, then the pair (X, ‖·‖) is called a normed space. We
can verify that a normed space X is a metric space with the metric

d(x, y) := ‖x − y‖ for x, y ∈ X.

If in addition X is complete, then it is called a Banach space. Thus both
classes of normed and Banach spaces are examples of topological spaces.

Example 1.6 (inner product spaces and Hilbert spaces) Let H be a vector
space over a field F (either R or C). An inner product on H is a function
⟨·, ·⟩ : H × H → F that satisfies the following properties for all vectors x, y, z ∈
H and all numbers λ ∈ F:
(a) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.
(b) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(c) ⟨λx, y⟩ = λ⟨x, y⟩.
(d) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩ (so that ⟨x, y⟩ = ⟨y, x⟩ when F = R).
If ⟨·, ·⟩ is an inner product on H, then the pair (H, ⟨·, ·⟩) is called an inner
product space.

Let H be an inner product space. Define ‖·‖ : H → R by

‖x‖ := √⟨x, x⟩ for x ∈ H.

Then ‖·‖ is a norm on H and (H, ‖·‖) is a normed space. If in addition this
normed space is a Banach space, then H is called a Hilbert space. Thus inner
product spaces and Hilbert spaces are also examples of topological spaces. In
particular, on Rn (as a real vector space) with the usual addition and scalar
multiplication, we define

⟨x, y⟩ := ∑_{k=1}^n xk yk,

where x = (x1, . . . , xn) and y = (y1, . . . , yn). Then it is easy to verify that the
operation ⟨·, ·⟩ defines an inner product.
Proceeding in the same way, on Cn (as a complex vector space) with the
usual addition and scalar multiplication we denote

⟨x, y⟩ := ∑_{k=1}^n xk ȳk,

where x = (x1, . . . , xn) and y = (y1, . . . , yn), and ȳk stands for the complex
conjugate of yk. Then it is easy to verify that the complex operation ⟨·, ·⟩ also
defines an inner product.
Finally, let Mn be the vector space of all n × n complex matrices (as a
vector space over either R or C) with the usual matrix addition and scalar
multiplication. For A = (aij) and B = (bij) in Mn, define

⟨A, B⟩ := ∑_{i,j=1}^n aij b̄ij.

Then the matrix operation ⟨·, ·⟩ is clearly an inner product.
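The three inner products above are straightforward to compute. The sketch below (plain Python, with vectors chosen only for illustration) implements them, including the complex conjugation used on Cn and Mn.

```python
def ip_real(x, y):
    """<x, y> = sum of x_k y_k on R^n."""
    return sum(a * b for a, b in zip(x, y))

def ip_complex(x, y):
    """<x, y> = sum of x_k * conj(y_k) on C^n."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def ip_matrix(A, B):
    """<A, B> = sum over i, j of a_ij * conj(b_ij) on M_n."""
    return sum(a * b.conjugate()
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

print(ip_real([1, 2, 3], [4, 5, 6]))           # 32
print(ip_complex([1 + 1j, 2j], [1 - 1j, 1j]))  # (2+2j)
A = [[1, 2], [3, 4]]
B = [[1, 0], [0, 1]]
print(ip_matrix(A, B))                         # 5
# <x, x> is real and nonnegative, as axiom (a) requires:
print(ip_complex([1 + 1j, 2j], [1 + 1j, 2j]))  # (6+0j)
```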

1.1.2 Topological Interior and Closure of Sets

The following topological notions are basic in the study of sets.


Definition 1.7 Let A be a subset of a topological space X.
(a) The interior of A, denoted by int(A), is the union of all open sets
contained in A. This means that

int(A) = ⋃_{G open, G⊂A} G.

(b) The closure of A, denoted by Ā, is the intersection of all closed sets
containing A. This means that

Ā = ⋂_{F closed, A⊂F} F.

(c) An element x0 ∈ X is called an interior point of A if there exists an
open set G such that x0 ∈ G ⊂ A, i.e., x0 is an element of int(A). In this
case, the set A is called a neighborhood of x0.
(d) An element x0 ∈ X is called a boundary point of A if for any open set
G containing x0, we have

G ∩ A ≠ ∅ and G ∩ Ac ≠ ∅.

The set of all boundary points of A is called the boundary of this set
and is denoted by bd(A).

Proposition 1.8 Let X be a topological space. Then we have

(a) ∅ and X are closed.
(b) If F1 and F2 are closed, then F1 ∪ F2 is closed.
(c) If {Fi}i∈I is an arbitrary collection of closed sets, then ⋂i∈I Fi is closed.

Proof. Since X and ∅ are open with ∅ = X c and X = ∅c, the first property
in the proposition is satisfied.
To verify the second property, we get by the De Morgan law that

(F1 ∪ F2)c = F1c ∩ F2c.

This is an open set, and thus F1 ∪ F2 is closed.
To check the third property, we use the De Morgan law again and get

(⋂i∈I Fi)c = ⋃i∈I Fic,

which is an open set. Thus the set ⋂i∈I Fi is closed. □

Proposition 1.9 Let X be a topological space, let x0 ∈ X, and let A be a
subset of X. Then the following properties hold:
(a) Ā is closed, and A is closed if and only if A = Ā.
(b) int(A) is open, and A is open if and only if int(A) = A.
(c) x0 ∈ Ā if and only if G ∩ A ≠ ∅ for every open set G containing x0.
(d) Ā = int(A) ∪ bd(A).
(e) Ā \ bd(A) = int(A).

Proof. We start by verifying the first property. It follows from the definition of
Ā that Ā is closed as the intersection of a collection of closed sets. In addition
we have A ⊂ Ā. Suppose now that A is closed and get

Ā = ⋂_{F closed, A⊂F} F ⊂ A.

This yields Ā = A. If A = Ā, then A is closed because Ā is closed.
To proceed with the second property, observe that the interior is an open
set because it is the union of a family of open sets. In addition we have
int(A) ⊂ A. Supposing that A is open gives us

A ⊂ ⋃_{G open, G⊂A} G = int(A),

which implies that A = int(A). To check the converse statement, it is sufficient
to see that if int(A) = A, then A is obviously open due to the openness
property of the interior int(A).
To verify the third assertion, fix x0 ∈ Ā and take any open set G that
contains x0. Let us show that G ∩ A ≠ ∅. Suppose on the contrary that
G ∩ A = ∅ and then get A ⊂ Gc. Since Gc is closed, we have Ā ⊂ Gc and
so Ā ∩ G = ∅, which is a contradiction due to x0 ∈ Ā ∩ G. To verify the
converse statement, suppose that for every open set G containing x0 we have
G ∩ A ≠ ∅ and then show that x0 ∈ Ā. Indeed, if on the contrary x0 ∉ Ā,
then x0 ∈ (Ā)c, which is an open set. Thus (Ā)c ∩ A ≠ ∅, and this brings us
to a contradiction due to the trivial inclusion A ⊂ Ā.
We leave as exercises for the reader to verify that the last two properties
of the proposition are also satisfied. □
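On a finite topological space, the interior and closure can be computed by brute force straight from Definition 1.7, which makes assertions like Proposition 1.9(d) easy to test. A small illustrative sketch (the topology below is ours), using the identity bd(A) = Ā \ int(A) that follows from parts (d) and (e):

```python
def interior(tau, A):
    """Union of all open sets contained in A (Definition 1.7(a))."""
    out = set()
    for G in tau:
        if G <= A:
            out |= G
    return out

def closure(X, tau, A):
    """Intersection of all closed sets containing A (Definition 1.7(b));
    the closed sets are the complements of the members of tau."""
    out = set(X)
    for G in tau:
        F = X - G                  # F is closed
        if A <= F:
            out &= F
    return out

X = frozenset({1, 2, 3})
tau = [frozenset(), X, frozenset({1}), frozenset({1, 2})]  # a topology on X
A = {1, 3}
intA = interior(tau, A)
clA = closure(X, tau, A)
bdA = clA - intA                   # boundary, via Proposition 1.9(d)-(e)
print(sorted(intA), sorted(clA), sorted(bdA))  # [1] [1, 2, 3] [2, 3]
assert clA == intA | bdA                        # Proposition 1.9(d)
```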

1.1.3 Continuity of Mappings

It has been well recognized that the concept of continuity for mappings is one
of the most fundamental in analysis and applications. Here we present some
basic results in the framework of general topological spaces.

Definition 1.10 A mapping f : X → Y between topological spaces is said to


be continuous at a given point x0 ∈ X if for every neighborhood V of
f (x0 ), there exists a neighborhood U of x0 such that f (U ) ⊂ V . The mapping
f is said to be continuous (on X) if it is continuous at every x0 ∈ X.

Next we present two theorems concerning the inverse mapping behavior


and various operations over continuous mappings.
Theorem 1.11 Let f : X → Y be a mapping between topological spaces. Then
the following properties are equivalent:

(a) f is continuous.
(b) f −1 (G) is open for every open set G in Y .
(c) f −1 (F ) is closed for every closed set F in Y .

Proof. Note that for any subset A of Y , we have

f −1 (Ac ) = (f −1 (A))c .

Thus (b) and (c) are equivalent. It remains to show that properties (a) and
(b) are equivalent.

(a) =⇒ (b): Suppose that f is continuous and G is an open set in Y . To check


that f −1 (G) is open in X, fix any x0 in f −1 (G) and get f (x0 ) ∈ G. Since
G is a neighborhood of f (x0 ), there exists a neighborhood U of x0 such that
f (U ) ⊂ G. Choose an open set U1 containing x0 such that x0 ∈ U1 ⊂ U. Then
x0 ∈ U1 ⊂ f −1 (G), which verifies that the set f −1 (G) is open.
(b) =⇒ (a): Suppose that f −1 (G) is open for every open set G in Y . Fix
any x0 ∈ X and any neighborhood V of f (x0 ). Then we can find an open
set G in Y such that f (x0 ) ∈ G ⊂ V . Hence x0 ∈ U := f −1 (G), which is a
neighborhood of x0 . In addition, f (U ) ⊂ G ⊂ V . It follows from the definition
that f is continuous at x0 , and thus on X as x0 is arbitrary. 
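Characterization (b) of Theorem 1.11 is finitely checkable when both topologies are finite. The sketch below (an illustration with made-up spaces and maps) tests continuity by inspecting preimages of open sets.

```python
def preimage(f, X, B):
    """f^{-1}(B) = {x in X : f(x) in B}."""
    return {x for x in X if f(x) in B}

def is_continuous(f, X, tauX, tauY):
    """Theorem 1.11(b): f is continuous iff f^{-1}(G) is open in X
    for every open set G in Y."""
    open_sets = set(tauX)
    return all(frozenset(preimage(f, X, G)) in open_sets for G in tauY)

X = frozenset({1, 2, 3})
tauX = [frozenset(), X, frozenset({1}), frozenset({1, 2})]
Y = frozenset({"u", "v"})
tauY = [frozenset(), Y, frozenset({"u"})]

f = {1: "u", 2: "u", 3: "v"}.get
print(is_continuous(f, X, tauX, tauY))  # True: f^{-1}({u}) = {1, 2} is open
g = {1: "v", 2: "u", 3: "u"}.get
print(is_continuous(g, X, tauX, tauY))  # False: g^{-1}({u}) = {2, 3} is not open
```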

Theorem 1.12 Let f : X → Y be a mapping between topological spaces. Then
the following properties are equivalent:

(a) f is continuous on X.
(b) $f(\overline{A}) \subset \overline{f(A)}$ for every subset A of X.
(c) f −1 (int(B)) ⊂ int(f −1 (B)) for every subset B of Y .

Proof. (a) =⇒ (b): Suppose that f is continuous and fix any subset A of X.
Then $\overline{f(A)}$ is a closed subset of Y , and so $f^{-1}(\overline{f(A)})$ is a closed subset of X.
We also have the inclusions

$A \subset f^{-1}(f(A)) \subset f^{-1}(\overline{f(A)})$.

It follows furthermore that

$\overline{A} \subset f^{-1}(\overline{f(A)})$,

which readily implies (b).
(b) =⇒ (a): Suppose that (b) is satisfied. Fix any closed set F in Y and let
A := f −1 (F ). Then we get

$f(\overline{A}) \subset \overline{f(A)} = \overline{f(f^{-1}(F))} \subset \overline{F} = F$.

It follows that Ā ⊂ f −1 (F ) = A, and so A = f −1 (F ) is closed. Then Theorem
1.11 tells us that f is continuous.
(a) =⇒ (c): Suppose that (a) is satisfied and fix any subset B in Y . Then
G := int(B) is open, and so f −1 (G) = f −1 (int(B)) is open. Note that f −1 (G)
is a subset of f −1 (B). Thus we get the inclusion

f −1 (int(B)) ⊂ int(f −1 (B)).

(c) =⇒ (a): Suppose that (c) holds. Fix any open set G ⊂ Y and observe that

f −1 (G) = f −1 (int(G)) ⊂ int(f −1 (G)),

and thus f −1 (G) is open. Then f is continuous on X by Theorem 1.11. □

Definition 1.13 Let f : X → Y be a mapping between topological spaces. We


say that f is a homeomorphism if f is a bijection such that both mappings
f and f −1 are continuous.

The next proposition follows from the definition and Theorem 1.12.
Proposition 1.14 Let X and Y be topological spaces, and let f : X → Y be
a homeomorphism. Then the following properties hold:

(a) G is an open set in X if and only if f (G) is an open set in Y .


(b) F is a closed set in X if and only if f (F ) is a closed set in Y .
(c) $f(\overline{A}) = \overline{f(A)}$ for every subset A of X.
(d) f −1 (int(B)) = int(f −1 (B)) for every subset B of Y .

1.1.4 Bases for Topologies

In this subsection, we define and study bases for topological spaces. The main
idea involves using a collection of open sets in a topological space to represent
the whole topology.
Definition 1.15 Let (X, τ ) be a topological space, and let B ⊂ τ . We say that
B is a basis for the space (X, τ ) (or for the topology τ ) if for every set G ∈ τ ,
there exists a collection of subsets B′ ⊂ B such that

G = ⋃_{V ∈ B′} V.

This notion admits the following simple description.

Proposition 1.16 Let (X, τ ) be a topological space, and let B ⊂ τ . The collec-
tion B is a basis for the topological space (X, τ ) if and only if for every x ∈ X
and every open set G containing x, there exists V ∈ B such that x ∈ V ⊂ G.

Proof. =⇒: Suppose that B is a basis for (X, τ ). Fix G ∈ τ and x ∈ G, and then
find by definition a collection B′ ⊂ B such that

x ∈ G = ⋃_{V ∈ B′} V.

Thus there exists V ∈ B′ ⊂ B with x ∈ V ⊂ G.
⇐=: Fix an arbitrary open set G. For any x ∈ G we find Vx ∈ B satisfying
x ∈ Vx ⊂ G. Then G = ⋃_{x∈G} Vx , and so B ⊂ τ is a basis for (X, τ ). □

Example 1.17 Let X be an arbitrary metric space. Consider the collection
of open balls in X given by

B := {B(x; r) | x ∈ X, r > 0}.

Then B is a basis for the metric topology on X. Indeed, by the definition of


open sets in metric spaces, for any open set G in X and for any x ∈ G there
exists r > 0 such that
B(x; r) ⊂ G.
Therefore, B is a basis for the metric topology on X.

The next theorem shows how to determine a topology with a given basis.

Theorem 1.18 Let X be a set, and let B ⊂ P(X) be a collection of subsets


of X that satisfies the following conditions:

(a) For any U, V ∈ B and x ∈ U ∩ V , there exists W ∈ B ensuring the


inclusions x ∈ W ⊂ U ∩ V .
(b) For any x ∈ X, there exists V ∈ B such that x ∈ V .

Then there is a topology τ on X such that B is a basis for (X, τ ).


Proof. Define the set collection

τ := { G ⊂ X | G = ⋃_{V ∈ B′} V, B′ ⊂ B }.

Observe from the definition of τ that G ∈ τ if and only if for any x ∈ G, there
exists V ∈ B such that x ∈ V ⊂ G. By the second condition of the theorem we
have that X = ⋃_{V ∈ B} V, and so X ∈ τ . We also see that ∅ ∈ τ since B′ could
be an empty collection. Obviously, τ is closed under arbitrary unions. Let us
show that τ is closed under finite intersections. Fix G1 and G2 in τ and pick
x ∈ G1 ∩ G2 . Then there exists Vix ∈ B such that x ∈ Vix ⊂ Gi for i = 1, 2.
Since x ∈ V1x ∩ V2x , we find V x ∈ B with x ∈ V x ⊂ V1x ∩ V2x ⊂ G1 ∩ G2 .
Thus G1 ∩ G2 ∈ τ by the observation at the beginning of the proof. From the
definition of τ , we conclude that B is a basis for (X, τ ). □
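The proof of Theorem 1.18 is constructive: the topology consists of all unions of subcollections of B. For a finite basis this can be carried out literally, as in the hypothetical sketch below (condition (a) of the theorem holds for this B because the only pair with nonempty intersection is {1} and {1, 2}, and {1} ∩ {1, 2} = {1} ∈ B).

```python
from itertools import chain, combinations

def topology_from_basis(basis):
    """All unions of subcollections of `basis`, as in the proof of
    Theorem 1.18; the empty subcollection contributes the empty set."""
    tau = {frozenset()}
    for r in range(1, len(basis) + 1):
        for sub in combinations(basis, r):
            tau.add(frozenset(chain.from_iterable(sub)))
    return tau

basis = [frozenset({1}), frozenset({1, 2}), frozenset({3})]  # covers X = {1,2,3}
tau = topology_from_basis(basis)
print(sorted(sorted(G) for G in tau))
# [[], [1], [1, 2], [1, 2, 3], [1, 3], [3]]
```

One can check directly that the resulting six sets are closed under unions and finite intersections, i.e., that τ is indeed a topology.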

1.1.5 Topologies Generated by Families of Mappings

In this subsection, we consider two special types of topologies generated by


families of mappings. The first one from the following theorem is called the
weak topology generated by a family of mappings.

Theorem 1.19 Let X be a set, and let {Xα}α∈I be a collection of topological
spaces. For each α ∈ I, let fα be a mapping from X to Xα, and let τα denote
the topology in Xα. Consider the family of sets given by

B := { ⋂_{i=1}^m fαi−1 (Gαi ) | αi ∈ I, Gαi ∈ ταi , m ∈ N }.

Then B is a basis for a topology τ on X. Moreover, τ is the weakest topology
such that each fα is continuous.

Proof. Fix G1 and G2 in B with x ∈ G1 ∩ G2 . Then

G1 = ⋂_{i=1}^m fαi−1 (Gαi ) and G2 = ⋂_{i=1}^p fα′i−1 (Gα′i ),

where αi , α′i ∈ I, Gαi ∈ ταi and Gα′i ∈ τα′i .
Obviously, G1 ∩ G2 ∈ B; so we let G := G1 ∩ G2 and have x ∈ G ⊂ G1 ∩ G2 .
Fix any x ∈ X and fix α ∈ I. Then x ∈ X = fα−1 (Xα ) ∈ B. It follows from
Theorem 1.18 that B is a basis of some topology τ on X. Each fα is continuous
since for any open set V in Xα , we have fα−1 (V ) ∈ B ⊂ τ . Note that if τ ′ is
a topology on X such that each fα is continuous, then τ ⊂ τ ′. Indeed, from
the definition of B we get that B ⊂ τ ′, which implies that τ ⊂ τ ′. □

The next result, which also relates to Theorem 1.19, shows how to check
the continuity of a mapping via its compositions with fα for α ∈ I.

Theorem 1.20 In the setting of Theorem 1.19, consider a mapping g : Z → X


defined on a topological space Z. Then the mapping g is continuous if and only
if the composition fα ◦ g is continuous for every α ∈ I.

Proof. =⇒: Suppose that g is continuous. Since each fα is continuous, the


composition fα ◦ g is also continuous for every α ∈ I.
⇐=: Let fα ◦ g be continuous for every α ∈ I. Fix any open set G in the basis
B from Theorem 1.19. Then there exist elements αi ∈ I and Vαi ∈ ταi for
i = 1, . . . , m such that

G = ⋂_{i=1}^m fαi−1 (Vαi ), and so g −1 (G) = ⋂_{i=1}^m g −1 (fαi−1 (Vαi )) = ⋂_{i=1}^m (fαi ◦ g)−1 (Vαi ).

Thus g −1 (G) is open in Z, which verifies the continuity of g. □

The next proposition can be checked directly.

Proposition 1.21 Let X be a set, and let {(Xα , τα )}α∈I be a collection of
topological spaces. For each α ∈ I consider a mapping gα : Xα → X and
define a collection of subsets of X by

τ := { G ⊂ X | gα−1 (G) ∈ τα for all α ∈ I }.

Then τ is a topology on X, and it is the strongest topology such that gα is
continuous for each α ∈ I.

The last result of this subsection is actually a version of Theorem 1.20.

Proposition 1.22 In the setting of Proposition 1.21, consider a mapping


f : X → Y taking values in a topological space. Then f is continuous on
X if and only if the composition f ◦ gα is continuous on Xα for all α ∈ I.

Proof. =⇒: Suppose that f is continuous. Since each gα is continuous, f ◦ gα


is continuous for every α ∈ I.
⇐=: Suppose that f ◦ gα is continuous for every α ∈ I. Fix any open set G in
Y . Then we have

(f ◦ gα )−1 (G) = gα−1 (f −1 (G)) ∈ τα for all α ∈ I.

It follows from Proposition 1.21 that f −1 (G) ∈ τ , and so f is continuous. □

1.1.6 Product Topology and Quotient Topology

In this subsection, we study the product topology and the quotient topology
using topologies generated by families of mappings considered in the previous
subsection.

Definition 1.23 Let (Xα , τα ), α ∈ I, be a collection of topological spaces.
Consider the Cartesian product set

X := ∏_{α∈I} Xα

and for each α ∈ I define the projection mapping pα : X → Xα by

pα (x) := xα with x = (xα )α∈I .

The product topology on X is the weakest topology such that each pα for
α ∈ I is continuous.

The following result describes natural relationships between the bases in


the product topology and its components.

Proposition 1.24 Let (Xα , τα ) for α ∈ I be a collection of topological spaces,
and let X := ∏_{α∈I} Xα be equipped with the product topology τ . Consider
the collection of sets B defined as follows: G ∈ B if and only if there exist
{α1 , . . . , αm } ⊂ I, m ∈ N, and Vαi ∈ ταi for i = 1, . . . , m such that

G = ∏_{α∈I} Vα ,

where Vα = Xα for α ∉ {α1 , . . . , αm }. Then B is a basis for the product
topology τ on X.

Proof. For any subset {α1 , . . . , αm } ⊂ I with m ∈ N and any Vαi ∈ ταi for
i = 1, . . . , m, we have

⋂_{i=1}^m pαi−1 (Vαi ) = ∏_{α∈I} Vα ,

where Vα = Xα for α ∉ {α1 , . . . , αm }. Thus the conclusion of this proposition
follows directly from Theorem 1.19. □

Let us now discuss the case where I is a finite set. For simplicity we confine
ourselves to the case where I = {1, 2}.
Corollary 1.25 Let X1 and X2 be topological spaces, and let X = X1 × X2 .
Then the collection of sets
  
B := {V1 × V2 | V1 is open in X1 , V2 is open in X2}
is a basis for the product topology on X.
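For finite topologies, the basic open sets of Corollary 1.25 can be enumerated as sets of ordered pairs. A hypothetical sketch (the spaces below are ours):

```python
from itertools import product

def product_basis(tau1, tau2):
    """Basic open sets V1 × V2 of Corollary 1.25, each encoded as a
    frozenset of ordered pairs."""
    return {frozenset(product(V1, V2)) for V1 in tau1 for V2 in tau2}

tau1 = [frozenset(), frozenset({"a"}), frozenset({"a", "b"})]
tau2 = [frozenset(), frozenset({1, 2})]
B = product_basis(tau1, tau2)
print(len(B))  # 3: the empty set, {a} x {1,2}, and {a,b} x {1,2}
```

Note that distinct pairs (V1, V2) can encode the same basic set: whenever one factor is empty, the product V1 × V2 is the empty set.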

Consider a nonempty set X with an equivalence relation R on X. For each


x ∈ X, denote the equivalence class of x by [x] and define the quotient set
  
X/R := {[x] | x ∈ X}.
We continue with the definition of the quotient topology.
Definition 1.26 Let X be a topological space, and let R be an equivalence
relation on X. Consider the quotient mapping π : X → X/R given by
π(x) := [x]. Then the strongest topology on X/R which makes π continuous
is called the quotient topology on X/R.
It follows directly from the definition that a subset G is open in X/R if
and only if π −1 (G) = ⋃_{[x]∈G} [x] is an open set in X. The next result is a direct
consequence of Proposition 1.22.
consequence of Proposition 1.22.

Proposition 1.27 Let X be a topological space, and let R be an equivalence


relation on X. Consider the quotient space X/R and the quotient mapping
π : X → X/R. Let f : X/R → Y be a mapping with values in a topological
space Y . Then f is continuous if and only if f ◦ π is continuous.

1.1.7 Subspace Topology

This subsection builds topologies on nonempty subsets of a topological space


from the given topology.
Proposition 1.28 Let (X, τ ) be a topological space. Consider any subset Y
of X and define the collection
  
τY := {G ∩ Y | G ∈ τ }. (1.1)
Then τY is a topology on Y .

Proof. Since ∅ = ∅ ∩ Y and Y = X ∩ Y , we see that ∅, Y ∈ τY . Let {Vα }α∈I
be a collection of elements of τY . For each α ∈ I, there is Gα ∈ τ with

Vα = Gα ∩ Y , and so ⋃_{α∈I} Vα = ⋃_{α∈I} (Gα ∩ Y ) = (⋃_{α∈I} Gα ) ∩ Y .

Since ⋃_{α∈I} Gα ∈ τ , it follows that ⋃_{α∈I} Vα ∈ τY . Now we fix V1 , V2 ∈ τY and
find G1 , G2 ∈ τ such that

Vi = Gi ∩ Y for i = 1, 2,
which clearly gives us the equalities
V1 ∩ V2 = (G1 ∩ Y ) ∩ (G2 ∩ Y ) = (G1 ∩ G2 ) ∩ Y.
Since G1 ∩ G2 ∈ τ , we get V1 ∩ V2 ∈ τY and confirm that τY is a topology. 

Definition 1.29 Let (X, τ ) be a topological space, and let Y be a subset of


X. Consider the collection τY of subsets of Y given by (1.1). Then (Y, τY )
is called a topological subspace of (X, τ ), and τY is called the subspace
topology on Y .

Proposition 1.30 Let X be a topological space, and let Y be a topological


subspace of X. A set K ⊂ Y is closed in Y if and only if there exists a closed
set F in X such that K = F ∩ Y .

Proof. Suppose that K is closed in Y . Then Y \ K is open in Y , and so there


exists an open set G in X such that
Y \ K = G ∩ Y.
This ensures the representations
K = Y \ (Y \ K) = Y \ (G ∩ Y ) = (Y \ G) ∪ (Y \ Y ) = Y \ G = Y ∩ Gc .
Defining the closed set F := Gc gives us K = F ∩ Y . Conversely, suppose that
K = F ∩ Y for some closed set F in X. Then
Y \ K = Y \ (F ∩ Y ) = (Y \ F ) ∪ (Y \ Y ) = Y \ F = Y ∩ F c .
Since G := F c is open, the set Y \ K is open in Y , and so K is closed in Y . 

The next corollary is a consequence of Propositions 1.28 and 1.30.

Corollary 1.31 Let X be a topological space, and let Y be a subspace of X.


Then the following assertions hold:
(a) If Y is an open set in X, then a subset G of Y is open in Y if and only
if it is open in X.
(b) If Y is a closed subset of X, then a subset K of Y is closed in Y if and
only if it is closed in X.
Proposition 1.32 Let X be a topological space, and let Y be a topological
subspace of X. For a subset A of Y , denote by clY (A) the closure of A in Y ,
and by Ā the closure of A in X. Then we have the representation

clY (A) = Y ∩ Ā.
1.1 Topological Spaces 15

Proof. Since Ā is closed in X, the set Y ∩ Ā is a closed set in Y that contains
A, and hence we get clY (A) ⊂ Y ∩ Ā. Furthermore, the closedness of clY (A)
in Y allows us to find a closed subset F of X such that

clY (A) = Y ∩ F.

Then A ⊂ Y ∩ F ⊂ F , and so Ā ⊂ F . This yields clY (A) = Y ∩ F ⊃ Y ∩ Ā. □

Proposition 1.33 Let (X, τ ) be a topological space, and let Y be a subspace


of X. Suppose that B is a basis for the topology on X. Then the collection

BY := { V ∩ Y | V ∈ B }

is a basis for the subspace topology on Y .

Proof. Since B ⊂ τ , it follows from the definition of the subspace topology


τY on Y that BY ⊂ τY . Fix now any open set W in Y together with any point y ∈ W , and find
an open set G in X such that W = G ∩ Y . Then y ∈ G, and so there exists
V ∈ B with y ∈ V ⊂ G. It follows therefore that
y ∈ V ∩ Y ⊂ G ∩ Y = W.
Since VY := V ∩ Y ∈ BY , Proposition 1.16 tells us that the collection BY is a
basis for the subspace topology on Y . 

1.1.8 Separation Axioms

Here we describe some important classes of general topological spaces and


establish relationships between them.

Definition 1.34 Let X be a topological space. We say that

(a) X is a T1 topological space if for any x and y in X with x ≠ y, there
exist an open set U containing x and an open set V containing y such
that y ∉ U and x ∉ V .
(b) X is a T2 /Hausdorff topological space if for any x, y ∈ X with
x ≠ y, there exist an open set U containing x and an open set V contain-
ing y such that U ∩ V = ∅.
(c) X is a T3 /regular topological space if every singleton in X is
closed, and if for any x ∈ X and any closed set F with x ∉ F there exist
open sets U containing x and V containing F such that U ∩ V = ∅.
(d) X is a T4 /normal topological space if every singleton in X is closed,
and if for any two disjoint closed sets A and B there exist open sets U
containing A and V containing B such that U ∩ V = ∅.

It follows directly from the definition that for a topological space X, the
following implications hold:
[X is T4 ] =⇒ [X is T3 ] =⇒ [X is T2 ] =⇒ [X is T1 ].

Proposition 1.35 Let X be a topological space. If X is T1 , then every
singleton in X is closed.

Proof. Consider any singleton F := {x} in X and fix y ∈ F c . Then x ≠ y,
and so we can find an open set V containing y such that x ∉ V . Then

y ∈ V ⊂ F c,

which verifies that F c is open, and so F is a closed set. □

Remark 1.36 A weaker property than T1 is the following: a topological
space X is called a T0 space if for any two distinct points in X, there exists
an open set that contains one of the points and does not contain the other.
Every T1 space is clearly T0 , but the converse is not true in general: the
two-point space X = {0, 1} with the topology {∅, {1}, X} is T0 while failing
to be T1 .
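These axioms can be tested mechanically on finite spaces. The sketch below (a brute-force illustration; the two-point topologies are the standard Sierpiński and discrete examples) confirms that the implication T1 ⟹ T0 is strict.

```python
# Brute-force separation checks on finite topologies; is_T0 uses the
# standard point-separation form of the T0 axiom.
def is_T0(X, tau):
    return all(any((x in U) != (y in U) for U in tau)
               for x in X for y in X if x != y)

def is_T1(X, tau):
    # For each ordered pair x != y there must be an open U
    # with x in U and y not in U.
    return all(any(x in U and y not in U for U in tau)
               for x in X for y in X if x != y)

X = {0, 1}
sierpinski = [set(), {1}, {0, 1}]       # the Sierpinski topology on {0, 1}
discrete = [set(), {0}, {1}, {0, 1}]    # the discrete topology

assert is_T0(X, sierpinski) and not is_T1(X, sierpinski)   # T0 but not T1
assert is_T1(X, discrete)
```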

Proposition 1.37 Let X be a topological space such that every singleton set
is closed. Then X is T3 if and only if for any x ∈ X and for any open set G
containing this point, there exists an open set V such that x ∈ V ⊂ V̄ ⊂ G.

Proof. Suppose that X is T3 . Fix x ∈ X and an open set G containing x.
Then x ∉ Gc , where the set Gc is closed, and hence we can find two open sets
V and W such that

x ∈ V, Gc ⊂ W, and V ∩ W = ∅.

Thus V ⊂ W c , and since W c is closed with W c ⊂ G, we get x ∈ V ⊂ V̄ ⊂ W c ⊂ G.
To verify the converse statement, fix any x ∈ X and any closed set F with
x ∉ F , and then get x ∈ G := F c . Choose further an open set V with

x ∈ V ⊂ V̄ ⊂ F c .

Since the set W := (V̄ )c is open, it follows that

x ∈ V, F ⊂ W, and V ∩ W = ∅,

which therefore completes the proof. □
The following proposition is verified similarly.

Proposition 1.38 Let X be a topological space such that every singleton is
closed. Then X is T4 if and only if for any closed set F in X and any open
set G containing F , there exists an open set V such that F ⊂ V ⊂ V̄ ⊂ G.

One more proposition is needed before the proof of the next theorem.

Proposition 1.39 In the setting of Theorem 1.19, suppose that the space Xα
is Hausdorff for every α ∈ I, and that {fα }α∈I separates points in the sense
that for any x, y ∈ X with x ≠ y there is α ∈ I with fα (x) ≠ fα (y). Then
the weak topology τw generated by {fα }α∈I is Hausdorff.

Proof. Fix any x, y ∈ X with x ≠ y and choose α ∈ I with fα (x) ≠ fα (y).
Since Xα is Hausdorff, we find two open sets U and V in Xα such that
fα (x) ∈ U , fα (y) ∈ V , and U ∩ V = ∅. Define U1 := fα−1 (U ), V1 := fα−1 (V )
and employ Theorem 1.19. Then we have x ∈ U1 ∈ τw , y ∈ V1 ∈ τw , and
U1 ∩ V1 = ∅, which verifies that τw is Hausdorff. □

Theorem 1.40 The following properties hold:


(a) Any subspace of a Ti topological space is also a Ti topological space for all
i = 1, 2, 3.
(b) The product of Ti topological spaces is also a Ti topological space for all
i = 1, 2, 3.

Proof. To verify the first property, suppose that X is a T1 space and that Y
is a subspace of X. Fix x, y ∈ Y with x ≠ y and choose open sets U, V ⊂ X
such that x ∈ U , y ∈ V , x ∉ V , and y ∉ U . Let UY := Y ∩ U and
VY := Y ∩ V . Then UY and VY are open in Y with x ∈ UY , y ∈ VY , x ∉ VY ,
and y ∉ UY , and so Y is T1 .
Suppose now that X is a T2 space and that Y is a subspace of X. Fix
x, y ∈ Y with x ≠ y and choose open sets U, V ⊂ X such that x ∈ U , y ∈ V ,
and U ∩ V = ∅. Let UY := Y ∩ U and VY := Y ∩ V . Then the sets UY and VY
are open in Y with UY ∩ VY = U ∩ V ∩ Y = ∅, which verifies that Y is T2 .
Further, let X be a T3 space, and let Y be a subspace of X. Then Y is T2 ,
so every singleton in Y is closed. Fix x ∈ Y and a closed set FY ⊂ Y such
that x ∉ FY . Then there exists a closed set F ⊂ X with FY = Y ∩ F . This
shows that x ∉ F . By the regularity property of X, we find open sets V
containing x and W containing F such that V ∩ W = ∅. Defining

VY := V ∩ Y and WY := W ∩ Y

tells us that VY and WY are open in Y with x ∈ VY , that FY ⊂ WY , and
that VY ∩ WY = ∅. Hence we verify that Y is T3 , which completes the proof
of the first assertion of the theorem.
To justify next the second assertion, let {Xα }α∈I be a family of T1 spaces.
Fix any x = (xα ) and y = (yα ) in the product space X := ∏α∈I Xα with
x ≠ y. Then there exists α0 ∈ I such that xα0 ≠ yα0 . Since Xα0 is T1 , we
find open sets Vα0 and Wα0 in Xα0 satisfying

xα0 ∈ Vα0 , yα0 ∈ Wα0 , xα0 ∉ Wα0 , and yα0 ∉ Vα0 .

Define now the inverse image sets

V := p−1α0 (Vα0 ) and W := p−1α0 (Wα0 ),

which are open in X with x ∈ V , y ∈ W , x ∉ W , and y ∉ V . This shows
that X is T1 . It is easy to check that the product of T2 topological spaces is
T2 . Considering an arbitrary family {Xα }α∈I of T3 spaces, we see that
X := ∏α∈I Xα is T2 , and so every singleton therein is closed. Fix any
x = (xα ) ∈ X, pick an open set G containing x, and choose a basis element
containing x in the form

V := ∏α∈I Vα ⊂ G,

where Vα := Xα for all α ∉ {α1 , . . . , αm } ⊂ I and Vαi is open in Xαi for
i = 1, . . . , m. Since Xαi is regular, there exist open sets Uαi with

xαi ∈ Uαi ⊂ Ūαi ⊂ Vαi for all i = 1, . . . , m.

Define finally the product set

U := ∏α∈I Uα ,

where Uα := Xα for α ∉ {α1 , . . . , αm }. Then U is an open set in X that
contains x and satisfies the inclusions

x ∈ U ⊂ Ū = ∏α∈I Ūα ⊂ V ⊂ G.

This shows that X is T3 and thus completes the proof of the theorem. □

1.1.9 Compactness

The concept of compactness and its modifications are crucial in many aspects
of mathematical analysis. In this subsection, we discuss basic facts about the
compactness in general topological spaces with specifications in metric spaces
and finite dimensions.

Definition 1.41 A subset K of a topological space X is called compact if
every collection of open subsets of X whose union contains K admits a finite
subcollection whose union still contains K. A topological space X is called
compact if it is a compact subset of itself.

This definition can be rephrased as follows: a subset K of a topological
space X is compact if and only if for any covering K ⊂ ⋃α∈I Gα of K by
open sets Gα , there exist αi ∈ I for i = 1, . . . , m such that
K ⊂ Gα1 ∪ . . . ∪ Gαm .
The following three propositions are simple consequences of the definitions.
The following three propositions are simple consequences of the definitions.

Proposition 1.42 A closed subset of a compact topological space is compact.

Proof. Let X be a compact topological space. Fix a closed subset K ⊂ X and
show that K is compact. Taking an open covering K ⊂ ⋃α∈I Gα of K, we get

X = (⋃α∈I Gα ) ∪ K c .

Since X is compact, there exist αi ∈ I for i = 1, . . . , m such that

X = (Gα1 ∪ . . . ∪ Gαm ) ∪ K c .

Intersecting both sides with K gives us

K = K ∩ X ⊂ Gα1 ∪ . . . ∪ Gαm

and thus completes the proof. □


Proposition 1.43 Every compact subset of a Hausdorff topological space is
closed in this topological space.
Proof. Taking a compact subset K of a Hausdorff topological space X, we
need to show that K c is open. Fix x ∈ K c and observe that for any y ∈ K,
there exist open subsets Vy , Wy of X such that

x ∈ Vy , y ∈ Wy , and Vy ∩ Wy = ∅.

Since K ⊂ ⋃y∈K Wy , there exist elements y1 , . . . , ym ∈ K with
K ⊂ Wy1 ∪ . . . ∪ Wym . Define the open set

V := Vy1 ∩ . . . ∩ Vym ,

for which x ∈ V ⊂ K c , since V is disjoint from each Wyi and hence from K.
Thus K c is open, and so K is closed. □


Proposition 1.44 Let f : X → Y be a continuous mapping between arbitrary
topological spaces X and Y . If K is a compact subset of X, then f (K) is a
compact subset of Y .
Proof. Suppose that

f (K) ⊂ ⋃α∈I Gα ,

where Gα is open in Y for all α ∈ I. Then

K ⊂ ⋃α∈I f −1 (Gα ).

Since f is continuous, the preimage set f −1 (Gα ) is open for every α ∈ I, and
so the compactness of K yields α1 , . . . , αm ∈ I for which

K ⊂ f −1 (Gα1 ) ∪ . . . ∪ f −1 (Gαm ).

Thus we arrive at the inclusion

f (K) ⊂ Gα1 ∪ . . . ∪ Gαm ,

which verifies the compactness of f (K) in Y . □



Now we formulate a major property in the theory of topological spaces


and characterize it in terms of topological compactness.
Definition 1.45 Let X be a set, and let A := {Aα }α∈I be a family of subsets
of X. The collection A is said to satisfy the finite intersection
property (FIP) if for any finite set {α1 , . . . , αm } ⊂ I, the intersection
Aα1 ∩ . . . ∩ Aαm is nonempty.

Theorem 1.46 Let X be a topological space. Then X is compact if and only if


every collection of closed subsets of X satisfying the finite intersection property
has a nonempty intersection.

Proof. =⇒: Suppose that X is compact. Let A := {Aα }α∈I be a family of
closed subsets of X that satisfies the finite intersection property. Suppose on
the contrary that ⋂α∈I Aα = ∅. Then

X = X \ ⋂α∈I Aα = ⋃α∈I (X \ Aα ) = ⋃α∈I Acα .

Since Acα is open for every α ∈ I, the compactness of X yields the existence
of α1 , . . . , αm ∈ I such that

X = Acα1 ∪ . . . ∪ Acαm = X \ (Aα1 ∩ . . . ∩ Aαm ).

Recalling that Aα1 ∩ . . . ∩ Aαm ≠ ∅ by the finite intersection property brings
us to a contradiction, and so ⋂α∈I Aα ≠ ∅.
⇐=: To verify the converse statement, suppose that

X = ⋃α∈I Aα ,

where all Aα are open, and so ⋂α∈I Acα = ∅. Since the collection {Acα }α∈I
consists of closed sets, our assumption shows that it cannot satisfy the finite
intersection property, and hence there are α1 , . . . , αm ∈ I with
Acα1 ∩ . . . ∩ Acαm = ∅ and

X = Aα1 ∪ . . . ∪ Aαm ,

which implies the compactness of X. □
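The role of compactness in Theorem 1.46 can be made concrete on the real line, where the compactness of [0, 1] is taken from classical analysis (it is established in Theorem 1.58 below). The Python sketch contrasts the closed sets An = [0, 1/n] in [0, 1], whose total intersection is nonempty, with the sets Bn = (0, 1/n], which are closed in the non-compact subspace (0, 1] and have the FIP, yet intersect to the empty set.

```python
from fractions import Fraction

# Closed subsets A_n := [0, 1/n] of the compact space [0, 1].
def in_A(n, x):
    return 0 <= x <= Fraction(1, n)

# FIP holds, and the full intersection is nonempty (it equals {0}),
# exactly as Theorem 1.46 guarantees for closed sets in a compact space.
assert all(in_A(n, 0) for n in range(1, 10**4))

# Contrast: B_n := (0, 1/n] is closed in the NON-compact subspace (0, 1].
# The family still has the FIP, yet its total intersection is empty,
# since every fixed x > 0 eventually violates x <= 1/n.
def in_B(n, x):
    return 0 < x <= Fraction(1, n)

x = Fraction(1, 50)
assert all(in_B(n, x) for n in (1, 7, 50))   # x witnesses a finite intersection
assert not in_B(51, x)                       # but x drops out of B_51
```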

The next lemma is a preparation for the proof of a deep theorem of topo-
logical space theory known as the Tikhonov theorem; see Theorem 1.48.

Lemma 1.47 Let X be a set, and let A := {Aα }α∈I be a collection of subsets
of X having the finite intersection property. Then there is a maximal collection
F of subsets of X that contains A and satisfies the finite intersection property.
In addition, the following properties hold:
(a) F is closed under finite intersections.
(b) If B ∩ A ≠ ∅ for every A ∈ F, then B ∈ F.

Proof. The existence of such a maximal collection F follows from Zorn’s
lemma; see Theorem 1.124 below. Let us check the other listed properties.
(a) Taking any A1 , . . . , Am ∈ F, we need to show that
A := A1 ∩ . . . ∩ Am ∈ F. It follows from the finite intersection property of F
that A ≠ ∅. Suppose on the contrary that A ∉ F and denote G := F ∪ {A}.
It is easy to see that G satisfies the finite intersection property and contains
F properly. The obtained contradiction with the maximality of F shows that
A ∈ F, and so F is closed under finite intersections.
(b) Suppose that B ∩ A ≠ ∅ for every A ∈ F while B ∉ F. Denoting
H := F ∪ {B} and taking A1 , . . . , Am ∈ F, we get by (a) that
A := A1 ∩ . . . ∩ Am ∈ F, and so B ∩ A1 ∩ . . . ∩ Am = B ∩ A ≠ ∅. Thus H
satisfies the finite intersection property and contains F properly. The obtained
contradiction verifies that B ∈ F and hence completes the proof of the lemma. □

Theorem 1.48 Let {Xα }α∈I be a collection of topological spaces, and let
X := ∏α∈I Xα be equipped with the product topology. Then the topological
space X is compact if and only if Xα is compact for every α ∈ I.

Proof. It follows from Proposition 1.44 and the construction of the projection
mappings that if X is compact, then Xα is compact for every α ∈ I. Thus we
only need to prove the converse implication. To proceed, consider a collection
A of closed subsets of X satisfying the finite intersection property and then
verify that ⋂A∈A A ≠ ∅. By Lemma 1.47, there exists a maximal collection
F of subsets of X satisfying the finite intersection property that contains A
such that the properties (a) and (b) therein are satisfied. Note that

⋂A∈F Ā ⊂ ⋂A∈A Ā = ⋂A∈A A,

where the equality holds since every member of A is closed in X. Thus it
suffices to show that ⋂A∈F Ā ≠ ∅. Fix any α ∈ I. Since F satisfies the finite
intersection property, so does the collection of closed subsets of Xα given by

Fα := { cl pα (A) | A ∈ F },

where cl stands for the closure in Xα . It follows from Theorem 1.46 and the
compactness of Xα that

⋂B∈Fα B = ⋂A∈F cl pα (A) ≠ ∅.

For each α ∈ I pick aα ∈ ⋂A∈F cl pα (A), denote a := (aα )α∈I , and show that

a ∈ ⋂A∈F Ā.

Indeed, fix any A ∈ F and pick an open set V ⊂ X that contains a. Then
there are αi ∈ I and open sets Vαi in Xαi containing aαi , i = 1, . . . , m, with

a ∈ p−1α1 (Vα1 ) ∩ . . . ∩ p−1αm (Vαm ) ⊂ V.

Since aαi ∈ cl pαi (A), for each i = 1, . . . , m we have Vαi ∩ pαi (A) ≠ ∅, and so
p−1αi (Vαi ) ∩ A ≠ ∅. The second assertion of Lemma 1.47 tells us that
p−1αi (Vαi ) ∈ F. Then it follows from the first assertion of Lemma 1.47 that
W := p−1α1 (Vα1 ) ∩ . . . ∩ p−1αm (Vαm ) ∈ F. The finite intersection property
of F implies that W ∩ A ≠ ∅. Since W ∩ A ⊂ V ∩ A, we get V ∩ A ≠ ∅. As
V was an arbitrary open set containing a, this yields a ∈ Ā, and so
a ∈ ⋂A∈F Ā ≠ ∅ as claimed. □
To continue the study of compactness, we recall the following notion.
Definition 1.49 Let X be a topological space, and let A ⊂ X. An element
x0 in X (not necessarily in A) is called a cluster point of A if any
neighborhood of x0 contains an infinite number of elements of A.
Example 1.50 For illustration we list the following trivial examples:
(a) Let X = R, and let A = [0, 1). Then x0 = 0 is a cluster point of A and
u0 = 1 is also a cluster point of A. In fact, the set of cluster points of A
is the interval [0, 1].
(b) Let X = R, and let A = Z. Then A does not have any cluster point.
(c) Let X = R, and let A = {1/n | n ∈ N}. Then x0 = 0 is the only cluster
point of the set A.
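The examples in (b) and (c) can be checked numerically on finite samples; the Python sketch below is an illustration only (a finite count cannot certify infinitude), with the sample bounds chosen arbitrarily.

```python
# Example 1.50(c): x0 = 0 is a cluster point of A = {1/n : n in N} since
# every ball B(0; eps) contains infinitely many elements of A.  Here we
# count the elements with n <= N for a finite sample size N.
def points_of_A_in_ball(eps, N=10**5):
    return sum(1 for n in range(1, N + 1) if 1.0 / n < eps)

for eps in (0.1, 0.01, 0.001):
    assert points_of_A_in_ball(eps) > 100   # many points, however small eps is

# Example 1.50(b): every m in Z is isolated -- the ball B(m; 1/2) meets Z
# only in m itself, so Z has no cluster points.
m = 7
neighbors = [k for k in range(-100, 101) if abs(k - m) < 0.5]
assert neighbors == [m]
```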
The next result is classical in analysis and goes back to Bolzano and Weier-
strass in the case of real numbers.
Theorem 1.51 Any infinite subset of a compact topological space has at least
one cluster point.
Proof. Suppose on the contrary that A is an infinite subset of a compact
topological space X and that A does not have any cluster point. Then for
each x ∈ X, there is an open set Vx containing x such that Vx contains only
a finite number of elements of A. Clearly we have X = ⋃x∈X Vx . The
compactness of X gives us xi ∈ X for i = 1, . . . , m such that
X = Vx1 ∪ . . . ∪ Vxm . It follows that

A = A ∩ X = (A ∩ Vx1 ) ∪ . . . ∪ (A ∩ Vxm ).

This implies that A is finite, a contradiction. □


The following notion of sequential compactness is more convenient to deal
with in comparison to the compactness and may be different from the latter
in the general framework of topological spaces.
Let {xk } be a sequence in a topological space. Recall that if {kl } is a
strictly increasing sequence of positive integers, then {xkl } is called a
subsequence of {xk }. We say that a sequence {xk } in a topological space
X converges to x ∈ X if for any open set V containing x, there is k̃ ∈ N
such that xk ∈ V for all k ≥ k̃.

Definition 1.52 Let X be a topological space, and let K ⊂ X.


(a) K is a sequentially compact set in X if for every sequence {xk } in
K, there exists a subsequence {xkl } that converges to a point x ∈ K.
(b) X is a sequentially compact space if it is a sequentially compact
subset of itself, i.e., every sequence in X has a convergent subsequence.

Proposition 1.53 If X is a metric space, then the compactness of X yields


its sequential compactness.

Proof. Let X be a compact metric space. Fix any sequence {xk } in X and
show that it has a convergent subsequence. Consider the following two cases:
Case 1: The set A := {xk | k ∈ N} is infinite. Theorem 1.51 ensures that in
this case A has a cluster point x0 . Let us show that there exists a subsequence
of xk that converges to x0 . Indeed, it follows from the cluster point definition
that each open ball centered at x0 contains an infinite number of elements of
A. For l = 1, the open ball B(x0 ; 1) contains an element xk1 of A. For l = 2,
the open ball B(x0 ; 1/2) contains an infinite number of elements of A, and
thus there is k2 > k1 such that xk2 ∈ B(x0 ; 1/2). Continuing this process, we
find a sequence of positive integers k1 < k2 < . . . with xkl ∈ B(x0 ; 1/l) for
every l ∈ N. It shows that the sequence {xkl } converges to x0 .
Case 2: The set A = {xk | k ∈ N} is finite. In this case we get x0 ∈ A such
that xk = x0 for infinitely many k, say k1 < k2 < . . .. Hence the constant
subsequence {xkl } converges to x0 . 

In the case of metric spaces, we have the following versions of boundedness


expressed via balls, i.e., via the distance on the space in question.

Definition 1.54 A subset A of a metric space X is called bounded if there


exist x ∈ X and r > 0 such that
A ⊂ B(x; r).
The set A is called totally bounded if for every number ε > 0, there
exist finitely many points x1 , . . . , xm ∈ X for which

A ⊂ B(x1 ; ε) ∪ . . . ∪ B(xm ; ε).
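Total boundedness of A = [0, 1] in R with the usual metric can be exhibited directly: the finitely many centers i·ε suffice for each ε. The Python sketch checks the cover on a grid of sample points (an illustration with arbitrarily chosen values of ε, not a proof).

```python
import math

def eps_net(eps):
    """Finitely many centers whose open eps-balls cover [0, 1]."""
    m = math.ceil(1.0 / eps) + 1
    return [i * eps for i in range(m)]

def covered(x, centers, eps):
    return any(abs(x - c) < eps for c in centers)

for eps in (0.5, 0.1, 0.013):
    centers = eps_net(eps)
    # every sample point of [0, 1] lies in some ball B(c; eps)
    assert all(covered(k / 1000.0, centers, eps) for k in range(1001))
```

Each point of [0, 1] is within ε/2 of a multiple of ε, so the open ε-balls around these centers indeed cover the interval.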

It follows from the definition that if A is totally bounded, then it is


bounded. A sufficient condition for the total boundedness is given below.
Theorem 1.55 Let X be a metric space. If X is sequentially compact, then
X is complete and totally bounded.

Proof. Assume that X is sequentially compact and show that X is complete.


Fix any Cauchy sequence {xk } in X. By the sequential compactness of X, this

sequence has a convergent subsequence. Thus the sequence itself is convergent,


since a Cauchy sequence having a convergent subsequence is convergent. Next
we will verify that X is totally bounded. Indeed, suppose on the contrary that
it is not totally bounded. Then there is ε > 0 such that the space X cannot
be covered by a finite union of open balls with radius ε. Fix an open ball
B(x1 ; ε) with x1 ∈ X. There must be a point x2 ∈ X with x2 ∉ B(x1 ; ε),
and so d(x1 , x2 ) ≥ ε. Similarly we get X ≠ B(x1 ; ε) ∪ B(x2 ; ε), and hence
there exists x3 ∈ X such that d(x1 , x3 ) ≥ ε and d(x2 , x3 ) ≥ ε. Following this
process, find a sequence {xk } such that d(xk , xp ) ≥ ε for every k ≠ p. By the
sequential compactness of X, the sequence {xk } must have a convergent, and
hence Cauchy, subsequence {xkl }, which contradicts the fact that
d(xkl , xkp ) ≥ ε for l ≠ p. □
The next classical result is fundamental in the theory of metric spaces. It
is known as the Heine-Borel theorem.
Theorem 1.56 Let X be a metric space. Then X is compact if and only if it
is sequentially compact.
Proof. By Proposition 1.53, we only need to show that sequential compactness
implies compactness. Suppose that X is sequentially compact. Assuming on
the contrary that X is not compact tells us that there exists an open covering
{Gα }α∈I that does not have any finite subcovering. Theorem 1.55 tells us that
X is totally bounded. Thus for ε = 1 we can cover X by a finite number of open
balls with radius 1. Then for at least one of those balls, denoted by B1 , the set
X1 := X ∩ B1 cannot be covered by a finite number of the sets Gα . Since X1
is a subset of X, it can be covered by a finite number of balls with radius 1/2.
Likewise for at least one of those balls, denoted by B2 , the set
X2 := X1 ∩ B2 ⊂ X1 cannot be covered by a finite number of the sets Gα .
Continuing this process, we find a sequence of balls Bn with radius 1/2ⁿ⁻¹ and
nonempty sets Xn := Xn−1 ∩ Bn for n ∈ N. It is obvious that Xn ⊂ Xn−1 for
every n; picking xn ∈ Xn and using the sequential compactness of X, we find
in this way a point x0 ∈ ⋂n∈N X̄n . Pick now α0 ∈ I and r > 0 with
B(x0 ; r) ⊂ Gα0 . Since x0 ∈ X̄n ⊂ B̄n for every n and the radii of the balls Bn
shrink to zero, we can find n ∈ N so large that Bn ⊂ B(x0 ; r) ⊂ Gα0 . Thus
Xn ⊂ Gα0 , which is a clear contradiction that completes the proof of the
theorem. □
The following theorem provides yet another characterization of compact-
ness in the framework of metric spaces.
Theorem 1.57 Let X be a metric space. Then the following are equivalent:
(a) X is compact.
(b) X is complete and totally bounded.
Proof. Observe first that implication (a)=⇒(b) is a consequence of Theo-
rems 1.55 and 1.56. To verify the opposite implication (b)=⇒(a), assume that
X is complete and totally bounded. To prove the compactness of X, we only
need to check that every sequence {xk } has a Cauchy subsequence, since the
completeness of X yields in this case the convergence of this subsequence, and
then Theorem 1.56 applies. It follows from the total boundedness of X that
this space can be covered by finitely many balls of radius r = 1. Thus there
exists at least one of these balls, denoted by B1 , which contains xk for
infinitely many k ∈ N. Let {xk(1) } be a subsequence of {xk } formed by those
elements. Likewise X is covered by finitely many open balls of radius r = 1/2,
and there exists such a ball, denoted by B2 , which contains infinitely many
xk(1) . Denote by {xk(2) } a subsequence of {xk(1) } formed by those elements
and then continue this process with balls of radius 1/l at step l. Now fix an
element xk1 from {xk(1) }, in {xk(2) } choose xk2 with k2 > k1 , in {xk(3) }
choose xk3 with k3 > k2 , and so on. In this way we construct a subsequence
{xkl } with the property that if p ≥ l, then xkl , xkp ∈ Bl . Thus
d(xkl , xkp ) ≤ 2/l → 0 as l, p → ∞, and so {xkl } is a Cauchy subsequence.
Since the space X is complete, this subsequence is convergent. □
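The diagonal selection in the proof above can be mimicked on finite data in X = [0, 1], with repeated bisection standing in for the finite covers by balls of shrinking radius (an illustrative sketch, not the proof itself): at each stage we keep a half-interval that still contains many terms of the sequence and select one index from it, so consecutively selected terms get arbitrarily close.

```python
import math

def cauchy_subsequence(x, stages):
    """Select indices of x (a sequence in [0, 1]) by repeated bisection."""
    lo, hi = 0.0, 1.0
    idx, start = [], 0
    for _ in range(stages):
        mid = (lo + hi) / 2.0
        left = [k for k in range(start, len(x)) if lo <= x[k] <= mid]
        right = [k for k in range(start, len(x)) if mid < x[k] <= hi]
        # Pigeonhole stand-in: keep the half holding more remaining terms.
        if len(left) >= len(right):
            chosen, hi = left, mid
        else:
            chosen, lo = right, mid
        assert chosen, "finite sample exhausted; use more data or fewer stages"
        idx.append(chosen[0])
        start = chosen[0] + 1
    return idx

x = [abs(math.sin(k)) for k in range(10**5)]   # an arbitrary sequence in [0, 1]
idx = cauchy_subsequence(x, stages=10)
assert all(i < j for i, j in zip(idx, idx[1:]))   # a genuine subsequence
# The terms selected at stages 9 and 10 share an interval of length 2**(-9).
assert abs(x[idx[-1]] - x[idx[-2]]) <= 2.0 ** (-9)
```

After stage l the working interval has length 2^(−l) and contains all later selections, which is the Cauchy-type estimate of the proof with 2^(−l) in place of 2/l.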

The result of the next theorem characterizes compact sets in finite-


dimensional spaces.

Theorem 1.58 Any closed and bounded subset of Rn is compact.

Proof. Let K be a closed and bounded subset of Rn . Then it is itself a complete


metric space with respect to the metric induced by the Euclidean norm on
Rn . It is not hard to show that K is totally bounded, and hence compact. 

To conclude this subsection, we present the classical Weierstrass theorem


on the existence of maxima and minima for continuous functions on compact
topological spaces.
Theorem 1.59 Let X be a compact topological space, and let f : X → R be
a continuous real-valued function. Then this function achieves its minimum
and maximum values on X.

Proof. We need to verify the existence of a, b ∈ X such that

f (a) = inf{f (x) | x ∈ X} and f (b) = sup{f (x) | x ∈ X}.

Since X is compact, Proposition 1.44 tells us that f (X) is compact in R. The
compactness of any nonempty subset K ⊂ R ensures that both numbers
inf(K) and sup(K) are finite and belong to K. Thus f achieves its (absolute,
global) minimum and maximum on X. □
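As a numerical companion to Theorem 1.59 (an illustration rather than a proof), one can sample a continuous function on a compact interval and observe that the extremal values are attained. Here f(x) = x(1 − x) on [0, 1], with the grid chosen so that the extremal points x = 0, 1/2, 1 land on it exactly.

```python
# f attains its minimum 0 at the endpoints and its maximum 1/4 at x = 1/2.
f = lambda x: x * (1.0 - x)
grid = [k / 10**4 for k in range(10**4 + 1)]   # includes 0, 0.5, and 1 exactly
values = [f(x) for x in grid]

assert min(values) == 0.0
assert max(values) == 0.25
```

On a non-compact domain such as (0, 1) the infimum 0 would not be attained, which is why compactness is essential in the theorem.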

1.1.10 Connectedness and Disconnectedness

The material of this subsection plays an essential role in the subsequent study
of convex geometry and related issues.

Definition 1.60 Let X be a topological space. We say that X is discon-


nected if there exist two nonempty open sets U and V such that U ∩ V = ∅
and U ∪ V = X. If X is not disconnected, then it is called connected.
Accordingly, a subset of a topological space is called disconnected or connected
if it is disconnected or connected as a subspace of X, respectively.

The following proposition gives us several convenient characterizations of


disconnectedness of sets in topological spaces.

Proposition 1.61 Let X be a topological space. Then X is disconnected if


and only if X contains a nonempty proper subset that is both open and closed.

Proof. Suppose that X is disconnected. Then there exist two nonempty open
sets U and V such that U ∩ V = ∅ and U ∪ V = X. Since U = V c = X \ V ,
we see that U is both open and closed being a nonempty proper subset of X.
Conversely, suppose that there exists a nonempty proper subset U of X that
is both open and closed. Set V := U c . Then V is nonempty and open with
X = U ∪ V and U ∩ V = ∅. Thus X is disconnected. 
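Proposition 1.61 turns into a finite algorithm: search for a nonempty proper subset that is simultaneously open and closed. The Python sketch below applies it to two illustrative three-point topologies.

```python
def proper_clopen_sets(X, tau):
    """All nonempty proper subsets of X that are both open and closed."""
    opens = {frozenset(U) for U in tau}
    return [set(U) for U in opens if U and U != X and (X - U) in opens]

X = frozenset({1, 2, 3})
tau_connected = [set(), {1}, {1, 2}, {1, 2, 3}]   # a chain of open sets
tau_split = [set(), {1}, {2, 3}, {1, 2, 3}]       # {1} and {2, 3} are clopen

assert proper_clopen_sets(X, tau_connected) == []   # connected space
assert {1} in proper_clopen_sets(X, tau_split)      # disconnected space
```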

Definition 1.62 Let X be an arbitrary set, and let A be a subset of X. The


characteristic function associated with A is defined by

χA (x) := 1 if x ∈ A, and χA (x) := 0 if x ∈ X \ A.    (1.2)

Proposition 1.63 A topological space X is disconnected if and only if it has


a nonconstant continuous characteristic function.

Proof. Suppose that X is disconnected. Then there exist two nonempty open
sets U and V such that U ∩ V = ∅ and U ∪ V = X. Considering the charac-
teristic function χU , we see that it is nonconstant; it is also continuous, since
the preimage of any open subset of R is one of the open sets ∅, U , V , or X.
Conversely, suppose that there is a nonconstant continuous function f = χA .
Define U := f −1 ((−∞, 1/2)) and V := f −1 ((1/2, ∞)). Then these sets are
nonempty, open, and disjoint with U ∪ V = X. Thus X is disconnected. □

Proposition 1.64 Let X be a topological space, and let D be a subset of X.


Then D is a disconnected set if and only if there exist two open sets U and V
of X such that D ⊂ U ∪ V , U ∩ D ≠ ∅, V ∩ D ≠ ∅, and D ∩ U ∩ V = ∅.

Proof. This proposition follows directly from the definition of disconnected


subsets in a topological space and the fact that a set G is open in D with the
subspace topology if and only if there exists an open set U in X such that
G = D ∩ U. 

Proposition 1.65 A subset D of a topological space X is disconnected if and
only if there exist two nonempty sets A and B such that D = A ∪ B,
Ā ∩ B = ∅, and A ∩ B̄ = ∅.

Proof. Let us verify the converse implication first. Suppose that there exist
two nonempty sets A and B such that D = A ∪ B, Ā ∩ B = ∅, and
A ∩ B̄ = ∅. To show that D is disconnected, we only need to check that A
and B are open in D. Indeed, it is easy to see that A ⊂ (B̄)c , where the set
U := (B̄)c is open in X. Moreover, we get the equalities

D ∩ U = D ∩ (B̄)c = (A ∪ B) ∩ (B̄)c = (A ∩ (B̄)c ) ∪ (B ∩ (B̄)c ) = A ∪ ∅ = A,

and thus A is open in D. The verification of openness for B in D is similar.
Now suppose that D is disconnected. Then there exist two open sets U and
V in X such that D ⊂ U ∪ V , U ∩ D ≠ ∅, V ∩ D ≠ ∅, and D ∩ U ∩ V = ∅. It
follows that D = D ∩ (U ∪ V ) = (D ∩ U ) ∪ (D ∩ V ). Setting A := D ∩ U and
B := D ∩ V , we get D = A ∪ B. Since D ∩ U ∩ V = ∅, we have D ∩ U ⊂ V c
with V c closed, and thus conclude that

Ā ∩ B = cl(D ∩ U ) ∩ (D ∩ V ) ⊂ V c ∩ D ∩ V = ∅.

The proof of A ∩ B̄ = ∅ is similar. □
Example 1.66 It is easy to check that any connected subset of the real line
is an interval of R (open, closed, half-open, possibly unbounded or degenerate).
The next important result shows that connectedness is preserved under
taking images of continuous mappings.
Theorem 1.67 Let f : X → Y be a continuous mapping between topological
spaces. Suppose that X is connected. Then f (X) is a connected subset of Y .
Proof. Suppose on the contrary that f (X) is disconnected. Then there exist
two open sets U and V in Y such that f (X) ⊂ U ∪ V , U ∩ f (X) ≠ ∅,
V ∩ f (X) ≠ ∅, and f (X) ∩ U ∩ V = ∅. Defining now the sets G1 := f −1 (U )
and G2 := f −1 (V ), observe that G1 and G2 are nonempty and open in X
with X = G1 ∪ G2 and G1 ∩ G2 = ∅. Thus we arrive at a contradiction with
the connectedness of X, which therefore verifies the claim of the theorem. □
We finish this subsection by showing that connectedness is preserved under
taking unions of connected sets. The following lemma is useful for this purpose.

Lemma 1.68 Assume that U and V are nonempty open subsets of X such
that U ∩ V = ∅ and X = U ∪ V . If C is a connected set in X, then either
C ⊂ U or C ⊂ V .
Proof. Suppose on the contrary that C is not a subset of U and that C is also
not a subset of V . Then U1 := C ∩ U ≠ ∅ and V1 := C ∩ V ≠ ∅. Moreover,
these sets are open in C with U1 ∪ V1 = C and U1 ∩ V1 = ∅. This is a
contradiction since C is assumed to be connected. □
Proposition 1.69 Let {Aα }α∈I be a collection of connected subsets of X with
at least one point in common. Then the set C := ⋃α∈I Aα is connected.

Proof. Suppose on the contrary that C = U ∪ V for nonempty, disjoint sets
U and V that are open in C. By Lemma 1.68, each connected set Aα lies
entirely in U or entirely in V . Since all the sets Aα share a common point,
they must all lie in the same one of these sets, which forces the other to be
empty. This contradiction shows that C is connected. □

1.1.11 Net Convergence in Topological Spaces

This subsection addresses the concepts of nets, subnets, and net convergence
in general topological spaces. The net language has been recognized as a
convenient tool to deal with convergence, compactness, and related issues
in topological spaces that are not metrizable; in particular, in dual spaces
to nonseparable Banach spaces endowed with the weak∗ topology that are
considered below. In such settings, net convergence is strictly different from
the sequential one while playing a significant role in many aspects of convex
and variational analysis.
To proceed, we consider a nonempty set I with a binary relation ⪯ on I.
The binary relation ⪯ is a preorder if it is reflexive and transitive in the sense
that for any α, β, and γ in I the following properties hold:
(a) α ⪯ α (reflexivity).
(b) If α ⪯ β and β ⪯ γ, then α ⪯ γ (transitivity).
Then (I, ⪯) is called a preordered set. We say that a preordered set (I, ⪯) is a
directed set if for any α, β ∈ I, there exists γ ∈ I such that α ⪯ γ and β ⪯ γ.
Now we are ready to define the notions of nets and net convergence.
Definition 1.70 Let X be a topological space, and let I be a directed set.
Consider a function x : I → X. Given any α ∈ I, denote xα := x(α) and say
that {xα }α∈I (or {xα }) is a net in X.

Definition 1.71 Let {xα }α∈I be a net in a topological space X. We say that
the net {xα }α∈I converges to x ∈ X and write it as lim xα = x if for any
neighborhood V of x, there exists an element α0 ∈ I such that

xα ∈ V whenever α0 ⪯ α.
The following two examples describe particular cases of nets. The first one
treats sequences as nets with directed sets of natural numbers.
Example 1.72 Let {xk }k∈N be a sequence in a topological space X, and let
I = N, which is a directed set with the usual “less than or equal to” relation.
Then it is obvious that {xk }k∈N is a net.
Example 1.73 Let X be a topological space, and let x ∈ X. Denote by Nx
the collection of all open sets containing this point x and define the following
preorder on Nx :

U ⪯ V if and only if U ⊃ V.

It is easy to verify that (Nx , ⪯) is a directed set. For each V ∈ Nx , we pick
a point xV ∈ V and observe that {xV } is a net. Let us now show that {xV }
converges to x. Indeed, fix any neighborhood V of x and choose V0 := V . If
W ∈ Nx and V0 ⪯ W , then xW ∈ W ⊂ V and thus lim xV = x by the above
definitions.
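A concrete instance of this construction can be simulated in X = R at x = 0 by restricting attention to the cofinal family of basic neighborhoods Vn = (−1/n, 1/n), ordered by reverse inclusion as above: picking an arbitrary point of each Vn produces a net converging to 0. The random choices below are purely illustrative.

```python
import random

random.seed(0)

def pick(n):
    """An arbitrary point of the neighborhood V_n = (-1/n, 1/n)."""
    return random.uniform(-1.0 / n, 1.0 / n)

net = {n: pick(n) for n in range(1, 10**4 + 1)}

# Net convergence along this family: past the index n0 > 1/eps, every
# chosen point lies in (-eps, eps), matching Definition 1.71.
for eps in (0.1, 0.01):
    n0 = int(1.0 / eps) + 1
    assert all(abs(net[n]) < eps for n in range(n0, 10**4 + 1))
```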

Proposition 1.74 Let X be a topological space, and let A be a subset of X.
Then x ∈ Ā if and only if there exists a net {xα }α∈I ⊂ A that converges to x.
Moreover, A is a closed set if and only if for any net {xα }α∈I ⊂ A converging
to x we have x ∈ A.

Proof. Take x ∈ Ā and denote by Nx the collection of all open sets containing
x. Given U, V ∈ Nx , define the relation

U ⪯ V if and only if U ⊃ V.

Then I := Nx and the preorder ⪯ form a directed set. For any U ∈ Nx , we
have U ∩ A ≠ ∅, and thus we can choose xU ∈ U ∩ A. To verify the net
convergence of {xU }U ∈I to x, fix any neighborhood V of x and let U0 := V .
Then for any U ∈ I with U0 ⪯ U we get U ⊂ U0 = V and so xU ∈ U ⊂ V ,
which justifies the claimed convergence. To verify the converse statement,
suppose that there exists a net {xα }α∈I ⊂ A that converges to x. Fix any
neighborhood V of x and find by the definition α0 ∈ I with xα0 ∈ V . It
follows that xα0 ∈ V ∩ A, and so V ∩ A ≠ ∅. Thus we have x ∈ Ā, while the
rest of the proof is straightforward. □

The next important result equivalently describes the continuity of mappings
between general topological spaces via net convergence. Note that such
a convergence description of continuity is not possible in general via sequences
unless the spaces in question are metric (or metrizable).

Theorem 1.75 Let X and Y be topological spaces, and let f : X → Y . Then
f is continuous at x0 ∈ X if and only if for any net {xα }α∈I converging to
x0 the net {f (xα )}α∈I converges to f (x0 ).

Proof. Suppose that f is continuous at x0 and that lim xα = x0 . Fix any open
set V that contains f (x0 ) and then find an open set U containing x0 such that
f (U ) ⊂ V . Since lim xα = x0 , there exists α0 ∈ I with
xα ∈ U whenever α0 ≼ α.
It follows furthermore that
f (xα ) ∈ V whenever α0 ≼ α,
and thus the net {f (xα )} converges to f (x0 ).
To verify the converse statement, suppose that for any net {xα } that
converges to x0 the corresponding net of values {f (xα )}α∈I converges to f (x0 ).
If f is not continuous at x0 , find an open set V that contains f (x0 ) and such
that for any open set U containing x0 we have f (U ) ⊄ V . Denote by Nx0 the
collection of all open sets containing x0 and recall from the discussions above
that I := Nx0 is a directed set with the relation ≼ defined by the set inclusions
as in Proposition 1.74. Fix now U ∈ I and choose xU ∈ U with f (xU ) ∉ V .
30 1 FUNDAMENTALS

To show first that {xU }U ∈I converges to x0 , fix any neighborhood W of x0
and let U0 := W . Then for any U ∈ I with U0 ≼ U , we get U ⊂ U0 = W and
so xU ∈ U ⊂ W . This verifies that {xU }U ∈I converges to x0 . Since f (xU ) ∉ V
for every U ∈ I and V is an open set containing f (x0 ), the net {f (xU )}U ∈I
does not converge to f (x0 ), and thus we complete the proof. □
Recall that if {xk } is a sequence in a topological space X and if {kl }l∈N is a
strictly increasing sequence of positive integers, then {xkl }l∈N is a subsequence
of {xk }. Note that l ≤ kl for every l ∈ N, and so for any k0 ∈ N there exists
an index l0 ∈ N such that
k0 ≤ kl whenever l0 ≤ l.
This property can be compared with the next definition of subnets.
Definition 1.76 Let {xα }α∈I be a net, and let J be a directed set. Consider
a function ϕ : J → I. Suppose that the following conditions hold:
(a) ϕ : J → I is increasing, i.e., if μ1 ≼ μ2 , then ϕ(μ1 ) ≼ ϕ(μ2 ).
(b) For any α0 ∈ I, there exists μ0 ∈ J such that α0 ≼ ϕ(μ0 ).
Denote further αμ := ϕ(μ) for μ ∈ J. Then we say that {xαμ }μ∈J is a subnet
of the net {xα }α∈I .
It follows from the definition that for any α0 ∈ I, there exists μ0 ∈ J such
that α0 ≼ αμ = ϕ(μ) whenever μ0 ≼ μ. Note also that in the above definition,
we use the same preorder notation “≼” for I and J if no confusion arises.
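As a small computational sanity check (the index sequence below is an arbitrary illustrative choice, not taken from the text), one can verify that a strictly increasing subsequence index map ϕ(l) := kl satisfies conditions (a) and (b) of Definition 1.76, so every subsequence is indeed a subnet:

```python
# Check, on a sample strictly increasing index sequence, that the map
# phi(l) = k_l satisfies the subnet conditions of Definition 1.76.
k = [2, 3, 5, 8, 13, 21]                      # k_1 < k_2 < ... < k_6
phi = {l: k[l - 1] for l in range(1, len(k) + 1)}

# (a) phi is increasing with respect to the usual order on N
assert all(phi[l] <= phi[l + 1] for l in range(1, len(k)))

# the auxiliary property l <= k_l noted before Definition 1.76
assert all(l <= phi[l] for l in phi)

# (b) cofinality: every k0 up to the sampled horizon is dominated by
# phi(mu0) for some mu0 in the new index set
for k0 in range(1, max(k) + 1):
    assert any(k0 <= phi[mu] for mu in phi)
```

The same two conditions are exactly what Proposition 1.77 below uses to pass convergence from a net to its subnets.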
Proposition 1.77 Let X be a topological space, and let {xα }α∈I be a net in
X that converges to x. Then every subnet of {xα }α∈I converges to x.
Proof. Suppose that lim xα = x and that {xαμ }μ∈J is a subnet of {xα }α∈I .
Fix any open set V that contains x. Then there exists α0 ∈ I satisfying
xα ∈ V whenever α0 ≼ α.
Choose μ0 ∈ J such that α0 ≼ αμ whenever μ0 ≼ μ. It follows that
xαμ ∈ V whenever μ0 ≼ μ,
and therefore we get lim xαμ = x. □
Lemma 1.78 Let X be a compact topological space, and let {xα }α∈I be a net
in X. Given α ∈ I, define the set
Cα := cl {xγ | α ≼ γ}.
Then we have ⋂α∈I Cα ≠ ∅.

Proof. Since {Cα }α∈I is a collection of nonempty closed sets in X, it suffices to
show by Theorem 1.46 that this family satisfies the finite intersection property.
Fix any α1 , . . . , αm ∈ I and find α0 ∈ I such that
αi ≼ α0 for all i = 1, . . . , m.
Then Cα0 ⊂ Cαi for all i = 1, . . . , m, and so Cα1 ∩ · · · ∩ Cαm ≠ ∅. □

The next important result provides an equivalent description of compactness
in topological spaces via net convergence. Observe again that such a
description does not hold in general if nets are replaced by sequences.

Theorem 1.79 Let X be a topological space. Then X is compact if and only
if every net in X has a convergent subnet.

Proof. =⇒: Suppose that the space X is compact and take an arbitrary net
{xα }α∈I in X. Define the collection of sets {Cα }α∈I as in Lemma 1.78 and
choose x ∈ ⋂α∈I Cα . We need to show that there exists a subnet of {xα }α∈I
which converges to x. Denote by Nx the collection of all open sets containing
x. Given U ∈ Nx and α ∈ I, we have U ∩ {xγ | α ≼ γ} ≠ ∅. Thus it is possible
to choose γU,α ∈ I for which we have
α ≼ γU,α and xγU,α ∈ U.
Let further J := Nx × I and for any (U, α) and (V, β) in J define the relation
(U, α) ≼ (V, β) if and only if U ⊃ V and γU,α ≼ γV,β ,
which is clearly a preorder on J. For such (U, α) and (V, β), let W := U ∩ V
and choose γ ∈ I satisfying γU,α ≼ γ and γV,β ≼ γ. Then we easily see that
(U, α) ≼ (W, γ) and (V, β) ≼ (W, γ). Thus (J, ≼) is a directed set. Next we
define ϕ : J → I by ϕ(U, α) := γU,α and note that {xϕ(U,α) }(U,α)∈J is a subnet
of {xα }α∈I . To show that lim xϕ(U,α) = x, fix an open set V containing x,
let V0 := V , and pick α0 ∈ I. If (W, β) ∈ J and (V0 , α0 ) ≼ (W, β), then
W ⊂ V0 = V . It follows that xϕ(W,β) ∈ W ⊂ V0 = V , which readily justifies
that lim xϕ(U,α) = x.
⇐=: Suppose that every net in X has a convergent subnet. To verify that X
is compact, it suffices to show by Theorem 1.46 that ⋂C∈C C ≠ ∅ whenever
C is a collection of closed subsets of X that satisfies the finite intersection
property. Define the set
I := { C1 ∩ · · · ∩ Cm | C1 , . . . , Cm ∈ C, m ∈ N }
and define a preorder ≼ on I by
A ≼ B if and only if B ⊂ A.

Choosing xA ∈ A for each A ∈ I, observe that {xA }A∈I is a net in X. Then
there is a subnet {xϕ(B) }B∈J that converges to some x ∈ X. Furthermore, for
any A ∈ I there exists B0 ∈ J such that A ≼ ϕ(B) whenever B0 ≼ B. It
follows that xϕ(B) ∈ ϕ(B) ⊂ A for all B0 ≼ B. Since A is closed, we have
by Proposition 1.74 that x ∈ A. Therefore, x ∈ ⋂A∈I A = ⋂C∈C C and so
⋂C∈C C ≠ ∅, which verifies that X is compact. □

1.2 Topological Vector Spaces


This section is devoted to reviewing major facts concerning a large subclass
of topological spaces where the topology is consistent with a linear structure.
This combination, which is crucial for the main applications of the book,
creates a rich framework for deriving powerful results of convex analysis.

1.2.1 Basic Concepts in Topological Vector Spaces

We start with some basic definitions and elementary properties. Suppose in
what follows that all spaces in question together with their duals are nonzero.

Definition 1.80 Let X be a vector space over a scalar field F (R or C in what
follows), and let τ be a topology on X. We say that (X, τ ) is a topological
vector space (TVS) if
(a) Every single-point set {x} for x ∈ X is closed;
(b) The addition + : X × X → X and the scalar multiplication · : F × X → X
are continuous.

Note that if (a) and (b) are satisfied, then X is Hausdorff; see Proposition
1.92 presented below.
The next proposition is a direct consequence of continuity of the addition
and the scalar multiplication in a topological vector space. Note that we con-
sider the usual topology on F and the product topologies on X × X and F × X
as defined earlier.
Proposition 1.81 Let X be a topological vector space. Then
(a) For any x0 , y0 ∈ X and any open set W that contains x0 + y0 , there
exist open sets U and V containing x0 and y0 , respectively, such that
U + V ⊂ W.
(b) For any scalar λ0 ∈ F, x0 ∈ X and for any open set W that contains λ0 x0 ,
there exist δ > 0 and an open set U containing x0 such that λU ⊂ W for
all λ ∈ F with |λ − λ0 | < δ.
Proof. The mapping f : X × X → X given by f (x, y) := x + y for x, y ∈ X
is continuous with f (x0 , y0 ) ∈ W . Then there is an open set G containing
(x0 , y0 ) with respect to the product topology on X × X such that f (G) ⊂ W .
We can find further

open sets U and V in X such that x0 ∈ U and y0 ∈ V with U × V ⊂ G. Thus
f (U × V ) = U + V ⊂ W , which verifies the first statement. The proof of the
second one is also straightforward. □

The following properties are easy consequences of the definitions.

Proposition 1.82 Let X be a topological vector space. Then

(a) Given any a ∈ X, the translation operator Ta : X → X defined by
Ta (x) := x + a is a homeomorphism.
(b) Given any scalar α ≠ 0, the multiplication operator ϕα : X → X defined
by ϕα (x) := αx is a homeomorphism.

Proof. To check first the continuity of Ta , take an open set W ⊂ X that
contains Ta (x0 ) = a + x0 . Proposition 1.81 tells us that there exist open sets
U ∋ x0 and V ∋ a such that U + V ⊂ W . Thus Ta (U ) = U + a ⊂ W and Ta is
continuous. It is also obvious that Ta is a bijection with the inverse mapping
Ta−1 = T−a being continuous as well. Since (ϕα )−1 = ϕ1/α for any α ≠ 0, the
rest of the proof is straightforward. □

The next elementary consequence of the definitions and observations above
shows that the entire topology in a topological vector space is fully determined
by neighborhoods of the origin.

Corollary 1.83 Let X be a topological vector space. Then for any a ∈ X and
α ∈ F with α ≠ 0, we have the properties:
(a) V is a neighborhood of the origin if and only if V + a is a neighborhood
of a.
(b) V is a neighborhood of a if and only if αV is a neighborhood of αa.
Now we continue with some topological properties of sets in topological
vector spaces under summation and scalar multiplication.

Proposition 1.84 Let X be a topological vector space, and let Ω, A, and B be
subsets of X. Then we have
(a) α cl Ω = cl(αΩ) for all α ≠ 0.
(b) α int(Ω) = int(αΩ) for all α ≠ 0.
(c) cl A + cl B ⊂ cl(A + B).
(d) int(A) + int(B) ⊂ int(A + B).

Proof. (a) Fix any α ≠ 0 and consider the mapping f : X → X given by
f (x) := αx for x ∈ X. Since f is a homeomorphism, Proposition 1.14(c) tells
us that f (cl Ω) = cl f (Ω). This yields α cl Ω = cl(αΩ).
(b) This equality follows from Proposition 1.14(d).
(c) By the continuity of the mapping f : X ×X → X defined by f (x, y) := x+y
for (x, y) ∈ X × X we get

cl A + cl B = f (cl A × cl B) = f (cl(A × B)) ⊂ cl f (A × B) = cl(A + B).
(d) Fix any x ∈ int(A) + int(B) and find a ∈ int(A) and b ∈ int(B) with
x = a + b. Choose neighborhoods U and V of the origin such that a + U ⊂ A
and b + V ⊂ B. Then x + (U + V ) = (a + U ) + (b + V ) ⊂ A + B. Since U + V
is also a neighborhood of the origin, x ∈ int(A + B). 

The following two properties of sets are often used in the theory of topo-
logical vector spaces.

Definition 1.85 Let X be a topological vector space. Then

(a) A subset Ω of X is said to be balanced if λΩ ⊂ Ω whenever |λ| ≤ 1.


(b) A subset Ω is said to be absorbing if for any x ∈ X, there exists δ > 0
such that λx ∈ Ω whenever |λ| ≤ δ.

As seen from the definition, any balanced set Ω is symmetric, i.e., Ω = −Ω,
but the converse is not true in general; see Figures 1.1 and 1.2.

Fig. 1.1. A balanced set

Example 1.86 It is easy to observe that a subset Ω of a topological vector
space is absorbing if 0 ∈ int(Ω). On the other hand, the set
Ω = {(x, y) ∈ R2 | y ≥ x2 } ∪ {(x, y) ∈ R2 | y ≤ 0}

is absorbing while 0 ∉ int(Ω); see Figure 1.3. Note that this set is nonconvex.
It is possible to construct an example of a convex and absorbing set Ω in
a topological vector space such that 0 ∉ int(Ω). Consider, e.g., the space X of
measurable and essentially bounded functions f : [0, 1] → R equipped with
the norm ‖f ‖1 of L1 [0, 1], while the norm ‖f ‖∞ of L∞ [0, 1] is also used in
what follows. Let Ω := {f ∈ X | ‖f ‖∞ ≤ 1} be a convex subset of the space
above. To show that Ω is absorbing, fix any f ∈ X and let α := ‖f ‖∞ . We
can easily see that tf ∈ Ω whenever |t| < 1/(α + 1), which verifies the absorbing
property of Ω. Furthermore, we have 0 ∉ int(Ω). Suppose on the contrary
that 0 ∈ int(Ω) and find δ > 0 such that B(0; δ) ⊂ Ω. Define
f (t) := 2χ[0,δ/2] (t) for all t ∈ [0, 1]
via the characteristic function (1.2) of the set A := [0, δ/2] ⊂ [0, 1]. Then
‖f ‖1 = δ, and thus f ∈ B(0; δ). On the other hand, we get ‖f ‖∞ = 2, and so
f ∉ Ω. The obtained contradiction shows that 0 ∉ int(Ω).
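The two norms in this example can be confirmed numerically. The sketch below discretizes [0, 1] on a uniform grid (the grid size and the value δ = 0.1 are illustrative choices) and checks that f = 2χ[0,δ/2] has ‖f ‖1 = δ while ‖f ‖∞ = 2:

```python
# Riemann-sum check that f = 2*chi_[0, delta/2] has small L^1 norm but
# L^infinity norm equal to 2 (delta and the grid size are illustrative).
delta = 0.1
n = 100_000
h = 1.0 / n
f = [2.0 if i * h <= delta / 2 else 0.0 for i in range(n)]

l1_norm = sum(abs(v) * h for v in f)     # approximates the L^1 norm
sup_norm = max(abs(v) for v in f)        # the essential supremum here

assert abs(l1_norm - delta) < 1e-3       # ||f||_1 is (approximately) delta
assert sup_norm == 2.0                   # but ||f||_inf = 2, so f is not in Omega
```

So every L1-ball around the origin contains functions of sup-norm larger than 1, which is the contradiction used in the example.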

Fig. 1.2. A symmetric set that is not balanced

Next we describe a large class of absorbing and balanced sets.



Fig. 1.3. An absorbing set

Proposition 1.87 Any neighborhood V of the origin in a topological vector
space X is absorbing. In addition, there exists a balanced neighborhood U of
the origin satisfying the inclusions U ⊂ U + U ⊂ V .

Proof. Fix x0 ∈ X and find by Proposition 1.81 a number δ > 0 and a
neighborhood U of x0 such that λU ⊂ V whenever |λ| < δ. In particular, we
have λx0 ∈ V whenever |λ| < δ, and so V is absorbing. Applying Proposition
1.81 again gives us neighborhoods V1 and V2 of the origin with V1 + V2 ⊂ V .
Then V0 := V1 ∩ V2 is also a neighborhood of the origin. Choose δ > 0 and
a neighborhood U0 of the origin satisfying λU0 ⊂ V0 whenever |λ| < δ. Then
U := ⋃|λ|<δ λU0 is a neighborhood of the origin with U ⊂ U + U ⊂ V0 + V0 ⊂
V . To check that U is balanced, fix any γ ∈ R with |γ| ≤ 1 and pick any
x ∈ U with x = λu0 , where |λ| < δ and u0 ∈ U0 . Then |γλ| ≤ |λ| < δ, and
thus γx = (γλ)u0 ∈ U . □

For the next proposition we need the following definition.


Definition 1.88 Let X be a topological space, and let B be a collection of
neighborhoods of x0 ∈ X. We say that B is a basis of neighborhoods of
x0 if for any neighborhood V of x0 , there exists U ∈ B with x0 ∈ U ⊂ V .
Further properties of absorbing and balanced sets are listed below.

Proposition 1.89 Let X be a topological vector space. Then there exists a
basis of neighborhoods of the origin B0 with the following properties:

(a) V is balanced and absorbing for all V ∈ B0 .
(b) αV ∈ B0 for all α ∈ R with α ≠ 0 and V ∈ B0 .
(c) For any V ∈ B0 , there exists U ∈ B0 such that U + U ⊂ V .
(d) For any V1 , V2 ∈ B0 , there exists V ∈ B0 such that V ⊂ V1 ∩ V2 .

Proof. Denoting by B0 the collection of all balanced neighborhoods of the origin,
we directly check the validity of all the listed properties. □

Now we define the notion of boundedness for subsets of general topological
vector spaces without using any distance structure and the like.

Definition 1.90 A subset Ω of a topological vector space is said to be bounded
if for every neighborhood V of the origin, there exists a scalar α ∈ R such that
Ω ⊂ αV.

A useful characterization of boundedness is given as follows.

Proposition 1.91 A subset A of a topological vector space is bounded if and
only if for any neighborhood V of the origin there exists γ ≥ 0 such that
A ⊂ αV whenever |α| ≥ γ.

Proof. We only need to check the implication =⇒ in the proposition. Fix a
neighborhood V of 0 and choose a balanced neighborhood U of the origin such
that U ⊂ V . Since A is bounded, there exists β ∈ R such that A ⊂ βU = |β|U .
Define γ := |β| and observe that A ⊂ |β|U ⊂ αU ⊂ αV whenever |α| ≥ γ due
to the balanced property of U . This verifies the claim. □

Employing the balanced property, we make the following observation.

Proposition 1.92 Let X be a topological vector space. Then X is regular (T3 )
while being, in particular, a Hausdorff topological space.

Proof. It suffices to show that for any closed set F ⊂ X and any x ∉ F , there
exists an open neighborhood V of the origin such that
(x + V ) ∩ (F + V ) = ∅.
Since x ∈ F c for the open complement F c , we find a balanced open neighborhood
U of the origin with x + U ⊂ F c . Then there exists a balanced open
neighborhood V of the origin such that
x + V + V ⊂ F c.
It easily follows from the symmetry of V that
(x + V ) ∩ (F + V ) = ∅,
which therefore verifies the statements of the proposition. □

The next result provides a sufficient condition for the preservation of the
closedness properties in topological vector spaces under summation.
Proposition 1.93 Let X be a topological vector space. If A is a closed subset
of X and B is a compact subset of X, then A + B is closed.
Proof. We need to show that (A + B)c is open. Fix any z ∈ (A + B)c and
pick b ∈ B. Then z − b ∉ A, and so z − b ∈ Ac , where the latter set is open.
Thus it is possible to choose an open set Ub containing z and an open set Vb
containing b for which we have
(Ub − Vb ) ∩ A = ∅.
Since B is compact, find vectors b1 , . . . , bm ∈ B such that B ⊂ Vb1 ∪ · · · ∪ Vbm .
Denote U := Ub1 ∩ · · · ∩ Ubm and check that
U ∩ (A + B) = ∅.
Indeed, suppose by contradiction that there exists x ∈ U ∩ (A + B), i.e., x ∈ U
and x = a + b for some a ∈ A and b ∈ B. Since b ∈ B, we get b ∈ Vbi for some
i ∈ {1, . . . , m}, which shows that x ∈ Ubi and x − b ∈ Ubi − Vbi . It follows that
a = x − b ∉ A, a contradiction verifying that the set A + B is closed. □
Now we turn to some properties of linear mappings between topological
vector spaces. The first observation contains an elementary albeit useful fact.

Proposition 1.94 Let X and Y be topological vector spaces, and let A : X →
Y be a linear mapping. Then A is continuous at every point of X if and only
if it is continuous at the origin therein.
Proof. We only need to prove the opposite implication. Suppose that A is
continuous at the origin. Fix any x0 ∈ X and an open set V in Y that contains
A(x0 ). Then 0 ∈ V − A(x0 ), which is also an open set. Thus we can find an
open set U in X that contains the origin and such that A(U ) ⊂ V − A(x0 ). It
shows that x0 +U is a neighborhood of x0 with A(x0 +U ) = A(x0 )+A(U ) ⊂ V ,
which readily yields the continuity of A at x0 and hence on X. 
Finally in this subsection, we present relationships between the continuity
and boundedness of linear mappings.
Definition 1.95 Let X and Y be topological vector spaces, and let A : X → Y
be a linear mapping. We say that A is bounded if it maps bounded sets in
X into bounded sets in Y .
In the next major result, we use a certain countability property of topo-
logical spaces.
Definition 1.96 Let X be a topological space.

(a) X is said to be first-countable if each point in X has a countable basis
of neighborhoods.
(b) X is said to be second-countable if it has a countable basis.

It follows from the definitions that if a topological space is second-countable,
then it is first-countable. In addition, a given topological vector space is
first-countable if there exists a countable basis of neighborhoods of the origin.

Theorem 1.97 Let A : X → Y be a linear mapping between topological vector
spaces. If A is continuous, then it is bounded. The converse implication holds
if X is first-countable.

Proof. Suppose that A is continuous and that Ω is a bounded set in X. Fix


any neighborhood V of the origin in Y and denote by U := A−1 (V ) the
corresponding neighborhood of the origin in X. Then there exists λ > 0 such
that Ω ⊂ λU , which shows that A(Ω) ⊂ λA(U ) ⊂ λV , i.e., the set A(Ω) is
bounded in Y .
To verify the converse statement under the additional assumption that
X is first-countable, let {Un } be the corresponding basis of neighborhoods
of the origin. Choose a balanced neighborhood V1 of the origin in X such
that V1 ⊂ U1 and then for each n > 1 choose a balanced neighborhood Vn
with Vn ⊂ Vn−1 ∩ Un . In this way, we construct a basis of neighborhoods
{Vn } of the origin in X with Vn+1 ⊂ Vn for all n ∈ N. Suppose on the
contrary that A is not continuous. Then there exists a neighborhood W of
the origin in Y such that A−1 (W ) is not a neighborhood of the origin in X.
Then (1/n)Vn ⊄ A−1 (W ) for every n ∈ N, and therefore we can choose
xn ∈ (1/n)Vn such that xn ∉ A−1 (W ) for all n ∈ N. This shows that the set
Ω := {nxn | n ∈ N} is bounded while the image set A(Ω) = {nA(xn ) | n ∈ N}
is not bounded due to nA(xn ) ∉ nW whenever n ∈ N. This is a contradiction,
which therefore completes the proof of the theorem. □

The following corollary is a direct consequence of Theorem 1.97.

Corollary 1.98 Let A : X → Y be a linear mapping between normed spaces.
Then we have the assertions:
(a) A is bounded if and only if ‖A‖ := sup{ ‖Ax‖ | ‖x‖ ≤ 1} < ∞.
(b) A is continuous if and only if A is bounded.
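For a concrete finite-dimensional instance of assertion (a) (the diagonal matrix below is a hypothetical example, not from the text), the supremum defining ‖A‖ can be approximated by sampling the unit sphere in R2:

```python
import math

# Approximate ||A|| = sup{||Ax|| : ||x|| <= 1} for A = diag(3, 1) on R^2
# by sampling the unit circle; the supremum 3 is attained at x = (1, 0).
def apply_A(x):
    return (3.0 * x[0], 1.0 * x[1])

best = 0.0
for j in range(10_000):
    theta = 2 * math.pi * j / 10_000
    x = (math.cos(theta), math.sin(theta))  # ||x|| = 1
    ax = apply_A(x)
    best = max(best, math.hypot(ax[0], ax[1]))

assert abs(best - 3.0) < 1e-6               # the sampled sup equals ||A|| = 3
```

Since ‖A‖ is finite, A maps bounded sets to bounded sets, in line with assertion (b).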

1.2.2 Weak Topology and Weak∗ Topology

In this subsection, we introduce and study the weak topology on a topological
vector space X and the weak∗ topology on its topological dual space X ∗ .
These topologies, especially the latter one, play a fundamental role in infinite-
dimensional analysis and its applications. Let X be a topological vector space
over F (which is either R or C), and let f ∈ X ∗ , i.e., f : X → F is a continuous
linear function. Denote by ϕf : X → F the linear function ϕf (x) := f (x) on X.

As f runs through X ∗ , we obtain the collection {ϕf }f ∈X ∗ of continuous linear
functions from X to F.
Let us start with the definition of the weak topology on X and then present
simple examples of open and closed sets with respect to this topology.

Definition 1.99 The weak topology σ(X, X ∗ ) is the weakest topology
associated with the collection {ϕf }f ∈X ∗ ; see Theorem 1.19.
Example 1.100 Let X be a real topological vector space, and let f ∈ X ∗ .
For any scalar γ ∈ R, the sets
{x ∈ X | f (x) < γ} and {x ∈ X | f (x) > γ}
are open with respect to the weak topology on X, while the sets
{x ∈ X | f (x) ≤ γ} and {x ∈ X | f (x) ≥ γ}
are closed with respect to that topology.

The next proposition describes neighborhoods of a point and a basis of
neighborhoods for the weak topology of X.
Proposition 1.101 Let X be a topological vector space. Given x0 ∈ X, ε > 0,
and f1 , . . . , fm ∈ X ∗ , we define
V = V (x0 , f1 , . . . , fm ; ε) := {x ∈ X | |fi (x − x0 )| < ε for all i = 1, . . . , m}.
Then V is a neighborhood of x0 for the weak topology σ(X, X ∗ ). Furthermore,
the collection of sets
A := {V (x0 , f1 , . . . , fm ; ε) | f1 , . . . , fm ∈ X ∗ , ε > 0, m ∈ N}
is a basis of neighborhoods of x0 for the weak topology of X.

Proof. The first statement is obvious. To verify the second statement, suppose
that U is a neighborhood of x0 in X for the weak topology. Then there exists
a weakly open set W such that
x0 ∈ W ⊂ U.
Without loss of generality, assume that there exist open sets Gi in F and
linear functions fi ∈ X ∗ for i = 1, . . . , m ensuring the representation
W = ⋂i=1,...,m fi−1 (Gi ).

Since fi (x0 ) ∈ Gi , there exists ε > 0 such that
B(ai ; ε) := {λ ∈ F | |λ − ai | < ε} ⊂ Gi for all i = 1, . . . , m,

where ai := fi (x0 ). Here | · | denotes the absolute value in R or the complex
modulus in C. It follows that
V (x0 , f1 , . . . , fm ; ε) = ⋂i=1,...,m fi−1 (B(ai ; ε)) ⊂ W ⊂ U,
which therefore completes the proof. □


As an immediate consequence of the above proposition, we have the fol-
lowing useful assertion.
Corollary 1.102 Let X be a topological vector space. Given ε > 0 and
f1 , . . . , fm ∈ X ∗ , consider the set
V = V (f1 , . . . , fm ; ε) := {x ∈ X | |fi (x)| < ε for all i = 1, . . . , m}.
Then V is a neighborhood of the origin in X for the weak topology σ(X, X ∗ ).
Furthermore, the collection of open sets
A := {V (f1 , . . . , fm ; ε) | f1 , . . . , fm ∈ X ∗ , ε > 0, m ∈ N}
is a basis of neighborhoods of the origin in X for this topology.
Recall that a sequence {xk } ⊂ X converges to x0 as k → ∞ in the given
topology τ if for every open set V containing x0 , there exists k0 ∈ N such that
xk ∈ V for all natural numbers k ≥ k0 . The weak convergence of the sequence
{xk } to x0 as k → ∞ is understood as its convergence in the weak topology
σ(X, X ∗ ). The following theorem provides a sequential characterization of the
weak convergence on X by using dual elements f ∈ X ∗ .
Theorem 1.103 Let X be a topological vector space. Then the sequence {xk }
converges weakly to x0 in X if and only if the numerical sequence {f (xk )}
converges to f (x0 ) as k → ∞ for any dual element f ∈ X ∗ .
Proof. Suppose that the sequence {xk } converges weakly to x0 . Fix any f ∈
X ∗ and any ε > 0. Then the set
V := {x ∈ X | |f (x) − f (x0 )| < ε}
is an open set in σ(X, X ∗ ) that contains x0 . Thus there exists k0 ∈ N such
that xk ∈ V , i.e., |f (xk ) − f (x0 )| < ε for all k ≥ k0 . It follows that {f (xk )}
converges to f (x0 ) as k → ∞.
To verify the converse, fix any open set G containing x0 . Then there exist
ε > 0 and f1 , . . . , fm ∈ X ∗ such that
V = V (x0 , f1 , . . . , fm ; ε) = {x ∈ X | |fi (x − x0 )| < ε for all i = 1, . . . , m} ⊂ G.
Since {fi (xk )} converges to fi (x0 ) for all i = 1, . . . , m, we find k0 ∈ N with
|fi (xk − x0 )| < ε for all i = 1, . . . , m and for all k ≥ k0 .
It means that xk ∈ G for all k ≥ k0 , and so {xk } converges weakly to x0 . 
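The functional criterion just proved is easy to test numerically in a finite truncation of ℓ2 (the truncation dimension and the sequence y below are illustrative choices): the unit vectors ek satisfy f (ek ) = yk → 0 for the functional f = ⟨y, ·⟩, which is the weak convergence test toward the origin, while ‖ek ‖ = 1 rules out convergence in norm.

```python
import math

# In a truncated l^2, the k-th unit vector e_k pairs with a fixed y via
# f(e_k) = y_k -> 0 = f(0), while ||e_k|| = 1 for every k.
N = 1000
y = [1.0 / (j + 1) for j in range(N)]        # a fixed square-summable y

def unit_vector(k):
    return [1.0 if j == k else 0.0 for j in range(N)]

for k in (0, 10, 100, 999):
    e_k = unit_vector(k)
    pairing = sum(y[j] * e_k[j] for j in range(N))
    assert pairing == y[k]                   # f(e_k) = y_k, tending to 0
    norm = math.sqrt(sum(t * t for t in e_k))
    assert abs(norm - 1.0) < 1e-12           # no convergence in norm
```

This is exactly the phenomenon behind Proposition 1.105 below: in infinite dimensions, weak and strong convergence genuinely differ.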

It is not surprising that in finite-dimensional spaces the weak convergence
of a sequence agrees with its strong convergence. The next theorem verifies
this rigorously based on the convergence definitions.

Theorem 1.104 If X is a finite-dimensional normed space, then {xk } converges
weakly to x0 if and only if it converges strongly to this point.

Proof. We show that the strong topology in a finite-dimensional space agrees
with the weak topology therein. Let τ and τw be the strong and weak topologies,
respectively. Since we obviously have τw ⊂ τ , it remains to verify that
for any open set G ∈ τ and for any point x0 ∈ G there exists a weakly open
set V ∈ τw such that x0 ∈ V ⊂ G. Fix any G ∈ τ and x0 ∈ G and then find
r > 0 such that B(x0 ; r) ⊂ G.
Let {e1 , . . . , en } be a basis of X with ‖ei ‖ = 1 for i = 1, . . . , n. Any
x ∈ X is written as x = x1 e1 + · · · + xn en . Consider the mapping T : X → Fn
defined by T (x) := (x1 , . . . , xn ), and then without loss of generality equip the
space Fn with the Euclidean norm
‖x‖ := (|x1 |2 + · · · + |xn |2 )1/2 for all x = (x1 , . . . , xn ) ∈ Fn .
It can be directly checked that T is continuous; in fact T is an isomorphism.
Observing that each fi : X → F with fi (x) := xi is continuous, define the set
V := {x ∈ X | |fi (x − x0 )| < r/n for all i = 1, . . . , n}.
Then V is an open set with respect to the weak topology, and it contains x0 .
We get furthermore the relationships
‖x − x0 ‖ = ‖f1 (x − x0 )e1 + · · · + fn (x − x0 )en ‖ ≤ |f1 (x − x0 )| + · · · + |fn (x − x0 )| < n(r/n) = r
for every x ∈ V , and thus conclude that x0 ∈ V ⊂ B(x0 ; r) ⊂ G, which ends
the proof. □
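The key estimate in this proof, ‖x − x0 ‖ ≤ |f1 (x − x0 )| + · · · + |fn (x − x0 )| for the coordinate functionals of a normalized basis, can be checked numerically. The sketch below samples random points around a base point in R3 with the standard basis (all numerical values are illustrative choices):

```python
import math
import random

# Verify ||x - x0|| <= sum_i |f_i(x - x0)| in R^3 with the Euclidean norm,
# where f_i are the coordinate functionals of the standard basis.
random.seed(0)

def euclidean_norm(v):
    return math.sqrt(sum(t * t for t in v))

x0 = [1.0, -2.0, 0.5]
for _ in range(100):
    x = [x0[i] + random.uniform(-1.0, 1.0) for i in range(3)]
    d = [x[i] - x0[i] for i in range(3)]
    coordinate_sum = sum(abs(t) for t in d)   # sum of |f_i(x - x0)|
    assert euclidean_norm(d) <= coordinate_sum + 1e-12
```

Keeping each coordinate increment below r/n therefore forces the norm increment below r, which is what places V inside B(x0; r).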

The next statement reveals one of the main differences between finite-
dimensional and infinite-dimensional spaces. Remember that B stands for the
closed unit ball of the space in question.
Proposition 1.105 Let X be an infinite-dimensional normed space. Then the unit
sphere S := {x ∈ X | ‖x‖ = 1} is never closed in the weak topology σ(X, X ∗ ).
More precisely, the weak closure of S is the entire closed unit ball B.

Proof. Observe that B is closed in the weak topology. In fact we get
B = ⋂f ∈X ∗ , ‖f ‖≤1 {x ∈ X | |f (x)| ≤ 1}.
Since S ⊂ B, it readily follows that the weak closure of S is contained in B.

To verify the opposite inclusion, it suffices to show that every x0 ∈ X with
‖x0 ‖ < 1 belongs to the weak closure of S, since S obviously lies in its own
weak closure and B consists of S together with all such points x0 . Fix any
x0 ∈ X with ‖x0 ‖ < 1 and any open set G in the weak topology that contains
x0 . Then G contains a set of the form
V := {x ∈ X | |fi (x − x0 )| < ε for all i = 1, . . . , m}
with some ε > 0 and f1 , . . . , fm ∈ X ∗ . There exists y0 ∈ X with y0 ≠ 0 such
that fi (y0 ) = 0 for all i = 1, . . . , m. Indeed, otherwise the mapping T : X → Fm
defined by T (x) := (f1 (x), . . . , fm (x)) would be one-to-one, which is impossible
for an infinite-dimensional space X.
Based on the continuity of ϕ(t) := ‖x0 + ty0 ‖ on R, select t ∈ R such that
x0 + ty0 ∈ S. Since fi (ty0 ) = 0 for all i, we get x0 + ty0 ∈ S ∩ V , and hence
G ∩ S ≠ ∅. This tells us that the weak closure of S contains the whole ball B,
which completes the proof. □
The obtained result easily yields the following observation.
Corollary 1.106 The open unit ball {x ∈ X | ‖x‖ < 1} is never open in the
weak topology σ(X, X ∗ ) if the normed space X is infinite-dimensional.
Proof. Suppose on the contrary that the open unit ball is weakly open. Then
its complement {x ∈ X | ‖x‖ ≥ 1} is weakly closed. Since S is the intersection
of this complement with the closed unit ball B, which is weakly closed, the
sphere S is weakly closed as well. This contradicts Proposition 1.105 and thus
verifies the claim. □
Return now to the general setting where X is a topological vector space
over F. For each x ∈ X, we define the function φx : X ∗ → F by φx (f ) := f (x)
whenever f ∈ X ∗ . The next notion is crucial in many aspects of infinite-
dimensional analysis.
Definition 1.107 The weak∗ topology on X ∗ , denoted by σ(X ∗ , X), is
the topology generated by the collection of functions {φx }x∈X . This amounts
to saying that the weak∗ topology on X ∗ is the weakest topology on X ∗ such
that each function φx for x ∈ X is continuous.
In the case where X is a normed space, we have the three topologies on
X ∗ : the strong topology τ , the weak topology σ(X ∗ , X ∗∗ ), and the weak∗
topology σ(X ∗ , X). Since X ⊂ X ∗∗ , it follows that
σ(X ∗ , X) ⊂ σ(X ∗ , X ∗∗ ) ⊂ τ.
The next proposition and its corollary are consequences of the definitions.

Proposition 1.108 Let X be a topological vector space. For each f0 ∈ X ∗ ,
each ε > 0, and each x1 , . . . , xm ∈ X we consider the set
V ∗ = V ∗ (f0 , x1 , . . . , xm ; ε) := {f ∈ X ∗ | |(f − f0 )(xi )| < ε, i = 1, . . . , m}.
Then V ∗ is a neighborhood of f0 in the weak∗ topology σ(X ∗ , X). Furthermore,
the collection
  
A∗ := V ∗ = V ∗ (f0 , x1 , . . . , xm ; ε)  x1 , . . . , xm ∈ X, ε > 0, m ∈ N
forms is a basis of neighborhoods of f0 in σ(X ∗ , X).

Corollary 1.109 Let X be a topological vector space. Given a positive number


ε and a point x ∈ X, the set
V (x; ε) := {f ∈ X ∗ | |f (x)| < ε}
is a neighborhood of the origin in X ∗ with respect to the weak∗ topology on
X ∗ . Furthermore, the collection of open sets
B ∗ := {V (x1 ; ε) ∩ · · · ∩ V (xm ; ε) | x1 , . . . , xm ∈ X, ε > 0, m ∈ N}

forms a basis of neighborhoods of the origin in X ∗ .


The following statement reveals a regular structure of the dual space
equipped with the weak∗ topology.

Proposition 1.110 Let X be an arbitrary topological vector space. Then the
dual space (X ∗ , σ(X ∗ , X)) is also a topological vector space.

Proof. Fix any x∗ , u∗ ∈ X ∗ with x∗ ≠ u∗ and find x0 ∈ X such that
a := ⟨x∗ , x0 ⟩ ≠ ⟨u∗ , x0 ⟩ =: b. Choose δ > 0 for which we have
B(a; δ) ∩ B(b; δ) = ∅.
Defining now the sets
U := {z ∗ ∈ X ∗ | |⟨z ∗ − x∗ , x0 ⟩| < δ} and V := {z ∗ ∈ X ∗ | |⟨z ∗ − u∗ , x0 ⟩| < δ},
observe that they are open in (X ∗ , σ(X ∗ , X)) with x∗ ∈ U , u∗ ∈ V , and
U ∩V = ∅. This shows that the topological space (X ∗ , σ(X ∗ , X)) is Hausdorff.
The reader can easily check that (X ∗ , σ(X ∗ , X)) is a topological vector space
by verifying the continuity of the addition and the scalar multiplication on
X ∗ ; see also the proof of Theorem 2.31. 

It is useful to deal with the pointwise characterization of the weak∗ convergence
presented in the next proposition.
Proposition 1.111 Let {x∗k } ⊂ X ∗ , and let x∗ ∈ X ∗ . Then {x∗k } converges
to x∗ in σ(X ∗ , X) if and only if the sequence {x∗k (x)} converges to x∗ (x) as
k → ∞ for every x ∈ X.

Proof. Suppose that {x∗k } converges to x∗ in σ(X ∗ , X). Take any x ∈ X and
ε > 0. Then the set
V ∗ (x, x∗ ; ε) := {z ∗ ∈ X ∗ | |(z ∗ − x∗ )(x)| < ε}

is open in σ(X ∗ , X) and contains x∗ . This allows us to find k0 ∈ N such
that x∗k ∈ V ∗ (x, x∗ ; ε) when k ≥ k0 . It yields |x∗k (x) − x∗ (x)| < ε for such
k, and thus the sequence {x∗k (x)} converges to x∗ (x) for every x ∈ X. To
verify the opposite implication, let the sequence {x∗k (x)} converge to x∗ (x)
for every x ∈ X. Fix any open set G containing x∗ and suppose without loss
of generality that
G = V ∗ (x∗ , x1 , . . . , xm ; ε)
for some x1 , . . . , xm ∈ X and ε > 0. Since {x∗k (xi )} converges to x∗ (xi ) for all
i = 1, . . . , m, we find k0 ∈ N such that
|x∗k (xi ) − x∗ (xi )| < ε for all k ≥ k0 and for all i = 1, . . . , m.
Then x∗k ∈ G whenever k ≥ k0 , and so {x∗k } converges to x∗ in σ(X ∗ , X). 

Finally in this subsection, we state and prove the Alaoglu-Bourbaki theorem


in the general setting of topological vector spaces. For simplicity we consider
only real vector spaces. Given a subset A of a real topological vector space
X, define the polar set of A by
A◦ := {x∗ ∈ X∗ | sup_{x∈A} ⟨x∗, x⟩ ≤ 1}.

Theorem 1.112 Let X be a real topological vector space, and let U be a neigh-
borhood of 0 ∈ X. Then U ◦ is compact in X ∗ equipped with the weak∗ topology.

Proof. Choosing a balanced neighborhood V of the origin such that V ⊂ U ,


we have that U ◦ ⊂ V ◦ . Since U ◦ is weak∗ closed, it suffices to show that V ◦
is weak∗ compact. Since V is a balanced set, we have the representation
  
V◦ = {x∗ ∈ X∗ | sup_{x∈V} |⟨x∗, x⟩| ≤ 1}.

Consider the family of sets {Ix}x∈V, where Ix := [−1, 1] ⊂ R
for every x ∈ V, and form the set
Z := ∏_{x∈V} Ix
equipped with the product topology. It follows from the Tikhonov theorem (Theorem 1.48)
that Z is a compact topological space. Define the mapping F : V◦ → Z by
F(f ) := (f (x))x∈V and observe that F is one-to-one. Indeed, if F(f ) = F(g)
for some f, g ∈ V ◦ , then we have
f (v) = g(v) whenever v ∈ V.
Taking any x ∈ X, we find t > 0 such that tx ∈ V . Thus tf (x) = f (tx) =
g(tx) = tg(x), which implies that f (x) = g(x), i.e., F(f ) = F(g) =⇒ f = g.
Now consider the range of F denoted by C := F(V ◦ ). Then the mapping
46 1 FUNDAMENTALS

F : V◦ → C is one-to-one and onto. It is not hard to verify that F is a homeomorphism.
Let us show that C is a closed set; thus C is compact and so is
V ◦ . Indeed, fix a net {zα } in C that converges to z ∈ Z. Then for every α we
have zα = F(fα), where fα ∈ V◦, and write z = (zx)x∈V. Then
lim fα(v) = zv ∈ [−1, 1] for all v ∈ V.
Pick x ∈ X and find t > 0 with tx ∈ V. Then the linearity of fα yields
lim fα(x) = (1/t) lim fα(tx) = (1/t) z_{tx}.
Define f(x) := lim fα(x) for x ∈ X and observe that f is a linear mapping.
Then we have |f(x)| ≤ 1 for all x ∈ V; being bounded on a neighborhood of
the origin, f is continuous, and hence f ∈ X∗ and f ∈ V◦. It
follows that z = F(f) ∈ C, i.e., C is closed, which completes the proof. 
As a direct consequence of Theorem 1.112, we get the well-known version
of the Alaoglu-Bourbaki theorem in normed spaces.
Corollary 1.113 Let X be a normed space. Then the closed unit ball
  
B∗ := {x∗ ∈ X∗ | ‖x∗‖∗ ≤ 1}
is a compact subset of X ∗ equipped with the weak∗ topology.

1.2.3 Quotient Spaces

The main goal of this subsection is to discuss some basic facts about quotient
spaces, which are needed for the subsequent parts of this book. Given a linear
subspace L of a topological vector space X, recall that the quotient space X/L
is defined by
X/L := {x + L | x ∈ X}.
The addition and the scalar multiplication on X/L are given by
(x + L) + (y + L) := (x + y) + L and α(x + L) := αx + L
for x, y ∈ X and scalar α. Since L is a linear subspace, both operations above
are well-defined. It is easy to check that X/L endowed with these operations
is a vector space.
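That these operations do not depend on the chosen representatives can be spot-checked in a toy finite-dimensional case (illustrative sketch; the choices X = R², L = span{(1, 1)}, and the helper coset_rep are ours, not from the text):

```python
# X = R^2, L = span{(1, 1)}.  Each coset x + L is encoded by the canonical
# representative obtained by removing the component of x along (1, 1);
# two vectors encode the same coset iff their difference lies in L.

def coset_rep(x):
    t = (x[0] + x[1]) / 2.0          # component of x along (1, 1)
    return (x[0] - t, x[1] - t)      # representative of x + L

x, y = (3.0, 1.0), (-2.0, 5.0)
l = (4.0, 4.0)                        # an element of L

# shifting by an element of L does not change the coset:
assert coset_rep((x[0] + l[0], x[1] + l[1])) == coset_rep(x)

# (x + L) + (y + L) = (x + y) + L, independently of representatives:
s = (x[0] + y[0], x[1] + y[1])
sr = (coset_rep(x)[0] + coset_rep(y)[0], coset_rep(x)[1] + coset_rep(y)[1])
assert coset_rep(s) == coset_rep(sr)
```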
Definition 1.114 Let L be a linear subspace of a vector space X. The codi-
mension of L in X, denoted by codim(L), is the dimension of the quotient
space X/L, i.e.,
codim(L) := dim(X/L).
The following two propositions deal with vector spaces of codimension one.
Proposition 1.115 Let X be a vector space, and let f : X → F be a nonzero
linear function. Then we have
codim(ker f ) = 1.

Proof. Denote L := ker(f) and fix x0 ∈ X with f(x0) = α0 ≠ 0. To show
that span{x0 + L} = X/L, take x ∈ X such that x + L ∈ X/L and get
f(x) = λf(x0) = f(λx0) with λ := f(x)/f(x0). It follows that f(x − λx0) = 0, and
so x − λx0 ∈ L. Thus we get the equality
x + L = λ(x0 + L),
which shows that span{x0 + L} = X/L, and hence codim(L) = 1. 
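The splitting used in this proof can be verified numerically (an illustrative sketch in X = R³ with f(x) = ⟨a, x⟩; the data are our choice):

```python
a = (1.0, -2.0, 3.0)               # f(x) = <a, x> on R^3

def f(v):
    return sum(ai * vi for ai, vi in zip(a, v))

x0 = (1.0, 0.0, 0.0)               # f(x0) = 1 != 0
x = (2.0, 5.0, -1.0)
lam = f(x) / f(x0)                 # the lambda from the proof
remainder = tuple(vi - lam * wi for vi, wi in zip(x, x0))
assert abs(f(remainder)) < 1e-12   # x - lam*x0 lies in ker(f)
```

So x + ker(f) = lam·(x0 + ker(f)), exactly as the proof asserts.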
Proposition 1.116 Let L be a subspace of a vector space X such that
codim(L) = 1. If x0 ∉ L, then we have
L ⊕ span{x0 } = X.
Proof. Since dim(X/L) = 1 and x0 + L is a nonzero element in X/L, we see
that span{x0 + L} = X/L. Then for any x ∈ X, there exists a scalar λ with
x + L = λ(x0 + L) = λx0 + L. It follows that x ∈ λx0 + L ⊂ L + span{x0 }.
Thus X = L + span{x0 } since the opposite inclusion is obvious. We can easily
check that L ∩ span{x0 } = {0}. 
The quotient map π : X → X/L is defined by
π(x) := x + L for all x ∈ X. (1.3)
Now we denote by τL the strongest topology on X/L for which the quotient
map is continuous; by Proposition 1.21, it is given by
τL = {E ⊂ X/L | π⁻¹(E) is open in X}.
The next theorem reveals major properties of quotient spaces and maps.
Theorem 1.117 Let X be a topological vector space, and let L be a closed
subspace of X. Then we have the following assertions:
(a) The quotient map π : X → X/L is linear, continuous, and open.
(b) The space X/L with the quotient topology is a topological vector space.
(c) If B is a basis of neighborhoods of the origin in X, then π(B) is a basis
of neighborhoods of the origin in X/L.
Proof. (a) It follows directly from the definition that π is a linear map, and
its continuity follows from the construction of the quotient topology. Observe
that for any subset Ω of X we have
π −1 (π(Ω)) = Ω + L.
By the definition of the quotient topology, a subset W of the quotient space
X/L is open if and only if π −1 (W ) is open in X. Thus for any open subset V
of X the set π −1 (π(V )) = V + L is open, and so π(V ) is open as well.

(b) Given a subset Ω ⊂ X/L, observe that Ω is closed in X/L if and only
if π −1 (Ω) is closed in X. Indeed, if Ω is closed, then π −1 (Ω) is closed by
the continuity of π. Supposing now that π −1 (Ω) is closed implies that its
complement (π −1 (Ω))c = π −1 (Ω c ) is open. Since π is surjective and open,
π(π −1 (Ω c )) = Ω c is open, which shows that Ω is closed.
To proceed further, fix any singleton Ω := {z + L} in X/L and get z + L =
π(z). Then we have π −1 (Ω) = π −1 (π(z)) = z + L, which is a closed subset of
X, and so Ω is closed in the quotient space X/L. Now fix a neighborhood V
of the origin in X/L and observe that π −1 (V ) is a neighborhood of the origin
in X by the continuity of π. Thus there exists a neighborhood U of the origin
in X such that U + U ⊂ π −1 (V ), which implies that
π(U ) + π(U ) ⊂ V.
Since π is an open map, the set W := π(U ) is a neighborhood of the origin
in X/L with W + W ⊂ V , and hence the addition in X/L is continuous. It is
similar to prove the continuity of the scalar multiplication in X/L.
(c) Let B be a basis of neighborhoods of the origin in X. Fix any neighborhood
V of the origin in X/L and get that π −1 (V ) is a neighborhood of the origin
in X. Thus there exists B ∈ B such that B ⊂ π −1 (V ), which tells us that
π(B) ⊂ V . Since π(B) ∈ π(B), we conclude therefore that π(B) is a basis of
neighborhoods of the origin in X/L. 

Given a continuous linear mapping A : X → Y between topological vector


spaces, recall that the adjoint mapping A∗ : Y ∗ → X ∗ of A is defined by
A∗ y ∗ := y ∗ ◦ A for y ∗ ∈ Y ∗ .
The following proposition describes the structure of the adjoint mapping
associated with the quotient map (1.3).

Proposition 1.118 Let X be a topological vector space, and let L be a closed


linear subspace of X. Then the adjoint π ∗ : (X/L)∗ → X ∗ of π : X → X/L is
a bijection from (X/L)∗ onto L⊥ := {f ∈ X ∗ | f (x) = 0 for all x ∈ L}, i.e.,
we have range(π ∗ ) = L⊥ .

Proof. Fix any z ∗ ∈ (X/L)∗ and show that π ∗ (z ∗ ) ∈ L⊥ . For any x ∈ L we


get π(x) = 0, the zero element in X/L, and so
⟨π∗(z∗), x⟩ = ⟨z∗, π(x)⟩ = 0.
This implies by the definition that π ∗ (z ∗ ) ∈ L⊥ . To verify the converse state-
ment, pick any f ∈ L⊥ and show that f = π ∗ (z ∗ ) for some z ∗ ∈ (X/L)∗ . To
this end, introduce the function
z∗(z) := f(x) for z = x + L ∈ X/L.

Since f ∈ L⊥, we conclude that the function z∗ is well-defined. It is easy


to see that it is linear, and its continuity follows from the continuity of f and
the fact that π is an open map. Thus z ∗ ∈ (X/L)∗ and
π ∗ (z ∗ ) = z ∗ ◦ π = f,
which shows that π ∗ is a surjection from (X/L)∗ onto L⊥ .

Fig. 1.4. Proof of the surjectivity of π ∗

Finally, suppose that z∗ ∈ (X/L)∗ and π∗(z∗) = 0. This implies that for
any x ∈ X, we have 0 = ⟨π∗(z∗), x⟩ = ⟨z∗, π(x)⟩, meaning that z∗ = 0. Thus
the mapping π ∗ is injective, and we complete the proof of the proposition. 

The last result of this subsection describes a particular type of finite-


dimensional quotient subspaces of topological vector spaces.

Proposition 1.119 Let X be a topological vector space, and let fi : X → F
be continuous linear functionals for i = 1, . . . , m with either real or complex
values. Denoting L := ∩_{i=1}^{m} ker(fi), we have that L is a closed linear subspace
of X with codim(L) < ∞.

Proof. The linearity and closedness of L in X are obvious. Define the mapping


f : X → Fm by f (x) := (f1 (x), . . . , fm (x)). Then f is clearly linear with the
property ker(f ) = L. Thus
codim(L) = dim(X/L) = dim(f (X)) ≤ m,
and we are done with the proof of this statement. 

1.3 Some Fundamental Theorems of Functional Analysis


In this section we present some basic theorems, which play a fundamental role
in the broad area of nonlinear analysis while being particularly important in
convex analysis, optimization theory, and their applications. We start with
the classical Hahn-Banach extension theorem and present its two versions,
real and complex, with no topology involved.

1.3.1 Hahn-Banach Extension Theorem

Recall first the notion of a binary relation, which provides a partial ordering
on the set in question and is widely used in what follows.
Definition 1.120 Given a nonempty set A, we say that a binary relation
“≤” defines a partial order on A if the following properties are satisfied:
(a) For all a ∈ A we have a ≤ a.
(b) If a ≤ b and b ≤ a, then a = b.
(c) If a ≤ b and b ≤ c, then a ≤ c.
A set ordered by a binary relation ≤ is called a partially ordered set.
It is instructive to discuss several typical settings of partial (and full/total)
ordering. The first Example 1.121 is really simple while Example 1.122 pro-
vides a more involved partial ordering that is used in the proof of the Hahn-
Banach theorem given below.
Example 1.121 We have the following illustrations of Definition 1.120:
(a) Let Ω be a nonempty set, and let P denote the collection of all the subsets
of Ω. For two sets A, B ∈ P, define the binary relation A ≤ B by A ⊂ B.
It is easy to verify that ≤ is a partial ordering on P.
(b) The standard ordering relation ≤ on the set of all the real numbers R
clearly makes R a totally ordered set.
Example 1.122 Let Y be a subspace of a real vector space X, and let f : Y →
R be a linear mapping. We say that g is an extension of f if it is defined on
a subspace Dg containing Y with f (y) = g(y) for all y ∈ Y . Denote by F the
collection of all the extensions of f . The set F is obviously nonempty because
it contains f itself. For any g1 , g2 ∈ F, we define the following relation: g1 ≤ g2
if and only if
Dg1 ⊂ Dg2 and g2 (x) = g1 (x) for all x ∈ Dg1 .
We can directly check that F is a partially ordered set.
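For the inclusion order of Example 1.121(a), the three axioms of Definition 1.120 can be checked exhaustively on a small ground set (an illustrative sketch; the ground set is our choice):

```python
from itertools import chain, combinations

ground = {0, 1, 2}
subsets = [frozenset(c) for c in
           chain.from_iterable(combinations(sorted(ground), r)
                               for r in range(len(ground) + 1))]

def leq(A, B):
    return A <= B                  # "A <= B" means A is a subset of B

assert all(leq(A, A) for A in subsets)                              # (a)
assert all(A == B for A in subsets for B in subsets
           if leq(A, B) and leq(B, A))                              # (b)
assert all(leq(A, C) for A in subsets for B in subsets for C in subsets
           if leq(A, B) and leq(B, C))                              # (c)
```

Note that this order is only partial: {0} and {1} are incomparable.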
It is natural to have certain notions of upper bound and maximality with
respect to partial orders. Here we use the standard ones.
Definition 1.123 Let A be a partially ordered set, and let S ⊂ A.
(a) S is called a totally ordered set if for any a, b ∈ S, we have that
a ≤ b or b ≤ a with respect to the ordering binary relation.
(b) An element a ∈ A is called an upper bound of S if s ≤ a for all s ∈ S.
(c) An element q ∈ A is called a maximal element of S if q ∈ S and the
following implication holds:
[a ∈ A, q ≤ a] =⇒ [a = q].

The next result, known as Zorn’s lemma (also as the Kuratowski-Zorn


lemma), is one of the most fundamental in set theory. It can be proved using
the axiom of choice and transfinite induction.
Theorem 1.124 Let A be a partially ordered set such that every totally
ordered subset S of A admits an upper bound. Then A has a maximal ele-
ment.
Now we employ Zorn’s lemma to prove the basic version of the Hahn-
Banach theorem in real vector spaces. Recall that a function p : X → R on a
real vector space is sublinear if it is subadditive and positively homogeneous.

Theorem 1.125 Let X be a real vector space, and let p : X → R be a sublinear


function on X. Take a subspace Y of X and a linear function f : Y → R
satisfying the condition
f (y) ≤ p(y) whenever y ∈ Y.
Then there exists a linear function F : X → R such that F (y) = f (y) for all
y ∈ Y and F (x) ≤ p(x) for all x ∈ X.

Proof. Consider the collection F of all extensions of f that are majorized by


p with respect to the partial order discussed in Example 1.122. Let N be a
totally ordered subset of F. It is easy to see that N has an upper bound. Then
Zorn’s lemma from Theorem 1.124 ensures that F has a maximal element
denoted by F .
Let us prove that F is the one claimed in the Hahn-Banach theorem. In
fact, it suffices to show that the domain D of F is X. Suppose on the contrary
that D is a proper subset of X and pick any x0 ∈ X \ D. Define
  
Z := D ⊕ span{x0} = {x + λx0 | x ∈ D, λ ∈ R}.
Fix any number c ∈ R and consider the function on Z given by
E(z) := F (x) + λc with z = x + λx0 ∈ Z,
which is obviously an extension of F . Let us show that it is possible to choose
c ∈ R so that E(z) ≤ p(z) for all z ∈ Z and thus to arrive at a contradiction
to the maximal property of F .
To proceed, observe the inequalities
F (x) − F (y) = F (x − y) ≤ p(x − y) ≤ p(x + x0 ) + p(−x0 − y)
valid for all x, y ∈ D. They imply that
−F (y) − p(−x0 − y) ≤ p(x + x0 ) − F (x).
Select now a number c ∈ R satisfying
   
sup_{y∈D} (−F(y) − p(−x0 − y)) ≤ c ≤ inf_{x∈D} (p(x + x0) − F(x)).

To check that E(z) ≤ p(z) for all z ∈ Z, write z = x + λx0 and get


E(z) = F (x) + λc.
If λ > 0, we clearly have the inequality
 
λc ≤ λ (p(x/λ + x0) − F(x/λ)) for all x ∈ D,
which yields the relationships
λc ≤ p(x + λx0 ) − F (x) when x ∈ D,
E(z) = F (x) + λc ≤ p(x + λx0 ) = p(z).
For λ < 0 the proof is similar, and it is obvious for the case where λ = 0. 
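The key step above is that the admissible interval for c is nonempty. A worked numeric instance (illustrative; the spaces, the sublinear function p, and the sampling are assumptions of this sketch, not data from the book): X = R², D = {(t, 0)}, F(t, 0) = t, p(x) = |x1| + |x2|, and x0 = (0, 1).

```python
import random

p = lambda v: abs(v[0]) + abs(v[1])            # sublinear majorant on R^2

ts = [i / 10.0 for i in range(-100, 101)]      # sample points t of D
lower = max(-t - p((-t, -1.0)) for t in ts)    # sup_y(-F(y) - p(-x0 - y))
upper = min(p((t, 1.0)) - t for t in ts)       # inf_x(p(x + x0) - F(x))
assert lower <= upper                          # the interval for c is nonempty

c = (lower + upper) / 2.0                      # here c = 0
E = lambda v: v[0] + c * v[1]                  # extension E(x1, x2) = x1 + c*x2
random.seed(0)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(200)]
assert all(E(v) <= p(v) + 1e-9 for v in pts)   # E stays majorized by p
```

Any c in [−1, 1] would do here; the midpoint is one admissible choice.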

The complex version of the Hahn-Banach theorem is formulated by employing


the notion of seminorms defined as follows. We are going to largely study and
apply this notion in Chapter 2.
Definition 1.126 Let X be a vector space, and let p : X → R. We say that p
is a seminorm on X if the following properties hold:

(a) p(x + y) ≤ p(x) + p(y) for all x, y ∈ X.


(b) p(λx) = |λ|p(x) for all x ∈ X and all scalars λ.

The proof of the complex version of the Hahn-Banach theorem is similar


to that of Theorem 1.125 by applying the notion of seminorms. We leave this
as an exercise to the reader.
Theorem 1.127 Let X be a vector space over C, and let p : X → R be a
seminorm. Consider a linear subspace Y ⊂ X and a linear function f : Y → C
satisfying the condition
|f (y)| ≤ p(y) for all y ∈ Y.
Then there is a linear function F : X → C with F (y) = f (y) as y ∈ Y and
|F (x)| ≤ p(x) for all x ∈ X.
The following consequence of the Hahn-Banach theorem (in the real and
complex forms of Theorems 1.125 and 1.127, respectively), which holds in
normed spaces, is very useful in applications.

Corollary 1.128 Let Y be a subspace of a normed space X over the real or
complex numbers, and let ℓ ∈ Y∗. Then there is a linear function L ∈ X∗
such that L|Y = ℓ and ‖L‖∗ = ‖ℓ‖∗, where ‖·‖∗ stands for the norm on X∗.

Proof. Consider the function p(x) = ‖ℓ‖∗‖x‖, which is a seminorm on X with
|ℓ(y)| ≤ p(y) for all y ∈ Y. Let L be the extension of ℓ by the corresponding
version of the Hahn-Banach theorem. Then we have
|L(x)| ≤ p(x) = ‖ℓ‖∗‖x‖ for all x ∈ X,
and thus ‖L‖∗ ≤ ‖ℓ‖∗.
Furthermore, it follows that
‖L‖∗ = sup{|L(x)| | ‖x‖ = 1, x ∈ X} ≥ sup{|L(y)| | ‖y‖ = 1, y ∈ Y} = ‖ℓ‖∗,
which therefore completes the proof. 

In the rest of this subsection, we present some other remarkable conse-


quences of the Hahn-Banach theorem while confining ourselves for definiteness
to its complex version taken from Theorem 1.127.

Theorem 1.129 Let Y be a closed subspace of a normed space X, and let
x ∈ X. Then there exists x∗0 ∈ Y⊥ such that ‖x∗0‖∗ ≤ 1 and
d(x; Y) := inf{‖x − y‖ | y ∈ Y}
= sup{|⟨x∗, x⟩| | x∗ ∈ Y⊥, ‖x∗‖∗ ≤ 1}
= ⟨x∗0, x⟩.

Proof. If x ∈ Y, then d(x; Y) = 0 and x∗0 := 0 works, so we consider the case
where x ∉ Y and define
Z := Y ⊕ span{x} = {y + λx | y ∈ Y, λ ∈ C}.
Let ℓ : Z → C be given by ℓ(y + λx) := λ d(x; Y). Then for any y ∈ Y we get
|ℓ(y + λx)| = |λ| d(x; Y) ≤ |λ| · ‖x + y/λ‖ = ‖y + λx‖
provided that λ ≠ 0. The estimate clearly holds when λ = 0 as well. Applying
the complex Hahn-Banach theorem from Theorem 1.127 gives us a linear
function x∗0 on X such that
x∗0(y + λx) = λ d(x; Y) for all λ ∈ C, y ∈ Y, and
|x∗0(u)| ≤ ‖u‖ for all u ∈ X.
This shows that x∗0 ∈ Y⊥, ‖x∗0‖∗ ≤ 1, and x∗0(x) = d(x; Y). Taking now
x∗ ∈ Y⊥ with ‖x∗‖∗ ≤ 1, we get the estimate
|⟨x∗, x⟩| = |⟨x∗, x − y⟩| ≤ ‖x − y‖ for all y ∈ Y.
It clearly yields the fulfillment of the conditions
     
sup{|⟨x∗, x⟩| | x∗ ∈ Y⊥, ‖x∗‖∗ ≤ 1} ≤ inf{‖x − y‖ | y ∈ Y} = d(x; Y) = ⟨x∗0, x⟩
and thus completes the proof since the opposite inequalities are obvious. 

We conclude this subsection with two direct consequences of Theorem 1.129


and hence of the Hahn-Banach theorem.
Corollary 1.130 Let (X, ‖·‖) be a normed space, and let 0 ≠ x ∈ X. Then
there exists x∗ ∈ X∗ such that ‖x∗‖∗ = 1 and
‖x‖ = ⟨x∗, x⟩.

Proof. Employing Theorem 1.129 with Y = {0} and x ∉ Y allows us to find
x∗ ∈ X∗ such that ‖x∗‖∗ ≤ 1 and
d(x; Y) = ‖x‖ = ⟨x∗, x⟩.
Furthermore, we have ‖x∗‖∗ ≥ ⟨x∗, x/‖x‖⟩ = 1. 

Corollary 1.131 Let (X, ‖·‖) be a normed space, and let x ∈ X. Then
‖x‖ = max{|⟨x, x∗⟩| | ‖x∗‖∗ ≤ 1}.

Proof. Using Theorem 1.129 with Y = {0} gives us the representation
‖x‖ = sup{|⟨x, x∗⟩| | ‖x∗‖∗ ≤ 1}.
The Weierstrass existence theorem ensures that the supremum is attained
since the closed unit ball B∗ ⊂ X ∗ is compact in the weak∗ topology of X ∗
by the Alaoglu-Bourbaki theorem in normed spaces; see Corollary 1.113. 
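In a Euclidean setting the norming functional of Corollary 1.130 can be written down explicitly, which makes a quick numeric check possible (illustrative; R³ and the vector below are our choices):

```python
# For the Euclidean norm on R^3, the functional x* = x/||x|| has dual
# norm 1 (by the Cauchy-Schwarz inequality) and attains <x*, x> = ||x||.
import math

x = (3.0, 4.0, 12.0)
norm = math.sqrt(sum(t * t for t in x))          # ||x|| = 13
xstar = tuple(t / norm for t in x)               # the norming functional
pairing = sum(s * t for s, t in zip(xstar, x))
assert abs(pairing - norm) < 1e-12               # <x*, x> = ||x||
assert abs(math.sqrt(sum(t * t for t in xstar)) - 1.0) < 1e-12
```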

1.3.2 Baire Category Theorem

This subsection is devoted to the fundamental Baire category theorem, which


is broadly used in the book. We start with the basic definitions and examples.
Definition 1.132 Let X be a topological space. A subset A of X is said to be
dense in the space X if cl(A) = X, where cl(A) denotes the closure of A in X.
If A is dense in X and A is a countable set, then we say that X is separable.

Example 1.133 Both the set of all rational numbers and the set of all irra-
tional numbers are dense in R. Moreover, Q is countable, so R is separable.

Definition 1.134 Let X be a topological space. A subset A of X is called


nowhere dense if we have
int(cl(A)) = ∅.

Example 1.135 A line or a circle in R2 is nowhere dense.

Before the proof of Baire’s theorem, we present the following simple lemma.


Fig. 1.5. A property of nowhere dense sets

Lemma 1.136 Suppose that A is a nowhere dense subset of a metric space X.


If B := B(x; r) for some r > 0 is a closed ball in X, then there exists another
closed ball B1 := B(x1 ; r1 ) with radius r1 > 0 such that we have B1 ⊂ B and
B1 ∩ A = ∅.

Proof. Since int(cl(A)) = ∅, it is obvious that B is not a subset of cl(A). Thus we
can find a point x1 from the open ball centered at x with radius r such that
x1 ∉ cl(A). Furthermore, it is not hard to choose r1 > 0 so small that B1 ⊂ B
and B1 ∩ A = ∅ in the notation above; see Figure 1.5. 

Now we are ready to formulate and prove the Baire category theorem.
Theorem 1.137 Let X be a complete metric space represented as
X = ∪_{n=1}^∞ An.
Then there exists n0 ∈ N such that int(cl(An0)) ≠ ∅.

Proof. Suppose on the contrary that An is nowhere dense for every n ∈ N.


Since A1 is nowhere dense, by Lemma 1.136 we can choose a closed ball B1
with radius 0 < r1 < 1 such that B1 ∩ A1 = ∅. Since A2 is nowhere dense,
we can similarly choose a closed ball B2 ⊂ B1 with radius 0 < r2 < 1/2 for
which B2 ∩ A2 = ∅. Proceeding in this way allows us to construct a sequence
of closed balls {Bn } with radii rn > 0 satisfying
Bn+1 ⊂ Bn, lim_{n→∞} rn = 0, and Bn ∩ An = ∅

for all n ∈ N. The classical Cantor intersection theorem (see Exercise 1.156)

gives us a common point a ∈ ∩_{n=1}^∞ Bn. This is a clear contradiction because
a ∉ An for every n ∈ N, and thus we complete the proof of the theorem. 

Note that a metric space X is of the first category if it can be expressed in


the form X = ∪∞ n=1 An , where the set An is nowhere dense for every n ∈ N. If
X is not of the first category, then it belongs to the second category. Therefore,
Baire’s category theorem (Theorem 1.137) says that every complete metric
space belongs to the second category.
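The mechanism behind the proof, shrinking nested closed balls pinned down by the Cantor intersection theorem (Exercise 1.156), can be sketched numerically in R (illustrative; the specific centers and radii are our choice):

```python
# Nested closed intervals B_n = [c_n - r_n, c_n + r_n] with r_n -> 0:
# the centers drift left by a quarter of the previous radius each step.
centers, radii = [0.0], [1.0]
for n in range(1, 40):
    radii.append(radii[-1] / 2.0)
    centers.append(centers[-1] - radii[-1] / 2.0)
    # nestedness: B_n is contained in B_{n-1}
    assert centers[-1] - radii[-1] >= centers[-2] - radii[-2] - 1e-12
    assert centers[-1] + radii[-1] <= centers[-2] + radii[-2] + 1e-12

# the centers form a Cauchy sequence; their limit lies in every B_n,
# so the intersection of all the B_n is nonempty (here a single point)
limit = centers[-1]
assert all(abs(limit - c) <= r + 1e-9 for c, r in zip(centers, radii))
```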

1.3.3 Open Mapping Theorem

Now we present another classical result of analysis known as the open mapping
theorem. It concerns the following fundamental property of mappings.
Definition 1.138 A mapping f : X → Y between two topological spaces is
open if the image set f (G) is open in Y for any open subset G of X.
Here is the main open mapping result with a complete proof. Recall that a
linear mapping A : X → Y between vector spaces X and Y is called surjective,
or onto, if the image of X under A covers the entire space Y , i.e., AX = Y .
Theorem 1.139 Let X and Y be Banach spaces, and let A : X → Y be a
surjective continuous linear mapping. Then A is open.
  
Proof. Denote Xr := B(0; r) = {x ∈ X | ‖x‖ < r} and observe that X = ∪_{n=1}^∞ Xn.
Since A is surjective, we have the equalities
Y = AX = A(∪_{n=1}^∞ Xn) = ∪_{n=1}^∞ AXn.

The Baire category theorem tells us that there exists n0 ∈ N such that
int(cl(AXn0)) ≠ ∅.
Thus we can find y0 ∈ Y and r > 0 for which
B(y0; r) = {y ∈ Y | ‖y − y0‖ ≤ r} ⊂ cl(AXn0).
The rest of the proof is split into the following three steps.
Step 1. The origin is an interior point of cl(AXp) for every p > 0.
Pick y ∈ B(0; r) and get y0 + y ∈ B(y0; r) ⊂ cl(AXn0), which tells us that
y ∈ −y0 + cl(AXn0) ⊂ cl(AX2n0).
The linearity of A yields the inclusion
B(0; γ) ⊂ cl(AX_{2n0γ/r}) for all γ > 0. (1.4)
Step 2. The origin is an interior point of AXp for every p > 0.
For y ∈ Y with ‖y‖ ≤ 1, we deduce from (1.4) with γ = 1 that there exists a
vector x1 ∈ X_{2n0/r} such that
‖y − Ax1‖ ≤ 1/2.
Since y − Ax1 ∈ B(0; 1/2), we apply (1.4) again and choose x2 ∈ X_{n0/r} with
‖y − Ax1 − Ax2‖ ≤ 1/2².
Following this process, for each m ∈ N we find xm ∈ X_{2n0/(2^{m−1}r)} satisfying
‖y − Σ_{i=1}^{m} Axi‖ ≤ 1/2^m.
Since ‖xm‖ ≤ 2n0/(2^{m−1}r), the series Σ_{m=1}^∞ xm is absolutely convergent in the
Banach space X. Denoting its sum by x := Σ_{m=1}^∞ xm, we get
‖x‖ ≤ Σ_{m=1}^∞ 2n0/(2^{m−1}r) = 4n0/r.
The continuity of A ensures that y = Ax, and hence y ∈ AX_{4n0/r}. This
verifies that B(0; 1) ⊂ AX_{4n0/r}, and so
B(0; γ) ⊂ AX_{4n0γ/r} for all γ > 0.
Step 3. The set AG is open whenever G is open.
Fix any y0 ∈ AG and find x0 ∈ G with y0 = Ax0 . Choose ε > 0 such that
x0 + εB(0; 1) ⊂ G
and then get the inclusion
y0 + εAB(0; 1) ⊂ AG.
It follows from the proof in Step 2 that
y0 + B(0; εr/(4n0)) ⊂ AG,
which verifies that the set AG is open in Y whenever G is open in X. Therefore,
we complete the proof of the theorem. 
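The series construction in Step 2 has a concrete finite-dimensional analogue (illustrative sketch; the matrix A, the crude approximate inverse M, and the iteration count are assumptions made here): each round applies an approximate preimage to the current residual, the residuals contract geometrically, and the accumulated corrections converge to an exact preimage of y.

```python
# Solve A x = y by accumulating corrections M * residual, where M is only
# a rough approximate inverse of A: the spectral radius of I - M A is 0.4
# here, so the residuals shrink geometrically, as in Step 2 of the proof.

def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[2.0, 1.0], [0.0, 2.0]]
M = [[0.3, 0.0], [0.0, 0.3]]       # crude approximate inverse of A
y = [1.0, -2.0]

x = [0.0, 0.0]
for _ in range(80):
    residual = [yi - wi for yi, wi in zip(y, matvec(A, x))]
    correction = matvec(M, residual)            # the next "x_m"
    x = [xi + ci for xi, ci in zip(x, correction)]

final = [yi - wi for yi, wi in zip(y, matvec(A, x))]
assert max(abs(t) for t in final) < 1e-10      # x is a preimage of y
```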

The next two corollaries are easy consequences of Theorem 1.139.

Corollary 1.140 Let X and Y be Banach spaces, and let A : X → Y be a


continuous linear mapping. If A is one-to-one and surjective, then it is an
isomorphism, i.e., both mappings A and A−1 are continuous. Moreover, we
have the estimates

(a) ‖A‖⁻¹ ≤ ‖A⁻¹‖ and
(b) ‖A⁻¹‖⁻¹‖x‖ ≤ ‖Ax‖ ≤ ‖A‖ · ‖x‖ for all x ∈ X.

Corollary 1.141 Let X be a linear space, and let ‖·‖1 and ‖·‖2 be two norms
on X such that (X, ‖·‖1) and (X, ‖·‖2) are Banach spaces. If there exists a
number α > 0 with
‖x‖2 ≤ α‖x‖1 whenever x ∈ X,
then we can find a constant β > 0 ensuring the inequality
‖x‖1 ≤ β‖x‖2 for all x ∈ X.

1.3.4 Closed Graph Theorem and Uniform Boundedness Principle

We conclude this section by presenting two other fundamental results of


Banach space theory. The first one is known as the closed graph theorem.

Definition 1.142 Let f : X → Y be a mapping between two nonempty sets


X and Y . Then the graph of f is the set defined by
      
gph(f) := {(x, f(x)) ∈ X × Y | x ∈ X} = {(x, y) ∈ X × Y | y = f(x)}.
If X and Y are normed spaces, we mainly use, unless otherwise stated,
the sum norm on the product space X × Y defined by
‖(x, y)‖ := ‖x‖_X + ‖y‖_Y. (1.5)
It is easy to check that (1.5) is equivalent to the other useful norms given by
‖(x, y)‖ := max{‖x‖_X, ‖y‖_Y} and ‖(x, y)‖ := (‖x‖_X² + ‖y‖_Y²)^{1/2}. (1.6)

Furthermore, we obviously conclude that {(xn , yn )} converges to (x, y) in


X × Y if and only if {xn } converges to x in X and {yn } converges to y in
Y . Note also that the graph of f : X → Y is a closed subset of X × Y if and
only if for any sequence {xn } in X such that {xn } converges to x and {f (xn )}
converges to y it follows that y = f (x).
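The equivalence of the norms (1.5) and (1.6) amounts to the elementary inequalities max ≤ Euclidean ≤ sum ≤ 2·max, which can be spot-checked on random samples (illustrative sketch with X = Y = R and the absolute value as both norms):

```python
import math
import random

random.seed(1)
for _ in range(1000):
    a, b = random.uniform(-10, 10), random.uniform(-10, 10)
    n_sum = abs(a) + abs(b)                 # the sum norm (1.5)
    n_max = max(abs(a), abs(b))             # the max norm from (1.6)
    n_two = math.hypot(a, b)                # the Euclidean norm from (1.6)
    assert n_max <= n_two + 1e-12
    assert n_two <= n_sum + 1e-12
    assert n_sum <= 2 * n_max + 1e-12
```

Since each norm bounds the others up to a constant factor, all three induce the same convergent sequences in X × Y.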
Now we are ready to formulate and provide a direct proof of the following
closed graph theorem related to the above open mapping theorem.
Theorem 1.143 Let X and Y be Banach spaces, and let A : X → Y be a
linear mapping. If the graph of A is closed, then A is continuous.

Proof. Consider the set Z := gph(A). Then Z is a closed subspace of the


product space X × Y , which is a Banach space with respect to the sum norm
on it. Thus Z is also a Banach space. Define the projection mapping P : Z → X
by P (x, Ax) := x. This mapping is clearly one-to-one and onto. Moreover, it
is continuous since for z := (x, Ax) ∈ Z we have

‖P(z)‖ = ‖x‖ ≤ ‖x‖ + ‖Ax‖ = ‖z‖.
Using Corollary 1.140 ensures that the inverse mapping P −1 : X → Z is
continuous as well. In particular, if {xn } converges to x, then {P −1 (xn )}
converges to P −1 (x), which implies that the sequence {Axn } converges to Ax
and thus verifies the continuity of A. 

For the second result of this subsection we need the following notions.

Definition 1.144 Let X and Y be normed spaces, and let {Aα }α∈I be a fam-
ily of linear mappings from X to Y . Then we say that the family {Aα }α∈I is
pointwise bounded on X if
sup_{α∈I} ‖Aα(x)‖ < ∞ for each x ∈ X.
Further, we say that the family {Aα}α∈I is uniformly bounded if
sup_{α∈I} ‖Aα‖op < ∞
for the operator norm ‖A‖ = ‖A‖op of A : X → Y defined by
‖A‖op := inf{γ ≥ 0 | ‖Ax‖ ≤ γ‖x‖ for all x ∈ X}.

It is obvious from the above definition that if {Aα }α∈I is uniformly


bounded, then it is pointwise bounded. The converse is also true if X is a
Banach space as stated in the following uniform boundedness principle.
Theorem 1.145 Suppose that {Aα }α∈I is a family of continuous linear map-
pings from X to Y , where X is a Banach space, and where Y is a normed
space. If {Aα }α∈I is pointwise bounded, then it is uniformly bounded.
Proof. For each n ∈ N and α ∈ I we define the set
Xn,α := {x ∈ X | ‖Aα(x)‖ ≤ n},
which is clearly closed in X. Thus the set Xn := ∩_{α∈I} Xn,α is also closed.
Since the family {Aα}α∈I is pointwise bounded on X, we have
X = ∪_{n=1}^∞ Xn.

The aforementioned Baire category theorem allows us to find n0 ∈ N such
that int(Xn0) ≠ ∅. Thus there exist x̄ ∈ X and r > 0 for which we have the inclusion
B(x̄; r) ⊂ Xn0.
Fix now any x ∈ X with x ≠ 0 and get x̄ + r x/‖x‖ ∈ B(x̄; r) ⊂ Xn0. Hence
‖Aα(x̄ + r x/‖x‖)‖ ≤ n0 for all α ∈ I.
This yields the estimate
‖Aα(r x/‖x‖)‖ ≤ n0 + ‖Aα(x̄)‖ ≤ 2n0 for all α ∈ I,
which tells us in turn that
‖Aα(x)‖ ≤ (2n0/r)‖x‖ whenever α ∈ I;
the latter estimate trivially holds for x = 0 as well.
It shows that ‖Aα‖ ≤ 2n0/r for all α ∈ I and thus completes the proof. 
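Completeness of X cannot be dropped in Theorem 1.145. A classical counterexample, sketched numerically here (the dict model of finitely supported sequences is our choice): on the incomplete space c00 with the supremum norm, the functionals An(x) := n·xn are pointwise bounded but not uniformly bounded.

```python
# c00 = finitely supported sequences with the sup norm; A_n(x) := n * x_n.
def A(n, x):                       # x modeled as a dict {index: value}
    return n * x.get(n, 0.0)

x = {1: 1.0, 3: -2.0}              # one fixed element of c00
pointwise = [abs(A(n, x)) for n in range(1, 1000)]
assert max(pointwise) == 6.0       # sup_n |A_n(x)| is finite for this x

# yet A_n attains the value n on the unit vector e_n, so ||A_n|| = n:
norms = [abs(A(n, {n: 1.0})) for n in range(1, 6)]
assert norms == [1.0, 2.0, 3.0, 4.0, 5.0]
```

Each fixed x has finite support, so only finitely many An act on it nontrivially; uniform boundedness fails nonetheless.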

1.4 Exercises for Chapter 1


Exercise 1.146 For X = R consider the collection of subsets of X defined by
    
τ := {U ⊂ X | 0 ∈ U} ∪ {∅}.

(a) Prove that τ is a topology on X.


(b) Prove that any subset A of X containing 0 is dense in X, i.e., cl(A) = X.
Exercise 1.147 Let X be an arbitrary infinite set. Consider a collection of subsets
of the set X given by
    
τ := {U ⊂ X | X − U is finite} ∪ {∅}.

Verify the following assertions:


(a) τ is a topology on X (it is called the finite complement topology on X).
(b) If A and B are two nonempty open sets in X, then A ∩ B = ∅.
(c) For any infinite subset A of X we have cl(A) = X.
Exercise 1.148 Let p : X → Z be a function from a topological space X to a
nonempty set Z. Define
τp := {B ⊂ Z | p⁻¹(B) is open in X}

and prove that τp is a topology on Z.


Exercise 1.149 Let X be a topological space, and let A, B ⊂ X. Prove that
(a) cl(A ∪ B) = cl(A) ∪ cl(B).
(b) int(A ∩ B) = int(A) ∩ int(B).
(c) cl(A) = int(A) ∪ bd(A) = A ∪ bd(A).
(d) cl(A) \ bd(A) = int(A).
Exercise 1.150 Let X be a topological space. For any subset A of X, define

α(A) := int(cl(A)) and β(A) := cl(int(A)).

Verify the following assertions:



(a) If A ⊂ B, then α(A) ⊂ α(B) and β(A) ⊂ β(B).


(b) If A is open, then A ⊂ α(A) while the closedness of A implies that β(A) ⊂ A.
(c) We always have α(α(A)) = α(A) and β(β(A)) = β(A).
(d) If U, V ⊂ X are open sets with U ∩ V = ∅, then α(U ) ∩ α(V ) = ∅.

Exercise 1.151 Let f : X → Y be a function between topological spaces. Prove


that the following properties are equivalent:
(a) f is continuous.
(b) f −1 (G) is open in X whenever G is open in Y .
(c) f −1 (F ) is closed in X whenever F is closed in Y .
(d) f(cl(A)) ⊂ cl(f(A)) for every subset A of X.
(e) f −1 (int(B)) ⊂ int(f −1 (B)) for every subset B of Y .

Exercise 1.152 Consider the collection of symmetric open intervals in R given by


  
B := {(−a, a) | a ∈ R}.

Prove that B is a basis for a topology τ on R. Then show that τ is weaker than the
usual topology on R.

Exercise 1.153 Prove Proposition 1.21.

Exercise 1.154 In the setting of Proposition 1.32, denote by int(A)Y the interior
of A in Y and clarify the fulfillment of the representation

int(A)Y = int(A) ∩ Y.

Exercise 1.155 Prove that any metric space is a Hausdorff topological space.

Exercise 1.156 Prove the Cantor intersection theorem: If Bn := B(xn; rn) is a
sequence of closed balls in a complete metric space with Bn+1 ⊂ Bn for every n ∈ N
and limn→∞ rn = 0, then ∩_{n=1}^∞ Bn contains exactly one element.

Exercise 1.157 Let f : X → Y be a continuous function from a topological space


X to a Hausdorff topological space Y . Define the relation Rf on X by

x1 Rf x2 if and only if f (x1 ) = f (x2 ).

Show that Rf is an equivalence relation on X and X/Rf is Hausdorff.

Exercise 1.158 Let f, g : X → Y be continuous functions between topological


spaces. Suppose that Y is Hausdorff and then prove that the set
  
A := {x ∈ X | f(x) = g(x)}

is a closed subset of X.

Exercise 1.159 Let X be a topological space. Prove that X is Hausdorff if and


only if the set D := {(x, x) | x ∈ X} is closed in the space X × X equipped with
the product topology.

Exercise 1.160 Let X be a compact topological space, let Y be a Hausdorff topo-


logical space, and let f : X → Y be a continuous function.

(a) Prove that f(cl(A)) = cl(f(A)) for every subset A of X.


(b) Prove that f is a closed function in the sense that if A is a closed subset of X,
then f (A) is a closed subset of Y .
Exercise 1.161 Let X be an arbitrary infinite set equipped with the finite comple-
ment topology from Exercise 1.147. Verify the following assertions:
(a) If f : X → R is continuous, then it is a constant function.
(b) X is a compact topological space.
Exercise 1.162 Let X be a topological space, and let A be a subset of X. Recall
that an element x in X is said to be a limit point of A if every open neighborhood
of x contains infinitely many elements of A. The set of all the limit points of A is
denoted by A . In the Euclidean plane R2 , consider the sets
 
E := {(0, y) | −1 ≤ y ≤ 1} and F := {(x, sin(1/x)) | 0 < x ≤ 1/π}.
(a) Prove the representation F′ = E ∪ F.
(b) Prove that the set E ∪ F is connected.
Exercise 1.163 Let {xα }α∈I ⊂ X be a net in a topological space X. Show that
{xα }α∈I converges to x0 in the weak topology if and only if the numerical net
{f (xα )}α∈I converges to f (x0 ) for every f ∈ X ∗ .
Exercise 1.164 Let X be a normed space. Show that if {xk } converges weakly to
x0 and {fk } converges strongly to f in X ∗ , then {fk (xk )} converges to f (x0 ).
Exercise 1.165 Let X be a topological vector space. Prove that
(a) If Y is a linear subspace of X, then its closure cl(Y) is also a linear subspace of X.
(b) If A ⊂ X is balanced, then cl(A) is also balanced.

Exercise 1.166 Let A := {(z1, z2) ∈ C2 | |z1| ≤ |z2|}. Show that A is balanced
while its interior is not.
Exercise 1.167 Let X be a topological vector space, and let A, B ⊂ X. Prove the
following assertions:
(a) If A is an open set and B is an arbitrary set, then the set A + B is open.
(b) Construct an example of closed sets A and B in X for which the set A + B is
not closed.
Exercise 1.168 Let X be a topological vector space, and let A, B ⊂ X. Prove that
if int(A) ∩ B ≠ ∅, then 0 ∈ int(A − B).
Exercise 1.169 Give a detailed proof of the complex version of the Hahn-Banach
theorem formulated in Theorem 1.127.
Exercise 1.170 Let X be a topological vector space, and let f : X → R be a nonzero
linear function. Prove that f is an open mapping. Hint: Let Ω be an open set in
X, and let y0 ∈ f (Ω). Then y0 = f (x0 ) for some x0 ∈ Ω. Choose v ∈ X such
that f (v) > 0 and select δ > 0 such that x0 + tv ∈ Ω whenever |t| < δ. Then
f (x0 + tv) = f (x0 ) + tf (v) = y0 + tf (v) ∈ f (Ω) if |t| < δ. Finally, observe that
(y0 − γ, y0 + γ) ⊂ f (Ω) with γ := δf (v) > 0.
Exercise 1.171 Verify that the norms defined in (1.5) and (1.6) are equivalent in
X × Y , where X and Y are arbitrary normed spaces.

1.5 Commentaries to Chapter 1


In this chapter we collect those fundamentals of mathematics which are broadly
used in the book and, more generally, in convex analysis and optimization theory.
Although the presented notions and facts can be found in many publications, we
collect them here to make our book self-contained and easy to follow. It is
convenient for instructors who teach classes based on this book, as well as for
individual readers, to review the material of this chapter before considering the
main topics. The reader can also use the definitions, results, and proofs given
here for immediate reference without appealing to other sources. In the rest of
the commentaries to this chapter, we provide brief historical remarks on the
presented fundamental material.
In his doctoral dissertation [135] published in 1906, Maurice Fréchet (1878–1973) introduced the axioms of metric together with other important notions such as compactness and completeness. Felix Hausdorff (1868–1942) is considered to be one of the founders of general topology. He was the first to define the concept of neighborhoods based on a set of axioms in his book [153] published in 1914. The term metric space was also coined by Hausdorff in that book. The compactness result of the metric space theory formulated in Theorem 1.56 is due to Eduard Heine (1821–1881) and Émile Borel (1871–1956). Pavel Alexandrov (1896–1982) formally introduced all the axioms of topological spaces in 1925; see [5] with further references. A comprehensive theory of topological spaces was developed by Kazimierz Kuratowski in his 2-volume monograph [196]. One of the most fundamental results of analysis, Theorem 1.59, was proved by Karl Weierstrass (1815–1897) in 1860 for continuous functions on the real line by the arguments valid in general topological spaces; see [350]. The previous compactness result of Theorem 1.51 was actually first obtained, while not published, by Bernard Bolzano (1781–1848) in 1817 as a lemma in the proof of the intermediate value theorem for continuous functions on the real line without, however, a rigorous definition of real numbers. The classical product result of Theorem 1.48 in topological spaces is due to Andrey Tikhonov (1906–1993), who proved it in 1930 in his paper [340]. Although some ideas of the notion of norm appeared earlier, the formal axiomatic definitions of normed and Banach (complete normed) spaces were first introduced by Stefan Banach (1892–1945) in his doctoral dissertation [24] published in 1922. Banach made fundamental contributions to the theory of Banach spaces named after him. To the best of our knowledge, Andrey Kolmogorov (1903–1987) was the first to introduce the formal definition of topological vector spaces in his paper [184] published in 1934. The closely related concept of Fréchet spaces (complete metrizable topological vector spaces) was introduced earlier by Fréchet in 1926 under the name of topological affine spaces. The definition of locally convex topological vector (LCTV) spaces, which we start using in Chapter 2, is attributed to John von Neumann (1903–1957), who introduced it in 1935 in his paper [345] after explicitly defining the notion of weak topology, which inspired the general LCTV space concept. Von Neumann also provided the axiomatic definition of Hilbert spaces in 1929, where the name was credited to David Hilbert (1862–1943) due to Hilbert’s influential work on integral equations and Fourier series that had led to von Neumann’s definition. The origin of the Alaoglu-Bourbaki theorem goes

back to Banach [26] who proved the sequential weak∗ compactness of the closed
unit ball in dual spaces of separable normed spaces. The general results on the
topological weak∗ compactness of dual balls in arbitrary normed spaces given in
Corollary 1.113 is due to Leonidas Alaoglu (1914–1981); see [2]. This result is also
known as the Banach-Alaoglu theorem. The full statement of Theorem 1.112 in topo-
logical vector spaces is due to Bourbaki [61]. We present here a simplified proof of
this result. The first version of the fundamental Hahn-Banach extension theorem
was proved by Edward Helly (1884–1943) in his paper [156] published in 1912 for
the case of continuous functions defined in closed intervals of the real line. In his
paper [149] published in 1927, Hans Hahn (1879–1934) proved the Hahn-Banach
theorem in real Banach spaces by extending Helly’s techniques with the usage of
transfinite induction instead of the standard one. Independently of Hahn, but also
using transfinite induction, Banach proved in his paper [25] published in 1929 the
version of the Hahn-Banach theorem (Theorem 1.125) in real vector spaces that we
know nowadays. The Kuratowski-Zorn lemma, which we use in the refined proof
of Theorem 1.125, was established independently in the paper [195] by Kuratowski
published in 1922 and by Max Zorn (1906–1993) in his paper [368] published in 1935.
The open mapping theorem from Theorem 1.139, known also as the Banach-Schauder
principle, was obtained independently by Banach [25, Part II] in 1929 and by Juliusz
Schauder (1899–1943) in his paper [321] published in 1930. The related closed graph
theorem from Theorem 1.143 appeared (together with other fundamental results) in
the classical monograph by Banach [26] on the foundations of functional analysis
published in 1932. The uniform boundedness principle from Theorem 1.145, known
as the Banach-Steinhaus theorem, was also given in Banach’s book while being based
on the joint paper [28] by Banach and his teacher Hugo Steinhaus (1887–1972) pub-
lished in 1927. The Baire category theorem, which is crucially used in the proofs of
the basic results presented in Subsections 1.3.3 and 1.3.4, was obtained in the doc-
toral dissertation [22] by René Baire (1874–1932) published in 1899. More complete
theories of topological spaces and functional analysis presented in this chapter can
be found in the books by Bourbaki [60, 61], Dieudonné [107], Dunford and Schwartz
[114], Kelley [180], and Rudin [318] among other references.
2 BASIC THEORY OF CONVEXITY

This chapter is devoted to basic convexity theory dealing with sets and functions defined in various space frameworks consisting of linear/vector spaces, topological vector spaces, locally convex topological spaces and their subclasses, and also specific results in finite dimensions. Developing the geometric approach to convex analysis, we start with convex sets, establish fundamental separation theorems for them, and then proceed with the study of convex functions. Further topics on convexity, including duality and generalized differentiation theories, are considered in the subsequent chapters. Unless otherwise stated, we consider real vector spaces in this chapter and the subsequent ones.

2.1 Convexity of Sets


In this section we begin a systematic study of convex sets. The main results revolve around convex separation theorems. Prior to them we discuss important properties of convex sets used in what follows.

2.1.1 Basic Definitions and Elementary Properties

Here is the underlying definition of set convexity in vector spaces.


Definition 2.1 A subset Ω of a vector space X is called convex if we have
λx + (1 − λ)y ∈ Ω for all x, y ∈ Ω and λ ∈ (0, 1).
Given a, b ∈ X, the line segment [a, b] ⊂ X connecting these points is

[a, b] := {λa + (1 − λ)b | λ ∈ [0, 1]}. (2.1)
The line segments (a, b), (a, b], and [a, b) are defined similarly by
(a, b) := {λa + (1 − λ)b | λ ∈ (0, 1)},
(a, b] := {λa + (1 − λ)b | λ ∈ [0, 1)},
[a, b) := {λa + (1 − λ)b | λ ∈ (0, 1]}.
Note that if a = b, then all these segments reduce to the singleton {a}.

© Springer Nature Switzerland AG 2022 65


B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9 2

It is obvious that a set Ω is convex if and only if [a, b] ⊂ Ω for all a, b ∈ Ω.


Simple geometric illustrations of convex and nonconvex sets in the plane are
presented in Figure 2.1.

Fig. 2.1. Convex and nonconvex sets
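This segment characterization is easy to probe numerically. The following sketch (the sampling grid and the two sample sets are our own choices, and a finite sample can refute convexity but never prove it) tests it for a disk and an annulus in the plane:

```python
def segment_in_set(contains, a, b, samples=200):
    """Numerically check that the whole segment [a, b] lies in the set."""
    return all(
        contains(tuple(l * ai + (1 - l) * bi for ai, bi in zip(a, b)))
        for l in (k / samples for k in range(samples + 1))
    )

# Two sample sets in the plane: a disk is convex, an annulus is not.
disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
annulus = lambda p: 0.25 <= p[0] ** 2 + p[1] ** 2 <= 1.0

assert segment_in_set(disk, (1.0, 0.0), (0.0, 1.0))
# The segment from (1, 0) to (-1, 0) crosses the hole of the annulus.
assert not segment_in_set(annulus, (1.0, 0.0), (-1.0, 0.0))
```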

Let X and Y be vector spaces. A mapping B : X → Y is called affine if


there exist a linear mapping A : X → Y and a vector b ∈ Y such that
B(x) = A(x) + b for all x ∈ X. (2.2)
The following proposition provides a characterization of affine mappings.

Proposition 2.2 Let X and Y be vector spaces. Then B : X → Y is an affine


mapping if and only if we have that

B(λx1 + (1 − λ)x2) = λB(x1) + (1 − λ)B(x2) (2.3)
for all λ ∈ R and x1 , x2 ∈ X.

Proof. To prove the “only if” part, suppose that B is an affine mapping. Then
there exist a linear mapping A : X → Y and a vector b ∈ Y such that (2.2) is
satisfied. Given any x1, x2 ∈ X and λ ∈ R, it follows that

B(λx1 + (1 − λ)x2) = A(λx1 + (1 − λ)x2) + b
= λA(x1) + (1 − λ)A(x2) + λb + (1 − λ)b
= λ(A(x1) + b) + (1 − λ)(A(x2) + b)
= λB(x1) + (1 − λ)B(x2),
which therefore verifies the validity of (2.3).
To prove the opposite implication, suppose that B : X → Y satisfies (2.3)
for all λ ∈ R and x1 , x2 ∈ X. Letting b := B(0), define the mapping
A(x) := B(x) − b for x ∈ X (2.4)
and show that it is linear. Indeed, for any x1 , x2 ∈ X and λ ∈ R we employ
(2.3) and (2.4) to verify that

A(λx1 + (1 − λ)x2) = λA(x1) + (1 − λ)A(x2) and A(0) = 0.
Given any x ∈ X, observe that

A(λx) = A(λx + (1 − λ)0) = λA(x) + (1 − λ)A(0) = λA(x).
We have furthermore that

A(x1 + x2) = 2A((x1 + x2)/2) = 2A((1/2)x1 + (1/2)x2)
= 2((1/2)A(x1) + (1/2)A(x2)) = A(x1) + A(x2),
which justifies the linearity of A. It follows from (2.4) that B(x) = A(x) + b
whenever x ∈ X, and thus the mapping B is affine. 
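Identity (2.3) can be confirmed numerically for a concrete affine map; in the sketch below the matrix and shift are arbitrary illustrative choices, not taken from the text:

```python
import random

def affine(x):
    """A sample affine map B(x) = A(x) + b on R^2; the matrix A and the
    shift b are arbitrary illustrative choices."""
    A = ((2.0, 0.0), (1.0, 3.0))
    b = (1.0, -1.0)
    return tuple(sum(A[i][j] * x[j] for j in range(2)) + b[i] for i in range(2))

# Check identity (2.3) at random points; note that λ ranges over all of R,
# not only [0, 1], which is what distinguishes affinity from mere convexity.
random.seed(0)
for _ in range(100):
    x1 = (random.uniform(-5, 5), random.uniform(-5, 5))
    x2 = (random.uniform(-5, 5), random.uniform(-5, 5))
    lam = random.uniform(-2, 2)
    lhs = affine(tuple(lam * u + (1 - lam) * v for u, v in zip(x1, x2)))
    rhs = tuple(lam * u + (1 - lam) * v for u, v in zip(affine(x1), affine(x2)))
    assert all(abs(p - q) < 1e-9 for p, q in zip(lhs, rhs))
```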

It is easy to verify that the convexity of sets is preserved under taking their direct and inverse images/preimages by affine mappings.

Proposition 2.3 Let X and Y be vector spaces, and let B : X → Y be an


affine mapping. The following assertions hold:
(a) If Ω is a convex subset of X, then B(Ω) is a convex subset of Y .
(b) If Θ is a convex subset of Y , then B −1 (Θ) is a convex subset of X.

Proof. We only prove the first assertion and leave the proof of the second
one as an exercise for the reader. Fix any a, b ∈ B(Ω) and λ ∈ (0, 1). Then
a = B(x) and b = B(y) for some x, y ∈ Ω. Proposition 2.2 tells us that

λa + (1 − λ)b = λB(x) + (1 − λ)B(y) = B(λx + (1 − λ)y).
Since Ω is convex, we get λx + (1 − λ)y ∈ Ω, and hence λa + (1 − λ)b ∈ B(Ω).
This verifies the convexity of the image B(Ω). 

Next we proceed with Cartesian products. Given two vector spaces X and
Y , their product X × Y is a vector space with the operations
(x1 , y1 ) + (x2 , y2 ) := (x1 + x2 , y1 + y2 ),
λ(x1 , y1 ) := (λx1 , λy1 )
for (x1 , y1 ), (x2 , y2 ) ∈ X × Y and λ ∈ R.

Proposition 2.4 Let X and Y be vector spaces. If Ω1 is a convex subset of


X and Ω2 is a convex subset of Y , then the Cartesian product Ω1 × Ω2 is a
convex subset of the product space X × Y .

Proof. Fix any (a1 , a2 ), (b1 , b2 ) ∈ Ω1 × Ω2 and λ ∈ (0, 1). Then a1 , b1 ∈ Ω1


and a2, b2 ∈ Ω2. It follows from the convexity of Ωi, i = 1, 2, that

λ(a1, a2) + (1 − λ)(b1, b2) = (λa1 + (1 − λ)b1, λa2 + (1 − λ)b2) ∈ Ω1 × Ω2,
and thus Ω1 × Ω2 is a convex subset of X × Y . 

Finally in this subsection, we observe that the notion of convexity for sets
can be directly extended to set-valued mappings via passing to their graphs.
By a set-valued mapping/multifunction between vector spaces X and Y we
understand a mapping F defined on X with values in the collection of all the
subsets of Y, i.e., with F(x) ⊂ Y; see Figure 2.2. The notation F : X ⇉ Y
is used for set-valued mappings instead of the usual notation F : X → Y
for single-valued ones. As we see below in the text and commentaries, set-
valued mappings play a highly important role in many aspects of convex and
variational analysis as well as in their numerous applications.

Fig. 2.2. An example of a set-valued mapping

Having a set-valued mapping F : X ⇉ Y, we associate with it the following two sets: the domain and graph of F defined by

dom(F) := {x ∈ X | F(x) ≠ ∅} and gph(F) := {(x, y) ∈ X × Y | y ∈ F(x)},
respectively. The mapping F in question is called convex if its graph is a
convex set. In contrast to the case of single-valued mappings, where convexity of the graph reduces in fact to affinity, for set-valued mappings this assumption is fairly reasonable, and it is broadly used in the book.
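A quick numerical illustration of a convex set-valued mapping (the multifunction below is a hypothetical example of ours, not from the text):

```python
import random

# A hypothetical multifunction F(x) = [x**2, +inf) ⊂ R: its graph
# {(x, y) : y >= x**2} is the epigraph of a convex function, hence a convex
# subset of R^2, so F is a convex set-valued mapping with dom(F) = R.
def in_graph(x, y):
    return y >= x * x

# Sample pairs of graph points and check that connecting segments stay
# in the graph (a finite check, of course, not a proof).
random.seed(1)
checked = 0
while checked < 100:
    x1, y1 = random.uniform(-2, 2), random.uniform(0, 5)
    x2, y2 = random.uniform(-2, 2), random.uniform(0, 5)
    if in_graph(x1, y1) and in_graph(x2, y2):
        lam = random.random()
        assert in_graph(lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2)
        checked += 1
```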

2.1.2 Operations on Convex Sets and Convex Hulls

First we consider the following standard operations on arbitrary (not necessarily convex) sets in vector spaces. Given Ω, Ω1, Ω2 ⊂ X and λ ∈ R, define the set addition and multiplication by a real scalar as

Ω1 + Ω2 := {x1 + x2 | x1 ∈ Ω1, x2 ∈ Ω2} and λΩ := {λx | x ∈ Ω}.

Proposition 2.5 Let Ω1 and Ω2 be convex subsets of X, and let λ ∈ R be a


scalar. Then λΩ1 and Ω1 + Ω2 are convex subsets of X.
Proof. To verify the first statement, consider the mapping B : X → X given
by B(x) := λx, x ∈ X. Then B is affine and B(Ω1 ) = λΩ1 . Proposition 2.3
ensures that the set B(Ω1 ) is convex. The convexity of the sum Ω1 + Ω2 can
be verified similarly by using Proposition 2.4. 
A vector x ∈ X is called a convex combination of x1 , . . . , xm ∈ X if there are
numbers λ1, . . . , λm ≥ 0 such that

λ1 + · · · + λm = 1 and x = λ1 x1 + · · · + λm xm.

It follows from the definition that any vector of the form x = λa + (1 − λ)b,
where a, b ∈ X and 0 ≤ λ ≤ 1, is a convex combination of a and b.
Here is a useful characterization of convexity for arbitrary nonempty sets
in general (real) vector spaces via convex combinations of their elements; see
the illustration of this result in Figure 2.3.
Proposition 2.6 A subset Ω of a vector space X is convex if and only if it
contains all the convex combinations of its elements.
Proof. The sufficiency part is trivial. To justify the necessity, we show by induction that any convex combination x = λ1 ω1 + · · · + λm ωm of elements in Ω is also an element of Ω. This conclusion follows directly from the definition for m = 1, 2. Fix now a positive integer m ≥ 2 and suppose that every convex combination of m elements from Ω belongs to Ω. Form the convex combination

y = λ1 ω1 + · · · + λm+1 ωm+1 with λ1 + · · · + λm+1 = 1 and λi ≥ 0

Fig. 2.3. Illustration of Proposition 2.6

and observe that if λm+1 = 1, then λ1 = λ2 = . . . = λm = 0, and so y = ωm+1 ∈ Ω. In the case where λm+1 < 1 we get the representations

λ1 + · · · + λm = 1 − λm+1 and λ1/(1 − λm+1) + · · · + λm/(1 − λm+1) = 1,

which imply in turn, by the induction hypothesis, the inclusion

z := (λ1/(1 − λm+1)) ω1 + · · · + (λm/(1 − λm+1)) ωm ∈ Ω.

It yields therefore the relationships

y = (1 − λm+1)((λ1/(1 − λm+1)) ω1 + · · · + (λm/(1 − λm+1)) ωm) + λm+1 ωm+1 = (1 − λm+1) z + λm+1 ωm+1 ∈ Ω

and thus completes the proof of the proposition. □

Next we proceed with intersections of convex sets.

Proposition 2.7 Let {Ωα}α∈I be a collection of convex subsets of X. Then the intersection ⋂α∈I Ωα is also a convex subset of X.

Proof. Taking any a, b ∈ ⋂α∈I Ωα and λ ∈ (0, 1), we get that a, b ∈ Ωα for all α ∈ I. The convexity of each Ωα ensures that λa + (1 − λ)b ∈ Ωα for every α ∈ I. Thus λa + (1 − λ)b ∈ ⋂α∈I Ωα, and the intersection ⋂α∈I Ωα is convex. □

When a set is not convex, it is useful in many situations to consider its


convexification. Let us define this notion and study some of its properties.

Definition 2.8 Let Ω be a subset of a vector space X. The convex hull of Ω is the intersection of all convex sets in X that contain Ω, i.e.,

co(Ω) := ⋂ {C ⊂ X | C is convex and Ω ⊂ C}.

The next result follows from the definition and Proposition 2.7.

Proposition 2.9 Let X be a vector space, and let Ω be a subset of X. Then


the convex hull co(Ω) is the smallest convex set containing Ω.

Proof. The convexity of co(Ω) ⊃ Ω is a consequence of Proposition 2.7. On


the other hand, for any convex set C in X with Ω ⊂ C we clearly get from
the definition that co(Ω) ⊂ C. 

Now we are ready to provide an important representation of convex hulls


of arbitrary sets in vector spaces.

Theorem 2.10 For any subset Ω of a vector space X, its convex hull co(Ω) admits the representation

co(Ω) = {λ1 a1 + · · · + λm am | λ1 + · · · + λm = 1, λi ≥ 0, ai ∈ Ω, m ∈ N}.

Proof. Denoting by C the right-hand side of the claimed representation, we have Ω ⊂ C. Let us check that C is convex. Take any a, b ∈ C with

a = α1 a1 + · · · + αp ap and b = β1 b1 + · · · + βq bq,

where ai, bj ∈ Ω, αi, βj ≥ 0 with α1 + · · · + αp = β1 + · · · + βq = 1, and p, q ∈ N. It is easy to see that for every number λ ∈ (0, 1) we have

λa + (1 − λ)b = λα1 a1 + · · · + λαp ap + (1 − λ)β1 b1 + · · · + (1 − λ)βq bq.

Then the resulting equality

λα1 + · · · + λαp + (1 − λ)β1 + · · · + (1 − λ)βq = λ(α1 + · · · + αp) + (1 − λ)(β1 + · · · + βq) = 1

ensures that λa + (1 − λ)b ∈ C, and so C is convex; this yields co(Ω) ⊂ C by the definition of co(Ω). Fix now any a = λ1 a1 + · · · + λm am ∈ C with λ1 + · · · + λm = 1, λi ≥ 0, and ai ∈ Ω ⊂ co(Ω) for i = 1, . . . , m. Since co(Ω) is convex, it follows from Proposition 2.6 that a ∈ co(Ω). Thus we arrive at the equality co(Ω) = C. □
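The finite-dimensional content of this representation is easy to probe numerically. In the sketch below (the corner set is our own example) every random convex combination of the four corners of a square lands inside the square described by max(|x|, |y|) ≤ 1:

```python
import random

# The convex hull of the four corners below is the square max(|x|, |y|) <= 1,
# so by the representation above every convex combination of the corners
# must satisfy that inequality.
corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]

random.seed(3)
for _ in range(1000):
    raw = [random.random() for _ in corners]
    lam = [r / sum(raw) for r in raw]          # λ_i >= 0 and Σ λ_i = 1
    x = sum(l * c[0] for l, c in zip(lam, corners))
    y = sum(l * c[1] for l, c in zip(lam, corners))
    assert max(abs(x), abs(y)) <= 1.0 + 1e-12
```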

The next proposition concerns the operations of taking the topological


interior and closure of a convex set.

Proposition 2.11 Let X be a topological vector space. Then the interior int(Ω) and closure cl(Ω) of a convex set Ω ⊂ X are also convex.

Proof. Picking any a, b ∈ int(Ω) and λ ∈ (0, 1), find an open set V such that a ∈ V ⊂ Ω, and so

λa + (1 − λ)b ∈ λV + (1 − λ)b ⊂ Ω.

Since λV + (1 − λ)b is open, we get λa + (1 − λ)b ∈ int(Ω) and thus verify the convexity of the set int(Ω).

To proceed further with cl(Ω), fix a, b ∈ cl(Ω) and λ ∈ (0, 1). By Proposition 1.84 we have the relationships

λa + (1 − λ)b ∈ λ cl(Ω) + (1 − λ) cl(Ω) = cl(λΩ) + cl((1 − λ)Ω) ⊂ cl(λΩ + (1 − λ)Ω) ⊂ cl(Ω),

which show that the closure cl(Ω) is also convex. □

Fig. 2.4. An illustration of Lemma 2.12

The following lemma is fairly important for subsequent considerations. It


is illustrated by Figure 2.4.

Lemma 2.12 Let X be a topological vector space, and let Ω ⊂ X be a convex set with nonempty interior. Then for any a ∈ int(Ω) and b ∈ cl(Ω) we have the inclusion [a, b) ⊂ int(Ω).

Proof. Choose a neighborhood W of the origin such that a + W ⊂ Ω, and then select a balanced one V with V + V ⊂ W. Since b ∈ cl(Ω), for any ε > 0 we have b ∈ Ω + εV. Take now λ ∈ (0, 1] and let xλ := λa + (1 − λ)b. Choosing ε > 0 such that max{ε(1 − λ)/λ, ε/λ} ≤ 1 gives us the relationships

xλ + εV = λa + (1 − λ)b + εV
⊂ λa + (1 − λ)(Ω + εV) + εV
= λa + (1 − λ)Ω + (1 − λ)εV + εV
= λa + (1 − λ)εV + εV + (1 − λ)Ω
= λ(a + (ε(1 − λ)/λ)V + (ε/λ)V) + (1 − λ)Ω
⊂ λ(a + V + V) + (1 − λ)Ω
⊂ λ(a + W) + (1 − λ)Ω
⊂ λΩ + (1 − λ)Ω ⊂ Ω.

This shows that xλ ∈ int(Ω) and thus verifies that [a, b) ⊂ int(Ω). □
Next we discuss some relationships between the closure and interior operations applied to the convex set in question and its closure.

Theorem 2.13 Let X be a topological vector space, and let Ω ⊂ X be a convex set with nonempty interior. Then the following assertions hold:
(a) cl(int(Ω)) = cl(Ω).
(b) int(cl(Ω)) = int(Ω).

Proof. The first assertion requires us to show that cl(Ω) ⊂ cl(int(Ω)), since the opposite inclusion is obvious. Picking b ∈ cl(Ω), a ∈ int(Ω), and t ∈ (0, 1), we set

xt := ta + (1 − t)b = b + t(a − b).

Lemma 2.12 gives us xt ∈ int(Ω). For any neighborhood V of the origin, choose t > 0 so small that t(a − b) ∈ V, and hence xt ∈ (b + V) ∩ int(Ω). This yields b ∈ cl(int(Ω)), and we are done.

To prove the second assertion, we need to verify that int(cl(Ω)) ⊂ int(Ω); the opposite inclusion is obvious. Fix any vectors b ∈ int(cl(Ω)) and a ∈ int(Ω), and then take ε > 0 sufficiently small such that c := b + ε(b − a) ∈ cl(Ω). Using Lemma 2.12 brings us to the inclusion

b = (ε/(1 + ε)) a + (1/(1 + ε)) c ∈ (a, c) ⊂ int(Ω),

which justifies that int(cl(Ω)) ⊂ int(Ω) and thus completes the proof. □

The next result also employs Lemma 2.12 to calculate the closure of convex
set intersections under the interiority qualification condition.

Theorem 2.14 Let X be a topological vector space, and let Ω1 and Ω2 be convex subsets of X satisfying the qualification condition

int(Ω1) ∩ Ω2 ≠ ∅.

Then we have the representation

cl(Ω1 ∩ Ω2) = cl(Ω1) ∩ cl(Ω2).

Proof. It suffices to show that cl(Ω1) ∩ cl(Ω2) ⊂ cl(Ω1 ∩ Ω2), since the opposite inclusion always holds. To proceed, fix a ∈ int(Ω1) ∩ Ω2 and pick any x ∈ cl(Ω1) ∩ cl(Ω2). Taking a neighborhood W of the origin, let us show that (x + W) ∩ (Ω1 ∩ Ω2) ≠ ∅. Choose another neighborhood V of the origin with V + V ⊂ W, and then select 0 < t < 1 to be so small that t(a − x) ∈ V. Using again Lemma 2.12 and the choice of x, we get

xt := x + t(a − x) ∈ int(Ω1) ∩ cl(Ω2).

Since xt ∈ int(Ω1), there exists a neighborhood U of the origin in X such that

U ⊂ V and xt + U ⊂ Ω1,

which ensures that xt + U = x + t(a − x) + U ⊂ x + V + V ⊂ x + W. Observe further that (xt + U) ∩ Ω2 ≠ ∅ due to xt ∈ cl(Ω2). Now select u ∈ U with xt + u ∈ Ω2 and get in addition to xt + u ∈ x + W that

xt + u = x + t(a − x) + u ∈ Ω1 ∩ Ω2.

Thus (x + W) ∩ (Ω1 ∩ Ω2) ≠ ∅, i.e., x ∈ cl(Ω1 ∩ Ω2). □

2.2 Cores, Minkowski Functions, and Seminorms


We start this section with the consideration of algebraic counterparts of the
interior and closure in arbitrary (real) vector spaces without any topology.
Then we proceed with the study of the Minkowski gauge functions and seminorms, which play—besides their own interest—an important role in deriving
convex separation theorems given in the next section.

2.2.1 Algebraic Interior and Linear Closure

Given a subset Ω of a real vector space X, define the following algebraic


notions, known as the algebraic interior (or core) of Ω and the linear closure
of Ω, respectively, by

core(Ω) := {x ∈ Ω | ∀v ∈ X, ∃δ > 0, ∀t with |t| < δ : x + tv ∈ Ω}, (2.5)

lin(Ω) := {x ∈ X | ∃w ∈ Ω : [w, x) ⊂ Ω}. (2.6)
When X is a topological vector space and Ω is a convex subset of X, it is
easy to check that
int(Ω) ⊂ core(Ω) ⊂ Ω ⊂ lin(Ω) ⊂ cl(Ω), (2.7)
where all the inclusions can be strict.
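Definition (2.5) can be probed numerically in R^n. The sketch below (set, directions, and step sizes are our own choices) tests the core condition along finitely many directions; such a finite test can refute core membership but never certify it:

```python
def in_core(contains, x, dirs, delta=1e-3, samples=20):
    """Test condition (2.5) numerically along finitely many directions:
    x + t*v must stay in the set for sampled |t| < delta."""
    ts = [delta * k / samples for k in range(-samples + 1, samples)]
    return all(
        contains([xi + t * vi for xi, vi in zip(x, v)])
        for v in dirs
        for t in ts
    )

square = lambda p: abs(p[0]) <= 1.0 and abs(p[1]) <= 1.0
dirs = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (1.0, 1.0)]

assert in_core(square, (0.0, 0.0), dirs)      # interior point of the square
assert not in_core(square, (1.0, 0.0), dirs)  # boundary point fails along (1, 0)
```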
The next proposition follows directly from the definitions.

Proposition 2.15 Let Ω be a nonempty subset of a vector space X. Then


x ∈ core(Ω) if and only if Ω − x is an absorbing set.

The following result shows that the operation of taking cores always preserves set convexity.
Proposition 2.16 The core of every convex subset of a vector space X is also
a convex subset of X.

Proof. Fix any a, b ∈ core(Ω) and 0 < λ < 1. It follows from (2.5) that for
any v ∈ X there exists δ > 0 such that
a + γv ∈ Ω and b + γv ∈ Ω whenever |γ| < δ.
For each such number γ we have the relationships
λa + (1 − λ)b + γv = λ(a + γv) + (1 − λ)(b + γv) ∈ λΩ + (1 − λ)Ω ⊂ Ω.
This implies that λa + (1 − λ)b ∈ core(Ω), and hence core(Ω) is convex. 

The next observation presents a useful way to check that a point belongs
to the core of a given convex set.

Proposition 2.17 Let X be a vector space, and let Ω ⊂ X be a convex set


with x0 ∈ Ω. Suppose further that for any v ∈ X there exists δ > 0 such that
x0 + λv ∈ Ω whenever 0 < λ < δ. Then we have x0 ∈ core(Ω).

Proof. Fix any v ∈ X and find δ+ > 0 for which


x0 + λv ∈ Ω whenever 0 < λ < δ+ .
On the other hand, there exists δ− > 0 such that x0 + λ(−v) ∈ Ω for all
0 < λ < δ− . Denoting δ := min{δ+ , δ− } > 0, we can easily see that x0 + λv ∈
Ω whenever |λ| < δ, which shows that x0 ∈ core(Ω). 

It is interesting to find verifiable conditions under which the interior and


core agree for convex sets in topological vector spaces. To establish the first
important result in this direction, we use again Lemma 2.12.
Theorem 2.18 Let X be a topological vector space, and let Ω ⊂ X be a
convex set with nonempty interior. Then int(Ω) = core(Ω).

Proof. The inclusion int(Ω) ⊂ core(Ω) was mentioned in (2.7); see Exer-
cise 2.200. To verify the opposite one, consider the following two cases.
Case 1: 0 ∈ int(Ω). Fix any x ∈ core(Ω) and by the definition of cores find t > 0 with x + tx ∈ Ω. Then we have x = (1/(1 + t))w for some w ∈ Ω. Employing Lemma 2.12 tells us that

x = (1/(1 + t))w + (t/(1 + t))·0 ∈ [0, w) ⊂ int(Ω),

which shows that core(Ω) ⊂ int(Ω).

Case 2: 0 ∉ int(Ω). Choose a ∈ int(Ω) and define Θ := Ω − a. Then 0 ∈ int(Θ),
and we get therefore that
core(Ω) − a = core(Θ) ⊂ int(Θ) = int(Ω) − a.
This yields core(Ω) ⊂ int(Ω), which completes the proof. 

Next we consider the linear closure lin(Ω) of a set Ω ⊂ X defined in (2.6).


It is instrumental, in particular, for the study of the Minkowski function given
below. Now we show that lin(Ω) reduces to the closure of Ω for solid (i.e.,
with nonempty interior) convex subsets of topological vector spaces.
First we check the convexity of the set lin(Ω) provided that Ω is convex.

Proposition 2.19 Let Ω be a subset of a vector space X. If Ω is convex, then


the set lin(Ω) is also convex.

Proof. Pick a, b ∈ lin(Ω) and λ ∈ (0, 1). Then there are vectors u, v ∈ Ω with
[u, a) ⊂ Ω and [v, b) ⊂ Ω.
Denoting xλ := λa + (1 − λ)b and wλ := λu + (1 − λ)v ∈ Ω, we see that
[wλ , xλ ) ⊂ Ω, and so xλ ∈ lin(Ω). This verifies the convexity of lin(Ω). 

Now we are ready to verify the aforementioned relationship.

Proposition 2.20 Let X be a topological vector space, and let Ω be a convex subset of X with nonempty interior. Then we have lin(Ω) = cl(Ω).

Proof. To check that lin(Ω) ⊂ cl(Ω), take any x ∈ lin(Ω) and find by (2.6) a vector w ∈ Ω such that [w, x) ⊂ Ω. Fix a neighborhood V of the origin. Since V is always absorbing, we choose 0 < t < 1 so small that t(w − x) ∈ V. Then

x + t(w − x) = tw + (1 − t)x ∈ [w, x) ⊂ Ω,

which yields x + t(w − x) ∈ (x + V) ∩ Ω. Hence (x + V) ∩ Ω ≠ ∅, and so x ∈ cl(Ω).

Let us further show that cl(Ω) ⊂ lin(Ω) under the assumption that int(Ω) ≠ ∅. Fix a ∈ int(Ω) and deduce from Lemma 2.12 that

[a, x) ⊂ int(Ω) ⊂ Ω for any x ∈ cl(Ω).

It tells us that x ∈ lin(Ω), which completes the proof. □

The next property of cores in vector spaces is a precise counterpart of the


one known for interiors of convex sets in spaces with topological structure.

Proposition 2.21 Let X be a vector space, and let Ω be a convex subset of


X. If a ∈ core(Ω) and b ∈ Ω, then [a, b) ⊂ core(Ω).

Proof. Fix λ ∈ (0, 1), define xλ := λa + (1 − λ)b, and then verify that xλ ∈ core(Ω). Pick any v ∈ X. Since a ∈ core(Ω), there exists δ > 0 such that
a + tv ∈ Ω whenever |t| < δ.
Now taking such t and using the convexity of Ω readily imply that
xλ + tλv = λa + (1 − λ)b + tλv = λ(a + tv) + (1 − λ)b ∈ Ω,
which yields xλ ∈ core(Ω). 

The obtained proposition leads to an interesting observation.

Corollary 2.22 Let Ω be a convex subset of a vector space. Then

core(core(Ω)) = core(Ω).

Proof. Since core(Ω) ⊂ Ω, we immediately get

core(core(Ω)) ⊂ core(Ω).
To verify the opposite inclusion, fix a ∈ core(Ω) and take any v ∈ X. It follows
from (2.5) that there exists δ > 0 such that
a + tv ∈ Ω whenever |t| ≤ δ.
Then Proposition 2.21 tells us that a + γv ∈ core(Ω) for all γ ∈ R with
|γ| < δ/2, and thus we arrive at a ∈ core(core(Ω)). 

The next result in this direction addresses the Banach space setting and
employs the Baire category theorem. First we present a characterization of
absorbing sets in Banach spaces, which is certainly of its own interest.

Lemma 2.23 Let X be a Banach space, and let Ω be a closed convex subset
of X. Then Ω is absorbing if and only if 0 ∈ int(Ω).
Proof. We only need to show that the absorbing property of Ω yields 0 ∈ int(Ω); the converse is obvious. If Ω is absorbing, we have

X = ⋃n∈N nΩ.

Then the Baire category theorem ensures the existence of n0 ∈ N such that

int(n0 Ω) ≠ ∅.
This implies that int(Ω) ≠ ∅. Picking any x0 ∈ int(Ω) and using the absorbing
property of Ω, we find ε > 0 such that [−εx0 , εx0 ] ⊂ Ω. In particular, it shows
that −εx0 ∈ Ω, and thus 0 ∈ (−εx0 , x0 ) ⊂ int(Ω) by Lemma 2.12. 

It is now easy to describe another setting where int(Ω) = core(Ω).

Theorem 2.24 Let X be an arbitrary Banach space. Then int(Ω) = core(Ω)


for any closed and convex subset Ω of the space X.

Proof. It suffices to verify the inclusion core(Ω) ⊂ int(Ω) whenever the set Ω
is nonempty, closed, and convex. Fix any x ∈ core(Ω) and observe that the
shifted set Ω − x is closed and absorbing. Thus 0 ∈ int(Ω − x) = int(Ω) − x
by Lemma 2.23. It shows that x ∈ int(Ω), and we are done. 

Proposition 2.25 Let Ω be a convex subset of a vector space X, and let A : X → Y be a linear mapping into a vector space Y. If A is surjective, then

A(core(Ω)) ⊂ core(A(Ω)), (2.8)

where the equality holds if we assume in addition that core(Ω) ≠ ∅.

Proof. Fix any x0 ∈ core(Ω) and show that A(x0 ) ∈ core(A(Ω)). Picking
v ∈ Y gives us v = A(u) for some u ∈ X. Choose δ > 0 such that x0 + tu ∈ Ω
whenever |t| < δ. Thus we get
A(x0 ) + tv = A(x0 + tu) ∈ A(Ω) whenever |t| < δ.
It follows therefore that A(x0 ) ∈ core(A(Ω)).
If core(Ω) ≠ ∅, let us further check that

core(A(Ω)) ⊂ A(core(Ω)).

First consider the case where 0 ∈ core(Ω). Choose any y ∈ core(A(Ω)) and find t > 0 such that y + ty ∈ A(Ω), which tells us that

y ∈ (1/(1 + t)) A(Ω) = A((1/(1 + t)) Ω).

Since 0 ∈ core(Ω), it follows from Proposition 2.21 that (1/(1 + t)) Ω ⊂ core(Ω). Thus y ∈ A(core(Ω)), which justifies the equality in this case.
In the general case of the proposition, take any a ∈ core(Ω) and get 0 ∈ core(Ω − a). Then

A(core(Ω − a)) = core(A(Ω − a)),

which clearly yields the equality in (2.8). □

2.2.2 Minkowski Gauges

Given a nonempty set Ω in a vector space X, we associate with it the Minkowski function pΩ : X → R, known also as the Minkowski gauge, by

pΩ(x) := inf{t ≥ 0 | x ∈ tΩ}, x ∈ X. (2.9)
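For a concrete convex absorbing Ω, the infimum in (2.9) can be approximated by bisection, since membership of x in tΩ is monotone in t > 0; in the sketch below Ω is the unit square, for which the gauge is the max-norm (our own example, not from the text):

```python
def gauge(contains, x, hi=1e6, tol=1e-9):
    """Approximate the Minkowski gauge p(x) = inf{t >= 0 : x in t*Omega}
    by bisection, assuming Omega is convex and absorbing with 0 in Omega."""
    def in_t(t):
        return t > 0 and contains([xi / t for xi in x])
    lo = 0.0
    assert in_t(hi), "x not absorbed even at t = hi; increase hi"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if in_t(mid):
            hi = mid
        else:
            lo = mid
    return hi

# For Omega the square max(|x1|, |x2|) <= 1, the gauge is the max-norm.
square = lambda p: max(abs(p[0]), abs(p[1])) <= 1.0
assert abs(gauge(square, [0.6, -1.5]) - 1.5) < 1e-6
```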
The next theorem presents some properties of the Minkowski gauge.

Theorem 2.26 Let Ω be an absorbing convex set in a vector space X. Then


the Minkowski function pΩ is real-valued and satisfies the following properties:
(a) pΩ is subadditive and positively homogeneous.
(b) {x ∈ X | pΩ(x) < 1} = core(Ω).
(c) {x ∈ X | pΩ(x) ≤ 1} = lin(Ω).

Proof. Since Ω is absorbing, for any x ∈ X there exists δ > 0 such that
tx ∈ Ω whenever |t| < δ,
which shows that the value pΩ (x) is a real number. Let us now verify each of
the listed properties of pΩ .
(a) To check the subadditivity of pΩ , for any x, y ∈ X pick ε > 0 and find
numbers s, t > 0 such that
s < pΩ (x) + ε, t < pΩ (y) + ε, and x ∈ sΩ, y ∈ tΩ.
Since Ω is convex, we get x + y ∈ sΩ + tΩ = (s + t)Ω, and so
pΩ (x + y) ≤ s + t < pΩ (x) + pΩ (y) + 2ε.
This implies that pΩ (x + y) ≤ pΩ (x) + pΩ (y) and thus shows that pΩ is
subadditive. Taking further x ∈ X and λ > 0, we have

pΩ(λx) = inf{t ≥ 0 | λx ∈ tΩ} = inf{t ≥ 0 | x ∈ (t/λ)Ω}
= λ inf{s ≥ 0 | x ∈ sΩ} = λ pΩ(x),
which justifies the positive homogeneity of the Minkowski function.
(b) Pick any x ∈ X with pΩ (x) < 1 and find λ ∈ (0, 1) such that x ∈ λΩ.
Since Ω is absorbing, for any v ∈ X there exists γ > 0 with αv ∈ Ω whenever
|α| < γ. Thus (1 − λ)αv ∈ (1 − λ)Ω for all α ∈ R with |α| < γ. It follows from
the convexity of Ω that
x + (1 − λ)αv ∈ λΩ + (1 − λ)Ω = Ω whenever |α| < γ.
Letting δ := (1 − λ)γ yields |ε/(1 − λ)| < γ if |ε| < δ = (1 − λ)γ, and so
x + (1 − λ)(ε/(1 − λ))v = x + εv ∈ Ω,
which verifies the inclusion x ∈ core(Ω).
Conversely, suppose that x ∈ core(Ω) and find γ > 0 with x + γx ∈ Ω.
Then we get pΩ(x) ≤ 1/(1 + γ) < 1, which completes the proof of (b).
(c) Fix any x ∈ X with pΩ(x) ≤ 1 and any λ ∈ (0, 1). Then pΩ(λx) < 1 and
therefore λx ∈ core(Ω) ⊂ Ω. Since 0 ∈ Ω, it follows that (1 − λ)0 + λx ∈ Ω
for all λ ∈ (0, 1), and hence [0, x) ⊂ Ω. Thus we arrive at x ∈ lin(Ω).
80 2 BASIC THEORY OF CONVEXITY

To prove the opposite inclusion, take x ∈ lin(Ω) and find w ∈ Ω such
that [w, x) ⊂ Ω. Then we have the relationships

(1 − 1/n)pΩ(x) = pΩ((1 − 1/n)x) = pΩ((1 − 1/n)x + (1/n)w + (−1/n)w)
≤ pΩ((1 − 1/n)x + (1/n)w) + pΩ(−(1/n)w)
≤ 1 + (1/n)pΩ(−w).
Letting finally n → ∞ tells us that pΩ (x) ≤ 1, which completes the proof. 
In topological vector spaces we obtain the following useful consequences.

Corollary 2.27 Let X be a topological vector space, and let Ω be a convex
set such that 0 ∈ int(Ω). Then pΩ is continuous, and we have

int(Ω) = {x ∈ X | pΩ(x) < 1} and Ω̄ = {x ∈ X | pΩ(x) ≤ 1}.
Proof. Fix any ε > 0 and deduce from 0 ∈ int(Ω) that

εΩ ⊂ pΩ⁻¹([0, ε]).

For any open set V of R containing 0, find ε > 0 such that [0, ε] ⊂ V. Then

εΩ ⊂ pΩ⁻¹(V).
Since εΩ is a neighborhood of the origin, pΩ is continuous at the origin. The
subadditivity of pΩ from Theorem 2.26(a) yields its continuity on the whole
space X. Using now Theorem 2.26(b) tells us that

{x ∈ X | pΩ(x) < 1} = core(Ω) = int(Ω)

due to int(Ω) ≠ ∅. Finally, the usage of Theorem 2.26(c) together with Propo-
sition 2.20 ensures the fulfillment of

{x ∈ X | pΩ(x) ≤ 1} = lin(Ω) = Ω̄

and thus completes the proof of the corollary. 

2.2.3 Seminorms and Locally Convex Topologies

In this subsection we study the important notion of seminorms introduced in


Definition 1.126. This notion strongly relates to the Minkowski function.
It is an easy exercise to observe some elementary properties of seminorms.

Proposition 2.28 Let X be a vector space, and let p : X → R be a seminorm.


Then we have the following properties:
(a) p(x) ≥ 0 for all x ∈ X.
(b) p(0) = 0.
(c) |p(x) − p(y)| ≤ p(x − y) for all x, y ∈ X.
(d) The set L := {x ∈ X | p(x) = 0} is a linear subspace of X.
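The listed properties can be checked numerically on a concrete seminorm; the sketch below (our illustration, not part of the text, assuming NumPy) uses p(x) = |x₁| on R², whose zero set L is the x₂-axis.

```python
import numpy as np

# p(x) = |x_1| is a seminorm on R^2 that is not a norm (illustration only)
p = lambda x: abs(x[0])

x, y = np.array([2.0, 5.0]), np.array([-1.0, 3.0])
assert p(x) >= 0                      # property (a)
assert p(np.zeros(2)) == 0            # property (b)
assert abs(p(x) - p(y)) <= p(x - y)   # property (c), the triangle estimate
# property (d): p vanishes exactly on the x_2-axis, a linear subspace of R^2
assert p(np.array([0.0, 7.0])) == 0 and p(np.array([0.1, 0.0])) > 0
```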

Now we are ready to establish close relationships between seminorms and


the Minkowski function for a large class of convex sets in vector spaces.

Theorem 2.29 Let X be a vector space.


(a) If p : X → R is a seminorm, then the sets B := {x ∈ X | p(x) < 1} and
C := {x ∈ X | p(x) ≤ 1} are balanced, convex, and absorbing.
(b) Let Ω be a balanced, convex, and absorbing set in X. Then the Minkowski
function pΩ of Ω is a seminorm.

Proof. (a) Fix any x ∈ B and any scalar t with |t| ≤ 1. Then p(tx) = |t|p(x) ≤
p(x) < 1. It tells us that tx ∈ B, and hence B is balanced. For any x, y ∈ B
and λ ∈ (0, 1) we get the relationships

p(λx + (1 − λ)y) ≤ p(λx) + p((1 − λ)y) = |λ|p(x) + |1 − λ|p(y) < λ + (1 − λ) = 1,
which show that λx + (1 − λ)y ∈ B, and so B is convex. If p(x) = 0, then
x ∈ B. For p(x) > 0 we take t := 1/(p(x) + 1) and get p(λx) = |λ|p(x) < 1
whenever |λ| < t, which tells us that B is absorbing. We can similarly verify
that the set C is also balanced, convex, and absorbing.
(b) Due to Theorem 2.26 it remains to show that pΩ (λx) = |λ|pΩ (x) for all
x ∈ X and scalar λ ∈ R. Fixing any x ∈ X and λ with |λ| = 1, we deduce
from the balanced property of Ω, and hence of the set tΩ with t > 0, that
λx ∈ tΩ if and only if x ∈ tΩ.
It follows furthermore that
pΩ(λx) = inf{t > 0 | λx ∈ tΩ} = inf{t > 0 | x ∈ tΩ} = pΩ(x).

Taking now λ ∈ R \ {0}, we have the condition

pΩ((λ/|λ|)x) = pΩ(x).
Finally, Theorem 2.26(a) yields pΩ (λx) = |λ|pΩ (x) for any scalar λ. 

Next we introduce an important property of a collection of seminorms.

Definition 2.30 We say that a family P of seminorms on X separates
points if for any 0 ≠ x ∈ X there exists p ∈ P such that p(x) ≠ 0.
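For instance, on R³ the coordinate seminorms pᵢ(x) = |xᵢ| separate points: each one vanishes on a whole coordinate plane, yet no nonzero vector is annihilated by all three. A quick numeric check (our sketch, not part of the text, assuming NumPy):

```python
import numpy as np

# The family P = {p_0, p_1, p_2}, p_i(x) = |x_i|, separates points of R^3;
# the i=i default binds each index at definition time.
P = [lambda x, i=i: abs(x[i]) for i in range(3)]

x = np.array([0.0, 0.0, 2.5])             # nonzero, killed by p_0 and p_1 ...
assert any(p(x) != 0 for p in P)          # ... but detected by p_2
assert all(p(np.zeros(3)) == 0 for p in P)
```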

The following theorem demonstrates the usefulness of this property.

Theorem 2.31 Let P be a family of seminorms on a vector space X. Impose


the weakest topology τw on X such that every p ∈ P is continuous; see
Theorem 1.19. Given x0 ∈ X, ε > 0, and p1, . . . , pm ∈ P, define

V = V(x0, p1, . . . , pm; ε) := {x ∈ X | pi(x − x0) < ε for all i = 1, . . . , m}

and consider the collection of sets

B := {V(x0, p1, . . . , pm; ε) | p1, . . . , pm ∈ P, ε > 0, m ∈ N}.
Suppose that P separates points. Then B is a basis of neighborhoods of x0 in
(X, τw ), and (X, τw ) is a topological vector space.

Proof. First we check that B is a basis of neighborhoods of x0 in (X, τw ). Fix


a neighborhood G of x0 in (X, τw) and find open intervals Ii for i = 1, . . . , m
and seminorms p1, . . . , pm ∈ P such that

x0 ∈ ⋂_{i=1}^m pi⁻¹(Ii) ⊂ G.

Then it follows that pi (x0 ) ∈ Ii , and so there exists ε > 0 with (pi (x0 ) −
ε, pi (x0 ) + ε) ⊂ Ii for all i = 1, . . . , m. We claim that
x0 ∈ V (x0 , p1 , . . . , pm ; ε) ⊂ G.
Indeed, pick any vector x ∈ V (x0 , p1 , . . . , pm ; ε) and thus get pi (x − x0 ) < ε
for all i = 1, . . . , m. Then we have
|pi (x) − pi (x0 )| ≤ pi (x − x0 ) < ε.
It follows that pi(x) ∈ Ii for all i = 1, . . . , m, and thus x ∈ ⋂_{i=1}^m pi⁻¹(Ii) ⊂ G,
which verifies that B is a basis of neighborhoods of x0 in (X, τw ).
Let us now show that the addition operation + : X ×X → X is continuous
on X ×X. By its linearity we only need to prove that it is continuous at (0, 0).
Fix any neighborhood V := V(0, p1, . . . , pm; ε) of the origin, meaning that

V = {x ∈ X | pi(x) < ε for all i = 1, . . . , m},
and let U := V (0, p1 , . . . , pm ; ε/2). It is easy to see that U + U ⊂ V , which
readily verifies the continuity of the addition on X × X.
To verify further that the scalar multiplication · : R × X → X is con-
tinuous, fix any number λ0 ∈ R, point x0 ∈ X, and neighborhood V :=
V (λ0 x0 , p1 , . . . , pm ; ε) of λ0 x0 . Take δ > 0 so small that
|λ0 |δ + pi (x0 )δ + δ 2 < ε for all i = 1, . . . , m.
If |λ−λ0 | < δ and x ∈ V (x0 , p1 , . . . , pm ; δ), then |λ| < |λ0 |+δ and pi (x−x0 ) <
δ for all i = 1, . . . , m. It follows that
pi(λx − λ0x0) ≤ pi(λx − λx0) + pi(λx0 − λ0x0)
= |λ|pi(x − x0) + |λ − λ0|pi(x0) < (|λ0| + δ)δ + δpi(x0)
= |λ0|δ + pi(x0)δ + δ² < ε.
Thus λx ∈ V , which proves the continuity of the scalar multiplication.
Finally, fixing any x0 = 0 and using the assumption that the seminorm
family P separates points, we find p ∈ P such that ε := p(x0) > 0. Then
V(x0, p; ε) is a neighborhood of x0 with 0 ∉ V(x0, p; ε). This shows that X is
a topological vector space and therefore completes the proof of the theorem.
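The key inclusion U + U ⊂ V used above for the continuity of addition is just the triangle inequality for each seminorm. A randomized numeric check of this inclusion (our sketch, not part of the text, assuming NumPy; the two seminorms on R² are chosen arbitrarily):

```python
import numpy as np

P = [lambda x: abs(x[0]), lambda x: abs(x[0] + x[1])]   # seminorms on R^2
eps = 1.0
in_V = lambda x, e: all(p(x) < e for p in P)            # x in V(0, p1, p2; e)

rng = np.random.default_rng(0)
pts = rng.uniform(-5, 5, size=(1000, 2))
U = [x for x in pts if in_V(x, eps / 2)]                # samples from U

# Triangle inequality gives p(u + v) <= p(u) + p(v) < eps, so U + U lies in V
assert all(in_V(u + v, eps) for u in U for v in U)
```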

Next we introduce a major subclass of topological vector spaces, which
constitutes the most appropriate territory for many important results pre-
sented in the subsequent parts of the book.
Definition 2.32 Let X be a topological vector space. We say that X is a
locally convex topological vector space (LCTV space for brevity)
if it has a basis of neighborhoods of the origin consisting of convex sets.
It is not hard to see that we can always take balanced convex sets
in the neighborhood basis.
Proposition 2.33 Let X be an LCTV space. Then X has a basis of neighbor-
hoods of the origin consisting of balanced convex sets.
Proof. Let B be a basis of neighborhoods of the origin consisting of convex
sets. Let B1 := {V ∩ (−V ) | V ∈ B}. Then B1 is a basis of neighborhoods of
the origin with the required properties. 
The following example shows that the topological dual of a topological
vector space equipped with weak∗ topology is an LCTV space.
Example 2.34 Let X be a topological vector space. Then the dual space X ∗
equipped with the weak∗ topology is an LCTV space. Indeed, by Proposi-
tion 1.110 we see that X ∗ is a topological vector space, while Corollary 1.109
tells us that the collection
B∗ := {⋂_{i=1}^m V(xi; ε) | xi ∈ X, ε > 0, m ∈ N}

forms a basis of neighborhoods of the origin in X ∗ . The reader can easily


check that each element of B ∗ is convex.
Now we show that the LCTV property of a space is inherited by all the
quotient subspaces with respect to any closed subspace of X.
Proposition 2.35 Let X be an LCTV space, and let L be a closed subspace
of X. Then X/L is an LCTV space as well.
Proof. Indeed, let B be a basis of neighborhoods of the origin in X consisting
of convex sets, and let π : X → X/L stand for the quotient mapping. It follows
from Theorem 1.117(c) and the linearity of π that π(B) is a basis of neighborhoods
of the origin in X/L consisting of convex sets. Thus X/L is an LCTV space. 
The next theorem shows that the topology of any LCTV space is generated
by a family of seminorms.
Theorem 2.36 Let X be an LCTV space. Then its topology is generated
by some family of seminorms. This means that there exists a family P of
seminorms on X such that the weakest topology, which makes each p ∈ P
continuous, agrees with the given topology on X.
Proof. Since X is an LCTV space, there exists a basis of neighborhoods B of
the origin in X consisting of open balanced convex sets. Then P := {pΩ | Ω ∈
B} is a family of seminorms on X. Let τ be the given topology on X, and let
τw be the topology generated by P as in Theorem 1.19. To show that τ = τw ,
it suffices to verify that for any V ∈ B there exists a neighborhood U of the
origin in (X, τw ) such that U ⊂ V , and that for any neighborhood U of the
origin in (X, τw ) there exists V ∈ B with V ⊂ U . Indeed, picking V ∈ B gives
us pV ∈ P with V = {x ∈ X | pV (x) < 1}. Defining now U := V (0, pV ; 1), we
see that U is a neighborhood of the origin in (X, τw ) with U = V . Conversely,
take a neighborhood U of the origin in (X, τw ) and suppose without loss of
generality that
  
U = V(0, pV1, . . . , pVm; ε) = {x ∈ X | pVi(x) < ε for all i = 1, . . . , m},

where V1, . . . , Vm ∈ B and ε > 0. Then U = ε ⋂_{i=1}^m Vi is a neighborhood of
the origin in (X, τ), so there exists V ∈ B with V ⊂ U. It verifies that τ = τw. 

To finish this subsection, we present a useful consequence of the Hahn-


Banach theorem and the seminorm property of the Minkowski gauge.

Theorem 2.37 Let X be an LCTV space, and let Y be a linear subspace of


X with the induced topology. Suppose that f : Y → R is a linear continuous
function. Then there exists another linear continuous function F : X → R
with F (y) = f (y) as y ∈ Y .

Proof. Since f is continuous on Y , we find an open, balanced, convex, and


absorbing set Ω such that
|f (y)| ≤ 1 for all y ∈ Ω ∩ Y.
Theorem 2.29(b) tells us that the Minkowski function pΩ : X → R is a semi-
norm. To check further the estimate
|f (y)| ≤ pΩ (y) for all y ∈ Y, (2.10)
fix any y ∈ Y and any t > 0 with y/t ∈ Ω. Then |f (y/t)| ≤ 1, and so
|f (y)| ≤ t. It follows that
|f(y)| ≤ inf{t > 0 | y/t ∈ Ω} = pΩ(y),
which verifies (2.10). Employing now the Hahn-Banach theorem gives us a


linear function F : X → R such that F (y) = f (y) as y ∈ Y and that
|F (x)| ≤ pΩ (x) for all x ∈ X.
This ensures that |F (x)| ≤ 1 for all x ∈ Ω, which yields in turn the continuity
of F by remembering that Ω is a neighborhood of the origin. 

2.3 Convex Separation Theorems


This section plays a special role in the book. It contains fundamental sep-
aration theorems for convex sets in vector, topological vector, and finite-
dimensional spaces. Besides being of their own importance, the obtained sep-
aration theorems are crucial for applications to the vast majority of other
significant results of convex analysis and its “convexified” extensions due to
the developed geometric approach. We also present in this section a related
study of extreme points of convex sets, which is based on convex separation.

2.3.1 Convex Separation in Vector Spaces

This subsection is devoted to separation theorems for convex sets in gen-


eral vector spaces without topologies. The main results here are given under
the standing assumption that all the spaces are real, while we will discuss
their extensions beyond real vector spaces in Remark 2.69 and the subsequent
results presented at the end of Subsection 2.3.2.

Fig. 2.5. An affine set

Next we proceed with the definition and properties of affine sets; see the
illustration in Figure 2.5. Given two elements a and b in a vector space X,
the line connecting them is
L[a, b] := {λa + (1 − λ)b | λ ∈ R}.
Note that if a = b, then L[a, b] reduces to a singleton.

Definition 2.38 A subset Ω of a vector space X is called affine if for any


a, b ∈ Ω we have L[a, b] ⊂ Ω.

It follows from the definition that the intersection of any collection of affine
sets is also affine. This leads us to the construction of the affine hull of a set,
which is illustrated by Figure 2.6.

Definition 2.39 Let X be a vector space. The affine hull of Ω ⊂ X is


aff(Ω) := ⋂{C | C is affine and Ω ⊂ C}.

An element x in X of the form


x = ∑_{i=1}^m λi ωi with ∑_{i=1}^m λi = 1, m ∈ N,

is called an affine combination of ω1 , . . . , ωm .

Fig. 2.6. The affine hull of a set

The proof of the next proposition is straightforward and thus is omitted.


Proposition 2.40 The following assertions hold:


(a) A set Ω in a vector space is affine if and only if Ω contains all the affine
combinations of its elements.
(b) Let Ω, Ω1 , and Ω2 be affine subsets of a vector space X. Then the sum
Ω1 +Ω2 and the scalar product λΩ for any scalar λ are also affine subsets
of this space.
(c) Let B : X → Y be an affine mapping between vector spaces X and Y . If
Ω is an affine subset of X and if Θ is an affine subset of Y , then the
image B(Ω) is an affine subset of Y and the inverse image B −1 (Θ) is an
affine subset of X.
(d) Given a subset Ω of a vector space X, its affine hull is the smallest affine
set containing Ω. In addition, we have the representation
aff(Ω) = {∑_{i=1}^m λi ωi | ∑_{i=1}^m λi = 1, ωi ∈ Ω, m ∈ N}.

(e) A set Ω in a vector space is a linear subspace if and only if Ω is an affine


set containing the origin.
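As a concrete illustration of assertion (d) (our numeric sketch, not part of the text, assuming NumPy): for Ω = {(1, 0), (0, 1)} in R², the affine hull is the line x₁ + x₂ = 1, swept out by the affine combinations λω₁ + (1 − λ)ω₂ with arbitrary λ ∈ R.

```python
import numpy as np

w1, w2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # Omega = {w1, w2}
for lam in (-2.0, 0.3, 1.7):                          # coefficients sum to 1
    x = lam * w1 + (1 - lam) * w2                     # an affine combination
    assert abs(x.sum() - 1.0) < 1e-12                 # lies on x_1 + x_2 = 1
```

Note that λ is allowed to leave [0, 1], which is exactly what distinguishes the affine hull from the convex hull.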

Now we consider further relationships between affine sets and linear sub-
spaces.

Lemma 2.41 Let X be a vector space. A nonempty subset Ω of X is affine


if and only if Ω − ω is a linear subspace of X for any ω ∈ Ω.

Proof. Suppose that a nonempty set Ω ⊂ X is affine. It follows from the last
assertion of Proposition 2.40 that the set Ω − ω is a linear subspace for any
ω ∈ Ω. Conversely, fix ω ∈ Ω and suppose that Ω − ω is a linear subspace,
which we denote by L. Then the set Ω = ω + L is obviously affine. 

The preceding lemma leads to the following notion.

Definition 2.42 An affine set Ω in a vector space X is said to be parallel


to a linear subspace L ⊂ X if Ω = ω + L for some ω ∈ Ω.

The next proposition justifies the form of the parallel subspace.

Proposition 2.43 Let Ω be a nonempty affine subset of a vector space X.


Then it is parallel to the unique linear subspace L of X defined by L := Ω −Ω.
Proof. Given an affine set Ω ≠ ∅, fix ω ∈ Ω and consider a linear subspace
L := Ω − ω parallel to Ω. To verify the uniqueness of such a linear subspace
L, take any ω1 , ω2 ∈ Ω and any subspaces L1 , L2 ⊂ X with Ω = ω1 + L1 =
ω2 + L2 . Then L1 = ω2 − ω1 + L2 . Since 0 ∈ L1 , we have ω1 − ω2 ∈ L2 . This
yields ω2 − ω1 ∈ L2 and thus L1 = ω2 − ω1 + L2 ⊂ L2 . In the same way we
show that L2 ⊂ L1 , which justifies that L1 = L2 .
It remains to verify the representation L = Ω − Ω. We have Ω = ω + L


with the unique linear subspace L and some ω ∈ Ω. Then L = Ω −ω ⊂ Ω −Ω.
Take now any x = u − ω with u, ω ∈ Ω and observe that Ω − ω is a linear
subspace parallel to Ω. Hence Ω −ω = L by the uniqueness of L proved above.
This ensures that x ∈ Ω − ω = L, and thus we arrive at Ω − Ω ⊂ L. 
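Proposition 2.43 can be visualized on a line in R² (our numeric sketch, not part of the text, assuming NumPy): for the affine set Ω = {(t, 1 − t) | t ∈ R}, every difference of two points of Ω lies in the subspace L = span{(1, −1)}, independently of the points chosen.

```python
import numpy as np

omega = lambda t: np.array([t, 1.0 - t])   # points of the affine set Omega
d1 = omega(3.0) - omega(-1.0)              # differences of points of Omega
d2 = omega(0.5) - omega(0.25)
# Both lie in L = span{(1, -1)}: their coordinates sum to zero
assert abs(d1.sum()) < 1e-12 and abs(d2.sum()) < 1e-12
```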

Next we define a key ingredient of the forthcoming separation theorems.


Definition 2.44 An affine subset Ω of a vector space X is called a hyper-
plane if it is of codimension one. This means that the codimension of the
unique linear subspace of X parallel to Ω is one.

The following proposition is closely related to the results dealing with


codimension one that are given in Propositions 1.115 and 1.116.

Proposition 2.45 A subset Ω of a vector space X is a hyperplane if and only


if there exist a nonzero linear function f : X → R and a number α ∈ R that
provide the representation
Ω = {x ∈ X | f(x) = α}. (2.11)

Proof. Let Ω be a hyperplane. Then there exist a vector w ∈ Ω and a linear


subspace L ⊂ X of codimension one such that
Ω = w + L.
Picking any x0 ∉ L and using Proposition 1.116 give us L ⊕ span{x0} = X.
It says that for any x ∈ X there exists a unique pair (m, λ) ∈ L × R with

x = m + λx0.

Using this representation, define f(x) := λ for each x ∈ X and α := f(w). It is
easy to see that f is a nonzero linear function on X ensuring representation
(2.11) with the selected number α ∈ R.
Conversely, suppose that representation (2.11) holds with some nonzero
linear function f : X → R and number α ∈ R. Choose any w ∈ Ω and let
L := ker(f ), which gives us Ω = w + L. Employing now Proposition 1.115,
we have codim(L) = 1, and so the set Ω is a hyperplane in X. 
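Representation (2.11) is easy to test in coordinates (our numeric sketch, not part of the text, assuming NumPy): on R³ take the linear function f(x) = ⟨a, x⟩ with a ≠ 0 and level α; translating a solution x₀ by vectors in ker f stays inside the hyperplane H.

```python
import numpy as np

a, alpha = np.array([1.0, -2.0, 3.0]), 4.0   # f(x) = <a, x>, level alpha
f = lambda x: a @ x

x0 = np.array([4.0, 0.0, 0.0])               # one point of H: f(x0) = alpha
assert f(x0) == alpha
# H - x0 = ker f has codimension one; check two directions lying in ker f
for d in (np.array([2.0, 1.0, 0.0]), np.array([-3.0, 0.0, 1.0])):
    assert f(d) == 0.0 and f(x0 + d) == alpha
```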

Now we define the major separation properties studied and applied in the
book. These definitions do not require any topology. Appropriate topological
structures will be imposed below when they are needed. The definition of the
simplest separation property is illustrated by Figure 2.7.

Definition 2.46 Let Ω1 and Ω2 be nonempty subsets of a vector space X.


We say that Ω1 and Ω2 can be separated by a hyperplane if there exists a
nonzero linear function f : X → R such that
sup{f(x) | x ∈ Ω1} ≤ inf{f(x) | x ∈ Ω2}. (2.12)
If it holds in addition that


inf{f(x) | x ∈ Ω1} < sup{f(x) | x ∈ Ω2}, (2.13)
which means that there exist vectors x1 ∈ Ω1 and x2 ∈ Ω2 with f (x1 ) < f (x2 ),
then we say that Ω1 and Ω2 can be properly separated by a hyperplane.
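In R² conditions (2.12) and (2.13) can be seen on the closed unit disk Ω1 and the singleton Ω2 = {(2, 0)} with f(x) = x₁: the supremum of f over the disk is 1 ≤ 2 = f(2, 0), while f(−1, 0) = −1 < 2, so the separation is proper. A quick numeric check (our sketch, not part of the text, assuming NumPy):

```python
import numpy as np

f = lambda x: x[0]                         # the separating linear function
theta = np.linspace(0.0, 2.0 * np.pi, 400)
disk = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # boundary of Omega_1

assert max(f(x) for x in disk) <= 1.0 + 1e-12  # sup of f over Omega_1 is 1
assert f(np.array([2.0, 0.0])) == 2.0          # value of f on Omega_2
assert f(np.array([-1.0, 0.0])) < 2.0          # strict inequality in (2.13)
```

Since f attains its supremum over the disk on the boundary, sampling the boundary circle suffices for this check.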

Before establishing major separation theorems, we present some additional


facts of their own interest.

Fig. 2.7. Convex separation

Lemma 2.47 Let Ω be a subset of a vector space X with core(Ω) ≠ ∅, and


let f : X → R be a nonzero linear function. Then the function f cannot be
constant on the entire set Ω.

Proof. Suppose on the contrary that


f(x) = c whenever x ∈ Ω

for some c ∈ R. Fix x0 ∈ core(Ω) and let Θ := Ω − x0. Then 0 ∈ core(Θ) and

f(x) = 0 for all x ∈ Θ.
Taking any v ∈ X and choosing t > 0 so small that tv ∈ Θ give us f (tv) =
tf (v) = 0, and thus f (v) = 0, which is a contradiction. 
Proposition 2.48 Let Ω be a nonempty convex set in a vector space X, and

let x0 ∉ Ω. Assume that core(Ω) ≠ ∅. Then Ω and {x0} can be separated by
a hyperplane if and only if they can be properly separated by a hyperplane.
Proof. It suffices to prove that if Ω and {x0 } can be separated, then they can
be properly separated. Choose a nonzero linear function f : X → R such that
f (x) ≤ f (x0 ) for all x ∈ Ω.
Let us show that there exists w ∈ Ω such that f (w) < f (x0 ). Suppose on
the contrary that this is not the case. Then f (x) = f (x0 ) for all x ∈ Ω.
Since core(Ω) ≠ ∅, by Lemma 2.47, the function f is the zero function. This
contradiction completes the proof of the proposition. 
Now we are ready to establish our first separation theorem in the vector
space setting when one of the sets is a singleton.
Theorem 2.49 Let Ω be a nonempty convex set in a vector space X, and let
x0 ∉ Ω. Assume furthermore that core(Ω) ≠ ∅. Then there exists a hyperplane
that separates Ω and {x0 }. If in addition Ω = core(Ω), then there exists a
nonzero linear function f : X → R such that
f (x) < f (x0 ) for all x ∈ Ω. (2.14)
Proof. We begin with the case where 0 ∈ core(Ω), and so Ω is an absorbing
set. Define the linear subspace Y := span{x0 } and the function h : Y → R by
h(αx0 ) = α as α ∈ R. Let us show that h is linear and satisfies the estimate
h(y) ≤ pΩ (y) for all y ∈ Y via the Minkowski gauge of Ω defined in (2.9).
To proceed, suppose that y = αx0 for some α ∈ R. If α ≤ 0, then h(y) =
α ≤ 0 ≤ pΩ (y). If α > 0, then we get
h(y) = α ≤ αpΩ (x0 ) = pΩ (αx0 ) = pΩ (y).
Since pΩ is subadditive and positively homogenous, the Hahn-Banach exten-
sion theorem (Theorem 1.125) allows us to find a linear function f : X → R
such that f (y) = h(y) for all y ∈ Y and f (x) ≤ pΩ (x) for all x ∈ X. The
function f is nonzero due to f (x0 ) = 1. This clearly ensures the estimates
f (x) ≤ pΩ (x) ≤ 1 = f (x0 ) for all x ∈ Ω, (2.15)
which justify the separation property (2.12).
Let us further examine the case where 0 ∉ core(Ω). Fix a ∈ core(Ω)
and consider the set Θ := Ω − a for which we have 0 ∈ core(Θ). Then Θ
and {x0 − a} can be separated by the above, and thus Ω and {x0 } can be
separated as well. Note finally that in the case where Ω = core(Ω) the last
inequality in (2.15) becomes strict by Theorem 2.26(b) and gives us (2.14). 
The next theorem provides a complete characterization of the equivalent
separation and proper separation properties for singletons from nonempty
convex sets in vector spaces.
Theorem 2.50 Let Ω be a convex subset of a vector space X with core(Ω) ≠ ∅,
and let x0 ∈ X. Then the following properties are equivalent:
(a) Ω and x0 can be separated by a hyperplane.
(b) Ω and x0 can be properly separated by a hyperplane.
(c) x0 ∉ core(Ω).

Proof. Invoking Proposition 2.48, it suffices to verify that (b) and (c) are
equivalent. Suppose first that x0 and Ω can be properly separated by a hyper-
plane. Let f : X → R be a nonzero linear function such that
f (x) ≤ f (x0 ) for all x ∈ Ω,
and let x ∈ Ω satisfy the condition
f (x) < f (x0 ).
Suppose on the contrary that x0 ∈ core(Ω). Then we choose t > 0 such that
x0 + t(x0 − x) ∈ Ω, and therefore

f(x0 + t(x0 − x)) ≤ f(x0).

This yields f(x0) ≤ f(x), a contradiction, which verifies that x0 ∉ core(Ω).
Let us now prove the converse implication. Observe that core(Ω) is a
nonempty convex subset of X by Proposition 2.19 with core(core(Ω)) =
core(Ω) ≠ ∅ by Corollary 2.22. Since x0 ∉ core(Ω), we deduce from The-
orem 2.49 that the sets {x0} and core(Ω) can be properly separated. It gives
us a nonzero linear function f : X → R such that

f(x) ≤ f(x0) for all x ∈ core(Ω)
and also a vector w ∈ core(Ω) ⊂ Ω with f (w) < f (x0 ). Fix further any
u ∈ Ω and observe by Proposition 2.21 that tw + (1 − t)u ∈ core(Ω) whenever
0 < t ≤ 1. It follows therefore that
 
tf (w) + (1 − t)f (u) = f tw + (1 − t)u ≤ f (x0 ).
Passing to the limit as t ↓ 0 brings us to f(u) ≤ f(x0), which shows that the
sets {x0} and Ω can be properly separated. 

Based on the above separation theorem, next we establish an enhanced


version of Proposition 2.21.

Proposition 2.51 Let Ω be a convex subset of a vector space X. If a ∈


core(Ω) and b ∈ lin(Ω), then [a, b) ⊂ core(Ω).

Proof. Fix λ ∈ (0, 1), define xλ := λa + (1 − λ)b, and then verify that xλ ∈
core(Ω). Suppose on the contrary that xλ ∉ core(Ω). Then the single-point
set {xλ } and the set Ω can be separated by Theorem 2.50, i.e., there exists a
nonzero linear function f : X → R such that
f (x) ≤ f (xλ ) = λf (a) + (1 − λ)f (b) for all x ∈ Ω. (2.16)
Since b ∈ lin(Ω), the definition of the linear closure from (2.6) shows that
there exists w ∈ Ω such that [w, b) ⊂ Ω. Thus for all n ∈ N we have
xn := b + (1/n)(w − b) ∈ Ω.
Then (2.16) tells us that

f(xn) ≤ f(xλ) ⟺ (1/n)f(w) − (1/n)f(b) + λf(b) ≤ λf(a).
Passing to the limit as n → ∞, we arrive at
f (b) ≤ f (a). (2.17)
Remembering that a ∈ core(Ω), we see that for all large m ∈ N we have

xm := a + (1/m)(a − b) ∈ Ω.
Then (2.16) also ensures that

f(xm) = f(a) + (1/m)f(a) − (1/m)f(b) ≤ λf(a) + (1 − λ)f(b).
Passing there to the limit as m → ∞ and taking into account that λ ∈ (0, 1)
bring us to the equivalence
(1 − λ)f (a) ≤ (1 − λ)f (b) ⇐⇒ f (a) ≤ f (b). (2.18)
Combining now (2.17) and (2.18), we arrive at the equality f (a) = f (b).
Finally, it follows from the inclusion a ∈ core(Ω) that for any v ∈ X there
exists t > 0 such that a + tv ∈ Ω, which implies by (2.16) and the equality
f(a) = f(b) that f(v) = 0 for all v ∈ X. The obtained contradiction verifies
that xλ ∈ core(Ω), and thus we complete the proof of the proposition. 

In the last part of this subsection we are going to show that the Hahn-
Banach extension theorem in the full generality of Theorem 1.125 can be
derived from the “extreme” version of Theorem 2.49 that provides the separa-
tion of a singleton x0 ∉ Ω from a convex set Ω in the case where Ω = core(Ω).
To proceed, we first present the following two simple lemmas.

Lemma 2.52 Let p : X → R be a sublinear function defined on a vector space


X, and let Ω := {x ∈ X | p(x) < 1}. Then Ω is convex and absorbing.
Furthermore, we have pΩ = p and Ω = core(Ω).
Proof. The convexity of the set Ω immediately follows from the definition.
Since core(Ω) ⊂ Ω, only the opposite inclusion needs to be verified. To
this end, fix any x0 ∈ Ω and pick an arbitrary vector v ∈ X. If p(v) ≤ 0, then
for any number 0 < λ < 1 we have

p(x0 + λv) ≤ p(x0) + λp(v) ≤ p(x0) < 1.

In the case where p(v) > 0, define δ := (1 − p(x0))/p(v). If 0 < λ < δ, then

p(x0 + λv) ≤ p(x0) + λp(v)
< p(x0) + ((1 − p(x0))/p(v)) p(v)
= p(x0) + 1 − p(x0) = 1,
which shows that x0 + λv ∈ Ω for all such λ. It tells us therefore that x0 ∈
core(Ω), and thus core(Ω) = Ω. Observing that 0 ∈ Ω = core(Ω), we also see
that the set Ω is absorbing.
To prove further that pΩ = p, fix x ∈ X and check first that pΩ (x) ≤ p(x).
Considering any λ > p(x), we have p(x/λ) < 1, which yields x/λ ∈ Ω and
so x ∈ λΩ. By the definition of the Minkowski function from (2.9) we get
pΩ (x) ≤ λ and hence pΩ (x) ≤ p(x). Taking now λ > 0 with x ∈ λΩ gives us
x = λw for some w ∈ Ω, and so p(w) < 1. Then p(x) = p(λw) = λp(w) < λ,
which ensures that p(x) ≤ pΩ (x). Since x was chosen arbitrarily, it verifies
that p = pΩ and thus completes the proof of the lemma. 

The second lemma is a direct observation.

Lemma 2.53 Let Ω be a convex subset of a vector space X with core(Ω) = Ω.


Then for any Q ⊂ X we have the equality
core(Ω + Q) = Ω + Q.

Proof. Observe by the definitions that

Ω + Q = ⋃_{q∈Q}(Ω + q) = ⋃_{q∈Q}(core(Ω) + q) = ⋃_{q∈Q} core(Ω + q) ⊂ core(Ω + Q).

Since the opposite inclusion is obvious, we complete the proof. 

Now we are ready to derive the Hahn-Banach theorem from the aforemen-
tioned extreme version of convex separation in vector spaces.
Theorem 2.54 (Hahn-Banach theorem from convex separation). Let X be a
vector space. Then the case of Theorem 2.49 where Ω = core(Ω) implies the
Hahn-Banach theorem formulated in Theorem 1.125.

Proof. Let Y be a linear subspace of X, let g : Y → R be a linear function,


and let p : X → R be a sublinear function with g(y) ≤ p(y) for all y ∈ Y .
If g = 0, then the zero function f = 0 obviously satisfies the requirements
of the Hahn-Banach theorem. Thus it suffices to consider the case where g is


nonzero. Then we find y0 ∈ Y with g(y0) = 1 and define the sets

Ω := {x ∈ X | p(x) < 1} and Λ := Ω + ker g.
It follows from Lemma 2.52 that the set Ω is convex with core(Ω) = Ω. Then
the set Λ is also convex, and we deduce from Lemma 2.53 that core(Λ) = Λ.
Observe further that y0 ∉ Λ. Indeed, supposing on the contrary that y0 ∈ Λ
tells us that y0 = ω + z, where p(ω) < 1 and g(z) = 0. Thus
g(y0 ) = g(ω + z) = g(ω) ≤ p(ω) < 1 = g(y0 ),
a contradiction. Employing now the extreme case of Theorem 2.49 gives us a
linear function h : X → R such that
h(x) < h(y0 ) for all x ∈ Λ. (2.19)
Since 0 ∈ Λ, we have 0 = h(0) < h(y0 ).
Define next the linear function f : X → R by
f(x) := (1/h(y0)) h(x), x ∈ X,
and verify that f is an extension of g with f (x) ≤ p(x) on X as stated
in Theorem 1.125. To proceed, observe that f (y0 ) = 1 and show first that
g(z) = 0 yields f(z) = 0 for any z ∈ ker g. Suppose on the contrary
that f(z) ≠ 0, i.e., h(z) = h(y0)f(z) ≠ 0. It tells us that

h((h(y0)/h(z))z) = h(y0),

which contradicts (2.19) by (h(y0)/h(z))z ∈ ker g ⊂ Λ, and so proves that f(z) = 0.
To proceed further, we can easily see that Y = ker g ⊕ span{y0}, and thus for
any y ∈ Y we find some z ∈ ker g and λ ∈ R such that y = z + λy0. Since f(z) = 0
and f (y0 ) = 1, we obtain the equalities
f (y) = f (z + λy0 ) = f (z) + λf (y0 ) = λ = g(y),

which ensure that f |Y = g. Finally, fix any λ ≥ 0 with x ∈ λΩ and take
ω ∈ Ω such that x = λω. It follows from the definition that
f(x) = f(λω) = λf(ω) = (λ/h(y0)) h(ω).
Since ω ∈ Ω ⊂ Λ, we get by (2.19) that h(ω) < h(y0). Using the latter
together with λ/h(y0) ≥ 0 brings us to

f(x) = (λ/h(y0)) h(ω) ≤ (λ/h(y0)) h(y0) = λ.
By the Minkowski gauge definition (2.9) we have f (x) ≤ pΩ (x) = p(x), where
the last equality follows from Lemma 2.52. This completes the proof. 
2.3.2 Convex Separation in Topological Vector Spaces

This subsection continues the study of convex separation, but now in the
setting of vector spaces endowed with topological structures. In preparation
to establish major separation theorems for convex sets in (real) topological
vector spaces, we first present a simple lemma.

Lemma 2.55 Any nonempty balanced subset of R is a symmetric interval.

Proof. Let Ω be a nonempty balanced subset of R. Then λΩ ⊂ Ω whenever
|λ| ≤ 1. For any x ∈ Ω the interval [−x, x] is a subset of Ω. This implies that

Ω = ⋃_{x∈Ω} [−x, x].

Proposition 1.69 tells us that Ω is a connected subset of R, and so it is an


interval. The symmetry of any balanced set ensures the symmetry of Ω. 
The next proposition tells us that a hyperplane in a topological vector
space is closed if and only if the linear function defining the hyperplane from
Proposition 2.45 is continuous.

Proposition 2.56 Let X be a topological vector space, and let f : X → R be


a nonzero linear function. Then the hyperplane
H := {x ∈ X | f(x) = α}
is closed if and only if the function f is continuous.

Proof. It suffices to show that if H is closed, then f is continuous. Note that
H ≠ X in this case, and so we can choose x0 ∉ H. Consider a balanced
neighborhood V of the origin such that (x0 + V) ∩ H = ∅. Then f(x) ≠ β :=
α − f(x0) for all x ∈ V. Observe that f(V) is a balanced subset of R
that does not contain β. Invoking Lemma 2.55 tells us that

f(V) ⊂ [−γ, γ] with γ := |β|.
Then f (εV ) ⊂ [−γε, γε] for all ε > 0, which implies the continuity of f at the
origin. The linearity of f ensures its continuity on the entire space X. 

Based on Proposition 2.56, we say that two nonempty convex sets Ω1 and Ω2
in a topological vector space X can be separated by a closed hyperplane if
there exists a nonzero continuous linear function f : X → R such that
sup{f(x) | x ∈ Ω1} ≤ inf{f(x) | x ∈ Ω2}.

The proper separation by a closed hyperplane is defined in a similar way.
Now we are ready to derive a topological counterpart of Theorem 2.49.
Theorem 2.57 Let Ω be a nonempty convex set in a topological vector space
X, and let x0 ∉ Ω. Suppose that int(Ω) ≠ ∅. Then there exists a closed
hyperplane separating Ω and {x0} properly. If in addition the set Ω is assumed
to be open, then there exists f ∈ X ∗ such that

f(x) < f(x0) for all x ∈ Ω. (2.20)

Proof. It follows from Proposition 2.48 and Theorem 2.49 that there exists a
nonzero linear function f : X → R for which we have
sup{f(x) | x ∈ Ω} ≤ f(x0) and inf{f(x) | x ∈ Ω} < f(x0). (2.21)
It suffices to show that f is continuous. Indeed, it is obvious that f (x) ≤ α :=
f (x0 ) for all x ∈ Ω. Choose a ∈ Ω and a balanced neighborhood V of the
origin such that a + V ⊂ Ω. Then f (x) ≤ γ := α − f (a) for all x ∈ V . Since
0 ∈ V and V is balanced, we get that γ ≥ 0 and
f (V ) ⊂ [−γ, γ],
which shows that f (εV ) ⊂ [−εγ, εγ] for all ε > 0 and hence implies the
continuity of f at the origin. The linearity of f yields its continuity on X.
Assuming finally that Ω is open, we get
Ω = int(Ω) = core(Ω),
and thus (2.20) follows from (2.14) in Theorem 2.49. □

The next theorem concerns the proper separation of two convex sets in
topological vector spaces under an interiority condition.

Theorem 2.58 Let X be a topological vector space, and let Ω1 and Ω2 be
nonempty convex subsets of X. Suppose that Ω1 ∩ Ω2 = ∅ and that int(Ω1) ≠ ∅.
Then the sets Ω1 and Ω2 can be properly separated by a closed hyperplane. If
in addition Ω1 is open, then there exist f ∈ X∗ and β ∈ R such that

f(x) < β ≤ f(y) whenever x ∈ Ω1, y ∈ Ω2.    (2.22)

Proof. Define the set Θ := Ω1 − Ω2, which is convex with 0 ∉ Θ. Since
int(Θ) ≠ ∅, we can separate Θ and {0} properly by Theorem 2.57. Thus the
sets Ω1 and Ω2 can be properly separated as well.
To verify the final statement, observe that Θ is open if Ω1 has this property.
By the last assertion of Theorem 2.57 there exists f ∈ X∗ such that

f(x − y) < 0 whenever x ∈ Ω1, y ∈ Ω2.

Then f(x) < f(y) for x ∈ Ω1, y ∈ Ω2. Note that I1 := f(Ω1) and I2 := f(Ω2)
are convex sets in R, so they are intervals. Since f ∈ X∗ is nonzero and
Ω1 is open, I1 is an open interval of the form (α, β) or (−∞, β) (see Exercise
1.170), which verifies (2.22). □
2.3 Convex Separation Theorems 97

The following statement is a direct consequence of the preceding result.

Corollary 2.59 Let Ω1 be a convex set with int(Ω1) ≠ ∅, and let Ω2 be a
nonempty convex set in a topological vector space X. If int(Ω1) ∩ Ω2 = ∅,
then Ω1 and Ω2 can be properly separated.

Proof. Since int(Ω1 ) is a nonempty open convex set and Ω2 is a nonempty


convex set in X, Theorem 2.58 ensures the existence of f ∈ X ∗ and β ∈ R
satisfying the inequalities
f (x) < β ≤ f (y) whenever x ∈ int(Ω1 ), y ∈ Ω2 .
Fix x ∈ int(Ω1) and take any u ∈ Ω1 and y ∈ Ω2. Using Lemma 2.12, for any
t ∈ (0, 1) we have u + t(x − u) ∈ int(Ω1) and thus

f(u + t(x − u)) = f(u) + t(f(x) − f(u)) < β ≤ f(y).

Letting now t ↓ 0 gives us f(u) ≤ β ≤ f(y). Remembering that f(x) < f(y)
verifies that Ω1 and Ω2 can be properly separated. □

Next we define the notion of strict separation and establish the correspond-
ing version of convex separation theorems in LCTV spaces.

Definition 2.60 Let Ω1 and Ω2 be nonempty subsets of a vector space X.


We say that the sets Ω1 and Ω2 can be strictly separated by a hyperplane
if there exists a linear function f : X → R such that

sup{f(x) | x ∈ Ω1} < inf{f(x) | x ∈ Ω2}.    (2.23)

Note that the following theorem does not require a nonempty interior
assumption on either of the sets in question, replacing it with a compactness
assumption imposed on one of the sets.

Theorem 2.61 Let X be an LCTV space, and let Ω1 and Ω2 be nonempty


convex subsets of X. Assume in addition that Ω1 is compact, that Ω2 is closed,
and that the condition Ω1 ∩ Ω2 = ∅ is satisfied. Then the sets Ω1 and Ω2 can
be strictly separated by a closed hyperplane.

Proof. Denote Ω := Ω2 − Ω1, which is a nonempty closed convex set with
0 ∉ Ω. Then there exists a convex neighborhood V of the origin satisfying

V ∩ Ω = ∅.

Theorem 2.58 ensures the existence of a nonzero function f ∈ X∗ such that

sup{f(x) | x ∈ V} ≤ inf{f(x) | x ∈ Ω}.

Choose x0 ∈ X with f(x0) > 0 and t > 0 so small that tx0 ∈ V. Letting
γ := f(tx0) > 0 gives us the inequalities

γ ≤ sup{f(x) | x ∈ V} ≤ inf{f(x) | x ∈ Ω}.

It follows furthermore that

sup{f(x) | x ∈ Ω1} < sup{f(x) | x ∈ Ω1} + γ ≤ inf{f(x) | x ∈ Ω2}.

The latter tells us that the sets Ω1 and Ω2 can be strictly separated. □
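The mechanism of Theorem 2.61 can be made concrete in R². The sketch below (an illustration added here, not taken from the text) finds a nearest pair of points between a compact convex set and a disjoint closed convex set by alternating projections; the difference vector then defines a continuous linear function that separates the sets strictly. The particular box and half-space are arbitrary choices.

```python
import numpy as np

# Strict separation in R^2, in the spirit of Theorem 2.61: Ω1 = [-1, 1]^2
# (compact convex) and Ω2 = {x : x1 >= 3} (closed convex) are disjoint.
# Alternating projections find nearest points p ∈ Ω1 and q ∈ Ω2, and
# v := q - p defines f(x) = <v, x> with sup f(Ω1) < inf f(Ω2).

def proj_box(p, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^2."""
    return np.clip(p, lo, hi)

def proj_halfspace(p, c=3.0):
    """Euclidean projection onto the half-space {x : x1 >= c}."""
    q = p.copy()
    q[0] = max(q[0], c)
    return q

p = np.zeros(2)
for _ in range(100):          # alternating projections converge here
    q = proj_halfspace(p)     # nearest point of Ω2 to p
    p = proj_box(q)           # nearest point of Ω1 to q

v = q - p                     # separating direction
sup_omega1 = v @ p            # sup of f over Ω1, attained at p
inf_omega2 = v @ q            # inf of f over Ω2, attained at q
assert sup_omega1 < inf_omega2
print(v, sup_omega1, inf_omega2)
```

The projection inequalities ⟨v, x − p⟩ ≤ 0 on Ω1 and ⟨v, y − q⟩ ≥ 0 on Ω2 guarantee that v @ p and v @ q really are the supremum over Ω1 and the infimum over Ω2, so the assertion is exactly the strict separation (2.23).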

We have an interesting consequence of the above separation results for the


weak topology on topological vector spaces.

Corollary 2.62 Let X be an LCTV space. Then X with the weak topology
σ(X, X ∗ ) is an LCTV space.

Proof. Pick x1, x2 ∈ X with x1 ≠ x2. Employing Theorem 2.61 applied to the
convex sets Ω1 := {x1} and Ω2 := {x2} allows us to find f ∈ X∗ and γ ∈ R
such that

f(x1) < γ < f(x2).

Consider now the convex sets Θ1 := {x ∈ X | f(x) < γ} and Θ2 := {x ∈ X |
f(x) > γ} taken from Example 1.100. These sets are disjoint and open with
respect to the weak topology on X. We obviously have xi ∈ Θi for i = 1, 2,
and thus the weak topology σ(X, X∗) on the space X is Hausdorff.
Next we show that the addition operation in (X, σ(X, X ∗ )) is continuous.
Fix any weakly open set W containing the origin. Then there exist elements
f1 , . . . , fm ∈ X ∗ and a number ε > 0 such that
V (f1 , . . . , fm ; ε) ⊂ W.
Letting U := V (f1 , . . . , fm ; ε/2), it is not hard to check that U is a weakly open
set containing the origin, and that U + U ⊂ W . This verifies the continuity
of the addition operation in (X, σ(X, X ∗ )). The proof of the continuity of
the scalar multiplication in (X, σ(X, X∗)) is left as an exercise for the reader.
It follows from Corollary 1.102 that the space X endowed with the weak
topology σ(X, X∗) is an LCTV space. □

The next consequence of Theorem 2.61 is a very useful classical result.

Corollary 2.63 Any closed convex set in an LCTV space is weakly closed.

Proof. Let Ω be a nonempty, closed, convex set in an LCTV space X, and
let x0 ∉ Ω. Theorem 2.61 ensures the existence of f ∈ X∗ and γ ∈ R such that

f(x) < γ < f(x0) for all x ∈ Ω.

Then the set V := f⁻¹((γ, ∞)) is a weakly open set with x0 ∈ V and Ω ∩ V = ∅.
Thus the given set Ω is weakly closed in X. □

Corollary 2.64 Let X be an LCTV space, and let x ∈ X. If f (x) = 0 for all
f ∈ X ∗ , then x = 0. In particular, X ∗ is nonzero whenever X is nonzero.
Proof. Suppose on the contrary that x ≠ 0. Employing Theorem 2.61 applied
to the convex sets Ω1 := {x} and Ω2 := {0} allows us to find f ∈ X∗ and
γ ∈ R such that

f(x) < γ < 0.

This contradiction justifies the first claim.
Now suppose that X is nonzero and pick an element x ≠ 0 in X. By the
first claim, there exists f ∈ X∗ with f(x) ≠ 0. Thus f is a nonzero element
in X∗, i.e., X∗ ≠ {0}. □

Remark 2.65 If we consider nonzero LCTV spaces, then their topological


duals are nonzero as well. Thus the nonzero standing assumption stated at
the beginning of Section 1.2 is satisfied.

The strict separation result of Corollary 2.64 and the next lemma allow
us to establish a major result on the duality relationship between the given
topology on an LCTV space X and the weak∗ topology on X ∗ .
Recall that the kernel of f : X → F is

ker(f) := {x ∈ X | f(x) = 0}.
Lemma 2.66 Let X be a vector space over a field F, and let fi : X → F for
i = 1, . . . , m and f : X → F be linear functions satisfying the condition

⋂_{i=1}^{m} ker(fi) ⊂ ker(f).

Then there exist numbers λi ∈ F for i = 1, . . . , m such that

f = Σ_{i=1}^{m} λi fi.

Proof. Since this statement is obvious if f = 0, assume that f ≠ 0. Arguing
by induction, consider first the case where n = 1 and suppose that ker(f1) ⊂
ker(f). Then f1 ≠ 0, i.e., f1(x0) ≠ 0 for some x0 ∈ X. We obviously have

y := x − (f1(x)/f1(x0)) x0 ∈ ker(f1) whenever x ∈ X,

which implies that y ∈ ker(f). Then f(y) = 0, and hence

f(x) = α f1(x) with α := f(x0)/f1(x0).

Suppose further that the conclusion holds for some positive integer n with the
inductive hypothesis

⋂_{i=1}^{n+1} ker(fi) ⊂ ker(f).

Denote Y := ker(f_{n+1}), which is a subspace of X, and let g : Y → F and
gi : Y → F be the restrictions of f and fi, i = 1, . . . , n, to Y, respectively. It
is easy to check that

⋂_{i=1}^{n} ker(gi) = (⋂_{i=1}^{n} ker(fi)) ∩ Y ⊂ ker(f) ∩ Y = ker(g).

The induction assumption gives us scalars λi for i = 1, . . . , n such that

g(x) = Σ_{i=1}^{n} λi gi(x) for all x ∈ Y.

Letting h(x) := f(x) − Σ_{i=1}^{n} λi fi(x), we get h(x) = 0 whenever x ∈ Y =
ker(f_{n+1}), and so ker(f_{n+1}) ⊂ ker(h). It follows from the base case that

h = λ_{n+1} f_{n+1} for some scalar λ_{n+1}.

Thus we arrive at f = Σ_{i=1}^{n+1} λi fi, which verifies the claim. □
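Lemma 2.66 can be illustrated numerically for X = R⁴ and F = R (an added sketch, not part of the text): a functional f constructed so as to vanish on ker(f1) ∩ ker(f2) is recovered as a combination of f1 and f2 by a least-squares solve. The specific functionals below are arbitrary choices.

```python
import numpy as np

# Lemma 2.66 in X = R^4 over F = R: if f vanishes on ker(f1) ∩ ker(f2),
# then f = λ1 f1 + λ2 f2. Here f is built as such a combination, and the
# coefficients are recovered from f1, f2 alone.
f1 = np.array([1.0, 0.0, 2.0, -1.0])
f2 = np.array([0.0, 1.0, 1.0, 3.0])
f = 2.0 * f1 - 3.0 * f2            # vanishes wherever f1 and f2 both do

A = np.vstack([f1, f2])

# Basis of ker(f1) ∩ ker(f2), i.e., the null space of A, via the SVD.
_, s, vt = np.linalg.svd(A)
null_basis = vt[len(s):]           # rows spanning the null space
assert np.allclose(null_basis @ f, 0.0)   # f vanishes on the intersection

# Recover λ from the (consistent) system A^T λ = f by least squares.
lam, *_ = np.linalg.lstsq(A.T, f, rcond=None)
assert np.allclose(lam, [2.0, -3.0])
print(lam)
```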
Here is the aforementioned duality theorem.

Theorem 2.67 Let X be an LCTV space. Then we have


(X ∗ , τw∗ )∗ = X.

Proof. To verify first that X ⊂ (X∗, τw∗)∗, fix any x ∈ X and observe that
the linear function x̂ : X∗ → R defined by x̂(x∗) := ⟨x∗, x⟩ for x∗ ∈ X∗ is
continuous with respect to the weak∗ topology. This shows that x ∈ (X∗, τw∗)∗.
The proof of the opposite inclusion is more involved. Pick any element f ∈
(X∗, τw∗)∗, which means that f : X∗ → R is a linear function continuous with
respect to the weak∗ topology on X∗. Then the set V := {x∗ ∈ X∗ | |f(x∗)| <
1} is a neighborhood of the origin of X∗ with respect to the latter topology.
The definition of the weak∗ topology allows us to find x1, . . . , xm ∈ X and
ε > 0 for which we have the inclusion

⋂_{i=1}^{m} V(xi; ε) ⊂ V.

This implies that ⋂_{i=1}^{m} ker(fi) ⊂ ker(f), where fi : X∗ → R is defined by
fi(x∗) := ⟨x∗, xi⟩ for x∗ ∈ X∗. To verify the latter, fix any x∗ ∈ ⋂_{i=1}^{m} ker(fi)
and get γx∗ ∈ V for all γ ∈ R; so f(x∗) = 0. Employing now Lemma 2.66 ensures
the existence of λ1, . . . , λm ∈ R such that f = Σ_{i=1}^{m} λi fi, which yields

f(x∗) = Σ_{i=1}^{m} λi fi(x∗) = Σ_{i=1}^{m} λi ⟨x∗, xi⟩ = ⟨x∗, Σ_{i=1}^{m} λi xi⟩.

Denoting x̃ := Σ_{i=1}^{m} λi xi, we arrive at the representation f(x∗) = ⟨x∗, x̃⟩ for
all x∗ ∈ X∗. Employing finally Corollary 2.64 tells us that the function f can
be identified with x̃ ∈ X, which completes the proof of the theorem. □

Yet another consequence of Theorem 2.61 deals with the weak∗ topology on
the spaces that are dual to topological vector spaces.

Corollary 2.68 Let X be an LCTV space, let Ω be a nonempty closed convex


subset in X ∗ equipped with the weak∗ topology, and let u∗ ∈ X ∗ \ Ω. Then
there exists a vector x0 ∈ X such that
sup_{x∗ ∈ Ω} ⟨x∗, x0⟩ < ⟨u∗, x0⟩.

Proof. We apply the strict separation result of Theorem 2.61 on X∗ equipped
with the weak∗ topology. It ensures the existence of f ∈ (X∗, τw∗)∗ such that

sup_{x∗ ∈ Ω} f(x∗) < f(u∗).

Taking into account that f can be identified with an element x0 ∈ X via
the representation f(x∗) = ⟨x∗, x0⟩ for all x∗ ∈ X∗, we arrive at the claimed
conclusion of this corollary. □

Remark 2.69 Recall that the separation and related results presented in this
and the preceding subsections concern real vector and topological vector spaces.
On the other hand, we may consider their counterparts over the field of
complex numbers F = C by taking into account that the definition of convexity,
which is given via real numbers λ ∈ (0, 1), remains meaningful in such spaces
since R ⊂ C. However, since the formulation of separation theorems requires
an ordering of numbers, we need to use the real part “Re” of the complex-valued
separating functions f : X → C.

Having Remark 2.69 in mind, we conclude this subsection by presenting
in Theorem 2.71 complex counterparts of the two major separation theorems.
Prior to this, we need the following lemma, whose detailed proof is left to the
reader; see Exercise 2.205.

Lemma 2.70 Let X be a complex topological vector space, and let f : X →
C be a linear function. Then f ∈ X∗ if and only if Re(f) is continuous.
Furthermore, for any continuous real-valued linear function g : X → R there
exists a unique function f ∈ X∗ such that g is the real part of f.

Here is the aforementioned complex separation result.


Theorem 2.71 Let Ω1 and Ω2 be disjoint nonempty convex sets in a complex
topological vector space X. The following assertions hold:

(a) If Ω1 is open, then there exist f ∈ X ∗ and α ∈ R such that


Re f (x) < α ≤ Re f (y) for all x ∈ Ω1 and y ∈ Ω2 .

(b) If the space X is locally convex and its subsets Ω1 and Ω2 are compact
and closed, respectively, then there exist f ∈ X ∗ and α, β ∈ R such that
Re f (x) < α < β < Re f (y) for all x ∈ Ω1 and y ∈ Ω2 .

Proof. If X is a real topological vector space, the first statement follows


directly from Theorems 2.58 and 2.61. Observe further that in the general
case where X is a complex topological vector space, we can also treat it as a
real topological vector space in which Ω1 and Ω2 are convex sets satisfying all
the properties assumed in the theorem; see Remark 2.69. Thus there exist a
continuous real-valued linear function g : X → R and a number α ∈ R such that

g(x) < α ≤ g(y) whenever x ∈ Ω1 and y ∈ Ω2.

This yields by Lemma 2.70 the existence of a unique complex-valued linear
function f on X for which Re(f) = g. It is easy to see that f ∈ X∗ enjoys the
claimed property in (a). The proof of (b) is similar. □

2.3.3 Convex Separation in Finite Dimensions

In this subsection we concentrate on convex separation results in finite-
dimensional spaces, which have some specific features in comparison with
infinite-dimensional settings. By the dimension of an affine set ∅ ≠ Ω ⊂ Rn we
understand the dimension of the linear subspace parallel to Ω, while the
dimension of a convex set Ω in Rn is the dimension of its affine hull aff(Ω).
The following relaxed notion of interior plays a fundamental role in finite-
dimensional convex analysis (Figure 2.8).
Definition 2.72 Let x ∈ Ω ⊂ Rn . We say that x ∈ ri(Ω), i.e., it belongs to
the relative interior of Ω, if there exists γ > 0 such that
B(x; γ) ∩ aff(Ω) ⊂ Ω.
The next simple proposition is useful in what follows and serves as an
example for better understanding of the relative interior.
Proposition 2.73 Let Ω be a nonempty convex set. Suppose that x ∈ ri(Ω)
and that y ∈ Ω. Then there exists t > 0 for which
x + t(x − y) ∈ Ω.
Proof. Choose a number γ > 0 with
B(x; γ) ∩ aff(Ω) ⊂ Ω
and note that x + t(x − y) = (1 + t)x + (−t)y ∈ aff(Ω) for all t ∈ R as an affine
combination of x and y. Select t > 0 so small that x + t(x − y) ∈ B(x; γ).
Then we have x + t(x − y) ∈ B(x; γ) ∩ aff(Ω) ⊂ Ω. □
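A standard example behind Definition 2.72 (added here as an illustration): the segment Ω = [0, 1] × {0} in R² has empty interior, yet its relative interior (0, 1) × {0} is nonempty, because only the trace of a ball on aff(Ω) = R × {0} must fit inside Ω. A small numerical check of the definition on sampled points:

```python
import numpy as np

# Relative interior of the segment Ω = [0, 1] × {0} ⊂ R^2 per Definition 2.72.
def in_omega(p):
    """Membership in Ω = [0, 1] × {0}."""
    return 0.0 <= p[0] <= 1.0 and p[1] == 0.0

def ri_witness(x, gamma, samples=1001):
    """Check B(x; gamma) ∩ aff(Ω) ⊂ Ω on sampled points of aff(Ω) = R × {0}."""
    ts = np.linspace(x[0] - gamma, x[0] + gamma, samples)
    return all(in_omega((t, 0.0)) for t in ts)

assert ri_witness((0.5, 0.0), 0.4)        # (0.5, 0) ∈ ri(Ω)
assert not ri_witness((0.0, 0.0), 0.1)    # the endpoint (0, 0) is not in ri(Ω)
```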

Fig. 2.8. Relative interior

To proceed further, we need yet another definition.

Definition 2.74 Vectors v0, . . . , vm in Rn, m ≥ 1, are said to be affinely
independent if we have the implication

(Σ_{i=0}^{m} λi vi = 0 and Σ_{i=0}^{m} λi = 0) ⟹ (λi = 0 for all i = 0, . . . , m).

Accordingly, these vectors are affinely dependent if there exist λi ∈ R for
i = 0, . . . , m, not all zeros, with Σ_{i=0}^{m} λi = 0 such that Σ_{i=0}^{m} λi vi = 0.

It is easy to observe the following relationship.

Proposition 2.75 Vectors v0 , . . . , vm in Rn are affinely independent if and


only if the shifted vectors v1 − v0 , . . . , vm − v0 are linearly independent in Rn .

Proof. Let v0, . . . , vm be affinely independent. Consider the system

Σ_{i=1}^{m} λi (vi − v0) = 0, i.e., λ0 v0 + Σ_{i=1}^{m} λi vi = 0,

where λ0 := −Σ_{i=1}^{m} λi. Since Σ_{i=0}^{m} λi = 0, we get λi = 0 for all i =
1, . . . , m. Thus v1 − v0, . . . , vm − v0 are linearly independent. The proof of the
converse statement is also straightforward. □
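Proposition 2.75 yields a practical test (an illustrative sketch, not from the text): affine independence of v0, . . . , vm is equivalent to the matrix of shifted vectors vi − v0 having full row rank.

```python
import numpy as np

# Affine independence via the rank of the shifted vectors (Proposition 2.75).
def affinely_independent(vectors):
    v = np.asarray(vectors, dtype=float)
    shifted = v[1:] - v[0]               # rows v_i - v_0, i = 1, ..., m
    return np.linalg.matrix_rank(shifted) == len(shifted)

# Vertices of a triangle in R^2: affinely independent.
assert affinely_independent([[0, 0], [1, 0], [0, 1]])
# Three collinear points: affinely dependent.
assert not affinely_independent([[0, 0], [1, 1], [2, 2]])
```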

Recall that the span of a set Ω, denoted span Ω, is the linear subspace generated
by Ω. The following two propositions are simple but useful in what follows.

Proposition 2.76 Let Ω := aff{v0, . . . , vm}, where vi ∈ Rn for all i =
0, . . . , m. Then the span of the set {v1 − v0, . . . , vm − v0} is the linear subspace
parallel to Ω.

Proof. Denote by L the linear subspace parallel to Ω. Then Ω − v0 = L, and
therefore vi − v0 ∈ L for all i = 1, . . . , m. This gives us

span{vi − v0 | i = 1, . . . , m} ⊂ L.

To verify the reverse inclusion, fix any v ∈ L and get v + v0 ∈ Ω. Thus

v + v0 = Σ_{i=0}^{m} λi vi with Σ_{i=0}^{m} λi = 1.

This yields the relationship

v = Σ_{i=1}^{m} λi (vi − v0) ∈ span{vi − v0 | i = 1, . . . , m},

which justifies the reverse inclusion and hence completes the proof. □

Proposition 2.77 The elements v0 , . . . , vm are affinely independent in Rn if


and only if their affine hull Ω = aff{v0 , . . . , vm } is m-dimensional.

Proof. Let v0 , . . . , vm be affinely independent. Then Proposition 2.76 tells us


that the linear subspace L := span{vi − v0 | i = 1, . . . , m} is parallel to Ω.
The linear independence of v1 − v0 , . . . , vm − v0 means by Proposition 2.75
that the linear subspace L is m-dimensional and so is Ω. The proof of the
converse statement is straightforward as well. □

Affinely independent systems lead us to the construction of simplices.

Definition 2.78 Let v0, . . . , vm be affinely independent in Rn. Then the set

Δm := co{vi | i = 0, . . . , m}

is called an m-simplex in Rn with the vertices vi, i = 0, . . . , m.

An important role of simplices in finite-dimensional geometry and convex


analysis is revealed by the following proposition.

Proposition 2.79 Consider an m-simplex Δm with vertices vi for i =
0, . . . , m. Then for every v ∈ Δm there is a unique element (λ0, . . . , λm) ∈
R_+^{m+1} satisfying the conditions

v = Σ_{i=0}^{m} λi vi and Σ_{i=0}^{m} λi = 1.

Proof. Let (λ0, . . . , λm) ∈ R_+^{m+1} and (μ0, . . . , μm) ∈ R_+^{m+1} be such that

v = Σ_{i=0}^{m} λi vi = Σ_{i=0}^{m} μi vi and Σ_{i=0}^{m} λi = Σ_{i=0}^{m} μi = 1.

This immediately implies the equalities

Σ_{i=0}^{m} (λi − μi) vi = 0 with Σ_{i=0}^{m} (λi − μi) = 0.

Since the vectors v0, . . . , vm are affinely independent, we have the equalities
λi = μi for all i = 0, . . . , m and thus complete the proof. □
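Computationally, the unique coefficients in Proposition 2.79 (the barycentric coordinates) solve a square linear system that is invertible precisely because the vertices are affinely independent. A sketch for a 2-simplex in R², with data chosen only for illustration:

```python
import numpy as np

# Barycentric coordinates of a point of a 2-simplex (Proposition 2.79):
# solve Σ λi vi = v together with Σ λi = 1.
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # rows: vertices v0, v1, v2
v = np.array([0.25, 0.5])                             # a point of the simplex

A = np.vstack([V.T, np.ones(3)])   # coordinate equations plus Σ λi = 1
b = np.append(v, 1.0)
lam = np.linalg.solve(A, b)        # unique solution by affine independence

assert np.all(lam >= 0.0) and np.isclose(lam.sum(), 1.0)
assert np.allclose(lam @ V, v)
print(lam)
```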

We begin the study of relative interiors with the following lemma.

Lemma 2.80 Any linear mapping A : Rn → Rp is continuous.

Proof. Let {e1, . . . , en} be the standard orthonormal basis of Rn, and let
vi := A(ei) for i = 1, . . . , n. For any x = (x1, . . . , xn) ∈ Rn we have

A(x) = A(Σ_{i=1}^{n} xi ei) = Σ_{i=1}^{n} xi A(ei) = Σ_{i=1}^{n} xi vi.

Then the triangle inequality and the Cauchy-Schwarz inequality give us

‖A(x)‖ ≤ Σ_{i=1}^{n} |xi| ‖vi‖ ≤ √(Σ_{i=1}^{n} |xi|²) · √(Σ_{i=1}^{n} ‖vi‖²) = M ‖x‖,

where M := √(Σ_{i=1}^{n} ‖vi‖²). It follows furthermore that

‖A(x) − A(y)‖ = ‖A(x − y)‖ ≤ M ‖x − y‖ for all x, y ∈ Rn,

which verifies the continuity of the mapping A. □
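The Lipschitz constant M produced in the proof of Lemma 2.80 is exactly the Frobenius norm of the matrix of A. A quick numerical check on random points (an added illustration; the matrix is an arbitrary choice):

```python
import numpy as np

# The bound ||A(x) - A(y)|| <= M ||x - y|| with M = sqrt(Σ ||A(e_i)||^2),
# i.e., M equal to the Frobenius norm, as in the proof of Lemma 2.80.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))        # a linear mapping R^4 -> R^3
M = np.linalg.norm(A, "fro")           # sqrt of Σ_i ||A e_i||^2

for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    assert np.linalg.norm(A @ x - A @ y) <= M * np.linalg.norm(x - y) + 1e-12
```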

The next theorem plays an essential role in what follows.

Theorem 2.81 Let Δm be an m-simplex in Rn for some m ≥ 1. Then we
have ri(Δm) ≠ ∅.

Proof. Consider the vertices v0, . . . , vm of the simplex Δm and denote

v := (1/(m+1)) Σ_{i=0}^{m} vi.

We prove the theorem by showing that v ∈ ri(Δm). Define

L := span{vi − v0 | i = 1, . . . , m}

and observe that L is the m-dimensional linear subspace of Rn parallel to
aff(Δm) = aff{v0, . . . , vm}. It is easy to see that for every x ∈ L there is a
unique collection (λ0, . . . , λm) ∈ R^{m+1} with

x = Σ_{i=0}^{m} λi vi and Σ_{i=0}^{m} λi = 0.

Form the mapping A : L → R^{m+1}, which sends each x ∈ L to the correspond-
ing coefficients (λ0, . . . , λm) ∈ R^{m+1} as above. Then A is linear, and so it is
continuous by Lemma 2.80. Since A(0) = 0, we choose δ > 0 such that

‖A(u)‖ < 1/(m+1) whenever ‖u‖ ≤ δ.

Let us now show that (v + δB) ∩ aff(Δm) ⊂ Δm, which means that v ∈ ri(Δm).
To proceed, fix any x ∈ (v + δB) ∩ aff(Δm) and get that x = v + u for some
u ∈ δB. Since v, x ∈ aff(Δm) and u = x − v, we have u ∈ L. Writing
A(u) = (α0, . . . , αm) gives us u = Σ_{i=0}^{m} αi vi with Σ_{i=0}^{m} αi = 0 and

|αi| ≤ ‖A(u)‖ < 1/(m+1) for all i = 0, . . . , m.

This implies in turn the representations

v + u = Σ_{i=0}^{m} (1/(m+1) + αi) vi = Σ_{i=0}^{m} μi vi,

where μi := 1/(m+1) + αi ≥ 0 for i = 0, . . . , m. Since Σ_{i=0}^{m} μi = 1, this
ensures that x ∈ Δm. Thus (v + δB) ∩ aff(Δm) ⊂ Δm and therefore v ∈ ri(Δm). □

Another lemma is needed to establish a major result given below.


Lemma 2.82 Let Ω be a nonempty convex set in Rn of dimension m ≥ 1.
Then there are m + 1 affinely independent vectors v0 , . . . , vm belonging to Ω.

Proof. Denote by Δk := co{v0, . . . , vk} a k-simplex of maximal dimension
contained in Ω. Then v0, . . . , vk are affinely independent. To verify now that
k = m, form K := aff{v0, . . . , vk} and observe that K ⊂ aff(Ω) due to
{v0, . . . , vk} ⊂ Ω. The opposite inclusion also holds since Ω ⊂ K. To jus-
tify this, we argue by contradiction and suppose that there exists w ∈ Ω
such that w ∉ K. Then a direct application of the affine independence def-
inition shows that the vectors v0, . . . , vk, w are affinely independent while
belonging to Ω, which contradicts the maximality of k. Thus K = aff(Ω),
and we arrive at k = dim(K) = dim(aff(Ω)) = dim(Ω) = m. □
The next theorem is one of the most fundamental results of convex finite-
dimensional geometry. The interiority version of this result, which is valid in
general topological vector spaces, was presented in Lemma 2.12.

Theorem 2.83 Let Ω be a nonempty convex set in Rn. The following hold:

(a) We always have ri(Ω) ≠ ∅.
(b) For any a ∈ ri(Ω) and b ∈ cl(Ω) we have [a, b) ⊂ ri(Ω).

Proof. To verify (a), denote by m the dimension of Ω. Observe first that the
case where m = 0 is trivial, since in this case Ω is a singleton and ri(Ω) = Ω.
Suppose that m ≥ 1 and find m + 1 affinely independent elements v0, . . . , vm
in Ω as in Lemma 2.82. Consider further the m-simplex

Δm := co{v0, . . . , vm}

and get that aff(Δm) = aff(Ω). Take v ∈ ri(Δm), which exists by Theo-
rem 2.81. For any small ε > 0 we have

B(v; ε) ∩ aff(Ω) = B(v; ε) ∩ aff(Δm) ⊂ Δm ⊂ Ω.

This verifies that v ∈ ri(Ω) by the definition of relative interior.
To prove (b), let L be the linear subspace of Rn parallel to aff(Ω), and let
m := dim(L). Then there is a bijective linear mapping A : L → Rm such
that A and A⁻¹ are continuous. Fix x0 ∈ aff(Ω) and define f : aff(Ω) →
Rm by f(x) := A(x − x0). It is easy to check that f is a bijective affine
mapping and that both f and f⁻¹ are continuous. We see that a ∈ ri(Ω) if and
only if f(a) ∈ int(f(Ω)), and that b ∈ cl(Ω) if and only if f(b) ∈ cl(f(Ω)). Then
[f(a), f(b)) ⊂ int(f(Ω)) by Lemma 2.12. This shows that [a, b) ⊂ ri(Ω). □

The following theorem establishes a connection between the relative inte-


rior and closure of sets in Rn similar to that between the interior and closure
in topological vector spaces given in Theorem 2.13.

Theorem 2.84 Let Ω be a nonempty convex subset of Rn. Then the sets ri(Ω)
and cl(Ω) are also convex, and we have:

(a) cl(ri(Ω)) = cl(Ω).
(b) ri(cl(Ω)) = ri(Ω).
Proof. Note that the convexity of ri(Ω) follows from Theorem 2.83 while the
convexity of cl(Ω) was proved in Proposition 2.11. To justify (a), observe that
the inclusion cl(ri(Ω)) ⊂ cl(Ω) is obvious. For the reverse inclusion, pick b ∈
cl(Ω) and a ∈ ri(Ω) and then form the sequence

xk := (1/k) a + (1 − 1/k) b, k ∈ N,

which converges to b as k → ∞. Since xk ∈ ri(Ω) by Theorem 2.83, we have
b ∈ cl(ri(Ω)). Thus cl(Ω) ⊂ cl(ri(Ω)), which verifies the first assertion.
To verify (b), we need to show that ri(cl(Ω)) ⊂ ri(Ω). Pick x ∈ ri(cl(Ω)) and
x̄ ∈ ri(Ω). It follows from Proposition 2.73 that z := x + t(x − x̄) ∈ cl(Ω) if t > 0
is small, and so we get x = z/(1 + t) + tx̄/(1 + t) ∈ (z, x̄) ⊂ ri(Ω). □

Corollary 2.85 Let Ω1, Ω2 ⊂ Rn be convex sets such that cl(Ω1) = cl(Ω2).
Then we have the equality ri(Ω1) = ri(Ω2).

Proof. If cl(Ω1) = cl(Ω2), then ri(cl(Ω1)) = ri(cl(Ω2)), and so we conclude that
ri(Ω1) = ri(Ω2) by using Theorem 2.84. □
The following theorem presents a calculus rule for relative interiors of sets
under affine mappings.

Theorem 2.86 Let B : Rn → Rp be an affine mapping, and let Ω ⊂ Rn be a
convex set. Then we have the equality

B(ri(Ω)) = ri(B(Ω)).

Proof. Pick y ∈ B(ri(Ω)) and find x ∈ ri(Ω) such that y = B(x). Then take
vectors ȳ ∈ ri(B(Ω)) ⊂ B(Ω) and x̄ ∈ Ω with ȳ = B(x̄). If x = x̄, then
y = ȳ ∈ ri(B(Ω)). Consider the case where x ≠ x̄. We can find x̃ ∈ Ω such that
x ∈ (x̃, x̄) and define ỹ := B(x̃) ∈ B(Ω). Thus y = B(x) ∈ (B(x̃), B(x̄)) =
(ỹ, ȳ), and so we get y ∈ ri(B(Ω)). To complete the proof, it remains to show
that cl(B(Ω)) = cl(B(ri(Ω))) and then obtain by using Corollary 2.85 the inclusion

ri(B(Ω)) = ri(B(ri(Ω))) ⊂ B(ri(Ω)).

Since B(ri(Ω)) ⊂ B(Ω), the continuity of B and Theorem 2.84 imply that

B(Ω) ⊂ B(cl(Ω)) = B(cl(ri(Ω))) ⊂ cl(B(ri(Ω))).

Thus we have cl(B(Ω)) ⊂ cl(B(ri(Ω))), which ends the proof of the theorem. □

An interesting consequence of this result is the following property concerning
the difference of two subsets Ω1, Ω2 ⊂ Rn defined by

Ω1 − Ω2 := {x1 − x2 | x1 ∈ Ω1 and x2 ∈ Ω2}.
Corollary 2.87 Let Ω1 and Ω2 be convex subsets of Rn . Then we have
ri(Ω1 − Ω2 ) = ri(Ω1 ) − ri(Ω2 ).

Proof. Define B : Rn × Rn → Rn by B(x, y) := x − y and form the Cartesian
product Ω := Ω1 × Ω2. Then we have B(Ω) = Ω1 − Ω2, which implies that

ri(Ω1 − Ω2) = ri(B(Ω)) = B(ri(Ω)) = B(ri(Ω1 × Ω2))
= B(ri(Ω1) × ri(Ω2)) = ri(Ω1) − ri(Ω2)

by using Theorem 2.86 and the fact that ri(Ω1 × Ω2) = ri(Ω1) × ri(Ω2). □

The following result is a finite-dimensional counterpart of Theorem 2.57
that does not require the nonempty interior assumption.

Theorem 2.88 Let Ω be a nonempty convex set in Rn, and let x̄ ∉ cl(Ω). Then
there exists a vector v ∈ Rn such that

sup{⟨v, x⟩ | x ∈ Ω} ≤ sup{⟨v, x⟩ | x ∈ cl(Ω)} < ⟨v, x̄⟩.    (2.24)

Fig. 2.9. Separation in a linear subspace

Proof. To verify that there exists v ∈ Rn such that (2.24) is satisfied, we only
need to apply the strict separation result of Theorem 2.61 to the singleton {x̄}
and the closure cl(Ω) of the convex set Ω. □

The next result justifies a strict separation property relative to a linear


subspace of Rn without any interiority assumptions; see Figure 2.9.

Proposition 2.89 Let L be a linear subspace of Rn, and let Ω ⊂ L be a
nonempty convex set with x̄ ∈ L and x̄ ∉ cl(Ω). Then there exists v ∈ L with

sup{⟨v, x⟩ | x ∈ Ω} < ⟨v, x̄⟩.

Proof. Employing (2.24) gives us a vector w ∈ Rn such that

sup{⟨w, x⟩ | x ∈ Ω} < ⟨w, x̄⟩.

Using the direct sum representation Rn = L ⊕ L⊥, where

L⊥ := {u ∈ Rn | ⟨u, x⟩ = 0 for all x ∈ L},

gives us u ∈ L⊥ and v ∈ L with w = u + v. This yields ⟨u, x⟩ = 0 for any
x ∈ Ω ⊂ L and the relationships

⟨v, x⟩ = ⟨u, x⟩ + ⟨v, x⟩ = ⟨u + v, x⟩ = ⟨w, x⟩ ≤ sup{⟨w, x⟩ | x ∈ Ω}
< ⟨w, x̄⟩ = ⟨u + v, x̄⟩ = ⟨u, x̄⟩ + ⟨v, x̄⟩ = ⟨v, x̄⟩,

which show that sup{⟨v, x⟩ | x ∈ Ω} < ⟨v, x̄⟩ with v ≠ 0. □

The following technical lemma is needed for deriving major separation


results in finite-dimensional convex analysis presented below; see Figure 2.10.

Fig. 2.10. Illustration of the proof of Lemma 2.90

Lemma 2.90 Let Ω be a nonempty convex subset of Rn, and let 0 ∈
cl(Ω) \ ri(Ω). Then the affine hull aff(Ω) is actually a linear subspace of Rn, and
there exists a sequence {xk} ⊂ aff(Ω) with xk ∉ cl(Ω) and xk → 0 as k → ∞.

Proof. Since ri(Ω) ≠ ∅ by Theorem 2.83(a) and since 0 ∈ cl(Ω) \ ri(Ω), we find
x0 ∈ ri(Ω) such that −tx0 ∉ cl(Ω) for all t > 0. Indeed, suppose by contradiction
that −tx0 ∈ cl(Ω) for some t > 0 and then deduce from Theorem 2.83(b) that

0 = (t/(1+t)) x0 + (1/(1+t)) (−tx0) ∈ ri(Ω),

which contradicts the assumption 0 ∉ ri(Ω). Letting now xk := −x0/k implies
that xk ∉ cl(Ω) for every k ∈ N with xk → 0 as k → ∞. Furthermore, we have

0 ∈ cl(Ω) ⊂ aff(cl(Ω)) = aff(Ω)

by the closedness of aff(Ω). This shows that the set aff(Ω) is a linear subspace
of Rn and that xk ∈ aff(Ω) for all k ∈ N. □

The next result establishes the proper separation of a nonempty convex
subset of Rn from the origin in Rn.

Theorem 2.91 Let Ω be a nonempty convex set in Rn. Then 0 ∉ ri(Ω) if
and only if the sets Ω and {0} can be properly separated, i.e., there exists a
vector v ∈ Rn such that

sup{⟨v, x⟩ | x ∈ Ω} ≤ 0 and inf{⟨v, x⟩ | x ∈ Ω} < 0.

Proof. We split the proof into the following two cases.

Case 1: 0 ∉ cl(Ω). It follows from estimate (2.24) in Theorem 2.88 with x̄ = 0
that there exists v ≠ 0 for which

sup{⟨v, x⟩ | x ∈ Ω} < ⟨v, x̄⟩ = 0,

and thus the sets Ω and {0} can be properly separated.
Case 2: 0 ∈ cl(Ω) \ ri(Ω). Letting L := aff(Ω) and employing Lemma 2.90 tell
us that L is a linear subspace of Rn and there exists a sequence {xk} ⊂ L with
xk ∉ cl(Ω) and xk → 0 as k → ∞. Then Proposition 2.89 implies that there is a
sequence {vk} ⊂ L with vk ≠ 0 and

sup{⟨vk, x⟩ | x ∈ Ω} < ⟨vk, xk⟩, k ∈ N.

Denoting wk := vk/‖vk‖ shows that ‖wk‖ = 1 for all k ∈ N and that

⟨wk, x⟩ < ⟨wk, xk⟩ for all x ∈ Ω.    (2.25)

Letting k → ∞ in (2.25) and supposing without loss of generality that wk →
v ∈ L with ‖v‖ = 1 along the entire sequence, we arrive at

sup{⟨v, x⟩ | x ∈ Ω} ≤ 0

by using |⟨wk, xk⟩| ≤ ‖wk‖ · ‖xk‖ = ‖xk‖ → 0. To verify the inequality

inf{⟨v, x⟩ | x ∈ Ω} < 0,

it suffices to show that there is x ∈ Ω with ⟨v, x⟩ < 0. Suppose on the contrary
that ⟨v, x⟩ ≥ 0 for all x ∈ Ω and deduce from sup{⟨v, x⟩ | x ∈ Ω} ≤ 0 that
⟨v, x⟩ = 0 for all x ∈ Ω. Since v ∈ L = aff(Ω), we get the representation

v = Σ_{i=1}^{m} λi ωi with Σ_{i=1}^{m} λi = 1 and ωi ∈ Ω for i = 1, . . . , m,

which readily implies the equalities

‖v‖² = ⟨v, v⟩ = Σ_{i=1}^{m} λi ⟨v, ωi⟩ = 0.

This contradicts the condition ‖v‖ = 1 and so justifies the proper separation.
To prove the converse statement of the theorem, assume that Ω and {0}
can be properly separated and thus find v ∈ Rn such that

sup{⟨v, x⟩ | x ∈ Ω} ≤ 0 while ⟨v, x̄⟩ < 0 for some x̄ ∈ Ω.

If on the contrary 0 ∈ ri(Ω), it follows from Proposition 2.73 that

0 + t(0 − x̄) = −tx̄ ∈ Ω for some t > 0.

This immediately yields the inequalities

⟨v, −tx̄⟩ ≤ sup{⟨v, x⟩ | x ∈ Ω} ≤ 0,

which show that ⟨v, x̄⟩ ≥ 0. This is a contradiction, which justifies 0 ∉ ri(Ω). □

Now we are ready to establish the main separation theorem for convex sets
in finite-dimensional convex analysis.

Theorem 2.92 Let Ω1 and Ω2 be two nonempty convex subsets of Rn . Then


Ω1 and Ω2 can be properly separated if and only if ri(Ω1 ) ∩ ri(Ω2 ) = ∅.

Proof. Define Ω := Ω1 − Ω2 and show that ri(Ω1) ∩ ri(Ω2) = ∅ if and only if

0 ∉ ri(Ω1 − Ω2) = ri(Ω1) − ri(Ω2).

To proceed, suppose first that ri(Ω1) ∩ ri(Ω2) = ∅ and get by Corollary 2.87
that 0 ∉ ri(Ω1 − Ω2) = ri(Ω). Then Theorem 2.91 tells us that the sets Ω
and {0} can be properly separated. Thus there exist vectors v ∈ Rn with
⟨v, x⟩ ≤ 0 for all x ∈ Ω and y ∈ Ω such that ⟨v, y⟩ < 0. For any ω1 ∈ Ω1 and
ω2 ∈ Ω2 we denote x := ω1 − ω2 ∈ Ω and get that

⟨v, ω1 − ω2⟩ = ⟨v, x⟩ ≤ 0,

i.e., ⟨v, ω1⟩ ≤ ⟨v, ω2⟩. Choose ω̄1 ∈ Ω1 and ω̄2 ∈ Ω2 with y = ω̄1 − ω̄2. Then

⟨v, ω̄1 − ω̄2⟩ = ⟨v, y⟩ < 0,

telling us that ⟨v, ω̄1⟩ < ⟨v, ω̄2⟩. Hence Ω1 and Ω2 can be properly separated.
To verify now the converse implication, suppose that Ω1 and Ω2 can be
properly separated, which implies that the sets Ω = Ω1 − Ω2 and {0} can
be properly separated as well. Employing Theorem 2.91 again provides the
required relationships

0 ∉ ri(Ω) = ri(Ω1 − Ω2) = ri(Ω1) − ri(Ω2), and so ri(Ω1) ∩ ri(Ω2) = ∅,

which completes the proof of the theorem. □
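The relative-interior criterion of Theorem 2.92 allows proper separation even when the sets overlap, as long as their relative interiors are disjoint. An added sketch: the closed half-plane Ω1 = {x : x1 ≤ 0} and the segment Ω2 = {0} × [−1, 1] ⊂ Ω1 intersect, yet ri(Ω1) ∩ ri(Ω2) = ∅, and v = (1, 0) separates them properly.

```python
import numpy as np

# Proper separation with touching sets (Theorem 2.92): Ω1 = {x : x1 <= 0},
# Ω2 = {0} × [-1, 1]. Their relative interiors are disjoint, and v = (1, 0)
# satisfies sup <v, Ω1> = 0 = inf <v, Ω2>, while <v, ·> is not constant on Ω1.
v = np.array([1.0, 0.0])
omega1 = [(-s, t) for s in np.linspace(0.0, 3.0, 31) for t in (-2.0, 0.0, 2.0)]
omega2 = [(0.0, t) for t in np.linspace(-1.0, 1.0, 21)]   # samples of the sets

sup1 = max(np.dot(v, p) for p in omega1)
inf2 = min(np.dot(v, p) for p in omega2)
assert sup1 <= inf2                               # separation: sup ≤ inf
assert min(np.dot(v, p) for p in omega1) < sup1   # properness on the samples
```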

A useful consequence of Theorem 2.92 is the following interval character-


ization of relative interiors of convex sets.

Corollary 2.93 Let Ω be a convex set in Rn. Then x̄ ∈ ri(Ω) if and only if
for every x ∈ Ω with x ≠ x̄ there exists u ∈ Ω such that x̄ ∈ ri([u, x]) = (u, x).

Proof. Having x̄ ∈ ri(Ω), there exists δ > 0 for which

B(x̄; δ) ∩ aff(Ω) ⊂ Ω.    (2.26)

We can choose 0 < t < 1 so small that u := x̄ + t(x̄ − x) ∈ B(x̄; δ). Since
u ∈ aff(Ω), it follows from (2.26) that u ∈ Ω, and hence we have

x̄ = (t/(1+t)) x + (1/(1+t)) u ∈ (x, u) = ri([u, x]).

To verify the opposite implication, we use the assumption that for any x ∈ Ω
with x ≠ x̄ there exists u ∈ Ω such that x̄ ∈ ri([u, x]). Suppose on the contrary
that x̄ ∉ ri(Ω). Then Theorem 2.92 tells us that the sets {x̄} and Ω can be
properly separated. Choose v ∈ Rn such that ⟨v, x⟩ ≤ ⟨v, x̄⟩ for all x ∈ Ω
and also a vector x0 ∈ Ω satisfying ⟨v, x0⟩ < ⟨v, x̄⟩. Picking u ∈ Ω with
x̄ ∈ ri([x0, u]), we see that {x̄} and [x0, u] can be properly separated. This
yields x̄ ∉ ri([x0, u]), a contradiction, which completes the proof. □

The established convex separation theorem and its consequence in Corol-
lary 2.93 are instrumental in obtaining the following calculation of relative
interiors of graphs of set-valued mappings between finite-dimensional spaces.

Theorem 2.94 Let F : Rm ⇉ Rn be a set-valued mapping with convex graph.
Then we have the representation

ri(gph(F)) = {(x, y) ∈ Rm × Rn | x ∈ ri(dom(F)), y ∈ ri(F(x))}.
Proof. We first verify the fulfillment of the inclusion “⊂”. Consider the projection mapping P : Rm × Rn → Rm given by

P(x, y) := x for (x, y) ∈ Rm × Rn.

It follows from Theorem 2.86 that

P(ri(gph(F))) = ri(P(gph(F))) = ri(dom(F)).  (2.27)
Now, take any (x̄, ȳ) ∈ ri(gph(F)) and get from (2.27) that x̄ ∈ ri(dom(F)). Since (x̄, ȳ) ∈ ri(gph(F)) ⊂ gph(F), we have ȳ ∈ F(x̄). Fix any y ∈ F(x̄) with y ≠ ȳ. Then (x̄, y) ∈ gph(F) with (x̄, y) ≠ (x̄, ȳ). By Corollary 2.93, there exist (u, z) ∈ gph(F) and t ∈ (0, 1) such that

(x̄, ȳ) = t(x̄, y) + (1 − t)(u, z).

Then x̄ = tx̄ + (1 − t)u, which implies (1 − t)x̄ = (1 − t)u and so x̄ = u. In addition, ȳ = ty + (1 − t)z ∈ (y, z), where z ∈ F(x̄). Using the equivalence in Corollary 2.93 again yields ȳ ∈ ri(F(x̄)).
To prove the opposite inclusion, fix x̄ ∈ ri(dom(F)) and ȳ ∈ ri(F(x̄)). Suppose on the contrary that (x̄, ȳ) ∉ ri(gph(F)). Then Theorem 2.92 allows us to find (u, v) ∈ Rm × Rn such that

⟨u, x⟩ + ⟨v, y⟩ ≤ ⟨u, x̄⟩ + ⟨v, ȳ⟩ whenever y ∈ F(x).  (2.28)
Furthermore, there exists (x0 , y0 ) ∈ gph(F ) satisfying
114 2 BASIC THEORY OF CONVEXITY
⟨u, x0⟩ + ⟨v, y0⟩ < ⟨u, x̄⟩ + ⟨v, ȳ⟩.

Letting x := x̄ in (2.28) yields ⟨v, y⟩ ≤ ⟨v, ȳ⟩ for all y ∈ F(x̄). Since x̄ ∈ ri(dom(F)) and x0 ∈ dom(F), Corollary 2.93 tells us that there exists x̂ ∈ dom(F) with x̄ = tx0 + (1 − t)x̂ for some t ∈ (0, 1). Note that this conclusion holds even when x0 = x̄. Further we choose ŷ ∈ F(x̂) and define the vector y := ty0 + (1 − t)ŷ. Then y ∈ F(x̄) by the convexity of gph(F), and therefore

⟨u, x̂⟩ + ⟨v, ŷ⟩ ≤ ⟨u, x̄⟩ + ⟨v, ȳ⟩ and
⟨u, x0⟩ + ⟨v, y0⟩ < ⟨u, x̄⟩ + ⟨v, ȳ⟩.

Multiplying the first inequality by 1 − t, multiplying the second one by t, and then adding them together give us the condition

⟨u, x̄⟩ + ⟨v, y⟩ < ⟨u, x̄⟩ + ⟨v, ȳ⟩.

This yields ⟨v, y⟩ < ⟨v, ȳ⟩, and thus the sets {ȳ} and F(x̄) can be properly separated. This is equivalent to ȳ ∉ ri(F(x̄)), which is a contradiction. □
Although the precise definition of a convex function with its domain and epigraph is given only in Subsection 2.4.1, it is tempting to present right now a direct consequence of Theorem 2.94 on the representation of relative interiors of epigraphs of extended-real-valued convex functions in finite dimensions. An infinite-dimensional version of this result is given below in Subsection 2.4.4.
Corollary 2.95 Let f : Rn → (−∞, ∞] be a convex function. Then

ri(epi(f)) = {(x, λ) ∈ Rn+1 | x ∈ ri(dom(f)), λ > f(x)}.
Proof. Define the set-valued mapping F : Rn →→ R by F(x) := [f(x), ∞) and observe that dom(F) = dom(f). Furthermore, for any x ∈ dom(f) we have ri(F(x)) = (f(x), ∞). Then the claimed representation of the epigraph relative interior follows directly from Theorem 2.94. □

2.3.4 Extreme Points of Convex Sets

In this subsection we discuss yet another beautiful piece of convex geometry related to convex separation. It concerns extreme points of convex sets. We proceed here in general settings of (real) vector and LCTV spaces.

Definition 2.96 Let Ω be a convex set in a vector space X, and let z ∈ Ω. We say that z is an extreme point of Ω if the following implication holds:

[x, y ∈ Ω, 0 < λ < 1, z = λx + (1 − λ)y] =⇒ [x = y = z].

The set of all the extreme points of Ω is denoted by ext(Ω).

Fig. 2.11. Illustration of extreme points

The next example illustrates some typical cases of extreme points in the
plane; see also Figure 2.11.

Example 2.97 (a) Let Ω be the closed unit ball in R2 equipped with the standard Euclidean or ℓ2-norm. Then ext(Ω) is the unit sphere given by

ext(Ω) = {u ∈ R2 | ‖u‖ = 1}.

(b) Let Ω be the closed unit ball in R2 equipped with the sum or ℓ1-norm, i.e.,

Ω := {u = (u1, u2) ∈ R2 | ‖u‖1 ≤ 1}, where ‖u‖1 := |u1| + |u2|.

Then we have the set of extreme points

ext(Ω) = {(0, 1), (0, −1), (−1, 0), (1, 0)}.

(c) Let Ω be the closed unit ball in R2 equipped with the maximum or ℓ∞-norm, i.e.,

Ω := {u = (u1, u2) ∈ R2 | ‖u‖∞ ≤ 1}, where ‖u‖∞ := max{|u1|, |u2|}.

Then we have the set of extreme points

ext(Ω) = {(1, 1), (1, −1), (−1, 1), (−1, −1)}.
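The extreme points listed in (c) can be probed numerically. The following sketch (helper names are hypothetical) searches a coarse grid inside the ℓ∞-ball for two distinct points averaging to a given z: finding a witness disproves extremeness, while finding none on the grid is merely numerical evidence consistent with z ∈ ext(Ω).

```python
import itertools

def midpoint_witness(z, ball_contains, step=0.5):
    """Search a grid for distinct x, y in the set with z = (x + y)/2.

    A returned witness proves z is NOT an extreme point; None means no
    witness exists on this coarse grid (evidence only, not a proof)."""
    grid = [i * step for i in range(-2, 3)]  # grid points in [-1, 1]
    pts = [p for p in itertools.product(grid, repeat=2) if ball_contains(p)]
    for x, y in itertools.combinations(pts, 2):
        mid = ((x[0] + y[0]) / 2, (x[1] + y[1]) / 2)
        if mid == z:
            return x, y
    return None

in_sup_ball = lambda p: max(abs(p[0]), abs(p[1])) <= 1  # closed unit ball, max-norm

print(midpoint_witness((1.0, 0.0), in_sup_ball))  # witness found: (1, 0) is not extreme
print(midpoint_witness((1.0, 1.0), in_sup_ball))  # None: consistent with (1, 1) ∈ ext(Ω)
```

Here (1, 0) lies on the boundary of the ℓ∞-ball yet fails to be extreme, exactly as the example asserts, while the corner (1, 1) survives the midpoint test.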

Another important notion of convex geometry is as follows.



Definition 2.98 Let Ω be a convex set in a vector space X, and let F ⊂ Ω. We say that F is a face of Ω if F is a nonempty convex set and the following implication holds:

[x, y ∈ Ω, 0 < λ < 1, λx + (1 − λ)y ∈ F] =⇒ [x ∈ F, y ∈ F].

We deduce from the above definitions that if z ∈ Ω and F := {z} is a face of Ω, then z is an extreme point of Ω. The next proposition shows that the face notion is invariant with respect to taking intersections.

Proposition 2.99 Let Ω be a convex set in a vector space X, and let {Fα}α∈I be a family of faces of Ω such that F := ⋂α∈I Fα is nonempty. Then the set F is also a face of Ω.

Proof. The convexity of the set F is obvious. Fix now any x, y ∈ Ω and λ ∈
(0, 1) and then suppose that λx+(1−λ)y ∈ F . Then we have λx+(1−λ)y ∈ Fα
for all α ∈ I. Since each Fα is a face of Ω, we get x, y ∈ Fα for all α ∈ I. Thus
x, y ∈ F, which shows that F is a face of Ω. □

The next proposition involves the transitivity of the face notion and its
interaction with extreme points.

Proposition 2.100 Consider a nonempty convex set Ω in a vector space X.

(a) Let E ⊂ F ⊂ Ω. If E is a face of F while F is a face of Ω, then E is also a face of Ω.
(b) If F is a face of Ω, then ext(F) = ext(Ω) ∩ F. In particular, we have the inclusion ext(F) ⊂ ext(Ω).

Proof. To verify (a), fix any x, y ∈ Ω and λ ∈ (0, 1). Suppose that λx + (1 − λ)y ∈ E and then get λx + (1 − λ)y ∈ F. Since F is a face of Ω, we have x, y ∈ F. Taking into account that E is a face of F, it follows that x, y ∈ E, which shows that E is a face of Ω.

To prove (b), let z be an extreme point of F, and let E := {z}. It immediately follows that E is a face of F. Then (a) tells us that E is a face of Ω, and so z is an extreme point of Ω. Thus z ∈ ext(Ω) ∩ F. Suppose now that z ∈ ext(Ω) ∩ F and then take any x, y ∈ F together with λ ∈ (0, 1) such that z = λx + (1 − λ)y. Then x, y ∈ Ω, and since z is an extreme point of Ω, we have x = y = z, which thus verifies that z is an extreme point of F. □

The next proposition reveals a face structure of optimal solution sets in linear-convex constrained optimization.

Proposition 2.101 Let X be a vector space. Consider minimizing a linear function f : X → R on a nonempty convex set Ω ⊂ X and assume that this problem has at least one optimal solution. Then the optimal solution set
F := {x̄ ∈ Ω | f(x̄) = min_{x∈Ω} f(x)}

is a face of the constraint set Ω.

Proof. Denote α := min_{x∈Ω} f(x) and observe that F = {x ∈ Ω | f(x) ≤ α} is a convex subset of Ω. Fixing any x, y ∈ Ω and λ ∈ (0, 1), suppose that λx + (1 − λ)y ∈ F. Then we have the relationships

α = f(λx + (1 − λ)y) = λf(x) + (1 − λ)f(y) ≥ λα + (1 − λ)α = α.

This readily implies that f(x) = α and f(y) = α, and hence x, y ∈ F. Therefore F is a face of Ω. □
The next lemma is certainly of independent interest while being employed below in the proof of the Krein-Milman theorem.

Lemma 2.102 Let Ω be a nonempty, compact, and convex set in a locally convex topological vector space X. Then we have ext(Ω) ≠ ∅.
Proof. Consider the collection of sets

F := {F ⊂ Ω | F is a compact face of Ω}

ordered by set inclusion. Then F is nonempty because Ω ∈ F. Zorn's lemma from Theorem 1.124 ensures that F has a minimal element denoted by M. We show that M is a singleton M = {z} and then conclude that z ∈ ext(Ω). Suppose on the contrary that there exist a, b ∈ M with a ≠ b. By the convex separation result from Theorem 2.61, find f ∈ X* such that f(a) < f(b). Define further the set

E := {u ∈ M | f(u) = min_{x∈M} f(x)},

which is a compact face of M. Then Proposition 2.100(a) tells us that E is a compact face of Ω as well. Observe also that E is a proper subset of M as b ∉ E. This contradicts the minimality of M and thus completes the proof. □
Now we are ready to prove the celebrated Krein-Milman theorem.

Theorem 2.103 Let Ω be a nonempty, compact, and convex set in a locally convex topological vector space X. Then we have

Ω = cl(co(ext(Ω))),

i.e., Ω is the closed convex hull of its extreme points.

Proof. Lemma 2.102 ensures that the set ext(Ω) is nonempty. It follows from ext(Ω) ⊂ Ω that the closed convex set cl(co(ext(Ω))) is also compact. Let us verify that Ω ⊂ cl(co(ext(Ω))). Fix any x̄ ∈ Ω and suppose that x̄ ∉ Θ := cl(co(ext(Ω))). Employing the strict separation result of Theorem 2.61, we find f ∈ X* such that

sup{f(x) | x ∈ Θ} < f(x̄).  (2.29)

Consider the set F := {x ∈ Ω | f(x) = γ} with γ := max_{x∈Ω} f(x). Then F is a nonempty compact face of Ω, and so it has an extreme point z ∈ F by Lemma 2.102. Since z ∈ ext(Ω) ⊂ Θ by Proposition 2.100(b), it follows from (2.29) that f(z) < f(x̄), a contradiction. □
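In finite dimensions the Krein-Milman representation can be made fully explicit for simple sets. The sketch below (an illustration with hypothetical names, not part of the proof) writes a point of the unit square as a convex combination of its four corners, which form ext(Ω) for this compact convex Ω.

```python
def corner_decomposition(p):
    """Write p = (x, y) in the unit square [0, 1]^2 as a convex combination
    of the four corners (the extreme points of this compact convex set)."""
    x, y = p
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    # Bilinear weights: nonnegative and summing to 1.
    weights = [(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y]
    return corners, weights

corners, w = corner_decomposition((0.3, 0.8))
assert all(wi >= 0 for wi in w) and abs(sum(w) - 1.0) < 1e-12
recon_x = sum(wi * c[0] for wi, c in zip(w, corners))
recon_y = sum(wi * c[1] for wi, c in zip(w, corners))
assert abs(recon_x - 0.3) < 1e-12 and abs(recon_y - 0.8) < 1e-12
print("point recovered from extreme points")
```

For polytopes no closure operation is needed; the closed convex hull in Theorem 2.103 matters in infinite dimensions.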

2.4 Convexity of Functions


In this section we start a systematic study of convex functions. The convexity
of functions is closely related to (in fact, is generated by) the convexity of
sets, while the functional framework exhibits important features which are not
present in the geometric setting of sets. Here we consider some basic properties
of convex functions (not concerning duality and generalized differentiation,
which will be studied in the subsequent sections) and discuss particular classes
of functional convexity together with related notions and applications.

2.4.1 Descriptions and Properties of Convex Functions

This subsection presents the basic definitions, descriptions, and properties of convex functions in general infinite-dimensional spaces. Unless otherwise stated, the spaces under consideration are (real) vector spaces on which we define extended-real-valued functions f : X → R := (−∞, ∞].
Given such a function f : X → R, let us associate with it the domain and epigraph, which are the sets defined by

dom(f) := {x ∈ X | f(x) < ∞} and epi(f) := {(x, α) ∈ X × R | f(x) ≤ α},

respectively. We say that f is proper if dom(f) ≠ ∅.
Developing a geometric approach to convex analysis, we define the con-
vexity of a function via the convexity of its epigraphical set; see Figures 2.12
and 2.13. This makes it possible to widely employ geometric results on set
convexity in the study and applications of convex functions.
Definition 2.104 Let f : X → R be an extended-real-valued function on a
vector space X. We say that f is convex if epi(f ) is a convex set in X × R.

Fig. 2.12. Convex and nonconvex functions

Next we present equivalent analytic descriptions of convex functions.



Fig. 2.13. Epigraphs of convex and nonconvex functions

Theorem 2.105 The convexity of a function f : X → R on a vector space X is equivalent to each of the following statements:
(a) (Jensen inequality) For all x, y ∈ X and λ ∈ (0, 1) we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).  (2.30)

(b) (Extended Jensen inequality) For any points xi ∈ X and λi > 0 as i = 1, . . . , m with m ∈ N satisfying the condition Σ_{i=1}^m λi = 1 we have

f( Σ_{i=1}^m λi xi ) ≤ Σ_{i=1}^m λi f(xi).  (2.31)

Proof. Assuming that (2.30) holds, fix any pairs (x, s), (y, t) ∈ epi(f) and a number λ ∈ (0, 1). Then we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λs + (1 − λ)t,

which immediately implies that

λ(x, s) + (1 − λ)(y, t) = (λx + (1 − λ)y, λs + (1 − λ)t) ∈ epi(f)

and shows that the epigraph epi(f) is a convex subset of X × R.
Conversely, suppose that f is convex and pick x, y ∈ dom(f) and λ ∈ (0, 1). Then (x, f(x)), (y, f(y)) ∈ epi(f). Definition 2.104 tells us that

(λx + (1 − λ)y, λf(x) + (1 − λ)f(y)) = λ(x, f(x)) + (1 − λ)(y, f(y)) ∈ epi(f)

and therefore ensures the inequality

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
The latter also holds if x ∉ dom(f) or y ∉ dom(f), and so (2.30) is satisfied.

To verify now the equivalence with the extended Jensen inequality, observe first that (b) implies (a). Thus the only thing we need to show is that the convexity of f implies (b). To proceed, fix xi ∈ X and λi > 0 for i = 1, . . . , m with Σ_{i=1}^m λi = 1. It suffices to consider the case where xi ∈ dom(f) for i = 1, . . . , m. Then (xi, f(xi)) ∈ epi(f) for every i = 1, . . . , m. Using Proposition 2.6, we conclude that

Σ_{i=1}^m λi (xi, f(xi)) = ( Σ_{i=1}^m λi xi, Σ_{i=1}^m λi f(xi) ) ∈ epi(f),

which verifies (2.31) and hence completes the proof of the theorem. □
It follows from the proof of Theorem 2.105 that a function f : X → R is convex if and only if for all x, y ∈ dom(f) and λ ∈ (0, 1) we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
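The extended Jensen inequality (2.31) is easy to test numerically. A minimal sketch, assuming the convex sample function x ↦ x² and randomly generated convex weights (the helper name is an illustrative choice):

```python
import random

def jensen_gap(f, xs, weights):
    """Return f(Σ λi xi) − Σ λi f(xi); nonpositive for convex f by (2.31)."""
    mean = sum(l * x for l, x in zip(weights, xs))
    return f(mean) - sum(l * f(x) for l, x in zip(weights, xs))

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(-5, 5) for _ in range(4)]
    raw = [random.random() for _ in range(4)]
    lam = [r / sum(raw) for r in raw]          # λi > 0 with Σ λi = 1
    assert jensen_gap(lambda x: x * x, xs, lam) <= 1e-12  # convex: x²
print("extended Jensen inequality holds on all samples")
```

The small tolerance only absorbs floating-point rounding; mathematically the gap is ≤ 0 for every convex f.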
Considering further the case where a function f : Ω → R is given on a nonempty convex subset Ω ⊂ X, the function f can be extended to the whole space X by the formula

f̃(x) := f(x) if x ∈ Ω, and f̃(x) := ∞ otherwise.

We say that f is convex on Ω if its extension f̃ : X → R is a convex function on X. It is easy to see that f is convex on Ω if and only if for all x, y ∈ Ω and λ ∈ (0, 1) we have

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).
Since any convex function on X is obviously convex on every nonempty convex subset of X, this allows us to deal with extended-real-valued convex functions defined on the entire space X.
The next result is a direct consequence of Theorem 2.105(a).

Corollary 2.106 Any convex function f : X → R defined on a vector space X has a convex domain dom(f).

Proof. If f is convex, then for every x, y ∈ dom(f) and λ ∈ (0, 1) we get by Theorem 2.105(a) that

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) < ∞.

Thus λx + (1 − λ)y ∈ dom(f), which verifies the convexity of dom(f). □

Let us illustrate the convexity of functions with some simple examples.

Example 2.107 Consider the following real-valued functions f : X → R on a normed space X:

(a) f(x) := ‖x‖, x ∈ X.
(b) f(x) := ‖x‖², x ∈ X.

Check the convexity of both functions by using the characterizing inequality (2.30). Fix x, y ∈ X and λ ∈ (0, 1). Then we have for f(x) = ‖x‖ that

f(λx + (1 − λ)y) = ‖λx + (1 − λ)y‖ ≤ λ‖x‖ + (1 − λ)‖y‖ = λf(x) + (1 − λ)f(y)

due to the triangle inequality and by ‖αu‖ = |α| · ‖u‖ if α ∈ R and u ∈ X. To proceed with f(x) = ‖x‖², we clearly get

f(λx + (1 − λ)y) = ‖λx + (1 − λ)y‖² ≤ (λ‖x‖ + (1 − λ)‖y‖)²
 = λ²‖x‖² + 2λ(1 − λ)‖x‖ · ‖y‖ + (1 − λ)²‖y‖²
 ≤ λ²‖x‖² + λ(1 − λ)(‖x‖² + ‖y‖²) + (1 − λ)²‖y‖²
 = λ‖x‖² + (1 − λ)‖y‖² = λf(x) + (1 − λ)f(y),

which verifies the convexity of this function.

Example 2.108 Let H be an inner product (in particular, a Hilbert) space. Recall that a linear operator A : H → H is called self-adjoint if

⟨Ax, y⟩ = ⟨x, Ay⟩ for all x, y ∈ H.

We say that A is nonnegative (or positive-semidefinite) if it is self-adjoint and ⟨Ax, x⟩ ≥ 0 for all x ∈ H. Let us check that a self-adjoint mapping A : H → H is nonnegative if and only if the scalar function f : H → R defined by

f(x) := (1/2)⟨Ax, x⟩, x ∈ H,

is convex. Indeed, a direct calculation shows that

λf(x) + (1 − λ)f(y) − f(λx + (1 − λ)y) = (1/2)λ(1 − λ)⟨A(x − y), x − y⟩  (2.32)

for any x, y ∈ H and λ ∈ (0, 1). If A is nonnegative, then ⟨A(x − y), x − y⟩ ≥ 0, so the function f is convex by (2.32). Conversely, assuming the convexity of f and using equality (2.32) for y = 0, we verify that A is nonnegative.
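Identity (2.32) can be confirmed numerically in R² for a concrete symmetric matrix. A minimal sketch (the matrix choice and helper name are illustrative assumptions):

```python
import random

def quad(A, x):
    """f(x) = (1/2)·⟨Ax, x⟩ for a symmetric 2×2 matrix A given as nested lists."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (Ax[0] * x[0] + Ax[1] * x[1])

A = [[2.0, 1.0], [1.0, 3.0]]  # symmetric and positive-definite (2·3 − 1 > 0)
random.seed(1)
for _ in range(100):
    x = [random.uniform(-3, 3) for _ in range(2)]
    y = [random.uniform(-3, 3) for _ in range(2)]
    lam = random.random()
    z = [lam * x[i] + (1 - lam) * y[i] for i in range(2)]
    lhs = lam * quad(A, x) + (1 - lam) * quad(A, y) - quad(A, z)
    d = [x[i] - y[i] for i in range(2)]
    rhs = 0.5 * lam * (1 - lam) * (2 * quad(A, d))  # ⟨A(x−y), x−y⟩ = 2·quad(A, x−y)
    assert abs(lhs - rhs) < 1e-9
print("identity (2.32) verified on random samples")
```

Since A here is positive-definite, the right-hand side is nonnegative, so the sampled convexity gaps are nonnegative as well, in line with the example.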

The next two examples describe classes of functions associated with given
sets that are highly important in convex analysis and its extensions.

Example 2.109 Let Ω be a nonempty subset of a vector space X. Associate with it the indicator function δΩ : X → R defined by

δ(x; Ω) = δΩ(x) := 0 if x ∈ Ω, and δΩ(x) := ∞ if x ∉ Ω,

which is a proper extended-real-valued function with dom(δΩ) = Ω and epi(δΩ) = Ω × [0, ∞). The latter implies that the indicator function δΩ is convex if and only if the set Ω is convex.

Fig. 2.14. The distance function

The following example associates with a nonempty subset Ω ⊂ X of a normed space a real-valued function, which is actually Lipschitz continuous while being usually nondifferentiable. Figure 2.14 illustrates the distance function to a convex set in the plane endowed with the Euclidean norm.

Example 2.110 Let Ω be a nonempty subset of a normed space X. Define the distance function dΩ : X → R associated with Ω by

d(x; Ω) = dΩ(x) := inf{‖x − w‖ | w ∈ Ω}, x ∈ X.  (2.33)

It is not hard to verify that the convexity of Ω implies the convexity of (2.33), and the converse also holds provided that Ω is closed; see Exercise 2.212(b).

Next we define a useful specification of function convexity and illustrate it by a typical example in infinite dimensions.

Definition 2.111 Let f : X → R be a function defined on a vector space X. It is said to be strictly convex if

f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y)

for all x, y ∈ dom(f) with x ≠ y and λ ∈ (0, 1).

Example 2.112 Let H be an inner product (in particular, a Hilbert) space. Then the function f : H → R given by f(x) := ‖x‖² as x ∈ H is strictly convex. Indeed, for any x, y ∈ H with x ≠ y and λ ∈ (0, 1) we have

f(λx + (1 − λ)y) = ‖λx + (1 − λ)y‖²
 = λ²‖x‖² + 2λ(1 − λ)⟨x, y⟩ + (1 − λ)²‖y‖²
 ≤ λ²‖x‖² + 2λ(1 − λ)‖x‖ · ‖y‖ + (1 − λ)²‖y‖²
 ≤ λ²‖x‖² + λ(1 − λ)(‖x‖² + ‖y‖²) + (1 − λ)²‖y‖²
 = λ‖x‖² + (1 − λ)‖y‖² = λf(x) + (1 − λ)f(y).

Here we apply the Cauchy-Schwarz inequality and the fact that 2‖x‖ · ‖y‖ ≤ ‖x‖² + ‖y‖². Note further that f(λx + (1 − λ)y) = λf(x) + (1 − λ)f(y) if and only if ⟨x, y⟩ = ‖x‖ · ‖y‖ and ‖x‖² = ‖y‖². In this case we get ‖x − y‖² = ‖x‖² − 2⟨x, y⟩ + ‖y‖² = 0, and so x = y. Having x ≠ y, this shows that

f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y),

which verifies the strict convexity of f.

Next we consider an extension of function convexity important for various applications; in particular, to those in economic modeling.

Definition 2.113 A function f : X → R defined on a vector space X is called quasiconvex if we have

f(λx + (1 − λ)y) ≤ max{f(x), f(y)}  (2.34)

for all x, y ∈ X and λ ∈ (0, 1).

It follows from the definitions that every convex function is quasiconvex. The opposite implication fails, e.g., for the simple function f(x) := √|x| on R, which is illustrated by Figure 2.15.

Fig. 2.15. A quasiconvex function

The final proposition here shows that quasiconvexity of functions has a characterization via convexity of a family of sets.

Proposition 2.114 Let X be a vector space. A function f : X → R is quasiconvex if and only if the sublevel set

Lα := {x ∈ X | f(x) ≤ α}  (2.35)

is convex for every number α ∈ R.

Proof. =⇒: Assuming that f is quasiconvex, fix any α ∈ R, x, y ∈ Lα, and λ ∈ (0, 1). Then f(x) ≤ α and f(y) ≤ α. It tells us therefore that

f(λx + (1 − λ)y) ≤ max{f(x), f(y)} ≤ α.

This shows that λx + (1 − λ)y ∈ Lα, and so the sublevel set (2.35) is convex.

⇐=: Suppose now that the sublevel set Lα is convex for all α ∈ R and fix any x, y ∈ X and λ ∈ (0, 1). If either f(x) = ∞ or f(y) = ∞, then (2.34) clearly is satisfied. Otherwise, let α := max{f(x), f(y)} ∈ R and then get x ∈ Lα and y ∈ Lα. It yields λx + (1 − λ)y ∈ Lα, and thus

f(λx + (1 − λ)y) ≤ α = max{f(x), f(y)},

which verifies the quasiconvexity of f. □
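As a numerical sanity check of Definition 2.113 and Proposition 2.114, the sketch below tests inequality (2.34) for f(x) = √|x| on sample points and exhibits a failure of the Jensen inequality (2.30), so this f is quasiconvex but not convex. The sampled triples are illustrative choices, not a proof.

```python
import math

f = lambda x: math.sqrt(abs(x))  # quasiconvex on R (intervals as sublevel sets)

# Quasiconvexity check (2.34) on sample triples:
for x, y, lam in [(-4.0, 9.0, 0.3), (1.0, 16.0, 0.5), (-1.0, -9.0, 0.8)]:
    assert f(lam * x + (1 - lam) * y) <= max(f(x), f(y)) + 1e-12

# Failure of the Jensen inequality (2.30), so f is not convex:
x, y, lam = 0.0, 1.0, 0.5
assert f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y)
print("quasiconvex but not convex on these samples")
```

Every sublevel set {x : √|x| ≤ α} is the interval [−α², α²], hence convex, which is exactly the characterization of Proposition 2.114.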

2.4.2 Convexity under Differentiability

In this subsection we present characterizations of convexity under first-order and second-order differentiability assumptions on the function in question. Then the obtained characterizations are applied to verify the convexity of remarkable functions and to derive some classical inequalities in real analysis.

We begin with the following simple albeit useful lemma for real-valued convex functions of one variable; see the illustration in Figure 2.16.

Lemma 2.115 Let f : I → R be a convex function, where I ⊂ R is a nonempty interval. Then for any numbers a, b ∈ I with a < b and any x ∈ (a, b) we have the inequalities

(f(x) − f(a))/(x − a) ≤ (f(b) − f(a))/(b − a) ≤ (f(b) − f(x))/(b − x).
Proof. Fix a, b, x as above and form the number t := (x − a)/(b − a) ∈ (0, 1). Then

f(x) = f(a + (x − a)) = f(a + ((x − a)/(b − a))(b − a)) = f(a + t(b − a)) = f(tb + (1 − t)a).

This gives us the inequalities f(x) ≤ tf(b) + (1 − t)f(a) and

f(x) − f(a) ≤ tf(b) + (1 − t)f(a) − f(a) = t(f(b) − f(a)) = ((x − a)/(b − a))(f(b) − f(a)),

where the latter one can be equivalently rewritten as

(f(x) − f(a))/(x − a) ≤ (f(b) − f(a))/(b − a).

Similarly we have the estimate

f(x) − f(b) ≤ tf(b) + (1 − t)f(a) − f(b) = (t − 1)(f(b) − f(a)) = ((x − b)/(b − a))(f(b) − f(a)),

which finally implies that

(f(b) − f(a))/(b − a) ≤ (f(b) − f(x))/(b − x)

and thus completes the proof of the lemma. □

Fig. 2.16. Lemma 2.115
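Lemma 2.115 says that chord slopes of a convex function are nondecreasing as the chord moves to the right. A small numerical sketch for f(x) = x² (the helper name is an illustrative assumption):

```python
def slope(f, p, q):
    """Slope of the chord of f between the points p and q."""
    return (f(q) - f(p)) / (q - p)

f = lambda x: x * x  # convex on R
a, x, b = 0.0, 1.0, 3.0
# The three chord slopes of Lemma 2.115, left to right:
print(slope(f, a, x), slope(f, a, b), slope(f, x, b))  # 1.0 3.0 4.0
assert slope(f, a, x) <= slope(f, a, b) <= slope(f, x, b)
```

The same monotonicity of chord slopes is what drives the derivative characterization in Theorem 2.116 below.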

Now we arrive at a classical characterization of convexity for differentiable real functions of one variable.

Theorem 2.116 Let f : I → R be a differentiable function, where I ⊂ R is a nonempty open interval. Then the function f is convex if and only if its derivative f′ is nondecreasing on the entire interval I.

Proof. Fix a < b with a, b ∈ I and assume that the function f is convex. Then we get from Lemma 2.115 that

(f(x) − f(a))/(x − a) ≤ (f(b) − f(a))/(b − a) for every x ∈ (a, b).

This implies by the derivative definition that

f′(a) ≤ (f(b) − f(a))/(b − a).

Similarly we arrive at the estimate

(f(b) − f(a))/(b − a) ≤ f′(b)

and conclude that f′(a) ≤ f′(b), i.e., f′ is a nondecreasing function.

To prove the converse implication, suppose that f′ is nondecreasing on I and fix x1 < x2 with x1, x2 ∈ I and t ∈ (0, 1). Then

x1 < xt < x2 for xt := tx1 + (1 − t)x2.

Using the classical mean value theorem gives us numbers c1, c2 with x1 < c1 < xt < c2 < x2 such that we have the equalities

f(xt) − f(x2) = f′(c2)(xt − x2) = f′(c2)t(x1 − x2) and
f(xt) − f(x1) = f′(c1)(xt − x1) = f′(c1)(1 − t)(x2 − x1),

which can be equivalently rewritten as

tf(xt) − tf(x1) = f′(c1)t(1 − t)(x2 − x1) and
(1 − t)f(xt) − (1 − t)f(x2) = f′(c2)t(1 − t)(x1 − x2).

Summing up these equalities and using f′(c1) ≤ f′(c2) give us the estimate

f(xt) ≤ tf(x1) + (1 − t)f(x2),

which thus justifies the convexity of the function f. □

As a direct consequence of Theorem 2.116, we get the following characterization of convexity for twice differentiable real functions of one variable.

Corollary 2.117 Let f : I → R be twice differentiable, where I ⊂ R is a nonempty open interval. Then f is convex if and only if its second derivative is nonnegative on I, i.e., f″(x) ≥ 0 for all x ∈ I.

Proof. Recall that f″(x) ≥ 0 for all x ∈ I if and only if the first derivative f′ is nondecreasing on this interval. Then the conclusion of the corollary follows directly from Theorem 2.116. □

The next result provides a characterization of convexity for twice continuously differentiable functions on open subsets of Rn in terms of their Hessians.

Theorem 2.118 Let f : Ω → R be twice continuously differentiable on a nonempty open convex set Ω ⊂ Rn. Then the function f is convex on Ω if and only if for all x ∈ Ω its Hessian matrix ∇²f(x) is positive-semidefinite, i.e., we have ⟨v, ∇²f(x)v⟩ ≥ 0 whenever v ∈ Rn.

Proof. It is easy to observe that the convexity of f : Ω → R can be equivalently described via the convexity of functions of one variable on open intervals. In fact, f is convex if and only if for any x ∈ Ω and d ∈ Rn the real-valued function of one variable

ϕx,d(t) := f(x + td) whenever t ∈ I

is convex, where I is the open interval I := {t ∈ R | x + td ∈ Ω}. This observation leads us to the claimed statement by applying Corollary 2.117 to the function ϕx,d(t). □
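For n = 2 the Hessian test of Theorem 2.118 reduces to checking the diagonal entries and the determinant of a symmetric 2×2 matrix. A minimal sketch (the minor-based PSD test and the sample functions are illustrative choices):

```python
def hessian_psd_2x2(H, tol=1e-12):
    """Positive-semidefiniteness of a symmetric 2×2 matrix via the minor
    test: H[0][0] ≥ 0, H[1][1] ≥ 0, and det(H) ≥ 0 (up to a tolerance)."""
    return (H[0][0] >= -tol and H[1][1] >= -tol
            and H[0][0] * H[1][1] - H[0][1] * H[1][0] >= -tol)

# f(x1, x2) = x1^4 + x2^2 has Hessian [[12 x1^2, 0], [0, 2]], PSD for every x,
# so f is convex on R^2 by Theorem 2.118.
for x1 in (-2.0, 0.0, 1.5):
    H = [[12.0 * x1 * x1, 0.0], [0.0, 2.0]]
    assert hessian_psd_2x2(H)

# g(x1, x2) = x1^2 - x2^2 has Hessian [[2, 0], [0, -2]]: not PSD, g not convex.
assert not hessian_psd_2x2([[2.0, 0.0], [0.0, -2.0]])
print("Hessian tests match convexity on these samples")
```

The minor criterion is specific to the symmetric 2×2 case; in higher dimensions one would instead check the eigenvalues of ∇²f(x).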

Using the obtained characterizations allows us to verify function convexity in the following examples and also to prove some classical inequalities formulated in the subsequent propositions.

Example 2.119 Each of the functions below is convex on the given domain:

(a) f(x) := e^{ax} on R, where a ∈ R.
(b) f(x) := x^q on [0, ∞), where q ≥ 1 is a constant.
(c) f(x) := − ln(x) on (0, ∞).
(d) f(x) := x ln(x) on (0, ∞).
(e) f(x) := 1/x on (0, ∞).
(f) f(x1, x2) := x1^{2n} + x2^{2n} on R2, where n ∈ N.
(g) f(x) = ⟨Ax, x⟩ + ⟨b, x⟩ + c on Rn, where A is a positive-semidefinite matrix, b ∈ Rn, and c ∈ R.

Proposition 2.120 For every a, b ≥ 0 and 0 < θ < 1 we have the inequality

a^θ b^{1−θ} ≤ θa + (1 − θ)b.  (2.36)

Proof. It suffices to consider the case where a > 0 and b > 0. It follows from the convexity of the function f(x) := − ln(x) on (0, ∞) that

− ln(θa + (1 − θ)b) ≤ −θ ln(a) − (1 − θ) ln(b),

which implies in turn that

ln(a^θ b^{1−θ}) ≤ ln(θa + (1 − θ)b).

Then (2.36) is satisfied since the function ln is monotone increasing. □
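Inequality (2.36), the weighted arithmetic-geometric mean inequality, can be spot-checked directly (the helper name is an illustrative assumption):

```python
def amgm_gap(a, b, theta):
    """θa + (1 − θ)b − a^θ b^(1−θ); nonnegative by inequality (2.36)."""
    return theta * a + (1 - theta) * b - a ** theta * b ** (1 - theta)

# Classical AM-GM case θ = 1/2: (9 + 1)/2 − √(9·1) = 5 − 3 = 2.
assert abs(amgm_gap(9.0, 1.0, 0.5) - 2.0) < 1e-9
# Equality holds when a = b (up to rounding):
assert abs(amgm_gap(3.0, 3.0, 0.7)) < 1e-12
for a, b, th in [(2.0, 5.0, 0.25), (0.1, 7.0, 0.9), (4.0, 4.0, 0.3)]:
    assert amgm_gap(a, b, th) >= -1e-12
print("weighted AM-GM inequality (2.36) holds on the samples")
```

The equality case a = b reflects the strict convexity of − ln on (0, ∞).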

Proposition 2.121 Let xi, yi ∈ R for i = 1, . . . , m. Then for p > 1 and q > 1 such that 1/p + 1/q = 1 we have the inequality

Σ_{i=1}^m |xi yi| ≤ ( Σ_{i=1}^m |xi|^p )^{1/p} ( Σ_{i=1}^m |yi|^q )^{1/q}.  (2.37)
Proof. It suffices to consider the case where Σ_{i=1}^m |xi|^p ≠ 0 and Σ_{i=1}^m |yi|^q ≠ 0. Let

a := |xi|^p / Σ_{i=1}^m |xi|^p,  b := |yi|^q / Σ_{i=1}^m |yi|^q,  and θ := 1/p.

It follows from the estimate in (2.36) that

|xi yi| / [ ( Σ_{i=1}^m |xi|^p )^{1/p} ( Σ_{i=1}^m |yi|^q )^{1/q} ] ≤ |xi|^p / ( p Σ_{i=1}^m |xi|^p ) + |yi|^q / ( q Σ_{i=1}^m |yi|^q )

for all i = 1, . . . , m. Summing up these inequalities gives us (2.37). □
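Hölder's inequality (2.37) is easy to spot-check numerically: the gap computed below is nonnegative whenever 1/p + 1/q = 1 (the helper name is an illustrative assumption).

```python
def holder_gap(xs, ys, p):
    """RHS − LHS of the Hölder inequality (2.37); nonnegative for p > 1."""
    q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1
    lhs = sum(abs(x * y) for x, y in zip(xs, ys))
    rhs = (sum(abs(x) ** p for x in xs) ** (1 / p)
           * sum(abs(y) ** q for y in ys) ** (1 / q))
    return rhs - lhs

# p = 2 recovers the Cauchy-Schwarz inequality:
assert holder_gap([1.0, -2.0, 3.0], [0.5, 1.0, -1.0], p=2.0) >= 0.0
assert holder_gap([1.0, -2.0, 3.0], [0.5, 1.0, -1.0], p=3.0) >= 0.0
print("Hölder gaps are nonnegative on the samples")
```

Equality holds when (|xi|^p) and (|yi|^q) are proportional, e.g. for parallel vectors in the p = 2 case.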

Proposition 2.122 Let f, g : R → R be summable functions of the corresponding degree on [a, b], and let γ(·) stand for the Lebesgue measure on this interval. Given p > 1 and q > 1 with 1/p + 1/q = 1, we have the inequality

∫_a^b |f g| dγ ≤ ( ∫_a^b |f|^p dγ )^{1/p} ( ∫_a^b |g|^q dγ )^{1/q}  (2.38)

whenever f ∈ L^p[a, b] and g ∈ L^q[a, b].
Proof. If either ( ∫_a^b |f|^p dγ )^{1/p} = 0 or ( ∫_a^b |g|^q dγ )^{1/q} = 0, then f = 0 a.e. or g = 0 a.e., respectively. Thus inequality (2.38) is satisfied in this case because its left-hand side is zero. Consider now the case where 0 < ( ∫_a^b |f|^p dγ )^{1/p} < ∞ and 0 < ( ∫_a^b |g|^q dγ )^{1/q} < ∞. For each x ∈ [a, b] we define the numbers

a := |f(x)|^p / ∫_a^b |f|^p dγ,  b := |g(x)|^q / ∫_a^b |g|^q dγ,

and then let θ := 1/p. It follows from (2.36) that

|f(x)g(x)| / [ ( ∫_a^b |f|^p dγ )^{1/p} ( ∫_a^b |g|^q dγ )^{1/q} ] ≤ |f(x)|^p / ( p ∫_a^b |f|^p dγ ) + |g(x)|^q / ( q ∫_a^b |g|^q dγ ).

Integrating both sides of this inequality, we arrive at (2.38). □
The next classical result is known as Young's inequality.

Proposition 2.123 Let p, q > 0 be such numbers that 1/p + 1/q = 1. Then we have the estimate

|xy| ≤ |x|^p / p + |y|^q / q whenever x, y ∈ R.

Proof. It suffices to apply (2.36) with a := |x|^p, b := |y|^q, and θ = 1/p. □

2.4.3 Operations Preserving Convexity of Functions

Now we come back to the general setting of vector spaces and note first that
the convexity of functions is obviously a unilateral notion meaning that −f
may not be convex for a convex function f as, e.g., for f (x) = |x| on R.
Furthermore, convexity is not preserved under some simple operations even
over linear functions such as taking the minimum; see, e.g., min{x, −x} =
−|x|. However, many operations particularly important in convex analysis
and applications preserve convexity. We discuss them in this subsection.

Proposition 2.124 Let X be a vector space, and let f, fi : X → R be convex functions for all i = 1, . . . , m. Then the following functions are convex as well:

(a) The multiplication by scalars λf for any λ ≥ 0.
(b) The sum function Σ_{i=1}^m fi.
(c) The maximum function max_{1≤i≤m} fi.

Proof. The convexity of the scalar multiplication λf as λ ≥ 0 follows directly from the definition. Let us check that the sum of two convex functions f1 + f2 is convex. The case of finitely many functions under summation easily follows by induction. To proceed, pick any x, y ∈ X and λ ∈ (0, 1). Then we get

(f1 + f2)(λx + (1 − λ)y) = f1(λx + (1 − λ)y) + f2(λx + (1 − λ)y)
 ≤ λf1(x) + (1 − λ)f1(y) + λf2(x) + (1 − λ)f2(y)
 = λ(f1 + f2)(x) + (1 − λ)(f1 + f2)(y),

and thus the sum function f1 + f2 is convex.

Likewise, it is sufficient to consider only two functions under the maximum operation. Denote g := max{f1, f2} and get for x, y ∈ X and λ ∈ (0, 1) that

fi(λx + (1 − λ)y) ≤ λfi(x) + (1 − λ)fi(y) ≤ λg(x) + (1 − λ)g(y)

as i = 1, 2. This readily implies that

g(λx + (1 − λ)y) = max{ f1(λx + (1 − λ)y), f2(λx + (1 − λ)y) } ≤ λg(x) + (1 − λ)g(y),

which therefore verifies the convexity of the maximum function on X. □

The next result concerns the preservation of convexity under compositions.

Proposition 2.125 Let X be a vector space. Suppose that f : X → R is convex, and let ϕ : R → R be nondecreasing and convex on a convex set containing the range of the function f. Then the composition ϕ ◦ f is convex.

Proof. Picking x1, x2 ∈ X and λ ∈ (0, 1), we have by the convexity of f that

f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2).

The nondecreasing and convexity properties of ϕ imply that

(ϕ ◦ f)(λx1 + (1 − λ)x2) = ϕ(f(λx1 + (1 − λ)x2))
 ≤ ϕ(λf(x1) + (1 − λ)f(x2))
 ≤ λϕ(f(x1)) + (1 − λ)ϕ(f(x2))
 = λ(ϕ ◦ f)(x1) + (1 − λ)(ϕ ◦ f)(x2),

which verifies the convexity of the composition ϕ ◦ f. □

Now we consider the composition of a convex function and an affine mapping.

Proposition 2.126 Let B : X → Y be an affine mapping between vector spaces, and let f : Y → R be a convex function on Y. Then the composition f ◦ B is convex on X.

Proof. Taking any x, y ∈ X and λ ∈ (0, 1), we have

(f ◦ B)(λx + (1 − λ)y) = f(B(λx + (1 − λ)y)) = f(λB(x) + (1 − λ)B(y))
 ≤ λf(B(x)) + (1 − λ)f(B(y)) = λ(f ◦ B)(x) + (1 − λ)(f ◦ B)(y),

which therefore justifies the convexity of the composition f ◦ B. □

The next result deals with the supremum of convex functions over an
arbitrary index set. It largely extends the statement of Proposition 2.124(c).
Proposition 2.127 Let X be a vector space, and let fi : X → R for i ∈ I
be a collection of convex functions with a nonempty index set I. Then the
supremum function f (x) := supi∈I fi (x) is convex on X.

Proof. Fix x1, x2 ∈ X and λ ∈ (0, 1). For every i ∈ I we have

fi(λx1 + (1 − λ)x2) ≤ λfi(x1) + (1 − λ)fi(x2) ≤ λf(x1) + (1 − λ)f(x2),

which implies in turn that

f(λx1 + (1 − λ)x2) = sup_{i∈I} fi(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)

and thus verifies the convexity of the supremum function. □

Further we turn to a major class of functions having a variational structure and being highly important in many aspects of analysis and applications, not only for those related to optimization. Recall that the notion and notation of set-valued mappings/multifunctions used below were introduced and partly discussed at the end of Subsection 2.1.1.

Definition 2.128 Given F : X →→ Y and ϕ : X × Y → R, the optimal value or marginal function associated with F and ϕ is defined by

μ(x) := inf{ϕ(x, y) | y ∈ F(x)} for x ∈ X.  (2.39)

In this section we assume that μ(x) > −∞ for every x ∈ X and also use the
convention that inf(∅) := ∞ in this definition and throughout the book.
The following theorem shows that convexity is preserved in the general
settings of marginal functions.

Theorem 2.129 Let X and Y be vector spaces. Assume that ϕ : X × Y → R is a convex function and that F : X →→ Y is a convex set-valued mapping. Then the optimal value function μ in (2.39) is convex.

Proof. Pick x1, x2 ∈ dom(μ), λ ∈ (0, 1), and ε > 0. Then find yi ∈ F(xi) with

ϕ(xi, yi) < μ(xi) + ε, i = 1, 2.

It directly implies the inequalities

λϕ(x1, y1) < λμ(x1) + λε, (1 − λ)ϕ(x2, y2) < (1 − λ)μ(x2) + (1 − λ)ε.

Summing up these inequalities and employing the convexity of ϕ yield

ϕ(λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) ≤ λϕ(x1, y1) + (1 − λ)ϕ(x2, y2) < λμ(x1) + (1 − λ)μ(x2) + ε.

Furthermore, the convexity of gph(F) gives us

(λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) = λ(x1, y1) + (1 − λ)(x2, y2) ∈ gph(F),

and therefore λy1 + (1 − λ)y2 ∈ F(λx1 + (1 − λ)x2). This implies that

μ(λx1 + (1 − λ)x2) ≤ ϕ(λx1 + (1 − λ)x2, λy1 + (1 − λ)y2) < λμ(x1) + (1 − λ)μ(x2) + ε.

Letting now ε ↓ 0 ensures the convexity of the optimal value function μ. □
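Theorem 2.129 can be illustrated with a one-dimensional marginal function. In the sketch below the choices ϕ(x, y) = x² + y² and F(x) = [x, ∞) are assumptions made for illustration: gph(F) = {(x, y) | y ≥ x} is convex, the infimum in (2.39) is attained at y = max(x, 0), and midpoint convexity of μ is checked on sample points.

```python
def mu(x):
    """Optimal value μ(x) = inf{ x² + y² : y ∈ F(x) } for F(x) = [x, ∞).

    The graph {(x, y) : y ≥ x} is convex, and the infimum over y ≥ x of y²
    is attained at y = max(x, 0)."""
    y = max(x, 0.0)
    return x * x + y * y

for x1, x2, lam in [(-2.0, 3.0, 0.25), (-1.0, -0.5, 0.5), (1.0, 4.0, 0.7)]:
    z = lam * x1 + (1 - lam) * x2
    assert mu(z) <= lam * mu(x1) + (1 - lam) * mu(x2) + 1e-12
print("μ is midpoint-convex on the samples, as Theorem 2.129 predicts")
```

Note that μ here is convex but only piecewise smooth: marginal functions typically lose differentiability even when ϕ is smooth.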

Next we consider a class of functions generated by sets via the infimum operation. Given a nonempty set G ⊂ X × R, define the function

fG(x) := inf{t ∈ R | (x, t) ∈ G}.  (2.40)

Note that since the set Ex in the following proposition is a subset of R, we may use its boundedness from below.

Proposition 2.130 Let X be a vector space, and let G ⊂ X × R. For any


fixed x ∈ X consider the set of real numbers Ex := {t ∈ R | (x, t) ∈ G} and
assume that it is nonempty, bounded from below, and closed. Then we have
the inclusion (x, fG (x)) ∈ G.

Proof. Employing (2.40) gives us a sequence {tk } ⊂ R such that (x, tk ) ∈ G


for all k ∈ N and that the sequence {tk } converges to the finite number fG (x).
Since Ex is closed, we pass to the limit as k → ∞ and arrive at the claimed
inclusion (x, fG (x)) ∈ G. 

Let us now define a property that provides a sufficient condition for the
equality epi(fG ) = G established in the next proposition.

Definition 2.131 Let X be a vector space. Then G ⊂ X × R is called epi-


graphically closed if for any x ∈ X the set Ex from Proposition 2.130 is
closed, bounded from below, and the following implication holds:

(x, t) ∈ G =⇒ (x, t′) ∈ G for all t′ ≥ t.
132 2 BASIC THEORY OF CONVEXITY

Proposition 2.132 Let X be a vector space, and let G be an epigraphically


closed subset of X × R. Then we have the equality epi(fG ) = G.
Proof. It suffices to show that epi(fG ) ⊂ G. Fix any (x, α) ∈ epi(fG ) and then
get fG (x) ≤ α. Proposition 2.130 tells us that (x, fG (x)) ∈ G. Furthermore,
the assumed epigraphical closedness property of G ensures that (x, α) ∈ G,
which therefore completes the proof. 
We also need the following simple observation.

Proposition 2.133 Let X be a vector space, and let f : X → R. Then


we always have fepi(f ) (x) = f (x) on X. Consequently, for any functions
f, g : X → R we get f (x) = g(x) on X if and only if epi(f ) = epi(g).

Proof. It easily follows from the definition that

fepi(f)(x) = inf{t ∈ R | (x, t) ∈ epi(f)} = inf{t ∈ R | t ≥ f(x)} = f(x)
for any x ∈ X, and thus we are done. 

Now we are ready to establish the preservation of convexity under taking


the infimum in the convex setting of (2.40).

Corollary 2.134 Let X be a vector space, and let G ⊂ X × R. Then the


function fG : X → R defined in (2.40) is convex provided that the set G is
convex and that fG (x) > −∞ for all x ∈ X.

Proof. Define the real-valued function ϕ : X × R → R by ϕ(x, γ) := γ and the


set-valued mapping F : X →→ R by
  
F (x) := t ∈ R  (x, t) ∈ G for x ∈ X.
Then we have fG = μ on X, where the marginal function μ is defined in (2.39).
Since ϕ is a convex function and F is a convex multifunction, the convexity
of fG follows directly from Theorem 2.129. 

The next extremality operation plays a significant role in convex analysis


and applications. Given a vector space X and functions fi : X → R for
i = 1, . . . , m, define the infimal convolution of these functions by

(f1 □ · · · □ fm)(x) := inf{ ∑_{i=1}^{m} fi(xi) | xi ∈ X, ∑_{i=1}^{m} xi = x }, x ∈ X. (2.41)

We now show that this operation preserves convexity.

Proposition 2.135 Let X be a vector space, and let fi : X → R for
i = 1, . . . , m be convex functions. Then the function g := (f1 □ · · · □ fm) : X → R
is also convex provided that (f1 □ · · · □ fm)(x) > −∞ for all x ∈ X.

Proof. Consider the convex set

G := ∑_{i=1}^{m} epi(fi) ⊂ X × R

and observe that g = fG, where fG is defined in (2.40). Indeed, fix any (x, t) ∈
G and find (xi, ti) ∈ epi(fi) for i = 1, . . . , m such that

(x, t) = ∑_{i=1}^{m} (xi, ti).

Then we clearly have the relationships

x = ∑_{i=1}^{m} xi and t = ∑_{i=1}^{m} ti ≥ ∑_{i=1}^{m} fi(xi) ≥ (f1 □ · · · □ fm)(x).

It follows from the definition of fG that fG(x) ≥ (f1 □ · · · □ fm)(x). To verify
next that fG(x) ≤ (f1 □ · · · □ fm)(x), take any xi ∈ X for i = 1, . . . , m with
x = ∑_{i=1}^{m} xi. If xi ∈ dom(fi) for all i = 1, . . . , m, then (x, ∑_{i=1}^{m} fi(xi)) =
∑_{i=1}^{m} (xi, fi(xi)) ∈ G, and so fG(x) ≤ ∑_{i=1}^{m} fi(xi). This inequality also holds
if xi ∉ dom(fi) for some i = 1, . . . , m. Employing (2.40) and (2.41) tells us
that fG(x) ≤ (f1 □ · · · □ fm)(x). Hence it follows from Corollary 2.134 that fG
is convex, which verifies the convexity of the infimal convolution (2.41). □
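The infimal convolution (2.41) can be computed by brute force in simple cases. In the sketch below (our own illustration, not from the book), f1(u) = |u| and f2(v) = v²/2; their infimal convolution is the classical Huber function, which is convex, in agreement with Proposition 2.135. The symbol □ is written as [] in the comments.

```python
# Brute-force infimal convolution on a grid; [] stands for the convolution
# symbol used in (2.41).
U_GRID = [-6.0 + 0.001 * i for i in range(12001)]   # discretization of X = R

def inf_conv(f1, f2, x):
    """(f1 [] f2)(x) = inf{ f1(u) + f2(x - u) : u in X }, restricted to the grid."""
    return min(f1(u) + f2(x - u) for u in U_GRID)

def huber(x):
    """Closed form of |.| [] (.)^2/2, the Huber function."""
    return 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5

for x in [-2.0, -0.5, 0.0, 0.7, 1.5, 3.0]:
    approx = inf_conv(abs, lambda v: 0.5 * v * v, x)
    assert abs(approx - huber(x)) < 1e-6
```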

If the function in question is not convex, it is useful in many situations to


consider its convexification, which is the function defined below.

Definition 2.136 Let X be a vector space, and let f : X → R. Then the


convexification of f is given by
co(f )(x) := fG (x) for all x ∈ X, (2.42)
where G := co(epi(f )), and where fG is taken from (2.40).

It follows from the definition that

(co(f))(x) = inf{t ∈ R | (x, t) ∈ co(epi(f))}, x ∈ X,
and that co(f ) is a convex function provided that (co(f ))(x) > −∞ for all
x ∈ X. This is the case, e.g., when f is bounded from below.
However, in some settings we do not get adequate results from the con-
vexification in (2.42) as shown in the next example; see Figure 2.17.

Example 2.137 Consider the following functions on the real line:

(a) Let f(x) := e^{−x²} for x ∈ R. Then (co(f))(x) = 0 for all x ∈ R.
(b) Let f(x) := x³ for x ∈ R. Then (co(f))(x) = −∞ for all x ∈ R. In this
case co(f) is not even a convex function by our definition.

Fig. 2.17. Example 2.137

Next we provide an explicit representation of the convexified function.

Proposition 2.138 Let f : X → R be a function on a vector space X. Then
for any x ∈ X we have the representation

(co(f))(x) = inf{ ∑_{i=1}^{m} λi f(xi) | λi ≥ 0, ∑_{i=1}^{m} λi = 1, xi ∈ X, ∑_{i=1}^{m} λi xi = x, m ∈ N }.

Proof. Fix x ∈ X and define the following subsets of the real line:

A := {t ∈ R | (x, t) ∈ co(epi(f))} and

B := { ∑_{i=1}^{m} λi f(xi) | λi ≥ 0, ∑_{i=1}^{m} λi = 1, xi ∈ X, ∑_{i=1}^{m} λi xi = x, m ∈ N }.

To justify the claimed representation, it suffices to show that inf(A) = inf(B).
We can easily see that B ⊂ A, and so inf(B) ≥ inf(A). To verify the opposite
inequality, fix t ∈ A and get (x, t) ∈ co(epi(f)). Thus there exist numbers
λi ≥ 0 with ∑_{i=1}^{m} λi = 1 and vectors (xi, ti) ∈ epi(f) such that

(x, t) = ∑_{i=1}^{m} λi(xi, ti).

It immediately implies that x = ∑_{i=1}^{m} λi xi and that

t = ∑_{i=1}^{m} λi ti ≥ ∑_{i=1}^{m} λi f(xi) ≥ inf(B).

In this way we arrive at inf(A) ≥ inf(B) and thus complete the proof. □
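Proposition 2.138 becomes computable in one dimension, where a standard Carathéodory-type reduction lets the infimum run over combinations of just two points. The sketch below (our own, on a truncated grid) approximates co(f) for f(x) = e^{−x²} from Example 2.137(a), whose convexification vanishes identically.

```python
import math

def f(x):
    # The function from Example 2.137(a).
    return math.exp(-x * x)

GRID = [-30.0 + 0.1 * i for i in range(601)]   # truncation of the real line

def co_f(x):
    """Approximate (co(f))(x) via two-point convex combinations hitting x."""
    best = f(x)                                # the trivial combination m = 1
    for x1 in GRID:
        for x2 in GRID:
            if x1 < x2 and x1 <= x <= x2:
                lam = (x2 - x) / (x2 - x1)     # lam*x1 + (1 - lam)*x2 = x
                best = min(best, lam * f(x1) + (1.0 - lam) * f(x2))
    return best

assert co_f(0.0) < 1e-6    # (co(f))(0) = 0 although f(0) = 1
assert co_f(1.5) < 1e-6
```

On the bounded grid the infimum cannot reach 0 exactly, but widening the grid drives the approximation toward the true value (co(f))(x) = 0.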
Now we define the closure of a function on a topological vector space and
study some of its properties prior to the subsequent convexification.

Definition 2.139 Given a function f : X → R on a topological vector space
X, define the closure f̄ of f by

f̄(x) := fcl(epi(f))(x) for all x ∈ X,

which can be equivalently written as

f̄(x) = inf{t ∈ R | (x, t) ∈ cl(epi(f))}, x ∈ X.

The following result establishes relationships between epigraphs of the


function in question and its closure.

Theorem 2.140 Let X be a topological vector space, and let f : X → R with
f̄(x) > −∞ for all x ∈ X. Then epi(f̄) = cl(epi(f)), and we also have

f̄(x) ≤ f(x) whenever x ∈ X.

If in addition g(x) ≤ f(x) for all x ∈ X and if the set epi(g) is closed, then
we get the closure estimate g(x) ≤ f̄(x) on X.

Proof. Definition (2.40) ensures that if G1, G2 ⊂ X × R and G1 ⊂ G2, then
fG2(x) ≤ fG1(x) for all x ∈ X. Since epi(f) ⊂ cl(epi(f)), it follows that

f̄(x) = fcl(epi(f))(x) ≤ fepi(f)(x) = f(x) on X.

Suppose now that g(x) ≤ f(x) on X and that the set epi(g) is closed. Then
we have epi(f) ⊂ epi(g), and so cl(epi(f)) ⊂ epi(g). It follows therefore that

g(x) = fepi(g)(x) ≤ fcl(epi(f))(x) = f̄(x) for all x ∈ X.

Since the inclusion cl(epi(f)) ⊂ epi(f̄) follows directly from the definition of
f̄, we only need to check the inclusion epi(f̄) ⊂ cl(epi(f)). Fix any (x, γ) ∈
epi(f̄) and get f̄(x) ≤ γ. Then Proposition 2.130 implies that (x, f̄(x)) ∈
cl(epi(f)), and thus (x, γ) ∈ cl(epi(f)). □

Finally in this subsection, we define the convex closure of a function given


on a topological vector space.

Definition 2.141 Consider an arbitrary function f : X → R on a topological
vector space X and define its convex closure by

co̅(f)(x) := fco̅(epi(f))(x) on X with co̅(epi(f)) := cl(co(epi(f))).

This can be equivalently written as

co̅(f)(x) = inf{t ∈ R | (x, t) ∈ cl(co(epi(f)))} for x ∈ X.

The next proposition summarizes some properties of the latter function


that can be easily derived from the results obtained above.

Proposition 2.142 Let X be a topological vector space, and let f : X → R
be such that co̅(f)(x) > −∞ for all x ∈ X. Then the convex closure of f has
the following properties:

(a) epi(co̅(f)) = cl(co(epi(f))).
(b) If f is convex, then co̅(f)(x) = f̄(x) on X.
(c) co̅(f)(x) ≤ f(x) for all x ∈ X. If in addition g(x) ≤ f(x) for all x ∈ X
and if the set epi(g) is closed and convex, then g(x) ≤ co̅(f)(x).

2.4.4 Continuity of Convex Functions

Our next topic in this section is the continuity of convex functions defined
on topological vector spaces. We say that an extended-real-valued function
f : X → R defined on a topological vector space X is continuous at x̄ ∈ X if
for any ε > 0 there exists a neighborhood U ⊂ dom(f) of x̄ such that

|f(x) − f(x̄)| < ε whenever x ∈ U.

Observe from the definition that if f is continuous at x̄, then x̄ ∈ int(dom(f)).
First we show that the continuity of a convex function follows from its
local boundedness from above.

Proposition 2.143 Let f : X → R be a convex function defined on a topological
vector space X. If f is bounded from above on a neighborhood of x̄, then
it is continuous at x̄.

Proof. First we examine the case where x̄ = 0 and f(x̄) = 0. Let V be a
symmetric neighborhood of the origin and let c > 0 be such that

f(x) ≤ c for all x ∈ V.

Our intention is to show that for any ε > 0 there exists a neighborhood W of
the origin for which we have the estimate

|f(x)| ≤ cε whenever x ∈ W, (2.43)

ensuring the continuity of f at the origin. Assume without loss of generality
that ε < 1 and let W := εV. For any x ∈ W = εV we have x/ε ∈ V, and so

f(x) = f(ε(x/ε) + (1 − ε)0) ≤ εf(x/ε) + (1 − ε)f(0) ≤ cε.

On the other hand, using the representation

0 = (ε/(1 + ε))(−x/ε) + (1/(1 + ε))x

and employing the convexity of f together with −x/ε ∈ V tell us that

0 = f(0) ≤ (ε/(1 + ε))f(−x/ε) + (1/(1 + ε))f(x) ≤ (ε/(1 + ε))c + (1/(1 + ε))f(x),

which yields −cε ≤ f(x). This shows therefore that condition (2.43) is satisfied,
and hence the function f is continuous at the origin.
In the general case, let U := V − x̄ and g(x) := f(x + x̄) − f(x̄). Then g is
bounded from above on the neighborhood U of the origin with g(0) = 0. Thus
g is continuous at 0, which implies the continuity of f at x̄. □
The next theorem provides characterizations of continuity of convex func-
tions on general topological vector spaces.
Theorem 2.144 Let X be a topological vector space, and let f : X → R be a
convex function. The following properties are equivalent:
(a) f is continuous at some point x ∈ X.
(b) f is bounded from above on a nonempty open subset of X.
(c) int(epi(f )) = ∅.
(d) int(dom(f )) = ∅ and f is continuous on int(dom(f )).
Proof. We split the proof into several steps.

(a)=⇒(b): If f is continuous at x̄ ∈ X, then x̄ ∈ dom(f) and for ε = 1 there
exists an open neighborhood V of x̄ on which

|f(x) − f(x̄)| < 1 for all x ∈ V.

We get that f(x) < f(x̄) + 1 if x ∈ V, and so f is bounded from above on V.

(b)=⇒(a): This implication is proved in Proposition 2.143.

(b)=⇒(c): Suppose that f(x) ≤ c for all x ∈ V, where V is an open neighborhood
of x̄. Then we have (x̄, c + 1) ∈ V × (c, ∞), where the product set is an
open subset of epi(f). It verifies (x̄, c + 1) ∈ int(epi(f)).

(c)=⇒(d): Suppose that int(epi(f)) ≠ ∅. Fix (x̄, c) ∈ int(epi(f)) and find a
neighborhood V of the origin and an interval (α, β) containing c such that

(x̄, c) ∈ W × (α, β) ⊂ epi(f),

where W := x̄ + V. This yields f(x) ≤ c for all x ∈ W, and hence x̄ ∈ W ⊂
dom(f). It shows that x̄ ∈ int(dom(f)).
To complete the proof, it remains to verify that f is continuous on
int(dom(f)). Fix any w0 ∈ int(dom(f)) and choose t > 0 so small that
y0 := w0 + t(w0 − x̄) ∈ int(dom(f)). Then we get

w0 = (t/(1 + t))x̄ + (1/(1 + t))y0 = λx̄ + (1 − λ)y0 with λ := t/(1 + t).

Let us show that f is bounded from above on w0 + λV, which implies the
continuity of f at w0 by Proposition 2.143. Indeed, for any x = w0 + λv with
v ∈ V we deduce from the convexity of f that

f(x) = f(λx̄ + (1 − λ)y0 + λv) = f(λ(x̄ + v) + (1 − λ)y0)
≤ λf(x̄ + v) + (1 − λ)f(y0) ≤ λc + (1 − λ)f(y0) =: η,

and thus f is bounded from above by η ∈ R on w0 + λV. □
The next result is an infinite-dimensional counterpart of Corollary 2.95. It
can be derived either from the continuity characterizations of Theorem 2.144,
or similarly to the proof of Theorem 2.94 by using the convex separation theo-
rem in topological vector spaces taken from Theorem 2.58; see Exercise 2.221.
Corollary 2.145 Let X be a topological vector space, and let f : X → R be a
convex function such that int(epi(f)) ≠ ∅. Then we have

int(epi(f)) = {(x, λ) ∈ X × R | x ∈ int(dom(f)), λ > f(x)}. (2.44)

Proof. Since int(epi(f)) ≠ ∅, we deduce from the obtained implication
(c)=⇒(d) in Theorem 2.144 that int(dom(f)) ≠ ∅ and that f is continuous
on int(dom(f)). Fix now any (x0, λ0) ∈ int(epi(f)) and find a neighborhood
V of x0 and ε > 0 such that

(x0, λ0) ∈ V × [λ0 − ε, λ0 + ε] ⊂ epi(f).

It follows that f(x) ≤ λ0 − ε for all x ∈ V. Then x0 ∈ V ⊂ dom(f) and thus
f(x0) ≤ λ0 − ε < λ0, which shows that x0 ∈ int(dom(f)) and f(x0) < λ0.
To verify the opposite inclusion in (2.44), fix any (x0, λ0) ∈ X × R such
that x0 ∈ int(dom(f)) and λ0 > f(x0). Choose a number γ with f(x0) < γ <
λ0. The continuity of f allows us to find a neighborhood V of x0 for which
V ⊂ dom(f) and f(x) < γ whenever x ∈ V. It follows that

(x0, λ0) ∈ V × (γ, ∞) ⊂ epi(f),

and thus (x0, λ0) ∈ int(epi(f)), which justifies representation (2.44). □
The following general example shows that the conclusion of Corollary 2.145
may fail without assuming that int(epi(f )) = ∅.
Example 2.146 Let X be a topological vector space, and let f : X → R be
any real-valued convex function that is discontinuous at each point of X. Then
int(epi(f)) = ∅ by Theorem 2.144, while dom(f) = X and the set on the
right-hand side of (2.44) is nonempty. A particular example of such a function
is as follows. Consider the space X := C¹[0, 1] of continuously differentiable
real-valued functions endowed with the standard maximum norm and define
a convex function f : X → R by f(x) := |x′(0)|. It is easy to see that this
function is nowhere continuous on X.
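The function f(x) = |x′(0)| from Example 2.146 can be probed numerically. In the sketch below (our own choice of test functions), x_k(t) = sin(kt)/√k satisfies ‖x_k‖∞ ≤ 1/√k → 0 in the maximum norm while f(x_k) = √k → ∞, so f is unbounded on every ball around the origin and hence discontinuous there.

```python
import math

def sup_norm(x, n=10000):
    """Approximate maximum norm of x on [0, 1] over a grid."""
    return max(abs(x(i / n)) for i in range(n + 1))

KS = [1, 4, 16, 64, 256]
norms, values = [], []
for k in KS:
    xk = lambda t, k=k: math.sin(k * t) / math.sqrt(k)
    norms.append(sup_norm(xk))
    values.append(math.sqrt(k))   # f(x_k) = |x_k'(0)| = sqrt(k), computed by hand

assert all(nrm <= 1 / math.sqrt(k) + 1e-12 for nrm, k in zip(norms, KS))
assert norms[-1] < 0.07 and values[-1] == 16.0   # tiny ball, large f-value
```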
Now we turn to a class of functions that is well known in classical anal-
ysis and differential equations, but its crucial role has been more recently
recognized in modern nonsmooth and variational analysis. In the case of con-
vex functions it is closely related to continuity while playing its own highly
important role of “continuity at a linear rate” in more general frameworks.

Definition 2.147 Let f : X → R be a function on a normed space X.

(a) We say that f is Lipschitz continuous on a set Ω ⊂ dom(f) if there
exists a constant ℓ ≥ 0 such that

|f(u) − f(x)| ≤ ℓ‖u − x‖ for all u, x ∈ Ω.

(b) We say that f is locally Lipschitzian on an open set U ⊂ dom(f) if
for any x̄ ∈ U there exist δ > 0 and a constant ℓ ≥ 0 such that

|f(u) − f(x)| ≤ ℓ‖u − x‖ for all x, u ∈ B(x̄; δ) ⊂ U.

In preparation to establish the next major theorem, we present first the


following lemma of its own independent interest.

Lemma 2.148 Let X be a normed space. If a convex function f : X → R
is bounded from above on B(x̄; δ) for some element x̄ ∈ dom(f) and number
δ > 0, then f is bounded on B(x̄; δ).

Proof. Denote η := f(x̄) and take M > 0 with f(x) ≤ M for all x ∈ B(x̄; δ).
Picking any u ∈ B(x̄; δ), consider the point x := 2x̄ − u ∈ B(x̄; δ) with

η = f(x̄) = f((x + u)/2) ≤ (1/2)f(x) + (1/2)f(u).

This yields f(u) ≥ 2η − f(x) ≥ 2η − M, and thus f is bounded on B(x̄; δ). □

The next theorem shows that for convex functions on normed spaces the
local boundedness yields their local Lipschitz continuity.

Theorem 2.149 Let f : X → R be a convex function on a normed space X,
and let x̄ ∈ dom(f). If f is bounded from above on B(x̄; δ) for some δ > 0,
then f is Lipschitz continuous on B(x̄; δ/2).

Proof. Fix x, y ∈ B(x̄; δ/2) with x ≠ y and consider the point

u := x + (δ/(2‖x − y‖))(x − y).

Since e := (x − y)/‖x − y‖ ∈ B, we have u = x + (δ/2)e ∈ x̄ + (δ/2)B + (δ/2)B ⊂ x̄ + δB.
Denoting further α := δ/(2‖x − y‖) gives us u = x + α(x − y), and thus

x = (1/(α + 1))u + (α/(α + 1))y.

It follows from the convexity of f that

f(x) ≤ (1/(α + 1))f(u) + (α/(α + 1))f(y),

which implies in turn the inequalities

f(x) − f(y) ≤ (1/(α + 1))(f(u) − f(y)) ≤ (1/(α + 1))2M
= (2‖x − y‖/(δ + 2‖x − y‖))2M ≤ (4M/δ)‖x − y‖

with M := sup{|f(x)| | x ∈ B(x̄; δ)} < ∞ by Lemma 2.148. Interchanging
the roles of x and y above, we arrive at the estimate

|f(x) − f(y)| ≤ (4M/δ)‖x − y‖

and thus verify the Lipschitz continuity of f on B(x̄; δ/2). □
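The explicit Lipschitz modulus 4M/δ from Theorem 2.149 can be observed numerically. The sketch below (our own instance) takes f(x) = x² on X = R with x̄ = 0 and δ = 1, so that M = 1, and samples difference quotients on B(x̄; δ/2); the true Lipschitz constant there is 1, comfortably below the predicted bound 4.

```python
import random

f = lambda x: x * x          # convex, with sup{|f(x)| : |x| <= 1} = 1
delta, M = 1.0, 1.0
L = 4.0 * M / delta          # the modulus predicted by Theorem 2.149

random.seed(1)
worst = 0.0
for _ in range(10000):
    x = random.uniform(-delta / 2, delta / 2)
    y = random.uniform(-delta / 2, delta / 2)
    if x != y:
        worst = max(worst, abs(f(x) - f(y)) / abs(x - y))
assert worst <= L            # the sampled constant stays under the bound 4
```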

The following immediate consequence of Theorems 2.144 and 2.149 con-


tains major equivalences for convex functions defined on normed spaces.

Corollary 2.150 Let f : X → R be a convex function on a normed space X.
Then each equivalent property in Theorem 2.144 is also equivalent to the facts
that Ω := int(dom(f)) ≠ ∅ and f is locally Lipschitzian on Ω.

Fig. 2.18. Illustration of Lemma 2.151

The obtained result can be strengthened for functions on finite-dimensional


spaces. First we present the following finite-dimensional lemma of its own
interest; see Figure 2.18.

Lemma 2.151 Let {ei | i = 1, . . . , n} be the standard orthonormal basis of
Rn. Denote

A := {x̄ ± εei | i = 1, . . . , n}

for any fixed ε > 0. Then we have the properties:

(a) x̄ + γei ∈ co(A) for |γ| ≤ ε and i = 1, . . . , n.
(b) B(x̄; ε/n) ⊂ co(A).

Proof. To verify (a), take γ ∈ R with |γ| ≤ ε and represent it as γ =
t(−ε) + (1 − t)ε for some t ∈ [0, 1]. Then the inclusions x̄ ± εei ∈ A give us

x̄ + γei = t(x̄ − εei) + (1 − t)(x̄ + εei) ∈ co(A).

To prove (b), pick x ∈ B(x̄; ε/n) and get x = x̄ + (ε/n)u with ‖u‖ ≤ 1. We
can represent u via the basis vectors (e1, . . . , en) as

u = ∑_{i=1}^{n} λi ei,

where |λi| ≤ (∑_{i=1}^{n} λi²)^{1/2} = ‖u‖ ≤ 1 for every i. This tells us that

x = x̄ + (ε/n)u = x̄ + ∑_{i=1}^{n} (ελi/n)ei = (1/n) ∑_{i=1}^{n} (x̄ + ελi ei).

Denoting finally γi := ελi implies that |γi| ≤ ε. It follows from (a) that
x̄ + ελi ei = x̄ + γi ei ∈ co(A), and thus x ∈ co(A) since this point is defined
as a convex combination of some elements in co(A). □
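The proof of Lemma 2.151 is constructive, and the construction can be executed numerically. The sketch below (our own, in R⁴ with ε = 0.8) produces explicit convex weights on the 2n points x̄ ± εe_i and verifies that they reconstruct a given point of B(x̄; ε/n).

```python
import math, random

def decompose(x, xbar, eps):
    """Weights on the 2n points xbar - eps*e_i and xbar + eps*e_i following the
    proof of Lemma 2.151; valid whenever ||x - xbar|| <= eps/n."""
    n = len(x)
    gammas = [n * (xi - xbi) for xi, xbi in zip(x, xbar)]  # gamma_i = eps*lambda_i
    ts = [(eps - g) / (2.0 * eps) for g in gammas]         # gamma_i = t(-eps)+(1-t)eps
    w_minus = [t / n for t in ts]                          # weight of xbar - eps*e_i
    w_plus = [(1.0 - t) / n for t in ts]                   # weight of xbar + eps*e_i
    return w_minus, w_plus

random.seed(2)
n, eps = 4, 0.8
xbar = [random.uniform(-1.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, 1.0) for _ in range(n)]
nu = math.sqrt(sum(ui * ui for ui in u))
x = [xbi + (eps / n) * ui / nu for xbi, ui in zip(xbar, u)]  # point of B(xbar; eps/n)

w_minus, w_plus = decompose(x, xbar, eps)
assert all(w >= -1e-12 for w in w_minus + w_plus)            # convex weights
assert abs(sum(w_minus) + sum(w_plus) - 1.0) < 1e-9

# Reconstruct x from the convex combination of the points of A.
recon = []
for j in range(n):
    s = 0.0
    for i in range(n):
        s += w_minus[i] * (xbar[j] - (eps if i == j else 0.0))
        s += w_plus[i] * (xbar[j] + (eps if i == j else 0.0))
    recon.append(s)
assert max(abs(rj - xj) for rj, xj in zip(recon, x)) < 1e-9
```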
The obtained lemma leads us to yet another striking consequence of The-
orem 2.144 in the case of finite-dimensional spaces.
Corollary 2.152 Any extended-real-valued convex function f : Rn → R is
locally Lipschitz continuous on the interior of its domain int(dom(f )).
Proof. Let {ei | i = 1, . . . , n} be the standard orthonormal basis of Rn. Pick
x̄ ∈ int(dom(f)) and choose ε > 0 such that x̄ ± εei ∈ dom(f) for every i =
1, . . . , n. Considering the set A from Lemma 2.151, we get B(x̄; ε/n) ⊂ co(A)
by the second assertion of the lemma. Denote M := max{f(a) | a ∈ A} < ∞
over the finite set A. Using the representation

x = ∑_{i=1}^{m} λi ai with λi ≥ 0, ∑_{i=1}^{m} λi = 1, ai ∈ A

for any x ∈ B(x̄; ε/n) shows that

f(x) ≤ ∑_{i=1}^{m} λi f(ai) ≤ ∑_{i=1}^{m} λi M = M,

and so f is bounded from above on B(x̄; ε/n). Then Theorem 2.149 tells us
that f is locally Lipschitz continuous on int(dom(f)). □

As an immediate consequence of Corollary 2.152, we arrive at the next


impressive result in finite dimensions.
Corollary 2.153 Let f : Rn → R be a real-valued convex function on Rn .
Then it is locally Lipschitzian on the entire space Rn .
We will revisit this topic in Section 3.3 providing a subdifferential char-
acterization of local Lipschitz continuity and continuity of convex functions.

2.4.5 Lower Semicontinuity and Convexity

In this subsection we discuss the notion of lower semicontinuity for extended-


real-valued functions that are not generally convex. This notion plays a major
role in both convex and nonconvex frameworks. This is similar to the funda-
mental role of continuity in classical analysis. Here we discuss some basic
facts on lower semicontinuity and derive additional results in the presence of
convexity.
Definition 2.154 Let X be a topological space. An extended-real-valued function
f : X → R is lower semicontinuous (l.s.c.) at x̄ ∈ X if for every
α ∈ R with f(x̄) > α there exists a neighborhood V of x̄ such that

f(x) > α for all x ∈ V. (2.45)

We say that f is lower semicontinuous on X (or simply l.s.c., without
mentioning X) if it is l.s.c. at every point x̄ ∈ X.
The next examples illustrate this notion.
Example 2.155 Consider the following extended-real-valued functions on the
real line.

(a) Define f : R → R by

f(x) := x² if x ≠ 0 and f(x) := −1 if x = 0.

It is easy to see that it is l.s.c. at x̄ = 0 and in fact on the entire real line.
On the other hand, the replacement of f(0) = −1 by f(0) = 1 destroys
the lower semicontinuity at x̄ = 0.
(b) Consider the indicator function of the closed interval [−1, 1] given by

f(x) := 0 if |x| ≤ 1 and f(x) := ∞ otherwise.

Then f is l.s.c. on R. However, the small modification

g(x) := 0 if |x| < 1 and g(x) := ∞ otherwise

violates the lower semicontinuity on R. In fact, g is not l.s.c. at x = ±1,
while it is lower semicontinuous at any other point of the real line.
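Lower semicontinuity at a point can be probed by comparing f(x̄) with sampled infima over shrinking punctured neighborhoods, which approximate lim inf_{x→x̄} f(x) from below. The sketch below (our own) checks both variants of Example 2.155(a) at x̄ = 0.

```python
def liminf_at_zero(f, radii=(1e-1, 1e-2, 1e-3)):
    """Lower estimate of liminf_{x -> 0} f(x): the infima over shrinking
    punctured neighborhoods increase toward the liminf."""
    vals = []
    for r in radii:
        pts = [r * (i - 500) / 500 for i in range(1001) if i != 500]  # x != 0
        vals.append(min(f(x) for x in pts))
    return max(vals)

f_lsc = lambda x: x * x if x != 0 else -1.0   # l.s.c. version of Example 2.155(a)
f_bad = lambda x: x * x if x != 0 else 1.0    # the modification with f(0) = 1

assert liminf_at_zero(f_lsc) >= f_lsc(0)      # l.s.c. at 0:  liminf = 0 >= -1
assert liminf_at_zero(f_bad) < f_bad(0)       # not l.s.c.:   liminf = 0 <  1
```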

In what follows we present several useful characterizations of lower semi-


continuity for extended-real-valued functions in various settings.
Proposition 2.156 Let f : X → R be given on a topological space X. Then
f is l.s.c. at x̄ ∈ dom(f) if and only if for any number ε > 0 there exists a
neighborhood V of x̄ such that

f(x̄) − ε < f(x) whenever x ∈ V. (2.46)

Proof. Suppose that f is l.s.c. at x̄. Since λ := f(x̄) − ε < f(x̄), there exists
a neighborhood V of x̄ with

f(x̄) − ε = λ < f(x) for all x ∈ V.

Conversely, suppose that for any ε > 0 there is a neighborhood V of x̄ such
that (2.46) holds. Fix λ < f(x̄) and choose ε > 0 with λ < f(x̄) − ε. Then

λ < f(x̄) − ε < f(x) whenever x ∈ V

on some neighborhood V of x̄, so we complete the proof. □

As the following example shows, the conclusion of Proposition 2.156 is not
true in general if we do not assume that x̄ ∈ dom(f); see Figure 2.19.

Example 2.157 Consider the function

f(x) := 1/|x| if x ≠ 0 and f(x) := ∞ if x = 0.

We can see that f is l.s.c., but for any ε > 0 and any neighborhood V of x̄ = 0
there exists x ∈ V such that f(x) < f(x̄) − ε = ∞.

Proposition 2.158 Let X be a topological space. Then f : X → R is lower


semicontinuous if and only if for any number α ∈ R the sublevel set {x ∈
X | f (x) ≤ α} is closed.

Proof. Suppose that f is l.s.c. Pick any α ∈ R and denote

Ω := {x ∈ X | f(x) ≤ α}.

If x̄ ∉ Ω, then f(x̄) > α. By (2.45) we can find a neighborhood V of x̄ such
that

f(x) > α for all x ∈ V,

which implies that x̄ ∈ V ⊂ X \ Ω. Thus the complement X \ Ω of Ω is open,
and so the set Ω is closed.
To verify the converse implication, suppose that the set {x ∈ X | f(x) ≤
α} is closed for any α ∈ R. Fix x̄ ∈ X and α ∈ R with f(x̄) > α and then let

Fig. 2.19. A lower semicontinuous function

Θ := {x ∈ X | f(x) ≤ α}.

Since Θ is closed and since x̄ ∉ Θ, there exists a neighborhood V of x̄ satisfying
the inclusion V ⊂ X \ Θ. This tells us that

f(x) > α for all x ∈ V,

which justifies the lower semicontinuity of f at x̄ and completes the proof of
the proposition. □

The following proposition provides an equivalent geometric description of


lower semicontinuity of extended-real-valued functions on topological spaces.

Proposition 2.159 Let X be a topological space. Then f : X → R is l.s.c. if


and only if its epigraph is closed.

Proof. To verify the “only if” part, fix (x̄, α) ∉ epi(f) meaning that α < f(x̄).
For any ε > 0 with f(x̄) > α + ε > α we find a neighborhood V of x̄ such
that f(x) > α + ε whenever x ∈ V. This gives us the inclusion

V × (α − ε, α + ε) ⊂ (X × R) \ epi(f),

and hence the set epi(f) is closed.
Conversely, suppose that epi(f) is closed. Pick an arbitrary point x̄ ∈ X
and choose α ∈ R with α < f(x̄). Then (x̄, α) ∉ epi(f). The closedness of
epi(f) gives us a neighborhood V of x̄ and a number δ > 0 ensuring that

V × (α − δ, α + δ) ⊂ (X × R) \ epi(f).

For any x ∈ V we see that (x, α) ∉ epi(f), and so f(x) > α. This verifies that
the function f is l.s.c. at x̄ and hence on the entire space X. □

The next proposition establishes a sequential description of lower semicon-


tinuity of functions defined on metric spaces.
Proposition 2.160 Let X be a metric space. Then f : X → R is l.s.c. at
x̄ ∈ X if and only if for any sequence xk → x̄ we have lim inf_{k→∞} f(xk) ≥
f(x̄).

Proof. If f is l.s.c. at x̄, take a sequence xk → x̄ and for any α < f(x̄)
find δ > 0 such that (2.45) holds with V := B(x̄; δ). Then xk ∈ B(x̄; δ), and
so α < f(xk) for all large k ∈ N. It follows therefore that α ≤ lim inf_{k→∞} f(xk),
and hence we arrive at f(x̄) ≤ lim inf_{k→∞} f(xk) by the arbitrary choice of
α < f(x̄).
To verify the converse implication, suppose on the contrary that f is not
l.s.c. at x̄. Then there exists α < f(x̄) such that for every δ > 0 we have
xδ ∈ B(x̄; δ) with α ≥ f(xδ). Applying the latter inequality to δk := 1/k
gives us a sequence {xk} ⊂ X converging to x̄ with α ≥ f(xk). It yields

f(x̄) > α ≥ lim inf_{k→∞} f(xk),

a contradiction, which thus completes the proof. Observe from the proof that
the implication =⇒ above holds whenever X is a topological space. □

Now we address convex l.s.c. functions on LCTV spaces.

Proposition 2.161 Let f : X → R be a convex function on an LCTV space


X. Then f is l.s.c. if and only if it is weakly l.s.c. on X.

Proof. Take any α ∈ R and consider the sublevel set

Lα := {x ∈ X | f(x) ≤ α}. (2.47)
Since Lα is convex, Lα is closed if and only if it is weakly closed. Thus the
conclusion of this proposition follows from the one in Proposition 2.158. 

The next lemma is needed to derive the main result of this subsection.
Lemma 2.162 Let X be a topological space, and let {Ωk}k∈N be a sequence
of nonempty closed compact subsets of X such that Ωk+1 ⊂ Ωk for all k ∈ N.
Then

⋂_{k=1}^{∞} Ωk ≠ ∅.

Proof. Suppose on the contrary that ⋂_{k=1}^{∞} Ωk = ∅. Then

Ω1 = Ω1 \ ⋂_{k=1}^{∞} Ωk ⊂ X \ ⋂_{k=1}^{∞} Ωk = ⋃_{k=1}^{∞} (X \ Ωk).

The collection of open sets {X \ Ωk}k∈N clearly covers Ω1, which is a compact
subset of X. Thus there exist k1 < k2 < . . . < km such that

Ω1 ⊂ ⋃_{i=1}^{m} (X \ Ωki) = X \ ⋂_{i=1}^{m} Ωki = X \ Ωkm.

This is a contradiction because Ωkm is a nonempty subset of Ω1. □

Remark 2.163 In the setting of Lemma 2.162, if X itself is compact and
{Ωk}k∈N is a sequence of nonempty closed subsets of X, then each set Ωk is
nonempty, closed, and compact. Thus ⋂_{k=1}^{∞} Ωk ≠ ∅.

The main result of this subsection is the following Weierstrass-type existence
theorem for minimizing l.s.c. functions on topological spaces.

Theorem 2.164 Let f : X → R be an l.s.c. function on a topological space X.
Suppose that there is α > inf{f(x) | x ∈ X} such that the sublevel set (2.47)
is compact. Then there exists x̄ ∈ X such that f(x̄) = inf{f(x) | x ∈ X}.

Proof. Choose a strictly decreasing sequence {αk} of real numbers for which
inf{f(x) | x ∈ X} < αk < α whenever k ∈ N and limk→∞ αk = inf{f(x) | x ∈
X}. Define the sets

Ωk := {x ∈ X | f(x) ≤ αk}, k ∈ N,

and note that each Ωk is nonempty and compact as a closed subset of a compact
set. Thus {Ωk}k∈N satisfies the assumptions of Lemma 2.162, which tells us
that ⋂_{k=1}^{∞} Ωk ≠ ∅. Picking further any x̄ ∈ ⋂_{k=1}^{∞} Ωk, we get that

f(x̄) ≤ αk for all k ∈ N.

This readily implies that f(x̄) = inf_{x∈X} f(x). □
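The role of lower semicontinuity in Theorem 2.164 can be seen on a compact interval. In the sketch below (our own), the non-l.s.c. modification of Example 2.155(a) has infimum 0 on [−1, 1] that is never attained, while its l.s.c. version attains it at x̄ = 0.

```python
g = lambda x: x * x if x != 0 else 1.0   # not l.s.c. at 0 (cf. Example 2.155)
f = lambda x: x * x                      # l.s.c. (indeed continuous) version

grid = [i / 10**5 for i in range(-10**5, 10**5 + 1)]   # dense sample of [-1, 1]
inf_g = min(g(x) for x in grid if x != 0)              # tends to 0 as the grid refines
assert inf_g < 1e-9 and all(g(x) > 0 for x in grid)    # infimum 0, never attained
assert f(0.0) == 0.0 == min(f(x) for x in grid)        # attained for the l.s.c. f
```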

The next striking consequence of Proposition 2.161 and Theorem 2.164


addresses only convex functions. It avoids the compactness requirement for
the existence of global optimal solutions in minimization while replacing it by
the sublevel set boundedness as in finite-dimensional spaces.

Theorem 2.165 Let X be a reflexive Banach space, and let f : X → R be a
convex l.s.c. function. Assume that the sublevel set (2.47) is bounded for some
α > inf{f(x) | x ∈ X}. Then there is x̄ ∈ X with f(x̄) = inf{f(x) | x ∈ X}.

Proof. Equip the space X with the weak topology. Since the sublevel set Lα
is a closed and bounded convex set, it is weakly compact in X. We know
from Proposition 2.161 that any convex l.s.c. function is weakly l.s.c. Then
the conclusion of this theorem follows directly from Theorem 2.164. □

The final statement of this section is an application of Theorem 2.164 to


metric projections onto closed convex sets in reflexive Banach spaces.

Corollary 2.166 Let Ω be a nonempty subset of a normed space X. Given
any x0 ∈ X, define the metric projection onto Ω by

Π(x0; Ω) := {w ∈ Ω | ‖x0 − w‖ = d(x0; Ω)} (2.48)

via the distance function (2.33). Then the metric projection (2.48) is nonempty
provided that X is a reflexive Banach space and that Ω is a closed convex set.

Proof. Define f(w) := ‖x0 − w‖ + δΩ(w) for w ∈ X and observe that

inf_{w∈X} f(w) = inf_{w∈Ω} f(w) = inf_{w∈Ω} ‖x0 − w‖ = d(x0; Ω).

The function f is obviously l.s.c. and convex as the sum of two l.s.c. convex
functions. Fixing α > inf_{w∈X} f(w) = d(x0; Ω), we get

Ωα := {w ∈ X | f(w) ≤ α} = {w ∈ Ω | f(w) ≤ α}
= {w ∈ Ω | ‖x0 − w‖ ≤ α} ⊂ Ω ∩ B(0; r),

where r := ‖x0‖ + α. This tells us that the set Ωα is bounded. Invoking now
Theorem 2.165, we find w0 ∈ X such that

f(w0) = inf_{w∈X} f(w) = d(x0; Ω).

It ensures that w0 ∈ Ω (since otherwise f(w0) = ∞), and so ‖x0 − w0‖ =
d(x0; Ω). Thus we get w0 ∈ Π(x0; Ω) and complete the proof. □
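Corollary 2.166 can be illustrated in R², where the minimization defining the metric projection can be carried out by brute force. The sketch below (our own; Ω is the closed unit ball, for which the projection of an exterior point x0 has the closed form x0/‖x0‖) compares the grid minimizer with the exact projection.

```python
import math

def project_grid(x0, n=400):
    """Brute-force minimizer of ||x0 - w|| over a grid sample of the unit ball."""
    best, best_w = float("inf"), None
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            w = (i / n, j / n)
            if w[0] ** 2 + w[1] ** 2 <= 1.0 + 1e-12:   # w in Omega (float slack)
                d = math.hypot(x0[0] - w[0], x0[1] - w[1])
                if d < best:
                    best, best_w = d, w
    return best_w

x0 = (3.0, 4.0)                                        # ||x0|| = 5, outside Omega
exact = (x0[0] / 5.0, x0[1] / 5.0)                     # closed-form projection
approx = project_grid(x0)
assert math.hypot(approx[0] - exact[0], approx[1] - exact[1]) < 5e-3
```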

2.5 Extended Relative Interiors in Infinite Dimensions


As seen above, the relative interior notion for convex sets in finite dimensions
allows us to obtain more general and less restrictive results in geometric and
functional finite-dimensional settings in comparison with the interior notion
for sets that relates to the continuity of functions in infinite dimensions. In
fact, the nonempty interior assumption on sets and the continuity assumption
on functions may significantly limit the spectrum of applications of convex
analysis to important classes of problems in optimization, economics, optimal
control, etc.; see Section 2.7 for more discussions.
This section introduces some extended notions of the relative interior for
convex sets in topological vector spaces, presents their basic properties, and
develops some applications to the study of convex sets, set-valued mappings,
and extended-real-valued functions in infinite-dimensional spaces.

2.5.1 Intrinsic Relative and Quasi-Relative Interiors

Given a convex subset Ω of a topological vector space X, the relative interior
of Ω is defined by

ri(Ω) := {x ∈ Ω | ∃ a neighborhood V of x with V ∩ cl(aff(Ω)) ⊂ Ω}. (2.49)

If X is finite-dimensional, this notion reduces to the one formulated in Def-


inition 2.72, since the closure operation is not needed in (2.49) due to the
automatic closedness of affine sets in finite dimensions.
We begin with deriving some useful characterizations of relative interiors
of convex sets in Rn that are important for the subsequent extensions of this
notion to convex sets in infinite dimensions. One of the obtained character-
izations uses the notion of normals to convex sets, which is systematically
studied and applied in Chapter 3. Nevertheless, for the reader’s convenience
it makes sense to preliminarily present here the definition of the normal cone
to a convex subset Ω of a topological vector space as follows:

N(x̄; Ω) := {x* ∈ X* | ⟨x*, x − x̄⟩ ≤ 0 for all x ∈ Ω} for x̄ ∈ Ω. (2.50)
Theorem 2.167 Let Ω be a nonempty convex set in Rn, and let x̄ ∈ Rn. The
following properties are equivalent:

(a) x̄ ∈ ri(Ω).
(b) x̄ ∈ Ω and for every x ∈ Ω with x ≠ x̄ there exists a vector u ∈ Ω such
that x̄ ∈ (x, u).
(c) x̄ ∈ Ω and cone(Ω − x̄) is a linear subspace of Rn.
(d) x̄ ∈ Ω and cl(cone(Ω − x̄)) is a linear subspace of Rn.
(e) x̄ ∈ Ω and the normal cone N(x̄; Ω) is a linear subspace of Rn.

Proof. Implication (a)=⇒(b) is a part of Corollary 2.93.

To prove (b)=⇒(c), it suffices to show that for every a ∈ K := cone(Ω − x̄)
we get −a ∈ K. Fix any a ∈ K and find by definition t ≥ 0 and w ∈ Ω such
that a = t(w − x̄). If w = x̄, we have a = 0 and hence −a = 0 ∈ K. In the
case where w ≠ x̄, take u ∈ Ω with x̄ ∈ (w, u) and then find γ > 0 for which
w = x̄ + γ(x̄ − u). Thus it follows that

−a = t(x̄ − w) = −tγ(x̄ − u) = tγ(u − x̄) ∈ K,

which implies that K is a linear subspace of Rⁿ.
Implication (c)=⇒(d) is obvious since linear subspaces are closed in Rⁿ.
To verify implication (d)=⇒(e), fix any v ∈ N(x̄; Ω) and show that −v ∈
N(x̄; Ω). We have from the normal cone definition (2.50) that ⟨v, x − x̄⟩ ≤ 0
for all x ∈ Ω. Using this and denoting K := cl cone(Ω − x̄) tells us that ⟨v, z⟩ ≤ 0
for all z ∈ K. Since K is a linear subspace, for any x ∈ Ω we get that
x − x̄ ∈ K and hence

x̄ − x = lim_{k→∞} t_k(w_k − x̄),

where t_k ≥ 0 and w_k ∈ Ω for every k ∈ N. It follows therefore that

⟨−v, x − x̄⟩ = lim_{k→∞} t_k⟨v, w_k − x̄⟩ ≤ 0,

which yields −v ∈ N(x̄; Ω). Thus N(x̄; Ω) is a linear subspace of Rⁿ.


2.5 Extended Relative Interiors in Infinite Dimensions 149

To justify finally implication (e)=⇒(a), suppose on the contrary that x̄ ∉
ri(Ω) while (e) holds. Then using the separation results from Theorem 2.88
gives us v ∈ Rⁿ such that

⟨v, x⟩ ≤ ⟨v, x̄⟩ for all x ∈ Ω   (2.51)

and also ensures the existence of x̂ ∈ Ω for which ⟨v, x̂⟩ < ⟨v, x̄⟩. Then (2.51)
implies that v ∈ N(x̄; Ω), and hence −v ∈ N(x̄; Ω). This yields

⟨−v, x̂ − x̄⟩ = ⟨v, x̄⟩ − ⟨v, x̂⟩ ≤ 0,

a contradiction, which shows that (a) holds and thus completes the proof. □
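In finite dimensions, characterization (e) lends itself to a quick numerical experiment. The following Python sketch (our own illustration, not part of the formal development) approximates N(x̄; Ω) for a polytope Ω = conv(V) by sampling unit directions and tests the subspace property through symmetry under d ↦ −d; the test set Ω = [0, 1] × {0} in R², the sample size, and the tolerance are arbitrary choices.

```python
import math

def in_normal_cone(d, vertices, xbar, tol=1e-9):
    # d ∈ N(x̄; Ω) for Ω = conv(vertices) iff <d, v - x̄> <= 0 at every vertex v
    return all(d[0]*(v[0]-xbar[0]) + d[1]*(v[1]-xbar[1]) <= tol for v in vertices)

def normal_cone_is_subspace(vertices, xbar, n=360):
    # the cone is a linear subspace iff it is symmetric under d -> -d;
    # we check this on n sampled unit directions
    for k in range(n):
        t = 2*math.pi*k/n
        d = (math.cos(t), math.sin(t))
        if in_normal_cone(d, vertices, xbar) and \
           not in_normal_cone((-d[0], -d[1]), vertices, xbar):
            return False
    return True

segment = [(0.0, 0.0), (1.0, 0.0)]   # Ω = [0,1] x {0}: convex, empty interior
print(normal_cone_is_subspace(segment, (0.5, 0.0)))  # midpoint, x̄ ∈ ri(Ω)
print(normal_cone_is_subspace(segment, (0.0, 0.0)))  # endpoint, x̄ ∉ ri(Ω)
```

At the midpoint the normal cone is the vertical axis, a linear subspace; at the endpoint it is a half-plane, so the symmetry test fails, in agreement with the equivalence (a)⇐⇒(e).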

The obtained finite-dimensional characterizations of relative interior moti-


vate the major extensions of this notion to infinite dimensions, which are
considered in what follows.

Definition 2.168 Let Ω be a convex subset of a topological vector space X.

(a) The intrinsic relative interior of Ω is the set

iri(Ω) := { x ∈ Ω | cone(Ω − x) is a linear subspace of X }.   (2.52)

(b) The quasi-relative interior of Ω is the set

qri(Ω) := { x ∈ Ω | cl cone(Ω − x) is a linear subspace of X }.   (2.53)

(c) We say that Ω ⊂ X is quasi-regular if qri(Ω) = iri(Ω).
Due to Theorem 2.167, both notions in Definition 2.168(a,b) reduce to
the relative interior of Ω in finite-dimensional spaces. The one in (a) is also
known under the name “intrinsic core” of Ω, which may be confusing; see
the corresponding commentaries in Section 2.7. Definition 2.168(c) designates
the property qri(Ω) = iri(Ω) by labeling the sets satisfying this condition as
quasi-regular ones. The latter property plays an important role in the subse-
quent results of this section. Some sufficient conditions for the quasi-regularity
property of convex sets are presented below.
It is not hard to check that if Ω is a nonempty convex subset of an LCTV
space X, then all three sets ri(Ω), iri(Ω), and qri(Ω) are also convex in
X, while each of them may be empty; see below.
To proceed further, we first present a simple equivalent description of
intrinsic relative interior points of nonempty convex sets. Recall that a point
x ∈ Ω is relatively absorbing for Ω if for every x ∈ Ω \ {x} there exists u ∈ Ω
such that x ∈ (x, u).

Proposition 2.169 Let Ω be a nonempty convex subset of a topological vector
space X, and let x̄ ∈ Ω. Then we have x̄ ∈ iri(Ω) if and only if x̄ is a relatively
absorbing point of the set Ω.

Proof. To verify the “if” part, observe that relatively absorbing points x̄ of Ω
can be equivalently described as follows: for any x ∈ Ω \ {x̄} there exists α > 1
such that (1 − α)x + αx̄ ∈ Ω. Pick now any nonzero vector v ∈ cone(Ω − x̄)
and find λ > 0 with λv + x̄ ∈ Ω. The relative absorbing property of x̄ gives
us a number α > 1 such that

(1 − α)(λv + x̄) + αx̄ = λ(α − 1)(−v) + x̄ ∈ Ω.

This yields −v ∈ cone(Ω − x̄), and so cone(Ω − x̄) is a linear subspace of X.
To justify the converse implication, take any x̄ ∈ iri(Ω) and x ∈ Ω with
x ≠ x̄. Since cone(Ω − x̄) is a linear subspace of X,

x − x̄ ∈ cone(Ω − x̄) and x̄ − x ∈ cone(Ω − x̄).

Choose t > 0 such that x̄ − x = t(w − x̄) with w ∈ Ω and denote α := 1 + 1/t.
Since α > 1, this tells us that

(α − 1)(x̄ − x) + x̄ = (1 − α)x + αx̄ ∈ Ω,

which means that x̄ is a relatively absorbing point of Ω. □
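The α > 1 description of relatively absorbing points used at the start of this proof can be probed numerically. In the Python sketch below (a toy illustration with Ω = [0, 1] ⊂ R and a fixed grid of test points, all our own choices), we exploit convexity: if some α₀ > 1 keeps (1 − α₀)x + α₀x̄ in Ω, then so does every α ∈ (1, α₀], so probing a decreasing sequence of values of α is an adequate finite test.

```python
def relatively_absorbing(s, sample, eps=1e-9):
    # probe the description from the proof: for each t in Ω \ {s},
    # some alpha > 1 must keep (1 - alpha)*t + alpha*s inside Ω = [0, 1]
    alphas = [1 + 10.0**(-j) for j in range(1, 8)]
    for t in sample:
        if abs(t - s) < eps:
            continue
        if not any(-eps <= (1 - a)*t + a*s <= 1 + eps for a in alphas):
            return False
    return True

grid = [k/100 for k in range(101)]
print(relatively_absorbing(0.5, grid))   # an interior point of [0, 1]
print(relatively_absorbing(0.0, grid))   # the endpoint: overshooting leaves Ω
```

Here iri([0, 1]) = (0, 1), and the test distinguishes the interior point 0.5 from the endpoint 0 exactly as Proposition 2.169 predicts.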

Note that Proposition 2.169 shows that the equivalence (b)⇐⇒(c) of
Theorem 2.167 holds for nonempty convex sets in any topological vector
space X. Furthermore, the other equivalence (d)⇐⇒(e) of Theorem 2.167
suggests a similar relationship in the general LCTV setting, which is
established in what follows.

Definition 2.170 Let X be a topological vector space. Given a subset Ω of
X, define the polar of Ω by

Ω◦ := { x∗ ∈ X∗ | ⟨x∗, w⟩ ≤ 1 for all w ∈ Ω }.

Given a subset Θ of X∗, we define the polar of Θ by

Θ◦ := { x ∈ X | ⟨z∗, x⟩ ≤ 1 for all z∗ ∈ Θ }.

It follows from the definition that if Ω is a cone in X, then

Ω◦ = { x∗ ∈ X∗ | ⟨x∗, w⟩ ≤ 0 for all w ∈ Ω }.

Similarly, for a cone Θ ⊂ X∗ we have

Θ◦ = { x ∈ X | ⟨z∗, x⟩ ≤ 0 whenever z∗ ∈ Θ }.

To obtain a desired normal cone characterization of the quasi-relative inte-


rior of convex sets, we first reveal the following properties of polars that are
of their own interest.

Proposition 2.171 Let C be a nonempty convex cone in an LCTV space X.
Then we have the relationship (C◦)◦ = cl C.

Proof. Observe that C◦ is a nonempty convex cone in X∗. It follows directly
from the definition that (C◦)◦ is a closed subset of X and C ⊂ (C◦)◦. Then
cl C ⊂ (C◦)◦. Fix now any x ∈ (C◦)◦ and suppose on the contrary that x ∉ cl C.
Employing the strict convex separation, find x∗ ∈ X∗ such that

⟨x∗, u⟩ ≤ 0 for all u ∈ C and ⟨x∗, x⟩ > 0.

Thus we get x∗ ∈ C◦ and ⟨x∗, x⟩ > 0, which contradicts the fact that x ∈
(C◦)◦ and completes the proof. □
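For the closed cone C = R²₊ (so that cl C = C), the bipolar relationship can be checked on sample points by brute force. The Python sketch below is our own finite-dimensional illustration; the sampling of dual directions and the tolerance are simplifications.

```python
import math

def in_bipolar_of_pos_orthant(x, n=720, tol=1e-9):
    # C = R^2_+ gives C° = {d : d1 <= 0, d2 <= 0}; then x ∈ (C°)° iff
    # <d, x> <= 0 for every sampled direction d lying in C°
    for k in range(n):
        t = 2*math.pi*k/n
        d = (math.cos(t), math.sin(t))
        if d[0] <= 0 and d[1] <= 0 and d[0]*x[0] + d[1]*x[1] > tol:
            return False
    return True

print(in_bipolar_of_pos_orthant((1.0, 2.0)))    # a point of C survives
print(in_bipolar_of_pos_orthant((-1.0, 1.0)))   # a point outside C is cut off
```

The membership test recovers exactly C, in agreement with (C◦)◦ = cl C = C for this closed cone.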

Proposition 2.172 Let Ω be a nonempty convex set in a topological vector
space X, and let x̄ ∈ Ω. Then

N(x̄; Ω) = Θ◦ = (cl Θ)◦, where Θ := cone(Ω − x̄).

Proof. Fix any x∗ ∈ N(x̄; Ω). It follows from the definition that

⟨x∗, x − x̄⟩ ≤ 0 for all x ∈ Ω.

This yields ⟨x∗, w⟩ ≤ 0 for all w ∈ Θ, and hence x∗ ∈ Θ◦. Taking further any
x∗ ∈ Θ◦ tells us that ⟨x∗, w⟩ ≤ 0 for all w ∈ Θ. Then for any x ∈ Ω we have
x − x̄ ∈ Θ, and so ⟨x∗, x − x̄⟩ ≤ 0. This implies that x∗ ∈ N(x̄; Ω), and thus
N(x̄; Ω) = Θ◦. It is also straightforward to show that Θ◦ = (cl Θ)◦. □

Now we are ready to establish the aforementioned normal cone
characterization of the quasi-relative interior.

Theorem 2.173 Let Ω be a nonempty convex subset of an LCTV space X,
and let x̄ ∈ Ω. Then we have

x̄ ∈ qri(Ω) ⇐⇒ [ N(x̄; Ω) is a linear subspace of X∗ ].   (2.54)

Proof. Suppose first that x̄ ∈ qri(Ω). It follows from the definition that the set
cl Θ, where Θ := cone(Ω − x̄), is a linear subspace of X. An easy exercise shows
that (cl Θ)◦ is also a linear subspace of X∗. Then Proposition 2.172 tells us that
N(x̄; Ω) = (cl Θ)◦ is a linear subspace of X∗. In the other direction, suppose that
N(x̄; Ω) is a linear subspace of X∗. Then we deduce from Proposition 2.171
and Proposition 2.172 that

N(x̄; Ω)◦ = (Θ◦)◦ = cl Θ.

Since N(x̄; Ω) is a linear subspace of X∗, the set N(x̄; Ω)◦ is also a linear
subspace of X. Thus cl Θ is a linear subspace of X, and therefore x̄ ∈ qri(Ω). □

Next we establish relationships between the notions of relative, intrinsic


relative, and quasi-relative interiors of convex sets in LCTV spaces.

Theorem 2.174 Let Ω be a convex subset of a topological vector space X.
Then we have the inclusions

ri(Ω) ⊂ iri(Ω) ⊂ qri(Ω).   (2.55)

If furthermore X is locally convex and ri(Ω) ≠ ∅, then the inclusions in (2.55)
become the equalities

ri(Ω) = iri(Ω) = qri(Ω).   (2.56)

Proof. We first show that ri(Ω) ⊂ iri(Ω). Take x̄ ∈ ri(Ω) and fix x ∈ Ω with
x ≠ x̄. It follows from (2.49) that x̄ ∈ Ω and there exists a neighborhood U
of x̄ for which we have the inclusion

U ∩ cl aff(Ω) ⊂ Ω.   (2.57)

Choose 0 < t < 1 so small that u := x̄ + t(x̄ − x) ∈ U. Then u ∈ aff(Ω) ⊂
cl aff(Ω), and we get from (2.57) that u ∈ Ω. It follows that

x̄ = (t/(1 + t)) x + (1/(1 + t)) u ∈ (x, u),

which therefore verifies by Proposition 2.169 that x̄ ∈ iri(Ω). This tells us
that ri(Ω) ⊂ iri(Ω). The other inclusion iri(Ω) ⊂ qri(Ω) in (2.55) is trivial,
since the subspace property of cone(Ω − x̄) clearly implies that the closure
cl cone(Ω − x̄) is also a linear subspace of X.
To prove the equalities in (2.56), it is sufficient to show that if ri(Ω) ≠ ∅
and x̄ ∈ qri(Ω), then x̄ ∈ ri(Ω). Suppose on the contrary that x̄ ∉ ri(Ω)
and begin with the case where x̄ = 0. If 0 ∉ cl Ω in this case, then the strict
separation theorem yields the existence of x∗ ∈ X∗ such that

⟨x∗, x⟩ < 0 for all x ∈ Ω.   (2.58)

In the complementary setting where 0 ∈ cl Ω \ ri(Ω), denote X₀ := cl aff(Ω)
and get 0 ∈ X₀, telling us that X₀ is a closed linear subspace of X. It is easy
to see that 0 ∉ ri(Ω) = int_{X₀}(Ω), where int_{X₀}(Ω) is the interior of Ω with
respect to the space X₀. Applying the separation result from Corollary 2.59 to
the sets Ω and {0} in the topological space X₀, we find x₀∗ ∈ X₀∗ ensuring that

⟨x₀∗, x⟩ ≤ 0 for all x ∈ Ω, and ⟨x₀∗, u⟩ < 0 for some u ∈ Ω.   (2.59)

Then the Hahn-Banach extension theorem from Theorem 2.37 shows that
there exists an extension x∗ ∈ X∗ of x₀∗ such that

⟨x∗, x⟩ ≤ 0 for all x ∈ Ω.

In either case there exists x∗ ∈ X∗ such that ⟨x∗, x⟩ ≤ 0 for all x ∈ Ω and
hence for all x ∈ cl cone(Ω). Since 0 ∈ qri(Ω), we have that cl cone(Ω) is a
linear subspace, and therefore

⟨x∗, x⟩ = 0 for all x ∈ cl cone(Ω).

This contradicts the conditions in (2.59) and also in (2.58), and thus verifies
that 0 ∈ ri(Ω). Turning finally to the general case of x̄, we reduce it to the
case where x̄ = 0 due to the obvious relationships

x̄ ∈ ri(Ω) ⇐⇒ 0 ∈ ri(Ω − x̄) = ri(Ω) − {x̄} and
x̄ ∈ qri(Ω) ⇐⇒ 0 ∈ qri(Ω − x̄) = qri(Ω) − {x̄},

which complete the proof of the theorem. □

It is well known that the condition ri(Ω) ≠ ∅ holds automatically for
nonempty convex sets in finite dimensions, so the equalities in Theorem 2.174
are always fulfilled there. However, it is not the case in many important
infinite-dimensional settings. In particular, it has been well recognized that
the natural ordering/positive cones in the standard Lebesgue spaces of
sequences ℓᵖ and functions Lᵖ[0, 1] for any p ∈ [1, ∞) have empty relative
interiors. Thus the usage of (2.49) significantly restricts the spectrum of
applications of infinite-dimensional convex analysis in various optimization
problems (particularly of its vector-valued and set-valued aspects), equilibria,
economic modeling, etc.; see Section 2.7 for more discussions.
As we see below, in the case where ri(Ω) = ∅ the inclusions in (2.55) may
be strict in the simplest infinite-dimensional Hilbert space of sequences ℓ²,
with both sets iri(Ω) and qri(Ω) being nonempty.

Example 2.175 Let X := ℓ², and let Ω ⊂ X be given by

Ω := { x = (x_k) ∈ X | ‖x‖₁ := ∑_{k=1}^∞ |x_k| ≤ 1 }.

We can check that iri(Ω) = { x ∈ X | ‖x‖₁ < 1 }, while

qri(Ω) = Ω \ { x = (x_k) ∈ X | ‖x‖₁ = 1, ∃ k₀ ∈ N
with x_k = 0 for all k ≥ k₀ }.   (2.60)

To verify the intrinsic relative interior representation, we first take any x ∈
iri(Ω) and show that ‖x‖₁ < 1 (for x ≠ 0; the case x = 0 is obvious). Fix
u = 0 ∈ Ω and find y ∈ Ω such that x = ty + (1 − t)u = ty for some t ∈ (0, 1).
Since ‖y‖₁ ≤ 1 and 0 < t < 1, we have ‖x‖₁ = ‖ty‖₁ = t‖y‖₁ < 1. To justify
the reverse inclusion, fix any x ∈ X with ‖x‖₁ < 1 and show that

cone(Ω − x) = ℓ¹,   (2.61)

which is a linear subspace of ℓ². Take any z ∈ ℓ¹ and choose t > 0 so small
that ‖x‖₁ + t‖z‖₁ ≤ 1. It follows that ‖x + tz‖₁ ≤ 1, and therefore x + tz ∈ Ω.
This implies that z ∈ cone(Ω − x) and hence the inclusion “⊃” in (2.61). Since
the other inclusion in (2.61) is obvious, we see that x ∈ iri(Ω).
To prove (2.60), observe that for any x ∈ Ω we have

N(x; Ω) = { z ∈ X | ⟨x, z⟩ = ‖z‖∞ }.   (2.62)

Indeed, z ∈ N(x; Ω) if and only if ⟨z, u − x⟩ ≤ 0 for all u ∈ Ω, which can be
equivalently written as

sup{ ⟨z, u⟩ | u ∈ Ω } = ⟨z, x⟩.

It is easy to check that Ω = { u ∈ ℓ¹ | ‖u‖₁ ≤ 1 }, and so

sup{ ⟨z, u⟩ | u ∈ Ω } = sup{ ⟨z, u⟩ | u ∈ ℓ¹, ‖u‖₁ ≤ 1 } = ‖z‖∞,

which clearly verifies the fulfillment of (2.62).
Now we pick any x ∈ qri(Ω) and suppose on the contrary that x does not
belong to the set on the right-hand side of (2.60). Then x ∈ ℓ² satisfies the
conditions

‖x‖₁ = 1, x_k = 0 for all k ≥ k₀ for some k₀ ∈ N.

Define z ∈ ℓ² by z_k := sign(x_k). Then ‖z‖∞ = 1 and

⟨x, z⟩ = ∑_{k=1}^∞ x_k z_k = ∑_{k=1}^∞ |x_k| = 1.

It follows from (2.62) that z ∈ N(x; Ω), which is a linear subspace of X. This
tells us that −z ∈ N(x; Ω), and so ⟨−z, 0 − x⟩ ≤ 0, which contradicts the
condition ⟨−z, 0 − x⟩ = ⟨z, x⟩ = 1. The obtained contradiction shows that x
belongs to the set on the right-hand side of (2.60).
To proceed further, fix any x from the set on the right-hand side of (2.60)
and suppose on the contrary that x ∉ qri(Ω). Then N(x; Ω) is not a linear
subspace of X. Thus we can find z ≠ 0 such that z ∈ N(x; Ω). It follows from
(2.62) that ⟨x, z⟩ = ‖z‖∞ ≠ 0, which yields

‖z‖∞ = ⟨x, z⟩ = ∑_{k=1}^∞ x_k z_k ≤ ∑_{k=1}^∞ |x_k|·|z_k| ≤ ‖z‖∞ ∑_{k=1}^∞ |x_k| = ‖z‖∞‖x‖₁ ≤ ‖z‖∞.

This implies that ‖x‖₁ = 1 and |z_k| = ‖z‖∞ > 0 whenever x_k ≠ 0. Since
z ∈ ℓ², we see that there exists k₀ ∈ N such that x_k = 0 for all k ≥ k₀, a
contradiction that gives us x ∈ qri(Ω) and completes the proof of (2.60).
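The exclusion mechanism behind (2.60) is easy to reproduce with a finitely supported point, truncating all sequences to finitely many coordinates. The Python sketch below is our own numerical illustration of the argument above, not a proof; the particular vector x is an arbitrary choice.

```python
# a finitely supported x with ||x||_1 = 1, as in the argument above
x = [0.5, 0.25, 0.25, 0.0, 0.0]
z = [(v > 0) - (v < 0) for v in x]            # z_k = sign(x_k)
pairing = sum(a*b for a, b in zip(x, z))      # <x, z> = sum_k |x_k| = ||x||_1
sup_norm = max(abs(v) for v in z)             # ||z||_inf
print(pairing == sup_norm == 1)               # so z ∈ N(x; Ω) by (2.62)
violation = sum((-a)*(0 - b) for a, b in zip(z, x))   # <-z, 0 - x> = <z, x>
print(violation)                              # positive, so -z ∉ N(x; Ω)
```

Since −z fails the normal cone inequality at u = 0, the cone N(x; Ω) is not a linear subspace, matching the exclusion of such x from qri(Ω).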

The next example demonstrates that the intrinsic relative interior may be
empty for convex subsets of ℓ².

Example 2.176 Let X := ℓ², and let Ω ⊂ X be given by

Ω := { x = (x₁, x₂, …) ∈ X | ‖x‖₂ := (∑_{k=1}^∞ |x_k|²)^{1/2} ≤ 1, x_k ≥ 0 for all k ∈ N }.

We are going to show that iri(Ω) = ∅ for this set. Assume on the contrary
that there exists x ∈ iri(Ω). Following the arguments in Example 2.175, we
have ‖x‖₂ < 1. To show now that x_k > 0 for all k ∈ N, suppose, e.g., that
x₁ = 0 and then let x̂ := (1, 0, 0, …) ∈ Ω. Since x ∈ iri(Ω), there exists z ∈ Ω
such that x = tx̂ + (1 − t)z for some t ∈ (0, 1), which readily implies that
z₁ = −t/(1 − t) < 0. This contradiction shows that x_k > 0 for all k ∈ N.
Proposition 2.169 tells us that whenever x̂ ∈ Ω we have (1 − α)x̂ + αx ∈ Ω
for some α > 1. Fix ε ∈ (0, 1] and select an increasing sequence of natural
numbers {k_n} with 0 < x_{k_n} ≤ ε/4ⁿ. Defining x̂ ∈ Ω by x̂_{k_n} := ε/2ⁿ and
x̂_k := 0 for all other k ∈ N, let us check that (1 − α)x̂ + αx ∉ Ω whenever
α > 1. Indeed, we have the estimate

((1 − α)x̂ + αx)_{k_n} ≤ (1 − α)(ε/2ⁿ) + α(ε/4ⁿ) < 0

for n sufficiently large, which justifies the claim of this example.

It can be directly checked that for the set Ω from Example 2.176 we
have qri(Ω) ≠ ∅. In fact, this remarkable property holds for any nonempty,
closed, and convex subset of every separable Banach space, and even in more
generality; see Theorem 2.178 below. To prove this major theorem, we first
present the following useful proposition, which provides a characterization of
qri(Ω) via nonsupport points of Ω. Recall that x̄ ∈ Ω is a nonsupport point
of this set if any closed supporting hyperplane to Ω at x̄ contains Ω.

Proposition 2.177 Let Ω be a nonempty convex subset of an LCTV space,
and let x̄ ∈ Ω. Then x̄ ∈ qri(Ω) if and only if x̄ is a nonsupport point of Ω.

Proof. Observe first that any nonsupport point x̄ of Ω can be equivalently
described as follows: whenever x∗ ∈ X∗ we have the implication

[ ⟨x∗, x − x̄⟩ ≥ 0 for all x ∈ Ω ] =⇒ [ ⟨x∗, x − x̄⟩ = 0 for all x ∈ Ω ].   (2.63)

Having this in mind, assume now that x̄ ∈ qri(Ω). Since

[ ⟨x∗, x − x̄⟩ ≥ 0 for all x ∈ Ω ] =⇒ [ ⟨x∗, u⟩ ≥ 0 for all u ∈ cl cone(Ω − x̄) ]

for any x∗ ∈ X∗, and since the set cl cone(Ω − x̄) is a linear subspace, we get

⟨x∗, u⟩ = 0 for all u ∈ cl cone(Ω − x̄), and so ⟨x∗, x − x̄⟩ = 0 for all x ∈ Ω.

The latter means by (2.63) that x̄ is a nonsupport point of Ω.
To verify the “if” part of the proposition, let x̄ be a nonsupport point of
Ω. Denote C := cl cone(Ω − x̄) and suppose on the contrary that C is not a
linear subspace of X, i.e., there exists v ∈ C with −v ∉ C. This yields by the
strict separation theorem that

⟨x∗, −v⟩ < ⟨x∗, u⟩ for all u ∈ C

for some x∗ ∈ X∗. Taking into account that C is a cone, we obtain that

⟨x∗, u⟩ ≥ 0 for all u ∈ C, and so ⟨x∗, x − x̄⟩ ≥ 0 for all x ∈ Ω.

Now we are ready to derive the aforementioned result on the existence of


quasi-relative interior points of convex sets in the Banach space setting.
Theorem 2.178 Let X be a separable Banach space, and let Ω be a nonempty,
closed, and convex subset of X. Then qri(Ω) ≠ ∅.

Proof. For brevity we present the proof for the case where Ω is closed and
bounded. The reader can consult [47, Theorem 2.19] for a more general version
of this theorem in separable Fréchet spaces, where the boundedness
assumption on Ω is dropped and the set in question is CS-closed.
To prove the theorem under the imposed assumptions, we use the
separability of X and select an arbitrary sequence {x_k} ⊂ Ω that is dense in Ω.
Consider further the vector

x̄ := ∑_{k=1}^∞ (1/2^k) x_k,

which is well defined due to the boundedness of {x_k} in X. Moreover, the
completeness of X and the closedness of Ω ensure that x̄ ∈ Ω. We claim
that x̄ ∈ qri(Ω), which reduces by Proposition 2.177 to verifying that x̄ is a
nonsupport point of Ω. To show the latter, let us use the equivalent description
of nonsupport points given in (2.63). Take any x∗ ∈ X∗ satisfying the left-
hand part of the implication in (2.63). Then we have that ⟨x∗, x_k − x̄⟩ ≥ 0 for
all k ∈ N, and hence arrive at the relationships

0 = ⟨x∗, x̄ − x̄⟩ = ∑_{k=1}^∞ (1/2^k)⟨x∗, x_k − x̄⟩ ≥ 0,

which imply that ⟨x∗, x_k − x̄⟩ = 0 for all k ∈ N. The density of {x_k} in Ω
yields the fulfillment of the property on the right-hand side of (2.63) and thus
verifies that x̄ is a nonsupport point of Ω. □
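The series construction in this proof can be imitated in a toy computation. Taking Ω = [0, 1], enumerating a dense sequence of rationals in it, and forming a truncated version of the series ∑_k 2^{-k} x_k produces a point strictly between the support points 0 and 1; the enumeration scheme and the truncation level in this Python sketch are our own choices.

```python
from fractions import Fraction

# enumerate rationals p/q in [0, 1] -- a dense sequence in Ω = [0, 1]
dense, seen = [], set()
for q in range(1, 20):
    for p in range(q + 1):
        r = Fraction(p, q)
        if r not in seen:
            seen.add(r)
            dense.append(r)

# truncated version of xbar = sum_k 2^{-k} x_k from the proof
xbar = sum(Fraction(1, 2**k) * xk for k, xk in enumerate(dense, start=1))
print(0 < xbar < 1)   # xbar avoids the support points {0, 1} of [0, 1]
```

In this one-dimensional picture the nonsupport points of [0, 1] form the open interval (0, 1), and the 2^{-k} weighting pulls the series value inside it.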

The next example illustrates that the separability assumption of
Theorem 2.178 is essential for the existence of quasi-relative interior points of
closed convex sets even in Hilbert spaces.

Example 2.179 Let S be an arbitrary uncountable set, and let X := ℓ²(S)
be the Hilbert space of square summable functions on S, i.e., X = L²_μ(S),
where μ is the counting measure on S. Consider the closed convex set, the
positive cone of ℓ²(S), given by

Ω := { x ∈ ℓ²(S) | x(s) ≥ 0 for all s ∈ S },

and prove that qri(Ω) = ∅. Relying on Proposition 2.177, it is sufficient to
show that the set Ω does not have nonsupport points, i.e., for any x̄ ∈ Ω
there exists x∗ ∈ X∗ such that the implication in (2.63) fails. Indeed, pick
any x̄ ∈ Ω and take s̄ ∈ S such that x̄(s̄) = 0. This is possible since we can
easily see that x̄(s) ≠ 0 for at most countably many s ∈ S. Define now x∗ ∈ X∗
by ⟨x∗, x⟩ := −x(s̄). Then ⟨x∗, x̄⟩ = sup_{x∈Ω}⟨x∗, x⟩ = 0, while there exists
u ∈ Ω such that ⟨x∗, u⟩ < 0. Thus (2.63) fails, and we arrive at qri(Ω) = ∅.

Theorem 2.178 obviously ensures the existence of intrinsic relative interior
points of convex sets that are quasi-regular. This property, introduced in
Definition 2.168(c), will be studied and applied in the subsequent subsections
of this section as well as in further parts of the book. As demonstrated by the
examples above, quasi-regularity is not necessary for the fulfillment of
iri(Ω) ≠ ∅. To the best of our knowledge, it is still an unsolved issue to derive
fairly general conditions ensuring the existence of intrinsic relative interior
points of convex sets in infinite dimensions that do not exhibit quasi-regularity.

Proposition 2.180 Let Ω be a convex subset of a topological vector space X.
The following assertions hold:
(a) If x̄ ∈ ri(Ω) and x̂ ∈ Ω, then (x̂, x̄] ⊂ ri(Ω).
(b) If x̄ ∈ iri(Ω) and x̂ ∈ Ω, then (x̂, x̄] ⊂ iri(Ω).
(c) Suppose further that X is locally convex. If x̄ ∈ qri(Ω) and x̂ ∈ Ω, then
(x̂, x̄] ⊂ qri(Ω).

Proof. To verify (a), fix x̄ ∈ ri(Ω) and x̂ ∈ Ω. If x̂ = x̄, then (x̂, x̄] = {x̄} ⊂
ri(Ω). Now suppose that x̂ ≠ x̄. Since x̄ ∈ ri(Ω), there exists a neighborhood
U of x̄ such that U ∩ cl aff(Ω) ⊂ Ω. For any λ ∈ [0, 1] we have

(1 − λ)(U ∩ cl aff(Ω)) + λx̂ ⊂ Ω.

Picking y ∈ (x̂, x̄] gives us y = (1 − λ₀)x̄ + λ₀x̂ for some λ₀ ∈ [0, 1). Then
V := (1 − λ₀)U + λ₀x̂ is a neighborhood of y. Observe further that

V ∩ cl aff(Ω) = (1 − λ₀)(U ∩ cl aff(Ω)) + λ₀x̂ ⊂ Ω.

This yields y ∈ ri(Ω), and thus justifies the claimed assertion (x̂, x̄] ⊂ ri(Ω).
To verify (b), fix x̄ ∈ iri(Ω) and x̂ ∈ Ω. Without loss of generality, assume
that x̄ = 0. Pick any λ ∈ [0, 1) and let x := λx̂. To get x ∈ iri(Ω), take any
y ∈ Ω with y ≠ x and show that there exists ŷ ∈ Ω such that x ∈ (y, ŷ).
Indeed, since x̄ = 0 ∈ iri(Ω), we have −αy ∈ Ω for some α > 0. Choosing

ŷ := (1 − δ)x̂ + δ(−αy) ∈ Ω

with δ := (1 − λ)/(1 + λα) ∈ (0, 1], tells us that

x = (1 − γ)y + γŷ,

where γ := (1 + λα)/(1 + α) ∈ (0, 1), and thus x ∈ (y, ŷ). Applying
Proposition 2.169 shows that x ∈ iri(Ω).
To prove the final assertion (c), fix x̄ ∈ qri(Ω) and x̂ ∈ Ω, and then let
y := λx̄ + (1 − λ)x̂ with λ ∈ (0, 1]. Using Theorem 2.173, we intend to show
that N(y; Ω) is a linear subspace. Indeed, fix any x∗ ∈ N(y; Ω) and then get

⟨x∗, x − y⟩ ≤ 0 for all x ∈ Ω.   (2.64)

Plugging x = x̄ into (2.64) gives us ⟨x∗, x̄ − x̂⟩ ≤ 0, while plugging x = x̂ into
(2.64) yields ⟨x∗, x̂ − x̄⟩ ≤ 0. Unifying the above, we arrive at the equality
⟨x∗, x̄⟩ = ⟨x∗, x̂⟩. Furthermore, it follows from (2.64) that

⟨x∗, x − y⟩ = ⟨x∗, x − x̂⟩ − λ⟨x∗, x̄ − x̂⟩ = ⟨x∗, x − x̄⟩ ≤ 0 for all x ∈ Ω,

which verifies that x∗ ∈ N(x̄; Ω). Since x̄ ∈ qri(Ω), Theorem 2.173 brings us
to −x∗ ∈ N(x̄; Ω). Taking into account that ⟨x∗, x̄⟩ = ⟨x∗, x̂⟩, we have

⟨−x∗, x − y⟩ = λ⟨−x∗, x − x̄⟩ + (1 − λ)⟨−x∗, x − x̂⟩ ≤ 0 whenever x ∈ Ω,

and hence −x∗ ∈ N(y; Ω). This tells us that N(y; Ω) is a linear subspace, and
the result follows from Theorem 2.173. □

2.5.2 Convex Separation via Extended Relative Interiors

In this section we derive enhanced versions of convex separation theorems


for nonsolid sets in LCTV spaces under extended relative interiority assump-
tions. Some applications of these results are given here to the study of quasi-
regularity of convex sets in infinite dimensions, while more applications of
enhanced convex separation are provided in the subsequent material.
We start with a version of proper separation of a singleton from a convex
set that gives us yet another characterization of quasi-relative interior.

Proposition 2.181 Let Ω be a convex set in an LCTV space X, and let
x̄ ∈ Ω. Then the sets Ω and {x̄} can be properly separated if and only if we
have the condition x̄ ∉ qri(Ω).

Proof. The conclusion follows directly from Proposition 2.177. Here we
provide an alternative proof by employing the normal cone characterization
of the quasi-relative interior. Indeed, by (2.54) we get that x̄ ∉ qri(Ω) if and
only if there exists x∗ ∈ N(x̄; Ω) with −x∗ ∉ N(x̄; Ω). It follows from the
normal cone construction (2.50) for convex sets that ⟨x∗, x⟩ ≤ ⟨x∗, x̄⟩ for
all x ∈ Ω. Then the condition −x∗ ∉ N(x̄; Ω) gives us x₀ ∈ Ω such that
⟨−x∗, x₀⟩ > ⟨−x∗, x̄⟩, which reads as ⟨x∗, x₀⟩ < ⟨x∗, x̄⟩ and hence justifies the
statement of the proposition. □

The next result provides a useful version of strict separation relative to


closed subspaces of Hilbert spaces.

Proposition 2.182 Let L be a closed linear subspace of a Hilbert space X,
and let Ω ⊂ L be a nonempty convex set with x̄ ∈ L and x̄ ∉ cl Ω. Then there
exists a vector u ∈ L such that

sup{ ⟨u, x⟩ | x ∈ Ω } < ⟨u, x̄⟩.

Proof. Since x̄ ∉ cl Ω, we see that the sets {x̄} and Ω are strictly separated in
X, which means that there exists a vector v ∈ X such that

sup{ ⟨v, x⟩ | x ∈ Ω } < ⟨v, x̄⟩.   (2.65)

It is well known that any Hilbert space X can be represented as the direct
sum X = L ⊕ L⊥, where

L⊥ := { w ∈ X | ⟨w, x⟩ = 0 for all x ∈ L }.

If v ∈ L⊥, then (2.65) immediately gives us a contradiction. Thus v ∈ X is
represented as v = u + w with some 0 ≠ u ∈ L and w ∈ L⊥. This implies that
for each x ∈ Ω ⊂ L we have

⟨u, x⟩ = ⟨u, x⟩ + ⟨w, x⟩ = ⟨v, x⟩ ≤ sup{ ⟨v, x⟩ | x ∈ Ω } < ⟨v, x̄⟩ = ⟨u + w, x̄⟩ = ⟨u, x̄⟩,

which shows that sup{ ⟨u, x⟩ | x ∈ Ω } < ⟨u, x̄⟩. □
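The projection argument of this proof can be visualized in R³ with L the x-y plane. In the Python sketch below (our own illustration; the sampled disk Ω, the point x̄, and the separating vector v are arbitrary choices), dropping the L⊥-component of a separating vector v preserves strict separation, exactly as in the proof.

```python
import math

# L = the x-y plane in R^3; Ω sampled on the unit circle in L; xbar ∈ L \ cl Ω
omega = [(math.cos(2*math.pi*k/100), math.sin(2*math.pi*k/100), 0.0)
         for k in range(100)]
xbar = (2.0, 0.0, 0.0)
v = (1.0, 0.0, 5.0)        # separating vector with a component w = (0,0,5) in L^perp

def ip(a, b):
    # inner product in R^3
    return sum(p*q for p, q in zip(a, b))

u = (v[0], v[1], 0.0)      # orthogonal projection of v onto L: the vector u ∈ L
print(max(ip(v, x) for x in omega) < ip(v, xbar))   # v separates strictly
print(max(ip(u, x) for x in omega) < ip(u, xbar))   # and so does u ∈ L
```

Since every x ∈ Ω and x̄ itself lie in L, the L⊥-component w contributes nothing to either side of the inequality, which is the whole content of the proof.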

Before establishing the main convex separation theorem for two nonsolid
sets in terms of their extended relative interiors in LCTV spaces, we present
some calculus rules involving both intrinsic relative and quasi-relative interiors
of convex sets. These rules are of their own interest while being instrumental
for deriving the aforementioned convex separation theorem.

Theorem 2.183 Let A : X → Y be a linear continuous mapping between two
LCTV spaces, and let Ω ⊂ X be a convex set. The following assertions hold:
(a) We always have the inclusions

A(iri(Ω)) ⊂ iri(A(Ω)) and A(qri(Ω)) ⊂ qri(A(Ω)).   (2.66)

(b) If iri(Ω) ≠ ∅, then we have the equality

A(iri(Ω)) = iri(A(Ω)).   (2.67)

(c) If qri(Ω) ≠ ∅ and if the set A(Ω) is quasi-regular, then

A(qri(Ω)) = qri(A(Ω)).   (2.68)

Proof. First we verify assertion (a). Fix any x̄ ∈ iri(Ω) and get that cone(Ω −
x̄) is a linear subspace of X. The linearity of A shows that

cone(A(Ω) − Ax̄) = A(cone(Ω − x̄))

is a linear subspace of Y. Thus Ax̄ ∈ iri(A(Ω)). This justifies the first inclusion
in (2.66). Now, pick any x̄ ∈ qri(Ω) and deduce from (2.54) that N(x̄; Ω) is a
linear subspace of X∗. Then take y∗ ∈ N(Ax̄; A(Ω)), meaning that ⟨y∗, Ax −
Ax̄⟩ ≤ 0 for all x ∈ Ω, which tells us that A∗y∗ ∈ N(x̄; Ω). By the subspace
property of N(x̄; Ω) we get that −A∗y∗ ∈ N(x̄; Ω), which is equivalent to
−y∗ ∈ N(Ax̄; A(Ω)). Thus the normal cone N(Ax̄; A(Ω)) is a linear subspace
of Y∗, and so Ax̄ ∈ qri(A(Ω)) by (2.54). This verifies (2.66).
To justify (b), it suffices to check the inclusion “⊃” in (2.67) under the
assumption that iri(Ω) ≠ ∅. Fix x̄ ∈ iri(Ω) and set ȳ := Ax̄. It follows from
(2.66) that ȳ ∈ iri(A(Ω)). Fix further any y ∈ iri(A(Ω)). If y = ȳ, then
y ∈ A(iri(Ω)). In the case where y ≠ ȳ, Proposition 2.169 shows that there is
u ∈ A(Ω) such that y ∈ (u, ȳ). Pick x̂ ∈ Ω with Ax̂ = u and get

y = tu + (1 − t)ȳ = tA(x̂) + (1 − t)Ax̄ = A(tx̂ + (1 − t)x̄)

for some t ∈ (0, 1). Since x̄ ∈ iri(Ω) and x̂ ∈ Ω, Proposition 2.180(b) shows
that (x̂, x̄] ⊂ iri(Ω). Thus x_t := tx̂ + (1 − t)x̄ ∈ iri(Ω) satisfies y = Ax_t. It
follows that y ∈ A(iri(Ω)).
Finally, we prove the inclusion “⊃” in assertion (c) under the assumptions
that qri(Ω) ≠ ∅ and A(Ω) is quasi-regular. Fix x̄ ∈ qri(Ω) and set ȳ := Ax̄.
By the second inclusion in (a) we have ȳ = Ax̄ ∈ A(qri(Ω)) ⊂ qri(A(Ω)).
Fix any y ∈ qri(A(Ω)) = iri(A(Ω)). If y = ȳ, then y ∈ A(qri(Ω)). If y ≠ ȳ,
by Proposition 2.169 there exists u ∈ A(Ω) such that y ∈ (u, ȳ). Pick x̂ ∈ Ω
such that Ax̂ = u and get

y = tu + (1 − t)ȳ = tA(x̂) + (1 − t)Ax̄ = A(tx̂ + (1 − t)x̄)

for some t ∈ (0, 1). Then y = Ax_t, where x_t := tx̂ + (1 − t)x̄ ∈ qri(Ω) by
Proposition 2.180(c). Thus y ∈ A(qri(Ω)), which completes the proof. □

The next theorem presents the major separation result for two nonsolid
convex sets in arbitrary LCTV spaces.

Theorem 2.184 Let Ω₁ and Ω₂ be convex subsets of an LCTV space X.
Assume that qri(Ω₁) ≠ ∅, qri(Ω₂) ≠ ∅, and the set difference Ω₁ − Ω₂ is
quasi-regular. Then Ω₁ and Ω₂ can be properly separated if and only if

qri(Ω₁) ∩ qri(Ω₂) = ∅.   (2.69)

Proof. First we verify that the assumptions of the theorem ensure that

qri(Ω₁ − Ω₂) = qri(Ω₁) − qri(Ω₂).   (2.70)

Indeed, define a linear continuous mapping A : X × X → X by A(x, y) := x − y
and let Ω := Ω₁ × Ω₂. It is easy to check that qri(Ω) = qri(Ω₁) × qri(Ω₂),
and thus qri(Ω) ≠ ∅ under the assumptions made. Applying formula (2.68)
from Theorem 2.183(c) to these objects A and Ω gives us

qri(Ω₁ − Ω₂) = qri(A(Ω)) = A(qri(Ω)) = qri(Ω₁) − qri(Ω₂),

and thus we arrive at the claimed equality (2.70).
Consider further the set difference Ω := Ω₁ − Ω₂ and get from (2.70) that
condition (2.69) reduces to

0 ∉ qri(Ω₁ − Ω₂) = qri(Ω₁) − qri(Ω₂),

and hence 0 ∉ qri(Ω) under the fulfillment of (2.69). Then
Proposition 2.181 tells us that the sets Ω and {0} can be properly separated,
which clearly ensures the proper separation of the sets Ω₁ and Ω₂.
To verify the converse implication, suppose that Ω₁ and Ω₂ can be properly
separated, which implies that the sets Ω = Ω₁ − Ω₂ and {0} can be properly
separated as well. Then using Proposition 2.181 and Theorem 2.183 yields

0 ∉ qri(Ω) = qri(Ω₁ − Ω₂) = qri(Ω₁) − qri(Ω₂),

and thus qri(Ω₁) ∩ qri(Ω₂) = ∅, which completes the proof. □

As seen above and will be seen in the sequel, the quasi-regularity of convex
sets is needed for the fulfillment of many important results. Theorem 2.174
tells us that the quasi-regularity of a convex set Ω holds in LCTV spaces if
ri(Ω) ≠ ∅ (in particular, for nonempty convex sets in finite dimensions), and
of course if Ω is a solid convex set. Next we reveal yet another general
infinite-dimensional setting where convex sets are quasi-regular.
Before establishing this result, let us present the following useful technical
lemma on intrinsic relative interiors.

Lemma 2.185 Let X be a topological vector space, and let Ω ⊂ X be a
nonempty, closed, and convex set with 0 ∈ Ω \ iri(Ω). If iri(Ω) ≠ ∅, then
aff(Ω) is a closed linear subspace of X and there exists a sequence {x_k} ⊂ −Ω
such that x_k ∉ Ω and x_k → 0 as k → ∞.

Proof. Using iri(Ω) ≠ ∅ and 0 ∈ Ω \ iri(Ω), we show that there is a nonzero
vector x₀ ∈ iri(Ω) with −tx₀ ∉ Ω for all t > 0. Suppose on the contrary that
−tx₀ ∈ Ω for some t > 0. Then it follows from Proposition 2.180(b) that

0 = (t/(1 + t))x₀ + (1/(1 + t))(−tx₀) ∈ iri(Ω),

which clearly contradicts the assumption 0 ∉ iri(Ω). Denoting x_k := −(x₀/k) ∈
−Ω tells us that x_k ∉ Ω for every k and that x_k → 0 as k → ∞. □
Let us now define the following property, which is automatic in finite
dimensions while being very important for performing limiting procedures in
infinite-dimensional spaces.

Definition 2.186 A subset Ω ⊂ X of a normed space X is sequentially
normally compact (SNC) at x̄ ∈ Ω if for any sequence {(x_k, x∗_k)} ⊂
X × X∗ we have the implication

[ x∗_k ∈ N(x_k; Ω), x_k ∈ Ω, x_k → x̄, x∗_k → 0 weak∗ ] =⇒ ‖x∗_k‖ → 0.   (2.71)

Remark 2.187 The SNC property (2.71) is taken from [228] and investigated
therein for general nonconvex sets in Banach spaces. In the case of closed and
convex subsets Ω ⊂ X of such spaces, this property can be characterized
as follows [228, Theorem 1.21]: If a closed and convex set Ω has nonempty
relative interior, then it is SNC at every x ∈ Ω if and only if the closure of
the span of Ω is of finite codimension.

Now we are ready to derive the aforementioned result on the quasi-regularity
of convex sets in infinite dimensions.

Theorem 2.188 Let Ω ⊂ X be a nonempty, closed, and convex subset of a
Hilbert space X. Assume in addition that iri(Ω) ≠ ∅, and that Ω is SNC at
every point x ∈ Ω. Then this set is quasi-regular.

Proof. First we verify that in the case where 0 ∈ / iri(Ω) the sets Ω and {0}
can be properly separated, i.e., there exists a nonzero vector a ∈ X such that
     
sup a, x  x ∈ Ω ≤ 0 and inf a, x  x ∈ Ω < 0. (2.72)
If 0 ∈ Ω, this statement is trivial. Suppose now that 0 ∈ Ω \iri(Ω). Letting
L := aff(Ω) and employing Lemma 2.185 tell us that L is a linear subspace
of X, and that there is a sequence {xk } ⊂ L for which xk ∈ / Ω and xk → 0 as
k → ∞. By Proposition 2.182 we find a sequence {vk } ⊂ L with vk = 0 and
  
sup vk , x  x ∈ Ω < vk , xk  whenever k ∈ N.
Denote wk := vk
vk  ∈ L with wk  = 1 as k ∈ N and observe that

wk , x < wk , xk  ≤ wk  · xk  = εk for all x ∈ Ω, (2.73)


where εk := xk  ↓ 0. Since {wk } is bounded, we let k → ∞ in (2.73) and
w
suppose without loss of generality that wk −
→ a ∈ L, which yields
  
sup a, x  x ∈ Ω ≤ 0. (2.74)
To verify further the strict inequality
  
inf a, x  x ∈ Ω < 0,
it suffices to show that there is x ∈ Ω with a, x < 0. Suppose on the contrary
that a, x ≥ 0 for all x ∈ Ω and deduce from (2.74) that a, x = 0 whenever
x ∈ Ω. Since a ∈ L = aff(Ω), there exists a sequence aj → a as j → ∞ with
aj ∈ aff(Ω). The latter inclusion can be rewritten as
2.5 Extended Relative Interiors in Infinite Dimensions 163
aj = Σ_{i=1}^{mj} λ^j_i ω^j_i  with  Σ_{i=1}^{mj} λ^j_i = 1 and ω^j_i ∈ Ω for i = 1, . . . , mj,

which readily implies the equalities

⟨a, aj⟩ = Σ_{i=1}^{mj} λ^j_i ⟨a, ω^j_i⟩ = 0.
The passage to the limit as j → ∞ gives us a = 0.

Next we deduce from (2.73), by using the Brøndsted-Rockafellar theorem proved below in Theorem 5.10, the existence of bk ∈ Ω and uk ∈ X such that

uk ∈ N(bk; Ω), ∥bk∥ ≤ √εk, and ∥uk − wk∥ ≤ √εk.    (2.75)

Since ∥wk∥ = 1, it follows from (2.75) that ∥uk∥ → 1. Furthermore, we get from wk ⇀ 0, εk ↓ 0, and (2.75) that uk ⇀ 0 as k → ∞. Remembering that Ω has the SNC property, it follows from (2.75) that ∥uk∥ → 0, which clearly contradicts the condition ∥uk∥ → 1 as k → ∞. This tells us that there exists x ∈ Ω such that ⟨a, x⟩ < 0. It justifies the proper separation of Ω and {0} in (2.72) and shows therefore that 0 ∉ qri(Ω).

To verify the quasi-regularity of Ω, it remains to show that qri(Ω) ⊂ iri(Ω). Picking x̄ ∈ qri(Ω) yields 0 ∈ qri(Ω − x̄). Since Ω is SNC, this property holds for Ω − x̄ as well. Based on Proposition 2.181, it allows us to deduce from the above that 0 ∈ iri(Ω − x̄), and so x̄ ∈ iri(Ω). This shows that qri(Ω) ⊂ iri(Ω) and thus finishes the proof. □
2.5.3 Extended Relative Interiors of Graphs and Epigraphs

In the concluding subsection of this section we present applications of the previous results to the evaluation of extended relative interiors of graphs of convex set-valued mappings and epigraphs of convex extended-real-valued functions in LCTV spaces. In contrast to the finite-dimensional results on the precise calculations of the classical relative interior of convex graphs and epigraphs given in Subsection 2.3.3, the infinite-dimensional framework offers much more variety in assumptions, estimates, and equalities as seen below.

Let us start with deriving evaluations of the quasi-relative interior of convex graphs of set-valued mappings that present LCTV extensions of Theorem 2.94 obtained in finite-dimensional spaces.

Theorem 2.189 Let F : X ⇉ Y be a convex set-valued mapping between LCTV spaces. The following hold:
(a) If the graph gph(F) is quasi-regular, then we have the inclusion

qri(gph(F)) ⊂ {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ qri(F(x))}.
164 2 BASIC THEORY OF CONVEXITY
(b) The quasi-regularity of the domain dom(F) yields the opposite inclusion

qri(gph(F)) ⊃ {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ qri(F(x))}.

(c) If both sets gph(F) and dom(F) are quasi-regular, then we have

qri(gph(F)) = {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ qri(F(x))}.

Proof. To verify assertion (a), pick (x̄, ȳ) ∈ qri(gph(F)) and suppose on the contrary that x̄ ∉ qri(dom(F)). By Proposition 2.181 on proper convex separation we find v∗ ∈ X∗ such that

⟨v∗, x⟩ ≤ ⟨v∗, x̄⟩ whenever x ∈ dom(F)

and also get x0 ∈ dom(F) for which the strict inequality

⟨v∗, x0⟩ < ⟨v∗, x̄⟩

is satisfied. Then for all (x, y) ∈ gph(F) we have

⟨(v∗, 0), (x, y)⟩ = ⟨v∗, x⟩ ≤ ⟨v∗, x̄⟩ = ⟨(v∗, 0), (x̄, ȳ)⟩,

and for each y0 ∈ F(x0) arrive at the conditions

⟨(v∗, 0), (x0, y0)⟩ = ⟨v∗, x0⟩ < ⟨v∗, x̄⟩ = ⟨(v∗, 0), (x̄, ȳ)⟩.

This shows that the sets gph(F) and {(x̄, ȳ)} can be properly separated, and hence (x̄, ȳ) ∉ qri(gph(F)) by Proposition 2.181. The obtained contradiction tells us that x̄ ∈ qri(dom(F)).

To proceed further with the proof of (a), let us verify that ȳ ∈ qri(F(x̄)). Fix any y ∈ F(x̄) with y ≠ ȳ, so that (x̄, y) ∈ gph(F). The assumed quasi-regularity of gph(F) gives us (x̂, ŷ) ∈ gph(F) and t ∈ (0, 1) such that

(x̄, ȳ) = t(x̄, y) + (1 − t)(x̂, ŷ).

This yields x̂ = x̄ and ȳ = ty + (1 − t)ŷ with ŷ ∈ F(x̄). Thus ȳ ∈ iri(F(x̄)) ⊂ qri(F(x̄)), which verifies the displayed formula for qri(gph(F)) in (a).
To prove now assertion (b) under the imposed quasi-regularity of the domain dom(F), fix x̄ ∈ qri(dom(F)) and ȳ ∈ qri(F(x̄)). Suppose on the contrary that (x̄, ȳ) ∉ qri(gph(F)). Then Proposition 2.181 ensures the existence of (u∗, v∗) ∈ X∗ × Y∗ such that

⟨u∗, x⟩ + ⟨v∗, y⟩ ≤ ⟨u∗, x̄⟩ + ⟨v∗, ȳ⟩ whenever y ∈ F(x)    (2.76)

and also the existence of (x0, y0) ∈ gph(F) for which

⟨u∗, x0⟩ + ⟨v∗, y0⟩ < ⟨u∗, x̄⟩ + ⟨v∗, ȳ⟩.
Letting x = x̄ in (2.76) yields ⟨v∗, y⟩ ≤ ⟨v∗, ȳ⟩ for all y ∈ F(x̄). Using further x̄ ∈ qri(dom(F)), x0 ∈ dom(F), and the assumed quasi-regularity of dom(F) allows us to deduce from the quasi-relative interior description in (2.54) that there exists x̂ ∈ dom(F) ensuring the representation x̄ = tx0 + (1 − t)x̂ with some t ∈ (0, 1). Pick ŷ ∈ F(x̂) and define

y := ty0 + (1 − t)ŷ.

Then y ∈ F(x̄) by the convexity of gph(F), and we get

⟨u∗, x̂⟩ + ⟨v∗, ŷ⟩ ≤ ⟨u∗, x̄⟩ + ⟨v∗, ȳ⟩,
⟨u∗, x0⟩ + ⟨v∗, y0⟩ < ⟨u∗, x̄⟩ + ⟨v∗, ȳ⟩.

Multiply the first inequality above by 1 − t, the second inequality by t, and add them together to arrive at the condition

⟨u∗, x̄⟩ + ⟨v∗, y⟩ < ⟨u∗, x̄⟩ + ⟨v∗, ȳ⟩,

which gives us ⟨v∗, y⟩ < ⟨v∗, ȳ⟩. Thus we conclude that the sets {ȳ} and F(x̄) can be properly separated, and so ȳ ∉ qri(F(x̄)) by Proposition 2.181. This contradiction shows that (x̄, ȳ) ∈ qri(gph(F)) and hence verifies (b). The concluding assertion (c) is an immediate consequence of (a) and (b). □

In the remainder of this subsection we evaluate extended relative interiors of epigraphs of extended-real-valued convex functions defined on LCTV spaces. Similar to the case of convex graphs, in infinite dimensions we have a variety of situations in comparison with the finite-dimensional representation of the relative interior of epigraphs given in Corollary 2.95. In fact, parts of the obtained results for epigraphs in infinite dimensions follow from those for graphs in Theorem 2.189, while others seem to be specific for epigraphical convex sets. The next result belongs to the latter category.

Theorem 2.190 Let f : X → R be an extended-real-valued, proper, convex function defined on an LCTV space X. Then we have the inclusion

qri(epi(f)) ⊃ {(x, λ) ∈ X × R | x ∈ qri(dom(f)), λ > f(x)}.    (2.77)

If in addition the set epi(f) is quasi-regular, then

qri(epi(f)) = {(x, λ) ∈ X × R | x ∈ qri(dom(f)), λ > f(x)}.    (2.78)
Proof. First we verify (2.77). Fix any (x̄, λ̄) ∈ X × R with x̄ ∈ qri(dom(f)) and f(x̄) < λ̄. Suppose on the contrary that (x̄, λ̄) ∉ qri(epi(f)). Then Proposition 2.181 shows that there exists (x∗, α) ∈ X∗ × R such that

⟨x∗, x⟩ + αλ ≤ ⟨x∗, x̄⟩ + αλ̄ for all (x, λ) ∈ epi(f)    (2.79)

and there exists (x̂, λ̂) ∈ epi(f) with

⟨x∗, x̂⟩ + αλ̂ < ⟨x∗, x̄⟩ + αλ̄.    (2.80)

Using x = x̄ and λ = f(x̄) in (2.79) gives us

α(f(x̄) − λ̄) ≤ 0.

Since f(x̄) < λ̄, we get α ≥ 0. Using now x = x̄ and λ = λ̄ + 1 > λ̄ > f(x̄) in (2.79) shows that α ≤ 0, which yields α = 0. Thus it follows from (2.79) and (2.80) that the sets {x̄} and dom(f) can be properly separated, and hence x̄ ∉ qri(dom(f)). This contradiction shows that (x̄, λ̄) ∈ qri(epi(f)).

To verify the inclusion “⊂” in (2.78), define F : X ⇉ R by F(x) := [f(x), ∞) and observe that gph(F) = epi(f) and dom(F) = dom(f). Furthermore, for all x ∈ dom(f) we easily get that qri(F(x)) = (f(x), ∞). The imposed quasi-regularity of gph(F) = epi(f) ensures that the inclusion “⊂” in (2.78) follows directly from assertion (a) of Theorem 2.189. □
The next result, which is based on Theorems 2.189(a) and 2.190, provides
a precise formula for calculating the quasi-relative interior of epigraphs for
an important class of extended-real-valued convex functions on LCTV spaces
without quasi-regularity or any additional assumptions.
Proposition 2.191 Let Ω be a nonempty convex set in an LCTV space X. Given x∗ ∈ X∗ and b ∈ R, define the extended-real-valued function

f(x) := ⟨x∗, x⟩ + b if x ∈ Ω, and f(x) := ∞ if x ∉ Ω.

Then we have the precise representation

qri(epi(f)) = {(x, λ) ∈ X × R | x ∈ qri(Ω), λ > f(x)}.    (2.81)

Proof. The inclusion “⊃” in (2.81) follows from (2.77) in Theorem 2.190. To verify the opposite inclusion “⊂” in (2.81), pick any (x0, λ0) ∈ qri(epi(f)) and show that (x0, λ0) belongs to the set on the right-hand side of (2.81). Defining F : X ⇉ R by F(x) := [f(x), ∞) for all x ∈ X, we get that dom(F) = dom(f) = Ω and gph(F) = epi(f). Following the lines in the proof of Theorem 2.189(a), where the quasi-regularity of gph(F) was not used, gives us x0 ∈ qri(dom(F)) = qri(Ω). Thus it remains to show that λ0 > f(x0).

Suppose on the contrary that λ0 ≤ f(x0) and deduce from the inclusion (x0, λ0) ∈ epi(f) that λ0 = f(x0), which ensures that (x0, f(x0)) ∈ qri(epi(f)). Remembering the definition of quasi-relative interior tells us that the set L := cl(cone(epi(f) − (x0, f(x0)))) is a linear subspace of X × R. Hence for a := (x0, f(x0) + 2) − (x0, f(x0)) = (0, 2) ∈ L we have −a = (0, −2) ∈ L. Therefore, there exists a net {γi}i∈I ⊂ cone(epi(f) − (x0, f(x0))) such that γi → −a = (0, −2), where

γi = μi((xi, λi) − (x0, f(x0)))

with μi ≥ 0 and (xi, λi) ∈ epi(f). It follows that

μi(xi − x0) → 0 and μi(λi − f(x0)) → −2.

Choose further an index i0 ∈ I for which

μi(λi − f(x0)) < −1 whenever i ≥ i0.

By the construction of f we have

⟨x∗, μi(xi − x0)⟩ = μi(f(xi) − f(x0)) ≤ μi(λi − f(x0)) < −1 for all i ≥ i0.

This contradicts the fact that ⟨x∗, μi(xi − x0)⟩ → 0 and thus verifies that λ0 > f(x0), which completes the proof of the proposition. □
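
In particular, choosing x∗ = 0 and b = 0 above makes f the indicator function of Ω, and formula (2.81) specializes by direct substitution to a product formula:

```latex
% Specialization of (2.81) to the indicator function of $\Omega$, i.e.,
% $f(x)=0$ for $x\in\Omega$ and $f(x)=\infty$ otherwise. Here
% $\mathrm{epi}(f)=\Omega\times[0,\infty)$, and (2.81) reads
\[
  \mathrm{qri}\bigl(\Omega\times[0,\infty)\bigr)
  =\bigl\{(x,\lambda)\in X\times\mathbb{R}\ \big|\
      x\in\mathrm{qri}(\Omega),\ \lambda>0\bigr\}
  =\mathrm{qri}(\Omega)\times(0,\infty).
\]
```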

We conclude this section by establishing a precise calculation formula for
the intrinsic relative interior of epigraphs of arbitrary convex functions on
topological vector spaces without any additional assumptions. Note that such
results have not been achieved above without imposing quasi-regularity in the
case of the quasi-relative interior of either graphs of mappings or epigraphs of
functions, as well as for the intrinsic relative interior of convex graphs.
Theorem 2.192 Let f : X → R be an extended-real-valued convex function defined on a topological vector space X. Then we unconditionally have the following intrinsic relative interior representation of the epigraph:

iri(epi(f)) = {(x, λ) ∈ X × R | x ∈ iri(dom(f)), λ > f(x)}.    (2.82)
Proof. Denote by Ω the set on the right-hand side of (2.82) and show that

iri(epi(f)) = Ω.    (2.83)

To verify the inclusion “⊂” in (2.83), pick any (x̄, λ̄) ∈ iri(epi(f)) and check that x̄ ∈ iri(dom(f)). Indeed, fixing x ∈ dom(f) with x ≠ x̄, we get (x, λ) ∈ epi(f) with λ := f(x). Then we deduce from Proposition 2.177 the existence of (u, γ) ∈ epi(f) such that (x̄, λ̄) ∈ ((x, λ), (u, γ)), which gives us x̄ ∈ (x, u). Applying Proposition 2.177 again yields x̄ ∈ iri(dom(f)).

Next we show that f(x̄) < λ̄. Suppose on the contrary that λ̄ = f(x̄) and take any (x̂, λ̂) ∈ epi(f) with λ̂ > f(x̂), meaning that

(x̂, λ̂) ≠ (x̄, λ̄) = (x̄, f(x̄)).

By Proposition 2.177 we find (u, γ) ∈ epi(f) with (x̄, f(x̄)) ∈ ((x̂, λ̂), (u, γ)). Hence there exists t0 ∈ (0, 1) such that

x̄ = t0 x̂ + (1 − t0)u and λ̄ = t0 λ̂ + (1 − t0)γ.

Employing the convexity of f shows that

t0 λ̂ + (1 − t0)γ = λ̄ = f(x̄) ≤ t0 f(x̂) + (1 − t0)f(u) < t0 λ̂ + (1 − t0)f(u),

which yields γ < f(u) and therefore (u, γ) ∉ epi(f). The obtained contradiction tells us that λ̄ > f(x̄) and results in the inclusion iri(epi(f)) ⊂ Ω.

Now we turn to the proof of the opposite inclusion in (2.83). Fix (x̄, λ̄) ∈ Ω giving us x̄ ∈ iri(dom(f)) and λ̄ > f(x̄). Picking any (x, λ) ∈ epi(f) with (x, λ) ≠ (x̄, λ̄), we intend to verify the existence of (y, β) ∈ epi(f) for which

(x̄, λ̄) ∈ ((x, λ), (y, β)).

To proceed, consider the following two cases:

Case 1: x ≠ x̄. Since x̄ ∈ iri(dom(f)) and x̄ ≠ x ∈ dom(f), there exists u ∈ dom(f) such that x̄ ∈ (x, u). Choose γ ∈ R satisfying

(x̄, λ̄) ∈ ((u, γ), (x, λ))

and check that there exists (y, β) ∈ ((u, γ), (x̄, λ̄)) with (y, β) ∈ epi(f). Suppose on the contrary that for every (y, β) ∈ ((u, γ), (x̄, λ̄)) we have (y, β) ∉ epi(f). Fix any t ∈ (0, 1) and define the t-dependent elements

yt := tu + (1 − t)x̄ and βt := tγ + (1 − t)λ̄,

for which we get (yt, βt) ∈ ((u, γ), (x̄, λ̄)). The convexity of f ensures that

tγ + (1 − t)λ̄ = βt < f(yt) ≤ tf(u) + (1 − t)f(x̄) ≤ tf(u) + (1 − t)λ̄.

Letting t ↓ 0 shows that λ̄ ≤ f(x̄), a contradiction justifying the existence of (y, β) with (y, β) ∈ ((u, γ), (x̄, λ̄)) and (y, β) ∈ epi(f). It yields (x̄, λ̄) ∈ ((x, λ), (y, β)), and so (x̄, λ̄) ∈ iri(epi(f)) by Proposition 2.177.

Case 2: x = x̄. Then we have λ ≠ λ̄, and it follows from λ̄ > f(x̄) that there exists (x̄, λ̂) ∈ epi(f) with (x̄, λ̄) ∈ ((x̄, λ), (x̄, λ̂)). This justifies the representation in (2.82) and completes the proof of the theorem. □
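
As a one-dimensional illustration of formula (2.82), the following sketch computes both sides directly for a simple function (the particular choice of f is illustrative):

```latex
% Let $X=\mathbb{R}$ and $f(x)=0$ for $x\in[0,1]$, $f(x)=\infty$
% otherwise. Then $\mathrm{dom}(f)=[0,1]$ and
% $\mathrm{epi}(f)=[0,1]\times[0,\infty)$. Since in finite dimensions
% the intrinsic relative interior reduces to the classical relative
% interior, $\mathrm{iri}(\mathrm{dom}(f))=(0,1)$, and (2.82) gives
\[
  \mathrm{iri}\bigl(\mathrm{epi}(f)\bigr)
  =\bigl\{(x,\lambda)\ \big|\ x\in(0,1),\ \lambda>0\bigr\}
  =(0,1)\times(0,\infty),
\]
% which agrees with the relative interior of the strip
% $[0,1]\times[0,\infty)$ computed directly in $\mathbb{R}^2$.
```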
Finally, we derive a remarkable consequence of the obtained results showing that the quasi-regularity of epigraphs of convex functions yields this property of the corresponding domains.

Corollary 2.193 Let f : X → R be an extended-real-valued convex function on an LCTV space X. Then the quasi-regularity of the epigraph epi(f) ensures that the domain dom(f) is also quasi-regular.

Proof. Supposing that epi(f) is quasi-regular, fix any x̄ ∈ qri(dom(f)) and choose λ > f(x̄). Then Theorem 2.190 tells us that (x̄, λ) ∈ qri(epi(f)) = iri(epi(f)). Applying now Theorem 2.192, we see that x̄ ∈ iri(dom(f)). This shows that qri(dom(f)) ⊂ iri(dom(f)), which readily justifies the quasi-regularity of dom(f) since the opposite inclusion is trivial. □

2.6 Exercises for Chapter 2
Exercise 2.194 Verify the properties of seminorms listed in Proposition 2.28.

Exercise 2.195 Prove a more general result in comparison with Lemma 2.55, namely, that any balanced subset Ω of a topological vector space X is connected. Hint: Considering the representation

Ω = ⋃_{x∈Ω} [−x, x],

note that each line segment [a, b] ⊂ X is connected due to the representation [a, b] = ϕ([0, 1]), where ϕ : R → X is defined by ϕ(t) := ta + (1 − t)b for all t ∈ [0, 1].

Exercise 2.196 Verify that if B : X → Y is an affine mapping between vector
spaces and if Θ is a convex subset of Y , then B −1 (Θ) is a convex subset of X.

Exercise 2.197 Let Ω ⊂ X be an arbitrary cone in a vector space X, i.e., λx ∈ Ω for any λ ≥ 0 and x ∈ Ω. Prove that the following properties are equivalent:
(a) Ω is a convex cone.
(b) x + y ∈ Ω for all x, y ∈ Ω.

Exercise 2.198 Let Ω1 and Ω2 be nonempty convex cones in a vector space X.
Prove the following equality:

Ω1 + Ω2 = co{Ω1 ∪ Ω2 }.

Exercise 2.199 Let Ω be a nonempty convex cone in a vector space X. Show that
Ω is a linear subspace of X if and only if Ω = −Ω.

Exercise 2.200 Let Ω be a convex subset of a topological vector space.
(a) Prove the inclusions

int(Ω) ⊂ core(Ω) ⊂ Ω ⊂ lin(Ω) ⊂ Ω̄.

(b) Construct examples showing that each inclusion above can be strict.
(c) Establish sufficient conditions for each inclusion to hold as an equality.

Exercise 2.201 Let X be a topological vector space, and let Ω be a nonempty
subset of X. Prove that if Ω is open, then co(Ω) is open as well. Is it true that if Ω
is closed, then co(Ω) is closed?

Exercise 2.202 Let X be a normed space.
(a) Prove that if Ω = B is the closed unit ball in X, then pΩ(x) = ∥x∥ for all x ∈ X.
(b) Prove that if Ω is a balanced bounded convex set such that 0 ∈ int(Ω), then pΩ is a norm which is equivalent to the given norm on X.
(c) Provide an example to show that the assertion in (b) could fail without assuming the boundedness of Ω.

Exercise 2.203 Let Ω be an affine set in a vector space X, and let L be the linear
subspace parallel to Ω. Prove that L = Ω − Ω.
Exercise 2.204 Let Ω1, Ω2 be nonempty convex sets in a vector space. Prove that:
(a) If Ω1 ∩ Ω2 = ∅ and core(Ω1) ≠ ∅, then Ω1 and Ω2 can be separated by a hyperplane. Hint: Let Ω := Ω1 − Ω2. Then core(Ω1) − Ω2 ⊂ core(Ω), and so core(Ω) ≠ ∅. Since 0 ∉ Ω, it suffices to apply Theorem 2.49.
(b) If core(Ω1) ∩ Ω2 = ∅ and core(Ω1) ≠ ∅, then Ω1 and Ω2 can be separated by a hyperplane. Hint: Let A := core(Ω1) and B := Ω2. Then A and B are disjoint with core(A) = core(core(Ω1)) = core(Ω1) ≠ ∅. By part (a), the sets A and B can be separated by a hyperplane. Then use Proposition 2.21 to show that Ω1 and Ω2 can be separated by a hyperplane.

Exercise 2.205 Give a detailed proof of Lemma 2.70.

Exercise 2.206 Obtain complex counterparts of the separation results of Subsection 2.3.1. Hint: Proceed similarly to the proof of Theorem 2.71.

Exercise 2.207 Let X be a real vector space. Endow it with the topology τc generated by the family of all the seminorms on X and label this topology as the core
convex topology on X or the strongest topology which makes X a locally convex
topological vector space. Prove the following:
(a) Every absorbing convex set in X is a neighborhood of the origin.
(b) Every linear subspace of X is closed in the core convex topology.
(c) The dual space X ∗ := (X, τc )∗ is the collection of linear functions f : X → R.
(d) intτc (Ω) = core(Ω) for any convex set Ω ⊂ X.
Hint: Compare with [167, Exercise 2.10] and [181, Proposition 6.3.1].
Exercise 2.208 Using Exercise 2.207, clarify which results of Subsection 2.3.1 can
be derived from the corresponding ones obtained in Subsection 2.3.2.

Exercise 2.209 Derive Theorem 2.84 from Theorem 2.92. Hint: Observe first that the proper separation of the sets Θ := {x̄} and Ω is equivalent to that for Θ and Ω̄.

Exercise 2.210 (a) Find two convex sets Ω1 and Ω2 in Rn such that Ω1 ⊂ Ω2 while ri(Ω1) ⊄ ri(Ω2).
(b) Let Ω1 and Ω2 be two subsets of Rn such that Ω1 ⊂ Ω2 and Ω1 ∩ ri(Ω2) ≠ ∅. Prove that ri(Ω1) ⊂ ri(Ω2).

Exercise 2.211 Justify all the cases presented in Example 2.97.

Exercise 2.212 Let Ω be a nonempty subset of a normed space X, and let dΩ be the associated distance function (2.33). Verify the following:
(a) The function dΩ is Lipschitz continuous on X with Lipschitz constant 1.
(b) Assuming the closedness of Ω, prove that the convexity of Ω is equivalent to the convexity of dΩ.

Exercise 2.213 Verify the convexity of all the functions listed in Example 2.119.
Hint: Employ the second-order characterizations of convexity given in Corollary 2.117
and Theorem 2.118.

Exercise 2.214 Let f : [a, b] → R with a < b be a continuous function such that

f((x + y)/2) ≤ (f(x) + f(y))/2 for all x, y ∈ [a, b].

Prove that f is convex.

Exercise 2.215 Let f : [a, b] → R with a < b be a convex function. Prove that

f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (f(a) + f(b))/2.
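
The two-sided bound in this exercise is the classical Hermite-Hadamard inequality, and it can be probed numerically before attempting a proof; the sketch below does this for the convex function f(x) = e^x on [0, 2] (the function, the interval, and the grid size are illustrative choices, not part of the exercise):

```python
import numpy as np

# Numerical illustration of the inequality in Exercise 2.215 for the
# convex function f(x) = exp(x) on [a, b] = [0, 2] (illustrative choices).
f = np.exp
a, b = 0.0, 2.0

# Uniform samples; their mean approximates (1/(b-a)) * integral of f over [a, b].
xs = np.linspace(a, b, 200001)
mean_value = f(xs).mean()

lower = f((a + b) / 2)       # left-hand side: f at the midpoint
upper = (f(a) + f(b)) / 2    # right-hand side: average of endpoint values
print(lower <= mean_value <= upper)   # expected: True
```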
Exercise 2.216 Let Ω := {(x1, x2) ∈ R2 | x1 ≥ 0, x2 ≥ 0}. Define the function

f(x) := −√(x1 x2) if x = (x1, x2) ∈ Ω, and f(x) := ∞ otherwise.

(a) Prove that f is a convex function on R2.
(b) Is f strictly convex? Justify your answer.

Exercise 2.217 Find the Hessian of the following function and prove that the function is convex on Rn:

f(x) = f(x1, . . . , xn) := ln(e^{x1} + · · · + e^{xn}).
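
As a numerical companion to this exercise, the sketch below forms the matrix diag(p) − pp^T with p the softmax vector p_i = e^{x_i}/Σ_j e^{x_j} (which is what the requested Hessian works out to) and checks its positive semidefiniteness at a random point; the test point and tolerances are arbitrary choices:

```python
import numpy as np

# Exercise 2.217: the Hessian of f(x) = ln(e^{x_1} + ... + e^{x_n}) is
# H = diag(p) - p p^T with p_i = e^{x_i} / sum_j e^{x_j} (the softmax
# vector). Below H is checked to be positive semidefinite at a random
# point, consistent with the convexity of f on R^n.
def logsumexp_hessian(x):
    z = x - x.max()                    # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax probabilities
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
H = logsumexp_hessian(x)

eigvals = np.linalg.eigvalsh(H)        # H is symmetric
print(eigvals.min() >= -1e-12)         # PSD up to roundoff: True
```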
Exercise 2.218 Let Ω be a subset of X. We say that Ω is midpoint convex if
(x + y)/2 ∈ Ω whenever x, y ∈ Ω. Given a closed set Ω in X, prove that Ω is
midpoint convex if and only if it is convex.

Exercise 2.219 Let I ⊂ R be a nonempty open interval, and let f : I → R be a differentiable function. Prove that f is strictly convex if and only if f′ is strictly increasing on the interval I.

Exercise 2.220 Prove all the statements of Proposition 2.142.

Exercise 2.221 Justify Corollary 2.145 by employing the arguments similar to the
proof of Theorem 2.94 with the usage of Theorem 2.58 on the separation of convex
sets in topological vector spaces.

Exercise 2.222 Verify the description of relative absorbing points of convex sets
given in the proof of Proposition 2.169 and provide their geometric interpretation.

Exercise 2.223 Let Ω be a nonempty convex subset in an LCTV space X.
(a) Prove that all the sets ri(Ω), iri(Ω), and qri(Ω) are convex.
(b) Give examples, different from those presented in Subsection 2.5.1, showing that
all the sets in (a) may be empty in infinite dimensions.
(c) Show that qri(Ω) = ∅ for the set in Example 2.176.

Exercise 2.224 Let Ω be a nonempty convex subset of an LCTV space X. Prove
characterization (2.63) of nonsupport points of Ω.

Exercise 2.225 Let Ω be a nonempty convex subset of a separable Banach space.
(a) Modify the proof of Theorem 2.178 to cover unbounded sets.
(b) Prove a version of Theorem 2.178, where Ω is an open set.
(c) Is the Banach space structure essential in Theorem 2.178 and its given proof?
Hint: Simplify the proof of [47, Theorem 2.19] in the settings of (a) and (b).
Exercise 2.226 Let Ω1 ⊂ X1 and Ω2 ⊂ X2 be nonempty convex subsets of the
corresponding LCTV spaces. Show that

iri(Ω1 × Ω2 ) = iri(Ω1 ) × iri(Ω2 ) and qri(Ω1 × Ω2 ) = qri(Ω1 ) × qri(Ω2 ).

Exercise 2.227 Let Ω be a nonempty closed convex subset of a Banach space X.
(a) Deduce from (2.73) and the Brøndsted-Rockafellar theorem (Theorem 5.10)
that the relationships in (2.75) hold.
(b) Give a direct proof of (2.75) in the Hilbert space setting.
Exercise 2.228 A nonempty subset Ω of a topological vector space X is said to be epi-Lipschitzian around x̄ ∈ Ω if there exist neighborhoods U of x̄ and O of the origin in X as well as a vector v ∈ X and a number γ > 0 such that

Ω ∩ U + tO ⊂ Ω + tv for all t ∈ (0, γ).    (2.84)

(a) Show that if Ω is convex, then it is epi-Lipschitzian around any x̄ ∈ Ω if and only if int(Ω) ≠ ∅.
(b) Verify that for any nonempty subset Ω of a normed space X the epi-Lipschitzian property of Ω around x̄ ∈ Ω implies that Ω is SNC at this point.
Exercise 2.229 A nonempty subset Ω of a normed space X is said to be compactly epi-Lipschitzian (CEL) around x̄ ∈ Ω if there are neighborhoods U of x̄ and O of the origin in X as well as a compact set C ⊂ X and a number γ > 0 with

Ω ∩ U + tO ⊂ Ω + tC for all t ∈ (0, γ).    (2.85)

(a) Verify that the CEL property holds for an arbitrary nonempty subset Ω ⊂ Rn around any point x̄ ∈ Ω.
(b) Show that the CEL property of Ω around x̄ yields the SNC property of Ω at this point. Hint: Proceed by the definitions and compare with [228, Theorem 1.26].
(c) Construct a set that is SNC at some point x̄ but not CEL around this point.
Exercise 2.230 Let F : X ⇉ Y be a convex set-valued mapping between LCTV spaces, and let dom(F) ≠ ∅. Prove or disprove the following statements:
(a) The representation of qri(gph(F)) obtained in Theorem 2.189 holds without the imposed quasi-regularity assumptions on dom(F) and gph(F).
(b) The counterpart of (a) with the replacement of qri by iri therein holds without the imposed quasi-regularity assumptions.

Exercise 2.231 Let F : X ⇉ Y be a convex set-valued mapping between LCTV spaces. Prove that

qri(gph(F)) ⊃ {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ int(F(x))}.

Exercise 2.232 Let F : X ⇉ Y be a convex mapping between LCTV spaces, and let dom(F) ≠ ∅. Clarify whether the quasi-regularity of the graph gph(F) yields this property of the domain dom(F).

2.7 Commentaries to Chapter 2
Some geometric elements of convexity were studied and applied by prominent
Ancient Greek geometers, particularly by Euclid of Alexandria (325–270 BC),
Archimedes of Syracuse (287–212 BC), and by Apollonius of Perga (240–190 BC).
Various geometric aspects of convexity were largely investigated by Jakob Steiner
(1796–1863), Hermann Brunn (1862–1939), Hermann Minkowski (1864–1909), Ernst
Steinitz (1871–1928), Wilhelm Blaschke (1885–1962), and other outstanding mathe-
maticians. The main achievements of convex geometric theory up to 1933 were sum-
marized in the book [43] by Tommy Bonnesen (1873–1935) and his student Werner
Fenchel (1905–1988) who both made important contributions to convex geometry.
Convex functions systematically came into mathematical life much later.
Although some convex functions were implicitly used and corresponding conditions
were established in the classical calculus of variations by Leonhard Euler (1707–1783),
Adrien-Marie Legendre (1752–1833), Alfred Clebsch (1833–1872), and Karl Weier-
strass (1815–1897), the formal definition of real-valued convex functions was first
explicitly introduced by Johan Jensen (1859–1925) in his paper [175] published in
1906. Striking applications of convexity to the foundations of the calculus of varia-
tions were developed by Leonida Tonelli (1885–1946) who proved that the convexity
of integral functionals with respect to velocity variables yields the lower semicon-
tinuity (the notion introduced by Baire [22]) of such functionals in an appropriate
weak topology and thus ensures the existence of solutions to variational problems;
see his two-volume book [339]. Tonelli blazed a trail for the subsequent fundamental
applications of convexity and convexification procedures to the existence of optimal
solutions and their relaxations in various problems of the calculus of variations and
optimal control, which were greatly developed by Nikolay Bogolyubov (1909–1992),
Laurence Chisholm (L.C.) Young (1905–2000), Jack Warga (1922–2011), and Revaz
Gamkrelidze (born in 1927); see [40, 137, 349, 356] for their pioneering contributions.
Curiously, up to the second half of the twentieth century, convex sets and functions were studied in parallel but not together, and the prime attention was paid to sets (apart from very special convex functions and classical convex inequalities).
It was Minkowski who laid the foundations of the general theory of convex sets
in finite-dimensional spaces; see his two books [219, 220] published posthumously.
Minkowski introduced therein the normal cone notion for convex sets and obtained
the original versions of the convex separation theorem. The refined relative inte-
rior version in finite dimensions, as presented in Theorem 2.184, is due to Fenchel
[131]. Extensions of convex separation theorems to infinite-dimensional settings in
linear topological spaces under nonempty interior assumptions were developed in
[58, 117, 183].
In a parallel way, convexity entered the field of functional analysis in 1930s in
the framework of normed spaces and locally convex topological vector spaces. As
discussed above in the commentaries to Chapter 1, a major result of functional
analysis is the celebrated Hahn-Banach theorem. Although Banach and his student
Stanislaw Mazur (1905–1981) proved in 1933 (see [27]) the equivalence between the
Hahn-Banach extension theorem and convex separation (called the “geometric form
of the Hahn-Banach theorem” by Bourbaki [61]), the vast majority of books on
functional analysis still revolve around the traditional extension formulation.
The major motivations for elaborations and applications of convexity theory
(mainly in geometric modes) came in 1940s–1950s from optimization, economic

modeling, and optimal control. This period started with linear programming, which
involves optimizing linear functions on polyhedral convex sets. The invention of com-
puters and the development of computational technology made it possible to address
practically important optimization problems with many inequality constraints. It
was not the case with the classical optimization theory, chiefly due to Joseph-Louis
Lagrange (1736–1813), dealing with constraints given by equations/equalities. Novel
solution methods for linear programs, particularly exploiting the geometry of convex
polyhedra, were suggested and implemented by Leonid Kantorovich (1912–1986),
Tjalling Koopmans (1910–1985), George Dantzig (1914–2005), and their followers,
and then were successfully applied to solving a great variety of problems appearing
in military, economics, engineering, transportation, manufacturing, energy, etc. All
of this has constituted one of the most monumental applications of optimization
(and of mathematics in general) to society.
Extending models of linear programming, Harold Kuhn (1925–2014) and Albert
Tucker (1905–1995), in their pioneering paper [193] published in 1951, formulated
problems of convex programming with many inequality constraints given by smooth
convex functions and, by using standard convex separation, derived necessary opti-
mality conditions for such problems as a saddle point theorem [193]. Then these
conditions were reformulated in a Lagrangian form with the additional sign and com-
plementary slackness relations that are valid for more general nonlinear programming
problems described by smooth but not necessarily convex data. It was revealed later
that the obtained conditions had also been discovered in 1939 by William Karush
(1917–1997) in his unpublished master thesis [179], which was fulfilled under the
supervision of Lawrence Graves (1896–1973). Since then, the aforementioned necessary optimality conditions have been called the Karush-Kuhn-Tucker (KKT) conditions for problems of nonlinear programming.
Another source of profound interest to convexity was provided by economic
modeling, particularly by the general (economic) equilibrium theory. In this vein,
the fundamental results for the most remarkable model of microeconomics, known
as the model of welfare economics, were developed independently in 1951 by Kenneth
Arrow (1921–2017) in his paper [8] (published in the same conference proceedings as
the aforementioned paper by Kuhn and Tucker) and by Gérard Debreu (1921–2004)
in his paper [96] under the convexity assumptions. One of the two major results of
the Arrow-Debreu model is the so-called Second Fundamental Theorem of Welfare
Economics ensuring the support of efficient allocations of a convex economy by a
decentralized equilibrium price, where each consumer minimizes his/her expendi-
tures and each firm maximizes its profit. The proof of this result is fully based on
convex separation, which therefore allowed Arrow and Debreu to rigorously justify
for convex economies the “invisible hand” principle by Adam Smith (1723–1790).
A powerful impulse for further developments and applications of convexity came
at the end of 1950s from then new area of optimal control. The central result of this
theory, the Pontryagin maximum principle formulated by Lev Pontryagin (1908–
1988) and named after him, was proved for nonlinear systems of ordinary differen-
tial equations by Vladimir Boltyanskii (1925–2019) in [41]. A crucial element of this
proof, as well as of the preceding proof of the maximum principle given by Gamkre-
lidze [136] for linear systems, was the usage of convex separation. However, in con-
trast to linear systems, no convexity assumptions were made in the nonlinear case,
where a certain “hidden convexity” was revealed for ODE control systems. All of
this was fully understood and largely extended in the theory of necessary conditions

for general optimization problems developed by Abram Dubovitskii (1923–2007) and
Alexey Milyutin (1925–2001) in their fundamental paper [113], which also contained
new results on convex sets and functions. The convex separation technique played a
crucial role in [113] and subsequent development on necessary conditions in optimiza-
tion and control problems. Boris Pshenichnyi (1937–2000) stated in his book [294]:
“In fact, the entire theory of necessary optimality conditions is an extended conse-
quence of the convex separation theorem.” This is not completely true nowadays,
but still convex separation is strongly employed in optimization, control, and other
areas of mathematics and its applications. The modern approach to the simultane-
ous study of convex functions and sets was initiated by Fenchel [131] and was greatly
developed independently by Jean-Jacques Moreau (1923–2014) and R. Tyrrell Rock-
afellar (born in 1935); see their original works [262, 266, 302]. In contrast to Fenchel
who handled everything in terms of pairs (f, Ω) of real-valued convex functions f
defined on convex sets Ω, both Moreau and Rockafellar considered extended-real-
valued convex functions defined on the entire space with possible infinite values. In
this way they unified the study of functions and sets while associating the latter with
set indicator functions equal to 0 on the set in question and ∞ otherwise. We will
discuss in more detail this approach and the obtained results in the commentaries
to the subsequent chapters. A highly influential book by Rockafellar [306] entitled
“Convex Analysis” (the term suggested by Tucker) laid out the foundations of the
new field of mathematics, which occupies an intermediate position between analysis
and geometry while fruitfully combining and intensely developing ideas and methods
from both disciplines.
Let us now briefly comment on the material presented in this chapter. The basic
definitions and properties of convex sets given in Section 2.1 go back chiefly to
Minkowski [219, 220] in finite-dimensional spaces, while their infinite-dimensional
extensions can be found in many publications; see, e.g., the books by Holmes [167]
and Zălinescu [361]. The notions of seminorms and gauge functions from Section 2.2
and their basic properties in finite dimensions are due to Minkowski. Their infinite-
dimensional counterparts in terms of algebraic interiors (cores) and linear closures
presented in Sections 2.2 and 2.6 are also mostly known (see, e.g., the books [167, 181,
361]). Our exposition here for spaces without topologies follows the recent paper [89].
Note that the core convex topology described in Exercise 2.207 was introduced by
Richard Holmes in [167, Exercise 2.10] and then largely developed by Akhtar Khan,
Christiane Tammer, and Constantin Zălinescu [181, Chapter 6]. Theorem 2.24 on
the algebraic and topological interiors in Banach spaces is taken from the book by
Jonathan Borwein (1951–2016) and Qiji Jim Zhu [54, Theorem 4.1.8].
As discussed above, convex separation theorems presented in Section 2.3 play a
crucial role in convex analysis and its applications. Conceptually, convex separation
is the major machinery of the geometric approach to convex analysis developed in
this book. We follow the recent paper [92] to prove that, in general vector spaces
without topologies, the Hahn-Banach extension theorem is implied by merely the
extreme case of convex separation by which we understand the separation of an
algebraically solid (i.e., with core(Ω) ≠ ∅) convex set Ω from a singleton x0 ∉ Ω,
provided that Ω = core(Ω); see Theorem 2.54. The underlying notion of relative
interior for convex sets in finite dimensions and most of its properties presented
in this section trace back to Steinitz [329]. The culmination is Fenchel’s separation
theorem given in Theorem 2.92. The representation of the relative interiors of convex
graphs in Theorem 2.94 is due to Rockafellar [306, Theorem 6.8].
176 2 BASIC THEORY OF CONVEXITY

The notion of extreme points of convex sets is broadly used in convex analysis,
functional analysis, and various applications; in particular, to optimization prob-
lems. In linear programming, extreme points are called vertices (of convex polyhe-
dra) while playing a pivotal role in Dantzig’s Simplex Method. The fundamental
Theorem 2.103 is due to Mark Krein (1907–1989) and David Milman (1912–1982)
and was published in their paper [185]. Its proof is based on convex separation.
Section 2.4 mainly collects well-known results on general properties of extended-
real-valued convex functions by following the aforementioned classical works of
Fenchel, Moreau, and Rockafellar. Appealing to convex geometry, we use the geomet-
ric definition of convex functions while providing their equivalent analytic descrip-
tion via Jensen’s inequalities in Theorem 2.105. In particular, the remarkable opera-
tion of infimal convolution (known also as “epi-addition” and “inf-convolution”) for
convex functions, which keeps convexity, goes back to Fenchel [131] and has been
largely investigated by Moreau [265, 266]. The class of optimal value/marginal func-
tions defined in (2.39) plays a highly important role in convex analysis and numerous
applications as seen in the subsequent chapters of the book. The quasiconvex exten-
sion of convex functions given in Definition 2.113 has been widely recognized in
mathematics after publishing the fundamental monograph by von Neumann and
Morgenstern on game theory and economic behavior [347].
While lower semicontinuous functions play a rather peripheral role in real anal-
ysis, the situation is completely different for extended-real-valued convex functions.
Indeed, the continuity of convex functions on general topological vector spaces pre-
cisely corresponds to interior points of the domain (see Theorem 2.144), and this
actually ensures by Corollary 2.150 the Lipschitz continuity of the functions in ques-
tion defined on normed spaces. On the other hand, lower semicontinuity allows us
to deal with boundary points of the domain, which is crucial, e.g., for applications
to constrained optimization. Furthermore, lower semicontinuity, in contrast to con-
tinuity, is preserved under major operations over convex functions.
Subsection 2.5.2 presents rather recent results (most of them have never been
published in the monographic literature), which address the systematic relaxation
of the nonempty interior assumptions in convex analysis and applications. It has
been well recognized that the nonempty interior requirements on convex sets (and
the corresponding continuity assumptions on convex functions) imposed in many
results of infinite-dimensional convex analysis are often quite restrictive. This issue
has been resolved in finite dimensions by replacing interior with relative interior,
which is always nonempty for nonempty convex sets. However, it is not the case in
infinite-dimensional spaces, where convex sets may have empty relative interiors in
common settings including those highly important in applications. In particular, the
ordering/positive cones in the classical Lebesgue spaces ℓp and Lp for 1 ≤ p < ∞,
which appear, e.g., in economic modeling and general equilibria, have empty relative
interiors. Since the positive cones in Lebesgue spaces have nonempty interiors only
for p = ∞, this pushes economists in infinite-dimensional modeling to work in the
complicated and inconvenient space L∞ in order to use convex separation and reach
price equilibria; see, e.g., the classical book by Mas-Colell et al. [216] as well as the
more recent book by Mordukhovich [228], where further developments and detailed
commentaries can be found in Chapter 8 with the references therein.
To the best of our knowledge, a more appropriate version of the relative interior
from Definition 2.168(a) first appeared in Holmes [167] under the name “intrinsic
core.” This name may be confusing, as argued by Borwein and Goebel [46],
who suggested replacing it with “pseudo-relative interior.” The latter seems
confusing as well, since “pseudo” means “false.” The name intrinsic relative
interior (iri), used here and in other publications, was coined by Bao and
Mordukhovich [29], where various results about the iri property with applications to set-valued
optimization have been developed; see also the book [229, Chapter 9] with its com-
mentaries and bibliography. The topological counterpart of the latter property for-
mulated in Definition 2.168(b) was introduced by Borwein and Lewis [48] under
the name of quasi-relative interior (qri). An equivalent notion of “inner points”
was defined independently by Hadjisavvas and Schaible in [148]. Curiously, a very
similar notion can be actually found in Zarantonello [363]. Contrary to the intrin-
sic relative interior construction, the quasi-relative interior is nonempty for any
nonempty, closed, and convex subset of a separable Banach space; see Theorem 2.178
taken from Borwein and Lewis [48, Theorem 2.19]. This result plays a crucial role
in various applications of the quasi-relative interior notion that can be found in
[29, 46, 48, 55, 56, 94, 116, 134, 181, 229, 362] among other publications.
The notion of quasi-regularity of convex sets in LCTV spaces was introduced in
our paper with Cuong [87], where some relationships between generalized relative
interior notions for graphs and epigraphs were obtained. Theorem 2.189 is taken
from [87], while providing far-reaching extensions of Rockafellar’s finite-dimensional
result about the relative interior of convex graphs (see Theorem 2.94) to the case
of convex set-valued mappings between LCTV spaces with quasi-regular graphs
and domains. Theorem 2.190 was originally proved by Zălinescu [362] by using a
different and more involved device. Note that quasi-relative regularity allows us to
combine advantages of both intrinsic relative and quasi-relative interiors of convex
sets in infinite dimensions. Among various sufficient conditions for quasi-regularity,
let us highlight Theorem 2.188 taken from [87] that employs the sequential normal
compactness (SNC) property of sets. This property is strongly used in the subsequent
chapters of the book, especially in Chapter 5.
The SNC property of sets was introduced and utilized by Mordukhovich and
Shao [258] (preprint of 1994) and then was broadly used in variational analysis.
We refer the reader to the book of the first author [228] with the large commen-
taries and bibliography therein for a comprehensive theory of SNC and associated
properties, preservation/calculus rules for them, and a variety of applications. The
epi-Lipschitzian property of sets (2.84) goes back to Rockafellar [310], while its com-
pactly epi-Lipschitzian (CEL) extension to infinite-dimensional spaces defined in (2.85)
was introduced by Borwein and Strójwas [51]. The CEL characterization for convex
sets presented in Exercise 2.229 was established by Borwein et al. [49], where the
reader can find more on this property. The proof of the implication CEL =⇒ SNC
taken from [257] was actually based therein on Loewen’s arguments from [208].
Topological counterparts of SNC and their relationships with the CEL property can
be found in Fabian and Mordukhovich [127], Ioffe [170], and Penot [287].
For other results presented in Section 2.5.2 we mainly follow our quite recent
paper with Cuong and Sandine [90]; see the references therein for some previous
developments. This paper contains, in particular, the characterization of proper sep-
aration in LCTV spaces given in Theorem 2.184 with the replacement of the relative
interior notion in finite dimensions by its quasi-relative counterpart. We also refer
the reader to the book by Khan et al. [181, Section 6.2] for sufficient conditions
of proper separation of convex sets in vector spaces without topologies that are
expressed in terms of algebraic cores and their relative algebraic versions.
3
CONVEX GENERALIZED
DIFFERENTIATION

Generalized differentiation lies at the very heart of convex analysis and its
applications. Since the most useful and even the simplest convex functions
are nondifferentiable at the points of interest, the now flourishing general-
ized differentiation theory oriented toward optimization-related problems
started from convex analysis and has since been extended to more general
variational frameworks. It concerns not only nondifferentiable functions but
also sets with nonsmooth boundaries as well as set-valued mappings. Calculus
rules of generalized differentiation have been the central issue of the theory
and applications from the very beginning of convex analysis.
Developing a geometric approach to generalized differentiation, which is
mainly based on set extremality and convex separation, we start with the study
of normals to convex sets, proceed with coderivatives of set-valued mappings
having convex graphs, and finally turn to subgradients of extended-real-valued
convex functions. In this chapter we mainly present basic results of general-
ized differential calculus in locally convex topological spaces with their refine-
ments and improvements in finite dimensions, while we also discuss in the
exercise and commentary parts some extended versions in vector spaces with-
out topologies. This study is continued in Chapter 4, where we combine it
with developing calculus rules for Fenchel conjugates and deriving enhanced
results on generalized differential calculus in Banach spaces.
Unless otherwise stated, all the spaces under consideration in this chapter
are real topological vector spaces.

3.1 The Normal Cone and Set Extremality


The notion of the normal cone to a convex set plays a crucial role in a great
many theoretical and numerical aspects of convexity theory and applications.
It is a fundamental concept of generalized differentiation, which reduces to the
normal space for sets with smooth boundaries and induces the corresponding
generalized differential constructions for functions and mappings. Major prop-
erties (and the very definition) of the normal cone to convex sets closely relate

© Springer Nature Switzerland AG 2022 179


B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9 3
to convex separation theorems. We also exploit set extremality and variational
arguments for the study and applications of normals to convex sets and asso-
ciated constructions for convex set-valued mappings and extended-real-valued
functions; see below.

3.1.1 Basic Definition and Normal Cone Properties

Here is the underlying construction of the normal cone to convex sets.


Definition 3.1 Let Ω be a nonempty convex subset of a topological vector
space X. The normal cone to Ω at x̄ ∈ Ω is defined by

N (x̄; Ω) := {x∗ ∈ X ∗ | ⟨x∗ , x − x̄⟩ ≤ 0 for all x ∈ Ω}. (3.1)

If x̄ ∉ Ω, we put N (x̄; Ω) := ∅.
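In R2 this definition is easy to test numerically. The sketch below is our own illustration, not from the book, and all names in it are ours: Ω is the unit square [0, 1]2 , x̄ = (1, 1) is a corner point, and the normal cone there is the nonnegative quadrant; membership is checked by sampling the defining inequality ⟨x∗ , x − x̄⟩ ≤ 0 over points of Ω.

```python
import random

# Hypothetical illustration of Definition 3.1 (ours, not from the book):
# Omega is the unit square [0,1]^2 in R^2 and xbar = (1,1) is a corner,
# where the normal cone N(xbar; Omega) is the nonnegative quadrant.

def in_normal_cone(xstar, xbar, points, tol=1e-12):
    """Test <x*, x - xbar> <= 0 over a sample of points x in Omega."""
    return all(xstar[0] * (x[0] - xbar[0]) + xstar[1] * (x[1] - xbar[1]) <= tol
               for x in points)

random.seed(0)
square = [(random.random(), random.random()) for _ in range(1000)]
square += [(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)]  # add the corners
xbar = (1.0, 1.0)

print(in_normal_cone((2.0, 3.0), xbar, square))   # in the cone: True
print(in_normal_cone((-1.0, 1.0), xbar, square))  # points into Omega: False
```

Sampling only certifies a finite check, of course; the exact cone at this corner is {(a, b) : a ≥ 0, b ≥ 0}, which the two probes above are chosen to hit and miss.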

Fig. 3.1. The normal cone

The next proposition is a direct consequence of the definitions (Figure 3.1).

Proposition 3.2 Let Ω be a convex subset of a topological vector space X
with x̄ ∈ Ω. Then N (x̄; Ω) ⊂ X ∗ is a convex cone that is closed with respect
to the weak∗ topology on X ∗ .

Proof. Fix any x∗ ∈ N (x̄; Ω) and any λ ≥ 0. Then we have

⟨λx∗ , x − x̄⟩ = λ⟨x∗ , x − x̄⟩ ≤ 0 for all x ∈ Ω,

which shows that λx∗ ∈ N (x̄; Ω), and thus N (x̄; Ω) is a cone. Picking now any
vectors u∗ , v ∗ ∈ N (x̄; Ω) readily implies that
⟨u∗ + v ∗ , x − x̄⟩ = ⟨u∗ , x − x̄⟩ + ⟨v ∗ , x − x̄⟩ ≤ 0 for all x ∈ Ω,

and so u∗ + v ∗ ∈ N (x̄; Ω), verifying the convexity of the cone N (x̄; Ω). Finally,

N (x̄; Ω) = ⋂_{x∈Ω} Ωx ,

where the set Ωx := {x∗ ∈ X ∗ | ⟨x∗ , x − x̄⟩ ≤ 0} is weak∗ closed for every
x ∈ Ω. Thus the cone N (x̄; Ω) is closed with respect to the weak∗ topology
as the intersection of a family of weak∗ closed subsets of X ∗ . □

Next we present a normal cone characterization of boundary points of sets
in both topological vector spaces and finite-dimensional settings. These results
are based on the corresponding versions of convex separation.

Theorem 3.3 Let Ω be a convex subset of a topological vector space X with
x̄ ∈ Ω. Then we have the following:
(a) If x̄ ∈ int(Ω), then N (x̄; Ω) = {0}.
(b) If Ω has nonempty interior, then x̄ ∈ bd(Ω) if and only if the normal
cone N (x̄; Ω) contains a nonzero element.
(c) If X = Rn , then x̄ ∈ bd(Ω) if and only if N (x̄; Ω) ≠ {0}.

Proof. To verify assertion (a), take x̄ ∈ int(Ω) and choose a neighborhood
V of the origin such that x̄ + V ⊂ Ω. Picking any x∗ ∈ N (x̄; Ω), we get by
Definition 3.1 that

⟨x∗ , v⟩ = ⟨x∗ , (x̄ + v) − x̄⟩ ≤ 0 for all v ∈ V.

Now fix any x ∈ X and take t > 0 so small that tx ∈ V . Then ⟨x∗ , tx⟩ =
t⟨x∗ , x⟩ ≤ 0, and hence ⟨x∗ , x⟩ ≤ 0. Applying the same to −x ensures that
⟨x∗ , x⟩ = 0 for all x ∈ X and therefore x∗ = 0.
(b) Suppose that int(Ω) ≠ ∅ and pick any x̄ ∉ int(Ω), i.e., x̄ ∈ bd(Ω). Then
Theorem 2.58 on convex separation in topological vector spaces applied to the
sets Ω and {x̄} gives us a nonzero element x∗ ∈ X ∗ satisfying

⟨x∗ , x⟩ ≤ ⟨x∗ , x̄⟩ for all x ∈ Ω.

This yields ⟨x∗ , x − x̄⟩ ≤ 0 for all x ∈ Ω, and so x∗ ∈ N (x̄; Ω) with x∗ ≠ 0.
The converse implication in (b) immediately follows from (a).
It remains to justify (c) where X = Rn . If x̄ ∈ bd(Ω), then there is a sequence
{xk } ⊂ Ω c that converges to x̄. Applying the finite-dimensional version of
convex separation taken from Theorem 2.92, for each k ∈ N we find vk ∈ Rn
with vk ≠ 0 such that

⟨vk , x⟩ ≤ ⟨vk , xk ⟩ for all x ∈ Ω.

Let wk := vk /∥vk ∥ and suppose without loss of generality that {wk } converges
to some w ≠ 0. Then by passing to the limit as k → ∞ we get

⟨w, x⟩ ≤ ⟨w, x̄⟩ for all x ∈ Ω,

and so w ∈ N (x̄; Ω). The converse implication in (c) also follows from (a). □

Example 3.4 The assumption int(Ω) ≠ ∅ in Theorem 3.3(b) is essential
for the fulfillment of the nontriviality condition N (x̄; Ω) ≠ {0} at bound-
ary points. To illustrate this, consider any infinite-dimensional Hilbert space
H, which has an orthonormal basis {ek | k ∈ N}. Define the set

Ω := {x ∈ H | ⟨x, ek ⟩ ≤ 1/k for all k ∈ N}.

Then we see that x̄ := 0 is a boundary point of Ω while N (0; Ω) = {0}.

The next proposition provides a useful description of the separation prop-
erty for convex sets via the normal cone construction (3.1).

Proposition 3.5 Let Ω1 and Ω2 be convex sets in a topological vector space
X with x̄ ∈ Ω1 ∩ Ω2 . Then Ω1 and Ω2 can be separated by a closed hyperplane
if and only if we have

N (x̄; Ω1 ) ∩ (−N (x̄; Ω2 )) ≠ {0}. (3.2)

Proof. The separation of the two convex sets Ω1 and Ω2 by a closed hyper-
plane means the existence of x∗ ∈ X ∗ with x∗ ≠ 0 such that

⟨x∗ , x⟩ ≤ ⟨x∗ , y⟩ whenever x ∈ Ω1 , y ∈ Ω2 . (3.3)

It implies, in particular, that we have

⟨x∗ , x⟩ ≤ ⟨x∗ , x̄⟩ whenever x ∈ Ω1 ,

which tells us by definition (3.1) that x∗ ∈ N (x̄; Ω1 ). In the same way we
can check that −x∗ ∈ N (x̄; Ω2 ). The proof of the opposite implication is also
straightforward. □
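Proposition 3.5 can be illustrated with a hypothetical example of ours (not from the text): the closed half-planes Ω1 = {x : x1 ≤ 0} and Ω2 = {x : x1 ≥ 0} in R2 meet at x̄ = 0, the vector x∗ = (1, 0) lies in N (x̄; Ω1 ) with −x∗ ∈ N (x̄; Ω2 ), and the vertical axis separates the sets.

```python
import random

# Hypothetical check of Proposition 3.5 (our example): Omega1 = {x1 <= 0},
# Omega2 = {x1 >= 0} in R^2, xbar = 0.  The nonzero x* = (1, 0) satisfies
# x* in N(xbar; Omega1) and -x* in N(xbar; Omega2), so the sets can be
# separated by the closed hyperplane {x : x1 = 0}.

def in_normal_cone(xstar, xbar, points, tol=1e-12):
    return all(xstar[0] * (x[0] - xbar[0]) + xstar[1] * (x[1] - xbar[1]) <= tol
               for x in points)

random.seed(1)
omega1 = [(-random.random(), 2 * random.random() - 1) for _ in range(500)]
omega2 = [(random.random(), 2 * random.random() - 1) for _ in range(500)]
xbar, xstar = (0.0, 0.0), (1.0, 0.0)

print(in_normal_cone(xstar, xbar, omega1))              # True
print(in_normal_cone((-1.0, 0.0), xbar, omega2))        # True
# separation (3.3): sup of <x*, .> over Omega1 <= inf over Omega2
print(max(x[0] for x in omega1) <= min(x[0] for x in omega2))  # True
```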

3.1.2 Set Extremality and Convex Extremal Principle

In this subsection we introduce and investigate extremal systems of sets that
reflect variational aspects of nonlinear analysis and make a bridge between
convex analysis and more general frameworks of variational analysis and gen-
eralized differentiation studied later in the book. In the convex setting mostly
considered below, the main characterization of set extremality, which is known
as the extremal principle and is given in form (3.2) if x ∈ Ω1 ∩ Ω2 , has
some remarkable features while being in fact equivalent to convex separation
(Figure 3.2).
Fig. 3.2. Extremal systems

Definition 3.6 We say that two nonempty (not necessarily convex) subsets
Ω1 , Ω2 of a topological vector space X form an extremal system in X if
for any neighborhood V of the origin there exists a vector a ∈ X such that
a ∈ V and (Ω1 + a) ∩ Ω2 = ∅. (3.4)

Note that Definition 3.6 and the following theorem do not generally require
that the sets in question have a common point.

Theorem 3.7 Let X be a topological vector space, and let Ω1 , Ω2 ⊂ X be
nonempty and convex. Then we have the following:
(a) Ω1 and Ω2 form an extremal system in X if and only if 0 ∉ int(Ω1 − Ω2 ),
which implies that int(Ω1 ) ∩ Ω2 = ∅ and int(Ω2 ) ∩ Ω1 = ∅.
(b) If Ω1 and Ω2 form an extremal system and the set Ω1 − Ω2 is solid, i.e.,

int(Ω1 − Ω2 ) ≠ ∅, (3.5)

then these sets can be separated by a closed hyperplane meaning that

sup_{x∈Ω1} ⟨x∗ , x⟩ ≤ inf_{x∈Ω2} ⟨x∗ , x⟩ for some x∗ ∈ X ∗ , x∗ ≠ 0, (3.6)

which is equivalent to (3.2) whenever x̄ ∈ Ω1 ∩ Ω2 .
(c) The separation property (3.6) yields the set extremality even if (3.5) fails.

Proof. To verify assertion (a), assume that the sets Ω1 and Ω2 form an
extremal system in X. Suppose on the contrary that 0 ∈ int(Ω1 − Ω2 ). Then
there exists a balanced neighborhood V of 0 ∈ X such that

V ⊂ Ω1 − Ω2 .

If a ∈ V , then −a ∈ V ⊂ Ω1 − Ω2 , and so (Ω1 + a) ∩ Ω2 ≠ ∅, a contradiction.
Now suppose that 0 ∉ int(Ω1 − Ω2 ) and thus get

V ∩ [X \ (Ω1 − Ω2 )] ≠ ∅

for any neighborhood V of the origin. Assume without loss of generality that
V is balanced, and so (−V ) ∩ [X \ (Ω1 − Ω2 )] ≠ ∅. Then choose a ∈ V with

−a ∈ X \ (Ω1 − Ω2 ),

and hence (Ω1 + a) ∩ Ω2 = ∅. This verifies that the convex sets Ω1 and Ω2
form an extremal system.
For the second part of (a), suppose on the contrary that int(Ω1 ) ∩ Ω2 ≠ ∅.
Then there exists a vector x̂ ∈ int(Ω1 ) with x̂ ∈ Ω2 . We can always choose a
balanced neighborhood V of the origin such that x̂ + V ⊂ Ω1 . For any vector
a ∈ V we have −a ∈ V and x̂ − a ∈ Ω1 . Hence (a + Ω1 ) ∩ Ω2 ≠ ∅ for every
a ∈ V , which is a clear contradiction.
To verify (b), observe that if Ω1 and Ω2 form an extremal system, then 0 ∉
int(Ω1 − Ω2 ) by (a). The assumption in (b) on the solidness of the difference
Ω1 − Ω2 allows us to use the convex separation theorem, which yields (3.6).
The last statement of (b) is justified in Proposition 3.5.
To prove (c), suppose that (3.6) holds, which gives us v ∈ X with ⟨x∗ , v⟩ >
0. Fix any neighborhood V of the origin. Since V is always absorbing, we can
select k ∈ N so large that a := −v/k ∈ V . Let us show that (3.4) is satisfied
with this vector a. Indeed, the negation of this means that there exists x̂ ∈ Ω2
with x̂ − a ∈ Ω1 . By the separation property from (3.6) we have

⟨x∗ , x̂ − a⟩ ≤ sup_{x∈Ω1} ⟨x∗ , x⟩ ≤ inf_{x∈Ω2} ⟨x∗ , x⟩ ≤ ⟨x∗ , x̂⟩.

On the other hand, the construction of the vector a tells us that

⟨x∗ , x̂⟩ − ⟨x∗ , a⟩ = ⟨x∗ , x̂⟩ + (1/k)⟨x∗ , v⟩ ≤ ⟨x∗ , x̂⟩.

This shows that ⟨x∗ , v⟩ ≤ 0, which contradicts the choice of v. □

Next we present two direct consequences of Theorem 3.7.

Corollary 3.8 Let Ω1 and Ω2 be nonempty convex sets in a topological vector
space such that their difference Ω1 − Ω2 has a nonempty interior. Then the
following properties are equivalent:
(a) Ω1 and Ω2 form an extremal system.
(b) Ω1 and Ω2 can be separated by a closed hyperplane.

Corollary 3.9 Let Ω be a nonempty convex subset of X with int(Ω) ≠ ∅, and
let x̄ ∈ Ω. Then the sets Ω and {x̄} form an extremal system if and only if
we have x̄ ∈ bd(Ω).
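Corollary 3.9 admits a direct numeric check in a hypothetical setting of ours (not from the book): Ω is the closed unit disk in R2 and x̄ = (1, 0) is a boundary point; arbitrarily small translations push x̄ out of Ω, while no comparably small translation does so for an interior point.

```python
import math

# Hypothetical illustration of Corollary 3.9 (ours): Omega is the closed
# unit disk in R^2 and xbar = (1, 0) lies on its boundary.  For every
# delta > 0 the vector a = (-delta/2, 0) lies in B(0; delta), yet
# xbar - a = (1 + delta/2, 0) leaves Omega, i.e. xbar is not in Omega + a,
# so Omega and {xbar} form an extremal system in the sense of Definition 3.6.

def in_disk(p):
    return math.hypot(p[0], p[1]) <= 1.0

xbar = (1.0, 0.0)
for delta in (1.0, 0.1, 1e-3, 1e-6):
    a = (-delta / 2, 0.0)
    shifted = (xbar[0] - a[0], xbar[1] - a[1])   # xbar in Omega+a iff this is in Omega
    print(abs(a[0]) < delta and not in_disk(shifted))  # True for every delta

# By contrast, the interior point (0, 0) stays inside under small shifts:
print(in_disk((0.0 - 1e-3, 0.0)))  # True
```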

Further results on the extremality of convex sets, its characterizations,
and relationships with convex separation in Banach spaces will be given in
Chapter 5; see Subsection 5.1.2.
3.1.3 Normal Cone Intersection Rule in Topological Vector Spaces

In this subsection we apply the set extremality and convex extremal princi-
ple of Theorem 3.7 to obtain the normal cone intersection rule under differ-
ent qualification conditions for convex sets in topological vector spaces. This
intersection rule is crucial for deriving major rules of coderivative and subdif-
ferential calculus by using the geometric approach. The proof of the following
fundamental theorem under the classical qualification condition in topological
vector spaces is based on set extremality and convex separation (Figure 3.3).

Fig. 3.3. Theorem 3.10

Theorem 3.10 Let Ω1 and Ω2 be nonempty convex sets in a topological vector
space X. Then we always have the inclusion

N (x̄; Ω1 ∩ Ω2 ) ⊃ N (x̄; Ω1 ) + N (x̄; Ω2 ) for all x̄ ∈ Ω1 ∩ Ω2 . (3.7)

Furthermore, the normal cone intersection rule

N (x̄; Ω1 ∩ Ω2 ) = N (x̄; Ω1 ) + N (x̄; Ω2 ) for all x̄ ∈ Ω1 ∩ Ω2 (3.8)

holds under the fulfillment of the qualification conditions:

either int(Ω1 ) ∩ Ω2 ≠ ∅, or Ω1 ∩ int(Ω2 ) ≠ ∅. (3.9)

Proof. Inclusion (3.7) can be easily checked by definition. To verify (3.8),
consider for definiteness the case where int(Ω1 ) ∩ Ω2 ≠ ∅. Fixing any x∗ ∈
N (x̄; Ω1 ∩ Ω2 ), we have by definition that

⟨x∗ , x − x̄⟩ ≤ 0 whenever x ∈ Ω1 ∩ Ω2 .

Define further the convex sets

Θ1 := Ω1 × [0, ∞) and Θ2 := {(x, μ) ∈ X × R | x ∈ Ω2 , μ ≤ ⟨x∗ , x − x̄⟩}.

It follows from the constructions of Θ1 and Θ2 that for any α > 0 we get

(Θ1 + (0, α)) ∩ Θ2 = ∅,

and thus these sets form an extremal system in the sense of Definition 3.6.
To proceed, observe first that int(Θ1 ) ≠ ∅, and so the set Θ1 − Θ2 is solid.
Applying Theorem 3.7 to the sets Θ1 and Θ2 gives us y ∗ ∈ X ∗ and γ ∈ R
with (y ∗ , γ) ≠ (0, 0) and

⟨y ∗ , x⟩ − γλ1 ≤ ⟨y ∗ , y⟩ − γλ2 whenever (x, λ1 ) ∈ Θ1 , (y, λ2 ) ∈ Θ2 . (3.10)

Using (3.10) with (x̄, 1) ∈ Θ1 and (x̄, 0) ∈ Θ2 yields γ ≥ 0. If γ = 0, then

⟨y ∗ , x⟩ ≤ ⟨y ∗ , y⟩ for all x ∈ Ω1 , y ∈ Ω2 .

Since int(Ω1 ) ∩ Ω2 ≠ ∅, this readily gives us y ∗ = 0. This contradiction shows
that γ > 0. Employing now (3.10) with (x, 0) ∈ Θ1 for any x ∈ Ω1 and with
(x̄, 0) ∈ Θ2 tells us that

⟨y ∗ , x⟩ ≤ ⟨y ∗ , x̄⟩ for all x ∈ Ω1 , and so y ∗ ∈ N (x̄; Ω1 ).

Using (3.10) with (x̄, 0) ∈ Θ1 and (y, ⟨x∗ , y − x̄⟩) ∈ Θ2 for y ∈ Ω2 shows that

⟨y ∗ , x̄⟩ ≤ ⟨y ∗ , y⟩ − γ⟨x∗ , y − x̄⟩ for all y ∈ Ω2 .

Dividing both sides of the obtained inequality by γ > 0, we arrive at

⟨x∗ − y ∗ /γ, y − x̄⟩ ≤ 0 for all y ∈ Ω2 ,

which verifies by (3.1) the fulfillment of the inclusions

x∗ ∈ y ∗ /γ + N (x̄; Ω2 ) ⊂ N (x̄; Ω1 ) + N (x̄; Ω2 )

and hence proves that N (x̄; Ω1 ∩ Ω2 ) ⊂ N (x̄; Ω1 ) + N (x̄; Ω2 ). □
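A small hypothetical example of ours makes the intersection rule concrete: take the half-planes Ω1 = {x : x2 ≤ x1 } and Ω2 = {x : x2 ≤ −x1 } in R2 with x̄ = 0, where (3.9) holds. Then N (0; Ω1 ) = {s(−1, 1) : s ≥ 0}, N (0; Ω2 ) = {t(1, 1) : t ≥ 0}, and the normal cone to the intersection at the origin is {(a, b) : b ≥ |a|}, which is exactly the sum of the two cones.

```python
# Hypothetical numeric check of (3.8) (ours, not the book's): decompose a
# candidate x* in N(0; Omega1 ∩ Omega2) = {(a,b) : b >= |a|} as
# x* = s(-1, 1) + t(1, 1) with s, t >= 0, i.e. as a sum of elements of
# N(0; Omega1) and N(0; Omega2) for the two half-planes described above.

def decompose(xstar):
    """Return (s, t) with x* = s(-1,1) + t(1,1), or None if x* is not in the sum."""
    a, b = xstar
    s, t = (b - a) / 2, (b + a) / 2   # solve -s + t = a, s + t = b
    return (s, t) if s >= 0 and t >= 0 else None

print(decompose((0.5, 2.0)))   # inside the cone b >= |a|: (0.75, 1.25)
print(decompose((2.0, 0.5)))   # outside the cone: None
```

The closed-form solve works because (−1, 1) and (1, 1) are linearly independent; the sign constraints s, t ≥ 0 are what encode membership in the two normal cones.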

Next we show that the usage of the set extremality device implemented in
the proof of Theorem 3.10 allows us to obtain the normal cone intersection
rule in topological vector spaces under a refined qualification condition that
is weaker than the standard one (3.9) in any normed space and may strictly
improve the latter even in finite dimensions; see more discussions below (Fig-
ure 3.4).
Recall that a subset Ω of a topological vector space X is bounded if for any
neighborhood V of the origin there exists γ > 0 such that we have Ω ⊂ αV
whenever |α| > γ.
Fig. 3.4. Proof of Theorem 3.10

Theorem 3.11 Let Ω1 and Ω2 be nonempty convex subsets of a topological
vector space X, and let x̄ ∈ Ω1 ∩ Ω2 . Assume that there exists a convex
neighborhood V of x̄ such that Ω2 ∩ V is bounded and that

0 ∈ int(Ω1 − (Ω2 ∩ V )). (3.11)

Then we have the normal cone intersection rule

N (x̄; Ω1 ∩ Ω2 ) = N (x̄; Ω1 ) + N (x̄; Ω2 ). (3.12)

Proof. We need to verify the inclusion “⊂” in (3.12) under the fulfillment
of the qualification condition (3.11), which is called the bounded extremality
condition at x̄. We will mainly follow the proof lines of Theorem 3.10 with the
corresponding modifications due to the usage of (3.11). Denote A := Ω1 and
B := Ω2 ∩ V and then observe that 0 ∈ int(A − B) and B is bounded. Fixing
an arbitrary normal x∗ ∈ N (x̄; A ∩ B), we get by (3.1) that ⟨x∗ , x − x̄⟩ ≤ 0
for all x ∈ A ∩ B. Consider the convex sets

Θ1 := A × [0, ∞), Θ2 := {(x, μ) ∈ X × R | x ∈ B, μ ≤ ⟨x∗ , x − x̄⟩}. (3.13)

Following the proof of Theorem 3.10, we observe that the sets Θ1 and Θ2
form an extremal system. Next let us check that the set Θ1 − Θ2 is solid, i.e.,
int(Θ1 − Θ2 ) ≠ ∅. Take a neighborhood U of the origin such that U ⊂ A − B.
The boundedness of the set B allows us to choose λ̄ ∈ R satisfying

λ̄ ≥ sup_{x∈B} ⟨−x∗ , x − x̄⟩. (3.14)

Then we get int(Θ1 − Θ2 ) ≠ ∅ by showing that U × (λ̄, ∞) ⊂ Θ1 − Θ2 . To verify
the latter, fix any (x, λ) ∈ U × (λ̄, ∞), for which we clearly have x ∈ U ⊂ A − B
and λ > λ̄; therefore x = w1 − w2 with some w1 ∈ A and w2 ∈ B. This yields

(x, λ) = (w1 , λ − λ̄) − (w2 , −λ̄).

Further, it follows from λ − λ̄ > 0 that (w1 , λ − λ̄) ∈ Θ1 , and we deduce from
(3.13) and (3.14) that (w2 , −λ̄) ∈ Θ2 , which shows that int(Θ1 − Θ2 ) ≠ ∅.
Applying Theorem 3.7(b) to the sets Θ1 and Θ2 gives us y ∗ ∈ X ∗ and γ ∈ R
with (y ∗ , γ) ≠ (0, 0) such that

⟨y ∗ , x⟩ − γλ1 ≤ ⟨y ∗ , y⟩ − γλ2 for all (x, λ1 ) ∈ Θ1 , (y, λ2 ) ∈ Θ2 . (3.15)

Using (3.15) with (x̄, 1) ∈ Θ1 , (x̄, 0) ∈ Θ2 implies that γ ≥ 0. If γ = 0, then

⟨y ∗ , x⟩ ≤ ⟨y ∗ , y⟩ for all x ∈ A, y ∈ B.

Since U ⊂ A − B, it yields y ∗ = 0, a contradiction, which shows that γ > 0.
Employing (3.15) with (x, 0) ∈ Θ1 for x ∈ A and (x̄, 0) ∈ Θ2 tells us that

⟨y ∗ , x⟩ ≤ ⟨y ∗ , x̄⟩ for all x ∈ A, and so y ∗ ∈ N (x̄; A).

Using (3.15) with (x̄, 0) ∈ Θ1 and (y, ⟨x∗ , y − x̄⟩) ∈ Θ2 for y ∈ B yields

⟨y ∗ , x̄⟩ ≤ ⟨y ∗ , y⟩ − γ⟨x∗ , y − x̄⟩ for all y ∈ B.

Dividing both sides of the obtained inequality by γ > 0, we arrive at

⟨x∗ − y ∗ /γ, y − x̄⟩ ≤ 0 for all y ∈ B,

which verifies by (3.1) the fulfillment of the inclusions

x∗ ∈ y ∗ /γ + N (x̄; B) ⊂ N (x̄; A) + N (x̄; B),

and hence N (x̄; A ∩ B) ⊂ N (x̄; A) + N (x̄; B). The opposite inclusion is trivial,
and so we get N (x̄; A ∩ B) = N (x̄; A) + N (x̄; B). Since N (x̄; A ∩ B) = N (x̄; Ω1 ∩
Ω2 ) and N (x̄; B) = N (x̄; Ω2 ), we have (3.12) and complete the proof. □

The next remark shows that the qualification condition (3.11) holds in any
normed space under the fulfillment of the standard interiority condition from
Theorem 3.10, and the latter may fail even in R2 while (3.11) is satisfied.

Remark 3.12 Let X be a normed space, and let Ω1 , Ω2 be convex subsets of
X with x̄ ∈ Ω1 ∩ Ω2 . Assume that Ω1 ∩ int(Ω2 ) ≠ ∅ and select u ∈ Ω1 and γ > 0
such that u + γB ⊂ Ω2 . Then we choose r > 0 with u + γB ⊂ Ω2 ∩ B(x̄; r). Thus
γB ⊂ Ω1 − (Ω2 ∩ B(x̄; r)), and so 0 ∈ int(Ω1 − (Ω2 ∩ V )), where V := B(x̄; r).
To show that (3.11) is strictly weaker than the condition Ω1 ∩ int(Ω2 ) ≠ ∅,
construct the convex subsets of R2 by

Ω1 := R × {0} and Ω2 := {0} × R

for which Ω1 ∩ int(Ω2 ) = ∅ and int(Ω1 ) ∩ Ω2 = ∅, while (3.11) holds for
x̄ = (0, 0), which is the only point in the set Ω1 ∩ Ω2 .
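The two-axes example is simple enough to verify in a few lines; the sketch below is our own check, not part of the text. With V = B(0; 1), the difference Ω1 − (Ω2 ∩ V ) is the horizontal strip R × [−1, 1], which contains a ball around the origin, so (3.11) holds even though both interiority conditions in (3.9) fail.

```python
# Our numeric check of the example in Remark 3.12: Omega1 = R x {0},
# Omega2 = {0} x R, V = B(0; 1).  A point p belongs to
# Omega1 - (Omega2 ∩ V) iff p = (u1, 0) - (0, v2) with |v2| <= 1,
# i.e. iff |p2| <= 1, so the strip covers a neighborhood of 0.

def in_difference(p, r=1.0):
    return abs(p[1]) <= r

grid = [(-0.4 + 0.1 * i, -0.4 + 0.1 * j) for i in range(9) for j in range(9)]
print(all(in_difference(p) for p in grid))  # grid inside B(0; 0.5): True
print(in_difference((0.0, 1.5)))            # outside the strip: False

# Intersection rule (3.12) at the origin: Omega1 ∩ Omega2 = {0}, so the
# left-hand side is all of R^2, while the right-hand side is
# ({0} x R) + (R x {0}) = R^2 as well, e.g. (a, b) = (a, 0) + (0, b).
a, b = 3.0, -7.0
print((a, b) == (a + 0.0, 0.0 + b))  # True
```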
Observe also that the bounded extremality condition (3.11) at any x̄ ∈
Ω1 ∩ Ω2 is implied by the difference interiority condition 0 ∈ int(Ω1 − Ω2 ) in
arbitrary topological vector spaces provided that the set Ω2 is bounded; see
more in Exercise 3.97.
Now we show that in the case of reflexive Banach spaces, the bounded
extremality condition (3.11) at any x̄ ∈ Ω1 ∩ Ω2 is implied by the aforemen-
tioned difference interiority condition whenever the sets Ω1 and Ω2 are closed
and convex without any boundedness assumption. Note that more compre-
hensive calculus rules in Banach spaces will be established in Chapter 4 by
using other variational techniques.

Theorem 3.13 Let X be a reflexive Banach space, and let Ω1 , Ω2 ⊂ X be
closed and convex sets. The bounded extremality condition (3.11), and hence
the normal cone intersection rule (3.12), hold at any x̄ ∈ Ω1 ∩ Ω2 if

0 ∈ int(Ω1 − Ω2 ). (3.16)

Proof. Fix any number r > 0 and show that

0 ∈ core((Ω1 ∩ B(x̄; r)) − (Ω2 ∩ B(x̄; r))). (3.17)

Assumption (3.16) allows us to find γ > 0 such that γB ⊂ Ω1 − Ω2 . For any
x ∈ X denote u := γx/(∥x∥ + 1) ∈ γB and get u = w1 − w2 with wi ∈ Ωi for
i = 1, 2. Hence there is a constant γ̄ > 0 depending on x and r for which

t max{∥w1 − x̄∥, ∥w2 − x̄∥} < r whenever 0 < t < γ̄.

This readily justifies the relationships

tu = tw1 − tw2 = (x̄ + t(w1 − x̄)) − (x̄ + t(w2 − x̄)) ∈ (Ω1 ∩ B(x̄; r)) − (Ω2 ∩ B(x̄; r))

for all 0 < t < γ̄ and thus establishes (3.17) by Proposition 2.17. Since the
space X is reflexive and the sets Ωi ∩ B(x̄; r), i = 1, 2, are closed and bounded
in X, they are weakly sequentially compact in this space. This implies that
their difference (Ω1 ∩ B(x̄; r)) − (Ω2 ∩ B(x̄; r)) is closed in X. Then it follows
from Theorem 2.24 that

0 ∈ core((Ω1 ∩ B(x̄; r)) − (Ω2 ∩ B(x̄; r))) = int((Ω1 ∩ B(x̄; r)) − (Ω2 ∩ B(x̄; r)))
⊂ int(Ω1 − (Ω2 ∩ B(x̄; r))),

which verifies (3.11) and thus completes the proof of the theorem. □

In the next subsection we significantly improve all the qualification con-
ditions above in the case of finitely many convex sets in finite-dimensional
spaces. Furthermore, Subsection 4.2.2 presents a more advanced qualification
condition for the normal cone intersection rule in the case of closed and con-
vex subsets of Banach spaces, which is derived from calculating the support
function to set intersections.
3.1.4 Normal Cone Intersection Rule in Finite Dimensions

First we present a useful result on representing the relative interior of convex
set intersections in finite-dimensional spaces.

Lemma 3.14 Let Ωi , i = 1, . . . , m ≥ 2, be convex subsets of Rn
satisfying the condition

⋂_{i=1}^m ri(Ωi ) ≠ ∅. (3.18)

Then we have the representation

ri(⋂_{i=1}^m Ωi ) = ⋂_{i=1}^m ri(Ωi ). (3.19)

Proof. Arguing by induction, let us begin with considering the case of m = 2. Pick x ∈ ri(Ω1) ∩ ri(Ω2) and find γ > 0 such that

B(x; γ) ∩ aff(Ω1) ⊂ Ω1 and B(x; γ) ∩ aff(Ω2) ⊂ Ω2,

which implies therefore that

B(x; γ) ∩ [aff(Ω1) ∩ aff(Ω2)] ⊂ Ω1 ∩ Ω2.

It is easy to see that aff(Ω1 ∩ Ω2) ⊂ aff(Ω1) ∩ aff(Ω2), and hence

B(x; γ) ∩ aff(Ω1 ∩ Ω2) ⊂ Ω1 ∩ Ω2.

Thus we get x ∈ ri(Ω1 ∩ Ω2), which justifies that ri(Ω1) ∩ ri(Ω2) ⊂ ri(Ω1 ∩ Ω2).

To prove the opposite inclusion in (3.19) for m = 2, observe that

cl(Ω1 ∩ Ω2) = cl(Ω1) ∩ cl(Ω2)   (3.20)

for any convex sets Ω1, Ω2 with ri(Ω1) ∩ ri(Ω2) ≠ ∅. Indeed, pick x ∈ cl(Ω1) ∩ cl(Ω2) and x̂ ∈ ri(Ω1) ∩ ri(Ω2), and observe that xk := k⁻¹x̂ + (1 − k⁻¹)x → x as k → ∞. Then Theorem 2.83(b) tells us that xk ∈ Ω1 ∩ Ω2 for large k ∈ N, and hence x ∈ cl(Ω1 ∩ Ω2), which justifies the inclusion "⊃" in (3.20). The opposite inclusion "⊂" therein obviously holds even for nonconvex sets. Now using (3.20) and Theorem 2.84(a), based on convex separation, gives us

cl(ri(Ω1) ∩ ri(Ω2)) = cl(ri(Ω1)) ∩ cl(ri(Ω2)) = cl(Ω1) ∩ cl(Ω2) = cl(Ω1 ∩ Ω2).

Taking the relative interior of both sides of these equalities and applying Theorem 2.84(b) allow us to conclude that

ri(Ω1 ∩ Ω2) = ri(ri(Ω1) ∩ ri(Ω2)) ⊂ ri(Ω1) ∩ ri(Ω2).
This justifies representation (3.19) for m = 2.
To verify (3.19) under the fulfillment of (3.18) in the general case of m > 2, we proceed by induction and assume that the result holds for m − 1 sets. Considering m sets Ωi, represent their intersection as

⋂_{i=1}^{m} Ωi = Ω ∩ Ωm with Ω := ⋂_{i=1}^{m−1} Ωi.   (3.21)

Then we have ri(Ω) ∩ ri(Ωm) = ⋂_{i=1}^{m} ri(Ωi) ≠ ∅ by the imposed condition in (3.18). This allows us to employ the obtained result for the two sets Ω and Ωm and thus arrive at the claimed conclusion (3.19) for m sets. □
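The m = 2 case of Lemma 3.14 can be checked numerically in the simplest setting of closed intervals on the real line, where ri([a, b]) = (a, b) whenever a < b. The following sketch is illustrative only; the particular intervals are arbitrary choices, not data from the text.

```python
# Numerical sketch of Lemma 3.14 for two closed intervals in R, where
# ri([a, b]) = (a, b) when a < b.

def in_ri(I, x):
    # relative interior of [lo, hi]: the open interval (lo, hi), or the
    # point itself when the interval degenerates to a singleton
    lo, hi = I
    return lo < x < hi if lo < hi else x == lo

def intersect(I, J):
    # intersection of two closed intervals given as (lo, hi) pairs
    lo, hi = max(I[0], J[0]), min(I[1], J[1])
    return (lo, hi) if lo <= hi else None

Omega1, Omega2 = (0.0, 2.0), (1.0, 3.0)   # ri(Omega1) ∩ ri(Omega2) = (1, 2) ≠ ∅
cap = intersect(Omega1, Omega2)            # Omega1 ∩ Omega2 = [1, 2]

grid = [i / 100 for i in range(-100, 401)]             # sample points of R
lhs = {x for x in grid if in_ri(cap, x)}               # ri(Omega1 ∩ Omega2)
rhs = {x for x in grid if in_ri(Omega1, x) and in_ri(Omega2, x)}
```

On the sampled grid the two sides of (3.19) coincide, while the endpoints 1 and 2 belong to the intersection but not to its relative interior.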

Now we are ready to derive the major representation of the normal cone to intersections of convex sets. Note that the proof of this result, as well as the subsequent calculus rules for functions and set-valued mappings, follows the geometric pattern of variational analysis as in [228], while the specific features of convexity allow us to essentially simplify the proof and to avoid the closedness requirement on sets and the corresponding lower semicontinuity assumptions on functions in subdifferential calculus. Furthermore, the developed geometric approach works in the convex setting under the corresponding relative interior qualification conditions in Rn, which are weaker than the qualification conditions employed in [228, 237]; see below.

Theorem 3.15 Let Ω1, . . . , Ωm ⊂ Rn with m ≥ 2 be convex sets satisfying the relative interior qualification condition

⋂_{i=1}^{m} ri(Ωi) ≠ ∅.   (3.22)

Then we have the intersection rule

N(x̄; ⋂_{i=1}^{m} Ωi) = ∑_{i=1}^{m} N(x̄; Ωi) for all x̄ ∈ ⋂_{i=1}^{m} Ωi.   (3.23)

Proof. Proceeding by induction, let us first prove the claimed statement for the case of m = 2. Since the inclusion "⊃" in (3.23) trivially holds even without imposing (3.22), the real task is to verify the opposite inclusion therein. Fixing x̄ ∈ Ω1 ∩ Ω2 and v̄ ∈ N(x̄; Ω1 ∩ Ω2), we get by the normal cone definition that

⟨v̄, x − x̄⟩ ≤ 0 for all x ∈ Ω1 ∩ Ω2.

Denote Θ1 := Ω1 × [0, ∞) and Θ2 := {(x, λ) | x ∈ Ω2, λ ≤ ⟨v̄, x − x̄⟩}. It follows from Corollary 2.95 that ri(Θ1) = ri(Ω1) × (0, ∞) and that

ri(Θ2) = {(x, λ) | x ∈ ri(Ω2), λ < ⟨v̄, x − x̄⟩}.
It is easy to check that ri(Θ1) ∩ ri(Θ2) = ∅; indeed, any (x, λ) in this intersection satisfies x ∈ ri(Ω1) ∩ ri(Ω2) ⊂ Ω1 ∩ Ω2, and hence 0 < λ < ⟨v̄, x − x̄⟩ ≤ 0, a contradiction. Then applying Theorem 2.92 to these convex sets in Rn+1 gives us a nonzero pair (w, γ) ∈ Rn × R such that

⟨w, x⟩ − γλ1 ≤ ⟨w, y⟩ − γλ2 for all (x, λ1) ∈ Θ1, (y, λ2) ∈ Θ2.   (3.24)

Moreover, there are (x̂, λ̂1) ∈ Θ1 and (ŷ, λ̂2) ∈ Θ2 satisfying

⟨w, x̂⟩ − γλ̂1 < ⟨w, ŷ⟩ − γλ̂2.

Following the proof of Theorem 3.10, we observe that γ ≥ 0. Let us now show by using (3.22) that γ > 0. Supposing again on the contrary that γ = 0, we get the conditions

⟨w, x⟩ ≤ ⟨w, y⟩ for all x ∈ Ω1, y ∈ Ω2,
⟨w, x̂⟩ < ⟨w, ŷ⟩ with x̂ ∈ Ω1, ŷ ∈ Ω2.

This means that Ω1 and Ω2 can be properly separated, which tells us by Theorem 2.92 that ri(Ω1) ∩ ri(Ω2) = ∅, a contradiction verifying that γ > 0.
Deduce from (3.24), by taking into account that (x, 0) ∈ Θ1 if x ∈ Ω1 and that (x̄, 0) ∈ Θ2, the inequality

⟨w, x⟩ ≤ ⟨w, x̄⟩ for all x ∈ Ω1.

This implies therefore that w ∈ N(x̄; Ω1), and so w/γ ∈ N(x̄; Ω1). Moreover, we get from (3.24), due to (x̄, 0) ∈ Θ1 and (y, α) ∈ Θ2 for all y ∈ Ω2 with α := ⟨v̄, y − x̄⟩, that

⟨w, x̄⟩ ≤ ⟨w, y⟩ − γ⟨v̄, y − x̄⟩ whenever y ∈ Ω2.

Dividing both sides therein by γ, we arrive at the relationship

⟨v̄ − w/γ, y − x̄⟩ ≤ 0 for all y ∈ Ω2,

and thus v̄ − w/γ ∈ N(x̄; Ω2). This gives us

v̄ ∈ w/γ + N(x̄; Ω2) ⊂ N(x̄; Ω1) + N(x̄; Ω2)

and therefore completes the proof of (3.23) in the case of m = 2.
Considering now the case of intersections of any finite number of sets, suppose by induction that the intersection rule (3.23) holds under (3.22) for m − 1 sets, and verify that it continues to hold for the intersection of m > 2 sets ⋂_{i=1}^{m} Ωi. Represent the latter intersection as Ω ∩ Ωm as in (3.21) and get from the imposed relative interior condition (3.22) and Lemma 3.14 that

ri(Ω) ∩ ri(Ωm) = ⋂_{i=1}^{m} ri(Ωi) ≠ ∅.
Applying the intersection rule (3.23) to the two sets Ω and Ωm and then employing the induction assumption for m − 1 sets give us the equalities

N(x̄; ⋂_{i=1}^{m} Ωi) = N(x̄; Ω ∩ Ωm) = N(x̄; Ω) + N(x̄; Ωm) = ∑_{i=1}^{m} N(x̄; Ωi),

which justify (3.23) for m sets and thus complete the proof. □

It is not difficult to observe that the relative interior assumption (3.22) is essential for the fulfillment of the intersection rule (3.23), as illustrated by the following two-dimensional example (Figure 3.5).
Example 3.16 Define the two convex sets on the plane by

Ω1 := {(x, λ) ∈ R2 | λ ≥ x²} and Ω2 := {(x, λ) ∈ R2 | λ ≤ −x²}.

Then for x̄ = (0, 0) ∈ Ω1 ∩ Ω2 we have

N(x̄; Ω1) = {0} × (−∞, 0], N(x̄; Ω2) = {0} × [0, ∞), N(x̄; Ω1 ∩ Ω2) = R2.

Thus N(x̄; Ω1) + N(x̄; Ω2) = {0} × R ≠ N(x̄; Ω1 ∩ Ω2), i.e., the intersection rule (3.23) fails. This does not contradict Theorem 3.15 since ri(Ω1) ∩ ri(Ω2) = ∅, and so (3.22) does not hold in this case.
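Example 3.16 can also be illustrated numerically by testing the defining inequality ⟨v, w − x̄⟩ ≤ 0 of the normal cone over sampled points w of each set. This is a sketch only; the sampling scheme and test vectors below are illustrative choices, not taken from the text.

```python
import numpy as np

# Sampled check of the normal cones in Example 3.16: a vector v passes
# the test for a set when <v, w - xbar> <= 0 holds at every sampled point w.

def passes_normal_test(v, samples, xbar, tol=1e-9):
    return all(float(np.dot(v, w - xbar)) <= tol for w in samples)

xbar = np.array([0.0, 0.0])
xs = np.linspace(-3.0, 3.0, 61)
# points of Omega1 = {(x, lam): lam >= x^2} and Omega2 = {(x, lam): lam <= -x^2}
omega1 = [np.array([x, x**2 + s]) for x in xs for s in (0.0, 1.0, 5.0)]
omega2 = [np.array([x, -x**2 - s]) for x in xs for s in (0.0, 1.0, 5.0)]

down  = np.array([0.0, -1.0])  # lies in N(xbar; Omega1) = {0} x (-inf, 0]
up    = np.array([0.0,  1.0])  # lies in N(xbar; Omega2) = {0} x [0, inf)
right = np.array([1.0,  0.0])  # normal to Omega1 ∩ Omega2 = {xbar}, yet normal
                               # to neither set alone, so it cannot belong to
                               # N(xbar; Omega1) + N(xbar; Omega2) = {0} x R
```

The vector `right` is (trivially) normal to the singleton intersection but fails the normal-cone test for each Ωi separately, exhibiting the failure of (3.23).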

Finally, let us compare Theorem 3.15, derived under the relative interior qualification condition (3.22), with the corresponding result obtained in [237, Corollary 2.16] for m = 2 under the so-called basic/normal qualification condition

N(x̄; Ω1) ∩ (−N(x̄; Ω2)) = {0},   (3.25)

which was introduced and applied earlier for establishing the intersection rule and related calculus results in nonconvex variational analysis; see, e.g., [228, 229, 317] and the references therein. Let us first show that (3.25) yields (3.22) in the general convex setting.

Corollary 3.17 Let Ω1, Ω2 ⊂ Rn be convex sets satisfying the basic qualification condition (3.25) at some x̄ ∈ Ω1 ∩ Ω2. Then we have

ri(Ω1) ∩ ri(Ω2) ≠ ∅,   (3.26)

and so the intersection rule (3.23) holds for these sets at x̄.

Proof. Suppose on the contrary that ri(Ω1) ∩ ri(Ω2) = ∅. Then Ω1, Ω2 are properly separated by Theorem 2.92, i.e., there is v ≠ 0 such that

⟨v, x⟩ ≤ ⟨v, y⟩ for all x ∈ Ω1, y ∈ Ω2.

Since x̄ ∈ Ω2, we have ⟨v, x − x̄⟩ ≤ 0 for all x ∈ Ω1. Hence v ∈ N(x̄; Ω1), and similarly −v ∈ N(x̄; Ω2). This tells us therefore that 0 ≠ v ∈ N(x̄; Ω1) ∩ (−N(x̄; Ω2)), which contradicts (3.25). □

Fig. 3.5. Relative interior condition

The next example shows that (3.26) may be strictly weaker than (3.25).

Example 3.18 Consider the two planar convex sets defined by Ω1 := R × {0} and Ω2 := (−∞, 0] × {0}. Condition (3.26) obviously holds and thus ensures the fulfillment of the normal cone intersection rule by Theorem 3.15. On the other hand, it is easy to check that

N(x̄; Ω1) = {0} × R and N(x̄; Ω2) = [0, ∞) × R with x̄ = (0, 0),

i.e., the normal qualification condition (3.25) fails. This shows that the result of [237, Corollary 2.16] is not applicable in this case.
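The gap between the two qualification conditions in Example 3.18 can be recorded as simple membership tests for the cones computed in the text; the particular witness vectors below are illustrative choices of mine, not part of the example.

```python
# Membership tests for Example 3.18 with xbar = (0, 0):
#   N(xbar; Omega1) = {0} x R   and   N(xbar; Omega2) = [0, inf) x R.

def in_N1(v):                      # {0} x R
    return v[0] == 0.0

def in_N2(v):                      # [0, inf) x R
    return v[0] >= 0.0

def in_ri_Omega1(p):               # ri(R x {0}) = R x {0}
    return p[1] == 0.0

def in_ri_Omega2(p):               # ri((-inf, 0] x {0}) = (-inf, 0) x {0}
    return p[0] < 0.0 and p[1] == 0.0

# Condition (3.25) fails: the nonzero vector (0, 1) lies in N(xbar; Omega1)
# while its negative lies in N(xbar; Omega2).
witness = (0.0, 1.0)
violates_325 = in_N1(witness) and in_N2((-witness[0], -witness[1]))

# Condition (3.26) holds: e.g. (-1, 0) is a common relative interior point.
point = (-1.0, 0.0)
satisfies_326 = in_ri_Omega1(point) and in_ri_Omega2(point)
```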

3.2 Coderivatives of Convex-Graph Mappings

This section is devoted to calculus rules for coderivatives of set-valued mappings by using the geometric approach that is mainly based on the normal cone intersection rule from Theorem 3.10. The coderivative concept has not been investigated in the standard framework of convex analysis, while it has been well recognized in extended settings of variational analysis; see, e.g., [228, 229, 317]. The results presented below are essentially different in various aspects from the coderivative calculus rules derived in general nonconvex frameworks as well as from their convex-graph specifications in the aforementioned references. After giving definitions and elementary properties, we proceed with coderivative calculus in topological vector spaces and then present its striking improvement in finite dimensions. Comprehensive coderivative calculus rules in Banach spaces are developed in Subsection 4.2.2 by using other tools of geometric analysis based on completeness.
3.2.1 Coderivative Definition and Elementary Properties

Given two topological vector spaces X and Y, consider a set-valued mapping (multifunction) F: X →→ Y that takes values F(x) ⊂ Y in the collection of all subsets of Y. If F(x) is a singleton for each x ∈ X, we have a single-valued mapping and use the standard notation F: X → Y. Let us associate with any F: X →→ Y the two sets, its domain and graph, defined respectively by

dom(F) := {x ∈ X | F(x) ≠ ∅} and gph(F) := {(x, y) ∈ X × Y | y ∈ F(x)}.

We always assume that F is proper, i.e., dom(F) ≠ ∅.
Definition 3.19 Let F: X →→ Y be a proper set-valued mapping between topological vector spaces, and let (x̄, ȳ) ∈ gph(F). The coderivative of F at (x̄, ȳ) is a set-valued mapping D∗F(x̄, ȳ): Y∗ →→ X∗ defined by

D∗F(x̄, ȳ)(y∗) := {x∗ ∈ X∗ | (x∗, −y∗) ∈ N((x̄, ȳ); gph(F))}   (3.27)

for y∗ ∈ Y∗. We omit ȳ in the coderivative notation if F(x̄) is a singleton.

Note the following obvious property:

D∗F(x̄, ȳ)(λy∗) = λD∗F(x̄, ȳ)(y∗) whenever λ > 0 and y∗ ∈ Y∗,

which means that the coderivative mapping y∗ →→ D∗F(x̄, ȳ)(y∗) is positively homogeneous.
In this subsection we consider only set-valued mappings having convex graphs, referring to them as convex-graph mappings. The next two results follow directly from the definitions. The first proposition tells us that the coderivative is a set-valued extension of the classical notion of the adjoint linear operator. This is due to the fact that the normal cone to affine sets (and more generally to smooth manifolds) agrees with the usual normal space.

Proposition 3.20 Let F(x) := {Ax}, where A: X → Y is a linear continuous mapping between topological vector spaces, and let ȳ := Ax̄. Then we have the coderivative representation

D∗A(x̄, ȳ)(y∗) = {A∗y∗} for all x̄ ∈ X and y∗ ∈ Y∗

via the classical adjoint operator A∗: Y∗ → X∗.
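Proposition 3.20 admits a direct numerical check in finite dimensions: for F(x) = {Ax}, a pair (x∗, −y∗) is normal to gph(A) exactly when x∗ = Aᵀy∗, in which case the defining pairing vanishes identically. The matrix and test vectors below are arbitrary data of this sketch.

```python
import numpy as np

# Sketch of Proposition 3.20 for a linear map A: R^2 -> R^3.  On gph(A) the
# normal-cone inequality reads
#     <x*, x - xbar> - <y*, Ax - A xbar> <= 0   for all x,
# which holds (with equality) iff x* = A^T y*.

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
xbar = rng.standard_normal(2)
ystar = rng.standard_normal(3)

def pairing(xstar, x):
    return float(xstar @ (x - xbar) - ystar @ (A @ x - A @ xbar))

xstar = A.T @ ystar                                   # the adjoint candidate
points = list(rng.standard_normal((200, 2)))
points.append(xbar + np.array([1.0, 0.0]))            # deterministic probe point

max_adjoint = max(pairing(xstar, x) for x in points)
max_perturbed = max(pairing(xstar + np.array([1.0, 0.0]), x) for x in points)
```

The adjoint candidate keeps the pairing at (numerical) zero for every x, while a perturbed candidate makes it positive at some test point, so it is not a coderivative value.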
The second proposition explicitly calculates the coderivative of arbitrary convex-graph multifunctions. It immediately follows from the form of the normal cone to arbitrary convex sets.

Proposition 3.21 Let F: X →→ Y be a convex-graph mapping between topological vector spaces. Then for all (x̄, ȳ) ∈ gph(F) and y∗ ∈ Y∗ we have the coderivative representation

D∗F(x̄, ȳ)(y∗) = {x∗ ∈ X∗ | ⟨x∗, x̄⟩ − ⟨y∗, ȳ⟩ = max_{(x,y)∈gph(F)} [⟨x∗, x⟩ − ⟨y∗, y⟩]}.

3.2.2 Coderivative Calculus in Topological Vector Spaces

Let us start with the sum rule for coderivatives under several qualification conditions induced by the application of the normal cone intersection rule from Theorem 3.10. For simplicity of formulation we confine ourselves to the coderivative implementation of the standard qualification condition (3.9).

Given two set-valued mappings F1, F2: X →→ Y between topological vector spaces, consider their sum (F1 + F2): X →→ Y defined for each x ∈ X by

(F1 + F2)(x) = F1(x) + F2(x) := {y ∈ Y | ∃ yi ∈ Fi(x) with y = y1 + y2}.

If F1 and F2 have convex graphs, then their sum F1 + F2 has the same property, and we get dom(F1 + F2) = dom(F1) ∩ dom(F2). To formulate the coderivative sum rule for F1 + F2 at any (x̄, ȳ) ∈ gph(F1 + F2), define the set

S(x̄, ȳ) := {(ȳ1, ȳ2) ∈ Y × Y | ȳ = ȳ1 + ȳ2 with ȳi ∈ Fi(x̄), i = 1, 2}.   (3.28)

Theorem 3.22 Let F1, F2: X →→ Y be set-valued mappings between topological vector spaces. Assume that F1, F2 have convex graphs and that

int(gph(F1)) ≠ ∅ and int(dom(F1)) ∩ dom(F2) ≠ ∅.   (3.29)

Then for any (x̄, ȳ) ∈ gph(F1 + F2), (ȳ1, ȳ2) ∈ S(x̄, ȳ), and y∗ ∈ Y∗ we have

D∗(F1 + F2)(x̄, ȳ)(y∗) = D∗F1(x̄, ȳ1)(y∗) + D∗F2(x̄, ȳ2)(y∗).   (3.30)

Proof. Let y∗ ∈ Y∗, (x̄, ȳ) ∈ gph(F1 + F2), and (ȳ1, ȳ2) ∈ S(x̄, ȳ) be fixed for the entire proof. To verify first the inclusion "⊂" in (3.30), pick any x∗ ∈ D∗(F1 + F2)(x̄, ȳ)(y∗) and get by Definition 3.19 that (x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 + F2)). Define convex sets Ω1 and Ω2 in the product space X × Y × Y as follows:

Ω1 := {(x, y1, y2) | y1 ∈ F1(x)}, Ω2 := {(x, y1, y2) | y2 ∈ F2(x)}.   (3.31)

It is easy to check that (x∗, −y∗, −y∗) ∈ N((x̄, ȳ1, ȳ2); Ω1 ∩ Ω2). By the set constructions in (3.31) we have the representation

Ω1 − Ω2 = [dom(F1) − dom(F2)] × Y × Y.   (3.32)

Imposing now the assumptions in (3.29) yields int(Ω1) ∩ Ω2 ≠ ∅ for the sets Ω1 and Ω2 in (3.31). Indeed, by (3.32) we have

int(Ω1 − Ω2) = int[dom(F1) − dom(F2)] × Y × Y.

It readily follows from the qualification condition (3.29) that 0 ∈ int(dom(F1) − dom(F2)), and thus 0 ∈ int(Ω1 − Ω2). We deduce therefore from (3.29) that int(Ω1) ≠ ∅. This implies in turn that int(Ω1) ∩ Ω2 ≠ ∅, since the negation of it leads us to a contradiction by the separation theorem. Applying further Theorem 3.10 to the sets Ω1 and Ω2 in (3.31) under the standard qualification condition (3.9) gives us the inclusion

(x∗, −y∗, −y∗) ∈ N((x̄, ȳ1, ȳ2); Ω1) + N((x̄, ȳ1, ȳ2); Ω2).

This yields the decomposition (x∗, −y∗, −y∗) = (x1∗, −y∗, 0) + (x2∗, 0, −y∗) with some elements x1∗, x2∗ satisfying

(x1∗, −y∗) ∈ N((x̄, ȳ1); gph(F1)) and (x2∗, −y∗) ∈ N((x̄, ȳ2); gph(F2)).

Thus we arrive at the representation

x∗ = x1∗ + x2∗ ∈ D∗F1(x̄, ȳ1)(y∗) + D∗F2(x̄, ȳ2)(y∗),

which justifies the inclusion "⊂" in (3.30).

To verify the opposite inclusion in (3.30) without imposing the qualification condition (3.29), pick x∗ ∈ D∗F1(x̄, ȳ1)(y∗) + D∗F2(x̄, ȳ2)(y∗) and then find x1∗ ∈ D∗F1(x̄, ȳ1)(y∗) and x2∗ ∈ D∗F2(x̄, ȳ2)(y∗) with x∗ = x1∗ + x2∗. This tells us by the coderivative definition that

⟨x1∗, x1 − x̄⟩ + ⟨x2∗, x2 − x̄⟩ − ⟨y∗, y1 − ȳ1⟩ − ⟨y∗, y2 − ȳ2⟩ ≤ 0   (3.33)

whenever yi ∈ Fi(xi). Take any (x, y) ∈ gph(F1 + F2) and write y = y1 + y2, where y1 ∈ F1(x) and y2 ∈ F2(x). Plugging x1 = x2 := x into (3.33) gives us

⟨x∗, x − x̄⟩ − ⟨y∗, y1 − ȳ1⟩ − ⟨y∗, y2 − ȳ2⟩ ≤ 0.

Hence we arrive at the inequality

⟨x∗, x − x̄⟩ − ⟨y∗, y − ȳ⟩ ≤ 0,

which yields x∗ ∈ D∗(F1 + F2)(x̄, ȳ)(y∗) and thus completes the proof. □

Now we aim at deriving a precise coderivative chain rule for compositions of general set-valued mappings between topological vector spaces. Given arbitrary mappings F: X →→ Y and G: Y →→ Z, recall that their composition (G ∘ F): X →→ Z is defined by

(G ∘ F)(x) = ⋃_{y∈F(x)} G(y) := {z ∈ Z | ∃ y ∈ F(x) with z ∈ G(y)}, x ∈ X.

It is easy to see that if F and G have convex graphs, then G ∘ F also has this property. In the next theorem we use the notation

T(x̄, z̄) := F(x̄) ∩ G⁻¹(z̄) and rge(F) := ⋃_{x∈X} F(x).

Theorem 3.23 Let F: X →→ Y and G: Y →→ Z be set-valued mappings between topological vector spaces. Assume that both F and G have convex graphs and impose the following qualification conditions:

either int(gph(F)) ≠ ∅ and int(rge(F)) ∩ dom(G) ≠ ∅,
or int(gph(F)) ≠ ∅ and rge(F) ∩ int(dom(G)) ≠ ∅.

Then for all (x̄, z̄) ∈ gph(G ∘ F), z∗ ∈ Z∗, and ȳ ∈ T(x̄, z̄) we have

D∗(G ∘ F)(x̄, z̄)(z∗) = (D∗F(x̄, ȳ) ∘ D∗G(ȳ, z̄))(z∗).   (3.34)

Proof. Let (x̄, z̄) ∈ gph(G ∘ F), z∗ ∈ Z∗, and ȳ ∈ T(x̄, z̄) be fixed. We first verify the inclusion "⊂" in (3.34). Pick x∗ ∈ D∗(G ∘ F)(x̄, z̄)(z∗) and get

⟨x∗, x − x̄⟩ − ⟨z∗, z − z̄⟩ ≤ 0 for all (x, z) ∈ gph(G ∘ F).

Define the following subsets of X × Y × Z:

Ω1 := {(x, y, z) | (x, y) ∈ gph(F)} and Ω2 := {(x, y, z) | (y, z) ∈ gph(G)}.

It is easy to see that (x∗, 0, −z∗) ∈ N((x̄, ȳ, z̄); Ω1 ∩ Ω2) and that

Ω1 − Ω2 = X × [rge(F) − dom(G)] × Z.   (3.35)

Similarly to the proof of Theorem 3.22, we conclude that the imposed qualification conditions imply int(Ω1) ∩ Ω2 ≠ ∅. Furthermore, deduce from the structures of Ω1, Ω2 the normal cone representations

N((x̄, ȳ, z̄); Ω1) = N((x̄, ȳ); gph(F)) × {0},
N((x̄, ȳ, z̄); Ω2) = {0} × N((ȳ, z̄); gph(G)).

This tells us that (x∗, 0, −z∗) = (x∗, −y∗, 0) + (0, y∗, −z∗) with

(x∗, −y∗) ∈ N((x̄, ȳ); gph(F)) and (y∗, −z∗) ∈ N((ȳ, z̄); gph(G)).

The latter means that y∗ ∈ D∗G(ȳ, z̄)(z∗) and x∗ ∈ D∗F(x̄, ȳ)(y∗), and so x∗ ∈ (D∗F(x̄, ȳ) ∘ D∗G(ȳ, z̄))(z∗), which justifies the inclusion "⊂" in (3.34).

To verify the opposite inclusion therein (without imposing any qualification condition), pick x∗ ∈ (D∗F(x̄, ȳ) ∘ D∗G(ȳ, z̄))(z∗) and find y∗ ∈ D∗G(ȳ, z̄)(z∗) such that x∗ ∈ D∗F(x̄, ȳ)(y∗). This tells us that

⟨x∗, x − x̄⟩ − ⟨y∗, y1 − ȳ⟩ ≤ 0 whenever y1 ∈ F(x),
⟨y∗, y2 − ȳ⟩ − ⟨z∗, z − z̄⟩ ≤ 0 whenever z ∈ G(y2).

Take any (x, z) ∈ gph(G ∘ F) and find y ∈ F(x) with z ∈ G(y). Summing up the two inequalities above with y1 = y2 := y, we get

⟨x∗, x − x̄⟩ − ⟨z∗, z − z̄⟩ ≤ 0,

which justifies x∗ ∈ D∗(G ∘ F)(x̄, z̄)(z∗) and thus completes the proof. □
It follows from the proof of Theorem 3.23 that the interiority assumption int(gph(F)) ≠ ∅ can alternatively be replaced by int(gph(G)) ≠ ∅.
The final result of this subsection provides a useful rule for representing
the coderivative of intersections of set-valued mappings. Again we derive it
geometrically from the normal cone intersection rule of Theorem 3.10 while
concentrating for simplicity on the application of the standard qualification
condition. Given two set-valued mappings F1 , F2 : X → → Y between topological
vector spaces, recall that their intersection (F1 ∩ F2 ) : X →
→ Y is defined by
(F1 ∩ F2 )(x) := F1 (x) ∩ F2 (x), x ∈ X.
It is easy to see that gph(F1 ∩ F2 ) = gph(F1 ) ∩ gph(F2 ), and so the convexity
of both sets gph(F1 ) and gph(F2 ) yields the convexity of gph(F1 ∩ F2 ). This
allows us to derive the following coderivative intersection rule.
Theorem 3.24 Let F1, F2: X →→ Y be set-valued mappings between topological vector spaces. Assume that both mappings F1, F2 have convex graphs and that the qualification conditions

either int(gph(F1)) ∩ gph(F2) ≠ ∅, or gph(F1) ∩ int(gph(F2)) ≠ ∅   (3.36)

are satisfied. Then for any ȳ ∈ (F1 ∩ F2)(x̄) and any y∗ ∈ Y∗ we have

D∗(F1 ∩ F2)(x̄, ȳ)(y∗) = ⋃_{y1∗+y2∗=y∗} [D∗F1(x̄, ȳ)(y1∗) + D∗F2(x̄, ȳ)(y2∗)].   (3.37)

Proof. First we verify the inclusion "⊂" in (3.37). For every ȳ ∈ (F1 ∩ F2)(x̄), y∗ ∈ Y∗, and x∗ ∈ D∗(F1 ∩ F2)(x̄, ȳ)(y∗) it follows that

(x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 ∩ F2)) = N((x̄, ȳ); gph(F1) ∩ gph(F2)).

Applying Theorem 3.10 under the qualification condition (3.9), which reduces to (3.36) in this case, tells us that

(x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 ∩ F2)) = N((x̄, ȳ); gph(F1)) + N((x̄, ȳ); gph(F2)).

Thus (x∗, −y∗) = (x1∗, −y1∗) + (x2∗, −y2∗) for some (xi∗, −yi∗) ∈ N((x̄, ȳ); gph(Fi)). Therefore x∗ ∈ D∗F1(x̄, ȳ)(y1∗) + D∗F2(x̄, ȳ)(y2∗) with y∗ = y1∗ + y2∗, which justifies the claimed inclusion "⊂" in the coderivative representation (3.37).
To verify the opposite inclusion in (3.37), take y1∗, y2∗ ∈ Y∗ with y1∗ + y2∗ = y∗. Picking now x∗ ∈ D∗F1(x̄, ȳ)(y1∗) + D∗F2(x̄, ȳ)(y2∗), we get x∗ = x1∗ + x2∗ for some x1∗ ∈ D∗F1(x̄, ȳ)(y1∗) and x2∗ ∈ D∗F2(x̄, ȳ)(y2∗). This shows that

(x∗, −y∗) = (x1∗, −y1∗) + (x2∗, −y2∗) ∈ N((x̄, ȳ); gph(F1)) + N((x̄, ȳ); gph(F2)) = N((x̄, ȳ); gph(F1 ∩ F2)),

and thus x∗ ∈ D∗(F1 ∩ F2)(x̄, ȳ)(y∗), which completes the proof. □

3.2.3 Coderivative Calculus in Finite Dimensions

In this subsection we present counterparts of the coderivative calculus rules from Subsection 3.2.2 for convex-graph set-valued mappings between finite-dimensional spaces. The corresponding calculus results are obtained in the same way as those above by replacing the application of the normal cone intersection rule from Theorem 3.10 in topological vector spaces with its finite-dimensional counterpart obtained in Theorem 3.15 under the relative interior qualification condition.

Here are the formulations of the coderivative calculus rules in finite-dimensional spaces with brief sketches of their proofs.

Theorem 3.25 Let F1, F2: Rn →→ Rm be convex-graph multifunctions with

ri(dom(F1)) ∩ ri(dom(F2)) ≠ ∅.

Then the coderivative sum rule (3.30) holds for any (x̄, ȳ) ∈ gph(F1 + F2), (ȳ1, ȳ2) ∈ S(x̄, ȳ), and y∗ ∈ Rm, where the set S(x̄, ȳ) is taken from (3.28).

Proof. Consider the sets Ω1, Ω2 from (3.31) and observe that

ri(Ω1) = {(x, y1, y2) ∈ Rn × Rm × Rm | (x, y1) ∈ ri(gph(F1))},
ri(Ω2) = {(x, y1, y2) ∈ Rn × Rm × Rm | (x, y2) ∈ ri(gph(F2))}.

Applying Theorem 2.94, we have

ri(Ω1) = {(x, y1, y2) ∈ Rn × Rm × Rm | x ∈ ri(dom(F1)), y1 ∈ ri(F1(x))},
ri(Ω2) = {(x, y1, y2) ∈ Rn × Rm × Rm | x ∈ ri(dom(F2)), y2 ∈ ri(F2(x))}.

Pick any x ∈ ri(dom(F1)) ∩ ri(dom(F2)) and any y1 ∈ ri(F1(x)), y2 ∈ ri(F2(x)). Then (x, y1, y2) ∈ ri(Ω1) ∩ ri(Ω2), and thus ri(Ω1) ∩ ri(Ω2) ≠ ∅. Now proceed as in the proof of Theorem 3.22 by applying the normal cone intersection rule from Theorem 3.15 to the sets Ω1, Ω2. □

Theorem 3.26 Let F: Rn →→ Rm and G: Rm →→ Rd be convex-graph multifunctions satisfying the qualification condition

ri(rge(F)) ∩ ri(dom(G)) ≠ ∅.   (3.38)

Then for any (x̄, z̄) ∈ gph(G ∘ F) and z∗ ∈ Rd we have the chain rule (3.34).
Proof. Consider the sets Ω1, Ω2 ⊂ Rn × Rm × Rd defined in the proof of Theorem 3.23. Using (3.35), we have the representation

ri(Ω1 − Ω2) = Rn × ri[rge(F) − dom(G)] × Rd.

It follows from (3.38), due to the definitions of the sets Ω1 and Ω2, that 0 ∈ ri(Ω1 − Ω2), and so ri(Ω1) ∩ ri(Ω2) ≠ ∅. It remains to apply Theorem 3.15 to the sets Ω1, Ω2 and then to proceed as in the proof of Theorem 3.23. □
Theorem 3.27 Let F1, F2: Rn →→ Rm be convex-graph multifunctions with

ri(gph(F1)) ∩ ri(gph(F2)) ≠ ∅.

Then the coderivative intersection rule (3.37) holds for any vectors (x̄, ȳ) ∈ gph(F1 ∩ F2) and y∗ ∈ Rm.

Proof. Apply Theorem 3.15 to the sets Ω1 := gph(F1) and Ω2 := gph(F2) while arguing as in the proof of Theorem 3.24. □

3.3 Subgradients of Convex Functions


In this section we consider extended-real-valued functions f: X → R̄ := (−∞, ∞] defined on topological vector spaces. Having in mind that convex functions are often nondifferentiable at domain points, their generalized differential properties are studied here via the underlying concept of the subdifferential, or the collection of subgradients. This notion of generalized derivative is among the most fundamental in analysis, with a variety of important applications. The revolutionary idea behind the subdifferential concept, which distinguishes it from other notions of generalized derivatives in mathematics, is its set-valuedness as an indication of nonsmoothness of a function around the reference point. This not only provides various advantages and flexibility, but also creates certain difficulties in developing subdifferential calculus and its applications. The major techniques of subdifferential analysis revolve around convex separation, which is associated in our geometric approach with set extremality and the extremal principle, as seen above in the proof of the normal cone intersection rule.

3.3.1 Basic Definitions and Examples

We start this subsection with defining the fundamental subdifferential notion for convex functions that is studied and applied throughout the entire book (Figure 3.6).

Definition 3.28 Let X be a topological vector space, and let f: X → R̄ be a convex function with x̄ ∈ dom(f). A dual element x∗ ∈ X∗ is called a subgradient of f at x̄ if we have

⟨x∗, x − x̄⟩ ≤ f(x) − f(x̄) for all x ∈ X.   (3.39)


The collection of all the subgradients of f at x̄ is called the subdifferential of the function at this point and is denoted by ∂f(x̄). We put ∂f(x̄) := ∅ whenever x̄ ∉ dom(f).

Fig. 3.6. Definition 3.28

It follows directly from (3.39) that the subdifferential ∂f(x̄) ⊂ X∗ is a convex set, which is closed in the weak∗ topology of X∗.
The next statement is a generalized version of the Fermat stationary rule
for convex extended-real-valued functions.

Proposition 3.29 Let f: X → R̄ be convex, and let x̄ ∈ dom(f). Then f attains its local (equivalently, global/absolute) minimum at x̄ if and only if 0 ∈ ∂f(x̄).

Proof. The very construction (3.39) immediately yields this property. Note that for convex functions the notions of local and global minimizers agree. □
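The Fermat rule of Proposition 3.29 can be checked on the simplest nonsmooth convex function f(x) = |x| on R, whose subdifferential is [−1, 1] at the origin. The grid-based test of inequality (3.39) below is an illustrative sketch, not part of the text.

```python
# Grid-based check of the subgradient inequality (3.39) and the Fermat rule
# (Proposition 3.29) for f(x) = |x| on R, where ∂f(0) = [-1, 1].

def is_subgradient(f, xbar, v, grid, tol=1e-9):
    # v ∈ ∂f(xbar) iff v*(x - xbar) <= f(x) - f(xbar) for all x
    return all(v * (x - xbar) <= f(x) - f(xbar) + tol for x in grid)

f = abs
grid = [i / 10 for i in range(-50, 51)]

# 0 ∈ ∂f(0), matching the fact that f attains its minimum at 0 ...
zero_subgrad_at_min = is_subgradient(f, 0.0, 0.0, grid)
# ... while 0 ∉ ∂f(1), since 1 is not a minimizer
zero_subgrad_elsewhere = is_subgradient(f, 1.0, 0.0, grid)
# boundary behavior of ∂f(0) = [-1, 1]
inside = is_subgradient(f, 0.0, 0.5, grid)
outside = is_subgradient(f, 0.0, 1.5, grid)
```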

Fix x∗ ∈ X∗ and define the functional (x∗, −1): X × R → R by

⟨(x∗, −1), (x, y)⟩ := ⟨x∗, x⟩ − y for all (x, y) ∈ X × R.

Directly from the definition we see that (x∗, −1) ∈ (X × R)∗.

The next proposition shows that the subdifferential ∂f(x̄) can be equivalently defined geometrically via normals to the epigraph of f at (x̄, f(x̄)) (Figure 3.7).

Proposition 3.30 Let X be a topological vector space. For any convex function f: X → R̄ and any x̄ ∈ dom(f) we have the representation

∂f(x̄) = {x∗ ∈ X∗ | (x∗, −1) ∈ N((x̄, f(x̄)); epi(f))}.   (3.40)
Fig. 3.7. Proposition 3.30

Proof. Fix x∗ ∈ ∂f(x̄) and (x, λ) ∈ epi(f). Since λ ≥ f(x), we deduce from (3.39) the upper estimates

⟨(x∗, −1), (x, λ) − (x̄, f(x̄))⟩ = ⟨x∗, x − x̄⟩ − (λ − f(x̄)) ≤ ⟨x∗, x − x̄⟩ − (f(x) − f(x̄)) ≤ 0.

This readily implies that (x∗, −1) ∈ N((x̄, f(x̄)); epi(f)).

To verify the opposite inclusion in (3.40), take an arbitrary x∗ ∈ X∗ with (x∗, −1) ∈ N((x̄, f(x̄)); epi(f)). For any x ∈ dom(f) we have (x, f(x)) ∈ epi(f). Therefore, it follows that

⟨x∗, x − x̄⟩ − (f(x) − f(x̄)) = ⟨(x∗, −1), (x, f(x)) − (x̄, f(x̄))⟩ ≤ 0,

which yields x∗ ∈ ∂f(x̄) and thus justifies representation (3.40). □
Proposition 3.30 tells us that subgradients of f at x̄ correspond to "nonhorizontal" normals to the epigraph epi(f). To complement them, we introduce the following collection of x∗ ∈ X∗ corresponding to horizontal normals to epi(f), which plays an independent role in the study of convex functions.

Definition 3.31 Let X be a topological vector space. Given a convex function f: X → R̄ and a point x̄ ∈ dom(f), we say that x∗ ∈ X∗ is a singular or horizon subgradient of f at x̄ if (x∗, 0) ∈ N((x̄, f(x̄)); epi(f)). The collection of such subgradients is called the singular/horizon subdifferential of f at x̄ and is denoted by

∂∞f(x̄) := {x∗ ∈ X∗ | (x∗, 0) ∈ N((x̄, f(x̄)); epi(f))}.   (3.41)

We put ∂∞f(x̄) := ∅ if x̄ ∉ dom(f).
204 3 CONVEX GENERALIZED DIFFERENTIATION

Remark 3.32 It follows directly from (3.40), (3.41), and the coderivative construction (3.27) that both the subdifferential and the singular subdifferential of f: X → R̄ at x̄ ∈ dom(f) can be expressed as

∂f(x̄) = D∗Ef(x̄, f(x̄))(1) and ∂∞f(x̄) = D∗Ef(x̄, f(x̄))(0)   (3.42)

via the coderivative of the epigraphical multifunction Ef: X →→ R, which is associated with the function f by the formula

Ef(x) := {α ∈ R | (x, α) ∈ epi(f)} = [f(x), ∞), x ∈ X.

There are deeper relationships between ∂f(x̄), ∂∞f(x̄), and the coderivative D∗f(x̄) of f itself provided that f is finite around x̄ ∈ dom(f). It is proved in [228, Theorem 1.80], even without the convexity of f, that

∂f(x̄) = D∗f(x̄)(1) and ∂∞f(x̄) ⊂ D∗f(x̄)(0)   (3.43)

provided that X is a Banach space and that f is continuous around x̄. Furthermore, the first relationship in (3.43) is proved in [229, Theorem 1.23] for functions f: Rn → R̄ that are merely l.s.c. around this point. Note, however, that in contrast to (3.42) the equality in (3.43) does not allow us to apply coderivative results for convex-graph mappings to the study of the subdifferential of f, since the graph of f is not convex unless f is an affine function.

To proceed further, observe similarly to the subdifferential ∂f(x̄) that the singular subdifferential (3.41) is a convex subset of X∗, which is closed in the weak∗ topology of the dual space. Furthermore, for any x̄ ∈ dom(f) the set ∂∞f(x̄) is a cone, and so it contains the origin. It is easy to observe (see below) that ∂∞f(x̄) reduces to {0} for locally Lipschitzian functions defined on a normed space X.

Given ℓ ≥ 0, we now specify Definition 2.147 of Lipschitz continuity and say that a function f: X → R̄ on a normed space X is locally ℓ-Lipschitzian, or locally Lipschitzian with constant/modulus ℓ, around x̄ ∈ X if there exists a neighborhood U ⊂ dom(f) of x̄ such that

|f(x1) − f(x2)| ≤ ℓ‖x1 − x2‖ for any x1, x2 ∈ U.   (3.44)
It is obvious that any function f that is locally Lipschitzian around x is
continuous around this point. As mentioned above, Lipschitz continuity can
be treated as continuity with a linear rate. The next important result, which
follows directly from Corollary 2.150, tells us that for convex functions f the
converse implication of the observation above also holds, even if the continuity
of f is assumed only at the point in question.

Theorem 3.33 Let f: X → R̄ be a convex function on a normed space X, and let x̄ ∈ dom(f). Then f is continuous at x̄ if and only if it is locally Lipschitzian around this point.
Employing Proposition 3.30 and Theorem 3.33 gives us the following subdifferential conditions for locally Lipschitzian/continuous functions.

Theorem 3.34 Let X be a normed space, and let f: X → R̄ be a convex function with x̄ ∈ dom(f). Then we have the following assertions:

(a) The local ℓ-Lipschitz continuity of f around x̄ implies that

‖x∗‖ ≤ ℓ for any x∗ ∈ ∂f(x̄) and ∂∞f(x̄) = {0},   (3.45)

where the subdifferential ∂f(x̄) is a weak∗ compact subset of X∗.

(b) All the conditions in (a) hold with some ℓ ≥ 0 if f is continuous at x̄.

Proof. In case (a), both properties in (3.45) follow directly from representation (3.40) and construction (3.41), respectively, due to (3.44). The weak∗ compactness of the subdifferential ∂f(x̄) is a consequence of (3.45) and the Alaoglu-Bourbaki theorem in normed spaces; see Corollary 1.113. Assertion (b) follows from (a) due to the equivalence in Theorem 3.33. □

Next we consider two important examples of subdifferential calculation. The first one establishes a simple connection between normals to a convex set and subgradients, as well as singular subgradients, of an extended-real-valued function associated with the set. It can be treated as an equivalent definition of the normal cone via the subdifferential.

Example 3.35 Let X be a topological vector space, and let Ω be a nonempty convex subset of X. Define the indicator function of Ω by

f(x) = δ(x; Ω) := 0 if x ∈ Ω, and δ(x; Ω) := ∞ otherwise.   (3.46)

Then we get epi(f) = Ω × [0, ∞) and therefore

N((x̄, f(x̄)); epi(f)) = N(x̄; Ω) × (−∞, 0], x̄ ∈ Ω.

This tells us by Proposition 3.30 and Definition 3.31 that

∂f(x̄) = ∂∞f(x̄) = N(x̄; Ω) for any x̄ ∈ Ω,   (3.47)

which verifies useful connections exploited in what follows.
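Formula (3.47) can be checked numerically for Ω = [0, 1] ⊂ R: subgradients of the indicator function at x̄ are exactly the normal-cone elements, with N(1; Ω) = [0, ∞) at the right endpoint and N(0.5; Ω) = {0} at an interior point. The grid and candidate values below are illustrative choices of this sketch.

```python
import math

# Sketch of Example 3.35 for Omega = [0, 1]: v ∈ ∂δ(·; Omega)(xbar) iff
# v*(x - xbar) <= δ(x; Omega) - δ(xbar; Omega) for all x, i.e. iff
# v*(x - xbar) <= 0 on Omega (points outside Omega impose no constraint,
# since the right-hand side there is +inf).

def indicator(x, lo=0.0, hi=1.0):
    return 0.0 if lo <= x <= hi else math.inf

def is_indicator_subgradient(xbar, v, grid, tol=1e-9):
    return all(v * (x - xbar) <= indicator(x) - indicator(xbar) + tol
               for x in grid)

grid = [i / 20 for i in range(-20, 41)]        # samples covering [-1, 2]
candidates = (-1.0, 0.0, 1.0, 5.0)

at_endpoint = [v for v in candidates if is_indicator_subgradient(1.0, v, grid)]
at_interior = [v for v in candidates if is_indicator_subgradient(0.5, v, grid)]
```

At x̄ = 1 every nonnegative candidate passes, reflecting N(1; Ω) = [0, ∞); at x̄ = 0.5 only v = 0 survives, reflecting N(0.5; Ω) = {0}.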

The second example calculates the subdifferential of an important convex nondifferentiable function on a normed space.

Example 3.36 Let X be a normed space, and let f(x) := ‖x‖ be the norm function on X. Then the subdifferential of f at x̄ ∈ X is calculated by

∂f(x̄) = B∗ if x̄ = 0, and ∂f(x̄) = {x∗ ∈ X∗ | ⟨x∗, x̄⟩ = ‖x̄‖, ‖x∗‖ = 1} otherwise.   (3.48)
206 3 CONVEX GENERALIZED DIFFERENTIATION

To proceed, for any x∗ ∈ ∂f (x) we obviously have


x∗ , u = x∗ , x + u − x ≤ u + x − x ≤ u whenever u ∈ X. (3.49)
Thus supu≤1 x∗ , u = x∗  ≤ 1, which shows that ∂f (x) ⊂ B∗ .
To verify now formula (3.48), consider first the case where x = 0. Then
for any x∗ ∈ B∗ we get the relationships
⟨x∗ , x − 0⟩ = ⟨x∗ , x⟩ ≤ ‖x∗ ‖ · ‖x‖ ≤ ‖x‖ − ‖0‖ whenever x ∈ X,
and thus conclude by (3.39) that B∗ ⊂ ∂f (x). Together with (3.49), this
justifies the equality in (3.48) for x = 0.
It remains to consider the case where x ≠ 0. From the first estimate in
(3.49) with u = x and u = −x we deduce that ⟨x∗ , x⟩ ≤ ‖x‖ and ⟨x∗ , −x⟩ ≤
−‖x‖, and so ⟨x∗ , x⟩ = ‖x‖. This yields
‖x∗ ‖ = sup{⟨x∗ , u⟩ | ‖u‖ = 1} ≥ ⟨x∗ , x/‖x‖⟩ = 1.
Now fix any x∗ ∈ X ∗ with ⟨x∗ , x⟩ = ‖x‖ and ‖x∗ ‖ = 1. Then
⟨x∗ , u − x⟩ = ⟨x∗ , u⟩ − ⟨x∗ , x⟩ ≤ ‖x∗ ‖ · ‖u‖ − ‖x‖ = ‖u‖ − ‖x‖ for all u ∈ X,
which therefore completes the verification of (3.48).
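In Rⁿ with the Euclidean norm, formula (3.48) says that ∂f (x) = {x/‖x‖} for x ≠ 0, while ∂f (0) is the closed unit ball. A short Python sketch can sanity-check this numerically (this is our own illustration, not part of the text; the helper `is_subgradient` simply samples the subgradient inequality (3.39) at random test points):

```python
import numpy as np

rng = np.random.default_rng(0)

def is_subgradient(v, x_bar, f, points):
    # sampled subgradient inequality: <v, x - x_bar> <= f(x) - f(x_bar)
    return all(v @ (x - x_bar) <= f(x) - f(x_bar) + 1e-12 for x in points)

f = np.linalg.norm                      # the Euclidean norm function on R^3
pts = rng.normal(size=(200, 3))         # random test points

# case x_bar != 0: the unique subgradient is x_bar / ||x_bar||
x_bar = np.array([3.0, -4.0, 0.0])
assert is_subgradient(x_bar / np.linalg.norm(x_bar), x_bar, f, pts)

# case x_bar = 0: any v in the closed unit ball is a subgradient
zero = np.zeros(3)
for _ in range(50):
    v = rng.normal(size=3)
    v *= rng.uniform(0, 1) / np.linalg.norm(v)   # scale v into the unit ball
    assert is_subgradient(v, zero, f, pts)
```

Such a sampled check can of course only refute, never prove, membership in the subdifferential, but it matches (3.48) on every trial.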

The next theorem, which is based on the previous results, provides a full
description of the normal cone to the epigraph of a convex function.

Theorem 3.37 Let f : X → R be a convex function defined on a topological


vector space X, and let x ∈ dom(f ). Given x∗ ∈ X ∗ , the following properties
are satisfied:
(a) If (x∗ , −μ) ∈ N ((x, f (x)); epi(f )), then μ ≥ 0.
(b) (x∗ , 0) ∈ N ((x, f (x)); epi(f )) if and only if x∗ ∈ N (x; dom(f )).

Proof. To verify (a), take (x∗ , −μ) ∈ N ((x̄, f (x̄)); epi(f )) and get that

⟨x∗ , x − x̄⟩ − μ(λ − f (x̄)) ≤ 0 whenever (x, λ) ∈ epi(f ).
Since (x̄, f (x̄) + 1) ∈ epi(f ), it follows that

⟨x∗ , x̄ − x̄⟩ − μ(f (x̄) + 1 − f (x̄)) ≤ 0,
which clearly implies that μ ≥ 0.
To proceed with (b), pick x∗ ∈ X ∗ such that (x∗ , 0) ∈ N ((x̄, f (x̄)); epi(f ))
and deduce from the normal cone definition that

⟨x∗ , x − x̄⟩ + 0 · (λ − f (x̄)) ≤ 0 whenever (x, λ) ∈ epi(f ). (3.50)
Taking any x ∈ dom(f ) and applying (3.50) for (x, f (x)) ∈ epi(f ) give us the
inequality ⟨x∗ , x − x̄⟩ ≤ 0, which implies that x∗ ∈ N (x̄; dom(f )). The proof
of the opposite implication is also straightforward. 
3.3 Subgradients of Convex Functions 207

Note that the result of Theorem 3.37(b) provides the description below of
the singular subdifferential (3.41) of convex functions as

∂ ∞ f (x) = N (x; dom(f )). (3.51)
Having this in mind and using the corresponding results above lead us to the
following characterization of local Lipschitz continuity and merely continuity
in the case of finite-dimensional spaces.

Corollary 3.38 Let f : Rn → R be a convex function with x ∈ dom(f ). Then


the continuity of f at x is equivalent to its local Lipschitz continuity around
this point as well as to the interiority condition x ∈ int(dom(f )). For the
fulfillment of these properties it is necessary and sufficient that
∂ ∞ f (x) = {0}. (3.52)

Proof. The equivalence between the first three properties follows from Corol-
lary 2.152 and Theorem 3.33. Furthermore, they yield (3.52) by Theo-
rem 3.34. It remains to verify that (3.52) implies the local Lipschitz con-
tinuity of f around x in finite dimensions. Assuming the contrary, we get
x ∉ int(dom(f )) and thus x ∈ bd(dom(f )). It follows from Theorem 3.3(c)
that N (x; dom(f )) ≠ {0}, which contradicts (3.51) and ends the proof. 
Now we derive general conditions ensuring the subdifferentiability of convex
functions (i.e., the existence of a subgradient) in both topological vector spaces
and in finite dimensions. As above, the finite-dimensional geometry offers a
less restrictive assumption in comparison with that in infinite dimensions. We
use different approaches to justify the subdifferentiability.

Theorem 3.39 Let f : X → R be a convex function.


(a) If X is a topological vector space and f is continuous at some point of its
domain, then ∂f (x) = ∅ for every x ∈ int(dom(f )).
(b) If X = Rn , then ∂f (x) = ∅ for every x ∈ ri(dom(f )).

Proof. To verify (a), observe that int(dom(f )) = ∅ under the imposed con-
tinuity assumption. Then taking any x ∈ int(dom(f )) and using Proposi-
tion 2.145 give us (x, f (x)) ∈ bd(epi(f )). By Theorem 3.3(b) we find a
nonzero element (x∗ , −μ) ∈ N ((x, f (x)); epi(f )) with x∗ ∈ X ∗ and μ ∈ R.
Furthermore, Theorem 3.37(a) tells us that μ ≥ 0. We want to show that
μ > 0. Suppose on the contrary that μ = 0, which yields x∗ ≠ 0. By
Theorem 3.37(b) we get that x∗ ∈ N (x; dom(f )). Then it follows from
Theorem 3.3(b) that x ∈ bd(dom(f )), a contradiction. Thus μ > 0 and
(x∗ /μ, −1) ∈ N ((x, f (x)); epi(f )). Applying finally Proposition 3.30 gives us
x∗ /μ ∈ ∂f (x), which completes the proof of (a).
To prove (b), pick any x ∈ ri(dom(f )). Since (x, f (x)) ∉ ri(epi(f )) by
Corollary 2.95, we can separate (x, f (x)) and epi(f ) properly by a hyperplane.
This gives us (v, −μ) ∈ Rn × R such that

⟨v, x⟩ − μλ ≤ ⟨v, x̄⟩ − μf (x̄) whenever (x, λ) ∈ epi(f ). (3.53)

In addition, there exists (x0 , λ0 ) ∈ epi(f ) for which
⟨v, x0 ⟩ − μλ0 < ⟨v, x̄⟩ − μf (x̄).
Using (3.53) with x := x̄ and λ := f (x̄) + 1 yields μ ≥ 0. Let us now show
that μ > 0. Assuming the contrary, we get the conditions
⟨v, x⟩ ≤ ⟨v, x̄⟩ for all x ∈ dom(f ) and ⟨v, x0 ⟩ < ⟨v, x̄⟩,
which mean that x̄ and dom(f ) can be properly separated by a hyperplane,
and therefore x̄ ∉ ri(dom(f )) by Theorem 2.92. This contradiction shows
that μ > 0. The latter allows us to derive from (3.53) that (v/μ, −1) ∈
N ((x̄, f (x̄)); epi(f )) and thus to obtain v/μ ∈ ∂f (x̄). 

The following example demonstrates that the qualification condition in


Theorem 3.39(b) is essential for subdifferentiability.

Example 3.40 Consider the function f : R → R given by


f (x) := −√(1 − x²) if |x| ≤ 1,  ∞ otherwise.

Then ∂f (x) = ∅ whenever |x| ≥ 1 while ±1 ∈ dom(f ).
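The failure of subdifferentiability at x = 1 can also be seen numerically: the one-sided difference quotients (f (x) − f (1))/(x − 1) grow without bound as x → 1−, so no finite subgradient can dominate them. A small Python sketch of this (our own illustration under the formula for f stated above):

```python
import math

def f(x):
    # the function from Example 3.40
    return -math.sqrt(1.0 - x * x) if abs(x) <= 1 else math.inf

def left_slope_at_one(h):
    # difference quotient (f(x) - f(1)) / (x - 1) at x = 1 - h;
    # a subgradient v at x = 1 would have to satisfy v >= this for every h > 0
    x = 1.0 - h
    return (f(x) - f(1.0)) / (x - 1.0)

slopes = [left_slope_at_one(10.0 ** (-k)) for k in range(1, 9)]
# the quotients equal sqrt((1 + x)/(1 - x)) and blow up, so ∂f(1) = ∅
assert all(s2 > s1 for s1, s2 in zip(slopes, slopes[1:]))
assert slopes[-1] > 1e3
```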

Now we calculate the normal cone to the sublevel sets


  
Lα := {x ∈ X | f (x) ≤ α}, α ∈ R, (3.54)

of a convex function f : X → R by using the basic normal cone intersection


rule established in Theorem 3.10.

Proposition 3.41 Let f : X → R be a convex function on a topological vector


space X, and let x ∈ dom(f ) with α := f (x). The inclusion
  
N (x; Lα ) ⊃ R+ ∂f (x) := {λx∗ | λ ≥ 0, x∗ ∈ ∂f (x)}
holds for the normal cone to the sublevel set (3.54). If in addition f is con-
tinuous at x with 0 ∉ ∂f (x), then we get the exact representation

N (x; Lα ) = R+ ∂f (x) = {λx∗ | λ ≥ 0, x∗ ∈ ∂f (x)}.

Proof. For any x∗ ∈ R+ ∂f (x̄) there are λ ≥ 0, u∗ ∈ ∂f (x̄) with x∗ = λu∗ and
⟨u∗ , x − x̄⟩ ≤ f (x) − f (x̄) whenever x ∈ X.
It follows therefore that

⟨x∗ , x − x̄⟩ = λ⟨u∗ , x − x̄⟩ ≤ λ(f (x) − f (x̄)) for all x ∈ X.

Taking now any x ∈ Lα , we get f (x) ≤ α = f (x̄), and so ⟨x∗ , x − x̄⟩ ≤ 0.
It shows that x∗ ∈ N (x̄; Lα ), and thus the inclusion “⊃” is verified.
Let us prove the opposite inclusion “⊂” under the imposed additional
assumptions on f . Fix x∗ ∈ N (x̄; Lα ) and define the sets
Ω1 := epi(f ) and Ω2 := X × {α}.
Then (x∗ , 0) ∈ N ((x̄, f (x̄)); Ω1 ∩ Ω2 ). Furthermore, we have
int(Ω1 ) = {(x, λ) ∈ X × R | x ∈ int(dom(f )), λ > f (x)}.
Since 0 ∉ ∂f (x̄), there is u ∈ X with f (u) < f (x̄) = α, and so (u, α) ∈
int(Ω1 ) ∩ Ω2 . The normal cone intersection rule of Theorem 3.10 tells us that
(x∗ , 0) ∈ N ((x̄, f (x̄)); Ω1 ) + N ((x̄, f (x̄)); Ω2 ).
Thus there exists a number μ ∈ R for which

(x∗ , 0) = (x∗ , −μ) + (0, μ) with (x∗ , −μ) ∈ N ((x̄, f (x̄)); epi(f )).
It follows from Theorem 3.37 that μ ≥ 0 and therefore x∗ ∈ μ∂f (x̄). 

Our next goal in this subsection is to establish a precise formula for cal-
culating the subdifferential of convex functions of one variable. The following
lemma is useful in the sequel.

Lemma 3.42 Let f : R → R be a convex function of one variable. Fix a ∈ R


and define the corresponding slope function ϑa by
ϑa (x) := (f (x) − f (a))/(x − a) for all x ∈ (−∞, a) ∪ (a, ∞). (3.55)
Then we have the inequality
ϑa (x1 ) ≤ ϑa (x2 ) whenever x1 , x2 ∈ (−∞, a) ∪ (a, ∞) with x1 < x2 .

Proof. This fact follows directly from Lemma 2.115. 

The second lemma uses the slope function to calculate the left and right
derivatives of f , which always exists for convex functions under consideration.

Lemma 3.43 Let f : R → R be a convex function, and let x ∈ R. Then f


admits the left and right derivative at x. Moreover, we have the relationships
 
sup{ϑx̄ (x) | x < x̄} = f ′− (x̄) ≤ f ′+ (x̄) = inf{ϑx̄ (x) | x > x̄},

where the slope function ϑx̄ is taken from (3.55).



Proof. Lemma 3.42 tells us that the slope function ϑx̄ is nondecreasing on the
interval (x̄, ∞) and is bounded from below by ϑx̄ (x̄ − 1). Then the limit
lim_{x→x̄+} ϑx̄ (x) = lim_{x→x̄+} (f (x) − f (x̄))/(x − x̄)
exists as a finite number. Furthermore, we get
lim_{x→x̄+} ϑx̄ (x) = inf{ϑx̄ (x) | x > x̄}.

This ensures that f ′+ (x̄) exists and is calculated by

f ′+ (x̄) = inf{ϑx̄ (x) | x > x̄}.

Similarly we establish the existence of f ′− (x̄) with the formula

f ′− (x̄) = sup{ϑx̄ (x) | x < x̄}.

Applying Lemma 3.42 again tells us that

ϑx̄ (x) ≤ ϑx̄ (y) whenever x < x̄ < y

and thus yields f ′− (x̄) ≤ f ′+ (x̄), which completes the proof. 

The aforementioned subdifferential calculation formula is as follows.

Theorem 3.44 Let f : R → R be a convex function, and let x ∈ R. Then the


subdifferential of f at x is calculated by
 
∂f (x) = [f ′− (x), f ′+ (x)]. (3.56)

Proof. Taking any subgradient v ∈ ∂f (x̄), we have by definition that

v(x − x̄) ≤ f (x) − f (x̄) for all x > x̄,
which readily implies that
v ≤ (f (x) − f (x̄))/(x − x̄) when x > x̄.
This gives us the inequality
v ≤ lim_{x→x̄+} (f (x) − f (x̄))/(x − x̄) = f ′+ (x̄).
In a similar way we get
v(x − x̄) ≤ f (x) − f (x̄) for all x < x̄
and therefore arrive at the lower estimate

v ≥ (f (x) − f (x̄))/(x − x̄) whenever x < x̄,

which ensures that v ≥ f ′− (x̄). Hence

∂f (x̄) ⊂ [f ′− (x̄), f ′+ (x̄)].

To verify the opposite inclusion, take v ∈ [f ′− (x̄), f ′+ (x̄)] and get

sup{ϑx̄ (x) | x < x̄} = f ′− (x̄) ≤ v ≤ f ′+ (x̄) = inf{ϑx̄ (x) | x > x̄}

by Lemma 3.43. It follows from the upper estimate of v by f ′+ (x̄) that
v ≤ ϑx̄ (x) = (f (x) − f (x̄))/(x − x̄) whenever x > x̄,
which implies therefore the inequality
v(x − x̄) ≤ f (x) − f (x̄) when x ≥ x̄.
Proceeding similarly shows that
v(x − x̄) ≤ f (x) − f (x̄) for all x < x̄.
Thus v ∈ ∂f (x̄) and (3.56) is verified. 

Formula (3.56) is easy and convenient for calculations of the subdifferential


of real functions; see, e.g., Example 3.47 below.
Finally, we present two simple consequences of Theorem 3.44.

Corollary 3.45 Let f : R → R be a convex function. Then f is differentiable


at x if and only if ∂f (x) is a singleton. In this case we have
 
∂f (x) = {f ′ (x)}.

Proof. If f is differentiable at x, then

f ′− (x) = f ′+ (x) = f ′ (x).
By Theorem 3.44 we get

∂f (x) = [f ′− (x), f ′+ (x)] = {f ′ (x)}.

Assuming conversely that ∂f (x) is a singleton gives us by (3.56) that f ′− (x) =
f ′+ (x), and thus f is differentiable at x. 
Invoking the generalized Fermat rule from Proposition 3.29 together
with Theorem 3.44, we immediately obtain the following characterization of
local/global minimizers for nonsmooth convex functions of one variable.

Corollary 3.46 A convex function f : R → R attains its local (equivalently


global) minimum at x if and only if
 
0 ∈ [f ′− (x), f ′+ (x)].
We conclude this subsection with a simple illustrative example.
Example 3.47 Let f (x) = a|x − b| + c, where a > 0. Then f is a convex
function for which
 
f ′− (b) = −a, f ′+ (b) = a.
Thus Theorem 3.44 tells us that
∂f (b) = [−a, a].
Taking into account that f is differentiable on the open intervals (−∞, b)
and (b, ∞) and employing then Corollary 3.45, we calculate the subgradient
mapping ∂f : R ⇉ R by

∂f (x) = {−a} if x < b,  [−a, a] if x = b,  {a} if x > b.
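For piecewise-linear convex functions of one variable, formula (3.56) is easy to emulate numerically: one-sided difference quotients recover f ′− (x̄) and f ′+ (x̄) exactly up to rounding. A Python sketch of this for the function of Example 3.47 (the helper `subdiff_1d` is our own illustration, not the book's notation):

```python
def subdiff_1d(f, x, h=1e-7):
    """Approximate [f'_-(x), f'_+(x)] by one-sided difference quotients;
    adequate for piecewise-linear convex functions."""
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return (round(left, 6), round(right, 6))

a, b, c = 2.0, 1.0, 5.0
f = lambda x: a * abs(x - b) + c          # the function of Example 3.47

assert subdiff_1d(f, b) == (-2.0, 2.0)          # ∂f(b) = [-a, a]
assert subdiff_1d(f, b - 1.0) == (-2.0, -2.0)   # singleton {-a} left of b
assert subdiff_1d(f, b + 1.0) == (2.0, 2.0)     # singleton {a} right of b
```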

3.3.2 Subdifferential Sum Rules

From now on we start developing basic rules of subdifferential calculus for


extended-real-valued functions that play a crucial role in convex analysis and
its applications. Relying on the geometric approach to generalized differential
calculus, our major thrust is on the reduction of subdifferential calculus to
normal cone intersection rules the derivation of which is based in turn on set
extremality and convex separation. This is rather similar to deriving calculus
rules for coderivatives in Section 3.2. In fact, some results of subdifferential
calculus can be obtained from those established for coderivatives due to the
relationships in (3.42), but we prefer to deal directly with subgradients by
exploring specific features of extended-real-valued functions. It seems to be
more convenient for the reader, as well as for the instructor teaching classes
based on this book, if they choose to study subdifferentials independently of
coderivatives. Furthermore, our main attention here is confined to calculus
rules for the (basic) subdifferential (3.39), which is the most important for
the subsequent material of this book. Calculus rules for the singular subd-
ifferential (3.41) can be either derived in a similar way or be deduced from
the corresponding rules of coderivative calculus; we leave such derivations as
exercises to the reader. As earlier in this and previous sections, we concentrate
in what follows on deriving subdifferential calculus for functions on topologi-
cal vector spaces and on finite-dimensional spaces, while enhanced results in
Banach spaces under different assumptions are postponed until Section 4.2.
We begin this subsection with a geometric derivation of the classical sub-
differential sum rule by Moreau and Rockafellar.

Theorem 3.48 Let f1 : X → R and f2 : X → R be proper convex functions


on a topological vector space X. Then
∂(f1 + f2 )(x) ⊃ ∂f1 (x) + ∂f2 (x) for all x ∈ dom(f1 ) ∩ dom(f2 ). (3.57)
If in addition one of the functions f1 , f2 is continuous at some point u ∈
dom(f1 ) ∩ dom(f2 ), then we have the equality
∂(f1 + f2 )(x) = ∂f1 (x) + ∂f2 (x) for all x ∈ dom(f1 ) ∩ dom(f2 ). (3.58)
Proof. To verify (3.57), pick x∗ ∈ ∂f1 (x̄)+∂f2 (x̄) with x̄ ∈ dom(f1 )∩dom(f2 )
and thus get the representation x∗ = x∗1 + x∗2 , where x∗i ∈ ∂fi (x̄) for i = 1, 2.
It follows from the subdifferential definition (3.39) that
⟨x∗i , x − x̄⟩ ≤ fi (x) − fi (x̄) for all x ∈ X as i = 1, 2.
For any x ∈ X we have therefore that
⟨x∗1 + x∗2 , x − x̄⟩ = ⟨x∗1 , x − x̄⟩ + ⟨x∗2 , x − x̄⟩
≤ f1 (x) − f1 (x̄) + f2 (x) − f2 (x̄)
= (f1 + f2 )(x) − (f1 + f2 )(x̄),
which shows that x∗ = x∗1 + x∗2 ∈ ∂(f1 + f2 )(x̄).
The proof of the (main) opposite inclusion is significantly more involved.
Assume for definiteness that f1 is continuous at some point u ∈ dom(f1 ) ∩
dom(f2 ). Fix any x∗ ∈ ∂(f1 + f2 )(x̄) and write by definition that
⟨x∗ , x − x̄⟩ ≤ (f1 + f2 )(x) − (f1 + f2 )(x̄) for all x ∈ X. (3.59)
Define now the following convex subsets of the product space X × R × R by
Ω1 := {(x, λ1 , λ2 ) ∈ X × R × R | λ1 ≥ f1 (x)} = epi(f1 ) × R,
Ω2 := {(x, λ1 , λ2 ) ∈ X × R × R | λ2 ≥ f2 (x)}.
It is easy to see that (x∗ , −1, −1) ∈ N ((x̄, f1 (x̄), f2 (x̄)); Ω1 ∩ Ω2 ). Applying
the normal cone intersection rule of Theorem 3.10 in topological vector spaces,
first we check that int(Ω1 ) ∩ Ω2 ≠ ∅. It follows from Corollary 2.145 that
int(Ω1 ) = {(x, λ1 , λ2 ) ∈ X × R × R | x ∈ int(dom(f1 )), λ1 > f1 (x)}.
Observe that u ∈ int(dom(f1 )) by the continuity of f1 . Choosing now
λ1 := f1 (u) + 1 and λ2 := f2 (u) + 1, we conclude that (u, λ1 , λ2 ) ∈ int(Ω1 ) ∩ Ω2 .
The normal cone intersection rule (3.8) tells us that
(x∗ , −1, −1) ∈ N ((x̄, f1 (x̄), f2 (x̄)); Ω1 ) + N ((x̄, f1 (x̄), f2 (x̄)); Ω2 ).
Thus we have the representation
(x∗ , −1, −1) = (x∗1 , −λ1 , −λ2 ) + (x∗2 , −γ1 , −γ2 )

with the corresponding inclusions

(x∗1 , −λ1 , −λ2 ) ∈ N ((x̄, f1 (x̄), f2 (x̄)); Ω1 ),
(x∗2 , −γ1 , −γ2 ) ∈ N ((x̄, f1 (x̄), f2 (x̄)); Ω2 ).
It follows that λ2 = γ1 = 0, λ1 = γ2 = 1, and x∗ = x∗1 + x∗2 , where (x∗1 , −1) ∈
N ((x̄, f1 (x̄)); epi(f1 )) and (x∗2 , −1) ∈ N ((x̄, f2 (x̄)); epi(f2 )). Hence we get x∗1 ∈
∂f1 (x̄) and x∗2 ∈ ∂f2 (x̄). This shows that x∗ ∈ ∂f1 (x̄) + ∂f2 (x̄), and so
∂(f1 + f2 )(x̄) ⊂ ∂f1 (x̄) + ∂f2 (x̄),
which verifies the sum rule (3.58) and completes the proof of the theorem. 
The following subdifferential sum rule for finitely many convex functions
can be derived from Theorem 3.48 by induction.
Corollary 3.49 Let f1 , . . . , fm : X → R be proper convex functions on a topo-
logical vector space X. Then we have

∂(f1 + · · · + fm )(x) ⊃ ∂f1 (x) + · · · + ∂fm (x) for all x ∈ dom(f1 ) ∩ · · · ∩ dom(fm ).

If in addition there exists a point u ∈ dom(f1 ) ∩ · · · ∩ dom(fm ) at which m − 1
of the functions fi are continuous, then we have the sum rule equality

∂(f1 + · · · + fm )(x) = ∂f1 (x) + · · · + ∂fm (x) for all x ∈ dom(f1 ) ∩ · · · ∩ dom(fm ).

Next we derive a finite-dimensional version of the sum rule under a weaker


relative interiority assumption and without imposing any continuity.
Theorem 3.50 Let f1 , . . . , fm : Rn → R be extended-real-valued convex func-
tions satisfying the relative interior qualification condition

ri(dom(f1 )) ∩ · · · ∩ ri(dom(fm )) ≠ ∅, (3.60)

where m ≥ 2. Then for all x ∈ dom(f1 ) ∩ · · · ∩ dom(fm ) we have

∂(f1 + · · · + fm )(x) = ∂f1 (x) + · · · + ∂fm (x). (3.61)

Proof. Observing that the inclusion “⊃” in (3.61) holds by Corollary 3.49, we
proceed with the proof of the opposite inclusion. Consider first the case of
m = 2 and pick any v ∈ ∂(f1 + f2 )(x̄). Then we have

⟨v, x − x̄⟩ ≤ (f1 + f2 )(x) − (f1 + f2 )(x̄) for all x ∈ Rn . (3.62)

Following the proof of Theorem 3.48, define the convex subsets of Rn+2 by
Ω1 := {(x, λ1 , λ2 ) ∈ Rn × R × R | λ1 ≥ f1 (x)},
Ω2 := {(x, λ1 , λ2 ) ∈ Rn × R × R | λ2 ≥ f2 (x)}
and get (v, −1, −1) ∈ N ((x̄, f1 (x̄), f2 (x̄)); Ω1 ∩ Ω2 ).
To apply now the normal cone intersection rule from Theorem 3.15, we
need to check that ri(Ω1 ) ∩ ri(Ω2 ) ≠ ∅. It follows from Corollary 2.95 that
ri(Ω1 ) = {(x, λ1 , λ2 ) ∈ Rn × R × R | x ∈ ri(dom(f1 )), λ1 > f1 (x)},
ri(Ω2 ) = {(x, λ1 , λ2 ) ∈ Rn × R × R | x ∈ ri(dom(f2 )), λ2 > f2 (x)}.
Choosing z ∈ ri(dom(f1 )) ∩ ri(dom(f2 )), it is not hard to see that
(z, f1 (z) + 1, f2 (z) + 1) ∈ ri(Ω1 ) ∩ ri(Ω2 ) ≠ ∅.
Then we employ the normal cone intersection rule from Theorem 3.15 and
obtain (3.61) as in the proof of Theorem 3.48. 

Finally in this subsection, we illustrate the application of the obtained


subdifferential sum rule to calculating the subgradient mappings of rather
simple albeit useful functions.

Example 3.51 Considering the function


f (x) := |x + 1| + |x − 1| for x ∈ R,
denote f1 (x) := |x + 1| and f2 (x) := |x − 1| for x ∈ R. Then

∂f1 (x) = {−1} if x < −1,  [−1, 1] if x = −1,  {1} if x > −1.

Similarly we have the subdifferential expression

∂f2 (x) = {−1} if x < 1,  [−1, 1] if x = 1,  {1} if x > 1.

It follows from the subdifferential sum rule that

∂f (x) = ∂f1 (x) + ∂f2 (x) = {−2} if x < −1,  [−2, 0] if x = −1,  {0} if −1 < x < 1,  [0, 2] if x = 1,  {2} if x > 1.
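The sum rule in this example can be checked numerically: by Theorem 3.44, the endpoints of ∂(f1 + f2 )(x) are the sums of the corresponding one-sided derivatives of f1 and f2 . A Python sketch (the helper `one_sided` is our own illustration):

```python
def one_sided(f, x, h=1e-7):
    # (f'_-(x), f'_+(x)) via one-sided difference quotients, rounded
    return (round((f(x) - f(x - h)) / h, 6), round((f(x + h) - f(x)) / h, 6))

f1 = lambda x: abs(x + 1)
f2 = lambda x: abs(x - 1)
f = lambda x: f1(x) + f2(x)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    lo1, hi1 = one_sided(f1, x)
    lo2, hi2 = one_sided(f2, x)
    lo, hi = one_sided(f, x)
    # sum rule: the interval endpoints add up
    assert (lo, hi) == (round(lo1 + lo2, 6), round(hi1 + hi2, 6))

assert one_sided(f, -1.0) == (-2.0, 0.0)   # ∂f(-1) = [-2, 0]
assert one_sided(f, 0.0) == (0.0, 0.0)     # ∂f(0) = {0}
assert one_sided(f, 1.0) == (0.0, 2.0)     # ∂f(1) = [0, 2]
```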

Example 3.52 Let a1 < a2 < . . . < an , and let μi > 0 for i = 1, . . . , n. Define
the convex function

f (x) := μ1 |x − a1 | + · · · + μn |x − an |, x ∈ R.

The sum rule of Theorem 3.50 tells us that

∂f (x) = { Σ_{ai <x} μi − Σ_{ai >x} μi } if x ∉ {a1 , . . . , an },
∂f (x) = Σ_{ai <x} μi − Σ_{ai >x} μi + [−μi0 , μi0 ] if x = ai0 .

Example 3.53 Given k ∈ N and a1 < a2 < . . . < a2k−1 , let


f (x) := |x − a1 | + · · · + |x − a2k−1 |, x ∈ R.

It follows from the subdifferential formula in Example 3.52 that 0 ∈ ∂f (x) if


and only if x = ak . Thus f attains its unique global minimum at ak .
Similarly we consider the case where a1 < a2 < . . . < a2k and
g(x) := |x − a1 | + · · · + |x − a2k |, x ∈ R.

Then 0 ∈ ∂g(x) if and only if x ∈ [ak , ak+1 ], and thus g attains its global
minimum at any point from the interval [ak , ak+1 ].
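The conclusion that the middle point (the median) minimizes a sum of absolute values is easy to confirm numerically. The sketch below is our own illustration with randomly generated points a1 < · · · < a7 (so k = 4), comparing a grid search against the median:

```python
import random

random.seed(1)
a = sorted(random.uniform(-10, 10) for _ in range(7))   # 2k - 1 points, k = 4

def g(x):
    # the objective of Example 3.53
    return sum(abs(x - ai) for ai in a)

# brute-force grid search over [min(a), max(a)]
grid = [min(a) + i * (max(a) - min(a)) / 10000 for i in range(10001)]
x_best = min(grid, key=g)

median = a[3]   # a_k with k = 4 (0-based index 3)
assert abs(x_best - median) < 1e-2                     # grid minimizer is near the median
assert all(g(median) <= g(x) + 1e-12 for x in grid)    # the median is a global minimizer
```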

3.3.3 Subdifferential Chain Rules

In this subsection we derive subdifferential chain rules for convex composi-


tions in topological vector spaces and finite-dimensional spaces by using the
same geometric pattern as in the preceding subsection. Note that we have to
consider such components of compositions that preserve the convexity of the
resulting function. This confines our consideration to affine inner mappings.
First we present the following useful lemma, which computes the normal
cone to the graph of an affine mapping.
Lemma 3.54 Let X and Y be topological vector spaces, and let B : X → Y be
an affine mapping given by B(x) := Ax + b, where A : X → Y is a continuous
linear mapping, and where b ∈ Y . Then for any (x, y) ∈ gph(B) we have
   
N ((x, y); gph(B)) = {(x∗ , y ∗ ) ∈ X ∗ × Y ∗ | x∗ = −A∗ y ∗ }.

Proof. It is obvious that the graph gph(B) is an affine set. Furthermore, the
inclusion (x∗ , y ∗ ) ∈ N ((x, y); gph(B)) amounts to saying that

⟨x∗ , x − x̄⟩ + ⟨y ∗ , B(x) − B(x̄)⟩ ≤ 0 for all x ∈ X. (3.63)

It follows directly from the definitions that
⟨x∗ , x − x̄⟩ + ⟨y ∗ , B(x) − B(x̄)⟩ = ⟨x∗ , x − x̄⟩ + ⟨y ∗ , Ax − Ax̄⟩
= ⟨x∗ , x − x̄⟩ + ⟨A∗ y ∗ , x − x̄⟩
= ⟨x∗ + A∗ y ∗ , x − x̄⟩.
Thus (3.63) is equivalent to the fulfillment of ⟨x∗ + A∗ y ∗ , x − x̄⟩ ≤ 0 for all
x ∈ X, which clearly implies that x∗ = −A∗ y ∗ . 

The next theorem is the main subdifferential chain rule for convex com-
positions in the general topological vector space setting.

Theorem 3.55 Let B : X → Y be an affine mapping defined in Lemma 3.54,


and let f : Y → R be a convex function finite at y := B(x). We always have
   
∂(f ◦ B)(x) ⊃ A∗ (∂f (y)) := {A∗ y ∗ | y ∗ ∈ ∂f (y)}. (3.64)
If f is continuous at some point of B(X), then

∂(f ◦ B)(x) = A∗ (∂f (y)). (3.65)

Proof. Pick a subgradient y ∗ ∈ ∂f (ȳ) and get by its definition that

⟨y ∗ , y − ȳ⟩ ≤ f (y) − f (ȳ) for all y ∈ Y. (3.66)
Then for any x ∈ X with y := B(x) we deduce from (3.66) that
⟨y ∗ , B(x) − B(x̄)⟩ = ⟨y ∗ , Ax − Ax̄⟩
= ⟨A∗ y ∗ , x − x̄⟩
≤ f (B(x)) − f (B(x̄))
= (f ◦ B)(x) − (f ◦ B)(x̄).
It tells us that A∗ y ∗ ∈ ∂(f ◦ B)(x̄), which therefore verifies (3.64).
To prove the more involved and much more important opposite inclusion
“⊂” in (3.65), we employ the additional assumption that f is continuous at
some point ỹ ∈ B(X). Fix any x∗ ∈ ∂(f ◦ B)(x̄) and form the convex subsets
of the space X × Y × R by
Ω1 := gph(B) × R and Ω2 := X × epi(f ). (3.67)
Proposition 2.145 provides a precise expression of the interior of the set Ω2 as
int(Ω2 ) = {(x, y, λ) | x ∈ X, y ∈ int(dom(f )), λ > f (y)}.

Choosing x̃ ∈ X with ỹ = B(x̃) and denoting λ̃ := f (ỹ) + 1 give us (x̃, ỹ, λ̃) ∈
Ω1 ∩ int(Ω2 ), and so Ω1 ∩ int(Ω2 ) ≠ ∅. Furthermore, we claim that
(x∗ , 0, −1) ∈ N ((x̄, ȳ, λ̄); Ω1 ∩ Ω2 ) where λ̄ := f (ȳ).
To verify the latter inclusion, fix any (x, y, λ) ∈ Ω1 ∩ Ω2 and observe that
λ ≥ f (B(x)) since y = B(x) and λ ≥ f (y). It follows therefore that

⟨x∗ , x − x̄⟩ + ⟨0, y − ȳ⟩ + (−1)(λ − λ̄) ≤ ⟨x∗ , x − x̄⟩ − (f (B(x)) − f (B(x̄))) ≤ 0.
Applying the normal cone intersection rule from Theorem 3.10 yields
(x∗ , 0, −1) ∈ N ((x̄, ȳ, λ̄); Ω1 ) + N ((x̄, ȳ, λ̄); Ω2 ),
which allows us to find y ∗ ∈ Y ∗ such that

(x∗ , 0, −1) = (x∗ , −y ∗ , 0) + (0, y ∗ , −1) with (y ∗ , −1) ∈ N ((ȳ, λ̄); epi(f ))
and (x∗ , −y ∗ ) ∈ N ((x̄, ȳ); gph(B)). Then we have
x∗ = A∗ y ∗ and y ∗ ∈ ∂f (ȳ).
This verifies that x∗ ∈ A∗ (∂f (ȳ)) and thus completes the proof. 
Next we present a finite-dimensional counterpart of Theorem 3.55 without
imposing any continuity assumptions.

Theorem 3.56 Let f : Rp → R be a convex function, and let B : Rn → Rp be


an affine mapping given by B(x) := Ax + b, where A is a p × n matrix with
A∗ standing for its transpose, and where b ∈ Rp . Assume that the range of B
contains a point belonging to the set ri(dom(f )). Then for any y := B(x) ∈
dom(f ) with x ∈ Rn we have the subdifferential chain rule (3.65).

Proof. It suffices to verify the inclusion “⊂” under the imposed qualifica-
tion condition in finite dimensions. Form Ω1 , Ω2 by (3.67) and observe that
ri(Ω1 ) = Ω1 = gph(B) × R. Using now Corollary 2.95 shows that
   
ri(Ω2 ) = {(x, y, λ) ∈ Rn × Rp × R | x ∈ Rn , y ∈ ri(dom(f )), λ > f (y)}.
Thus the assumption of this theorem ensures that ri(Ω1 ) ∩ ri(Ω2 ) ≠ ∅. The rest
of the proof follows the proof lines of Theorem 3.55 by applying the normal
cone intersection rule in finite dimensions given in Theorem 3.15. 

As an immediate consequence of Theorem 3.56, we arrive at the classical


chain rule of convex analysis.

Corollary 3.57 Let f : Rp → R be convex. Consider the composition


ϕ(x) := f (Ax + b), x ∈ Rn ,
where A is a p × n matrix, and where b ∈ Rp . Then we have
∂ϕ(x) = A∗ ∂f (Ax + b) for all x ∈ Rn .

Finally, we illustrate the chain rule of Corollary 3.57 by a simple two-part


example dealing with smooth and nonsmooth convex compositions.
Example 3.58 (a) Consider the function
ϕ(x) := ‖Ax − b‖² , x ∈ Rn ,
where A is a p × n matrix, and where b ∈ Rp . Then we have
   
∂ϕ(x) = {∇ϕ(x)} = {2A∗ (Ax − b)} for all x ∈ Rn .
(b) Consider further the nonsmooth composition
ϕ(x) := ‖Ax − b‖, x ∈ Rn ,
with the same A and b as above. Then using Corollary 3.57 and the subdif-
ferential calculation (3.48) gives us the representation

∂ϕ(x) = A∗ (B) if Ax = b,  {A∗ (Ax − b)/‖Ax − b‖} if Ax ≠ b.
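The smooth case in part (a) can be verified numerically by comparing the gradient formula 2A∗ (Ax − b) against central finite differences of ϕ. A Python sketch with randomly generated data A and b (our own illustration; A∗ is the transpose, and the norm is Euclidean):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(5, 3))    # a random 5 x 3 matrix
b = rng.normal(size=5)

phi = lambda x: np.linalg.norm(A @ x - b) ** 2
grad = lambda x: 2 * A.T @ (A @ x - b)      # the formula from part (a)

x = rng.normal(size=3)
h = 1e-6
# central finite differences along each coordinate direction
num = np.array([(phi(x + h * e) - phi(x - h * e)) / (2 * h) for e in np.eye(3)])
assert np.allclose(num, grad(x), atol=1e-4)
```

Since ϕ is quadratic, the central differences agree with the formula up to floating-point rounding.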

3.3.4 Subdifferentiation of Maximum Functions

Here we present yet another consequence of the normal cone intersection rule
of Theorem 3.10 to calculating subgradients of maxima of finitely many con-
vex functions (Figure 3.8). Given fi : X → R, define the maximum function
f : X → R by
  
f (x) := max{fi (x) | i = 1, . . . , m}, x ∈ X. (3.68)

Theorem 3.59 Consider the maximum function (3.68) defined on a topolog-


ical vector space X, where all fi : X → R are convex. Denoting by
  
I(x) := {i = 1, . . . , m | fi (x) = f (x)} (3.69)
the active index set at x ∈ dom(f1 ) ∩ · · · ∩ dom(fm ), it always holds that

∂f (x) ⊃ co( ∪_{i∈I(x)} ∂fi (x) ). (3.70)

If in addition all the functions fi are continuous at x, then we have the fol-
lowing subdifferential maximum rule:
 
∂f (x) = co( ∪_{i∈I(x)} ∂fi (x) ). (3.71)

Fig. 3.8. Theorem 3.59

Proof. Observe first that

∂fi (x̄) ⊂ ∂f (x̄) for all i ∈ I(x̄).
Indeed, taking any i ∈ I(x̄) and x∗ ∈ ∂fi (x̄) gives us by (3.39) and (3.69) that
⟨x∗ , x − x̄⟩ ≤ fi (x) − fi (x̄) = fi (x) − f (x̄) ≤ f (x) − f (x̄) whenever x ∈ X.
Thus x∗ ∈ ∂f (x̄) while (3.70) follows from the convexity of ∂f (x̄).
To verify the opposite inclusion “⊂” in (3.71), deduce from (3.68) that

epi(f ) = epi(f1 ) ∩ · · · ∩ epi(fm ).

Picking any subgradient x∗ ∈ ∂f (x̄), we get

(x∗ , −1) ∈ N ((x̄, f (x̄)); epi(f )) = N ((x̄, f (x̄)); epi(f1 ) ∩ · · · ∩ epi(fm )).

It follows from the assumed continuity of the functions fi that (x̄, f (x̄) + 1) ∈
int(epi(f1 )) ∩ · · · ∩ int(epi(fm )). Then the normal cone intersection rule of
Theorem 3.10 tells us that

(x∗ , −1) ∈ N ((x̄, λ̄); epi(f1 )) + · · · + N ((x̄, λ̄); epi(fm )) with λ̄ := f (x̄).

Note that λ̄ > fi (x̄) if i ∉ I(x̄), and so (x̄, λ̄) ∈ int(epi(fi )) for such i. This
shows that N ((x̄, λ̄); epi(fi )) = {0} whenever i ∉ I(x̄), and thus
(x∗ , −1) ∈ Σ_{i∈I(x̄)} N ((x̄, fi (x̄)); epi(fi )),

which gives us the representation

(x∗ , −1) = Σ_{i∈I(x̄)} (x∗i , −λi ) where (x∗i , −λi ) ∈ N ((x̄, f (x̄)); epi(fi )), i ∈ I(x̄).

It follows from Theorem 3.37 that λi ≥ 0 and x∗i ∈ λi ∂fi (x̄) for i ∈ I(x̄).
Since Σ_{i∈I(x̄)} λi = 1, we see that the vector x∗ = Σ_{i∈I(x̄)} x∗i belongs to the
set on the right-hand side of (3.71). This completes the proof. 

The next two corollaries of Theorem 3.59 are rather straightforward. The
first one follows from the automatic continuity of real-valued convex functions.

Corollary 3.60 Let all the functions fi in (3.68) be real-valued and convex
on Rn . Then we have the subdifferential maximum rule (3.71).

Corollary 3.61 Let fi : R → R for i = 1, . . . , m in the setting of Corol-


lary 3.60. Then the subdifferential of the maximum function is calculated by

∂f (x) = [m, M ], where m := min_{i∈I(x)} (fi )′− (x) and M := max_{i∈I(x)} (fi )′+ (x).

Proof. It follows from Theorem 3.59 by the calculations of Theorem 3.44. 
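Corollary 3.61 is straightforward to test numerically for piecewise-linear data: approximate each (fi )′± (x̄) by one-sided difference quotients and compare with those of the maximum. A Python sketch with three affine functions (our own illustration; the specific fi below are hypothetical examples, not from the text):

```python
def one_sided(f, x, h=1e-7):
    # (f'_-(x), f'_+(x)) via one-sided difference quotients
    return ((f(x) - f(x - h)) / h, (f(x + h) - f(x)) / h)

fs = [lambda x: x - 1.0, lambda x: -x - 1.0, lambda x: 2.0 * x - 1.0]
F = lambda x: max(fi(x) for fi in fs)

x0 = 0.0   # all three functions are active here: fi(0) = -1 = F(0)
active = [fi for fi in fs if abs(fi(x0) - F(x0)) < 1e-12]
m = min(one_sided(fi, x0)[0] for fi in active)   # smallest left derivative
M = max(one_sided(fi, x0)[1] for fi in active)   # largest right derivative

lo, hi = one_sided(F, x0)
# Corollary 3.61: ∂F(x0) = [m, M]; here m = -1 and M = 2
assert abs(lo - m) < 1e-5 and abs(hi - M) < 1e-5
```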

Let us finally illustrate the usage of Corollary 3.60 to determine all the
subgradients of the maximum function.

Example 3.62 Define f : R2 → R by


 
f (x) := max{|x1 |, |x2 |} for x = (x1 , x2 ) ∈ R2 .

Then f (x) = max{f1 (x), f2 (x)}, where
f1 (x1 , x2 ) := |x1 | and f2 (x1 , x2 ) := |x2 |.
For x = (0, 0) we have I(x) = {1, 2}, ∂f1 (x) = [−1, 1] × {0}, and ∂f2 (x) =
{0} × [−1, 1]. It follows from (3.71) that

∂f (x) = co(∂f1 (x) ∪ ∂f2 (x)) = {(v1 , v2 ) ∈ R2 | |v1 | + |v2 | ≤ 1}.
If x = (1, 0), then I(x) = {1} and ∂f (x) = ∂f1 (x) = [−1, 1] × {0}.
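The subdifferential computed at the origin can be probed numerically through the subgradient inequality. The following Python sketch (our own illustration, sampling random test points) confirms that vectors in the ℓ¹ unit ball pass the inequality while a vector outside it fails:

```python
import random

random.seed(0)
f = lambda x: max(abs(x[0]), abs(x[1]))

def is_subgrad_at_origin(v, points):
    # sampled subgradient inequality <v, x> <= f(x) - f(0) at the origin
    return all(v[0] * x[0] + v[1] * x[1] <= f(x) + 1e-12 for x in points)

samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(500)]

# every v with |v1| + |v2| <= 1 passes ...
for v in [(0.5, 0.5), (-1.0, 0.0), (0.3, -0.7), (0.0, 0.0)]:
    assert is_subgrad_at_origin(v, samples)

# ... while a v outside the l1 ball fails in the direction (1, 1)
bad = (0.8, 0.8)   # |v1| + |v2| = 1.6 > 1
assert not is_subgrad_at_origin(bad, samples + [(1.0, 1.0)])
```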

3.3.5 Distance Functions and Their Subgradients

In this subsection we study distance functions associated with nonempty sub-


sets of normed spaces. The main attention is paid to the basic distance function
associated with a nonempty set Ω in a normed space X defined by
  
d(x; Ω) = dist(x; Ω) := inf{‖x − w‖ | w ∈ Ω}, x ∈ X. (3.72)
Some properties of distance functions associated with variable/moving sets
Ω = Ω(x) in (3.72) are discussed in Exercise 3.113 and the corresponding
commentaries in Section 3.6. Various extensions of the basic distance function
(signed/oriented distance functions, minimal time and signed minimal time
functions) are studied in detail in Chapter 6.
We begin with general properties of the distance function (3.72) and
related constructions that do not require convexity, and then proceed with
subdifferentiation of (3.72) in the case of convex sets.
First we present some simple while useful properties of the distance func-
tion (3.72), which are employed in what follows.

Proposition 3.63 Let Ω be a nonempty subset of a normed space X and let


x ∈ X. Then d(x; Ω) = 0 if and only if x belongs to the closure of Ω. Consequently,
for a nonempty closed subset Ω of X we have that d(x; Ω) = 0 if and only if x ∈ Ω.

Proof. Suppose that d(x; Ω) = 0. Then for any number k ∈ N there exists a
vector ωk ∈ Ω satisfying the inequality ‖x − ωk ‖ < 1/k. Thus the sequence
{ωk } converges to x, and hence x belongs to the closure of Ω.
Conversely, suppose that x is in the closure of Ω. Then there exists a sequence
{ωk } in Ω that converges to x. Since d(x; Ω) ≤ ‖x − ωk ‖ for every k ∈ N, we
get d(x; Ω) = 0 by letting k → ∞. 

The next proposition gives us the classical global Lipschitzian property of


the general class of nonconvex distance functions (3.72) on normed spaces.

Proposition 3.64 Let Ω be a nonempty subset of a normed space X. Then


|d(x; Ω) − d(y; Ω)| ≤ ‖x − y‖ for all x, y ∈ X, (3.73)
which tells us that the distance function d(x; Ω) is Lipschitz continuous with
constant ℓ = 1 on the entire space X.

Proof. Fix any vectors ω ∈ Ω and x, y ∈ X. Then it follows from the distance
function definition that
d(x; Ω) ≤ ‖x − ω‖ = ‖x − y + y − ω‖ ≤ ‖x − y‖ + ‖y − ω‖.
This readily yields the estimate

d(x; Ω) ≤ ‖x − y‖ + inf{‖y − ω‖ | ω ∈ Ω} = ‖x − y‖ + d(y; Ω).

In the same way we get


d(y; Ω) ≤ ‖x − y‖ + d(x; Ω).
Unifying both estimates above gives us the claimed property (3.73). 
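Proposition 3.64 can be illustrated numerically for a finite set Ω in R² with the Euclidean norm (our own sketch; a finite Ω turns the infimum in (3.72) into a minimum):

```python
import math, random

random.seed(3)
Omega = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]

def dist(p, S):
    # d(p; S) = min ||p - w|| over w in S (finite S, Euclidean norm)
    return min(math.hypot(p[0] - w[0], p[1] - w[1]) for w in S)

for _ in range(1000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    # |d(x; Omega) - d(y; Omega)| <= ||x - y||: Lipschitz with constant 1
    lhs = abs(dist(x, Omega) - dist(y, Omega))
    rhs = math.hypot(x[0] - y[0], x[1] - y[1])
    assert lhs <= rhs + 1e-12
```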

The following observation reveals a kind of the homogeneity property of


distance functions under scalar multiplication of both the point and the set
in question by a real number.

Proposition 3.65 Let Ω be a nonempty subset of a normed space X, and let


λ ∈ R. Then
d(λx; λΩ) = |λ|d(x; Ω).

Proof. It follows from definition (3.72) that


  
d(λx; λΩ) = inf{‖λx − λω‖ | ω ∈ Ω}
= inf{|λ| · ‖x − ω‖ | ω ∈ Ω} = |λ|d(x; Ω),
which verifies the claimed property. 

The next property is a kind of subadditivity for the distance function.

Proposition 3.66 Let Ω1 and Ω2 be nonempty subsets of a normed space X.


Then for all x, y ∈ X we have
d(x + y; Ω1 + Ω2 ) ≤ d(x; Ω1 ) + d(y; Ω2 ).

Proof. Using the definition of the distance function and the triangle inequality
for the norm on X yields
  
d(x + y; Ω1 + Ω2 ) = inf{‖(x + y) − (ω1 + ω2 )‖ | ω1 ∈ Ω1 and ω2 ∈ Ω2 }
≤ ‖(x + y) − (ω1 + ω2 )‖ ≤ ‖x − ω1 ‖ + ‖y − ω2 ‖
for all ω1 ∈ Ω1 and ω2 ∈ Ω2 . This readily implies that

d(x + y; Ω1 + Ω2 ) ≤ inf{‖x − ω1 ‖ | ω1 ∈ Ω1 } + inf{‖y − ω2 ‖ | ω2 ∈ Ω2 }
= d(x; Ω1 ) + d(y; Ω2 ),
which is what we claimed in this proposition. 

Let us further study the notion of projections to sets associated with the
distance function (3.72). Given an element x ∈ X and a subset Ω of X, the
projection from x to Ω is defined by
  
Π(x; Ω) := {ω ∈ Ω | d(x; Ω) = ‖x − ω‖}. (3.74)
Note that in general the mapping x → Π(x; Ω) is set-valued on X and
may take empty values. The next proposition lists some sufficient conditions
ensuring the nonemptiness of this mapping.

Proposition 3.67 Let Ω be a nonempty subset of a normed space X. Then the projection set Π(x; Ω) is nonempty for every x ∈ X in each of the following cases:
(a) Ω is compact.
(b) Ω is closed and dim(X) < ∞.
(c) Ω is closed and convex, and X is a reflexive Banach space.

Proof. In case (a) the result follows immediately from the Weierstrass existence theorem due to the continuity of the distance function.
To proceed in case (b), for every k ∈ N find ωk ∈ Ω such that
d(x; Ω) ≤ ‖x − ωk‖ < d(x; Ω) + 1/k.
The sequence {ωk} ⊂ Ω is bounded in the finite-dimensional space X since
‖ωk‖ ≤ ‖x‖ + d(x; Ω) + 1.
Thus it has a convergent subsequence whose limit belongs to Ω due to the assumed closedness of the set, and this limit realizes the distance d(x; Ω).
Finally, consider the remaining case (c). As well known and discussed above, every closed, bounded, and convex set in a reflexive Banach space is weakly sequentially compact, and the norm function is weakly lower semicontinuous. Thus in this case we deduce the nonemptiness of Π(x; Ω) by applying the Weierstrass existence theorem in the weak topology of X to the intersection of Ω with a sufficiently large closed ball centered at x. □
Let us present an example showing that the convexity assumption on Ω is essential for the fulfillment of Proposition 3.67(c).

Example 3.68 Let X be a Hilbert space with an orthonormal basis {e_n | n ∈ N}. Consider the set
Ω := {(1 + 1/n) e_n | n ∈ N},
which clearly is nonempty, closed, but not convex in X. Then taking x := 0, it is easy to check that d(x; Ω) = inf_{n∈N} (1 + 1/n) = 1, while this infimum is not attained, so that Π(x; Ω) = ∅.
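The non-attainment can be seen in finite truncations. A small sketch in plain Python for the nonconvex set Ω := {(1 + 1/n) e_n | n ∈ N} (a standard instance of this phenomenon): with respect to the orthonormal basis, ‖0 − (1 + 1/n) e_n‖ = 1 + 1/n, so every candidate distance stays strictly above the infimum 1.

```python
# Distances from the origin to the first N points of Omega = {(1 + 1/n) e_n}:
# each equals 1 + 1/n, so the infimum over the whole set is 1, yet every
# individual candidate stays strictly above it -- the projection set is empty.
N = 10_000
norms = [1.0 + 1.0 / n for n in range(1, N + 1)]
best = min(norms)
assert best > 1.0          # the infimum 1 is never attained
assert best - 1.0 < 1e-3   # ... although it is approached arbitrarily closely
```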

The next example shows that the reflexivity assumption on the Banach space X in Proposition 3.67 is also essential, even when Ω is a closed subspace of X.

Example 3.69 Consider the (nonreflexive) Banach space C[0, 1] of all the real-valued continuous functions defined on [0, 1] with the norm given by
‖f‖ := max{|f(t)| | t ∈ [0, 1]} for f ∈ C[0, 1].
Define the closed subspace Ω of C[0, 1] by
Ω := {f ∈ C[0, 1] | f(1) = 0 and ∫₀¹ f(t) dt = 0}.
It is possible to check that Π(f; Ω) = ∅ for f ∈ C[0, 1] defined by f(t) := 1 − t; see [18] for further details.
Now we strengthen the result of Proposition 3.67(c) when X is a Hilbert space.

Proposition 3.70 Let Ω be a nonempty, closed, and convex subset of a Hilbert space X. Then the projection Π(x; Ω) is a singleton for every x ∈ X.

Proof. Proposition 3.67(c) tells us that Π(x; Ω) ≠ ∅ on X. To prove the uniqueness of the projection for all x ∈ X, suppose on the contrary that there exist x ∈ X and two points ω1, ω2 ∈ Π(x; Ω) such that ω1 ≠ ω2. Having
‖x − ω1‖ = ‖x − ω2‖ = d(x; Ω) with ω1 ≠ ω2
and using the parallelogram law, we get
4[d(x; Ω)]² = 2‖x − ω1‖² + 2‖x − ω2‖²
= ‖(x − ω1) + (x − ω2)‖² + ‖(x − ω1) − (x − ω2)‖²
= 4‖x − (ω1 + ω2)/2‖² + ‖ω1 − ω2‖².
Then a simple rearrangement yields
‖x − (ω1 + ω2)/2‖² = [d(x; Ω)]² − ‖ω1 − ω2‖²/4 < [d(x; Ω)]².
Since (ω1 + ω2)/2 ∈ Ω by the convexity of Ω, the latter exhibits a point of Ω strictly closer to x than d(x; Ω). This contradiction completes the proof of the proposition. □
The following example demonstrates that the Hilbert space assumption in Proposition 3.70 is essential for the projection uniqueness even on the plane equipped with a non-Euclidean norm.

Example 3.71 Let X := R² with the maximum norm given by
‖x‖ := max{|x1|, |x2|} for x = (x1, x2) ∈ R².
Taking Ω := R × [0, ∞) and x := (0, −1), we can easily see that Π(x; Ω) contains infinitely many elements; indeed, every point (t, 0) with |t| ≤ 1 realizes the distance d(x; Ω) = 1.
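The computation in Example 3.71 can be sketched directly in plain Python by evaluating the max-norm distance from x = (0, −1) to sample points (t, 0) of Ω:

```python
# Omega = R x [0, infty), x = (0, -1), maximum norm: every (t, 0) with |t| <= 1
# attains the distance d(x; Omega) = 1, so the projection set is infinite.
def sup_dist(x, w):
    """Distance between x and w in the maximum (sup) norm on R^2."""
    return max(abs(x[0] - w[0]), abs(x[1] - w[1]))

x = (0.0, -1.0)
candidates = [(t / 10.0, 0.0) for t in range(-10, 11)]   # 21 boundary points of Omega
assert all(abs(sup_dist(x, w) - 1.0) < 1e-12 for w in candidates)
```

In the Euclidean norm the same x would project to the single point (0, 0), which is exactly the contrast the example is making.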
In preparation for developing subdifferential theory for distance functions, let us first show that the convexity of the distance function (3.72) follows from the convexity of the associated set Ω.

Proposition 3.72 If Ω is a nonempty convex subset of a normed space X, then the associated distance function (3.72) is convex. The converse also holds if we assume additionally that Ω is closed.

Proof. To verify the convexity of d(·; Ω), fix x1, x2 ∈ X and t ∈ (0, 1). Given any ε > 0, we find ωi ∈ Ω for i = 1, 2 such that
‖xi − ωi‖ < d(xi; Ω) + ε for i = 1, 2.
The convexity of Ω ensures that tω1 + (1 − t)ω2 ∈ Ω, and thus
d(tx1 + (1 − t)x2; Ω) ≤ ‖tx1 + (1 − t)x2 − [tω1 + (1 − t)ω2]‖
≤ t‖x1 − ω1‖ + (1 − t)‖x2 − ω2‖
< t d(x1; Ω) + (1 − t) d(x2; Ω) + ε.
Letting now ε ↓ 0 implies that
d(tx1 + (1 − t)x2; Ω) ≤ t d(x1; Ω) + (1 − t) d(x2; Ω),
which verifies the convexity of the distance function (3.72).
Conversely, suppose that d(·; Ω) is a convex function while Ω is a closed set. Fixing ω1, ω2 ∈ Ω and λ ∈ (0, 1), we get
d(λω1 + (1 − λ)ω2; Ω) ≤ λ d(ω1; Ω) + (1 − λ) d(ω2; Ω) = 0.
The closedness of Ω yields λω1 + (1 − λ)ω2 ∈ Ω, and thus Ω is convex. □

The following characterization of projections to convex sets in Hilbert spaces is very useful.

Proposition 3.73 Let Ω be a nonempty convex subset of a Hilbert space X, let x ∈ X, and let ω̄ ∈ Ω. Then ω̄ ∈ Π(x; Ω) if and only if
⟨x − ω̄, ω − ω̄⟩ ≤ 0 for all ω ∈ Ω. (3.75)

Proof. Suppose that ω̄ ∈ Π(x; Ω). Then for any ω ∈ Ω and λ ∈ (0, 1) we have ω̄ + λ(ω − ω̄) ∈ Ω by the convexity of Ω. Applying the properties of inner products and norms in Hilbert spaces gives us
[d(x; Ω)]² = ‖x − ω̄‖² ≤ ‖x − [ω̄ + λ(ω − ω̄)]‖²
= ⟨(x − ω̄) − λ(ω − ω̄), (x − ω̄) − λ(ω − ω̄)⟩
= ‖x − ω̄‖² − 2λ⟨x − ω̄, ω − ω̄⟩ + λ²‖ω − ω̄‖².
It follows therefore that
2⟨x − ω̄, ω − ω̄⟩ ≤ λ‖ω − ω̄‖²,
which verifies the “only if” part (3.75) by letting λ ↓ 0.
Conversely, suppose that (3.75) holds. Then take any ω ∈ Ω to get
‖x − ω‖² = ‖(x − ω̄) + (ω̄ − ω)‖²
= ‖x − ω̄‖² + 2⟨x − ω̄, ω̄ − ω⟩ + ‖ω̄ − ω‖²
= ‖x − ω̄‖² − 2⟨x − ω̄, ω − ω̄⟩ + ‖ω̄ − ω‖²
≥ ‖x − ω̄‖² + ‖ω̄ − ω‖² ≥ ‖x − ω̄‖².
Since this holds for all ω ∈ Ω, we have ω̄ ∈ Π(x; Ω). □
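The variational characterization (3.75) can be checked numerically; a sketch in plain Python for the closed unit disk in R² (sampling ω over the disk by rejection is our illustration device):

```python
import math, random

def proj_disk(x):
    """Euclidean projection onto the closed unit disk."""
    n = math.hypot(*x)
    return x if n <= 1.0 else (x[0] / n, x[1] / n)

x = (2.0, 1.0)
wbar = proj_disk(x)                    # the projection \bar{omega} of x

random.seed(2)
checked = 0
while checked < 1000:
    w = (random.uniform(-1, 1), random.uniform(-1, 1))
    if math.hypot(*w) > 1.0:
        continue                       # rejection sampling: keep only w in the disk
    inner = (x[0] - wbar[0]) * (w[0] - wbar[0]) + (x[1] - wbar[1]) * (w[1] - wbar[1])
    assert inner <= 1e-12              # the variational inequality (3.75)
    checked += 1
```

Geometrically, the assertion says that the disk lies entirely in the halfspace whose outward normal at ω̄ is x − ω̄.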
It is shown in Proposition 3.70 that the projection mapping Π(x; Ω) is single-valued for any closed and convex set Ω in a Hilbert space. Now we show that the mapping x → Π(x; Ω) is nonexpansive, i.e., it is Lipschitz continuous on X with Lipschitz constant ℓ = 1. This property is a consequence of the following estimate, which is important for its own sake.

Proposition 3.74 Let Ω be a nonempty, closed, and convex subset of a Hilbert space X. Then for any x1, x2 ∈ X we have
‖Π(x1; Ω) − Π(x2; Ω)‖² ≤ ⟨Π(x1; Ω) − Π(x2; Ω), x1 − x2⟩. (3.76)

Proof. Fix any x1, x2 ∈ X and consider their projections Π(x1; Ω) and Π(x2; Ω). Proposition 3.70 tells us that Π(x1; Ω) and Π(x2; Ω) are singletons. Denoting ω1 := Π(x1; Ω) and ω2 := Π(x2; Ω), we deduce from Proposition 3.73 that
⟨x1 − ω1, ω2 − ω1⟩ = ⟨ω2 − ω1, x1 − ω1⟩ = ⟨ω1 − ω2, ω1 − x1⟩ ≤ 0.
Interchanging the positions of x1, ω1 and x2, ω2 gives us
⟨ω1 − ω2, x2 − ω2⟩ ≤ 0.
Summing up these inequalities yields
⟨ω1 − ω2, (x2 − x1) + (ω1 − ω2)⟩ = ⟨ω1 − ω2, x2 − x1⟩ + ⟨ω1 − ω2, ω1 − ω2⟩ ≤ 0.
This verifies the claimed estimate (3.76). □

Here is the aforementioned nonexpansive property of projections.

Corollary 3.75 Let Ω be a nonempty, closed, and convex subset of a Hilbert space X. Then the projection Π(·; Ω) is Lipschitz continuous with Lipschitz constant ℓ = 1, i.e., we have the estimate
‖Π(x1; Ω) − Π(x2; Ω)‖ ≤ ‖x1 − x2‖ for all x1, x2 ∈ X.

Proof. It follows from the estimate (3.76) in Proposition 3.74 by applying the Cauchy–Schwarz inequality to the right-hand side of (3.76). □
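Nonexpansiveness is likewise easy to observe numerically; a sketch for the closed unit disk in plain Python:

```python
import math, random

def proj_disk(x):
    """Euclidean projection onto the closed unit disk."""
    n = math.hypot(*x)
    return x if n <= 1.0 else (x[0] / n, x[1] / n)

random.seed(3)
for _ in range(1000):
    x1 = (random.uniform(-5, 5), random.uniform(-5, 5))
    x2 = (random.uniform(-5, 5), random.uniform(-5, 5))
    # ||Pi(x1; Omega) - Pi(x2; Omega)|| <= ||x1 - x2||
    assert math.dist(proj_disk(x1), proj_disk(x2)) <= math.dist(x1, x2) + 1e-12
```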

Prior to subdifferentiation of the distance function d(x; Ω) at any point x̄ ∈ X, we calculate the subdifferential of the infimal convolution of two extended-real-valued convex functions f1, f2: X → R̄ defined by
(f1 □ f2)(x) := inf{f1(x1) + f2(x2) | x1 + x2 = x, x1, x2 ∈ X}. (3.77)
This important operation, which preserves convexity, has been considered in Chapter 2 from the viewpoint of general properties. Now we are ready to compute its subdifferential. The following theorem is certainly of independent interest, while the obtained subdifferential formula is used below for subdifferentiation of the distance function (3.72).
Theorem 3.76 Let f1, f2: X → R̄ be two convex functions on a topological vector space X, and let (f1 □ f2)(x) > −∞ for all x ∈ X. Then the infimal convolution (3.77) is convex. Furthermore, for any x̄ ∈ dom(f1 □ f2) and any x̄1, x̄2 ∈ X with x̄ = x̄1 + x̄2 and (f1 □ f2)(x̄) = f1(x̄1) + f2(x̄2) we have
∂(f1 □ f2)(x̄) = ∂f1(x̄1) ∩ ∂f2(x̄2). (3.78)

Proof. The convexity of (3.77) is an easy consequence of the definition. Let us next observe that
(f1 □ f2)(x1 + x2) ≤ f1(x1) + f2(x2) whenever x1, x2 ∈ X.
Pick x* ∈ ∂(f1 □ f2)(x̄) and get for all x1, x2 ∈ X the inequality
⟨x*, x1 + x2 − x̄⟩ ≤ (f1 □ f2)(x1 + x2) − (f1 □ f2)(x̄).
This readily implies that
⟨x*, x1 − x̄1⟩ + ⟨x*, x2 − x̄2⟩ ≤ f1(x1) − f1(x̄1) + f2(x2) − f2(x̄2).
Substituting first x2 := x̄2 and then x1 := x̄1 gives us x* ∈ ∂f1(x̄1) ∩ ∂f2(x̄2).
To verify the opposite inclusion in (3.78), fix any x* ∈ ∂f1(x̄1) ∩ ∂f2(x̄2) and consider x, x1, x2 ∈ X such that x = x1 + x2. Then
⟨x*, x1 − x̄1⟩ ≤ f1(x1) − f1(x̄1) and ⟨x*, x2 − x̄2⟩ ≤ f2(x2) − f2(x̄2).
Summing up these inequalities, we get
⟨x*, x − x̄⟩ ≤ f1(x1) + f2(x2) − (f1 □ f2)(x̄).
Taking the infimum with respect to all x1, x2 ∈ X with x = x1 + x2 gives us
⟨x*, x − x̄⟩ ≤ (f1 □ f2)(x) − (f1 □ f2)(x̄)
and implies therefore that x* ∈ ∂(f1 □ f2)(x̄). □
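For a concrete instance of the operation (3.77): the infimal convolution on R of f1 = |·| and f2 = (1/2)(·)² is the Huber function, equal to x²/2 for |x| ≤ 1 and to |x| − 1/2 otherwise. A grid-based sketch in plain Python (the grid discretization is only an approximation device for the infimum):

```python
def inf_conv(f1, f2, x, grid):
    """Approximate the infimal convolution (f1 [box] f2)(x) = inf over x1 of
    f1(x1) + f2(x - x1), with the infimum restricted to grid points x1."""
    return min(f1(x1) + f2(x - x1) for x1 in grid)

grid = [i / 100.0 for i in range(-500, 501)]       # uniform grid on [-5, 5]
for x in (-3.0, -0.5, 0.0, 0.4, 2.0):
    h = inf_conv(abs, lambda u: 0.5 * u * u, x, grid)
    huber = 0.5 * x * x if abs(x) <= 1.0 else abs(x) - 0.5
    assert abs(h - huber) < 1e-3                   # matches the closed form
```

The optimal decomposition is x̄1 = 0 when |x| ≤ 1 and x̄1 = x − sign(x) otherwise, in accordance with formula (3.78): the common subgradient is x on [−1, 1] and sign(x) outside.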

Now we are in a position to establish precise formulas for computing subgradients of the distance function (3.72) at a given point x̄ ∈ X of a normed space X. We distinguish between the two significantly different settings: the in-set case x̄ ∈ Ω and the out-of-set case x̄ ∉ Ω. Recall for the reader’s convenience that the symbol B* stands for the dual closed unit ball B* ⊂ X*, while the corresponding dual unit sphere is defined by
S* := {x* ∈ X* | ‖x*‖ = 1}.
First we derive the following representations of the subdifferential ∂d(x̄; Ω) at both in-set and out-of-set points based on the subdifferential calculation for the infimal convolution taken from Theorem 3.76.
Proposition 3.77 Let Ω be a nonempty convex set in a normed space X.
(a) If x̄ ∈ Ω, then we have
∂d(x̄; Ω) = N(x̄; Ω) ∩ B*. (3.79)
(b) Suppose that x̄ ∉ Ω and that Π(x̄; Ω) ≠ ∅ for the metric projection. Then
∂d(x̄; Ω) = ∂p(x̄ − w) ∩ N(w; Ω) for any w ∈ Π(x̄; Ω)
(i.e., independently of the projection choice), where p(x) := ‖x‖.

Proof. It follows from the definitions in (3.72) and (3.77) that
d(x; Ω) = inf{‖x − w‖ | w ∈ Ω} = inf{δΩ(w) + ‖x − w‖ | w ∈ X} = (δΩ □ p)(x)
for any x ∈ X via the indicator function δΩ(x) := δ(x; Ω). If x̄ ∈ Ω, then
d(x̄; Ω) = 0 = δΩ(x̄) + p(0), and therefore we arrive at
∂d(x̄; Ω) = ∂δΩ(x̄) ∩ ∂p(0) = N(x̄; Ω) ∩ B*
by Theorem 3.76 and the above subdifferential computations for the indicator and norm functions. This verifies (a). The proof of (b) is similar. □

Remark 3.78 It follows directly from (3.79) that
N(x̄; Ω) = ⋃_{λ>0} λ ∂d(x̄; Ω) for all x̄ ∈ Ω. (3.80)
Thus (3.80) can be taken as an equivalent definition of the normal cone to convex sets at in-set points via the subdifferential of the (globally) Lipschitz continuous distance function.

Next we present an important and useful consequence of Proposition 3.77 for closed and convex sets in Hilbert spaces.

Corollary 3.79 Let X be a Hilbert space, and let Ω ⊂ X be a nonempty, closed, and convex set. Then we have
∂d(x̄; Ω) = N(x̄; Ω) ∩ B if x̄ ∈ Ω, and ∂d(x̄; Ω) = {(x̄ − Π(x̄; Ω))/d(x̄; Ω)} if x̄ ∉ Ω,
where the metric projection Π(x̄; Ω) is a singleton.

Proof. The above formula is proved in Proposition 3.77 in the case where x̄ ∈ Ω. Consider now the case where x̄ ∉ Ω and observe by Proposition 3.70 that in this case the metric projection Π(x̄; Ω) is a (nonempty) singleton denoted by Π(x̄; Ω) =: {w}. Using further Proposition 3.77(b) and the inclusion (x̄ − w)/‖x̄ − w‖ ∈ N(w; Ω) for projections in Hilbert spaces tells us that
∂d(x̄; Ω) = ∂p(x̄ − w) ∩ N(w; Ω)
= {(x̄ − w)/‖x̄ − w‖} ∩ N(w; Ω)
= {(x̄ − Π(x̄; Ω))/d(x̄; Ω)},
which therefore verifies the claimed formula. □
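Out of the set, the formula of Corollary 3.79 says that d(·; Ω) is differentiable with gradient (x − Π(x; Ω))/d(x; Ω). A finite-difference sketch for the closed unit disk in plain Python:

```python
import math

def proj_disk(x):
    """Euclidean projection onto the closed unit disk."""
    n = math.hypot(*x)
    return x if n <= 1.0 else (x[0] / n, x[1] / n)

def d(x):
    """d(x; Omega) for Omega the closed unit disk."""
    return math.dist(x, proj_disk(x))

x = (3.0, 4.0)                                   # ||x|| = 5, so d(x) = 4
p = proj_disk(x)
grad = tuple((xi - pi) / d(x) for xi, pi in zip(x, p))   # (x - Pi(x; Omega)) / d(x; Omega)

h = 1e-6                                         # central finite differences of d at x
num = ((d((x[0] + h, x[1])) - d((x[0] - h, x[1]))) / (2 * h),
       (d((x[0], x[1] + h)) - d((x[0], x[1] - h))) / (2 * h))
assert all(abs(g - n_) < 1e-5 for g, n_ in zip(grad, num))
```

Both computations return the unit vector x/‖x‖, as the closed form predicts for the ball.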

Note that the assumption Π(x̄; Ω) ≠ ∅ of Proposition 3.77(b) is rather restrictive in infinite-dimensional spaces that are not Hilbert spaces. Let us now obtain another formula for calculating the subdifferential ∂d(x̄; Ω) at x̄ ∉ Ω without the latter assumption in terms of normals to the corresponding enlargement of Ω defined by
Ωr := {x ∈ X | d(x; Ω) ≤ r} for any r > 0. (3.81)
It is easy to see that Ωr is nonempty and convex if Ω has these properties. Furthermore, the enlargement (3.81) is always closed even if Ω is not.
Prior to the proof of the aforementioned formula we present the following two lemmas of their own interest (Figure 3.9).

Lemma 3.80 Let Ω be a nonempty subset of a normed space X, and let Ωr be its enlargement (3.81) with some r > 0. Then for any x ∉ Ωr we have
d(x; Ω) = d(x; Ωr) + r. (3.82)

Proof. Since x ∉ Ωr, we get d(x; Ω) > r. Fix u ∈ Ωr and for any ε > 0 find uε ∈ Ω satisfying the estimates
‖u − uε‖ ≤ d(u; Ω) + ε ≤ r + ε.
This obviously implies that
‖u − x‖ ≥ ‖uε − x‖ − ‖u − uε‖ ≥ d(x; Ω) − ‖uε − u‖ ≥ d(x; Ω) − r − ε.
Since the estimate ‖u − x‖ ≥ d(x; Ω) − r − ε holds for all u ∈ Ωr and all ε > 0, we arrive at the inequality
d(x; Ωr) ≥ d(x; Ω) − r.
To verify the opposite inequality in (3.82), fix u ∈ Ω and define the continuous function f: R+ → R by
f(t) := d(tx + (1 − t)u; Ω), t ≥ 0.
Fig. 3.9. Lemma 3.80
Since f(0) = 0 and f(1) = d(x; Ω) > r > 0, there exists t0 ∈ (0, 1) with f(t0) = r by the classical intermediate value theorem. Putting now v := t0 x + (1 − t0)u, we have d(v; Ω) = r and ‖x − u‖ = ‖x − v‖ + ‖v − u‖, which implies in turn by using u ∈ Ω and v ∈ Ωr that
‖x − u‖ ≥ ‖x − v‖ + d(v; Ω) = ‖x − v‖ + r ≥ d(x; Ωr) + r.
Taking the infimum over u ∈ Ω in the latter estimate verifies the opposite inequality and thus (3.82). □
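Formula (3.82) is transparent for balls: if Ω is the closed unit ball, then Ωr is the ball of radius 1 + r, and both distances are explicit. A one-case sketch in plain Python:

```python
import math

def dist_ball(x, R):
    """Distance from x to the closed Euclidean ball of radius R centered at the origin."""
    return max(math.hypot(*x) - R, 0.0)

r = 0.75
x = (3.0, 4.0)                       # ||x|| = 5 > 1 + r, so x lies outside Omega_r
# (3.82) with Omega the unit ball: d(x; Omega) = d(x; Omega_r) + r
assert abs(dist_ball(x, 1.0) - (dist_ball(x, 1.0 + r) + r)) < 1e-12
```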

Lemma 3.81 Let Ω be as in Lemma 3.80, and let x̄ ∉ Ω. Then for any subgradient u* ∈ ∂d(x̄; Ω) we have u* ∈ S*.

Proof. It follows from the subgradient definition that
⟨u*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) whenever x ∈ X. (3.83)
Since d(·; Ω) is Lipschitz continuous on X with modulus ℓ = 1, we get ‖u*‖ ≤ 1. Given any t ∈ (0, 1), find ω ∈ Ω such that
d(x̄; Ω) ≤ ‖x̄ − ω‖ < d(x̄; Ω) + t².
Employing (3.83) with x := x̄ + t(ω − x̄) yields
−t⟨u*, x̄ − ω⟩ ≤ d(x̄ + t(ω − x̄); Ω) − d(x̄; Ω)
≤ ‖x̄ + t(ω − x̄) − ω‖ − ‖x̄ − ω‖ + t²
= (1 − t)‖x̄ − ω‖ − ‖x̄ − ω‖ + t²
= −t‖x̄ − ω‖ + t²,
which gives us the estimates
‖x̄ − ω‖ ≤ ⟨u*, x̄ − ω⟩ + t ≤ ‖u*‖ · ‖x̄ − ω‖ + t.
This tells us therefore that
1 ≤ ‖u*‖ + t/‖x̄ − ω‖ ≤ ‖u*‖ + t/d(x̄; Ω).
Letting there t ↓ 0 verifies that ‖u*‖ ≥ 1 and thus completes the proof. □
Now we are ready to establish the following convenient representation of subgradients of the distance function at out-of-set points.

Theorem 3.82 Let Ω be a nonempty convex subset of a normed space X, let x̄ ∉ Ω, and let r := d(x̄; Ω) in (3.81). Then we have
∂d(x̄; Ω) = N(x̄; Ωr) ∩ S*.

Proof. Fix any u* ∈ ∂d(x̄; Ω) and get (3.83) by the subdifferential definition. Thus d(x; Ω) ≤ r = d(x̄; Ω) whenever x ∈ Ωr, and hence
⟨u*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) ≤ 0 for all x ∈ Ωr,
i.e., u* ∈ N(x̄; Ωr). The fact that u* ∈ S* is proved in Lemma 3.81. The proof of the reverse inclusion is left as an exercise; see Theorem 6.63. □

3.4 Generalized Differentiation under Polyhedrality


This section is devoted to developing advanced results on generalized differ-
ential calculus for normals to sets, coderivatives of set-valued mappings, and
subgradients of extended-real-valued functions under polyhedral convexity in
LCTV spaces. The role of polyhedral convexity in finite dimensions has been
well recognized in convex analysis and its applications. However, not much
has been done in infinite-dimensional settings. Our approach here is based on
extended notions of relative interiors studied in Section 2.5. In Chapter 4 we
present more results on polyhedral convexity in LCTV spaces with applica-
tions to conjugate calculus and duality in optimization.

3.4.1 Polyhedral Convex Separation

The definition of polyhedral sets, or convex polyhedra, in real topological vector


spaces is similar to the case of finite dimensions.
Definition 3.83 Let Ω be a subset of a topological vector space X. We say that Ω is a polyhedral set in X if there exist continuous linear functionals fi ∈ X* and numbers αi ∈ R for i = 1, ..., m such that
Ω = {x ∈ X | fi(x) ≤ αi for all i = 1, ..., m}. (3.84)
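In Rⁿ each functional fi in (3.84) is given by an inner product, so membership in a polyhedral set reduces to a finite list of linear tests. A minimal sketch in plain Python (the triangle below is our choice of example):

```python
# The polyhedral set {x in R^2 : -x1 <= 0, -x2 <= 0, x1 + x2 <= 1} in the form (3.84),
# i.e., the triangle with vertices (0, 0), (1, 0), (0, 1): three closed halfspaces.
halfspaces = [((-1.0, 0.0), 0.0),
              ((0.0, -1.0), 0.0),
              ((1.0, 1.0), 1.0)]

def in_polyhedron(x):
    """Check f_i(x) <= alpha_i for every defining halfspace."""
    return all(f1 * x[0] + f2 * x[1] <= alpha for (f1, f2), alpha in halfspaces)

assert in_polyhedron((0.2, 0.3))
assert not in_polyhedron((0.8, 0.8))
```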

Geometrically, representation (3.84) means that Ω can be expressed as the intersection of finitely many closed halfspaces. Following the terminology and explanation in Rockafellar and Wets [317, page 75], we do not add “convex” when referring to a polyhedral set, but “convex” is used when referring to polyhedra. Classical convex analysis in finite-dimensional spaces developed in Rockafellar’s book [306] clearly distinguishes convex polyhedra from other convex sets in results related to subdifferential calculus and its applications.
A key ingredient of polyhedrality in Rⁿ is the following Rockafellar separation theorem [306, Theorem 20.2], for which we present a somewhat different and more detailed proof before establishing its extension to the general setting of LCTV spaces.
Given two nonempty sets Ω1 and Ω2 in a topological vector space X, we say that Ω1 and Ω2 can be separated by a closed hyperplane that does not contain Ω2 if there exist f ∈ X* and α ∈ R such that
sup{f(x) | x ∈ Ω1} ≤ α ≤ inf{f(x) | x ∈ Ω2} and α < sup{f(x) | x ∈ Ω2}.
Here is the aforementioned Rockafellar theorem in finite dimensions.

Theorem 3.84 Let P and Ω be two nonempty convex sets in Rⁿ such that P is a convex polyhedron. Then there exists a (closed) hyperplane separating P and Ω and not containing Ω if and only if
P ∩ ri(Ω) = ∅. (3.85)

Proof. First we verify that the separation property formulated in the theorem implies that condition (3.85) is satisfied. To this end, assume that there exists a hyperplane H defined by
H := {x ∈ Rⁿ | ⟨v, x⟩ = α} for some v ∈ Rⁿ, v ≠ 0, and α ∈ R
that separates P and Ω and does not contain Ω. Since H separates P and Ω, we have without loss of generality that
⟨v, x⟩ ≤ α ≤ ⟨v, p⟩ for all x ∈ Ω and p ∈ P. (3.86)
Remembering that Ω ⊄ H gives us a vector x̃ ∈ Ω such that
⟨v, x̃⟩ < α ≤ ⟨v, p⟩ for all p ∈ P.
Observe furthermore the relationships
⟨v, x⟩ ≤ α = ⟨v, u⟩ for all x ∈ Ω and u ∈ H
and also that the above vector x̃ ∈ Ω satisfies the strict inequality
⟨v, x̃⟩ < α = ⟨v, u⟩ for all u ∈ H,
which means that the hyperplane H properly separates the convex sets Ω and H. To justify now the fulfillment of (3.85), suppose the contrary and pick an arbitrary element x0 ∈ P ∩ ri(Ω). Then it follows from (3.86) that
⟨v, x0⟩ ≤ α ≤ ⟨v, x0⟩,
i.e., ⟨v, x0⟩ = α, which implies that x0 ∈ H. Since H is an affine set, we get ri(H) = H, and therefore
x0 ∈ H ∩ ri(Ω) = ri(H) ∩ ri(Ω).
This contradicts the proper separation of Ω and H, since we know from Theorem 2.92 that the latter property is characterized by the condition ri(H) ∩ ri(Ω) = ∅. Thus we verify the necessity of (3.85) for the separation property claimed in the theorem.
The proof of the sufficiency of condition (3.85) for the separation property stated in the theorem is more involved. To proceed, suppose that (3.85) holds and define the set
D := P ∩ aff(Ω). (3.87)
Assuming first that D = ∅ and taking into account that both sets P and aff(Ω) are convex polyhedra, it is not hard to check that they can be strictly separated by a hyperplane H; see Exercise 3.115. Observe that this hyperplane strictly separates P and Ω and does not contain Ω due to Ω ⊂ aff(Ω) ⊂ int(H⁺) while P ⊂ int(H⁻), where H⁺ and H⁻ stand for the corresponding “upper” and “lower” halfspaces with respect to H.
Now we consider the remaining case where D ≠ ∅. It follows from the imposed condition (3.85) that
ri(D) ∩ ri(Ω) ⊂ D ∩ ri(Ω) ⊂ P ∩ ri(Ω) = ∅.
Then Theorem 2.92 applied to Ω and D tells us that these sets can be properly separated by the hyperplane H0 defined by
H0 := {x ∈ Rⁿ | ⟨v0, x⟩ = α0} for some v0 ∈ Rⁿ, v0 ≠ 0, and α0 ∈ R.
Supposing that Ω ⊂ H0 gives us aff(Ω) ⊂ H0 and hence D ⊂ H0, which yields ⟨v0, x⟩ = ⟨v0, y⟩ = α0 for all x ∈ Ω and y ∈ D. This contradicts the proper separation of Ω and D by H0 and thus shows that H0 does not contain Ω.
To proceed further, suppose that
Ω ⊂ H0⁻ := {x ∈ Rⁿ | ⟨v0, x⟩ ≤ α0} and correspondingly D ⊂ H0⁺ := {x ∈ Rⁿ | ⟨v0, x⟩ ≥ α0},
and then define similarly to (3.87) the set intersection
Θ := H0⁻ ∩ aff(Ω).
Since H0 does not contain Ω and since Ω ⊂ Θ, we have that H0 also does not contain Θ. This tells us, by arguing as in the case of (3.87), that the sets H0 and Θ can be properly separated. Thus
H0 ∩ ri(Θ) = ri(H0) ∩ ri(Θ) = ∅
by the characterization of Theorem 2.92. Assuming now that D ∩ ri(Θ) ≠ ∅, pick any x ∈ D ∩ ri(Θ) and then obtain that
⟨v0, x⟩ ≤ α0 and α0 ≤ ⟨v0, x⟩,
which yields x ∈ H0. This contradicts the condition H0 ∩ ri(Θ) = ∅ and hence shows that D ∩ ri(Θ) = ∅. If furthermore x ∈ P ∩ ri(Θ), then x ∈ P ∩ aff(Ω) = D since ri(Θ) ⊂ aff(Ω). This means that x ∈ D ∩ ri(Θ), contradicting the established fact D ∩ ri(Θ) = ∅. Thus we arrive at
P ∩ ri(Θ) = ∅. (3.88)
If moreover P ∩ Θ = ∅, then—remembering that both these sets are polyhedral and arguing as above in case (3.87)—we find a hyperplane H which strictly separates Θ and P and provides the inclusions
Ω ⊂ Θ ⊂ int(H⁻) and P ⊂ int(H⁺).
We see that this hyperplane H separates P and Ω and does not contain Ω.
Consider next the remaining case where P ∩ Θ ≠ ∅ and suppose without loss of generality that 0 ∈ P ∩ Θ. Since Θ ⊂ aff(Ω), we have
0 ∈ P ∩ Θ ⊂ P ∩ aff(Ω) = D, and so 0 ∈ Θ ∩ D ⊂ H0⁻ ∩ H0⁺ = H0.
Using 0 ∈ H0 gives us α0 = 0, i.e., H0 = {x ∈ Rⁿ | ⟨v0, x⟩ = 0}.
Now we verify that 0 ∉ int(P). Indeed, suppose on the contrary that 0 ∈ int(P) and then get by 0 ∈ Θ that [x, 0) ⊂ ri(Θ) for each x ∈ ri(Θ); see Theorem 2.83. Since 0 ∈ int(P) clearly yields int(P) ∩ [x, 0) ≠ ∅, this shows that int(P) ∩ ri(Θ) ≠ ∅. The latter contradicts the established condition P ∩ ri(Θ) = ∅ and hence confirms that 0 ∉ int(P). Furthermore, taking into account that P is a convex polyhedron with 0 ∈ P \ int(P), we represent P in the form
P = {x | ⟨ui, x⟩ ≤ 0, i = 1, ..., m̄} ∩ {x | ⟨uj, x⟩ ≤ βj, j = m̄ + 1, ..., m},
where 1 ≤ m̄ ≤ m, where the uk are nonzero vectors in Rⁿ for k = 1, ..., m, and where βj > 0 for j = m̄ + 1, ..., m.
Denote by M the relative boundary of the above set Θ. Using the above notation, we get the representation
M = H0 ∩ aff(Ω)
as depicted in Figure 3.10. Note that both sets M and aff(Ω) are linear subspaces due to 0 ∈ H0 and 0 ∈ Θ ⊂ aff(Ω), while Θ is a convex cone.
Considering further the polyhedral (convex) cone
Fig. 3.10. Ω, P, D, H0, M, and Θ

K := {x ∈ Rⁿ | ⟨ui, x⟩ ≤ 0, i = 1, ..., m̄} + M = cone(P) + M, (3.89)
we claim that K ∩ ri(Θ) = ∅. Indeed, suppose on the contrary that there exists a vector x ∈ K ∩ ri(Θ) and deduce from x ∈ K and x ∉ M (recall that M is the relative boundary of Θ, while x ∈ ri(Θ)) that
x = γw + u for some w ∈ P, u ∈ M, and γ > 0.
It follows therefore that
w = γ⁻¹x − γ⁻¹u ∈ P.
Using x ∈ ri(Θ) and −u ∈ M ⊂ Θ for the subspace M, we obtain
λx − (1 − λ)u ∈ ri(Θ) whenever λ ∈ (0, 1). (3.90)
Since Θ is a cone, ri(Θ) is a cone as well. Combining this with (3.90) yields
(λ/(1 − λ)) x − u ∈ ri(Θ) for all λ ∈ (0, 1),
which is equivalent to the inclusion
αx − u ∈ ri(Θ) whenever α > 0.
In particular, we have x − u ∈ ri(Θ). Multiplying the latter by γ⁻¹ and using once more that ri(Θ) is a cone produces
w = γ⁻¹x − γ⁻¹u ∈ ri(Θ),
which means that w ∈ P ∩ ri(Θ). This clearly contradicts (3.88), and hence we arrive at the claimed condition K ∩ ri(Θ) = ∅.
We proceed with the proof of the theorem by representing the convex polyhedron (3.89) in the form
K = ⋂_{i=1}^{s} Hi⁻,
where each Hi⁻, i = 1, ..., s, is a closed halfspace generated by a hyperplane Hi, which passes through the origin since K is a cone. This means that for each i ∈ {1, ..., s} there exists a nonzero vector vi ∈ Rⁿ such that Hi⁻ = {x ∈ Rⁿ | ⟨vi, x⟩ ≤ 0}. Since M ⊂ K, we have M ⊂ Hi⁻ for all i = 1, ..., s.
Let us now show that Hi⁻ ∩ ri(Θ) = ∅ for at least one index i ∈ {1, ..., s}. Observe first that if Hi⁻ ∩ ri(Θ) ≠ ∅, then Θ ⊂ Hi⁻. Indeed, taking any x ∈ Hi⁻ ∩ ri(Θ) gives us ⟨v0, x⟩ < 0. Then pick y ∈ Θ and let
λ := ⟨v0, y⟩/⟨v0, x⟩ and z := y − λx.
We have that λ ≥ 0 and that z ∈ H0 by ⟨v0, z⟩ = 0 together with z ∈ aff(Ω), which hold due to x, y ∈ aff(Ω) and the fact that aff(Ω) is a subspace. This gives us z ∈ H0 ∩ aff(Ω) = M and therefore z ∈ Hi⁻. By taking into account that y = λx + z, ⟨vi, z⟩ ≤ 0, and ⟨vi, x⟩ ≤ 0, we arrive at the relationships
⟨vi, y⟩ = ⟨vi, λx + z⟩ = λ⟨vi, x⟩ + ⟨vi, z⟩ ≤ 0,
which show that y ∈ Hi⁻. Since y was chosen arbitrarily in Θ, we conclude that Θ ⊂ Hi⁻ whenever Hi⁻ ∩ ri(Θ) ≠ ∅. This implies that Hi0⁻ ∩ ri(Θ) = ∅ for some i0 ∈ {1, ..., s}, because the contrary would mean that
ri(Θ) ⊂ Θ ⊂ ⋂_{i=1}^{s} Hi⁻ = K,
which readily contradicts ri(Θ) ∩ K = ∅.
Finally, denoting H := Hi0 for such an index i0, we have Ω ⊂ Θ ⊂ H⁺, P ⊂ K ⊂ H⁻, and Ω ⊄ H due to ri(Ω) ⊂ ri(Θ) while ri(Θ) ∩ H = ∅. This verifies the sufficiency of (3.85) for the claimed separation property and thus completes the proof of the theorem. □

The main difference between characterizing the proper separation of general convex sets in finite dimensions and the setting where one of these sets is polyhedral is that relative interiors of polyhedral sets are not needed in the characterization condition (3.85).
Next we establish an extension of Theorem 3.84 to infinite dimensions. To proceed, we need the following useful lemma that holds in LCTV spaces.

Lemma 3.85 Let X be an LCTV space, let Ω be a nonempty convex subset of X, and let A: X → Rⁿ be a continuous linear mapping. If qri(Ω) ≠ ∅, then we have the equality
A(qri(Ω)) = ri(A(Ω)). (3.91)

Proof. We provide here a direct proof of the inclusion “⊂” without requiring that qri(Ω) ≠ ∅ and without the LCTV assumption on X. Fix any x̄ ∈ qri(Ω) and get from the definition that cl cone(Ω − x̄) is a linear subspace of X. Then A(cl cone(Ω − x̄)) is a linear subspace of Rⁿ, and hence it is closed. Thus
cl cone(A(Ω − x̄)) = cl A(cone(Ω − x̄)) ⊂ A(cl cone(Ω − x̄)).
Since the opposite inclusion is satisfied by the continuity of A, we have
cl cone(A(Ω − x̄)) = A(cl cone(Ω − x̄)),
which is a linear subspace and hence implies that Ax̄ ∈ qri(A(Ω)) = ri(A(Ω)). This justifies the inclusion “⊂” in (3.91). Since A(Ω) ⊂ Rⁿ is obviously quasi-regular, the opposite inclusion follows directly from Theorem 2.183. □

The following theorem, extending Theorem 3.84 to arbitrary LCTV spaces, reveals that the same result holds with the replacement of relative interiors of nonpolyhedral convex sets by their quasi-relative interiors, provided that the latter are nonempty. Recall by Theorem 2.178 that quasi-relative interiors are nonempty for all nonempty, closed, and convex sets in separable Banach spaces. Note also (see Exercise 3.116) that ri(P) ≠ ∅ for any nonempty polyhedral subset of an LCTV space. Thus it follows from Theorem 2.174 that
qri(P) = ri(P) ≠ ∅ for nonempty polyhedral sets.

Theorem 3.86 Let P and Ω be nonempty convex sets in an LCTV space X. Suppose that P is a convex polyhedron and that qri(Ω) ≠ ∅. Then we have that P ∩ qri(Ω) = ∅ if and only if there exists a closed hyperplane H ⊂ X such that H does not contain Ω and separates Ω and P.

Proof. Assume that the sets P and Ω can be separated by a closed hyperplane H with Ω ⊄ H. Then there exist f ∈ X* and α ∈ R such that
sup_{x∈P} f(x) ≤ α ≤ inf_{x∈Ω} f(x) and α < sup_{x∈Ω} f(x). (3.92)
Suppose on the contrary that P ∩ qri(Ω) ≠ ∅. Then we get x̄ ∈ P ∩ qri(Ω), so that f(x̄) = α, and deduce from (3.92) that f(x) ≥ 0 for all x ∈ cl cone(Ω − x̄). Since the latter set is a linear subspace, it follows that f(x) = 0 for all x ∈ Ω − x̄, i.e., f is constant on Ω. This contradicts the strict inequality in (3.92) and thus verifies that P ∩ qri(Ω) = ∅.
To verify the converse statement, assume that P ∩ qri(Ω) = ∅ with qri(Ω) ≠ ∅, and show the fulfillment of the separation property formulated in the theorem. Since P is a convex polyhedron, we have its representation
P = {x ∈ X | fi(x) ≤ bi, i = 1, ..., m}
with fi ∈ X* and bi ∈ R for i = 1, ..., m. Set L := ⋂_{i=1}^{m} ker(fi) and deduce from Proposition 1.119 that the quotient space X/L is finite-dimensional. For each i ∈ {1, ..., m} consider the function f̃i: X/L → R given by
f̃i([x]) := fi(x), where [x] := x + L ∈ X/L.
We see that the functions f̃i are well defined with f̃i ∈ (X/L)* for all i = 1, ..., m. Remembering the construction of the quotient map π: X → X/L from (1.3) gives us easily that
π(P) = {[x] ∈ X/L | f̃i([x]) ≤ bi, i = 1, ..., m},
and hence the set π(P) is a convex polyhedron in X/L. Since X/L is finite-dimensional, by Lemma 3.85 we have π(qri(Ω)) = ri(π(Ω)). Assuming now that π(P) ∩ ri(π(Ω)) ≠ ∅, i.e., that there is x ∈ qri(Ω) with [x] ∈ π(P), yields
fi(x) = f̃i([x]) ≤ bi for all i = 1, ..., m,
and so x ∈ P ∩ qri(Ω), which is a contradiction showing that π(P) ∩ ri(π(Ω)) = ∅. Thus we are in a position to apply Theorem 3.84 and get therefore x* ∈ (X/L)* and α ∈ R such that
sup_{[x]∈π(P)} ⟨x*, [x]⟩ ≤ α ≤ inf_{[x]∈π(Ω)} ⟨x*, [x]⟩ and α < sup_{[x]∈π(Ω)} ⟨x*, [x]⟩.
Proposition 1.118 ensures that the adjoint operator π*: (X/L)* → X* is onto the subspace L⊥. Letting f := π*(x*), we deduce from this proposition that f ∈ L⊥ ⊂ X* with f(x) = ⟨x*, [x]⟩ for x ∈ X, and so
sup_{x∈P} f(x) ≤ α ≤ inf_{x∈Ω} f(x) and α < sup_{x∈Ω} f(x),
which completes the proof of the theorem. □


3.4.2 Polyhedral Normal Cone Intersection Rule

Having in hand the polyhedral separation result from Theorem 3.86 in LCTV spaces, we now utilize it to develop calculus rules of generalized differentiation. Following the geometric approach of this book, our attention is first paid to establishing the normal cone intersection rule for two convex subsets of an LCTV space, where one set is polyhedral. The obtained result extends the polyhedral intersection rule in finite dimensions by replacing the relative interior with its quasi-relative counterpart.
Theorem 3.87 Let P and Ω be nonempty convex subsets of an LCTV space X, where P is a convex polyhedron. Assuming that
P ∩ qri(Ω) ≠ ∅, (3.93)
we have the following normal cone intersection rule:
N(x̄; P ∩ Ω) = N(x̄; P) + N(x̄; Ω) for all x̄ ∈ P ∩ Ω. (3.94)

Proof. Fix x̄ ∈ P ∩ Ω and x* ∈ N(x̄; P ∩ Ω), and then get by definition that
⟨x*, x − x̄⟩ ≤ 0 for all x ∈ P ∩ Ω.
Define further two convex sets in the product space X × R by
Q := {(x, λ) ∈ X × R | x ∈ P, λ ≤ ⟨x*, x − x̄⟩} and Θ := Ω × [0, ∞). (3.95)
It is easy to see that qri(Θ) = qri(Ω) × (0, ∞) and that the set Q is a convex polyhedron. Moreover, we can deduce from the constructions of Q, Θ in (3.95) and the choice of x* that Q ∩ qri(Θ) = ∅. The separation result of Theorem 3.86 gives us a nonzero pair (w*, γ) ∈ X* × R and α ∈ R such that
⟨w*, x⟩ + λ1 γ ≤ α ≤ ⟨w*, y⟩ + λ2 γ for all (x, λ1) ∈ Q, (y, λ2) ∈ Θ. (3.96)
Furthermore, there exists a pair (ỹ, λ̃2) ∈ Θ satisfying
α < ⟨w*, ỹ⟩ + λ̃2 γ.
Using (3.96) with (x̄, 0) ∈ Q and (x̄, 1) ∈ Θ yields γ ≥ 0. Now we employ the quasi-relative interior qualification condition (3.93) to show that γ ≠ 0. Suppose on the contrary that γ = 0 and then get
⟨w*, x⟩ ≤ α ≤ ⟨w*, y⟩ for all x ∈ P, y ∈ Ω, and α < ⟨w*, ỹ⟩ with ỹ ∈ Ω.
This tells us that the sets P and Ω can be separated by a hyperplane H which does not contain Ω, and thus by Theorem 3.86 we have P ∩ qri(Ω) = ∅. The obtained contradiction shows that γ > 0.
We immediately deduce from (3.96) that
⟨w*, x̄⟩ ≤ α ≤ ⟨w*, x⟩ for all x ∈ Ω,
and thus −w* ∈ N(x̄; Ω) and hence −w*/γ ∈ N(x̄; Ω). It also follows from (3.96) applied to (x, β) ∈ Q with β := ⟨x*, x − x̄⟩ and to (x̄, 0) ∈ Θ that
⟨w*, x⟩ + γ⟨x*, x − x̄⟩ ≤ ⟨w*, x̄⟩ whenever x ∈ P.
Dividing both sides therein by γ gives us the inequality
⟨w*/γ + x*, x − x̄⟩ ≤ 0 for all x ∈ P,
and so w*/γ + x* ∈ N(x̄; P). Therefore, we arrive at
x* = (w*/γ + x*) + (−w*/γ) ∈ N(x̄; P) + N(x̄; Ω),
which verifies the inclusion “⊂” in (3.94) and thus completes the proof of the theorem, since the opposite inclusion is trivial. □

3.4.3 Polyhedral Calculus for Coderivatives and Subdifferentials

In this section we establish major calculus rules for coderivatives of convex set-
valued mappings and subdifferentials of convex extended-real-valued functions
in LCTV spaces under certain polyhedrality assumptions. These assumptions
allow us to significantly improve the previous qualification conditions used
for coderivative and subdifferential calculi in LCTV spaces in nonpolyhedral
settings. According to the geometric approach, the driving force of our deriva-
tions here is the application of the polyhedral intersection rule for normal
cones to convex sets obtained above in Theorem 3.87.
We start with the following notions of polyhedral set-valued mappings and
extended-real-valued functions in topological vector spaces.
Definition 3.88 Let X and Y be topological vector spaces.
(a) A mapping F : X ⇉ Y is said to be a polyhedral set-valued mapping/
multifunction if its graph is a convex polyhedron in X × Y.
(b) An extended-real-valued function f : X → R̄ is said to be a polyhedral
function if its epigraph epi(f) is a convex polyhedron in X × R.
The first result of this subsection presents a coderivative sum rule for two
set-valued mappings, one of which is polyhedral. The proof is based on the
application of the polyhedral normal cone intersection rule from Theorem 3.87.
Theorem 3.89 Consider two convex set-valued mappings F1, F2 : X ⇉ Y
between LCTV spaces and impose the following graphical quasi-relative inte-
rior qualification condition: there exists a triple (x, y1, y2) ∈ X × Y × Y such
that

(x, y1) ∈ gph(F1) and (x, y2) ∈ qri(gph(F2)). (3.97)

Assuming in addition that the set-valued mapping F1 is polyhedral, we have
the coderivative sum rule

D∗(F1 + F2)(x̄, ȳ)(y∗) = D∗F1(x̄, ȳ1)(y∗) + D∗F2(x̄, ȳ2)(y∗) (3.98)

valid for all (x̄, ȳ) ∈ gph(F1 + F2), all y∗ ∈ Y∗, and all (ȳ1, ȳ2) ∈ S(x̄, ȳ),
where the set-valued mapping S is defined in (3.28).
Proof. Fix any x∗ ∈ D∗(F1 + F2)(x̄, ȳ)(y∗) and get by Definition 3.19 that
(x∗, −y∗) ∈ N((x̄, ȳ); gph(F1 + F2)). For every (ȳ1, ȳ2) ∈ S(x̄, ȳ) form two
convex subsets of the LCTV space X × Y × Y by

Ω1 := {(x, y1, y2) ∈ X × Y × Y | y1 ∈ F1(x)},
Ω2 := {(x, y1, y2) ∈ X × Y × Y | y2 ∈ F2(x)}.
242 3 CONVEX GENERALIZED DIFFERENTIATION

Then Ω1 is a convex polyhedron and we can easily see that

qri(Ω2) = {(x, y1, y2) ∈ X × Y × Y | (x, y2) ∈ qri(gph(F2))}.

It follows directly from the definitions that

(x∗, −y∗, −y∗) ∈ N((x̄, ȳ1, ȳ2); Ω1 ∩ Ω2). (3.99)

The imposed qualification condition clearly yields Ω1 ∩ qri(Ω2) ≠ ∅. Appealing
now to Theorem 3.87 for the set intersection in (3.99) tells us that

(x∗, −y∗, −y∗) ∈ N((x̄, ȳ1, ȳ2); Ω1) + N((x̄, ȳ1, ȳ2); Ω2).

Thus we arrive at the representation

(x∗, −y∗, −y∗) = (x∗1, −y∗, 0) + (x∗2, 0, −y∗) with (x∗i, −y∗) ∈ N((x̄, ȳi); gph(Fi))

for i = 1, 2. The obtained representation amounts to saying that

x∗ = x∗1 + x∗2 ∈ D∗F1(x̄, ȳ1)(y∗) + D∗F2(x̄, ȳ2)(y∗),

which justifies the inclusion "⊂" in (3.98). The opposite inclusion is obvious,
and thus we are done with the proof. □
The following polyhedral subdifferential sum rule for extended-real-valued
convex functions is a consequence of Theorem 3.89.
Corollary 3.90 Let fi : X → R̄, i = 1, 2, be proper convex functions defined
on an LCTV space X such that f1 is polyhedral. Under the fulfillment of the
qualification condition

dom(f1) ∩ qri(dom(f2)) ≠ ∅

we have the subdifferential sum rule

∂(f1 + f2)(x̄) = ∂f1(x̄) + ∂f2(x̄) whenever x̄ ∈ dom(f1) ∩ dom(f2).

Proof. Define the convex set-valued mappings F1, F2 : X ⇉ R by

Fi(x) := [fi(x), ∞) for i = 1, 2.

Then gph(F1) = epi(f1) is a convex polyhedron by definition. Fix any
x ∈ dom(f1) ∩ qri(dom(f2)) and choose γ > max{f1(x), f2(x)}. It follows
from Theorem 2.190 that (x, γ) ∈ epi(f1) = gph(F1) and that (x, γ) ∈
qri(epi(f2)) = qri(gph(F2)). Furthermore, for every x∗ ∈ ∂(f1 + f2)(x̄) with
x̄ ∈ dom(f1) ∩ dom(f2) we have

x∗ ∈ D∗(F1 + F2)(x̄, ȳ)(1), where ȳ := (f1 + f2)(x̄).

Applying to the latter Theorem 3.89 with ȳi = fi(x̄) as i = 1, 2 gives us

x∗ ∈ D∗F1(x̄, ȳ1)(1) + D∗F2(x̄, ȳ2)(1) = ∂f1(x̄) + ∂f2(x̄),

which verifies the inclusion "⊂" in (3.61) and thus completes the proof of the
corollary, since the opposite inclusion is obvious. □
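A one-dimensional instance is easy to check: f1(x) = |x| is polyhedral since epi(f1) is a convex polyhedron in R², while f2(x) = x² has dom(f2) = R, so the qualification condition holds trivially and the sum rule predicts ∂(f1 + f2)(0) = [−1, 1] + {0} = [−1, 1]. The following minimal numerical sketch (Python with NumPy; the helper names are ours) tests membership through the subgradient inequality on a fine grid.

```python
import numpy as np

# Corollary 3.90 in R: f1(x) = |x| polyhedral, f2(x) = x^2 convex with
# full domain, so ∂(f1 + f2)(0) = [-1, 1] + {0} = [-1, 1] by the sum rule.
xs = np.linspace(-2.0, 2.0, 400001)
fvals = np.abs(xs) + xs**2

def in_subdiff_at_0(g, tol=1e-12):
    """Subgradient inequality f1(x) + f2(x) >= 0 + g*x over the grid."""
    return bool(np.all(fvals - g * xs >= -tol))

# Endpoints and an interior point of [-1, 1] pass; points outside fail.
assert in_subdiff_at_0(1.0) and in_subdiff_at_0(-1.0) and in_subdiff_at_0(0.3)
assert not in_subdiff_at_0(1.05) and not in_subdiff_at_0(-1.2)
```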

Our next goal here is to obtain efficient conditions that ensure the fulfill-
ment of the graphical quasi-relative interior qualification condition (3.97) and
hence of the coderivative sum rule (3.98) derived in Theorem 3.89. Observe to
this end that it would be much easier to check relative interior-type conditions
for domains of mappings than for their graphs. To achieve this goal, we use in
what follows the corresponding relationships for generalized relative interiors
established in Section 2.5.

Theorem 3.91 Let F1, F2 : X ⇉ Y be two convex set-valued mappings between
LCTV spaces, and let F1 be polyhedral. Then the coderivative sum rule in The-
orem 3.89 holds under either one of the following conditions:

(a) dom(F2) is quasi-regular, and there exists u ∈ dom(F1) ∩ qri(dom(F2))
such that qri(F2(u)) ≠ ∅.
(b) There exists u ∈ dom(F1) ∩ qri(dom(F2)) such that int(F2(u)) ≠ ∅.

Proof. First we show that for any convex set-valued mapping F : X ⇉ Y
between LCTV spaces the inclusion

qri(gph(F)) ⊃ {(x, y) ∈ X × Y | x ∈ qri(dom(F)), y ∈ int(F(x))} (3.100)

holds. To furnish this, pick (x̄, ȳ) ∈ X × Y with x̄ ∈ qri(dom(F)) and
ȳ ∈ int(F(x̄)), and suppose on the contrary that (x̄, ȳ) ∉ qri(gph(F)). Then
the separation result from Proposition 2.181 tells us that the sets {(x̄, ȳ)} and
gph(F) can be properly separated by a closed hyperplane. This means that
there is (x∗, y∗) ∈ X∗ × Y∗ with

⟨x∗, x⟩ + ⟨y∗, y⟩ ≤ ⟨x∗, x̄⟩ + ⟨y∗, ȳ⟩ whenever x ∈ dom(F), y ∈ F(x), (3.101)

and that also there exists a pair (x̃, ỹ) ∈ gph(F) for which

⟨x∗, x̃⟩ + ⟨y∗, ỹ⟩ < ⟨x∗, x̄⟩ + ⟨y∗, ȳ⟩. (3.102)

Choosing x = x̄, we deduce from the estimate in (3.101) that

⟨y∗, y⟩ ≤ ⟨y∗, ȳ⟩ for all y ∈ F(x̄). (3.103)

Since ȳ ∈ int(F(x̄)), there exists a symmetric neighborhood V of the origin
with V ⊂ int(F(x̄)) − {ȳ}. It follows from (3.103) that ⟨y∗, v⟩ ≤ 0 and
⟨y∗, −v⟩ ≤ 0 if 0 ≠ v ∈ V, which yields y∗ = 0 on V. Since V is a neighborhood
of the origin, it is absorbing, and so for any y ∈ Y we can find 0 ≠ t ∈ R such
that ty = v ∈ V and hence ⟨y∗, y⟩ = t⁻¹⟨y∗, v⟩ = 0, i.e., y∗ = 0 on the entire
space Y. It follows from (3.101) and (3.102) that the sets {x̄} and dom(F)
can be properly separated, and so Proposition 2.181 tells us that x̄ ∉
qri(dom(F)). The obtained contradiction readily justifies (3.100).
We proceed with the proof of the qualification condition (3.97) and find
by assumption (a) two vectors z1 ∈ F1(u) and z2 ∈ qri(F2(u)). Thus it follows
from Theorem 2.189(b) that

(u, z1) ∈ gph(F1) and (u, z2) ∈ qri(gph(F2)),

which verifies (3.97). Similarly we get from assumption (b) that there exist
vectors z1 ∈ F1(u) and z2 ∈ int(F2(u)), and hence (3.97) is implied by the
inclusion (3.100) established above. To summarize, we see that both assump-
tions (a) and (b) ensure the fulfillment of the qualification condition (3.97),
and so the coderivative sum rule (3.98) follows from Theorem 3.89. □

The next theorem provides a refined coderivative chain rule in the poly-
hedrality setting of LCTV spaces.

Theorem 3.92 Let F : X ⇉ Y and G : Y ⇉ Z be convex set-valued mappings
between the corresponding LCTV spaces. Suppose that one of the following two
conditions holds:

(a) F is polyhedral, and there exists a triple (x, y, z) ∈ X × Y × Z satisfying

(x, y) ∈ gph(F) and (y, z) ∈ qri(gph(G)).

(b) G is polyhedral, and there exists a triple (x, y, z) ∈ X × Y × Z satisfying

(x, y) ∈ qri(gph(F)) and (y, z) ∈ gph(G).

Then whenever (x̄, z̄) ∈ gph(G ◦ F) and z∗ ∈ Z∗ we have

D∗(G ◦ F)(x̄, z̄)(z∗) = D∗F(x̄, ȳ) ◦ D∗G(ȳ, z̄)(z∗) for all ȳ ∈ M(x̄, z̄), (3.104)

where M(x̄, z̄) := F(x̄) ∩ G⁻¹(z̄).

Proof. Fix x∗ ∈ D∗(G ◦ F)(x̄, z̄)(z∗) and ȳ ∈ M(x̄, z̄). As in the proof of
Theorem 3.23 (with no polyhedrality assumptions), we check that

(x∗, 0, −z∗) ∈ N((x̄, ȳ, z̄); Ω1 ∩ Ω2),

where the sets Ω1 and Ω2 are defined by

Ω1 := gph(F) × Z and Ω2 := X × gph(G).

The first qualification condition (a) tells us that Ω1 is a convex polyhedron
and yields furthermore the fulfillment of

Ω1 ∩ qri(Ω2) ≠ ∅.

Similarly, the second qualification condition (b) ensures that Ω2 is a convex
polyhedron and that the symmetric relationship

qri(Ω1) ∩ Ω2 ≠ ∅

is satisfied. To complete the proof, we only need to use the intersection rule
from Theorem 3.87 and then follow the proof of Theorem 3.23. □

As a consequence of Theorem 3.92, we finally arrive at the polyhedral
subdifferential chain rule in LCTV spaces without involving any (generalized)
relative interior notion in qualification conditions.

Corollary 3.93 Let A : X → Y be a continuous linear mapping between
LCTV spaces, and let f : Y → R̄ be a polyhedral convex function. Denote
ȳ := A(x̄) and assume that ȳ ∈ dom(f) for some x̄ ∈ X. Then we have the
subdifferential chain rule:

∂(f ◦ A)(x̄) = A∗∂f(ȳ) := {A∗y∗ | y∗ ∈ ∂f(ȳ)}. (3.105)

Proof. Define F(x) := {Ax} for x ∈ X and G(y) := [f(y), ∞) for y ∈ Y. Then
qri(gph(F)) = gph(A), dom(G) = dom(f), and gph(G) = epi(f). The
imposed assumptions guarantee that the qualification condition (b) of Theo-
rem 3.92 is satisfied. This allows us to deduce from Theorem 3.92, with
z̄ := f(ȳ), the fulfillment of the following equalities:

∂(f ◦ A)(x̄) = D∗(G ◦ F)(x̄, z̄)(1) = D∗F(x̄, ȳ)(D∗G(ȳ, z̄)(1)) = A∗∂f(ȳ),

which therefore verify the claimed subdifferential chain rule (3.105). □
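The chain rule (3.105) is easy to instantiate in R²: f(y) = max{y1, y2} is polyhedral, and at x̄ = 0 (so that ȳ = Ax̄ = 0 is a kink of f) one expects ∂(f ◦ A)(0) = Aᵀ∂f(0) = {Aᵀ(λ, 1 − λ) : λ ∈ [0, 1]}. The sketch below (Python with NumPy; the matrix A and all helper names are our illustrative choices) probes this prediction via the subgradient inequality at sampled points.

```python
import numpy as np

# Chain rule (3.105) with the polyhedral f(y) = max(y1, y2) and a linear
# map A; at xbar = 0, ∂f(0) = {(lam, 1 - lam) : 0 <= lam <= 1}, so the
# rule predicts ∂(f∘A)(0) = A^T ∂f(0), a segment in R^2.
A = np.array([[1.0, 2.0], [3.0, 4.0]])     # illustrative choice of A
rng = np.random.default_rng(1)
X = rng.normal(size=(20000, 2))            # sampled test points
h = np.max(X @ A.T, axis=1)                # h(x) = f(Ax), with h(0) = 0

def in_subdiff_at_0(v, tol=1e-9):
    """Subgradient inequality h(x) >= h(0) + <v, x> on the sampled points."""
    return bool(np.all(h - X @ v >= -tol))

for lam in (0.0, 0.5, 1.0):                # elements of A^T ∂f(0)
    assert in_subdiff_at_0(A.T @ np.array([lam, 1.0 - lam]))
assert not in_subdiff_at_0(np.array([2.0, 2.0]))   # not of the form A^T g
```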

3.5 Exercises for Chapter 3


Exercise 3.94 Let Ω be a nonempty convex subset of Rn, and let x ∉ Ω. Prove
that w ∈ Π(x; Ω) if and only if x − w ∈ N(w; Ω).
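For intuition, the claimed equivalence can be checked numerically when Ω = [0, 1]² ⊂ R², whose Euclidean projection is a coordinatewise clip. The sketch below (Python with NumPy; the helper names are ours) tests the normal-cone inequality over a dense sample of Ω.

```python
import numpy as np

# Exercise 3.94 for Omega = [0,1]^2 in R^2: w = Π(x; Omega) iff
# x - w ∈ N(w; Omega).  The projection onto a box is a coordinatewise clip.
def proj_box(x):
    return np.clip(x, 0.0, 1.0)

grid = np.linspace(0.0, 1.0, 201)
Z = np.array([[a, b] for a in grid for b in grid])   # dense sample of Omega

def in_normal_cone(v, w, tol=1e-9):
    """Check <v, z - w> <= 0 over the sampled points z of Omega."""
    return bool(np.all((Z - w) @ v <= tol))

x = np.array([2.0, 0.5])
w = proj_box(x)                          # the projection (1.0, 0.5)
assert in_normal_cone(x - w, w)          # x - w lies in N(w; Omega)

w_bad = np.array([1.0, 1.0])             # a boundary point, but not Π(x; Omega)
assert not in_normal_cone(x - w_bad, w_bad)
```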

Exercise 3.95 Do the following:


(a) Provide detailed proofs of Corollary 3.8 and Corollary 3.9 under the assumptions
imposed therein.
(b) Show that for X = Rn the result of Corollary 3.9 holds without imposing the
interiority condition.

Exercise 3.96 Check that the normal cone inclusion (3.7) holds for any convex sets
in arbitrary topological vector spaces.

Exercise 3.97 Let Ω1, Ω2 ⊂ X be two nonempty convex subsets of a topological
vector space X.
(a) Check that the difference interiority condition

0 ∈ int(Ω1 − Ω2 ) (3.106)
ensures the validity of the bounded extremality condition (3.11) provided that
Ω2 is bounded.
(b) Clarify whether the bounded extremality condition (3.11) is implied by the
standard qualification condition (3.9) in arbitrary topological vector spaces.

Exercise 3.98 Let X be a real vector space, and let

X′ := {f : X → R | f is a linear function} (3.107)

be its algebraic dual space. Given a nonempty convex set Ω ⊂ X, define the
normal cone to Ω at x̄ ∈ Ω by

N(x̄; Ω) := {f ∈ X′ | f(x − x̄) ≤ 0 for all x ∈ Ω}. (3.108)
(a) Formulate an appropriate notion of set extremality for sets Ω1 , Ω2 ⊂ X in the
vein of Definition 3.6.
(b) Formulate and give a direct proof of the characterizations of set extremality in
real vector spaces without topological structures in the form of Theorem 3.7 by
using the normal cone (3.108) and by replacing the topological interior int(Ω)
with its core/algebraic interior core(Ω).
(c) Clarify the possibility of deriving the result of (b) from Theorem 3.7 by using
the core convex topology on X described in Exercise 2.207.

Exercise 3.99 Verify that in the proof of Theorem 3.22 we have the following:

(a) Ω1 − Ω2 = (dom(F1) − dom(F2)) × Y × Y.
(b) (x∗, −y∗, −y∗) ∈ N((x̄, ȳ1, ȳ2); Ω1 ∩ Ω2).

Exercise 3.100 Verify that in the proof of Theorem 3.23 we have the following:

(a) Ω1 − Ω2 = X × (rge(F) − dom(G)) × Z.
(b) (x∗, 0, −z∗) ∈ N((x̄, ȳ, z̄); Ω1 ∩ Ω2).

Exercise 3.101 Give a direct proof of (3.43) for convex functions and investigate
the possibility to extend this to topological vector spaces.

Exercise 3.102 Let f : X → R̄ be a convex function defined on a topological
vector space. Prove that if f is continuous at x̄, then ∂f(x̄) is compact in the
weak∗ topology. Hint: Use the Banach-Alaoglu theorem in topological vector
spaces.

Exercise 3.103 Let f : X → R̄ be a convex function on a topological vector
space. Consider the upper subdifferential/superdifferential of f at x̄ ∈ dom(f)
defined by

∂+f(x̄) := {x∗ ∈ X∗ | ⟨x∗, x − x̄⟩ ≥ f(x) − f(x̄) for all x ∈ X}

and also the symmetric subdifferential ∂0f(x̄) := ∂f(x̄) ∪ ∂+f(x̄). Prove that

∂+f(x̄) ⊂ ∂f(x̄), and hence ∂0f(x̄) = ∂f(x̄).

Hint: Compare with the proof of [228, Theorem 1.93] in the case of normed
spaces.

Exercise 3.104 Prove the subdifferential sum rule for finitely many functions for-
mulated in Corollary 3.49.

Exercise 3.105 Let f1, f2 : Rn → R̄ be extended-real-valued convex functions,
and let x̄ ∈ dom(f1) ∩ dom(f2).

(a) Prove the subdifferential sum rules

∂(f1 + f2)(x̄) = ∂f1(x̄) + ∂f2(x̄), ∂∞(f1 + f2)(x̄) = ∂∞f1(x̄) + ∂∞f2(x̄)

for both the basic subdifferential (3.39) and the singular subdifferential (3.41)
under the singular subdifferential qualification condition

∂∞f1(x̄) ∩ (−∂∞f2(x̄)) = {0}. (3.109)

(b) Obtain an extension of (a) to the case of finitely many convex functions.
(c) Compare the qualification condition in (a) and (b) with the relative interior
qualification condition (3.60) in Theorem 3.50.
(d) Derive the subdifferential sum rules of Theorem 3.50 and of parts (a) and (b)
of this exercise from the corresponding sum rules for coderivatives by using the
relationships in (3.42).
(e) Obtain appropriate versions of the results in (a)–(d) for convex functions on
topological vector spaces.

Exercise 3.106 Find the subdifferential formulas for the following functions:
(a) f(x) := 3|x|, x ∈ R.
(b) f(x) := |x − 2| + |x + 2|, x ∈ R.
(c) f(x) := max{e^{−2x}, e^{2x}}, x ∈ R.
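For part (b), the subdifferential sum rule predicts ∂f(0) = {−1} + {1} = {0} and ∂f(2) = [−1, 1] + {1} = [0, 2]. These values can be confirmed on a grid via the subgradient inequality; the sketch below is in Python with NumPy, and the helper names are ours.

```python
import numpy as np

# Exercise 3.106(b): f(x) = |x-2| + |x+2|, with predicted subdifferentials
# ∂f(0) = {0} and ∂f(2) = [0, 2].
xs = np.linspace(-6.0, 6.0, 1200001)
fvals = np.abs(xs - 2.0) + np.abs(xs + 2.0)

def in_subdiff(g, xbar, tol=1e-9):
    """Subgradient inequality f(x) >= f(xbar) + g*(x - xbar) over the grid."""
    fx = abs(xbar - 2.0) + abs(xbar + 2.0)
    return bool(np.all(fvals - fx - g * (xs - xbar) >= -tol))

assert in_subdiff(0.0, 0.0) and not in_subdiff(0.1, 0.0)     # ∂f(0) = {0}
assert in_subdiff(0.0, 2.0) and in_subdiff(2.0, 2.0)         # ∂f(2) = [0, 2]
assert not in_subdiff(2.05, 2.0) and not in_subdiff(-0.1, 2.0)
```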

Exercise 3.107 Let f (x1 , x2 ) := max{|x1 |, |x2 |} for (x1 , x2 ) ∈ R2 . Calculate the
subdifferentials ∂f (0, 0) and ∂f (1, 1).
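Here f is the l∞ norm on R², so one expects ∂f(0, 0) to be the dual l1 unit ball and ∂f(1, 1) the exposed face conv{(1, 0), (0, 1)}. Both claims can be probed by sampling the subgradient inequality (Python with NumPy; the helper names are ours).

```python
import numpy as np

# Exercise 3.107: f(x1, x2) = max{|x1|, |x2|} is the l-infinity norm, so
# ∂f(0,0) = {v : |v1| + |v2| <= 1} and ∂f(1,1) = conv{(1,0), (0,1)}.
rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, size=(50000, 2))
fvals = np.max(np.abs(X), axis=1)

def in_subdiff(v, xbar, tol=1e-9):
    """Subgradient inequality f(x) >= f(xbar) + <v, x - xbar> on samples."""
    fx = np.max(np.abs(xbar))
    return bool(np.all(fvals - fx - (X - xbar) @ v >= -tol))

origin, corner = np.array([0.0, 0.0]), np.array([1.0, 1.0])
assert in_subdiff(np.array([0.5, -0.5]), origin)       # in the l1 ball
assert not in_subdiff(np.array([0.8, 0.8]), origin)    # l1 norm 1.6 > 1
assert in_subdiff(np.array([0.3, 0.7]), corner)        # on the segment
assert not in_subdiff(np.array([0.5, -0.5]), corner)   # off the segment
```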

Exercise 3.108 Let X be a normed space, and let F be a closed bounded convex
set containing the origin in its interior. Prove the following subdifferential
formula for the Minkowski gauge function pF:

∂pF(x̄) = F° if x̄ = 0,
∂pF(x̄) = {x∗ ∈ X∗ | pF°(x∗) = 1, ⟨x∗, x̄⟩ = pF(x̄)} otherwise,

where F° := {x∗ ∈ X∗ | ⟨x∗, x⟩ ≤ 1 for all x ∈ F}.
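A planar instance: for F = {x ∈ R² : |x1| + |x2| ≤ 1} the gauge pF is the l1 norm and F° is the l∞ unit ball, so at x̄ = (2, 0) the stated formula predicts ∂pF(x̄) = {(1, t) : |t| ≤ 1}. The sampling-based sketch below is in Python with NumPy, and the helper names are ours.

```python
import numpy as np

# Exercise 3.108 in R^2 with F the l1 unit ball: p_F = l1 norm, F° the
# l-infinity unit ball; at xbar = (2, 0), ∂p_F(xbar) = {(1, t) : |t| <= 1}.
rng = np.random.default_rng(3)
X = rng.uniform(-4.0, 4.0, size=(50000, 2))
p = np.abs(X).sum(axis=1)                      # p_F = l1 norm

xbar = np.array([2.0, 0.0])

def in_subdiff(v, tol=1e-9):
    """Subgradient inequality p_F(x) >= p_F(xbar) + <v, x - xbar> on samples."""
    return bool(np.all(p - np.abs(xbar).sum() - (X - xbar) @ v >= -tol))

assert in_subdiff(np.array([1.0, 0.5]))        # (1, t) with |t| <= 1
assert in_subdiff(np.array([1.0, -1.0]))
assert not in_subdiff(np.array([0.9, 0.0]))    # <v, xbar> < p_F(xbar)
assert not in_subdiff(np.array([1.0, 1.1]))    # v outside F°
```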

Exercise 3.109 Derive chain rules for both basic and singular subdifferentials from
the corresponding chain rules for coderivatives in topological vector spaces and finite-
dimensional spaces. Hint: Use the relationships in (3.42).

Exercise 3.110 Let f : X × Y → R̄ be a convex function defined on the product
of two topological vector spaces, and let (x̄, ȳ) ∈ dom(f). Denote by ∂xf(x̄, ȳ)
and ∂x∞f(x̄, ȳ) the basic and singular subdifferentials of f with respect to the
first variable.

(a) Find conditions on f under which we have

∂xf(x̄, ȳ) = {x∗ ∈ X∗ | ∃ y∗ ∈ Y∗ with (x∗, y∗) ∈ ∂f(x̄, ȳ)},
∂x∞f(x̄, ȳ) = {x∗ ∈ X∗ | ∃ y∗ ∈ Y∗ with (x∗, y∗) ∈ ∂∞f(x̄, ȳ)}.

(b) Specify the results of (a) in the case of finite-dimensional spaces X and Y.

Hint: Use the subdifferential chain rules from Theorems 3.55 and 3.56 together
with their singular subdifferential counterparts.

Exercise 3.111 Give a direct proof of Proposition 3.67(c) when X is a Hilbert
space. Hint: Use the parallelogram law in Hilbert spaces to show that any sequence
of approximate projections is Cauchy.
Exercise 3.112 Prove assertion (b) of Proposition 3.77.
Exercise 3.113 Let Θ : X ⇉ Y be a multifunction between two normed spaces.

(a) Find conditions on the multifunction Θ ensuring the Lipschitz continuity of
the generalized distance function

d(x; Θ(y)) = dist(x; Θ(y)) := inf{‖x − w‖ | w ∈ Θ(y)}. (3.110)

(b) Extend the general (not subdifferential) properties of the basic distance
function (3.72) obtained in Subsection 3.3.5 to the generalized one (3.110)
under suitable assumptions on Θ. Hint: Compare with [315, Theorem 2.3]
in finite dimensions and with [229, Theorem 1.41] in normed spaces.
(c) Find conditions on Θ ensuring the convexity of (3.110) in both variables.
(d) Assuming that (x, y) → d(x; Θ(y)) is convex, establish appropriate counter-
parts of the subdifferential results from Subsection 3.3.5 in the case of the
generalized distance function (3.110).
(e) Under the convexity of (3.110) in both variables (x, y), derive upper estimates
and precise formulas for calculating the singular subdifferential ∂∞d(x̄; Θ(ȳ))
of (3.110) at (x̄, ȳ) ∈ X × Y considering both the in-set case (x̄, ȳ) ∈ gph(Θ)
and the out-of-set case (x̄, ȳ) ∉ gph(Θ).

Hint: For (d) and (e), specify and improve the corresponding results of [231, 233]
provided that the function (x, y) → d(x; Θ(y)) is convex.
Exercise 3.114 Let all the spaces under consideration in this chapter be real
vector spaces without topology with their algebraic dual spaces defined as in
(3.107).

(a) Prove the intersection rule for the normal cone (3.108) in real vector spaces
by using the characterization of set extremality from Exercise 3.98.
(b) Formulate and verify vector space counterparts of the calculus results given
in Subsections 3.2.2, 3.3.2, 3.3.3, and 3.3.4 by proceeding similarly to the
proofs therein and replacing set interiors with cores.
(c) Clarify the possibility to derive the results from (b) in vector spaces by
reducing them to those obtained in topological vector spaces with usage of
the core convex topology discussed in Exercise 2.207.
Exercise 3.115 Let Ω1 , Ω2 ⊂ Rn be two nonempty polyhedral sets such that Ω1 ∩
Ω2 = ∅. Prove that they can be strictly separated. Hint: Reduce to the description of
strict separation in [306, Theorem 11.4] and compare with Corollary 19.3.3 therein.
Exercise 3.116 Let Ω be a nonempty convex polyhedron in a topological vector
space X. Prove that ri(Ω) ≠ ∅. Hint: Proceed by the definition of polyhedral
sets and cf. [42, Proposition 2.197].
Exercise 3.117 Obtain an extension of Theorem 3.87 to the case of intersections
of finitely many convex sets in topological vector spaces, where all but one of the
sets are polyhedral. Hint: Proceed by induction with deriving first the intersection
rule for two sets, which are both convex polyhedra.

Exercise 3.118 A subset Ω of a topological vector space X is said to be a gen-
eralized convex polyhedron if it can be represented as the intersection of a convex
polyhedron with a closed affine subset of X.

(a) Give an example of a generalized convex polyhedron that is not a convex
polyhedron. Hint: Let X = ℓ2, and let Ω := {x = (xn) ∈ ℓ2 | x2n = 0
for all n ∈ N}.
(b) Prove that any generalized convex polyhedron is a convex polyhedron if X
is a finite-dimensional space.
(c) Clarify whether Theorem 3.86 and the results of Subsections 3.4.2 and 3.4.3
hold with the replacement of the corresponding polyhedrality by generalized
polyhedrality assumptions.

Exercise 3.119 Extend the coderivative and subdifferential sum rules from Theo-
rem 3.89 and Corollary 3.90, respectively, to the cases of finitely many terms in
the corresponding summation.

Exercise 3.120 Derive efficient conditions for the fulfillment of the polyhedral chain
rule in Theorem 3.92 without using the quasi-relative interiors of the graphs. Hint:
Proceed similarly to the proof of Theorem 3.91.

Exercise 3.121 Develop calculus rules of convex generalized differentiation for nor-
mals to sets, coderivatives of set-valued mappings, and subgradients of extended-
real-valued functions in LCTV spaces under extended relative interior qualification
conditions without any polyhedrality assumptions. Hint: Start with the normal cone
intersection rule and employ the proper separation result of Theorem 2.184 as in
the proof of Theorem 3.15.

3.6 Commentaries to Chapter 3


As mentioned in the commentary Section 2.7, in convex analysis everything starts with
geometry, and generalized differentiation is not an exception. The fundamental notion
of the normal cone to a convex set in finite dimensions goes back to Minkowski [220] who
was motivated by establishing and formalizing the supporting hyperplane theorem on
the existence of nonzero normals at boundary points of convex sets, an equivalent of the
basic convex separation theorem in convex analysis.
The concept of the subdifferential (collection of subgradients) for extended-real-
valued convex functions was defined independently by Moreau [264] and Rockafellar
[302] without any appeal to the normal cone, although it has been realized that sub-
gradients can be expressed via normals to their epigraph. Both Moreau and Rock-
afellar were influenced by Fenchel’s developments presented in the lecture notes [131]
while addressing not extended-real-valued functions ϕ : Rn → R, but pairs (ϕ, Ω)
of real-valued convex functions ϕ defined on convex sets Ω. Note that dealing with
subgradients of extended-real-valued functions immediately calls for developing sub-
differential calculus, since an extended-real-valued function is represented as the sum
of its real-valued part and the indicator function of the domain. Thus a subdifferen-
tial sum rule has been ultimately required, and this is the content of the celebrated
Moreau-Rockafellar theorem, which is the fundamental result of the subdifferential
calculus for nondifferentiable convex functions.
Convex analysis is not the only area of mathematics where nondifferentiable
functions naturally appear and play a crucial role in the theory and applications.

The theory of generalized functions by Sergei Sobolev (1908–1989) and the theory
of distributions by Laurent Schwartz (1915–2002) develop and apply appropriate
notions of generalized derivatives. However, those notions have nothing to do with
what is needed for optimization. Indeed, the generalized derivatives in the sense of
Sobolev and Schwartz concern equivalence classes of functions and are defined up to
sets of measure zero. On the other hand, the main interest in convex optimization is
drawn to individual points, where the minimum value of a function is attained. For
instance, the function ϕ(x) = |x| is nondifferentiable at only one point 0 ∈ R, but
this is the most important point of ϕ realizing the minimum value of this function.
The concept of the subdifferential as the collection of subgradients and thus of
set-valued subgradient mappings has been a breakthrough idea in mathematics that
was not accepted right from the beginning (as Rockafellar told the first author),
but then has been realized as a powerful machinery to investigate and solve numerous
problems appearing in optimization and many other areas of applied mathematics.
Of course, it has become possible after developing an adequate subdifferential cal-
culus and subgradient computations. Note that the set-valuedness of subgradient
mappings is an ultimate indication of the function nonsmoothness and geometri-
cally corresponds to multiple normals to epigraphs at the reference points, i.e., to
multiple supporting hyperplanes to convex sets at kink points of the boundary.
In most publications on convex analysis, as well as those concerning more general
forms of nonsmooth and variational analysis, major properties of normal cones and
particularly calculus rules for them are derived as consequences of the corresponding
functional results of subdifferential calculus. We implement here the opposite strat-
egy following the dual-space geometric approach to variational analysis initiated by
Mordukhovich and then developed by many researchers in finite-dimensional and
Banach spaces; see the books [226, 228, 229] with the references and commentaries
therein. The underlying ingredients of this approach are the notion of set extremality
for systems of closed sets at their common points introduced by Kruger and Mor-
dukhovich in [191] and its dual descriptions via the extremal principles; see [228, 229]
for more discussions. The extension of set extremality to arbitrary set systems (with
possibly empty intersections without closedness requirements) in topological vector
spaces is given in Definition 3.6, which is taken from our recent paper [239]. Neces-
sary and sufficient conditions for set extremality and its relationships with convex
separation are also taken from [239] on which the entire Section 3.1 (except Subsec-
tion 3.1.4) is based; see also [88, 89, 242] for further developments and applications.
Subsection 3.1.4 presents the classical intersection rule for normal cones to convex
sets in finite dimensions, which is generally different from the above intersection rule
in topological vector spaces. The proof of the finite-dimensional intersection rule of
Theorem 3.15 is taken from our publications [237, 238].
For some reason, generalized derivatives and coderivatives of single-valued and
set-valued mappings were not introduced and utilized in basic convex analysis. To
the best of our knowledge, the first notions of this type for convex-graph multifunc-
tions were defined by Pshenichnyi [295] (see also his book [296]) via the tangent
cone to the associated graphical set and considering then the dual object called the
locally conjugate mapping at the reference point of the graph. Similar constructions
were introduced by Aubin [11] under the names of the graphical derivative and cod-
ifferential, respectively; see also the books by Aubin and Ekeland [13] and Aubin
and Frankowska [14] for further studies and applications. The coderivative notion

from Definition 3.19 was introduced by Mordukhovich [224] via his (limiting) nor-
mal cone defined earlier in [223] without any appeal to tangential approximations of
the graph. In fact, this coderivative cannot be dual to a graphical derivative, since
Mordukhovich’s normal cone is generally nonconvex and hence cannot be generated
in duality by tangential approximations.
In contrast to the graphical derivatives and their dual constructions initiated
by Pshenichnyi and Aubin, the coderivative of Mordukhovich is robust and enjoys
full coderivative calculus for general multifunctions between finite-dimensional and
Banach (mainly Asplund) spaces under unrestrictive qualification conditions; see
the aforementioned books [226, 228, 229] and the book by Rockafellar and Wets
[317] in finite dimensions with the references and discussions therein. The reader is
also referred to the developments of Borwein and Zhu [54], Ioffe [171], Jourani and
Thibault [177], and Penot [288] on calculus rules for coderivatives defined by scheme
(3.27) via other normal cones in suitable Banach space frameworks.
The proofs of the coderivative calculus rules given in Section 3.2 follow the
pattern of the geometric dual-space approach from [226, 228, 229] but lead us to
essentially stronger results for coderivatives of convex multifunctions in comparison
with those obtained for general nonconvex mappings. First of all, we cover arbi-
trary topological vector spaces without any closedness assumptions on the involved
multifunctions, and—most importantly—the obtained calculus rules and the corre-
sponding qualification conditions for convex multifunctions significantly strengthen
known nonconvex rules and cannot be derived from the latter by specifying them
to convex-graph mappings even in finite dimensions. The topological vector space
material of Section 3.2 is based on our paper with Rector and Tran [242], while
the finite-dimensional Subsection 3.2.3 follows our previous publication [238]. The
recent papers [88, 89] provide extensions of the coderivative calculus for convex mul-
tifunctions between general topological vector spaces with qualification conditions
formulated in terms of cores.
Subdifferential theory for convex functions is one of the best understood and most
complete parts of convex analysis with a profound influence on developing general-
ized differentiation of nonconvex functions in different settings of variational analysis
and its applications. We have discussed above the origin of convex subdifferentia-
tion, which is now available for extended-real-valued functions in finite-dimensional
and infinite-dimensional frameworks. Various aspects of convex subdifferentiation
can be found in the books by Bauschke and Combettes [34], Borwein and Lewis
[48], Ekeland and Temam [122], Hiriart-Urruty and Lemaréchal [164, 165], Ioffe and
Tikhomirov [174], Kusraev and Kutateladze [197], Mordukhovich and Nam [237],
Phelps [290], Rockafellar [306], and Zălinescu [361] among other publications.
It seems, however, that the notion of the singular/horizon subdifferential from
Definition 3.31 did not appear in the aforementioned publications on convex anal-
ysis, although some ideas on behavior of convex sets and functions at infinity had
been explored in the study of convexity via horizon cones and functions; see, e.g.,
Rockafellar and Wets [317, Chapter 3]. Probably the main reason for missing this
construction in basic convex analysis was the intrinsic Lipschitz continuity of a
convex function on the interior of its domain, where the singular subdifferential is
trivial, which is not the case for the domain boundary; see Theorems 3.34 and 3.37.
Singular subgradients of extended-real-valued functions (both convex and non-
convex) naturally appear while considering normal cones to epigraphs that are

decomposed into nonhorizontal and horizontal normals; the latter generate the
singular subgradients. To the best of our knowledge, this has been first explored
in the early work by Kruger and Mordukhovich [190] and Mordukhovich [224]; see
also the books [226, 228, 229] for more details. The constructions of the subdiffer-
ential and singular subdifferential can be unified by using the coderivative of the
graphical multifunction as in (3.43), which allows us to conduct their parallel stud-
ies. In another way, singular subgradients were defined in finite-dimensional spaces
by Rockafellar [313, 316] as “singular proximal limiting subgradients” while being
equivalent to the pattern of (3.41) in finite dimensions.
Subsection 3.3.1 contains standard material of convex analysis, except the results
involving singular subgradients. Subsections 3.3.2–3.3.4 present basic rules of subd-
ifferential calculus in topological vector spaces and finite-dimensional spaces, which
can be found, e.g., in the books by Rockafellar [306] in finite dimensions and by
Ioffe and Tikhomirov [174] in LCTV spaces. Note that in this chapter the local
convexity of the spaces in question is not needed in our developments. The underly-
ing results of subdifferential calculus are the subdifferential sum rules whose origin
has been discussed above in this commentary section. The proofs of the results
presented here are based on the geometric approach from variational analysis by
reducing the subdifferential sum rules to the corresponding results for the normal
cone to set intersections. This is the pattern developed in [226, 228, 229] in gen-
eral nonconvex settings with significant specifications in the case of convexity by
following our publications [237, 238] in finite-dimensional spaces and [242] in the
LCTV setting. The given results and their singular subdifferential counterparts can
be deduced from the corresponding rules of the coderivative calculus. Note that the
singular subdifferential qualification condition (3.109) was introduced independently
by Mordukhovich [225] and Rockafellar [316] for different subdifferentials in general
variational analysis and was used in our book [237] for deriving both subdifferential
sum rules for convex functions in Exercise 3.105. The recent papers [88, 89] contain
geometric derivations of subdifferential calculus rules in general vector spaces with-
out topology by using qualification conditions formulated via algebraic interiors of
sets instead of topological ones.
Subsection 3.3.5 deals with distance functions, which are intrinsically nondifferentiable and play a highly important role in various aspects of convex and variational analysis as well as in their numerous applications; see, e.g., the books [34, 48, 54, 76, 77, 171, 226, 228, 229, 237, 317] and the references therein.
general properties of the distance functions and projection operators presented in
this subsection are well known. Theorem 3.76 on subdifferentiation of infimal con-
volutions goes back to Moreau [265, 266]. Subdifferential properties of the basic dis-
tance function (3.72) associated with closed sets in finite-dimensional and Banach
spaces have been largely investigated in the literature for major subdifferential
constructions by using variational principles in infinite dimensions. The reader can
consult the books by Mordukhovich [228, 229] with various developments, detailed
commentaries, and references for both in-set and out-of-set cases; the latter one is
much more involved. Among many publications in these directions, we particularly
mention the papers by Bounkhel and Thibault [59], Jourani and Thibault [176],
Kruger [186, 188], and Mordukhovich and Nam [231, 233] for both in-set and out-of-set cases, as well as by Ioffe [169] and Thibault [333] for the in-set case in diverse
space settings. Note finally that our papers [231, 233] contain extended subdifferential evaluations for the generalized distance functions of type (3.110) with moving sets Θ(y) at both in-set points (x, y) ∈ gph(Θ) and out-of-set ones (x, y) ∉ gph(Θ).
Section 3.4 addresses polyhedral convexity, whose role, especially in the case of convex sets, has been well recognized in convex analysis. Definition 3.83 of polyhedral sets via systems of linear inequalities (3.84) is equivalent, in finite dimensions,
to the classical topological definition of convex polyhedra as finitely generated con-
vex sets due to the Minkowski-Weyl theorem; see [219, 352] and a comprehensive
account of finite-dimensional polyhedrality in the book of Rockafellar [306] with
further references. Note that the definition of polyhedrality (3.84) in finite dimensions
is equivalent to the “generalized polyhedrality” introduced and studied by Bonnans
and Shapiro [42] (see Exercise 3.118), but the equivalence fails in infinite-dimensional
spaces.
The fundamental separation theorem of Rockafellar, which is given in Theorem 3.84 with a somewhat different proof, is the key to developing many issues of convex polyhedrality in finite dimensions. Its infinite-dimensional extension to LCTV spaces is due to Kung Fu Ng and Wen Song [280], who in fact reduced the LCTV setting to finite dimensions by using the quotient topology. The proof of Theorem 3.86 is a certain elaboration of [280, Theorem 3.1].
The rest of Section 3.4 follows our recent paper with Cuong and Sandine [91],
where we develop the geometric approach of this book to derive calculus rules for normals to sets and then for coderivatives of set-valued mappings and subgradients of extended-real-valued functions. To furnish this approach in the polyhedral setting of
LCTV spaces requires using the results on generalized relative interiors presented in
Chapter 2. Observe, in particular, that the subdifferential chain rule of Corollary 3.93
not only removes the standard continuity assumption on the outer function f , but is
also free of any generalized relative interior construction. The obtained result goes
far beyond the one with the qualification condition

dom(f) ∩ sqri(AX) ≠ ∅

in terms of the so-called “strong quasi-relative interior” of AX (meaning the collection of y ∈ AX such that the conic hull of AX − y is a closed subspace), which was used in [280, Corollary 3.2] for establishing this subdifferential rule.
4 ENHANCED CALCULUS AND FENCHEL DUALITY

A large part of this chapter continues developing the generalized differential calculus started in the previous chapter, but from different perspectives. Namely,
we consider Fenchel conjugates and duality relationships, which are specifi-
cally related to convexity and play a fundamental role in convex analysis.
Our geometric approach in this chapter strongly relates to representations of
the support function to set intersections obtained by using the set extremality
and its characterizations derived in Chapter 3 in topological vector spaces and
finite-dimensional settings. In this way, we develop calculus rules for Fenchel
conjugates in these settings and also obtain enhanced generalized differential
calculus rules in Banach spaces by involving some other ideas. Special atten-
tion is paid to the classes of supremum and marginal/optimal value functions
that are highly important for applications.

4.1 Fenchel Conjugates


This section is mainly devoted to the study of Fenchel conjugates, which
are defined for arbitrary extended-real-valued functions on topological vector
spaces while having remarkable properties in the presence of convexity. In the
latter case, Fenchel conjugates are closely related to subgradients and duality.

4.1.1 Definitions, Examples, and Basic Properties

Definition 4.1 Given a function f : X → [−∞, ∞] (not necessarily convex) on a topological vector space X, the Fenchel conjugate f∗ : X∗ → [−∞, ∞] of f is defined as

f∗(x∗) := sup{⟨x∗, x⟩ − f(x) | x ∈ X},  x∗ ∈ X∗. (4.1)

In what follows we mainly consider the case where f : X → R, but occasionally involve functions with possibly −∞ values. If f : X → R, the following proposition shows that the Fenchel conjugate of a proper function f is always a convex function even if f is nonconvex.

© Springer Nature Switzerland AG 2022 255


B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9 4

Proposition 4.2 Let X be a topological vector space, and let f : X → R be an arbitrary proper function. Then its conjugate f∗ : X∗ → R is convex.

Proof. Fix x̄ ∈ dom(f) and get for any x∗ ∈ X∗ that

f∗(x∗) = sup{⟨x∗, x⟩ − f(x) | x ∈ X} ≥ ⟨x∗, x̄⟩ − f(x̄) > −∞.

Observe that ⟨x∗, x⟩ − f(x) = −∞ if x ∉ dom(f), and so

f∗(x∗) := sup{⟨x∗, x⟩ − f(x) | x ∈ X} = sup{⟨x∗, x⟩ − f(x) | x ∈ dom(f)}.

For each x ∈ dom(f) define the affine function ϕx : X∗ → R by

ϕx(x∗) := ⟨x∗, x⟩ − f(x) whenever x∗ ∈ X∗.

Then ϕx is a convex function for every x ∈ dom(f), and we have

f∗(x∗) = sup{ϕx(x∗) | x ∈ dom(f)} whenever x∗ ∈ X∗.

Thus f∗ is convex on X∗ as the supremum of a family of convex functions. □

The next proposition provides a useful property of Fenchel conjugates.

Proposition 4.3 Let X be a topological vector space, and let f, g : X → R be functions such that f(x) ≤ g(x) for all x ∈ X. Then we have the opposite inequality for the conjugates: f∗(x∗) ≥ g∗(x∗) whenever x∗ ∈ X∗.

Proof. For any fixed x∗ ∈ X∗ it follows from (4.1) that

⟨x∗, x⟩ − f(x) ≥ ⟨x∗, x⟩ − g(x),  x ∈ X.

Thus we immediately arrive at

f∗(x∗) = sup{⟨x∗, x⟩ − f(x) | x ∈ X} ≥ sup{⟨x∗, x⟩ − g(x) | x ∈ X} = g∗(x∗),

which verifies the claimed property. □

The following example illustrates the calculation of conjugate functions.


Example 4.4 (a) Let X be a topological vector space. Given v∗ ∈ X∗ and b ∈ R, consider the affine function

f(x) := ⟨v∗, x⟩ + b,  x ∈ X.

Then it can be seen directly from the definition that f∗(x∗) = −b if x∗ = v∗, and f∗(x∗) = ∞ otherwise.

(b) Let f(x) := e^x for x ∈ R. Then we have

f∗(v) = v ln(v) − v if v > 0,  f∗(v) = 0 if v = 0,  and f∗(v) = ∞ if v < 0.

(c) Given any p > 1, consider the power function

f(x) := x^p/p if x ≥ 0,  and f(x) := ∞ otherwise.

Then for any v ∈ R the conjugate of this function is

f∗(v) = sup{vx − x^p/p | x ≥ 0} = −inf{x^p/p − vx | x ≥ 0}.

It is clear furthermore that f∗(v) = 0 if v ≤ 0, since in this case vx − x^p/p ≤ 0 when x ≥ 0. Considering the case where v > 0, we see that the function ψv(x) := x^p/p − vx is convex and differentiable on (0, ∞) with ψv′(x) = x^(p−1) − v. Thus ψv′(x) = 0 if and only if x = v^(1/(p−1)), and so ψv attains its minimum at x = v^(1/(p−1)). Hence the conjugate function is calculated by the formula

f∗(v) = (1 − 1/p) v^(p/(p−1)),  v > 0.

Taking q with 1/q = 1 − 1/p, we express the conjugate function as

f∗(v) = 0 if v ≤ 0,  and f∗(v) = v^q/q otherwise.
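The closed-form conjugates above can be checked numerically by approximating the supremum in (4.1) over a finite grid. The following Python sketch is an illustration added here, not part of the original text; the grid bounds are an assumption chosen wide enough to contain the maximizer x = ln v of part (b):

```python
import math

def conjugate(f, v, xs):
    # f*(v) = sup{ v*x - f(x) : x in X }, approximated over the grid xs
    return max(v * x - f(x) for x in xs)

# grid on [-8, 8]; for f(x) = e^x the maximizer of v*x - e^x is x = ln(v)
xs = [i * 0.001 for i in range(-8000, 8001)]
for v in [0.5, 1.0, 2.0, 3.0]:
    closed_form = v * math.log(v) - v  # the formula of Example 4.4(b)
    assert abs(conjugate(math.exp, v, xs) - closed_form) < 1e-3
```

Only v > 0 is tested: for v = 0 the supremum equals 0 but is approached as x → −∞, and for v < 0 the value is +∞ (per the case formula above), neither of which a bounded grid can certify.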
The next proposition allows us to compute the Fenchel conjugate for a
large class of functions involving the norm functions on normed spaces.

Proposition 4.5 Let f : R → R be a function such that f(0) = 0 and f(t) ≥ 0 for all t ∈ R. Define ϕ(x) := f(‖x‖) for x ∈ X, where X is a normed space. Then ϕ∗(x∗) = f∗(‖x∗‖) for all x∗ ∈ X∗.

Proof. For any x∗ ∈ X∗, we have

ϕ∗(x∗) = sup{⟨x∗, x⟩ − f(‖x‖) | x ∈ X}
 ≤ sup{‖x∗‖ · ‖x‖ − f(‖x‖) | x ∈ X}
 ≤ sup{‖x∗‖t − f(t) | t ≥ 0}
 ≤ sup{‖x∗‖t − f(t) | t ∈ R} = f∗(‖x∗‖).

Observe that ‖x∗‖t − f(t) ≤ 0 = f(0) whenever t ≤ 0. Thus we have

f∗(‖x∗‖) = sup{‖x∗‖t − f(t) | t ∈ R}
 = sup{‖x∗‖t − f(t) | t ≥ 0}
 ≤ sup{ sup{⟨x∗, x⟩ | ‖x‖ = t} − f(t) | t ≥ 0}
 ≤ sup{⟨x∗, x⟩ − f(‖x‖) | t ≥ 0, x ∈ X, ‖x‖ = t}
 ≤ sup{⟨x∗, x⟩ − f(‖x‖) | x ∈ X} = ϕ∗(x∗),

which completes the proof of the proposition. □

The next corollary follows directly from Proposition 4.5 by using the func-
tion f defined in Example 4.4(c).

Corollary 4.6 Let X be a normed space. Given p > 1, define ϕ(x) := ‖x‖^p/p for x ∈ X. Then ϕ∗(x∗) = ‖x∗‖^q/q for x∗ ∈ X∗, where 1/p + 1/q = 1.
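In dimension one, where ‖x‖ = |x|, Corollary 4.6 can be verified by brute-force maximization over a grid. The exponents p = 3 and q = 3/2 below are an arbitrary choice for this added illustration, not taken from the text:

```python
p, q = 3.0, 1.5                      # conjugate exponents: 1/p + 1/q = 1
phi = lambda x: abs(x) ** p / p      # phi(x) = |x|^p / p
xs = [i * 0.001 for i in range(-5000, 5001)]   # grid on [-5, 5]

def phi_star(v):
    # numerical Fenchel conjugate of phi, supremum taken over the grid
    return max(v * x - phi(x) for x in xs)

for v in [-2.0, -0.5, 0.0, 1.0, 2.5]:
    assert abs(phi_star(v) - abs(v) ** q / q) < 1e-3
```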

The next proposition allows us to compute the Fenchel conjugate of a pos-


itively homogeneous convex function based on its subgradients at the origin.

Proposition 4.7 Let X be a topological vector space, and let f : X → R be a convex function that is positively homogeneous, i.e.,

f(tx) = tf(x) for all x ∈ X and t > 0.

Suppose that Ω := ∂f(0) ≠ ∅. Then we have

f∗(x∗) = δΩ(x∗), i.e., f∗(x∗) = 0 if x∗ ∈ Ω and f∗(x∗) = ∞ if x∗ ∉ Ω.

Proof. It follows from the given assumptions that 0 ∈ dom(f) with f(0) = 0. Fix any x∗ ∈ Ω = ∂f(0) and get

⟨x∗, x⟩ ≤ f(x) for all x ∈ X.

Thus ⟨x∗, x⟩ − f(x) ≤ 0 for all x ∈ X, and ⟨x∗, 0⟩ − f(0) = 0. Then we have

f∗(x∗) = sup{⟨x∗, x⟩ − f(x) | x ∈ X} = 0.

Consider the case where x∗ ∉ Ω = ∂f(0) and find x̃ ∈ dom(f) such that

⟨x∗, x̃⟩ > f(x̃).

In this case we have

f∗(x∗) ≥ sup{⟨x∗, tx̃⟩ − f(tx̃) | t > 0} = sup{t(⟨x∗, x̃⟩ − f(x̃)) | t > 0} = ∞.

This implies that f∗(x∗) = ∞ and completes the proof. □

Example 4.8 Let X be a normed space, and let f(x) := ‖x‖ for x ∈ X. Then f is convex and positively homogeneous with ∂f(0) = B∗, the closed unit ball of X∗. We get from Proposition 4.7 that f∗ = δB∗.
The following property is valid for arbitrary proper functions and is known
as the Fenchel-Young inequality.
Proposition 4.9 Let X be a topological vector space, and let f : X → R be a proper function on X. Then we have

⟨x∗, x⟩ ≤ f(x) + f∗(x∗) for all x ∈ X and x∗ ∈ X∗.

Proof. The conclusion is obvious if f(x) = ∞. If x ∈ dom(f), we get from (4.1) that f∗(x∗) ≥ ⟨x∗, x⟩ − f(x), which verifies the inequality. □
The next important result reveals a close relationship between subgradi-
ents and Fenchel conjugates of convex functions on topological vector spaces.
Theorem 4.10 Let X be a topological vector space, and let f : X → R be a convex function with x̄ ∈ dom(f). Then we have x∗ ∈ ∂f(x̄) if and only if

f(x̄) + f∗(x∗) = ⟨x∗, x̄⟩. (4.2)

Proof. Taking any x∗ ∈ ∂f(x̄) and using definition (3.39) tell us that

f(x̄) + ⟨x∗, x⟩ − f(x) ≤ ⟨x∗, x̄⟩ for all x ∈ X.

This readily implies the inequality

f(x̄) + f∗(x∗) = f(x̄) + sup{⟨x∗, x⟩ − f(x) | x ∈ X} ≤ ⟨x∗, x̄⟩.

Since the opposite inequality holds by Proposition 4.9, we arrive at (4.2).

Conversely, suppose that f(x̄) + f∗(x∗) = ⟨x∗, x̄⟩. Applying again Proposition 4.9, we get f∗(x∗) ≥ ⟨x∗, x⟩ − f(x) for every x ∈ X, and therefore

⟨x∗, x̄⟩ = f(x̄) + f∗(x∗) ≥ f(x̄) + ⟨x∗, x⟩ − f(x) whenever x ∈ X.

This verifies the subgradient inclusion x∗ ∈ ∂f(x̄). □
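For a concrete one-dimensional illustration (added here; f(x) = x²/2 is a standard example with f∗(v) = v²/2 and ∂f(x̄) = {x̄}), the Fenchel-Young inequality of Proposition 4.9 holds at every dual point, while equality (4.2) occurs precisely at the subgradient:

```python
f = lambda x: 0.5 * x * x        # f(x) = x^2/2, so f*(v) = v^2/2 and ∂f(x) = {x}
f_star = lambda v: 0.5 * v * v

x_bar = 1.7
# Fenchel-Young: <v, x_bar> <= f(x_bar) + f*(v) for every v ...
for v in [-2.0, 0.0, 0.3, 1.7, 4.0]:
    assert v * x_bar <= f(x_bar) + f_star(v) + 1e-12
# ... with equality exactly when v is the subgradient v = x_bar (Theorem 4.10)
assert abs(f(x_bar) + f_star(x_bar) - x_bar * x_bar) < 1e-12
```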
It is natural to introduce the second conjugate, or biconjugate, of a function
on a topological vector space as follows.
Definition 4.11 Let X be a topological vector space. Given a function f : X → [−∞, ∞], its second Fenchel conjugate f∗∗ : X → [−∞, ∞] is defined by

f∗∗(x) := sup{⟨x∗, x⟩ − f∗(x∗) | x∗ ∈ X∗},  x ∈ X. (4.3)
Note that we consider the second conjugate in (4.3) only on X, not on the
larger space X ∗∗ in the case where X is a normed space. Observe that in the
case where X is an LCTV space and X ∗ is equipped with the weak∗ topology,
we have f ∗∗ = (f ∗ )∗ due to the duality relationship in Theorem 2.67.
The next proposition holds for arbitrary functions on general topological
vector spaces.

Proposition 4.12 Let X be a topological vector space, and let f : X → R be a function on X. Then we have

f∗∗(x) ≤ f(x) for all x ∈ X. (4.4)

Proof. Fix any x ∈ X and employ Proposition 4.9 to get

⟨x∗, x⟩ − f∗(x∗) ≤ f(x) for all x∗ ∈ X∗,

which holds even if f is an improper function. It follows therefore that

f∗∗(x) = sup{⟨x∗, x⟩ − f∗(x∗) | x∗ ∈ X∗} ≤ f(x),  x ∈ X,

which verifies the claimed inequality (4.4). □
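Inequality (4.4) can be observed numerically together with its possible strictness for nonconvex functions, since the biconjugate is the largest convex minorant of f. The double-well function below is a hypothetical example added for illustration; both suprema are approximated on grids, so the comparisons carry a small tolerance:

```python
xs = [i * 0.01 for i in range(-400, 401)]       # grid on [-4, 4]
vs = xs
f = lambda x: min((x - 1.0) ** 2, (x + 1.0) ** 2)   # nonconvex "double well"

# f*(v) on the grid, then f**(x) by conjugating once more
f_star = {v: max(v * x - f(x) for x in xs) for v in vs}
f_bistar = lambda x: max(v * x - f_star[v] for v in vs)

# f** <= f everywhere (Proposition 4.12), up to grid error ...
for x in [-1.5, -1.0, 0.0, 0.5, 1.2]:
    assert f_bistar(x) <= f(x) + 1e-4
# ... and the inequality is strict where f fails to be convex:
# f(0) = 1 while f**(0) = 0
assert f_bistar(0.0) < f(0.0) - 0.5
```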
Let us start investigating conditions to ensure that the biconjugate f ∗∗ of
a convex function f agrees with the function itself.

Proposition 4.13 Let X be a topological vector space. Given a convex function f : X → R with x̄ ∈ dom(f), suppose that ∂f(x̄) ≠ ∅. Then we have the equality f∗∗(x̄) = f(x̄).

Proof. Taking into account Proposition 4.12, it suffices to verify the opposite inequality therein. Fix x∗ ∈ ∂f(x̄) and get ⟨x∗, x̄⟩ = f(x̄) + f∗(x∗) by Theorem 4.10. This readily implies that

f(x̄) = ⟨x∗, x̄⟩ − f∗(x∗) ≤ sup{⟨x∗, x̄⟩ − f∗(x∗) | x∗ ∈ X∗} = f∗∗(x̄),

which therefore verifies the claimed assertion. □
The next statement immediately follows from Proposition 4.13 due to the
basic subdifferential theory of convex analysis.

Corollary 4.14 Let X be a topological vector space, and let f : X → R be a convex function that is continuous at some x̄ ∈ X. Then we have the equality f∗∗(x) = f(x) for all x ∈ int(dom(f)).

Proof. Since f is continuous at x̄, Theorem 3.39(a) tells us that ∂f(x) ≠ ∅ for all x ∈ int(dom(f)). It remains to apply Proposition 4.13. □

Now we are ready to derive a fundamental result of Fenchel conjugate theory that, in particular, significantly extends the statement of Corollary 4.14.

Theorem 4.15 Let f : X → R be a proper function on an LCTV space X, and let A be the collection of all the affine functions of the form ϕ(x) = ⟨x∗, x⟩ + b for x ∈ X with x∗ ∈ X∗ and b ∈ R. Denoting

A(f) := {ϕ ∈ A | ϕ(x) ≤ f(x) for all x ∈ X},

we have that A(f) ≠ ∅ if f is l.s.c. and convex. Furthermore, the following properties are equivalent:

(a) f is l.s.c. and convex.
(b) f(x) = sup{ϕ(x) | ϕ ∈ A(f)} for all x ∈ X.
(c) f∗∗(x) = f(x) for all x ∈ X.

Proof. First we show that A(f) ≠ ∅ provided that f is l.s.c. and convex. Fix any x0 ∈ dom(f) and choose λ0 < f(x0). Then (x0, λ0) ∉ epi(f), where the epigraph is a nonempty, closed, and convex subset of X × R. The convex separation result from Theorem 2.61 ensures the existence of a pair (x∗, γ) ∈ X∗ × R and a positive number ε such that

⟨x∗, x⟩ + γλ < ⟨x∗, x0⟩ + γλ0 − ε for all (x, λ) ∈ epi(f). (4.5)

Since (x0, f(x0) + α) ∈ epi(f) if α ≥ 0, we get

γ(f(x0) + α) < γλ0 − ε whenever α ≥ 0.

This yields γ < 0 since otherwise we let α → ∞ and arrive at a contradiction. Taking any x ∈ dom(f), it follows from (4.5) as (x, f(x)) ∈ epi(f) that

⟨x∗, x⟩ + γf(x) < ⟨x∗, x0⟩ + γλ0 − ε for all x ∈ dom(f).

This allows us to conclude that

f(x) > ⟨x∗/γ, x0 − x⟩ + λ0 − ε/γ if x ∈ dom(f).

Defining now ϕ(x) := ⟨x∗/γ, x0 − x⟩ + λ0 − ε/γ, we get ϕ ∈ A(f) and thus verify the claimed nonemptiness A(f) ≠ ∅.
Let us further prove that (a)=⇒(b), meaning that the properties in (a) ensure that for any λ0 < f(x0) there exists ϕ ∈ A(f) with λ0 < ϕ(x0). Since (x0, λ0) ∉ epi(f), we apply again the aforementioned convex separation theorem to obtain (4.5). In the case where x0 ∈ dom(f), it is proved above that ϕ ∈ A(f). Moreover, we have ϕ(x0) = λ0 − ε/γ > λ0 since γ < 0.

Consider now the case where x0 ∉ dom(f). It follows from (4.5), by taking any x ∈ dom(f) and letting λ → ∞, that γ ≤ 0. If γ < 0, the same arguments as above verify (b). Hence we only need to consider the case where γ = 0. Employing (4.5) in this case tells us that

⟨x∗, x − x0⟩ + ε < 0 whenever x ∈ dom(f).

Since A(f) ≠ ∅, choose ϕ0 ∈ A(f) and define

ϕk(x) := ϕ0(x) + k(⟨x∗, x − x0⟩ + ε) for k ∈ N.

We obviously have ϕk ∈ A(f) and ϕk(x0) = ϕ0(x0) + kε > λ0 for all large k, which justifies the representation in (b).
To verify now implication (b)=⇒(c), pick any ϕ ∈ A(f). Since ϕ(x) ≤ f(x) for all x ∈ X, we have ϕ∗∗(x) ≤ f∗∗(x) on X by Proposition 4.3 and the second conjugate definition. Furthermore, applying Proposition 4.13 ensures that ϕ(x) = ϕ∗∗(x) ≤ f∗∗(x) therein. It follows therefore that

f(x) = sup{ϕ(x) | ϕ ∈ A(f)} ≤ f∗∗(x) for every x ∈ X.

The reverse inequality f∗∗(x) ≤ f(x) holds by Proposition 4.12, and hence it justifies that f∗∗ = f.
To verify the last implication (c)=⇒(a), we need to show that the epigraph of f is closed and convex in X × R. This follows from the relationships

epi(f) = {(x, λ) ∈ X × R | λ ≥ f(x) = f∗∗(x)}
 = {(x, λ) ∈ X × R | λ ≥ sup{⟨x∗, x⟩ − f∗(x∗) | x∗ ∈ dom(f∗)}}
 = ⋂_{x∗ ∈ X∗} {(x, λ) ∈ X × R | λ ≥ ⟨x∗, x⟩ − f∗(x∗)},

which hold by (c) and thus complete the proof of the theorem. □

Theorem 4.15 easily implies that the conjugate of any proper, convex, and lower semicontinuous function is proper.

Corollary 4.16 Let X be an LCTV space, and let f : X → R be a proper, convex, and l.s.c. function. Then we have dom(f∗) ≠ ∅.

Proof. Theorem 4.15 tells us that there are x∗ ∈ X∗ and b ∈ R such that

⟨x∗, x⟩ + b ≤ f(x) whenever x ∈ X.

It yields ⟨x∗, x⟩ − f(x) ≤ −b for all x ∈ X, and hence f∗(x∗) ≤ −b < ∞. Thus we verify the inclusion x∗ ∈ dom(f∗). □
Let X be an LCTV space, and let its dual space X∗ be equipped with the weak∗ topology. For a function f : X∗ → R with u∗ ∈ dom(f), subgradients of f at u∗ are given by

∂f(u∗) := {x ∈ X | ⟨x∗ − u∗, x⟩ ≤ f(x∗) − f(u∗) for all x∗ ∈ X∗}.

The last assertion of this subsection establishes close relationships between subgradients of a convex function and its conjugate.

Theorem 4.17 Let f : X → R be a proper convex function defined on an LCTV space X, and let x̄ ∈ dom(f). If u∗ ∈ ∂f(x̄), then we have u∗ ∈ dom(f∗) and x̄ ∈ ∂f∗(u∗). Furthermore, assuming in addition that f is l.s.c. gives us the equivalence:

u∗ ∈ ∂f(x̄) if and only if x̄ ∈ ∂f∗(u∗).

Proof. Taking u∗ ∈ ∂f(x̄) implies by Theorem 4.10 that

⟨u∗, x̄⟩ = f(x̄) + f∗(u∗). (4.6)

This ensures that f∗(u∗) = ⟨u∗, x̄⟩ − f(x̄) < ∞, and so u∗ ∈ dom(f∗). Applying now Proposition 4.9 tells us that

⟨x∗, x̄⟩ ≤ f(x̄) + f∗(x∗) for all x∗ ∈ X∗. (4.7)

Unifying now (4.6) and (4.7), we arrive at the inequality

⟨x∗ − u∗, x̄⟩ ≤ f∗(x∗) − f∗(u∗) for all x∗ ∈ X∗,

which verifies the claimed inclusion x̄ ∈ ∂f∗(u∗).

Suppose finally that f is l.s.c. and that x̄ ∈ ∂f∗(u∗). Then we have f∗∗ = f by Theorem 4.15, and thus u∗ ∈ ∂f∗∗(x̄) = ∂f(x̄). □

4.1.2 Support Functions

Here we start investigating the support function associated with a given subset
of a topological vector space. This extended-real-valued and always convex
function plays a highly important role in the subsequent material. The main
result of this subsection gives us a precise formula for representing the support
function for the intersection of two convex sets in topological vector spaces via
the infimal convolution under certain qualification conditions with a further
improvement in finite dimensions. The proof is based on the convex extremal
principle in topological vector spaces derived in Subsection 3.1.2.

Definition 4.18 Let X be a topological vector space. Given a nonempty subset Ω ⊂ X, the support function σΩ : X∗ → R of Ω is defined by

σΩ(x∗) := sup{⟨x∗, x⟩ | x ∈ Ω},  x∗ ∈ X∗. (4.8)

It follows directly from the definition that σΩ is always convex on X∗, regardless of the convexity of Ω, and that we have the conjugacy relationship σΩ(x∗) = (δΩ)∗(x∗) for x∗ ∈ X∗ with the indicator function δΩ of Ω.

Besides these observations, let us formulate some other elementary properties of the support function (4.8) in the general setting.

Proposition 4.19 The following geometric properties hold in arbitrary topological vector spaces:

(a) For any nonempty subset Ω ⊂ X we have that σΩ : X∗ → R is proper, sublinear, and l.s.c. with respect to the weak∗ topology of X∗.
(b) For any nonempty subset Ω ⊂ X we have that σΩ = σcl(Ω) = σco(Ω).
(c) For any nonempty subsets Ω1, Ω2 ⊂ X we have the representations

σΩ1+Ω2 = σΩ1 + σΩ2 and σΩ1∪Ω2 = max{σΩ1, σΩ2}.

Proof. To verify the properties listed in (a), observe first that σΩ(0) = 0, and hence σΩ is proper. Furthermore, it follows from (4.8) that

epi(σΩ) = {(x∗, λ) ∈ X∗ × R | λ ≥ σΩ(x∗)}
 = {(x∗, λ) ∈ X∗ × R | λ ≥ ⟨x∗, x⟩ for all x ∈ Ω}
 = ⋂_{x ∈ Ω} {(x∗, λ) | λ ≥ ⟨x∗, x⟩}.

This representation clearly yields the positive homogeneity and convexity (and hence sublinearity) as well as the weak∗ lower semicontinuity of σΩ.

To prove (b), we have that σΩ(x∗) ≤ σcl(Ω)(x∗) for all x∗ ∈ X∗ due to Ω ⊂ cl(Ω). Taking now x ∈ cl(Ω), find a net {xα} ⊂ Ω that converges to x. For any x∗ ∈ X∗ we obviously get the relationships

⟨x∗, x⟩ = lim⟨x∗, xα⟩ ≤ σΩ(x∗),

which implies that σcl(Ω)(x∗) ≤ σΩ(x∗). Furthermore, σΩ ≤ σco(Ω) since Ω ⊂ co(Ω). To verify the reverse inequality, fix any x ∈ co(Ω) and then find λi ≥ 0 and xi ∈ Ω for i = 1, ..., m for some m ∈ N with λ1 + · · · + λm = 1 such that x = λ1x1 + · · · + λmxm. It shows that

⟨x∗, x⟩ = λ1⟨x∗, x1⟩ + · · · + λm⟨x∗, xm⟩ ≤ (λ1 + · · · + λm)σΩ(x∗) = σΩ(x∗)

for x∗ ∈ X∗. The latter yields σco(Ω)(x∗) ≤ σΩ(x∗) and verifies (b).

It remains to check (c). To proceed with the first equality therein, we have

σΩ1+Ω2(x∗) = sup{⟨x∗, x1 + x2⟩ | x1 ∈ Ω1, x2 ∈ Ω2}
 = sup{⟨x∗, x1⟩ | x1 ∈ Ω1} + sup{⟨x∗, x2⟩ | x2 ∈ Ω2}
 = σΩ1(x∗) + σΩ2(x∗) for all x∗ ∈ X∗.

The second equality is verified similarly. □
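The algebraic rules in part (c) are easy to observe numerically for intervals of the real line, whose support functions are determined by their endpoints. The intervals below are an arbitrary choice for this added illustration:

```python
def sigma(points, v):
    # support function of the convex hull of a finite point set, at v
    return max(v * x for x in points)

A = [-1.0, 2.0]                 # endpoints of the interval [-1, 2]
B = [0.5, 3.0]                  # endpoints of the interval [0.5, 3]
AB_sum = [a + b for a in A for b in B]   # endpoint sums generate A + B

for v in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    # sigma_{A+B} = sigma_A + sigma_B
    assert sigma(AB_sum, v) == sigma(A, v) + sigma(B, v)
    # sigma_{A ∪ B} = max(sigma_A, sigma_B); the list concatenation A + B
    # below collects the endpoints of both intervals (the union's generators)
    assert sigma(A + B, v) == max(sigma(A, v), sigma(B, v))
```

All endpoints and test points are dyadic rationals, so the equalities hold exactly in floating point.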
Example 4.20 Let X be an LCTV space. Since (δΩ)∗ = σΩ, we have (δΩ)∗∗ = δcl(co(Ω)) for any nonempty subset Ω ⊂ X. This yields (δΩ)∗∗ = δΩ provided that Ω is a nonempty, closed, and convex set, which is consistent with the general result of Theorem 4.15.
Lemma 4.21 Let X be an LCTV space, and let Ω be a nonempty, closed, and convex subset of X. Then we have the formula

∂σΩ(0) = Ω. (4.9)

Proof. Fix any x ∈ ∂σΩ(0) and get by the definition that

⟨x∗ − 0, x⟩ = ⟨x∗, x⟩ ≤ σΩ(x∗) = σΩ(x∗) − σΩ(0) for all x∗ ∈ X∗.

Suppose on the contrary that x ∉ Ω. The strict separation theorem ensures that there exists x∗ ∈ X∗ such that

sup{⟨x∗, w⟩ | w ∈ Ω} < ⟨x∗, x⟩,

which gives us a contradiction. Thus we have the inclusion “⊂” in (4.9). The proof of the opposite inclusion is straightforward. □

The following proposition calculates the Fenchel conjugate for infimal convolutions of support functions to convex sets in LCTV spaces.

Proposition 4.22 Let the sets Ω1 and Ω2 be nonempty, closed, and convex in an LCTV space X with Ω1 ∩ Ω2 ≠ ∅. Then we have the representation

(σΩ1 □ σΩ2)∗(x) = δΩ1∩Ω2(x) for all x ∈ X. (4.10)

Proof. Define a convex and positively homogeneous function on X∗ by

f(x∗) := (σΩ1 □ σΩ2)(x∗),  x∗ ∈ X∗.

It follows from Proposition 3.76 and Lemma 4.21 that

∂f(0) = ∂σΩ1(0) ∩ ∂σΩ2(0) = Ω1 ∩ Ω2.

Furthermore, it follows from Proposition 4.7 that

f∗(x) = δ∂f(0)(x) whenever x ∈ X,

which readily verifies the infimal convolution representation in (4.10). □

The next theorem presents a major geometric result of convex analysis


on representing the support function for convex set intersections via the infi-
mal convolution of the support functions applied to each component. We
derive this result by using the convex extremal principle under four mutually
independent assumptions. The first one imposes the difference interiority con-
dition (3.106) and the boundedness of one of the sets in topological vector
spaces. The second assertion, which is established in the same setting, does
not require any set boundedness while imposing a more restrictive qualifica-
tion condition. Assuming that one of the sets is polyhedral, we present in
the third assertion a refined intersection rule for support functions in LCTV
spaces under a quasi-relative interior requirement imposed on the other set.
The fourth assertion is proved under a relative interior qualification condition
in finite dimensions. Note that in Subsection 4.2.1 we obtain the intersection
rule for support functions under yet another qualification condition for closed
convex subsets of Banach spaces by using a different variational technique.
Theorem 4.23 Let Ω1, Ω2 be nonempty and convex subsets of a topological vector space X, and let one of the following assumptions be satisfied:

(a) Either int(Ω1) ∩ Ω2 ≠ ∅, or int(Ω2) ∩ Ω1 ≠ ∅.
(b) The difference interiority condition (3.106) holds, and Ω2 is bounded.
(c) X is an LCTV space, the quasi-relative interior qualification condition

Ω1 ∩ qri(Ω2) ≠ ∅ (4.11)

is satisfied, and the set Ω1 is polyhedral.
(d) X = Rⁿ and ri(Ω1) ∩ ri(Ω2) ≠ ∅.

Then the support function (4.8) of the intersection Ω1 ∩ Ω2 is represented as

σΩ1∩Ω2(x∗) = (σΩ1 □ σΩ2)(x∗) for all x∗ ∈ X∗. (4.12)

Furthermore, for any x∗ ∈ dom(σΩ1∩Ω2) there exist dual elements x∗1, x∗2 ∈ X∗ such that x∗ = x∗1 + x∗2 and

σΩ1∩Ω2(x∗) = σΩ1(x∗1) + σΩ2(x∗2). (4.13)

Proof. To verify the inequality “≤” in (4.12), fix x∗ ∈ X∗ and pick x∗1, x∗2 ∈ X∗ with x∗ = x∗1 + x∗2. Then we have

⟨x∗, x⟩ = ⟨x∗1, x⟩ + ⟨x∗2, x⟩ ≤ σΩ1(x∗1) + σΩ2(x∗2) for every x ∈ Ω1 ∩ Ω2.

Taking the infimum on the right-hand side above with respect to all such elements x∗1, x∗2 implies that

⟨x∗, x⟩ ≤ (σΩ1 □ σΩ2)(x∗) whenever x ∈ Ω1 ∩ Ω2,

which justifies the inequality “≤” in (4.12) for arbitrary sets Ω1 and Ω2 without imposing the assumptions in (a)-(d).
Now we prove the inequality “≥” in (4.12) considering first case (b). Fix any x∗ ∈ dom(σΩ1∩Ω2) and denote α := σΩ1∩Ω2(x∗) ∈ R, for which

⟨x∗, x⟩ ≤ α whenever x ∈ Ω1 ∩ Ω2,

and define two nonempty convex subsets of X × R by

Θ1 := Ω1 × [0, ∞) and Θ2 := {(x, λ) ∈ X × R | x ∈ Ω2, λ ≤ ⟨x∗, x⟩ − α}. (4.14)

It is easy to see from the constructions of Θ1 and Θ2 that

(Θ1 + (0, γ)) ∩ Θ2 = ∅ for any γ > 0,
and thus these sets form an extremal system in X × R. Then Theorem 3.7(a) tells us that 0 ∉ int(Θ1 − Θ2). To apply further Theorem 3.7(b) to these sets, we need to verify that the set difference Θ1 − Θ2 is solid. The assumption 0 ∈ int(Ω1 − Ω2) allows us to find a neighborhood U of the origin with U ⊂ Ω1 − Ω2. The continuity of ϕ(x) := −⟨x∗, x⟩ + α ensures the existence of a neighborhood W of the origin such that ϕ is bounded from above on W. Since Ω2 is bounded, we find t > 0 for which Ω2 ⊂ tW. Note that ϕ is also bounded from above on tW, so we can find λ̄ ∈ R with

λ̄ ≥ sup{−⟨x∗, x⟩ + α | x ∈ tW} ≥ sup{−⟨x∗, x⟩ + α | x ∈ Ω2}.
Let us check that U × (λ̄, ∞) ⊂ Θ1 − Θ2, and thus (3.5) holds. Indeed, taking any pair (x, λ) ∈ U × (λ̄, ∞) gives us x ∈ U ⊂ Ω1 − Ω2 and λ > λ̄. Hence we represent x = w1 − w2 for some w1 ∈ Ω1, w2 ∈ Ω2 and therefore obtain

(x, λ) = (w1, λ − λ̄) − (w2, −λ̄).

It follows from λ − λ̄ > 0 that (w1, λ − λ̄) ∈ Θ1. Then (3.13) and the choice of λ̄ tell us that (w2, −λ̄) ∈ Θ2, and thus int(Θ1 − Θ2) ≠ ∅. Applying Theorem 3.7(b) ensures the existence of (y∗, β) ∈ X∗ × R with (y∗, β) ≠ (0, 0) for which

⟨y∗, x⟩ + βλ1 ≤ ⟨y∗, y⟩ + βλ2 whenever (x, λ1) ∈ Θ1, (y, λ2) ∈ Θ2. (4.15)


The structure of Θ1 yields β ≤ 0. If β = 0, we deduce from (4.15) that
y ∗ , x ≤ y ∗ , y for all x ∈ Ω1 , y ∈ Ω2 .
This implies that y ∗ = 0 due to 0 ∈ int(Ω1 − Ω2 ), a contradiction. Take now
the pairs (x, 0) ∈ Θ1 and (y, x∗ , y − α) ∈ Θ2 in (4.15) and get
y ∗ , x ≤ y ∗ , y + β(x∗ , y − α) with β < 0,
which leads us to the estimate
   
α ≥ y ∗ /β + x∗ , y + − y ∗ /β, x for all x ∈ Ω1 , y ∈ Ω2 .
By putting x∗1 := y ∗ /β + x∗ and x∗2 := −y ∗ /β we arrive at the inequality
“≥” in (4.12) and thus get representation (4.13). This justifies therefore the
conclusions of the theorem in case (b).
To verify the results under the assumptions in (a), it suffices to examine only the first case therein. Considering the sets Θ1, Θ2 from (4.14), we see that int(Θ1) ≠ ∅, and so the difference set Θ1 − Θ2 is solid. Moreover, it follows from int(Ω1) ∩ Ω2 ≠ ∅ that 0 ∈ int(Ω1 − Ω2), which allows us to proceed similarly to the above proof in case (b).
In case (d) we use the unconditional equivalence between the extremality and
convex separation for the sets Θ1 , Θ2 in (4.14) discussed in Exercise 3.95 and
then apply to them the finite-dimensional separation result of Theorem 2.92.
The rest of the proof in this case follows the one given above in case (b).
Finally, in case (c) we proceed similarly to (d) by replacing the application
of Theorem 2.92 with the usage of the polyhedral proper separation characterization from Theorem 3.86. This completes the proof of the theorem. □
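Formula (4.12) can be illustrated numerically on the real line, where support functions of intervals are computable from endpoints and the infimal convolution can be approximated by minimizing over a grid of dual decompositions. The intervals in this added sketch are chosen arbitrarily:

```python
def sigma(pts, v):
    # support function of the convex hull of finitely many points
    return max(v * x for x in pts)

O1, O2 = [-1.0, 2.0], [0.0, 3.0]   # endpoints of [-1, 2] and [0, 3]
cap = [0.0, 2.0]                   # endpoints of the intersection [0, 2]
v1s = [i * 0.01 for i in range(-500, 501)]

def inf_conv(v):
    # (sigma_O1 □ sigma_O2)(v): infimum over v = v1 + v2, taken on the grid
    return min(sigma(O1, v1) + sigma(O2, v - v1) for v1 in v1s)

# the infimal convolution matches the support function of the intersection
for v in [-1.0, 0.0, 0.5, 2.0]:
    assert abs(inf_conv(v) - sigma(cap, v)) < 1e-9
```

For these test points the minimizing decomposition lies on the grid, so the agreement is essentially exact; in general a grid search only bounds the infimum from above.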


4.1.3 Conjugate Calculus

In this subsection, we discuss the behavior of the Fenchel conjugate under


various operations, i.e., conjugate calculus. Some conjugate calculus rules are
rather obvious, while the others are more involved. The main calculus results
here are the conjugate sum and chain rules that give us exact representa-
tions of the Fenchel conjugate for sums and compositions. To establish these
results, we develop a geometric approach based on the reduction to the inter-
section rule for support functions of sets and using eventually the convex
extremal principle, or some form of convex set separation. Similar to our pro-
cedure above, we concentrate in this subsection on the general topological
vector space setting with finite-dimensional specifications while postponing
until Subsection 4.2.2 the case of l.s.c. functions on Banach spaces. In the
case of topological vector spaces, refined results are derived under certain
polyhedrality assumptions by using the corresponding machinery developed in
Section 3.4.
Let us start with simple rules, which directly follow from the definition.
Proposition 4.24 Let f : X → R be an arbitrary function on a topological vector space X. Then we have the equalities:

(a) (λf)∗(x∗) = λf∗(x∗/λ) for any λ > 0.
(b) (f + c)∗(x∗) = f∗(x∗) − c for any c ∈ R.
(c) (fa)∗(x∗) = f∗(x∗) − ⟨x∗, a⟩, where fa(x) := f(x + a).

Proof. To verify (a), we get by definition that

(λf)∗(x∗) = sup{⟨x∗, x⟩ − λf(x) | x ∈ X} = λ sup{⟨x∗/λ, x⟩ − f(x) | x ∈ X} = λf∗(x∗/λ).

The proofs of (b) and (c) are also straightforward. □
The next proposition evaluates the conjugate of the infimal convolution in
topological vector spaces without any convexity assumptions.
Proposition 4.25 Let X be a topological vector space. For any proper functions f, g : X → R we have

(f □ g)∗(x∗) = f∗(x∗) + g∗(x∗) whenever x∗ ∈ X∗.

Proof. The properness of f and g implies that the sum f∗ + g∗ is well defined for all x∗ ∈ X∗. Then fixing any x, u ∈ X and x∗ ∈ X∗ gives us

(f □ g)∗(x∗) ≥ ⟨x∗, x⟩ − (f □ g)(x)
 = ⟨x∗, x⟩ − inf{f(x1) + g(x2) | x1 + x2 = x}
 ≥ ⟨x∗, u⟩ + ⟨x∗, x − u⟩ − f(u) − g(x − u)
 = (⟨x∗, u⟩ − f(u)) + (⟨x∗, x − u⟩ − g(x − u)).

Taking the supremum on the rightmost side with respect to x ∈ X first and then with respect to u ∈ X yields (f □ g)∗(x∗) ≥ f∗(x∗) + g∗(x∗). The opposite inequality therein can also be verified easily. □
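As an added numerical check of this rule, take two hypothetical quadratics; here f □ g works out to x²/3 in closed form, so modest grids suffice for both the inner infimum and the outer supremum:

```python
f = lambda x: 0.5 * x * x    # f*(v) = v^2/2
g = lambda x: x * x          # g*(v) = v^2/4
us = [i * 0.01 for i in range(-600, 601)]    # grid for the infimum over u
xs = [i * 0.05 for i in range(-120, 121)]    # grid for the outer supremum

# (f □ g)(x) = inf_u { f(u) + g(x - u) } on the grid (equals x^2/3 exactly)
ic = [min(f(u) + g(x - u) for u in us) for x in xs]

for v in [-1.5, 0.0, 1.0, 2.0]:
    lhs = max(v * x - icx for x, icx in zip(xs, ic))  # (f □ g)*(v)
    rhs = v * v / 2 + v * v / 4                       # f*(v) + g*(v)
    assert abs(lhs - rhs) < 1e-2
```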
The next observation makes a bridge between the conjugate of an arbi-
trary function and the support function for its epigraph. It is essential in the
implementation of our geometric approach to conjugate calculus.
Lemma 4.26 Let X be a topological vector space. For any proper function
f : X → R we have

f∗(x∗) = σepi(f)(x∗, −1) whenever x∗ ∈ X∗.

Proof. It follows from the definitions that

f∗(x∗) = sup{⟨x∗, x⟩ − f(x) | x ∈ dom(f)} = sup{⟨x∗, x⟩ − λ | (x, λ) ∈ epi(f)}
       = σepi(f)(x∗, −1),

which therefore verifies the claimed formula. □
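For a concrete one-dimensional check of Lemma 4.26, take f(x) = |x|, whose conjugate is the indicator of [−1, 1]. The sketch below (with an arbitrary grid, an illustrative assumption) evaluates σepi(f)(s, −1) = sup{sx − λ | λ ≥ f(x)}; since the supremum in λ is attained at λ = f(x), the computation reduces to sup{sx − f(x)}.

```python
import numpy as np

# sigma_epi(f)(s, -1) = sup { s*x - lam : lam >= f(x) }; the sup over lam
# is attained at lam = f(x), so it equals f*(s). Here f(x) = |x|, whose
# conjugate is 0 on [-1, 1] and +infinity outside.
xs = np.linspace(-100.0, 100.0, 20001)
f = lambda x: np.abs(x)

def support_epi(s):
    return np.max(s * xs - f(xs))

vals = [support_epi(s) for s in (0.5, -0.9, 0.0)]
print(vals)                 # ≈ [0, 0, 0] = f*(s) for |s| <= 1
print(support_epi(2.0))     # grows with the grid radius, reflecting f*(2) = +inf
```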

Here is the conjugate sum rule in topological vector space and finite-
dimensional settings.
Theorem 4.27 Let f, g : X → R be proper convex functions on a topological
vector space X, and let one of following conditions be satisfied:
(a) There exists a point x ∈ dom(f )∩dom(g) such that either f is continuous
at x, or g is continuous at this point.
(b) X is an LCTV space, and f is polyhedral under the fulfillment of the
qualification condition
dom(f) ∩ qri(dom(g)) ≠ ∅. (4.16)

(c) X = Rn and ri(dom(f)) ∩ ri(dom(g)) ≠ ∅.

Then we have the conjugate sum rule

(f + g)∗(x∗) = (f∗ □ g∗)(x∗) for all x∗ ∈ X∗. (4.17)

Moreover, the infimum in (f∗ □ g∗)(x∗) is attained, i.e., for any x∗ ∈ dom((f +
g)∗) there exist vectors x∗1, x∗2 ∈ X∗ for which

(f + g)∗(x∗) = f∗(x∗1) + g∗(x∗2), x∗ = x∗1 + x∗2. (4.18)
Proof. Fixing any x∗1 , x∗2 ∈ X ∗ with x∗1 + x∗2 = x∗ , we get
f∗(x∗1) + g∗(x∗2) = sup{⟨x∗1, x⟩ − f(x) | x ∈ X} + sup{⟨x∗2, x⟩ − g(x) | x ∈ X}
                 ≥ sup{⟨x∗1, x⟩ − f(x) + ⟨x∗2, x⟩ − g(x) | x ∈ X}
                 = sup{⟨x∗, x⟩ − (f + g)(x) | x ∈ X} = (f + g)∗(x∗).

Let us prove that (f∗ □ g∗)(x∗) ≤ (f + g)∗(x∗) under (a). We only need to
consider the case where (f + g)∗(x∗) < ∞. Define two convex sets by

Ω1 := {(x, λ1, λ2) ∈ X × R × R | λ1 ≥ f(x)} = epi(f) × R,
Ω2 := {(x, λ1, λ2) ∈ X × R × R | λ2 ≥ g(x)}.          (4.19)
Similar to Lemma 4.26 we get the representation
270 4 ENHANCED CALCULUS AND FENCHEL DUALITY

(f + g)∗(x∗) = σΩ1∩Ω2(x∗, −1, −1). (4.20)

Furthermore, it is easy to observe that our continuity assumptions ensure the
fulfillment of the assumptions in Theorem 4.23(a) for the support function
intersection rule. Applying this theorem to the right-hand side of (4.20) gives
us triples (x∗1, −α1, −α2) ∈ X∗ × R × R and (x∗2, −β1, −β2) ∈ X∗ × R × R such
that (x∗, −1, −1) = (x∗1, −α1, −α2) + (x∗2, −β1, −β2) and

(f + g)∗(x∗) = σΩ1∩Ω2(x∗, −1, −1) = σΩ1(x∗1, −α1, −α2) + σΩ2(x∗2, −β1, −β2).

If α2 ≠ 0, then σΩ1(x∗1, −α1, −α2) = ∞, which is not possible. Thus α2 = 0,
and similarly β1 = 0, so that α1 = β2 = 1. Employing now Lemma 4.26 and
taking into account the structures of the sets Ω1 and Ω2 in (4.19), we get

(f + g)∗(x∗) = σΩ1∩Ω2(x∗, −1, −1) = σΩ1(x∗1, −1, 0) + σΩ2(x∗2, 0, −1)
             = σepi(f)(x∗1, −1) + σepi(g)(x∗2, −1)
             = f∗(x∗1) + g∗(x∗2) ≥ (f∗ □ g∗)(x∗).

This justifies the sum rule (4.17) together with the last statement of the
theorem under the assumptions in (a). The verifications of (4.17) under the
assumptions in (b) and (c) are similar to the above arguments by applying
Theorem 4.23 in cases (c) and (d) therein, respectively. □
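The sum rule (4.17) and the attainment statement (4.18) can be observed numerically in the simplest smooth case. Below, f(x) = x² and g(x) = (x − 1)², so f∗(t) = t²/4 and g∗(t) = t²/4 + t in closed form; (f + g)∗(s) is approximated on a grid and compared with the infimal convolution of the conjugates, whose minimizer exhibits the decomposition in (4.18). All particular numbers are illustrative choices.

```python
import numpy as np

# Conjugate sum rule check: (f+g)*(s) vs. min_{s1} { f*(s1) + g*(s - s1) }
# for f(x) = x^2, g(x) = (x-1)^2, where f*(t) = t^2/4 and g*(t) = t^2/4 + t.
xs = np.linspace(-50.0, 50.0, 100001)
f = lambda x: x ** 2
g = lambda x: (x - 1.0) ** 2

s = 2.0
lhs = np.max(s * xs - (f(xs) + g(xs)))       # (f+g)*(s), grid approximation

fstar = lambda t: t ** 2 / 4.0
gstar = lambda t: t ** 2 / 4.0 + t
s1 = np.linspace(-20.0, 20.0, 40001)
vals = fstar(s1) + gstar(s - s1)
rhs = np.min(vals)                           # (f* □ g*)(s)
s1_opt = s1[np.argmin(vals)]                 # attained decomposition, cf. (4.18)
print(lhs, rhs, s1_opt)                      # ≈ 1.0, 1.0, 2.0
```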

Next we establish the major conjugate chain rule, the proof of which is also
based on the intersection results of Theorem 4.23 in both cases of topological
vector spaces and finite-dimensional spaces under different assumptions.
Theorem 4.28 Let A : X → Y be a linear continuous mapping between topo-
logical vector spaces, and let g : Y → R be a proper convex function. Suppose
that one of the following conditions is satisfied:
(a) The function g is finite and continuous at some point of the set AX.
(b) X is an LCTV space, and the function g is polyhedral with AX ∩
dom(g) ≠ ∅.
(c) X = Rn, Y = Rp, and AX ∩ ri(dom(g)) ≠ ∅.
Then we have the conjugate chain rule
(g ◦ A)∗(x∗) = inf{g∗(y∗) | y∗ ∈ (A∗)−1(x∗)}, x∗ ∈ X∗. (4.21)
Furthermore, for any x∗ ∈ dom(g ◦ A)∗ there exists y ∗ ∈ (A∗ )−1 (x∗ ) such that
(g ◦ A)∗ (x∗ ) = g ∗ (y ∗ ).
Proof. As above, we verify the results only in case (a), since the other two
cases can be considered similarly by using the corresponding versions of The-
orem 4.27. Picking y ∗ ∈ (A∗ )−1 (x∗ ) gives us by definition that
g∗(y∗) = sup{⟨y∗, y⟩ − g(y) | y ∈ Y}
       ≥ sup{⟨y∗, Ax⟩ − g(Ax) | x ∈ X}
       = sup{⟨A∗y∗, x⟩ − (g ◦ A)(x) | x ∈ X}
       = sup{⟨x∗, x⟩ − (g ◦ A)(x) | x ∈ X} = (g ◦ A)∗(x∗).

This implies the inequality

inf{g∗(y∗) | y∗ ∈ (A∗)−1(x∗)} ≥ (g ◦ A)∗(x∗).
Let us show that the opposite inequality is also satisfied. We can assume that
x∗ ∈ dom((g ◦ A)∗ ) and then define two convex sets
Ω1 := gph(A) × R and Ω2 := X × epi(g) ⊂ X × Y × R. (4.22)
It follows directly from the above constructions that
(g ◦ A)∗ (x∗ ) = σΩ1 ∩Ω2 (x∗ , 0, −1) < ∞.
Assuming the validity of the conditions in (a) and taking into account that
int(Ω2) = {(x, y, λ) ∈ X × Y × R | x ∈ X, y ∈ int(dom(g)), λ > g(y)},

we get that Ω1 ∩ int(Ω2) ≠ ∅ for the sets Ω1, Ω2 from (4.22). Then Theo-
rem 4.23(a) tells us that there exist triples (x∗1 , y1∗ , α1 ) and (x∗2 , y2∗ , α2 ) in the
space X ∗ × Y ∗ × R satisfying
(x∗ , 0, −1) = (x∗1 , y1∗ , α1 ) + (x∗2 , y2∗ , α2 ) and
σΩ1 ∩Ω2 (x∗ , 0, −1) = σΩ1 (x∗1 , y1∗ , α1 ) + σΩ2 (x∗2 , y2∗ , α2 ).
It follows from the structures of Ω1 , Ω2 in (4.22) that α1 = 0 and x∗2 = 0.
This gives us the representation
σΩ1 ∩Ω2 (x∗ , 0, −1) = σΩ1 (x∗ , −y2∗ , 0) + σΩ2 (0, y2∗ , −1)
for some y2∗ ∈ Y ∗ . Thus we arrive at the equalities
σΩ1∩Ω2(x∗, 0, −1) = sup{⟨x∗, x⟩ − ⟨y∗2, Ax⟩ | x ∈ X} + σepi(g)(y∗2, −1)
                  = sup{⟨x∗ − A∗y∗2, x⟩ | x ∈ X} + g∗(y∗2),

which allows us to conclude that x∗ = A∗y∗2 and therefore

σΩ1∩Ω2(x∗, 0, −1) = g∗(y∗2) ≥ inf{g∗(y∗) | y∗ ∈ (A∗)−1(x∗)}.
This justifies both statements of the theorem in case (a), and thus we are
done with the proof of the theorem. 
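A finite-dimensional instance of the chain rule (4.21): take X = R², Y = R, Ax = ⟨a, x⟩ with a = (1, 2), and g(y) = y², so that A∗y∗ = y∗a and g∗(t) = t²/4. For x∗ = A∗y∗ with y∗ = 1, formula (4.21) predicts (g ◦ A)∗(x∗) = g∗(1) = 1/4, which the grid maximization below reproduces; the specific a, g, and grid are illustrative assumptions.

```python
import numpy as np

# Chain rule check: (g o A)*(x*) = inf { g*(y*) : A* y* = x* } with
# A x = <a, x> for a = (1, 2) and g(y) = y^2, so A* y* = y* * a.
a = np.array([1.0, 2.0])
g = lambda y: y ** 2                      # g*(t) = t^2 / 4

u = np.linspace(-10.0, 10.0, 1001)        # grid with step 0.02
X1, X2 = np.meshgrid(u, u)

def conj_gA(s):
    # sup over the grid of <s, x> - g(<a, x>)
    return np.max(s[0] * X1 + s[1] * X2 - g(X1 + 2.0 * X2))

s = 1.0 * a                               # s = A* y* with y* = 1
val = conj_gA(s)
print(val)                                # ≈ g*(1) = 0.25
```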

Next we establish the conjugate maximum rule to calculate the conjugate


of the maximum of two convex functions f, g : X → R defined by
(f ∨ g)(x) := max{f(x), g(x)}, x ∈ X,

using the convention that 0 · f := δdom(f).

Theorem 4.29 Let X be a topological vector space. Given convex functions


f, g : X → R, suppose that the assumptions in either (a), or (b), or (c) of
Theorem 4.27 are satisfied. Then we have the conjugate maximum rule
(f ∨ g)∗(x∗) = inf_{λ∈[0,1]} (λf + (1 − λ)g)∗(x∗), x∗ ∈ X∗. (4.23)

If furthermore (f ∨ g)∗(x∗) ∈ R, then the infimum in (4.23) is realized.


Proof. Let us first check that the inequality “≤” always holds in (4.23) when-
ever λ ∈ [0, 1]. Indeed, it follows directly from the definitions that
(λf + (1 − λ)g)∗(x∗) = sup_{x∈X} {⟨x∗, x⟩ − λf(x) − (1 − λ)g(x)}
                     ≥ sup_{x∈X} {⟨x∗, x⟩ − λ(f ∨ g)(x) − (1 − λ)(f ∨ g)(x)}
                     = sup_{x∈X} {⟨x∗, x⟩ − (f ∨ g)(x)} = (f ∨ g)∗(x∗)

for all x∗ ∈ X ∗ . To verify the opposite inequality, observe that epi(f ∨ g) =


epi(f ) ∩ epi(g), and hence we deduce from Lemma 4.26 the relationships
(f ∨ g)∗ (x∗ ) = σepi(f ∨g) (x∗ , −1) = σΩ1 ∩Ω2 (x∗ , −1)
with Ω1 := epi(f ) and Ω2 := epi(g). It follows from Theorem 4.23 under the
corresponding assumptions that
(f ∨ g)∗(x∗) = (σΩ1 □ σΩ2)(x∗, −1), x∗ ∈ X∗.
Suppose now that (f ∨ g)∗ (x∗ ) ∈ R, i.e., x∗ ∈ dom(f ∨ g)∗ . Then Theo-
rem 4.23 gives us pairs (x∗1 , −λ1 ), (x∗2 , −λ2 ) ∈ X ∗ × R such that x∗1 + x∗2 = x∗ ,
λ1 + λ2 = 1, and
(f ∨ g)∗ (x∗ ) = σepi(f ∨g) (x∗ , −1) = σΩ1 ∩Ω2 (x∗ , −1)
= σΩ1 (x∗1 , −λ1 ) + σΩ2 (x∗2 , −λ2 ).
Note that if either λ1 < 0 or λ2 < 0, then either σΩ1 (x∗1 , −λ1 ) = ∞ or
σΩ2 (x∗2 , −λ2 ) = ∞, respectively. This is a contradiction, which tells us that
λ1 , λ2 ≥ 0. In the case where both λ1 and λ2 are positive, it follows that
(f ∨ g)∗(x∗) = λ1 σΩ1(x∗1/λ1, −1) + λ2 σΩ2(x∗2/λ2, −1) = λ1 f∗(x∗1/λ1) + λ2 g∗(x∗2/λ2)
             = (λ1 f)∗(x∗1) + (λ2 g)∗(x∗2) with x∗ ∈ dom((f ∨ g)∗).
Furthermore, we obviously have the estimate
(λ1 f)∗(x∗1) + (λ2 g)∗(x∗2) ≥ (λ1 f + (1 − λ1)g)∗(x∗) ≥ inf_{λ∈[0,1]} (λf + (1 − λ)g)∗(x∗).

If instead λ1 = 0, then λ2 = 1, and hence
σΩ1(x∗1, −λ1) + σΩ2(x∗2, −λ2) = σdom(f)(x∗1) + g∗(x∗2)
                              = (δdom(f))∗(x∗1) + g∗(x∗2) ≥ (δdom(f) + g)∗(x∗)
                              ≥ inf_{λ∈[0,1]} (λf + (1 − λ)g)∗(x∗),

where the last inequality follows from the convention 0 · f := δdom(f ) . Since
the latter obviously holds if (f ∨ g)∗ (x∗ ) = ∞, we complete the proof. 
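The maximum rule (4.23) can be tested on f(x) = x² and g(x) = (x − 2)², for which (f ∨ g)∗(0) = −min_x max(x², (x − 2)²) = −1, attained in (4.23) at λ = 1/2. The grid and the sample functions below are illustrative assumptions.

```python
import numpy as np

# Conjugate maximum rule check at s = 0 for f(x) = x^2, g(x) = (x-2)^2:
# (f v g)*(0) = -min_x max(f, g)(x) = -1, and the infimum over lambda in
# (4.23) is attained at lambda = 1/2.
xs = np.linspace(-30.0, 30.0, 30001)     # grid with step 0.002
f = lambda x: x ** 2
g = lambda x: (x - 2.0) ** 2

def conj(h_vals, s):
    return np.max(s * xs - h_vals)

s = 0.0
lhs = conj(np.maximum(f(xs), g(xs)), s)  # (f v g)*(0)

lams = np.linspace(0.0, 1.0, 101)
rhs = min(conj(lam * f(xs) + (1.0 - lam) * g(xs), s) for lam in lams)
print(lhs, rhs)  # both ≈ -1.0
```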

4.2 Enhanced Calculus in Banach Spaces


This section addresses deriving enhanced rules of both conjugate and gener-
alized differential calculi under less demanding assumptions on closed convex
sets, set-valued mappings with closed and convex graphs, and l.s.c. convex
functions in the setting of Banach spaces. The space completeness and the
closedness assumptions imposed allow us to employ variational arguments
and fundamental results of Banach space theory.

4.2.1 Support Functions of Set Intersections

We start with the intersection rule for support functions of convex sets,
which improves in the Banach space setting the previous one from Theo-
rem 4.23 obtained for arbitrary convex sets in general topological vector spaces
under interiority assumptions. Here we can deal with closed and convex subsets
of Banach spaces that may be nonsolid and arrive at the same result under a
less demanding qualification condition by using a different proof. The obtained
version of the support function intersection rule is employed for developing
enhanced rules of subdifferential and conjugate calculi in the next subsection.
First we establish the following two lemmas that are used below in the
proof of the main result.
Lemma 4.30 Let Ω1 and Ω2 be nonempty subsets of a Banach space X.
Suppose that cone(Ω1 − Ω2 ) = X. Then for any α, β ≥ 0 the set
K = Kα,β := {(x∗1, x∗2) ∈ X∗ × X∗ | σΩ1(x∗1) + σΩ2(x∗2) ≤ α, ‖x∗1 + x∗2‖ ≤ β}
is compact in the weak∗ topology of the dual product space X ∗ × X ∗ .
Proof. The closedness of the set K in the weak∗ topology of X ∗ × X ∗ is obvi-
ous. By the Alaoglu-Bourbaki theorem (Corollary 1.113), it remains to show
that this set is norm-bounded in X ∗ × X ∗ . Remembering the uniform bound-
edness principle, we need to verify that the collection of continuous linear func-
tionals from K is bounded pointwise. To proceed, take any (x1 , x2 ) ∈ X × X
and find by cone(Ω1 − Ω2 ) = X such λ ≥ 0, w1 ∈ Ω1 , and w2 ∈ Ω2 for which
x1 − x2 = λ(w1 − w2 ). Then we have
⟨x∗1, x1⟩ + ⟨x∗2, x2⟩ = λ⟨x∗1, w1⟩ + λ⟨x∗2, w2⟩ + ⟨x∗1 + x∗2, x2 − λw2⟩
  ≤ λ(σΩ1(x∗1) + σΩ2(x∗2)) + ‖x∗1 + x∗2‖ · ‖x2 − λw2‖ ≤ λα + β‖x2 − λw2‖.

Since this also holds for (−x1, −x2), we arrive at the claimed conclusion. □

Lemma 4.31 Under the assumptions of Lemma 4.30, the infimal convolution
(σΩ1 □ σΩ2) : X∗ → R is a lower semicontinuous function with respect to the
weak∗ topology of X ∗ .

Proof. It suffices to verify that for any γ ∈ R the set
C := {x∗ ∈ X∗ | (σΩ1 □ σΩ2)(x∗) ≤ γ}

is weak∗ closed in X∗. Consider the parametric family of sets

Cε := {x∗ ∈ X∗ | x∗ = x∗1 + x∗2, σΩ1(x∗1) + σΩ2(x∗2) ≤ γ + ε}, ε > 0,

with C = ⋂ε>0 Cε and show that each Cε for ε > 0 is weak∗ closed in X∗.
Using the seminal Banach-Dieudonné-Krein-S̆mulian theorem (see, e.g., [114,
Theorem V.5.7]), we only need to check that the set Cε ∩ rB∗ is weak∗ closed
in X ∗ for all r > 0. To this end, define a mapping T : X ∗ × X ∗ → X ∗ by
T (x∗1 , x∗2 ) = x∗1 + x∗2 ,
which is continuous in the weak∗ topology of X ∗ × X ∗ . Observe that
Cε ∩ rB∗ = T (Kγ+ε,r ),
where the set Kγ+ε,r is defined above. It follows from Lemma 4.30 that the
intersection Cε ∩ rB∗ is weak∗ compact and hence weak∗ closed in X ∗ . This
verifies the claimed conclusion. 

The following qualification condition imposed on closed and convex subsets


of Banach spaces plays a crucial role in the subsequent calculus rules.
Definition 4.32 Let X be a Banach space, and let the sets Ω1 , Ω2 ⊂ X
be nonempty, closed, and convex. We say that Ω1 , Ω2 satisfy the Attouch-
Brezis qualification condition if the set
cone(Ω1 − Ω2 ) is a closed linear subspace of X. (4.24)
It is clear that the Attouch-Brezis condition (4.24) supersedes the equiva-
lent core and difference interior qualification conditions (3.106) used above for
closed convex sets in Banach spaces; the latter implies that cone(Ω1 − Ω2 ) =
X. Thus the qualification condition (4.24) is also weaker than the standard
qualification condition (3.9). This implies that, in the case of closed convex
subsets of Banach spaces, the next theorem improves the results of Theo-
rem 4.23(a,b), which however holds in general topological vector spaces even
without the closedness assumption on the sets in question.
Theorem 4.33 Let Ω1 and Ω2 be nonempty, closed, and convex subsets of a
Banach space X, and let the Attouch-Brezis qualification condition (4.24) be
satisfied. Then we have (4.12). Furthermore, for any x∗ ∈ dom(σΩ1 ∩Ω2 ) there
exist x∗1 , x∗2 ∈ X ∗ for which x∗ = x∗1 + x∗2 and (4.13) is satisfied.

Proof. Applying the Fenchel conjugate to both sides of formula (4.10) from
Proposition 4.22 and then using Lemma 4.31 give us the equalities
δ∗Ω1∩Ω2(x∗) = σΩ1∩Ω2(x∗) = (σΩ1 □ σΩ2)∗∗(x∗) = (σΩ1 □ σΩ2)(x∗)
for all x∗ ∈ X ∗ . This justifies (4.12) when the assumption cone(Ω1 − Ω2 ) = X
is satisfied. Denote further by L := cone(Ω1 − Ω2 ) the closed subspace of X in
question. Since Ω1 ∩Ω2 = ∅ by (4.24), we can always translate the situation to
the case where 0 ∈ Ω1 ∩ Ω2 and hence suppose that Ω1 , Ω2 ⊂ L. This reduces
the general setting under (4.24) to the one with cone(Ω1 − Ω2 ) = X treated
above. Thus (4.12) is verified.
Finally, representation (4.13) for x∗ ∈ dom(σΩ1∩Ω2) follows from the
weak∗ compactness of the set Kα,β in Lemma 4.30 with α := (σΩ1 □ σΩ2)(x∗) + ε,
where ε > 0 is chosen arbitrarily, and where β := ‖x∗‖. □
Remark 4.34 If Ω is a nonempty convex set in a vector space X such that
cone(Ω) = ⋃λ≥0 λΩ is a linear subspace of X, then 0 ∈ Ω and cone(Ω) =
⋃λ>0 λΩ. Indeed, fix a ∈ Ω and find λ ≥ 0 and b ∈ Ω such that −a = λb.
This implies that

0 = (1/(1 + λ)) a + (λ/(1 + λ)) b ∈ Ω,

which verifies the equality cone(Ω) = ⋃λ>0 λΩ.

4.2.2 Refined Calculus Rules

In this subsection, we first derive from Theorem 4.33 the normal cone intersec-
tion rule for closed and convex subsets of Banach spaces under the Attouch-
Brezis qualification condition (4.24). This geometric result generates enhanced
calculus rules for coderivatives and subgradients. We also present the corre-
sponding elaborations of conjugate calculus rules in the Banach space setting.
As stated in the discussion in the preceding subsection, the next theorem
significantly improves the intersection rule of Theorem 3.10 for the case where
Ω1 and Ω2 are closed subsets of a Banach space.
Theorem 4.35 Let the sets Ω1 , Ω2 ⊂ X be convex, and let x ∈ Ω1 ∩ Ω2 .
Suppose that the space X is Banach, that both sets Ω1 and Ω2 are closed,
and that the Attouch-Brezis qualification condition from Definition 4.32 is
satisfied. Then we have the normal cone intersection rule
N (x; Ω1 ∩ Ω2 ) = N (x; Ω1 ) + N (x; Ω2 ). (4.25)
Proof. It follows from the normal cone definition that x∗ ∈ N (x; Ω) for x ∈ Ω
if and only if σΩ (x∗ ) = x∗ , x. Then pick any x∗ ∈ N (x; Ω1 ∩ Ω2 ) and get
x∗ , x = σΩ1 ∩Ω2 (x∗ ). Theorem 4.33 yields the existence of x∗1 , x∗2 ∈ X ∗ such
that x∗ = x∗1 + x∗2 and that
x∗1 , x + x∗2 , x = x∗ , x = σΩ1 ∩Ω2 (x∗ ) = σΩ1 (x∗1 ) + σΩ2 (x∗2 ).

This clearly ensures that x∗1 , x = σΩ1 (x∗1 ) and x∗2 , x = σΩ2 (x∗2 ). Thus
we have the inclusions x∗1 ∈ N (x; Ω1 ) and x∗2 ∈ N (x; Ω2 ), which show that
N (x; Ω1 ∩Ω2 ) ⊂ N (x; Ω1 )+N (x; Ω2 ). This verifies the inclusion “⊂” in (4.25).
The opposite inclusion is obvious. 

The following consequences of Theorem 4.35 contain enhanced versions of


sum and chain rules for coderivatives of set-valued mappings and subgradients
of extended-real-valued functions in Banach spaces.
Corollary 4.36 Assume that in the framework of Theorem 3.22 the space
X is Banach, the mappings F1 and F2 have closed graphs, and the set
cone(dom(F1 ) − dom(F2 )) is a closed subspace of X. Then we have the
coderivative sum rule (3.30).

Proof. Following the lines of the proof of Theorem 3.22, observe that the
equality in (3.32) for the sets Ω1 , Ω2 ⊂ X × Y × Y defined in (3.31) yields
cone(Ω1 − Ω2) = cone(dom(F1) − dom(F2)) × Y × Y.
Thus cone(Ω1 − Ω2 ) is a closed subspace of X × Y × Y under the qualification
condition imposed in the corollary. Then we proceed similarly to the proof of
Theorem 3.22 by applying Theorem 4.35 instead of Theorem 3.10. 

Corollary 4.37 Let X, Y, Z be Banach spaces, let the set-valued mappings


F : X →→ Y and G : Y →→ Z have closed and convex graphs, and let the set
cone(rge(F )−dom(G)) be a closed subspace of Y . Then for all (x, z) ∈ gph(G◦
F ) and all z ∗ ∈ Z ∗ we have the coderivative chain rule (3.34).

Proof. Following the proof of Theorem 3.23, consider the closed and convex
subsets Ω1, Ω2 ⊂ X × Y × Z therein. Taking x∗ ∈ D∗(G ◦ F)(x, z)(z∗) with
z ∗ ∈ Z ∗ , we have the relationships
(x∗, 0, −z∗) ∈ N((x, y, z); Ω1 ∩ Ω2) and Ω1 − Ω2 = X × (rge(F) − dom(G)) × Z.
The latter tells us that the qualification condition imposed in this corollary
ensures the fulfillment of the Attouch-Brezis qualification condition (4.24) for
the sets Ω1 , Ω2 . Then we proceed as in the proof of Theorem 3.23 by replacing
the application of Theorem 3.10 with that of Theorem 4.35. 

Corollary 4.38 Let F1 , F2 : X →→ Y be closed-graph mappings between Banach


spaces, and let the set cone(gph(F1 ) − gph(F2 )) be a closed subspace of X × Y .
Then the coderivative intersection rule (3.37) holds for any y ∈ (F1 ∩ F2 )(x)
and any y ∗ ∈ Y ∗ .

Proof. Observe that the qualification condition imposed in this statement


reduces to (4.24) for the sets Ω1 := gph(F1 ) and Ω2 := gph(F2 ). Thus the
proof of Theorem 3.24 applies by replacing the application of Theorem 3.10
with that of Theorem 4.35. 

Corollary 4.39 Let f1, f2 : X → R be l.s.c. convex functions defined on a
Banach space X, and let the set cone(dom(f1) − dom(f2)) be a closed subspace
of X. Then we have the subdifferential sum rule (3.58).

Proof. It follows from the proof of Theorem 3.48 by applying the normal cone
intersection rule from Theorem 4.35 instead of that from Theorem  3.10. We
can easily
 check that the imposed closed subspace property of cone dom(f1 )−
dom(f2 ) corresponds to the Attouch-Brezis qualification condition (4.24) for
the sets Ω1 , Ω2 introduced in the proof of Theorem 3.48. 

Corollary 4.40 Let B : X → Y be an affine mapping defined on a Banach


space X with its values B(x) := Ax + b belonging to a topological vector space
Y , and let g : Y → R be an l.s.c. convex function such that cone(AX −dom(g))
is a closed subspace of Y. Then we have the subdifferential chain rule (3.65).

Proof. Proceeding as in the proof of Theorem 3.55, consider the sets Ω1 and
Ω2 defined in (3.67). Observe the equality
cone(Ω1 − Ω2) = X × cone(AX − dom(g)) × R.
Thus the imposed assumption ensures the fulfillment of the Attouch-Brezis
qualification condition (4.24) for these sets in the intersection rule of Theo-
rem 4.35. The rest of the proof follows the one of Theorem 3.55. 

Next we proceed with deriving Banach space versions of the main results
of conjugate calculus improving those from Subsection 4.1.3 in this setting.
The first result concerns the conjugate sum rule.

Theorem 4.41 Let f, g : X → R be convex l.s.c. functions defined on a


Banach space X. Assume that the set cone(dom(f ) − dom(g)) is a closed
subspace of X. Then we have the conjugate sum rule (4.17). Furthermore, for
any x∗ ∈ X ∗ there are x∗1 , x∗2 ∈ X ∗ such that the conditions in (4.18) hold.

Proof. Observe first that the convex sets Ω1 , Ω2 from (4.19) are closed by the
l.s.c. assumption on f, g and then check that
cone(Ω1 − Ω2) = cone(dom(f) − dom(g)) × R × R. (4.26)
Indeed, consider u ∈ cone(Ω1 − Ω2 ) and find t ≥ 0, v ∈ (Ω1 − Ω2 ) such that
u = tv. This gives us elements v = (x1 , λ1 , λ2 )−(x2 , γ1 , γ2 ) with (x1 , λ1 , λ2 ) ∈
Ω1 and (x2 , γ1 , γ2 ) ∈ Ω2 . Note that x1 ∈ dom(f ) and x2 ∈ dom(g) due to
f (x1 ) ≤ λ1 < ∞ and g(x2 ) ≤ γ2 < ∞. Hence we arrive at the inclusion
tv = t(x1 − x2, λ1 − γ1, λ2 − γ2) ∈ cone(dom(f) − dom(g)) × R × R.
To verify now the opposite inclusion, fix x ∈ cone(dom(f ) − dom(g)) × R × R
and find, by taking into account Remark 4.34, such t > 0, x1 ∈ dom(f ),
x2 ∈ dom(g), and γ1, γ2 ∈ R for which, with λi := γi/t for i = 1, 2, we have

x = (t(x1 − x2), γ1, γ2) = t(x1 − x2, λ1, λ2)
  = t[(x1, f(x1), λ2 + g(x2)) − (x2, −λ1 + f(x1), g(x2))].
This readily yields x ∈ t(Ω1 − Ω2 ) ⊂ cone(Ω1 − Ω2 ). Applying Theorem 4.33,
we arrive at the claimed conclusions under the assumptions made. 

The two final consequences provide the improved versions of the conjugate
chain and maximum rules obtained geometrically from the corresponding ver-
sion of the support function intersection rule in Banach spaces.

Corollary 4.42 Suppose that in the setting of Theorem 4.28 the spaces X
and Y are Banach, the function g is convex and l.s.c., and the set cone(AX −
dom(g)) is a closed subspace of Y . Then we have the conjugate chain rule
(4.21), where the infimum is achieved for any x∗ ∈ dom(g ◦ A)∗ .
Proof. Considering the sets Ω1 and Ω2 from (4.22), it is easy to check that
cone(Ω1 − Ω2) = X × cone(AX − dom(g)) × R.
Then we proceed as in the proof of Theorem 4.28 by employing Theorem 4.33
instead of Theorem 4.23. 
Corollary 4.43 Suppose that in the setting of Theorem 4.29 the space X
is Banach, the functions f, g : X → R are convex and l.s.c., and the set
cone(dom(f ) − dom(g)) is a closed subspace of X. Then we have the con-
jugate maximum rule (4.23), where the infimum is achieved provided that the
value (f ∨ g)∗ (x∗ ) is a real number.

Proof. Assuming that cone(dom(f ) − dom(g)) is a closed subspace of X and


remembering that Ω1 := epi(f ) and Ω2 := epi(g), it is straightforward to
conclude that the set cone(Ω1 − Ω2 ) is a closed subspace of X × R. Then we
follow the proof of Theorem 4.29. 

4.3 Directional Derivatives

This section is devoted to directional differentiability of convex functions and


their relationships with subdifferentiation. In contrast to classical analysis,
directional derivative constructions in convex analysis are one-sided and
depend on the direction without the classical plus-minus symmetry. Geometrically,
this corresponds to considering tangent cones instead of the tangent subspaces of
classical analysis. Note that tangents and directional derivatives are primal-space
constructions, in contrast to the dual-space nature of normals and subgradients.

4.3.1 Definitions and Elementary Properties

Our basic definition here is as follows.


Definition 4.44 Let X be a vector space, and let f : X → R with x ∈ dom(f ).
The directional derivative of f at x in the direction v ∈ X is the following
limit, provided that it exists as either a real number or ±∞:
f′(x; v) := lim_{t↓0} (f(x + tv) − f(x))/t. (4.27)

Note that construction (4.27) is sometimes called the right directional
derivative of f at x in the direction v. Its left counterpart is defined by

f′−(x; v) := lim_{t↑0} (f(x + tv) − f(x))/t.
It is easy to see from the definitions that

f′−(x; v) = −f′(x; −v) for all v ∈ X,

and thus properties of the left directional derivative f′−(x; v) reduce to those
of the right one (4.27), which we study below.
Directional derivatives of convex functions enjoy remarkable properties,
some of which are presented in what follows.

Proposition 4.45 Let X be a vector space, and let f : X → R be a convex


function with x ∈ dom(f). Given v ∈ X, define the function ϕ : (0, ∞) → R by

ϕ(t) := (f(x + tv) − f(x))/t, t > 0.
Then the function ϕ is nondecreasing on (0, ∞).
Proof. Fix any numbers 0 < t1 < t2 and get the representation

x + t1v = (t1/t2)(x + t2v) + (1 − t1/t2)x.

It follows from the convexity of f that

f(x + t1v) ≤ (t1/t2) f(x + t2v) + (1 − t1/t2) f(x),

which implies in turn the inequality

ϕ(t1) = (f(x + t1v) − f(x))/t1 ≤ (f(x + t2v) − f(x))/t2 = ϕ(t2).
This verifies that ϕ is nondecreasing on (0, ∞). 

The next proposition establishes the existence of the directional derivative


for any extended-real-valued convex function on a vector space.
Proposition 4.46 Let X be a vector space, and let f : X → R be a con-
vex function with x ∈ dom(f). Then the directional derivative f′(x; v) exists
in every direction v ∈ X. Furthermore, it admits the representation via the
function ϕ defined in Proposition 4.45:

f′(x; v) = inf_{t>0} ϕ(t), v ∈ X.

Proof. Proposition 4.45 tells us that the function ϕ is nondecreasing. Thus


f′(x; v) = lim_{t↓0} (f(x + tv) − f(x))/t = lim_{t↓0} ϕ(t) = inf_{t>0} ϕ(t),

which verifies the results claimed in the proposition. 
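Propositions 4.45 and 4.46 are easy to visualize for f(x) = |x|: the difference quotient ϕ(t) is nondecreasing in t, and its infimum over t > 0 gives f′(x; v). The sketch below (with the arbitrary illustrative sample point x = 1 and direction v = −1) checks this numerically; here f′(1; −1) = −1, since f decreases at unit rate when moving left from x = 1.

```python
import numpy as np

# Difference quotients of f(x) = |x| at x = 1 in direction v = -1:
# phi(t) = (f(x + t v) - f(x)) / t is nondecreasing in t (Proposition 4.45)
# and inf_{t>0} phi(t) = f'(1; -1) = -1 (Proposition 4.46).
f = lambda x: abs(x)

def phi(x, v, t):
    return (f(x + t * v) - f(x)) / t

ts = np.logspace(-6, 1, 200)             # t ranging from 1e-6 up to 10
vals = [phi(1.0, -1.0, t) for t in ts]
print(min(vals), vals[-1])               # infimum ≈ -1; large-t quotients are larger
```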

Based on the above representation, we get the following nonlimiting rela-


tionship between the convex function in question and its directional derivative.

Proposition 4.47 Let X be a vector space, and let f : X → R be a convex


function with x ∈ dom(f ). Then
f  (x; v) ≤ f (x + v) − f (x) whenever v ∈ X.

Proof. Proposition 4.45 tells us that


ϕ(t) ≤ ϕ(1) = f (x + v) − f (x) for all t ∈ (0, 1),
which justifies the claimed property due to f  (x; v) = inf t>0 ϕ(t) ≤ ϕ(1). 

4.3.2 Relationships with Subgradients

Here we establish several relationships between the directional derivative and


the subdifferential of a convex function.

Proposition 4.48 Let f : X → R be a proper convex function defined on


a topological vector space X, and let x ∈ dom(f ). Then the following three
properties are equivalent:
(a) x∗ ∈ ∂f (x).
(b) ⟨x∗, v⟩ ≤ f′(x; v) for all v ∈ X.
(c) f′−(x; v) ≤ ⟨x∗, v⟩ ≤ f′(x; v) for all v ∈ X.

Proof. Picking any x∗ ∈ ∂f (x) and t > 0, we get


⟨x∗, tv⟩ ≤ f(x + tv) − f(x) whenever v ∈ X,
which verifies implication (a)=⇒(b) by dividing by t and taking the limit as
t ↓ 0. Assuming that assertion (b) holds, we get from Proposition 4.47 that

⟨x∗, v⟩ ≤ f′(x; v) ≤ f(x + v) − f(x) for all v ∈ X.


This ensures by (3.39) that x∗ ∈ ∂f (x), and so (a) and (b) are equivalent.
It is obvious that (c) yields (b). Conversely, if (b) is satisfied, then for any
v ∈ X we have ⟨x∗, −v⟩ ≤ f′(x; −v). This gives us

f′−(x; v) = −f′(x; −v) ≤ ⟨x∗, v⟩ for any v ∈ X,
which justifies the fulfillment of (c) and thus completes the proof. 
The next proposition is useful to derive the main result of this section.

Proposition 4.49 Let X be a vector space. Given a convex function f : X →


R with x ∈ dom(f ), define the directional function ψ(v) := f  (x; v). Then this
function has the following properties:
(a) ψ(0) = 0.
(b) ψ(v1 +v2 ) ≤ ψ(v1 )+ψ(v2 ) for all v1 , v2 ∈ X with ψ(vi ) = −∞ as i = 1, 2.
(c) ψ(αv) = αψ(v) whenever v ∈ X and α > 0.
(d) If X is a topological vector space and x ∈ int(dom(f )), then ψ is finite
on the entire space X.
(e) If X is a normed space and f is continuous at x ∈ int(dom(f )), then ψ
is finite and locally Lipschitz continuous on X.

Proof. It is straightforward to deduce properties (a)–(c) directly from defini-


ψ(v1 + v2) = lim_{t↓0} (f(x + t(v1 + v2)) − f(x))/t
           = lim_{t↓0} (f((x + 2tv1)/2 + (x + 2tv2)/2) − f(x))/t
           ≤ lim_{t↓0} (f(x + 2tv1) − f(x))/(2t) + lim_{t↓0} (f(x + 2tv2) − f(x))/(2t)
           = ψ(v1) + ψ(v2).
To verify (d), choose by using x ∈ int(dom(f )) a number α > 0 to be so small
that x + αv ∈ dom(f ). It follows from Proposition 4.47 that
ψ(αv) = f  (x; αv) ≤ f (x + αv) − f (x) < ∞.
Employing (c) gives us ψ(v) < ∞. Furthermore, we get from (a) and (b) that
0 = ψ(0) = ψ(v + (−v)) ≤ ψ(v) + ψ(−v), v ∈ X,

which implies that ψ(v) ≥ −ψ(−v). This tells us that ψ(v) > −∞ and thus
verifies assertion (d).
It remains to prove (e). We get from the continuity of f at x that x ∈
int(dom(f )), and so ψ is finite on X. There exists γ > 0 such that
f(x + v) − f(x) < 1 whenever ‖v‖ < γ.

By Proposition 4.47 we have the estimate

ψ(v) = f′(x; v) ≤ f(x + v) − f(x) < 1 if ‖v‖ < γ.
Then ψ is a convex function bounded from above on bounded sets, and hence
it is locally Lipschitz continuous on dom(ψ) = X. 

Now we are ready to establish the following major relationships between


directional derivatives and subgradients of arbitrary convex functions defined
on topological vector spaces.

Theorem 4.50 Let X be a topological vector space. Consider a convex func-


tion f : X → R that is continuous at x, and let ψ(v) := f′(x; v) on X. Then
we have the relationships:
(a) ∂f (x) = ∂ψ(0).
(b) ψ∗(x∗) = δΩ(x∗) for all x∗ ∈ X∗, where Ω := ∂ψ(0).
(c) f′(x; v) is represented via the subdifferential ∂f(x) ≠ ∅ as

f′(x; v) = max{⟨x∗, v⟩ | x∗ ∈ ∂f(x)} for any v ∈ X. (4.28)

Proof. It follows from Proposition 4.48 that x∗ ∈ ∂f (x) if and only if


⟨x∗, v − 0⟩ = ⟨x∗, v⟩ ≤ f′(x; v) = ψ(v) = ψ(v) − ψ(0) for all v ∈ X.
This is equivalent to x∗ ∈ ∂ψ(0), and hence (a) holds.
To verify (b), let us first check that ψ∗(x∗) = 0 for all x∗ ∈ Ω = ∂ψ(0).
Indeed, we have by the definition of the Fenchel conjugate that
ψ∗(x∗) = sup{⟨x∗, v⟩ − ψ(v) | v ∈ X} ≥ ⟨x∗, 0⟩ − ψ(0) = 0.
Picking now any x∗ ∈ ∂ψ(0) gives us
⟨x∗, v⟩ = ⟨x∗, v − 0⟩ ≤ ψ(v) − ψ(0) = ψ(v), v ∈ X,
which implies therefore that
ψ∗(x∗) = sup{⟨x∗, v⟩ − ψ(v) | v ∈ X} ≤ 0,

and thus ensures the fulfillment of ψ∗(x∗) = 0 for any x∗ ∈ ∂ψ(0).
Let us now show that ψ∗(x∗) = ∞ if x∗ ∉ ∂ψ(0). For such x∗ ∈ X∗ we
find v0 ∈ X with ⟨x∗, v0⟩ > ψ(v0). Since ψ is positively homogeneous by
Proposition 4.49(c), it follows that

ψ∗(x∗) = sup{⟨x∗, v⟩ − ψ(v) | v ∈ X} ≥ sup_{t>0} (⟨x∗, tv0⟩ − ψ(tv0))
       = sup_{t>0} t(⟨x∗, v0⟩ − ψ(v0)) = ∞.
To verify representation (4.28) in case (c), observe by (a) that ∂ψ(0) ≠ ∅.
Employing Corollary 4.14 tells us that

f′(x; v) = ψ(v) = ψ∗∗(v), v ∈ X.
It follows from (b) that ψ ∗ (x∗ ) = δΩ (x∗ ), where Ω = ∂ψ(0) = ∂f (x). Hence
ψ∗∗(v) = δ∗Ω(v) = sup{⟨x∗, v⟩ | x∗ ∈ Ω},
which justifies (c) in the general case under consideration. The continuity of
f at x implies that ∂f (x) is a nonempty compact set in the weak∗ topology
of X ∗ . Indeed, from the imposed continuity of f at x we can find a balanced
neighborhood U of the origin such that f (x + u) − f (x) ≤ 1 whenever u ∈ U .
Then for any x∗ ∈ ∂f (x) and for any u ∈ U we have
⟨x∗, u⟩ = ⟨x∗, (x + u) − x⟩ ≤ f(x + u) − f(x) ≤ 1.

Thus we get ∂f(x) ⊂ U° := {x∗ ∈ X∗ | sup_{u∈U} |⟨x∗, u⟩| ≤ 1}, which is a
compact set in the weak∗ topology by the Banach-Alaoglu theorem in the
topological vector space setting; see Theorem 1.112. This confirms that ∂f (x)
is a compact set in the weak∗ topology, and so the maximum is achieved in
representation (4.28), which completes the proof of the theorem. 
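Formula (4.28) can be illustrated for f(x) = |x| at x = 0, where ∂f(0) = [−1, 1] and f′(0; v) = |v| = max{sv : s ∈ [−1, 1]}. The forward-difference step and the grid discretization of the subdifferential below are arbitrary illustrative choices.

```python
import numpy as np

# Check of f'(x; v) = max { <x*, v> : x* in subdiff f(x) } for f = |.| at x = 0,
# where the subdifferential is the interval [-1, 1] and f'(0; v) = |v|.
f = lambda x: abs(x)

def dir_deriv(x, v, t=1e-8):
    # forward difference quotient; exact here since f is positively homogeneous
    return (f(x + t * v) - f(x)) / t

subgrad = np.linspace(-1.0, 1.0, 2001)   # discretized subdifferential [-1, 1]
pairs = [(dir_deriv(0.0, v), np.max(subgrad * v)) for v in (2.0, -3.0, 0.5)]
print(pairs)  # each pair ≈ (|v|, |v|)
```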

4.4 Subgradients of Supremum Functions

This section is devoted to the study of the class of supremum functions over
infinite index sets. Supremum functions, which are intrinsically nonsmooth,
have been highly recognized in convex and variational analysis due to their
remarkable features and numerous applications; in particular, to nonstandard
problems of constrained optimization and optimal control. The main result
here is a precise formula for calculating subgradients of the supremum of
convex functions over compact index sets.

4.4.1 Supremum of Convex Functions

Let X be a topological vector space, and let T be an index set that is assumed
here to be a compact topological space. Given a real-valued function g : T ×
X → R, define the corresponding supremum function f : X → R by
f(x) := sup{g(t, x) | t ∈ T}, x ∈ X. (4.29)
We associate with the supremum function (4.29) the following set-valued map-
ping S : X →→ T of active indices given by

S(x) := {t ∈ T | f(x) = g(t, x)}, x ∈ X. (4.30)

This subsection presents some general properties of the supremum function


(4.29), mostly in the case when the functions x → g(t, x) are convex for all
t ∈ T . Recall that ψ : X → [−∞, ∞) is upper semicontinuous (u.s.c.) if the
function −ψ : X → (−∞, ∞] is l.s.c.

Proposition 4.51 Let t → g(t, x) be upper semicontinuous on T for some x ∈ X. Then we have S(x) ≠ ∅ at this point x.

Proof. This follows from the existence result of Theorem 2.164. □

Proposition 4.52 Let x → g(t, x) be convex on X for all t ∈ T. Then the supremum function (4.29) is also convex on X.

Proof. This follows from the easily verifiable fact that the supremum of a family of convex functions is always convex. □
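As a quick sanity check of this fact in one dimension, the sketch below samples the convexity inequality for a supremum of the form (4.29); the family g(t, x) = tx − t² over T = [−2, 2] is an assumed toy example, not taken from the text.

```python
import numpy as np

# Assumed toy family: g(t, x) = t*x - t^2 over a discretized T = [-2, 2].
T = np.linspace(-2.0, 2.0, 401)

def f(x):
    # Supremum function (4.29): a pointwise max of affine functions of x.
    return np.max(T * x - T**2)

# Sample the convexity inequality f(λx + (1-λ)u) <= λ f(x) + (1-λ) f(u).
rng = np.random.default_rng(1)
for _ in range(500):
    x, u = rng.uniform(-3.0, 3.0, size=2)
    lam = rng.uniform(0.0, 1.0)
    assert f(lam * x + (1 - lam) * u) <= lam * f(x) + (1 - lam) * f(u) + 1e-9
```

On this grid f is exactly a maximum of finitely many affine functions of x, hence convex, so the sampled inequality holds up to rounding.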

The next proposition is rather technical but important for verifying the subdifferential formula for (4.29) in the next subsection.

Proposition 4.53 Given x0 ∈ X and v ∈ X, define the extended-real-valued function ϕ : R → R by
ϕ(λ) := f(x0 + λv), λ ∈ R,
via (4.29) and also the set-valued mapping Φ : R →→ T by
Φ(λ) := S(x0 + λv), λ ∈ R,
via (4.30). Assume that g(t, ·) is convex for every t ∈ T and that g(·, x) is
upper semicontinuous for every x ∈ X. The following assertions hold:
(a) The function ϕ is convex and continuous on R.
(b) For any sequences λk ↓ 0 and tk → t0 with tk ∈ Φ(λk ) we have t0 ∈
Φ(0) = S(x0 ). Furthermore, f (x0 ) = g(t0 , x0 ) = limk→∞ g(tk , x0 ).

Proof. To verify (a), observe by Proposition 4.51 that the supremum is


attained in the definition of f (x0 + λv) via (4.29) since g(·, x) is upper semi-
continuous for every x ∈ X. Thus ϕ is a real-valued convex function on R,
which is automatically continuous.

To proceed with (b), fix any γ > 0 and choose λk < γ for all k ∈ N. Then
ϕ(λk) = f(x0 + λk v) = g(tk, x0 + λk v) = g(tk, (λk/γ)(x0 + γv) + (1 − λk/γ)x0)
≤ (λk/γ) g(tk, x0 + γv) + (1 − λk/γ) g(tk, x0)
due to the convexity of g(t, ·). The continuity of ϕ and the upper semiconti-
nuity of g(·, x) ensure the relationships

f(x0) = ϕ(0) ≤ lim sup_{k→∞} g(tk, x0) ≤ g(t0, x0) ≤ f(x0),

which shows that f(x0) = g(t0, x0) and hence t0 ∈ S(x0).


It remains to verify the limiting condition in (b). Using the upper semi-
continuity of g(·, x0 + γv) at t0 , find a neighborhood V of t0 such that
g(t, x0 + γv) < g(t0 , x0 + γv) =: δ whenever t ∈ V.
Then for sufficiently large k ∈ N we have
ϕ(λk) < (λk/γ) δ + (1 − λk/γ) g(tk, x0),
which clearly implies that
ϕ(0) = f(x0) ≤ lim inf_{k→∞} g(tk, x0)

and thus shows that lim_{k→∞} g(tk, x0) = g(t0, x0) = f(x0). □

4.4.2 Subdifferential Formula for Supremum Functions

Now we are ready to present the main result of this section, which calculates the subdifferential of (4.29) via those of the generating convex functions g(t, ·). In what follows we use the notation gt(x) := g(t, x). Given a topological vector space X and a subset Ω ⊂ X∗ of its dual space X∗, denote by co∗(Ω) the weak∗ closed convex hull of Ω, i.e., the smallest weak∗ closed convex subset of X∗ containing Ω.
Theorem 4.54 Let X be an LCTV space, let T be a compact topological space, and let g(t, x) satisfy all the assumptions of Proposition 4.53. Assume in addition that for every t ∈ T, the function g(t, ·) is continuous at x. Then we have the subdifferential representation
∂f(x) = co∗(⋃_{t∈S(x)} ∂gt(x)). (4.31)

Proof. Observe that the assumptions imposed in the theorem ensure that S(x) ≠ ∅ and that the supremum function f is convex by Proposition 4.51 and Proposition 4.52, respectively. Then the inclusion “⊃” in (4.31) follows directly from the definitions.
To verify the inclusion “⊂” in (4.31), pick any x∗ ∈ ∂f(x), denote
C := co∗(⋃_{t∈S(x)} ∂gt(x)),
and suppose on the contrary that x∗ ∉ C. Applying to the sets {x∗} and C the strict separation result of Theorem 2.61 in the space X∗ equipped with the weak∗ topology allows us to find a vector u ∈ X and α > 0 such that

⟨u∗, u⟩ + α < ⟨x∗, u⟩ for all u∗ ∈ C. (4.32)


Under the continuity assumption on g(t, ·) at x for t ∈ T, employing (4.32) and Theorem 4.50(c) gives us the estimates
g′t(x; u) + α = max{⟨u∗, u⟩ | u∗ ∈ ∂gt(x)} + α < ⟨x∗, u⟩
whenever t ∈ S(x). Thus
g′t(x; u) + α < ⟨x∗, u⟩ for every t ∈ S(x).
Next we provide the proof under the assumption that T is sequentially
compact. The proof for the case where T is compact can be done similarly by
using nets instead of sequences; see Exercise 4.83. In the case under consider-
ation, we proceed by applying Theorem 4.46 to the function f which allows
us to find a sequence λk ↓ 0 with
f′(x; u) = lim_{k→∞} (f(x + λk u) − f(x))/λk.
Choose tk ∈ T such that f (x + λk u) = g(tk , x + λk u) and, by the compactness
of T , suppose without loss of generality that {tk } converges to t ∈ T . By
Proposition 4.53(b) we get t ∈ S(x) and so
f(x) = g(t, x) = lim_{k→∞} g(tk, x).

Select further a scalar λ0 > 0 such that
(g(t, x + λ0 u) − g(t, x))/λ0 < g′t(x; u) + α/2 < ⟨x∗, u⟩ − α/2
and then find k0 ∈ N for which λk < λ0 whenever k ≥ k0. Thus for such numbers k ∈ N we have the estimates
(f(x + λk u) − f(x))/λk ≤ (g(tk, x + λk u) − g(tk, x))/λk ≤ (g(tk, x + λ0 u) − g(tk, x))/λ0.
Letting k → ∞ and using the assumed upper semicontinuity of g(·, x + λ0 u) gives us the inequalities
f′(x; u) ≤ (g(t, x + λ0 u) − g(t, x))/λ0 < ⟨x∗, u⟩ − α/2.
The latter contradicts ⟨x∗, u⟩ ≤ f′(x; u) and therefore yields x∗ ∈ C. □
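A concrete one-dimensional instance of formula (4.31) (an assumed illustration, not from the text): for T = [−1, 1] and g(t, x) = tx one has f(x) = |x|; at x = 0 every index is active and each ∂gt(0) = {t}, so (4.31) predicts ∂f(0) = [−1, 1], the subdifferential of the absolute value. The sketch below checks this against the directional-derivative description from Theorem 4.50(c).

```python
import numpy as np

# Assumed data: T = [-1, 1] discretized, g(t, x) = t*x, so f = sup_t g(t, .) = |x|.
T = np.linspace(-1.0, 1.0, 2001)

def f(x):
    return np.max(T * x)

def active_set(x, tol=1e-9):
    # S(x) from (4.30): indices attaining the supremum (up to a tolerance).
    return T[T * x >= f(x) - tol]

# At x = 0 all indices are active; the union of the singletons ∂g_t(0) = {t}
# is T itself, and its closed convex hull is the interval [lo, hi] = [-1, 1].
S0 = active_set(0.0)
lo, hi = S0.min(), S0.max()

# Consistency check: f'(0; u) = max{ x* u : x* in [lo, hi] } for several u.
for u in (1.0, -1.0, 0.5):
    eps = 1e-8
    assert abs(f(eps * u) / eps - max(lo * u, hi * u)) < 1e-6
```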

4.5 Subgradients and Conjugates of Marginal Functions

This section is devoted to the study of subdifferential properties for a remarkable class of extended-real-valued functions known as optimal value or marginal functions. Such functions were introduced and partly discussed in Subsection 2.4.3, although not from the viewpoint of generalized differentiation and conjugacy. For the reader's convenience, we repeat here the construction of such functions. Given a set-valued mapping F : X →→ Y and an extended-real-valued function of two variables ϕ : X × Y → R, define
μ(x) := inf{ϕ(x, y) | y ∈ F(x)}, x ∈ X. (4.33)
We always assume that (4.33) is proper and get from Theorem 2.129 that
the marginal function (4.33) is convex provided that ϕ is convex and that the
mapping F is of convex-graph. The main results of this section furnish precise
calculations of the subdifferential and Fenchel conjugate of (4.33) in terms
of the given data under different assumptions in topological vector spaces,
Banach spaces, and finite-dimensional spaces. We also present an application
of the subdifferential calculation for (4.33) to deriving another subdifferential
chain rule for compositions involving mappings with values in ordered spaces.

4.5.1 Computing Subgradients and Another Chain Rule

Observing that the optimal value function (4.33) is intrinsically nonsmooth,


we concentrate here on calculating its subdifferential. Note that the results
presented below are significantly different from those known earlier for non-
convex marginal functions (see, e.g., [228, Theorems 1.108 and 3.38] with the
references therein) and their specifications to the convex case. Given the optimal value function μ(x) composed in (4.33), define the corresponding solution map M : X →→ Y associated with (4.33) by
M(x) := {y ∈ F(x) | μ(x) = ϕ(x, y)}, x ∈ X. (4.34)
The following result allows us to represent the subdifferential of (4.33) via the
one for the sum of two associated convex functions.
Proposition 4.55 Considering (4.33), let ϕ : X × Y → R be convex, and let F : X →→ Y be a convex-graph set-valued mapping between topological vector spaces. Then for any x̄ ∈ X and ȳ ∈ M(x̄) we have the representation
∂μ(x̄) = {x∗ ∈ X∗ | (x∗, 0) ∈ ∂(ϕ + δ_gph(F))(x̄, ȳ)}.
Proof. Fix a subgradient x∗ ∈ ∂μ(x̄) and get
⟨x∗, x − x̄⟩ ≤ μ(x) − μ(x̄) for all x ∈ X. (4.35)
It follows from the definition in (4.33) that
⟨x∗, x − x̄⟩ ≤ μ(x) − ϕ(x̄, ȳ) ≤ ϕ(x, y) − ϕ(x̄, ȳ) whenever y ∈ F(x).
Using further the indicator function associated with gph(F), we have
⟨x∗, x − x̄⟩ ≤ μ(x) − ϕ(x̄, ȳ) ≤ (ϕ + δ_gph(F))(x, y) − (ϕ + δ_gph(F))(x̄, ȳ) (4.36)

for all (x, y) ∈ X × Y, which yields the inclusion (x∗, 0) ∈ ∂(ϕ + δ_gph(F))(x̄, ȳ).
To verify the opposite inclusion, pick (x∗, 0) ∈ ∂(ϕ + δ_gph(F))(x̄, ȳ). Then (4.36) is satisfied for all (x, y) ∈ X × Y. This tells us that
⟨x∗, x − x̄⟩ ≤ ϕ(x, y) − ϕ(x̄, ȳ) = ϕ(x, y) − μ(x̄) whenever y ∈ F(x).
Taking the infimum on the right-hand side above with respect to y ∈ F(x) gives us (4.35), and so we arrive at x∗ ∈ ∂μ(x̄). □

The next theorem provides a precise calculation of the subdifferential of


(4.33) via the subdifferential of ϕ and the coderivative of F .

Theorem 4.56 Let ϕ : X × Y → R be a proper convex function, and let F : X →→ Y be a convex-graph set-valued mapping between topological vector spaces. Assume that one of the following conditions is satisfied:
spaces. Assume that one of the following conditions is satisfied:
(a) The function ϕ is (finite and) continuous at some point in gph(F ).
(b) The spaces X and Y are Banach, ϕ is l.s.c., the graph of F is closed, and
the set R+ (dom(ϕ) − gph(F )) is a closed subspace of X × Y .
(c) X = Rⁿ, Y = Rᵐ, and ri(dom(ϕ)) ∩ ri(gph(F)) ≠ ∅.
Then for any x̄ ∈ X and ȳ ∈ M(x̄) we have the subdifferential representation
∂μ(x̄) = ⋃_{(x∗,y∗)∈∂ϕ(x̄,ȳ)} [x∗ + D∗F(x̄, ȳ)(y∗)]. (4.37)

Proof. Employing the subdifferential representation for marginal functions


obtained in Proposition 4.55, we apply the subdifferential sum rule from The-
orem 3.48, Corollary 4.39, and Theorem 3.50, respectively, under the corre-
sponding conditions in (a)–(c). The coderivative of F appears there due to its
definition and the fact that ∂δΩ (z) = N (z; Ω) for any convex set Ω. 

Corollary 4.57 In the framework of Theorem 4.56, suppose that the cost function ϕ in (4.33) does not depend on x. Then we have
∂μ(x̄) = ⋃_{y∗∈∂ϕ(ȳ)} D∗F(x̄, ȳ)(y∗) for any (x̄, ȳ) ∈ gph(M)
with M(·) from (4.34), under the fulfillment of one of the assumptions (a)–(c).

Proof. This follows directly from (4.37) with ϕ(x, y) = ϕ(y). Indeed, in this case we obviously have ∂ϕ(x̄, ȳ) = {0} × ∂ϕ(ȳ), and so x∗ = 0 in (4.37). □

Remark 4.58 When the mapping F in (4.33) is single-valued, the marginal


function therein reduces to the (generalized) composition ϕ(x, F (x)), which
reads as the standard composition ϕ ◦ F if ϕ = ϕ(y). In this way the results
of Theorem 4.56 and Corollary 4.57 give us (generalized) subdifferential chain
rules. Consider, e.g., the case where F(x) := Ax + b, where A : X → Y is a linear continuous operator, and where b ∈ Y. Then the coderivative of F at (x̄, ȳ) with ȳ := Ax̄ + b is calculated by
D∗F(x̄, ȳ)(y∗) = {A∗y∗} for all y∗ ∈ Y∗
due to Proposition 3.20. The composition g ◦ F of this mapping F with a convex function g : Y → R is a particular case of the optimal value function (4.33) with ϕ(x, y) := g(y). Hence Corollary 4.57 yields
∂(g ◦ F)(x̄) = A∗∂g(Ax̄ + b)
under each of the assumptions (a)–(c) in Theorem 4.56. Thus in this case we
get back to the subdifferential chain rules obtained above.
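The chain rule of this remark can be sanity-checked numerically at a point of differentiability, where the subdifferential reduces to a gradient; the operator A, shift b, and the smooth function g(y) = ‖y‖²/2 below are assumed sample data.

```python
import numpy as np

# Assumed data for the chain rule ∂(g∘F)(x) = A*∂g(Ax + b), F(x) = Ax + b.
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
b = np.array([0.5, -1.0, 2.0])

def g(y):
    return 0.5 * np.sum(y**2)     # smooth convex, ∂g(y) = {∇g(y)} = {y}

x = np.array([1.0, -2.0])
chain = A.T @ (A @ x + b)          # A*∂g(Ax + b) reduces to A^T (Ax + b)

# Compare against central finite differences of x -> g(Ax + b).
eps = 1e-6
num = np.array([
    (g(A @ (x + eps * e) + b) - g(A @ (x - eps * e) + b)) / (2 * eps)
    for e in np.eye(2)
])
assert np.allclose(chain, num, atol=1e-4)
```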
Moreover, the results on subdifferentiation of optimal value functions
obtained in Theorem 4.56 and Corollary 4.57 allow us to derive yet another
chain rule involving generalized convex functions with values in ordered topo-
logical vector spaces. We confine ourselves to the case of applying Corol-
lary 4.57.
Recall some definitions. Given an ordering convex cone Y+ ⊂ Y , define
the ordering relation ≺ on Y as follows: y1 ≺ y2 for y1 , y2 ∈ Y if and only
if y2 − y1 ∈ Y+ . We say that a function ϕ : Y → R is Y+ -nondecreasing if
ϕ(y1 ) ≤ ϕ(y2 ) when y1 ≺ y2 . Further, a mapping f : X → Y with values in
an ordered space Y is Y+-convex if
f(λx + (1 − λ)u) ≺ λf(x) + (1 − λ)f(u) for all x, u ∈ X and λ ∈ (0, 1).
It is easy to observe the following property.
Proposition 4.59 Let Y be a topological vector space, and let ϕ : Y → R be
Y+-nondecreasing and convex. Then for any y ∈ dom(ϕ) and any y∗ ∈ ∂ϕ(y) we have ⟨y∗, z⟩ ≥ 0 whenever z ∈ Y+.
Proof. For any z ∈ Y+ we get y − z ≺ y. This tells us that
⟨y∗, −z⟩ = ⟨y∗, (y − z) − y⟩ ≤ ϕ(y − z) − ϕ(y) ≤ 0,
which clearly implies that ⟨y∗, z⟩ ≥ 0. □
Now we are ready to derive from Theorem 4.56 the aforementioned chain
rule in ordered spaces. For simplicity, we consider only the result in the topo-
logical vector space setting corresponding to assumption (a) of Theorem 4.56.
Theorem 4.60 Let X and Y be topological vector spaces, and let Y be ordered
by an ordering convex cone Y+ ⊂ Y . Consider a Y+ -convex mapping f : X →
Y and a function ϕ : Y → R, which is convex and Y+-nondecreasing. If there exists x̃ ∈ X such that ϕ is continuous at some point ỹ ∈ Y with f(x̃) ≺ ỹ, then we have the subdifferential chain rule
∂(ϕ ◦ f)(x̄) = ⋃_{y∗∈∂ϕ(f(x̄))} ∂(y∗ ◦ f)(x̄) whenever x̄ ∈ dom(ϕ ◦ f). (4.38)

Proof. Define F : X →→ Y by F(x) := {y ∈ Y | f(x) ≺ y}. It is easy to check that the Y+-convexity of f yields that the graph of F is convex in X × Y. Since ϕ is Y+-nondecreasing, we have the marginal function representation
μ(x) := inf{ϕ(y) | y ∈ F(x)} = (ϕ ◦ f)(x), x ∈ X.
Then the subdifferential formula from Theorem 4.56 tells us that
∂(ϕ ◦ f)(x̄) = ⋃_{y∗∈∂ϕ(f(x̄))} D∗F(x̄, f(x̄))(y∗).

It is clear that the claimed chain rule (4.38) follows from the formula
D∗F(x̄, f(x̄))(y∗) = ∂(y∗ ◦ f)(x̄) whenever y∗ ∈ ∂ϕ(f(x̄)), (4.39)
which we are going to prove now. To verify the inclusion “⊂” in (4.39), pick (y∗, x∗) ∈ gph(D∗F(x̄, f(x̄))) and get by definition that
⟨y∗, y⟩ ≥ ⟨y∗, f(x̄)⟩ + ⟨x∗, x − x̄⟩ for all x ∈ X and y ∈ Y with f(x) ≺ y.
Fix h ∈ X and select x := x̄ + h with y := f(x). Since f(x) ≺ y, we have that
⟨y∗, f(x̄ + h)⟩ ≥ ⟨y∗, f(x̄)⟩ + ⟨x∗, h⟩,
which yields x∗ ∈ ∂(y∗ ◦ f)(x̄) and hence justifies the inclusion “⊂” in (4.39).
To verify the opposite inclusion in (4.39), pick x∗ ∈ ∂(y∗ ◦ f)(x̄) with y∗ ∈ ∂ϕ(f(x̄)) and get by the subgradient definition that
⟨y∗, f(x̄ + h) − f(x̄)⟩ ≥ ⟨x∗, h⟩ whenever h ∈ X.
Taking any x ∈ X and y ∈ Y with f(x) ≺ y, denote h := x − x̄. Then Proposition 4.59 tells us that ⟨y∗, y⟩ ≥ ⟨y∗, f(x)⟩. This shows that
⟨y∗, y − f(x̄)⟩ ≥ ⟨y∗, f(x) − f(x̄)⟩ ≥ ⟨x∗, x − x̄⟩,
and therefore (x∗, −y∗) ∈ N((x̄, f(x̄)); gph(F)). It gives us x∗ ∈ D∗F(x̄, f(x̄))(y∗) and thus verifies the inclusion “⊃” in (4.39), which completes the proof. □

4.5.2 Conjugate Calculations for Marginal Functions

We conclude this section by calculating the Fenchel conjugate of the marginal


function (4.33) via the given data ϕ and F . The main result here involves the
infimal convolution under the major qualification conditions in topological
vector spaces, Banach spaces, and finite-dimensional spaces that have been
already employed before.

Theorem 4.61 Let ϕ : X × Y → R be a convex function, and let the graph of the mapping F : X →→ Y between topological vector spaces be convex in X × Y. Then the conjugate of the marginal function is expressed by
μ∗(x∗) = (ϕ + δ_gph(F))∗(x∗, 0) whenever x∗ ∈ X∗. (4.40)
Furthermore, we have the refined representation
μ∗(x∗) = (ϕ∗ □ σ_gph(F))(x∗, 0), x∗ ∈ X∗, (4.41)
where □ stands for the infimal convolution,
provided the fulfillment of one of the following assumptions (a)–(c):
(a) ϕ is continuous at some point (x, y) ∈ gph(F ).
(b) X and Y are Banach spaces, ϕ is l.s.c., F is of closed graph, and the set
R+ (dom(ϕ) − gph(F )) is a closed subspace of X × Y .
(c) X = Rⁿ, Y = Rᵐ, and ri(dom(ϕ)) ∩ ri(gph(F)) ≠ ∅.

Proof. Fix x∗ ∈ X∗ and x ∈ dom(μ). It follows from definition (4.33) that
⟨x∗, x⟩ − μ(x) ≥ ⟨x∗, x⟩ − ϕ(x, y) whenever y ∈ F(x).
This clearly implies that for all (x, y) ∈ X × Y we have
μ∗(x∗) = sup{⟨x∗, x⟩ − μ(x) | x ∈ dom(μ)} ≥ ⟨x∗, x⟩ − μ(x)
≥ ⟨(x∗, 0), (x, y)⟩ − (ϕ + δ_gph(F))(x, y).
Putting these together gives us the relationships
μ∗(x∗) ≥ sup{⟨(x∗, 0), (x, y)⟩ − (ϕ + δ_gph(F))(x, y) | (x, y) ∈ X × Y}
= (ϕ + δ_gph(F))∗(x∗, 0),
which justify the inequality “≥” in (4.40). To verify the opposite inequality, fix ε > 0 and for any x ∈ dom(μ) find y ∈ F(x) with ϕ(x, y) < μ(x) + ε. Then
⟨x∗, x⟩ − (μ(x) + ε) < ⟨x∗, x⟩ − ϕ(x, y) ≤ sup{⟨x∗, x⟩ − ϕ(x, y) | y ∈ F(x)}
≤ sup{⟨(x∗, 0), (x, y)⟩ − (ϕ + δ_gph(F))(x, y) | (x, y) ∈ X × Y}
= (ϕ + δ_gph(F))∗(x∗, 0) whenever x∗ ∈ X∗.
Since this holds for all x ∈ dom(μ) and all ε > 0, we conclude that
μ∗(x∗) ≤ (ϕ + δ_gph(F))∗(x∗, 0), x∗ ∈ X∗,
which therefore justifies the first representation (4.40) of the theorem.
To derive the second representation (4.41), it remains to apply the conjugate sum rules from Theorems 4.27 and 4.41 to the sum in (4.40), taking into account that (δ_gph(F))∗(x∗, 0) = σ_gph(F)(x∗, 0) for all x∗ ∈ X∗. □

4.6 Fenchel Duality

This section is devoted to Fenchel duality, which is a highly important topic


of convex analysis that mainly addresses problems of convex optimization.
Roughly speaking, the Fenchel duality scheme says that each problem of con-
vex optimization can be associated with a dual problem built via the conjugate
functions and such that the optimal values of the primal and dual problems
agree under appropriate assumptions.
In what follows we consider two major forms of the Fenchel duality scheme.
The first one deals with the class of primal problems written in a convex
composite form. The duality theorem obtained for this class of problems is
established under the corresponding qualification conditions in topological
vector spaces, Banach spaces, and finite-dimensional spaces being based on
the conjugate calculus results derived above in this chapter. The second class
of primal problems is written as minimizing differences of convex and concave
functions. For this class of problems, we obtain the duality results under
qualification conditions expressed in terms of generalized relative interiors in
topological vector spaces.

4.6.1 Fenchel Duality for Convex Composite Problems

In this subsection we investigate the composite convex optimization frame-


work of Fenchel duality in general topological vector spaces with specifying
the results in Banach and finite-dimensional settings. Given proper convex
functions f : X → R, g : Y → R and a linear continuous operator A : X → Y
between topological vector spaces X and Y , consider the following primal
minimization problem:
minimize f (x) + g(Ax) subject to x ∈ X. (4.42)
Note that the optimization problem (4.42) is written in the unconstrained
form, while it actually incorporates constraints via the domains of the extended-
real-valued functions f and g. Using the conjugate functions of f, g and the
adjoint operator of A, the Fenchel dual problem of (4.42) is defined in the
maximization form as follows:
maximize −f ∗ (A∗ y ∗ ) − g ∗ (−y ∗ ) subject to y ∗ ∈ Y ∗ . (4.43)
As we know, the conjugate functions f∗ : X∗ → R and g∗ : Y∗ → R are convex and enter (4.43) with the negative sign; thus the Fenchel dual problem is equivalent to a problem of convex minimization.
The following result establishes an inequality relationship between optimal
values of the primal and dual problems. This relationship, which sometimes is
called weak duality, holds with no convexity assumptions of f and/or g, and
its proof is a consequence of the definitions.

Proposition 4.62 Consider the optimization problem (4.42) and its dual
(4.43) in topological vector spaces, where the functions f and g are not
assumed to be convex. Define the optimal values of these problems by
 
p := inf f (x) + g(Ax) ,
x∈X  
d := sup − f ∗ (−A∗ y ∗ ) − g ∗ (y ∗ ) .
y ∗ ∈Y ∗

Then we always have the relationship d ≤ p.


Proof. It follows from the definitions of conjugate functions and adjoint operators that for any y∗ ∈ Y∗ we have
−f∗(A∗y∗) − g∗(−y∗) = −sup_{x∈X} {⟨A∗y∗, x⟩ − f(x)} − sup_{y∈Y} {⟨−y∗, y⟩ − g(y)}
= inf_{x∈X} {f(x) − ⟨y∗, Ax⟩} + inf_{y∈Y} {g(y) + ⟨y∗, y⟩}
≤ inf_{x∈X} {f(x) − ⟨y∗, Ax⟩} + inf_{x∈X} {g(Ax) + ⟨y∗, Ax⟩}
≤ inf_{x∈X} {f(x) − ⟨y∗, Ax⟩ + g(Ax) + ⟨y∗, Ax⟩}
= inf_{x∈X} {f(x) + g(Ax)} = p.

Taking the supremum with respect to all y∗ ∈ Y∗ yields d ≤ p. □
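Weak duality can be observed by brute force in one dimension; the data below (a nonconvex f, A the identity on R) are an assumed illustration showing that d ≤ p may be strict without convexity.

```python
import numpy as np

# Assumed data: A = identity on R, nonconvex f, g(y) = |y|.
xs = np.linspace(-5.0, 5.0, 20001)
f = np.minimum((xs - 1.0)**2, (xs + 1.0)**2)   # nonconvex double well
g = np.abs(xs)

p = np.min(f + g)                               # primal value (grid search)

# Dual value: g*(s) is the indicator of [-1, 1], so only |y*| <= 1 matters.
ss = np.linspace(-1.0, 1.0, 2001)
f_conj = np.array([np.max(s * xs - f) for s in ss])
d = np.max(-f_conj)                             # -f*(A*y*) - g*(-y*) maximized

# Here p ≈ 0.75 while d ≈ 0: a positive duality gap, yet d <= p holds.
assert d <= p + 1e-9
```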


The next (strong) duality theorem is the main result of this subsection
that reveals appropriate qualification conditions in topological vector spaces,
Banach spaces, and finite-dimensional spaces that ensure the equality between
optimal values of primal and dual problems. Since it is often easier to solve
the dual problem than the primal one, the duality theorem provides an effi-
cient way to deal with complicated problems of convex optimization, which
is important, in particular, in the design and justification of numerical algo-
rithms. Observe that the proof of this theorem is based on the developed
conjugate calculus, where the convexity of the functions in question is essen-
tial.
Theorem 4.63 Consider the optimization problem (4.42) and its dual (4.43).
In addition to the standing assumptions on f , g, and A formulated at the
beginning of this subsection, suppose that one of the following conditions (a)–
(d) holds:
(a) X and Y are LCTV spaces, and g is (finite and) continuous at ȳ := Ax̄ ∈ AX for some x̄ ∈ dom(f), i.e., A(dom(f)) ∩ cont(g) ≠ ∅, where cont(g) denotes the set of all points y ∈ Y at which g is continuous.
(b) X and Y are topological vector spaces, g is a polyhedral function, and the relaxed qualification condition
dom(g ◦ A) ∩ qri(dom(f)) ≠ ∅ (4.44)
is satisfied, which holds, in particular, when
dom(g) ∩ A(qri(dom(f))) ≠ ∅. (4.45)

(c) X and Y are Banach spaces, f and g are l.s.c., and the set
Z := cone(dom(g) − A(dom(f)))
is a closed subspace of Y.
(d) X = Rⁿ, Y = Rᵐ, and 0 ∈ ri(dom(g) − A(dom(f))).
Then we have the equality p = d. Furthermore, if the number p is finite, then the supremum in the definition of d is attained.

Proof. Due to Proposition 4.62, it remains to show that p ≤ d. Since the latter inequality is obvious when p = −∞, it suffices to consider the case where p ∈ R. We clearly have the equalities
p := inf_{x∈X} {f(x) + g(Ax)} = −sup_{x∈X} {⟨0, x⟩ − [f + (g ◦ A)](x)} = −[f + (g ◦ A)]∗(0).
Using the conjugate sum rule from Theorem 4.27 in the corresponding cases under consideration, we find x∗ ∈ X∗ such that
p = −[f + (g ◦ A)]∗(0) = −f∗(−x∗) − (g ◦ A)∗(x∗).
The conjugate chain rule from Theorem 4.28 gives us y∗ ∈ Y∗ satisfying
A∗y∗ = x∗ and (g ◦ A)∗(x∗) = g∗(y∗).
Therefore, we arrive at the relationships
p = −f∗(−A∗y∗) − g∗(y∗) ≤ d,
which also ensure that the supremum in the definition of d is attained. Finally, it is easy to observe from the definitions that the qualification condition (4.44) holds provided that the simpler one (4.45) is satisfied. This, therefore, completes the proof of the theorem. □
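When one of the qualification conditions holds, the equality p = d can also be seen numerically; the one-dimensional data f(x) = (x − 1)²/2, g(y) = |y|, A = identity below are an assumed illustration of case (a), since g is finite and continuous everywhere.

```python
import numpy as np

# Assumed data: f(x) = (x-1)^2/2, g(y) = |y|, A = identity on R.
xs = np.linspace(-5.0, 5.0, 20001)
p = np.min(0.5 * (xs - 1.0)**2 + np.abs(xs))   # primal value, grid search

# Closed-form conjugates: f*(s) = s^2/2 + s, and g* is the indicator of
# [-1, 1], so the effective dual constraint is |y*| <= 1.
ss = np.linspace(-1.0, 1.0, 2001)
d = np.max(-(0.5 * ss**2 + ss))                # dual value, attained at y* = -1

assert abs(p - d) < 1e-6                        # strong duality: p = d = 1/2
```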

Let us now consider a small albeit useful modification of problem (4.42)


with an affine inner mapping in the composition instead of a linear one. Given f : X → R, g : Y → R, and A : X → Y as above, fix an arbitrary vector b ∈ Y and formulate the primal minimization problem as
minimize f(x) + g(Ax − b) subject to x ∈ X. (4.46)
The dual maximization problem of (4.46) is defined by
maximize −f∗(A∗y∗) − g∗(−y∗) + ⟨y∗, b⟩ subject to y∗ ∈ Y∗. (4.47)
The next duality statements concerning the modified problems (4.46) and
(4.47) are direct consequences of Proposition 4.62 and Theorem 4.63.

Corollary 4.64 Similar to the above, denote the optimal values in the primal and dual problems (4.46) and (4.47) by, respectively,
p := inf_{x∈X} {f(x) + g(Ax − b)},
d := sup_{y∗∈Y∗} {−f∗(A∗y∗) − g∗(−y∗) + ⟨y∗, b⟩}.

Then we have p ≥ d without any convexity assumptions on f and g. Suppose


in addition to the convexity of f, g and the continuity of A that one of the
following conditions (a)–(c) is satisfied:
(a) X and Y are topological vector spaces, and g is continuous at some point ȳ := B(x̄) ∈ B(X) with x̄ ∈ dom(f), where B(x) := Ax − b for x ∈ X.
(b) X and Y are Banach spaces, f and g are l.s.c., and
Z := cone(dom(g) − B(dom(f)))
is a closed subspace of Y.
(c) X = Rⁿ, Y = Rᵐ, and 0 ∈ ri(dom(g) − B(dom(f))).

Then we have p = d. If furthermore p is a real number, then the supremum in the definition of d is attained.

Proof. Define g̃(y) := g(y − b) for y ∈ Y and apply Proposition 4.62 and Theorem 4.63 to problem (4.42), where g is replaced by g̃. Calculating the conjugate function of g̃ verifies all the conclusions of this corollary. □

In the rest of this subsection, we present some examples illustrating appli-


cations of the obtained duality results to particular classes of convex optimiza-
tion problems. Let us start with the following problem of convex optimization
with linear equality constraints in finite-dimensional and Banach space set-
tings. The reader can see that the dual problem is often easier to solve in
comparison with the primal one.

Example 4.65 Let f : X → R be a proper convex function, let A : X → Y


be a continuous linear operator, and let b ∈ Y . Consider the problem
minimize f (x) subject to Ax = b. (4.48)
This problem can be rewritten in the unconstrained composite form discussed
above in Corollary 4.64:
minimize f (x) + g(Ax − b) subject to x ∈ X,
where g(y) := δΩ (y) is the indicator function of the origin Ω = {0} ⊂ Y .
Recalling that g ∗ (y ∗ ) = σΩ (y ∗ ) = 0 for all y ∗ ∈ Y ∗ , the Fenchel dual problem
resulting from Corollary 4.64 is defined as

maximize −f∗(A∗y∗) + ⟨y∗, b⟩ subject to y∗ ∈ Y∗. (4.49)



To employ the conditions of Corollary 4.64 ensuring the (strong) duality p = d,
observe that the topological case (a) cannot be efficiently used here, since the
function g is never continuous at the boundary points of B(X). Thus we spec-
ify the duality conditions in finite-dimensional and Banach spaces:

• either X = Rn , Y = Rm , and b ∈ A(ri(dom(f ))),

• or X and Y are Banach spaces, f is l.s.c., and the set Z := cone(b − A(dom(f))) is a closed subspace of Y.

For another application of the duality theorem for (4.48), consider the problem of finding the Euclidean distance from a point x̄ ∈ Rⁿ to the set
Ω := {x ∈ Rⁿ | ⟨a, x⟩ = b} with 0 ≠ a ∈ Rⁿ, b ∈ R.
This can be written as the optimization problem
minimize f(x) := ½‖x − x̄‖² subject to ⟨a, x⟩ = b.
Since f∗(y∗) = ½‖y∗‖² + ⟨y∗, x̄⟩ for all y∗ ∈ Rⁿ, the Fenchel dual problem (4.49) is written as follows:
maximize −f∗(ta) + tb = −½t²‖a‖² − t⟨a, x̄⟩ + tb over t ∈ R.
This is a simple maximization problem on R, and the duality theorem yields
d = (⟨a, x̄⟩ − b)²/(2‖a‖²) = p.
Thus we arrive at the well-known formula for the Euclidean distance function:
d(x̄; Ω) = √(2p) = |⟨a, x̄⟩ − b|/‖a‖.
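The distance formula just derived is easy to verify numerically against the explicit projection onto the hyperplane; the vectors a, b, and x̄ below are assumed sample data.

```python
import numpy as np

# Assumed sample data for Ω = {x : <a, x> = b}.
a = np.array([3.0, -1.0, 2.0])
b = 4.0
xbar = np.array([1.0, 2.0, -1.0])

formula = abs(a @ xbar - b) / np.linalg.norm(a)   # |<a, x̄> - b| / ||a||
proj = xbar - ((a @ xbar - b) / (a @ a)) * a       # projection of x̄ onto Ω

assert abs(a @ proj - b) < 1e-12                   # proj indeed lies in Ω
assert abs(np.linalg.norm(xbar - proj) - formula) < 1e-12
```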
The next example concerns the class of unconstrained optimization prob-
lems with nondifferentiable objectives in arbitrary normed spaces.

Example 4.66 Given a continuous linear operator A : X → Y between normed spaces, a proper convex function f : X → R, and a vector b ∈ Y, we consider the following unconstrained problem:
minimize f(x) + ‖Ax − b‖ subject to x ∈ X. (4.50)
Denote g(y) := ‖y − b‖ for all y ∈ Y and observe the conjugate expression
g∗(y∗) = ⟨y∗, b⟩ + δ_B∗(y∗)
via the closed unit ball B∗ ⊂ Y ∗ . Then the Fenchel dual problem (4.47) is
written in the unconstrained form as

maximize −f∗(−A∗y∗) − ⟨y∗, b⟩ − δ_B∗(y∗) subject to y∗ ∈ Y∗,
which is equivalent to the constrained problem
maximize −f∗(−A∗y∗) − ⟨y∗, b⟩ subject to ‖y∗‖ ≤ 1. (4.51)
Note that the duality conditions from Corollary 4.64(a) are always satisfied
for the problem under consideration.
As an example of (4.50), examine the problem of finding the distance from a given point x̄ ∈ X to a nonempty convex subset Ω ⊂ X of a normed space. This problem can be formalized as follows:
minimize δΩ(x) + ‖x − x̄‖ subject to x ∈ X.
Based on (4.51) and (δΩ)∗ = σΩ, we get the dual problem defined by
maximize −σΩ(−u∗) − ⟨u∗, x̄⟩ subject to ‖u∗‖ ≤ 1.
It follows from the obtained duality theorem that
d(x̄; Ω) = sup_{‖u∗‖≤1} {−σΩ(−u∗) − ⟨u∗, x̄⟩} = sup_{‖u∗‖≤1} {⟨u∗, x̄⟩ − σΩ(u∗)}.
In particular, for Ω = {0} ⊂ X we arrive at the classical norm representation
‖x̄‖ = sup_{‖u∗‖≤1} ⟨u∗, x̄⟩.
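The norm representation above can be illustrated in (Rⁿ, ‖·‖₁), whose dual unit ball is the ‖·‖∞-ball; the supremum is then attained at u∗ = sign(x̄). The data below are an assumed numerical sketch.

```python
import numpy as np

# Assumed sketch of ||x|| = sup{ <u*, x> : ||u*||_* <= 1 } for the l1 norm,
# whose dual norm is l_inf.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)

primal = np.abs(x).sum()           # ||x||_1
u_star = np.sign(x)                 # feasible, since ||u*||_inf <= 1
attained = float(u_star @ x)        # equals ||x||_1

for _ in range(1000):               # no feasible u* exceeds the supremum
    u = rng.uniform(-1.0, 1.0, size=5)
    assert u @ x <= primal + 1e-12

assert abs(attained - primal) < 1e-12
```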

The last example addresses optimization problems with inclusion con-


straints and employs Theorem 4.63 in all the three space frameworks therein.

Example 4.67 Consider the constrained optimization problem given by


minimize f (x) subject to Ax ∈ Ω, (4.52)
where f and A satisfy the general assumptions of Theorem 4.63 while Ω ⊂ Y
is a nonempty convex set. Then the primal problem (4.52) can be reformulated
in the convex composite form:
minimize f (x) + δΩ (Ax) subject to x ∈ X.
The Fenchel dual problem (4.43) reads for (4.52) as
maximize − f ∗ (A∗ y ∗ ) − σΩ (−y ∗ ) subject to y ∗ ∈ Y ∗ ,
which is clearly equivalent to
maximize − σΩ (y ∗ ) − f ∗ (−A∗ y ∗ ) subject to y ∗ ∈ Y ∗ .
Theorem 4.63 yields strong duality under one of the following conditions:

• X, Y are topological vector spaces and A(dom(f)) ∩ int(Ω) ≠ ∅.

• X and Y are Banach spaces, f is l.s.c., Ω is closed, and the set cone(Ω −
A(dom(f ))) is a closed subspace of Y .

• X = Rn , Y = Rm , and 0 ∈ ri(Ω − A(dom(f ))).

We apply the obtained duality result in the last case to finding the distance from a point x̄ ∈ Rⁿ to the set
Θ := {x ∈ Rⁿ | ⟨a, x⟩ ≤ b} with 0 ≠ a ∈ Rⁿ and b ∈ R.
This problem can be rewritten in the optimization form of (4.52) as
minimize f(x) := ½‖x − x̄‖² subject to Ax ∈ Ω (4.53)
with A(x) := ⟨a, x⟩ for x ∈ Rⁿ and Ω := (−∞, 0] + b. The dual problem of
(4.53) is the one-dimensional problem written as
maximize −σΩ(t) − ½t²‖a‖² + t⟨a, x̄⟩ subject to t ∈ R,
where σΩ(t) = bt for t ≥ 0 and σΩ(t) = ∞ for t < 0. The latter problem can be easily solved, giving us the optimal value
d = 0 if ⟨a, x̄⟩ ≤ b, and d = (⟨a, x̄⟩ − b)²/(2‖a‖²) otherwise.
Thus the duality theorem tells us that the distance in question is calculated by
d(x̄; Θ) = 0 if ⟨a, x̄⟩ ≤ b, and d(x̄; Θ) = (⟨a, x̄⟩ − b)/‖a‖ otherwise.
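The piecewise distance formula for the half-space Θ can be cross-checked against the explicit projection x − max(⟨a, x⟩ − b, 0)/‖a‖² · a; the data a, b below are assumed for illustration.

```python
import numpy as np

def dist_halfspace(xbar, a, b):
    # Distance formula from the example: 0 inside Θ, (<a,x> - b)/||a|| outside.
    s = float(a @ xbar - b)
    return 0.0 if s <= 0.0 else s / np.linalg.norm(a)

a = np.array([1.0, 2.0])
b = 3.0

assert dist_halfspace(np.array([0.0, 0.0]), a, b) == 0.0   # interior point

# Outside point: compare with the projection onto the half-space.
xbar = np.array([2.0, 2.0])
proj = xbar - max(float(a @ xbar - b), 0.0) / float(a @ a) * a
assert a @ proj <= b + 1e-12
assert abs(np.linalg.norm(xbar - proj) - dist_halfspace(xbar, a, b)) < 1e-12
```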

4.6.2 Duality Theorems via Generalized Relative Interiors

In this subsection, we focus on the Fenchel duality of convex optimization


problems formulated in general LCTV spaces. Our goal is to establish a duality
theorem in this infinite-dimensional framework, which goes in the way of the
finite-dimensional result of Theorem 4.63(c) expressed in terms of the relative
interior. To furnish it, we are going to use the generalized relative interior
notions introduced and studied in Section 2.5.
It is more convenient in this subsection to consider the primal optimization
problem in the following difference form:
minimize f (x) − g(x) subject to x ∈ X, (4.54)
where f : X → (−∞, ∞] is a proper convex function, while g : X → [−∞, ∞) is a proper concave function, i.e., such that the function −g is proper and convex. Note that (4.54) is a minimization problem with the convex objective f + (−g).
Along with the Fenchel (convex) conjugate f ∗ from Definition 4.1, define
the concave conjugate of g by
 
g∗ (x∗ ) := inf x∗ , x − g(x) for all x∗ ∈ X ∗ (4.55)
and observe that we do not have g∗ = (−g)∗ while getting the relationship
g∗ (x∗ ) = −(−g)∗ (−x∗ ) whenever x∗ ∈ X ∗ .
Recall also that the concavity of a function g : X → [−∞, ∞) can be fully characterized by the convexity of its hypograph
hypo(g) := {(x, α) ∈ X × R | α ≤ g(x)}.
Before the formulation and proof of the main duality theorem given below,
we present the following simple lemma about some properties of intrinsic
relative and quasi-relative interiors as well as quasi-regularity of convex sets
that are taken from Definition 2.168.
Lemma 4.68 Let Ω be a convex subset of a topological vector space X, and
let q ∈ X. Then we have:
(a) iri(q + Ω) = q + iri(Ω).
(b) qri(q + Ω) = q + qri(Ω).
(c) Ω is quasi-regular if and only if Ω + q is quasi-regular.
Proof. Fix any x ∈ Ω and observe easily that
cone(q + Ω − x) = cone(Ω − (x − q)) and cl(cone(q + Ω − x)) = cl(cone(Ω − (x − q))).
Then we deduce from the definitions of iri and qri that x ∈ iri(q + Ω) if and
only if x − q ∈ iri(Ω), and that x ∈ qri(q + Ω) if and only if x − q ∈ qri(Ω).
This readily verifies both assertions (a) and (b). Assertion (c) follows directly
from (a) and (b) and the definition of quasi-regularity. 
Now we are ready to establish the aforementioned duality theorem for
problem (4.54) written in the difference form.
Theorem 4.69 Let f : X → (−∞, ∞] be a proper convex function, and let
g : X → [−∞, ∞) be a proper concave function defined on an LCTV space X.
Then we have the duality relationship
inf{f(x) − g(x) | x ∈ X} = sup{g∗(x∗) − f∗(x∗) | x∗ ∈ X∗} (4.56)
provided that the following conditions are satisfied simultaneously:
(a) qri(dom(f)) ∩ qri(dom(g)) ≠ ∅.
(b) All the three convex sets dom(f) − dom(g), epi(f), and epi(f) − hypo(g)
are quasi-regular.
Proof. Observe first that for any x ∈ X and x∗ ∈ X∗ we have the inequalities
f(x) + f∗(x∗) ≥ ⟨x∗, x⟩ ≥ g(x) + g∗(x∗),
which immediately yield the estimate
inf{f(x) − g(x) | x ∈ X} ≥ sup{g∗(x∗) − f∗(x∗) | x∗ ∈ X∗}.
Denoting α := inf{f (x) − g(x) | x ∈ X}, it is easy to see that (4.56) holds if
α = −∞. Considering the case where α is finite, we are going to show that
there exists x∗ ∈ X ∗ such that g∗ (x∗ ) − f ∗ (x∗ ) ≥ α, which would readily
justify (4.56). To proceed, define the sets
Ω1 := epi(f) and Ω2 := {(x, μ) ∈ X × R | μ ≤ g(x) + α}.
Since the set epi(f ) is quasi-regular, we get by Theorem 2.190 that
qri(Ω1) = {(x, λ) ∈ X × R | x ∈ qri(dom(f)), f(x) < λ},
qri(Ω2) ⊃ {(x, μ) ∈ X × R | x ∈ qri(dom(g)), μ < g(x) + α}.
It follows from the qualification condition qri(dom(f)) ∩ qri(dom(g)) ≠ ∅
in (a) that qri(Ω1) ≠ ∅ and qri(Ω2) ≠ ∅. Thus qri(Ω1 × Ω2) = qri(Ω1) ×
qri(Ω2) ≠ ∅; see Exercise 2.226. Using f(x) ≥ g(x) + α for all x ∈ X yields
qri(Ω1) ∩ Ω2 = ∅, and so qri(Ω1) ∩ qri(Ω2) = ∅.
Observing further that Ω2 = hypo(g) + {(0, α)}, we get
Ω1 − Ω2 = epi(f ) − hypo(g) − {(0, α)}.
It follows from Lemma 4.68 and the imposed assumptions in (b) that the
set Ω1 − Ω2 is quasi-regular. This allows us to apply Theorem 2.184, which
ensures that the sets Ω1 and Ω2 can be properly separated. Thus there exists
a pair (u∗ , β) ∈ X ∗ × R satisfying the following two conditions:
inf{⟨u∗, x⟩ + βλ | (x, λ) ∈ Ω1} ≥ sup{⟨u∗, y⟩ + βμ | (y, μ) ∈ Ω2},
sup{⟨u∗, x⟩ + βλ | (x, λ) ∈ Ω1} > inf{⟨u∗, y⟩ + βμ | (y, μ) ∈ Ω2}.
This gives us a constant γ ∈ R such that
⟨u∗, x⟩ + βλ ≥ γ ≥ ⟨u∗, y⟩ + βμ (4.57)
whenever (x, λ) ∈ Ω1 and (y, μ) ∈ Ω2 . If β = 0, then we have
inf{⟨u∗, x⟩ | x ∈ dom(f)} ≥ sup{⟨u∗, y⟩ | y ∈ dom(g)},
sup{⟨u∗, x⟩ | x ∈ dom(f)} > inf{⟨u∗, y⟩ | y ∈ dom(g)}.
Thus the sets dom(f) and dom(g) can be properly separated, which implies
by the characterization of Theorem 2.184 that
qri(dom(f)) ∩ qri(dom(g)) = ∅
under the assumptions made. The obtained contradiction verifies that β ≠ 0.
It follows from the structure of Ω1 that for any fixed x0 ∈ dom(f) we have
(x0, f(x0) + k) ∈ Ω1 whenever k ∈ N. Thus we deduce from (4.57) that
⟨u∗, x0⟩ + β(f(x0) + k) ≥ γ for all k ∈ N,
which yields β ≥ 0, and hence β > 0. It also follows from (4.57) that
⟨u∗/β, x⟩ + f(x) ≥ γ/β ≥ ⟨u∗/β, y⟩ + g(y) + α for all x ∈ dom(f), y ∈ dom(g).
Letting x∗ := −u∗/β and redefining γ := −γ/β brings us to the inequalities
f(x) ≥ ⟨x∗, x⟩ − γ and ⟨x∗, y⟩ − γ ≥ g(y) + α (4.58)
for all x ∈ dom(f) and all y ∈ dom(g).
The first one in (4.58) shows that
γ ≥ sup{⟨x∗, x⟩ − f(x) | x ∈ dom(f)} = f∗(x∗),
while the second inequality in (4.58) tells us that
γ + α ≤ inf{⟨x∗, y⟩ − g(y) | y ∈ dom(g)} = g∗(x∗).
Thus α ≤ g∗ (x∗ ) − f ∗ (x∗ ), which completes the proof of the theorem. 
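The duality relationship (4.56) can be observed numerically in a one-dimensional example (our own toy illustration, not from the book): take f(x) = x²/2 and g(x) = −|x|, so that dom(f) = dom(g) = R and the qualification conditions hold trivially; the primal infimum is 0 at x = 0, and the dual supremum is likewise 0 at x∗ = 0, since g∗(x∗) = 0 for |x∗| ≤ 1 and f∗(x∗) = x∗²/2. A grid approximation:

```python
import numpy as np

xs = np.linspace(-20.0, 20.0, 40_001)       # primal grid standing in for X = R
f = lambda x: 0.5 * x**2                    # proper convex
g = lambda x: -np.abs(x)                    # proper concave

# primal value of (4.54): inf_x { f(x) - g(x) }
primal = np.min(f(xs) - g(xs))

xstars = np.linspace(-5.0, 5.0, 401)        # dual grid standing in for X*
fstar = np.array([np.max(t * xs - f(xs)) for t in xstars])  # Fenchel conjugate
gstar = np.array([np.min(t * xs - g(xs)) for t in xstars])  # concave conjugate (4.55)

# dual value of (4.56): sup_{x*} { g*(x*) - f*(x*) }
dual = np.max(gstar - fstar)
```

On the truncated grid, gstar is finite even where the true concave conjugate equals −∞; this only lowers the dual value further, so comparing `primal` and `dual` still illustrates the zero duality gap asserted by the theorem.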
In the rest of this subsection, we present three useful consequences of
Theorem 4.69. The first result is based on the sufficient condition for quasi-
regularity via the SNC property from Definition 2.186.
Corollary 4.70 Let X be a Hilbert space, and let f : X → (−∞, ∞] and
g : X → [−∞, ∞) be as in Theorem 4.69. Suppose that all the sets dom(f ) −
dom(g), epi(f ), and epi(f ) − hypo(g) are closed and SNC with nonempty
intrinsic relative interiors, and that the qualification condition
qri(dom(f)) ∩ qri(dom(g)) ≠ ∅
is satisfied. Then we have the Fenchel duality (4.56).
Proof. Theorem 2.188 tells us that the sets dom(f ) − dom(g), epi(f ), and
epi(f ) − hypo(g) are quasi-regular under the imposed assumptions. Applying
Theorem 4.69, we arrive at the conclusion of the corollary. 
The next consequence of Theorem 4.69 involves the relative interior notion
for convex sets in LCTV spaces defined in (2.49). Recall that, in contrast to
the case of finite-dimensional spaces, nonempty convex sets may have empty
relative interiors in infinite dimensions.
Corollary 4.71 Let X be an LCTV space, and let f : X → (−∞, ∞] and
g : X → [−∞, ∞) be as in Theorem 4.69. Suppose that the sets dom(f ) −
dom(g), epi(f ), and epi(f ) − hypo(g) have nonempty relative interiors and
that the qualification condition
ri(dom(f)) ∩ ri(dom(g)) ≠ ∅ (4.59)
is satisfied. Then we have the Fenchel duality (4.56).
Proof. Since the sets dom(f ) − dom(g), epi(f ), and epi(f ) − hypo(g) have
nonempty relative interiors, we apply Theorem 2.174 and conclude that they
are quasi-regular. The duality result now follows from Theorem 4.69. 
4.7 Exercises for Chapter 4
Exercise 4.72 Calculate the Fenchel conjugate of each of the following func-
tions defined on Rn :
(a) f(x) := (1/2)⟨x, Ax⟩ + ⟨b, x⟩ + c, where A is a symmetric positive definite
matrix of order n, where b ∈ Rn, and where c ∈ R.
(b) g(x) := Σ_{i=1}^{n} xi ln(xi) if xi > 0 for all i = 1, . . . , n, and g(x) := ∞ otherwise.
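For part (a), a candidate formula can be sanity-checked numerically before a proof is attempted. The sketch below (our own illustration; the matrix, vectors, and tolerances are arbitrary) tests the guess f∗(x∗) = (1/2)⟨x∗ − b, A⁻¹(x∗ − b)⟩ − c against the defining supremum, using the stationarity condition x∗ = Ax + b for the smooth concave maximization, together with the Fenchel–Young inequality at random points:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)
c = 0.7

f = lambda x: 0.5 * x @ A @ x + b @ x + c

def fstar_guess(y):
    # candidate closed form: (1/2)<y - b, A^{-1}(y - b)> - c
    return 0.5 * (y - b) @ np.linalg.solve(A, y - b) - c

y = rng.standard_normal(n)

# sup_x { <y, x> - f(x) } is attained where y = Ax + b, i.e., x = A^{-1}(y - b)
x_opt = np.linalg.solve(A, y - b)
sup_val = y @ x_opt - f(x_opt)
assert abs(fstar_guess(y) - sup_val) < 1e-10

# Fenchel-Young inequality f(x) + f*(y) >= <y, x> at random sample points
for x in rng.standard_normal((100, n)):
    assert f(x) + fstar_guess(y) >= y @ x - 1e-10
```

Such a check does not replace the derivation asked for in the exercise, but it quickly rules out sign errors in a conjectured conjugate.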
Exercise 4.73 Let f : Rn → R be a proper convex function. Prove that f ∗ is
also a proper convex function. Is the converse implication true?
Exercise 4.74 Prove the second equality in Proposition 4.19(c).
Exercise 4.75 Give a detailed proof of Theorem 4.23 under the assumptions
in (c).
Exercise 4.76 (a) Give detailed proofs of Theorem 4.27 in cases (b) and (c).
(b) Extend the results of Theorem 4.27 to the case of finitely many functions.
Exercise 4.77 Let the mappings A : X → Y and g : Y → R satisfy the gen-
eral assumptions of Theorem 4.28. Give detailed proofs of the conjugate chain
rule of this theorem under assumptions (b) and (c).
Exercise 4.78 In the setting of Theorem 4.29 do the following:
(a) Give a detailed proof in case (b).
(b) Extend the results of the theorem to the case of finitely many functions.
Exercise 4.79 (a) Clarify the possibility to derive the support function
intersection rule of Theorem 4.23 for the case of nonpolyhedral convex
sets Ω1 , Ω2 in LCTV spaces by replacing the relative interior qualifica-
tion condition in assertion (d) of this theorem with the quasi-relative
qualification condition qri(Ω1) ∩ qri(Ω2) ≠ ∅.
(b) Clarify the possibility to avoid the polyhedrality assumption on the function f
in case (b) of Theorem 4.27 by replacing the qualification condition (4.16) with
qri(dom(f)) ∩ qri(dom(g)) ≠ ∅. (4.60)
(c) Clarify the possibility to avoid the polyhedrality assumption on the
function g in case (b) of Theorem 4.28 by replacing the qualification
condition AX ∩ dom(g) ≠ ∅ therein with AX ∩ qri(dom(g)) ≠ ∅.
(d) Clarify the possibility to avoid the polyhedrality assumption on f in The-
orem 4.29(b) by replacing the qualification condition (4.16) with (4.60).
Exercise 4.80 Let X be an LCTV space.
(a) Given ∅ = Ω ⊂ X, show that the support function σΩ : X ∗ → R is convex.
(b) Given two nonempty, closed, and convex sets Ω1 , Ω2 in X, show that
Ω1 ⊂ Ω2 if and only if σΩ1 (x∗ ) ≤ σΩ2 (x∗ ) for all x∗ ∈ X ∗ .
Exercise 4.81 Let X be an LCTV space, and let Ω be a nonempty subset
of X. Prove that (δΩ)∗ = σΩ and (δΩ)∗∗ = δco(Ω), where co(Ω) stands for
the closed convex hull of Ω.
Exercise 4.82 For the case of finite-dimensional spaces, clarify relationships
between the Attouch-Brezis qualification condition and its versions used in
the calculus rules of Subsection 4.2.2 and the corresponding relative interior
qualification conditions used for calculus rules in Chapter 3.
Exercise 4.83 Prove Theorem 4.54 in the general case where T is compact. Hint:
Proceed similarly to the proof of this theorem given in the case where T is sequen-
tially compact, with the replacement of the sequences therein by nets.
Exercise 4.84 Let A be an m × n matrix with the transpose/adjoint matrix
A∗, and let b ∈ Rm be a given vector.
(a) Consider the function ϕ(x) := f (Ax + b) for x ∈ Rn , where f : Rm → R
is a convex function. Prove that
∂ϕ(x) = A∗ ∂f (Ax + b) for all x ∈ Rn .
(b) Consider the function ϕ(x) := ‖Ax − b‖ on Rn and calculate the subdif-
ferential ∂ϕ(x) at any point x ∈ Rn.
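For part (a), when f happens to be differentiable the formula reduces to the smooth chain rule ∇ϕ(x) = A∗∇f(Ax + b), which can be verified numerically. The sketch below (our own illustration, with the log-sum-exp function standing in for a generic smooth convex f) compares the formula against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# a smooth convex function on R^m with a known gradient
f = lambda u: np.log(np.sum(np.exp(u)))            # log-sum-exp
grad_f = lambda u: np.exp(u) / np.sum(np.exp(u))

phi = lambda x: f(A @ x + b)

def grad_phi_chain(x):
    # the subdifferential formula specializes here to A^T grad f(Ax + b)
    return A.T @ grad_f(A @ x + b)

def grad_phi_fd(x, h=1e-6):
    # central finite differences as an independent check
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (phi(x + e) - phi(x - e)) / (2 * h)
    return g

x = rng.standard_normal(n)
err = np.max(np.abs(grad_phi_chain(x) - grad_phi_fd(x)))
```

The general statement of the exercise concerns set-valued subdifferentials; this check only covers the single-valued smooth case.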
Exercise 4.85 Let X be a normed space, and let Ω be a nonempty, closed,
and convex subset of X. Prove the distance function representation
d(x; Ω) = sup{⟨x∗, x⟩ − σΩ(x∗) | ‖x∗‖ ≤ 1}.
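As a concrete check of this representation (our own illustration, not part of the book), take Ω the closed unit Euclidean ball in R², for which σΩ(x∗) = ‖x∗‖ and d(x; Ω) = max(‖x‖ − 1, 0); the supremum over the dual ball is approximated by sampling on a polar grid:

```python
import numpy as np

# sample the dual ball { x* : ||x*|| <= 1 } in R^2 on a polar grid
thetas = np.linspace(0.0, 2.0 * np.pi, 721)
radii = np.linspace(0.0, 1.0, 201)
T, R = np.meshgrid(thetas, radii)
dual_pts = np.stack([R * np.cos(T), R * np.sin(T)], axis=-1).reshape(-1, 2)

sigma = np.linalg.norm(dual_pts, axis=1)   # support function of the unit ball

def dist_via_duality(x):
    # sup over sampled x* of <x*, x> - sigma_Omega(x*)
    return np.max(dual_pts @ x - sigma)

def dist_exact(x):
    return max(np.linalg.norm(x) - 1.0, 0.0)

for x in [np.array([3.0, 0.0]), np.array([0.3, -0.4]), np.array([1.0, 2.0])]:
    assert abs(dist_via_duality(x) - dist_exact(x)) < 1e-3
```

Note that the sampled supremum automatically includes x∗ = 0, so the approximation returns 0 for points of Ω, matching d(x; Ω) = 0 there.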
Exercise 4.86 Formulate and prove the counterparts of Theorem 4.60 in the
cases of Banach and finite-dimensional spaces.
Exercise 4.87 In the setting of Theorem 4.63 do the following:
(a) Give a detailed proof of the theorem under the assumptions in (b).
(b) Clarify the possibility to drop the polyhedrality assumption on g in (b)
with the replacement of dom(g ◦ A) and dom(g) by qri(dom(g ◦ A)) and
qri(dom(g)) in (4.44) and (4.45), respectively.
Exercise 4.88 Let X be a real vector space, and let X′ be its algebraic dual
(3.107).
(a) Clarify the possibility of the corresponding extensions of the generalized dif-
ferential calculus results, which are obtained in Subsections 4.1.2, 4.4.2, and
4.5.1 in the case of topological vector spaces with the usage of the set interior
and the function continuity, to vector spaces without topology by using cores.
(b) Is it possible to derive such results in vector spaces directly from those men-
tioned in (a) by using the convex core topology described in Exercise 2.207?
Exercise 4.89 Verify the possibility of vector space extensions of the conjugate
calculus results obtained in Subsections 4.1.3 and 4.5.2 by either developing simi-
lar proofs in vector spaces, or by using the convex core topology as in Exercise 4.88.
4.8 Commentaries to Chapter 4
The main focus of this chapter is on duality, which is at the core of convex analy-
sis and its applications. As mentioned in the commentaries to Chapter 3 (see also
Chapter 7 and the commentaries therein), the generalized differential concepts
and calculus rules of convex analysis have been widely extended to general frame-
works of nonconvex variational analysis. However, this is not the case for convex
duality and its applications based on conjugate functions and their calculus.
The notion of conjugate functions from Definition 4.1 was introduced and
largely developed by Fenchel in [130] and [131] for convex functions on finite-
dimensional spaces. In more special contexts, similar notions were originated by
Adrien-Marie Legendre (1752–1833) for gradient mappings in classical
mechanics and by William Henry Young (1863–1942) in the framework of non-
decreasing convex functions defined on the set of nonnegative numbers; see [357].
In some publications, conjugate functions are called the Legendre-Fenchel and
Young-Fenchel transforms. After the seminal work by Fenchel [130, 131] in finite
dimensions, various properties of conjugates were studied by Fenchel’s student
Brøndsted [63], Moreau [262, 263], and Rockafellar [304] in infinite-dimensional
spaces. Subsequent developments on Fenchel conjugates and their applications
can be found in the books by Bauschke and Combettes [34], Borwein and Lewis
[48], Boţ [55], Castaing and Valadier [71], Hiriart-Urruty and Lemaréchal [164,
165], Ioffe and Tikhomirov [174], Rockafellar [306, 309], Rockafellar and Wets
[317], Simons [326], Zălinescu [361], and the references therein.
The fundamental biconjugate results of Theorem 4.15 and the conjugate sub-
differential relationships in Theorem 4.17 are due to Fenchel [131] in finite dimen-
sions and to Moreau [263] in LCTV spaces.
Support functions, which were originally defined by Minkowski [220] for
bounded convex sets in finite dimensions, have been largely studied and
applied in convex analysis. Theorem 4.23 on support functions to set intersec-
tions plays a special role in deriving subsequent results of conjugate calculus
presented in this book. The given proof of Theorem 4.23 follows our paper
[239] by using the geometric approach based on set extremality and the con-
vex extremal principle in LCTV spaces. This approach allows us to derive the
support intersection rule not only under the conventional qualification condi-
tions in (a) and (d), but under the conditions in (b), which are weaker than
those in (a) in normed spaces. It seems furthermore that the intersection rule
of Theorem 4.23 under the assumptions in (b) cannot be captured by calculus
rules for general Fenchel conjugates of extended-real-valued convex functions.
Just the opposite, in Subsection 4.1.3 we follow the device of our paper with
Rector and Tran [242] and derive basic rules of conjugate calculus in LCTV and
finite-dimensional spaces geometrically by reducing them to the above support
intersection rule. This is different from the more involved analytic arguments used,
e.g., in the books by Boţ [55], Simons [326], Rockafellar [306] and Zălinescu [361].
The new conjugate rules under polyhedrality assumptions in LCTV spaces, which
are associated with the support function intersection rule in Theorem 4.23(c), are
mainly taken from our recent preprint with Cuong and Sandine [91]. Their finite-
dimensional counterparts are due to Rockafellar [306].
Section 4.2 presents enhanced conjugate and generalized differential cal-
culus rules in Banach spaces under the closedness and lower semicontinuity
assumptions on the sets and functions involved. Such assumptions are required
for furnishing limiting procedures, and we employ here those developed by
Attouch and Brezis in [10] that are given in Lemmas 4.30 and 4.31. Differ-
ent from the context of paper [10], which uses such arguments to establish a
Banach space improvement of the conjugate sum rule in Theorem 4.27 under
the qualification condition
cone(dom(f) − dom(g)) is a closed subspace of X, (4.61)
we first prove the refined support intersection rule of Theorem 4.33 under the
corresponding qualification condition (4.24). Traditionally, both conditions
(4.24) and (4.61) are labeled as the Attouch-Brezis qualification condition as
it is called in the book. Observe to this end that in the case where the subspace
in (4.61) is the entire space X, the dom-dom condition was used in conjugate
calculus and related topics by Rockafellar [309] when the space X is reflexive
and by Robinson [300] in the general Banach space setting. Let us mention
the following interesting result by Ernst and Théra [124]: If the conjugate sum
rule of Theorem 4.41 holds for any l.s.c. convex function g : X → R under all
the assumptions therein but the Attouch-Brezis qualification condition (4.61),
then f must be continuous at x. Another necessary and sufficient condition
for the fulfillment of the conjugate sum rule in Banach spaces was obtained
by Burachik and Jeyakumar [67] in terms of the weak∗ closedness property of
the epigraphical sum epi(f ∗ ) + epi(g ∗ ).
In Subsection 4.2.2, we follow the pattern of our paper [242] in Banach
spaces, although the presented conjugate and generalized differential calculus
rules and/or their specifications can be found in one or another place in the
literature under various qualification conditions; see, e.g., the books [34, 42,
54, 55, 174, 210, 237, 294, 309, 361] and the references therein.
Section 4.3 contains classical material on directional derivatives of convex
functions that is used in Section 4.4 for the subdifferential study of supremum
functions. The main result of Section 4.4, Theorem 4.54, is also well known.
We refer the reader to Valadier [341], Pshenichnyi [294], Ioffe and Levin [172],
Ioffe and Tikhomirov [174], and Zălinescu [361] with the bibliographies therein
for the results of this type under the continuity assumption on g(·, x). Note
that subdifferentiation of the maxima of infinitely many convex functions
over compact index sets is significantly more challenging than for the case
when only a finite number of functions are involved in the maximization. The
latter class of maximum functions was first investigated in subdifferential
theory by Dubovitskii and Milyutin [113] and Danskin [95]; see Demyanov
and Malozemov [97] for related results and also Demyanov and Rubinov [101]
for further extensions to the class of quasidifferentiable functions.
In more recent years, strong attention has been paid to subdifferentiation
of convex supremum functions (4.29), where great progress has been made
in the following two directions: (1) the function g(t, ·) under the supremum
operation is discontinuous at the reference point x of its domain, and (2) the
index set T is not compact and may be an arbitrary set with no topology
involved. The results in these extended settings added the normal cone to
the domain of the supremum function to the subdifferential formula (4.29)
and also used small perturbations of the index set T and ε-expansions of the
convex subgradient mappings for g(·, x). More details can be found in, e.g.,
Correa et al. [82], Hantoute et al. [151], and the bibliographies therein with
applications to convex semi-infinite programs (SIPs).
Versions of subdifferential results for nonconvex maximum and supremum
functions have also been developed in nonsmooth analysis. Namely, Clarke
evaluated in [74, 76] his generalized gradients for pointwise maxima of Lips-
chitz continuous functions over compact sets under appropriate assumptions
that allowed him to reduce the situation to convex analysis. Subsequent devel-
opments in this direction were given by Hiriart-Urruty [159] (see also his book
with Lemaréchal [164]) and by Zheng and Ng [367]. Later on, Borwein and
Zhu [53] established some “fuzzy” upper estimates of regular/Fréchet subgra-
dients for pointwise maxima of Lipschitzian functions. In the other lines of
developments, Mordukhovich and Nghia [249, 250] evaluated Clarke’s general-
ized gradients of Lipschitzian supremum functions (4.29) over metrizable (not
generally compact) index sets T as well as Mordukhovich’s limiting subgradi-
ents of (4.29) over arbitrary index sets with applications to nonconvex SIPs
described by Lipschitzian functions. The reader can find further results and
discussions in [249, 250] and in Mordukhovich’s book [229, Chapter 8] with
the commentaries therein. Quite recently, Pérez-Aros [289] obtained upper
estimates of the regular and limiting subdifferentials of pointwise suprema of
l.s.c. functions.
It has been realized in convex analysis that subdifferentiation of pointwise
maximum/supremum functions and integral functionals goes hand in hand.
However, the latter issue requires a deep excursion into measure theory and
theory of measurable multifunctions, which is beyond the scope of this book.
Nevertheless, we discuss in what follows some basic results and perspectives in
this direction while referring the reader to the extensive bibliography address-
ing the function f : X → R of the type
f(x) := ∫_T g(t, x) μ(dt), (4.62)
where (T, μ) is a measure space, X is an LCTV space, and g : T × X → R
is μ-measurable in t and l.s.c. in x. Paying the main attention to the case
of convexity of g in x, we’ll also mention some publications dealing with
nonconvex functions.
To the best of our knowledge, the first result on calculating the subdif-
ferential of the integral functional (4.62) generated by the family of convex
integrands gt (x) := g(t, x) was obtained by Strassen [330] who derived the
generalized Leibniz rule
∂f(x) = ∫_T ∂gt(x) μ(dt), (4.63)
in the case of sublinear functions gt, with the integral of a set-valued mapping
understood in the sense of Aumann [15]. In the aforementioned paper [341],
Valadier established the Leibniz rule (4.63) for convex functions on separable
LCTV spaces under the continuity assumption on gt at x. The more general
case of (4.62), where the functions gt are convex and l.s.c. on X and where
x ∈ dom(f ), was first considered by Ioffe and Tikhomirov [173] for X = Rn
with adding to the Leibniz rule (4.63) the normal cone to the domain of f ,
i.e., getting the formula
∂f(x) = ∫_T ∂gt(x) μ(dt) + N(x; dom(f)), (4.64)
which is similar to the situation with differentiation of the pointwise maxima
discussed above. An extension of the latter result to separable Banach spaces
X was furnished by Ioffe and Levin [172], where the reader can find additional
references for the initial period of Leibniz-type rules in convex analysis.
In a parallel line of developments, Rockafellar [305] introduced the notion
of normal integrands, which turned out to be crucial for subsequent investiga-
tions and applications of integral functionals in convex analysis and beyond.
Being defined in terms of the measurability and closed-valuedness of epigraph-
ical multifunctions epi(gt(·)) : T ⇉ X × R, normal integrands constitute a conve-
nient territory of analysis and applications while unifying Castaing’s theory of
measurable multifunctions with the basic machinery of convex analysis, pro-
vided that the integrand functions gt (x) are convex. Proceeding in this way,
deep results on duality and other properties of convex integral functionals
and their conjugates were established by Rockafellar [305, 308, 309], Castaing
and Valadier [71], and their followers. We refer the reader to, e.g., the recent
paper by Correa et al. [83] with the bibliography therein for the current stage
of developments in this and related directions.
The first nonconvex extension of the Leibniz rule (4.63) was obtained by
Clarke [75] (see also [76]) in terms of his generalized gradients of Lipschitzian
functions by replacing the equality in (4.63) with the inclusion “⊂”, while
the equality was proved therein under an additional regularity condition. The
construction of the generalized gradients (discussed in Chapter 7 below) was
instrumental to reduce the nonconvex case to the convex one resolved by Ioffe
and Levin [172].
Mordukhovich obtained in [228, Lemma 6.18] the first version of the Leib-
niz rule for his limiting subdifferential of locally Lipschitzian functions defined
on separable and reflexive Banach spaces in the inclusion form with the
replacement of the right-hand side in (4.63) by its norm closure for the Bochner
integral on T = [0, 1] in the case of infinite-dimensional spaces. Far-going
extensions of this result in various directions, with and without the closure
operation, were established by Mordukhovich and Sagara [255] by replacing
the Bochner integral with the more suitable Gelfand integration in duals of
Banach spaces. While the main motivation and applications in [228] came from
optimal control, those in [255] were strongly addressed in stochastic dynamic
programming in Banach spaces and economic modeling; see also [256]. Note
that the fundamental Lyapunov convexity theorem on the range of nonatomic
vector measures (due to Alexey Lyapunov [213]) and its infinite-dimensional
versions (see, e.g., [106]) play an important role in integration of set-valued
mappings.
Various results on subdifferentiation of nonconvex integral functionals have
been subsequently obtained in the literature over the years. We refer the reader
to, e.g., Chieu [72], Correa et al. [84], Giner and Penot [142], Mordukhovich
and Pérez-Aros [252], and the bibliographies therein. Observe that the results
of [84] address both Lipschitzian and non-Lipschitzian functionals and estab-
lish, in particular, a limiting subdifferential version of the extended Leibniz
rule (4.64) in the latter case. Note also that the quite recent paper by Mor-
dukhovich and Pérez-Aros [251] applies new extremal principles and integra-
tion of normal cone mappings to evaluate normal cones to essential intersec-
tions of random constraint sets defined on measure spaces. Finally, the same
authors establish in [253], for the first time in the literature, Leibniz-type rules
for coderivatives of expected-integral multifunctions related, in particular, to
two-stage stochastic programming. The results obtained in [251–253] are new
even in convex frameworks. We also refer the reader to the survey article by
Hess [158] for a broad overview on the integration of random set-valued map-
pings and set-valued probability theory with numerous applications to various
specific classes of stochastic problems.
The vast majority of publications on generalized differentiation of integral
functionals have been motivated by applications to various classes of problems
in stochastic optimization, probabilistic constraint systems, and related top-
ics. Some of such applications can be found in the aforementioned papers and
the references therein. Let us specially mention the recent paper by Hantoute
et al. [150] dealing with the so-called probabilistic functions, where practical
models require subdifferentiation of integral functionals in infinite dimensions
with Lipschitzian integrands described, in particular, by functions of the max-
imum and minimum types. The results [150] on evaluating the Clarke and
Mordukhovich subdifferentials of integral functionals (where the latter con-
struction allowed the authors to deal with not only maximum functions but
also with minimum ones) brought them to establishing efficient estimates for
probability functions associated with multivariate Gaussian distributions and
random inequality systems. Other recent developments in this direction with
applications to stochastic optimization and related topics can be found in the
papers by Burke et al. [68] and by Dentcheva and Ruszczyński [104].
The class of marginal/optimal value functions (4.33) considered in Sec-
tion 4.5 has been highly important in variational analysis, optimization, and
applications; see, e.g., the books [76, 77, 228, 229, 317, 343] with the refer-
ences and discussions therein. Besides applications to sensitivity analysis in
optimization and control, marginal functions have been realized as an effi-
cient machinery to derive necessary optimality conditions, to establish cal-
culus rules of generalized differentiation, to investigate viscosity solutions of
partial differential and stochastic differential equations, to study optimistic
and pessimistic versions of bilevel programming, to support marginal price
equilibria in economic modeling, etc. Due to the intrinsic nondifferentiability
of marginal functions, subgradient evaluations of them play a key role in their
theory and applications. Besides the aforementioned books, various results in
this direction, mainly in terms of the generalized gradient and limiting subd-
ifferential, can be found in [16, 42, 98–100, 110, 139, 159, 211, 217, 227, 230,
232, 241, 247, 255, 282, 313, 314, 316, 333, 337, 354] among other publications.
Although the above subdifferential results surely apply to convex marginal
functions with reducing all the nonconvex subgradient mappings to the sub-
differential of convex analysis, they do not fully capture the nature of convex
functions. Indeed, in contrast to the strongest nonconvex results of this type
(cf. [228, Theorem 3.38] for the limiting subdifferential), Theorem 4.56 holds
as equality in general LCTV spaces without the inner semicontinuity of the
solution/argminimum mapping (4.34) and without any SNC assumptions on the prob-
lem data. This result was first obtained in finite dimensions in our book [237]
under the singular subdifferential qualification condition and then derived in
the subsequent paper [238] under the relative interior qualification condition
(c). The full version of Theorem 4.56 in Banach and LCTV spaces was given
in our paper with Rector and Tran [242]. Particular cases of this theorem
under stronger nonempty interior conditions can be found in Aubin [12] in
Hilbert spaces with ϕ = ϕ(y) and in An and Yen [6] in the LCTV setting.
The remainder of Section 4.5 also follows [242]. The chain rule in ordered
spaces given in Theorem 4.60 extends the one from Lemaire [201] obtained in a
different way; we derive it here from the general result for marginal functions.
Observe finally that the subdifferential formulas for convex marginal functions
have been recently extended in [88, 89] to arbitrary vector spaces without
topological structures.
Much has been written in the literature about Fenchel duality, which is
at the heart of convex optimization and its applications. The fundamental
contributions by Fenchel, Moreau, and Rockafellar on conjugate functions, as
well as of their followers mentioned at the beginning of the commentaries to
this chapter, are strongly related to Fenchel duality. The results presented in
Subsection 4.6.1 can be found, in different versions, in various publications
with our further elaborations in the presented examples. Theorem 4.63 on
strong duality is known as the Fenchel-Rockafellar theorem. The polyhedral
version (b) of this theorem is taken from [91].
Subsection 4.6.2 addresses the usage of generalized interior notions in
duality theory for convex optimization problems in LCTV spaces. This
line of research has been initiated by Borwein and Lewis [47] whose intro-
duction of quasi-relative interiors was largely motivated by extensions of
Fenchel duality. Subsequent developments on duality theory and its appli-
cations for various classes of infinite-dimensional convex problems under
appropriate quasi-relative interior constraint qualifications can be found in
[55, 56, 94, 143, 144, 181, 280, 360, 362] among other publications. It seems
that the result of Theorem 4.69 is new; see [90] for further elaborations, dis-
cussions, and applications.
5 VARIATIONAL TECHNIQUES AND FURTHER SUBGRADIENT STUDY
We start this chapter with the study of variational structures for func-
tions and sets in complete metric and normed spaces. The major varia-
tional and extremal principles, being held even in nonconvex frameworks, are
largely related to and motivated by the developments on convexity. Varia-
tional/extremal principles and variational techniques elaborated in this chap-
ter in complete spaces are then applied to establishing density results for ε-
subgradients of convex functions and to developing ε-subdifferential calculus
with the further applications to convex mean value theorems, subdifferential
monotonicity, characterizations of the Fréchet and Gâteaux differentiability
together with their generic properties, and finally to deriving subgradient for-
mulas for spectral and singular functions in convex analysis. Our major results
hold in Banach spaces, but some results and proofs are valid in general set-
tings of complete metric and topological vector spaces, while those for spectral
and singular functions are primarily finite-dimensional.
5.1 Variational Principles and Convex Geometry
This section is devoted to major variational results and techniques involving
lower semicontinuous functions and closed sets in complete metric spaces and
Banach spaces. We begin with the fundamental Ekeland variational principle
for extended-real-valued l.s.c. functions on complete metric spaces and its
subdifferential variational consequence valid in Banach spaces. Then we apply
the obtained variational results and techniques together with other tools
of variational analysis to establish both approximate and exact versions of
the convex extremal principle in general Banach spaces. Both versions hold
for nonsolid sets while being essentially stronger than the previous results
of the convex extremal principle in topological vector spaces established in
Chapter 3. As a consequence of the approximate extremal principle in Banach
spaces, we immediately get the classical Bishop-Phelps theorem on the density
of support points in the boundaries of nonempty, closed, and convex subsets of
Banach spaces. Furthermore, the Ekeland variational principle is directly used

© Springer Nature Switzerland AG 2022 311


B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9 5

to derive the Brøndsted-Rockafellar theorem on the density of subgradients of an l.s.c. convex function in the collection of its ε-subgradients, again in the
setting of Banach spaces.

5.1.1 Ekeland’s Variational Principle and Related Results

Here we derive the fundamental variational principle for l.s.c. functions on complete metric spaces developed by Ivar Ekeland who called the geomet-
ric Bishop-Phelps theorem “the grandfather of it all” [120]. In this book
we deduce the Bishop-Phelps theorem (see Theorem 5.7) from the convex
extremal principle in Banach spaces developed in the next subsection. Observe
in general that variational principles for functions are strongly interrelated
with (geometric) extremal principles for sets in both convex and nonconvex
settings; see [229, Chapter 2] and the commentaries in Section 5.9 below for
more details and discussions.
Prior to the formulation and proof of the Ekeland variational principle, we
present a lemma of its own interest, which is crucial in the sequel.
Lemma 5.1 Let $(X, d)$ be a complete metric space, and let $f\colon X\to\overline{\mathbb{R}}$ be a proper l.s.c. function bounded from below. Select $\varepsilon > 0$ and $x_0 \in \mathrm{dom}(f)$ with

$$f(x_0) < \inf_{x\in X} f(x) + \varepsilon. \tag{5.1}$$

For any $\lambda > 0$ define the set-valued mapping $F\colon X \rightrightarrows X$ by

$$F(x) := \Big\{ y \in X \ \Big|\ f(y) + \frac{\varepsilon}{\lambda}\, d(x, y) \le f(x) \Big\}, \quad x \in X, \tag{5.2}$$

and construct the sequence $\{x_k\} \subset X$ iteratively as follows: choose $x_1 \in \mathrm{dom}(f)$ and select

$$x_{k+1} \in F(x_k) \ \text{ such that } \ f(x_{k+1}) \le \inf_{x\in F(x_k)} f(x) + \frac{1}{k}, \quad k \in \mathbb{N}.$$

Then there exists $z \in X$ with $\bigcap_{k=1}^{\infty} F(x_k) = \{z\}$.

Proof. Since $x \in F(x)$ for every $x \in X$, we first observe that $F(x_k) \ne \emptyset$ for all $k \in \mathbb{N}$. The lower semicontinuity of $f$ ensures that the set $F(x_k)$ is closed for each $k \in \mathbb{N}$. To check next that $F(x_{k+1}) \subset F(x_k)$ for all $k$, pick any $y \in F(x_{k+1})$ and get

$$f(y) + \frac{\varepsilon}{\lambda}\, d(x_{k+1}, y) \le f(x_{k+1}), \quad k \in \mathbb{N}.$$

It follows by the triangle inequality that

$$f(y) + \frac{\varepsilon}{\lambda}\, d(x_k, y) \le f(y) + \frac{\varepsilon}{\lambda}\, d(x_{k+1}, y) + \frac{\varepsilon}{\lambda}\, d(x_k, x_{k+1}) \le f(x_{k+1}) + \frac{\varepsilon}{\lambda}\, d(x_k, x_{k+1}) \le f(x_k),$$

where the last estimate holds due to $x_{k+1} \in F(x_k)$. Employing the classical Cantor intersection theorem allows us to complete the proof by showing that

$$\mathrm{diam}\, F(x_k) \to 0 \ \text{ as } \ k \to \infty. \tag{5.3}$$

To proceed, fix any $y \in F(x_{k+1})$ and observe that

$$\frac{\varepsilon}{\lambda}\, d(y, x_{k+1}) \le f(x_{k+1}) - f(y) \le \inf_{x\in F(x_k)} f(x) + \frac{1}{k} - f(y) \le \inf_{x\in F(x_{k+1})} f(x) + \frac{1}{k} - f(y) \le \frac{1}{k}.$$

Invoking again the triangle inequality tells us that $d(y, u) \le 2\lambda/(k\varepsilon)$ for all $y, u \in F(x_{k+1})$, which therefore finishes the proof of the lemma. □

Now we are ready to derive the fundamental Ekeland variational principle.

Theorem 5.2 Let $(X, d)$ be a complete metric space, and let $f\colon X\to\overline{\mathbb{R}}$ be a proper l.s.c. function that is bounded from below. Select $\varepsilon > 0$ and $x_0 \in \mathrm{dom}(f)$ satisfying the suboptimality condition (5.1). Then for any $\lambda > 0$ there exists $z \in \mathrm{dom}(f)$ such that the following properties hold:

(a) $f(z) + \dfrac{\varepsilon}{\lambda}\, d(z, x_0) \le f(x_0)$.

(b) $d(z, x_0) \le \lambda$.

(c) $f(z) < f(x) + \dfrac{\varepsilon}{\lambda}\, d(x, z)$ for all $x \in X \setminus \{z\}$.
Proof. Consider the closed subset of $X$ defined by

$$Y := \Big\{ x \in X \ \Big|\ f(x) + \frac{\varepsilon}{\lambda}\, d(x, x_0) \le f(x_0) \Big\}.$$

Observe that $Y$ is a complete metric space with the induced metric from $X$ and that we have the relationships

$$f(x_0) < \inf_{x\in X} f(x) + \varepsilon \le \inf_{x\in Y} f(x) + \varepsilon,$$

i.e., $x_0$ is a suboptimal point (5.1) of $f$, but now with respect to $Y$. Applying Lemma 5.1 to the mapping $F\colon Y \rightrightarrows Y$ from (5.2) restricted to $Y$ gives us a sequence $\{y_k\} \subset Y$ and a unique point $z \in Y$ such that $z \in \bigcap_{k=1}^{\infty} F(y_k)$.

Fix any $x \in Y \setminus \{z\}$ and show that (c) is satisfied when $X$ is replaced by $Y$. Suppose on the contrary that (c) fails, i.e., $f(x) + \frac{\varepsilon}{\lambda}\, d(x, z) \le f(z)$ for some point $x \in Y \setminus \{z\}$. Then for each $k \in \mathbb{N}$ we have the estimates

$$f(x) + \frac{\varepsilon}{\lambda}\, d(x, y_k) \le f(x) + \frac{\varepsilon}{\lambda}\, d(x, z) + \frac{\varepsilon}{\lambda}\, d(z, y_k) \le f(z) + \frac{\varepsilon}{\lambda}\, d(z, y_k) \le f(y_k),$$

where the last one follows from the inclusion $z \in F(y_k)$ whenever $k \in \mathbb{N}$. This yields $x \in F(y_k)$ for all $k \in \mathbb{N}$, which contradicts the condition $x \ne z$.

It follows from $z \in Y$ and the construction of $Y$ that (a) is satisfied. Then we deduce from (a) and the above estimates that

$$f(z) + \frac{\varepsilon}{\lambda}\, d(z, x_0) \le f(x_0) \le \inf_{x\in Y} f(x) + \varepsilon \le f(z) + \varepsilon,$$

and thus $d(z, x_0) \le \lambda$, which verifies (b). It remains to show that (c) is satisfied for $x \in X \setminus \{z\}$ with $x \notin Y$. Indeed, for such $x$ we have

$$f(x) + \frac{\varepsilon}{\lambda}\, d(x, x_0) > f(x_0) \ge f(z) + \frac{\varepsilon}{\lambda}\, d(z, x_0).$$

Using finally the triangle inequality gives us

$$f(x) > f(z) + \frac{\varepsilon}{\lambda}\big( d(z, x_0) - d(x, x_0)\big) \ge f(z) - \frac{\varepsilon}{\lambda}\, d(x, z),$$

which verifies (c) and thus completes the proof of the theorem. □
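Although Theorem 5.2 is an existence result, on a finite metric space (which is automatically complete) the point z can be produced by a terminating greedy search that mirrors the construction of Lemma 5.1: move to any point improving in the sense of (5.2) until no such point remains. The following Python sketch is our illustration, not part of the text; the grid discretization of f(x) = e^{-x} and the helper `ekeland_point` are assumptions made only for this demo.

```python
import math

def ekeland_point(points, f, d, x0, eps, lam):
    """Greedy search for the point z of Theorem 5.2 on a finite metric
    space: while some x improves in the sense of (5.2), move to the one
    with smallest f; each move strictly decreases f, so the loop ends."""
    z = x0
    while True:
        better = [x for x in points
                  if x != z and f(x) + (eps / lam) * d(x, z) <= f(z)]
        if not better:
            return z
        z = min(better, key=f)

# Grid discretization of f(x) = exp(-x) on [0, 10]; inf f is close to 0,
# so x0 = 0 satisfies the suboptimality condition (5.1) with eps = 1.
points = [i / 100 for i in range(1001)]
f = lambda x: math.exp(-x)
d = lambda x, y: abs(x - y)
x0, eps, lam = 0.0, 1.0, 2.0

z = ekeland_point(points, f, d, x0, eps, lam)
assert f(z) + (eps / lam) * d(z, x0) <= f(x0)           # conclusion (a)
assert d(z, x0) <= lam                                  # conclusion (b)
assert all(f(z) < f(x) + (eps / lam) * d(x, z)          # conclusion (c)
           for x in points if x != z)
```

Termination follows exactly as in the lemma: each accepted move decreases f by at least (ε/λ) times the distance traveled, so on a finite set the search stops, and the stopping condition is precisely conclusion (c).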

Observe that for Banach spaces X the distance function in Theorem 5.2
is induced by the norm function, which is intrinsically nonsmooth while being
convex. Applying first the generalized Fermat rule to the minimizer z in con-
dition (c) of the theorem and then using the subdifferential sum rule in the
case of convex functions f , we arrive at the following result, which can be
treated as the subdifferential variational principle of convex analysis.

Theorem 5.3 Let $X$ be a Banach space, and let $f\colon X\to\overline{\mathbb{R}}$ be a proper, l.s.c. convex function that is bounded from below. Pick arbitrary $\varepsilon > 0$ and $x_0 \in \mathrm{dom}(f)$ such that (5.1) holds. Then for any $\lambda > 0$ there exist $z \in \mathrm{dom}(f)$ and $z^* \in \partial f(z) \subset X^*$ satisfying the conditions:

(a) $f(z) + \dfrac{\varepsilon}{\lambda}\,\|z - x_0\| \le f(x_0)$.

(b) $\|x_0 - z\| \le \lambda$.

(c) $\|z^*\| \le \varepsilon/\lambda$.

Proof. Take the element $z \in X$ satisfying all the conclusions of Theorem 5.2 in the case of the Banach space $X$. Then we immediately get the conditions in (a) and (b) of this theorem and observe that condition (c) of Theorem 5.2 means that $z$ is a minimizer of the function

$$\varphi(x) := f(x) + \frac{\varepsilon}{\lambda}\,\|x - z\| \ \text{ for } x \in X.$$

The generalized Fermat rule from Proposition 3.29 tells us that

$$0 \in \partial\varphi(z) = \partial\Big( f(\cdot) + \frac{\varepsilon}{\lambda}\,\|\cdot - z\|\Big)(z). \tag{5.4}$$

Applying now the subdifferential sum rule from Theorem 3.48 to the summation function in (5.4), taking into account the calculation of subgradients of the norm function in (3.48), we arrive at

$$0 \in \partial f(z) + \frac{\varepsilon}{\lambda}\,\mathbb{B}^*.$$

This gives us a subgradient $z^* \in \partial f(z)$ with $\|z^*\| \le \varepsilon/\lambda$ and therefore completes the proof of the theorem. □

5.1.2 Convex Extremal Principles in Banach Spaces

In this subsection we continue the study of set extremality and the extremal principle given in Subsection 3.1.2 of Chapter 3 for arbitrary convex sets in
general topological vector spaces under nonempty interior assumptions. Now
our attention is paid to the case of closed convex sets in Banach spaces. Using
the completeness of the space and the closedness of the sets in question allows
us to employ variational arguments and obtain in this way enhanced versions
of the convex extremal principle and related results without any interiority
requirements on the sets in question.
The main result of this subsection, Theorem 5.5, provides necessary condi-
tions for set extremality of two closed convex sets in the Banach space setting
with full characterizations of set extremality in the case where one of the sets
is SNC at the point in question; see Definition 2.186. In its proof we employ
the Ekeland variational principle given in Theorem 5.2. The following lemma
is useful for the proof of the main result.

Lemma 5.4 Let $X$ be a normed space, and let $\varphi\colon X\times X\to\mathbb{R}$ be defined by

$$\varphi(x_1, x_2) := \|x_1 - x_2\| \ \text{ for all } (x_1, x_2) \in X \times X. \tag{5.5}$$

Fix $\bar x_1 \ne \bar x_2$ and take $(x_1^*, x_2^*) \in \partial\varphi(\bar x_1, \bar x_2)$. Then we have

$$x_1^* + x_2^* = 0 \ \text{ with } \ \|x_1^*\| = \|x_2^*\| = 1.$$

Proof. It follows from the subdifferential definition for (5.5) that

$$\langle x_1^*, x_1 - \bar x_1\rangle + \langle x_2^*, x_2 - \bar x_2\rangle \le \|x_1 - x_2\| - \|\bar x_1 - \bar x_2\| \tag{5.6}$$

for all $(x_1, x_2) \in X \times X$. Putting $x_2 := \bar x_2$ above, we get

$$\langle x_1^*, x_1 - \bar x_1\rangle \le \|x_1 - \bar x_2\| - \|\bar x_1 - \bar x_2\| \ \text{ for all } x_1 \in X.$$

Denoting further $x := x_1 - \bar x_2$ brings us to the estimate

$$\langle x_1^*, x - (\bar x_1 - \bar x_2)\rangle \le \|x\| - \|\bar x_1 - \bar x_2\| \ \text{ for all } x \in X,$$

which tells us that $x_1^* \in \partial p(\bar x_1 - \bar x_2)$ for $p(\cdot) := \|\cdot\|$. Thus $\|x_1^*\| = 1$, and similarly $\|x_2^*\| = 1$. Putting now $x_2 := x_1$ in (5.6), we get

$$\langle x_1^*, x_1 - \bar x_1\rangle + \langle x_2^*, x_1 - \bar x_2\rangle \le -\|\bar x_1 - \bar x_2\| \ \text{ for all } x_1 \in X.$$

This implies in turn the following relationships for any $x_1 \in X$:

$$\langle x_1^*, x_1 - \bar x_1\rangle + \langle x_2^*, x_1 - \bar x_1\rangle + \langle x_2^*, \bar x_1 - \bar x_2\rangle \le -\|\bar x_1 - \bar x_2\|,$$

$$\langle x_1^* + x_2^*, x_1 - \bar x_1\rangle \le -\langle x_2^*, \bar x_1 - \bar x_2\rangle - \|\bar x_1 - \bar x_2\| \le \|x_2^*\|\cdot\|\bar x_1 - \bar x_2\| - \|\bar x_1 - \bar x_2\| \le \|\bar x_1 - \bar x_2\| - \|\bar x_1 - \bar x_2\| = 0,$$

which clearly verifies that $x_1^* + x_2^* = 0$ and thus completes the proof. □
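In ℝⁿ with the Euclidean norm, the conclusion of Lemma 5.4 is transparent: at x̄₁ ≠ x̄₂ the function φ is differentiable with gradient pair (u, −u), where u = (x̄₁ − x̄₂)/‖x̄₁ − x̄₂‖, so the components sum to zero and both are unit vectors. The quick numerical check below is our illustration, not from the text; the sample points and tolerances are ours.

```python
import math
import random

def phi(x1, x2):
    return math.dist(x1, x2)  # phi(x1, x2) = ||x1 - x2|| as in (5.5)

xb1, xb2 = (1.0, 2.0), (4.0, -2.0)        # fixed points with xb1 != xb2
n = phi(xb1, xb2)                          # ||xb1 - xb2|| = 5
u = tuple((a - b) / n for a, b in zip(xb1, xb2))
x1s, x2s = u, tuple(-c for c in u)         # the subgradient pair (u, -u)

# conclusions of Lemma 5.4: the pair sums to zero and has unit norms
assert all(abs(a + b) < 1e-12 for a, b in zip(x1s, x2s))
assert abs(math.hypot(*x1s) - 1.0) < 1e-12
assert abs(math.hypot(*x2s) - 1.0) < 1e-12

# the subdifferential inequality (5.6) at randomly sampled test points
random.seed(0)
for _ in range(1000):
    x1 = tuple(random.uniform(-10, 10) for _ in range(2))
    x2 = tuple(random.uniform(-10, 10) for _ in range(2))
    lhs = sum(a * (p - q) for a, p, q in zip(x1s, x1, xb1)) \
        + sum(a * (p - q) for a, p, q in zip(x2s, x2, xb2))
    assert lhs <= phi(x1, x2) - phi(xb1, xb2) + 1e-9
```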

Now we are ready to establish both approximate and exact versions of the
extremal principle for closed and convex subsets in arbitrary Banach spaces
without any nonempty interior assumptions.
Theorem 5.5 Let $\Omega_1$ and $\Omega_2$ be closed and convex subsets of a Banach space $X$, and let $\bar x$ be any common point of $\Omega_1, \Omega_2$. Consider the following three assertions concerning these sets and the normal cones to them:

(a) The sets $\Omega_i$ as $i = 1, 2$ form an extremal system in $X$.

(b) For each $\varepsilon > 0$ we have the conditions

$$\exists\, x_{i\varepsilon} \in B(\bar x; \varepsilon) \cap \Omega_i \ \text{ and } \ x^*_{i\varepsilon} \in N(x_{i\varepsilon}; \Omega_i) + \varepsilon\mathbb{B}^*, \quad i = 1, 2, \tag{5.7}$$

such that $x^*_{1\varepsilon} + x^*_{2\varepsilon} = 0$ with $\|x^*_{1\varepsilon}\| = \|x^*_{2\varepsilon}\| = 1$.

(c) There exists a nonzero vector $x^* \in X^*$ such that

$$x^* \in N(\bar x; \Omega_1) \cap \big({-N(\bar x; \Omega_2)}\big). \tag{5.8}$$

Then (a)=⇒(b). Furthermore, all the properties in (a)–(c) are equivalent if in addition either $\Omega_1$ or $\Omega_2$ is SNC at $\bar x$.

Proof. We begin with verifying (a)=⇒(b). It follows from the extremality condition (3.4) that for any $\varepsilon > 0$ there exists $a \in X$ such that

$$\|a\| \le \varepsilon^2 \ \text{ and } \ (\Omega_1 + a) \cap \Omega_2 = \emptyset. \tag{5.9}$$

Define the convex, l.s.c., and bounded from below function $f\colon X^2\to\overline{\mathbb{R}}$ by

$$f(x_1, x_2) := \|x_1 - x_2 + a\| + \delta\big((x_1, x_2); \Omega_1 \times \Omega_2\big), \quad (x_1, x_2) \in X^2, \tag{5.10}$$

via the indicator function of the closed set $\Omega_1 \times \Omega_2$. Then (5.9) tells us that $f(x_1, x_2) > 0$ on $X^2$ and $f(\bar x, \bar x) = \|a\| \le \varepsilon^2$. Applying to (5.10) the Ekeland variational principle from Theorem 5.2, we find a pair $(x_{1\varepsilon}, x_{2\varepsilon}) \in \Omega_1 \times \Omega_2$ satisfying $\|x_{1\varepsilon} - \bar x\| \le \varepsilon$, $\|x_{2\varepsilon} - \bar x\| \le \varepsilon$, and

$$f(x_{1\varepsilon}, x_{2\varepsilon}) \le f(x_1, x_2) + \varepsilon\big(\|x_1 - x_{1\varepsilon}\| + \|x_2 - x_{2\varepsilon}\|\big) \ \text{ for all } (x_1, x_2) \in X^2.$$

The latter means that the function $\psi(x_1, x_2) := f(x_1, x_2) + \varepsilon\big(\|x_1 - x_{1\varepsilon}\| + \|x_2 - x_{2\varepsilon}\|\big)$ attains its minimum on $X \times X$ at $(x_{1\varepsilon}, x_{2\varepsilon})$ with $\|x_{1\varepsilon} - x_{2\varepsilon} + a\| \ne 0$. Thus the generalized Fermat rule from Proposition 3.29 yields $0 \in \partial\psi(x_{1\varepsilon}, x_{2\varepsilon})$. Taking into account the structure of $f$ in (5.10), we apply to it the subdifferential sum rule from Theorem 3.48, which allows us to find, by standard subdifferentiation of the norm and indicator functions, dual elements $x^*_{i\varepsilon} \in N(x_{i\varepsilon}; \Omega_i) + \varepsilon\mathbb{B}^*$ for $i = 1, 2$ such that all the conditions in (5.7) are satisfied. This gives us property (b) of the theorem.

Next we verify the fulfillment of implication (b)=⇒(c) by passing to the limit in (5.7) as $\varepsilon \downarrow 0$ with the help of the SNC property of, say, the set $\Omega_1$ at $\bar x$. Take a sequence $\varepsilon_k \downarrow 0$ as $k\to\infty$ and find, by using (5.7) and Lemma 5.4, the corresponding septuples $(x_{1k}, x_{2k}, x^*_k, x^*_{1k}, x^*_{2k}, e^*_{1k}, e^*_{2k})$ such that $x_{1k} \to \bar x$, $x_{2k} \to \bar x$, and

$$x^*_k = x^*_{1k} + \varepsilon_k e^*_{1k}, \quad x^*_k = -x^*_{2k} + \varepsilon_k e^*_{2k}, \quad \|x^*_k\| = 1, \tag{5.11}$$

$$x^*_{ik} \in N(x_{ik}; \Omega_i) \ \text{ and } \ e^*_{ik} \in \mathbb{B}^*$$

for all $k \in \mathbb{N}$ and $i = 1, 2$. Then the Alaoglu-Bourbaki theorem on the topological weak* compactness of the unit dual ball $\mathbb{B}^* \subset X^*$ in any normed space $X$ allows us to conclude that there exists $x^* \in \mathbb{B}^*$ belonging to the weak* closure of the set $\{x^*_k \mid k \in \mathbb{N}\}$. Thus we can find a subnet of $(x^*_k, e^*_{1k}, e^*_{2k})$ converging in the weak* topology of $X^* \times X^* \times X^*$ to $(x^*, e^*_1, e^*_2)$, where $(e^*_1, e^*_2) \in \mathbb{B}^* \times \mathbb{B}^*$. Since the normal cone mapping generated by a convex set is clearly closed-graph in the strong×weak* topology of $X \times X^*$, it follows by passing to the limit in (5.7) that any cluster point $x^*$ of $\{x^*_k \mid k \in \mathbb{N}\}$ in the weak* topology of $X^*$ satisfies the claimed inclusion (5.8).

To justify now (c), it remains to show that there always exists $x^* \ne 0$ obtained by the above procedure, provided that $\Omega_1$ is SNC at $\bar x$. Indeed, supposing the contrary tells us that $x^* = 0$ is the only weak* cluster point of the sequence $\{x^*_k\}$ in the weak* topology of $X^*$. But the latter means that the entire sequence converges to zero in this topology. Thus the imposed SNC property of $\Omega_1$ ensures that $\|x^*_{1k}\| \to 0$ and hence $\|x^*_{2k}\| \to 0$ as $k\to\infty$. This clearly contradicts (5.7) and therefore verifies that $x^* \ne 0$.

To check finally that (c) yields (a), observe that the separation property (3.6) ensures by Theorem 3.7(c) that the sets $\Omega_1, \Omega_2$ form an extremal system in $X$. This verifies (a) and completes the proof of the theorem. □
Remark 5.6 The relationships in (5.7) are known as the approximate extremal
principle, while those in (c) of Theorem 5.5 are referred to as the exact
extremal principle. Both assertions in (b) and (c) of this theorem hold as nec-
essary conditions for set extremality of closed nonconvex subsets of Asplund
spaces (i.e., such Banach spaces where any separable subspace has a sepa-
rable dual; see below) in terms of the appropriate notions of normal cones.
The reader can find more about it in [228, Chapter 2] and in the commen-
taries of this chapter. Theorem 5.5 shows that in the case of closed con-
vex subsets of general Banach spaces the obtained conditions are not only

necessary but also sufficient for set extremality. Furthermore, the exact
extremal principle (5.8) agrees in this setting with convex separation, and thus
we arrive at a refined separation theorem for closed convex subsets of Banach
spaces without imposing any interiority assumptions; see Remark 2.187 and
more discussions in Section 2.7.

As an immediate consequence of Theorem 5.5, we derive now the following result on boundary points of closed convex subsets of Banach spaces. The first
part of it, which is implied by the (convex) approximate extremal principle,
gives us the celebrated Bishop-Phelps theorem on the density of support points
of closed convex sets. The second statement, which follows from the exact
extremal principle, provides a refined version of the supporting hyperplane
theorem for nonsolid convex sets in infinite dimensions. Recall that $\bar x \in \Omega$ is a support point of $\Omega \subset X$ if there is $0 \ne x^* \in X^*$ such that the function $x \mapsto \langle x^*, x\rangle$ attains its global maximum on $\Omega$ at $\bar x$, i.e., $N(\bar x; \Omega) \ne \{0\}$.

Theorem 5.7 Let $\Omega$ be a nonempty, closed, and convex subset of a Banach space $X$. Then the following assertions hold:

(a) The collection of support points of $\Omega$ is dense in the boundary of $\Omega$.

(b) If $\Omega$ is SNC at $\bar x \in \mathrm{bd}(\Omega)$, then $\bar x$ is a support point, i.e., there exists $0 \ne x^* \in X^*$ such that

$$\langle x^*, x\rangle \le \langle x^*, \bar x\rangle \ \text{ for all } x \in \Omega.$$

Proof. It is obvious from (5.9) and the definition of boundary points that for
any x ∈ bd(Ω) the sets Ω1 := {x} and Ω2 := Ω form an extremal system
in X. Then both assertions of the theorem follow from the corresponding
statements of Theorem 5.5 due to the normal cone constructions for convex
sets. 

5.1.3 Density of ε-Subgradients and Some Consequences

Next we consider a useful expansion of the subdifferential notion for extended-real-valued convex functions, which plays an important role in both theoretical
and numerical aspects of convex analysis and optimization, as well as in their
various applications some of which are presented below.

Definition 5.8 Let $X$ be a topological vector space, let $f\colon X\to\overline{\mathbb{R}}$ be a proper convex function, and let $\bar x \in \mathrm{dom}(f)$. Given $\varepsilon \ge 0$, the ε-subdifferential (collection of ε-subgradients) of $f$ at $\bar x$ is defined by

$$\partial_\varepsilon f(\bar x) := \big\{ x^* \in X^* \ \big|\ \langle x^*, x - \bar x\rangle \le f(x) - f(\bar x) + \varepsilon \ \text{ for all } x \in X \big\}.$$

If $\bar x \notin \mathrm{dom}(f)$, we put $\partial_\varepsilon f(\bar x) := \emptyset$ for all $\varepsilon \ge 0$.

We obviously have that ∂0 f (x) = ∂f (x). Note that the ε-subdifferential for
any ε > 0 is also known as the approximate subdifferential of convex analysis.

Observe directly from the definition that for any ε ≥ 0 the ε-subgradient set
∂ε f (x) is always convex and weak∗ closed in X ∗ .
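To make Definition 5.8 concrete, consider $f(x) = |x|$ on $X = \mathbb{R}$ at $\bar x = 1$: solving the defining inequality over all $x \in \mathbb{R}$ gives $\partial_\varepsilon f(1) = [1 - \varepsilon, 1]$ for $0 \le \varepsilon \le 2$. The sketch below is our illustration, not from the text; the grid test checks the defining inequality only on finitely many points, so it is a necessary-condition check chosen here so that the interval endpoints come out exactly.

```python
def in_eps_subdiff(v, xbar, eps, f, grid):
    # finite test of the defining inequality of Definition 5.8
    return all(v * (x - xbar) <= f(x) - f(xbar) + eps for x in grid)

grid = [i / 10 for i in range(-1000, 1001)]   # test points in [-100, 100]
xbar, eps = 1.0, 0.5

# expected: the eps-subdifferential of |.| at 1 is the interval [1 - eps, 1]
assert in_eps_subdiff(1 - eps, xbar, eps, abs, grid)      # lower endpoint
assert in_eps_subdiff(1.0, xbar, eps, abs, grid)          # upper endpoint
assert not in_eps_subdiff(1 - eps - 0.01, xbar, eps, abs, grid)
assert not in_eps_subdiff(1.01, xbar, eps, abs, grid)
```

Note how the set grows with ε, in line with Proposition 5.16(a) below, and shrinks back to the usual subdifferential ∂f(1) = {1} as ε ↓ 0.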
Let us verify that whenever ε > 0 every proper, l.s.c., convex function
is ε-subdifferentiable (i.e., its ε-subgradient set is nonempty) at any point of
its domain. This is indeed different from the exact subdifferential case where
ε = 0; see Theorem 3.39. This is due to the fact that using ε > 0 allows us to
apply now the strict separation theorem instead of the proper separation one
as in the proof of Theorem 3.39.

Theorem 5.9 Let $X$ be an LCTV space, and let $f\colon X\to\overline{\mathbb{R}}$ be a proper, l.s.c., convex function. Then $\partial_\varepsilon f(\bar x) \ne \emptyset$ for every $\bar x \in \mathrm{dom}(f)$ and $\varepsilon > 0$.

Proof. Pick any $\bar x \in \mathrm{dom}(f)$ and $\varepsilon > 0$. Then $(\bar x, f(\bar x) - \varepsilon) \notin \mathrm{epi}(f)$. By the strict separation result from Theorem 2.61 there is $(x^*, -\gamma) \in X^* \times \mathbb{R}$ with

$$\langle x^*, x\rangle - \gamma\lambda < \langle x^*, \bar x\rangle - \gamma\big(f(\bar x) - \varepsilon\big) \ \text{ for all } (x, \lambda) \in \mathrm{epi}(f).$$

Letting $x := \bar x$ and $\lambda := f(\bar x)$ yields $\gamma > 0$. Then for any $x \in \mathrm{dom}(f)$ we have $(x, f(x)) \in \mathrm{epi}(f)$. This leads us to the inequality

$$\langle x^*, x\rangle - \gamma f(x) \le \langle x^*, \bar x\rangle - \gamma\big(f(\bar x) - \varepsilon\big).$$

Dividing both sides of the inequality by $\gamma$ and rearranging the terms give us

$$\Big\langle \frac{x^*}{\gamma},\, x - \bar x \Big\rangle \le f(x) - f(\bar x) + \varepsilon$$

and therefore verify the claimed condition $x^*/\gamma \in \partial_\varepsilon f(\bar x) \ne \emptyset$. □

Next we continue with deriving the well-recognized Brøndsted-Rockafellar density theorem involving ε-subgradients of convex functions. The simple proof
of this fundamental result presented below is based on the Ekeland varia-
tional principle from Theorem 5.2 while historically the order was reversed:
Brøndsted-Rockafellar’s theorem (clearly yielding the Bishop-Phelps geomet-
ric density result) and its proof were a strong inspiration for establishing Eke-
land’s variational principle; see more discussions in the commentaries to this
chapter. Let us mention the possibility of the reverse geometric approach: to
deduce both Ekeland’s principle and Brøndsted-Rockafellar’s theorem directly
from the Bishop-Phelps geometric density results; see Exercise 5.93.
Here is the Brøndsted-Rockafellar theorem on subgradient density.
Theorem 5.10 Let $X$ be a Banach space, and let $f\colon X\to\overline{\mathbb{R}}$ be a proper, l.s.c., and convex function. Then for any elements $x \in \mathrm{dom}(f)$, $\varepsilon > 0$, $\lambda > 0$, and $x^* \in \partial_\varepsilon f(x)$ there exist $\bar x \in \mathrm{dom}(f)$ and $\bar x^* \in \partial f(\bar x)$ such that

$$\|\bar x - x\| \le \frac{\varepsilon}{\lambda} \ \text{ and } \ \|\bar x^* - x^*\| \le \lambda.$$

Proof. It follows directly from Definition 5.8 that

$$f(x) - \langle x^*, x\rangle \le f(y) - \langle x^*, y\rangle + \varepsilon \ \text{ for all } y \in X.$$

Define the function $g(y) := f(y) - \langle x^*, y\rangle$ on $X$ and observe that it is proper, l.s.c., and convex with $\mathrm{dom}(g) = \mathrm{dom}(f)$. Furthermore, we have

$$g(x) \le \inf_{y\in X} g(y) + \varepsilon.$$

Applying Theorem 5.3 (with the parameter $\lambda$ there taken as $\varepsilon/\lambda$) to the function $g$ gives us $\bar x \in \mathrm{dom}(g) = \mathrm{dom}(f)$ with $\|\bar x - x\| \le \varepsilon/\lambda$ and $u^* \in \partial g(\bar x)$ with $\|u^*\| \le \lambda$. By the subdifferential sum rule we arrive at the representation

$$u^* = \bar x^* - x^* \ \text{ with some } \bar x^* \in \partial f(\bar x),$$

and therefore get $\|\bar x^* - x^*\| \le \lambda$ as claimed in the theorem. □

The following corollary is a straightforward consequence of Theorems 5.9 and 5.10 while being also a version of the subdifferentiability Theorem 3.39
without imposing any interiority condition.

Corollary 5.11 Let $X$ be a Banach space, and let $f\colon X\to\overline{\mathbb{R}}$ be a proper, l.s.c., and convex function. Then there exists $\bar x \in \mathrm{dom}(f)$ with $\partial f(\bar x) \ne \emptyset$.

Proof. Taking any $x \in \mathrm{dom}(f)$, we conclude from Theorem 5.9 that $\partial_\varepsilon f(x) \ne \emptyset$ whenever $\varepsilon > 0$. Then the existence of $\bar x \in \mathrm{dom}(f)$ with $\partial f(\bar x) \ne \emptyset$ follows directly from Theorem 5.10. □

Next we present some geometric consequences of the Brøndsted-Rockafellar density theorem. Given a nonempty convex subset Ω ⊂ X of a topological
vector space, recall that x ∈ Ω is a support point of Ω if there exists a nonzero
element x∗ ∈ X ∗ , which attains its supremum on Ω at x. In this setting the
element x∗ is called a support functional of Ω.
Recall also that the Bishop-Phelps theorem (Theorem 5.7) establishes the
density of support points on the boundary of any nonempty, closed, and con-
vex subset of a Banach space. Now we derive from the Brøndsted-Rockafellar
theorem a version of this result for support functionals.

Corollary 5.12 Let $\Omega \subset X$ be a nonempty, closed, and convex subset of a Banach space $X$. Then the set of support functionals of $\Omega$ is dense with respect to the strong topology of $X^*$ in the collection of all linear functionals $0 \ne z^* \in X^*$ that are bounded from above on the set $\Omega$.

Proof. Denote by $F^*$ the collection of all support functionals of $\Omega$ and, invoking the support function $\sigma_\Omega(x^*)$ of $\Omega$, consider the set

$$Z^* := \big\{ z^* \in X^* \ \big|\ z^* \ne 0 \ \text{ and } \ \sigma_\Omega(z^*) < \infty \big\}.$$

It suffices to show that $Z^*$ is contained in the norm closure of $F^*$. To proceed, fix $z^* \in Z^*$ and $\varepsilon > 0$, and then suppose without loss of generality that $\varepsilon < \|z^*\|$. Choosing $x \in \Omega$ with

$$\sigma_\Omega(z^*) - \varepsilon^2 < \langle z^*, x\rangle,$$

observe from the definition of $\sigma_\Omega$ in (4.8) that

$$\langle z^*, z - x\rangle < \varepsilon^2 = \delta_\Omega(z) - \delta_\Omega(x) + \varepsilon^2 \ \text{ for all } z \in \Omega,$$

which yields $z^* \in \partial_{\varepsilon^2}\delta_\Omega(x)$. Applying Theorem 5.10 with $\lambda := \varepsilon$ gives us elements $\bar x \in \mathrm{dom}(\delta_\Omega) = \Omega$ and $\bar x^* \in \partial\delta_\Omega(\bar x) = N(\bar x; \Omega)$ satisfying

$$\|\bar x - x\| \le \varepsilon \ \text{ and } \ \|z^* - \bar x^*\| \le \varepsilon.$$

The second inequality implies that $\|\bar x^*\| \ge \|z^*\| - \varepsilon > 0$, and so $\bar x^* \ne 0$. Hence $\bar x^* \in F^*$, which verifies the density of the set $F^*$ in $Z^*$. □

The notion of ε-subgradients for extended-real-valued functions from Def-


inition 5.8 applies to the case of sets and gives us the following.
Definition 5.13 Let $\Omega$ be a nonempty convex subset of a topological vector space $X$, and let $\varepsilon \ge 0$. Then the collection of ε-normals to $\Omega$ at $\bar x \in \Omega$ is defined by the formula

$$N_\varepsilon(\bar x; \Omega) := \big\{ x^* \in X^* \ \big|\ \langle x^*, x - \bar x\rangle \le \varepsilon \ \text{ for all } x \in \Omega \big\}.$$

We put $N_\varepsilon(\bar x; \Omega) := \emptyset$ if $\bar x \notin \Omega$.
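For a concrete instance of Definition 5.13 (a computation we add for illustration, not from the text), take $\Omega = [-1, 1] \subset \mathbb{R}$ and $\bar x = 1$. Since $x - 1$ ranges over $[-2, 0]$ on $\Omega$, the defining inequality yields

```latex
N_\varepsilon(1; [-1,1])
  = \{\, v \in \mathbb{R} \mid v(x-1) \le \varepsilon \ \text{for all } x \in [-1,1] \,\}
  = \Big[ -\frac{\varepsilon}{2},\ \infty \Big),
```

so for $\varepsilon > 0$ this set is not a cone, while $N_0(1; [-1,1]) = [0, \infty)$ is the usual normal cone.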
It is obvious that N0 (x; Ω) = N (x; Ω) for the standard normal cone (3.1),
while the set Nε (x; Ω) is not a cone in general if ε > 0. The next corollary is
an immediate consequence of the Brøndsted-Rockafellar theorem.

Corollary 5.14 Let $\Omega$ be a nonempty, closed, and convex subset of a Banach space $X$. Then for any elements $x \in \Omega$, $x^* \in N_\varepsilon(x; \Omega)$, $\varepsilon > 0$, and $\lambda > 0$ there exists $\bar x \in \Omega$ such that

$$\|\bar x - x\| \le \frac{\varepsilon}{\lambda} \ \text{ and } \ x^* \in N(\bar x; \Omega) + \lambda\mathbb{B}^*.$$

Proof. It follows from Definition 5.13 that $x^* \in N_\varepsilon(x; \Omega) = \partial_\varepsilon\delta_\Omega(x)$. Thus the claimed conclusion follows directly from Theorem 5.10. □

Fix now a support point $\bar x \in \Omega$ of $\Omega$ with an associated support functional $x^* \in X^*$, and then define the corresponding supporting half-space by

$$L(x^*, \bar x) := \big\{ x \in X \ \big|\ \langle x^*, x\rangle \le \langle x^*, \bar x\rangle \big\}.$$
We end this subsection by deriving the following convenient representation of
closed convex subsets of Banach spaces via their supporting half-spaces.

Theorem 5.15 Let $\Omega$ be a nonempty, closed, and convex set in a Banach space $X$. Then $\Omega$ is the intersection of its supporting half-spaces.

Proof. It is obvious that $\Omega$ is contained in the intersection of all its supporting half-spaces. To verify the opposite inclusion, take $z \in X$ belonging to such an intersection and suppose on the contrary that $z \notin \Omega$. Denote its distance to $\Omega$ by $d := d(z; \Omega) > 0$. Then we get $\Omega \cap B(z; d) = \emptyset$ for the open ball centered at $z$ with radius $d$. Then the separation result from Theorem 2.57 yields $z^* \in X^*$ with $\|z^*\| = 1$ such that

$$\langle z^*, x\rangle \le \langle z^*, y\rangle \ \text{ whenever } x \in \Omega \ \text{ and } y \in B(z; d).$$

Observing that $\sigma_\Omega(z^*) = \langle z^*, z\rangle - d\|z^*\| = \langle z^*, z\rangle - d$ and choosing sequences $\varepsilon_k \downarrow 0$ as $k\to\infty$ and $\{x_k\} \subset \Omega$ with $\|x_k - z\| < d + \varepsilon_k^2$, let us check that

$$z^* \in N_{\varepsilon_k^2}(x_k; \Omega) \ \text{ for all } k \in \mathbb{N}.$$

Indeed, for any $x \in \Omega$ we have the relationships

$$\langle z^*, x - x_k\rangle = \langle z^*, x - z\rangle + \langle z^*, z - x_k\rangle \le \langle z^*, x\rangle - \langle z^*, z\rangle + \|z^*\|\cdot\|z - x_k\| \le \sigma_\Omega(z^*) - \langle z^*, z\rangle + d + \varepsilon_k^2 = \varepsilon_k^2.$$

Corollary 5.14 (applied with $\varepsilon := \varepsilon_k^2$ and $\lambda := \varepsilon_k$) allows us to find $u_k \in \Omega$ with $\|u_k - x_k\| \le \varepsilon_k$ and $z^* \in N(u_k; \Omega) + \varepsilon_k\mathbb{B}^*$. Picking $u_k^* \in N(u_k; \Omega)$ with $\|u_k^* - z^*\| \le \varepsilon_k$ gives us

$$\langle u_k^*, u_k - z\rangle = \langle u_k^* - z^*, u_k - z\rangle + \langle z^*, u_k - z\rangle \le \|u_k^* - z^*\|\cdot\|u_k - z\| + \sigma_\Omega(z^*) - \langle z^*, z\rangle \le \varepsilon_k\big(\|u_k - x_k\| + \|x_k - z\|\big) - d \le \varepsilon_k\big(\varepsilon_k + d + \varepsilon_k^2\big) - d \to -d \ \text{ as } k\to\infty.$$

This shows that $\langle u_k^*, u_k - z\rangle < 0$ for all $k \in \mathbb{N}$ sufficiently large. Thus there exists $k \in \mathbb{N}$ such that $z \notin L(u_k^*, u_k)$, a contradiction which completes the proof of the theorem. □
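As a simple illustration of Theorem 5.15 (ours, added for the reader): for the closed unit ball $\Omega = \{x \in X \mid \|x\| \le 1\}$ of a Banach space $X$, every boundary point $\bar x$ admits a norming functional $x^*$ with $\|x^*\| = \langle x^*, \bar x\rangle = 1$ by the Hahn-Banach theorem, and the corresponding supporting half-spaces recover the ball:

```latex
\bigcap_{\|x^*\| = 1} \{\, x \in X \mid \langle x^*, x\rangle \le 1 \,\}
  = \{\, x \in X \mid \|x\| \le 1 \,\} = \Omega,
\qquad \text{since } \|x\| = \sup_{\|x^*\| = 1}\, \langle x^*, x\rangle .
```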

5.2 Calculus Rules for ε-Subgradients


As seen in Section 5.1, the notion of ε-subdifferentials (as ε > 0) for convex
functions formulated in Definition 5.8 is instrumental for establishing impor-
tant results of convex analysis. In this section we first present major calculus
rules for ε-subdifferentials in topological vector spaces, Banach spaces, and
finite-dimensional spaces under appropriate qualification conditions. Further-
more, it is shown that these qualification conditions can be dropped in the
general framework of topological vector spaces if we aim at deriving the asymp-
totic forms of the major ε-subdifferential calculus rules with the usage of the
weak∗ closure operation.

5.2.1 Exact Sum and Chain Rules for ε-Subgradients

This subsection mainly concerns deriving the basic sum rule for ε-subgradients of
extended-real-valued convex functions in different space settings. Let us begin

with the formulations of two elementary propositions, the proofs of which we leave as exercises for the reader.
Proposition 5.16 Let $f\colon X\to\overline{\mathbb{R}}$ be a convex function defined on a topological vector space $X$, and let $x \in X$. Then the following properties hold:

(a) If $0 \le \varepsilon_1 \le \varepsilon_2$, then $\partial_{\varepsilon_1} f(x) \subset \partial_{\varepsilon_2} f(x)$.

(b) $\partial f(x) = \bigcap_{\varepsilon>0} \partial_\varepsilon f(x)$.
Proposition 5.17 In the setting of Proposition 5.16, for any $\varepsilon \ge 0$ and $x \in X$ we have the following properties:

(a) $\partial_\varepsilon(f + c)(x) = \partial_\varepsilon f(x)$ whenever $c \in \mathbb{R}$.

(b) $\partial_\varepsilon(\gamma f)(x) = \gamma\,\partial_{\varepsilon/\gamma} f(x)$ whenever $\gamma > 0$.

(c) Given a real number $\alpha \ne 0$, define the function $g\colon X\to\overline{\mathbb{R}}$ by $g(x) := f(\alpha x)$ for all $x \in X$. Then

$$\partial_\varepsilon g(x) = \alpha\,\partial_\varepsilon f(\alpha x).$$

(d) Given an arbitrary vector $a \in X$, define the function $h\colon X\to\overline{\mathbb{R}}$ by $h(x) := f(x - a)$ for all $x \in X$. Then

$$\partial_\varepsilon h(x) = \partial_\varepsilon f(x - a).$$

The next proposition contains a useful description of ε-subgradients of convex functions via their Fenchel conjugates.
Proposition 5.18 Let $f\colon X\to\overline{\mathbb{R}}$ be a proper convex function defined on a topological vector space $X$, and let $\varepsilon \ge 0$. Then $x^* \in \partial_\varepsilon f(\bar x)$ if and only if

$$f^*(x^*) + f(\bar x) \le \langle x^*, \bar x\rangle + \varepsilon. \tag{5.12}$$

Proof. For any $x^* \in \partial_\varepsilon f(\bar x)$ we have $\bar x \in \mathrm{dom}(f)$ and

$$\langle x^*, x - \bar x\rangle \le f(x) - f(\bar x) + \varepsilon \ \text{ whenever } x \in X,$$

which tells us therefore that

$$\langle x^*, x\rangle - f(x) \le \langle x^*, \bar x\rangle - f(\bar x) + \varepsilon \ \text{ for all } x \in X.$$

Taking the supremum above with respect to $x \in X$ and remembering the Fenchel conjugate construction in topological vector spaces show that $f^*(x^*) \le \langle x^*, \bar x\rangle - f(\bar x) + \varepsilon$, which thus verifies (5.12).

To check the converse implication, suppose that (5.12) is satisfied. Then we get $\bar x \in \mathrm{dom}(f)$ and

$$\langle x^*, x\rangle - f(x) \le f^*(x^*) \le \langle x^*, \bar x\rangle - f(\bar x) + \varepsilon \ \text{ for all } x \in X.$$

Employing the definition of ε-subgradients shows that $x^* \in \partial_\varepsilon f(\bar x)$. □
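For $f(x) = x^2$ on $\mathbb{R}$ we have $f^*(v) = v^2/4$, so criterion (5.12) at $\bar x$ reads $v^2/4 + \bar x^2 \le v\bar x + \varepsilon$, i.e. $|v - 2\bar x| \le 2\sqrt{\varepsilon}$, recovering $\partial_\varepsilon f(\bar x)$ directly from the conjugate. The brute-force cross-check of (5.12) against Definition 5.8 below is our sketch, not from the text; the grid is a finite discretization of the defining inequality, with steps chosen so the two tests agree on every sampled $v$.

```python
def in_eps_subdiff(v, xbar, eps, grid):
    # Definition 5.8 for f(x) = x**2, tested on a finite grid
    return all(v * (x - xbar) <= x * x - xbar * xbar + eps for x in grid)

fstar = lambda v: v * v / 4.0        # Fenchel conjugate of f(x) = x**2
grid = [i / 10 for i in range(-100, 101)]
xbar, eps = 1.0, 0.25

for i in range(-400, 401):
    v = i / 100
    by_conjugate = fstar(v) + xbar * xbar <= v * xbar + eps   # test (5.12)
    by_definition = in_eps_subdiff(v, xbar, eps, grid)
    assert by_conjugate == by_definition
```

With ε = 0.25 both tests accept exactly the interval [1, 3] of slopes, which is 2x̄ ± 2√ε at x̄ = 1.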
Now we derive the major ε-subgradient sum rule under different assump-
tions in topological vector spaces, Banach spaces, and finite-dimensional

spaces. Employing the ε-subgradient description from Proposition 5.18 allows


us to deduce the following theorem from the corresponding results of the
conjugate calculus developed in Section 4.1.
Theorem 5.19 Let $f, g\colon X\to\overline{\mathbb{R}}$ be convex functions defined on a topological vector space $X$, and let $\varepsilon \ge 0$. Then we have the ε-subdifferential sum rule

$$\partial_\varepsilon(f + g)(\bar x) = \bigcup\Big\{ \partial_{\varepsilon_1} f(\bar x) + \partial_{\varepsilon_2} g(\bar x) \ \Big|\ \varepsilon_1 \ge 0,\ \varepsilon_2 \ge 0,\ \varepsilon_1 + \varepsilon_2 = \varepsilon \Big\} \tag{5.13}$$

in each of the following cases (a)–(c):

(a) $f$ is continuous at some point $u \in \mathrm{dom}(g)$.

(b) $X$ is a Banach space, and the functions $f$ and $g$ are proper and l.s.c. under the fulfillment of the qualification condition

$$\mathrm{cone}\big(\mathrm{dom}(f) - \mathrm{dom}(g)\big) \ \text{ is a closed subspace of } X.$$

(c) $X = \mathbb{R}^n$ and $\mathrm{ri}\big(\mathrm{dom}(f)\big) \cap \mathrm{ri}\big(\mathrm{dom}(g)\big) \ne \emptyset$.
Proof. Note first that the inclusion "⊃" in (5.13) is obvious even without requiring the assumptions in (a)–(c). To verify the opposite inclusion in (5.13), fix any $x^* \in \partial_\varepsilon(f + g)(\bar x)$ and get from Proposition 5.18 that

$$(f + g)(\bar x) + (f + g)^*(x^*) \le \langle x^*, \bar x\rangle + \varepsilon.$$

Then in each of the cases (a)–(c) we can apply the corresponding results from Theorems 4.27 and 4.41 on the validity of the conjugate sum rule while finding in this way $x_1^*, x_2^* \in X^*$ with $x^* = x_1^* + x_2^*$ such that

$$f(\bar x) + g(\bar x) + f^*(x_1^*) + g^*(x_2^*) \le \langle x^*, \bar x\rangle + \varepsilon = \langle x_1^*, \bar x\rangle + \langle x_2^*, \bar x\rangle + \varepsilon.$$

This gives us the estimate

$$\big( f(\bar x) + f^*(x_1^*) - \langle x_1^*, \bar x\rangle \big) + \big( g(\bar x) + g^*(x_2^*) - \langle x_2^*, \bar x\rangle \big) \le \varepsilon.$$

Then there exist $\varepsilon_1 \ge 0$ and $\varepsilon_2 \ge 0$ for which $\varepsilon_1 + \varepsilon_2 = \varepsilon$,

$$f(\bar x) + f^*(x_1^*) - \langle x_1^*, \bar x\rangle \le \varepsilon_1, \ \text{ and } \ g(\bar x) + g^*(x_2^*) - \langle x_2^*, \bar x\rangle \le \varepsilon_2.$$

It follows from the constructions above and Proposition 5.18 that $x_1^* \in \partial_{\varepsilon_1} f(\bar x)$ and $x_2^* \in \partial_{\varepsilon_2} g(\bar x)$. This shows that $x^*$ belongs to the set on the right-hand side of (5.13), which therefore completes the proof of the theorem. □
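The elementary inclusion "⊃" in (5.13) can be checked numerically. With f(x) = |x| and g(x) = x² on ℝ at x̄ = 1, one computes ∂_{ε₁}f(1) = [1 − ε₁, 1] and ∂_{ε₂}g(1) = [2 − 2√ε₂, 2 + 2√ε₂], and any sum of such ε₁- and ε₂-subgradients must be an (ε₁ + ε₂)-subgradient of f + g. The sketch below is ours, not from the text; the grid test checks the defining inequality of Definition 5.8 on finitely many points.

```python
def in_eps_subdiff(v, xbar, eps, f, grid):
    return all(v * (x - xbar) <= f(x) - f(xbar) + eps for x in grid)

f = abs
g = lambda x: x * x
h = lambda x: abs(x) + x * x          # h = f + g
grid = [i / 10 for i in range(-100, 101)]
xbar, e1, e2 = 1.0, 0.5, 0.25

v1 = 1 - e1                 # in the interval [1 - e1, 1]
v2 = 2 * xbar + 2 * 0.5     # 2*xbar + 2*sqrt(e2), with sqrt(0.25) = 0.5
assert in_eps_subdiff(v1, xbar, e1, f, grid)
assert in_eps_subdiff(v2, xbar, e2, g, grid)
# the elementary inclusion: v1 + v2 is an (e1 + e2)-subgradient of f + g
assert in_eps_subdiff(v1 + v2, xbar, e1 + e2, h, grid)
```

The equality in (5.13), i.e. that every ε-subgradient of f + g splits this way, is exactly what the qualification conditions (a)–(c) guarantee.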

Remark 5.20 In the proof of Theorem 5.19 we use the obvious fact that if $a, b \ge 0$ (as ensured there by the Fenchel-Young inequality) and $a + b \le \varepsilon$ with $\varepsilon \ge 0$, then there exist $\varepsilon_1, \varepsilon_2 \ge 0$ such that $a \le \varepsilon_1$, $b \le \varepsilon_2$, and $\varepsilon_1 + \varepsilon_2 = \varepsilon$. Indeed, it suffices to take $\varepsilon_1 := a + (\varepsilon - a - b)/2$ and $\varepsilon_2 := b + (\varepsilon - a - b)/2$.

Next we establish the ε-subdifferential chain rule in three space settings under the appropriate qualification conditions.

Theorem 5.21 Let $A\colon X\to Y$ be a linear continuous mapping between topological vector spaces, and let $g\colon Y\to\overline{\mathbb{R}}$ be a proper convex function. Given any $\varepsilon \ge 0$ and $\bar x \in X$ with $A\bar x \in \mathrm{dom}(g)$, we have the chain rule

$$\partial_\varepsilon(g \circ A)(\bar x) = A^*\partial_\varepsilon g(A\bar x) \tag{5.14}$$

in each of the following cases (a)–(c):

(a) The function $g$ is finite and continuous at some point of the set $AX$.

(b) $X$ and $Y$ are Banach spaces, and the function $g$ is l.s.c. under the fulfillment of the qualification condition

$$\mathrm{cone}\big(\mathrm{dom}(g) - AX\big) \ \text{ is a closed subspace}.$$

(c) $X = \mathbb{R}^n$, $Y = \mathbb{R}^p$, and $AX \cap \mathrm{ri}\big(\mathrm{dom}(g)\big) \ne \emptyset$.

Proof. The inclusion "⊃" in (5.14) is obvious. To verify the opposite one, pick any $x^* \in \partial_\varepsilon(g \circ A)(\bar x)$ and deduce from Proposition 5.18 that

$$(g \circ A)(\bar x) + (g \circ A)^*(x^*) \le \langle x^*, \bar x\rangle + \varepsilon.$$

By the conjugate chain rule, which is valid in each of the cases (a)–(c) due to Theorem 4.28 and its enhanced Banach space version from Corollary 4.42, we find $y^* \in Y^*$ such that $A^*y^* = x^*$ and $(g \circ A)^*(x^*) = g^*(y^*)$. This implies therefore that

$$g(A\bar x) + g^*(y^*) = (g \circ A)(\bar x) + (g \circ A)^*(x^*) \le \langle x^*, \bar x\rangle + \varepsilon = \langle A^*y^*, \bar x\rangle + \varepsilon = \langle y^*, A\bar x\rangle + \varepsilon.$$

Employing Proposition 5.18 tells us that $y^* \in \partial_\varepsilon g(A\bar x)$, and so we conclude that $x^* = A^*y^* \in A^*\partial_\varepsilon g(A\bar x)$, which completes the proof. □
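For a one-dimensional sample of (5.14), take Ax = 2x on ℝ (so A* = A) and g = |·|. At x̄ = 0.5 we have Ax̄ = 1 and ∂_ε g(1) = [1 − ε, 1], so (5.14) predicts ∂_ε(g∘A)(0.5) = 2·[1 − ε, 1]; with ε = 0.5 this is [1, 2], which also agrees with computing ∂_ε(2|·|)(0.5) via Proposition 5.17(b). The numerical sanity check below is our illustration, not from the text.

```python
def in_eps_subdiff(v, xbar, eps, f, grid):
    return all(v * (x - xbar) <= f(x) - f(xbar) + eps for x in grid)

A = lambda x: 2.0 * x            # A* = A: multiplication by 2 on the line
g = abs
h = lambda x: g(A(x))            # h = g o A, i.e. h(x) = 2|x|
grid = [i / 10 for i in range(-1000, 1001)]
xbar, eps = 0.5, 0.5             # A(xbar) = 1 and eps-subdiff of g there is [0.5, 1]

for y in (0.5, 0.75, 1.0):       # y in the eps-subdifferential of g at A(xbar)
    assert in_eps_subdiff(y, A(xbar), eps, g, grid)
    assert in_eps_subdiff(2.0 * y, xbar, eps, h, grid)   # A*y works for h
assert not in_eps_subdiff(0.99, xbar, eps, h, grid)      # just below 2*(1 - eps)
assert not in_eps_subdiff(2.02, xbar, eps, h, grid)      # just above 2*1
```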

5.2.2 Asymptotic ε-Subdifferential Calculus

In this subsection we derive sum and chain rules of the asymptotic type for
ε-subgradients of convex functions on topological vector spaces that are formu-
lated via the closure operation without imposing any qualification conditions.
As usual, we start with the (asymptotic) ε-subdifferential sum rule. The theorem below uses the fact that for a proper convex function $\psi\colon X^*\to\overline{\mathbb{R}}$, where $X$ is an LCTV space, we always have the well-known biconjugate relationship discussed in Exercise 5.98:

$$\psi^{**} = \overline{\psi}, \tag{5.15}$$

where $\overline{\psi}$ is the weak* l.s.c. convex function on $X^*$ whose epigraph is the weak* closure of $\mathrm{epi}(\psi)$ in $X^* \times \mathbb{R}$.

Theorem 5.22 Let $f, g\colon X\to\overline{\mathbb{R}}$ be proper, l.s.c., and convex functions defined on an LCTV space $X$, and let $\bar x \in \mathrm{dom}(f) \cap \mathrm{dom}(g)$. Then we have the asymptotic subdifferential sum rule

$$\partial(f + g)(\bar x) = \bigcap_{\varepsilon>0} \overline{\partial_\varepsilon f(\bar x) + \partial_\varepsilon g(\bar x)}, \tag{5.16}$$

where the closure is taken with respect to the weak* topology of $X^*$.

Proof. It follows from the definition that


∂ε f (x) + ∂ε g(x) ⊂ ∂2ε (f + g)(x) for all ε > 0.
This clearly implies in turn that
∂ε f (x) + ∂ε g(x) ⊂ ∂2ε (f + g)(x) whenever ε > 0.
Taking the intersection on both sides of the above inclusion with respect to
ε > 0 verifies the inclusion “⊃” in (5.16).
To prove next the opposite inclusion in (5.16), pick any x∗ ∈ ∂(f + g)(x)
and rewrite this equivalently in the conjugate form
(f + g)∗(x∗) + (f + g)(x) ≤ ⟨x∗, x⟩. (5.17)
Using the conjugate formula for the infimal convolution and its dual version
(f □ g)∗ = f∗ + g∗, (f + g)∗ = (f∗ □ g∗)∗∗ = cl(f∗ □ g∗) (5.18)
valid due to the convexity and properness of f∗ □ g∗, we define the function
ϕ(x∗) := (f∗ □ g∗)(x∗) − ⟨x∗, x⟩ for all x∗ ∈ X∗
with its closure described by (cl ϕ)(x∗) = cl(f∗ □ g∗)(x∗) − ⟨x∗, x⟩ on X∗. Denoting
γ := −(f + g)(x), it follows from (5.17) and (5.18) that (cl ϕ)(x∗) ≤ γ. Since

{x∗ ∈ X∗ | (cl ϕ)(x∗) ≤ γ} = ⋂_{ε>0} cl{x∗ ∈ X∗ | ϕ(x∗) ≤ γ + ε},

for any ε > 0 we get the inclusion

x∗ ∈ cl{x∗ ∈ X∗ | ϕ(x∗) ≤ γ + ε}.
This implies in turn that whenever ϕ(x∗) ≤ γ + ε/2, it follows that

ϕ(x∗) − γ = inf_{x∗ = x∗1 + x∗2} [ f∗(x∗1) + g∗(x∗2) − ⟨x∗1, x⟩ − ⟨x∗2, x⟩ + f(x) + g(x) ]
= inf_{x∗ = x∗1 + x∗2} [ ( f∗(x∗1) − ⟨x∗1, x⟩ + f(x) ) + ( g∗(x∗2) − ⟨x∗2, x⟩ + g(x) ) ] ≤ ε/2.
5.2 Calculus Rules for ε-Subgradients 327

Thus we find x∗1 and x∗2 such that x∗ = x∗1 + x∗2 and

( f∗(x∗1) − ⟨x∗1, x⟩ + f(x) ) + ( g∗(x∗2) − ⟨x∗2, x⟩ + g(x) ) < ε.
Since each of the bracketed terms above is nonnegative, we conclude that
x∗1 ∈ ∂ε f(x) and x∗2 ∈ ∂ε g(x). Hence

{x∗ ∈ X∗ | ϕ(x∗) ≤ γ + ε/2} ⊂ ∂ε f(x) + ∂ε g(x),
which completes the proof of the theorem. □
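As a concrete finite-dimensional illustration of the elementary inclusion ∂ε f(x) + ∂ε g(x) ⊂ ∂2ε(f + g)(x) used at the start of the proof, one can test it numerically. The sketch below is not from the text: for the specific choices f(x) = |x| and g(x) = x² at x = 0, the ε-subdifferentials have the closed forms ∂ε f(0) = [−1, 1] and ∂ε g(0) = [−2√ε, 2√ε] (computed directly from the defining inequality), and the 2ε-subgradient inequality is then checked on a grid.

```python
import numpy as np

# Hypothetical numerical check (not part of the proof): for f(x) = |x| and
# g(x) = x**2 at xbar = 0 one has the closed forms
#   d_eps f(0) = [-1, 1]  and  d_eps g(0) = [-2*sqrt(eps), 2*sqrt(eps)],
# and every sum v1 + v2 must be a 2*eps-subgradient of f + g at 0.
eps = 0.1
ys = np.linspace(-50.0, 50.0, 2001)
v1s = np.linspace(-1.0, 1.0, 21)                        # samples from d_eps f(0)
v2s = np.linspace(-2*np.sqrt(eps), 2*np.sqrt(eps), 21)  # samples from d_eps g(0)
for v1 in v1s:
    for v2 in v2s:
        v = v1 + v2
        # v in d_{2eps}(f+g)(0) means v*y <= |y| + y**2 + 2*eps for all y
        assert np.all(v*ys <= np.abs(ys) + ys**2 + 2*eps + 1e-12)
print("inclusion verified on the sample grid")
```

This only probes finitely many subgradients and test points, but any violation would disprove the inclusion.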

As an immediate consequence of Theorem 5.22, we arrive at the asymptotic


intersection rule for ε-normals to convex sets in topological vector spaces.
Corollary 5.23 Let Ω1 and Ω2 be closed and convex subsets of an LCTV
space X, and let x ∈ Ω1 ∩ Ω2. Then we have the normal cone intersection
rule

N(x; Ω1 ∩ Ω2) = ⋂_{ε>0} cl( Nε(x; Ω1) + Nε(x; Ω2) ),

where the closure cl is taken with respect to the weak∗ topology on X∗.

Proof. Take f (x) := δ(x; Ω1 ) and g(x) := δ(x; Ω2 ) in Theorem 5.22. 

Our next result is the following asymptotic chain rule for ε-subgradients of
compositions in topological vector spaces without any qualification conditions.

Theorem 5.24 Let A : X → Y be a linear continuous mapping between
LCTV spaces, and let f : Y → R be a convex function. Then for any x ∈ X
with Ax ∈ dom(f) we have the asymptotic ε-subdifferential chain rule

∂(f ◦ A)(x) = ⋂_{ε>0} cl( A∗∂ε f(Ax) ), (5.19)

where the closure cl is taken with respect to the weak∗ topology of X∗.

Proof. Define the proper, l.s.c., and convex functions


 
f1(x, y) := f(y) and f2(x, y) := δ( (x, y); gph(A) ) for (x, y) ∈ X × Y
and consider their sum g := f1 + f2 on X × Y with the domain gph(A) ∩ (X ×
dom(f )). Thus for any x ∈ X with Ax ∈ dom(f ) we have (x, Ax) ∈ dom(g),
which ensures the equivalence
(x∗, y∗) ∈ ∂g(x, Ax) if and only if x∗ + A∗y∗ ∈ ∂(f ◦ A)(x).
To verify the inclusion “⊂” in (5.19), pick any x∗ ∈ ∂(f ◦ A)(x) and deduce
from the above equivalence that (x∗, 0) ∈ ∂(f1 + f2)(x, Ax). Applying the asymp-
totic sum rule from Theorem 5.22 to this sum, for any ε > 0 we get

(x∗, 0) ∈ cl( ∂ε f1(x, Ax) + ∂ε f2(x, Ax) ). (5.20)


It easily follows from the definitions of f1 and f2 that
∂ε f1(x, Ax) = {0} × ∂ε f(Ax) and

∂ε f2(x, Ax) = { (u∗, y∗) ∈ X∗ × Y∗ | u∗ + A∗y∗ = 0 }.
This brings us therefore to the equivalence

(u∗, v∗) ∈ ∂ε f1(x, Ax) + ∂ε f2(x, Ax) ⇐⇒ u∗ ∈ A∗( ∂ε f(Ax) − v∗ ).
Combining this with (5.20) gives us a net {(u∗α, v∗α)} with u∗α ∈ A∗( ∂ε f(Ax) −
v∗α ) such that u∗α → x∗ and v∗α → 0 in the weak∗ topology. By passing to the
limit along this net with taking into account that the operator A∗ is weak∗
continuous, we conclude that x∗ ∈ cl( A∗∂ε f(Ax) ), which therefore verifies the
inclusion “⊂” in (5.19).
To prove the opposite inclusion in (5.19), observe from the definitions that
A∗ ∂ε f (Ax) ⊂ ∂ε (f ◦ A)(x) for each ε > 0, (5.21)
where the set on the right-hand side is weak∗ closed. This clearly yields the
inclusion “⊃” in (5.19) and thus completes the proof of the theorem. 
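The elementary inclusion (5.21) behind the “⊃” part can also be probed numerically in finite dimensions. The sketch below is illustrative and not from the text: it takes A : R² → R with A(x) = x1 + x2 and f = |·| at x̄ = (0, 0), so that ∂ε f(0) = [−1, 1] and A∗y∗ = (y∗, y∗), and checks the ε-subgradient inequality for f ◦ A on a grid.

```python
import numpy as np

# Hypothetical finite-dimensional check of the inclusion (5.21):
#   A* d_eps f(A xbar)  subset  d_eps (f o A)(xbar).
# Here A : R^2 -> R is A(x) = x1 + x2, f = |.|, xbar = (0, 0), so that
# d_eps f(0) = [-1, 1] and the adjoint acts as A* y = (y, y).
eps = 0.05
grid = np.linspace(-10.0, 10.0, 201)
X1, X2 = np.meshgrid(grid, grid)
for ystar in np.linspace(-1.0, 1.0, 11):   # y* ranges over d_eps f(0)
    u1, u2 = ystar, ystar                  # u* = A* y*
    # u* in d_eps (f o A)(0) means <u*, x> <= |x1 + x2| + eps for all x
    assert np.all(u1*X1 + u2*X2 <= np.abs(X1 + X2) + eps + 1e-12)
print("A* d_eps f(A xbar) lies inside d_eps (f o A)(xbar) on the grid")
```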
Some other results of the asymptotic calculus for ε-subgradients are dis-
cussed in the exercises and commentaries in Sections 5.8 and 5.9.

5.3 Mean Value Theorems for Convex Functions


The (Lagrange) mean value theorem for differentiable functions is one of the
central results of classical real analysis. It gives an exact (not limiting) rela-
tionship between the function in question and its derivative. In this section we
establish subdifferential counterparts of the mean value theorem for convex
continuous functions and also develop approximate versions of the mean value
theorem for convex functions that are merely lower semicontinuous.

5.3.1 Mean Value Theorem for Continuous Functions

In this subsection we consider the case of continuous convex functions on topo-


logical vector spaces, which is essentially different from the discontinuous case
examined in the next subsection. The first lemma presents a direct extension
of the classical Lagrange theorem for smooth functions on the real line to the
case of convex continuous functions without differentiability assumptions; see
Figure 5.1 for a geometric illustration of the proof.
Lemma 5.25 Let f : R → R be convex, let a < b, and let the interval [a, b]
belong to the domain of f, which is an open interval in R. Then there exists
an intermediate point c ∈ (a, b) such that

( f(b) − f(a) ) / (b − a) ∈ ∂f(c). (5.22)

Proof. Define the function g : R → R by

g(x) := f(x) − [ ( (f(b) − f(a)) / (b − a) ) (x − a) + f(a) ],
for which g(a) = g(b). This implies, by the convexity of f on its open domain
and hence the continuity of g on [a, b], that g achieves an absolute minimum
on [a, b] at some c ∈ (a, b). The function h : R → R defined by

h(x) := −[ ( (f(b) − f(a)) / (b − a) ) (x − a) + f(a) ], x ∈ R,
is obviously convex and differentiable at c, and hence

∂h(c) = { h′(c) } = { −(f(b) − f(a)) / (b − a) }.
Employing now the subdifferential Fermat rule and the subdifferential sum
rule for convex functions presented in Chapter 3 yields

0 ∈ ∂g(c) = ∂f(c) − { (f(b) − f(a)) / (b − a) },
which verifies (5.22) and thus completes the proof. □

Fig. 5.1. Subdifferential mean value theorem
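For a nonsmooth instance of Lemma 5.25, consider f(x) = |x| on [a, b] = [−1, 2]. The mean value slope is s = (f(b) − f(a))/(b − a) = 1/3, and c = 0 is an intermediate point with s ∈ ∂f(0) = [−1, 1]. The short check below (an illustration, not part of the text) verifies the subgradient inequality for this slope on a grid.

```python
import numpy as np

# Illustrative check of the subdifferential mean value theorem for the
# convex nonsmooth function f(x) = |x| on [a, b] = [-1, 2]: the slope
# s = (f(b) - f(a)) / (b - a) = 1/3 is a subgradient at c = 0, since the
# subgradient inequality f(y) >= f(c) + s*(y - c) holds for every y.
f = abs
a, b = -1.0, 2.0
s = (f(b) - f(a)) / (b - a)     # = 1/3
c = 0.0                         # intermediate point with s in df(c) = [-1, 1]
ys = np.linspace(-100.0, 100.0, 4001)
assert np.all(np.abs(ys) >= f(c) + s * (ys - c) - 1e-12)
print("slope", s, "is a subgradient of |.| at c =", c)
```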

To proceed further with establishing a version of the mean value theorem


for convex continuous functions on general topological vector spaces, we need
yet another observation based on the subdifferential chain rule.

Lemma 5.26 Let f : X → R be a convex function on a topological vector


space X, where the domain Ω := dom(f ) is a nonempty open convex subset

of X, and where f is continuous on Ω. Given a, b ∈ Ω with a ≠ b, define a


function on R by
 
ϕ(t) := f( tb + (1 − t)a ) for all t ∈ R. (5.23)
Then for any number t0 ∈ (0, 1) we have

∂ϕ(t0) = { ⟨x∗, b − a⟩ | x∗ ∈ ∂f(c0) } with c0 := t0 b + (1 − t0)a.

Proof. Define the linear/affine mappings A : R → X and B : R → X by

A(t) := (b − a)t and B(t) := tb + (1 − t)a = t(b − a) + a = A(t) + a, t ∈ R.
We clearly get that ϕ(t) = (f ◦ B)(t) and that the adjoint operator to A is
given by A∗x∗ = ⟨x∗, b − a⟩ for all x∗ ∈ X∗. Applying the subdifferential chain
rule from Theorem 3.55 to the above composite representation of ϕ yields

∂ϕ(t0) = A∗∂f(c0) = { ⟨x∗, b − a⟩ | x∗ ∈ ∂f(c0) },
which therefore verifies the claimed result. □

Combining now the results of Lemma 5.25 and Lemma 5.26 gives us the
following mean value theorem for convex continuous functions on topological
vector spaces.

Theorem 5.27 In the setting of Lemma 5.26 there exists c ∈ (a, b) such that

f(b) − f(a) ∈ ⟨∂f(c), b − a⟩ := { ⟨x∗, b − a⟩ | x∗ ∈ ∂f(c) }.

Proof. Take the function ϕ : R → R from (5.23) and observe that ϕ(0) = f(a)
and ϕ(1) = f(b). Applying first Lemma 5.25 and then Lemma 5.26 to ϕ gives
us a number t0 ∈ (0, 1) for which

f(b) − f(a) = ϕ(1) − ϕ(0) ∈ ∂ϕ(t0) = { ⟨x∗, b − a⟩ | x∗ ∈ ∂f(c) }
with c := t0 b + (1 − t0)a, and thus we are done with the proof. □

5.3.2 Approximate Mean Value Theorem

To deal with advanced issues of convex and variational analysis and their
broad applications to constrained optimization and other areas, we need to
consider not only continuous but lower semicontinuous extended-real-valued
functions. The versions of the mean value theorem given below are largely
different from the classical one and its subdifferential counterparts. The main
feature of them is their approximate structures, and thus such results are
unified under the name of approximate mean value theorems.
To proceed, we present first the next simple albeit useful lemma involving
the distance function and the projection operator.

Lemma 5.28 Let X be a normed space, and let a, b ∈ X with a ≠ b. For any
x ∈ X, v∗ ∈ ∂d(x; [a, b]), and w ∈ Π(x; [a, b]) we have the estimate ‖v∗‖ ≤ 1
together with the following properties:
(a) ⟨v∗, u − w⟩ ≤ 0 for all u ∈ [a, b].
(b) ⟨v∗, b − a⟩ ≤ 0 if w ≠ b.
(c) ⟨v∗, b − a⟩ = 0 if w ≠ b and w ≠ a.
(d) ⟨v∗, w − x⟩ ≤ −‖w − x‖.

Proof. It follows from Proposition 3.77 that

∂d( x; [a, b] ) = ∂p(x − w) ∩ N(w; [a, b]),
where p(x) := ‖x‖ for x ∈ X. Since v∗ ∈ N(w; [a, b]), we have (a) right away.
Taking w ∈ [a, b), there exists t ∈ (0, 1] such that w = ta + (1 − t)b. Hence
⟨v∗, b − w⟩ = ⟨v∗, b − ta − (1 − t)b⟩ = t ⟨v∗, b − a⟩ ≤ 0,
which implies that ⟨v∗, b − a⟩ ≤ 0 and hence verifies (b).
To verify the next property (c) where w ≠ b and w ≠ a, we pick t ∈ (0, 1)
with w = ta + (1 − t)b. Then
⟨v∗, a − w⟩ = ⟨v∗, a − (ta + (1 − t)b)⟩ = (1 − t) ⟨v∗, a − b⟩ ≤ 0.
It follows that ⟨v∗, a − b⟩ ≤ 0, which yields ⟨v∗, b − a⟩ = 0 and so justifies (c).
Finally, we readily get the relationships

⟨v∗, w − x⟩ ≤ d(w; [a, b]) − d(x; [a, b]) = −d(x; [a, b]) = −‖x − w‖,
giving us (d) and thus completing the proof of the lemma. □
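A small planar configuration makes Lemma 5.28 tangible. The sketch below is illustrative (the subgradient is chosen explicitly, which is valid because d(·; [a, b]) is differentiable outside the segment with gradient (x − w)/‖x − w‖): it builds the projection onto a segment in R² and verifies properties (a), (c), and (d).

```python
import numpy as np

# Illustrative planar check of Lemma 5.28 (assumed setup, not from the text):
# for x outside the segment [a, b], v* = (x - w)/||x - w|| is the gradient of
# d(.; [a, b]) at x, where w is the unique projection of x onto the segment.
def proj_segment(x, a, b):
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return a + t * ab

a, b = np.array([-1.0, 0.0]), np.array([3.0, 0.0])
x = np.array([0.0, 2.0])
w = proj_segment(x, a, b)                    # projection, here w = (0, 0)
v = (x - w) / np.linalg.norm(x - w)          # v* in dd(x; [a, b])
assert np.linalg.norm(v) <= 1.0 + 1e-12              # estimate ||v*|| <= 1
for t in np.linspace(0.0, 1.0, 50):                  # property (a)
    u = a + t * (b - a)
    assert np.dot(v, u - w) <= 1e-12
assert abs(np.dot(v, b - a)) <= 1e-12                # property (c): w interior
assert np.dot(v, w - x) <= -np.linalg.norm(w - x) + 1e-12   # property (d)
print("properties (a), (c), (d) hold in this configuration")
```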
Now we are ready to derive two major results of the approximate mean
value type for l.s.c. convex functions on Banach spaces. Recall that the symbol
xk →g c signifies that xk → c with g(xk) → g(c) as k → ∞.

Theorem 5.29 Let X be a Banach space, and let g : X → R be a proper,
l.s.c., and convex function. Fix a, b ∈ X with a ∈ dom(g), a ≠ b, and g(a) ≤
g(b), and let the function g achieve its absolute minimum on [a, b] at c ∈
[a, b). Then there exist a sequence {xk} ⊂ X with xk →g c and a sequence of
subgradients u∗k ∈ ∂g(xk) satisfying the conditions
0 ≤ lim inf_{k→∞} ⟨u∗k, b − xk⟩,
0 ≤ lim inf_{k→∞} ⟨u∗k, b − a⟩,
0 ≤ lim inf_{k→∞} ⟨u∗k, c − xk⟩.

If in addition c ∈ (a, b) and g(a) = g(b), then

lim_{k→∞} ⟨u∗k, c − xk⟩ = 0.

Proof. We know from the Weierstrass theorem that g attains its absolute
minimum on the compact interval [a, b] at some point c ∈ [a, b]. Since g(a) ≤
g(b), we can always suppose that c ∈ [a, b). The l.s.c. property of g ensures
the existence of r > 0 for which

γ := inf_{x∈Ω} g(x) > −∞ with Ω := [a, b] + rB.

Define further the extended-real-valued function


h(x) := g(x) + δ(x; Ω) for all x ∈ X,
which is clearly proper, l.s.c., convex, and bounded from below on X. For any
k ∈ N we find a real number rk ∈ (0, r) such that

g(x) ≥ g(c) − 1/k² whenever x ∈ Ωk := [a, b] + rk B.
Choose tk ≥ k with γ + tk rk ≥ g(c) − 1/k² and consider the function

hk(x) := h(x) + tk d(x; [a, b]) = g(x) + δ(x; Ω) + tk d(x; [a, b]) on X.
Then for all k ∈ N we have the upper estimate

hk(c) = g(c) ≤ inf_{x∈X} hk(x) + 1/k².
To check it, consider first x ∈ Ωk and get hk(c) = g(c) ≤ g(x) + 1/k² ≤ hk(x) + 1/k².
In the case where x ∉ Ωk, we have d(x; [a, b]) ≥ rk, and so

hk(x) ≥ γ + tk rk ≥ g(c) − 1/k² = hk(c) − 1/k², k ∈ N.
Since each function hk is obviously l.s.c., we employ the subdifferential vari-
ational principle from Theorem 5.3 to find a point xk ∈ X and a subgradient
z∗k ∈ ∂hk(xk) satisfying the conditions

‖xk − c‖ ≤ 1/k, hk(xk) ≤ hk(c) = g(c), and ‖z∗k‖ ≤ 1/k.
Having xk ∈ int(Ω) when k is sufficiently large and supposing without loss of
generality that this holds for all k ∈ N allow us to apply the subdifferential
sum rule from Theorem 3.48 to hk and thus get

z∗k = u∗k + tk v∗k whenever k ∈ N,
where u∗k ∈ ∂g(xk) and v∗k ∈ ∂d(xk; [a, b]). Fix further projections wk ∈
Π(xk; [a, b]), and observe that xk →g c as k → ∞. It follows from the general
part of Lemma 5.28 that ‖v∗k‖ ≤ 1 and

⟨v∗k, b − xk⟩ ≤ d(b; [a, b]) − d(xk; [a, b]) ≤ 0 for k ∈ N.

Furthermore, assertion (a) of Lemma 5.28 tells us that

⟨v∗k, b − wk⟩ ≤ 0 for all k ∈ N.
Since wk → c ≠ b as k → ∞, we can assume that wk ≠ b whenever k ∈ N and
deduce from assertion (b) of Lemma 5.28 that ⟨v∗k, b − a⟩ ≤ 0. Now we arrive
at the relationships
⟨u∗k, b − xk⟩ = ⟨z∗k, b − xk⟩ − tk ⟨v∗k, b − xk⟩, (5.24)
|⟨z∗k, b − xk⟩| ≤ ‖z∗k‖ · ‖b − xk‖ ≤ (1/k) ‖b − xk‖ → 0 as k → ∞.
In addition, assertions (a) and (d) of Lemma 5.28 ensure that
⟨v∗k, b − xk⟩ = ⟨v∗k, b − wk⟩ + ⟨v∗k, wk − xk⟩
≤ ⟨v∗k, wk − xk⟩ ≤ −‖wk − xk‖ ≤ 0.
It follows directly from (5.24) that

lim inf_{k→∞} ⟨u∗k, b − xk⟩ ≥ 0.

Similarly we conclude that

lim inf_{k→∞} ⟨u∗k, b − a⟩ ≥ 0.

Arguing in the same way brings us to the estimates

⟨v∗k, c − xk⟩ = ⟨v∗k, c − wk⟩ + ⟨v∗k, wk − xk⟩
≤ ⟨v∗k, wk − xk⟩ ≤ d(wk; [a, b]) − d(xk; [a, b]) ≤ 0,
which in turn imply that
⟨u∗k, c − xk⟩ = ⟨z∗k, c − xk⟩ − tk ⟨v∗k, c − xk⟩
≥ ⟨z∗k, c − xk⟩ ≥ −‖z∗k‖ · ‖c − xk‖ ≥ −(1/k) ‖c − xk‖,
and thus verify all but the last conclusion of the theorem.
To justify this conclusion under the assumptions therein, we employ
assertion (c) of Lemma 5.28 to get ⟨v∗k, b − a⟩ = 0. The latter yields
lim_{k→∞} ⟨u∗k, c − xk⟩ = 0 and thus completes the proof of the theorem. □

The final result of this subsection is the main version of the approximate
mean value theorem; see the commentaries to this chapter.

Theorem 5.30 Let X be a Banach space, and let f : X → R be a proper,
l.s.c., and convex function. Fix points a, b ∈ X with a ∈ dom(f) and a ≠ b.
Then for any β ∈ R with β ≤ f(b) there exist c ∈ [a, b) and a sequence
{xk} ⊂ X with xk →f c as k → ∞ and x∗k ∈ ∂f(xk) such that

lim inf_{k→∞} ⟨x∗k, b − a⟩ ≥ β − f(a) and

( ‖b − c‖ / ‖b − a‖ ) ( β − f(a) ) ≤ lim inf_{k→∞} ⟨x∗k, b − xk⟩.

If in addition β = f(b) and c ∈ (a, b), then we have

lim_{k→∞} ⟨x∗k, c − xk⟩ = 0.

Proof. Pick x∗ ∈ X∗ satisfying

⟨x∗, b − a⟩ = f(b) − f(a)
and then define the function
g(x) := f(x) − ⟨x∗, x⟩, x ∈ X, (5.25)
for which g(a) = g(b). Choose c ∈ [a, b), {xk}, and u∗k as in Theorem 5.29
applied to g. Then we get u∗k = x∗k − x∗, where x∗k ∈ ∂f(xk). It follows from
Theorem 5.29 that
0 ≤ lim inf_{k→∞} ⟨u∗k, b − a⟩.

This tells us therefore that

0 ≤ lim inf_{k→∞} ⟨x∗k − x∗, b − a⟩ = lim inf_{k→∞} ⟨x∗k, b − a⟩ − ( f(b) − f(a) ),

which implies in turn the inequality

lim inf_{k→∞} ⟨x∗k, b − a⟩ ≥ f(b) − f(a).

It also follows from Theorem 5.29 that

0 ≤ lim inf_{k→∞} ⟨u∗k, b − xk⟩ and 0 ≤ lim inf_{k→∞} ⟨u∗k, c − xk⟩,

and we arrive therefore at the relationships

⟨u∗k, b − xk⟩ = ⟨u∗k, b − c⟩ + ⟨u∗k, c − xk⟩,

lim inf_{k→∞} ⟨u∗k, b − xk⟩ ≥ lim inf_{k→∞} ⟨u∗k, b − c⟩ + lim inf_{k→∞} ⟨u∗k, c − xk⟩

≥ lim inf_{k→∞} ⟨u∗k, b − c⟩.

Remembering that u∗k = x∗k − x∗ and xk →g c brings us to
lim inf_{k→∞} ⟨x∗k, b − xk⟩ ≥ lim inf_{k→∞} ⟨x∗k, b − c⟩.

Since c ∈ [a, b), we have c = ta + (1 − t)b for some t ∈ (0, 1]; hence b − c = t(b − a)
and t = ‖b − c‖ / ‖b − a‖. Then

lim inf_{k→∞} ⟨x∗k, b − xk⟩ ≥ lim inf_{k→∞} ⟨x∗k, t(b − a)⟩ ≥ ( ‖b − c‖ / ‖b − a‖ ) ( f(b) − f(a) ).
The last statement of the theorem in the case where β = f(b), c ∈ (a, b), and
g(a) = g(b) for the function g in (5.25) follows directly from Theorem 5.29.
To complete the proof of the theorem, it remains to examine the case
where β < f(b). In this case we choose x∗ ∈ X∗ such that
⟨x∗, b − a⟩ = β − f(a)
and consider the function g(x) from (5.25). Then g(a) < g(b), which allows
us to proceed as above with the usage of Theorem 5.29. □

5.4 Maximal Monotonicity of Subgradient Mappings


This section presents applications of the subdifferential mean value theorems,
in both cases of continuous and lower semicontinuous convex functions, to
one of the most important issues of convex analysis that concerns the max-
imal monotonicity of subgradient mappings generated by such functions on
topological vector spaces and Banach spaces. In particular, we prove in this
way the fundamental theorem on the maximal monotonicity of the subgradi-
ent mapping associated with a proper, l.s.c., and convex function defined on a
Banach space. The maximal monotonicity of convex subdifferential mappings
plays a highly important role in both theoretical and computational aspects of
optimization and its applications, some of which are considered in this book.

Definition 5.31 Let X be a topological vector space, and let F : X ⇉ X∗ be
a set-valued mapping between X and its topological dual X∗. We say that the
mapping F is (globally) monotone on X if
⟨x∗ − u∗, x − u⟩ ≥ 0 for all x, u ∈ dom(F), x∗ ∈ F(x), and u∗ ∈ F(u).
A monotone mapping F : X ⇉ X∗ is maximal monotone if the graph of F
is not properly contained in the graph of any monotone mapping G : X ⇉ X∗.
The next lemma presents convenient characterizations of maximal mono-
tonicity that are used below for deriving the main subdifferential results.

Lemma 5.32 Let F : X ⇉ X∗ be a monotone mapping defined on a topolog-
ical vector space X. Then the following properties are equivalent:
(a) F is maximal monotone.
(b) Whenever u ∈ X and u∗ ∈ X∗ we have the implication

[ ⟨u∗ − x∗, u − x⟩ ≥ 0 for all (x, x∗) ∈ gph(F) ] =⇒ [ u∗ ∈ F(u) ].

(c) Whenever u ∈ X and u∗ ∈ X∗ we have the implication

[ u∗ ∉ F(u) ] =⇒ [ ⟨u∗ − x∗, u − x⟩ < 0 for some (x, x∗) ∈ gph(F) ].

Proof. To verify implication (a)=⇒(b), suppose that F is maximal monotone
and pick any u ∈ X and u∗ ∈ X∗ satisfying
⟨u∗ − x∗, u − x⟩ ≥ 0 for all (x, x∗) ∈ gph(F).
We need to show that u∗ ∈ F(u). Suppose on the contrary that u∗ ∉ F(u),
i.e., (u, u∗) ∉ gph(F), and then define the set-valued mapping

G(x) := F(x) if x ≠ u and G(x) := F(u) ∪ {u∗} if x = u.

It is easy to check that G is monotone with the graph gph(F) being a proper
subset of gph(G), which gives us a contradiction.
Assume now that (b) is satisfied. Suppose on the contrary that F is not
maximal monotone and find a monotone mapping G : X ⇉ X∗ such that
gph(F) is a proper subset of gph(G). Choose (u, u∗) ∈ gph(G) with (u, u∗) ∉
gph(F). Then for any (x, x∗) ∈ gph(F) we have (x, x∗) ∈ gph(G), which
readily implies the inequality
⟨x∗ − u∗, x − u⟩ ≥ 0
and thus contradicts the choice of u∗ ∉ F(u). This verifies that (b)=⇒(a).
The equivalence between (b) and (c) is obvious. 

Now we consider set-valued mappings generated by subgradients of convex


functions. The following proposition simply follows from the definitions.

Proposition 5.33 Let X be a topological vector space, and let f : X → R be
a proper convex function. Then the set-valued mapping F : X ⇉ X∗ defined
by F(x) := ∂f(x) for all x ∈ X is monotone.

Proof. Fix any x, u ∈ dom(F) with x∗ ∈ ∂f(x) and u∗ ∈ ∂f(u). By the
subdifferential definition we have
⟨x∗, u − x⟩ ≤ f(u) − f(x) and ⟨u∗, x − u⟩ ≤ f(x) − f(u).
Adding these two inequalities gives us the claimed conclusion. □
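The two-line argument above can be observed numerically. The sketch below is illustrative (the closed-form subdifferential is an assumption computed for this specific function, not from the text): it samples subgradient pairs of the convex function f(x) = |x| + x² on R and checks the monotonicity inequality from Definition 5.31.

```python
import numpy as np

def subgrads(x, n=5):
    # Full subdifferential of f(x) = |x| + x**2 at x (assumed closed form):
    # {2x + sign(x)} for x != 0 and 2x + [-1, 1] at x = 0.
    if x == 0.0:
        return 2*x + np.linspace(-1.0, 1.0, n)
    return np.array([2*x + np.sign(x)])

pts = np.linspace(-2.0, 2.0, 9)   # sample points, including the kink x = 0
for x in pts:
    for u in pts:
        for xs in subgrads(x):
            for us in subgrads(u):
                # monotonicity: <x* - u*, x - u> >= 0
                assert (xs - us) * (x - u) >= -1e-12
print("subgradient mapping is monotone on the sample")
```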

The first theorem on the maximal monotonicity of subgradient mappings


concerns continuous functions defined on topological vector spaces.
Theorem 5.34 Let f : X → R be a continuous convex function defined on
a topological vector space X. Then the subgradient mapping ∂f : X ⇉ X∗ is
maximal monotone on X.

Proof. Having in hand Proposition 5.33, we employ Lemma 5.32 to verify the
maximal monotonicity of F = ∂f. Fix any (u, u∗) ∈ X × X∗ with u∗ ∉ ∂f(u).
By Lemma 5.32 it suffices to show that there exists a pair (x, x∗) ∈ X × X∗
with x∗ ∈ ∂f(x) such that
⟨x∗ − u∗, x − u⟩ < 0. (5.26)
To verify this, observe by the choice of u∗ ∉ ∂f(u) that there is z ∈ X with
⟨u∗, z − u⟩ > f(z) − f(u).
Taking into account that z ≠ u, we apply the subdifferential mean value result
from Theorem 5.27 to find a vector xt := tz + (1 − t)u with some t ∈ (0, 1)
and a subgradient x∗t ∈ ∂f(xt) such that
f(z) − f(u) = ⟨x∗t, z − u⟩.
This immediately implies that
⟨u∗, z − u⟩ > f(z) − f(u) = ⟨x∗t, z − u⟩,
and hence ⟨x∗t − u∗, z − u⟩ < 0. It shows in turn that
⟨x∗t − u∗, xt − u⟩ = ⟨x∗t − u∗, t(z − u)⟩ = t ⟨x∗t − u∗, z − u⟩ < 0.
Thus the pair (x, x∗) := (xt, x∗t) satisfies (5.26), which completes the proof of the theorem. □
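The characterization in Lemma 5.32(c) that drives this proof can be exercised on a one-dimensional example. The sketch below is illustrative and not from the text: for f(x) = |x| and a pair (u, u∗) = (0, 2) outside the graph of ∂f, it searches the graph for a point violating monotonicity, exactly what the mean value argument above guarantees to exist.

```python
import numpy as np

# Illustrative check of Lemma 5.32(c) for F = d(|.|): for (u, u*) = (0, 2),
# which lies outside gph(dF) since df(0) = [-1, 1], there must exist a graph
# point (x, x*) with <x* - u*, x - u> < 0.
u, us = 0.0, 2.0
assert not (-1.0 <= us <= 1.0) or us > 1.0   # u* = 2 is not in df(0) = [-1, 1]
found = False
for x in np.linspace(-2.0, 2.0, 81):
    xs = np.sign(x) if x != 0 else 0.0       # one subgradient of |.| at x
    if (xs - us) * (x - u) < 0:
        found = True
        break
assert found
print("violating pair found at x =", x, "with x* =", xs)
```

Any x > 0 works here, since (1 − 2)(x − 0) = −x < 0.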

The majority of applications of the subdifferential maximal monotonicity


require dealing with lower semicontinuous functions. The next theorem pro-
vides a desired result in the framework of Banach spaces. The proof of this
major result is based on the usage of the approximate mean value theorem
instead of the conventional one as in the proof of Theorem 5.34.
Theorem 5.35 Let f : X → R be a proper, l.s.c., and convex function on a
Banach space X. Then the mapping ∂f : X ⇉ X∗ is maximal monotone.
Proof. Employing again Lemma 5.32, we are going to show that for any
(u, u∗) ∈ X × X∗ with u∗ ∉ ∂f(u) there exists a pair (x, x∗) ∈ X × X∗ with
x∗ ∈ ∂f(x) for which (5.26) holds. Indeed, as in the proof of Theorem 5.34
we find z ≠ u such that ⟨u∗, z − u⟩ > f(z) − f(u), which yields

f(z) − ⟨u∗, z⟩ < f(u) − ⟨u∗, u⟩.
Note that the above claim holds in both cases: u ∈ dom(f) and u ∉ dom(f).
Define the function g(x) := f(x) − ⟨u∗, x⟩ for x ∈ X and observe that g(z) <
g(u). Fix β ∈ R with g(z) < β < g(u). Observing that z ∈ dom(f) = dom(g),
the main approximate mean value result from Theorem 5.30 gives us a point
c ∈ [z, u) and a sequence {xk} ⊂ X such that xk →g c and x∗k ∈ ∂g(xk) =
∂f(xk) − u∗ ensuring the estimate

0 < ( ‖u − c‖ / ‖u − z‖ ) ( β − g(z) ) ≤ lim inf_{k→∞} ⟨x∗k, u − xk⟩.

This allows us to choose k ∈ N so that 0 < ⟨x∗k, u − xk⟩. Then we have
x∗k = x̃∗k − u∗ for some x̃∗k ∈ ∂f(xk) and arrive therefore at
0 < ⟨x∗k, u − xk⟩ = ⟨x̃∗k − u∗, u − xk⟩.
Denoting finally x := xk and x∗ := x̃∗k ensures the fulfillment of (5.26) and
thus completes the proof of the theorem. □

We conclude this section with some discussions highlighting the equiva-


lence between the maximal monotonicity of set-valued operators and their
representation via subgradient mappings generated by proper, l.s.c., convex
functions defined on Banach spaces.

Remark 5.36 The results formulated below shed additional light on the rela-
tionships between the convexity of an extended-real-valued function and the
maximal monotonicity of its subgradient mapping:
(a) The assertion of Theorem 5.35 admits the following reverse statement in
Banach spaces. Namely, if a set-valued mapping F : X ⇉ X∗ is maximal
monotone, then there exists a proper, l.s.c., and convex function f : X →
R such that F = ∂f . Furthermore, F determines f uniquely up to an
additive constant; see Exercise 5.109 with the hints and references therein
and also the corresponding commentaries in Section 5.9. Summarizing, we
have that a proper l.s.c. function f : X → R is convex if and only if ∂f
is monotone, in which case ∂f is maximal monotone.
(b) In Chapter 7 we discuss major subgradient mappings generated by non-
convex Lipschitzian functions and show that their monotonicity is equiv-
alent to the convexity of these functions, in which case the subgradient
mapping is maximal monotone.

5.5 Subdifferential Characterizations of Differentiability


In this section we recall the concepts of Gâteaux and Fréchet differentiability
for real-valued functions. Then we study necessary and sufficient conditions
that ensure the Gâteaux differentiability and Fréchet differentiability of con-
vex functions based on their subgradients.

5.5.1 Gâteaux and Fréchet Differentiability

Let us start with the definition of Gâteaux differentiability for functions on


topological vector spaces. Although we formally address general extended-
real-valued functions, our differentiability analysis is provided at interior
points of their domains, i.e., concerns in fact locally real-valued functions.

Definition 5.37 Let f : X → R be a function defined on a topological vector
space X, and let x ∈ int(dom f). We say that f is Gâteaux differentiable
at x if there exists x∗ ∈ X∗ such that

lim_{t→0} ( f(x + tv) − f(x) ) / t = ⟨x∗, v⟩ for all v ∈ X.

The element x∗, being uniquely defined, is called the Gâteaux derivative
of f at x and is denoted by f′G(x).
Recall that the directional derivative of a function f : X → R at x ∈
dom(f) in the direction v ∈ X is given by

f′(x; v) := lim_{t↓0} ( f(x + tv) − f(x) ) / t
provided that the limit exists. The next proposition reveals the relationship
between directional differentiability and Gâteaux differentiability.
Proposition 5.38 Let X be a topological vector space, and let f : X → R be
a function with x ∈ int(dom(f )). Then the following properties are equivalent:
(a) f is Gâteaux differentiable at x.
(b) The directional derivative f  (x; v) is well-defined and is linear with respect
to directions, i.e., there is x∗ ∈ X ∗ with f  (x; v) = x∗ , v for all v ∈ X.

Proof. To verify implication (a)=⇒(b), suppose that f is Gâteaux differen-
tiable at x and let x∗ := f′G(x) ∈ X∗. Then we have

lim_{t→0} ( f(x + tv) − f(x) ) / t = ⟨x∗, v⟩ for all v ∈ X,

which tells us by definition that

f′(x; v) = lim_{t↓0} ( f(x + tv) − f(x) ) / t = ⟨x∗, v⟩ for all v ∈ X.
To verify the opposite implication (b)=⇒(a), take x∗ ∈ X∗ satisfying
f′(x; v) = ⟨x∗, v⟩ for all v ∈ X and pick any v ∈ X. Then we have
f′(x; −v) = ⟨x∗, −v⟩ = −⟨x∗, v⟩,
which clearly yields the equalities

f′−(x; v) = lim_{t↑0} ( f(x + tv) − f(x) ) / t
= −lim_{t↑0} ( f(x + (−t)(−v)) − f(x) ) / (−t)
= −lim_{t↓0} ( f(x + t(−v)) − f(x) ) / t
= −f′(x; −v) = ⟨x∗, v⟩.

In this way we arrive at the representation


f (x + tv) − f (x)
lim = x∗ , v for all v ∈ X
t→0 t
and thus justify the Gâteaux differentiability of f at x. 

Our further considerations address the notion of Fréchet differentiability


for functions given on normed spaces.
Definition 5.39 Let X be a normed space, and let f : X → R be a function
with x ∈ int(dom(f)). We say that f is Fréchet differentiable at x if
there exists x∗ ∈ X∗ such that

lim_{h→0} ( f(x + h) − f(x) − ⟨x∗, h⟩ ) / ‖h‖ = 0.

The element x∗, being uniquely defined, is called the Fréchet derivative
of f at x and is denoted by either f′F(x) or ∇f(x).
The next result establishes relationships between Gâteaux differentiability
and Fréchet differentiability of functions on normed spaces.

Theorem 5.40 Let X be a normed space, and let f : X → R be a function


with x ∈ int(dom(f )). If f is Fréchet differentiable at x, then it is Gâteaux
differentiable at this point. The converse implication holds if X = Rn and if
f is locally Lipschitz continuous around x.

Proof. Suppose that f is Fréchet differentiable at x and take x∗ := f′F(x) ∈
X∗. Then for any v ≠ 0 we have

lim_{t↓0} ( f(x + tv) − f(x) − ⟨x∗, tv⟩ ) / ( t ‖v‖ ) = 0.

This readily implies that

f′(x; v) = lim_{t↓0} ( f(x + tv) − f(x) ) / t = ⟨x∗, v⟩.

Since the latter obviously holds for v = 0, we conclude from Proposition 5.38
that the function f is Gâteaux differentiable at x.
To verify the converse implication, assume that f is Gâteaux differentiable
at x with x∗ := f′G(x) and show that

lim_{h→0} ( f(x + h) − f(x) − ⟨x∗, h⟩ ) / ‖h‖ = 0.
Suppose on the contrary that this fails and then find ε0 > 0 and a sequence
of hk → 0 with hk ≠ 0 for all k ∈ N for which

ε0 ≤ | f(x + hk) − f(x) − ⟨x∗, hk⟩ | / ‖hk‖. (5.27)

Denote tk := ‖hk‖ and vk := hk / ‖hk‖. Then tk → 0, and as X = Rn we get
without loss of generality that vk → v with ‖v‖ = 1. This yields the estimate

ε0 ‖vk‖ ≤ | f(x + tk vk) − f(x) − ⟨x∗, tk vk⟩ | / tk.

Taking now a Lipschitz constant ℓ ≥ 0 of f around x and employing the
triangle inequality ensure that

ε0 ‖vk‖ ≤ | f(x + tk vk) − f(x) − ⟨x∗, tk vk⟩ | / tk
≤ ( | f(x + tk vk) − f(x + tk v) | + | ⟨x∗, tk v⟩ − ⟨x∗, tk vk⟩ | ) / tk
+ | f(x + tk v) − f(x) − ⟨x∗, tk v⟩ | / tk
≤ ℓ ‖vk − v‖ + ‖x∗‖ · ‖vk − v‖
+ | f(x + tk v) − f(x) − ⟨x∗, tk v⟩ | / tk → 0 as k → ∞.

This contradicts (5.27) due to ε0 ‖vk‖ → ε0 and thus verifies the Fréchet
differentiability of f at x. □

Let us demonstrate that both assumptions on the Lipschitz continuity


of f around x and the finite dimension of X are essential for the validity
of the converse implication in Theorem 5.40. The first example deals with
non-Lipschitzian (actually discontinuous) functions on the plane.

Example 5.41 Fix α > 2 and consider the function f : R² → R defined by

f(x1, x2) := |x1|^α / x2 if x2 ≠ 0 and f(x1, x2) := 0 if x2 = 0.

Observe that f is Gâteaux differentiable at (0, 0). Indeed, for any direction
v = (v1, v2) ∈ R² with v2 ≠ 0 we have

lim_{t↓0} ( f(0 + tv1, 0 + tv2) − f(0, 0) ) / t = lim_{t↓0} t^{α−2} |v1|^α / v2 = 0.
It also follows from the construction of f that

lim_{t↓0} ( f(0 + tv1, 0 + tv2) − f(0, 0) ) / t = 0 if v2 = 0.

Thus Proposition 5.38 ensures that f is Gâteaux differentiable at (0, 0). It is
easy to see that the Fréchet differentiability of any function f at x yields its
continuity at this point. On the other hand, the function f under consideration
is not continuous at (0, 0). To check this, consider the sequence (1/k, 1/k^α) →
(0, 0) as k → ∞ and observe that f(1/k, 1/k^α) = 1 whenever k ∈ N, which shows
that f is not Fréchet differentiable at (0, 0).
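Both claims of this example are easy to reproduce numerically. The sketch below is illustrative: with α = 3 it checks that the directional difference quotients at the origin are small for a few directions, while f stays equal to 1 along the curve x2 = x1^α approaching the origin.

```python
# Illustrative numerical companion to Example 5.41 with alpha = 3 (any
# alpha > 2 behaves the same way).
alpha = 3.0

def f(x1, x2):
    return abs(x1)**alpha / x2 if x2 != 0 else 0.0

# directional difference quotients at the origin vanish in every direction
t = 1e-6
for v in [(1.0, 1.0), (-2.0, 0.5), (1.0, 0.0)]:
    q = (f(t*v[0], t*v[1]) - f(0.0, 0.0)) / t
    assert abs(q) < 1e-4
# yet f is not even continuous at the origin: along x2 = x1**alpha, f == 1
for k in [10, 100, 1000]:
    assert abs(f(1.0/k, (1.0/k)**alpha) - 1.0) < 1e-9
print("Gateaux differentiable at (0, 0), yet discontinuous there")
```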

The second example is significantly more involved in comparison with the


previous one while showing that the converse implication in Theorem 5.40
fails even for the simplest Lipschitz continuous function on the classical space
of sequences with real number entries.

Example 5.42 Consider the normed space ℓ¹ containing all the sequences
x = (xk) of real numbers x1, x2, . . . endowed with the norm

‖x‖₁ := Σ_{k=1}^∞ |xk| < ∞.

Define a Lipschitz continuous function f : ℓ¹ → R by f(x) := ‖x‖₁ for any
x ∈ ℓ¹. Let us first prove that f is Gâteaux differentiable at x = (xk) ∈ ℓ¹ if
and only if xk ≠ 0 for all k ∈ N. Suppose that f is Gâteaux differentiable at
x = (xk) ∈ ℓ¹. Then

lim_{t→0} ( ‖x + tv‖₁ − ‖x‖₁ ) / t exists for any v = (vk) ∈ ℓ¹.

Fixing now an arbitrary index i ∈ N, consider the direction v ∈ ℓ¹ with vk := 0
for k ≠ i and vi = 1. Then the following limits exist:

lim_{t→0} ( ‖x + tv‖₁ − ‖x‖₁ ) / t = lim_{t→0} Σ_{k=1}^∞ ( |xk + tvk| − |xk| ) / t = lim_{t→0} ( |xi + t| − |xi| ) / t.
It follows easily from the above that xi ≠ 0 for all i ∈ N.
To prove the converse implication, fix x = (xk) ∈ ℓ¹ with xk ≠ 0 for all
k ∈ N and show that f is Gâteaux differentiable at x. We obviously have

lim_{t↓0} ( |xk + tvk| − |xk| ) / t = sign(xk) vk. (5.28)

Define further the function x∗ : ℓ¹ → R by

x∗(v) := ⟨x∗, v⟩ = Σ_{k=1}^∞ sign(xk) vk for v = (vk) ∈ ℓ¹.

Since x∗ ∈ (ℓ¹)∗, the claimed Gâteaux differentiability of f follows by Propo-
sition 5.38 from the limiting representation

lim_{t↓0} ( f(x + tv) − f(x) ) / t = ⟨x∗, v⟩ for all v ∈ ℓ¹. (5.29)

To proceed with verifying (5.29), observe first the obvious fact that for any
a = (ak) and b = (bk) in ℓ¹ we have

| Σ_{k=1}^∞ ak − Σ_{k=1}^∞ bk | ≤ | Σ_{k=1}^n (ak − bk) | + Σ_{k=n+1}^∞ |ak| + Σ_{k=n+1}^∞ |bk| if n ∈ N. (5.30)

Furthermore, whenever t ≠ 0 it holds that

| ( f(x + tv) − f(x) ) / t − ⟨x∗, v⟩ | = | Σ_{k=1}^∞ ( ( |xk + tvk| − |xk| ) / t − sign(xk) vk ) |.

Thus for any n ∈ N and t ≠ 0 we employ (5.30) and arrive at

| ( f(x + tv) − f(x) ) / t − ⟨x∗, v⟩ | ≤ | Σ_{k=1}^n ( ( |xk + tvk| − |xk| ) / t − sign(xk) vk ) |
+ Σ_{k=n+1}^∞ | ( |xk + tvk| − |xk| ) / t |
+ Σ_{k=n+1}^∞ | sign(xk) vk | (5.31)
≤ Σ_{k=1}^n | ( |xk + tvk| − |xk| ) / t − sign(xk) vk |
+ 2 Σ_{k=n+1}^∞ |vk|,

where the last estimate holds due to | ( |xk + tvk| − |xk| ) / t | ≤ |vk| for all k ∈ N.
To proceed further, fix any ε > 0 and find k0 ∈ N such that

2 Σ_{k=k0+1}^∞ |vk| < ε/2.

Using (5.28) gives us δ > 0 for which

Σ_{k=1}^{k0} | ( |xk + tvk| − |xk| ) / t − sign(xk) vk | < ε/2 if 0 < t < δ.

It follows from (5.31) with n = k0 that

| ( f(x + tv) − f(x) ) / t − ⟨x∗, v⟩ | < ε whenever 0 < t < δ.

This verifies (5.29) and justifies the Gâteaux differentiability of f at x.
To complete this example, let us show that f is not Fréchet differentiable
at any point x ∈ ℓ¹. By the first part, it suffices to consider x = (xk) ∈ ℓ¹ with
xk ≠ 0 for all k ∈ N, since otherwise f is not even Gâteaux differentiable at x.
Suppose on the contrary that f is Fréchet differentiable at such a point x. Then we have

f (x + h) + f (x − h) − 2f (x)
lim = 0.
h→0 h
Choose the sequence {hₖ} ⊂ ℓ¹ as follows:

$$h_1 := (2x_1, 0, 0, \ldots),\quad h_2 := (0, 2x_2, 0, \ldots),\quad \ldots,$$

assuming without loss of generality that xₖ ≠ 0 for all k ∈ N; if xₖ = 0 for some k, replacing hₖ by tₖeₖ with any tₖ ↓ 0 yields the value 2 for the symmetric quotient below, so the argument goes through. Observe by $\sum_{k=1}^{\infty}|x_k| < \infty$ that ‖hₖ‖ = 2|xₖ| → 0 as k → ∞. We have

$$\lim_{k\to\infty}\frac{f(x+h_k)+f(x-h_k)-2f(x)}{\|h_k\|} = \lim_{k\to\infty}\frac{\|x+h_k\|+\|x-h_k\|-2\|x\|}{\|h_k\|},$$

which implies by a direct calculation that

$$\lim_{k\to\infty}\frac{\|x+h_k\|+\|x-h_k\|-2\|x\|}{\|h_k\|} = \lim_{k\to\infty}\frac{3|x_k|+|x_k|-2|x_k|}{2|x_k|} = 1.$$

This contradiction shows that f is not Fréchet differentiable at x and thus confirms that the finite dimension of X is essential in Theorem 5.40.
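The dichotomy in this example can be watched numerically. The sketch below (an added illustration, not part of the original text) truncates ℓ¹ to 50 coordinates: the one-sided difference quotients of f(x) = ‖x‖₁ match the pairing ⟨x∗, v⟩ = Σₖ sign(xₖ)vₖ, while the symmetric quotients along hₖ = 2xₖeₖ remain equal to 1 no matter how small ‖hₖ‖ gets.

```python
import numpy as np

def f(x):
    # the l^1-norm on a finite truncation of l^1
    return np.sum(np.abs(x))

k = np.arange(1, 51)
x = (-0.5) ** k              # x_k = (-1/2)^k: summable, never zero
v = np.zeros_like(x)
v[:10] = 1.0 / k[:10] ** 2   # a direction with finite support

pairing = np.sum(np.sign(x) * v)   # <x*, v> = sum_k sign(x_k) v_k

# Gateaux side: difference quotients agree with the pairing for small t
gateaux_errors = [abs((f(x + t * v) - f(x)) / t - pairing)
                  for t in (1e-1, 1e-3, 1e-5)]

# Frechet side fails: the symmetric quotient along h_k = 2 x_k e_k
# equals 1 for every k, even though ||h_k|| = 2|x_k| -> 0
sym_quotients = []
for kk in (5, 20, 45):
    h = np.zeros_like(x)
    h[kk] = 2 * x[kk]
    sym_quotients.append((f(x + h) + f(x - h) - 2 * f(x)) / f(h))

print(gateaux_errors)   # all tiny
print(sym_quotients)    # all equal to 1
```

Taking a direction v with full support leaves the Gâteaux conclusion intact in the limit t ↓ 0, but the quotients then converge only once t is small relative to the tail of x.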

The final example below presents a real-valued Lipschitz continuous function on L¹, which is Gâteaux differentiable everywhere while being nowhere Fréchet differentiable.
Example 5.43 Consider two functions f : L¹[0, π] → R and g : L²[0, π] → R defined by the same formula

$$f(x) := \int_0^{\pi}\sin(x(t))\,dt,\ x \in L^1[0,\pi], \quad\text{and}\quad g(x) := \int_0^{\pi}\sin(x(t))\,dt,\ x \in L^2[0,\pi].$$

The function g is clearly the restriction of f to L²[0, π] ⊂ L¹[0, π]. We are going to show that g is Fréchet differentiable on L²[0, π] and that f is everywhere Gâteaux differentiable while nowhere Fréchet differentiable on L¹[0, π].
Observe that for any x, v ∈ L¹[0, π] with v ≠ 0 we have the relationships

$$\begin{aligned}
\lim_{h\to 0}\frac{f(x+hv)-f(x)}{h}
&= \lim_{h\to 0}\int_0^{\pi}\frac{\sin\bigl(x(t)+hv(t)\bigr)-\sin\bigl(x(t)\bigr)}{h}\,dt \\
&= \lim_{h\to 0}\int_0^{\pi}\frac{2\sin\bigl(\tfrac{hv(t)}{2}\bigr)}{h}\,\cos\Bigl(x(t)+\frac{hv(t)}{2}\Bigr)dt \\
&= \int_0^{\pi} v(t)\cos(x(t))\,dt = \langle \cos(x), v\rangle,
\end{aligned}$$

which clearly implies (via the dominated convergence theorem) that f is Gâteaux differentiable on L¹[0, π] with f′_G(x) = cos(x). Furthermore, this yields the Gâteaux differentiability of g on L²[0, π] with g′_G(x) = cos(x) since g is the restriction of f to L²[0, π]. To verify now
that g is Fréchet differentiable on L²[0, π], we use the Taylor expansion of the sine function to get |sin(a + b) − sin(a) − b cos(a)| = |b²/2 · sin(z)| for some z between a and a + b. This ensures the estimate

$$\int_0^{\pi}\bigl|\sin\bigl(x(t)+y(t)\bigr)-\sin\bigl(x(t)\bigr)-y(t)\cos\bigl(x(t)\bigr)\bigr|\,dt \le \frac{1}{2}\int_0^{\pi} y(t)^2\,dt = \frac{1}{2}\,\|y\|_2^2,$$

where ‖y‖₂ stands for the L²-norm of y(·). Then we have

$$\frac{|g(x+y)-g(x)-\langle\cos(x),y\rangle|}{\|y\|_2}
\le \frac{\int_0^{\pi}\bigl|\sin(x(t)+y(t))-\sin(x(t))-y(t)\cos(x(t))\bigr|\,dt}{\|y\|_2}
\le \frac{1}{2}\,\|y\|_2.$$

Combining all the above gives us the estimate

$$\lim_{\|y\|_2\to 0}\frac{|g(x+y)-g(x)-\langle\cos(x),y\rangle|}{\|y\|_2} \le \lim_{\|y\|_2\to 0}\frac{1}{2}\,\|y\|_2 = 0,$$

which tells us therefore that g is Fréchet differentiable at any point in L²[0, π].
It remains to show that f is nowhere Fréchet differentiable on L¹[0, π]. To proceed, we first prove that for each x ∈ L¹[0, π] there exists v ∈ L¹[0, π] such that the Lebesgue measure of the set {t ∈ [0, π] | sin(x(t) + v(t)) − sin(x(t)) − v(t) cos(x(t)) ≠ 0} is positive. Indeed, suppose on the contrary that for all v ∈ L¹[0, π] the measure of this set is zero. Take any rational number q and define v_q(t) := q for all t ∈ [0, π]. Then the set

S_q := {t ∈ [0, π] | sin(x(t) + q) − sin(x(t)) − q cos(x(t)) ≠ 0}

is of measure zero, and so is the set S := ∪_{q∈Q} S_q. This ensures that for all rational numbers q and all t ∉ S we get sin(x(t) + q) − sin(x(t)) = q cos(x(t)), and therefore sin(x(t) + r) − sin(x(t)) = r cos(x(t)) whenever r ∈ R by continuity. We come to a contradiction, since the function r ↦ r cos(x(t)) is linear while r ↦ sin(x(t) + r) − sin(x(t)) is not. To proceed further, choose v₀ ∈ L¹[0, π] such that

μ({t ∈ [0, π] | sin(x(t) + v₀(t)) − sin(x(t)) − v₀(t) cos(x(t)) ≠ 0}) > 0,

where μ denotes the Lebesgue measure. Without loss of generality, suppose that there exists α > 0 for which

μ({t ∈ [0, π] | sin(x(t) + v₀(t)) − sin(x(t)) − v₀(t) cos(x(t)) ≥ α}) > 0.

Considering further the set

T := {t ∈ [0, π] | sin(x(t) + v₀(t)) − sin(x(t)) − v₀(t) cos(x(t)) ≥ α},
observe that |v₀| ∈ L¹[0, π] due to v₀ ∈ L¹[0, π]. Thus there exist T₀ ⊂ T with μ(T₀) > 0 and β > 0 such that |v₀(t)| ≤ β for all t ∈ T₀. Choose a sequence {Tₖ} of measurable subsets of T₀ such that Tₖ₊₁ ⊂ Tₖ, μ(Tₖ) > 0 for all k ∈ N, and ∩_{k=1}^∞ Tₖ = ∅, and then define the sequence of functions {hₖ} ⊂ L¹[0, π] by

hₖ(t) := v₀(t) if t ∈ Tₖ, and hₖ(t) := 0 if t ∉ Tₖ.

Then ‖hₖ‖ → 0 as k → ∞ together with

$$\frac{\Bigl|\int_0^{\pi}\bigl[\sin\bigl(x(t)+h_k(t)\bigr)-\sin\bigl(x(t)\bigr)-h_k(t)\cos\bigl(x(t)\bigr)\bigr]\,dt\Bigr|}{\|h_k\|} \ge \frac{\alpha\,\mu(T_k)}{\beta\,\mu(T_k)} = \frac{\alpha}{\beta}.$$

This readily implies that

$$\frac{\Bigl|\int_0^{\pi}\bigl[\sin\bigl(x(t)+h(t)\bigr)-\sin\bigl(x(t)\bigr)-h(t)\cos\bigl(x(t)\bigr)\bigr]\,dt\Bigr|}{\|h\|} \not\to 0 \quad\text{as } \|h\|\to 0,$$

which shows that f is not Fréchet differentiable at any point in L¹[0, π]. □
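Example 5.43 can also be probed numerically. The following sketch (an added illustration, not part of the text) discretizes [0, π] and checks two facts from the example for f(x) = ∫₀^π sin(x(t))dt: the difference quotients match the pairing ⟨cos(x), v⟩, and the remainder obeys the quadratic bound |f(x + y) − f(x) − ⟨cos(x), y⟩| ≤ ½‖y‖₂² that drives the Fréchet differentiability of g on L².

```python
import numpy as np

n = 4000
t = (np.arange(n) + 0.5) * np.pi / n      # midpoint grid on [0, pi]
dt = np.pi / n

def f(x):
    # f(x) = integral over [0, pi] of sin(x(t)) dt, midpoint rule
    return np.sum(np.sin(x)) * dt

x = np.sin(3 * t) + t                     # a sample element x(.)
v = np.cos(5 * t)                         # a direction v(.)

pairing = np.sum(v * np.cos(x)) * dt      # <cos(x), v>

h = 1e-6
quotient = (f(x + h * v) - f(x)) / h
gateaux_gap = abs(quotient - pairing)     # O(h), so tiny

# quadratic remainder bound behind the Frechet differentiability of g
y = 0.05 * np.cos(7 * t)
remainder = abs(f(x + y) - f(x) - np.sum(y * np.cos(x)) * dt)
bound = 0.5 * np.sum(y ** 2) * dt         # (1/2) ||y||_2^2

print(gateaux_gap, remainder <= bound + 1e-12)
```

The grid, the sample x, and the directions v, y are of course arbitrary choices made for the illustration.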
5.5.2 Characterizations of Gâteaux Differentiability

This subsection presents subdifferential characterizations of Gâteaux differentiability for convex functions defined on topological vector spaces. The basic result is as follows.

Theorem 5.44 Let X be a topological vector space, and let f : X → R be a convex function with x̄ ∈ int(dom f). If f is Gâteaux differentiable at x̄, then ∂f(x̄) is a singleton. The converse holds if in addition f is continuous at x̄.

Proof. Suppose that f is Gâteaux differentiable at x̄. It follows from Proposition 5.38 that f′(x̄; v) = ⟨x∗, v⟩ for all v ∈ X, where x∗ := f′_G(x̄). Using implication (b)=⇒(a) in Theorem 4.48 yields x∗ ∈ ∂f(x̄). Taking any v∗ ∈ ∂f(x̄) and employing now implication (a)=⇒(c) in Theorem 4.48, we get that

⟨x∗, v⟩ = f′₋(x̄; v) ≤ ⟨v∗, v⟩ ≤ f′(x̄; v) = ⟨x∗, v⟩ for all v ∈ X.

This yields x∗ = v∗ and thus verifies that ∂f(x̄) = {x∗}.
Let us finally prove the Gâteaux differentiability of f at x̄ by assuming that f is continuous at x̄ and ∂f(x̄) = {x∗}. Indeed, it follows from Theorem 4.50(c) that in this setting we have

f′(x̄; v) = sup{⟨u∗, v⟩ | u∗ ∈ ∂f(x̄)} = ⟨x∗, v⟩ for all v ∈ X.

Thus Proposition 5.38 tells us that f is Gâteaux differentiable at x̄. □

We now arrive at useful consequences of the previous results for convex functions in finite dimensions.

Corollary 5.45 Let f : Rⁿ → R be a convex function, and let x̄ ∈ int(dom f). Then the following assertions are equivalent:
(a) f is Gâteaux differentiable at x̄.
(b) f is Fréchet differentiable at x̄.
(c) The partial derivatives ∂f/∂xᵢ(x̄) exist for all i = 1, ..., n.
(d) ∂f(x̄) is a singleton.

Proof. The fact that (a) and (b) are equivalent follows from Theorem 5.40. To verify implication (a)=⇒(c), denote v := f′_G(x̄) and let eᵢ be the ith vector in the standard orthonormal basis of Rⁿ. By Proposition 5.38 we have

$$\lim_{t\to 0}\frac{f(\bar x+te_i)-f(\bar x)}{t} = \langle v, e_i\rangle.$$

It shows that ∂f/∂xᵢ(x̄) = ⟨v, eᵢ⟩ for all i = 1, ..., n.
To proceed with the proof of (c)=⇒(d), let vᵢ := ∂f/∂xᵢ(x̄) for i = 1, ..., n and, remembering that ∂f(x̄) ≠ ∅, pick any u ∈ ∂f(x̄). Then

⟨u, h⟩ ≤ f(x̄ + h) − f(x̄) for all h ∈ Rⁿ.

In particular, we get the inequality

⟨u, teᵢ⟩ ≤ f(x̄ + teᵢ) − f(x̄) for all t ∈ R, (5.32)

which implies therefore that for any t > 0 it holds

$$\langle u, e_i\rangle \le \frac{f(\bar x+te_i)-f(\bar x)}{t}.$$

Letting now t ↓ 0 gives us ⟨u, eᵢ⟩ ≤ vᵢ, and using (5.32) with t < 0 yields ⟨u, eᵢ⟩ ≥ vᵢ, and hence ⟨u, eᵢ⟩ = vᵢ whenever i = 1, ..., n. This verifies that u = v with v := (v₁, ..., vₙ), and thus ∂f(x̄) = {v}.
Finally, implication (d)=⇒(a) follows from Theorem 5.44 by observing that f, being a real-valued convex function on Rⁿ, is always continuous at x̄. □
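The equivalences of Corollary 5.45 are easy to observe on the concrete convex function f(x) = ‖x‖₁ on R². The sketch below (an added illustration, not from the text) compares one-sided difference quotients: at a point with nonzero coordinates both one-sided quotients agree and the partial derivatives exist, while at a point with a zero coordinate they disagree, so by the corollary ∂f there is not a singleton and f is not differentiable.

```python
import numpy as np

def f(x):
    # f(x) = ||x||_1, a convex function on R^2
    return np.abs(x).sum()

def right_deriv(x, d, t=1e-7):
    # one-sided quotient approximating the directional derivative f'(x; d)
    return (f(x + t * d) - f(x)) / t

e1 = np.array([1.0, 0.0])

# At xbar = (1, -2) the partial derivative in x1 exists:
xbar = np.array([1.0, -2.0])
r = right_deriv(xbar, e1)        # quotient from the right
l = -right_deriv(xbar, -e1)      # quotient from the left
partial_gap = abs(r - l)         # ~ 0: the partial derivative exists

# At x0 = (0, 1) the one-sided quotients in x1 are +1 and -1, so the
# partial derivative fails and the subdifferential [-1, 1] x {1}
# is not a singleton.
x0 = np.array([0.0, 1.0])
r0 = right_deriv(x0, e1)
l0 = -right_deriv(x0, -e1)
print(partial_gap, r0, l0)
```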

For our subsequent study of subdifferential properties of convex functions, recall first the classical notion of upper semicontinuity for set-valued mappings.

Definition 5.46 Let F : X ⇒ Y be a set-valued mapping between two topological spaces X and Y. We say that F is upper semicontinuous at x̄ ∈ dom(F) if for any open set V ⊂ Y containing F(x̄) there exists an open set U ⊂ X containing x̄ such that F(U) ⊂ V.

The next proposition is useful in what follows.



Proposition 5.47 Let X and Y be topological vector spaces, and let F : X ⇒ Y be a set-valued mapping with F(x̄) = {ȳ}. Then the following assertions are equivalent:
(a) F is upper semicontinuous at x̄.
(b) For any net {xα}α∈I in X converging to x̄ with yα ∈ F(xα) for every α ∈ I, we have that {yα}α∈I converges to ȳ.

Proof. To verify (a)=⇒(b), suppose that F is upper semicontinuous at x̄ and fix any net {xα}α∈I in X that converges to x̄ with yα ∈ F(xα) for every α ∈ I. Let V be a neighborhood of ȳ. Then F(x̄) = {ȳ} ⊂ V, and we deduce from the upper semicontinuity of F at x̄ that there is a neighborhood U of x̄ with

F(x) ⊂ V whenever x ∈ U.

Since {xα}α∈I converges to x̄, find α₀ ∈ I with

xα ∈ U for all α ≽ α₀.

Then yα ∈ F(xα) ⊂ V whenever α ≽ α₀, and thus {yα}α∈I converges to ȳ.
To verify the converse implication (b)=⇒(a), let (b) be satisfied and suppose on the contrary that F is not upper semicontinuous at x̄. Then there exists a neighborhood V of ȳ such that for any neighborhood U of x̄ we have F(U) ⊄ V. Consider further the family of neighborhoods

I := {U | U is a neighborhood of x̄}

and define a preorder relation on I by: U₁ ≼ U₂ if and only if U₂ ⊂ U₁. For each U ∈ I pick xᵤ ∈ U and yᵤ ∈ F(xᵤ) such that yᵤ ∉ V. Then we see that the net {xᵤ}ᵤ∈I converges to x̄. It follows from (b) that the net {yᵤ}ᵤ∈I converges to ȳ, and thus we arrive at a contradiction since yᵤ ∉ V for every U ∈ I. This verifies the upper semicontinuity of F at x̄ and therefore completes the proof of the proposition. □

The following versions of the outer semicontinuity of set-valued mappings play a very important role in many aspects of convex and variational analysis.

Definition 5.48 Let F : X ⇒ Y be a set-valued mapping between two topological spaces, and let x̄ ∈ dom(F). We say that
(a) F is topologically outer semicontinuous at x̄ if for any net xα → x̄ with xα ∈ dom(F) and any bounded net of yα ∈ F(xα) converging to some y we have y ∈ F(x̄).
(b) F is sequentially outer semicontinuous at x̄ if for any sequence xₖ → x̄ with xₖ ∈ dom(F) and any sequence of yₖ ∈ F(xₖ) converging to some y we have y ∈ F(x̄).

Now we verify the topological outer semicontinuity of subgradient mappings generated by convex functions on normed spaces in the norm×weak∗ topology on the Cartesian product X × X∗.

Proposition 5.49 Let f : X → R be a convex function defined on a normed space X, and let x̄ ∈ dom(f). Suppose that f is l.s.c. at x̄ and consider a net {xα}α∈I ⊂ X that converges to x̄. Then for any subgradient net x∗α ∈ ∂f(xα), α ∈ I, which is bounded in X∗ and converges to some x∗ in the weak∗ topology of X∗, we have x∗ ∈ ∂f(x̄).

Proof. It follows from the subdifferential definition that

⟨x∗α, x − xα⟩ ≤ f(x) − f(xα) for all x ∈ X, α ∈ I. (5.33)

Since the net {x∗α}α∈I ⊂ X∗ is bounded and weak∗ converges to x∗, we get that ⟨x∗α, xα⟩ → ⟨x∗, x̄⟩. Passing to a limit inferior in (5.33) and using the l.s.c. of f at x̄ gives us

⟨x∗, x − x̄⟩ ≤ f(x) − f(x̄) for all x ∈ X,

which shows by definition that x∗ ∈ ∂f(x̄). □

The next proposition tells us that subdifferential mappings for convex functions enjoy a stronger property than the one in Proposition 5.49; namely, upper semicontinuity with respect to the same product topology.

Proposition 5.50 Let X be a normed space, and let f : X → R be a convex function with the open domain dom(f) on which f is continuous. Then the subgradient mapping ∂f : X ⇒ X∗ is upper semicontinuous on dom(f) in the norm×weak∗ topology on the product space X × X∗.

Proof. Fix any x̄ ∈ dom(f) and suppose on the contrary that ∂f is not upper semicontinuous at x̄. Then there exists a weak∗ open subset V of X∗ containing ∂f(x̄) and such that we can find a sequence {xₖ} ⊂ dom(f) that converges to x̄ with x∗ₖ ∈ ∂f(xₖ) \ V. Since f is convex and continuous at x̄, it is Lipschitz continuous around x̄ with some constant ℓ ≥ 0; thus we have ‖x∗ₖ‖ ≤ ℓ for all k ∈ N. The Alaoglu-Bourbaki theorem ensures that {x∗ₖ} has a weak∗ cluster point x∗. Then we get x∗ ∈ ∂f(x̄) \ V by Proposition 5.49. The obtained contradiction verifies the claimed result. □

The following refinement of the above result for the case of Gâteaux dif-
ferentiable functions on normed spaces is rather straightforward.

Corollary 5.51 Suppose in addition to the assumptions of Proposition 5.50 that f is Gâteaux differentiable on D := dom(f). Then the Gâteaux derivative mapping f′_G : D → X∗ is continuous with respect to the norm×weak∗ topology on the product space X × X∗.

Proof. We get from Theorem 5.44 that the subgradient mapping of f reduces to its single-valued Gâteaux derivative counterpart. Furthermore, it is easy to see that the upper semicontinuity of a single-valued mapping means in fact its continuity. This verifies the claimed statement. □

5.5.3 Characterizations of Fréchet Differentiability

This subsection provides various subdifferential characterizations of Fréchet differentiability of convex functions on normed and Banach spaces. The first theorem addresses the case of general normed spaces. Given a set-valued mapping F : X ⇒ X∗ with x̄ ∈ dom(F), we say that the mapping F is strongly upper semicontinuous at x̄ if it is upper semicontinuous when X and X∗ are equipped with the strong topologies.

Theorem 5.52 Let f : X → R be a convex function defined on a normed space X, and let x̄ ∈ int(dom f). Then the following are equivalent:
(a) f is Fréchet differentiable at x̄.
(b) f is continuous around x̄, the set ∂f(x̄) is a singleton, and the subgradient mapping ∂f : X ⇒ X∗ is strongly upper semicontinuous at x̄.
Proof. To verify implication (a)=⇒(b), suppose that f is Fréchet differentiable at x̄. It is well known that in this case ∂f(x̄) = {∇f(x̄)} and f is locally bounded around x̄; thus it is (Lipschitz) continuous around this point. Let us show that the subgradient mapping ∂f is strongly upper semicontinuous at x̄. To proceed, denote x∗ := f′_F(x̄) and get from the definition that

$$\lim_{x\to\bar x}\frac{f(x)-f(\bar x)-\langle x^*, x-\bar x\rangle}{\|x-\bar x\|}=0.$$

For any ε > 0 we choose δ > 0 such that

|f(x) − f(x̄) − ⟨x∗, x − x̄⟩| ≤ ε‖x − x̄‖ whenever ‖x − x̄‖ < δ.

Fixing further any x ∈ B(x̄; δ/2), any u∗ ∈ ∂f(x), and any h ∈ X with ‖h‖ < δ/2, and combining the subgradient inequality for u∗ at x with the above estimate applied at the points x + h and x, gives us

⟨u∗ − x∗, h⟩ ≤ f(x + h) − f(x) − ⟨x∗, h⟩
= [f(x + h) − f(x̄) − ⟨x∗, x + h − x̄⟩] − [f(x) − f(x̄) − ⟨x∗, x − x̄⟩]
≤ ε‖x + h − x̄‖ + ε‖x − x̄‖ ≤ (3/2)εδ.

Taking the supremum over all h with ‖h‖ < δ/2 shows that ‖u∗ − x∗‖ ≤ 3ε whenever x ∈ B(x̄; δ/2). Since ε > 0 was chosen arbitrarily, this justifies the claimed strong upper semicontinuity of the subgradient mapping ∂f.
Let us now verify the opposite implication (b)=⇒(a). Denote ∂f(x̄) =: {v∗} and deduce from the strong upper semicontinuity of ∂f at x̄ that for any ε > 0 there exists δ > 0 such that

∂f(u) ⊂ B(v∗; ε) whenever u ∈ B(x̄; δ).

We can choose δ > 0 to be so small that f is continuous on B(x̄; δ). Fix any x ∈ B(x̄; δ) with x ≠ x̄ and, by using the subdifferential mean value result from Theorem 5.27, find u ∈ (x̄, x) and u∗ ∈ ∂f(u) satisfying

f(x) − f(x̄) = ⟨u∗, x − x̄⟩.

Then we have ‖u∗ − v∗‖ < ε and therefore arrive at

$$\frac{|f(x)-f(\bar x)-\langle v^*, x-\bar x\rangle|}{\|x-\bar x\|}=\frac{|\langle u^*-v^*, x-\bar x\rangle|}{\|x-\bar x\|}\le\|u^*-v^*\|<\varepsilon.$$

This tells us that f is Fréchet differentiable at x̄. □

The next theorem provides yet another characterization of Fréchet differentiability, now via ε-subgradients in the Banach space setting with the usage of the Brøndsted-Rockafellar result taken from Theorem 5.10.

Theorem 5.53 Let X be a Banach space, and let f : X → R be a convex function which is continuous at x̄. Then f is Fréchet differentiable at x̄ if and only if there exists x∗ ∈ X∗ with ∂f(x̄) = {x∗} and such that {x∗ₖ} converges strongly to x∗ whenever x∗ₖ ∈ ∂εₖf(x̄) and εₖ ↓ 0 as k → ∞.

Proof. Suppose that f is Fréchet differentiable at x̄ with ∇f(x̄) := x∗, and hence ∂f(x̄) = {x∗}. Consider any sequence of x∗ₖ ∈ ∂εₖf(x̄) with εₖ ↓ 0 as k → ∞. Then Theorem 5.10 gives us a sequence {xₖ} converging to x̄ and such that x∗ₖ ∈ ∂f(xₖ) + √εₖ B∗. Thus we have x∗ₖ = u∗ₖ + √εₖ e∗ₖ with u∗ₖ ∈ ∂f(xₖ) and e∗ₖ ∈ B∗. It follows from Proposition 5.47 and Theorem 5.52 that {u∗ₖ} converges to x∗ strongly, and hence {x∗ₖ} converges to x∗ strongly as well.
To verify the opposite implication asserted in the theorem, assume that there exists x∗ ∈ X∗ with ∂f(x̄) = {x∗} and such that {x∗ₖ} converges strongly to x∗ whenever x∗ₖ ∈ ∂εₖf(x̄) and εₖ ↓ 0 as k → ∞. Theorem 5.52 tells us that the Fréchet differentiability of f at x̄ follows from the strong upper semicontinuity of the subgradient mapping ∂f : X ⇒ X∗ at x̄ with respect to the norm topologies of X and X∗. To check the latter, pick any sequence of subgradients x∗ₖ ∈ ∂f(xₖ) with xₖ → x̄ as k → ∞ and get from the continuity of f at x̄ that it is locally Lipschitzian around x̄ with some Lipschitz constant ℓ > 0. Without loss of generality, suppose that |f(xₖ) − f(x̄)| ≤ ℓ‖xₖ − x̄‖ and ‖x∗ₖ‖ ≤ ℓ for all k ∈ N. Then we have the relationships

⟨x∗ₖ, x − x̄⟩ = ⟨x∗ₖ, x − xₖ⟩ + ⟨x∗ₖ, xₖ − x̄⟩
≤ f(x) − f(xₖ) + ‖x∗ₖ‖·‖xₖ − x̄‖
= f(x) − f(x̄) + f(x̄) − f(xₖ) + ‖x∗ₖ‖·‖xₖ − x̄‖
≤ f(x) − f(x̄) + ℓ‖x̄ − xₖ‖ + ‖x∗ₖ‖·‖xₖ − x̄‖
≤ f(x) − f(x̄) + 2ℓ‖xₖ − x̄‖ for all x ∈ X.

Letting finally εₖ := 2ℓ‖xₖ − x̄‖ shows that εₖ ↓ 0 as k → ∞ and that x∗ₖ ∈ ∂εₖf(x̄) for every k ∈ N. Thus the sequence {x∗ₖ} converges to x∗ strongly, which therefore completes the proof of the theorem. □

To proceed further, we first observe the following property.



Proposition 5.54 Let X be a Banach space, and let f : X → R be a proper, l.s.c., and convex function. Suppose that the conjugate function f∗ : X∗ → R is Fréchet differentiable at x∗. Then we have ∇f∗(x∗) ∈ X.

Proof. It is easy to see that f∗ is l.s.c. on the dual space X∗ equipped with the weak∗ topology. Fix any sequence εₖ ↓ 0 as k → ∞ and deduce from Theorem 5.9 that the set ∂εₖf∗(x∗) ∩ X is nonempty for every k ∈ N. Thus we can pick a sequence of xₖ ∈ ∂εₖf∗(x∗) ∩ X. It follows from Theorem 5.53 that {xₖ} converges to ∇f∗(x∗) as k → ∞, which implies that ∇f∗(x∗) ∈ X due to the closedness of the Banach space X as a linear subspace of X∗∗. □
The next theorem provides a characterization of Fréchet differentiability
of conjugate functions on duals of Banach spaces.
Theorem 5.55 Let f : X → R be a proper, l.s.c., and convex function on a
Banach space X. Assume that the conjugate function f ∗ : X ∗ → R is con-
tinuous at some x∗ ∈ X ∗ in the norm topology of X ∗ . Then f ∗ is Fréchet
differentiable at x∗ if and only if there exists x ∈ X such that ∂f ∗ (x∗ ) = {x}
and that xk → x whenever xk ∈ ∂εk f ∗ (x∗ ) and εk ↓ 0 as k → ∞.
Proof. To begin with, suppose that f∗ is Fréchet differentiable at x∗ and denote x := ∇f∗(x∗) ∈ X. If xₖ ∈ ∂εₖf∗(x∗) and εₖ ↓ 0, then we deduce directly from Theorem 5.53 that {xₖ} converges to x as k → ∞.
Let us verify the converse implication of the theorem. Supposing on the contrary that f∗ is not Fréchet differentiable at x∗, we have

$$\frac{f^*(x^*+u^*)-f^*(x^*)-\langle u^*, x\rangle}{\|u^*\|} \not\to 0 \quad\text{as } u^*\to 0.$$

It tells us that there exist ε > 0, u∗ₖ ∈ S∗ (the unit sphere of X∗), and tₖ ↓ 0 as k → ∞ such that

f∗(x∗ + tₖu∗ₖ) − f∗(x∗) − tₖ⟨u∗ₖ, x⟩ ≥ εtₖ for all k ∈ N.

Choosing xₖ ∈ ∂εₖf∗(x∗ + tₖu∗ₖ) with εₖ := εtₖ/2 gives us

tₖ⟨u∗ₖ, xₖ − x⟩ ≥ f∗(x∗ + tₖu∗ₖ) − f∗(x∗) − tₖ⟨u∗ₖ, x⟩ − tₖε/2 ≥ tₖε/2 for all k ∈ N.

It shows that ⟨u∗ₖ, xₖ − x⟩ ≥ ε/2 for all k ∈ N. Thus the sequence {xₖ} does not converge to x, which is a contradiction completing the proof. □
Now we derive a variational consequence of Fréchet differentiability for
l.s.c. convex functions on Banach spaces.
Theorem 5.56 Let f : X → R be a proper, l.s.c., and convex function on a Banach space X. If f∗ is Fréchet differentiable at x∗ ∈ X∗, then we have the following:
(a) ∇f∗(x∗) =: x̄ ∈ dom(f).
(b) The function x ↦ f(x) − ⟨x∗, x⟩ attains its robust global minimum on X at x̄, in the sense that x̄ is a global minimizer of f − x∗ on X and every minimizing sequence of f − x∗ converges to x̄.

Proof. Proposition 5.54 tells us that ∇f∗(x∗) = x̄ ∈ X. It follows from the biconjugate theorem (see Theorem 4.15) that f∗∗(x̄) = f(x̄) under the assumptions made. Thus we get the subdifferential relationship

⟨x̄, u∗ − x∗⟩ ≤ f∗(u∗) − f∗(x∗) for all u∗ ∈ X∗,

which can be equivalently rewritten as

⟨u∗, x̄⟩ − f∗(u∗) ≤ ⟨x∗, x̄⟩ − f∗(x∗) for all u∗ ∈ X∗.

It follows from f(x̄) = f∗∗(x̄) < ∞ that x̄ ∈ dom(f) and f∗(x∗) + f(x̄) = ⟨x∗, x̄⟩, which implies that x̄ is a global minimizer of f(x) − ⟨x∗, x⟩ on X. Suppose further that {xₖ} is a minimizing sequence for the function f − x∗, i.e.,

$$(f - x^*)(x_k) \to \inf_{x\in X}\bigl[f(x) - \langle x^*, x\rangle\bigr] \quad\text{as } k\to\infty.$$

Then there exists a numerical sequence εₖ ↓ 0 as k → ∞ such that

(f − x∗)(xₖ) ≤ (f − x∗)(x̄) + εₖ for all k ∈ N.

This tells us that xₖ ∈ ∂εₖf∗(x∗) whenever k ∈ N, and therefore the given sequence {xₖ} converges to x̄ by Theorem 5.55. □
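The conclusions of Theorem 5.56 are transparent for a simple smooth example. The sketch below (an added illustration with hypothetical data, not from the text) takes f(x) = x² on R, whose conjugate is f∗(y) = y²/4, and checks that x̄ = ∇f∗(x∗) = x∗/2 minimizes f − x∗, that the Fenchel equality f(x̄) + f∗(x∗) = ⟨x∗, x̄⟩ from the proof holds, and that near-minimizers of f − x∗ stay close to x̄.

```python
import numpy as np

f = lambda x: x ** 2                 # f(x) = x^2 on R
f_star = lambda y: y ** 2 / 4        # its conjugate f*(y) = sup_x (xy - x^2)

x_star = 3.0
xbar = x_star / 2                    # grad f*(x_star), the predicted minimizer

# (b): xbar minimizes f - x_star over a fine grid
grid = np.linspace(-10.0, 10.0, 200001)
vals = f(grid) - x_star * grid
argmin = grid[np.argmin(vals)]

# Fenchel equality f(xbar) + f*(x_star) = <x_star, xbar> from the proof
fenchel_gap = abs(f(xbar) + f_star(x_star) - x_star * xbar)

# robustness: a point with value inf + eps lies within sqrt(eps) of xbar,
# since here f(x) - x_star*x - inf = (x - xbar)^2
inf_val = f(xbar) - x_star * xbar
eps = 1e-6
xk = xbar + np.sqrt(eps)
excess = f(xk) - x_star * xk - inf_val

print(argmin, fenchel_gap, excess)
```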
The concluding result of this section gives us a variational characterization
of Fréchet differentiability for l.s.c. convex functions on Banach spaces.
Theorem 5.57 Let f : X → R be a proper, l.s.c., and convex function on a Banach space X such that f∗ is continuous at x∗. Then f∗ is Fréchet differentiable at x∗ ∈ X∗ if and only if the function x ↦ f(x) − ⟨x∗, x⟩ attains its robust strong minimum in the sense of Theorem 5.56 at some point x̄ ∈ dom(f), in which case x̄ = ∇f∗(x∗).

Proof. Due to Theorem 5.56(b) we only need to verify that f∗ is Fréchet differentiable at x∗ provided that the function f(x) − ⟨x∗, x⟩ attains its robust strong minimum at x̄, which clearly implies that ∂f∗(x∗) = {x̄}. To proceed, suppose that xₖ ∈ ∂εₖf∗(x∗) with εₖ ↓ 0. It tells us by duality that x∗ ∈ ∂εₖf(xₖ) for every k ∈ N. The latter ensures by the ε-subdifferential definition that {xₖ} is a minimizing sequence for f(x) − ⟨x∗, x⟩, and hence xₖ → x̄ as k → ∞ by the imposed assumption. Invoking finally Theorem 5.55, we conclude that f∗ is Fréchet differentiable at x∗. □

5.6 Generic Differentiability of Convex Functions


This section is in a sense a continuation of the preceding one, which provides
subdifferential characterizations of both Fréchet and Gâteaux differentiability
of convex functions at the point in question. Here we study such differentia-
bility properties of convex functions on dense subsets of Banach spaces. Dense
differentiability properties of convex functions play significant roles not only
in convex analysis but also in general aspects of variational analysis dealing
with nonconvex functions and sets. In particular, the class of Asplund spaces
is of crucial importance for variational theory and applications; see [228].

5.6.1 Generic Gâteaux Differentiability

We first recall the definitions of Asplund and weak Asplund spaces, which are
closely related to Fréchet and Gâteaux generic differentiability of continuous
convex functions f : X → R on Banach spaces, i.e., the corresponding differ-
entiability on dense Gδ subsets, where the notation Gδ signifies a countable
intersection of open subsets.
Definition 5.58 Let X be a Banach space.
(a) X is an Asplund space if for every convex continuous f : X → R, the
set of Fréchet differentiability points contains a dense Gδ set in X.
(b) X is a weak Asplund space if for every convex continuous f : X → R,
the set of Gâteaux differentiability points contains a dense Gδ set in X.
These remarkable subclasses of Banach spaces were introduced by Edgar
Asplund [9] in the geometric theory of Banach spaces under the names of
“strong differentiability spaces” and “weak differentiability spaces,” respec-
tively; see the commentaries in Section 5.9 for more discussions. We can see
that the only difference between the definitions of Asplund and weak Asplund
spaces is replacing the dense Fréchet differentiability by its Gâteaux coun-
terpart. But the available results for these two classes of Banach spaces are
dramatically different. While Asplund spaces admit many beautiful character-
izations and useful properties (see more details at the end of Subsection 5.6.2
and also the exercises and commentaries in Sections 5.8 and 5.9), it is not
the case for weak Asplund spaces. However, the latter class contains each
separable Banach space, which is proved below in this subsection.
To begin with, we present a useful result on the almost everywhere (a.e.,
with respect to the Lebesgue measure) differentiability of convex functions
on open intervals of the real line. We actually prove even more: the set of
nondifferentiability points is countable.

Lemma 5.59 Let f : I → R be a convex function defined on an open interval I ⊂ R. Then f is differentiable everywhere on I except at most countably many points.

Proof. Consider the right derivative function

$$\psi(t) := f'_+(t) = \lim_{s\downarrow t}\frac{f(s)-f(t)}{s-t} \quad\text{for } t \in I.$$

It follows from Lemma 2.115 that ψ : I → R is well-defined and nondecreasing on I. In addition, for any t₀ ∈ I we have

lim_{t↑t₀} ψ(t) ≤ f′₋(t₀) ≤ f′₊(t₀) = ψ(t₀). (5.34)

Define further the sets

A := {t ∈ I | f is not differentiable at t},
B := {t ∈ I | ψ is not continuous at t}

and deduce from (5.34) the inclusion A ⊂ B; indeed, if ψ is continuous at t₀, then (5.34) yields f′₋(t₀) = f′₊(t₀). Since ψ is nondecreasing, we get that B is countable, and thus A is countable as well. □
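Both ingredients of Lemma 5.59 are easy to watch numerically. The sketch below (an added illustration, not from the text) takes the convex function f(x) = Σₖ 2^{-k}|x − 1/k|, whose nondifferentiability points form the countable set {1/k}, and checks that an approximate right derivative ψ is nondecreasing along a grid.

```python
import numpy as np

ks = np.arange(1, 40)

def f(x):
    # convex: a nonnegative combination of the convex kinks |x - 1/k|
    return float(np.sum(2.0 ** -ks * np.abs(x - 1.0 / ks)))

def psi(t, s=1e-7):
    # one-sided quotient approximating the right derivative f'_+(t)
    return (f(t + s) - f(t)) / s

grid = np.linspace(-0.9, 1.9, 400)
vals = [psi(t) for t in grid]

# psi is nondecreasing, as used in the proof of Lemma 5.59
monotone = all(b >= a - 1e-6 for a, b in zip(vals, vals[1:]))
print(monotone)
```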
The next lemma is rather technical while playing a significant role in estab-
lishing below the generic Gâteaux differentiability of convex continuous func-
tions on separable Banach spaces.

Lemma 5.60 Let X be a separable Banach space, and let Ω be a nonempty, open, and convex subset of X. Given a convex function f : X → R with dom(f) = Ω, a nonzero element a ∈ X, and a number α > 0, we define the set

Ωα := {x ∈ Ω | ⟨x∗ − u∗, a⟩ ≥ α for some x∗, u∗ ∈ ∂f(x)}.

If f is continuous on Ω, then the set Uα := Ω \ Ωα is open and dense in Ω.

Proof. Fix any sequence {xₖ} ⊂ Ωα that converges to some x ∈ Ω and show that x ∈ Ωα. Choose u∗ₖ, x∗ₖ ∈ ∂f(xₖ) so that

⟨x∗ₖ − u∗ₖ, a⟩ ≥ α for all k ∈ N.

Remembering that f is locally Lipschitzian around x, suppose without loss of generality that {x∗ₖ} and {u∗ₖ} are bounded sequences in X∗. Since X is separable, it is known from classical analysis that bounded sets in X∗ are metrizable with respect to the weak∗ topology. This allows us to find subsequences of {x∗ₖ} and {u∗ₖ} that weak∗ converge to some x∗ and u∗, respectively. It is easy to see that x∗, u∗ ∈ ∂f(x) with ⟨x∗ − u∗, a⟩ ≥ α. This verifies that x ∈ Ωα, and so the set Ωα is closed in Ω while Uα is an open set in Ω.
Further we show that Uα is dense in Ω. Fixing any x ∈ Ω, define the real-valued function

ϕ(t) := f(x + ta) for t ∈ I := {γ ∈ R | x + γa ∈ Ω}.

It follows from Lemma 5.59 that the function ϕ is differentiable everywhere except at most countably many points. Hence for any ε > 0 we find a point x₀ := x + t₀a such that ‖x − x₀‖ < ε and ϕ′(t₀) exists. Picking any x∗, u∗ ∈ ∂f(x₀), observe that the restrictions of these linear functionals to the line x + Ra give us subgradients of ϕ at t₀. Since ϕ is differentiable at t₀, these restrictions must agree with each other in the direction of a; in particular, we get ⟨x∗, a⟩ = ⟨u∗, a⟩ = ϕ′(t₀). This implies that x₀ ∈ Ω \ Ωα = Uα, and therefore the set Uα is dense in Ω, which completes the proof of the lemma. □

Based on the above lemmas, we are now ready to derive the following
theorem, which tells us that any separable Banach space is weak Asplund.
Theorem 5.61 Let X be a separable Banach space, and let Ω be a nonempty,
open, and convex subset of X. Then for every convex function f : Ω → R,
which is continuous on Ω = dom(f ), there exists a Gδ dense subset U ⊂ Ω
such that f is Gâteaux differentiable on U .

Proof. Consider a countable dense subset {xₖ} of X, which exists due to the separability of this space. Given any m, k ∈ N, define the collection of sets

Fm,k := {x ∈ Ω | ⟨x∗ − u∗, xₖ⟩ ≥ 1/m for some x∗, u∗ ∈ ∂f(x)}.

Theorem 5.44 tells us that f is Gâteaux differentiable at x ∈ Ω if and only if ∂f(x) is a singleton. It follows from the density of {xₖ} that f is Gâteaux differentiable at x ∈ Ω if and only if x ∉ ∪m,k Fm,k. Thus f is Gâteaux differentiable at every x ∈ U := ∩m,k (Ω \ Fm,k), which is a dense Gδ subset of Ω according to Lemma 5.60 (applied with a = xₖ and α = 1/m; for xₖ = 0 the set Fm,k is empty) combined with the Baire category theorem. □

5.6.2 Generic Fréchet Differentiability

This subsection is focused on the generic Fréchet differentiability of convex continuous functions and related geometric issues in Banach spaces. First we prove the following technical proposition.

Proposition 5.62 Let X be a normed space. For x₀ ∈ X, x∗ ∈ X∗ \ {0}, and 0 < α < 1, define the set

K(x₀, x∗, α) := {x ∈ X | α‖x − x₀‖·‖x∗‖ ≤ ⟨x∗, x − x₀⟩}. (5.35)

Then this set is a closed and convex cone with the nonempty interior

int K(x₀, x∗, α) = {x ∈ X | α‖x − x₀‖·‖x∗‖ < ⟨x∗, x − x₀⟩}.

Proof. It suffices to examine the case where x₀ = 0. Consider the function

ϕ_{x∗,α}(x) := α‖x‖·‖x∗‖ − ⟨x∗, x⟩ for all x ∈ X,

which is clearly continuous, positively homogeneous, and subadditive. Furthermore, we get the following representation of (5.35):

K(0, x∗, α) = {x ∈ X | ϕ_{x∗,α}(x) ≤ 0},

implying that K(0, x∗, α) is a closed and convex cone. In addition, the above considerations bring us to the equalities

int K(0, x∗, α) = {x ∈ X | ϕ_{x∗,α}(x) < 0} = {x ∈ X | α‖x‖·‖x∗‖ < ⟨x∗, x⟩}.

In this way we arrive at the estimate

α‖x∗‖ < ‖x∗‖ = sup{⟨x∗, x⟩/‖x‖ | x ≠ 0},

which yields the existence of x ∈ X such that

α‖x∗‖ < ⟨x∗, x⟩/‖x‖, and hence int K(0, x∗, α) ≠ ∅.

This verifies the statement of the proposition. □
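The cone (5.35) is easy to visualize in R²: it collects the points whose direction from x₀ makes an angle with x∗ of cosine at least α. The sketch below (an added illustration, not from the text) samples random points and confirms numerically that membership is preserved under convex combinations and under rays emanating from the vertex x₀, in accordance with Proposition 5.62.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_cone(x, x0, xs, alpha, tol=1e-9):
    # membership in K(x0, x*, alpha) from (5.35), with a small fp tolerance
    return (alpha * np.linalg.norm(x - x0) * np.linalg.norm(xs)
            <= xs @ (x - x0) + tol)

x0 = np.array([1.0, -1.0])
xs = np.array([2.0, 1.0])     # x* identified with a vector of R^2
alpha = 0.5

# collect random members of the cone (exact membership, tol = 0)
members = [x0 + d for d in rng.normal(size=(2000, 2))
           if in_cone(x0 + d, x0, xs, alpha, tol=0.0)]

closed_convex_cone = True
for _ in range(500):
    i, j = rng.integers(0, len(members), 2)
    lam, s = rng.random(), 3.0 * rng.random()
    z = lam * members[i] + (1 - lam) * members[j]   # convex combination
    r = x0 + s * (members[i] - x0)                  # ray from the vertex x0
    closed_convex_cone &= in_cone(z, x0, xs, alpha)
    closed_convex_cone &= in_cone(r, x0, xs, alpha)

print(closed_convex_cone)
```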

Using the construction in (5.35), we now formulate the following important notion from the geometric theory of Banach spaces.

Definition 5.63 Let X be a normed space, and let α ∈ (0, 1). A subset Ω of X is called an α-cone meager set if for any x ∈ Ω and any ε > 0 there exist x₀ ∈ B(x; ε) and x∗ ∈ X∗ \ {0} such that

Ω ∩ K(x₀, x∗, α) = ∅.

The next lemma, which is based on Proposition 5.62, is needed for deriving
the main result on the generic Fréchet differentiability presented below.

Lemma 5.64 Let X be a normed space, and let Ω be an α-cone meager set for some α ∈ (0, 1). Then Ω is nowhere dense in X, i.e., int(cl Ω) = ∅.

Proof. Suppose on the contrary that int(cl Ω) ≠ ∅ and then, since Ω is dense in its closure, find x̄ ∈ Ω and ε > 0 such that B(x̄; ε) ⊂ cl Ω. By Definition 5.63 there exist x₀ ∈ B(x̄; ε) and x∗ ∈ X∗ \ {0} with Ω ∩ K(x₀, x∗, α) = ∅. On the other hand, Proposition 5.62 tells us that K(x₀, x∗, α) is a cone with vertex x₀ and nonempty interior, so int K(x₀, x∗, α) contains points arbitrarily close to x₀. Choosing such a point z ∈ int K(x₀, x∗, α) ∩ B(x̄; ε) ⊂ cl Ω, we find a neighborhood of z lying inside K(x₀, x∗, α), which must intersect Ω due to z ∈ cl Ω. Thus Ω ∩ K(x₀, x∗, α) ≠ ∅, a contradiction. This shows that Ω is nowhere dense in X. □

The following lemma employs the monotonicity notion for set-valued mappings from Definition 5.31. The proof of this lemma contains the main arguments to verify the generic Fréchet differentiability of convex continuous functions on a major subclass of Banach spaces.

Lemma 5.65 Let X be a normed space, and let F : X ⇒ X∗ be a monotone mapping. For any u∗ ∈ X∗ and α ∈ (0, 1) consider the sets

Aₖ := {x ∈ dom(F) | lim_{r↓0} diam F(B(x; r)) > 1/k},
Cₖ := {x ∈ Aₖ | d(u∗; F(x)) < α/(4k)}, k ∈ N.

Then Cₖ is an α-cone meager set for every k ∈ N.

Proof. Fix k ∈ N, x ∈ Cₖ, and ε > 0. Then x ∈ Aₖ, and hence there exist 0 < r < ε and y₁, y₂ ∈ B(x; r) with y₁∗ ∈ F(y₁) and y₂∗ ∈ F(y₂) such that ‖y₁∗ − y₂∗‖ > k⁻¹ due to the construction of Aₖ. Thus for any x∗ ∈ F(x) we have either ‖x∗ − y₁∗‖ > (2k)⁻¹ or ‖x∗ − y₂∗‖ > (2k)⁻¹. Since d(u∗; F(x)) < α/(4k), there exists x∗ ∈ F(x) with ‖u∗ − x∗‖ < α/(4k). For y and y∗ equal to either y₁ and y₁∗ or y₂ and y₂∗, respectively, we get y ∈ B(x; ε) and y∗ ∈ F(y) satisfying

‖y∗ − u∗‖ ≥ ‖y∗ − x∗‖ − ‖x∗ − u∗‖ > 1/(2k) − α/(4k) > 1/(4k).

Observing that y∗ − u∗ ≠ 0, we are going to show that

Cₖ ∩ K(y, y∗ − u∗, α) = ∅,

where K(y, y∗ − u∗, α) is taken from (5.35). Supposing that there exists z ∈ Cₖ ∩ K(y, y∗ − u∗, α), we get z ∈ dom(F) and deduce from the construction of the cone that

⟨y∗ − u∗, z − y⟩ ≥ α‖y∗ − u∗‖·‖z − y‖.

Picking any z∗ ∈ F(z), the monotonicity of F ensures that ⟨z∗ − y∗, z − y⟩ ≥ 0, and hence

⟨z∗ − u∗, z − y⟩ = ⟨z∗ − y∗, z − y⟩ + ⟨y∗ − u∗, z − y⟩
≥ ⟨y∗ − u∗, z − y⟩ ≥ α‖y∗ − u∗‖·‖z − y‖ > (α/(4k))·‖z − y‖,

which implies in turn that ‖z∗ − u∗‖ ≥ α/(4k). Thus d(u∗; F(z)) ≥ α/(4k), a clear contradiction due to the choice of z ∈ Cₖ. Thus Cₖ is an α-cone meager set, which is claimed in the lemma. □
Finally, we are ready to prove the aforementioned result on generic Fréchet
differentiability of convex continuous functions.
Theorem 5.66 Let X be a Banach space with a separable dual, and let
f : Ω → R be a convex continuous function defined on a nonempty, open,
and convex subset Ω of X. Then there exists a dense Gδ set U ⊂ Ω such
that f is Fréchet differentiable at every point x ∈ U . This ensures that every
Banach space with a separable dual is Asplund.
Proof. Consider the mapping F : X →→ X∗ defined by F(x) := ∂f(x) if x ∈ Ω and F(x) := ∅ otherwise. Then F is monotone on X by the monotonicity of the subgradient mapping; see Proposition 5.33. Consider further the sets

  A := {x ∈ Ω | lim_{r↓0} diam F(B(x; r)) > 0},

  Ak := {x ∈ Ω | lim_{r↓0} diam F(B(x; r)) > k⁻¹}, k ∈ N.

Let {u∗m} be a dense set in X∗, which exists due to the assumed separability of the dual space. Fix 0 < α < 1 and define the family of sets

  Ck,m := {x ∈ Ak | d(u∗m; F(x)) < α/(4k)} for all k, m ∈ N.

Then we clearly have the representations

  A = ∪_{k∈N} Ak and Ak = ∪_{m∈N} Ck,m for each k ∈ N.

It follows from Lemmas 5.64 and 5.65 that each set Ck,m is nowhere dense.
Define further U := Ω \ A and deduce from the classical Baire category theorem that U is a dense Gδ set. We obviously get that the mapping F is single-valued and upper semicontinuous in the norm topology on X × X∗ at every point of U. Then Theorem 5.52 tells us that f is Fréchet differentiable at every x ∈ U. The latter verifies that X is an Asplund space by Definition 5.58(a). □

The above theorem presents just one result from Asplund space theory. In general, Asplund spaces constitute a nice and broad subclass of Banach spaces, which includes—besides spaces with separable duals as in Theorem 5.66—every reflexive Banach space. One of the most beautiful characterizations of Asplund spaces is the following: a Banach space is Asplund if and only if each of its separable subspaces has a separable dual. We discuss some other important facts about Asplund spaces in Sections 5.8 and 5.9 with the references therein.

5.7 Spectral and Singular Functions in Convex Analysis


This section is devoted to the study of two special classes of convex functions on finite-dimensional spaces that are highly important for various applications, in particular, to optimization theory and numerical algorithms for solving problems of machine learning, image reconstruction, etc. These applications will be considered in the second volume of our book, while here we mainly concentrate on subdifferential results, which are of independent interest and are especially needed for such applications. Besides deriving subgradient and ε-subgradient formulas for spectral and singular functions, this section presents simplified proofs for some versions of the fundamental von Neumann trace inequality and their applications. We keep here the standard notation used in the theory and applications of spectral and singular functions, although it differs from the notation system used in other parts of the book. This should not cause confusion for the readers.

5.7.1 Von Neumann Trace Inequality

Throughout the whole section, Sn denotes the set of symmetric n×n matrices
with real entries, Sn+ stands for the subset of Sn consisting of positive semidef-
inite matrices, and On denotes the set of n × n real orthonormal matrices.
That is, U ∈ On if and only if U T U = I, where the symbol U T indicates the
transpose of the square matrix U. Given a matrix A ∈ Rn×n, recall that its trace is defined by

  tr(A) := ∑_{i=1}^{n} ⟨Aei, ei⟩,    (5.36)

where {e1 , . . . , en } is the standard orthonormal basis of Rn . First we show


that definition (5.36) does not change if instead of {e1 , . . . , en } we use any
other orthonormal basis of Rn .

Lemma 5.67 Let {u1, . . . , un} be an orthonormal basis of Rn. Then for any A ∈ Rn×n we have the trace representation

  tr(A) = ∑_{i=1}^{n} ⟨Aui, ui⟩.    (5.37)

Proof. Let U ∈ Rn×n be the transition matrix such that Uei = ui for all i = 1, . . . , n. Then U is an orthonormal matrix, and therefore

  tr(A) = tr(U^T A U),

which holds since the trace is commutative under products, i.e., tr(AB) = tr(BA). This readily yields, due to (5.36), that

  tr(A) = tr(U^T A U) = ∑_{i=1}^{n} ⟨U^T A U ei, ei⟩ = ∑_{i=1}^{n} ⟨A U ei, U ei⟩ = ∑_{i=1}^{n} ⟨Aui, ui⟩.

This gives us (5.37) and thus completes the proof of the lemma. □
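As an independent numerical plausibility check (not part of the book's text), the basis-independence of the trace in Lemma 5.67 can be confirmed on a small example; the 2 × 2 test matrix and the rotation angle below are arbitrary choices of ours.

```python
import math

def mat_mul(A, B):
    # product of two 2x2 matrices given as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def trace(A):
    # definition (5.36): sum of the diagonal entries <A e_i, e_i>
    return A[0][0] + A[1][1]

A = [[2.0, 1.0], [3.0, 4.0]]          # arbitrary test matrix
t = 0.7                               # arbitrary rotation angle
U = [[math.cos(t), -math.sin(t)],     # rotation matrix, so U^T U = I
     [math.sin(t), math.cos(t)]]

# Lemma 5.67 with the basis u_i = U e_i amounts to tr(A) = tr(U^T A U)
lhs = trace(A)
rhs = trace(mat_mul(transpose(U), mat_mul(A, U)))
print(abs(lhs - rhs) < 1e-12)  # True
```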

It is easy to check that the trace (5.36) of the product AB of any matrices A, B ∈ Sn with the entries aij and bij, respectively, can be represented as

  tr(AB) = ∑_{i=1}^{n} ∑_{j=1}^{n} aij bij.    (5.38)
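The entrywise formula (5.38) is straightforward to confirm numerically; the following sketch (ours, not the book's) compares it with the trace of the product computed directly for a pair of symmetric 2 × 2 matrices.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1.0, 2.0], [2.0, 5.0]]    # symmetric test matrix
B = [[3.0, -1.0], [-1.0, 4.0]]  # symmetric test matrix

# left side: trace of the matrix product AB
tr_AB = sum(mat_mul(A, B)[i][i] for i in range(2))
# right side of (5.38): entrywise sum a_ij * b_ij
entrywise = sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))
print(tr_AB == entrywise)  # True
```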

Now we are ready to derive a basic version of von Neumann's trace inequality for symmetric positive semidefinite matrices. It is said that the matrices A, B ∈ Sn+ admit a simultaneously ordered spectral decomposition if there exists an orthonormal matrix U ∈ On together with diagonal matrices A1, B1 ∈ Sn+ having nonincreasing entries on the diagonal such that

  A = U^T A1 U and B = U^T B1 U.

Theorem 5.68 Let A, B ∈ Sn+. Then we have the trace inequality

  tr(AB) ≤ ⟨λ(A), λ(B)⟩,    (5.39)

where λ(A) = [λ1(A), . . . , λn(A)]^T (and similarly for λ(B)) consists of all the eigenvalues of A satisfying

  λ1(A) ≥ λ2(A) ≥ . . . ≥ λn(A).

Furthermore, (5.39) holds as an equality if and only if the matrices A and B admit a simultaneously ordered spectral decomposition.

Proof. By using the spectral decompositions of A and B, we find orthonormal bases {u1, . . . , un} and {w1, . . . , wn} giving us the representations

  Ax = ∑_{i=1}^{n} αi ⟨x, ui⟩ ui and Bx = ∑_{i=1}^{n} βi ⟨x, wi⟩ wi for all x ∈ Rn,

where αi := λi(A) and βi := λi(B) as i = 1, . . . , n. Involving the trace definition (5.36) and representation (5.37) from Lemma 5.67, we have

  tr(AB) = ∑_{i=1}^{n} ⟨ABwi, wi⟩ = ∑_{i=1}^{n} βi ⟨Awi, wi⟩
    = ∑_{i=1}^{n} ∑_{j=1}^{n} αj βi ⟨wi, uj⟩⟨uj, wi⟩
    = ∑_{i=1}^{n} ∑_{j=1}^{n} αj βi ⟨uj, wi⟩².

For every i = 1, . . . , n consider the vector xi := [1, . . . , 1, 0, . . . , 0]^T ∈ Rn, where the number of 1 entries is exactly i. Then we get the representations

  λ(A) = ∑_{i=1}^{n} ηi xi and λ(B) = ∑_{i=1}^{n} γi xi

with ηi, γi ≥ 0 for all i = 1, . . . , n. Taking into account the obvious identity

  ∑_{j=1}^{n} ∑_{r=j}^{n} aj br = ∑_{r=1}^{n} ∑_{j=1}^{r} aj br

for any real numbers a1, . . . , an and b1, . . . , bn, we arrive at


  ∑_{i=1}^{n} ∑_{j=1}^{n} αj βi ⟨uj, wi⟩² = ∑_{i=1}^{n} ∑_{j=1}^{n} (∑_{r=j}^{n} ηr)(∑_{s=i}^{n} γs) ⟨uj, wi⟩²
    = ∑_{i=1}^{n} ∑_{j=1}^{n} ηj γi ∑_{r=1}^{j} ∑_{s=1}^{i} ⟨ur, ws⟩²
    ≤ ∑_{i=1}^{n} ∑_{j=1}^{n} ηj γi min{j, i}
    = ⟨∑_{j=1}^{n} ηj xj, ∑_{i=1}^{n} γi xi⟩ = ⟨λ(A), λ(B)⟩,

and hence verify the von Neumann trace inequality (5.39) for A, B ∈ Sn+. The equality statement of the theorem follows from the above proof. □
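The trace inequality (5.39) and its equality case can be illustrated numerically. The sketch below (our own, using the closed-form eigenvalues of symmetric 2 × 2 matrices) checks (5.39) for two positive semidefinite matrices of our choosing and confirms equality when B shares the eigenvectors of A.

```python
import math

def eigvals_sym2(M):
    # eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], nonincreasing
    a, b, d = M[0][0], M[0][1], M[1][1]
    m = (a + d) / 2.0
    r = math.hypot((a - d) / 2.0, b)
    return [m + r, m - r]

def tr_prod(A, B):
    # tr(AB) computed via the entrywise formula (5.38), valid for symmetric A, B
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

# positive semidefinite test matrices (arbitrary choices)
A = [[3.0, 1.0], [1.0, 2.0]]
B = [[4.0, -1.0], [-1.0, 3.0]]

lam_A, lam_B = eigvals_sym2(A), eigvals_sym2(B)
lhs = tr_prod(A, B)                               # = 16
rhs = lam_A[0] * lam_B[0] + lam_A[1] * lam_B[1]   # = 20
print(lhs <= rhs)  # True: the trace inequality (5.39)

# equality case: B2 = A + I has the same eigenvectors as A
B2 = [[4.0, 1.0], [1.0, 3.0]]
lam_B2 = eigvals_sym2(B2)
assert abs(tr_prod(A, B2) - (lam_A[0] * lam_B2[0] + lam_A[1] * lam_B2[1])) < 1e-9
```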

Finally in this subsection, we derive von Neumann’s trace inequality for


arbitrary symmetric matrices.
Theorem 5.69 Let A, B ∈ Sn , i.e., they are symmetric matrices with real
entries. Then we have the trace inequality (5.39), where λ(A) and λ(B) are
defined in Theorem 5.68.
Proof. Choosing α > 0 such that A + αIn and B + αIn are positive semidefinite, we apply Theorem 5.68 to get

  tr((A + αIn)(B + αIn)) ≤ ⟨λ(A + αIn), λ(B + αIn)⟩.

It follows therefore that

  tr(AB) + α tr(A) + α tr(B) + nα²
    = tr(AB) + α ∑_{i=1}^{n} λi(A) + α ∑_{i=1}^{n} λi(B) + nα²
    ≤ ⟨λ(A), λ(B)⟩ + α ∑_{i=1}^{n} λi(A) + α ∑_{i=1}^{n} λi(B) + nα².

This clearly verifies (5.39) and the equality statement therein for the case under consideration, and thus we are done with the proof. □
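Theorem 5.69 removes the positive semidefiniteness assumption; the minimal check below (our own sketch) uses indefinite symmetric matrices and also confirms that the diagonal shift used in the proof produces a positive semidefinite matrix.

```python
import math

def eigvals_sym2(M):
    # eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], nonincreasing
    a, b, d = M[0][0], M[0][1], M[1][1]
    m, r = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return [m + r, m - r]

def tr_prod(A, B):
    # tr(AB) for symmetric A, B via the entrywise formula (5.38)
    return sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))

# indefinite symmetric matrices: no positive semidefiniteness assumed
A = [[1.0, 2.0], [2.0, -3.0]]
B = [[-2.0, 1.0], [1.0, 4.0]]
lam_A, lam_B = eigvals_sym2(A), eigvals_sym2(B)
lhs = tr_prod(A, B)
rhs = sum(x * y for x, y in zip(lam_A, lam_B))
print(lhs <= rhs)  # True: (5.39) holds for merely symmetric matrices

# the shift used in the proof: A + alpha*I becomes positive semidefinite
alpha = 10.0
shifted = [[A[0][0] + alpha, A[0][1]], [A[1][0], A[1][1] + alpha]]
assert all(v >= 0 for v in eigvals_sym2(shifted))
```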
Yet another version of von Neumann’s trace inequality, which is needed to
study singular functions, is presented in Subsection 5.7.3.

5.7.2 Spectral and Symmetric Functions

Let us begin with the following definitions of the two major classes of functions
that are studied in this subsection. Now A and B stand for matrix variables.
Definition 5.70 We say that F : Sn → R is a spectral function if it is On-invariant in the sense that

  F(UAU^T) = F(A) for all A ∈ Sn and U ∈ On.

Definition 5.71 Given x ∈ Rn and σ ∈ sym(n), the symmetric group of order


n, denote by σx the element of Rn obtained by permuting the components of
x by σ. Then f : Rn → R is called a symmetric function if f (x) = f (σx)
for all σ ∈ sym(n).
Our first result here describes a particular class of spectral functions gen-
erated by eigenvalues of symmetric matrices.
Proposition 5.72 Consider the eigenvalue mapping λ : Sn → Rn defined by
λ(A) := (λ1 (A), . . . , λn (A)), where λi (A) for i = 1, . . . , n are the eigenvalues
of A in nonincreasing order. Then we have the representation
  λ(A) = λ(UAU^T) for all A ∈ Sn and U ∈ On.

Proof. Fixing any A ∈ Sn and U ∈ On, we get

  det(UAU^T − λI) = det(U(A − λI)U^T) = det(A − λI).

Thus λ is an eigenvalue of A if and only if it is an eigenvalue of UAU^T. This verifies the claimed statement. □

The next proposition shows that a general class of spectral functions can
be reduced to the eigenvalue ones via compositions with symmetric functions.

Proposition 5.73 A function F : Sn → R is spectral if and only if there exists a symmetric function f : Rn → R such that

  F(A) = f(λ(A)) for all A ∈ Sn.    (5.40)

Proof. Let F be a spectral function. Define f : Rn → R by

  f(x) := F(Diag(x)) for x ∈ Rn,

where Diag(x) signifies the n × n diagonal matrix whose diagonal entries are the components of x. This function is symmetric since for every σ ∈ sym(n) there is a permutation matrix P ∈ On with Diag(σx) = P(Diag(x))P^T, and hence f(σx) = F(Diag(σx)) = F(Diag(x)) = f(x). Fix any A ∈ Sn and U ∈ On satisfying

  A = U^T(Diag(λ(A)))U.

Then we have the representations Diag(λ(A)) = UAU^T and

  f(λ(A)) = F(Diag(λ(A))) = F(UAU^T) = F(A).

To verify the opposite implication, suppose that there exists a symmetric function f : Rn → R such that (5.40) is satisfied. Then for any U ∈ On we get

  F(UAU^T) = f(λ(UAU^T)) = f(λ(A)) = F(A),

and thus F is a spectral function. □
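Proposition 5.73 can be illustrated with the symmetric function f(x) = max{x1, x2}, for which F = f ∘ λ is the largest-eigenvalue function; the check of On-invariance below is our own sketch with an arbitrary rotation.

```python
import math

def eigvals_sym2(M):
    # eigenvalues of a symmetric 2x2 matrix, nonincreasing order
    a, b, d = M[0][0], M[0][1], M[1][1]
    m, r = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return [m + r, m - r]

def F(A):
    # spectral function F = f ∘ λ with the symmetric function f(x) = max(x_i)
    return max(eigvals_sym2(A))

def conj_by(U, A):
    # computes U A U^T for 2x2 matrices
    mul = lambda X, Y: [[sum(X[i][k] * Y[k][j] for k in range(2))
                         for j in range(2)] for i in range(2)]
    Ut = [[U[j][i] for j in range(2)] for i in range(2)]
    return mul(U, mul(A, Ut))

A = [[2.0, 1.0], [1.0, 3.0]]
t = 1.1  # arbitrary angle; U is then orthonormal
U = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

# On-invariance from Definition 5.70: F(U A U^T) = F(A)
print(abs(F(conj_by(U, A)) - F(A)) < 1e-9)  # True
```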

To deal further with Fenchel conjugates and subgradients of symmetric


functions (and hence of spectral functions due to Proposition 5.73), we present
first the following lemma on the symmetry of conjugates.
Lemma 5.74 If f : Rn → R is a symmetric function, then its Fenchel conjugate f∗ : Rn → R is symmetric as well.

Proof. Fix any x, y ∈ Rn and σ ∈ sym(n). Since ⟨x, y⟩ = ⟨σx, σy⟩, we get

  f∗(σx) = sup{⟨σy, σx⟩ − f(σy) | y ∈ Rn} = sup{⟨y, x⟩ − f(y) | y ∈ Rn} = f∗(x),

which shows that f∗ is a symmetric function. □

The next theorem establishes a precise calculus formula for computing the conjugates of compositions of type (5.40) that contain spectral functions.

Theorem 5.75 Let f : Rn → R be a symmetric function, and let λ : Sn → Rn be the eigenvalue mapping defined in Proposition 5.72. Then the Fenchel conjugate of the composite function (f ◦ λ) : Sn → R is the composition of the Fenchel conjugate of f with λ, i.e.,

  (f ◦ λ)∗(B) = (f∗ ◦ λ)(B) for all B ∈ Sn.    (5.41)

Proof. It is based on the von Neumann trace inequality for symmetric matrices from Theorem 5.69, which tells us that

  tr(AB) ≤ ⟨λ(A), λ(B)⟩ for all A, B ∈ Sn.    (5.42)

It follows from (5.42) and the definition of conjugates that

  (f ◦ λ)∗(B) = sup{tr(AB) − f(λ(A)) | A ∈ Sn}
    ≤ sup{⟨λ(A), λ(B)⟩ − f(λ(A)) | A ∈ Sn}
    ≤ sup{⟨x, λ(B)⟩ − f(x) | x ∈ Rn} = (f∗ ◦ λ)(B)

for all B ∈ Sn. This justifies the inequality "≤" in (5.41).
To verify the opposite inequality in (5.41), we get from the spectral decomposition that B = U^T(Diag λ(B))U for some U ∈ On. Recalling the trace commutation tr(BA) = tr(AB) for all A, B ∈ Sn and the symmetry of f gives us the relationships

  (f∗ ◦ λ)(B) = sup{⟨x, λ(B)⟩ − f(x) | x ∈ Rn}
    = sup{tr((Diag x)(Diag λ(B))) − f(x) | x ∈ Rn}
    = sup{tr((Diag x)UBU^T) − (f ◦ λ)(Diag x) | x ∈ Rn}
    = sup{tr((U^T(Diag x)U)B) − (f ◦ λ)(U^T(Diag x)U) | x ∈ Rn}
    ≤ sup{tr(AB) − (f ◦ λ)(A) | A ∈ Sn} = (f ◦ λ)∗(B),

which therefore justifies the claimed matrix conjugate rule (5.41). □

Let us mention an immediate consequence of Theorem 5.75 saying that


the l.s.c. and convexity properties of f yield those for f ◦ λ. We use it in the
subsequent results of this subsection that provide the ε-subdifferential and
subdifferential chain rules for convex composite functions of matrix variables.

Theorem 5.76 Let f : Rn → R be a proper, convex, and symmetric function, and let λ : Sn → Rn be the eigenvalue mapping. Then for any matrices A, B ∈ Sn and any number ε ≥ 0 the following ε-subgradient properties of the composition f ◦ λ are equivalent:
(a) B ∈ ∂ε(f ◦ λ)(A).
(b) We have γ := ε + ⟨A, B⟩ − ⟨λ(A), λ(B)⟩ ≥ 0 and λ(B) ∈ ∂γ f(λ(A)).

Proof. Supposing that (a) is satisfied, we get by Proposition 5.18 that

  (f ◦ λ)∗(B) + (f ◦ λ)(A) ≤ ⟨A, B⟩ + ε.    (5.43)

Applying now Theorem 5.75 tells us that

  f∗(λ(B)) + f(λ(A)) ≤ ⟨A, B⟩ + ε
    = ⟨λ(A), λ(B)⟩ + (⟨A, B⟩ − ⟨λ(A), λ(B)⟩ + ε)
    = ⟨λ(A), λ(B)⟩ + γ.

Using the Fenchel-Young inequality from Proposition 4.9 yields

  ⟨λ(A), λ(B)⟩ ≤ f∗(λ(B)) + f(λ(A)) ≤ ⟨A, B⟩ + ε,

which ensures that γ ≥ 0. Applying again Proposition 5.18 leads us to λ(B) ∈ ∂γ f(λ(A)), which therefore justifies implication (a)=⇒(b).
To verify the converse implication (b)=⇒(a), we get from (b) that

  f∗(λ(B)) + f(λ(A)) ≤ ⟨λ(A), λ(B)⟩ + γ = ⟨A, B⟩ + ε,

which yields the fulfillment of (5.43) and hence of B ∈ ∂ε(f ◦ λ)(A). □

As a consequence of Theorem 5.76 and von Neumann's trace inequality, we obtain the following equivalent descriptions of subgradients of spectral compositions.

Corollary 5.77 Let f : Rn → R satisfy the assumptions of Theorem 5.76. Then for any matrices A, B ∈ Sn the following properties are equivalent:
(a) B ∈ ∂(f ◦ λ)(A).
(b) ⟨A, B⟩ = ⟨λ(A), λ(B)⟩ and λ(B) ∈ ∂f(λ(A)).
(c) A and B have a simultaneously ordered spectral decomposition and satisfy the inclusion λ(B) ∈ ∂f(λ(A)).

Proof. The equivalence between (a) and (b) follows from Theorem 5.76 with ε = 0. The equivalence between (b) and (c) is a consequence of the equality statement in Theorem 5.69. □

The next theorem provides yet another equivalent description of subgradients for compositions f ◦ λ in terms of the On-invariance from Definition 5.70.

Theorem 5.78 Let f : Rn → R satisfy the assumptions of Theorem 5.76. Then for any matrices A, B ∈ Sn the following properties are equivalent:
(a) B ∈ ∂(f ◦ λ)(A).
(b) A = U^T(Diag x)U and B = U^T(Diag y)U for some matrix U ∈ On and vectors x, y ∈ Rn with y ∈ ∂f(x).

Proof. If (a) holds, then we deduce from Corollary 5.77 that the matrices A and B have a simultaneously ordered spectral decomposition, i.e.,

  A = U^T(Diag λ(A))U and B = U^T(Diag λ(B))U for some U ∈ On.

Letting x := λ(A) and y := λ(B), we get

  A = U^T(Diag x)U and B = U^T(Diag y)U with y = λ(B) ∈ ∂f(λ(A)) = ∂f(x),

which gives us (b) and hence verifies implication (a)=⇒(b).
To prove the opposite implication (b)=⇒(a), let A := U^T(Diag x)U and B := U^T(Diag y)U for some matrix U ∈ On and vectors x, y ∈ Rn with y ∈ ∂f(x). Then we can easily check that

  det(A − λI) = (x1 − λ)(x2 − λ) · · · (xn − λ) for x = (x1, . . . , xn),

which shows that the components of x are the eigenvalues of A. Dealing similarly with (y, B) tells us that y is composed of the eigenvalues of B. Thus x is a permutation of λ(A), and y is a permutation of λ(B). Since f and f∗ are symmetric and since y ∈ ∂f(x), we obtain

  f(λ(A)) + f∗(λ(B)) = f(x) + f∗(y) = ⟨x, y⟩ ≤ ⟨λ(A), λ(B)⟩,

which yields λ(B) ∈ ∂f(λ(A)) and thus verifies implication (b)=⇒(a). □
The following result is a direct consequence of Theorem 5.78.

Corollary 5.79 Let f : Rn → R satisfy the assumptions of Theorem 5.76. Then for all A ∈ Sn we have the subdifferential representation

  ∂(f ◦ λ)(A) = {U^T(Diag v)U | v ∈ ∂f(λ(A)), U ∈ On, A = U^T(Diag λ(A))U}.    (5.44)
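As a numerical illustration of representation (5.44) (our own sketch, not part of the book), take f(x) = max{x1, x2}, so that f ∘ λ = λmax. At a matrix with distinct eigenvalues, ∂f(λ(A)) = {e1}, and the representation produces the rank-one subgradient G = uu^T built from a top unit eigenvector u; the code verifies the subgradient inequality λmax(B) ≥ λmax(A) + ⟨G, B − A⟩ for several symmetric test matrices B of our choosing.

```python
import math

def eigvals_sym2(M):
    a, b, d = M[0][0], M[0][1], M[1][1]
    m, r = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    return [m + r, m - r]

def lam_max(M):
    return eigvals_sym2(M)[0]

A = [[2.0, 1.0], [1.0, 3.0]]
l1 = lam_max(A)

# unit eigenvector u for the top eigenvalue: (A - l1*I) u = 0 (offdiagonal != 0 here)
v = [A[0][1], l1 - A[0][0]]
nv = math.hypot(*v)
u = [v[0] / nv, v[1] / nv]

# subgradient of lambda_max at A suggested by (5.44) with v = e1: G = u u^T
G = [[u[i] * u[j] for j in range(2)] for i in range(2)]

# check the subgradient inequality lam_max(B) >= lam_max(A) + <G, B - A>
ok = True
for B in ([[5.0, -2.0], [-2.0, 1.0]], [[0.0, 0.5], [0.5, -1.0]],
          [[2.0, 1.0], [1.0, 3.0]]):
    inner = sum(G[i][j] * (B[i][j] - A[i][j]) for i in range(2) for j in range(2))
    ok = ok and lam_max(B) >= l1 + inner - 1e-9
print(ok)  # True
```

The inequality holds automatically here since l1 + ⟨G, B − A⟩ = u^T B u ≤ λmax(B) for any unit vector u, which is exactly why G is a subgradient.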

5.7.3 Singular Functions and Their Subgradients


In this subsection we consider the class of singular functions, which are related to but different from the spectral functions studied above. The convex analysis of this class of functions developed below is similar to, but somewhat different from, that for spectral functions in Subsection 5.7.2. Let A be an m × n real matrix. The singular value decomposition of the matrix A is given in the form

  A := UΣV^T,

where U is an m × m real orthonormal matrix, Σ is a diagonal m × n matrix with nonnegative numbers on the diagonal, and V is an n × n real orthonormal matrix. Each entry on the diagonal of Σ is called a singular value of A.

Definition 5.80 Consider a mapping σ : Rm×n → Rp with p := min{m, n}. We say that σ(·) is a singular mapping if it maps any matrix A ∈ Rm×n into the vector σ(A) = (σ1(A), . . . , σp(A)) consisting of the singular values of A in descending order

  σ1(A) ≥ · · · ≥ σp(A).

Given further a symmetric function f : Rp → R, its composition (f ◦ σ) : Rm×n → R with a singular mapping σ(·) is called a singular function.

In the rest of this subsection we always let p := min{m, n}. Also, the inner product on the matrix space Rm×n is defined via the matrix trace

  ⟨A, B⟩ := tr(A^T B) = ∑_{i=1}^{m} ∑_{j=1}^{n} aij bij,    (5.45)

where A := (aij) and B := (bij) ∈ Rm×n.


As in Subsection 5.7.1, we broadly employ below the corresponding version of von Neumann's trace inequality formulated in the next theorem. The proof of this theorem is similar to the proof of Theorem 5.68 in the case of square symmetric matrices, and we leave it as an exercise for the reader.

Theorem 5.81 Given two matrices A, B ∈ Rm×n, let A = UA ΣA VA^T and B = UB ΣB VB^T be their ordered singular value decompositions. Then we have

  tr(A^T B) ≤ σ(A)^T σ(B),    (5.46)

where equality holds if and only if UA = UB and VA = VB, i.e., the singular value decompositions are simultaneously ordered.
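A quick numerical check of (5.46) for square 2 × 2 matrices is our own sketch below; the singular values are computed as square roots of the eigenvalues of the Gram matrix A^T A, and the test matrices are arbitrary.

```python
import math

def singvals2(A):
    # singular values of a real 2x2 matrix: square roots of the
    # eigenvalues of A^T A, returned in nonincreasing order
    g = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]                       # Gram matrix A^T A
    m = (g[0][0] + g[1][1]) / 2.0
    r = math.hypot((g[0][0] - g[1][1]) / 2.0, g[0][1])
    return [math.sqrt(max(m + r, 0.0)), math.sqrt(max(m - r, 0.0))]

A = [[1.0, 2.0], [-3.0, 0.5]]   # arbitrary, not symmetric
B = [[0.0, -1.0], [2.0, 1.0]]

lhs = sum(A[i][j] * B[i][j] for i in range(2) for j in range(2))  # tr(A^T B)
sA, sB = singvals2(A), singvals2(B)
rhs = sA[0] * sB[0] + sA[1] * sB[1]
print(lhs <= rhs)  # True: inequality (5.46)
```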
To proceed further, we need the following important notion.

Definition 5.82 An extended-real-valued function f : Rp → R is said to be absolutely symmetric if f(x) = f(|x|) for all x ∈ Rp, where |x| denotes the vector with the components |xi| arranged in nonincreasing order.
The next lemma is useful in the proofs of the main results below.

Lemma 5.83 Let A := U(Diag x)V^T, where U ∈ Om, V ∈ On, and x ∈ Rp. Then |x| consists of the singular values of A.

Proof. It follows from the constructions that

  det(A^T A − γIn) = det(V(Diag x)^T(Diag x)V^T − γIn)
    = det(V((Diag x)^T(Diag x) − γIn)V^T)
    = det((Diag x)^T(Diag x) − γIn)
    = (x1² − γ)(x2² − γ) · · · (xp² − γ)(−γ)^{n−p}.

This shows that each xi² for i = 1, . . . , p is an eigenvalue of A^T A, and thus |xi| is a singular value of A. □
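Lemma 5.83 is easy to test numerically: build A = U(Diag x)V^T with a negative entry in x and recover |x| as the singular values. The rotation angles below are arbitrary choices; the sketch is ours.

```python
import math

def mul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rot(t):
    # 2x2 rotation, hence orthonormal
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def singvals2(A):
    # singular values via the eigenvalues of the Gram matrix A^T A
    g = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    m = (g[0][0] + g[1][1]) / 2.0
    r = math.hypot((g[0][0] - g[1][1]) / 2.0, g[0][1])
    return [math.sqrt(max(m + r, 0.0)), math.sqrt(max(m - r, 0.0))]

x = [-3.0, 1.0]                        # note the negative entry
U, V = rot(0.4), rot(-1.3)             # orthonormal factors
Vt = [[V[j][i] for j in range(2)] for i in range(2)]
A = mul2(U, mul2([[x[0], 0.0], [0.0, x[1]]], Vt))

sv = singvals2(A)
print(sv)  # approximately [3.0, 1.0], i.e., |x| sorted nonincreasingly
```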
Combining this lemma with the von Neumann trace inequality (5.46) leads us to deriving the major results on Fenchel conjugates and subgradients of singular functions. First we present the following theorem on conjugates.

Theorem 5.84 Let (f ◦ σ) : Rm×n → R be a singular function in the sense of Definition 5.80, and let f : Rp → R therein be absolutely symmetric. Then the Fenchel conjugate of the composition f ◦ σ agrees with the composition of the Fenchel conjugate of f with σ, i.e., we have

  (f ◦ σ)∗(B) = (f∗ ◦ σ)(B) for any B ∈ Rm×n.    (5.47)

Proof. Fixing any B ∈ Rm×n and applying the von Neumann trace inequality from Theorem 5.81, we get

  (f ◦ σ)∗(B) = sup{tr(A^T B) − f(σ(A)) | A ∈ Rm×n}
    ≤ sup{⟨σ(A), σ(B)⟩ − f(σ(A)) | A ∈ Rm×n}
    ≤ sup{⟨x, σ(B)⟩ − f(x) | x ∈ Rp} = (f∗ ◦ σ)(B),

which verifies the inequality "≤" in (5.47). To prove the opposite inequality therein, we employ the singular value decomposition of B given by B = U(Diag σ(B))V^T for some U ∈ Om and V ∈ On, where Diag σ(B) is the m × n matrix obtained by placing the components of σ(B) on its diagonal. Using now Lemma 5.83 and the facts that tr(A^T B) = tr(B^T A) and that tr(AB) = tr(BA) tells us that

  (f∗ ◦ σ)(B) = sup{⟨σ(B), x⟩ − f(x) | x ∈ Rp}
    = sup{tr((Diag σ(B))^T(Diag x)) − f(x) | x ∈ Rp}
    = sup{tr(V^T B^T U(Diag x)) − f(x) | x ∈ Rp}
    = sup{tr(B^T U(Diag x)V^T) − f(x) | x ∈ Rp}
    = sup{tr((U(Diag x)V^T)^T B) − (f ◦ σ)(U(Diag x)V^T) | x ∈ Rp}
    ≤ sup{tr(A^T B) − (f ◦ σ)(A) | A ∈ Rm×n} = (f ◦ σ)∗(B),

which therefore completes the proof of the theorem. □

Next we proceed with subdifferential calculus for singular matrix functions.



Theorem 5.85 Let f : Rp → R be a proper, convex, and absolutely symmetric function. Then for any ε ≥ 0 and any matrices A, B ∈ Rm×n the following properties are equivalent:
(a) B ∈ ∂ε(f ◦ σ)(A).
(b) We have γ := ε + ⟨A, B⟩ − ⟨σ(A), σ(B)⟩ ≥ 0 and σ(B) ∈ ∂γ f(σ(A)).

Proof. If (a) holds, then Proposition 5.18 ensures that

  (f ◦ σ)∗(B) + (f ◦ σ)(A) ≤ ⟨A, B⟩ + ε.    (5.48)

Applying Theorem 5.84 gives us the relationships

  f∗(σ(B)) + f(σ(A)) ≤ ⟨A, B⟩ + ε
    = ⟨σ(A), σ(B)⟩ + (⟨A, B⟩ − ⟨σ(A), σ(B)⟩ + ε)
    = ⟨σ(A), σ(B)⟩ + γ.

Using further the Fenchel-Young inequality, we arrive at

  ⟨σ(A), σ(B)⟩ ≤ f∗(σ(B)) + f(σ(A)) ≤ ⟨A, B⟩ + ε,

which yields γ ≥ 0. Applying Proposition 5.18 tells us that σ(B) ∈ ∂γ f(σ(A)) and hence verifies implication (a)=⇒(b).
For the converse implication (b)=⇒(a), we get from (b) that

  f∗(σ(B)) + f(σ(A)) ≤ ⟨σ(B), σ(A)⟩ + γ = ⟨A, B⟩ + ε.

This gives us (5.48) and B ∈ ∂ε(f ◦ σ)(A); thus we are done. □

The next corollary and its proof are parallel to the case of spectral functions considered in Subsection 5.7.2.

Corollary 5.86 Let f : Rp → R be a proper, convex, and absolutely symmetric function. Then for any A, B ∈ Rm×n we have the equivalent assertions:
(a) B ∈ ∂(f ◦ σ)(A).
(b) ⟨A, B⟩ = ⟨σ(A), σ(B)⟩ and σ(B) ∈ ∂f(σ(A)).
(c) A and B have a simultaneously ordered singular value decomposition and satisfy the inclusion σ(B) ∈ ∂f(σ(A)).

Proof. The equivalence between (a) and (b) follows directly from Theorem 5.85 with ε = 0. The equivalence between (b) and (c) is a consequence of the equality statement in Theorem 5.81. □

The last theorem of this subsection provides yet another description of


subgradients of singular functions that is again parallel to the case of spectral
functions in Subsection 5.7.2 with the additional usage of the absolute sym-
metry of the outer function f in the singular decomposition from Lemma 5.83.

Theorem 5.87 Let f : Rp → R be a proper, convex, and absolutely symmetric function. Then for any A, B ∈ Rm×n the following are equivalent:
(a) B ∈ ∂(f ◦ σ)(A).
(b) A = U(Diag x)V^T and B = U(Diag y)V^T for some matrices U ∈ Om, V ∈ On and vectors x, y ∈ Rp with y ∈ ∂f(x).

Proof. Assuming (a), we get by Corollary 5.86 that the matrices A and B have a simultaneously ordered singular value decomposition. This gives us matrices U ∈ Om and V ∈ On such that A = U(Diag σ(A))V^T and B = U(Diag σ(B))V^T. Denoting x := σ(A) and y := σ(B) implies that A = U(Diag x)V^T and B = U(Diag y)V^T with y = σ(B) ∈ ∂f(σ(A)) = ∂f(x). This yields (b) and thus verifies implication (a)=⇒(b).
Finally, we check that (b)=⇒(a). Having (b), let A := U(Diag x)V^T and B := U(Diag y)V^T for some matrices U ∈ Om, V ∈ On and vectors x, y ∈ Rp with y ∈ ∂f(x). It is easy to see that the components |xi| and |yi| are the singular values of A and B, respectively. Employing Lemma 5.83 and remembering that both functions f and f∗ are absolutely symmetric, as well as that y ∈ ∂f(x), we arrive at the conditions

  f(σ(A)) + f∗(σ(B)) = f(x) + f∗(y) = ⟨x, y⟩ ≤ ⟨σ(A), σ(B)⟩.

This ensures that σ(B) ∈ ∂f(σ(A)) and thus completes the proof. □

The following consequence of Theorem 5.87 is a direct counterpart of Corollary 5.79 for the case of singular functions.

Corollary 5.88 Under the assumptions of Theorem 5.87 we have the subdifferential representation for singular functions:

  ∂(f ◦ σ)(A) = {U(Diag v)V^T | v ∈ ∂f(σ(A)), U ∈ Om, V ∈ On, A = U(Diag σ(A))V^T}.    (5.49)

Proof. Pick B ∈ ∂(f ◦ σ)(A) and deduce from Theorem 5.87 that there exist matrices U ∈ Om and V ∈ On such that A = U(Diag σ(A))V^T and B = U(Diag σ(B))V^T with σ(B) ∈ ∂f(σ(A)). This shows that B belongs to the set on the right-hand side of (5.49).
Suppose now that B := U(Diag v)V^T with v ∈ ∂f(σ(A)), U ∈ Om, V ∈ On, and A = U(Diag σ(A))V^T. Denoting x := σ(A) and y := v, we deduce the opposite inclusion in (5.49) directly from Theorem 5.87. □

This subsection is concluded by an illustrative example, which is of independent importance for constructing some numerical algorithms.

Example 5.89 Given A ∈ Rm×n, consider its nuclear norm

  ‖A‖∗ := ∑_{i=1}^{p} σi(A),

where σ1(A), . . . , σp(A) are the singular values of A arranged in nonincreasing order. Then the nuclear norm function p(A) := ‖A‖∗ can be represented as

  p(A) = (f ◦ σ)(A) for A ∈ Rm×n

with the standard ℓ1-norm f(x) := ‖x‖1 on Rp.
For any A ∈ Rm×n take its singular value decomposition A = U(Diag σ(A))V^T. It follows from Corollary 5.88 that all the subgradients of the nuclear norm function are calculated by

  ∂p(A) = {U(Diag v)V^T | v ∈ ∂f(σ(A)), A = U(Diag σ(A))V^T},

where ∂f(σ(A)) is the subdifferential of the ℓ1-norm at σ(A) given by

  ∂f(σ(A)) = {v = (v1, . . . , vp) ∈ Rp | vi = 1 if σi(A) > 0 and vi ∈ [−1, 1] if σi(A) = 0}.

Since σi(A) ≥ 0, we get from the above that UV^T ∈ ∂‖A‖∗ for A = UΣV^T. This allows us to find the most convenient subgradient to pick when working with numerical algorithms for the nuclear norm regularization.
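The subgradient UV^T of the nuclear norm from Example 5.89 can be sanity-checked numerically. The sketch below (ours, not part of the book) takes a diagonal matrix A, for which U = V = I, and verifies the subgradient inequality ‖B‖∗ ≥ ‖A‖∗ + ⟨UV^T, B − A⟩ using the closed form σ1 + σ2 = √(‖B‖F² + 2|det B|) valid for 2 × 2 matrices.

```python
import math

def nuclear2(B):
    # nuclear norm of a 2x2 matrix: sigma1 + sigma2 = sqrt(||B||_F^2 + 2|det B|),
    # since sigma1^2 + sigma2^2 = ||B||_F^2 and sigma1 * sigma2 = |det B|
    fro2 = sum(B[i][j] ** 2 for i in range(2) for j in range(2))
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return math.sqrt(fro2 + 2.0 * abs(det))

# A = Diag(2, 1) has the trivial SVD U = V = I, so U V^T = I is a subgradient
A = [[2.0, 0.0], [0.0, 1.0]]
G = [[1.0, 0.0], [0.0, 1.0]]
nA = nuclear2(A)          # = 3

ok = True
for B in ([[1.0, 4.0], [-2.0, 0.0]], [[0.0, 0.0], [0.0, -5.0]],
          [[2.5, 0.3], [0.3, 1.1]]):
    inner = sum(G[i][j] * (B[i][j] - A[i][j]) for i in range(2) for j in range(2))
    ok = ok and nuclear2(B) >= nA + inner - 1e-9
print(ok)  # True
```

Here the inequality reduces to ‖B‖∗ ≥ tr(B), which is the subgradient inequality specialized to G = I.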

5.8 Exercises for Chapter 5


Exercise 5.90 Give a proof of the subdifferential variational principle from Theo-
rem 5.3 by using the convex extremal principle. Hint: Compare with the proof of
[228, Theorem 2.28] in the nonconvex case.

Exercise 5.91 Give a direct proof of Theorem 5.10 without using the Ekeland vari-
ational principle. Hint: Use variational arguments similar to those in [228, Theo-
rem 2.10] in Fréchet smooth spaces together with the result of Lemma 5.4.

Exercise 5.92 Give a direct proof of Corollary 5.11 without using the Brøndsted-
Rockafellar density theorem.

Exercise 5.93 Let X be a Banach space. Deduce the Ekeland variational principle
and the Brøndsted-Rockafellar density theorem from the Bishop-Phelps result of
Corollary 5.12. Hint: Compare with the variational proofs in [125].

Exercise 5.94 Let X be a Banach space. Define the set

  Λ := {x∗ ∈ X∗ | ‖x∗‖ = ⟨x∗, x⟩ for some x ∈ X with ‖x‖ = 1}.

Prove that Λ is dense in the space X∗ equipped with the strong topology.

Exercise 5.95 Give a detailed proof of Proposition 5.16.

Exercise 5.96 Give a detailed proof of Proposition 5.17.

Exercise 5.97 Verify inclusion (5.21) in the proof of Theorem 5.24.

Exercise 5.98 Verify the fulfillment of the relationship in (5.15). Hint: Use the
proof of the biconjugate part of Theorem 4.15.

Exercise 5.99 Consider the function f : R → R defined by

  f(x) := ∞ if x ≤ 0, and f(x) := −√x if x > 0.

(a) Show that f is a convex function.
(b) Find ∂f(0) and ∂εf(0) for ε > 0.

Exercise 5.100 Let Ω be a nonempty convex subset of a normed space X, and let ε ≥ 0. Prove that the following hold:
(a) ∂εp(x) = {x∗ ∈ B∗ | ‖x‖ ≤ ⟨x∗, x⟩ + ε}, where p(x) := ‖x‖ for x ∈ X.
(b) ∂εδΩ(x) = {x∗ ∈ X∗ | σΩ(x∗) ≤ ⟨x∗, x⟩ + ε} for x ∈ Ω.

Exercise 5.101 Let f, g : X → R be proper convex functions on a topological vector space X that are bounded from below by the same affine function, and let x̄ ∈ dom(f) ∩ dom(g). Prove that the subdifferential of the infimal convolution is represented in the form

  ∂(f □ g)(x̄) = ∩_{ε>0} ∪_{x∈X} [∂εf(x) ∩ ∂εg(x̄ − x)].

Hint: Use the conjugate descriptions of subgradients and ε-subgradients. Compare with the proof of [166, Theorem 1.1].

Exercise 5.102 Develop the asymptotic versions without qualification conditions


of the other subdifferential calculus rules presented in Chapters 3 and 4 for extended-
real-valued convex functions on topological vector spaces.

Exercise 5.103 (a) Consider the marginal function given in the form

  μ(x) := inf{ϕ(y) | Ay = x},

where A : Y → X is a linear continuous operator between topological vector spaces, and where ϕ : Y → R is a proper convex function. Given x ∈ A(dom(ϕ)), prove the subdifferential representation

  ∂μ(x) = ∩_{ε>0} ∪_{{y | Ay=x}} {x∗ ∈ X∗ | A∗x∗ ∈ ∂εϕ(y)}.

Here we assume that μ(x) > −∞ for all x ∈ X. Hint: Proceed based on the definitions and the conjugate representations of subgradients and ε-subgradients. Compare with the proof of [166, Theorem 4.1].
(b) Derive an extension of the results from (a) to the more general class of marginal functions considered in Theorem 4.56.

Exercise 5.104 Let f : R → R be a convex function. Prove that f is nondecreasing


if and only if ∂f (x) ⊂ [0, ∞) for all x ∈ R. Hint: Use Lemma 5.25.

Exercise 5.105 Let f : X → R be a continuous convex function defined on a normed space. Suppose that there exists ℓ ≥ 0 such that ∂f(x) ⊂ ℓB∗ for all x ∈ X. Prove that f is Lipschitz continuous on X. Hint: Use Theorem 5.27.

Exercise 5.106 Consider the function f : R → R defined by

  f(x) := 0 if |x| < 1, f(x) := 1 if x = 1 or x = −1, and f(x) := ∞ otherwise.

(a) Verify that f is a convex function.
(b) Show that the mean value result of Lemma 5.25 fails for this function on [−1, 1] and clarify the reason for it. Would it be possible to apply the approximate mean value results from Theorems 5.29 and 5.30?
Exercise 5.107 Let f : X → R be a proper, convex, and l.s.c. function on a topological vector space X, and let a, b ∈ dom(f) with a ≠ b.
(a) Prove that there exists c ∈ (a, b) such that

  f(b) − f(a) ∈ ∩_{ε>0} {⟨x∗, b − a⟩ | x∗ ∈ ∂εf(c)}.

Hint: Proceed similarly to the proof of the mean value result of Theorem 5.27, replacing therein the usage of the usual subdifferential chain rule by its asymptotic ε-subdifferential counterpart from Theorem 5.24.
(b) Does the result of (a) go back to Theorem 5.27 when f is continuous?
(c) Compare the asymptotic mean value theorem from (a) with the approximate mean value results from Theorems 5.29 and 5.30 in the case of l.s.c. convex functions on Banach spaces.
Exercise 5.108 Clarify the possibility of using the asymptotic mean value theorem
from Exercise 5.107 to relax the continuity assumption on f in the characterization of
maximal monotone subdifferential mappings in the topological vector space setting
of Theorem 5.34.
Exercise 5.109 Let X be a Banach space.
(a) Give a proof of the result formulated in Remark 5.36(a) by using conjugate
calculus and directional derivatives. Hint: Consider first the case where X is a
reflexive Banach space and compare with the proof of [307, Theorem B].
(b) Simplify the proof of the result in (a) when X is a finite-dimensional space
and also when X is a Hilbert space. Hint: Compare with the proofs in [317,
Theorem 12.17] and in [34, Theorem 22.24].
Exercise 5.110 Give an example of a convex function f : R2 → R the derivative of
which has an uncountable set of discontinuities.
Exercise 5.111 Let X be a normed space, let f : X → R be a proper extended-
real-valued function, and let x ∈ dom(f ).
(a) Show that the Fréchet differentiability of f at x yields the continuity of f at
this point.
(b) Clarify the question in (a) for the case of Gâteaux differentiability.
Exercise 5.112 Let F : X →→ Y be a set-valued mapping between topological vector spaces, and let x ∈ dom(F).

(a) Clarify the relationships between the upper semicontinuity and topological outer
semicontinuity of F at x.
(b) When do the topological and sequential outer semicontinuity agree?
(c) Compare the outer semicontinuity notions from Definition 5.48 with the closed-
graph property of F .

Exercise 5.113 Let X be a Banach space.


(a) Prove that X is an Asplund space if and only if any convex continuous function f : Ω → R defined on a nonempty open convex subset Ω of X is Fréchet differentiable on a dense subset of Ω. Hint: Show that the Fréchet differentiability points of f always form a Gδ subset of Ω.
(b) Show that the Asplund property of X in (a) is equivalent to the Fréchet differ-
entiability of every convex continuous function at some point of X.
(c) Clarify whether it is possible to replace the Fréchet differentiability by the
Gâteaux one in (a) and (b) to characterize weak Asplund spaces.

Exercise 5.114 Recall that two norms ‖·‖1 and ‖·‖2 are equivalent on a vector space X if there exist positive constants α and β such that

  α‖x‖2 ≤ ‖x‖1 ≤ β‖x‖2 for all x ∈ X.    (5.50)


(a) Prove that if a Banach space X admits an equivalent norm that is Gâteaux
differentiable at all nonzero points, then for every open convex subset ∅ = U ⊂
X and every continuous convex function f : U → R there exists a dense set of
points in U where f is Gâteaux differentiable. Hint: Use variational arguments
and compare with the proof of [290, Corollary 4.16].
(b) Show that the classical spaces ℓ∞ and L∞[0, 1] do not admit equivalent Gâteaux
differentiable norms. Hint: Use the result of (a) and the fact that the norms of
these spaces are nowhere Gâteaux differentiable.
(c) Give an example of a convex continuous function f defined on a nonseparable
Hilbert space X such that the collection of all the Gâteaux differentiability
points of f is not a Gδ subset of X.

Exercise 5.115 Let X be a Banach space. Prove that X is Asplund if and only
if every separable subspace of X has a separable dual. Hint: Verify first that a
separable Banach space X is Asplund if and only if X ∗ is separable. Compare with
the proofs of [105, Theorem 5.7] and [290, Theorem 2.34].

Exercise 5.116 Let X and Y be Asplund spaces.


(a) Prove that the product space X × Y is Asplund.
(b) Verify that the dual unit ball B∗ ⊂ X∗ is weak∗ sequentially compact.
Hint: Use the characterization from Exercise 5.115 in both cases (a) and (b).

Exercise 5.117 Prove the following:


(a) Any Banach space X admitting an equivalent norm in the sense of (5.50) that is
Fréchet differentiable at all nonzero points is Asplund.
(b) Any reflexive Banach space is Asplund. Does it follow from (a)?

(c) The space c0 of numerical sequences converging to zero is Asplund, while the
spaces ℓ1, ℓ∞, and C[0, 1] are not.
Hint: Use the Asplund space characterization from Exercise 5.115 for the proofs of
all the assertions in (a)–(c).

Exercise 5.118 Verify the equality statements in Theorems 5.68 and 5.69.

Exercise 5.119 Verify that for any matrices A, B ∈ Sn the trace of their product
AB given in (5.38) agrees with definition (5.36).
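For readers who wish to see the identity of Exercise 5.119 in action, here is a quick numerical sanity check. Since (5.36) and (5.38) are not reproduced in this excerpt, the sketch assumes the standard Frobenius-type pairing ⟨A, B⟩ = Σᵢⱼ aᵢⱼbᵢⱼ and confirms that it agrees with tr(AB) for symmetric matrices; this is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
# two random symmetric 4x4 matrices
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2
B = rng.standard_normal((4, 4)); B = (B + B.T) / 2

trace_of_product = float(np.trace(A @ B))   # tr(AB)
entrywise_sum = float(np.sum(A * B))        # sum_{i,j} a_ij * b_ij
assert abs(trace_of_product - entrywise_sum) < 1e-9
```

The identity holds for symmetric matrices because tr(AB) = Σᵢⱼ aᵢⱼ bⱼᵢ and B = Bᵀ.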

Exercise 5.120 Prove that each of the functions F : Sⁿ → R defined below is a
spectral function. Find a function f : Rⁿ → R such that F = f ◦ λ with the following
properties:
(a) F (A) = λmax (A), which is the largest eigenvalue of A.

(b) F(A) = Σᵢ₌₁ⁿ λᵢ(A), the sum of all eigenvalues of A.
(c) F(A) = tr(A⁻¹) if A ≻ 0, and F(A) = ∞ otherwise.
(d) F(A) = λmax(A⁻¹) if A ≻ 0, and F(A) = ∞ otherwise.
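A spectral function F = f ◦ λ depends on a matrix only through its eigenvalues and is therefore invariant under orthogonal similarity transformations A ↦ QᵀAQ. The following sketch checks this invariance numerically for the functions in (a) and (b); it is an illustration only, not part of the formal exercise.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2   # random symmetric matrix
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))     # random orthogonal matrix
B = Q.T @ A @ Q                                      # orthogonally similar to A

lam_max = lambda M: float(np.max(np.linalg.eigvalsh(M)))  # (a) largest eigenvalue
lam_sum = lambda M: float(np.sum(np.linalg.eigvalsh(M)))  # (b) sum = trace

assert np.isclose(lam_max(A), lam_max(B))   # invariance under Q^T A Q
assert np.isclose(lam_sum(A), lam_sum(B))
assert np.isclose(lam_sum(A), np.trace(A))  # the eigenvalue sum equals tr(A)
```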

Exercise 5.121 Consider the function F : Sⁿ → R defined by
F(A) := λmax(A⁻¹) if A ≻ 0, and F(A) := ∞ otherwise.

(a) Find the conjugate function F ∗ .


(b) Find the subdifferential ∂F (A).

Exercise 5.122 Give a detailed proof of the version of von Neumann’s trace
inequality in Theorem 5.81. Hint: Proceed similarly to the proof of Theorem 5.68.
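Although Theorem 5.81 itself is not reproduced in this excerpt, the version of von Neumann's trace inequality for rectangular matrices is commonly stated as tr(ABᵀ) ≤ Σᵢ σᵢ(A)σᵢ(B), with singular values listed in decreasing order. Assuming this is the form meant, the inequality can be probed numerically as follows.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((3, 5))

s_A = np.linalg.svd(A, compute_uv=False)   # singular values, decreasing order
s_B = np.linalg.svd(B, compute_uv=False)

# von Neumann trace inequality: tr(AB^T) <= sum_i sigma_i(A) * sigma_i(B)
assert np.trace(A @ B.T) <= float(np.dot(s_A, s_B)) + 1e-9
```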

Exercise 5.123 Define the function F : Rm×n → R by F (A) := σmax (A) for A ∈
Rm×n and find its subdifferential ∂F (A).

5.9 Commentaries to Chapter 5


Variational methods and results of convex analysis have started with the seminal
theorem on density of support points of closed convex sets in Banach spaces pub-
lished in 1961 in the paper [39] by Errett Bishop (1928–1983) and Robert Phelps
(1926–2013); see Theorem 5.7. The Bishop-Phelps theorem constituted a break-
through departure from the conventional techniques and results of convex analy-
sis in infinite-dimensional spaces that were based on convex separation theorems
under nonempty interior assumptions. The key element of Bishop-Phelps’ proof was
in introducing a certain convex cone in the Banach space in question, associating
this cone with a partial ordering, and then applying transfinite induction via Zorn’s
lemma. This idea was crucial in the subsequent proof of Brøndsted-Rockafellar’s
theorem on subgradient density [64] presented in Theorem 5.10 and in the original
proof of Ekeland’s variational principle formulated in Theorem 5.2.

The latter fundamental result of variational analysis was discovered by Ivar Eke-
land (born in 1944) in 1972 [118]. A complete proof of this result was given in
[119] by following Bishop-Phelps’ path with the usage of Zorn’s lemma. A signifi-
cantly simpler proof of the Ekeland variational principle was suggested by Michael
Crandall as a personal communication to Ekeland and was reproduced in [120]. We
mainly follow this path in the proof of Theorem 5.2. The scope of applications of
the Ekeland variational principle, its equivalents, modifications, and extensions is
enormous. Among less expected applications of this fundamental result and its set-
valued extensions, let us mention those to models of behavioral sciences initiated by
Antoine Soubeyran, where not only the formulation but mainly the proofs of the
variational results play a crucial role in reaching the practical conclusions; see, e.g.,
the papers by Bao et al. [30], Mordukhovich and Soubeyran [259], and the references
therein.
We also refer the reader to the so-called smooth variational principles, notably
to the Borwein-Preiss [50] and Deville-Godefroy-Zizler [105] ones. Detailed relation-
ships between such variational principles for l.s.c. functions and the extremal prin-
ciples for closed sets can be found in the book by Mordukhovich [229, Section 2.3]
with more references and discussions therein. The subdifferential variational princi-
ple of Theorem 5.3, which is derived there as a consequence of Ekeland’s principle
and the subdifferential sum rule, was originally established (and named so) by Mor-
dukhovich and Wang [260] in a nonconvex setting as a characterization of Asplund
spaces by using the approximate extremal principle; see also [229, Subsection 2.3.2].
The convex extremal principle for closed and convex sets in Banach spaces pre-
sented in Theorem 5.5 was obtained in our paper [239], where the proof did not
provide, however, all the details. Both approximate and exact versions of The-
orem 5.5 are significantly different in important aspects from the corresponding
results obtained in Chapter 3 in the topological vector space setting, as well as from
the nonconvex counterparts given in [229, Chapter 2] and the references therein.
Indeed, the topological vector space results in Theorem 3.7 do not contain any
approximate version, which is given nevertheless in [229, Theorem 2.20] for noncon-
vex sets in Asplund spaces. It should be emphasized that the latter result establishes
only necessary conditions for set extremality (with no sufficiency even in the con-
vex case), while Theorem 5.5(b) gives us necessary and sufficient conditions of this
type for extremality of convex sets. Moreover, the Asplund space structure is not
just sufficient for the fulfillment of the approximate extremal principle in [229, The-
orem 2.20] for any closed sets, but also necessary for this property. On the other
hand, Theorem 5.5(b) holds in an arbitrary Banach space. A specific feature of con-
vexity is exploited in Lemma 5.4, which does not require a smooth renorming as in
the proofs of [229, Theorems 2.10 and 2.20].
The exact version of the convex extremal principle in Theorem 5.5(c) yields the
convex set separation in Banach spaces without any nonempty interior assumptions
as in the topological vector space setting of Theorem 3.7; see more discussions in
Remark 2.187 on the SNC characterization for convex sets. Observe that the sequen-
tial normal compactness property is utilized in the proof of Theorem 5.5(c) in general
Banach spaces, with no sequential compactness of the dual ball as for the case of
Asplund spaces in [229, Theorem 2.22].
The notion of ε-subdifferentials (known also as approximate subdifferentials) for
convex functions given in Definition 5.8 was introduced by Brøndsted and Rock-
afellar in [64] who proved there the Brøndsted-Rockafellar density theorem (Theo-
rem 5.10) and established other topological properties of ε-subdifferentials. It has

been realized later on that ε-subgradient mappings for ε > 0 exhibit some better
properties in comparison with the classical case of ε = 0. This made it possible
to use ε-subgradients in constructing efficient numerical algorithms, which were
started from the paper by Bertsekas and Mitter [36]. In contrast to convex sub-
gradient mappings ∂f (·), their ε-subgradient expansions ∂ε f (·) with fixed ε > 0
possess continuity and certain local Lipschitzian properties as was first shown by
Nurminskii [281]; see also Hiriart-Urruty [161] and more discussions in his book with
Lemaréchal [164]. The latter two-volume book, as well as the more recent monograph
by Zălinescu [361], contain a variety of theoretical and algorithmic developments and
many applications of ε-subdifferentials of convex functions.
The exact calculus rules for ε-subdifferentials presented in Subsection 5.2.1 under
the imposed qualification conditions in finite and infinite dimensions are similar
to the corresponding results for the classical subdifferentials, while the asymptotic
ε-subdifferential calculus rules of Subsection 5.2.2 are significantly different since
they do not require any qualification conditions. The results of this type have been
initiated by Hiriart-Urruty and Phelps [166] and then have been largely developed
and applied in the literature; see, e.g., Penot [286], Thibault [334], and Zălinescu
[361] with further references and discussions.
It has been well recognized in mathematics that the classical (Lagrange) mean
value theorem for differentiable functions plays a fundamental, crucial role in many
aspects of real analysis and numerous applications. Note that the proof of the clas-
sical mean value theorem is based on the two optimization results; namely, on the
Weierstrass theorem ensuring that any continuous real function attains its mini-
mum and maximum on compact intervals and on the Fermat stationary rule for
local extrema of differentiable functions. The proofs of the mean value results in
Subsection 5.3.1 are based on the same ideas with the additional usage of the subd-
ifferential chain rule in the device of Theorem 5.27 for convex continuous functions
on topological vector spaces. The approximate mean value results presented in Sub-
section 5.3.2 are highly different in formulations and proofs from the “continuous”
mean value version of the preceding subsection. First of all, they address extended-
real-valued, lower semicontinuous, convex functions and thus can be applied, e.g., to
constrained optimization. As we see, the proof of these results is based on variational
principles replacing the classical Weierstrass existence theorem.
The approximate mean value theorems do not have any preceding counterparts
in convex analysis. Both Theorems 5.29 and 5.30 and their proofs are due to Dariusz
Zagrodny who established them [358] for nonconvex l.s.c. functions on Banach spaces
in terms of the Clarke subdifferential; we’ll discuss this more in Chapter 7.
Mean value theorems for convex functions are instrumental in the study of var-
ious aspects of convex analysis and its applications. In particular, in Section 5.4
we present applications of both continuous and approximate versions of the mean
value theorem to the maximal monotonicity of subgradient mappings associated with
convex functions on topological vector spaces and Banach spaces, respectively. The
main result, Theorem 5.35, is proved by employing Zagrodny’s approximate mean
value theorem taken from Theorem 5.30; cf. Borwein [45]. The fundamental maximal
monotonicity result was first established by Rockafellar [303, 307] by using varia-
tional arguments involving the Brøndsted-Rockafellar theorem. We refer the reader
to the book by Simons [326] for other proofs of Theorem 5.35, with all of them being
based on certain variational techniques. Various results on maximal monotone oper-
ators, their enlargements, and algorithmic applications can be found in the book

by Burachik and Iusem [66]. In the case of Hilbert spaces, a comprehensive study
and numerous applications of monotone operators and subdifferential mappings of
convex analysis are given in the book by Bauschke and Combettes [34].
The notion of Gâteaux derivative was introduced by René Gâteaux (1889–
1914) in [138], while the notion of Fréchet derivative in infinite dimensions was
defined and studied earlier in [135]. Both Fréchet and Gâteaux were students of
Jacques Hadamard (1865–1963) who also introduced his derivative notion in infinite-
dimensional spaces; see Chapter 7 for more details. Note that the Gâteaux derivative,
being a directional derivative, is naturally defined on topological vector spaces. On
the other hand, the Fréchet derivative is an infinite-dimensional extension of the
classical derivative in finite dimensions and is uniform in directions while requiring
in its definition the normed space structure.
The main results on differentiability and generic differentiability presented in
Sections 5.5 and 5.6 are well known; most of them can be found scattered in the
book by Phelps [290], while we present here more elaborations. Example 5.43 is
taken from [141]. Note that the usage of the subdifferential mean value theorem in
the proof of Theorem 5.52 follows the paper by Cuong and Nam [93] and that the
result of Theorem 5.55 of the Fréchet differentiability of conjugates is taken from
Borwein and Vanderwerf [52].
The classes of Asplund and weak Asplund spaces were introduced by Edgar
Asplund in [9] as “strong differentiability” and “weak differentiability” Banach
spaces, respectively; these spaces were renamed in honor of Asplund by Namioka
and Phelps [274] after his death in 1974. Note that Asplund spaces are strongly
related to Fréchet geometric structures of Banach spaces. In particular, it has been
proved by Ekeland and Lebourg [121], by using Ekeland’s variational principle, that
any Banach space admitting an equivalent norm that is Fréchet differentiable at
nonzero points is Asplund. On the other hand, Haydon [154] constructed an exam-
ple showing that an Asplund space may fail to have even the Gâteaux differentiable
norm off the origin.
The class of Asplund spaces constitutes one of the most beautiful objects in
mathematics that has been well investigated in functional analysis and applications.
This class covers, in particular, reflexive Banach spaces and those with separable
duals. Furthermore, Asplund spaces admit a variety of nice characterizations some of
which have been mentioned in the text (see Sections 5.6 and 5.8); their proofs can be
found, e.g., in the books [105, 290]. Besides this, we refer the reader to the excellent
survey by Yost [355], which collects basic facts and proofs from the Asplund space
theory. Observe that, contrary to Asplund spaces, their weak Asplund counterpart
exhibits a modest number of satisfactory results presented in Fabian’s book [126].
The broad usage of Asplund spaces in variational analysis and generalized dif-
ferentiation, with their novel variational characterizations and applications, can be
found in the two-volume book by Mordukhovich [229] with detailed references.
Section 5.7 deals with some functions depending on square matrices and their
eigenvalues. The importance of these issues for a variety of applications, including
algorithmic design, is difficult to overstate. A characteristic feature of such functions
is their intrinsic nonsmoothness, which naturally calls for employing subgradients
and their calculus. Pioneering work in this direction has been done by Michael
Overton; see his paper [283] and subsequent publications with various collaborators
over the years, e.g., [69, 146, 147, 284] and the references therein. Note also the
influential paper by Adrian Lewis [203] on nonsmooth analysis of eigenvalues.

An underlying role in the study of matrix-dependent functions and related problems
of eigenvalue optimization is played by the von Neumann trace inequality discovered
in [346]. This is given in Theorem 5.68, for which we obtain a simplified
proof. The subdifferential study of the spectral and singular functions conducted
in Section 5.7 follows the papers by Lewis and Sendov [204, 205] who also covered
nonconvex settings. Being concentrated on the convex case, we present conjugate
relations and the computation formulas for ε-subgradients (first obtained by Seeger
for spectral functions) by using general results of ε-subdifferential calculus for gen-
eral convex functions developed in Section 5.2.
6
MISCELLANEOUS TOPICS ON
CONVEXITY

This chapter deals with certain miscellaneous topics of convex analysis in
infinite-dimensional and finite-dimensional spaces. Most of the presented
developments are not directly related to generalized differentiation, although
some results employ subgradients. We mainly concentrate here on strong con-
vexity and related monotonicity of subgradient mappings, which are partic-
ularly important for Nesterov’s smoothing techniques in numerical optimiza-
tion; on the study of asymptotic behavior of convex sets and functions at
infinity; on the considerations of the remarkable classes of signed distance and
minimal time functions; and on classical finite-dimensional results related to
the Carathéodory and Helly theorems, Farkas lemma, etc.

6.1 Strong Convexity and Strong Smoothness


This section collects major facts on strong convexity and strong smoothness
of functions defined on normed spaces. These results are of undoubted interest
for their own sake while being particularly important for applications.

6.1.1 Basic Definitions and Relationships

We start by introducing the following basic notions.

Definition 6.1 Let f : X → R be an extended-real-valued function on a
normed space X, and let Ω ⊂ X be a nonempty convex subset of X.
(a) We say that f is strongly convex on Ω if there exists a constant σ > 0
such that

f(λx + (1 − λ)y) + σλ(1 − λ)‖x − y‖²/2 ≤ λf(x) + (1 − λ)f(y)   (6.1)
for all x, y ∈ Ω and all λ ∈ (0, 1). When σ is prescribed a priori, f is
called σ-strongly convex, or strongly convex on Ω with modulus σ.

© Springer Nature Switzerland AG 2022 381


B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9 6

(b) We say that f is strongly smooth on Ω if there exists γ > 0 such that

f(λx + (1 − λ)y) + γλ(1 − λ)‖x − y‖²/2 ≥ λf(x) + (1 − λ)f(y)
for all x, y ∈ Ω and all λ ∈ (0, 1). When γ is prescribed a priori, f is
called γ-strongly smooth, or strongly smooth on Ω with modulus γ.
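As a concrete illustration of Definition 6.1(a), the function f(x) = ‖x‖² on Rⁿ is 2-strongly convex, and in fact (6.1) holds with equality for σ = 2. The following sketch samples inequality (6.1) at random points; it is a numerical sanity check, not part of the formal development.

```python
import numpy as np

def f(x):                        # f(x) = ||x||^2, which is 2-strongly convex
    return float(np.dot(x, x))

sigma = 2.0
rng = np.random.default_rng(3)
worst_gap = -np.inf
for _ in range(200):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lam = rng.uniform(0.01, 0.99)
    # left- and right-hand sides of inequality (6.1); note f(x - y) = ||x - y||^2
    lhs = f(lam * x + (1 - lam) * y) + sigma * lam * (1 - lam) * f(x - y) / 2
    rhs = lam * f(x) + (1 - lam) * f(y)
    worst_gap = max(worst_gap, lhs - rhs)
assert worst_gap <= 1e-9         # (6.1) holds at every sampled point
```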

The next assertion characterizes the strong convexity of a function via the
usual convexity of its quadratic shift.

Proposition 6.2 Let f : X → R be given on an inner product space X, and
let σ > 0. Then f is σ-strongly convex on X if and only if the quadratically
σ-shifted function g : X → R defined by

g(x) := f(x) − (σ/2)‖x‖² for all x ∈ X
is convex on the space in question.

Proof. Due to the inner product structure of the space X, for any x, y ∈ X
and λ ∈ (0, 1) we can easily check the identity

σλ(1 − λ)‖x − y‖²/2 = (σ/2)(λ‖x‖² + (1 − λ)‖y‖² − ‖λx + (1 − λ)y‖²).

This implies that definition (6.1) admits the rearrangement

f(λx + (1 − λ)y) − (σ/2)‖λx + (1 − λ)y‖² ≤ λ(f(x) − (σ/2)‖x‖²) + (1 − λ)(f(y) − (σ/2)‖y‖²),

which clearly verifies the claimed equivalence. □
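The key identity used in the proof of Proposition 6.2 can be confirmed numerically; the sketch below evaluates both sides at random data in Rⁿ with the Euclidean inner product.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.7                      # arbitrary modulus for the check
x, y = rng.standard_normal(5), rng.standard_normal(5)
lam = 0.3

# identity: sigma*lam*(1-lam)*||x-y||^2/2
#         = (sigma/2)*(lam*||x||^2 + (1-lam)*||y||^2 - ||lam*x+(1-lam)*y||^2)
lhs = sigma * lam * (1 - lam) * np.dot(x - y, x - y) / 2
z = lam * x + (1 - lam) * y
rhs = sigma / 2 * (lam * np.dot(x, x) + (1 - lam) * np.dot(y, y) - np.dot(z, z))
assert np.isclose(lhs, rhs)
```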

Now we present an interesting example, which illustrates the strong convexity
of quadratic forms on inner product spaces and, in particular, of positive
definite symmetric matrices in terms of their eigenvalues.

Example 6.3 Let X be an inner product space, and let A : X → X be a
self-adjoint operator, i.e., such that

⟨Ax, y⟩ = ⟨x, Ay⟩ for all x, y ∈ X.

Suppose that there exists σ > 0 with the property

σ‖x‖² ≤ ⟨Ax, x⟩ whenever x ∈ X.

Then the function f : X → R defined by

f(x) := (1/2)⟨Ax, x⟩ for x ∈ X

is strongly convex on X with modulus σ. To check this, we get by direct
calculation that

λf(x) + (1 − λ)f(y) − f(λx + (1 − λ)y)
= (λ/2)⟨Ax, x⟩ + ((1 − λ)/2)⟨Ay, y⟩ − (1/2)⟨A(λx + (1 − λ)y), λx + (1 − λ)y⟩
= (λ(1 − λ)/2)⟨A(x − y), x − y⟩ ≥ σλ(1 − λ)‖x − y‖²/2.

It follows from Definition 6.1 that f is strongly convex with modulus σ.
As a particular case, let A ∈ Rⁿˣⁿ be a symmetric matrix. Then it has an
orthonormal set of eigenvectors u1, . . . , un corresponding to the eigenvalues
λ1 ≥ λ2 ≥ · · · ≥ λn, which gives us the estimate

λn‖x‖² ≤ ⟨Ax, x⟩ for all x ∈ Rⁿ.

In the case where λn > 0, the function f : Rⁿ → R defined by f(x) := (1/2)⟨Ax, x⟩
for x ∈ Rⁿ is strongly convex; in this case, the matrix A is positive definite.
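The finite-dimensional case of Example 6.3 can be tested directly: for a symmetric positive definite matrix A, the smallest eigenvalue serves as a strong convexity modulus of f(x) = (1/2)⟨Ax, x⟩. A minimal numerical sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)                        # symmetric positive definite
sigma = float(np.min(np.linalg.eigvalsh(A)))   # smallest eigenvalue lambda_n

assert sigma > 0                               # A is positive definite
for _ in range(100):
    x = rng.standard_normal(4)
    # the defining estimate: sigma*||x||^2 <= <Ax, x>
    assert sigma * np.dot(x, x) <= np.dot(A @ x, x) + 1e-9
```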

The next result is a direct consequence of Proposition 6.2.

Corollary 6.4 Let f : Ω → R be a convex function defined on a nonempty,
open, and convex subset Ω ⊂ Rⁿ. Suppose that f is twice continuously
differentiable on Ω. Then f is σ-strongly convex on Ω if and only if

σ‖d‖² ≤ ⟨∇²f(x)d, d⟩ for all x ∈ Ω and d ∈ Rⁿ. (6.2)

Proof. Consider the function g(x) := f(x) − (σ/2)‖x‖² for x ∈ Ω, the convexity
of which is equivalent, by Proposition 6.2, to the σ-strong convexity of f
on Ω. Since ∇2 g(x) = ∇2 f (x) − σIn , the claimed characterization (6.2) of
the σ-strong convexity of f follows from the characterization of convexity for
C 2 -smooth functions in Theorem 2.118. 
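Criterion (6.2) is easy to probe numerically for a smooth non-quadratic example. The function f(x) = ‖x‖²/2 + exp(x₁) below is our own illustrative choice, not taken from the text; its Hessian is I plus a rank-one nonnegative term, so it is 1-strongly convex on Rⁿ.

```python
import numpy as np

def hess(x):                     # Hessian of f(x) = ||x||^2/2 + exp(x[0])
    H = np.eye(len(x))
    H[0, 0] += np.exp(x[0])      # rank-one nonnegative correction
    return H

sigma = 1.0                      # claimed strong-convexity modulus
rng = np.random.default_rng(6)
min_ratio = np.inf
for _ in range(100):
    x, d = rng.standard_normal(3), rng.standard_normal(3)
    # Rayleigh quotient <H(x)d, d> / ||d||^2 must stay >= sigma by (6.2)
    min_ratio = min(min_ratio, np.dot(hess(x) @ d, d) / np.dot(d, d))
assert min_ratio >= sigma - 1e-9
```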

Now we show that strong convexity of functions is actually equivalent to
strong smoothness of their conjugates with the precise reciprocal relationships
between the corresponding moduli.

Theorem 6.5 Let X be a normed space, and let f : X → R be a proper
function. The following assertions hold:
(a) If f : X → R is σ-strongly convex on X with some σ > 0, then the
conjugate function f ∗ is 1/σ-strongly smooth on X ∗ .
(b) If f is convex and strongly smooth on X with some modulus γ > 0, then
the conjugate function f ∗ is strongly convex on X ∗ with modulus 1/γ.

Proof. It follows from Definition 6.1(a) that whenever λ ∈ (0, 1) we have

f(λx + (1 − λ)u) + σλ(1 − λ)‖x − u‖²/2 ≤ λf(x) + (1 − λ)f(u) for all x, u ∈ X.

Fix now x∗, u∗ ∈ X∗ and λ ∈ (0, 1), and then show that

f∗(λx∗ + (1 − λ)u∗) + λ(1 − λ)‖x∗ − u∗‖²/(2σ) ≥ λf∗(x∗) + (1 − λ)f∗(u∗). (6.3)
Since f is proper, we have −∞ < f ∗ (x∗ ) and −∞ < f ∗ (u∗ ). Then for any
numbers η1 , η2 ∈ R with
η1 < f ∗ (x∗ ) and η2 < f ∗ (u∗ )
we find vectors x, u ∈ X satisfying the conditions
η1 < ⟨x∗, x⟩ − f(x) and η2 < ⟨u∗, u⟩ − f(u).
Defining further zλ := λx + (1 − λ)u and zλ∗ := λx∗ + (1 − λ)u∗ , observe that
λ⟨x∗, x⟩ + (1 − λ)⟨u∗, u⟩ = ⟨zλ∗, zλ⟩ + λ(1 − λ)⟨x∗ − u∗, x − u⟩. (6.4)
This leads us to the relationships

λη1 + (1 − λ)η2 < λ⟨x∗, x⟩ + (1 − λ)⟨u∗, u⟩ − λf(x) − (1 − λ)f(u)
≤ λ⟨x∗, x⟩ + (1 − λ)⟨u∗, u⟩ − σλ(1 − λ)‖x − u‖²/2 − f(zλ)
= ⟨zλ∗, zλ⟩ − f(zλ) + λ(1 − λ)(⟨x∗ − u∗, x − u⟩ − σ‖x − u‖²/2).
The Fenchel-Young inequality (Proposition 4.9), combined with the elementary
estimate ⟨x∗ − u∗, x − u⟩ − σ‖x − u‖²/2 ≤ ‖x∗ − u∗‖²/(2σ), yields

λη1 + (1 − λ)η2 < ⟨zλ∗, zλ⟩ − f(zλ) + λ(1 − λ)(⟨x∗ − u∗, x − u⟩ − σ‖x − u‖²/2)
≤ f∗(λx∗ + (1 − λ)u∗) + λ(1 − λ)‖x∗ − u∗‖²/(2σ).

Letting η1 → f ∗ (x∗ ) and η2 → f ∗ (u∗ ), we get (6.3) and thus verify (a).
To continue with the proof of (b), observe from Definition 6.1(b) that

f(λx + (1 − λ)u) + γλ(1 − λ)‖x − u‖²/2 ≥ λf(x) + (1 − λ)f(u) for all x, u ∈ X
whenever λ ∈ (0, 1). Fix any pairs x, u ∈ X and x∗, u∗ ∈ X∗. The aforementioned
Fenchel-Young inequality tells us that

f∗(x∗) ≥ ⟨x∗, x⟩ − f(x) and f∗(u∗) ≥ ⟨u∗, u⟩ − f(u).
Then for any λ ∈ (0, 1), we have the relationships

λf∗(x∗) + (1 − λ)f∗(u∗) ≥ λ(⟨x∗, x⟩ − f(x)) + (1 − λ)(⟨u∗, u⟩ − f(u))
= λ⟨x∗, x⟩ + (1 − λ)⟨u∗, u⟩ − λf(x) − (1 − λ)f(u)
≥ λ⟨x∗, x⟩ + (1 − λ)⟨u∗, u⟩ − γλ(1 − λ)‖x − u‖²/2 − f(λx + (1 − λ)u).

Denoting zλ∗ := λx∗ + (1 − λ)u∗ and zλ := λx + (1 − λ)u and then taking into
account (6.4) yield in turn the estimate

λf∗(x∗) + (1 − λ)f∗(u∗) ≥ ⟨zλ∗, zλ⟩ − f(zλ) + λ(1 − λ)(⟨x∗ − u∗, x − u⟩ − γ‖x − u‖²/2).
With v := x − u and zλ = u + λv, this gives us the following chain of inequalities:

λf∗(x∗) + (1 − λ)f∗(u∗)
≥ sup_{v∈X} sup_{u∈X} [⟨zλ∗, u + λv⟩ − f(u + λv) + λ(1 − λ)(⟨x∗ − u∗, v⟩ − γ‖v‖²/2)]
= f∗(zλ∗) + λ(1 − λ) sup_{v∈X} (⟨x∗ − u∗, v⟩ − γ‖v‖²/2)
= f∗(λx∗ + (1 − λ)u∗) + (λ(1 − λ)/γ) ‖x∗ − u∗‖²/2,
which verifies the 1/γ-strong convexity of f ∗ on X ∗ by Definition 6.1(a). 
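The reciprocal moduli in Theorem 6.5 can be seen in the simplest one-dimensional example f(x) = (σ/2)x², whose conjugate is f∗(y) = y²/(2σ). The sketch below approximates f∗ by maximizing over a fine grid; the grid bounds and tolerance are ad hoc choices for the illustration.

```python
import numpy as np

sigma = 4.0
xs = np.linspace(-50.0, 50.0, 200001)     # fine grid covering the maximizers
fx = sigma / 2 * xs**2                    # sigma-strongly convex on R

def conj(y):                              # grid approximation of f*(y)
    return float(np.max(y * xs - fx))

# compare with the closed form f*(y) = y^2 / (2*sigma)
max_err = max(abs(conj(y) - y**2 / (2 * sigma)) for y in (-3.0, 0.5, 2.0))
assert max_err < 1e-4
```

Since f∗ has modulus 1/σ in Theorem 6.5(a), a large σ (very convex f) yields a very flat, smooth conjugate, and vice versa.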

6.1.2 Strong Convexity/Strong Smoothness via Derivatives

In this subsection, we use Gâteaux and Fréchet derivatives to study strong
convexity and related strong smoothness of functions and their conjugates.
Let us begin with the γ-strong smoothness of functions on normed spaces.

Proposition 6.6 If a proper function f : X → R is γ-strongly smooth on a
normed space X, then dom(f) = X.

Proof. Pick any x ∈ dom(f ) and suppose on the contrary that f (x1 ) = ∞ for
some x1 ∈ X. Choose further x2 ∈ X such that x := (x1 + x2 )/2. Then we
have by the imposed γ-strong smoothness of f that
∞ = (f(x1) + f(x2))/2 ≤ f((x1 + x2)/2) + (γ/8)‖x1 − x2‖² = f(x) + (γ/8)‖x1 − x2‖² < ∞,

which is clearly a contradiction. □

To proceed further, we need the following result on the continuity of
real-valued, l.s.c., and convex functions defined on Banach spaces, which is a
consequence of the Baire category theorem; see Theorem 1.137.

Lemma 6.7 Let f : X → R be a real-valued, l.s.c., and convex function
defined on a Banach space X. Then f is continuous on X.

Proof. For each k ∈ N we form the closed set

Fk := {x ∈ X | f(x) ≤ k}

and observe that X = ⋃_{k=1}^∞ Fk. Then the Baire category theorem tells us
that there exists a number k0 ∈ N such that int(Fk0) ≠ ∅. Taking a ball
B(x; r) ⊂ Fk0, we see that the function f is bounded from above on B(x; r),
and hence by Theorem 2.144 it is continuous on int(dom(f)) = X. □

The next result reveals some remarkable properties of strongly smooth
functions on Banach spaces in the presence of convexity.

Theorem 6.8 Let f : X → R be a proper, l.s.c., and convex function on a
Banach space X. If f is γ-strongly smooth on X with some modulus γ > 0,
then it is Gâteaux differentiable on dom(f) = X and satisfies the following
properties for all x, u ∈ X:

(a) f(x) ≤ f(u) + ⟨f′G(u), x − u⟩ + (γ/2)‖x − u‖².
(b) ⟨f′G(x) − f′G(u), x − u⟩ ≤ γ‖x − u‖².
(c) f(x) ≥ f(u) + ⟨f′G(u), x − u⟩ + (1/(2γ))‖f′G(x) − f′G(u)‖².
(d) ⟨f′G(x) − f′G(u), x − u⟩ ≥ (1/γ)‖f′G(x) − f′G(u)‖².
(e) ‖f′G(x) − f′G(u)‖ ≤ γ‖x − u‖.

Proof. It is shown in Proposition 6.6 that dom(f) = X. Fix x ∈ X and take
any v ∈ X. We have by Definition 6.1(b) that

(f(x1) + f(x2))/2 ≤ f((x1 + x2)/2) + (γ/8)‖x1 − x2‖²

whenever x1, x2 ∈ X. This implies for any t > 0 that

(f(x + tv) + f(x − tv))/2 ≤ f(x) + (γ/8)‖2tv‖²,

which tells us, therefore, that

(f(x + tv) − f(x))/t + (f(x − tv) − f(x))/t ≤ tγ‖v‖².

Recalling that the directional derivative exists for convex functions yields

f′(x; v) + f′(x; −v) = lim_{t↓0} (f(x + tv) − f(x))/t + lim_{t↓0} (f(x − tv) − f(x))/t ≤ 0.

We have 0 = f′(x; v + (−v)) ≤ f′(x; v) + f′(x; −v), and so f′(x; v) =
−f′(x; −v) for all v ∈ X. This implies that the function f′(x; ·) is linear.
Using now Lemma 6.7 and Proposition 4.49 tells us that f′(x; ·) is continuous.
Thus Proposition 5.38 yields the Gâteaux differentiability of f at x.
Let us now check all the properties listed in the theorem. To verify (a), we
get by the γ-strong smoothness of f that

f(tx + (1 − t)u) + γt(1 − t)‖x − u‖²/2 ≥ tf(x) + (1 − t)f(u) for all t ∈ (0, 1),

which readily yields the inequality

(f(u + t(x − u)) − f(u))/t + γ(1 − t)‖x − u‖²/2 ≥ f(x) − f(u).
Letting t ↓ 0 therein gives us the conclusion in (a).
To proceed further with checking (b), we get from (a) that

f(x) ≤ f(u) + ⟨f′G(u), x − u⟩ + (γ/2)‖x − u‖²,
f(u) ≤ f(x) + ⟨f′G(x), u − x⟩ + (γ/2)‖x − u‖².

Summing up these two inequalities justifies the claimed conclusion in (b).
Next we verify property (c). Fix any v ∈ X and observe that

f(x + v) ≤ f(x) + ⟨f′G(x), v⟩ + (γ/2)‖v‖²,
⟨f′G(u), x + v − u⟩ ≤ f(x + v) − f(u),

which leads us to the estimate

⟨f′G(u), x + v − u⟩ ≤ f(x) + ⟨f′G(x), v⟩ + (γ/2)‖v‖² − f(u).

Rearranging the terms therein gives us

⟨f′G(u) − f′G(x), v⟩ − (γ/2)‖v‖² + ⟨f′G(u), x − u⟩ ≤ f(x) − f(u).

Taking the supremum over v ∈ X, i.e., passing to the Fenchel
conjugate on the left-hand side, clearly justifies (c).
To proceed with verifying (d), we get from (c) that

f(x) ≥ f(u) + ⟨f′G(u), x − u⟩ + (1/(2γ))‖f′G(x) − f′G(u)‖²,
f(u) ≥ f(x) + ⟨f′G(x), u − x⟩ + (1/(2γ))‖f′G(x) − f′G(u)‖².

Summing up these inequalities and rearranging the terms give us (d). The
last property (e) easily follows from (d), and thus the proof is complete. □
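Property (d), often called co-coercivity, can be sampled numerically for the quadratic f(x) = (1/2)⟨Ax, x⟩ with A positive semidefinite, whose Gâteaux derivative is f′G(x) = Ax and whose smoothness modulus γ is the largest eigenvalue of A. This is an illustrative check, not a proof:

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = M.T @ M                                     # PSD; gradient of f is x -> Ax
gamma = float(np.max(np.linalg.eigvalsh(A)))    # Lipschitz modulus of gradient

min_slack = np.inf
for _ in range(100):
    x, u = rng.standard_normal(4), rng.standard_normal(4)
    g = A @ (x - u)                             # f'_G(x) - f'_G(u)
    # property (d): <g, x - u> >= ||g||^2 / gamma
    min_slack = min(min_slack, np.dot(g, x - u) - np.dot(g, g) / gamma)
assert min_slack >= -1e-9
```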

As a consequence of Theorem 6.8, we now derive a characterization of
γ-strong smoothness of convex functions via Fréchet differentiability.

Corollary 6.9 Let X be a Banach space, and let f : X → R be a proper,
l.s.c., and convex function. Then the following properties are equivalent:
(a) f is γ-strongly smooth with some γ > 0.

(b) dom(f) = X, f is Fréchet differentiable on X, and

‖∇f(x) − ∇f(u)‖ ≤ γ‖x − u‖ for all x, u ∈ X.

Proof. To verify implication (a)=⇒(b), we deduce from Theorem 6.8 that f
is Gâteaux differentiable on dom(f) = X and satisfies

‖f′G(x) − f′G(u)‖ ≤ γ‖x − u‖ for all x, u ∈ X.

This justifies (b) by applying Theorem 5.52.
To prove the converse implication (b)=⇒(a), fix any x, u ∈ X and t ∈
(0, 1). Denoting zt := tx + (1 − t)u and ϕ(t) := f(zt) = f(tx + (1 − t)u), we
get ϕ(0) = f(u), ϕ(1) = f(x), and

f(x) − f(u) − ⟨∇f(u), x − u⟩ = ∫₀¹ ϕ′(t) dt − ⟨∇f(u), x − u⟩
= ∫₀¹ ⟨∇f(zt) − ∇f(u), x − u⟩ dt
≤ ∫₀¹ ‖∇f(zt) − ∇f(u)‖ · ‖x − u‖ dt
≤ ∫₀¹ γt‖x − u‖² dt = (γ/2)‖x − u‖².
This readily yields the estimates

f(x) ≤ f(zt) + ⟨∇f(zt), (1 − t)(x − u)⟩ + (γ/2)(1 − t)²‖x − u‖²,
f(u) ≤ f(zt) − ⟨∇f(zt), t(x − u)⟩ + (γ/2)t²‖x − u‖².

Multiplying the first inequality above by t, the second one by (1 − t), and
summing up the resulting inequalities give us (a). □
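To illustrate Corollary 6.9 on the real line, take the softplus function f(x) = log(1 + eˣ); its derivative is the sigmoid, whose Lipschitz constant is 1/4, so f should be γ-strongly smooth with γ = 1/4. The following sketch samples Definition 6.1(b) at random points; the sampling range is an arbitrary choice for the illustration.

```python
import numpy as np

def f(x):                      # softplus; f'' (x) = s(x)(1 - s(x)) <= 1/4
    return float(np.logaddexp(0.0, x))

gamma = 0.25                   # Lipschitz constant of f'
rng = np.random.default_rng(8)
worst = -np.inf
for _ in range(300):
    x, y = rng.uniform(-10, 10, 2)
    lam = rng.uniform(0.01, 0.99)
    # strong smoothness: f(combo) + gamma*lam*(1-lam)*(x-y)^2/2 >= lam*f(x)+(1-lam)*f(y)
    lhs = f(lam * x + (1 - lam) * y) + gamma * lam * (1 - lam) * (x - y)**2 / 2
    worst = max(worst, lam * f(x) + (1 - lam) * f(y) - lhs)
assert worst <= 1e-9           # Definition 6.1(b) holds with gamma = 1/4
```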

The next result provides a characterization of strong convexity for Gâteaux
differentiable functions on normed spaces.

Theorem 6.10 Let f : X → R be a real-valued convex function defined on
a normed space X. If f is Gâteaux differentiable on X, then we have the
following equivalences:

(a) f is σ-strongly convex on X.
(b) f(x) ≥ f(u) + ⟨f′G(u), x − u⟩ + (σ/2)‖x − u‖² for all x, u ∈ X.
(c) σ‖x − u‖² ≤ ⟨f′G(x) − f′G(u), x − u⟩ for all x, u ∈ X.

Proof. To verify (a)=⇒(b), suppose that f is σ-strongly convex on X and fix


any x, u ∈ X and t ∈ (0, 1). Then it follows that
  x − u2   x − u2
f tx + (1 − t)u + σt(1 − t) = f u + t(x − u) + σt(1 − t)
2 2
≤ tf (x) + (1 − t)f (u),

which implies therefore that

(f(u + t(x − u)) − f(u))/t + (σ/2)(1 − t)‖x − u‖² ≤ f(x) − f(u).

Letting t ↓ 0 therein justifies (b).
To check (b)=⇒(c), fix any x, u ∈ X and deduce from (b) that

f(x) ≥ f(u) + ⟨f′G(u), x − u⟩ + (σ/2)‖x − u‖²,
f(u) ≥ f(x) + ⟨f′G(x), u − x⟩ + (σ/2)‖x − u‖².

Summing up these inequalities and rearranging the terms justify (c).
To proceed with the verification of the opposite implication (c)=⇒(b), fix
x, u ∈ X and define the composite function

ϕ(t) := f(tx + (1 − t)u) for t ∈ [0, 1].

Then ϕ is differentiable with ϕ′(t) = ⟨f′G(tx + (1 − t)u), x − u⟩ on (0, 1). Thus

f(x) − f(u) = ϕ(1) − ϕ(0) = ∫₀¹ ϕ′(t) dt = ∫₀¹ ⟨f′G(tx + (1 − t)u), x − u⟩ dt.

Note that ϕ′ is monotone increasing, and so it is Riemann integrable on [0, 1].
Employing the fundamental theorem of calculus together with (c) applied to
the points tx + (1 − t)u and u, we arrive at

f(x) − f(u) − ⟨f′G(u), x − u⟩
= ∫₀¹ [⟨f′G(tx + (1 − t)u), x − u⟩ − ⟨f′G(u), x − u⟩] dt
≥ ∫₀¹ σt‖x − u‖² dt = (σ/2)‖x − u‖²,

which justifies (b).
It remains to verify implication (b)=⇒(a). Fixing x, u ∈ X and t ∈ (0, 1),
denote zt := tx + (1 − t)u and then get

f(x) ≥ f(zt) + ⟨f′G(zt), (1 − t)(x − u)⟩ + (σ/2)(1 − t)²‖x − u‖²,
f(u) ≥ f(zt) − ⟨f′G(zt), t(x − u)⟩ + (σ/2)t²‖x − u‖².

Multiplying the first inequality by t, the second one by (1 − t), and summing
up the obtained inequalities give us (a) and thus complete the proof. □
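Condition (c) of Theorem 6.10 is convenient for numerical testing. For the illustrative function f(x) = ‖x‖²/2 + log(1 + e^{x₁}) on R³ (our own example, not from the text), the quadratic term already yields the modulus σ = 1, since the softplus part has a monotone derivative:

```python
import numpy as np

def grad(x):                   # gradient of f(x) = ||x||^2/2 + softplus(x[0])
    g = x.copy()
    g[0] += 1.0 / (1.0 + np.exp(-x[0]))   # sigmoid, a monotone function
    return g

sigma = 1.0                    # modulus coming from the ||x||^2/2 term
rng = np.random.default_rng(9)
min_slack = np.inf
for _ in range(200):
    x, u = rng.standard_normal(3), rng.standard_normal(3)
    # condition (c): sigma*||x-u||^2 <= <grad(x) - grad(u), x - u>
    slack = np.dot(grad(x) - grad(u), x - u) - sigma * np.dot(x - u, x - u)
    min_slack = min(min_slack, slack)
assert min_slack >= -1e-9
```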

The next result of this subsection provides a characterization of strong
convexity of functions on Banach spaces via Fréchet differentiability of their
Fenchel conjugates together with a subgradient inequality.

Theorem 6.11 Let X be a Banach space, and let f : X → R be a proper,


l.s.c., and convex function. Then the following properties are equivalent:
(a) f is strongly convex on X with modulus σ > 0.

(b) dom(f*) = X* and f* is Fréchet differentiable on X* with

‖∇f*(x*) − ∇f*(u*)‖ ≤ (1/σ)‖x* − u*‖ whenever x*, u* ∈ X*. (6.5)
Proof. Starting with the verification of (a)=⇒(b), suppose that f is strongly
convex with modulus σ > 0. Then Theorem 6.5(a) tells us that the Fenchel
conjugate f ∗ is 1/σ-strongly smooth. It follows from Corollary 6.9 that
dom(f ∗ ) = X ∗ , f ∗ is Fréchet differentiable, and estimate (6.5) holds.
Next we check implication (b)=⇒(a). Assuming (b), it follows from Corol-
lary 6.9 that f ∗ is 1/σ-strongly smooth on X ∗ . Then Theorem 6.5(b) tells us
that f is σ-strongly convex on X. 

Remark 6.12 In the proof (a)=⇒(b) of Theorem 6.11, we use the fact that
f ∗ is proper, convex, and l.s.c. on X ∗ equipped with the strong topology. We
also use the obvious inclusion X ⊂ X ∗∗ and the equality (f ∗ )∗ (x) = f (x) for
x ∈ X in the proof (b)=⇒(a).

Let us continue our investigation of the interrelated properties of a function
f : X → R and its (Fenchel) conjugate f* : X* → R with the values

f*(x*) := sup{⟨x*, x⟩ − f(x) | x ∈ X}, x* ∈ X*,

as defined and studied in Chapter 4. Here we consider these issues from different perspectives in comparison with the previous parts of the book. The
next proposition summarizes some properties of Fenchel conjugates under the
following coercivity/growth condition.

Proposition 6.13 Let f : X → R be a proper, l.s.c., and convex function on


a normed space X. Assume that f is coercive in the sense that

lim_{‖x‖→∞} f(x)/‖x‖ = ∞. (6.6)
Then we have that dom(f ∗ ) = X ∗ , and that f ∗ is continuous on the dual
space X ∗ equipped with the norm topology.

Proof. The general assumptions imposed on the function f ensure the existence of v* ∈ X* and η ∈ R for which

η + ⟨v*, x⟩ ≤ f(x) whenever x ∈ X.

Fix any x* ∈ X*. The coercivity condition (6.6) allows us to find ν > 0 such that

‖x‖(‖x*‖ + 1) ≤ f(x) if ‖x‖ ≥ ν.

It follows furthermore that

sup{⟨x*, x⟩ − f(x) | ‖x‖ ≥ ν} ≤ sup{−‖x‖ | ‖x‖ ≥ ν} = −ν.

We also have the inequalities

sup_{‖x‖≤ν} {⟨x*, x⟩ − f(x)} ≤ sup_{‖x‖≤ν} {⟨x*, x⟩ − ⟨v*, x⟩ − η} < ∞,

which tell us that f*(x*) < ∞ and thus dom(f*) = X*.
Since the conjugate function f* is convex and l.s.c. on the Banach space X* equipped with the dual norm topology, we deduce from Lemma 6.7 that f* is continuous on the entire space X*. □

6.2 Derivatives of Conjugates and Nesterov’s Smoothing


In this section, we continue the study of Gâteaux and Fréchet differentiabil-
ity of Fenchel conjugates and provide applications of the obtained results to
Nesterov's powerful smoothing technique, which plays a prominent role in
numerical optimization. These and related techniques are largely used in the
second volume of our book for developing numerical algorithms to solve opti-
mization and facility location problems. We mainly deal in this section with
extended-real-valued functions defined on normed spaces (in some cases on
topological vector spaces), although the major results require the complete-
ness of the spaces in question and further Banach space specifications.

6.2.1 Differentiability of Conjugate Compositions

In this subsection, we study differentiability properties for compositions of


conjugate functions with linear operators. Recall that the norm of a bounded
linear operator A : X → Y between normed spaces is defined by
 
‖A‖ := sup{‖Ax‖ | ‖x‖ ≤ 1}.

It follows from the definition that ‖Ax‖ ≤ ‖A‖·‖x‖ for all x ∈ X. The adjoint
operator A* : Y* → X* of A can be written as A*y* := y* ∘ A for y* ∈ Y*. We
know that ‖A‖ = ‖A*‖ for any bounded linear operator A. Consider further
extended-real-valued functions f : X → R of the type

f(x) := sup{⟨Ax, y⟩ − g(y) | y ∈ Y}, x ∈ X, (6.7)

where g : Y → R is a function defined on a normed space Y , and where


A : X → Y ∗ is a bounded linear operator defined on a normed space X. The
next proposition gives us additional conditions on Y and g that ensure the
Fréchet differentiability of the function f in (6.7).

Proposition 6.14 Let f be taken from (6.7), where g is a proper l.s.c. func-
tion, and where Y is a Banach space. If g is σ-strongly convex with some
modulus σ > 0 on Y, then f is Fréchet differentiable and ∇f is Lipschitz
continuous on X with constant ‖A‖²/σ.

Proof. We get from (6.7) that f can be represented as the composition

f(x) = g*(Ax) for all x ∈ X.

It follows from Theorem 6.11 that the conjugate function g* is Fréchet differentiable and ∇g* is Lipschitz continuous on Y* with constant 1/σ. This
implies that f is Fréchet differentiable on X with the derivative representation

∇f(x) = A*∇g*(Ax) for all x ∈ X,

and thus ∇f is Lipschitz continuous on X with constant ‖A‖²/σ. □
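The composition formula can be illustrated numerically. In the sketch below (assuming NumPy; all names are illustrative), g(y) = (σ/2)‖y‖² is σ-strongly convex with conjugate g*(y*) = ‖y*‖²/(2σ), so f(x) = g*(Ax) has gradient Aᵀ(Ax)/σ, and the Lipschitz modulus of ∇f is exactly ‖A‖²/σ for this quadratic:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
A = rng.normal(size=(4, 3))   # a bounded linear operator A : R^3 -> R^4

# g(y) = (sigma/2)||y||^2 has conjugate g*(y*) = ||y*||^2/(2 sigma),
# so f(x) = g*(Ax) = ||Ax||^2/(2 sigma) with grad f(x) = A^T (Ax)/sigma.
def f(x):
    Ax = A @ x
    return float(Ax @ Ax) / (2.0 * sigma)

def grad_f(x):
    return A.T @ (A @ x) / sigma

# finite-difference check of the composition formula grad f = A* grad g*(A.)
x, eps = rng.normal(size=3), 1e-6
num = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
assert np.allclose(num, grad_f(x), atol=1e-5)

# the Lipschitz modulus of grad f equals ||A||^2 / sigma for this quadratic
assert abs(np.linalg.norm(A.T @ A, 2)/sigma - np.linalg.norm(A, 2)**2/sigma) < 1e-10
```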

Using the obtained result together with the conjugate sum rule, we derive
now efficient conditions ensuring the Fréchet differentiability of the con-
strained version of the function (6.7).

Corollary 6.15 Given a nonempty, closed, and convex set Ω ⊂ Y , consider


the function f_Ω : X → R defined by

f_Ω(x) := sup{⟨Ax, y⟩ − h(y) | y ∈ Ω}, x ∈ X,

where A and X are the same as in the definition of (6.7), where h : Y → R is
a real-valued l.s.c. function, and where Y is a Banach space. If h is σ-strongly
convex, then f_Ω is Fréchet differentiable and its Fréchet derivative ∇f_Ω is
Lipschitz continuous on X with constant ‖A‖²/σ.

Proof. We can obviously represent the function f_Ω in the form

f_Ω(x) = sup{⟨Ax, y⟩ − (h(y) + δ_Ω(y)) | y ∈ Y} = sup{⟨Ax, y⟩ − g(y) | y ∈ Y},

where g : Y → R is defined by g(y) := h(y) + δ_Ω(y) for y ∈ Y. Then g
is a proper and l.s.c. function. It follows from the definition that the strong
convexity of h with modulus σ yields this property of g with the same modulus.
Thus the claimed conclusions follow from Proposition 6.14. □

Remark 6.16 Given a proper convex function f : X → R on a normed space


X, the subdifferential mapping ∂f : X ⇉ X* is said to be strongly monotone with modulus σ > 0 (or σ-strongly monotone) if

σ‖x₁ − x₂‖² ≤ ⟨v₁* − v₂*, x₁ − x₂⟩ whenever vᵢ* ∈ ∂f(xᵢ), i = 1, 2.

In particular, this implies that

σ‖x₁ − x₂‖ ≤ ‖v₁* − v₂*‖ whenever vᵢ* ∈ ∂f(xᵢ), i = 1, 2.

It is well understood in convex analysis that the σ-strong convexity of a proper
l.s.c. convex function f : X → R defined on a Banach space X is equivalent
to the σ-strong monotonicity of the subgradient mapping ∂f : X ⇉ X*; see
Exercise 6.89. Thus all the results concerning strong convexity of convex functions obtained in Section 6.1 are equally applied to the strong monotonicity
of their subgradient mappings.

6.2.2 Nesterov’s Smoothing Techniques

We are now ready to describe a powerful smooth approximation technique


known as the Nesterov smoothing. It was suggested by Yurii Nesterov in [278]
in finite-dimensional spaces with strong applications to numerical algorithms
of nonsmooth optimization. Consider extended-real-valued functions f : X →
R of the type

f(x) := sup{⟨Ax, y⟩ − ϕ(y) | y ∈ Y}, x ∈ X, (6.8)

where ϕ : Y → R is a function defined on a normed space Y , and where


A : X → Y ∗ is a bounded linear operator defined on a normed space X.
Consider another function p : Y → R called the prox-function for f below.

Definition 6.17 We say that p : Y → R is a prox-function for f from


(6.8) if the following conditions hold:
(a) p is proper, l.s.c., and σ-strongly convex with some constant σ > 0.
(b) dom(ϕ) ⊂ dom(p).
(c) p(y) ≥ 0 for all y ∈ dom(ϕ).
(d) sup_{y∈dom(ϕ)} p(y) < ∞.

Given μ > 0, define the μ-approximation function f_μ for (6.8) by

f_μ(x) := sup{⟨Ax, y⟩ − ϕ(y) − μp(y) | y ∈ Y}, x ∈ X. (6.9)
The following theorem provides precise conditions under which the func-
tions fμ taken from (6.9) for μ > 0 constitute a family of smooth approxima-
tions of the original nonsmooth one (6.8). Recall that a real-valued function
defined on a normed space is said to be C 1,1 -smooth if it is continuously dif-
ferentiable and its (Fréchet) derivative is Lipschitz continuous.
Theorem 6.18 Let A : X → Y ∗ be a bounded linear operator between a
normed space X and the dual to a Banach space Y , and let ϕ : Y → R be
a proper, l.s.c., and convex function. Consider the function f : X → R from
(6.8), where p is a prox-function for f , and let fμ be taken from (6.9) for some
μ > 0. Then we have:
(a) f_μ is a C^{1,1}-smooth function on X with the uniform Lipschitz constant
of the gradient ∇f_μ given by ‖A‖²/(σμ).
(b) We have the estimate

f_μ(x) ≤ f(x) ≤ f_μ(x) + μM for all x ∈ X with M := sup_{y∈dom(ϕ)} p(y).

Proof. Fix a positive number μ and observe from (6.9) that the function fμ
admits the representation
fμ (x) = (ϕ + μp)∗ (Ax) for all x ∈ X.

It is easy to check that the σ-strong convexity of p clearly yields this property
of g(y) := ϕ(y) + μp(y) for y ∈ Y with modulus σμ. Thus assertion (a) follows
from Proposition 6.14.
Since p(y) ≥ 0 for all y ∈ Y and dom(ϕ) ⊂ dom(p), it holds that
  
fμ (x) = sup Ax, y − ϕ(y) + μp(y) | y ∈ Y 
= sup Ax, y − ϕ(y) + μp(y) | y ∈ dom(ϕ)∩ dom(p)
= sup Ax, y − ϕ(y) + μp(y) | y∈ dom(ϕ)
≤ sup Ax, y − ϕ(y)| y ∈ dom(ϕ) = f (x).
Furthermore, we get the relationships

f (x) = sup Ax, y − ϕ(y)|
 y ∈ dom(ϕ)
 
≤ sup Ax, y − ϕ(y) + μp(y))  y ∈ dom(ϕ) + μ sup p(y)
y∈dom(ϕ)
= fμ (x) + μM.
Therefore, the claimed estimate is completely verified. 
Next we consider the constrained version of (6.8). Given a nonempty set
Ω ⊂ Y, consider the function f_Ω : X → R defined by

f_Ω(x) := sup{⟨Ax, y⟩ − φ(y) | y ∈ Ω}, x ∈ X, (6.10)

where φ : Y → R is a real-valued function defined on a normed space Y, and
where A : X → Y* is a bounded linear operator defined on a normed space X.
Consider a function p : Y → R and then, given μ > 0, define the μ-approximation function f_μ for (6.10) by

f_μ(x) := sup{⟨Ax, y⟩ − φ(y) − μp(y) | y ∈ Ω}, x ∈ X. (6.11)
Theorem 6.19 Let A : X → Y ∗ be a bounded linear operator between a
normed space X and the dual to a Banach space Y , let φ : Y → R be a
real-valued l.s.c. convex function, and let Ω ⊂ Y be a nonempty closed convex
set. Consider the function f : X → R from (6.10), and let fμ be taken from
(6.11) for some μ > 0. Suppose that
(a) p is proper, l.s.c., and σ-strongly convex on Ω with some constant σ > 0.
(b) Ω ⊂ dom(p).
(c) p(y) ≥ 0 for all y ∈ Ω.
(d) sup_{y∈Ω} p(y) < ∞.

Then we have:
(a) f_μ is a C^{1,1}-smooth function on X with the uniform Lipschitz constant
of the gradient ∇f_μ given by ‖A‖²/(σμ).
(b) We have the estimate

f_μ(x) ≤ f_Ω(x) ≤ f_μ(x) + μM for all x ∈ X with M := sup_{y∈Ω} p(y).

Proof. Observe that the function f_Ω defined in (6.10) is a particular case of
(6.8), where ϕ(y) := φ(y) + δ_Ω(y) for y ∈ Y. Then ϕ is proper, l.s.c., and
convex with dom(ϕ) = Ω, and conditions (a)–(d) ensure that p is a prox-function for f_Ω. Thus the conclusions follow directly from Theorem 6.18. □

Now we present an effective consequence of Theorem 6.19 and then illus-


trate the smoothing procedure by typical examples.
Corollary 6.20 Assume that in the setting of Theorem 6.19, the space Y is a
Hilbert space. Consider the function (6.11) with p(y) := ½‖y − y₀‖² for y ∈ Y,
where y₀ ∈ Ω and Ω ⊂ Y is bounded. Then:
(a) The function f_μ given by (6.11) is Fréchet differentiable, and its gradient
is Lipschitz continuous on X with Lipschitz constant ‖A‖²/μ.
(b) We have the estimate

f_μ(x) ≤ f_Ω(x) ≤ f_μ(x) + (μ/2)M(y₀; Ω)² for all x ∈ X,

where M(y₀; Ω) := sup{‖y₀ − y‖ | y ∈ Ω} < ∞.
If in particular φ := ⟨b, ·⟩ with some b ∈ Y, then f_μ is represented via the
distance to Ω by

f_μ(x) = ‖Ax − b‖²/(2μ) + ⟨Ax − b, y₀⟩ − (μ/2) d(y₀ + (Ax − b)/μ; Ω)².

Furthermore, f_μ is Fréchet differentiable on X with the derivative ∇f_μ(x) =
A*u_μ(x), where u_μ can be expressed in terms of the Euclidean projection

u_μ(x) := Π(y₀ + (Ax − b)/μ; Ω),

and where the derivative ∇f_μ is Lipschitz continuous with constant μ⁻¹‖A‖².

Proof. The conclusions follow directly from Theorem 6.19 with the observa-
tion that the prox-function p(y) is σ-strongly convex on Y with σ = 1. 

The first example concerns the simplest nonsmooth convex function on R.

Example 6.21 Consider the function f(x) := |x|, which can be represented
as f(x) = sup{xy | |y| ≤ 1} on R. Using the prox-function p(y) := y²/2 for
y ∈ R gives us the family of smooth approximations

f_μ(x) = x²/(2μ) if |x| ≤ μ, and f_μ(x) = |x| − μ/2 if |x| > μ

for all μ > 0, which is depicted in Figure 6.1(a). Consider another choice of
the prox-function given by

Fig. 6.1. Nesterov’s smoothing for f (x) := |x|. Case (a)

 
p(y) := 1 − √(1 − y²) if |y| ≤ 1, and p(y) := ∞ if |y| > 1,

and observe that p is l.s.c. on R with p(y) ≥ y²/2 for all y ∈ [−1, 1]. Then we
have f_μ(x) = √(x² + μ²) − μ for x ∈ R and μ > 0; see Figure 6.2(b).

Fig. 6.2. Nesterov’s smoothing for f (x) := |x|. Case (b)
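Both closed-form approximations in Example 6.21 can be verified against a brute-force evaluation of the supremum in (6.9). A minimal sketch, assuming NumPy; the grid search only approximates the exact supremum, hence the tolerances:

```python
import numpy as np

def f_mu_huber(x, mu):
    # smoothing of |x| from the prox-function p(y) = y^2/2 (a Huber-type function)
    return x*x/(2*mu) if abs(x) <= mu else abs(x) - mu/2

def f_mu_sqrt(x, mu):
    # smoothing of |x| from the prox-function p(y) = 1 - sqrt(1 - y^2)
    return (x*x + mu*mu) ** 0.5 - mu

# both closed forms realize the sup in (6.9); compare against a grid search in y
y = np.linspace(-1.0, 1.0, 200001)
for mu in (0.1, 0.5):
    for x in (-2.0, -0.3, 0.0, 0.7, 1.5):
        sup1 = np.max(x*y - mu*y**2/2)
        sup2 = np.max(x*y - mu*(1.0 - np.sqrt(1.0 - y**2)))
        assert abs(sup1 - f_mu_huber(x, mu)) < 1e-6
        assert abs(sup2 - f_mu_sqrt(x, mu)) < 1e-6
        # estimate of Theorem 6.18(b): f_mu <= f <= f_mu + mu * sup p
        assert f_mu_huber(x, mu) - 1e-12 <= abs(x) <= f_mu_huber(x, mu) + mu/2 + 1e-12
```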

The next example addresses the norm function on a Hilbert space.


Example 6.22 Let X be a Hilbert space. Given b ∈ X, consider the function
f(x) := ‖x − b‖ for x ∈ X and represent it by

f(x) = sup{⟨x − b, y⟩ | y ∈ B} = sup{⟨x, y⟩ − ⟨b, y⟩ | y ∈ B}.

Then using Corollary 6.20 with p(y) := ½‖y‖² gives us

f_μ(x) = ‖x − b‖²/(2μ) − (μ/2) d((x − b)/μ; B)².
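For Ω = B (so y₀ = 0 and sup_{y∈B} p(y) = 1/2), the formula of Example 6.22 reduces to the Huber function of r = ‖x − b‖, and Theorem 6.19(b) gives 0 ≤ f − f_μ ≤ μ/2. A numerical sketch assuming NumPy, with d(z; B) = max(‖z‖ − 1, 0):

```python
import numpy as np

def f_mu(x, b, mu):
    # f_mu(x) = ||x - b||^2/(2 mu) - (mu/2) d((x - b)/mu; B)^2 from Example 6.22
    z = (x - b) / mu
    dist_B = max(np.linalg.norm(z) - 1.0, 0.0)   # distance from z to the unit ball
    return float((x - b) @ (x - b)) / (2.0 * mu) - (mu / 2.0) * dist_B**2

rng = np.random.default_rng(2)
b = np.array([1.0, -1.0, 0.5])
for mu in (0.05, 0.5):
    for _ in range(50):
        x = rng.normal(size=3) * 3.0
        r = np.linalg.norm(x - b)
        expected = r**2 / (2.0 * mu) if r <= mu else r - mu / 2.0  # Huber in r
        assert abs(f_mu(x, b, mu) - expected) < 1e-9
        assert -1e-9 <= r - f_mu(x, b, mu) <= mu / 2.0 + 1e-9  # 0 <= f - f_mu <= mu/2
```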
We start the last example of this subsection with the following proposition.

Proposition 6.23 For y = (y₁, …, y_m) ∈ R^m, consider the function

p(y) := Σ_{i=1}^m yᵢ ln(yᵢ) + ln(m) if yᵢ > 0, i = 1, …, m, and Σ_{i=1}^m yᵢ = 1; p(y) := ∞ otherwise.

Then p is strongly convex on R^m with p(y) ≥ 0 for all y ∈ R^m.

Proof. Restricting the above function p to the positive orthant, consider

p₊(y) := Σ_{i=1}^m yᵢ ln(yᵢ) + ln(m) for yᵢ > 0, i = 1, …, m,

and observe that the Hessian ∇²p₊(y) = diag(1/y₁, …, 1/y_m) is positive
definite for such y; on the domain of p we have yᵢ ∈ (0, 1] for all i, and so
∇²p₊(y) ⪰ I there. This yields the strong convexity of p on R^m. The nonnegativity of p on R^m can be checked by applying the method of Lagrange
multipliers; see Exercise 6.94. □

Here is the aforementioned example, which is useful in numerical optimization.
Example 6.24 Consider the function

f(x) := max{x₁, …, xₙ} for x = (x₁, …, xₙ) ∈ Rⁿ,

which can be equivalently represented as

f(x) = sup{x₁y₁ + … + xₙyₙ | y₁, …, yₙ ≥ 0, Σ_{i=1}^n yᵢ = 1}
= sup{⟨x, y⟩ − δ_Ω(y) | y ∈ Rⁿ},

where Ω := {(y₁, …, yₙ) ∈ Rⁿ | y₁, …, yₙ ≥ 0, Σ_{i=1}^n yᵢ = 1}.
Define further the function

p(y) := Σ_{i=1}^n yᵢ ln(yᵢ) + ln(n) if y₁, …, yₙ ≥ 0, and p(y) := ∞ otherwise,

where we use the convention that yᵢ ln(yᵢ) := 0 if yᵢ = 0 for i = 1, …, n. Then
Proposition 6.23 tells us that p is a prox-function for f. Finally, it follows
from Theorem 6.19 and Corollary 6.20 that the functions

f_μ(x) = sup{⟨x, y⟩ − δ_Ω(y) − μp(y) | y ∈ Rⁿ}
= sup{Σ_{i=1}^n xᵢyᵢ − μ Σ_{i=1}^n yᵢ ln(yᵢ) − μ ln(n) | y ∈ Ω}
= μ ln(Σ_{i=1}^n e^{xᵢ/μ}) − μ ln(n) = μ ln((Σ_{i=1}^n e^{xᵢ/μ})/n)

for μ > 0 constitute a family of smooth approximations of the original function f
considered in this example.
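The resulting log-sum-exp approximation and the estimate of Theorem 6.18(b) with M = ln(n) can be checked directly. A sketch assuming NumPy; the shift by max(x) is the standard numerically stable evaluation:

```python
import numpy as np

def f_mu(x, mu):
    # mu * ln( (sum_i e^{x_i/mu}) / n ), evaluated with the usual max-shift
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + mu * np.log(np.exp((x - m) / mu).sum()) - mu * np.log(x.size)

rng = np.random.default_rng(3)
for mu in (1.0, 0.1, 0.01):
    for _ in range(100):
        x = rng.normal(size=5) * 10.0
        fm, fx = f_mu(x, mu), x.max()
        # Theorem 6.18(b) with M = sup p = ln(n): f_mu <= f <= f_mu + mu ln(n)
        assert fm <= fx + 1e-9
        assert fx <= fm + mu * np.log(5) + 1e-9
```

As μ decreases, f_μ approaches the max function uniformly, at the cost of a larger Lipschitz constant ‖A‖²/(σμ) of the gradient.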

The smoothing technique discussed above plays an important role in solving facility location problems formulated in optimization formats; see, e.g.,
[271]. In particular, the smallest enclosing ball problem is formulated as

minimize f(x) := max{d(x; Ωᵢ)² | i = 1, …, m}, x ∈ Rⁿ,

where the nonempty sets Ωᵢ, i = 1, …, m, are closed and convex sets in Rⁿ.
Similar to Example 6.24, we can construct a family of smooth approximations
of f by the functions

f_μ(x) := μ ln((Σ_{i=1}^m e^{d(x;Ωᵢ)²/μ})/m), x ∈ Rⁿ,

which are then efficiently minimized by using accelerated first-order optimization methods of the gradient type.

6.3 Convex Sets and Functions at Infinity


In this section, we first study the behavior of unbounded convex sets at infin-
ity by using their horizon cones. Then we turn to the asymptotic behavior
of functions at infinity while concentrating on the two important classes of
perspective and horizon functions on infinite-dimensional spaces.

6.3.1 Horizon Cones and Unboundedness

The main object of our study is the following notion that allows us to char-
acterize the set boundedness and efficiently deal with unbounded sets.

Definition 6.25 Given a nonempty subset Ω of a vector space X and a point


x ∈ Ω, the horizon cone of Ω at x is defined by the formula
 
Ω∞(x) := {d ∈ X | x + td ∈ Ω for all t > 0}. (6.12)

Note that the horizon cone is also known in the literature as the asymptotic
cone of Ω at the point in question. Another equivalent definition of Ω∞(x) is

Ω∞(x) = ⋂_{t>0} (Ω − x)/t. (6.13)

The following proposition shows that Ω∞ (x) is the same for any x ∈ Ω
provided that the set Ω is closed and convex. Thus in this case, we can simply
use the notation Ω∞ for the horizon cone of Ω.

Proposition 6.26 Let Ω be a nonempty, closed, and convex set in a topolog-


ical vector space X. Then the horizon cone Ω∞ (x) is closed and convex for
each x ∈ Ω. In addition, we have the equality
Ω∞ (x1 ) = Ω∞ (x2 ) for all x1 , x2 ∈ Ω. (6.14)

Proof. The convexity of Ω∞ (x) follows from Definition 6.25 and the convexity
of Ω. Since Ω is closed, the set Ω∞ (x) is closed as well due to (6.13). To prove
(6.14), it suffices to verify that Ω∞ (x1 ) ⊂ Ω∞ (x2 ) whenever x1 , x2 ∈ Ω.
Taking any direction d ∈ Ω∞ (x1 ) and any number t > 0, we show that
x2 + td ∈ Ω. Indeed, consider the sequence
x_k := (1/k)(x₁ + ktd) + (1 − 1/k)x₂, k ∈ N,
and observe that xk ∈ Ω for every k because d ∈ Ω∞ (x1 ) and Ω is convex.
We also have xk → x2 + td, and so x2 + td ∈ Ω since Ω is closed. Thus
d ∈ Ω∞ (x2 ), which completes the proof of the proposition. 
A nice and useful consequence of the above result is as follows:
Corollary 6.27 Let Ω ⊂ X be a closed and convex subset of a topological
vector space X, and let 0 ∈ Ω. Then we have

Ω∞ = ⋂_{t>0} tΩ.

Proof. It follows from (6.13) and Proposition 6.26 that

Ω∞ = Ω∞(0) = ⋂_{t>0} (Ω − 0)/t = ⋂_{t>0} (1/t)Ω = ⋂_{t>0} tΩ,

which, therefore, justifies the claim. 


The next proposition provides a sequential description of the horizon cone.
Proposition 6.28 Let Ω be a nonempty, closed, and convex subset of a topo-
logical vector space X. Given d ∈ X, the following are equivalent:
(a) We have d ∈ Ω∞ .
(b) There exist sequences {tk } ⊂ [0, ∞) and {xk } ⊂ Ω such that tk → 0 and
tk xk → d as k → ∞.
Proof. To verify (a)=⇒(b), take d ∈ Ω∞ and fix x ∈ Ω. It follows directly
from the definition that
x + kd ∈ Ω for all k ∈ N.
For each k ∈ N this allows us to find x_k ∈ Ω such that

x + kd = x_k, or equivalently, (1/k)x + d = (1/k)x_k.

Letting t_k := 1/k, we see that t_k x_k → d as k → ∞.
To prove (b)=⇒(a), suppose that there exist sequences {tk } ⊂ [0, ∞) and
{x_k} ⊂ Ω with t_k → 0 and t_k x_k → d. Fix x ∈ Ω and verify that d ∈ Ω∞
by showing that

x + td ∈ Ω for all t > 0.


Indeed, for any t > 0 we have 0 ≤ t · tk < 1 when k is sufficiently large. Thus
(1 − t · tk )x + t · tk xk → x + td as k → ∞.
It follows from the convexity of Ω that every element (1 − t · tk )x + t · tk xk
belongs to Ω. Hence x + td ∈ Ω by the closedness of Ω, and therefore, we
arrive at the desired inclusion d ∈ Ω∞ . 
Now we are in a position to establish the expected characterization of set
boundedness in terms of its horizon cone.

Theorem 6.29 Let Ω be a nonempty, closed, and convex subset of Rn . Then


the set Ω is bounded if and only if its horizon cone is trivial, i.e., Ω∞ = {0}.

Proof. Suppose that the set Ω is bounded and pick any element d ∈ Ω∞ .
Proposition 6.28 ensures the existence of sequences {tk } ⊂ [0, ∞) with tk → 0
and {xk } ⊂ Ω such that tk xk → d as k → ∞. It follows from the boundedness
of Ω that tk xk → 0, which shows that d = 0.
To verify the converse implication, suppose on the contrary that Ω is
unbounded while Ω∞ = {0}. Then there exists a sequence {x_k} ⊂ Ω with
‖x_k‖ → ∞. This allows us to construct a sequence of unit vectors by

d_k := x_k/‖x_k‖, k ∈ N.

Passing to a subsequence if necessary ensures in the finite-dimensional setting
under consideration that d_k → d as k → ∞ with ‖d‖ = 1. Fix any x ∈ Ω and
observe that for all t > 0 and k ∈ N sufficiently large, we have

u_k := (1 − t/‖x_k‖)x + (t/‖x_k‖)x_k ∈ Ω

due to the convexity of Ω. Since u_k → x + td as k → ∞ and since Ω is closed,
it yields x + td ∈ Ω, which ensures therefore that d ∈ Ω∞. The obtained
contradiction completes the proof of the theorem. □
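Definition (6.12) can be probed numerically for a concrete unbounded set. The sketch below (illustrative, with Ω the epigraph of x² in R²) samples finitely many t > 0, so it is only a necessary-condition check for membership in Ω∞; for this Ω the horizon cone is the vertical ray {(0, λ) | λ ≥ 0}, and it is nontrivial precisely because Ω is unbounded, consistent with Theorem 6.29:

```python
import numpy as np

def in_omega(p):
    # Omega = {(x, y) : y >= x^2}: a closed, convex, unbounded subset of R^2
    return p[1] >= p[0] ** 2 - 1e-12

def in_horizon(d, ts=(1e-3, 1.0, 1e3, 1e6)):
    # surrogate for (6.12) at the base point (0, 0): x + t d must stay in Omega
    base = np.zeros(2)
    return all(in_omega(base + t * np.asarray(d, dtype=float)) for t in ts)

assert in_horizon((0.0, 1.0))      # vertical directions survive at infinity
assert in_horizon((0.0, 0.0))      # the zero direction always belongs
assert not in_horizon((1.0, 0.0))  # horizontal directions leave Omega
assert not in_horizon((1.0, 5.0))  # any direction with d1 != 0 eventually leaves
```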

6.3.2 Perspective and Horizon Functions

Here we study the asymptotic behavior of functions at infinity and consider


two remarkable classes of functions that are very useful in this respect: the
perspective functions and the horizon functions, which both relate to geometric
properties of their (always unbounded) epigraphs.
We start with the class of perspective functions that are less investigated
in the literature in comparison with their horizon function counterpart.

Definition 6.30 Let f : X → R be a function given on a vector space X. The


perspective function P_f : R × X → R associated with f is defined by

P_f(t, x) := t f(x/t) if t > 0, and P_f(t, x) := ∞ otherwise.

The first result provides an expected property of perspective functions.


Proposition 6.31 Let f : X → R be a convex function defined on a vector
space X. Then the associated perspective function Pf is convex on R × X.

Proof. We can easily check that (t, x, λ) ∈ epi(P_f) if and only if t > 0 and
(x, λ) ∈ t·epi(f). This leads us to the representation

epi(P_f) = {ta | t > 0, a ∈ A} with A := {1} × epi(f).
Since the set A is convex by the assumed convexity of f , we conclude that
the epigraph epi(Pf ) is convex as well. This confirms the convexity of Pf . 

The following example illustrates the usage of Proposition 6.31 to verify


function convexity when it does not directly follow from the construction.

Example 6.32 Consider the set

Ω := {x ∈ Rⁿ | ⟨c, x⟩ + γ > 0},

where c ∈ Rⁿ \ {0} and γ ∈ R. Then given a convex function f : Rᵖ → R, a
p × n matrix A, and a vector b ∈ Rᵖ, we define another extended-real-valued
function g : Rⁿ → R by

g(x) := (⟨c, x⟩ + γ) f((Ax + b)/(⟨c, x⟩ + γ)) if x ∈ Ω, and g(x) := ∞ otherwise. (6.15)

We aim at showing that this function is convex on Rⁿ by employing the result
of Proposition 6.31. To proceed, construct the mapping B : R × Rⁿ → R × Rᵖ
by B(t, x) := (t, Ax + b) and observe that this mapping is affine. Using the
perspective function P_f of f, consider the composition h := (P_f ∘ B) : R ×
Rⁿ → R. Since P_f is convex by Proposition 6.31 and B is affine, the composition h is also convex. It is clearly expressed in the form

h(t, x) = t f((Ax + b)/t) if t > 0, and h(t, x) = ∞ otherwise,

which shows that g(x) = h(⟨c, x⟩ + γ, x) for the function g in (6.15) on Rⁿ. This
verifies the convexity of g, which is not a direct consequence of the definition.

To proceed next with the construction and elaboration of the functional


counterpart of the horizon cone taken from Definition 6.25, we begin with the
following lemma, where the term “epigraphical set” means that this set can
be represented as the epigraph of some (convex) function.

Lemma 6.33 Let f : X → R be a proper, l.s.c., and convex function on a


topological vector space X, and let Ω := epi(f ). Then the horizon cone Ω∞ is
an epigraphical set.
Proof. Fix a direction d = (v, λ) ∈ Ω∞ and pick any (x, f(x)) ∈ Ω. Then we
have by Definition 6.25 that

(x, f(x)) + t(v, λ) ∈ epi(f) whenever t > 0.

This tells us that f(x + tv) ≤ f(x) + tλ for all t > 0. Taking further any
λ′ ≥ λ, we get f(x + tv) ≤ f(x) + tλ′ for t > 0, which implies in turn that
(v, λ′) ∈ Ω∞. It is also easy to check that the set E_v := {λ ∈ R | (v, λ) ∈ Ω∞}
is closed, and thus Proposition 2.132 ensures that Ω∞ is an epigraphical set. □

Now we are ready to explicitly introduce the function whose epigraph
agrees with epi(f )∞ , i.e., the horizon cone associated with the epigraph of f .
Definition 6.34 Let f : X → R be a proper, l.s.c., and convex function on a
topological vector space X. The horizon function of f is given by
 
f∞(v) := inf{t ∈ R | (v, t) ∈ epi(f)∞}.

Observe from Lemma 6.33 and its proof that f∞ : X → R is exactly the
function whose epigraph is epi(f )∞ . The next lemma gives us an equivalent
analytic description of the horizon function f∞ .
Lemma 6.35 Let f : X → R be a proper, l.s.c., and convex function defined
on a topological vector space X. Then f∞ : X → R is also a proper, l.s.c., and
convex function on this space that admits the representation
 
f∞(v) = sup{f(x + v) − f(x) | x ∈ dom(f)}.
Proof. Since the set epi(f)∞ is closed and convex, the horizon function f∞
is l.s.c. and convex as well. Fix any pair (v, λ) ∈ epi(f∞) = epi(f)∞ and any
vector x ∈ dom(f). Then we have (x, f(x)) ∈ epi(f), and therefore

(x, f(x)) + (v, λ) ∈ epi(f).

It follows that f(x + v) ≤ f(x) + λ; hence f(x + v) − f(x) ≤ λ and

sup{f(x + v) − f(x) | x ∈ dom(f)} ≤ λ.

Supposing conversely that sup{f(x + v) − f(x) | x ∈ dom(f)} ≤ λ yields

(x, f(x)) + (v, λ) ∈ epi(f) whenever x ∈ dom(f),

which tells us in turn that

(x, γ) + (v, λ) ∈ epi(f) for all (x, γ) ∈ epi(f).

In this way, we arrive at the inclusion epi(f) + (v, λ) ⊂ epi(f), and thus
(v, λ) ∈ epi(f)∞. This ensures the equivalence

(v, λ) ∈ epi(f∞) = epi(f)∞ ⟺ sup{f(x + v) − f(x) | x ∈ dom(f)} ≤ λ.

The latter readily implies that

f∞(v) = sup{f(x + v) − f(x) | x ∈ dom(f)}

and also that f∞(v) > −∞ for all v ∈ X. Thus the horizon function f∞ is
proper, and we are done with the proof of the lemma. □
When the function in question is l.s.c. on a topological vector space, additional representations of its horizon function are given below.
Theorem 6.36 Let f : X → R be a proper, l.s.c., and convex function defined
on a topological vector space X. Then the horizon function f∞ : X → R is also
proper, l.s.c., and convex. Furthermore, for any x ∈ dom(f ) we have
f∞(v) = sup_{t>0} [f(x + tv) − f(x)]/t = lim_{t→∞} [f(x + tv) − f(x)]/t. (6.16)
Proof. The properness and convexity of f∞ are proved in Lemma 6.35. Since
f is l.s.c., the set epi(f)∞ is closed, and hence the horizon function f∞ is
l.s.c. Let us now verify the representations in (6.16). To do this, pick v ∈ X
and x ∈ dom(f), and then observe that f∞ is positively homogeneous since
epi(f)∞ is a cone. It follows from Lemma 6.35 that

tf∞(v) = f∞(tv) = sup{f(u + tv) − f(u) | u ∈ dom(f)} ≥ f(x + tv) − f(x)

for any t > 0. Dividing both sides of the obtained inequality by t and taking
the supremum with respect to t > 0 tell us that

f∞(v) ≥ sup_{t>0} [f(x + tv) − f(x)]/t.

Observe also that the convexity of f ensures that the difference quotient
[f(x + tv) − f(x)]/t is nondecreasing in t > 0, and so its supremum over t > 0
agrees with its limit as t → ∞. To complete the proof of the theorem, it
remains to show that

f∞(v) ≤ sup_{t>0} [f(x + tv) − f(x)]/t.

This inequality is obvious if the expression on the right-hand side is infinite.
Thus it suffices to consider the case where

γ := sup_{t>0} [f(x + tv) − f(x)]/t ∈ R.

In this case, we have that (x, f(x)) + t(v, γ) ∈ epi(f) for all t > 0. Since epi(f)
is closed, it follows from Proposition 6.26 that

(v, γ) ∈ epi(f)∞.

The latter ensures the conditions

γ ≥ inf{λ ∈ R | (v, λ) ∈ epi(f)∞} = f∞(v),

which, therefore, complete the proof of the theorem. □
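The limit representation (6.16) can be observed numerically. A sketch using only the standard library: for f(x) = √(1 + x²) the quotients increase to f∞(v) = |v|, while for f(x) = x² they blow up, reflecting f∞(v) = ∞ for v ≠ 0:

```python
import math

def horizon_quotient(f, x, v, t):
    # the difference quotient whose limit as t -> infinity is f_inf(v) by (6.16)
    return (f(x + t * v) - f(x)) / t

f1 = lambda x: math.sqrt(1.0 + x * x)  # finite convex function; f1_inf(v) = |v|
f2 = lambda x: x * x                   # superlinear growth; f2_inf(v) = +inf, v != 0

x = 0.7
for v in (-2.0, -0.5, 1.0, 3.0):
    q_small = horizon_quotient(f1, x, v, 1.0)
    q_large = horizon_quotient(f1, x, v, 1e8)
    assert q_small <= q_large + 1e-12             # the quotient is nondecreasing in t
    assert abs(q_large - abs(v)) < 1e-6           # and tends to |v| = f1_inf(v)
    assert horizon_quotient(f2, x, v, 1e8) > 1e6  # blows up for the quadratic
```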

The last result here expresses the horizon function of the Fenchel conjugate
via the support function of the domain for the function in question.
Theorem 6.37 Let f : X → R be a proper, l.s.c., and convex function on an
LCTV space X. Then we have the relationships

(f*)∞ = σ_dom(f) and f∞ = σ_dom(f*). (6.17)

Proof. Fix v* ∈ X* and x*_0 ∈ dom(f*). Then for any t > 0, we get

f*(x*_0 + tv*) = sup{⟨x*_0 + tv*, x⟩ − f(x) | x ∈ dom(f)}
≤ sup{⟨x*_0, x⟩ − f(x) | x ∈ dom(f)} + t sup{⟨v*, x⟩ | x ∈ dom(f)}
= f*(x*_0) + tσ_dom(f)(v*).

It follows, therefore, that

[f*(x*_0 + tv*) − f*(x*_0)]/t ≤ σ_dom(f)(v*) whenever t > 0.

Applying Theorem 6.36 tells us that (f*)∞(v*) ≤ σ_dom(f)(v*). It remains to
prove that σ_dom(f)(v*) ≤ (f*)∞(v*). As discussed above, it suffices to consider
the case where γ := (f*)∞(v*) ∈ R. Then Theorem 6.36 yields the estimate

f*(x*_0 + tv*) ≤ f*(x*_0) + tγ for all t > 0.

Using the Fenchel-Young inequality gives us

f(x) ≥ ⟨x*_0 + tv*, x⟩ − f*(x*_0 + tv*) ≥ ⟨x*_0, x⟩ − f*(x*_0) + t(⟨v*, x⟩ − γ) for all t > 0.

This implies that ⟨v*, x⟩ ≤ γ whenever x ∈ dom(f), and thus

σ_dom(f)(v*) ≤ γ = (f*)∞(v*),

which verifies the first formula in (6.17). The second formula therein follows
from the first one and the relationship f = f** of Theorem 4.15. □

6.4 Signed Distance Functions


This section is devoted to the study of a remarkable class of intrinsically non-
smooth functions, which is highly important in applications as partly demon-
strated in the book. The functions of this type are known under the name of
signed distance functions. Here we reveal some basic properties of such func-
tions without considering their generalized differentiation. One of the reasons
for this is that more efficient subdifferential elaborations, even in the case
of convex functions, will be done in Chapter 7 by using tools of nonconvex
generalized differentiation.

6.4.1 Basic Definition and Elementary Properties

This subsection describes the construction and some elementary properties of


the signed distance functions in the setting of normed spaces. In what follows
we say that a subset Ω ⊂ X is nontrivial if Ω ≠ ∅ and Ω ≠ X.
Definition 6.38 Let Ω be a nontrivial subset of a normed space X. The
signed distance function associated with Ω is defined by

d̂(x; Ω) = d̂_Ω(x) := d(x; Ω) if x ∉ Ω, and d̂(x; Ω) := −d(x; Ωᶜ) if x ∈ Ω, (6.18)

where Ωᶜ := X \ Ω signifies as usual the complement of Ω in X.


It follows from (6.18) that d̂(x; Ω) = d(x; Ω) − d(x; Ωᶜ) for all x ∈ X.
Now we present three examples that illustrate the calculations of the signed
distance function for simple sets.
Example 6.39 Let Ω := B be the closed unit ball of a normed space X. Then
d̂(x; Ω) = ‖x‖ − 1 for all x ∈ X.

Example 6.40 Let Ω be the collection of all points in the closed unit ball of
R² with rational components. Then we have

d̂(x; Ω) = 0 if ‖x‖ ≤ 1, and d̂(x; Ω) = ‖x‖ − 1 if ‖x‖ > 1.

Note that cl(Ω) = B, and so d̂(x; cl(Ω)) = ‖x‖ − 1 for all x ∈ R². This example
shows that d̂(x; Ω) may be different from d̂(x; cl(Ω)).
Example 6.41 Given a nonzero element a ∈ Rⁿ and b ∈ R, form the set
Ω := {x ∈ Rⁿ | ⟨a, x⟩ + b ≤ 0}. Then we have

d̂(x; Ω) = |⟨a, x⟩ + b|/‖a‖ if x ∉ Ω, and d̂(x; Ω) = −|⟨a, x⟩ + b|/‖a‖ if x ∈ Ω,

which can be written in the single formula d̂(x; Ω) = (⟨a, x⟩ + b)/‖a‖.
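The closed form of Example 6.41 can be compared with a sampled evaluation of d(x; Ω) − d(x; Ωᶜ). A Monte Carlo sketch assuming NumPy; the loose tolerance reflects the sampling resolution, and all names are illustrative:

```python
import numpy as np

def signed_dist_halfspace(x, a, b):
    # closed-form signed distance (6.18) to Omega = {x : <a, x> + b <= 0}
    return (a @ x + b) / np.linalg.norm(a)

# Monte Carlo surrogate for d(x; Omega) - d(x; Omega^c) on a bounded window
rng = np.random.default_rng(4)
a, b = np.array([3.0, -4.0]), 1.0
cloud = rng.uniform(-5.0, 5.0, size=(200000, 2))
inside = cloud @ a + b <= 0.0
for x in (np.array([2.0, 1.0]), np.array([-1.0, -0.2]), np.array([0.0, 0.25])):
    d_omega = np.min(np.linalg.norm(cloud[inside] - x, axis=1))
    d_comp = np.min(np.linalg.norm(cloud[~inside] - x, axis=1))
    # sampled distances carry an O(1/sqrt(density)) error, hence the tolerance
    assert abs((d_omega - d_comp) - signed_dist_halfspace(x, a, b)) < 0.1
```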

Given now a nontrivial subset Ω of a normed space, consider the extended-real-valued function ϑ(·; Ω) : X → R defined by

ϑ(x; Ω) := ∞ if x ∈ Ωᶜ, and ϑ(x; Ω) := −d(x; Ωᶜ) if x ∈ Ω. (6.19)
We reveal its relationships with the sets in question in the next proposition.
Proposition 6.42 Let Ω, Ω₁, Ω₂ ⊂ X be proper subsets of a normed space
X. The following assertions hold:
(a) If Ω₁ ⊂ Ω₂, then ϑ(x; Ω₂) ≤ ϑ(x; Ω₁) for all x ∈ X.
(b) We have the equalities

{x ∈ X | ϑ(x; Ω) ≤ 0} = Ω and {x ∈ X | ϑ(x; Ω) < 0} = int(Ω).

(c) If Ω is closed, then ϑ(·; Ω) is lower semicontinuous on X.

Proof. Assertion (a) is obvious. To verify (b), observe directly from (6.19) that

{x ∈ X | ϑ(x; Ω) ≤ 0} = Ω.

To prove the other property in (b), suppose first that ϑ(x; Ω) < 0. Then x ∈ Ω
and γ := d(x; Ωᶜ) > 0. We get B(x; γ/2) ⊂ Ω, and hence x ∈ int(Ω). Now
pick x ∈ int(Ω) and find γ > 0 such that B(x; γ) ⊂ Ω. Then for any w ∈ Ωᶜ,
we have ‖x − w‖ ≥ γ, which shows that d(x; Ωᶜ) ≥ γ > 0.
To verify (c), take any α ∈ R and get for the α-sublevel set of (6.19) that

L_α := {x ∈ X | ϑ(x; Ω) ≤ α} = {x ∈ Ω | d(x; Ωᶜ) ≥ −α} = Ω ∩ {x ∈ X | d(x; Ωᶜ) ≥ −α}.

This shows that the closedness of Ω yields the closedness of L_α due to the
continuity of d(·; Ωᶜ). Thus ϑ(·; Ω) is l.s.c. on X. □

Next we discuss some properties of the signed distance function (6.18).

Proposition 6.43 Let X be a normed space, and let Ω be a nontrivial subset


of X. Then we have the following assertions:
 
(a) int(Ω) = {x ∈ X | d̃(x; Ω) < 0}.
(b) Ω̄ = {x ∈ X | d̃(x; Ω) ≤ 0}.
(c) bd(Ω) = {x ∈ X | d̃(x; Ω) = 0}.
(d) int(Ω^c) = {x ∈ X | d̃(x; Ω) > 0}.
Proof. To verify (a), suppose first that x ∈ int(Ω). Then Proposition 6.42(b) yields d̃(x; Ω) = ϑ(x; Ω) < 0. Conversely, assuming that d̃(x; Ω) < 0 tells us that x ∈ Ω, and so d̃(x; Ω) = ϑ(x; Ω). Applying Proposition 6.42(b) again, we get x ∈ int(Ω), and thus (a) is proved.
To verify (b), take x ∈ Ω̄. Then d(x; Ω) = 0, and therefore

d̃(x; Ω) = d(x; Ω) − d(x; Ω^c) = −d(x; Ω^c) ≤ 0.

To prove the opposite inclusion, let d̃(x; Ω) ≤ 0, which means d(x; Ω) ≤ d(x; Ω^c). Suppose on the contrary that x ∉ Ω̄ and then find γ > 0 with

B(x; γ) ∩ Ω = ∅.

This yields ‖x − w‖ ≥ γ for all w ∈ Ω, which implies that 0 < γ ≤ d(x; Ω) ≤ d(x; Ω^c) = 0, a clear contradiction.

Next we check (c). Observe that x ∈ bd(Ω) if and only if d(x; Ω) = d(x; Ω^c) = 0, which is equivalent to d̃(x; Ω) = 0. Note also that d(x; Ω) = d(x; Ω^c) = 0 if and only if x ∈ cl(Ω) ∩ cl(Ω^c) = bd(Ω), and so we are done.
Finally, (d) follows from (a) and the identity d̃(x; Ω) = −d̃(x; Ω^c).

6.4.2 Lipschitz Continuity and Convexity

To proceed with the study of more involved properties of (6.18), we present


first the following lemma that deals with the usual distance function (2.33).

Lemma 6.44 Let Ω be a nontrivial subset of a normed space X. If x ∈ Ω and u ∈ Ω^c, then there exists a vector z ∈ X satisfying z ∈ bd(Ω) ∩ [x, u].

Proof. Define the function ϕ : [0, 1] → R by

ϕ(t) := d(tx + (1 − t)u; Ω) − d(tx + (1 − t)u; Ω^c) for t ∈ [0, 1].

Then ϕ(0) = d(u; Ω) ≥ 0 and ϕ(1) = −d(x; Ω^c) ≤ 0. The classical intermediate value theorem gives us z ∈ [x, u] such that d(z; Ω) = d(z; Ω^c). Since z belongs to either Ω or Ω^c, one of these distances vanishes, so we get d(z; Ω) = d(z; Ω^c) = 0 and deduce from Exercise 1.149(c) that z ∈ cl(Ω) ∩ cl(Ω^c) = bd(Ω), which verifies the claimed result.

Now we are ready to show that the signed distance function has the same
global Lipschitz continuity as its standard counterpart (2.33).

Proposition 6.45 Let Ω be a nontrivial subset of a normed space X. Then the signed distance function d̃(·; Ω) is globally Lipschitz continuous on X with the uniform Lipschitz constant ℓ = 1.

Proof. We need to verify that

|d̃(x; Ω) − d̃(u; Ω)| ≤ ‖x − u‖ for all x, u ∈ X. (6.20)

Since both functions d(·; Ω) and d(·; Ω^c) are Lipschitz continuous on X with constant ℓ = 1, it suffices to consider the case in (6.20) where x ∈ Ω and u ∈ Ω^c. Then we clearly have the equalities

|d̃(x; Ω) − d̃(u; Ω)| = |d(x; Ω) − d(x; Ω^c) − d(u; Ω) + d(u; Ω^c)| = d(x; Ω^c) + d(u; Ω).

Applying Lemma 6.44 allows us to find z ∈ [x, u] ∩ bd(Ω) with d(z; Ω) = d(z; Ω^c) = 0. This readily yields

d(x; Ω^c) + d(u; Ω) = d(x; Ω^c) − d(z; Ω^c) + d(u; Ω) − d(z; Ω) ≤ ‖x − z‖ + ‖z − u‖ = ‖x − u‖,

which completes the proof of this proposition.
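The estimate (6.20) can be observed numerically. The following Python sketch (an added illustration, not from the text) samples the signed distance function of the closed unit ball from Example 6.39, where d̃(x; B) = ‖x‖ − 1:

```python
import itertools
import math

def signed_dist_ball(x):
    # Example 6.39: signed distance to the closed unit ball is ||x|| - 1.
    return math.hypot(*x) - 1.0

# Verify |d~(x) - d~(u)| <= ||x - u|| on a grid of sample points.
pts = [(i / 2.0, j / 2.0) for i in range(-4, 5) for j in range(-4, 5)]
worst = max(
    abs(signed_dist_ball(x) - signed_dist_ball(u)) - math.dist(x, u)
    for x, u in itertools.combinations(pts, 2)
)
assert worst <= 1e-12  # Lipschitz constant 1, up to rounding
```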
The next proposition reveals the equivalence between the convexity of the
set Ω and of the function (6.19) associated with Ω.

Proposition 6.46 Let Ω be a nontrivial subset of a normed space X. Then


Ω is convex if and only if the function ϑ(·; Ω) is convex on X.
Proof. Assuming the convexity of Ω, we show that the function ϑ(·; Ω) is convex as well. It suffices to verify the inequality

d(tx + (1 − t)u; Ω^c) ≥ t d(x; Ω^c) + (1 − t) d(u; Ω^c) for all x, u ∈ Ω and t ∈ (0, 1).

Suppose on the contrary that this inequality fails, and then find x1, x2 ∈ Ω and t ∈ (0, 1) such that

d(tx1 + (1 − t)x2; Ω^c) < t d(x1; Ω^c) + (1 − t) d(x2; Ω^c).

Then there exists w ∈ Ω^c satisfying

‖w − (tx1 + (1 − t)x2)‖ < t d(x1; Ω^c) + (1 − t) d(x2; Ω^c).

Consider further the points

u1 := x1 + (z/γ) d(x1; Ω^c) and u2 := x2 + (z/γ) d(x2; Ω^c),

where γ := t d(x1; Ω^c) + (1 − t) d(x2; Ω^c) and z := w − (tx1 + (1 − t)x2). Let us deduce from ‖z‖ < γ that u1, u2 ∈ Ω. Indeed, for d(x1; Ω^c) = 0, we obviously get u1 = x1 ∈ Ω. Consider next the remaining case where d(x1; Ω^c) > 0, and suppose on the contrary that u1 ∉ Ω, which gives us u1 ∈ Ω^c. It follows from definition (2.33) of the distance function that

d(x1; Ω^c) ≤ ‖x1 − u1‖ = (d(x1; Ω^c)/γ) ‖z‖,

which yields γ ≤ ‖z‖, a contradiction. This tells us that u1 ∈ Ω, and similarly we get u2 ∈ Ω. Employing the convexity of Ω ensures that

tu1 + (1 − t)u2 = tx1 + (1 − t)x2 + z = w ∈ Ω,

which contradicts w ∈ Ω^c and so verifies the convexity of ϑ(·; Ω).
To prove the opposite implication, suppose that ϑ(·; Ω) is convex. Fixing any x1, x2 ∈ Ω and t ∈ (0, 1) ensures that

ϑ(tx1 + (1 − t)x2; Ω) ≤ t ϑ(x1; Ω) + (1 − t) ϑ(x2; Ω) ≤ 0,

which implies that tx1 + (1 − t)x2 ∈ Ω by Proposition 6.42(b).

The following important result establishes a convenient representation of the signed distance function (6.18) that is used below.

Theorem 6.47 Let Ω be a nontrivial subset of a normed space X. Then (6.18) is represented as the infimal convolution

d̃Ω(x) = (ϑΩ □ p)(x) for all x ∈ X,

where ϑΩ(x) := ϑ(x; Ω), and where p(x) := ‖x‖ is the norm function on X.

Proof. First consider the case where x ∈ Ω. In this case, we have

(ϑΩ □ p)(x) = inf{ϑΩ(y) + p(x − y) | y ∈ X} = inf{ϑΩ(y) + p(x − y) | y ∈ Ω} = inf{−d(y; Ω^c) + ‖x − y‖ | y ∈ Ω} ≤ −d(x; Ω^c) + ‖x − x‖ = −d(x; Ω^c) = d̃Ω(x).

To verify the opposite inequality, pick any y ∈ Ω and get

−d(x; Ω^c) ≤ −d(y; Ω^c) + ‖x − y‖,

which readily implies that

d̃Ω(x) = −d(x; Ω^c) ≤ inf{ϑΩ(y) + ‖x − y‖ | y ∈ Ω} = (ϑΩ □ p)(x).

Consider further the case where x ∈ Ω^c. For any y ∈ Ω, we have

−d(y; Ω^c) + ‖x − y‖ ≤ ‖x − y‖,

which ensures in turn that

(ϑΩ □ p)(x) ≤ inf{‖x − y‖ | y ∈ Ω} = d(x; Ω) = d̃Ω(x).

Moreover, Lemma 6.44 tells us that for any y ∈ Ω there exists z ∈ bd(Ω) satisfying ‖x − z‖ + ‖z − y‖ = ‖x − y‖. Hence

d(x; Ω) + d(y; Ω^c) ≤ ‖x − z‖ + ‖z − y‖ = ‖x − y‖.

This yields d(x; Ω) ≤ −d(y; Ω^c) + ‖x − y‖ for all y ∈ Ω, and therefore d̃Ω(x) = d(x; Ω) ≤ (ϑΩ □ p)(x), which completes the proof of the theorem.
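For the closed unit disk in R², the infimal convolution of Theorem 6.47 can be checked by brute force: minimizing ϑΩ(y) + ‖x − y‖ over a grid in the disk recovers ‖x‖ − 1. A numerical sketch in Python (added here for illustration; the grid resolution and tolerance are arbitrary choices):

```python
import math

def theta(y):
    # (6.19) for Omega = closed unit disk: theta(y) = -d(y; Omega^c) = ||y|| - 1
    # for y in Omega; points outside Omega are simply skipped below.
    return math.hypot(*y) - 1.0

def inf_convolution(x, step=0.02):
    # Brute-force approximation of inf_y { theta(y) + ||x - y|| } over the disk.
    best = math.inf
    n = int(1 / step)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            y = (i * step, j * step)
            if math.hypot(*y) <= 1.0:
                best = min(best, theta(y) + math.dist(x, y))
    return best

# Theorem 6.47 predicts the signed distance ||x|| - 1 of the unit disk.
for x in [(2.0, 0.5), (0.2, 0.0)]:
    assert abs(inf_convolution(x) - (math.hypot(*x) - 1.0)) < 0.05
```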

The previous theorem and the convexity of the infimal convolution (2.41)
for convex functions yield the convexity of the signed distance function (6.18)
associated with a convex set.

Corollary 6.48 Let Ω be a nontrivial convex subset of a normed space X. Then the signed distance function d̃Ω is convex on X.

Proof. Combine the results of Theorem 6.47 and Proposition 2.135. 

We will return to the investigation of the signed distance function in Chapter 7 while concentrating there on the study of subdifferential properties of (6.18). It is interesting to observe that, although (6.18) is proved above to be a convex function whenever Ω is convex, we develop in Section 7.7 an efficient approach to calculate the convex subdifferential of (6.18) by employing the results of nonconvex generalized differentiation presented in Chapter 7.

6.5 Minimal Time Functions


This section is devoted to the study of two intrinsically nonsmooth classes of extended-real-valued functions known as the minimal time functions with constant dynamics and the signed minimal time functions. The importance of these classes of functions has been highly recognized in applications to optimization and facility location problems as demonstrated in the second volume of the book [240]. Here we reveal some basic properties of such functions and then proceed with their convex generalized differentiation.

6.5.1 Minimal Time Functions with Constant Dynamics

In this subsection, we consider the class of functions related to the following


minimal time problem with constant dynamics:
minimize t ≥ 0 subject to (x + tF ) ∩ Ω = ∅, x ∈ X,
where X is a topological vector space of state variables, Ω ⊂ X is a target set,
and F ⊂ X signifies the constant dynamics dx/dt ∈ F to attain the target Ω
from the state x ∈ X. Such functions are known as the minimal time functions
with constant dynamics, or simply as the minimal time functions.

Definition 6.49 Let F be a convex set in a topological vector space X with 0 ∈ int(F), and let Ω be a nonempty subset of X. The minimal time function associated with Ω, F is

TΩF(x) := inf{t ≥ 0 | Ω ∩ (x + tF) ≠ ∅}, x ∈ X. (6.21)

To begin with, we observe that the minimal time function (6.21) is an


extension of the distance function dΩ (x) associated with Ω when X is a
normed space. Indeed, when F = B is the closed unit ball of X, we have

TΩB (x) = dΩ (x) for all x ∈ X. (6.22)


In connection with (6.21), we recall here the Minkowski gauge function considered in Subsection 2.2.2 that is rewritten in the notation of (6.21) by

pF(u) := inf{t ≥ 0 | u ∈ tF}, u ∈ X. (6.23)
Next we establish an important relationship between the minimal time
function and the Minkowski gauge (6.23) in full generality.

Theorem 6.50 Let F be a convex set in a topological vector space X with


0 ∈ int(F ), and let Ω be a nonempty subset of X. Then the minimal time
function (6.21) is represented by
TΩF(x) = inf{pF(w − x) | w ∈ Ω} for all x ∈ X. (6.24)

Proof. The condition 0 ∈ int(F) ensures that TΩF : X → R is a real-valued function. Fix any x ∈ X and observe that for each t ≥ 0 with Ω ∩ (x + tF) ≠ ∅ there is w ∈ Ω satisfying w − x ∈ tF, and hence pF(w − x) ≤ t. This yields

inf{pF(w − x) | w ∈ Ω} ≤ t,

and so inf{pF(w − x) | w ∈ Ω} ≤ TΩF(x). Let further γ := inf{pF(w − x) | w ∈ Ω} and, given ε > 0, find w ∈ Ω satisfying

pF(w − x) < γ + ε.

Then there exists t ≥ 0 such that t < γ + ε and w − x ∈ tF. This implies that

TΩF(x) ≤ t < γ + ε,

and hence TΩF(x) ≤ γ = inf{pF(w − x) | w ∈ Ω}, which completes the proof.
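Representation (6.24) reduces computing TΩF to minimizing the gauge over the target. For instance, when F = [−1, 1]² in R², the gauge pF is the maximum norm, and for a finite target the infimum becomes a minimum. A small Python illustration (added here; the particular sets are arbitrary examples, not from the text):

```python
def gauge_box(u):
    # Minkowski gauge (6.23) of F = [-1, 1]^2: p_F(u) = max(|u1|, |u2|).
    return max(abs(c) for c in u)

def minimal_time(x, target):
    # Representation (6.24): T(x) = min over w in the target of p_F(w - x).
    return min(gauge_box(tuple(wi - xi for wi, xi in zip(w, x))) for w in target)

omega = [(3.0, 0.0), (0.0, 5.0)]
assert minimal_time((0.0, 0.0), omega) == 3.0  # the point (3, 0) is reached first
assert minimal_time((3.0, 1.0), omega) == 1.0
```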
Now we present two consequences of this theorem that give us sufficient
conditions on the constant dynamics F generating the Minkowski gauge (6.23)
that ensure the continuity of (6.21) in arbitrary topological vector spaces X
as well as its Lipschitz continuity if the space in question is normed.
Corollary 6.51 Under the general assumptions of Theorem 6.50 the minimal
time function (6.21) is continuous on X.
Proof. Using representation (6.24), we fix x ∈ X and get

TΩF(x) = inf{pF(w − x) | w ∈ Ω}.

Let V := int(F), and pick w0 ∈ Ω. It follows from Corollary 2.27 that pF(w0 − x) < 1 if and only if w0 − x ∈ V, i.e., x ∈ w0 − V, which is a neighborhood of w0. Thus the minimal time function is bounded on a nonempty, open, and convex set. This yields the continuity of (6.21) on X by Theorem 2.144.
Corollary 6.52 Under the general assumptions of Theorem 6.50, we have
TΩF (x) − TΩF (y) ≤ pF (y − x) for all x, y ∈ X. (6.25)
If X is a normed space, then TΩF (·) is Lipschitz continuous on X.
Proof. Fix any x, y ∈ X and q ∈ Ω. Then it follows from Theorem 6.50 that

TΩF(x) = inf{pF(w − x) | w ∈ Ω} ≤ pF(q − x) = pF(q − y + y − x) ≤ pF(q − y) + pF(y − x).

Taking the infimum above with respect to q ∈ Ω gives us (6.25).
When X is a normed space, fix r > 0 such that B(0; r) ⊂ F. Then

pF(x) = inf{t > 0 | x/t ∈ F} ≤ inf{t > 0 | x/t ∈ B(0; r)} = inf{t > 0 | ‖x‖/t ≤ r} = inf{t > 0 | ‖x‖/r ≤ t} = ‖x‖/r,

and thus we deduce from (6.25) the estimate

TΩF(x) − TΩF(y) ≤ (1/r)‖x − y‖ for all x, y ∈ X.
Interchanging there the positions of x and y verifies the claimed Lipschitz
continuity of the minimal time function TΩF on the entire space X. 
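The constant 1/r obtained above is easy to observe numerically. The Python sketch below (an added illustration with an arbitrarily chosen F and target) uses F = [−R, R]², which contains the ball B(0; R), and samples the resulting Lipschitz estimate:

```python
import itertools
import math

R = 2.0  # F = [-R, R]^2 contains the ball B(0; R)

def gauge(u):
    # Gauge of F = [-R, R]^2: max norm scaled by 1/R.
    return max(abs(c) for c in u) / R

def T(x, target):
    # Representation (6.24) with a finite target set.
    return min(gauge(tuple(wi - xi for wi, xi in zip(w, x))) for w in target)

omega = [(1.0, 1.0), (-3.0, 2.0)]
pts = [(i * 0.5, j * 0.5) for i in range(-6, 7) for j in range(-6, 7)]
worst = max(
    abs(T(x, omega) - T(y, omega)) - math.dist(x, y) / R
    for x, y in itertools.combinations(pts, 2)
)
assert worst <= 1e-12  # |T(x) - T(y)| <= ||x - y|| / R, up to rounding
```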
Now we return to the general setting of topological vector spaces and
present some useful results. Consider first the behavior of the minimal time
function (6.21) with respect to the enlargement of the target set Ω defined by
 
Ωr := x ∈ X  TΩF (x) ≤ r , r > 0. (6.26)
Proposition 6.53 Let F be a convex set in a topological vector space X with
0 ∈ int(F), and let Ω be a nonempty subset of X. For any x ∉ Ωr, we have the following representation:

TΩF(x) = r + TΩFr(x) whenever r > 0. (6.27)
Proof. Let t1 := TΩFr(x). By the definition of TΩFr(x), for any ε > 0 there exist w1 ∈ Ωr and t1 ≤ γ1 < t1 + ε satisfying

w1 ∈ Ωr ∩ (x + γ1F).

Then TΩF(w1) ≤ r by (6.26), and so there are w2 ∈ Ω and γ2 < r + ε with

w2 ∈ Ω ∩ (w1 + γ2F).

Consequently, it follows that w2 ∈ Ω ∩ (x + (γ1 + γ2)F) due to the convexity of F. This brings us to the relationships

TΩF(x) ≤ γ1 + γ2 ≤ TΩFr(x) + r + 2ε,

which imply that TΩF(x) ≤ TΩFr(x) + r due to the arbitrary choice of ε > 0.
To verify the opposite inequality in (6.27), observe that t := TΩF(x) > r due to x ∉ Ωr. Then for any ε > 0 there exist γ with t ≤ γ < t + ε and w ∈ X satisfying the inclusion

w ∈ Ω ∩ (x + γF).

The above element w ∈ Ω can be represented as w = x + γq with some q ∈ F. Define further wr := x + (γ − r)q and observe that w ∈ Ω ∩ (wr + rF), which gives us wr ∈ Ωr. Thus wr ∈ Ωr ∩ (x + (γ − r)F), which yields the inequalities

TΩFr(x) ≤ γ − r ≤ TΩF(x) + ε − r.

Since ε > 0 was chosen arbitrarily, we arrive at TΩFr(x) ≤ TΩF(x) − r and complete the proof of the proposition.
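In one dimension the representation (6.27) is transparent: for Ω = {0} and F = [−1, 1], one has TΩF(x) = |x|, the enlargement Ωr is the interval [−r, r], and TΩFr(x) = max(|x| − r, 0). A short Python check (an added illustration; the choice of Ω and F is ours, not the text's):

```python
def T_point(x):
    # Omega = {0}, F = [-1, 1] in R: T(x) = |x|.
    return abs(x)

def T_enlarged(x, r):
    # Omega_r = [-r, r] by (6.26), so T over the enlargement is max(|x| - r, 0).
    return max(abs(x) - r, 0.0)

r = 1.5
for x in (2.0, -3.25, 10.0):  # points outside Omega_r
    assert T_point(x) == r + T_enlarged(x, r)  # formula (6.27)
```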
The following proposition reveals the behavior of the minimal time func-
tion under linear argument shifts.
Proposition 6.54 Let F be a convex set in a topological vector space X with
0 ∈ int(F ), and let Ω be a nonempty subset of X. Then for any x ∈ Ωr ⊂ X
with r > 0 and any t ≥ 0 we have the estimate

TΩF (x − tq) ≤ r + t whenever q ∈ F.


Proof. Fix (x, r, t, q) as in the formulation of the proposition and denote λ := TΩF(x). Picking any ε > 0 and observing that λ ≤ r, find γ ≥ 0 and w ∈ Ω ∩ (x + γF) such that λ ≤ γ < λ + ε. This directly implies the inclusions

w ∈ Ω ∩ (x − tq + tq + γF) ⊂ Ω ∩ (x − tq + tF + γF) ⊂ Ω ∩ (x − tq + (t + γ)F).

It follows then that TΩF(x − tq) ≤ γ + t ≤ t + λ + ε ≤ t + r + ε, and hence we arrive at TΩF(x − tq) ≤ r + t by the arbitrary choice of ε > 0.
The next proposition establishes a kind of linearity of the minimal time
functions with respect to projection points on arbitrary target sets.
Proposition 6.55 In the setting of Proposition 6.54, assume further that F is closed. Let x ∉ Ω, and let w ∈ ΠF(x; Ω), where

ΠF(x; Ω) := {ω ∈ Ω | TΩF(x) = pF(ω − x)}. (6.28)

Then we have the equality

TΩF(λw + (1 − λ)x) = (1 − λ)TΩF(x) for any λ ∈ (0, 1). (6.29)
Proof. Observe by Corollary 2.27 that w ∈ x + tF for t := TΩF(x), and therefore

λw + (1 − λ)x = w + (1 − λ)(x − w) ∈ w − (1 − λ)tF,

which implies the inclusion

w ∈ Ω ∩ (λw + (1 − λ)x + (1 − λ)tF), 0 < λ < 1.

Hence TΩF(λw + (1 − λ)x) ≤ (1 − λ)t = (1 − λ)TΩF(x) for such λ. This justifies the inequality "≤" in (6.29). To verify the opposite inequality, denote tλ := TΩF(λw + (1 − λ)x) and for any ε > 0 find tλ ≤ γ < tλ + ε with

Ω ∩ (x + λ(w − x) + γF) ≠ ∅.

Thus we arrive at the condition

Ω ∩ (x + (λt + γ)F) ≠ ∅,

and so TΩF(x) ≤ λt + γ ≤ λTΩF(x) + tλ + ε. It follows finally that

(1 − λ)TΩF(x) ≤ tλ + ε,

which completes the proof by passing to the limit as ε ↓ 0.
Next we present two results on the convexity and concavity of the minimal time functions depending on the set structures. The first proposition establishes a simple and expected sufficient condition for the convexity of the minimal time function with constant dynamics.
Proposition 6.56 Let F be a convex set in a topological vector space X with
0 ∈ int(F ), and let Ω be a nonempty subset of X. Then the minimal time
function (6.21) is convex if the target set Ω is convex.

Proof. Suppose that the target set Ω is convex and show that in this case for any x1, x2 ∈ X and for any λ ∈ (0, 1), we have

TΩF(λx1 + (1 − λ)x2) ≤ λTΩF(x1) + (1 − λ)TΩF(x2). (6.30)

Let t1 := TΩF(x1) and t2 := TΩF(x2). Then for any ε > 0 there exist numbers γ1 and γ2 such that

ti ≤ γi < ti + ε and Ω ∩ (xi + γiF) ≠ ∅, i = 1, 2.

Take wi ∈ Ω ∩ (xi + γiF) and by the convexity of the sets Ω and F obtain the inclusions λw1 + (1 − λ)w2 ∈ Ω and

λw1 + (1 − λ)w2 ∈ λx1 + (1 − λ)x2 + λγ1F + (1 − λ)γ2F ⊂ λx1 + (1 − λ)x2 + (λγ1 + (1 − λ)γ2)F.

The latter implies the inequalities

TΩF(λx1 + (1 − λ)x2) ≤ λγ1 + (1 − λ)γ2 ≤ λTΩF(x1) + (1 − λ)TΩF(x2) + ε,

which in turn justify (6.30) due to the arbitrary choice of ε > 0.
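For a convex target, inequality (6.30) can be sampled directly. With Ω = B and F = B in R², relation (6.22) gives TΩF(x) = max(‖x‖ − 1, 0), and midpoint convexity is easy to test in Python (an added sketch; the grid of sample points is an arbitrary choice):

```python
import math

def T_ball(x):
    # Target Omega = closed unit ball, F = B: T(x) = max(||x|| - 1, 0) by (6.22).
    return max(math.hypot(*x) - 1.0, 0.0)

pts = [(i * 0.7, j * 0.7) for i in range(-3, 4) for j in range(-3, 4)]
for x in pts:
    for y in pts:
        mid = ((x[0] + y[0]) / 2.0, (x[1] + y[1]) / 2.0)
        # Midpoint instance of the convexity inequality (6.30).
        assert T_ball(mid) <= (T_ball(x) + T_ball(y)) / 2.0 + 1e-12
```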

The following proposition provides sufficient conditions for the concavity


property of the minimal time function.
Proposition 6.57 Let F be a convex set in a topological vector space X with 0 ∈ int(F), and let Ω be a nonempty subset of X. Assume that the complement Ω^c of the target set is convex. Then the minimal time function (6.21) is concave on Ω^c.
Proof. If TΩF is not concave on Ω^c, then there exist x1, x2 ∈ Ω^c and 0 < λ < 1 for which we have the inequality

TΩF(λx1 + (1 − λ)x2) < λTΩF(x1) + (1 − λ)TΩF(x2). (6.31)

By definition (6.21), we find t < λTΩF(x1) + (1 − λ)TΩF(x2) and w ∈ Ω satisfying

w − (λx1 + (1 − λ)x2) = tq

for some q ∈ F. Consider the points

ui := xi + tq TΩF(xi)/(λTΩF(x1) + (1 − λ)TΩF(x2)), i = 1, 2,

and observe that u1, u2 ∈ Ω^c. Indeed, assuming for definiteness that u1 ∈ Ω clearly yields the estimates

TΩF(x1) ≤ t TΩF(x1)/(λTΩF(x1) + (1 − λ)TΩF(x2)) < TΩF(x1),

a contradiction. At the same time, the convexity of Ω^c gives us the inclusion w = λu1 + (1 − λ)u2 ∈ Ω^c, which is impossible due to w ∈ Ω. Combining all the above

shows that condition (6.31) does not hold under the assumptions made, and thus the function TΩF is concave on Ω^c.
Finally in this subsection, we present some results involving the closure
operation in the framework of the minimal time function. Having Ω and F
from (6.21), define the closure of Ω relative to F by
 
clF (Ω) := Ω − εF . (6.32)
ε>0

The next three propositions involving (6.32) deal with (6.21) in topological vector spaces. The first statement shows that the boundedness of F ensures that (6.32) reduces to the topological closure Ω̄ independently of F.
Recall that a subset Θ of a topological vector space is bounded if for any
neighborhood V of the origin there exists t > 0 such that Θ ⊂ tV .

Proposition 6.58 Let F be a bounded subset of a topological vector space X


under the fulfillment of the interiority condition 0 ∈ int(F). Then for any nonempty set Ω ⊂ X, we have the equality clF(Ω) = Ω̄.

Proof. Fix any x ∈ Ω̄ and choose a neighborhood V of the origin such that V ⊂ F. Then for any ε > 0, we get x ∈ Ω − εV ⊂ Ω − εF. It follows from (6.32) that x ∈ clF(Ω), and thus clF(Ω) ⊃ Ω̄.
To verify the opposite inclusion, pick any x ∈ clF(Ω) and get x ∈ Ω − εF for all ε > 0. Taking any neighborhood V of the origin and using the boundedness of F, we have F ⊂ tV for some t > 0, and so εF ⊂ V for ε := 1/t. This tells us that x ∈ Ω − V, i.e., (x + V) ∩ Ω ≠ ∅, which implies that x ∈ Ω̄.
The final two statements do not assume the boundedness of the constant dynamics set F. We begin with showing that the usage of (6.32) allows us to characterize the roots of the equation TΩF(x) = 0.

Proposition 6.59 Let F be a subset of a topological vector space X such that the interiority condition 0 ∈ int(F) is satisfied. Then x ∈ X solves the equation TΩF(x) = 0 if and only if we have x ∈ clF(Ω).

Proof. Suppose that TΩF(x) = 0, and then for any ε > 0 find 0 ≤ t < ε with

(x + tF) ∩ Ω ≠ ∅.

Since the condition 0 ∈ int(F) ensures that tF ⊂ εF, we have (x + εF) ∩ Ω ≠ ∅, and thus x ∈ Ω − εF. The latter implies in turn that x ∈ clF(Ω).
To verify the opposite implication, pick x ∈ clF(Ω) and get by (6.32) that x ∈ Ω − εF for every ε > 0, which yields (x + εF) ∩ Ω ≠ ∅ for all positive ε. This tells us that TΩF(x) ≤ ε whenever ε > 0, and thus TΩF(x) = 0.

The last assertion elaborates the result of Proposition 6.58 in the case where the set F may be unbounded. In this case, we express the relative closure (6.32) in terms of Ω and the horizon cone of F defined in (6.12). Recall from Proposition 6.26 that the horizon cone of a closed and convex set is the same for all points of the set in question.

Proposition 6.60 In addition to the assumptions of Proposition 6.59, suppose that F is closed and convex, while Ω is sequentially compact in X. Then

clF(Ω) = Ω − F∞. (6.33)

Proof. To verify first the inclusion “⊃” in (6.33), pick any x ∈ Ω − F∞ and
get x = w − d for some w ∈ Ω and d ∈ F∞ . Using definition (6.12) of F∞
and the condition 0 ∈ int(F ) implies that t(w − x) ∈ F for all t > 0 and that
x ∈ Ω − εF for all ε > 0. This tells us by (6.32) that x ∈ clF (Ω).
To check the opposite inclusion in (6.33), fix x ∈ clF (Ω) and for any k ∈ N
find wk ∈ Ω and vk ∈ F such that x = wk − vk /k. Since Ω is sequentially
compact, suppose without loss of generality that the sequence {wk } converges
to some w ∈ Ω. Hence vk /k → w − x as k → ∞, and we arrive at w − x ∈ F∞ ,
which yields x ∈ Ω − F∞ and thus completes the proof. 

6.5.2 Subgradients of Minimal Time Functions

In this subsection, we study subgradients of the minimal time function (6.21). The first result of this subsection establishes a precise formula for calculating the subdifferential of TΩF at the target points x̄ ∈ Ω via the normal cone to Ω and the support function of the constant dynamics F.

Theorem 6.61 Let F be a convex set in a topological vector space X with 0 ∈ int(F), and let Ω be a nonempty convex subset of X. Then for any x̄ ∈ Ω, we have the representation

∂TΩF(x̄) = N(x̄; Ω) ∩ C*, (6.34)

where the set C* ⊂ X* is defined via the support function (4.8) of F by

C* := {x* ∈ X* | σF(−x*) ≤ 1}. (6.35)

Furthermore, representation (6.34) is equivalent to

∂TΩF(x̄) = N(x̄; clF(Ω)) ∩ C*. (6.36)

Proof. Fix any x* ∈ ∂TΩF(x̄) and write by definition that

⟨x*, x − x̄⟩ ≤ TΩF(x) − TΩF(x̄) for all x ∈ X. (6.37)

Since TΩF(x) = 0 whenever x ∈ Ω, it follows that

⟨x*, x − x̄⟩ ≤ 0 for all x ∈ Ω,

which implies by the normal cone definition (3.15) that x* ∈ N(x̄; Ω). Fixing now any f ∈ F and t > 0, we deduce from (6.37) that

⟨x*, (x̄ − tf) − x̄⟩ ≤ TΩF(x̄ − tf) ≤ t,

where the last inequality holds due to the condition ((x̄ − tf) + tF) ∩ Ω ≠ ∅ in the definition of the minimal time function. This ensures, therefore, that

⟨x*, −f⟩ ≤ 1 for all f ∈ F,

and so x* ∈ C* by the construction of C* and definition (4.8) of the support function. Thus we arrive at the inclusion ∂TΩF(x̄) ⊂ N(x̄; Ω) ∩ C*.
To verify the opposite inclusion in (6.34), pick any x* ∈ N(x̄; Ω) ∩ C* and then get ⟨x*, −f⟩ ≤ 1 for all f ∈ F together with

⟨x*, x − x̄⟩ ≤ 0 for all x ∈ Ω.

Fix further any u ∈ X and for every ε > 0 find t ≥ 0, f ∈ F, and ω ∈ Ω with

TΩF(u) ≤ t < TΩF(u) + ε and u + tf = ω.

This readily gives us the relationships

⟨x*, u − x̄⟩ = ⟨x*, u − ω⟩ + ⟨x*, ω − x̄⟩ ≤ ⟨x*, −tf⟩ ≤ t < TΩF(u) + ε = TΩF(u) − TΩF(x̄) + ε,

which yield (6.37) since ε was chosen arbitrarily, and thus justify (6.34).
Let us finally prove the fulfillment of (6.36). We obviously have the inclusion N(x̄; clF(Ω)) ∩ C* ⊂ N(x̄; Ω) ∩ C* due to Ω ⊂ clF(Ω). To verify the opposite one, pick any x* ∈ N(x̄; Ω) ∩ C* and x ∈ clF(Ω). Then we get from (6.32) that x ∈ Ω − tF for all t > 0. Representing x = ωt − tdt with some ωt ∈ Ω and dt ∈ F for each t > 0 tells us that

⟨x*, x − x̄⟩ = ⟨x*, ωt − tdt − x̄⟩ = ⟨x*, ωt − x̄⟩ + t⟨x*, −dt⟩ ≤ t,

which ensures that x* ∈ N(x̄; clF(Ω)) ∩ C* by passing to the limit as t ↓ 0 and thus completes the proof of the theorem.

The next theorem derives a similar subdifferential formula for TΩF at points x̄ ∈ clF(Ω) of the F-relative closure of Ω defined in (6.32).

Theorem 6.62 Let x̄ ∈ clF(Ω) in the setting of Theorem 6.61. Then we have

∂TΩF(x̄) = N(x̄; clF(Ω)) ∩ C*, (6.38)

where the set C* ⊂ X* is taken from (6.35).

Proof. Picking any x* ∈ ∂TΩF(x̄), deduce from (6.37) and the equality TΩF(x) = 0 for all x ∈ clF(Ω) that x* ∈ N(x̄; clF(Ω)). Since clF(Ω) ⊂ Ω − F, we represent x̄ as w̄ − f̄ with w̄ ∈ Ω and f̄ ∈ F. Fixing any f ∈ F and t > 0, denote x := w̄ − tf and obtain similarly to the proof of Theorem 6.61 that

⟨x*, (w̄ − tf) − (w̄ − f̄)⟩ = ⟨x*, −tf + f̄⟩ ≤ TΩF(w̄ − tf) ≤ t,

which clearly ensures that

⟨x*, −f + f̄/t⟩ ≤ 1 for all t > 0.

Letting t → ∞ and using (6.35) give us x* ∈ C* and hence verify the inclusion

∂TΩF(x̄) ⊂ N(x̄; clF(Ω)) ∩ C*.

To justify the opposite inclusion in (6.38), fix x* ∈ N(x̄; clF(Ω)) ∩ C*. Taking any ε > 0, for every u ∈ X find t ∈ [0, ∞), ω ∈ Ω, and f ∈ F with

TΩF(u) ≤ t < TΩF(u) + ε and u + tf = ω.

Since Ω ⊂ clF(Ω), we deduce from the definitions that

⟨x*, u − x̄⟩ = ⟨x*, ω − x̄⟩ + t⟨x*, −f⟩ ≤ t < TΩF(u) + ε = TΩF(u) − TΩF(x̄) + ε.

This, therefore, ensures that x* ∈ ∂TΩF(x̄) and thus verifies the inclusion "⊃" in (6.38). In this way, we complete the proof of the theorem.
In the rest of this subsection, we consider the most challenging case for subdifferentiation of the minimal time function (6.21) at the points x̄ ∉ clF(Ω). Two precise yet different formulas for computing the subdifferential ∂TΩF(x̄) at such points are derived below. The first subdifferential formula involves the expansions of Ω defined by

Ωr := {x ∈ X | TΩF(x) ≤ r} for any r > 0. (6.39)

Theorem 6.63 In the setting of Theorem 6.61, suppose that x̄ ∉ clF(Ω), and let r := TΩF(x̄) > 0. Then we have the subdifferential formula

∂TΩF(x̄) = N(x̄; Ωr) ∩ S* with S* := {x* ∈ X* | σF(−x*) = 1}. (6.40)
Proof. Pick any x* ∈ ∂TΩF(x̄) and conclude similarly to the proof of Theorem 6.61 that σF(−x*) ≤ 1 and x* ∈ N(x̄; Ωr). Let us now show that σF(−x*) = 1. Having in mind that

⟨x*, x − x̄⟩ ≤ TΩF(x) − TΩF(x̄) for all x ∈ X, (6.41)

fix any ε ∈ (0, r) and find t ∈ R, f ∈ F, and ω ∈ Ω satisfying

r ≤ t < r + ε² and ω = x̄ + tf.

We can write ω = x̄ + εf + (t − ε)f and thus get the estimate TΩF(x̄ + εf) ≤ t − ε. Applying (6.41) with x = x̄ + εf gives us the inequalities

⟨x*, εf⟩ ≤ TΩF(x̄ + εf) − TΩF(x̄) ≤ t − ε − r ≤ ε² − ε,

which ensure the fulfillment of the estimates

1 − ε ≤ ⟨x*, −f⟩ ≤ σF(−x*).

This implies by passing to the limit as ε ↓ 0 that σF(−x*) ≥ 1, and thus x* ∈ S*. This verifies the inclusion ∂TΩF(x̄) ⊂ N(x̄; Ωr) ∩ S*.
To justify the opposite inclusion in (6.40), take any x* ∈ N(x̄; Ωr) for which σF(−x*) = 1 and then show that the subgradient inequality (6.41) is satisfied. It follows from Theorem 6.61 that x* ∈ ∂TΩFr(x̄), and thus

⟨x*, x − x̄⟩ ≤ TΩFr(x) for all x ∈ X.

Fix any x ∈ X and first consider the case where t := TΩF(x) > r. Then it follows from Proposition 6.53 that the desired condition (6.41) holds. In the remaining case where t ≤ r, for any ε > 0 we choose f ∈ F such that ⟨x*, −f⟩ > 1 − ε. Employing now Proposition 6.54 ensures that TΩF(x − (r − t)f) ≤ r, and hence x − (r − t)f ∈ Ωr. Since x* ∈ N(x̄; Ωr), we get

⟨x*, x − (r − t)f − x̄⟩ ≤ 0,

which clearly yields the relationships

⟨x*, x − x̄⟩ ≤ ⟨x*, f⟩(r − t) ≤ (1 − ε)(t − r) = (1 − ε)(TΩF(x) − TΩF(x̄)).

Since ε > 0 was chosen arbitrarily, we get (6.41) and complete the proof.
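Formula (6.40) can be sanity-checked in the simplest setting Ω = {0} and F = B in R², where TΩF(x) = ‖x‖ and, at x̄ ≠ 0, the subgradient x̄/‖x̄‖ satisfies σF(−x*) = ‖x*‖ = 1. A small Python verification (added here; the sample points are arbitrary):

```python
import math

def T(x):
    # Omega = {0}, F = closed unit ball: T(x) = ||x||.
    return math.hypot(*x)

xbar = (3.0, 4.0)
xstar = (xbar[0] / T(xbar), xbar[1] / T(xbar))
assert abs(math.hypot(*xstar) - 1.0) < 1e-12  # sigma_F(-x*) = ||x*|| = 1

# Subgradient inequality (6.41) on a few sample points x.
for x in [(0.0, 0.0), (1.0, -2.0), (-5.0, 7.0), (3.0, 4.0)]:
    lhs = xstar[0] * (x[0] - xbar[0]) + xstar[1] * (x[1] - xbar[1])
    assert lhs <= T(x) - T(xbar) + 1e-12
```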

The last theorem of this subsection establishes yet another representation of the subdifferential of TΩF at points x̄ ∉ clF(Ω) that uses the subdifferential of the Minkowski gauge (6.23) of the dynamics F and the normal cone to the target Ω at the generalized projection points instead of the target expansion Ωr. Now we impose more assumptions on Ω ensuring, in particular, that the F-closure clF(Ω) reduces to the standard closure Ω̄ by Proposition 6.58. The appropriate class of topological vector spaces is also specified. Recall to this end that a topological vector space is semireflexive if the canonical map into its bidual is surjective; see [218].

Theorem 6.64 In addition to the assumptions of Theorem 6.61, suppose that X is semireflexive and that Ω is closed and bounded. Then whenever x̄ ∈ X, we have the subdifferential representation

∂TΩF(x̄) = (−∂pF(ω̄ − x̄)) ∩ N(ω̄; Ω) for any ω̄ ∈ ΠF(x̄; Ω), (6.42)

where the nonempty generalized projection ΠF(x̄; Ω) is defined by (6.28).

Proof. Let us first observe that the generalized projection (6.28) is a nonempty set. Indeed, it is well known (see Chapter 2) that the Minkowski gauge pF is a continuous function on X under the imposed convexity of F with 0 ∈ int(F). The convexity of pF ensures that pF is weakly lower semicontinuous on X. Due to the semireflexivity of X and the assumptions imposed on Ω, it is not hard to show that Ω is weakly compact in X; see Exercise 6.110(a). Thus the infimum in (6.24) is realized by the Weierstrass existence theorem in the weak topology of X. This justifies the nonemptiness of ΠF(x̄; Ω). Arguing now similarly to the proof of Theorem 6.61, we arrive at the claimed representation (6.42) and thus complete the proof of the theorem.

6.5.3 Signed Minimal Time Functions

Similar to the signed distance function (6.18) studied in Section 6.4, we consider here the signed minimal time function, which is the corresponding version of the minimal time function with constant dynamics (6.21). Recall that a subset Ω of a topological vector space X is called nontrivial if both Ω and its complement Ω^c are nonempty.
Definition 6.65 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), let Ω be a nontrivial subset of X, and let Ω^c be the complement of Ω in X. The signed minimal time function associated with the target set Ω and the constant dynamics set F is given by

T̃ΩF(x) := TΩF(x) − TΩFc(x), x ∈ X. (6.43)
To investigate the signed minimal time function (6.43) in what follows, consider first the auxiliary function μFΩ : X → R̄ defined by

μFΩ(x) := −TΩFc(x) if x ∈ Ω, and μFΩ(x) := ∞ if x ∈ Ω^c, (6.44)

and reveal some of its properties in the next proposition.
Proposition 6.66 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), and let Ω, Ω1, Ω2 be nontrivial subsets of X. Then the following properties of μFΩ are satisfied:
(a) If Ω1 ⊂ Ω2, then μFΩ2(x) ≤ μFΩ1(x) for all x ∈ X.
(b) We have the equality {x ∈ X | μFΩ(x) ≤ 0} = Ω.
(c) Assume in addition that F is bounded. Then it holds that

{x ∈ X | μFΩ(x) < 0} = int(Ω). (6.45)

(d) If Ω is closed, then μFΩ is l.s.c. on X.

Proof. Properties (a) and (b) can be checked directly by definition (6.44). To verify (c), it suffices to prove that TΩFc(x) > 0 if and only if x ∈ int(Ω). Indeed, assume that TΩFc(x) > 0 and get x ∈ Ω. We need to show that x ∉ bd(Ω). Suppose on the contrary that x ∈ bd(Ω). Due to the definitions and the interiority assumption 0 ∈ int(F), we have

(x + (1/k)F) ∩ Ω^c ≠ ∅ for all k ∈ N.

It follows that TΩFc(x) ≤ 1/k whenever k ∈ N, and hence TΩFc(x) = 0; equivalently, the imposed boundedness of F together with Propositions 6.58 and 6.59 tells us that x ∈ clF(Ω^c) = cl(Ω^c). This contradicts the assumption TΩFc(x) > 0 and justifies the "only if" part of the statement.
To verify further the "if" part therein, take x ∈ int(Ω) and suppose on the contrary that TΩFc(x) = 0. Then select a neighborhood V of the origin in X such that x + V ⊂ Ω. The boundedness of F ensures the existence of t > 0 such that tF ⊂ V, which gives us x + tF ⊂ Ω. The contradiction comes from the fact that TΩFc(x) = 0 yields (x + tF) ∩ Ω^c ≠ ∅ for this t > 0. Thus we get (6.45).
Finally, let us verify (d). Take any α ∈ R and consider the corresponding α-sublevel set of μFΩ represented by

Lα = {x ∈ Ω | TΩFc(x) ≥ −α}

due to definition (6.44). The closedness of the set on the right-hand side above follows from the assumed closedness of Ω and the continuity of the minimal time function TΩFc, which was proved in Corollary 6.51. Thus μFΩ is lower semicontinuous on X.

The next result, which is a direct consequence of Proposition 6.66 and


the definitions, shows that the signed minimal time function (6.43) entirely
determines the interior, closure, boundary, and exterior of a given set Ω. Note
that this significantly distinguishes (6.43) from the minimal time function
(6.21).
Corollary 6.67 Let F be a bounded, convex subset of a topological vector space X with 0 ∈ int(F), and let Ω be a nontrivial subset of X. Then we have the following relationships:

(a) int(Ω) = {x ∈ X | T^F_Ω(x) < 0}.
(b) cl(Ω) = {x ∈ X | T^F_Ω(x) ≤ 0}.
(c) bd(Ω) = {x ∈ X | T^F_Ω(x) = 0}.
(d) int(Ω^c) = {x ∈ X | T^F_Ω(x) > 0}.
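The relationships (a)–(d) can be observed numerically in the simplest one-dimensional setting. The sketch below is an illustration with data chosen here, not taken from the text: X = R, F = [−1, 1], and Ω = [0, 1], so that both minimal time functions reduce to ordinary distance functions.

```python
# Illustration of Corollary 6.67 on the real line.  Assumptions of this
# sketch (not from the text): X = R, F = [-1, 1], Omega = [0, 1], so the
# minimal time functions to Omega and to its complement are the ordinary
# distance functions.

A, B = 0.0, 1.0                          # Omega = [A, B]

def dist_to_omega(x):
    """d(x, [A, B]): zero inside the interval, linear outside."""
    return max(A - x, 0.0, x - B)

def dist_to_complement(x):
    """d(x, complement of (A, B)): zero outside, distance to the
    nearest endpoint inside."""
    if x <= A or x >= B:
        return 0.0
    return min(x - A, B - x)

def signed_minimal_time(x):
    """Signed minimal time function (6.43) for this choice of F, Omega."""
    return dist_to_omega(x) - dist_to_complement(x)
```

The signed function is −0.5 at the interior point 0.5, vanishes at the boundary points 0 and 1, and is positive outside [0, 1], in agreement with items (a)–(d).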
To continue the study of (6.44) with the subsequent application to the signed minimal time function, we now obtain a characterization of the convexity of the function μ^F_Ω. The next lemma is of its own interest while being important for establishing the main result of this subsection.

Lemma 6.68 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), and let Ω be a nontrivial subset of X. Then the function μ^F_Ω is convex if and only if the set Ω is convex.

Proof. If the function μ^F_Ω is convex, then it follows from Proposition 6.66(b) that the set Ω is convex as well. To verify the reverse implication, assume that Ω is convex and show that for all λ ∈ (0, 1) and x_1, x_2 ∈ Ω we have

T^F_{Ω^c}(λx_1 + (1 − λ)x_2) ≥ λT^F_{Ω^c}(x_1) + (1 − λ)T^F_{Ω^c}(x_2).  (6.46)

Suppose on the contrary that there exist λ ∈ (0, 1) and x_1, x_2 ∈ Ω with

T^F_{Ω^c}(λx_1 + (1 − λ)x_2) < λT^F_{Ω^c}(x_1) + (1 − λ)T^F_{Ω^c}(x_2).
422 6 MISCELLANEOUS TOPICS ON CONVEXITY

Then we find 0 < t < λT^F_{Ω^c}(x_1) + (1 − λ)T^F_{Ω^c}(x_2) such that

(λx_1 + (1 − λ)x_2 + tF) ∩ Ω^c ≠ ∅.

This gives us a vector w ∈ F satisfying the inclusion

λx_1 + (1 − λ)x_2 + tw ∈ Ω^c.

Define further the number γ := λT^F_{Ω^c}(x_1) + (1 − λ)T^F_{Ω^c}(x_2) and the points

u_1 := x_1 + (tw/γ) T^F_{Ω^c}(x_1),  u_2 := x_2 + (tw/γ) T^F_{Ω^c}(x_2).

Since t < γ, it follows that u_1, u_2 ∈ Ω. Indeed, when T^F_{Ω^c}(x_1) = 0 we immediately get u_1 = x_1 ∈ Ω. In the remaining case where T^F_{Ω^c}(x_1) > 0, suppose on the contrary that u_1 ∉ Ω, and so u_1 ∈ Ω^c. Then it follows from definition (6.21) of the minimal time function that

T^F_{Ω^c}(x_1) ≤ (t/γ) T^F_{Ω^c}(x_1) < T^F_{Ω^c}(x_1),

which is nonsense, and hence we get u_1 ∈ Ω. Using finally the assumed convexity of the set Ω tells us that

λu_1 + (1 − λ)u_2 = λx_1 + (1 − λ)x_2 + tw ∈ Ω.

The obtained contradiction verifies the concavity of T^F_{Ω^c} in (6.46), which yields the convexity of μ^F_Ω in (6.44) and thus completes the proof of the lemma. □
Now we are ready to establish the main result of this subsection, which provides a representation of the signed minimal time function (6.43) as the infimal convolution of the function μ^F_Ω and the Minkowski gauge of F and, as a consequence, gives us sufficient conditions for the convexity of T^F_Ω.
Theorem 6.69 Let F be a convex subset of a topological vector space X with 0 ∈ int(F), and let Ω be a nontrivial subset of X. Then the following hold:

(a) We have the estimate

T^F_Ω(x) ≥ (μ^F_Ω □ p_F)(x) for all x ∈ Ω.  (6.47)

(b) If the set F is symmetric, then (6.47) holds as an equality.
(c) If in addition the set Ω is convex, then the signed minimal time function T^F_Ω is convex on X.
Proof. Starting with (a), pick any x ∈ Ω and deduce from the definitions that

(μ^F_Ω □ p_F)(x) = inf{μ^F_Ω(u) + p_F(x − u) | u ∈ X}
= inf{μ^F_Ω(u) + p_F(x − u) | u ∈ Ω}
= inf{−T^F_{Ω^c}(u) + p_F(x − u) | u ∈ Ω}
≤ −T^F_{Ω^c}(x) + p_F(x − x) = −T^F_{Ω^c}(x).

Furthermore, for any u ∈ Ω we have
−T^F_{Ω^c}(x) ≤ −T^F_{Ω^c}(u) + p_F(x − u).

Combining the above inequalities gives us

−T^F_{Ω^c}(x) ≤ inf{μ^F_Ω(u) + p_F(x − u) | u ∈ Ω} = (μ^F_Ω □ p_F)(x),

which therefore justifies the estimate in (6.47) by definition (6.43) for x ∈ Ω.

To verify the equality in (6.47) when the set F is symmetric, pick x ∈ Ω^c and get for any u ∈ Ω that

−T^F_{Ω^c}(u) + p_F(u − x) ≤ p_F(u − x),

which yields, therefore, the relationships

(μ^F_Ω □ p_F)(x) ≤ inf{p_F(u − x) | u ∈ Ω} = T^F_Ω(x).

Since u ∈ Ω, it follows from definition (2.9) that there exists z ∈ bd(Ω) satisfying p_F(x − z) + p_F(z − u) = p_F(x − u). Hence

T^F_Ω(x) + T^F_{Ω^c}(u) ≤ p_F(z − x) + p_F(z − u) = p_F(x − u),

which implies in turn the inequality

T^F_Ω(x) ≤ −T^F_{Ω^c}(u) + p_F(x − u) for all u ∈ Ω.

We arrive, therefore, at the reverse estimate

T^F_Ω(x) ≤ (μ^F_Ω □ p_F)(x) for x ∈ Ω^c,

which ensures by (6.43) the equality in (6.47) and thus justifies (b).

Finally, we verify (c) by using Lemma 6.68 on the convexity of μ^F_Ω, Theorem 2.26 on the convexity of the Minkowski gauge, and Proposition 2.135 on the convexity of the infimal convolution of convex functions. □
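The equality case (b) of the theorem can be checked on a small grid. The sketch below uses illustrative data chosen here: F = [−1, 1] (symmetric, with p_F(x) = |x|) and Ω = [0, 1], with the infimum in the convolution taken over finitely many points of Ω.

```python
# A finite check of Theorem 6.69(b) on the line.  Assumptions of this
# sketch (chosen for illustration): F = [-1, 1], so p_F(x) = |x| and F
# is symmetric; Omega = [0, 1].  The infimal convolution of
# mu(u) = -T^F_{Omega^c}(u) with p_F is evaluated over a grid of points
# u in Omega and compared with the signed minimal time function.

GRID = [k * 0.25 for k in range(5)]      # sample points of Omega = [0, 1]

def mu(u):
    """mu(u) = -dist(u, complement of [0, 1]) for u in Omega."""
    return -min(u - 0.0, 1.0 - u)

def p_F(x):
    """Minkowski gauge of F = [-1, 1]."""
    return abs(x)

def signed_T(x):
    """Signed minimal time function (6.43) of Omega = [0, 1]."""
    outside = max(0.0 - x, 0.0, x - 1.0)
    inside = 0.0 if (x <= 0.0 or x >= 1.0) else min(x, 1.0 - x)
    return outside - inside

def inf_convolution(x):
    """(mu [box] p_F)(x), with the infimum taken over the grid."""
    return min(mu(u) + p_F(x - u) for u in GRID)
```

On the sampled points the grid infimal convolution reproduces the signed minimal time function exactly; replacing F by a nonsymmetric set such as [−1, 2] destroys this agreement, in line with Example 6.70 below.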
The following example shows that the symmetry of the set F is essential
for the equality in (6.47) even in the case of one-dimensional problems.
Example 6.70 Consider the signed minimal time function (6.43) for x ∈ R, F := [−1, 2], and Ω := {5}. Then

p_F(x) = −x if x < 0, and p_F(x) = x/2 if x ≥ 0,

T^F_Ω(x) = p_F(5 − x), and μ^F_Ω(x) = δ_Ω(x). In this case we have (μ^F_Ω □ p_F)(x) = p_F(x − 5), which differs from T^F_Ω(x) = p_F(5 − x) and thus shows the failure of the equality in (6.47) when F is not symmetric.
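The computations of this example can be reproduced directly; the short sketch below simply evaluates the gauge of F = [−1, 2] and the two sides of (6.47) for Ω = {5}.

```python
# Numerical companion to Example 6.70: F = [-1, 2], Omega = {5}.

def p_F(x):
    """Minkowski gauge of F = [-1, 2]: p_F(x) = inf{t > 0 : x in t*F}."""
    return -x if x < 0 else x / 2.0

def signed_T(x):
    """Signed minimal time function of Omega = {5}: p_F(5 - x)."""
    return p_F(5.0 - x)

def inf_conv(x):
    """(mu [box] p_F)(x): mu is the indicator of {5}, so the infimum
    is attained at u = 5 and equals p_F(x - 5)."""
    return p_F(x - 5.0)
```

At x = 3, for instance, the signed minimal time function equals p_F(2) = 1 while the infimal convolution equals p_F(−2) = 2, so the two sides indeed differ.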
Finally in this subsection, we present sufficient conditions for the (global) Lipschitz continuity of the signed minimal time function on a normed space with an efficient calculation of the corresponding Lipschitz constant.
Given a nonempty convex set F ⊂ X, recall that its polar set (or simply polar) F° is defined by the formula
F° := {x* ∈ X* | ⟨x*, x⟩ ≤ 1 for all x ∈ F}.  (6.48)

If X is a normed space, the norm of the polar F° is

‖F°‖ := sup{‖x*‖ | x* ∈ F°}.  (6.49)

Proposition 6.71 Let X be a normed space, and let F be a convex subset of X satisfying the interiority condition 0 ∈ int(F). Then for any nontrivial set Ω ⊂ X the signed minimal time function T^F_Ω is Lipschitz continuous on X with a Lipschitz constant ℓ calculated by ℓ = ‖F°‖.

Proof. We know from Corollary 6.52 that both functions T^F_Ω and T^F_{Ω^c} are Lipschitz continuous on X and that their common Lipschitz constant is calculated by ℓ = ‖F°‖; see Exercise 6.109(b). To verify the conclusion of this proposition, it suffices to show that, due to the structure of (6.43), for any fixed points x ∈ Ω and u ∈ Ω^c we have the estimate

|T^F_Ω(x) − T^F_Ω(u)| ≤ ‖F°‖ · ‖x − u‖.  (6.50)

To proceed with the verification of (6.50), observe first that

|T^F_Ω(x) − T^F_Ω(u)| = |T^F_Ω(x) − T^F_{Ω^c}(x) − T^F_Ω(u) + T^F_{Ω^c}(u)| = T^F_{Ω^c}(x) + T^F_Ω(u).

Consider further the continuous function ψ : [0, 1] → R defined by

ψ(t) := T^F_Ω(tx + (1 − t)u) − T^F_{Ω^c}(tx + (1 − t)u), t ∈ [0, 1].

Since ψ(0)ψ(1) ≤ 0, the classical intermediate value theorem gives us a point z ∈ [x, u] with T^F_{Ω^c}(z) = T^F_Ω(z). It tells us that z ∈ bd(Ω) and that

T^F_{Ω^c}(x) + T^F_Ω(u) ≤ p_F(z − x) + p_F(z − u) ≤ ‖F°‖ · ‖x − z‖ + ‖F°‖ · ‖z − u‖ = ‖F°‖ · ‖x − u‖,

which justifies (6.50) and thus completes the proof. □
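The Lipschitz estimate can be tested on the data of Example 6.70. In the sketch below (illustrative data chosen here), F = [−1, 2] gives the polar set F° = [−1, 1/2], hence ‖F°‖ = 1, and the estimate is checked on a finite sample of points.

```python
# Sample-based check of Proposition 6.71 for F = [-1, 2], Omega = {5}
# (the data of Example 6.70).  The polar set is F° = [-1, 1/2], so
# ||F°|| = 1, and the signed minimal time function x -> p_F(5 - x)
# should be Lipschitz continuous with constant 1.

def p_F(x):                      # gauge of F = [-1, 2]
    return -x if x < 0 else x / 2.0

def signed_T(x):                 # signed minimal time function of {5}
    return p_F(5.0 - x)

POLAR_NORM = 1.0                 # ||F°|| for F° = [-1, 1/2]

def lipschitz_estimate_holds(points):
    """Check |T(a) - T(b)| <= ||F°|| * |a - b| on all sampled pairs."""
    return all(abs(signed_T(a) - signed_T(b)) <= POLAR_NORM * abs(a - b) + 1e-12
               for a in points for b in points)
```

The constant is sharp for this data: the signed function has slope 1 to the right of Ω = {5}.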

6.6 Convex Geometry in Finite Dimensions

This section presents several fundamental results of finite-dimensional convex geometry, which are important for numerous applications of convex analysis, in particular to problems of convex optimization. We start with Carathéodory's theorem on representing convex hulls of sets in finite dimensions, then proceed with the related geometric version of Farkas' lemma on the solvability of linear inequalities, and finally consider the interrelated theorems by Helly and Radon on set intersections. All these results and their proofs significantly exploit the geometry of finite-dimensional spaces.

6.6.1 Carathéodory Theorem on Convex Hulls

The Carathéodory theorem presented below establishes the precise dimension-dependent representation of convex hulls of sets in R^n with numerous applications to optimization, the calculus of variations, optimal control, etc.

Recall that, given a nonempty set Ω ⊂ R^n, the convex cone K_Ω generated by Ω, or the convex conic hull of Ω, is the intersection of all the convex cones containing the set Ω. First we present a simple description of the cone K_Ω that holds also in infinite dimensions (Figure 6.3).

Fig. 6.3. Convex cone generated by a set

Proposition 6.72 Let Ω be a nonempty subset of a topological vector space X. Then the convex cone generated by Ω admits the representation

K_Ω = { Σ_{i=1}^m λ_i a_i | λ_i ≥ 0, a_i ∈ Ω, m ∈ N }.  (6.51)

If in addition the set Ω is convex, then K_Ω = cone(Ω).

Proof. Denote by C the set on the right-hand side of (6.51), which is a convex cone containing Ω. Let us now show that C ⊂ K for any convex cone K containing Ω. To verify this, fix such a cone K and consider a vector x ∈ C given by x = Σ_{i=1}^m λ_i a_i with fixed λ_i ≥ 0, a_i ∈ Ω, and m ∈ N. Without loss of generality, we suppose that λ_i > 0 for some i, since otherwise the statement is obvious. Denoting λ := Σ_{i=1}^m λ_i > 0 gives us
x = λ Σ_{i=1}^m (λ_i/λ) a_i ∈ K,

which yields C ⊂ K. Thus we get K_Ω = C, while the claimed representation of K_Ω in the case of convex sets Ω is obvious. □
The next proposition is the key part of the Carathéodory theorem; it
involves the dimension of the space in question.
Proposition 6.73 Let Ω ⊂ R^n be a nonempty set, and let x ∈ K_Ω \ {0}. Then we have the representation

x = Σ_{i=1}^m λ_i a_i with λ_i > 0, a_i ∈ Ω for all i = 1, . . . , m, and m ≤ n.

Proof. Pick any x ∈ K_Ω \ {0} and deduce from Proposition 6.72 that x = Σ_{i=1}^m μ_i a_i with some μ_i > 0 and a_i ∈ Ω as i = 1, . . . , m and m ∈ N. Suppose that the vectors a_1, . . . , a_m are linearly dependent and find numbers γ_i ∈ R, not all zeros, such that

Σ_{i=1}^m γ_i a_i = 0.

Letting I := {i = 1, . . . , m | γ_i > 0}, where we may suppose without loss of generality that I ≠ ∅, and taking any ε > 0, we get

x = Σ_{i=1}^m μ_i a_i = Σ_{i=1}^m μ_i a_i − ε Σ_{i=1}^m γ_i a_i = Σ_{i=1}^m (μ_i − εγ_i) a_i.

Denote ε := min{μ_i/γ_i | i ∈ I} = μ_{i_0}/γ_{i_0} for some i_0 ∈ I, and let β_i := μ_i − εγ_i for i = 1, . . . , m. Then we have

x = Σ_{i=1}^m β_i a_i with β_{i_0} = 0 and β_i ≥ 0 for any i = 1, . . . , m.

This shows that the number of vectors a_i in the representation of x is reduced by at least one. Continuing the reduction process, we represent x after a finite number of steps as a positive linear combination of linearly independent vectors {a_j | j ∈ J} in R^n, where J ⊂ {1, . . . , m}. This tells us that the index set J cannot contain more than n elements. □
Now we are ready to derive the Carathéodory theorem, which is geomet-
rically illustrated by Figure 6.4.
Theorem 6.74 Let Ω be a nonempty subset of R^n. Then every point x ∈ co(Ω) can be represented as a convex combination of no more than n + 1 elements of the set Ω.
Proof. Considering the set Θ := {1} × Ω ⊂ R^{n+1}, notice that co(Θ) = {1} × co(Ω) and that co(Θ) ⊂ K_Θ. Thus for any x ∈ co(Ω) we have (1, x) ∈ co(Θ)

Fig. 6.4. Carathéodory theorem

and deduce from Proposition 6.73 that there exist λ_i ≥ 0 and (1, a_i) ∈ Θ for i = 0, . . . , m with m ≤ n such that we have

(1, x) = Σ_{i=0}^m λ_i (1, a_i).

This readily verifies the representation

x = Σ_{i=0}^m λ_i a_i with Σ_{i=0}^m λ_i = 1, λ_i ≥ 0, and m ≤ n,

and thus completes the proof of the Carathéodory theorem. □
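A concrete instance of the theorem can be verified in a few lines of code. The points and coefficients below are illustrative choices, not from the text: a point of the convex hull of the four vertices of the unit square in R^2 is written as a convex combination of only n + 1 = 3 of them.

```python
# A small numerical instance of the Caratheodory theorem in R^2.
# The point X lies in the convex hull of four vertices of the unit
# square, yet n + 1 = 3 of them already suffice to represent it.

VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
X = (0.25, 0.25)

def convex_combination(points, lambdas):
    """Return sum(lambda_i * p_i), checking the convexity constraints."""
    assert abs(sum(lambdas) - 1.0) < 1e-12
    assert all(l >= 0.0 for l in lambdas)
    return tuple(sum(l * p[i] for l, p in zip(lambdas, points))
                 for i in range(2))

# Three of the four generators realise X, in line with Theorem 6.74.
CHOSEN = [VERTICES[0], VERTICES[1], VERTICES[2]]
COEFFS = [0.5, 0.25, 0.25]
```

Here 0.5·(0, 0) + 0.25·(1, 0) + 0.25·(0, 1) = (0.25, 0.25), so the fourth vertex is not needed.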

A useful consequence of Theorem 6.74 is given in Exercise 6.114.

6.6.2 Geometric Version of Farkas Lemma

In this section, we derive yet another fundamental result of finite-dimensional convex analysis known as Farkas' lemma; see Theorem 6.77. Although Farkas' lemma and Carathéodory's theorem are generally independent, the driving force in the proof of both of them is Proposition 6.73 reflecting the geometry of finite-dimensional spaces.

Here we present a geometric version of Farkas' lemma, which gives us a characterization of the solvability of finitely many linear inequalities via the conic hull of the vectors generating the inequalities. Other versions of Farkas' lemma are given in the corresponding exercises of Section 6.8.

Prior to Theorem 6.77, we present two propositions of their own interest. The first one provides a relationship between convex conic hulls of finite sets.
Proposition 6.75 Let a_1, . . . , a_m be linearly dependent nonzero vectors in R^n. Consider the finite sets

Ω := {a_1, . . . , a_m} and Ω_i := Ω \ {a_i}, i = 1, . . . , m.

Then the following relationship for the conic hulls holds:

K_Ω = ∪_{i=1}^m K_{Ω_i}.  (6.52)

Proof. Since the inclusion "⊃" in (6.52) is obvious, it remains to verify the reverse inclusion. To proceed, pick any x ∈ K_Ω and get by definition that

x = Σ_{i=1}^m μ_i a_i with μ_i ≥ 0, i = 1, . . . , m.

Without loss of generality, suppose that x ≠ 0 and find by the linear dependence of the a_i numbers γ_i ∈ R, not all zeros, such that

Σ_{i=1}^m γ_i a_i = 0.

As in the proof of Proposition 6.73, we reduce the number of vectors a_i in the representation of x as follows:

x = Σ_{i=1, i≠i_0}^m λ_i a_i with λ_i ≥ 0 and some i_0 ∈ {1, . . . , m}.

This tells us that x ∈ ∪_{i=1}^m K_{Ω_i} and thus justifies (6.52). □

The second proposition also applies the dimension reduction procedure to verify the closedness of convex conic hulls generated by finite sets.

Proposition 6.76 The convex conic hulls K_Ω generated by finite sets Ω := {a_1, . . . , a_m} ⊂ R^n are closed in R^n.

Proof. We begin with considering the case where the vectors a_1, . . . , a_m are linearly independent in R^n. Take any sequence {x_k} ⊂ K_Ω converging to x as k → ∞ and find by definition of K_Ω nonnegative numbers α_{ki} for i = 1, . . . , m and k ∈ N such that

x_k = Σ_{i=1}^m α_{ki} a_i, k ∈ N.

Denote α_k := (α_{k1}, . . . , α_{km}) ∈ R^m and observe that the sequence {α_k}, k ∈ N, is bounded in R^m; this is easy to check by arguing by contradiction. Hence we suppose without loss of generality that α_k → (α_1, . . . , α_m) as k → ∞

with α_i ≥ 0 for all i = 1, . . . , m. Then we get by passing to the limit as k → ∞ that x_k → x = Σ_{i=1}^m α_i a_i ∈ K_Ω, which therefore verifies the closedness of K_Ω in the case where a_1, . . . , a_m are linearly independent.

The remaining case is where the vectors a_1, . . . , a_m are linearly dependent, and we assume without loss of generality that a_i ≠ 0 for all i = 1, . . . , m. Then Proposition 6.75 provides the cone representation (6.52), where each set Ω_i contains m − 1 vectors. If at least one of these sets consists of linearly dependent vectors, we can reduce the number of generating elements again and thus arrive after a finite number of iterations at the representation

K_Ω = ∪_{j=1}^p K_{Ω_j},

where each set Ω_j ⊂ Ω contains only linearly independent elements. It follows from considering the previous case that all the cones K_{Ω_j} are closed, and therefore their finite union K_Ω is closed as well. □
Using the last proposition together with the separation of closed convex
sets leads us to the justification of the seminal Farkas lemma.

Theorem 6.77 For any set Ω := {a_1, . . . , a_m} ⊂ R^n and vector q ∈ R^n, the following assertions are equivalent:

(a) q ∈ K_Ω, i.e., there exist numbers λ_i ≥ 0, i = 1, . . . , m, such that

q = Σ_{i=1}^m λ_i a_i.  (6.53)

(b) Whenever x ∈ R^n we have the implication

[⟨a_i, x⟩ ≤ 0, i = 1, . . . , m] ⟹ ⟨q, x⟩ ≤ 0.

Proof. Let us first check that (a) ⟹ (b). If q ∈ K_Ω, by Proposition 6.72 we find λ_i ≥ 0 as i = 1, . . . , m giving us the representation (6.53). Then fix x ∈ R^n and observe that the fulfillment of the inequalities ⟨a_i, x⟩ ≤ 0 for all i = 1, . . . , m clearly ensures that

⟨q, x⟩ = Σ_{i=1}^m λ_i ⟨a_i, x⟩ ≤ 0,

which therefore verifies the implication in (b).

To proceed with the proof of (b) ⟹ (a), suppose on the contrary that q ∉ K_Ω while (b) holds. Taking into account that the convex cone K_Ω is closed by Proposition 6.76, the strict separation result of Proposition 2.89 yields the existence of x ∈ R^n for which

sup{⟨u, x⟩ | u ∈ K_Ω} < ⟨q, x⟩.

Since K_Ω is a cone, this tells us that 0 = ⟨0, x⟩ < ⟨q, x⟩ and that tu ∈ K_Ω whenever t > 0 and u ∈ K_Ω. Hence we have

t sup{⟨u, x⟩ | u ∈ K_Ω} < ⟨q, x⟩ whenever t > 0.

Now divide both sides of the above inequality by t and let t → ∞. This yields

sup{⟨u, x⟩ | u ∈ K_Ω} ≤ 0

and implies in turn that ⟨a_i, x⟩ ≤ 0 for all i = 1, . . . , m. On the other hand, we get ⟨q, x⟩ > 0, which contradicts (b) and thus completes the proof. □
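Both directions of the equivalence can be sampled numerically. In the sketch below the generators, test vectors, and directions are data chosen for this illustration: q = 2a_1 + 3a_2 lies in K_Ω and satisfies implication (b) on every sampled direction, while q = (−1, 0) lies outside the cone, and x = (−1, 0) is a direction witnessing the failure of (b).

```python
# Toy verification of the Farkas alternative of Theorem 6.77 in R^2.

A = [(1.0, 0.0), (0.0, 1.0)]                 # generators a_1, a_2

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def implication_b_holds(q, directions):
    """Check assertion (b) on a finite sample of directions x: whenever
    <a_i, x> <= 0 for all i, we must also have <q, x> <= 0."""
    return all(dot(q, x) <= 0.0
               for x in directions
               if all(dot(a, x) <= 0.0 for a in A))

DIRECTIONS = [(-1.0, 0.0), (0.0, -1.0), (-1.0, -1.0), (-2.0, -0.5), (1.0, -1.0)]
```

For q = (2, 3) every qualifying direction yields ⟨q, x⟩ ≤ 0, whereas for q = (−1, 0) the direction (−1, 0) satisfies both constraint inequalities but gives ⟨q, x⟩ = 1 > 0.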

6.6.3 Radon and Helly Theorems on Set Intersections

In the last subsection of this section, we present two other remarkable results of finite-dimensional convex geometry known as Helly's theorem and Radon's theorem. They are generally different from, while closely related to, Carathéodory's theorem and Farkas' lemma discussed above. In this subsection, we first prove Radon's theorem and then use it for the derivation of Helly's result; see the historical comments in Section 6.9.

Let us begin with the following simple observation. Recall that the notion of affinely dependent vectors is taken from Definition 2.74.
Lemma 6.78 Any vectors w_1, . . . , w_m ∈ R^n with m ≥ n + 2 are affinely dependent.

Proof. Consider the vectors w_2 − w_1, . . . , w_m − w_1 ∈ R^n and note that these vectors are linearly dependent since m − 1 > n. Thus the given ones w_1, . . . , w_m are affinely dependent by Proposition 2.75. □

Next we formulate and prove the aforementioned theorem by Radon.

Theorem 6.79 Take vectors w_1, . . . , w_m ∈ R^n with m ≥ n + 2 and denote I := {1, . . . , m}. Then there exist two nonempty subsets I_1, I_2 ⊂ I with I_1 ∩ I_2 = ∅ and I_1 ∪ I_2 = I such that

co(Ω_1) ∩ co(Ω_2) ≠ ∅ for Ω_1 := {w_i | i ∈ I_1} and Ω_2 := {w_i | i ∈ I_2}.

Proof. Since m ≥ n + 2, we get from Lemma 6.78 that the vectors w_1, . . . , w_m are affinely dependent, which gives us numbers λ_1, . . . , λ_m ∈ R that are not equal to zero simultaneously while satisfying

Σ_{i=1}^m λ_i w_i = 0 and Σ_{i=1}^m λ_i = 0.

Denoting I_1 := {i = 1, . . . , m | λ_i ≥ 0} and I_2 := {i = 1, . . . , m | λ_i < 0}, we see that both these sets are nonempty, they are disjoint, and they are related by

Σ_{i∈I_1} λ_i = − Σ_{i∈I_2} λ_i.

Letting λ := Σ_{i∈I_1} λ_i gives us the equalities

Σ_{i∈I_1} λ_i w_i = − Σ_{i∈I_2} λ_i w_i and Σ_{i∈I_1} (λ_i/λ) w_i = Σ_{i∈I_2} (−λ_i/λ) w_i.

Defining the sets Ω_1 := {w_i | i ∈ I_1} and Ω_2 := {w_i | i ∈ I_2}, we get

Σ_{i∈I_1} (λ_i/λ) w_i = Σ_{i∈I_2} (−λ_i/λ) w_i ∈ co(Ω_1) ∩ co(Ω_2)

and thus conclude that co(Ω_1) ∩ co(Ω_2) ≠ ∅, which completes the proof. □
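The construction in the proof can be traced on concrete data. In the sketch below, the four points and the affine-dependence coefficients are chosen by hand for illustration (m = 4 ≥ n + 2 with n = 2); the two convex combinations built from the positive and negative coefficients meet at the same point.

```python
# The Radon construction of Theorem 6.79 for four points in R^2.
# LAM encodes an affine dependence: sum(LAM) = 0 and
# sum(LAM[i] * W[i]) = 0 componentwise.

W = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
LAM = [1.0, -1.0, -1.0, 1.0]

def radon_points():
    """Build the two convex combinations from the proof of Theorem 6.79;
    they should coincide at the common point of the two convex hulls."""
    I1 = [i for i, l in enumerate(LAM) if l > 0]
    I2 = [i for i, l in enumerate(LAM) if l < 0]
    s = sum(LAM[i] for i in I1)
    p1 = tuple(sum((LAM[i] / s) * W[i][k] for i in I1) for k in range(2))
    p2 = tuple(sum((-LAM[i] / s) * W[i][k] for i in I2) for k in range(2))
    return p1, p2
```

Here the Radon partition consists of the two diagonals of the unit square, which intersect at (0.5, 0.5).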

Finally, we are ready to formulate the classical Helly theorem and present
its proof by following Radon’s approach.

Theorem 6.80 Consider the collection O := {Ω_1, . . . , Ω_m} of convex sets in R^n, where m ≥ n + 1. If the intersection of any subcollection of n + 1 sets from O is nonempty, then we have

∩_{i=1}^m Ω_i ≠ ∅.  (6.54)

Proof. It is natural to employ induction with respect to m ≥ n + 1. For m = n + 1, the result is obvious. Assume that the conclusion holds for some fixed number m ≥ n + 1, and then show that it is satisfied for m + 1. Given a collection {Ω_1, . . . , Ω_m, Ω_{m+1}} of m + 1 convex sets in R^n for which the intersection of any subcollection of n + 1 sets is nonempty, define the sets

Ω̃_i := ∩_{j=1, j≠i}^{m+1} Ω_j, i = 1, . . . , m + 1.

We clearly have that all the Ω̃_i are convex and obey the inclusions Ω̃_i ⊂ Ω_j whenever j ≠ i and i, j = 1, . . . , m + 1. The induction assumption tells us that Ω̃_i ≠ ∅, and hence we pick w_i ∈ Ω̃_i for every i = 1, . . . , m + 1. Using the Radon theorem for the chosen vectors w_1, . . . , w_{m+1} gives us two nonempty subsets I_1, I_2 ⊂ I := {1, . . . , m + 1} such that I_1 ∩ I_2 = ∅, I = I_1 ∪ I_2, and

co(W_1) ∩ co(W_2) ≠ ∅ for W_1 := {w_i | i ∈ I_1} and W_2 := {w_i | i ∈ I_2}.

Select now w ∈ co(W_1) ∩ co(W_2) and verify that w ∈ ∩_{i=1}^{m+1} Ω_i. Fix an index i ∈ {1, . . . , m + 1} and consider first the case where i ∈ I_1. Since j ≠ i for every j ∈ I_2, we get w_j ∈ Ω̃_j ⊂ Ω_i for every j ∈ I_2. This yields w ∈ co(W_2) = co({w_j | j ∈ I_2}) ⊂ Ω_i by the convexity of Ω_i, and therefore w ∈ Ω_i for every i ∈ I_1. We can similarly check that w ∈ Ω_i for every i ∈ I_2 and thus complete the proof. □
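The case n = 1 already illustrates the theorem: convex sets on the line are intervals, and the assumption concerns intersections of pairs. The intervals below are illustrative data chosen for this sketch.

```python
# Helly's theorem in the simplest case n = 1, where the convex sets are
# closed intervals and the assumption of Theorem 6.80 is that every
# 2 = n + 1 of them intersect.

INTERVALS = [(0.0, 3.0), (1.0, 4.0), (2.0, 5.0), (2.5, 3.5)]

def every_n_plus_1_intersect(ivals):
    """Pairwise intersection test for closed intervals."""
    return all(max(a1, a2) <= min(b1, b2)
               for i, (a1, b1) in enumerate(ivals)
               for (a2, b2) in ivals[i + 1:])

def common_intersection(ivals):
    """Common intersection of closed intervals, or None if empty."""
    lo = max(a for a, _ in ivals)
    hi = min(b for _, b in ivals)
    return (lo, hi) if lo <= hi else None
```

For closed intervals the common intersection, when nonempty, is [max of the left endpoints, min of the right endpoints]; here it equals [2.5, 3].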

6.7 Approximations of Sets and Geometric Duality

The main attention in this section is paid to the primal, tangential approximation of a convex set Ω ⊂ X at a reference point x̄ ∈ Ω by elements of the same space X. This is clearly different from the dual, normal approximation of Ω at x̄ by elements of the dual space X*, while the two are related to each other via a duality correspondence. We have already studied and applied above the normal cone to convex sets and the associated subdifferential of convex functions. The tangent cone to convex sets considered below corresponds to the directional derivative of convex functions.

In this section, we confine ourselves to the finite-dimensional setting of X = R^n. While the definitions and some results presented here hold true in general topological vector spaces, the obtained tangent and normal cone calculations for convex polyhedral sets are finite-dimensional, being based on Farkas' lemma presented in the previous section.

6.7.1 Full Duality between Tangent and Normal Cones

Let us begin with the case where Ω is a nonempty convex cone. The dual/polar cone to Ω is defined by the duality correspondence

Ω* := {x* ∈ X* | ⟨x*, x⟩ ≤ 0 for all x ∈ Ω}.  (6.55)

Note that often in the literature on convex analysis the notation for the polar cone is Ω°, while the dual cone Ω* is defined as the "positive polar cone" {x* ∈ X* | ⟨x*, x⟩ ≥ 0 for all x ∈ Ω}. However, in this book we follow the tradition of modern variational analysis and do not distinguish between the polar and dual cones for convex sets, using notation (6.55) in what follows. This also allows us to avoid confusion with the polar set Ω° in (6.48).
Applying the polar operation to Ω* and using the strict separation of convex sets give us the following bipolarity relationship. Recall that the symbol "cl Ω" stands as usual for the closure of the set Ω.

Proposition 6.81 For any nonempty convex cone Ω ⊂ R^n we have

(Ω*)* = Ω** = cl Ω.  (6.56)

Proof. It is obvious from (6.55) that Ω ⊂ Ω**, and so cl Ω ⊂ Ω** by the closedness of the set Ω**. To prove the opposite inclusion in (6.56), pick any q ∈ Ω** and suppose on the contrary that q ∉ cl Ω. Then the strict separation result from Proposition 2.89 gives us x* ≠ 0 such that

⟨x*, x⟩ ≤ 0 for all x ∈ Ω and ⟨x*, q⟩ > 0.

This tells us that x* ∈ Ω* and hence contradicts the fact that q ∈ Ω**. □

Now we define the tangent cone notion for convex sets by using the conic
hull construction from (6.51).

Definition 6.82 Let Ω be a nonempty convex subset of R^n. The tangent cone to Ω at x̄ ∈ Ω is given by

T(x̄; Ω) := cl K_{Ω−x̄} = cl R₊(Ω − x̄).  (6.57)

Fig. 6.5. Duality between tangent and normal cones

The next theorem establishes the full duality correspondence between the
tangent and normal cones for convex sets; see Figure 6.5.
Theorem 6.83 Let Ω ⊂ R^n be a convex set with x̄ ∈ Ω. Then

N(x̄; Ω) = T(x̄; Ω)* and T(x̄; Ω) = N(x̄; Ω)*.  (6.58)

Proof. First we check that N(x̄; Ω) ⊂ T(x̄; Ω)*. Indeed, pick x* ∈ N(x̄; Ω) and get by definition that ⟨x*, x − x̄⟩ ≤ 0 for all x ∈ Ω. This yields

⟨x*, t(x − x̄)⟩ ≤ 0 whenever t ≥ 0 and x ∈ Ω.

For any w ∈ T(x̄; Ω) we find by definition sequences of t_k ≥ 0 and x_k ∈ Ω with t_k(x_k − x̄) → w as k → ∞. This tells us that

⟨x*, w⟩ = lim_{k→∞} ⟨x*, t_k(x_k − x̄)⟩ ≤ 0, and so x* ∈ T(x̄; Ω)*.

To check now the opposite inclusion in (6.58), pick x* ∈ T(x̄; Ω)* and get from the polarity that ⟨x*, w⟩ ≤ 0 for any w ∈ T(x̄; Ω) and that x − x̄ ∈ T(x̄; Ω) for any x ∈ Ω due to (6.57). Thus we arrive at

⟨x*, x − x̄⟩ ≤ 0 whenever x ∈ Ω,

which means that x* ∈ N(x̄; Ω) and hence verifies the first equality in (6.58). The second equality in (6.58) follows directly from the first one by applying the bipolarity relationship

N(x̄; Ω)* = T(x̄; Ω)** = cl T(x̄; Ω),

which is valid due to Proposition 6.81 by taking into account that the tangent cone T(x̄; Ω) is a closed set by Definition 6.82. □

6.7.2 Tangents and Normals for Polyhedral Sets

The next theorem establishes effective descriptions of the tangent and normal
cones for convex polyhedral sets defined by linear inequalities. The usage of
Farkas’ lemma from Theorem 6.77 is crucial for deriving this result.

Theorem 6.84 Let Ω ⊂ R^n be a (convex) polyhedral set given by

Ω := {x ∈ R^n | ⟨x*_i, x⟩ ≤ b_i for all i = 1, . . . , m},

where the vectors x*_i ∈ R^n and numbers b_i ∈ R are fixed for all i = 1, . . . , m. Take x̄ ∈ Ω and consider the active index set I(x̄) := {i = 1, . . . , m | ⟨x*_i, x̄⟩ = b_i} at x̄. Then we have the representations

T(x̄; Ω) = {v ∈ R^n | ⟨x*_i, v⟩ ≤ 0 for all i ∈ I(x̄)},  (6.59)

N(x̄; Ω) = cone{x*_i | i ∈ I(x̄)},  (6.60)

where the convention cone(∅) := {0} is used.

Proof. First we verify the tangent cone representation (6.59). Consider the closed and convex cone

K := {v ∈ R^n | ⟨x*_i, v⟩ ≤ 0 for all i ∈ I(x̄)}

and observe that whenever x ∈ Ω and i ∈ I(x̄), we get

⟨x*_i, t(x − x̄)⟩ = t(⟨x*_i, x⟩ − ⟨x*_i, x̄⟩) ≤ t(b_i − b_i) = 0 as t ≥ 0,

which tells us that cone(Ω − x̄) ⊂ K. Invoking the closedness of K yields

T(x̄; Ω) = cl R₊(Ω − x̄) ⊂ K,

which justifies the inclusion "⊂" in (6.59). To check the opposite inclusion therein, take any v ∈ K and deduce from ⟨x*_i, x̄⟩ < b_i as i ∉ I(x̄) that

⟨x*_i, x̄ + tv⟩ ≤ b_i for any i ∉ I(x̄) and small t > 0.

Combining this with ⟨x*_i, v⟩ ≤ 0 as i ∈ I(x̄) implies that ⟨x*_i, x̄ + tv⟩ ≤ b_i for all i = 1, . . . , m and that

x̄ + tv ∈ Ω, i.e., v ∈ (1/t)(Ω − x̄) ⊂ T(x̄; Ω),

which completes the verification of the tangent cone representation (6.59).

To continue now with the proof of the normal cone representation (6.60), deduce from the first equality in (6.58) the equivalence

x* ∈ N(x̄; Ω) if and only if ⟨x*, v⟩ ≤ 0 for all v ∈ T(x̄; Ω),

which implies in turn that x* ∈ N(x̄; Ω) if and only if

[⟨x*_i, v⟩ ≤ 0 for all i ∈ I(x̄)] ⟹ ⟨x*, v⟩ ≤ 0

for all v ∈ R^n. Then we are in a position to employ the Farkas lemma from Theorem 6.77 and conclude that x* ∈ N(x̄; Ω) if and only if x* ∈ cone{x*_i | i ∈ I(x̄)}. This verifies (6.60) and completes the proof of the theorem. □
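The representations (6.59) and (6.60) are easy to evaluate on concrete data. In the sketch below (a hypothetical example chosen here), Ω = {x ∈ R^2 : x_1 ≤ 1, x_2 ≤ 1} and x̄ = (1, 1), so both constraints are active; the theorem then gives T(x̄; Ω) = {v : v_1 ≤ 0, v_2 ≤ 0} and N(x̄; Ω) = cone{(1, 0), (0, 1)}, the nonnegative quadrant.

```python
# Theorem 6.84 evaluated for Omega = {x in R^2 : x1 <= 1, x2 <= 1}
# at xbar = (1, 1), where both constraints are active.

XSTAR = [(1.0, 0.0), (0.0, 1.0)]            # constraint vectors x*_i
B = [1.0, 1.0]                              # right-hand sides b_i
XBAR = (1.0, 1.0)

def active_indices(xbar, tol=1e-12):
    """Active index set I(xbar) of Theorem 6.84."""
    return [i for i, (a, b) in enumerate(zip(XSTAR, B))
            if abs(sum(aj * xj for aj, xj in zip(a, xbar)) - b) <= tol]

def in_tangent_cone(v):
    """Membership test for representation (6.59)."""
    return all(sum(aj * vj for aj, vj in zip(XSTAR[i], v)) <= 0.0
               for i in active_indices(XBAR))

def in_normal_cone(w):
    """Membership in cone{(1,0), (0,1)}, which for this data is the
    nonnegative quadrant given by representation (6.60)."""
    return all(c >= 0.0 for c in w)
```

For instance, v = (−1, −2) is tangent while v = (0.5, −1) is not, and w = (2, 0.5) is normal while w = (−0.1, 1) is not.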

As a consequence of Theorem 6.84, we finally calculate the tangent and normal cones for sets described by both inequality and equality constraints.

Corollary 6.85 Given x*_i ∈ R^n, b_i ∈ R for i = 1, . . . , m and u*_j ∈ R^n, c_j ∈ R for j = 1, . . . , r, consider the convex polyhedron Ω defined by

Ω := {x ∈ R^n | ⟨x*_i, x⟩ ≤ b_i, i = 1, . . . , m, and ⟨u*_j, x⟩ = c_j, j = 1, . . . , r}.

Take x̄ ∈ Ω with the active inequality constraint index set I(x̄) defined in Theorem 6.84. Then we have the tangent and normal cone representations

T(x̄; Ω) = {v ∈ R^n | ⟨x*_i, v⟩ ≤ 0, i ∈ I(x̄), and ⟨u*_j, v⟩ = 0, j = 1, . . . , r},

N(x̄; Ω) = cone{x*_i | i ∈ I(x̄)} + span{u*_j | j = 1, . . . , r}.

Proof. The set Ω is obviously represented via only the inequality constraints

Ω = {x ∈ R^n | ⟨x*_i, x⟩ ≤ b_i for i = 1, . . . , m, and ⟨u*_j, x⟩ ≤ c_j, ⟨−u*_j, x⟩ ≤ −c_j for j = 1, . . . , r}.

Then both the tangent and normal cone representations follow from the corresponding results of Theorem 6.84. □

6.8 Exercises for Chapter 6

Exercise 6.86 Consider a normed space X. Let f : X → R be a σ-strongly convex
function, let g : X → R be a convex function, and let A ∈ L(X, X) be invertible.
Prove that each of the following functions is strongly convex and find constants of
their strong convexity:
(a) f + g.
(b) max{f, g}.
(c) γf for γ > 0.
(d) f ◦ A.

Exercise 6.87 Let X be a normed space, and let f_α : X → R for α ∈ I be a family of γ-strongly convex functions. Prove that the supremum function sup_{α∈I} f_α is also γ-strongly convex on X.

Exercise 6.88 Explore the possibility of extending the equivalence in Corollary 6.9
to the case where X is a normed space.

Exercise 6.89 Let f : X → R be a proper, l.s.c., and convex function on a Banach space X, and let σ be a given positive constant.
(a) Assuming that X is Hilbert, prove that f is σ-strongly convex if and only if the subgradient mapping ∂f : X ⇉ X* is σ-strongly monotone. Hint: Use the equivalence of the σ-strong convexity of f to the convexity of the shifted function from Proposition 6.2, and then apply to the shifted function the equivalence between the convexity of a function and the monotonicity of its subgradient mapping from Theorem 5.35 and Remark 5.36(a).
(b) Give a direct proof of the equivalence between the σ-strong convexity of f and the σ-strong monotonicity of the subgradient mapping ∂f.

Exercise 6.90 Let X be a topological vector space. Recall that a set-valued mapping F : X ⇉ X* is strictly monotone if

⟨x* − u*, x − u⟩ > 0 whenever x, u ∈ dom(F), x* ∈ F(x), u* ∈ F(u), and x ≠ u.

A function f : X → R is almost strictly convex if it is strictly convex along every line segment included in dom(f).
(a) Prove that in the framework of Remark 5.36(a) we have that f is almost strictly convex if and only if ∂f is strictly monotone. Hint: In finite dimensions and Hilbert spaces, proceed similarly to the proof of [317, Theorem 12.17].
(b) Verify that if a set-valued mapping F : R^n ⇉ R^n is σ-strongly monotone with some constant σ > 0, then it is strictly monotone and there exists a unique point x̄ such that 0 ∈ F(x̄). Hint: Observe that the σ-strong monotonicity of F is equivalent to the monotonicity of the shifted mapping F − σI, and deal with the latter as in the proof of [317, Proposition 12.54].
(c) Reveal appropriate infinite-dimensional settings in which (b) holds.

Exercise 6.91 Prove that the function ϕ : R → R defined by

ϕ(y) := 1 − √(1 − y²) if |y| ≤ 1, and ϕ(y) := ∞ otherwise,

is strongly convex on R.

Exercise 6.92 Let Ω be a nonempty convex subset of a Hilbert space H. Find smooth approximations of the support function f(x) := σ_Ω(x) for x ∈ H using Nesterov's smoothing and the prox-function p(y) := ‖y‖²/2 on H.

Exercise 6.93 Verify all the properties listed in Proposition 6.94.

Exercise 6.94 Show that the function p : R^n → R defined in Proposition 6.23 is nonnegative on R^n.

Exercise 6.95 Prove the equivalence between Definition 6.25 of the horizon cone
and its representation (6.13).
Exercise 6.96 Let Ω ⊂ X be a nonempty convex subset of a vector space X. Verify
that the horizon cone Ω∞ is also convex.
Exercise 6.97 Let Ω^j ⊂ X for j ∈ J be nonempty subsets of a vector space X, and let J be an arbitrary index set. Prove that

( ∩_{j∈J} Ω^j )_∞ ⊂ ∩_{j∈J} Ω^j_∞ and ( ∪_{j∈J} Ω^j )_∞ ⊃ ∪_{j∈J} Ω^j_∞,

where the first inclusion holds as an equality if the sets Ω^j have a common point, while the second inclusion becomes an equality when the index set J is finite. Hint: Apply Definition 6.25 in both cases of intersections and unions.
Exercise 6.98 Let Ω^j ⊂ X, j = 1, . . . , s, be nonempty subsets of a vector space X. Do the following:
(a) Check by definition that

(Ω^1 × · · · × Ω^s)_∞ = Ω^1_∞ × · · · × Ω^s_∞.

(b) Find conditions ensuring that

(Ω^1 + · · · + Ω^s)_∞ = Ω^1_∞ + · · · + Ω^s_∞.

Hint: Proceed by definition for s = 2 and then use induction.

Exercise 6.99 Let Ω ⊂ R^n be a nonempty closed set, and let A : R^n → R^s be a linear operator. Check the following:
(a) We always have the inclusion

A(Ω_∞) ⊂ (AΩ)_∞,  (6.61)

which holds as an equality if A^{-1}(0) ∩ Ω_∞ = {0}.
(b) Prove that the assumption in (a) ensures that the image set AΩ is closed.
(c) Give an example showing that the equality in (6.61) fails if the assumption in (a) is not satisfied.
(d) Clarify whether the assertions in (a) and (b) hold in infinite dimensions.
Hint: Similarly to the proof of Theorem 6.29, proceed by using Definition 6.25 without employing the cosmic convergence, as in the proof of [317, Theorem 3.10] based on an alternative definition of the horizon cone for nonconvex sets in finite dimensions.
Exercise 6.100 Given a_i ∈ R^n and b_i > 0 for i = 1, . . . , m, consider the set

Ω := {x ∈ R^n | ⟨a_i, x⟩ ≤ b_i for all i = 1, . . . , m}.

Prove that the horizon cone of Ω can be represented by

Ω_∞ = {x ∈ R^n | ⟨a_i, x⟩ ≤ 0 for all i = 1, . . . , m}.

Exercise 6.101 Find the horizon cones of the following sets:
(a) Ω := {(x, y) ∈ R² | y ≥ x²}.
(b) Ω := {(x, y) ∈ R² | y ≥ |x|}.
(c) Ω := {(x, y) ∈ R² | x > 0, y ≥ 1/x}.
(d) Ω := {(x, y) ∈ R² | x > 0, y ≥ √(x² + 1)}.

Exercise 6.102 Calculate the perspective and horizon functions of elementary func-
tions: (a) polynomials; (b) exponential functions; (c) logarithmic functions.

Exercise 6.103 Let f : X → R be a proper, l.s.c., and convex function on a topological vector space X. Do the following:
(a) If dim(X) < ∞, prove the inclusion

{x ∈ X | f(x) ≤ α}_∞ ⊂ {x ∈ X | f_∞(x) ≤ 0}  (6.62)

and show that it holds as an equality whenever α ∈ R is such that {x ∈ X | f(x) ≤ α} ≠ ∅. Hint: Proceed by using the definitions and then compare with the proof given in [317, Proposition 3.23].
(b) Clarify whether the results of (a) are valid when dim(X) = ∞.

Exercise 6.104 Let X be a topological vector space. Consider the constraint set

Θ := { x ∈ Ω | f j (x) ≤ 0 for j ∈ J },

where Ω ⊂ X is a convex set, and where f j : X → R is a convex function for each j
from an arbitrary index set J.
(a) If dim(X) < ∞, prove the inclusion

Θ∞ ⊂ { x ∈ Ω∞ | (f j )∞ (x) ≤ 0 for j ∈ J }

and show that it holds as an equality provided that Θ ≠ ∅ and that the set Ω
is closed. Hint: Combine the results from Exercises 6.97 and 6.103(a).
(b) Clarify whether the results of (a) are valid when dim(X) = ∞.

Exercise 6.105 Provide detailed calculations in the three examples given at the
beginning of Section 6.4.

Exercise 6.106 Let Ω be a nontrivial convex subset of a normed space X. Show
that x ∈ bd(Ω) if and only if d(x; Ω) = d(x; Ω c ) = 0, and that this is equivalent to
the condition d̂(x; Ω) = 0 in terms of the signed distance function d̂ of Section 6.4.

Exercise 6.107 Prove representation (6.22) of the minimal time function, where Ω
is a nonempty and closed subset of a normed space X.

Exercise 6.108 Prove that the conclusions of Theorem 6.50 and Proposition 6.56
remain valid when all the assumptions of these results are kept except the boundedness
of the set F , which is replaced by 0 ∈ int(F ). Hint:
Proceed similarly to the proofs of the aforementioned results.

Exercise 6.109 Consider the results presented in Corollaries 6.51 and 6.52.

(a) For the case where X = Rn , construct examples showing that each of the
imposed assumptions is essential for the corresponding conclusions.
(b) In the framework of Corollary 6.52, show that for any Ω ⊂ X a Lipschitz
constant of the minimal time function TΩF on X is given by the norm (6.49)
of the polar set F ◦ from (6.55). Hint: Deduce this from the proof of
Corollary 6.52.
(c) For the case where X = Rn , compare the Lipschitz constant from (b) with the
exact Lipschitzian modulus calculated in [317, Theorem 9.13]. Hint: Use the
subdifferential calculation for (6.21).

Exercise 6.110 Consider the minimal time function in the setting of Theorem 6.64.
(a) Prove that a subset Ω of a semireflexive space X is weakly compact in X under
the assumptions imposed on Ω in the theorem. Hint: Compare with the proof
of [218, Proposition 23.18].
(b) Finish the proof of Theorem 6.64 by using arguments similar to the proof of
Theorem 6.61. Hint: Compare with the proof of [235, Theorem 4.3].

Exercise 6.111 Give detailed proofs of assertions (a) and (b) in Proposition 6.66
and of Corollary 6.67.

Exercise 6.112 Let Ω be a nonempty subset of a topological vector space X, and
let v be a nonzero vector from X. The function

TΩv (x) := inf { t ≥ 0 | x + tv ∈ Ω },  x ∈ X,    (6.63)

is called the directional minimal time function of the target set Ω in the direction
v. Verify the following assertions:
(a) TΩv is convex if and only if Ω is convex.
(b) Assume that the set Ω is closed and strictly convex, i.e., for any x1 , x2 ∈ Ω
with x1 ≠ x2 and any α ∈ (0, 1) we have that αx1 + (1 − α)x2 ∈ int(Ω). Show that
if [a, b] ⊂ dom(TΩv ) \ Ω and b − a ∉ span{v}, then the strict convexity of the set
Ω yields the strict convexity of the function TΩv in the sense of Definition 2.111.
Hint: Proceed by using the definitions and then compare with the proofs given in
[273, Propositions 2.6 and 2.7].

Exercise 6.113 Let TΩv be a directional minimal time function defined on a normed
space X. Verify the following assertions:
(a) If v ∈ int(Ω∞ ), then the function TΩv is Lipschitz continuous on X and its
Lipschitz constant is calculated by

ℓ = inf { 1/r | r > 0, B(v; r) ⊂ Ω∞ }.

Hint: Use the definitions and compare with the proof of [273, Proposition 4.1].
(b) The function TΩv is finite-valued and Lipschitz continuous on X if and only if
v ∈ int(Ω∞ ). Hint: Use (a) and compare with the proof of [273, Proposition 4.2].

Exercise 6.114 Let Ω be a nonempty compact subset of Rn . Prove that the set
co(Ω) is compact as well. Hint: Use Carathéodory’s theorem.
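Since Carathéodory's theorem is finite-dimensional and constructive, it can also be probed computationally. The sketch below (our illustration; all names are ours) reduces a point of co(Ω) in R² to a convex combination of at most n + 1 = 3 points of Ω by brute force over triples:

```python
from itertools import combinations

def barycentric(tri, p):
    # Solve l1*a + l2*b + l3*c = p with l1 + l2 + l3 = 1 by Cramer's rule in R^2.
    (ax, ay), (bx, by), (cx, cy) = tri
    px, py = p
    det = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if abs(det) < 1e-12:
        return None  # the three points are affinely dependent
    l2 = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / det
    l3 = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / det
    return (1.0 - l2 - l3, l2, l3)

def caratheodory_2d(points, p):
    # Return at most three of the given points of R^2 together with nonnegative
    # convex weights reproducing p, as Caratheodory's theorem guarantees.
    for tri in combinations(points, 3):
        lam = barycentric(tri, p)
        if lam is not None and all(l >= -1e-9 for l in lam):
            return tri, lam
    return None

# The center of the unit square is a convex combination of all four vertices
# (weights 1/4 each); Caratheodory reduces it to three of them.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tri, lam = caratheodory_2d(square, (0.5, 0.5))
```

The brute-force search over triples is only meant to mirror the statement of the theorem, not its proof.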
440 6 MISCELLANEOUS TOPICS ON CONVEXITY

Exercise 6.115 Let Ω1 , . . . , Ωn+1 ⊂ Rn , and let x ∈ co(Ωi ) for each i = 1, . . . , n + 1.
Prove that there are vectors xi ∈ Ωi , i = 1, . . . , n + 1, such that x ∈ co({x1 , . . . , xn+1 }).
Hint: Use Carathéodory’s theorem and compare with [31, Theorem 2.1].

Exercise 6.116 Let A ∈ Rm×n and b ∈ Rm .
(a) Employ Farkas’ lemma from Theorem 6.77 to show that the linear system

Ax = b,  x ≥ 0,

where the vector inequality is understood componentwise, has a feasible solution
x ∈ Rn if and only if the adjoint system

A∗ y ≤ 0,  ⟨b, y⟩ > 0

has no feasible solution y ∈ Rm .
(b) Prove that the analytic version of Farkas’ lemma from (a) is equivalent to the
geometric version of Theorem 6.77.
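A minimal numerical illustration of the alternative in (a) (our own example, not part of the exercise):

```latex
% m = 1, n = 2:
A = \begin{pmatrix} 1 & 1 \end{pmatrix}, \qquad b = -1.
% Primal system: x_1 + x_2 = -1,\ x \ge 0 is infeasible, since x_1 + x_2 \ge 0
% whenever x \ge 0. Adjoint system:
A^* y = \begin{pmatrix} y \\ y \end{pmatrix} \le 0, \qquad \langle b, y \rangle = -y > 0,
% which is solved by y = -1. Exactly one of the two systems is feasible, as stated.
```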

Exercise 6.117 Let A ∈ Rm×n and b ∈ Rm . Prove that one and only one of the
following assertions is true:
(a) There exists x ∈ Rn such that Ax = b.
(b) There exists y ∈ Rm such that A∗ y = 0 and ⟨b, y⟩ ≠ 0.

Exercise 6.118 Deduce Carathéodory’s theorem from the proof of Radon’s theo-
rem.

Exercise 6.119 Use Carathéodory’s theorem to prove Helly’s theorem.

Exercise 6.120 Let Ω1 and Ω2 be convex sets in a topological vector space X with
int(Ω1 ) ∩ Ω2 ≠ ∅. Show that

T (x; Ω1 ∩ Ω2 ) = T (x; Ω1 ) ∩ T (x; Ω2 ) for any x ∈ Ω1 ∩ Ω2 .

Hint: Proceed by the definitions.

Exercise 6.121 Let A be a p × n matrix, let

Ω := { x ∈ Rn | Ax = 0, x ≥ 0 },

and let x = (x1 , . . . , xn ) ∈ Ω. Verify the tangent and normal cone formulas:

T (x; Ω) = { u = (u1 , . . . , un ) ∈ Rn | Au = 0, ui ≥ 0 for any i with xi = 0 },
N (x; Ω) = { v ∈ Rn | v = A∗ y − z, y ∈ Rp , z ∈ Rn , zi ≥ 0 for all i, ⟨z, x⟩ = 0 }.

Hint: Use the results of Corollary 6.85.
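For a quick sanity check of these formulas (our illustration), take p = 1 and n = 2:

```latex
% A = (1 \;\; -1), so \Omega = \{x \in \mathbb{R}^2 \mid x_1 = x_2 \ge 0\},
% and take \bar{x} = (0, 0), where both coordinates are active. Then
T(\bar{x}; \Omega) = \{u \mid u_1 = u_2,\ u \ge 0\},
\qquad
N(\bar{x}; \Omega) = \{A^* y - z \mid y \in \mathbb{R},\ z \ge 0\}
                   = \{v \mid v_1 + v_2 \le 0\},
% which agrees with N(\bar{x}; \Omega) being the polar cone of T(\bar{x}; \Omega).
```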

Exercise 6.122 Give an example of a closed and convex set Ω ⊂ R2 with a point
x ∈ Ω such that the cone cone(Ω − x) is not closed.

6.9 Commentaries to Chapter 6


The notion of strongly convex functions, defined and studied in Section 6.1, was
introduced in 1966 by Boris Polyak [291], who was motivated by applications to the
convergence of numerical algorithms in optimization and optimal control. The uni-
formly convex extension of this notion was defined by Polyak and his student Evgenii
Levitin in [202] with further applications to numerical optimization; see also Polyak’s
book [292]. The class of uniformly convex (and hence strongly convex) functions was
then deeply investigated by Vladimirov et al. [344] and by Zălinescu [359]. Although
various notions of uniform smoothness for functions have been used in analysis, those
studied in Section 6.1 (being restricted to strong smoothness) were independently
introduced by Dominique Azé and Jean-Paul Penot [19] in the convex setting and
by Naoki Shioji [323] in full generality.
The above publications contain various characterizations of uniformly convex
and uniformly smooth functions with relationships between them established in [19]
and [323]. We particularly refer the reader to the book by Constantin Zălinescu
[361], which we mainly follow in our exposition of the material in Section 6.1, while
concentrating there on strongly convex and strongly smooth functions.
In Section 6.2, we present some basic elements of the smoothing technique sug-
gested in 1983 and strongly developed by Yurii Nesterov [277] (also a student of
Boris Polyak) in finite-dimensional spaces; see [278, 279] and the references therein.
By now Nesterov’s smoothing techniques play a highly important role in numerical
optimization. Infinite-dimensional results of Section 6.2 are taken from the paper by
Nam et al. [269]. We plan to consider various algorithmic applications of Nesterov’s
smoothing techniques in the second volume of this book.
Behavior of convex sets and functions at infinity, which is studied in Section 6.3,
has always been of great interest in convex analysis and applications. For the case
of convex sets, the unboundedness was first investigated by Steinitz [329] who intro-
duced the notion of horizon cone (known also as the asymptotic cone and the reces-
sion cone) and obtained, in finite dimensions, most of the basic facts presented in
Subsection 6.3.1; see also Rockafellar’s book [306].
Since any extended-real-valued function is identified with its (unbounded) epigraph,
horizon cones of convex epigraphs naturally define the class of horizon/recession
functions and generate their properties presented in Subsection 6.3.2;
cf. the finite-dimensional book [306] for further discussions and references.
Perspective functions defined and studied in that subsection are different from
their horizon counterparts, while they also capture the behavior of a given function at
infinity. Such functions frequently appear in various areas of applied mathematics,
statistics, etc. Among the earliest publications, we mention Fisher’s paper [133]
on statistical data analysis. Although the name “perspective function”, coined by
Claude Lemaréchal, appeared in his book with Hiriart-Urruty [164], it seems that
such functions were first investigated by Rockafellar [306]. We refer the reader to
the recent survey by Combettes [81] for basic properties and numerous applications
of perspective functions in Hilbert spaces. The reader can also consult the book
by Auslender and Teboulle [17] for major developments on asymptotic properties
of functions and sets at infinity with various applications to optimization, stability,
variational inequalities, etc. Understanding the asymptotic behavior of functions at
infinity plays a significant role in economic analysis related to turnpike properties
of economic processes, which were first investigated by Paul Samuelson who coined

the turnpike terminology in his fundamental monograph [320]. Turnpike properties


are instrumental in the study and applications of control systems on the infinite
horizon; see, e.g., the book by Zaslavski [364] and the references therein.
To the best of our knowledge, the signed/oriented distance function considered in
Section 6.4 was first defined in 1979 by Jean-Baptiste Hiriart-Urruty [160], and since
then it has been largely investigated and applied in many publications. The reader
is referred to the recent paper by Luo et al. [212], which contains new results on
various properties of signed distance functions in finite-dimensional spaces together
with a nice survey of previous developments and broad applications. We continue the
study of convex signed distance functions in Chapter 7 with deriving subdifferential
formulas for them by using tools of nonconvex generalized differentiation.
Section 6.5 addresses two different classes of the minimal time functions, which
are far-going extensions of the distance functions and signed distance functions,
respectively. The first class, known as minimal time functions with constant dynam-
ics, was identified and named by Giovanni Colombo and Peter Wolenski in [80]
who were motivated by the work of Martino Bardi [32] investigating minimal time
problems for Hamilton-Jacobi partial differential equations via viscosity methods.
In [79, 80] and the subsequent publications [78, 155, 234, 235, 237], the reader can
find various properties of minimal time functions with constant dynamics and their
convex and nonconvex subdifferentiation in finite-dimensional, Hilbert, Asplund,
and Banach spaces. The results in topological vector spaces presented in Subsec-
tions 6.5.1 and 6.5.2 are taken from our very recent paper with Cuong and Wells
[92]. In this paper, we also introduced a new class of signed minimal time functions
studied in Subsection 6.5.3. Observe that, among the various results presented therein
for this new class of minimal time functions, the reader will not find any subgradient
formulas. Deriving such formulas is still an open question, except for the class of
convex signed distance functions, which forms a special subclass of signed minimal time ones.
Section 6.6 reviews four classical results of finite-dimensional convex geometry
that play significant roles in convex analysis and its applications to optimization
and other areas. One of the earliest results in this direction, Carathéodory’s theorem,
was first proved by Minkowski’s student Constantin Carathéodory (1873–1950) in
1907 [70] for compact sets; its full version appeared in Steinitz [329, II]. This result
provides a precise representation of convex hulls of sets in Rn , which restricts the
number of elements in convex combinations by the dimension-dependent number
n + 1, being thus very useful in numerous applications of finite-dimensional analysis.
The Farkas lemma was discovered by Gyula Farkas (1847–1930) in 1902 [129],
and since then it has been broadly used in various fields of optimization, inequality
systems, economics, quantum mechanics, etc. This area of research is still pretty
active; see, e.g., the survey paper by Dinh and Jeyakumar [108] for a variety of
generalizations, further applications, and references.
The original Helly theorem was discovered in 1913 by Eduard Helly (1884–1943)
who published this result only in 1923 [157]. Johann Radon (1887–1956), while
seeking a simple proof of Helly’s theorem, established in 1921 [299] his own important
result, Radon’s theorem, and elegantly derived from it the one obtained by Helly.
Despite the quite different formulations of Carathéodory's, Helly's, and Radon's theorems,
these three results are actually equivalent; see the proof in Martini et al. [215], which
uses subdifferential tools of convex analysis.
The material of Section 6.7 is classical in convex analysis; see Rockafellar’s book
[306] with historical remarks. Convex polyhedrality is also nicely reflected in [306],
which shows that polyhedral structures of sets and functions in finite dimensions
make it possible to significantly improve qualification conditions in calculus results,
duality relations, etc. Note that polyhedrality also plays a highly important role in
more general modes of variational analysis. We refer the reader to the very influential
paper by Stephen Robinson [301] and the book by Rockafellar and Wets [317].
7
CONVEXIFIED LIPSCHITZIAN ANALYSIS

This chapter deals with nonconvex nondifferentiable functions. However, the
machinery and results of convex analysis presented in the previous chapters
provide crucial tools for the study of generalized differentiation beyond the
convexity of functions. A conventional way to proceed is to develop a scheme that
first uses a suitable extension of the directional derivative and then defines the
corresponding directional subdifferential via a duality correspondence with such
an extended directional derivative. The subgradient mappings obtained in this
way are always convex-valued, but their most important properties are
exhibited when the directional derivatives are assumed (or happen) to be convex
with respect to directions. All of this constitutes the realm of the convexified
analysis developed in this chapter.
The main attention here is paid to the notions of the generalized direc-
tional derivative and the associated generalized gradient introduced by Fran-
cis Clarke (we use his terminology) together with the notions of the contin-
gent derivative/subderivative and the corresponding contingent subdifferential
that are traced back to Ulisse Dini. We confine ourselves in this chapter to
the study of locally Lipschitzian functions. The principal reason for this is
that the definitions and major properties of Clarke’s generalized differential
constructions are significantly based on the local Lipschitz continuity. The
Lipschitzian framework is also essential for the given definitions of the contin-
gent constructions, which are studied in parallel with the former ones. Non-
Lipschitzian extensions of the aforementioned notions and related material
are discussed in the commentaries to this chapter.
A crucial difference between Clarke’s generalized directional derivative and
the (contingent) subderivative is that the former is automatically convex in
directions while the latter is not. The convexity of the generalized directional
derivative is the major source of comprehensive calculus rules for generalized
gradients of Lipschitzian functions that are derived by reducing them to con-
vex analysis counterparts. This is not the case for contingent subgradients
without imposing the convexity of the corresponding subderivatives.

© Springer Nature Switzerland AG 2022
B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9_7

Besides the generalized directional derivatives, subderivatives, and the corresponding
directionally generated subdifferentials, we include in this chapter
some additional material on subdifferential constructions of other types
(namely, the regular and limiting ones) whenever they are needed for the
main material of the chapter. This particularly concerns deriving subdiffer-
ential formulas for the distance and signed distance functions, and also the
study of difference of convex (DC) functions.
Unless otherwise stated, our standing assumptions throughout the whole
chapter are that the functions under consideration are defined on normed
spaces and are locally Lipschitzian around the points in question. Since our
analysis involving the Lipschitz continuity is local, we suppose without loss of
generality that all the functions are finite on the entire space.

7.1 Generalized Directional Derivatives


This section is devoted to primal constructions of generalized directional
derivatives, which are used in the next section to define the corresponding
subdifferentials by a duality correspondence while playing an underlying role
in subdifferential theory for such directionally generated subdifferentials. We
split this section into two subsections. The first one contains basic definitions
of the generalized directional derivative, Dini contingent derivative (subderiva-
tive), and directional regularity with revealing some important relationships.
The second subsection derives significant properties of these constructions
with emphasis on the most remarkable ones for the generalized directional
derivative of locally Lipschitzian functions on normed spaces.

7.1.1 Definitions and Relationships

Recall that a function f : X → R defined on a normed space X is locally
Lipschitzian around x̄ ∈ X if there exist ℓ ≥ 0 and δ > 0 such that

|f (x) − f (u)| ≤ ℓ∥x − u∥ for all x, u ∈ B(x̄; δ).

We begin with the following definitions of basic notions.

Definition 7.1 Let f : X → R be a locally Lipschitzian function around some
point x̄ ∈ X, and let v ∈ X be a given direction. Then:
(a) The (Clarke) generalized directional derivative of f at x̄ in the
direction v is defined by

f ◦ (x̄; v) := lim sup_{x→x̄, t↓0} [f (x + tv) − f (x)]/t.    (7.1)

(b) The (Dini) contingent derivative or subderivative of f at x̄ in the
direction v is defined by

df (x̄; v) := lim inf_{t↓0} [f (x̄ + tv) − f (x̄)]/t.    (7.2)

(c) The function f is called directionally regular at x̄ if

f ◦ (x̄; v) = df (x̄; v) for all v ∈ X.    (7.3)

Observe that we always have df (x̄; v) ≤ f ◦ (x̄; v) and that the subderivative
(7.2) reduces to the classical (one-sided) directional derivative

f ′(x̄; v) = lim_{t↓0} [f (x̄ + tv) − f (x̄)]/t,  v ∈ X,    (7.4)

defined and studied in Subsection 4.3.1, provided that the limit in (7.4) exists.
In contrast to (7.4), the lower limit in (7.2) always exists as a real number
due to the local Lipschitz continuity of f around x̄. It is easy to deduce
from the definitions that the directional regularity (7.3) is equivalent to the
requirement that the classical directional derivative f ′(x̄; v) exists and agrees
with the generalized one f ◦ (x̄; v) for all v ∈ X.

Let us highlight the two major differences between the generalized directional
derivative f ◦ (x̄; v) and the constructions in (7.4) and (7.2):
(i) In contrast to (7.4) and (7.2), the initial point x̄ in (7.1) is not fixed but
is included in the limiting process.
(ii) In contrast to the full limit in (7.4), provided that it exists, and to the
lower limit in (7.2), the upper limit is used in (7.1).

The aforementioned observations are crucial for the fulfillment of many
useful properties of (7.1), which are not generally shared by (7.4) and (7.2).
Now we present some elementary properties of both the generalized directional
derivative (7.1) and the subderivative (7.2), which easily follow from the
definitions. The first proposition gives us an extended representation of
subderivatives that is useful not only for Lipschitzian functions; see Section 7.10.

Proposition 7.2 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x̄ ∈ X. Given a direction v ∈ X, the
contingent derivative (7.2) admits the equivalent representation

df (x̄; v) = lim inf_{w→v, t↓0} [f (x̄ + tw) − f (x̄)]/t.    (7.5)

Proof. It follows directly from definition (7.2) that

lim inf_{w→v, t↓0} [f (x̄ + tw) − f (x̄)]/t ≤ df (x̄; v).

Since f is locally Lipschitzian around x̄, there exist ℓ ≥ 0 and δ > 0 with

|f (x) − f (u)| ≤ ℓ∥x − u∥ whenever x, u ∈ B(x̄; δ).

Then we have the upper estimate

[f (x̄ + tv) − f (x̄)]/t ≤ ℓ∥w − v∥ + [f (x̄ + tw) − f (x̄)]/t

for all w ∈ X and all t > 0 sufficiently small. Taking there the limit inferior
as t ↓ 0 and w → v brings us to

df (x̄; v) ≤ lim inf_{w→v, t↓0} [f (x̄ + tw) − f (x̄)]/t,

which completes the proof of the proposition. □

The next result provides an equivalent description of directional regularity.

Proposition 7.3 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x̄ ∈ X. Then f is directionally regular
at x̄ if and only if the classical directional derivative f ′(x̄; v) from (4.27)
exists and the following equality holds:

f ′(x̄; v) = f ◦ (x̄; v) for all v ∈ X.    (7.6)

Proof. Suppose that f is directionally regular at x̄. Then for any v ∈ X we
have df (x̄; v) = f ◦ (x̄; v). Therefore, this gives us the relationships

lim sup_{t↓0} [f (x̄ + tv) − f (x̄)]/t ≤ f ◦ (x̄; v) = df (x̄; v) = lim inf_{t↓0} [f (x̄ + tv) − f (x̄)]/t,

which ensure the existence of the classical directional derivative f ′(x̄; v) and
the claimed equality in (7.6).
The opposite implication follows from the obvious fact that if f ′(x̄; v)
exists, then f ′(x̄; v) = df (x̄; v). □

By Proposition 7.3, it is not surprising to see that any convex locally
Lipschitzian function is directionally regular at the reference point.

Proposition 7.4 Let X be a normed space, and let f : X → R be a convex
function that is locally Lipschitzian around some point x̄. Then f is directionally
regular at this point.

Proof. For any v ∈ X, the classical directional derivative f ′(x̄; v) exists as a
real number, since this is the case for every convex and locally Lipschitzian
function. It immediately follows from the definitions that

f ′(x̄; v) = lim_{t↓0} [f (x̄ + tv) − f (x̄)]/t ≤ lim sup_{x→x̄, t↓0} [f (x + tv) − f (x)]/t = f ◦ (x̄; v),

which gives us the inequality “≤” in (7.6). To verify the opposite inequality
therein, let ℓ ≥ 0 be a Lipschitz constant of f around x̄. Fixing v ∈ X and
α > 0, observe by Proposition 4.45 that, due to the convexity of f, the quotient
t ↦ [f (x + tv) − f (x)]/t is monotonically increasing on (0, ∞) for each fixed x.
Then we get the relationships

f ◦ (x̄; v) = lim sup_{γ↓0, ∥x−x̄∥<αγ, 0<t<γ} [f (x + tv) − f (x)]/t
           ≤ lim sup_{γ↓0, ∥x−x̄∥<αγ} [f (x + γv) − f (x)]/γ
           = lim sup_{γ↓0, ∥x−x̄∥<αγ} [f (x + γv) − f (x̄ + γv) + f (x̄ + γv) − f (x̄) + f (x̄) − f (x)]/γ
           ≤ lim sup_{γ↓0, ∥x−x̄∥<αγ} [ℓ∥x − x̄∥ + f (x̄ + γv) − f (x̄) + ℓ∥x̄ − x∥]/γ
           ≤ lim_{γ↓0} [2ℓα + (f (x̄ + γv) − f (x̄))/γ] = 2ℓα + f ′(x̄; v).

Since α > 0 is arbitrary, we arrive at f ◦ (x̄; v) ≤ f ′(x̄; v) and conclude the
proof by appealing to Proposition 7.3. □
The next proposition establishes representations of both extended directional
derivatives from Definition 7.1 in the case of differentiable functions.

Proposition 7.5 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x̄ ∈ X. Suppose that f is Gâteaux
differentiable at x̄ with the Gâteaux derivative f ′_G (x̄). Then for all v ∈ X we
have the representation

f ′(x̄; v) = df (x̄; v) = ⟨f ′_G (x̄), v⟩.    (7.7)

If furthermore f is a C¹-smooth function around x̄ with the (Fréchet) derivative
f ′_F (x̄) at x̄, then for all v ∈ X we have

f ◦ (x̄; v) = ⟨f ′_F (x̄), v⟩,    (7.8)

and thus f is directionally regular at x̄ in the latter case.

Proof. The representations in (7.7) easily follow from the definitions. To verify
(7.8), by (7.1) we choose a sequence {tk } of positive numbers converging to zero
and a sequence {xk } ⊂ X converging to x̄ such that

f ◦ (x̄; v) = lim_{k→∞} [f (xk + tk v) − f (xk )]/tk .

Then the classical mean value theorem tells us that

f (xk + tk v) − f (xk ) = ⟨f ′_F (uk ), tk v⟩

for some uk ∈ (xk , xk + tk v) for every k ∈ N. Thus we get

f ◦ (x̄; v) = lim_{k→∞} [f (xk + tk v) − f (xk )]/tk = lim_{k→∞} ⟨f ′_F (uk ), v⟩ = ⟨f ′_F (x̄), v⟩,

which justifies the claim of the proposition. □



Propositions 7.4 and 7.5 expectedly verify the directional regularity for the
classical classes of convex and smooth functions. More results on directional
regularity are given in Subsection 7.2.2 in the calculus framework.
Let us now consider two examples illustrating the calculation of the gener-
alized directional derivative (7.1) and the subderivative (7.2) in simple albeit
rather instructive settings of nonconvex functions.

Example 7.6 Consider the function f (x) := min{−x, 0} defined on R. Then
for all v ∈ R we are going to verify the formula

             ⎧ −v           if x > 0,
f ◦ (x; v) = ⎨ max{−v, 0}   if x = 0,        (7.9)
             ⎩ 0            if x < 0,

where only the case of x = 0 must be checked. To proceed with the verification
of formula (7.9) for x = 0, observe first that for any two real numbers a and
b we obviously have

min{a, 0} − min{b, 0} ≤ max{a − b, 0}.

It follows from definition (7.1) for the function f under consideration that

f ◦ (0; v) = lim sup_{x→0, t↓0} [f (x + tv) − f (x)]/t
           = lim sup_{x→0, t↓0} [min{−x − tv, 0} − min{−x, 0}]/t
           ≤ lim sup_{x→0, t↓0} max{−tv, 0}/t = max{−v, 0}.

If v > 0, take xk := −1/k and tk := 1/k² for k ∈ N. Then xk + tk v < 0 for
all large k ∈ N, so that both f (xk + tk v) and f (xk ) vanish, and we get

f ◦ (0; v) ≥ lim_{k→∞} [f (xk + tk v) − f (xk )]/tk = 0 = max{−v, 0}.

If v < 0, take xk := 1/k and tk := 1/k² for k ∈ N. Then xk + tk v > 0 for
all large k ∈ N, and we get

f ◦ (0; v) ≥ lim_{k→∞} [f (xk + tk v) − f (xk )]/tk = lim_{k→∞} (−1/k − v/k² + 1/k)/(1/k²) = −v = max{−v, 0}.

We arrive at f ◦ (0; v) = max{−v, 0} for all v ∈ R and thus verify (7.9).
It is easy to see that f is not differentiable at x = 0, while the contingent
derivative (7.2) is calculated by

            ⎧ −v           if x > 0,
df (x; v) = ⎨ min{−v, 0}   if x = 0,        (7.10)
            ⎩ 0            if x < 0.

Indeed, it suffices to consider the case where x = 0 in (7.10), for which

df (0; v) = lim inf_{t↓0} [f (0 + tv) − f (0)]/t = lim inf_{t↓0} min{−tv, 0}/t
          = lim inf_{t↓0} t min{−v, 0}/t = min{−v, 0}.
The second example is more involved. It shows, in particular, that representation
(7.8) does not hold if the function f is not C¹-smooth around x̄ but
is merely Fréchet differentiable at this point.

Example 7.7 Consider the function f : R → R given by

         ⎧ x² sin(1/x)  if x ≠ 0,
f (x) := ⎨
         ⎩ 0            if x = 0.

It is easy to see that this function is locally Lipschitzian around x̄ = 0,
(Fréchet) differentiable at x̄, but not C¹-smooth around this point. We start
by calculating the generalized directional derivative (7.1) at x̄.

Let us first show that f ◦ (0; v) = |v| for all v ∈ R. Choose sequences
{xk } ⊂ R and {tk } ⊂ (0, ∞) so that both converge to zero while satisfying

f ◦ (0; v) = lim_{k→∞} [f (xk + tk v) − f (xk )]/tk ,  v ∈ R.

To proceed, we employ the classical mean value theorem for each k ∈ N and
find ck in between xk and xk + tk v such that

f (xk + tk v) − f (xk ) = f ′(ck )(tk v),

which implies in turn the equality

[f (xk + tk v) − f (xk )]/tk = f ′(ck )v,

where we use the standard derivative notation f ′ for functions of real variables.
It is straightforward to check that f ′(0) = 0 and also that f ′(ck ) =
2ck sin(1/ck ) − cos(1/ck ) for ck ≠ 0. Thus we have

|f ′(ck )| ≤ |2ck sin(1/ck )| + |cos(1/ck )| ≤ |2ck sin(1/ck )| + 1.

By taking into account that lim_{k→∞} ck sin(1/ck ) = 0, this yields the inequality

f ◦ (0; v) ≤ |f ◦ (0; v)| ≤ |v|,  v ∈ R.

Consider now the case where v > 0. Let xk := 1/(π + 2kπ) for k ∈ N, and
let {tk } ⊂ (0, ∞) be a sequence that converges to zero. Then

[f (xk + tk v) − f (xk )]/tk = f ′(ck )v with 1/(π + 2kπ) < ck < 1/(π + 2kπ) + tk v.

This clearly implies the inequalities

1/(1/(π + 2kπ) + tk v) < 1/ck < π + 2kπ,

which can be equivalently rewritten as

π + 2kπ − (π + 2kπ)² tk v/(1 + (π + 2kπ) tk v) < 1/ck < π + 2kπ.

Choosing tk := 1/k³ ensures that cos(1/ck ) → −1 as k → ∞, and hence

[f (xk + tk v) − f (xk )]/tk = f ′(ck )v → v as k → ∞,

which implies that f ◦ (0; v) ≥ v = |v|. On the other hand, by the differentiability
of f at x̄ = 0 we clearly have

df (0; v) = f ′(0; v) = f ′(0) · v = 0 for all v > 0,

and so f ◦ (0; v) ≠ df (0; v). The case where v ≤ 0 is left as an exercise.
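A quick numerical probe (ours, not from the text) makes the contrast visible: sampling the difference quotient of f at base points x ≠ 0 produces values near 1 = f°(0; 1), while the quotient anchored at x = 0 itself is negligible, matching df(0; 1) = 0. The step and grid sizes below are arbitrary choices.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

# Sample the quotient (f(x + t) - f(x)) / t for the direction v = 1 with a tiny
# fixed step t over many base points x near (but not at) zero.  The supremum of
# these samples approaches f°(0; 1) = 1, while the quotient based at x = 0
# stays near df(0; 1) = 0, reflecting the failure of (7.8) for this function.
t = 1e-9
xs = [0.001 + 1e-6 * k for k in range(5000)]
best = max((f(x + t) - f(x)) / t for x in xs)
at_zero = (f(t) - f(0.0)) / t   # equals t * sin(1/t), of size at most t
```

The sampled supremum only approximates the true limsup, so a loose tolerance is appropriate when comparing it with the exact value 1.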

7.1.2 Properties of Extended Directional Derivatives

In this subsection, we present major properties of the extended directional
derivatives for locally Lipschitzian functions taken from Definition 7.1(a,b).
The most remarkable ones are obtained for the generalized directional derivative
(7.1) and follow readily from its definition.

The first proposition collects properties that are shared by both constructions
f ◦ (x̄; v) and df (x̄; v). We show that these extended directional
derivatives are well-defined as real numbers and are bounded by a constant
of local Lipschitz continuity of f .

Proposition 7.8 Let X be a normed space, let f : X → R be a locally
Lipschitzian function around some x̄ ∈ X, and let ℓ ≥ 0 be a Lipschitz
constant of f around x̄. Then both functions v → f ◦ (x̄; v) and
v → df (x̄; v) are finite, positively homogeneous, and satisfy the estimates

|f ◦ (x̄; v)| ≤ ℓ∥v∥ and |df (x̄; v)| ≤ ℓ∥v∥ for all v ∈ X.    (7.11)

Proof. The positive homogeneity of both functions f ◦ (x̄; ·) and df (x̄; ·) immediately
follows from the definitions. Furthermore, it follows from the local
Lipschitz continuity of f around x̄ that

|f (x̄ + tv) − f (x̄)|/t ≤ ℓ∥v∥,  v ∈ X,

for all small t > 0, which readily yields the second estimate in (7.11). Since ℓ
is a Lipschitz constant of f around x̄, there exists η > 0 with

|f (x) − f (u)| ≤ ℓ∥x − u∥ for all x, u ∈ B(x̄; η),

which clearly implies that

|f (x + tv) − f (x)|/t ≤ ℓ∥tv∥/t = ℓ∥v∥,  v ∈ X,

whenever x is near x̄ and t > 0 is near zero. Passing there to the limit superior
as t ↓ 0 and x → x̄, we arrive at the first estimate in (7.11). Note finally that
the estimates in (7.11) ensure that both functions f ◦ (x̄; ·) and df (x̄; ·) are
finite on the entire space X. □

The next proposition verifies an important stability property of the generalized
directional derivative (7.1) with respect to both of its variables.

Proposition 7.9 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some x̄ ∈ X with a Lipschitz constant ℓ ≥ 0.
Then the function (x, v) → f ◦ (x; v) is upper semicontinuous at (x̄, v) for
any v ∈ X.

Proof. Fix v ∈ X and take any sequences xk → x̄ and vk → v as k → ∞. By
the definition of the upper limit in (7.1), for each k ∈ N we find uk ∈ X and
tk > 0 satisfying the conditions ∥uk − xk ∥ ≤ k⁻¹, tk ≤ k⁻¹, and

f ◦ (xk ; vk ) − 1/k ≤ [f (uk + tk vk ) − f (uk )]/tk
  = [f (uk + tk v) − f (uk )]/tk + [f (uk + tk vk ) − f (uk + tk v)]/tk .

The assumed Lipschitz continuity of f around x̄ with constant ℓ ensures that,
for all k ∈ N sufficiently large, the last term above is bounded by ℓ∥vk − v∥.
Thus passing to the upper limit as k → ∞ and using definition (7.1) give us

lim sup_{k→∞} f ◦ (xk ; vk ) ≤ f ◦ (x̄; v),

which justifies the upper semicontinuity of f ◦ at (x̄, v) for any v ∈ X. □



Theorem 7.10 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x̄ ∈ X. Define the directional function

    ψ(v) := f◦(x̄; v) for all v ∈ X.

Then the function ψ is subadditive and Lipschitz continuous on X. In particular,
it is convex on X.

Proof. It follows from Proposition 7.8 that ψ is a real-valued function on the
whole space X while satisfying the estimate

    |ψ(v)| ≤ ℓ‖v‖ for all v ∈ X.

Take any directions v1, v2 ∈ X and deduce from definition (7.1) that

    ψ(v1 + v2) = f◦(x̄; v1 + v2)
        = lim sup_{x→x̄, t↓0} [f(x + t(v1 + v2)) − f(x)]/t
        = lim sup_{x→x̄, t↓0} [f(x + tv1 + tv2) − f(x + tv1) + f(x + tv1) − f(x)]/t
        ≤ lim sup_{x→x̄, t↓0} [f(x + tv1 + tv2) − f(x + tv1)]/t
            + lim sup_{x→x̄, t↓0} [f(x + tv1) − f(x)]/t
        ≤ lim sup_{u→x̄, t↓0} [f(u + tv2) − f(u)]/t + lim sup_{x→x̄, t↓0} [f(x + tv1) − f(x)]/t
        = f◦(x̄; v2) + f◦(x̄; v1) = ψ(v1) + ψ(v2).

This clearly shows that the function ψ is subadditive on X. We know from
Proposition 7.8 that ψ is positively homogeneous, and thus it is convex.
To complete the proof of the theorem, it remains to verify the Lipschitz
continuity of ψ on the whole space X. The local Lipschitz continuity of f
around x̄ tells us that, whenever x is near x̄ and t > 0 is near zero, we get

    f(x + tv1) − f(x + tv2) ≤ ℓt‖v1 − v2‖.

It follows therefore that for all such x and t, the estimate

    [f(x + tv1) − f(x)]/t ≤ [f(x + tv2) − f(x)]/t + ℓ‖v1 − v2‖

holds and readily yields ψ(v1) ≤ ψ(v2) + ℓ‖v1 − v2‖ by taking the upper
limit as x → x̄ and t ↓ 0. Interchanging the roles of v1 and v2 gives us
ψ(v2) ≤ ψ(v1) + ℓ‖v1 − v2‖, and hence

    |ψ(v1) − ψ(v2)| ≤ ℓ‖v1 − v2‖ for all v1, v2 ∈ X.

This justifies the Lipschitz continuity of ψ on X and ends the proof. □

We leave it as an exercise for the reader (see Section 7.9) to show that all
the properties for f◦(x̄; v), which are obtained in Proposition 7.9 and Theorem 7.10,
do not hold in general for the directional derivative f′(x̄; v) and
the subderivative df(x̄; v). Furthermore, even for (Lipschitzian) functions in
finite dimensions, the generalized directional derivative (7.1) may not agree
with the classical one (7.4) when the latter exists.
Yet another remarkable property of Clarke's generalized directional derivative
is its plus-minus symmetry. This drastically distinguishes (7.1) from (7.2)
and the classical one-sided directional derivative (7.4), both of which are
constructions of unilateral analysis in contrast to (7.1).

Proposition 7.11 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x̄ ∈ X. We always have the following
symmetry relationship for the generalized directional derivative:

    f◦(x̄; −v) = (−f)◦(x̄; v) for all v ∈ X.    (7.12)

Proof. Observe directly from definition (7.1) that

    (−f)◦(x̄; v) = lim sup_{x→x̄, t↓0} [(−f)(x + tv) − (−f)(x)]/t
        = lim sup_{x→x̄, t↓0} [f(x) − f(x + tv)]/t
        = lim sup_{x→x̄, t↓0} [f(x + tv + t(−v)) − f(x + tv)]/t = f◦(x̄; −v),

which verifies therefore the symmetry property (7.12). □
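The symmetry (7.12) can also be observed numerically. In the sketch below (our illustration; `clarke_dd` is the same crude grid approximation of the limit superior) we take the function f(x) = min{−x, 0} of Example 7.6 at x̄ = 0, for which f◦(0; −1) = (−f)◦(0; 1) = 1.

```python
# Illustration of the plus-minus symmetry (7.12): f°(x̄; -v) = (-f)°(x̄; v),
# checked for the nonsmooth function f(x) = min{-x, 0} at x̄ = 0.

def clarke_dd(f, x_bar, v, delta=1e-4, n=200):
    # grid approximation of limsup_{x→x̄, t↓0} [f(x + tv) − f(x)]/t
    best = -float("inf")
    for i in range(-n, n + 1):
        x = x_bar + delta * i / n
        for k in range(1, 20):
            t = delta / k
            best = max(best, (f(x + t * v) - f(x)) / t)
    return best

f = lambda x: min(-x, 0.0)
neg_f = lambda x: -f(x)

lhs = clarke_dd(f, 0.0, -1.0)        # f°(0; -1)
rhs = clarke_dd(neg_f, 0.0, 1.0)     # (-f)°(0; 1)
assert abs(lhs - rhs) < 1e-6         # both equal 1 for this f
```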

7.2 Generalized Derivative and Subderivative Calculus


This section is devoted to deriving calculus rules for both extended directional
derivatives of locally Lipschitzian functions on normed spaces: the generalized
directional derivative (7.1) and the subderivative (7.2). We start with calcu-
lus rules for subderivatives that are established in the first subsection. Besides
being important for their own sake, the results obtained therein are employed
in the second subsection to derive equality and regularity statements in calcu-
lus rules for generalized directional derivatives. The main results of the second
subsection provide inequality-type calculus rules for (7.1) in the desired form
without any regularity assumptions.

7.2.1 Calculus Rules for Subderivatives

We begin this subsection by observing the following obvious scalar multiplication
rule for the subderivatives:

    d(αf)(x̄; v) = α df(x̄; v) for all α ≥ 0 and v ∈ X.    (7.13)
Next we proceed by deriving an exact/equality-type chain rule for the sub-
derivatives of compositions that involve locally Lipschitzian outer functions
and Fréchet differentiable inner mappings between normed spaces.

Theorem 7.12 Consider the composition

    (ϕ ◦ g)(x) := ϕ(g(x)), x ∈ X,    (7.14)

where g : X → Y is a mapping between normed spaces that is locally Lipschitzian
around x̄ and Fréchet differentiable at this point with the Fréchet
derivative ∇g(x̄), and where ϕ : Y → R is locally Lipschitzian around the
point ȳ := g(x̄) with a Lipschitz constant ℓ ≥ 0. Then we have the subderivative
chain rule of the equality type

    d(ϕ ◦ g)(x̄; v) = dϕ(ȳ; ∇g(x̄)v) for all v ∈ X.    (7.15)

Proof. It is obvious that the composition ϕ ◦ g is locally Lipschitzian around x̄.
Picking now any v ∈ X, deduce from the Fréchet differentiability of g at x̄ that
∇g(x̄)v + o(t‖v‖)/t → ∇g(x̄)v as t ↓ 0. Based on this and the subderivative
definition (7.2), we get the relationships

    d(ϕ ◦ g)(x̄; v) = lim inf_{t↓0} [ϕ(g(x̄ + tv)) − ϕ(ȳ)]/t
        = lim inf_{t↓0} [ϕ(ȳ + t∇g(x̄)v + o(t‖v‖)) − ϕ(ȳ)]/t
        = lim inf_{t↓0} [ϕ(ȳ + t(∇g(x̄)v + o(t‖v‖)/t)) − ϕ(ȳ)]/t
        ≥ dϕ(ȳ; ∇g(x̄)v) whenever v ∈ X.

This verifies the inequality “≥” in (7.15).
Turning now to the proof of the opposite inequality in (7.15), take any
v ∈ X and find a sequence tk ↓ 0 such that

    dϕ(ȳ; ∇g(x̄)v) = lim_{k→∞} [ϕ(ȳ + tk ∇g(x̄)v) − ϕ(ȳ)]/tk.

This implies by the definitions that for all v ∈ X we have

    dϕ(ȳ; ∇g(x̄)v) = lim_{k→∞} [ϕ(ȳ + tk ∇g(x̄)v) − ϕ(ȳ)]/tk
        ≥ lim inf_{k→∞} [ϕ(g(x̄ + tk v)) − ϕ(ȳ)]/tk
            − ℓ lim_{k→∞} ‖[g(x̄ + tk v) − ȳ]/tk − ∇g(x̄)v‖
        ≥ d(ϕ ◦ g)(x̄; v) − ℓ lim_{k→∞} ‖∇g(x̄)v + o(tk)/tk − ∇g(x̄)v‖
        = d(ϕ ◦ g)(x̄; v),

which verifies the inequality “≤” in (7.15) and thus completes the proof. □
Our next goal is to obtain subderivative sum rules for locally Lipschitzian
functions. The main result in this direction requires an additional assumption
on the functions in question. We introduce the needed property of functions
geometrically via the convergence of sets, which is important for its own sake.
Given a parameterized family (Ωt)t>0 of nonempty subsets of a normed
space X, define the (Painlevé–Kuratowski) outer limit and inner limit of Ωt
as t ↓ 0 by, respectively,

    Lim sup_{t↓0} Ωt := {x ∈ X | ∃ tk ↓ 0, ∃ xk → x with xk ∈ Ωtk, k ∈ N},

    Lim inf_{t↓0} Ωt := {x ∈ X | ∀ tk ↓ 0, ∃ xk → x with xk ∈ Ωtk, k ∈ N}.

Proposition 7.13 Let (Ωt)t>0 be a family of nonempty subsets of a normed
space X. Then we have:
(a) x ∈ Lim sup_{t↓0} Ωt if and only if lim inf_{t↓0} d(x; Ωt) = 0.
(b) x ∈ Lim inf_{t↓0} Ωt if and only if lim_{t↓0} d(x; Ωt) = 0, which is
equivalent to lim sup_{t↓0} d(x; Ωt) = 0.

Proof. (a) Take any x ∈ Lim sup_{t↓0} Ωt and find by the definition a sequence
{tk} of positive numbers with xk ∈ Ωtk for all k ∈ N such that

    lim_{k→∞} tk = 0 and lim_{k→∞} xk = x.

It follows that 0 ≤ d(x; Ωtk) ≤ ‖x − xk‖ → 0 as k → ∞, which yields the
limiting condition lim inf_{t↓0} d(x; Ωt) = 0.
For the converse, suppose that lim inf_{t↓0} d(x; Ωt) = 0 and find a sequence
{tk} of positive numbers that converges to 0 such that

    lim_{k→∞} d(x; Ωtk) = 0.

Given each k ∈ N, find xk ∈ Ωtk such that ‖x − xk‖ < d(x; Ωtk) + 1/k. Then
{xk} converges to x, and so x ∈ Lim sup_{t↓0} Ωt as claimed.

(b) Take any x ∈ Lim inf_{t↓0} Ωt. Take any sequence {tk} of positive numbers
that converges to 0 and find by the definition xk ∈ Ωtk for k ∈ N such that
{xk} converges to x. It follows that

    0 ≤ d(x; Ωtk) ≤ ‖x − xk‖ → 0 as k → ∞,

which implies therefore that lim_{t↓0} d(x; Ωt) = 0.
Now suppose that lim_{t↓0} d(x; Ωt) = 0. Then for any sequence {tk} of
positive numbers that converges to 0, we have

    lim_{k→∞} d(x; Ωtk) = 0.

Given each k ∈ N, find xk ∈ Ωtk such that 0 ≤ ‖x − xk‖ < d(x; Ωtk) + 1/k → 0
as k → ∞. Thus {xk} converges to x, which completes the proof. □
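For a concrete feel of Proposition 7.13, consider the toy family Ω_t := {t, 1} ⊂ R (our example, not the book's), for which Lim_{t↓0} Ω_t = {0, 1}. The distance criteria of the proposition detect exactly the points of this limit set.

```python
# Illustration of Proposition 7.13 with Ω_t := {t, 1} ⊂ R:
# the limit set is {0, 1}, and the distance test d(x; Ω_t) → 0 certifies it.

def dist(x, omega_set):
    """d(x; Ω) for a finite set Ω ⊂ R."""
    return min(abs(x - w) for w in omega_set)

def omega(t):
    return {t, 1.0}

tail = [1.0 / k for k in range(9901, 10001)]     # tail of a sequence t_k ↓ 0
for x in (0.0, 1.0):                             # points of the limit set
    assert max(dist(x, omega(t)) for t in tail) < 1e-2
# x = 0.5 lies outside the limit: d(0.5; Ω_t) stays near 0.5 as t ↓ 0
assert min(dist(0.5, omega(t)) for t in tail) > 0.4
```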

Remark 7.14 It follows from Proposition 7.13 that x ∈ Lim inf_{t↓0} Ωt if and
only if there exists a curve x : [0, ε] → X for some ε > 0 satisfying x(t) ∈
Ωt for all t ∈ (0, ε] and lim_{t↓0} x(t) = x = x(0). Indeed, suppose that x ∈
Lim inf_{t↓0} Ωt. Then for any t ∈ (0, ε] choose xt ∈ Ωt such that ‖x − xt‖ <
d(x; Ωt) + t. Then we simply set x(t) := xt for t ∈ (0, ε] and x(0) := x. The
converse implication is also straightforward.

We say that the parameterized family of sets Ωt converges to Ω as t ↓ 0,
and write it as Ωt → Ω, if the outer and inner limits of (Ωt)t>0 agree, i.e.,

    Lim sup_{t↓0} Ωt = Lim inf_{t↓0} Ωt = Ω := Lim_{t↓0} Ωt.    (7.16)

Using (7.16), we now formulate the following two properties, which play a
significant role in our subsequent analysis.

Definition 7.15 Given a normed space X, consider the following properties:
(a) A nonempty set Ω ⊂ X is called geometrically derivable at x̄ ∈ Ω if
for any v ∈ T(x̄; Ω) := Lim sup_{t↓0} (Ω − x̄)/t there exists a curve ξ : [0, ε] → Ω
with ε > 0 such that

    ξ(0) = x̄ and ξ′₊(0) := lim_{t↓0} [ξ(t) − ξ(0)]/t = v.    (7.17)

(b) A function f : X → R is epi-differentiable at x̄ ∈ dom(f) if its
epigraph is geometrically derivable at (x̄, f(x̄)).

Considering the t-parametric family of finite differences

    Δt f(x̄; v) := [f(x̄ + tv) − f(x̄)]/t for v ∈ X and t > 0,    (7.18)

it is not hard to deduce from the definitions that the epi-differentiability
of a locally Lipschitzian function f : X → R around x̄ can be equivalently
described by the set convergence (7.16) of the epigraphs

    epi(Δt f(x̄; ·)) → epi(df(x̄; ·)) as t ↓ 0.    (7.19)

It also follows from the definitions (and we leave it as an exercise for the
reader) that the set Ω is geometrically derivable at x̄ ∈ Ω if and only if the
full limit (7.16) of the set family ((Ω − x̄)/t)t>0 exists as t ↓ 0.
Observe that the class of epi-differentiable functions at a given point x̄
is sufficiently large. Besides smooth and convex functions, it contains any
function on a normed space that is directionally regular at x̄. The opposite
implication fails, as for the function f from Example 7.7 at x̄ = 0. We refer
the reader to Exercise 7.121 with the hints therein for more details.
Having the above facts in hand, next we establish a convenient character-
ization of epi-differentiability by showing that this is the class of functions for
which the lower limit in the contingent derivative definition (7.2) reduces to
the (full) limit along a curve. The next lemma is used in what follows.

Lemma 7.16 Let f : X → R be locally Lipschitzian around x̄, and let q : X →
R be such that q(v) ≤ df(x̄; v) for all v in a normed space X. Then f is
epi-differentiable at x̄ and we have q(·) = df(x̄; ·) if and only if for every v ∈ X
there exists a path ṽ : [0, ε] → X, ε > 0, with the properties

    lim_{t↓0} ṽ(t) = ṽ(0) = v and q(v) = lim_{t↓0} [f(x̄ + tṽ(t)) − f(x̄)]/t.    (7.20)
Proof. Assume first that f is epi-differentiable at x̄ with the subderivative
function q(·) = df(x̄; ·), which is finite on X by Proposition 7.8. Take any
v ∈ X and deduce from the definitions that (v, df(x̄; v)) = (v, q(v)) ∈
T((x̄, f(x̄)); epi(f)). The epi-differentiability of f at x̄ ensures by (7.17) that
there exists a path ξ : [0, ε] → epi(f) with the components ξ(t) = (ξ1(t), ξ2(t))
for all t ∈ [0, ε] satisfying the conditions ξ(0) = (x̄, f(x̄)) and ξ′₊(0) = (v, q(v)).
Setting now ṽ(t) := [ξ1(t) − x̄]/t for all t ∈ (0, ε] with ṽ(0) := v, we get from
the inclusion ξ(t) ∈ epi(f) on [0, ε] that

    [f(x̄ + tṽ(t)) − f(x̄)]/t ≤ [ξ2(t) − f(x̄)]/t for all t ∈ (0, ε].

Since [ξ2(t) − f(x̄)]/t → q(v) = df(x̄; v) as t ↓ 0, the limit of the quotient on
the left-hand side above exists and equals q(v).
To verify the converse implication, suppose that (7.20) holds and observe
that it yields q(·) ≥ df(x̄; ·). Invoking the imposed assumption q(·) ≤ df(x̄; ·),
this tells us that q(·) = df(x̄; ·) and hence epi(q) = T((x̄, f(x̄)); epi(f)).
It remains to prove that the set epi(f) is geometrically derivable at (x̄, f(x̄)).
To verify the latter, pick any pair (v, ν) ∈ T((x̄, f(x̄)); epi(f)) = epi(q) and let
ṽ : [0, ε] → X be the path taken from (7.20). Defining the curve ξ : [0, ε] → X × R by

    ξ(t) := (x̄ + tṽ(t), f(x̄ + tṽ(t)) + t(ν − q(v))), t ∈ [0, ε],

it is easy to check that ξ(t) ∈ epi(f) for all t ∈ [0, ε] with ξ(0) = (x̄, f(x̄))
and ξ′₊(0) = (v, ν). We conclude that f is epi-differentiable at x̄. □

Now we are in a position to derive the following equality-type sum rule for
subderivatives of locally Lipschitzian functions.

Theorem 7.17 Let f1, f2 : X → R be locally Lipschitzian around x̄ and epi-differentiable
at this point. Then the function f1 + f2 is also epi-differentiable
at x̄, and we have the subderivative sum rule

    d(f1 + f2)(x̄; v) = df1(x̄; v) + df2(x̄; v) for all v ∈ X.    (7.21)

Proof. Reducing the summation f1 + f2 to the composition framework ϕ ◦ g
in (7.14), consider Y := X × X and then define g : X → Y and ϕ : Y → R by

    g(x) := (x, x) and ϕ(x, z) := f1(x) + f2(z).    (7.22)

We obviously have the representation f1 + f2 = ϕ ◦ g, where ϕ is locally
Lipschitzian around ȳ := (x̄, x̄), and where g is Fréchet differentiable at x̄.
Let us show that the epi-differentiability of f1 and f2 at x̄ ensures the
epi-differentiability of ϕ at (x̄, x̄) and its subderivative representation

    dϕ((x̄, x̄); (u, w)) = df1(x̄; u) + df2(x̄; w) for all (u, w) ∈ X × X.    (7.23)

Since the inequality “≥” in (7.23) is trivial, we need to verify the opposite
inequality therein. To proceed, observe first that

    Δt ϕ((x̄, x̄); (u, w)) = Δt f1(x̄; u) + Δt f2(x̄; w) whenever (u, w) ∈ X × X

for the finite difference Δt in (7.18). Fixing (u, w) ∈ X × X and applying the
epi-differentiability criterion of Lemma 7.16 to both functions f1 and f2, we
find the corresponding paths ũ(·) and w̃(·) such that

    dϕ((x̄, x̄); (u, w)) ≤ lim inf_{t↓0} Δt ϕ((x̄, x̄); (ũ(t), w̃(t)))
        = lim inf_{t↓0} [Δt f1(x̄; ũ(t)) + Δt f2(x̄; w̃(t))]
        = lim_{t↓0} Δt f1(x̄; ũ(t)) + lim_{t↓0} Δt f2(x̄; w̃(t))
        = df1(x̄; u) + df2(x̄; w).

Hence (7.23) holds as an equality. This also tells us by Lemma 7.16 that the
function ϕ is epi-differentiable at (x̄, x̄).

Now we apply the subderivative chain rule of Theorem 7.12 to the
composition ϕ ◦ g from (7.22). This shows by ϕ ◦ g = f1 + f2 and (7.23) that

    d(f1 + f2)(x̄; v) = d(ϕ ◦ g)(x̄; v) = dϕ((x̄, x̄); ∇g(x̄)v)
        = dϕ((x̄, x̄); (v, v)) = df1(x̄; v) + df2(x̄; v)

for all v ∈ X, and so we get (7.21). Finally, the epi-differentiability of the sum
f1 + f2 at x̄ follows from the epi-differentiability at (x̄, x̄) of the function ϕ
in (7.22) proved above, which yields therefore the epi-differentiability of the
composition ϕ ◦ g at x̄. The latter can be verified by employing Lemma 7.16
similarly to the case of ϕ in (7.22); see Exercise 7.122. □
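The sum rule (7.21) can be checked numerically. The sketch below is our illustration (with a one-sided approximation of the liminf along tk = 1/k) for the epi-differentiable functions f1(x) = |x| and f2(x) = x² at x̄ = 0, where df1(0; v) = |v| and df2(0; v) = 0.

```python
# Illustration of the subderivative sum rule (7.21) for f1 = |·|, f2 = (·)²:
#   d(f1 + f2)(0; v) = df1(0; v) + df2(0; v).

def subderivative(f, x_bar, v, n=20000):
    # approximate df(x̄; v) = liminf_{t↓0} [f(x̄ + tv) − f(x̄)]/t along t_k = 1/k
    return min((f(x_bar + v / k) - f(x_bar)) * k for k in range(n, n + 100))

f1 = abs
f2 = lambda x: x * x
s = lambda x: f1(x) + f2(x)

v = 1.5
lhs = subderivative(s, 0.0, v)                            # d(f1 + f2)(0; v)
rhs = subderivative(f1, 0.0, v) + subderivative(f2, 0.0, v)
assert abs(lhs - rhs) < 1e-3
```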

7.2.2 Calculus of Generalized Directional Derivatives

This subsection deals with deriving important calculus rules for Clarke's
generalized directional derivatives (7.1) of locally Lipschitzian functions, which
are generally different from the subderivative ones obtained in the preceding
subsection. First we observe the obvious albeit useful relationship

    (αf)◦(x̄; v) = α f◦(x̄; v) for all α ≥ 0 and v ∈ X,    (7.24)

which is similar to the subderivative case in (7.13).
The next theorem provides sum rules for the generalized derivatives. Note
that its main statement establishes the inequality-type sum rule of the most
useful form with no additional assumptions. This is different from the case of
subderivatives, where the opposite inequality holds for free while it does not
bring any calculus information. On the other hand, Theorem 7.17 provides
an equality-type sum rule for (7.2) under epi-differentiability of the functions
in question, which is weaker than their directional regularity ensuring the
equality sum rule for (7.1) in the following result.

Theorem 7.18 Let fi : X → R, i = 1, . . . , m, be locally Lipschitzian around x̄
on a normed space X. Then the sum of these functions is locally Lipschitzian
around this point, and for any v ∈ X we have

    (∑_{i=1}^{m} fi)◦(x̄; v) ≤ ∑_{i=1}^{m} fi◦(x̄; v),    (7.25)

where equality holds provided that all the functions fi, i = 1, . . . , m, are
directionally regular at x̄. In this case the summation function f1 + . . . + fm is also
directionally regular at this point.

Proof. It suffices to verify the claimed statements for the case where m = 2
while observing that the general case easily follows by induction. The local
Lipschitz continuity of the sum in question is obvious.
To justify estimate (7.25) for m = 2, we fix any direction v ∈ X and get
by using Definition 7.1 the relationships

    (f1 + f2)◦(x̄; v) = lim sup_{x→x̄, t↓0} [(f1 + f2)(x + tv) − (f1 + f2)(x)]/t
        = lim sup_{x→x̄, t↓0} ([f1(x + tv) − f1(x)]/t + [f2(x + tv) − f2(x)]/t)
        ≤ lim sup_{x→x̄, t↓0} [f1(x + tv) − f1(x)]/t + lim sup_{x→x̄, t↓0} [f2(x + tv) − f2(x)]/t
        = f1◦(x̄; v) + f2◦(x̄; v),

which verifies (7.25) for m = 2. It is easy to deduce from definition (7.2) that

    d(f1 + f2)(x̄; v) ≥ df1(x̄; v) + df2(x̄; v) for all v ∈ X.    (7.26)

Combining (7.25) and (7.26) with the directional regularity in (7.3) justifies
the equality in (7.25) for m = 2 and thus completes the proof. □
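The inequality (7.25) can be strict when directional regularity fails. The sketch below (our illustration; the grid search only approximates the limit superior) uses f1(x) = |x| and f2(x) = −|x| at x̄ = 0: here f1 + f2 ≡ 0, so (f1 + f2)◦(0; v) = 0, while f1◦(0; v) + f2◦(0; v) = 2|v|.

```python
# Illustration: the sum rule (7.25) holds with strict inequality for
# f1(x) = |x| and f2(x) = -|x| at x̄ = 0.

def clarke_dd(f, x_bar, v, delta=1e-4, n=200):
    # grid approximation of limsup_{x→x̄, t↓0} [f(x + tv) − f(x)]/t
    best = -float("inf")
    for i in range(-n, n + 1):
        x = x_bar + delta * i / n
        for k in range(1, 20):
            t = delta / k
            best = max(best, (f(x + t * v) - f(x)) / t)
    return best

f1 = abs
f2 = lambda x: -abs(x)
total = lambda x: f1(x) + f2(x)          # identically zero

v = 1.0
lhs = clarke_dd(total, 0.0, v)                        # (f1 + f2)°(0; 1) = 0
rhs = clarke_dd(f1, 0.0, v) + clarke_dd(f2, 0.0, v)   # f1°(0; 1) + f2°(0; 1) = 2
assert lhs < rhs - 1.5                                # strict, with a wide margin
```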

The next result provides a chain rule for generalized directional derivatives
of compositions involving locally Lipschitzian outer functions and smooth
inner mappings between Banach spaces. The completeness of the spaces in
question is needed for the application of the classical open mapping theorem
for linear continuous operators discussed in Subsection 1.3.3.

Theorem 7.19 Let g : X → Y be a mapping between Banach spaces, and let
ϕ : Y → R. Assume that g is continuously differentiable around x̄ ∈ X with
the surjective derivative ∇g(x̄) : X → Y , and that ϕ is locally Lipschitzian
around ȳ := g(x̄) with a Lipschitz constant ℓ ≥ 0. Then the composition ϕ ◦ g
is Lipschitz continuous around x̄, and we have the chain rule equality

    (ϕ ◦ g)◦(x̄; v) = ϕ◦(ȳ; ∇g(x̄)v) for all v ∈ X.    (7.27)

Furthermore, the composition ϕ ◦ g is directionally regular at x̄ provided that
the outer function ϕ is directionally regular at ȳ.

Proof. The local Lipschitz continuity of ϕ ◦ g around x̄ easily follows from the
definitions. To verify (7.27), observe by the surjectivity of the linear continuous
operator ∇g(x̄) that the open mapping theorem ensures that the image set
g(B(x̄; η)) is a neighborhood of ȳ for any η > 0. It follows therefore that for
each fixed v ∈ X, we get the equalities

    ϕ◦(ȳ; ∇g(x̄)v) = lim sup_{y→ȳ, t↓0} [ϕ(y + t∇g(x̄)v) − ϕ(y)]/t
        = lim sup_{x→x̄, t↓0} [ϕ(g(x) + t∇g(x̄)v) − ϕ(g(x))]/t.

Invoking the Lipschitz constant ℓ ≥ 0 of ϕ around ȳ, observe that

    |ϕ(g(x + tv)) − ϕ(g(x) + t∇g(x̄)v)| ≤ ℓ‖g(x + tv) − g(x) − t∇g(x̄)v‖
        = ℓ‖g(x + tv) − g(x) − t∇g(x)v + t∇g(x)v − t∇g(x̄)v‖
        ≤ ℓ‖g(x + tv) − g(x) − t∇g(x)v‖ + ℓt‖∇g(x)v − ∇g(x̄)v‖

whenever x is near x̄ and t > 0 is sufficiently small. For such x and t we have

    |ϕ(g(x + tv)) − ϕ(g(x) + t∇g(x̄)v)|/t
        ≤ ℓ‖[g(x + tv) − g(x) − t∇g(x)v]/t‖ + ℓ‖∇g(x)v − ∇g(x̄)v‖,

which readily implies that

    lim sup_{x→x̄, t↓0} [ϕ(g(x + tv)) − ϕ(g(x) + t∇g(x̄)v)]/t = 0.

It follows therefore by using (7.1) that

    ϕ◦(ȳ; ∇g(x̄)v) = lim sup_{x→x̄, t↓0} [ϕ(g(x) + t∇g(x̄)v) − ϕ(g(x))]/t
        = lim sup_{x→x̄, t↓0} [ϕ(g(x + tv)) − ϕ(g(x))]/t = (ϕ ◦ g)◦(x̄; v)

for all v ∈ X. Thus we arrive at the chain rule (7.27).


It remains to show that the composition ϕ ◦ g is directionally regular at x̄
provided that ϕ is directionally regular at ȳ. Indeed, Theorem 7.12 gives us
the equality chain rule (7.15) for the subderivative of the composition ϕ ◦ g
under even weaker assumptions in comparison with those imposed in this
theorem. Thus combining the chain rules (7.15) and (7.27) with the imposed
directional regularity of ϕ at ȳ, we get for all v ∈ X that

    (ϕ ◦ g)◦(x̄; v) = ϕ◦(ȳ; ∇g(x̄)v) = dϕ(ȳ; ∇g(x̄)v) = d(ϕ ◦ g)(x̄; v),

which verifies the claimed directional regularity of ϕ ◦ g at x̄. □
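A one-dimensional numerical check of the chain rule (7.27) (our illustration; surjectivity of ∇g(x̄) is immediate here) takes g(x) = 2x + 1 and ϕ(y) = |y − 1|, so that ȳ = g(0) = 1 and ∇g(0) = 2.

```python
# Illustration of the chain rule (7.27) with X = Y = R:
#   (ϕ∘g)°(x̄; v) = ϕ°(ȳ; ∇g(x̄)v) for g(x) = 2x + 1, ϕ(y) = |y − 1|, x̄ = 0.

def clarke_dd(f, x_bar, v, delta=1e-4, n=200):
    # grid approximation of limsup_{x→x̄, t↓0} [f(x + tv) − f(x)]/t
    best = -float("inf")
    for i in range(-n, n + 1):
        x = x_bar + delta * i / n
        for k in range(1, 20):
            t = delta / k
            best = max(best, (f(x + t * v) - f(x)) / t)
    return best

g = lambda x: 2.0 * x + 1.0
phi = lambda y: abs(y - 1.0)
comp = lambda x: phi(g(x))               # (ϕ∘g)(x) = |2x|

x_bar, v = 0.0, 1.0
lhs = clarke_dd(comp, x_bar, v)          # (ϕ∘g)°(0; 1)
rhs = clarke_dd(phi, g(x_bar), 2.0 * v)  # ϕ°(ȳ; ∇g(x̄)v) with ∇g(x̄) = 2
assert abs(lhs - rhs) < 1e-6             # both sides equal 2 here
```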

The final theorem of this subsection provides a convenient formula for


calculating the generalized directional derivative (7.1) of pointwise maxima
of smooth functions on normed spaces. The additional directional regularity
statement includes a similar calculus rule for subderivatives (7.2).

Theorem 7.20 Given fi : X → R, i = 1, . . . , m, on a normed space X, for
all x ∈ X define the maximum function

    fmax(x) := max{fi(x) | i = 1, . . . , m}.    (7.28)

Assume that the functions fi are C¹-smooth around a given point x̄ for all
i = 1, . . . , m. Then the maximum function (7.28) is locally Lipschitzian around
x̄, and we have the representation

    fmax◦(x̄; v) = max_{i∈I(x̄)} ⟨∇fi(x̄), v⟩, v ∈ X,    (7.29)

where I(x̄) := {i = 1, . . . , m | fi(x̄) = fmax(x̄)} is the set of active indices.
Furthermore, the maximum function (7.28) is directionally regular at x̄.

Proof. The Lipschitz continuity of (7.28) around x̄ follows immediately from
the definitions. Fix any element v ∈ X. Using further the generalized directional
derivative construction (7.1), we find a sequence {xk} ⊂ X converging
to x̄ and a sequence {tk} of positive numbers that converges to zero such that

    fmax◦(x̄; v) = lim_{k→∞} [fmax(xk + tk v) − fmax(xk)]/tk.

Denoting Ik := {i = 1, . . . , m | fi(xk + tk v) = fmax(xk + tk v)} and extracting a
subsequence if necessary, we assume without loss of generality that Ik = I0 ≠ ∅
for all k ∈ N. Since xk + tk v → x̄ as k → ∞, it gives us I0 ⊂ I(x̄). Then for
any i ∈ I0 we get the relationships

    fmax◦(x̄; v) = lim_{k→∞} [fmax(xk + tk v) − fmax(xk)]/tk
        = lim_{k→∞} [fi(xk + tk v) − fmax(xk)]/tk
        ≤ lim sup_{k→∞} [fi(xk + tk v) − fi(xk)]/tk
        = ⟨∇fi(x̄), v⟩ ≤ max_{i∈I0} ⟨∇fi(x̄), v⟩ ≤ max_{i∈I(x̄)} ⟨∇fi(x̄), v⟩,

which verifies the inequality “≤” in (7.29).


To prove the opposite inequality in (7.29), fix any i ∈ I(x̄) and t > 0
sufficiently close to 0. Then we have by using the mean value theorem that

    [fmax(x̄ + tv) − fmax(x̄)]/t = [fmax(x̄ + tv) − fi(x̄)]/t
        ≥ [fi(x̄ + tv) − fi(x̄)]/t = ⟨∇fi(ui), v⟩

for some ui ∈ (x̄, x̄ + tv). This tells us therefore that

    fmax◦(x̄; v) ≥ lim inf_{t↓0} [fmax(x̄ + tv) − fmax(x̄)]/t ≥ ⟨∇fi(x̄), v⟩,

which yields fmax◦(x̄; v) ≥ max_{i∈I(x̄)} ⟨∇fi(x̄), v⟩ and thus verifies (7.29).

Similar arguments based on the subderivative definition (7.2) lead us to
the subderivative calculation for the pointwise maxima

    dfmax(x̄; v) = max_{i∈I(x̄)} ⟨∇fi(x̄), v⟩, v ∈ X.    (7.30)

Combining the latter with (7.29) ensures the directional regularity of the
maximum function (7.28) under the imposed assumptions. □
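Formula (7.29) is easy to test numerically. In the sketch below (our illustration) we take f1(x) = x and f2(x) = −x, so that fmax(x) = |x|, both indices are active at x̄ = 0, and ∇f1(0) = 1, ∇f2(0) = −1; formula (7.29) then predicts fmax◦(0; v) = max{v, −v} = |v|.

```python
# Illustration of the maximum rule (7.29) for f1(x) = x, f2(x) = -x at x̄ = 0:
#   f_max°(0; v) = max_{i∈I(0)} <∇f_i(0), v> with I(0) = {1, 2}.

def clarke_dd(f, x_bar, v, delta=1e-4, n=200):
    # grid approximation of limsup_{x→x̄, t↓0} [f(x + tv) − f(x)]/t
    best = -float("inf")
    for i in range(-n, n + 1):
        x = x_bar + delta * i / n
        for k in range(1, 20):
            t = delta / k
            best = max(best, (f(x + t * v) - f(x)) / t)
    return best

fmax = lambda x: max(x, -x)              # = |x|
grads_at_0 = [1.0, -1.0]                 # ∇f1(0), ∇f2(0); both indices active

for v in (1.0, -2.0):
    predicted = max(g * v for g in grads_at_0)   # right-hand side of (7.29)
    assert abs(clarke_dd(fmax, 0.0, v) - predicted) < 1e-6
```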

7.3 Directionally Generated Subdifferentials


In this section we define and study the subdifferential constructions for
locally Lipschitzian functions, which are generated in duality by the direc-
tional derivatives considered in the two preceding sections. We split this
section into three subsections. The first subsection presents the definitions
and some properties of the two major subdifferentials that correspond to the
(Clarke) generalized directional derivative and the (Dini) contingent deriva-
tive/subderivative from Definition 7.1. The second subsection contains major
calculus rules for generalized gradients, while the third one presents calculus
rules for contingent subgradients of such functions. We repeat here that, unless
otherwise stated, our standing assumptions require the local Lipschitz continuity
of functions defined on normed spaces.

7.3.1 Basic Definitions and Some Properties

We start by recalling (see Proposition 4.48) that the subdifferential of a
convex function f : X → R at x̄ ∈ X can be equivalently defined via its
one-sided/right directional derivative by the duality scheme

    ∂f(x̄) := {x∗ ∈ X∗ | ⟨x∗, v⟩ ≤ f′(x̄; v) for all v ∈ X},    (7.31)

where the classical directional derivative always exists for convex functions.
Furthermore, the one-sided directional derivative of a convex function can be
restored from the subdifferential by the full duality

    f′(x̄; v) = sup{⟨x∗, v⟩ | x∗ ∈ ∂f(x̄)}, v ∈ X,    (7.32)
as shown in Theorem 4.50. We know that the classical directional derivative
(7.4) may exist not only for convex functions, and thus the duality scheme
(7.31) can be used to define subgradients of nonconvex functions. However,
such subdifferential constructions do not usually exhibit good properties,
including calculus rules, and surely fail to satisfy the full duality in (7.32)
unless the directional derivative f′(x; v) is convex with respect to v. In the
latter case, f belongs to the class of locally convex functions at x for which
desired calculus rules are obtained under appropriate qualification conditions
by using fundamental tools of convex analysis. We are going to further discuss
this and related issues in Section 7.10 with the references therein.
The main attention here is paid to investigating the two subdifferential
constructions for locally Lipschitzian functions obtained via the duality

scheme (7.31) by using, respectively, the generalized directional derivative
(7.1) and the subderivative (7.2). The subdifferential symbols we use for these
constructions are conventional in variational analysis and its applications.
Definition 7.21 Let f : X → R be a locally Lipschitzian function around a
given point x̄ ∈ X. Then we define the following:
(a) The (Clarke) generalized gradient of f at x̄ is

    ∂f(x̄) := {x∗ ∈ X∗ | ⟨x∗, v⟩ ≤ f◦(x̄; v) for all v ∈ X}.    (7.33)

(b) The (Dini) contingent subdifferential of f at x̄ is

    ∂⁻f(x̄) := {x∗ ∈ X∗ | ⟨x∗, v⟩ ≤ df(x̄; v) for all v ∈ X}.    (7.34)

It is obvious from the duality scheme in (7.33) and (7.34) that both
subgradient sets are convex and weak∗ closed in X∗, and that both these sets
reduce to the subdifferential of convex analysis for locally Lipschitzian convex
functions. However, the crucial difference between them in the general Lipschitzian
setting is that ∂f(x̄) is defined via the generalized directional derivative
f◦(x̄; v), which is convex in the direction variable v, while the function
v → df(x̄; v) is commonly nonconvex. The unconditional convexity of f◦(x̄; ·)
enables us to study the generalized gradient (7.33) by employing powerful
tools of convex analysis.
It is easy to check by the definitions that we always have

    ∂⁻f(x) ⊂ ∂f(x) for all x ∈ X,    (7.35)

where equality holds for a fixed x if and only if f is directionally regular
at x. As seen below, important properties and calculus rules for ∂f in the
absence of regularity are significantly better than those for ∂⁻f. On the other
hand, the contingent subdifferential may be much smaller than the generalized
gradient even for differentiable Lipschitzian functions on the real line, as in
Example 7.23. Furthermore, the set ∂⁻f(x̄) may be empty (which is not the
case for ∂f(x̄)) for simple nonconvex functions, as in Examples 7.22 and 7.25.
Before deriving general results on the above subdifferential constructions,
let us calculate the generalized gradient and contingent subdifferential of the
Lipschitzian functions on the real line taken from Examples 7.6 and 7.7. The
obtained calculations illustrate important features of these notions.

Example 7.22 Consider the function f(x) := min{−x, 0} on R from Example 7.6.
Based on Definition 7.21(a,b) and the directional derivative formulas
given in Example 7.6, we calculate the generalized gradient of f by

    ∂f(x) = {−1} if x > 0,  [−1, 0] if x = 0,  {0} if x < 0.

On the other hand, the contingent subdifferential of f is calculated by

    ∂⁻f(x) = {−1} if x > 0,  ∅ if x = 0,  {0} if x < 0.
Example 7.23 Consider the function f : R → R from Example 7.7 given by

    f(x) := x² sin(1/x) if x ≠ 0, and f(0) := 0.

Since f◦(0; v) = |v| for v ∈ R, we have ∂f(0) = [−1, 1]. On the other hand, it
is obvious that ∂⁻f(0) = {f′(0)} = {0}.
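A numerical look at Example 7.23 (our illustration, with hand-picked sample points): difference quotients based at the origin itself tend to f′(0) = 0, while quotients based at the points xk = 1/((2k + 1)π), where f′(xk) = 2xk sin(1/xk) − cos(1/xk) ≈ 1, reveal that f◦(0; 1) = 1.

```python
# Illustration for Example 7.23: f(x) = x² sin(1/x), f(0) = 0 has f'(0) = 0,
# yet f°(0; 1) = 1 because f'(x) = 2x sin(1/x) − cos(1/x) has limsup 1 at 0.

import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

# difference quotient at the origin: (f(t) − f(0))/t = t sin(1/t) → 0
q0 = (f(1e-8) - f(0.0)) / 1e-8
assert abs(q0) < 1e-6                     # consistent with f'(0) = 0

# near x_k = 1/((2k+1)π) we have cos(1/x_k) = −1, so f'(x_k) ≈ 1; a quotient
# with small t > 0 based there gets close to f°(0; 1) = 1
xk = 1.0 / (101.0 * math.pi)
t = 1e-7
qk = (f(xk + t) - f(xk)) / t
assert qk > 0.99                          # the quotient approaches 1
```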
Proceeding further with a parallel study of the generalized gradient and the
contingent subdifferential from Definition 7.21, we first clarify some structural
properties of these sets. The next result describes the situation for ∂f(x̄).

Theorem 7.24 Let X be a normed space, and let f : X → R be a locally
Lipschitzian function around some point x̄ ∈ X. Suppose that ℓ ≥ 0 is a
Lipschitz constant of f : X → R around x̄. Then the set ∂f(x̄) is nonempty,
convex, weak∗ compact, and bounded in X∗ with the bound estimate

    ‖x∗‖ := sup{⟨x∗, v⟩ | v ∈ X, ‖v‖ ≤ 1} ≤ ℓ for all x∗ ∈ ∂f(x̄).    (7.36)

Furthermore, the set-valued mapping x → ∂f(x) is of closed graph around x̄
in the norm×weak∗ topology of X × X∗, i.e., for any x near x̄ we have

    [xν → x, x∗ν →_{w∗} x∗ with x∗ν ∈ ∂f(xν)] =⇒ x∗ ∈ ∂f(x),    (7.37)

where →_{w∗} indicates net convergence in the weak∗ topology of X∗.
Proof. The convexity and the closedness of ∂f(x̄) in the weak∗ topology of
X∗ follow immediately from definition (7.33). The boundedness of ∂f(x̄) with
estimate (7.36) is a consequence of (7.33) and Proposition 7.8. Thus the weak∗
compactness of the set ∂f(x̄) follows from the Alaoglu–Bourbaki theorem in
duals of normed spaces; see Corollary 1.113.
To verify the nonemptiness of this set, let Y := span{v0} for some v0 ∈ X
and define a linear function y∗ : Y → R by ⟨y∗, tv0⟩ := tf◦(x̄; v0) for t ∈ R.
It is not hard to verify that ⟨y∗, v⟩ ≤ p(v) := f◦(x̄; v) for all v ∈ Y and
⟨y∗, v0⟩ = f◦(x̄; v0). We recall the positive homogeneity and subadditivity
of the generalized directional derivative p from Theorem 7.10. The Hahn–Banach
theorem (Theorem 1.125) allows us to extend y∗ to a linear functional
x∗ : X → R such that ⟨x∗, v0⟩ = f◦(x̄; v0) and

    ⟨x∗, v⟩ ≤ p(v) = f◦(x̄; v) for all v ∈ X.

It follows from Proposition 7.8 that the linear functional x∗ belongs to X∗,
and thus we get x∗ ∈ ∂f(x̄) by the subdifferential definition.
Remembering that the assumed Lipschitz continuity of f around x̄ yields
this property for each x near x̄, we finally deduce (7.37) from the upper
semicontinuity of f◦ in Proposition 7.9. Indeed, suppose that f is locally
Lipschitzian on a neighborhood W of x ∈ X. Take any nets {xν} in X and
{x∗ν} in X∗ satisfying the conditions

    xν → x, x∗ν →_{w∗} x∗ with x∗ν ∈ ∂f(xν).

Then we have the inequality

    ⟨x∗ν, v⟩ ≤ f◦(xν; v) for all v ∈ X and all indices ν.

Fix finally any v ∈ X and take an arbitrary ε > 0. By the aforementioned
upper semicontinuity of f◦, find a neighborhood V ⊂ W of x for which

    f◦(z; v) < f◦(x; v) + ε whenever z ∈ V.

This allows us to assume without loss of generality that

    ⟨x∗ν, v⟩ ≤ f◦(xν; v) < f◦(x; v) + ε for all ν.

Passing to the limit as x∗ν →_{w∗} x∗ and then as ε ↓ 0 gives us ⟨x∗, v⟩ ≤ f◦(x; v).
Since v is arbitrary, we get x∗ ∈ ∂f(x) and thus complete the proof. □
As mentioned above, the contingent subdifferential ∂⁻f(x̄) shares with
(7.33) the convexity property. Also it follows from definition (7.34) and Proposition 7.8
for the subderivative df(x̄; v) that

    ‖x∗‖ ≤ ℓ for any x∗ ∈ ∂⁻f(x̄)    (7.38)

via the Lipschitz constant ℓ of f around x̄. However, the major assertions on
the nonemptiness and robustness (7.37) of ∂f obtained in Theorem 7.24 fail
for (7.34), as illustrated next by a simple while instructive example.

Example 7.25 Consider the following two simplest nonsmooth Lipschitzian


functions defined on the real line by
f1 (x) := |x| and f2 (x) := −|x|, x ∈ R.
The point of interest here, where both functions are nondifferentiable, is x :=
0. Based on Definition 7.1(a,b), we get for all v ∈ R that
f1◦ (0; v) = f2◦ (0; v) = df1 (0; v) = |v| and df2 (0; v) = −|v|.
Then the corresponding parts of Definition 7.21(a,b) tell us that
∂f1 (0) = ∂f2 (0) = ∂ − f1 (0) = [−1, 1] and ∂ − f2 (0) = ∅.
7.3 Directionally Generated Subdifferentials 469

Furthermore, it is obvious that ∂f1(x) = ∂−f1(x) = {1} if x > 0 and ∂f2(x) = ∂−f2(x) = {−1} if x < 0. In particular, this example shows that for f := f2 the nonemptiness and robustness properties of ∂f in Theorem 7.24 fail to hold for the contingent subdifferential ∂−f at x = 0.
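The computations in Example 7.25 are easy to reproduce numerically. The sketch below (an illustrative grid approximation, not a rigorous computation) estimates the generalized directional derivative f◦(0; v) by maximizing difference quotients over small base points and step sizes, and the subderivative df(0; v) by minimizing one-sided quotients; on the real line, ∂f(0) = [−f◦(0; −1), f◦(0; 1)] and ∂−f(0) = [−df(0; −1), df(0; 1)], the latter being empty when the left endpoint exceeds the right one.

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v) = limsup_{x -> x0, t -> 0+} (f(x + t*v) - f(x)) / t
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

def subder(f, x0, v, ts=(1e-4, 1e-5, 1e-6)):
    # approximation of the subderivative df(x0; v) (valid for Lipschitzian f on R)
    return min((f(x0 + t * v) - f(x0)) / t for t in ts)

f1 = abs
f2 = lambda x: -abs(x)

# f1°(0; v) = f2°(0; v) = df1(0; v) = |v|, while df2(0; v) = -|v|
derivs = {v: (clarke(f1, 0.0, v), clarke(f2, 0.0, v),
              subder(f1, 0.0, v), subder(f2, 0.0, v)) for v in (1.0, -1.0)}

gg2 = (-clarke(f2, 0.0, -1.0), clarke(f2, 0.0, 1.0))   # ∂f2(0) = [-1, 1]
cs2 = (-subder(f2, 0.0, -1.0), subder(f2, 0.0, 1.0))   # crossed endpoints: ∂⁻f2(0) = ∅
print(derivs, gg2, cs2)
```

Here cs2 comes out as (1.0, -1.0): the candidate interval for ∂−f2(0) is empty, matching the example.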
An easy but very important setting in which ∂−f(x), and hence ∂f(x), is automatically nonempty is when x is a local minimizer of f.
Proposition 7.26 Let f : X → R be a locally Lipschitzian function around a
given point x ∈ X. If x is a local minimizer of f , then we have the following
generalized Fermat rules:
0 ∈ ∂−f(x) and 0 ∈ ∂f(x). (7.39)
Proof. The first inclusion in (7.39) is an immediate consequence of the definitions. The second inclusion follows from the first one by (7.35). □
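For a one-dimensional illustration of the Fermat rules (7.39), take the Lipschitzian function f(x) = |x| + x²/2 (a hypothetical example chosen here for illustration), which is minimized at x = 0. The inclusion 0 ∈ ∂−f(0) amounts to df(0; v) ≥ ⟨0, v⟩ = 0 for all v, and 0 ∈ ∂f(0) then follows from (7.35); the sketch below checks the subderivative condition in a few directions.

```python
def subder(f, x0, v, ts=(1e-4, 1e-5, 1e-6)):
    # approximation of df(x0; v) = liminf_{t -> 0+} (f(x0 + t*v) - f(x0)) / t
    return min((f(x0 + t * v) - f(x0)) / t for t in ts)

f = lambda x: abs(x) + 0.5 * x * x   # local (in fact global) minimizer at x = 0

# 0 ∈ ∂⁻f(0) means df(0; v) >= 0 for every direction v
checks = {v: subder(f, 0.0, v) for v in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)}
print(checks)
```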
Now we compare the directionally generated subdifferentials in Definition 7.21 with the classical notions for differentiable and convex functions. The next proposition for the contingent subdifferential (7.34) can be easily derived from the corresponding results obtained in the previous chapters.
Proposition 7.27 Let X be a normed space, and let f : X → R be a locally Lipschitzian function around a given point x ∈ X. The following assertions hold for the contingent subdifferential:
(a) If f is Gâteaux differentiable at x with the Gâteaux derivative f′G(x), then we have the representation
∂−f(x) = {f′G(x)}.
(b) If f : X → R is a convex function on X, then we have
∂−f(x) = {x∗ ∈ X∗ | ⟨x∗, u − x⟩ ≤ f(u) − f(x) for all u ∈ X}.
Proof. To verify (a), we first observe that the Gâteaux differentiability of f at x yields the directional differentiability of f at this point and ensures furthermore that df(x; v) agrees with f′(x; v) for all v ∈ X. Then the claim of (a) follows directly from Proposition 5.38.
If f is convex, then f′(x; v) exists for any v ∈ X and thus agrees with df(x; v). Hence in this case the result is a consequence of Proposition 4.48. □
Similar results for the generalized gradient (7.33) of smooth and convex
functions follow from the corresponding properties of the generalized direc-
tional derivative (7.1) obtained in Subsection 7.1.1. More delicate relationships
between the generalized gradient and various kinds of (strict) differentiability
are discussed below in Section 7.5.
470 7 CONVEXIFIED LIPSCHITZIAN ANALYSIS
Proposition 7.28 Let X be a normed space, and let f : X → R be a locally Lipschitzian function around a given point x ∈ X. The following assertions hold for the generalized gradient:
(a) If f is a C1-smooth function around x, then
∂f(u) = {∇f(u)} for all u near x.
(b) If f is a convex function on X, then we have
∂f(x) = {x∗ ∈ X∗ | ⟨x∗, u − x⟩ ≤ f(u) − f(x) for all u ∈ X}.
Proof. It suffices to verify (a) at the point x itself, since each point u near x can play its role. The claim readily follows from definition (7.33) and Proposition 7.5 telling us that for C1-smooth functions, the generalized directional derivative (7.1) agrees with the classical one f′(x; v), which in turn reduces in this case to ⟨∇f(x), v⟩. Assertion (b) is a direct consequence of (7.33) and Proposition 7.4. □
We conclude this subsection by presenting some elementary properties of
the contingent subdifferential that are not shared by the generalized gradient.
Proposition 7.29 Let X be a normed space, and let f1 , f2 : X → R be locally
Lipschitzian functions around a given point x. The following assertions hold:
(a) We always have the inclusion
∂ − (f1 + f2 )(x) ⊃ ∂ − f1 (x) + ∂ − f2 (x). (7.40)
(b) If f1(x) = f2(x) and f1(u) ≤ f2(u) for all u around x, then
∂ − f1 (x) ⊂ ∂ − f2 (x).
Proof. Inclusion (7.40) in (a) is a consequence of the definition. To verify (b), we use df1(x; v) ≤ df2(x; v) for all v ∈ X under the assumptions made, and then also apply definition (7.34). □
Note that both assertions in Proposition 7.29 fail for the generalized gra-
dients of f1 and f2 . Indeed, for (a) choose f1 (x) := |x| and f2 (x) := −|x|,
while for (b) choose f1 (x) := −|x| and f2 (x) := 0 on X := R.
7.3.2 Calculus Rules for Generalized Gradients
In this subsection, we present major calculus rules for generalized gradients (7.33) of locally Lipschitzian functions on normed spaces. Their proofs reduce
to basic convex analysis due to the convexity of the generalized derivative
v → f ◦ (x; v) and the calculus rules for it obtained in Subsection 7.2. The
realization of this procedure involves the following two crucial results. The
first proposition shows that the generalized gradient ∂f (x) is the convex sub-
differential at the origin of the generalized directional derivative v → f ◦ (x; v),
while the second result extends the full duality correspondence (7.32) of convex
analysis to the case of the convexified constructions under consideration.
Proposition 7.30 Let X be a normed space, and let f : X → R be a locally Lipschitzian function around some point x ∈ X. Consider the generalized derivative function defined by
ψ(v) := f◦(x; v) for all v ∈ X.
Then we have the relationship ∂f(x) = ∂ψ(0).
Proof. Theorem 7.10 tells us that ψ is a convex function with ψ(0) = 0. By definition (7.33), we get that x∗ ∈ ∂f(x) if and only if
⟨x∗, v⟩ ≤ f◦(x; v) = ψ(v) whenever v ∈ X. (7.41)
Since ψ(0) = 0, the inequality in (7.41) can be written as
⟨x∗, v − 0⟩ ≤ ψ(v) − ψ(0) for all v ∈ X,
which tells us therefore that x∗ ∈ ∂ψ(0). □
Theorem 7.31 Let X be a normed space, and let f : X → R be a locally Lipschitzian function around some point x ∈ X. We have the representation
f◦(x; v) = max{⟨x∗, v⟩ | x∗ ∈ ∂f(x)} for every v ∈ X. (7.42)
Proof. Fix any v ∈ X and observe first that the maximum in (7.42) is achieved due to the nonemptiness and weak∗ compactness of ∂f(x) by Theorem 7.24. Then it follows from definition (7.33) that the inequality "≥" holds in (7.42). To verify the opposite inequality, suppose on the contrary that
f◦(x; v) > max{⟨x∗, v⟩ | x∗ ∈ ∂f(x)}. (7.43)
Define Y := span{v} and consider the linear function y∗ : Y → R given by ⟨y∗, tv⟩ := tf◦(x; v). Employing the Hahn-Banach theorem as in the proof of Theorem 7.24 ensures the existence of a linear continuous functional x∗ ∈ X∗ such that we have the relationships
⟨x∗, w⟩ ≤ f◦(x; w) for all w ∈ X and ⟨x∗, v⟩ = f◦(x; v),
which yield x∗ ∈ ∂f(x). The latter implies together with (7.43) that
f◦(x; v) > ⟨x∗, v⟩ = f◦(x; v),
a contradiction verifying (7.42). Note that, by the convexity of ∂f(x), representation (7.42) can also be obtained by using convex separation. □
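Representation (7.42) can be observed numerically on the real line. For f(x) = −|x| one has ∂f(0) = [−1, 1] by Example 7.25, and the grid sketch below (illustrative only) compares a difference-quotient estimate of f◦(0; v) with the support function max{x∗v | x∗ ∈ [−1, 1]} = |v|.

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v)
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

f = lambda x: -abs(x)
lo, hi = -clarke(f, 0.0, -1.0), clarke(f, 0.0, 1.0)     # ∂f(0) ≈ [-1, 1]
grid = [lo + k * (hi - lo) / 200 for k in range(201)]   # sample of ∂f(0)

# f°(0; v) should match the support function max{s*v : s in ∂f(0)}
pairs = [(v, clarke(f, 0.0, v), max(s * v for s in grid)) for v in (-2.0, -1.0, 0.5, 3.0)]
print(pairs)
```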
Having in hand the representation results of Proposition 7.30 and Theorem 7.31, we can now derive the following major calculus rules for the generalized gradient mappings from their counterparts for convex functions and generalized directional derivatives (7.1) of locally Lipschitzian functions.
Let us start with sum rules of inclusion and equality types for (7.33).
Theorem 7.32 Let X be a normed space, and let fi : X → R, i = 1, . . . , m, be locally Lipschitzian around some point x ∈ X. We always have the inclusion
∂(f1 + . . . + fm )(x) ⊂ ∂f1 (x) + . . . + ∂fm (x), (7.44)
where equality holds in the following two cases:
(a) All the functions fi , i = 1, . . . , m, are directionally regular at x, which
yields the directional regularity of the sum f1 + . . . + fm at x.
(b) All but one of the functions fi are C 1 -smooth around x.
Proof. We can confine ourselves to the case where m = 2 while observing that
the general case easily follows by induction. To prove the inclusion in (7.44)
for m = 2, define the functions ψi : X → R by
ψi (v) := fi◦ (x; v) for all v ∈ X, i = 1, 2,
and pick any x∗ ∈ ∂(f1 + f2 )(x). Then we have by the definitions and the sum
rule of Theorem 7.18 for generalized directional derivatives that
⟨x∗, v⟩ ≤ (f1 + f2)◦(x; v) ≤ f1◦(x; v) + f2◦(x; v) = ψ1(v) + ψ2(v) for all v ∈ X.
Using now Proposition 7.30 and the classical sum rule (Moreau-Rockafellar
theorem) of convex analysis tells us that
x∗ ∈ ∂(ψ1 + ψ2 )(0) = ∂ψ1 (0) + ∂ψ2 (0) = ∂f1 (x) + ∂f2 (x),
which therefore justifies the inclusion in (7.44).
To verify the equality in (7.44) under the directional regularity of fi , we
get from (7.44), Proposition 7.29(a), and the equality in (7.35) for fi that
∂(f1 + f2 )(x) ⊂ ∂f1 (x) + ∂f2 (x) = ∂ − f1 (x) + ∂ − f2 (x)
⊂ ∂ − (f1 + f2 )(x),
which verifies the equality in (7.44) and the directional regularity of the sum
f1 + f2 at x since we always have
∂ − (f1 + f2 )(x) ⊂ ∂(f1 + f2 )(x).
To check the equality in (7.44) (while not the directional regularity of f1 + f2 ,
which is not claimed) in case (b), suppose for definiteness that f2 is C 1 -smooth
around x and apply (7.44) to the sum f1 = (f1 + f2 ) + (−f2 ). Hence we get
∂f1(x) = ∂((f1 + f2) + (−f2))(x) ⊂ ∂(f1 + f2)(x) − ∇f2(x)
by Proposition 7.28(a). This tells us that
∂f1 (x) + ∇f2 (x) = ∂f1 (x) + ∂f2 (x) ⊂ ∂(f1 + f2 )(x)
and thus justifies the equality in (7.44) in this case. 
It is easy to see that the assumptions on the directional regularity of f1, f2 in Theorem 7.32(a) and on the smoothness of one of the functions f1, f2 in Theorem 7.32(b) are essential for the fulfillment of the equality in (7.44). Indeed, this is clearly illustrated by the functions f1(x) := |x| and f2(x) := −|x| from Example 7.25 at x := 0. Furthermore, considering these functions shows that the counterpart of the inclusion in (7.44) fails for the contingent subdifferential (7.34).
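This counterexample is easy to check on a machine as well. In the sketch below (grid approximations as before), the generalized gradients at 0 are recovered via ∂f(0) = [−f◦(0; −1), f◦(0; 1)]: the sum f1 + f2 ≡ 0 has ∂(f1 + f2)(0) = {0}, strictly smaller than ∂f1(0) + ∂f2(0) = [−2, 2].

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v)
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

def gg(f):
    # ∂f(0) on R as the interval [-f°(0; -1), f°(0; 1)]
    return (-clarke(f, 0.0, -1.0), clarke(f, 0.0, 1.0))

f1 = abs
f2 = lambda x: -abs(x)
fsum = lambda x: f1(x) + f2(x)          # identically zero

left = gg(fsum)                                          # {0}
right = (gg(f1)[0] + gg(f2)[0], gg(f1)[1] + gg(f2)[1])   # Minkowski sum [-2, 2]
print(left, right)
```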
Next we consider chain rules for generalized gradients of compositions. The
first result below is a consequence of the generalized gradient reduction to the
convex subdifferential in Proposition 7.30, the chain rule for the generalized
directional derivatives obtained in Theorem 7.19, and the classical chain rule
of convex analysis; see Theorem 3.55. Here we need to impose the additional
completeness assumption on the spaces in question due to its requirement in
Theorem 7.19. The most essential assumption in the following theorem is the
surjectivity of the derivative operator ∇g(x) : X → Y .
Theorem 7.33 Consider the composition (ϕ ◦ g) : X → R of g : X → Y and
ϕ : Y → R under the assumptions and notation of Theorem 7.19. Then we
have the equality-type chain rule
∂(ϕ ◦ g)(x) = ∇g(x)∗ ∂ϕ(y). (7.45)
Furthermore, the composition ϕ ◦ g is directionally regular at x if the outer
function ϕ is directionally regular at y.
Proof. It follows from definition (7.33) that
∂(ϕ ◦ g)(x) = {x∗ ∈ X∗ | ⟨x∗, v⟩ ≤ (ϕ ◦ g)◦(x; v) for all v ∈ X}.
Applying now the chain rule from Theorem 7.19 to (ϕ ◦ g)◦ and using then the convex subdifferential reduction from Proposition 7.30, we get the equality
∂(ϕ ◦ g)(x) = ∂(ψ ◦ A)(0) (7.46)
with the convex function ψ(u) := ϕ◦ (y; u) and the linear continuous operator
A := ∇g(x). To verify (7.45), it remains to employ in the right-hand side
of (7.46) the classical chain rule of convex analysis from Theorem 3.55 while
remembering that ϕ is (Lipschitz) continuous around y.
Assuming finally that ϕ is directionally regular at y, we get by Theorem 7.19 that the composition ϕ ◦ g is directionally regular at x and thus ∂(ϕ ◦ g)(x) = ∂−(ϕ ◦ g)(x), which completes the proof of the theorem. □
The next chain rule for generalized gradients in the above composition
framework of ϕ ◦ g does not impose the rather restrictive surjectivity assump-
tion on ∇g(x) as in Theorem 7.33 while it generally provides only the inclusion
“⊂” in (7.45), which becomes an equality if ϕ is directionally regular at y.
Note that in contrast to Theorem 7.33, we do not assume now that the spaces
X and Y in question are Banach spaces.
Theorem 7.34 Let g : X → Y be a mapping between normed spaces that is C1-smooth around x, and let ϕ : Y → R be locally Lipschitzian around y := g(x). Then we have the inclusion
∂(ϕ ◦ g)(x) ⊂ ∇g(x)∗∂ϕ(y), (7.47)
where equality holds if ϕ is directionally regular at y. In the latter case, the composition ϕ ◦ g is directionally regular at x.
Proof. The Lipschitz continuity of ϕ ◦ g around x is obvious. To verify (7.47), pick any x∗ ∈ ∂(ϕ ◦ g)(x). Using the convexity and weak∗ compactness of (7.33) and employing the Hahn-Banach theorem similar to the proof of Theorem 7.24 allow us to show that the inclusion in (7.47) is equivalent to
(ϕ ◦ g)◦(x; v) ≤ max{⟨y∗, ∇g(x)v⟩ | y∗ ∈ ∂ϕ(y)} for all v ∈ X, (7.48)
where the maximum on the right-hand side of (7.48) reduces to ϕ◦(y; ∇g(x)v) by Theorem 7.31. Thus we arrive at (7.47).
To verify now the equality and regularity statements of the theorem under the directional regularity of ϕ at y, we deduce from the latter that the classical directional derivative of ϕ at y exists and then get the relationships
lim_{t↓0} (ϕ(y + t∇g(x)v) − ϕ(y))/t = lim_{t↓0} ((ϕ ◦ g)(x + tv) − (ϕ ◦ g)(x))/t = (ϕ ◦ g)′(x; v) ≤ (ϕ ◦ g)◦(x; v),
where the first equality holds by the Lipschitz continuity of ϕ and the C1-smoothness of g. This implies that ϕ′(y; ∇g(x)v) = (ϕ ◦ g)′(x; v) ≤ (ϕ ◦ g)◦(x; v) for all v ∈ X. Since the opposite inequality was proved above, we conclude that the composition ϕ ◦ g is directionally regular at x and that equality holds in (7.47). □
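Without directional regularity of ϕ, the inclusion (7.47) can be strict. A hypothetical finite-dimensional sketch: take g : R → R² with g(x) = (x, x) (so ∇g(0) is not surjective) and ϕ(y1, y2) = |y1| − |y2|, which is not directionally regular at the origin. Then ϕ ◦ g ≡ 0, so ∂(ϕ ◦ g)(0) = {0}, while the support function of ∇g(0)∗∂ϕ(0, 0) in a direction v equals ϕ◦((0, 0); (v, v)) = 2|v| by Theorem 7.31, so that ∇g(0)∗∂ϕ(0, 0) = [−2, 2].

```python
def clarke1(f, v, ts=(1e-4, 1e-5), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(0; v) for f on R
    return max((f(d + t * v) - f(d)) / t for d in offs for t in ts)

def clarke2(f, v, ts=(1e-4, 1e-5), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°((0, 0); v) for f on R^2
    return max((f((a + t * v[0], b + t * v[1])) - f((a, b))) / t
               for a in offs for b in offs for t in ts)

phi = lambda y: abs(y[0]) - abs(y[1])   # not directionally regular at (0, 0)
comp = lambda x: phi((x, x))            # (ϕ∘g)(x) = 0 for all x

left = (-clarke1(comp, -1.0), clarke1(comp, 1.0))                 # ∂(ϕ∘g)(0) = {0}
right = (-clarke2(phi, (-1.0, -1.0)), clarke2(phi, (1.0, 1.0)))   # ∇g(0)*∂ϕ(0,0) = [-2, 2]
print(left, right)
```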
Among useful consequences of Theorem 7.34, we next present the following relationships between the full and partial generalized gradients of Lipschitzian
functions of two variables on normed spaces. Note that the proof of this corol-
lary cannot be obtained by using Theorem 7.33, since the derivative of the
inner mapping below is never surjective.
Given a function f : X × Y → R defined on the product of two normed
spaces, consider its full generalized gradient ∂f at (x, y) and the partial ones
∂ x f (x, y) and ∂ y f (x, y) with respect to the corresponding variables.
Corollary 7.35 Let X and Y be normed spaces, and let f : X × Y → R be locally Lipschitzian around (x, y) ∈ X × Y. Then we have the inclusion
∂x f(x, y) ⊂ {x∗ ∈ X∗ | ∃ y∗ ∈ Y∗ such that (x∗, y∗) ∈ ∂f(x, y)}, (7.49)
and a similar one for ∂y f(x, y), where equalities hold in both inclusions if f is directionally regular at (x, y). Furthermore, in the latter case we have
∂f(x, y) ⊂ ∂x f(x, y) × ∂y f(x, y). (7.50)
Proof. To verify (7.49) and the equality therein under the directional regularity assumption, we apply Theorem 7.34 to the composition ϕ ◦ g with g : X → X × Y and ϕ : X × Y → R defined by
g(u) := (u, y) for all u ∈ X and ϕ := f.
To get (7.50), we use the full duality between the generalized gradient and the directional derivative from Theorem 7.31 and the fact that the latter reduces to the classical directional derivative for directionally regular functions. □
The next theorem provides calculus rules for evaluating generalized gradi-
ents of pointwise maxima of finitely many Lipschitzian functions. Recall that
such maximum functions are intrinsically nonsmooth even when those under
maximization are linear as for |x| = max{x, −x}.
Theorem 7.36 Let fi : X → R, i = 1, . . . , m, be locally Lipschitzian around
x on a normed space X. Consider the maximum function fmax from (7.28)
and the active index set I(x) defined in Theorem 7.20. Then we have
∂fmax(x) ⊂ co ⋃_{i∈I(x)} ∂fi(x), (7.51)
where equality holds and fmax is directionally regular at x provided that fi are directionally regular at this point for all i ∈ I(x).
Proof. It is straightforward to check that the maximum function fmax is locally Lipschitzian around x. Following the proof of Theorem 7.20, it is not hard to see (see Exercise 7.125) that the equality formula (7.29) for the generalized directional derivative of fmax becomes the inequality
fmax◦(x; v) ≤ max_{i∈I(x)} fi◦(x; v) for all v ∈ X (7.52)
when the smoothness of fi is replaced by their local Lipschitz continuity. Define further the directional functions
ψi(v) := fi◦(x; v) for all v ∈ X and i = 1, . . . , m
and pick any x∗ ∈ ∂fmax(x). Then we get by employing (7.52) that
⟨x∗, v⟩ ≤ fmax◦(x; v) ≤ max_{i∈I(x)} fi◦(x; v) = max_{i∈I(x)} ψi(v) for all v ∈ X.
Using now Proposition 7.30 and the classical formula of convex analysis for the subdifferentiation of maximum functions (see Theorem 3.59) gives us
x∗ ∈ ∂(max_{i∈I(x)} ψi)(0) = co ⋃_{i∈I(x)} ∂ψi(0) = co ⋃_{i∈I(x)} ∂fi(x),
which justifies the generalized gradient inclusion (7.51). To verify finally the equality in (7.51) under the imposed directional regularity of fi, observe that we have the opposite inequality
dfmax(x; v) ≥ max_{i∈I(x)} dfi(x; v) for all v ∈ X
for (7.2) without any regularity assumptions. This implies by (7.34) that
∂−fmax(x) ⊃ co ⋃_{i∈I(x)} ∂−fi(x) (7.53)
for the contingent subdifferential. Combining (7.51) and (7.53) with the assumed directional regularity of fi for i ∈ I(x) justifies the equality in (7.51) and the directional regularity of the maximum function fmax at x. □
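For a quick check of (7.51) on R (an illustrative grid sketch), take the smooth functions f1(x) = x and f2(x) = −x, so that fmax = |x| and both indices are active at x = 0. Since smooth functions are directionally regular, (7.51) holds here as an equality: ∂fmax(0) = co{∇f1(0), ∇f2(0)} = [−1, 1].

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v)
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

fmax = lambda x: max(x, -x)            # pointwise max of f1(x) = x and f2(x) = -x
gg = (-clarke(fmax, 0.0, -1.0), clarke(fmax, 0.0, 1.0))   # ∂fmax(0)
hull = (min(1.0, -1.0), max(1.0, -1.0))                   # co{∇f1(0), ∇f2(0)} = [-1, 1]
print(gg, hull)
```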
Before presenting the next result, let us discuss its essence.
Remark 7.37 The above calculus rules for the generalized gradients (7.33)
belong to the major part of full calculus that holds not only for (7.33) but
also for some other subdifferential constructions for extended-real-valued l.s.c.
functions. However, the following classical plus-minus symmetry
∂(−f )(x) = −∂f (x) (7.54)
strongly contrasts the generalized gradient of locally Lipschitzian functions
from any other known subdifferentials in nonsmooth analysis, including those
that do not possess adequate calculus rules. This is properly reflected by
the very name “generalized gradient” versus subdifferential. In particular, the
symmetry property does not hold for the subdifferential of convex analysis
(simply because the negative convex function is not convex), for the contingent
subdifferential, and for Rockafellar’s extension of Clarke’s constructions to
l.s.c. functions among other subdifferentials discussed in Section 7.10.
The plus-minus symmetry (7.54) excludes the generalized gradient (7.33)
from the realm of unilateral, one-sided analysis the starting point of which is
convex analysis. The proof of (7.54) given below is based on definition (7.1) of
the generalized directional derivative for locally Lipschitzian functions that is
the driving force of full calculus and other important properties of generalized
gradients. However, there is a price to pay for (7.54) including an undesirable
large size of ∂f (x) and especially the fact that the generalized gradient does
not distinguish between essentially different nonsmooth functions f and −f ,
between maxima and minima, between inequalities with “≤” and “≥” signs,
etc. This yields, in particular, the equivalence between the directional regular-
ity of f and −f at x in the above calculus rules. The mentioned drawbacks can
be avoided by implementing the other, “unconvexified” approach to general-
ized differentiation, suggested by the first author and then developed in many
publications, which leads us to even better full subdifferential calculus while
sacrificing the duality scheme (7.31) and hence the subdifferential convexity;
see Section 7.10 for more discussions and references.
Here is the scalar multiplication rule for the generalized gradient (7.33)
that includes the aforementioned symmetry property.
Proposition 7.38 For any function f : X → R defined on a normed space X
which is locally Lipschitzian around x, we have the multiplication rule
∂(αf )(x) = α∂f (x) whenever α ∈ R. (7.55)
Proof. The function αf is obviously Lipschitz continuous around x for each α ∈ R. If α ≥ 0, property (7.55) immediately follows from (7.24) and definition (7.33). To prove (7.55) for α < 0, it suffices to consider the case where α = −1, i.e., to verify the symmetry property (7.54). In this case, we have by the full duality in Theorem 7.31 that
x∗ ∈ ∂(−f)(x) if and only if ⟨x∗, v⟩ ≤ (−f)◦(x; v) for all v ∈ X.
Furthermore, it follows from Proposition 7.11 that (−f)◦(x; v) = f◦(x; −v) whenever v ∈ X. This tells us that x∗ ∈ ∂(−f)(x) if and only if −x∗ ∈ ∂f(x), and therefore ∂(−f)(x) = −∂f(x), which completes the proof of this proposition. □
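The symmetry (7.54) is visible numerically even for an asymmetric function. For the hypothetical example f(x) = |x| + x one gets f◦(0; 1) = 2 and f◦(0; −1) = 0, hence ∂f(0) = [0, 2]; the grid sketch below checks both the identity (−f)◦(0; v) = f◦(0; −v) used in the proof and the resulting equality ∂(−f)(0) = −∂f(0) = [−2, 0].

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v)
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

f = lambda x: abs(x) + x
nf = lambda x: -f(x)

# (-f)°(0; v) = f°(0; -v) for several directions
sym = [(v, clarke(nf, 0.0, v), clarke(f, 0.0, -v)) for v in (-1.0, 1.0, 2.0)]

g_f = (-clarke(f, 0.0, -1.0), clarke(f, 0.0, 1.0))      # ∂f(0) = [0, 2]
g_nf = (-clarke(nf, 0.0, -1.0), clarke(nf, 0.0, 1.0))   # ∂(-f)(0) = [-2, 0]
print(sym, g_f, g_nf)
```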
We conclude this subsection with subdifferentiation of pointwise minima of finitely many Lipschitzian functions, where the results for the generalized gradient and the contingent subdifferential are drastically different; cf. Exercise 7.134.
Proposition 7.39 Let fi : X → R, i = 1, . . . , m, be locally Lipschitzian functions around x on a normed space X. Define the minimum function
fmin(x) := min{fi(x) | i = 1, . . . , m}, x ∈ X, (7.56)
and denote the collection of active indices at x by
I(x) := {i = 1, . . . , m | fi(x) = fmin(x)}.
Then fmin is locally Lipschitzian around x, and we have the inclusion
∂fmin(x) ⊂ co ⋃_{i∈I(x)} ∂fi(x), (7.57)
which holds as an equality if fi are directionally regular for all i ∈ I(x).
Proof. The assertions follow from Theorem 7.36 for maximum functions by applying the symmetry property (7.54) of the generalized gradient to the representation fmin = −max{−f1, . . . , −fm}. □
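On R, with the smooth (hence directionally regular) functions f1(x) = x and f2(x) = −x, one gets fmin = −|x| with both indices active at 0, and (7.57) holds as the equality ∂fmin(0) = co{1, −1} = [−1, 1]. The contingent subdifferential behaves drastically differently here: ∂−fmin(0) = ∅ by Example 7.25. A grid sketch:

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v)
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

def subder(f, x0, v, ts=(1e-4, 1e-5, 1e-6)):
    # approximation of the subderivative df(x0; v)
    return min((f(x0 + t * v) - f(x0)) / t for t in ts)

fmin = lambda x: min(x, -x)             # = -|x|, both indices active at x = 0
gg = (-clarke(fmin, 0.0, -1.0), clarke(fmin, 0.0, 1.0))   # [-1, 1] = co{1, -1}
cs = (-subder(fmin, 0.0, -1.0), subder(fmin, 0.0, 1.0))   # crossed endpoints: ∂⁻fmin(0) = ∅
print(gg, cs)
```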
7.3.3 Calculus of Contingent Subgradients
In this subsection we obtain two major calculus rules of the equality type for
contingent subgradients of Lipschitzian functions on normed spaces that are
not shared by generalized gradients in the absence of directional regularity.
As demonstrated in Subsection 7.3.2, the driving force of generalized gradient calculus rules (particularly of the inclusion type without any regularity
assumptions) is the convexity of the generalized derivative f ◦ (x; ·), which
ensures the full duality relationship (7.42) between generalized gradients and
generalized directional derivatives. This is not the case for the contingent con-
structions, but now we introduce a new regularity condition that ensures such
a full duality and allows us to proceed with the chain and sum rules for con-
tingent subdifferentials (7.34). This new regularity condition is weaker than
directional regularity even for differentiable Lipschitz continuous functions.
Definition 7.40 Let X be a normed space, and let f : X → R be a locally Lipschitzian function around some point x ∈ X. We say that f is contingently regular at x if
df(x; v) = sup{⟨x∗, v⟩ | x∗ ∈ ∂−f(x)} for all v ∈ X. (7.58)
Since f is locally Lipschitzian around x, the subderivative df(x; v) is finite on X, and hence ∂−f(x) ≠ ∅ in (7.58). Remembering that the latter set is weak∗ closed and bounded by (7.38) tells us that the supremum in (7.58) is achieved. Furthermore, we get from (7.58) that the subderivative v ↦ df(x; v) is convex on the entire space X.
It is obvious that contingent regularity (7.58) holds if f is directionally reg-
ular at x. However, the reverse implication fails even for differentiable locally
Lipschitzian functions on R as demonstrated by Example 7.23. In general,
Proposition 7.27(a) tells us that if f is Gâteaux differentiable at x and locally
Lipschitzian around this point, then f is contingently regular at x, which is
not the case of directional regularity. The calculus rules obtained below reveal
that the contingent regularity is preserved under major operations.
The next theorem is the main result of this subsection. It establishes the
exact chain rule for contingent subgradients of compositions under the con-
tingent regularity assumption on the outer function.
Theorem 7.41 Let g : X → Y be a mapping between normed spaces that
is C 1 -smooth around x, and let ϕ : Y → R be locally Lipschitzian around
y := g(x) and contingently regular at this point. Then the composition ϕ ◦ g is
contingently regular at x, and we have the following chain rule as an equality:
∂ − (ϕ ◦ g)(x) = ∇g(x)∗ ∂ − ϕ(y). (7.59)
Proof. First we verify the inclusion "⊃" in (7.59). Pick any y∗ ∈ ∂−ϕ(y) and v ∈ X, and then deduce from the subderivative chain rule (7.12) that
⟨∇g(x)∗y∗, v⟩ = ⟨y∗, ∇g(x)v⟩ ≤ dϕ(y)(∇g(x)v) = d(ϕ ◦ g)(x)(v),
which yields ∇g(x)∗y∗ ∈ ∂−(ϕ ◦ g)(x) and thus justifies the claimed inclusion without any regularity assumption.
To verify the opposite inclusion in (7.59), suppose on the contrary that there exists x∗ ∈ ∂−(ϕ ◦ g)(x) with x∗ ∉ ∇g(x)∗∂−ϕ(y). The Lipschitz continuity of ϕ around y ensures that the subgradient set ∂−ϕ(y) is bounded and weak∗ closed in Y∗, and thus it is compact in the weak∗ topology of Y∗ by the Alaoglu-Bourbaki theorem as discussed above. Since ∇g(x)∗ : Y∗ → X∗ is a linear continuous operator, we get that the set ∇g(x)∗∂−ϕ(y) is convex and weak∗ compact in X∗. The strict convex separation theorem from Chapter 2 (see Corollary 2.68) gives us ξ ∈ X \ {0} and ε > 0 such that
⟨∇g(x)∗λ, ξ⟩ ≤ ⟨x∗, ξ⟩ − ε for all λ ∈ ∂−ϕ(y).
Employing this together with the imposed contingent regularity of ϕ at y and the subderivative chain rule (7.12), we arrive at the relationships
⟨x∗, ξ⟩ − ε ≥ sup_{λ∈∂−ϕ(y)} ⟨∇g(x)∗λ, ξ⟩ = sup_{λ∈∂−ϕ(y)} ⟨λ, ∇g(x)ξ⟩ = dϕ(y)(∇g(x)ξ) = d(ϕ ◦ g)(x)(ξ)
ensuring that ⟨x∗, ξ⟩ > d(ϕ ◦ g)(x)(ξ). This tells us that x∗ ∉ ∂−(ϕ ◦ g)(x), a contradiction that completes the proof of the subdifferential chain rule (7.59).
It remains to verify that the composition ϕ ◦ g is contingently regular at x under the assumptions made. To proceed, take any v ∈ X and deduce from the chain rule (7.59) just proved, the contingent regularity (7.58) of ϕ at y, and the subderivative chain rule (7.12) that
sup_{x∗∈∂−(ϕ◦g)(x)} ⟨x∗, v⟩ = sup_{λ∈∂−ϕ(y)} ⟨∇g(x)∗λ, v⟩ = sup_{λ∈∂−ϕ(y)} ⟨λ, ∇g(x)v⟩ = dϕ(y)(∇g(x)v) = d(ϕ ◦ g)(x)(v),
which justifies the claimed contingent regularity of the composition ϕ ◦ g and thus completes the proof of the theorem. □
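The chain rule (7.59) can be observed on R with a hypothetical smooth inner map. Take ϕ(y) = |y| (convex, hence contingently regular at 0) and g(x) = 2x + x², so g(0) = 0 and ∇g(0) = 2: then ∂−(ϕ ◦ g)(0) should equal 2 · ∂−ϕ(0) = [−2, 2]. The sketch below recovers both sides from subderivative approximations.

```python
def subder(f, x0, v, ts=(1e-4, 1e-5, 1e-6)):
    # approximation of the subderivative df(x0; v)
    return min((f(x0 + t * v) - f(x0)) / t for t in ts)

phi = abs                       # contingently regular at y = 0
g = lambda x: 2.0 * x + x * x   # C^1 with g(0) = 0 and g'(0) = 2
comp = lambda x: phi(g(x))

lhs = (-subder(comp, 0.0, -1.0), subder(comp, 0.0, 1.0))  # ∂⁻(ϕ∘g)(0)
pl, pu = -subder(phi, 0.0, -1.0), subder(phi, 0.0, 1.0)   # ∂⁻ϕ(0) = [-1, 1]
rhs = (2.0 * pl, 2.0 * pu)                                # ∇g(0)*∂⁻ϕ(0) = [-2, 2]
print(lhs, rhs)
```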
Now we present a useful consequence of Theorem 7.41 that provides a
relationship between full and partial contingent subdifferentials for locally
Lipschitzian functions of two variables.
Corollary 7.42 Let X and Y be normed spaces, and let f : X × Y → R be locally Lipschitzian around (x, y) and contingently regular at this point. Then we have the equality relationship
∂x−f(x, y) = {x∗ ∈ X∗ | ∃ y∗ ∈ Y∗ such that (x∗, y∗) ∈ ∂−f(x, y)}, (7.60)
and the corresponding equality for ∂y−f(x, y).
Proof. Proceed as in the proof of Corollary 7.35 with the usage of Theorem 7.41 instead of Theorem 7.34. □
Finally, we derive the following exact sum rule for contingent subgradients
and the contingent regularity statement, which follow from Theorem 7.41.
Theorem 7.43 Let X be a normed space, and let f1, f2 : X → R be locally Lipschitzian functions around some point x ∈ X. Suppose that both functions are contingently regular at x. Then the sum f1 + f2 is also contingently regular at this point, and we have
∂−(f1 + f2)(x) = ∂−f1(x) + ∂−f2(x). (7.61)
Proof. Similar to the proof of Theorem 7.17, define ϕ : X × X → R by ϕ(u, z) := f1(u) + f2(z) and g : X → X × X by g(u) := (u, u). It is easy to deduce directly from Definition 7.40 that the contingent regularity of f1 and f2 at x yields this property for the function ϕ at y := (x, x). We also deduce directly from definition (7.34) of the contingent subdifferential that
∂−ϕ(x, x) = ∂−f1(x) × ∂−f2(x).
Observing the representation f1 + f2 = ϕ ◦ g and applying the chain rule from Theorem 7.41 to this composition at y = (x, x) tell us that the sum f1 + f2 is contingently regular at x and that
∂−(f1 + f2)(x) = ∂−(ϕ ◦ g)(x) = ∇g(x)∗∂−ϕ(y) = ∂−f1(x) + ∂−f2(x).
Thus we are done with the proof of the theorem. □
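For instance, on R the convex (hence contingently regular) functions f1(x) = |x| and f2(x) = max{x, 0} have ∂−f1(0) = [−1, 1] and ∂−f2(0) = [0, 1], and (7.61) predicts ∂−(f1 + f2)(0) = [−1, 2]. A grid sketch of this equality:

```python
def subder(f, x0, v, ts=(1e-4, 1e-5, 1e-6)):
    # approximation of the subderivative df(x0; v)
    return min((f(x0 + t * v) - f(x0)) / t for t in ts)

def cs(f):
    # ∂⁻f(0) on R as the interval [-df(0; -1), df(0; 1)]
    return (-subder(f, 0.0, -1.0), subder(f, 0.0, 1.0))

f1 = abs
f2 = lambda x: max(x, 0.0)
fsum = lambda x: f1(x) + f2(x)

lhs = cs(fsum)                                        # ∂⁻(f1 + f2)(0) = [-1, 2]
rhs = (cs(f1)[0] + cs(f2)[0], cs(f1)[1] + cs(f2)[1])  # Minkowski interval sum
print(lhs, rhs)
```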
7.4 Mean Value Theorems and More Calculus
This section brings us back to nonsmooth versions of the fundamental mean
value theorem the importance of which has been highly recognized in clas-
sical analysis. An extension of this fundamental result to convex continuous
functions on topological vector spaces was established in Theorem 5.27 from
Section 5.3 in terms of subgradients of convex analysis. Here we present two
independent versions of the mean value theorem expressed in terms of gener-
alized gradients and contingent subgradients of Lipschitz continuous functions
on normed spaces. These results are formulated and proved in a parallel way,
and the proofs of both versions are based on the corresponding chain rules
established for these constructions in Subsections 7.3.2 and 7.3.3, respectively.
In turn, the main version of the mean value theorem is used in the second
part of this section to obtain other calculus rules for generalized gradients.
7.4.1 Mean Value Theorems for Lipschitzian Functions
Here we derive the aforementioned versions of the mean value theorem for
Lipschitz continuous functions defined on normed spaces. Then this theorem
is applied to characterizing function convexity via the monotonicity property
of the generalized gradient and contingent subgradient mappings.
Theorem 7.44 Let f : X → R be a function defined on a normed space X, and let a, b ∈ X. Assume that f is Lipschitz continuous on an open set U containing the line segment [a, b]. Then the following assertions hold:
(a) There exists a point c ∈ (a, b) such that
f(b) − f(a) ∈ ⟨∂f(c), b − a⟩. (7.62)
(b) If in addition both functions f and −f are contingently regular on U, then there exists a point c ∈ (a, b) for which
f(b) − f(a) ∈ ⟨∂−f(c) ∪ (−∂−(−f)(c)), b − a⟩. (7.63)
Proof. Consider the function of one variable defined by
θ(t) := f(a + t(b − a)) + t(f(a) − f(b)), t ∈ [0, 1].
This function is obviously continuous on [0, 1] with θ(0) = θ(1) = f(a). The classical Weierstrass existence theorem tells us that θ attains both its global minimum and maximum on the compact interval [0, 1]. Unless θ is constant on [0, 1], when both statements (7.62) and (7.63) are trivial, either the minimum or the maximum point lies in (0, 1). Let τ ∈ (0, 1) be a minimum point of θ on [0, 1], which is thus a local minimizer for this function. Observe that θ can be represented via the composition
θ(t) = (f ◦ g)(t) + t(f(a) − f(b)) with g(t) := a + t(b − a), t ∈ [0, 1].
It follows from the generalized Fermat rules in Proposition 7.26, combined with the sum rules for the smooth term t(f(a) − f(b)), that
f(b) − f(a) ∈ ∂−(f ◦ g)(τ) and f(b) − f(a) ∈ ∂(f ◦ g)(τ). (7.64)
Since the function g : [0, 1] → X is smooth, we can apply to this composition the chain rules obtained for generalized gradients and contingent subgradients in Theorems 7.34 and 7.41, respectively. Thus we deduce from (7.64) that
f(b) − f(a) ∈ ⟨∂f(c), b − a⟩ and f(b) − f(a) ∈ ⟨∂−f(c), b − a⟩
with c := a + τ(b − a), where the second inclusion holds provided that the function f is contingently regular at the point c.
In the alternative case where τ ∈ (0, 1) is a local maximizer of θ on [0, 1], the point τ is a local minimizer of the function −θ on this interval. Applying to the latter function the same arguments as above, observe that nothing changes in the case of generalized gradients due to the symmetry property (7.54) verified in Proposition 7.38. Since the symmetry property fails for contingent subgradients, we arrive in this case at the inclusions
f(b) − f(a) ∈ ⟨∂f(c), b − a⟩ and f(b) − f(a) ∈ −⟨∂−(−f)(c), b − a⟩
with c = a + τ(b − a), where the second one holds when the function −f is contingently regular at c. Unifying the two cases above brings us to (7.62) and (7.63), and thus completes the proof of the theorem. □
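The proof suggests a direct numerical illustration on R (a sketch with hypothetical data f(x) = |x|, a = −1, b = 2). Here θ(t) = |3t − 1| − t attains its interior minimum at τ = 1/3, giving the intermediate point c = 0, where f(b) − f(a) = 1 indeed lies in ⟨∂f(0), b − a⟩ = [−3, 3] since 1/3 ∈ ∂f(0) = [−1, 1].

```python
def clarke(f, x0, v, ts=(1e-4, 1e-5, 1e-6), offs=(-1e-3, 0.0, 1e-3)):
    # grid approximation of f°(x0; v)
    return max((f(x0 + d + t * v) - f(x0 + d)) / t for d in offs for t in ts)

f = abs
a, b = -1.0, 2.0
theta = lambda t: f(a + t * (b - a)) + t * (f(a) - f(b))

# locate an interior minimizer of θ on [0, 1] by grid search
N = 3000
tau = min((k / N for k in range(N + 1)), key=theta)
c = a + tau * (b - a)

# the subgradient required by (7.62) and the interval ∂f(c)
s = (f(b) - f(a)) / (b - a)
lo, hi = -clarke(f, c, -1.0), clarke(f, c, 1.0)
print(tau, c, s, (lo, hi))
```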
Let us discuss some specific features of both statements in Theorem 7.44.
Remark 7.45 The following observations shed light on the independent versions of the mean value theorem obtained above:
(a) Although assertion (a) of Theorem 7.44 does not require any additional assumptions on f, the set involved in the inclusion (7.62) may be significantly larger than the one in (7.63), as for the function f : R → R taken from Example 7.7.
(b) The contingent regularity assumption of Theorem 7.44(b) is satisfied for
any Gâteaux differentiable function f : X → R, which may be not direc-
tionally regular, and thus (b) does not reduce to (a) for such functions.
(c) It is not hard to deduce from the definitions that for a locally Lipschitzian
function f around x the sets ∂⁻f(x) and ∂⁻(−f)(x) are nonempty
simultaneously if and only if f is Gâteaux differentiable at x.
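The mean value inclusions above can be experimented with numerically in one dimension. The following sketch (our illustration, not part of the text; the helper names `clarke_subdiff_abs` and `mean_value_witness` are ours) checks the inclusion f(b) − f(a) ∈ ⟨∂f(c), b − a⟩ of Theorem 7.44(a) for the nonsmooth convex function f(x) = |x|, whose generalized gradient is [−1, 1] at the kink c = 0 and {sign c} elsewhere.

```python
# Numerical check of the nonsmooth mean value inclusion of Theorem 7.44(a)
# for f(x) = |x| on R: there must exist c strictly between a and b with
# f(b) - f(a) contained in <generalized gradient of f at c, b - a>.

def clarke_subdiff_abs(c, tol=0.0):
    """Generalized gradient of f(x) = |x| at c, as an interval (lo, hi);
    points within tol of the kink at 0 are treated as the kink itself."""
    if abs(c) <= tol:
        return (-1.0, 1.0)
    return (1.0, 1.0) if c > 0 else (-1.0, -1.0)

def mean_value_witness(a, b, samples=10000):
    """Scan the open interval (a, b) for a point c whose generalized
    gradient contains the slope (f(b) - f(a)) / (b - a), with f = abs."""
    f = abs
    slope = (f(b) - f(a)) / (b - a)
    step = (b - a) / samples
    for i in range(1, samples):
        c = a + step * i
        lo, hi = clarke_subdiff_abs(c, tol=abs(step))
        if lo - 1e-9 <= slope <= hi + 1e-9:
            return c
    return None

# For a = -1, b = 2 the required slope is 1/3, which lies in the
# generalized gradient [-1, 1] only at the kink c = 0.
print("witness c =", mean_value_witness(-1.0, 2.0))
```

As expected, the only admissible intermediate point here is (numerically near) the kink, which illustrates why the theorem asserts existence of some c in the open segment rather than of a smooth point.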
Using the obtained versions of the mean value theorem, we now provide the
following characterizations of function convexity via the monotonicity property
(5.31) of the subgradient mappings under consideration.
Theorem 7.46 Let U be a nonempty open and convex subset of a normed
space X, and let f : U → R be a locally Lipschitzian function around every
point in U. Then we have the assertions:
(a) The mapping ∂f : U ⇒ X∗ is monotone on U if and only if the function
f is convex on this set.
(b) Under the additional assumption that both f and −f are contingently
regular on U, the convexity of f on U is equivalent to the monotonicity
of the mapping x ↦ ∂⁻f(x) ∪ (−∂⁻(−f)(x)) on this set.
Proof. If the function f is convex on U, then both constructions ∂f(x) and

∂₋f(x) := ∂⁻f(x) ∪ (−∂⁻(−f)(x))   (7.65)

reduce to the subgradient mapping of convex analysis due to Propositions 7.27,
7.28 and Remark 7.45(c). Thus the monotonicity (even maximal monotonicity)
of these subgradient mappings follows from Theorem 5.34.
To prove the reverse implication for ∂f and ∂₋f, let us start with the
generalized gradient mapping in (a). We first show that the monotonicity of
∂f yields, whenever x ∈ U, the fulfillment of the representation

∂f(x) = {x∗ ∈ X∗ | ⟨x∗, u − x⟩ ≤ f(u) − f(x) for all u ∈ U}   (7.66)

and then confirm that (7.66) implies that f is convex on U.
The inclusion “⊃” in (7.66) is obvious. To verify the opposite inclusion
therein, fix x ∈ U and then pick any x∗ ∈ ∂f(x) and u ∈ U. Applying
Theorem 7.44(a) on the interval [u, x] ⊂ U, find c ∈ (u, x) and u∗ ∈ ∂f(c)
such that f(x) − f(u) = ⟨u∗, x − u⟩. Note that the locally Lipschitzian property
of f guarantees its Lipschitz continuity on an open set containing [u, x] by
the compactness of this line segment. Then the assumed monotonicity of ∂f
ensures that the inclusion “⊂” in (7.66) is satisfied: since x − c is a positive
multiple of x − u, the monotonicity of ∂f yields ⟨u∗, x − u⟩ ≤ ⟨x∗, x − u⟩,
and hence ⟨x∗, u − x⟩ ≤ f(u) − f(x).
Having representation (7.66), let us show that f is convex on U. Taking
arbitrary vectors u, x ∈ U, form their convex combination w := λx + (1 − λ)u
for some λ ∈ [0, 1]. Choose w∗ ∈ ∂f(w) and get by (7.66) that

⟨w∗, x − w⟩ ≤ f(x) − f(w),
⟨w∗, u − w⟩ ≤ f(u) − f(w).

Multiplying the first inequality by λ, multiplying the second inequality by
1 − λ, and adding the resulting inequalities give

0 ≤ λf(x) + (1 − λ)f(u) − f(w)

(observe that λ(x − w) + (1 − λ)(u − w) = 0), which justifies the convexity of f.
To proceed further with case (b) of ∂₋f from (7.65), observe that the
above proof in case (a) ensures the convexity of f if the mapping x ↦ ∂₋f(x)
is monotone on U. Indeed, we just need to repeat the previous arguments
using Theorem 7.44(b) instead of assertion (a) therein, which thus completes
the proof of this theorem. □
We conjecture that ∂₋f can be replaced by ∂⁻f in Theorem 7.46(b) provided
that X is a Banach space admitting a Gâteaux smooth renorming, in
particular, if X is any separable Banach space. The reader is referred to Exercise
7.136(b) with the hint therein for more details. In Section 7.10 we also
discuss a general version of Theorem 7.46 for the class of extended-real-valued
l.s.c. functions by using the approximate mean value theorem instead of its
Lipschitzian counterpart in Theorem 7.44.
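The monotonicity characterization of Theorem 7.46(a) is easy to observe numerically on the real line. The sketch below (our illustration, not from the book; the names `subgrad` and `is_monotone` are ours) verifies ⟨x∗ − u∗, x − u⟩ ≥ 0 for a selection from the generalized gradient of the convex function |x|, and shows that monotonicity fails for the nonconvex function −|x|.

```python
# Numerical illustration of Theorem 7.46(a) on R: the generalized gradient
# of the convex function |x| is a monotone mapping, while for the
# nonconvex function -|x| monotonicity fails.

def subgrad(f_kind, x):
    """A selection from the generalized gradient at x (1-D sketch):
    sign(x) for |x|, and -sign(x) for -|x|; 0 at the kink is admissible."""
    s = 0.0 if x == 0 else (1.0 if x > 0 else -1.0)
    return s if f_kind == "abs" else -s

def is_monotone(f_kind, points):
    """Check <x* - u*, x - u> >= 0 over all pairs of sample points."""
    return all(
        (subgrad(f_kind, x) - subgrad(f_kind, u)) * (x - u) >= 0
        for x in points for u in points
    )

pts = [i / 10.0 for i in range(-20, 21)]
print("abs monotone:", is_monotone("abs", pts))      # convex    -> True
print("neg-abs monotone:", is_monotone("neg", pts))  # nonconvex -> False
```

The failure for −|x| occurs across the kink: taking x = 1 and u = −1 gives ⟨x∗ − u∗, x − u⟩ = −4 < 0, in line with the theorem.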
7.4.2 Additional Calculus Rules for Generalized Gradients
In this subsection we obtain yet another major chain rule for generalized gra-
dients, which—in contrast to the previous chain rules for generalized gradients
and contingent subgradients—addresses compositions where inner mappings
are Lipschitzian while not necessarily differentiable. The proof is based on
the mean value Theorem 7.44(a) and the robustness of generalized gradients,
which fails for contingent subgradients even under contingent regularity as for
the function f : R → R from Example 7.7. The established chain rule allows
us to derive product and quotient rules for generalized gradients by reducing
these operations to compositions with nonsmooth inner mappings.
Let us first present the following monotonicity relation for support functions
of closed convex sets, which is of independent interest.
Lemma 7.47 Let Ω1 , Ω2 ⊂ X be two nonempty, closed, and convex subsets
of an LCTV space X, and let σΩ1 and σΩ2 be their support functions (4.8).
Then the support function relationship
σΩ1(x∗) ≤ σΩ2(x∗) for all x∗ ∈ X∗   (7.67)

is equivalent to the inclusion Ω1 ⊂ Ω2.
Proof. We only need to verify that (7.67) yields Ω1 ⊂ Ω2 since the opposite
implication is obvious. Suppose on the contrary that Ω1 is not a subset of Ω2,
i.e., there exists x ∈ Ω1 with x ∉ Ω2. Applying the strict separation theorem
to the set Ω2 and the point x and using the support function definition give us
a linear functional x∗ ∈ X∗ such that

σΩ2(x∗) = sup{⟨x∗, u⟩ | u ∈ Ω2} < ⟨x∗, x⟩ ≤ σΩ1(x∗).

This contradicts (7.67) and thus completes the proof of the lemma. □
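For polytopes in R² given by finite generator sets, Lemma 7.47 is easy to probe numerically, since the support function of a convex hull is the maximum of ⟨x∗, ·⟩ over the generators. The sketch below is our illustration (the names `support` and `dominated` are ours); sampling finitely many directions gives only a necessary check of (7.67), with the full equivalence holding over all directions.

```python
# Sketch of Lemma 7.47 in R^2 for polytopes co(points): the support
# function is the maximum of <x*, w> over the generators w, and
# sigma_{Omega1} <= sigma_{Omega2} over (sampled) directions corresponds
# to the inclusion Omega1 subset of Omega2.

import math

def support(points, xstar):
    """Support function of co(points) evaluated at the direction xstar."""
    return max(p[0] * xstar[0] + p[1] * xstar[1] for p in points)

def dominated(omega1, omega2, n_dirs=360):
    """Check sigma_{omega1} <= sigma_{omega2} along n_dirs unit directions."""
    for k in range(n_dirs):
        t = 2 * math.pi * k / n_dirs
        d = (math.cos(t), math.sin(t))
        if support(omega1, d) > support(omega2, d) + 1e-12:
            return False
    return True

square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]   # co = unit square
diamond = [(1, 0), (-1, 0), (0, 1), (0, -1)]    # co = unit ball of l^1
print(dominated(diamond, square))  # diamond inside square: True
print(dominated(square, diamond))  # square not inside diamond: False
```

The second call fails already in the diagonal direction, where the square has support √2 while the diamond has support 1/√2, exactly as the separation argument in the proof predicts.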
Now we are ready to derive the aforementioned chain rule of the inclusion
type for generalized gradients of Lipschitzian compositions.
Theorem 7.48 Let g : X → Rⁿ be a locally Lipschitzian mapping around x
defined on a normed space X, and let ϕ : Rⁿ → R be a locally Lipschitzian
function around y := g(x). Then we have the inclusion

∂(ϕ ◦ g)(x) ⊂ co∗{∂⟨y∗, g⟩(x) | y∗ ∈ ∂ϕ(y)},   (7.68)

where co∗ stands for the weak∗ closure of the convex hull in X∗.
Proof. It is easy to check that the composition ϕ ◦ g is locally Lipschitzian
around x. Fix any v ∈ X and choose sequences {xk} ⊂ X converging to x and
{tk} ⊂ (0, ∞) with tk ↓ 0 as k → ∞ such that

(ϕ ◦ g)°(x; v) = lim_{k→∞} [(ϕ ◦ g)(xk + tk v) − (ϕ ◦ g)(xk)]/tk
= lim_{k→∞} [ϕ(g(xk + tk v)) − ϕ(g(xk))]/tk.
Applying the mean value result from Theorem 7.44(a) to the function ϕ on the
interval [g(xk), g(xk + tk v)] for each k ∈ N gives us a point yk ∈ (g(xk), g(xk +
tk v)) and a subgradient yk∗ ∈ ∂ϕ(yk) such that

[ϕ(g(xk + tk v)) − ϕ(g(xk))]/tk = ⟨yk∗, (g(xk + tk v) − g(xk))/tk⟩.
It follows from the above that yk → y as k → ∞. Furthermore, since
Theorem 7.24 ensures that the sequence {yk∗} is bounded in Rⁿ and that the
generalized gradient ∂ϕ(·) is robust in the sense of (7.37), we may suppose
without loss of generality that there is y∗ ∈ ∂ϕ(y) such that yk∗ → y∗ as k → ∞.
Next we consider the function ψ : X → R defined by

ψ(x) := ⟨y∗, g(x)⟩ for all x ∈ X
and apply to it the mean value theorem from (7.62) on the interval [xk, xk +
tk v]. This gives us uk ∈ (xk, xk + tk v) and xk∗ ∈ ∂⟨y∗, g⟩(uk) such that

⟨y∗, (g(xk + tk v) − g(xk))/tk⟩ = ⟨xk∗, v⟩ for every k ∈ N,
where {uk} converges to x as k → ∞. Since the numerical sequence {⟨xk∗, v⟩}
is bounded, we may suppose without loss of generality that there exists α ∈ R
such that limk→∞ ⟨xk∗, v⟩ = α. By the boundedness of the sequence {xk∗} ⊂ X∗
due to the boundedness of generalized gradients, it follows from the Alaoglu-
Bourbaki theorem that there exists a subnet of {xk∗} that converges to some
x∗ ∈ X∗ in the weak∗ topology of X∗. This allows us to conclude that

lim_{k→∞} ⟨xk∗, v⟩ = α = ⟨x∗, v⟩,
and in addition we have that x∗ ∈ ∂⟨y∗, g⟩(x) due to the robustness (7.37) of
the generalized gradient mapping; see Theorem 7.24. Combining the above yields
(ϕ ◦ g)°(x; v) = lim_{k→∞} [ϕ(g(xk + tk v)) − ϕ(g(xk))]/tk
= lim_{k→∞} ⟨yk∗, (g(xk + tk v) − g(xk))/tk⟩
= lim_{k→∞} [⟨yk∗ − y∗, (g(xk + tk v) − g(xk))/tk⟩ + ⟨y∗, (g(xk + tk v) − g(xk))/tk⟩]
= lim_{k→∞} [⟨yk∗ − y∗, (g(xk + tk v) − g(xk))/tk⟩ + ⟨xk∗, v⟩] = ⟨x∗, v⟩,

where the first summand in the last limit vanishes since yk∗ − y∗ → 0 and the
difference quotients of the Lipschitzian mapping g are bounded.
Denote further Ω1 := ∂(ϕ ◦ g)(x), and let Ω2 be the set on the right-hand
side of (7.68). Then both these sets are nonempty, closed, and convex in the
space X∗ equipped with the weak∗ topology. We know that the dual space to
the latter LCTV space is X. Hence it follows from the above due to the full
duality representation (7.42) of Theorem 7.31 that

σΩ1(v) = (ϕ ◦ g)°(x; v) ≤ σΩ2(v) for all v ∈ X.

Thus we conclude from Lemma 7.47 that Ω1 ⊂ Ω2, which gives us (7.68) and
completes the proof of the theorem. □
The obtained chain rule easily leads us to evaluations of generalized gradients
for products and quotients of locally Lipschitzian functions. First we
present the following product rule of the inclusion type.
Corollary 7.49 Let f1, f2 : X → R be locally Lipschitzian around x on a
normed space X. Then the product f1 · f2 is also locally Lipschitzian around
x, and we have the inclusion

∂(f1 · f2)(x) ⊂ f2(x)∂f1(x) + f1(x)∂f2(x).   (7.69)
Proof. The local Lipschitz continuity of f1 · f2 is obvious. Observe that the
product f1 · f2 can be written as the composition ϕ ◦ g of the mappings
g : X → R² and ϕ : R² → R defined by

g(x) := (f1(x), f2(x)) and ϕ(y1, y2) := y1 · y2.

Then inclusion (7.69) follows from the one in (7.68), where the operation co∗ is
superfluous, and the scalar multiplication rule from Proposition 7.38. □
The final result of this subsection is the following quotient rule for gener-
alized gradients of Lipschitzian functions.
Corollary 7.50 Let f2(x) ≠ 0 in the setting of Corollary 7.49. Then the
quotient f1/f2 is locally Lipschitzian around x, and we have the inclusion

∂(f1/f2)(x) ⊂ [f2(x)∂f1(x) − f1(x)∂f2(x)] / f2²(x).

Proof. Proceed as in the proof of Corollary 7.49 with ϕ(y1, y2) := y1/y2. □
7.5 Strict Differentiability and Generalized Gradients
As seen above, the generalized gradient of locally Lipschitzian functions is not
a proper extension of the classical notion(s) of differentiability to nondifferen-
tiable functions at the point in question, even in the case of functions on the
real line. Indeed, we get for the differentiable Lipschitzian function f : R → R
from Example 7.7 that ∂f(0) = [−1, 1] while f′(0) = 0. This is a significant
difference between ∂f and the contingent subdifferential ∂⁻f, which is
a proper extension of the classical derivative at a point. On the other hand,
we do have the reduction of the generalized gradient to the classical
derivative/gradient at the reference point provided that the function is continuously
differentiable around this point.
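Example 7.7 itself is not reproduced in this excerpt; a standard function exhibiting exactly the behavior just described is f(x) = x² sin(1/x) (with f(0) := 0), and the sketch below, which is our illustration, samples its derivative near the origin. The function is differentiable everywhere with f′(0) = 0, yet f′(x) = 2x sin(1/x) − cos(1/x) has cluster values filling [−1, 1] as x → 0, so the generalized gradient at 0 is the whole interval [−1, 1].

```python
# A classical stand-in for Example 7.7 (assumed here, since the example is
# not reproduced in this excerpt): f(x) = x^2 sin(1/x), f(0) = 0.
# f is differentiable with f'(0) = 0, but f'(x) = 2x sin(1/x) - cos(1/x)
# oscillates, and its cluster values as x -> 0 fill [-1, 1].

import math

def fprime(x):
    """Classical derivative of f(x) = x^2 sin(1/x) (f'(0) = 0)."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) if x != 0 else 0.0

# sample derivative values along x_k = 1/(pi k) -> 0, where the cos term
# alternates between -1 and +1
vals = [fprime(1 / (math.pi * k)) for k in range(1, 2001)]
print("min/max of sampled f'(x) near 0:", min(vals), max(vals))
```

The sampled derivative values approach both −1 and 1, which is consistent with ∂f(0) = [−1, 1] even though f′(0) = 0; this is precisely the non-pointwise character of the generalized gradient discussed above.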
It has been understood in real analysis (in the second half of the twenti-
eth century) that there is a notion of differentiability at a point, which lies
between the usual point differentiability and the continuous differentiability
in a neighborhood. This notion is known as strict differentiability. In fact,
there are several modifications of this notion corresponding to the standard
derivatives at a point that are generally different for non-Lipschitzian func-
tions on infinite-dimensional spaces. We consider here strict versions of the
three major ones that go back to Maurice Fréchet, René Gâteaux, and Jacques
Hadamard.
It has also been realized in classical analysis that strict differentiability,
although being defined merely at the point in question, is much closer to the
continuous differentiability in a neighborhood than to the usual differentia-
bility at the point. This is due to the limit uniformity in its definition with
respect to points nearby; see below. Such a uniformity yields the robustness
of strict derivatives with respect to small perturbations of the reference point
and leads us to the unconditional extension of the major results of classical
analysis from continuously differentiable to strictly differentiable mappings.
A similar uniformity is reflected in the constructions of Clarke’s gener-
alized directional derivative and generalized gradient, which makes the lat-
ter a proper nonsmooth extension of strict differentiability. This is demon-
strated in this section. The first subsection presents the definitions of Gâteaux,
Hadamard, and Fréchet strict differentiability and establishes relationships
between them. The second subsection shows, roughly speaking, that the gen-
eralized gradient is a singleton if and only if the function under consideration
in the corresponding setting is strictly differentiable in an appropriate sense.
7.5.1 Notions of Strict Differentiability
In this subsection we consider functions f : X → R defined on normed spaces.
Although we do not assume a priori that f is Lipschitz continuous around a
given point x, the latter property naturally and frequently appears in what
follows. Recall that, unless otherwise stated, all the spaces below are normed.
Here are the definitions of function strict differentiability in the sense of
Gâteaux, Hadamard, and Fréchet. If no confusion arises, we do not distinguish
in notation between the usual and strict derivatives of f at x.
Definition 7.51 Let f : X → R be a real-valued function on a normed space
X, and let x ∈ X. Define the following notions:
(a) The function f is said to be strictly Gâteaux differentiable at x
if there exists x∗ ∈ X∗ such that we have

lim_{u→x, t↓0} [f(u + tv) − f(u) − t⟨x∗, v⟩]/t = 0 for all v ∈ X.

The element x∗, necessarily unique, is called the strict Gâteaux derivative
of f at x and is denoted by f′_G(x).
(b) The function f is said to be strictly Hadamard differentiable at
x if there exists x∗ ∈ X∗ such that we have

lim_{u→x, t↓0} [f(u + tv) − f(u) − t⟨x∗, v⟩]/t = 0 for all v ∈ X,

where the convergence is uniform for v over compact subsets of X. The
element x∗, necessarily unique, is called the strict Hadamard derivative
of f at x and is denoted by f′_H(x).
(c) The function f is strictly Fréchet differentiable at x if there
exists x∗ ∈ X∗ such that we have

lim_{u,w→x, u≠w} [f(u) − f(w) − ⟨x∗, u − w⟩]/‖u − w‖ = 0.

The element x∗, necessarily unique, is called the strict Fréchet derivative
of f at x and is denoted by f′_F(x).
Remark 7.52 It follows from the definitions (see also Exercise 7.139) that

strict Fréchet differentiability =⇒ strict Hadamard differentiability
=⇒ strict Gâteaux differentiability.

Observe also that if f is strictly Hadamard differentiable at x, then there
exists x∗ ∈ X∗ such that for any ε > 0 and for any compact subset C of X
we can find a positive number δ satisfying

|f(u + tv) − f(u) − t⟨x∗, v⟩|/t < ε

whenever 0 < t < δ, ‖u − x‖ < δ, and v ∈ C.
Next we show that the Lipschitz continuity of f around x is a consequence
of its strict Hadamard (and hence strict Fréchet) differentiability at this point.
Theorem 7.53 Let X be a normed space. If f : X → R is strictly Hadamard
differentiable at x ∈ X, then it is locally Lipschitzian around this point.
Proof. It is instructive to show first that the claimed local Lipschitz continuity
of f easily follows from its strict Fréchet differentiability. Indeed, supposing
the latter and denoting x∗ := f′_F(x) allow us to find δ > 0 such that

|f(x) − f(u) − ⟨x∗, x − u⟩| ≤ ‖x − u‖ whenever x, u ∈ B(x; δ).

With ℓ := ‖x∗‖ + 1, we immediately get the local Lipschitz property

|f(x) − f(u)| ≤ ℓ‖x − u‖ for all x, u ∈ B(x; δ).
Consider further the general case where f is strictly Hadamard differentiable
at x with x∗ := f′_H(x). Suppose on the contrary that f is not
locally Lipschitzian around x and then for any k ∈ N find points
xk, uk ∈ B(x; 1/k) such that

|f(xk) − f(uk)| = |f(uk + tk (xk − uk)/tk) − f(uk)| > k‖xk − uk‖.   (7.70)

Choosing now tk := √k‖xk − uk‖ gives us 0 < tk ≤ 2/√k → 0 as k → ∞.
Furthermore, with vk := (xk − uk)/tk we get from (7.70) that

√k < |f(uk + tk vk) − f(uk)|/tk for all k ∈ N.

Observe that ‖vk‖ = 1/√k → 0 as k → ∞ and form the compact set

C := {vk | k ∈ N} ∪ {0}.
It follows from the definition of strict Hadamard differentiability that
[f(uk + tk vk) − f(uk)]/tk − ⟨x∗, vk⟩ → 0 as k → ∞.

This readily leads us to a contradiction due to the estimate

√k − ‖x∗‖ · ‖vk‖ ≤ [f(uk + tk vk) − f(uk)]/tk − ⟨x∗, vk⟩

and the fact that √k − ‖x∗‖ · ‖vk‖ → ∞ as k → ∞. □
The next theorem establishes the relationship between strict Gâteaux dif-
ferentiability and strict Hadamard differentiability of real-valued functions.
Theorem 7.54 Let X be a normed space. Given f : X → R and x ∈ X, the
following are equivalent:
(a) f is strictly Hadamard differentiable at x.
(b) f is locally Lipschitz continuous around x and strictly Gâteaux differen-
tiable at this point.
Proof. Implication (a)=⇒(b) follows from Remark 7.52 and Theorem 7.53.
To verify (b)=⇒(a), suppose on the contrary that f is not strictly Hadamard
differentiable at x, and denote by x∗ := f′_G(x) its strict Gâteaux derivative.
Then there exist a compact set C ⊂ X and a number ε0 > 0 along with
sequences vk ∈ C, tk ↓ 0, and xk → x as k → ∞ such that

ε0 < [f(xk + tk vk) − f(xk)]/tk − ⟨x∗, vk⟩ for all k ∈ N.
By the compactness of C, assume without loss of generality that the sequence
{vk} converges to some v ∈ C as k → ∞. Then for k sufficiently large we have

[f(xk + tk v) − f(xk)]/tk − ⟨x∗, v⟩
= [f(xk + tk vk) − f(xk)]/tk − ⟨x∗, vk⟩
+ [f(xk + tk v) − f(xk + tk vk)]/tk − (⟨x∗, v⟩ − ⟨x∗, vk⟩)
≥ [f(xk + tk vk) − f(xk)]/tk − ⟨x∗, vk⟩ − (ℓ + ‖x∗‖)‖vk − v‖
≥ ε0 − (ℓ + ‖x∗‖)‖vk − v‖,

where ℓ is a Lipschitz constant of f around x. This leads us to a contradiction,
due to the imposed strict Gâteaux differentiability of f at x and the fact that
‖vk − v‖ → 0 as k → ∞, and thus completes the proof of the theorem. □
We conclude this subsection with two useful observations, leaving their
verifications as exercises for the reader.
Remark 7.55 Observe the following:
(a) In finite-dimensional spaces, the strict Fréchet differentiability of f at x
is equivalent to the strict Hadamard differentiability of f at this point.
(b) For locally Lipschitzian functions, a natural question is whether the strict
Hadamard differentiability of f at x implies the strict Fréchet differentiability
of f at this point. The answer is negative: consider the norm
function f(x) := ‖x‖ in the space of sequences X := ℓ¹ at a suitable point x ≠ 0.
7.5.2 Single-Valuedness of Generalized Gradients
This subsection establishes close relationships between the strict differentiability
notions considered above and the single-valuedness of the generalized gradient
at the reference point. The first theorem gives us a complete characterization
of the latter property via strict Hadamard differentiability. In particular, it
significantly improves the result in Proposition 7.28(a).

Theorem 7.56 Let X be a normed space. Given f : X → R, fix a point x ∈ X.


The following assertions hold:
(a) If f is strictly Hadamard differentiable at x, then f is locally Lipschitzian
around this point and ∂f (x) = {x∗ } = {fH 
(x)}.
(b) Conversely, if f is locally Lipschitzian around x and ∂f (x) = {x∗ }, then

f is strictly Hadamard differentiable at x with fH (x) = x∗ .
Proof. Suppose that f is strictly Hadamard differentiable at x and deduce
from Theorem 7.54 that it is locally Lipschitzian around x and strictly
Gâteaux differentiable at this point. In particular, we have

⟨x∗, v⟩ = lim_{u→x, t↓0} [f(u + tv) − f(u)]/t for all v ∈ X.

It follows therefore that

f°(x; v) = lim sup_{u→x, t↓0} [f(u + tv) − f(u)]/t = ⟨x∗, v⟩ whenever v ∈ X.
This implies by definition (7.33) that ∂f(x) = {x∗} = {f′_H(x)}, which
justifies assertion (a).
To verify (b), suppose that f is locally Lipschitzian around x and that
∂f(x) reduces to a singleton {x∗}. Then Theorem 7.31 tells us that

f°(x; v) = max{⟨u∗, v⟩ | u∗ ∈ ∂f(x)} = ⟨x∗, v⟩ for all v ∈ X,

which implies that f°(x; −v) = ⟨x∗, −v⟩ for all v ∈ X. This yields

(−f)°(x; v) = f°(x; −v) = ⟨x∗, −v⟩ for all v ∈ X

by the symmetry property of Proposition 7.11. Observe further that

(−f)°(x; v) = lim sup_{u→x, t↓0} [(−f)(u + tv) − (−f)(u)]/t
= −lim inf_{u→x, t↓0} [f(u + tv) − f(u)]/t

and arrive therefore at the representation

lim inf_{u→x, t↓0} [f(u + tv) − f(u)]/t = ⟨x∗, v⟩ = lim sup_{u→x, t↓0} [f(u + tv) − f(u)]/t.
The latter shows that the function f is strictly Gâteaux differentiable at x.
Thus we deduce from Theorem 7.54 that it is strictly Hadamard differentiable
at x with f′_H(x) = x∗, which completes the proof of the theorem. □
The next result is a direct consequence of Theorem 7.56 for convex functions.
Note that instead of dealing with f : X → R, we may consider extended-real-valued
functions f : X → R̄ while assuming that x ∈ int(dom(f)). This
in fact does not make any difference.

Corollary 7.57 Let X be a normed space, and let f : X → R be a convex


function. Then for any x ∈ X the following properties are equivalent:
(a) f is strictly Hadamard differentiable at x.
(b) f is continuous and strictly Gâteaux differentiable at x.
(c) f is continuous at x, and ∂f (x) = ∂f (x) is a singleton.
Proof. Implication (a)=⇒(b) is proved in Theorem 7.54. Implication (b)=⇒(c)
is straightforward due to the obvious observation that strict Gâteaux
differentiability yields the Gâteaux differentiability of f at x. To verify (c)=⇒(a),
we use the fact that the continuity of a convex function f at x ensures its
Lipschitz continuity around this point; see Corollary 2.150. Recall also that the
generalized gradient of a convex function reduces to the subdifferential of
convex analysis at the point in question; see Proposition 7.28(b). Thus the
claimed implication follows from Theorem 7.56(b). □
Now we establish a characterization of strict Fréchet differentiability via
generalized gradients. This characterization includes, together with the property
that the set of generalized gradients is a singleton at the reference point,
a certain continuity property of the generalized gradient mapping.
Let X be a normed space. Given a set-valued mapping F : X ⇒ X∗ and a
point x ∈ dom F, recall that the mapping F is strongly upper semicontinuous
at x if it is upper semicontinuous when both X and X∗ are equipped with
their strong topologies.
Theorem 7.58 Let f : X → R be a real-valued function defined on a normed
space X, and let x ∈ X be fixed. Then the following properties are equivalent:
(a) f is strictly Fréchet differentiable at x.
(b) f is locally Lipschitzian around x, the set ∂f(x) is a singleton, and the
mapping ∂f : X ⇒ X∗ is strongly upper semicontinuous at x.
Proof. To verify (a)=⇒(b), suppose that f is strictly Fréchet differentiable
at x, and thus it is strictly Gâteaux differentiable at this point. Then
Theorem 7.56 tells us that f is locally Lipschitzian around x and that the set
∂f(x) is a singleton. To get (b), it remains to show that the mapping ∂f(·)
is strongly upper semicontinuous at x. Fix a sequence {xk} ⊂ X that converges
to x, and pick any xk∗ ∈ ∂f(xk) for k ∈ N. Given ε > 0, there exists δ > 0
such that

|f(u) − f(w) − ⟨f′_F(x), u − w⟩| ≤ ε‖u − w‖ whenever u, w ∈ B(x; δ)

with the strict Fréchet derivative f′_F(x). Since {xk} converges to x, there
exists k0 ∈ N with xk ∈ B(x; δ/4) for all k ≥ k0. Fix any k ≥ k0 and any
v ∈ X and then observe the estimates

‖u − x‖ ≤ ‖u − xk‖ + ‖xk − x‖ < δ/4 + δ/4 = δ/2 < δ

if ‖u − xk‖ < δ/4 and 0 < t < δ/(2(‖v‖ + 1)).
This yields ‖u + tv − x‖ ≤ ‖u − x‖ + t‖v‖ < δ/2 + δ/2 = δ, and so

f(u + tv) − f(u) ≤ ⟨f′_F(x), tv⟩ + εt‖v‖

for all such u and all t > 0 sufficiently small. Thus we get the inequality

[f(u + tv) − f(u)]/t ≤ ⟨f′_F(x), v⟩ + ε‖v‖ for such u, v, t,

which implies in turn by passing to the limit that

f°(xk; v) = lim sup_{u→xk, t↓0} [f(u + tv) − f(u)]/t ≤ ⟨f′_F(x), v⟩ + ε‖v‖, v ∈ X.

The latter tells us therefore that

⟨xk∗, v⟩ ≤ f°(xk; v) ≤ ⟨f′_F(x), v⟩ + ε‖v‖ for all v ∈ X,

which ensures that ‖xk∗ − f′_F(x)‖ ≤ ε whenever k ≥ k0. This verifies that the
sequence {xk∗} converges strongly to f′_F(x), which justifies (b).
To prove the opposite implication (b)=⇒(a), we get by (b) that ∂f(x) =
{x∗}. Since the mapping ∂f(·) is strongly upper semicontinuous at x, for any
ε > 0 there exists δ > 0 such that

∂f(u) ⊂ B(x∗; ε) whenever u ∈ B(x; δ).

We can choose δ > 0 sufficiently small such that f is Lipschitz continuous
on the ball B(x; δ). Fix any u, w ∈ B(x; δ) with u ≠ w and find by the mean
value inclusion (7.62) in Theorem 7.44 a point c ∈ (u, w) and a subgradient
c∗ ∈ ∂f(c) satisfying the equality

f(w) − f(u) = ⟨c∗, w − u⟩.

This readily implies that ‖c∗ − x∗‖ ≤ ε, and hence

|f(w) − f(u) − ⟨x∗, w − u⟩|/‖w − u‖ = |⟨c∗ − x∗, w − u⟩|/‖w − u‖ ≤ ‖c∗ − x∗‖ ≤ ε,

which verifies that f is strictly Fréchet differentiable at x. □
We conclude this subsection with a refinement of Theorem 7.58 in the case
of convex functions on normed spaces.
Corollary 7.59 Let the function f : X → R be convex in the setting of
Theorem 7.58. Then we have the equivalent properties:
(a) f is strictly Fréchet differentiable at x.
(b) f is Fréchet differentiable at x.
(c) f is continuous at x, the set ∂f(x) is a singleton, and the mapping
∂f : X ⇒ X∗ is strongly upper semicontinuous at x.
Proof. Implication (a)=⇒(b) is obvious. To verify (b)=⇒(c), observe that the
Fréchet differentiability of f at x yields its continuity at this point and thus,
by the convexity of f, its Lipschitz continuity around x. This tells us that in
the case of convex and Fréchet differentiable functions, we have

∂f(x) = {f′_F(x)}.

To get (c), it remains to prove the strong upper semicontinuity of the mapping
∂f at x. Denoting x∗ := f′_F(x) gives us by definition that

lim_{u→x} [f(u) − f(x) − ⟨x∗, u − x⟩]/‖u − x‖ = 0.

Having this, for any ε > 0 choose δ > 0 such that

⟨x∗, u − x⟩ ≤ f(u) − f(x) + ε‖u − x‖ whenever ‖u − x‖ < δ.

Then for every u ∈ B(x; δ) and every u∗ ∈ ∂f(u) we get

⟨x∗ − u∗, u − x⟩ = ⟨x∗, u − x⟩ + ⟨u∗, x − u⟩
≤ f(u) − f(x) + ε‖u − x‖ + f(x) − f(u) = ε‖u − x‖.

This implies that ‖x∗ − u∗‖ ≤ ε, which justifies the strong upper semicontinuity
of ∂f and hence implication (b)=⇒(c). Finally, implication (c)=⇒(a)
follows directly from Theorem 7.58, and thus we are done with the proof. □
7.6 Generalized Gradients in Finite Dimensions
The main goal of this section is to show that the generalized gradient, which
is defined in (7.33) by the generalized directional derivative via the duality
correspondence, admits an equivalent representation in X = Rⁿ via limits
of classical gradients without any appeal to directional derivatives. This is
based on the fundamental Rademacher theorem, which establishes the almost
everywhere (a.e.) differentiability of Lipschitz continuous functions on open
sets in finite-dimensional spaces. Besides the convex-valuedness of ∂f, the
proof of such a limiting representation crucially exploits the robustness property
of the generalized gradient mapping and does not lead us to a result of this
type for the nonrobust (although convex-valued) contingent subdifferential.
In Subsection 7.6.1 we provide a simple measure-theoretic proof of
Rademacher's theorem. Subsection 7.6.2 exploits this result to derive the
aforementioned representation of the generalized gradient. In Subsection 7.6.3, we
present applications of the above generalized gradient properties and the a.e.
differentiability of Lipschitz continuous functions to precise calculations of
generalized gradients of antiderivatives (indefinite integrals) of L∞ functions
on real line intervals.

7.6.1 Rademacher Differentiability Theorem

Here we give a simple proof of the classical theorem on the a.e. differentia-
bility of Lipschitzian functions in finite dimensions, which goes back to Hans
Rademacher [298]. The theorem is formulated for vector functions f : U → Rm
on open subsets of Rn , but it obviously suffices to verify it for real-valued func-
tions, which is actually needed in the next subsection.
Theorem 7.60 Let f : U → Rm be a Lipschitz continuous mapping defined
on a nonempty open subset U of Rn . Then f is differentiable a.e. on U .

Proof. We can assume without loss of generality that m = 1 and split the
proof into the verification of the following four claims:
Claim 1: The result holds for n = 1.
This follows from the classical result on the a.e. differentiability of absolutely
continuous (and hence of Lipschitz continuous) functions on the real line.
Claim 2: For any vector v ∈ S from the unit sphere S ⊂ Rⁿ, the classical
directional derivative f′(x; v) exists at almost every point x ∈ U.
To verify this claim, denote by Sv the set of all points x ∈ U where f′(x; v)
exists. Then the one-dimensional Lebesgue measure of the intersection of the
set U \ Sv with any line parallel to v equals zero by Claim 1. It is
easy to see that the set Sv is measurable. Then the classical Fubini theorem
implies that μ(U \ Sv) = 0, where μ stands for the Lebesgue measure on Rⁿ.
Claim 3: Denote fxi := ∂f/∂xi for i = 1, . . . , n. Given v = (v1, . . . , vn) ∈ S,
consider the set Uv ⊂ U on which f′(x; v) and fx1(x), . . . , fxn(x) exist with

f′(x; v) = v1 fx1(x) + . . . + vn fxn(x).   (7.71)

Then we have μ(U \ Uv) = 0.
To prove this claim, recall from Claims 1 and 2 that all the derivatives f′(x; v)
and fxi(x), i = 1, . . . , n, exist for a.e. x ∈ U. Thus it remains to verify that
equality (7.71) holds almost everywhere on U. To this end, take any real-valued
function g ∈ C∞(U) with compact support. Using Lebesgue's dominated
convergence theorem and then integration by parts gives us
∫_U f′(x; v)g(x) dx = lim_{t↓0} ∫_U [f(x + tv) − f(x)]/t · g(x) dx
= lim_{t↓0} ∫_U f(x)[g(x − tv) − g(x)]/t dx
= −∫_U f(x) Σ_{i=1}^n vᵢ (∂g/∂xᵢ)(x) dx = Σ_{i=1}^n vᵢ ∫_U fxᵢ(x)g(x) dx.
Since the latter holds for an arbitrary function g of the above class, we
conclude that equality (7.71) is satisfied for a.e. x ∈ U.
Claim 4: Let Q ⊂ S be an arbitrary set that is dense in the unit sphere
S ⊂ Rⁿ, and let Ω := ∩_{v∈Q} Uv ⊂ U, where Uv is defined in Claim 3. Then the
function f is differentiable at any point x ∈ Ω.
To verify this statement, fix x ∈ Ω and then for any v ∈ S and t > 0 define

r(v, t) := [f(x + tv) − f(x) − t Σ_{i=1}^n vᵢ fxᵢ(x)]/t.
The Lipschitz continuity of f allows us to find a constant ℓ > 0 for which

|r(w, t) − r(v, t)| ≤ ℓ‖w − v‖ whenever v, w ∈ S.

It follows from Claim 3 that for every finite subset W ⊂ Q and every ε > 0
there is a number t = t(W, ε) > 0 with

|r(w, t)| < ε/2 for all w ∈ W whenever 0 < t < t(W, ε).

The density of Q in the (compact) unit sphere S ensures the existence of a
finite subset W ⊂ Q with the distance estimate d(v; W) < ε/(2ℓ) for all v ∈ S.
Thus for any such v, there exists w ∈ W satisfying ‖v − w‖ < ε/(2ℓ) and

|r(v, t)| ≤ |r(w, t)| + |r(v, t) − r(w, t)| < ε if 0 < t < t(W, ε).
The latter inequality tells us that for any ε > 0, we have

|f(x + v) − f(x) − Σ_{i=1}^n vᵢ fxᵢ(x)| / ‖v‖ < ε

provided that ‖v‖ is sufficiently small. This verifies the claimed differentiability
of f at x and thus completes the proof of the theorem. □

7.6.2 Gradient Representation of Generalized Gradients

Now we are ready to derive a convenient representation of generalized gradients
of locally Lipschitzian functions on Rⁿ via the convex hull of limits of
classical gradients at points of differentiability. Given f : U → R that is
Lipschitz continuous on a neighborhood U of x, denote by D the set of points in
U at which f is differentiable. Theorem 7.60 tells us that the set D is of full
Lebesgue measure in U for any locally Lipschitzian function f.

Theorem 7.61 Let U ⊂ Rⁿ be a neighborhood of x on which f : Rⁿ → R is
Lipschitz continuous, and let O be any set of measure zero in U. Then we
have the generalized gradient representation

∂f(x) = co{ lim_{k→∞} ∇f(xk) | xk → x with xk ∈ D \ O }.   (7.72)
Proof. Using again Theorem 7.60, we conclude that the set on the right-hand
side of (7.72) is nonempty. It follows from (7.35) and Proposition 7.27 that
∇f(xₖ) ∈ ∂̄f(xₖ) for each k ∈ N. Combining this with the uniform bounded-
ness of ∂̄f(x) on U by the Lipschitz constant in (7.36) from Theorem 7.24
allows us to deduce from the classical Bolzano-Weierstrass theorem that the
sequence {∇f(xₖ)} contains a convergent subsequence. The limits of such
subsequences belong to ∂̄f(x̄) due to the robustness property (7.37) of the
generalized gradient verified in Theorem 7.24. Since the set ∂̄f(x̄) is convex,
we get the inclusion “⊃” in (7.72). Note that the set on the right-hand side of
(7.72) is compact, which follows from the Carathéodory theorem; see
Exercise 6.114.
It remains to prove the inclusion “⊂” in (7.72). Taking into account the
convexity and compactness of both sets in (7.72) and the monotonicity prop-
erty of the support function established in Lemma 7.47, it suffices to show
that the support function of the set on the right-hand side of (7.72) is not less
than the one for ∂̄f(x̄). But the latter is exactly f°(x̄; v) by the full duality
in Theorem 7.42. Thus we have to prove that the inequality

    f°(x̄; v) − ε ≤ lim sup { ⟨∇f(x), v⟩ | x → x̄, x ∈ D, x ∉ O }    (7.73)

holds for any v ≠ 0 and ε > 0. To proceed, denote by γ ∈ R the value on the
right-hand side of (7.73). Thus there exists η > 0 such that

    ⟨∇f(x), v⟩ ≤ γ + ε whenever ‖x − x̄‖ ≤ η with x ∈ D, x ∉ O,

where the set (U \ D) ∪ O is of measure zero in x̄ + ηB. This implies by using
the aforementioned Fubini theorem that for a.e. x ∈ x̄ + (η/2)B the intersection
of the interval Iₓ := {x + tv | 0 < t < η/(2‖v‖)} with the set (U \ D) ∪ O is of
one-dimensional measure zero. This ensures in turn that for any x ∈ x̄ + (η/2)B
having this property and for any t ∈ (0, η/(2‖v‖)), we get

    f(x + tv) − f(x) = ∫₀ᵗ ⟨∇f(x + τv), v⟩ dτ.
Since ‖x + τv − x̄‖ < η for τ ∈ (0, t), it gives us ⟨∇f(x + τv), v⟩ ≤ γ + ε, and
therefore we arrive at the estimate

    f(x + tv) − f(x) ≤ t(γ + ε)

for x ∈ x̄ + (η/2)B except a set of measure zero and for any t ∈ (0, η/(2‖v‖)).
It follows from the continuity of f that the obtained estimate actually holds for
all such x and t. This verifies (7.73) and thus completes the proof. □
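As a quick numerical illustration of (7.72) — not part of the book's development, with the sample sequences chosen by us — take f(x) = |x| on R, where D = R \ {0} and the gradient limits at 0 are ±1:

```python
def f(x):
    return abs(x)

def grad(x, h=1e-9):
    # central difference; f is differentiable at every x != 0
    return (f(x + h) - f(x - h)) / (2 * h)

# gradient limits along sequences x_k -> 0 avoiding the null set O = {0}
limits = sorted({round(grad(s * 10.0 ** (-k)), 6)
                 for s in (1, -1) for k in range(3, 8)})
print(limits)                       # attainable limits of the gradients
hull = [min(limits), max(limits)]   # their convex hull, cf. (7.72)
print(hull)                         # the generalized gradient of |.| at 0
```

The convex hull [−1, 1] of the two gradient limits is exactly the generalized gradient of |·| at the origin.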
7.6 Generalized Gradients in Finite Dimensions 497

7.6.3 Generalized Gradients of Antiderivatives

The last subsection here concerns some issues related to the fundamental
theorem of calculus about differentiation of integrals with variable bounds of
integration that are known as antiderivatives. However, instead of the classical
setting of continuous functions under integration, we now consider those which
are taken from the space L∞ [a, b] of essentially bounded functions on a given
interval [a, b] ⊂ R. In such a case, the obtained antiderivative functions are
not differentiable but merely Lipschitz continuous, and this calls therefore for
their generalized differentiation. It is shown below that the above properties
of generalized gradients (which are not shared by contingent subgradients)
and the one-dimensional version of Rademacher’s theorem lead us to precise
calculations of generalized gradients of Lipschitzian antiderivatives. Recall
that a (Lebesgue) measurable function f : [a, b] → R is essentially bounded if
there exists a constant M ≥ 0 such that
|f (x)| ≤ M a.e. on [a, b]. (7.74)
For functions of this class, define the essential supremum of f on [a, b] by

    ess sup { f(x) | x ∈ [a, b] } := inf_{E ⊂ [a,b], μ(E)=0}  sup_{x ∈ [a,b] \ E} f(x),    (7.75)

where the notation μ(E) stands for the Lebesgue measure on the real line.
Similarly, we define the essential infimum of f on [a, b] by

    ess inf { f(x) | x ∈ [a, b] } := sup_{E ⊂ [a,b], μ(E)=0}  inf_{x ∈ [a,b] \ E} f(x).
Then the classical Lebesgue space L∞[a, b] is the collection of measurable and
essentially bounded functions f : [a, b] → R endowed with the norm

    ‖f‖∞ := inf_{E ⊂ [a,b], μ(E)=0}  sup_{x ∈ [a,b] \ E} |f(x)|.

It is well known that L∞[a, b] is a Banach space.
Given f ∈ L∞[a, b], define further the antiderivative

    F(x) := ∫ₐˣ f(t) dt,  x ∈ [a, b],    (7.76)

and for each x ∈ [a, b] consider the functions

    f⁺(x) = lim_{ε↓0} ess sup { f(u) | u ∈ [x − ε, x + ε] ∩ [a, b] },
    f⁻(x) = lim_{ε↓0} ess inf { f(u) | u ∈ [x − ε, x + ε] ∩ [a, b] }.    (7.77)

The next lemma collects important properties of the functions above.
Lemma 7.62 Let f ∈ L∞[a, b], and let x ∈ [a, b]. The following hold:
(a) The function F defined in (7.76) is Lipschitz continuous on [a, b].
(b) Considering the set

    ΩF := { x ∈ [a, b] | F is not differentiable at x },

we have that F′(x) = f(x) for all x ∈ [a, b] \ ΩF.
(c) There exists a sequence {xₖ} ⊂ [a, b] \ ΩF that converges to x with
f(xₖ) → f⁺(x) as k → ∞, where f⁺(x) is defined in (7.77).
(d) There exists a sequence {xₖ} ⊂ [a, b] \ ΩF that converges to x with
f(xₖ) → f⁻(x) as k → ∞, where f⁻(x) is also defined in (7.77).
Proof. To verify (a), we use the boundedness of f ∈ L∞[a, b] by M > 0 in
(7.74). Employing this and picking any x, u ∈ [a, b] with u ≤ x tells us that

    |F(x) − F(u)| = |∫ₐˣ f(t) dt − ∫ₐᵘ f(t) dt| = |∫ᵤˣ f(t) dt|
                  ≤ ∫ᵤˣ |f(t)| dt ≤ ∫ᵤˣ M dt = M|x − u|,

which readily justifies the Lipschitz continuity of F on [a, b].
To prove (b), we get from (a) and Theorem 7.60 that F is differentiable
a.e. on [a, b]. It is also well known that F′(x) = f(x) for all x ∈ [a, b] \ ΩF.
Let us further proceed with (c) while considering for simplicity only the
case where x ∈ (a, b). It follows from definition (7.75) that for any ε > 0
sufficiently small, we have the equality
    ess sup_{u ∈ [x−ε, x+ε]} f(u) = inf_{E ⊂ [x−ε,x+ε], μ(E)=0}  sup_{u ∈ [x−ε,x+ε] \ E} f(u).
Hence there exists E ⊂ [x − ε, x + ε] with μ(E) = 0 such that

    γ := sup_{u ∈ [x−ε,x+ε] \ E} f(u) < ess sup_{u ∈ [x−ε, x+ε]} f(u) + ε,
and furthermore f(u) ≤ γ for all u ∈ [x − ε, x + ε] \ E. Since μ(E ∪ ΩF) = 0,
we see that there exists ū ∈ [x − ε, x + ε] \ (ΩF ∪ E) with

    ess sup_{u ∈ [x−ε, x+ε]} f(u) − ε < f(ū).

Indeed, assuming the contrary gives us the inequality

    ess sup_{u ∈ [x−ε, x+ε]} f(u) − ε ≥ f(u) whenever u ∈ [x − ε, x + ε] \ (ΩF ∪ E),

which implies that

    ess sup_{u ∈ [x−ε, x+ε]} f(u) ≤ sup_{u ∈ [x−ε,x+ε] \ (ΩF∪E)} f(u) ≤ ess sup_{u ∈ [x−ε, x+ε]} f(u) − ε,
a contradiction. Letting now ε := 1/k for all k ∈ N, we find a sequence
{xₖ} ⊂ [x − 1/k, x + 1/k] \ (E ∪ ΩF) such that

    ess sup_{u ∈ [x−1/k, x+1/k]} f(u) − 1/k < f(xₖ) < ess sup_{u ∈ [x−1/k, x+1/k]} f(u) + 1/k.
Passing to the limit as k → ∞ and using the definition of f⁺(x) in (7.77)
yield f(xₖ) → f⁺(x) as k → ∞, which verifies (c).
The proof of (d) is similar to that of (c), and thus we are done. □

Now we are ready to provide the following precise calculation of generalized
gradients of antiderivative functions.
Theorem 7.63 Consider the antiderivative function F : [a, b] → R defined in
(7.76). Then its generalized gradient at x̄ is calculated by

    ∂̄F(x̄) = [ f⁻(x̄), f⁺(x̄) ],    (7.78)

where the functions f⁺ and f⁻ are taken from (7.77).
Proof. For simplicity we confine ourselves to the case where x̄ ∈ (a, b). It
follows from Lemma 7.62(b) and the previous observations that F′(x) =
f(x) ∈ ∂̄F(x) for all x ∈ [a, b] \ ΩF. Then Lemma 7.62(c) gives us a sequence
{xₖ} ⊂ [a, b] \ ΩF that converges to x̄ with f(xₖ) → f⁺(x̄) as k → ∞. Since
f(xₖ) ∈ ∂̄F(xₖ), we have f⁺(x̄) ∈ ∂̄F(x̄), and similarly f⁻(x̄) ∈ ∂̄F(x̄). Then
the convexity of ∂̄F(x̄) ensures that [f⁻(x̄), f⁺(x̄)] ⊂ ∂̄F(x̄).
To verify the opposite inclusion in (7.78), fix a small number ε > 0 and
deduce from the definitions that, whenever t > 0 is sufficiently small, we get

    F(x̄ + t) − F(x̄) = ∫_{x̄}^{x̄+t} f(u) du ≤ ∫_{x̄}^{x̄+t} ess sup_{u ∈ [x̄−ε, x̄+ε]} f(u) du
                     = t · ess sup_{u ∈ [x̄−ε, x̄+ε]} f(u),

which tells us therefore that

    (F(x̄ + t) − F(x̄))/t ≤ ess sup_{u ∈ [x̄−ε, x̄+ε]} f(u).
Using definition (7.1) of the generalized directional derivative, we arrive at

    F°(x̄; 1) = lim sup_{x→x̄, t↓0} (F(x + t) − F(x))/t ≤ ess sup_{u ∈ [x̄−ε, x̄+ε]} f(u),

where the limiting procedure as ε ↓ 0 leads us to F°(x̄; 1) ≤ f⁺(x̄).
Finally, picking any subgradient v ∈ ∂̄F(x̄), we can equivalently describe
it by using Theorem 7.31 and the above estimate as

    v = ⟨v, 1⟩ ≤ F°(x̄; 1) ≤ f⁺(x̄).

Arguing similarly gives us f⁻(x̄) ≤ v, and therefore ∂̄F(x̄) ⊂ [f⁻(x̄), f⁺(x̄)],
which completes the proof of the theorem. □
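The formula (7.78) can be sketched numerically for an illustrative density f with a jump at 0 (so that F(x) = |x| − 1 on [−1, 1]); the discretizations below are our own rough proxies for the essential bounds in (7.77):

```python
def f(t):                 # an illustrative L^inf density with a jump at 0
    return -1.0 if t < 0 else 1.0

def F(x, n=20000):        # antiderivative F(x) = integral of f over [-1, x]
    a = -1.0              # (midpoint rule)
    h = (x - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

eps = 1e-3                # approximate f^+(0), f^-(0) from (7.77) by sampling
samples = [k * eps / 100.0 for k in range(-100, 101)]
f_plus, f_minus = max(map(f, samples)), min(map(f, samples))
print([f_minus, f_plus])  # endpoints of the generalized gradient of F at 0
```

Here [f_minus, f_plus] = [−1, 1] recovers the generalized gradient of F(x) = |x| − 1 at the kink x̄ = 0.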
7.7 Subgradient Analysis of Distance Functions


The main goal of this section is to calculate the convex subdifferential of the
signed distance function, which was introduced and studied in Section 6.4.
However, our approach to accomplish this goal requires the usage of subdif-
ferentials of nonconvex locally Lipschitzian functions. Proceeding in this way,
we define here the two other subgradient sets for such functions that are known
in variational analysis as the regular/Fréchet and limiting/Mordukhovich sub-
differentials at the point in question. The regular subdifferential, unlike those
studied above in this chapter, is not always directionally generated. On the other
hand, the limiting subdifferential, which is nonconvex-valued and hence not
directionally generated, enjoys full calculus based on variational/extremal
principles while not on convex analysis. Since much has been written in the
literature on variational analysis about both regular and limiting subdifferen-
tials, we restrict ourselves in what follows to presenting only those properties
of the subdifferentials which are needed below in this section and the next
one devoted to (nonconvex) differences of convex functions; see Section 7.10
for more discussions and references.

7.7.1 Regular and Limiting Subgradients of Lipschitzian Functions

Recall that for a set-valued mapping F : X ⇉ Y between topological spaces,
the (Painlevé-Kuratowski) sequential outer limit of F at x̄ ∈ X is given by

    Lim sup_{x→x̄} F(x) := { y ∈ Y | ∃ xₖ → x̄, yₖ → y with yₖ ∈ F(xₖ) for all k ∈ N }.
We remind the reader that the standing assumptions of this section address
locally Lipschitzian functions on normed spaces. The Lipschitz continuity was
essential in the definitions and results of the previous subsections concerning
generalized derivatives and gradients as well as subderivatives and contingent
subdifferentials. However, it is not the case here. In fact, all the definitions
and results presented in this subsection until Theorem 7.69 do not require
the Lipschitz continuity, or just need simple modifications for the case of
extended-real-valued functions; see Section 7.10. Nevertheless, we keep the
standing assumptions to avoid the reader’s confusion while mentioning them
explicitly only when they are really needed; see Section 7.10.
Definition 7.64 Let X be a normed space, and let f : X → R be locally
Lipschitzian around x̄. Consider the following constructions:
(a) The (Fréchet) regular subdifferential of f at x̄ is

    ∂̂f(x̄) := { x* ∈ X* | lim inf_{x→x̄} (f(x) − f(x̄) − ⟨x*, x − x̄⟩)/‖x − x̄‖ ≥ 0 }.    (7.79)

(b) The (Mordukhovich) limiting subdifferential of f at x̄ is defined by

    ∂f(x̄) := Lim sup_{x→x̄} ∂̂f(x)    (7.80)
via the sequential outer limit of the mapping ∂̂f : X ⇉ X*, where X* is
equipped with the weak* topology.
It follows directly from definitions (7.79) and (7.80) that ∂̂f(x̄) ⊂ ∂f(x̄).
Furthermore, we always have ∂f(x̄) ≠ ∅ in arbitrary Asplund spaces, while the
regular subdifferential ∂̂f(x̄) may be empty even for very simple nonsmooth and
nonconvex functions on R. As expected nevertheless, both subdifferentials
from Definition 7.64 agree with the subdifferential of convex analysis if f
is convex, and they reduce to the classical derivatives when the function is
differentiable in the appropriate sense; see Exercise 7.144 for more details.
Next we reveal relations between regular and contingent subgradients.
Proposition 7.65 Let X be a normed space. For any function f : X → R
which is locally Lipschitzian around x̄, we always have the inclusion

    ∂̂f(x̄) ⊂ ∂⁻f(x̄),    (7.81)

which holds as an equality provided that X = Rⁿ.
Proof. Fix any x* ∈ ∂̂f(x̄) and fix v ∈ X with v ≠ 0. By definition (7.79) of
the regular subdifferential, we have

    lim inf_{t↓0} (f(x̄ + tv) − f(x̄) − ⟨x*, tv⟩)/(t‖v‖) ≥ 0.

This clearly implies by definition (7.34) of the contingent subdifferential that

    df(x̄; v) = lim inf_{t↓0} (f(x̄ + tv) − f(x̄))/t ≥ ⟨x*, v⟩,

where the inequality also holds for v = 0. Then we arrive at x* ∈ ∂⁻f(x̄),
which justifies the inclusion in (7.81).
To verify the opposite inclusion in (7.81) when X = Rⁿ, pick x* ∈ ∂⁻f(x̄)
and suppose on the contrary that x* ∉ ∂̂f(x̄). This gives us ε > 0 and a
sequence {xₖ} ⊂ Rⁿ converging to x̄ with xₖ ≠ x̄ and

    (f(xₖ) − f(x̄) − ⟨x*, xₖ − x̄⟩)/‖xₖ − x̄‖ ≤ −ε < 0,  k ∈ N.

For each k ∈ N let tₖ := ‖xₖ − x̄‖ and vₖ := (xₖ − x̄)/‖xₖ − x̄‖ with ‖vₖ‖ = 1.
Then the sequence {tₖ} converges to 0, and we can assume without loss of
generality that {vₖ} converges to some v as k → ∞. Thus we get

    (f(x̄ + tₖvₖ) − f(x̄))/tₖ ≤ ⟨x*, vₖ⟩ − ε.
Employing now Proposition 7.2 tells us that

    df(x̄; v) ≤ lim inf_{k→∞} (f(x̄ + tₖvₖ) − f(x̄))/tₖ ≤ lim inf_{k→∞} ⟨x*, vₖ⟩ − ε = ⟨x*, v⟩ − ε.

The obtained inequality contradicts the choice of x* ∈ ∂⁻f(x̄), and therefore
x* ∈ ∂̂f(x̄), which completes the proof of the proposition. □
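A rough numerical proxy for the lim inf in definition (7.79) — with grid parameters chosen arbitrarily for illustration — shows which candidates x* pass the regular-subgradient test for f(x) = |x| at x̄ = 0:

```python
def f(x):
    return abs(x)

def min_quotient(xstar, radius=1e-2, n=4001):
    # minimum over a symmetric grid of the quotient from definition (7.79),
    # (f(x) - f(0) - xstar*(x - 0))/|x - 0|, as a proxy for its lim inf
    vals = []
    for i in range(n):
        x = -radius + 2.0 * radius * i / (n - 1)
        if x != 0.0:
            vals.append((f(x) - f(0.0) - xstar * x) / abs(x))
    return min(vals)

print(min_quotient(0.5) >= 0.0)   # True: 0.5 is a regular subgradient of |.| at 0
print(min_quotient(1.5) >= 0.0)   # False: 1.5 is not
```

The quotient stays nonnegative exactly for x* ∈ [−1, 1], matching the regular subdifferential of |·| at the origin.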
In what follows, we also use geometric constructions corresponding to the
subdifferentials from Definition 7.64.
Definition 7.66 Let Ω be a nonempty subset of a normed space X.
(a) The (Fréchet) regular normal cone to Ω at x̄ ∈ Ω is defined by

    N̂(x̄; Ω) := { x* ∈ X* | ⟨x*, x − x̄⟩ ≤ o(‖x − x̄‖) for all x ∈ Ω }.    (7.82)

We set N̂(x̄; Ω) := ∅ if x̄ ∉ Ω.
(b) The (Mordukhovich) limiting normal cone to Ω at x̄ ∈ Ω is given by

    N(x̄; Ω) := Lim sup_{x→x̄} N̂(x; Ω)    (7.83)

with N(x̄; Ω) := ∅ whenever x̄ ∉ Ω.
The next theorem provides representations of both regular and limiting
subdifferentials of functions in terms of the corresponding normal cones to
their epigraphs. We can see that the local Lipschitz continuity of functions
plays no role in the proofs of these representations, which can be taken as the
definitions of the regular and limiting subdifferentials of extended-real-valued
functions; see Section 7.10 for more discussions.

Theorem 7.67 We have the following geometric representations of the regu-
lar and limiting subdifferentials from Definition 7.64:

    ∂̂f(x̄) = { x* ∈ X* | (x*, −1) ∈ N̂((x̄, f(x̄)); epi(f)) },    (7.84)

    ∂f(x̄) = { x* ∈ X* | (x*, −1) ∈ N((x̄, f(x̄)); epi(f)) }.    (7.85)
Proof. We verify only the regular subdifferential representation (7.84) while
referring the reader to Exercise 7.146 for the limiting one (7.85).
To proceed with the proof of (7.84), pick x* ∈ ∂̂f(x̄). Then definition
(7.79) tells us that for any ε > 0 there exists δ > 0 such that

    ⟨x*, x − x̄⟩ ≤ f(x) − f(x̄) + ε‖x − x̄‖ whenever ‖x − x̄‖ < δ.

Taking any (x, λ) ∈ epi(f) with ‖x − x̄‖ + |λ − f(x̄)| < δ, we get

    ⟨x*, x − x̄⟩ − (λ − f(x̄)) ≤ ⟨x*, x − x̄⟩ − (f(x) − f(x̄))
                              ≤ ε‖x − x̄‖ ≤ ε(‖x − x̄‖ + |λ − f(x̄)|),

which readily implies by (7.82) that (x*, −1) ∈ N̂((x̄, f(x̄)); epi(f)).
To verify the opposite inclusion in (7.84), pick any x* ∈ X* with
(x*, −1) ∈ N̂((x̄, f(x̄)); epi(f)) and suppose on the contrary that x* ∉ ∂̂f(x̄).
Then definition (7.79) gives us a number ε > 0 and a sequence {xₖ} ⊂ X
converging to x̄ such that

    f(xₖ) − f(x̄) − ⟨x*, xₖ − x̄⟩ < −ε‖xₖ − x̄‖ for all k ∈ N.    (7.86)

It follows therefore that

    f(xₖ) < λₖ := f(x̄) + ⟨x*, xₖ − x̄⟩ − ε‖xₖ − x̄‖ whenever k ∈ N.

Observe that (xₖ, λₖ) ∈ epi(f) and that (xₖ, λₖ) → (x̄, f(x̄)) as k → ∞.
Take now any ε̃ with 0 < ε̃ < ε and find δ > 0 such that

    ⟨x*, x − x̄⟩ − (λ − f(x̄)) ≤ ε̃(‖x − x̄‖ + |λ − f(x̄)|)

whenever ‖x − x̄‖ + |λ − f(x̄)| < δ with (x, λ) ∈ epi(f). This tells us that for
any k ∈ N sufficiently large we have

    ⟨x*, xₖ − x̄⟩ − (λₖ − f(x̄)) ≤ ε̃(‖xₖ − x̄‖ + |λₖ − f(x̄)|),

which implies by (7.86) that

    ε‖xₖ − x̄‖ ≤ ε̃(‖xₖ − x̄‖ + |λₖ − f(x̄)|)
              ≤ ε̃(‖xₖ − x̄‖ + ‖x*‖ · ‖xₖ − x̄‖ + ε‖xₖ − x̄‖)
              ≤ ε̃(1 + ‖x*‖ + ε)‖xₖ − x̄‖.

Thus we arrive at ε ≤ ε̃(1 + ‖x*‖ + ε). Letting finally ε̃ ↓ 0 brings us to a
contradiction, which completes the proof of the theorem. □
The next result used below is a Lipschitzian regular subgradient version


of Zagrodny’s approximate mean value theorem [358]. Its proof follows the
arguments developed in the proofs of Theorems 5.29 and 5.30 for the case
of convex functions. The main difference is that, instead of the sum rule for
convex subgradients as in the above proofs of the convex counterpart, we
need to exploit now the subgradient description of the approximate extremal
principle in Asplund spaces taken from [229, Lemma 2.32]; see Exercise 7.147
and further commentaries in Section 7.10.
Theorem 7.68 Let X be an Asplund space, and let f : X → R be a locally
Lipschitzian function around x̄ ∈ X. Fix any a, b ∈ X with a ≠ b. Then there
exist a vector c ∈ [a, b] and a sequence {xₖ} ⊂ X converging to c such that
x*ₖ ∈ ∂̂f(xₖ) for every k ∈ N, and we have the mean value inequality

    f(b) − f(a) ≤ lim inf_{k→∞} ⟨x*ₖ, b − a⟩.    (7.87)
Now we are ready to establish relations between generalized gradients
(7.33) and limiting subgradients (7.80) of locally Lipschitzian functions.
Theorem 7.69 Let X be an Asplund space, and let f : X → R be locally
Lipschitzian around x̄ ∈ X. Then we have

    ∂̄f(x̄) = cl* co ( ∂f(x̄) )    (7.88)

by using the weak* convex closure operation in X*. If X = Rⁿ, representation
(7.88) is specified as follows:

    ∂̄f(x̄) = co ( ∂f(x̄) ).    (7.89)
Proof. First deduce from (7.35), definition (7.80), and the weak* closedness
of the generalized gradient mapping for locally Lipschitzian functions that
∂f(x̄) ⊂ ∂̄f(x̄) in any normed space; see Exercise 7.148. This yields

    cl* co ( ∂f(x̄) ) ⊂ ∂̄f(x̄)    (7.90)

by the convexity of ∂̄f(x̄). To prove the opposite inclusion in (7.88), let us
verify the following equalities for the generalized derivative (7.1):

    f°(x̄; v) = max { ⟨x*, v⟩ | x* ∈ cl* co ( ∂f(x̄) ) }
             = sup { ⟨x*, v⟩ | x* ∈ ∂f(x̄) }.    (7.91)

Indeed, by (7.90) and the generalized derivative representation

    f°(x̄; v) = max { ⟨x*, v⟩ | x* ∈ ∂̄f(x̄) } whenever v ∈ X

due to Theorem 7.31, we get the estimates

    f°(x̄; v) ≥ max { ⟨x*, v⟩ | x* ∈ cl* co ( ∂f(x̄) ) } ≥ sup { ⟨x*, v⟩ | x* ∈ ∂f(x̄) }.

The construction of f°(x̄; v) makes it possible to find sequences xₖ → x̄ and
tₖ ↓ 0 as k → ∞ such that

    f°(x̄; v) = lim_{k→∞} (f(xₖ + tₖv) − f(xₖ))/tₖ.
Applying the mean value inequality (7.87) from Theorem 7.68, for each fixed
k ∈ N we find a sequence uₘ → cₖ ∈ [xₖ, xₖ + tₖv] as m → ∞ and regular
subgradients u*ₘ ∈ ∂̂f(uₘ) satisfying the inequality

    f(xₖ + tₖv) − f(xₖ) ≤ tₖ lim inf_{m→∞} ⟨u*ₘ, v⟩.

Thus there exists an increasing sequence of natural numbers m₁ < m₂ < …
for which we get ‖u_{m_k} − cₖ‖ < 1/k and

    f(xₖ + tₖv) − f(xₖ) ≤ tₖ lim inf_{k→∞} ⟨u*_{m_k}, v⟩,  k ∈ N.    (7.92)

Since f is locally Lipschitzian around x̄, the subgradient sequence {u*_{m_k}},
k ∈ N, is bounded in X*. Since X is Asplund, this allows us to suppose
without loss of generality that u*_{m_k} → u* weak* as k → ∞. Hence u* ∈ ∂f(x̄),
and then by passing to the limit in (7.92) we arrive at

    f°(x̄; v) ≤ ⟨u*, v⟩ ≤ sup { ⟨x*, v⟩ | x* ∈ ∂f(x̄) },

which readily yields (7.91). It follows from (7.91) that

    ∂̄f(x̄) ⊂ cl* co ( ∂f(x̄) ),
and thus we get (7.88) due to (7.90).
To verify the last statement (7.89) of the theorem, we easily deduce from
(7.80) and the imposed local Lipschitz continuity of f that the subgradient
set ∂f(x̄) is bounded in X*; see Exercise 7.144. If X = Rⁿ, then this set
is clearly closed, and so is its convex hull co(∂f(x̄)). Thus (7.88) reduces to
(7.89), which completes the proof of the theorem. □
To conclude this subsection, we present a simple consequence of Theo-
rem 7.69 that gives us a characterization of locally Lipschitzian functions for
which the limiting subdifferential (7.80) is a singleton.
Corollary 7.70 Let f : Rⁿ → R be locally Lipschitzian around x̄. Then the
limiting subdifferential ∂f(x̄) is a singleton if and only if f is strictly Fréchet
differentiable at x̄. In this case ∂f(x̄) reduces to the strict Fréchet derivative.

Proof. It follows from (7.89) that ∂f(x̄) is a singleton if and only if the gen-
eralized gradient ∂̄f(x̄) is a singleton. Theorem 7.56 tells us that the latter is
equivalent, in general normed spaces, to the strict Hadamard differentiability
of f at x̄, which is the same as the strict Fréchet differentiability of f at x̄ in
finite dimensions; see Proposition 7.53. This completes the proof. □
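For the model function f(x) = −|x| on R, the regular subdifferential at 0 is empty, the limiting subdifferential is the nonconvex pair {−1, 1}, and (7.89) recovers the generalized gradient [−1, 1] as its convex hull; a numerical sketch (all sampling choices ours):

```python
def f(x):
    return -abs(x)

def grad(x, h=1e-9):      # derivative at points of differentiability (x != 0)
    return (f(x + h) - f(x - h)) / (2 * h)

# limiting subgradients at 0: limits of gradients along x_k -> 0, x_k != 0
limiting = sorted({round(grad(s * 1e-6), 6) for s in (1, -1)})
print(limiting)                          # a nonconvex two-point set
clarke = [min(limiting), max(limiting)]  # its convex hull, cf. (7.89)
print(clarke)

# the regular subdifferential of -|x| at 0 is empty: for every candidate x*
# the quotient (f(x) - f(0) - x*(x - 0))/|x| equals -1 - x*·sign(x),
# which is <= -1 on one side of 0
q = lambda s, x: (f(x) - f(0.0) - s * x) / abs(x)
assert all(min(q(s, 1e-4), q(s, -1e-4)) <= -1 + 1e-9 for s in (-2, -1, 0, 1, 2))
```

This illustrates both the nonconvexity of the limiting construction and the convexification performed by the generalized gradient.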
7.7.2 Regular and Limiting Subgradients of Distance Functions

The main goal of this subsection is to provide precise formulas for calculating
the regular and limiting subgradient sets of the (Lipschitz continuous) distance
function

    d(x; Ω) := inf { ‖x − w‖ | w ∈ Ω },  x ∈ X,    (7.93)

associated with a nonempty set Ω in a normed space X. This function is
convex for a convex set Ω, and its convex subdifferential has been compre-
hensively studied in Subsection 3.3.5 of Chapter 3. Considering now general
nonconvex settings of (7.93) in both finite and infinite dimensions, we aim at
obtaining subgradient results of their own interest and prepare to use them
in deriving subdifferential formulas for calculating subgradient sets of convex
signed distance functions in the next section.
The first result addresses regular subgradients of distance functions asso-
ciated with subsets of normed spaces in both in-set and out-of-set cases.
Theorem 7.71 Let Ω be a nonempty subset of a normed space X.
(a) If x̄ ∈ Ω, then we have the equality

    ∂̂d(x̄; Ω) = N̂(x̄; Ω) ∩ B*.    (7.94)

(b) If x̄ ∉ Ω and the projection set Π(x̄; Ω) is nonempty, then for every
w̄ ∈ Π(x̄; Ω) we have the inclusion

    ∂̂d(x̄; Ω) ⊂ ∂p(x̄ − w̄) ∩ N̂(w̄; Ω),    (7.95)

where p(x) := ‖x‖ for x ∈ X.
Proof. To verify the in-set formula (7.94) in (a), fix x* ∈ ∂̂d(x̄; Ω) and for any
ε > 0 find by (7.79) a number δ > 0 such that

    ⟨x*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) + ε‖x − x̄‖ = d(x; Ω) + ε‖x − x̄‖

whenever x ∈ B(x̄; δ). Since d(x; Ω) = 0 when x ∈ Ω, we have

    ⟨x*, x − x̄⟩ ≤ ε‖x − x̄‖ for all x ∈ B(x̄; δ) ∩ Ω,

which yields x* ∈ N̂(x̄; Ω) by definition (7.82). The inclusion x* ∈ B* follows
from the Lipschitz continuity of d(·; Ω) on X with modulus ℓ = 1.
To prove the opposite inclusion in (7.94), pick x* ∈ N̂(x̄; Ω) ∩ B* and for
any ε > 0 find by (7.82) a number δ > 0 with

    ⟨x*, x − x̄⟩ ≤ ε‖x − x̄‖ whenever x ∈ B(x̄; δ) ∩ Ω.

Denote δ̃ := min{1, δ/4} and fix any x ∈ X with ‖x − x̄‖ < δ̃. First consider
the case where x ∉ Ω. By ‖x − x̄‖ > 0 we find u ∈ Ω such that

    ‖x − u‖ < d(x; Ω) + ‖x − x̄‖² ≤ ‖x − x̄‖ + ‖x − x̄‖²

and therefore obtain the estimates

    ‖u − x̄‖ ≤ ‖u − x‖ + ‖x − x̄‖ < d(x; Ω) + ‖x − x̄‖² + ‖x − x̄‖
            ≤ 2‖x − x̄‖ + ‖x − x̄‖² < 2δ̃ + δ̃² < δ.

We have furthermore that

    ⟨x*, u − x̄⟩ ≤ ε‖u − x̄‖,

which implies in turn that

    ⟨x*, x − x̄⟩ = ⟨x*, x − u⟩ + ⟨x*, u − x̄⟩
                ≤ ‖x*‖ · ‖x − u‖ + ε‖u − x̄‖
                ≤ ‖x − u‖ + ε(‖u − x‖ + ‖x − x̄‖)
                ≤ d(x; Ω) + ‖x − x̄‖² + ε(‖x − x̄‖ + ‖x − x̄‖²) + ε‖x − x̄‖
                ≤ d(x; Ω) + 2ε‖x − x̄‖ + (1 + ε)‖x − x̄‖².
Using this and picking any x ∈ B(x̄; δ̃) give us

    (d(x; Ω) − d(x̄; Ω) − ⟨x*, x − x̄⟩)/‖x − x̄‖ ≥ −2ε − (1 + ε)‖x − x̄‖.

Taking above the lower limit as x → x̄ yields x* ∈ ∂̂d(x̄; Ω) since ε > 0 was
chosen arbitrarily. This completes the proof of assertion (a).
To verify the claimed inclusion (7.95) in the out-of-set case (b), fix a regular
subgradient x* ∈ ∂̂d(x̄; Ω) and a projection vector w̄ ∈ Π(x̄; Ω). Then for any
ε > 0 there exists δ > 0 such that

    ⟨x*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) + ε‖x − x̄‖ = d(x; Ω) − ‖x̄ − w̄‖ + ε‖x − x̄‖

if ‖x − x̄‖ < δ. Pick w ∈ Ω with ‖w − w̄‖ = ‖(w − w̄ + x̄) − x̄‖ < δ and get

    ⟨x*, w − w̄⟩ = ⟨x*, (w − w̄ + x̄) − x̄⟩ ≤ d(w − w̄ + x̄; Ω) − ‖x̄ − w̄‖ + ε‖w − w̄‖
                ≤ ‖(w − w̄ + x̄) − w‖ − ‖x̄ − w̄‖ + ε‖w − w̄‖ = ε‖w − w̄‖.

Thus we arrive at x* ∈ N̂(w̄; Ω), which finishes the proof of the theorem since
the remaining inclusion x* ∈ ∂p(x̄ − w̄) is obvious. □
Our next goal here is to establish a precise calculation formula for the reg-
ular subdifferential of the distance function at out-of-set points of nonempty
subsets Ω ⊂ X of normed spaces by using the set enlargements

    Ωr := { x ∈ X | d(x; Ω) ≤ r },  r > 0.    (7.96)
We begin with the following lemma that is needed for our calculations.

Lemma 7.72 Let X be a normed space, and let Ω be a nonempty subset of
X. For any x̄ ∈ X with r := d(x̄; Ω) > 0, we have the properties:
(a) If x* ∈ ∂̂f(x̄), then ‖x*‖ = 1, where f(x) := d(x; Ω) for x ∈ X.
(b) d(x; Ωr) = d(x; Ω) − r whenever x ∉ Ωr.
Proof. To verify (a), pick x* ∈ ∂̂f(x̄) and fix any ε > 0. By definition (7.79)
there exists δ > 0 such that

    ⟨x*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) + ε‖x − x̄‖ for all x ∈ B(x̄; δ).    (7.97)

Taking a sequence {tₖ} of positive numbers with limₖ→∞ tₖ = 0, for each
k ∈ N find wₖ ∈ Ω such that

    ‖x̄ − wₖ‖ < d(x̄; Ω) + tₖ².

Let xₖ := x̄ + tₖ(wₖ − x̄) and get ‖xₖ − x̄‖ = tₖ‖wₖ − x̄‖ < tₖ(d(x̄; Ω) + tₖ²) → 0
as k → ∞. Thus we have ‖xₖ − x̄‖ < δ for sufficiently large k and hence

    ⟨x*, xₖ − x̄⟩ ≤ d(xₖ; Ω) − d(x̄; Ω) + ε‖xₖ − x̄‖ for all such k.

Since d(xₖ; Ω) ≤ ‖xₖ − wₖ‖ = (1 − tₖ)‖wₖ − x̄‖, it follows that

    ⟨x*, tₖ(wₖ − x̄)⟩ ≤ (1 − tₖ)‖wₖ − x̄‖ − ‖wₖ − x̄‖ + tₖ² + ε tₖ‖wₖ − x̄‖
                    = tₖ(ε − 1)‖wₖ − x̄‖ + tₖ²

when k ∈ N is large enough. This readily yields the estimate

    1 − ε ≤ ⟨x*, (x̄ − wₖ)/‖x̄ − wₖ‖⟩ + tₖ/‖x̄ − wₖ‖ ≤ ‖x*‖ + tₖ/d(x̄; Ω).

Letting there k → ∞ and ε ↓ 0 shows that 1 ≤ ‖x*‖. Since the distance
function is Lipschitz continuous with modulus ℓ = 1, we have ‖x*‖ ≤ 1,
and hence ‖x*‖ = 1. This completes the proof of assertion (a). The second
assertion follows directly from Lemma 3.80. □
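Assertion (b) admits a quick numerical check for the illustrative choice Ω = {0} ⊂ R², whose enlargement Ωr is the closed ball of radius r (the brute-force distance below is our own rough approximation over the ball's boundary):

```python
import math

def d_Omega(x):                       # distance to Ω = {0} in R^2
    return math.hypot(x[0], x[1])

def d_enlargement(x, r, n=100000):    # brute-force distance to Ω_r = rB
    best = float("inf")
    for k in range(n):
        t = 2 * math.pi * k / n       # boundary of the ball suffices for x ∉ Ω_r
        w = (r * math.cos(t), r * math.sin(t))
        best = min(best, math.hypot(x[0] - w[0], x[1] - w[1]))
    return best

x, r = (3.0, 4.0), 1.0
print(d_enlargement(x, r))            # ≈ d(x; Ω) - r = 5 - 1 = 4, cf. Lemma 7.72(b)
```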
Now we are in a position to obtain a precise formula for the regular sub-
differential of the distance function (7.93) at out-of-set points.

Theorem 7.73 Let X be a normed space, and let Ω be a nonempty subset of
X. For any x̄ ∈ X with r := d(x̄; Ω) > 0 we have the representation

    ∂̂d(x̄; Ω) = N̂(x̄; Ωr) ∩ S*,    (7.98)

where S* := { x* ∈ X* | ‖x*‖ = 1 }.
Proof. Fix any x* ∈ ∂̂d(x̄; Ω) and for each ε > 0 find δ > 0 such that (7.97)
holds. Since d(x; Ω) − d(x̄; Ω) = d(x; Ω) − r ≤ 0 whenever x ∈ Ωr, we have

    ⟨x*, x − x̄⟩ ≤ ε‖x − x̄‖ for all x ∈ B(x̄; δ) ∩ Ωr,

which implies, as ε > 0 was chosen arbitrarily, that x* ∈ N̂(x̄; Ωr). It follows
from Lemma 7.72(a) that ‖x*‖ = 1, and thus we get the inclusion “⊂” in (7.98).
To prove the opposite inclusion in (7.98), fix any x* ∈ N̂(x̄; Ωr) with
‖x*‖ = 1. It follows from Theorem 7.71 that x* ∈ ∂̂d(x̄; Ωr). Taking any
ε > 0 and 0 < η < ε/2, we find δ₁ > 0 for which

    ⟨x*, x − x̄⟩ ≤ d(x; Ωr) − d(x̄; Ωr) + ε‖x − x̄‖ whenever ‖x − x̄‖ < δ₁.

Then Lemma 7.72(b) gives us the inequality

    ⟨x*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) + ε‖x − x̄‖ if ‖x − x̄‖ < δ₁, x ∉ Ωr.    (7.99)

Since x* ∈ N̂(x̄; Ωr), there exists 0 < δ₂ < δ₁ such that

    ⟨x*, x − x̄⟩ ≤ (ε/2)‖x − x̄‖ whenever ‖x − x̄‖ < δ₂, x ∈ Ωr.    (7.100)
Since ‖x*‖ = 1, we can choose a unit vector u ∈ X for which 1 − η ≤ ⟨x*, u⟩.
To proceed further, pick any x ∈ Ωr with ‖x − x̄‖ < δ₂/2 and set z :=
x + rₓu, where rₓ := d(x̄; Ω) − d(x; Ω) ≥ 0. Then

    d(x + rₓu; Ω) ≤ d(x; Ω) + rₓ = d(x̄; Ω) = r

and ‖x + rₓu − x̄‖ ≤ ‖x − x̄‖ + rₓ ≤ 2‖x − x̄‖ < δ₂. Using (7.100) yields

    ⟨x*, x + rₓu − x̄⟩ ≤ (ε/2)‖x − x̄‖,

which implies in turn that

    ⟨x*, x − x̄⟩ ≤ −rₓ⟨x*, u⟩ + (ε/2)‖x − x̄‖ ≤ −rₓ(1 − η) + (ε/2)‖x − x̄‖
                = (d(x; Ω) − d(x̄; Ω))(1 − η) + (ε/2)‖x − x̄‖
                ≤ d(x; Ω) − d(x̄; Ω) + (η + ε/2)‖x − x̄‖
                ≤ d(x; Ω) − d(x̄; Ω) + ε‖x − x̄‖.

Remembering (7.99) gives us

    ⟨x*, x − x̄⟩ ≤ d(x; Ω) − d(x̄; Ω) + ε‖x − x̄‖ whenever ‖x − x̄‖ < δ₂/2.

Thus we arrive at x* ∈ ∂̂d(x̄; Ω) and complete the proof of the theorem. □
The last theorem here presents a precise formula for calculating the limit-
ing subdifferential (7.80) of the distance function associated with a closed set
in finite dimensions for the most challenging case of out-of-set points.

Theorem 7.74 Let Ω be a nonempty closed subset of the Euclidean space Rⁿ.
Then for any x̄ ∉ Ω, we have Π(x̄; Ω) ≠ ∅ and

    ∂d(x̄; Ω) = (x̄ − Π(x̄; Ω)) / d(x̄; Ω).    (7.101)
Proof. As an immediate consequence of the Weierstrass existence theorem,
we have Π(x̄; Ω) ≠ ∅. Fix any v ∈ ∂d(x̄; Ω) and by (7.80) find sequences
xₖ → x̄ and vₖ → v as k → ∞ with vₖ ∈ ∂̂d(xₖ; Ω). For each k ∈ N we pick
wₖ ∈ Π(xₖ; Ω) and get by Theorem 7.71 that

    vₖ ∈ ∂p(xₖ − wₖ) = { (xₖ − wₖ)/‖xₖ − wₖ‖ } when k ∈ N is sufficiently large.

It is easy to see that the sequence {wₖ} is bounded, and hence we suppose
without loss of generality that {wₖ} converges to some w̄ ∈ Π(x̄; Ω). This
ensures the relationships

    v = (x̄ − w̄)/d(x̄; Ω) ∈ (x̄ − Π(x̄; Ω))/d(x̄; Ω),
which verify therefore the subdifferential inclusion

    ∂d(x̄; Ω) ⊂ (x̄ − Π(x̄; Ω))/d(x̄; Ω).    (7.102)

To prove the opposite inclusion in (7.101), take any projection w̄ ∈ Π(x̄; Ω),
consider a sequence tₖ ↓ 0 as k → ∞, and define the vectors xₖ := x̄ + tₖ(w̄ − x̄).
Then we have xₖ → x̄ as k → ∞. Observe furthermore that Π(xₖ; Ω) = {w̄}
and thus deduce from (7.102) that

    ∂d(xₖ; Ω) ⊂ { (xₖ − w̄)/d(xₖ; Ω) } = { (x̄ − w̄)/d(x̄; Ω) }.

Corollary 7.70 tells us that the distance function d(·; Ω) is strictly Fréchet
differentiable at xₖ with ∇d(xₖ; Ω) = (x̄ − w̄)/d(x̄; Ω). This yields
(x̄ − w̄)/d(x̄; Ω) ∈ ∂d(x̄; Ω) and thus completes the proof of the theorem. □
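Formula (7.101) can be verified numerically when Ω is the closed unit ball of R² (an illustrative convex case where Π(x; Ω) = {x/‖x‖} and d(x; Ω) = ‖x‖ − 1 for ‖x‖ > 1):

```python
import math

def d(x):   # distance to the closed unit ball of R^2
    return max(math.hypot(*x) - 1.0, 0.0)

x = (3.0, 4.0)
nx = math.hypot(*x)                       # = 5, so x lies outside the ball
proj = (x[0] / nx, x[1] / nx)             # the unique projection onto Ω
formula = ((x[0] - proj[0]) / d(x),
           (x[1] - proj[1]) / d(x))       # (x - Π(x;Ω))/d(x;Ω) from (7.101)

h = 1e-6                                  # finite-difference gradient of d at x
fd = tuple((d((x[0] + h * e[0], x[1] + h * e[1]))
            - d((x[0] - h * e[0], x[1] - h * e[1]))) / (2 * h)
           for e in ((1, 0), (0, 1)))

print(formula)   # ≈ the unit vector x/||x||
print(fd)        # ≈ the same, so the gradient matches (7.101) here
```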
We conclude this subsection with the following remark about the smooth-
ness of the distance function at out-of-set points.
Remark 7.75 Considering a nonempty closed subset Ω ⊂ Rⁿ and a point
x̄ ∉ Ω such that Π(x̄; Ω) is a singleton (as, e.g., in the case of convex sets),
we have by Theorem 7.74 and Corollary 7.70 that the distance function d(·; Ω)
is strictly Fréchet differentiable at x̄ with

    ∇d(x̄; Ω) = (x̄ − Π(x̄; Ω))/d(x̄; Ω).

However, in this case the distance function may not be continuously differen-
tiable around x̄. This is illustrated by the set Ω ⊂ R² given by

    Ω := { (1/k, 0) | k ∈ N } ∪ {(0, 0)}

at the out-of-set point x̄ = (0, 1) ∈ R².
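The set from Remark 7.75 can be probed numerically after truncating N to finitely many points 1/k (our approximation): the projection at x̄ = (0, 1) is a singleton, while arbitrarily close to x̄ there are points with two-valued projection, at which d(·; Ω) is nondifferentiable:

```python
import math

# truncation of Ω = {(1/k, 0) : k in N} ∪ {(0,0)} from Remark 7.75
points = [(1.0 / k, 0.0) for k in range(1, 2000)] + [(0.0, 0.0)]

def projections(x, tol=1e-12):
    dists = [math.hypot(x[0] - w[0], x[1] - w[1]) for w in points]
    dmin = min(dists)
    return [w for w, dd in zip(points, dists) if dd < dmin + tol], dmin

# at (0,1) the projection is the singleton {(0,0)}, so d is strictly
# differentiable there by Corollary 7.70
P, _ = projections((0.0, 1.0))
print(P)

# at the midpoints ((1/k + 1/(k+1))/2, 1) -> (0,1) the projection is
# two-valued, so d(.; Ω) is nondifferentiable arbitrarily close to (0,1)
k = 5
m = (1.0 / k + 1.0 / (k + 1)) / 2.0
P, _ = projections((m, 1.0))
print(len(P))
```

Hence no neighborhood of x̄ = (0, 1) consists of points of differentiability, which is why d(·; Ω) is not continuously differentiable around x̄.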
7.7.3 Subgradients of Convex Signed Distance Functions

Now we are ready to establish a precise formula for calculating the convex
subdifferential of the signed distance function by using the above tools of
nonconvex generalized differentiation. For the reader’s convenience, recall that
the signed distance function associated with a nontrivial set Ω is defined as

    d̃(x; Ω) := d(x; Ω) if x ∉ Ω,  and  d̃(x; Ω) := −d(x; Ωᶜ) if x ∈ Ω,    (7.103)

by using the complement Ωᶜ of the set Ω. The signed distance function was
studied in Section 6.4 in the general setting of normed spaces. Although the
function d̃(·; Ω) is convex when the set Ω is convex, no results on its subdif-
ferentiation were given in that section. To proceed now in this direction, we
confine ourselves to the finite-dimensional setting and show below that the
main result may fail in infinite dimensions.
First we provide a subdifferential study of the extended-real-valued func-
tion ϑ(·; Ω) defined in (6.19), which is associated with d̃(·; Ω).
Proposition 7.76 Let X be a normed space, and let Ω ⊂ X be a nontrivial,
closed, and convex subset of X. The following assertions hold:
(a) ∂ϑ(x; Ω) ≠ ∅ for all x ∈ Ω.
(b) If x ∈ bd(Ω), then we have the inclusion

    ∂ϑ(x; Ω) ⊂ N(x; Ω).
Proof. To verify (a), it suffices to consider the case where x̄ ∈ bd(Ω). Observe
that the closedness of Ω guarantees that bd(Ω) ⊂ Ω = dom ϑ(·; Ω). Observe
also that the signed distance function d̃(·; Ω) is convex and continuous on X
due to Corollary 6.48 and Proposition 6.45, respectively. Thus we know that
∂d̃(x̄; Ω) ≠ ∅. Pick any x* ∈ ∂d̃(x̄; Ω) and get by the definition that

    ⟨x*, x − x̄⟩ ≤ d̃(x; Ω) − d̃(x̄; Ω) = d̃(x; Ω) for all x ∈ X.

This clearly implies the conditions

    ⟨x*, x − x̄⟩ ≤ d̃(x; Ω) = −d(x; Ωᶜ) = ϑ(x; Ω) − ϑ(x̄; Ω) whenever x ∈ Ω,

and hence x* ∈ ∂ϑ(x̄; Ω), which justifies assertion (a).
To verify (b), take any x* ∈ ∂ϑ(x̄; Ω) and deduce from the definitions that

    ⟨x*, x − x̄⟩ ≤ ϑ(x; Ω) − ϑ(x̄; Ω) = −d(x; Ωᶜ) ≤ 0 for all x ∈ Ω.

This gives us x* ∈ N(x̄; Ω) and thus completes the proof. □
The next result provides a precise formula for representing the subdiffer-
ential of d̃(·; Ω) via that of ϑ(·; Ω) at boundary points of Ω.

Proposition 7.77 Let all the assumptions of Proposition 7.76 be satisfied.
Then we have the representation

    ∂d̃(x̄; Ω) = ∂ϑ(x̄; Ω) ∩ B* for every x̄ ∈ bd(Ω).

Proof. Fix any x̄ ∈ bd(Ω). It follows from Theorem 6.47 that d̃(x; Ω) =
(ϑ(·; Ω) □ p)(x) for all x ∈ X, where p(x) := ‖x‖ for x ∈ X, and where the
infimal convolution is exact at the reference point x̄ in the sense that d̃(x̄; Ω) =
ϑ(x̄; Ω) + p(0), since all the terms therein are zero. Applying the subdifferential
rule for infimal convolutions of convex functions, we get the equalities

    ∂d̃(x̄; Ω) = ∂(ϑ(·; Ω) □ p)(x̄) = ∂ϑ(x̄; Ω) ∩ ∂p(0) = ∂ϑ(x̄; Ω) ∩ B*

and thus verify the claimed subdifferential representation. □
512 7 CONVEXIFIED LIPSCHITZIAN ANALYSIS

It follows from the combination of Propositions 7.76 and 7.77 that, under the imposed assumptions, the subdifferential of the signed distance function admits the upper estimate

∂d̂(x̄; Ω) ⊂ N(x̄; Ω) ∩ B∗ for every x̄ ∈ bd(Ω).
To proceed further, for each x ∈ X consider the set

QΩ(x) := Π(x; Ω) if x ∈ Ωᶜ, and QΩ(x) := Π(x; Ωᶜ) if x ∈ Ω.    (7.104)
The following lemma exploits elementary properties of the Euclidean projec-
tions in the construction of (7.104).
Lemma 7.78 Let Ω be a nontrivial, closed, and convex subset of Rn. If w ∈ QΩ(x̄) with x̄ ∈ int(Ω), then we have the inclusion w − x̄ ∈ N(w; Ω).
Proof. Fix any x ∈ Ωᶜ and write for the Euclidean norm that

‖x − x̄‖² = ‖x − w‖² − 2⟨x − w, x̄ − w⟩ + ‖x̄ − w‖².

Since w ∈ QΩ(x̄) = Π(x̄; Ωᶜ), we get ‖x − x̄‖² − ‖x̄ − w‖² ≥ 0, which yields

⟨x − w, x̄ − w⟩ ≤ ½‖x − w‖² for all x ∈ Ωᶜ.

Now let us pick any x ∈ int(Ω) and verify that w + t(w − x) ∉ Ω whenever t > 0. Indeed, supposing on the contrary that w + t(w − x) = ω ∈ Ω yields

w = (t/(t + 1))x + (1/(t + 1))ω ∈ int(Ω),

a contradiction. Thus we arrive at the estimate

t⟨x̄ − w, w − x⟩ = ⟨x̄ − w, (w + t(w − x)) − w⟩ ≤ ½‖(w + t(w − x)) − w‖² = ½t²‖w − x‖² for all t > 0.

Letting t ↓ 0 gives us ⟨w − x̄, x − w⟩ ≤ 0 for every x ∈ int(Ω) (and hence for all x ∈ Ω since Ω = cl(int(Ω)) for the nontrivial convex set Ω), which shows that w − x̄ ∈ N(w; Ω) as claimed in the lemma. □
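For a concrete illustration (not part of the original text), the normal-cone inclusion of Lemma 7.78 can be checked numerically. The sketch below assumes Ω is the closed unit disk in R², takes an interior point x̄, computes w = QΩ(x̄) as the nearest boundary point, and verifies ⟨w − x̄, x − w⟩ ≤ 0 over sampled points x ∈ Ω.

```python
import numpy as np

# Illustration of Lemma 7.78 for the assumed case Omega = closed unit disk in R^2.
# For an interior point xbar, w = Q_Omega(xbar) is the nearest point of the
# complement (a boundary point here), and w - xbar must lie in N(w; Omega),
# i.e. <w - xbar, x - w> <= 0 for every x in Omega.

rng = np.random.default_rng(0)
xbar = np.array([0.3, 0.2])                      # interior point of the disk
w = xbar / np.linalg.norm(xbar)                  # projection of xbar onto Omega^c

# Sample points of Omega (the unit disk) and test the normal-cone inequality.
samples = rng.normal(size=(1000, 2))
norms = np.linalg.norm(samples, axis=1, keepdims=True)
samples = samples / np.maximum(1.0, norms)       # map outside points to the sphere
inner = samples @ (w - xbar) - w @ (w - xbar)    # <w - xbar, x - w> per sample
assert inner.max() <= 1e-12
print("normal-cone inequality holds at all sampled points")
```

The maximum of ⟨w − x̄, x − w⟩ over the disk is exactly 0, attained only at x = w, so random samples stay strictly below it.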
Here is the main result on the convex subdifferentiation of d̂(·; Ω) together with its "nonconvex" proof. Recall that S stands for the unit sphere of Rn.
Theorem 7.79 Let Ω be a nontrivial, closed, and convex subset of Rn. Then

∂d̂(x; Ω) = (x − QΩ(x))/d̂(x; Ω)       if x ∈ Ωᶜ,
∂d̂(x; Ω) = (x − co QΩ(x))/d̂(x; Ω)    if x ∈ int(Ω),    (7.105)
∂d̂(x; Ω) = co(S ∩ N(x; Ω))           if x ∈ bd(Ω),

where the set QΩ(x) is taken from (7.104).
7.7 Subgradient Analysis of Distance Functions 513

Proof. We consider separately the following three cases corresponding to the position of the reference point x ∈ Rn in (7.105).
Case 1: x ∈ Ωᶜ. In this case we have that d̂(x; Ω) = d(x; Ω) for all x ∈ Ωᶜ, where Ωᶜ is an open set in Rn. Using Corollary 3.79 on subdifferentiation of the distance function for a convex set tells us that

∂d̂(x; Ω) = ∂d(x; Ω) = (x − Π(x; Ω))/d(x; Ω) = (x − QΩ(x))/d̂(x; Ω), x ∈ Ωᶜ,

which verifies the subdifferential representation of (7.105) in this case.
Case 2: x ∈ int(Ω). In this case we deduce from the definitions that

d̂(x; Ω) = −d(x; Ωᶜ) for all x ∈ int(Ω).

Then applying the relationship between the limiting subgradients and generalized gradients in (7.89) of Theorem 7.69, the symmetry property (7.54) for generalized gradients of locally Lipschitzian functions, and the limiting subdifferential calculation (7.101) for the usual distance function in Theorem 7.74 brings us to the equalities

∂d̂(x; Ω) = −∂̄d(x; Ωᶜ) = −co(∂d(x; Ωᶜ)) = −co{(x − QΩ(x))/d(x; Ωᶜ)} = (x − co QΩ(x))/d̂(x; Ω).
Thus we arrive at the claimed formula (7.105) in the case where x ∈ int(Ω).
Case 3: x̄ ∈ bd(Ω). Since the signed distance function d̂(·; Ω) is convex and Lipschitz continuous on Rn, and since the boundary set bd(Ω) has Lebesgue measure zero in Rn, it follows from the generalized gradient representation of Theorem 7.61 that

∂d̂(x̄; Ω) = co{lim ∇f(xk) | xk → x̄, xk ∉ bd(Ω) ∪ Df},    (7.106)

where we denote f(x) := d̂(x; Ω) on Rn for convenience, and where Df stands for the set of points at which f fails to be differentiable.
Select a sequence {xk} ⊂ Rn such that xk ∉ bd(Ω) ∪ Df for every k ∈ N, that xk → x̄ as k → ∞, and that the limit v := lim_{k→∞} ∇f(xk) exists. Suppose first that for some k0 ∈ N, we have xk ∉ Ω whenever k ≥ k0. Remembering that the Euclidean projection Π(xk; Ω) is a singleton by the convexity of Ω yields the inclusions

∇f(xk) = (xk − Π(xk; Ω))/d(xk; Ω) ∈ N(Π(xk; Ω); Ω) ∩ S for all k ≥ k0,

and hence we get v ∈ N(x̄; Ω) ∩ S by passing to the limit as k → ∞.
It remains to consider the complementary situation where there exists a subsequence of {xk} such that (without relabeling) xk ∈ int(Ω) for all k ∈ N. Thus we are in the setting of Case 2, which gives us

∇f(xk) = (xk − wk)/d̂(xk; Ω) with {wk} := co QΩ(xk),

where the set co QΩ(xk) is a singleton, and where ‖∇f(xk)‖ = 1 for each k ∈ N. The latter ensures that v ∈ S. Furthermore, by Lemma 7.78 we have

(xk − co QΩ(xk))/d̂(xk; Ω) = (wk − xk)/d(xk; Ωᶜ) ∈ N(wk; Ω).

Since wk → x̄ as k → ∞, it follows from the above constructions that v ∈ N(x̄; Ω), and hence v ∈ N(x̄; Ω) ∩ S. Now we deduce from the representation in (7.106) that ∂d̂(x̄; Ω) ⊂ co(N(x̄; Ω) ∩ S).
To verify the opposite inclusion in (7.105) in this case, pick any v ∈ N(x̄; Ω) ∩ S and let xk := x̄ + v/k for all k ∈ N. It is not hard to check that ∇f(xk) = v for all k, and hence v ∈ ∂d̂(x̄; Ω). Since the subdifferential ∂d̂(x̄; Ω) is a convex set, this justifies formula (7.105) for x̄ ∈ bd(Ω) and thus completes the proof of the theorem. □
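As a hedged numerical sanity check of formula (7.105) (an illustration, not part of the book's material), one can take Ω as the closed unit ball of R², where the signed distance is d̂(x; Ω) = ‖x‖ − 1 and QΩ(x) = x/‖x‖ for x ≠ 0, and compare a finite-difference gradient of d̂ with the quotient from (7.105) at points inside and outside Ω.

```python
import numpy as np

# Finite-difference check of (7.105) for the assumed case Omega = closed unit
# disk in R^2, where dhat(x) = ||x|| - 1 and Q_Omega(x) = x/||x|| for x != 0.

def dhat(x):
    return np.linalg.norm(x) - 1.0               # signed distance to the disk

def grad_fd(x, eps=1e-6):
    # central finite differences approximating the gradient of dhat at x
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (dhat(x + e) - dhat(x - e)) / (2 * eps)
    return g

for x in (np.array([2.0, 1.0]), np.array([0.3, -0.2])):   # outside and inside
    q = x / np.linalg.norm(x)                    # Q_Omega(x): nearest boundary point
    formula = (x - q) / dhat(x)                  # the quotient from (7.105)
    assert np.allclose(grad_fd(x), formula, atol=1e-5)
    assert np.isclose(np.linalg.norm(formula), 1.0)   # gradients lie in the sphere S
print("formula (7.105) matches the numerical gradient at sample points")
```

Note that d̂(x; Ω) is negative at the interior point, so the quotient there automatically encodes the sign flip used in Case 2 of the proof.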

We conclude this section with an example showing that the finite dimen-
sion of the Euclidean space X = Rn is essential for the fulfillment of Theo-
rem 7.79. In fact, the following example demonstrates that the subdifferential
formula (7.105) fails in any infinite-dimensional separable Hilbert space.

Example 7.80 Let X be an infinite-dimensional separable Hilbert space with an orthonormal basis {ek}k∈N. Consider the set

Ω := {x ∈ X | ⟨x, ek⟩ ≤ 1/k for all k ∈ N},

which is clearly closed and convex. We are going to show that x̄ = 0 ∈ Ω is a boundary point of the set Ω, and that

∂d̂(0; Ω) ≠ co(S ∩ N(0; Ω)).

Let us first check that 0 ∈ bd(Ω). Indeed, fix any ε > 0 and find k ∈ N with 1/k < ε/2. This implies that (ε/2)ek ∈ B(0; ε) ∩ Ωᶜ, and hence 0 ∈ bd(Ω) since ε > 0 was chosen arbitrarily. Due to ±(1/k)ek ∈ Ω for all k ∈ N, we see that the inclusion x ∈ N(0; Ω) yields ⟨x, ek⟩ = 0 for all k ∈ N. This shows that N(0; Ω) = {0}, and thus S ∩ N(0; Ω) = ∅. On the other hand, we know that the function d̂(·; Ω) is (Lipschitz) continuous and convex, and therefore ∂d̂(0; Ω) ≠ ∅. This tells us that formula (7.105) fails in infinite dimensions.

7.8 Differences of Convex Functions


This section is devoted to the investigation of a large class of functions, which
can be represented as differences of convex (DC) functions. Such functions
frequently appear in many areas of variational analysis, optimization, and
their applications. A significant part of optimization theory, known as DC optimization, will be considered in the second volume of our book [240]
together with numerical algorithms and a variety of applications. Here we
present some basic properties of DC functions including their subdifferential
and conjugate calculi. A special structure of DC functions makes it possible
to broadly use the machinery of convex analysis, while their intrinsic noncon-
vexity requires also employing nonconvex subdifferentiation studied above.
Since the main part of this chapter deals with Lipschitzian functions, we
restrict ourselves to differences of two continuous convex functions, where Lip-
schitz continuity comes from convexity. Real-valued functions of this type are
known as continuous DC functions. We also consider their local counterparts
and derive for them major rules of subdifferential and conjugate calculi. More
general extended-real-valued DC functions without any continuity assump-
tions are discussed in the commentary Section 7.10.
Unless otherwise stated, all the functions under consideration in this sec-
tion are defined on normed spaces.

7.8.1 Continuous DC Functions

Here is the basic class of functions we study in this section.


Definition 7.81 Let Ω be a nonempty convex subset of a normed space X.
A function f : Ω → R is said to be a continuous DC function on Ω if
there exist two continuous convex functions g, h : Ω → R such that f (x) =
g(x) − h(x) for all x ∈ Ω. If f is a continuous DC function on the entire
space X, we simply say that it is a continuous DC function.
Both parts of the following example are instructive for understanding the
given definition of continuous DC functions.
Example 7.82 (a) Let Ω be a nonempty (not necessarily convex) subset of an inner product space X. Define f(x) := d²(x; Ω) for x ∈ X and show that it is a continuous DC function. Indeed, we have

f(x) = inf_{w∈Ω} ‖x − w‖² = inf_{w∈Ω} (‖x‖² − 2⟨x, w⟩ + ‖w‖²) = ‖x‖² − sup_{w∈Ω} (2⟨w, x⟩ − ‖w‖²), x ∈ X.

Since both g(x) := ‖x‖² and h(x) := sup_{w∈Ω} (2⟨w, x⟩ − ‖w‖²) are convex and continuous on X, we verify that f is a continuous DC function.

(b) The function f(x) := √|x| for x ∈ R is not a continuous DC function. Indeed, any continuous DC function on Rn must be locally Lipschitz continuous, which is not the case for this function around the origin.
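The algebra of part (a) is easy to test numerically. The sketch below is an illustration under the assumption that Ω is a finite (hence nonconvex) subset of R², so that infima and suprema become minima and maxima; it checks the identity d²(x; Ω) = ‖x‖² − sup_{w∈Ω}(2⟨w, x⟩ − ‖w‖²) at random points.

```python
import numpy as np

# Example 7.82(a) for an assumed finite, nonconvex Omega in R^2:
# d^2(x; Omega) = ||x||^2 - sup_{w in Omega} (2<w, x> - ||w||^2) = g(x) - h(x).

Omega = np.array([[0.0, 0.0], [2.0, 1.0], [-1.0, 3.0]])    # illustrative points

def dist_sq(x):
    return np.min(np.sum((Omega - x) ** 2, axis=1))        # squared distance

def dc_split(x):
    g = x @ x                                              # g(x) = ||x||^2, convex
    h = np.max(2 * Omega @ x - np.sum(Omega**2, axis=1))   # h(x): max of affine maps
    return g - h

rng = np.random.default_rng(1)
for x in rng.normal(size=(100, 2)):
    assert np.isclose(dist_sq(x), dc_split(x))
print("d^2 = g - h confirmed at 100 random points")
```

Here h is a pointwise maximum of affine functions, which makes its convexity visible without any computation.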
The next proposition reveals the continuous DC structure of a major class
of smooth functions on inner product spaces.
Proposition 7.83 Let X be an inner product space, and let f : X → R be a Fréchet differentiable function with its Fréchet derivative ∇f being Lipschitz continuous on X with modulus ℓ ≥ 0. Then f is a continuous DC function.

Proof. We obviously have the representation

f(x) = ½((ℓ/2)‖x‖² + f(x)) − ½((ℓ/2)‖x‖² − f(x)) for all x ∈ X.

Since the continuity of both functions g(x) := ℓ‖x‖²/2 + f(x) and h(x) := ℓ‖x‖²/2 − f(x) does not need any proof, let us check that these functions are convex on X. Taking any x0 ∈ X and v ∈ X, we first show that the function ϕ(t) := g(x0 + tv) for t ∈ R is convex on R. Indeed, it is easy to see that ϕ is differentiable on R with the derivative calculated by

ϕ′(t) = ℓ⟨x0 + tv, v⟩ + ⟨∇f(x0 + tv), v⟩, t ∈ R.

Observe that if t1 < t2, then we get

ϕ′(t2) − ϕ′(t1) = ℓ(t2 − t1)⟨v, v⟩ + ⟨∇f(x0 + t2v) − ∇f(x0 + t1v), v⟩
≥ ℓ(t2 − t1)‖v‖² − ‖∇f(x0 + t2v) − ∇f(x0 + t1v)‖ · ‖v‖
≥ ℓ(t2 − t1)‖v‖² − ℓ(t2 − t1)‖v‖² = 0.

Thus the derivative ϕ′ is monotone increasing, which tells us that ϕ is convex. This clearly implies that g is convex. A similar proof shows that h is also convex, and hence we are done with the proof. □
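Proposition 7.83 can be illustrated numerically. In the sketch below (an illustrative choice, not the book's), f = sin on R has derivative cos, which is Lipschitz continuous with modulus ℓ = 1; both g(x) = ℓx²/2 + f(x) and h(x) = ℓx²/2 − f(x) pass a randomized Jensen-inequality test, so f = (g − h)/2 is a continuous DC function.

```python
import numpy as np

# Proposition 7.83 for the assumed choice f = sin on R (derivative cos is
# 1-Lipschitz): both g = l/2 x^2 + f and h = l/2 x^2 - f should be convex,
# and f = (g - h)/2 is then a continuous DC function.

l = 1.0
f = np.sin
g = lambda x: l * x**2 / 2 + f(x)
h = lambda x: l * x**2 / 2 - f(x)

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(-10, 10, size=(2, 1000))
lam = rng.uniform(0, 1, size=1000)
mid = lam * x1 + (1 - lam) * x2
for phi in (g, h):
    gap = lam * phi(x1) + (1 - lam) * phi(x2) - phi(mid)   # Jensen gap >= 0
    assert gap.min() >= -1e-9                              # convexity confirmed
assert np.allclose((g(x1) - h(x1)) / 2, f(x1))             # f = (g - h)/2
print("g and h pass randomized convexity tests")
```

In this case one can also verify the convexity analytically: g″ = 1 − sin ≥ 0 and h″ = 1 + sin ≥ 0 on R.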

The following proposition is straightforward, but it is important to observe


that the class of continuous DC functions is invariant with respect to taking
the negative sign. This distinguishes it dramatically from the one-sided classes of
convex and concave functions.

Proposition 7.84 Let X be a normed space, and let f : X → R be a contin-


uous DC function. Then λf is also a continuous DC function for any λ ∈ R.

Next we check that sums of finitely many continuous DC functions also


belong to this class of functions.

Proposition 7.85 Let X be a normed space. If fi : X → R are continuous DC functions for all i = 1, . . . , m, then their sum f1 + · · · + fm is also a continuous DC function on X.

Proof. It follows from Definition 7.81 that there exist continuous convex functions gi, hi : X → R such that fi = gi − hi on X for i = 1, . . . , m. Then

Σ_{i=1}^{m} fi(x) = Σ_{i=1}^{m} gi(x) − Σ_{i=1}^{m} hi(x) for all x ∈ X,

which readily implies that the sum f1 + · · · + fm is a continuous DC function. □

We see below that both maximum and minimum operations over finitely
many continuous DC functions keep the resulting functions in this class.

Proposition 7.86 Let X be a normed space. If fi : X → R for i = 1, . . . , m are continuous DC functions, then so are max_{i=1,...,m} fi and min_{i=1,...,m} fi.

Proof. Definition 7.81 tells us that there exist continuous convex functions gi, hi : X → R such that fi = gi − hi for i = 1, . . . , m on X, and hence

fi(x) = (gi(x) + Σ_{j=1, j≠i}^{m} hj(x)) − Σ_{j=1}^{m} hj(x), x ∈ X,

for i = 1, . . . , m. It follows therefore that for all x ∈ X we get

max_{i=1,...,m} fi(x) = max_{i=1,...,m} (gi(x) + Σ_{j=1, j≠i}^{m} hj(x)) − Σ_{j=1}^{m} hj(x).

Since the functions g(x) := max_{i=1,...,m} (gi(x) + Σ_{j=1, j≠i}^{m} hj(x)) and h(x) := Σ_{j=1}^{m} hj(x) for x ∈ X are continuous and convex, we arrive at the claimed conclusion for max_{i=1,...,m} fi. The case of minimum functions reduces to the maximum one by Proposition 7.84, and thus the proof is complete. □
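The decomposition in the proof can be tested directly. A minimal sketch, assuming the two illustrative DC functions f1 = g1 − h1 and f2 = g2 − h2 on R given below, checks the identity max{f1, f2} = max{g1 + h2, g2 + h1} − (h1 + h2) pointwise.

```python
import numpy as np

# Decomposition from the proof of Proposition 7.86 for two assumed DC
# functions on R with convex pieces g_i, h_i (m = 2).

g1, h1 = (lambda x: x**2), np.abs
g2, h2 = (lambda x: np.abs(x - 1)), (lambda x: x**2)

f1 = lambda x: g1(x) - h1(x)          # f1(x) = x^2 - |x|
f2 = lambda x: g2(x) - h2(x)          # f2(x) = |x - 1| - x^2

# max(f1, f2) = max(g1 + h2, g2 + h1) - (h1 + h2): convex minus convex.
x = np.linspace(-3, 3, 601)
lhs = np.maximum(f1(x), f2(x))
rhs = np.maximum(g1(x) + h2(x), g2(x) + h1(x)) - (h1(x) + h2(x))
assert np.allclose(lhs, rhs)
print("max(f1, f2) admits the DC representation from the proof")
```

Both terms on the right-hand side are convex (a maximum of sums of convex functions, and a sum of convex functions), exactly as the proof requires.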

To proceed with other operations keeping functions in the continuous DC class, we first present the following useful lemma.

Lemma 7.87 Let X be a normed space. If f : X → R is a continuous DC function, then there exist nonnegative, convex, and continuous functions g, h : X → [0, ∞) such that f(x) = g(x) − h(x) for all x ∈ X.

Proof. Let f1, f2 : X → R be continuous convex functions such that f(x) = f1(x) − f2(x) on X. Due to the subdifferentiability of these functions everywhere on X, we find x₁∗, x₂∗ ∈ X∗ and b1, b2 ∈ R such that

f1(x) ≥ ϕ1(x) := ⟨x₁∗, x⟩ + b1 and f2(x) ≥ ϕ2(x) := ⟨x₂∗, x⟩ + b2

for all x ∈ X. Then we have f(x) = (f1(x) − ϕ1(x)) − (f2(x) − ϕ2(x)) + φ(x), where φ(x) := ϕ1(x) − ϕ2(x) is an affine function with

φ(x) = max{φ(x), 0} − max{−φ(x), 0}, x ∈ X.

Define φ₊(x) := max{φ(x), 0} and φ₋(x) := max{−φ(x), 0}, and observe that both functions are nonnegative, continuous, and convex on X. Thus we get

f(x) = (f1(x) − ϕ1(x) + φ₊(x)) − (f2(x) − ϕ2(x) + φ₋(x)), x ∈ X.

Defining finally g(x) := f1(x) − ϕ1(x) + φ₊(x) and h(x) := f2(x) − ϕ2(x) + φ₋(x) completes the proof of the lemma. □

The last two results in this subsection follow from the above lemma.

Proposition 7.88 Let X be a normed space. If f : X → R is a continuous DC function, then its square f² belongs to this class as well.

Proof. Lemma 7.87 allows us to find continuous convex functions g, h : X → [0, ∞) such that f(x) = g(x) − h(x) on X. Then we have the representation

f²(x) = g²(x) + h²(x) − 2g(x)h(x) = (2g²(x) + 2h²(x)) − (g(x) + h(x))², x ∈ X.

This readily proves the continuous DC property of f², since both functions 2g² + 2h² and (g + h)² are convex and continuous on X. □
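The identity behind Proposition 7.88 is purely algebraic and can be verified at once; a minimal sketch for the illustrative nonnegative convex pair g = |·| and h = (·)² follows.

```python
import numpy as np

# Algebraic identity from Proposition 7.88 for an assumed pair of nonnegative
# convex functions g = |.| and h = (.)^2 with f = g - h:
# f^2 = (2g^2 + 2h^2) - (g + h)^2, a difference of two convex functions.

g = np.abs
h = lambda x: x**2
f = lambda x: g(x) - h(x)

x = np.linspace(-2, 2, 401)
assert np.allclose(f(x)**2, (2*g(x)**2 + 2*h(x)**2) - (g(x) + h(x))**2)

# Both pieces are convex (squares of nonnegative convex functions): Jensen check.
lam = 0.5
for phi in (lambda t: 2*g(t)**2 + 2*h(t)**2, lambda t: (g(t) + h(t))**2):
    gap = lam*phi(x[:-1]) + (1-lam)*phi(x[1:]) - phi(lam*x[:-1] + (1-lam)*x[1:])
    assert gap.min() >= -1e-9
print("f^2 = (2g^2 + 2h^2) - (g + h)^2 verified")
```

The nonnegativity of g and h supplied by Lemma 7.87 is what makes their squares convex, which is why the lemma precedes this proposition.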

The previous result ensures the preservation of the continuous DC property for finite products of such functions.

Proposition 7.89 Let X be a normed space. If fi : X → R for i = 1, . . . , m are continuous DC functions, then so is their product f1 f2 · · · fm.

Proof. It suffices to prove the proposition in the case where m = 2. We have

(f1 f2)(x) = ½((f1(x) + f2(x))² − f1²(x) − f2²(x)), x ∈ X.

It follows from Proposition 7.88 that (f1 + f2)², f1², and f2² are continuous DC functions, and thus f1 f2 also belongs to this class. □

We conclude this subsection by describing an important class of quadratic forms that belongs to the collection of continuous DC functions.

Example 7.90 Let A ∈ Rn×n be a symmetric matrix, and let

f(x) := ⟨x, Ax⟩ for all x ∈ Rn.

To verify that this quadratic form is a continuous DC function, fix a real number γ such that γ ≥ max{|λi(A)| | i = 1, . . . , n}, where λi(A) for i = 1, . . . , n are the eigenvalues of the matrix A. Then we have the representation

f(x) = ½(⟨x, (γI + A)x⟩ − ⟨x, (γI − A)x⟩) for all x ∈ Rn,

which readily justifies the claim.
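In coordinates, the split of Example 7.90 can be checked with any symmetric matrix; the sketch below (illustrative choice of A) computes γ from the eigenvalues and verifies that both γI + A and γI − A are positive semidefinite, so the two quadratic forms in the representation are convex.

```python
import numpy as np

# Example 7.90 for an assumed symmetric matrix A: with gamma >= max |lambda_i(A)|,
# both gamma*I + A and gamma*I - A are positive semidefinite, hence
# <x, Ax> = (1/2)(<x, (gamma*I + A)x> - <x, (gamma*I - A)x>) is a DC split.

A = np.array([[1.0, 2.0], [2.0, -3.0]])
gamma = np.max(np.abs(np.linalg.eigvalsh(A)))    # smallest admissible gamma
I2 = np.eye(2)

for M in (gamma * I2 + A, gamma * I2 - A):
    assert np.linalg.eigvalsh(M).min() >= -1e-9  # PSD: quadratic form is convex

x = np.array([0.7, -1.3])
lhs = x @ A @ x
rhs = 0.5 * (x @ (gamma * I2 + A) @ x - x @ (gamma * I2 - A) @ x)
assert np.isclose(lhs, rhs)
print("quadratic form split into a difference of convex quadratic forms")
```

Any γ above the spectral radius works; the eigenvalue computation just yields the smallest admissible choice.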

7.8.2 The Mixing Property of DC Functions

This subsection continues the study of DC functions on normed spaces. The main result, Theorem 7.97, sometimes labeled the mixing lemma, provides a convenient test for checking the DC property of various compositions; see, e.g., Corollary 7.98 and Exercise 7.156. We begin with some preliminary results, which are used in what follows.

Proposition 7.91 Let X be a normed space, and let f : X → R be a contin-


uous function. The following properties are equivalent:
(a) f is a continuous DC function on X.
(b) There exists a continuous function ϕ : X → R such that both functions
−f + ϕ and f + ϕ are convex on X.
(c) There exists a continuous function ϕ : X → R such that for any x1 , x2 ∈
X and 0 < λ < 1, we have
 
λf (x1 ) + (1 − λ)f (x2 ) − f λx
 1 + (1 − λ)x2  (7.107)
≤ λϕ(x1 ) + (1 − λ)ϕ(x2 ) − ϕ λx1 + (1 − λ)x2 .

Proof. To verify (a)⟹(b), suppose by (a) that f is a continuous DC function on X and find continuous convex functions f1, f2 : X → R such that f = f1 − f2. Define ϕ := f1 + f2. Then ϕ is continuous, and both of the functions −f + ϕ = 2f2 and f + ϕ = 2f1 are convex, i.e., we get (b).

To prove (b)⟹(c), take by (b) a continuous function ϕ : X → R such that both −f + ϕ and f + ϕ are convex and continuous. Pick any x1, x2 ∈ X and 0 < λ < 1, and then let x := λx1 + (1 − λ)x2. It follows from the convexity of −f + ϕ and f + ϕ that

−f(x) + ϕ(x) ≤ λ(−f(x1) + ϕ(x1)) + (1 − λ)(−f(x2) + ϕ(x2)),
f(x) + ϕ(x) ≤ λ(f(x1) + ϕ(x1)) + (1 − λ)(f(x2) + ϕ(x2)),

which ensure therefore the inequalities

−f(x) + λf(x1) + (1 − λ)f(x2) ≤ λϕ(x1) + (1 − λ)ϕ(x2) − ϕ(x),
f(x) − λf(x1) − (1 − λ)f(x2) ≤ λϕ(x1) + (1 − λ)ϕ(x2) − ϕ(x).    (7.108)

Since ϕ = ((−f + ϕ) + (f + ϕ))/2 is a convex function, we get λϕ(x1) + (1 − λ)ϕ(x2) − ϕ(x) ≥ 0. Thus the desired condition (7.107) follows from (7.108).

Finally, assume that (c) is satisfied and deduce directly from (7.107) that the Jensen inequality holds for both −f + ϕ and f + ϕ. Thus both functions −f + ϕ and f + ϕ are continuous and convex with f = (f + ϕ)/2 − (−f + ϕ)/2. This tells us that f is a continuous DC function. □

The following notion plays a significant role in the rest of this subsection.
Definition 7.92 Let X be a normed space, and let f : X → R be a continuous
DC function. We say that ϕ : X → R is a control function for f if ϕ is
continuous and both functions −f + ϕ and f + ϕ are convex.
It follows directly from the definition that any control function for a given
DC function is always convex and continuous.

Proposition 7.93 Let X be a normed space, and let f : X → R be a continuous DC function controlled by a continuous function ϕ : X → R. Then |f| is a continuous DC function controlled by |f| + 2ϕ.

Proof. Define the functions g, h : X → R by

g(x) := ½(f(x) + ϕ(x)) and h(x) := ½(−f(x) + ϕ(x)), x ∈ X.

Then we have f(x) = g(x) − h(x), ϕ(x) = g(x) + h(x), and

|f(x)| = max{g(x) − h(x), h(x) − g(x)} = max{2g(x), 2h(x)} − (g(x) + h(x))

for all x ∈ X. It follows therefore that

ϕ̃(x) := |f(x)| + 2ϕ(x) = max{2g(x), 2h(x)} + g(x) + h(x), x ∈ X.

Then ϕ̃ is a continuous function such that both functions |f(x)| + ϕ̃(x) = 2 max{2g(x), 2h(x)} and −|f(x)| + ϕ̃(x) = 2(g(x) + h(x)) are convex on X. This tells us that ϕ̃ is a control function for |f|. □

The next proposition gives us a convenient sufficient condition for the


convexity of a continuous function of one real variable.

Proposition 7.94 Let f : R → R be a continuous function. Then f is convex if for any x ∈ R and δ > 0 there exist 0 < λ < 1 and u1, u2 ∈ (x − δ, x + δ) with u1 ≠ u2 such that x = λu1 + (1 − λ)u2 and f(x) ≤ λf(u1) + (1 − λ)f(u2).

Proof. Given ε > 0, define the function gε(x) := f(x) + εx² for x ∈ R. To verify the convexity of f, it suffices to check that gε is convex for every ε > 0. Suppose on the contrary that gε is not convex for some ε > 0 and find x1, x2 ∈ R such that

gε(tx1 + (1 − t)x2) > tgε(x1) + (1 − t)gε(x2) for some 0 < t < 1.

Without loss of generality we can assume that x1 < x2. Let x0 := tx1 + (1 − t)x2 and define the affine function

ℓ(x) := ((gε(x2) − gε(x1))/(x2 − x1))(x − x1) + gε(x1) for x ∈ R,

which is the line connecting (x1, gε(x1)) and (x2, gε(x2)). Then the continuous function gε − ℓ achieves its absolute maximum on [x1, x2] at some point x̄ ∈ [x1, x2]. Define ψ(x) := ℓ(x) + (gε(x̄) − ℓ(x̄)) for x ∈ R and observe the following properties:

(a) gε(x) ≤ ψ(x) for all x ∈ [x1, x2].
(b) gε(x̄) = ψ(x̄).
(c) gε(x1) < ψ(x1) and gε(x2) < ψ(x2).

Indeed, (a) holds since for any x ∈ [x1, x2] we have

gε(x) − ℓ(x) ≤ gε(x̄) − ℓ(x̄),

which implies that gε(x) ≤ ℓ(x) + (gε(x̄) − ℓ(x̄)) = ψ(x). Since (b) is obvious, it remains to verify (c). To this end, we deduce from gε(x0) > ℓ(x0) that

gε(x̄) − ℓ(x̄) ≥ gε(x0) − ℓ(x0) > 0,

which implies that ψ(x1) = ℓ(x1) + (gε(x̄) − ℓ(x̄)) > ℓ(x1) = gε(x1), with a similar proof for x2; see Figure 7.1.

Fig. 7.1. The functions gε, ℓ, and ψ

It follows from (c) that x̄ ∈ (x1, x2). The imposed assumptions allow us to find u1, u2 ∈ (x1, x2) with u1 ≠ u2 and 0 < λ < 1 such that

f(x̄) ≤ λf(u1) + (1 − λ)f(u2),

where x̄ = λu1 + (1 − λ)u2. Using this property together with the strict convexity of the squaring function tells us that

gε(x̄) = f(x̄) + εx̄² < λf(u1) + λεu1² + (1 − λ)f(u2) + (1 − λ)εu2² = λgε(u1) + (1 − λ)gε(u2).

Since ψ is an affine function, by using (a) we have

gε(x̄) < λgε(u1) + (1 − λ)gε(u2) ≤ λψ(u1) + (1 − λ)ψ(u2) = ψ(λu1 + (1 − λ)u2) = ψ(x̄),

which contradicts (b). This verifies therefore that f is a convex function. □

As parts of the proof of the main result, we now present the following two lemmas, which are also of independent interest.

Lemma 7.95 Let X be a normed space, and let f : X → R be continuous. Suppose that there is a continuous function ϕ : X → R such that for any x, v ∈ X with ‖v‖ = 1 and any δ > 0 there are x1, x2 ∈ B(x; δ) with x1 ≠ x2 and 0 < λ < 1 such that x = λx1 + (1 − λ)x2, x1 − x2 = ‖x1 − x2‖v, and

|λf(x1) + (1 − λ)f(x2) − f(x)| ≤ λϕ(x1) + (1 − λ)ϕ(x2) − ϕ(x).

Then f is a continuous DC function on X.

Proof. To verify that f is a continuous DC function, it suffices to show that both functions f + ϕ and −f + ϕ are convex on X. In what follows we prove that f + ϕ is convex and leave the proof that −f + ϕ is convex as an exercise.

Fix any a ∈ X with a ≠ 0 and b ∈ X. Define the function ψ : R → R by

ψ(t) := (f + ϕ)(at + b) for all t ∈ R.

Let us show that ψ is convex on R and then conclude that the function f + ϕ is convex on X as well.

To verify the convexity of ψ, we employ Proposition 7.94 to show that, taking any t ∈ R and δ > 0, there exist t1, t2 ∈ (t − δ, t + δ) with t1 ≠ t2 and 0 < λ < 1 such that t = λt1 + (1 − λ)t2 and

ψ(t) ≤ λψ(t1) + (1 − λ)ψ(t2).

Denote x := at + b and v := a/‖a‖. By the imposed assumptions, we find x1, x2 ∈ B(x; δ‖a‖) with x1 ≠ x2 and 0 < λ < 1 satisfying

(a) x = λx1 + (1 − λ)x2,
(b) x1 − x2 = ‖x1 − x2‖v,
(c) |λf(x1) + (1 − λ)f(x2) − f(x)| ≤ λϕ(x1) + (1 − λ)ϕ(x2) − ϕ(x).

It follows directly from (c) that

f(at + b) + ϕ(at + b) ≤ λ(f(x1) + ϕ(x1)) + (1 − λ)(f(x2) + ϕ(x2)).    (7.109)

Employing (a) and (b) tells us that

x1 = a(t + (1 − λ)‖x1 − x2‖/‖a‖) + b and x2 = a(t − λ‖x1 − x2‖/‖a‖) + b.

Denote further the real numbers

t1 := t + (1 − λ)‖x1 − x2‖/‖a‖ and t2 := t − λ‖x1 − x2‖/‖a‖

and observe that t = λt1 + (1 − λ)t2. It follows from (7.109) that

ψ(t) = f(at + b) + ϕ(at + b) ≤ λ(f + ϕ)(at1 + b) + (1 − λ)(f + ϕ)(at2 + b) = λψ(t1) + (1 − λ)ψ(t2).

Since x1, x2 ∈ B(x; δ‖a‖), we have

‖x1 − x‖ = ‖at1 + b − (at + b)‖ = ‖a‖ · |t1 − t| < δ‖a‖,

which implies that |t − t1| < δ and hence t1 ∈ (t − δ, t + δ). A similar argument shows that t2 ∈ (t − δ, t + δ). Then Proposition 7.94 yields the convexity of ψ. This verifies therefore that f is a continuous DC function. □

Lemma 7.96 Consider continuous DC functions fi : X → R for i = 1, . . . , m, and let f : X → R be a continuous function such that

f(x) ∈ {f1(x), . . . , fm(x)} whenever x ∈ X.    (7.110)

Then for any x, v ∈ X with ‖v‖ = 1 and for any δ > 0 there exist x1, x2 ∈ B(x; δ) with x1 ≠ x2, 0 < λ < 1, and r, s ∈ {1, . . . , m} such that

(a) x = λx1 + (1 − λ)x2,
(b) x1 − x2 = ‖x1 − x2‖v,
(c) f(x1) = fr(x1), f(x2) = fs(x2), and f(x) = fr(x) = fs(x).

Proof. Define the sets

L+ := {x + tv | t > 0} and L− := {x + tv | t < 0}.

We first claim that there exist r ∈ I := {1, . . . , m} and x1 ∈ B(x; δ) ∩ L+ such that f(x) = fr(x) and f(x1) = fr(x1). Choose a sequence {uk} in L+ that converges to x. Denote Ik := {i ∈ I | f(uk) = fi(uk)} and observe that each Ik is a nonempty subset of I. Since there are only finitely many subsets of I, by extracting a subsequence, we suppose without loss of generality that there exists a nonempty subset J of I such that Ik = J for all k ∈ N. Fix r ∈ J and get f(uk) = fr(uk) for all k ∈ N. Passing there to the limit as k → ∞ and using the continuity of f and fr yields f(x) = fr(x). Letting finally x1 := uk for some k ∈ N with uk ∈ B(x; δ) justifies the claim. Similarly we verify the existence of s ∈ I and x2 ∈ B(x; δ) ∩ L− such that f(x) = fs(x) and f(x2) = fs(x2). It follows furthermore that x = λx1 + (1 − λ)x2 for some λ ∈ (0, 1) and that x1 − x2 = ‖x1 − x2‖v, which completes the proof of the lemma. □

Now we are ready to prove the main result of this subsection.

Theorem 7.97 Let fi : X → R for i = 1, . . . , m be continuous DC functions defined on a normed space X. If f : X → R is a continuous function such that f(x) ∈ {f1(x), . . . , fm(x)} for all x ∈ X, then f is a continuous DC function.

Proof. Pick x, v ∈ X with ‖v‖ = 1 and δ > 0. Lemma 7.96 ensures the existence of x1, x2 ∈ B(x; δ), 0 < λ < 1, and r, s ∈ {1, . . . , m} with

(a) x = λx1 + (1 − λ)x2,
(b) x1 − x2 = ‖x1 − x2‖v,
(c) f(x1) = fr(x1), f(x2) = fs(x2), and f(x) = fr(x) = fs(x).
Take any i, j ∈ {1, . . . , m}, and then let ϕi and ϕj be control functions for fi and fj, respectively; see Definition 7.92. We obviously have that both functions fi − fj + ϕi + ϕj and fj − fi + ϕi + ϕj are convex, and so fi − fj is a continuous DC function controlled by ϕi + ϕj. It follows from Proposition 7.93 that |fi − fj| is a continuous DC function controlled by |fi − fj| + 2(ϕi + ϕj), which is convex and continuous together with ϕi + ϕj + ½|fi − fj|. Define further the convex and continuous functions

g(x) := Σ_{i,j=1}^{m} (ϕi(x) + ϕj(x) + ½|fi(x) − fj(x)|),

h(x) := ½(ϕr(x) + ϕs(x)) + Σ_{(i,j)≠(r,s)} (ϕi(x) + ϕj(x) + ½|fi(x) − fj(x)|)
for all x ∈ X. Using the above property (c), for any α, β ∈ R we have

|λf(x1) + (1 − λ)f(x2) − f(x)| = |λfr(x1) + (1 − λ)fs(x2) − f(x)|
= |λfr(x1) + (1 − λ)fs(x2) + ½α − ½α + ½β − ½β − f(x)|
≤ ½|λfr(x1) + α − fr(x)| + ½|(1 − λ)fs(x2) + β − fs(x)| + ½|λfr(x1) − β| + ½|(1 − λ)fs(x2) − α|.

Specify now the numbers α, β as α := (1 − λ)fr(x2) and β := λfs(x1), and then deduce from Proposition 7.91 that

½|λfr(x1) + α − fr(x)| = ½|λfr(x1) + (1 − λ)fr(x2) − fr(x)| ≤ ½(λϕr(x1) + (1 − λ)ϕr(x2) − ϕr(x)).

Similarly we get the relationships

½|(1 − λ)fs(x2) + β − fs(x)| = ½|λfs(x1) + (1 − λ)fs(x2) − fs(x)| ≤ ½(λϕs(x1) + (1 − λ)ϕs(x2) − ϕs(x)).

Since |fr(x) − fs(x)| = 0, it follows that

½|λfr(x1) − β| + ½|(1 − λ)fs(x2) − α| = ½λ|fr(x1) − fs(x1)| + ½(1 − λ)|fr(x2) − fs(x2)| + ½|fr(x) − fs(x)|.

Observe from the definitions of g and h that we have

g(u) − h(u) = ½(ϕr(u) + ϕs(u) + |fr(u) − fs(u)|), u ∈ X.

Combining the above estimates tells us that

|λf(x1) + (1 − λ)f(x2) − f(x)|
≤ ½(λϕr(x1) + (1 − λ)ϕr(x2) − ϕr(x)) + ½(λϕs(x1) + (1 − λ)ϕs(x2) − ϕs(x)) + ½λ|fr(x1) − fs(x1)| + ½(1 − λ)|fr(x2) − fs(x2)| + ½|fr(x) − fs(x)|
= λ · ½(ϕr + ϕs + |fr − fs|)(x1) + (1 − λ) · ½(ϕr + ϕs + |fr − fs|)(x2) − ½(ϕr + ϕs + |fr − fs|)(x)
= (λg(x1) + (1 − λ)g(x2) − g(x)) − (λh(x1) + (1 − λ)h(x2) − h(x))
≤ λg(x1) + (1 − λ)g(x2) − g(x),

where the first equality holds since |fr(x) − fs(x)| = 0, and where the last inequality is due to

λh(x1) + (1 − λ)h(x2) − h(x) ≥ 0

by the convexity of h. Since the function g does not depend on the choice of r and s, it follows from Lemma 7.95 (applied with ϕ := g) that f is a continuous DC function, and we are done with the proof of the theorem. □
Using Theorem 7.97, we can easily prove that the maximum and the mini-
mum of a finite number of continuous DC functions are DC functions. Another
useful consequence is obtained in the corollary below.

Corollary 7.98 Let X be a normed space, and let f : X → R be a continuous


function. Then f is a continuous DC function if and only if |f | is.

Proof. It suffices to check the converse implication. Observe that if |f | is a


continuous DC function, then −|f | is also a continuous DC function. It is
obvious that f (x) ∈ {|f |(x), (−|f |)(x)} for all x ∈ X. Thus Theorem 7.97
readily implies that f is a continuous DC function on X. □

7.8.3 Locally DC Functions

This subsection concerns local counterparts of convex and DC functions which, although clearly related to their global versions, are of independent interest.
Definition 7.99 Given a nonempty convex set Ω ⊂ X, we say that a function
f : Ω → R is locally DC if for any x0 ∈ Ω there exist a convex neighborhood
V of x0 and two convex functions g, h : V ∩ Ω → R such that
f (x) = g(x) − h(x) for all x ∈ Ω ∩ V.

The example presented below shows that any C²-smooth function on an open interval of the real line is locally DC.

Example 7.100 Let f : Ω → R be a C²-smooth function on an open interval Ω ⊂ R. For any x0 ∈ Ω we can choose δ > 0 such that I := [x0 − δ, x0 + δ] ⊂ Ω. Denote M := sup_{x∈I} |f″(x)| < ∞ and observe that for each ρ ≥ M, we have the representation

f(x) = ½((ρ/2)x² + f(x)) − ½((ρ/2)x² − f(x)), x ∈ I.

It is easy to see that the functions

g(x) := (ρ/2)x² + f(x) and h(x) := (ρ/2)x² − f(x), x ∈ I,

are convex on I, and thus f is a locally DC function on Ω.
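The construction of Example 7.100 can be traced numerically. A minimal sketch, assuming the illustrative choice f(x) = x³ (which is neither convex nor concave around 0), computes ρ = sup_I |f″| on I = [−1, 1] and checks the convexity of the two pieces.

```python
import numpy as np

# Example 7.100 for the assumed choice f(x) = x^3 around x0 = 0 with delta = 1.
# On I = [-1, 1] we take rho = sup_I |f''| and check that g = rho/2 x^2 + f and
# h = rho/2 x^2 - f are convex on I, so that f = (g - h)/2 is DC there.

x0, delta = 0.0, 1.0
I = np.linspace(x0 - delta, x0 + delta, 401)
rho = np.max(np.abs(6 * I))                     # sup of |f''(x)| = |6x| on I

g = lambda x: rho / 2 * x**2 + x**3
h = lambda x: rho / 2 * x**2 - x**3

lam = 0.3
for phi in (g, h):
    gap = lam*phi(I[:-1]) + (1-lam)*phi(I[1:]) - phi(lam*I[:-1] + (1-lam)*I[1:])
    assert gap.min() >= -1e-9                   # convexity on the interval
assert np.allclose((g(I) - h(I)) / 2, I**3)     # f = (g - h)/2 on I
print("x^3 is locally DC around 0 with rho = 6")
```

Here g″ = ρ + 6x and h″ = ρ − 6x stay nonnegative on I exactly because ρ dominates sup_I |f″|, matching the role of M in the example.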

The next lemma, using nonconvex subdifferentiation, is the key to establishing relationships between convexity and local convexity of functions.

Lemma 7.101 Assume that f : R → R is convex on the intervals (−∞, 0],


[0, ∞), and (−δ, δ) for some δ > 0. Then this function is convex on R.

Proof. We know that a convex function defined on an open set is locally Lipschitzian therein. Furthermore, Theorem 7.46 tells us that the convexity of a locally Lipschitzian function is equivalent to the monotonicity of the associated generalized gradient mapping. We easily get (see Exercise 7.157) that

∂̄f(x) = [f′₋(x), f′₊(x)] for all x ∈ R.    (7.111)

Thus verifying the claimed convexity of f on R reduces to showing that

f′₊(x1) ≤ f′₋(x2) whenever x1 < x2.    (7.112)

To prove (7.112), observe that for any convex function f : I → R defined on an open interval I ⊂ R with x1, x2 ∈ I and x1 < x2, we have

(f(x) − f(x1))/(x − x1) ≤ (f(x2) − f(x1))/(x2 − x1) ≤ (f(x) − f(x2))/(x − x2) whenever x1 < x < x2.

Letting x ↓ x1 and then x ↑ x2 gives us (7.112).
To proceed with the proof of the lemma, fix any x1 , x2 ∈ R with x1 < x2 . If
either x1 ≥ 0 or x2 ≤ 0, we immediately arrive at the conclusion by using the
intervals I = [0, ∞) and I = (−∞, 0], respectively. Consider the case where
x1 < 0 < x2 and choose u1 , u2 ∈ (−δ, δ) so that
x1 < u1 < 0 < u2 < δ.
Then applying the established result (7.112) on the intervals (−∞, 0], (−δ, δ),
and [0, ∞) brings us to the relationships

f′₊(x1) ≤ f′₋(u1) ≤ f′₊(u1) ≤ f′₋(u2) ≤ f′₊(u2) ≤ f′₋(x2).

This justifies the monotonicity of ∂̄f and hence the claimed convexity of f, which therefore completes the proof of the lemma. □

Now we are ready to show that the convexity of a function f around each
point of a convex set in a normed space is equivalent to the convexity of this
function on the entire set.

Theorem 7.102 Let Ω be a nonempty convex subset of a normed space X,


and let f : Ω → R be such that for any x ∈ Ω there exists a convex neighbor-
hood V of x such that f is convex on Ω ∩ V . Then f is convex on Ω.

Proof. For simplicity we confine ourselves to the case where Ω = X. Consider


first the one-dimensional setting and verify that the convexity of f : R → R
around each fixed x ∈ R yields its convexity on R. To this end, define the set

I := {x ≥ 0 | f is convex on [0, x)}
and show that I is unbounded. Observing that I is nonempty by the local
convexity of f , suppose on the contrary that I is bounded. Then α := sup I
is a real number, and so f is convex on [0, α). The local convexity of f gives
us δ > 0 such that f is convex on (α − δ, α + δ). It follows that f is convex on
[0, α + δ), and hence it is convex on [0, α + δ] by its continuity. The obtained
contradiction justifies the unboundedness of I, which verifies therefore that f
is convex on [0, ∞). It can be similarly shown that f is convex on (−∞, 0].
Furthermore, this allows us to find δ > 0 such that f is convex on (−δ, δ).
Thus Lemma 7.101 ensures the convexity of f on R.
To verify finally the convexity of f : X → R in the general case of normed
space, fix any vector x ∈ X and any direction v ∈ X. Define ϕ(t) := f (x + tv)
for t ∈ R, which is locally convex on R by the imposed assumption. Thus it is
convex on R as proved above. But the latter clearly implies that f is convex
on X, which completes the proof of the theorem. 
Let us discuss some possible applications of Theorem 7.102, which we repeatedly use in what follows.
Remark 7.103 Theorem 7.102 provides a convenient way to check the con-
vexity of a given function. For instance, if f : X → R is convex on a nonempty
convex set Ω ⊂ X and if for any x ∈ bd(Ω) ∪ Ω c there exists a convex neigh-
borhood V of x on which f is convex, then Theorem 7.102 tells us that f is
convex on the entire space X. Furthermore, assume that Ω ⊂ V1 ∪ . . . ∪ Vm , where every Vi is a nonempty open convex set on which f is convex. Take any x ∈ Ω and get that x ∈ Vi for some i ∈ {1, . . . , m}. Since f is convex on Vi , it is convex on Vi ∩ Ω, and thus f is convex on the set Ω by Theorem 7.102.
528 7 CONVEXIFIED LIPSCHITZIAN ANALYSIS
In the rest of this subsection, our attention is paid to locally DC functions defined on finite-dimensional spaces. Before establishing the main result in this direction, which says that such functions are always continuous DC, we present several lemmas that are certainly of independent interest. The first lemma verifies the possibility of extending a convex function defined on a neighborhood of some point to a function that is convex on the entire space.
Lemma 7.104 Let f : U → R be a convex function defined on a convex neigh-
borhood U of some point x0 ∈ Rn . Then there exist a neighborhood V ⊂ U of
x0 and a convex function ϕ : Rn → R such that ϕ(x) = f (x) for all x ∈ V .
Proof. Choose r > 0 such that W := B(x0 ; r) ⊂ U . Then f is bounded on
B(x0 ; r). Let g(x) := α‖x − x0‖ + f (x0 ) − 1, where α is so large that g(x) > f (x) + 1 > f (x) whenever ‖x − x0‖ = r. We get g(x0 ) = f (x0 ) − 1 < f (x0 ), and thus the continuity of f and g gives us a neighborhood V ⊂ W ⊂ U of x0
with g(x) ≤ f (x) for all x ∈ V . Define ϕ : Rn → R by
ϕ(x) := max{f (x), g(x)} if x ∈ W, and ϕ(x) := g(x) otherwise.
Observe that ϕ is convex on W and that max{f (x), g(x)} = g(x) whenever x
is near bd(W ), which implies by Theorem 7.102 that ϕ is convex on Rn . By
the construction we have ϕ(x) = max{f (x), g(x)} = f (x) for all x ∈ V , which
therefore completes the proof of the lemma. 
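A quick numerical sanity check of this construction may help. The following Python sketch is purely illustrative (the function f(x) = eˣ, the radius r, and the slope α are our own choices, not part of the lemma): it builds ϕ exactly as in the proof and verifies on a grid that ϕ is convex on all of R and agrees with f near x0 = 0.

```python
import math

# Illustrative 1D instance of Lemma 7.104 (all constants are our choices):
# extend f(x) = exp(x), convex on U = (-1, 1), from W = B(0; 1/2) to R.
x0, r = 0.0, 0.5
f = math.exp
alpha = 6.0                                  # large enough: alpha*r > f(r) + 1
g = lambda x: alpha * abs(x - x0) + f(x0) - 1.0

def phi(x):
    # max{f, g} on W, and g outside W, as in the proof of the lemma
    return max(f(x), g(x)) if abs(x - x0) <= r else g(x)

# phi agrees with f on a small neighborhood V of x0 ...
assert all(math.isclose(phi(x), f(x)) for x in [k / 100 for k in range(-10, 11)])

# ... and phi is convex on R: nonnegative second differences on a grid
grid = [k / 200 for k in range(-400, 401)]
h = grid[1] - grid[0]
assert all(phi(x - h) + phi(x + h) - 2 * phi(x) >= -1e-9 for x in grid[1:-1])
print("phi is a convex extension of f near 0")
```

The inequality g ≤ f near x0 holds here because g(x0) = f(x0) − 1 < f(x0), exactly as in the proof.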
The next lemma justifies the possibility to study locally DC functions by
using the convexity of functions on compact convex sets.
Lemma 7.105 Let f : Rn → R be a locally DC function, and let K be a
nonempty, convex, and compact subset of Rn . Then there exists a convex func-
tion h : Rn → R such that the sum f + h is convex on K.
Proof. Fix any x ∈ Rn and, by using Definition 7.99 and Lemma 7.104, choose
a convex neighborhood Ux of x together with convex functions gx , hx : Rn → R
such that f (z) = gx (z) − hx (z) for all z ∈ Ux . It follows that the function
f + hx is convex on Ux . Since K is compact, we have
K ⊂ Ux1 ∪ . . . ∪ Uxm for some x1 , . . . , xm ∈ Rn .
Define g(x) := f (x) + hx1 (x) + . . . + hxm (x) for x ∈ Rn and observe that

g(x) = (f + hxj )(x) + Σi≠j hxi (x), x ∈ Uxj ,
is the sum of two convex functions for all j = 1, . . . , m. Thus g is convex on K by Theorem 7.102; see the discussions in Remark 7.103. Denoting finally h(x) := hx1 (x) + . . . + hxm (x) for x ∈ Rn , we conclude that this function is convex on Rn , and hence f = g − h is a DC function on K. □

The last lemma here collects technical results, which play a crucial role in
the proof of the main theorem given below.
Lemma 7.106 Let f : Rn → R be a locally DC function. For each k ∈ N
define the sets Ck := B(0; k) ⊂ Rn and Dk := B(0; k + 1/2) ⊂ Rn , and
let hk : Rn → R be convex functions such that f + hk is convex on Dk ; these
functions exist by Lemma 7.105. Then there exists a convex function ϕ1 : Rn →
R satisfying the following conditions:
(a) ϕ1 (x) = h1 (x) for all x ∈ C1 and
(b) f + ϕ1 is convex on D2 .

Proof. Since C1 is compact, we can find a constant γ > 0 such that

h2 (x) − γ < h1 (x) for all x ∈ C1 .
Given a constant α > 0, consider the function

ψ(x) := 0 if ‖x‖ ≤ 1, and ψ(x) := α(‖x‖ − 1) otherwise.
Then ψ : Rn → R is a convex function due to the representation ψ(x) = max{0, α(‖x‖ − 1)}, and we have ψ(x) = 0 for all x ∈ C1 . It follows that

h2 (x) − γ + ψ(x) ≤ h1 (x) for all x ∈ C1 . (7.113)
Since ‖x‖ − 1 > 0 whenever x ∉ C1 , we can choose a sufficiently large positive number α in the definition of ψ such that

h2 (x) − γ + ψ(x) > h1 (x) for all x ∈ bd(D1 ). (7.114)
Having all of this, we define now the function

ϕ1 (x) := max{h1 (x), h2 (x) − γ + ψ(x)} if ‖x‖ < 3/2, and ϕ1 (x) := h2 (x) − γ + ψ(x) otherwise,
and deduce that ϕ1 (x) = h1 (x) for all x ∈ C1 by the definition of ϕ1 and
(7.113). In addition, we get from (7.114) that
ϕ1 (x) = h2 (x) − γ + ψ(x) whenever x is near bd(D1 ). (7.115)
It also follows from the definition of ϕ1 that this function is convex on D1 .
Furthermore, ϕ1 is convex around any point in Rn \ D1 , and thus it is convex
on the entire space Rn by Theorem 7.102. This verifies (a).
It remains to show that (b) is satisfied, i.e., the function f + ϕ1 is convex
on D2 . Observe to this end that f + h1 is convex on D1 , and that f + h2 is
convex on D2 . It follows that the function f + h2 − γ + ψ is also convex on
D2 . Since D1 ⊂ D2 and
f (x) + ϕ1 (x) = max{f (x) + h1 (x), f (x) + h2 (x) − γ + ψ(x)} for all x ∈ D1 ,
we see that f + ϕ1 is convex on D1 . Then deduce from the definition of ϕ1
and (7.115) that f + ϕ1 is convex around every point of D2 \ D1 . Hence f + ϕ1
is convex on D2 by Theorem 7.102, which therefore completes the proof. 
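A minimal one-dimensional sanity check of this gluing construction can be carried out numerically. In the Python sketch below, all data are our own illustrative choices: h1(x) = x², h2(x) = 2x², and the locally DC function f(x) = −x², for which f + h1 is convex on D1 and f + h2 is convex on D2; the constants γ and α are picked to satisfy (7.113) and (7.114).

```python
# Hypothetical 1D check of the gluing step in Lemma 7.106 with
# C1 = [-1, 1], D1 = [-3/2, 3/2], D2 = [-5/2, 5/2].
gamma, alpha = 1.5, 1.0              # gamma > h2 - h1 on C1; alpha as in (7.114)
h1 = lambda x: x * x
h2 = lambda x: 2 * x * x
f  = lambda x: -x * x                # f + h1 = 0 and f + h2 = x^2 are convex
psi = lambda x: max(0.0, alpha * (abs(x) - 1.0))   # the convex corrector

def phi1(x):
    glued = h2(x) - gamma + psi(x)
    return max(h1(x), glued) if abs(x) < 1.5 else glued

# (a) phi1 agrees with h1 on C1 = [-1, 1]
assert all(phi1(k / 100) == h1(k / 100) for k in range(-100, 101))

# (b) f + phi1 is convex on D2: nonnegative second differences on a grid
grid = [k / 200 for k in range(-500, 501)]
h = grid[1] - grid[0]
s = lambda x: f(x) + phi1(x)
assert all(s(x - h) + s(x + h) - 2 * s(x) >= -1e-9 for x in grid[1:-1])
print("phi1 satisfies (a) and (b)")
```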

Having in hand the above results, we come to the main theorem.

Theorem 7.107 Let f : Rn → R be a locally DC function on the entire space Rn . Then it is a continuous DC function on Rn .

Proof. By the definitions of the sets Ck and Dk in Lemma 7.106, we get

C1 ∪ C2 ∪ . . . = D1 ∪ D2 ∪ . . . = Rn and C1 ⊂ D1 ⊂ C2 ⊂ D2 ⊂ . . . .
Employing the induction procedure in the framework of that lemma allows us to construct a sequence of convex functions ϕk : Rn → R such that ϕk = ϕk−1
on Ck and the function f +ϕk is convex on Dk+1 for each k. Indeed, by Lemma
7.106 we already have ϕ1 with the claimed properties. In particular, the sum
f + ϕ1 is convex on D2 . Following the device in Lemma 7.106, we construct a
convex function ϕ2 : D3 → R such that f +ϕ2 is convex on D3 with ϕ2 = ϕ1 on
C2 . Continuing this process gives us the desired sequence {ϕk (x)}. It follows
furthermore from the above procedure that the sequence {ϕk (x)} converges
pointwise to some function h : Rn → R with the convergence being uniform
on compact subsets of Rn . Thus h : Rn → R is a convex function. Since the
sum f + h is convex on Ck for all k ∈ N, it is convex on the entire space Rn .
This proves that f is a DC function. The continuity of the functions in its
representation follows directly from the above constructions. 

The obtained major theorem has many consequences. In what follows we present several results in this direction. The first assertion verifies the continuous DC property of arbitrary C2-smooth functions on Rn .

Corollary 7.108 Let f : Rn → R be a C2-smooth function on the entire space Rn . Then it is a continuous DC function on Rn .

Proof. By Theorem 7.107, it suffices to show that f is locally DC on Rn . To proceed, fix any x0 ∈ Rn and any r > 0, and then show that f is a DC function on B(x0 ; r). It is obvious that

M := sup{ ‖∇²f (x)‖ : x ∈ B(x0 ; r) } < ∞,

where ∇2 f (x) stands for some matrix norm of ∇2 f (x) ∈ Rn×n (e.g., for
the Frobenius norm). Whenever ρ ≥ M , we have the obvious representation
1  ρ  ρ 
f (x) = x2 + f (x) − x2 − f (x) , x ∈ Rn .
2 2 2

To complete the proof, it remains to show that the continuous functions

g(x) := (ρ/2)‖x‖² + f (x) and h(x) := (ρ/2)‖x‖² − f (x)

are convex on B(x0 ; r). First check that for any x ∈ B(x0 ; r), we have

∇²g(x) = ρIn + ∇²f (x).

Indeed, picking arbitrary x ∈ B(x0 ; r) and v ∈ Rn clearly yields

|⟨v, ∇²f (x)v⟩| ≤ ‖∇²f (x)‖ · ‖v‖² ≤ ρ‖v‖²,

which implies in turn that

⟨v, ∇²g(x)v⟩ = ρ‖v‖² + ⟨v, ∇²f (x)v⟩ ≥ 0

and thus ensures the convexity of g on B(x0 ; r). The convexity of h on this set is verified similarly, and we are done. □
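The decomposition used in this proof is directly computable. The Python sketch below applies it to the C²-smooth function f(x) = sin(x) on R, where sup|f''| = 1, so ρ = 1 works; convexity of the two parts is verified via second differences on a grid. The example and all numerical tolerances are our own choices.

```python
import math

# Numerical sketch of Corollary 7.108 for f(x) = sin(x), where sup|f''| = 1,
# so any rho >= 1 works; we take rho = 1.
rho = 1.0
f = math.sin
g = lambda x: (rho / 2) * x * x + f(x)    # convex: g'' = rho - sin(x) >= 0
h = lambda x: (rho / 2) * x * x - f(x)    # convex: h'' = rho + sin(x) >= 0

grid = [k / 100 for k in range(-600, 601)]
step = grid[1] - grid[0]

# f = (1/2) g - (1/2) h is a DC decomposition with convex parts
assert all(math.isclose(f(x), 0.5 * g(x) - 0.5 * h(x)) for x in grid)
for c in (g, h):
    assert all(c(x - step) + c(x + step) - 2 * c(x) >= -1e-12 for x in grid[1:-1])
print("sin = (1/2)g - (1/2)h with g, h convex")
```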
Finally in this subsection, we present two results on the preservation of the continuous DC property of functions under various operations. Taking function compositions as in the next proposition provides a common framework for numerous operations of this type.
Proposition 7.109 Let f : Rn → I and g : I → R be two continuous DC functions, where I is an open interval in R. Then the composition g ◦ f is also a continuous DC function on Rn .

Proof. Using Theorem 7.107, it suffices to show that the composition g ◦ f is a locally DC function. First consider the case where g is a convex function. Fix any x0 ∈ Rn and let y0 := f (x0 ) ∈ I. Pick δ > 0 such that J := [y0 − δ, y0 + δ] ⊂ I and choose a neighborhood V of x0 with f (V ) ⊂ J. It is not hard to check (see Exercise 7.159) that g can be represented in the form

g(y) = sup{ ϕt (y) : t ∈ T } for all y ∈ J, (7.116)

where each ϕt : J → R is an affine function of the type ϕt (y) = at y + bt with M := supt∈T |at | < ∞ and supt∈T bt < ∞. Since f is a continuous DC function, there exist continuous convex functions f1 and f2 such that f = f1 − f2 on Rn . By (7.116) we have

g(f (x)) = sup{ at (f1 (x) − f2 (x)) + bt : t ∈ T }
 = sup{ at f1 (x) − at f2 (x) + bt : t ∈ T }
 = sup{ (M + at )f1 (x) + (M − at )f2 (x) + bt : t ∈ T } − M (f1 (x) + f2 (x))
for all x ∈ V . Observe that the functions

ϕ1 (x) := sup{ (M + at )f1 (x) + (M − at )f2 (x) + bt : t ∈ T } and ϕ2 (x) := M (f1 (x) + f2 (x))
are both convex on V . This verifies that g ◦ f is a locally DC function, and therefore it is a continuous DC function on Rn by Theorem 7.107.
Consider finally the general case where g is a continuous DC function on
I, i.e., it is represented as g = g1 − g2 via some continuous convex functions
g1 , g2 : I → R. Then g1 ◦ f and g2 ◦ f are both continuous DC functions as
proved above, and thus g ◦ f is a continuous DC function on Rn . 

The last assertion of this subsection presents a particular operation over continuous DC functions for which the preservation result follows from Proposition 7.109. The reader can derive other operations of this type.

Corollary 7.110 Let f1 , f2 : Rn → R be two continuous DC functions on Rn such that f2 (x) ≠ 0 for all x ∈ Rn . Then their quotient f1 /f2 is also a continuous DC function on Rn .

Proof. Suppose without loss of generality that f2 (x) > 0 for all x ∈ Rn . By
Proposition 7.89, it suffices to show that 1/f2 is a continuous DC function.
Since f2 : Rn → I := (0, ∞) is a continuous DC function as well as g : I → R
with g(y) := 1/y for y ∈ I, it follows from Proposition 7.109 that 1/f2 is also
a continuous DC function on Rn . 
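As a concrete illustration, consider f1(x) = |x| and f2(x) = 1 + x² on R. The corollary guarantees that q = f1/f2 is continuous DC; the Python sketch below exhibits one explicit decomposition q = G − H (found by hand for this particular instance, not produced by the proof above) and checks the convexity of G numerically.

```python
# Hypothetical instance of Corollary 7.110: f1(x) = |x| and f2(x) = 1 + x^2
# are continuous DC on R and f2 > 0, so q = f1/f2 is continuous DC.
# One explicit decomposition (our choice): q = G - H with
# G(x) = q(x) + x^2 and H(x) = x^2; convexity of G is checked on a grid.
q = lambda x: abs(x) / (1 + x * x)
G = lambda x: q(x) + x * x
H = lambda x: x * x

grid = [k / 100 for k in range(-500, 501)]
step = grid[1] - grid[0]
assert all(abs(q(x) - (G(x) - H(x))) < 1e-12 for x in grid)
assert all(G(x - step) + G(x + step) - 2 * G(x) >= -1e-9 for x in grid[1:-1])
print("q = G - H is a DC decomposition of f1/f2")
```

The choice of the quadratic corrector works here because the second derivative of q stays bounded below by roughly −1.5 away from the (convex) kink at the origin.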

7.8.4 Subgradients and Conjugates of DC Functions

The concluding subsection of this section deals with establishing basic subdifferential and conjugate relationships for continuous DC functions. Then we apply the obtained relationships to the study of local and global minima of continuous DC functions, where ε-subdifferentials of convex functions are also used for characterizations of global minimizers.
We begin with evaluations of the three major subdifferentials for continu-
ous DC functions in appropriate space frameworks.

Theorem 7.111 Let X be a normed space, and let f : X → R be a continuous DC function with a DC decomposition f = g − h via some continuous convex functions g, h : X → R. For any x̄ ∈ X we have the following:
(a) The regular subdifferential of f at x̄ is estimated by

∂̂f (x̄) ⊂ ⋂{ ∂g(x̄) − x∗ : x∗ ∈ ∂h(x̄) } ⊂ ∂g(x̄) − ∂h(x̄). (7.117)

(b) If in addition X is an Asplund space, then the limiting subdifferential of f at x̄ is estimated via the convex subdifferentials of g and h by

∂f (x̄) ⊂ ∂g(x̄) − ∂h(x̄). (7.118)

(c) In the setting of assertion (b) we have the same upper estimate for the generalized gradient of f at x̄:

∂̄f (x̄) ⊂ ∂g(x̄) − ∂h(x̄). (7.119)

Proof. To verify (a), fix any z∗ ∈ ∂̂f (x̄) and x∗ ∈ ∂h(x̄). Using definition (7.79), for every ε > 0 find δ > 0 such that

⟨z∗ , x − x̄⟩ ≤ f (x) − f (x̄) + (ε/2)‖x − x̄‖,
⟨x∗ , x − x̄⟩ ≤ h(x) − h(x̄) + (ε/2)‖x − x̄‖

whenever ‖x − x̄‖ < δ. This gives us right away that

⟨z∗ + x∗ , x − x̄⟩ ≤ (f + h)(x) − (f + h)(x̄) + ε‖x − x̄‖ when ‖x − x̄‖ < δ.

Since f + h = g, this means that z∗ + x∗ ∈ ∂̂g(x̄) = ∂g(x̄), and so we get z∗ ∈ ∂g(x̄) − x∗ , which verifies the first inclusion in (7.117). Since the second inclusion in (7.117) is obvious, assertion (a) is fully justified in general normed spaces.
Assume now that X is Asplund and verify (b). Pick any x∗ ∈ ∂f (x̄) and by definition (7.80) of the limiting subdifferential find sequences xk → x̄ and x∗k → x∗ weak∗ as k → ∞ such that x∗k ∈ ∂̂f (xk ) for all k ∈ N. Fix any k ∈ N and apply the second inclusion in (7.117) to x∗k . This gives us subgradients u∗k ∈ ∂g(xk ) and vk∗ ∈ ∂h(xk ) such that

x∗k = u∗k − vk∗ for all k ∈ N. (7.120)


Since both g and h are locally Lipschitzian around x̄, the subgradient sequences {u∗k } and {vk∗ } are bounded. Remembering that bounded sets in topological duals of Asplund spaces are weak∗ sequentially compact, we suppose without loss of generality that these sequences weak∗ converge to some u∗ and v∗ , respectively. Invoking again definition (7.80) shows that u∗ ∈ ∂g(x̄) and v∗ ∈ ∂h(x̄). Passing finally to the limit in (7.120) as k → ∞ verifies (7.118) and finishes the proof of (b).
It remains to prove inclusion (7.119) for generalized gradients of (Lipschitz) continuous DC functions. To proceed, we recall representation (7.88) of the generalized gradient ∂̄f (x̄) via the weak∗ closed convex hull of the limiting subdifferential in Asplund spaces. Applying this operation to the sets on both sides of (7.118) and taking into account that the sets ∂g(x̄) and ∂h(x̄) are convex, weak∗ closed, and bounded in X∗ , we readily arrive at (7.119) and thus complete the proof of the theorem. □
The next theorem presents a precise formula for calculating Fenchel con-
jugates of continuous DC functions defined on normed spaces.
Theorem 7.112 Let X be a normed space, and let f : X → R be a continuous
DC function represented by f (x) = g(x)−h(x) for all x ∈ X, where g, h : X →
R are continuous convex functions. Then the conjugate of f is calculated by
(g − h)∗ (x∗ ) = sup{ g∗ (x∗ + z∗ ) − h∗ (z∗ ) : z∗ ∈ dom(h∗ ) }, x∗ ∈ X∗ . (7.121)

Proof. Taking any x∗ ∈ X∗ and z∗ ∈ dom(h∗ ), we have

(g − h)∗ (x∗ ) = sup{ ⟨x∗ , x⟩ − (g − h)(x) : x ∈ X }
 = sup{ ⟨x∗ , x⟩ − g(x) + h(x) : x ∈ dom(g) }
 ≥ ⟨x∗ + z∗ , x⟩ − g(x) + h(x) − ⟨z∗ , x⟩

for all x ∈ dom(g). It follows therefore that

(g − h)∗ (x∗ ) + ⟨z∗ , x⟩ − h(x) ≥ ⟨x∗ + z∗ , x⟩ − g(x) when x ∈ dom(g).

Since ⟨z∗ , x⟩ − h(x) ≤ h∗ (z∗ ), we get the estimate

(g − h)∗ (x∗ ) + h∗ (z∗ ) ≥ ⟨x∗ + z∗ , x⟩ − g(x) for any x ∈ dom(g).

Taking the supremum above with respect to x ∈ dom(g) gives us

(g − h)∗ (x∗ ) + h∗ (z∗ ) ≥ g∗ (x∗ + z∗ )

and verifies therefore the "≥" inequality in (7.121):

(g − h)∗ (x∗ ) ≥ sup{ g∗ (x∗ + z∗ ) − h∗ (z∗ ) : z∗ ∈ dom(h∗ ) }.

To prove the opposite inequality in (7.121), it suffices to consider the case where γ := sup{ g∗ (x∗ + z∗ ) − h∗ (z∗ ) : z∗ ∈ dom(h∗ ) } ∈ R. By the definition of γ we have

g∗ (x∗ + z∗ ) − h∗ (z∗ ) ≤ γ for all z∗ ∈ dom(h∗ ).

It follows therefore that

g∗ (x∗ + z∗ ) ≤ h∗ (z∗ ) + γ whenever z∗ ∈ X∗ .

Since h∗∗ = h by the biconjugate theorem in our setting, applying the Fenchel conjugate operation to both sides above gives us

g∗∗ (z) − ⟨x∗ , z⟩ ≥ h∗∗ (z) − γ = h(z) − γ for all z ∈ X.

Using the estimate g∗∗ (z) ≤ g(z) on X, we have

g(z) − ⟨x∗ , z⟩ ≥ h(z) − γ whenever z ∈ X,

which readily implies that γ ≥ ⟨x∗ , z⟩ − (g − h)(z) for all such z. Taking finally the supremum with respect to z ∈ X verifies that γ ≥ (g − h)∗ (x∗ ) and thus completes the proof of the theorem. □
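Formula (7.121) can be checked numerically in dimension one. In the Python sketch below we take g(x) = x² and h(x) = |x| as illustrative choices, for which g∗(s) = s²/4 and h∗ is the indicator function of [−1, 1]; both sides of (7.121) are approximated by brute-force suprema over grids.

```python
# Numerical check of the conjugate formula (7.121) in dimension one, with
# our choices g(x) = x^2 and h(x) = |x|.  Then g*(s) = s^2/4 and
# h* is the indicator of [-1, 1], so dom(h*) = [-1, 1].
g = lambda x: x * x
h = lambda x: abs(x)
xs = [k / 100 for k in range(-1000, 1001)]        # grid for the sup over x
zs = [k / 100 for k in range(-100, 101)]          # grid for dom(h*)

def conj_f(s):                                    # (g - h)*(s) by brute force
    return max(s * x - (g(x) - h(x)) for x in xs)

def rhs(s):                                       # sup over dom(h*) in (7.121)
    return max((s + z) ** 2 / 4.0 - 0.0 for z in zs)

for s in (-2.0, -0.5, 0.0, 1.0, 3.0):
    assert abs(conj_f(s) - rhs(s)) < 1e-2
print("(g - h)* matches the supremum formula (7.121)")
```

With x∗ = 0 the same machinery also illustrates the duality relationship of Theorem 7.113 below.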

The following nonconvex duality theorem can be viewed as a particular case of Theorem 7.112 corresponding to x∗ = 0 therein. However, this is not the case in full generality, since the functions g and h in this result are more general than those considered in Theorem 7.112. Nevertheless, the reader can check that the given proof of Theorem 7.112 works in the new setting under small modifications, which we leave as an exercise; see Exercise 7.162.

Theorem 7.113 Let X be a normed space, and let g : X → R and h : X → R be l.s.c. convex functions. Then we have the duality relationship

inf{ g(x) − h(x) : x ∈ X } = inf{ h∗ (z∗ ) − g∗ (z∗ ) : z∗ ∈ dom(h∗ ) }.

Consequently, if f : X → R is a DC function with the decomposition f = g − h via some l.s.c. convex functions g, h : X → R, then

inf{ g(x) − h(x) : x ∈ X } = inf{ h∗ (z∗ ) − g∗ (z∗ ) : z∗ ∈ dom(h∗ ) },
sup{ g(x) − h(x) : x ∈ X } = sup{ h∗ (z∗ ) − g∗ (z∗ ) : z∗ ∈ dom(g∗ ) }.

The last theorem here provides applications of the obtained results on subdifferentiation and conjugacy of continuous DC functions to deriving necessary conditions for local minimizers of such functions, as well as to establishing complete characterizations of their global minimizers. In the latter case, ε-subdifferentials of convex functions are significantly involved. As above, all the functions are defined on normed spaces.

Theorem 7.114 Let X be a normed space, and let f : X → R be a continuous DC function with a DC decomposition f = g − h, where g, h : X → R are continuous convex functions. The following assertions hold:
(a) If x̄ is a local minimizer of f (x) = g(x) − h(x) on X, then we have

∂h(x̄) ⊂ ∂g(x̄).

(b) x̄ is a global minimizer of f (x) = g(x) − h(x) on X if and only if

∂ε h(x̄) ⊂ ∂ε g(x̄) for all ε ≥ 0. (7.122)

Proof. The verification of (a) is easy. Indeed, we know that if x̄ is a local minimizer of g − h on X, then 0 ∈ ∂̂(g − h)(x̄). Thus the result follows from (7.117) of Theorem 7.111, which holds in any normed space.
To verify (b), suppose first that x̄ is a global minimizer of f = g − h on X. This tells us that

f (x̄) = g(x̄) − h(x̄) ≤ g(x) − h(x) for all x ∈ X

and obviously yields the inequality

h(x) − h(x̄) + ε ≤ g(x) − g(x̄) + ε for all x ∈ X and ε ≥ 0.

Taking now any ε-subgradient x∗ ∈ ∂ε h(x̄) implies by definition that

⟨x∗ , x − x̄⟩ ≤ h(x) − h(x̄) + ε whenever x ∈ X.

It follows therefore that

⟨x∗ , x − x̄⟩ ≤ g(x) − g(x̄) + ε for all x ∈ X,

and thus we get x∗ ∈ ∂ε g(x̄). This verifies the inclusion ∂ε h(x̄) ⊂ ∂ε g(x̄) as a necessary condition for global optimality.
The proof of the sufficiency of (7.122) for global minima is more involved. To proceed, suppose that ∂ε h(x̄) ⊂ ∂ε g(x̄) for all ε ≥ 0. Suppose on the contrary that x̄ is not a global minimizer of g − h and find x̃ ∈ X such that

g(x̃) − h(x̃) < g(x̄) − h(x̄).

Pick an arbitrary positive number ε̃ satisfying

0 < ε̃ < g(x̄) + h(x̃) − g(x̃) − h(x̄).

Choose x∗ ∈ ∂ε̃ h(x̃) by Theorem 5.9 and get

⟨x∗ , x − x̃⟩ ≤ h(x) − h(x̃) + ε̃ for all x ∈ X.

Then for any x ∈ X, we have

⟨x∗ , x − x̄⟩ ≤ h(x) − h(x̄) + ⟨x∗ , x̃ − x̄⟩ + h(x̄) − h(x̃) + ε̃.

Let ε := ⟨x∗ , x̃ − x̄⟩ + h(x̄) − h(x̃) + ε̃ ≥ 0 and deduce from the definition that x∗ ∈ ∂ε h(x̄). Using (7.122) gives x∗ ∈ ∂ε g(x̄). Observe that

⟨x∗ , x̃ − x̄⟩ = h(x̃) − h(x̄) + ε − ε̃ > g(x̃) − g(x̄) + ε,

which implies that x∗ ∉ ∂ε g(x̄), a clear contradiction. □
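The characterization (7.122) lends itself to a simple numerical experiment. In the Python sketch below (with the illustrative choices g(x) = x² and h(x) = x, so that f(x) = x² − x has the unique global minimizer x̄ = 1/2), ε-subgradients are tested against their defining inequality on finite grids; the inclusion ∂εh(x̄) ⊂ ∂εg(x̄) holds at the global minimizer and already fails for small ε at a non-optimal point.

```python
# Grid-based sketch of the global-minimum test (7.122) for f = g - h with
# our choices g(x) = x^2 and h(x) = x; the global minimizer is xbar = 1/2.
g = lambda x: x * x
h = lambda x: x
xs = [k / 100 for k in range(-500, 501)]

def in_eps_subdiff(phi, xbar, s, eps):
    # s is an eps-subgradient of phi at xbar if, for all sampled x,
    # s*(x - xbar) <= phi(x) - phi(xbar) + eps
    return all(s * (x - xbar) <= phi(x) - phi(xbar) + eps + 1e-9 for x in xs)

cands = [k / 50 for k in range(-200, 201)]        # candidate subgradients

def inclusion_holds(xbar, eps):
    return all(in_eps_subdiff(g, xbar, s, eps)
               for s in cands if in_eps_subdiff(h, xbar, s, eps))

# At the global minimizer the inclusion holds for every eps tested ...
assert all(inclusion_holds(0.5, eps) for eps in (0.0, 0.1, 0.5, 2.0))
# ... while at x = 0 (not a global minimizer) it fails for small eps
assert not inclusion_holds(0.0, 0.1)
print("(7.122) detects the global minimizer")
```

Since the grids are finite, this only samples the defining inequalities; it is a plausibility check, not a verification of (7.122).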

7.9 Exercises for Chapter 7


Exercise 7.115 Let x̄, v ∈ R. Find f ◦ (x̄; v), f ′ (x̄; v), and df (x̄; v) for the following real-valued functions:
(a) f (x) := |x|, x ∈ R.
(b) f (x) := −|x|, x ∈ R.
(c) f (x) := max{x, 0}, x ∈ R.
(d) f (x) := min{x, 0}, x ∈ R.
(e) f (x) := |x2 − 1|, x ∈ R.
(f) f (x) := | sin(x)|, x ∈ R.
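As a quick way to form conjectures for such exercises, the defining difference quotients can be sampled numerically. The Python sketch below gives crude finite approximations (not proofs) of f ′(0; v) and of the generalized directional derivative f ◦(0; v) for items (a) and (b); the step sizes and sampling radius are arbitrary choices of ours.

```python
# Numerical estimates for Exercise 7.115(a,b): for f(x) = |x| and
# f(x) = -|x| at xbar = 0, sample the quotients defining the directional
# derivative f'(0; v) and the generalized one f°(0; v) on shrinking grids.
def dir_deriv(f, x, v, t=1e-6):
    return (f(x + t * v) - f(x)) / t

def clarke(f, x, v, t=1e-6, radius=1e-4, n=200):
    # sup of difference quotients over base points y near x (a crude limsup)
    ys = [x + radius * (k / n - 0.5) for k in range(n + 1)]
    return max((f(y + t * v) - f(y)) / t for y in ys)

absf = abs
negabs = lambda x: -abs(x)
assert abs(dir_deriv(absf, 0.0, 1.0) - 1.0) < 1e-6      # f'(0;1) = 1 = |v|
assert abs(clarke(absf, 0.0, 1.0) - 1.0) < 1e-3         # f°(0;1) = 1
assert abs(dir_deriv(negabs, 0.0, 1.0) + 1.0) < 1e-6    # f'(0;1) = -1
assert abs(clarke(negabs, 0.0, 1.0) - 1.0) < 1e-3       # f°(0;1) = +1
print("for -|x| the derivatives differ: f'(0;1) = -1 but f°(0;1) = 1")
```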

Exercise 7.116 Let f : Rn → R be locally Lipschitzian around some x̄ ∈ Rn .
(a) Give an example where f ′ (x̄; ·) exists but is different from f ◦ (x̄; ·).
(b) Construct examples showing that the functions v → f ′ (x̄; v) and v → df (x̄; v) may not be either convex or Lipschitz continuous on Rn .
(c) Give an example of a Lipschitz continuous function f : R → R, which is differentiable at some x̄ but is not directionally regular at this point.

Exercise 7.117 Let f : Rn → R be locally Lipschitzian around some point x̄ ∈ Rn .
(a) Give an example showing that the function (x, v) → df (x; v) may not be upper semicontinuous at (x̄, v̄).
(b) Give an example showing that the function (x, v) → f ◦ (x; v) may not be lower semicontinuous at (x̄, v̄).

Exercise 7.118 Prove that the function f in Example 7.7 is differentiable at x̄ := 0, being locally Lipschitzian while not continuously differentiable around this point.

Exercise 7.119 Let f : X → R be a function defined on a normed space X that is locally Lipschitzian around some point x̄.
(a) Clarify whether merely the Fréchet differentiability of f at x̄ is sufficient for the directional regularity of f at this point.
(b) Show that the directional regularity of f at x̄ is equivalent to the fulfillment of the equality in (7.35).

Exercise 7.120 Let Ω be a nonempty subset of a normed space X.
(a) Verify the description of the geometric derivability of Ω at x̄ ∈ Ω given in (7.17). Hint: Proceed by the definitions and compare with the proof of [317, Proposition 6.2] in finite dimensions.
(b) Show that if Ω is convex, then it is geometrically derivable at any point x̄ ∈ Ω.
(c) Give an example of a set Ω ⊂ R2 , which is not geometrically derivable at some point x̄ ∈ Ω.

Exercise 7.121 Let f : X → R be a function defined on a normed space X that is locally Lipschitzian around some point x̄.
(a) Show that the epi-differentiability of f at x̄ from Definition 7.15(b) is equivalent to the set convergence (7.19), where the finite differences Δt f (x̄)(v) are defined in (7.18). Hint: Proceed by the definitions.
(b) Verify that the Fréchet differentiability of f at x̄ implies that f is epi-differentiable at this point. Hint: This follows from the definitions.
(c) Prove that the directional regularity of f at x̄ yields the epi-differentiability of f at this point. Hint: Deduce it from the geometry of epigraphs and compare with the proof of [317, Theorem 6.26] in finite dimensions.

Exercise 7.122 Consider the composition ϕ ◦ g in the setting of Theorem 7.12. Prove that the epi-differentiability of ϕ at ȳ yields the epi-differentiability of ϕ ◦ g at x̄. Hint: Employ Lemma 7.16.

Exercise 7.123 Let X be a normed space, and let f1 , f2 : X → R be locally Lipschitzian functions around x̄.
(a) Show that the directional regularity assumption in Theorem 7.18 is essential for the fulfillment of the equality in (7.25), even in the case where X = R and the functions f1 , f2 are differentiable at x̄.
(b) Verify the fulfillment of (7.26) and give an example showing that the opposite inequality may fail when X = R.

Exercise 7.124 Consider the setting of Theorem 7.19 under the assumptions made.
(a) Verify whether both statements of Theorem 7.19 remain valid if the smoothness of g around x̄ is replaced by the Fréchet differentiability of g at x̄.
(b) Dropping the surjectivity condition on ∇g(x̄) in the general assumptions of Theorem 7.19, show that we have the inequality

(ϕ ◦ g)◦ (x̄; v) ≤ ϕ◦ (ȳ; ∇g(x̄)v), (7.123)

where equality holds if ϕ is directionally regular at ȳ, which yields in turn the directional regularity of ϕ ◦ g at x̄. Hint: To verify the inequality in (7.123), proceed similarly to the corresponding part in the proof of [76, Theorem 2.3.9]. The other statements can be deduced directly from the definitions.
Exercise 7.125 Assume in the setting of Theorem 7.20 that all the functions fi are locally Lipschitzian around x̄.
(a) Prove that in this case we have the inequality

(fmax )◦ (x̄; v) ≤ max{ fi◦ (x̄; v) : i ∈ I(x̄) }, v ∈ X.

Hint: Proceed similarly to the proof of this part of Theorem 7.20. This can also be deduced from the chain rule in (7.123).
(b) Show that in this case the subderivative formula (7.30) is replaced by

dfmax (x̄; v) ≥ max{ dfi (x̄; v) : i ∈ I(x̄) }, v ∈ X.

Hint: Proceed by definition (7.2).
(c) Prove that equalities hold in both formulas given in (a) and (b) if the functions fi are directionally regular at x̄ for all i ∈ I(x̄), which yields the directional regularity of fmax at this point. Construct an example showing that the above regularity assumptions are essential for the equalities.
(d) Clarify whether the epi-differentiability of fi at x̄ for all i ∈ I(x̄) ensures the equality in the subderivative formula given in (b) together with the epi-differentiability of the maximum function fmax at this point.
Exercise 7.126 Let f be locally Lipschitzian around x̄ on a normed space X.
(a) Does Proposition 7.28(a) hold if the continuous differentiability of f around x̄ is replaced by its Fréchet differentiability at this point?
(b) Does Proposition 7.28(a) hold if the C1-smoothness property of f is replaced by the requirement that f is continuously Gâteaux differentiable around x̄?
Exercise 7.127 Let f be locally Lipschitzian around x̄ on a normed space X.
(a) Prove that x∗ ∈ ∂ −f (x̄) if and only if there exists a locally Lipschitzian function ϕ, which is Gâteaux differentiable at x̄ with ϕ′G (x̄) = x∗ and such that f − ϕ attains its local minimum at x̄. Hint: Proceed by the definitions.
(b) Clarify whether this kind of smooth variational description via an appropriate derivative is possible for the generalized gradient ∂̄f (x̄).
Exercise 7.128 Let f : X → R be locally Lipschitzian around x̄ on a normed space X. Prove that ∂̄f (x̄) ≠ ∅ and that the representation in (7.42) holds by using the convex separation theorem.

Exercise 7.129 Clarify whether the counterpart of Theorem 7.33 holds for the contingent subdifferential (7.34) under all the assumptions imposed therein except the directional regularity of ϕ at ȳ.

Exercise 7.130 Consider ϕ ◦ g under the assumptions of Theorem 7.34.
(a) Give a detailed proof of the inclusion in (7.47). Hint: Use the Hahn-Banach theorem and compare with the proof of [76, Theorem 2.3.9].
(b) Provide an example with X = Y = R showing that the equality in (7.48) may fail if ϕ is not directionally regular at ȳ.
Exercise 7.131 Let X and Y be normed spaces, and let f : X × Y → R be locally Lipschitzian around a given point (x̄, ȳ).
(a) Show that the equality in (7.49) may fail in the absence of directional regularity of f at (x̄, ȳ).
(b) Provide a detailed proof of inclusion (7.50) and show that the directional regularity of f at (x̄, ȳ) is essential for its fulfillment.
(c) Find conditions ensuring that (7.50) holds as an equality.

Exercise 7.132 Let f : X × Y → R be as in Exercise 7.131.
(a) Show that without the contingent regularity assumption, the equality in (7.60) may fail, and that neither the inclusion "⊂" nor "⊃" holds therein.
(b) Clarify whether a contingent counterpart of the inclusion in (7.50) is valid under the contingent regularity of f at (x̄, ȳ).
Exercise 7.133 Let all the functions fi , i = 1, . . . , m, be locally Lipschitzian around x̄ on a normed space X.
(a) Verify (7.53) for contingent subgradients of the maximum function (7.28).
(b) Give an example in X = R showing that the equality may fail in (7.53).
(c) Find sufficient conditions weaker than directional regularity, which ensure the fulfillment of the equality in (7.53). Hint: Check the case where fi are contingently regular at x̄ for all i ∈ I(x̄).

Exercise 7.134 Consider the minimum function fmin from (7.56), where fi : X → R, i = 1, . . . , m, are locally Lipschitzian around x̄ on a normed space X.
(a) Evaluate the subderivative of the minimum function fmin at x̄.
(b) Show that the contingent subdifferential counterpart of formula (7.57) holds without taking the convex hull on the right-hand side therein. Hint: Use (a) and also compare with [228, Proposition 1.113].
(c) Specify the results of (a) and (b) in the case where the functions fi are C1-smooth around x̄ for all i ∈ I(x̄).

Exercise 7.135 Let f : X → R be a locally Lipschitzian function on some open subset U of a normed space X containing the interval [a, b].
(a) Verify that both sets ∂ −f (x) and ∂ −(−f )(x) are nonempty simultaneously if and only if f is Gâteaux differentiable at x.
(b) Construct an example of f satisfying the assumptions of Theorem 7.44(b) while not being Gâteaux differentiable and directionally regular at some point of U .

(c) Show that the contingent regularity assumptions on f and −f are essential for
the fulfillment of Theorem 7.44(b).

Exercise 7.136 Consider the setting of Theorem 7.46 and do the following:
(a) Present all the details in the proof of this theorem.
(b) Clarify the possibility of replacing the symmetric construction ∂− f in Theorem 7.46(b) by the contingent subdifferential ∂ −f provided that X is a Banach space admitting a Gâteaux smooth renorming. Hint: Use Exercise 7.135(a) and proceed similarly to the proofs of [85] and [228, Theorem 3.56] (specified for the case of Lipschitzian functions) by taking into account that X is "trustworthy" with respect to ∂ −f due to [171, Theorem 4.31].

Exercise 7.137 Clarify the following issues:
(a) Under the corresponding assumptions of Theorem 7.48 and Corollaries 7.49 and 7.50, find conditions ensuring the equalities therein.
(b) Give an example showing that a counterpart of the Lipschitzian chain rule of Theorem 7.48 fails for contingent subgradients even when X = Y = R.
(c) Find conditions ensuring the fulfillment of the contingent subgradient counterparts of the product and quotient rules from Corollaries 7.49 and 7.50.

Exercise 7.138 Derive inclusion (7.51) in Theorem 7.36 from the chain rule of
Theorem 7.48. Hint: Represent the maximum function (7.28) as the composition
ϕ ◦ g with g(x) := (f1 (x), . . . , fm (x)) and ϕ(y1 , . . . , ym ) := max{yi | i = 1, . . . , m}.

Exercise 7.139 Let f : X → R, where X is a normed space. Prove that
(a) f is strictly Gâteaux differentiable at x̄ ∈ X if and only if there exists x∗ ∈ X∗ , necessarily unique, such that for each v ∈ X we have

lim [f (x + tv) − f (x) − t⟨x∗ , v⟩]/t = 0 as x → x̄ and t ↓ 0,

where the convergence is uniform for v in finite subsets of X.
(b) f is strictly Fréchet differentiable at x̄ if and only if there exists x∗ ∈ X∗ , necessarily unique, such that for each v ∈ X we have

lim [f (x + tv) − f (x) − t⟨x∗ , v⟩]/t = 0 as x → x̄ and t ↓ 0,

where the convergence is uniform for v in bounded subsets of X.

Exercise 7.140 Prove both assertions formulated in Remark 7.55. Hint: To verify
assertion (a) therein, use the description of the strict Fréchet differentiability of
functions taken from Exercise 7.139(b).

Exercise 7.141 Show that in all the results of Sections 7.1–7.3 and the corresponding exercises, the C1-smoothness assumption on the function in question around the reference point can be replaced by the strict Hadamard differentiability of the function at this point. Hint: Use Theorem 7.53.

Exercise 7.142 Provide all the details in the proof of Theorem 7.61. Hint: Compare
with the proof of [317, Theorem 9.61].
Exercise 7.143 Consider the antiderivative F : [a, b] → R defined in (7.76).
(a) Give a detailed proof of Lemma 7.62 in case (d).
(b) Verify Lemma 7.62 and Theorem 7.63 when x̄ = a and x̄ = b.
Exercise 7.144 Let f : X → R be a locally Lipschitzian function around x̄ ∈ X on a normed space X. Verify the following statements:
(a) If f is convex, then both ∂̂f(x̄) and ∂f(x̄) agree with the subdifferential of convex analysis. Hint: Use the definitions.
(b) If f is Fréchet differentiable at x̄, then ∂̂f(x̄) is a singleton collapsing to the Fréchet derivative of f at x̄. Does this hold for the limiting subdifferential?
(c) Prove that both sets ∂̂f(x̄) and ∂f(x̄) are bounded in X* by the Lipschitz constant of f around x̄.
(d) Prove that ∂f(x̄) ≠ ∅ provided that the space X is Asplund. Hint: Compare with [229, Corollary 2.25].
(e) Show that ∂̂f(x̄) = ∅ for f(x) := −|x| at x̄ := 0 ∈ R, and ∂f(x̄) = ∅ for f(x) := −‖x‖ at x̄ := 0 ∈ C[0, 1].
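For readers who wish to see part (e) concretely, the following numerical sketch (our own illustration; the helper name worst_quotient is hypothetical) samples the difference quotient that defines a regular subgradient of f(x) := −|x| at 0 and confirms that every candidate x* fails the required liminf inequality, so the regular subdifferential is empty.

```python
# Regular subgradients x* of f at 0 must satisfy
#   liminf_{x -> 0} [f(x) - f(0) - x*·x] / |x| >= 0.
# For f(x) = -|x| the quotient equals -1 - x*·sign(x), which cannot be
# nonnegative from both sides of 0 simultaneously, so no x* qualifies.
import numpy as np

f = lambda x: -np.abs(x)

def worst_quotient(x_star, n=10_000):
    # sample x -> 0 from both sides; the minimum approximates the liminf
    xs = np.concatenate([np.geomspace(1e-8, 1.0, n), -np.geomspace(1e-8, 1.0, n)])
    return np.min((f(xs) - f(0.0) - x_star * xs) / np.abs(xs))

# every candidate x* in [-2, 2] violates the inequality by at least 1
assert all(worst_quotient(s) <= -1 + 1e-9 for s in np.linspace(-2.0, 2.0, 81))
```

By contrast, f is differentiable at every x ≠ 0 with f′(x) = −sign(x), so limits of nearby gradients yield the nonempty limiting subdifferential {−1, 1}.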
Exercise 7.145 Let f : X → R be a locally Lipschitzian function around some point x̄ ∈ X, where X is a normed space.
(a) Give an example of f and X such that the regular subgradient set ∂̂f(x̄) is empty while the contingent subdifferential of f at x̄ is not.
(b) Verify the following monotonicity property of the regular subdifferential: if f1(x̄) = f2(x̄) and f1(x) ≤ f2(x) in a neighborhood of x̄, then ∂̂f1(x̄) ⊂ ∂̂f2(x̄). Hint: Proceed by the definition.
(c) Does the monotonicity property in (b) hold for the contingent subdifferential, generalized gradient, and the limiting subdifferential?
(d) Assuming that X admits an equivalent Fréchet smooth renorming, prove the following smooth variational description of regular subgradients: for every x* ∈ ∂̂f(x̄) there exist a neighborhood U of x̄ and a concave, continuously Fréchet differentiable function g : U → R such that ∇g(x̄) = x* and
\[
f(x) - g(x) - \|x - \bar{x}\|^2 \ge f(\bar{x}) - g(\bar{x}) \quad \text{for all } x \in U.
\]
Hint: Compare with [229, Theorem 1.88(ii)] for any f : X → R finite at x̄.


Exercise 7.146 Let f : X → R be a proper function defined on a normed space that
is finite at x. Prove the limiting subdifferential representation (7.85). Hint: Compare
with [229, Theorem 1.89] for a somewhat different statement.
Exercise 7.147 Let X be an Asplund space.
(a) Assuming that f : X → R is locally Lipschitzian around x, prove the approxi-
mate mean value inequality of Theorem 7.68 following the lines discussed before
the formulation of the theorem.
(b) Obtain extensions of the mean value inequality (7.87) to the case of l.s.c. functions f : X → R. Hint: Compare with the proof of [229, Theorem 3.49].
542 7 CONVEXIFIED LIPSCHITZIAN ANALYSIS

Exercise 7.148 Let X be a normed space. Do the following:


(a) Assuming that f : X → R is locally Lipschitzian around x, prove that ∂f(x) ⊂ ∂̄f(x), where ∂̄f(x) is the generalized gradient. Hint: Use Definition 7.64 and construction (7.33) of the generalized gradient for locally Lipschitzian functions.
(b) Assuming merely that f : X → R is finite at x, verify that ∂f(x) is contained in Clarke's subdifferential defined by the duality scheme (7.33) with Rockafellar's generalized directional derivative used instead of f◦(x; v); see [311] and the commentaries in Section 7.10. Hint: Compare with the proof in Kruger [188, Theorem 7].
Exercise 7.149 Let f : X → R be Lipschitzian around x on a normed space X.
(a) Assuming that X = Rn , give a direct proof of Corollary 7.70 without appealing
to generalized gradients.
(b) Construct an example showing that the local Lipschitz continuity assumption
is essential for the conclusion of Corollary 7.70.
(c) Find appropriate conditions on the space X ensuring the fulfillment of a coun-
terpart of Corollary 7.70 in infinite dimensions.
Exercise 7.150 In the setting of Theorem 7.71 clarify the following:
(a) Find sufficient conditions under which the regular subdifferential inclusion
(7.95) of Theorem 7.71 holds as an equality.
(b) Verify whether counterparts of both assertions of Theorem 7.71 hold for the
limiting subdifferential and normal cone in Rn .
Exercise 7.151 Let Ω be a nonempty and closed subset of an Asplund space X.
(a) Prove the limiting normal cone representation
\[
N(x; \Omega) = \bigcup_{\lambda > 0} \lambda\, \partial d(x; \Omega) \quad \text{for any } x \in \Omega. \tag{7.124}
\]

Hint: Use Theorem 7.71(a) together with the Ekeland variational principle and
compare with [333] and [229, Theorem 1.97].
(b) Clarify whether representation (7.124) holds for closed sets in Banach spaces
with the limiting constructions defined in (7.80) and (7.83).
Exercise 7.152 Let Ω be a nonempty and closed subset of Rn .
(a) Show that the counterpart of Theorem 7.73 fails for the limiting normal cone
and subdifferential.
(b) Modify the construction of the limiting subdifferential (7.80) to ensure a rela-
tionship of types (7.98) and (7.124) (by replacing Ω by Ωr ) at out-of-set points.
Hint: Compare with [229, Theorem 1.101] for the latter representation.
(c) Find and justify appropriate relationships between the limiting subdifferential of
the distance function at out-of-set points of closed sets and the limiting normal
cone to the projections of these points onto the sets. Hint: Compare with [229,
Theorem 1.104] under certain well-posedness conditions.
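As a concrete finite-dimensional companion to the out-of-set relationships above, one may take Ω as the closed unit ball in R2, where the projection and the distance are explicit. The sketch below (our own example; dist and grad_fd are hypothetical helper names) checks by central finite differences that the gradient of the distance function at an out-of-set point is the unit vector (x − Π(x; Ω))/d(x; Ω) pointing from the projection toward the point.

```python
import numpy as np

def dist(x):
    # d(x; Ω) for Ω the closed unit ball in R²
    return max(np.linalg.norm(x) - 1.0, 0.0)

def grad_fd(x, h=1e-6):
    # central finite-difference approximation of the gradient of dist at x
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (dist(x + e) - dist(x - e)) / (2.0 * h)
    return g

x = np.array([2.0, -1.0])                 # an out-of-set point
proj = x / np.linalg.norm(x)              # Π(x; Ω) for the unit ball
formula = (x - proj) / dist(x)            # predicted (sub)gradient
assert np.allclose(grad_fd(x), formula, atol=1e-5)
assert abs(np.linalg.norm(formula) - 1.0) < 1e-9  # a unit normal direction
```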
Exercise 7.153 Verify which parts of Theorem 7.79 and its proof hold in infinite-
dimensional Hilbert spaces.
Exercise 7.154 Consider the signed minimal time function defined in (6.43), where
X = Rn , and where both sets Ω and F are nonempty and convex. Determine
appropriate conditions on the constant dynamic set F that ensure subdifferential
formulas for (6.43) similar to those obtained in Subsection 7.7.3, where F = B. Hint:
Proceed as in the proof of Theorem 7.79 with appropriate modifications.

Exercise 7.155 Let X be a normed space, let T be a compact topological space,


and let f : T ×X → R. Suppose that f (·, x) is upper semicontinuous for every x ∈ X,
and f (t, ·) is a continuous DC function for every t ∈ T . Clarify whether the pointwise
supremum and infimum functions defined by

\[
f_{\sup}(x) := \sup_{t \in T} f(t, x) \quad\text{and}\quad f_{\inf}(x) := \inf_{t \in T} f(t, x)
\]

belong to the class of continuous DC functions on X.

Exercise 7.156 Using the mixing property given in Theorem 7.97, prove that the
class of continuous DC functions on normed spaces is closed under taking maxima
of finitely many functions.
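The closedness under maxima rests on an elementary identity: if f_i = g_i − h_i with convex g_i, h_i, then max(f1, f2) = max(g1 + h2, g2 + h1) − (h1 + h2), where both max(g1 + h2, g2 + h1) and h1 + h2 are again convex. A quick numerical check of this identity on our own toy example:

```python
import numpy as np

# two DC functions on R with explicit convex decompositions
g1, h1 = lambda x: x**2, lambda x: np.abs(x)        # f1 = x² - |x|
g2, h2 = lambda x: np.exp(x), lambda x: 2.0 * x**2  # f2 = eˣ - 2x²

f1 = lambda x: g1(x) - h1(x)
f2 = lambda x: g2(x) - h2(x)

G = lambda x: np.maximum(g1(x) + h2(x), g2(x) + h1(x))  # convex: max of convex
H = lambda x: h1(x) + h2(x)                             # convex: sum of convex

xs = np.linspace(-3.0, 3.0, 1001)
assert np.allclose(np.maximum(f1(xs), f2(xs)), G(xs) - H(xs))  # max(f1,f2) = G - H
```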

Exercise 7.157 Verify the representation (7.111) in the proof of Lemma 7.101.
Hint: Proceed similarly to the proof of Theorem 7.63.

Exercise 7.158 Clarify whether every function f : Rn → R that is continuously differentiable on the entire space Rn is a continuous DC function on Rn.

Exercise 7.159 Verify representation (7.116) of the convex function g in the proof
of Proposition 7.109. Hint: Use the subdifferential of g.

Exercise 7.160 Clarify whether Theorem 7.107 and the associated results of Sub-
section 7.8.3 hold in infinite dimensions.

Exercise 7.161 Let f : X → R be a continuous DC function on a normed space.


(a) Show that both inclusions in (7.117) may be strict.
(b) Clarify whether a counterpart of (7.117) holds for the contingent subdifferential.
(c) Give an example showing that inclusion (7.118) may fail for the limiting subd-
ifferential in non-Asplund spaces.
(d) Clarify whether inclusion (7.119) holds in general Banach spaces.

Exercise 7.162 Prove Theorem 7.113. Hint: Proceed similarly to the proof of The-
orem 7.112 with x∗ = 0 therein.

Exercise 7.163 Clarify whether Theorem 7.114 holds for any DC function without
the continuity assumptions on the convex functions g, h in the DC decomposition.

7.10 Commentaries to Chapter 7


This book is on convex analysis and beyond. By “beyond” we mean first of all
emphasizing the crucial role of convex ideas, methods, and results applied to non-
convex objects. In the previous chapters of the book, we have already discussed
some fundamental issues of this type; e.g., the genesis and derivations of general
variational principles in Chapter 5. This concluding chapter reveals a crucial role
of convex analysis in subdifferentiation theory for nonconvex functions and related


topics.
A remarkable feature of any convex function f : X → R is the close relationship (in fact, the equivalence) between the subdifferential ∂f(x) of f at x ∈ dom(f) and the (classical) one-sided directional derivative f′(x; v) given in (7.4), which always exists while being convex with respect to directions. As we know, the subdifferential of a convex function f at x can be equivalently described by
\[
\partial f(x) = \bigl\{ x^* \in X^* \bigm| \langle x^*, v \rangle \le f'(x; v) \text{ whenever } v \in X \bigr\}. \tag{7.125}
\]
Conversely, rather mild assumptions allow us to express the directional derivative of a convex function via its subdifferential as follows:
\[
f'(x; v) = \sup\bigl\{ \langle x^*, v \rangle \bigm| x^* \in \partial f(x) \bigr\}, \quad v \in X. \tag{7.126}
\]
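The duality between (7.125) and (7.126) can be sanity-checked numerically for the convex function f(x) = |x| at x = 0, where ∂f(0) = [−1, 1] and f′(0; v) = |v|; the helper names below are our own.

```python
import numpy as np

f = lambda x: abs(x)

def dir_deriv(v, t=1e-9):
    # one-sided directional derivative f'(0; v) = lim_{t↓0} (f(tv) - f(0)) / t
    return (f(t * v) - f(0.0)) / t

subgradients = np.linspace(-1.0, 1.0, 2001)  # a fine sample of ∂f(0) = [-1, 1]
for v in (-2.0, -0.5, 0.0, 1.0, 3.0):
    # (7.126): f'(0; v) equals the support function of ∂f(0) at v
    assert abs(dir_deriv(v) - np.max(subgradients * v)) < 1e-6
    # (7.125): every subgradient satisfies <x*, v> <= f'(0; v)
    assert np.all(subgradients * v <= dir_deriv(v) + 1e-6)
```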
The main difference between the basic definition of the subdifferential of convex
functions given in Definition 3.28 and the subdifferential representation in (7.125) is
that the former is global while the latter is local and thus reflects the local behavior
of a function around the reference point. Since in many aspects of variational analysis
and applications (in particular, those to optimization theory) the local behavior of the
function in question is what matters, the subdifferential construction (7.125) has the
potential to work for nonconvex functions that admit the classical directional deriva-
tive having appropriate properties. Following this line, Boris Pshenichnyi introduced
in [294] the class of quasi-differentiable functions as those admitting the classical direc-
tional derivative f  (x; v) for all directions v with its representation (7.126) via some
convex and weak∗ closed subgradient set ∂f (x) ⊂ X ∗ . This definition clearly falls into
the duality scheme of (7.125) with f  (x; v) being convex in directions. It has been real-
ized that the class of quasi-differentiable functions in the sense of Pshenichnyi contains,
along with convex functions, pointwise maxima max{f (x, u) | u ∈ U } of differentiable
functions over compact sets; the latter class is highly important for various applica-
tions including those to optimization and control theory. In the very influential book
by Alexander Ioffe and Vladimir Tikhomirov [174], a class of locally convex functions
was introduced as those functions which are uniformly directionally differentiable at
the point in question with convex and continuous directional derivatives. Then the
subdifferential of such functions was defined by using the duality scheme (7.125). As
shown in [174], locally convex functions exhibit the same subdifferential calculus rules
as their globally convex counterparts.
A broad extension of Pshenichnyi’s class of quasi-differentiable functions, also
generated by the classical directional derivative, was introduced under the same
name by Vladimir Demyanov (1938–2014) and Alexander Rubinov (1940–2006) in
[101] (see also their book [102]) as directionally differentiable functions such that
\[
f'(x; v) = \sup\bigl\{ \langle x^*, v \rangle \bigm| x^* \in \underline{\partial} f(x) \bigr\} + \inf\bigl\{ \langle x^*, v \rangle \bigm| x^* \in \overline{\partial} f(x) \bigr\}, \tag{7.127}
\]
where the pair of convex sets $[\underline{\partial} f(x), \overline{\partial} f(x)]$ is called the quasi-differential of f at x.
The construction of quasi-differentials and basic machinery of convex analysis made
it possible to develop an extensive quasi-differential calculus important for various
applications; see, e.g., the collection of papers [103] among other publications on
this and related topics.
Not every function, even a continuous one, has a directional derivative in each direction (the simplest example is f(x) := x sin(1/x) for x ≠ 0 on R with f(0) := 0), let alone one admitting the special representations (7.126) or (7.127). For this reason, some extensions of the classical directional derivative were defined and then used to introduce the corresponding subdifferentials by the duality scheme (7.125) of convex analysis, with f′(x; v) replaced by its positively homogeneous extension.
In this chapter, we mainly consider the two most prominent and useful directional (sub)derivatives of this type. The contingent derivative/subderivative df(x; v) from (7.2) is a direct extension of the classical directional derivative with the replacement of the full limit (which may not exist) in (7.4) by the lower limit in (7.2), which always exists while it may not be finite. This construction goes back to Ulisse Dini (1845–1918), who introduced it in 1878 as a "derivate number" for functions of one real variable [111]. The extended directional derivative (7.2) and the corresponding subdifferential (7.34) have appeared in variational analysis under different names; see, e.g., the books [14, 54, 171, 228, 288, 317] and the references therein.
A rather small modification in the Dini construction (7.2) by including t ↓ 0
into the lower limit in (7.5) makes this subderivative more suitable to deal with
(non-Lipschitzian) extended-real-valued functions. The latter construction, which
is equivalent to (7.2) for locally Lipschitzian functions due to Proposition 7.2, is
labeled sometimes as the Dini-Hadamard directional derivative and the correspond-
ing subdifferential defined by the same duality scheme (7.34) by using (7.5) on the
right-hand side. In fact, all the results and proofs presented in this chapter for the
subderivative (7.2) and the subdifferential (7.34) hold with some modifications for
the extended-real-valued setting by using (7.5) instead of (7.2); see the commentaries
below. We concentrate here on the locally Lipschitzian case to make a comparison
and parallel study with Clarke’s generalized directional derivative and generalized
gradient, where the Lipschitz continuity is essential. It seems that such a parallel
study of the two major extended directional derivatives is provided for the first time
in the literature on variational analysis and generalized differentiation.
The generalized directional derivative (7.1) and the corresponding generalized
gradient (7.33) were defined, named so, and systematically studied in the fundamen-
tal paper [74] by Francis (Frank) Clarke following his doctoral dissertation (1973)
under the direction of R. T. Rockafellar; see also the books [76, 77, 312, 317] with
the references therein along with many other publications.
The generalized directional derivative (7.1) for locally Lipschitzian function is
by far the crucial construction of Clarke’s generalized differentiation, even beyond
the Lipschitz continuity. It is significantly different from the classical directional
derivative (7.4) and its subderivative extension (7.2); see the discussion after Defini-
tion 7.1. The notion of directional regularity (known also as Clarke regularity) was
defined in [74, 76] in the equivalent form presented in Proposition 7.3.
The properties and calculus rules for Clarke’s generalized directional deriva-
tive included in this chapter are mostly taken from the book [76] while they are
presented here somewhat differently. A major property of f ◦ (x; v) is its automatic
convexity with respect to directions, which is the source of good calculus rules and
applications of the generalized directional derivative (7.1). This is not shared by
the subderivative (7.2), which generally provides a better local approximation of
the function in question around x while lacking adequate calculus without imposing
additional assumptions. The calculus rules for subderivatives (contingent deriva-
tives) presented in Subsection 7.2.1 are taken from the recent paper by Mohammadi
and Mordukhovich [221], where they are obtained for extended-real-valued func-
tions under various qualification conditions; see also the paper of these authors with
Sarabi [222] for a general chain rule of this type in finite dimensions. The notion
of epi-differentiability used in the obtained calculus rules is taken from the book by
Rockafellar and Wets [317].
Both the generalized gradient and the contingent subdifferential from Defini-
tion 7.21 are obtained by the same duality scheme of convex analysis (7.125) by
replacing the classical directional derivative with the corresponding extended direc-
tional derivatives (7.1) and (7.2). But these two directionally generated subdiffer-
entials are essentially different from each other for nonconvex functions. Indeed,
Clarke’s generalized directional derivative f ◦ (x; v) can be fully determined by the
generalized gradient via representation (7.31), which fails for the Dini/contingent
constructions even in the simplest case where f (x) := −|x| at x = 0 ∈ R. Note
that the full duality relationship (7.31) means in fact that any locally Lipschitzian
function is extendedly quasi-differentiable (in the line of Pshenichnyi) by replacing
the classical directional derivative in (7.125) by its generalized Clarke counterpart.
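The difference is easy to observe numerically for f(x) = −|x| at 0: sampling base points y → 0 and steps t ↓ 0 suggests f◦(0; v) = |v|, which is convex in v and recovers the generalized gradient [−1, 1] by duality, while the ordinary difference quotient at 0 gives the contingent subderivative df(0; v) = −|v|, whose dual set is empty. The helper names below are our own illustration.

```python
import numpy as np

f = lambda x: -np.abs(x)

def clarke_dd(v):
    # f°(0; v) = limsup over y -> 0 and t ↓ 0 of (f(y + t v) - f(y)) / t
    ys = np.linspace(-1e-3, 1e-3, 2001)
    return max(np.max((f(ys + t * v) - f(ys)) / t) for t in (1e-4, 1e-5, 1e-6))

def subderiv(v, t=1e-9):
    # for this Lipschitz f, df(0; v) reduces to lim_{t↓0} (f(t v) - f(0)) / t
    return (f(t * v) - f(0.0)) / t

for v in (-2.0, -1.0, 1.0, 3.0):
    assert abs(clarke_dd(v) - abs(v)) < 1e-3   # f°(0; v) = |v|
    assert abs(subderiv(v) + abs(v)) < 1e-9    # df(0; v) = -|v|
```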
The calculus rules for generalized gradients presented in Subsections 7.3.2 and
7.4.2 mainly follow Clarke’s book [76]. Note that the upper estimate of the general-
ized gradient of the minimum function is the same as for the maximum one, which is
due to the two-sided symmetry property (7.54); see the discussions in Remark 7.37.
The calculus rules for the contingent subdifferential established in Subsection 7.3.3
in the equality form have been recently obtained by Mohammadi and Mordukhovich
[221] under the notion of contingent regularity introduced therein. The meaning of
this property (7.58) is to postulate the full duality (quasi-differentiability) rela-
tionship of type (7.125) for the contingent constructions. As illustrated by simple
examples in the text, the contingent regularity of locally Lipschitzian functions may
be strictly weaker than its (Clarke) directional regularity counterpart even for dif-
ferentiable functions on R. It is also different from other regularity notions of this
type, which were comprehensively studied by Bounkhel and Thibault [59]. Note also
that paper [221] contains appropriate versions of the contingent calculus results to
extended-real-valued functions via the modified subderivative (7.5) and the corre-
sponding subdifferential (7.34) in arbitrary (incomplete) normed spaces.
The mean value theorem in terms of the generalized gradients of Lipschitz con-
tinuous functions in Theorem 7.44(a) is due to Gérald Lebourg [200], a student
of Ivar Ekeland. Its contingent counterpart in Theorem 7.44(b) is new. Note that
the parallel proof of both results given here is different from those in [200] and in
Clarke’s book [76] for generalized gradients. As a consequence of Theorem 7.44, it
is shown in Theorem 7.46 that if either of the symmetric constructions ∂f and ∂−f from (7.65) is a monotone mapping for a Lipschitz continuous function f, then f must be convex; see also Exercise 7.136(b) for further extensions. The results of
Theorems 7.44(a) and 7.46(a) go back to [76], while assertions (b) of these theorems
are new. Far-going generalizations of Theorem 7.46(a) are obtained by Correa et al.
[85] for axiomatically defined subdifferentials of extended-real-valued l.s.c. functions
on Banach spaces by employing Zagrodny’s approximate mean value theorem [358]
that has been already used in Chapter 5 and discussed in the commentaries therein.
One of the most crucial differences between the contingent subdifferential and
generalized gradient from Definition 7.21 is that the former is a nonsmooth extension
of the classical derivative at the reference point, while the latter extends the strict derivative there; we have discussed this at the beginning of Section 7.5. Formally,
the notion of (Fréchet) strict derivative at a point was introduced by Leach [199]
in 1961, but in fact this notion was already used in 1950 in the fundamental paper
[145] by Lawrence Graves, from the famous Chicago School of the calculus of vari-
ations, to prove what is now known as the Lyusternik-Graves theorem of nonlinear
analysis. The notions of strict derivatives, with respect to various bornologies, play
an important role in many aspects of convex and variational analysis. Our exposition
in Subsection 7.5.1 mainly follows the book by Phelps [290].
Theorem 7.56 is due to Clarke [76], while Theorem 7.58 can be found in Borwein
[44] with a different proof. Let us mention to this end the notions of “weak differ-
entiability” and “strict-weak differentiability” of single-valued mappings between
Banach spaces with respect to various bornologies. These notions, introduced by
Mordukhovich and Wang [261], may be weaker than even the classical Gâteaux differentiability for Lipschitzian mappings with values in the space of sequences ℓ²;
see [261] and Mordukhovich’s book [228] with applications to graphical regularity
of such mappings and other issues.
Section 7.6 first presents a simple proof of the fundamental Rademacher’s the-
orem from geometric measure theory in finite-dimensional spaces [298]. This result, discovered in 1919 by Hans Rademacher (1892–1969), a student of Constantin Carathéodory, is often used in convex and variational analysis without giving a proof. The only exception known to us is the second edition of the book by Jonathan Borwein and Adrian Lewis [48], with a proof different from the one given in Theorem 7.60. After numerous erroneous attempts and false counterexamples, an extension of this result to the a.e. Fréchet differentiability of Lipschitz continuous functions defined on Asplund spaces was established by David Preiss in [293] with a very involved and delicate proof.
Theorem 7.61 on the limiting gradient representation of the generalized gradient
is due to Clarke [74, 76], which can be taken (and often is) as the definition of
generalized gradient of a locally Lipschitzian function. The set under the convex
hull in (7.72) was defined as early as 1972 by Naum Shor (1937–2006) under the name of the set of almost-gradients of a locally Lipschitzian function, while the convex hull of this set was called in [324] the set of generalized almost-gradients; see
also Shor’s book [325] and further algorithmic developments in, e.g., [128, 182, 297]
along with many subsequent publications.
Theorem 7.63 on subdifferentiation of indefinite integrals is taken from Clarke
[76], while Subsection 7.6.3 contains some additional material and treats all of this
differently; cf. the paper by Chieu [73] for similar elaborations for the limiting subd-
ifferential of Mordukhovich. Let us also mention the opposite direction in integration
theory of variational analysis concerning the possibility to determine a function from
its subdifferential up to a constant. The results in this direction go back to Rock-
afellar [307] for convex functions and then were largely extended by Thibault and
Zagrodny [335] and by Thibault and Zlateva [336] to subdifferentials of nonconvex
functions based on Zagrodny’s approximate mean value theorem.
Recall again that the definitions and major properties of Clarke’s generalized
directional derivative and generalized gradient discussed above are strongly related
to the local Lipschitz continuity of the function in question. The following Lipschitz-
backed procedure to extend the generalized gradient construction to non-Lipschitzian
functions was developed in [74, 76]: define first the tangent cone to a set by using
Clarke’s generalized directional derivative of the Lipschitz continuous distance func-
tion associated with the set, then define the normal cone to the set by using
the duality/polarity relation with tangents, and finally define the subdifferential of
f : X → R at x via the normal cone to the epigraph epi(f ) at (x, f (x)). However,
proceeding in this way does not provide an appropriate convex generalized directional (sub)derivative extending f◦(x; v) in such a way that the full duality correspondence with ∂f(x) holds for non-Lipschitzian functions.
This has been done by Rockafellar in the series of publications [310–312],
where he introduced appropriate subderivative constructions for general classes of
extended-real-valued functions ensuring the non-Lipschitzian counterparts of (7.33)
and (7.42). If f : X → R is finite at x̄ and lower semicontinuous around this point, Rockafellar's generalized directional derivative, or upper subderivative, is defined by
\[
f^{\uparrow}(\bar{x}; v) := \sup_{\gamma > 0}\, \limsup_{x \xrightarrow{f} \bar{x},\ t \downarrow 0}\ \inf_{\|z - v\| \le \gamma} \frac{f(x + tz) - f(x)}{t},
\]

where the symbol $x \xrightarrow{f} \bar{x}$ means that x → x̄ with f(x) → f(x̄), which is of course redundant for continuous functions. The obtained full duality relationships
 
\[
\partial f(\bar{x}) = \bigl\{ x^* \in X^* \bigm| \langle x^*, v \rangle \le f^{\uparrow}(\bar{x}; v),\ v \in X \bigr\}, \qquad f^{\uparrow}(\bar{x}; v) = \sup_{x^* \in \partial f(\bar{x})} \langle x^*, v \rangle \tag{7.128}
\]
are the key to achieve adequate calculus rules for both constructions in (7.128) under
appropriate qualification conditions by using the machinery of convex analysis.
Note that some important properties of Clarke's constructions for Lipschitz functions are lost for their non-Lipschitzian counterparts. This includes the symmetry property (7.54), the nonemptiness and boundedness of ∂f(x̄), and the robustness
\[
\partial f(\bar{x}) = \mathop{\mathrm{Lim\,sup}}_{x \xrightarrow{f} \bar{x}} \partial f(x), \tag{7.129}
\]

which fails for l.s.c. functions on Rn while being valid for the convex subdifferential.
As seen above, the distance functions associated with nonempty subsets of
normed spaces play a very important role in many aspects of convex and varia-
tional analysis. In Chapter 3 we conducted a comprehensive study of the distance
functions for convex sets in normed spaces by calculating their subgradients at in-set
and out-of-set points and proving detailed commentaries to the obtained results and
related issues. Furthermore, Chapter 6 presented results and commentaries on the
class of signed distance functions without, however, their subdifferential study. Now
Section 7.7 provides such a study in the case of convex sets.
The subdifferential calculations for the signed distance function given in Theo-
rem 7.79 were originally obtained by Luo et al. [212]. We present here another proof
taken from our recent paper with Cuong and Wells [92]. This proof involves the con-
structions and results of nonconvex generalized differentiation, which are certainly
of their independent interest in both finite and infinite dimensions.
To the best of our knowledge, the regular subdifferential construction (7.79)
and the corresponding regular normal cone were introduced in 1974 by Bazaraa,
Goode and Nashed [35] motivated by deriving necessary optimality conditions in
minimax problems. The original name for (7.79) was “the set of ≥ gradients.” The
definition in (7.79) is a lower limit one-sided version of the classical Fréchet deriva-
tive, which explains using the name “Fréchet subdifferential” for this construction
although Fréchet himself had nothing to do with it. The name “regular subdiffer-
ential” comes from Rockafellar and Wets [317]. Note that the subdifferential (7.79)
has been actively used in the theory of viscosity solutions of Hamilton-Jacobi partial
differential equations starting with the fundamental paper by Crandall and Lions
[86]. We refer the reader to the survey paper by Kruger [189] on Fréchet-type con-
structions.
Note that the regular subdifferential may not be directionally generated, but this is always the case in finite-dimensional spaces, where the regular subdifferential agrees with the contingent one; see Proposition 7.65, whose proofs clearly hold for extended-real-valued functions by using the Dini-Hadamard definition (7.5).
Theorem 7.71(a) on regular subgradients of the distance function at in-set points
is due to Kruger [186]; see also Ioffe [169]. The out-of-set case in Theorem 7.71(b)
via projections and related results can be found in our paper [231]. The out-of-set
representation (7.98) in Theorem 7.73 is again due to Kruger [186] whose proof was
clarified by Bounkhel and Thibault [59].
The limiting subdifferential (7.80) and the limiting normal cone (7.83) were
introduced in 1976 by the first author [223] in the equivalent forms for finite-
dimensional spaces. The infinite-dimensional extensions were given in the subse-
quent (starting with 1980) papers by Mordukhovich and his student Alexander
Kruger [187, 188, 190, 191, 226] by using the weak∗ sequential limit of Fréchet ε-
subdifferential/ε-normal constructions in Banach spaces admitting a Fréchet smooth
renorming. Other infinite-dimensional extensions of the constructions in [223] to var-
ious types of Banach spaces were developed in the series of papers by Alexander Ioffe
starting with his note [168] of 1981; see more references in, e.g., [171, 226, 228, 317].
The way of defining the limiting subdifferential and limiting normal cone in
(7.80) and (7.83), respectively, follows the pattern introduced by Mordukhovich and his student Yongheng Shao [257], where the symbol $x \xrightarrow{f} \bar{x}$ should be used
when f is discontinuous. This is different from the previous attempt by Kruger and
Mordukhovich involving ε-constructions. Employing definitions (7.80) and (7.83)
in the case where f is l.s.c. and Ω is closed around x, and where the space X is
Asplund (this is more general than the Fréchet smooth renorming of X assumed in
[187, 190]), a complete generalized differential theory with numerous applications
was developed in Mordukhovich’s two-volume monographs [228] and the references
therein. Here we introduce these definitions in arbitrary normed spaces and see that
some results presented in the text and exercises hold in this general framework, while the others require the closedness and Asplund space assumptions, which allow us to employ variational/extremal principles and techniques. Observe to this end
a nonconvex extension of Lebourg’s mean value theorem to continuous functions
on Asplund spaces obtained by replacing the generalized gradient of Lipschitzian
functions in (7.62) by the symmetric limiting subdifferential ∂⁰f(x) := ∂f(x) ∪ (−∂(−f))(x) introduced and utilized by the first author in [224]. This extended mean value theorem is given in [228, Theorem 3.47] in full generality, while its Lipschitzian
versions in finite-dimensional and Fréchet smooth spaces go back to the earlier work
by Kruger and Mordukhovich; see [187, 225, 226].
The regular subdifferential counterpart of the approximate mean value theorem
in Asplund spaces and its l.s.c. extension is due to Mordukhovich and Shao [257]
(see also [228, Theorem 3.49] for a more detailed result), while the previous results
in this direction in Fréchet smooth spaces can be found in Borwein and Preiss [50]
and Loewen [208]. Theorem 7.69 on relationships between the generalized gradient
and limiting subdifferential of locally Lipschitzian functions is taken from [228, 257].
We refer the reader to our papers [231, 233] and the book [228] for various
results on the limiting subdifferential of the distance function and its modifications
in different space settings. Note to this end the remarkable result by Lionel Thibault
[333] who proved the limiting normal cone representation (7.124) via the limiting
subdifferential of the distance function associated with a closed subset of an arbi-
trary Banach space at in-set points. It is interesting to observe that a counterpart
of Thibault’s relationship, with the replacement of Ω by its enlargement Ωr from
(7.96), fails at out-of-set points even for simple nonconvex sets in R2. This was observed in our paper [231], where the notion of the right-sided limiting subdifferential was
introduced to obtain the desired relationship in Banach spaces; see [228, Theo-
rem 1.101]. The result of Theorem 7.74 can be found in the books by Mordukhovich
[226] and Rockafellar and Wets [317], while here we provide a new simple proof.
Differences of convex (DC) functions, studied in Section 7.8, are also known as delta-convex functions. To the best of our knowledge, such functions defined on finite-dimensional spaces were first considered as early as 1949 by the famous geometer Alexander Alexandrov (1912–1999) in his papers [3, 4] written in Russian. A systematic investigation of DC functions on Rn was started in 1959 by Philip Hartman (1915–2015), who defined continuous DC functions together with their local counterparts and then established their basic properties in [152].
Recent years have witnessed a profound interest in DC optimization, from both
theoretical and algorithmic viewpoints, with numerous applications to practical
models of machine learning, facility locations, etc. A strong motivation came from
the simple and efficient DC algorithm (DCA) to solve minimization problems with
DC objectives suggested by Pham Dinh Tao and Le Thi Hoai An in [332]. Further
developments on this algorithm and related issues can be found in the papers by
Aragón Artacho and Phan Tu Vuong [7], Bajaj et al. [23], Geremew et al. [140],
Nam et al. [269, 272], and the references therein. These topics with a variety of
applications will be developed in the second volume of our book [240].
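The scheme behind the DCA mentioned above is easy to state: to minimize f = g − h with g, h convex, one linearizes the concave part −h at the current iterate and minimizes the resulting convex majorant. The sketch below is a minimal one-dimensional illustration under simplifying assumptions (a subgradient of h is available, and the convex subproblem is solved exactly); the function names are ours, not taken from the cited papers.

```python
def dca(subgrad_h, argmin_g_linear, x0, tol=1e-8, max_iter=500):
    """Minimal DCA sketch for minimizing f = g - h with g, h convex.

    subgrad_h(x)        -- returns some y in the subdifferential of h at x
    argmin_g_linear(y)  -- solves the convex subproblem argmin_x g(x) - y*x
    """
    x = float(x0)
    for _ in range(max_iter):
        y = subgrad_h(x)            # linearize h at the current iterate
        x_new = argmin_g_linear(y)  # minimize the convex majorant
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy DC function f(x) = x^2 - |x| with g(x) = x^2 and h(x) = |x|;
# the subproblem argmin_x x^2 - y*x has the closed-form solution y/2.
sign = lambda t: (t > 0) - (t < 0)
x_star = dca(sign, lambda y: y / 2, x0=0.3)
```

Starting from x0 = 0.3, the iteration reaches the global minimizer x = 1/2 of x² − |x|; in general, DCA is only guaranteed to converge to critical points of the chosen DC decomposition.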
At the other end of the spectrum of developments in DC optimization, we mention
applications to theoretical aspects of semi-infinite and bilevel programming given by
Dinh et al. [109, 110] and to the major stability properties of convex multifunctions
obtained by Mordukhovich and Nghia in [248]; see also Mordukhovich’s book [229,
Chapter 7]. Various branches of mathematics, where DC results and ideas are very
fruitful, were overviewed and further developed by Miroslav Bačák and Jonathan
Borwein [20] and by Libor Veselý and Luděk Zajı́ček [342] who also considered DC
mappings with values in normed spaces.
Subsections 7.8.1–7.8.3 are mainly based on the fundamental paper by Hartman
[152] and its infinite-dimensional extensions given in the aforementioned papers
[20, 342]. We specifically mention the remarkable result of Theorem 7.97 known as
the mixing lemma, which is due to Veselý and Zajı́ček [342, Lemma 4.8].
Subsection 7.8.4 presents major calculus rules for subdifferentials of nonconvex
DC functions in terms of the corresponding constructions for convex functions in
DC compositions. The obtained calculus leads to optimality conditions for local and
global minimizers and to nonconvex duality for DC functions.
The regular subdifferential estimates in (7.117) of Theorem 7.111 follow from
more general results of our paper with Nguyen Dong Yen [246]. The limiting subd-
ifferential estimate (7.118) of that theorem seems to appear here for the first time,
while the one for the generalized gradient in (7.119) immediately follows from the
latter.
The conjugate calculation for DC functions from Theorem 7.112 and its extended-
real-valued version are due to Jean-Baptiste Hiriart-Urruty [163]; see also [123]. The
7.10 Commentaries to Chapter 7 551

latter version immediately implies the nonconvex duality result of Theorem 7.113,
which goes back to John Toland [338] and independently to Ivan Singer [327, 328]
being known as Toland-Singer duality. Among various results related to this duality,
we mention a calculation formula for ε-subdifferentials of DC functions on Banach
spaces obtained by Juan Enrique Martı́nez-Legaz and Alberto Seeger in [214].
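In its simplest form, the Toland–Singer duality mentioned here states that inf over x of g(x) − h(x) equals inf over x∗ of h∗(x∗) − g∗(x∗) for appropriate convex functions g and h. A quick grid-based sanity check for the toy pair g(x) = x² and h(x) = |x| (our choice of example, not taken from the cited works), where both sides equal −1/4:

```python
# Toland-Singer sanity check for g(x) = x^2, h(x) = |x|:
#   inf_x { g(x) - h(x) }  should equal  inf_y { h*(y) - g*(y) }.
# Here g*(y) = y^2/4, while h*(y) = 0 on [-1, 1] and +infinity outside,
# so the dual infimum is effectively taken over [-1, 1] only.
xs = [i * 1e-4 - 3.0 for i in range(60001)]
primal = min(x * x - abs(x) for x in xs)      # attained at x = +/- 1/2
ys = [j * 1e-4 - 1.0 for j in range(20001)]
dual = min(-y * y / 4 for y in ys)            # attained at y = +/- 1
print(primal, dual)   # both approximately -0.25
```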
The necessary optimality condition of Theorem 7.114(a) for local minimizers
in unconstrained DC optimization problems in normed spaces immediately follows
from the subdifferential difference rule of Theorem 7.111(a). We refer the reader to
the paper by Mirjam Dür [115] for an interesting ε-subdifferential extension of this
result in Banach spaces. The ε-subdifferential characterization of global minimizers
of DC functions from Theorem 7.114(b) was established by Hiriart-Urruty in Banach
spaces. The proof of this theorem given in our book can be found in the recent paper
by Burachik, Dao, and Lindstrom [65].
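For the reader's convenience, the characterization of global minimizers referenced in Theorem 7.114(b) can be displayed as follows (stated informally here; see the theorem itself for the precise assumptions on g and h):

```latex
% Hiriart-Urruty's characterization of global minimizers of f = g - h:
\bar{x} \in \operatorname*{argmin}_{x \in X} \bigl( g(x) - h(x) \bigr)
\;\Longleftrightarrow\;
\partial_{\varepsilon} h(\bar{x}) \subseteq \partial_{\varepsilon} g(\bar{x})
\quad \text{for all } \varepsilon > 0 .
```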
Glossary of Notation and Acronyms

OPERATIONS AND SYMBOLS


:= and =: equal by definition
≡ identically equal
≺, ⪯ binary preference/ordering relations
∗ indication of a dual/adjoint/polar operation
⟨·, ·⟩ canonical pairing between space X and its
topological dual X∗
d(x, y) metric in metric spaces
‖ · ‖ and | · | norm and absolute value (for a real number),
respectively
x → x̄ x converges to x̄ strongly (by norm)
x →w∗ x̄ x converges to x̄ weak∗ (in weak∗ topology of
X∗)
lim inf lower limit for real numbers
lim sup upper limit for real numbers
Lim inf inner sequential limit for set-valued mappings
Lim sup outer sequential limit for set-valued mappings
dim X and codim X dimension and codimension of X, respectively
Ω1 × Ω2 Cartesian product of Ω1 and Ω2
haus(Ω1, Ω2) Pompeiu-Hausdorff distance between sets
Ω1 ⊂ Ω2 Ω1 is included in or equal to Ω2
f1 ⊕ f2 infimal convolution of two functions
f1 ∨ f2 pointwise maximum of two functions
tr(A) trace of matrix A
□ end of proof

SPACES
R := (−∞, ∞) real line
R+ := [0, ∞) collection of nonnegative numbers
R̄ := (−∞, ∞] extended real line
C collection of complex numbers
Rn n-dimensional Euclidean space
Rn+ and Rn− nonnegative and nonpositive orthants of Rn
© Springer Nature Switzerland AG 2022 553
B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9
ℓp, 1 ≤ p ≤ ∞ Lebesgue spaces of sequences
Lp(T), 1 ≤ p ≤ ∞ Lebesgue spaces of functions on T
C(T) space of continuous functions on a compact
set T
C1 class of continuously differentiable functions
C2 class of twice continuously differentiable
functions
C1,1 subclass of C1 with Lipschitzian derivatives
X/L quotient space
X ⊕ Y direct sum of spaces
X′ algebraic dual space of X

SETS
∅ empty set
N set of natural numbers
Ωc complement of Ω ⊂ X in X
Ω◦ polar set of Ω
Ωr r-expansion of Ω
x →Ω x̄ x converges to x̄ with x ∈ Ω
B(x; r) = Br(x) open ball centered at x with radius r
B̄(x; r) = B̄r(x) closed ball centered at x with radius r
B and B∗ closed unit balls of space and dual space in
question
S and S∗ unit spheres of space and dual space
in question
int(Ω) and ri(Ω) interior and relative interior of Ω
qri(Ω) and iri(Ω) quasi-relative and intrinsic relative interiors
of a convex set Ω
core(Ω) algebraic interior/core of a convex set Ω
lin(Ω) linear closure of Ω
Ω̄ = cl(Ω) closure of Ω
cl∗(Ω) = co∗(Ω) weak∗ topological closure of Ω
clF (Ω) closure of Ω relative to F
bd(Ω) set boundary
co(Ω) and clco(Ω) convex hull and closed convex hull of Ω
cone(Ω) = R+ Ω conic hull of Ω
KΩ convex cone generated by Ω
Ω∞ (x) horizon/asymptotic cone of set Ω at x
aff(Ω) and cl aff(Ω) affine hull and closed affine hull of Ω
proj x Ω = proj X Ω x-projection of sets in product spaces
Π(x; Ω) = ΠΩ(x) projector of x into Ω
N(x; Ω) normal cone of convex analysis to Ω at x
when Ω is convex
N̂ε(x; Ω) set of ε-normals to Ω at x
N(x; Ω) (Mordukhovich) limiting normal cone to Ω
at x
N̂(x; Ω) (Fréchet) regular normal cone to Ω at x
N̄(x; Ω) (Clarke) convexified normal cone to Ω at x
T(x; Ω) tangent/contingent cone to Ω at x

FUNCTIONS
δ(·; Ω) = δΩ (·) indicator function of Ω
ξΩ (·) characteristic function of Ω
σ(·; Ω) = σΩ (·) support function of Ω
dist(·; Ω) = d(·; Ω) = dΩ(·) distance function for Ω
d̃(·; Ω) = d̃Ω(·) signed distance function for Ω
pΩ Minkowski gauge function associated with
set Ω
TΩF = TF(·; Ω) minimal time function associated with
dynamic F and target Ω
T̃ΩF = T̃F(·; Ω) signed minimal time function associated with
dynamic F and target Ω
dom(f ) domain of f : X → R
epi(f), hypo(f), gph(f) epigraph, hypograph, graph of f
cont(f ) set of points where function f is continuous
f ∗ and f ∗∗ Fenchel conjugate and biconjugate of f
f∗ concave conjugate of f
Pf perspective function associated with f
f∞ horizon function associated with f
x →f x̄ x → x̄ with f(x) → f(x̄)
f′+(x) and f′−(x) right and left derivatives of f: R → R at x
f′(x) = f′F(x) = ∇f(x) (Fréchet) derivative/gradient of f at x
f′G(x) Gâteaux derivative of f at x
f′H(x) Hadamard derivative of f at x
f′(x; v) directional derivative of f at x in direction v
df(x; v) (Dini, Dini-Hadamard) contingent derivative/
subderivative of f at x in direction v
f◦(x; v) (Clarke) generalized directional derivative
f↑(x; v) (Rockafellar upper) directional derivative of
f at x in direction v
∂f(x) subdifferential of convex function f at x
∂x f(x, y) partial subdifferential of f(x, y) with respect
to x at (x, y)
∂ε f(x) ε-subdifferential/approximate subdifferential
of convex function f at x
∂−f(x) (Dini) contingent subdifferential
of f at x
∂̄f(x) (Clarke) generalized gradient of f at x
∂̂f(x) (Fréchet) regular subdifferential of f at x
∂f(x) (Mordukhovich) limiting subdifferential of
f at x
∂∞f(x) singular/horizon subdifferential of f at x
SET-VALUED MAPPINGS
F: X ⇉ Y set-valued mapping/multifunction from
X to Y
dom(F) domain of F
rge(F) range of F
gph(F) graph of F
ker(F) kernel of F
‖F‖ norm of a positively homogeneous mapping
F−1: Y ⇉ X inverse mapping of F: X ⇉ Y
F(Ω) and F−1(Ω) image and inverse image/preimage of Ω
under F
F ◦ G composition of mappings
Ef epigraphical multifunction associated with
function f
D∗F(x, y) coderivative of F at (x, y) ∈ gph(F)

ACRONYMS
CEL compactly epi-Lipschitzian (sets)
DC difference of convex functions
TVS topological vector space
LCTV locally convex topological vector (spaces)
l.s.c. lower semicontinuous (functions)
u.s.c. upper semicontinuous (functions)
SNC sequentially normally compact
List of Figures

Fig. 1.1 A balanced set


Fig. 1.2 A symmetric set that is not balanced
Fig. 1.3 An absorbing set
Fig. 1.4 Proof of the surjectivity of π ∗
Fig. 1.5 A property of nowhere dense sets
Fig. 2.1 Convex and nonconvex sets
Fig. 2.2 An example of a set-valued mapping
Fig. 2.3 Illustration of Proposition 2.6
Fig. 2.4 Illustration of Lemma 2.12
Fig. 2.5 An affine set
Fig. 2.6 The affine hull of a set
Fig. 2.7 Convex separation
Fig. 2.8 Relative interior
Fig. 2.9 Separation in a linear subspace
Fig. 2.10 Illustration of the proof of Lemma 2.89
Fig. 2.11 Illustration of extreme points
Fig. 2.12 Convex and nonconvex functions
Fig. 2.13 Epigraphs of convex and nonconvex functions
Fig. 2.14 The distance function
Fig. 2.15 A quasiconvex function
Fig. 2.16 Lemma 2.114
Fig. 2.17 Example 2.136
Fig. 2.18 Illustration of Lemma 2.150
Fig. 2.19 A lower semicontinuous function
Fig. 3.1 The normal cone
Fig. 3.2 Extremal systems
Fig. 3.3 Theorem 3.10

Fig. 3.4 Proof of Theorem 3.10


Fig. 3.5 Relative interior condition
Fig. 3.6 Definition 3.28
Fig. 3.7 Proposition 3.30
Fig. 3.8 Theorem 3.59
Fig. 3.9 Theorem 3.80
Fig. 3.10 Q, P , D, H0 , M , and Θ
Fig. 4.1 Definition 4.1
Fig. 5.1 Subdifferential mean value theorem
Fig. 6.1 Nesterov’s smoothing for f (x) = |x|. Case (a)
Fig. 6.2 Nesterov’s smoothing for f (x) = |x|. Case (b)
Fig. 6.3 Convex cone generated by a set
Fig. 6.4 Carathéodory’s theorem
Fig. 6.5 Duality between tangent and normal cones
Fig. 7.1 The functions gε , , and h
References

[1] S. Adly, E. Ernst, M. Théra, On the closedness of the algebraic difference of
closed convex sets. J. Math. Pures Appl. 82, 1219–1249 (2003)
[2] L. Alaoglu, Weak topologies of normed linear spaces. Ann. Math. 41, 252–267
(1940)
[3] A.D. Alexandrov, On surfaces represented by the difference of convex func-
tions. Izv. Akad. Nauk Kaz. SSR, Ser. Math. Mekh. 60, 3–20 (1949)
[4] A.D. Alexandrov, Surfaces represented by the difference of convex functions.
Dokl. Akad. Nauk SSSR 72, 613–616 (1950); English trans. Siberian Electronic
Mathematical Reports 9, 360–376 (2012)
[5] P. Alexandroff (Alexandrov), H. Hopf, Topologie, I (Springer, Berlin, 1935)
[6] D.T.V. An, N.D. Yen, Differential stability of convex optimization problems
under inclusion constraints. Applic. Anal. 94, 108–128 (2015)
[7] F.J. Aragón Artacho, P.T. Vuong, The boosted difference of convex functions
algorithm for nonsmooth functions. SIAM J. Optim. 30, 980–1006 (2020)
[8] K.J. Arrow, An extension of the basic theorems of classical welfare economics,
in Proceedings of the Second Berkeley Symposium on Mathematical Statistics
and Probability (University of California Press, Berkeley, CA, 1951), pp. 507–
532
[9] E. Asplund, Fréchet differentiability of convex functions. Acta Math. 121,
31–47 (1968)
[10] H. Attouch, H. Brézis, Duality of the sum of convex functions in general
Banach spaces, in Aspects of Mathematics and Its Applications, vol. 34, ed.
by J.A. Barroso (North-Holland, Amsterdam, 1986), pp. 125–133
[11] J.-P. Aubin, Contingent derivatives of set-valued maps and existence of solu-
tions to nonlinear inclusions and differential inclusions, in Mathematical Anal-
ysis and Applications, ed. by L. Nachbin (Academic Press, New York, 1981),
pp. 159–229
[12] J.-P. Aubin, Optima and Equilibria: An Introduction to Nonlinear Analysis,
2nd edn. (Springer, NY, 1998)
[13] J.-P. Aubin, I. Ekeland, Applied Nonlinear Analysis (Wiley, New York, 1984)
[14] J.-P. Aubin, H. Frankowska, Set-Valued Analysis (Birkhäuser, Boston, MA,
1990)

[15] R.J. Aumann, Integrals of set-valued functions. J. Math. Anal. Appl. 12, 1–12
(1965)
[16] A. Auslender, Differential stability in nonconvex and nondifferentiable pro-
gramming. Math. Program. Stud. 10, 29–41 (1979)
[17] A. Auslender, M. Teboulle, Asymptotic Cones and Functions in Optimization
and Variational Inequalities (Springer, New York, 2003)
[18] S. Axler, Measure, Integration & Real Analysis, Graduate Texts in Mathemat-
ics (Springer, 2020)
[19] D. Azé, J.-P. Penot, Uniformly convex and uniformly smooth convex functions.
Ann. Facul. Sci. Toulouse, Sér. 6, 4, 705–730 (1995)
[20] M. Bačák, J.M. Borwein, On difference convexity of locally Lipschitz functions.
Optimization 60, 961–978 (2011)
[21] R. Baier, E. Farkhi, V. Roshchina, On computing the Mordukhovich subd-
ifferential using directed sets of two dimensions, in Variational Analysis and
Generalized Differentiation in Optimization and Control, ed. by R.S. Burachik,
J.-C. Yao (Springer, New York, 2010), pp. 59–94
[22] R. Baire, Sur les fonctions de variables réelles. Ann. Math. 3, 1–123 (1899)
[23] A. Bajaj, B.S. Mordukhovich, N.M. Nam, T. Tran, Solving a continuous mul-
tifacility location problem by DC algorithms. Optim. Meth. Softw. (2020).
https://doi.org/10.1080/10556788.2020.1771335
[24] S. Banach, Sur les opérations dans les ensembles abstraits et leur application
aux équations intégrales. Fund. Math. 3, 133–181 (1922)
[25] S. Banach, Sur les fonctionnelles linéaires, I, II. Stud. Math. 1, 211–216, 223–
229 (1929)
[26] S. Banach, Théorie des Opérations Linéaires (Monografje Matematyczne, I,
Warszawa, 1932)
[27] S. Banach, S. Mazur, Zur Theorie der linearen Dimension. Stud. Math. 4,
100–112 (1933)
[28] S. Banach, H. Steinhaus, Sur le principe de la condensation de singularités.
Fund. Math. 9, 50–61 (1927)
[29] T.Q. Bao, B.S. Mordukhovich, Relative Pareto minimizers for multiobjective
problems: existence and optimality conditions. Math. Program. 122, 301–347
(2010)
[30] T.Q. Bao, B.S. Mordukhovich, A. Soubeyran, Variational analysis in psycho-
logical modeling. J. Optim. Theory Appl. 164, 290–315 (2015)
[31] I. Bárány, A generalization of Carathéodory’s theorem. Disc. Math. 40, 141–152
(1982)
[32] M. Bardi, A boundary value problem for the minimal-time function. SIAM J.
Control Optim. 27, 776–785 (1989)
[33] H.H. Bauschke, J.M. Borwein, W. Li, Strong conical hull intersection property,
bounded linear regularity, Jameson’s property (G), and error bounds in convex
optimization. Math. Program. 86, 135–160 (1999)
[34] H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator
Theory in Hilbert Spaces, 2nd edn. (Springer, New York, 2017)
[35] M.S. Bazaraa, J.J. Goode, M.Z. Nashed, On the cones of tangents with appli-
cations to mathematical programming. J. Optim. Theory Appl. 13, 389–426
(1974)
[36] D.P. Bertsekas, S.K. Mitter, A descent numerical method for optimization prob-
lems with nondifferentiable cost functionals. SIAM J. Control 11, 637–652
(1973)
[37] D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization
(Athena Scientific, Boston, MA, 2003)
[38] G. Birkhoff, E. Kreyszig, The establishment of functional analysis. Historia
Math. 11, 258–321 (1984)
[39] E. Bishop, R.R. Phelps, A proof that every Banach space is subreflexive. Bull.
Amer. Math. Soc. 67, 97–98 (1961)
[40] N.N. Bogolyubov, Sur quelques méthodes nouvelles dans le calcul des variations.
Ann. Mat. Pura Appl. 7, 249–271 (1929)
[41] V.G. Boltyanskii, The maximum principle in the theory of optimal processes.
Dokl. Akad. Nauk SSSR 119, 1070–1073 (1958)
[42] J.F. Bonnans, A. Shapiro, Perturbation Analysis of Optimization Problems
(Springer, New York, 2000)
[43] T. Bonnesen, W. Fenchel, Theorie der konvexen Körper (Springer, Berlin,
1934)
[44] J.M. Borwein, Minimal cuscos and subgradients of Lipschitz functions, in Fixed
Point Theory and Its Applications, ed. by J.-B. Baillon, M. Théra (Longman,
Essex, UK, 1991), pp. 57–82
[45] J.M. Borwein, Maximal monotonicity via convex analysis. J. Convex Anal.
13, 561–586 (2006)
[46] J.M. Borwein, R. Goebel, Notions of relative interior in Banach spaces. J.
Math. Sci. 115, 2542–2553 (2003)
[47] J.M. Borwein, A.S. Lewis, Partially finite convex programming, Part I: quasi-
relative interiors and duality theory. Math. Program. 57, 15–48 (1992)
[48] J.M. Borwein, A.S. Lewis, Convex Analysis and Nonlinear Optimization, 2nd
edn. (Springer, New York, 2006)
[49] J.M. Borwein, Y. Lucet, B.S. Mordukhovich, Compactly epi-Lipschitzian convex
sets and functions in normed spaces. J. Convex Anal. 7, 375–393 (2000)
[50] J.M. Borwein, D. Preiss, A smooth variational principle with applications
to subdifferentiability and differentiability of convex functions. Trans. Amer.
Math. Soc. 303, 517–527 (1987)
[51] J.M. Borwein, H.M. Strójwas, Tangential approximations. Nonlinear Anal. 9,
1347–1366 (1985)
[52] J.M. Borwein, J. Vanderwerff, Differentiability of conjugate functions and per-
turbed minimization principles. J. Convex Anal. 16, 707–711 (2009)
[53] J.M. Borwein, Q.J. Zhu, A survey of subdifferential calculus with applications.
Nonlinear Anal. 38, 687–773 (1999)
[54] J.M. Borwein, Q.J. Zhu, Techniques of Variational Analysis (Springer, New
York, 2005)
[55] R.I. Boţ, Conjugate Duality in Convex Optimization (Springer, Berlin, 2010)
[56] R.I. Boţ, E.R. Csecnet, G. Wanka, Regularity condition via quasi-relative
interior in convex programming. SIAM J. Optim. 19, 217–233 (2008)
[57] R.I. Boţ, G. Wanka, The conjugate of the pointwise maximum of two convex
functions revisited. J. Global Optim. 41, 625–632 (2008)
[58] T. Botts, On convex sets in linear normed spaces. Bull. Amer. Math. Soc. 48,
150–152 (1942)
[59] M. Bounkhel, L. Thibault, On various notions of regularity of sets in nons-
mooth analysis. Nonlinear Anal. 48, 223–246 (2002)
[60] N. Bourbaki, Elements of Mathematics: General Topology (Addison-Wesley,
Boston, MA, 1966)
[61] N. Bourbaki, Topological Vector Spaces (Springer, Berlin, 1987)
[62] S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press,
Cambridge, UK, 2004)
[63] A. Brøndsted, Conjugate convex functions in topological vector spaces. Math.
Fys. Medd. Danske Vid. Selsk. 34, 1–26 (1964)
[64] A. Brøndsted, R.T. Rockafellar, On the subdifferentiability of convex functions.
Proc. Amer. Math. Soc. 16, 605–611 (1965)
[65] R.S. Burachik, M.N. Dao, S.B. Lindstrom, Generalized Bregman envelopes
and proximity operators. J. Optim Theory Appl. 190, 744–778 (2021)
[66] R.S. Burachik, A.N. Iusem, Set-Valued Mappings and Enlargements of Mono-
tone Operators (Springer, New York, 2008)
[67] R.S. Burachik, V. Jeyakumar, A dual condition for the convex subdifferential
sum formula with applications. J. Convex Anal. 12, 279–290 (2005)
[68] J.M. Burke, X. Chen, H. Sun, The subdifferential of measurable composite
max integrands and smoothing approximation. Math. Program. 181, 229–264
(2020)
[69] J.V. Burke, M.L. Overton, Variational analysis of non-Lipschitz spectral func-
tions. Math. Program. 90, 317–352 (2001)
[70] C. Carathéodory, Über den Variabilitätsbereich der Koeffizienten von Poten-
zreihen, die gegebene Werte nicht annehmen. Math. Ann. 64, 95–115 (1907)
[71] C. Castaing, M. Valadier, Convex Analysis and Measurable Multifunctions
(Springer, Berlin, 1977)
[72] N.H. Chieu, The Fréchet and limiting subdifferentials of integral functionals
on the spaces L1 (Ω, E). J. Math. Anal. Appl. 360, 704–710 (2009)
[73] N.H. Chieu, Limiting subdifferentials of indefinite integrals. J. Math. Anal.
Appl. 341, 247–258 (2008)
[74] F.H. Clarke, Generalized gradients and applications. Trans. Amer. Math. Soc.
205, 247–262 (1975)
[75] F.H. Clarke, Generalized gradients of Lipschitz functionals. Adv. Math. 40,
52–67 (1981)
[76] F.H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983)
[77] F.H. Clarke, Yu.S. Ledyaev, R.J. Stern, P.R. Wolenski, Nonsmooth Analysis
and Control Theory (Springer, New York, 1998)
[78] G. Colombo, V.V. Goncharov, B.S. Mordukhovich, Well-posedness of minimal
time problems with constant dynamics in Banach spaces. Set-Valued Var.
Anal. 18, 349–372 (2010)
[79] G. Colombo, P.R. Wolenski, The subgradient formula for the minimal time
function in the case of constant dynamics in Hilbert space. J. Global Optim.
28, 269–282 (2004)
[80] G. Colombo, P.R. Wolenski, Variational analysis for a class of minimal time
functions in Hilbert spaces. J. Convex Anal. 11, 335–364 (2004)
[81] P.L. Combettes, Perspective functions: properties, constructions, and exam-
ples. Set-Valued Var. Anal. 26, 247–264 (2018)
[82] R. Correa, A. Hantoute, M.A. López, Subdifferential of the supremum func-
tion: moving back and forth between continuous and non-continuous settings.
Math. Program. (2020). https://doi.org/10.1007/s10107-020-01592-0
[83] R. Correa, A. Hantoute, P. Pérez-Aros, Characterizations of the subdifferential
of convex integral functions under qualification conditions. J. Funct. Anal.
277, 227–254 (2019)
[84] R. Correa, A. Hantoute, P. Pérez-Aros, Subdifferential calculus rules for possi-
bly nonconvex integral functions. SIAM J. Control Optim. 58, 462–484 (2020)
[85] R. Correa, A. Jofré, L. Thibault, Subdifferential monotonicity as characteri-
zation of convex functions. Numer. Funct. Anal. Optim. 15, 1167–1183 (1994)
[86] M.G. Crandall, P.-L. Lions, Viscosity solutions of Hamilton-Jacobi equations.
Trans. Amer. Math. Soc. 277, 1–42 (1983)
[87] D.V. Cuong, B.S. Mordukhovich, N.M. Nam, Quasi-relative interiors for
graphs of convex set-valued mappings. Optim. Lett. 15, 933–952 (2021)
[88] D.V. Cuong, B.S. Mordukhovich, N.M. Nam, Extremal systems of convex sets
with applications to convex calculus in vector spaces, to appear in Pure Appl.
Func. Anal. (2021), arXiv:2003.12899
[89] D.V. Cuong, B.S. Mordukhovich, N.M. Nam, A. Cartwell, Algebraic core and
convex calculus without topology. Optimization (2020). https://doi.org/10.
1080/02331934.2020.1800700
[90] D.V. Cuong, B.S. Mordukhovich, N.M. Nam, G. Sandine, Fenchel-Rockafellar
theorem in infinite dimensions via generalized relative interior (2021),
arxiv:2104.13510
[91] D.V. Cuong, B.S. Mordukhovich, N.M. Nam, G. Sandine, Generalized differ-
entiation and duality in infinite dimensions under polyhedral convexity (2021),
arXiv:2106.15777
[92] D.V. Cuong, B.S. Mordukhovich, N.M. Nam, M. Wells, Convex analysis
of minimal time and signed minimal time functions. Optimization (2021).
https://doi.org/10.1080/02331934.2021.1910695
[93] D.V. Cuong, N.M. Nam, Generalized differentiation and characterizations for
differentiability of infimal convolutions. Set-Valued Var. Anal. 23, 333–353
(2015)
[94] P. Daniele, S. Giuffré, G. Idone, A. Maugeri, Infinite dimensional duality and
applications. Math. Ann. 339, 221–239 (2007)
[95] J.M. Danskin, The Theory of Max-Min and Its Application to Weapons Allocation
Problems (Springer, New York, 1967)
[96] G. Debreu, The coefficient of resource utilization. Econometrica 19, 273–292
(1951)
[97] V.F. Demyanov, V.N. Malozemov, On the theory of nonlinear minimax prob-
lems. Russ. Math. Surv. 26, 57–115 (1971)
[98] S. Dempe, J. Dutta, B.S. Mordukhovich, New necessary optimality conditions
in optimistic bilevel programming. Optimization 56, 577–604 (2007)
[99] S. Dempe, B.S. Mordukhovich, A.B. Zemkoho, Sensitivity analysis for two-
level value functions with applications to bilevel programming. SIAM J.
Optim. 22, 1309–1343 (2012)
[100] S. Dempe, B.S. Mordukhovich, A.B. Zemkoho, Two-level value function
approach to optimistic and pessimistic bilevel programs. Optimization 68,
433–455 (2019)
[101] V.F. Demyanov, A.M. Rubinov, On quasidifferentiable functions. Soviet Math.
Dokl. 21, 14–17 (1980)
[102] V.F. Demyanov, A.M. Rubinov, Constructive Nonsmooth Analysis (Peter
Lang, Frankfurt, Germany, 1995)
[103] V.F. Demyanov, A.M. Rubinov (eds.), Quasidifferentiability and Related Top-
ics (Kluwer, Dordrecht, The Netherlands, 2000)
[104] D. Dentcheva, A. Ruszczyński, Subregular recourse in nonlinear multistage
stochastic optimization. Math. Program. (2021). https://doi.org/10.1007/
s10107-020-01612-z
[105] R. Deville, G. Godefroy, V. Zizler, Smoothness and Renorming in Banach
Spaces (Wiley, New York, 1993)
[106] J. Diestel, J.J. Uhl Jr., Vector Measures (American Mathematical Society,
Providence, RI, 1977)
[107] J. Dieudonné, History of Functional Analysis (North-Holland, Amsterdam,
1981)
[108] N. Dinh, V. Jeyakumar, Farkas’ lemma: three decades of generalizations for
mathematical optimization. TOP 22, 1–22 (2014)
[109] N. Dinh, B.S. Mordukhovich, T.T.A. Nghia, Qualification and optimality con-
ditions for convex and DC programs with infinite constraints. Acta Math.
Vietnamica 34, 123–153 (2009)
[110] N. Dinh, B.S. Mordukhovich, T.T.A. Nghia, Subdifferentials of value functions
and optimality conditions for some classes of DC and bilevel infinite and semi-
infinite programs. Math. Program. 123, 101–138 (2010)
[111] U. Dini, Fondamenti per la Teoria delle Funzioni di Variabili Reali (Pisa, Italy,
1878)
[112] Z. Drezner, H.W. Hamacher (eds.), Facility Location: Applications and Theory
(Springer, Berlin, 2004)
[113] A.Y. Dubovitskii, A.A. Milyutin, Extremum problems in the presence of
restrictions. USSR Comput. Maths. Math. Phys. 5, 1–80 (1965)
[114] N. Dunford, J.T. Schwartz, Linear Operators, I (Interscience, New York, 1964)
[115] M. Dür, A parametric characterization of local optimality. Math. Meth. Oper.
Res. 57, 101–109 (2003)
[116] M. Durea, J. Dutta, C. Tammer, Lagrange multipliers and ε-Pareto solutions
in vector optimization with nonsolid cones in Banach spaces. J. Optim. Theory
Appl. 145, 196–211 (2010)
[117] M. Eidelheit, Zur Theorie der konvexen Mengen in linearen normierten
Räumen. Stud. Math. 6, 104–111 (1936)
[118] I. Ekeland, Sur les problèmes variationnels. C. R. Acad. Sci. Paris 275, 1057–
1059 (1972); 276, 1347–1348 (1973)
[119] I. Ekeland, On the variational principle. J. Math. Anal. Appl. 47, 324–353
(1974)
[120] I. Ekeland, Nonconvex minimization problems. Bull. Amer. Math. Soc. 1, 432–
467 (1979)
[121] I. Ekeland, G. Lebourg, Generic Fréchet-differentiability and perturbed opti-
mization problems in Banach spaces. Trans. Amer. Math. Soc. 224, 193–216
(1976)
[122] I. Ekeland, R. Temam, Convex Analysis and Variational Problems (SIAM,
Philadelphia, PA, 1999)
[123] R. Ellaia, J.-B. Hiriart-Urruty, The conjugate of the difference of convex func-
tions. J. Optim. Theory Appl. 49, 493–498 (1986)
[124] E. Ernst, M. Théra, On the necessity of the Moreau-Rockafellar-Robinson
qualification condition in Banach spaces. Math. Program. 117, 149–161 (2009)
[125] M. Fabian, On minimum principles. Acta Polytech. 20, 109–118 (1983)
[126] M. Fabian, Gâteaux Differentiability of Convex Functions and Topology. Weak
Asplund Spaces (Wiley, New York, 1997)
[127] M. Fabian, B.S. Mordukhovich, Sequential normal compactness versus topo-
logical normal compactness in variational analysis. Nonlinear Anal. 54, 1057–
1067 (2003)
[128] F. Facchinei, J.-S. Pang, Finite-Dimensional Variational Inequalities and
Complementarity Problems, published in two volumes (Springer, New York,
2003)
[129] J. (Gyula) Farkas, Theorie der einfachen Ungleichungen. J. Reine Angew.
Math. 124, 1–27 (1902)
[130] W. Fenchel, On conjugate convex functions. Canad. J. Math. 1, 73–77 (1949)
[131] W. Fenchel, Convex Cones, Sets and Functions, Mimeographed Lecture Notes
(Princeton University, Princeton, NJ, 1951)
[132] W. Fenchel, Convexity through the ages, in Convexity and Its Applications,
ed. by P.M. Gruber, J.M. Wills (Basel, Birkhäuser, 1983), pp. 120–130
[133] R.A. Fisher, Theory of statistical estimation. Proc. Cambridge. Philos. Soc.
22, 700–725 (1925)
[134] F. Flores-Bazán, G. Mastroeni, Strong duality in cone constrained nonconvex
optimization. SIAM J. Optim. 23, 153–169 (2013)
[135] M. Fréchet, Sur quelques points du calcul fonctionnel. Rend. Circ. Matem.
Palermo 22, 1–72 (1906)
[136] R.V. Gamkrelidze, On the theory of optimal processes in linear systems. Dokl.
Akad. Nauk SSSR 116, 9–11 (1957)
[137] R.V. Gamkrelidze, On sliding optimal regimes. Soviet Math. Dokl. 3, 559–561
(1962)
[138] R. Gâteaux, Sur les fonctionnelles continues et les fonctionnelles analytiques.
C. R. Acad. Sci. Paris 157, 325–327 (1913)
[139] J. Gauvin, The generalized gradient of a marginal function in mathematical
programming. Math. Oper. Res. 4, 458–463 (1979)
[140] W. Geremew, N.M. Nam, A. Semenov, V. Boginski, E. Pasiliao, A DC pro-
gramming approach for solving multicast network design problems via the
Nesterov smoothing technique. J. Global Optim. 72, 705–729 (2018)
[141] M. Gieraltowska-Kedzierska, F.S. Van Vleck, Fréchet vs. Gâteaux differentia-
bility of Lipschitzian functions. Proc. Amer. Math. Soc. 114, 905–907 (1992)
[142] E. Giner, J.-P. Penot, Subdifferentiation of integral functionals. Math. Pro-
gram. 168, 401–431 (2018)
[143] M.S. Gowda, M. Teboulle, A comparison of constraint qualifications in infinite-
dimensional convex programming. SIAM J. Control Optim. 28, 925–935
(1990)
[144] M.-S. Grad, Vector Optimization and Monotone Operators via Convex Duality
(Springer, Cham, Switzerland, 2015)
[145] L.M. Graves, Some mapping theorems. Duke Math. J. 17, 111–114 (1950)
[146] A. Greenbaum, A.S. Lewis, M.L. Overton, Variational analysis of the Crouzeix
ratio. Math. Program. 164, 229–243 (2017)
[147] S. Grundel, M.L. Overton, Variational analysis of the spectral abscissa at a
matrix with a nongeneric multiple eigenvalue. Set-Valued Var. Anal. 22, 19–43
(2014)
[148] N. Hadjisavvas, S. Schaible, Quasimonotone variational inequalities in Banach
spaces. J. Optim. Theory Appl. 90, 95–111 (1996)
[149] H. Hahn, Über lineare Gleichungssysteme in linearen Räumen. J. Reine Angew.
Math. 157, 214–229 (1927)
[150] A. Hantoute, R. Henrion, P. Pérez-Aros, Subdifferential characterization of
probability functions under Gaussian distribution. Math. Program. 174, 167–
194 (2019)
[151] A. Hantoute, M.A. López, C. Zălinescu, Subdifferential calculus rules in convex
analysis: a unifying approach via pointwise supremum functions. SIAM J.
Optim. 19, 863–882 (2008)
[152] P. Hartman, On functions representable as a difference of convex functions.
Pacific J. Math. 9, 707–713 (1959)
[153] F. Hausdorff, Grundzüge der Mengenlehre (Veit, Leipzig, 1914)
[154] R. Haydon, A counterexample in several questions about scattered compact
spaces. Bull. London Math. Soc. 22, 261–268 (1990)
[155] Y. He, K.F. Ng, Subdifferentials of a minimal time function in Banach spaces.
J. Math. Anal. Appl. 321, 896–910 (2006)
[156] E. Helly, Über lineare Funktionaloperationen. Wien. Ber. 121, 265–297 (1912)
[157] E. Helly, Über Mengen konvexer Körper mit gemeinschaftlichen Punkten.
Jahresbericht der Deutschen Mathematiker-Vereinigung 32, 175–176 (1923)
[158] C. Hess, Set-valued integration and set-valued probability theory: an overview,
in Handbook of Measure Theory, ed. by E. Pap, Chapter 14 (North Hol-
land/Elsevier, Amsterdam, 2002)
[159] J.-B. Hiriart-Urruty, Gradients généralisés de fonctions marginales. SIAM J.
Control Optim. 16, 301–316 (1978)
[160] J.-B. Hiriart-Urruty, New concepts in nondifferentiable programming. Bull.
Soc. Math. France 60, 5–85 (1979)
[161] J.-B. Hiriart-Urruty, Lipschitz r-continuity of the approximate subdifferential
of a convex function. Math. Scand. 47, 123–134 (1980)
[162] J.-B. Hiriart-Urruty, ε-Subdifferential, in Convex Analysis and Optimization,
ed. by J.-P. Aubin, R.B. Vinter (Pitman, London, 1982), pp. 43–92
[163] J.-B. Hiriart-Urruty, A general formula on the conjugate of the difference of
functions. Canad. Math. Bull. 29, 482–485 (1986)
[164] J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algo-
rithms I, II (Springer, Berlin, 1993)
[165] J.-B. Hiriart-Urruty, C. Lemaréchal, Fundamentals of Convex Analysis
(Springer, Berlin, 2001)
[166] J.-B. Hiriart-Urruty, R.R. Phelps, Subdifferential calculus using ε-
subdifferentials. J. Funct. Anal. 118, 154–166 (1993)
[167] R.B. Holmes, Geometric Functional Analysis and Its Applications (Springer,
New York, 1975)
[168] A.D. Ioffe, Sous-différentielles approchées de fonctions numériques. C. R.
Acad. Sci. Paris 292, 675–678 (1981)
[169] A.D. Ioffe, Proximal analysis and approximate subdifferentials. J. London
Math. Soc. 41, 175–192 (1990)
[170] A.D. Ioffe, Codirectional compactness, metric regularity and subdifferential
calculus, in Constructive, Experimental and Nonlinear Analysis, ed. by M.
Théra, Canad. Math. Soc. Conf. Proc. 27 (American Mathematical Society,
Providence, RI, 2000), pp. 123–164
[171] A.D. Ioffe, Variational Analysis of Regular Mappings: Theory and Applications
(Springer, Cham, Switzerland, 2017)
[172] A.D. Ioffe, V.L. Levin, Subdifferentials of convex functions. Trans. Moscow
Math. Soc. 26, 1–72 (1972)
[173] A.D. Ioffe, V.M. Tikhomirov, On minimization of integral functionals. Funct.
Anal. Appl. 3, 218–227 (1969)
[174] A.D. Ioffe, V.M. Tikhomirov, Theory of Extremal Problems (North-Holland,
Amsterdam, 1979)
[175] J.L.W.V. Jensen, Sur les fonctions convexes et les inégalités entre les valeurs
moyennes. Acta Math. 30, 175–193 (1906)
[176] A. Jourani, L. Thibault, Metric regularity and subdifferential calculus in
Banach spaces. Set-Valued Anal. 3, 87–100 (1996)
[177] A. Jourani, L. Thibault, Qualification conditions for calculus rules of coderiva-
tives of multivalued mappings. J. Math. Anal. Appl. 218, 66–81 (1998)
[178] A. Jourani, L. Thibault, Noncoincidence of approximate and limiting sub-
differentials of integral functionals. SIAM J. Control Optim. 49, 1435–1453
(2011)
[179] W. Karush, Minima of Functions of Several Variables with Inequalities as
Side Constraints, M. Sc. Thesis, Department of Mathematics, University
of Chicago, Chicago, Illinois (1939) (available at http://pi.lib.uchicago.edu/1001/cat/bib/4111654)
[180] J. Kelley, General Topology (Van Nostrand, Princeton, NJ, 1955)
[181] A.A. Khan, C. Tammer, C. Zălinescu, Set-Valued Optimization: An Introduc-
tion with Applications (Springer, Berlin, 2015)
[182] K.C. Kiwiel, Methods of Descent for Nondifferentiable Optimization (Springer,
Berlin, 1985)
[183] V.L. Klee Jr., Convex sets in linear spaces. Duke Math. J. 18, 443–466 (1951)
[184] A. Kolmogoroff (Kolmogorov), Zur Normierbarkeit eines allgemeinen topolo-
gischen linearen Raumes. Stud. Math. 5, 29–33 (1934)
[185] M. Krein, D. Milman, On extreme points of regular convex sets. Stud. Math.
9, 133–138 (1940)
[186] A.Y. Kruger, Epsilon-semidifferentials and epsilon-normal elements, Depon.
VINITI #1331-81, Moscow (1981)
[187] A.Y. Kruger, Generalized differentials of nonsmooth functions and necessary
conditions for an extremum. Siberian Math. J. 26, 370–379 (1985)
[188] A.Y. Kruger, Properties of generalized differentials. Siberian Math. J. 26,
822–832 (1985)
[189] A.Y. Kruger, On Fréchet subdifferentials. J. Math. Sci. 116, 3325–3358 (2003)
[190] A.Y. Kruger, B.S. Mordukhovich, Generalized normals and derivatives, and
necessary optimality conditions in nondifferential programming, Parts I and
II, Depon. VINITI: I#408-80, II# 494-80, Moscow (1980)
[191] A.Y. Kruger, B.S. Mordukhovich, Extremal points and the Euler equation in
nonsmooth optimization. Dokl. Akad. Nauk BSSR 24, 684–687 (1980)
[192] H.W. Kuhn, A note on the Fermat-Torricelli problem. Math. Program. 54, 98–107
(1973)
[193] H.W. Kuhn, A.W. Tucker, Nonlinear programming, in Proceedings of the Second
Berkeley Symposium on Mathematical Statistics and Probability (University of
California Press, Berkeley, CA, 1951), pp. 481–492
[194] Y.S. Kupitz, H. Martini, M. Spirova, The Fermat-Torricelli problem, Part I: a
discrete gradient-method approach. J. Optim. Theory Appl. 158, 305–327 (2013)
[195] K. Kuratowski, Une méthode d’élimination des nombres transfinis des raison-
nements mathématiques. Fund. Math. 3, 76–108 (1922)
[196] K. Kuratowski, Topology, I, II (Academic Press, New York, 1966, 1968)
[197] A.G. Kusraev, S.S. Kutateladze, Subdifferentials: Theory and Applications
(Kluwer, Dordrecht, The Netherlands, 1995)
[198] S.R. Lay, Convex Sets and Their Applications (Wiley, New York, 1982)
[199] E.B. Leach, A note on inverse function theorem. Proc. Amer. Math. Soc. 12,
694–697 (1961)
[200] G. Lebourg, Valeur moyenne pour gradient généralisé. C. R. Acad. Sci. Paris
281, 795–798 (1975)
[201] B. Lemaire, Applications of a subdifferential of a convex composite func-
tional to optimal control in variational inequalities, in Nondifferentiable Opti-
mization: Motivations and Applications, vol. 255, ed. by V.F. Demyanov, D.
Pallaschke, Lecture Notes Economics and Mathematical Systems (Springer,
Berlin, 1985), pp. 103–117
[202] E.S. Levitin, B.T. Polyak, Convergence of minimizing sequences in conditional
extremum problems. Soviet Math. Dokl. 7, 764–767 (1966)
[203] A.S. Lewis, Nonsmooth analysis of eigenvalues. Math. Program. 84, 1–24
(1999)
[204] A.S. Lewis, H.S. Sendov, Nonsmooth analysis of singular values, Part I: theory.
Set-Valued Anal. 13, 213–241 (2005)
[205] A.S. Lewis, H.S. Sendov, Nonsmooth analysis of singular values, Part II: appli-
cations. Set-Valued Anal. 13, 243–264 (2005)
[206] C. Li, K.F. Ng, Subdifferential calculus rules for supremum functions in convex
analysis. SIAM J. Optim. 21, 782–797 (2011)
[207] C. Li, K.F. Ng, T.K. Pong, The SECQ, linear regularity, and the strong CHIP
for an infinite system of closed convex sets in normed linear spaces. SIAM. J.
Optim. 18, 643–665 (2007)
[208] P.D. Loewen, Limits of Fréchet normals in nonsmooth analysis, in Optimization
and Nonlinear Analysis, ed. by A. Ioffe, L. Marcus, S. Reich, Pitman Res. Notes
Math. Ser. 244 (Longman, Harlow, Essex, UK, 1992), pp. 178–188
[209] N.N. Luan, J. Yao, N.D. Yen, On some generalized polyhedral convex con-
structions. Numer. Funct. Anal. Optim. 29, 537–570 (2017)
[210] R. Lucchetti, Convexity and Well-Posed Problems (Springer, New York, 2006)
[211] Y. Lucet, J.J. Ye, Sensitivity analysis of the value function for optimization
problems with variational inequality constraints. SIAM J. Control Optim. 40,
699–723 (2001). Erratum in SIAM J. Control Optim. 41, 1315–1319 (2002)
[212] H. Luo, X. Wang, B. Lukens, Variational analysis on the signed distance func-
tions. J. Optim. Theory Appl. 180, 751–774 (2019)
[213] A.A. Lyapunov, Sur les fonctions-vecteurs complètement additives. Izvest.
Akad. Nauk SSSR, Ser. Mat. 3, 465–478 (1940)
[214] J.-E. Martı́nez-Legaz, A. Seeger, A formula on the approximate subdifferential
of the difference of convex functions. Bull. Austral. Math. Soc. 45, 37–41
(1992)
[215] H. Martini, N.M. Nam, A. Robinson, On the equivalence of the theorems of
Helly, Radon, and Carathéodory via convex analysis. J. Convex Anal. 22,
591–601 (2015)
[216] A. Mas-Colell, M.D. Whinston, J.R. Green, Microeconomic Theory (Oxford
University Press, Oxford, UK, 1995)
[217] H. Maurer, J. Zowe, First and second-order necessary and sufficient optimality
conditions for infinite-dimensional programming problems. Math. Program.
16, 98–110 (1979)
[218] R. Meise, D. Vogt, Introduction to Functional Analysis (Oxford University
Press, Oxford, UK, 1997)
[219] H. Minkowski, Geometrie der Zahlen (Teubner, Leipzig, 1910)
[220] H. Minkowski, Theorie der konvexen Körper, insbesondere Begründung ihres
Oberflächenbegriffs, Gesammelte Abhandlungen II (Leipzig, Teubner, 1911)
[221] A. Mohammadi, B.S. Mordukhovich, Variational analysis in normed spaces
with applications to constrained optimization. SIAM J. Optim. 31, 569–603
(2021)
[222] A. Mohammadi, B.S. Mordukhovich, M.E. Sarabi, Variational analysis of com-
posite models with applications to continuous optimization. Math. Oper. Res.
(2020). https://doi.org/10.1287/moor.2020.1074
[223] B.S. Mordukhovich, Maximum principle in problems of time optimal control
with nonsmooth constraints. J. Appl. Math. Mech. 40, 960–969 (1976)
[224] B.S. Mordukhovich, Metric approximations and necessary optimality condi-
tions for general classes of extremal problems. Soviet Math. Dokl. 22, 526–530
(1980)
[225] B.S. Mordukhovich, Nonsmooth analysis with nonconvex generalized differen-
tials and adjoint mappings. Dokl. Akad. Nauk BSSR 28, 976–979 (1984)
[226] B.S. Mordukhovich, Approximation Methods in Problems of Optimization and
Control (Nauka, Moscow, 1988)
[227] B.S. Mordukhovich, Sensitivity analysis in nonsmooth optimization, in Theoretical
Aspects of Industrial Design, ed. by D.A. Field, V. Komkov, SIAM Proc.
Appl. Math. 58 (SIAM, Philadelphia, PA, 1992), pp. 32–46
[228] B.S. Mordukhovich, Variational Analysis and Generalized Differentiation, I:
Basic Theory, II: Applications (Springer, Berlin, 2006)
[229] B.S. Mordukhovich, Variational Analysis and Applications (Springer, Cham,
Switzerland, 2018)
[230] B.S. Mordukhovich, Bilevel optimization and variational analysis, in Bilevel
Optimization: Advances and Next Challenges, ed. by S. Dempe, A. Zemkoho
(Springer, Cham, Switzerland, 2020), pp. 197–226
[231] B.S. Mordukhovich, N.M. Nam, Subgradients of distance functions with some
applications. Math. Program. 104, 635–668 (2005)
[232] B.S. Mordukhovich, N.M. Nam, Variational stability and marginal functions
via generalized differentiation. Math. Oper. Res. 30, 1–18 (2005)
[233] B.S. Mordukhovich, N.M. Nam, Subgradients of distance functions at out-of-
state points. Taiwan. J. Math. 10, 299–326 (2006)
[234] B.S. Mordukhovich, N.M. Nam, Limiting subgradients of minimal time func-
tions in Banach spaces. J. Global Optim. 46, 615–633 (2010)
[235] B.S. Mordukhovich, N.M. Nam, Subgradients of minimal time functions under
minimal requirements. J. Convex Anal. 18, 915–947 (2011)
[236] B.S. Mordukhovich, N.M. Nam, Applications of variational analysis to a generalized
Fermat-Torricelli problem. J. Optim. Theory Appl. 148, 431–454 (2011)
[237] B.S. Mordukhovich, N.M. Nam, An Easy Path to Convex Analysis and Appli-
cations (Morgan & Claypool Publishers, San Rafael, CA, 2014)
[238] B.S. Mordukhovich, N.M. Nam, Geometric approach to convex subdifferential
calculus. Optimization 66, 839–873 (2017)
[239] B.S. Mordukhovich, N.M. Nam, Extremality of convex sets with some appli-
cations. Optim. Lett. 17, 1201–1215 (2017)
[240] B.S. Mordukhovich, N.M. Nam, Convex Analysis and Beyond, II: Applica-
tions, book in preparation
[241] B.S. Mordukhovich, N.M. Nam, H.M. Phan, Variational analysis of marginal
functions with applications to bilevel programming. J. Optim. Theory Appl.
152, 557–586 (2011)
[242] B.S. Mordukhovich, N.M. Nam, B. Rector, T. Tran, Variational geometric
approach to generalized differential and conjugate calculus in convex analysis.
Set-Valued Var. Anal. 25, 731–755 (2017)
[243] B.S. Mordukhovich, N.M. Nam, J. Salinas, Solving a generalized Heron prob-
lem by means of convex analysis. Amer. Math. Monthly 119, 87–99 (2012)
[244] B.S. Mordukhovich, N.M. Nam, J. Salinas, Applications of variational analysis
to a generalized Heron problem. Applic. Anal. 91, 1915–1942 (2012)
[245] B.S. Mordukhovich, N.M. Nam, C. Villalobos, The smallest enclosing ball
problem and the smallest intersecting ball problem: existence and uniqueness
of solutions. J. Optim. Theory Appl. 154, 768–791 (2012)
[246] B.S. Mordukhovich, N.M. Nam, N.D. Yen, Fréchet subdifferential calculus
and optimality conditions in nondifferentiable programming. Optimization 55,
685–708 (2006)
[247] B.S. Mordukhovich, N.M. Nam, N.D. Yen, Subgradients of marginal func-
tions in parametric mathematical programming. Math. Program. 116, 369–
396 (2009)
[248] B.S. Mordukhovich, T.T.A. Nghia, DC optimization approach to metric reg-
ularity of convex multifunctions with applications to stability of infinite sys-
tems. J. Optim. Theory Appl. 155, 762–784 (2012)
[249] B.S. Mordukhovich, T.T.A. Nghia, Subdifferentials of nonconvex supremum
functions and their applications to semi-infinite and infinite programs with
Lipschitzian data. SIAM J. Optim. 23, 406–431 (2013)
[250] B.S. Mordukhovich, T.T.A. Nghia, Nonsmooth cone-constrained optimization
with applications to semi-infinite programming. Math. Oper. Res. 39, 301–337
(2014)
[251] B.S. Mordukhovich, P. Pérez-Aros, New extremal principles with applications
to stochastic and semi-infinite programming. Math. Program. 189, 527–553
(2021)
[252] B.S. Mordukhovich, P. Pérez-Aros, Generalized sequential differential calculus
for expected-integral functionals. Set-Valued Var. Anal. 29, 621–644 (2021)
[253] B.S. Mordukhovich, P. Pérez-Aros, Generalized Leibniz rules and Lipschitz
stability for expected-integral mappings, to appear in SIAM J. Optim. (2021),
arXiv:2101.06711
[254] B.S. Mordukhovich, H.M. Phan, Tangential extremal principle for finite and
infinite systems II: applications to semi-infinite and multiobjective optimiza-
tion. Math. Program. 136, 31–63 (2012)
[255] B.S. Mordukhovich, N. Sagara, Subdifferentials of nonconvex integral functionals
in Banach spaces with applications to stochastic dynamic programming.
J. Convex Anal. 25, 643–673 (2018)
[256] B.S. Mordukhovich, N. Sagara, Subdifferentials of value functions in non-
convex dynamic programming for nonstationary stochastic processes. Comm.
Stoch. Anal. 13, 1–18 (2019). https://doi.org/10.31390/cosa.13.3.05
[257] B.S. Mordukhovich, Y. Shao, Nonsmooth sequential analysis in Asplund
spaces. Trans. Amer. Math. Soc. 348, 1235–1280 (1996)
[258] B.S. Mordukhovich, Y. Shao, Stability of multifunctions in infinite dimensions:
point criteria and applications. SIAM J. Control Optim. 35, 285–314 (1997)
[259] B.S. Mordukhovich, A. Soubeyran, Variational analysis and variational ratio-
nality in behavioral sciences: stationary traps, in Variational Analysis and Set
Optimization: Developments and Applications in Decision Making, ed. by A.
Khan et al. (CRC Press, Boca Raton, FL, 2019), pp. 1–29
[260] B.S. Mordukhovich, B. Wang, Necessary optimality and suboptimality con-
ditions in nondifferentiable programming via variational principles. SIAM J.
Control Optim. 41, 623–640 (2002)
[261] B.S. Mordukhovich, B. Wang, Differentiability and regularity of Lipschitzian
mappings. Proc. Amer. Math. Soc. 131, 389–399 (2003)
[262] J.J. Moreau, Fonctions convexes duales et points proximaux dans un espace
hilbertien. C. R. Acad. Sci. Paris 255, 2897–2899 (1962)
[263] J.J. Moreau, Fonctions Convexes en Dualité (Faculté des Sciences de Mont-
pellier, Séminaires de Mathématiques, Montpellier, France, 1962)
[264] J.J. Moreau, Propriétés des applications “prox”. C. R. Acad. Sci. Paris 256,
1069–1071 (1963)
[265] J.J. Moreau, Inf-convolution des fonctions numériques sur un espace vectoriel.
C. R. Acad. Sci. Paris 256, 5047–5049 (1963)
[266] J.J. Moreau, Inf-Convolution (Faculté des Sciences de Montpellier, Séminaires
de Mathématiques, Montpellier, France, 1963)
[267] J.J. Moreau, Fonctionnelles sous-différentiables. C. R. Acad. Sci. Paris 257,
4117–4119 (1963)
[268] J.J. Moreau, Étude Locale d'une Fonctionnelle Convexe (Université de Mont-
pellier, Montpellier, France, 1963)
[269] N.M. Nam, L.T.H. An, D. Giles, N.T. An, Smoothing techniques and difference
of convex functions algorithms for image reconstructions. Optimization 69,
1601–1633 (2020)
[270] N.M. Nam, N. Hoang, A generalized Sylvester problem and a generalized
Fermat-Torricelli problem. J. Convex Anal. 20, 669–687 (2013)
[271] N.M. Nam, T.A. Nguyen, R.B. Rector, J. Sun, Nonsmooth algorithms and
Nesterov’s smoothing techniques for generalized Fermat-Torricelli problems.
SIAM J. Optim. 24, 1815–1839 (2014)
[272] N.M. Nam, R.B. Rector, D. Giles, Minimizing differences of convex functions
with applications to facility location and clustering. J. Optim. Theory Appl.
173, 255–278 (2017)
[273] N.M. Nam, C. Zălinescu, Variational analysis of directional minimal time func-
tions with applications to location problems. Set-Valued Var. Anal. 21, 405–
430 (2013)
[274] I. Namioka, R.R. Phelps, Banach spaces which are Asplund spaces. Duke
Math. J. 42, 735–750 (1975)
[275] J.F. Nash, Noncooperative Games, Doctoral dissertation (Princeton University,
Princeton, NJ, 1950)
[276] J.F. Nash, Equilibrium points in N-person games. Proc. Nat. Acad. Sci. 36,
48–49 (1950)
[277] Yu. Nesterov, A method for unconstrained convex minimization problem with
the rate of convergence O(1/k²). Soviet Math. Dokl. 269, 543–547 (1983)
[278] Yu. Nesterov, Smooth minimization of nonsmooth functions. Math. Program.
103, 127–152 (2005)
[279] Yu. Nesterov, Lectures on Convex Optimization, 2nd edn. (Springer, Cham,
Switzerland, 2018)
[280] K.F. Ng, W. Song, Fenchel duality in finite-dimensional setting and its appli-
cations. Nonlinear Anal. 55, 845–858 (2003)
[281] E.A. Nurminskii, Continuity of ε-subgradient mappings. Cybernetics 3, 790–
791 (1977)
[282] J.V. Outrata, On the numerical solution of a class of Stackelberg problems,
ZOR-Methods Models. Oper. Res. 34, 255–277 (1990)
[283] M.L. Overton, On minimizing the maximum eigenvalue of a symmetric
matrix. SIAM J. Matrix Anal. Appl. 9, 256–268 (1988)
[284] M.L. Overton, R.S. Womersley, On minimizing the spectral radius of a nonsymmetric
matrix function: optimality conditions and duality theory. SIAM J.
Matrix Anal. Appl. 9, 473–498 (1988)
[285] D. Pallaschke, S. Rolewicz, Foundations of Mathematical Optimization: Con-
vex Analysis without Linearity (Kluwer, Dordrecht, The Netherlands, 1998)
[286] J.-P. Penot, Subdifferential calculus without qualification conditions. J. Con-
vex Anal. 3, 1–13 (1996)
[287] J.-P. Penot, Compactness properties, openness criteria and coderivatives. Set-
Valued Anal. 6, 363–380 (1998)
[288] J.-P. Penot, Calculus Without Derivatives (Springer, New York, 2013)
[289] P. Pérez-Aros, Subdifferential formulae for the supremum of an arbitrary fam-
ily of functions. SIAM J. Optim. 29, 1714–1743 (2019)
[290] R.R. Phelps, Convex Functions, Monotone Operators and Differentiability,
2nd edn. (Springer, Berlin, 1993)
[291] B.T. Polyak, Existence theorems and convergence of minimizing sequences in
extremum problems with restrictions. Soviet Math. Dokl. 7, 72–75 (1966)
[292] B.T. Polyak, An Introduction to Optimization (Optimization Software, New
York, 1987)
[293] D. Preiss, Differentiability of Lipschitz functions on Banach spaces. J. Funct.
Anal. 91, 312–345 (1990)
[294] B.N. Pshenichnyi, Necessary Conditions for an Extremum (Marcel Dekker,
New York, 1971)
[295] B.N. Pshenichnyi, Necessary conditions for an extremum for differential inclu-
sions. Kibernetika 12, 60–73 (1976)
[296] B.N. Pshenichnyi, Convex Analysis and Extremal Problems (Nauka, Moscow,
1980)
[297] L. Qi, J. Sun, A nonsmooth version of Newton’s method. Math. Program. 58,
353–368 (1993)
[298] H. Rademacher, Über partielle und totale Differenzierbarkeit von Funktionen
mehrerer Variabeln und über die Transformation der Doppelintegrale. Math.
Ann. 79, 340–359 (1919)
[299] J. Radon, Mengen konvexer Körper, die einen gemeinsamen Punkt enthalten.
Math. Ann. 83, 113–115 (1921)
[300] S.M. Robinson, Regularity and stability for convex multivalued functions.
Math. Oper. Res. 1, 130–143 (1976)
[301] S.M. Robinson, Some continuity properties of polyhedral multifunctions.
Math. Program. Stud. 14, 206–214 (1981)
[302] R.T. Rockafellar, Convex Functions and Dual Extremum Problems, Doctoral
dissertation, Harvard University, Cambridge, MA (1963)
[303] R.T. Rockafellar, Characterization of the subdifferentials of convex functions.
Pacific J. Math. 17, 497–510 (1966)
[304] R.T. Rockafellar, Extension of Fenchel’s duality theorem for convex functions.
Duke Math. J. 33, 81–89 (1966)
[305] R.T. Rockafellar, Integrals which are convex functionals. Pacific J. Math. 24,
525–539 (1968)
[306] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ,
1970)
[307] R.T. Rockafellar, On the maximal monotonicity of subdifferential mappings.
Pacific J. Math. 33, 209–216 (1970)
[308] R.T. Rockafellar, Integrals which are convex functionals, II. Pacific J. Math.
39, 439–469 (1971)
[309] R.T. Rockafellar, Conjugate Duality and Optimization (SIAM, Philadelphia,
PA, 1974)
[310] R.T. Rockafellar, Directional Lipschitzian functions and subdifferential calcu-
lus. Proc. London Math. Soc. 39, 331–355 (1979)
[311] R.T. Rockafellar, Generalized directional derivatives and subgradients of non-
convex functions. Canad. J. Math. 32, 257–280 (1980)
[312] R.T. Rockafellar, The Theory of Subgradients and Its Applications to Problems
of Optimization: Convex and Nonconvex Functions (Helderman Verlag, Berlin,
1981)
[313] R.T. Rockafellar, Proximal subgradients, marginal values and augmented
Lagrangians in nonconvex optimization. Math. Oper. Res. 6, 424–436 (1981)
[314] R.T. Rockafellar, Lagrange multipliers and subderivatives of optimal value
functions in nonlinear programming. Math. Program. Study 17, 28–66 (1982)
[315] R.T. Rockafellar, Lipschitzian properties of multifunctions. Nonlinear Anal.
9, 867–885 (1985)
[316] R.T. Rockafellar, Extensions of subgradient calculus with applications to opti-
mization. Nonlinear Anal. 9, 665–698 (1985)
[317] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis (Springer, Berlin, 1998)
[318] W. Rudin, Functional Analysis, 2nd edn. (McGraw-Hill, New York, 1991)
[319] A. Ruszczyński, Nonlinear Optimization (Princeton University Press, Prince-
ton, NJ, 2006)
[320] P.A. Samuelson, Foundations of Economic Analysis (Harvard University
Press, Cambridge, Massachusetts, 1947)
[321] J. Schauder, Über die Umkehrung linearer, stetiger Funktionaloperationen.
Stud. Math. 2, 1–6 (1930)
[322] A. Seeger, Convex analysis of spectrally defined matrix functions. SIAM J.
Optim. 7, 679–696 (1997)
[323] N. Shioji, On uniformly convex functions and uniformly smooth functions.
Math. Japonica 41, 641–655 (1995)
[324] N.Z. Shor, On a class of almost-differentiable functions and on a minimization
method for functions from this class. Cybernetics 8, 599–606 (1972)
[325] N.Z. Shor, Minimization Methods for Nondifferentiable Functions (Springer,
Berlin, 1985)
[326] S. Simons, From Hahn-Banach to Monotonicity, 2nd edn. (Springer, Berlin,
2008)
[327] I. Singer, A Fenchel-Rockafellar type duality theorem for maximization. Bull.
Austral. Math. Soc. 20, 193–198 (1979)
[328] I. Singer, Duality for Nonconvex Approximation and Optimization (Springer,
New York, 2006)
[329] E. Steinitz, Bedingt konvergente Reihen und konvexe Systeme, I, II, III. J.
Reine Angew. Math. 143, 128–175 (1913); 144, 1–40 (1914); 146, 1–52 (1916)
[330] V. Strassen, The existence of probability measures with given marginals. Ann.
Math. Stat. 36, 423–439 (1965)
[331] J.J. Sylvester, A question in the geometry of situation. Quart. J. Math. 1, 79
(1857)
[332] P.D. Tao, L.T.H. An, Difference of convex functions optimization algorithms
(DCA) for globally minimizing nonconvex quadratic forms on Euclidean balls
and spheres. Oper Res. Lett. 19, 207–216 (1996)
[333] L. Thibault, On subdifferentials of optimal value functions. SIAM J. Control
Optim. 29, 1019–1036 (1991)
[334] L. Thibault, Sequential convex subdifferential calculus and sequential
Lagrange multipliers. SIAM J. Control Optim. 35, 1434–1444 (1997)
[335] L. Thibault, D. Zagrodny, Integration of subdifferentials of lower semicontin-
uous functions. J. Math. Anal. Appl. 189, 22–58 (1995)
[336] L. Thibault, N. Zlateva, Integrability of subdifferentials of directionally Lips-
chitz functions. Proc. Amer. Math. Soc. 133, 2939–2948 (2005)
[337] N.T. Toan, J.-C. Yao, Mordukhovich subgradients of the value function to a
parametric discrete optimal control problem. J. Global Optim. 58, 595–612
(2014)
[338] J.F. Toland, Duality in nonconvex optimization. J. Math. Anal. Appl. 66,
399–415 (1978)
[339] L. Tonelli, Fondamenti di Calcolo delle Variazioni, I, II (Nicola Zanichelli,
Bologna, 1921, 1923)
[340] A. N. Tychonoff (Tikhonov), Über die topologische Erweiterung von Räumen.
Math. Ann. 102, 544–561 (1930)
[341] M. Valadier, Sous-différentiels d’une borne supérieure et d’une somme continue
de fonctions convexes. C. R. Acad. Sci. Paris, Sér. A–B Math. 268, 39–42
(1969)
[342] L. Veselý, L. Zajı́ček, Delta-Convex Mappings between Banach Spaces and
Applications, Dissertationes Mathematicae, vol. 289 (Polish Academy of Sci-
ences, Warsaw, Poland, 1989), 52 pp
[343] R.B. Vinter, Optimal Control (Birkhäuser, Boston, MA, 2000)
[344] A.A. Vladimirov, Y.E. Nesterov, Y.Y. Chekanov, On uniformly convex func-
tionals. Vest. Moscow Univ. Ser. XV 3, 12–23 (1978)
[345] J. von Neumann, On complete topological spaces. Trans. Amer. Math. Soc.
37, 1–20 (1935)
[346] J. von Neumann, Some matrix inequalities and metrization of matrix-space.
Tomsk Univ. Rev. 1, 286–300 (1937). Reprinted in Collected Works, vol. IV
(Pergamon Press, Oxford, UK, 1962), pp. 205–219
[347] J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior
(Princeton University Press, Princeton, NJ, 1944)
[348] D.E. Ward, J.M. Borwein, Nonsmooth calculus in finite dimensions. SIAM J.
Control Optim. 25, 1312–1340 (1987)
[349] J. Warga, Relaxed variational problems. J. Math. Anal. Appl. 4, 111–128
(1962)
[350] K. Weierstrass, Über continuirliche Functionen eines reellen Arguments,
die für keinen Werth des letzteren einen bestimmten Differentialquotienten
besitzen. Mathematische Werke von Karl Weierstrass, vol. 2 (Mayer & Müller,
Berlin, 1895), pp. 71–74
[351] E. Weiszfeld, Sur le point pour lequel la somme des distances de n points
donnés est minimum. Tôhoku Math. J. 43, 355–386 (1937)
[352] H. Weyl, Elementare Theorie der konvexen Polyeder. Comm. Math. Helvetici
7, 290–306 (1935)
[353] Z. Wu, J.J. Ye, Equivalences among various derivatives and subdifferentials of
the distance function. J. Math. Anal. Appl. 282, 629–647 (2003)
[354] J.J. Ye, D.L. Zhu, New necessary optimality conditions for bilevel programs
by combining MPEC and the value function approach. SIAM J. Optim. 20,
1885–1905 (2010)
[355] D. Yost, Asplund spaces for beginners. Acta Univ. Carolinae, Ser. Math. Phys.
34, 159–177 (1993)
[356] L.C. Young, Generalized curves and the existence of an attained absolute
minimum in the calculus of variations. C. R. Soc. Sci. Lett. Varsovie, Cl. III,
30, 212–234 (1937)
[357] W.H. Young, On classes of summable functions and their Fourier series. Proc.
Royal Soc. (A) 87, 225–229 (1912)
[358] D. Zagrodny, Approximate mean value theorem for upper subderivatives. Non-
linear Anal. 12, 1413–1428 (1988)
[359] C. Zălinescu, On uniformly convex functions. J. Math. Anal. Appl. 95, 344–
374 (1983)
[360] C. Zălinescu, A comparison of constraint qualifications in infinite-dimensional
convex programming revisited. J. Austral. Math. Soc., Ser. B 40, 353–378
(1999)
[361] C. Zălinescu, Convex Analysis in General Vector Space (World Scientific, Sin-
gapore, 2002)
[362] C. Zălinescu, On the use of the quasi-relative interior in optimization. Opti-
mization 64, 1795–1823 (2015)
[363] E. H. Zarantonello, Projections of convex sets in Hilbert space and spectral
theory, in Contributions to Nonlinear Functional Analysis (Academic Press,
Cambridge, MA, 1971), pp. 237–424
[364] A.J. Zaslavski, Turnpike Phenomenon and Infinite Horizon Optimal Control
(Springer, Cham, Switzerland, 2016)
[365] A.J. Zaslavski, Numerical Optimization with Computational Errors (Springer,
Cham, Switzerland, 2016)
[366] E. Zeidler, Applied Functional Analysis. Applications to Mathematical Physics
(Springer, Berlin, 1997)
[367] X.Y. Zheng, K.F. Ng, Subsmooth semi-infinite and infinite optimization prob-
lems. Math. Program. 134, 365–393 (2012)
[368] M. Zorn, A remark on method in transfinite algebra. Bull. Amer. Math. Soc.
41, 667–670 (1935)
Subject Index

Symbols
ε-normal, 321
ε-normal intersection rule, 327
ε-subdifferential calculus
  asymptotic sum rule, 325, 327
  exact chain rule, 324
  exact sum rule, 324
  simple rules, 323
ε-subgradient, 318

A
absolutely symmetric function, 367
absorbing set, 34, 35, 74
adjoint mapping, 48
affine
  hull, 103
  independence, 103
  mapping, 66, 129
  set, 85
Alaoglu, Leonidas, 64
Alaoglu-Bourbaki theorem, 45
Alexandrov, Alexander, 550
Alexandrov, Pavel, 63
algebraic interior, 74
An, Le Thi Hoai, 550
antiderivative, 497
Apollonius of Perga, 173
approximate mean value theorem, 333
Aragón Artacho, Francisco, 550
Archimedes of Syracuse, vii, 173
Arrow, Kenneth, 174
Asplund space, 317, 353, 354, 374, 549
Asplund, Edgar, 354, 378
asymptotic cone, 398
Attouch, Hedy, 304
Attouch-Brezis condition, 274, 303, 305
Aubin, Jean-Pierre, 250
Azé, Dominique, 441

B
Bačák, Miroslav, 550
Baire category theorem, 55
Baire, René, 64
balanced set, 34
Banach space, 4
Banach, Stefan, 63, 173, 310
Bao, Truong Quang, 376
basis of neighborhoods, 36
  in the weak topology, 40, 82
  in weak∗ topology, 43
  in an LCTV space, 83
Bauschke, Heinz, 251, 378
Bazaraa, Mokhtar, 549
Bertsekas, Dimitri, viii, 377
Bishop, Errett, 375
Bishop-Phelps theorem, 311, 318, 320
Blaschke, Wilhelm, 173
Boţ, Radu, 304
Bochner integral, 308
Bogolyubov, Nikolay, 173
Boltyanskii, Vladimir, 174
Bolzano, Bernard, 63
Bonnans, Frédéric, 253
Borel, Émile, 63

© Springer Nature Switzerland AG 2022


B. S. Mordukhovich and N. Mau Nam, Convex Analysis and Beyond,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-94785-9
Borwein, Jonathan, viii, xii, 175, 177, 251, 304, 306, 376, 378, 547, 549, 550
boundary, 6
bounded extremality condition, 187
bounded linear mapping, 38, 39
bounded set
  in a metric space, 23
  in a topological vector space, 37, 186
Bounkhel, Messaoud, 549
Bourbaki, Nikolas, 64
Boyd, Stephan, viii
Brøndsted, Arne, 304, 377
Brezis, Haïm, 304
Brunn, Hermann, 173
Brøndsted-Rockafellar density theorem, 319
Burachik, Regina, 306, 378, 551
Burke, James, 309

C
Carathéodory theorem, 426
Carathéodory, Constantin, 442, 547
Castaing, Charles, viii, 304, 308
Cauchy sequence, 3
characteristic function, 26
Chieu, Nguyen Huy, 308, 547
Clarke generalized directional derivative, 446, 455
Clarke generalized gradient, 465, 470, 495
Clarke, Francis, xii, 306, 308, 445, 446, 465, 487, 545, 547
Clebsch, Alfred, 173
closed ball, 3
closed graph theorem, 58
closed hyperplane, 95
closed set, 1
closed unit ball, 42
closed-graph mapping, 58
closure, 5
  linear, 74
  of a convex set, 71, 73
  of an intersection, 73
coderivative
  chain rule in Rn, 200
  chain rule in Banach spaces, 276
  chain rule in TVS, 198
  intersection rule in Rn, 201
  intersection rule in Banach spaces, 276
  intersection rule in TVS, 199
  of a convex mapping, 195
  of a polyhedral-convex composition, 244
  of a polyhedral-convex sum, 243
  polyhedral-convex sum rule, 241
  sum rule in Rn, 200
  sum rule in Banach spaces, 276
  sum rule in TVS, 196
codimension, 46
Combettes, Patrick, viii, 251, 378
compact set, 18, 38
complete metric space, 3
conjugate function, 255
connected set, 25
contingent derivative, 446
continuity, 7
  of a convex function, 136
  of an extended-real-valued function, 136
  of the Minkowski gauge, 80
continuous linear mapping, 38, 39
control function, 519
convergence
  of a net, 28
  of a sequence, 3
  weak, 41
  weak∗, 44
convex
  cone generated by a set, 425
  conic hull, 425
  combination, 69, 426
  function, 118
  hull, 71, 426
  line segment, 65
  proper separation, 112
  set, 65
  set-valued mapping, 69
convex separation, 93, 182
  in a dual space, 101
  in a topological vector space, 95, 96
  in a vector space, 88, 90
  in an LCTV space, 97
  proper, 89
  strict, 97
convexification, 70

convexity
    of a Cartesian product, 67
    of a direct image, 67
    of a distance function, 225
    of a sum of sets, 69
    of algebraic interior, 75
    of an intersection, 70
    of an inverse image, 67
    of linear closure, 76
    of scalar multiplication with sets, 69
    of set-valued mapping, 69
Correa, Rafael, 306, 308
Crandall, Michael, 549
Cuong, Dang Van, 378, 442

D
Dür, Mirjam, 551
Danskin, John, 306
Dantzig, George, 174
Dao, Minh, 551
Debreu, Gérard, 174
Demyanov, Vladimir, 306, 544
Dentcheva, Darinka, 309
Deville, Robert, 376
Dieudonné, Jean, 64
Dinh, Nguyen, 550
Dini contingent derivative, 446
Dini subderivative, 446, 456
Dini, Ulisse, xii, 445, 545
Dini-Hadamard directional derivative/subderivative, 549
directional derivative, 279, 339
directional regularity, 447
discrete topology, 2
distance function, 222, 505, 507
    convexity, 225
    Lipschitz property, 222
    subdifferential, 228, 232
    subdifferential in a Hilbert space, 229
domain
    of a set-valued mapping, 68
    of extended-real-valued function, 118
dual space, 39, 100
Dubovitskii, Abram, 175, 306
Dunford, Nelson, 64

E
Ekeland variational principle, 312, 313
Ekeland, Ivar, viii, 250, 251, 312, 376
enlargement set, 230
epi-differentiability, 458
Ernst, Emil, 305
essential boundedness, 497
Euclid of Alexandria, 173
Euler, Leonhard, 173
extremal principle, 315, 316, 549
    convex, 311, 315, 316, 376
extremal system, 183, 315
extreme point, 116

F
Fabian, Marian, 177, 378
face of a convex set, 115
Farkas lemma, 427, 434
Farkas, Gyula, 442
Fenchel conjugate, 179, 255
    of marginal function, 290
    of singular functions, 368
    chain rule, 270, 278
    maximum rule, 271, 278
    of a spectral composition, 364
    sum rule, 269, 277
Fenchel dual problem, 292
Fenchel strong duality, 293, 299
Fenchel weak duality, 292
Fenchel, Werner, vii, 173, 179, 249, 304, 310
Fenchel-Young inequality, 259
Fermat stationary rule, 202
finite complement topology, 2
finite intersection property, 20
Fréchet differentiability, 340
    generic, 358
    of Fenchel conjugate, 352
    subdifferential characterization, 350
Fréchet strict differentiability, 487, 489
Fréchet, Maurice, 63
Frankowska, Hélène, 250
function
    absolutely symmetric, 367
    characteristic, 26
    conjugate, 255

    continuous, 7, 136
    continuous DC, 515
    control, 519
    convex, 118
    difference of convex (DC), 514
    distance, 122, 222, 505
    horizon, 402
    indicator, 121, 205
    Lipschitz continuous, 138
    locally DC, 525
    locally Lipschitzian, 138, 204, 446
    lower semicontinuous, 142
    marginal, 287
    maximum, 129
    Minkowski gauge, 78, 410
    optimal value, 287
    perspective, 400
    polyhedral, 241
    positively homogeneous, 51
    proper, 118
    quasiconvex, 123
    signed distance, 510, 548
    signed minimal time, 420
    strictly convex, 122
    subadditive, 51
    sublinear, 51
    support, 263
    supremum, 283
    symmetric, 362

G
Gâteaux differentiability, 311, 338, 345
    generic, 355
    subdifferential characterization, 346
Gâteaux strict differentiability, 487
Gâteaux, René, 378
Gamkrelidze, Revaz, 173
generalized derivative, 446
generic Fréchet differentiability, 358
generic Gâteaux differentiability, 355
geometric derivability, 458
Giner, Emmanuel, 308
Godefroy, Gilles, 376
Goebel, Rafal, 177
Graves, Lawrence, 174, 547

H
Hadamard strict differentiability, 486, 487
Hadamard, Jacques, 378
Hadjisavvas, Nicolas, 177
Hahn, Hans, 64
Hahn-Banach theorem, 51, 52, 84, 93
Hantoute, Abderrahim, 306, 308
Hartman, Philip, 550
Hausdorff topological space, 15, 37
Hausdorff, Felix, 63
Heine, Eduard, 63
Heine-Borel theorem, 24
Helly theorem, 430
Helly, Eduard, 64, 442
Henrion, René, 308
Hilbert space, 5
Hilbert, David, 63
Hiriart-Urruty, Jean-Baptiste, viii, 251, 304, 377, 442, 550
Holmes, Richard, 175
homeomorphism, 9
horizon cone, 398
hyperplane, 88
    closed, 95

I
indicator function, 205
indiscrete topology, 2
inner product, 4
inner product space, 4
interior, 5
    algebraic, 74
    intrinsic relative, 149
    of a convex set, 73
    of convex epigraphs, 138
    of convex sets, 72
    relative, 102, 147
Ioffe, Alexander, viii, 177, 251, 304, 308, 544, 549
Iusem, Alfredo, 378

J
Jensen, Johan, 173
Jeyakumar, Jeya, 306
Jourani, Abderrahim, 251

K
Kantorovich, Leonid, 174

Karush, William, 174
Kelly, John, 64
Khan, Akhtar, 175
Koopmans, Tjalling, 174
Krein, Mark, 176
Krein-Milman theorem, 117
Kruger, Alexander, 250, 252, 542, 549
Kuhn, Harold, 174
Kuratowski, Kazimierz, 64
Kusraev, Anatoly, 251
Kutateladze, Semen, 251

L
López, Marco, 308
Lagrange, Joseph-Louis, 174
LCTV space, 83
Lebourg, Gérald, 378, 546, 549
Legendre, Adrien-Marie, 173, 304
Leibniz rule, 306
Lemaréchal, Claude, viii, 251, 304, 441
Levin, Vladimir, 306, 308
Levitin, Evgenii, 441
Lewis, Adrian, viii, 177, 251, 304, 379, 547
Lindstrom, Scott, 551
linear
    closure, 74
    function, 88
    mapping, 38
    space, 3
    subspace, 87
line segment, 65
linear space
    complex, 4
    real, 4
Lions, Pierre-Louis, 549
Lipschitz modulus, 204
Lipschitzian function, 204, 446
local convexity
    in weak topology, 84
    in weak∗ topology, 83
    of a quotient space, 83
    of a TVS space, 83
    of an LCTV space with weak topology, 98
locally convex space, 83
Loewen, Philip, 177, 549
Lyapunov, Alexey, 308
Lyusternik, Lazar, 547

M
Malozemov, Vasilii, 306
mapping
    affine, 66
    bounded linear, 38, 39
    closed-graph, 58
    continuous, 7
    continuous linear, 38, 39
    linear, 38
    open, 56
    set-valued, 68
marginal function, 287
Martínez-Legaz, Juan Enrique, 551
maximal monotonicity, 335, 336, 483
maximum function, 129
Mazur, Stanislaw, 173
mean value theorem, 126
    approximate, 333, 377, 547
    convex subdifferential, 330
    nonconvex subdifferential, 480, 549
metric, 2
metric space, 2
Milman, David, 176
Milyutin, Alexey, 175, 306
Minkowski gauge function, 74, 78
Minkowski, Hermann, vii, 173, 249, 304
mixing lemma, 523
Mohammadi, Ashkan, 546
Mordukhovich, Boris, 177, 250, 251, 306, 308, 376, 378, 442, 546, 547, 549
Moreau, Jean-Jacques, 175, 176, 212, 249, 252, 304, 310, 472
Morgenstern, Oskar, 176
multifunction, 68
multiplication operator, 33

N
Nam, Nguyen Mau, 251, 252, 378, 550
Namioka, Isaac, 378
Nashed, Zuhair, 549
Nesterov smoothing, 393
Nesterov, Yurii, viii, xii, 441
net, 28
net convergence, 28
Ng, Kung Fu, 253

Nghia, Tran Thai An, 306
norm, 4
    in a product normed space, 58
    of a bounded linear mapping, 39
norm of a bounded linear operator, 391
norm of the adjoint operator, 391
normal cone, 148, 180
    intersection rule in Banach spaces, 275
    intersection rule, 239
    intersection rule in Rn, 191
    intersection rule in TVS, 185, 187
    limiting (Mordukhovich), 502, 549
    nontriviality, 181
    nontriviality counterexample, 182
    polyhedral-convex intersection rule, 240
    regular (Fréchet), 502
    to a sublevel set, 208
normal integral, 308
normed space, 4
nowhere dense set, 54
nowhere Fréchet differentiability example, 344
nuclear norm, 370

O
open ball, 2
open mapping, 56
open mapping theorem, 56
open set, 1
optimal value function, 287
Overton, Michael, 378

P
Pérez-Aros, Pedro, 306, 308
Painlevé-Kuratowski sequential outer limit, 500
Pallaschke, Diethard, viii, xii
parallel subspace, 87
partial order, 50
partially ordered set, 50
Penot, Jean-Paul, 177, 251, 308, 377
Phelps, Robert, viii, 251, 375, 377, 378
pointwise boundedness, 59
Polyak, Boris, 441
polyhedral function, 241
polyhedral set-valued mapping, 241
Pontryagin, Lev, 174
positively homogeneous function, 51
Preiss, David, 376, 547, 549
projection, 223
projection in a Hilbert space, 225
proper separation, 89
prox-function, 393
Pshenichnyi, Boris, viii, 175, 250, 306, 544

Q
quotient
    map, 13, 47
    space, 13, 46
    topology, 13

R
Rademacher differentiability theorem, 494
Rademacher, Hans, 547
Radon theorem, 430
Radon, Johann, 442
relative interior, 102, 105, 147
    intrinsic, 149
    of a convex graph, 113
    of a set, 102, 147
    quasi-, 149
Robinson, Stephen, 305, 310, 443
Rockafellar directional derivative/subderivative, 547
Rockafellar, R. Tyrrell, vii, xii, 175, 177, 212, 249, 304, 308, 310, 377, 472, 545, 547, 548
Rolewicz, Stefan, viii
Rubinov, Alexander, 306, 544
Rudin, Walter, 64
Ruszczyński, Andrzej, 309

S
Sagara, Nobusumi, 308
Sarabi, Ebrahim, 546
Schaible, Siegfried, 177
Schauder, Juliusz, 64
Schwartz, Jacob, 64
Schwartz, Laurent, 250
second conjugate, 259
Seeger, Alberto, 379, 551
seminorm, 52, 80

seminorm that separates points, 81
Sendov, Hristo, 379
separation
    by a closed hyperplane, 95
    by a hyperplane, 88
    polyhedral-convex, 233, 238
    proper, 89, 112
    strict, 97
sequential normal compactness (SNC), 161, 177, 376
sequentially compact set, 22
set
    absorbing, 34, 35, 74
    affine, 85
    balanced, 34
    bounded, 23, 37
    closed, 1
    compact, 18, 38
    compactly epi-Lipschitzian (CEL), 172
    connected, 25
    convex, 65
    dense, 54
    disconnected, 25
    epi-Lipschitzian, 172
    open, 1
    partially ordered, 50
    polar, 45, 150
    polyhedral, 233
    sequentially compact, 22
    symmetric, 34
    totally bounded, 23
    totally ordered, 50
set-valued mapping
    maximal monotone, 335
    monotone, 335
    outer semicontinuity, 348
    polyhedral, 241
    sequential outer semicontinuity, 348
    strong upper semicontinuity, 350
Shao, Yongheng, 177, 549
Shapiro, Alexander, 253
Shioji, Naoki, 441
Shor, Naum, 547
Simons, Stephen, 304, 378
simplex, 104
Singer, Ivan, 551
singular value decomposition (SVD), 366
Smith, Adam, 174
Sobolev, Sergei, 250
Song, Wen, 253
Soubeyran, Antoine, 376
space
    Banach, 4
    complete metric, 3
    dual, 100
    Hausdorff, 32
    Hilbert, 4
    inner product, 4
    linear/vector, 3
    locally convex (LCTV), 83
    metric, 2
    normed, 4
    quotient, 13, 46
    topological, 1
span, 103
spectral function, 362, 378
standing assumption
    nonzero, 32, 99
    real TVS, 179
Steiner, Jakob, 173
Steinhaus, Hugo, 64
Steinitz, Ernst, 173
strict differentiability, 487
subadditive function, 51
subdifferentiability, 207
subdifferential, 202
    of nuclear norm, 370
    of singular function, 368
    of spectral functions, 364
    Clarke, 466, 533
    convex, 201
    Dini/contingent, 465, 466, 477
    Fréchet/regular, 500, 507, 533, 547
    geometric description, 202
    Mordukhovich/limiting, 476, 500, 507, 533, 547, 549
    of a convex convolution, 227
    of a convex distance function, 228, 232
    of a convex function, 201
    of a convex supremum function, 285
    of a normed function, 205
    of an indicator function, 205

    of convex marginal function, 287
    singular/horizon, 203
    symmetric, 549
subdifferential calculus-convex case
    for a composition in Rn, 218
    for a composition in Banach spaces, 277
    for a composition in TVS, 217
    for a maximum in Rn, 221
    for a maximum in TVS, 219
    for a polyhedral composition in LCTV spaces, 245
    for a polyhedral-convex sum in LCTV spaces, 242
    for a sum in Rn, 214
    for a sum in Banach spaces, 277
    for a sum in TVS, 212
subdifferential calculus-nonconvex case
    chain rule, 473, 478
    maximum rule, 475
    scalar multiplication rule, 477
    sum rule, 470, 471, 479
subdifferential mapping
    maximal monotone, 336, 337
    strongly monotone, 392
    upper semicontinuity, 349
subdifferential variational principle, 314
subgradients, 201
sublevel set, 208
sublinear function, 51
subnet, 30
subspace, 14
    linear, 87
    parallel, 87
    topology, 14
support function intersection rule, 265, 274
support functional, 320
support point, 318, 320
supremum function, 283
symmetric function, 362
symmetric set, 34

T
Tammer, Christiane, 175
tangent cone, 432
Tao, Pham Dinh, 550
Temam, Roger, viii, 251
Théra, Michel, 305
Thibault, Lionel, 251, 377, 547–549
Tikhomirov, Vladimir, viii, 251, 304, 544
Tikhonov theorem, 21
Tikhonov, Andrey, 63
Toland, John, 551
Tonelli, Leonida, 173
topological dual, 39
topological space, 1
topological vector space, 32
topology, 1
    T1, 15
    core convex, 170
    discrete, 2
    finite complement, 2
    Hausdorff (T2), 15, 32, 37
    indiscrete, 2
    normal (T4), 15
    product, 12
    quotient, 13
    regular (T3), 15, 37
    subspace, 13
    weak, 10, 40
    weak∗, 43
totally bounded set, 23
totally ordered set, 50
trace inequality, 360
trace of a matrix, 359
translation operator, 33
Tucker, Albert, 174

U
uniform boundedness, 59
uniform boundedness principle, 59
unit sphere, 42

V
Valadier, Michel, viii, 304, 308
vector space, 3
    complex, 4
    real, 4
Veselý, Libor, 550
von Neumann trace inequality, 360, 367
von Neumann, John, 63, 176
Vuong, Phan Tu, 550

W
Wang, Bingwu, 547
Wang, Xianfu (Shawn), 548
Warga, Jack, 173
weak Asplund space, 354
weak convergence, 41
weak topology, 10, 39, 40
weak∗ compactness, 45
weak∗ convergence, 44
weak∗ topology, 43
weakly closed set, 98
Weierstrass theorem, 25, 377
Weierstrass, Karl, 63, 173
Wells, Mike, 442
Wets, Roger, 251, 304, 548

Y
Yen, Nguyen Dong, xii, 550
Young, Laurence Chisholm (L.C.), 173
Young, William Henry, 304

Z
Zălinescu, Constantin, viii, xii, 175, 251, 304, 306, 377, 441
Zagrodny, Dariusz, 377, 547
Zajíček, Luděk, 550
Zarantonello, Eduardo, 177
Zheng, Xi Yin, 306
Zhu, Jim, 175, 251, 306
Zizler, Vaclav, 376
Zorn's lemma, 51
Zorn, Max, 64
