Takashi's Econ633 Lecture Notes May 18 2010
by
Takashi Kunimoto
• Linear Algebra
• Multivariate Calculus
• Static Optimization
A good comprehension of the material covered in these notes is essential for successful graduate studies in economics. Since we are seriously time constrained (which you might not believe), it would be very useful for you to keep one of the books listed below as a reference after you start graduate school in September.
READING:
The main textbook for this course is "Further Mathematics for Economic Analysis," and I mostly follow it. However, if you do not find the main textbook helpful enough, I strongly recommend that you buy at least one of the other books listed below in addition to "Further Mathematics for Economic Analysis." Of course, you can buy any math book that you find useful.
Footnote 1: I am thankful to the students for their comments, questions, and suggestions. Yet I believe that there are still many errors in this manuscript; of course, all remaining ones are my own.
• "Further Mathematics for Economic Analysis," by Knut Sydsaeter, Peter Hammond, Atle Seierstad, and Atle Strom, Prentice Hall, 2005. (Main textbook. If you don't have any math book or are not confident about your math skills, this book will help you a lot.)
PROBLEM SETS: There will be several problem sets. Problem sets are essential to help you understand the course and to develop your skills in analyzing economic problems.
Contents
1 Introduction 6
2 Preliminaries 9
2.1 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Necessity and Sufficiency . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2 Theorems and Proofs . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Preference Relations . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.1 Least Upper Bound Principle . . . . . . . . . . . . . . . . . . . . . 15
3 Topology in Rn 17
3.1 Sequences on R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 Subsequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.2 Cauchy Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.3 Upper and Lower Limits . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.4 Infimum and Supremum of Functions . . . . . . . . . . . . . . . . 21
3.1.5 Indexed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Point Set Topology in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Topology and Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 Properties of Sequences in Rn . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5 Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Linear Algebra 29
4.1 Basic Concepts in Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 Determinants and Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . 32
4.2.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.2 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2.3 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4.1 Linear Dependence and Systems of Linear Equations . . . . . . . . 36
4.5 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5.2 How to Find Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . 39
4.6 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.7 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.8 Appendix 1: Farkas Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.8.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.8.2 Fundamental Theorem of Linear Algebra . . . . . . . . . . . . . . 46
4.8.3 Linear Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.8.4 Non-Negative Solutions . . . . . . . . . . . . . . . . . . . . . . . . 47
4.8.5 The General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.9 Appendix 2: Linear Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.9.1 Number Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.9.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.9.3 Bases, Components, Dimension . . . . . . . . . . . . . . . . . . . . 52
4.9.4 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.9.5 Morphisms of Linear Spaces . . . . . . . . . . . . . . . . . . . . . . 54
5 Calculus 55
5.1 Functions of a Single Variable . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 Real-Valued Functions of Several Variables . . . . . . . . . . . . . . . . . 56
5.3 Gradients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 The Directional Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.5.1 Upper Contour Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.6 Concave and Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . 59
5.7 Concavity/Convexity for C^2 Functions . . . . . . . . . . . . . . . . . . . 60
5.7.1 Jensen’s Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.8 Quasiconcave and Quasiconvex Functions . . . . . . . . . . . . . . . . . . 64
5.9 Total Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.9.1 Linear Approximations and Differentiability . . . . . . . . . . . . . 69
5.10 The Inverse of a Transformation . . . . . . . . . . . . . . . . . . . . . . . 72
5.11 Implicit Function Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6 Static Optimization 77
6.1 Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.1.1 Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.1.2 Envelope Theorems for Unconstrained Maxima . . . . . . . . . . . 78
6.1.3 Local Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.1.4 Necessary Conditions for Local Extreme Points . . . . . . . . . . . 80
6.2 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.2.1 Equality Constraints: The Lagrange Problem . . . . . . . . . . . . 81
6.2.2 Lagrange Multipliers as Shadow Prices . . . . . . . . . . . . . . . . 84
6.2.3 Tangent Hyperplane . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.2.4 Local First-Order Necessary Conditions . . . . . . . . . . . . . . . 85
6.2.5 Second-Order Necessary and Sufficient Conditions for Local Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2.6 Envelope Result for Lagrange Problems . . . . . . . . . . . . . . . 87
6.3 Inequality Constraints: Nonlinear Programming . . . . . . . . . . . . . . . 88
6.4 Properties of the Value Function . . . . . . . . . . . . . . . . . . . . . . . 90
6.5 Constraint Qualifications . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.6 Nonnegativity Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.7 Concave Programming Problems . . . . . . . . . . . . . . . . . . . . . . . 95
6.8 Quasiconcave Programming . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.9 Appendix: Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . 96
7 Differential Equations 97
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Chapter 1
Introduction
I start my lecture with Rakesh Vohra's message about what economic theory is. (He is a professor at Northwestern University.)¹
I think this is the biggest picture of economic theory you could have as you go through this course. Whenever you are at a loss, please come back to this message.
We build our theory on individuals. Assume that all commodities are traded in the
centralized markets. Throughout Econ 610 and 620, we assume that each individual
(consumer and firm) takes prices as given. We call this the price taking behavior assump-
tion. You might ask why individuals are price takers. My answer would be “why not?”
Let us go as far as we can with this behavioral assumption and thereafter try to see the
limitation of the assumption. However, you have to wait for Econ 611 and 621 to see how to relax this assumption. So, stick with this assumption. For each consumer, we want
to know
1. What is the set of "physically feasible" bundles? Is there any such bundle at all (feasibility)? We call this set the consumption set.
Footnote 1: See the Preface of Advanced Mathematical Economics by Rakesh V. Vohra.
2. What is the set of "financially feasible" bundles? Is there any such bundle at all (feasibility)? We call this set the budget set.
3. What is the best bundle for the consumer among all feasible bundles (optimality)? We call this bundle the consumer's demand.
We can make an exactly parallel argument for the firm. What is the set of "technically" feasible inputs (feasibility)? We call this the production set of the firm. What is the best combination of inputs to maximize its profit (optimality)? We call this the firm's supply. Once we figure out what the feasible and best choices are for each consumer and each firm under any possible circumstance, we want to know whether there is any coherent state of affairs where everybody makes her best choice. In particular, all markets must clear. We call this coherent state a "competitive (Walrasian) equilibrium" (a fixed point).
How can we summarize what we discussed above? Given a (per capita) consumption stream {c_t}_{t=0}^∞, a (per capita) capital accumulation path {k_t}_{t=0}^∞, a (per capita) GDP stream {f(k_t)}_{t=0}^∞, the capital depreciation rate δ, the population growth rate n, the (per capita) consumption growth rate g, the instantaneous utility function u(·) of the representative consumer, the effective discount rate β > 0 of the representative consumer, a (per capita) wage profile {w_t}_{t=0}^∞, and a capital interest rate profile {r_t}_{t=0}^∞:
1. Find a {c_t}_{t=0}^∞ such that k̇_t = f(k_t) − c_t − (δ + g + n)k_t holds at each t ≥ 1 and k_0 > 0 is exogenously given. This is the feasibility question. Any such {c_t} is called a feasible consumption stream.
2. Find a feasible consumption stream {c_t}_{t=0}^∞ that maximizes V_0 = ∫_0^∞ e^{−βt} u(c_t) dt. This is the problem of optimality. I assume that V_0 < ∞.
3. Find a {r_t, w_t}_{t=0}^∞ such that V_0 (the planner's optimum) is sustained through market economies where k̇_t = (r_t − n − g)k_t + w_t − c_t holds at each t ≥ 1 and another condition, lim_{t→∞} λ_t e^{−βt} k_t = 0, holds. This latter condition is sometimes called the transversality condition. This is the fixed point problem. This, in fact, can be done by choosing r_t = f′(k_t) − δ and w_t = f(k_t) − f′(k_t)k_t at each t ≥ 1.
With appropriate re-interpretations, the above is exactly what we had in the begin-
ning except the transversality condition, which is a genuine feature of macroeconomics.
Chapter 2
Preliminaries
2.1 Logic
Theorems provide a compact and precise format for presenting the assumptions and
important conclusions of sometimes lengthy arguments, and so help identify immediately
the scope and limitations of the result presented. Theorems must be proved and a proof
consists of establishing the validity of the statement in the theorem in a way that is
consistent with the rules of logic.
Whenever "p ⇒ q" is true, the contrapositive statement "¬q ⇒ ¬p" is also true.
Two implications, “p ⇐ q” and “p ⇒ q,” can both be true. When this is so, I say
that “p is necessary and sufficient for q,” or “p is true if and only if q is true,” or “p iff
q.” When “p is necessary and sufficient for q,” we say that the statements p and q are
equivalent and write “p ⇔ q.”
If we assert that p is necessary and sufficient for q, or that "p ⇔ q," we must give a
proof in “both directions.” That is, both “p ⇒ q” and “q ⇒ p” must be established
before a complete proof of the assertion has been achieved.
It is important to keep in mind the old saying that goes, “Proof by example is no
proof.” Suppose the following two statements are given:
• p ≡ "x is a student,"
• q ≡ "x has red hair."
Assume further that we make the assertion “p ⇒ q.” Then clearly finding one student
with red hair and pointing him out to you is not going to convince you of anything.
Examples are good for illustrating but typically not for proving.
Finally, a sort of converse to the old saying about examples and proofs should be
noted. Whereas citing a hundred examples can never prove that a certain property
always holds, citing one solitary counterexample can disprove that the property always
holds. For instance, to disprove the assertion about the color of students’ hair, you need
simply point out one student with brown hair. A counterexample proves that the claim
cannot always be true because you have found at least one case where it is not.
For two sets S and T, we define the union of S and T as the set S ∪ T ≡ {x | x ∈ S or x ∈ T}. We define the intersection of S and T as the set S ∩ T ≡ {x | x ∈ S and x ∈ T}. Let Λ ≡ {1, 2, 3, ...} be an index set. Instead of writing {S_1, S_2, S_3, ...}, we can write {S_λ}_{λ∈Λ}. We denote the union of all sets in the collection by ∪_{λ∈Λ} S_λ, and the intersection of all sets in the collection by ∩_{λ∈Λ} S_λ.
The following are some important identities involving the operations defined above.
• A ∪ B = B ∪ A, (A ∪ B) ∪ C = A ∪ (B ∪ C), A ∪ ∅ = A
• A ∩ B = B ∩ A, (A ∩ B) ∩ C = A ∩ (B ∩ C), A ∩ ∅ = ∅
The collection of all subsets of a set A is also a set, called the power set of A and
denoted by P(A). Thus, B ∈ P(A) ⇐⇒ B ⊂ A.
Example 2.1 Let A = {a, b, c}. Then, P(A) = {∅, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.
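As a quick illustration of the definition, here is a small Python sketch (mine, not part of the original notes) that enumerates the power set of A = {a, b, c} and reproduces the eight subsets listed in Example 2.1:

    from itertools import chain, combinations

    def power_set(s):
        # Return all subsets of s, from the empty set up to s itself.
        items = list(s)
        return [set(c) for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1))]

    A = {"a", "b", "c"}
    for subset in power_set(A):
        print(subset)            # prints the 8 = 2^3 subsets of A
    print(len(power_set(A)))     # 8, so P(A) has 2^|A| elements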
The previous example implicitly shows that the order of the elements in a set specification does not matter. In particular, {a, b} = {b, a}. However, on many occasions,
one is interested in distinguishing between the first and the second elements of a pair.
One such example is the coordinates of a point in the xy-plane. These coordinates
are given as an ordered pair (a, b) of real numbers. The important property of ordered
pairs is that (a, b) = (c, d) if and only if a = c and b = d. The product of two sets S and
T is the set of “ordered pairs” in the form (s, t), where the first element in the pair is a
member of S and the second is a member of T . The product of S and T is denoted
S × T ≡ {(s, t)| s ∈ S, t ∈ T }.
The set of real numbers is denoted by the special symbol R.
Any n-tuple, or vector, is just an n-dimensional ordered tuple (x_1, ..., x_n) and can be thought of as a "point" in n-dimensional Euclidean space. This space is defined as the set product
R^n ≡ R × ⋯ × R (n times) ≡ {(x_1, ..., x_n) | x_i ∈ R, i = 1, ..., n}.
The nonnegative orthant is
R^n_+ ≡ {(x_1, ..., x_n) | x_i ≥ 0, i = 1, ..., n} ⊂ R^n.
2.3 Relations
Any ordered pair (s, t) associates an element s ∈ S to an element t ∈ T . Any collection
of ordered pairs is said to constitute a binary relation between the sets S and T . Many
familiar binary relations are contained in the product of one set with itself. For example,
let X be the closed unit interval, X = [0, 1]. Then the binary relation ≥ consists of all
ordered pairs of numbers in X where the first one in the pair is greater than or equal to
the second one. When, as here, a binary relation is a subset of the product of one set
X with itself, we say that it is a relation on the set X. A binary relation R on X is
represented by the subset of X × X, i.e., R ⊂ X × X. We can build more structure for
a binary relation on some set by requiring that it possesses certain properties.
Definition 2.2 A relation R on X is complete if, for all elements x and y in X, xRy
or yRx.
For example, ≥ on R is complete, while > and = are not. Note that R on X is
reflexive if it is complete.
Definition 2.3 A relation R on X is transitive if, for any three elements x, y, and
z ∈ X, xRy and yRz implies xRz.
For instance, all ≥, =, > on R are transitive.
Consider the consumer's weak preference relation ≿, defined on the consumption set R^n_+. If x ≿ x′, we say that "x is at least as good as x′" for this consumer.
3. ∼ on R^n_+ is symmetric.
2.4 Functions
A function is a relation that associates each element of one set with a single, unique
element of another set. We say that the function f is a mapping, map, or transformation
from one set D to another set R and write f : D → R. We call the set D the domain
and the set R the range of the mapping. If y is the point in the range mapped into by
the point x in the domain, we write y = f (x). In set-theoretic terms, f is a relation from
D to R with the property that for each x ∈ D, there is exactly one y ∈ R such that xf y
(x is related to y via f ).
The image of f is the set of points in the range into which some point in the domain is mapped, i.e.,
I ≡ {y ∈ R | y = f(x) for some x ∈ D}.
The inverse image of a set S ⊂ R under f is the set of points in the domain that are mapped into S, i.e.,
f^{-1}(S) ≡ {x | x ∈ D, f(x) ∈ S}.
The graph of f is the set
G ≡ {(x, y) | x ∈ D, y = f(x)}.
A function f is one-to-one (injective) if it maps distinct points of the domain into distinct points of the range; that is, for all x, x′ ∈ D, whenever f(x) = f(x′), then x = x′. If the image is equal to the range, i.e., if for every y ∈ R there is x ∈ D such that f(x) = y, the function is said
to be onto. If a function is one-to-one and onto (sometimes called bijective), then an
inverse function f −1 : R → D exists that is also one-to-one and onto. The composition
of a function f : A → B and a function g : B → C is the function g ◦ f : A → C given
by (g ◦ f )(a) = g(f (a)) for all a ∈ A.
Fact 2.1 (Least Upper Bound Principle) Any nonempty set of real numbers that is
bounded above has a least upper bound.
This principle is really an axiom of the real numbers. A set S can have at most one least upper bound, because if b_1^* and b_2^* are both least upper bounds for S, then b_1^* ≤ b_2^* and b_2^* ≤ b_1^*, which implies that b_1^* = b_2^*. The least upper bound b^* of S is often called the supremum of S. We write b^* = sup S or b^* = sup_{x∈S} x.
Example 2.2 The set S = (0, 5), consisting of all x such that 0 < x < 5, has many
upper bounds, some of which are 100, 6.73, and 5. Clearly no number smaller than 5
can be an upper bound, so 5 is the least upper bound. Thus, sup S = 5.
A set S is bounded below if there exists a real number a such that x ≥ a for all x ∈ S.
The number a is a lower bound for S. A set S that is bounded below has a greatest
lower bound a∗ , with the property a∗ ≤ x for all x ∈ S, and a∗ ≥ a for all lower bounds
a. The number a∗ is called the infimum of S and we write a∗ = inf S or a∗ = inf x∈S x.
Thus, we summarize:
• sup S = the least number greater than or equal to all numbers in S; and
• inf S = the greatest number less than or equal to all numbers in S.
Theorem 2.1 Let S be a set of real numbers and b∗ a real number. Then sup S = b∗ if
and only if the following two conditions are satisfied:
1. x ≤ b^* for all x ∈ S; and
2. for every ε > 0, there exists some x ∈ S such that x > b^* − ε.
Proof of Theorem 2.1: (=⇒) Since b∗ is an upper bound for S, by definition,
property 1 holds, that is, x ≤ b^* for all x ∈ S. To establish property 2, suppose, by way of contradiction, that there is some ε > 0 such that x ≤ b^* − ε for all x ∈ S. Define b^{**} = b^* − ε. This implies
that b∗∗ is also an upper bound for S and b∗∗ < b∗ . This contradicts our hypothesis that
b∗ is a least upper bound for S. (⇐=) Property 1 says that b∗ is an upper bound for
S. Suppose, on the contrary, that b∗ is not a least upper bound. That is, there is some
other b such that x ≤ b < b∗ for all x ∈ S. Define ε = b∗ − b. Then, we obtain that
x ≤ b∗ − ε for all x ∈ S. This contradicts property 2.
Chapter 3
Topology in Rn
3.1 Sequences on R
A sequence is a function k → x(k) whose domain is the set {1, 2, 3, . . . } of all pos-
itive integers. I denote the set of natural numbers by N = {1, 2, . . . }. The terms
x(1), x(2), . . . , x(k), . . . of the sequence are usually denoted by using subscripts: x1 , x2 , . . . , xk , . . . .
We shall use the notation {x_k}_{k=1}^∞, or simply {x_k}, to indicate an arbitrary sequence of real numbers. A sequence {x_k} of real numbers is said to be nondecreasing if x_k ≤ x_{k+1} for every k, and nonincreasing if x_k ≥ x_{k+1} for every k.
Definition 3.1 The sequence {x_k} converges to x if for every ε > 0, there exists a natural number N_ε such that |x_k − x| < ε for all k > N_ε. The number x is called
the limit of the sequence {xk }. A convergent sequence is one that converges to some
number.
Note that the limit of a convergent sequence is unique. A sequence that does not
converge to any real number is said to diverge. In some cases we use the notation
limk→∞ xk even if the sequence {xk } is divergent. For example, we say that xk → ∞ as
k → ∞. A sequence {xk } is bounded if there exists a number M such that |xk | ≤ M for
all k = 1, 2, . . . . It is easy to see that every convergent sequence is bounded: If xk → x,
by the definition of convergence, only finitely many terms of the sequence can lie outside
the interval I = (x − 1, x + 1). The set I is bounded and the finite set of points from the
sequence that are not in I is bounded, so {xk } must be bounded. On the other hand, is
every bounded sequence convergent? No. For example, the sequence {yk } = {(−1)k } is
bounded but not convergent.
Theorem 3.2 Suppose that the sequences {xk } and {yk } converge to x and y, respec-
tively. Then,
1. limk→∞ (xk ± yk ) = x ± y
2. limk→∞ (xk · yk ) = x · y
3.1.1 Subsequences
Let {x_k} be a sequence. Consider a strictly increasing sequence k_1 < k_2 < k_3 < ⋯ of natural numbers, and form a new sequence {y_j}_{j=1}^∞, where y_j = x_{k_j} for j = 1, 2, .... The sequence {y_j}_j = {x_{k_j}}_j is called a subsequence of {x_k}.
Theorem 3.3 Every subsequence of a convergent sequence is itself convergent, and has
the same limit as the original sequence.
Proof of Theorem 3.3: It is trivial.
Theorem 3.4 If the sequence {xk } is bounded, then it contains a convergent subse-
quence.
Proof of Theorem 3.4: Since {xk } is bounded, we can assume that there exists
some M ∈ R such that |xk | ≤ M for all k ∈ N. Let yn = sup{xk |k ≥ n} for n ∈ N.
By construction, {yn } is a nonincreasing sequence because the set {xk |k ≥ n} shrinks
as n increases. The sequence {y_n} is also bounded because −M ≤ y_n ≤ M. Theorem 3.1 already showed that the sequence {y_n} is convergent. Let x = lim_{n→∞} y_n. By the definition of y_n, we can choose a term x_{k_n} from the original sequence {x_k} (with k_n ≥ n) satisfying |y_n − x_{k_n}| < 1/n. Then |x_{k_n} − x| ≤ |x_{k_n} − y_n| + |y_n − x| → 0 as n → ∞, so the subsequence {x_{k_n}} converges to x.
Definition 3.2 A sequence {xk } of real numbers is called a Cauchy sequence if for
every ε > 0, there exists a natural number Nε such that |xm − xn | < ε for all m, n > Nε .
The theorem below is a characterization of convergent sequences.
Theorem 3.5 A sequence {x_k} of real numbers is convergent if and only if it is a Cauchy sequence.
Proof of Theorem 3.5: (=⇒) Suppose that x_k → x. Given ε > 0, there is a natural number N such that |x_k − x| < ε/2 for all k > N. Then, for all m, n > N, |x_m − x_n| ≤ |x_m − x| + |x − x_n| < ε. Therefore, {x_k} is a Cauchy sequence. (⇐=) Suppose that {x_k} is a Cauchy sequence.
First, we shall show that the sequence is bounded. By the Cauchy property, there is a
number M such that |xk −xM | < 1 for k > M . Moreover, the finite set {x1 , x2 , . . . , xM −1 }
is clearly bounded. Hence, {xk } is bounded. Theorem 3.4 showed that the bounded
sequence {xk } has a convergent subsequence {xkj }. Let x = limj xkj . Because {xk } is a
Cauchy sequence, for every ε > 0, there is a natural number N such that |xm − xn | < ε/2
for all m, n > N. If we take J sufficiently large, we have |x_{k_j} − x| < ε/2 for all j > J.
Then for k > N and j > max{N, J},
|xk − x| = |xk − xkj + xkj − x| ≤ |xk − xkj | + |xkj − x| < ε/2 + ε/2 = ε
Hence xk → x as k → ∞.
Exercise 3.2 Consider the sequence {x_k} with the generic term
x_k = 1/1² + 1/2² + ⋯ + 1/k² = Σ_{i=1}^k 1/i².
Prove that this sequence is a Cauchy sequence. Hint:
1/(n+1)² + 1/(n+2)² + ⋯ + 1/(n+k)²
  < 1/[n(n+1)] + 1/[(n+1)(n+2)] + ⋯ + 1/[(n+k−1)(n+k)]
  = (1/n − 1/(n+1)) + (1/(n+1) − 1/(n+2)) + ⋯ + (1/(n+k−1) − 1/(n+k))
  = 1/n − 1/(n+k) < 1/n.
Exercise 3.3 Prove that a sequence can have at most one limit. Use proof by contra-
diction. Namely, you first suppose, by way of contradiction, that there are two limit
points.
On the other hand, if lim sup_{k→∞} x_k = lim inf_{k→∞} x_k, then {x_k} is convergent. I omit the proof of Theorem 3.6.
Exercise 3.4 Determine the lim sup and lim inf of the following sequences.
1. {xk } = {(−1)k }
2. {x_k} = {(−1)^k (2 + 1/k) + 1}
3.1.4 Infimum and Supremum of Functions
Suppose that f (x) is defined for all x ∈ B, where B ⊂ Rn . We define the infimum and
supremum of the function f over B by
inf_{x∈B} f(x) = inf{f(x) | x ∈ B},   sup_{x∈B} f(x) = sup{f(x) | x ∈ B}.
If a function f is defined over a set B, if inf x∈B f (x) = y, and if there exists a c ∈ B
such that f (c) = y, then we say that the infimum is attained (at the point c) in B. In
this case the infimum y is called the minimum of f over B, and we often write “min”
instead of “inf.” In the same way we write “max” instead of “sup” when the supremum
of f over B is attained in B, and so becomes the maximum.
A set whose elements are sets is often called a family of sets, and so an indexed set
of sets is also called an indexed family of sets. Consider a nonempty indexed family
{Aλ }λ∈Λ of sets. The union and the intersection of this family are the sets
• ∪_{λ∈Λ} A_λ = the set consisting of all x that belong to A_λ for at least one λ ∈ Λ;
• ∩_{λ∈Λ} A_λ = the set consisting of all x that belong to A_λ for all λ ∈ Λ.
The union and the intersection of a sequence {A_n}_{n∈N} = {A_n}_{n=1}^∞ of sets are often written as ∪_{n=1}^∞ A_n and ∩_{n=1}^∞ A_n.
The (Euclidean) distance d(x, y) between two points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) in R^n is the norm ‖x − y‖ of the vector difference between x and y. Thus,
d(x, y) = ‖x − y‖ = √((x_1 − y_1)² + ⋯ + (x_n − y_n)²).
If x_0 is a point in R^n and r is a positive real number, then the set of all points x ∈ R^n whose distance from x_0 is less than r is called the open ball around x_0 with radius r. This open ball is denoted by B_r(x_0). Thus,
B_r(x_0) ≡ {x ∈ R^n | d(x, x_0) < r}.
Definition 3.3 A set S ⊂ Rn is open if, for all x0 ∈ S, there exists some ε > 0 such
that Bε (x0 ) ⊂ S.
On the real line R, the simplest type of open set is an open interval. Let S be any
subset of Rn . A point x0 ∈ S is called an interior point of S if there is some ε > 0 such
that Bε (x0 ) ⊂ S. The set of all interior points of S is called the interior of S, and is
denoted int(S). A set S is said to be a neighborhood of x0 if x0 is an interior point of S,
that is, if S contains some open ball Bε (x0 ) (i.e., Bε (x0 ) ⊂ S) for some ε > 0.
Theorem 3.7 1. The entire space Rn and the empty set ∅ are both open.
2. Arbitrary unions of open sets are open: let Λ be an arbitrary index set. If A_λ is open for each λ ∈ Λ, then ∪_{λ∈Λ} A_λ is also open.
Exercise 3.5 There are two questions. First, draw the graph of S = {(x, y) ∈ R2 |2x −
y < 2 and x − 3y < 5}. Second, prove that S is open in R2 .
Each point in a set is either an interior point or a boundary point of the set. The set
of all boundary points of a set S is said to be the boundary of S and is denoted ∂S or
bd(S). Note that, given any set S ⊂ Rn , there is a corresponding partition of Rn into
three mutually disjoint sets (some of which may be empty), namely;
1. the interior of S, which consists of all points x ∈ Rn such that N ⊂ S for some
neighborhood N of x;
2. the exterior of S, which consists of all points x ∈ Rn for which there exists some
neighborhood N of x such that N ⊂ Rn \S;
3. the boundary of S, which consists of all points x ∈ Rn with the property that every
neighborhood N of x intersects both S and its complement Rn \S.
A set S ⊂ R^n is said to be closed if it contains all its boundary points. The union of S and its boundary (S ∪ ∂S) is called the closure of S, denoted by S̄. A point x belongs
to S̄ if and only if B_ε(x) ∩ S ≠ ∅ for every ε > 0. The closure S̄ of any set S is indeed
closed. In fact, S̄ is the smallest closed set containing S.
Theorem 3.8 1. The whole space Rn and the empty set ∅ are both closed.
Exercise 3.6 Prove Theorem 3.8. Use the fact that the complement of an open set is closed, together with Theorem 3.7.
In topology, any set containing some of its boundary points but not all of them is neither open nor closed. The half-open intervals [a, b) and (a, b], for example, are neither open nor closed. Hence, "open" and "closed" are neither mutually exclusive categories (R^n and ∅ are both open and closed) nor exhaustive ones (a half-open interval is neither).
Definition 3.5 A sequence {xk } in Rn converges to a point x ∈ Rn if for each ε > 0,
there exists a natural number N such that xk ∈ Bε (x) for all k ≥ N , or equivalently, if
d(xk , x) → 0 as k → ∞.
Theorem 3.9 Let {x_k} be a sequence in R^n. Then {x_k} converges to the vector x ∈ R^n if and only if, for each j = 1, ..., n, the real number sequence {x_k^{(j)}}_{k=1}^∞, consisting of the jth component of each vector x_k, converges to x^{(j)} ∈ R, the jth component of x.
Proof of Theorem 3.9: (=⇒) For every k and every j, one has d(x_k, x) = ‖x_k − x‖ ≥ |x_k^{(j)} − x^{(j)}|. It follows that if x_k → x, then x_k^{(j)} → x^{(j)} for each j. (⇐=) Suppose that x_k^{(j)} → x^{(j)} as k → ∞ for each j = 1, ..., n. Then, given any ε > 0, for each j = 1, ..., n there exists a number N_j such that |x_k^{(j)} − x^{(j)}| < ε/√n for all k > N_j. It follows that
d(x_k, x) = √(|x_k^{(1)} − x^{(1)}|² + ⋯ + |x_k^{(n)} − x^{(n)}|²) < √(ε²/n + ⋯ + ε²/n) = ε
for all k > max{N_1, ..., N_n}. This is well defined because n is finite. Therefore, x_k → x as k → ∞.
Exercise 3.7 Prove Theorem 3.10. Apply the same argument in Theorem 3.5 to each
coordinate.
(=⇒ of Property 2) Assume that S is closed and let {x_k} be a convergent sequence with limit x such that x_k ∈ S for each k. Note that x ∈ S̄ by property 1. Since S = S̄ if S is closed,
it follows that x ∈ S. (⇐= of Property 2) By property 1, for any point x ∈ S̄, there is
some sequence {xk } for which xk ∈ S for each k and limk→∞ xk = x. By our hypothesis,
x ∈ S. This shows that x ∈ S̄ implies x ∈ S, i.e., S̄ ⊂ S. By definition, S ⊂ S̄ for any
S. Hence S = S̄, that is, S is closed.
Exercise 3.8 Let the number of commodities of the competitive market be n. Let p_i > 0 be the price of commodity i for each i = 1, ..., n. Let y > 0 be the consumer's income. Define the consumer's budget set B(p, y) as
B(p, y) ≡ {x = (x_1, ..., x_n) ∈ R^n_+ | Σ_{i=1}^n p_i x_i ≤ y}.
Proof of Theorem 3.14: (=⇒) Suppose f is continuous at x0 . Then, for every
ε > 0, there exists a δ > 0 such that
The theorem below shows that continuous mappings preserve the compactness of the
set.
Suppose that f is a continuous function from Rn to Rm . If V is an open set in Rn ,
the image f (V ) = {f (x)|x ∈ V } of V need not be open in Rm . Nor need f (C) be closed
if C is closed. Nevertheless, the inverse image f −1 (U ) = {x|f (x) ∈ U } of an open set U
under continuous function f is always open. Similarly, the inverse image of any closed
set must be closed.
Theorem 3.17 Let f be any function from Rn to Rm . Then f is continuous if and only
if either of the following equivalent conditions is satisfied.
I omit the proof of Theorem 3.17 because it is conceptually involved; just accept the result.
Theorem 3.18 Let S be a compact set in R, let x_* be the greatest lower bound of S, and let x^* be the least upper bound of S. Then, x_* ∈ S and x^* ∈ S.
Proof of Theorem 3.18: Let S ⊂ R be closed and bounded and let x^* be the least upper bound of S. Then, by definition of an upper bound, we have x^* ≥ x for all x ∈ S. If x^* = x for some x ∈ S, we are done. Suppose, therefore, that x^* is strictly greater than every point in S. If x^* > x for all x ∈ S, then x^* ∉ S, so x^* ∈ R\S. Since S is closed, R\S is open. Then, by the definition of open sets, there exists some ε > 0 such that B_ε(x^*) = (x^* − ε, x^* + ε) ⊂ R\S. Since x^* > x for all x ∈ S and B_ε(x^*) ⊂ R\S, we claim that for any x̃ ∈ B_ε(x^*), we must have x̃ > x for all x ∈ S. In particular, x^* − ε/2 ∈ B_ε(x^*) and x^* − ε/2 > x for all x ∈ S. But then this contradicts our hypothesis that x^* is the least upper bound of S. Thus, we must conclude that x^* ∈ S. The same argument applies to the greatest lower bound x_* of S.
Chapter 4
Linear Algebra
Here a_ij denotes the element in the ith row and the jth column. With this notation, we can express f(x) = Ax as
(f^(1)(x), ..., f^(j)(x), ..., f^(m)(x))^T = A (x_1, x_2, ..., x_n)^T, where A = (a_ij)_{m×n},
so that the jth component of f(x) is f^(j)(x) = a_j1 x_1 + a_j2 x_2 + ⋯ + a_jn x_n.
• αA = (αaij )m×n ,
Footnote 1: This is a non-trivial statement, but I take this one-to-one correspondence between linear mappings and matrix representations as a fact, with no proof provided.
• A − B = A + (−1)B = (aij − bij )m×n .
Let f : R^n → R^m and g : R^m → R^p be linear mappings. Then we can take the m × n matrix A = (a_ij)_{m×n} associated with f and the p × m matrix B = (b_ij)_{p×m} associated with g. Consider the composite mapping g ∘ f(x) = g(f(x)). What I would like is a definition of the product of matrices such that g ∘ f ≡ BA. The product C = BA is then defined as the p × n matrix C = (c_ij)_{p×n}, whose element in the ith row and the jth column is the inner product of the ith row of B and the jth column of A. That is,
c_ij = Σ_{r=1}^m b_ir a_rj = b_i1 a_1j + b_i2 a_2j + ⋯ + b_im a_mj   (m terms).
It is important to note that the product BA is well defined only if the number of
columns in B is equal to the number of rows in A.
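A small numerical sketch (mine, not from the notes) of the point just made: composing the linear maps gives the same result as multiplying the matrices, so (BA)x equals B(Ax):

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])   # 2x3 matrix: f maps R^3 to R^2
    B = np.array([[2.0, -1.0],
                  [1.0,  0.0],
                  [0.0,  4.0]])       # 3x2 matrix: g maps R^2 to R^3

    x = np.array([1.0, -2.0, 0.5])    # a point in R^3

    composed = B @ (A @ x)            # g(f(x)): apply the maps one after the other
    via_product = (B @ A) @ x         # the same point via the product matrix BA

    print(np.allclose(composed, via_product))   # True: BA represents g o f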
If A, B, and C are matrices whose dimensions are such that the given operations are
well defined, then the basic properties of matrix of multiplication are:
• (AB)C = A(BC) (associative law)
• A(B + C) = AB + AC (left distributive law)
• (A + B)C = AC + BC (right distributive law)
Exercise 4.2 Show the above three properties when we consider 2 × 2 matrices.
However, matrix multiplication is not commutative. In fact,
• AB ≠ BA, except in special cases
• AB = 0 does not imply that A = 0 or B = 0
• AB = AC and A ≠ 0 do not imply that B = C
A matrix is square if it has an equal number of rows and columns. If A is a square
matrix and n is a positive integer, we define the nth power of A in the obvious way:
A^n = A A ⋯ A   (n factors)
The identity matrix of order n, denoted by In , is the n × n matrix having ones along
the main diagonal and zeros elsewhere:
I_n = diag(1, 1, ..., 1)   (identity matrix)
1. (A^T)^T = A
2. (A + B)^T = A^T + B^T
3. (αA)^T = αA^T
4. (AB)^T = B^T A^T
Exercise 4.4 Prove the above four properties when we consider 2 × 2 matrices.
A square matrix is said to be symmetric if A = AT .
4.2 Determinants and Matrix Inverses
4.2.1 Determinants
Recall that the determinants |A| of 2 × 2 and 3 × 3 matrices are defined by
|A| = a_11a_22 − a_12a_21   (2 × 2 case)
|A| = a_11a_22a_33 + a_12a_23a_31 + a_13a_21a_32 − a_11a_23a_32 − a_12a_21a_33 − a_13a_22a_31   (3 × 3 case)
For a general n × n matrix A = (a_ij), the determinant |A| can be defined recursively. In fact, expanding along the ith row,
|A| = a_i1A_i1 + a_i2A_i2 + ⋯ + a_ijA_ij + ⋯ + a_inA_in,
where the cofactors A_ij are signed determinants of (n − 1) × (n − 1) matrices:
A_ij = (−1)^{i+j} × (the determinant of the matrix obtained from A by deleting row i and column j).
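To make the recursion concrete, here is a short Python sketch (my own illustration) of the cofactor expansion along the first row; for practical work one would instead call numpy.linalg.det, which is far more efficient:

    def det(M):
        # Determinant by cofactor expansion along the first row (illustrative, O(n!)).
        n = len(M)
        if n == 1:
            return M[0][0]
        total = 0.0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in M[1:]]   # delete row 0 and column j
            total += (-1) ** j * M[0][j] * det(minor)
        return total

    A = [[1, 2, 3],
         [0, 4, 5],
         [1, 0, 6]]
    print(det(A))   # 22, matching the 3x3 formula above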
Exercise 4.6 Define A as
A = ( a_11 a_12 a_13 ; a_21 a_22 a_23 ; a_31 a_32 a_33 ).
In Cramer's rule, the solution of the system (∗) is given by x_j = |A_j|/|A|, where the determinant |A_j| is obtained by replacing the jth column of |A| by the column whose components are b_1, b_2, ..., b_n. If the right-hand side of the equation system (∗) consists only of zeros, so that it can be written in matrix form as Ax = 0, the system is called homogeneous. A homogeneous system always has the trivial solution x_1 = x_2 = ⋯ = x_n = 0.
Lemma 4.2 Ax = 0 has nontrivial solutions if and only if |A| = 0.
I omit the proof of Lemma 4.2.
Exercise 4.9 Use Cramer’s rule to solve the following system of equations:
2x1 − 3x2 = 2
4x1 − 6x2 + x3 = 7
x1 + 10x2 = 1.
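Below is a small Python sketch (my own, not part of the exercise) of Cramer's rule for a square system Ax = b; you can use it to check your hand computation for the 3 × 3 system above:

    import numpy as np

    def cramer_solve(A, b):
        # Solve Ax = b by Cramer's rule; requires a nonsingular coefficient matrix.
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        detA = np.linalg.det(A)
        if np.isclose(detA, 0.0):
            raise ValueError("Cramer's rule needs |A| != 0.")
        x = np.empty(len(b))
        for j in range(len(b)):
            Aj = A.copy()
            Aj[:, j] = b                 # replace the jth column by the right-hand side
            x[j] = np.linalg.det(Aj) / detA
        return x

    A = [[2, -3, 0],
         [4, -6, 1],
         [1, 10, 0]]
    b = [2, 7, 1]
    print(cramer_solve(A, b))            # should agree with np.linalg.solve(A, b)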
4.3 Vectors
An n-vector is an ordered n-tuple of numbers. It is often convenient to regard the rows
and columns of a matrix as vectors, and an n-vector can be understood either as a 1 × n
matrix a = (a1 , a2 , . . . , an ) (a row vector ) or as an n × 1 matrix aT = (a1 , a2 , . . . , an )T
(a column vector). The operations of addition, subtraction and multiplication by scalars
of vectors are defined in the obvious way. The dot product (or inner product) of the
n-vectors a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bn ) is defined as
a · b = a_1b_1 + a_2b_2 + ⋯ + a_nb_n = Σ_{i=1}^n a_ib_i.
1. a · b = b · a,
2. a · (b + c) = a · b + a · c,
4. a · a = 0 =⇒ a = 0
5. (a + b) · (a + b) = a · a + 2(a · b) + b · b.
Exercise 4.10 Prove Proposition 4.1. If you find it difficult to do so, focus on vectors
in R2 .
The Euclidean norm or length of the vector a = (a1 , a2 , . . . , an ) is
‖a‖ = √(a · a) = √(a_1² + a_2² + ⋯ + a_n²).
2. ‖a + b‖ ≤ ‖a‖ + ‖b‖ (Minkowski inequality)
Proof of the Cauchy-Schwarz inequality: Define f(t) as
f(t) = (ta + b) · (ta + b) = t²‖a‖² + 2t(a · b) + ‖b‖²,
where t ∈ R. (If a = 0 the inequality is trivial, so assume a ≠ 0.) Because of the definition of dot products, we have f(t) ≥ 0 for any t ∈ R. Solving the equation f(t) = 0 with respect to t by the quadratic formula gives
t = [−(a · b) ± √((a · b)² − ‖a‖²‖b‖²)] / ‖a‖².
Since f(t) ≥ 0 for any t ∈ R, the quadratic can have at most one real root, so the term under the square root cannot be positive. Hence (a · b)² ≤ ‖a‖²‖b‖², that is,
|a · b| ≤ ‖a‖‖b‖.
Exercise 4.11 Prove property 2 in Lemma 4.3. (Hint: it suffices to show that ‖a + b‖² ≤ (‖a‖ + ‖b‖)².)
The Cauchy-Schwarz inequality implies that, for any nonzero a, b ∈ R^n,
−1 ≤ (a · b)/(‖a‖‖b‖) ≤ 1.
Thus, the angle θ between nonzero vectors a and b ∈ R^n is defined by
cos θ = (a · b)/(‖a‖‖b‖),   θ ∈ [0, π].
This definition reveals that cos θ = 0 if and only if a · b = 0, in which case θ = π/2. In symbols,
a ⊥ b ⇐⇒ a · b = 0.
The hyperplane in Rn that passes through the point a = (a1 , . . . , an ) and is orthogo-
nal to the nonzero vector p = (p1 , . . . , pn ), is the set of all points x = (x1 , . . . , xn ) such
that
p · (x − a) = 0
A set of vectors a_1, a_2, ..., a_n is linearly dependent if there exist numbers c_1, c_2, ..., c_n, not all zero, such that
c_1a_1 + c_2a_2 + ⋯ + c_na_n = 0.
If this equation holds only when c_1 = c_2 = ⋯ = c_n = 0, then the vectors are linearly independent.
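As a quick computational check (my own sketch, not from the notes), the rank of the matrix whose columns are the given vectors reveals linear independence; applied to the vectors of the exercise that follows, the rank is 2 < 3, so they are linearly dependent:

    import numpy as np

    def independent(vectors):
        # Columns are linearly independent iff the stacked matrix has full column rank.
        M = np.column_stack(vectors)
        return np.linalg.matrix_rank(M) == M.shape[1]

    a1, a2, a3 = np.array([1, 2]), np.array([1, 1]), np.array([5, 1])
    print(independent([a1, a2]))        # True: two non-proportional vectors in R^2
    print(independent([a1, a2, a3]))    # False: three vectors in R^2 must be dependent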
Exercise 4.12 a1 = (1, 2), a2 = (1, 1), and a3 = (5, 1) ∈ R2 . Show that a1 , a2 , a3 are
linearly dependent.
Let a_1, a_2, ..., a_n ∈ R^n\{0}. Suppose that, for any i = 1, ..., n, it follows that a_i ≠ Σ_{j≠i} λ_j a_j for any λ_1, ..., λ_{i−1}, λ_{i+1}, ..., λ_n ∈ R. Then the entire space R^n is spanned by the set of all linear combinations of a_1, ..., a_n.
Suppose that (∗) has two solutions (u1 , . . . , un ) and (v1 , . . . , vn ). Then,
u1 a1 + · · · + un an = b and v1 a1 + · · · + vn an = b
Subtracting the second equation from the first yields
(u1 − v1 )a1 + · · · + (un − vn )an = 0.
Let c1 = u1 −v1 , . . . , cn = un −vn . The two solutions are different if and only if c1 , . . . , cn
are not all equal to 0. We conclude that if system (∗) has more than one solution, then the column vectors a_1, ..., a_n are linearly dependent.² Equivalently, if the column vectors a_1, ..., a_n are linearly independent, then system (∗) has "at most" one solution.³
Footnote 2: Recall Lemma 4.4.
Footnote 3: Is there anything to say when there is no solution? The answer is yes. I can use the Farkas Lemma to check if there is any solution to the system. See Appendix 1 in this chapter for the Farkas Lemma.
Theorem 4.1 The n column vectors a_1, a_2, ..., a_n of the n × n matrix A = (a_ij)_{n×n}, where a_j = (a_1j, a_2j, ..., a_nj)^T for j = 1, ..., n, are linearly independent if and only if |A| ≠ 0.
Definition 4.2 The rank of a matrix A, written r(A), is the maximum number of
linearly independent column vectors in A. If A is the 0 matrix, we put r(A) = 0.
4.5 Eigenvalues
4.5.1 Motivations
Consider the matrix A below:
A = ( 2 0 ; 0 3 ).
The linear transformation (matrix) A extends x1 into 2x1 along the x1 axis and x2
into 3x2 along the x2 axis. Importantly, there is no interaction between x1 and x2 through
the linear transformation A. This is, I believe, a straightforward extension of the linear
transformation in R into Rn . Define e1 = (1, 0) and e2 = (0, 1) as the unit vectors in R2 .
Then, x = x1 e1 + x2 e2 and y = 2x1 e1 + 3x2 e2 . In other words, (e1 , e2 ) is the unit vector
in the original space and (2e1 , 3e2 ) is the unit vector in the space transformed through
A. Next, consider the matrix B as follows:
B = ( 1 1 ; −2 4 ).
Now, we don't have a clear image about what is going on through the linear transformation B. However, consider the following different unit vectors f_1 = (1, 1) and f_2 = (1, 2). Then,
( 1 1 ; −2 4 ) (1 ; 1) = 2 (1 ; 1)   and   ( 1 1 ; −2 4 ) (1 ; 2) = 3 (1 ; 2),
that is, B f_1 = 2 f_1 and B f_2 = 3 f_2.
Footnote 4: Check Section 4.2.3 for this argument. Recall that the system of linear equations is homogeneous if it is expressed by Ax = 0.
This shows that once we take f1 and f2 as the new coordinate system, the linear
transformation B is the same as A but now along the f1 and f2 axes, respectively.
Finally, consider the matrix C below.
C = ( 2 −3 ; 4 2 ).
It turns out that there is no way of finding the new coordinate system in which
the linear transformation C can be seen as either extending or shrinking the vectors in
each new axis. The reason why we don’t find such a new coordinate system is that we
restrict our attention to Rn . Once we allow for the unit vectors in the new system to be
complex numbers, we will again succeed in finding a new coordinate system in which everything is easy to understand.⁵
Then,
( 2 −3 ; 4 2 ) ( 1 ; (2√3/3)i ) = (2 − 2√3 i) ( 1 ; (2√3/3)i ),   and
( 2 −3 ; 4 2 ) ( 1 ; −(2√3/3)i ) = (2 + 2√3 i) ( 1 ; −(2√3/3)i ),
that is, C f̃_1 = (2 − 2√3 i) f̃_1 and C f̃_2 = (2 + 2√3 i) f̃_2, where f̃_1 = (1, (2√3/3)i) and f̃_2 = (1, −(2√3/3)i).
Definition 4.3 If A is an n × n matrix, then a scalar λ is an eigenvalue of A if there
is a nonzero vector x ∈ Rn such that
Ax = λx
Equivalently,
(A − λI)x = 0,
where I denotes the identity matrix of order n. Note that this linear system of equations has a solution x ≠ 0 if and only if the coefficient matrix has determinant equal to 0, that is, iff |A − λI| = 0. Letting p(λ) = |A − λI|, where A = (a_ij)_{n×n}, we have the equation
p(λ) = |A − λI| = det( a_11 − λ  a_12  ⋯  a_1n ; a_21  a_22 − λ  ⋯  a_2n ; ⋯ ; a_n1  a_n2  ⋯  a_nn − λ ) = 0.
Consider a polynomial equation z^n + a_{n−1}z^{n−1} + ⋯ + a_1z + a_0 = 0 (∗), where a_0, ..., a_{n−1} ∈ C. By the fundamental theorem of algebra, (∗) has n solutions z_1^*, ..., z_n^* with the property that z_i^* ∈ C for each i = 1, ..., n. This includes the case in which z_i^* = z_j^* for some i ≠ j.
Exercise 4.13 Find the eigenvalues and the associated eigenvectors of the matrices A and B:
A = ( 1 2 ; 3 0 ),   B = ( 0 1 ; −1 0 ).
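If you want to check your answers numerically, here is a short sketch (mine, not part of the exercise) using numpy; note that B has no real eigenvalues, so numpy returns complex ones, in line with the discussion of the matrix C above:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 0.0]])
    B = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])

    for name, M in (("A", A), ("B", B)):
        eigenvalues, eigenvectors = np.linalg.eig(M)   # columns of eigenvectors pair with eigenvalues
        print(name, eigenvalues)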
The zeros of this characteristic polynomial are precisely the eigenvalues of A. Denoting the eigenvalues by λ_1, λ_2, ..., λ_n ∈ C, we have
p(λ) = (λ_1 − λ)(λ_2 − λ)⋯(λ_n − λ).
On the other hand, expanding the determinant |A − λI| produces, among other terms, the product of the diagonal entries (a_11 − λ)(a_22 − λ)⋯(a_nn − λ). If we choose a_jj from one of these parentheses and −λ from the remaining n − 1, and then add over j = 1, ..., n, we obtain the term
(a_11 + a_22 + ⋯ + a_nn)(−λ)^{n−1}.
Since we cannot obtain other terms with (−λ)^{n−1} except the above, we conclude that b_{n−1} = a_11 + a_22 + ⋯ + a_nn, the trace of A, where b_{n−1} denotes the coefficient of (−λ)^{n−1} in p(λ).
4.6 Diagonalization
Let A and P be n × n matrices with P invertible. Then A and P −1 AP have the same
eigenvalues. This is true because the two matrices have the same characteristic polyno-
mial:
|P^{-1}AP − λI| = |P^{-1}(A − λI)P| = |P^{-1}| |A − λI| |P| = |A − λI|,
where we use the facts that |P^{-1}| = 1/|P| and |AB| = |A||B| (see Propositions 4.1 and 4.2).
Theorem (Diagonalization) An n × n matrix A is diagonalizable, i.e., there exists an invertible n × n matrix P such that
P^{-1}AP = diag(λ_1, ..., λ_n),
if and only if A has n linearly independent eigenvectors; in that case the columns of P are the eigenvectors and λ_1, ..., λ_n are the corresponding eigenvalues.
Proof of the Diagonalization Theorem: (⇐=) Suppose that A has n linearly independent eigenvectors x_1, ..., x_n, with corresponding eigenvalues λ_1, ..., λ_n. Let P denote the matrix whose columns are x_1, ..., x_n, and let D = diag(λ_1, ..., λ_n). Since Ax_j = λ_j x_j for each j, the jth column of AP equals λ_j x_j, which is the jth column of PD; hence AP = PD. Because the eigenvectors are linearly independent, P is invertible, so P^{-1}AP = D. (=⇒) If A is diagonalizable, there exists an invertible n × n matrix P such that P^{-1}AP = D is diagonal. Then AP = PD, so the jth column x_j of P satisfies Ax_j = d_jj x_j. The columns of P must therefore be eigenvectors of A, and the diagonal elements of D must be the corresponding eigenvalues.
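A quick numerical illustration of the theorem (my own sketch): for the matrix B = (1 1; −2 4) used earlier, the eigenvector matrix P diagonalizes B:

    import numpy as np

    B = np.array([[1.0, 1.0],
                  [-2.0, 4.0]])

    eigenvalues, P = np.linalg.eig(B)    # columns of P are eigenvectors of B
    D = np.diag(eigenvalues)

    # P^{-1} B P should equal diag(lambda_1, lambda_2) up to rounding error.
    print(np.allclose(np.linalg.inv(P) @ B @ P, D))   # True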
Proof of Theorem 4.5: (1) We will show this for n = 2. The eigenvalues of a 2 × 2 matrix A are given by the quadratic equation
|A − λI| = det( a_11 − λ  a_12 ; a_21  a_22 − λ ) = λ² − (a_11 + a_22)λ + (a_11a_22 − a_12a_21) = 0.   (∗)
This is indeed the case. (2) Suppose that Ax_i = λ_i x_i and Ax_j = λ_j x_j with λ_i ≠ λ_j. Multiplying these equalities from the left by x_j^T and x_i^T, respectively, we obtain
x_j^T A x_i = λ_i x_j^T x_i and x_i^T A x_j = λ_j x_i^T x_j.
where the a_ij are constants. Suppose we put x = (x_1, ..., x_n)^T and A = (a_ij). Then it follows from the definition of matrix multiplication that
Q(x) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j = x^T Ax.
Definition 4.4 A quadratic form Q(x) = x^T Ax, as well as its associated symmetric matrix A, is said to be positive definite, positive semidefinite, negative definite, or negative semidefinite according as
Q(x) > 0,   Q(x) ≥ 0,   Q(x) < 0,   or   Q(x) ≤ 0
for all x ∈ R^n\{0}. The quadratic form Q(x) is indefinite if there exist vectors x^* and y^* such that Q(x^*) < 0 and Q(y^*) > 0.
Let A = (aij ) be any n × n matrix. An arbitrary principal minor of order r is the
determinant of the matrix obtained by deleting all but r rows and r columns in A with
the same numbers. In particular, a principal minor of order r always includes exactly r
elements of the main (principal) diagonal. We call the determinant |A| itself a principal minor (no rows or columns are deleted). A principal minor is said to be a leading principal minor of order r (1 ≤ r ≤ n) if it consists of the first r ("leading") rows and columns of |A|.
with the associated symmetric matrix A = (aij )n×n . Let Dk be the leading principal
minor of A of order k and let Δk denote an arbitrary principal minor of order k. Then
we have
4. Q is negative semidefinite ⇐⇒ (−1)k Δk ≥ 0 for all principal minors of order
k = 1, . . . , n.
Proof of Theorem 4.6: We only prove this for n = 2. Then, the quadratic form is
Thus, we obtain
2. Q is positive semidefinite ⇐⇒ λ1 ≥ 0, . . . , λn ≥ 0
4. Q is negative semidefinite ⇐⇒ λ1 ≤ 0, . . . , λn ≤ 0
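Here is a small Python sketch (my own) that classifies a symmetric matrix by the signs of its eigenvalues, following the eigenvalue characterization above; the leading-principal-minor test could be coded in the same spirit:

    import numpy as np

    def classify_quadratic_form(A, tol=1e-10):
        # Classify x^T A x for a symmetric matrix A via the signs of its eigenvalues.
        eigenvalues = np.linalg.eigvalsh(A)       # real eigenvalues of a symmetric matrix
        if np.all(eigenvalues > tol):
            return "positive definite"
        if np.all(eigenvalues >= -tol):
            return "positive semidefinite"
        if np.all(eigenvalues < -tol):
            return "negative definite"
        if np.all(eigenvalues <= tol):
            return "negative semidefinite"
        return "indefinite"

    print(classify_quadratic_form(np.array([[2.0, 1.0], [1.0, 2.0]])))    # positive definite
    print(classify_quadratic_form(np.array([[1.0, 0.0], [0.0, -3.0]])))   # indefinite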
4.8.1 Preliminaries
Definition 4.5 A vector y can be expressed as a linear combination of the vectors in S = {x_1, x_2, ...} if there are real numbers {λ_j} such that
y = Σ_j λ_j x_j.
The set of all vectors that can be expressed as a linear combination of vectors in S is called the span of S and denoted span(S).
Definition 4.6 The rank of a (not necessarily finite) set S of vectors is the size of the
largest subset of linearly independent vectors in S.
Definition 4.7 Let S be a set of vectors and let B ⊂ S be finite and linearly independent. The set B is said to be a maximal linearly independent set if the set B ∪ {x} is linearly dependent for all vectors x ∈ S\B. A maximal linearly independent subset of S is called a basis of S.
Definition 4.8 Let S be a set of vectors. The dimension of span(S) is the rank of S.
Definition 4.9 The kernel or null space of A is the set {x ∈ Rn |Ax = 0}.
The following theorem summarizes the relationship between the span of A and its
kernel.
Theorem 4.10 If A is an m × n matrix, then the dimension of span(A) plus the dimen-
sion of the kernel of A is n.
This is sometimes written as rank(A) + dim[ker(A)] = n.
The column rank of a matrix is the dimension of the span of its columns. Similarly, the row rank is the dimension of the span of its rows.
Theorem 4.11 Let A be an m × n matrix. Then the column rank of A and the column rank of A^T (the transpose of A) are the same.
Thus, the column and row rank of A are equal. This allows us to define the rank of
a matrix A to be the dimension of span(A).
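A quick numerical illustration of the rank-plus-kernel relation (my own sketch, using scipy for the null space):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])         # 2x3 matrix whose rows are proportional

    rank = np.linalg.matrix_rank(A)         # dimension of span(A)
    kernel_dim = null_space(A).shape[1]     # dimension of the kernel {x : Ax = 0}

    print(rank, kernel_dim, rank + kernel_dim == A.shape[1])   # 1 2 True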
4.8.2 Fundamental Theorem of Linear Algebra
Let A be an m × n matrix of real numbers. We will be interested in problems of the
following kind:
Convincing another that Ax = b has a solution (when it does) is easy: one merely exhibits the solution, and they can verify that it does indeed satisfy the equations. What if the system Ax = b does not admit a solution? By framing the problem in the right
way, we can bring to bear the machinery of linear algebra. Specifically, given b ∈ Rm ,
the problem of finding an x ∈ Rn such that Ax = b can be stated as: is b ∈ span(A)?
yb = yAx = (yA)x = 0,
Using the fact that the rank of a matrix and the rank of its transpose coincide, we have
r + dim[ker(C^T)] = m = r + dim[ker(A^T)],
4.8.3 Linear Inequalities
Now consider the following problem:
The problem differs from the earlier one in that “=” has been replaced by “≤.”
Definition 4.11 The set of all non-negative linear combinations of the columns of A is
called the finite cone generated by the columns of A. It is denoted cone(A).
Note the difference between span(A) and cone(A) below:
span(A) = {y ∈ R^m | y = Ax for some x ∈ R^n}
and
cone(A) = {y ∈ R^m | y = Ax for some x ∈ R^n_+}.
Proof: First we prove that both statements cannot hold simultaneously. Suppose not. Let x^* ≥ 0 be a solution to Ax = b and y^* a solution to yA ≥ 0 such that y^*b < 0. Notice that x^* must be a solution to y^*Ax = y^*b, so y^*Ax^* = y^*b. Since y^*A ≥ 0 and x^* ≥ 0, this gives 0 ≤ y^*Ax^* = y^*b < 0, which is a contradiction.
If b ∉ span(A) (i.e., there is no x such that Ax = b), then by the previous theorem there is a y ∈ R^m such that yA = 0 and yb ≠ 0. If it so happens that the given y has the property that yb < 0, we are done. If yb > 0, then negate y and again we are done. So, we may suppose that b ∈ span(A) but b ∉ cone(A), i.e., F = ∅.
Let r be the rank of A. Note that n ≥ r. Since A contains r linearly independent column vectors and b ∈ span(A), we can express b as a linear combination of an r-subset D of linearly independent columns of A. Let D = {a^{i_1}, ..., a^{i_r}} and b = Σ_{t=1}^r λ_{i_t} a^{i_t}. Note that D is linearly independent. Since b ∉ cone(A), at least one of {λ_{i_t}}_{t≥1} is negative.
Now apply the following four step procedure repeatedly. Subsequently, we show that
the procedure must terminate.
1. Choose the smallest index h amongst {i1 , . . . , ir } with λh < 0.
2. Choose y so that y · a = 0 for all a ∈ D\{a^h} and y · a^h ≠ 0. This can be done by the previous theorem because a^h ∉ span(D\{a^h}). Normalize y so that y · a^h = 1. Observe that y · b = λ_h < 0.
3. If y · aj ≥ 0 for all columns aj of A stop, and the proof is complete.
4. Otherwise, choose the smallest index w amongst {1, ..., n} such that y · a^w < 0. Note that a^w ∉ D\{a^h}. Replace D by (D\{a^h}) ∪ {a^w}, i.e., exchange a^h for a^w.
To complete the proof, we must show that the procedure terminates. Let D k denote
the set D at the start of the kth iteration of the four step procedure described above. If
the procedure does not terminate, there is a pair (k, ℓ) with k < ℓ such that D^k = D^ℓ, i.e., the procedure cycles.
Let s be the largest index for which a^s has been removed from D at the end of one of the iterations k, k + 1, ..., ℓ − 1, say at iteration p. Since D^ℓ = D^k, there is a q such that a^s is inserted into D^q at the end of iteration q, where k ≤ q < ℓ. No assumption is made about whether p < q or p > q. Notice that
D^p ∩ {a^{s+1}, ..., a^n} = D^q ∩ {a^{s+1}, ..., a^n}.
Let D^p = {a^{i_1}, ..., a^{i_r}} and b = λ_{i_1}a^{i_1} + ⋯ + λ_{i_r}a^{i_r}, and let y be the vector found in step two of iteration q. Then
0 > y · b = y · (λ_{i_1}a^{i_1} + ⋯ + λ_{i_r}a^{i_r}) = λ_{i_1}(y · a^{i_1}) + ⋯ + λ_{i_r}(y · a^{i_r}) > 0,
which is a contradiction. The first inequality is the observation y · b = λ_h < 0 from step two of iteration q. To see why the last inequality must be true:
• When ij < s, we have from Step 1 of iteration p that λij ≥ 0. From Step 4 of
iteration q, we have y · aij ≥ 0.
• When ij = s, we have from Step 1 of iteration p that λij < 0. From Step 4 of
iteration q we have y · aij < 0.
• When i_j > s, we have from D^p ∩ {a^{s+1}, ..., a^n} = D^q ∩ {a^{s+1}, ..., a^n} and Step 2 of iteration q that y · a^{i_j} = 0.
This completes the proof.
4.8.5 The General Case
The problem of deciding whether the system {x ∈ Rn | Ax ≤ b} has a solution can be
reduced to the problem of deciding if Bz = b, z ≥ 0 has a solution for a suitable matrix
B.
First observe that any inequality of the form Σ_j a_ij x_j ≥ b_i can be turned into an equation by subtracting a surplus variable. That is, define a new variable s_i ≥ 0 such that
Σ_j a_ij x_j − s_i = b_i.
Similarly, an inequality of the form Σ_j a_ij x_j ≤ b_i can be converted into an equation by adding a slack variable s_i ≥ 0:
Σ_j a_ij x_j + s_i = b_i.
As an example, we derive the Farkas alternative for the system {x|Ax ≤ b, x ≥ 0}.
Deciding solvability of Ax ≤ b for x ≥ 0 is equivalent to solvability of Ax + Is = b where
x, s ≥ 0. Set B = [A|I] and z = (x, s)T and we can write the system as Bz = b, z ≥ 0.
Now apply the Farkas lemma to this system: either Bz = b has a solution z ≥ 0, or there exists y ∈ R^m such that
yB ≥ 0, yb < 0.
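As a computational aside (my own sketch, not part of the notes), one can check whether Ax ≤ b, x ≥ 0 is feasible by solving a linear program with a zero objective; scipy's linprog reports infeasibility exactly when the Farkas-type alternative holds:

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 2.0],
                  [3.0, 1.0]])
    b = np.array([4.0, 6.0])

    # Zero objective: we only care about feasibility of Ax <= b, x >= 0.
    result = linprog(c=np.zeros(A.shape[1]), A_ub=A, b_ub=b, bounds=(0, None))

    print(result.status == 0, result.x)   # status 0 means a feasible point was found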
2. (α + β) + γ = α + (β + γ) ∀α, β, γ ∈ K (addition is associative);
4. For every α ≠ 0 in K, there exists a number (reciprocal element) γ ∈ K such that αγ = 1.
α(β + γ) = αβ + αγ.
The most commonly encountered concrete examples of number fields are the follow-
ing:
1. The field of rational numbers, i.e., of quotients p/q where p and q ≠ 0 are ordinary integers, subject to the ordinary operations of arithmetic. It should be
ordinary integers subject to the ordinary operations of arithmetic. It should be
noted that the integers by themselves do not form a field, since they do not satisfy
axiom F2-4. It follows that every field K has a subset isomorphic to the field of
rational numbers.
2. The field of real numbers, having the set of all points of the real line as its geometric
counterpart. The set of real numbers is denoted as R. An axiomatic treatment of
the field of real numbers is achieved by supplementing axioms F 1, F 2, F 3 with the
axioms of order and the least upper bound principle.
3. The field of complex numbers of the form a + ib, where a and b are real numbers
(i is not a real number), equipped with the following operations of addition and
multiplication:
The set of complex numbers is denoted as C. For numbers of the form a + i0, these
operations reduce to the corresponding operations for real numbers; briefly I write
a + i0 = a and call complex numbers of this form real. Thus, it can be said that
the field of complex numbers has a subset isomorphic to the field of real numbers.
Complex numbers of the form 0 + ib are said to be (purely) imaginary and are designated briefly by ib. It follows from the multiplication rule that i · i = −1.
4.9.2 Definitions
The concept of a linear space generalizes that of the set of all vectors. The generalization
consists first in getting away from the concrete nature of the objects involved (directed
line segments) without changing the properties of the operations on the objects, and
secondly in getting away from the concrete nature of the admissible numerical factors
(real numbers). This leads to the following definition.
Definition 4.12 A set V is called a linear (or affine) space over a field K if
1. Given any two elements x, y ∈ V , there is a rule (the addition rule) leading to a
(unique) element x + y ∈ V , called the sum of x and y;
2. Given any element x ∈ V and any number λ ∈ K, there is a rule (multiplication by a number) leading to a (unique) element λx ∈ V, called the product of the element x and the number λ;
3. These two rules obey the axioms listed below, VS1 and VS2.
1. x + y = y + x for every x, y ∈ V ;
2. (x + y) + z = x + (y + z) for every x, y, z ∈ V ;
3. There exists an element 0 ∈ V (the zero vector ) such that x + 0 = x for every
x∈V;
4. For every x ∈ V , there exists an element y ∈ V (the negative element) such that
x + y = 0.
Every vector x ∈ V can be written in terms of a basis e_1, e_2, ..., e_n as
x = ξ_1 e_1 + ξ_2 e_2 + ⋯ + ξ_n e_n.   (∗)
Moreover, this representation is unique: if we have two expansions
x = ξ_1 e_1 + ξ_2 e_2 + ⋯ + ξ_n e_n,
x = η_1 e_1 + η_2 e_2 + ⋯ + η_n e_n
for a vector x, then, subtracting them term by term, we obtain the relation
(ξ_1 − η_1)e_1 + (ξ_2 − η_2)e_2 + ⋯ + (ξ_n − η_n)e_n = 0,
from which, by the assumption that the vectors e_1, e_2, ..., e_n are linearly independent, we find that
ξ_1 = η_1, ξ_2 = η_2, ..., ξ_n = η_n.
The uniquely defined numbers ξ1 , . . . , ξn are called the components of the vector x with
respect to the basis e1 , . . . , en
The fundamental significance of the concept of a basis for a linear space consists
in the fact that when a basis is specified, the originally abstract linear operations in
the space become ordinary linear operations with numbers, i.e., the components of the
vectors with respect to the given basis. In fact, we have the following.
Theorem 4.14 When two vectors of a linear space V are added, their components (with
respect to any basis) are added. When a vector is multiplied by a number λ, all its
components are multiplied by λ.
If, in a linear space V , we can find n linearly independent vectors while every n + 1
vectors of the space are linearly dependent, then the number n is called the dimension
of the space V and the space V itself is called n-dimensional. A linear space in which
we can find an arbitrarily large number of linearly independent vectors is called infinite-
dimensional.
Theorem 4.16 If there is a basis in the space V , then the dimension of V equals the
number of basis vectors.
4.9.4 Subspaces
Definition 4.14 (Subspaces) Suppose that a set W of elements of a linear space V
has the following properties:
1. If x, y ∈ W, then x + y ∈ W;
2. If x ∈ W and λ ∈ K, then λx ∈ W.
Then, every set W ⊂ V with properties 1 and 2 above is called a linear subspace (or simply a subspace) of the space V.
Definition 4.15 (The Direct Sum) A linear space W is the direct sum of given subspaces W1 , . . . , Wm ⊂ W if the following two conditions are satisfied:
1. For every x ∈ W , there is an expansion
x = x1 + · · · + xm ,
where x1 ∈ W1 , . . . , xm ∈ Wm ;
2. This expansion is unique: if
x = x1 + · · · + xm = y1 + · · · + ym
with xh , yh ∈ Wh for h = 1, . . . , m, then
x1 = y1 , . . . , xm = ym .
4.9.5 Morphisms of Linear Spaces
Definition 4.16 Let ϕ be a rule which assigns to every vector x of a linear space V a vector ϕ(x) in a linear space V ′ over the same field K. Then, ϕ is called a morphism (or linear operator) if the following two conditions hold:
1. ϕ(x + y) = ϕ(x) + ϕ(y) for every x, y ∈ V ;
2. ϕ(λx) = λϕ(x) for every x ∈ V and every λ ∈ K.
Corollary 4.1 Every n-dimensional linear space over a field K is K-isomorphic to the
space K n . In particular, every n-dimensional complex space is C-isomorphic to the space
Cn , and every n-dimensional real space is R-isomorphic to the space Rn .
Chapter 5
Calculus
For a function y = f(x) of one variable, we write
dy/dx = f ′(x)
to indicate that f ′(x) gives us the (instantaneous) amount, dy, by which y changes per unit change, dx, in x. If the first derivative is itself a differentiable function, we can take its derivative, which gives the second derivative of the original function,
d2 y/dx2 = f ′′(x).
If a function possesses continuous derivatives f ′ , f ′′ , . . . , f (n) , it is called n-times continuously differentiable, or a C n function. Some rules of differentiation are provided below:
Later in this note on multivariate calculus, we are going to discuss some of the above
properties in details from a more general perspective. Until then, just remember them
so that you can use them anytime.
5.2 Real-Valued Functions of Several Variables
f : D → R is said to be a real-valued function if D is any set and the target set R is a subset of the real numbers. For vectors in Rn , define the following: x ≥ y if xi ≥ yi for every i = 1, . . . , n; and x ≫ y if xi > yi for every i = 1, . . . , n.
5.3 Gradients
If z = F (x, y) and C is any number, we call the graph of the equation F (x, y) = C a
level curve for F . The slope of the level curve F (x, y) = C at a point (x, y) is given by
the formula
F (x, y) = C =⇒ y′ = dy/dx = − [ ∂F (x, y)/∂x ] / [ ∂F (x, y)/∂y ] = − F1 (x, y)/F2 (x, y).
If (x0 , y0 ) is a particular point on the level curve F (x, y) = C, the slope at (x0 , y0 ) is
−F1 (x0 , y0 )/F2 (x0 , y0 ). The equation for the tangent hyperplane T is
y − y0 = −[F1 (x0 , y0 )/F2 (x0 , y0 )](x − x0 ),
or, rearranging,
F1 (x0 , y0 )(x − x0 ) + F2 (x0 , y0 )(y − y0 ) = 0.
The vector (F1 (x0 , y0 ), F2 (x0 , y0 )) is said to be the gradient of F at (x0 , y0 ) and is often denoted by ∇F (x0 , y0 ) (∇ is pronounced “nabla”). The vector (x − x0 , y − y0 ) is a vector
on the tangent hyperplane T which implies that ∇F (x0 , y0 ) is orthogonal to the tangent
hyperplane T at (x0 , y0 ).
The derivative of f at x along the vector a is defined by
fa′ (x) = lim_{h→0} ( f(x + ha) − f(x) ) / h,
or, with components,
fa′ (x) = lim_{h→0} ( f(x1 + ha1 , . . . , xn + han ) − f(x1 , . . . , xn ) ) / h.
Define g(h) = f(x + ha). Then, (g(h) − g(0))/h = (f(x + ha) − f(x))/h. Letting h tend to 0, we have g′(0) = fa′ (x). Since g′(h) = Σ_{i=1}^n fi (x + ha)ai , we get g′(0) = Σ_{i=1}^n fi (x)ai . Hence,
fa′ (x) = Σ_{i=1}^n fi (x)ai = ∇f(x) · a.
This equation shows that the derivative of f along the vector a is equal to the inner product of the gradient of f and a. If ‖a‖ = 1, the number fa′ (x) is called the directional derivative of f at x, in the direction a.
3. ∇f (x) measures how fast the function increases in the direction of maximal in-
crease.
Proof of Theorem 5.1: By introducing θ as the angle between the vectors ∇f(x) and a, we have
fa′ (x) = ∇f(x) · a = ‖∇f(x)‖ ‖a‖ cos θ.
Note that cos θ ≤ 1 for all θ and cos 0 = 1. So when ‖a‖ = 1, it follows that at points where ∇f(x) ≠ 0, the number fa′ (x) is largest when θ = 0, i.e., when a points in the same direction as ∇f(x), while fa′ (x) is smallest when θ = π, that is, cos π = −1, i.e., when a points in the opposite direction to ∇f(x). Moreover, it follows that the length of ∇f(x) equals the magnitude of the maximum directional derivative.
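A small numerical sketch of these facts (my own example with f(x, y) = x²y + y³; the point and direction are arbitrary choices) compares a finite-difference directional derivative with ∇f(x) · a and checks that the derivative in the gradient direction equals ‖∇f(x)‖.

import numpy as np

def f(p):
    x, y = p
    return x**2 * y + y**3

def grad_f(p):
    x, y = p
    return np.array([2 * x * y, x**2 + 3 * y**2])

p = np.array([1.0, 2.0])
a = np.array([3.0, 4.0]) / 5.0          # a unit vector
h = 1e-6

num_dir_deriv = (f(p + h * a) - f(p)) / h
print(num_dir_deriv, grad_f(p) @ a)      # both are approximately 12.8

# The directional derivative in the gradient direction equals ||grad f(p)||.
g = grad_f(p)
print(grad_f(p) @ (g / np.linalg.norm(g)), np.linalg.norm(g))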
Proof of the mean-value theorem: We assume that the mean-value theorem for functions of one variable is correct. Define ϕ(λ) = f(λx + (1 − λ)y). Then, using the chain rule which we will cover later, ϕ′(λ) = ∇f(λx + (1 − λ)y) · (x − y). According to the mean-value theorem for functions of one variable, there exists a number λ0 ∈ (0, 1) such that ϕ(1) − ϕ(0) = ϕ′(λ0 ). Putting w = λ0 x + (1 − λ0 )y, the theorem follows.
Definition 5.3 S ⊂ Rn is a convex set if for all x, y ∈ S, we have
αx + (1 − α)y ∈ S,
for all α ∈ [0, 1]
We say that z is a convex combination of x and y if z = αx + (1 − α)y for some
α ∈ [0, 1]. We have a very simple and intuitive rule defining convex sets: A set is convex
if and only if we can connect any two points in the set by a straight line that lies entirely
within the set.
Exercise 5.2 Construct an example in which two sets S and T are convex but S ∪ T is
not convex.
5.7 Concavity/Convexity for C 2 Functions
Suppose that z = f(x) = f(x1 , . . . , xn ) is a C 2 function in an open convex set S in Rn . The matrix D2 f(x) = (fij (x))n×n is the Hessian matrix of f at x, and the determinants Δ2(r) f(x) of its r × r principal submatrices are the principal minors of D2 f(x) of order r. Here fij (x) = ∂ 2 f(x)/∂xi ∂xj for any i, j = 1, . . . , n.
Theorem 5.4
1. f is convex in S ⇐⇒ Δ2(r) f(x) ≥ 0 for all x ∈ S and all Δ2(r) f(x), r = 1, . . . , n;
2. f is concave in S ⇐⇒ (−1)r Δ2(r) f(x) ≥ 0 for all x ∈ S and all Δ2(r) f(x), r = 1, . . . , n.
Proof of Theorem 5.4: (⇐=) The proof relies on the chain rule (Theorem 5.15), which we are going to cover later in this course. Just take it for granted until then. Take two points x, x0 ∈ S and let t ∈ [0, 1]. Define g(t) = f(x0 + t(x − x0 )).
But this shows that f (·) is convex. The concavity of f easily follows by replacing f with
−f . (=⇒) Suppose f (·) is convex. According to Theorem 4.6 on quadratic forms, it
suffices to show that for all x ∈ S and all h1 , . . . , hn , we have
Q = Σ_{i=1}^n Σ_{j=1}^n fij (x)hi hj ≥ 0.
Exercise 5.3 Let f (x, y) = 2x − y − x2 + 2xy − y 2 for all (x, y) ∈ R2 . Check whether f
is concave, convex, or neither.
Exercise 5.4 The CES (Constant Elasticity of Substitution) function f is defined for K > 0, L > 0 by
f(K, L) = A [ δK −ρ + (1 − δ)L−ρ ]−1/ρ ,
where A > 0, ρ ≠ 0, and 0 ≤ δ ≤ 1. Show that f is concave if ρ ≥ −1 and convex if ρ ≤ −1.
Theorem 5.5
1. |D2(r) f(x)| > 0 for all x ∈ S and all r = 1, . . . , n =⇒ f is strictly convex;
2. (−1)r |D2(r) f(x)| > 0 for all x ∈ S and all r = 1, . . . , n =⇒ f is strictly concave,
where |D2(r) f(x)| denotes the leading principal minor of order r of the Hessian matrix D2 f(x).
Proof of Theorem 5.5: Define the function g(·) as in the proof of Theorem 5.4 above. If the specified conditions are satisfied, the Hessian matrix D2 f(x) is positive definite by Theorem 4.6 on quadratic forms. So, for x ≠ x0 , g′′(t) > 0 for all t ∈ [0, 1]. It follows that g(·) is strictly convex. Then, we have
g(t) = g(t · 1 + (1 − t) · 0) < tg(1) + (1 − t)g(0) = tf(x) + (1 − t)f(x0 )
for all t ∈ (0, 1), so f is strictly convex. The strict concavity of f is obtained by replacing f with −f .
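As a small illustration of the sign tests above (my own sketch for the quadratic f(x, y) = −x² − xy − y², whose Hessian is constant, so one evaluation suffices), the leading principal minors can be checked numerically.

import numpy as np

H = np.array([[-2.0, -1.0],
              [-1.0, -2.0]])          # Hessian of f(x, y) = -x^2 - xy - y^2

leading_minors = [np.linalg.det(H[:k, :k]) for k in range(1, 3)]
print(leading_minors)                 # [-2.0, 3.0]

# (-1)^r |D^2_(r) f| > 0 for r = 1, 2  =>  f is strictly concave (Theorem 5.5).
strictly_concave = all((-1) ** r * m > 0 for r, m in enumerate(leading_minors, start=1))
print(strictly_concave)               # True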
Corollary 5.2 Let z = f (x, y) be a C 2 function defined on an open convex set S ⊂ R2 .
Then,
for all x, x0 ∈ S.
3. The corresponding result for convex (strictly convex) functions is obtained by chang-
ing ≤ to ≥ (< to >) in the above inequality.
for all λ ∈ (0, 1). Rearranging the above inequality, for all λ ∈ (0, 1), we obtain
Multiplying the inequality in (i) by λ > 0 and the inequality in (ii) by 1 − λ > 0, we obtain
λ(f(x) − f(z)) + (1 − λ)(f(x0 ) − f(z)) ≤ ∇f(z) · [ λ(x − z) + (1 − λ)(x0 − z) ] = 0 (iii)
because z = λx + (1 − λ)x0 . This shows that f is concave. (2) (=⇒) Suppose that
f is strictly concave in S. Then, inequality (∗) is strict for x ≠ x0 . (⇐=) With z =
x0 + λ(x − x0 ), we have
H(2) is true because it is indeed the definition of concavity of f . Now, we will show that
H(k) =⇒ H(k + 1). We execute a series of computations below.
f ( Σ_{h=1}^{k+1} λh xh ) = f ( Σ_{h=1}^{k} λh xh + λk+1 xk+1 )
    = f ( (Σ_{h=1}^{k} λh ) Σ_{h=1}^{k} [ λh / Σ_{h′=1}^{k} λh′ ] xh + λk+1 xk+1 )
    ≥ (Σ_{h=1}^{k} λh ) f ( Σ_{h=1}^{k} [ λh / Σ_{h′=1}^{k} λh′ ] xh ) + λk+1 f(xk+1 )   (because of H(2))
    ≥ (Σ_{h=1}^{k} λh ) Σ_{h=1}^{k} [ λh / Σ_{h′=1}^{k} λh′ ] f(xh ) + λk+1 f(xk+1 )   (because H(k) is true under the inductive hypothesis)
    = Σ_{h=1}^{k+1} λh f(xh ).
One can extend Jensen’s inequality to the continuum. Let X be a random variable
which takes values on the real line R. Define g : R → R to be a probability density
function. Then, the continuous version of Jensen's inequality is given as follows.
Jensen's Inequality (Continuum Version): A function f(·) is concave on R if and only if
f ( ∫_{−∞}^{∞} x g(x) dx ) ≥ ∫_{−∞}^{∞} f(x) g(x) dx
for every probability density function g.
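A quick Monte Carlo illustration (my own sketch, taking the concave function f(x) = −x² and the standard normal density for g) shows the inequality numerically.

import numpy as np

rng = np.random.default_rng(0)
draws = rng.standard_normal(1_000_000)   # Monte Carlo draws from the density g

f = lambda x: -x**2
lhs = f(np.mean(draws))                  # approximately f( integral of x g(x) dx ) = 0
rhs = np.mean(f(draws))                  # approximately integral of f(x) g(x) dx = -1
print(lhs, rhs, lhs >= rhs)              # lhs is near 0, rhs is about -1, so True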
Thus, λx + (1 − λ)x′ ∈ Pa . This proves that Pa is convex. We leave the rest of the proof
as an exercise.
Theorem 5.9 (Quasiconcavity is preserved under positive monotone transformation)
Let f (·) be defined on a convex set S in Rn and let F be a function of one variable whose
domain includes f (S). If f (·) is quasiconcave (quasiconvex) and F is strictly increasing,
then F (f (·)) is quasiconcave (quasiconvex).
Proof of Theorem 5.9: Suppose f (·) is quasiconcave. Using the previous theorem
(Theorem 5.8), we must have
f ( λx + (1 − λ)x′ ) ≥ min{f(x), f(x′ )}.
Then, using the chain rule (Theorem 5.15) which will be shown later, we have
g′(t) = ∇f(x0 + t(x − x0 )) · (x − x0 ).
Suppose f (x) ≥ f (x0 ). By Theorem 5.8, g(t) ≥ g(0) for all t ∈ [0, 1]. For any t ∈ (0, 1],
we have
(g(t) − g(0))/t ≥ 0.
Letting t → 0, we obtain
lim_{t→0} (g(t) − g(0))/t = g′(0) ≥ 0.
This implies
g′(0) = ∇f(x0 ) · (x − x0 ) ≥ 0.
The content of Theorem 5.10 is that for any quasiconcave function f (·) and any pair
of points x and x0 with f (x) ≥ f (x0 ), the gradient vector ∇f (x0 ) and the vector (x − x0 )
must form an acute angle.
for every x ∈ S. If the Hessian matrix D2 f (x) is negative definite in the subspace
{z ∈ Rn |∇f (x) · z = 0} for every x ∈ S, then f (·) is strictly quasiconcave.
Proof of Theorem 5.11: (=⇒) Suppose f(·) is quasiconcave. Let x ∈ S. Choose x′ ∈ S such that ∇f(x) · (x′ − x) = 0. Since f(·) is quasiconcave, f(x′ ) ≤ f(x). To see this, draw the figure. Then,
f(x′ ) − f(x) ≤ ∇f(x) · (x′ − x) = 0.
By Theorem 5.6, f(·) is concave in the subspace for which ∇f(x) · (x′ − x) = 0. With Theorem 5.4 and Theorem 4.6, concavity of f(·) is equivalent to negative semidefiniteness of the Hessian matrix. Then, the conclusion follows. (⇐=) This proof is based on “A Characterization of Quasi-Concave Functions,” by Kiyoshi Otani in Journal of Economic Theory, vol 31, (1983), 194-196. Let x, x′ ∈ S be such that f(x′ ) ≥ f(x). This choice entails no loss of generality. For λ ∈ [0, 1], define
g(λ) = f(x + λ(x′ − x)).
Note that g(0) = f(x), g(1) = f(x′ ), and g(1) ≥ g(0) because f(x′ ) ≥ f(x) by our hypothesis. What we want to show is that g(λ) ≥ g(0) for any λ ∈ [0, 1]. By the mean-value theorem (Theorem 5.2), there exists λ0 ∈ (0, 1) such that g′(λ0 ) = 0. Let
x0 = x + λ0 (x′ − x). Then, g′(λ0 ) = 0 ⇔ ∇f(x0 ) · (x′ − x) = 0. Assume that ∇f(x) ≠ 0 for any x ∈ S; this assumption is innocuous here. Let p denote ∇f(x0 ) for notational simplicity.
Since pT p > 0, there exists a C 2 function β : R → R, defined for sufficiently small |α| > 0, such that β(0) = 0 and
f ( β(α)p + α(x′ − x) + x0 ) = f(x0 ) for any small α.
Again, for notational simplicity, we denote β(α)p + α(x′ − x) + x0 by z(α). By differentiating f(z(α)) = f(x0 ), we have
∇f(z(α)) · [ β′(α)p + (x′ − x) ] = 0 (∗)
because β(λ − λ0 ) ≥ 0 for λ sufficiently close to λ0 . Accordingly, g(λ0 ) does not have an
interior minimum in [0, 1], unless it is constant. Hence, g(λ) ≥ g(0) for any λ ∈ [0, 1].
The last step is based on Corollary 4.3 in “Nine Kinds of Quasiconcavity and Concavity,”
by Diewert, Avriel, and Zang in Journal of Economic Theory, vol 25, (1981), 397-420.
v T ∇f (x0 ) = 0 implies g(t) ≡ f (x0 + tv) does not attain a (semistrict) local minimum at
t = 0.
Then,
2. A sufficient condition for f to be strictly quasiconcave is that (−1)r Br (x) > 0 for
all x ∈ S and all r = 1, . . . , n.
A transformation f : Rn → Rm is linear if f(x1 + x2 ) = f(x1 ) + f(x2 ) and f(αx1 ) = αf(x1 ) for all x1 , x2 ∈ Rn and all scalars α ∈ R. Our knowledge of linear algebra tells us that
for every linear transformation f : Rn → Rm , there is a unique m × n matrix A such
that f (x) = Ax for all x ∈ Rn .
In matrix form,
( f (1) (x), f (2) (x), . . . , f (m) (x) )′ = A x,
where A = (aij ) is the m × n matrix of the transformation and x = (x1 , . . . , xn )′ . In particular,
f (j) (x) = aj1 x1 + aj2 x2 + · · · + ajn xn = Σ_{i=1}^n aji xi .
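A small numerical sketch (my own example with an arbitrary 2 × 3 matrix A) illustrates both the componentwise formula and linearity.

import numpy as np

A = np.array([[1.0, -2.0, 0.5],
              [3.0,  0.0, 4.0]])
x = np.array([2.0, 1.0, -1.0])

fx = A @ x
print(fx)                                # [-0.5  2. ]
print(A[0] @ x, A[1] @ x)                # each component is a row of A times x

# Linearity: f(x1 + x2) = f(x1) + f(x2) and f(alpha * x1) = alpha * f(x1).
x2 = np.array([0.0, 5.0, 2.0])
assert np.allclose(A @ (x + x2), A @ x + A @ x2)
assert np.allclose(A @ (3.0 * x), 3.0 * (A @ x))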
5.9.1 Linear Approximations and Differentiability
If a one variable function f is differentiable at a point x0 , then the linear approximation
to f around x0 is given by
f(x0 + h) ≈ f(x0 ) + f ′(x0 )h,
for small values of h. Here ≈ stands for “approximately equal to.” The approximation error is
O(h) = true value − approximate value = f(x0 + h) − f(x0 ) − f ′(x0 )h.
Moreover, f(·) is differentiable at x0 if and only if there exists a number c ∈ R such that
lim_{h→0} ( f(x0 + h) − f(x0 ) − ch ) / h = 0.
If such a number c ∈ R exists, it is unique and c = f ′(x0 ). These arguments can be generalized straightforwardly to higher dimensional spaces. In particular, a transformation f(·) is differentiable at a point x0 if it admits a linear approximation around x0 :
lim_{h→0} ‖f(x0 + h) − f(x0 ) − Ch‖ / ‖h‖ = 0 for some m × n matrix C.
If such a matrix C exists, it is called the total derivative of f (·) at x0 , and is denoted
by Df (x0 ).
If I restrict attention to real-valued functions, the following theorem establishes an
equivalence between directional derivative along every vector and total differentiation.
In particular, if ej = (0, . . . , 0, 1, 0, . . . , 0) is the jth standard unit vector in Rn (with the 1 in the jth position), then ∇f(x) · ej is the partial derivative ∂f(x)/∂xj = fj (x) with respect to the jth variable.
On the other hand, ∇f (x) · ej is the jth component of ∇f (x). Hence, ∇f (x) is the row
vector
∇f (x) = (∇f (x) · e1 , . . . , ∇f (x) · en ) = (f1 (x), . . . , fn (x)) .
Suppose that I am interested in checking if a given transformation f : Rn → Rm is
differentiable. Then, the next theorem shows that it suffices to check if each component
real-valued function f j : Rn → R is differentiable (j = 1, . . . , m). 2
The m × n matrix Df(x) whose jth row is ∇f (j) (x), i.e., whose (j, i) entry is ∂f (j) (x)/∂xi , is called the Jacobian matrix of f(·) at x. Its rows are the gradients of the component functions of f(·).
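The Jacobian can be approximated row by row with forward differences; the helper below is my own sketch (the function f and the step size h are illustrative choices, not from the notes).

import numpy as np

def f(x):
    # f : R^2 -> R^3, with component functions f^(1), f^(2), f^(3)
    return np.array([x[0] * x[1], np.sin(x[0]), x[0] ** 2 + x[1] ** 2])

def numerical_jacobian(f, x, h=1e-6):
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        J[:, i] = (f(x + step) - fx) / h   # i-th column: partials with respect to x_i
    return J

x = np.array([1.0, 2.0])
print(numerical_jacobian(f, x))
# Analytic Jacobian at (1, 2): rows [2, 1], [cos(1), 0], [2, 4]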
Proof of Theorem 5.13: Let C be an m × n matrix and let O(h) = f (x + h) −
f (x) − Ch where h ∈ Rn .
Oj (h) = f (j) (x + h) − f (j) (x) − ( cj1 h1 + cj2 h2 + · · · + cjn hn ), j = 1, . . . , m.
Hence, f (·) is differentiable at x if and only if each f (j) is differentiable at x. Also, the
j-th row of the matrix C = Df (x) is the derivative of f (j), that is, Cj = ∇f (j) (x).
Theorem 5.14 (Differentiability =⇒ Continuity) If a transformation f from A ⊂ Rn into Rm is differentiable at an interior point x0 ∈ A, then f is continuous at x0 .
Proof of Theorem 5.14: Let C = Df (x). Then, for small but nonzero h ∈ Rn , the
triangle inequality yields
Since f is differentiable at x0 ,
The next theorem shows that the order of the two operations does not matter for the final product: (1) construct a composite mapping and differentiate it; or (2) differentiate each mapping and construct a composite of the two derivatives.
Also,
g(f(x + h)) − g(f(x)) = Dg(f(x))k + eg (k), where eg (k)/‖k‖ → 0 as k → 0.
Note that there exists some fixed constant K such that ‖k(h)‖ ≤ K‖h‖ for all small h. Otherwise, f and g are not differentiable. Note also that for all ε > 0, ‖eg (k)‖ < ε‖k‖ for ‖k‖ small, because g is differentiable. Thus, when h is small, we can summarize
Hence,
‖eg (k(h))‖ / ‖h‖ → 0 as h → 0.
Then, we execute a series of computations below:
‖e(h)‖ / ‖h‖ = ‖D(g ◦ f)(x) ef (h) + eg (k(h))‖ / ‖h‖
            ≤ ‖D(g ◦ f)(x) ef (h)‖ / ‖h‖ + ‖eg (k(h))‖ / ‖h‖   (∵ triangle inequality)
            ≤ ‖D(g ◦ f)(x)‖ ‖ef (h)‖ / ‖h‖ + ‖eg (k(h))‖ / ‖h‖   (∵ Cauchy-Schwarz inequality)
Since ‖ef (h)‖/‖h‖ → 0 and ‖eg (k(h))‖/‖h‖ → 0 as h → 0, we conclude that ‖e(h)‖/‖h‖ → 0 as h → 0.
“Sketch” of the Proof of Inverse Function Theorem: The proof consists of
5 steps. However, the proof of each step will be either briefly sketched or completely
skipped due to its technical difficulty. For simplicity, we assume that x0 = 0 and f (x0 ) =
y0 = 0.
associated with A−1 and so is C k . Therefore, we can rather talk about f ◦ g instead of
f so that Df (0) = In can be assumed with no loss of generality.
Proof of Step 2:
Step 3: There exists an open set V such that f |U : U → V is onto. That is, for any y ∈ V , there exists x ∈ U such that f(x) = y.
Step 5: f −1 |V : V → U is C k .
f1 (x1 , x2 , . . . , xn , y1 , y2 , . . . , ym ) = 0
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ⇐⇒ f (x, y) = 0
fm (x1 , x2 , . . . , xn , y1 , y2 , . . . , ym ) = 0
with f = (f1 , . . . , fm ) , x = (x1 , . . . , xn ), and y = (y1 , . . . , ym ).
⎛ ⎞
∂f1 /∂x1 · · · ∂f1 /∂xn
⎜ .. .. .. ⎟
Dx f (x, y) = ⎝ . . . ⎠.
∂fm /∂x1 · · · ∂fm /∂xn
the Jacobian determinant of f with respect to y is different from 0 at (x0 , y 0 ) – i.e., |Dy f(x, y)| ≠ 0 at (x, y) = (x0 , y 0 ). Then, there exist open balls B1 and B2 around x0 and y 0 , respectively, with B1 × B2 ⊂ A, such that |Dy f(x, y)| ≠ 0 in B1 × B2 , and such that for each x ∈ B1 , there is a unique y ∈ B2 with f(x, y) = 0. In this way, y is defined on B1 as a C 1 function g(·) of x. The Jacobian matrix Dg(x) can be found by implicit differentiation of f(x, y) = 0, and
dy/dx = Dg(x) = ( ∂yi /∂xj ) i=1,...,m; j=1,...,n = −[Dy f(x, y)]−1 Dx f(x, y).
This is the form of the implicit function theorem we discuss. The proof relies on the following three lemmas (Lemmas 5.1, 5.2, and 5.3). We will not provide their proofs here.
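A minimal numerical sketch of the formula Dg(x) = −[Dy f(x, y)]⁻¹ Dx f(x, y) (my own example with the single equation f(x, y) = x² + y² − 1 = 0, chosen so that the explicit solution is available for comparison):

import numpy as np

def f(x, y):
    return x**2 + y**2 - 1.0

x0, y0 = 0.6, 0.8
Dx = 2 * x0                      # partial of f with respect to x at (x0, y0)
Dy = 2 * y0                      # partial of f with respect to y at (x0, y0), nonzero
dydx_formula = -Dx / Dy          # implicit function theorem: dy/dx = -0.75

# Check against the explicit solution y = sqrt(1 - x^2) by finite differences.
h = 1e-6
g = lambda x: np.sqrt(1.0 - x**2)
dydx_numeric = (g(x0 + h) - g(x0)) / h
print(dydx_formula, dydx_numeric)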
Lemma 5.1 Let K be a compact set in Rn . Let {hk (x)}∞k=1 be a sequence of continuous functions from K to Rm . Suppose that for any ε > 0, there exists a number N ∈ N such that
max_{x∈K} ‖hm (x) − hn (x)‖ < ε
for all m, n > N . Then, there exists a unique continuous function h : K → Rm such that
lim_{k→∞} max_{x∈K} ‖hk (x) − h(x)‖ = 0.
With Weierstrass’s Theorem (Theorem 3.19) and the concept of Cauchy sequence in
Rn (Definition 3.6), the above lemma should be easy to be established. For the next
lemma, define the following.
Dα = x ∈ Rn x − x0 ≤ α
Dβ = y ∈ Rm y − y 0 ≤ β
Lemma 5.2 (Lipschitz Continuity) There exists a number K ∈ (0, 1) such that, for all y, y′ ∈ Dβ ,
‖ξ(x, y) − ξ(x, y′ )‖ < K‖y − y′ ‖.
We argue that Lemma 5.2 enables us to construct the sequence of continuous functions needed for Lemma 5.1. Again, we take Lemma 5.2 for granted. Let y0 (x) = y 0 . Define yk+1 (x) = y 0 + ξ(x, yk (x)) for k ≥ 0. Since ξ(x0 , y 0 ) = 0, we can choose α small enough that
‖ξ(x, y 0 )‖ ≤ (1 − K)β.
Fix m ∈ N.
Lemma 5.3 Let ξ(x, y) be a continuous mapping from Dα × Dβ to Rm with the property that ξ(x0 , y 0 ) = 0. Furthermore, suppose there exists a number K ∈ (0, 1) such that, for all y, y′ ∈ Dβ ,
‖ξ(x, y) − ξ(x, y′ )‖ < K‖y − y′ ‖.
By construction of g, we have g(x0 , y 0 ) = 0 and |Dy g(x0 , y 0 )| = 0. Define
ξ(x, y) = −[Dy f(x0 , y 0 )]−1 g(x, y).
If we choose α > 0 small enough so that x is very close to x0 , i.e., x ∈ Dα , there exists K ∈ (0, 1) such that
‖ξ(x, y) − ξ(x, y′ )‖ < K‖Im (y − y′ )‖ = K‖y − y′ ‖.
Now, we can take two open sets B1 and B2 small enough so that B1 ⊂ Dα and B2 ⊂ Dβ
needed for the theorem. Then, we can use Lemma 5.3 which completes the proof.
1. for every x ∈ Ix , the equation f (x, y) = 0 has a unique solution in Iy which defines
y as a function y = ϕ(x) in Iy ;
dy/dx = ϕ′(x) = − [ ∂f(x, ϕ(x))/∂x ] / [ ∂f(x, ϕ(x))/∂y ].
Exercise 5.8 The point P = (x, y, z, u, v, w) = (1, 1, 0, −1, 0, 1) satisfies all the equa-
tions
y 2 − z + u − v − w3 = −1
−2x + y − z 2 + u + v 3 − w = −3
x2 + z − u − v + w 3 = 3
Using the implicit function theorem, find du/dx, dv/dx, and dw/dx at P .
Chapter 6
Static Optimization
Here x∗ is called a (global) maximum point for f in S and f(x∗ ) is called the maximum value. If the inequality (∗) is strict for all x ≠ x∗ , then x∗ is a strict maximum point for f(·) in S. We define (strict) minimum point and minimum value by reversing the inequality sign in (∗). As collective names, we use extreme points and extreme values to indicate both maxima and minima.
Theorem 6.1 Let f (·) be defined on a set S in Rn and let x∗ = (x∗1 , . . . , x∗n ) be an
interior point in S at which f (·) has partial derivatives. A necessary condition for x∗ to
be an extreme point for f is that x∗ is a stationary point for f (·) – that is, it satisfies
the equations
∇f(x∗ ) = 0 ⇐⇒ ∂f(x∗ )/∂xi = 0, for i = 1, . . . , n.
Proof of Theorem 6.1: Suppose, on the contrary, that x∗ is a maximum point but
not a stationary point for f(·). Then, there is no loss of generality in assuming that there exists at least one i such that fi (x∗ ) > 0. Define x∗∗ = (x∗1 , . . . , x∗i + ε, . . . , x∗n ). Since x∗ is an
interior point in S, one can make sure that x∗∗ ∈ S by choosing ε > 0 sufficiently small.
Then,
However, this contradicts the hypothesis that x∗ is a maximum point for f (·).
The next theorem clarifies under what conditions the converse of the previous theorem (Theorem 6.1) holds.
Theorem 6.2 Suppose that the function f (·) is defined in a convex set S ⊂ Rn and let
x∗ be an interior point of S. Assume that f (·) is C 1 in a ball around x∗ .
1. If f(·) is concave in S, then x∗ is a (global) maximum point for f(·) in S if and only if x∗ is a stationary point for f(·);
2. If f(·) is convex in S, then x∗ is a (global) minimum point for f(·) in S if and only if x∗ is a stationary point for f(·).
Proof of Theorem 6.2: We focus on the first part of the theorem. The second part
follows once we take into account that −f is concave. (=⇒) This follows from Theorem
6.1 above. (⇐=) Suppose that x∗ is a stationary point for f (·) and that f (·) is concave.
Recall the inequality in Theorem 5.6 (First-order characterization of concave functions).
For any x ∈ S, f(x) − f(x∗ ) ≤ ∇f(x∗ ) · (x − x∗ ) = 0, and hence f(x) ≤ f(x∗ ). Thus, x∗ is a maximum point.
The vector x that maximizes f (x, r) depends on r and is therefore denoted by x∗ (r).
Then, f ∗ (r) = f (x∗ (r), r).
Theorem 6.3 (Envelope Theorem) In the maximization problem maxx∈S f (x, r), where
S ⊂ Rn and r ∈ Rk , suppose that there is a maximum point x∗ (r) ∈ S for every
r ∈ Bδ (r∗ ) with some δ > 0. Furthermore, assume that the mappings r → f (x∗ (r∗ ), r)
and r → f ∗ (r) are differentiable at r ∗ . Then
∇r f ∗ (r∗ ) = ( ∂f(x, r)/∂r1 , ∂f(x, r)/∂r2 , . . . , ∂f(x, r)/∂rk ) evaluated at x = x∗ (r∗ ), r = r∗ .
A change in r affects the value function f ∗ in two ways: directly through r itself and indirectly through x∗ (r). The envelope theorem says that, to first order, we can ignore the indirect effect.
To see this, define ϕ(r) = f(x∗ (r∗ ), r) − f ∗ (r). Because x∗ (r∗ ) is a maximum point of f(x, r) when r = r∗ , one has ϕ(r∗ ) = 0 and
ϕ(r) ≤ 0 for all r ∈ Bδ (r∗ ). Since ϕ(r∗ ) is a maximum, the following first order condition
is satisfied (because of Theorem 6.1).
∇r ϕ(r)|r=r∗ = 0 ⇐⇒ ∂ϕ(r)/∂rj |r=r∗ = 0 ∀j = 1, . . . , k.
That is,
∂ϕ(r)/∂rj |r=r∗ = ∂f(x∗ (r∗ ), r)/∂rj |r=r∗ − ∂f ∗ (r)/∂rj |r=r∗ = 0 ∀j = 1, . . . , k.
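A small numerical check of the envelope theorem (my own example with f(x, r) = −(x − r)² + r, whose maximizer x∗(r) = r is known in closed form): the derivative of f∗(r) matches the direct partial of f with respect to r, holding x fixed at x∗(r∗).

def f(x, r):
    return -(x - r) ** 2 + r

def x_star(r):
    return r                        # the maximizer, found by hand for this example

def f_star(r):
    return f(x_star(r), r)

r_star, h = 2.0, 1e-6
dfstar_dr = (f_star(r_star + h) - f_star(r_star)) / h
# Direct effect only: differentiate f with respect to r, holding x fixed at x*(r*).
direct = (f(x_star(r_star), r_star + h) - f(x_star(r_star), r_star)) / h
print(dfstar_dr, direct)            # both are approximately 1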
Before stating the next result, recall the n leading principal minors of the Hessian
matrix D 2 f (x):
|D2(k) f(x)| = det ( fij (x) ) i,j=1,...,k ,   k = 1, . . . , n.
Theorem 6.4 (Sufficient Conditions for Local Extreme Points) Suppose that f (x) =
f (x1 , . . . , xn ) is defined on a set S ⊂ Rn and that x∗ is an interior stationary point. As-
sume also that f (·) is C 2 in an open ball around x∗ . Then,
Proof of Theorem 6.4: We only focus on the first part of the theorem. We should be able to prove the second part by replacing f(·) with −f(·). Since each fij (x) is continuous in x (because f(·) is C 2 ), each leading principal minor is a continuous function of x. Therefore, if |D2(k) f(x∗ )| > 0 for all k, it is possible to find a ball Bε (x∗ ) with ε > 0 so small that |D2(k) f(x)| > 0 for all x ∈ Bε (x∗ ) and all k = 1, . . . , n. By Theorem 4.6, the corresponding quadratic form is positive definite for all x ∈ Bε (x∗ ). It follows from Theorem 5.5 that f(·) is strictly convex in Bε (x∗ ). Then, Theorem 6.2 shows that the stationary point x∗ is a minimum point for f in Bε (x∗ ). Hence, x∗ is a local minimum point for f(·).
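A quick numerical classification in the spirit of Theorems 6.1 and 6.4 (my own example with f(x, y) = 4x + 2y − x² − y², whose Hessian is constant): the gradient vanishes at the stationary point and the leading principal minors alternate in sign, indicating a local maximum.

import numpy as np

grad = lambda p: np.array([4 - 2 * p[0], 2 - 2 * p[1]])
H = np.array([[-2.0, 0.0],
              [0.0, -2.0]])                  # Hessian of f (constant here)

p_star = np.array([2.0, 1.0])
print(grad(p_star))                          # [0. 0.]  (stationary point)

minors = [np.linalg.det(H[:k, :k]) for k in (1, 2)]
print(minors)                                # [-2.0, 4.0]
is_local_max = all((-1) ** r * m > 0 for r, m in enumerate(minors, start=1))
print(is_local_max)                          # True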
Lemma 6.1 If x∗ is an interior stationary point of f(·) such that |D2 f(x∗ )| ≠ 0 and
D 2 f (x∗ ) is neither positive definite nor negative definite, then x∗ is a saddle point.
Theorem 6.5 (Necessary Conditions for Local Extreme Points) Suppose that f (x) =
f (x1 , . . . , xn ) is defined on a set S ⊂ Rn , and x∗ is an interior stationary point in S.
Assume that f is C 2 in a ball around x∗ . Then,
Proof of Theorem 6.5: Suppose that x∗ is an interior local maximum point for f(·). Then, if ε > 0 is small enough, Bε (x∗ ) ⊂ S, and f(x) ≤ f(x∗ ) for all x ∈ Bε (x∗ ). Fix any h = (h1 , . . . , hn ) with ‖h‖ = 1 and define g(t) = f(x∗ + th). If t ∈ (−ε, ε), then x∗ + th ∈ Bε (x∗ ) because ‖(x∗ + th) − x∗ ‖ = ‖th‖ = |t| < ε. Then, for all t ∈ (−ε, ε), we have g(t) = f(x∗ + th) ≤ f(x∗ ) = g(0). Thus, the function g(·) has an interior maximum at t = 0. Using the chain rule, we obtain
g′(t) = Σ_{i=1}^n fi (x∗ + th)hi ,
g′′(t) = Σ_{i=1}^n Σ_{j=1}^n fij (x∗ + th)hi hj .
The condition g′′(0) ≤ 0 yields
Σ_{i=1}^n Σ_{j=1}^n fij (x∗ )hi hj ≤ 0 ∀h = (h1 , . . . , hn ) with ‖h‖ = 1.
This implies that the Hessian matrix D 2 f (x∗ ) is negative semidefinite. Theorem 4.6
shows that this is equivalent to checking all principal minors. The same argument can
be used to establish the necessary condition for x∗ to be a local minimum point for f (·).
Exercise 6.1 Find the local extreme values and classify the stationary points as maxima,
minima, or neither.
Theorem 6.6 (N&S Conditions for Extreme Points with Equality Constraints)
The following establishes the necessary and sufficient conditions for the Lagrangian method
1. (Necessity) Suppose that the functions f and g1 , . . . , gm are defined on a set S
in Rn and x∗ = (x∗1 , . . . , x∗n ) is an interior point of S that solves the maximization
problem (∗). Assume further that f and g1 , . . . , gm are C 1 in a ball around x∗ , and
that the Jacobian matrix,
Dg(x∗ ) = ( ∂g j (x∗ )/∂xi ) j=1,...,m; i=1,...,n (an m × n matrix),
has rank m. Then, there exist unique numbers λ1 , . . . , λm such that the first-order
conditions (∗∗) are valid.
Proof of Theorem 6.6: (Necessity) The proof for the necessity part consists of
three steps.
Since the m × n matrix Dg(x∗ ) is assumed to have rank m, there exists an invertible (non-
singular) m × m submatrix. After renumbering the variables, if necessary, we can assume
that it consists of the first m columns. By the implicit function theorem (Theorem 5.17),
the m constraints, g1 , . . . , gm define x1 , . . . , xm as C 1 functions of x̃ = (xm+1 , . . . , xn )
in some open ball B around x∗ , i.e., Bε (x∗ ) with ε > 0 sufficiently small, so we can write
xj = hj (xm+1 , . . . , xn ) = hj (x̃), j = 1, . . . , m.
Define ξ(x̃) = f(h1 (x̃), . . . , hm (x̃), xm+1 , . . . , xn ), which is a function of x̃ only. Now, the maximization problem with equality constraints is translated into the unconstrained maximization problem
max ξ(x̃)
x̃∈Bε (x∗ )
Since x∗ is a local extreme point for f subject to the given constraints, ξ must have an
unconstrained local extreme point at x̃∗ = (x∗m+1 , . . . , x∗n ). Hence, the partial derivatives
of ξ with respect to xm+1 , . . . , xn must be 0:
∂ξ(x̃∗ )/∂xk = [ ∂f(x∗ )/∂x1 · ∂h1 /∂xk + · · · + ∂f(x∗ )/∂xm · ∂hm /∂xk ] + ∂f(x∗ )/∂xk = 0 (1)
(the bracketed terms are the indirect effects; the last term is the direct effect)
where k = m + 1, . . . , n.
Thus, the first-order necessary conditions for the Lagrangian are also satisfied.
We rewrite the system (3) as
λ1 ∂g1 /∂x1 + λ2 ∂g2 /∂x1 + · · · + λm ∂gm /∂x1 = ∂f /∂x1 ,
λ1 ∂g1 /∂x2 + λ2 ∂g2 /∂x2 + · · · + λm ∂gm /∂x2 = ∂f /∂x2 ,
· · ·
λ1 ∂g1 /∂xm + λ2 ∂g2 /∂xm + · · · + λm ∂gm /∂xm = ∂f /∂xm ,
where all partial derivatives are evaluated at x∗ . Since the m × m coefficient matrix of this linear system in (λ1 , . . . , λm ) is invertible (nonsingular), it has a unique solution λ1 , . . . , λm . This completes the proof.
(Sufficiency) Suppose that the Lagrangian L(x) is concave. The first-order neces-
sary conditions imply that the Lagrangian is stationary at x∗ . Then by Theorem 6.2
(sufficiency for global maximum),
L(x∗ ) = f(x∗ ) − Σ_{j=1}^m λj g j (x∗ ) ≥ f(x) − Σ_{j=1}^m λj g j (x) = L(x) ∀x ∈ S.
But for all feasible x, we have gj (x) = 0 and of course, gj (x∗ ) = 0 for all j = 1, . . . , m.
Hence, this implies that f (x∗ ) ≥ f (x). Thus, x∗ solves the maximization problem (∗).
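A numerical sketch of the Lagrangian recipe (my own example: maximize x1 x2 subject to x1 + 2x2 = 4, solving the first-order conditions with scipy.optimize.fsolve; the function names and starting point are illustrative choices):

import numpy as np
from scipy.optimize import fsolve

def foc(v):
    x1, x2, lam = v
    return [x2 - lam * 1.0,          # df/dx1 - lambda * dg/dx1 = 0
            x1 - lam * 2.0,          # df/dx2 - lambda * dg/dx2 = 0
            x1 + 2.0 * x2 - 4.0]     # g(x1, x2) = 0

solution = fsolve(foc, x0=[1.0, 1.0, 1.0])
print(solution)                      # approximately [2.0, 1.0, 1.0]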
Under the assumptions above, we have
∂f ∗ (r)/∂ri = ∂L(x, r)/∂ri |x=x∗ (r) , i = 1, . . . , k.
defines a subset of Rn which is best viewed as a hypersurface. If, as I assume in this
section, the functions gj , j = 1, . . . , m belong to C 1 , the surface defined by them is said
to be smooth. We introduce the tangent hyperplane M below:
M = {y ∈ Rn |Dg(x∗ )y = 0}
Note that the tangent hyperplane is a subspace of Rn .
Since x∗ is a regular point, the tangent hyperplane is identical with the set of y’s
satisfying ∇g(x∗ )y = 0. Then, since x∗ is a constrained local extreme point of f (·), we
have
(d/dt) f(x(t))|t=0 = 0 =⇒ ∇f(x∗ ) x′(0) = 0,
equivalently, ∇f (x∗ )y = 0.
The above lemma says that ∇f (x∗ ) is orthogonal to the tangent hyperplane.
Proof of Theorem 6.7: The first part follows from Theorem 6.6. We only focus on the second part. Let h = (h1 , . . . , hn ) ∈ M with ‖h‖ = 1. Let x(t) = x∗ + th be any smooth curve on the constraint surface g(x(t)) = 0 passing through x∗ with derivative x′(0) = h at x(0) = x∗ . Suppose that x∗ is an interior local maximum point for f subject to g(x) = 0. Then, if ε > 0 is small enough, x∗ + th ∈ Bε (x∗ ) for all t ∈ (−ε, ε) because ‖(x∗ + th) − x∗ ‖ = ‖th‖ = |t| < ε. Define the function ϕ(t) = L(x∗ + th). Then, for all t ∈ (−ε, ε), we have ϕ(t) ≤ ϕ(0). Thus, the function ϕ has an interior maximum at t = 0. Using the chain rule (Theorem 5.15), we obtain
ϕ′(t) = ∇L(x∗ + th)h = ∇f(x∗ + th)h − λDg(x∗ + th)h,
ϕ′(0) = ∇L(x∗ )h = ∇f(x∗ )h − λDg(x∗ )h = 0.
Suppose also that the matrix D2 L(x∗ ) = D2 f(x∗ ) − λD2 g(x∗ ) is negative definite on M = {y ∈ Rn |Dg(x∗ )y = 0}, that is, y T D2 L(x∗ )y < 0 for all y ∈ M with y ≠ 0. Then, x∗ is a strict local maximum of f(·) subject to g(x) = 0.
Proof of Theorem 6.8: The first part follows from Theorem 6.7. Define the La-
grangian as follows.
This implies that ∇L(x∗ )y = 0 for any y ∈ Rn . By our hypothesis, D2 L(x∗ ) is negative
definite on M , and therefore, x∗ is a local maximum point of L(x) from Theorem 6.4.
This implies that x∗ is a local maximum of f (·) subject to g(x) = 0.
1. Let x∗1 (m) and x∗2 (m) denote the values of x1 and x2 that solve the above maximiza-
tion problem. Find these functions and the corresponding Lagrangian multiplier.
is called the value function. Suppose that λi = λi (r) for all i = 1, . . . , m are the Lagrange
multipliers in the first-order conditions for the maximization problem and let
be the Lagrangian. Here λ = (λ1 , . . . , λm ) and g(x, r) = (g1 (x, r), . . . , gm (x, r)).
6.3 Inequality Constraints: Nonlinear Programming
max_{x∈S} f(x) subject to g 1 (x1 , . . . , xn ) ≤ 0, g 2 (x1 , . . . , xn ) ≤ 0, . . . , g m (x1 , . . . , xn ) ≤ 0.
A vector x = (x1 , . . . , xn ) that satisfies all the constraints is called feasible. The
set of all feasible vectors is said to be the feasible set. We assume that f (·) and all the
gj functions are C 1 . In the case of equality constraints, the number of constraints was assumed to be strictly less than the number of variables. This is not necessary for the
case of inequality constraints. An inequality constraint gj (x) ≤ 0 is said to be active
(binding) at x if gj (x) = 0 and inactive (non-binding) at x if gj (x) < 0.
λj ≥ 0 and λj gj (x) = 0
Conditions (∗) and (∗∗) are often called the Kuhn-Tucker conditions. They are
(essentially but not quite) necessary conditions for a feasible vector to solve the maxi-
mization problem. In general, they are definitely not sufficient on their own. Suppose
one can find a point x∗ at which f (·) is stationary and gj (x∗ ) < 0 for all j = 1, . . . , m.
Then, the Kuhn-Tucker conditions will automatically be satisfied by x∗ together with all
the Lagrangian multipliers λj = 0 for all j = 1, . . . , m.
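A hedged numerical sketch (my own example, using scipy.optimize.minimize with the SLSQP method, which expects inequality constraints written as fun(x) ≥ 0): solve a small problem and verify the Kuhn-Tucker conditions at the reported solution.

import numpy as np
from scipy.optimize import minimize

f = lambda x: -(x[0] - 2) ** 2 - (x[1] - 2) ** 2
g1 = lambda x: x[0] + x[1] - 2.0                      # our constraint: g1(x) <= 0

res = minimize(lambda x: -f(x), x0=[0.0, 0.0], method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: -g1(x)}])
x_star = res.x                                        # approximately (1, 1); g1 binds

# Kuhn-Tucker at x*: grad f(x*) = lambda * grad g1(x*) with lambda >= 0.
grad_f = np.array([-2 * (x_star[0] - 2), -2 * (x_star[1] - 2)])
grad_g1 = np.array([1.0, 1.0])
lam = grad_f[0] / grad_g1[0]
print(x_star, lam, np.allclose(grad_f, lam * grad_g1), lam >= 0)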
Theorem 6.9 (Sufficiency for the Kuhn-Tucker Conditions I) Consider the max-
imization problem and suppose that x∗ is feasible and satisfies conditions (∗) and (∗∗).
If the Lagrangian L(x) = f (x) − λ · g(x) (with the λ values obtained from the recipe) is
concave, then x∗ is optimal.
Proof of Theorem 6.9: This is very much the same as the sufficiency part of
the Lagrangian problem in Theorem 6.6. Since L(x) is concave by assumption and
∇L(x∗ ) = 0 from (∗), by Theorem 6.2, x∗ is a global maximum point of L(x). Hence,
for all x ∈ S,
f(x∗ ) − Σ_{j=1}^m λj g j (x∗ ) ≥ f(x) − Σ_{j=1}^m λj g j (x).
It therefore suffices to show that Σ_{j=1}^m λj ( g j (x∗ ) − g j (x) ) ≥ 0 for all feasible x, because this will imply that x∗ solves the maximization problem. Suppose that g j (x∗ ) < 0. Then (∗∗) shows that λj = 0, so λj ( g j (x∗ ) − g j (x) ) = 0. Suppose that g j (x∗ ) = 0. Then λj ( g j (x∗ ) − g j (x) ) = −λj g j (x) ≥ 0 because x is feasible, i.e., g j (x) ≤ 0, and λj ≥ 0. Then, we have Σ_{j=1}^m λj ( g j (x∗ ) − g j (x) ) ≥ 0 as desired.
Theorem 6.10 (Sufficiency for the Kuhn-Tucker Conditions II) Consider the max-
imization problem and suppose that x∗ is feasible and satisfies conditions (∗) and (∗∗).
If f (·) is concave and each λj gj (x) (with the λ values obtained from the recipe) is qua-
siconvex, then x∗ is optimal.
Proof of Theorem 6.10: We want to show that f (x) − f (x∗ ) ≤ 0 for all feasible
x. Since f (·) is concave, then according to Theorem 5.6 (First-order characterization of
concavity of f (·)),
f(x) − f(x∗ ) ≤ ∇f(x∗ ) · (x − x∗ ) = Σ_{j=1}^m λj ∇g j (x∗ ) · (x − x∗ ),
where we use the first order condition (∗). It therefore suffices to show that for all
j = 1, . . . , m, and all feasible x,
λj ∇g j (x∗ ) · (x − x∗ ) ≤ 0.
The above inequality is satisfied for those j such that gj (x∗ ) < 0, because then λj = 0
from the complementary slackness condition (∗∗). For those j such that gj (x∗ ) = 0, we
have gj (x) ≤ gj (x∗ ) (because x is feasible), and hence −λj gj (x) ≥ −λj gj (x∗ ) because
λj ≥ 0. Since the function −λj gj (x) is quasiconcave (because λj gj (x) is quasiconvex),
it follows from Theorem 5.10 (a characterization of quasiconcavity) that ∇(−λj gj (x∗ )) ·
(x − x∗ ) ≥ 0, and thus, λj ∇g j (x∗ ) · (x − x∗ ) ≤ 0.
as a standard Kuhn-Tucker maximization problem and write down the necessary Kuhn-
Tucker conditions. Moreover, find the solution of the problem (Take it for granted that
there is a solution).
The optimal value of the objective f (x) obviously depends upon b ∈ Rm which is a
parameter vector in the constraint set. The function defined by
f ∗ (b) = max{ f(x) : g j (x) ≤ bj , j = 1, . . . , m }
assigns to each b = (b1 , . . . , bm ) the optimal value f ∗ (b) of f(·). It is called the value function for the problem. Let the optimal choice for x in the constrained optimization problem be denoted by x∗ (b), and assume that it is unique. Let λj (b) for j = 1, . . . , m be the corresponding Lagrange multipliers. Then, if ∂f ∗ (b)/∂bj exists,
∂f ∗ (b)/∂bj = λj (b) ∀j = 1, . . . , m.
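A numerical sketch of this shadow-price interpretation (my own example: f(x) = log x1 + log x2 with the single constraint x1 + x2 ≤ b, whose solution x1 = x2 = b/2 and multiplier λ(b) = 2/b are known in closed form):

import numpy as np

def f_star(b):
    # value function: the optimum is x1 = x2 = b/2
    return 2.0 * np.log(b / 2.0)

b, h = 4.0, 1e-6
numerical_derivative = (f_star(b + h) - f_star(b)) / h
lam = 2.0 / b                                  # Lagrange multiplier at this b
print(numerical_derivative, lam)               # both approximately 0.5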
The value function f ∗ is not necessarily C 1 . The next proposition characterizes a geo-
metric structure of the value function.
Proposition 6.1 If f (x) is concave and gj (x) is convex for each j = 1, . . . , m, then
f ∗ (b) is concave.
Proof of Proposition 6.1: Suppose that b′ and b′′ are two arbitrary parameter vectors in the constraint set, and let f ∗ (b′ ) = f(x∗ (b′ )), f ∗ (b′′ ) = f(x∗ (b′′ )), with g j (x∗ (b′ )) ≤ b′j , g j (x∗ (b′′ )) ≤ b′′j for j = 1, . . . , m. Let α ∈ [0, 1]. Corresponding to the vector αb′ + (1 − α)b′′ , there exists an optimal solution x∗ (αb′ + (1 − α)b′′ ), and
f ∗ (αb′ + (1 − α)b′′ ) = f(x∗ (αb′ + (1 − α)b′′ )).
Define xα = αx∗ (b′ ) + (1 − α)x∗ (b′′ ). Then, convexity of g j for j = 1, . . . , m implies that
g j (xα ) ≤ αg j (x∗ (b′ )) + (1 − α)g j (x∗ (b′′ )) ≤ αb′j + (1 − α)b′′j .
Thus, xα is feasible in the problem with parameter αb′ + (1 − α)b′′ , and in that problem, x∗ (αb′ + (1 − α)b′′ ) is optimal. It follows that
f(xα ) ≤ f(x∗ (αb′ + (1 − α)b′′ )) = f ∗ (αb′ + (1 − α)b′′ ).
Moreover, concavity of f implies f(xα ) ≥ αf(x∗ (b′ )) + (1 − α)f(x∗ (b′′ )) = αf ∗ (b′ ) + (1 − α)f ∗ (b′′ ). In sum,
f ∗ (αb′ + (1 − α)b′′ ) ≥ αf ∗ (b′ ) + (1 − α)f ∗ (b′′ ).
Definition 6.2 The constrained maximization problem satisfies the constraint qual-
ification if the gradient vectors ∇g j (x∗ ) (1 ≤ j ≤ m) corresponding to those constraints that are active (binding) at x∗ are linearly independent.
An alternative formulation of this condition is: Delete all rows in the Jacobian matrix
Dg(x∗ ) that correspond to constraints that are inactive (not binding) at x∗ . Then, the
remaining matrix should have rank equal to the number of rows.
3. the k × n Jacobian matrix Dgk (x∗ ) has maximal rank k. That is,
k = Rank [Dgk (x∗ )] = Rank ( ∂g j (x∗ )/∂xi ) j=1,...,k; i=1,...,n .
The proof consists of two steps.
Since each gj (·) is a continuous function, there is an open ball Bε (x∗ ) such that gj (x) < 0 for all x ∈ Bε (x∗ ) and for j = k + 1, . . . , m. We will work in the open ball Bε (x∗ ) for the rest of the proof.
Note that x∗ maximizes f (·) in Bε (x∗ ) over the constraint set that gj (x) = 0 for
j = 1, . . . , k. By assumption, Theorem 6.7 (Necessity for Optimization with Equality
Constraints) applies and therefore, there exist μ∗1 , . . . , μ∗k such that
Let λ∗i = μ∗i for i = 1, . . . , k and λ∗j = 0 for j = k + 1, . . . , m. Then, we see that (x∗ , λ∗ )
is a solution of the n + m equations in n + m unknowns:
∂L(x∗ , λ∗ )/∂xi = 0 ∀i = 1, . . . , n,
λ∗j g j (x∗ ) = 0 ∀j = 1, . . . , m.
There is a C 1 curve x(t) defined for t ∈ [0, ε) such that x(0) = x∗ and, for all t ∈ [0, ε),
By the implicit function theorem (Theorem 5.17), we can still solve the constrained
optimization problem in Bε (x∗ ) even if we slightly perturb the constraint set. Let h =
x (0). Using the chain rule (Theorem 5.15), we conclude that
Since x(t) lies in the constraint set for all t and x∗ maximizes f (·) in the constraint set,
f (·) must be nonincreasing along x(t). Therefore,
(d/dt) f(x(t))|t=0 = ∇f(x∗ )h ≤ 0.
By our first-order conditions, we execute a series of computations.
0 = ∇L(x∗ )y
k
= ∇f (x∗ )h − λj ∇g j (x∗ )y
j=1
= ∇f (x )h − λ1 ∇g 1 (x∗ )h
∗
= ∇f (x∗ )h + λ1
Then, x∗ satisfies
∇L̃(x∗ ) = ∇f(x∗ ) − Σ_{j∈J} λj ∇g j (x∗ ) = 0.
gj (yk ) ≤ gj (x∗ )
for k large enough. Then, we must have Dgj (x∗ )h∗ ≤ 0 because Dgj (x∗ )hk is linear and continuous in hk and gj (yk ) ≤ gj (y ∗ ) for each k.
If ∇g j (x∗ )h∗ = 0 for all j ∈ J, then the proof goes through just as in Theorem 6.8.
Therefore, there must exist at least one j ∈ J such that ∇g j (x∗ )h∗ < 0. Then, we obtain
∇f(x∗ )h∗ − Σ_{j∈J} λj Dgj (x∗ )h∗ > 0 (because λj > 0 for all j ∈ J),
that is,
[ ∇f(x∗ ) − Σ_{j∈J} λj Dgj (x∗ ) ] h∗ > 0,
even though the term in brackets equals 0. This, however, contradicts the condition ∇f(x∗ ) − Σ_{j∈J} λj ∇g j (x∗ ) = 0 established above. This completes the proof.
To reduce this collection of m + n constraints and m + n Lagrangian multipliers, the necessary conditions for the optimization problem are sometimes formulated slightly differently, as below:
∂f(x∗ )/∂xi − Σ_{j=1}^m λj ∂g j (x∗ )/∂xi ≤ 0 (= 0 if x∗i > 0), ∀i = 1, . . . , n.
Definition 6.3 Let f (·) be concave on convex set S ⊂ Rn , and let x0 be an interior
point in S. Then, there exists a vector p ∈ Rn such that for all x ∈ S,
f (x) − f (x0 ) ≤ p · (x − x0 ).
A vector p that satisfies the above inequality is called a supergradient for f at x0 .
Definition 6.4 The nonlinear programming problem satisfies the Slater qualification
if there exists a vector z ∈ Rn such that g(z) ≪ 0, i.e., g j (z) < 0 for all j.
Theorem 6.14 (Sufficient Conditions for Concave Programming) Consider the
nonlinear programming problem with f (·) concave and g(·) convex, and assume that there
exists a vector λ ≥ 0 and a feasible vector x∗ which together have the property that x∗
maximizes f (x) − λ · g(x) among all x ∈ Rn , and λ · g(x∗ ) = 0. Then, x∗ solves the
original concave problem and λ is a supergradient for f ∗ at 0.
2. ∇f(x∗ ) ≠ 0.
Then, x∗ is optimal.
Chapter 7
Differential Equations
7.1 Introduction
What is a differential equation? As the name suggests, it is an equation. Unlike ordinary algebraic equations, in a differential equation the unknown is a function rather than a number, and the equation involves one or more derivatives of that unknown function.
An ordinary differential equation is one for which the unknown is a function of only
one variable. Partial differential equations are equations where the unknown is a function
of two or more variables, and one or more the partial derivatives of the function are
included. In this chapter, I restrict attention to first-order ordinary differential equations
– that is, equations where the first-order derivatives of the unknown functions of one
variable are included.
Consider the equation ẋ(t) = a x(t), where a is some constant. I propose here x(t) = Keat , with K an arbitrary constant, as a solution to the differential equation.
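A quick numerical check (my own sketch; the parameter values are arbitrary choices): Euler steps on x′(t) = ax(t) converge to the proposed solution Keat as the step size shrinks.

import numpy as np

a, K, T, n = 0.5, 2.0, 1.0, 100_000
dt = T / n

x = K                                # x(0) = K
for _ in range(n):
    x = x + a * x * dt               # Euler step: x(t + dt) ~ x(t) + a x(t) dt

print(x, K * np.exp(a * T))          # both approximately 3.2974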
Chapter 8
The problem of finding the zeros of a function f(·), that is, finding x ∈ S such that f(x) = 0, can be converted into a fixed point problem. To see this, observe that f(x) = 0 iff g(x) = x, where g(x) = f(x) + x. Unconstrained optimization with a concave objective function is a special case: there the optimal solution is found by solving ∇f(x) = 0.
We use this norm in the proof of the implicit function theorem (Theorem 5.17). Choose
any x0 ∈ S and let xk = f (xk−1 ). If a sequence {xk } has a limit x∗ , then x∗ ∈ S because
S is closed, and f (x∗ ) = x∗ . Therefore, it suffices to prove that {xk } has a limit. We use
the Cauchy criterion. Pick q > p. Then,
‖xq − xp ‖ = ‖ Σ_{k=p}^{q−1} (xk+1 − xk ) ‖ ≤ Σ_{k=p}^{q−1} ‖xk+1 − xk ‖ (Minkowski inequality)
But
‖xk+1 − xk ‖ ≤ θ k ‖x1 − x0 ‖.
Hence
‖xq − xp ‖ ≤ Σ_{k=p}^{q−1} θ k ‖x1 − x0 ‖
          ≤ ‖x1 − x0 ‖ ( θ p + θ p+1 + · · · )
          = ‖x1 − x0 ‖ θ p /(1 − θ) → 0 as p, q → ∞,
because θ p → 0 as p → ∞ due to θ < 1.
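The argument above is constructive: iterating xk+1 = f(xk ) converges for any contraction. A minimal sketch (my own example, using f(x) = cos x, which is a contraction on [0, 1] since |f′(x)| ≤ sin 1 < 1 there):

import math

x = 0.0
for _ in range(200):
    x = math.cos(x)          # fixed-point iteration x_{k+1} = f(x_k)

print(x, math.cos(x))        # both approximately 0.739085 (the unique fixed point)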
x = (1 − x) × 0 + x × 1
The same will be true for f (x). So we express each x ∈ [0, 1] as a pair of nonnegative
numbers (x1 , x2 ) = (1 − x, x) that add to one. When expressing f (x) in this way, we will
write it as (f1 (x), f2 (x)) = (1 − f (x), f (x)). Suppose for a contradiction, that f has no
fixed point.
Since f : [0, 1] → [0, 1] we can think of the function f (·) as moving each point x ∈ [0, 1]
either to the right (if f (x) > x) or to the left (if f (x) < x). The assumption that f (·)
has no fixed point eliminates the possibility that f (·) leaves the position of x unchanged.
Given any x ∈ [0, 1], we label it with a “+” if f1 (x) < x1 (move to the right) and
label it “−” if f1 (x) > x1 (move to the left). The assumption of no fixed point implies
f1 (x) ≠ 1 − x for all x ∈ [0, 1]. Thus, the labeling scheme is well defined. Notice that
the point 0 will be labeled (+) and the point 1 will be labeled (−).
Choose any finite partition, Π0 , of the interval [0, 1] into smaller intervals.
Claim 8.1 The partition Π0 must contain a subinterval [x0 , y 0 ] whose endpoints have
different labels.
Proof of Claim 8.1: Every endpoint of these subintervals is labeled either (+) or
(−). The point “0”, which must be the endpoint of the subinterval of Π0 , has label (+).
The point “1” has label (−). As we travel from 0 to 1 (left to right), we leave a point
labeled (+) and arrive at a point labeled (−). At some point, we must pass through a subinterval whose endpoints have different labels.
Now take the partition Π0 and form a new partition Π1 , finer than the first by taking
all the subintervals in Π0 whose endpoints have different labels and cutting them in half.
In Π1 , there must exist at least one subinterval, [x1 , y 1 ] with endpoints having different
labels. Repeat this procedure indefinitely.
Let z be the limit point of {xk } and {y k }. By continuity, f (xk ) and f (y k ) both
converge to f (z). Since each xk is labeled (+) and each y k is labeled (−), for each k, we
have f1 (xk ) < xk1 and in the limit f1 (z) ≤ z1 . For each k, we have f1 (y k ) > y1k and in
the limit f1 (z) ≥ z1 . Thus, f1 (z) ≤ z1 and f1 (z) ≥ z1 . This implies that f1 (z) = z1 , i.e., f(z) = z, a fixed point. This is a contradiction.
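The labeling argument suggests a simple computational procedure: bisect on the sign of f(x) − x. The sketch below is my own illustration (the map f is an arbitrary continuous function from [0, 1] into itself).

def find_fixed_point(f, tol=1e-10):
    lo, hi = 0.0, 1.0                 # label "+" at 0 (f moves right), "-" at 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) - mid > 0:          # mid is labeled "+": keep the right half
            lo = mid
        else:                         # mid is labeled "-" (or already a fixed point)
            hi = mid
    return 0.5 * (lo + hi)

f = lambda x: 0.5 * (x**2 + 0.3)      # a continuous map from [0, 1] into [0, 1]
z = find_fixed_point(f)
print(z, f(z))                        # z is approximately 0.1633, and f(z) = z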
Definition 8.2 The n-simplex is the set Δn = {x ∈ Rn | Σ_{i=1}^n xi = 1 and xi ≥ 0 for all i = 1, . . . , n}.
From the definition, Δn is convex and compact. We also see that this is an (n − 1)-
dimensional object.
We skip the proof of Theorem 8.2. Now, it is time to prove the Brouwer fixed point theorem.
Since h(·) is continuous, by Lemma 8.2, it has a fixed point x∗ . Therefore, h(x∗ ) = g( f( g−1 (x∗ ) ) ) = x∗ . Applying g−1 to both sides, we have f(g−1 (x∗ )) = g−1 (x∗ ). So, g−1 (x∗ ) is a fixed point for f .
Chapter 9
Theorem 9.1 (Strict Separation Theorem) Let S be a closed convex set in Rn , and
let y be a point in Rn that does not belong to S. Then there exists a nonzero vector
a ∈ Rn \{0} and a number α ∈ R such that
a·x<α <a·y
for all x ∈ S. For every such α, the hyperplane H = {x ∈ Rn |a · x = α} strictly separates
S and y.
Proof of Theorem 9.1: Because S is a closed set, among all the points of S, there is one w = (w1 , . . . , wn ) that is closest to y. (To see this, suppose that there were no such closest point; then the closedness of S would give us a contradiction. You should fill in the gap in this argument.) Let a = y − w. Since w ∈ S and y ∈/ S, it follows that a ≠ 0.
Note that a · (y − w) = a · a > 0, and so a · w < a · y. Suppose we prove that
a · x ≤ a · w ∀x ∈ S (∗)
Then, the theorem is true for every number α ∈ (a · w, a · y). Now, it remains to show
(∗). Let x be any point in S. Since S is convex, λx + (1 − λ)w ∈ S for each λ ∈ [0, 1].
Now define g(λ) as the square of the distance from λx + (1 − λ)w to the point y:
In the proof of the above theorem, it was essential that y did not belong to S. If S
is an arbitrary convex set (not necessarily closed), and if y is not an interior point of S,
then it seems plausible that y can still be separated from S by a hyperplane. If y is a
boundary point of S, such a hyperplane is called a supporting hyperplane to S at y.
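A concrete sketch of the construction in the proof (my own example where S is the closed unit ball in R², so the closest point w is simply the radial projection of y):

import numpy as np

y = np.array([3.0, 4.0])
w = y / np.linalg.norm(y)             # closest point of the unit ball to y
a = y - w

max_over_S = np.linalg.norm(a)        # max of a . x over the unit ball is ||a|| = a . w here
alpha = 0.5 * (max_over_S + a @ y)    # any alpha in (a . w, a . y) works
print(max_over_S < alpha < a @ y)     # True: a . x <= a . w < alpha < a . y for all x in S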
Theorem 9.3 (Separating Hyperplane Theorem) Let S and T be two disjoint nonempty
convex sets in Rn . Then, there exists a nonzero vector a ∈ Rn and a scalar α ∈ R such
that
Proof of Claim 9.1: Let w, w′ ∈ W . By definition of W , there are s, s′ ∈ S and t, t′ ∈ T such that w = s − t and w′ = s′ − t′ . Let α ∈ [0, 1]. What we want to show is that αw + (1 − α)w′ ∈ W . We compute the convex combination below:
αw + (1 − α)w′ = αs − αt + (1 − α)s′ − (1 − α)t′
             = [ αs + (1 − α)s′ ] − [ αt + (1 − α)t′ ].
Since S and T are convex, αs + (1 − α)s′ ∈ S and αt + (1 − α)t′ ∈ T . This implies that αw + (1 − α)w′ ∈ S − T = W .
Theorem 9.4 Let S and T be two disjoint, nonempty, closed, convex sets in Rn with S
being bounded. Then, there exists a nonzero vector a ∈ Rn and a scalar α ∈ R such that
a·x>α >a·y for all x ∈ S and all y ∈ T .
9.3 Dimension of a Set