
ECO 317 – Economics of Uncertainty – Fall Term 2009

Notes for lectures


1. Reminder and Review of Probability Concepts

1 States and Events


In an uncertain situation, any one of several possible outcomes may be realized. Treatment
of this in probability theory begins by listing all the logical possibilities. These are called the
elementary events in probability and statistical theories, and in our economic applications
we will often call them elementary states of the world or states of nature, or sometimes in
financial economics, all possible scenarios. Each elementary event or state is intended to
be an exhaustive description of exactly one possible outcome; an elementary state of nature
is intended to be an exhaustive description of exactly one set of circumstances. How fine
or coarse the distinction is made, or what is exogenous and what is not, depends on the
context of each specific application. As far as the mathematics of probability is concerned,
the elementary events or states are just general abstract entities – they are among the basic
concepts or “primitives” of the theory, and are assumed to follow certain specified rules and
axioms. The logical deductions from this structure of rules constitute the theory.
The set of all elementary events is called the sample space or probability space; in the
economic context we will simply call it the set of all states of nature. Each subset of this space
is called an event; singleton subsets have already been termed elementary events. In economic
applications, the sample space (the full list of elementary events) should be exogenous, but
individuals may control or affect the probabilities that attach to these events. The primary
instances of this are mixed strategies and moral hazard.
Examples: [1] When a die is rolled, there are six elementary events corresponding to the
number of dots on the top face. Events such as “the number is even” or “the number
exceeds 4” are composite events. [2] When two coins are tossed, each can land either heads
up or tails up. If the two coins are distinguishable, for example one is a quarter and the
other a dime, or a coin is tossed twice and we keep track of the order in which the heads
or tails appeared, then there are four elementary events: HH, TT, HT, TH. But if the coins
are not distinguishable, then there are only three elementary events: two heads, two tails,
and one head one tail. Thus the concept of elementary event can be specific to the context.
[3] The upcoming interest rate choice of the Fed can be random from the perspective of an
individual investor. (From a larger perspective that includes the Fed as a participant the
interest rate may not be random; thus the very idea of randomness and probabilities may
be specific to a context of application.) Suppose interest rates must be positive, they are
denominated in basis points (one hundredth of a percentage point) and are absolutely sure
to be less than or equal to 12 percent. Then there are 1200 elementary events. We could say
that the Fed only ever uses 25-basis-point increments, so only 60 of the events are relevant
and we should discard the rest from the sample space. But some day technology may allow
finer adjustments of monetary policy so we may want to retain all 1200. Such decisions
about practical modeling are matters for context-specific judgments.
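The two-coin example [2] is easy to check with a short simulation. This is purely illustrative code of my own (the function name and trial count are arbitrary), not part of the course material:

```python
import random

def toss_two_coins(n_trials, seed=0):
    """Simulate tosses of two distinguishable coins and count
    the four elementary events HH, HT, TH, TT."""
    rng = random.Random(seed)
    counts = {"HH": 0, "HT": 0, "TH": 0, "TT": 0}
    for _ in range(n_trials):
        outcome = rng.choice("HT") + rng.choice("HT")
        counts[outcome] += 1
    return counts

counts = toss_two_coins(100_000)
# Each ordered outcome has probability 1/4; the unordered event
# "one head, one tail" lumps HT and TH together, so its frequency
# is close to 1/2 -- twice that of HH or TT.
print(counts, (counts["HT"] + counts["TH"]) / 100_000)
```

This is the point made again in Section 2 below: treating the three unordered outcomes as equally likely would be a mistake.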

2 Probability
The idea is to formalize the intuitive concept of how likely or unlikely is one event among the
possible outcomes of an uncertain situation. Again the mathematics stands in the abstract,
governed by the rules or axioms we impose. But these assumptions are usually made with
some application in mind, and probability theory has behind it one of three motivations or
interpretations:
Classical: Probability describes the physical chance that a controlled experiment produces
some outcome. Example: radioactive decay or various phenomena in quantum physics.
Sometimes events that are in principle deterministic but in reality too complex to calculate
may be better modeled as probabilistic, e.g. some classical statistical mechanics.
Frequentist: Probability corresponds to the long-run frequency of an event in an experiment
that can be repeated independently under identical conditions. Example: coin tosses.
Subjectivist: Probability is a numerical representation of an individual’s subjective belief
about the likelihood of an event. Example: who will win the Super Bowl.
We will generally adopt either a frequentist (objective) or subjectivist interpretation as suits
our applications in a pragmatic way.
Let S denote the sample space, 2^S the set of its subsets (the set of all logically conceivable
events), and R_+ the set of non-negative real numbers. For the moment, suppose S is finite.
We define probability as a function Pr: 2^S → R_+ with the following properties:
1. Pr(∅) = 0, Pr(S) = 1.
2. If A, B ⊂ S and A ∩ B = ∅, then Pr(A ∪ B) = Pr(A) + Pr(B).
From this, one can immediately prove that if Ai ⊂ S for i = 1, 2, ..., n and Ai ∩ Aj = ∅
for i ≠ j, then

    Pr(A1 ∪ A2 ∪ ... ∪ An) = Pr(A1) + Pr(A2) + ... + Pr(An) .   (1)

Just to give you an idea of how such proofs proceed, I will sketch this one.
When n = 2, this is just property 2 in the definition. Next we show that if the result is
true for any n, then it is also true for n + 1.
Let B = A1 ∪ A2 ∪ ... ∪ An. We claim that B ∩ An+1 = ∅. For suppose not. Then there exists
x ∈ B ∩ An+1. Therefore x ∈ B, and therefore x ∈ Ai for at least one i = 1, 2, ..., n.
But from the supposition we are making temporarily, we also have x ∈ An+1. Therefore
x ∈ Ai ∩ An+1 for this i, which contradicts Ai ∩ An+1 = ∅. Therefore our supposition must
be false; this proves the claim.
Now we can use property 2 in the definition to write

    Pr(B ∪ An+1) = Pr(B) + Pr(An+1) .

But

    B ∪ An+1 = (A1 ∪ A2 ∪ ... ∪ An) ∪ An+1 = A1 ∪ A2 ∪ ... ∪ An+1 ,

and the assumption that the result is true for n gives us
    Pr(B) = Pr(A1) + Pr(A2) + ... + Pr(An) .

Substituting these two expressions into the previous equation, we see that (1) holds for n + 1. QED
This is proof by mathematical induction, with an inner step (lemma, if you like) proved
by contradiction. Such techniques will be useful from time to time.
This proof is so trivial that we could have directly made the n case the definition without
loss of generality, but doing it this way served as a simple introduction to “proof math,”
which will appear from time to time in this course.
Since Pr is defined over subsets of S, we should write Pr({s}) for the probability of an
elementary event s ∈ S, but we will more simply write Pr(s) without much risk of confusion.
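For a finite sample space, this definition can be made concrete in a few lines of code. The following is my own sketch (the fair-die probabilities are just an example), not part of the notes:

```python
# A finite sample space with a probability for each elementary event.
S = {1, 2, 3, 4, 5, 6}
p = {s: 1/6 for s in S}   # a fair die, as in example [1]

def Pr(A):
    """Probability of an event A, a subset of S: sum over its elementary events."""
    return sum(p[s] for s in A)

# The two defining properties:
assert Pr(set()) == 0
assert abs(Pr(S) - 1) < 1e-12

# Finite additivity over pairwise disjoint events, as in equation (1):
A1, A2, A3 = {1, 2}, {3}, {5, 6}
assert abs(Pr(A1 | A2 | A3) - (Pr(A1) + Pr(A2) + Pr(A3))) < 1e-12

print(Pr({2, 4, 6}))  # the event "the number is even", close to 0.5
```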
What if S is a countably or uncountably infinite set? We might want to require the
probability function to be countably or uncountably additive over pairwise disjoint events,
that is, extend (1) to the case of countably or uncountably infinite n. But this is problematic.
Uncountable additivity is clearly too much to ask in the standard system of analysis: if a real
number is being picked randomly from the interval (0,1), then we will want P r(s) = 0 for
any elementary event or number s ∈ (0, 1), but P r( (0, 1) ) = 1. Even countable additivity
can be problematic. In general it is not possible to define probability over all subsets of an
infinite S and get countable additivity. The definition must be restricted to subclasses of 2^S
that are called σ-fields or σ-algebras. Such a structure must contain the empty set and the
whole space, and must be closed under complements and under unions and intersections of
countable families of sets already in it. If our event space is the real line or an interval, the
usual σ-field is constructed by taking countable unions, intersections, and complements of all
intervals; this is called the Borel σ-field. We will denote by A the σ-field or class of subsets of
our sample space S over which probabilities are defined. Thus A ⊂ 2^S, and if S is finite, we
will usually take A = 2^S.
To sum up, the foundations of our probability theory are the triple (S, A, Pr), where S
is the sample space, A is a σ-field over S, and Pr: A → R_+ is a function satisfying
1. Pr(∅) = 0, Pr(S) = 1.
2. If Ai ∈ A for i = 1, 2, ... and Ai ∩ Aj = ∅ for i ≠ j, then

    Pr(A1 ∪ A2 ∪ ...) = Pr(A1) + Pr(A2) + ... .   (2)

Luckily for our purpose in this course, the details of such measure-theoretic foundations
of probability are largely irrelevant. But some related ideas will occasionally crop up, and
you will need them in more detail if you go on to do advanced courses such as rocket-science
finance where the sample space of the stochastic processes under study consists of time-paths
(functions) and the σ-field over which probabilities are defined must evolve in time as the
process itself unfolds.
The mathematical structure as set out above is independent of any interpretation. But
in each application one must assign probability numbers to events. How is this done? Here

are some examples. [1] In some situations, the physics or other science of a situation gives
us probabilities; radioactive decay is an example. Some would argue that in fact everything
is deterministic, and probabilistic modeling is only an imperfect way of dealing with our
inability to calculate a highly complex reality. We leave such debates about the true meaning
of determinism or randomness to philosophers, who have a long tradition of arguing questions
and never finding any answers. [2] In independently repeatable experiments we can assign
probabilities by performing the experiment numerous times and observing frequencies. A
similar method underlies most forecasting based on estimation of models. [3] Sometimes
a “principle of insufficient reason” is invoked: if we have no good reason to believe that
one event is more likely than another, then we regard them as equally likely. But this is
fallible. For example, the three elementary outcomes of tossing two indistinguishable coins
are not equally likely; it would be a mistake to forget the underlying process that makes
“one head, one tail” twice as likely as either two heads or two tails. [4] How can we quantify
an individual’s purely subjective assessment that “one event is more likely than another”?
If the individual has a complete ordering of the class of events, and if the class includes (or
can be augmented to include by adding sequences of coin tosses) events that have objective
probabilities k/2^n, then the probability Pr(E) of any event E can be found to any desired
level of accuracy by repeatedly asking the individual to make comparisons to establish
bounds

    (k − 1)/2^n < Pr(E) ≤ k/2^n .
This is basically Savage’s theory of subjective probabilities. Kreps (1988, chapter 8) gives a
good account.
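The bisection idea behind these bounds can be sketched in code. Here the "oracle" that answers the comparison questions simply consults a hidden true probability; in Savage's setting it would be the individual's stated preference between bets. This is my own illustration, not from the notes:

```python
def bracket_probability(more_likely_than, n):
    """Return (lo, hi) with hi - lo = 1/2**n such that the subject's
    subjective Pr(E) lies in the interval (lo, hi], using n comparisons.

    more_likely_than(q) answers: "is E more likely than an event of
    objective probability q (e.g. a run of fair coin tosses)?"
    """
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if more_likely_than(mid):
            lo = mid      # E is more likely than a prob-mid event
        else:
            hi = mid      # E is at most as likely as a prob-mid event
    return lo, hi

# Illustration: suppose the subject's true subjective probability is 0.3.
true_p = 0.3
lo, hi = bracket_probability(lambda q: true_p > q, 10)
print(lo, hi)  # brackets 0.3 to within 1/1024
```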

3 Conditional Probabilities
Suppose we start with a sample space S, and then are told that a particular event E ∈ A
with Pr(E) > 0 has happened. So uncertainty is partially resolved: we know that the actual
outcome is some elementary event in E, but do not know precisely which. We should now
redefine probabilities taking into account the fact that E has happened. These are called
conditional probabilities given E, and are defined as

    Pr(A|E) = Pr(A ∩ E) / Pr(E) .   (3)

(The definition is meaningless if Pr(E) = 0: then Pr(A ∩ E) = 0 also (why?), so the
ratio is 0/0. The definition remains valid but trivial if E is an elementary event, so that
its occurrence resolves the uncertainty completely. We can restrict the definition of these
probabilities just to sub-events of E, with the σ-field appropriately restricted. Or we could
define these probabilities for all A in the original σ-field, and say that Pr(A|E) = 0 if
A ∩ E = ∅. For our applications it makes little difference which usage we adopt.)
Example: Consider the roll of a die, and suppose all outcomes are equally likely with
probability 1/6 each. Define E to be the event that the outcome is odd (1 or 3 or 5), so
Pr(E) = 1/2. Let A be the event that the outcome is less than or equal to 3, so Pr(A) = 1/2.
Then A ∩ E = {1, 3}, and Pr(A ∩ E) = 1/3. Therefore

    Pr(A|E) = (1/3) / (1/2) = 2/3 .

Observe that Pr(A|E) ≠ Pr(A) in this example. That is because the knowledge that E
has occurred conveys some extra information about A (here the occurrence of E tells us that
the outcome cannot be 2). If such is not the case, so that the conditional probability of the
second event equals its unconditional probability, we call the events independent. Formally,
we define events A1, A2, ..., An to be independent if, for any selection of k distinct
events, say Ai1, Ai2, ..., Aik (including the case k = n where all are selected),

    Pr(Ai1 ∩ Ai2 ∩ ... ∩ Aik) = Pr(Ai1) Pr(Ai2) ... Pr(Aik) .   (4)
Then (3) immediately shows that if A and B are independent events,

    Pr(A|B) = Pr(A) Pr(B) / Pr(B) = Pr(A) .
More generally, for any k + 1 independent events A1, A2, ..., Ak, Ak+1,

    Pr(Ak+1 | A1 ∩ A2 ∩ ... ∩ Ak) = Pr(Ak+1) .

Observe that in making all these statements I omitted to say that all these events have
to be in the σ-algebra A. Since the σ-algebra structure is not of much importance in our
applications, I will continue to avoid such pedantry from now on.
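The product rule (4) can be checked directly for concrete events on the die, using exact rational arithmetic. This example and its event choices are mine, not from the notes:

```python
from fractions import Fraction

S = set(range(1, 7))

def Pr(A):
    """Uniform probability on a fair die: |A| / 6, kept exact."""
    return Fraction(len(A & S), 6)

A = {2, 4, 6}       # "the number is even",   Pr(A) = 1/2
B = {1, 2, 3, 4}    # "the number is at most 4", Pr(B) = 2/3

# The product rule (4) holds: Pr(A ∩ B) = 2/6 = (1/2)(2/3),
# so A and B are independent events.
print(Pr(A & B) == Pr(A) * Pr(B))  # True
```

Note that independence here is a property of the probability assignment, not of any causal separation between the events.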
For applications in economics, perhaps the most important result about conditional prob-
abilities is Bayes’ theorem or formula. It enables us to extract information about the prob-
abilities of some underlying events (possible causes?) by observing some other events (ef-
fects?). The situation is usually as follows. Suppose E1, E2, ..., Em is a partition of S, that
is, a collection of mutually exclusive and exhaustive events:

    Ei ∩ Ej = ∅ for i ≠ j ;   E1 ∪ E2 ∪ ... ∪ Em = S .

We know the probabilities Pr(Ei), but don't get any direct information about the occurrence
or otherwise of these events. However, there is another family of mutually exclusive and
exhaustive events A1, A2, ..., An. We know the conditional probabilities Pr(Aj|Ei), and we
do get to observe which of the Aj actually occurs. Then Bayes' formula gives us the “reverse
probabilities”

    Pr(Ei|Aj) = Pr(Aj|Ei) Pr(Ei) / Σ_{k=1..m} Pr(Aj|Ek) Pr(Ek) .   (5)

In many applications, the probabilities P r(Ei ) are the ones initially held, which are then
revised in the light of the information as to which of the Aj occurs. Therefore the P r(Ei )
are called the prior probabilities and the relevant P r(Ei |Aj ) the posterior probabilities.

To prove Bayes' formula, begin with the definition (3) to write

    Pr(Ei|Aj) = Pr(Ei ∩ Aj) / Pr(Aj) .

In the numerator, use the definition of the conditional probability in the other direction to
write

    Pr(Ei ∩ Aj) = Pr(Aj ∩ Ei) = Pr(Aj|Ei) Pr(Ei) .
Turning to the denominator, observe that the event Aj can be partitioned into mutually
exclusive and exhaustive subevents:

    Aj = (Aj ∩ E1) ∪ (Aj ∩ E2) ∪ ... ∪ (Aj ∩ Em) ,

and therefore

    Pr(Aj) = Σ_{k=1..m} Pr(Aj ∩ Ek) = Σ_{k=1..m} Pr(Aj|Ek) Pr(Ek) .

This completes the proof. Observe that I have used the symbol k for the index of summation
to distinguish the Ek here from the Ei for a specific i in the numerator of Bayes’ formula.
Example: Suppose the world consists of good guys and bad guys, and your prior is
that the probability of a random person being good is 70%. The two types have different
temptations to cheat you; a bad guy will cheat you 80% of the time and a good guy will
cheat you only 10% of the time. You interact with a person who cheats you. What is your
posterior probability that this person is bad?
Let E1 = the person is bad, E2 = the person is good. So your prior probabilities are
Pr(E1) = 0.3, Pr(E2) = 0.7. Let A1 = you get cheated, A2 = you are not cheated. The
conditional probabilities are

    Pr(A1|E1) = 0.8,  Pr(A2|E1) = 0.2,  Pr(A1|E2) = 0.1,  Pr(A2|E2) = 0.9 .

Then by Bayes' formula the required posterior probability is

    Pr(E1|A1) = Pr(A1|E1) Pr(E1) / [ Pr(A1|E1) Pr(E1) + Pr(A1|E2) Pr(E2) ]
              = (0.8 × 0.3) / (0.8 × 0.3 + 0.1 × 0.7)
              = 0.24 / 0.31 ≈ 0.774 .
I usually find it convenient to display this calculation in a matrix where the rows are the
unobservable Ei, the columns are the observable Aj, the cells are the intersections Ei ∩ Aj,
and the cell entries are the joint probabilities. Then, once an Aj is observed, we are restricted
to that column. The conditional probability of each Ei is the entry in its row of that column
divided by the column sum:

                                          Observables (effects)
                                      A1 (cheated)        A2 (not cheated)
Unobserved  E1 (bad, prior 0.3)       0.3 × 0.8 = 0.24    0.3 × 0.2 = 0.06
causes      E2 (good, prior 0.7)      0.7 × 0.1 = 0.07    0.7 × 0.9 = 0.63
            Column sum                0.31                0.69
In this example, if you were not cheated, you would revise the probability of the person
being bad down to 0.06/0.69 = 0.087 .
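The matrix calculation is easy to mechanize. The following sketch is my own (the function and variable names are illustrative); it reproduces both posteriors above:

```python
def posterior(prior, likelihood, observed):
    """Bayes' formula (5): prior maps each cause E to Pr(E);
    likelihood maps (A, E) to Pr(A|E); observed is the A that occurred."""
    # Joint probabilities Pr(E ∩ A) = Pr(A|E) Pr(E): one column of the matrix.
    joint = {E: prior[E] * likelihood[(observed, E)] for E in prior}
    total = sum(joint.values())          # Pr(observed), the column sum
    return {E: joint[E] / total for E in joint}

prior = {"bad": 0.3, "good": 0.7}
likelihood = {("cheated", "bad"): 0.8, ("not cheated", "bad"): 0.2,
              ("cheated", "good"): 0.1, ("not cheated", "good"): 0.9}

print(posterior(prior, likelihood, "cheated"))      # bad: 0.24/0.31 ≈ 0.774
print(posterior(prior, likelihood, "not cheated"))  # bad: 0.06/0.69 ≈ 0.087
```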
Such calculations will appear often when solving games with asymmetric information.
There the conditional probabilities P r(Aj |Ei ) will be the (mixed) strategies of the types of
players, and their equilibrium values will of course have to be found as a part of the solution.

4 Random Variables
Sample spaces in general can be quite abstract or can consist of objects like cards in a deck.
But often, and especially in economic applications, numerical magnitudes are associated with
events. To study these mathematically, we define the concept of a random variable as a
real-valued function on a sample space, X: S → R. Thus a random variable is actually
neither random nor variable; it is just a function.
Example: The sample space has two elementary events, “earthquake in LA” and “no
earthquake in LA,” and the random variable maps each event to the aggregate of property
values in LA in that event.
Given a random variable X, we define its (cumulative) distribution function (CDF): for
any real number t, this takes the value equal to the probability that the value of the random
variable is less than or equal to t. Symbolically, the CDF F: R → [0, 1] is defined by the
rule

    F(t) = Pr( X⁻¹((−∞, t]) ) .   (6)

(For this to be meaningful, the set in the sample space that is mapped into (−∞, t] by the
random variable X, namely the set of preimages X⁻¹((−∞, t]), has to be in the σ-field over
which probabilities are defined; for this, X has to be what is called a measurable function.
But in our applications in this course we can disregard this mathematical complexity.)
The CDF of any random variable must be non-decreasing (prove this). But it may be
flat over some ranges of R. If S is finite, then X can take on only a finite number of values,
and its CDF will be flat between the values with jumps at these values. Even if S is an
uncountably infinite continuum, X may have gaps in the values it takes and may take some
real values with positive probability, so the CDF may have flats and/or jumps.
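For a random variable with finitely many values, the step shape of the CDF is easy to see directly. A minimal sketch of mine, using the fair die again:

```python
def cdf(values_probs, t):
    """CDF of a finite random variable: F(t) = Pr(X <= t)."""
    return sum(p for x, p in values_probs if x <= t)

die = [(k, 1/6) for k in range(1, 7)]

# The CDF is flat between the values and jumps at them:
# F(2) = F(2.5) = 2/6, then a jump of 1/6 at t = 3.
print(cdf(die, 2), cdf(die, 2.5), cdf(die, 3))
```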
If the CDF is differentiable, its derivative f(t) = F′(t) is called the probability
density function (PDF) of the random variable X. This definition can be generalized by
allowing suitably infinite derivatives for step functions (Dirac delta functions); we may
occasionally do this rather heuristically.
The support of the distribution of a random variable is the subset of the real line corre-
sponding to the values the variable can take with positive probability or density. In most of

our applications, we will find that the value F (t) of the CDF is zero over an interval (−∞, tL ),
then it increases either continuously or in jumps, finally reaching 1 at tH and then staying
there over (tH , ∞). Then [tL , tH ] is the support. In special cases we may get tL = −∞
and/or tH = ∞. More generally, mathematical complications arise in the rigorous definition
and treatment of the concept of the support. We do not need these in our economic
applications, so I will omit them.
The (mathematical) expectation or expected value of a random variable X over a finite
sample space S is defined as

    E[X] = Σ_{s∈S} X(s) Pr(s) .

Important: “expected value” has no connotation of anticipation or entitlement; it is just a
mathematical term. Note that the expected value is a single number.
For an infinite sample space, we can define a corresponding integral

    E[X] = ∫_S X(s) Pr(ds) ,

but a more convenient representation is by means of the CDF F or the PDF f of X:

    E[X] = ∫_{tL}^{tH} t dF(t) = ∫_{tL}^{tH} t f(t) dt .

If the CDF has jumps, the first form of the integral has to be defined and calculated appro-
priately; we won’t go into the general theory of this but will explain it in specific examples
when (if) the issue arises.
Write E[X] = µ for short; then the variance of X is defined as

    V[X] = E[(X − µ)²] .

Prove that

    V[X] = E[X²] − µ² .
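The identity can at least be checked numerically for a simple example before you prove it; here is a sketch of mine for the fair die:

```python
def expectation(values_probs, f=lambda x: x):
    """E[f(X)] for a finite random variable given as (value, prob) pairs."""
    return sum(f(x) * p for x, p in values_probs)

die = [(k, 1/6) for k in range(1, 7)]

mu = expectation(die)                                   # E[X] = 3.5
var_def = expectation(die, lambda x: (x - mu) ** 2)     # E[(X - µ)²]
var_alt = expectation(die, lambda x: x ** 2) - mu ** 2  # E[X²] - µ²

print(mu, var_def, var_alt)  # both variance forms agree: 35/12 ≈ 2.9167
```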
Other moments are similarly defined. Some further formulas for expectations of other func-
tions of a random variable (e.g. exponential) and for CDFs, PDFs, means, variances etc. of
particular random variables (negative exponential, normal, etc.) will be needed from time
to time. You should know most of these from your probability and statistics courses, and
we will develop others as needed.
More reminders or wake-up calls for matters from probability theory and statistics will
appear in the first problem set.

5 Further Reading
The only required reading for this background is your textbook in your prerequisite statistics
course, ECO202 (old ECO200) or ORF245. For those interested, here is some more:
Feller, William. 1968. An Introduction to Probability Theory and Its Applications: Volume I.
Third Edition. New York: Wiley.

The Introduction chapter of this gives an outstanding discussion of the conceptual founda-
tions. Chapters I, V, and IX cover the above material for finite sample spaces.
Billingsley, Patrick. 1986. Probability and Measure. Second Edition. New York: Wiley.
This gives the rigorous general theory for arbitrary sample spaces.
If you want to find out more about the Savage approach to subjective probability, read
Kreps, David. 1988. Notes on the Theory of Choice. Boulder, CO: Westview Press. Chap-
ter 8.
