Probability Basics
Probability assigns numbers to events. It forms the foundation of statistics, and can be
studied independently.
1.1 Axioms of probability
The fundamental notion is that of a probability space, which is composed of three elements:

The sample space Ω. It represents a population formed of individuals, firms, etc.

A collection of events F. An event (one element of F) is a subset F ⊆ Ω.

A probability measure P. P is a function which maps F onto [0, 1]. It assigns a probability to each event.
1.2 Counting techniques
Simple events with equal probability: Assume that the sample space consists of a finite number of simple events with equal probability. Then if A ∈ F, n(A) is the number of simple events contained in A, and n(Ω) is the number of simple events contained in Ω, we have:

$$P(A) = \frac{n(A)}{n(\Omega)}.$$
Example: the probability of having two tails if two coins are tossed is:

$$P(\{TT\}) = \frac{n(\{TT\})}{n(\{HH, HT, TH, TT\})} = \frac{1}{4}.$$
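Under the equal-probability rule, P(A) = n(A)/n(Ω) can be checked by brute-force enumeration. A minimal Python sketch (the variable names are ours):

```python
from itertools import product

# Enumerate the sample space of two coin tosses; each simple event is
# equally likely, so P(A) = n(A) / n(Omega).
omega = list(product("HT", repeat=2))          # [('H','H'), ('H','T'), ...]
event = [w for w in omega if w == ("T", "T")]  # the event {TT}
p_two_tails = len(event) / len(omega)
print(p_two_tails)  # 0.25
```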
Permutations: The number of distinct ordered sets taking r elements from n elements is

$$\frac{n!}{(n-r)!} = n(n-1)\cdots(n-r+1) \equiv A_n^r.$$
Combinations: The number of distinct sets taking r elements from n elements is

$$\frac{n!}{r!(n-r)!} = \frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 1} \equiv C_n^r.$$
Note that Newton's formula is

$$(a+b)^n = \sum_{k=0}^{n} C_n^k a^k b^{n-k}.$$

So in particular:

$$\sum_{k=0}^{n} C_n^k = 2^n.$$
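These counting formulas are available in Python's standard library (math.perm, math.comb); a quick numerical check of the formulas above, with n = 5 and r = 2 chosen arbitrarily:

```python
import math

n, r = 5, 2
# Permutations: ordered selections, n!/(n-r)!
a_nr = math.factorial(n) // math.factorial(n - r)
assert a_nr == math.perm(n, r) == 20

# Combinations: unordered selections, n!/(r!(n-r)!)
c_nr = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
assert c_nr == math.comb(n, r) == 10

# Newton's formula with a = b = 1: sum_k C(n,k) = 2^n
assert sum(math.comb(n, k) for k in range(n + 1)) == 2 ** n
```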
1.3 Conditional probability
We are often interested in the properties of an economic variable given others. Example:
wage given education and age. Moreover, most economic models hold ceteris paribus,
hence the need to condition.
Definition: Let A and B be events. The probability that A occurs given B is:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

Law of total probability: if A_1, ..., A_J form a partition of Ω, then:

$$P(B) = \sum_{j=1}^{J} P(B|A_j) P(A_j).$$
Bayes' theorem:

$$P(A|B) = \frac{P(B|A) P(A)}{P(B)}.$$
Numerical application:

$$P(A|B) = \frac{\frac{10}{20} \cdot \frac{20}{30}}{\frac{10}{20} \cdot \frac{20}{30} + \frac{5}{10} \cdot \frac{10}{30}} = \frac{2}{3}.$$

Independence: A and B are said to be independent iff P(A ∩ B) = P(A)P(B), or equivalently P(A|B) = P(A). That is: A is independent of B if B (resp. A) does not bring any new information on A (resp. B).
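Bayes' theorem and the law of total probability combine in a few lines. A Python sketch with invented numbers (a test with hit rate 0.95, false-positive rate 0.05, and prevalence 0.01; none of these figures come from the notes):

```python
# Hypothetical numbers: a test detects a condition with P(B|A) = 0.95,
# false-positive rate P(B|not A) = 0.05, and prevalence P(A) = 0.01.
p_a, p_b_given_a, p_b_given_not_a = 0.01, 0.95, 0.05

# Law of total probability: P(B) = sum_j P(B|A_j) P(A_j)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # roughly 0.16: a positive test is far from conclusive
```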
Ex: A ∈ {T, H} = result of the first coin; B ∈ {T, H} = result of the second coin.
We can also define pairwise independence and mutual independence.
Exercise: If the probability of being admitted to one school is 1/n and you apply to n schools, what is the probability that you will be admitted to at least one school? Assuming admission decisions are independent:

$$1 - P(\text{not admitted at school } 1)\cdots P(\text{not admitted at school } n) = 1 - \left(1 - \frac{1}{n}\right)^n.$$
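The answer can be checked numerically; note that as n grows, 1 − (1 − 1/n)^n converges to 1 − 1/e ≈ 0.632. A Python sketch:

```python
import math

# P(admitted to at least one of n schools) = 1 - (1 - 1/n)^n,
# which converges to 1 - 1/e as n grows.
def p_at_least_one(n):
    return 1 - (1 - 1 / n) ** n

print(p_at_least_one(2))    # 0.75
print(p_at_least_one(100))  # already close to 1 - 1/e
assert abs(p_at_least_one(10**6) - (1 - math.exp(-1))) < 1e-4
```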
Definition: A random variable is a variable that takes values according to a certain probability distribution. More formally, a random variable is a function of the sample space Ω onto R^K:

$$X : \Omega \to \mathbb{R}^K, \quad \omega \mapsto X(\omega).$$

If K = 1, X is said to be univariate.

We generally can forget about Ω. It will be sufficient to think of X as a variable that can take some values with some probability; that is: X is stochastic (as opposed to deterministic).
To every univariate random variable X there corresponds a cumulative distribution function (cdf), which maps the real line onto the interval [0, 1], given by:

$$F_X(x) = P\{\omega \in \Omega,\ X(\omega) \le x\} \equiv P(X \le x).$$

Discrete random variable: X takes values x_1 < x_2 < ... with positive probability, and its cdf is a step function satisfying:

$$0 = F_X(x_1^-) < F_X(x_1) = F_X(x_2^-) < F_X(x_2) < \ldots < 1.$$

The probability mass function (pmf) is f_X(x) = P(X = x).

Binomial: The sample space is {0, 1, ..., n}. The pmf of a binomially distributed random variable with parameters n and p is:

$$f_X(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}, \quad x = 0, \ldots, n,$$

and one checks that $\sum_{x=0}^{n} f_X(x) = 1$.
Poisson: The sample space is {0, 1, 2, 3, ...}. The pmf of a Poisson distributed random variable with parameter λ > 0 is:

$$f_X(x) = \exp(-\lambda) \frac{\lambda^x}{x!}.$$

We verify that:

$$\sum_{x=0}^{\infty} f_X(x) = \exp(-\lambda) \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = 1.$$

The Poisson is widely used to model arrival processes. Example: the arrival of job offers.
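The verification that the Poisson pmf sums to one can be illustrated numerically (a Python sketch; λ = 3 is an arbitrary choice):

```python
import math

lam = 3.0  # arbitrary rate parameter

def poisson_pmf(x, lam):
    # f_X(x) = exp(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

# The pmf sums to 1 over {0, 1, 2, ...}; the partial sum converges quickly
total = sum(poisson_pmf(x, lam) for x in range(100))
print(total)  # ~1.0
```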
Continuous random variable: X is (absolutely) continuous if its cdf is differentiable. Its probability density function (pdf) is then:

$$f_X(x) = \frac{dF_X(x)}{dx},$$

and it satisfies $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$.

This implies that the event X = a has probability zero! Still, this is not an impossible event. Example: the probability of having a yearly wage of 114,773.6 dollars is zero. However, there can be individuals with that wage in the sample.
Conditional density function: Let [x_1, x_2] ⊆ [a, b]. Then:

$$P(X \in [x_1, x_2] \mid X \in [a, b]) = \frac{P(X \in [x_1, x_2])}{P(X \in [a, b])} = \frac{\int_{x_1}^{x_2} f_X(x)\,dx}{\int_{a}^{b} f_X(x)\,dx}.$$

The conditional pdf of X given X ∈ [a, b] is therefore:

$$f_X(x \mid X \in [a, b]) = \frac{f_X(x)}{\int_{a}^{b} f_X(x)\,dx} \ \text{if } x \in [a, b], \quad = 0 \ \text{if } x \notin [a, b].$$
Example: Let f_X be the pdf of the wage variable. The pdf of the wage, given that one belongs to the 20% poorest of the population, is:

$$f_X(x \mid x \in [0, F_X^{-1}(.20)]) = \frac{f_X(x)}{F_X(F_X^{-1}(.20))} = 5 f_X(x) \ \text{if } x \in [0, F_X^{-1}(.20)],$$

since F_X(F_X^{-1}(.20)) = .20.
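The rescaling by 1/P(X ∈ [a, b]) can be checked numerically. A Python sketch using an exponential(1) distribution, which is our choice of example, not the notes':

```python
import math

# Conditional pdf after truncation to the bottom 20%:
# f(x | X <= q) = f(x) / P(X <= q) = 5 f(x), where q = F^{-1}(0.2).
# Illustrated with an exponential(1) distribution: F(x) = 1 - exp(-x).
q = -math.log(0.8)          # F^{-1}(0.2)
f = lambda x: math.exp(-x)  # pdf of exponential(1)

# The conditional pdf should integrate to 1 over [0, q] (midpoint rule)
n = 100_000
dx = q / n
integral = sum(5 * f((i + 0.5) * dx) * dx for i in range(n))
print(integral)  # ~1.0
```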
Standard normal: The sample space is (−∞, +∞), and the pdf of a standard normal random variable is given by:

$$f_X(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) \equiv \phi(x).$$

The cdf has no simple analytical expression:

$$F_X(x) = \int_{-\infty}^{x} \phi(\tilde{x})\,d\tilde{x} \equiv \Phi(x).$$
By symmetry, φ(−x) = φ(x) and:

$$\Phi(-x) = 1 - \Phi(x).$$

More generally, if X is normal with mean μ and variance σ², its pdf and cdf are:

$$f_X(x) = \frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right), \quad F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right).$$

The normal family plays a central role in statistics and econometrics. In the course, we will understand why when studying the central limit theorem.
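Although Φ has no closed form, it can be computed from the error function in Python's math module via Φ(x) = (1 + erf(x/√2))/2. A sketch:

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # standard normal cdf via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

assert Phi(0) == 0.5
# symmetry: Phi(-x) = 1 - Phi(x)
assert abs(Phi(-1.3) - (1 - Phi(1.3))) < 1e-12
print(Phi(1.96))  # ~0.975, the familiar two-sided 5% critical value
```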
Mixed random variable: There can also be mixtures of continuous and discrete random variables. An example arises when random variables are censored. A convenient way to represent a mixed random variable is as a pair of two random variables, one discrete and the other continuous.

Example: financial assets owned by a household (there are typically many zeros). Let X be the value of the financial assets, and Y be the dummy variable (i.e. binary 0/1) indicating participation in financial markets. We study (X, Y).
Functions of a random variable: Let Y = g(X), where g is strictly monotone. Then:

$$F_Y(y) = \begin{cases} P(X \le g^{-1}(y)) = F_X(g^{-1}(y)) & \text{if } g(\cdot) \text{ is increasing,} \\ P(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y)) & \text{if } g(\cdot) \text{ is decreasing.} \end{cases}$$
Theorem: If X is (absolutely) continuous, the pdf of Y is given by:

$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{dg^{-1}(y)}{dy} \right|.$$
Example: if X, the log-wage, is distributed as a standard normal, then the pdf of the wage Y = exp(X) is given by:

$$f_Y(y) = f_X(\log(y)) \frac{1}{y} = \frac{\phi(\log y)}{y}.$$

This is the pdf of the (standard) log-normal distribution.
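The change-of-variables result can be sanity-checked by verifying that φ(log y)/y integrates to (almost) one over (0, ∞). A Python sketch using crude midpoint integration over a truncated range:

```python
import math

def phi(x):
    # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def lognormal_pdf(y):
    # change of variables Y = exp(X), X ~ N(0,1): f_Y(y) = phi(log y) / y
    return phi(math.log(y)) / y

# Midpoint rule over (0, 50]; the truncated tail mass is negligible
n, upper = 200_000, 50.0
dy = upper / n
total = sum(lognormal_pdf((i + 0.5) * dy) * dy for i in range(n))
print(total)  # ~1.0
```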
Remark: as g(g^{-1}(y)) = y, we have:

$$\frac{dg^{-1}(y)}{dy} = \frac{1}{g'(g^{-1}(y))},$$

so:

$$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{dg^{-1}(y)}{dy} \right| = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}.$$
Mixtures of normals: A K-component normal mixture has pdf:

$$f_X(x) = \sum_{k=1}^{K} \frac{p_k}{\sigma_k}\,\phi\!\left(\frac{x - \mu_k}{\sigma_k}\right),$$

where $\sum_{k=1}^{K} p_k = 1$.

Result: every continuous pdf can be approximated by the pdf of a mixture of normals. This result is true when K is not fixed. In practice, the choice of K is analogous to the choice of a bandwidth in nonparametric estimation, and so can be difficult. Still, the family of normal mixtures is remarkable for its simplicity and its ability to fit arbitrary continuous distributions.
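A normal mixture is easy to evaluate and sample from. A Python sketch with a two-component mixture whose weights, means and standard deviations are our own arbitrary example:

```python
import math, random

def phi(x):
    # standard normal pdf
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Two-component mixture: weights p_k, means mu_k, sds sigma_k (arbitrary)
p  = [0.3, 0.7]
mu = [-2.0, 1.0]
sd = [0.5, 1.5]

def mixture_pdf(x):
    # f(x) = sum_k (p_k / sigma_k) * phi((x - mu_k) / sigma_k)
    return sum(pk / sk * phi((x - mk) / sk) for pk, mk, sk in zip(p, mu, sd))

def mixture_draw(rng=random):
    # sample a component label, then draw from the corresponding normal
    k = 0 if rng.random() < p[0] else 1
    return rng.gauss(mu[k], sd[k])

# The mixture pdf integrates to ~1 (midpoint rule on a wide interval)
dx, lo, hi = 0.001, -10.0, 10.0
total = sum(mixture_pdf(lo + (i + 0.5) * dx) * dx for i in range(int((hi - lo) / dx)))
print(total)  # ~1.0
```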
Bivariate discrete random variables: the joint pmf of (X, Y) is f_{X,Y}(x_j, y_k) = P(X = x_j, Y = y_k), and it satisfies:

$$\sum_j \sum_k f_{X,Y}(x_j, y_k) = 1.$$

Example 1:

X \ Y    0      1
0        1/2    1/4
1        0      1/4

The pmf of (X, Y) is such that f_{X,Y}(0, 0) = 1/2, f_{X,Y}(0, 1) = 1/4, f_{X,Y}(1, 0) = 0, f_{X,Y}(1, 1) = 1/4, and zero everywhere else. The cdf satisfies F_{X,Y}(0, 0) = 1/2, F_{X,Y}(0, 1) = 3/4, F_{X,Y}(1, 0) = 1/2, F_{X,Y}(1, 1) = 1, extended in the usual way.
Example 2: Trinomial distribution with parameters n, p, q, where p + q ≤ 1. The pmf is given by:

$$f_{X,Y}(x, y) = \frac{n!}{x!\,y!\,(n-x-y)!}\,p^x q^y (1-p-q)^{n-x-y}$$

for all (x, y) ∈ {0, ..., n}² such that x + y ≤ n, and zero everywhere else. This can be a relevant framework to model labor market participation, unemployment and employment over n months.
We check that:

$$\sum_{x=0}^{n} \sum_{y=0}^{n-x} f_{X,Y}(x, y) = 1.$$
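This double sum can be verified numerically. A Python sketch (the values of n, p, q are arbitrary choices):

```python
import math

def trinomial_pmf(x, y, n, p, q):
    # f(x, y) = n! / (x! y! (n-x-y)!) * p^x q^y (1-p-q)^(n-x-y)
    return (math.factorial(n)
            / (math.factorial(x) * math.factorial(y) * math.factorial(n - x - y))
            * p ** x * q ** y * (1 - p - q) ** (n - x - y))

n, p, q = 12, 0.2, 0.5  # arbitrary parameters with p + q <= 1
total = sum(trinomial_pmf(x, y, n, p, q)
            for x in range(n + 1) for y in range(n - x + 1))
print(total)  # ~1.0
```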
Marginal probability:

$$f_X(x) = P(X = x) = \sum_y P(X = x, Y = y),$$

so:

$$f_X(x) = \sum_y f_{X,Y}(x, y).$$

Example 1: the marginals are f_X(0) = 3/4, f_X(1) = 1/4, and f_Y(0) = f_Y(1) = 1/2:

X \ Y    0      1      f_X
0        1/2    1/4    3/4
1        0      1/4    1/4
f_Y      1/2    1/2    1
Example 2: The marginals of the trinomial distribution are binomial. For instance:

$$f_X(x) = \sum_{y=0}^{n-x} \frac{n!}{x!\,y!\,(n-x-y)!}\,p^x q^y (1-p-q)^{n-x-y} = \frac{n!}{x!(n-x)!}\,p^x (1-p)^{n-x}.$$
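The marginal-equals-binomial identity can also be checked numerically, term by term (again with arbitrary n, p, q):

```python
import math

n, p, q = 12, 0.2, 0.5  # arbitrary parameters with p + q <= 1

def trinomial_pmf(x, y):
    return (math.factorial(n)
            / (math.factorial(x) * math.factorial(y) * math.factorial(n - x - y))
            * p ** x * q ** y * (1 - p - q) ** (n - x - y))

# marginal of X: sum the joint pmf over y
marginal = [sum(trinomial_pmf(x, y) for y in range(n - x + 1)) for x in range(n + 1)]
# binomial(n, p) pmf
binomial = [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
ok = all(abs(a - b) < 1e-12 for a, b in zip(marginal, binomial))
print(ok)  # True
```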
Conditional probability: the conditional pmf of Y given X = x is:

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$

It is a proper pmf, as:

$$\sum_k f_{Y|X}(y_k|x) = \sum_k \frac{f_{X,Y}(x, y_k)}{f_X(x)} = \frac{\sum_k f_{X,Y}(x, y_k)}{f_X(x)} = \frac{f_X(x)}{f_X(x)} = 1.$$
Example 1: the conditional pmf given Y = 0 puts all its mass on X = 0:

$$f_{X|Y}(0|0) = \frac{P(X = 0, Y = 0)}{P(Y = 0)} = \frac{1/2}{1/2} = 1.$$
Independence: Two discrete random variables are said to be independent iff {X = x_j} and {Y = y_k} are independent for all j, k, i.e.

$$P(X = x_j, Y = y_k) = P(X = x_j) P(Y = y_k).$$
More generally, X_1, ..., X_N are mutually independent iff their joint pmf factorizes as:

$$f_{X_1, \ldots, X_N}(x_1, \ldots, x_N) = \prod_{j=1}^{N} f_{X_j}(x_j),$$

and if in addition they are identically distributed with common pmf f_X, this equals $\prod_{j=1}^{N} f_X(x_j)$.
Bivariate continuous random variables: the joint cdf of (X, Y) is:

$$F_{X,Y}(b_1, b_2) = P(X \le b_1, Y \le b_2) = \int_{-\infty}^{b_1} \int_{-\infty}^{b_2} f_{X,Y}(x, y)\,dy\,dx,$$

where f_{X,Y} is the joint pdf. Remark: in this course we shall assume that the regularity conditions are such that Fubini's theorem can be applied, so that the order of integration can be interchanged.
We have:

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x \partial y}.$$

As F_{X,Y} is increasing with respect to both arguments, it follows that f_{X,Y} is nonnegative. Moreover: $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\,dy\,dx = 1$. Together, these two properties characterize a bivariate pdf.
Moreover:

$$P(X = a,\ b_1 \le Y \le b_2) = 0.$$

As in the univariate case, this zero-probability event is not impossible.
Example 1: the roof distribution, the joint pdf of which is:

$$f_{X,Y}(x, y) = x + y, \quad 0 \le x \le 1,\ 0 \le y \le 1.$$

Its marginal pdf is:

$$f_X(x) = \int_0^1 f_{X,Y}(x, y)\,dy = \int_0^1 (x + y)\,dy = x + \frac{1}{2}, \quad \text{for } 0 \le x \le 1.$$
Conditional density: Given a continuous bivariate random variable (X, Y), one can define its conditional pdfs:

$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}; \quad f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$
Example: for the roof distribution,

$$f_{Y|X}(y|x) = \frac{x + y}{x + \frac{1}{2}}, \quad 0 \le y \le 1.$$
The conditional cdf is:

$$F_{Y|X}(y|x) = \int_{-\infty}^{y} f_{Y|X}(\tilde{y}|x)\,d\tilde{y} = \lim_{h \to 0} P(Y \le y \mid x - h \le X \le x + h).$$
Moreover:

$$F_{X,Y}(x, y) = \int_{-\infty}^{x} f_X(\tilde{x})\,F_{Y|X}(y|\tilde{x})\,d\tilde{x} = \int_{-\infty}^{y} f_Y(\tilde{y})\,F_{X|Y}(x|\tilde{y})\,d\tilde{y}.$$
Change of variables: As an example, consider the roof distribution, and let Z = X + Y. The pdf of Z is:

$$f_Z(z) = \int f_{X,Y}(x, z - x)\,dx = \int_{\max(z-1,\,0)}^{\min(z,\,1)} \left( x + (z - x) \right) dx,$$

which yields:

$$f_Z(z) = z \left( \min(z, 1) - \max(z - 1, 0) \right), \quad \text{for } z \in [0, 2].$$
We check that fZ is indeed a pdf.
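The final formula can indeed be checked numerically: f_Z should integrate to one over [0, 2]. A Python sketch:

```python
# pdf of Z = X + Y under the roof distribution f(x, y) = x + y on [0,1]^2:
# f_Z(z) = z * (min(z, 1) - max(z - 1, 0)) for z in [0, 2]
def f_z(z):
    return z * (min(z, 1.0) - max(z - 1.0, 0.0))

# midpoint-rule integral over [0, 2]
n = 200_000
dz = 2.0 / n
total = sum(f_z((i + 0.5) * dz) * dz for i in range(n))
print(total)  # ~1.0
```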
Mixed random variables: In the case of mixed distributions, it is often convenient
to specify the marginal discrete pmf and the conditional pdf of the continuous variable
given the discrete one.
Example: wage and type of contract.
Theorem. Let U and V be two independent standard univariate normal r.v.s. Let ρ be a constant such that |ρ| < 1. Let:

$$X = U; \quad Y = \rho U + \sqrt{1 - \rho^2}\,V.$$

Then the joint pdf of (X, Y), denoted φ₂, is called the standard bivariate normal pdf with parameter ρ. We shall denote:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right).$$
Proof. To show this result, note that Y|X ∼ N(ρX, 1 − ρ²). Hence its pdf is:

$$f_{Y|X}(y|x) = \frac{1}{\sqrt{1 - \rho^2}}\,\phi\!\left( \frac{y - \rho x}{\sqrt{1 - \rho^2}} \right).$$

Hence:

$$f_{X,Y}(x, y) = \frac{1}{\sqrt{1 - \rho^2}}\,\phi\!\left( \frac{y - \rho x}{\sqrt{1 - \rho^2}} \right) \phi(x).$$
More generally, the bivariate normal pdf with means μ₁, μ₂, standard deviations σ₁, σ₂, and correlation ρ is:

$$f_{Z_1,Z_2}(z_1, z_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{z_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{z_1 - \mu_1}{\sigma_1} \right)\!\left( \frac{z_2 - \mu_2}{\sigma_2} \right) + \left( \frac{z_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\},$$

and we denote:

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix} \right).$$
One can check that:

$$Z_2 - \mu_2 - \rho \frac{\sigma_2}{\sigma_1} (Z_1 - \mu_1) \sim N\left(0,\ (1 - \rho^2)\,\sigma_2^2\right),$$

and is independent of Z₁.
Two remarks:
The bivariate normal is more than simply normal marginals: the dependence structure is also normal.
Example: in studies of default risk, the normal is typically found to present too little
dependence in the tail.
Two univariate normal variables are not necessarily independent. They can be
dependent if the dependence structure is not normal.
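The construction in the theorem doubles as a simulation recipe: drawing many (U, V) pairs and forming (X, Y) should give unit variances and correlation ρ. A Python sketch with ρ = 0.6 (our arbitrary choice):

```python
import math, random

rng = random.Random(0)  # fixed seed for reproducibility
rho = 0.6
n = 200_000

# construct (X, Y) from independent standard normals U, V:
# X = U,  Y = rho * U + sqrt(1 - rho^2) * V
xs, ys = [], []
for _ in range(n):
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    xs.append(u)
    ys.append(rho * u + math.sqrt(1 - rho ** 2) * v)

# Y is also standard normal, and corr(X, Y) = rho
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
corr = sum(x * y for x, y in zip(xs, ys)) / n  # means ~0, variances ~1
print(round(var_y, 2), round(corr, 2))
```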
The multivariate normal: Let X be a K-dimensional random vector with mean μ and (nonsingular) covariance matrix Σ. The pdf of X is:

$$f_X(x) = (2\pi)^{-\frac{K}{2}}\,|\Sigma|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}.$$

If Σ is diagonal with diagonal elements σ₁², ..., σ_K², then:

$$(x - \mu)' \Sigma^{-1} (x - \mu) = \sum_{k=1}^{K} \frac{(x_k - \mu_k)^2}{\sigma_k^2},$$

and:

$$f_X(x) = (2\pi)^{-\frac{K}{2}}\,\frac{1}{\sigma_1 \cdots \sigma_K} \exp\left\{ -\frac{1}{2} \sum_{k=1}^{K} \frac{(x_k - \mu_k)^2}{\sigma_k^2} \right\} = \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left\{ -\frac{(x_k - \mu_k)^2}{2 \sigma_k^2} \right\},$$

so that the components of X are mutually independent.
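The factorization under a diagonal Σ can be checked at any point x by comparing the joint form with the product form. A Python sketch with arbitrary μ, σ, and x:

```python
import math

# K-variate normal with diagonal covariance: the pdf factorizes into a
# product of univariate normal pdfs (mu, sigma, x are arbitrary examples)
mu = [0.0, 1.0, -2.0]
sigma = [1.0, 0.5, 2.0]
x = [0.3, 0.8, -1.0]
K = len(mu)

# joint form: (2 pi)^(-K/2) (sigma_1...sigma_K)^(-1) exp(-quad/2)
quad = sum(((xk - mk) / sk) ** 2 for xk, mk, sk in zip(x, mu, sigma))
joint = (2 * math.pi) ** (-K / 2) / math.prod(sigma) * math.exp(-quad / 2)

# product form: prod_k (1 / (sqrt(2 pi) sigma_k)) exp(-(x_k - mu_k)^2 / (2 sigma_k^2))
prod = math.prod(
    math.exp(-((xk - mk) ** 2) / (2 * sk ** 2)) / (math.sqrt(2 * math.pi) * sk)
    for xk, mk, sk in zip(x, mu, sigma)
)
print(abs(joint - prod) < 1e-12)  # True
```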