Revision Notes - ST2131
Ma Hongqiang
April 18, 2017
Contents
1 Combinatorial Analysis
2 Axioms of Probability
3 Conditional Probability and Independence
4 Random Variables
5 Continuous Random Variable
6 Jointly Distributed Random Variables
7 Properties of Expectation
8 Limit Theorems
9 Problems
1 Combinatorial Analysis
Theorem 1.1 (Generalised Basic Principle of Counting).
Suppose that r experiments are to be performed. If
• experiment 1 can result in n1 possible outcomes;
• experiment 2 can result in n2 possible outcomes;
• ···
• experiment r can result in nr possible outcomes;
then together there are n1 n2 · · · nr possible outcomes of the r experiments.
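For instance, if a meal is formed by choosing one of 3 starters, one of 4 mains and one of 2 desserts, the generalised basic principle gives 3 · 4 · 2 = 24 possible meals.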
1.1 Permutations
Theorem 1.2 (Permutation of distinct objects).
Suppose there are n distinct objects; then the total number of permutations is n!.
Theorem 1.3 (General principle of permutation).
For n objects of which n1 are alike, n2 are alike, . . ., nr are alike, there are
$$\frac{n!}{n_1!\,n_2!\cdots n_r!}$$
different permutations of the n objects.
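For example, the number of distinct arrangements of the letters of PEPPER (three P's, two E's, one R) is
$$\frac{6!}{3!\,2!\,1!} = 60$$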
1.2 Combinations
Theorem 1.4 (General principle of combination).
If we choose a group of r items from n distinct objects, the number of combinations equals
$$\binom{n}{r} = \frac{n!}{r!(n-r)!}$$
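For example, the number of ways to choose a group of 2 items from 5 distinct objects is
$$\binom{5}{2} = \frac{5!}{2!\,3!} = 10$$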
1.3 Multinomial Coefficients
If n1 + n2 + · · · + nr = n, we define the multinomial coefficient by
$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\,n_2!\cdots n_r!}$$
Thus it represents the number of possible divisions of n distinct objects into r distinct groups of respective sizes n1 , n2 , . . . , nr .
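For instance, 10 distinct players can be divided into three teams of sizes 5, 3 and 2 in
$$\binom{10}{5, 3, 2} = \frac{10!}{5!\,3!\,2!} = 2520$$
ways.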
2 Axioms of Probability
2.1 Sample Space and Events
The basic object of probability is an experiment: an activity or procedure that produces
distinct, well-defined possibilities called outcomes.
The sample space is the set of all possible outcomes of an experiment, usually denoted by
S.
Any subset E of the sample space is an event.
A probability measure P on the sample space S satisfies the following axioms:
(i) 0 ≤ P (A) ≤ 1 for every event A;
(ii) P (S) = 1;
(iii) For any sequence of mutually exclusive events A1 , A2 , . . . (that is, Ai Aj = ∅ when i ≠ j),
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
3 Conditional Probability and Independence
3.1 Conditional Probabilities
Definition 3.1. Let A and B be two events. Suppose that P (A) > 0, the conditional
probability of B given A is defined as
$$\frac{P(AB)}{P(A)}$$
and is denoted by P (B|A).
Suppose P (A) > 0, then P (AB) = P (A)P (B|A).
Theorem 3.1 (General Multiplication Rule).
Let A1 , A2 , . . . , An be n events; then
$$P(A_1 A_2 \cdots A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 A_2)\cdots P(A_n \mid A_1 A_2 \cdots A_{n-1})$$
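For example, the probability of drawing two aces in two draws without replacement from a standard 52-card deck is
$$P(A_1 A_2) = P(A_1)\,P(A_2 \mid A_1) = \frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221}$$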
Theorem 3.4. If A and B are independent, then so are
1. A and B c
2. Ac and B
3. Ac and B c
2. P (S | A) = 1
4 Random Variables
4.1 Random Variables
Definition 4.1. A random variable X is a mapping from the sample space to the real numbers.
4.4 Expectation of a Function of a Random Variable
Theorem 4.1. If X is a discrete random variable that takes values xi , i ≥ 1, with respective probabilities pX (xi ), then for any real-valued function g,
$$E[g(X)] = \sum_{i} g(x_i)\,p_X(x_i) \quad \text{or equivalently} \quad E[g(X)] = \sum_{x} g(x)\,p_X(x)$$
2. Binomial random variable, denoted by Bin(n, p)
We perform the experiment (under identical conditions and independently) n times and define
X = number of successes in n Bernoulli(p) trials
Therefore, X takes values 0, 1, . . . , n. In fact, for 0 ≤ k ≤ n,
$$P(X = k) = \binom{n}{k} p^k q^{n-k}$$
Here,
$$E(X) = np, \qquad \text{Var}(X) = np(1 - p)$$
Also, a useful fact is
$$\frac{P(X = k + 1)}{P(X = k)} = \frac{p}{1 - p} \cdot \frac{n - k}{k + 1}$$
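For example, if X ∼ Bin(10, 0.5), then
$$P(X = 3) = \binom{10}{3} (0.5)^3 (0.5)^7 = \frac{120}{1024} \approx 0.117$$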
3. Geometric random variable, denoted by Geom(p)
Define
X = number of Bernoulli(p) trials needed to obtain the first success
so X takes values 1, 2, . . . with
$$P(X = k) = p\,q^{k-1}$$
And,
$$E(X) = \frac{1}{p}, \qquad \text{Var}(X) = \frac{1 - p}{p^2}$$
4. Negative Binomial random variable, denoted by NB(r, p)
Define the random variable
X = number of Bernoulli(p) trials needed to accumulate r successes
And
$$E(X) = \frac{r}{p}, \qquad \text{Var}(X) = \frac{r(1 - p)}{p^2}$$
4.7 Poisson Random Variable
A random variable X is said to have a Poisson distribution with parameter λ if X takes
values 0, 1, 2 . . . with probabilities given as:
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$$
And
$$E(X) = \lambda, \qquad \text{Var}(X) = \lambda$$
The Poisson distribution with parameter λ := np can be used as an approximation for a binomial distribution with parameters (n, p) when n is large and p is small, so that np is moderate.
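For example, with n = 100 and p = 0.01, so that λ = 1, the exact binomial probability
$$P(X = 0) = (0.99)^{100} \approx 0.366$$
is close to the Poisson approximation $e^{-\lambda} = e^{-1} \approx 0.368$.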
(Poisson Paradigm) Consider n events, with pi equal to the probability that event i occurs, i = 1, . . . , n. If all the pi are small and the trials are either independent or at most weakly dependent, then the number of these events that occur approximately has a Poisson distribution with mean $\sum_{i=1}^{n} p_i =: \lambda$. Another use is the Poisson process.
2. $\lim_{b \to \infty} F_X(b) = 1$
3. $\lim_{b \to -\infty} F_X(b) = 0$
5 Continuous Random Variable
5.1 Introduction
Definition 5.1. We say that X is a continuous random variable if there exists a non-negative function fX , defined for all real x ∈ R, having the property that, for any set B of real numbers,
$$P(X \in B) = \int_B f_X(x)\,dx$$
The function fX is called the probability density function of the random variable X.
For instance, letting B = [a, b], we have
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
2. FX is continuous.
1. $$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$$
2. Linearity property (as in the discrete case):
$$E(aX + b) = aE(X) + b$$
In general, for a < b, a random variable X is uniformly distributed over the interval (a, b) if
its probability density function is given by
$$f_X(x) = \begin{cases} \frac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$
5.4 Normal Distribution
A random variable is said to be normally distributed with parameters µ and σ² if its probability density function is given by
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
2. −Z ∼ N (0, 1)
3. P (Z ≤ x) = 1 − P (Z > x)
4. P (Z ≤ −x) = P (Z ≥ x)
5. If Y ∼ N (µ, σ²), then X = (Y − µ)/σ ∼ N (0, 1)
Important facts:
Definition 5.4. The qth quantile of a random variable X is defined as a number zq so that
P (X ≤ zq ) = q.
5.5 Exponential Distribution
A random variable X is said to be exponentially distributed with parameter λ > 0 if its
probability density function is given by
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
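For instance, the tail probability of an exponential random variable is
$$P(X > t) = \int_t^{\infty} \lambda e^{-\lambda x}\,dx = e^{-\lambda t}, \quad t \ge 0,$$
so with λ = 1/2 and t = 2 we get $e^{-1} \approx 0.368$.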
5.6 Gamma Distribution
A random variable X is said to have a gamma distribution with parameters (α, λ), denoted by X ∼ Γ(α, λ), if its probability density function is given by
$$f_X(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where λ > 0, α > 0, and Γ(α), called the gamma function, is defined by
$$\Gamma(\alpha) = \int_0^{\infty} e^{-y} y^{\alpha - 1}\,dy$$
Remark:
1. Γ(1) = 1.
2. Γ(α) = (α − 1)Γ(α − 1)
4. Γ(1, λ) = Exp(λ).
5. Γ(1/2) = √π
5.7 Beta Distribution
A random variable X is said to have a beta distribution with parameter (a, b), denoted
by X ∼ Beta(a, b), if its density is given by
$$f(x) = \begin{cases} \dfrac{1}{B(a, b)}\, x^{a-1} (1 - x)^{b-1}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
where
$$B(a, b) = \int_0^1 x^{a-1} (1 - x)^{b-1}\,dx$$
is known as the beta function.
It can be shown that
$$B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$$
If X ∼ Beta(a, b), then
$$E[X] = \frac{a}{a + b} \quad \text{and} \quad \text{Var}(X) = \frac{ab}{(a + b)^2 (a + b + 1)}$$
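For example, if X ∼ Beta(2, 3), then E[X] = 2/5 and Var(X) = (2 · 3)/(5² · 6) = 1/25.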
(De Moivre-Laplace Limit Theorem) If X ∼ Bin(n, p), then the distribution of (X − np)/√(npq) approaches the standard normal distribution as n → ∞, where q = 1 − p.
That is,
$$\text{Bin}(n, p) \approx N(np, npq)$$
Equivalently,
$$\frac{X - np}{\sqrt{npq}} \approx Z$$
where Z ∼ N (0, 1). Remark: The normal approximation will generally be quite good for values of n satisfying np(1 − p) ≥ 10.
The approximation is further improved if we incorporate a continuity correction.
If X ∼ Bin(n, p), then
$$P(X = k) = P\left(k - \tfrac{1}{2} \le X \le k + \tfrac{1}{2}\right)$$
$$P(X \ge k) = P\left(X \ge k - \tfrac{1}{2}\right)$$
$$P(X \le k) = P\left(X \le k + \tfrac{1}{2}\right)$$
and the probabilities on the right are then approximated using the normal distribution.
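For example, if X ∼ Bin(100, 0.5), then np = 50 and √(npq) = 5, so with the continuity correction
$$P(X \le 55) = P(X \le 55.5) \approx P\left(Z \le \frac{55.5 - 50}{5}\right) = P(Z \le 1.1) \approx 0.864$$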
6 Jointly Distributed Random Variables
6.1 Joint Distribution Functions
Definition 6.1. For any two random variables X and Y defined on the same sample space, we define the joint distribution function of X and Y by
$$F_{X,Y}(x, y) = P(X \le x, Y \le y)$$
The distribution function of X can be obtained from the joint distribution function of X and Y in the following way:
$$F_X(x) = \lim_{y \to \infty} F_{X,Y}(x, y)$$
We call FX the marginal distribution function of X.
Similarly,
$$F_Y(y) = \lim_{x \to \infty} F_{X,Y}(x, y)$$
and FY is called the marginal distribution function of Y .
P (a1 < X ≤ a2 , b1 < Y ≤ b2 ) = FX,Y (a2 , b2 ) − FX,Y (a1 , b2 ) + FX,Y (a1 , b1 ) − FX,Y (a2 , b1 )
For discrete X and Y, the joint probability mass function is defined by
$$p_{X,Y}(x, y) = P(X = x, Y = y)$$
We can recover the probability mass function of X and Y in the following manner:
$$p_X(x) = P(X = x) = \sum_{y \in \mathbb{R}} p_{X,Y}(x, y)$$
$$p_Y(y) = P(Y = y) = \sum_{x \in \mathbb{R}} p_{X,Y}(x, y)$$
We call pX the marginal probability mass function of X and pY the marginal prob-
ability mass function of Y .
1. $$P(a_1 < X \le a_2, b_1 < Y \le b_2) = \sum_{a_1 < x \le a_2} \sum_{b_1 < y \le b_2} p_{X,Y}(x, y)$$
2. $$F_{X,Y}(a, b) = P(X \le a, Y \le b) = \sum_{x \le a} \sum_{y \le b} p_{X,Y}(x, y)$$
3. $$P(X > a, Y > b) = \sum_{x > a} \sum_{y > b} p_{X,Y}(x, y)$$
3. Let a, b ∈ R; we have
$$F_{X,Y}(a, b) = P(X \le a, Y \le b) = \int_{-\infty}^{a} \int_{-\infty}^{b} f_{X,Y}(x, y)\,dy\,dx$$
As a result of this,
$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y)$$
Definition 6.2. The marginal probability density function of X is given by
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$
6.2 Independent Random Variables
Two random variables X and Y are said to be independent if, for all x, y ∈ R,
$$F_{X,Y}(x, y) = F_X(x)\,F_Y(y)$$
Theorem 6.5. Random variables X and Y with joint density fX,Y are independent if and only if there exist functions g, h : R → R such that for all x, y ∈ R, we have
$$f_{X,Y}(x, y) = g(x)\,h(y)$$
And
$$f_{X+Y}(x) = \int_{-\infty}^{\infty} f_X(x - t)\,f_Y(t)\,dt$$
If X ∼ Γ(α, λ) and Y ∼ Γ(β, λ) are independent, then
$$X + Y \sim \Gamma(\alpha + \beta, \lambda)$$
6.4 X and Y are discrete and independent
Theorem 6.9 (Sum of 2 Independent Poisson Random Variables).
If X ∼ Poisson(λ) and Y ∼ Poisson(µ) are two independent random variables, then X + Y ∼ Poisson(λ + µ).
For discrete X and Y, the conditional probability mass function and conditional distribution function of X given Y = y are defined by
$$p_{X|Y}(x \mid y) := P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$$
$$F_{X|Y}(x \mid y) := \frac{P(X \le x, Y = y)}{P(Y = y)} = \sum_{a \le x} p_{X|Y}(a \mid y)$$
For jointly continuous X and Y, the conditional probability density function of X given Y = y is defined by
$$f_{X|Y}(x \mid y) := \frac{f_{X,Y}(x, y)}{f_Y(y)}$$
for all y such that fY (y) > 0. We define conditional probabilities of events associated with one random variable when we are given the value of a second random variable. That is, for A ⊂ R and y such that fY (y) > 0,
$$P(X \in A \mid Y = y) = \int_A f_{X|Y}(x \mid y)\,dx$$
In particular, the conditional distribution function of X given that Y = y is defined by
$$F_{X|Y}(x \mid y) = P(X \le x \mid Y = y) = \int_{-\infty}^{x} f_{X|Y}(t \mid y)\,dt$$
The joint distribution function of X, Y and Z is defined by
$$F_{X,Y,Z}(x, y, z) := P(X \le x, Y \le y, Z \le z)$$
6.8.1 Joint probability density function of X, Y and Z: fX,Y,Z (x, y, z)
For any D ⊂ R³, we have
$$P((X, Y, Z) \in D) = \iiint_{(x,y,z) \in D} f_{X,Y,Z}(x, y, z)\,dx\,dy\,dz$$
7 Properties of Expectation
Theorem 7.1. If a ≤ X ≤ b, then a ≤ E(X) ≤ b.
1. If X and Y are jointly discrete with joint probability mass function pX,Y , then
$$E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y)\,p_{X,Y}(x, y)$$
2. If X and Y are jointly continuous with joint probability density function fX,Y , then
$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy$$
4. Monotone Property
If jointly distributed random variables X and Y satisfy X ≤ Y , then
E(X) ≤ E(Y )
Theorem 7.4 (Alternative formulae for covariance).
1. Var(X) = cov(X, X)
2. cov(X, Y ) = cov(Y, X)
3. $$\text{cov}\left(\sum_{i=1}^{n} a_i X_i,\ \sum_{j=1}^{m} b_j Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j\, \text{cov}(X_i, Y_j)$$
Theorem 7.8.
$$\text{Var}\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} \text{Var}(X_k) + 2 \sum_{1 \le i < j \le n} \text{cov}(X_i, X_j)$$
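For example, if X1 , . . . , Xn are independent, all the covariance terms vanish and $\text{Var}(\sum_{k=1}^{n} X_k) = \sum_{k=1}^{n} \text{Var}(X_k)$; writing a Bin(n, p) random variable as a sum of n independent Bernoulli(p) indicators recovers Var(X) = np(1 − p).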
−1 ≤ ρ(X, Y ) ≤ 1
1. The correlation coefficient is a measure of the degree of linearity between X and Y.
If ρ(X, Y ) = 0, then X and Y are said to be uncorrelated.
2. ρ(X, Y ) = 1 if and only if Y = aX + b where a = σY /σX > 0.
3. ρ(X, Y ) = −1 if and only if Y = aX + b where a = −σY /σX < 0.
4. ρ(X, Y ) is dimensionless.
5. If X and Y are independent, then ρ(X, Y ) = 0.
7.3 Conditional expectation
Definition 7.3. The conditional expectation of X given Y = y is defined as
$$E[X \mid Y = y] = \begin{cases} \sum_{x} x\, p_{X|Y}(x \mid y) & \text{if } X, Y \text{ are discrete} \\ \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\,dx & \text{if } X, Y \text{ are continuous} \end{cases}$$
For any event A with indicator IA , we have
$$P(A) = E(I_A) = E[E(I_A \mid Y)] = \begin{cases} \sum_{y} E(I_A \mid Y = y)\, P(Y = y) & \text{if } Y \text{ is discrete} \\ \int_{-\infty}^{\infty} E(I_A \mid Y = y)\, f_Y(y)\,dy & \text{if } Y \text{ is continuous} \end{cases}$$
$$= \begin{cases} \sum_{y} P(A \mid Y = y)\, P(Y = y) & \text{if } Y \text{ is discrete} \\ \int_{-\infty}^{\infty} P(A \mid Y = y)\, f_Y(y)\,dy & \text{if } Y \text{ is continuous} \end{cases}$$
7.4 Conditional Variance
Definition 7.4. The conditional variance of X given that Y = y is defined as
$$\text{Var}(X \mid Y = y) = E\left[(X - E[X \mid Y = y])^2 \,\middle|\, Y = y\right]$$
Theorem 7.13.
Var(X) = E[Var(X | Y )] + Var(E[X | Y ])
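For example, suppose Y ∼ Bernoulli(1/2) and, conditionally, X | Y = 0 ∼ N (0, 1) and X | Y = 1 ∼ N (µ, 1). Then E[Var(X | Y )] = 1 and Var(E[X | Y ]) = µ²/4, so Var(X) = 1 + µ²/4.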
7.5 Moment Generating Functions
The moment generating function of X is defined as $M_X(t) := E[e^{tX}]$.
1. $M_X^{(n)}(0) = E[X^n]$.
2. Multiplicative Property: If X and Y are independent, then
$$M_{X+Y}(t) = M_X(t)\,M_Y(t)$$
3. Uniqueness Property: Let X and Y be random variables with moment generating functions MX and MY respectively. Suppose that there exists an h > 0 such that MX (t) = MY (t) for all t ∈ (−h, h). Then X and Y have the same distribution.
7.6 Joint Moment Generating Functions
Definition 7.6. For any n random variables X1 , . . . , Xn , the joint moment generating function, M (t1 , . . . , tn ), is defined for all real values t1 , . . . , tn by
$$M(t_1, \ldots, t_n) = E\left[e^{t_1 X_1 + \cdots + t_n X_n}\right]$$
The individual moment generating functions can be obtained from M (t1 , . . . , tn ) by letting all but one of the tj be 0. That is,
$$M_{X_j}(t) = M(0, \ldots, 0, t, 0, \ldots, 0)$$
where t appears in the j-th position.
8 Limit Theorems
8.1 Chebyshev’s Inequality and the Weak Law of Large Numbers
Theorem 8.1 (Markov’s Inequality).
Let X be a nonnegative random variable. For a > 0, we have
$$P(X \ge a) \le \frac{E(X)}{a}$$
Theorem 8.2 (Chebyshev’s Inequality).
Let X be a random variable with finite mean µ and variance σ²; then for a > 0, we have
$$P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}$$
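For instance, taking a = kσ gives $P(|X - \mu| \ge k\sigma) \le 1/k^2$: no more than a quarter of the probability mass lies two or more standard deviations from the mean, whatever the distribution of X.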
Theorem 8.3 (Consequences of Chebyshev’s Inequality).
If Var(X) = 0, then the random variable X is a constant. Or in other words,
P (X = E(X)) = 1
Theorem 8.4 (The Weak Law of Large Numbers).
Let X1 , X2 , . . . be a sequence of independent and identically distributed random variables,
with common mean µ. Then, for any ε > 0,
$$P\left(\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right) \to 0 \quad \text{as } n \to \infty$$
9 Problems
1 (AY1314Sem1) Let X1 and X2 have a bivariate normal distribution with parameters µ1 = µ2 = 0, σ1 = σ2 = 1 and ρ = 1/2. Find the probability that all of the roots of the following equation are real:
$$X_1 x^2 + 2X_2 x + X_1 = 0$$
2 (AY1617Sem1) Let (X1 , X2 ) have a bivariate normal distribution with means 0, variances $\mu_1^2$ and $\mu_2^2$, respectively, and with correlation coefficient −1 < ρ < 1.
(a) Determine the distribution of aX1 + bX2 , where a and b are two real numbers such that a² + b² > 0.
(b) Find a constant b such that X1 + bX2 is independent of X1 . Justify your answer.
(c) Find the probability that the following equation has real roots:
$$X_1 x^2 - 2X_1 x - bX_2 = 0$$