Formula Collection STK 1100-1110 (English), Nov 2015
1. Probability
Let A, B, A1 , A2 , . . . , B1 , B2 , . . . be events, that is, subsets of a sample space Ω.
a) Axioms:
A probability function P is a function from subsets of the sample space Ω to real
numbers, satisfying
P(Ω) = 1
P(A) ≥ 0
P(A_1 ∪ A_2) = P(A_1) + P(A_2) if A_1 ∩ A_2 = ∅
P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i) if A_i ∩ A_j = ∅ for i ≠ j
b) P(A′) = 1 − P(A)
c) P(∅) = 0
d) A ⊂ B ⇒ P(A) ≤ P(B)
e) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
f) Conditional probability:
P(A|B) = P(A ∩ B) / P(B) if P(B) > 0
g) Total probability:
P(A) = ∑_{i=1}^n P(A|B_i) P(B_i) if ⋃_{i=1}^n B_i = Ω and B_i ∩ B_j = ∅ for i ≠ j
h) Bayes’ Rule:
P(B_j|A) = P(A|B_j) P(B_j) / ∑_{i=1}^n P(A|B_i) P(B_i) under the same conditions as in g)
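As a numerical illustration of g) and h), here is a minimal Python sketch (the partition probabilities P(B_i) and the conditionals P(A|B_i) are made-up values for illustration):

    # Hypothetical partition B1, B2, B3 of the sample space, made-up values
    P_B = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3); sums to 1
    P_A_given_B = [0.1, 0.4, 0.8]  # P(A|B1), P(A|B2), P(A|B3)

    # g) Total probability: P(A) = sum of P(A|Bi) P(Bi)
    P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
    print(P_A)  # 0.33

    # h) Bayes' rule: P(Bj|A) = P(A|Bj) P(Bj) / P(A)
    P_B_given_A = [pa * pb / P_A for pa, pb in zip(P_A_given_B, P_B)]
    print(P_B_given_A, sum(P_B_given_A))  # posterior probabilities; sum is 1.0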
i) Multiplication rule:
P(A_1 ∩ · · · ∩ A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 ∩ A_2) · · · P(A_n|A_1 ∩ · · · ∩ A_{n−1})
j) A_1, . . . , A_n are (statistically) independent events if
P(A_{i1} ∩ · · · ∩ A_{ik}) = P(A_{i1}) · · · P(A_{ik}) for every subset {i_1, . . . , i_k} of {1, . . . , n}
2. Combinatorics
a) Two operations that can be performed in n and m different ways, respectively, can be
combined in n · m ways.
b) The number of ordered subsets of r elements drawn with replacement from a set
of n elements is n^r
f) The number of ways a set of n elements can be divided into r subsets with n_i elements
in the i-th subset is
(n choose n_1, n_2, . . . , n_r) = n! / (n_1! n_2! · · · n_r!)
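For example, a set of 10 elements can be divided into subsets of sizes 5, 3 and 2 in 10!/(5! 3! 2!) = 2520 ways. A minimal Python sketch using only the standard library:

    from math import factorial

    # Multinomial coefficient n! / (n1! n2! ... nr!), where n = n1 + ... + nr
    def multinomial(*sizes):
        count = factorial(sum(sizes))
        for k in sizes:
            count //= factorial(k)   # exact integer division at every step
        return count

    print(multinomial(5, 3, 2))  # 2520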
3. Probability distributions
a) For a random variable X (discrete or continuous), F (x) = P (X ≤ x) is the
cumulative distribution function (cdf).
b) For a discrete random variable X, the point probabilities are
p(x_j) = P(X = x_j) and F(x) = ∑_{x_j ≤ x} p(x_j)
c) A continuous random variable X has a probability density f(x) satisfying
f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1
e) For discrete random variables X and Y which can take the values x_1, x_2, . . . and
y_1, y_2, . . . respectively, we have
p(x_i, y_j) = P(X = x_i, Y = y_j)
F(x, y) = ∑_{x_i ≤ x} ∑_{y_j ≤ y} p(x_i, y_j)
f) For continuous random variables X and Y we have
P((X, Y) ∈ A) = ∫∫_A f(u, v) dv du
F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) dv du
f(x, y) = ∂²F(x, y) / ∂x ∂y
Marginal densities:
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy (for X)
f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx (for Y)
i) Independence:
The random variables X and Y are independent if
p(x_i, y_j) = p_X(x_i) p_Y(y_j) for all i, j (discrete)
f(x, y) = f_X(x) f_Y(y) for all x, y (continuous)
j) Conditional point probabilities:
p_{X|Y}(x_i|y_j) = p(x_i, y_j) / p_Y(y_j) (for X given Y = y_j)
p_{Y|X}(y_j|x_i) = p(x_i, y_j) / p_X(x_i) (for Y given X = x_i)
assuming p_Y(y_j) > 0 and p_X(x_i) > 0, respectively. Conditional point probabilities
can be treated as regular point probabilities.
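A small Python sketch of i) and j) in the discrete case (the joint table below is made up, and chosen as a product of marginals so that X and Y are independent):

    # Made-up joint point probabilities p(xi, yj) for X, Y in {0, 1}
    p = {(0, 0): 0.24, (0, 1): 0.36, (1, 0): 0.16, (1, 1): 0.24}

    # Marginals: pX(xi) = sum over j, pY(yj) = sum over i
    pX = {x: sum(v for (xi, _), v in p.items() if xi == x) for x in (0, 1)}
    pY = {y: sum(v for (_, yj), v in p.items() if yj == y) for y in (0, 1)}

    # i) Independence: p(xi, yj) = pX(xi) pY(yj) for all pairs
    print(all(abs(p[xy] - pX[xy[0]] * pY[xy[1]]) < 1e-12 for xy in p))  # True

    # j) Conditional point probability pX|Y(xi | yj) = p(xi, yj) / pY(yj)
    print(p[(1, 0)] / pY[0])  # 0.4, equal to pX(1) since X and Y are independent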
k) Conditional probability densities:
f_{X|Y}(x|y) = f(x, y) / f_Y(y) (for X given Y = y)
f_{Y|X}(y|x) = f(x, y) / f_X(x) (for Y given X = x)
assuming f_Y(y) > 0 and f_X(x) > 0, respectively. Conditional probability densities
can be treated as regular probability densities.
4. Expectation
a) The expected value of a random variable X is defined as
E(X) = ∑_j x_j p(x_j) (discrete)
E(X) = ∫_{−∞}^{∞} x f(x) dx (continuous)
d) For a real function g(X, Y ) of two random variables X and Y , the expected value
is
E[g(X, Y)] = ∑_i ∑_j g(x_i, y_j) p(x_i, y_j) (discrete)
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dy dx (continuous)
e) If X and Y are independent, E[g(X)h(Y)] = E[g(X)] · E[h(Y)]
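A quick exact check of e) for two small, independent discrete variables (a Python sketch; the marginals are made up, and the joint pmf is their product by independence):

    import itertools

    pX = {1: 0.3, 2: 0.7}
    pY = {0: 0.5, 3: 0.5}
    g = lambda x: x * x   # any real-valued g and h will do
    h = lambda y: y + 1

    lhs = sum(g(x) * h(y) * pX[x] * pY[y] for x, y in itertools.product(pX, pY))
    rhs = (sum(g(x) * px for x, px in pX.items())
           * sum(h(y) * py for y, py in pY.items()))
    print(lhs, rhs)  # both 7.75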
h) Conditional expectation:
E(Y|X = x_i) = ∑_j y_j p_{Y|X}(y_j|x_i) (discrete)
E(Y|X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy (continuous)
5. Variance
b) V(X) = E(X²) − E(X)²
e) V(a + ∑_{i=1}^n b_i X_i) = ∑_{i=1}^n b_i² V(X_i) + ∑_{i=1}^n ∑_{j≠i} b_i b_j Cov(X_i, X_j)
f) Chebyshev’s inequality:
Let X be a random variable with µ = E(X) and σ² = V(X). For all t > 0 we have
P(|X − µ| > t) ≤ σ²/t²
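A Monte Carlo illustration (a sketch assuming numpy; the exponential distribution is an arbitrary choice, any distribution with finite variance works):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=100_000)  # mu = 1, sigma^2 = 1

    for t in (1.0, 2.0, 3.0):
        empirical = np.mean(np.abs(x - 1.0) > t)
        print(t, empirical, 1.0 / t**2)  # empirical frequency <= sigma^2 / t^2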
6. Covariance
b) Cov(X, X) = V(X)
d) X, Y independent ⇒ Cov(X, Y ) = 0
e) Cov(a + ∑_{i=1}^n b_i X_i, c + ∑_{j=1}^m d_j Y_j) = ∑_{i=1}^n ∑_{j=1}^m b_i d_j Cov(X_i, Y_j)
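A simulation check of e) (a sketch assuming numpy; the variables below are made-up correlated normals):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    X1 = rng.normal(size=n)
    X2 = X1 + rng.normal(size=n)           # correlated with X1
    Y1 = 2 * X1 - X2 + rng.normal(size=n)  # correlated with both

    a, b1, b2, c, d1 = 5.0, 2.0, -1.0, 3.0, 4.0
    left = np.cov(a + b1 * X1 + b2 * X2, c + d1 * Y1)[0, 1]
    right = b1 * d1 * np.cov(X1, Y1)[0, 1] + b2 * d1 * np.cov(X2, Y1)[0, 1]
    print(left, right)  # agree up to Monte Carlo error; constants a, c drop out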
7. Moment generating functions
a) The moment generating function of X is M_X(t) = E(e^{tX})
b) If the moment generating function M_X(t) exists for t in an open interval containing 0,
then it uniquely determines the distribution of X.
c) If the moment generating function M_X(t) exists for t in an open interval containing 0,
then all moments of X exist, and we can find the rth moment by E(X^r) = M_X^{(r)}(0)
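For instance (a symbolic sketch assuming sympy is available), differentiating the exponential MGF λ/(λ − t) from section 9 b) reproduces E(X^r) = r!/λ^r:

    import sympy as sp

    t, lam = sp.symbols('t lam', positive=True)
    M = lam / (lam - t)  # MGF of the exponential distribution

    for r in (1, 2, 3):
        moment = sp.diff(M, t, r).subs(t, 0)  # E(X^r) = r-th derivative at 0
        print(r, sp.simplify(moment))         # 1/lam, 2/lam**2, 6/lam**3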
8. Discrete distributions
a) Binomial distribution:
Point probability: P(X = k) = (n choose k) p^k (1 − p)^{n−k} k = 0, 1, . . . , n
Moment generating function: M_X(t) = (1 − p + pe^t)^n
Expectation: E(X) = np
Variance: V(X) = np(1 − p)
Approximation 1: Z = (X − np)/√(np(1 − p)) is approximately normally distributed
when np and n(1 − p) both are sufficiently big (at least 10)
Sum rule: X ∼ binomial (n, p), Y ∼ binomial (m, p)
and X, Y independent ⇒ X + Y ∼ binomial (n + m, p)
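The sum rule can be checked numerically (a sketch assuming numpy and scipy, with made-up n, m, p): the pmf of X + Y is the convolution of the two pmfs and should match binomial(n + m, p).

    import numpy as np
    from scipy.stats import binom

    n, m, p = 6, 9, 0.3
    pmf_X = binom.pmf(np.arange(n + 1), n, p)
    pmf_Y = binom.pmf(np.arange(m + 1), m, p)

    pmf_sum = np.convolve(pmf_X, pmf_Y)  # pmf of X + Y on 0, ..., n + m
    print(np.allclose(pmf_sum, binom.pmf(np.arange(n + m + 1), n + m, p)))  # True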
b) Geometric distribution:
Point probability: P(X = k) = (1 − p)^{k−1} p k = 1, 2, . . .
Moment generating function: M_X(t) = pe^t/[1 − (1 − p)e^t]
Expectation: E(X) = 1/p
Variance: V(X) = (1 − p)/p²
d) Hypergeometric distribution:
Point probability: P(X = k) = (M choose k)(N − M choose n − k) / (N choose n)
Expectation: E(X) = n · M/N
Variance: V(X) = n · (M/N)(1 − M/N) · (N − n)/(N − 1)
e) Poisson distribution:
Point probability: P(X = k) = (λ^k / k!) e^{−λ} k = 0, 1, . . .
Moment generating function: M_X(t) = e^{λ(e^t − 1)}
Expectation: E(X) = λ
Variance: V(X) = λ
Approximation: Z = (X − λ)/√λ is approximately normally distributed
when λ is sufficiently big (at least 10)
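To see how good the approximation is (a sketch assuming scipy; λ = 25 is a made-up value), compare the Poisson cdf with the corresponding normal cdf at a few points:

    from scipy.stats import norm, poisson

    lam = 25
    for k in (15, 20, 25, 30, 35):
        exact = poisson.cdf(k, lam)
        approx = norm.cdf((k - lam) / lam**0.5)  # P(Z <= (k - lam)/sqrt(lam))
        print(k, round(exact, 4), round(approx, 4))  # close, but not identical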
f) Multinomial distribution:
Point probability: P(N_1 = n_1, . . . , N_r = n_r) = [n! / (n_1! · · · n_r!)] p_1^{n_1} · · · p_r^{n_r}
Here ∑_{i=1}^r p_i = 1 and ∑_{i=1}^r n_i = n
9. Continuous distributions
b) Exponential distribution:
Density: f(x) = λe^{−λx} x > 0
Moment generating function: M_X(t) = λ/(λ − t) for t < λ
Expectation: E(X) = 1/λ
Variance: V(X) = 1/λ2
Sum rule: X ∼ exp(λ), Y ∼ exp(λ), X and Y independent
⇒ X + Y ∼ gamma(2, 1/λ)
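A simulation check of this sum rule (a sketch assuming numpy and scipy; λ = 2 is a made-up value):

    import numpy as np
    from scipy.stats import gamma, kstest

    rng = np.random.default_rng(2)
    lam = 2.0
    s = rng.exponential(1 / lam, 100_000) + rng.exponential(1 / lam, 100_000)

    print(s.mean())  # ~ alpha * beta = 2 * (1/lam) = 1.0
    print(kstest(s, gamma(a=2, scale=1 / lam).cdf).pvalue)  # should not be tiny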
c) Gamma distribution:
Density: f(x) = [1/(β^α Γ(α))] x^{α−1} e^{−x/β} x > 0
Gamma function: Γ(α) = ∫_0^∞ u^{α−1} e^{−u} du
Γ(α + 1) = αΓ(α)
Γ(n) = (n − 1)! when n is an integer
Γ(1/2) = √π, Γ(1) = 1
Moment generating function: M_X(t) = [1/(1 − βt)]^α
Expectation: E(X) = αβ
Variance: V(X) = αβ²
Sum rule: X ∼ gamma(α, β), Y ∼ gamma(δ, β),
X and Y independent ⇒ X + Y ∼ gamma(α + δ, β)
d) Chi-squared distribution:
Density: f(v) = [1/(2^{n/2} Γ(n/2))] v^{(n/2)−1} e^{−v/2} v > 0
n degrees of freedom
Expectation: E(V ) = n
Variance: V(V ) = 2n
Sum rule: U ∼ χ²_n, V ∼ χ²_m, U and V independent
⇒ U + V ∼ χ²_{n+m}
Result: Z ∼ N(0, 1) ⇒ Z² ∼ χ²_1
e) Student’s t-distribution:
Density: f(t) = {Γ[(n + 1)/2] / [√(nπ) Γ(n/2)]} (1 + t²/n)^{−(n+1)/2} −∞ < t < ∞
n degrees of freedom
Expectation: E(T ) = 0 (n ≥ 2)
Variance: V(T ) = n/(n − 2) (n ≥ 3)
Result: Z ∼ N(0, 1), U ∼ χ²_n, Z, U independent ⇒ Z/√(U/n) ∼ t_n
f) Binormal distribution:
Density:
f(x, y) = [1/(2πσ_X σ_Y √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ (x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y² − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) ] }
Marginal distributions: X ∼ N(µ_X, σ_X²), Y ∼ N(µ_Y, σ_Y²)
Correlation: Corr(X, Y ) = ρ
Conditional distribution: Given X = x, Y is normally distributed with
expectation E(Y|X = x) = µ_Y + ρ(σ_Y/σ_X)(x − µ_X)
and variance V(Y|X = x) = σ_Y²(1 − ρ²)
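A simulation check of the conditional-distribution formulas (a numpy sketch; all parameter values are made up):

    import numpy as np

    rng = np.random.default_rng(3)
    muX, muY, sX, sY, rho = 1.0, 2.0, 1.0, 3.0, 0.6
    cov = [[sX**2, rho * sX * sY], [rho * sX * sY, sY**2]]
    x, y = rng.multivariate_normal([muX, muY], cov, size=500_000).T

    x0 = 1.5
    sel = y[np.abs(x - x0) < 0.02]           # condition on X close to x0
    print(sel.mean(), muY + rho * (sY / sX) * (x0 - muX))  # both ~ 2.9
    print(sel.var(), sY**2 * (1 - rho**2))                 # both ~ 5.76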
10. One normally distributed sample
If X_1, X_2, . . . , X_n are independent and N(µ, σ²) distributed, then:
a) X̄ = (1/n) ∑_{i=1}^n X_i and S² = [1/(n − 1)] ∑_{i=1}^n (X_i − X̄)² are independent
b) X̄ ∼ N(µ, σ²/n)
c) (n − 1)S²/σ² ∼ χ²_{n−1}
d) (X̄ − µ)/(S/√n) ∼ t_{n−1}
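A simulation sketch of b)-d) (numpy assumed; µ = 5, σ = 2, n = 8 are made-up values):

    import numpy as np

    rng = np.random.default_rng(4)
    mu, sigma, n, reps = 5.0, 2.0, 8, 10_000
    X = rng.normal(mu, sigma, size=(reps, n))

    xbar = X.mean(axis=1)
    S2 = X.var(axis=1, ddof=1)
    print(xbar.var(), sigma**2 / n)          # b): both ~ 0.5
    print(((n - 1) * S2 / sigma**2).mean())  # c): chi^2_{n-1} has mean n - 1 = 7
    T = (xbar - mu) / np.sqrt(S2 / n)
    print(T.var(), (n - 1) / (n - 3))        # d): t_{n-1} has variance 7/5 = 1.4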
11. Two normally distributed samples
If X_1, . . . , X_n are independent and N(µ_X, σ²) distributed, Y_1, . . . , Y_m are independent
and N(µ_Y, σ²) distributed, the two samples are independent of each other, and
S_p² = [(n − 1)S_X² + (m − 1)S_Y²]/(n + m − 2) is the pooled variance estimator, then:
b) X̄ − Ȳ ∼ N(µ_X − µ_Y, σ²(1/n + 1/m))
c) (n + m − 2)S_p²/σ² ∼ χ²_{n+m−2}
d) (X̄ − Ȳ − (µ_X − µ_Y)) / (S_p √(1/n + 1/m)) ∼ t_{n+m−2}
12. Simple linear regression
Assume Y_i = β_0 + β_1 x_i + ε_i ; i = 1, 2, . . . , n ; where the x_i -s are given numbers
and the ε_i -s are independent and N(0, σ²) distributed. Then:
b) Var(β̂_0) = σ² ∑_{i=1}^n x_i² / [n ∑_{i=1}^n (x_i − x̄)²] and Var(β̂_1) = σ² / ∑_{i=1}^n (x_i − x̄)²
c) Let SSE = ∑_{i=1}^n (Y_i − β̂_0 − β̂_1 x_i)². Then S² = SSE/(n − 2) is an unbiased estimator
for σ², and (n − 2)S²/σ² ∼ χ²_{n−2}
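A simulation check of the variance formulas in b) (numpy assumed; the design points and parameter values are made up):

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.arange(1.0, 11.0)  # fixed design points x_1, ..., x_10
    beta0, beta1, sigma, reps = 2.0, 0.5, 1.0, 20_000

    Y = beta0 + beta1 * x + rng.normal(0, sigma, size=(reps, x.size))
    Sxx = np.sum((x - x.mean())**2)
    b1 = (Y - Y.mean(axis=1, keepdims=True)) @ (x - x.mean()) / Sxx
    b0 = Y.mean(axis=1) - b1 * x.mean()

    print(b1.var(), sigma**2 / Sxx)                            # Var(beta1_hat)
    print(b0.var(), sigma**2 * np.sum(x**2) / (x.size * Sxx))  # Var(beta0_hat)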
13. Multiple linear regression
Assume Y_i = β_0 + β_1 x_{i1} + · · · + β_k x_{ik} + ε_i ; i = 1, 2, . . . , n ; where the x_{ij} -s are given numbers
and the ε_i -s are independent and N(0, σ²) distributed. The model can be written in matrix
form as Y = Xβ + ε, where Y = (Y_1, . . . , Y_n)^T and ε = (ε_1, . . . , ε_n)^T are n-dimensional
vectors, β = (β_0, . . . , β_k)^T is a (k + 1)-dimensional vector, and X = {x_{ij}} (with x_{i0} = 1)
is an n × (k + 1)-dimensional matrix. Then:
2. Let β̂ = (β̂_0, . . . , β̂_k)^T be the least squares estimator. Then the β̂_j -s are normally
distributed and unbiased, and Var(β̂_j) = c_{jj} σ², where c_{jj} is the jth diagonal
element of (X^T X)^{−1}.
4. Let S²_{β̂_j} be the variance estimator for β̂_j we get by replacing σ² with S² in the
expression for Var(β̂_j) above. Then (β̂_j − β_j)/S_{β̂_j} ∼ t_{n−(k+1)}.
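A minimal numpy sketch of these results on made-up data (here S² = SSE/(n − (k + 1)) is used as the unbiased estimator of σ², in line with the simple-regression case in section 12):

    import numpy as np

    rng = np.random.default_rng(6)
    n, k = 50, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # x_{i0} = 1
    beta = np.array([1.0, 2.0, -0.5])
    Y = X @ beta + rng.normal(0, 1.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # least squares estimate
    SSE = np.sum((Y - X @ beta_hat)**2)
    S2 = SSE / (n - (k + 1))                      # estimator of sigma^2

    c = np.diag(np.linalg.inv(X.T @ X))           # c_jj
    print(beta_hat, np.sqrt(S2 * c))              # estimates and their std. errors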