Söderlind - Lecture Notes - Econometrics - Some Statistics
Paul Söderlind
10 February 2011
University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen,
Switzerland. E-mail: Paul.Soderlind@unisg.ch. Document name: EcmXSta.TeX.
Contents
21 Some Statistics
21.1 Distributions and Moment Generating Functions
21.2 Joint and Conditional Distributions and Moments
21.2.1 Joint and Conditional Distributions
21.2.2 Moments of Joint Distributions
21.2.3 Conditional Moments
21.2.4 Regression Function and Linear Projection
21.3 Convergence in Probability, Mean Square, and Distribution
21.4 Laws of Large Numbers and Central Limit Theorems
21.5 Stationarity
21.6 Martingales
21.7 Special Distributions
21.7.1 The Normal Distribution
21.7.2 The Lognormal Distribution
21.7.3 The Chi-Square Distribution
21.7.4 The t and F Distributions
21.7.5 The Bernoulli and Binomial Distributions
21.7.6 The Skew-Normal Distribution
21.7.7 Generalized Pareto Distribution
21.8 Inference
Chapter 21
Some Statistics
This section summarizes some useful facts about statistics. Heuristic proofs are given in
a few cases.
Some references: Mittelhammer (1996), DeGroot (1986), Greene (2000), Davidson
(2000), Johnson, Kotz, and Balakrishnan (1994).
21.1 Distributions and Moment Generating Functions
The affine function $a + bX$ ($a$ and $b$ are constants) has the moment generating function $\mathrm{mgf}_{g(X)}(t) = \mathrm{E}\,e^{t(a+bX)} = e^{ta}\,\mathrm{E}\,e^{tbX} = e^{ta}\,\mathrm{mgf}_X(bt)$. By setting $b = 1$ and $a = -\mathrm{E}\,X$ we obtain a mgf for central moments (variance, skewness, kurtosis, etc.), $\mathrm{mgf}_{(X - \mathrm{E}X)}(t) = e^{-t\,\mathrm{E}X}\,\mathrm{mgf}_X(t)$.

Example 21.4 When $X \sim N(\mu, \sigma^2)$, then $\mathrm{mgf}_X(t) = \exp\left(\mu t + \sigma^2 t^2/2\right)$. Let $Z = (X - \mu)/\sigma$, so $a = -\mu/\sigma$ and $b = 1/\sigma$. This gives $\mathrm{mgf}_Z(t) = \exp(-\mu t/\sigma)\,\mathrm{mgf}_X(t/\sigma) = \exp\left(t^2/2\right)$. (Of course, this result can also be obtained by directly setting $\mu = 0$ and $\sigma = 1$ in $\mathrm{mgf}_X$.)
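As a quick numerical companion to Example 21.4 (an added illustration, not part of the original notes), the mgf of the standardized variable can be approximated by the sample average of $e^{tZ}$; the Monte Carlo estimate should be close to $\exp(t^2/2)$. A minimal sketch using numpy (the sample size, seed, $\mu$ and $\sigma$ are arbitrary choices):
\begin{verbatim}
import numpy as np

# Monte Carlo check that mgf_Z(t) = exp(t^2/2) for Z = (X - mu)/sigma, X ~ N(mu, sigma^2)
rng = np.random.default_rng(seed=0)
mu, sigma, n = 2.0, 1.5, 1_000_000
X = rng.normal(mu, sigma, size=n)
Z = (X - mu) / sigma

for t in (0.5, 1.0, 1.5):
    mc = np.mean(np.exp(t * Z))      # sample analogue of E exp(tZ)
    theory = np.exp(t**2 / 2)        # mgf of N(0,1)
    print(f"t={t}: MC {mc:.4f} vs exp(t^2/2) {theory:.4f}")
\end{verbatim}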
Fact 21.5 (Change of variable, univariate case, monotonic function) Suppose $X$ has the probability density function $f_X(c)$ and cumulative distribution function $F_X(c)$. Let $Y = g(X)$ be a continuously differentiable function with $dg/dX > 0$ (so $g(X)$ is increasing for all $c$ such that $f_X(c) > 0$). Then the cdf of $Y$ is
$$F_Y(c) = \Pr[Y \le c] = \Pr[g(X) \le c] = \Pr[X \le g^{-1}(c)] = F_X[g^{-1}(c)],$$
where $g^{-1}$ is the inverse function of $g$ such that $g^{-1}(Y) = X$. We also have that the pdf of $Y$ is
$$f_Y(c) = f_X[g^{-1}(c)]\,\frac{dg^{-1}(c)}{dc}.$$
If, instead, $dg/dX < 0$ (so $g(X)$ is decreasing), then we instead have the cdf of $Y$
$$F_Y(c) = \Pr[Y \le c] = \Pr[g(X) \le c] = \Pr[X \ge g^{-1}(c)] = 1 - F_X[g^{-1}(c)].$$
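A small numerical check of the change-of-variable formula (an added illustration, not from the original notes): with $X \sim N(0,1)$ and $Y = g(X) = e^X$ (increasing), the formula gives $f_Y(c) = \phi(\ln c)/c$, which matches the lognormal pdf from scipy. Parameter values are arbitrary.
\begin{verbatim}
import numpy as np
from scipy.stats import norm, lognorm

# Y = g(X) = exp(X) with X ~ N(0,1); g^{-1}(c) = ln(c), d g^{-1}(c)/dc = 1/c
c = np.array([0.5, 1.0, 2.0, 3.0])
f_Y_formula = norm.pdf(np.log(c)) / c       # f_X(g^{-1}(c)) * dg^{-1}(c)/dc
f_Y_scipy = lognorm.pdf(c, s=1.0)           # lognormal with mu=0, sigma=1
print(np.allclose(f_Y_formula, f_Y_scipy))  # True
\end{verbatim}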
21.2 Joint and Conditional Distributions and Moments

21.2.1 Joint and Conditional Distributions
Fact 21.10 (Joint and marginal cdf) Let $X$ and $Y$ be (possibly vectors of) random variables and let $x$ and $y$ be two numbers. The joint cumulative distribution function of $X$ and $Y$ is
$$H(x, y) = \Pr(X \le x, Y \le y) = \int_{-\infty}^{x}\int_{-\infty}^{y} h(q_x, q_y)\,dq_y\,dq_x,$$
where $h(x, y) = \partial^2 H(x, y)/\partial x\,\partial y$ is the joint probability density function.

Fact 21.11 (Joint and marginal pdf) The marginal cdf of $X$ is obtained by integrating out $Y$:
$$F(x) = \Pr(X \le x, Y \text{ anything}) = \int_{-\infty}^{x}\left[\int_{-\infty}^{\infty} h(q_x, q_y)\,dq_y\right] dq_x.$$
This shows that the marginal pdf of $x$ is $f(x) = dF(x)/dx = \int_{-\infty}^{\infty} h(q_x, q_y)\,dq_y$.

Fact 21.12 (Conditional distribution) The pdf of $Y$ conditional on $X = x$ (a number) is $g(y|x) = h(x, y)/f(x)$. This is clearly proportional to the joint pdf (at the given value $x$).
Fact 21.13 (Change of variable, multivariate case, monotonic function) The result in Fact 21.5 still holds if $X$ and $Y$ are both $n \times 1$ vectors, but the derivative is now the $n \times n$ matrix $\partial g^{-1}(c)/\partial c'$. If $g_i^{-1}$ is the $i$th function in the vector $g^{-1}$, then
$$\frac{\partial g^{-1}(c)}{\partial c'} =
\begin{bmatrix}
\dfrac{\partial g_1^{-1}(c)}{\partial c_1} & \cdots & \dfrac{\partial g_1^{-1}(c)}{\partial c_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial g_n^{-1}(c)}{\partial c_1} & \cdots & \dfrac{\partial g_n^{-1}(c)}{\partial c_n}
\end{bmatrix}.$$

21.2.2 Moments of Joint Distributions

Fact 21.14 (Cauchy–Schwarz inequality) $(\mathrm{E}\,XY)^2 \le \mathrm{E}(X^2)\,\mathrm{E}(Y^2)$. (To show this, note that $\mathrm{E}(Y - bX)^2 \ge 0$ for any $b$ and that the left hand side is minimized by $b = \mathrm{E}(XY)/\mathrm{E}(X^2)$; evaluating at this $b$ gives the inequality.)
Fact 21.15 ($-1 \le \mathrm{Corr}(X, Y) \le 1$) Let $Y$ and $X$ in Fact 21.14 be zero mean variables (or variables minus their means). We then get $\mathrm{Cov}(X, Y)^2 \le \mathrm{Var}(X)\,\mathrm{Var}(Y)$, that is, $-1 \le \mathrm{Cov}(X, Y)/[\mathrm{Std}(X)\,\mathrm{Std}(Y)] \le 1$.
21.2.3 Conditional Moments
Fact 21.17 (Conditional moments as random variables) Before we observe $X$, the conditional moments are random variables, since $X$ is. We denote these random variables by $\mathrm{E}(Y|X)$, $\mathrm{Var}(Y|X)$, etc.

Fact 21.18 (Law of iterated expectations) $\mathrm{E}\,Y = \mathrm{E}[\mathrm{E}(Y|X)]$. Note that $\mathrm{E}(Y|X)$ is a random variable since it is a function of the random variable $X$. It is not a function of $Y$, however. The outer expectation is therefore an expectation with respect to $X$ only.

Proof. $\mathrm{E}[\mathrm{E}(Y|X)] = \int\left[\int y\,g(y|x)\,dy\right] f(x)\,dx = \int\int y\,g(y|x)f(x)\,dy\,dx = \int\int y\,h(y, x)\,dy\,dx = \mathrm{E}\,Y.$
Fact 21.19 (Conditional vs. unconditional variance) $\mathrm{Var}(Y) = \mathrm{Var}[\mathrm{E}(Y|X)] + \mathrm{E}[\mathrm{Var}(Y|X)]$. (To prove it, start from $\mathrm{Var}(Y) = \int\left[\int (y - \mathrm{E}\,Y)^2 g(y|x)\,dy\right] f(x)\,dx$ and add and subtract $\mathrm{E}(Y|x)$ inside the square.)
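The law of iterated expectations and the variance decomposition (Facts 21.18–21.19) are easy to verify by simulation. The sketch below is an added illustration with an arbitrary design: $Y = X + \varepsilon$ with $X$ and $\varepsilon$ independent, so $\mathrm{E}(Y|X) = X$ and $\mathrm{Var}(Y|X) = \mathrm{Var}(\varepsilon)$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=1)
n = 1_000_000
X = rng.normal(1.0, 2.0, size=n)      # E X = 1, Var X = 4
eps = rng.normal(0.0, 0.5, size=n)    # Var eps = 0.25
Y = X + eps                           # E(Y|X) = X, Var(Y|X) = 0.25

# Law of iterated expectations: E Y = E[E(Y|X)] = E X
print(Y.mean(), X.mean())

# Var(Y) = Var[E(Y|X)] + E[Var(Y|X)] = Var(X) + 0.25
print(Y.var(), X.var() + 0.25)
\end{verbatim}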
21.2.4 Regression Function and Linear Projection
Fact (Properties of linear projections) The linear projection of $Y$ on $X$ is $P(Y|X) = X'\beta$ with $\beta = (\mathrm{E}\,XX')^{-1}\,\mathrm{E}\,XY$, so we can write $Y = P(Y|X) + \varepsilon$, where $\mathrm{E}(X\varepsilon) = 0_{k\times 1}$. (a) This implies that $\mathrm{E}[P(Y|X)\varepsilon] = 0$, so the forecast and forecast error are orthogonal. (b) The orthogonality conditions also imply that $\mathrm{E}\,XY = \mathrm{E}[X P(Y|X)]$. (c) When $X$ contains a constant, so $\mathrm{E}\,\varepsilon = 0$, then (a) and (b) carry over to covariances: $\mathrm{Cov}[P(Y|X), \varepsilon] = 0$ and $\mathrm{Cov}(X, Y) = \mathrm{Cov}[X, P(Y|X)]$.

Example 21.25 ($P(1|X)$) When $Y_t = 1$, then $\beta = (\mathrm{E}\,X_t X_t')^{-1}\,\mathrm{E}\,X_t$. For instance, suppose $X_t = [x_{1t}, x_{2t}]'$. Then
$$\beta = \begin{bmatrix} \mathrm{E}\,x_{1t}^2 & \mathrm{E}\,x_{1t}x_{2t} \\ \mathrm{E}\,x_{2t}x_{1t} & \mathrm{E}\,x_{2t}^2 \end{bmatrix}^{-1} \begin{bmatrix} \mathrm{E}\,x_{1t} \\ \mathrm{E}\,x_{2t} \end{bmatrix}.$$
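To illustrate the linear projection and its orthogonality conditions (an added sketch, with an arbitrary data generating process), one can estimate $\beta = (\mathrm{E}\,XX')^{-1}\,\mathrm{E}\,XY$ by sample moments and check that the projection error is uncorrelated with $X$ and with the forecast.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=2)
T = 200_000
x2 = rng.normal(size=T)
X = np.column_stack([np.ones(T), x2])             # X contains a constant
Y = 0.5 + 2.0 * x2 + rng.normal(size=T)           # arbitrary data generating process

beta = np.linalg.solve(X.T @ X / T, X.T @ Y / T)  # sample analogue of (E XX')^{-1} E XY
P = X @ beta                                      # P(Y|X), the linear projection
e = Y - P                                         # projection error

print(beta)                # close to [0.5, 2.0]
print(X.T @ e / T)         # E(X e) ~ 0 (orthogonality)
print(np.cov(P, e)[0, 1])  # Cov[P(Y|X), e] ~ 0
\end{verbatim}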
21.3 Convergence in Probability, Mean Square, and Distribution
Fact 21.30 (Convergence in probability) The sequence of random variables $\{X_T\}$ converges in probability to the random variable $X$ if (and only if) for all $\varepsilon > 0$
$$\lim_{T\to\infty} \Pr(|X_T - X| < \varepsilon) = 1.$$
We denote this $X_T \overset{p}{\to} X$.

Example 21.31 Suppose $X_T = 0$ with probability $(T-1)/T$ and $X_T = T$ with probability $1/T$. Note that $\lim_{T\to\infty}\Pr(|X_T - 0| = 0) = \lim_{T\to\infty}(T-1)/T = 1$, so $\lim_{T\to\infty}\Pr(|X_T - 0| < \varepsilon) = 1$ for any $\varepsilon > 0$. Note also that $\mathrm{E}\,X_T = 0 \times (T-1)/T + T \times 1/T = 1$, so $X_T$ is biased.
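A simulation of Example 21.31 (added here as an illustration; sample sizes and seed are arbitrary) shows that the realizations of $X_T$ are almost always zero for large $T$, even though $\mathrm{E}\,X_T = 1$ for every $T$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=3)

def draw_XT(T, n_draws):
    # X_T = 0 with prob (T-1)/T and X_T = T with prob 1/T
    u = rng.random(n_draws)
    return np.where(u < 1.0 / T, float(T), 0.0)

for T in (10, 100, 10_000):
    x = draw_XT(T, n_draws=1_000_000)
    print(T, (x == 0.0).mean(), x.mean())  # Pr(X_T = 0) -> 1, but E X_T stays ~ 1
\end{verbatim}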
Fact 21.32 (Convergence in mean square) The sequence of random variables $\{X_T\}$ converges in mean square to the random variable $X$ if (and only if)
$$\lim_{T\to\infty} \mathrm{E}(X_T - X)^2 = 0.$$
We denote this $X_T \overset{m}{\to} X$.
[Figure: simulated distribution of the sample average and of $\sqrt{T}\times$ the sample average, for $T = 5$, 25, and 100.]
When $X$ in Fact 21.32 is a constant, then $\mathrm{E}(X_T - X)^2 = \mathrm{Var}(X_T) + (\mathrm{E}\,X_T - X)^2$, so convergence in mean square means that both the variance and the squared bias go to zero as $T \to \infty$.
Fact (Continuous mapping theorem) Let the sequences of random matrices $\{X_T\}$ and $\{Y_T\}$, and the non-random matrix $\{a_T\}$, be such that $X_T \overset{d}{\to} X$, $Y_T \overset{p}{\to} Y$, and $a_T \to a$ (a traditional limit). Let $g(X_T, Y_T, a_T)$ be a continuous function. Then $g(X_T, Y_T, a_T) \overset{d}{\to} g(X, Y, a)$.
21.4 Laws of Large Numbers and Central Limit Theorems
Fact 21.41 (The Lindeberg–Lévy theorem) Let $X_t$ be independently and identically distributed (iid) with $\mathrm{E}\,X_t = 0$ and $\mathrm{Var}(X_t) < \infty$. Then
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T} X_t/\sigma \overset{d}{\to} N(0, 1).$$
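A quick Monte Carlo illustration of the Lindeberg–Lévy theorem (an added sketch, not part of the original notes): scaled sample averages of iid, strongly non-normal data are close to $N(0,1)$ already for moderate $T$. The centred exponential data and the sample sizes are arbitrary choices.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=4)
T, n_sim = 200, 50_000
sigma = 1.0                                 # Var of Exp(1) is 1

# X_t iid, centred exponential: E X_t = 0, Var X_t = 1
X = rng.exponential(1.0, size=(n_sim, T)) - 1.0
Z = X.sum(axis=1) / np.sqrt(T) / sigma      # (1/sqrt(T)) sum X_t / sigma

print(Z.mean(), Z.var())                    # ~0 and ~1
print(np.mean(Z <= 1.645))                  # ~0.95, the N(0,1) cdf at 1.645
\end{verbatim}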
21.5 Stationarity
21.6 Martingales
Fact (Martingale) Let $\Omega_t$ be a set of information in $t$, for instance $Y_t, Y_{t-1}, \ldots$. If $\mathrm{E}\,|Y_t| < \infty$ and $\mathrm{E}(Y_{t+1}|\Omega_t) = Y_t$, then $Y_t$ is a martingale.
Fact (Central limit theorem for martingale difference sequences) Let $X_t$ be a martingale difference sequence. If, for some $\delta > 0$, $(\mathrm{E}\,|X_t|^{2+\delta})^{1/(2+\delta)}$ is bounded, $\sum_{t=1}^{T}\mathrm{E}\,X_t^2/T$ is bounded away from zero, and $\sum_{t=1}^{T}X_t^2/T - \sum_{t=1}^{T}\mathrm{E}\,X_t^2/T \overset{p}{\to} 0$, then
$$\left(\sum_{t=1}^{T} X_t/\sqrt{T}\right)\Big/\left(\sum_{t=1}^{T}\mathrm{E}\,X_t^2/T\right)^{1/2} \overset{d}{\to} N(0, 1).$$
(See Davidson (2000) 6.2.)
21.7 Special Distributions

21.7.1 The Normal Distribution
Fact 21.53 (Univariate normal distribution) If $X \sim N(\mu, \sigma^2)$, then the probability density function of $X$, $f(x)$, is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}.$$
The moment generating function is $\mathrm{mgf}_X(t) = \exp\left(\mu t + \sigma^2 t^2/2\right)$ and the moment generating function around the mean is $\mathrm{mgf}_{(X-\mu)}(t) = \exp\left(\sigma^2 t^2/2\right)$.

Example 21.54 The first few moments around the mean are $\mathrm{E}(X - \mu) = 0$, $\mathrm{E}(X - \mu)^2 = \sigma^2$, $\mathrm{E}(X - \mu)^3 = 0$ (all odd moments are zero), $\mathrm{E}(X - \mu)^4 = 3\sigma^4$, $\mathrm{E}(X - \mu)^6 = 15\sigma^6$, and $\mathrm{E}(X - \mu)^8 = 105\sigma^8$.
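The central moments in Example 21.54 can be checked (approximately) against simulated data; this is an added illustration with arbitrary $\mu$ and $\sigma$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=5)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=5_000_000)
d = x - mu

for k, coef in [(2, 1), (4, 3), (6, 15), (8, 105)]:
    print(k, np.mean(d**k), coef * sigma**k)   # E(X-mu)^k vs {1,3,15,105} * sigma^k
\end{verbatim}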
[Figures: pdf of $N(0,1)$; pdf of a bivariate normal distribution; conditional pdfs of $y$ given $x = 0.8$ and $x = 0$.]
Fact (Conditional normal distribution) Suppose $Z$ and $X$ are jointly normally distributed, with means $\mu_Z$ and $\mu_X$ and a covariance matrix partitioned into $\Sigma_{ZZ}$, $\Sigma_{ZX}$, and $\Sigma_{XX}$. Then the distribution of $Z$ conditional on $X = x$ is normal with mean $\mathrm{E}(Z|x) = \mu_Z + \Sigma_{ZX}\Sigma_{XX}^{-1}(x - \mu_X)$ and variance
$$\mathrm{Var}(Z|x) = \Sigma_{ZZ} - \Sigma_{ZX}\Sigma_{XX}^{-1}\Sigma_{XZ}.$$
Note that the conditional variance is constant in the multivariate normal distribution ($\mathrm{Var}(Z|X)$ is not a random variable in this case). Note also that $\mathrm{Var}(Z|x)$ is less than $\mathrm{Var}(Z) = \Sigma_{ZZ}$ (in a matrix sense) if $X$ contains any relevant information (so $\Sigma_{ZX}$ is not zero, that is, $\mathrm{E}(Z|x)$ is not the same for all $x$).
Fact 21.58 (Stein's lemma) If $Y$ has a normal distribution and $h()$ is a differentiable function such that $\mathrm{E}\,|h'(Y)| < \infty$, then $\mathrm{Cov}[Y, h(Y)] = \mathrm{Var}(Y)\,\mathrm{E}\,h'(Y)$.

Proof. $\mathrm{E}[(Y-\mu)h(Y)] = \int_{-\infty}^{\infty}(Y-\mu)h(Y)\phi(Y;\mu,\sigma^2)\,dY$, where $\phi(Y;\mu,\sigma^2)$ is the pdf of $N(\mu,\sigma^2)$. Note that $d\phi(Y;\mu,\sigma^2)/dY = -\phi(Y;\mu,\sigma^2)(Y-\mu)/\sigma^2$, so the integral can be rewritten as $-\sigma^2\int_{-\infty}^{\infty} h(Y)\,d\phi(Y;\mu,\sigma^2)$. Integration by parts ($\int u\,dv = uv - \int v\,du$) gives
$$-\sigma^2\left[h(Y)\phi(Y;\mu,\sigma^2)\Big|_{-\infty}^{\infty} - \int_{-\infty}^{\infty}\phi(Y;\mu,\sigma^2)h'(Y)\,dY\right] = \sigma^2\,\mathrm{E}\,h'(Y).$$
Fact 21.59 (Stein's lemma 2) It follows from Fact 21.58 that if $X$ and $Y$ have a bivariate normal distribution and $h()$ is a differentiable function such that $\mathrm{E}\,|h'(Y)| < \infty$, then $\mathrm{Cov}[X, h(Y)] = \mathrm{Cov}(X, Y)\,\mathrm{E}\,h'(Y)$.

Example 21.60 (a) With $h(Y) = \exp(Y)$ we get $\mathrm{Cov}[X, \exp(Y)] = \mathrm{Cov}(X, Y)\,\mathrm{E}\,\exp(Y)$; (b) with $h(Y) = Y^2$ we get $\mathrm{Cov}(X, Y^2) = \mathrm{Cov}(X, Y)\,2\,\mathrm{E}\,Y$, so with $\mathrm{E}\,Y = 0$ we get a zero covariance.
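Stein's lemma is straightforward to check by simulation; the sketch below (an added illustration, with arbitrary covariance matrix and seed) uses $h(Y) = \exp(Y)$ for a bivariate normal pair, as in Example 21.60(a).
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=6)
n = 2_000_000
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])            # Var(X)=1, Var(Y)=2, Cov(X,Y)=0.6
X, Y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

lhs = np.cov(X, np.exp(Y))[0, 1]        # Cov[X, h(Y)] with h = exp
rhs = cov[0, 1] * np.mean(np.exp(Y))    # Cov(X,Y) * E h'(Y), since h'(Y) = exp(Y)
print(lhs, rhs)                         # approximately equal
\end{verbatim}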
Fact 21.61 (Stein's lemma 3) Fact 21.59 still holds if the joint distribution of $X$ and $Y$ is a mixture of $n$ bivariate normal distributions, provided the mean and variance of $Y$ is the same in each of the $n$ components. (See Söderlind (2009) for a proof.)
Fact 21.62 (Truncated normal distribution) Let $X \sim N(\mu, \sigma^2)$, and consider truncating the distribution so that we want moments conditional on $a < X \le b$. Define $a_0 = (a - \mu)/\sigma$ and $b_0 = (b - \mu)/\sigma$. Then,
$$\mathrm{E}(X | a < X \le b) = \mu - \sigma\,\frac{\phi(b_0) - \phi(a_0)}{\Phi(b_0) - \Phi(a_0)} \quad\text{and}$$
$$\mathrm{Var}(X | a < X \le b) = \sigma^2\left\{1 - \frac{b_0\phi(b_0) - a_0\phi(a_0)}{\Phi(b_0) - \Phi(a_0)} - \left[\frac{\phi(b_0) - \phi(a_0)}{\Phi(b_0) - \Phi(a_0)}\right]^2\right\}.$$
Fact 21.63 (Lower truncation) In Fact 21.62, let $b \to \infty$, so we only have the truncation $a < X$. Then, we have
$$\mathrm{E}(X | a < X) = \mu + \sigma\,\frac{\phi(a_0)}{1 - \Phi(a_0)} \quad\text{and}$$
$$\mathrm{Var}(X | a < X) = \sigma^2\left\{1 + \frac{a_0\phi(a_0)}{1 - \Phi(a_0)} - \left[\frac{\phi(a_0)}{1 - \Phi(a_0)}\right]^2\right\}.$$
Fact (Upper truncation) In Fact 21.62, let $a \to -\infty$, so we only have the truncation $X \le b$. Then, we have
$$\mathrm{E}(X | X \le b) = \mu - \sigma\,\frac{\phi(b_0)}{\Phi(b_0)} \quad\text{and}$$
$$\mathrm{Var}(X | X \le b) = \sigma^2\left\{1 - \frac{b_0\phi(b_0)}{\Phi(b_0)} - \left[\frac{\phi(b_0)}{\Phi(b_0)}\right]^2\right\}.$$
(The latter follows from $\lim_{a_0 \to -\infty} a_0\phi(a_0) = 0$.)
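The truncated-normal formulas can be verified against scipy.stats.truncnorm, which is parameterized by the standardized truncation points. This is an added check; the numerical values of $\mu$, $\sigma$, $a$, and $b$ are arbitrary.
\begin{verbatim}
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, a, b = 1.0, 2.0, 0.0, 3.0
a0, b0 = (a - mu) / sigma, (b - mu) / sigma

# Formulas from Fact 21.62
lam = (norm.pdf(b0) - norm.pdf(a0)) / (norm.cdf(b0) - norm.cdf(a0))
mean_formula = mu - sigma * lam
var_formula = sigma**2 * (1 - (b0 * norm.pdf(b0) - a0 * norm.pdf(a0))
                          / (norm.cdf(b0) - norm.cdf(a0)) - lam**2)

# scipy's truncated normal on a < X <= b
dist = truncnorm(a0, b0, loc=mu, scale=sigma)
print(mean_formula, dist.mean())
print(var_formula, dist.var())
\end{verbatim}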
Fact (Delta method) Consider an estimator $\hat{\beta}_{k\times 1}$ which satisfies
$$\sqrt{T}\left(\hat{\beta} - \beta_0\right) \overset{d}{\to} N(0, \Omega),$$
and suppose we want the asymptotic distribution of a transformation of $\beta$, $\gamma_{q\times 1} = g(\beta)$, where $g()$ has continuous first derivatives. The result is
$$\sqrt{T}\left[g(\hat{\beta}) - g(\beta_0)\right] \overset{d}{\to} N(0, \Psi), \text{ where } \Psi = \frac{\partial g(\beta_0)}{\partial \beta'}\,\Omega\,\left[\frac{\partial g(\beta_0)}{\partial \beta'}\right]', \text{ and } \frac{\partial g(\beta_0)}{\partial \beta'} \text{ is } q \times k.$$

Proof. By the mean value theorem we have
$$g(\hat{\beta}) = g(\beta_0) + \frac{\partial g(\beta^*)}{\partial \beta'}\left(\hat{\beta} - \beta_0\right),$$
where
$$\frac{\partial g(\beta)}{\partial \beta'} =
\begin{bmatrix}
\dfrac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_1(\beta)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial g_q(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_q(\beta)}{\partial \beta_k}
\end{bmatrix}_{q\times k},$$
and we evaluate it at $\beta^*$, which is (weakly) between $\hat{\beta}$ and $\beta_0$. Premultiply by $\sqrt{T}$ and rearrange as
$$\sqrt{T}\left[g(\hat{\beta}) - g(\beta_0)\right] = \frac{\partial g(\beta^*)}{\partial \beta'}\,\sqrt{T}\left(\hat{\beta} - \beta_0\right).$$
If $\mathrm{plim}\,\hat{\beta} = \beta_0$ and $\partial g/\partial \beta'$ is continuous, then $\mathrm{plim}\,\partial g(\beta^*)/\partial \beta' = \partial g(\beta_0)/\partial \beta'$ (a constant matrix), and the result follows.
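As a small numerical companion to the delta method (an added illustration; the particular $g$, $\beta_0$, and $\Omega$ below are arbitrary), the asymptotic covariance $\Psi$ is just the Jacobian sandwich.
\begin{verbatim}
import numpy as np

# beta_0 and Omega for an arbitrary 2x1 estimator
beta0 = np.array([1.0, 2.0])
Omega = np.array([[0.5, 0.1],
                  [0.1, 0.3]])

# g(beta) = [beta1*beta2, beta2^2]'; Jacobian dg/dbeta' is 2x2
def jacobian(b):
    return np.array([[b[1], b[0]],
                     [0.0,  2.0 * b[1]]])

J = jacobian(beta0)
Psi = J @ Omega @ J.T   # asymptotic covariance of sqrt(T)[g(betahat) - g(beta0)]
print(Psi)
\end{verbatim}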
21.7.2 The Lognormal Distribution

Fact (Lognormal distribution) If $x \sim N(\mu, \sigma^2)$ and $y = \exp(x)$, then the probability density function of $y$ is
$$f(y) = \frac{1}{y\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2}\left(\frac{\ln y - \mu}{\sigma}\right)^2}, \quad y > 0.$$
Example 21.71 The first two moments of the truncated (from above) lognormal distribution are $\mathrm{E}(y | y \le b) = \exp\left(\mu + \sigma^2/2\right)\Phi(b_0 - \sigma)/\Phi(b_0)$ and $\mathrm{E}(y^2 | y \le b) = \exp\left(2\mu + 2\sigma^2\right)\Phi(b_0 - 2\sigma)/\Phi(b_0)$, where $b_0 = (\ln b - \mu)/\sigma$.
Fact 21.72 (Multivariate lognormal distribution) Let the $n \times 1$ vector $x$ have a multivariate normal distribution
$$x \sim N(\mu, \Sigma), \text{ where } \mu = \begin{bmatrix}\mu_1 \\ \vdots \\ \mu_n\end{bmatrix} \text{ and } \Sigma = \begin{bmatrix}\sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn}\end{bmatrix}.$$
Then $y = \exp(x)$ has a lognormal distribution, with the means and covariances
$$\mathrm{E}\,y_i = \exp\left(\mu_i + \sigma_{ii}/2\right),$$
$$\mathrm{Cov}(y_i, y_j) = \exp\left[\mu_i + \mu_j + (\sigma_{ii} + \sigma_{jj})/2\right]\left[\exp(\sigma_{ij}) - 1\right],$$
$$\mathrm{Corr}(y_i, y_j) = \left[\exp(\sigma_{ij}) - 1\right]\Big/\sqrt{\left[\exp(\sigma_{ii}) - 1\right]\left[\exp(\sigma_{jj}) - 1\right]}.$$
Clearly, $\mathrm{Var}(y_i) = \exp\left(2\mu_i + \sigma_{ii}\right)\left[\exp(\sigma_{ii}) - 1\right]$. $\mathrm{Cov}(y_i, y_j)$ and $\mathrm{Corr}(y_i, y_j)$ have the same sign as $\mathrm{Corr}(x_i, x_j)$ and are increasing in it. However, $\mathrm{Corr}(y_i, y_j)$ is closer to zero.
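The lognormal moment formulas in Fact 21.72 can be checked by simulation; this is an added sketch with arbitrary $\mu$ and $\Sigma$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=7)
mu = np.array([0.1, 0.2])
Sigma = np.array([[0.3, 0.1],
                  [0.1, 0.4]])

x = rng.multivariate_normal(mu, Sigma, size=2_000_000)
y = np.exp(x)

mean_formula = np.exp(mu + np.diag(Sigma) / 2)
cov12_formula = np.exp(mu[0] + mu[1] + (Sigma[0, 0] + Sigma[1, 1]) / 2) \
                * (np.exp(Sigma[0, 1]) - 1)

print(y.mean(axis=0), mean_formula)                 # E y_i
print(np.cov(y[:, 0], y[:, 1])[0, 1], cov12_formula)  # Cov(y_1, y_2)
\end{verbatim}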
21.7.3 The Chi-Square Distribution
Fact 21.73 (The $\chi^2_n$ distribution) If $Y \sim \chi^2_n$, then the pdf of $Y$ is $f(y) = \frac{1}{2^{n/2}\Gamma(n/2)}\,y^{n/2-1}e^{-y/2}$, where $\Gamma()$ is the gamma function. The moment generating function is $\mathrm{mgf}_Y(t) = (1 - 2t)^{-n/2}$ for $t < 1/2$. The first moments of $Y$ are $\mathrm{E}\,Y = n$ and $\mathrm{Var}(Y) = 2n$.
Fact 21.74 (Quadratic forms of normally distributed random variables) If the $n \times 1$ vector $X \sim N(0, \Sigma)$, then $Y = X'\Sigma^{-1}X \sim \chi^2_n$. Therefore, if the $n$ scalar random variables $X_i$, $i = 1, \ldots, n$, are uncorrelated and have the distributions $N(0, \sigma_i^2)$, $i = 1, \ldots, n$, then $Y = \sum_{i=1}^{n} X_i^2/\sigma_i^2 \sim \chi^2_n$.

Fact 21.75 (Distribution of $X'AX$) If the $n \times 1$ vector $X \sim N(0, I)$, and $A$ is a symmetric idempotent matrix ($A = A'$ and $A = AA = A'A$) of rank $r$, then $Y = X'AX \sim \chi^2_r$.

Fact 21.76 (Distribution of $X'\Sigma^{+}X$) If the $n \times 1$ vector $X \sim N(0, \Sigma)$, where $\Sigma$ has rank $r \le n$, then $Y = X'\Sigma^{+}X \sim \chi^2_r$, where $\Sigma^{+}$ is the pseudo inverse of $\Sigma$.
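Fact 21.74 can be verified numerically (an added illustration with an arbitrary $\Sigma$): simulated values of $X'\Sigma^{-1}X$ should have mean $n$ and variance $2n$, and their distribution should match $\chi^2_n$.
\begin{verbatim}
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(seed=8)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n = Sigma.shape[0]

X = rng.multivariate_normal(np.zeros(n), Sigma, size=1_000_000)
Q = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)  # X' Sigma^{-1} X per draw

print(Q.mean(), Q.var())                        # ~ n and ~ 2n
print(np.mean(Q <= 5.99), chi2(n).cdf(5.99))    # empirical vs chi^2_2 cdf
\end{verbatim}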
Fact (Normal approximation of the chi-square distribution) If $Y \sim \chi^2_n$, then $Z = (Y - n)/n^{1/2} \overset{d}{\to} N(0, 2)$ as $n \to \infty$. To see this, note that the moment generating function of $Z$ is
$$\mathrm{mgf}_Z(t) = e^{-n^{1/2}t}\left(1 - \frac{2t}{n^{1/2}}\right)^{-n/2},$$
which converges to $\exp(t^2)$ as $n \to \infty$. This is the moment generating function of a $N(0, 2)$ distribution, which shows that $Z \overset{d}{\to} N(0, 2)$. This result should not come as a surprise as we can think of $Y$ as the sum of $n$ variables; dividing by $n^{1/2}$ is then like creating a scaled sample average for which a central limit theorem applies.
21.7.4 The t and F Distributions
Fact 21.79 (The $F(n_1, n_2)$ distribution) If $Y_1 \sim \chi^2_{n_1}$ and $Y_2 \sim \chi^2_{n_2}$ and $Y_1$ and $Y_2$ are independent, then $Z = (Y_1/n_1)/(Y_2/n_2)$ has an $F(n_1, n_2)$ distribution. This distribution converges to a $\chi^2_{n_1}/n_1$ distribution as $n_2 \to \infty$.
[Figures: (a) pdf of $\chi^2(n)$ for $n = 1, 2, 5, 10$; (b) pdf of $F(n_1, 10)$ and (c) pdf of $F(n_1, 100)$ for $n_1 = 2, 5, 10$; pdf of $t(10)$ and $t(50)$ compared with $N(0,1)$.]
The variance of a $t_n$ distributed variable is $n/(n - 2)$ for $n > 2$.
21.7.5 The Bernoulli and Binomial Distributions
Fact 21.84 (Bernoulli distribution) The random variable $X$ can only take two values: 1 or 0, with probability $p$ and $1-p$ respectively. The moment generating function is $\mathrm{mgf}(t) = pe^t + 1 - p$. This gives $\mathrm{E}(X) = p$ and $\mathrm{Var}(X) = p(1-p)$.

Example 21.85 (Shifted Bernoulli distribution) Suppose the Bernoulli variable takes the values $a$ or $b$ (instead of 1 and 0) with probability $p$ and $1-p$ respectively. Then $\mathrm{E}(X) = pa + (1-p)b$ and $\mathrm{Var}(X) = p(1-p)(a-b)^2$.

Fact 21.86 (Binomial distribution) Suppose $X_1, X_2, \ldots, X_n$ all have Bernoulli distributions with the parameter $p$. Then, the sum $Y = X_1 + X_2 + \cdots + X_n$ has a Binomial distribution with parameters $p$ and $n$. The pdf is $\mathrm{pdf}(Y) = \frac{n!}{y!(n-y)!}\,p^y(1-p)^{n-y}$ for $y = 0, 1, \ldots, n$. The moment generating function is $\mathrm{mgf}(t) = (pe^t + 1 - p)^n$. This gives $\mathrm{E}(Y) = np$ and $\mathrm{Var}(Y) = np(1-p)$.

Example 21.87 (Shifted Binomial distribution) Suppose the Bernoulli variables $X_1, X_2, \ldots, X_n$ take the values $a$ or $b$ (instead of 1 and 0) with probability $p$ and $1-p$ respectively. Then, the sum $Y = X_1 + X_2 + \cdots + X_n$ has $\mathrm{E}(Y) = n[pa + (1-p)b]$ and $\mathrm{Var}(Y) = np(1-p)(a-b)^2$.
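The moments of the shifted binomial in Example 21.87 are easy to confirm by simulation; this is an added sketch with arbitrary values of $a$, $b$, $p$, and $n$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=9)
a, b, p, n = 2.0, -1.0, 0.3, 10

# each X_i is a with prob p and b with prob 1-p; Y is the sum of n of them
X = np.where(rng.random((1_000_000, n)) < p, a, b)
Y = X.sum(axis=1)

print(Y.mean(), n * (p * a + (1 - p) * b))       # E Y
print(Y.var(), n * p * (1 - p) * (a - b) ** 2)   # Var Y
\end{verbatim}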
21.7.6 The Skew-Normal Distribution
Fact 21.88 (Skew-normal distribution) Let $\phi$ and $\Phi$ be the standard normal pdf and cdf respectively. The pdf of a skew-normal distribution with shape parameter $\alpha$ is then
$$f(z) = 2\phi(z)\Phi(\alpha z).$$
If $Z$ has the above pdf and
$$Y = \mu + \omega Z \text{ with } \omega > 0,$$
then $Y$ is said to have a $SN(\mu, \omega^2, \alpha)$ distribution (see Azzalini (2005)). Clearly, the pdf of $Y$ is
$$f(y) = 2\phi\left[(y-\mu)/\omega\right]\Phi\left[\alpha(y-\mu)/\omega\right]/\omega.$$
The moment generating function is $\mathrm{mgf}_y(t) = 2\exp\left(\mu t + \omega^2 t^2/2\right)\Phi(\delta\omega t)$, where $\delta = \alpha/\sqrt{1+\alpha^2}$. When $\alpha > 0$ the distribution is positively skewed (and vice versa), and since $\mathrm{E}\,Z = \delta\sqrt{2/\pi}$,
$$\mathrm{E}\,Y = \mu + \omega\,\mathrm{E}\,Z,$$
$$\mathrm{Var}(Y) = \omega^2\,\mathrm{Var}(Z), \text{ and}$$
$$\mathrm{E}(Y - \mathrm{E}\,Y)^3 = \omega^3\,\mathrm{E}(Z - \mathrm{E}\,Z)^3.$$
Notice that with $\alpha = 0$ (so $\delta = 0$), these moments of $Y$ become $\mu$, $\omega^2$ and 0 respectively.
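The skew-normal moment relations can be checked against scipy.stats.skewnorm, whose shape argument corresponds to $\alpha$. This is an added check with arbitrary parameter values; the comment about $\mathrm{Var}(Z)$ uses the standard skew-normal result $\mathrm{Var}(Z) = 1 - 2\delta^2/\pi$.
\begin{verbatim}
import numpy as np
from scipy.stats import skewnorm

alpha, mu, omega = 3.0, 1.0, 2.0
delta = alpha / np.sqrt(1 + alpha**2)

EZ = delta * np.sqrt(2 / np.pi)             # E Z for the standard skew-normal
dist = skewnorm(alpha, loc=mu, scale=omega)  # Y = mu + omega * Z

print(dist.mean(), mu + omega * EZ)          # E Y = mu + omega E Z
print(dist.var(), omega**2 * (1 - 2 * delta**2 / np.pi))  # Var(Y) = omega^2 Var(Z)
\end{verbatim}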
21.7.7 Generalized Pareto Distribution
Fact 21.90 (Cdf and pdf of the generalized Pareto distribution) The generalized Pareto distribution is described by a scale parameter ($\beta > 0$) and a shape parameter ($\xi$). The cdf ($\Pr(Z \le z)$, where $Z$ is the random variable and $z$ is a value) is
$$G(z) = \begin{cases} 1 - (1 + \xi z/\beta)^{-1/\xi} & \text{if } \xi \ne 0 \\ 1 - \exp(-z/\beta) & \text{if } \xi = 0, \end{cases}$$
for $0 \le z$ and $z \le -\beta/\xi$ in case $\xi < 0$. The pdf is
$$g(z) = \begin{cases} \frac{1}{\beta}(1 + \xi z/\beta)^{-1/\xi - 1} & \text{if } \xi \ne 0 \\ \frac{1}{\beta}\exp(-z/\beta) & \text{if } \xi = 0. \end{cases}$$
The mean is defined (finite) if $\xi < 1$ and is then $\mathrm{E}(z) = \beta/(1-\xi)$, the median is $\beta(2^\xi - 1)/\xi$, and the variance is defined if $\xi < 1/2$ and is then $\beta^2/[(1-\xi)^2(1-2\xi)]$.
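The generalized Pareto moments can be compared with scipy.stats.genpareto, where the shape argument c plays the role of $\xi$ and scale the role of $\beta$. This is an added check with arbitrary parameter values.
\begin{verbatim}
import numpy as np
from scipy.stats import genpareto

xi, beta = 0.25, 2.0
dist = genpareto(c=xi, scale=beta)

print(dist.mean(), beta / (1 - xi))                        # E z, requires xi < 1
print(dist.var(), beta**2 / ((1 - xi)**2 * (1 - 2 * xi)))  # Var, requires xi < 1/2
print(dist.median(), beta * (2**xi - 1) / xi)              # median
\end{verbatim}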
21.8 Inference
If $\mathrm{Var}(\beta^*) - \mathrm{Var}(\hat{\beta})$ is a positive semidefinite matrix, then $R'\mathrm{Var}(\beta^*)R \ge R'\mathrm{Var}(\hat{\beta})R$ for any vector $R$, so every linear combination of $\beta^*$ has a variance that is at least as large as the variance of the same linear combination of $\hat{\beta}$. In particular, this means that the variance of every element in $\beta^*$ (the diagonal elements of $\mathrm{Var}(\beta^*)$) is at least as large as the variance of the corresponding element of $\hat{\beta}$.
Bibliography
Azzalini, A., 2005, The skew-normal distribution and related multivariate families, Scandinavian Journal of Statistics, 32, 159–188.
Davidson, J., 2000, Econometric theory, Blackwell Publishers, Oxford.
DeGroot, M. H., 1986, Probability and statistics, Addison-Wesley, Reading, Massachusetts.
Greene, W. H., 2000, Econometric analysis, Prentice-Hall, Upper Saddle River, New
Jersey, 4th edn.
Johnson, N. L., S. Kotz, and N. Balakrishnan, 1994, Continuous univariate distributions,
Wiley, New York, 2nd edn.
Mittelhammer, R. C., 1996, Mathematical statistics for economics and business, Springer-Verlag, New York.
Söderlind, P., 2009, An extended Stein's lemma for asset pricing, Applied Economics Letters, 16, 1005–1008.