
Lecture Notes - Econometrics: Some Statistics

Paul Söderlind¹
10 February 2011

¹ University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen,
Switzerland. E-mail: Paul.Soderlind@unisg.ch. Document name: EcmXSta.TeX.

Contents

21 Some Statistics
21.1 Distributions and Moment Generating Functions
21.2 Joint and Conditional Distributions and Moments
21.2.1 Joint and Conditional Distributions
21.2.2 Moments of Joint Distributions
21.2.3 Conditional Moments
21.2.4 Regression Function and Linear Projection
21.3 Convergence in Probability, Mean Square, and Distribution
21.4 Laws of Large Numbers and Central Limit Theorems
21.5 Stationarity
21.6 Martingales
21.7 Special Distributions
21.7.1 The Normal Distribution
21.7.2 The Lognormal Distribution
21.7.3 The Chi-Square Distribution
21.7.4 The t and F Distributions
21.7.5 The Bernoulli and Binomial Distributions
21.7.6 The Skew-Normal Distribution
21.7.7 Generalized Pareto Distribution
21.8 Inference


Chapter 21
Some Statistics

This section summarizes some useful facts about statistics. Heuristic proofs are given in
a few cases.
Some references: Mittelhammer (1996), DeGroot (1986), Greene (2000), Davidson
(2000), Johnson, Kotz, and Balakrishnan (1994).

21.1 Distributions and Moment Generating Functions

Most of the stochastic variables we encounter in econometrics are continuous. For a continuous random variable X, the range is uncountably infinite and the probability that X ≤ x is Pr(X ≤ x) = ∫_{−∞}^{x} f(q)dq, where f(q) is the continuous probability density function of X. Note that X is a random variable, x is a number (1.23 or so), and q is just a dummy argument in the integral.
Fact 21.1 (cdf and pdf) The cumulative distribution function of the random variable X is F(x) = Pr(X ≤ x) = ∫_{−∞}^{x} f(q)dq. Clearly, f(x) = dF(x)/dx. Note that x is just a number, not a random variable.
Fact 21.2 (Moment generating function of X) The moment generating function of the random variable X is mgf(t) = E e^{tX}. The rth moment is the rth derivative of mgf(t) evaluated at t = 0: E X^r = d^r mgf(0)/dt^r. If a moment generating function exists (that is, E e^{tX} < ∞ for some small interval t ∈ (−h, h)), then it is unique.
Fact 21.3 (Moment generating function of a function of X) If X has the moment generating function mgf_X(t) = E e^{tX}, then g(X) has the moment generating function E e^{tg(X)}. The affine function a + bX (a and b are constants) has the moment generating function mgf_{g(X)}(t) = E e^{t(a+bX)} = e^{ta} E e^{tbX} = e^{ta} mgf_X(bt). By setting b = 1 and a = −E X we obtain the mgf for central moments (variance, skewness, kurtosis, etc.), mgf_{(X−E X)}(t) = e^{−t E X} mgf_X(t).

Example 21.4 When X ∼ N(μ, σ²), then mgf_X(t) = exp(μt + σ²t²/2). Let Z = (X − μ)/σ, so a = −μ/σ and b = 1/σ. This gives mgf_Z(t) = exp(−μt/σ) mgf_X(t/σ) = exp(t²/2). (Of course, this result can also be obtained by directly setting μ = 0 and σ = 1 in mgf_X.)
Fact 21.5 (Change of variable, univariate case, monotonic function) Suppose X has the probability density function f_X(c) and cumulative distribution function F_X(c). Let Y = g(X) be a continuously differentiable function with dg/dX > 0 (so g(X) is increasing for all c such that f_X(c) > 0). Then the cdf of Y is

F_Y(c) = Pr[Y ≤ c] = Pr[g(X) ≤ c] = Pr[X ≤ g⁻¹(c)] = F_X[g⁻¹(c)],

where g⁻¹ is the inverse function of g such that g⁻¹(Y) = X. We also have that the pdf of Y is

f_Y(c) = f_X[g⁻¹(c)] |dg⁻¹(c)/dc|.

If, instead, dg/dX < 0 (so g(X) is decreasing), then we instead have the cdf of Y

F_Y(c) = Pr[Y ≤ c] = Pr[g(X) ≤ c] = Pr[X ≥ g⁻¹(c)] = 1 − F_X[g⁻¹(c)],

but the same expression for the pdf.

Proof. Differentiate F_Y(c), that is, F_X[g⁻¹(c)], with respect to c.
Example 21.6 Let X ∼ U(0,1) and Y = g(X) = F⁻¹(X), where F(c) is a strictly increasing cdf. We then get

f_Y(c) = dF(c)/dc.

The variable Y then has the pdf dF(c)/dc and the cdf F(c). This shows how to generate random numbers from the F(·) distribution: draw X ∼ U(0,1) and calculate Y = F⁻¹(X).
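
As an illustration of Example 21.6, the following minimal sketch (hypothetical Python/NumPy code, not part of the original notes) draws uniform numbers and maps them through a closed-form inverse cdf, here the unit exponential with F(c) = 1 − exp(−c) and F⁻¹(u) = −ln(1 − u).

import numpy as np

rng = np.random.default_rng(seed=42)

# Step 1: draw X ~ U(0,1)
x = rng.uniform(size=100_000)

# Step 2: map through the inverse cdf; for F(c) = 1 - exp(-c) we have
# F^{-1}(u) = -ln(1 - u), so y should behave like exponential(1) draws
y = -np.log(1.0 - x)

# Exponential(1) has mean 1 and variance 1, so both should be close to 1
print(y.mean(), y.var())
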

Example 21.7 Let Y = exp(X), so the inverse function is X = ln Y with derivative 1/Y. Then, f_Y(c) = f_X(ln c)/c. Conversely, let Y = ln X, so the inverse function is X = exp(Y) with derivative exp(Y). Then, f_Y(c) = f_X[exp(c)] exp(c).
Example 21.8 Let X ∼ U(0,2), so the pdf and cdf of X are 1/2 and c/2 respectively. Now, letting Y = g(X) = −X gives the pdf and cdf of Y as 1/2 and 1 + c/2 respectively. The latter is clearly the same as 1 − F_X[g⁻¹(c)] = 1 − (−c/2).
Fact 21.9 (Distribution of a truncated random variable) Let the probability distribution and density functions of X be F(x) and f(x), respectively. The corresponding functions, conditional on a < X ≤ b, are [F(x) − F(a)]/[F(b) − F(a)] and f(x)/[F(b) − F(a)]. Clearly, outside a < X ≤ b the pdf is zero, while the cdf is zero below a and unity above b.

21.2 Joint and Conditional Distributions and Moments

21.2.1 Joint and Conditional Distributions

Fact 21.10 (Joint and marginal cdf) Let X and Y be (possibly vectors of) random variables and let x and y be two numbers. The joint cumulative distribution function of X and Y is H(x, y) = Pr(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} h(q_x, q_y) dq_y dq_x, where h(x, y) = ∂²H(x, y)/∂x∂y is the joint probability density function.
Fact 21.11 (Joint and marginal pdf) The marginal cdf of X is obtained by integrating out Y: F(x) = Pr(X ≤ x, Y anything) = ∫_{−∞}^{x} [∫_{−∞}^{∞} h(q_x, q_y) dq_y] dq_x. This shows that the marginal pdf of X is f(x) = dF(x)/dx = ∫_{−∞}^{∞} h(q_x, q_y) dq_y.
Fact 21.12 (Conditional distribution) The pdf of Y conditional on X = x (a number) is g(y|x) = h(x, y)/f(x). This is clearly proportional to the joint pdf (at the given value x).
Fact 21.13 (Change of variable, multivariate case, monotonic function) The result in Fact 21.5 still holds if X and Y are both n × 1 vectors, but the derivative is now ∂g⁻¹(c)/∂c′, which is an n × n matrix. If g_i⁻¹ is the ith function in the vector g⁻¹, then

∂g⁻¹(c)/∂c′ = [ ∂g₁⁻¹(c)/∂c₁   ⋯   ∂g₁⁻¹(c)/∂c_n
                     ⋮          ⋱        ⋮
                ∂g_n⁻¹(c)/∂c₁   ⋯   ∂g_n⁻¹(c)/∂c_n ].

21.2.2 Moments of Joint Distributions

Fact 21.14 (Cauchy-Schwarz) (E XY)² ≤ E(X²) E(Y²).

Proof. 0 ≤ E(aX + Y)² = a² E(X²) + 2a E(XY) + E(Y²). Set a = −E(XY)/E(X²) to get

0 ≤ −E(XY)²/E(X²) + E(Y²), that is, E(XY)²/E(X²) ≤ E(Y²).

Fact 21.15 (−1 ≤ Corr(X, Y) ≤ 1) Let Y and X in Fact 21.14 be zero mean variables (or variables minus their means). We then get Cov(X, Y)² ≤ Var(X) Var(Y), that is, −1 ≤ Cov(X, Y)/[Std(X) Std(Y)] ≤ 1.
21.2.3 Conditional Moments

Fact 21.16 (Conditional moments) E(Y|x) = ∫ y g(y|x)dy and Var(Y|x) = ∫ [y − E(Y|x)]² g(y|x)dy.

Fact 21.17 (Conditional moments as random variables) Before we observe X, the conditional moments are random variables, since X is. We denote these random variables by E(Y|X), Var(Y|X), etc.
Fact 21.18 (Law of iterated expectations) E Y = E[E(Y|X)]. Note that E(Y|X) is a random variable since it is a function of the random variable X. It is not a function of Y, however. The outer expectation is therefore an expectation with respect to X only.

Proof. E[E(Y|X)] = ∫ [∫ y g(y|x)dy] f(x)dx = ∫∫ y g(y|x)f(x)dydx = ∫∫ y h(y, x)dydx = E Y.

Fact 21.19 (Conditional vs. unconditional variance) Var(Y) = Var[E(Y|X)] + E[Var(Y|X)].

Fact 21.20 (Properties of Conditional Expectations) (a) Y = E(Y|X) + U where U and E(Y|X) are uncorrelated: Cov(X, Y) = Cov[X, E(Y|X) + U] = Cov[X, E(Y|X)]. It follows that (b) Cov[Y, E(Y|X)] = Var[E(Y|X)]; and (c) Var(Y) = Var[E(Y|X)] + Var(U). Property (c) is the same as Fact 21.19, where Var(U) = E[Var(Y|X)].

Proof. Cov(X, Y) = ∫∫ x(y − E y)h(x, y)dydx = ∫ x [∫ (y − E y)g(y|x)dy] f(x)dx, but the term in brackets is E(Y|X) − E Y.

Fact 21.21 (Conditional expectation and unconditional orthogonality) E(Y|Z) = 0 ⇒ E YZ = 0.

Proof. Note from Fact 21.20 that E(Y|X) = 0 implies Cov(X, Y) = 0, so E XY = E X E Y (recall that Cov(X, Y) = E XY − E X E Y). Note also that E(Y|X) = 0 implies that E Y = 0 (by iterated expectations). We therefore get

E(Y|X) = 0 ⇒ { Cov(X, Y) = 0 and E Y = 0 } ⇒ E YX = 0.

21.2.4 Regression Function and Linear Projection

Fact 21.22 (Regression function) Suppose we use information in some variables X to predict Y. The choice of forecasting function Ŷ = k(X) = E(Y|X) minimizes E[Y − k(X)]². The conditional expectation E(Y|X) is also called the regression function of Y on X. See Facts 21.20 and 21.21 for some properties of conditional expectations.
Fact 21.23 (Linear projection) Suppose we want to forecast the scalar Y using the k × 1 vector X and that we restrict the forecasting rule to be linear, Ŷ = X′β. This rule is a linear projection, denoted P(Y|X), if β satisfies the orthogonality conditions E[X(Y − X′β)] = 0_{k×1}, that is, if β = (E XX′)⁻¹ E XY. A linear projection minimizes E[Y − k(X)]² within the class of linear k(X) functions.
Fact 21.24 (Properties of linear projections) (a) The orthogonality conditions in Fact 21.23 mean that

Y = X′β + ε,

where E(Xε) = 0_{k×1}. This implies that E[P(Y|X)ε] = 0, so the forecast and forecast error are orthogonal. (b) The orthogonality conditions also imply that E(XY) = E[X P(Y|X)]. (c) When X contains a constant, so E ε = 0, then (a) and (b) carry over to covariances: Cov[P(Y|X), ε] = 0 and Cov(X, Y) = Cov[X, P(Y|X)].
Example 21.25 (P(1|X)) When Y_t = 1, then β = (E X_t X_t′)⁻¹ E X_t. For instance, suppose X_t = [x_{1t}, x_{2t}]′. Then

β = [ E x²_{1t}        E x_{1t}x_{2t} ]⁻¹ [ E x_{1t} ]
    [ E x_{2t}x_{1t}   E x²_{2t}      ]   [ E x_{2t} ].

If x_{1t} = 1 in all periods, then this simplifies to β = [1, 0]′.


Remark 21.26 Some authors prefer to take the transpose of the forecasting rule, that is, to use Ŷ = β′X. Clearly, since XX′ is symmetric, we get β′ = E(YX′)(E XX′)⁻¹.
Fact 21.27 (Linear projection with a constant in X) If X contains a constant, then P(aY + b|X) = aP(Y|X) + b.
Fact 21.28 (Linear projection versus regression function) Both the linear projection and the regression function (see Fact 21.22) minimize E[Y − k(X)]², but the linear projection imposes the restriction that k(X) is linear, whereas the regression function does not impose any restrictions. In the special case when Y and X have a joint normal distribution, the linear projection is the regression function.
Fact 21.29 (Linear projection and OLS) The linear projection is about population moments, but OLS is its sample analogue.
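
To illustrate Facts 21.23, 21.24 and 21.29, the sketch below (hypothetical Python/NumPy code; the data-generating parameters are invented for the example) computes the sample analogue of β = (E XX′)⁻¹ E XY, which is OLS, and checks the sample version of the orthogonality condition.

import numpy as np

rng = np.random.default_rng(seed=0)
T = 10_000

# Simulate a linear relation Y = X'beta + eps with a constant in X
X = np.column_stack([np.ones(T), rng.normal(size=T)])   # T x 2, first column = 1
beta_true = np.array([0.5, 2.0])
Y = X @ beta_true + rng.normal(size=T)

# Sample analogue of beta = (E XX')^{-1} E XY, i.e. OLS
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Orthogonality E[X eps] = 0 holds approximately in the sample
eps = Y - X @ beta_hat
print(beta_hat)          # close to [0.5, 2.0]
print(X.T @ eps / T)     # close to [0, 0]
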

21.3 Convergence in Probability, Mean Square, and Distribution

Fact 21.30 (Convergence in probability) The sequence of random variables {X_T} converges in probability to the random variable X if (and only if), for all ε > 0,

lim_{T→∞} Pr(|X_T − X| < ε) = 1.

We denote this X_T →p X or plim X_T = X (X is the probability limit of X_T). Note: (a) X can be a constant instead of a random variable; (b) if X_T and X are matrices, then X_T →p X if the previous condition holds for every element in the matrices.

Example 21.31 Suppose X_T = 0 with probability (T − 1)/T and X_T = T with probability 1/T. Note that lim_{T→∞} Pr(|X_T − 0| = 0) = lim_{T→∞} (T − 1)/T = 1, so lim_{T→∞} Pr(|X_T − 0| < ε) = 1 for any ε > 0. Note also that E X_T = 0 × (T − 1)/T + T × 1/T = 1, so X_T is biased.
Fact 21.32 (Convergence in mean square) The sequence of random variables {X_T} converges in mean square to the random variable X if (and only if)

lim_{T→∞} E(X_T − X)² = 0.

We denote this X_T →m X. Note: (a) X can be a constant instead of a random variable; (b) if X_T and X are matrices, then X_T →m X if the previous condition holds for every element in the matrices.

[Figure 21.1: Sampling distributions. Panels: the distribution of the sample average and of √T × the sample average of z_t − 1, where z_t has a χ²(1) distribution, for T = 5, 25, 100.]

Fact 21.33 (Convergence in mean square to a constant) If X in Fact 21.32 is a constant, then X_T →m X if (and only if)

lim_{T→∞} (E X_T − X)² = 0 and lim_{T→∞} Var(X_T) = 0.

This means that both the variance and the squared bias go to zero as T → ∞.

Proof. E(X_T − X)² = E X_T² − 2X E X_T + X². Add and subtract (E X_T)² and recall that Var(X_T) = E X_T² − (E X_T)². This gives E(X_T − X)² = Var(X_T) − 2X E X_T + X² + (E X_T)² = Var(X_T) + (E X_T − X)².
Fact 21.34 (Convergence in distribution) Consider the sequence of random variables {X_T} with the associated sequence of cumulative distribution functions {F_T}. If lim_{T→∞} F_T = F (at all points), then F is the limiting cdf of X_T. If there is a random variable X with cdf F, then X_T converges in distribution to X: X_T →d X. Instead of comparing cdfs, the comparison can equally well be made in terms of the probability density functions or the moment generating functions.
Fact 21.35 (Relation between the different types of convergence) We have X_T →m X ⇒ X_T →p X ⇒ X_T →d X. The reverse implications are not generally true.


Example 21.36 Consider the random variable in Example 21.31. The expected value is E X_T = 0 × (T − 1)/T + T/T = 1. This means that the squared bias does not go to zero, so X_T does not converge in mean square to zero.
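
A small simulation (hypothetical Python/NumPy code, not part of the original notes) of the random variable in Examples 21.31 and 21.36: almost all draws equal zero for large T, yet the mean stays at one and the second moment grows, so there is convergence in probability to zero but not in mean square.

import numpy as np

rng = np.random.default_rng(seed=1)

for T in (10, 100, 10_000):
    # X_T = T with probability 1/T and X_T = 0 otherwise
    draws = np.where(rng.uniform(size=1_000_000) < 1 / T, float(T), 0.0)
    # Pr(X_T = 0) -> 1, but E X_T = 1 and E X_T^2 = T -> infinity
    print(T, (draws == 0).mean(), draws.mean(), (draws**2).mean())
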
Fact 21.37 (Slutsky's theorem) If {X_T} is a sequence of random matrices such that plim X_T = X and g(X_T) is a continuous function, then plim g(X_T) = g(X).
Fact 21.38 (Continuous mapping theorem) Let the sequences of random matrices {X_T} and {Y_T}, and the non-random matrices {a_T}, be such that X_T →d X, Y_T →p Y, and a_T → a (a traditional limit). Let g(X_T, Y_T, a_T) be a continuous function. Then g(X_T, Y_T, a_T) →d g(X, Y, a).

21.4 Laws of Large Numbers and Central Limit Theorems

Fact 21.39 (Khinchine's theorem) Let X_t be independently and identically distributed (iid) with E X_t = μ < ∞. Then Σ_{t=1}^{T} X_t/T →p μ.
Fact 21.40 (Chebyshev's theorem) If E X_t = 0 and lim_{T→∞} Var(Σ_{t=1}^{T} X_t/T) = 0, then Σ_{t=1}^{T} X_t/T →p 0.

Fact 21.41 (The Lindeberg–Lévy theorem) Let X_t be independently and identically distributed (iid) with E X_t = 0 and Var(X_t) = σ² < ∞. Then (1/√T) Σ_{t=1}^{T} X_t/σ →d N(0, 1).
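
The following sketch (hypothetical Python/NumPy code, not part of the original notes) illustrates Facts 21.39 and 21.41 with χ²(1) data, as in Figure 21.1: the sample average of z_t − 1 collapses towards zero, while √T times it keeps a variance close to 2 (the variance of a χ²(1) variable).

import numpy as np

rng = np.random.default_rng(seed=2)
n_sim = 50_000

for T in (5, 25, 100):
    z = rng.chisquare(df=1, size=(n_sim, T))    # E z_t = 1, Var z_t = 2
    xbar = z.mean(axis=1) - 1.0                 # sample average of z_t - 1
    scaled = np.sqrt(T) * xbar                  # CLT scaling
    # LLN: Var(xbar) shrinks like 2/T; CLT: Var(scaled) stays near 2
    print(T, xbar.mean(), xbar.var(), scaled.var())
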

21.5 Stationarity

Fact 21.42 (Covariance stationarity) X_t is covariance stationary if

E X_t = μ is independent of t,
Cov(X_{t−s}, X_t) = γ_s depends only on s, and
both μ and γ_s are finite.
Fact 21.43 (Strict stationarity) X_t is strictly stationary if, for all s, the joint distribution of X_t, X_{t+1}, ..., X_{t+s} does not depend on t.
Fact 21.44 (Strict stationarity versus covariance stationarity) In general, strict stationarity does not imply covariance stationarity or vice versa. However, strict stationarity with finite first two moments implies covariance stationarity.

21.6 Martingales

Fact 21.45 (Martingale) Let Ω_t be a set of information in t, for instance Y_t, Y_{t−1}, .... If E|Y_t| < ∞ and E(Y_{t+1}|Ω_t) = Y_t, then Y_t is a martingale.

Fact 21.46 (Martingale difference) If Y_t is a martingale, then X_t = Y_t − Y_{t−1} is a martingale difference: X_t has E|X_t| < ∞ and E(X_{t+1}|Ω_t) = 0.

Fact 21.47 (Innovations as a martingale difference sequence) The forecast error X_{t+1} = Y_{t+1} − E(Y_{t+1}|Ω_t) is a martingale difference.
Fact 21.48 (Properties of martingales) (a) If Y_t is a martingale, then E(Y_{t+s}|Ω_t) = Y_t for s ≥ 1. (b) If X_t is a martingale difference, then E(X_{t+s}|Ω_t) = 0 for s ≥ 1.

Proof. (a) Note that E(Y_{t+2}|Ω_{t+1}) = Y_{t+1} and take expectations conditional on Ω_t: E[E(Y_{t+2}|Ω_{t+1})|Ω_t] = E(Y_{t+1}|Ω_t) = Y_t. By iterated expectations, the first term equals E(Y_{t+2}|Ω_t). Repeat this for t + 3, t + 4, etc. (b) Essentially the same proof.
Fact 21.49 (Properties of martingale differences) If X_t is a martingale difference and g_{t−1} is a function of Ω_{t−1}, then X_t g_{t−1} is also a martingale difference.

Proof. E(X_{t+1}g_t|Ω_t) = E(X_{t+1}|Ω_t)g_t = 0, since g_t is a function of Ω_t.


Fact 21.50 (Martingales, serial independence, and no autocorrelation) (a) X_t is serially uncorrelated if Cov(X_t, X_{t+s}) = 0 for all s ≠ 0. This means that a linear projection of X_{t+s} on X_t, X_{t−1}, ... is a constant, so it cannot help predict X_{t+s}. (b) X_t is a martingale difference with respect to its history if E(X_{t+s}|X_t, X_{t−1}, ...) = 0 for all s ≥ 1. This means that no function of X_t, X_{t−1}, ... can help predict X_{t+s}. (c) X_t is serially independent if pdf(X_{t+s}|X_t, X_{t−1}, ...) = pdf(X_{t+s}). This means that no function of X_t, X_{t−1}, ... can help predict any function of X_{t+s}.
Fact 21.51 (WLLN for martingale differences) If X_t is a martingale difference, then plim Σ_{t=1}^{T} X_t/T = 0 if either (a) X_t is strictly stationary and E|X_t| < ∞, or (b) E|X_t|^{1+δ} < ∞ for δ > 0 and all t. (See Davidson (2000) 6.2.)
Fact 21.52 (CLT for martingale differences) Let X_t be a martingale difference. If plim Σ_{t=1}^{T} (X_t² − E X_t²)/T = 0 and either

(a) X_t is strictly stationary, or
(b) max_{t∈[1,T]} (E|X_t|^{2+δ})^{1/(2+δ)} / (Σ_{t=1}^{T} E X_t²/T) < ∞ for δ > 0 and all T > 1,

then (Σ_{t=1}^{T} X_t/√T)/(Σ_{t=1}^{T} E X_t²/T)^{1/2} →d N(0, 1). (See Davidson (2000) 6.2.)

21.7 Special Distributions

21.7.1 The Normal Distribution

Fact 21.53 (Univariate normal distribution) If X ∼ N(μ, σ²), then the probability density function of X, f(x), is

f(x) = (1/√(2πσ²)) exp[−½((x − μ)/σ)²].

The moment generating function is mgf_X(t) = exp(μt + σ²t²/2) and the moment generating function around the mean is mgf_{(X−μ)}(t) = exp(σ²t²/2).
Example 21.54 The first few moments around the mean are E(X − μ) = 0, E(X − μ)² = σ², E(X − μ)³ = 0 (all odd moments are zero), E(X − μ)⁴ = 3σ⁴, E(X − μ)⁶ = 15σ⁶, and E(X − μ)⁸ = 105σ⁸.

[Figure 21.2: Normal distributions. Panels: pdf of N(0,1); pdf of a bivariate normal with correlation 0.1; pdf of a bivariate normal with correlation 0.8.]


Fact 21.55 (Standard normal distribution) If X ∼ N(0, 1), then the moment generating function is mgf_X(t) = exp(t²/2). Since the mean is zero, mgf_X(t) gives central moments. The first few are E X = 0, E X² = 1, E X³ = 0 (all odd moments are zero), and E X⁴ = 3. The distribution function is Pr(X ≤ a) = Φ(a) = 1/2 + (1/2)erf(a/√2), where erf(·) is the error function, erf(z) = (2/√π) ∫₀^z exp(−t²)dt. The complementary error function is erfc(z) = 1 − erf(z). Since the distribution is symmetric around zero, we have Φ(−a) = Pr(X ≤ −a) = Pr(X ≥ a) = 1 − Φ(a). Clearly, 1 − Φ(a) = Φ(−a) = (1/2)erfc(a/√2).

Fact 21.56 (Multivariate normal distribution) If X is an n × 1 vector of random variables with a multivariate normal distribution, with mean vector μ and variance-covariance matrix Σ, N(μ, Σ), then the density function is

f(x) = (2π)^{−n/2} |Σ|^{−1/2} exp[−½(x − μ)′Σ⁻¹(x − μ)].

[Figure 21.3: Density functions of normal distributions. Panels: pdf of a bivariate normal (correlation 0.1 and 0.8) and the conditional pdf of y given x = 0 and x = 0.8.]


Fact 21.57 (Conditional normal distribution) Suppose Z_{m×1} and X_{n×1} are jointly normally distributed

[Z; X] ∼ N( [μ_Z; μ_X], [Σ_ZZ, Σ_ZX; Σ_XZ, Σ_XX] ).

The distribution of the random variable Z conditional on X = x (a number) is also normal with mean

E(Z|x) = μ_Z + Σ_ZX Σ_XX⁻¹ (x − μ_X),

and variance (variance of Z conditional on X = x, that is, the variance of the prediction error Z − E(Z|x))

Var(Z|x) = Σ_ZZ − Σ_ZX Σ_XX⁻¹ Σ_XZ.

Note that the conditional variance is constant in the multivariate normal distribution (Var(Z|X) is not a random variable in this case). Note also that Var(Z|x) is less than Var(Z) = Σ_ZZ (in a matrix sense) if X contains any relevant information (so Σ_ZX is not zero, that is, E(Z|x) is not the same for all x).
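
A numerical sketch of Fact 21.57 (hypothetical Python/NumPy code with made-up parameter values): compute E(Z|x) and Var(Z|x) for a bivariate normal from the formulas above and compare with a simulation that keeps only draws where X is close to x.

import numpy as np

mu_z, mu_x = 1.0, 2.0
s_zz, s_zx, s_xx = 2.0, 0.8, 1.0     # Sigma_ZZ, Sigma_ZX, Sigma_XX (scalars here)

# Conditional moments from Fact 21.57
x = 2.5
cond_mean = mu_z + s_zx / s_xx * (x - mu_x)
cond_var = s_zz - s_zx / s_xx * s_zx
print(cond_mean, cond_var)

# Simulation check: draw (Z, X) jointly normal and keep draws with X near x
rng = np.random.default_rng(seed=3)
cov = np.array([[s_zz, s_zx], [s_zx, s_xx]])
draws = rng.multivariate_normal([mu_z, mu_x], cov, size=2_000_000)
near = np.abs(draws[:, 1] - x) < 0.01
print(draws[near, 0].mean(), draws[near, 0].var())
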
Fact 21.58 (Stein's lemma) If Y has a normal distribution and h(·) is a differentiable function such that E|h′(Y)| < ∞, then Cov[Y, h(Y)] = Var(Y) E h′(Y).

Proof. E[(Y − μ)h(Y)] = ∫_{−∞}^{∞} (Y − μ)h(Y)φ(Y; μ, σ²)dY, where φ(Y; μ, σ²) is the pdf of N(μ, σ²). Note that dφ(Y; μ, σ²)/dY = −φ(Y; μ, σ²)(Y − μ)/σ², so the integral can be rewritten as −σ² ∫_{−∞}^{∞} h(Y)dφ(Y; μ, σ²). Integration by parts (∫u dv = uv − ∫v du) gives −σ² ( [h(Y)φ(Y; μ, σ²)]_{−∞}^{∞} − ∫_{−∞}^{∞} φ(Y; μ, σ²)h′(Y)dY ) = σ² E h′(Y).
vdu) gives  2 h.Y /.Y I ;  2 / 1
1 .Y I ;  /h .Y /d Y D  E h .Y /.
Fact 21.59 (Stein's lemma 2) It follows from Fact 21.58 that if X and Y have a bivariate normal distribution and h(·) is a differentiable function such that E|h′(Y)| < ∞, then Cov[X, h(Y)] = Cov(X, Y) E h′(Y).
Example 21.60 (a) With h(Y) = exp(Y) we get Cov[X, exp(Y)] = Cov(X, Y) E exp(Y); (b) with h(Y) = Y² we get Cov[X, Y²] = Cov(X, Y) 2 E Y, so with E Y = 0 we get a zero covariance.
Fact 21.61 (Stein's lemma 3) Fact 21.59 still holds if the joint distribution of X and Y is a mixture of n bivariate normal distributions, provided the mean and variance of Y is the same in each of the n components. (See Söderlind (2009) for a proof.)
Fact 21.62 (Truncated normal distribution) Let X ∼ N(μ, σ²), and consider truncating the distribution so that we want moments conditional on a < X ≤ b. Define a₀ = (a − μ)/σ and b₀ = (b − μ)/σ. Then,

E(X|a < X ≤ b) = μ − σ [φ(b₀) − φ(a₀)] / [Φ(b₀) − Φ(a₀)], and

Var(X|a < X ≤ b) = σ² { 1 − [b₀φ(b₀) − a₀φ(a₀)] / [Φ(b₀) − Φ(a₀)] − ( [φ(b₀) − φ(a₀)] / [Φ(b₀) − Φ(a₀)] )² }.

Fact 21.63 (Lower truncation) In Fact 21.62, let b → ∞, so we only have the truncation a < X. Then, we have

E(X|a < X) = μ + σ φ(a₀) / [1 − Φ(a₀)], and

Var(X|a < X) = σ² { 1 + a₀φ(a₀) / [1 − Φ(a₀)] − ( φ(a₀) / [1 − Φ(a₀)] )² }.

(The latter follows from lim_{b→∞} b₀φ(b₀) = 0.)


Example 21.64 Suppose X ∼ N(0, σ²) and we want to calculate E|X|. This is the same as E(X|X > 0) = 2σφ(0).
Fact 21.65 (Upper truncation) In Fact 21.62, let a → −∞, so we only have the truncation X ≤ b. Then, we have

E(X|X ≤ b) = μ − σ φ(b₀) / Φ(b₀), and

Var(X|X ≤ b) = σ² { 1 − b₀φ(b₀) / Φ(b₀) − ( φ(b₀) / Φ(b₀) )² }.

(The latter follows from lim_{a→−∞} a₀φ(a₀) = 0.)
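
The lower-truncation formulas in Fact 21.63 can be checked numerically. The sketch below is hypothetical Python code (using only NumPy and math.erf to build the normal pdf and cdf; the values of μ, σ and a are arbitrary).

import math
import numpy as np

def npdf(c):
    return math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)

def ncdf(c):
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

mu, sigma, a = 0.0, 1.0, 0.5
a0 = (a - mu) / sigma
lam = npdf(a0) / (1 - ncdf(a0))              # phi(a0) / [1 - Phi(a0)]

mean_trunc = mu + sigma * lam
var_trunc = sigma**2 * (1 + a0 * lam - lam**2)
print(mean_trunc, var_trunc)

# Simulation check: keep only draws above a
rng = np.random.default_rng(seed=4)
x = rng.normal(mu, sigma, size=2_000_000)
x = x[x > a]
print(x.mean(), x.var())
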

Fact 21.66 (Delta method) Consider an estimator β̂_{k×1} which satisfies

√T (β̂ − β₀) →d N(0, Ω),

and suppose we want the asymptotic distribution of a transformation of β

γ_{q×1} = g(β),

where g(·) has continuous first derivatives. The result is

√T [g(β̂) − g(β₀)] →d N(0, Ψ_{q×q}), where

Ψ = [∂g(β₀)/∂β′] Ω [∂g(β₀)/∂β′]′, where ∂g(β₀)/∂β′ is q × k.

Proof. By the mean value theorem we have

g(β̂) = g(β₀) + [∂g(β*)/∂β′](β̂ − β₀),

where

∂g(β)/∂β′ = [ ∂g₁(β)/∂β₁   ⋯   ∂g₁(β)/∂β_k
                   ⋮        ⋱        ⋮
              ∂g_q(β)/∂β₁   ⋯   ∂g_q(β)/∂β_k ]   (a q × k matrix),

and we evaluate it at β*, which is (weakly) between β̂ and β₀. Premultiply by √T and rearrange as

√T [g(β̂) − g(β₀)] = [∂g(β*)/∂β′] √T (β̂ − β₀).

If β̂ is consistent (plim β̂ = β₀) and ∂g(β*)/∂β′ is continuous, then by Slutsky's theorem plim ∂g(β*)/∂β′ = ∂g(β₀)/∂β′, which is a constant. The result then follows from the continuous mapping theorem.
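
As an illustration of Fact 21.66, the hypothetical Python/NumPy sketch below applies the delta method to the scalar transformation g(β) = exp(β) and compares the implied asymptotic variance with a Monte Carlo estimate; the parameter values are invented for the example.

import numpy as np

rng = np.random.default_rng(seed=5)
beta0, omega, T = 0.5, 2.0, 1_000   # true value, asy. variance of sqrt(T)(bhat - b0), sample size

# Delta method: Var of sqrt(T)[g(bhat) - g(b0)] is g'(b0) * omega * g'(b0)
g_prime = np.exp(beta0)
var_delta = g_prime * omega * g_prime

# Monte Carlo: simulate bhat ~ N(beta0, omega/T) and transform
bhat = rng.normal(beta0, np.sqrt(omega / T), size=200_000)
var_mc = np.var(np.sqrt(T) * (np.exp(bhat) - np.exp(beta0)))
print(var_delta, var_mc)
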
21.7.2 The Lognormal Distribution

Fact 21.67 (Univariate lognormal distribution) If x ∼ N(μ, σ²) and y = exp(x), then the probability density function of y, f(y), is

f(y) = (1/(y√(2πσ²))) exp[−½((ln y − μ)/σ)²],  y > 0.

The rth moment of y is E y^r = exp(rμ + r²σ²/2).



Example 21.68 The first two moments are E y = exp(μ + σ²/2) and E y² = exp(2μ + 2σ²). We therefore get Var(y) = exp(2μ + σ²)[exp(σ²) − 1] and Std(y)/E y = √(exp(σ²) − 1).
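
A quick numerical check of Example 21.68 (hypothetical Python/NumPy code with arbitrary μ and σ): simulate x ∼ N(μ, σ²), set y = exp(x), and compare the sample mean and variance with the closed-form expressions.

import numpy as np

mu, sigma = 0.2, 0.5
rng = np.random.default_rng(seed=6)
y = np.exp(rng.normal(mu, sigma, size=2_000_000))

mean_formula = np.exp(mu + sigma**2 / 2)
var_formula = np.exp(2 * mu + sigma**2) * (np.exp(sigma**2) - 1)
print(y.mean(), mean_formula)
print(y.var(), var_formula)
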
Fact 21.69 (Moments of a truncated lognormal distribution) If x ∼ N(μ, σ²) and y = exp(x), then E(y^r|y > a) = E(y^r)Φ(rσ − a₀)/Φ(−a₀), where a₀ = (ln a − μ)/σ. Note that the denominator is Pr(y > a) = Φ(−a₀). In contrast, E(y^r|y ≤ b) = E(y^r)Φ(−rσ + b₀)/Φ(b₀), where b₀ = (ln b − μ)/σ. The denominator is Pr(y ≤ b) = Φ(b₀).
Example 21.70 The first two moments of the truncated (from below) lognormal distribution are E(y|y > a) = exp(μ + σ²/2)Φ(σ − a₀)/Φ(−a₀) and E(y²|y > a) = exp(2μ + 2σ²)Φ(2σ − a₀)/Φ(−a₀).

Example 21.71 The first two moments of the truncated (from above) lognormal distribution are E(y|y ≤ b) = exp(μ + σ²/2)Φ(−σ + b₀)/Φ(b₀) and E(y²|y ≤ b) = exp(2μ + 2σ²)Φ(−2σ + b₀)/Φ(b₀).
Fact 21.72 (Multivariate lognormal distribution) Let the n × 1 vector x have a multivariate normal distribution

x ∼ N(μ, Σ), where μ = [μ₁, ..., μ_n]′ and Σ = [σ₁₁ ⋯ σ₁ₙ; ⋮ ⋱ ⋮; σₙ₁ ⋯ σₙₙ].

Then y = exp(x) has a lognormal distribution, with the means and covariances

E y_i = exp(μ_i + σ_ii/2),
Cov(y_i, y_j) = exp[μ_i + μ_j + (σ_ii + σ_jj)/2][exp(σ_ij) − 1],
Corr(y_i, y_j) = [exp(σ_ij) − 1] / √{[exp(σ_ii) − 1][exp(σ_jj) − 1]}.

Clearly, Var(y_i) = exp(2μ_i + σ_ii)[exp(σ_ii) − 1]. Cov(y_i, y_j) and Corr(y_i, y_j) have the same sign as Corr(x_i, x_j) and are increasing in it. However, Corr(y_i, y_j) is closer to zero.
21.7.3 The Chi-Square Distribution

Fact 21.73 (The χ²_n distribution) If Y ∼ χ²_n, then the pdf of Y is f(y) = [1/(2^{n/2}Γ(n/2))] y^{n/2−1} e^{−y/2}, where Γ(·) is the gamma function. The moment generating function is mgf_Y(t) = (1 − 2t)^{−n/2} for t < 1/2. The first moments of Y are E Y = n and Var(Y) = 2n.
Fact 21.74 (Quadratic forms of normally distributed random variables) If the n × 1 vector X ∼ N(0, Σ), then Y = X′Σ⁻¹X ∼ χ²_n. Therefore, if the n scalar random variables X_i, i = 1, ..., n, are uncorrelated and have the distributions N(0, σ_i²), i = 1, ..., n, then Y = Σ_{i=1}^{n} X_i²/σ_i² ∼ χ²_n.
Fact 21.75 (Distribution of X′AX) If the n × 1 vector X ∼ N(0, I), and A is a symmetric idempotent matrix (A = A′ and A = AA = A′A) of rank r, then Y = X′AX ∼ χ²_r.
Fact 21.76 (Distribution of X′Σ⁺X) If the n × 1 vector X ∼ N(0, Σ), where Σ has rank r ≤ n, then Y = X′Σ⁺X ∼ χ²_r, where Σ⁺ is the pseudo inverse of Σ.

Proof. Σ is symmetric, so it can be decomposed as Σ = CΛC′ where the columns of C are the orthogonal eigenvectors (C′C = I) and Λ is a diagonal matrix with the eigenvalues along the main diagonal. We therefore have Σ = CΛC′ = C₁Λ₁₁C₁′ where C₁ is an n × r matrix associated with the r non-zero eigenvalues (found in the r × r matrix Λ₁₁). The generalized inverse can be shown to be

Σ⁺ = [C₁ C₂] [Λ₁₁⁻¹ 0; 0 0] [C₁ C₂]′ = C₁Λ₁₁⁻¹C₁′.

We can write Σ⁺ = C₁Λ₁₁^{−1/2}Λ₁₁^{−1/2}C₁′. Consider the r × 1 vector Z = Λ₁₁^{−1/2}C₁′X, and note that it has the covariance matrix

E ZZ′ = Λ₁₁^{−1/2}C₁′ E XX′ C₁Λ₁₁^{−1/2} = Λ₁₁^{−1/2}C₁′ C₁Λ₁₁C₁′ C₁Λ₁₁^{−1/2} = I_r,

since C₁′C₁ = I_r. This shows that Z ∼ N(0_{r×1}, I_r), so Z′Z = X′Σ⁺X ∼ χ²_r.
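
Fact 21.76 can be illustrated by simulation. The sketch below is hypothetical Python/NumPy code: it builds a singular covariance matrix of rank r, forms the quadratic form with the pseudo inverse, and compares the sample mean and variance with r and 2r (the moments of a χ²_r variable from Fact 21.73).

import numpy as np

rng = np.random.default_rng(seed=7)
n, r = 4, 2

# Build a rank-r covariance matrix Sigma = B B' with B of size n x r
B = rng.normal(size=(n, r))
Sigma = B @ B.T

# Draw X ~ N(0, Sigma) via X = Z B' with Z ~ N(0, I_r)
Z = rng.normal(size=(500_000, r))
X = Z @ B.T

# Quadratic form with the pseudo inverse
Sigma_plus = np.linalg.pinv(Sigma)
Y = np.einsum("ti,ij,tj->t", X, Sigma_plus, X)

# A chi-square with r degrees of freedom has mean r and variance 2r
print(Y.mean(), Y.var())   # close to 2 and 4 here
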
Fact 21.77 (Convergence to a normal distribution) Let Y ∼ χ²_n and Z = (Y − n)/n^{1/2}. Then Z →d N(0, 2).


Example 21.78 If Y = Σ_{i=1}^{n} X_i²/σ_i², then this transformation means Z = Σ_{i=1}^{n} (X_i²/σ_i² − 1)/n^{1/2}.
Proof. We can directly note from the moments of a χ²_n variable that E Z = (E Y − n)/n^{1/2} = 0, and Var(Z) = Var(Y)/n = 2. From the general properties of moment generating functions, we note that the moment generating function of Z is

mgf_Z(t) = e^{−tn^{1/2}} (1 − 2t/n^{1/2})^{−n/2}, with lim_{n→∞} mgf_Z(t) = exp(t²).

This is the moment generating function of a N(0, 2) distribution, which shows that Z →d N(0, 2). This result should not come as a surprise as we can think of Y as the sum of n variables; dividing by n^{1/2} is then like creating a scaled sample average for which a central limit theorem applies.
21.7.4 The t and F Distributions

Fact 21.79 (The F(n₁, n₂) distribution) If Y₁ ∼ χ²_{n₁} and Y₂ ∼ χ²_{n₂} and Y₁ and Y₂ are independent, then Z = (Y₁/n₁)/(Y₂/n₂) has an F(n₁, n₂) distribution. This distribution has no moment generating function, but E Z = n₂/(n₂ − 2) for n₂ > 2.

[Figure 21.4: χ², F, and t distributions. Panels: (a) pdf of χ²(n) for n = 1, 2, 5, 10; (b) pdf of F(n₁, 10) for n₁ = 2, 5, 10; (c) pdf of F(n₁, 100) for n₁ = 2, 5, 10; (d) pdf of N(0,1) and t(n) for n = 10, 50.]

Fact 21.80 (Convergence of an F(n₁, n₂) distribution) In Fact 21.79, the distribution of n₁Z = Y₁/(Y₂/n₂) converges to a χ²_{n₁} distribution as n₂ → ∞. (The idea is essentially that as n₂ → ∞ the denominator converges to its mean, which is E Y₂/n₂ = 1. Only the numerator is then left, which is a χ²_{n₁} variable.)
Fact 21.81 (The t_n distribution) If X ∼ N(0, 1) and Y ∼ χ²_n and X and Y are independent, then Z = X/(Y/n)^{1/2} has a t_n distribution. The moment generating function does not exist, but E Z = 0 for n > 1 and Var(Z) = n/(n − 2) for n > 2.
Fact 21.82 (Convergence of a t_n distribution) The t_n distribution converges to a N(0, 1) distribution as n → ∞.
Fact 21.83 (t_n versus F(1, n) distribution) If Z ∼ t_n, then Z² ∼ F(1, n).


21.7.5 The Bernoulli and Binomial Distributions

Fact 21.84 (Bernoulli distribution) The random variable X can only take two values, 1 or 0, with probability p and 1 − p respectively. The moment generating function is mgf(t) = pe^t + 1 − p. This gives E(X) = p and Var(X) = p(1 − p).
Example 21.85 (Shifted Bernoulli distribution) Suppose the Bernoulli variable takes the values a or b (instead of 1 and 0) with probability p and 1 − p respectively. Then E(X) = pa + (1 − p)b and Var(X) = p(1 − p)(a − b)².
Fact 21.86 (Binomial distribution) Suppose X₁, X₂, ..., X_n all have Bernoulli distributions with the parameter p. Then, the sum Y = X₁ + X₂ + ... + X_n has a binomial distribution with parameters p and n. The pdf is pdf(y) = {n!/[y!(n − y)!]} p^y (1 − p)^{n−y} for y = 0, 1, ..., n. The moment generating function is mgf(t) = (pe^t + 1 − p)^n. This gives E(Y) = np and Var(Y) = np(1 − p).
Example 21.87 (Shifted Binomial distribution) Suppose the Bernoulli variables X₁, X₂, ..., X_n take the values a or b (instead of 1 and 0) with probability p and 1 − p respectively. Then, the sum Y = X₁ + X₂ + ... + X_n has E(Y) = n[pa + (1 − p)b] and Var(Y) = np(1 − p)(a − b)².
21.7.6 The Skew-Normal Distribution

Fact 21.88 (Skew-normal distribution) Let φ and Φ be the standard normal pdf and cdf respectively. The pdf of a skew-normal distribution with shape parameter α is then

f(z) = 2φ(z)Φ(αz).

If Z has the above pdf and

Y = μ + ωZ with ω > 0,

then Y is said to have a SN(μ, ω², α) distribution (see Azzalini (2005)). Clearly, the pdf of Y is

f(y) = 2φ[(y − μ)/ω] Φ[α(y − μ)/ω] / ω.

The moment generating function is mgf_y(t) = 2 exp(μt + ω²t²/2)Φ(δωt), where δ = α/√(1 + α²). When α > 0 the distribution is positively skewed (and vice versa), and when α = 0 the distribution becomes a normal distribution. When α → ∞, the density function is zero for Y ≤ μ, and 2φ[(y − μ)/ω]/ω otherwise; this is a half-normal distribution.
Example 21.89 The first three moments are as follows. First, notice that E Z = δ√(2/π), Var(Z) = 1 − 2δ²/π and E(Z − E Z)³ = (4/π − 1)√(2/π) δ³. Then we have

E Y = μ + ω E Z,
Var(Y) = ω² Var(Z),
E(Y − E Y)³ = ω³ E(Z − E Z)³.

Notice that with α = 0 (so δ = 0), these moments of Y become μ, ω², and 0 respectively.
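
A sketch of the SN(μ, ω², α) density and moments (hypothetical Python code using NumPy and scipy.stats.norm; the parameter values are arbitrary): the formula moments from Example 21.89 are compared with moments obtained by numerically integrating the density in Fact 21.88.

import numpy as np
from scipy.stats import norm

def sn_pdf(y, mu, omega, alpha):
    # f(y) = 2 * phi((y-mu)/omega) * Phi(alpha*(y-mu)/omega) / omega
    z = (y - mu) / omega
    return 2 * norm.pdf(z) * norm.cdf(alpha * z) / omega

mu, omega, alpha = 0.0, 1.5, 3.0
delta = alpha / np.sqrt(1 + alpha**2)

# Moments from Example 21.89
EZ = delta * np.sqrt(2 / np.pi)
EY = mu + omega * EZ
VarY = omega**2 * (1 - 2 * delta**2 / np.pi)

# Check by numerical integration of the density over a wide grid
y = np.linspace(mu - 10 * omega, mu + 10 * omega, 200_001)
f = sn_pdf(y, mu, omega, alpha)
dy = y[1] - y[0]
print((f * dy).sum())                        # ~ 1 (density integrates to one)
print((y * f * dy).sum(), EY)                # numerical vs formula mean
print(((y - EY)**2 * f * dy).sum(), VarY)    # numerical vs formula variance
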
21.7.7 Generalized Pareto Distribution

Fact 21.90 (Cdf and pdf of the generalized Pareto distribution) The generalized Pareto distribution is described by a scale parameter (β > 0) and a shape parameter (ξ). The cdf (Pr(Z ≤ z), where Z is the random variable and z is a value) is

G(z) = 1 − (1 + ξz/β)^{−1/ξ}   if ξ ≠ 0
     = 1 − exp(−z/β)           if ξ = 0,

for 0 ≤ z and z ≤ −β/ξ in case ξ < 0. The pdf is therefore

g(z) = (1/β)(1 + ξz/β)^{−1/ξ−1}   if ξ ≠ 0
     = (1/β) exp(−z/β)            if ξ = 0.

The mean is defined (finite) if ξ < 1 and is then E(z) = β/(1 − ξ), the median is β(2^ξ − 1)/ξ and the variance is defined if ξ < 1/2 and is then β²/[(1 − ξ)²(1 − 2ξ)].
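
A minimal sketch (hypothetical Python/NumPy code, with arbitrary β and ξ) of the generalized Pareto cdf and pdf in Fact 21.90, together with the mean, median and variance formulas; note that the cdf evaluated at the median returns 0.5 by construction.

import numpy as np

def gpd_cdf(z, beta, xi):
    # G(z) = 1 - (1 + xi*z/beta)^(-1/xi) for xi != 0, 1 - exp(-z/beta) for xi = 0
    if xi == 0.0:
        return 1.0 - np.exp(-z / beta)
    return 1.0 - (1.0 + xi * z / beta) ** (-1.0 / xi)

def gpd_pdf(z, beta, xi):
    if xi == 0.0:
        return np.exp(-z / beta) / beta
    return (1.0 + xi * z / beta) ** (-1.0 / xi - 1.0) / beta

beta, xi = 2.0, 0.25
mean = beta / (1 - xi)                                 # requires xi < 1
median = beta * (2**xi - 1) / xi
variance = beta**2 / ((1 - xi) ** 2 * (1 - 2 * xi))    # requires xi < 1/2
print(mean, median, variance)
print(gpd_cdf(median, beta, xi))                       # 0.5
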

21.8 Inference

Fact 21.91 (Comparing variance-covariance matrices) Let Var(β̂) and Var(β*) be the variance-covariance matrices of two estimators, β̂ and β*, and suppose Var(β̂) − Var(β*) is a positive semi-definite matrix. This means that for any non-zero vector R, R′Var(β̂)R ≥ R′Var(β*)R, so every linear combination of β̂ has a variance that is at least as large as the variance of the same linear combination of β*. In particular, this means that the variance of every element in β̂ (the diagonal elements of Var(β̂)) is at least as large as the variance of the corresponding element of β*.
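
A small sketch of the comparison in Fact 21.91 (hypothetical Python/NumPy code with made-up covariance matrices): check that Var(β̂) − Var(β*) is positive semi-definite by inspecting its eigenvalues, and verify the implication for an arbitrary linear combination R.

import numpy as np

var_hat = np.array([[2.0, 0.5],
                    [0.5, 1.5]])      # Var of the less precise estimator
var_star = np.array([[1.0, 0.2],
                     [0.2, 0.8]])     # Var of the more precise estimator

diff = var_hat - var_star
eigvals = np.linalg.eigvalsh(diff)    # all >= 0  <=>  positive semi-definite
print(eigvals)

# Any linear combination R'beta has a (weakly) larger variance under var_hat
R = np.array([1.0, -2.0])
print(R @ var_hat @ R, R @ var_star @ R)
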


Bibliography

Azzalini, A., 2005, The skew-normal distribution and related multivariate families,
Scandinavian Journal of Statistics, 32, 159–188.
Davidson, J., 2000, Econometric theory, Blackwell Publishers, Oxford.
DeGroot, M. H., 1986, Probability and statistics, Addison-Wesley, Reading, Massachusetts.
Greene, W. H., 2000, Econometric analysis, Prentice-Hall, Upper Saddle River, New
Jersey, 4th edn.
Johnson, N. L., S. Kotz, and N. Balakrishnan, 1994, Continuous univariate distributions,
Wiley, New York, 2nd edn.
Mittelhammer, R. C., 1996, Mathematical statistics for economics and business, Springer-Verlag, New York.
Söderlind, P., 2009, An extended Stein's lemma for asset pricing, Applied Economics
Letters, 16, 1005–1008.

