
Econometrics I

TA Session 6
Jukina HATAKEYAMA∗
May 21, 2024

Contents

1 Review of Some Concepts for a Multivariate Normal Random Variable

2 Multiple Regression Model
  2.1 Derivation of the OLS Estimator
  2.2 Properties of the OLS Estimator

3 Gauss–Markov Theorem for a Multiple Regression Model

4 Asymptotic Normality for the OLS Estimator of a Multiple Regression Model

A The Probability Density Function for a Multivariate Normal Distribution
  A.1 Independent Univariate Normals
  A.2 Affine Transformations of a Random Vector
  A.3 Probability Density Function of a Transformed Random Vector
  A.4 The Multivariate Normal Probability Density Function

B Properties of Conditional Variances

∗ E-mail: u868710a@ecs.osaka-u.ac.jp
1 Review of Some Concepts for a Multivariate Normal
Random Variable
 
Theorem 1.1 (Multivariate Normal Distribution). Let the vector x = (x_1, ..., x_k)′ ∈ R^k be a collection of k random variables, µ their mean vector, and Σ their variance–covariance matrix. The general form of the joint density is given by

    f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)′ Σ^{−1} (x − µ) ).

In the special case where x = (x_1, ..., x_k)′ ∈ R^k and the x_i for i ∈ {1, ..., k} are independent random variables with mean 0 and finite variances σ_i^2 < ∞, we have

    f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/2) x′ Σ^{−1} x ),

where

    Σ = diag(σ_1^2, ..., σ_k^2).
 
The proof is shown in Appendix A.
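As a quick numerical sanity check of the density formula in Theorem 1.1 (an illustrative sketch, not part of the original notes; the helper name mvn_pdf and the particular µ and Σ are arbitrary), one can evaluate the formula directly in Python and compare it with scipy.stats.multivariate_normal:

    import numpy as np
    from scipy.stats import multivariate_normal

    def mvn_pdf(x, mu, Sigma):
        """Evaluate the multivariate normal density of Theorem 1.1 directly."""
        k = len(mu)
        diff = x - mu
        quad = diff @ np.linalg.inv(Sigma) @ diff            # (x - mu)' Sigma^{-1} (x - mu)
        norm_const = (2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** (-0.5)
        return norm_const * np.exp(-0.5 * quad)

    mu = np.array([0.0, 1.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    x = np.array([0.3, 0.8])

    print(mvn_pdf(x, mu, Sigma))
    print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))    # the two values should agree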

In addition to the theorem, we can construct the characteristic function and the moment generating function of this random vector as follows.

Theorem 1.2 (Characteristic Function and Moment Generating Function). For a random variable x : Ω → R^k which follows a multivariate normal distribution with mean µ ∈ R^k and variance–covariance matrix Σ ∈ R^{k×k}, and for a parameter θ ∈ R^k, we can define a function φ_x : R^k → C,

    φ_x(θ) := E[ e^{iθ′x} ] = exp( iθ′µ − (1/2) θ′Σθ ),        (1)

which is called the characteristic function of x. In addition, there exists a function ϕ_x : R^k → R defined as

    ϕ_x(θ) := E[ exp(θ′x) ] = exp( θ′µ + (1/2) θ′Σθ ),        (2)

which is called the moment generating function of x.
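The moment generating function in (2) can be checked by simulation: draw many multivariate normal vectors and average exp(θ′x). A minimal sketch (illustrative only; the particular µ, Σ, and θ are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)

    mu = np.array([1.0, -0.5])
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])
    theta = np.array([0.2, 0.1])

    # Monte Carlo estimate of E[exp(theta' x)]
    x = rng.multivariate_normal(mu, Sigma, size=200_000)
    mc = np.exp(x @ theta).mean()

    # Closed form from (2): exp(theta' mu + 0.5 * theta' Sigma theta)
    closed = np.exp(theta @ mu + 0.5 * theta @ Sigma @ theta)

    print(mc, closed)   # the two values should be close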


 

2 Multiple Regression Model

    y_i = b_1 x_{i,1} + · · · + b_k x_{i,k} + u_i = x_i b + u_i,        (3)

where x_i = (x_{i,1}, ..., x_{i,k}) is a 1 × k vector for i ∈ {1, ..., n} and b = (b_1, ..., b_k)′ is a k × 1 vector. Denoting

    y := (y_1, ..., y_n)′ ∈ R^n,
    u := (u_1, ..., u_n)′ ∈ R^n,

          ( x_1 )   ( x_{1,1} · · · x_{1,k} )
    X :=  (  ⋮  ) = (    ⋮     ⋱      ⋮     ) ∈ R^{n×k},
          ( x_n )   ( x_{n,1} · · · x_{n,k} )

we can write the stacked regression system as follows:

                       ( y_1 )   ( x_{1,1} · · · x_{1,k} ) ( b_1 )   ( u_1 )
    y = Xb + u   ⇐⇒    (  ⋮  ) = (    ⋮     ⋱      ⋮     ) (  ⋮  ) + (  ⋮  ),
                       ( y_n )   ( x_{n,1} · · · x_{n,k} ) ( b_k )   ( u_n )

with y ∈ R^n, X ∈ M_{n×k}(R), b ∈ R^k, and u ∈ R^n.

2.1 Derivation of the OLS Estimator


In this subsection, we derive the OLS estimator, which is defined as follows.
 
Definition 2.1 (Ordinary Least Squares (OLS) Estimator for a Multiple Regression Model). The OLS estimator b̂ ∈ R^k of a multiple regression model is the vector that minimizes the Euclidean distance between y and the subspace of R^n spanned by the columns of X:

    b̂ = arg min_b ∥y − Xb∥_2^2 = arg min_b (y − Xb)′(y − Xb) = arg min_b Σ_{i=1}^n ( y_i − Σ_{l=1}^k b_l X_{i,l} )².

The residual is defined by û_i = y_i − ŷ_i = y_i − Σ_{l=1}^k b̂_l X_{i,l}. Therefore, the above definition can be written as

    b̂ = arg min_b Σ_{i=1}^n û_i²,

i.e., the OLS estimator is the estimator that minimizes the residual sum of squares. The OLS estimator obtained from this definition is given by the following theorem.
 
Theorem 2.1 (Ordinary Least Squares (OLS) Estimator for a Multiple Regression Model). Suppose

H1: the columns X_1, ..., X_k of X are linearly independent.

Then the OLS estimator b̂ exists, is unique, and satisfies

    b̂ = (X′X)^{−1} X′y.        (4)
 

Proof. To obtain the OLS estimator, we check the first and second order conditions for the minimization of the loss function

    S(b) := ∥y − Xb∥_2^2,   i.e.,   b̂ = arg min_b S(b).

The first order condition is

    ∇_b ∥y − Xb̂∥_2^2 = ∇_b ( y − Xb̂ )′( y − Xb̂ ) = −2X′( y − Xb̂ ) = 0.

The OLS estimator b̂ satisfies this equation, and hence

    (X′X) b̂ = X′y.

Under assumption H1, the columns of X = (X_1, ..., X_k) ∈ M_{n×k}(R) are linearly independent, so X′X has full rank and the inverse (X′X)^{−1} exists; therefore we obtain the OLS estimator in the form (4). The second order condition is

    ∇²_{b,b′} ∥y − Xb∥_2^2 = 2X′X > 0.

By assumption H1, X′X is a positive definite matrix, which shows that the loss function S(b) attains its minimum at the OLS estimator b̂.
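As a small numerical illustration of (4) (a sketch, not part of the notes; the simulated design and coefficients are arbitrary), the closed-form estimator can be compared with a standard least-squares routine:

    import numpy as np

    rng = np.random.default_rng(1)

    n, k = 200, 3
    X = rng.normal(size=(n, k))
    b_true = np.array([1.0, -2.0, 0.5])
    u = rng.normal(scale=0.8, size=n)
    y = X @ b_true + u

    # Closed-form OLS estimator from (4): b_hat = (X'X)^{-1} X'y
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Cross-check against numpy's least-squares solver
    b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(b_hat)
    print(b_lstsq)    # should coincide up to numerical precision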
From this theorem, we can confirm that the OLS estimator (4) is a random variable, since we can rewrite it as

    b̂ = b + (X′X)^{−1} X′u.        (5)

Therefore, we can consider the mean and variance of the OLS estimator. First, we compute the mean of the OLS estimator, which will be used to prove that the OLS estimator is unbiased.
 
Proposition 2.1 (Mean of the OLS Estimator). Suppose

H2: E[u_i | X] = 0 for all i ∈ {1, ..., n}.

Then the conditional expectation of the OLS estimator b̂ is

    E[b̂ | X] = b.        (6)

Proof. Calculating the conditional expectation of b̂ yields

    E[b̂ | X] = E[ (X′X)^{−1} X′y | X ]
             = b + E[ (X′X)^{−1} X′u | X ]
             = b + (X′X)^{−1} X′ E[u | X]
             = b,

since E[u | X] = 0 from H2, which proves (6).


Remark 2.1 (Unconditional Expectation of the OLS Estimator). The unconditional expectation of the OLS estimator is the same as the conditional one:

    E[b̂] = b,

which follows from the law of iterated expectations stated below.

Lemma 2.1 (Law of Iterated Expectations). For any two random variables x and y,

    E[y] = E_x[ E[y | x] ],        (7)

where E_x denotes the expectation over the values of x.

The proof is omitted (left as an exercise for students). From this lemma, we have

    E[b̂] = E[ E[b̂ | X] ] = E[b] = b.
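A quick simulation (an illustrative sketch, not from the notes) makes the unbiasedness result concrete: averaging b̂ over many draws of u for a fixed design X should recover b.

    import numpy as np

    rng = np.random.default_rng(2)

    n, k = 100, 2
    X = rng.normal(size=(n, k))          # fixed design
    b = np.array([0.7, -1.3])

    reps = 5_000
    estimates = np.empty((reps, k))
    for r in range(reps):
        u = rng.normal(size=n)           # errors with E[u | X] = 0 (assumption H2)
        y = X @ b + u
        estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

    print(estimates.mean(axis=0))        # close to b = (0.7, -1.3)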

The variance of the OLS estimator, which will be shown to be the minimum variance in the class of linear unbiased estimators, is given as follows.
 
Proposition 2.2 (Variance of the OLS Estimator). Suppose H1–H2 hold and assume

H3: V[u_i | X] = σ² for all i ∈ {1, ..., n};

H4: E[u_i u_j | X] = 0 for all i ≠ j with i, j ∈ {1, ..., n}.

Then the conditional variance of the OLS estimator b̂ is

    V[b̂ | X] = σ² (X′X)^{−1},        (8)

and the unconditional variance is

    V[b̂] = σ² E[ (X′X)^{−1} ].        (9)

Proof. From (5) and (6),

    b̂ − E[b̂ | X] = b̂ − b = (X′X)^{−1} X′u.

Therefore,

    V[b̂ | X] = E[ (b̂ − E[b̂ | X])(b̂ − E[b̂ | X])′ | X ]
             = E[ (X′X)^{−1} X′ uu′ X (X′X)^{−1} | X ]
             = (X′X)^{−1} X′ E[uu′ | X] X (X′X)^{−1}
             = (X′X)^{−1} X′ (σ² I_n) X (X′X)^{−1}
             = σ² (X′X)^{−1},

where E[uu′ | X] = σ² I_n follows from H3 and H4. This proves (8). Thus,

    V[b̂] = E[ V[b̂ | X] ] + V[ E[b̂ | X] ]
          = E[ σ² (X′X)^{−1} ] + V[b]
          = σ² E[ (X′X)^{−1} ],

since V[b] = 0, which proves (9). See Appendix B for the proof of the first equality.
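The conditional variance formula (8) can likewise be checked by simulation for a fixed design matrix (an illustrative sketch; the chosen n, k, and σ are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)

    n, k, sigma = 150, 2, 1.5
    X = rng.normal(size=(n, k))               # held fixed across replications
    b = np.array([1.0, 2.0])

    reps = 20_000
    estimates = np.empty((reps, k))
    for r in range(reps):
        u = rng.normal(scale=sigma, size=n)   # homoskedastic, uncorrelated errors (H3, H4)
        estimates[r] = np.linalg.solve(X.T @ X, X.T @ (X @ b + u))

    empirical_cov = np.cov(estimates, rowvar=False)
    theoretical_cov = sigma**2 * np.linalg.inv(X.T @ X)   # formula (8)

    print(empirical_cov)
    print(theoretical_cov)                    # the two matrices should be close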

2.2 Properties of the OLS Estimator
Here we exhibit some properties of the OLS estimator.
 
Theorem 2.2 (Properties of the OLS Estimator). The OLS estimator obtained above has the following properties.

(i) Unbiasedness. Under assumption H2, the OLS estimator b̂ is unbiased:

        E[b̂] = b.        (10)

(ii) Consistency. Under the assumptions

    H5: X′X is positive definite;

    H6: for all i and all k, l, the moments E[|X_{i,k} X_{i,l}|] exist and E[X_i′X_i] is positive definite,

    as well as H1–H4, the OLS estimator b̂ = (X′X)^{−1} X′y satisfies

        b̂ −→p b as n → ∞, or plim_{n→∞} b̂ = b.        (11)

(iii) Efficiency. Under assumptions H1–H4, the variance of the OLS estimator is the minimum in the class of linear unbiased estimators.
 

Proof. We can derive these properties via calculations similar to those for the simple regression model.

(i) Unbiasedness. This property was shown above (Remark 2.1).

(ii) Consistency. From (5), and by the WLLN and the CMT, we have

        b̂ = b + (X′X)^{−1} X′u
          = b + ( (1/n) X′X )^{−1} ( (1/n) X′u )        (12)
          = b + ( (1/n) Σ_{i=1}^n X_i′X_i )^{−1} ( (1/n) Σ_{i=1}^n X_i′u_i )        (13)
          −→p b + ( E[X_i′X_i] )^{−1} E[X_i′u_i]   as n → ∞.        (14)

    Here we apply the convergence of the product of random variables in probability, which is discussed below. From the weak law of large numbers (WLLN),

        (1/n) Σ_{i=1}^n X_i′X_i −→p E[X_i′X_i] < ∞;        (15)

        (1/n) Σ_{i=1}^n X_i′u_i −→p E[X_i′u_i] = 0 ∈ R^k,        (16)

    where E[X_i′u_i] = 0 holds from the orthogonality condition between X and u. In addition,

        ( (1/n) Σ_{i=1}^n X_i′X_i )^{−1} −→p ( E[X_i′X_i] )^{−1}        (17)

    holds by the continuous mapping theorem stated below. Thus, substituting (15)–(17) into (14) results in

        b̂ −→p b + ( E[X_i′X_i] )^{−1} · 0 = b,

    which indicates that b̂ −→p b.

(iii) Efficiency. The efficiency of the OLS estimator follows from the Gauss–Markov theorem for a multiple regression model given in Section 3.

The convergence of products of random variables in probability and the continuous mapping theorem are given as follows.

Lemma 2.2 (Convergence of the Product of Random Variables in Probability). Suppose a sequence of random vectors X_n converges in probability to X and a sequence y_n converges in probability to y. Then the product X_n y_n also converges in probability to the product of the two probability limits:

    X_n y_n −→p X y   as n → ∞.

In another notation,

    plim_{n→∞} X_n y_n = ( plim_{n→∞} X_n )( plim_{n→∞} y_n ).

Lemma 2.3 (Continuous Mapping Theorem). Suppose a sequence of random vectors x_n ∈ S converges in probability to x. Then, for any continuous mapping g : S → R^l, the following relation holds:

    plim_{n→∞} g(x_n) = g( plim_{n→∞} x_n ) = g(x).
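To illustrate the consistency result of Theorem 2.2(ii) numerically, the following sketch (illustrative only, not part of the original notes) computes b̂ for increasing sample sizes and shows it approaching b:

    import numpy as np

    rng = np.random.default_rng(4)
    b = np.array([2.0, -1.0])

    for n in (50, 500, 5_000, 50_000):
        X = rng.normal(size=(n, len(b)))
        u = rng.normal(size=n)
        y = X @ b + u
        b_hat = np.linalg.solve(X.T @ X, X.T @ y)
        print(n, b_hat)       # the estimates approach b = (2.0, -1.0) as n grows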
 

3 Gauss–Markov Theorem for a Multiple Regression
Model
Here we obtain a general result for the class of linear unbiased estimators of b; the proof proceeds by a direct comparison of variances.

Theorem 3.1 (Gauss–Markov Theorem for a Multiple Regression Model). Under assumptions H1–H4, the OLS estimator b̂ of the multiple regression model

    y_i = X_i b + u_i,        (18)

for all i ∈ {1, ..., n}, has minimum variance in the class of linear unbiased estimators.
 

Proof. Consider another linear unbiased estimator of b, say b̃. Since b̃ is linear in y, there exists a matrix A ∈ R^{k×n} such that b̃ = Ay. Since b̃ is an unbiased estimator,

    E[b̃] = b        (19)

holds, which yields (using H2)

    E[ A(Xb + u) ] = b  ⟺  AXb = b.        (20)

Since this must hold for every b, AX = I_k must hold. Moreover, from the equation

    b̃ − E[b̃] = A(y − Xb) = Au,        (21)

the variance V[b̃] becomes

    V[b̃] = V[Au] = A V[u] A′ = A (σ² I_n) A′ = σ² AA′,        (22)

using the assumption V[u] = σ² I_n. Using the projection matrix

    M_X := I_n − X(X′X)^{−1}X′  ⟺  I_n = M_X + X(X′X)^{−1}X′,        (23)

we can rewrite (22) as follows:

    V[b̃] = σ² A ( X(X′X)^{−1}X′ + M_X ) A′
          = σ² ( AX(X′X)^{−1}X′A′ + A M_X A′ ).

Substituting AX = I_k (and hence X′A′ = (AX)′ = I_k) and V[b̂] = σ²(X′X)^{−1} into the above equation results in

    V[b̃] = V[b̂] + σ² A M_X A′  ⟺  V[b̃] − V[b̂] = σ² A M_X A′.

Hence, the difference of the i-th diagonal elements of the variance–covariance matrices is

    V[b̃]_{ii} − V[b̂]_{ii} = σ² a_i′ M_X a_i ≥ 0

for i ∈ {1, ..., k}, where a_i′ denotes the i-th row of A. The inequality follows because M_X is symmetric and idempotent, so a_i′ M_X a_i = ∥M_X a_i∥² ≥ 0, which proves the theorem.
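To see the Gauss–Markov inequality numerically, the sketch below (illustrative; the weighted estimator is just one arbitrary alternative linear unbiased estimator satisfying AX = I_k) compares the conditional variance of OLS with that of a weighted least squares rule under homoskedastic errors:

    import numpy as np

    rng = np.random.default_rng(5)

    n, k, sigma = 200, 3, 1.0
    X = rng.normal(size=(n, k))

    # OLS: A_ols = (X'X)^{-1} X'
    A_ols = np.linalg.solve(X.T @ X, X.T)

    # An alternative linear unbiased estimator: A_alt = (X'WX)^{-1} X'W,
    # with W a positive diagonal weight matrix, so that A_alt X = I_k still holds.
    W = np.diag(rng.uniform(0.5, 2.0, size=n))
    A_alt = np.linalg.solve(X.T @ W @ X, X.T @ W)

    # Conditional variances sigma^2 * A A' as in (22)
    V_ols = sigma**2 * A_ols @ A_ols.T
    V_alt = sigma**2 * A_alt @ A_alt.T

    print(np.diag(V_alt - V_ols))   # each entry should be non-negative (Gauss-Markov)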

4 Asymptotic Normality for the OLS Estimator of a
Multiple Regression Model
In this section, we derive the asymptotic distribution of the OLS estimator to observe how its distribution behaves as n → ∞.
 

Theorem 4.1 (Asymptotic Normality of the OLS Estimator). Let b̂ be the OLS estimator obtained under assumptions H1–H6. Then the OLS estimator is asymptotically normally distributed:

    √n (b̂ − b) −→d N_{R^k}( 0, σ² ( E[X_i′X_i] )^{−1} )   as n → ∞.
 

Proof. From (5), we have

    b̂ = b + (X′X)^{−1} X′u
      = b + ( (1/n) Σ_{i=1}^n X_i′X_i )^{−1} ( (1/n) Σ_{i=1}^n X_i′u_i ).

Therefore,

    √n (b̂ − b) = ( (1/n) Σ_{i=1}^n X_i′X_i )^{−1} ( (1/√n) Σ_{i=1}^n X_i′u_i ).        (24)

From the Lindeberg–Feller central limit theorem (Lindeberg–Feller CLT) as well as the weak law of large numbers (WLLN) and the continuous mapping theorem, we have

    ( (1/n) Σ_{i=1}^n X_i′X_i )^{−1} −→p ( E[X_i′X_i] )^{−1};

    (1/√n) Σ_{i=1}^n X_i′u_i = √n ( (1/n) Σ_{i=1}^n X_i′u_i − 0 ) −→d N_{R^k}( 0, V[X_i′u_i] ),

since, from the orthogonality condition,

    E[ (1/n) Σ_{i=1}^n X_i′u_i ] = (1/n) Σ_{i=1}^n E[X_i′u_i] = 0.

Then,

    V[X_i′u_i] = E[ V[X_i′u_i | X_i] ] + V[ E[X_i′u_i | X_i] ]
               = E[ X_i′ V[u_i | X_i] X_i ]
               = E[ X_i′ σ² X_i ]
               = σ² E[X_i′X_i] < ∞,

where V[ E[X_i′u_i | X_i] ] = 0. Therefore, from (24) and Slutsky's theorem,

    √n (b̂ − b) −→d ( E[X_i′X_i] )^{−1} z,

where

    z ∼ N_{R^k}( 0, σ² E[X_i′X_i] ).

From the relation

    z ∼ N_{R^k}( 0, σ² E[X_i′X_i] )  ⟹  ( E[X_i′X_i] )^{−1} z ∼ N_{R^k}( 0, σ² ( E[X_i′X_i] )^{−1} ),

we obtain

    √n (b̂ − b) −→d N_{R^k}( 0, σ² ( E[X_i′X_i] )^{−1} )   as n → ∞.

Here we review Slutsky's theorem.

Lemma 4.1 (Slutsky's Theorem). Suppose a sequence of random matrices X_n converges in probability to a constant matrix C, and a sequence of random vectors y_n converges in distribution to y. Then the product X_n y_n converges in distribution as follows:

    X_n y_n −→d C y   as n → ∞.        (25)
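As a numerical illustration of Theorem 4.1 (an illustrative sketch, not part of the notes; standard normal regressors are chosen so that E[X_i′X_i] = I), one can simulate many samples, compute √n(b̂ − b) for each, and compare its empirical covariance with σ²(E[X_i′X_i])^{−1}:

    import numpy as np

    rng = np.random.default_rng(6)

    n, sigma = 500, 1.0
    b = np.array([1.0, -0.5])

    reps = 10_000
    stats = np.empty((reps, 2))
    for r in range(reps):
        X = rng.normal(size=(n, 2))          # E[X_i' X_i] = I_2 for standard normal regressors
        u = rng.normal(scale=sigma, size=n)
        y = X @ b + u
        b_hat = np.linalg.solve(X.T @ X, X.T @ y)
        stats[r] = np.sqrt(n) * (b_hat - b)

    print(np.cov(stats, rowvar=False))       # empirical covariance of sqrt(n)(b_hat - b)
    print(sigma**2 * np.eye(2))              # theoretical sigma^2 * (E[X_i'X_i])^{-1} = I_2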
 

Appendix
A The Probability Density Function for a Multivariate
Normal Distribution
A.1 Independent Univariate Normals
To derive the general form of the probability density function for a multivariate normal distribution, we start with a vector consisting of k independent, normally distributed random variables with mean 0: x = (x_1, ..., x_k)′, where

    x_i ∼ N_R(0, σ_i^2).

Let f_{x_i} denote the probability density function of the single normal random variable x_i for i ∈ {1, ..., k}. Then, since the variables are independent, the joint probability density function f_x of all k variables is just the product of their densities:

    f_x = Π_{i=1}^k f_{x_i}
        = Π_{i=1}^k (2πσ_i^2)^{−1/2} exp( −x_i^2 / (2σ_i^2) )
        = ( (2π)^k Π_{i=1}^k σ_i^2 )^{−1/2} exp( −(1/2) x′ diag(σ_1^2, ..., σ_k^2)^{−1} x )
        = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/2) x′ Σ^{−1} x ),

where Σ = diag(σ_1^2, ..., σ_k^2). In this case we say that x ∼ N_{R^k}(0, Σ). Unfortunately, this derivation is restricted to the case where the entries are independent and centered at 0. Next, we will see that the general case can be derived from this result.
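A small check (an illustrative sketch; the chosen σ_i and evaluation point are arbitrary) that the product of the univariate densities equals the matrix form with Σ = diag(σ_1^2, ..., σ_k^2):

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    sigmas = np.array([0.5, 1.0, 2.0])        # standard deviations sigma_i
    x = np.array([0.2, -1.0, 0.7])

    # Product of univariate normal densities with mean 0 and variance sigma_i^2
    product_form = np.prod(norm.pdf(x, loc=0.0, scale=sigmas))

    # Matrix form with Sigma = diag(sigma_1^2, ..., sigma_k^2)
    Sigma = np.diag(sigmas**2)
    matrix_form = multivariate_normal.pdf(x, mean=np.zeros(3), cov=Sigma)

    print(product_form, matrix_form)          # should coincide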

A.2 Affine Transformations of a Random Vector
Consider an affine transformation L : R^k → R^k, L(x) = Ax + b, for an invertible matrix A ∈ R^{k×k} and a constant vector b ∈ R^k. It is easy to verify that when we apply this transformation to a random vector z = (z_1, ..., z_k)′ with mean µ_z ∈ R^k and variance–covariance matrix Σ_z ∈ R^{k×k}, we get a new random vector x = L(z) such that

    E[x] = E[L(z)] = L( E[z] ) = Aµ_z + b;
    V[x] = E[ (x − E[x])(x − E[x])′ ] = A Σ_z A′.

In this case, for a symmetric, positive definite matrix Σ and a constant vector µ, we will be looking at the transformation x = Σ^{1/2} z + µ. It is interesting to note that, given an orthogonal decomposition Σ = UΛU′, where U is orthogonal and Λ is the diagonal matrix of eigenvalues of Σ, each entry of the new random vector is a weighted sum of the originally independent random variables in z. Taking the (non-symmetric) square root Σ^{1/2} = UΛ^{1/2}, so that Σ^{1/2}(Σ^{1/2})′ = UΛU′ = Σ, and letting U_{ij} denote the (i, j) entry of U,

    x_i = ( Σ^{1/2} z + µ )_i = Σ_{j=1}^k √λ_j U_{ij} z_j + µ_i.
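A quick simulation (an illustrative sketch; the particular Σ and µ are arbitrary) confirms the mean and variance formulas for this transformation:

    import numpy as np

    rng = np.random.default_rng(8)

    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])

    # Symmetric square root of Sigma via its eigendecomposition Sigma = U diag(lam) U'
    lam, U = np.linalg.eigh(Sigma)
    Sigma_half = U @ np.diag(np.sqrt(lam)) @ U.T

    # z has independent standard normal entries; apply x = Sigma^{1/2} z + mu
    z = rng.normal(size=(500_000, 2))
    x = z @ Sigma_half.T + mu

    print(x.mean(axis=0))                 # close to mu
    print(np.cov(x, rowvar=False))        # close to Sigma (= A Sigma_z A' with Sigma_z = I)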

We now just need one more fact about a change of variables to derive the general multivariate
normal probability density function for this new random vector.

A.3 Probability Density Function of a Transformed Random Vector
Suppose that z is a random vector taking values in a subset S ⊆ R^k, with a continuous probability density function f. Suppose x = r(z), where r is a differentiable one-to-one function from S onto some other subset T ⊆ R^k. Then the probability density function g of x is given by

    g(x) = f(z) |det( dz/dx )| = f( r^{−1}(x) ) |det( dz/dx )|,

where dz/dx stands for the Jacobian of the inverse of r, and det(·) denotes the determinant of a matrix.

Returning to our previous discussion, where x = Σ^{1/2} z + µ, we can see that the inverse transformation is given by z = Σ^{−1/2}(x − µ). Thus, the determinant of the Jacobian of this inverse becomes det(Σ^{−1/2}) = 1/√(det(Σ)). (You should check this equality using some properties of the determinant.)
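This determinant identity can be verified numerically (an illustrative sketch using the symmetric square root of an arbitrary Σ):

    import numpy as np

    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])

    # Symmetric square root via the eigendecomposition Sigma = U diag(lam) U'
    lam, U = np.linalg.eigh(Sigma)
    Sigma_inv_half = U @ np.diag(lam**-0.5) @ U.T      # Sigma^{-1/2}

    print(np.linalg.det(Sigma_inv_half))               # det(Sigma^{-1/2})
    print(1 / np.sqrt(np.linalg.det(Sigma)))           # 1 / sqrt(det(Sigma)); should match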

A.4 The Multivariate Normal Probability Density Function


Consider the random vector z ∼ N_{R^k}(0, I), where I is the identity matrix. As before, we let x := Σ^{1/2} z + µ for a positive definite Σ and a constant vector µ. We can now find the density function g of x from the known density function f of z:

    g(x) = f( r^{−1}(x) ) |det( dz/dx )|
         = f( Σ^{−1/2}(x − µ) ) · 1/√(det(Σ))
         = (2π)^{−k/2} (det(Σ))^{−1/2} exp( −(1/2) ( Σ^{−1/2}(x − µ) )′ ( Σ^{−1/2}(x − µ) ) )
         = (2π)^{−k/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)′ Σ^{−1} (x − µ) ).

This is the probability density function for a multivariate normal distribution with mean vector µ and covariance matrix Σ. We say that x ∼ N_{R^k}(µ, Σ).

B Properties of Conditional Variances


 
Theorem B.1 (Properties of Conditional Variances).

    V[y] = E[ V[y | X] ] + V[ E[y | X] ].        (26)

Proof. We can derive this relation directly as follows:

    V[y] ≡ E[ (y − E[y])² ]
         = E[ ( (y − E[y | X]) + (E[y | X] − E[y]) )² ]
         = E[ (y − E[y | X])² ] + E[ (E[y | X] − E[y])² ] + 2 E[ (y − E[y | X])(E[y | X] − E[y]) ].

Here we have the following calculation:

    E[ y − E[y | X] | X ] = E[y | X] − E[ E[y | X] | X ]
                          = E[y | X] − E[y | X]
                          = 0.

Thus,

    E[ (y − E[y | X])(E[y | X] − E[y]) ] = E[ E[ (y − E[y | X])(E[y | X] − E[y]) | X ] ]
                                         = E[ (E[y | X] − E[y]) E[ y − E[y | X] | X ] ]
                                         = 0.

Note that E[c | X] = c for any constant c and that E[y | X] is a random variable, i.e., a function of X. Therefore, by using the law of iterated expectations,

    V[y] = E[ (y − E[y | X])² ] + E[ (E[y | X] − E[y])² ]
         = E[ E[ (y − E[y | X])² | X ] ] + E[ ( E[y | X] − E[ E[y | X] ] )² ]
         = E[ V[y | X] ] + V[ E[y | X] ],

which proves (26).
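A short simulation (an illustrative sketch, with an arbitrarily chosen model y = 2x + ε, ε ∼ N(0, 0.25)) checks the decomposition (26):

    import numpy as np

    rng = np.random.default_rng(7)

    N = 1_000_000
    x = rng.normal(size=N)                         # conditioning variable
    y = 2.0 * x + rng.normal(scale=0.5, size=N)    # so that V[y | x] = 0.25 and E[y | x] = 2x

    total_var = y.var()

    # For this model: E[V[y | x]] = 0.25 and V[E[y | x]] = V[2x] = 4 * V[x]
    decomposition = 0.25 + np.var(2.0 * x)

    print(total_var, decomposition)                # both close to 4.25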

 
Lemma B.1. In general, we have the following equation:

    E[ g(X)( y − E[y | X] ) ] = 0.

Proof. Using the fact that E[ y − E[y | X] | X ] = 0, we have

    E[ g(X)( y − E[y | X] ) ] = E[ E[ g(X)( y − E[y | X] ) | X ] ]
                              = E[ g(X) E[ y − E[y | X] | X ] ]
                              = 0.

 

