Stats: Basic Proofs


Econometrics I Academic Year 2020-2021

APPENDIX: easy proofs + summary table

Here you can find proofs of several properties of random variables that are used frequently in the course. This is important course material. Most of the proofs are taken from the reference book by Stock & Watson. For a summary of concepts and notation, see the table on the last page. Throughout, a label above an equals sign, e.g. $\overset{(A)}{=}$, indicates the definition or property that justifies that step.

Let X be a (discrete) random variable that takes values $\{x_1, x_2, \dots, x_k\}$ with probabilities $\{P(X=x_1), P(X=x_2), \dots, P(X=x_k)\}$ such that, by definition, $P(X=x_1) + P(X=x_2) + \dots + P(X=x_k) = \sum_{i=1}^{k} P(X=x_i) = 1$. Then

(A) $E(X) = \sum_{i=1}^{k} x_i\, P(X=x_i) \equiv \mu_X$   Mean or expected value


(B) $var(X) = E[(X-\mu_X)^2] \overset{(A)}{=} \sum_{i=1}^{k} (x_i-\mu_X)^2\, P(X=x_i) \equiv \sigma_X^2$   Variance

(C) $sd(X) = \sqrt{\sigma_X^2} \equiv \sigma_X$   Standard deviation
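As a quick numerical illustration (not part of the original handout), definitions (A)-(C) can be checked in a few lines of Python; the pmf below is a made-up example and numpy is assumed to be available.

```python
import numpy as np

# A made-up discrete pmf: values x_i with probabilities P(X = x_i)
x = np.array([1.0, 2.0, 4.0])
p = np.array([0.2, 0.5, 0.3])
assert np.isclose(p.sum(), 1.0)        # probabilities sum to 1 by definition

mu_X = np.sum(x * p)                   # (A) E(X) = sum of x_i * P(X = x_i)
var_X = np.sum((x - mu_X) ** 2 * p)    # (B) var(X) = sum of (x_i - mu_X)^2 * P(X = x_i)
sd_X = np.sqrt(var_X)                  # (C) sd(X) = sqrt(var(X))
print(mu_X, var_X, sd_X)               # 2.4  1.24  1.113...
```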

Let a, b & c be constants (b will be used later), then


(1) $E(aX + c) \overset{(A)}{=} \sum_{i=1}^{k} (a x_i + c)\, P(X=x_i)$
$= \sum_{i=1}^{k} a x_i\, P(X=x_i) + \sum_{i=1}^{k} c\, P(X=x_i)$
$= a \sum_{i=1}^{k} x_i\, P(X=x_i) + c \sum_{i=1}^{k} P(X=x_i)$
$\overset{(A)}{=} a E(X) + c \equiv a\mu_X + c$

(2) $var(X) \overset{(B)}{=} E[(X-\mu_X)^2]$
$= E[X^2 + \mu_X^2 - 2\mu_X X]$
$\overset{(1)}{=} E[X^2] + \mu_X^2 - 2\mu_X E[X]$
$= E[X^2] - \mu_X^2$
(3) $var(aX + c) \overset{(B),(1)}{=} E\{[(aX + c) - (a\mu_X + c)]^2\}$
$= E\{[a(X-\mu_X)]^2\}$
$\overset{(1)}{=} a^2\, E[(X-\mu_X)^2]$
$\overset{(B)}{=} a^2\, var(X) \equiv a^2 \sigma_X^2$
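Properties (1)-(3) can be verified numerically with the same kind of made-up pmf (a sketch, not part of the handout); a and c are arbitrary constants.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
p = np.array([0.2, 0.5, 0.3])
a, c = 3.0, -1.0                        # arbitrary constants

mu_X = np.sum(x * p)
var_X = np.sum((x - mu_X) ** 2 * p)

# (1): E(aX + c) computed from the transformed values equals a*mu_X + c
mean_t = np.sum((a * x + c) * p)
assert np.isclose(mean_t, a * mu_X + c)

# (3): var(aX + c) = a^2 var(X); the additive constant c drops out
var_t = np.sum((a * x + c - mean_t) ** 2 * p)
assert np.isclose(var_t, a ** 2 * var_X)
```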

Let Y be another (discrete) random variable that takes values $\{y_1, y_2, \dots, y_m\}$. There is a joint distribution of X & Y with probabilities

$\{P(x_1,y_1), P(x_1,y_2), \dots, P(x_1,y_j), \dots, P(x_1,y_m);$
$\;P(x_2,y_1), P(x_2,y_2), \dots, P(x_2,y_j), \dots, P(x_2,y_m);$
$\;\dots;$
$\;P(x_i,y_1), P(x_i,y_2), \dots, P(x_i,y_j), \dots, P(x_i,y_m);$
$\;\dots;$
$\;P(x_k,y_1), P(x_k,y_2), \dots, P(x_k,y_j), \dots, P(x_k,y_m)\}$

where, in general, $P(x_i, y_j)$ is short notation for $P(X=x_i \cap Y=y_j)$, and the sum of the probabilities over all combinations of X & Y equals 1.

Also, we define the individual probability distributions of X & Y as

(D) $P(X=x_i) = \sum_{j=1}^{m} P(X=x_i \cap Y=y_j)$  and  $P(Y=y_j) = \sum_{i=1}^{k} P(X=x_i \cap Y=y_j)$,

in other words, as marginal probabilities. Then we already know that


(A') $E(Y) = \sum_{j=1}^{m} y_j\, P(Y=y_j) \equiv \mu_Y$   as for X!

(B') $var(Y) = E[(Y-\mu_Y)^2] = E[Y^2] - \mu_Y^2 \equiv \sigma_Y^2$   as for X!

(C') $sd(Y) = \sqrt{\sigma_Y^2} \equiv \sigma_Y$   as for X!


(4) $E(aX + bY + c) \overset{(A)}{=} \sum_{i=1}^{k} \sum_{j=1}^{m} (a x_i + b y_j + c)\, P(X=x_i \cap Y=y_j)$
$= \sum_i \sum_j a x_i\, P(x_i, y_j) + \sum_i \sum_j b y_j\, P(x_i, y_j) + \sum_i \sum_j c\, P(x_i, y_j)$
$= a \sum_{i=1}^{k} x_i \sum_{j=1}^{m} P(x_i, y_j) + b \sum_{j=1}^{m} y_j \sum_{i=1}^{k} P(x_i, y_j) + c$
$\overset{(D)}{=} a \sum_{i=1}^{k} x_i\, P(X=x_i) + b \sum_{j=1}^{m} y_j\, P(Y=y_j) + c$
$\overset{(A)}{=} a E[X] + b E[Y] + c \equiv a\mu_X + b\mu_Y + c$

and, in general, if instead of two random variables X & Y we have random variables $\{X_1, X_2, \dots, X_s, \dots, X_S\}$ and constants $\{a_1, a_2, \dots, a_s, \dots, a_S, c\}$

(4) $E(c + \sum_{s=1}^{S} a_s X_s) = c + \sum_{s=1}^{S} a_s E(X_s) \equiv c + \sum_{s=1}^{S} a_s \mu_{X_s}$
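A sketch of (4) on a made-up joint pmf (again illustrative only, with numpy assumed): rows index the values of X, columns the values of Y, and the marginals (D) are row and column sums.

```python
import numpy as np

# Made-up joint pmf P(X = x_i, Y = y_j); rows are x-values, columns y-values
x = np.array([0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
assert np.isclose(P.sum(), 1.0)        # probabilities over all (i, j) sum to 1

p_x = P.sum(axis=1)                    # (D) marginal distribution of X
p_y = P.sum(axis=0)                    # (D) marginal distribution of Y
mu_X, mu_Y = np.sum(x * p_x), np.sum(y * p_y)

a, b, c = 2.0, -3.0, 5.0
# E(aX + bY + c) summed directly over the joint distribution
lhs = sum((a * xi + b * yj + c) * P[i, j]
          for i, xi in enumerate(x) for j, yj in enumerate(y))
assert np.isclose(lhs, a * mu_X + b * mu_Y + c)   # property (4)
```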

Let's define
(E) $cov(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] \equiv \sigma_{XY}$
then
(5) $cov(X,Y) \overset{(E)}{=} E[(X-\mu_X)(Y-\mu_Y)]$
$= E(XY - X\mu_Y - Y\mu_X + \mu_X\mu_Y)$
$\overset{(4)}{=} E(XY) - \mu_Y\mu_X - \mu_X\mu_Y + \mu_X\mu_Y$
$= E(XY) - \mu_X\mu_Y$

(6) $cov(X,X) \overset{(E)}{=} E[(X-\mu_X)(X-\mu_X)] \overset{(B)}{=} var(X) \equiv \sigma_X^2$

(7) $cov(aX, c+bY) \overset{(E),(4)}{=} E[(aX - a\mu_X)(c + bY - c - b\mu_Y)]$   or use (5)!
$= E[ab(X-\mu_X)(Y-\mu_Y)]$
$\overset{(4)}{=} ab\, E[(X-\mu_X)(Y-\mu_Y)]$
$\overset{(E)}{=} ab\, cov(X,Y) \equiv ab\, \sigma_{XY}$
(8) $cov(aX + bY + c, Z) \overset{(7)}{=} cov(aX + bY, Z)$
$\overset{(E),(4)}{=} E[\{(aX+bY) - (a\mu_X+b\mu_Y)\}\{Z-\mu_Z\}]$   or use (5)!
$= E[\{a(X-\mu_X) + b(Y-\mu_Y)\}\{Z-\mu_Z\}]$
$\overset{(4)}{=} a\, E[(X-\mu_X)(Z-\mu_Z)] + b\, E[(Y-\mu_Y)(Z-\mu_Z)]$
$\overset{(E)}{=} a\, cov(X,Z) + b\, cov(Y,Z) \equiv a\,\sigma_{XZ} + b\,\sigma_{YZ}$

and, in general, if we have random variables $\{X_1, X_2, \dots, X_s, \dots, X_S\}$ & $\{Y_1, Y_2, \dots, Y_t, \dots, Y_T\}$ and constants $\{a_1, a_2, \dots, a_S;\; b_1, b_2, \dots, b_T;\; c\}$

(8) $cov(\sum_s a_s X_s,\; c + \sum_t b_t Y_t) = \sum_s \sum_t a_s b_t\, cov(X_s, Y_t) \equiv \sum_s \sum_t a_s b_t\, \sigma_{X_s Y_t}$
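The covariance properties (5)-(7) can be checked on the same made-up joint pmf (a sketch, not part of the handout):

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
p_x, p_y = P.sum(axis=1), P.sum(axis=0)
mu_X, mu_Y = np.sum(x * p_x), np.sum(y * p_y)

# (E): cov(X, Y) = E[(X - mu_X)(Y - mu_Y)] over the joint pmf
cov_XY = sum((xi - mu_X) * (yj - mu_Y) * P[i, j]
             for i, xi in enumerate(x) for j, yj in enumerate(y))

# (5): the shortcut E(XY) - mu_X * mu_Y gives the same number
E_XY = sum(xi * yj * P[i, j]
           for i, xi in enumerate(x) for j, yj in enumerate(y))
assert np.isclose(cov_XY, E_XY - mu_X * mu_Y)

# (7): cov(aX, c + bY) = a * b * cov(X, Y)
a, b, c = 2.0, -3.0, 5.0
cov_t = sum((a * xi - a * mu_X) * ((c + b * yj) - (c + b * mu_Y)) * P[i, j]
            for i, xi in enumerate(x) for j, yj in enumerate(y))
assert np.isclose(cov_t, a * b * cov_XY)
```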

(9) $var(aX + bY + c) \overset{(3)}{=} var(aX + bY)$
$\overset{(B),(4)}{=} E[\{(aX+bY) - (a\mu_X+b\mu_Y)\}^2]$   or use (2)!
$= E[\{a(X-\mu_X) + b(Y-\mu_Y)\}^2]$
$= E[a^2(X-\mu_X)^2 + b^2(Y-\mu_Y)^2 + 2ab(X-\mu_X)(Y-\mu_Y)]$
$\overset{(4)}{=} a^2\, E[(X-\mu_X)^2] + b^2\, E[(Y-\mu_Y)^2] + 2ab\, E[(X-\mu_X)(Y-\mu_Y)]$
$\overset{(B),(E)}{=} a^2\, var(X) + b^2\, var(Y) + 2ab\, cov(X,Y)$
$\equiv a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab\, \sigma_{XY}$

and, in general, if we have random variables $\{X_1, X_2, \dots, X_s, \dots, X_S\}$ and constants $\{a_1, a_2, \dots, a_s, \dots, a_S;\; c\}$

(9) $var(\sum_s a_s X_s + c) = \sum_s a_s^2\, var(X_s) + 2 \sum_s \sum_{t>s} a_s a_t\, cov(X_s, X_t)$
$\equiv \sum_s a_s^2\, \sigma_{X_s}^2 + 2 \sum_s \sum_{t>s} a_s a_t\, \sigma_{X_s X_t}$

(each pair $s \neq t$ is counted once in $\sum_s \sum_{t>s}$, hence the factor 2)
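And a numerical check of (9) with the same made-up joint pmf (illustrative only):

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
p_x, p_y = P.sum(axis=1), P.sum(axis=0)
mu_X, mu_Y = np.sum(x * p_x), np.sum(y * p_y)
var_X = np.sum((x - mu_X) ** 2 * p_x)
var_Y = np.sum((y - mu_Y) ** 2 * p_y)
cov_XY = sum((xi - mu_X) * (yj - mu_Y) * P[i, j]
             for i, xi in enumerate(x) for j, yj in enumerate(y))

a, b, c = 2.0, -3.0, 5.0
# var(aX + bY + c) computed directly over the joint pmf
mean_w = sum((a * xi + b * yj + c) * P[i, j]
             for i, xi in enumerate(x) for j, yj in enumerate(y))
var_w = sum((a * xi + b * yj + c - mean_w) ** 2 * P[i, j]
            for i, xi in enumerate(x) for j, yj in enumerate(y))
assert np.isclose(var_w, a**2 * var_X + b**2 * var_Y + 2 * a * b * cov_XY)  # (9)
```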

By definition, the conditional probability is

(F) $P(X=x_i \mid Y=y_j) = P(X=x_i \cap Y=y_j)\, /\, P(Y=y_j)$

so

(G) $E(X \mid Y=y_j) = \sum_{i=1}^{k} x_i\, P(X=x_i \mid Y=y_j)$   or simply $E(X|Y)$!

(H) $var(X \mid Y=y_j) = E\{[X - E(X|Y=y_j)]^2 \mid Y=y_j\}$   or simply $var(X|Y)$!
$\overset{(G)}{=} \sum_{i=1}^{k} [x_i - E(X|Y=y_j)]^2\, P(X=x_i \mid Y=y_j)$

(note that the conditional variance is centred on the conditional mean $E(X|Y=y_j)$, not on the unconditional mean $\mu_X$)
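A small sketch (not in the handout) of how (F)-(H) are computed from a joint pmf, conditioning on one value of Y:

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
p_y = P.sum(axis=0)

j = 0                                    # condition on Y = y_1
p_x_given_y = P[:, j] / p_y[j]           # (F) conditional pmf of X given Y = y_j
E_X_given_y = np.sum(x * p_x_given_y)    # (G) conditional mean E(X | Y = y_j)
# (H) conditional variance, centred on the conditional mean
var_X_given_y = np.sum((x - E_X_given_y) ** 2 * p_x_given_y)
print(E_X_given_y, var_X_given_y)        # 0.714...  0.204...
```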

Thus, if we consider an extra random variable Z,

(10) $E(aX + bY + c \mid Z) \overset{(4),(G)}{=} a\, E(X|Z) + b\, E(Y|Z) + c$

(11) $var(aX + bY + c \mid Z) \overset{(9),(H)}{=} a^2\, var(X|Z) + b^2\, var(Y|Z) + 2ab\, cov(X,Y|Z)$


and, in general,

(10) $E(c + \sum_{s=1}^{S} a_s X_s \mid Z) \overset{(4),(G)}{=} c + \sum_{s=1}^{S} a_s\, E(X_s|Z)$

(11) $var(\sum_s a_s X_s + c \mid Z) \overset{(9),(H)}{=} \sum_s a_s^2\, var(X_s|Z) + 2 \sum_s \sum_{t>s} a_s a_t\, cov(X_s, X_t|Z)$


where the details are omitted because these proofs are very similar to those of (4) & (9).

(12) $E[E(X|Y)] \overset{(A)}{=} \sum_{j=1}^{m} E(X|Y=y_j)\, P(Y=y_j)$
(since $E(X|Y)$ is a random variable that takes the value $E(X|Y=y_j)$ with probability $P(Y=y_j)$)
$\overset{(G)}{=} \sum_{j=1}^{m} \left[\sum_{i=1}^{k} x_i\, P(X=x_i \mid Y=y_j)\right] P(Y=y_j)$
$\overset{(F)}{=} \sum_{j=1}^{m} \sum_{i=1}^{k} x_i\, P(X=x_i \cap Y=y_j)$
$= \sum_{i=1}^{k} \sum_{j=1}^{m} x_i\, P(X=x_i \cap Y=y_j)$
$= \sum_{i=1}^{k} x_i \left[\sum_{j=1}^{m} P(X=x_i \cap Y=y_j)\right]$
$\overset{(D)}{=} \sum_{i=1}^{k} x_i\, P(X=x_i)$
$\overset{(A)}{=} E(X) \equiv \mu_X$   Law of Iterated Expectations (LIE)
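The LIE is easy to verify numerically on the same made-up joint pmf (a sketch, numpy assumed):

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([1.0, 2.0, 3.0])
P = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])
p_x, p_y = P.sum(axis=1), P.sum(axis=0)

# E(X | Y = y_j) for every j, then average over the distribution of Y
E_X_given_y = (x @ P) / p_y              # vector of conditional means, one per y_j
lie = np.sum(E_X_given_y * p_y)          # E[E(X|Y)]
assert np.isclose(lie, np.sum(x * p_x))  # equals E(X): the LIE (12)
```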

The random variables X & Y are independent iff

(I) $P(X=x_i \mid Y=y_j) = P(X=x_i) \;\;\forall\; x_i, y_j$

Then
(13) $P(X=x_i \cap Y=y_j) \overset{(F),(I)}{=} P(X=x_i) \cdot P(Y=y_j) \;\;\forall\; x_i, y_j$

Finally, remember
(J) $corr(X,Y) = \sigma_{XY}\, /\, (\sigma_X \sigma_Y) \equiv \rho_{XY}$

Thus,

(14) If X & Y are independent, then $corr(X,Y) = 0$.

X & Y are random variables (not constants), therefore $\sigma_X > 0$ & $\sigma_Y > 0$. Thus, $corr(X,Y) = 0$ iff $cov(X,Y) = 0$ or, by (5), $E(XY) = E(X)E(Y)$. Indeed:

$E(XY) \overset{(A)}{=} \sum_{i=1}^{k} \sum_{j=1}^{m} x_i y_j\, P(X=x_i \cap Y=y_j)$
$\overset{(13)}{=} \sum_{i=1}^{k} \sum_{j=1}^{m} x_i y_j\, P(X=x_i)\, P(Y=y_j)$
$= \sum_{i=1}^{k} x_i\, P(X=x_i) \sum_{j=1}^{m} y_j\, P(Y=y_j)$
$\overset{(A)}{=} E(X)\, E(Y)$
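A numerical check (illustrative only): under independence the joint pmf factorizes as in (13), and then E(XY) = E(X)E(Y) as in (14).

```python
import numpy as np

# Build an independent joint pmf as the outer product of two marginals, per (13)
x = np.array([1.0, 2.0, 4.0]); p_x = np.array([0.2, 0.5, 0.3])
y = np.array([0.0, 1.0]);      p_y = np.array([0.4, 0.6])
P = np.outer(p_x, p_y)                 # P(x_i, y_j) = P(X = x_i) * P(Y = y_j)

E_XY = sum(xi * yj * P[i, j]
           for i, xi in enumerate(x) for j, yj in enumerate(y))
mu_X, mu_Y = np.sum(x * p_x), np.sum(y * p_y)
assert np.isclose(E_XY, mu_X * mu_Y)   # so cov(X,Y) = 0 and corr(X,Y) = 0, by (5) & (J)
```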

You should understand the distinction between the moments of a random variable and their estimates (with a hat ^).

Moments of random variable X | Sample-estimated moments
$E(X) = \sum_{i=1}^{k} x_i\, P(X=x_i) \equiv \mu_X$ | $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \equiv \hat{\mu}_X$
$var(X) = E[(X-\mu_X)^2] \equiv \sigma_X^2$ | $s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \equiv \hat{\sigma}_X^2$
$sd(X) = \sqrt{\sigma_X^2} \equiv \sigma_X$ | $s_X = \sqrt{s_X^2} \equiv \hat{\sigma}_X$
$cov(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] \equiv \sigma_{XY}$ | $s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \equiv \hat{\sigma}_{XY}$
$corr(X,Y) = \sigma_{XY}\,/\,(\sigma_X \sigma_Y) \equiv \rho_{XY}$ | $r_{XY} = s_{XY}\,/\,(s_X s_Y) \equiv \hat{\rho}_{XY}$
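The sample-estimated moments in the right column map directly onto numpy functions; a minimal sketch with made-up data (the 1/(n-1) divisor corresponds to ddof=1):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.70, scale=0.10, size=500)    # made-up height data
y = 0.5 * x + rng.normal(scale=0.05, size=500)    # a second, correlated variable

x_bar = x.mean()                       # sample mean, estimate of mu_X
s2_x = x.var(ddof=1)                   # sample variance, 1/(n-1) divisor
s_x = x.std(ddof=1)                    # sample standard deviation
s_xy = np.cov(x, y, ddof=1)[0, 1]      # sample covariance
r_xy = np.corrcoef(x, y)[0, 1]         # sample correlation
print(x_bar, s2_x, s_x, s_xy, r_xy)
```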

Moreover, the statistics on the right are also functions of the data, and thus random variables whose realization depends on the sample. For example, suppose X is the height of Europeans and we want to estimate $\mu_X$, the unknown mean or expected value of the height of Europeans, $E(X)$. Juan randomly selects a representative sample of 500 Spaniards and obtains $\bar{x} = 1.67$: this is his estimate of $\mu_X$ ($\hat{\mu}_X$). But Giampiero takes another sample of 500 Italians and gets $\bar{x} = 1.65$. And Catiana draws a representative sample of 500 Germans that gives $\bar{x} = 1.70$. Thus, {1.65, 1.67, 1.70} are realizations of the random variable $\bar{X}$, the average height of a representative sample of 500 Europeans. This random variable $\bar{X}$ also has moments (mean, variance, etc.) that we can estimate with data.

(assuming an i.i.d. random sample of size n)

$E(\bar{X}) = \mu_X \equiv \mu_{\bar{X}}$ | $\hat{\mu}_{\bar{X}} = \bar{x}$
$var(\bar{X}) = \frac{1}{n}\, \sigma_X^2 \equiv \sigma_{\bar{X}}^2$ | $\hat{\sigma}_{\bar{X}}^2 = \frac{1}{n}\, \hat{\sigma}_X^2 \equiv s_{\bar{X}}^2$
$sd(\bar{X}) = \sqrt{\tfrac{1}{n}\, \sigma_X^2} \equiv \sigma_{\bar{X}}$ | $\hat{\sigma}_{\bar{X}} = \sqrt{\tfrac{1}{n}\, \hat{\sigma}_X^2} \equiv SE(\bar{X})$
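A simulation sketch (not part of the handout) of the last table: drawing many samples of size n and computing each sample mean shows that the standard deviation of $\bar{X}$ is close to $\sigma_X / \sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.70, 0.10, 500          # made-up population values

# Draw many samples of size n; record the sample mean of each
x_bars = np.array([rng.normal(mu, sigma, n).mean() for _ in range(10_000)])

print(x_bars.mean())                    # close to mu:  E(X-bar) = mu_X
print(x_bars.std(ddof=1))               # close to sigma / sqrt(n)
print(sigma / np.sqrt(n))               # 0.00447...
```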
