
Chapter Four

4. JOINT AND CONDITIONAL PROBABILITY DISTRIBUTIONS


4.1. Joint Probability Distributions
Let X and Y be two random variables; for simplicity, they are taken to be discrete random variables. The outcome of the experiment is a pair of values (x, y). The probability of this outcome is a joint probability, which can be denoted Pr(X = x ∩ Y = y), emphasizing the analogy with the probability of a joint event Pr(A ∩ B), or, more usually, by Pr(X = x, Y = y).
 The collection of these probabilities, for all possible combinations of x and y, is the joint probability distribution of X and Y, denoted
p(x, y) = Pr(X = x, Y = y)
 The value of p(x, y) is between 0 and 1:
0 ≤ p(x, y) ≤ 1
 The sum of all possible joint probabilities is unity:
∑_x ∑_y p(x, y) = 1, where the sum is over all (x, y) values.

Example 1: Let H and W be the random variables representing the population of weekly incomes of husbands and wives, respectively, in Dire Dawa City. There are only three possible weekly incomes: Birr 0, Birr 2000 or Birr 4000. The joint probability distribution of H and W is represented as a table:
Probabilities              Value of H
Value of W          0         2000      4000
0                   0.05      0.15      0.10
2000                0.10      0.10      0.30
4000                0.05      0.05      0.10

Then we can read off, for example, that Pr(H = 0, W = 0) = 0.05, i.e. in this population, for 5% of couples both the husband and the wife have a zero weekly income.
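As a computational aside (not part of the original notes), the joint distribution in Example 1 can be stored as a small array and queried directly. The sketch below uses Python with NumPy; the names `joint`, `w_values` and `h_values` are chosen here purely for illustration.

```python
import numpy as np

# Joint probability distribution of (W, H) from Example 1.
# Rows index the wife's income W, columns the husband's income H.
w_values = np.array([0, 2000, 4000])
h_values = np.array([0, 2000, 4000])
joint = np.array([
    [0.05, 0.15, 0.10],   # W = 0
    [0.10, 0.10, 0.30],   # W = 2000
    [0.05, 0.05, 0.10],   # W = 4000
])

# Every joint probability lies in [0, 1] and they sum to one.
assert np.all((joint >= 0) & (joint <= 1))
assert np.isclose(joint.sum(), 1.0)

# Reading off a single joint probability, e.g. Pr(W = 0, H = 0) = 0.05.
print("Pr(W=0, H=0) =", joint[0, 0])
```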
In this example, the nature of the experiment underlying the population data is not explicitly stated. However, in the next example, the experiment is described, the random variables are defined in relation to the experiment, and their probability distribution is deduced directly.
Example 2: Consider the following simple version of a lottery. Players in the lottery choose one number between 1 and 5, whilst a machine selects the lottery winner by randomly drawing one of five balls (numbered 1 to 5). Any player whose chosen number coincides with the number on the ball is a winner. Whilst the machine selects one ball at random (so that each ball has a 0.2 chance of selection), players of the lottery have "lucky" and "unlucky" numbers, with the probabilities of choosing each specific number as follows:
Number chosen by player        1       2       3       4       5       Total
Probability of being chosen    0.40    0.20    0.05    0.10    0.25    1
Let X denote the number chosen by a player and Y the number chosen by the machine. If the events (X = x) and (Y = y) are assumed to be independent, then for each possible value of X and Y we have
Pr(X = x ∩ Y = y) = Pr(X = x) × Pr(Y = y)
The table above gives the probabilities for X, and Pr(Y = y) = 0.2 for each y, so that a table can be drawn up displaying the joint distribution p(x, y):
Probabilities                Y: chosen by machine
X: chosen by player     1       2       3       4       5       Row total
1                       0.08    0.08    0.08    0.08    0.08    0.40
2                       0.04    0.04    0.04    0.04    0.04    0.20
3                       0.01    0.01    0.01    0.01    0.01    0.05
4                       0.02    0.02    0.02    0.02    0.02    0.10
5                       0.05    0.05    0.05    0.05    0.05    0.25
Column total            0.20    0.20    0.20    0.20    0.20    1
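Because of the independence assumption, the whole table is just the outer product of the two marginal distributions. A minimal sketch of that construction (variable names are illustrative):

```python
import numpy as np

# Marginal probabilities for the player's choice X and the machine's choice Y.
p_x = np.array([0.40, 0.20, 0.05, 0.10, 0.25])   # player: numbers 1..5
p_y = np.full(5, 0.20)                           # machine: each ball equally likely

# Independence: p(x, y) = pX(x) * pY(y) for every cell.
joint = np.outer(p_x, p_y)

print(joint)               # reproduces the table above
print(joint.sum(axis=1))   # row totals: 0.40, 0.20, 0.05, 0.10, 0.25
print(joint.sum())         # 1.0
```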
The general question of independence in joint probability distributions is discussed further in Section 4.4.
4.2. Marginal Probability Distribution
Given a joint probability distribution p(x, y) = Pr(X = x, Y = y) for the random variables X and Y, a probability of the form Pr(X = x) or Pr(Y = y) is called a marginal probability. The collection of these probabilities for all values of X is the marginal probability distribution of X:
pX(x) = Pr(X = x)
If it is clear from the context, pX(x) is written simply as p(x). Suppose that Y takes on the values 0, 1, 2. Then
Pr(X = x) = Pr(X = x, Y = 0) + Pr(X = x, Y = 1) + Pr(X = x, Y = 2),
the sum of all the joint probabilities favourable to X = x. So, marginal probability distributions are found by summing over all the values of the other variable:
Pr(X = x) = ∑_y p(x, y)
and
Pr(Y = y) = ∑_x p(x, y)

This can be illustrated using Example 1 of Section 4.1 again:
Pr(W = 0) = Pr(W = 0, H = 0) + Pr(W = 0, H = 2000) + Pr(W = 0, H = 4000)
          = 0.05 + 0.15 + 0.10
          = 0.30
There is a simple recipe for finding the marginal distributions in the table of joint probabilities: find the row sums and column sums. From Example 1 of Section 4.1,
Probabilities              Value of H                    Row sum
Value of W          0         2000      4000             pW(w)
0                   0.05      0.15      0.10             0.30
2000                0.10      0.10      0.30             0.50
4000                0.05      0.05      0.10             0.20
Column sum pH(h)    0.20      0.30      0.50             1

from which the marginal distributions can be written out explicitly as

Value of H    pH(h)        Value of W    pW(w)
0             0.20         0             0.30
2000          0.30         2000          0.50
4000          0.50         4000          0.20
Total         1            Total         1
By calculation, we can find the expected values and variances of W and H as
E[W] = ∑ w pW(w) = 1800;     var[W] = ∑ (w − E[W])² pW(w) = 1.96 × 10⁶
E[H] = ∑ h pH(h) = 2600;     var[H] = ∑ (h − E[H])² pH(h) = 2.44 × 10⁶
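These values can be checked numerically. The sketch below (reusing the `joint` array introduced after Example 1) obtains the marginals as row and column sums and then the moments:

```python
import numpy as np

w_values = np.array([0, 2000, 4000])
h_values = np.array([0, 2000, 4000])
joint = np.array([[0.05, 0.15, 0.10],
                  [0.10, 0.10, 0.30],
                  [0.05, 0.05, 0.10]])   # rows: W, columns: H

# Marginals: sum the joint probabilities over the other variable.
p_w = joint.sum(axis=1)   # row sums    -> [0.30, 0.50, 0.20]
p_h = joint.sum(axis=0)   # column sums -> [0.20, 0.30, 0.50]

# Expected values and variances from the marginal distributions.
e_w = np.sum(w_values * p_w)                  # 1800
e_h = np.sum(h_values * p_h)                  # 2600
var_w = np.sum((w_values - e_w) ** 2 * p_w)   # 1.96e6
var_h = np.sum((h_values - e_h) ** 2 * p_h)   # 2.44e6
print(e_w, e_h, var_w, var_h)
```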
Notice that a marginal probability distribution has to satisfy the usual properties expected of a probability
distribution (for a discrete random variable):
0 ≤ pX(x) ≤ 1,    ∑_x pX(x) = 1
0 ≤ pY(y) ≤ 1,    ∑_y pY(y) = 1

4.3. Conditional Probability Distributions and Independence


In econometrics, we are often interested in how a random variable X affects a random variable Y. This information is contained in the conditional distribution of "Y given X". For discrete random variables X and Y, this distribution is defined by the following probabilities:
pY|X(y|x) = Pr(Y = y | X = x) = p(x, y) / pX(x)

where Pr(Y = y | X = x) reads as "the probability that Y takes the value y given that (conditional on) X takes the value x". Conditional probabilities are defined on a restricted sample space of X = x (hence the rescaling by pX(x)), and they are calculated on a sequence of restricted sample spaces, one for each possible value of x (in the discrete case).
As an illustration of the calculations, consider again Example 1 and the construction of the conditional
distribution of W given H for which we had the following joint distribution:
Probabilities              Value of H
Value of W          0         2000      4000
0                   0.05      0.15      0.10
2000                0.10      0.10      0.30
4000                0.05      0.05      0.10

We consider, in turn, conditional probabilities for the values of W given, first, H = 0, then H = 2000 and finally H = 4000. Intuitively, think of the probabilities in the cells as indicating sub-areas of the entire sample space, with the latter having an area of 1 and the former (therefore) summing to 1. With this interpretation, the restriction H = 0 "occupies" 20% of the entire sample space (recall the marginal probability Pr(H = 0) from the example). The three cells corresponding to H = 0 now make up the restricted sample space of H = 0, and the outcome W = 0 takes up 0.05/0.20 = 0.25 of this restricted sample space; thus
Pr(W = 0 | H = 0) = Pr(W = 0 ∩ H = 0) / Pr(H = 0) = 0.05/0.20 = 0.25
Similarly,
Pr(W = 2000 | H = 0) = Pr(W = 2000 ∩ H = 0) / Pr(H = 0) = 0.10/0.20 = 0.50
and
Pr(W = 4000 | H = 0) = Pr(W = 4000 ∩ H = 0) / Pr(H = 0) = 0.05/0.20 = 0.25
Notice that
∑_w Pr(W = w | H = 0) = 0.25 + 0.50 + 0.25 = 1,
as it should for the restricted sample space of H = 0. For all possible restrictions imposed by H we get the following conditional distributions for W (three conditional distributions, one for each of h = 0, h = 2000 and h = 4000).

                    Joint probabilities p(w, h)      Conditional distribution Pr(W = w | H = h)
Value of W          h = 0    h = 2000   h = 4000     h = 0      h = 2000     h = 4000
0                   0.05     0.15       0.10         1/4        1/2          1/5
2000                0.10     0.10       0.30         1/2        1/3          3/5
4000                0.05     0.05       0.10         1/4        1/6          1/5

Notice how the probabilities for particular values of W change according to the restriction imposed by H; for example, Pr(W = 0 | H = 0) ≠ Pr(W = 0 | H = 2000). Thus knowledge of, or information about, H changes probabilities concerning W. Because of this, and as ascertained previously, W and H are NOT independent.
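The column-by-column division used to build the conditional distributions above is easy to reproduce; a minimal sketch reusing the Example 1 `joint` array (rows index W, columns index H):

```python
import numpy as np

joint = np.array([[0.05, 0.15, 0.10],
                  [0.10, 0.10, 0.30],
                  [0.05, 0.05, 0.10]])   # rows: W = 0, 2000, 4000; cols: H = 0, 2000, 4000

p_h = joint.sum(axis=0)                  # marginal of H: [0.20, 0.30, 0.50]

# Conditional distribution of W given H: divide each column by its column sum,
# i.e. Pr(W = w | H = h) = p(w, h) / pH(h).
p_w_given_h = joint / p_h                # broadcasting divides column j by p_h[j]

print(p_w_given_h)                       # column for H = 0 is [0.25, 0.50, 0.25]
print(p_w_given_h.sum(axis=0))           # each conditional distribution sums to 1
```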
In general, X and Y are independent if and only if knowledge of the value taken by X does not tell us anything about the probability that Y takes any particular value. Indeed, from the definition of pY|X(y|x), we see that X and Y are independent if and only if pY|X(y|x) = pY(y) for all x, y. There is a similar treatment of conditional distributions for continuous random variables.
Conditional Expectation
While correlation is a useful summary of the relationship between two random variables, in econometrics we often want to go further and explain one random variable Y as a function of some other random variable X. One way of doing this is to look at the properties of the distribution of Y conditional on X, as introduced above. In general these properties, such as the expectation and variance, will depend on the value of X; thus we can think of them as functions of X. The conditional expectation of Y is denoted E(Y | X = x) and tells us the expectation of Y given that X has taken the particular value x. Since this will vary with the particular value taken by X, we can think of E(Y | X = x) = m(x) as a function of x.


As an example, think of the population of all working individuals and let X be years of education and Y be hourly wages. E(Y | X = 12) is the expected hourly wage for all those people who have 12 years of education, while E(Y | X = 16) tells us the expected hourly wage for all those who have 16 years of education. Tracing out the values of E(Y | X = x) for all values of x tells us a lot about how education and wages are related. In econometrics we typically summarize the relationship represented by E(Y | X) = m(X) in the form of a simple function. For example, we could use a simple linear function
E(WAGE | EDUC) = 1.05 + 0.45 EDUC,
or a non-linear function
E(QUANTITY | PRICE) = 10 / PRICE,
with the latter example demonstrating the deficiencies of correlation as a measure of association (since it confines itself to the consideration of linear relationships only).
Properties of Conditional Expectation
The following properties hold for both discrete and continuous random variables.

 E[c(X) | X] = c(X) for any function c(X). Functions of X behave as constants when we compute expectations conditional on X. (If we know the value of X, then we know the value of c(X), so it is effectively a constant.)
For functions a(X) and b(X):
 E[(a(X)Y + b(X)) | X] = a(X) E(Y | X) + b(X). This is an extension of the previous rule's logic and says that, since we are conditioning on X, we can treat X, and any function of X, as a constant when we take the expectation.
 If X and Y are independent, then E(Y | X) = E(Y). This follows immediately from the earlier discussion of conditional probability distributions. If the two random variables are independent, then knowledge of the value of X should not change our view of the likelihood of any value of Y. It should therefore not change our view of the expected value of Y. A specific case is where U and X are independent and E(U) = 0. It is then clear that E(U | X) = 0.

 E[E(Y | X)] = E(Y). This result is known as the "iterated expectations" rule. We can think of E(Y | X) as being a function of X. Since X is a random variable, E(Y | X) = m(X) is a random variable, and it makes sense to think about its distribution and hence its expected value.
Think about the following example. Suppose E(WAGE | EDUC) = 4 + 0.6 EDUC and E(EDUC) = 11.5. Then, according to the iterated expectations rule,
E(WAGE) = E(4 + 0.6 EDUC) = 4 + 0.6 × 11.5 = 10.9
 If E(Y | X) = E(Y), then Cov(X, Y) = 0.
The last two properties have immediate applications in econometric modelling: if U and X are random variables with E(U | X) = 0, then E(U) = 0 and Cov(U, X) = 0. Finally, E(Y | X) is often called the "regression" of Y on X. We can always write
Y = E(Y | X) + U,
where, by the above properties, E(U | X) = 0. Now consider E(U² | X), which is the conditional variance of Y given X:
E(U² | X) = E{(Y − E(Y | X))² | X} = var(Y | X).
In general, it can be shown that
var(Y) = E[var(Y | X)] + var[E(Y | X)].
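To make these properties concrete, the sketch below (not part of the original notes, and reusing the Example 1 arrays) computes E(W | H = h) for each h, then checks the iterated-expectations rule E[E(W | H)] = E(W) and the variance decomposition numerically:

```python
import numpy as np

w_values = np.array([0, 2000, 4000])
joint = np.array([[0.05, 0.15, 0.10],
                  [0.10, 0.10, 0.30],
                  [0.05, 0.05, 0.10]])   # rows: W; cols: H = 0, 2000, 4000

p_h = joint.sum(axis=0)                  # [0.20, 0.30, 0.50]
p_w = joint.sum(axis=1)                  # [0.30, 0.50, 0.20]
p_w_given_h = joint / p_h                # conditional pmf of W for each value of H

# Conditional expectation m(h) = E(W | H = h), one value per column.
m_h = (w_values[:, None] * p_w_given_h).sum(axis=0)    # [2000.0, 1333.3, 2000.0]

# Iterated expectations: E[E(W | H)] = E(W) = 1800.
print(np.sum(m_h * p_h))                 # 1800.0
print(np.sum(w_values * p_w))            # 1800.0

# Variance decomposition: var(W) = E[var(W | H)] + var[E(W | H)].
e_w = np.sum(w_values * p_w)
var_w = np.sum((w_values - e_w) ** 2 * p_w)                          # 1.96e6
var_w_given_h = ((w_values[:, None] - m_h) ** 2 * p_w_given_h).sum(axis=0)
decomposition = np.sum(var_w_given_h * p_h) + np.sum((m_h - e_w) ** 2 * p_h)
print(var_w, decomposition)              # both 1.96e6
```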
4.4. Independence, Covariance and Correlation
4.4.1. Independence
If the random variables X and Y have a joint probability distribution
p(x, y) = Pr(X = x, Y = y),
then it is possible that, for some combinations of x and y, the events (X = x) and (Y = y) are independent events:
Pr(X = x, Y = y) = Pr(X = x) × Pr(Y = y)
Or, using the notation of joint and marginal probability distributions,
p(x, y) = pX(x) × pY(y)
If this relationship holds for all values of x and y, the random variables X and Y are said to be independent:
 X and Y are independent random variables if and only if
p(x, y) = pX(x) × pY(y) for all x, y
Each joint probability is the product of the corresponding marginal probabilities. Independence also means that Pr
(Y = y) would not be affected by knowing that X = x: knowing the value taken on by one random variable does not
affect the probabilities of the outcomes of the other random variable. A corollary of this is that if two random
variables X and Y are independent, then there can be no relationship of any kind, linear or non-linear, between them.
The joint probabilities and marginal probabilities for Example 1 are given below:
Probabilities              Value of H                    Row sum
Value of W          0         2000      4000             pW(w)
0                   0.05      0.15      0.10             0.30
2000                0.10      0.10      0.30             0.50
4000                0.05      0.05      0.10             0.20
Column sum pH(h)    0.20      0.30      0.50             1

Here p(0, 0) = 0.05, whilst pW(0) = 0.30 and pH(0) = 0.20, so that p(0, 0) ≠ pW(0) × pH(0) = 0.06. So H and W cannot be independent.
For X and Y to be independent, p(x, y) = pX(x) × pY(y) has to hold for all x, y. Finding one pair of values x, y for which this fails is sufficient to conclude that X and Y are not independent. However, one may have to check every possible pair of values to confirm independence: think what would be required in Example 2 if one did not know that the joint probability distribution had been constructed using an independence property.
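That cell-by-cell check is easy to automate; a minimal sketch with an illustrative helper (`is_independent` is not from the original text):

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """True if p(x, y) = pX(x) * pY(y) holds (up to tol) for every cell."""
    p_row = joint.sum(axis=1)
    p_col = joint.sum(axis=0)
    return np.allclose(joint, np.outer(p_row, p_col), atol=tol)

# Example 1 (W and H): not independent.
joint_wh = np.array([[0.05, 0.15, 0.10],
                     [0.10, 0.10, 0.30],
                     [0.05, 0.05, 0.10]])
print(is_independent(joint_wh))          # False

# Example 2 (player X and machine Y): independent by construction.
joint_xy = np.outer([0.40, 0.20, 0.05, 0.10, 0.25], np.full(5, 0.20))
print(is_independent(joint_xy))          # True
```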
Functions of Two Random Variables
Given the experiment of Example 1, one can imagine defining further random variables on the sample space of this experiment. One example is the random variable T representing total household income:
T = H + W
This new random variable is a (linear) function of H and W, and we can deduce the probability distribution of T from the joint distribution of H and W. For example,
Pr(T = 0) = Pr(H = 0, W = 0);
Pr(T = 2000) = Pr(H = 0, W = 2000) + Pr(H = 2000, W = 0).

The complete probability distribution of T is
Value of T      Pr(T = t)    t × Pr(T = t)
0               0.05         0
2000            0.25         500
4000            0.25         1000
6000            0.35         2100
8000            0.10         800
Total           1            4400
from which we note that E[T] = 4400, indicating that the population mean income for married couples in the city is Birr 4400.
Now we consider a more formal approach. Let X and Y be two discrete random variables with joint probability distribution p(x, y). Let V be a random variable defined as a function of X and Y:
V = g(X, Y)
Here, g(X, Y) is not necessarily a linear function: it could be any function of two variables. In principle, we can deduce the probability distribution of V from p(x, y) and thus deduce the mean of V, E[V], just as we did for T in Example 1.
However, there is a second method that works directly with the joint probability distribution p(x, y): the expected value of V is
E[V] = E[g(X, Y)] = ∑_x ∑_y g(x, y) × p(x, y)
The point about this approach is that it avoids the calculation of the probability distribution of V.
To apply this argument to find E[T] in Example 1, it is helpful to modify the table of joint probabilities to display the value of T associated with each pair of values for H and W:

                          Value of h
(t)              0              2000            4000
w = 0            (0) 0.05       (2000) 0.15     (4000) 0.10
w = 2000         (2000) 0.10    (4000) 0.10     (6000) 0.30
w = 4000         (4000) 0.05    (6000) 0.05     (8000) 0.10

E[T] = (0) × 0.05 + (2000) × 0.15 + (4000) × 0.10
     + (2000) × 0.10 + (4000) × 0.10 + (6000) × 0.30
     + (4000) × 0.05 + (6000) × 0.05 + (8000) × 0.10
     = 4400
So, the recipe is: for each cell, multiply the implied value of T in that cell by the probability in that cell, and add up the calculated values over all the cells.
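That "multiply each cell by its probability and add up" recipe is just the double sum above, and it works for any function g(X, Y). A sketch (the helper `expected_value_of` is illustrative, not from the text):

```python
import numpy as np

w_values = np.array([0, 2000, 4000])
h_values = np.array([0, 2000, 4000])
joint = np.array([[0.05, 0.15, 0.10],
                  [0.10, 0.10, 0.30],
                  [0.05, 0.05, 0.10]])   # rows: W, cols: H

def expected_value_of(g, joint, w_values, h_values):
    """E[g(W, H)] = sum over all cells of g(w, h) * p(w, h)."""
    w_grid, h_grid = np.meshgrid(w_values, h_values, indexing="ij")
    return np.sum(g(w_grid, h_grid) * joint)

# T = W + H: expected total household income.
print(expected_value_of(lambda w, h: w + h, joint, w_values, h_values))   # 4400.0

# Any other function of (W, H) works the same way, e.g. W * H (needed for the covariance below).
print(expected_value_of(lambda w, h: w * h, joint, w_values, h_values))   # 4800000.0
```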
4.4.2. Covariance

A popular measure of association for random variables X and Y is the (population) correlation coefficient. It is the
population characteristic analogous to the (sample) correlation coefficient. It will be seen that this (population)
correlation coefficient is really only a measure of strength of any linear relationship between the random variables.
The first step is to define the (population) covariance as a characteristic of the joint probability distribution of X and
Y. Let
E[X] = μX,  E[Y] = μY
The (population) covariance is defined as
cov[X, Y] = E[(X − μX)(Y − μY)] = σXY
Notice that, by this definition, cov[X, Y] = cov[Y, X].

There are a number of alternative expressions for the covariance. The first follows from seeing (X − μX)(Y − μY) as a function g(X, Y) of X and Y:
cov[X, Y] = ∑_x ∑_y (x − μX)(y − μY) p(x, y)

We can see from this expression that if enough (x, y) pairs have x − μX and y − μY values with the same sign, then cov[X, Y] > 0, so that large (small) values of X − μX tend to occur with large (small) values of Y − μY. Similarly, if enough (x, y) pairs have x − μX and y − μY values with different signs, then cov[X, Y] < 0: here, large (small) values of X − μX tend to occur with small (large) values of Y − μY.
 cov[X, Y] > 0 indicates a positive relationship between X and Y; cov[X, Y] < 0 a negative relationship.
There is a shorthand calculation for the covariance, analogous to that given for the variance:
Cov[X, Y] = E[(X − μX)(Y − μY)]
          = E[XY − X μY − Y μX + μX μY]
          = E[XY] − μY E[X] − μX E[Y] + μX μY
          = E[XY] − μY μX − μX μY + μX μY
          = E[XY] − μX μY
Even with this shorthand method, the calculation of the covariance is rather tedious. To calculate cov[W, H] in Example 1, the best approach is to imitate the way in which E[T] was calculated in the previous section. Rather than display the values of T, here we display the values of W × H in order to first calculate E[WH]:

                          Value of h
(w × h)          0              2000               4000
w = 0            (0) 0.05       (0) 0.15           (0) 0.10
w = 2000         (0) 0.10       (4 × 10⁶) 0.10     (8 × 10⁶) 0.30
w = 4000         (0) 0.05       (8 × 10⁶) 0.05     (16 × 10⁶) 0.10
Using the same strategy of multiplication within cells, and adding up along each row in turn, we find

E[WH] = (0) × 0.05 + (0) × 0.15 + (0) × 0.10
      + (0) × 0.10 + (4 × 10⁶) × 0.10 + (8 × 10⁶) × 0.30
      + (0) × 0.05 + (8 × 10⁶) × 0.05 + (16 × 10⁶) × 0.10
      = 4.8 × 10⁶
Recalling that E[W] = ∑ w pW(w) = 1800 and E[H] = ∑ h pH(h) = 2600, we have
Cov[W, H] = E[WH] − E[W] E[H]
          = 4.8 × 10⁶ − 1800 × 2600
          = 4.8 × 10⁶ − 4.68 × 10⁶
          = 0.12 × 10⁶ = 1.2 × 10⁵
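The whole covariance calculation can be checked in a few lines; a sketch using the Example 1 arrays:

```python
import numpy as np

w_values = np.array([0, 2000, 4000])
h_values = np.array([0, 2000, 4000])
joint = np.array([[0.05, 0.15, 0.10],
                  [0.10, 0.10, 0.30],
                  [0.05, 0.05, 0.10]])   # rows: W, cols: H

p_w, p_h = joint.sum(axis=1), joint.sum(axis=0)
e_w = np.sum(w_values * p_w)             # 1800
e_h = np.sum(h_values * p_h)             # 2600

# E[WH]: multiply w * h by the probability in each cell and add up.
w_grid, h_grid = np.meshgrid(w_values, h_values, indexing="ij")
e_wh = np.sum(w_grid * h_grid * joint)   # 4.8e6

cov_wh = e_wh - e_w * e_h                # 4.8e6 - 4.68e6
print(cov_wh)                            # 120000.0, i.e. 1.2e5
```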
4.4.3. Correlation
Notice that the value of the covariance depends on the units in which the random variables are measured. This cannot be sensible: what is required is a measure of the strength of association which is invariant to changes in the units of measurement. Generalizing, if the units of measurement of two random variables X and Y are changed to produce the new random variables αX and βY, then the covariance in the new units of measurement is related to the covariance in the original units of measurement by
cov[αX, βY] = αβ cov[X, Y]
What are the variances of αX and βY in terms of var[X] and var[Y]?
var[αX] = α² var[X];   var[βY] = β² var[Y]
The (population) correlation coefficient between X and Y is defined by
ρXY = cov[X, Y] / √(var[X] var[Y])
This is also the correlation between αX and βY (for α, β > 0):
ρ_{αX,βY} = cov[αX, βY] / √(var[αX] var[βY])
          = αβ cov[X, Y] / (αβ √(var[X] var[Y]))
          = ρXY,
so that the correlation coefficient does not depend on the units of measurement.
In Example 1, what is the correlation coefficient?
ρWH = cov[W, H] / √(var[W] var[H]) = 1.2 × 10⁵ / √(1.96 × 10⁶ × 2.44 × 10⁶) ≈ 0.0549
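Numerically (a sketch that continues the covariance calculation above, repeated here so that it is self-contained):

```python
import numpy as np

w_values = np.array([0, 2000, 4000])
h_values = np.array([0, 2000, 4000])
joint = np.array([[0.05, 0.15, 0.10],
                  [0.10, 0.10, 0.30],
                  [0.05, 0.05, 0.10]])

p_w, p_h = joint.sum(axis=1), joint.sum(axis=0)
e_w, e_h = np.sum(w_values * p_w), np.sum(h_values * p_h)
var_w = np.sum((w_values - e_w) ** 2 * p_w)              # 1.96e6
var_h = np.sum((h_values - e_h) ** 2 * p_h)              # 2.44e6

w_grid, h_grid = np.meshgrid(w_values, h_values, indexing="ij")
cov_wh = np.sum(w_grid * h_grid * joint) - e_w * e_h     # 1.2e5

rho_wh = cov_wh / np.sqrt(var_w * var_h)
print(rho_wh)                                            # about 0.0549

# Rescaling the units (e.g. incomes in thousands of Birr) changes the covariance
# and variances, but leaves the correlation coefficient unchanged.
alpha = beta = 1 / 1000
print((alpha * beta * cov_wh) / np.sqrt(alpha**2 * var_w * beta**2 * var_h))  # same value
```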
 The correlation coefficient ρXY always satisfies −1 ≤ ρXY ≤ 1.
 The closer ρXY is to 1 or −1, the stronger the relationship.

 It can be shown that if X and Y are exactly linearly related by Y = a + bX with b > 0, then ρXY = 1, that is, X and Y are perfectly correlated. X and Y are also perfectly correlated if they are exactly linearly related by Y = a + bX with b < 0, but then ρXY = −1. Thus,
 correlation measures only the strength of a linear relationship between X and Y;
 correlation does not imply causation.
Other notations for the correlation coefficient are
ρXY = σXY / (σX σY),

which uses covariance and standard deviation notation, and
ρXY = E[(X − μX)(Y − μY)] / √(E[(X − μX)²] E[(Y − μY)²])

Correlation, Covariance and Independence


Non-zero correlation and covariance between random variables X and Y indicate some linear association
between them, whilst independence of X and Y implies no relationship or association of any kind
between them. So, it is not surprising that
 independence of X and Y implies zero covariance: cov[X, Y] = 0;
 independence of X and Y implies zero correlation: ρXY = 0.
 The converse is not true in general: zero covariance or correlation does not imply independence. The reason is that there may be a relationship between X and Y which is not linear.

