Chapter 5: Joint Probability Distributions and Random Samples
Expected Values, Covariance, and Correlation
Proposition
Let X and Y be jointly distributed rv’s with pmf p(x, y) or pdf f (x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)], is

E[h(X, Y)] = Σx Σy h(x, y) · p(x, y)         X, Y discrete

E[h(X, Y)] = ∫ ∫ h(x, y) · f (x, y) dx dy    X, Y continuous
Example 13
Five friends have purchased tickets to a certain concert. If
the tickets are for seats 1–5 in a particular row and the
tickets are randomly distributed among the five, what
is the expected number of seats separating any particular
two of the five?
Let X and Y denote the seat numbers of the first and second individuals, respectively. The number of seats separating the two is then h(X, Y) = |X – Y| – 1, and since the tickets are assigned at random, each of the 20 ordered pairs of distinct seat numbers is equally likely:

p(x, y) = 1/20    x = 1, . . . , 5; y = 1, . . . , 5; x ≠ y

p(x, y) = 0       otherwise
Example 13 cont’d
Thus

E[h(X, Y)] = Σ Σ h(x, y) · p(x, y) = Σ(x ≠ y) (|x – y| – 1) · 1/20 = 1

The expected number of seats separating any particular two of the five is 1.
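The value E[h(X, Y)] = 1 can be confirmed by brute-force enumeration of the 20 equally likely seat assignments; the following sketch (not part of the original example) does exactly that.

```python
from fractions import Fraction

def expected_separation():
    """Average h(x, y) = |x - y| - 1 over all 20 equally likely
    ordered pairs of distinct seat numbers, each with p(x, y) = 1/20."""
    total = Fraction(0)
    for x in range(1, 6):
        for y in range(1, 6):
            if x != y:
                total += Fraction(abs(x - y) - 1, 20)
    return total

# expected_separation() returns exactly 1, matching E[h(X, Y)] = 1.
```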
Covariance
When two random variables X and Y are not independent,
it is frequently of interest to assess how strongly they are
related to one another.
Definition
The covariance between two rv’s X and Y is

Cov(X, Y) = E[(X – μX)(Y – μY)]

          = Σx Σy (x – μX)(y – μY) p(x, y)         X, Y discrete

          = ∫ ∫ (x – μX)(y – μY) f (x, y) dx dy    X, Y continuous
That is, since X – μX and Y – μY are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that Cov(X, X) = E[(X – μX)2] = V(X).
Suppose a strong positive relationship exists between X and Y, so that large values of X tend to occur with large values of Y and small values with small values. Then most of the probability mass or density will be associated with (x – μX) and (y – μY) either both positive (both X and Y above their respective means) or both negative, so the product (x – μX)(y – μY) will tend to be positive. Thus Cov(X, Y) will be quite positive for a strong positive relationship and quite negative for a strong negative relationship.
Figure 5.4 illustrates the different possibilities. The
covariance depends on both the set of possible pairs and
the probabilities. In Figure 5.4, the probabilities could be
changed without altering the set of possible pairs, and this
could drastically change the value of Cov(X, Y).
Example 15 cont’d
Therefore,

Cov(X, Y) = Σ(x, y) (x – 175)(y – 125) p(x, y) = 1875
The following shortcut formula for Cov(X, Y) simplifies the computations.

Proposition

Cov(X, Y) = E(XY) – μX · μY
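As a sanity check, the definitional double sum and the shortcut formula can be compared numerically. The joint pmf below is made up for illustration; it is not the pmf of Example 15.

```python
# Hypothetical joint pmf, for illustration only (not from Example 15).
pmf = {(0, 0): 0.2, (0, 1): 0.2, (0, 2): 0.1,
       (1, 0): 0.05, (1, 1): 0.15, (1, 2): 0.3}

mu_x = sum(x * p for (x, y), p in pmf.items())
mu_y = sum(y * p for (x, y), p in pmf.items())

# Definition: Cov(X, Y) = sum over (x, y) of (x - mu_x)(y - mu_y) p(x, y)
cov_def = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())

# Shortcut: Cov(X, Y) = E(XY) - mu_x * mu_y
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov_short = e_xy - mu_x * mu_y
```

Both expressions give the same value for any valid joint pmf; expanding the product in the definition is exactly how the shortcut is derived.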
Correlation
Definition
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρX,Y, or just ρ, is defined by

ρX,Y = Cov(X, Y) / (σX · σY)
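A small sketch of this definition, using a hypothetical joint pmf (any valid pmf would do), computes ρ from the covariance and the two standard deviations:

```python
import math

# Hypothetical joint pmf, for illustration only.
pmf = {(1, 1): 0.3, (1, 2): 0.2, (2, 1): 0.1, (2, 2): 0.4}

def corr(pmf):
    """Corr(X, Y) = Cov(X, Y) / (sigma_X * sigma_Y) for a discrete joint pmf."""
    mu_x = sum(x * p for (x, y), p in pmf.items())
    mu_y = sum(y * p for (x, y), p in pmf.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, y), p in pmf.items())
    var_y = sum((y - mu_y) ** 2 * p for (x, y), p in pmf.items())
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))
```

Dividing by σX · σY rescales the covariance into a unit-free quantity, which is what makes ρ comparable across problems.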
Example 17
It is easily verified that in the insurance scenario of Example 15, E(X2) = 36,250, so that σX2 = 36,250 – (175)2 = 5,625 and σX = 75. Computing σY from E(Y2) in the same way and substituting, this gives

ρ = Cov(X, Y) / (σX · σY) = 1875 / (σX · σY)
The following proposition shows that ρ remedies the defect of Cov(X, Y) and also suggests how to recognize the existence of a strong (linear) relationship.

Proposition
1. If a and c are either both positive or both negative,

   Corr(aX + b, cY + d) = Corr(X, Y)

2. For any two rv’s X and Y, –1 ≤ Corr(X, Y) ≤ 1.
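Property 1 can be checked numerically: shifting X and Y by any constants and rescaling them by constants of the same sign leaves the correlation unchanged. The pmf and the constants a, b, c, d below are arbitrary choices for illustration.

```python
import math

# Hypothetical joint pmf; the invariance holds for any valid pmf.
pmf = {(0, 0): 0.25, (0, 1): 0.15, (1, 0): 0.1, (1, 1): 0.5}

def corr_from_pairs(pairs):
    """Corr(X, Y) computed directly from a {(x, y): p} joint pmf."""
    mu_x = sum(x * p for (x, y), p in pairs.items())
    mu_y = sum(y * p for (x, y), p in pairs.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, y), p in pairs.items())
    var_y = sum((y - mu_y) ** 2 * p for (x, y), p in pairs.items())
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pairs.items())
    return cov / math.sqrt(var_x * var_y)

a, b, c, d = 2.0, 3.0, 5.0, -1.0  # a and c both positive
transformed = {(a * x + b, c * y + d): p for (x, y), p in pmf.items()}
```

Intuitively, the shifts cancel inside the deviations and the scale factors cancel between Cov and the standard deviations, so ρ is unchanged.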
If we think of p(x, y) or f (x, y) as prescribing a mathematical model for how the two numerical variables X and Y are distributed in some population (height and weight, verbal SAT score and quantitative SAT score, etc.), then ρ is a population characteristic or parameter that measures how strongly X and Y are related in the population.
Proposition
1. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.
2. ρ = 1 or –1 if and only if Y = aX + b for some numbers a and b with a ≠ 0.
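The first statement can be illustrated by constructing an independent joint pmf as the product of two marginals (hypothetical numbers) and verifying that the covariance, and hence ρ, is zero.

```python
# Build an independent joint pmf as the product of two marginal pmfs
# (hypothetical numbers, for illustration only).
px = {0: 0.3, 1: 0.7}
py = {10: 0.4, 20: 0.6}
pmf = {(x, y): px[x] * py[y] for x in px for y in py}

mu_x = sum(x * p for (x, y), p in pmf.items())
mu_y = sum(y * p for (x, y), p in pmf.items())

# Under independence E(XY) = E(X)E(Y), so the covariance vanishes.
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())
```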
This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be.
Also, ρ = 0 does not imply that X and Y are independent, but only that there is a complete absence of a linear relationship. When ρ = 0, X and Y are said to be uncorrelated.
Example 18
Let X and Y be discrete rv’s with joint pmf
Example 18 cont’d
Then E(XY) = 0 and μX · μY = 0, so Cov(X, Y) = 0 and ρ = 0, even though X and Y are not independent.
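Since the joint pmf table of Example 18 is not reproduced above, the sketch below uses a standard pmf of this kind (four equally likely points with Y completely determined by X) to exhibit ρ = 0 without independence; it is an illustration, not necessarily the exact pmf of the example.

```python
from fractions import Fraction

# Four equally likely points; y is a function of x, so X and Y
# are completely dependent.
quarter = Fraction(1, 4)
pmf = {(-4, 1): quarter, (4, -1): quarter, (2, 2): quarter, (-2, -2): quarter}

mu_x = sum(x * p for (x, y), p in pmf.items())   # 0 by symmetry
mu_y = sum(y * p for (x, y), p in pmf.items())   # 0 by symmetry
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov = e_xy - mu_x * mu_y                         # 0, hence rho = 0

# Yet X and Y are not independent: p(-4, 1) != pX(-4) * pY(1).
px_m4 = sum(p for (x, y), p in pmf.items() if x == -4)
py_1 = sum(p for (x, y), p in pmf.items() if y == 1)
```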
A value of ρ near 1 does not necessarily imply that increasing the value of X causes Y to increase. It implies only that large X values are associated with large Y values.
For example, vocabulary size and number of cavities tend to be positively correlated across children as a group, because both increase as a child gets older. For children of a fixed age, there is probably a low correlation between number of cavities and vocabulary size.