Lecture 1: Review of Basic Probability
This lecture covers concepts from probability that you should have seen before if you have
taken the right prerequisites to be adequately prepared for this course.
1.1 Random Variables and Distributions
For our purposes, random variables will be one of two types: discrete or continuous. (Also
see the note immediately after Example 1.1.3.)
Discrete Random Variables
A random variable X is discrete if its set of possible values 𝒳 is finite or countably infinite.
The set 𝒳 of possible values is called the support of the random variable.
The probability mass function (pmf) of a discrete random variable X is the nonnegative
function f (x) = P (X = x), where x denotes each possible value that X can take. It is
always true that ∑_{x∈𝒳} f(x) = 1.
The cumulative distribution function (cdf) of a random variable X is F(x) = P(X ≤ x).
If X is discrete, then F(x) = ∑_{t∈𝒳 : t≤x} f(t), and so the cdf consists of constant sections
separated by jump discontinuities.
Specific types of discrete random variables include binomial, geometric, Poisson, and discrete
uniform random variables.
Example 1.1.1: A fair coin is flipped three times. Let X be the total number of heads.
The set of possible values of X is 𝒳 = {0, 1, 2, 3}, a finite set, so X is discrete. Its pmf is
f (0) = f (3) = 1/8, f (1) = f (2) = 3/8. Note that X is a binomial random variable.
Example 1.1.2: A fair coin is flipped repeatedly until it comes up heads. Let X be the total
number of flips needed to obtain heads. The set of possible values of X is 𝒳 = {1, 2, 3, . . .}, a
countably infinite set, so X is discrete. Its pmf is f(x) = 2^(−x) for every x ∈ {1, 2, 3, . . .}. Note
that X is a geometric random variable.
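As a quick numerical sanity check (not part of the original examples), the following Python sketch compares the two pmfs above against scipy.stats: the binomial pmf should give 1/8, 3/8, 3/8, 1/8, and the geometric pmf should agree with 2^(−x).

```python
from scipy.stats import binom, geom

# Example 1.1.1: X ~ Binomial(n=3, p=0.5), the number of heads in three fair flips.
for x in range(4):
    print(x, binom.pmf(x, n=3, p=0.5))   # 0.125, 0.375, 0.375, 0.125

# Example 1.1.2: X ~ Geometric(p=0.5), the flip on which the first head appears.
# scipy's geom uses the same support {1, 2, 3, ...} as the example.
for x in range(1, 6):
    print(x, geom.pmf(x, p=0.5), 2.0 ** (-x))   # the last two columns agree
```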
Continuous Random Variables
A random variable X is continuous if there is a nonnegative function f, called the probability
density function (pdf) of X, such that P(X ∈ A) = ∫_A f(x) dx for every set A. It is always
true that ∫_{−∞}^{∞} f(x) dx = 1, and the cdf of a continuous random variable is the continuous
function F(x) = ∫_{−∞}^{x} f(t) dt.
Specific types of continuous random variables include normal, exponential, beta, gamma,
chi-squared, Student's t, and continuous uniform random variables.
Example 1.1.3: Let X be the amount of time in hours that an electrical component functions before breaking down. This random variable might have the pdf
f(x) = exp(−x) I_[0,∞)(x) = { exp(−x) if x ≥ 0,  0 if x < 0 },
which we recognize as an exponential distribution. The probability that the part functions for more than t hours is then P(X > t) = ∫_t^∞ exp(−x) dx = exp(−t).
Note: The cdf is a more general description of a random variable than the pmf or pdf,
since it has a single definition that applies for both discrete and continuous random
variables. In fact, there is no difficulty in writing down the cdf of a mixed random
variable that is neither wholly discrete nor wholly continuous. Such a cdf would simply include both jump discontinuities and regions where it is continuously increasing.
However, such mixed random variables have neither a pmf nor a pdf in the senses
considered here.
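The exponential tail probability exp(−t) from Example 1.1.3 can be checked numerically. The short sketch below is an illustration (not from the notes); it uses scipy.stats.expon, whose default scale of 1 matches the pdf exp(−x) on [0, ∞).

```python
import numpy as np
from scipy.stats import expon

t = np.array([0.5, 1.0, 2.0])
print(expon.sf(t))        # survival function P(X > t) for the rate-1 exponential
print(np.exp(-t))         # matches exp(-t)
print(expon.cdf(1.0))     # P(X <= 1) = 1 - exp(-1), roughly 0.632
```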
Clarification of Notation
We may sometimes need to clarify our notation for pmfs or pdfs in two ways:
When dealing with more than one random variable, we may need to explicitly denote
the random variable to which a pmf or pdf corresponds. If so, we will write f^(X)(x)
for the pmf or pdf of X evaluated at x.
A pmf or pdf often depends on one or more parameters. We may need to explicitly
indicate the value of a parameter at which the pmf or pdf is calculated. If so, we will
write f_θ(x) for the pmf or pdf evaluated at the parameter value θ.
Transformations of Random Variables
If Y = g(X) for some function g, then Y is itself a random variable. If X is discrete, then Y is also discrete, and its pmf is
f^(Y)(y) = P[g(X) = y] = ∑_{x∈𝒳 : g(x)=y} P(X = x) = ∑_{x∈𝒳 : g(x)=y} f^(X)(x).
If X is continuous and g is one-to-one with a differentiable inverse, then the pdf of Y is
f^(Y)(y) = f^(X)[g^(−1)(y)] |d/dy g^(−1)(y)|.
More generally, the distribution of Y = g(X) can often be found by working directly with its cdf, as in the next example.
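To illustrate the discrete case of the transformation formula, the sketch below (illustrative code, not from the notes; the choice Y = |X − 1| is hypothetical) computes the pmf of Y = |X − 1| for the binomial X of Example 1.1.1 by summing f^(X)(x) over the x values that map to each y.

```python
from collections import defaultdict

# pmf of X from Example 1.1.1 (number of heads in three fair flips)
pmf_X = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def pmf_of_transformed(pmf_X, g):
    """pmf of Y = g(X): sum f^(X)(x) over all x with g(x) = y."""
    pmf_Y = defaultdict(float)
    for x, p in pmf_X.items():
        pmf_Y[g(x)] += p
    return dict(pmf_Y)

print(pmf_of_transformed(pmf_X, lambda x: abs(x - 1)))
# {1: 0.5, 0: 0.375, 2: 0.125}
```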
Example 1.1.4: Let X be the random variable in Example 1.1.3, and let Y = X². Then the cdf of Y is
F^(Y)(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = F^(X)(√y) = ∫_0^{√y} exp(−x) dx = 1 − exp(−√y)
if y ≥ 0 (and zero otherwise). Then the pdf of Y can be obtained by differentiating the cdf:
f^(Y)(y) = d/dy F^(Y)(y) = exp(−√y) / (2√y)
for y > 0 (and zero otherwise).
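A Monte Carlo check of Example 1.1.4 (an illustration, not part of the notes): simulate X from the rate-1 exponential distribution, square it, and compare the empirical cdf of Y = X² with the derived cdf 1 − exp(−√y).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exponential(1)
y = x ** 2                                       # Y = X^2

for t in [0.25, 1.0, 4.0]:
    empirical = np.mean(y <= t)            # empirical cdf of Y at t
    derived = 1 - np.exp(-np.sqrt(t))      # cdf derived in Example 1.1.4
    print(t, round(empirical, 4), round(derived, 4))
```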
Joint Distributions
The joint pmf of discrete random variables X and Y is f^(X,Y)(x, y) = P(X = x, Y = y), and it satisfies ∑_{x∈𝒳} ∑_{y∈𝒴} f^(X,Y)(x, y) = 1, where 𝒴 denotes the support of Y. The joint pdf of continuous random variables X and Y is the nonnegative function f^(X,Y)(x, y) such that
∫∫_A f^(X,Y)(x, y) dx dy = P[(X, Y) ∈ A]
for every set A ⊆ ℝ².
Note: When we write f(x, y) without clarification, we mean the joint pdf f^(X,Y)(x, y).
The marginal pmf or pdf of each variable is obtained from the joint pmf or pdf by summing or integrating out the other variable:
f^(X)(x) = ∑_{y∈𝒴} f^(X,Y)(x, y) or f^(X)(x) = ∫_{−∞}^{∞} f^(X,Y)(x, y) dy,
f^(Y)(y) = ∑_{x∈𝒳} f^(X,Y)(x, y) or f^(Y)(y) = ∫_{−∞}^{∞} f^(X,Y)(x, y) dx.
The conditional pmf or pdf of X given Y = y and of Y given X = x are
f^(X|Y)(x|y) = f^(X,Y)(x, y) / f^(Y)(y),    f^(Y|X)(y|x) = f^(X,Y)(x, y) / f^(X)(x).
Note: When X and Y are continuous, a conditional pdf cannot be interpreted directly as a conditional probability, since the naive calculation
P(X ∈ A | Y = y) = P(X ∈ A, Y = y) / P(Y = y)
reduces to the undefined ratio 0/0, since Y is a continuous random variable. See the note on page 146 of DeGroot and Schervish for additional explanation of what conditional pdfs actually represent.
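As a concrete numerical illustration of these formulas (the joint pdf here is a hypothetical choice, not one from the notes), take f^(X,Y)(x, y) = x + y on the unit square; the marginal of X is then f^(X)(x) = x + 1/2, which the sketch below recovers by numerical integration, along with the conditional pdf f^(X|Y)(x|y).

```python
from scipy.integrate import quad

def f_joint(x, y):
    """Hypothetical joint pdf f(x, y) = x + y on the unit square [0, 1]^2."""
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def f_X(x):
    """Marginal pdf of X: integrate the joint pdf over y."""
    return quad(lambda y: f_joint(x, y), 0, 1)[0]

def f_Y(y):
    """Marginal pdf of Y: integrate the joint pdf over x."""
    return quad(lambda x: f_joint(x, y), 0, 1)[0]

def f_X_given_Y(x, y):
    """Conditional pdf of X given Y = y: joint divided by marginal of Y."""
    return f_joint(x, y) / f_Y(y)

print(f_X(0.3))                 # about 0.8, matching x + 1/2
print(f_X_given_Y(0.3, 0.5))    # (0.3 + 0.5) / 1.0 = 0.8
```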
Independence
Random variables X and Y are called independent if P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
for all sets A and B. Random variables X and Y are independent if and only if their joint
pmf or pdf factorizes into their marginal pmfs or pdfs, i.e., f^(X,Y)(x, y) = f^(X)(x) f^(Y)(y)
for all x and y.
Example 1.1.5: As in Example 1.1.1, a fair coin is flipped three times, with X denoting
the total number of heads. Also let Y count the number of heads in the first flip (i.e., Y = 1
if the first flip is heads and Y = 0 if the first flip is tails). Then the joint pmf of X and Y is
f^(X,Y)(0, 0) = 1/8,  f^(X,Y)(1, 0) = 1/4,  f^(X,Y)(2, 0) = 1/8,  f^(X,Y)(3, 0) = 0,
f^(X,Y)(0, 1) = 0,    f^(X,Y)(1, 1) = 1/8,  f^(X,Y)(2, 1) = 1/4,  f^(X,Y)(3, 1) = 1/8.
The marginal pmf of X is the same as in Example 1.1.1, while the marginal pmf of Y is
simply f (Y ) (0) = f (Y ) (1) = 1/2. The conditional pmf of X given that Y = 0 is
f^(X|Y)(0|0) = f^(X,Y)(0, 0) / f^(Y)(0) = (1/8)/(1/2) = 1/4,
f^(X|Y)(1|0) = f^(X,Y)(1, 0) / f^(Y)(0) = (1/4)/(1/2) = 1/2,
f^(X|Y)(2|0) = f^(X,Y)(2, 0) / f^(Y)(0) = (1/8)/(1/2) = 1/4,
f^(X|Y)(3|0) = f^(X,Y)(3, 0) / f^(Y)(0) = 0/(1/2) = 0,
with the other conditional pmfs calculated similarly. Note that X and Y are not independent
since (for example) f^(X,Y)(0, 1) = 0 ≠ 1/16 = f^(X)(0) f^(Y)(1). (However, note from the
construction of the example that the random variables Y and X − Y are independent.)
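The joint, marginal, and conditional pmfs in Example 1.1.5 can also be reproduced by brute-force enumeration of the eight equally likely flip sequences. The sketch below is illustrative (not from the notes); it additionally confirms that X and Y are dependent while Y and X − Y are independent.

```python
from itertools import product
from collections import defaultdict

joint = defaultdict(float)          # joint pmf of (X, Y)
joint_diff = defaultdict(float)     # joint pmf of (Y, X - Y), for the final remark
for flips in product('HT', repeat=3):   # eight equally likely sequences
    x = flips.count('H')                # total number of heads
    y = 1 if flips[0] == 'H' else 0     # heads on the first flip
    joint[(x, y)] += 1 / 8
    joint_diff[(y, x - y)] += 1 / 8

marg_X = defaultdict(float)
marg_Y = defaultdict(float)
for (x, y), p in joint.items():
    marg_X[x] += p
    marg_Y[y] += p

# Conditional pmf of X given Y = 0: 1/4, 1/2, 1/4, 0 for x = 0, 1, 2, 3.
print({x: joint[(x, 0)] / marg_Y[0] for x in range(4)})

# X and Y are dependent: f(0, 1) = 0 but f^(X)(0) f^(Y)(1) = 1/16.
print(joint[(0, 1)], marg_X[0] * marg_Y[1])

# Y and X - Y are independent: their joint pmf factorizes in every cell.
marg_D = defaultdict(float)
for (y, d), p in joint_diff.items():
    marg_D[d] += p
print(all(abs(joint_diff[(y, d)] - marg_Y[y] * marg_D[d]) < 1e-12
          for y in (0, 1) for d in (0, 1, 2)))
```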
1.2
There are several quantities that we can calculate to summarize random variables.
Expectation
For our purposes, the expectation or expected value E(X) of a random variable X is defined as
E(X) = ∑_{x∈𝒳} x f(x) if X is discrete, or E(X) = ∫_{−∞}^{∞} x f(x) dx if X is continuous.
More generally, the expectation of a function g(X) of X is E[g(X)] = ∑_{x∈𝒳} g(x) f(x) or
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx. Note that a more general formula for all cases above is E[g(X)] = ∫ g(x) dF(x).
An expectation can be ∞ or −∞, or it can fail to exist altogether, as the next example shows.
Example 1.2.2: Let X have a t distribution with one degree of freedom (also known as a
Cauchy distribution), which has the pdf f(x) = [π(1 + x²)]^(−1) for all x ∈ ℝ. Since the pdf is
symmetric about zero, it might seem as though E(X) should be zero. However, this is false:
E(X) = ∫_{−∞}^{∞} x / [π(1 + x²)] dx = ∫_0^{∞} x / [π(1 + x²)] dx + ∫_{−∞}^0 x / [π(1 + x²)] dx = ∞ − ∞,
where the positive part is ∞ and the negative part is −∞. Since ∞ − ∞ is undefined, E(X) does not exist.
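A simulation makes the failure of E(X) to exist tangible (this sketch is illustrative, not from the notes): running means of Cauchy draws do not settle down as the sample size grows, unlike running means of, say, standard normal draws.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [10**3, 10**5, 10**7]:
    cauchy_mean = rng.standard_cauchy(n).mean()   # does not converge to anything
    normal_mean = rng.standard_normal(n).mean()   # converges to 0 by the law of large numbers
    print(n, round(cauchy_mean, 3), round(normal_mean, 3))
```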
In the presence of multiple random variables, the sum or integral should be taken over the
joint pmf or pdf, i.e., E[g(X, Y)] = ∫∫_{ℝ²} g(x, y) f(x, y) dx dy or E[g(X, Y)] = ∑_{x∈𝒳} ∑_{y∈𝒴} g(x, y) f(x, y).
For functions of X only or Y only, the sum or integral may equivalently be taken over the
corresponding marginal pmf or pdf.
Clarification of Notation
The pmf or pdf of X often depends on one or more parameters. In general, the value of
E[g(X)] may also depend on these same parameters. To explicitly indicate this, we will
write E_θ[g(X)] for the expectation of g(X) computed with a parameter value θ.
Note: You may have seen people write things like E_X[g(X)] for what we've called
E[g(X)] or E_θ[g(X)]. The E_X[g(X)] notation is problematic for multiple reasons,
and we will not use such notation in this course.
Variance
The variance Var(X) of a random variable X is defined as Var(X) = E{[X − E(X)]²}.
An equivalent (and typically easier) formula is Var(X) = E(X²) − [E(X)]². Similarly,
the variance of a function g(X) of a random variable X is Var[g(X)] = E{[g(X)]²} − {E[g(X)]}².
Example 1.2.3: In Example 1.1.1, we have
E(X²) = (1/8)(0²) + (3/8)(1²) + (3/8)(2²) + (1/8)(3²) = 3,
Var(X) = E(X²) − [E(X)]² = 3 − (3/2)² = 3/4.
(Of course, we could also simply look up the variance of a binomial random variable.)
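The arithmetic in Example 1.2.3 is easy to verify directly from the pmf (a small illustrative check, not part of the notes):

```python
# pmf of X from Example 1.1.1
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

EX = sum(x * p for x, p in pmf.items())        # E(X) = 1.5
EX2 = sum(x**2 * p for x, p in pmf.items())    # E(X^2) = 3.0
print(EX2, EX2 - EX**2)                        # 3.0 and Var(X) = 0.75
```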
Since the variance is just a special kind of expectation, it can also be ∞ or fail to exist.
Note: If E[g(X)] is infinite or does not exist, then Var[g(X)] does not exist. If
E[g(X)] is finite, then Var[g(X)] is guaranteed to exist, although it may be ∞. It
can be shown that if E{[g(X)]²} is finite, then E[g(X)] is finite as well. Thus, if
Var[g(X)] < ∞, then E[g(X)] exists and is finite.
We will write Var_θ[g(X)] when necessary to explicitly indicate the dependence of the variance on a parameter value θ.
Covariance
The covariance Cov(X, Y) of a random variable X and a random variable Y is defined
as Cov(X, Y) = E{[X − E(X)][Y − E(Y)]}. An equivalent (and typically easier) formula is Cov(X, Y) = E(XY) − E(X) E(Y). Similarly, the covariance of g(X) and h(Y)
is Cov[g(X), h(Y)] = E[g(X) h(Y)] − E[g(X)] E[h(Y)].
Example 1.2.4: In Example 1.1.5,
E(XY) = (1/8)(0)(0) + (1/4)(1)(0) + (1/8)(1)(1) + (1/8)(2)(0) + (1/4)(2)(1) + (1/8)(3)(1) = 1.
Then Cov(X, Y) = E(XY) − E(X) E(Y) = 1 − (3/2)(1/2) = 1/4.
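Likewise, Cov(X, Y) in Example 1.2.4 can be checked directly from the joint pmf (illustrative code, not part of the notes):

```python
# joint pmf of (X, Y) from Example 1.1.5
joint = {(0, 0): 1/8, (1, 0): 1/4, (2, 0): 1/8, (3, 0): 0,
         (0, 1): 0,   (1, 1): 1/8, (2, 1): 1/4, (3, 1): 1/8}

EXY = sum(x * y * p for (x, y), p in joint.items())   # E(XY) = 1.0
EX = sum(x * p for (x, y), p in joint.items())        # E(X) = 1.5
EY = sum(y * p for (x, y), p in joint.items())        # E(Y) = 0.5
print(EXY - EX * EY)                                  # Cov(X, Y) = 0.25
```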
Since the covariance is just a special kind of expectation, it can also be ±∞ or fail to exist.
Note: If either E[g(X)] or E[h(Y)] does not exist, then Cov[g(X), h(Y)] does not
exist either. However, if Var[g(X)] < ∞ and Var[h(Y)] < ∞, then Cov[g(X), h(Y)] is
guaranteed to exist and to be finite (see the Cauchy-Schwarz inequality below).
Cauchy-Schwarz Inequality
For any g(X) and h(Y) with finite variances,
{Cov[g(X), h(Y)]}² ≤ Var[g(X)] Var[h(Y)],
with equality if and only if g(X) = a + b h(Y) with probability 1 for some constants a and b.
Note: The Cauchy-Schwarz inequality is actually a much more general result than just
what is stated above.
Conditional Expectation
The conditional expectation of g(X) given Y = y is computed just like an ordinary expectation, but using the conditional pmf or pdf of X given Y = y:
E[g(X) | Y = y] = ∑_{x∈𝒳} g(x) f^(X|Y)(x|y) or E[g(X) | Y = y] = ∫_{−∞}^{∞} g(x) f^(X|Y)(x|y) dx.
Notice that computing E[g(X) | Y = y] yields (in general) different results for different values
of y. Thus, E[g(X) | Y = y] is a function of y (and not a random variable). However, we
can consider plugging the random variable Y into this function, which does yield a random
variable. This random variable is what we mean when we write E[g(X) | Y].
Note: A formal treatment of conditional expectation is a bit more complicated than
this, but the explanation above is good enough for our purposes.
For instance, using the conditional pmf of X given Y = 0 from Example 1.1.5, we have E(X | Y = 0) = 0(1/4) + 1(1/2) + 2(1/4) + 3(0) = 1 and E(X² | Y = 0) = 0(1/4) + 1(1/2) + 4(1/4) + 9(0) = 3/2, so
Var(X | Y = 0) = E(X² | Y = 0) − [E(X | Y = 0)]² = 3/2 − 1² = 1/2.
Of course, it is not a coincidence that these are simply the mean and variance of the number
of heads in two fair coin flips.
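The conditional mean and variance above can be checked from the conditional pmf of X given Y = 0 computed in Example 1.1.5 (a short illustrative check, not part of the notes):

```python
# conditional pmf of X given Y = 0, from Example 1.1.5
cond = {0: 1/4, 1: 1/2, 2: 1/4, 3: 0}

EX_given = sum(x * p for x, p in cond.items())         # E(X | Y = 0) = 1.0
EX2_given = sum(x**2 * p for x, p in cond.items())     # E(X^2 | Y = 0) = 1.5
print(EX_given, EX2_given - EX_given**2)               # 1.0 and Var(X | Y = 0) = 0.5
```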