Chapter 2 - Probability
EXAMPLE
Suppose that a die is rolled twice; assume that the variable of interest is whether the
number that turns up is odd or even. There are four possible outcomes of this random
experiment: {even, even}, {even, odd}, {odd, even}, {odd, odd}.
The set of all possible outcomes of a random experiment is known as the sample space.
The sample space for this experiment consists of four sample points: S = {EE, EO, OE,
OO}. A subset of the sample space is known as an event.
EXAMPLE
Based on the die-rolling experiment, suppose that the event F is defined as follows: “at
most one even number turns up”. The event F is a set containing the following sample
points: F = {EO, OE, OO}.
For a random experiment in which there are n equally likely outcomes, each outcome has
a probability of occurring equal to 1/n. In the die-rolling example, each of the four
sample points is equally likely, so each has probability 1/4. The probability of an event is
the number of sample points it contains divided by the total number of sample points in
the sample space. In this example, the probability of event F is therefore P(F) = #F/#S = 3/4.
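The #F/#S calculation can be checked by brute-force enumeration. The minimal Python sketch below (standard library only; the names sample_space and F are illustrative) lists the four parity outcomes and counts those belonging to F.

from itertools import product
from fractions import Fraction

# Enumerate the four equally likely outcomes: each roll is recorded as 'E' (even) or 'O' (odd).
sample_space = list(product("EO", repeat=2))   # [('E','E'), ('E','O'), ('O','E'), ('O','O')]

# Event F: "at most one even number turns up".
F = [outcome for outcome in sample_space if outcome.count("E") <= 1]

# With equally likely outcomes, P(F) = #F / #S.
p_F = Fraction(len(F), len(sample_space))
print(F)      # [('E', 'O'), ('O', 'E'), ('O', 'O')]
print(p_F)    # 3/4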
AXIOMS OF PROBABILITY
An axiom is a logical statement that is assumed to be true without formal proof; all
further results are derived using axioms as a starting point. In probability theory, the
three fundamental axioms are:
1) For any event A, P(A) ≥ 0.
2) P(S) = 1, where S is the sample space.
3) For mutually exclusive events A1, A2, ..., An:
P(A1 ∪ A2 ∪ ... ∪ An) = ∑_{i=1}^{n} P(Ai)
RULES OF PROBABILITY
The rules of probability are derived from these axioms. Three of the most important rules
are the Addition Rule, the Multiplication Rule, and the Complement Rule.
ADDITION RULE
For any two events A and B, the addition rule states that:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
EXAMPLE
Referring to the die-rolling experiment, suppose that the following events are defined:
A: the first roll is even = {EE, EO}
B: exactly one roll is odd = {EO, OE}
What is the probability that the first roll is even or exactly one roll is odd?
Since the sample space consists of four equally likely sample points,
P(A) = 1/2
P(B) = 1/2
P(A ∩ B) = 1/4
Therefore, P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/2 + 1/2 − 1/4 = 3/4.
If events A and B cannot both occur at the same time, they are said to be mutually
exclusive or disjoint; in this case, P(A ∩ B) = 0. Therefore, the addition rule for
mutually exclusive events is: P(A ∪ B) = P(A) + P(B).
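As a quick check of the addition rule on this sample space, the following Python sketch enumerates the outcomes making up A, B, and A ∪ B; it assumes the event definitions given above (A: first roll even, B: exactly one roll odd).

from itertools import product
from fractions import Fraction

sample_space = set(product("EO", repeat=2))

# A: the first roll is even; B: exactly one roll is odd.
A = {s for s in sample_space if s[0] == "E"}
B = {s for s in sample_space if s.count("O") == 1}

def prob(event):
    # Equally likely outcomes, so probability = #event / #S.
    return Fraction(len(event), len(sample_space))

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)   # 3/4 3/4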
CONDITIONAL PROBABILITY
The conditional probability of an event is the probability that it occurs given that
another event occurs. For two events A and B, the probability that B occurs given that A
occurs is written as P(B|A) and is computed as:
P(B|A) = P(A ∩ B)/P(A)
Equivalently, the probability that A occurs given that B occurs is computed as:
P(A|B) = P(A ∩ B)/P(B)
EXAMPLE
For the die-rolling experiment, with A: the first roll is even and B: exactly one roll is odd,
the probability that exactly one roll is odd given that the first roll is even is:
P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/2) = 1/2
MULTIPLICATION RULE
The multiplication rule states that for any two events A and B:
P(B ∩ A) = P(B|A)P(A)
EXAMPLE
Suppose that a card is chosen from a standard deck without being replaced; a second card
is then chosen. What is the probability that both cards are clubs? Define:
A: the first card is a club
B: the second card is a club
Since there are thirteen clubs in a standard deck of cards and fifty-two cards in the deck,
P(A) = 13/52 = 1/4
The probability of B given A is computed as follows: if the first card is a club (event A),
then 51 cards remain in the deck when the second card is chosen, and 12 of the original 13
clubs remain among them. Therefore, P(B|A) = 12/51.
By the multiplication rule, P(B ∩ A) = P(B|A)P(A) = (12/51)(1/4) = 1/17 ≈ 0.0588.
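Both the exact product and a simulation can be used to sanity-check this example. The sketch below (Python standard library only; the deck is modeled simply as 13 clubs and 39 other cards) should print 1/17 followed by a simulated frequency close to 0.0588.

import random
from fractions import Fraction

# Exact calculation using the multiplication rule: P(both clubs) = P(A) * P(B | A).
p_A = Fraction(13, 52)          # first card is a club
p_B_given_A = Fraction(12, 51)  # second card is a club, given the first was
print(p_A * p_B_given_A)        # 1/17

# Monte Carlo check: draw two cards without replacement many times.
deck = ["club"] * 13 + ["other"] * 39
trials = 100_000
hits = sum(random.sample(deck, 2) == ["club", "club"] for _ in range(trials))
print(hits / trials)            # close to 1/17 ≈ 0.0588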
INDEPENDENT EVENTS
Two events A and B are said to be independent if the occurrence of A does not affect the
probability of B occurring, and vice versa.
EXAMPLE
Referring to the die-rolling experiment, suppose that event C depends only on the first
roll and event D depends only on the second roll (for instance, C: the first roll is even and
D: the second roll is even). Intuitively, events C and D are independent since the two rolls
of the die have no influence on each other.
Events A and B are independent if and only if both of the following conditions hold:
P(B | A) = P(B)
P(A | B) = P(A)
EXAMPLE
Referring to events A (the first roll is even) and B (exactly one roll is odd) from the
addition rule example:
P(A) = 1/2
P(B) = 1/2
P(B ∩ A) = 1/4
P(A|B) = P(B ∩ A) /P(B) = (1/4)/(1/2) = 1/2
P(B|A) = P(B ∩ A) /P(A) = (1/4)/(1/2) = 1/2
Since P(A|B) = P(A) and P(B|A) = P(B), A and B are independent events.
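The two independence conditions can also be verified by enumeration; the sketch below assumes the same events A and B defined earlier and computes the conditional probabilities from their definition.

from itertools import product
from fractions import Fraction

sample_space = set(product("EO", repeat=2))
A = {s for s in sample_space if s[0] == "E"}        # first roll even
B = {s for s in sample_space if s.count("O") == 1}  # exactly one roll odd

def prob(event):
    return Fraction(len(event), len(sample_space))

# Conditional probabilities from the definition P(X | Y) = P(X ∩ Y) / P(Y).
p_A_given_B = prob(A & B) / prob(B)
p_B_given_A = prob(A & B) / prob(A)

print(p_A_given_B == prob(A))   # True  -> P(A | B) = P(A)
print(p_B_given_A == prob(B))   # True  -> P(B | A) = P(B)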
For two independent events A and B, the multiplication rule simplifies to:
P(A ∩ B) = P(A)P(B)
COMPLEMENT RULE
The complement of an event A, written A^C, consists of all outcomes in the sample space
that are not in A. Since A and A^C are mutually exclusive and together make up the entire
sample space:
P(A ∪ A^C) = 1
P(A ∩ A^C) = 0
Therefore, the complement rule is:
P(A) = 1 − P(A^C)
BAYES’ THEOREM
Bayes’ Theorem can be used to determine the conditional probability of an event. The
general formula for computing conditional probabilities is:
P(Ai | B) = P(B | Ai)P(Ai) / ∑_{j=1}^{n} P(B | Aj)P(Aj)
The sample space, the set of all possible outcomes, is partitioned into n mutually exclusive
events: A1, A2, ..., An; the denominator of the formula represents the total probability of
event B.
EXAMPLE
In this example, A^C = "the first roll is odd" = {OE, OO}. Using Bayes’ Theorem, the
probability that the first roll is even given that exactly one roll is odd is determined as
follows:
P(A | B) = (1/2)(1/2) / [(1/2)(1/2) + (1/2)(1/2)] = 1/2
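Bayes' Theorem for this example can be reproduced with exact fractions, as in the following minimal sketch (the partition is the first roll being even, A, or odd, A^C).

from fractions import Fraction

# Prior probabilities for the first roll.
p_A  = Fraction(1, 2)   # first roll even
p_Ac = Fraction(1, 2)   # first roll odd

# Likelihood of B ("exactly one roll is odd") in each case: the second roll decides it.
p_B_given_A  = Fraction(1, 2)
p_B_given_Ac = Fraction(1, 2)

# Bayes' Theorem: P(A | B) = P(B | A)P(A) / [P(B | A)P(A) + P(B | A^C)P(A^C)]
posterior = (p_B_given_A * p_A) / (p_B_given_A * p_A + p_B_given_Ac * p_Ac)
print(posterior)   # 1/2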
RANDOM VARIABLES
A random variable is a variable that assigns a numerical value to each sample point in the
sample space of a random experiment.
EXAMPLE
For the die-rolling experiment, define X = the number of even numbers that turn up. The
value of X for each sample point is:
SAMPLE POINT X
EE 2
EO 1
OE 1
OO 0
P(X = 0) = 1/4
P(X = 1) = 2/4
P(X = 2) = 1/4
Since X can only assume a finite number of different values, it is said to be a discrete
random variable.
The cumulative distribution function (cdf) of a random variable X, designated F(x), gives
the probability that X is less than or equal to a given value:
F(x) = P(X ≤ x)
where:
X = a random variable
x = a realization (value) of X
For a continuous random variable with probability density function f, the cdf is computed
as:
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u)du
EXAMPLE
Referring to the die-rolling experiment, the following table illustrates the cdf of X:
x {X ≤ x} F(x) = P(X ≤ x)
<0 {} 0
0 {OO} 1/4
1 {OO, EO, OE} 3/4
2 {OO, EO, OE, EE} 1
>2 {OO, EO, OE, EE} 1
For a discrete random variable, a table or a function showing the probability of each
possible value is known as a probability mass function (pmf), designated p(x):
p(x) = P(X = x)
where:
x = a possible value of X
For a discrete random variable, probabilities can be derived from the cdf as follows:
p(xi) = F(xi) − F(xi−1)
where xi−1 denotes the next-smallest possible value of X (and F is taken to be 0 below the
smallest possible value).
EXAMPLE
For the die-rolling experiment, the probability mass function of X is shown in the
following table:
x F(x) p(x)
<0 0 0
0 1/4 (1/4 – 0) = 1/4
1 3/4 (3/4 – 1/4) = 1/2
2 1 (1 – 3/4) = 1/4
>2 1 (1 – 1) = 0
P(X = 0) = 1/4
P(X = 1) = 2/4
P(X = 2) = 1/4
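Recovering the pmf from the cdf amounts to taking successive differences; the short sketch below does this for the die-rolling X.

from fractions import Fraction

# Cdf of X (number of even rolls) at its possible values.
cdf = {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(1, 1)}

# p(x) = F(x) - F(previous value); F is 0 below the smallest value.
pmf = {}
previous = Fraction(0)
for x in sorted(cdf):
    pmf[x] = cdf[x] - previous
    previous = cdf[x]

print(pmf)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}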
For a continuous random variable, a function from which the probability of any range of
possible values can be computed is known as a probability density function (pdf),
designated f(x). The pdf must satisfy:
∫_{−∞}^{∞} f(x)dx = 1
The pdf is the derivative of the cdf:
f(x) = dF(x)/dx
and the cdf is obtained by integrating the pdf:
F(x) = ∫_{−∞}^{x} f(u)du
For a continuous random variable, probabilities can be derived from the cdf as follows:
P(a ≤ X ≤ b) = F(b) − F(a)
PROBABILITY FUNCTION
Both probability mass functions and probability density functions are sometimes known
more simply as probability functions.
For two jointly distributed discrete random variables, X and Y, the joint probability
mass function is defined as:
p(x, y) = P(X = x, Y = y)
EXAMPLE
Suppose that a census is taken for a small town; the distribution of the number of children
among the families in this town is given as follows:
Define:
X = the number of boys in a family
Y = the number of girls in a family
This table shows the joint probabilities for every possible value of X and Y. For
example, the joint probability that a family has 2 boys and 1 girl (X = 2, Y = 1) is 0.0375.
MARGINAL PROBABILITY
For two jointly distributed random variables, X and Y, the marginal probability of X is
the probability that X assumes a given value, summed over all possible values of Y.
Equivalently, the marginal probability of Y is the probability that Y assumes a given
value, summed over all possible values of X.
pX(X = x) = ∑y p(x, y)
pY(Y = y) = ∑x p(x, y)
In the census example, the marginal probabilities of X are given by the row sums of the
joint pmf; the marginal probabilities of Y are given by the column sums.
EXAMPLE
The (marginal) probability that a family has one boy can be determined as follows:
P(one boy) = P(one boy and no girls) + P(one boy and one girl) + P(one boy and two
girls) + P(one boy and three girls)
EXAMPLE
The (marginal) probability that a family has two girls can be determined as follows:
P(two girls) = P(no boys and two girls) + P(one boy and two girls) + P(two boys and two
girls) + P(three boys and two girls)
UNCONDITIONAL PROBABILITY
The marginal probability of a random variable is also known as its unconditional
probability, since it does not depend on the value assumed by the other random variable.
CONDITIONAL PROBABILITY
For two jointly distributed discrete random variables, X and Y, the conditional probability
mass function of X given Y is:
pX|Y(x | y) = P(X = x | Y = y) = p(x, y) / pY(Y = y)
EXAMPLE
The probability that a family has one boy given that it has two girls is computed as
follows:
P(X = x | Y = y) = p(x, y) / pY(Y = y)
P(X = 1 | Y = 2) = p(1, 2) / pY(Y = 2)
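Because the census table itself is not reproduced here, the following sketch uses a small hypothetical joint pmf (illustrative values only, not the census data) to show how marginal and conditional probabilities are computed from a joint pmf.

from fractions import Fraction

# Hypothetical joint pmf for two discrete random variables X and Y.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8),
    (2, 0): Fraction(1, 8), (2, 1): Fraction(1, 4),
}

# Marginal pmfs: sum the joint pmf over the other variable.
p_X, p_Y = {}, {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, Fraction(0)) + p
    p_Y[y] = p_Y.get(y, Fraction(0)) + p

# Conditional pmf of X given Y = y: p(x, y) / p_Y(y).
def p_X_given_Y(x, y):
    return joint.get((x, y), Fraction(0)) / p_Y[y]

print(p_X)               # marginal distribution of X
print(p_X_given_Y(1, 1)) # Fraction(1, 4): (1/8) / (1/2)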
For two jointly distributed continuous random variables, X and Y, the joint probability
density function f(x, y) is defined so that:
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_{a}^{b} ∫_{c}^{d} f(x, y) dy dx
The marginal probability density functions of X and Y are:
fX(x) = ∫_{−∞}^{∞} f(x, y)dy
fY(y) = ∫_{−∞}^{∞} f(x, y)dx
The conditional probability density function of X given Y is:
fX|Y(x | y) = f(x, y) / fY(y)
For two jointly distributed random variables, X and Y, the joint cumulative distribution
function is defined as:
F(x, y) = P(X ≤ x, Y ≤ y)
For continuous random variables, the joint cdf is computed from the joint pdf as:
F(a, b) = P(X ≤ a, Y ≤ b) = ∫_{−∞}^{a} ∫_{−∞}^{b} f(x, y) dy dx
EXAMPLE
Using the census example, the probability that a family has two boys or less and one girl
or less can be computed as follows:
P(two boys or less and one girl or less) = P(no boys, no girls) + P( no boys, one girl) +
P(one boy, no girls) + P(one boy, one girl) + P(two boys, no girls) + P(two boys, one girl)
If X and Y are discrete random variables and p(X = xi, Y = yj) = p(X = xi)p(Y = yj) for all
xi and yj, then X and Y are independent. For continuous random variables, X and Y are
independent if:
f(x, y) = fX(x)fY(y)
If X and Y are independent, then:
1) E(XY) = E(X)E(Y)
2) Cov(X, Y) = 0
3) Var(X + Y) = Var(X) + Var(Y)
A random variable can be characterized by its moments. These are summary measures
of the behavior of a random variable. The most important of these are:
• Expected Value
• Variance
• Skewness
• Kurtosis
EXPECTED VALUE
The first moment of a random variable X is known as its expected value; this is the
average or mean value of X. For a discrete random variable X, the expected value is
computed as follows:
E(X) = ∑_{i=1}^{n} xi P(X = xi)
where:
xi = a possible value of X
i = an index
n = the number of possible values of X
Σ = “sigma”; this is the summation operator
EXAMPLE
For the die-rolling experiment, where X is the number of even numbers that turn up after
rolling a die twice, the expected value is computed as follows:
E(X) = (0)(1/4) + (1)(2/4) + (2)(1/4) = 1
This shows that, on average, one even number turns up each time a die is rolled twice.
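The expected value can be computed directly from the pmf, as in this short check.

from fractions import Fraction

# pmf of X, the number of even numbers in two rolls.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# E(X) = sum over all values x of x * P(X = x)
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)   # 1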
For a continuous random variable, the expected value is computed as follows:
E(X) = ∫_{−∞}^{∞} x f(x)dx
PROPERTIES OF EXPECTED VALUE
For constants a and b and random variables X and Y:
1) E(a) = a
2) E(aX + b) = aE(X) + b
3) E(X + Y) = E(X) + E(Y)
CONDITIONAL EXPECTATION
For two random variables X and Y, the conditional expectation of X given that Y
assumes a specific value y is written as:
E[X|Y = y]
For discrete random variables, the conditional expectation is computed as:
E[X | Y = y] = ∑x x P(X = x | Y = y)
For continuous random variables, the conditional expectation is computed as:
E[X | Y = y] = ∫_{−∞}^{∞} x fX|Y(x | y)dx = ∫_{−∞}^{∞} x [f(x, y) / fY(y)] dx
EXAMPLE
Using the census example, the expected number of boys in a family given that there are
two girls in the family is computed as:
E[X|Y = 2] = (0)P(X = 0 | Y = 2) + (1)P(X = 1 | Y = 2) + (2)P(X = 2 | Y = 2) + (3)P(X = 3 | Y = 2)
= 0 + 0.2727 + 0 + 0 = 0.2727
The expected value of the product of two discrete random variables is:
E(XY) = ∑x ∑y xy p(x, y)
The expected value of the product of two continuous random variables is:
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dy dx
VARIANCE
The second central moment of a random variable X is known as its variance. This
indicates the degree of dispersion or spread of X around its expected value. The variance
of random variable X is computed as follows:
σX² = E[(X − E(X))²] = E[X²] − (E[X])²
For a discrete random variable, the variance can also be expressed as:
σX² = ∑_{i=1}^{n} [xi − E(X)]² P(X = xi)
EXAMPLE
For the die-rolling experiment, where E(X) = 1:
σX² = (0 − 1)²(1/4) + (1 − 1)²(2/4) + (2 − 1)²(1/4) = 1/2
For a continuous random variable, the variance can be expressed as:
σX² = ∫_{−∞}^{∞} [x − E(X)]² f(x)dx
PROPERTIES OF VARIANCE
1) Var(a) = 0
2) Var(aX + b) = a2Var(X)
3) Var(aX + bY) = a2Var(X) + b2Var(Y) + 2abCov(X,Y)
CONDITIONAL VARIANCE
For two random variables X and Y, the conditional variance of X given that Y assumes
a specific value y is written as:
Var[X | Y = y] = E[(X − E[X | Y = y])² | Y = y] = E[X² | Y = y] − (E[X | Y = y])²
where:
E[X² | Y = y] = ∑x x² pX|Y(x | y)
EXAMPLE
Using the census example, the conditional variance of the number of boys given that there
are two girls in the family is computed as:
E[X² | Y = 2] = 0 + 0.2727 + 0 + 0 = 0.2727
Var[X | Y = 2] = E[X² | Y = 2] − (E[X | Y = 2])² = 0.2727 − (0.2727)² = 0.1983
STANDARD DEVIATION
One of the drawbacks to using variance is that it is measured in squared units. Since
these are difficult to interpret, the standard deviation is often used instead. For any
random variable X, the standard deviation of X equals the square root of the variance of
X.
EXAMPLE
For the die-rolling experiment:
σX = √(σX²) = √(1/2) = 0.7071
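Both forms of the variance, and the standard deviation, can be checked from the pmf of X; the sketch below should print 1/2, 1/2, and 0.7071.

import math
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mean = sum(x * p for x, p in pmf.items())                    # E(X) = 1

# Variance: E[(X - E(X))^2], computed directly from the pmf.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # 1/2

# Equivalent shortcut: E[X^2] - (E[X])^2.
variance_alt = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

std_dev = math.sqrt(variance)
print(variance, variance_alt, round(std_dev, 4))   # 1/2 1/2 0.7071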
COVARIANCE
The covariance of two random variables X and Y measures the degree to which they vary
together; it is defined as:
Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
If X and Y are discrete random variables, the covariance can also be expressed as:
Cov(X, Y) = ∑_{i=1}^{n} ∑_{j=1}^{n} (xi − E(X))(yj − E(Y))P(X = xi, Y = yj)
where:
i, j are indexes
If X and Y are continuous random variables, the covariance can also be expressed as:
Cov(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − E(X))(y − E(Y)) f(x, y) dy dx
where:
f(x, y) = the joint probability density function of X and Y
EXAMPLE
Using the following joint probability mass function for two random variables X and Y,
the covariance is computed as follows:
x \ y                   0       1       Row Sum = P(X = x)
0                       0.40    0.30    0.70
1                       0.10    0.20    0.30
Column Sum = P(Y = y)   0.50    0.50    1.00
Using the row sums from the joint probability mass function, the probability mass
function for X is:
x P(x)
0 0.70
1 0.30
E(X) = ∑_{i=1}^{n} xi P(X = xi) = (0)(0.70) + (1)(0.30) = 0.30
Using the column sums from the joint probability mass function, the probability mass
function for Y is:
y P(y)
0 0.50
1 0.50
E(Y) = ∑_{i=1}^{n} yi P(Y = yi) = (0)(0.50) + (1)(0.50) = 0.50
Cov(X, Y) = ∑_{i=1}^{n} ∑_{j=1}^{n} (xi − E(X))(yj − E(Y))P(X = xi, Y = yj)
= (0 − 0.3)(0 − 0.5)(0.40) + (0 − 0.3)(1 − 0.5)(0.30) + (1 − 0.3)(0 − 0.5)(0.10) + (1 − 0.3)(1 − 0.5)(0.20)
= 0.06 − 0.045 − 0.035 + 0.07 = 0.05
PROPERTIES OF COVARIANCE
1) Cov(X, Y) = Cov(Y, X)
2) Cov(X + a,Y + b) = Cov(X, Y)
CORRELATION
The correlation coefficient, designated ρ ("rho"), measures the strength of the linear
relationship between two random variables X and Y:
ρ = Cov(X, Y) / (σX σY)
where:
σX, σY = the standard deviations of X and Y
The correlation coefficient always assumes a value between negative one and positive
one and is unit-free: -1 ≤ ρ ≤ 1
EXAMPLE
Using the data from the covariance example, the correlation is computed as follows.
σX² = ∑_{i=1}^{n} [xi − E(X)]² P(X = xi) = (0 − 0.3)²(0.70) + (1 − 0.3)²(0.30) = 0.21
σX = √0.21 = 0.4583
σY² = ∑_{i=1}^{n} [yi − E(Y)]² P(Y = yi) = (0 − 0.5)²(0.50) + (1 − 0.5)²(0.50) = 0.25
σY = √0.25 = 0.5
ρ = Cov(X, Y) / (σX σY) = 0.05 / ((0.4583)(0.5)) = 0.2182
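The covariance and correlation for this example can be reproduced directly from the joint table; the following sketch recomputes the marginal means, standard deviations, covariance, and ρ, and should print 0.05 and 0.2182.

import math

# Joint pmf from the covariance example: keys are (x, y) pairs.
joint = {(0, 0): 0.40, (0, 1): 0.30, (1, 0): 0.10, (1, 1): 0.20}

# Marginal pmfs: row sums for X, column sums for Y.
p_X = {0: 0.0, 1: 0.0}
p_Y = {0: 0.0, 1: 0.0}
for (x, y), p in joint.items():
    p_X[x] += p
    p_Y[y] += p

mean_X = sum(x * p for x, p in p_X.items())   # 0.30
mean_Y = sum(y * p for y, p in p_Y.items())   # 0.50

# Covariance and correlation.
cov = sum((x - mean_X) * (y - mean_Y) * p for (x, y), p in joint.items())
sd_X = math.sqrt(sum((x - mean_X) ** 2 * p for x, p in p_X.items()))   # 0.4583
sd_Y = math.sqrt(sum((y - mean_Y) ** 2 * p for y, p in p_Y.items()))   # 0.5
rho = cov / (sd_X * sd_Y)

print(round(cov, 4), round(rho, 4))   # 0.05 0.2182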
SKEWNESS
The third central moment of a random variable X is known as its skewness. This
indicates the degree of asymmetry in the values of X. Skewness is computed as follows:
α3 = E[(X − E(X))³] / σ³
where:
σ = the standard deviation of X
EXAMPLE
For the die-rolling experiment, the numerator of the skewness formula is computed as
follows:
E[(X − E(X))³] = (0 − 1)³(1/4) + (1 − 1)³(2/4) + (2 − 1)³(1/4) = −0.25 + 0 + 0.25 = 0
Therefore, α3 = 0/0.3536 = 0, indicating that the distribution of X is symmetric.
KURTOSIS
The fourth central moment of a random variable X is known as its kurtosis. This refers
to the likelihood that X will assume an extremely small or large value. Kurtosis is
computed as follows:
α4 = E[(X − E(X))⁴] / σ⁴
where:
σ = the standard deviation of X
EXAMPLE
For the die-rolling experiment, the numerator of the kurtosis formula is computed as
follows:
E[(X − E(X))⁴] = (0 − 1)⁴(1/4) + (1 − 1)⁴(2/4) + (2 − 1)⁴(1/4) = 0.25 + 0 + 0.25 = 0.5
Therefore, α4 = 0.5/0.25 = 2
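Skewness and kurtosis for the die-rolling X can be computed from the pmf in the same way; the sketch below should print 0.0 and 2.0.

import math
from fractions import Fraction

pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mean = sum(x * p for x, p in pmf.items())                             # 1
sigma = math.sqrt(sum((x - mean) ** 2 * p for x, p in pmf.items()))  # 0.7071

def central_moment(k):
    # E[(X - E(X))^k] computed from the pmf.
    return float(sum((x - mean) ** k * p for x, p in pmf.items()))

skewness = central_moment(3) / sigma ** 3
kurtosis = central_moment(4) / sigma ** 4
print(skewness, kurtosis)   # 0.0 2.0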
COEFFICIENT OF VARIATION
The coefficient of variation expresses the standard deviation of a random variable relative
to its expected value:
CV = σX / μX
EXAMPLE
For the die-rolling experiment:
CV = σX / μX = 0.7071/1 = 0.7071
CHEBYSHEV’S INEQUALITY
Chebyshev’s inequality gives an upper limit for the probability that X will be k or more
standard deviations away from its expected value. Chebyshev’s inequality is written as:
P{|X − μX| ≥ kσX} ≤ 1/k²
EXAMPLE
Suppose that X has expected value μX = 2 and standard deviation σX = 5. For k = 3
standard deviations:
P{|X − μX| ≥ kσX} ≤ 1/k²
P{|X − 2| ≥ 15} ≤ 1/9
In other words, the probability that X ≥ 17 or X ≤ −13 is at most 1/9.
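Chebyshev's inequality makes no distributional assumption; the sketch below checks the k = 3 bound empirically using a normal distribution with the stated mean and standard deviation (an arbitrary illustrative choice).

import random

# Chebyshev bound for k = 3: P(|X - mu_X| >= 3*sigma_X) <= 1/9, whatever the distribution of X.
mu, sigma, k = 2, 5, 3

# Empirical check with one particular distribution that has this mean and standard deviation.
random.seed(0)
samples = [random.gauss(mu, sigma) for _ in range(100_000)]
tail_freq = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)

print(tail_freq, "<=", 1 / k ** 2)   # observed tail frequency (about 0.003) vs the bound 1/9 ≈ 0.111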