Probability and Statistics
Probability
Probability theory is the branch of mathematics that studies random phenomena: phenomena for which we know the set of possible outcomes, but cannot predict exactly which of them will occur. This discipline is an indispensable tool in statistical inference.
Intersection: The intersection of two events A and B, denoted by A ∩ B, occurs when both of them occur simultaneously. It corresponds to the grammatical expression "A and B":

A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}.

Complementary: The complementary event of A, denoted by A^c, occurs when A does not occur:

A^c = {ω ∈ Ω : ω ∉ A}.

Difference: The difference of A and B, denoted by A \ B (or A − B), occurs when A occurs but B does not:

A \ B = A − B = {ω ∈ Ω : ω ∈ A and ω ∉ B} = A ∩ B^c.
1.2 Probability
Frequentist probability
The relative frequency of occurrence of an event A, observed in a number of repetitions of
the experiment, is a measure of the probability of that event. This is the core conception
of probability in the frequentist interpretation.
f_n(A) = (frequency of A in n repetitions of the experiment) / n.
A claim of the frequentist approach is that, as the number of trials increases, the fluctuation of the relative frequency diminishes. Hence one can view a probability as the limiting value of the corresponding relative frequencies.
For example, if we toss a fair coin many times, the proportion of times we get heads gets closer and closer to 1/2.
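This limiting behaviour is easy to see by simulation. A minimal R sketch (the seed and the number of tosses are arbitrary choices) that tracks the running relative frequency of heads:

```r
# Simulate n tosses of a fair coin (1 = heads, 0 = tails)
set.seed(1)
n <- 100000
tosses <- sample(c(0, 1), n, replace = TRUE)

# Running relative frequency f_n(heads) after each toss
freq <- cumsum(tosses) / seq_len(n)

# Early values fluctuate; late values settle near 1/2
freq[c(10, 100, 1000, n)]
```

Plotting `freq` against the number of tosses shows the fluctuations dying out as n grows.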
This frequency definition of probability is hard to use directly in practice, since it would require repeating the experiment a very large (in the limit, infinite) number of times under identical conditions.
However, there are situations in which probabilities can be calculated without the need to repeat experiments.
Classical probability definition. Laplace formula
This formula applies mainly to games of chance. If a random experiment has a finite number of outcomes and there is no reason to favor one outcome over another (that is, all are equally likely), the probability of a random event A is calculated as the quotient between the number of cases favorable to A and the number of all possible outcomes of the experiment:

P(A) = (number of cases favorable to A) / (number of possible cases) = Card(A) / Card(Ω),
where the symbol Card in front of a set indicates its cardinality or number of elements.
This formula for calculating probabilities is called Laplace’s formula.
Example. Suppose we have a fair die. We want to calculate the probability of getting a score greater than 4.
Here Ω = {1, 2, 3, 4, 5, 6} and we want to calculate the probability of the event A = {5, 6}. Thus,

P(A) = Card(A) / Card(Ω) = 2/6 = 1/3.
Needless to say, it is often difficult to determine which set Ω is suitable for a certain
random experiment. It can also be difficult to count the number of items in a set. The
latter type of problem is dealt with in the so-called combinatorial part of mathematics.
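For small finite sample spaces, Laplace's formula reduces to simple counting. A sketch of the die example above in R:

```r
# Sample space of a fair die and the event "score greater than 4"
Omega <- 1:6
A <- Omega[Omega > 4]   # A = {5, 6}

# Laplace's formula: Card(A) / Card(Omega)
pA <- length(A) / length(Omega)
pA   # 0.3333... = 1/3
```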
Problems with the classical definition of probability:
• It can only be used when the number of possible outcomes is finite. For example, if we want to calculate the probability that a certain animal will live more than 10 years, the space of possible outcomes for the animal's lifetime is infinite: it can be any number in the range [0, T ] (where T is the maximum lifetime).
• Even when the space of possible outcomes is finite, the formula may not be applicable: with a loaded die, for example, it is not true that all possible outcomes have the same chance of occurring.
Axiomatic definition. A probability is a map P that assigns to each event A a number P(A) with the following properties:
1. 0 ≤ P(A) ≤ 1.
2. P(Ω) = 1.
3. The probability of the union of disjoint events is the sum of their probabilities: if A1, A2, A3, . . . , An are such that Ai ∩ Aj = ∅ for i ≠ j, then

P(A1 ∪ · · · ∪ An) = P(A1) + · · · + P(An).
Properties:
1. P(∅) = 0.
2. Complementary probability:

P(A^c) = 1 − P(A).

3. If A ⊂ B, then P(A) ≤ P(B).
4. Probability of the union:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

In particular,

P(A ∪ B) ≤ P(A) + P(B).

Note: There is also a formula for the union of an arbitrary number of events, but we will not need it.
5. Probability of the difference:

P(A \ B) = P(A) − P(A ∩ B).
1.3 Conditional Probability
Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B).

It measures the probability of A once we know that B has occurred.
Example. A die is thrown. What is the probability that the number 4 comes up if we know that the result was an even number?
Here Ω = {1, 2, 3, 4, 5, 6}, and we have the events A = {4} and B = {2, 4, 6}. We are interested in P(A|B):

P(A|B) = P(A ∩ B) / P(B) = (Card(A ∩ B)/Card(Ω)) / (Card(B)/Card(Ω)) = (1/6) / (3/6) = 1/3,

since A ∩ B = {4}.
Note that P(A) = 1/6, so with the information that B has happened, the probability doubles: P(A|B) = 1/3.
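This doubling can also be checked by simulation. A minimal R sketch (the seed and number of rolls are arbitrary):

```r
# Simulate many rolls of a fair die
set.seed(42)
rolls <- sample(1:6, 100000, replace = TRUE)

# Unconditional probability of rolling a 4
p_A <- mean(rolls == 4)

# Conditional probability of a 4, restricted to the even outcomes
p_A_given_B <- mean(rolls[rolls %% 2 == 0] == 4)

c(p_A, p_A_given_B)   # close to 1/6 and 1/3
```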
Example. Suppose we have two urns, the first with 4 white balls and 4 black balls, and the second with 3 white and 2 black balls. We draw a ball from the first urn and put it into the second urn, from which we then draw another ball. What is the probability of getting a white ball in both draws?
There are two events here: B1, getting a white ball in the first draw, and B2, getting a white ball in the second draw. We want the probability that both events happen, that is, P(B1 ∩ B2). According to what we just observed,

P(B1 ∩ B2) = P(B1)P(B2|B1).
Note that the conditional probability appearing in the previous expression is very simple: if we know that a white ball came out in the first draw, the new composition of the second urn is 4 white and 2 black balls, so the probability of getting a white ball with this composition is

P(B2|B1) = 4/6 = 2/3.

On the other hand, the probability of getting a white ball in the first draw is clearly

P(B1) = 4/8 = 1/2.

Thus, the desired probability is

P(B1 ∩ B2) = P(B1)P(B2|B1) = (1/2) · (2/3) = 1/3.
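A quick Monte Carlo check of this result (a sketch; the seed and the number of repetitions are arbitrary choices):

```r
# Simulate the two-urn experiment many times
set.seed(7)
n <- 100000
both_white <- replicate(n, {
  # First urn: 4 white ("W") and 4 black ("B") balls
  first <- sample(c(rep("W", 4), rep("B", 4)), 1)
  # The drawn ball is added to the second urn (3 white, 2 black)
  second <- sample(c(rep("W", 3), rep("B", 2), first), 1)
  first == "W" && second == "W"
})

mean(both_white)   # close to 1/3
```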
Example. The flu season is rapidly approaching. Each of us has some probability of getting the flu, which can be naively computed as the number of cases of flu last year divided by the number of people potentially exposed to the flu virus in that same year. Let's call this probability P(flu).
If a person gets a flu vaccination, their chance of getting the flu should change. This would be denoted P(flu | vaccine), and is read as "probability of getting the flu given you have been vaccinated". Because of the "been vaccinated" condition, this is a conditional probability.
Adapting the equations above to our flu example,

P(Flu | Vaccine) = P(Flu and Vaccine) / P(Vaccine).

The numerator is the probability that a person gets the vaccine and the flu; the denominator is the probability that a person gets the vaccine.
Let's look at a table of frequencies for a population:

P(Flu | Vaccine) = P(Flu and Vaccine) / P(Vaccine) = 10/3010 ≈ 0.0033.
So this probability is the chance of getting the flu only among those who were vaccinated.
We have normalized the probability of an event (getting the flu) to the conditioning event
(getting vaccinated) rather than to the entire sample space.
Challenge Question: According to the table above, what is the probability of getting the flu if you weren't vaccinated, P(Flu | No Vaccine)? What is the probability of getting the flu, P(flu), in general?
1.4 Event Independence
As applied in the previous example, one can write the probability of the intersection of two events of non-zero probability as

P(A ∩ B) = P(B)P(A|B)

and also as

P(A ∩ B) = P(A)P(B|A).

These expressions tell us that the probability of the intersection of two events is the probability that one of them occurs, multiplied by the probability that the other occurs conditioned on the fact that the first has occurred.
If there is no relationship between the two events, knowing that one of them has occurred should not change the probability of the other, i.e. P(A|B) = P(A) and P(B|A) = P(B). In this way we introduce the concept of independence of two events. Two events A and B are said to be independent if and only if

P(A ∩ B) = P(A)P(B).

This definition makes sense even when one of the events has zero probability, in which case the conditional probabilities are not defined.
We can now state the following formula, which is one of the most widely used tools in the calculation of probabilities.
Theorem 1 (Total probability formula) Let {A1 , . . . , An } be a partition of the sample space Ω such that P(Ai) > 0 for all i. Then, for any event B, we have the equality

P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai).
Example. In the Faculty of Arts and Philosophy, 40% of the students are men. Among the men, 45% are smokers, while among the women the percentage of smokers is 30%. We calculate the probability that a randomly selected student from this Faculty is a smoker.
We have the following random events involved:
A1 = {the selected student is a man}
A2 = {the selected student is a woman}
B = {the selected student is a smoker}
The events A1 and A2 form a partition of the set of all possibilities, Ω. We have the following probabilities:

P(A1) = 0.40, P(A2) = 0.60, P(B|A1) = 0.45, P(B|A2) = 0.30.

We are interested in calculating P(B), which by applying the formula of total probability is
P (B) = P (A1 )P (B|A1 ) + P (A2 )P (B|A2 ) = 0.40 · 0.45 + 0.60 · 0.30 = 0.36.
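The same computation in R, with the probabilities from the example:

```r
# Partition probabilities: P(man) and P(woman)
pA <- c(man = 0.40, woman = 0.60)

# Conditional smoking probabilities: P(smoker | man), P(smoker | woman)
pB_given_A <- c(man = 0.45, woman = 0.30)

# Total probability formula: P(B) = sum of P(B | A_i) * P(A_i)
pB <- sum(pB_given_A * pA)
pB   # 0.36
```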
Theorem 2 (Bayes formula) Let {A1 , . . . , An } be a partition of Ω such that P(Ai) > 0 for all i, and let B be an event such that P(B) > 0. Then, for any k = 1, . . . , n,

P(Ak|B) = P(B|Ak)P(Ak) / Σ_{i=1}^{n} P(B|Ai)P(Ai).
The Bayes formula is also known as the conditioning inversion formula, since we know the
probabilities of the type P (B|Ai ) and the formula tells us how to calculate a probability
of the type P (Ak |B).
Example. We continue with the example from the Faculty above. If we have the information that the randomly selected student is a smoker, what is the probability that he is a man?
Now we want to calculate the probability P(A1|B). Applying the Bayes formula, we have

P(A1|B) = P(B|A1)P(A1) / P(B) = (0.45 · 0.40) / 0.36 = 0.5.
Example. Bayes' formula is easy to compute in R. Suppose, for instance, that P(rain) = 0.2, P(cloudy) = 0.4 and P(cloudy|rain) = 0.85, and that we want P(rain|cloudy):

# define a function for Bayes' theorem
bayesTheorem <- function(pA, pB, pBA) {
  pAB <- pA * pBA / pB
  return(pAB)
}

# define the probabilities
pRain <- 0.2
pCloudy <- 0.4
pCloudyRain <- 0.85

# apply Bayes' theorem: P(rain | cloudy)
bayesTheorem(pRain, pCloudy, pCloudyRain)
[1] 0.425

This tells us that if it's cloudy outside on a given day, the probability that it will rain that day is 0.425, or 42.5%.
This matches the value that we can calculate by hand.
Example. Let’s do a little experiment in R. We’ll toss two fair dice, just as we did in an
earlier post, and see if the results of the two dice are independent. We first roll the dice
100,000 times, and then compute the joint distribution of the results of the rolls from the
two dice.
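The rolling and tabulation steps described above are not shown in these notes; a minimal sketch of them (the names `rolls` and `prob_table` are chosen to match the references below):

```r
# Roll two fair dice 100,000 times
set.seed(123)
n <- 100000
rolls <- data.frame(x = sample(1:6, n, replace = TRUE),
                    y = sample(1:6, n, replace = TRUE))

# Joint distribution: relative frequency of each pair (i, j)
prob_table <- table(rolls$x, rolls$y) / n
```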
Let’s evaluate the probability that y=1 both with and without knowledge of x. If we
don’t observe x, that probability is:
> mean(rolls$y == 1)
[1] 0.16602
If we know that x=3, then the conditional probability that y=1 given x=3 is:
> mean(rolls$y[rolls$x == 3] == 1)
[1] 0.165793
These results are very close. Note: R makes it very easy to do conditional probability
evaluations. In R, you can restrict yourself to those observations of y when x=3 by
specifying a Boolean condition as the index of the vector, as y[x==3].
If the results from the two dice are statistically independent, we would have, for every pair of values i, j in 1, 2, 3, 4, 5, 6:

P(x = i and y = j) = P(x = i) P(y = j).

We computed the first part earlier in prob_table. Now for the second part.
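A sketch of this step (it assumes the simulated data frame `rolls` and the joint table `prob_table` described above; they are rebuilt here so the fragment is self-contained):

```r
# Rebuild the simulated rolls and their joint distribution
set.seed(123)
n <- 100000
rolls <- data.frame(x = sample(1:6, n, replace = TRUE),
                    y = sample(1:6, n, replace = TRUE))
prob_table <- table(rolls$x, rolls$y) / n

# Marginal distributions of each die
p_x <- table(rolls$x) / n
p_y <- table(rolls$y) / n

# Under independence, the joint probabilities are products of the marginals
prob_table_indep <- outer(as.numeric(p_x), as.numeric(p_y))

# Largest discrepancy between the observed and the independent joint table
max(abs(prob_table - prob_table_indep))   # small, on the order of sampling noise
```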
We see that prob_table and prob_table_indep are quite close, indicating that the rolls of the two dice are probably independent.