Probability and Statistics
Probability
Probability theory is the branch of mathematics that studies random phenomena: phenomena for which we know the set of possible outcomes, but cannot predict exactly which of them will occur. This discipline is an indispensable tool in statistical inference.
Intersection: The intersection of two events A and B, denoted by A ∩ B, occurs when both of them occur simultaneously. It corresponds to the grammatical expression "A and B":

A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}.

Complementary: The complementary event of A, denoted by A^c, occurs when A does not occur:

A^c = {ω ∈ Ω : ω ∉ A}.

Difference: The difference of A and B, denoted by A \ B (or A − B), occurs when A occurs but B does not:

A \ B = A − B = {ω ∈ Ω : ω ∈ A and ω ∉ B} = A ∩ B^c.
1.2 Probability
Frequentist probability
The relative frequency of occurrence of an event A, observed in a number of repetitions of
the experiment, is a measure of the probability of that event. This is the core conception
of probability in the frequentist interpretation.
f_n(A) = (frequency of A in n repetitions of the experiment) / n.
A claim of the frequentist approach is that, as the number of trials increases, the fluctuation of the relative frequency diminishes. Hence one can view a probability as the limiting value of the corresponding relative frequencies.
For example, if we toss a fair coin many times, the proportion of times we get heads gets closer and closer to 1/2.
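This limiting behaviour is easy to see by simulation. A minimal R sketch (the seed and the number of tosses are arbitrary choices) that tracks the running relative frequency of heads:

```r
# Simulate n tosses of a fair coin (1 = heads, 0 = tails)
set.seed(1)
n <- 100000
tosses <- sample(c(0, 1), n, replace = TRUE)

# Running relative frequency f_n(heads) after each toss
freq <- cumsum(tosses) / seq_len(n)

# Early values fluctuate; late values settle near 1/2
freq[c(10, 100, 1000, n)]
```

Plotting `freq` against the number of tosses shows the fluctuations dying out as n grows.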
This frequency definition of probability is hard to use directly in practice, since it would require repeating the experiment a very large (in the limit, infinite) number of times under identical conditions.
However, there are situations in which probabilities can be calculated without the need to repeat experiments.
Classical probability definition. Laplace formula
This formula applies mainly to games of chance. If a random experiment has a finite number of outcomes and there is no reason to favor one outcome over another (that is, all are equally likely), the probability of a random event A is calculated as the quotient between the number of cases favorable to A and the number of all possible outcomes of the experiment:

P(A) = (number of cases favorable to A) / (number of possible cases) = Card(A) / Card(Ω),
where the symbol Card in front of a set indicates its cardinality or number of elements.
This formula for calculating probabilities is called Laplace’s formula.
Example. Suppose we have a fair die. We want to calculate the probability of getting a score greater than 4.
Here Ω = {1, 2, 3, 4, 5, 6} and we want to calculate the probability of the event A = {5, 6}. Thus,

P(A) = Card(A) / Card(Ω) = 2/6 = 1/3.
Needless to say, it is often difficult to determine which set Ω is suitable for a certain
random experiment. It can also be difficult to count the number of items in a set. The
latter type of problem is dealt with in the so-called combinatorial part of mathematics.
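For small finite sample spaces, Laplace's formula reduces to simple counting. A sketch of the die example above in R:

```r
# Sample space of a fair die and the event "score greater than 4"
Omega <- 1:6
A <- Omega[Omega > 4]   # A = {5, 6}

# Laplace's formula: Card(A) / Card(Omega)
pA <- length(A) / length(Omega)
pA   # 0.3333... = 1/3
```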
Problems with the classical definition of probability:
• It can only be used when the number of possible outcomes is finite. For example, if we want to calculate the probability that a certain animal will live more than 10 years, the space of possible outcomes for the animal's lifetime is infinite: it can be any number in the range [0, T ] (where T is the maximum lifetime).
• Even when the space of possible outcomes is finite, the formula may not be applicable: with a loaded die, for example, it is not true that all possible outcomes have the same chance of occurring.
Axiomatic definition. A probability is a map P that assigns to each event A a number P(A) with the following properties:
1. 0 ≤ P(A) ≤ 1.
2. P(Ω) = 1.
3. The probability of the union of disjoint events is the sum of their probabilities: if A1, A2, A3, . . . , An are such that Ai ∩ Aj = ∅ for i ≠ j, then

P(A1 ∪ · · · ∪ An) = P(A1) + · · · + P(An).
Properties:
1. P(∅) = 0.
2. Complementary probability:

P(A^c) = 1 − P(A).

3. If A ⊂ B, then P(A) ≤ P(B).
4. Probability of the union:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

In particular,

P(A ∪ B) ≤ P(A) + P(B).

Note: There is also a formula for the union of an arbitrary number of events, but we will not need it.
5. Probability of the difference:

P(A \ B) = P(A) − P(A ∩ B).
1.3 Conditional Probability
Given two events A and B with P(B) > 0, the conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B).

It measures the probability of A once we know that B has occurred.
Example. A die is thrown. What is the probability that the number 4 comes up if we know that the result was an even number?
Here Ω = {1, 2, 3, 4, 5, 6}, and we have the events A = {4} and B = {2, 4, 6}. We are interested in P(A|B):

P(A|B) = P(A ∩ B) / P(B) = (Card(A ∩ B)/Card(Ω)) / (Card(B)/Card(Ω)) = (1/6) / (3/6) = 1/3,

since A ∩ B = {4}.
Note that P(A) = 1/6, so with the information that B has happened, the probability doubles: P(A|B) = 1/3.
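This doubling can also be checked by simulation. A minimal R sketch (the seed and number of rolls are arbitrary):

```r
# Simulate many rolls of a fair die
set.seed(42)
rolls <- sample(1:6, 100000, replace = TRUE)

# Unconditional probability of rolling a 4
p_A <- mean(rolls == 4)

# Conditional probability of a 4, restricted to the even outcomes
p_A_given_B <- mean(rolls[rolls %% 2 == 0] == 4)

c(p_A, p_A_given_B)   # close to 1/6 and 1/3
```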
Example. Suppose we have two urns, the first with 4 white balls and 4 black balls, and the second with 3 white and 2 black balls. We draw a ball from the first urn and put it into the second urn, from which we then draw another ball. What is the probability of getting a white ball in both draws?
There are two events here: B1, getting a white ball in the first draw, and B2, getting a white ball in the second draw. We want the probability that both events happen, that is, P(B1 ∩ B2). According to what we just observed,

P(B1 ∩ B2) = P(B1)P(B2|B1).
Note that the conditional probability appearing in the previous expression is very simple: if we know that a white ball came out in the first draw, the new composition of the second urn is 4 white and 2 black balls, so the probability of getting a white ball with this composition is

P(B2|B1) = 4/6 = 2/3.

On the other hand, the probability of getting a white ball in the first draw is clearly

P(B1) = 4/8 = 1/2.

Thus, the desired probability is

P(B1 ∩ B2) = P(B1)P(B2|B1) = (1/2) · (2/3) = 1/3.
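A quick Monte Carlo check of this result (a sketch; the seed and the number of repetitions are arbitrary choices):

```r
# Simulate the two-urn experiment many times
set.seed(7)
n <- 100000
both_white <- replicate(n, {
  # First urn: 4 white ("W") and 4 black ("B") balls
  first <- sample(c(rep("W", 4), rep("B", 4)), 1)
  # The drawn ball is added to the second urn (3 white, 2 black)
  second <- sample(c(rep("W", 3), rep("B", 2), first), 1)
  first == "W" && second == "W"
})

mean(both_white)   # close to 1/3
```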
Example. The flu season is rapidly approaching. Each of us has some probability of getting the flu, which can be naively computed as the number of cases of flu last year divided by the number of people potentially exposed to the flu virus in that same year. Let's call this probability P(flu).
If a person gets a flu vaccination, their chance of getting the flu should change. This would be denoted P(flu | vaccine), and is read as "probability of getting the flu given you have been vaccinated". Because of the "been vaccinated" condition, this is a conditional probability.
Adapting the equations above to our flu example,

P(Flu | Vaccine) = P(Flu and Vaccine) / P(Vaccine).

The numerator is the probability that a person gets the vaccine and the flu; the denominator is the probability that a person gets the vaccine.
Let's look at a table of frequencies for a population:

P(Flu | Vaccine) = P(Flu and Vaccine) / P(Vaccine) = 10/3010 ≈ 0.0033.
So this probability is the chance of getting the flu only among those who were vaccinated.
We have normalized the probability of an event (getting the flu) to the conditioning event
(getting vaccinated) rather than to the entire sample space.
Challenge Question: According to the table above, what is the probability of getting the flu if you weren't vaccinated, P(Flu | No Vaccine)? What is the probability of getting the flu, P(flu), in general?
1.4 Event Independence
As applied in the previous example, one can write the probability of the intersection of two events of non-zero probability as

P(A ∩ B) = P(B)P(A|B)

and also as

P(A ∩ B) = P(A)P(B|A).

These expressions tell us that the probability of the intersection of two events is the probability that one of them occurs, multiplied by the probability that the other occurs conditioned on the fact that the first has occurred.
If there is no relationship between the two events, knowing that one of them has occurred should not change the probability of the other, i.e. P(A|B) = P(A) and P(B|A) = P(B). In this way we introduce the concept of independence of two events. Two events A and B are said to be independent if and only if

P(A ∩ B) = P(A)P(B).

This definition makes sense even when one of the events has zero probability, in which case the conditional probabilities are not defined.
We can now state the following formula, which is one of the most widely used tools in the calculation of probabilities.
Theorem 1 (Total probability formula) Let {A1 , . . . , An } be a partition of the sample space Ω such that P(Ai) > 0 for all i. Then, for any event B, we have the equality

P(B) = Σ_{i=1}^{n} P(B|Ai)P(Ai).
Example. In the Faculty of Arts and Philosophy, 40% of the students are men. Among the men, 45% are smokers, while among the women the percentage of smokers is 30%. We calculate the probability that a randomly selected student from this Faculty is a smoker.
We have the following random events involved:
A1 = {the selected student is a man}
A2 = {the selected student is a woman}
B = {the selected student is a smoker}
The events A1 and A2 form a partition of the set of all possibilities, Ω. We have the following probabilities:

P(A1) = 0.40, P(A2) = 0.60, P(B|A1) = 0.45, P(B|A2) = 0.30.

We are interested in calculating P(B), which by applying the formula of total probability is
P (B) = P (A1 )P (B|A1 ) + P (A2 )P (B|A2 ) = 0.40 · 0.45 + 0.60 · 0.30 = 0.36.
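The same computation in R, with the probabilities from the example:

```r
# Partition probabilities: P(man) and P(woman)
pA <- c(man = 0.40, woman = 0.60)

# Conditional smoking probabilities: P(smoker | man), P(smoker | woman)
pB_given_A <- c(man = 0.45, woman = 0.30)

# Total probability formula: P(B) = sum of P(B | A_i) * P(A_i)
pB <- sum(pB_given_A * pA)
pB   # 0.36
```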
Theorem 2 (Bayes formula) Let {A1 , . . . , An } be a partition of Ω such that P(Ai) > 0 for all i, and let B be an event such that P(B) > 0. Then, for any k = 1, . . . , n,

P(Ak|B) = P(B|Ak)P(Ak) / Σ_{i=1}^{n} P(B|Ai)P(Ai).
The Bayes formula is also known as the conditioning inversion formula, since we know the
probabilities of the type P (B|Ai ) and the formula tells us how to calculate a probability
of the type P (Ak |B).
Example. We continue with the example from the Faculty above. If we have the information that the randomly selected student is a smoker, what is the probability that he is a man?
Now we want to calculate the probability P(A1|B). Applying the Bayes formula, we have

P(A1|B) = P(B|A1)P(A1) / P(B) = (0.45 · 0.40) / 0.36 = 0.5.
Example. Bayes' formula is easy to compute in R. Suppose, for instance, that P(rain) = 0.2, P(cloudy) = 0.4 and P(cloudy|rain) = 0.85, and that we want P(rain|cloudy):

# define a function for Bayes' theorem
bayesTheorem <- function(pA, pB, pBA) {
  pAB <- pA * pBA / pB
  return(pAB)
}

# define the probabilities
pRain <- 0.2
pCloudy <- 0.4
pCloudyRain <- 0.85

# apply Bayes' theorem: P(rain | cloudy)
bayesTheorem(pRain, pCloudy, pCloudyRain)
[1] 0.425

This tells us that if it's cloudy outside on a given day, the probability that it will rain that day is 0.425, or 42.5%.
This matches the value that we can calculate by hand.
Example. Let’s do a little experiment in R. We’ll toss two fair dice, just as we did in an
earlier post, and see if the results of the two dice are independent. We first roll the dice
100,000 times, and then compute the joint distribution of the results of the rolls from the
two dice.
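The rolling and tabulation steps described above are not shown in these notes; a minimal sketch of them (the names `rolls` and `prob_table` are chosen to match the references below):

```r
# Roll two fair dice 100,000 times
set.seed(123)
n <- 100000
rolls <- data.frame(x = sample(1:6, n, replace = TRUE),
                    y = sample(1:6, n, replace = TRUE))

# Joint distribution: relative frequency of each pair (i, j)
prob_table <- table(rolls$x, rolls$y) / n
```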
Let’s evaluate the probability that y=1 both with and without knowledge of x. If we
don’t observe x, that probability is:
> mean(rolls$y == 1)
[1] 0.16602
If we know that x=3, then the conditional probability that y=1 given x=3 is:
> mean(rolls$y[rolls$x == 3] == 1)
[1] 0.165793
These results are very close. Note: R makes it very easy to do conditional probability
evaluations. In R, you can restrict yourself to those observations of y when x=3 by
specifying a Boolean condition as the index of the vector, as y[x==3].
If the results from the two dice are statistically independent, we would have, for every pair of values i, j in 1, 2, 3, 4, 5, 6:

P(x = i and y = j) = P(x = i) P(y = j).

We computed the first part earlier in prob_table. Now for the second part.
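A sketch of this step (it assumes the simulated data frame `rolls` and the joint table `prob_table` described above; they are rebuilt here so the fragment is self-contained):

```r
# Rebuild the simulated rolls and their joint distribution
set.seed(123)
n <- 100000
rolls <- data.frame(x = sample(1:6, n, replace = TRUE),
                    y = sample(1:6, n, replace = TRUE))
prob_table <- table(rolls$x, rolls$y) / n

# Marginal distributions of each die
p_x <- table(rolls$x) / n
p_y <- table(rolls$y) / n

# Under independence, the joint probabilities are products of the marginals
prob_table_indep <- outer(as.numeric(p_x), as.numeric(p_y))

# Largest discrepancy between the observed and the independent joint table
max(abs(prob_table - prob_table_indep))   # small, on the order of sampling noise
```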
We see that prob_table and prob_table_indep are quite close, indicating that the rolls of the two dice are probably independent.