
Probability

Adam Mahdi Applied Stats 22 Oct 2024 2 / 59


Probability theory (one slide review)
Experiment, sample, sample space, events, probability
Conditional probability, Bayes’ theorem, independence
Discrete random variables
Probability mass function (PMF): P(X = x_i) = p(x_i)
p(x_i) ≥ 0 and Σ_i p(x_i) = 1
Continuous random variables
Cumulative distribution function (CDF): F(x) = P(X < x)
Probability density function (PDF): P(a ≤ X ≤ b) = ∫_a^b f(x) dx
f(x) ≥ 0 and ∫_{−∞}^{+∞} f(x) dx = 1
Expectation
E[X] = Σ_i x_i p_i (discrete), E[X] = ∫_{−∞}^{+∞} x f(x) dx (continuous)
E (aX + bY ) = aE (X ) + bE (Y )
Variance
Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
Var(aX + b) = a² Var(X)
Sets and operations
Set (e.g. A, B): a collection of objects
Empty set (∅): a set with no elements
Subset: A ⊂ B means that A is a subset of B
Union: A ∪ B is the set containing all the elements of A and B
Intersection: A ∩ B contains the elements common to both A and B
Complement: Aᶜ is the set containing all the elements not in a
particular set

Venn diagrams: a good way of visualising sets, and set operations



Laws of set algebra

Commutative property: A ∪ B = B ∪ A
Associative property: (A ∪ B) ∪ C = A ∪ (B ∪ C)
1st Distributive property: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
2nd Distributive property: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

Additionally, if S is the space, i.e. the set that contains all possible
elements, and A ⊂ S, then
A ∪ ∅ = A,    A ∩ ∅ = ∅
A ∪ S = S,    A ∩ S = A
A ∪ Aᶜ = S,   A ∩ Aᶜ = ∅
A ∪ A = A,    A ∩ A = A



Example (rolling a die - 1)
Consider rolling a conventionally numbered die once and getting a 6.

Experiment: rolling a die


Outcome (or sample): getting a 6
Sample space: S = {1, 2, 3, 4, 5, 6}

Often we are not interested in individual outcomes, but in whether an
outcome belongs to a given subset (e.g. A) of S. These subsets are called
events. In this example, we might consider two mutually exclusive events:
throwing an even number or throwing an odd number; another event
would be throwing a number which is an integer multiple of 3.



Example (rolling a die - 2)
If the die we considered above is fair or unbiased, each outcome is equally
probable and we can say that the probability of getting any number is 1/6.

P({1}) = P({2}) = P({3}) = P({4}) = P({5}) = P({6}) = 1/6
A central idea in understanding probability calculations is the concept of
relative frequency, i.e. the frequency with which we can expect a
particular event to appear among all possible events. If all events are
equally likely, we just need to count the number of possible results to
assess probabilities.



Probability

Consider an experiment whose sample space is S. For each event E of the
sample space S there exists a value P(E) called the probability of E. The
probabilities satisfy the following conditions (axioms):

Axiom 1: 0 ≤ P(E) ≤ 1
Axiom 2: P(S) = 1
Axiom 3: For any sequence of mutually exclusive events E_1, E_2, . . . (that
is, events for which E_i ∩ E_j = ∅ when i ≠ j)

P(∪_{i=1}^∞ E_i) = Σ_i P(E_i)



Example (jackpot)
What is the chance of winning the jackpot in the national lottery?

There are 49 balls, numbered from 1 to 49, in the machine. Six balls are
selected without replacement and the jackpot is won when all 6 of the
selected balls are correctly identified. How many ways of choosing
combinations of 6 balls from a set of 49 are there?
C(49, 6) = 49! / ((49 − 6)! 6!) = 49! / (43! 6!) = 13,983,816

Our chances: each combination is equally likely so the chance of selecting
all the numbers is one in 13,983,816.
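The count above can be double-checked in a few lines of Python (a sketch using only the standard library):

```python
import math

# Number of ways to choose 6 balls from 49 (order does not matter)
combinations = math.comb(49, 6)
print(combinations)        # 13983816

# Chance of winning the jackpot with a single ticket
print(1 / combinations)
```

Since every combination is equally likely, the jackpot probability is just the reciprocal of this count.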



Conditional probability

There are many situations in which we possess prior information, i.e.
we already know something about the outcome.

As an example, for our die we can ask "if we know that the outcome is
even, what is the probability of it being larger than 3?"

What we are trying to calculate here is a conditional probability, i.e.
the probability of the event A given that the event B has happened,
written as P(A|B).
P(A|B) = P(A ∩ B) / P(B),   P(B) ≠ 0

P(A ∩ B) = P(A|B) P(B) = P(B|A) P(A)
We read this: the chance of two events happening simultaneously is the
chance of one of them happening, multiplied by the chance of the second
happening given that the first has happened.
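The die question posed earlier ("if we know that the outcome is even, what is the probability of it being larger than 3?") can be answered by counting outcomes; a minimal sketch:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space of a fair die
A = {x for x in S if x > 3}       # event: larger than 3
B = {x for x in S if x % 2 == 0}  # event: even

# P(A|B) = P(A ∩ B) / P(B); with equally likely outcomes
# this reduces to a ratio of counts
p_given = Fraction(len(A & B), len(B))
print(p_given)   # 2/3
```

A ∩ B = {4, 6} and B = {2, 4, 6}, so knowing the outcome is even raises the probability from 1/2 to 2/3.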
Total probability

If we partition the sample space S into a set of n disjoint sets A_i, it
follows from the addition rule above that the probability of B is given by

P(B) = P(B|A1 )P(A1 ) + P(B|A2 )P(A2 ) + . . . + P(B|An )P(An )

In particular
P(B) = P(B|A)P(A) + P(B|Aᶜ)P(Aᶜ)



Bayes’ rule

Rearranging the equalities

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)

we obtain Bayes’ rule

P(A|B) = P(B|A) P(A) / P(B)

Or combined with total probability we obtain:

P(A_j|B) = P(B|A_j) P(A_j) / Σ_{i=1}^n P(B|A_i) P(A_i)



Example (picnic)
You are planning a picnic today, but the morning is cloudy. We know that
50% of all rainy days start off cloudy! But cloudy mornings are common
(about 40% of days start cloudy). This is usually a dry month (only 3 of 30
days tend to be rainy, or 10%). What is the chance of rain during the day?

We will use Rain to mean rain during the day, and Cloud to mean
cloudy morning.
The chance of Rain given Cloud is written P(Rain|Cloud)
Bayes’ formula

P(Rain|Cloud) = P(Rain) P(Cloud|Rain) / P(Cloud)

P(Rain|Cloud) = (0.1 × 0.5) / 0.4 = 0.125
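As a quick sanity check of the arithmetic, a sketch that plugs the slide's numbers into Bayes' formula:

```python
# Numbers from the picnic example
p_rain = 0.1              # P(Rain): 3 rainy days out of 30
p_cloud = 0.4             # P(Cloud): 40% of mornings start cloudy
p_cloud_given_rain = 0.5  # P(Cloud|Rain): half of rainy days start cloudy

# Bayes' formula: P(Rain|Cloud) = P(Rain) P(Cloud|Rain) / P(Cloud)
p_rain_given_cloud = p_rain * p_cloud_given_rain / p_cloud
print(round(p_rain_given_cloud, 3))   # 0.125
```

So a cloudy morning raises the chance of rain from 10% to 12.5%.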



Independence

If prior knowledge does not affect the probability of the second event,
P(A|B) = P(A), that is

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A) = P(A)P(B)

then we say that the events A and B are independent.


So, if two events are independent, the probability that the two of them
happen in the same experiment is the product of their individual
probabilities.



Rare events

Example
A test for a rare disease detects the disease with a probability of 99%, and
has a false positive ratio (i.e. it tests positive even though the person is
healthy) of 0.5%. We know that the percentage of the general population
who have the disease is 1 in 10,000.
Suppose we choose a random subject and perform the test, which comes out
positive. What is the probability of the person actually having the disease?



Example: Rare events

The probability of D (having the disease) before the test is 1 in
10,000: P(D) = 0.0001 (i.e. P(Dᶜ) = 0.9999)
The conditional probabilities of getting a positive in the test when the
person does / does not have the disease are P(T|D) = 0.99 (true
positive ratio) and P(T|Dᶜ) = 0.005 (false positive ratio).
The probability of getting a positive result in the test is P(T ), which
we can calculate using total probability:

P(T) = P(T|D)P(D) + P(T|Dᶜ)P(Dᶜ) = 0.0050985

Now we just need to apply Bayes’ theorem

P(D|T) = P(T|D) P(D) / P(T) = (0.99 × 0.0001) / 0.0050985 = 0.01941747572 ≈ 0.02

About 2%!! Are you surprised???
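The two steps above (total probability, then Bayes' theorem) can be reproduced directly:

```python
p_d = 0.0001          # P(D): prevalence, 1 in 10,000
p_t_given_d = 0.99    # P(T|D): true positive ratio
p_t_given_dc = 0.005  # P(T|Dc): false positive ratio

# Total probability: P(T) = P(T|D)P(D) + P(T|Dc)P(Dc)
p_t = p_t_given_d * p_d + p_t_given_dc * (1 - p_d)

# Bayes' theorem: P(D|T) = P(T|D)P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_t, 7))          # 0.0050985
print(round(p_d_given_t, 4))  # 0.0194
```

The false positives among the healthy 99.99% of the population swamp the true positives, which is why the answer is so small.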


Example: Rare events (intuition)

Suppose 1 million people get tested for the disease. Out of the one million
people, about 100 of them have the disease, while the other 999,900 do not.
Out of the 100 people who have the disease, 100 × 0.99 = 99 will have
positive test results. However, out of the people who do not have the
disease, 999,900 × 0.005 = 4999.5 will have positive test results. Thus in
total there are about 5,000 people with positive test results, and only 99 of
them actually have the disease. Therefore, the probability that a person
from the "positive test result" group actually has the disease is

P(D|T) = 99 / (999,900 × 0.005 + 99) = 0.01941747572 ≈ 0.02



Random Variables

When an experiment is performed, we are frequently interested mainly in
some function of the outcome as opposed to the actual outcome itself. For
instance, in tossing dice, we are often interested in the sum of the two dice
and are not really concerned about the separate values of each die. That
is, we may be interested in knowing that the sum is 7 and may not be
concerned over whether the actual outcome was (1, 6), (2, 5), (3, 4),
(4, 3), (5, 2), or (6, 1).

These quantities of interest, or, more formally, these real-valued functions
defined on the sample space, are known as random variables.

If the sample space is a set of discrete points then the variable is discrete,
otherwise it is continuous.



Discrete random variables
Given a sample space containing a discrete set of values x_i, we say
that the probability that the random variable X equals the number x_i is
p(x_i), or
P(X = x_i) = p(x_i)

The function, p(x_i), is a discrete probability density function
(pdf). The function must obey the basic rules of probability, i.e.

p(x_i) ≥ 0,   Σ_i p(x_i) = 1.

For historical reasons the discrete probability density function is also
known as the probability mass function.
The cumulative distribution function (CDF) is given by

F (x) = P(X < x)



Distributions

What is the probability distribution for a random variable X that is given
by the sum of the result of rolling two six-sided dice?

X = sum of the results of rolling two six-sided dice


X = {2, 3, . . . , 12}
The probability distribution of X is the set of all values P(X = x_i)
for x_i = 2, 3, . . . , 12.
For example what is P(X = 2), that is what is the probability of
rolling two 1s?
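The question can be answered by enumerating all 36 equally likely outcomes of the two dice; a minimal sketch:

```python
from collections import Counter
from fractions import Fraction

# Enumerate every (d1, d2) pair and tally the sums
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

dist = {s: Fraction(n, 36) for s, n in sorted(counts.items())}
print(dist[2])   # 1/36 — only (1, 1) gives a sum of 2
print(dist[7])   # 1/6 — six of the 36 outcomes sum to 7
```

So P(X = 2) = 1/36, and 7 is the most likely sum.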



Expectation and Variance (discrete)

The expected value of X is a weighted average (by the probability)
of the possible values that X can take on:

E[X] = Σ_i x_i p(x_i)

The variance gives information about how much the values of a
random variable X are likely to vary between tests

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])²

Properties:
E[aX + b] = aE[X] + b
Var[aX + b] = a² Var[X]
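As a worked instance of these formulas, the mean and variance of a single fair die (a standard exercise, not stated on the slide) computed exactly with rationals:

```python
from fractions import Fraction

# A fair die: each face 1..6 has probability 1/6
xs = range(1, 7)
p = Fraction(1, 6)

e_x = sum(x * p for x in xs)        # E[X]  = Σ x_i p(x_i)
e_x2 = sum(x**2 * p for x in xs)    # E[X²] = Σ x_i² p(x_i)
var_x = e_x2 - e_x**2               # Var[X] = E[X²] − (E[X])²
print(e_x, var_x)   # 7/2 35/12
```

Using fractions avoids floating-point noise and makes the shortcut formula Var[X] = E[X²] − (E[X])² easy to verify by hand.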



Bernoulli distribution
Question: What is the probability of getting a 6 when rolling a die?
In a Bernoulli distribution we have only two possible outcomes in a single
trial: 1 (success) and 0 (failure).

So the random variable X which has a Bernoulli distribution can take
value 1 with the probability of success, say p, and value 0 with the
probability of failure, say q = 1 − p.

P(X = k) = 1 − p  (k = 0),    P(X = k) = p  (k = 1)

If X is a Bernoulli distribution, show that:


E (X ) = p
Var(X) = p(1 − p)
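Both identities can be checked numerically by summing over the two outcomes (p = 0.3 here is just an arbitrary illustration):

```python
# Bernoulli(p): P(X=1) = p, P(X=0) = 1 − p
p = 0.3
pmf = {0: 1 - p, 1: p}

e_x = sum(k * pk for k, pk in pmf.items())              # should equal p
var_x = sum(k**2 * pk for k, pk in pmf.items()) - e_x**2  # should equal p(1 − p)
print(e_x)               # 0.3
print(round(var_x, 10))  # 0.21
```

The derivation is the same calculation done symbolically: E(X) = 0·(1−p) + 1·p = p, and E(X²) = p as well, so Var(X) = p − p² = p(1 − p).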



Binomial distribution
Question: At a particular junction 10% of cars turn left. Five cars
approach the junction, what is the probability that exactly 3 will turn left?
In a Binomial distribution we are counting the number of times a
condition, which can either be success or a failure, is met in n identical
trials.
P(X = k) = C(n, k) p^k (1 − p)^(n−k)

Each trial is independent


Only two possible outcomes in a trial
A total number of n identical trials are conducted
The probability of success and failure is the same for all trials
If X is a Binomial distribution, show that:
E (X ) = np
Var(X) = np(1 − p)
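The junction question above can be answered by plugging n = 5, k = 3, p = 0.1 into the formula:

```python
import math

# Five cars approach; each turns left independently with probability 0.1
n, k, p = 5, 3, 0.1

# P(X = k) = C(n, k) p^k (1 − p)^(n−k)
p_exactly_3 = math.comb(n, k) * p**k * (1 - p)**(n - k)
print(round(p_exactly_3, 5))   # 0.0081
```

There are C(5, 3) = 10 ways to pick which three cars turn left, each occurring with probability 0.1³ × 0.9².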
Poisson distribution

Question: If the number of accidents on a particular road is 2 per month,
what is the probability that there are no accidents during a given month?

P(X = k) = λ^k e^(−λ) / k!
The requirements are:
each event is independent of each other
only one event can happen at a time
the mean rate of events is constant.

If X is a Poisson distribution, show that:


E(X) = λ
Var(X) = λ
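Plugging λ = 2 and k = 0 into the Poisson PMF answers the accident question:

```python
import math

# Accidents occur at a mean rate of λ = 2 per month
lam, k = 2, 0

# P(X = k) = λ^k e^(−λ) / k!
p_no_accidents = lam**k * math.exp(-lam) / math.factorial(k)
print(round(p_no_accidents, 4))   # 0.1353
```

For k = 0 the formula collapses to e^(−λ), so about a 13.5% chance of an accident-free month.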



Continuous random variables

Previously we considered discrete random variables - variables that can
take one of a countable set of distinct values. In other cases the variable
can vary continuously over a range: take for example the distribution of
student heights within your class.
We say that X is a continuous random variable if there exists a
nonnegative function f, called the probability density function (PDF),
defined for all real x ∈ (−∞, +∞), having the property that for any
set B of real numbers

P[X ∈ B] = ∫_B f(x) dx

All probability statements about X can be answered in terms of f !!!



Example (height)
What is the probability that the height of a randomly selected Londoner will
be between 1.65m and 1.75m? Assume we know the population (i.e. the
probability density function is known).

P(1.65 ≤ X ≤ 1.75) = ∫_{1.65}^{1.75} f(x) dx
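The slide assumes the density is known. Purely for illustration, suppose heights followed N(1.70, 0.10²) (an assumed distribution, not real data); the integral can then be evaluated with the standard-library error function:

```python
import math

def normal_cdf(x, m, s):
    """CDF of N(m, s²), expressed via the error function."""
    return 0.5 * (1 + math.erf((x - m) / (s * math.sqrt(2))))

# Illustrative assumption only: mean 1.70 m, standard deviation 0.10 m
m, s = 1.70, 0.10
p = normal_cdf(1.75, m, s) - normal_cdf(1.65, m, s)
print(round(p, 3))   # 0.383
```

The interval probability is the difference of the CDF at the two endpoints, i.e. the area under the density between 1.65 and 1.75.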



Expectation and Variance (continuous)

The expected value of a continuous random variable X is given by

E[X] = ∫_{−∞}^{+∞} x f(x) dx

Similarly as in the discrete case, the variance gives information about
how much the values of X are likely to vary between tests

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])²

where E[X²] = ∫_{−∞}^{+∞} x² f(x) dx



Uniform distribution
One of the simplest continuous distributions is the uniform distribution,
where outcomes are equally probable inside a range [a, b].

f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 elsewhere

If X has a uniform distribution U(a, b), show that:


E(X) = (a + b)/2
Var(X) = (b − a)²/12
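Both identities can be checked by Monte Carlo simulation (a = 2 and b = 5 here are arbitrary example values):

```python
import random

# Check E(X) = (a+b)/2 and Var(X) = (b−a)²/12 for U(a, b) by sampling
a, b = 2.0, 5.0
random.seed(0)
samples = [random.uniform(a, b) for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean)**2 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2))   # close to 3.5 and 0.75
```

The sample mean and variance should land near (a + b)/2 = 3.5 and (b − a)²/12 = 0.75, with the gap shrinking as the sample grows.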
Normal distribution

The most important distribution in probability is the normal (or Gaussian) distribution:

f(x) = N(m, σ²) = 1/(σ√(2π)) exp[ −(x − m)²/(2σ²) ]
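A quick numerical check that the Gaussian density integrates to 1, using the standard normal (m = 0, σ = 1) and a crude Riemann sum:

```python
import math

def normal_pdf(x, m, s):
    """Density of N(m, s²)."""
    return math.exp(-(x - m)**2 / (2 * s**2)) / (s * math.sqrt(2 * math.pi))

# Riemann sum of the density over [−8, 8]; the tails beyond are negligible
m, s = 0.0, 1.0
step = 0.001
total = sum(normal_pdf(m + i * step, m, s) for i in range(-8000, 8001)) * step
print(round(total, 4))   # 1.0
```

This is the defining property f(x) ≥ 0, ∫ f(x) dx = 1 from the review slide, verified numerically for this particular density.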



Probability theory (one slide review)
Experiment, events, sample space, probability of events
Conditional probability, Bayes’ theorem, independence
Discrete random variables
Probability mass function (PMF): P(X = x_i) = p(x_i)
p(x_i) ≥ 0 and Σ_i p(x_i) = 1
Continuous random variables
Cumulative distribution function (CDF): F(x) = P(X < x)
Probability density function (PDF): P(a ≤ X ≤ b) = ∫_a^b f(x) dx
f(x) ≥ 0 and ∫_{−∞}^{+∞} f(x) dx = 1
Expectation
E[X] = Σ_i x_i p_i (discrete), E[X] = ∫_{−∞}^{+∞} x f(x) dx (continuous)
E (aX + bY ) = aE (X ) + bE (Y )
Variance
Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
Var(aX + b) = a² Var(X)
