Notes - Module 4


Topic 1: Joint Distribution of the Sample Mean and Sample Variance

Topic 2: Confidence Intervals

A confidence interval is used to describe the uncertainty associated with a sampling method.

A confidence interval gives a range of values within which the true value of the parameter is expected to lie, with a given level of confidence.

What is the confidence interval estimate of the population mean?

The general format of a confidence interval estimate of a population mean is given by:

Sample mean ± Multiplier × Standard error of the mean

For variable Xj, a confidence interval estimate of its population mean µj is given by

X̄j ± z × Sj / √n

where

 X̄j is the sample mean,

 Sj is the sample standard deviation,

 n is the sample size, and

 z is the z-value corresponding to the desired confidence level in the z-table.

Hence, the confidence interval estimate of the population mean is X̄j ± z × Sj / √n.
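Below is a minimal Python sketch of this calculation for a single sample; the variable names, the 95% level and the data values are assumptions for the illustration, not part of the notes.

```python
import math
from scipy import stats

def mean_confidence_interval(data, confidence=0.95):
    """Confidence interval for the population mean: x_bar ± z * s / sqrt(n)."""
    n = len(data)
    x_bar = sum(data) / n                                         # sample mean
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))  # sample standard deviation
    z = stats.norm.ppf(1 - (1 - confidence) / 2)                  # z-value for the confidence level
    margin = z * s / math.sqrt(n)                                 # multiplier × standard error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 10 measurements
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.0]
print(mean_confidence_interval(sample))
```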


Topic 3: Bayesian Analysis of samples from Normal Distribution

Bayes Theorem

Bayes theorem is a theorem in probability and statistics, named after the Reverend Thomas Bayes, that helps in determining the probability of an event based on some event that has already occurred.

Bayes theorem has many applications such as Bayesian inference and, in the healthcare sector, determining the chances of developing health problems with an increase in age, among many others.

What is Bayes Theorem?

Bayes theorem, in simple words, determines the conditional probability of an event A given that event B has already occurred.

Bayes theorem is also known as the Bayes Rule or Bayes Law. It is a method to
determine the probability of an event based on the occurrences of prior events. It
is used to calculate conditional probability.

Bayes theorem calculates the probability based on the hypothesis.

Bayes theorem states that the conditional probability of an event A, given the occurrence of another event B, is equal to the product of the likelihood of B given A and the probability of A, divided by the probability of B. It is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Here,

P(A) = how likely A happens (Prior knowledge) - the probability that the hypothesis is true before any evidence is seen.

P(B) = how likely B happens (Marginalization) - the probability of observing the evidence.

P(A|B) = how likely A happens given that B has happened (Posterior) - the probability that the hypothesis is true given the evidence.

P(B|A) = how likely B happens given that A has happened (Likelihood) - the probability of seeing the evidence if the hypothesis is true.

Note:

 Here Ei ∩ Ej = φ, where i ≠ j, i.e. the events are mutually exclusive.

 The union of all the events of the partition should give the sample space.

 0 ≤ P(Ei) ≤ 1

Bayes Theorem Formula for Events

If E1, E2, ..., En form a partition of the sample space and A is any event with P(A) > 0, then

P(Ei|A) = P(A|Ei) × P(Ei) / [P(A|E1) × P(E1) + P(A|E2) × P(E2) + ... + P(A|En) × P(En)]
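A minimal Python sketch of this partition form of the rule is given below; the function name posterior_over_partition and the illustrative numbers are assumptions for the example, not something from the notes.

```python
def posterior_over_partition(priors, likelihoods):
    """Posterior P(Ei | A) for every event Ei of a partition.

    priors[i]      = P(Ei), the prior probabilities (must sum to 1)
    likelihoods[i] = P(A | Ei), the likelihood of A under each event
    """
    # Denominator: total probability of A over the whole partition
    p_a = sum(p * l for p, l in zip(priors, likelihoods))
    # Numerators P(A | Ei) * P(Ei), normalised by P(A)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

# Hypothetical two-event partition with equal priors:
# P(A|E1) = 0.7, P(A|E2) = 0.3  ->  posteriors 0.7 and 0.3
print(posterior_over_partition([0.5, 0.5], [0.7, 0.3]))
```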
Terms Related to Bayes Theorem

As we have studied Bayes theorem in detail, let us understand the meanings of a few terms related to the concept which have been used in the Bayes theorem formula and derivation:

 Conditional Probability - Conditional probability is the probability of an event A based on the occurrence of another event B. It is denoted by P(A|B) and represents the probability of A given that event B has already happened.

 Joint Probability - Joint probability measures the probability of two or more events occurring together at the same time. For two events A and B, it is denoted by P(A∩B).

 Random Variables - A random variable is a real-valued variable whose possible values are determined by a random experiment. The probability of such variables is also called the experimental probability.

 Posterior Probability - Posterior probability is the probability of an event that is calculated after all the information related to the event has been accounted for. It is also known as conditional probability.

 Prior Probability - Prior probability is the probability of an event that is calculated before considering the new information obtained. It is the probability of an outcome that is determined based on current knowledge before the experiment is performed.

Important Notes on Bayes Theorem

 Bayes theorem is used to determine conditional probability.

 When two events A and B are independent, P(A|B) = P(A) and P(B|A) = P(B)

 Conditional probability can be calculated using the Bayes theorem for continuous random variables.

Problems based on Bayes Theorem

Example 1: Amy has two bags. Bag I has 7 red and 2 blue balls and bag II has 5 red and 9 blue balls. Amy draws a ball at random and it turns out to be red. Determine the probability that the ball was from bag I using the Bayes theorem.

Solution: Let X and Y be the events that the ball is from bag I and bag II, respectively. Assume A to be the event of drawing a red ball. We know that the probability of choosing a bag for drawing a ball is 1/2, that is,
P(X) = P(Y) = 1/2

Since there are 7 red balls out of a total of 9 balls in bag I, therefore,
P(drawing a red ball from bag I) = P(A|X) = 7/9

Similarly, P(drawing a red ball from bag II) = P(A|Y) = 5/14

We need to determine the value of P(the ball drawn is from bag I given that it is a red ball), that is, P(X|A). To determine this we will use Bayes theorem. Using Bayes theorem, we have the following:

P(X|A) = [(7/9)(1/2)] / [(7/9)(1/2) + (5/14)(1/2)]

= (7/9) / (7/9 + 5/14)

= (98/126) / (143/126)

= 98/143 ≈ 0.69

Answer: Hence, the probability that the drawn ball is from bag I is approximately 0.69
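The arithmetic above can be verified with a short Python snippet; the variable names here are just for illustration.

```python
# Example 1 check: P(ball came from bag I | ball is red)
p_x, p_y = 0.5, 0.5          # prior probability of picking each bag
p_red_given_x = 7 / 9        # bag I: 7 red out of 9 balls
p_red_given_y = 5 / 14       # bag II: 5 red out of 14 balls

posterior = (p_red_given_x * p_x) / (p_red_given_x * p_x + p_red_given_y * p_y)
print(round(posterior, 2))   # 0.69
```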

Example 2: Assume that the chances of a person having a skin disease are 40%. Assume that using skin creams and drinking enough water reduces the risk of skin disease by 30% and a prescription of a certain drug reduces its chance by 20%. At a time, a patient can choose any one of the two options with equal probabilities. It is given that after picking one of the options, the patient selected at random has the skin disease. Find the probability that the patient picked the option of skin creams and drinking enough water using the Bayes theorem.

Solution: Assume E1: The patient uses skin creams and drinks enough
water; E2: The patient uses the drug; A: The selected patient has the skin
disease

P(E1) = P(E2) = 1/2

Using the probabilities known to us, we have

P(A|E1) = 0.4 × (1-0.3) = 0.28

P(A|E2) = 0.4 × (1-0.2) = 0.32

Using Bayes Theorem, the probability that the selected patient uses skin
creams and drinks enough water is given by,

= (0.28 × 0.5)/(0.28 × 0.5 + 0.32 × 0.5)

= 0.14/(0.14 + 0.16)

= 0.47

Answer: The probability that the patient picked the first option is 0.47

Example 3: A man is known to speak the truth 3/4 times. He draws a card and reports it is a king. Find the probability that it is actually a king.

Solution:

Let E be the event that the man reports that a king is drawn from the pack of cards,

A be the event that a king is drawn, and

B be the event that a king is not drawn.

Then we have P(A) = probability that a king is drawn = 1/4

P(B) = probability that a king is not drawn = 3/4

P(E/A) = Probability that the man tells the truth that a king is drawn when actually a king is drawn = P(truth) = 3/4

P(E/B) = Probability that the man lies that a king is drawn when actually a king is not drawn = P(lie) = 1/4

Then according to Bayes theorem, the probability that it is actually a king =

P(A/E) = [3/4 × 1/4] ÷ [(3/4 × 1/4) + (1/4 × 3/4)]

= (3/16) ÷ (6/16)

= 3/16 × 16/6

= 1/2 = 0.5

Answer: Thus the probability that the drawn card is actually a king = 0.5

Bayes Factor:

A Bayes factor is the ratio of the likelihood of one particular hypothesis to the
likelihood of another. It can be interpreted as a measure of the strength of
evidence in favor of one theory among two competing theories.

The Bayes factor gives us a way to evaluate the evidence in the data for or against a null hypothesis, and to use external information to do so. It tells us what the weight of the evidence is in favor of a given hypothesis.

Deciphering the Bayes Factor

When we are comparing two hypotheses, H1 (the alternate hypothesis) and H0 (the null hypothesis), the Bayes Factor is often written as B10. It can be defined mathematically as

B10 = P(data | H1) / P(data | H0),

the ratio of the probability of the observed data under H1 to its probability under H0.

The Schwarz criterion is one of the easiest ways to calculate a rough approximation of the Bayes Factor.

Bayesian Information Criterion (BIC) / Schwarz Criterion:

The Bayesian Information Criterion (BIC) is an index used in Bayesian statistics to choose between two or more alternative models.

The BIC is also known as the Schwarz information criterion (abbr. SIC) or the Schwarz-Bayesian information criterion.

Definition of the Bayesian Information Criterion / Schwarz Criterion:

The Bayesian Information Criterion (BIC) is defined as:

BIC = k log(n) − 2 log(L(θ̂)).

Here n is the sample size, i.e. the number of observations or data points you are working with;
k is the number of parameters which your model estimates; and
θ is the set of all parameters.
L(θ̂) represents the likelihood of the model tested, given your data, when evaluated at the maximum likelihood values of θ.
You could call this the likelihood of the model when every parameter is set to its most favorable value.

Another way of understanding L(θ̂) is that it is the probability of obtaining the data which you have, supposing the model being tested was a given.

Comparing Models:

Comparing models with the Bayesian information criterion simply
involves calculating the BIC for each model. The model with the lowest BIC is
considered the best, and can be written BIC* (or SIC* if you use that name and
abbreviation).

We can also calculate the Δ BIC; the difference between a particular model and the
‘best’ model with the lowest BIC, and use it as an argument against the other model.
Δ BIC is just BICmodel – BIC*, where BIC* is the best model.

If Δ BIC is less than 2, it is considered ‘barely worth mentioning’ as an argument either for the best theory or against the alternate one. The edge it gives our best model is too small to be significant.
But if Δ BIC is between 2 and 6, one can say the evidence against the other model is
positive; i.e. we have a good argument in favor of our ‘best model’.
If it’s between 6 and 10, the evidence for the best model and against the weaker
model is strong.
A Δ BIC of greater than ten means the evidence favoring our best model vs the
alternate is very strong indeed.

Example:
Suppose you have a set of data with 50 observation points, and Model 1 estimates 3
parameters. Model 2 estimates 4 parameters. Let’s say the log of your maximum
likelihood for model 1 is a; and for model 2 it is 2a.
Using the formula k log(n) − 2 log(L(θ̂)) with base-10 logarithms, calculation of SIC on this data gives us:
 Model 1: 3log(50) – 2a = 5.1 – 2a
 Model 2: 4log(50) – 4a = 6.8 – 4a
So ΔBIC is 1.7 – 2a.

Since the evidence that the Bayesian Information Criterion gives us for model 1 will
only be ‘worth mentioning’ if 1.7 – 2a > 2, we can only claim conclusive results if -2a
> 0.3; that is to say, a < -0.15.
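A small Python sketch of this comparison is shown below; the value a = 1.5 is an arbitrary assumption chosen only to make the snippet runnable, and the helper name bic is not from the notes.

```python
import math

def bic(k, n, log_likelihood):
    """Schwarz / Bayesian Information Criterion: k*log(n) - 2*log(L).

    Base-10 logs are used here only to match the worked example above;
    the natural logarithm is more common in practice.
    """
    return k * math.log10(n) - 2 * log_likelihood

a = 1.5                                       # assumed log maximum likelihood of model 1
bic1 = bic(k=3, n=50, log_likelihood=a)       # Model 1: 3 parameters, log L = a
bic2 = bic(k=4, n=50, log_likelihood=2 * a)   # Model 2: 4 parameters, log L = 2a

best = min(bic1, bic2)                        # the model with the lowest BIC is preferred
delta_bic = max(bic1, bic2) - best            # ΔBIC = BIC_model − BIC*
print(bic1, bic2, delta_bic)
```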
Topic 4: Fisher Information

What is Fisher Information?

Fisher information tells us how much information about an unknown parameter we can get from a sample. In other words, it tells us how well we can measure a parameter, given a certain amount of data. More formally, it measures the expected amount of information given by a random variable (X) for a parameter (θ) of interest. The concept is related to the law of entropy, as both are ways to measure disorder in a system (Friedan, 1998).
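As a rough numerical illustration, the sketch below estimates the Fisher information I(θ) = E[(∂ log f(X; θ)/∂θ)²] for the mean of a normal distribution with known σ, where the analytic answer is 1/σ²; the parameter values and sample size are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma, n = 2.0, 1.5, 200_000      # assumed true parameter values and sample size
x = rng.normal(mu, sigma, size=n)     # draws of X ~ N(mu, sigma^2)

# Score function: derivative of log f(x; mu) with respect to mu
score = (x - mu) / sigma**2

# Fisher information = expected squared score; compare with the analytic 1 / sigma^2
print(np.mean(score**2), 1 / sigma**2)
```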

Applications:

 Describing the asymptotic behavior of maximum likelihood estimates.

 Calculating the variance of an estimator.

 Finding priors in Bayesian inference.

Topic 6: Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the sample
means approaches a normal distribution as the sample size gets larger — no
matter what the shape of the population distribution. This fact holds especially
true for sample sizes over 30.

All this is saying is that as you take more samples, especially large ones, your
graph of the sample means will look more like a normal distribution.

Here’s what the Central Limit Theorem is saying, graphically, using one of the simplest kinds of experiment: rolling a fair die. The more times you roll the die, the more the shape of the distribution of the sample means tends to look like a normal distribution graph.
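A quick simulation makes this concrete; the sketch below is a minimal illustration, and the sample sizes and number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_means_of_die(sample_size, repetitions=10_000):
    """Means of `sample_size` fair-die rolls, repeated many times."""
    rolls = rng.integers(1, 7, size=(repetitions, sample_size))
    return rolls.mean(axis=1)

# As the sample size grows, the histogram of these means looks more and
# more like a normal curve centred at 3.5, with a shrinking spread.
for n in (1, 5, 30):
    means = sample_means_of_die(n)
    print(n, round(means.mean(), 3), round(means.std(), 3))
```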
The Central Limit Theorem and Means

An essential component of the Central Limit Theorem is that the average of your
sample means will be the population mean. In other words, add up the means
from all of your samples, find the average and that average will be your actual
population mean.

Similarly, the standard deviation of your sample means will equal the population standard deviation divided by the square root of the sample size (the standard error). It’s a pretty useful phenomenon that can help accurately predict characteristics of a population.

A Central Limit Theorem word problem will most likely contain the phrase
“assume the variable is normally distributed”, or one like it. With these central
limit theorem examples, you will be given:

1. A population (i.e. 29-year-old males, seniors between 72 and 76, all registered vehicles, all cat owners)

2. An average (i.e. 125 pounds, 24 hours, 15 years, $15.74)

3. A standard deviation (i.e. 14.4lbs, 3 hours, 120 months, $196.42)

4. A sample size (i.e. 15 males, 10 seniors, 79 cars, 100 households)

Central Limit Theorem Examples:

 I want to find the probability that the mean is greater than a certain number

 I want to find the probability that the mean is less than a certain number
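For the first kind of question, here is a minimal Python sketch; the population mean 125, standard deviation 14.4, sample size 15 and threshold 130 are assumed numbers echoing the sample values listed above, not a problem from the notes. It uses the standard error σ/√n and the normal CDF.

```python
import math
from scipy import stats

mu, sigma, n = 125, 14.4, 15          # assumed population mean, std dev and sample size
x_bar = 130                           # assumed threshold for the sample mean

standard_error = sigma / math.sqrt(n)       # sigma / sqrt(n)
z = (x_bar - mu) / standard_error           # z-score of the sample mean
p_greater = 1 - stats.norm.cdf(z)           # P(sample mean > 130)
print(round(z, 2), round(p_greater, 4))
```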

Material for Numericals on CLT: https://www.statisticshowto.com/probability-and-statistics/normal-distributions/central-limit-theorem-definition-examples/
