Notes - Module 4
A confidence interval gives a range of values that, at a stated level of
confidence, contains the true value of the parameter.
For variable Xj, a confidence interval estimate of its population mean µj is given
by

X̄j ± z × Sj / √n

where X̄j is the sample mean of Xj, Sj is the sample standard deviation, n is the
sample size, and z is the critical value of the standard normal distribution for
the chosen confidence level (e.g. z = 1.96 for 95% confidence).
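As a quick sketch of this formula in Python (the sample values below are made up
for illustration, and z = 1.96 is the 95% critical value; for a sample this small
a t critical value would normally replace z, but the formula above uses z):

import math
import statistics

# Hypothetical sample of observations of variable Xj (made-up values).
sample = [12.1, 11.4, 13.2, 12.8, 11.9, 12.5, 13.0, 12.2]

n = len(sample)
x_bar = statistics.mean(sample)   # sample mean X̄j
s_j = statistics.stdev(sample)    # sample standard deviation Sj
z = 1.96                          # critical value for a 95% confidence level

margin = z * s_j / math.sqrt(n)
print(f"95% CI for the mean: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")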
Bayes Theorem
Bayes theorem is also known as the Bayes Rule or Bayes Law. It is a method to
determine the probability of an event based on the occurrences of prior events. It
is used to calculate conditional probability.
Bayes theorem states that the conditional probability of an event A, given the
occurrence of another event B, is equal to the product of the likelihood of B, given
A, and the probability of A, divided by the probability of B. It is given as:

P(A|B) = [P(B|A) × P(A)] / P(B)
Note:
For a partition E1, E2, …, En of the sample space, Bayes theorem takes the form

P(Ei|A) = [P(A|Ei) × P(Ei)] / [P(A|E1)P(E1) + P(A|E2)P(E2) + … + P(A|En)P(En)]

Here Ei ∩ Ej = φ, where i ≠ j, i.e. the events are mutually exclusive.
The union of all the events of the partition should give the sample space.
0 ≤ P(Ei) ≤ 1
Terms Related to Bayes Theorem
When two events A and B are independent, P(A|B) = P(A) and P(B|A) = P(B)
Example 1: Amy has two bags. Bag I has 7 red and 4 blue balls and bag II has 5 red
and 9 blue balls. Amy draws a ball at random and it turns out to be red.
Determine the probability that the ball was from bag I using the Bayes
theorem.
Solution: Let X and Y be the events that the ball is from the bag I and bag II,
respectively. Assume A to be the event of drawing a red ball. We know that the
probability of choosing a bag for drawing a ball is 1/2, that is,
P(X) = P(Y) = 1/2
Since there are 7 red balls out of a total of 11 balls in bag I,
P(drawing a red ball from bag I) = P(A|X) = 7/11
Similarly, since there are 5 red balls out of 14 balls in bag II, P(A|Y) = 5/14.
We need to determine the value of P(the ball drawn is from bag I given that it
is a red ball), that is, P(X|A). To determine this we will use Bayes Theorem. Using
Bayes theorem, we have the following:
P(X|A) = [P(A|X)P(X)] / [P(A|X)P(X) + P(A|Y)P(Y)]
= [(7/11)(1/2)] / [(7/11)(1/2) + (5/14)(1/2)]
≈ 0.64
Answer: Hence, the probability that the drawn ball is from bag I is 0.64
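As a numerical check of Example 1, here is a minimal Python sketch of the
partition form of Bayes theorem (the bag names are just labels for this example):

# Priors P(X), P(Y) and likelihoods P(A|X), P(A|Y) from the example.
priors = {"bag I": 1/2, "bag II": 1/2}
likelihoods = {"bag I": 7/11, "bag II": 5/14}

evidence = sum(priors[b] * likelihoods[b] for b in priors)  # P(A)
posterior = priors["bag I"] * likelihoods["bag I"] / evidence
print(f"P(bag I | red ball) = {posterior:.2f}")  # prints 0.64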
Example 2: Assume that the chances of a person having a skin disease are
40%. Using skin creams and drinking enough water reduces the risk of the
disease by 30%, while taking a certain prescribed drug reduces it by 20%.
A patient can choose any one of the two options, with equal probability.
It is given that, after picking one of the options, the patient selected at
random has the skin disease. Find the probability that the patient picked
the option of skin creams and drinking enough water, using the Bayes
theorem.
Solution: Assume E1: The patient uses skin creams and drinks enough
water; E2: The patient uses the drug; A: The selected patient has the skin
disease
Here P(E1) = P(E2) = 1/2. Since the base chance of the disease is 40%,
P(A|E1) = 0.4 × (1 − 0.3) = 0.28 and P(A|E2) = 0.4 × (1 − 0.2) = 0.32.
Using Bayes Theorem, the probability that the selected patient uses skin
creams and drinks enough water is given by
P(E1|A) = [P(A|E1)P(E1)] / [P(A|E1)P(E1) + P(A|E2)P(E2)]
= (0.28 × 1/2) / (0.28 × 1/2 + 0.32 × 1/2)
= 0.14/(0.14 + 0.16)
≈ 0.47
Answer: The probability that the patient picked the first option is 0.47
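The same computation for Example 2, as a short Python sketch:

# Likelihoods P(A|E1), P(A|E2) derived from the 40% base rate.
p_a_e1 = 0.4 * (1 - 0.3)   # creams and water: 0.28
p_a_e2 = 0.4 * (1 - 0.2)   # drug: 0.32
posterior = (0.5 * p_a_e1) / (0.5 * p_a_e1 + 0.5 * p_a_e2)
print(f"P(E1 | disease) = {posterior:.2f}")  # prints 0.47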
Example 3: A man is known to speak the truth 3 out of 4 times. He draws a
card from a pack of 52 cards and reports that it is a king. Find the probability
that it is actually a king.
Solution:
Let E be the event that the man reports that a king is drawn from the pack of
cards, A the event that a king is actually drawn, and B the event that the drawn
card is not a king. From a pack of 52 cards, P(A) = 4/52 = 1/13 and P(B) = 12/13.
P(E|A) = Probability that the man says the truth that a king is drawn when a king
is actually drawn = P(truth) = 3/4
P(E|B) = Probability that the man lies that a king is drawn when a king is not
drawn = P(lie) = 1/4
By Bayes theorem,
P(A|E) = [P(E|A)P(A)] / [P(E|A)P(A) + P(E|B)P(B)]
= (3/4 × 1/13) / (3/4 × 1/13 + 1/4 × 12/13)
= (3/52) ÷ (15/52)
= 3/15 = 1/5 = 0.2
Answer: Thus the probability that the drawn card is actually a king = 0.2
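And a numerical check of Example 3 in the same style:

# P(A), P(B) for a standard 52-card pack, and P(E|A), P(E|B) from the example.
p_king, p_not_king = 1/13, 12/13
p_say_king_truth, p_say_king_lie = 3/4, 1/4
posterior = (p_say_king_truth * p_king) / (
    p_say_king_truth * p_king + p_say_king_lie * p_not_king)
print(f"P(king | reports king) = {posterior:.2f}")  # prints 0.20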
Bayes Factor:
A Bayes factor is the ratio of the likelihood of one particular hypothesis to the
likelihood of another. It can be interpreted as a measure of the strength of
evidence in favor of one theory among two competing theories.
The Bayes factor gives us a way to weigh the evidence in the data for or
against a hypothesis (such as a null hypothesis), and to bring external
information into the comparison. It tells us what the weight of the evidence
is in favor of a given hypothesis.
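As a minimal sketch, a Bayes factor for two simple (point) hypotheses reduces to
a plain likelihood ratio. The coin-flip counts and the two hypothesised biases
below are made-up numbers for illustration:

from math import comb

n, heads = 100, 62   # hypothetical data: 62 heads in 100 flips

def binom_likelihood(p, n, k):
    # Probability of observing k heads in n flips if P(heads) = p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Bayes factor for H1: p = 0.6 against H0: p = 0.5.
bf = binom_likelihood(0.6, n, heads) / binom_likelihood(0.5, n, heads)
print(f"Bayes factor (H1 vs H0): {bf:.1f}")  # > 1 favours H1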
Deciphering the Bayes Factor
The Schwarz criterion, better known as the Bayesian information criterion (BIC),
is one of the easiest ways to calculate a rough approximation of the Bayes factor.
The BIC is also known as the Schwarz information criterion (abbreviated SIC) or
the Schwarz-Bayesian information criterion. It is given by
BIC = k log(n) − 2 log(L(θ̂))
Here n is the sample size; the number of observations or number of data points you
are working with.
k is the number of parameters which your model estimates, and
θ is the set of all parameters.
L(θ̂) represents the likelihood of the model tested, given your data, when evaluated
at maximum likelihood values of θ.
You could call this the likelihood of the model with every parameter set to its
most favorable value.
Another way of understanding L(θ̂) is that it is the probability of obtaining the
data you have, assuming the model being tested is true.
Comparing Models:
Comparing models with the Bayesian information criterion simply
involves calculating the BIC for each model. The model with the lowest BIC is
considered the best, and can be written BIC* (or SIC* if you use that name and
abbreviation).
We can also calculate the Δ BIC; the difference between a particular model and the
‘best’ model with the lowest BIC, and use it as an argument against the other model.
ΔBIC is just BICmodel − BIC*, where BIC* is the BIC of the best model.
Example:
Suppose you have a set of data with 50 observation points, and Model 1 estimates 3
parameters. Model 2 estimates 4 parameters. Let’s say the log of your maximum
likelihood for model 1 is a; and for model 2 it is 2a.
Using the formula k log(n) − 2 log(L(θ̂)) with base-10 logarithms,
calculation of the SIC on this data gives us:
Model 1: 3 log(50) − 2a = 5.1 − 2a
Model 2: 4 log(50) − 4a = 6.8 − 4a
So ΔBIC = BIC2 − BIC1 = 1.7 − 2a.
Since the evidence that the Bayesian Information Criterion gives us for model 1 will
only be ‘worth mentioning’ if 1.7 – 2a > 2, we can only claim conclusive results if -2a
> 0.3; that is to say, a < -0.15.
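The same comparison as a small Python sketch, following the example's base-10
logarithms and picking a hypothetical value a = −0.2 (which satisfies a < −0.15):

import math

def bic(k, n, log10_likelihood):
    # Schwarz criterion with base-10 logs, matching the example above.
    return k * math.log10(n) - 2 * log10_likelihood

n = 50
a = -0.2                  # hypothetical log10 maximum likelihood of model 1
bic1 = bic(3, n, a)       # model 1: 3 parameters
bic2 = bic(4, n, 2 * a)   # model 2: 4 parameters, log-likelihood 2a

delta = bic2 - bic1       # ΔBIC relative to the lower-BIC model
print(f"BIC1 = {bic1:.2f}, BIC2 = {bic2:.2f}, ΔBIC = {delta:.2f}")  # ΔBIC ≈ 2.1 > 2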
Topic 4: Fisher Information
Applications: The Central Limit Theorem
The Central Limit Theorem states that the sampling distribution of the sample
means approaches a normal distribution as the sample size gets larger, no
matter what the shape of the population distribution. In practice, the
approximation is usually considered good for sample sizes over 30.
All this is saying is that as you take more samples, especially large ones, your
graph of the sample means will look more like a normal distribution.
Here's what the Central Limit Theorem is saying in practice. One of the simplest
types of experiment is rolling a fair die: the more rolls you average over, the
more the distribution of those sample means tends to look like a normal
distribution graph, as the simulation sketch below illustrates.
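A quick simulation sketch of the die example (standard library only; the sample
size of 30 rolls and the 10,000 repetitions are arbitrary choices):

import random
import statistics

random.seed(0)

def sample_mean(n_rolls):
    # Mean of n_rolls throws of a fair six-sided die.
    return statistics.mean(random.randint(1, 6) for _ in range(n_rolls))

means = [sample_mean(30) for _ in range(10_000)]
pop_sd = (35 / 12) ** 0.5   # standard deviation of a single die roll

print(f"mean of sample means:  {statistics.mean(means):.3f}  (population mean = 3.5)")
print(f"stdev of sample means: {statistics.stdev(means):.3f}  (sigma/sqrt(30) = {pop_sd / 30 ** 0.5:.3f})")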
The Central Limit Theorem and Means
An essential component of the Central Limit Theorem is that the average of your
sample means will be the population mean. In other words, add up the means
from all of your samples, find the average and that average will be your actual
population mean.
Similarly, averaging the standard deviations from all of your samples gives an
approximation of the population standard deviation, while the standard deviation
of the sample means themselves is the population standard deviation divided by
√n (the standard error). It's a pretty useful phenomenon that can help
accurately predict characteristics of a population.
A Central Limit Theorem word problem will most likely contain the phrase
“assume the variable is normally distributed”, or one like it. With these central
limit theorem examples, you will be given: