Continuous Random Variable and Z Test (MADA)
3
1. What is Hypothesis Testing?
2. Size and Power of a Test
3. Neyman-Pearson Paradigm of Hypothesis Testing
4. Types of Hypothesis Testing
5. Motivating Example for Hypothesis Testing
“You can’t prove a hypothesis; you can only improve or disprove it.” – Christopher
Monckton
4
1. Z test: Tests concerning the mean of a normal population
2. Solved Example using Z test: 1
3. Solved Example using Z test: 2
4. Solved Example using Z test: 3
5. Solved Example using Z test: 4
5
Discrete and Continuous Random Variables
Definition:
A discrete random variable is characterized by its ability to assume, at most, a countable number of
possible values.
Consequently, any random variable capable of adopting either a finite number or a countably infinite
number of distinct values qualifies as a discrete random variable.
It's worth noting that there are also random variables whose set of potential values is uncountably infinite.
Definition:
Continuous random variables describe random outcomes that are numerical but cannot be enumerated: their possible values form a continuum, such as an interval of the real line, and are infinitely divisible.
6
Discrete Random Variable vs. Continuous Random Variable (comparison figure)
7
Probability density function (pdf)
• A Probability Density Function (pdf) is a mathematical function that describes the likelihood of a
continuous random variable falling within a particular range of values.
• In other words, it provides a way to represent the probability distribution of a continuous random
variable. It is denoted by f(x)
8
Probability density function (pdf)
9
Properties of pdf
• f(x) ≥ 0 for every x
• The total area under the curve is 1: ∫ f(x) dx over (−∞, ∞) = 1
10
Properties of pdf
• For any a ≤ b, P(a ≤ X ≤ b) = ∫ f(x) dx over [a, b], i.e., the area under f between a and b
• For a continuous random variable, P(X = a) = 0 for every single value a
11
Cumulative Distribution Function
• The cumulative distribution function (CDF) of a continuous random variable X with pdf f is F(x) = P(X ≤ x) = ∫ f(t) dt over (−∞, x]
• F(x) is non-decreasing, and f(x) = dF(x)/dx wherever the derivative exists
12
Expectation and Variance of any continuous random variable
• E(X) = ∫ x f(x) dx over (−∞, ∞)
• Var(X) = E[(X − E(X))²] = E(X²) − [E(X)]²
13
Continuous Random Variable Distributions
• Normal Distribution
• Standard Normal Distribution
• T distribution
14
Normal Distribution
• The normal distribution, alternatively referred to as the Gaussian distribution or bell curve, is a continuous
probability distribution exhibiting symmetry around its mean, which coincides with its median and mode.
• Defined by its mean (μ) and standard deviation (σ), the normal distribution takes on a bell-shaped form.
• The normal distribution is one of the most important distributions in statistics, both for theoretical work and for practical applications.
15
Probability Density Function (pdf)
Normal Distribution
• f(x) = ( 1 / (σ √(2π)) ) · exp( −(x − μ)² / (2σ²) ),  −∞ < x < ∞
16
Shape of Normal Probability Density Function (pdf)
17
Height of the graph
• The curve attains its maximum height at x = μ; the peak value is f(μ) = 1 / (σ √(2π))
18
Height of the graph
• Because the peak height is 1 / (σ √(2π)), a larger standard deviation σ gives a lower, flatter curve, while a smaller σ gives a taller, narrower one
19
Same Mean and Different Standard Deviation
• Normal curves with the same mean but different standard deviations are centered at the same point; the curve with the larger σ is flatter and more spread out
20
Same Standard Deviation and Different Mean
• Normal curves with the same standard deviation but different means have identical shapes; each is shifted horizontally so that it is centered at its own mean
21
Symmetric around Mean
• The normal curve is symmetric about its mean: f(μ + a) = f(μ − a) for every a, so P(X ≤ μ) = P(X ≥ μ) = 0.5
22
Normal Distribution
1. Bell Shaped
2. Centered at the mean, i.e., the expected value
3. Close to the horizontal axis outside the range from μ − 3σ to μ + 3σ
23
Approximation rule for Normal Distribution
• About 68% of the values fall within one standard deviation of the mean (μ ± σ)
• About 95% fall within two standard deviations (μ ± 2σ)
• About 99.7% fall within three standard deviations (μ ± 3σ)
24
Example of Normal Distribution N(μ, σ²)
X ~ N(10, 16), i.e., μ = 10 and σ = 4
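The probabilities worked on the original slides for this distribution are not reproduced here; the sketch below shows how such probabilities can be computed with scipy.stats (the particular values, e.g. P(X ≤ 14), are illustrative):
```python
# Illustrative sketch: probabilities for X ~ N(10, 16), i.e. mu = 10, sigma = 4.
# The particular probabilities computed here are examples, not the original slide values.
from scipy.stats import norm

mu, sigma = 10, 4                  # N(mu, sigma^2) with sigma^2 = 16
X = norm(loc=mu, scale=sigma)

print(X.cdf(14))                   # P(X <= 14) = P(Z <= 1)         ~ 0.8413
print(1 - X.cdf(14))               # P(X > 14)                      ~ 0.1587
print(X.cdf(14) - X.cdf(6))        # P(6 <= X <= 14) = P(|Z| <= 1)  ~ 0.6827
```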
28
Standard Normal Distribution
Z ~ N(0,1)
29
Standard Normal Distribution
Z ~ N(0,1)
Z-Score Formula
A value's z-score is calculated using the following formula:
z = (x − μ) / σ
Where:
z = Z-score
x = the value being evaluated
μ = the mean
σ = the standard deviation
30
Standard Normal Distribution
Z ~ N(0,1)
How to Calculate Z-Score
Calculating a z-score requires that you first determine the mean and standard deviation of your
data. Once you have these figures, you can calculate your z-score. So, assume you have the
following variables:
x = 57
μ = 52
σ = 4
Then z = (x − μ) / σ = (57 − 52) / 4 = 1.25, so the value 57 lies 1.25 standard deviations above the mean.
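As a quick check, the same z-score can be computed in a couple of lines of Python (plain arithmetic, no libraries needed):
```python
# Z-score for the values above: x = 57, mu = 52, sigma = 4
x, mu, sigma = 57, 52, 4
z = (x - mu) / sigma
print(z)   # 1.25 -> the value 57 lies 1.25 standard deviations above the mean
```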
31
Approximate Rule to Standard Normal Distribution
Z ~ N(0,1)
32
Approximate Rule to Standard Normal Distribution
Z ~ N(0,1)
33
Properties of Z
• E(Z) = 0 and Var(Z) = 1
• The standard normal curve is symmetric about 0, so P(Z ≤ 0) = P(Z ≥ 0) = 0.5
• P(Z ≤ −z) = P(Z ≥ z) = 1 − P(Z ≤ z) for any value z
• P(a < Z < b) = P(Z < b) − P(Z < a)
37
Standard Normal Table
38
Standard Normal Table
Example: P(0 < Z < 1)
= P( Z < 1) – P( Z < 0)
= 0.84134 – 0.5
= 0.34134
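Table lookups like this can be reproduced (or extended to values not in the printed table) with scipy.stats; a minimal sketch:
```python
# P(0 < Z < 1) for a standard normal Z, computed from the CDF
from scipy.stats import norm

p = norm.cdf(1) - norm.cdf(0)      # 0.84134... - 0.5
print(round(p, 5))                 # 0.34134
```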
41
What Is a T-Distribution?
• The t-distribution, also known as the Student's t-distribution, is a type of probability distribution that is similar to the normal distribution with its bell shape but has heavier tails.
• It is used for estimating population parameters for small sample sizes or unknown variances.
• T-distributions have a greater chance for extreme values than normal distributions, and as a result have fatter
tails.
• The t-distribution is the basis for computing t-tests in statistics
KEY TAKEAWAYS
• The t-distribution is a continuous probability distribution of the z-score when the
estimated standard deviation is used in the denominator rather than the true standard
deviation.
• The t-distribution, like the normal distribution, is bell-shaped and symmetric, but it has
heavier tails, which means that it tends to produce values that fall far from its mean.
• T-tests are used in statistics to estimate significance.
42
What Is a T-Distribution?
What Does a T-Distribution Tell You?
Tail heaviness is determined by a parameter of the t-distribution called degrees of freedom, with smaller values
giving heavier tails, and with higher values making the t-distribution resemble a standard normal distribution with a
mean of 0 and a standard deviation of 1.
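A small sketch makes this concrete: the tail probability P(|T| > 2) is larger for a t-distribution with few degrees of freedom and approaches the standard normal value as the degrees of freedom grow.
```python
# Tail probability P(|T| > 2): heavier for small degrees of freedom,
# approaching the standard normal value as df grows
from scipy.stats import norm, t

for df in (2, 5, 30, 100):
    print(df, 2 * t.sf(2, df))     # e.g. df = 2 gives ~0.18, df = 100 gives ~0.048
print("normal", 2 * norm.sf(2))    # ~0.0455, the limiting value
```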
43
What Is a T-Distribution?
• If X̄ and s denote the mean and standard deviation of a random sample of size n from a normal population with mean μ, then t = (X̄ − μ) / (s / √n) follows a t-distribution with n − 1 degrees of freedom
44
T-Distribution vs. Normal Distribution
• Normal distributions are used when the population distribution is assumed to be normal.
• The t-distribution is similar to the normal distribution, just with fatter tails. Both assume a
normally distributed population.
• The probability of getting values very far from the mean is larger with a t-distribution than a
normal distribution.
45
T-Distribution vs. Normal Distribution
Important Note: Because the t-distribution has fatter tails than a normal distribution, it
can be used as a model for financial returns that exhibit excess kurtosis, which will allow
for a more realistic calculation of Value at Risk (VaR) in such cases.
46
Limitations of Using a T-Distribution
• Compared with the normal distribution, the t-distribution is less precise; this shortcoming matters only when near-perfect normality is required.
• The t-distribution should only be used when the population standard deviation is not known.
• If the population standard deviation is known and the sample size is large enough, the normal
distribution should be used for better results.
Note: We will look into the main application of t distribution in the Hypothesis Testing lecture (i.e.,
T-test).
47
What is Hypothesis Testing
Motivating Example: Is a coin fair or unfair?
A fair coin is said to have a probability of getting heads P(H) = 0.5.
An unfair coin is said to have a probability of getting heads P(H) = 0.6.
Let us suppose you have a coin that could be fair or unfair. You may toss the coin multiple times and observe the
results. How would you test whether the coin is fair or unfair?
48
Hypothesis Testing:
Using samples, decide between a null hypothesis (H0) and an alternative hypothesis (HA)
Fair Coin Example:
H0 : P(H) = 0.5
HA : P(H) = 0.6
One of the most important statistical analysis methods with a wide range of applications.
49
What is Hypothesis Testing
In summary, the null hypothesis represents the assumption to be tested, while the alternative hypothesis
represents the researcher's claim or the possibility of an effect or difference.
The goal of hypothesis testing is to gather evidence from sample data to decide whether to reject the null
hypothesis in favor of the alternative hypothesis.
50
Accepting or Rejecting the Null Hypothesis
Motivating Example: Is a coin fair or unfair?
• Suppose we toss the coin 3 times.
• Possible outcomes are HHH, HHT, … , TTT
• For some outcomes we will accept H0, and for others we will reject H0
• Let A be the set of all outcomes for which we accept H0
• Every acceptance subset A corresponds to a test
51
Size and Power of a Test
• A Type I error is rejecting H0 when H0 is true; a Type II error is accepting H0 when HA is true
• Size of a test: α = P(Type I error) = P( Reject H0 | H0 is true )
• Power of a test: 1 − β = P( Reject H0 | HA is true ), where β = P(Type II error) = P( Accept H0 | HA is true )
52
Computing the Size and Power for Unfair Coin Example
H0 : P(H) = 0.5
HA : P(H) = 0.6
Toss 3 times: possible outcomes = { HHH, HHT, HTH, THH, THT, HTT, TTH, TTT }
If acceptance subset A = ∅
• Always reject H0
• α = 1, β = 0
If acceptance subset A = { HHH, HHT, HTH, THH, THT, HTT, TTH, TTT }
• Always accept H0
• α = 0, β = 1
If acceptance subset A = { HHT, HTH, THH, THT, HTT, TTH }
• α = P( Aᶜ | P(H) = 0.5 ) = 2/8 = 0.25
• β = P( A | P(H) = 0.6 ) = 3(0.4)²(0.6) + 3(0.4)(0.6)² = 0.72
The value α, called the level of significance of the test, is usually set in
advance, with commonly chosen values being α = .1, .05, .005.
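These calculations can be checked numerically with the binomial distribution; a minimal sketch (the acceptance set above corresponds to observing exactly one or two heads in three tosses):
```python
# Size and power of the 3-toss test whose acceptance set A = {one or two heads}
from scipy.stats import binom

n = 3
# alpha = P(reject H0 | H0 true): reject only on 0 or 3 heads when P(H) = 0.5
alpha = binom.pmf(0, n, 0.5) + binom.pmf(3, n, 0.5)
# beta = P(accept H0 | HA true): accept on 1 or 2 heads when P(H) = 0.6
beta = binom.pmf(1, n, 0.6) + binom.pmf(2, n, 0.6)
print(alpha, beta, 1 - beta)       # 0.25, 0.72, power = 0.28
```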
53
Neyman-Pearson Paradigm of Hypothesis Testing
X1,X2,X3,…,Xn ~ iid X
• H0: Null Hypothesis on the distribution of X, HA: Alternative Hypothesis
• Test: Defined by an acceptance set A
• If samples fall in A, accept H0; otherwise, reject H0
• Two Errors:
• Type I Error: Reject H0 when H0 is true
• Type II error: Accept H0 when HA is true
• Two Metrics
• Size of a test = α = P( Type I error ) = P( Reject H0 | H0 is true)
• Power = 1 – β = P( Reject H0 | HA is true)
54
Types of Hypothesis Testing
Simple Hypothesis:
• A hypothesis that completely specifies the distribution of the samples is called a simple hypothesis.
• Example:
1) Coin toss: P(Heads) = 0.5 vs. P(Heads) = 0.8
2) Normal(µ, 3) samples: µ = 1 vs. µ = −1, etc.
• Simple null vs simple alternative
55
Types of Hypothesis Testing
Composite Hypothesis
• A hypothesis that does not completely specify the distribution of the samples is called a composite hypothesis.
• Example: Normal(µ, 3) samples with H0: µ ≤ 0 versus HA: µ > 0; neither hypothesis pins down a single value of µ.
56
Types of Hypothesis Testing
Standard Tests: One Sample
X1,X2,X3,…,Xn ~ iid X
E(X) = µ ; Var(X) = σ²
57
Z-score values for Rejection Regions
58
Standard Tests: One Sample
X1,X2,X3,…,Xn ~ iid X
E(X) = µ ; Var(X) = σ²
59
Standard Tests: Two Sample
X1,X2,X3,…,Xn ~ iid X
Y1,Y2,Y3,…,Yn ~ iid Y
E(X) = µ1 ; Var(X) = σ1²
E(Y) = µ2 ; Var(Y) = σ2²
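As one illustration of this setup, a two-sample Z test of H0: µ1 = µ2 with known (or large-sample) variances can be sketched as follows; the summary figures used here are purely hypothetical:
```python
# Hypothetical sketch: two-sample Z test of H0: mu1 = mu2 (known or large-sample variances)
from math import sqrt
from scipy.stats import norm

n1, xbar1, s1 = 40, 52.0, 8.0      # hypothetical summary of sample X
n2, xbar2, s2 = 50, 49.0, 9.0      # hypothetical summary of sample Y
z = (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)
p_value = 2 * norm.sf(abs(z))      # two-tailed p-value
print(z, p_value)                  # z ~ 1.67, p ~ 0.09
```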
60
Goodness of fit testing
Samples: X1,X2,X3,…,Xn
• Examples :
• Integer Samples Xi ∈ { 0,1,2, … }. Is the distribution Poisson?
• Continuous Samples Xi ∈ (−∞, ∞). Is the distribution normal?
61
Observations
• In all examples, the questions seem to be reasonably posed in a statistical hypothesis testing
framework.
• In most cases, the null and/or alternative are composite
• In all cases, the confidence of the testing is very important.
62
Covering concepts (through an example)
Suppose that a construction firm has just purchased a large supply of cables that have been guaranteed to have an
average breaking strength of at least 7,000 psi.
To verify this claim, the firm has decided to take a random sample of 10 of these cables to determine their breaking
strengths. They will then use the result of this experiment to ascertain whether or not they accept the cable
manufacturer’s hypothesis that the population mean is at least 7,000 pounds per square inch.
A primary problem is to develop a procedure for determining whether or not the values of a
random sample from this population are consistent with the hypothesis
63
For instance,
consider a particular normally distributed population having an unknown mean value θ and known variance 1.
The statement “θ is less than 1” is a statistical hypothesis that we could try to test by observing a random sample
from this population.
If the random sample is deemed to be consistent with the hypothesis under consideration, we say that the
hypothesis has been “accepted”; otherwise, we say that it has been “rejected”
Important Note: In accepting a given hypothesis, we are not actually claiming that it is true but rather
we are saying that the resulting data appear to be consistent with it.
64
For instance,
in the case of a normal (θ, 1) population,
If a resulting sample of size 10 has an average value of 1.25, then although such a result cannot be regarded
as being evidence in favor of the hypothesis “θ < 1,” it is not inconsistent with this hypothesis, which would
thus be accepted.
On the other hand, if the sample of size 10 has an average value of 3, then even though a sample value that
large is possible when θ < 1, it is so unlikely that it seems inconsistent with this hypothesis, which would
thus be rejected.
65
Significance Levels
For the normal (θ, 1) population above, two null hypotheses we might test are (a) H0: θ = 1 and (b) H0: θ ≤ 1.
Thus, the first of these hypotheses states that the population is normal with mean 1 and variance 1, whereas the second states that it is normal with variance 1 and a mean less than or equal to 1.
Note: the null hypothesis in (a), when true, completely specifies the population
distribution, whereas the null hypothesis in (b) does not. A hypothesis that, when true,
completely specifies the population distribution is called a simple hypothesis; one that
does not is called a composite hypothesis.
66
Suppose now that in order to test a specific null hypothesis H0, a population sample of size n — say X1, . . . , Xn — is to be observed. Based on these n values, we must decide whether or not to accept H0.
A test for H0 can be specified by defining a region C in n-dimensional space with the proviso that the hypothesis is to be rejected if the random sample X1, . . . , Xn turns out to lie in C and accepted otherwise.
The region C is called the critical region. In other words, the statistical test determined by the critical region C is the one that accepts H0 if (X1, . . . , Xn) does not lie in C and rejects H0 if (X1, . . . , Xn) lies in C.
67
Important Note:
When developing a procedure for testing a given null hypothesis H0, it should be kept in mind that, in any test, two different types of errors can result.
1. The first of these, called a type I error, is said to result if the test incorrectly calls for rejecting H0 when it is indeed correct.
2. The second, called a type II error, results if the test calls for accepting H0 when it is false.
The objective of a statistical test of H0 is not to explicitly determine whether or not H0 is true but
rather to determine if its validity is consistent with the resultant data.
Hence, with this objective, it seems reasonable that H0 should only be rejected if the resultant
data are very unlikely when H0 is true.
68
The classical way of accomplishing this is
(i) First, specify a value α and
(ii) then require the test to have the property that whenever H0 is true its probability of being rejected is never
greater than α.
The value α, called the level of significance of the test, is usually set in advance, with commonly chosen values
being α = .1, .05, .005.
In other words,
the classical approach to testing H0 is to fix a significance level α and then require that the test have the
property that the probability of a type I error occurring can never be greater than α.
69
Tests concerning the mean of a normal population
Z test
1. Case of Known Variance
Suppose that X1, . . . , Xn is a sample of size n from a normal distribution having an unknown mean μ
and a known variance σ², and suppose we are interested in testing the null hypothesis
H0 : μ = μ0
H1 : μ ≠ μ0
70
If we desire that the test has significance level α, then we must determine the critical value c that will make the type I error equal to α. That is, c must be such that
P( |X̄ − μ0| ≥ c ) = α  when μ = μ0.
Since √n (X̄ − μ0) / σ has a standard normal distribution when H0 is true, this condition is equivalent to
P( |Z| ≥ c √n / σ ) = α,
71
where Z is a standard normal random variable. However, we know that P( |Z| ≥ zα/2 ) = α, so c = zα/2 σ / √n. The level-α test therefore rejects H0 when | √n (X̄ − μ0) / σ | ≥ zα/2 and accepts H0 otherwise.
72
In the corresponding figure, the standard normal density function [which is the density of the test statistic √n (X̄ − μ0) / σ when H0 is true] is superimposed, with the rejection region in the two tails beyond −zα/2 and +zα/2.
73
Determining Critical Values (Zα/2)
What is the critical value (Zα/2) for a 95% confidence level (α = 0.05), assuming a two-tailed test?
Answer: Zα/2 = Z0.025 ≈ 1.96, so the critical values are −1.96 and +1.96.
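Such critical values come from the inverse of the standard normal CDF; a minimal sketch using scipy.stats:
```python
# Critical z-values via the inverse standard normal CDF (percent point function)
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha / 2))     # two-tailed:   +/- 1.96
print(norm.ppf(1 - alpha))         # right-tailed: +1.645
print(norm.ppf(alpha))             # left-tailed:  -1.645
```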
74
Rejection regions for different tailed Z test
75
Z-score values for common confidence levels of a normal distribution
99% confidence level (i.e., α = 0.01):
Left-tailed test: Zα = −2.33
Two-tailed test: Zα/2 = ±2.576 ≈ ±2.58 (the critical z-values are +2.58 and −2.58)
Right-tailed test: Zα = +2.33
76
In the example considered here, the null hypothesis is H0: μ = 8 and the computed value of the test statistic is 1.68. Since this value is less than z.025 = 1.96, the hypothesis is accepted. In other words, the data are not inconsistent with the null hypothesis, in the sense that a sample average as far from the value 8 as the one observed would be expected, when the true mean is 8, over 5 percent of the time.
Note, however, that if a less stringent significance level were chosen, say α = 0.1, then the null hypothesis would have been rejected. This follows since z.05 = 1.645, which is less than 1.68.
Hence, if we had chosen a test that had a 10 percent chance of rejecting H0 when H0 was
true, then the null hypothesis would have been rejected.
77
The “correct” level of significance to use in a given situation depends on the individual circumstances involved in
that situation.
For instance, if rejecting a null hypothesis H0 would result in large costs that would thus be lost if H0 were indeed true, then we might elect to be quite conservative and so choose a significance level of .05 or .01.
Also, if we initially feel strongly that H0 is correct, then we would require very stringent data evidence to the contrary in order to reject H0. (That is, we would set a very low significance level in this situation.)
78
Example (1/4) of Z test
Suppose a manufacturer claims that the mean weight of their product is 500 grams. To test this claim, a random sample
of 36 products is selected, and their weights are recorded. The sample mean weight is found to be 495 grams, with a
sample standard deviation of 10 grams. Assume the weights follow a normal distribution. Using a significance level of
0.05, test the manufacturer's claim.
H0: μ = 500
H1: μ ≠ 500
79
1) State the hypotheses: H0: μ = 500, H1: μ ≠ 500 (as given above)
2) Set the significance level: α = 0.05
3) Calculate the test statistic (z-score):
z = (x̄ − μ0) / (σ / √n) = (495 − 500) / (10 / √36) = −5 / 1.667 = −3
80
81
Z-score values for common confidence levels of a normal distribution
82
4) Determine the critical value:
Since it's a two-tailed test, the critical z-values are -1.96 and 1.96 at a significance level of
0.05.
5) Decision rule:
If the absolute value of the z-score is greater than 1.96, we reject the null hypothesis.
6) Make a decision:
The calculated z-value (-3) falls in the rejection region (less than -1.96), so we reject the null
hypothesis.
7) Conclusion:
Since we reject the null hypothesis, we have sufficient evidence to conclude that the mean
weight of the product is not 500 grams.
Therefore, based on the sample data, there is enough evidence to suggest that the
manufacturer's claim is not correct at the 0.05 significance level.
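The same test can be reproduced in a few lines of Python (a sketch using the figures above; as on the slide, the sample standard deviation of 10 stands in for the population σ, which is reasonable for n = 36):
```python
# Two-tailed one-sample Z test for Example 1: H0: mu = 500 vs H1: mu != 500
from math import sqrt
from scipy.stats import norm

n, xbar, mu0, sigma = 36, 495, 500, 10
z = (xbar - mu0) / (sigma / sqrt(n))     # (495 - 500) / (10 / 6) = -3.0
z_crit = norm.ppf(1 - 0.05 / 2)          # 1.96
p_value = 2 * norm.sf(abs(z))            # ~0.0027
print(z, z_crit, p_value)
print("reject H0" if abs(z) > z_crit else "fail to reject H0")   # reject H0
```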
83
Example (2/4) of Z test
Suppose a manufacturer claims that the average lifespan of their light bulbs is at least 1000 hours. You believe that the
average lifespan is actually less than that. To test this claim, you collect a sample of 50 light bulbs and find that the
average lifespan is 980 hours with a standard deviation of 40 hours. You want to test whether the average lifespan is
significantly less than 1000 hours at a 5% significance level.
84
Calculate the test statistic (z-score):
z = (x̄ − μ0) / (σ / √n) = (980 − 1000) / (40 / √50) ≈ −20 / 5.66 ≈ −3.54
85
Z-score values for common confidence levels of a normal distribution
86
Determine the critical value:
Since this is a left-tailed test and the significance level is 0.05, we find the
critical z-value from the standard normal distribution table. At α = 0.05, the
critical value is approximately −1.645.
Make a decision:
Since the calculated z-value (-3.54) is less than -1.645 (the critical value for a
5% significance level), we reject the null hypothesis.
Conclusion:
There is enough evidence to suggest that the average lifespan of the light
bulbs is significantly less than 1000 hours.
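A sketch of the same left-tailed test in Python, using the figures above (again the sample standard deviation of 40 stands in for σ, which is reasonable for n = 50):
```python
# Left-tailed one-sample Z test for Example 2: H0: mu >= 1000 vs H1: mu < 1000
from math import sqrt
from scipy.stats import norm

n, xbar, mu0, sigma = 50, 980, 1000, 40
z = (xbar - mu0) / (sigma / sqrt(n))     # -20 / (40 / sqrt(50)) ~ -3.54
z_crit = norm.ppf(0.05)                  # -1.645
p_value = norm.cdf(z)                    # ~0.0002
print(z, z_crit, p_value)
print("reject H0" if z < z_crit else "fail to reject H0")        # reject H0
```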
87
Example (3/4) of Z test
Suppose a company claims that the average response time for their customer service hotline is no more than 3
minutes. You believe that the average response time is actually longer than that. To test this claim, you collect a
sample of 40 calls to the hotline and find that the average response time is 3.5 minutes with a standard deviation of
0.8 minutes. You want to test whether the average response time is significantly greater than 3 minutes at a 5%
significance level.
State the hypothesis:
Null Hypothesis (H0):
The average response time for the customer service hotline is no more than 3 minutes.
Alternative Hypothesis (H1):
The average response time for the customer service hotline is greater than 3 minutes.
H0: μ = 3
H1: μ > 3
Set Significance Level:
Let's choose a significance level (α) of 0.05.
88
Calculate the test statistic (z-score):
z = (x̄ − μ0) / (σ / √n) = (3.5 − 3) / (0.8 / √40) ≈ 0.5 / 0.126 ≈ 3.95
89
Z-score values for common confidence levels of a normal distribution
90
Determine the critical value:
Since this is a right-tailed test and the significance level is 0.05, we
find the critical z-value from the standard normal distribution table.
At α = 0.05, the critical value is approximately 1.645.
Make a decision:
Since the calculated z-value (3.95) is greater than 1.645 (the critical
value for a 5% significance level), we reject the null hypothesis.
Conclusion:
There is enough evidence to suggest that the average response time
for the customer service hotline is significantly greater than 3
minutes.
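A sketch of the same right-tailed test in Python, using the figures above (the sample standard deviation of 0.8 stands in for σ for n = 40):
```python
# Right-tailed one-sample Z test for Example 3: H0: mu <= 3 vs H1: mu > 3
from math import sqrt
from scipy.stats import norm

n, xbar, mu0, sigma = 40, 3.5, 3, 0.8
z = (xbar - mu0) / (sigma / sqrt(n))     # 0.5 / (0.8 / sqrt(40)) ~ 3.95
z_crit = norm.ppf(1 - 0.05)              # 1.645
p_value = norm.sf(z)                     # ~0.00004
print(z, z_crit, p_value)
print("reject H0" if z > z_crit else "fail to reject H0")        # reject H0
```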
91
Example (4/4) of Z test
An institute claims that the average score of its students is 75. To test this claim, a two-tailed Z test is carried out on a random sample of students' scores at a significance level of 0.01.
H0: μ = 75
H1: μ ≠ 75
92
Calculate the test statistic (z-score):
93
Z-score values for common confidence levels of a normal distribution
94
Determine the critical value:
Since it's a two-tailed test at α = 0.01, the critical z-values are ±2.58 (rounded from z-table).
Decision rule:
If the absolute value of the z-score is greater than 2.58, we reject the null hypothesis.
Make a decision:
The absolute value of the calculated z-value (2.65) falls in the rejection region (greater than
2.58), so we reject the null hypothesis.
Conclusion:
Since we reject the null hypothesis, we have sufficient evidence to conclude that the
average score of the institute's students is not 75 at a significance level of 0.01.
Therefore, based on the sample data, there is enough evidence to suggest that the
institute's claim of the average score being 75 is not supported.
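Only the computed z-value (2.65), the significance level (0.01), and the hypothesized mean (75) are reproduced above, so the sketch below covers just the decision step:
```python
# Decision step for Example 4: two-tailed test at alpha = 0.01 with computed z = 2.65
from scipy.stats import norm

z, alpha = 2.65, 0.01
z_crit = norm.ppf(1 - alpha / 2)         # ~2.576 (the 2.58 used above)
print("reject H0" if abs(z) > z_crit else "fail to reject H0")   # reject H0
```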
95
• Sheldon M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, Sixth Edition
• N. G. Das, Statistical Methods, Combined Edition (Volumes I & II)
96
• Continuous random variables
• Normal distribution and the approximation rule
• T-distribution
97
• Defined hypothesis testing, with examples.
• Discussed the Z test in detail, along with four worked examples.