PT Module5
Hypothesis Testing I
➢ Introduction
➢ Types of errors
➢ Critical region
➢ Procedure for testing a hypothesis
➢ Large sample tests
➢ Z test for single proportion
➢ Difference of proportions
➢ Mean and difference of means
Introduction
Hypothesis testing uses tests of significance to decide whether a statement about a
population parameter (often the mean or variance of a given distribution) is supported
by sample data, and at what level of significance we would, as statisticians, retain
or reject that statement.
Right-Tailed Test:
A chemist invents an additive to increase the life of an automobile battery. If the
mean lifetime of the battery is 36 months, then his hypotheses are
𝐻0 : 𝜇 = 36 and 𝐻1 : 𝜇 > 36
Left-Tailed Test:
A contractor wishes to lower heating bills by using a special type of insulation in
houses. If the average of the monthly heating bills is Rs.78, her hypotheses about
heating costs will be
𝐻0 : 𝜇 = Rs.78 and 𝐻1 : 𝜇 < Rs. 78
A test statistic is computed after stating the null hypothesis. It is based on the
appropriate probability distribution.
A test statistic uses the data obtained from a sample to decide whether or not the
null hypothesis should be rejected.
The numerical value obtained from a test statistic is called the calculated value.
Errors in Hypothesis Testing:
A Type I error occurs if one rejects the null hypothesis when it is true. This is similar
to a good product being rejected by the consumer and hence Type I error is also
known as producer's risk.
The level of significance is an important concept in hypothesis testing. It is usually
expressed as a percentage. The level of significance is the maximum probability of
rejecting a null hypothesis when it is true and is denoted by 𝛼. When the null hypothesis
is true, the probability of making a correct decision is 1 − 𝛼. The level of significance
may be taken as 1% or 5% or 10% (i.e., 𝛼 = 0.01 or 0.05 or 0.1). If we fix the level of
significance at 5%, then the probability of making a Type I error is at most 0.05. This
also means that we are 95% confident of not rejecting a true null hypothesis. When no
level of significance is mentioned, it is taken as 𝛼 = 0.05.
A Type II error occurs if one does not reject the null hypothesis when it is false. As this
error is similar to that of accepting a product of inferior quality, it is known as
consumer's risk. The probability of committing Type II error is denoted by 𝛽.
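The link between a critical value and 𝛼 can be checked numerically. The following is a minimal sketch, assuming a two-tailed Z test and using Python's standard-library `statistics.NormalDist`; the helper name `two_tailed_alpha` is illustrative and not part of the notes.

```python
from statistics import NormalDist

# Standard normal distribution: the reference distribution of Z when H0 is true.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def two_tailed_alpha(critical_value: float) -> float:
    """Probability that |Z| exceeds critical_value when H0 is true (Type I error)."""
    return 2.0 * (1.0 - std_normal.cdf(critical_value))

alpha = two_tailed_alpha(1.96)
print(round(alpha, 4))      # probability of a Type I error
print(round(1 - alpha, 4))  # probability of not rejecting a true H0
```

With the critical value 1.96 the computed 𝛼 is approximately 0.05, matching the 5% level of significance.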
Types of Errors
If the sample size n is greater than or equal to 30 (𝑛 ≥ 30), the sample is called a Large
Sample.
The z-test is a statistical test for the mean of a population. It can be used for a large
sample, or when the population is normally distributed and 𝜎 is known.
The critical values for some standard levels of significance (LOS) are given in the following table:
Table 1:
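The standard critical values of Z can be recomputed from the inverse of the standard normal distribution function. This is a sketch only, assuming the usual 1%, 5% and 10% levels mentioned earlier; the function name `critical_values` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and S.D. 1

def critical_values(alpha: float) -> dict:
    """Critical values of Z for the three kinds of test at level alpha."""
    return {
        "two-tailed": std_normal.inv_cdf(1 - alpha / 2),  # reject if |Z| exceeds this
        "right-tailed": std_normal.inv_cdf(1 - alpha),    # reject if Z exceeds this
        "left-tailed": std_normal.inv_cdf(alpha),         # reject if Z is below this
    }

for alpha in (0.01, 0.05, 0.10):
    row = critical_values(alpha)
    print(f"alpha={alpha:.2f}", {name: round(value, 3) for name, value in row.items()})
```

At 𝛼 = 0.05 this gives about 1.96 for the two-tailed test and about ±1.645 for the one-tailed tests.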
Critical Region
The set of values of a test statistic that leads to rejection of the null
hypothesis 𝐻0 is known as the critical region. It is also called the region of
rejection. The critical region is the region under the standard normal curve
corresponding to a predetermined level of significance. The remaining region under
the normal curve (unshaded in the usual diagram) is known as the acceptance region.
Procedure for Hypothesis Testing
The main question in hypothesis testing is whether or not to reject the null hypothesis.
For large samples, the following test statistics are used.
$Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, if 𝜎 is known.
$Z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$, if 𝜎 is not known. Here, 𝑠 is the standard deviation of the sample.
Problem 1: The heights of college students in a city are normally distributed with S.D. 6
cms. A sample of 100 students has a mean height of 158 cms. Test the hypothesis that the
mean height of college students in the city is 160 cms.
Solution:
We have 𝑥ҧ = 158 (mean of the sample),
𝜇 = 160 (mean of the population), 𝜎 = 6, 𝑛 = 100.
Level of significance: 5%
𝐻0 : 𝜇 = 160, i.e., difference is not significant.
𝐻1 : 𝜇 ≠ 160
We apply the two tailed test.
Test statistic is $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{158 - 160}{6/\sqrt{100}} = -3.33$.
∴ |𝑍| = 3.33
Table value of 𝑍 at 5% level of significance (two-tailed) = 1.96. Since the calculated
value |𝑍| = 3.33 is greater than the table value 1.96, we reject 𝐻0 at 5% level of
significance. That is, the mean height differs significantly from 160 cms.
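The arithmetic of Problem 1 can be verified with a short script. This is a minimal sketch, assuming 𝜎 is known and the test is two-tailed; the function name `z_statistic` is illustrative.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Z = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Data of Problem 1: sample mean 158, hypothesised mean 160, S.D. 6, n = 100.
z = z_statistic(sample_mean=158, mu0=160, sigma=6, n=100)

# Two-tailed critical value at the 5% level of significance (about 1.96).
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)
```

Because |𝑍| exceeds the critical value, `reject_h0` is true, which agrees with the conclusion of the worked solution.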
Problem 2: A sample of 400 items is taken from a population whose standard deviation
is 10. The mean of the sample is 40. Test whether the sample has come from the
population with mean 38. Also calculate 95% confidence interval for the population
mean.
Solution:
𝐻0 : 𝜇 = 38
𝐻1 : 𝜇 ≠ 38
Level of significance: 5%
We apply the two tailed test.
Test statistic is $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{40 - 38}{10/\sqrt{400}} = 4$.
∴ |𝑍| = 4
Table value of 𝑍 at 5% level of significance (two-tailed) = 1.96. Since the calculated
value |𝑍| = 4 is greater than the table value 1.96, we reject 𝐻0 at 5% level of
significance. The 95% confidence interval for the population mean is given by
$\bar{x} \pm z_{\alpha} \dfrac{\sigma}{\sqrt{n}} = 40 \pm 1.96 \times \dfrac{10}{\sqrt{400}} = (39.02, 40.98)$,
where $z_{\alpha} = 1.96$ is the two-tailed table value at the 5% level.
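The confidence interval of Problem 2 can be reproduced as follows. This is a sketch under the same assumptions as the solution (known 𝜎, large sample); the function name `mean_confidence_interval` is illustrative.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval sample_mean +/- z * sigma / sqrt(n) for a known population S.D."""
    # Two-tailed table value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Data of Problem 2: sample mean 40, S.D. 10, n = 400.
lower, upper = mean_confidence_interval(sample_mean=40, sigma=10, n=400)
print(round(lower, 2), round(upper, 2))
```

The hypothesised mean 38 lies outside this interval, which is consistent with rejecting 𝐻0 in the two-tailed test at the same level.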
Problem 3: The mean of a certain production process is known to be 50 with a
standard deviation of 2.5. The production manager may welcome any change in the
mean value towards the higher side but would like to safeguard against decreasing
values of the mean. He takes a sample of 12 items that gives a mean value of 46.5. What
inference should the manager draw about the process on the basis of the sample
results? Use 5% level of significance.
Solution:
𝐻0 : 𝜇 = 50
𝐻1 : 𝜇 < 50
Level of significance: 𝛼 = 0.05
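Test statistic is $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{46.5 - 50}{2.5/\sqrt{12}} = -4.85$.
∴ |𝑍| = 4.85
This is a left-tailed test, for which the table value of 𝑍 at 5% level of significance is
−1.645. Since the calculated value 𝑍 = −4.85 is less than −1.645, we reject 𝐻0 at 5% level
of significance. The sample indicates that the process mean has decreased.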
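Problem 3 is a left-tailed test, so the decision compares 𝑍 with a negative critical value rather than comparing |𝑍| with 1.96. A minimal sketch, assuming the population is normal with known 𝜎 so that the Z test applies even though 𝑛 = 12:

```python
import math
from statistics import NormalDist

# Data of Problem 3.
sample_mean, mu0, sigma, n = 46.5, 50.0, 2.5, 12
alpha = 0.05

# Test statistic for H0: mu = 50 against H1: mu < 50.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Left-tailed critical value: the point with probability alpha to its left (about -1.645).
z_crit = NormalDist().inv_cdf(alpha)

reject_h0 = z < z_crit
print(round(z, 2), round(z_crit, 3), reject_h0)
```

The statistic falls far into the left tail, so 𝐻0 is rejected in favour of a lower process mean.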
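Let $\bar{x}_1$ be the mean of an independent random sample of size $n_1$ from a population with
mean $\mu_1$ and variance $\sigma_1^2$. Again, let $\bar{x}_2$ be the mean of an independent random sample
of size $n_2$ from a population with mean $\mu_2$ and variance $\sigma_2^2$. Here, $n_1$ and $n_2$ are large.
Clearly, $\bar{x}_1 - \bar{x}_2$ is approximately normal with mean $\mu_1 - \mu_2$ and variance
$\sigma_1^2/n_1 + \sigma_2^2/n_2$, so under $H_0: \mu_1 = \mu_2$ the test statistic is
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$.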
Test at 5% level that the mean height is the same for the children at two places.
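A test for the difference of two means can be sketched in code using the standard large-sample statistic Z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2). The numbers below are hypothetical and serve only to exercise the function; they are not the data of the problem stated above, and the function name `z_two_means` is illustrative.

```python
import math
from statistics import NormalDist

def z_two_means(mean1, mean2, var1, var2, n1, n2):
    """Z statistic for H0: mu1 = mu2, given population variances var1 and var2."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical inputs, chosen only for illustration.
z = z_two_means(mean1=10.0, mean2=8.0, var1=9.0, var2=16.0, n1=36, n2=64)

# Two-tailed critical value at the 5% level (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > z_crit
print(round(z, 3), reject_h0)
```

When the population variances are unknown and the samples are large, the sample variances are commonly substituted for `var1` and `var2`.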
Test of significance of the difference between sample proportion and
population proportion (single proportion)
Let $X$ be the number of successes in $n$ independent Bernoulli trials in which the probability
of success for each trial is a constant $P$ (say). Then it is known that $X$ follows a binomial
distribution with mean $E(X) = nP$ and variance $V(X) = nPQ$, where $Q = 1 - P$.
When $n$ is large, $X$ follows $N(nP, nPQ)$, i.e. a normal distribution with mean $nP$ and
S.D. $\sqrt{nPQ}$. Hence $\dfrac{X}{n}$ follows $N\left(\dfrac{nP}{n}, \dfrac{nPQ}{n^2}\right)$.
Now $\dfrac{X}{n}$ is the proportion of successes in the sample consisting of $n$ trials, which is denoted by
$p$. Thus the sample proportion $p$ follows $N\left(P, \dfrac{PQ}{n}\right)$. Therefore the test statistic is
$z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} \sim N(0,1)$.
If | 𝑧 | ≤ 𝑧𝛼 , the difference between the sample proportion 𝑝 and the population
proportion 𝑃 is not significant at α% LOS.
Note: When P is not known, the 95 percent confidence limits for P are given by $p \pm 1.96\sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.
Problem: If 20 people were attacked by a disease and only 18 survived, will you reject
the hypothesis that the survival rate, if attacked by this disease, is 85% in favor of the
hypothesis that it is more, at 5% level?
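One way to work this problem is to apply the single-proportion statistic z = (p − P) / √(PQ/n) with a right-tailed critical value. The following is a sketch, not the notes' own solution; with 𝑛 = 20 the normal approximation is rough, and the function name `z_single_proportion` is illustrative.

```python
import math
from statistics import NormalDist

def z_single_proportion(successes: int, n: int, p0: float) -> float:
    """z = (p - P) / sqrt(P * Q / n), with P = p0 the hypothesised proportion."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# H0: P = 0.85 against H1: P > 0.85, with 18 survivors out of 20.
z = z_single_proportion(successes=18, n=20, p0=0.85)

# Right-tailed critical value at the 5% level (about 1.645).
z_crit = NormalDist().inv_cdf(0.95)
reject_h0 = z > z_crit
print(round(z, 3), reject_h0)
```

Here 𝑧 is about 0.63, which is below 1.645, so under these assumptions the hypothesis of an 85% survival rate is not rejected.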
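Suppose two samples of sizes $n_1$ and $n_2$ are drawn from two different populations, giving
sample proportions $p_1$ and $p_2$. To test the significance of the difference between the two
proportions, we use the test statistic
$Z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$,
where, if the common population proportion is not known, it is estimated by the pooled
proportion $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$, and $Q = 1 - P$.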
Problem: A machine puts out 16 imperfect articles in a sample of 500. After the
machine is overhauled, it puts out 3 imperfect articles in a batch of 100. Has the
machine improved?
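This problem can be worked with the two-proportion statistic, estimating the common proportion by pooling the two samples. The sketch below assumes 𝐻0: the defect proportions are equal, against 𝐻1: the proportion was higher before the overhaul, tested at the 5% level (the level is an assumption, since the problem does not state one); the function name `z_two_proportions` is illustrative.

```python
import math
from statistics import NormalDist

def z_two_proportions(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z = (p1 - p2) / sqrt(P * Q * (1/n1 + 1/n2)) with pooled proportion P."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion P
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Before the overhaul: 16 defectives in 500. After: 3 defectives in 100.
z = z_two_proportions(x1=16, n1=500, x2=3, n2=100)

# Right-tailed critical value at the assumed 5% level (about 1.645).
z_crit = NormalDist().inv_cdf(0.95)
improved = z > z_crit
print(round(z, 3), improved)
```

The statistic is about 0.10, well below 1.645, so on this evidence the difference is not significant and the data do not show that the machine has improved.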