PT Module5

The document discusses hypothesis testing and statistical tests. It introduces key concepts like null and alternative hypotheses, types of errors, test statistics, and critical regions. Specific parametric tests covered include the z-test for single means, difference of means, single proportions, and difference of proportions. Examples of z-tests are provided to demonstrate hypothesis testing procedures.

Uploaded by

Venkat Balaji

Module 5

Hypothesis Testing I
➢ Introduction
➢ Types of errors
➢ Critical region
➢ Procedure of testing hypothesis
➢ Large sample tests
➢ Z test for single proportion
➢ Difference of proportions
➢ Mean and difference of means
Introduction
The method of hypothesis testing uses tests of significance to determine the
likelihood that a statement (often related to the mean or variance of a given
distribution) is true, and at what likelihood we would, as statisticians, accept
the statement as true.

A hypothesis should be specific, clear and precise. It should be stated, as far as
possible, in simple terms so that it is easily understood by all. It
should state the relationship between variables.

While understanding the mathematical concepts that go into the formulation
of these tests is important, knowledge of how to use each test appropriately
(and when to use which test) is equally important.
A statistical hypothesis is a conjecture about a population parameter. This conjecture
may or may not be true.

The null hypothesis, symbolized by 𝐻0, is a statistical hypothesis that states that
there is no difference between a parameter and a specific value or that there is no
difference between two parameters.

The alternative hypothesis, symbolized by 𝐻1, is a statistical hypothesis that states a
specific difference between a parameter and a specific value or states that there is a
difference between two parameters.

In other words, we can say 𝐻1 is complementary to 𝐻0.


Types of Tests
Two-Tailed Test:
A medical researcher is interested in finding out whether a new medication will have
any undesirable side effects. The researcher is particularly concerned with the pulse
rate of the patients who take the medication.
What are the hypotheses to test whether the pulse rate will be different from the
mean pulse rate of 82 beats per minute?
𝐻0 : 𝜇 = 82 and 𝐻1 : 𝜇 ≠ 82
This is a Two-Tailed test.

Right-Tailed Test:
A chemist invents an additive to increase the life of an automobile battery. If the
mean lifetime of the battery is 36 months, then his hypotheses are
𝐻0 : 𝜇 = 36 and 𝐻1 : 𝜇 > 36
Left-Tailed Test:
A contractor wishes to lower heating bills by using a special type of insulation in
houses. If the average of the monthly heating bills is Rs. 78, her hypotheses about
heating costs will be
𝐻0 : 𝜇 = Rs. 78 and 𝐻1 : 𝜇 < Rs. 78

A test statistic is computed after stating the null hypothesis. It is based on the
appropriate probability distribution.

A test statistic uses the data obtained from a sample to make a decision about
whether or not the null hypothesis should be rejected.

The numerical value obtained from a test statistic is called the calculated value.
Errors in Hypothesis Testing:
A Type I error occurs if one rejects the null hypothesis when it is true. This is similar
to a good product being rejected by the consumer, and hence Type I error is also
known as producer's risk.
The level of significance is an important concept in hypothesis testing. It is usually
expressed as a percentage. The level of significance is the maximum probability of rejecting a
null hypothesis when it is true and is denoted by 𝛼. When the null hypothesis is true, the
probability of making a correct decision is therefore 1 − 𝛼. The level of significance may be
taken as 1% or 5% or 10% (i.e., 𝛼 = 0.01 or 0.05 or 0.1). If we fix the level of significance
at 5%, then the probability of making a Type I error is 0.05. This also means that, when the
null hypothesis is true, we are 95% confident of making a correct decision. When no level of
significance is mentioned, it is taken as 𝛼 = 0.05.
A Type II error occurs if one does not reject the null hypothesis when it is false. As this
error is similar to that of accepting a product of inferior quality, it is known as
consumer's risk. The probability of committing a Type II error is denoted by 𝛽.
Types of Errors

                               Study reports               Study reports
                               NO difference               IS a difference
                               (Do not reject 𝐻0)          (Reject 𝐻0)

𝐻0 is true
(Difference does NOT           Correct decision            Type I Error
exist in population)

𝐻1 is true
(Difference DOES               Type II Error               Correct decision
exist in population)
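The two error probabilities can be made concrete with a small calculation. The sketch below is illustrative only: it takes the pulse-rate null hypothesis 𝜇 = 82 from the earlier example, but the true mean (84), the standard deviation (6) and the sample size (36) are hypothetical values chosen so that the standard error equals 1.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution N(0, 1)

# Two-tailed test with cutoff 1.96: probability of rejecting a true H0.
z_crit = 1.96
alpha = 2 * (1 - std_normal.cdf(z_crit))

# Hypothetical setting (not from the text): H0 says mu = 82, but the true
# mean is 84, with sigma = 6 and n = 36, so the standard error is 1.
mu0, mu_true, sigma, n = 82, 84, 6, 36
shift = (mu_true - mu0) / (sigma / sqrt(n))

# Type II error: Z falls inside (-z_crit, z_crit) although H0 is false.
beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

print(f"alpha = {alpha:.4f}")
print(f"beta  = {beta:.4f}")
```

With these assumed numbers 𝛼 stays at about 0.05 by construction, while 𝛽 depends on how far the true mean lies from the hypothesised one.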
Important Tests of Hypothesis
For the purpose of testing a hypothesis, several tests of hypothesis were developed.
They can be classified as
Parametric test
Non-parametric test
Parametric tests are also known as the standard tests of hypothesis, whereas non-parametric
tests are known as distribution-free tests of hypothesis.
The important parametric tests are
Z-test (for Large Samples)
t-test (for Small Samples)
F-test (for Small Samples)
Z-test

If the sample size n is greater than or equal to 30 (𝑛 ≥ 30), the sample is called a Large
Sample.
The z-test is a statistical test for the mean of a population. It can be used for large
samples, or when the population is normally distributed and 𝜎 is known.
The critical values for some standard levels of significance (LOS) are given in the following table:
Table 1:
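The standard critical values can also be computed directly from the standard normal distribution. The sketch below assumes Python's `statistics.NormalDist`; the helper name `critical_value` is hypothetical and is used only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution N(0, 1)

def critical_value(alpha, two_tailed=True):
    """Return z_alpha so that the rejection region has total area alpha."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

for alpha in (0.01, 0.05, 0.10):
    two = critical_value(alpha, two_tailed=True)
    one = critical_value(alpha, two_tailed=False)
    print(f"alpha = {alpha:.2f}: two-tailed |z| = {two:.3f}, one-tailed z = {one:.3f}")
```

For a left-tailed test the same one-tailed value is used with a negative sign.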
Critical Region
A region corresponding to a statistic which amounts to rejection of the null
hypothesis 𝐻0 is known as the critical region. It is also called the region of
rejection. The critical region is the region under the standard normal curve
corresponding to a predetermined level of significance. The remaining region under the
normal curve (the part that is not shaded) is known as the acceptance region.
Procedure for Hypothesis Testing
The main question in hypothesis testing is whether to accept the null hypothesis or to
reject it. The following steps are involved in hypothesis testing.

Step 1: State the Null 𝐻0 and Alternative 𝐻1 Hypotheses.
Step 2: Decide the nature of the test (one-tailed or two-tailed, based on 𝐻1).
Step 3: Obtain the 𝑧𝛼 value, which depends upon the 𝛼 value and the nature of the test.
Step 4: For large samples, when the population's standard deviation is known, the
test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}.$$
The corresponding distribution is normal.
Step 5: Comparison and Conclusion.
- If |𝑧| < 𝑧𝛼, 𝐻0 is accepted (𝐻1 is rejected); that is, there is no significant
difference at 𝛼 % LOS.
- If |𝑧| > 𝑧𝛼, 𝐻0 is rejected (𝐻1 is accepted); that is, there is a significant difference
at 𝛼 % LOS.
For a one-tailed test, 𝑧 is compared with 𝑧𝛼 in the direction stated by 𝐻1.
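The five steps can be sketched as a single function. This is a minimal illustration, assuming 𝜎 is known; the function name `z_test_single_mean`, its parameters and the sample figures in the usage lines are hypothetical and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_test_single_mean(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, z_alpha, reject) for H0: mu = mu0; tail is 'two', 'right' or 'left'."""
    # Step 4: test statistic for a large sample with known sigma.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    # Steps 2, 3 and 5: pick the critical value for the chosen tail and compare.
    if tail == "two":
        z_alpha = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > z_alpha
    elif tail == "right":
        z_alpha = std_normal.inv_cdf(1 - alpha)
        reject = z > z_alpha
    elif tail == "left":
        z_alpha = std_normal.inv_cdf(1 - alpha)
        reject = z < -z_alpha
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, z_alpha, reject

# Hypothetical data: sample mean 52, H0: mu = 50, sigma = 8, n = 64.
z, z_alpha, reject = z_test_single_mean(52, 50, 8, 64)
print(f"z = {z:.3f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```

Returning the statistic together with the critical value keeps the comparison in Step 5 visible to the caller instead of hiding it behind a single yes/no answer.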
Test of Significance: Large Samples (Z-test)

• Test of significance for single mean
• Test of significance for difference of means of two large samples
• Test of significance for a single proportion
• Test of significance for difference of proportions
Test of significance for single mean
Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$, drawn from a large population with
mean $\mu$ and variance $\sigma^2$.
Let $\bar{x}$ denote the mean of the sample and $s^2$ denote the variance of the sample.
We know that $\bar{x} \sim N(\mu, \sigma^2/n)$. The standard normal variate corresponding to $\bar{x}$ is
$$Z = \frac{\bar{x} - \mu}{\mathrm{S.E.}(\bar{x})}, \quad \text{where } \mathrm{S.E.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}.$$
We set up the null hypothesis that there is no difference between the sample
mean and the population mean. The test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \quad \text{if } \sigma \text{ is known,}$$
$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}} \quad \text{if } \sigma \text{ is not known.}$$
Here, $s$ is the standard deviation of the sample.
Problem 1: The heights of college students in a city are normally distributed with S.D. 6
cms. A sample of 100 students has mean height 158 cms. Test the hypothesis that the
mean height of college students in the city is 160 cms.

Solution:
We have $\bar{x} = 158$ (mean of the sample),
𝜇 = 160 (mean of the population), 𝜎 = 6, 𝑛 = 100.
Level of significance: 5%
𝐻0 : 𝜇 = 160, i.e., difference is not significant.
𝐻1 : 𝜇 ≠ 160
We apply the two-tailed test.
Test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{158 - 160}{6/\sqrt{100}} = -3.333.$$
∴ |𝑍| = 3.333
Table value of 𝑍 at 5% level of significance = 1.96. Since the calculated value of |𝑍| is
greater than the table value of 𝑍, we reject 𝐻0 at 5% level of significance. That is, the
mean height differs significantly from 160 cms.
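The arithmetic of Problem 1 can be checked with a few lines of Python. This is only a verification sketch; the variable names are illustrative, and 1.96 is the two-tailed table value quoted in the solution.

```python
from math import sqrt

# Data of Problem 1: sample mean, hypothesised mean, population S.D., sample size.
x_bar, mu0, sigma, n = 158, 160, 6, 100

z = (x_bar - mu0) / (sigma / sqrt(n))  # test statistic
z_table = 1.96                         # two-tailed critical value at 5%
reject_h0 = abs(z) > z_table           # two-tailed decision rule

print(f"Z = {z:.3f}, |Z| > {z_table}: {reject_h0}")
```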
Problem 2: A sample of 400 items is taken from a population whose standard deviation
is 10. The mean of the sample is 40. Test whether the sample has come from the
population with mean 38. Also calculate 95% confidence interval for the population
mean.
Solution:
𝐻0 : 𝜇 = 38
𝐻1 : 𝜇 ≠ 38
Level of significance: 5%
We apply the two-tailed test.
Test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{40 - 38}{10/\sqrt{400}} = 4.$$
∴ |𝑍| = 4
Table value of 𝑍 at 5% level of significance = 1.96. Since the calculated value of |𝑍| is
greater than the table value of 𝑍, we reject 𝐻0 at 5% level of significance. That is, the
sample cannot be regarded as drawn from a population with mean 38.
95% confidence interval for the population mean is given by
$$\bar{x} \pm z_{\alpha} \frac{\sigma}{\sqrt{n}} = 40 \pm 1.96 \times \frac{10}{\sqrt{400}} = (39.02,\ 40.98).$$
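The confidence interval of Problem 2 can be recomputed as follows. The sketch uses the problem's own figures with illustrative variable names, and it also checks whether the hypothesised mean 38 falls inside the interval.

```python
from math import sqrt

# Data of Problem 2: sample mean, population S.D., sample size.
x_bar, sigma, n = 40, 10, 400
z_table = 1.96  # two-tailed critical value for 95% confidence

margin = z_table * sigma / sqrt(n)  # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

mu0 = 38  # hypothesised population mean
inside = lower <= mu0 <= upper

print(f"95% CI = ({lower:.2f}, {upper:.2f}); contains {mu0}: {inside}")
```

A hypothesised mean that lies outside the 95% interval corresponds to rejection in the two-tailed test at the 5% level, so the two calculations agree.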
Problem 3: The mean of a certain production process is known to be 50 with a
standard deviation of 2.5. The production manager may welcome any change in the
mean value towards the higher side but would like to safeguard against decreasing
values of mean. He takes a sample of 12 items that gives a mean value of 46.5. What
inference should the manager draw for the production process on the basis of the sample
results? Use 5% level of significance for the purpose.
Solution:
𝐻0 : 𝜇 = 50
𝐻1 : 𝜇 < 50
Level of significance: 𝛼 = 0.05
We apply the left-tailed test.
Test statistic is
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{46.5 - 50}{2.5/\sqrt{12}} = -4.850.$$
∴ |𝑍| = 4.850
Table value of 𝑍 at 5% level of significance for a one-tailed test = 1.645. Since the
calculated value 𝑍 = −4.850 is less than −1.645, we reject 𝐻0 at 5% level of
significance. That is, the manager should conclude that the process mean has decreased.
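The left-tailed decision of Problem 3 can be recomputed in the same way. This is a sketch with illustrative variable names; the rejection rule Z < −1.645 is the one-tailed rule at the 5% level.

```python
from math import sqrt

# Data of Problem 3: sample mean, hypothesised mean, population S.D., sample size.
x_bar, mu0, sigma, n = 46.5, 50, 2.5, 12

z = (x_bar - mu0) / (sigma / sqrt(n))  # test statistic
z_table = 1.645                        # one-tailed critical value at 5%
reject_h0 = z < -z_table               # left-tailed decision rule

print(f"Z = {z:.3f}, Z < -{z_table}: {reject_h0}")
```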
Test of significance for difference of means of two large samples

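Let $\bar{x}_1$ be the mean of an independent random sample of size $n_1$ from a population with
mean $\mu_1$ and variance $\sigma_1^2$. Again, let $\bar{x}_2$ be the mean of an independent random sample
of size $n_2$ from a population with mean $\mu_2$ and variance $\sigma_2^2$. Here, $n_1$ and $n_2$ are large.
Clearly,
$$\bar{x}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right), \quad \bar{x}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right), \quad \text{so} \quad \bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right).$$
Hence, under the null hypothesis $H_0: \mu_1 = \mu_2$, the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}.$$
If $\sigma_1 = \sigma_2 = \sigma$, then the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$$
If $\sigma_1$ and $\sigma_2$ are not known, then the test statistic is
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}},$$
where $s_1$ and $s_2$ are the standard deviations of the two samples.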
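A minimal sketch of the large-sample statistic for the difference of two means is given below. The function name `z_two_means` and the figures in the usage lines are hypothetical; the standard deviations passed in may be the population values when they are known, or the sample values otherwise.

```python
from math import sqrt

def z_two_means(mean1, mean2, sd1, sd2, n1, n2):
    """Z statistic for H0: mu1 = mu2 with two large independent samples."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical data (not from the text): two samples of 100 observations each.
z = z_two_means(mean1=70, mean2=68, sd1=6, sd2=8, n1=100, n2=100)
reject_h0 = abs(z) > 1.96  # two-tailed test at the 5% level

print(f"Z = {z:.3f}, reject H0: {reject_h0}")
```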


Problem: Random samples drawn from two places gave the following data relating to
the heights of children

Test at 5% level that the mean height is the same for the children at two places.
Test of significance of the difference between sample proportion and
population proportion (single proportion)
Let 𝑋 be the number of successes in 𝑛 independent Bernoulli trials in which the probability
of success for each trial is a constant = 𝑃 (say). Then it is known that 𝑋 follows a binomial
distribution with mean 𝐸(𝑋) = 𝑛 𝑃 and variance 𝑉(𝑋) = 𝑛 𝑃 𝑄

When $n$ is large, $X$ follows $N(nP, nPQ)$, i.e. a normal distribution with mean $nP$ and
S.D. $\sqrt{nPQ}$, where $Q = 1 - P$. Hence $\frac{X}{n}$ follows $N\left(\frac{nP}{n}, \frac{nPQ}{n^2}\right)$.
Now $\frac{X}{n}$ is the proportion of successes in the sample consisting of $n$ trials, which is denoted by
$p$. Thus the sample proportion $p$ follows $N\left(P, \frac{PQ}{n}\right)$. Therefore the test statistic is
$$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} \sim N(0, 1).$$
If $|z| \le z_{\alpha}$, the difference between the sample proportion $p$ and the population
proportion $P$ is not significant at α% LOS.

Note: When P is not known, the 95 percent confidence limits for P are given by
$$p \pm 1.96 \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$
Problem: If 20 people were attacked by a disease and only 18 survived, will you reject
the hypothesis that the survival rate, if attacked by this disease, is 85% in favor of the
hypothesis that it is more, at 5% level?

Solution: Number of people who survived = 𝑥 = 18.
Size of the sample = 𝑛 = 20.
𝑝 = proportion of the people who survived = $\frac{x}{n} = \frac{18}{20} = 0.9$
It is given that 𝑃 = 85% = 0.85, so 𝑄 = 1 − 𝑃 = 1 − 0.85 = 0.15.
Null hypothesis: 𝐻0 : 𝑃 = 0.85
Alternative hypothesis: 𝐻1 : 𝑃 > 0.85
Level of significance = 𝛼 = 0.05
Test statistic:
$$z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.9 - 0.85}{\sqrt{\frac{0.85 \times 0.15}{20}}} = 0.626.$$
Table value of 𝑧 = 1.645 (right-tailed test).
The calculated value of 𝑧 is less than the table value of 𝑧 at 5% level of significance, so
the null hypothesis is accepted. That is, the data do not show that the survival rate is
more than 85%.
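The single-proportion statistic for this problem can be recomputed with the sketch below. Variable names are illustrative, and the right-tailed rule z > 1.645 follows from the alternative hypothesis 𝑃 > 0.85.

```python
from math import sqrt

# Data of the problem: survivors, sample size, hypothesised survival rate.
x, n, P = 18, 20, 0.85

p = x / n                       # sample proportion
Q = 1 - P
z = (p - P) / sqrt(P * Q / n)   # test statistic
z_table = 1.645                 # one-tailed critical value at 5%
reject_h0 = z > z_table         # right-tailed decision rule

print(f"p = {p:.2f}, z = {z:.3f}, reject H0: {reject_h0}")
```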
Test of significance for difference of proportions

Suppose two samples of sizes 𝑛1 and 𝑛2 are drawn from two different populations.
To test the significance of the difference between the two proportions, we consider the
following cases.

Case-I When the population proportions 𝑃1 and 𝑃2 are known:


In this case 𝑄1 = 1 − 𝑃1 and 𝑄2 = 1 − 𝑃2. With 𝑝1 and 𝑝2 denoting the sample proportions,
the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}}$$
Case-II When the population proportions 𝑃1 and 𝑃2 are not known but sample
proportions 𝑝1 and 𝑝2 are known:
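$$Z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}, \quad \text{where } q_1 = 1 - p_1 \text{ and } q_2 = 1 - p_2.$$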
Case-III Method of pooling:
In this method, the sample proportions 𝑝1 and 𝑝2 are pooled into a single proportion 𝑃,
by using the formula:
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

The test statistic in this case is

$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } Q = 1 - P.$$
Problem: A machine puts out 16 imperfect articles in a sample of 500. After the
machine is overhauled, it puts out 3 imperfect articles in a batch of 100. Has the
machine improved?
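One way to work this problem is sketched below. Several choices here are assumptions rather than part of the problem statement: the pooling method (Case-III) is used, the level of significance is taken as 5%, and the alternative hypothesis is right-tailed (𝑃1 > 𝑃2, i.e. the proportion of imperfect articles has fallen after the overhaul).

```python
from math import sqrt

# Imperfect articles and sample sizes before and after the overhaul.
x1, n1 = 16, 500
x2, n2 = 3, 100

p1, p2 = x1 / n1, x2 / n2   # sample proportions of imperfect articles
P = (x1 + x2) / (n1 + n2)   # pooled proportion, same as (n1*p1 + n2*p2)/(n1 + n2)
Q = 1 - P

z = (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))  # pooled test statistic
z_table = 1.645             # assumed: right-tailed test at the 5% level
reject_h0 = z > z_table

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, z = {z:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the calculated 𝑧 is far below 1.645, so the data would not support the claim that the machine has improved.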
