ProbStat2019 07 Testing

The document discusses hypothesis testing, beginning with an introduction to how scientists and researchers make hypotheses and test them. It then provides examples of statistical hypotheses, distinguishing between the null and alternative hypotheses. It explains how to set up hypotheses depending on the context, and the difference between one-tailed and two-tailed tests. Finally, it begins to demonstrate the process of hypothesis testing with an example involving testing the average weight of products.

Basic ideas The first example The p-value Two types of errors More tests

Statistics
Hypothesis Testing

Ling-Chieh Kung

Department of Information Management


National Taiwan University

Hypothesis Testing 1 / 75 Ling-Chieh Kung (NTU IM)



Introduction

I How do scientists (physicists, chemists, etc.) do research?


I Observe phenomena.
I Make hypotheses.
I Test the hypotheses through experiments (or other methods).
I Make conclusions about the hypotheses.
I In the business world, business researchers do the same thing with
hypothesis testing.
I One of the most important techniques of inferential statistics.
I A technique for (statistically) proving things.
I Again relies on sampling distributions.


Road map

I Basic ideas of hypothesis testing.


I The first example.
I The p-value.
I Two types of errors.
I More tests.


People ask questions

I In the business (or social science) world, people ask questions:


I Are older workers more loyal to a company?
I Does the newly hired CEO enhance our profitability?
I Is one candidate preferred by more than 50% of voters?
I Do teenagers eat fast food more often than adults?
I Is the quality of our products stable enough?
I How should we answer these questions?
I Statisticians suggest:
I First make a hypothesis.
I Then test it with samples and statistical methods.


Statistical hypotheses

I A statistical hypothesis is a formal way of stating a research
hypothesis.
I Typically with parameters and numbers.
I It contains two parts:
I The null hypothesis (denoted as H0 ).
I The alternative hypothesis (denoted as Ha or H1 ).
I The alternative hypothesis is:
I The thing that we want (need) to prove.
I The conclusion that can be made only if we have strong evidence.
I The null hypothesis corresponds to a default position.


Statistical hypotheses: example 1

I In our factory, we produce packs of candy whose average weight should
be 1 kg.
I One day, a consumer told us that his pack weighs only 900 g.
I We need to know whether this is just a rare event or our production
system is out of control.
I If (we believe) the system is out of control, we need to shut down the
machine and spend two days on inspection and maintenance. This will
cost us at least $100,000.
I So we should not believe that our system is out of control just
because of one complaint. What should we do?


Statistical hypotheses: example 1

I We may state a research hypothesis “Our production system is under
control.”
I Then we ask: Is there strong enough evidence showing that the
hypothesis is wrong, i.e., the system is out of control?
I Initially, we assume our system is under control.
I Then we do a survey for “strong enough evidence”.
I We should shut down machines only if we prove that the system is out of
control.
I Let µ be the average weight (in kg). The statistical hypothesis is

H0 : µ = 1
Ha : µ ≠ 1.


Statistical hypotheses: example 2

I In our society, we adopt the presumption of innocence.


I One is considered innocent until proven guilty.
I So when there is a person who probably stole some money:

H0 : The person is innocent


Ha : The person is guilty.
I It is unacceptable that an innocent person is considered guilty.
I We will say one is guilty only if there is strong evidence.


Statistical hypotheses: example 3

I One is considering whether to join an election as a candidate.


I A hypothesis is “The candidate is preferred by more than 50% of voters.”
I As we need a default position and the percentage that we care about is
50%, we will choose our null hypothesis as

H0 : p = 0.5.

I How about the alternative hypothesis? Should it be

Ha : p > 0.5 or Ha : p < 0.5?


Statistical hypotheses: example 3

I The choice of the alternative hypothesis depends on the related
decisions or actions to make.
I If one will run in the election only if she thinks she will win (i.e.,
p > 0.5), the alternative hypothesis will be

Ha : p > 0.5.

I If one tends to participate in the election and will give up only if
the chance is slim, the alternative hypothesis will be

Ha : p < 0.5.


Remarks

I For setting up a statistical hypothesis:


I Our default position will be put in the null hypothesis.
I The thing we want to prove (i.e., the thing that needs strong evidence)
will be put in the alternative hypothesis.
I For writing the mathematical statement:
I The equal sign (=) will always be put in the null hypothesis.
I The alternative hypothesis contains an unequal sign or strict
inequality: ≠, >, or <.
I The statement of the alternative hypothesis depends on the business
context.
I Some studies have H0 , H1 , H2 , ...


One-tailed tests and two-tailed tests

I If the alternative hypothesis contains an unequal sign (≠), the test is a
two-tailed test.
I If it contains a strict inequality (> or <), the test is a one-tailed test.
I Suppose we want to test the value of the population mean.
I In a two-tailed test, we test whether the population mean significantly
deviates from a value. We do not care whether it is larger or smaller
than that value.
I In a one-tailed test, we test whether the population mean significantly
deviates from a value in a specific direction.


The first example

I Now we will demonstrate the process of hypothesis testing.


I Suppose we test the average weight (in g) of our products.

H0 : µ = 1000
Ha : µ ≠ 1000.
I Once we have strong evidence supporting Ha , we will claim that
µ ≠ 1000.
I Suppose we know the variance of the weights of the products produced:
σ² = 40000 g².


Controlling the error probability

I Certainly the evidence comes from a random sample.


I It is natural that we may be wrong when we claim µ ≠ 1000.
I E.g., it is possible that µ = 1000 but we unluckily get a sample mean
x̄ = 912.
I We want to control the error probability.
I Let α be the maximum probability for us to make this error.
I α is called the significance level.
I So when µ = 1000, we will claim that µ ≠ 1000 with probability at most α.
I Recall confidence intervals!


Rejection rule

I Now let’s test with the significance level α = 0.05.


I Intuitively, if X̄ deviates from 1000 a lot, we should reject the null
hypothesis and believe that µ ≠ 1000.
I If µ = 1000, it is very unlikely to observe such a large deviation.
I So such a large deviation provides strong evidence.
I So we start by sampling and calculating the sample mean.
I Suppose the sample size n = 100.
I Suppose the sample mean x̄ = 963.
I We want to construct a rejection rule: If |X̄ − 1000| > d, we reject
H0 . We need to calculate d.


Rejection rule

I We want a distance d such that
if H0 is true, the probability of rejecting H0 is 5%.
I If H0 is true, µ = 1000. We reject H0 if |X̄ − 1000| > d.
I Therefore, we need

Pr(|X̄ − 1000| > d | µ = 1000) = 0.05.

I People typically hide the condition µ = 1000.
I The statistic sample mean X̄ has its sampling distribution.
I Due to the central limit theorem, (X̄ − µ)/(σ/√n) ∼ ND(0, 1). The
standard error is σ/√n = 200/√100 = 20.


Rejection rule: the critical value


I 0.95 = Pr(|X̄ − 1000| < d) = Pr(1000 − d < X̄ < 1000 + d).
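
The distance d can be computed directly from the standard normal quantile. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the slides.

```python
from statistics import NormalDist

mu0 = 1000.0          # hypothesized mean under H0 (g)
sigma = 40000 ** 0.5  # known population standard deviation (g)
n = 100               # sample size
alpha = 0.05          # significance level

se = sigma / n ** 0.5                    # standard error: 200 / 10 = 20
z = NormalDist().inv_cdf(1 - alpha / 2)  # upper 2.5% quantile, about 1.96
d = z * se                               # half-width of the non-rejection interval
lower, upper = mu0 - d, mu0 + d          # the two critical values

print(round(d, 1), round(lower, 1), round(upper, 1))
```

Under these assumptions d is about 39.2, which gives the critical values 960.8 and 1039.2 used on the following slides.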


Rejection rule: the critical value


I The rejection region is R = (−∞, 960.8) ∪ (1039.2, ∞).
I If X̄ falls in the rejection region, we reject H0 .


Rejection rule: the critical value


I We cannot reject H0 because x̄ = 963 ∉ R.
I The deviation is not large enough; the evidence is not strong enough.


Rejection rule: the critical value

I In this example, the two values 960.8 and 1039.2 are the critical values
for rejection.
I If the sample mean is more extreme than one of the critical values, we
reject H0 .
I Otherwise, we do not reject H0 .
I x̄ = 963 is not strong enough to support Ha : µ ≠ 1000.
I Concluding statement:
I Because the sample mean does not lie in the rejection region, we cannot
reject H0 . With a 5% significance level, there is no strong evidence
showing that the average weight is not 1000 g. Based on this result, we
should not shut down machines and do an inspection.


Summary
I We want to know whether H0 is false, i.e., µ ≠ 1000.
I We control the probability of making a wrong conclusion.
I If the machine is actually good, we do not want to reach a conclusion
that requires an inspection and maintenance.
I If H0 (µ = 1000) is true, we do not want to reject H0 .
I We limit the probability at the significance level α = 5%.
I We conclude that H0 is false only if the sample mean falls in the
rejection region.
I If the population is normal or if the sample size is large (n ≥ 30), we
have
Z = (X̄ − µ)/(σ/√n) ∼ ND(0, 1).
I The calculation of the rejection region (i.e., the critical values) is based
on the z distribution.
I We conducted a z test.


Remark: Not rejecting vs. accepting

I We should be careful in writing our conclusions:


I Right: Because the sample mean does not lie in the rejection region, we
cannot reject H0 . With a 5% significance level, there is no strong
evidence showing that the average weight is not 1000 g.
I Wrong: Because the sample mean does not lie in the rejection region,
we accept H0 . With a 5% significance level, there is strong evidence
showing that the average weight is 1000 g.
I Being unable to prove that something is false does not mean it is true!


Remark: The probability that is controlled?


I What we have controlled is:
I If the null hypothesis is true, the probability of rejecting it is no greater
than the significance level (α).
I We did not ensure that:
I If we reject the null hypothesis, the probability that the null hypothesis
is true is no greater than the significance level (α).
I The key is:
I Only if we know (actually, assume) that the null hypothesis is true may
we calculate the probability of rejecting it.
I The probability cannot be controlled in the opposite way.
I The significance level α is a conditional probability:
I Pr(rejecting H0 |H0 is true) = α.
I Pr(H0 is true|rejecting H0 ) cannot be calculated.
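
A small simulation may help show what α does control. The sketch below assumes H0 is true (µ = 1000, standard error 20), draws sample means from the corresponding sampling distribution, and counts how often the two-tailed rule with critical values 960.8 and 1039.2 rejects. The seed and the number of trials are arbitrary choices made for illustration.

```python
import random

random.seed(0)                 # fixed seed so the run is repeatable
mu0, se = 1000.0, 20.0         # H0 is assumed true; standard error of the sample mean
lower, upper = 960.8, 1039.2   # critical values of the two-tailed test
trials = 100_000

# Each draw is one sample mean generated under H0.
rejections = sum(
    1 for _ in range(trials)
    if not (lower <= random.gauss(mu0, se) <= upper)
)
type_one_rate = rejections / trials
print(type_one_rate)
```

The observed rejection rate should be close to 0.05. It estimates Pr(rejecting H0 | H0 is true) and says nothing about Pr(H0 is true | rejecting H0).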


The first example (part 2)

I Suppose we modify the hypothesis into a directional one:

H0 : µ = 1000.
Ha : µ < 1000.

σ 2 = 40000, n = 100, α = 0.05.


I This is a one-tailed test.
I Once we have strong evidence supporting Ha , we will claim that
µ < 1000.
I We need to find a distance d such that

Pr(1000 − X̄ > d | µ = 1000) = 0.05.


Rejection rule: the critical value


I We have 0.05 = Pr(1000 − X̄ > d).
I The critical distance is d = 32.9, so the critical value is
1000 − 32.9 = 967.1. The rejection region is (−∞, 967.1).
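
For the one-tailed case, the whole 5% is placed in the left tail. A minimal sketch of the calculation, assuming the same standard error of 20 (variable names are illustrative):

```python
from statistics import NormalDist

mu0, se, alpha = 1000.0, 20.0, 0.05

z = NormalDist().inv_cdf(alpha)  # lower 5% quantile, about -1.645
critical = mu0 + z * se          # left-tail critical value, about 967.1
d = mu0 - critical               # distance from the hypothesized mean

x_bar = 963                      # observed sample mean
reject = x_bar < critical        # True: 963 lies in the rejection region

print(round(d, 1), round(critical, 1), reject)
```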


Rejection rule: the critical value


I Because the observed x̄ = 963 ∈ (−∞, 967.1), we reject H0 .
I The deviation is large enough. The evidence is strong enough.


Rejection rule: the critical value

I In this example, 967.1 is the critical value for rejection.
I If the sample mean is more extreme than (in this case, below) the critical
value, we reject H0 .
I Otherwise, we do not reject H0 .
I There is strong evidence supporting Ha : µ < 1000.
I Concluding statement:
I Because the sample mean lies in the rejection region, we reject H0 .
With a 5% significance level, there is strong evidence showing that the
average weight is less than 1000 g.


The other form of the null hypothesis

I Some statisticians write the one-tailed hypothesis as

H0 : µ ≥ 1000
Ha : µ < 1000.
I When H0 is true, µ is not fixed to a single value.
I With the rejection region (−∞, 967.1), what is the error probability
Pr(rejecting H0 | H0 is true)?
I If µ = 1000, Pr(rejecting H0 | H0 is true) = 0.05.
I If µ > 1000,

Pr(rejecting H0 | H0 is true) = Pr(X̄ < 967.1 | H0 is true) < 0.05.


The other form of the null hypothesis


I E.g., suppose µ = 1010.

I In general, we control the probability of rejecting H0 when it is true to
be at most α.
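
To see why the bound is "at most α", one can evaluate the rejection probability for several values of µ allowed by H0 : µ ≥ 1000. This is a sketch assuming the rejection region (−∞, 967.1) and a standard error of 20; the function name is hypothetical.

```python
from statistics import NormalDist

se = 20.0          # standard error of the sample mean
critical = 967.1   # reject H0 when the sample mean is below this value

def rejection_probability(mu):
    """Pr(sample mean < critical) when the true mean is mu."""
    return NormalDist(mu, se).cdf(critical)

for mu in (1000, 1010, 1020):
    print(mu, round(rejection_probability(mu), 4))
```

The probability is about 0.05 at µ = 1000 and shrinks as µ moves further above 1000, so the boundary value µ = 1000 is the worst case.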


One-tailed tests vs. two-tailed tests

I When should we use a two-tailed test?


I We should use a two-tailed test to be conservative.
I E.g., we suspect that the parameter has changed, but we are unsure
whether it becomes larger or smaller.
I If we know or believe that the change is possible only in one
direction, we may use a one-tailed test.
I If we do not know it, using a one-tailed test is dangerous.
I Consider the previous example with Ha : µ < 1000.
I If x̄ = 2000, all we can say is “there is no strong evidence that µ < 1000.”
I We are unable to conclude that µ ≠ 1000.


One-tailed tests vs. two-tailed tests


I Having more information (i.e., knowing the direction of change) makes
rejection “easier”.
I It is easier to find strong enough evidence.


The p-value

I The p-value is an important, meaningful, and widely adopted tool for
hypothesis testing.

Definition 1
In a hypothesis test, for an observed value of the statistic, the
p-value is the probability of observing a value that is at least as
extreme as the observed value under the assumption that the null
hypothesis is true.
I Based on an observed value of the statistic.
I Is the tail probability of the observed value.
I Assuming that the null hypothesis is true.


The p-value

I Mathematically:
I Suppose we test a population mean µ with a one-tailed test

H0 : µ = 1000
Ha : µ < 1000.
I Given an observed x̄, the p-value is defined as

Pr(X̄ < x̄ | µ = 1000).


I In the previous example:
I σ 2 = 40000, n = 100, α = 0.05, x̄ = 963.
I How to calculate the p-value of x̄?


The p-value
I If H0 is true, i.e., µ = 1000, we have
Pr(X̄ ≤ 963) = Pr(Z ≤ −1.85) = 0.032.
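
The same tail probability can be checked numerically. A minimal sketch, assuming the sampling distribution of the sample mean under H0 is normal with mean 1000 and standard error 20:

```python
from statistics import NormalDist

mu0, se = 1000.0, 20.0   # mean under H0 and standard error of the sample mean
x_bar = 963              # observed sample mean

z = (x_bar - mu0) / se                   # standardized value: -1.85
p_value = NormalDist(mu0, se).cdf(x_bar) # left-tail probability under H0

print(z, round(p_value, 3))
```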


Quiz: Which factors affect the p-value?

I Which of the following factors affect the p-value

Pr(X̄ < x̄)?


I The observed value of the statistic.
I The population mean assumed in the null hypothesis.
I The population variance.
I The sample size.
I The significance level α.
I Whether the test is one-tailed or two-tailed.


How to use the p-value?

I The p-value can be used for constructing a rejection rule.


I For a one-tailed test:
I If the p-value is smaller than α, we reject H0 .
I If the p-value is greater than α, we do not reject H0 .
I Consider the one-tailed test
H0 : µ = 1000
Ha : µ < 1000.
I Suppose we still adopt α = 0.05.
I Because the p-value 0.032 < 0.05, we reject H0 .


p-values vs. critical values


I Using the p-value is equivalent to using the critical values.
I The rejection-or-not decision we make will be the same based on the two
methods.
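
The equivalence can be checked mechanically for the one-tailed example. The sketch below compares the two decision rules over a grid of hypothetical sample means; the function names are illustrative.

```python
from statistics import NormalDist

null = NormalDist(1000.0, 20.0)  # sampling distribution of the sample mean under H0
alpha = 0.05
critical = null.inv_cdf(alpha)   # left-tail critical value, about 967.1

def reject_by_critical_value(x_bar):
    # Reject when the sample mean is below the critical value.
    return x_bar < critical

def reject_by_p_value(x_bar):
    # Reject when the left-tail probability is below alpha.
    return null.cdf(x_bar) < alpha

# Compare the two rules for every integer sample mean from 900 to 1100.
agree = all(
    reject_by_critical_value(x) == reject_by_p_value(x)
    for x in range(900, 1101)
)
print(agree)
```

Both rules reject exactly when the observed sample mean is below the critical value, because the left-tail probability is an increasing function of x̄.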


The benefit of using the p-value

I In calculating the p-value, we do not need α.


I After the p-value is calculated, we compare it with α.
I In many studies, the researchers do not determine the significance level
α before a test is conducted.
I They calculate the p-value and then mark how significant the result
is with stars.
p-value        < 0.001              (0.001, 0.01)            (0.01, 0.05)           > 0.05
Significant?   Highly significant   Moderately significant   Slightly significant   Insignificant
Mark           ***                  **                       *                      (Empty)
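
The marking convention can be written as a small helper. This is a sketch with a hypothetical function name; how a p-value exactly on a boundary (e.g., 0.01) is treated is an assumption, since the table uses open intervals.

```python
def significance_mark(p_value):
    """Return the star mark for a p-value, following the table's thresholds."""
    if p_value < 0.001:
        return "***"   # highly significant
    if p_value < 0.01:
        return "**"    # moderately significant
    if p_value < 0.05:
        return "*"     # slightly significant
    return ""          # insignificant: no mark

# Illustrative p-values only.
for p in (0.0002, 0.008, 0.04, 0.2):
    print(p, significance_mark(p))
```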


The benefit of using the p-value

I As an example, suppose one is testing whether people sleep at least
eight hours per day on average.
I Age groups: [10, 15), [15, 20), [20, 25), etc.
I For group i, a one-tailed test is conducted. Ha : µi > 8.
I The result may be presented in a table:

Group Age group p-value


1 [10,15) 0.0002***
2 [15,20) 0.2
3 [20,25) 0.04*
4 [25,30) 0.03*
5 [30,35) 0.008**


Interpreting the p-value

I A smaller p-value does NOT mean a larger deviation!


I We cannot conclude that µ5 > µ4 , µ1 > µ3 , etc.
I A smaller p-value means a higher probability to reject the null
hypothesis.
I If α = 0.001, we will conclude that only µ1 is statistically significantly
larger than 8.
I We do not believe that µ1 is larger than 8 by a huge amount!
I It is more probable (i.e., with a larger range of α) for us to conclude
that µ1 “significantly” deviates from 8.


The p-value for two-tailed tests


I How to construct the rejection rule for a two-tailed test?
I If the p-value is smaller than α/2, we reject H0 .
I If the p-value is greater than α/2, we do not reject H0 .
I Consider the two-tailed test
H0 : µ = 1000.
Ha : µ ≠ 1000.
I Suppose we still adopt α = 0.05.
I Because the p-value 0.032 > α/2 = 0.025, we do not reject H0 .
I Some functions return the p-value for a one-tailed test but twice
the one-tailed p-value for a two-tailed test.
I With these functions, we will always compare the returned value with α
directly.
I Read the instructions before using those functions!
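
The two conventions lead to the same decision, as the following sketch illustrates for the running example (standard error 20, x̄ = 963). The variable names are illustrative.

```python
from statistics import NormalDist

null = NormalDist(1000.0, 20.0)  # sampling distribution of the sample mean under H0
alpha = 0.05
x_bar = 963

left = null.cdf(x_bar)
one_tail = min(left, 1 - left)   # tail probability on the side of the observation
two_sided = 2 * one_tail         # the doubled value that some functions return

decision_half_alpha = one_tail < alpha / 2   # compare the tail probability with alpha/2
decision_doubled = two_sided < alpha         # compare the doubled value with alpha

print(round(one_tail, 3), round(two_sided, 3), decision_half_alpha, decision_doubled)
```

Here the tail probability is about 0.032 and the doubled value about 0.064, so H0 is not rejected under either convention.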


Type I error

I We have discussed a lot about controlling a probability:


I If the null hypothesis is true, we want to avoid rejecting it.
I Typically we set Pr(rejecting H0 |H0 is true) = α.
I In general, it is Pr(rejecting H0 |H0 is true) ≤ α.
I What we have controlled is not Pr(H0 is true|rejecting H0 ).
I If we reject a true null hypothesis, we make a Type I error.
I What if the null hypothesis is false?


Type II error

I What if the null hypothesis is false? How to avoid not rejecting a false
null hypothesis?
I Not rejecting a false null hypothesis is a Type II error.
I The probability of making a type II error is denoted as β:

Pr(rejecting H0 |H0 is true) = α.


Pr(not rejecting H0 |H0 is false) = β.
I We controlled the probability of making a Type I error. We know it is
at most α.
I Do we know the probability of making a Type II error?


Type II error

I Recall our one-tailed test with α = 0.05 again:

H0 : µ = 1000.
Ha : µ < 1000.
I If H0 is false and µ is actually 950, we know how to calculate β:
I The rejection rule (which is constructed by assuming H0 is true) will
be the same: Reject H0 if X̄ < 967.1.
I The probability of not rejecting H0 is

Pr(X̄ > 967.1 | µ = 950) = Pr(Z > 0.855) = 0.196 = β.


α and β


Type II error

I For every different value of µ, we have a different β:


µ 950 960 970 980 990
β 0.196 0.361 0.558 0.74 0.874
I As the true value of µ is never known, we never know β.
I To lower β, one way is to increase α.
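
The β values in the table can be reproduced by fixing the rejection rule and varying the true mean. A minimal sketch, assuming the one-tailed test with α = 0.05 and standard error 20 (the function name is hypothetical):

```python
from statistics import NormalDist

se = 20.0
# The critical value is built under H0 (mu = 1000) and does not change with the true mean.
critical = NormalDist(1000.0, se).inv_cdf(0.05)

def beta(mu_true):
    """Pr(not rejecting H0) when the true mean is mu_true."""
    return 1 - NormalDist(mu_true, se).cdf(critical)

for mu in (950, 960, 970, 980, 990):
    print(mu, round(beta(mu), 3))
```

The closer the true mean is to 1000, the harder it is to detect the difference, so β grows toward 1 − α.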


Increasing α to decrease β


Type I errors vs. Type II errors

I If we control α, we cannot control β.


I As α is controlled, β (as a function of the parameter) determines how
good a test is.
I 1 − β is called the power of a test. Smaller β means a better test.
I Summary:
                    State of nature
Action              H0 is true                    H0 is false
Do not reject H0    Correct decision              Type II error
                    (confidence level: 1 − α)     (β)
Reject H0           Type I error                  Correct decision
                    (significance level: α)       (power: 1 − β)


Why control α only?

I We cannot control α and β at the same time.


I Why do we control α only?
I Recall what we did in setting up a hypothesis:
I We put the claim that requires strong evidence in Ha .
I We will conclude that Ha is true only with strong evidence.
I We did so because it is more important to:
I Avoid rejecting H0 when it is true.
I Avoid a type I error.
I That is, a type I error is more costly than a type II error.
I This is why controlling α is our first priority.
I To reduce both α and β, increase the sample size.
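
The effect of the sample size can be illustrated with the running example. The sketch below assumes σ = 200, a true mean of 950, and the left-tailed test; the function name and the particular values of n and α are chosen only for illustration.

```python
from statistics import NormalDist

def type_two_error(n, alpha, mu0=1000.0, mu_true=950.0, sigma=200.0):
    """Beta of the left-tailed z test for a given sample size and alpha."""
    se = sigma / n ** 0.5                           # a larger n gives a smaller standard error
    critical = NormalDist(mu0, se).inv_cdf(alpha)   # rejection rule built under H0
    return 1 - NormalDist(mu_true, se).cdf(critical)

beta_small_sample = type_two_error(n=100, alpha=0.05)
beta_large_sample = type_two_error(n=400, alpha=0.01)  # stricter alpha, larger sample

print(round(beta_small_sample, 3), round(beta_large_sample, 4))
```

With four times as many observations, β falls even though α was lowered from 0.05 to 0.01.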


When the variance is unknown


I When the population variance σ² is unknown, the quantity
(X̄ − µ)/(σ/√n) is unknown.
I When we use the sample variance S² as a substitute, we have

(X̄ − µ)/(S/√n) ∼ t(n − 1),

which means the quantity (X̄ − µ)/(S/√n) follows the t distribution with
n − 1 degrees of freedom.
I We will use the t test to test the population mean if the population is
normal.
I If the sample size is large, we may still use the z distribution with s
substituting σ.


Example 2

I We are interested in whether the students in NTU prefer the
restaurants in NTU.
I One benchmark is NTUST. In a census conducted in NTUST, students
are asked to rate their restaurants on a five-point scale.
I The average score is 4.6.
I We asked 60 NTU students to rate the restaurants in NTU. The
average score is 4.27 and the standard deviation is 1.22.
I Do NTU students rate their restaurants differently from NTUST
students?
I Suppose the scores of all NTU students are normal.


Example 2: hypothesis
I The hypothesis is

H0 : µ = 4.6
Ha : µ ≠ 4.6.
I µ is the average score (on a five-point scale) of NTU restaurants rated
by all NTU students.
I Because the population variance is unknown and the population is
normal, we may use the t test.
I Let Tn ∼ t(n). We calculate the p-value:

Pr(X̄ < 4.27) = Pr(T59 < (4.27 − 4.6)/(1.22/√60)) = Pr(T59 < −2.095) = 0.0202.
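
Python's standard library has no t distribution, so the sketch below approximates the t CDF by integrating the density with Simpson's rule. The helper names `t_pdf` and `t_cdf` are hypothetical; in practice a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Approximate CDF: 0.5 plus or minus the area on [0, |x|] (Simpson's rule, even steps)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, x_bar, mu0, s = 60, 4.27, 4.6, 1.22
t_stat = (x_bar - mu0) / (s / math.sqrt(n))   # about -2.095
p_one_tail = t_cdf(t_stat, n - 1)             # left-tail probability with 59 degrees of freedom

print(round(t_stat, 3), round(p_one_tail, 4))
```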


Example 2: test and calculations

I The rejection decision for various α is:


α            0.01             0.05             0.1
Comparison   0.0202 > 0.005   0.0202 < 0.025   0.0202 < 0.05
Decision     Do not reject    Reject           Reject
I Why α/2?


Example 2: decision and implications

I Suppose the significance level is α = 0.01.


I The concluding statement:
I For this two-tailed test, as the p-value is larger than α/2, we do not reject
H0 .
I With a 1% significance level, there is no strong evidence showing that
NTU students rate their restaurants differently from NTUST students.
I NTU does not need to change its restaurants.
I The choice of α affects the decision and implications!


Example 2 with the z test

I We may also use the z test because the sample size is large.
I The p-value in the z test is
 
Pr(X̄ < 4.27) = Pr(Z < (4.27 − 4.6)/(1.22/√60)) = Pr(Z < −2.095) = 0.01808.

I The p-value becomes smaller in the z test than in the t test.


I It is easier to reject H0 by using the z test.
I It is assumed that S is close enough to σ when n is large.
I If one wants to be conservative, the z test should be adopted only if n is
much larger than 30.
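
The z-test version of the calculation needs only the standard normal CDF. A minimal sketch with the numbers of Example 2 (variable names are illustrative):

```python
import math
from statistics import NormalDist

n, x_bar, mu0, s = 60, 4.27, 4.6, 1.22

z_stat = (x_bar - mu0) / (s / math.sqrt(n))  # same statistic as in the t test, about -2.095
p_z = NormalDist().cdf(z_stat)               # left-tail probability under the z distribution

print(round(z_stat, 3), round(p_z, 4))
```

The statistic is identical to the one used in the t test; only the reference distribution changes, and the lighter tails of the normal distribution give the smaller p-value.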


The hypotheses

I The population proportion is denoted as p.


I A two-tailed test for the population proportion is

H0 : p = p0
Ha : p ≠ p0 ,

where p0 is the hypothesized proportion.


I In a one-tailed test, the alternative hypothesis may be either

Ha : p > p0

or
Ha : p < p0 .


Sample proportion
I In testing the population proportion p, we rely on the sample
proportion P̂ and its distribution.
I Reject H0 if the p-value Pr(P̂ < p̂) < α (for a left-tailed test).
I P̂ is a statistic and p̂ is an observed value of it.
I What is the sampling distribution of P̂?
I P̂ is a random variable:
I E[P̂] = p and Var(P̂) = p(1 − p)/n.
I When the sample size is large (n ≥ 30), P̂ follows the normal
distribution approximately.
I In practice, it is safer to assume normality if np > 5 and n(1 − p) > 5. As
p is unknown, we check whether np̂ and n(1 − p̂) are greater than five.
I In testing the population proportion, we use the z test.


Example 3

I In a factory, it seems to us that the defective rate of our product is too
high. Ideally it should be below 1% but some workers believe that it is
above 1%.
I If the defective rate is above 1%, we should fix the machine.
Otherwise, we do not do anything.
I Let p be the defective rate. The hypothesis is

H0 : p = 0.01
Ha : p > 0.01.
I When to adopt Ha : p < 0.01?


Example 3

I In several random production runs, we found that out of 1000 produced items, 14 are defective.
I The observed sample proportion is $\hat{p} = 0.014$.
I Suppose the significance level is set to α = 0.05. What is our conclusion?
I The sample proportion $\hat{P}$ is approximately normal. Moreover:
I Its expectation is 0.01.
I Its standard error is $\sqrt{\frac{(0.01)(0.99)}{1000}} = 0.00315$.
I The p-value is

\[ \Pr(\hat{P} > \hat{p}) = \Pr\left(Z > \frac{0.014 - 0.01}{0.00315}\right) = \Pr(Z > 1.271) = 0.1018. \]
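
The calculation for Example 3 can be sketched as follows. The function name and its return layout are illustrative choices; the numbers are those of the example.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_proportion_test(defectives, n, p0, alpha):
    """z test of H0: p = p0 against Ha: p > p0."""
    p_hat = defectives / n
    se = sqrt(p0 * (1 - p0) / n)       # standard error uses p0, not p_hat
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # right-tail probability
    return p_hat, se, z, p_value, p_value < alpha

p_hat, se, z, p_value, reject = right_tailed_proportion_test(
    defectives=14, n=1000, p0=0.01, alpha=0.05
)
print(f"p_hat = {p_hat}, se = {se:.5f}, z = {z:.3f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```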


Example 3: the p-value


I Because the p-value is larger than α, we do not reject the null
hypothesis.


Example 3: decision and implications

I The concluding statement:


I Because the p-value is larger than α, we do not reject H0 .
I With a 5% significance level, there is no strong evidence showing that the
defective rate is higher than 1%.
I We will not try to fix the machine.


Testing the population proportion

I Wait!
I The sample proportion $\hat{P}$ is normal with:
I Expectation 0.01.
I Standard error $\sqrt{\frac{(0.01)(0.99)}{1000}} = 0.00315$.
I One thing is strange...
I The population proportion is p, so we should have $E[\hat{P}] = p$ and $\mathrm{Var}(\hat{P}) = \frac{p(1-p)}{n}$.
I Why do we use p0 = 0.01 as a substitute for p?
I Why not $\hat{p} = 0.014$?


Using the hypothesized value

I The population proportion is p, so $E[\hat{P}] = p$ and $\mathrm{Var}(\hat{P}) = \frac{p(1-p)}{n}$.
I As p is unknown, we need a substitute: p0 or $\hat{p}$?
I In hypothesis testing, we always assume that H0 is true.
I α = Pr(rejecting H0 | H0 is true).
I If H0 is true, then p = p0, and in general p ≠ $\hat{p}$.
I Summary:
I For estimating p, use $\hat{p}$ as a substitute.
I For testing p, use p0 as a substitute.
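
To see how much the choice of substitute matters, the sketch below computes the standard error both ways with the numbers from Example 3. The variable names are illustrative.

```python
from math import sqrt

n, p0, p_hat = 1000, 0.01, 0.014

# Testing: H0 is assumed true, so the hypothesized p0 replaces p.
se_testing = sqrt(p0 * (1 - p0) / n)

# Estimation: nothing is hypothesized, so the observed p_hat replaces p.
se_estimation = sqrt(p_hat * (1 - p_hat) / n)

print(f"standard error with p0:    {se_testing:.6f}")
print(f"standard error with p_hat: {se_estimation:.6f}")
```

The two values differ noticeably here because 0.014 is 40% larger than 0.01, so mixing them up would change the z statistic.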


Testing the population variance


I In many cases, we need to test the population variance.
I The demand for a product seems to remain identical on average for many years. But is the variability also identical?
I The average weight of a product is under control. But is the variance small enough?
I Suppose the daily demand $D_t$ of a product satisfies $D_t = \mu + \epsilon_t$, where µ is an estimate and $\epsilon_t$ is a random fluctuation. Is $\mathrm{Var}(\epsilon_t)$ small?


The hypotheses
I The population variance is denoted as $\sigma^2$.
I A two-tailed test for the population variance is

$H_0: \sigma^2 = \sigma_0^2$
$H_a: \sigma^2 \neq \sigma_0^2$,

where $\sigma_0^2$ is the hypothesized variance.

I In a one-tailed test, the alternative hypothesis may be either

$H_a: \sigma^2 > \sigma_0^2$

or

$H_a: \sigma^2 < \sigma_0^2$.


Sample variance

I To test the population variance $\sigma^2$, we rely on the sample variance $S^2$ and the sampling distribution

\[ \chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2} \sim \mathrm{Chi}(n-1), \]

where n is the sample size.
I The population must be normal!
I The test for the population variance is thus called a chi-square test.
I To use the chi-square distribution, let's recall the definition of the chi-square critical value.


Critical chi-square value



I The critical chi-square value $\chi^2_{w,n-1}$ satisfies $\Pr(\chi^2_{n-1} > \chi^2_{w,n-1}) = w$, where w is the right-tail probability.
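
Critical values are normally read from a table or a statistics library. As a rough sketch using only the Python standard library, the right-tail probability can be computed from the series for the regularized incomplete gamma function and then inverted by bisection. Both function names are illustrative, and the series is only meant for moderate arguments like those in these slides.

```python
from math import exp, lgamma, log

def chi2_right_tail(x, df):
    """Pr(chi-square with df degrees of freedom > x), via a power series."""
    if x <= 0:
        return 1.0
    a, h = df / 2, x / 2
    term = exp(a * log(h) - h - lgamma(a + 1))  # first term of the series
    total = term
    k = 1
    while term > 1e-17 * total:                 # stop once terms are negligible
        term *= h / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total)                # total is the left-tail CDF

def chi2_critical_value(w, df):
    """Find c with Pr(chi-square_df > c) = w by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_right_tail(hi, df) > w:          # widen until the root is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > w:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c_05 = chi2_critical_value(0.05, 14)
c_025 = chi2_critical_value(0.025, 14)
print(f"w = 0.05:  {c_05:.3f}")
print(f"w = 0.025: {c_025:.3f}")
```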


Rejection rule
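I Consider a right-tailed test ($H_a: \sigma^2 > \sigma_0^2$). With the p-value method, we will reject H0 if

\[ \Pr\left(\chi^2_{n-1} > \frac{(n-1)s^2}{\sigma_0^2}\right) < \alpha. \]

I $\frac{(n-1)S^2}{\sigma^2} \sim \mathrm{Chi}(n-1)$, and $\frac{(n-1)s^2}{\sigma_0^2}$ is the observed chi-square value.
I The observed sample variance $s^2$ is combined with the hypothesized population variance $\sigma_0^2$. This is because we assume H0 is true.
I For a two-tailed test, we reject H0 if either

\[ \Pr\left(\chi^2_{n-1} < \frac{(n-1)s^2}{\sigma_0^2}\right) < \frac{\alpha}{2} \quad \text{or} \quad \Pr\left(\chi^2_{n-1} > \frac{(n-1)s^2}{\sigma_0^2}\right) < \frac{\alpha}{2}. \]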


Example 4

I Suppose we are testing

$H_0: \sigma^2 = 10$
$H_a: \sigma^2 \neq 10$.

The sample size is 15 and the sample variance is 18. Let α = 0.05. The population is normal.
I The observed chi-square value is

\[ \frac{(15-1) \times 18}{10} = 25.2. \]


Example 4: the p-value


I The p-value is $\Pr(\chi^2_{14} > 25.2) = 0.0326$.
I As the p-value is greater than $\alpha/2 = 0.025$, we do not reject H0.


Remarks

I Even though $s^2 = 18$ is almost twice as large as $\sigma_0^2 = 10$, we cannot reject H0: the evidence is not strong enough.
I If we allow a Type I error with probability of only 5%, we are not confident enough to claim that $\sigma^2 \neq 10$.
I If α = 0.1, we will reject H0.
I For a right-tailed test ($H_a: \sigma^2 > 10$), we will reject H0.
I To use the chi-square test for the population variance, the population must be normal.
I A large sample size does not help!
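
Example 4 and the remarks above can be checked numerically. The sketch below avoids external libraries by using a closed form for the chi-square right tail that holds only when the degrees of freedom are even (14 here); the function names and the `tail` argument are illustrative choices, not part of the slides.

```python
from math import exp, factorial

def chi2_right_tail_even_df(x, df):
    """Pr(chi-square_df > x); this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))

def variance_test(n, s2, sigma0_sq, alpha, tail="two"):
    """Chi-square test of H0: variance = sigma0_sq (normal population)."""
    stat = (n - 1) * s2 / sigma0_sq          # observed chi-square value
    right = chi2_right_tail_even_df(stat, n - 1)
    left = 1 - right
    if tail == "right":
        reject = right < alpha
    elif tail == "left":
        reject = left < alpha
    else:                                    # two-tailed: compare with alpha/2
        reject = min(left, right) < alpha / 2
    return stat, right, reject

stat, right, reject_two_05 = variance_test(15, 18, 10, alpha=0.05)
_, _, reject_two_10 = variance_test(15, 18, 10, alpha=0.10)
_, _, reject_right_05 = variance_test(15, 18, 10, alpha=0.05, tail="right")

print(f"chi-square = {stat:.1f}, right-tail probability = {right:.4f}")
print(f"two-tailed, alpha = 0.05: reject H0? {reject_two_05}")
print(f"two-tailed, alpha = 0.10: reject H0? {reject_two_10}")
print(f"right-tailed, alpha = 0.05: reject H0? {reject_right_05}")
```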
