Composite Hypotheses
Simple hypotheses limit us to a decision between one of two possible states of nature. This limitation does not allow us, under the procedures of hypothesis testing, to address the basic question:
Does the length, the reaction rate, the fraction displaying a particular behavior or having a particular opinion, the temperature, the kinetic energy, the Michaelis constant, the speed of light, the mutation rate, the melting point, the probability that the dominant allele is expressed, the elasticity, the force, the mass, the parameter value θ0 increase, decrease, or change at all under a different experimental condition?
The composite hypothesis takes the form

$$H_0: \theta \in \Theta_0 \quad\text{versus}\quad H_1: \theta \in \Theta_1$$

for some choice of parameter value θ0 that splits the parameter space. Asking whether the parameter

• increases would lead to the choices Θ0 = {θ; θ ≤ θ0} and Θ1 = {θ; θ > θ0},
• decreases would lead to the choices Θ0 = {θ; θ ≥ θ0} and Θ1 = {θ; θ < θ0}, and
• changes at all would lead to the choices Θ0 = {θ0} and Θ1 = {θ; θ ≠ θ0}.

The effect that we are meant to show, here the nature of the change, is contained in Θ1. The first two options given above are called one-sided tests. The third is called a two-sided test.
Rejection and failure to reject the null hypothesis, critical regions C, and type I and type II errors have the same meaning for composite hypotheses as they do for simple hypotheses. Significance level and power will necessitate an extension of the ideas for simple hypotheses.
Power is now a function of the parameter value θ. If our test is to reject H0 whenever the data fall in a critical region C, then the power function is

$$\pi(\theta) = P_\theta\{X \in C\},$$

the probability of rejecting the null hypothesis for a given value of the parameter. Ideally, the power function is near 0 for θ ∈ Θ0 and near 1 for θ ∈ Θ1. With this property for the power function, we would rarely reject the null hypothesis when it is true and rarely fail to reject the null hypothesis when it is false. In reality, incorrect decisions are made.

• For θ ∈ Θ0, π(θ) is the probability of making a type I error, i.e., rejecting the null hypothesis when it is true.
• For θ ∈ Θ1, 1 − π(θ) is the probability of making a type II error, i.e., failing to reject the null hypothesis when it is false.
Figure 18.1: Power function for the one-sided test with alternative "greater". The size of the test α is given by the height of the red segment. Notice that π(µ) < α for all µ < µ0 and π(µ) > α for all µ > µ0.
Example 18.1. Let X1, X2, ..., Xn be independent N(µ, σ0²) random variables with σ0 known and µ unknown. For the composite hypothesis for the one-sided test

$$H_0: \mu \le \mu_0 \quad\text{versus}\quad H_1: \mu > \mu_0,$$
we use the test statistic from the likelihood ratio test and reject H0 if the statistic x̄ is too large. Thus, the critical region is

$$C = \{\mathbf{x};\ \bar{x} \ge k(\mu_0)\}.$$
If µ is the true mean, then the power function is

$$\pi(\mu) = P_\mu\{\bar{X} \ge k(\mu_0)\}.$$

As we shall see soon, the value of k(µ0) depends on the level of the test.
As the actual mean µ increases, the probability that the sample mean X̄ exceeds a particular value k(µ0) also increases. In other words, π is an increasing function. Thus, the maximum value of π on the set Θ0 = {µ; µ ≤ µ0} takes place at the value µ0. Consequently, to obtain level α for the hypothesis test, set

$$\alpha = \pi(\mu_0) = P_{\mu_0}\{\bar{X} \ge k(\mu_0)\}.$$

We now use this to find the value k(µ0). When µ0 is the value of the mean, we standardize to give a standard normal random variable

$$Z = \frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}}.$$

Choosing zα so that P{Z ≥ zα} = α, the event {X̄ ≥ k(µ0)} is the event {Z ≥ (k(µ0) − µ0)/(σ0/√n)}, and so

$$k(\mu_0) = \mu_0 + z_\alpha \frac{\sigma_0}{\sqrt{n}}.$$
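As a sketch in R (using the illustrative values µ0 = 10, σ0 = 3, n = 16, and α = 0.05 that appear in the examples below), the critical value and the power function for this "greater" alternative can be computed directly:

> zalpha<-qnorm(0.95)                   # z_0.05 = 1.645
> mu0<-10; sigma0<-3; n<-16
> k<-mu0+zalpha*sigma0/sqrt(n)          # critical value k(mu0) = 11.23
> mu<-seq(9,13,0.01)                    # grid of alternative means
> pi<-1-pnorm((k-mu)/(sigma0/sqrt(n)))  # power pi(mu) = P_mu{Xbar >= k(mu0)}
> plot(mu,pi,type="l")                  # pi is increasing with pi(mu0) = alpha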
• If we fix n, the number of observations, and the alternative value µ = µ1 > µ0 and determine the power 1 − β as a function of the significance level α, then we have the receiver operating characteristic as in Figure 17.2.
• If we fix µ1, the alternative value, and the significance level α, then we can determine the power as a function of the number of observations as in Figure 17.3.
• If we fix n and the significance level α, then we can determine the power function π(µ), the power as a function of the alternative value µ. An example of this function is shown in Figure 18.1.
Returning to the example with a model species and its mimic, we plot the power function for µ0 = 10, σ0 = 3, and n = 16 observations using the R commands below.
Figure 18.2: Power function for the one-sided test with alternative "less than". µ0 = 10, σ0 = 3. Note, as argued in the text, that π is a decreasing function. (left) n = 16, α = 0.05 (black), 0.02 (red), and 0.01 (blue). Notice that lowering the significance level α reduces the power π(µ) for each value of µ. (right) α = 0.05, n = 15 (black), 40 (red), and 100 (blue). Notice that increasing the sample size n increases the power π(µ) for each value of µ < µ0 and decreases the type I error probability for each value of µ > µ0. For all 6 power curves, we have π(µ0) = α.
> zalpha<-qnorm(0.95)              # z_0.05
> mu0<-10
> sigma0<-3
> mu<-(600:1100)/100               # grid of alternative means from 6 to 11
> n<-16
> z<--zalpha - (mu-mu0)/(sigma0/sqrt(n))
> pi<-pnorm(z)                     # power pi(mu) = P_mu{Xbar <= k(mu0)}
> plot(mu,pi,type="l")
In Figure 18.2, we vary the values of the significance level α and of n, the number of observations, in the graph of the power function π.
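As a sketch, the curves in Figure 18.2 (left) can be reproduced by recomputing the critical value for each significance level, continuing with mu, mu0, sigma0, and n as defined above:

> alpha<-c(0.05,0.02,0.01); color<-c("black","red","blue")
> plot(mu,pnorm(-qnorm(1-alpha[1])-(mu-mu0)/(sigma0/sqrt(n))),type="l",ylim=c(0,1),ylab="pi")
> for (i in 2:3)
+   lines(mu,pnorm(-qnorm(1-alpha[i])-(mu-mu0)/(sigma0/sqrt(n))),col=color[i])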
Example 18.3 (mark and recapture). We may want to use mark and recapture as an experimental procedure to test whether or not a population has reached a dangerously low level. The variables in mark and recapture are
• t, the number captured and tagged,
• k, the number in the second capture,
• r, the number in the second capture that are tagged, and
• N, the total population.
If N0 is the level that a wildlife biologist says is dangerously low, then the natural hypothesis is one-sided:

$$H_0: N \ge N_0 \quad\text{versus}\quad H_1: N < N_0.$$
The data are used to compute r, the number in the second capture that are tagged. The likelihood function for N is
the hypergeometric distribution,
$$L(N\,|\,r) = \frac{\binom{t}{r}\binom{N-t}{k-r}}{\binom{N}{k}}.$$
The maximum likelihood estimate is N̂ = [tk/r]. Thus, higher values for r lead us to lower estimates for N. Let R be the (random) number in the second capture that are tagged. Then, for an α level test, we look for the minimum value rα so that

$$\pi(N) = P_N\{R \ge r_\alpha\} \le \alpha \quad\text{for all } N \ge N_0. \tag{18.3}$$
As N increases, recaptures become less likely and the probability in (18.3) decreases. Thus, we should set the value of rα according to the parameter value N0, the minimum value under the null hypothesis. Let's determine rα for several values of α using the example from the topic Maximum Likelihood Estimation, and consider the case in which the critical population is N0 = 2000.
> N0<-2000; t<-200; k<-400
> alpha<-c(0.05,0.02,0.01)
> ralpha<-qhyper(1-alpha,t,N0-t,k)
> data.frame(alpha,ralpha)
alpha ralpha
1 0.05 49
2 0.02 51
3 0.01 53
For example, we must capture at least 49 that were tagged in order to reject H0 at the α = 0.05 level. In this case the estimate for N is N̂ = [kt/rα] = 1632. As anticipated, rα increases and the critical region shrinks as the value of α decreases.
Using the value rα determined from N0, the power function is

$$\pi(N) = P_N\{R \ge r_\alpha\}.$$
R is a hypergeometric random variable with mass function
$$f_R(r) = P_N\{R = r\} = \frac{\binom{t}{r}\binom{N-t}{k-r}}{\binom{N}{k}}.$$
The plot for the case α = 0.05 is given using the R commands

> N<-c(1300:2100)
> pi<-1-phyper(48,t,N-t,k)   # P{R >= 49} = 1 - P{R <= 48}
> plot(N,pi,type="l",ylim=c(0,1))
We can increase power by increasing the size of k, the number in the second capture. This increases the value of rα. For α = 0.05, we have the following table.
> k<-c(400,600,800)
> N0<-2000
> ralpha<-qhyper(0.95,t,N0-t,k)
> data.frame(k,ralpha)
k ralpha
1 400 49
2 600 70
3 800 91
We show the impact on the power π(N) of both the significance level α and the number in the recapture k in Figure 18.3.
Exercise 18.4. Determine the type II error rate for N = 1600 with
• k = 400 and α = 0.05, 0.02, and 0.01, and
• α = 0.05 and k = 400, 600, and 800.
Figure 18.3: Power function for the Lincoln-Peterson mark and recapture test for population N0 = 2000 and t = 200 captured and tagged. (left) k = 400 recaptured, α = 0.05 (black), 0.02 (red), and 0.01 (blue). Notice that a lower significance level α reduces power. (right) α = 0.05, k = 400 (black), 600 (red), and 800 (blue). As expected, increased recapture size increases power.
Next, consider the two-sided test

$$H_0: \mu = \mu_0 \quad\text{versus}\quad H_1: \mu \ne \mu_0.$$
In this case, the parameter values for the null hypothesis, Θ0, consist of a single value, µ0. We reject H0 if |X̄ − µ0| is too large. Under the null hypothesis,

$$Z = \frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}}$$

is a standard normal random variable. For a significance level α, choose zα/2 so that

$$P\{Z \ge z_{\alpha/2}\} = P\{Z \le -z_{\alpha/2}\} = \frac{\alpha}{2}.$$

Thus, P{|Z| ≥ zα/2} = α. For data x = (x1, ..., xn), this leads to a critical region

$$C = \left\{\mathbf{x};\ \left|\frac{\bar{x} - \mu_0}{\sigma_0/\sqrt{n}}\right| \ge z_{\alpha/2}\right\}.$$
If µ is the actual mean, then

$$\frac{\bar{X} - \mu}{\sigma_0/\sqrt{n}}$$

is a standard normal random variable. We use this fact to determine the power function for this test:

$$\pi(\mu) = P_\mu\{X \in C\} = 1 - P_\mu\{X \notin C\} = 1 - P_\mu\left\{\left|\frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}}\right| < z_{\alpha/2}\right\}$$

$$= 1 - P_\mu\left\{-z_{\alpha/2} < \frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} < z_{\alpha/2}\right\} = 1 - P_\mu\left\{-z_{\alpha/2} - \frac{\mu - \mu_0}{\sigma_0/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma_0/\sqrt{n}} < z_{\alpha/2} - \frac{\mu - \mu_0}{\sigma_0/\sqrt{n}}\right\}$$

$$= 1 - \Phi\left(z_{\alpha/2} - \frac{\mu - \mu_0}{\sigma_0/\sqrt{n}}\right) + \Phi\left(-z_{\alpha/2} - \frac{\mu - \mu_0}{\sigma_0/\sqrt{n}}\right).$$
Figure 18.4: Power function for the two-sided test. µ0 = 10, σ0 = 3. (left) n = 16, α = 0.05 (black), 0.02 (red), and 0.01 (blue). Notice that a lower significance level α reduces power. (right) α = 0.05, n = 15 (black), 40 (red), and 100 (blue). As before, decreased significance level reduces power and increased sample size n increases power.
If we do not know whether the mimic is larger or smaller than the model, then we use a two-sided test.
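A sketch of R commands for this power function, with α = 0.05 and n = 16 observations and assuming the same µ0 = 10 and σ0 = 3 as before, follows directly from the formula above:

> zalpha<-qnorm(0.975)     # z_0.025 for the two-sided test
> mu0<-10; sigma0<-3; n<-16
> mu<-(600:1400)/100
> pi<-1-pnorm(zalpha-(mu-mu0)/(sigma0/sqrt(n)))+pnorm(-zalpha-(mu-mu0)/(sigma0/sqrt(n)))
> plot(mu,pi,type="l",ylim=c(0,1))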
We shall see in the next topic how these tests follow from extensions of the likelihood ratio test for simple hypotheses.
The next example is unlikely to occur in any genuine scientific situation. It is included because it allows us to
compute the power function explicitly from the distribution of the test statistic. We begin with an exercise.
Exercise 18.6. For X1, X2, ..., Xn independent U(0, θ) random variables, θ ∈ Θ = (0, ∞), the density is

$$f_X(x|\theta) = \begin{cases} 1/\theta & \text{if } 0 < x \le \theta, \\ 0 & \text{otherwise.} \end{cases}$$
Let X(n) denote the maximum of X1, X2, ..., Xn. Then X(n) has distribution function

$$F_{X_{(n)}}(x) = P_\theta\{X_{(n)} \le x\} = \left(\frac{x}{\theta}\right)^n, \quad 0 \le x \le \theta.$$
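As a quick sanity check of this formula by simulation (a sketch; the values θ = 2, n = 10, and x = 1.5 are arbitrary):

> theta<-2; n<-10; x<-1.5
> maxes<-replicate(10000,max(runif(n,0,theta)))
> mean(maxes<=x)      # close to (x/theta)^n = 0.75^10 = 0.0563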
Example 18.7. For X1, X2, ..., Xn independent U(0, θ) random variables, take the null hypothesis that θ lands in some normal range of values [θL, θR]. The alternative is that θ lies outside the normal range.
Because θ is the highest possible value for an observation, if any of our observations Xi are greater than θR, then we are certain that θ > θR and we should reject H0. On the other hand, all of the observations could be below θL and the maximum possible value θ might still land in the normal range.
Consequently, we will base a test on the statistic X(n) = max1≤i≤n Xi and reject H0 if X(n) > θR or if X(n) is too much smaller than θL, say below some value θ̃. We shall soon see that the choice of θ̃ will depend on n, the number of observations, and on α, the size of the test.
The power function is

$$\pi(\theta) = P_\theta\{X_{(n)} \le \tilde{\theta}\} + P_\theta\{X_{(n)} \ge \theta_R\}.$$
We compute the power function in three cases: low, middle, and high values for the parameter θ. The second case has the values of θ under the null hypothesis. The first and the third cases have the values of θ under the alternative hypothesis. An example of the power function is shown in Figure 18.5.
Figure 18.5: Power function for the test above with θL = 1, θR = 3, θ̃ = 0.9, and n = 10. The size of the test is π(1) = 0.3487.
Case 1. θ ≤ θ̃.

In this case all of the observations Xi must be less than θ, which is in turn less than θ̃. Thus, X(n) is certainly less than θ̃, so

$$P_\theta\{X_{(n)} \le \tilde{\theta}\} = 1 \quad\text{and}\quad P_\theta\{X_{(n)} \ge \theta_R\} = 0,$$

and therefore π(θ) = 1.
Case 2. θ̃ < θ ≤ θR.

Here X(n) can be less than θ̃ but never greater than θR. Thus,

$$P_\theta\{X_{(n)} \le \tilde{\theta}\} = \left(\frac{\tilde{\theta}}{\theta}\right)^n \quad\text{and}\quad P_\theta\{X_{(n)} \ge \theta_R\} = 0,$$

and therefore π(θ) = (θ̃/θ)^n.
Case 3. θ > θR.

Repeat the argument in Case 2 to conclude that

$$P_\theta\{X_{(n)} \le \tilde{\theta}\} = \left(\frac{\tilde{\theta}}{\theta}\right)^n$$

and that

$$P_\theta\{X_{(n)} \ge \theta_R\} = 1 - P_\theta\{X_{(n)} < \theta_R\} = 1 - \left(\frac{\theta_R}{\theta}\right)^n,$$

and therefore π(θ) = (θ̃/θ)^n + 1 − (θR/θ)^n.
The size of the test is the maximum value of the power function under the null hypothesis. This is Case 2. Here, the power function

$$\pi(\theta) = \left(\frac{\tilde{\theta}}{\theta}\right)^n$$

is decreasing in θ, so its maximum over Θ0 = [θL, θR] occurs at the left endpoint θL, giving size α = (θ̃/θL)^n. To achieve this level, we solve for θ̃, obtaining θ̃ = θL·α^(1/n). Note that θ̃ increases with α; consequently, we must expand the critical region in order to increase the significance level. Also, θ̃ increases with n; thus, for a larger sample we can move the rejection threshold θ̃ closer to θL while maintaining the significance level.
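A sketch of R commands to draw the power function in Figure 18.5 from the three cases above (θL = 1, θR = 3, n = 10, and θ̃ = 0.9, so that the size is 0.9^10 = 0.3487):

> thetaL<-1; thetaR<-3; n<-10
> thetatilde<-0.9                  # thetaL*alpha^(1/n) with alpha = 0.9^10
> theta<-seq(0.5,5,0.01)
> pi<-ifelse(theta<=thetatilde,1,
+   (thetatilde/theta)^n+ifelse(theta>thetaR,1-(thetaR/theta)^n,0))
> plot(theta,pi,type="l",ylim=c(0,1))   # pi(thetaL) = 0.9^10 = 0.3487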
Example 18.8. For the one-sided hypothesis test to see if the mimic had invaded,

$$H_0: \mu \ge \mu_0 \quad\text{versus}\quad H_1: \mu < \mu_0,$$

with µ0 = 10 cm, σ0 = 3 cm, and n = 16 observations. The test statistic is the sample mean x̄ and the critical region is C = {x; x̄ ≤ k}.
Our data had sample mean x̄ = 8.93125 cm. The maximum value of the power function π(µ) for µ in the subset of the parameter space determined by the null hypothesis occurs for µ = µ0. Consequently, the p-value is Pµ0{X̄ ≤ 8.93125}:
> pnorm(8.93125,10,3/4)
[1] 0.0770786
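The critical value shown in Figure 18.6 below can be checked in the same way; because the test rejects for small values of x̄, it is the lower 5th percentile of the null distribution of X̄:

> qnorm(0.05,10,3/4)     # critical value 8.767 for the alpha = 0.05 test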
Figure 18.6: Under the null hypothesis, X̄ has a normal distribution with mean µ0 = 10 cm and standard deviation 3/√16 = 3/4 cm. The p-value, 0.077, is the area under the density curve to the left of the observed value of 8.931 for x̄. The critical value, 8.767, for an α = 0.05 level test is indicated by the red line. Because the p-value is greater than the significance level, we cannot reject H0.
If the p-value is below a given significance level α, then we say that the result is statistically significant at the level α. For the previous example, we could not have rejected H0 at the α = 0.05 significance level. Indeed, we could not have rejected H0 at any level below the p-value, 0.0770786. On the other hand, we would reject H0 for any significance level above this value.
Many statistical software packages (including R, see the example below) do not need to have the significance level
in order to perform a test procedure. This is especially important to note when setting up a hypothesis test for the
purpose of deciding whether or not to reject H0 . In these circumstances, the significance level of a test is a value that
should be decided before the data are viewed. After the test is performed, a report of the p-value adds information
beyond simply saying that the results were or were not significant.
It is tempting to associate the p-value with a statement about the probability of the null or alternative hypothesis being true. Such a statement would have to be based on knowing which value of the parameter is the true state of nature. Assessing whether or not this parameter value is in Θ0 is the reason for the testing procedure, and the p-value was computed with knowledge of the data and our choice of Θ0.
In the example above, the test is based on having a test statistic S(x) (namely x̄) exceed a level k0, i.e., we have the decision

reject H0 if and only if S(x) ≥ k0.

This choice of k0 is based on the choice of significance level α and the choice of θ0 ∈ Θ0 so that π(θ0) = Pθ0{S(X) ≥ k0} = α, the maximum value for the power function under the null hypothesis. If the observed data x take the value S(x) = k, then the p-value equals

$$P_{\theta_0}\{S(X) \ge k\}.$$
This is the lowest value for the significance level that would result in rejection of the null hypothesis if we had chosen
it in advance of seeing the data.
Example 18.9. Returning to the example on the proportion of hives that survive the winter, the appropriate composite hypothesis test to see if more than the usual number of hives survive is

$$H_0: p \le 0.7 \quad\text{versus}\quad H_1: p > 0.7.$$
> prop.test(88,112,0.7,alternative="greater")
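prop.test reports a one-sided p-value of about 0.03 for these data. As a rough check, a sketch of the continuity-corrected normal approximation on which the test is based:

> z<-(88-0.5-112*0.7)/sqrt(112*0.7*0.3)   # continuity-corrected z statistic
> 1-pnorm(z)                              # one-sided p-value, about 0.030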
Exercise 18.10. Is the hypothesis test above significant at the 5% level? At the 1% level?
Answers to selected exercises

18.4. The type II error rate is 1 − π(1600) = P1600{R < rα}. This is the distribution function of a hypergeometric random variable, and thus these probabilities can be computed using the phyper command.
• For varying significance, we have the R commands:
> t<-200;N<-1600
> k<-400
> alpha<-c(0.05,0.02,0.01)
> ralpha<-c(49,51,53)
> beta<-phyper(ralpha-1,t,N-t,k)   # P{R < r_alpha}, the type II error rate
> data.frame(alpha,beta)
  alpha      beta
1  0.05 0.4006990
2  0.02 0.5390763
3  0.01 0.6718905

Notice that the type II error probability is high for α = 0.05 and increases as α decreases.
• For varying recapture size, we continue with the R commands:
287
Introduction to the Science of Statistics Composite Hypotheses
> k<-c(400,600,800)
> ralpha<-c(49,70,91)
> beta<-phyper(ralpha-1,t,N-t,k)   # type II error rate for each recapture size
> data.frame(k,beta)
    k      beta
1 400 0.4006990
2 600 0.1956012
3 800 0.0753943
Notice that increasing recapture size has a significant impact on type II error probabilities.
18.6. For 0 ≤ x ≤ θ, the i-th observation satisfies

$$P\{X_i \le x\} = \int_0^x \frac{1}{\theta}\,d\tilde{x} = \frac{x}{\theta}.$$

Now, X(n) ≤ x occurs precisely when all of the n independent observations Xi satisfy Xi ≤ x. Because these random variables are independent,

$$F_{X_{(n)}}(x) = P_\theta\{X_{(n)} \le x\} = P_\theta\{X_1 \le x, \ldots, X_n \le x\} = P_\theta\{X_1 \le x\} \cdots P_\theta\{X_n \le x\} = \left(\frac{x}{\theta}\right)^n.$$
18.10. Yes, the p-value is below 0.05. No, the p-value is above 0.01.