PTS2 Exam 2017 With Solutions
Identification :
You have to identify yourself using your certificate of registration (UvA-identification card) and a valid proof
of identity (passport, ID card) with a clearly recognisable photograph. If you cannot identify yourself, access to
the exam may be denied.
If you are not correctly registered via SIS for the course component, your exam will not be marked and
registered.
Please write your name and student number on every sheet of paper you hand in.
Tools allowed:
pencil (not red!), pen (not red!), eraser, ruler, (non-graphic) calculator.
Specific information on this exam: This exam consists of 5 questions on 2 pages. The maximum score is
100 points. Always motivate your answers and show your calculations. The final mark for the course will be
determined as follows:
0.65*(points in this exam) + 0.25*(midterm score) + 0.10*(average score for computer assignments).
The answers will be posted on Blackboard.
Good luck!
You may have to use (some of) the following theorems.
Theorem 1.
If $Z \sim N(0, 1)$, then $Z^2 \sim \chi^2(1)$.
Theorem 2.
Let $X_1, X_2, \ldots, X_n$ be a sample from a normal distribution with $X_i \sim N(\mu, \sigma^2)$. Then
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, where $S^2$ denotes the sample variance as usual.
Theorem 3.
If $V \sim N(\mu_V, \sigma_V^2)$, then $bV + a \sim N(b\mu_V + a,\; b^2\sigma_V^2)$ (for $b \neq 0$).
Theorem 4.
If $P(X = 1) = p$ and $P(X = 0) = 1 - p$, then $E(X) = p$ and $\mathrm{Var}(X) = p(1 - p)$.
Theorem 5.
(Central Limit Theorem, informally formulated) Let $X_1, \ldots, X_n$ be a sample from a
distribution with finite standard deviation $\sigma$ and mean $\mu$. Then, for $n$ 'sufficiently large',
$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$ will have approximately a standard normal distribution.
Theorem 6.
If $X \sim \mathrm{Bin}(n, p)$, then $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$.
Theorem 7.
If $X \sim \mathrm{Unif}(a, b)$, then $f_X(x) = \frac{1}{b - a}$ for $a \le x \le b$, $E(X) = (a + b)/2$ and $\mathrm{Var}(X) = (b - a)^2/12$.
Theorem 8.
If $V \sim \chi^2_m$ and $W \sim \chi^2_n$ and mutually independent, then $V + W \sim \chi^2_{n+m}$.
Power = onderscheidingsvermogen
Level of significance = onbetrouwbaarheidsdrempel
Unbiased estimator = zuivere schatter
Test statistic = toetsingsgrootheid
Rejection region = kritieke gebied
1 The joint pdf of the random variables X and Y is given by (where a and c are positive constants):
$f_{X,Y}(x, y) = \begin{cases} c\,e^{-xy} & \text{for } 1 < x < a \text{ and } 0 < y \\ 0 & \text{elsewhere} \end{cases}$
$f_{X,Y}(x, y) = \begin{cases} 1/y & \text{for } 0 < x < y < 1 \\ 0 & \text{elsewhere} \end{cases}$
b [6] Define W = X + 2Y. Determine the CDF of W. It is sufficient here to just write down the correct
integrals with the correct integration limits.
3 Consider two independent random variables X and Y, both with a uniform distribution on the interval
(−1, 1). Define V = min( X, Y ) and W = max( X, Y ).
a [6] Determine the pdf of V.
b [3] Explain (without any calculations) why the variance of V must, in this case, be equal to the variance of
W.
c [8] Determine the covariance and the coefficient of correlation of V and W. Hint: use the fact that: X + Y =
V + W in order to determine the variance of V + W very simply. Next, use a well-known equation
which relates the variance of V + W to the covariance of V and W . In case you do not manage to find
the variance of V, then use Var(V) = 2/9.
4 Let $X_1, X_2, \ldots, X_{16}$ be a sample of size 16 from a normal distribution with $X_i \sim N(\mu, \sigma^2)$.
The sample outcomes are $\sum_{i=1}^{16} x_i = 96$ and $\sum_{i=1}^{16} x_i^2 = 3324$.
Please note that all seven parts of this question can be answered independently from any other part.
c [4] Prove that $\frac{1}{n}\sum_{i=1}^{n} X_i^2$ is now an unbiased estimator for the population variance $\sigma^2$.
d [5] In what way should the test statistic in part b be modified as a result of the extra information about μ
(while still using the chi-square distribution)? Again determine the p-value (given the same sample
outcomes as above). Explain why this test is now preferable over the test in part b.
e [4] Determine the probability that the sample mean will be larger than 6, first assuming that
σ = 10 and then again under the assumption that σ = 20 .
f [8] For the hypothesis test $H_0: \sigma = 10$ versus $H_1: \sigma > 10$, a chi-square distributed test statistic is the usual
choice. However, in this particular case (since µ is known), it is possible to choose $\bar{X} \sim N(\mu, \sigma^2/n)$ as a
suitable test statistic for the very same hypotheses. Explain why this is a proper test statistic in this case.
Next, argue why the rejection region for $\bar{X}$ at α = 0.05 is now two-sided (even though the alternative
hypothesis is one-sided), and should consist of all values such that $|\bar{x}| \ge 4.9$.
(Note that we have NOT discussed such a situation in class; you are requested to use your
understanding of hypothesis testing to answer this part!!).
g [4] Determine the power of the test as described in part f, in case the true value for the population standard
deviation is equal to 20.
5 Let $X_1, \ldots, X_n$ be a sample from a distribution with $X_i \sim \mathrm{Bin}(1, p)$. Define the sample proportion as
$\hat{P} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
a [8] Prove that P̂ (in case n is large enough) is approximately normally distributed with expected value p
and variance p(1 – p) / n. You may only use the theorems as shown on the first page of this exam.
Indicate clearly when and how you use each theorem. Do not skip any steps, even if they might seem
trivial to you.
b [7] Derive the formula for a two-sided 99%-confidence interval for p, starting from what is given in part a.
c [4] Determine the sample size needed to ensure that the margin of error for the 99%-confidence interval for
p is never more than 0.02.
d [8] It is often said that the political party VPP of Greet Tammers is more popular among female than
among male voters. In an attempt to confirm this, two random samples from the population of eligible
voters are taken; the first of 476 female voters, and the second consisting of 524 male voters.
Develop a hypothesis test for this situation, when the chosen level of significance is 0.05. Formulate
the hypotheses, the test statistic with its distribution, and the rejection region for the test statistic. Make
sure that you define all symbols you use. Hint: determine first the distribution of the difference of the
two sample proportions under H0. Note that this distribution still depends on an unknown parameter
(even under H0), so it should be replaced by a suitable estimate before a usable test statistic results.
Solutions
1a No, they are not, since $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ does not hold for all x and y. This can be seen because the joint pdf
cannot be factorized into a function of x only and a function of y only.
Unfortunately, many students apparently thought that $e^{-xy} = e^{-x} e^{-y}$, which is of course not true.
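b $f_X(x) = \int_0^{\infty} f_{X,Y}(x, y)\,dy = \int_0^{\infty} c\,e^{-xy}\,dy = c\left[-\frac{1}{x} e^{-xy}\right]_{y=0}^{\infty} = \frac{c}{x}$ for 1 < x < a.
$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{c\,e^{-xy}}{c/x} = x\,e^{-xy}$ for 0 < y and only defined for 1 < x < a.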
c For $f_{X,Y}(x, y)$ to be a proper pdf, we must have $\int_1^a \int_0^{\infty} c\,e^{-xy}\,dy\,dx = 1$, or $\int_1^a f_X(x)\,dx = \int_1^a \frac{c}{x}\,dx = 1$
$\Rightarrow c \ln a = 1 \Rightarrow c = 1/\ln a$
[Figure: sketch of the integration region in the (x, y)-plane, with the line x = vy and the value 1/v marked.]
2
a When looking at the support of X and Y , it follows immediately that V = X/Y can only assume values
between 0 and 1.
$F_V(v) = P(V \le v) = P(X/Y \le v) = P(X \le vY) = \int_0^1 \int_0^{yv} \frac{1}{y}\,dx\,dy = \int_0^1 v\,dy = v$
$f_V(v) = \frac{dF_V(v)}{dv} = 1$ for $0 \le v \le 1$
In case the reverse order of integration was used, the first integral should be: $F_V(v) = \int_0^v \int_{x/v}^1 \frac{1}{y}\,dy\,dx$
(this results in a slightly more difficult integration step; of course with the same answer)
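The result $F_V(v) = v$ can be checked by simulation. The sketch below assumes a sampling scheme derived from the joint pdf: the marginal of Y is $\int_0^y (1/y)\,dx = 1$ on (0, 1), so Y is uniform on (0, 1) and, given Y = y, X is uniform on (0, y). The function names, sample size and seed are illustrative choices, not part of the exam.

```python
import random

def simulate_ratio(n_samples=100_000, seed=1):
    """Draw V = X/Y where (X, Y) has joint pdf 1/y on 0 < x < y < 1."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_samples):
        y = 1.0 - rng.random()       # uniform on (0, 1], avoids y == 0
        x = rng.uniform(0.0, y)      # X given Y = y is uniform on (0, y)
        ratios.append(x / y)
    return ratios

def empirical_cdf(values, v):
    """Fraction of simulated values that are at most v."""
    return sum(1 for value in values if value <= v) / len(values)

ratios = simulate_ratio()
for v in (0.25, 0.5, 0.75):
    # Each empirical value should be close to v itself if F_V(v) = v.
    print(v, round(empirical_cdf(ratios, v), 3))
```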
b $F_W(w) = P(W \le w) = P(X + 2Y \le w)$
It can be seen simply that W can attain values between 0 and 3. Drawing lines $x + 2y = w$ for different
values of w shows that the area of integration has a different form for w < 2 than for w > 2.
For $0 \le w \le 2$:
Determine the intersection of the lines $x + 2y = w$ and $x = y$: $x = y = w/3$
$F_W(w) = \int_0^{w/3} \int_0^{y} \frac{1}{y}\,dx\,dy + \int_{w/3}^{w/2} \int_0^{w-2y} \frac{1}{y}\,dx\,dy$
or, with reversed order of integration: $F_W(w) = \int_0^{w/3} \int_x^{(w-x)/2} \frac{1}{y}\,dy\,dx$
and for $2 \le w \le 3$:
$F_W(w) = \int_0^{w/3} \int_0^{y} \frac{1}{y}\,dx\,dy + \int_{w/3}^{1} \int_0^{w-2y} \frac{1}{y}\,dx\,dy$
or, with reversed order of integration: $F_W(w) = \int_0^{w-2} \int_x^{1} \frac{1}{y}\,dy\,dx + \int_{w-2}^{w/3} \int_x^{(w-x)/2} \frac{1}{y}\,dy\,dx$
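As a sanity check on the integration limits, the two-piece integral over y can be evaluated numerically. In the sketch below the inner integral over x is done by hand (it equals the inner upper limit divided by y) and the outer integral uses a midpoint rule. The name `cdf_w` and the step count are illustrative assumptions, not part of the exam.

```python
def cdf_w(w, steps=10_000):
    """F_W(w) = P(X + 2Y <= w) for joint pdf 1/y on 0 < x < y < 1."""
    if w <= 0.0:
        return 0.0
    if w >= 3.0:
        return 1.0
    # Piece 1: 0 < y < w/3, x runs from 0 to y, so the inner integral is y/y = 1.
    first = w / 3.0
    # Piece 2: w/3 < y < min(w/2, 1), x runs from 0 to w - 2y.
    lower, upper = w / 3.0, min(w / 2.0, 1.0)
    h = (upper - lower) / steps
    second = 0.0
    for k in range(steps):
        y = lower + (k + 0.5) * h           # midpoint of the k-th subinterval
        second += (w - 2.0 * y) / y * h     # inner integral of 1/y over x
    return first + second

for w in (1.0, 2.0, 2.5):
    print(w, round(cdf_w(w), 4))
```

Using `min(w/2, 1)` as the upper limit covers both cases with one expression: for w below 2 the region is cut off by the line x + 2y = w at y = w/2, and for w above 2 by the boundary y = 1.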
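3a Since X and Y are independent, $F_V(v) = P(\min(X, Y) \le v) = 1 - P(X > v)\,P(Y > v) = 1 - (1 - F_X(v))^2$.
Since $1 - F_X(v) = 1 - \int_{-1}^{v} \tfrac{1}{2}\,dx = \int_{v}^{1} \tfrac{1}{2}\,dx = \tfrac{1}{2} - \tfrac{v}{2}$, we get: $F_V(v) = 1 - \left(\tfrac{1}{2} - \tfrac{v}{2}\right)^2$
(students who used the incorrect CDF of X should have found the error by performing the check:
$F_V(v) = 0$ for $v = -1$, and $F_V(v) = 1$ for $v = 1$)
Now, $f_V(v) = \frac{dF_V(v)}{dv} = \tfrac{1}{2} - \tfrac{v}{2}$ for –1 < v < 1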
b This can be explained by symmetry. A complete argument could run as follows: the distributions of X and Y
are both symmetric around 0, so –X = (–1)X and –Y have the same distributions as X and Y, and hence
min(–X , –Y ) has the same distribution as V = min(X , Y ).
Also, since W = max( X, Y ) = – min(–X , –Y ), W has the same distribution as –V, so it is clear that the
variances of V and W must be equal to each other.
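c $E(V) = \int_{-1}^{1} v\left(\tfrac{1}{2} - \tfrac{v}{2}\right) dv = \left[\tfrac{1}{4} v^2 - \tfrac{1}{6} v^3\right]_{-1}^{1} = -\tfrac{1}{6} - \tfrac{1}{6} = -\tfrac{1}{3}$
$\mathrm{Var}(V) = E(V^2) - (E(V))^2 = \tfrac{1}{3} - \tfrac{1}{9} = \tfrac{2}{9}$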
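Entering the above into the equation Var(V + W ) = Var(V ) + Var(W ) + 2 Cov(V , W ), with
Var(V + W ) = Var(X + Y ) = 1/3 + 1/3 = 2/3 (Theorem 7 and independence), it follows that
2/3 = 2/9 + 2/9 + 2 Cov(V , W ), and thus that Cov(V , W ) = 1/9, and thus $\rho(V, W) = \frac{1/9}{\sqrt{\tfrac{2}{9} \cdot \tfrac{2}{9}}} = \tfrac{1}{2}$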
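The covariance and correlation found above can be checked with a short Monte Carlo sketch. It draws independent X and Y from Unif(−1, 1), forms the minimum and maximum, and computes the sample covariance and correlation by hand. The function names, sample size and seed are illustrative assumptions.

```python
import random

def simulate_min_max(n_samples=200_000, seed=7):
    """Return pairs (V, W) = (min(X, Y), max(X, Y)) for X, Y ~ Unif(-1, 1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        pairs.append((min(x, y), max(x, y)))
    return pairs

def covariance_and_correlation(pairs):
    """Sample covariance and correlation coefficient of the paired values."""
    n = len(pairs)
    mean_v = sum(v for v, _ in pairs) / n
    mean_w = sum(w for _, w in pairs) / n
    var_v = sum((v - mean_v) ** 2 for v, _ in pairs) / n
    var_w = sum((w - mean_w) ** 2 for _, w in pairs) / n
    cov = sum((v - mean_v) * (w - mean_w) for v, w in pairs) / n
    return cov, cov / (var_v * var_w) ** 0.5

cov_vw, rho_vw = covariance_and_correlation(simulate_min_max())
# Compare with the exact values Cov(V, W) = 1/9 and rho(V, W) = 1/2.
print(round(cov_vw, 3), round(rho_vw, 3))
```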
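4a $s^2 = \frac{1}{15}\sum_{i=1}^{16} (x_i - \bar{x})^2 = \frac{1}{15}\left(\sum_{i=1}^{16} x_i^2 - 16\,\bar{x}^2\right) = \frac{1}{15}\left(3324 - 16 \cdot (96/16)^2\right) = 183.2$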
2
(n 1) S 2 15S 2
b Test statistic with distribution under H 0 : ~ 2df 15 .
2
100
15 183.2
The outcome for the test statistic is then 27.48
100
p-value = P(15 2
27.48) , which from the supplied table can be found to be just a little larger than
0.025. Since this value is less than the value for the significance level, we reject H 0 . As a result, the
conclusion is that sufficient proof exists to state that the population standard deviation is larger than 10
(at α = 0.05).
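The arithmetic in parts a and b can be reproduced with the standard library alone. The sketch below computes the sample variance from the two given sums and evaluates the chi-square tail probability with a textbook recurrence instead of a table. The helper `chi2_sf` is a hypothetical name written for this illustration, not a library function.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x), built up with the recurrence
    Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)."""
    half = x / 2.0
    if df % 2 == 0:
        sf, nu = math.exp(-half), 2              # exact tail for df = 2
    else:
        sf, nu = math.erfc(math.sqrt(half)), 1   # exact tail for df = 1
    while nu < df:
        sf += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return sf

n, sum_x, sum_x2 = 16, 96.0, 3324.0   # sample outcomes given in question 4
sigma0_sq = 100.0                      # variance under H0 (sigma = 10)

x_bar = sum_x / n
s_sq = (sum_x2 - n * x_bar ** 2) / (n - 1)   # sample variance, part a
statistic = (n - 1) * s_sq / sigma0_sq       # chi-square statistic, part b
p_value = chi2_sf(statistic, n - 1)
print(round(s_sq, 1), round(statistic, 2), round(p_value, 4))
```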
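c $E\left(\frac{1}{n}\sum X_i^2\right) = \frac{1}{n}\sum E(X_i^2) = \frac{1}{n}\,n\,E(X^2) = E(X^2) = \mathrm{Var}(X) + (E(X))^2 = \sigma^2 + \mu^2 = \sigma^2$ (since µ = 0).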
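d With µ known to be 0, the test statistic with its distribution under $H_0$ becomes: $\frac{\sum_{i=1}^{16} X_i^2}{\sigma^2} = \frac{\sum_{i=1}^{16} X_i^2}{100} \sim \chi^2_{16}$.
(Note that in this situation no degree of freedom has been lost for the estimation of µ! In case you didn’t
know this result anymore, it could be derived easily from Theorems 1 and 8 above. This situation
was also part of problem 7.15).
The outcome for the test statistic is then $\frac{3324}{100} = 33.24$
p-value = $P(\chi^2_{16} \ge 33.24)$, which must be between 0.005 and 0.01.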
The reason that this test is preferable (given the knowledge that µ = 0) over the test in part b is simply
that more information has now become available. We no longer have to use part of the information
in the sample to estimate the population mean first. In general this will lead to an increased power of the
test.
Any answer stating that this test is preferable because it resulted in a smaller p-value is clearly and
completely incorrect!
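For an even number of degrees of freedom the chi-square tail has a simple closed form (a finite Poisson-type sum), so the p-value for the known-mean test can be checked without a table. This is a minimal sketch. `chi2_sf_even` is a hypothetical helper, and the inputs are the sample sum of squares and the hypothesised variance from the question.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square_df >= x) for even df:
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!"""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

sum_x2, sigma0_sq, n = 3324.0, 100.0, 16
statistic_known_mean = sum_x2 / sigma0_sq     # no degree of freedom lost: df = n
p_value_known_mean = chi2_sf_even(statistic_known_mean, n)
print(round(statistic_known_mean, 2), round(p_value_known_mean, 4))
```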
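e $P(\bar{X} > 6 \mid \sigma = 10, \mu = 0) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > 2.4\right) = 1 - P(Z \le 2.4) = 0.0082$
$P(\bar{X} > 6 \mid \sigma = 20, \mu = 0) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > 1.2\right) = 1 - P(Z \le 1.2) = 0.1151$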
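Both tail probabilities can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch below uses the fact that the sample mean is normal with standard error σ/√n. The function name is an illustrative choice.

```python
from statistics import NormalDist

def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) when the X_i are N(mu, sigma^2) and the sample size is n."""
    standard_error = sigma / n ** 0.5
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)

for sigma in (10.0, 20.0):
    print(sigma, round(prob_sample_mean_exceeds(6.0, 0.0, sigma, 16), 4))
```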
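The rejection region for the sample mean should now be such that $P(\bar{X} \text{ in rejection region}) = 0.05$.
So we need to find a such that $P(|\bar{X}| \ge a) = 0.05$, or $P(\bar{X} \ge a) = 0.025$
Since under $H_0$: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X}}{10/\sqrt{16}} = 0.4\,\bar{X} \sim N(0, 1)$, we can see that
$P(\bar{X} \ge a) = P(0.4\,\bar{X} \ge 0.4a) = P(Z \ge 0.4a) = 0.025$, which shows that $0.4a = 1.96$, or that
$a = 1.96 \times 2.5 = 4.9 \Rightarrow P(|\bar{X}| \ge 4.9) = 0.05$.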
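The critical value can be recomputed from the standard normal quantile, and the same setup gives the rejection probability for any true σ. The sketch below assumes µ = 0 and n = 16 as in the question. Evaluating the rejection probability at σ = 20 is one way the power asked for in part g could be computed, and it is offered here as an illustration rather than as the official solution. Function names are hypothetical.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha, sigma0, n):
    """Value a with P(|sample mean| >= a) = alpha when mu = 0 and sigma = sigma0."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma0 / n ** 0.5

def rejection_probability(critical_value, sigma, n):
    """P(|sample mean| >= critical_value) when mu = 0 and the true sd is sigma."""
    standard_error = sigma / n ** 0.5
    return 2.0 * (1.0 - NormalDist(0.0, standard_error).cdf(critical_value))

a = two_sided_critical_value(0.05, 10.0, 16)
print(round(a, 2))                                      # critical value under H0
print(round(rejection_probability(4.9, 10.0, 16), 3))   # size of the test
print(round(rejection_probability(4.9, 20.0, 16), 3))   # rejection probability if sigma = 20
```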
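Then we apply Theorem 5 (since both µ and σ are finite), which tells us that $\frac{\sum_{i=1}^{n} X_i - np}{\sqrt{np(1-p)}}$ is
approximately standard normally distributed (for n large enough).
By dividing both numerator and denominator by n, it follows that $\frac{\frac{1}{n}\sum_{i=1}^{n} X_i - p}{\sqrt{p(1-p)/n}}$ is also
approximately standard normally distributed. As our final step, we need to apply Theorem 3 with
$b = \sqrt{p(1-p)/n}$ and $a = p$, such that we get that
$b \cdot \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - p}{\sqrt{p(1-p)/n}} + p = \frac{1}{n}\sum_{i=1}^{n} X_i = \hat{P}$ is
approximately normally distributed with expected value $b \cdot 0 + a = p$ and variance $b^2 \cdot 1 = p(1-p)/n$.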
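b $P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{pq/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha$, with $q = 1 - p$.
By rewriting the above, we obtain:
$P\left(\hat{P} - z_{\alpha/2}\sqrt{pq/n} \le p \le \hat{P} + z_{\alpha/2}\sqrt{pq/n}\right) \approx 1 - \alpha$
Now, we replace p and q in the limits by their (unbiased) estimators $\hat{P}$ and $(1 - \hat{P})$, so that the
requested confidence interval becomes:
$\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}},\; \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}\right)$, where $z_{\alpha/2} = 2.576$ for a 99%-confidence interval.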
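The interval formula translates directly into a few lines of code. The sketch below is a minimal illustration. The counts in the usage line (120 successes out of 400) are made up for the example and do not come from the exam.

```python
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.99):
    """Two-sided interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * (p_hat * (1.0 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 successes in a sample of 400.
low, high = proportion_confidence_interval(successes=120, n=400)
print(round(low, 3), round(high, 3))
```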
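c Margin of error in the confidence interval above is $2.576\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, so we look for those values of n
such that $2.576\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le 0.02$. However, this expression contains the unknown $\hat{p}(1 - \hat{p})$, so it can’t
be used straight away. Since we know that $\hat{p}(1 - \hat{p}) \le 1/4$, we know that after replacing
$\hat{p}(1 - \hat{p})$ by 1/4, we will always be on the safe side. Thus:
$2.576\sqrt{\frac{1/4}{n}} \le 0.02$
$\Rightarrow 1/\sqrt{n} \le \frac{0.02}{2.576 \cdot \frac{1}{2}}$
$\Rightarrow \sqrt{n} \ge \frac{2.576 \cdot \frac{1}{2}}{0.02} = 64.4$
$\Rightarrow n \ge 64.4^2 = 4147.4 \Rightarrow n \ge 4148$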
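The worst-case sample size calculation can be wrapped in a small helper. The sketch below uses the table value z = 2.576 from the solution and the bound 1/4 on p(1 − p). The function name and default arguments are illustrative assumptions.

```python
import math

def required_sample_size(max_margin, z=2.576, worst_case_variance=0.25):
    """Smallest n with z * sqrt(worst_case_variance / n) <= max_margin."""
    return math.ceil((z / max_margin) ** 2 * worst_case_variance)

print(required_sample_size(0.02))
```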
d Let us write $p_f$ for the proportion of people voting for the VPP within the female population, and $\hat{p}_f$
for the sample proportion of people indicating they will vote for the VPP within the sample of 476
females. For the male population and sample of 524 voters, we will use the symbols $p_m$ and $\hat{p}_m$
respectively.
Now we will test $H_0: p_f = p_m$ versus $H_1: p_f > p_m$.
To find the correct test statistic, recall that we know from part a:
$\hat{P}_f \overset{\text{approx}}{\sim} N(p_f,\; p_f(1 - p_f)/476)$ and $\hat{P}_m \overset{\text{approx}}{\sim} N(p_m,\; p_m(1 - p_m)/524)$
Thus, we also know that $\hat{P}_f - \hat{P}_m \overset{\text{approx}}{\sim} N\left(p_f - p_m,\; \frac{p_f(1 - p_f)}{476} + \frac{p_m(1 - p_m)}{524}\right)$.
Under $H_0$ this becomes: $\hat{P}_f - \hat{P}_m \overset{\text{approx}}{\sim} N\left(0,\; \frac{p(1 - p)}{476} + \frac{p(1 - p)}{524}\right)$, where p is the common population
proportion: $p = p_m = p_f$. Since this distribution still depends on an unknown parameter, it can’t be used
directly as a test statistic. But we can estimate the value of p by the proportion of all people in the two
samples combined who vote for the VPP, thus by $\hat{P}_p = \frac{476\,\hat{P}_f + 524\,\hat{P}_m}{1000}$. Thus, we obtain the test
statistic: $Z = \frac{\hat{P}_f - \hat{P}_m}{\sqrt{\frac{\hat{P}_p(1 - \hat{P}_p)}{476} + \frac{\hat{P}_p(1 - \hat{P}_p)}{524}}} \overset{\text{approx}}{\sim} N(0, 1)$.
The rejection region for this one-sided test is simply: z ≥ 1.645 (since α = 0.05).
This completes the setup of the test; no information has been given about the actually observed sample
proportions, so we can’t perform the test.
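To show how the test would be carried out once data are available, the sketch below implements the pooled test statistic. The counts in the usage lines (100 of 476 women and 90 of 524 men) are invented purely for illustration, since the exam gives no observed proportions. The function name is also hypothetical.

```python
from statistics import NormalDist

def pooled_two_proportion_z(x_f, n_f, x_m, n_m):
    """Z statistic for H0: p_f = p_m, using the pooled proportion in the variance."""
    p_hat_f, p_hat_m = x_f / n_f, x_m / n_m
    p_pooled = (x_f + x_m) / (n_f + n_m)   # equals (n_f*p_hat_f + n_m*p_hat_m) / (n_f + n_m)
    standard_error = (p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m)) ** 0.5
    return (p_hat_f - p_hat_m) / standard_error

# Hypothetical counts of VPP voters in the two samples.
z = pooled_two_proportion_z(x_f=100, n_f=476, x_m=90, n_m=524)
critical = NormalDist().inv_cdf(0.95)      # one-sided test at alpha = 0.05
print(round(z, 3), round(critical, 3), z >= critical)
```

With these made-up counts the statistic falls below the critical value, so $H_0$ would not be rejected. Different counts would of course lead to a different conclusion.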