PTS2 Exam 2017 With Solutions


Economics and Business

Academic Year 2016-2017

Exam: Probability Theory and Statistics 2, 6011P0173Y


Kansrekening en Statistiek 2, 6011P0100Y
Date and time of the exam: 1 June 2017, 13:00 – 16:00

Duration of the exam: 3 hours

Identification:
You must identify yourself using your certificate of registration (UvA identification card) and a valid proof
of identity (passport, ID card) with a clearly recognizable photograph. If you cannot identify yourself, access to
the exam may be denied.
If you are not correctly registered via SIS for the course component, your exam will not be marked and
registered.
Please write your name and student number on every sheet of paper you hand in.

Warning against fraud/cheating:


Students who are caught committing any form of fraud/cheating will be punished.
Make sure that your mobile phone is switched off and put away in your briefcase/bag. The same applies to
other audio equipment, headphones, digital watches (e.g. smartwatches) and other electronic devices. Your
briefcase/bag must be closed and placed on the floor beside your table.

Tools allowed:
pencil (not red!), pen (not red!), eraser, ruler, (non-graphic) calculator.

Exam results and exam inspection:


The results of this exam will be published within 18 working days following the date of the exam by the
Student Administration Economics and Business via SIS.
The date for the inspection of the exam will be communicated on Blackboard.
You are allowed to make copies of the assessed work and the elaborations/solutions, at cost price.

Specific information on this exam: This exam consists of 5 questions on 2 pages. The maximum score is
100 points. Always motivate your answers and show your calculations. The final mark for the course will be
determined as follows:
0.65*(points in this exam) + 0.25*(midterm score) + 0.10*(average score for computer assignments).
The answers will be posted on Blackboard.

SUCCESS!
You may have to use (some of) the following theorems.

Theorem 1.
If Z ~ N(0, 1), then Z² ~ χ²(1).

Theorem 2.
Let X_1, X_2, ..., X_n be a sample from a normal distribution with X_i ~ N(µ, σ²). Then
(n − 1)·S² / σ² ~ χ²(n − 1), where S² denotes the sample variance as usual.

Theorem 3.

If V ~ N(µ_V, σ_V²), then bV + a ~ N(b·µ_V + a, b²·σ_V²) (for b ≠ 0).

Theorem 4.
If P(X = 1) = p and P(X = 0) = 1 − p, then E(X) = p and Var(X) = p(1 − p).

Theorem 5.
(Central Limit Theorem, informally formulated) Let X_1, ..., X_n be a sample from a
distribution with finite standard deviation σ and mean µ. Then, for n 'sufficiently large',
(Σ_{i=1}^{n} X_i − nµ) / (σ·√n) will have approximately a standard normal distribution.

Theorem 6.
If X ~ Bin(n, p), then P(X = x) = (n choose x) · p^x · (1 − p)^(n − x).

Theorem 7.
If X ~ Unif(a, b), then f_X(x) = 1/(b − a) for a < x < b, E(X) = (a + b)/2 and Var(X) = (b − a)²/12.

Theorem 8.
If V ~ χ²(m) and W ~ χ²(n), and V and W are mutually independent, then V + W ~ χ²(n + m).

Power = onderscheidingsvermogen
Level of significance = onbetrouwbaarheidsdrempel
Unbiased estimator = zuivere schatter
Test statistic = toetsingsgrootheid
Rejection region = kritieke gebied
1 The joint pdf of the random variables X and Y is given by (where a and c are positive constants):

f_{X,Y}(x, y) = c·e^(−xy)   for 1 < x < a and 0 < y < ∞
              = 0           elsewhere

a [3] Are X and Y independent? Why yes/no?


b [6] Determine the pdf of Y given X = x. Also, express c as a function of a.

2 Consider the following joint pdf of the random variables X and Y.

f_{X,Y}(x, y) = 1/y   for 0 < x < y < 1
              = 0     elsewhere

a [6] Define V = X / Y. Determine the pdf of V using the CDF-method.

b [6] Define W = X + 2Y. Determine the CDF of W. It is sufficient here to just write down the correct
integrals with the correct integration limits.

3 Consider two independent random variables X and Y, both with a uniform distribution on the interval
(−1, 1). Define V = min(X, Y) and W = max(X, Y).
a [6] Determine the pdf of V.

b [3] Explain (without any calculations) why the variance of V must, in this case, be equal to the variance of
W.

c [8] Determine the covariance and the coefficient of correlation of V and W. Hint: use the fact that: X + Y =
V + W in order to determine the variance of V + W very simply. Next, use a well-known equation
which relates the variance of V + W to the covariance of V and W . In case you do not manage to find
the variance of V, then use Var(V) = 2/9.

4 Let X_1, X_2, ..., X_16 be a sample of size 16 from a normal distribution with X_i ~ N(µ, σ²).
The sample outcomes are Σ_{i=1}^{16} x_i = 96 and Σ_{i=1}^{16} x_i² = 3324.

Please note that all seven parts of this question can be answered independently from any other part.

a [3] Show that the sample variance is equal to 183.2.

b [7] We would like to test the following hypotheses: H0: σ = 10 versus H1: σ > 10.


Specify the appropriate test statistic (with distribution) for this test, and determine the p-value (given
the sample outcomes above). Will the null hypothesis be rejected at α = 0.05? Formulate the conclusion.
For the remaining parts (c to g) of this question, assume that extra information becomes
available, which tells us that the population mean μ is actually equal to 0.

c [4] Prove that (1/n)·Σ_{i=1}^{n} X_i² is now an unbiased estimator for the population variance σ².

d [5] In what way should the test statistic in part b be modified as a result of the extra information about μ
(while still using the chi-square distribution)? Again determine the p-value (given the same sample
outcomes as above). Explain why this test is now preferable over the test in part b.

e [4] Determine the probability that the sample mean will be larger than 6, first assuming that
σ = 10 and then again under the assumption that σ = 20 .
f [8] For the hypothesis test H0: σ = 10 versus H1: σ > 10, a chi-square distributed test statistic is the usual
choice. However, in this particular case (since µ is known), it is possible to choose X̄ ~ N(µ, σ²/n) as a
suitable test statistic for the very same hypotheses. Explain why this is a proper test statistic in this case.
Next, argue why the rejection region for X̄ at α = 0.05 is now two-sided (even though the alternative
hypothesis is one-sided), and should consist of all values such that |x̄| ≥ 4.9.
(Note that we have NOT discussed such a situation in class; you are requested to use your
understanding of hypothesis testing to answer this part!!).

g [4] Determine the power of the test as described in part f, in case the true value for the population standard
deviation is equal to 20.

5 Let X_1, ..., X_n be a sample from a distribution with X_i ~ Bin(1, p). Define the sample proportion as
P̂ = (1/n)·Σ_{i=1}^{n} X_i.

a [8] Prove that P̂ (in case n is large enough) is approximately normally distributed with expected value p
and variance p(1 – p) / n. You may only use the theorems as shown on the first page of this exam.
Indicate clearly when and how you use each theorem. Do not skip any steps, even if they might seem
trivial to you.

b [7] Derive the formula for a two-sided 99%-confidence interval for p, starting from what is given in part a.

c [4] Determine the sample size needed to ensure that the margin of error for the 99%-confidence interval for
p is never more than 0.02.

d [8] It is often said that the political party VPP of Greet Tammers is more popular among female than
among male voters. In an attempt to confirm this, two random samples from the population of eligible
voters are taken; the first of 476 female voters, and the second consisting of 524 male voters.
Develop a hypothesis test for this situation, when the chosen level of significance is 0.05. Formulate
the hypotheses, the test statistic with its distribution, and the rejection region for the test statistic. Make
sure that you define all symbols you use. Hint: determine first the distribution of the difference of the
two sample proportions under H0. Note that this distribution still depends on an unknown parameter
(even under H0), so it should be replaced by a suitable estimate before a usable test statistic results.
Solutions

1a No, they are not, since f_{X,Y}(x, y) ≠ f_X(x)·f_Y(y) for all x and y. This can be seen because the pdf
cannot be factorized into a function of x only and a function of y only.
Unfortunately, many students apparently thought that e^(−xy) = e^(−x)·e^(−y), which is of course not true.
b f_X(x) = ∫_0^∞ f_{X,Y}(x, y) dy = ∫_0^∞ c·e^(−xy) dy = c·[ −(1/x)·e^(−xy) ]_0^∞ = c/x   for 1 < x < a.

⇒ f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x) = c·e^(−xy) / (c/x) = x·e^(−xy)   for 0 < y, and only defined for 1 < x < a.

For f_{X,Y}(x, y) to be a proper pdf, we must have ∫_1^a ∫_0^∞ c·e^(−xy) dy dx = 1, or ∫_1^a f_X(x) dx = ∫_1^a (c/x) dx = 1
⇒ c·ln a = 1 ⇒ c = 1/ln a
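(Not part of the exam solution: the result c = 1/ln a can be checked numerically. The sketch below assumes Python with numpy/scipy is available and uses an arbitrary value a = 3 purely for illustration.)

```python
import numpy as np
from scipy import integrate

a = 3.0                      # arbitrary illustrative value (any a > 1 works)
c = 1.0 / np.log(a)

# Total mass of f_{X,Y}(x, y) = c * exp(-x*y) over 1 < x < a, 0 < y < infinity.
total, _ = integrate.dblquad(lambda y, x: c * np.exp(-x * y),
                             1.0, a,        # limits for the outer variable x
                             0.0, np.inf)   # limits for the inner variable y
print(total)                 # ~ 1.0, confirming c = 1/ln(a)

# The conditional pdf f_{Y|X}(y | x) = x * exp(-x*y) should also integrate to 1 in y.
x0 = 2.0
mass, _ = integrate.quad(lambda y: x0 * np.exp(-x0 * y), 0.0, np.inf)
print(mass)                  # ~ 1.0
```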

2 The support of X and Y can be drawn as below:
[Sketch: the triangular support 0 < x < y < 1 in the (x, y)-plane, with the line x = vy drawn through it.]
a When looking at the support of X and Y, it follows immediately that V = X/Y can only assume values
between 0 and 1.

F_V(v) = P(V ≤ v) = P(X/Y ≤ v) = P(X ≤ vY) = ∫_0^1 ∫_0^{yv} (1/y) dx dy = ∫_0^1 v dy = v

⇒ f_V(v) = dF_V(v)/dv = 1   for 0 < v < 1

In case the reverse order of integration was used, the first integral should be: F_V(v) = ∫_0^v ∫_{x/v}^1 (1/y) dy dx
(this results in a slightly more difficult integration step; of course with the same answer)
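(Optional check, not required on the exam: the derivation can be verified by simulation. The sketch below assumes numpy/scipy and samples (X, Y) via Y ~ Unif(0, 1) and X | Y = y ~ Unif(0, y), which matches the joint pdf 1/y on 0 < x < y < 1.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000
y = rng.uniform(0.0, 1.0, size=n)        # the marginal of Y is Unif(0, 1) here
x = rng.uniform(0.0, 1.0, size=n) * y    # X | Y = y is Unif(0, y)
v = x / y

# Kolmogorov-Smirnov test against Unif(0, 1): a large p-value is consistent
# with the derived result f_V(v) = 1 on (0, 1).
print(stats.kstest(v, "uniform"))
```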
b F_W(w) = P(W ≤ w) = P(X + 2Y ≤ w)
It can be seen simply that W can attain values between 0 and 3. Drawing the lines x + 2y = w for different
values of w shows that the area of integration has a different form for w < 2 than for w > 2.

For 0 < w < 2:
Determine the intersection of the lines x + 2y = w and x = y: x = y = w/3.
F_W(w) = ∫_0^{w/3} ∫_0^{y} (1/y) dx dy + ∫_{w/3}^{w/2} ∫_0^{w−2y} (1/y) dx dy
or, with reversed order of integration: F_W(w) = ∫_0^{w/3} ∫_{x}^{(w−x)/2} (1/y) dy dx

And for 2 < w < 3:
F_W(w) = ∫_0^{w/3} ∫_0^{y} (1/y) dx dy + ∫_{w/3}^{1} ∫_0^{w−2y} (1/y) dx dy
or, with reversed order of integration: F_W(w) = ∫_0^{w−2} ∫_{x}^{1} (1/y) dy dx + ∫_{w−2}^{w/3} ∫_{x}^{(w−x)/2} (1/y) dy dx
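(Again an optional numerical check: the sketch below, assuming numpy/scipy, evaluates the iterated integrals above for one value of w in each regime and compares them with a Monte Carlo estimate of P(X + 2Y ≤ w).)

```python
import numpy as np
from scipy import integrate

def F_W(w):
    """CDF of W = X + 2Y, evaluated from the iterated integrals above (dy outer, dx inner)."""
    integrand = lambda x, y: 1.0 / y                 # first argument = inner variable x
    part1, _ = integrate.dblquad(integrand, 0, w / 3, 0, lambda y: y)
    upper = w / 2 if w <= 2 else 1.0                 # upper y-limit differs per regime
    part2, _ = integrate.dblquad(integrand, w / 3, upper, 0, lambda y: w - 2 * y)
    return part1 + part2

# Monte Carlo estimate of P(X + 2Y <= w), sampling (X, Y) as in part a.
rng = np.random.default_rng(1)
y = rng.uniform(size=500_000)
x = rng.uniform(size=500_000) * y
w_sim = x + 2 * y

for w in (1.0, 2.5):                                 # one w below 2, one above 2
    print(w, F_W(w), np.mean(w_sim <= w))            # the two values should agree closely
```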

3a So for v ∈ (−1, 1):

F_V(v) = P(V ≤ v) = 1 − P(V > v) = 1 − P(X > v, Y > v) = 1 − P(X > v)·P(Y > v) = 1 − (1 − F_X(v))².

Since 1 − F_X(v) = 1 − ∫_{−1}^{v} ½ dx = ∫_{v}^{1} ½ dx = (1 − v)/2, we get: F_V(v) = 1 − (½ − v/2)².

(Students who used an incorrect CDF of X should have found the error by performing the check:
F_V(v) = 0 for v = −1, and F_V(v) = 1 for v = 1.)

Now, f_V(v) = dF_V(v)/dv = ½ − v/2   for −1 < v < 1

b This can be explained by symmetry. A complete argument could be given as follows: the distributions of
both X and Y are symmetric around 0, so the distributions of −X = (−1)·X and of −Y are the same as those
of X and Y, and hence min(−X, −Y) has the same distribution as V = min(X, Y).
Also, since W = max(X, Y) = −min(−X, −Y), W has the same distribution as −V, so that
Var(W) = Var(−V) = Var(V): the variances of V and W must be equal to each other.

c Since X + Y = V + W, it follows that Var(X + Y) = Var(V + W).

Because of the independence of X and Y: Var(X + Y) = 2·Var(X) = 2·(4/12) = 2/3 (see Theorem 7).

This has to be equal to:

Var(V + W) = Var(V) + Var(W) + 2·Cov(V, W)

Var(V) = E(V²) − (E(V))²
E(V²) = ∫_{−1}^{1} v²·(½ − v/2) dv = [ v³/6 − v⁴/8 ]_{−1}^{1} = 1/6 + 1/6 = 1/3
E(V) = ∫_{−1}^{1} v·(½ − v/2) dv = [ v²/4 − v³/6 ]_{−1}^{1} = −1/6 − 1/6 = −1/3

⇒ Var(V) = 1/3 − 1/9 = 2/9

Because of part b, we also know now that Var(W) = 2/9.

Entering the above into the equation Var(V + W) = Var(V) + Var(W) + 2·Cov(V, W), it follows that
2/3 = 2/9 + 2/9 + 2·Cov(V, W), and thus Cov(V, W) = 1/9, and thus

ρ(V, W) = (1/9) / √((2/9)·(2/9)) = 1/2
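(Optional simulation check of Cov(V, W) = 1/9 and ρ(V, W) = 1/2, assuming numpy is available; not part of the required solution.)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.uniform(-1.0, 1.0, size=n)
v = np.minimum(x, y)
w = np.maximum(x, y)

print(np.cov(v, w)[0, 1])        # ~ 1/9 ≈ 0.111
print(np.corrcoef(v, w)[0, 1])   # ~ 0.5
```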

4a s² = (1/15)·Σ_{i=1}^{16} (x_i − x̄)² = (1/15)·[ Σ_{i=1}^{16} x_i² − 16·x̄² ] = (1/15)·[ 3324 − 16·(96/16)² ] = 183.2
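(The same computation can be reproduced in a few lines of Python; this is only an informal check, not exam material.)

```python
n, sum_x, sum_x2 = 16, 96.0, 3324.0
xbar = sum_x / n                          # sample mean = 6.0
s2 = (sum_x2 - n * xbar**2) / (n - 1)     # sample variance
print(xbar, s2)                           # 6.0 183.2
```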

(n  1) S 2 15S 2
b Test statistic with distribution under H 0 :  ~ 2df 15 .
 2
100
15 183.2
The outcome for the test statistic is then  27.48
100
p-value = P(15 2
 27.48) , which from the supplied table can be found to be just a little larger than
0.025. Since this value is less than the value for the significance level, we reject H 0 . As a result, the
conclusion is that sufficient proof exists to state that the population standard deviation is larger than 10
(at α = 0.05).
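(For reference, the p-value can also be computed exactly with scipy instead of the chi-square table; this is only a check of the answer above.)

```python
from scipy import stats

chi2_obs = 15 * 183.2 / 100               # observed value of (n-1)S^2 / sigma_0^2
p_value = stats.chi2.sf(chi2_obs, df=15)  # right-tail probability
print(chi2_obs, p_value)                  # 27.48, p ≈ 0.025 (just above 0.025)
```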

c E[ (1/n)·Σ X_i² ] = (1/n)·Σ E(X_i²) = (1/n)·n·E(X²) = E(X²) = Var(X) + (E(X))² = σ² + 0² = σ².

d Test statistic with its distribution under H0: Σ X_i²/σ₀² = Σ X_i²/100 ~ χ²(df = 16).

(Note that in this situation no degree of freedom has been lost for the estimation of µ! In case you did not
remember this result, it can be derived easily from Theorems 1 and 8 above. This situation was also part of
problem 7.15.)

The outcome of the test statistic is then 3324/100 = 33.24.

p-value = P(χ²(16) ≥ 33.24), which must be between 0.005 and 0.01.

The reason that this test is preferable (given the knowledge that µ = 0) over the test in part b is simply
because now more information has become available. We no longer have to use part of the information
in the sample to estimate the sample mean first. In general this will lead to an increased power of the
test.

Any answer stating that this test is preferable because it resulted in a smaller p-value is clearly and
completely incorrect!
e P(X̄ > 6 | σ = 10, µ = 0) = P( X̄/(σ/√n) > 2.4 ) = 1 − P(Z ≤ 2.4) = 0.0082
P(X̄ > 6 | σ = 20, µ = 0) = P( X̄/(σ/√n) > 1.2 ) = 1 − P(Z ≤ 1.2) = 0.1151
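(Check of the two tail probabilities with scipy instead of the normal table; not part of the required solution.)

```python
from scipy import stats

n = 16
for sigma in (10, 20):
    z = (6 - 0) / (sigma / n**0.5)        # standardize the sample mean under mu = 0
    print(sigma, stats.norm.sf(z))        # sigma=10: ~0.0082, sigma=20: ~0.1151
```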

f X ~ N(, 2 n) is a possible choice for a test statistic for testing H0 :   10 versus H1 :   10 in


case µ = 0, because: its distribution is known under the null hypothesis, and it is relevant for the
decision whether to reject the null hypothesis or not. This latter point is illustrated by the answers in
part e: for smaller values of σ, the distribution of the sample mean will be more concentrated around µ
= 0. So a large positive value for the sample mean, but also a large negative value will be more likely
under the alternative hypothesis as compared to under the null hypothesis. This explains why the
rejection region is now two-sided.
(We should always ask ourselves which values for the test statistic are indicative for the alternative
hypothesis. An answer like: “the normal distribution is symmetrical, while the chi-square distribution is
not”, is not part of a correct answer. This can be seen by the following argument: suppose we would
like to test σ two-sided, then the rejection region for the sample mean actually consists of three different
areas: for rather negative values, for rather large values, but also for values around 0!

The rejection region for the sample mean should now be such that, under H0, P(X̄ in rejection region) = 0.05.
So we need to find a such that P(|X̄| ≥ a) = 0.05, or P(X̄ ≥ a) = 0.025.
Since under H0: Z = X̄/(σ/√n) = X̄/(10/√16) = 0.4·X̄ ~ N(0, 1), we can see that
P(X̄ ≥ a) = P(0.4·X̄ ≥ 0.4a) = P(Z ≥ 0.4a) = 0.025, which shows that 0.4a = 1.96, or that
a = 1.96·2.5 = 4.9 ⇒ P(|X̄| ≥ 4.9) = 0.05.

Power( =20 ) = P(H0 rejected | =20)  P(| X |  4.9 | =20)


g
 2 * P( X  4.9 | =20)  2 * P( X ( 16))  4.9 5 | =20)  2 * P( Z  0.98)  0.327
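(A scipy check of this power calculation, using the rejection region |x̄| ≥ 4.9 from part f.)

```python
from scipy import stats

sigma, n, a = 20, 16, 4.9
power = 2 * stats.norm.sf(a / (sigma / n**0.5))   # 2 * P(Z >= 0.98)
print(power)                                      # ≈ 0.327
```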

5a Because X_i ~ Bin(1, p), we can apply Theorem 6 to find that P(X_i = 1) = p and P(X_i = 0) = 1 − p.

Now we can apply Theorem 4 directly, which gives:
E(X_i) = p (= µ) and Var(X_i) = p(1 − p) (= σ²).

Then we apply Theorem 5 (since both µ and σ are finite), which tells us that

( Σ_{i=1}^{n} X_i − np ) / √(n·p(1 − p))

is approximately standard normally distributed (for n large enough).

By dividing both numerator and denominator by n, it follows that

( (1/n)·Σ_{i=1}^{n} X_i − p ) / √(p(1 − p)/n)

is also approximately standard normally distributed. As our final step, we apply Theorem 3 with
b = √(p(1 − p)/n) and a = p, such that we get that

b · ( (1/n)·Σ_{i=1}^{n} X_i − p ) / √(p(1 − p)/n) + a = (1/n)·Σ_{i=1}^{n} X_i = P̂

is approximately normally distributed with expected value b·0 + a = p and variance b²·1 = p(1 − p)/n.
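(Informal simulation of the statement just proved, assuming numpy; the values n = 400 and p = 0.3 are arbitrary illustrations, not exam data.)

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 400, 0.3, 100_000                  # arbitrary illustrative values
p_hat = rng.binomial(n, p, size=reps) / n       # many realizations of the sample proportion

print(p_hat.mean(), p)                          # empirical mean vs p
print(p_hat.var(), p * (1 - p) / n)             # empirical variance vs p(1-p)/n
```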

b P( −z_{α/2} ≤ (P̂ − p)/√(pq/n) ≤ z_{α/2} ) = 1 − α

By rewriting the above, we obtain:

P( P̂ − z_{α/2}·√(pq/n) ≤ p ≤ P̂ + z_{α/2}·√(pq/n) ) = 1 − α

Now we replace p and q in the limits by their (unbiased) estimators P̂ and (1 − P̂), so that the
requested confidence interval becomes:

[ P̂ − z_{α/2}·√(P̂(1 − P̂)/n) ,  P̂ + z_{α/2}·√(P̂(1 − P̂)/n) ].

In this case, α = 0.01, so z_{α/2} = 2.576.
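(A small helper implementing this interval, assuming scipy; the numbers in the usage line are made up for illustration.)

```python
from scipy import stats

def proportion_ci(p_hat, n, conf=0.99):
    """Two-sided normal-approximation confidence interval for p."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)            # 2.576 for conf = 0.99
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

print(proportion_ci(0.45, 1000))                      # made-up numbers, illustration only
```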

c The margin of error in the confidence interval above is 2.576·√(p̂(1 − p̂)/n), so we look for those values of n
such that 2.576·√(p̂(1 − p̂)/n) ≤ 0.02. However, this expression contains the unknown p̂(1 − p̂), so it can't
be used straight away. But since we know that p̂(1 − p̂) ≤ 1/4, we know that after replacing
p̂(1 − p̂) by 1/4, we will always be on the safe side. Thus:

2.576·√(1/(4n)) ≤ 0.02
⇒ 1/√n ≤ 0.02 / (2.576·½)
⇒ √n ≥ (2.576·½) / 0.02 = 64.4
⇒ n ≥ 64.4² = 4147.4 ⇒ n ≥ 4148

d Let us write p_f for the proportion of people voting for the VPP within the female population, and p̂_f
for the sample proportion of people indicating they will vote for the VPP within the sample of 476
females. For the male population and the sample of 524 voters, we will use the symbols p_m and p̂_m
respectively.
Now we will test H0: p_f = p_m versus H1: p_f > p_m.

To find the correct test statistic, recall that we know from part a:

P̂_f ~ approx. N( p_f, p_f(1 − p_f)/476 )  and  P̂_m ~ approx. N( p_m, p_m(1 − p_m)/524 )

Thus, we also know that P̂_f − P̂_m ~ approx. N( p_f − p_m, p_f(1 − p_f)/476 + p_m(1 − p_m)/524 ).

Under H0 this becomes: P̂_f − P̂_m ~ approx. N( 0, p(1 − p)/476 + p(1 − p)/524 ), where p is the common population
proportion: p = p_m = p_f. Since this distribution still depends on an unknown parameter, it can't be used
directly as a test statistic. But we can estimate the value of p by the proportion of all people in the two
samples combined who vote for the VPP, thus by P̂_p = (476·P̂_f + 524·P̂_m)/1000. Thus, we obtain the test
statistic:

Z = (P̂_f − P̂_m) / √( P̂_p(1 − P̂_p)/476 + P̂_p(1 − P̂_p)/524 ) ~ approx. N(0, 1).
The rejection region for this one-sided test is simply: z ≥ 1.645 (since α = 0.05).

This completes the setup of the test; no information has been given about the actually observed sample
proportions, so we can’t perform the test.
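(Below is a sketch, assuming scipy, of how the test could be carried out once sample counts are available. The counts x_f and x_m are hypothetical, since the exam gives no observed data.)

```python
from scipy import stats

n_f, n_m = 476, 524
x_f, x_m = 150, 140                      # hypothetical numbers of VPP voters (no data given)

p_f_hat, p_m_hat = x_f / n_f, x_m / n_m
p_pool = (x_f + x_m) / (n_f + n_m)       # pooled estimate of the common p under H0
se = (p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m)) ** 0.5
z = (p_f_hat - p_m_hat) / se

print(z, z >= stats.norm.ppf(0.95))      # reject H0 at alpha = 0.05 if z >= 1.645
```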
