CS1A (2)

The document provides indicative solutions for the CS1-Actuarial Statistics exam conducted by the Institute of Actuaries of India in May 2023. It includes detailed solutions for various problems, highlighting correct answers and methodologies used to arrive at those answers. The solutions cover topics such as moment generating functions, probability distributions, and statistical inference methods.

Institute of Actuaries of India

Subject CS1-Actuarial Statistics (Paper A)

May 2023 Examination

INDICATIVE SOLUTION

Introduction

The indicative solution has been written by the Examiners with the aim of helping candidates. The solutions
given are only indicative. It is recognized that other points could also be valid answers, and the examiners have given
credit for any alternative approach or interpretation which they consider to be reasonable.
IAI CS1A-0523

Solution 1:
i) Correct Answer is Option C

For any random variable X, the MGF at t = 0 must equal 1, since $M_X(0) = E(e^{0 \cdot X}) = E(1) = 1$. Option C
returns a value of 2/3 at t = 0, so it cannot be a valid MGF. [1]

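ii) $M_X(t) = E(e^{tX}) = \int_a^{\infty} e^{tx}\,\lambda e^{-\lambda(x-a)}\,dx$ [1]

$= \int_a^{\infty} e^{\lambda a}\,\lambda e^{-x(\lambda-t)}\,dx$

$= e^{\lambda a}\,\lambda \left[\frac{e^{-x(\lambda-t)}}{-(\lambda-t)}\right]_a^{\infty}$ [1]
$= \frac{\lambda}{\lambda-t}\,e^{at}$, provided $t < \lambda$
$= \left(1-\frac{t}{\lambda}\right)^{-1} e^{at}$ [1]
[3]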
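As a numerical sanity check on the closed form, the sketch below integrates $e^{tx}\lambda e^{-\lambda(x-a)}$ over x > a with Simpson's rule and compares the result with $(1 - t/\lambda)^{-1}e^{at}$. It assumes the parameter values λ = 0.10 and a = 0.05 used in part (v) and an arbitrary t = 0.04 < λ; the helper names are illustrative only.

```python
import math

def mgf_numeric(t, lam, a, upper=1000.0, steps=200_000):
    """Approximate E(exp(tX)) for the density lam*exp(-lam*(x - a)), x > a,
    with composite Simpson's rule on [a, a + upper]; the neglected tail is
    negligible when t < lam and upper is large. steps must be even."""
    h = upper / steps

    def integrand(x):
        return math.exp(t * x) * lam * math.exp(-lam * (x - a))

    total = integrand(a) + integrand(a + upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

def mgf_closed_form(t, lam, a):
    """(1 - t/lam)^(-1) * exp(a*t), valid for t < lam."""
    return math.exp(a * t) / (1 - t / lam)

lam, a, t = 0.10, 0.05, 0.04
print(mgf_numeric(t, lam, a), mgf_closed_form(t, lam, a))
```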

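iii) $M_X(t) = \left(1-\frac{t}{\lambda}\right)^{-1} e^{at}$

$M_X'(t) = a\left(1-\frac{t}{\lambda}\right)^{-1} e^{at} + \frac{1}{\lambda}\left(1-\frac{t}{\lambda}\right)^{-2} e^{at}$ [1]
$E(X) = M_X'(0) = a + \frac{1}{\lambda}$ [1]
[2]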
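A quick finite-difference check of $E(X) = M_X'(0) = a + 1/\lambda$, assuming the same illustrative values λ = 0.10 and a = 0.05; the step size h is an arbitrary choice, not part of the solution.

```python
import math

def mgf(t, lam, a):
    # Closed-form MGF from part (ii); valid for t < lam
    return math.exp(a * t) / (1 - t / lam)

def mean_from_mgf(lam, a, h=1e-5):
    # Central-difference approximation to M'(0)
    return (mgf(h, lam, a) - mgf(-h, lam, a)) / (2 * h)

lam, a = 0.10, 0.05
print(mean_from_mgf(lam, a), a + 1 / lam)
```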

iv) Correct Answer is Option B

The CDF is obtained by integrating the pdf from a to x:
$F_X(x) = \int_a^x \lambda e^{-\lambda(t-a)}\,dt = -e^{\lambda a}\left[e^{-\lambda t}\right]_a^x$
$= -e^{\lambda a}\left(e^{-\lambda x} - e^{-\lambda a}\right)$
$= e^{\lambda a}\left(e^{-\lambda a} - e^{-\lambda x}\right)$ [1]

v) $F_X(x) = u = e^{\lambda a}\left(e^{-\lambda a} - e^{-\lambda x}\right)$

$u\,e^{-\lambda a} - e^{-\lambda a} = -e^{-\lambda x}$
$e^{-\lambda x} = e^{-\lambda a} - u\,e^{-\lambda a}$ [1]

Taking natural logarithms of both sides and substituting λ = 0.10 and a = 0.05:
$-0.10\,x = \ln\left(e^{-0.10 \times 0.05} - u\,e^{-0.10 \times 0.05}\right)$
$x = -10 \ln\left((1-u)\,e^{-0.10 \times 0.05}\right)$
$x = -10 \ln\left((1-u) \times 0.995012\right)$ [1]

Using the two values given in the question:

For u = 0.226: $x = -10 \ln\left((1-0.226) \times 0.995012\right) = 2.61183$
For u = 0.304: $x = -10 \ln\left((1-0.304) \times 0.995012\right) = 3.67406$ [1]
[3]
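The inverse-transform step can be sketched as follows, assuming λ = 0.10 and a = 0.05 as above. The expression a − ln(1 − u)/λ is algebraically the same as −10 ln((1 − u) × 0.995012) for these values; the function name is illustrative.

```python
import math

LAM, A = 0.10, 0.05  # parameter values used in part (v)

def inverse_cdf(u, lam=LAM, a=A):
    """Invert F(x) = 1 - exp(-lam*(x - a)) for x > a: x = a - ln(1 - u)/lam."""
    return a - math.log(1 - u) / lam

for u in (0.226, 0.304):
    print(u, round(inverse_cdf(u), 5))
```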
[10 Marks]

Solution 2:
i) Pr(Y < X / 4)
= Pr (Y < 250)
= Pr (3Y < X < ∞, 0 < Y < 250) [0.5]

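$= \int_0^{250}\left(\int_{3y}^{\infty} \frac{3}{10^6} e^{-x/1000}\,dx\right)dy$

$= \int_0^{250} \frac{3}{10^6}\left[\frac{e^{-x/1000}}{-1/1000}\right]_{3y}^{\infty} dy$ [1]

$= \int_0^{250} \frac{3}{10^3}\left(e^{-3y/1000} - e^{-\infty}\right)dy$

$= \frac{3}{10^3}\left[\frac{e^{-3y/1000}}{-3/1000}\right]_0^{250}$

$= e^0 - e^{-3(250)/1000}$
$= 1 - e^{-0.75}$ [1]
= 1 − 0.472
= 0.528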
The probability that the ratio has in fact reduced to 1:4 is 0.528. [0.5]
[3]
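A minimal numerical cross-check of the double integral, assuming the joint density $\frac{3}{10^6}e^{-x/1000}$ on 0 < 3y < x used in the integral above. Both integrals use Simpson's rule; the truncation span for the inner integral is an arbitrary choice large enough that the neglected tail is negligible.

```python
import math

def simpson(f, lo, hi, n):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def inner(y, span=30_000.0):
    # Integrate the joint density over x from 3y to 3y + span;
    # the neglected tail beyond that is of order exp(-30)
    return simpson(lambda x: 3e-6 * math.exp(-x / 1000), 3 * y, 3 * y + span, 2_000)

prob = simpson(inner, 0.0, 250.0, 200)
print(round(prob, 4), round(1 - math.exp(-0.75), 4))
```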

ii) $f_Y(y) = \int_{3y}^{\infty} \frac{3}{10^6} e^{-x/1000}\,dx$

$= \frac{3}{10^6}\left[\frac{e^{-x/1000}}{-1/1000}\right]_{3y}^{\infty}$
$= \frac{3}{1000} e^{-3y/1000}, \quad y > 0$ [2]

iii) Correct Answer is Option B


From part (ii), Y is an exponential random variable with parameter
λ = 3/1000. [1]

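iv) For X and Y to be independent, $f_X(x)\,f_Y(y) = f_{XY}(x, y)$ must hold for all x and y.

Here $f_{XY}(x, y) = \frac{3}{10^6} e^{-x/1000}$ on the region 0 < 3y < x contains no term in y, whereas the product
$f_X(x)\,f_Y(y) = \frac{3x}{10^9} e^{-(x+3y)/1000}$ (part (v)) does depend on y. The two cannot be equal, so X and Y
cannot be independent. The support 0 < 3y < x, which ties the range of y to the value of x, leads to the same conclusion. [1]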
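The failure of the product rule can be seen at a few points. This sketch assumes the joint density $\frac{3}{10^6}e^{-x/1000}$ on 0 < 3y < x; the test points are arbitrary, and the marginal of X is obtained by integrating out y over (0, x/3), where the joint density is constant in y.

```python
import math

def joint_pdf(x, y):
    # f_XY(x, y) = (3/10^6) exp(-x/1000) on 0 < 3y < x, zero elsewhere
    return 3e-6 * math.exp(-x / 1000) if 0 < 3 * y < x else 0.0

def marginal_x(x):
    # Integrating out y over (0, x/3) gives x/10^6 * exp(-x/1000)
    return (x / 3) * 3e-6 * math.exp(-x / 1000)

def marginal_y(y):
    # Marginal of Y from part (ii): exponential with rate 3/1000
    return 3 / 1000 * math.exp(-3 * y / 1000)

for x, y in [(2000, 100), (2000, 500), (600, 300)]:
    print(x, y, joint_pdf(x, y), marginal_x(x) * marginal_y(y))
```

The joint density takes the same value at (2000, 100) and (2000, 500) while the product of the marginals does not, and at (600, 300) the joint density is zero while the product is positive.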

v) Correct Answer is Option A


$f_X(x)\,f_Y(y)$
$= \frac{x}{1000^2} e^{-x/1000} \times \frac{3}{1000} e^{-3y/1000}$
$= \frac{3x}{10^9} e^{-(x+3y)/1000}$
$= g_{XY}(x, y)$, as given in Option A. [1]

vi) We have decided to use $g_{XY}(x, y)$, and under this joint density function X and Y are independent random variables.
Hence, conditioning on the event {X > 950} does not change the distribution of Y, so E(Y | X > 950) is equal to
E(Y). [1]

From part (iii), we know that Y ~ Exp(3/1000).

Hence E(Y | X > 950) = E(Y) = 1000/3 = 333.33 [1]


[2]
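A Monte Carlo sketch of the same result, assuming that under $g_{XY}$ the marginal of X, $\frac{x}{1000^2}e^{-x/1000}$, is a gamma distribution with shape 2 and scale 1000, and that Y ~ Exp(3/1000) independently. The seed and sample size are arbitrary choices.

```python
import random

random.seed(2023)
N = 200_000
# Independent draws: X ~ Gamma(shape 2, scale 1000), Y ~ Exp(rate 3/1000)
xs = [random.gammavariate(2, 1000) for _ in range(N)]
ys = [random.expovariate(3 / 1000) for _ in range(N)]

# Average Y only over the draws where X exceeds 950
selected = [y for x, y in zip(xs, ys) if x > 950]
estimate = sum(selected) / len(selected)
print(len(selected), round(estimate, 2), round(1000 / 3, 2))
```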
[10 Marks]

Solution 3:
i) Correct Answer is Option A
All other options represent statements which are not true. [1]

ii) For the exponential family, we write the density in the form:

$f(y) = \exp\left[\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right]$ [0.5]
[1]

The given discrete distribution is:

$f(y) = \binom{n}{ny}\mu^{ny}(1-\mu)^{n-ny}$
$= \exp\left(n\left(y\log\mu + (1-y)\log(1-\mu)\right) + \log\binom{n}{ny}\right)$
$= \exp\left(n\left(y\log\frac{\mu}{1-\mu} + \log(1-\mu)\right) + \log\binom{n}{ny}\right)$

This is in the form of the exponential family given above.

where:
$\theta = \log\frac{\mu}{1-\mu}$ [0.5]
$\phi = n$
$a(\phi) = 1/\phi$ [0.5]
$c(y, \phi) = \log\binom{n}{ny}$ [0.5]
Using $\theta = \log\frac{\mu}{1-\mu}$, we can show that $\mu = \frac{e^\theta}{1+e^\theta}$, so that
$b(\theta) = -\log(1-\mu) = \log(1 + e^\theta)$ [1]
[4]
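To confirm the parametrisation, the sketch below evaluates the exponential-family form with θ, b(θ), a(φ) and c(y, φ) as defined above and compares it with the binomial-proportion probability function. The values n = 5 and µ = 0.3 are arbitrary illustrations.

```python
import math

def binomial_proportion_pmf(y, n, mu):
    # P(Y = y) for Y = Z/n, Z ~ Bin(n, mu); y in {0, 1/n, ..., 1}
    k = round(n * y)
    return math.comb(n, k) * mu**k * (1 - mu) ** (n - k)

def exp_family_form(y, n, mu):
    # exp[(y*theta - b(theta))/a(phi) + c(y, phi)] with phi = n and a(phi) = 1/phi
    theta = math.log(mu / (1 - mu))
    b = math.log(1 + math.exp(theta))
    c = math.log(math.comb(n, round(n * y)))
    return math.exp((y * theta - b) * n + c)

n, mu = 5, 0.3
for k in range(n + 1):
    y = k / n
    print(y, binomial_proportion_pmf(y, n, mu), exp_family_form(y, n, mu))
```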

iii) $E(Y) = b'(\theta) = \frac{e^\theta}{1+e^\theta} = \mu$ [0.5]

$V(\mu) = b''(\theta) = \frac{e^\theta}{(1+e^\theta)^2} = \mu(1-\mu)$ [0.5]

$V(Y) = V(\mu)\,a(\phi) = \frac{\mu(1-\mu)}{n}$ [0.5]

$E(Z) = n\,E(Y) = n\mu$ [0.5]

$Var(Z) = n^2\,V(Y) = n\mu(1-\mu)$ [0.5]

This corresponds to the mean nµ and variance nµ(1 − µ) of Z ~ Bin(n, µ). [0.5]

[3]
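The mean and variance results can be checked by enumerating the probability function of Y = Z/n directly. This is a sketch with arbitrary illustrative values n = 8 and µ = 0.35.

```python
import math

def moments_of_proportion(n, mu):
    # Exact mean and variance of Y = Z/n, Z ~ Bin(n, mu), by enumerating the pmf
    pmf = [(k / n, math.comb(n, k) * mu**k * (1 - mu) ** (n - k)) for k in range(n + 1)]
    mean = sum(y * p for y, p in pmf)
    var = sum((y - mean) ** 2 * p for y, p in pmf)
    return mean, var

n, mu = 8, 0.35
mean, var = moments_of_proportion(n, mu)
print(mean, mu)                  # E(Y) = mu
print(var, mu * (1 - mu) / n)    # V(Y) = mu(1 - mu)/n
```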
iv) A Bernoulli variable, say X, with parameter µ has the probability function
$f(x) = \mu^x(1-\mu)^{1-x}$ for x = 0, 1.
But the probability function of Y is
$f(y; \mu) = \binom{n}{ny}\mu^{ny}(1-\mu)^{n-ny}$ for y = 0, 1/n, 2/n, …, 1.
Hence Y is not Bernoulli. Although E(Y) = µ and the variance function V(µ) = µ(1 − µ) match the mean and variance
of a Bernoulli(µ) variable, this is not sufficient to conclude that Y is a Bernoulli variable; indeed
Var(Y) = µ(1 − µ)/n, which equals the Bernoulli variance only when n = 1. [1]

v) Correct Answer is Option A


For the binomial distribution, the logit link $g(\mu) = \log\frac{\mu}{1-\mu}$ is the canonical link function, matching the natural parameter θ found in part (ii). [1]
[10 Marks]

Solution 4:
i) Correct Answer is Option D
All of the listed methods are valid ways of estimating a parameter from the information in a
sample. [1]

ii) $\hat{p}_E = \frac{33}{6 \times 7} = 0.786, \quad \hat{p}_Z = \frac{37}{11 \times 7} = 0.481, \quad \hat{p} = \frac{70}{17 \times 7} = 0.588$ [1]

iii) Correct Answer is Option A

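Let $n_i$ be the number of members of batch i, so that batch i answers $7n_i$ questions in total, and let $B_i$ be the
total number of correct answers given by batch i.
$L(b;\theta) = (2\theta)^{B_E}(1-2\theta)^{7n_E-B_E}\,\theta^{B_Z}(1-\theta)^{7n_Z-B_Z} \times \text{constant}$ [2]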

$l(b;\theta) = \ln L(b;\theta)$
$= 33\ln(2\theta) + (42-33)\ln(1-2\theta) + 37\ln\theta + (77-37)\ln(1-\theta) + \text{constant}$
$= 33\ln(2\theta) + 9\ln(1-2\theta) + 37\ln\theta + 40\ln(1-\theta) + \text{constant}$

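iv) $\frac{dl}{d\theta} = \frac{66}{2\theta} - \frac{18}{1-2\theta} + \frac{37}{\theta} - \frac{40}{1-\theta}$

$= \frac{70}{\theta} - \frac{18}{1-2\theta} - \frac{40}{1-\theta}$ [1.5]

Set equal to zero and solve:

$\frac{70(1-2\theta)(1-\theta) - 18\theta(1-\theta) - 40\theta(1-2\theta)}{\theta(1-2\theta)(1-\theta)} = 0$

$\Rightarrow 70 - 210\theta + 140\theta^2 - 18\theta + 18\theta^2 - 40\theta + 80\theta^2 = 0$

$\Rightarrow 238\theta^2 - 268\theta + 70 = 0$ [1.5]

$\Rightarrow \theta = 0.412$ or $\theta = 0.714$
Since 2θ is a probability, θ < 0.5, so $\hat{\theta} = 0.412$. [1]
[4]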
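A short sketch that solves the quadratic above and cross-checks the root against a crude grid search of the log-likelihood from part (iii) over (0, 0.5). The grid resolution is an arbitrary choice.

```python
import math

def log_lik(theta):
    # l(theta) up to an additive constant, from part (iii)
    return (33 * math.log(2 * theta) + 9 * math.log(1 - 2 * theta)
            + 37 * math.log(theta) + 40 * math.log(1 - theta))

# Roots of 238 theta^2 - 268 theta + 70 = 0
a, b, c = 238, -268, 70
disc = math.sqrt(b * b - 4 * a * c)
roots = sorted(((-b - disc) / (2 * a), (-b + disc) / (2 * a)))
theta_hat = [r for r in roots if 0 < r < 0.5][0]   # 2*theta must be a probability

# Cross-check: maximise l(theta) on a fine grid inside (0, 0.5)
grid_best = max((k / 10_000 for k in range(1, 5_000)), key=log_lik)
print(roots, theta_hat, grid_best)
```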

v) Total number of correct answers:

| | "Elite" Batch | "Zenith" Batch |
|---|---|---|
| Method of Moments Estimate | 42 × 0.588 = 24.7 | 77 × 0.588 = 45.3 |
| Method of Maximum Likelihood Estimate | 42 × 2 × 0.412 = 34.6 | 77 × 0.412 = 31.7 |
| Observed Values | 33 | 37 |
[1]

From the table, the MLE appears to give a better fit, as its predicted values are closer to the observed
values. [1]
[2]
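The comparison in the table can be reproduced as below. The total absolute deviation from the observed counts is used here as one simple yardstick of fit; that choice is an assumption of this sketch, not a criterion stated in the solution.

```python
# Fitted totals of correct answers under each estimate, compared with observations
questions = {"Elite": 6 * 7, "Zenith": 11 * 7}
observed = {"Elite": 33, "Zenith": 37}

p_mom = 70 / (17 * 7)                  # pooled proportion, about 0.588
theta_mle = (268 - 72) / (2 * 238)     # smaller root from part (iv), about 0.412
success_prob_mle = {"Elite": 2 * theta_mle, "Zenith": theta_mle}

def total_abs_error(fitted):
    return sum(abs(fitted[batch] - observed[batch]) for batch in observed)

fitted_mom = {batch: questions[batch] * p_mom for batch in questions}
fitted_mle = {batch: questions[batch] * success_prob_mle[batch] for batch in questions}
print(fitted_mom, total_abs_error(fitted_mom))
print(fitted_mle, total_abs_error(fitted_mle))
```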
[10 Marks]

Solution 5:
i) Correct Answer is Option A

Since the sample sizes are small and the population variances are unknown, the t distribution is
suitable for performing this test. [1]

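ii) $\bar{X}_A = 51.48, \quad \bar{X}_B = 40.14$

$S_A^2 = \frac{1}{9}(27804.64 - 10 \times 51.48^2) = 144.7484$
$S_B^2 = \frac{1}{9}(18215.88 - 10 \times 40.14^2) = 233.7427$ [1.5]

Pooled variance:
$S_p^2 = \frac{1}{18}(9 \times 144.7484 + 9 \times 233.7427) = 189.2456$ [1]

The 95% confidence interval is:
$(\bar{X}_A - \bar{X}_B) \pm t_{2.5\%,\,n_A+n_B-2}\, S_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}$
$= (51.48 - 40.14) \pm 2.101\sqrt{189.2456}\sqrt{\frac{2}{10}}$
$= (-1.58569,\ 24.26569)$ [1]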

Since 0 lies in the above confidence interval, we conclude that, at the 5% level of significance,
there is insufficient evidence to reject the null hypothesis. [0.5]
[4]
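The pooled-variance interval can be computed from the raw sums of squares as follows. The critical value t(2.5%, 18) = 2.101 is passed in from tables, as in the solution, because the Python standard library has no t quantile function; the function name is illustrative.

```python
import math

def pooled_ci(mean_a, mean_b, sum_sq_a, sum_sq_b, n_a, n_b, t_crit):
    """Pooled two-sample t confidence interval for mu_A - mu_B from sums of squares."""
    s2_a = (sum_sq_a - n_a * mean_a**2) / (n_a - 1)
    s2_b = (sum_sq_b - n_b * mean_b**2) / (n_b - 1)
    s2_p = ((n_a - 1) * s2_a + (n_b - 1) * s2_b) / (n_a + n_b - 2)
    half_width = t_crit * math.sqrt(s2_p) * math.sqrt(1 / n_a + 1 / n_b)
    diff = mean_a - mean_b
    return s2_a, s2_b, s2_p, (diff - half_width, diff + half_width)

# t_{2.5%, 18} = 2.101 taken from tables
print(pooled_ci(51.48, 40.14, 27804.64, 18215.88, 10, 10, 2.101))
```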


iii) Correct Answer is Option D

Normality of the populations and equal population variances are the assumptions underlying the
pooled two-sample t-test. [1]

iv) Sample size:

The width of the confidence interval with n observations in each group is

$2\, t_{2.5\%,\,2n-2} \sqrt{\frac{144.7484(n-1) + 233.7427(n-1)}{2n-2}} \sqrt{\frac{2}{n}} = \frac{38.9097\, t_{2.5\%,\,2n-2}}{\sqrt{n}}$ [1]

This should be less than 20, so using percentage points of the t distribution, we have:
n = 15 ⇒ $t_{2.5\%,\,28} = 2.048$
⇒ 38.9097 × 2.048/√15 = 20.57511 > 20 [1]
and n = 16 ⇒ $t_{2.5\%,\,30} = 2.042$
⇒ 38.9097 × 2.042/√16 = 19.8634 < 20

Hence the minimum sample size required is 16. Since samples of size 10 were used in part (ii), at least 6 additional
circles must be subjected to a test run before going ahead with full-blown implementation. [1]
[3]
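The width calculation can be sketched as follows. Because both groups have size n, the pooled variance reduces to the average of the two sample variances and does not depend on n. The two t values are copied from tables, as in the solution, rather than computed.

```python
import math

# Pooled variance with equal group sizes: the average of the two sample variances
S2_P = (144.7484 + 233.7427) / 2

def ci_width(n, t_crit):
    """Width of the 95% CI for the difference in means with n observations per group."""
    return 2 * t_crit * math.sqrt(S2_P) * math.sqrt(2 / n)

# t_{2.5%, 2n-2} values read from tables (not computed here)
T_TABLE = {15: 2.048, 16: 2.042}
for n, t in T_TABLE.items():
    print(n, round(ci_width(n, t), 4), ci_width(n, t) < 20)
```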

v) Assuming the difference in sample means remains 11.34, the confidence interval at this new width would be:

(11.34 − 19.8634/2, 11.34 + 19.8634/2)
= (1.408, 21.272) [0.5]
Since this confidence interval does not contain 0, we can conclude that there is sufficient
evidence to reject the null hypothesis. [0.5]
[1]
[10 Marks]

Solution 6:
i) Correct Answer is Option B
Since the data is perfectly monotonically decreasing, both the coefficients would be equal to -1
(perfect negative correlation).
[2]

ii) H0: ρ = 0
H1: ρ > 0 [0.5]

Under H0, the sampling distribution of Kendall's rank correlation coefficient is approximately
normal with mean 0 and variance $\frac{2(2n+5)}{9n(n-1)}$. [0.5]

With n = 10: variance $= \frac{2(20+5)}{90 \times 9} = \frac{50}{810} = 0.061728$ [0.5]

Observed value of the test statistic
$= (r_K - \text{mean})/\sqrt{\text{variance}}$
$= (0.11 - 0)/\sqrt{0.061728}$
= 0.4427 [1]
This does not exceed the upper 0.05% point of the standard normal distribution (3.2905). [0.5]
So we do not have sufficient evidence to reject the null hypothesis. [0.5]
[0.5]
Hence, there is insufficient evidence to conclude that the inflation rates for Freedonia and Genovia are positively
correlated.
[4]
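The test can be reproduced in a few lines. This sketch uses the observed coefficient 0.11 and n = 10 from the solution and `statistics.NormalDist` from the Python standard library for the upper 0.05% point.

```python
import math
from statistics import NormalDist

n, r_k = 10, 0.11
variance = 2 * (2 * n + 5) / (9 * n * (n - 1))    # null variance of Kendall's tau
z = (r_k - 0) / math.sqrt(variance)
critical = NormalDist().inv_cdf(1 - 0.0005)         # upper 0.05% point of N(0, 1)
print(round(variance, 6), round(z, 4), round(critical, 4), z > critical)
```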

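iii) The completed table with the rank values for Genovia is given below:

| Rank Freedonia | Rank Genovia | Number of Concordant Pairs | Number of Discordant Pairs |
|---|---|---|---|
| 1 | 6 | 4 | 5 |
| 2 | 3 | 6 | 2 |
| 3 | 1 | 7 | 0 |
| 4 | 4 | 5 | 1 |
| 5 | 8 | 2 | 3 |
| 6 | 10 | 0 | 4 |
| 7 | 9 | 0 | 3 |
| 8 | 7 | 0 | 2 |
| 9 | 2 | 1 | 0 |
| 10 | 5 | - | - |
| Total | | 25 | 20 |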

For Rank 1 (Freedonia), there are 4 concordant pairs and 5 discordant pairs. So the Genovia rank here must be lower
than 4 of the remaining Genovia ranks (7, 8, 9, 10) and higher than the other 5 (1, 2, 3, 4, 5). It must be 6.
For Rank 2 (Freedonia), there are 6 concordant pairs and 2 discordant pairs. So the Genovia rank here must be lower
than 6 of the remaining ranks (4, 5, 7, 8, 9, 10) and higher than 2 of them (1, 2). It must be 3. Note that rank 6
has already been assigned in the row above, so it is excluded.
The above process can be continued until all rank values for Genovia are obtained. [2.5]

$r_s = 1 - \frac{6\sum d^2}{n(n^2-1)}$ [0.5]

$\sum d^2 = (-5)^2 + (-1)^2 + 2^2 + 0^2 + (-3)^2 + (-4)^2 + (-2)^2 + 1^2 + 7^2 + 5^2 = 134$ [1]

$r_s = 1 - \frac{6 \times 134}{10(100-1)} = 0.1879$ [1]


[5]
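The completed ranks can be checked by recounting concordant and discordant pairs and recomputing both coefficients. This sketch takes the Genovia ranks from the completed table; since Freedonia's ranks are 1 to 10 in order and there are no ties, only the order of the Genovia ranks matters.

```python
freedonia = list(range(1, 11))
genovia = [6, 3, 1, 4, 8, 10, 9, 7, 2, 5]   # completed ranks from the table
n = len(freedonia)

concordant = discordant = 0
for i in range(n):
    for j in range(i + 1, n):
        # Freedonia ranks increase with the index, so compare Genovia ranks only
        if genovia[j] > genovia[i]:
            concordant += 1
        else:
            discordant += 1

kendall_tau = (concordant - discordant) / (n * (n - 1) / 2)
sum_d2 = sum((f - g) ** 2 for f, g in zip(freedonia, genovia))
spearman = 1 - 6 * sum_d2 / (n * (n**2 - 1))
print(concordant, discordant, round(kendall_tau, 4), sum_d2, round(spearman, 4))
```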

iv) If we want to retain the components which together explain at least 90% of the total variance, only PC1 needs to
be retained, as it alone accounts for 95.88% of the total variance. [1]

The scree plot flattens from PC2 onwards. Hence, using the scree test, PC1 should be retained, as the variances
level off after PC1. [1]

Under Kaiser's criterion, only PCs with variance greater than 1 should be retained (applicable to scaled data).
Since only PC1 has variance greater than 1, only PC1 should be retained. [1]
[3]

v) Correct Answer is Option C


Principal components are uncorrelated linear combinations of the variables in the original data
set. Hence the correlation coefficient between PC1 and PC2 is 0. [1]
[15 Marks]

Solution 7:
i) Correct Answer is Option B


Since we are modelling N, the number of trials performed until the first success occurs, the
appropriate distribution is the geometric distribution. [1]

ii) The prior distribution of "p" is uniform over the interval [0, 1], so
$f_{prior}(p) = 1, \quad 0 \le p \le 1$ [0.5]

The sample contains only one observation, $n_1$. So the likelihood function of "p" is:
$L(p) = P(N = n_1) = (1-p)^{n_1-1}\,p$
This follows from the fact that N | p ~ Geometric(p). [1]

Combining the prior PDF and the likelihood function, we get:

$f_{posterior}(p) \propto f_{prior}(p) \times L(p)$
$f_{posterior}(p) \propto (1-p)^{n_1-1}\,p$
$f_{posterior}(p) \propto (1-p)^{n_1-1}\,p^{2-1}, \quad 0 \le p \le 1$ [1]

So the posterior distribution of p is Beta(2, $n_1$). [0.5]


[3]

iii) The prior mean of p under U[0, 1] is given by:

E(p) = (0 + 1)/2 = 0.50 [0.5]
We have to calculate the probability that P exceeds 0.50 using the posterior distribution of P. With $n_1 = 3$, the
posterior is Beta(2, 3), with density $p(1-p)^2/B(2, 3)$, where $1/B(2, 3) = \frac{\Gamma(5)}{\Gamma(2)\Gamma(3)} = 12$.

P(P > 0.50)
$= 12\int_{0.50}^{1} (1-p)^{n_1-1}\,p\,dp$
$= 12\int_{0.50}^{1} (1-p)^{3-1}\,p\,dp$
$= 12\int_{0.50}^{1} (1-p)^{2}\,p\,dp$
$= 12\int_{0.50}^{1} (1-2p+p^2)\,p\,dp$
$= 12\int_{0.50}^{1} (p-2p^2+p^3)\,dp$
$= 12\left[\frac{1^2-0.50^2}{2} - \frac{2}{3}(1^3-0.50^3) + \frac{1}{4}(1^4-0.50^4)\right]$
$= 12 \times 0.026042 = 0.3125$ [2.5]
This signifies that there is about a 31% chance that the vote share of XJP will exceed 50% (which is
the prior expectation based on historical vote shares). [1]
[4]
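A posterior probability must be computed from the normalised Beta(2, 3) density. The sketch below, assuming $n_1 = 3$ as in the substitution above, integrates both the unnormalised kernel $(1-p)^2 p$ and the normalised density over [0.5, 1]; their ratio is the normalising constant 1/B(2, 3) = 12. Simpson's rule is exact here because both integrands are cubic polynomials.

```python
import math

def beta_pdf(p, a, b):
    # Beta(a, b) density with normalising constant Gamma(a+b)/(Gamma(a)Gamma(b))
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * p ** (a - 1) * (1 - p) ** (b - 1)

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

n1 = 3
unnormalised = simpson(lambda p: (1 - p) ** (n1 - 1) * p, 0.5, 1.0)
posterior_prob = simpson(lambda p: beta_pdf(p, 2, n1), 0.5, 1.0)
print(round(unnormalised, 6), round(posterior_prob, 6))
```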

iv) The likelihood function is now given by:

$L(p) = P(N_1 = n_1) \times P(N_2 = n_2) \times \cdots \times P(N_{50} = n_{50})$
$= (1-p)^{n_1-1}p \times (1-p)^{n_2-1}p \times \cdots \times (1-p)^{n_{50}-1}p$
$= (1-p)^{n_1+n_2+\cdots+n_{50}-50}\,p^{50}$
$= (1-p)^{200-50}\,p^{50}$
$= (1-p)^{150}\,p^{50}$ [1.5]

Hence, the posterior distribution of p is given by:

$f_{posterior}(p) \propto (1-p)^{150}\,p^{50}$
$f_{posterior}(p) \propto (1-p)^{151-1}\,p^{51-1}, \quad 0 \le p \le 1$ [1]

So the posterior distribution of p is Beta(51, 151). [0.5]

[3]
v) The Bayesian estimate of "p" under squared error loss is the mean of the posterior distribution,
which is given by:

E(P | n) = α / (α + β) = 51 / (51 + 151) = 0.2525 [1]

E(P | n) = Z × (sample estimate of p) + (1 − Z) × (prior mean), where the sample estimate is the maximum
likelihood estimate 50/200:

0.2525 = Z × (50/200) + (1 − Z) × 0.50
0.2525 = 0.25Z − 0.50Z + 0.50
0.25Z = 0.2475
Z = 0.99
So, if we want to express the posterior mean as a credibility estimate,
E(P | n) = 0.99 × (sample estimate) + 0.01 × (prior mean) [2]
[3]
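The conjugate update and the credibility factor can be checked exactly with rational arithmetic. This sketch assumes the standard Beta-geometric update (α increases by the number of observations and β by the number of failures) and the credibility factor Z = Σnᵢ/(Σnᵢ + α + β), which reproduces Z ≈ 0.99 from the solution.

```python
from fractions import Fraction

# Prior Beta(1, 1) = U[0, 1]; data: 50 geometric observations summing to 200
alpha0, beta0 = 1, 1
m, total = 50, 200

alpha_post = alpha0 + m              # 51
beta_post = beta0 + (total - m)      # 151
post_mean = Fraction(alpha_post, alpha_post + beta_post)

mle = Fraction(m, total)             # 50/200
prior_mean = Fraction(alpha0, alpha0 + beta0)
Z = Fraction(total, total + alpha0 + beta0)   # credibility factor
assert Z * mle + (1 - Z) * prior_mean == post_mean
print(alpha_post, beta_post, float(post_mean), float(Z))
```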

vi) Correct Answer is Option C


A sample of 200 is too small to draw any firm conclusion. Increasing the sample size
would be the ideal way forward.
[1]
[15 Marks]

Solution 8:
i) Correct Answer is Option B
Value of estimates for the intercept, X1 and X2 will be the values of α, β1 and β2 respectively. [1]

ii) $\text{Adjusted } R^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2)$ [1]

We have n = 10, k = 2 predictors and R² = 0.9005. [0.5]
Adjusted R²
$= 1 - \frac{10-1}{10-2-1}(1 - 0.9005)$
= 87.21% [0.5]
[2]
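The adjusted R² formula above as a small helper, applied to the multiple regression here and, for comparison, to the bivariate model of part (v) (one predictor, R² = 0.899778). The function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

print(round(adjusted_r2(0.9005, n=10, k=2), 4))     # multiple regression, part (ii)
print(round(adjusted_r2(0.899778, n=10, k=1), 4))   # bivariate model, part (v)
```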
iii) $\hat{y}_{\text{Emerald City}} = 47.5601 + 0.6924(125) - 0.6624(90) = 74.4941$ [0.5]
$\hat{e}_{\text{Emerald City}} = 77 - 74.4941 = 2.5059$ [0.5]
$\hat{y}_{\text{Dark City}} = 47.5601 + 0.6924(124) - 0.6624(72) = 85.7249$ [0.5]
$\hat{e}_{\text{Dark City}} = 88 - 85.7249 = 2.2751$ [0.5]
[2]
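The predictions and residuals can be reproduced from the fitted coefficients. In this sketch the tuple for each city holds (x1, x2, observed y) as used in the calculation above; the dictionary layout is an illustrative choice.

```python
# Fitted multiple regression: y_hat = 47.5601 + 0.6924*x1 - 0.6624*x2
def predict_mlr(x1, x2):
    return 47.5601 + 0.6924 * x1 - 0.6624 * x2

cities = {"Emerald City": (125, 90, 77), "Dark City": (124, 72, 88)}  # (x1, x2, observed y)
for city, (x1, x2, y) in cities.items():
    y_hat = predict_mlr(x1, x2)
    print(city, round(y_hat, 4), round(y - y_hat, 4))
```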

iv) $S_{yy} = 776.10$

$S_{zz} = 1472.10$
$S_{yz} = 1013.90$ [1]
$\hat{\mu} = S_{yz}/S_{zz}$ [1]
$= 1013.90/1472.10$
$= 0.6887$
$\hat{\lambda} = \bar{y} - \hat{\mu}\,\bar{z}$
$= 80.30 - 0.6887 \times 43.70$
$= 50.2019$ (using the unrounded value of $\hat{\mu}$) [1]

$\hat{y}_{\text{Emerald City}} = 50.2019 + 0.6887(125 - 90) = 74.3064$ [0.5]
$\hat{e}_{\text{Emerald City}} = 77 - 74.3064 = 2.6936$ [0.5]
$\hat{y}_{\text{Dark City}} = 50.2019 + 0.6887(124 - 72) = 86.0143$ [0.5]
$\hat{e}_{\text{Dark City}} = 88 - 86.0143 = 1.9857$ [0.5]
[5]
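The bivariate fit can be recomputed from the summary statistics, with z = x1 − x2 as in the predictions above. This sketch keeps µ̂ unrounded throughout, so its predictions differ from the hand calculation (which rounds µ̂ to 0.6887 in the final step) only in the third or fourth decimal place; it also computes the R² used in part (v).

```python
# Simple regression of y on z = x1 - x2 from summary statistics (part iv)
S_yy, S_zz, S_yz = 776.10, 1472.10, 1013.90
y_bar, z_bar = 80.30, 43.70

mu_hat = S_yz / S_zz
lambda_hat = y_bar - mu_hat * z_bar
r2 = S_yz**2 / (S_yy * S_zz)

def predict(z):
    return lambda_hat + mu_hat * z

print(round(mu_hat, 4), round(lambda_hat, 4), round(r2, 6))
print(round(predict(125 - 90), 4), round(predict(124 - 72), 4))
```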
v) $R^2 = \frac{S_{yz}^2}{S_{yy}\,S_{zz}} = \frac{(1013.90)^2}{776.10 \times 1472.10} = 89.9778\%$ [1]

Adjusted R²
$= 1 - \frac{10-1}{10-1-1}(1 - 0.899778)$
= 88.725% [1]
[2]
vi) Correct Answer is Option D
$2 = 0.6887 \times z_{\text{reduction}}$
$z_{\text{reduction}} = 2/0.6887 = 2.90$ [1]

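vii)
a) We are given that $\bar{w} = 0$. Then
$S_{ww} = \sum(w - \bar{w})^2 = \sum(w - 0)^2 = \sum(z - \bar{z})^2 = S_{zz}$ [1]

$S_{yw} = \sum(y - \bar{y})(w - \bar{w}) = \sum(y - \bar{y})(w - 0) = \sum(y - \bar{y})(z - \bar{z}) = S_{yz}$ [1]

£̂ = $S_{yw}/S_{ww} = S_{yz}/S_{zz} = \hat{\mu}$ [1]

b) $\hat{\delta} = \bar{y} - \hat{\mu}\,\bar{w} = \bar{y} - \hat{\mu} \times 0 = \bar{y} = \hat{\lambda} + \hat{\mu}\,\bar{z}$ [1]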
[4]
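The algebra in (a) and (b) can be illustrated on a small data set. The y and z values below are hypothetical and are NOT the exam data; the sketch only shows that regressing on the centred predictor w = z − z̄ leaves the slope unchanged, makes the intercept equal to ȳ, and gives identical fitted values.

```python
# Hypothetical illustrative data (NOT the exam data)
y = [72.0, 85.0, 79.0, 90.0, 68.0]
z = [30.0, 55.0, 41.0, 62.0, 25.0]

def fit(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (yy - y_bar) for x, yy in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

z_bar = sum(z) / len(z)
w = [zz - z_bar for zz in z]        # centred predictor, so w_bar = 0

lam_hat, mu_hat = fit(z, y)
delta_hat, slope_w = fit(w, y)
print(mu_hat, slope_w)               # identical slopes
print(delta_hat, sum(y) / len(y))    # intercept of the centred model equals y_bar
```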

viii)

| City | Multiple Linear Regression Model | Bivariate Linear Model | Improvised Bivariate Linear Model |
|---|---|---|---|
| Emerald City (ŷ, e) | (74.49, 2.51) | (74.31, 2.69) | (74.31, 2.69) |
| Dark City (ŷ, e) | (85.72, 2.28) | (86.01, 1.99) | (86.02, 1.98) |
| Adjusted R² | 87.21% | 88.73% | 88.72% |
[1]


In terms of the predicted responses and residuals, the multiple linear regression model appears to fit better for
Emerald City. However, for Dark City, the bivariate linear model gives better results than the multiple linear
regression model.

However, in terms of Adjusted R² (the proportion of the variation in the response explained by the model, adjusted
for the number of explanatory variables), the Bivariate Linear Model appears to be a better fit. [1]

The Improvised Bivariate Linear Model merely shifts the explanatory variable of the original bivariate linear model by
its mean, so it gives the same fitted values as the original model apart from rounding. Contrary to your son's
presumption, the improvised model does not provide a better fit than the original model. [1]
[3]
[20 Marks]

*****************
