MDAE FC 2022-23 Stats Problem&Solutions
Index:
2,3 Probability
4 Probability and Bayes' Theorem
5 Binomial and Normal Approximation
6 Poisson and Normal Approximation
7 Poisson – Also clarifies PDF-CDF
8 Poisson and Chi-Square for Goodness of Fit
9 Chi Square Test for Independence
10,11 Chi Square Goodness of Fit
12 One Sample z-Test
13 Pooled Variance, Simple t-test
14 Simple t-test (Calculating backwards), Paired t-test
15 Confidence Interval and Unpaired t-test
16 Complete Problem on Regression
17 Correlation Coefficient and Goodness of Fit
Problem Set for Discussions Page 2 of 17
Probability
A die has the numbers 0, 0, 1, 1, 4 and 5 on its faces. It is rolled twice. The
probability of getting the same number on both the throws is ___________ From Chpt 25, [E].
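As a quick check on the die problem, here is a minimal Python sketch. It assumes the die is fair and that the two throws are independent; neither is stated explicitly in the problem, and the function name is only illustrative.

```python
from collections import Counter
from fractions import Fraction

# Numbers on the six faces, as given in the problem.
faces = [0, 0, 1, 1, 4, 5]

def prob_same_on_two_rolls(faces):
    """P(both throws show the same number), assuming a fair die and
    two independent throws (an assumption, not stated in the problem)."""
    n = len(faces)
    counts = Counter(faces)
    # For each distinct number, P(that number twice) = (count / n) ** 2.
    return sum(Fraction(c, n) ** 2 for c in counts.values())

p_same = prob_same_on_two_rolls(faces)
print(p_same)  # 5/18
```

The sum is (2/6)^2 + (2/6)^2 + (1/6)^2 + (1/6)^2 = 10/36 = 5/18.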
The table shows the left-handed and right-handed girls and boys in a class.
Two of the girls are chosen at random. The probability that exactly one of
these girls is left-handed is ___________ 0607_s13_qp_41_8b
Probability
Each of the three fair coins has been painted on its two sides. The first coin
is red on one side and blue on the other. The second coin is blue on one side
and green on the other. The third coin is green on one side and red on the
other. If the three coins are tossed in the air, the probability that the coins
show up three different colours is ___________
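The three-coin problem is small enough to enumerate. The sketch below lists all 2 x 2 x 2 = 8 equally likely outcomes and counts those showing three different colours; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each coin is described by the colours on its two sides.
coins = [("red", "blue"), ("blue", "green"), ("green", "red")]

# All equally likely outcomes: one visible side per coin.
outcomes = list(product(*coins))

# An outcome is favourable when the three visible colours are all different.
favourable = [o for o in outcomes if len(set(o)) == 3]

p_three_colours = Fraction(len(favourable), len(outcomes))
print(favourable)
print(p_three_colours)  # 1/4
```

Only (red, blue, green) and (blue, green, red) qualify, so the probability is 2/8 = 1/4.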
Bayes' Theorem
Assume that Disease X is a rare disease, and only 2% of people have it.
Suppose that at your regular physical exam you test positive for Disease X.
Although Disease X has only mild symptoms, you are concerned and ask
your doctor about the accuracy of the test. It turns out that the test is 95%
accurate. What is the probability you have this disease?
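The figure of 95% is not enough on its own, because the test can make two
kinds of error: misses and false positives. The counts below are consistent
with 1,000,000 people being tested, the test detecting 99% of those who have
the disease and wrongly flagging 9% of those who do not. Of the 20,000 people
with the disease, 19,800 would test positive; of the 980,000 people without
it, 88,200 would test positive. This means that of all those who tested
positive, only 19,800/(19,800 + 88,200) = 0.1833 have it.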
Let D be the event you have disease X. Since only 2% of people in your
situation have Disease X, the prior probability of Event D is 0.02. Or, more
formally, P(D) = 0.02. If D' represents the event that D is false,
then P(D') = 1 - P(D) = 0.98.
To define the diagnostic value of the test, we need to define another event:
that you test positive for Disease X. Let's call this Event T. The diagnostic
value of the test depends on the probability you will test positive given that
you actually have the disease, written as P(T|D), and the probability you
test positive given that you do not have the disease, written as P(T|D').
Bayes' theorem shown below allows you to calculate P(D|T), the probability
that you have the disease given that you test positive for it.
The various terms and the formulae are: (and you will get the same answer!!)
$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D')\,P(D')}$$
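Bayes' theorem can be checked against the count-based answer of 0.1833. The sketch below is illustrative only: P(D) = 0.02 comes from the problem, but P(T|D) = 0.99 and P(T|D') = 0.09 are assumptions inferred from the counts 19,800 and 88,200, not values stated in the text. The function name `posterior` is hypothetical.

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """Bayes' theorem: P(D|T) from P(D), P(T|D) and P(T|D')."""
    numerator = p_pos_given_d * prior
    evidence = numerator + p_pos_given_not_d * (1 - prior)  # P(T)
    return numerator / evidence

p_d = 0.02                   # prior, from the problem statement
p_t_given_d = 0.99           # ASSUMPTION: inferred from 19,800 / 20,000
p_t_given_not_d = 0.09       # ASSUMPTION: inferred from 88,200 / 980,000

p_d_given_t = posterior(p_d, p_t_given_d, p_t_given_not_d)
print(round(p_d_given_t, 4))  # 0.1833
```

Under these assumptions the formula gives 0.0198/(0.0198 + 0.0882), the same ratio as 19,800/(19,800 + 88,200).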
( ) ( ) ( )
√
Both and
( )
For the first one, we take and hence ( )
( ) ( ) ( ) ( ) ( )
We want ( ) ( ) ( )
√
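When we define the variable X, we are not saying it is between the second
and the third flaw or between the nineteenth and the twentieth. It really
does not matter (lack of memory of the exponential distribution).
So for the event X > x, we want zero flaws in x meters. Writing λ for the mean
number of flaws per meter, the number of flaws in x meters is Poisson with
mean λx, so
$$P(X > x) = P(\text{zero flaws in } x \text{ meters}) = \frac{e^{-\lambda x}(\lambda x)^{0}}{0!} = e^{-\lambda x}$$
That being said, we are looking at P(zero flaws) = $e^{-\lambda x}$.
P(at least one flaw) = 1 – P(no flaw) = $1 - e^{-\lambda x}$, which gives
$P(X \le x) = 1 - e^{-\lambda x}$, the CDF of the exponential distribution.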
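The link between "zero flaws in x meters" and the exponential CDF can be checked numerically. In the sketch below the rate of 0.1 flaws per meter and the length of 5 meters are hypothetical values chosen only for illustration; they are not taken from the problem.

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson count N with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def exponential_cdf(x, rate):
    """P(X <= x) for an exponential variable with the given rate."""
    return 1 - math.exp(-rate * x)

rate = 0.1   # HYPOTHETICAL: mean number of flaws per meter
x = 5.0      # HYPOTHETICAL: length in meters

# P(X > x) is the probability of zero flaws in x meters.
p_no_flaw = poisson_pmf(0, rate * x)
p_at_least_one = 1 - p_no_flaw

print(p_no_flaw, p_at_least_one, exponential_cdf(x, rate))
```

The last two printed values agree, which is the PDF-CDF point of the problem: 1 - P(no flaw in x meters) is the exponential CDF at x.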
The numbers of a particular type of laptop computer sold by a store on each of 100
consecutive Saturdays are summarised in the following table.
Numbers Sold            0    1    2    3    4    5    6    7   >= 8
Number of Saturdays     7   20   39   16   14    2    1    1      0
Fit a Poisson distribution to the data and carry out a goodness of fit test at the
2.5% significance level … 2014W
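The sample mean is (0×7 + 1×20 + 2×39 + 3×16 + 4×14 + 5×2 + 6×1 + 7×1)/100 = 2.25,
so we fit a Poisson distribution with mean 2.25. The expected frequencies are 100 × P(X = x).
x                            0         1         2         3         4         5         6         7       >=8       SUM
P(X = x)              0.105399  0.237148  0.266792  0.200094  0.112553  0.050649  0.018993  0.006105  0.002267         1
Expected frequency       10.54     23.71     26.68     20.01     11.26      5.06      1.90      0.61      0.23       100
Combine the last 4 cells so that every expected frequency is more than 5
x                            0         1         2         3         4       >=5       SUM
P(X = x)              0.105399  0.237148  0.266792  0.200094  0.112553  0.078014         1
Expected frequency       10.54     23.71     26.68     20.01     11.26      7.80       100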
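The remaining steps of the goodness of fit test can be sketched in Python. This is a minimal illustration, not the original worked solution. It takes 6 - 1 - 1 = 4 degrees of freedom (six cells after pooling, one lost for the total and one for the estimated mean), and the critical value 11.143 is the upper 2.5% point of chi-square with 4 degrees of freedom from standard tables.

```python
import math

# Observed number of Saturdays with 0, 1, ..., 7 and >= 8 laptops sold.
observed = [7, 20, 39, 16, 14, 2, 1, 1, 0]
n = sum(observed)

# Estimate the Poisson mean from the sample.
mean = sum(x * f for x, f in enumerate(observed)) / n

# Poisson probabilities for 0..7, with the remainder assigned to ">= 8".
probs = [math.exp(-mean) * mean ** x / math.factorial(x) for x in range(8)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

# Pool the last four cells (5, 6, 7, >= 8) into a single ">= 5" cell.
obs_pooled = observed[:5] + [sum(observed[5:])]
exp_pooled = expected[:5] + [sum(expected[5:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1   # one for the total, one for the estimated mean
critical = 11.143              # chi-square, 4 df, upper 2.5% point (tables)

print(round(mean, 2), df, round(chi2, 2), chi2 > critical)
```

The statistic comes out at about 10.8, just below 11.143, so on this calculation the Poisson model is not rejected at the 2.5% level.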
Test for independence of hair colour and city at a 10% level of significance.
(State the null hypothesis and the alternate hypothesis clearly and use the
appropriate test.)
                      Hair Colour
          Fair   Red   Medium   Dark   Jet Black
City A      59    11       84     50          36
City B      54    97       67     45          14
City C      40    30       20     40          10
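Calculate all the expected numbers assuming independence. Each expected number
is (row total × column total)/(grand total); for example, for City A and Fair
hair it is 240*153/657 ≈ 56. The table, with the expected numbers in brackets,
should look like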
                                Hair Colour
          Fair       Red        Medium     Dark       Jet Black   Totals
City A    59 (56)    11 (51)    84 (62)    50 (49)    36 (22)     240
City B    54 (64)    97 (58)    67 (72)    45 (57)    14 (25)     277
City C    40 (33)    30 (29)    20 (37)    40 (29)    10 (13)     140
Totals    153        138        171        135        60          657
H0 : Null Hypothesis - The attributes city and hair colour are independent
H1 : Alternate Hypothesis – The attributes are dependent
With (3 - 1)(5 - 1) = 8 degrees of freedom, the table value is $\chi^2_{8,\,0.1} = 13.362$
Since the calculated value is greater (in fact, much greater) than the table
value, we reject the null hypothesis and conclude at 10% LOS that the hair
colour attribute is dependent on the city.
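The test statistic for the hair colour table can be reproduced with a short Python sketch. It uses the unrounded expected numbers, so individual cells differ slightly from the whole-number values shown in brackets in the table; the conclusion is unaffected.

```python
# Observed counts: rows are cities A, B, C; columns are
# Fair, Red, Medium, Dark, Jet Black.
observed = [
    [59, 11, 84, 50, 36],
    [54, 97, 67, 45, 14],
    [40, 30, 20, 40, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 13.362  # chi-square, 8 df, upper 10% point (table value from the text)

print(round(expected[0][0], 2), df, round(chi2, 1), chi2 > critical)
```

The statistic comes out at roughly 97, far above 13.362, which matches the conclusion that the null hypothesis of independence is rejected.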
…. 2013
Humerus bones from the same species of animal have approximately the
same length-to-width ratios. It is known that Species A has a mean ratio of
8.5. Suppose that 41 fossil humerus bones were unearthed at a site where
Species A is known to have flourished. (We assume that all bones are from
the same species.) The length-to-width ratios of these bones have sample
mean 9.26 and sample standard deviation 1.20. Can we conclude that these
bones belong to Species A? Perform a level 0.05 z-test to check. [Mendenhall
and Sincich, p. 45]
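A minimal Python sketch of the one-sample z-test follows. It assumes a two-sided test of H0: mean ratio = 8.5 against H1: mean ratio ≠ 8.5, and uses the sample standard deviation in place of the population value because n = 41 is large. The critical value 1.96 is the standard two-sided 5% point of the normal distribution.

```python
import math

mu0 = 8.5      # mean ratio for Species A under H0
n = 41         # number of fossil bones
x_bar = 9.26   # sample mean ratio
s = 1.20       # sample standard deviation

standard_error = s / math.sqrt(n)
z = (x_bar - mu0) / standard_error

critical = 1.96              # two-sided test at the 0.05 level (ASSUMED two-sided)
reject_h0 = abs(z) > critical

print(round(z, 2), reject_h0)
```

Here z is about 4.06, well beyond 1.96, so under these assumptions H0 is rejected: the data do not support the bones belonging to Species A.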
Pooled Variance
Paired t-test
A random sample of 8 bowlers was taken and their bowling speeds with a
white cricket ball and a red cricket ball were noted under similar
conditions. Use the t-test, with appropriate parameters to support the
hypothesis (state it clearly first) that there is no significant difference in
speeds between using the red and the white ball. The calculated t-value
should be approximately 1.09
Player                   A      B      C      D      E      F      G      H
Red Ball Speed        66.2   62.4   60.8   65.4   68.8   64.3   65.2   67.2
White Ball Speed      66.1   60.3   60.9   65.2   66.4   63.8   62.4   69.8
Difference in speed    0.1    2.1   -0.1    0.2    2.4    0.5    2.8   -2.6
Mean 0.675
Estimated Std Dev 1.750
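The paired t-test on the differences can be checked with the Python standard library. This is a sketch rather than the original worked solution. The 5% two-sided significance level, and hence the critical value 2.365 for 7 degrees of freedom, is an assumption, since the problem does not state a level.

```python
import math
from statistics import mean, stdev

red = [66.2, 62.4, 60.8, 65.4, 68.8, 64.3, 65.2, 67.2]
white = [66.1, 60.3, 60.9, 65.2, 66.4, 63.8, 62.4, 69.8]

# Paired design: work with the per-bowler differences (red minus white).
diffs = [r - w for r, w in zip(red, white)]

d_bar = mean(diffs)   # mean difference
s_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)
n = len(diffs)

# H0: mean difference = 0; H1: mean difference != 0.
t = d_bar / (s_d / math.sqrt(n))
df = n - 1
critical = 2.365      # ASSUMPTION: two-sided 5% level, t with 7 df (tables)

print(round(d_bar, 3), round(s_d, 3), round(t, 2), abs(t) < critical)
```

The calculated t of about 1.09 is below 2.365, so at the assumed 5% level the null hypothesis of no difference between the red and white ball speeds is not rejected.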