MDAE FC 2022-23 Stats Problem&Solutions

Problem Set for Discussions
18th Sep 2022
Index:
Probability
Probability and Bayes' Theorem
Binomial and Normal Approximation
Poisson and Normal Approximation
Poisson – Also clarifies PDF-CDF
Poisson and Chi-Square for Goodness of Fit
Chi Square Test for Independence
Chi Square Goodness of Fit
One Sample z-Test
Pooled Variance, Simple t-test
Simple t-test (Calculating backwards), Paired t-test
Confidence Interval and Unpaired t-test
Complete Problem on Regression
Correlation Coefficient and Goodness of Fit
Probability
A card is drawn at random from a pack of 52 cards. The probability of getting a 3 of diamonds is __________
There is only one 3 of diamonds in a pack of 52 cards, so the probability = 1/52.
A die has the numbers 0, 0, 1, 1, 4 and 5 on its faces. It is rolled twice. The probability of getting
the same number on both throws is ___________ From Chpt 25, [E].
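The die problem can be checked by direct enumeration. This is a minimal sketch, assuming the die is rolled twice and the rolls are independent (the statement refers to "both the throws"); the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

faces = [0, 0, 1, 1, 4, 5]   # the six faces given in the problem
counts = Counter(faces)      # how many faces show each number

# P(same number on both rolls) = sum over distinct numbers of P(number)^2,
# assuming the two rolls are independent.
p_same = sum(Fraction(c, len(faces)) ** 2 for c in counts.values())
print(p_same)  # 5/18
```

The two doubled faces (0 and 1) each contribute (2/6)^2, and the single faces (4 and 5) each contribute (1/6)^2.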
The table shows the left-handed and right-handed girls and boys in a class.
Two of the girls are chosen at random. The probability that exactly one of
these girls is left-handed is ___________ 0607_s13_qp_41_8b
Can you figure out why it is ?

Probability
The probability of A hitting a target is and that of B is . They both fire at

the target. The probability that only one of them will hit the target is
___________
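P[only one hits] = P[A hits target] + P[B hits target] − 2 P[A AND B hit], where P[A AND B hit] = P[A hits target] × P[B hits target], assuming the two shots are independent.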
Each of the three fair coins has been painted on its two sides. The first coin
is red on one side and blue on the other. The second coin is blue on one side
and green on the other. The third coin is green on one side and red on the
other. If the three coins are tossed in the air, the probability that the coins
show up three different colours is ___________
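The three-coin problem has only 8 equally likely outcomes, so it can be enumerated. The sketch below assumes the coins are fair and tossed independently, as the problem states; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each coin is described by the colours on its two sides.
coins = [("red", "blue"), ("blue", "green"), ("green", "red")]

outcomes = list(product(*coins))  # 2 * 2 * 2 = 8 equally likely outcomes
favourable = [o for o in outcomes if len(set(o)) == 3]  # three different colours

p_three_colours = Fraction(len(favourable), len(outcomes))
print(favourable)        # the two outcomes with three distinct colours
print(p_three_colours)   # 1/4
```

Only (red, blue, green) and (blue, green, red) show three different colours, giving 2/8 = 1/4.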
Bayes' Theorem
Assume that Disease X is a rare disease, and only 2% of people have it.
Suppose that at your regular physical exam you test positive for Disease X.
Although Disease X has only mild symptoms, you are concerned and ask
your doctor about the accuracy of the test. It turns out that the test is 95%
accurate. What is the probability you have this disease?
A picture is worth a thousand words – however here are a few to explain. Take a population of 1,000,000 people: 20,000 (2%) have the disease and 980,000 do not.
19,800 people who tested positive would actually have the disease and
88,200 people who tested positive would not have the disease. This means
that of all those who tested positive, only 19,800/(19,800 + 88,200) = 0.1833 have it.
If you insist on using Bayes' Theorem,
Let D be the event that you have Disease X. Since only 2% of people in your
situation have Disease X, the prior probability of Event D is 0.02. Or, more
formally, P(D) = 0.02. If D' represents the event that D is false (you do not
have the disease), then P(D') = 1 - P(D) = 0.98.
To define the diagnostic value of the test, we need to define another event:
that you test positive for Disease X. Let's call this Event T. The diagnostic
value of the test depends on the probability you will test positive given that
you actually have the disease, written as P(T|D), and the probability you
test positive given that you do not have the disease, written as P(T|D').
Bayes' theorem shown below allows you to calculate P(D|T), the probability
that you have the disease given that you test positive for it.
The various terms and the formula are:
P(T|D) = 0.99
P(T|D') = 0.09
P(D) = 0.02
P(D') = 0.98
P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|D') P(D')]
Substitute, and you will get the same answer: P(D|T) = 0.0198 / (0.0198 + 0.0882) = 0.1833.
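The substitution can be checked numerically. This sketch uses the rates the solution itself works with (P(T|D) = 0.99 and P(T|D') = 0.09) rather than the single "95% accurate" figure in the problem statement; the population of 1,000,000 is an assumption inferred from the counts 19,800 and 88,200.

```python
p_d = 0.02              # prior: P(D)
p_t_given_d = 0.99      # P(T|D), probability of a positive test if diseased
p_t_given_not_d = 0.09  # P(T|D'), probability of a positive test if healthy
p_not_d = 1 - p_d       # P(D')

# Total probability of testing positive, then Bayes' theorem.
p_t = p_t_given_d * p_d + p_t_given_not_d * p_not_d
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 4))  # 0.1833

# The same calculation as head counts in an assumed population of 1,000,000.
population = 1_000_000
true_positives = round(population * p_d * p_t_given_d)           # 19800
false_positives = round(population * p_not_d * p_t_given_not_d)  # 88200
print(true_positives / (true_positives + false_positives))
```

Both routes give about 18%: even after a positive test, the disease remains unlikely because it is rare to begin with.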
Binomial and Normal Approximation

(i) In a certain country, 68% of the households have a printer. Find
the probability that, in a random sample of 8 households, 5, 6 or 7
households have a printer.
(ii) Use an approximation to find the probability that, in a random
sample of 500 households, more than 337 households have a
printer.
(iii) Justify your use of approximation in part (ii).
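(i) Let X be the number of households with a printer, so X ~ B(8, 0.68).
P(5 ≤ X ≤ 7) = C(8,5)(0.68)^5(0.32)^3 + C(8,6)(0.68)^6(0.32)^2 + C(8,7)(0.68)^7(0.32) ≈ 0.722
(ii) We take μ = np = 500 × 0.68 = 340, σ² = npq = 500 × 0.68 × 0.32 = 108.8 (Normal approximation to Binomial)
P(X > 337) ≈ P(Y > 337.5) = P(Z > (337.5 − 340)/√108.8) = P(Z > −0.240) ≈ 0.595, where Y ~ N(340, 108.8) and 337.5 applies the continuity correction.
(iii) Both np = 340 and nq = 160 are greater than 5, so the normal approximation is justified.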
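The binomial and normal-approximation calculations can be reproduced with the Python standard library. This is a sketch only: the helper binom_pmf is a name introduced here, and the exact binomial tail is included as a check on the approximation, not as part of the original solution.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.68  # proportion of households with a printer


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# (i) P(5 <= X <= 7) for a sample of 8 households
p_i = sum(binom_pmf(k, 8, p) for k in (5, 6, 7))

# (ii) Normal approximation with continuity correction for n = 500
n = 500
mu = n * p                       # np = 340
sigma = sqrt(n * p * (1 - p))    # sqrt(npq) = sqrt(108.8)
p_ii = 1 - NormalDist(mu, sigma).cdf(337.5)

# Exact binomial tail P(X >= 338), for comparison with the approximation
p_ii_exact = sum(binom_pmf(k, n, p) for k in range(338, n + 1))

print(round(p_i, 3), round(p_ii, 3), round(p_ii_exact, 3))
```

The closeness of the exact and approximate values in part (ii) illustrates the justification asked for in part (iii).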
Poisson and Normal Approximation

A publishing firm has found that errors in the first draft of a new book occur
at random and that, on average, there is 1 error in every 3 pages of a first
draft. Find the probability that in a particular first draft there are
(i) exactly 2 errors in 10 pages
(ii) at least 3 errors in 6 pages
(iii) fewer than 50 errors in 200 pages.
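P(X = x) = e^(−λ) λ^x / x!
For the first one, we take λ = 10/3 and hence P(X = 2) = e^(−10/3) (10/3)^2 / 2! ≈ 0.198
For the second one, we take λ = 2 and the solution is given by
P(X ≥ 3) = 1 − P(X = 0) − P(X = 1) − P(X = 2) = 1 − e^(−2)(1 + 2 + 2) ≈ 0.323
For the last question, we take X ~ Po(200/3), approximated by Y ~ N(200/3, 200/3).
We want P(X < 50) ≈ P(Y < 49.5) = P(Z < (49.5 − 66.67)/√66.67) = P(Z < −2.10) ≈ 0.018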
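The three Poisson answers can be checked as follows. This is a minimal sketch using only the standard library; pois_pmf is a helper name introduced here, and the continuity correction (49.5) is the usual choice for "fewer than 50".

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

rate = 1 / 3  # average errors per page


def pois_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# (i) exactly 2 errors in 10 pages: lambda = 10/3
p_i = pois_pmf(2, 10 * rate)

# (ii) at least 3 errors in 6 pages: lambda = 2
p_ii = 1 - sum(pois_pmf(k, 6 * rate) for k in range(3))

# (iii) fewer than 50 errors in 200 pages: lambda = 200/3,
# approximated by N(lambda, lambda) with continuity correction
lam = 200 * rate
p_iii = NormalDist(lam, sqrt(lam)).cdf(49.5)

print(round(p_i, 3), round(p_ii, 3), round(p_iii, 4))
```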
Poisson – Also clarifies PDF-CDF

Cotton cloth is sold from long rolls of cloth. The number of flaws on a
randomly chosen piece of cloth of length a metres has a Poisson distribution
with mean 0.8a. The random variable X is the length of cloth, in metres,
between two successive flaws.
(i) Explain why, for x > 0, P(X > x) = e^(−0.8x).
(ii) Find the probability that there is at least one flaw in a 4 metre length of
cloth.
(iii) Find
(a) the distribution function of X
(b) the probability density function of X
(c) the interquartile range of X
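When we define the variable X, we are not saying it is between the second
and the third flaw or between the nineteenth and the twentieth. It really
does not matter (lack of memory of the exponential distribution).
(i) For the event X > x, we want zero flaws in x metres. The number of flaws in x metres is Poisson with mean 0.8x, so
P(X > x) = P(zero flaws in x metres) = e^(−0.8x)
(ii) That being said, for a 4 metre length we are looking at P(zero flaws) = e^(−0.8 × 4) = e^(−3.2) ≈ 0.041
P(at least one flaw) = 1 – P(no flaw) = 1 − e^(−3.2) ≈ 0.959
We have 3 parts to the last question, so
(a) F(x) = P(X ≤ x) = 1 − P(X > x) = 1 − e^(−0.8x) for x ≥ 0
(b) Differentiating the distribution function, f(x) = F'(x) = 0.8 e^(−0.8x) for x ≥ 0
(c) We want F(x) = ¼ to give us Q1 and F(x) = ¾ to give us Q3. IQR = Q3 – Q1
= ln(4)/0.8 − ln(4/3)/0.8 = ln(3)/0.8, which gives IQR ≈ 1.373
(Just in case you want to verify: Q1 ≈ 0.360 and Q3 ≈ 1.733)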
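The link between the Poisson count and the exponential gap can be checked numerically. The sketch below assumes the result from part (i), P(X > x) = e^(−0.8x), and derives the distribution function, density and quartiles from it; the function names are illustrative.

```python
from math import exp, log

rate = 0.8  # flaws per metre


def cdf(x):
    """F(x) = P(X <= x) = 1 - P(zero flaws in x metres)."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0


def pdf(x):
    """f(x) = F'(x), the density of the gap between flaws."""
    return rate * exp(-rate * x) if x >= 0 else 0.0


def quantile(q):
    """Solve F(x) = q for x."""
    return -log(1 - q) / rate


p_at_least_one = cdf(4)  # at least one flaw in 4 metres
q1, q3 = quantile(0.25), quantile(0.75)
iqr = q3 - q1            # equals ln(3) / 0.8

print(round(p_at_least_one, 3), round(q1, 3), round(q3, 3), round(iqr, 3))
```

Note that "at least one flaw in 4 metres" is the same event as X ≤ 4, which is why part (ii) equals F(4).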

Poisson and Chi-Square for Goodness of Fit
The numbers of a particular type of laptop computer sold by a store on each of 100
consecutive Saturdays are summarised in the following table.
Numbers Sold         0   1   2   3   4   5   6   7   >= 8
Number of Saturdays  7   20  39  16  14  2   1   1   0
Fit a Poisson distribution to the data and carry out a goodness of fit test at the
2.5% significance level … 2014W
The mean of the sample data is a weighted average:
λ = (0×7 + 1×20 + 2×39 + 3×16 + 4×14 + 5×2 + 6×1 + 7×1)/100 = 225/100 = 2.25
The null hypothesis H0: Poisson Distribution fits the Data
We have to find expected values, using P(X = x) = e^(−2.25) (2.25)^x / x! and E(X = x) = 100 × P(X = x)
x         0         1         2         3         4         5         6         7         >=8       SUM
P(X = x)  0.105399  0.237148  0.266792  0.200094  0.112553  0.050649  0.018993  0.006105  0.002267  1
E(X = x)  10.54     23.71     26.68     20.01     11.26     5.06      1.90      0.61      0.23      100
Combine the last 4 cells so that the expected value is more than 5
x         0         1         2         3         4         >=5       SUM
P(X = x)  0.105399  0.237148  0.266792  0.200094  0.112553  0.078014  1
E(X = x)  10.54     23.71     26.68     20.01     11.26     7.80      100
Finally, the Chi-Square computations are given below:

x                    0      1      2      3      4      >=5    SUM
E(X = x)             10.54  23.71  26.68  20.01  11.26  7.80   100
Observed             7      20     39     16     14     4      100
(Obs - Exp)^2 / Exp  1.189  0.581  5.689  0.804  0.667  1.851  10.78
There are 6 cells, so the df = 5. However, we lost one more df because we
computed λ from the data itself. So we have to look at df = 4.
χ2 4, 0.975 = 11.14. Our computed value, 10.78, is (just) less than the table value.
So we do not reject the Null Hypothesis: at the 2.5% significance level the data are consistent with a Poisson distribution.
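The whole goodness-of-fit calculation can be reproduced in a few lines. This is a sketch using only the standard library; the critical value 11.143 is taken from chi-square tables (4 df, upper 2.5%), not computed, and the expected values are left unrounded, so the statistic differs slightly from the hand calculation in the third decimal place.

```python
from math import exp, factorial

sold = list(range(8))                      # 0..7 laptops; the ">= 8" cell is empty
saturdays = [7, 20, 39, 16, 14, 2, 1, 1]   # observed number of Saturdays
n = sum(saturdays)                         # 100

# Estimate lambda as the weighted mean of the data.
lam = sum(x * f for x, f in zip(sold, saturdays)) / n   # 2.25


def pois_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Cells 0, 1, 2, 3, 4 and a combined ">= 5" cell.
probs = [pois_pmf(k, lam) for k in range(5)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]
observed = saturdays[:5] + [sum(saturdays[5:])]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1   # one df lost for the total, one for estimating lambda
critical = 11.143            # table value: chi-square, 4 df, upper 2.5%

print(round(chi_sq, 2), df, chi_sq < critical)
```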
Chi Square Test for Independence
Test for independence of hair colour and city at a 10% level of significance.
(State the null hypothesis and the alternate hypothesis clearly and use the
appropriate test.)
                 Hair Colour
        Fair  Red  Medium  Dark  Jet Black
City A  59    11   84      50    36
City B  54    97   67      45    14
City C  40    30   20      40    10
Calculate all the expected numbers assuming independence (row total × column total / grand total; for example, City A with Fair hair gives 240*153/657 ≈ 56). The table, with expected numbers in brackets, should look like
                 Hair Colour
        Fair     Red      Medium   Dark     Jet Black  Totals
City A  59 (56)  11 (51)  84 (62)  50 (49)  36 (22)    240
City B  54 (64)  97 (58)  67 (72)  45 (57)  14 (25)    277
City C  40 (33)  30 (29)  20 (37)  40 (29)  10 (13)    140
Totals  153      138      171      135      60         657
H0 : Null Hypothesis - The attributes city and hair colour are independent
H1 : Alternate Hypothesis – The attributes are dependent
We calculate Chi-Square using the short-cut formula χ2 = ∑ (O^2/E) − N, where O is the observed number, E the expected number and N the grand total,
and get 754.9643 – 657 = 97.9643.
We have (3-1)(5-1) = 8 degrees of freedom and the Chi-Square from the table,
χ2 8, 0.1 = 13.362
Since the calculated value is greater (in fact, much greater) than the table
value, we reject the null hypothesis and conclude at 10% LOS that the hair
colour attribute is dependent on the city.
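The test for independence can be sketched in code as follows. Expected numbers here are kept unrounded, so the statistic comes out near 97.1 rather than the 97.96 obtained by hand from whole-number expected values; the conclusion is unchanged. The critical value 13.362 is the table value quoted above.

```python
# Observed counts: rows are cities A, B, C; columns are
# Fair, Red, Medium, Dark, Jet Black.
observed = [
    [59, 11, 84, 50, 36],
    [54, 97, 67, 45, 14],
    [40, 30, 20, 40, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # grand total, 657

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Short-cut formula: chi-square = sum(O^2 / E) - N.
chi_sq = sum(
    o * o / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
) - n

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (3-1)(5-1) = 8
critical = 13.362                                   # table value: 8 df, upper 10%

print(round(chi_sq, 2), df, chi_sq > critical)
```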
Calculate Expected Frequencies Using Definite Integrals.

Goodness of Fit using Chi Square.
…. 2013
Fitting an appropriate distribution and testing goodness of fit using Chi Square
One sample z-Test
Humerus bones from the same species of animal have approximately the
same length-to-width ratios. It is known that Species A has a mean ratio of
8.5. Suppose that 41 fossil humerus bones were unearthed at a site where
Species A is known to have flourished. (We assume that all bones are from
the same species.) The length-to-width ratios of these bones have sample
mean 9.26 and sample standard deviation 1.20. Can we conclude that these
bones belong to Species A? Perform a level 0.05 z-test to check. [Mendenhall
and Sincich, p. 45]
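No worked solution survives for this problem, so the following is only a sketch of how the test might be carried out. It assumes a two-sided alternative (H0: μ = 8.5 against H1: μ ≠ 8.5) and uses the sample standard deviation in place of σ, which is the usual large-sample z-test.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 8.5     # mean ratio for Species A under H0
n = 41        # number of fossil bones
xbar = 9.26   # sample mean ratio
s = 1.20      # sample standard deviation
alpha = 0.05

z = (xbar - mu0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for a two-sided test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_crit

print(round(z, 2), round(z_crit, 2), reject)
```

Under these assumptions z is about 4.06, well beyond 1.96, so H0 would be rejected: the data do not support the bones belonging to Species A.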
Pooled Variance
Simple t-test – Assume Population Normal

There are a large number of students at a particular college. The heights, in
metres, of a random sample of 8 students are as follows:
1.75, 1.72, 1.62, 1.70, 1.82, 1.75, 1.68, 1.84
You may assume that heights of students are normally distributed.
(a) Test, at the 5% significance level, whether the population mean height of
students at this college is greater than 1.70 metres
(b) Find a 95% confidence interval for the population mean height of
students at this college.
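No worked solution survives for this problem either; the sketch below shows one way to carry out both parts. The critical values 1.895 (one-tailed 5%, 7 df) and 2.365 (two-tailed 5%, 7 df) are taken from standard t tables and hard-coded here.

```python
from math import sqrt
from statistics import mean, stdev

heights = [1.75, 1.72, 1.62, 1.70, 1.82, 1.75, 1.68, 1.84]
n = len(heights)

xbar = mean(heights)   # sample mean
s = stdev(heights)     # sample standard deviation (divisor n - 1)
se = s / sqrt(n)       # standard error of the mean

# (a) H0: mu = 1.70 against H1: mu > 1.70, one-tailed test at 5%
t = (xbar - 1.70) / se
t_crit_one_tail = 1.895            # t table, 7 df, upper 5%
reject = t > t_crit_one_tail

# (b) 95% confidence interval for the population mean
t_crit_two_tail = 2.365            # t table, 7 df, upper 2.5%
ci = (xbar - t_crit_two_tail * se, xbar + t_crit_two_tail * se)

print(round(xbar, 3), round(t, 2), reject)
print(tuple(round(v, 3) for v in ci))
```

With these data t is about 1.37, below 1.895, so there is insufficient evidence at the 5% level that the mean height exceeds 1.70 metres.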
Simple t-test … Computing some parameters backwards
Paired t-test
A random sample of 8 bowlers was taken and their bowling speeds with a
white cricket ball and a red cricket ball were noted under similar
conditions. Use the t-test, with appropriate parameters to support the
hypothesis (state it clearly first) that there is no significant difference in
speeds between using the red and the white ball. The calculated t-value
should be approximately 1.09
Player            A     B     C     D     E     F     G     H
Red Ball Speed    66.2  62.4  60.8  65.4  68.8  64.3  65.2  67.2
White Ball Speed  66.1  60.3  60.9  65.2  66.4  63.8  62.4  69.8
We use a paired t-test on the differences (red minus white), with H0: the mean difference in speed is zero. The results, in brief, are shown below.
Difference in speed  0.1  2.1  -0.1  0.2  2.4  0.5  2.8  -2.6
Mean                 0.675
Estimated Std Dev    1.750
Then t = 0.675/(1.75/sqrt(8)) = 1.09. We check this with tables at 7 df and
0.975. The tabulated value is 2.36. Since 1.09 < 2.36, we do not reject the null hypothesis of no significant difference in speeds.
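The paired t-test figures can be reproduced from the raw speeds. This is a minimal sketch using the standard library; the critical value 2.365 is the two-tailed 5% table value at 7 df (quoted as 2.36 in the text).

```python
from math import sqrt
from statistics import mean, stdev

red = [66.2, 62.4, 60.8, 65.4, 68.8, 64.3, 65.2, 67.2]
white = [66.1, 60.3, 60.9, 65.2, 66.4, 63.8, 62.4, 69.8]

# Work with the per-bowler differences (red minus white).
diffs = [r - w for r, w in zip(red, white)]
d_bar = mean(diffs)    # mean difference
s_d = stdev(diffs)     # estimated standard deviation (divisor n - 1)

t = d_bar / (s_d / sqrt(len(diffs)))
t_crit = 2.365         # t table, 7 df, two-tailed 5%
reject = abs(t) > t_crit

print(round(d_bar, 3), round(s_d, 3), round(t, 2), reject)
```

Pairing is appropriate here because each bowler is measured with both balls, so the bowler-to-bowler variation cancels in the differences.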
Confidence Intervals and Unpaired t-test

Complete Problem on Regression

Correlation Coefficient and Goodness of Fit
