MDAE FC 2022-23 Stats Problem&Solutions


Problem Set for Discussions Page 1 of 17

18th Sep 2022

Index:
Probability
Probability and Bayes' Theorem
Binomial and Normal Approximation
Poisson and Normal Approximation
Poisson – Also clarifies PDF-CDF
Poisson and Chi-Square for Goodness of Fit
Chi Square Test for Independence
Chi Square Goodness of Fit
One Sample z-Test
Pooled Variance, Simple t-test
Simple t-test (Calculating backwards), Paired t-test
Confidence Interval and Unpaired t-test
Complete Problem on Regression
Correlation Coefficient and Goodness of Fit

Probability

A card is drawn at random from a pack of 52 cards. The probability of
getting the 3 of diamonds is __________

There is only one 3 of diamonds in a pack of 52 cards, so the probability is 1/52.

A die has the numbers 0, 0, 1, 1, 4 and 5 on its faces. It is rolled twice. The
probability of getting the same number on both throws is ___________ (From Chpt 25, [E].)
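
If you want to check your answer to the die question, here is a minimal sketch. It assumes the die is thrown twice and that the two throws are independent; the variable names are only illustrative.

```python
from collections import Counter
from fractions import Fraction

faces = [0, 0, 1, 1, 4, 5]       # the six faces of the die
counts = Counter(faces)           # how many faces carry each number

# Assumption: two independent throws.
# P(same number twice) = sum over each number of P(number)^2
p_same = sum(Fraction(c, len(faces)) ** 2 for c in counts.values())

print(p_same)   # 5/18
```

The numbers 0 and 1 each contribute (2/6)^2 and the numbers 4 and 5 each contribute (1/6)^2, which adds up to 10/36 = 5/18.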

The table shows the left-handed and right-handed girls and boys in a class.
Two of the girls are chosen at random. The probability that exactly one of
these girls is left-handed is ___________ 0607_s13_qp_41_8b

Can you figure out why it is ?



Probability

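The probability of A hitting a target is and that of B is . They both fire at
the target. The probability that only one of them will hit the target is
___________

P[only one hits] = P[A hits target] + P[B hits target] − 2 P[A AND B hit],
where, taking the two shots to be independent, P[A AND B hit] = P[A hits target] × P[B hits target].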

Each of the three fair coins has been painted on its two sides. The first coin
is red on one side and blue on the other. The second coin is blue on one side
and green on the other. The third coin is green on one side and red on the
other. If the three coins are tossed in the air, the probability that the coins
show up three different colours is ___________
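
The three-coin question can be settled by listing all the outcomes. The sketch below does exactly that, assuming each coin lands on either side with probability 1/2 and that the tosses are independent.

```python
from fractions import Fraction
from itertools import product

# Each tuple holds the colours on the two sides of one coin.
coins = [("red", "blue"), ("blue", "green"), ("green", "red")]

outcomes = list(product(*coins))                          # 2 * 2 * 2 = 8 equally likely outcomes
favourable = [o for o in outcomes if len(set(o)) == 3]    # all three colours differ

p_three_colours = Fraction(len(favourable), len(outcomes))

print(favourable)        # [('red', 'blue', 'green'), ('blue', 'green', 'red')]
print(p_three_colours)   # 1/4
```

Only two of the eight outcomes show three different colours, so the probability is 2/8 = 1/4.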

Bayes' Theorem

Assume that Disease X is a rare disease, and only 2% of people have it.
Suppose that at your regular physical exam you test positive for Disease X.
Although Disease X has only mild symptoms, you are concerned and ask
your doctor about the accuracy of the test. It turns out that the test is 95%
accurate. What is the probability you have this disease?

A picture is worth a thousand words; however, here are a few to explain.

In a population of 1,000,000 people, with P(T|D) = 0.99 and P(T|D') = 0.09,
19,800 people who tested positive would actually have the disease and
88,200 people who tested positive would not have the disease. This means
that of all those who tested positive, only 19,800/(19,800 + 88,200) = 0.1833 have it.

If you insist on using Bayes' Theorem,

Let D be the event that you have Disease X. Since only 2% of people in your
situation have Disease X, the prior probability of Event D is 0.02. Or, more
formally, P(D) = 0.02. If D' represents the event that D is false (you do not
have the disease), then P(D') = 1 - P(D) = 0.98.

To define the diagnostic value of the test, we need to define another event:
that you test positive for Disease X. Let's call this Event T. The diagnostic
value of the test depends on the probability you will test positive given that
you actually have the disease, written as P(T|D), and the probability you
test positive given that you do not have the disease, written as P(T|D').
Bayes' theorem shown below allows you to calculate P(D|T), the probability
that you have the disease given that you test positive for it.

The various terms and the formula are:

P(T|D) = 0.99
P(T|D') = 0.09
P(D) = 0.02
P(D') = 0.98

P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|D') P(D')]

Substitute, and you will get the same answer:
P(D|T) = 0.0198 / (0.0198 + 0.0882) = 0.1833
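
As a quick numerical check of the Bayes' Theorem route, here is a small sketch using the four values listed in the solution. The variable names are my own.

```python
p_d = 0.02              # P(D): prior probability of having Disease X
p_not_d = 1 - p_d       # P(D')
p_t_given_d = 0.99      # P(T|D): positive test given disease
p_t_given_not_d = 0.09  # P(T|D'): positive test given no disease

# Total probability of testing positive
p_t = p_t_given_d * p_d + p_t_given_not_d * p_not_d

# Bayes' Theorem: P(D|T) = P(T|D) P(D) / P(T)
p_d_given_t = p_t_given_d * p_d / p_t

print(round(p_d_given_t, 4))   # 0.1833
```

This agrees with the counting argument, 19,800/(19,800 + 88,200).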

Binomial and Normal Approximation


(i) In a certain country, 68% of the households have a printer. Find
the probability that, in a random sample of 8 households, 5, 6 or 7
households have a printer.

(ii) Use an approximation to find the probability that, in a random
sample of 500 households, more than 337 households have a
printer.

(iii) Justify your use of approximation in part (ii).

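(i) With X ~ B(8, 0.68):
P(5 ≤ X ≤ 7) = C(8,5)(0.68)^5(0.32)^3 + C(8,6)(0.68)^6(0.32)^2 + C(8,7)(0.68)^7(0.32) ≈ 0.722

(ii) We take μ = np = 500 × 0.68 = 340, σ² = npq = 500 × 0.68 × 0.32 = 108.8 (Normal approximation to Binomial)

With a continuity correction,
P(X > 337) ≈ P(Z > (337.5 − 340)/√108.8) = P(Z > −0.240) ≈ 0.595

(iii) Both np = 340 and nq = 160 are greater than 5.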
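
To check parts (i) and (ii) numerically, here is a minimal sketch. It assumes X ~ B(8, 0.68) for part (i) and, for part (ii), a normal approximation to B(500, 0.68) with a continuity correction at 337.5.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.68   # proportion of households with a printer

# (i) exact binomial probability of 5, 6 or 7 households out of 8
p_5_to_7 = sum(comb(8, k) * p**k * (1 - p) ** (8 - k) for k in (5, 6, 7))

# (ii) normal approximation: mean np, variance np(1 - p)
n = 500
mu = n * p
sigma = sqrt(n * p * (1 - p))
p_more_than_337 = 1 - NormalDist(mu, sigma).cdf(337.5)   # continuity correction

print(round(p_5_to_7, 3), round(p_more_than_337, 3))
```

The two values come out at about 0.722 and 0.595.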

Poisson and Normal Approximation


A publishing firm has found that errors in the first draft of a new book occur
at random and that, on average, there is 1 error in every 3 pages of a first
draft. Find the probability that in a particular first draft there are
(i) exactly 2 errors in 10 pages
(ii) at least 3 errors in 6 pages
(iii) fewer than 50 errors in 200 pages.

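For the first one, we take λ = 10/3 and hence P(X = 2) = e^(−10/3) (10/3)^2 / 2! ≈ 0.198

For the second one, we take λ = 6/3 = 2 and the solution is given by

P(X ≥ 3) = 1 − P(X = 0) − P(X = 1) − P(X = 2) = 1 − e^(−2) (1 + 2 + 2) ≈ 0.323

For the last question, we take X ~ Po(200/3), which we approximate by N(200/3, 200/3)

We want P(X < 50) = P(X ≤ 49) ≈ P(Z < (49.5 − 66.67)/√66.67) = P(Z < −2.10) ≈ 0.0178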
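
Here is a short sketch for checking all three parts. It assumes a Poisson rate of 1/3 errors per page and, for part (iii), a normal approximation with a continuity correction at 49.5. The helper `poisson_pmf` is my own, not from any library.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

rate = 1 / 3   # average errors per page

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# (i) exactly 2 errors in 10 pages, lambda = 10/3
p_i = poisson_pmf(2, rate * 10)

# (ii) at least 3 errors in 6 pages, lambda = 2
p_ii = 1 - sum(poisson_pmf(k, rate * 6) for k in range(3))

# (iii) fewer than 50 errors in 200 pages, Po(200/3) approximated by N(200/3, 200/3)
lam = rate * 200
p_iii = NormalDist(lam, sqrt(lam)).cdf(49.5)

print(round(p_i, 3), round(p_ii, 3), round(p_iii, 4))
```

The three probabilities come out at about 0.198, 0.323 and 0.0178.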


Poisson – Also clarifies PDF-CDF


Cotton cloth is sold from long rolls of cloth. The number of flaws on a
randomly chosen piece of cloth of length a metres has a Poisson distribution
with mean 0.8a. The random variable X is the length of cloth, in metres,
between two successive flaws.
(i) Explain why, for x > 0, P(X > x) = e^(−0.8x)
(ii) Find the probability that there is at least one flaw in a 4 metre length of
cloth.
(iii) Find
(a) the distribution function of X
(b) the probability density function of X
(c) the interquartile range of X

When we define the variable X, we are not saying it is between the second
and the third flaw or between the nineteenth and the twentieth. It really
does not matter (lack of memory of the exponential distribution).
For the event X > x, we want zero flaws in x metres. The number of flaws in
x metres is Po(0.8x), so
P(X > x) = P(zero flaws in x metres) = e^(−0.8x)

For a 4 metre length the mean is 0.8 × 4 = 3.2, so
P(at least one flaw) = 1 – P(no flaw) = 1 − e^(−3.2) ≈ 0.959

We have 3 parts to the last question, so

(a) F(x) = P(X ≤ x) = 1 − P(X > x) = 1 − e^(−0.8x), for x > 0

(b) The probability density function f(x) = F'(x) = 0.8 e^(−0.8x), for x > 0

(c) We want F(x) = ¼ to give us Q1 and F(x) = ¾ to give us Q3. IQR = Q3 – Q1

Q3 − Q1 = (ln 4 − ln(4/3))/0.8 = (ln 3)/0.8, which gives IQR ≈ 1.37

(Just in case you want to verify: Q1 ≈ 0.360 and Q3 ≈ 1.733)
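
If you want to verify the quartiles yourself, here is a small sketch. It assumes X is exponential with rate 0.8 per metre, so that F(x) = 1 − e^(−0.8x). The function names are only illustrative.

```python
from math import exp, log

rate = 0.8   # mean number of flaws per metre

def cdf(x):
    """Distribution function F(x) = P(X <= x) of the gap between flaws."""
    return 1 - exp(-rate * x) if x > 0 else 0.0

def quantile(q):
    """Solve F(x) = q for x."""
    return -log(1 - q) / rate

p_flaw_in_4m = cdf(4)                        # at least one flaw in 4 metres
q1, q3 = quantile(0.25), quantile(0.75)
iqr = q3 - q1

print(round(p_flaw_in_4m, 3), round(q1, 3), round(q3, 3), round(iqr, 3))
```

The output is about 0.959 for part (ii), with Q1 ≈ 0.360, Q3 ≈ 1.733 and IQR ≈ 1.373.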



Poisson and Chi-Square for Goodness of Fit

The numbers of a particular type of laptop computer sold by a store on each of 100
consecutive Saturdays are summarised in the following table.
Numbers Sold           0    1    2    3    4    5    6    7   >= 8
Number of Saturdays    7   20   39   16   14    2    1    1      0

Fit a Poisson distribution to the data and carry out a goodness of fit test at the
2.5% significance level … 2014W

The mean of the sample data is a weighted average:
λ = Σ(x × f) / Σf = (0×7 + 1×20 + 2×39 + 3×16 + 4×14 + 5×2 + 6×1 + 7×1) / 100 = 225/100 = 2.25

The null hypothesis H0: Poisson Distribution fits the Data

We have to find expected values, using P(X = x) = e^(−λ) λ^x / x! with λ = 2.25, and E(X = x) = 100 × P(X = x)

x           0         1         2         3         4         5         6         7         >=8       SUM
P(X = x)    0.105399  0.237148  0.266792  0.200094  0.112553  0.050649  0.018993  0.006105  0.002267  1
E(X = x)    10.54     23.71     26.68     20.01     11.26     5.06      1.9       0.61      0.23      100

Combine the last 4 cells so that the expected value is more than 5:
x           0         1         2         3         4         >=5       SUM
P(X = x)    0.105399  0.237148  0.266792  0.200094  0.112553  0.078014  1
E(X = x)    10.54     23.71     26.68     20.01     11.26     7.8       100

Finally, the Chi-Square computations are given below:


x                      0       1       2       3       4       >=5     SUM
E(X = x)               10.54   23.71   26.68   20.01   11.26   7.8     100
Observed               7       20      39      16      14      4       100
(Obs - Exp)^2 / Exp    1.189   0.581   5.689   0.804   0.667   1.851   10.78

There are 6 cells, so the df = 5. However, we lose one more df because we
computed λ from the data itself. So we have to look at df = 4.
χ²(4, 0.975) = 11.14. Our computed value, 10.78, is (just) less than the table value.
So we do not reject the Null Hypothesis: the Poisson distribution is an
acceptable fit at the 2.5% significance level.
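
The whole goodness-of-fit calculation can be reproduced with the sketch below. It assumes the last four cells are pooled into ">= 5" and takes the critical value 11.143 from chi-square tables rather than computing it. Expected values are kept unrounded here, so the statistic differs very slightly from the hand calculation.

```python
from math import exp, factorial

# Observed Saturdays for 0, 1, ..., 7 and ">= 8" laptops sold
observed_raw = [7, 20, 39, 16, 14, 2, 1, 1, 0]
n = sum(observed_raw)                                       # 100 Saturdays

# Sample mean as a weighted average (the ">= 8" cell has frequency 0)
lam = sum(x * f for x, f in enumerate(observed_raw)) / n    # 2.25

# Poisson probabilities for 0..4, then everything else pooled into ">= 5"
probs = [exp(-lam) * lam**x / factorial(x) for x in range(5)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]
observed = observed_raw[:5] + [sum(observed_raw[5:])]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1     # one df lost for the total, one for estimating lambda
critical = 11.143              # table value for 4 df, upper 2.5% (assumed from tables)

print(round(chi_sq, 2), df, chi_sq < critical)
```

The statistic comes out close to 10.78 with 4 df, which is below the table value, so the Poisson fit is not rejected.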

Chi Square Test for Independence

Test for independence of hair colour and city at a 10% level of significance.
(State the null hypothesis and the alternate hypothesis clearly and use the
appropriate test.)
                          Hair Colour
          Fair   Red   Medium   Dark   Jet Black
City A    59     11    84       50     36
City B    54     97    67       45     14
City C    40     30    20       40     10

Calculate all the expected numbers assuming independence (shown in brackets).
For example, the expected number for City A and Fair hair is 240*153/657 ≈ 56.
The table should look like

                                Hair Colour
          Fair      Red       Medium    Dark      Jet Black   Totals
City A    59 (56)   11 (51)   84 (62)   50 (49)   36 (22)     240
City B    54 (64)   97 (58)   67 (72)   45 (57)   14 (25)     277
City C    40 (33)   30 (29)   20 (37)   40 (29)   10 (13)     140
Totals    153       138       171       135       60          657

H0 : Null Hypothesis - The attributes city and hair colour are independent
H1 : Alternate Hypothesis – The attributes are dependent

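We calculate Chi-Square using the short-cut formula χ² = Σ(O²/E) − N, where O is
an observed number, E the corresponding expected number and N the grand total,

and get 754.9643 – 657 = 97.9643.

We have (3-1)(5-1) = 8 degrees of freedom and the Chi-Square from the table,

χ²(8, 0.1) = 13.362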

Since the calculated value is greater (in fact, much greater) than the table
value, we reject the null hypothesis and conclude at 10% LOS that the hair
colour attribute is dependent on the city.
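
Here is a sketch of the same test in code. It computes the expected numbers as (row total × column total)/N without rounding, and takes the critical value 13.362 from tables. It also confirms that the short-cut formula and the usual Σ(O − E)²/E formula agree.

```python
observed = [
    [59, 11, 84, 50, 36],   # City A
    [54, 97, 67, 45, 14],   # City B
    [40, 30, 20, 40, 10],   # City C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected number under independence: row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(o, e) for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row)]

chi_sq = sum((o - e) ** 2 / e for o, e in cells)     # usual formula
shortcut = sum(o**2 / e for o, e in cells) - n       # short-cut formula

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 13.362            # table value for 8 df at the 10% level
reject_h0 = chi_sq > critical

print(round(chi_sq, 2), round(shortcut, 2), df, reject_h0)
```

With unrounded expected numbers the statistic is about 97.1, a little below the 97.96 obtained from the rounded expected numbers in the table. Either way it is far above 13.362, so the conclusion is unchanged.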

Calculate Expected Frequencies Using Definite Integrals.


Goodness of Fit using Chi Square.

…. 2013

Fitting an appropriate distribution and testing goodness of fit using Chi Square

One sample z-Test

Humerus bones from the same species of animal have approximately the
same length-to-width ratios. It is known that Species A has a mean ratio of
8.5. Suppose that 41 fossil humerus bones were unearthed at a site where
Species A is known to have flourished. (We assume that all bones are from
the same species.) The length-to-width ratios of these bones have sample
mean 9.26 and sample standard deviation 1.20. Can we conclude that these
bones belong to Species A? Perform a level 0.05 z-test to check. [Mendenhall
and Sincich, p. 45]
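
One way to set up this test is sketched below. It assumes a two-sided alternative (H0: mean ratio = 8.5 against H1: mean ratio ≠ 8.5) and uses the sample standard deviation in place of the population value, which is reasonable for n = 41.

```python
from math import sqrt
from statistics import NormalDist

n = 41        # number of fossil bones
xbar = 9.26   # sample mean ratio
s = 1.20      # sample standard deviation
mu0 = 8.5     # mean ratio for Species A under H0

# Test statistic
z = (xbar - mu0) / (s / sqrt(n))

# Two-sided critical value at the 0.05 level (assumed alternative: mean != 8.5)
z_crit = NormalDist().inv_cdf(0.975)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_crit

print(round(z, 2), round(z_crit, 2), reject_h0)
```

The statistic is about 4.06, well beyond 1.96, so under these assumptions the data do not support the bones belonging to Species A.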

Pooled Variance

Simple t-test – Assume Population Normal


There are a large number of students at a particular college. The heights, in
metres, of a random sample of 8 students are as follows:
1.75, 1.72, 1.62, 1.70, 1.82, 1.75, 1.68, 1.84
You may assume that heights of students are normally distributed.
(a) Test, at the 5% significance level, whether the population mean height of
students at this college is greater than 1.70 metres
(b) Find a 95% confidence interval for the population mean height of
students at this college.
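
A minimal sketch for both parts follows. The critical values 1.895 (one-tailed, 5%) and 2.365 (two-tailed, 5%) for 7 df are taken from t tables and typed in by hand, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

heights = [1.75, 1.72, 1.62, 1.70, 1.82, 1.75, 1.68, 1.84]
n = len(heights)

xbar = mean(heights)
s = stdev(heights)        # sample standard deviation (divides by n - 1)
se = s / sqrt(n)          # standard error of the mean

# (a) H0: mean = 1.70 against H1: mean > 1.70
t = (xbar - 1.70) / se
t_crit_one_tailed = 1.895           # t tables, 7 df, upper 5% (assumed from tables)
reject_h0 = t > t_crit_one_tailed

# (b) 95% confidence interval for the mean
t_crit_two_tailed = 2.365           # t tables, 7 df, upper 2.5% (assumed from tables)
ci = (xbar - t_crit_two_tailed * se, xbar + t_crit_two_tailed * se)

print(round(xbar, 3), round(t, 2), reject_h0, [round(c, 3) for c in ci])
```

The sample mean is 1.735 and t is about 1.37, which is below 1.895, so there is not enough evidence at the 5% level that the mean height exceeds 1.70 metres. The interval is roughly 1.675 to 1.795 metres.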

Simple t-test … Computing some parameters backwards

Paired t-test
A random sample of 8 bowlers was taken and their bowling speeds with a
white cricket ball and a red cricket ball were noted under similar
conditions. Use the t-test, with appropriate parameters to support the
hypothesis (state it clearly first) that there is no significant difference in
speeds between using the red and the white ball. The calculated t-value
should be approximately 1.09

Player              A      B      C      D      E      F      G      H
Red Ball Speed      66.2   62.4   60.8   65.4   68.8   64.3   65.2   67.2
White Ball Speed    66.1   60.3   60.9   65.2   66.4   63.8   62.4   69.8

We use a paired t-test. The results in brief, are shown below.

Difference in speed    0.1    2.1    -0.1   0.2    2.4    0.5    2.8    -2.6
Mean                   0.675
Estimated Std Dev      1.750

Then t = 0.675/(1.75/sqrt(8)) = 1.09. We check this with tables at 7 df and
0.975. The tabulated value is 2.36. Since 1.09 < 2.36, we do not reject the
null hypothesis that there is no difference in mean speed between the red
and the white ball.
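
The paired t-test figures can be checked with the sketch below. Differences are taken as red minus white, and the critical value 2.365 for 7 df is copied from t tables.

```python
from math import sqrt
from statistics import mean, stdev

red = [66.2, 62.4, 60.8, 65.4, 68.8, 64.3, 65.2, 67.2]
white = [66.1, 60.3, 60.9, 65.2, 66.4, 63.8, 62.4, 69.8]

# Paired differences, red minus white, one per bowler
diffs = [r - w for r, w in zip(red, white)]

d_bar = mean(diffs)
s_d = stdev(diffs)                      # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(len(diffs)))

t_crit = 2.365                          # t tables, 7 df, two-tailed 5% (assumed from tables)
reject_h0 = abs(t) > t_crit

print(round(d_bar, 3), round(s_d, 3), round(t, 2), reject_h0)
```

The mean difference is 0.675, the standard deviation about 1.750 and t about 1.09, matching the figures in the brief results.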

Confidence Intervals and Unpaired t-test



Complete Problem on Regression



Correlation Coefficient and Goodness of Fit
