Lecture 06 Estimation of Differences of Means

The document discusses methods for comparing population means and proportions between two groups. It covers estimating the difference between two population means when variances are equal or unequal, as well as estimating the difference between two population proportions. Examples are provided to illustrate calculating confidence intervals for comparing population means.

Uploaded by

sxya.community.hk

CCN2311

FOUNDATIONS OF DATA SCIENCE

LECTURE 6
Estimation of Differences of Means
Estimation of Differences of Proportions
Estimation of Ratios of Variances
Topics
Estimation of Differences
• Differences of Means
• Ratios of Variances
• Differences of Proportions

CCN2311 Foundations of Data Science Page 2

Motivations
In many situations, we need to compare the parameters (for example,
the means or the proportions) of two different populations.
Examples
• Comparison of average ages of male and female
populations
• Comparison of proportions of smokers in Hong Kong
and China


Types of Comparisons
• To compare parameters of two populations there are
two approaches
– Differences of the parameters in two different populations
– Ratios of the parameters in two different populations
• The choice of differences vs ratios depends on the
sampling distribution of the statistics used


Comparing Population Means
Similar to the one-sample problem in Lecture 4, there are a number of
cases to consider. The following are two of the important situations.

Case 1: Normal populations with unknown but equal variances
Case 2: Normal populations with unknown and unequal variances

We will also see how to deal with the situation when we do not have
normal populations at the end of the section.


Comparing Population Means
Case 1 – Assumptions
Assumptions:
• $X_1, X_2, \dots, X_n$ are iid $N(\mu_x, \sigma_x^2)$ [Sample from the 1st population]
• $Y_1, Y_2, \dots, Y_m$ are iid $N(\mu_y, \sigma_y^2)$ [Sample from the 2nd population]
• No requirements on $m$ and $n$, i.e. large samples are not needed
• $X_i$, $i = 1, \dots, n$ and $Y_j$, $j = 1, \dots, m$ are all independent of each other
• $\mu_x$ and $\mu_y$ are unknown. $\sigma_x^2$ and $\sigma_y^2$ are unknown but equal, that is, $\sigma_x^2 = \sigma_y^2 = \sigma^2$


Comparing Population Means
Case 1 – Sampling Distribution
• If $\bar{X}$ and $\bar{Y}$ are the sample means from the two populations, then
$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma^2}{n} + \frac{\sigma^2}{m}\right)$$
• $S_x^2$ and $S_y^2$ are the sample variances from the two populations
$$S_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \qquad S_y^2 = \frac{\sum_{i=1}^{m} (Y_i - \bar{Y})^2}{m - 1}$$
• $S_p^2 = \dfrac{(n-1) S_x^2 + (m-1) S_y^2}{n+m-2}$ is the pooled sample variance for estimating the common variance $\sigma^2$
– $E(S_p^2) = \sigma^2$
– $\dfrac{(n+m-2) S_p^2}{\sigma^2} \sim \chi^2(n+m-2)$
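A short check of the unbiasedness claim for the pooled variance, sketched under the standard fact (assumed here, not derived in these slides) that each sample variance is unbiased for the common variance in Case 1, i.e. $E(S_x^2) = E(S_y^2) = \sigma^2$:

$$E(S_p^2) = \frac{(n-1)E(S_x^2) + (m-1)E(S_y^2)}{n+m-2} = \frac{(n-1)\sigma^2 + (m-1)\sigma^2}{n+m-2} = \sigma^2$$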


Comparing Population Means
Case 1 – Sampling Distribution
• $T = \dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$ has a t-distribution with $(n+m-2)$ degrees of freedom
• The confidence interval of $\mu_x - \mu_y$ is derived from the distribution of $T$


Comparing Population Means
Case 1 – Estimation
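• Point estimate of $\mu_x - \mu_y$ = $\bar{x} - \bar{y}$, where $\bar{x}$ and $\bar{y}$ are the observed sample means from the two populations
• Standard error = $s_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$, where $s_p = \sqrt{s_p^2}$ is the pooled sample standard deviation (the square root of the pooled sample variance)

$(1 - \alpha)100\%$ confidence interval of $\mu_x - \mu_y$

• $\bar{x} - \bar{y} \pm t_{\alpha/2;(n+m-2)}\, s_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$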


Interpretation of CI of the Difference of
Population Means
Interpretation of the CI
• If “0” is within the confidence interval, zero is a plausible value for the
difference of the two population means (i.e. they may be equal)
– We will say “We do not have enough evidence to conclude that the two
population means are not equal”
• If “0” is outside the confidence interval, zero is not a plausible value for the
difference of the two population means (i.e. they are likely unequal)
– We will say “We have enough evidence to conclude that the two population
means are not equal”
• Note: We will revisit this when we talk about two-sample
hypothesis tests in upcoming lectures.
Example 1 – Comparison of Population
Means (Case 1)
Suppose the weights (in kg) of a sample of 5 boys and a sample of 6
girls are listed below.
Boys: 50, 52, 55, 60, 48
Girls: 46, 48, 50, 52, 54, 50
It is known that weights of boys and weights of girls are both
normally distributed with equal variances.

1. Give a point estimate for the difference of the population mean
weights of boys and girls and calculate its standard error.
2. Find a 95% confidence interval for the difference of the
population mean weights of boys and girls.


Example 1 – Comparison of Population
Means (Case 1)
Solution:
For boys, $n = 5$, $\bar{x} = 53$, $s_x^2 = 22$
For girls, $m = 6$, $\bar{y} = 50$, $s_y^2 = 8$

Pooled variance = $s_p^2 = \dfrac{(n-1) s_x^2 + (m-1) s_y^2}{n+m-2} = \dfrac{(5-1)(22) + (6-1)(8)}{5+6-2} = 14.2222$

Point estimate of $\mu_x - \mu_y$ = $53 - 50 = 3$

Standard error = $s_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}} = \sqrt{14.2222} \times \sqrt{\dfrac{1}{5} + \dfrac{1}{6}} = 2.2836$


Example 1 – Comparison of Population
Means (Case 1)
Solution:
For a 95% confidence interval, $t_{\alpha/2;(n+m-2)} = t_{0.025;(9)} = 2.262$
The 95% confidence interval is given by
$$\bar{x} - \bar{y} \pm t_{\alpha/2;(n+m-2)}\, s_p \sqrt{\frac{1}{n} + \frac{1}{m}}$$
$$53 - 50 \pm 2.262 \sqrt{14.2222} \sqrt{\frac{1}{5} + \frac{1}{6}}$$
i.e. $(-2.1655, 8.1655)$
Therefore, we are 95% confident that the difference of the mean weights of boys and girls is between
−2.1655 kg and 8.1655 kg. Since 0 is within the
confidence interval, we do not have enough evidence to conclude that the two
population means are unequal.
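The arithmetic in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pooled_t_interval` is an illustrative choice, and the critical value 2.262 is the table value quoted in the solution rather than something the code computes.

```python
import math
from statistics import mean, variance


def pooled_t_interval(x, y, t_crit):
    """Case 1 interval for mu_x - mu_y (normal populations, equal variances).

    t_crit is the table value t_{alpha/2; n+m-2}, supplied by the caller.
    """
    n, m = len(x), len(y)
    # Pooled sample variance: weighted average of the two sample variances
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    # Standard error of the difference of the two sample means
    se = math.sqrt(sp2) * math.sqrt(1 / n + 1 / m)
    diff = mean(x) - mean(y)
    return diff, sp2, se, (diff - t_crit * se, diff + t_crit * se)


# Data from Example 1
boys = [50, 52, 55, 60, 48]
girls = [46, 48, 50, 52, 54, 50]

diff, sp2, se, ci = pooled_t_interval(boys, girls, t_crit=2.262)
print(f"point estimate = {diff}, pooled variance = {sp2:.4f}, SE = {se:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply the t quantile instead of a printed table.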


Comparing Population Means
Case 2 – Assumptions
Assumptions:
• $X_1, X_2, \dots, X_n$ are iid $N(\mu_x, \sigma_x^2)$ [Sample from the 1st population]
• $Y_1, Y_2, \dots, Y_m$ are iid $N(\mu_y, \sigma_y^2)$ [Sample from the 2nd population]
• No requirements on $m$ and $n$, i.e. large samples are not needed
• $X_i$, $i = 1, \dots, n$ and $Y_j$, $j = 1, \dots, m$ are all independent of each other
• $\mu_x$ and $\mu_y$ are unknown. $\sigma_x^2$ and $\sigma_y^2$ are unknown but NOT equal, that is, $\sigma_x^2 \neq \sigma_y^2$
• If we are uncertain whether $\sigma_x^2$ and $\sigma_y^2$ are equal, it is preferable to use Case 2


Comparing Population Means
Case 2 – Sampling Distribution
• If $\bar{X}$ and $\bar{Y}$ are the sample means from the two populations, then
$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}\right)$$
• $S_x^2$ and $S_y^2$ are the sample variances from the two populations
$$S_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \qquad S_y^2 = \frac{\sum_{i=1}^{m} (Y_i - \bar{Y})^2}{m - 1}$$
• Since $\sigma_x^2$ and $\sigma_y^2$ are unequal, they are estimated separately by $S_x^2$ and $S_y^2$
– $E(S_x^2) = \sigma_x^2$ and $E(S_y^2) = \sigma_y^2$
– $\dfrac{(n-1) S_x^2}{\sigma_x^2} \sim \chi^2(n-1)$ and $\dfrac{(m-1) S_y^2}{\sigma_y^2} \sim \chi^2(m-1)$

• There is no pooled variance in this case as $\sigma_x^2 \neq \sigma_y^2$


Comparing Population Means
Case 2 – Sampling Distribution
• $T = \dfrac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}}$ has an approximate t-distribution with $r$ degrees of freedom, where
$$r = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$
• The confidence interval of $\mu_x - \mu_y$ is derived from the distribution of $T$


Comparing Population Means
Case 2 – Estimation
Point Estimation of $\mu_x - \mu_y$
• Point estimate = $\bar{x} - \bar{y}$, where $\bar{x}$ and $\bar{y}$ are the observed sample means from the two populations
• Standard error = $\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$

$(1 - \alpha)100\%$ confidence interval of $\mu_x - \mu_y$

• $\bar{x} - \bar{y} \pm t_{\alpha/2;(r)} \sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}$


Example 2 – Comparison of Population
Means (Case 2)
Two samples of primary school students are selected from City A
and City B.
For City A, 10 students are selected; the mean and standard
deviation of their heights are 140 cm and 5 cm respectively.
For City B, 12 students are selected; the mean and standard
deviation of their heights are 135 cm and 4 cm respectively.

Suppose the heights of primary school students in City A
and City B are normally distributed as $N(\mu_A, \sigma_A^2)$ and $N(\mu_B, \sigma_B^2)$
respectively. Estimate the difference $\mu_A - \mu_B$ and find a 90%
confidence interval for the difference.


Example 2 – Comparison of Population
Means (Case 2)
Solution:
$n = 10$, $\bar{x}_A = 140$, $s_A^2 = 5^2$ and $m = 12$, $\bar{x}_B = 135$, $s_B^2 = 4^2$
• Populations are normal
• $\sigma_A^2$ and $\sigma_B^2$ are unknown and NOT equal
• Sample sizes are 10 and 12 (i.e. not large)

Point estimate of $\mu_A - \mu_B$ = $\bar{x}_A - \bar{x}_B = 140 - 135 = 5$

Standard error = $\sqrt{\dfrac{s_A^2}{n} + \dfrac{s_B^2}{m}} = \sqrt{\dfrac{5^2}{10} + \dfrac{4^2}{12}} = 1.9579$


Example 2 – Comparison of Population
Means (Case 2)
Solution:
Degrees of freedom for the CI is
$$r = \frac{\left(\dfrac{s_A^2}{n} + \dfrac{s_B^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_A^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_B^2}{m}\right)^2} = \frac{\left(\dfrac{5^2}{10} + \dfrac{4^2}{12}\right)^2}{\dfrac{1}{9}\left(\dfrac{5^2}{10}\right)^2 + \dfrac{1}{11}\left(\dfrac{4^2}{12}\right)^2} = 17.1652 \approx 17$$
Table value is $t_{\alpha/2;(r)} = t_{0.05;(17)} = 1.740$
Confidence interval is given by
$$\bar{x}_A - \bar{x}_B \pm t_{\alpha/2;(r)} \sqrt{\frac{s_A^2}{n} + \frac{s_B^2}{m}}$$
$$140 - 135 \pm 1.740 \sqrt{\frac{5^2}{10} + \frac{4^2}{12}}$$
i.e. $(1.5933, 8.4067)$
We are 90% confident that the difference in average heights of primary school students in
City A and B is between 1.5933 cm and 8.4067 cm.
Since 0 is not inside the confidence interval, we have enough evidence to conclude that the two
population means are unequal.


Example 3 – Comparison of Population
Means (Case 2)
A doctor was trying to study the difference in heart rates (in beats per minute) of
smokers and non-smokers.
For a random sample of 8 smokers, the mean and standard deviation of their
heart rates were 85 and 5 respectively.
For a random sample of 16 non-smokers, the mean and standard deviation of
their heart rates were 81 and 7 respectively.
It was known that heart rates of smokers and non-smokers are both normally
distributed but their variances may be different.

Estimate the difference in average heart rates of smokers and non-smokers and
find a 99% confidence interval for the difference.


Example 3 – Comparison of Population
Means (Case 2)
Solution:
$n = 8$, $\bar{x} = 85$, $s_x^2 = 5^2$ and $m = 16$, $\bar{y} = 81$, $s_y^2 = 7^2$
• Populations are normal
• $\sigma_x^2$ and $\sigma_y^2$ are unknown and NOT equal
• Sample sizes are 8 and 16 (i.e. not large)

Point estimate of $\mu_x - \mu_y$ = $\bar{x} - \bar{y} = 85 - 81 = 4$

Standard error = $\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}} = \sqrt{\dfrac{5^2}{8} + \dfrac{7^2}{16}} = 2.4875$

Degrees of freedom for the CI is
$$r = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2} = \frac{\left(\dfrac{5^2}{8} + \dfrac{7^2}{16}\right)^2}{\dfrac{1}{7}\left(\dfrac{5^2}{8}\right)^2 + \dfrac{1}{15}\left(\dfrac{7^2}{16}\right)^2} = 18.9498 \approx 19$$
Table value is $t_{\alpha/2;(r)} = t_{0.005;(19)} = 2.861$
Confidence interval is given by
$$\bar{x} - \bar{y} \pm t_{\alpha/2;(r)} \sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
$$85 - 81 \pm 2.861 \sqrt{\frac{5^2}{8} + \frac{7^2}{16}}$$
i.e. $(-3.1166, 11.1166)$
Therefore, we are 99% confident that the difference in mean heart rates
of smokers and non-smokers is between −3.1166 and 11.1166
beats per minute.
Since 0 is within the confidence interval, we do not have enough
evidence to conclude that the two population means are unequal.
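The Case 2 calculations in Examples 2 and 3 follow the same recipe, so they can be checked with one small helper. The sketch below is illustrative only: the function names `welch_df` and `welch_interval` are chosen here, and the critical values are the table values quoted in the solutions (looked up at the rounded degrees of freedom), not computed by the code.

```python
import math


def welch_df(sd_x, n, sd_y, m):
    """Approximate degrees of freedom r for the unequal-variance case."""
    vx, vy = sd_x ** 2 / n, sd_y ** 2 / m
    return (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))


def welch_interval(mean_x, sd_x, n, mean_y, sd_y, m, t_crit):
    """Point estimate, standard error and CI for mu_x - mu_y (Case 2)."""
    se = math.sqrt(sd_x ** 2 / n + sd_y ** 2 / m)
    diff = mean_x - mean_y
    return diff, se, (diff - t_crit * se, diff + t_crit * se)


# Example 2: heights in City A and City B, 90% interval, t_{0.05;17} = 1.740
r2 = welch_df(5, 10, 4, 12)
diff2, se2, ci2 = welch_interval(140, 5, 10, 135, 4, 12, t_crit=1.740)
print(f"Example 2: r = {r2:.4f}, SE = {se2:.4f}, CI = ({ci2[0]:.4f}, {ci2[1]:.4f})")

# Example 3: heart rates of smokers and non-smokers, 99% interval, t_{0.005;19} = 2.861
r3 = welch_df(5, 8, 7, 16)
diff3, se3, ci3 = welch_interval(85, 5, 8, 81, 7, 16, t_crit=2.861)
print(f"Example 3: r = {r3:.4f}, SE = {se3:.4f}, CI = ({ci3[0]:.4f}, {ci3[1]:.4f})")
```

Because the interval is symmetric about the point estimate, both limits should sit the same distance from it, which is a quick sanity check on hand calculations.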


Comparison of Population Means -
Other Situations
• Cases 1 and 2 only cover the situations when the populations are
normal. When the populations are not normal, the confidence
interval formulas can still be used as an approximation if both sample sizes
are large (say $n > 30$ and $m > 30$)

• Recall that as the degrees of freedom of a t-distribution go to
infinity, the t-distribution converges to the standard normal distribution. Therefore,
for large $n$ and $m$, $t_{\alpha/2}$ in the confidence interval formula can be
replaced by $z_{\alpha/2}$.
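To see what replacing the t value with a z value means numerically, the sketch below compares the t table values quoted in Examples 1 to 3 with the corresponding standard normal points. It uses `statistics.NormalDist` from the Python standard library; the helper name `z_upper` is an illustrative choice.

```python
from statistics import NormalDist


def z_upper(tail_area):
    """Upper-tail standard normal point z with P(Z > z) = tail_area."""
    return NormalDist().inv_cdf(1 - tail_area)


# t table values quoted in Examples 1 to 3:
# (upper-tail area, degrees of freedom) -> t value
t_table = {(0.025, 9): 2.262, (0.05, 17): 1.740, (0.005, 19): 2.861}

for (tail, df), t_val in t_table.items():
    z_val = z_upper(tail)
    print(f"tail = {tail}, df = {df}: t = {t_val:.3f}, z = {z_val:.3f}, "
          f"t - z = {t_val - z_val:.3f}")
```

With these small degrees of freedom each t value is noticeably larger than its z counterpart, so using z would give intervals that are too narrow; the substitution is only reasonable when both samples are large.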


Comparison of Population Variances
• Why do we need to compare population variances?
• This is usually done before the comparison of
population means because the equality of variances will
affect the choice of estimation methods for comparing
means


Comparison of Population Variances -
Assumptions
Assumptions:
• $X_1, X_2, \dots, X_n$ are iid $N(\mu_x, \sigma_x^2)$ [Sample from the 1st population]
• $Y_1, Y_2, \dots, Y_m$ are iid $N(\mu_y, \sigma_y^2)$ [Sample from the 2nd population]
• No requirements on $m$ and $n$, i.e. large samples are not needed
• $X_i$, $i = 1, \dots, n$ and $Y_j$, $j = 1, \dots, m$ are all independent of each other
• $\mu_x$ and $\mu_y$ are unknown
• $\sigma_x^2$ and $\sigma_y^2$ are unknown
• $\mu_x$ and $\mu_y$ may be unequal, and $\sigma_x^2$ and $\sigma_y^2$ may be unequal


Comparison of Population Variances –
Sampling Distributions
• $S_x^2$ and $S_y^2$ are the sample variances from the two populations
$$S_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \qquad S_y^2 = \frac{\sum_{i=1}^{m} (Y_i - \bar{Y})^2}{m - 1}$$
• The scaled sample variances are both chi-square distributed
$$\frac{(n-1) S_x^2}{\sigma_x^2} \sim \chi^2(n-1), \qquad \frac{(m-1) S_y^2}{\sigma_y^2} \sim \chi^2(m-1)$$

• The comparison of variances will rely on the following ratio
$$\frac{\dfrac{(n-1) S_x^2}{\sigma_x^2} \Big/ (n-1)}{\dfrac{(m-1) S_y^2}{\sigma_y^2} \Big/ (m-1)} = \frac{\sigma_y^2}{\sigma_x^2} \cdot \frac{S_x^2}{S_y^2} \sim F(n-1, m-1)$$


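Comparison of Population Variances –
Estimation
• Point estimate of $\dfrac{\sigma_x^2}{\sigma_y^2}$ = $\dfrac{s_x^2}{s_y^2}$

• $(1 - \alpha)100\%$ confidence interval of $\dfrac{\sigma_x^2}{\sigma_y^2}$ is given by
$$\frac{1}{F_{\alpha/2;(n-1,m-1)}} \cdot \frac{s_x^2}{s_y^2} < \frac{\sigma_x^2}{\sigma_y^2} < F_{\alpha/2;(m-1,n-1)} \cdot \frac{s_x^2}{s_y^2}$$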
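The interval can be traced back to the F-distributed ratio of the two scaled sample variances. A sketch of the algebra, assuming $F_{a;(u,v)}$ denotes the upper-tail point with area $a$ to its right (the convention the table values in this lecture appear to follow):

$$P\left(F_{1-\alpha/2;(n-1,m-1)} < \frac{\sigma_y^2}{\sigma_x^2} \cdot \frac{S_x^2}{S_y^2} < F_{\alpha/2;(n-1,m-1)}\right) = 1 - \alpha$$
$$\Longrightarrow \quad \frac{1}{F_{\alpha/2;(n-1,m-1)}} \cdot \frac{S_x^2}{S_y^2} < \frac{\sigma_x^2}{\sigma_y^2} < \frac{1}{F_{1-\alpha/2;(n-1,m-1)}} \cdot \frac{S_x^2}{S_y^2}$$

The upper limit then follows from the reciprocal property of the F-distribution, $\dfrac{1}{F_{1-\alpha/2;(n-1,m-1)}} = F_{\alpha/2;(m-1,n-1)}$, which is why the degrees of freedom are swapped on the right-hand side.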


Comparison of Population Variances –
Interpretation of the CI
• If “1” is within the CI, 1 is a plausible value for the ratio $\sigma_x^2/\sigma_y^2$,
i.e. the two variances may be equal
– We will say “We do not have enough evidence to conclude
that the two population variances are not equal”
• If “1” is not inside the CI, 1 is not a plausible value for the ratio $\sigma_x^2/\sigma_y^2$,
i.e. the two variances are likely unequal
– We will say “We have enough evidence to conclude that the
two population variances are not equal”


Example 4 – Comparison of Population
Variances
Suppose a sample of 10 boys and a sample of 8 girls were
selected. The sample variances of their IQs are 30 and 20
respectively.
If IQs of boys and girls are both normally distributed with
variances $\sigma_B^2$ and $\sigma_G^2$, find a 95% confidence interval for
the variance ratio $\sigma_B^2/\sigma_G^2$.


Example 4 – Comparison of Population
Variances
Solution:
$n = 10$, $s_B^2 = 30$, $m = 8$ and $s_G^2 = 20$
• The two populations are normal (important)
• Means and variances of the two populations may be unequal
• Sample sizes are not large
Table values for the 95% confidence interval
• $F_{\alpha/2;(n-1,m-1)} = F_{0.025;(9,7)} = 4.82$
• $F_{\alpha/2;(m-1,n-1)} = F_{0.025;(7,9)} = 4.20$

The 95% confidence interval of $\sigma_B^2/\sigma_G^2$ is given by
$$\frac{1}{F_{\alpha/2;(n-1,m-1)}} \cdot \frac{s_B^2}{s_G^2} < \frac{\sigma_B^2}{\sigma_G^2} < F_{\alpha/2;(m-1,n-1)} \cdot \frac{s_B^2}{s_G^2}$$


Example 4 – Comparison of Population
Variances
Solution:
The 95% confidence interval of $\sigma_B^2/\sigma_G^2$ is given by
$$\frac{1}{F_{\alpha/2;(n-1,m-1)}} \cdot \frac{s_B^2}{s_G^2} < \frac{\sigma_B^2}{\sigma_G^2} < F_{\alpha/2;(m-1,n-1)} \cdot \frac{s_B^2}{s_G^2}$$
$$\frac{1}{4.82} \cdot \frac{30}{20} < \frac{\sigma_B^2}{\sigma_G^2} < 4.20 \cdot \frac{30}{20}$$
$$0.3112 < \frac{\sigma_B^2}{\sigma_G^2} < 6.3$$
Therefore, we are 95% confident that $\sigma_B^2/\sigma_G^2$ is between 0.3112 and 6.3. Since the
value 1 is within the confidence interval, we do not have enough evidence to
conclude that the two variances are different.
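The variance-ratio interval in Example 4 only needs the two sample variances and the two F table values, so it is easy to check in code. The following is a rough sketch: `variance_ratio_interval` and its parameter names are illustrative, and the F values 4.82 and 4.20 are the table values quoted in the solution.

```python
def variance_ratio_interval(s2_x, s2_y, f_nm, f_mn):
    """CI for sigma_x^2 / sigma_y^2 from sample variances and F table values.

    f_nm is F_{alpha/2; (n-1, m-1)} and f_mn is F_{alpha/2; (m-1, n-1)},
    both upper-tail table values supplied by the caller.
    """
    ratio = s2_x / s2_y  # point estimate of the variance ratio
    return ratio, ratio / f_nm, ratio * f_mn


# Example 4: boys (n = 10, s^2 = 30) and girls (m = 8, s^2 = 20)
ratio, lower, upper = variance_ratio_interval(30, 20, f_nm=4.82, f_mn=4.20)
contains_one = lower < 1 < upper

print(f"point estimate = {ratio}, 95% CI = ({lower:.4f}, {upper:.4f})")
print("1 is inside the interval" if contains_one else "1 is outside the interval")
```

Note that the interval is not symmetric about the point estimate, because the lower limit divides by one F value while the upper limit multiplies by another.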


Example 5 – Comparison of Population
Variances
A study was carried out to compare the average salaries of male and female workers in the IT
industry.
13 male IT workers were selected and their average salary was $26000 with a
standard deviation of $1500.
10 female IT workers were also selected and their average salary was $28000
with a standard deviation of $1000.
It is known that male IT workers’ salaries are $N(\mu_m, \sigma_m^2)$ and female IT workers’
salaries are $N(\mu_f, \sigma_f^2)$.
(a) Construct a 95% confidence interval for the variance ratio $\sigma_m^2/\sigma_f^2$.
(b) Based on the result of (a), choose a suitable method to construct a 95%
confidence interval for the difference in average salaries of male and female IT
workers.

Example 5 – Comparison of Population
Variances
Solution:
$n_m = 13$, $\bar{x}_m = 26000$, $s_m^2 = 1500^2$
$n_f = 10$, $\bar{x}_f = 28000$, $s_f^2 = 1000^2$
(a) 95% CI for $\sigma_m^2/\sigma_f^2$
$$\frac{1}{F_{0.025;(12,9)}} \cdot \frac{1500^2}{1000^2} < \frac{\sigma_m^2}{\sigma_f^2} < F_{0.025;(9,12)} \cdot \frac{1500^2}{1000^2}$$
$$\frac{1}{3.87} \cdot \frac{1500^2}{1000^2} < \frac{\sigma_m^2}{\sigma_f^2} < 3.44 \cdot \frac{1500^2}{1000^2}$$
$$0.5814 < \frac{\sigma_m^2}{\sigma_f^2} < 7.74$$
We are 95% confident that the variance ratio $\sigma_m^2/\sigma_f^2$ is between 0.5814 and 7.74.
Since “1” is within the confidence interval, there is not enough evidence to
conclude that the two population variances are unequal.


Example 5 – Comparison of Population
Variances
Solution:
(b) Based on the conclusion in part (a), we will assume that the variances of the two populations are
equal when constructing the 95% confidence interval for the difference in population means.
Since we assume equality of variances, we need to find the pooled sample variance.
$$s_p^2 = \frac{(n_m - 1) s_m^2 + (n_f - 1) s_f^2}{n_m + n_f - 2} = \frac{(13 - 1)(1500^2) + (10 - 1)(1000^2)}{13 + 10 - 2} = 1714285.7142$$
Taking the difference as female minus male, so that the point estimate is positive, the 95% confidence interval for $\mu_f - \mu_m$ is given by
$$\bar{x}_f - \bar{x}_m \pm t_{\alpha/2;(n_m+n_f-2)}\, s_p \sqrt{\frac{1}{n_m} + \frac{1}{n_f}}$$
$$28000 - 26000 \pm t_{0.025;(21)} \sqrt{1714285.7142} \sqrt{\frac{1}{13} + \frac{1}{10}}$$
i.e. (854.4945, 3145.506)
We are 95% confident that the difference between the average salaries of female and male IT
workers (female minus male) is between $854.4945 and $3145.506. Since “0” is not inside the CI, we have enough
evidence to conclude that the average salaries of male and female IT workers are unequal.
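The slide does not show the intermediate arithmetic for part (b). A sketch of the missing steps, assuming the standard t table value $t_{0.025;(21)} = 2.080$ (this value is not printed in the slides):

$$s_p \sqrt{\frac{1}{13} + \frac{1}{10}} = \sqrt{1714285.7142 \times \frac{23}{130}} \approx \sqrt{303296.70} \approx 550.724$$
$$2000 \pm 2.080 \times 550.724 \approx 2000 \pm 1145.506$$

Note that $28000 - 26000$ is the female sample mean minus the male sample mean, so the interval obtained this way refers to $\mu_f - \mu_m$; the corresponding interval for $\mu_m - \mu_f$ has the same width with both limits negated.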


Comparison of Population Proportions
• If we are interested in comparing the opinions (yes/no
responses) of people in two populations, we can do
that by comparing proportions of the opinion (say yes)
of the two populations
• For example,
– Proportion of unemployment in male vs female populations
– Proportion of childless families in Hong Kong vs China
– Proportion of students with iPad in secondary school vs
college


Comparison of Population Proportions -
Assumptions
Assumptions:
• $X_1, X_2, \dots, X_n$ are iid Bernoulli random variables with parameter $p_x$ [Sample
from the 1st population]
• $Y_1, Y_2, \dots, Y_m$ are iid Bernoulli random variables with parameter $p_y$ [Sample
from the 2nd population]
• Large sample sizes are required, i.e. $n > 30$, $m > 30$, $n\hat{p}_x > 5$, $n(1 - \hat{p}_x) > 5$,
$m\hat{p}_y > 5$, $m(1 - \hat{p}_y) > 5$
• $X_i$, $i = 1, \dots, n$ and $Y_j$, $j = 1, \dots, m$ are all independent of each other


Comparison of Population Proportions -
Sampling Distributions
• Sample proportions
$\hat{p}_x$ is approximately $N\left(p_x, \dfrac{p_x(1 - p_x)}{n}\right)$ and $\hat{p}_y$ is approximately $N\left(p_y, \dfrac{p_y(1 - p_y)}{m}\right)$
• Difference of sample proportions
$\hat{p}_x - \hat{p}_y$ is approximately $N\left(p_x - p_y, \dfrac{p_x(1 - p_x)}{n} + \dfrac{p_y(1 - p_y)}{m}\right)$
• Confidence interval of $p_x - p_y$ is based on
$$Z = \frac{(\hat{p}_x - \hat{p}_y) - (p_x - p_y)}{\sqrt{\dfrac{\hat{p}_x(1 - \hat{p}_x)}{n} + \dfrac{\hat{p}_y(1 - \hat{p}_y)}{m}}}$$
$Z$ is approximately $N(0, 1)$ when sample sizes are large.


Comparison of Population Proportions -
Estimation
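• Point estimate of $p_x - p_y$ = $\hat{p}_x - \hat{p}_y$

• Standard error = $\sqrt{\dfrac{\hat{p}_x(1 - \hat{p}_x)}{n} + \dfrac{\hat{p}_y(1 - \hat{p}_y)}{m}}$
• $(1 - \alpha)100\%$ confidence interval
$$\hat{p}_x - \hat{p}_y \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_x(1 - \hat{p}_x)}{n} + \frac{\hat{p}_y(1 - \hat{p}_y)}{m}}$$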


Example 6 – Comparison of Population
Proportions
An economist would like to find out if the proportion of unemployed
males ($p_m$) and the proportion of unemployed females ($p_f$) are different
in a town.
Among a random sample of 50 males, 10 of them are unemployed.
Among a random sample of 80 females, 20 of them are
unemployed.
Estimate the difference in the unemployment proportions $p_m - p_f$
and find a 99% confidence interval for the difference.


Example 6 – Comparison of Population
Proportions
Solution:
$n = 50$, $\hat{p}_m = \dfrac{10}{50} = 0.2$ and $m = 80$, $\hat{p}_f = \dfrac{20}{80} = 0.25$
Point estimate of $p_m - p_f$ = $\hat{p}_m - \hat{p}_f = 0.2 - 0.25 = -0.05$

Standard error = $\sqrt{\dfrac{\hat{p}_m(1 - \hat{p}_m)}{n} + \dfrac{\hat{p}_f(1 - \hat{p}_f)}{m}} = \sqrt{\dfrac{0.2(1 - 0.2)}{50} + \dfrac{0.25(1 - 0.25)}{80}} = 0.0745$
The 99% confidence interval is
$$\hat{p}_m - \hat{p}_f \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_m(1 - \hat{p}_m)}{n} + \frac{\hat{p}_f(1 - \hat{p}_f)}{m}}$$
$$0.2 - 0.25 \pm 2.575\,(0.0745)$$
i.e. $(-0.2417, 0.1417)$
We are 99% confident that the difference in unemployment rates of males and females is
between −0.2417 and 0.1417. Since “0” is within the confidence interval, we do not have
enough evidence to conclude that the unemployment rates of males and females are
different.


Example 7 – Comparison of Population
Proportions
In order to study the effect of smoking on lung problems, a doctor interviewed a
random sample of 100 smokers and a random sample of 80 non-smokers. The
doctor asked if they had any lung problems in the last 3 months. The collected
information is summarized in the table below.
Smoking Status   Had lung problems        No lung problems         Total
                 in the last 3 months     in the last 3 months
Smoking          45                       55                       100
Non-smoking      20                       60                       80

Let $p_s$ and $p_n$ be the proportions of smokers and non-smokers who had
lung problems in the last 3 months. Construct a 95% confidence interval for $p_s - p_n$.

Example 7 – Comparison of Population
Proportions
Solution:
$n = 100$, $\hat{p}_s = \dfrac{45}{100} = 0.45$, $m = 80$, $\hat{p}_n = \dfrac{20}{80} = 0.25$
The 95% confidence interval is
$$\hat{p}_s - \hat{p}_n \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_s(1 - \hat{p}_s)}{n} + \frac{\hat{p}_n(1 - \hat{p}_n)}{m}}$$
$$0.45 - 0.25 \pm 1.96 \sqrt{\frac{0.45(1 - 0.45)}{100} + \frac{0.25(1 - 0.25)}{80}}$$
i.e. $(0.0639, 0.3361)$

We are 95% confident that the difference in proportions of people who had
lung problems in the last 3 months between smokers and non-smokers is
between 0.0639 and 0.3361. Since “0” is outside the confidence interval, we have
enough evidence to conclude that the two proportions are unequal.
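Examples 6 and 7 use the same large-sample formula, so both can be checked with one short function. This is a minimal sketch: the name `two_proportion_interval` is illustrative, the z values 2.575 and 1.96 are the table values used in the solutions, and the size check applies each sample's own size to its own proportion, which is an interpretation of the assumptions slide.

```python
import math


def two_proportion_interval(count_x, n, count_y, m, z_crit):
    """Large-sample CI for p_x - p_y from two independent samples."""
    p_x, p_y = count_x / n, count_y / m
    # Large-sample conditions (assumed reading: each sample uses its own size)
    large_enough = (
        n > 30 and m > 30
        and n * p_x > 5 and n * (1 - p_x) > 5
        and m * p_y > 5 and m * (1 - p_y) > 5
    )
    se = math.sqrt(p_x * (1 - p_x) / n + p_y * (1 - p_y) / m)
    diff = p_x - p_y
    return diff, se, (diff - z_crit * se, diff + z_crit * se), large_enough


# Example 6: unemployed males (10 of 50) vs females (20 of 80), 99% interval
diff6, se6, ci6, ok6 = two_proportion_interval(10, 50, 20, 80, z_crit=2.575)
print(f"Example 6: diff = {diff6:.2f}, SE = {se6:.4f}, "
      f"CI = ({ci6[0]:.4f}, {ci6[1]:.4f}), large enough: {ok6}")

# Example 7: lung problems in smokers (45 of 100) vs non-smokers (20 of 80), 95% interval
diff7, se7, ci7, ok7 = two_proportion_interval(45, 100, 20, 80, z_crit=1.96)
print(f"Example 7: diff = {diff7:.2f}, SE = {se7:.4f}, "
      f"CI = ({ci7[0]:.4f}, {ci7[1]:.4f}), large enough: {ok7}")
```

Returning the size check alongside the interval makes it harder to report a normal-approximation interval for samples that are too small to justify it.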


Final Words – Comparison of
DEPENDENT Samples
• In some situations, we need to compare population means using
dependent samples.
• For example, we want to know the change in average weight of
a group of obese children before and after a diet. It is typically
done in the following way.
– A random sample of obese children is selected (only 1 sample)
– The weights of the i-th child before and after the diet are $X_i$ and $Y_i$
– However, $X_i$ and $Y_i$ do not satisfy the independence assumption required,
because they are measured on the same child!
– To compare the difference we need to look at the random variable
$D_i = X_i - Y_i$
– Methods in Lecture 5 (one sample problem) can be used to estimate the
mean difference by looking at $D_i$
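As a rough illustration of the paired approach, the sketch below reduces two dependent samples to a single sample of differences and then applies a one-sample t interval. The weights are made-up numbers for illustration only, the differences are assumed to be normally distributed, and 2.776 is the standard table value $t_{0.025;(4)}$ for 5 pairs.

```python
import math
from statistics import mean, stdev

# Hypothetical weights (kg) of 5 children before and after a diet (made-up data)
before = [82, 75, 90, 68, 77]
after = [78, 74, 85, 66, 75]

# Reduce the two dependent samples to one sample of differences D_i = X_i - Y_i
d = [x - y for x, y in zip(before, after)]

d_bar = mean(d)                      # point estimate of the mean difference
se = stdev(d) / math.sqrt(len(d))    # standard error of the mean difference
t_crit = 2.776                       # t_{0.025; 4} from a standard t table
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(f"differences = {d}")
print(f"mean difference = {d_bar:.2f}, SE = {se:.4f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

Working with the differences sidesteps the independence requirement between the two sets of measurements, since only the pairs (different children) need to be independent of each other.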


Next Lecture
• In the next lecture, we will look at how we can use data to
make decisions by means of hypothesis testing.


Lecture 06 Estimation of Differences of Means

Uploaded by

Lecture 06 Estimation of Differences of Means

Uploaded by

CCN2311

FOUNDATIONS OF DATA SCIENCE

CCN2311 Foundations of Data Science Page 2

CCN2311 Foundations of Data Science Page 3

CCN2311 Foundations of Data Science Page 4

Case 1: Normal populations and unknown variances, but the

$(1-\alpha)100\%$ Confidence interval of $\mu_x - \mu_y$
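
As a minimal sketch, assuming this interval belongs to the equal-variance (pooled) case, the pooled variance and the standard error of the difference of sample means could be computed as below. The function names and the numeric inputs are hypothetical and are not taken from the lecture; the t critical value that would multiply the standard error is left out.

```python
import math

def pooled_variance(var_x, n_x, var_y, n_y):
    """Pooled sample variance: each sample variance weighted by its degrees of freedom."""
    return ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)

def pooled_std_error(var_x, n_x, var_y, n_y):
    """Standard error of (x-bar minus y-bar) under the equal-variance assumption."""
    sp2 = pooled_variance(var_x, n_x, var_y, n_y)
    return math.sqrt(sp2 * (1.0 / n_x + 1.0 / n_y))

# Hypothetical sample variances and sample sizes, for illustration only.
sp2 = pooled_variance(4.0, 11, 9.0, 21)
se = pooled_std_error(4.0, 11, 9.0, 21)
print(sp2, se)
```

The interval would then be the point estimate plus or minus the t critical value with n_x + n_y − 2 degrees of freedom times this standard error.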

1. Give a point estimate for the difference of the population mean

Point estimate of $\mu_x - \mu_y$ = 53 − 50 = 3

• There is no pooled variance in this case, as $\sigma_x^2 \neq \sigma_y^2$

degrees of freedom, where

$(1-\alpha)100\%$ Confidence interval of $\mu_x - \mu_y$
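
For the unequal-variance case, a common choice for the degrees of freedom is the Welch-Satterthwaite approximation. The sketch below assumes that this is the formula the lecture refers to; the function names and inputs are hypothetical.

```python
import math

def welch_std_error(var_x, n_x, var_y, n_y):
    """Standard error of (x-bar minus y-bar) without pooling the variances."""
    return math.sqrt(var_x / n_x + var_y / n_y)

def welch_df(var_x, n_x, var_y, n_y):
    """Welch-Satterthwaite approximate degrees of freedom (usually not an integer)."""
    vx = var_x / n_x
    vy = var_y / n_y
    return (vx + vy) ** 2 / (vx ** 2 / (n_x - 1) + vy ** 2 / (n_y - 1))

# Hypothetical sample variances and sample sizes, for illustration only.
df_equal = welch_df(4.0, 10, 4.0, 10)
df_unequal = welch_df(4.0, 10, 25.0, 15)
print(df_equal, df_unequal)
```

When the two samples have the same variance and size, the approximation reduces to n_x + n_y − 2; otherwise it lies between the smaller of n_x − 1 and n_y − 1 and that value.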

Suppose the average heights of primary school students in City A

Point estimate of $\mu_A - \mu_B$ = $\bar{x}_A - \bar{x}_B$ = 140 − 135 = 5

• Recall that when degrees of freedom of t-distribution goes to

• The comparison of variances will rely on the following ratio

The 95% confidence interval of $\sigma_B^2 / \sigma_G^2$ is given by

$\hat{p}_x(1-\hat{p}_x)$, $\hat{p}_y(1-\hat{p}_y)$

$\hat{p}_m(1-\hat{p}_m)$, $\hat{p}_f(1-\hat{p}_f)$: $0.2(1-0.2)$, $0.25(1-0.25)$

0.45(1 − 0.45) 0.25(1 − 0.25)
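
The two products above are the numerators of the variance terms in the large-sample interval for a difference of proportions. A minimal sketch of that interval follows, using the sample proportions 0.45 and 0.25 from the line above. The sample sizes of 100 each are hypothetical, since the original values are not recoverable here, and the function name is an illustration only.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(p_x, n_x, p_y, n_y, confidence=0.95):
    """Large-sample (normal approximation) CI for p_x - p_y."""
    diff = p_x - p_y
    se = math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, se, (diff - z * se, diff + z * se)

# Sample sizes of 100 are assumed for illustration; they are not from the lecture.
diff, se, (lower, upper) = diff_proportion_ci(0.45, 100, 0.25, 100)
print(diff, se, lower, upper)
```

With these assumed sample sizes the interval excludes zero; smaller samples would widen it.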
