
Week 1 Lecture 2

The document provides an introduction to Quantitative Methods 2 (QM2), focusing on estimation and hypothesis testing of a population mean. It reviews key concepts such as sampling distributions, confidence intervals, and the procedures for hypothesis testing, including a six-step process. Additionally, it discusses desirable properties of point estimators and includes examples related to estimating population means and testing hypotheses.

Uploaded by

jdmckie
Copyright
© © All Rights Reserved

ECON20003: QM2

WEEK 1: INTRODUCTION AND GENERAL INFORMATION ABOUT QUANTITATIVE METHODS 2 (QM2)
ESTIMATION AND HYPOTHESIS TESTING OF A POPULATION MEAN

References:
S: § 9.3-9.4, 10.1-10.3, 10.5, 12.1-12.4
W: Ch 1-2, § 3.1-3.6, 3.8-3.9

Faculty of Business and Economics
Department of Economics

DESCRIBING A SINGLE POPULATION:
ESTIMATION AND HYPOTHESIS TESTING OF A
POPULATION MEAN

• In QM1 you already learnt about how to describe a single population by estimating
its mean and by performing a hypothesis test on it.
You are supposed to be familiar with these topics, so in QM2 we just briefly review
them.

In the case of random sampling, any statistic is a random variable because it is a
function of some randomly selected sample items.

Sampling Distribution: The probability distribution of all possible values of a
statistic generated by random samples of the same size is called the sampling
distribution of the given statistic.
Like probability distributions in general, sampling distributions can be characterized
by their mean, variance (standard deviation), and shape.

UoM, ECON 20003, Week 1 3


SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

• Consider a random sample X_1, X_2, ..., X_n drawn from population X : (μ ; σ),
where the X_i are identically and independently distributed.

The point estimator of the population mean (μ) is the sample mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

It has the following properties.

i) $E(\bar{X}) = \mu$
The expected value of the sample mean is equal to the population mean.

ii) $Var(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$
The variance of the sample mean is equal to the population variance
divided by the sample size (assuming that the sampled population
is infinitely large, or is finite but sampling is with replacement).
The standard deviation of a statistic is called its standard error.

iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
The standard error of the sample mean (just like its variance) is a
decreasing function of the sample size.
iv) Shape / form of the sampling distribution:
a) If the sampled population is normally distributed, X : N(μ ; σ), then
the sample mean is also normally distributed.
b) If the sampled population is not normal but the sample is “large” (e.g. n ≥ 30),
then according to the Central Limit Theorem (CLT), the sample
mean is still approximately normally distributed.
See e.g. https://yihui.org/animation/example/clt-ani/
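
Properties i) to iv) can be checked by simulation. The Python sketch below is only an illustration: the exponential population (mean 1, standard deviation 1), the sample sizes, the number of replications and the seed are all assumptions chosen here, not values from the lecture. It draws many samples from this clearly non-normal population and compares the simulated mean and standard error of X-bar with μ and σ/√n.

```python
import random
import statistics

random.seed(20003)  # arbitrary seed, used only to make the run reproducible

MU, SIGMA = 1.0, 1.0  # mean and standard deviation of the Exp(1) population


def simulate_sample_means(n, reps=5000):
    """Return the means of `reps` random samples of size n drawn from Exp(1)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]


results = {}
for n in (5, 30, 120):
    means = simulate_sample_means(n)
    # Mean and standard deviation of the simulated sampling distribution
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:4d}  mean of X-bar={results[n][0]:.3f}  "
          f"simulated SE={results[n][1]:.3f}  sigma/sqrt(n)={SIGMA / n ** 0.5:.3f}")
```

In each row the simulated standard error should sit close to σ/√n and shrink as n grows, while the mean of X-bar stays near μ.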

ESTIMATING THE POPULATION MEAN

• Assume that we repeatedly draw random samples of the same size from
a normally distributed population, or that the sampled population is not
normal but we draw reasonably large samples and thus the CLT holds.

If z_{α/2} denotes the (1 - α/2)100% percentile of the standard normal
distribution, that is P(Z > z_{α/2}) = α/2, then

Ex ante (before drawing a sample):

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

Here μ is an unknown constant and the interval limits are random
(even if n, σ and α are fixed).

Ex post (after drawing a sample):

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \; ; \; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

X-bar is an estimator, while x-bar is an estimate, i.e. a particular
number, so this is not a random interval and it either contains μ or not.
When σ is unknown but the sample size is large enough to estimate σ
satisfactorily, we can replace σ with its estimate s to obtain an estimate
of the standard error of the sample mean,

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

Depending on whether σ is known or unknown, the (1-α)100% confidence interval
estimate of the population mean is given by

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$ if σ is known and X-bar ~ N

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$ if σ is unknown but X ~ N

Commenting on these intervals we cannot talk about ‘probability’.
Instead we use the term ‘confidence’, as we have (1-α) degree of
confidence that the single interval obtained from the sample at hand is
not extreme but indeed contains μ.
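
The ex ante reading of ‘confidence’ can be illustrated by repeating the sampling many times and counting how often the computed interval contains μ. The Python sketch below assumes a hypothetical normal population with μ = 10 and known σ = 2, samples of size 25 and α = 0.05; none of these values come from the lecture.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # arbitrary seed for reproducibility

mu, sigma, n, alpha = 10.0, 2.0, 25, 0.05    # hypothetical population and design
z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, roughly 1.96
half_width = z * sigma / n ** 0.5            # margin of error with sigma known

reps = 10000
covered = 0
for _ in range(reps):
    x_bar = fmean(random.gauss(mu, sigma) for _ in range(n))
    # Each individual interval either contains mu or it does not
    if x_bar - half_width < mu < x_bar + half_width:
        covered += 1

coverage = covered / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should be near 1 - α = 0.95, which is a statement about the procedure over repeated samples and not about any single interval.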
A NSW Department of Consumer Affairs officer responsible for enforcing laws
concerning weights and measures routinely inspects containers to determine if
the contents of 10kg bags of potatoes weigh at least 10kg as advertised on the
container. A random sample of 25 bags whose container claims that the net weight
is 10kg yielded the following statistics:
x-bar = 10.52, s² = 1.43.

Estimate with 95% confidence the mean weight of a bag of potatoes. Assume that
the weights of 10kg bags of potatoes are normally distributed.

https://www.industry.gov.au/national-measurement-institute/trade-measurement/trade-measurement-inspectors


Let X denote the weight of a bag of potatoes (kg). We do not know its population
mean and standard deviation, but we are told that it is normally distributed.

Hence, we can develop the 95% confidence interval using

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

From the sample, x-bar = 10.52, s = √1.43 ≈ 1.196 and n = 25.

From Selvanathan et al. Appendix B, p.1097, t_{α/2,df=n-1} = t_{0.025,24} = 2.064.

The interval is therefore 10.52 ± 2.064 × 1.196 / √25 ≈ 10.52 ± 0.49.

With 95% confidence the mean weight of a bag of potatoes is
somewhere between about 10.03kg and 11.01kg.
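
The interval can be recomputed from the summary statistics quoted in the example. This is a minimal Python sketch, not the R workflow used in the tutorials; the critical value 2.064 is copied from the table lookup above and not computed. Without rounding the margin of error, the limits come out at roughly 10.03 and 11.01 (rounding the margin up to 0.50 would give 10.02 and 11.02).

```python
from math import sqrt

n = 25          # sample size
x_bar = 10.52   # sample mean (kg)
s2 = 1.43       # sample variance
t_crit = 2.064  # t_{0.025,24}, taken from the t table quoted in the text

standard_error = sqrt(s2 / n)       # estimated standard error s / sqrt(n)
margin = t_crit * standard_error    # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"standard error = {standard_error:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```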



TESTING THE POPULATION MEAN

• There are two types of statistical inference: estimation (point estimation
and interval estimation) and hypothesis testing.

• Hypothesis testing, in general, is a six-step procedure:

1) Set up the null and alternative hypotheses;
2) Determine the test statistic and its sampling distribution;
3) Specify the significance level;
4) Define the decision rule;
5) Take a sample and calculate the value of the test statistic;
6) Make a statistical decision and draw the conclusion.
A diet doctor claims that the average Australian is more than 10kg
overweight. To test his claim, a random sample of 100 Australians
were weighed, and the difference between their actual weight and
their ideal weight was calculated and recorded. The sample mean and
standard deviation are 12.175 and 7.898, respectively.


a) Do the data allow us to infer at the 5% significance level that the doctor’s
claim is true?

i. Let diff denote the difference between actual weight and ideal weight (kg),
and let μ denote its population mean.

H0 : μ = 10 and HA : μ > 10

ii. If σ were known (which it isn’t), then since n = 100 the CLT would make the
sampling distribution of the sample mean approximately normal, and the test
statistic would be Z.

However, since σ is unknown, we must assume that the population of
diff is not extremely non-normal.
Given this assumption, the test statistic is

$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

which, if H0 is true, follows a t distribution with n - 1 = 99 degrees of freedom.

iii. The significance level is given, α = 0.05.



iv. This is a right-tail test, so the entire rejection region is located under the
right tail of the sampling distribution.
Reject H0 if the value of the test statistic calculated from the sample
is greater than t_{α,df=n-1} = t_{0.05,99} ≈ t_{0.05,100} = 1.660.

v. The sample mean and standard deviation are 12.175 and 7.898,
respectively, so

$$t_{obs} = \frac{12.175 - 10}{7.898/\sqrt{100}} = \frac{2.175}{0.7898} \approx 2.7539$$

vi. Since t_obs = 2.7539 > 1.660 = t_α, we reject H0. Hence, at α = 0.05 there is
enough evidence to conclude that the diet doctor is right: the average
Australian is more than 10kg overweight.
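
Steps iv to vi can be reproduced from the summary statistics alone. The Python sketch below is illustrative and is not the R procedure used in the tutorials; the critical value 1.660 is the table value quoted in step iv, and the variable names are chosen here.

```python
from math import sqrt

n = 100          # sample size
x_bar = 12.175   # sample mean of diff (kg)
s = 7.898        # sample standard deviation of diff (kg)
mu_0 = 10.0      # value of the population mean under H0
t_crit = 1.660   # t_{0.05,100} from the t table, used in place of t_{0.05,99}

# Step v: value of the test statistic
t_obs = (x_bar - mu_0) / (s / sqrt(n))

# Step vi: right-tail decision rule
reject_h0 = t_obs > t_crit

print(f"t_obs = {t_obs:.4f}, critical value = {t_crit}, reject H0: {reject_h0}")
```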



b) Find the p-value of the test. What does it suggest?

The t-table does not show the exact p-value. However, it is certainly smaller
than 0.005, so H0 can be rejected even at the 0.5% significance level.

c) Perform the test in part (a) with R this time.

You will learn in the tutorials how to import the data from an Excel file and how
to run the t-test with R/RStudio. The t.test(diff, mu = 10, alternative = "greater")
command returns a printout in which R reports the test statistic (t), the
degrees of freedom, the p-value, the alternative hypothesis, the 95% ‘one-sided’
confidence interval (do not worry about it) and the sample mean.

Check whether R performed the required test (i.e. a right-tail t-test this time)
and whether the p-value < α = 0.05. Since p-value ≈ 0.0035, we reject H0.
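
The p-value is the area under the t density with 99 degrees of freedom to the right of t_obs. As a cross-check that does not rely on R or on tables, the Python sketch below integrates the density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical and written for this illustration; the result should land close to the value of about 0.0035 reported above.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using P(T > 0) = 0.5 minus the area over [0, t].

    The area is approximated by Simpson's rule, so `steps` must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t_obs, df = 2.7539, 99
p_value = t_upper_tail(t_obs, df)
print(f"p-value = {p_value:.4f}")
```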
• The (single-sample) Z / t test and the corresponding confidence interval
for a population mean are based on the following assumptions:
i. The data is a random sample of independent observations
ii. The variable of interest is quantitative and continuous
iii. … and is measured on an interval or ratio scale
iv. Either
(Z test) the population standard deviation, σ, is known and the
sample mean is at least approximately normally distributed
(because the sampled population itself is normally distributed or
the sample size is large and thus CLT holds),
or
(t-test) σ is unknown but the sampled population is normally
distributed (at least approximately).



DESIRABLE PROPERTIES OF POINT ESTIMATORS

• Let’s examine four properties of ‘good’ point estimators.

Suppose there is a parameter θ (e.g. a population mean, a
population proportion or a slope parameter of a population
regression model) and we estimate it with the estimator θ-hat.

a) θ-hat is said to be a linear estimator of θ if it is a linear function of
the sample observations.

For example, the sample mean X-bar is a linear estimator of the
population mean μ.
However, the sample variance s² is a quadratic function of the X_i
sample observations, so it is a non-linear estimator of the
population variance σ².



b) θ-hat is said to be an unbiased estimator of θ if

$$E(\hat{\theta}) = \theta$$

i.e. if the expected value of θ-hat is equal to θ and thus the sampling
distribution of θ-hat is centered around θ.
Otherwise, θ-hat is referred to as a biased estimator and

$$\text{Bias} = E(\hat{\theta}) - \theta \neq 0$$

For example, the sample mean is an unbiased estimator of
the population mean because

$$E(\bar{X}) = \mu$$

Similarly, the sample variance is an unbiased estimator of
the population variance because

$$E(s^2) = \sigma^2$$



However, the following alternative estimator of σ² is biased since
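
A common example of such a biased estimator, assumed here for illustration, is the version of the sample variance that divides the sum of squared deviations by n instead of n - 1, whose expected value is (n - 1)/n · σ². The Python sketch below compares the two by simulation; the normal population with σ² = 4, the sample size n = 5, the number of replications and the seed are all hypothetical choices.

```python
import random
from statistics import fmean, pvariance, variance

random.seed(42)  # arbitrary seed for reproducibility

sigma2, n, reps = 4.0, 5, 10000  # hypothetical population variance and design

s2_values, alt_values = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    s2_values.append(variance(sample))    # divides by n - 1 (sample variance s^2)
    alt_values.append(pvariance(sample))  # divides by n (assumed alternative)

mean_s2 = fmean(s2_values)
mean_alt = fmean(alt_values)
print(f"average of s^2:           {mean_s2:.3f}  (sigma^2 = {sigma2})")
print(f"average of n-divisor est: {mean_alt:.3f}  "
      f"((n-1)/n * sigma^2 = {(n - 1) / n * sigma2})")
```

The average of s² should be close to σ², while the average of the n-divisor estimator should be close to (n - 1)/n · σ², i.e. systematically too small.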

Suppose that θ1-hat and θ2-hat denote two different (normally distributed)
estimators of θ.

The sampling distribution of θ1-hat is centered around θ, while the sampling
distribution of θ2-hat is not.

θ1-hat is an unbiased, whereas θ2-hat is a biased estimator.

θ1-hat is expected to estimate θ more accurately than θ2-hat.
c) θ-hat is an efficient estimator of θ within some well-defined class
of estimators (e.g. in the class of linear unbiased estimators) if its
variance is smaller, or at least not greater, than that of any other
estimator of θ in the same class of estimators.

θ3-hat and θ4-hat are both unbiased estimators of θ, but the sampling
distribution of θ3-hat has a smaller variance than the sampling distribution
of θ4-hat.

θ3-hat is the more efficient estimator, so it is likely to produce a more
accurate estimate of θ than θ4-hat.
Note: In case of random sampling the sample mean is the best linear unbiased
estimator (BLUE) of the population mean. “Best” means that X-bar has
the smallest variance in the class of linear unbiased estimators of μ,
hence it is an efficient estimator.
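
Efficiency can be illustrated by comparing the sample mean with another linear unbiased estimator of μ. In the Python sketch below the competitor is the average of the first and last observation only; this estimator, the normal population (μ = 5, σ = 1), the sample size n = 10 and the seed are hypothetical choices made for the illustration. In theory the variances are σ²/n = 0.1 and σ²/2 = 0.5.

```python
import random
from statistics import fmean, variance

random.seed(3)  # arbitrary seed for reproducibility

mu, sigma, n, reps = 5.0, 1.0, 10, 10000  # hypothetical population and design

mean_estimates, two_point_estimates = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean_estimates.append(fmean(sample))                      # X-bar
    two_point_estimates.append((sample[0] + sample[-1]) / 2)  # (X_1 + X_n) / 2

var_mean = variance(mean_estimates)
var_two_point = variance(two_point_estimates)
print(f"average of X-bar estimates:       {fmean(mean_estimates):.3f}")
print(f"average of two-point estimates:   {fmean(two_point_estimates):.3f}")
print(f"variance of X-bar:                {var_mean:.3f}")
print(f"variance of two-point estimator:  {var_two_point:.3f}")
```

Both estimators average out near μ, but the estimates from X-bar are far less spread out, which is what makes it the more efficient of the two.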
d) θ-hat is called a consistent estimator of θ if its sampling
distribution collapses into a vertical straight line at the point θ when
the sample size n goes to infinity.

f1(θ-hat), f2(θ-hat) and f3(θ-hat) denote the sampling distributions of
the same θ-hat estimator generated by three different sample sizes,
n1 < n2 < n3.
These sampling distributions are centered around θ, and as the
sample size increases they become narrower.
If this is true for larger sample sizes as well, θ-hat is a consistent
estimator of θ.
• If θ-hat is an unbiased estimator then consistency requires the
variance of its sampling distribution to go to zero for increasing n.
For example, X-bar is a consistent estimator of μ.
• However, if θ-hat is a biased estimator then consistency requires both
its variance and the bias to go to zero for increasing n.
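
The consistency of X-bar can be illustrated by estimating how often it misses μ by more than a fixed tolerance as n grows. The Python sketch below assumes a hypothetical standard normal population, a tolerance of 0.2 and three sample sizes chosen for the illustration.

```python
import random
from statistics import fmean

random.seed(7)  # arbitrary seed for reproducibility

mu, sigma = 0.0, 1.0  # hypothetical population mean and standard deviation
eps = 0.2             # tolerance: a "miss" means |X-bar - mu| > eps
reps = 1000           # number of samples drawn for each sample size

miss_share = {}
for n in (10, 100, 1000):
    misses = 0
    for _ in range(reps):
        x_bar = fmean(random.gauss(mu, sigma) for _ in range(n))
        if abs(x_bar - mu) > eps:
            misses += 1
    miss_share[n] = misses / reps
    print(f"n={n:5d}  share of samples with |X-bar - mu| > {eps}: {miss_share[n]:.3f}")
```

The share of misses should fall towards zero as n increases, mirroring the sampling distribution collapsing onto μ.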
WHAT SHOULD YOU KNOW?
• Important definitions and concepts, like
population, sample, parameter, statistic, descriptive statistics, inferential
statistics, sampling error, non-sampling error, types of data/variable,
measurement scales, estimator, estimate, etc.
• To compute normal probabilities. To use the standard normal and t
tables (Tables 3 and 4 in Appendix B of Selvanathan et al., 2021).
• The sampling distribution of the sample mean.
• To estimate a population mean with a single value and with a confidence interval.
• To carry out the six steps of hypothesis testing and to apply this procedure to
testing a population mean.
• To estimate a population mean and to run a test for a population mean with
R/RStudio.
• Desirable statistical properties of point estimators: linearity,
unbiasedness, efficiency and consistency.