STAT2102 Chapter 6


CHAPTER 6: POINT ESTIMATION

March 22, 2024

6.1 Descriptive Statistics


Definition. Given a realization of a random sample of size n, x1 , x2 , . . . , xn , we define the sample mean

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ,

the sample variance

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 ,

the sample standard deviation

s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 } ,

and the empirical distribution, which puts probability 1/n on each of the observations x1 , x2 , . . . , xn .


If k of the observations are equal, the empirical distribution puts probability k/n at that value.
The mode is simply the most frequently occurring value among all the observations in
a sample.
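
For concreteness, here is a minimal Python sketch of these definitions (the data values are made up for illustration):

    from collections import Counter
    import math

    x = [2.0, 3.0, 3.0, 5.0, 7.0]                      # hypothetical observations
    n = len(x)

    xbar = sum(x) / n                                  # sample mean
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # sample variance (note n - 1)
    s = math.sqrt(s2)                                  # sample standard deviation

    counts = Counter(x)
    empirical = {v: k / n for v, k in counts.items()}  # probability k/n at repeated values
    mode = max(counts, key=counts.get)                 # most frequently occurring value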

6.2 Exploratory Data Analysis


Given a realization of a random sample of size n, x1 , x2 , . . . , xn , we can order the values from the smallest to the largest; the resulting ordered data are called the order statistics of the sample, denoted by y1 , y2 , . . . , yn .
Definition. For 1/(n + 1) ≤ p ≤ n/(n + 1), if (n + 1)p is an integer, we define the (100p)th sample percentile as the (n + 1)p-th order statistic, that is, y(n+1)p . If (n + 1)p equals an integer r plus some proper fraction 0 < a/b < 1, then we define the (100p)th sample percentile as

\tilde{\pi}_p = y_r + \frac{a}{b} (y_{r+1} - y_r).

Definition. The 25th, 50th, and 75th percentiles are called the first, second, and third
quartiles of the sample. The 50th percentile is also called the median of the sample.
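
As an illustration, this rule can be implemented directly in Python (a minimal sketch; the function name and the data are my own):

    def sample_percentile(data, p):
        """(100p)th sample percentile via the (n + 1)p rule above."""
        y = sorted(data)                  # order statistics y_1 <= ... <= y_n
        n = len(y)
        assert 1 / (n + 1) <= p <= n / (n + 1), "p outside the defined range"
        pos = (n + 1) * p
        r = int(pos)                      # integer part r
        frac = pos - r                    # proper fraction a/b
        if frac == 0:
            return y[r - 1]               # exactly the r-th order statistic
        return y[r - 1] + frac * (y[r] - y[r - 1])

    data = [5, 1, 9, 3, 7, 2, 8]          # n = 7, so (n + 1)p = 4 when p = 0.5
    print(sample_percentile(data, 0.5))   # median -> 5
    print(sample_percentile(data, 0.25))  # first quartile -> 2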

6.3 Order Statistics


Definition. If y1 ≤ y2 ≤ · · · ≤ yn are the order statistics associated with the sample
x1 , x2 , . . . , xn , then yr is called the sample quantile of order r/(n + 1), which is the
same as the 100r/(n + 1)th sample percentile.
Definition. Given a theoretical (continuous) distribution, let πp be its (100p)th percentile. The plot of the points (yr , πp ), where p = r/(n + 1), for several values of r is called the quantile-quantile plot (q-q plot). If the random sample is taken from the given theoretical distribution, we should have yr ≈ πp , so the q-q plot should lie close to the diagonal line. Because a linear transformation of a normal distribution is still normal, the q-q plot of a sample against N (0, 1) should be close to a straight line whenever the sample is taken from any normal distribution.
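
For instance, a normal q-q plot can be built by pairing each order statistic with the corresponding N (0, 1) percentile (a minimal sketch using statistics.NormalDist from the Python standard library; the sample here is simulated):

    from statistics import NormalDist
    import random

    random.seed(0)
    sample = [random.gauss(0, 1) for _ in range(100)]   # stand-in data
    y = sorted(sample)                                  # order statistics y_1, ..., y_n
    n = len(y)

    # theoretical N(0, 1) percentiles pi_p at p = r/(n + 1)
    pi = [NormalDist().inv_cdf(r / (n + 1)) for r in range(1, n + 1)]

    pairs = list(zip(y, pi))   # the q-q plot points (y_r, pi_p)
    # for a truly normal sample these points lie near a straight line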

6.4 Maximum Likelihood Estimation


Definition. Any numerical feature of a population distribution is called a parameter. Statistical inference deals with drawing generalizations about population parameters from an analysis of the sample data.

The important types of inference are:

• estimation of parameter(s) (Chapters 6-7)

• testing of statistical hypotheses (Chapters 8-9)

Here we consider the following question: given a random sample X1 , X2 , . . . , Xn from a population distribution f (x), if there is an unknown parameter θ associated with f (x), how do we estimate its value based on the random sample? For example:

• We believe that the height of CUHK students is normally distributed, say N (µ, σ²). How can we estimate µ?

We will have two types of estimates:

• Point estimates (Chapter 6) are single numbers, for example, µ̂ = 165 cm.

• Interval estimates (Chapter 7) give a range in which the parameter value is likely to
be, for example, [163, 167] for µ.

Definition. A function of X1 , X2 , . . . , Xn is called a statistic. The statistic, say u(X1 , X2 , . . . , Xn ), used to estimate θ is called a (point) estimator of θ. The value u(x1 , x2 , . . . , xn ) computed using the data is called the estimate.
How to choose an appropriate estimator?
Definition. If E[u(X1 , X2 , . . . , Xn )] = θ, then the statistic u(X1 , X2 , . . . , Xn ) is called an
unbiased estimator of θ. Otherwise, it is said to be biased.
We also want the mean squared error E[(u(X1 , X2 , . . . , Xn ) − θ)2 ] to be small, so
that the estimator is close to the truth.
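These two criteria are connected by a standard identity: for any estimator u = u(X1 , X2 , . . . , Xn ),

E[(u − θ)²] = Var(u) + (E[u] − θ)²,

so the mean squared error of an unbiased estimator is exactly its variance, and among unbiased estimators the one with the smallest variance is preferred.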
Theorem. The sample mean

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

is an unbiased estimator for the population mean µ. The sample variance

S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

is an unbiased estimator for the population variance σ².


This explains the choice of the factor 1/(n − 1) instead of 1/n in the definition of S².
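
A small simulation can illustrate (though not prove) this: averaged over many samples, S² lands near σ², while the version dividing by n systematically falls short. The distribution and constants below are arbitrary illustrative choices:

    import random

    random.seed(1)
    mu, sigma2, n, reps = 0.0, 4.0, 5, 100_000
    sum_unbiased = sum_biased = 0.0
    for _ in range(reps):
        x = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
        xbar = sum(x) / n
        ss = sum((xi - xbar) ** 2 for xi in x)
        sum_unbiased += ss / (n - 1)   # S^2, divides by n - 1
        sum_biased += ss / n           # divides by n instead
    print(sum_unbiased / reps)         # close to sigma2 = 4.0
    print(sum_biased / reps)           # close to 4.0 * (n - 1)/n = 3.2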

Besides estimating the mean and variance, a large class of estimation problems can be described as follows. Suppose we have a random sample X1 , X2 , . . . , Xn from a population distribution with pmf or pdf f (x; θ); that is, we know that the distribution belongs to a particular family of distributions (e.g., exponential), but we do not know the parameter. How do we estimate θ?
Definition. We define the likelihood function to be

L(\theta) = f(X_1; \theta) f(X_2; \theta) \cdots f(X_n; \theta).

The so-called Maximum Likelihood Estimator (MLE) of θ is defined to be the maximizer of L(θ).
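
Because the logarithm is increasing, maximizing L(θ) is equivalent to maximizing the log-likelihood

\log L(\theta) = \sum_{i=1}^{n} \log f(X_i; \theta),

which is usually easier to work with. As a hypothetical illustration (the data are made up), for the exponential pdf f (x; λ) = λe^{−λx}, x > 0, setting the derivative of the log-likelihood to zero gives λ̂ = 1/X̄, and a crude grid search agrees:

    import math

    x = [0.8, 1.6, 0.3, 2.1, 1.2]           # made-up data

    def loglik(lam):                        # n log(lambda) - lambda * sum(x)
        return len(x) * math.log(lam) - lam * sum(x)

    grid = [k / 1000 for k in range(1, 5001)]
    lam_hat = max(grid, key=loglik)
    print(lam_hat)                          # approx 0.833
    print(1 / (sum(x) / len(x)))            # closed form 1/xbar = 0.8333...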
Example. Suppose that the population distribution has the probability density function

f(x; \theta) = \begin{cases} e^{-(x - \theta)} & \text{if } x \ge \theta, \\ 0 & \text{if } x < \theta, \end{cases}

with an unknown parameter θ. Let X1 , . . . , Xn be a random sample from this distribution. Find the MLE of θ. (Answer: min{X1 , X2 , . . . , Xn })
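
To see why: for θ ≤ min{x1 , . . . , xn } every factor of the likelihood is positive and

L(\theta) = \prod_{i=1}^{n} e^{-(x_i - \theta)} = e^{n\theta} e^{-\sum_{i=1}^{n} x_i},

which is strictly increasing in θ, while L(θ) = 0 as soon as θ exceeds any observation. Hence L is maximized at the largest admissible value, θ̂ = min{X1 , X2 , . . . , Xn }.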

6.5 A Simple Regression Problem
Now we consider yet another class of estimation problems. “Regression analysis” refers to
the analysis of data involving two or more variables. Its objective is to discover the nature
of their relationship and then to explore it for the purposes of prediction. For example,

• The advertising manager of a firm is interested in the relationship between money spent on advertising and the corresponding increase in sales.

• A major concern in radiation therapy is the extent of cellular damage induced by the duration and intensity of exposure.

• The relationship between employee job performance and interview scores at the time of hiring.

In studying the relation between two variables x and Y , we consider the experiment
where the values of x are controlled, whereas Y is dependent on x and may be subject to
uncontrollable sources of error.
Definition. In the simple linear model, we assume

Yi = α1 + βxi + εi ,

where εi , for i = 1, 2, . . . , n, are independent and follow N (0, σ²). We can also rewrite the
model as
Yi = α + β(xi − x̄) + εi ,
where α = α1 + β x̄.
Observing the data (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ), our goal is to estimate α, β and σ 2 .
Theorem. The maximum likelihood estimators of α, β and σ² are, respectively,

\hat{\alpha} = \bar{Y},

\hat{\beta} = \frac{ \sum_{i=1}^{n} Y_i (x_i - \bar{x}) }{ \sum_{i=1}^{n} (x_i - \bar{x})^2 },

and

\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i - \hat{\alpha} - \hat{\beta} (x_i - \bar{x}) \right]^2 .

It can be verified that α̂ and β̂ are unbiased. If we replace 1/n by 1/(n − 2) in the definition of σ̂², then it also becomes unbiased.
Definition. The above α̂ and β̂ are called the least squares estimators because they minimize

\sum_{i=1}^{n} \left[ Y_i - \alpha - \beta (x_i - \bar{x}) \right]^2

among all choices of α and β.
Example. Given these four pairs of (x, y) values:

x  1  1  2  4
y  7  8  6  3

find the least squares regression line. (Answer: α̂ = 6, β̂ = −1.5)
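
The arithmetic can be checked with a short Python sketch of the estimators above:

    x = [1, 1, 2, 4]
    y = [7, 8, 6, 3]
    n = len(x)

    xbar = sum(x) / n                        # 2.0
    alpha_hat = sum(y) / n                   # Ybar = 6.0
    beta_hat = (sum(yi * (xi - xbar) for xi, yi in zip(x, y))
                / sum((xi - xbar) ** 2 for xi in x))   # -1.5

    # MLE of sigma^2 (replace n by n - 2 for the unbiased version)
    sigma2_hat = sum((yi - alpha_hat - beta_hat * (xi - xbar)) ** 2
                     for xi, yi in zip(x, y)) / n

    print(alpha_hat, beta_hat, sigma2_hat)   # 6.0 -1.5 0.125

So the fitted line is y = 6 − 1.5(x − 2), that is, y = 9 − 1.5x.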
