
Properties of Estimators

STA 211: The Mathematics of Regression

Niccolo Anceschi Ph.D.

Slides adapted from lectures by Prof. Jerry Reiter
Properties of Estimators

▶ We learned about maximum likelihood estimation (MLE) as a general tool for estimating parameters of statistical models.
▶ How do we know if the MLE, or any other estimator for that
matter, is “good”?
▶ What does “good” even mean for estimators?
▶ In this lesson, we present criteria for “good” and apply them
to MLEs.
▶ Unbiasedness
▶ Variance
▶ Mean squared error
Conceptual Picture of Bias and Variance Part 1

▶ Suppose you are shooting arrows at a target.


▶ You are not a perfect shot, so you generally don’t hit the
target exactly.
▶ Unbiased: Your shots are spread around the target, sometimes
too high or too low, sometimes to the right or to the left.
▶ Biased: Your shots are systematically away from the target,
e.g., tending to be too high (or too low).
▶ Variance: If your shots have a wide spread, you have a high
variance. If your shots are tightly clustered, you have a low
variance.
Conceptual Picture of Bias and Variance Part 2

▶ Ideally, your shots are tightly clustered around the target: your
shooting is unbiased with low variance.
▶ In the worst case, your shots are systematically far from the target and spread all over the place: your shooting is biased with high variance.
▶ It might be acceptable if your shots are tightly clustered
around a value close to the target: your shooting has a small
bias and small variance.
Conceptual Picture of Bias and Variance Part 3
Mapping this to Estimation

▶ Each time we take a sample, we compute the MLE for the parameters of interest. (We shoot an arrow at the target.)
▶ Suppose we could take many independent samples over and
over again and compute the MLE in each.
▶ Ideally, the collection of MLE values is centered at the true parameter. (We are unbiased shooters.)
▶ Even better, the collection of MLE values is tightly clustered around the true parameter. (We are unbiased shooters with low variance.)
▶ It would be bad news if the MLE values tended not to estimate
the true parameter and were diffusely spread. (We are biased
shooters with high variance.)
Operationalizing Unbiasedness and Variance

▶ Let θ̂ be an estimator for some parameter θ.

▶ Unbiasedness: we generally aim for E(θ̂) = θ. If that does not hold, we call the difference E(θ̂) − θ the bias.
▶ The expected value is taken over the distribution of the
random variable θ̂.
▶ The randomness in any given estimator θ̂ comes from being a
function of the observed random variables Y1 , . . . , Yn .

▶ Variance: we compute Var (θ̂) = E((θ̂ − E(θ̂))2 )


▶ The variance is likewise taken over the distribution of the random variables Y1 , . . . , Yn .

▶ Let’s study this for the MLEs we derived last class.
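The repeated-sampling thought experiment above is easy to sketch in code. Below is a minimal illustration (assuming NumPy is available; the seed and parameter values are arbitrary choices, not from the lecture): draw many samples, compute the MLE in each, and inspect the center and spread of the resulting estimates.

```python
import numpy as np

# Illustrative sketch: repeat the experiment many times, compute the
# MLE p_hat = Y/n in each simulated sample, and summarize the results.
rng = np.random.default_rng(0)
true_p, n, reps = 0.3, 100, 20_000

p_hats = rng.binomial(n, true_p, size=reps) / n  # one MLE per sample

center = p_hats.mean()  # near true_p when the estimator is unbiased
spread = p_hats.var()   # the sampling variance of the estimator
print(center, spread)
```

With these settings the center lands near 0.3 and the spread near p(1 − p)/n = 0.0021, matching the formulas derived on the next slides.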


Properties of p̂ = Y /n for Binomial Data, Part 1

▶ For Y ∼ Bin(n, p), the MLE is p̂ = Y /n.

▶ Let’s show that p̂ is unbiased, i.e., E(p̂) = p.

E(p̂) = E(Y /n) = (1/n)E(Y ) = (1/n)(np) = p.

▶ Here, we used the fact that E(Y ) = np for a binomial


distribution.
(This is proved in Example 6.1 of the STA 211 Math
Supplement, pages 35 and 36).
Properties of p̂ = Y /n for Binomial Data, Part 2

▶ Let’s find Var (p̂).

Var(p̂) = Var(Y /n) = (1/n²)Var(Y ) = (1/n²)(np(1 − p)) = p(1 − p)/n.

▶ Here, we used the fact that Var (Y ) = np(1 − p) for a


binomial distribution.
(This is proved in Example 6.1 of the STA 211 Math
Supplement, pages 35 and 36).
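Both facts can also be checked exactly, without simulation, by summing over the full binomial pmf. A small sketch in Python (standard library only; n and p are arbitrary illustrative values):

```python
from math import comb

# Exact check of E(p_hat) = p and Var(p_hat) = p(1-p)/n by summing
# over the full pmf of Y ~ Bin(n, p).
n, p = 10, 0.3
pmf = [comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)]

e_phat = sum(f * y / n for y, f in enumerate(pmf))
var_phat = sum(f * (y / n - e_phat) ** 2 for y, f in enumerate(pmf))
print(e_phat, var_phat)  # p = 0.3 and p(1-p)/n = 0.021, up to rounding
```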
Properties of MLE of µ for Normal Distribution, Part 1

▶ For n independent measurements drawn from a N(µ, σ²), the MLE of µ is µ̂ = Σ_{i=1}^n Yi /n.

▶ Let’s show that µ̂ is unbiased, i.e., E(µ̂) = µ.


E(µ̂) = E( Σ_{i=1}^n Yi /n )
     = (1/n) E( Σ_{i=1}^n Yi )
     = (1/n) Σ_{i=1}^n E(Yi) = (1/n) Σ_{i=1}^n µ = (1/n)(nµ) = µ.
Properties of MLE of µ for Normal Distribution, Part 2

▶ Let’s find Var (µ̂).


Var(µ̂) = Var( Σ_{i=1}^n Yi /n )
       = (1/n²) Var( Σ_{i=1}^n Yi )
       = (1/n²) Σ_{i=1}^n Var(Yi)   (using independence of the Yi)
       = (1/n²) Σ_{i=1}^n σ² = (1/n²)(nσ²) = σ²/n.
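A quick simulation makes the σ²/n rate visible. The sketch below (assuming NumPy; the parameter values are arbitrary) draws many samples of size n and compares the empirical variance of µ̂ = Ȳ to σ²/n:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 5.0, 2.0, 50, 20_000

samples = rng.normal(mu, sigma, size=(reps, n))
mu_hats = samples.mean(axis=1)  # the MLE of mu in each of the reps samples

print(mu_hats.mean())  # close to mu (unbiased)
print(mu_hats.var())   # close to sigma**2 / n = 0.08
```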
Properties of MLE of σ 2 for Normal Distribution, Part 1

▶ For n independent measurements drawn from a N(µ, σ²), the MLE of σ² is σ̂² = Σ_{i=1}^n (Yi − Ȳ)²/n.

▶ Let's see if E(σ̂²) is equal to σ². (Spoiler: it is not!)

▶ First, a convenient simplification of Σ_{i=1}^n (Yi − Ȳ)²:

Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n (Yi² − 2Yi Ȳ + Ȳ²)
                    = Σ_{i=1}^n Yi² − Σ_{i=1}^n 2Yi Ȳ + Σ_{i=1}^n Ȳ²
Properties of MLE of σ 2 for Normal Distribution, Part 3

▶ Since Ȳ is a constant with respect to the summation (it does not depend on i),

Σ_{i=1}^n 2Yi Ȳ = 2Ȳ Σ_{i=1}^n Yi

▶ Since Σ_{i=1}^n Yi = nȲ,

2Ȳ Σ_{i=1}^n Yi = 2nȲ²

▶ Since Σ_{i=1}^n Ȳ² = nȲ²,

Σ_{i=1}^n Yi² − Σ_{i=1}^n 2Yi Ȳ + Σ_{i=1}^n Ȳ² = Σ_{i=1}^n Yi² − 2nȲ² + nȲ² = Σ_{i=1}^n Yi² − nȲ²
Properties of MLE of σ 2 for Normal Distribution, Part 4

▶ Back to the expected value. Let's use our new fact, Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − nȲ².

E( Σ_{i=1}^n (Yi − Ȳ)²/n ) = (1/n) E( Σ_{i=1}^n Yi² − nȲ² )
                           = (1/n) E( Σ_{i=1}^n Yi² ) − (1/n) E(nȲ²)
                           = (1/n) Σ_{i=1}^n E(Yi²) − E(Ȳ²).
Properties of MLE of σ 2 for Normal Distribution, Part 5

▶ Recalling expressions for variances, we have
▶ σ² = Var(Yi) = E(Yi²) − E(Yi)² = E(Yi²) − µ² for all Yi,
▶ σ²/n = Var(Ȳ) = E(Ȳ²) − E(Ȳ)² = E(Ȳ²) − µ².

▶ Substituting for E(Yi²) and E(Ȳ²), we have

(1/n) Σ_{i=1}^n E(Yi²) − E(Ȳ²) = (1/n) Σ_{i=1}^n (σ² + µ²) − (σ²/n + µ²)
                               = (1/n)(n(σ² + µ²)) − (σ²/n + µ²)
                               = σ² + µ² − (σ²/n + µ²)
                               = σ² − σ²/n
                               = ((n − 1)/n) · σ² ≠ σ².
Conclusion....

▶ So, the MLE σ̂² = Σ_{i=1}^n (Yi − Ȳ)²/n is biased for σ².
▶ What if we correct it by multiplying σ̂² by n/(n − 1)?

ŝ² = σ̂² · n/(n − 1) = Σ_{i=1}^n (Yi − Ȳ)²/(n − 1)

▶ Let's compute the expected value of ŝ²:

E(ŝ²) = E(σ̂² · n/(n − 1)) = (n/(n − 1)) E(σ̂²)
      = (n/(n − 1)) ((n − 1)/n) σ² = σ²

▶ Thus, ŝ 2 is an unbiased estimator of σ 2 . This explains why


we use ŝ 2 rather than σ̂ 2 in practice.
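The (n − 1)/n shrinkage is easy to see numerically for small n. A minimal sketch (assuming NumPy; σ² and n are arbitrary illustrative values) compares the average of the MLE σ̂² with the average of the corrected ŝ² over many simulated samples:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
sigma2_mle = samples.var(axis=1, ddof=0)  # divides by n: the MLE
s2 = samples.var(axis=1, ddof=1)          # divides by n - 1: corrected

print(sigma2_mle.mean())  # near ((n-1)/n) * sigma2 = 3.2 (biased low)
print(s2.mean())          # near sigma2 = 4.0 (unbiased)
```

Note that NumPy's `ddof` argument encodes exactly this correction: `ddof=0` gives the biased MLE and `ddof=1` the unbiased ŝ².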
Mean Squared Error & Bias and Variance Trade-off (1)

Let's consider the random variable given by the difference between some estimator θ̂ and what it is trying to estimate, θ:

Var[θ̂ − θ] = E[(θ̂ − θ)²] − (E[θ̂ − θ])²

Since θ is a constant, Var[θ̂ − θ] = Var[θ̂]. Rearranging, the mean squared error (MSE) of θ̂ with respect to θ is

E[(θ̂ − θ)²] = Var[θ̂] + (E[θ̂] − θ)²
   (MSE)      (Variance)   (Bias²)
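The decomposition can be verified numerically. In the sketch below (assuming NumPy; the estimator Ȳ + 0.5 is a deliberately biased toy choice, not from the lecture), the Monte Carlo MSE matches variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 20, 100_000

# A deliberately biased estimator of the mean: theta_hat = Ybar + 0.5.
samples = rng.normal(theta, 1.0, size=(reps, n))
theta_hat = samples.mean(axis=1) + 0.5

mse = np.mean((theta_hat - theta) ** 2)
variance = theta_hat.var()
bias_sq = (theta_hat.mean() - theta) ** 2

print(mse, variance + bias_sq)  # the two quantities coincide
```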
Mean Squared Error & Bias and Variance Trade-off (2)
Efficiency and Consistency

1. In comparing unbiased estimators, the one with the smallest variance is said to be the most efficient.
2. An estimator is said to be consistent if its MSE converges to zero as the sample size n increases.

Suggested Exercise: Suppose Y1 , . . . , Yn ∼ N(µ, σ²).
Rank the performance of the estimators Ȳ , Y1 , Ȳ + 1/n for µ.
▶ Which one has the lowest/highest MSE?
▶ Which ones are consistent?
▶ Can you think of a situation in which you might want a biased estimate of some parameter?
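One way to approach the ranking part of the exercise is by simulation. The sketch below (assuming NumPy; the parameter values are arbitrary) estimates the MSE of each of the three candidates:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 2.0, 1.0, 100, 50_000

samples = rng.normal(mu, sigma, size=(reps, n))
estimators = {
    "Ybar": samples.mean(axis=1),              # unbiased, variance sigma^2/n
    "Y1": samples[:, 0],                       # unbiased, but variance sigma^2
    "Ybar + 1/n": samples.mean(axis=1) + 1/n,  # small bias 1/n, vanishes with n
}
mse = {name: np.mean((est - mu) ** 2) for name, est in estimators.items()}
print(mse)
```

With these settings Ȳ has the smallest MSE, Ȳ + 1/n is only slightly worse, and Y1 is far behind because it throws away all but one observation.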
Confidence Intervals

▶ Consider a statistical model indexed by a parameter θ ∈ R


▶ Denote the observation as Y1:n = (Y1 , . . . , Yn )
▶ Fix a value α ∈ (0, 1) of interest

Definition: A confidence interval (CI) for θ at confidence level 1 − α is an interval In,α = [ ℓ(Y1:n ), u(Y1:n ) ] such that:
▶ The lower and upper bounds ℓ(Y1:n ) and u(Y1:n ) are statistics (i.e. functions of Y1:n not depending explicitly on θ)
▶ Pθ( ℓ(Y1:n ) ≤ θ ≤ u(Y1:n ) ) = Pθ( θ ∈ In,α ) ≥ 1 − α
Confidence Intervals

▶ e.g.: In,0.05 gives us a region such that the probability that the true parameter lies within its range is at least 0.95
▶ Under the true model (i.e. under the true value of θ) it is very unlikely that we observe values Y1:n = y1:n such that θ is not between ℓ(y1:n ) and u(y1:n )
▶ Often the focus is on confidence intervals centered around a given estimator θ̂ for θ
▶ In that case, the width of In,α typically reflects the variance of θ̂
Binomial MLE & Confidence Intervals
▶ We have seen that the MLE for binomial data Y ∼ Bin(n, p) is p̂ = Y /n
▶ Let us consider a specific type of confidence interval Ĩn,α
• Symmetric & centered around p̂, i.e. Ĩn,α = [p̂ − δ, p̂ + δ]
• Therefore: Pp( p ∈ Ĩn,α ) = Pp( |p̂ − p| ≤ δ )
▶ Given any α ∈ (0, 1), we want to find δ = δ(n, α) such that

Pp( |p̂ − p| ≤ δ ) ≥ 1 − α

Chebyshev's Inequality: let X be a RV with finite mean E[X] = µ and variance Var[X] = σ². For any t > 0 it holds that

P( |X − µ| ≥ t σ ) ≤ 1/t²
Binomial MLE & Confidence Intervals

Let us apply this inequality to p̂:

Pp( |p̂ − p| ≥ δ ) ≤ Var[p̂]/δ² = p(1 − p)/(n δ²) ≤ 1/(4 n δ²)

Recalling that

Pp( |p̂ − p| ≤ δ ) = 1 − Pp( |p̂ − p| ≥ δ ),

we just need to solve α = 1/(4 n δ²) for δ. This gives

Pp( p ∈ Ĩn,α ) ≥ 1 − α   if   Ĩn,α = Y/n ± 1/(2 √(nα))
