Parametric estimation (II)

Point estimation

A point estimate θ̂ = h(DN) is a single value for the unknown parameter θ, computed as a function of the dataset DN.
We will focus on:
1. Plug-in principle
2. Maximum likelihood

Let Fz(z) = Prob{z ≤ z} denote the distribution function of z. The empirical distribution function is

  F̂z(z) = N(z)/N = #{zi ≤ z}/N

where N(z) is the number of samples in DN that do not exceed z.
R practical: empirical distribution

- Dataset of N = 14 observed ages
  DN = {20, 21, 22, 20, 23, 25, 26, 25, 20, 23, 24, 25, 26, 29}
- The empirical distribution function F̂z (Estimation/cumdis.R) is a staircase function with discontinuities at the points zi.

[Figure: empirical distribution function of the ages dataset, stepping from 0 to 1 over the range 20 to 30.]

Plug-in principle to define an estimator

- Sample DN from Fz(z, θ) where θ = t(F(z)).
- Plug-in estimate of θ:
  θ̂ = t(F̂(z))
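A minimal sketch of this computation (the course script Estimation/cumdis.R is not reproduced here; ecdf() is base R):

  # Ages dataset (N = 14 observations)
  D <- c(20, 21, 22, 20, 23, 25, 26, 25, 20, 23, 24, 25, 26, 29)
  N <- length(D)

  # Empirical distribution function: F.hat(z) = #{z_i <= z} / N
  F.hat <- function(z) sum(D <= z) / N
  F.hat(24)   # fraction of ages not exceeding 24

  # Base-R equivalent: ecdf() returns the same staircase function
  plot(ecdf(D), main = "Empirical Distribution function")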
Other plug-in estimators

- Skewness estimator:
  γ̂ = ( (1/N) Σᵢ₌₁ᴺ (zi − µ̂)³ ) / σ̂³
- Upper critical point estimator.

Sampling distribution

- The point estimate is
  θ̂ = h(DN)
  where DN is the realisation of a random variable DN.
- If DN is random, the point estimator θ̂ is a random variable as well; its distribution is called the sampling distribution.
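A plug-in skewness estimator in R might look like this (a sketch; the function name skewness.hat is mine, not from the course):

  # Plug-in skewness estimator: third central moment over sigma.hat^3
  skewness.hat <- function(z) {
    mu.hat    <- mean(z)
    sigma.hat <- sqrt(mean((z - mu.hat)^2))  # plug-in standard deviation
    mean((z - mu.hat)^3) / sigma.hat^3
  }
  skewness.hat(c(20, 21, 22, 20, 23, 25, 26, 25, 20, 23, 24, 25, 26, 29))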
[Diagram: from the unknown r.v. distribution Fz, repeated sampling yields datasets DN(1), DN(2), DN(3), . . . and the corresponding estimates θ̂(1), θ̂(2), θ̂(3), . . .]

1: S = {}
2: for r = 1 to R do
3:   Fz → DN = {z1, z2, . . . , zN}  // sample dataset
4:   θ̂ = h(DN)                      // compute estimate
5:   S = S ∪ {θ̂}
6: end for
7: Plot histogram of S
8: Compute statistics of S (mean, variance)
9: Study distribution of S with respect to θ
mu <- 0                  # parameter
R <- 10000               # number of trials
N <- 20                  # size of dataset
mu.hat <- numeric(R)
for (r in 1:R){
  D <- rnorm(N, mean = mu, sd = 10)  # random generator
  mu.hat[r] <- mean(D)               # estimator
}
hist(mu.hat)             # histogram of mu.hat

[Figure: density histogram of mu.hat, roughly bell-shaped and centred at mu = 0.]
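Steps 8 and 9 of the procedure then amount to (continuing the script above):

  mean(mu.hat) - mu   # estimated bias, close to 0
  var(mu.hat)         # estimated variance, close to sigma^2/N = 100/20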
Definition
The estimator θ̂ is unbiased if

  EDN[θ̂] = θ

Definition
The variance of θ̂ is the variance of the sampling distribution

  Var[θ̂] = EDN[(θ̂ − E[θ̂])²]
Bias/variance issue

Some considerations
Bias of σ̂²

Considerations
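On the bias of σ̂²: as a quick illustration (my own simulation, not the course script), the plug-in variance estimate SS/N underestimates σ², while σ̂² = SS/(N − 1) is unbiased:

  set.seed(0)
  R <- 10000; N <- 20; sigma2 <- 100
  v.plug <- numeric(R); v.unb <- numeric(R)
  for (r in 1:R) {
    D  <- rnorm(N, mean = 0, sd = sqrt(sigma2))
    SS <- sum((D - mean(D))^2)
    v.plug[r] <- SS / N        # plug-in estimate
    v.unb[r]  <- SS / (N - 1)  # corrected estimate
  }
  mean(v.plug)  # close to (N-1)/N * sigma2 = 95
  mean(v.unb)   # close to sigma2 = 100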
Suppose we have two unbiased estimators. How to choose between them?

Definition (Relative efficiency)
Let us consider two unbiased estimators θ̂1 and θ̂2. If

  Var[θ̂1] < Var[θ̂2]

we say that θ̂1 is more efficient than θ̂2.

If the estimators are biased, the comparison is typically done on the basis of the mean square error.

Let z1, . . . , zN be i.i.d. N(µ, σ²) and let us consider the following sample statistics

  µ̂ = (1/N) Σᵢ₌₁ᴺ zi,   SS = Σᵢ₌₁ᴺ (zi − µ̂)²,   σ̂² = SS/(N − 1)

It can be shown that the following relations hold:
1. µ̂ ∼ N(µ, σ²/N), i.e. (µ̂ − µ)/(σ/√N) ∼ N(0, 1)
2. √N(µ̂ − µ)/σ̂ ∼ TN−1, i.e. (µ̂ − µ)/(σ̂/√N) ∼ TN−1, where TN−1 denotes the Student distribution with N − 1 degrees of freedom
3. if E[|z − µ|⁴] = µ₄ then Var[σ̂²] = (1/N)(µ₄ − ((N − 3)/(N − 1)) σ⁴).
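To make relative efficiency concrete, a small simulation (my own example, not from the slides) comparing two unbiased estimators of a Gaussian mean, the sample mean and the sample median:

  set.seed(0)
  R <- 10000; N <- 20
  theta1 <- numeric(R); theta2 <- numeric(R)
  for (r in 1:R) {
    D <- rnorm(N, mean = 0, sd = 1)
    theta1[r] <- mean(D)    # estimator 1
    theta2[r] <- median(D)  # estimator 2
  }
  var(theta1)  # close to sigma^2/N = 0.05
  var(theta2)  # larger: the sample mean is more efficient here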
Likelihood

For i.i.d. samples,

  pDN(DN, θ) = ∏ᵢ₌₁ᴺ pz(zi, θ) = LN(θ)

where for a fixed DN, LN(·) is a function of θ and is called the empirical likelihood of θ given DN.

Maximum likelihood

- The r.v. θ̂ml is the maximum likelihood estimator (m.l.e.).
- It is usual to consider the log-likelihood lN(θ): since log(·) is a monotone function, we have

  θ̂ml = arg maxθ∈Θ LN(θ) = arg maxθ∈Θ log(LN(θ)) = arg maxθ∈Θ lN(θ)
- Suppose that the probabilistic model underlying the data is Gaussian with an unknown mean µ and a known variance σ² = 1.
- The likelihood LN(µ) is a function of (only) the unknown parameter µ.
- By applying the maximum likelihood technique we have

  µ̂ = arg maxµ L(µ) = arg maxµ ∏ᵢ₌₁ᴺ (1/(√(2π)σ)) exp(−(zi − µ)²/(2σ²))

  Note that in this case µ̂ = (Σᵢ zi)/N.

[Figure: likelihood L as a function of mu over [−2, 2], peaking at the maximum likelihood estimate.]

Then the most likely value of µ according to the data is µ̂ ≈ 0.358.
R script Estimation/ml_norm.R
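A minimal sketch of this computation (the actual Estimation/ml_norm.R and its dataset are not reproduced here; the data below are simulated for illustration):

  set.seed(0)
  z <- rnorm(20, mean = 0.4, sd = 1)  # illustrative data, sigma = 1 known

  # Likelihood of mu on a grid: L(mu) = prod_i dnorm(z_i; mu, 1)
  mu.grid <- seq(-2, 2, by = 0.01)
  L <- sapply(mu.grid, function(mu) prod(dnorm(z, mean = mu, sd = 1)))

  mu.grid[which.max(L)]  # grid maximiser, close to mean(z), the m.l.e.
  plot(mu.grid, L, type = "l", xlab = "mu", ylab = "L")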
Some considerations

- Likelihood measures the relative abilities of the various parameter values to explain the observed data.
- Rationale: the value of the parameter under which the observed data have the highest probability of arising is the best estimator of θ.
- The likelihood function is a function of the parameter θ.
- The likelihood function is NOT the probability function of θ (θ is constant).
- LN(θ) is rather the conditional probability of observing the dataset DN for a given θ.
- Likelihood is the probability of the data given the parameter.

Example: log likelihood

Consider the previous example. The behaviour of the log-likelihood for this model is shown below.

[Figure: log(L) as a function of mu over [−2, 2], with its maximum at the same µ̂ as the likelihood.]
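Continuing the sketch above, the log-likelihood is computed more stably by summing log-densities:

  # Log-likelihood l(mu) = sum_i log dnorm(z_i; mu, 1); same maximiser as L(mu)
  logL <- sapply(mu.grid, function(mu) sum(dnorm(z, mean = mu, sd = 1, log = TRUE)))
  plot(mu.grid, logL, type = "l", xlab = "mu", ylab = "log(L)")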
R example: Numerical max. likelihood
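A sketch of numerical maximum likelihood (my assumptions: Gaussian model with both µ and σ unknown, simulated data; this is not the course script):

  set.seed(0)
  z <- rnorm(20, mean = 0.4, sd = 1)  # illustrative data

  # Negative log-likelihood, parameterised as theta = (mu, log sigma)
  negloglik <- function(theta) {
    -sum(dnorm(z, mean = theta[1], sd = exp(theta[2]), log = TRUE))
  }

  fit <- optim(c(0, 0), negloglik)  # generic numerical minimiser
  fit$par[1]       # mu.hat, close to mean(z)
  exp(fit$par[2])  # sigma.hat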
Properties of m.l. estimators
From sampling distribution to confidence interval

Let us suppose we have an estimator θ̂ of the parameter θ which is
- unbiased
- with variance σ²θ̂
- with a Normal sampling distribution θ̂ ∼ N(θ, σ²θ̂)
We can write

  Prob{θ − 1.96σθ̂ ≤ θ̂ ≤ θ + 1.96σθ̂} = 0.95

which entails

  Prob{θ̂ − 1.96σθ̂ ≤ θ ≤ θ̂ + 1.96σθ̂} = 0.95

where σθ̂ is the standard deviation of the sampling distribution.

Interval estimation (II)

- Suppose that our interval estimator satisfies
  Prob{θ̲ ≤ θ ≤ θ̄} = 1 − α,  α ∈ [0, 1]
  then the random interval [θ̲, θ̄] is called a 100(1 − α)% confidence interval of θ.
- Notice that θ is a fixed unknown value and that at each realization DN the interval either does or does not contain the true θ.
- If we repeat the procedure of sampling DN and constructing the confidence interval many times, then our confidence interval will contain the true θ at least 100(1 − α)% of the time (i.e. 95% of the time if α = 0.05).
- While an estimator is characterized by bias and variance, an interval estimator is characterized by its endpoints and confidence.
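A coverage check in R (a sketch reusing the earlier Gaussian setup, µ = 0, σ = 10, N = 20; σ is assumed known):

  set.seed(0)
  R <- 10000; N <- 20; mu <- 0; sigma <- 10
  covered <- logical(R)
  for (r in 1:R) {
    D <- rnorm(N, mean = mu, sd = sigma)
    half <- 1.96 * sigma / sqrt(N)   # 1.96 * sd of mu.hat
    covered[r] <- (mean(D) - half <= mu) && (mu <= mean(D) + half)
  }
  mean(covered)  # proportion of intervals containing mu, close to 0.95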