
Risk Management

Lecture 2-2: Estimation and Hypothesis Testing

Chen Tong

SOE & WISE, Xiamen University

September 12, 2024



Estimation and Hypothesis Testing

This chapter includes:

1. Statistical Inference and Estimation

2. Unbiasedness, Efficiency and Consistency

3. Interval Estimation and Confidence Intervals

4. Hypothesis Testing

5. LLN and CLT

6. ∗∗∗ Maximum Likelihood Estimation (MLE)



1. Statistical Inference and Estimation



Statistical Inference

▶ We usually use a sample from a population (e.g. all working adults in China's urban labor market) to draw inferences about the properties of this population.

▶ Since we do not know the parameters of interest in the population (such as the expected value and the variance), we use the sample to estimate these parameters.

▶ We want to use a procedure that allows us to estimate the parameters "as precisely as possible".

▶ We want to test whether certain hypotheses are in line with the information included in the sample.



Estimator and Estimate

▶ Given a random sample {Y1, Y2, ..., Yn} drawn from a population distribution that depends on an unknown parameter θ, an estimator W of θ is a rule that assigns a value of θ to each possible outcome of the sample.

▶ The rule does not depend on the data actually obtained.

▶ An estimator W of a parameter θ can be expressed as a function of {Y1, Y2, ..., Yn}:

    W = h(Y1, Y2, ..., Yn)

▶ W is a random variable because it depends on Y1, Y2, ..., Yn.

▶ Given the actual data {y1, y2, ..., yn}, we obtain the point estimate

    w = h(y1, y2, ..., yn)



Estimator and Estimate

▶ Example: Given the random sample {Y1, Y2, ..., Yn}, a natural estimator of the mean µ is the average Ȳ of a random sample:

    \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i

▶ For actual data outcomes {y1, y2, ..., yn}, the estimate is the average in the sample:

    \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
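To make the estimator/estimate distinction concrete, here is a minimal Python sketch (not from the slides; numpy and the illustrative numbers are assumptions):

    import numpy as np

    # The estimator is the rule h; the estimate is its value on one observed sample.
    def h(sample):
        return sample.mean()

    rng = np.random.default_rng(0)
    y = rng.normal(loc=5.0, scale=2.0, size=100)  # one observed sample; true mu = 5

    w = h(y)  # point estimate for this particular sample
    print(f"point estimate of mu: {w:.3f}")       # close to, but not exactly, 5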



Estimator and Estimate

▶ Since we want to use an estimator that is as precise as possible, we have to define what "as precise as possible" means.

▶ The most important criteria:

  - Unbiasedness
  - Efficiency
  - Consistency



2. Unbiasedness, Efficiency and Consistency



Unbiasedness

▶ An estimator W of θ is unbiased if

    E(W) = θ

⇒ If we could indefinitely draw random samples on Y from the population, compute an estimate each time, and then average these estimates over all random samples, we would obtain θ.



Unbiasedness

▶ It can be shown that Ȳ is an unbiased estimator of µ:

    E(\bar{Y}) = E\left[\frac{1}{n}\sum_{i=1}^{n} Y_i\right] = \frac{1}{n}E\left[\sum_{i=1}^{n} Y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}\, n\mu = \mu

▶ For hypothesis testing, we also need to estimate the variance σ² of a population with mean µ.



Unbiasedness

▶ Letting {Y1, Y2, ..., Yn} denote the random sample from the population with E(Y) = µ and Var(Y) = σ², the estimator of σ² is given by the sample variance

    S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2

▶ It can be shown that S² is an unbiased estimator of σ²:

    E(S^2) = \sigma^2
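A quick simulation sketch of this unbiasedness (numpy and the illustrative parameters are assumptions), contrasting the n − 1 divisor with the biased n divisor:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma2, n, reps = 0.0, 4.0, 10, 100_000

    samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    s2_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1
    s2_biased = samples.var(axis=1, ddof=0)    # divide by n

    print(s2_unbiased.mean())  # ~ 4.0: E(S^2) = sigma^2
    print(s2_biased.mean())    # ~ 3.6: biased downward by the factor (n - 1)/n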




Efficiency

▶ There is usually more than one unbiased estimator, and different unbiased estimators can have different sampling distributions.

▶ If the sampling distribution of an estimator is more dispersed, it is more likely that we will obtain a random sample that yields an estimate very far from θ. ⇒ We need to rely on the variance of an estimator.

▶ If W1 and W2 are two unbiased estimators of θ, W1 is efficient relative to W2 when Var(W1) ≤ Var(W2) for all θ (with strict inequality for at least one value of θ).

▶ Comparing variances is difficult if we do not restrict our attention to unbiased estimators, because we could always use a trivial estimator with variance zero that is biased.



Sampling Variance

▶ Var(Ȳ) is called the sampling variance because it is the variance associated with the sampling distribution.

▶ The sampling variance is a constant, not a random variable.

▶ The variance of the sample average Ȳ is given by

    \mathrm{Var}(\bar{Y}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(Y_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}

⇒ Var(Ȳ) gets smaller as the sample size n increases.
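A short numerical check of Var(Ȳ) = σ²/n (a sketch; numpy and the chosen σ², n values are assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    sigma2, reps = 9.0, 10_000

    for n in (10, 100, 1000):
        means = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
        print(n, means.var(), sigma2 / n)  # empirical Var(Ybar) vs sigma^2/n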



Consistency
▶ We can study the asymptotic properties of estimators for large sample sizes, i.e. we can approximate the features of the sampling distribution of an estimator for large sample sizes n.

▶ We usually want to know the distance of an estimator from the "true" parameter as the sample size increases indefinitely.

▶ Let Wn be an estimator of θ based on the sample Y1, Y2, ..., Yn of size n. Then Wn is a consistent estimator of θ if for every ε > 0 (even a very small one),

    P(|W_n - \theta| > \varepsilon) \to 0 \quad \text{as } n \to \infty

▶ Alternative notation: plim(Wn) = θ (the probability limit of Wn is θ). ⇒ The distribution of Wn becomes more concentrated about θ as n increases.
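A simulation sketch of this definition (numpy and the illustrative distribution are assumptions): the probability that the sample mean misses µ by more than ε shrinks as n grows.

    import numpy as np

    rng = np.random.default_rng(3)
    mu, eps, reps = 1.0, 0.1, 10_000

    for n in (10, 100, 1000):
        means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
        print(n, np.mean(np.abs(means - mu) > eps))  # P(|Wn - mu| > eps) -> 0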



Consistency

▶ Unbiased estimators are not necessarily consistent, but those whose variances shrink to zero as the sample size increases are consistent. ⇒ If Wn is an unbiased estimator of θ and Var(Wn) → 0 as n → ∞, then plim(Wn) = θ.

▶ The sample average Ȳ is consistent: since Var(Ȳn) = σ²/n for any sample size n, Var(Ȳn) → 0 as n → ∞, so Ȳ is a consistent estimator of µ.



3. Interval Estimation and Confidence Intervals



Interval Estimation and Confidence Intervals

▶ A point estimate provides no information about how close the estimate is "likely" to be to the population parameter.

▶ We cannot know how close an estimate for a particular sample is to the population parameter, because the population value is unknown.

▶ However, we can obtain an interval estimate that contains the population parameter with a certain probability.



▶ Suppose the population has a normal distribution N(µ, σ²) and let {Y1, Y2, ..., Yn} be a random sample from this population. Then the sample average has a normal distribution: Ȳ ∼ N(µ, σ²/n).

▶ The standardized sample average Z̄ is given by

    \bar{Z} = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}, \quad \text{with } \bar{Z} \sim N(0, 1)



▶ We may obtain a confidence interval about Z̄ by choosing a certain confidence level (typically 95% or 99%).

▶ In general, this confidence level is 1 − α, where α is called the significance level.

▶ Formally, we look for critical values −d_{α/2} and d_{α/2} such that

    P\left(-d_{\alpha/2} < \bar{Z} < d_{\alpha/2}\right) = 1 - \alpha



Interval Estimation and Confidence Intervals

▶ Since the event

    -d_{\alpha/2} < \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} < d_{\alpha/2}

  is identical to the event

    \bar{Y} - d_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{Y} + d_{\alpha/2}\,\sigma/\sqrt{n},

  it follows that

    P\left(\bar{Y} - d_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{Y} + d_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha

▶ The random interval \left(\bar{Y} - d_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{Y} + d_{\alpha/2}\,\sigma/\sqrt{n}\right) contains the population mean µ with probability 1 − α.



Interval Estimation and Confidence Intervals

▶ We obtain an interval estimate by plugging in the sample outcome of the average, ȳ, and the sample standard deviation s:

    \left(\bar{y} - d_{\alpha/2}\, s/\sqrt{n},\ \bar{y} + d_{\alpha/2}\, s/\sqrt{n}\right),

  with \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i and s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}

▶ Unfortunately, this interval estimate does not preserve the confidence level 1 − α, because s depends on the particular sample. ⇒ In other words, the random interval \left(\bar{Y} - d_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{Y} + d_{\alpha/2}\,\sigma/\sqrt{n}\right) no longer contains µ with probability 1 − α if we replace σ with a random variable S.



Interval Estimation and Confidence Intervals

▶ Solution: We consider a standardized sample average that has a t distribution with n − 1 degrees of freedom,

    \frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t_{n-1}

  where S is the sample standard deviation.

⇒ P\left(\bar{Y} - c_{\alpha/2}\, S/\sqrt{n} < \mu < \bar{Y} + c_{\alpha/2}\, S/\sqrt{n}\right) = 1 - \alpha,

  where c_{α/2} is the critical value of the t distribution.

▶ The confidence interval may be written as \bar{Y} \pm c_{\alpha/2}\,(S/\sqrt{n})
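A minimal Python sketch computing this t-based interval from raw data (numpy/scipy and the example numbers are assumptions):

    import numpy as np
    from scipy import stats

    def t_confidence_interval(y, alpha=0.05):
        """Interval ybar +/- c_{alpha/2} * s / sqrt(n), with c from t(n-1)."""
        n = len(y)
        ybar, s = np.mean(y), np.std(y, ddof=1)
        c = stats.t.ppf(1 - alpha / 2, df=n - 1)  # critical value c_{alpha/2}
        half = c * s / np.sqrt(n)
        return ybar - half, ybar + half

    rng = np.random.default_rng(4)
    print(t_confidence_interval(rng.normal(5.0, 2.0, size=20)))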



Interval Estimation and Confidence Intervals

Example:
▶ Changes in worker productivity are measured by "scrap rates" for a sample of Michigan manufacturing firms.
▶ Was there a significant change in the scrap rate between 1987 and 1988?
▶ n = 20 ⇒ The critical value for a 95% confidence interval with n − 1 = 19 degrees of freedom is 2.093, which is the 97.5th percentile of the t19 distribution (see Wooldridge, page 825).
▶ ȳ = −1.15, se(ȳ) = s/√n = .54
▶ The confidence interval for the mean change in scrap rates µ is [ȳ ± 2.093 se(ȳ)]
⇒ The 95% confidence interval is [−2.28, −.02]
⇒ The average change in scrap rates is statistically significant (i.e. significantly different from zero) at a significance level of 5%
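The numbers above can be reproduced in a few lines (a sketch; scipy assumed):

    from scipy import stats

    ybar, se, n = -1.15, 0.54, 20
    c = stats.t.ppf(0.975, df=n - 1)     # 2.093
    print(ybar - c * se, ybar + c * se)  # approximately (-2.28, -0.02)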



4. Hypothesis Testing



▶ We often want to test a certain hypothesis to learn something about the "true" value θ.

▶ Formally, we want to test whether θ is significantly different from a certain value µ0:

    H0 : θ = µ0

▶ Since µ0 = 0 is the most common hypothesis, H0 is called the null hypothesis.

▶ The alternative hypothesis is

    H1 : θ ≠ µ0



▶ If the value µ0 does not lie within the calculated confidence interval, then we reject the null hypothesis.
▶ If the value µ0 lies within the calculated confidence interval, then we fail to reject the null hypothesis.
▶ In both cases, there is a certain risk that our conclusion is wrong.
▶ We can reject the null hypothesis when it is in fact true (Type I error).
▶ We can fail to reject the null hypothesis when it is actually false (Type II error).
▶ We usually address these problems by the choice of the significance level α.
▶ We will never know with certainty whether we committed an error.


▶ Testing hypotheses about the mean µ of a N(µ, σ²) distribution is straightforward:

▶ Null hypothesis:

    H0 : µ = µ0

▶ Alternative hypotheses:
  A. H1 : µ > µ0 (one-sided hypothesis)
  B. H1 : µ < µ0 (one-sided hypothesis)
  C. H1 : µ ≠ µ0 (two-sided hypothesis)



Testing hypotheses about the mean µ of a N(µ, σ²) distribution is straightforward:

A. One-tailed test, µ > µ0: We reject H0 in favor of H1 when the value of the sample average, ȳ, is "sufficiently" greater than µ0:

  1. Calculate the t-statistic: t = (ȳ − µ0)/se(ȳ)
  2. Compare t with the critical value c for a significance level of 5%

  ⇒ If n is large: c = 1.645
  ⇒ If t > c, we reject H0 at a significance level of 5%
  ⇒ If t < c, we fail to reject H0 at a significance level of 5%


B. One-tailed test, µ < µ0: We reject H0 in favor of H1 when the value of the sample average, ȳ, is "sufficiently" smaller than µ0

  ⇒ If t < −c, we reject H0 at a significance level of 5%
  ⇒ If t > −c, we fail to reject H0 at a significance level of 5%

C. Two-tailed test, µ ≠ µ0: We reject H0 in favor of H1 when the value of the sample average, ȳ, is far from µ0 in absolute value

  ⇒ If |t| > c, we reject H0 at a significance level of 5%
  ⇒ If |t| < c, we fail to reject H0 at a significance level of 5%



A. One-tailed test, µ > µ0: If the significance level is α = .05 = 5%, then the critical value c is the 100(1 − α) = 95th percentile of the tn−1 distribution

B. One-tailed test, µ < µ0: If the significance level is α = .05 = 5%, then the critical value −c is the 100α = 5th percentile of the tn−1 distribution

C. Two-tailed test, µ ≠ µ0: If the significance level is α = .05 = 5%, then the critical value c is the 100(1 − α/2) = 97.5th percentile of the tn−1 distribution



p-value
To provide additional information, we could ask: what is the largest significance level at which we could carry out the test and still fail to reject the null hypothesis?

⇒ We can consider the p-value of a test:

▶ Calculate the t-statistic t
▶ The largest significance level at which we would fail to reject H0 is the significance level associated with using t as our critical value:

    p\text{-value} = 1 - \Phi(t),

  where Φ(·) denotes the standard normal cdf (we assume that n is large enough to treat the test statistic as having a standard normal distribution)

Two-tailed test:

    p\text{-value} = 2\left(1 - \Phi(|t|)\right)



General approach:

1. State H0 (µ = µ0)
2. State H1 (µ < µ0, µ > µ0 or µ ≠ µ0)
3. If necessary, calculate ȳ, s and se(ȳ)
4. Calculate the t-statistic: t = (ȳ − µ0)/se(ȳ)
5. Find the critical value, which depends on (i) the significance level, (ii) the alternative hypothesis (i.e. one-tailed or two-tailed test) and (iii) the degrees of freedom (n − 1)
6. Compare t with c for the given significance level α, then reject or fail to reject the null hypothesis. The decision rule depends on H1:
   µ > µ0 (one-tailed test): reject H0 if t > c
   µ < µ0 (one-tailed test): reject H0 if t < −c
   µ ≠ µ0 (two-tailed test): reject H0 if |t| > c
7. (If requested:) Calculate the p-value: 1 − Φ(t) for a one-tailed test, 2(1 − Φ(|t|)) for a two-tailed test



Example

1. H0 : µ = 0
2. H1 : µ ≠ 0
3. ȳ = −1.15, se(ȳ) = .54
4. t = (ȳ − 0)/se(ȳ) = −1.15/0.54 ≈ −2.13
5. Critical value:
   (i) significance level: 5%
   (ii) two-tailed test
   (iii) degrees of freedom: n − 1 = 20 − 1 = 19
   ⇒ c = 2.093
6. Reject H0 if |t| > c: 2.13 > 2.093 ⇒ We reject H0 at a significance level of 5%
7. The smallest significance level at which we would reject H0?
   ⇒ p-value = 2[1 − Φ(|t|)] = 2[1 − Φ(2.13)] ≈ .033
   ⇒ We would still reject H0 at a significance level of 3.3%
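These steps can be verified numerically (a sketch; scipy assumed):

    from scipy import stats

    ybar, se, n = -1.15, 0.54, 20
    t = (ybar - 0) / se                   # approximately -2.13
    c = stats.t.ppf(0.975, df=n - 1)      # 2.093 (two-tailed, 5%)
    p = 2 * (1 - stats.norm.cdf(abs(t)))  # approximately 0.033
    print(t, abs(t) > c, p)               # reject H0 at the 5% level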



5. LLN and CLT



The Law of Large Numbers (LLN) for i.i.d. data

▶ For n observations of i.i.d. (independent and identically distributed) data X1, X2, X3, ..., Xn, with E(X) = µ and Var(X) = σ² < ∞, we have

    \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} E(X) = \mu

▶ How do we prove it?
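Before the proof, a quick numerical illustration of this convergence (a sketch; numpy and the illustrative distribution are assumptions):

    import numpy as np

    rng = np.random.default_rng(5)
    mu = 2.0
    x = rng.exponential(scale=mu, size=1_000_000)  # i.i.d. draws with E(X) = 2

    running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
    for n in (10, 1_000, 1_000_000):
        print(n, running_mean[n - 1])  # approaches mu = 2 as n grows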



Proof of LLN for i.i.d. data

▶ (Chebyshev inequality) For all ε > 0,

    P\{|X - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2}

▶ How do we prove it?

    P\{|X - \mu| \ge \varepsilon\} = \int_{|x - \mu| \ge \varepsilon} f(x)\,dx \le \int_{|x - \mu| \ge \varepsilon} \frac{|x - \mu|^2}{\varepsilon^2} f(x)\,dx \le \frac{1}{\varepsilon^2}\int |x - \mu|^2 f(x)\,dx = \frac{\sigma^2}{\varepsilon^2}


Proof of LLN for i.i.d. data (cont.)

▶ Using the Chebyshev inequality, we have

    P\{|\bar{X} - \mu| \ge \varepsilon\} \le \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}

  which means

    \lim_{n\to\infty} P\{|\bar{X} - \mu| \ge \varepsilon\} = 0

▶ The key is the computation of Var(X̄): for i.i.d. X, we have

    \mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{\mathrm{Var}(X_i)}{n}

▶ What is the case for dependent data?



The Central Limit Theorem (CLT) for i.i.d. data

▶ For n observations of i.i.d. (independent and identically distributed) data X1, X2, X3, ..., Xn, with E(X) = µ and Var(X) = σ² < ∞, we have

    \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \xrightarrow{d} N(0, 1)

▶ How do we prove it? (self-reading)
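Before the (self-reading) proof, a simulation sketch of this result (numpy and the illustrative distribution are assumptions): standardized means of a skewed distribution look standard normal once n is large.

    import numpy as np

    rng = np.random.default_rng(6)
    mu, sigma, n, reps = 1.0, 1.0, 1000, 20_000  # exponential(1): mu = sigma = 1

    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma
    print(z.mean(), z.std())    # ~ 0 and ~ 1
    print(np.mean(z <= 1.645))  # ~ 0.95, matching Phi(1.645)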



Proof of CLT for i.i.d. data

▶ The characteristic function (CF) of Y is defined as

    \varphi(t) = E(e^{itY}), \quad i = \sqrt{-1}

▶ Define Yn as

    Y_n = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} = \frac{n(\bar{X} - \mu)}{\sqrt{n}\,\sigma} = \sum_{i=1}^{n} \frac{X_i - \mu}{\sqrt{n}\,\sigma} = \sum_{i=1}^{n} \frac{\eta_i}{\sqrt{n}\,\sigma}

▶ The CF of Yn is

    \varphi(t) = E\left(e^{itY_n}\right) = \left[\phi\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^{n}

  where ϕ(t) is the CF of ηi = Xi − µ


Proof of CLT for i.i.d. data (cont.)

▶ Expanding ϕ(t) at zero using a Taylor expansion,

    \phi\!\left(\frac{t}{\sigma\sqrt{n}}\right) = \phi(0) + \phi'(0)\,\frac{it}{\sigma\sqrt{n}} + \frac{\phi''(0)}{2!}\left(\frac{it}{\sigma\sqrt{n}}\right)^{2} + o\!\left(\left(\frac{it}{\sigma\sqrt{n}}\right)^{2}\right) = 1 + 0 - \frac{t^2}{2n} + o\!\left(\left(\frac{it}{\sigma\sqrt{n}}\right)^{2}\right)

  therefore,

    \varphi(t) = \left(1 - \frac{t^2}{2n} + o\!\left(\left(\frac{it}{\sigma\sqrt{n}}\right)^{2}\right)\right)^{n} = \left[\left(1 - \frac{t^2}{2n} + \cdots\right)^{-\frac{2n}{t^2}}\right]^{-\frac{t^2}{2}} \to e^{-\frac{t^2}{2}}, \quad n \to +\infty

  and e^{−t²/2} is the CF of N(0, 1).


▶ To derive the LLN and CLT for serially dependent data (or time series data), we need more concepts and assumptions on the stochastic process, including:
  ▶ The Mean and Autocovariance
  ▶ Stationarity
  ▶ Ergodicity



6. Maximum Likelihood Estimation (MLE)



Motivating Examples: Time Invariant Model

▶ Time Invariant Model:

    y_t = \sigma z_t

  where σ is the scale parameter and zt ∼ N(0, 1). Thus yt ∼ N(0, σ²). The density function of yt is

    f(y_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{y_t^2}{2\sigma^2}\right)



Motivating Examples: Count Model

▶ Consider a time series of counts from a Poisson distribution,

    f(y; \theta) = \frac{\theta^{y} \exp[-\theta]}{y!}, \quad y = 0, 1, 2, 3, \ldots

  where θ > 0 is an unknown parameter.



Motivating Examples: Linear Regression Model

▶ Consider the regression model

    y_t = \beta x_t + \sigma z_t, \quad z_t \sim \text{iid } N(0, 1)

  where xt is an explanatory variable that is independent of zt and θ = (β, σ²). The distribution of yt conditional on xt is still normal, with mean βxt and variance σ²:

    f(y_t \mid x_t; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_t - \beta x_t)^2}{2\sigma^2}\right]



Motivating Examples: Autoregressive Model

▶ A first-order autoregressive model, denoted AR(1), is

    y_t = \rho y_{t-1} + u_t, \quad u_t \sim \text{iid } N(0, \sigma^2)

  with |ρ| < 1 and θ = (ρ, σ²). The distribution of yt conditional on yt−1 is normal with mean ρyt−1 and variance σ²:

    f(y_t \mid y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_t - \rho y_{t-1})^2}{2\sigma^2}\right]
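A sketch that simulates an AR(1) series and evaluates this conditional log-likelihood (numpy and the chosen ρ, σ², T are assumptions):

    import numpy as np

    def ar1_loglik(y, rho, sigma2):
        """Conditional log-likelihood of an AR(1): sum over t of ln f(y_t | y_{t-1})."""
        e = y[1:] - rho * y[:-1]  # residuals y_t - rho * y_{t-1}
        return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - e**2 / (2 * sigma2))

    rng = np.random.default_rng(7)
    rho, sigma2, T = 0.8, 1.0, 500
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + rng.normal(0.0, np.sqrt(sigma2))

    print(ar1_loglik(y, 0.8, 1.0))  # higher than at wrong parameter values, e.g.:
    print(ar1_loglik(y, 0.2, 1.0))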



Joint Probability Distribution

▶ The joint probability pdf for a sample of T observations is

    f(y_1, y_2, \ldots, y_T; \psi)

  where ψ is the vector of parameters.



Joint Probability Distribution

▶ Independent case: the yt are independent,

    f(y_1, y_2, \ldots, y_T; \theta) = \prod_{t=1}^{T} f(y_t; \theta)

▶ Dependent case:

    f(y_1, y_2, \ldots, y_T; \theta) = f(y_1; \theta) \prod_{t=2}^{T} f(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1)

  where we have

    f(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1) = f(y_t \mid \mathcal{F}_{t-1})



Maximum Likelihood Framework

▶ Maximum likelihood principle:

  ▶ The maximum likelihood estimator is the value of θ that is "most likely" to have generated the observed data:

    \hat{\theta} = \operatorname*{argmax}_{\theta}\; f(y_1, y_2, \ldots, y_T; \theta)

▶ Maximum likelihood framework:

  ▶ Log-likelihood function
  ▶ Gradient
  ▶ Hessian



Log-likelihood function

▶ Log-likelihood function:

    \ln L_T(\theta) = \ln f(y_1, y_2, \ldots, y_T; \theta)

▶ The maximum likelihood estimator (MLE) of θ is defined as the value of θ, denoted θ̂, that maximizes the log-likelihood function ln LT(θ).



Log-likelihood function

▶ Poisson distribution: Let {y1, y2, ..., yT} be iid observations from a Poisson distribution,

    f(y; \theta) = \frac{\theta^{y} \exp[-\theta]}{y!}, \quad y = 0, 1, 2, 3, \ldots

  The log-likelihood function for the sample is

    \ln L_T(\theta) = \sum_{t=1}^{T} \left[ y_t \ln\theta - \theta - \ln(y_t!) \right]
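Setting the score Σt(yt/θ − 1) to zero gives the closed-form MLE θ̂ = ȳ; the sketch below (numpy/scipy assumed, illustrative data) confirms this by direct numerical maximization:

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import gammaln

    rng = np.random.default_rng(8)
    y = rng.poisson(lam=3.0, size=500)

    def neg_loglik(theta):
        # -ln L_T(theta), using ln(y!) = gammaln(y + 1)
        return -np.sum(y * np.log(theta) - theta - gammaln(y + 1))

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
    print(res.x, y.mean())  # numerical MLE matches the closed form theta_hat = ybar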



Gradient (or Score)

▶ Differentiating ln LT(θ) with respect to a (K × 1) parameter vector θ yields a (K × 1) gradient vector, also known as the score, given by

    G_T(\theta) = \frac{\partial \ln L_T(\theta)}{\partial \theta} = \begin{pmatrix} \frac{\partial \ln L_T(\theta)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ln L_T(\theta)}{\partial \theta_K} \end{pmatrix} = \sum_{t=1}^{T} g_t(\theta)

  where g_t(\theta) = \frac{\partial \ell_t(\theta)}{\partial \theta} and \ell_t(\theta) = \ln f(y_t \mid \mathcal{F}_{t-1})

▶ In most cases, the maximum likelihood estimator θ̂ satisfies the necessary condition

    G_T(\hat{\theta}) = \left.\frac{\partial \ln L_T(\theta)}{\partial \theta}\right|_{\theta = \hat{\theta}} = 0



Gradient (or Score)

▶ Normal distribution:

    G_T(\theta) = \begin{pmatrix} \frac{\partial \ln L_T(\theta)}{\partial \mu} \\ \frac{\partial \ln L_T(\theta)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sigma^2} \sum_{t=1}^{T} (y_t - \mu) \\ -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{t=1}^{T} (y_t - \mu)^2 \end{pmatrix}

  Setting GT(θ) = 0, we have

    \hat{\mu} = \frac{1}{T} \sum_{t=1}^{T} y_t = \bar{y}, \qquad \hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} (y_t - \bar{y})^2
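A sketch verifying these closed-form solutions against a generic numerical optimizer (numpy/scipy assumed, illustrative data):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(9)
    y = rng.normal(2.0, 1.5, size=1000)

    def neg_loglik(params):
        mu, log_s2 = params  # optimize ln(sigma^2) to keep sigma^2 > 0
        s2 = np.exp(log_s2)
        return 0.5 * np.sum(np.log(2 * np.pi * s2) + (y - mu) ** 2 / s2)

    res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    mu_hat, s2_hat = res.x[0], np.exp(res.x[1])
    print(mu_hat, y.mean())       # mu_hat equals ybar
    print(s2_hat, y.var(ddof=0))  # sigma2_hat equals (1/T) * sum (y_t - ybar)^2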



Hessian

▶ The Hessian matrix is

    H_T(\theta) = \frac{\partial^2 \ln L_T(\theta)}{\partial \theta\, \partial \theta'} = \sum_{t=1}^{T} h_t(\theta)

    H_T(\theta) = \begin{pmatrix} \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_1 \partial \theta_1} & \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_1 \partial \theta_2} & \cdots & \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_1 \partial \theta_K} \\ \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_2 \partial \theta_2} & \cdots & \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_2 \partial \theta_K} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_K \partial \theta_1} & \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_K \partial \theta_2} & \cdots & \frac{\partial^2 \ln L_T(\theta)}{\partial \theta_K \partial \theta_K} \end{pmatrix}

▶ The second-order condition ensuring that the MLE maximizes the log-likelihood function is that the Hessian matrix HT(θ̂) be negative definite.
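A sketch checking this second-order condition numerically, via a finite-difference Hessian at the normal MLE (numpy assumed, illustrative data):

    import numpy as np

    rng = np.random.default_rng(10)
    y = rng.normal(2.0, 1.5, size=1000)

    def loglik(params):
        mu, s2 = params
        return -0.5 * np.sum(np.log(2 * np.pi * s2) + (y - mu) ** 2 / s2)

    theta_hat = np.array([y.mean(), y.var(ddof=0)])  # closed-form MLE

    # Central finite-difference approximation to the Hessian of ln L_T at theta_hat
    h, K = 1e-4, 2
    H = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            ei, ej = np.eye(K)[i] * h, np.eye(K)[j] * h
            H[i, j] = (loglik(theta_hat + ei + ej) - loglik(theta_hat + ei - ej)
                       - loglik(theta_hat - ei + ej) + loglik(theta_hat - ei - ej)) / (4 * h**2)

    print(np.linalg.eigvalsh(H))  # all eigenvalues negative => negative definite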



Asymptotic Properties: Consistency

▶ If f(y; θ) is correctly specified, then under suitable regularity conditions, the MLE is consistent:

    \operatorname{plim}\, \hat{\theta} = \theta_0

