
AE 248: AI and Data Science

Parameter Estimation

Prabhu Ramachandran

2024-03-01

Parameter Estimation

• Probability theory: you are given 𝐹


• Statistics: observed data → infer unknown parameters

Estimates

• Given $X_1, \ldots, X_n$ drawn from $F_\theta$


• 𝐹𝜃 not fully specified, 𝜃 unknown
• Example:
– Exponential distribution with unknown mean
– Normal with unknown mean and variance.

Estimates/Estimators

• Point estimates
• Interval estimates
• Confidence
• Estimator: statistic to estimate unknown parameter 𝜃

Maximum Likelihood Estimators

• Assume unknown parameter 𝜃


• Find the joint PDF/PMF, $f(x_1, \ldots, x_n \mid \theta)$
• Maximize $f$ w.r.t. $\theta$ to obtain $\hat{\theta}$
• $f(x_1, \ldots, x_n \mid \theta)$ is called the likelihood function

. . .

• Provides a point estimate


• Note: $f$ and $\log f$ attain their maximum at the same $\theta$

MLE Example: Bernoulli Parameter

• $n$ Bernoulli trials with success probability $p$
• What is the MLE of $p$?
• Data consist of the values $X_1, \ldots, X_n$

Solution

$$P\{X_i = x\} = p^x (1 - p)^{1 - x}, \quad x = 0, 1$$

$$f(x_1, \ldots, x_n \mid p) = p^{\sum_i x_i} (1 - p)^{n - \sum_i x_i}$$

Maximize $\log f(x_1, \ldots, x_n \mid p)$ with respect to $p$.

Answer

$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$
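A minimal sketch (with made-up data) that checks the closed-form estimate against a grid search over the log-likelihood:

```python
# Bernoulli MLE sketch; the trial outcomes below are hypothetical.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # made-up Bernoulli trials
p_hat = x.sum() / len(x)                      # closed-form MLE: sum(x_i)/n

# Sanity check: the log-likelihood is maximized near p_hat.
p = np.linspace(0.01, 0.99, 999)
log_lik = x.sum() * np.log(p) + (len(x) - x.sum()) * np.log(1 - p)
assert np.isclose(p[np.argmax(log_lik)], p_hat, atol=0.01)
print(p_hat)  # 0.7
```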

MLE Example: Poisson Parameter

• 𝑛 independent Poisson RVs with mean 𝜆


• Find 𝜆̂

Solution

$$f(x_1, \ldots, x_n \mid \lambda) = \frac{e^{-n\lambda} \lambda^{\sum_i x_i}}{x_1! \cdots x_n!}$$

Maximize $\log f(x_1, \ldots, x_n \mid \lambda)$ with respect to $\lambda$.

Answer
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Problem

The number of traffic accidents in Mumbai on 10 randomly chosen non-rainy days in 2003 is as follows:

4, 0, 6, 5, 2, 1, 2, 0, 4, 3

Use this to estimate the proportion of non-rainy days that year that had 2 or fewer accidents. A sketch of one approach follows.
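One way to attack this, assuming the daily accident count is Poisson (so the MLE $\hat{\lambda}$ is the sample mean) and then reading $P\{X \le 2\}$ off the fitted model:

```python
# Fit a Poisson model by MLE, then estimate the proportion of days
# with 2 or fewer accidents from the fitted distribution.
import numpy as np
from scipy.stats import poisson

counts = np.array([4, 0, 6, 5, 2, 1, 2, 0, 4, 3])
lam_hat = counts.mean()          # MLE of the Poisson mean: 2.7
print(poisson.cdf(2, lam_hat))   # ~0.49, the estimated proportion
```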

MLE for Normal Population

• Self-study
• Same idea and approach
• Two parameters, so maximize w.r.t. each

MLE for a Uniform Distribution

• If $x \in (0, \theta)$, the likelihood of the sample is $1/\theta^n$
• $\hat{\theta}$ should be as small as possible
• But large enough to include the largest $X_i$, so $\hat{\theta} = \max_i X_i$

Interval estimates

• Given $X_1, \ldots, X_n$ drawn from $\mathcal{N}(\mu, \sigma)$


• Unknown 𝜇 but known 𝜎
• MLE 𝜇̂ = 𝑋̄

. . .

• Is the MLE equal to the actual $\mu$?


• Can we provide an interval in which 𝜇 lies?

Interval estimates
• $\sqrt{n}\, \frac{\bar{X} - \mu}{\sigma}$ is a standard normal
• So, for example:

$$P\left\{-1.96 < \sqrt{n}\, \frac{\bar{X} - \mu}{\sigma} < 1.96\right\} = 0.95$$

Interval estimates

• Can be modified to:

$$P\left\{\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}\right\} = 0.95$$

Example

• Given an observed $\bar{x}$, this means:
  – With 95% confidence, the mean lies within $\pm 1.96\, \sigma/\sqrt{n}$ of $\bar{x}$
• This is the 95 percent confidence interval estimate of $\mu$

Interpretation

• If we constructed the interval 100 times (from 100 independent samples), we would expect about 95 of the intervals to contain the true mean $\mu$.
• There is a 95% probability that a CI computed from a future experiment will contain the true mean.

Example from textbook

Suppose that when a signal having value $\mu$ is transmitted from location A, the value received at location B is normally distributed with mean $\mu$ and variance 4. That is, if $\mu$ is sent, then the value received is $\mu + N$, where $N$, representing noise, is normal with mean 0 and variance 4. To reduce error, suppose the same value is sent 9 times. If the successive values received are 5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5, let us construct a 95 percent confidence interval for $\mu$.
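Worked numerically, with known $\sigma = 2$ (variance 4) and $n = 9$:

```python
# Two-sided 95% interval with known sigma, using z = 1.96.
import numpy as np

values = np.array([5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5])
xbar, sigma, n = values.mean(), 2.0, len(values)   # xbar = 9.0
half = 1.96 * sigma / np.sqrt(n)
print(xbar - half, xbar + half)   # roughly (7.69, 10.31)
```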

Two-sided vs one-sided

• With 95% confidence, assert that $\mu$ is at least as large as a computed lower bound:

$$P\left\{\sqrt{n}\, \frac{\bar{X} - \mu}{\sigma} < 1.645\right\} = 0.95$$

$$P\left\{\bar{X} - 1.645 \frac{\sigma}{\sqrt{n}} < \mu\right\} = 0.95$$

One-sided intervals

• One-sided upper CI for $\mu$: $(\bar{x} - 1.645\, \sigma/\sqrt{n},\ \infty)$
• One-sided lower CI for $\mu$: $(-\infty,\ \bar{x} + 1.645\, \sigma/\sqrt{n})$

Using the tables

• Recall $P\{Z > z_\alpha\} = \alpha$
• $P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha$

$$P\left\{\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$
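A small sketch that replaces the table lookup with scipy's `norm.ppf`; the helper name `z_interval` is ours:

```python
# General z-based interval for the mean with known sigma.
import numpy as np
from scipy.stats import norm

def z_interval(xbar, sigma, n, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)   # z_{alpha/2}, e.g. 1.96 for alpha = 0.05
    half = z * sigma / np.sqrt(n)
    return xbar - half, xbar + half

print(z_interval(9.0, 2.0, 9))    # reproduces the example above
```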

Finding suitable n

• Given a desired interval width $w$
• Find $n$ to satisfy it: the two-sided interval has width $2 z_{\alpha/2}\, \sigma / \sqrt{n}$, so choose $n \ge (2 z_{\alpha/2}\, \sigma / w)^2$ (see the sketch below)
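A hypothetical helper that inverts the width formula above:

```python
# Smallest n whose two-sided interval has total width at most w,
# from width = 2 * z_{alpha/2} * sigma / sqrt(n).
import math
from scipy.stats import norm

def n_for_width(sigma, w, alpha=0.05):
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((2 * z * sigma / w) ** 2)

print(n_for_width(sigma=2.0, w=1.0))   # 62 for a width-1, 95% interval
```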

So what if variance is not known?


• Cannot assume $\sqrt{n}\, \frac{\bar{X} - \mu}{\sigma}$ is $Z$, since $\sigma$ is unknown
• We can compute the sample variance $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

So what if variance is not known?


• $\sqrt{n}\, \frac{\bar{X} - \mu}{S} \sim t_{n-1}$

$$P\left\{\bar{X} - t_{\alpha/2, n-1} \frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2, n-1} \frac{S}{\sqrt{n}}\right\} = 1 - \alpha$$
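A sketch of the $t$-based interval; `t_interval` is our own helper name:

```python
# t-based interval for the mean when sigma is unknown: S is the
# sample standard deviation and t_{n-1} quantiles replace z.
import numpy as np
from scipy.stats import t

def t_interval(data, alpha=0.05):
    data = np.asarray(data, dtype=float)
    n, xbar = len(data), data.mean()
    s = data.std(ddof=1)                      # sample standard deviation S
    half = t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
    return xbar - half, xbar + half
```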

Non-normal populations

• The central limit theorem applies, so if $n$ is "large enough" we should be good.

Confidence intervals for the variance


• Recall that $(n - 1) \frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$
• Homework.
• Note that $\chi^2$ is not symmetric
• Use $\chi^2_{\alpha/2, n-1}$ and $\chi^2_{1 - \alpha/2, n-1}$

Example

The weights of 5 students were found to be 61, 65, 68, 58, and 70 kg. Determine a 95% confidence interval for their mean. Also determine a 95% lower confidence interval for this mean.
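A sketch for this example, assuming the $t$-based interval applies since $\sigma$ is unknown:

```python
# Two-sided and one-sided lower intervals for the mean weight.
import numpy as np
from scipy.stats import t

w = np.array([61, 65, 68, 58, 70], dtype=float)
n, xbar, s = len(w), w.mean(), w.std(ddof=1)

half = t.ppf(0.975, n - 1) * s / np.sqrt(n)
print(xbar - half, xbar + half)   # two-sided 95%: about (58.3, 70.5)

# 95% lower CI (-inf, upper bound), in the sense defined earlier:
print(xbar + t.ppf(0.95, n - 1) * s / np.sqrt(n))   # about 69.1
```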

Difference in means

• $X_1, \ldots, X_n$ from $\mathcal{N}(\mu_1, \sigma_1)$
• $Y_1, \ldots, Y_m$ from $\mathcal{N}(\mu_2, \sigma_2)$
• CI for $\mu_1 - \mu_2$?
• Recall: the difference of two independent normally distributed RVs is normal

Difference in means

• MLE of $\mu_1 - \mu_2$ is $\bar{X} - \bar{Y}$

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{m}}} \sim \mathcal{N}(0, 1)$$

When variances are not known?

• If $\sigma_1 \neq \sigma_2$ we have a problem
• If they are equal, the same approach as before can be used:

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{m}}} \sim \mathcal{N}(0, 1)$$

Variances unknown

• $\bar{X}$, $S_1^2$, $\bar{Y}$, $S_2^2$ are independent
• If we consider the pooled estimator

$$S_p^2 = \frac{(n - 1) S_1^2 + (m - 1) S_2^2}{n + m - 2}$$

Variances unknown

$$\frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(\frac{1}{n} + \frac{1}{m}\right)}} \sim t_{n+m-2}$$
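A sketch combining the pooled estimator and the $t_{n+m-2}$ pivot; the helper name and the two samples are hypothetical:

```python
# Pooled two-sample t interval for mu_1 - mu_2 under equal but
# unknown variances.
import numpy as np
from scipy.stats import t

def pooled_t_interval(x, y, alpha=0.05):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
    half = t.ppf(1 - alpha / 2, n + m - 2) * np.sqrt(sp2 * (1 / n + 1 / m))
    diff = x.mean() - y.mean()
    return diff - half, diff + half

print(pooled_t_interval([61, 65, 68], [58, 70, 64, 66]))  # made-up samples
```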

Approximate CI for Bernoulli RV


• When $n$ is large, $\frac{X - np}{\sqrt{np(1 - p)}} \sim \mathcal{N}(0, 1)$
• To get a CI, let $\hat{p} = X/n$
• So $\frac{X - np}{\sqrt{n \hat{p}(1 - \hat{p})}} \sim \mathcal{N}(0, 1)$, approximately
• $P\left\{\hat{p} - z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n} < p < \hat{p} + z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}\right\} \approx 1 - \alpha$
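A sketch of this approximate (Wald) interval; the success count and $n$ below are made up:

```python
# Approximate large-n interval for a Bernoulli p; x successes in n trials.
import numpy as np
from scipy.stats import norm

def wald_interval(x, n, alpha=0.05):
    p_hat = x / n
    half = norm.ppf(1 - alpha / 2) * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(wald_interval(52, 100))   # about (0.42, 0.62)
```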

Evaluating point estimators

• How good is an estimator $d(X_1, \ldots, X_n)$?
• One measure is the mean-square error $E[(d(\mathbf{X}) - \theta)^2]$
• A desirable quality is unbiasedness

Unbiased estimators

• Bias is defined as $b_\theta(d) = E[d(\mathbf{X})] - \theta$
• Unbiased if $b_\theta(d) = 0$
• If $d$ is unbiased then $E[(d(\mathbf{X}) - \theta)^2] = \mathrm{Var}(d(\mathbf{X}))$

Bayes estimator

• Prior information on distribution of 𝜃, i.e. 𝑝(𝜃)


• Use data to find posterior density

$$f(\theta \mid x_1, \ldots, x_n) = \frac{f(\theta, x_1, \ldots, x_n)}{f(x_1, \ldots, x_n)} = \frac{p(\theta)\, f(x_1, \ldots, x_n \mid \theta)}{\int f(x_1, \ldots, x_n \mid \theta)\, p(\theta)\, d\theta}$$

Bayes estimator

• Best estimate of 𝜃 is the mean of the posterior:

$$E[\theta \mid X_1 = x_1, \ldots, X_n = x_n] = \int \theta\, f(\theta \mid x_1, \ldots, x_n)\, d\theta$$

• See examples in the book
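As one concrete instance (a conjugate-prior assumption of ours, not from the slides): a Beta$(a, b)$ prior on a Bernoulli $p$ gives a Beta$(a + \sum_i x_i,\ b + n - \sum_i x_i)$ posterior, so the Bayes estimate is the posterior mean:

```python
# Beta-Bernoulli Bayes estimator sketch; data and prior are made up.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])   # hypothetical Bernoulli data
a, b = 1.0, 1.0                                # uniform Beta(1, 1) prior
a_post, b_post = a + x.sum(), b + len(x) - x.sum()
print(a_post / (a_post + b_post))              # posterior mean ~0.667
```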
