
Statistics II: Introduction to Inference

Week 9: Large Sample Theory

1 Sampling distributions
Recall the sampling distributions of some important statistics.
Example 1. Let X1, . . . , Xn be iid N(µ, σ²). Define T1n = X̄n (sample mean) and T2n = Sn² (sample variance). Then T1n ∼ Normal(µ, n⁻¹σ²) and nT2n ∼ σ²χ²(n−1).

Example 2. Let X1, . . . , Xn be iid Gamma(α, β); then T3n = X̄n ∼ n⁻¹Gamma(nα, β).

Fig. 1: Plots of the three statistics: T1n and T2n generated from Normal(1, 1), and T3n generated from Gamma(2, 4), for varying sample sizes (n).
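
Plots like those in Fig. 1 are easy to regenerate by simulation. Below is a minimal sketch, assuming NumPy, with the distributions Normal(1, 1) and Gamma(2, 4) taken from the caption; reading Gamma(α, β) with β as a rate (scale 1/β) is an assumption, since the notes do not fix the parameterization. It summarizes how the sampling distributions of T1n, T2n and T3n concentrate as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
reps = 10_000  # replicates of each statistic per sample size

for n in (5, 50, 500):
    x = rng.normal(1, 1, size=(reps, n))     # reps samples of size n from Normal(1, 1)
    t1 = x.mean(axis=1)                      # T1n: sample mean
    t2 = x.var(axis=1)                       # T2n: sample variance (divisor n)
    t3 = rng.gamma(2.0, 1.0 / 4.0, size=(reps, n)).mean(axis=1)  # T3n from Gamma(2, 4)
    print(f"n={n:4d}  sd(T1n)={t1.std():.3f}  sd(T2n)={t2.std():.3f}  sd(T3n)={t3.std():.3f}")
```

The reported standard deviations shrink roughly like 1/√n, which is the tightening visible in the figure.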

• As we see in the above examples, the distribution of a good statistic or estimator usually depends on n.

• Usually, a good statistic or estimator, say Tn, is a function of X1, . . . , Xn. For example,

– T11 = X1,
– T12 = (X1 + X2)/2,
– T13 = (X1 + X2 + X3)/3, and so on, with
– Tn = X̄n.

• Therefore, it is natural to ask: how does the statistic behave when we have a large number of representative samples?

• In this module, we will study the large-sample behaviour of statistics.

2 Convergence of a sequence of random variables


Towards understanding the large-sample behavior of a statistic, say Tn, it is important to understand what we mean by convergence of a sequence of random variables.
Example 3. Let X1, · · · , Xn be iid Ber(p) and consider the statistic Tn = X̄n. The distribution of Tn changes with n as follows:

T1 = { 0 w.p. (1 − p)
     { 1 w.p. p,

T2 = { 0 w.p. (1 − p)²
     { 1/2 w.p. 2p(1 − p)
     { 1 w.p. p²,

T3 = { 0 w.p. (1 − p)³
     { 1/3 w.p. 3p(1 − p)²
     { 2/3 w.p. 3p²(1 − p)
     { 1 w.p. p³,

and so on.

So, one cannot directly carry over the concept of convergence of a real sequence to convergence of random variables (RVs). [Recall: a real sequence {an} converges to a point a if for any ϵ > 0, there exists nϵ such that |an − a| < ϵ for all n ≥ nϵ.] Therefore, to define convergence of a sequence of RVs to a RV, i.e., to define Tn → T, where {Tn} and T are RVs, one first

• reduces the difference between Tn and T to a real number,

• and then applies the concept of convergence of real sequences.

Definition 1 (Convergence in probability). Let {Tn} be a sequence of RVs and T be another RV such that all of them are defined on the same probability space (so that the RV Tn − T is well-defined). Then we say that Tn converges in probability to T, notationally Tn →p T, if for any ϵ > 0,

pn,ϵ = P(|Tn − T| > ϵ) → 0 as n → ∞.

Remark 1. Observe that {pn,ϵ} here is a real sequence of numbers measuring the difference between Tn and T (at level ϵ).

Remark 2. Sometimes the limiting RV T is of degenerate type, i.e., P(T = c) = 1. In such cases, Tn →p T can equivalently be expressed as Tn →p c.
Example 4. Let X1, · · · , Xn be iid U(0, θ), and let Tn = X(n). Show that Tn →p θ.

Fix any ϵ > 0. Then

P(|Tn − θ| < ϵ) = P(θ − ϵ < Tn < θ + ϵ)
               = { P(θ − ϵ < Tn ≤ θ)   if ϵ ≤ θ
                 { P(0 ≤ Tn ≤ θ)       if ϵ > θ
               = { 1 − ((θ − ϵ)/θ)ⁿ    if ϵ ≤ θ
                 { 1                   if ϵ > θ
               → 1 as n → ∞,

since 0 ≤ (θ − ϵ)/θ < 1 when ϵ ≤ θ. Thus, pn,ϵ = P(|Tn − θ| > ϵ) → 0 as n → ∞.
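
The convergence in Example 4 can also be checked numerically, by estimating pn,ϵ = P(|Tn − θ| > ϵ) via Monte Carlo and comparing with the exact value ((θ − ϵ)/θ)ⁿ derived above. A minimal sketch, assuming NumPy; θ = 2 and ϵ = 0.05 are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, eps, reps = 2.0, 0.05, 10_000

for n in (10, 100, 1000):
    tn = rng.uniform(0, theta, size=(reps, n)).max(axis=1)  # Tn = X_(n), the sample maximum
    p_hat = np.mean(np.abs(tn - theta) > eps)               # Monte Carlo estimate of p_{n,eps}
    p_exact = ((theta - eps) / theta) ** n                  # exact tail from Fn(t) = (t/theta)^n
    print(f"n={n:5d}  p_hat={p_hat:.4f}  exact={p_exact:.4f}")
```
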
Definition 2 (Consistency). Let {Tn} be a sequence of estimators for the parameter ψ(θ). Then {Tn} is called a consistent sequence of estimators for ψ(θ) if Tn →p ψ(θ) as n → ∞.
Example 5. In the above Example 4, X(n) is consistent for θ.

Checking Consistency: The following inequalities are quite helpful in checking consistency of estimators.

(I) Markov's inequality: Let X be a non-negative random variable. Then,

P(X > t) ≤ E(X)/t, for any t > 0.
Applications of Markov’s inequality:
(1) Chebyshev's inequality: Let X be any random variable with expectation E(X) = µ and finite variance Var(X) = σ². Then,

P(|X − µ| > ϵ) = P(|X − µ|² > ϵ²) ≤ E(|X − µ|²)/ϵ² = σ²/ϵ²

by Markov's inequality. The above inequality is called Chebyshev's inequality.
(2) One can further sharpen the upper bound on P(|X − µ| > ϵ) (the tail probability) when higher-order moments are finite. Let X be a RV with expectation µ and finite r-th order absolute moment µr = E|X − µ|ʳ. Then,

P(|X − µ| > ϵ) ≤ µr/ϵʳ.
Note: Usually one obtains a sharper bound by taking a larger value of r. However, for a larger
value of r, µr may not be finite.
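
As a quick numerical illustration of the bounds above, one can compare the exact tail probability with the r = 2 (Chebyshev) and r = 4 bounds. A minimal sketch using only the standard library; the choice X ∼ N(0, 1), for which σ² = 1 and µ4 = E|X − µ|⁴ = 3, is illustrative:

```python
import math

# For X ~ N(0, 1): sigma^2 = 1 and mu_4 = E|X|^4 = 3.
for eps in (1.0, 2.0, 3.0):
    exact = math.erfc(eps / math.sqrt(2))   # P(|X| > eps) for the standard normal
    r2 = min(1.0 / eps**2, 1.0)             # Chebyshev bound sigma^2/eps^2 (capped at 1)
    r4 = min(3.0 / eps**4, 1.0)             # 4th-moment bound mu_4/eps^4 (capped at 1)
    print(f"eps={eps:.0f}  exact={exact:.5f}  r=2 bound={r2:.5f}  r=4 bound={r4:.5f}")
```

At ϵ = 2 and ϵ = 3 the fourth-moment bound is sharper than Chebyshev's, while at ϵ = 1 it is weaker (3 versus 1 before capping), illustrating that a larger r does not always help.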

(II) Application of the inequalities:

Sufficient conditions for consistency: A sequence of estimators {Tn} is consistent for ψ(θ), i.e., Tn →p ψ(θ), if

E(Tn) → ψ(θ) as n → ∞, and    (1)
Var(Tn) → 0 as n → ∞.    (2)

Note: If (1) and (2) hold, then Tn →p ψ(θ); but the converse is NOT true.

Proof. Observe that, for any ϵ > 0,

P(|Tn − ψ(θ)| > ϵ) = P({Tn − ψ(θ)}² > ϵ²) ≤ E[{Tn − ψ(θ)}²]/ϵ².

Now,

E[{Tn − ψ(θ)}²] = Var(Tn) + [E(Tn) − ψ(θ)]² → 0 by (1) and (2).
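
For a concrete check of conditions (1) and (2), take Tn = X(n) from U(0, θ) as in Example 4; standard calculations (X(n)/θ has a Beta(n, 1) distribution) give E(Tn) = nθ/(n + 1) → θ and Var(Tn) = nθ²/{(n + 1)²(n + 2)} → 0. A minimal simulation sketch comparing these with Monte Carlo estimates, assuming NumPy; θ = 2 is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 2.0, 10_000

for n in (10, 100, 1000):
    tn = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
    e_exact = n * theta / (n + 1)                       # E(Tn) -> theta   (condition (1))
    v_exact = n * theta**2 / ((n + 1) ** 2 * (n + 2))   # Var(Tn) -> 0     (condition (2))
    print(f"n={n:5d}  mean={tn.mean():.4f} (exact {e_exact:.4f})  "
          f"var={tn.var():.2e} (exact {v_exact:.2e})")
```
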

Example 6 (WLLN: IID case). Let X1, · · · , Xn be IID samples from some distribution with expectation µ and finite variance σ². Then E(X̄n) = µ and Var(X̄n) = σ²/n → 0 as n → ∞. So, X̄n →p µ as n → ∞.
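
The same WLLN can be watched numerically. A minimal sketch, assuming NumPy, with IID Exp(1) samples (so µ = σ² = 1, an illustrative choice): the simulated tail probability P(|X̄n − µ| > ϵ) decays to zero and stays below the Chebyshev bound σ²/(nϵ²):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, eps, reps = 1.0, 1.0, 0.1, 10_000  # Exp(1) has mu = 1 and sigma^2 = 1

for n in (10, 100, 1000):
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    tail = np.mean(np.abs(xbar - mu) > eps)    # estimate of P(|Xbar_n - mu| > eps)
    bound = min(sigma2 / (n * eps**2), 1.0)    # Chebyshev bound Var(Xbar_n)/eps^2
    print(f"n={n:5d}  tail_hat={tail:.4f}  chebyshev={bound:.4f}")
```
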

2.1 Weak Law of Large Numbers (WLLN)

As shown in the last example, the sample mean X̄n is a consistent estimator of the population mean (expectation). This result is called the law of large numbers (LLN). The WLLN can be extended to the case of independent (but not identically distributed) samples.
Result 1 (WLLN: Independent case). Let X1, · · · , Xn be independent samples such that E(Xi) = µi and Var(Xi) = σi²; i = 1, . . . , n. Further, suppose σi² ≤ M for some M > 0 and all i = 1, . . . , n. Then

X̄n − µ̄n →p 0 as n → ∞, where µ̄n = (µ1 + · · · + µn)/n.

Proof. Fix ϵ > 0. Then, by Chebyshev's inequality,

P(|X̄n − µ̄n| > ϵ) ≤ E[(X̄n − µ̄n)²]/ϵ²
                 = E[{n⁻¹ Σᵢ (Xi − µi)}²]/ϵ²
                 = [Σᵢ Var(Xi) + Σᵢ≠ⱼ Cov(Xi, Xj)]/(n²ϵ²)
                 = (Σᵢ σi²)/(n²ϵ²)        (the covariances vanish by independence)
                 ≤ nM/(n²ϵ²) → 0 as n → ∞,

where the sums run over i = 1, . . . , n.

Remark 3. Observe that, in the above proof, it is enough to have

(σ1² + · · · + σn²)/n² → 0 as n → ∞.
Example 7. Let Xi ∼ Bern(pi) for i = 1, . . . , n, where X1, · · · , Xn are mutually independent. Observe that

E(Xi) = pi, and Var(Xi) = pi(1 − pi) ≤ 1/4 for all i.

Thus, X̄n − p̄n →p 0 as n → ∞, where p̄n = (p1 + · · · + pn)/n.
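
A minimal simulation sketch of this independent, non-identically distributed case, assuming NumPy; the probabilities pi = i/(n + 1) are an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
reps = 10_000

for n in (10, 100, 1000):
    p = np.arange(1, n + 1) / (n + 1.0)       # heterogeneous success probabilities (illustrative)
    x = rng.binomial(1, p, size=(reps, n))    # each row is one sample X_1, ..., X_n
    diff = x.mean(axis=1) - p.mean()          # Xbar_n - pbar_n
    print(f"n={n:5d}  mean |Xbar_n - pbar_n| = {np.abs(diff).mean():.4f}")
```
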

3 Convergence in distribution
Another concept of convergence of RVs is convergence in distribution.

Definition 3 (Convergence in distribution). Let {Tn} be a sequence of RVs with CDFs {Fn} and T be another random variable with CDF F. Then Tn is said to converge in distribution to T, notationally Tn →d T, if

Fn(x) → F(x) as n → ∞

for any continuity point x of F(·).

Remark 4. A point x is a continuity point of a CDF F if

F(x−) = F(x) = F(x+).    (3)

We know that any CDF F is right continuous (i.e., F(x+) = F(x) for all x ∈ R) and non-decreasing. However, for some points x ∈ R, it might happen that

F(x−) < F(x) = F(x+).    (4)

Such points are called discontinuity points of F.

Note: Any CDF F can have at most countably many discontinuity points.
Remark 5. If the limiting RV T is degenerate with P(T = c) = 1, then Tn →d T can equivalently be written as Tn →d c.
Example 8. Let X1, · · · , Xn be iid uniform(0, θ) and Tn = X(n). Show that Tn →d θ.
Proof. Let T be such that P(T = θ) = 1. So, we'll show Tn →d T. The CDF of T, denoted by FT, is as follows:

FT(t) = { 0 if −∞ < t < θ
        { 1 if t ≥ θ.

Thus, FT is discontinuous at the point θ only. So, the set of continuity points of FT is

C = R \ {θ}.

Now, the CDF of Tn, say Fn, is given by

Fn(t) = { 0        if −∞ < t < 0
        { (t/θ)ⁿ   if 0 ≤ t ≤ θ
        { 1        if t > θ.

Consider any point x ∈ C. Then

Fn(x) = 0 = FT(x)            if −∞ < x < 0,
Fn(x) = (x/θ)ⁿ → 0 = FT(x)   if 0 ≤ x < θ,
Fn(x) = 1 = FT(x)            if x > θ.

So, Tn →d T.
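
Since Fn has the closed form (t/θ)ⁿ, the pointwise convergence in this proof can be tabulated directly, with no simulation. A small sketch in Python (θ = 1 is an illustrative choice); note that Fn(x) → 0 for every continuity point x < θ, however close to θ, while Fn(x) = 1 for x > θ:

```python
theta = 1.0

def F_n(t: float, n: int) -> float:
    """CDF of Tn = X_(n) for a sample of size n from U(0, theta)."""
    if t < 0:
        return 0.0
    if t <= theta:
        return (t / theta) ** n
    return 1.0

for x in (0.5, 0.9, 0.99, 1.5):   # continuity points of F_T (everything except theta)
    vals = "  ".join(f"n={n}: {F_n(x, n):.4f}" for n in (10, 100, 1000))
    print(f"x={x:<5}  {vals}")
```
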

3.1 Some important remarks
Remark 6. If Tn →p T then Tn →d T, but the converse is not true in general. However, if Tn →d c then Tn →p c.
Remark 7 (Continuous mapping theorem). If Tn →d T (respectively, Tn →p T) and g : R → R is a continuous function, then g(Tn) →d g(T) (respectively, g(Tn) →p g(T)).
Remark 8 (Slutsky's Lemma). Let Tn →d T and Wn →p c. Then

(i) Tn ± Wn →d T ± c,

(ii) TnWn →d cT, and

(iii) if c ≠ 0, then Tn/Wn →d T/c.

4 Central Limit Theorem (CLT)


The CLT concerns the large-sample behavior of the sample mean after proper scaling (magnification). Like the WLLN, there are various versions of the CLT. Here we provide only the result for IID samples.

Theorem 1 (CLT, IID case). Let X1, . . . , Xn be IID samples from some distribution with expectation µ and finite variance σ². Then,

√n (X̄n − µ)/σ = (Sn − nµ)/√(nσ²) →d Z,

where Z ∼ N(0, 1) and Sn = X1 + · · · + Xn.
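
The theorem can be observed numerically even for a markedly skewed population. A minimal sketch, assuming NumPy, with Exp(1) samples (µ = σ = 1, an illustrative choice), comparing P(Zn ≤ 1.96) with the standard normal value Φ(1.96) ≈ 0.975:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, reps = 1.0, 1.0, 20_000   # Exp(1): mean 1, standard deviation 1

for n in (5, 50, 500):
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma       # the standardized sample mean Zn
    print(f"n={n:4d}  P_hat(Zn <= 1.96) = {np.mean(z <= 1.96):.4f}")
```
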
Example 9. Let X1, . . . , Xn be iid N(0, σ²). Then Tn = n⁻¹(X1² + · · · + Xn²) is an estimator of σ². The Xi² are IID with E(Xi²) = σ² and Var(Xi²) = E(Xi⁴) − σ⁴ = 3σ⁴ − σ⁴ = 2σ⁴, so by the CLT,

√n (Tn − σ²)/√(2σ⁴) →d Z ∼ N(0, 1).

We say that Tn = n⁻¹(X1² + · · · + Xn²) is asymptotically normally distributed with asymptotic mean σ² and asymptotic variance 2σ⁴/n.
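
The asymptotic variance 2σ⁴/n can be verified by simulation: n·Var(Tn) should stabilize near 2σ⁴. A minimal sketch, assuming NumPy; σ = 1.5 is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(6)
sigma, reps = 1.5, 10_000

for n in (50, 500, 2000):
    x = rng.normal(0, sigma, size=(reps, n))
    tn = (x ** 2).mean(axis=1)                 # Tn = n^{-1} (X1^2 + ... + Xn^2)
    print(f"n={n:5d}  n*Var_hat(Tn) = {n * tn.var():.4f}   (2*sigma^4 = {2 * sigma**4:.4f})")
```
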

5 Reference
Chapter 6 of Probability and Statistics by M. H. DeGroot and M. J. Schervish.
