BSDS_slides-Week9
1 Sampling distributions
Recall the sampling distributions of some important statistics.
Example 1. Let X1, . . . , Xn be iid N(µ, σ²). Define T1n = X̄n (sample mean) and T2n = Sn² (sample variance). Then T1n ∼ N(µ, σ²/n) and nT2n/σ² ∼ χ²(n−1).
Example 2. Let X1, . . . , Xn be iid Gamma(α, β). Then T3n = X̄n ∼ n⁻¹ Gamma(nα, β).
[Fig. 1: Plots of the three statistics: T1n and T2n generated from Normal(1, 1), and T3n generated from Gamma(2, 4), for varying sample sizes n.]
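The following is a minimal simulation sketch that reproduces plots like Fig. 1, assuming numpy and matplotlib are available; the sample sizes, replication count, and the reading of Gamma(2, 4) as shape 2 and rate 4 are illustrative assumptions, not taken from the notes.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
reps = 10_000                       # Monte Carlo replications per sample size

fig, axes = plt.subplots(3, 3, figsize=(10, 8))
for col, n in enumerate([5, 30, 200]):
    x = rng.normal(1.0, 1.0, size=(reps, n))             # N(mu=1, sigma^2=1)
    g = rng.gamma(2.0, scale=1 / 4, size=(reps, n))      # Gamma(alpha=2, rate beta=4)
    axes[0, col].hist(x.mean(axis=1), bins=50, density=True)
    axes[0, col].set_title(f"T1n = sample mean, n={n}")
    axes[1, col].hist(x.var(axis=1), bins=50, density=True)  # divisor n, i.e., Sn^2
    axes[1, col].set_title(f"T2n = sample variance, n={n}")
    axes[2, col].hist(g.mean(axis=1), bins=50, density=True)
    axes[2, col].set_title(f"T3n = Gamma sample mean, n={n}")
plt.tight_layout()
plt.show()
```

As n grows, each histogram concentrates around the corresponding population quantity, which is the behaviour studied in the rest of this section.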
• As the above examples show, the distribution of a good statistic or estimator usually depends on n. For instance, the sample mean gives a whole sequence of statistics:
  – T11 = X1,
  – T12 = (X1 + X2)/2,
  – T13 = (X1 + X2 + X3)/3, and so on,
  – T1n = X̄n in general.
• Therefore, it is natural to ask how such a statistic behaves when we have a large number of representative samples.
2 Convergence in probability
Since the Tn are random, one cannot directly generalize the concept of convergence of a real sequence to convergence of random variables (RVs). [Recall that a real sequence {an} converges to a point a if for any ϵ > 0 there exists nϵ such that |an − a| < ϵ for all n ≥ nϵ.] The following definition makes precise one sense in which Tn → T, where {Tn} and T are RVs.
Definition 1 (Convergence in probability). Let {Tn} be a sequence of RVs and T be another RV such that all of them are defined on the same probability space (so that the RV Tn − T is well-defined). We say that Tn converges in probability to T, notationally Tn →p T, if for any ϵ > 0,
pn,ϵ := P(|Tn − T| > ϵ) → 0 as n → ∞.
Remark 1. Observe that {pn,ϵ} is a real sequence measuring the difference between Tn and T (at level ϵ), so its convergence to 0 is the ordinary convergence of a real sequence.
Remark 2. Sometimes the limiting RV T is of degenerate type, i.e., P(T = c) = 1 for some constant c. In such cases, Tn →p T can equivalently be expressed as Tn →p c.
Example 4. Let X1, · · · , Xn be iid U(0, θ), and let Tn = X(n), the sample maximum. Show that Tn →p θ.
Fix any ϵ > 0. Since Tn = X(n) has CDF Fn(t) = (t/θ)ⁿ for 0 ≤ t ≤ θ,

P(|Tn − θ| < ϵ) = P(θ − ϵ < Tn < θ + ϵ)
               = P(θ − ϵ < Tn ≤ θ)   if ϵ ≤ θ,   and   P(0 ≤ Tn ≤ θ)   if ϵ > θ
               = 1 − (1 − ϵ/θ)ⁿ      if ϵ ≤ θ,   and   1               if ϵ > θ
               → 1 as n → ∞,

since 0 ≤ 1 − ϵ/θ < 1 whenever 0 < ϵ ≤ θ. Thus, P(|Tn − θ| > ϵ) = pn,ϵ → 0 as n → ∞.
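A quick numerical check of this calculation, as a sketch assuming numpy is available (the choices θ = 1, ϵ = 0.05, and the sample sizes are arbitrary): the Monte Carlo tail probability should track the exact value pn,ϵ = (1 − ϵ/θ)ⁿ.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, eps, reps = 1.0, 0.05, 20_000

for n in [10, 50, 200, 1000]:
    # Tn = X_(n), the maximum of n iid U(0, theta) draws, per replication
    tn = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
    emp = np.mean(np.abs(tn - theta) > eps)   # Monte Carlo estimate of p_{n,eps}
    exact = (1.0 - eps / theta) ** n          # ((theta - eps)/theta)^n
    print(f"n={n:5d}  empirical={emp:.4f}  exact={exact:.4f}")
```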
Definition 2 (Consistency). Let {Tn} be a sequence of estimators for the parameter ψ(θ). Then {Tn} is called a consistent sequence of estimators for ψ(θ) if Tn →p ψ(θ) as n → ∞.
Example 5. In the above Example 4, X(n) is consistent for θ.
Checking consistency: The following inequalities are quite helpful in checking the consistency of estimators.
(1) Let X be an RV with expectation µ and finite variance σ². Then, for any ϵ > 0,
P(|X − µ| > ϵ) ≤ E|X − µ|²/ϵ² = σ²/ϵ²
by Markov's inequality. The above inequality is called Chebyshev's inequality.
(2) One can further sharpen the upper bound on P(|X − µ| > ϵ) (the tail probability) when higher-order moments are finite. Let X be an RV with expectation µ and finite r-th order absolute central moment µr = E|X − µ|ʳ. Then,
P(|X − µ| > ϵ) ≤ µr/ϵʳ.
Note: Usually one obtains a sharper bound by taking a larger value of r. However, for a larger value of r, µr may not be finite.
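As a concrete illustration (a sketch using only the standard library; the standard normal example is an assumption, not from the notes): for X ∼ N(0, 1) the central moments µ2 = 1 and µ4 = 3 are exact, so the r = 2 and r = 4 bounds can be compared with the true tail probability.

```python
from math import erf, sqrt

def normal_tail(eps):
    # P(|X| > eps) for X ~ N(0, 1), via the error function
    return 1.0 - erf(eps / sqrt(2.0))

for eps in [1.0, 2.0, 3.0]:
    chebyshev = 1.0 / eps**2   # r = 2 bound (Chebyshev)
    fourth = 3.0 / eps**4      # r = 4 bound, using mu_4 = 3
    print(f"eps={eps}: true={normal_tail(eps):.5f}  "
          f"r=2 bound={chebyshev:.5f}  r=4 bound={fourth:.5f}")
```

Note that at ϵ = 1 the r = 4 bound is actually worse than Chebyshev's; the higher-order bound wins only once ϵ is large enough, which is the "usually" in the note above.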
Theorem (A sufficient condition for consistency). Let {Tn} be a sequence of estimators of ψ(θ) such that (i) Var(Tn) → 0 and (ii) E(Tn) → ψ(θ) as n → ∞. Then Tn →p ψ(θ), i.e., {Tn} is consistent for ψ(θ).
Proof. Observe that, for any ϵ > 0,
P(|Tn − ψ(θ)| > ϵ) ≤ E{Tn − ψ(θ)}²/ϵ²
by Markov's inequality. Now,
E{Tn − ψ(θ)}² = Var(Tn) + [E(Tn) − ψ(θ)]² → 0 by (i) and (ii),
so P(|Tn − ψ(θ)| > ϵ) → 0 as n → ∞.
Example 6 (WLLN: IID case). Let X1, · · · , Xn be IID samples from some distribution with expectation µ and finite variance σ². Then E(X̄n) = µ and Var(X̄n) = σ²/n → 0 as n → ∞. So, X̄n →p µ as n → ∞.
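A short sketch of the WLLN in action, assuming numpy is available (the Exponential(1) distribution, with µ = σ² = 1, and the chosen ϵ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, eps, reps = 1.0, 0.1, 10_000   # Exponential(1): mu = 1, sigma^2 = 1

for n in [10, 100, 1000]:
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    tail = np.mean(np.abs(xbar - mu) > eps)
    bound = 1.0 / (n * eps**2)     # Chebyshev bound sigma^2 / (n eps^2)
    print(f"n={n:5d}  P(|Xbar - mu| > {eps}) ≈ {tail:.4f}  (Chebyshev bound {bound:.4f})")
```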
3 Convergence in distribution
Another concept of convergence of RVs is convergence in distribution.
Definition 3 (Convergence in distribution). Let {Tn} be a sequence of RVs with CDFs {Fn} and T be another random variable with CDF F. Then Tn is said to converge in distribution to T, notationally Tn →d T, if
Fn(x) → F(x) as n → ∞
for every continuity point x of F(·).
We know that any CDF F is right continuous (i.e., F(x⁺) = F(x) for all x ∈ R) and non-decreasing. At a continuity point x of F,
F(x⁻) = F(x) = F(x⁺).     (3)
However, for some points x ∈ R, it might happen that
F(x⁻) < F(x) = F(x⁺).     (4)
Note: Any CDF F has at most countably many discontinuity points.
Remark 5. If the limiting RV T is degenerate with P(T = c) = 1, then Tn →d T can equivalently be written as Tn →d c.
Example 8. Let X1, · · · , Xn be iid Uniform(0, θ) and Tn = X(n). Show that Tn →d θ.
Proof. Let T be such that P(T = θ) = 1; we will show Tn →d T. The CDF of T, denoted by FT, is as follows:
FT(t) = 0 if −∞ < t < θ, and FT(t) = 1 if t ⩾ θ.
Thus FT is discontinuous at the point θ only, so its set of continuity points is C = R \ {θ}. The CDF of Tn is Fn(t) = (t/θ)ⁿ for 0 ≤ t ≤ θ, with Fn(t) = 0 for t < 0 and Fn(t) = 1 for t > θ. For any t < θ we have Fn(t) → 0 = FT(t), and for any t > θ we have Fn(t) = 1 = FT(t). Hence Fn(t) → FT(t) at every t ∈ C, and so Tn →d T.
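The pointwise convergence of Fn to FT can be tabulated exactly; a minimal sketch (plain Python, with θ = 1 assumed for illustration):

```python
# Evaluate Fn(t) = (t/theta)^n on [0, theta), with Fn(t) = 1 for t >= theta
theta = 1.0
for t in [0.5, 0.9, 0.99, 1.0, 1.1]:
    vals = []
    for n in [10, 100, 1000]:
        fn = 1.0 if t >= theta else (max(t, 0.0) / theta) ** n
        vals.append(f"F_{n}({t:.2f})={fn:.4f}")
    print("  ".join(vals))
```

Every row with t < θ decays to 0 and every row with t > θ is identically 1, matching FT away from the single jump point θ.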
3.1 Some important remarks
Remark 6. If Tn →p T, then Tn →d T, but the converse is not true in general. However, if Tn →d c for a constant c, then Tn →p c.
Remark 7 (Continuous mapping theorem). If Tn →d/p T and g : R → R is a continuous function, then g(Tn) →d/p g(T) (the same mode of convergence in the hypothesis and the conclusion).
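A sketch of the continuous mapping theorem in the "in probability" mode, assuming numpy is available (the Exponential(1) samples and the map g(t) = t² are illustrative assumptions): since X̄n →p µ = 1 by the WLLN, continuity of g gives X̄n² →p 1.

```python
import numpy as np

rng = np.random.default_rng(3)
eps, reps = 0.05, 5_000

for n in [100, 1000, 5000]:
    xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)  # Xbar_n ->p 1
    tail = np.mean(np.abs(xbar**2 - 1.0) > eps)               # g(Xbar_n) = Xbar_n^2
    print(f"n={n:5d}  P(|Xbar^2 - 1| > {eps}) ≈ {tail:.4f}")
```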
Remark 8 (Slutsky's Lemma). Let Tn →d T and Wn →p c, where c is a constant. Then
(i) Tn ± Wn →d T ± c,
(ii) TnWn →d cT, and
(iii) if c ̸= 0, then Tn/Wn →d T/c.
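A classical use of part (iii) is studentization: by the CLT, √n(X̄n − µ)/σ →d N(0, 1), and the sample standard deviation s satisfies s/σ →p 1, so √n(X̄n − µ)/s →d N(0, 1) as well. A simulation sketch, assuming numpy is available (the Exponential(1) population and sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, n, reps = 1.0, 500, 20_000        # Exponential(1): mu = 1, sigma = 1

x = rng.exponential(1.0, size=(reps, n))
s = x.std(axis=1, ddof=1)             # consistent for sigma, so s/sigma ->p 1
t = np.sqrt(n) * (x.mean(axis=1) - mu) / s
print(f"mean={t.mean():+.3f} (target 0)  var={t.var():.3f} (target 1)  "
      f"P(T > 1.645)={np.mean(t > 1.645):.3f} (target 0.05)")
```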
Theorem 1 (CLT, IID case). Let X1, . . . , Xn be IID samples from some distribution with expectation µ and finite variance σ². Then,
√n(X̄n − µ)/σ = (Sn − nµ)/√(nσ²) →d Z,
where Z ∼ N(0, 1) and Sn = X1 + · · · + Xn.
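The theorem requires nothing beyond a finite variance, so even very non-normal samples standardize to N(0, 1). A sketch assuming numpy is available (the Bernoulli(0.1) population, with µ = p and σ² = p(1 − p), is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 0.1
mu, sigma = p, np.sqrt(p * (1 - p))

for n in [10, 100, 1000]:
    x = rng.binomial(1, p, size=(10_000, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma   # standardized sample mean
    print(f"n={n:5d}  mean={z.mean():+.3f}  var={z.var():.3f}  "
          f"P(Z > 1.645)={np.mean(z > 1.645):.3f}   # N(0,1) targets: 0, 1, 0.05")
```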
Example 9. Let X1, . . . , Xn be iid N(0, σ²). Then n⁻¹ ∑ Xi² (sum over i = 1, . . . , n) is an estimator of σ². Here E(Xi²) = σ² and Var(Xi²) = E(Xi⁴) − (E Xi²)² = 3σ⁴ − σ⁴ = 2σ⁴, so by the CLT,
√n (n⁻¹ ∑ Xi² − σ²) / √(2σ⁴) →d Z ∼ N(0, 1).
We say that n⁻¹ ∑ Xi² is asymptotically normally distributed with asymptotic mean σ² and asymptotic variance 2σ⁴/n.
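The asymptotic variance can be checked by simulation; a sketch assuming numpy is available (σ² = 2, n = 400, and the replication count are arbitrary choices): n · Var(n⁻¹ ∑ Xi²) should be close to 2σ⁴.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, n, reps = 2.0, 400, 20_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = (x**2).mean(axis=1)                 # the estimator n^{-1} sum X_i^2
print("empirical n * Var:", round(n * est.var(), 3),
      "  theoretical 2 * sigma^4:", 2 * sigma2**2)
```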
5 Reference
Chapter 6 of Probability and Statistics by M. H. DeGroot and M. J. Schervish.