MIT15 075JF11 chpt05
Note 3: The CLT is really useful because it characterizes large samples from almost any distribution. As long as you have a lot of independent, identically distributed samples (from any distribution with finite variance), the distribution of the sample mean is approximately normal.
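As a quick illustration of Note 3, here is a small Python sketch that is not part of the original notes. It draws samples from a strongly skewed exponential distribution (an arbitrary choice, with rate 1 so that $\mu = 1$ and $\sigma^2 = 1$) and checks that the sample means behave like $N(\mu, \sigma^2/n)$. The helper name `sample_means`, the seed, and the values $n = 50$ and 5000 trials are illustrative assumptions.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Hypothetical helper: return `trials` sample means, each from n i.i.d. draws."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

# Exponential with rate 1: strongly right-skewed, mean 1, variance 1.
n = 50
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, trials=5000)

# CLT: the sample mean is approximately N(mu, sigma^2 / n) = N(1, 1/50).
print(statistics.fmean(means))   # should be close to 1
print(statistics.stdev(means))   # should be close to sqrt(1/50), about 0.141

# About 95% of sample means should land within 1.96 * sigma / sqrt(n) of mu.
half_width = 1.96 * (1 / n) ** 0.5
coverage = sum(abs(m - 1.0) < half_width for m in means) / len(means)
print(coverage)                  # should be close to 0.95
```

Even though individual exponential draws are far from normal, the spread of the 5000 means should match $\sigma/\sqrt{n}$ closely.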
Sampling Distribution of the Sample Variance - Chi-Square Distribution
From the central limit theorem (CLT), we know that the distribution of the sample mean is
approximately normal. What about the sample variance?
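Technical Note: we lost a degree of freedom when we used the sample mean rather than the true mean. The deviations $x_i - \bar{x}$ always sum to zero, so once any $n - 1$ of them are fixed, the last one is determined. In other words, fixing $n - 1$ quantities completely determines $s^2$, since: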
$$s^2 := \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
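The degree-of-freedom argument can be checked numerically. This is a minimal sketch on a small hypothetical data set, chosen so that every value is exact in floating point. It confirms that the deviations sum to zero, that the last deviation is fixed by the other $n - 1$, and that the $n - 1$ divisor matches Python's `statistics.variance`. The function name `sample_variance` is illustrative.

```python
import statistics

def sample_variance(xs):
    """Hypothetical helper: s^2 with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

xs = [2.0, 4.0, 4.0, 5.0, 5.0]   # hypothetical data, mean 4.0
xbar = sum(xs) / len(xs)
deviations = [x - xbar for x in xs]

# The deviations always sum to zero ...
print(sum(deviations))            # 0.0
# ... so the last deviation is determined by the first n - 1.
last = -sum(deviations[:-1])
print(last == deviations[-1])     # True

print(sample_variance(xs))        # 1.5
print(statistics.variance(xs))    # 1.5 (the standard library also divides by n - 1)
```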
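Let’s simulate a $\chi^2_\nu$ distribution with $\nu = 3$ degrees of freedom, using the fact that the sum of squares of $\nu$ independent $N(0, 1)$ random variables has a $\chi^2_\nu$ distribution. Draw 3 samples from $N(0, 1)$, square them, and add them up. Repeat 1000 times.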
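| trial | $z_1$ | $z_2$ | $z_3$ | $\sum_{i=1}^{3} z_i^2$ |
|---|---|---|---|---|
| $t = 1$ | -0.3 | -1.1 | 0.2 | 1.34 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| $t = 1000$ | | | | |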
Then, histogram the values in the rightmost column.
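The simulation described in the table can be sketched in a few lines of Python. The seed, the bin width of 1, and the cap at 10 for the last bin are arbitrary choices made for illustration. Each `#` in the text histogram stands for about 10 of the 1000 trials. Expect a right-skewed shape with most of the mass between 0 and 5.

```python
import random
import statistics

rng = random.Random(1)   # arbitrary seed, for reproducibility
nu, trials = 3, 1000

# Each row of the table: draw z1, z2, z3 ~ N(0, 1) and record z1^2 + z2^2 + z3^2.
chi2 = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) for _ in range(trials)]

# Text histogram of the rightmost column: bins of width 1, last bin is 10 and above.
counts = [0] * 11
for v in chi2:
    counts[min(int(v), 10)] += 1
for k, c in enumerate(counts):
    label = f"{k:>2}-{k + 1:<2}" if k < 10 else "10+  "
    print(label, "#" * (c // 10))

print(statistics.fmean(chi2))   # should be near nu = 3
```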
For the chi-square distribution, it turns out that the mean and variance are:
$$E(\chi^2_\nu) = \nu, \qquad \mathrm{Var}(\chi^2_\nu) = 2\nu.$$
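Both formulas follow from writing $\chi^2_\nu = \sum_{i=1}^{\nu} Z_i^2$ with $Z_1, \ldots, Z_\nu$ independent $N(0, 1)$. The derivation below uses the standard normal moments $E(Z^2) = 1$ and $E(Z^4) = 3$.

$$
\begin{aligned}
E(Z_i^2) &= \mathrm{Var}(Z_i) + \big(E(Z_i)\big)^2 = 1 + 0 = 1,\\
\mathrm{Var}(Z_i^2) &= E(Z_i^4) - \big(E(Z_i^2)\big)^2 = 3 - 1 = 2,\\
E(\chi^2_\nu) &= \sum_{i=1}^{\nu} E(Z_i^2) = \nu,\\
\mathrm{Var}(\chi^2_\nu) &= \sum_{i=1}^{\nu} \mathrm{Var}(Z_i^2) = 2\nu \quad \text{(by independence)}.
\end{aligned}
$$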
Remember, this $\chi^2$ result is for samples from a normal distribution, and it involves the true variance $\sigma^2$. You need to know $\sigma$! Look below: you can’t get the distribution of $S^2$ unless you know $\sigma$.
$$X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2) \quad \Longrightarrow \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
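The following Python simulation is a sketch of this result under arbitrary illustrative values ($n = 5$, $\mu = 10$, $\sigma = 2$, a fixed seed). Note that computing the statistic requires the true `sigma`. If the result holds, the simulated values should have mean close to $n - 1 = 4$ and variance close to $2(n - 1) = 8$, matching the chi-square moments.

```python
import random
import statistics

rng = random.Random(2)   # arbitrary seed
n, mu, sigma, trials = 5, 10.0, 2.0, 10000   # illustrative values

stats = []
for _ in range(trials):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(xs)            # sample variance, n - 1 divisor
    stats.append((n - 1) * s2 / sigma**2)   # needs the true sigma

# Chi-square with n - 1 = 4 degrees of freedom: mean 4, variance 8.
print(statistics.fmean(stats))
print(statistics.variance(stats))
```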
Student’s t-Distribution
Let $X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2)$. William Sealy Gosset, aka “Student” (1876-1937), was looking at the distribution of:
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$
Contrast $T$ with $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, which we know is $N(0, 1)$.
So why was Student looking at this?
Because he had a small sample, he didn’t know the variance of the distribution and couldn’t
estimate it well, and he wanted to determine how far x̄ was from µ. We are in the case of:
• normal r.v.’s, $N(\mu, \sigma^2)$
• comparing $\bar{X}$ to $\mu$
• unknown variance $\sigma^2$
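Dividing the numerator and denominator of $T$ by $\sigma/\sqrt{n}$ gives
$$T = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} = \frac{Z}{\sqrt{\chi^2_{n-1}/(n-1)}}.$$
The numerator $Z$ is $N(0, 1)$, and the denominator is the square root of a chi-square divided by its degrees of freedom, because remember $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. For normal samples, $\bar{X}$ and $S^2$ are also independent, so the numerator and denominator are independent.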
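A short numerical check of this decomposition, as a sketch with arbitrary values ($n = 6$, $\mu = 0$, $\sigma = 1.5$, fixed seed). It computes $T$ directly from the sample, then rebuilds it as $Z / \sqrt{V/(n-1)}$ with $V = (n-1)S^2/\sigma^2$. The two agree up to rounding because $\sigma$ cancels, which is exactly why $T$ is usable when $\sigma$ is unknown.

```python
import math
import random
import statistics

rng = random.Random(3)   # arbitrary seed
n, mu, sigma = 6, 0.0, 1.5   # illustrative values
xs = [rng.gauss(mu, sigma) for _ in range(n)]
xbar, s = statistics.fmean(xs), statistics.stdev(xs)

# T computed directly: uses only the sample standard deviation S.
t_direct = (xbar - mu) / (s / math.sqrt(n))

# The same T rebuilt from its two pieces, each of which needs the true sigma.
z = (xbar - mu) / (sigma / math.sqrt(n))   # ~ N(0, 1)
v = (n - 1) * s**2 / sigma**2              # ~ chi-square with n - 1 d.o.f.
t_from_parts = z / math.sqrt(v / (n - 1))

print(t_direct, t_from_parts)  # equal up to floating-point rounding
```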
Note that when $n$ is large, $S^2/\sigma^2 \to 1$, so the t-distribution $\to N(0, 1)$.
Student showed that the pdf of T is:
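$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad -\infty < t < \infty,$$
where $\nu = n - 1$ is the number of degrees of freedom.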
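Here is a Python sketch that evaluates Student's density formula directly. The function names `t_pdf` and `normal_pdf` are illustrative. `math.lgamma` is used to keep the gamma ratio from overflowing for large $\nu$. Comparing against the $N(0, 1)$ density shows the heavier tails for small $\nu$ and the convergence to $N(0, 1)$ as $\nu$ grows. For $\nu = 1$ the formula reduces to $f(0) = 1/\pi$.

```python
import math

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom, from the formula above."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(t):
    """Standard normal density, for comparison."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

# Density at the center and in the tail (t = 3) for a few degrees of freedom.
for nu in (1, 4, 30):
    print(nu, round(t_pdf(0.0, nu), 4), round(t_pdf(3.0, nu), 4))
print("N(0,1)", round(normal_pdf(0.0), 4), round(normal_pdf(3.0), 4))
```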
Snedecor’s F-distribution
The F-distribution is usually used for comparing variances from two separate sources.
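Consider 2 independent random samples $X_1, X_2, \ldots, X_{n_1} \sim N(\mu_1, \sigma_1^2)$ and $Y_1, Y_2, \ldots, Y_{n_2} \sim N(\mu_2, \sigma_2^2)$. Define $S_1^2$ and $S_2^2$ as the sample variances. Recall:
$$\frac{S_1^2(n_1 - 1)}{\sigma_1^2} \sim \chi^2_{n_1-1} \quad \text{and} \quad \frac{S_2^2(n_2 - 1)}{\sigma_2^2} \sim \chi^2_{n_2-1}.$$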
The F-distribution considers the ratio:
$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim \frac{\chi^2_{n_1-1}/(n_1 - 1)}{\chi^2_{n_2-1}/(n_2 - 1)}.$$
When $\sigma_1^2 = \sigma_2^2$, the left-hand side reduces to $S_1^2/S_2^2$.
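We want to know the distribution of this! Speaking more generally, let $U \sim \chi^2_{\nu_1}$ and $V \sim \chi^2_{\nu_2}$ be independent. Then
$$W = \frac{U/\nu_1}{V/\nu_2}$$
has an F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom, $W \sim F_{\nu_1, \nu_2}$.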
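As a sketch of the two-sample setup, the Python simulation below uses arbitrary sample sizes ($n_1 = 6$, $n_2 = 11$), deliberately unequal true standard deviations, and a fixed seed. It computes $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ many times. The check relies on a standard fact not stated in the notes: for $\nu_2 > 2$, $E(F_{\nu_1, \nu_2}) = \nu_2/(\nu_2 - 2)$, which is $10/8 = 1.25$ here. Once each variance is scaled by its own $\sigma^2$, the unequal variances should not affect the result.

```python
import random
import statistics

rng = random.Random(4)   # arbitrary seed
n1, n2, trials = 6, 11, 10000   # illustrative sample sizes
sigma1, sigma2 = 3.0, 0.5       # unequal true standard deviations

ratios = []
for _ in range(trials):
    xs = [rng.gauss(1.0, sigma1) for _ in range(n1)]
    ys = [rng.gauss(-2.0, sigma2) for _ in range(n2)]
    # (S1^2 / sigma1^2) / (S2^2 / sigma2^2) ~ F with (n1 - 1, n2 - 1) d.o.f.
    ratios.append((statistics.variance(xs) / sigma1**2) /
                  (statistics.variance(ys) / sigma2**2))

# For nu2 > 2, the mean of F_{nu1, nu2} is nu2 / (nu2 - 2).
nu2 = n2 - 1
print(statistics.fmean(ratios), nu2 / (nu2 - 2))
```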
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.