Central Limit Theorem
The normalized random variable
$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$
has mean $EZ_n = 0$ and variance $\mathrm{Var}(Z_n) = 1$. The central limit theorem states that the CDF of $Z_n$ converges to the standard normal CDF.
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with expected value $EX_i = \mu < \infty$ and variance $0 < \mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, the random variable
$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$
converges in distribution to the standard normal random variable as $n$ goes to infinity, that is,
$$\lim_{n \to \infty} P(Z_n \le x) = \Phi(x), \qquad \text{for all } x \in \mathbb{R},$$
where $\Phi(x)$ is the standard normal CDF.
An interesting thing about the CLT is that it does not matter what the distribution of the $X_i$'s is. The $X_i$'s can be discrete, continuous, or mixed random variables.
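To see this numerically, here is a minimal simulation sketch (the exponential distribution, $n$, and the number of trials are arbitrary illustrative choices): even for a heavily skewed $X_i$, the empirical CDF of $Z_n$ lines up with the standard normal CDF.

```python
import numpy as np
from scipy.stats import norm

# Even for a heavily skewed X_i (exponential with mean 1, an arbitrary
# choice), the CDF of Z_n lines up with the standard normal CDF Phi.
rng = np.random.default_rng(0)
n, trials = 100, 100_000
X = rng.exponential(scale=1.0, size=(trials, n))   # E X_i = 1, SD(X_i) = 1
Z = (X.sum(axis=1) - n * 1.0) / (np.sqrt(n) * 1.0)
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  P(Zn<=x)~{np.mean(Z <= x):.4f}  Phi(x)={norm.cdf(x):.4f}")
```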
To get a feeling for the CLT, let us look at some examples. Let's assume that the $X_i$'s are $Bernoulli(p)$. Then $EX_i = p$ and $\mathrm{Var}(X_i) = p(1-p)$. Also, $Y_n = X_1 + X_2 + \cdots + X_n$ has a $Binomial(n, p)$ distribution. Thus,
$$Z_n = \frac{Y_n - np}{\sqrt{np(1-p)}},$$
where $Y_n \sim Binomial(n, p)$. Figure 7.1 shows the PMF of $Z_n$ for different values of $n$. As you see, the shape of the PMF gets closer to a normal PDF curve as $n$ increases. Here, $Z_n$ is a discrete random variable, so mathematically speaking it has a PMF, not a PDF. That is why the CLT states that the CDF (not the PDF) of $Z_n$ converges to the standard normal CDF. Nevertheless, since the PMF and PDF are conceptually similar, the figure is useful in visualizing the convergence to the normal distribution.
Fig. 7.1 - $Z_n$ is the normalized sum of $n$ independent $Bernoulli(p)$ random variables. The shape of its PMF, $P_{Z_n}(z)$, resembles the normal curve as $n$ increases.
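For readers who want to reproduce the flavor of Figure 7.1 numerically, the following sketch (the choices $n = 100$, $p = 0.5$ are arbitrary) compares the PMF of $Z_n$, rescaled by the spacing of its support points, with the standard normal PDF:

```python
import numpy as np
from scipy.stats import binom, norm

# The PMF of Z_n, divided by the spacing of its support points, tracks the
# standard normal PDF (n = 100, p = 0.5 chosen arbitrarily).
n, p = 100, 0.5
spacing = 1 / np.sqrt(n * p * (1 - p))   # gap between adjacent values of Z_n
for k in range(45, 56):                  # a few points near the mean np = 50
    z = (k - n * p) * spacing
    print(f"z={z:+.2f}  pmf/spacing={binom.pmf(k, n, p) / spacing:.4f}"
          f"  phi(z)={norm.pdf(z):.4f}")
```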
As another example, let's assume that the $X_i$'s are $Uniform(0, 1)$. Then $EX_i = \frac{1}{2}$ and $\mathrm{Var}(X_i) = \frac{1}{12}$. In this case,
$$Z_n = \frac{X_1 + X_2 + \cdots + X_n - \frac{n}{2}}{\sqrt{n/12}}.$$
Figure 7.2 shows the PDF of $Z_n$ for different values of $n$. As you see, the shape of the PDF gets closer to the normal PDF as $n$ increases.
Fig. 7.2 - $Z_n$ is the normalized sum of $n$ independent $Uniform(0, 1)$ random variables. The shape of its PDF, $f_{Z_n}(z)$, gets closer to the normal curve as $n$ increases.
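A quick way to check the trend in Figure 7.2 is to compare sample quantiles of $Z_n$ with standard normal quantiles; the following sketch does this for a few arbitrary values of $n$:

```python
import numpy as np
from scipy.stats import norm

# Sample quantiles of Z_n for Uniform(0,1) summands approach standard
# normal quantiles as n grows (n values chosen arbitrarily).
rng = np.random.default_rng(2)
for n in (1, 2, 10, 50):
    U = rng.uniform(0.0, 1.0, size=(100_000, n))
    Z = (U.sum(axis=1) - n / 2) / np.sqrt(n / 12)
    print(n, np.round(np.quantile(Z, [0.1, 0.5, 0.9]), 3),
          "vs", np.round(norm.ppf([0.1, 0.5, 0.9]), 3))
```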
We could have directly looked at $Y_n = X_1 + X_2 + \cdots + X_n$, so why do we normalize it first and say that the normalized version ($Z_n$) becomes approximately normal? This is because $EY_n = nEX_i$ and $\mathrm{Var}(Y_n) = n\sigma^2$ go to infinity as $n$ goes to infinity. We normalize $Y_n$ in order to obtain a random variable with finite mean and variance ($EZ_n = 0$, $\mathrm{Var}(Z_n) = 1$). Nevertheless, for any fixed $n$, the CDF of $Z_n$ is obtained by scaling and shifting the CDF of $Y_n$. Thus, the two CDFs have similar shapes.
The importance of the central limit theorem stems from the fact that, in many real applications, a certain random variable of interest is a sum of a large number of
independent random variables. In these situations, we are often able to use the CLT to justify using the normal distribution. Examples of such random variables are
found in almost every discipline. Here are a few:
- In communication and signal processing, Gaussian noise is the most frequently used model for noise.
- In finance, the percentage changes in the prices of some assets are sometimes modeled by normal random variables.
- When we do random sampling from a population to obtain statistical knowledge about the population, we often model the resulting quantity as a normal random variable.
The CLT is also very useful in the sense that it can simplify our computations significantly. If you have a problem in which you are interested in a sum of one thousand i.i.d. random variables, it might be extremely difficult, if not impossible, to find the distribution of the sum by direct calculation. Using the CLT, we can immediately write down an approximate distribution if we know the mean and variance of the $X_i$'s.
Another question that comes to mind is how large $n$ should be so that we can use the normal approximation. The answer generally depends on the distribution of the $X_i$'s. Nevertheless, as a rule of thumb, it is often stated that if $n$ is larger than or equal to $30$, the normal approximation is very good.
Here are the steps that we need in order to apply the CLT:
1. Write the random variable of interest, $Y$, as the sum of $n$ i.i.d. random variables $X_i$:
$$Y = X_1 + X_2 + \cdots + X_n.$$
2. Find $EY$ and $\mathrm{Var}(Y)$ by noting that
$$EY = n\mu, \qquad \mathrm{Var}(Y) = n\sigma^2,$$
where $\mu = EX_i$ and $\sigma^2 = \mathrm{Var}(X_i)$.
3. According to the CLT, conclude that $\frac{Y - n\mu}{\sqrt{n}\,\sigma}$ is approximately standard normal; thus, to find $P(y_1 \le Y \le y_2)$, we can write
$$P(y_1 \le Y \le y_2) = P\left(\frac{y_1 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{y_2 - n\mu}{\sqrt{n}\,\sigma}\right) \approx \Phi\left(\frac{y_2 - n\mu}{\sqrt{n}\,\sigma}\right) - \Phi\left(\frac{y_1 - n\mu}{\sqrt{n}\,\sigma}\right).$$
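These three steps translate directly into a short function. The sketch below is illustrative (the name `clt_interval_prob` is ours, not from the text) and uses SciPy's `norm.cdf` for $\Phi$:

```python
from math import sqrt
from scipy.stats import norm

def clt_interval_prob(y1: float, y2: float, n: int, mu: float, sigma: float) -> float:
    """CLT approximation of P(y1 <= Y <= y2) for Y = X_1 + ... + X_n, where
    the X_i are i.i.d. with mean mu and standard deviation sigma.
    (Hypothetical helper name; a direct transcription of the steps above.)"""
    scale = sqrt(n) * sigma
    return norm.cdf((y2 - n * mu) / scale) - norm.cdf((y1 - n * mu) / scale)
```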
Let us look at some examples to see how we can use the central limit theorem.
Example 7.1
A bank teller serves customers standing in the queue one by one. Suppose that the service time $X_i$ for customer $i$ has mean $EX_i = 2$ (minutes) and $\mathrm{Var}(X_i) = 1$. We assume that service times for different bank customers are independent. Let $Y$ be the total time the bank teller spends serving $50$ customers. Find $P(90 < Y \le 110)$.
Solution
We have
$$Y = X_1 + X_2 + \cdots + X_n,$$
where $n = 50$, $EX_i = \mu = 2$, and $\mathrm{Var}(X_i) = \sigma^2 = 1$, so $n\mu = 100$ and $\sqrt{n}\,\sigma = \sqrt{50}$. Thus,
$$P(90 < Y \le 110) = P\left(\frac{90 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{110 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(-\sqrt{2} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \sqrt{2}\right).$$
By the CLT, $\frac{Y - n\mu}{\sqrt{n}\,\sigma}$ is approximately standard normal, so we can write
$$P(90 < Y \le 110) \approx \Phi(\sqrt{2}) - \Phi(-\sqrt{2}) = 0.8427.$$
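As a numeric sanity check (illustrative only): the problem specifies just $EX_i = 2$ and $\mathrm{Var}(X_i) = 1$, so for the Monte Carlo simulation we assume $Gamma(\text{shape}=4, \text{scale}=0.5)$ service times, one distribution that has exactly this mean and variance.

```python
import numpy as np
from math import sqrt
from scipy.stats import norm

# Illustrative check. The example fixes only E X_i = 2 and Var(X_i) = 1;
# for the Monte Carlo we *assume* Gamma(shape=4, scale=0.5) service times,
# which has exactly that mean and variance.
n, mu, sigma = 50, 2.0, 1.0
clt = (norm.cdf((110 - n * mu) / (sqrt(n) * sigma))
       - norm.cdf((90 - n * mu) / (sqrt(n) * sigma)))

rng = np.random.default_rng(1)
Y = rng.gamma(shape=4.0, scale=0.5, size=(200_000, n)).sum(axis=1)
mc = np.mean((Y > 90) & (Y <= 110))
print(f"CLT: {clt:.4f}   Monte Carlo: {mc:.4f}")   # the two should be close
```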
Example 7.2
In a communication system, each data packet consists of $1000$ bits. Due to the noise, each bit may be received in error with probability $0.1$. It is assumed that bit errors occur independently. Find the probability that there are more than $120$ errors in a certain data packet.
Solution
Let us define $X_i$ as the indicator random variable for the $i$th bit in the packet. That is, $X_i = 1$ if the $i$th bit is received in error, and $X_i = 0$ otherwise. Then the $X_i$'s are i.i.d. and $X_i \sim Bernoulli(p = 0.1)$. If $Y$ is the total number of bit errors in the packet, we have
$$Y = X_1 + X_2 + \cdots + X_n.$$
Since
$$EX_i = \mu = p = 0.1, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1-p) = 0.09,$$
with $n = 1000$ we have $n\mu = 100$ and $\sqrt{n}\,\sigma = \sqrt{90}$. Thus,
$$P(Y > 120) = P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - 100}{\sqrt{90}}\right) \approx 1 - \Phi\left(\frac{20}{\sqrt{90}}\right) = 0.0175.$$
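Since $Y$ here is exactly $Binomial(1000, 0.1)$, we can compare the CLT answer against the exact binomial tail; a minimal check (the two values should be close):

```python
from math import sqrt
from scipy.stats import binom, norm

# Exact binomial tail vs. the CLT approximation from Example 7.2.
n, p = 1000, 0.1
exact = binom.sf(120, n, p)                                    # P(Y > 120), exact
approx = 1 - norm.cdf((120 - n * p) / sqrt(n * p * (1 - p)))   # CLT, ~0.0175
print(f"exact={exact:.4f}  CLT={approx:.4f}")
```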
Continuity Correction:
Let us assume that $Y \sim Binomial(n = 20, p = \frac{1}{2})$, and suppose that we are interested in $P(8 \le Y \le 10)$. We know that a $Binomial(n = 20, p = \frac{1}{2})$ random variable can be written as the sum of $n$ i.i.d. $Bernoulli(p)$ random variables:
$$Y = X_1 + X_2 + \cdots + X_n.$$
Since $X_i \sim Bernoulli(p = \frac{1}{2})$, we have
$$EX_i = \mu = p = \frac{1}{2}, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1-p) = \frac{1}{4},$$
so $n\mu = 10$ and $\sqrt{n}\,\sigma = \sqrt{5}$. Thus,
$$P(8 \le Y \le 10) = P\left(\frac{8 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(\frac{8 - 10}{\sqrt{5}} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10 - 10}{\sqrt{5}}\right) \approx \Phi(0) - \Phi\left(\frac{-2}{\sqrt{5}}\right) = 0.3145.$$
Since, here, $n = 20$ is relatively small, we can actually find $P(8 \le Y \le 10)$ accurately. We have
$$P(8 \le Y \le 10) = \sum_{k=8}^{10} \binom{n}{k} p^k (1-p)^{n-k} = \left[\binom{20}{8} + \binom{20}{9} + \binom{20}{10}\right] \left(\frac{1}{2}\right)^{20} = 0.4565.$$
We notice that our approximation is not so good. Part of the error is due to the fact that $Y$ is a discrete random variable and we are using a continuous distribution to find $P(8 \le Y \le 10)$. Here is a trick to get a better approximation, called continuity correction. Since $Y$ can only take integer values, we can write
$$P(8 \le Y \le 10) = P(7.5 < Y < 10.5) = P\left(\frac{7.5 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(\frac{7.5 - 10}{\sqrt{5}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - 10}{\sqrt{5}}\right) \approx \Phi\left(\frac{0.5}{\sqrt{5}}\right) - \Phi\left(\frac{-2.5}{\sqrt{5}}\right) = 0.4567.$$
As we see, using continuity correction, our approximation improved significantly. The continuity correction is particularly useful when we would like to find $P(y_1 \le Y \le y_2)$, where $Y$ is binomial and $y_1$ and $y_2$ are close to each other.
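A compact numeric recap of this example, comparing the exact probability with the plain and continuity-corrected CLT approximations (a sketch; the values match those computed above):

```python
from math import sqrt
from scipy.stats import binom, norm

# Exact probability vs. plain CLT vs. continuity-corrected CLT for the
# Binomial(20, 1/2) example above.
n, p, l, u = 20, 0.5, 8, 10
mu, sigma = n * p, sqrt(n * p * (1 - p))          # mu = 10, sigma = sqrt(5)

exact = binom.cdf(u, n, p) - binom.cdf(l - 1, n, p)                 # 0.4565
plain = norm.cdf((u - mu) / sigma) - norm.cdf((l - mu) / sigma)     # 0.3145
corrected = (norm.cdf((u + 0.5 - mu) / sigma)
             - norm.cdf((l - 0.5 - mu) / sigma))                    # 0.4567
print(exact, plain, corrected)
```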
Continuity Correction for Discrete Random Variables
Let $Y = X_1 + X_2 + \cdots + X_n$. Suppose that we are interested in finding $P(A) = P(l \le Y \le u)$ using the CLT, where $l$ and $u$ are integers. Since $Y$ is an integer-valued random variable, we can write
$$P(A) = P\left(l - \frac{1}{2} \le Y \le u + \frac{1}{2}\right).$$
It turns out that the above expression sometimes provides a better approximation for $P(A)$ when applying the CLT. This is called the continuity correction, and it is particularly useful when the $X_i$'s are Bernoulli (i.e., $Y$ is binomial).
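As a sketch, the boxed formula can be wrapped in a small helper (the function name is ours, not from the text):

```python
from math import sqrt
from scipy.stats import norm

def clt_prob_integer(l: int, u: int, n: int, mu: float, sigma: float) -> float:
    """Continuity-corrected CLT approximation of P(l <= Y <= u) for an
    integer-valued Y = X_1 + ... + X_n with E X_i = mu, SD(X_i) = sigma.
    (Hypothetical helper; implements the boxed formula above.)"""
    s = sqrt(n) * sigma
    return norm.cdf((u + 0.5 - n * mu) / s) - norm.cdf((l - 0.5 - n * mu) / s)

# e.g. the Binomial(20, 1/2) example: clt_prob_integer(8, 10, 20, 0.5, 0.5) ~ 0.4567
```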