Central Limit Theorem
The normalized random variable
$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$
has mean $EZ_n = 0$ and variance $\mathrm{Var}(Z_n) = 1$. The central limit theorem states that the CDF of $Z_n$ converges to the standard normal CDF.
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables with expected value $EX_i = \mu < \infty$ and variance $0 < \mathrm{Var}(X_i) = \sigma^2 < \infty$. Then, the random variable
$$Z_n = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt{n}\,\sigma}$$
converges in distribution to the standard normal random variable as $n$ goes to infinity, that is,
$$\lim_{n \to \infty} P(Z_n \le x) = \Phi(x), \qquad \text{for all } x \in \mathbb{R},$$
where $\Phi(x)$ is the standard normal CDF.
An interesting thing about the CLT is that it does not matter what the distribution of the $X_i$'s is. The $X_i$'s can be discrete, continuous, or mixed random variables.
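To see this numerically, here is a minimal simulation sketch (the exponential distribution, $n$, and the number of trials are arbitrary illustrative choices): even for a heavily skewed $X_i$, the empirical CDF of $Z_n$ lines up with the standard normal CDF.

```python
import numpy as np
from scipy.stats import norm

# Even for a heavily skewed X_i (exponential with mean 1, an arbitrary
# choice), the CDF of Z_n lines up with the standard normal CDF Phi.
rng = np.random.default_rng(0)
n, trials = 100, 100_000
X = rng.exponential(scale=1.0, size=(trials, n))   # E X_i = 1, SD(X_i) = 1
Z = (X.sum(axis=1) - n * 1.0) / (np.sqrt(n) * 1.0)
for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  P(Zn<=x)~{np.mean(Z <= x):.4f}  Phi(x)={norm.cdf(x):.4f}")
```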
To get a feeling for the CLT, let us look at some examples. Let's assume that the $X_i$'s are $Bernoulli(p)$. Then $EX_i = p$ and $\mathrm{Var}(X_i) = p(1-p)$. Also, $Y_n = X_1 + X_2 + \cdots + X_n$ has a $Binomial(n, p)$ distribution. Thus,
$$Z_n = \frac{Y_n - np}{\sqrt{np(1-p)}},$$
where $Y_n \sim Binomial(n, p)$. Figure 7.1 shows the PMF of $Z_n$ for different values of $n$. As you see, the shape of the PMF gets closer to a normal PDF curve as $n$ increases. Here, $Z_n$ is a discrete random variable, so mathematically speaking it has a PMF, not a PDF. That is why the CLT states that the CDF (not the PDF) of $Z_n$ converges to the standard normal CDF. Nevertheless, since the PMF and PDF are conceptually similar, the figure is useful in visualizing the convergence to the normal distribution.
Fig. 7.1 - $Z_n$ is the normalized sum of $n$ independent $Bernoulli(p)$ random variables. The shape of its PMF, $P_{Z_n}(z)$, resembles the normal curve as $n$ increases.
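For readers who want to reproduce the flavor of Figure 7.1 numerically, the following sketch (the choices $n = 100$, $p = 0.5$ are arbitrary) compares the PMF of $Z_n$, rescaled by the spacing of its support points, with the standard normal PDF:

```python
import numpy as np
from scipy.stats import binom, norm

# The PMF of Z_n, divided by the spacing of its support points, tracks the
# standard normal PDF (n = 100, p = 0.5 chosen arbitrarily).
n, p = 100, 0.5
spacing = 1 / np.sqrt(n * p * (1 - p))   # gap between adjacent values of Z_n
for k in range(45, 56):                  # a few points near the mean np = 50
    z = (k - n * p) * spacing
    print(f"z={z:+.2f}  pmf/spacing={binom.pmf(k, n, p) / spacing:.4f}"
          f"  phi(z)={norm.pdf(z):.4f}")
```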
As another example, let's assume that the $X_i$'s are $Uniform(0, 1)$. Then $EX_i = \frac{1}{2}$ and $\mathrm{Var}(X_i) = \frac{1}{12}$. In this case,
$$Z_n = \frac{X_1 + X_2 + \cdots + X_n - \frac{n}{2}}{\sqrt{n/12}}.$$
Figure 7.2 shows the PDF of $Z_n$ for different values of $n$. As you see, the shape of the PDF gets closer to the normal PDF as $n$ increases.
Fig. 7.2 - $Z_n$ is the normalized sum of $n$ independent $Uniform(0, 1)$ random variables. The shape of its PDF, $f_{Z_n}(z)$, gets closer to the normal curve as $n$ increases.
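A quick way to check the trend in Figure 7.2 is to compare sample quantiles of $Z_n$ with standard normal quantiles; the following sketch does this for a few arbitrary values of $n$:

```python
import numpy as np
from scipy.stats import norm

# Sample quantiles of Z_n for Uniform(0,1) summands approach standard
# normal quantiles as n grows (n values chosen arbitrarily).
rng = np.random.default_rng(2)
for n in (1, 2, 10, 50):
    U = rng.uniform(0.0, 1.0, size=(100_000, n))
    Z = (U.sum(axis=1) - n / 2) / np.sqrt(n / 12)
    print(n, np.round(np.quantile(Z, [0.1, 0.5, 0.9]), 3),
          "vs", np.round(norm.ppf([0.1, 0.5, 0.9]), 3))
```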
We could have directly looked at $Y_n = X_1 + X_2 + \cdots + X_n$, so why do we normalize it first and say that the normalized version ($Z_n$) becomes approximately normal? This is because $EY_n = nEX_i$ and $\mathrm{Var}(Y_n) = n\sigma^2$ go to infinity as $n$ goes to infinity. We normalize $Y_n$ in order to obtain a random variable with finite mean and variance ($EZ_n = 0$, $\mathrm{Var}(Z_n) = 1$). Nevertheless, for any fixed $n$, the CDF of $Z_n$ is obtained by scaling and shifting the CDF of $Y_n$. Thus, the two CDFs have similar shapes.
The importance of the central limit theorem stems from the fact that, in many real applications, a certain random variable of interest is a sum of a large number of
independent random variables. In these situations, we are often able to use the CLT to justify using the normal distribution. Examples of such random variables are
found in almost every discipline. Here are a few:
- In communication and signal processing, Gaussian noise is the most frequently used model for noise.
- In finance, the percentage changes in the prices of some assets are sometimes modeled by normal random variables.
- When we do random sampling from a population to obtain statistical knowledge about the population, we often model the resulting quantity as a normal random variable.
The CLT is also very useful in the sense that it can simplify our computations significantly. If you have a problem in which you are interested in a sum of one thousand i.i.d. random variables, it might be extremely difficult, if not impossible, to find the distribution of the sum by direct calculation. Using the CLT, we can immediately write down an approximate distribution if we know the mean and variance of the $X_i$'s.
Another question that comes to mind is how large $n$ should be so that we can use the normal approximation. The answer generally depends on the distribution of the $X_i$'s. Nevertheless, as a rule of thumb, it is often stated that if $n$ is larger than or equal to $30$, the normal approximation is very good.
Here are the steps that we need in order to apply the CLT:
1. Write the random variable of interest, $Y$, as the sum of $n$ i.i.d. random variables $X_i$:
$$Y = X_1 + X_2 + \cdots + X_n.$$
2. Find $EY$ and $\mathrm{Var}(Y)$ by noting that
$$EY = n\mu, \qquad \mathrm{Var}(Y) = n\sigma^2,$$
where $\mu = EX_i$ and $\sigma^2 = \mathrm{Var}(X_i)$.
3. According to the CLT, conclude that $\frac{Y - n\mu}{\sqrt{n}\,\sigma}$ is approximately standard normal; thus, to find $P(y_1 \le Y \le y_2)$, we can write
$$P(y_1 \le Y \le y_2) = P\left(\frac{y_1 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{y_2 - n\mu}{\sqrt{n}\,\sigma}\right) \approx \Phi\left(\frac{y_2 - n\mu}{\sqrt{n}\,\sigma}\right) - \Phi\left(\frac{y_1 - n\mu}{\sqrt{n}\,\sigma}\right).$$
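These three steps translate directly into a short function. The sketch below is illustrative (the name `clt_interval_prob` is ours, not from the text) and uses SciPy's `norm.cdf` for $\Phi$:

```python
from math import sqrt
from scipy.stats import norm

def clt_interval_prob(y1: float, y2: float, n: int, mu: float, sigma: float) -> float:
    """CLT approximation of P(y1 <= Y <= y2) for Y = X_1 + ... + X_n, where
    the X_i are i.i.d. with mean mu and standard deviation sigma.
    (Hypothetical helper name; a direct transcription of the steps above.)"""
    scale = sqrt(n) * sigma
    return norm.cdf((y2 - n * mu) / scale) - norm.cdf((y1 - n * mu) / scale)
```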
Let us look at some examples to see how we can use the central limit theorem.
Example 7.1
A bank teller serves customers standing in the queue one by one. Suppose that the service time $X_i$ for customer $i$ has mean $EX_i = 2$ (minutes) and $\mathrm{Var}(X_i) = 1$. We assume that service times for different bank customers are independent. Let $Y$ be the total time the bank teller spends serving $50$ customers. Find $P(90 < Y \le 110)$.
Solution
We have
$$Y = X_1 + X_2 + \cdots + X_n,$$
where $n = 50$, $EX_i = \mu = 2$, and $\mathrm{Var}(X_i) = \sigma^2 = 1$, so $n\mu = 100$ and $\sqrt{n}\,\sigma = \sqrt{50}$. Thus,
$$P(90 < Y \le 110) = P\left(\frac{90 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{110 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(-\sqrt{2} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \sqrt{2}\right).$$
By the CLT, $\frac{Y - n\mu}{\sqrt{n}\,\sigma}$ is approximately standard normal, so we can write
$$P(90 < Y \le 110) \approx \Phi(\sqrt{2}) - \Phi(-\sqrt{2}) = 0.8427.$$
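As a numeric sanity check (illustrative only): the problem specifies just $EX_i = 2$ and $\mathrm{Var}(X_i) = 1$, so for the Monte Carlo simulation we assume $Gamma(\text{shape}=4, \text{scale}=0.5)$ service times, one distribution that has exactly this mean and variance.

```python
import numpy as np
from math import sqrt
from scipy.stats import norm

# Illustrative check. The example fixes only E X_i = 2 and Var(X_i) = 1;
# for the Monte Carlo we *assume* Gamma(shape=4, scale=0.5) service times,
# which has exactly that mean and variance.
n, mu, sigma = 50, 2.0, 1.0
clt = (norm.cdf((110 - n * mu) / (sqrt(n) * sigma))
       - norm.cdf((90 - n * mu) / (sqrt(n) * sigma)))

rng = np.random.default_rng(1)
Y = rng.gamma(shape=4.0, scale=0.5, size=(200_000, n)).sum(axis=1)
mc = np.mean((Y > 90) & (Y <= 110))
print(f"CLT: {clt:.4f}   Monte Carlo: {mc:.4f}")   # the two should be close
```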
Example 7.2
In a communication system, each data packet consists of $1000$ bits. Due to the noise, each bit may be received in error with probability $0.1$. It is assumed that bit errors occur independently. Find the probability that there are more than $120$ errors in a certain data packet.
Solution
Let us define $X_i$ as the indicator random variable for the $i$th bit in the packet. That is, $X_i = 1$ if the $i$th bit is received in error, and $X_i = 0$ otherwise. Then the $X_i$'s are i.i.d. and $X_i \sim Bernoulli(p = 0.1)$. If $Y$ is the total number of bit errors in the packet, we have
$$Y = X_1 + X_2 + \cdots + X_n.$$
Since
$$EX_i = \mu = p = 0.1, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1-p) = 0.09,$$
with $n = 1000$ we have $n\mu = 100$ and $\sqrt{n}\,\sigma = \sqrt{90}$. Thus,
$$P(Y > 120) = P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(\frac{Y - n\mu}{\sqrt{n}\,\sigma} > \frac{120 - 100}{\sqrt{90}}\right) \approx 1 - \Phi\left(\frac{20}{\sqrt{90}}\right) = 0.0175.$$
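Since $Y$ here is exactly $Binomial(1000, 0.1)$, we can compare the CLT answer against the exact binomial tail; a minimal check (the two values should be close):

```python
from math import sqrt
from scipy.stats import binom, norm

# Exact binomial tail vs. the CLT approximation from Example 7.2.
n, p = 1000, 0.1
exact = binom.sf(120, n, p)                                    # P(Y > 120), exact
approx = 1 - norm.cdf((120 - n * p) / sqrt(n * p * (1 - p)))   # CLT, ~0.0175
print(f"exact={exact:.4f}  CLT={approx:.4f}")
```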
Continuity Correction:
Let us assume that $Y \sim Binomial(n = 20, p = \frac{1}{2})$, and suppose that we are interested in $P(8 \le Y \le 10)$. We know that a $Binomial(n = 20, p = \frac{1}{2})$ random variable can be written as the sum of $n$ i.i.d. $Bernoulli(p)$ random variables:
$$Y = X_1 + X_2 + \cdots + X_n.$$
Since $X_i \sim Bernoulli(p = \frac{1}{2})$, we have
$$EX_i = \mu = p = \frac{1}{2}, \qquad \mathrm{Var}(X_i) = \sigma^2 = p(1-p) = \frac{1}{4},$$
so $n\mu = 10$ and $\sqrt{n}\,\sigma = \sqrt{5}$. Thus,
$$P(8 \le Y \le 10) = P\left(\frac{8 - n\mu}{\sqrt{n}\,\sigma} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(\frac{8 - 10}{\sqrt{5}} \le \frac{Y - n\mu}{\sqrt{n}\,\sigma} \le \frac{10 - 10}{\sqrt{5}}\right) \approx \Phi(0) - \Phi\left(\frac{-2}{\sqrt{5}}\right) = 0.3145.$$
Since, here, $n = 20$ is relatively small, we can actually find $P(8 \le Y \le 10)$ accurately. We have
$$P(8 \le Y \le 10) = \sum_{k=8}^{10} \binom{n}{k} p^k (1-p)^{n-k} = \left[\binom{20}{8} + \binom{20}{9} + \binom{20}{10}\right] \left(\frac{1}{2}\right)^{20} = 0.4565.$$
We notice that our approximation is not so good. Part of the error is due to the fact that $Y$ is a discrete random variable and we are using a continuous distribution to find $P(8 \le Y \le 10)$. Here is a trick to get a better approximation, called continuity correction. Since $Y$ can only take integer values, we can write
$$P(8 \le Y \le 10) = P(7.5 < Y < 10.5) = P\left(\frac{7.5 - n\mu}{\sqrt{n}\,\sigma} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - n\mu}{\sqrt{n}\,\sigma}\right) = P\left(\frac{7.5 - 10}{\sqrt{5}} < \frac{Y - n\mu}{\sqrt{n}\,\sigma} < \frac{10.5 - 10}{\sqrt{5}}\right) \approx \Phi\left(\frac{0.5}{\sqrt{5}}\right) - \Phi\left(\frac{-2.5}{\sqrt{5}}\right) = 0.4567.$$
As we see, using continuity correction, our approximation improved significantly. The continuity correction is particularly useful when we would like to find $P(y_1 \le Y \le y_2)$, where $Y$ is binomial and $y_1$ and $y_2$ are close to each other.
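A compact numeric recap of this example, comparing the exact probability with the plain and continuity-corrected CLT approximations (a sketch; the values match those computed above):

```python
from math import sqrt
from scipy.stats import binom, norm

# Exact probability vs. plain CLT vs. continuity-corrected CLT for the
# Binomial(20, 1/2) example above.
n, p, l, u = 20, 0.5, 8, 10
mu, sigma = n * p, sqrt(n * p * (1 - p))          # mu = 10, sigma = sqrt(5)

exact = binom.cdf(u, n, p) - binom.cdf(l - 1, n, p)                 # 0.4565
plain = norm.cdf((u - mu) / sigma) - norm.cdf((l - mu) / sigma)     # 0.3145
corrected = (norm.cdf((u + 0.5 - mu) / sigma)
             - norm.cdf((l - 0.5 - mu) / sigma))                    # 0.4567
print(exact, plain, corrected)
```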
Continuity Correction for Discrete Random Variables
Let $Y = X_1 + X_2 + \cdots + X_n$. Suppose that we are interested in finding $P(A) = P(l \le Y \le u)$ using the CLT, where $l$ and $u$ are integers. Since $Y$ is an integer-valued random variable, we can write
$$P(A) = P\left(l - \frac{1}{2} \le Y \le u + \frac{1}{2}\right).$$
It turns out that the above expression sometimes provides a better approximation for $P(A)$ when applying the CLT. This is called the continuity correction, and it is particularly useful when the $X_i$'s are Bernoulli (i.e., $Y$ is binomial).
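As a sketch, the boxed formula can be wrapped in a small helper (the function name is ours, not from the text):

```python
from math import sqrt
from scipy.stats import norm

def clt_prob_integer(l: int, u: int, n: int, mu: float, sigma: float) -> float:
    """Continuity-corrected CLT approximation of P(l <= Y <= u) for an
    integer-valued Y = X_1 + ... + X_n with E X_i = mu, SD(X_i) = sigma.
    (Hypothetical helper; implements the boxed formula above.)"""
    s = sqrt(n) * sigma
    return norm.cdf((u + 0.5 - n * mu) / s) - norm.cdf((l - 0.5 - n * mu) / s)

# e.g. the Binomial(20, 1/2) example: clt_prob_integer(8, 10, 20, 0.5, 0.5) ~ 0.4567
```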