Lecture 5 - AEP: Nguyễn Phương Thái
$x_1, x_2, \ldots, x_n$ is $\prod_{i=1}^{n} p(x_i)$. For example, the probability of the sequence (1, 0, 1, 1, 0, 1) is $p^4 q^2$. Clearly, it is
not true that all $2^n$ sequences of length $n$ have the same probability.
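The product rule above can be checked numerically. The following is a minimal sketch, assuming a binary source with P(1) = p and P(0) = q = 1 - p; the value p = 0.7 and the helper name `sequence_probability` are illustrative assumptions, not part of the lecture.

```python
from itertools import product

def sequence_probability(seq, p):
    """Probability of a binary i.i.d. sequence with P(1) = p and P(0) = 1 - p."""
    q = 1.0 - p
    ones = sum(seq)
    return p ** ones * q ** (len(seq) - ones)

p = 0.7  # assumed value, for illustration only

# The example sequence has four 1s and two 0s, so its probability is p^4 q^2.
prob = sequence_probability((1, 0, 1, 1, 0, 1), p)

# Enumerate all 2^6 sequences of length 6: they do not share one probability.
all_probs = [sequence_probability(s, p) for s in product((0, 1), repeat=6)]
distinct = {round(x, 12) for x in all_probs}
total = sum(all_probs)

print(prob, len(distinct), total)
```

The probability of a sequence depends only on how many 1s it contains, so the 64 sequences fall into a handful of probability classes rather than a single one.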
- However, we might be able to predict the probability of the sequence that we actually observe.
We ask for the probability $p(X_1, X_2, \ldots, X_n)$ of the outcomes $X_1, X_2, \ldots, X_n$, where $X_1, X_2, \ldots$ are i.i.d. ~ p(x). This is insidiously
self-referential, but well defined nonetheless. Apparently, we are asking for the probability of an event
drawn according to the same probability distribution. Here it turns out that $p(X_1, X_2, \ldots, X_n)$ is close to $2^{-nH}$ with
high probability.
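We summarize this by saying, “Almost all events are almost equally surprising.” This is a way of
saying that, if $X_1, X_2, \ldots$ are i.i.d. ~ p(x), then
$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X)$$
in probability.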
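Proof: Functions of independent random variables are also independent random variables. Thus, since
the $X_i$ are i.i.d., so are $\log p(X_i)$. Hence, by the weak law of large numbers,
$$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n}\sum_{i=1}^{n}\log p(X_i) \to -E[\log p(X)] = H(X)$$
in probability.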
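The convergence claimed by the proof can be observed empirically. The sketch below is an illustration under assumptions: the pmf {0: 0.3, 1: 0.7}, the seed, and the helper names `entropy` and `sample_entropy` are all chosen here for demonstration and do not come from the lecture. Logarithms are base 2, so entropy is in bits.

```python
import math
import random

def entropy(pmf):
    """Entropy H(X) in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def sample_entropy(pmf, n, rng):
    """Compute -(1/n) log2 p(X_1, ..., X_n) for one i.i.d. draw of length n."""
    symbols = list(pmf)
    weights = [pmf[s] for s in symbols]
    xs = rng.choices(symbols, weights=weights, k=n)
    # log p(X_1, ..., X_n) is a sum of i.i.d. terms log p(X_i).
    return -sum(math.log2(pmf[x]) for x in xs) / n

pmf = {0: 0.3, 1: 0.7}   # assumed source distribution
rng = random.Random(0)   # fixed seed so the run is repeatable
H = entropy(pmf)

for n in (10, 100, 10_000):
    print(n, round(sample_entropy(pmf, n, rng), 4), "vs H =", round(H, 4))
```

As n grows, the printed sample entropy should settle near H, which is the weak law of large numbers applied to the terms $-\log p(X_i)$.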
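Definition: The typical set $A_\epsilon^{(n)}$ with respect to p(x) is the set of sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with
the property:
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}$$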
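The probability of a string $x^n$ that contains r ‘1’s and n-r ‘0’s, where $p_1$ is the probability of a ‘1’, is:
$$P(x^n) = p_1^{r}(1-p_1)^{n-r}$$
The number of strings that contain r ‘1’s is $\binom{n}{r}$, so r has a binomial distribution:
$$P(r) = \binom{n}{r} p_1^{r}(1-p_1)^{n-r}$$
These functions are shown in the figure on the next page. The mean of r is $np_1$, and its standard
deviation is $\sqrt{np_1(1-p_1)}$. If n = 100, $r \approx 100p_1 \pm 10\sqrt{p_1(1-p_1)}$.
If n = 1000, $r \approx 1000p_1 \pm 31.6\sqrt{p_1(1-p_1)}$. Notice that as n gets bigger, the probability distribution of r becomes more
concentrated, in the sense that while the range of possible values of r grows as n, the standard
deviation of r grows only as $\sqrt{n}$. That r is most likely to fall in a small range of values implies that the
outcome x is also most likely to fall in a corresponding small subset of outcomes that we will call
the typical set.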
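The concentration argument can be made concrete with a few numbers. This is a small sketch, assuming p_1 = 0.1 purely for illustration (the lecture text here does not fix a value); `ones_count_stats` is a hypothetical helper name.

```python
import math

def ones_count_stats(n, p1):
    """Mean and standard deviation of r, the number of 1s among n i.i.d. bits."""
    mean = n * p1
    std = math.sqrt(n * p1 * (1 - p1))
    return mean, std

p1 = 0.1  # assumed probability of a '1', for illustration only

for n in (100, 1000, 10_000):
    mean, std = ones_count_stats(n, p1)
    # std grows like sqrt(n), so std / n (the relative spread) shrinks.
    print(f"n={n}: r is about {mean:.0f} +/- {std:.1f}, std/n = {std / n:.4f}")
```

The ratio std/n falls as n increases, which is the sense in which the distribution of r becomes more concentrated.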
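Properties of typical set
(1) If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then $H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon$.
(2) $\Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for n sufficiently large.
(3) $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
(4) $|A_\epsilon^{(n)}| \ge (1-\epsilon)\,2^{n(H(X)-\epsilon)}$ for n sufficiently large.
Thus, the typical set has probability nearly 1, all elements of the typical set are nearly equiprobable, and the
number of elements in the typical set is nearly $2^{nH}$.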
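These claims about the probability and size of the typical set can be checked for a binary source without enumerating all 2^n sequences, because a sequence's probability depends only on its number of 1s. The sketch below is an illustration under assumptions: a Bernoulli(p) source with p = 0.3, n = 1000 and ε = 0.05 are values chosen here, and `typical_set_stats` is a hypothetical helper.

```python
import math

def typical_set_stats(n, p, eps):
    """Return (H, size, prob) of the eps-typical set of a Bernoulli(p) source."""
    q = 1.0 - p
    H = -(p * math.log2(p) + q * math.log2(q))
    size, prob = 0, 0.0
    for r in range(n + 1):
        # log2 probability of any single sequence containing r ones
        log2_px = r * math.log2(p) + (n - r) * math.log2(q)
        if abs(-log2_px / n - H) <= eps:          # typicality condition
            count = math.comb(n, r)               # sequences with r ones
            size += count
            # work in the log domain to avoid floating-point underflow
            prob += 2.0 ** (math.log2(count) + log2_px)
    return H, size, prob

n, p, eps = 1000, 0.3, 0.05   # assumed values, for illustration only
H, size, prob = typical_set_stats(n, p, eps)

print("P(typical set)      =", round(prob, 4))
print("log2 |typical set|  =", round(math.log2(size), 1))
print("n(H - eps), n(H + eps) =", round(n * (H - eps), 1), round(n * (H + eps), 1))
```

For these parameters the typical set carries most of the probability while its log-size lies between n(H - ε) and n(H + ε), far below the n bits needed to index every sequence.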
Data compression
• Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed random variables drawn from the probability mass function p(x).
• We wish to find short descriptions for such sequences of random variables. We divide all sequences in $\mathcal{X}^n$ into two sets: the typical set $A_\epsilon^{(n)}$ and its complement, as shown in the figure.
- We order all elements in each set according to some order (e.g., lexicographic order).
- Then we can represent each sequence of the typical set $A_\epsilon^{(n)}$ by giving the index of the sequence in the set.
- Since there are ≤ $2^{n(H+\epsilon)}$ sequences in $A_\epsilon^{(n)}$, the indexing requires no more than $n(H+\epsilon)+1$ bits. [The extra bit may
be necessary because $n(H+\epsilon)$ may not be an integer.]
- We prefix all these sequences by a 0, giving a total length of ≤ $n(H+\epsilon)+2$ bits to represent each sequence
in $A_\epsilon^{(n)}$ (see the figure on the next page).
Similarly, we can index each sequence not in $A_\epsilon^{(n)}$ by using not more than $n\log|\mathcal{X}|+1$ bits. Prefixing these
indices by 1, we have a code for all the sequences in $\mathcal{X}^n$.
Note the following features of the above coding scheme:
• The code is one-to-one and easily decodable. The initial bit acts as a flag bit to indicate the length of
the codeword that follows.
• We have used a brute-force enumeration of the atypical set, the complement of $A_\epsilon^{(n)}$, without taking into account the fact
that the number of elements in this complement is less than the number of elements in $\mathcal{X}^n$. Surprisingly, this is good
enough to yield an efficient description.
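The two-part coding scheme described above can be sketched by brute force for a small block length. This is a minimal illustration under assumptions, not the lecture's own implementation: it uses a Bernoulli(p) source with n = 16, p = 0.1 and ε = 0.2 (values chosen here), fixed-width binary indices, and the hypothetical helper `build_typical_set_code`. It enumerates all 2^n sequences, so it is practical only for small n.

```python
import math
from itertools import product

def build_typical_set_code(n, p, eps):
    """Return (encode, decode, typical) for a Bernoulli(p) source, by brute force."""
    q = 1.0 - p
    H = -(p * math.log2(p) + q * math.log2(q))

    def is_typical(x):
        r = sum(x)
        sample_entropy = -(r * math.log2(p) + (n - r) * math.log2(q)) / n
        return abs(sample_entropy - H) <= eps

    typical, atypical = [], []
    for x in product((0, 1), repeat=n):        # generated in lexicographic order
        (typical if is_typical(x) else atypical).append(x)

    typ_bits = math.ceil(n * (H + eps))        # enough, since |A| <= 2^(n(H+eps))
    atyp_bits = n                              # n * log2|X| with |X| = 2
    typ_index = {x: i for i, x in enumerate(typical)}
    atyp_index = {x: i for i, x in enumerate(atypical)}

    def encode(x):
        x = tuple(x)
        if x in typ_index:                     # flag bit 0: typical sequence
            return "0" + format(typ_index[x], f"0{typ_bits}b")
        return "1" + format(atyp_index[x], f"0{atyp_bits}b")  # flag bit 1

    def decode(bits):
        index = int(bits[1:], 2)
        return typical[index] if bits[0] == "0" else atypical[index]

    return encode, decode, typical

encode, decode, typical = build_typical_set_code(n=16, p=0.1, eps=0.2)
x = (0, 0, 0, 1) + (0,) * 12                   # one '1' in 16 bits: a typical sequence
codeword = encode(x)
print(codeword, decode(codeword) == x)
```

The flag bit tells the decoder which index width follows, so the code stays one-to-one. At such a small n the saving is modest; the scheme becomes efficient only as n grows and the typical set captures nearly all of the probability.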