
Channel Capacity and the Channel Coding Theorem, Part I
Information Theory 2013
Lecture 4

Michael Roth

April 24, 2013


Outline

This lecture will cover


• Fano’s inequality.
• channel capacity and some channel models.
• a preview of the channel coding theorem.
• the tools that are needed to establish the channel coding theorem.

All illustrations are borrowed from the book.


Fano’s inequality
Estimate X from Y. Relate the error in guessing X to H(X|Y).
We know that H(X|Y) = 0 if X = g(Y) (Problem 2.5) → can estimate X with zero error probability. Extension: H(X|Y) “small” → can estimate X with low error probability.
Formally: X has p(x), Y is related via p(y|x), estimate X̂ = g(Y) with alphabet X̂, error probability Pe = Pr{X̂ ≠ X}.

Fano’s inequality: For X → Y → X̂,

H(Pe) + Pe log|X| ≥ H(X|X̂) ≥ H(X|Y).

Weaker: 1 + Pe log|X| ≥ H(X|Y), or

Pe ≥ (H(X|Y) − 1) / log|X|.
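As a numerical aside (not from the slides), the weaker bound can be evaluated directly; the function name and the example numbers below are illustrative assumptions.

import math

def fano_lower_bound(h_x_given_y: float, alphabet_size: int) -> float:
    """Weaker Fano bound: Pe >= (H(X|Y) - 1) / log2|X|, clipped at 0."""
    return max(0.0, (h_x_given_y - 1.0) / math.log2(alphabet_size))

# Assumed example: |X| = 16 and H(X|Y) = 2.5 bits.
print(fano_lower_bound(2.5, 16))   # 0.375: any estimator errs at least 37.5% of the time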
Motivation and preview
A communicates with B: A induces a state in B. Physical process gives rise to noise.
Mathematical analog: source W, transmitted sequence X^n, etc.

[Block diagram: Message W → Encoder → X^n → Channel p(y|x) → Y^n → Decoder → Ŵ (estimate of message)]

Two X^n may give the same Y^n: inputs confusable.

Idea: use only a subset of all possible X^n such that there is, with high probability, only one likely X^n to result in each Y^n.

Map W into “widely spaced” X^n. Then Ŵ = W with high probability.

Channel capacity: maximum rate (source bits/channel use) at which we can carry out the above steps.
Channel capacity

Discrete channel: input alphabet X, output alphabet Y, probability transition matrix p(y|x).
Memoryless channel: the current output depends only on the current input, conditionally independent of previous inputs or outputs.
“Information” channel capacity of a discrete memoryless channel is

C = max_{p(x)} I(X; Y).

Shannon’s channel coding theorem: C is the highest rate (bits per channel use) at which information can be sent with arbitrarily low probability of error.
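The slides give no algorithm for computing C; a standard numerical route is the Blahut–Arimoto iteration. Below is a minimal sketch of it (my own code, not from the lecture); all function and variable names are mine, and the example reuses the 3×3 symmetric transition matrix that appears later in this lecture.

import numpy as np

def _rows_rel_entropy_bits(P, q):
    """D(p(y|x) || q(y)) for each row x of P, in bits (0 log 0 := 0)."""
    d = np.zeros(P.shape[0])
    for x in range(P.shape[0]):
        nz = P[x] > 0
        d[x] = np.sum(P[x, nz] * np.log2(P[x, nz] / q[nz]))
    return d

def blahut_arimoto(P, n_iter=200):
    """Approximate C = max_{p(x)} I(X;Y) in bits for a transition matrix P[x, y] = p(y|x)."""
    m = P.shape[0]
    p = np.full(m, 1.0 / m)              # start from the uniform input distribution
    for _ in range(n_iter):
        q = p @ P                        # induced output distribution p(y)
        d = _rows_rel_entropy_bits(P, q)
        w = p * np.exp2(d)               # Blahut-Arimoto reweighting of the inputs
        p = w / w.sum()
    q = p @ P
    return float(p @ _rows_rel_entropy_bits(P, q)), p

# Example: the 3x3 symmetric channel used later in the lecture.
P = np.array([[0.3, 0.2, 0.5],
              [0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3]])
C, p_opt = blahut_arimoto(P)
print(C, p_opt)   # ~0.0995 bits (= log2(3) - H(0.3, 0.2, 0.5)), optimal p(x) uniform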
Some channels I
Noiseless binary channel
[Diagram: 0 → 0 and 1 → 1, no crossover]
• I(X; Y) = H(X) − H(X|Y) = H(X).
• C = 1, achieved for uniform X.

Noisy channel with nonoverlapping outputs
[Diagram: two inputs, four outputs; each input reaches its own pair of outputs, with probabilities 1/2, 1/2 and 1/3, 2/3]
• output random, but input uniquely determined.
• C = 1, achieved for uniform X.
Some channels II
[Figure: left panel “Noisy channel” (each input letter A, B, C, … maps to itself or to the next letter, w.p. 1/2 each); right panel “Noiseless subset of inputs”]

Noisy typewriter
• input either unchanged or shifted (both w.p. 1/2).
• use of every second input: log 13 bits per transmission without error.
• I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(1/2, 1/2) = H(Y) − 1.
• C = max I(X; Y) = log 26 − 1 = log 13 (verified numerically below).
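As a usage example of the blahut_arimoto sketch above (again my own addition), the 26×26 noisy-typewriter transition matrix confirms C = log 13 ≈ 3.70 bits numerically.

import numpy as np

# Noisy typewriter: each of 26 inputs stays or shifts to the next letter, w.p. 1/2 each.
P = np.zeros((26, 26))
for x in range(26):
    P[x, x] = 0.5
    P[x, (x + 1) % 26] = 0.5

C, _ = blahut_arimoto(P)        # function from the sketch above
print(C, np.log2(13))           # both ~3.7004 bits per transmission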
Some channels III
Binary symmetric channel
[Diagram: 0 → 0 and 1 → 1 w.p. 1 − p; crossovers 0 → 1 and 1 → 0 w.p. p]
• simplest channel with errors.
• probability of switched input is p.
• “all received bits unreliable”.
• C = 1 − H(p), achieved for uniform X.

I(X; Y) = H(Y) − H(Y|X)
        = H(Y) − Σ_x p(x) H(Y|X = x)
        = H(Y) − Σ_x p(x) H(p)
        = H(Y) − H(p)
        ≤ 1 − H(p).

Reminder: H(p) = −p log p − (1 − p) log(1 − p).
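A small sketch of my own for the closed-form result: binary entropy and C = 1 − H(p) for a few crossover probabilities.

import math

def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.5):
    print(p, 1 - binary_entropy(p))   # C = 1, ~0.531, and 0 bits respectively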


Some channels IV
Binary erasure channel
[Diagram: 0 → 0 and 1 → 1 w.p. 1 − α; 0 → e and 1 → e w.p. α]
• bits are lost rather than corrupted.
• a fraction α are erased.
• e: the receiver knows that it does not know.
• I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(α).
• C = 1 − α.
• feedback discussion and surprising fact.

Introduce E with E = 1 if Y = e. Let π = Pr{X = 1}. Then

H(Y) = H(Y, E) = H(E) + H(Y|E)
     = H(α) + (1 − α)H(π)

and I(X; Y) = (1 − α)H(π) yields C = 1 − α for π = 1/2.
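To see the maximization step numerically (my own sketch, with α = 0.3 as an assumed erasure probability), scan I(X; Y) = (1 − α)H(π) over the input bias π:

import numpy as np

alpha = 0.3                               # assumed erasure probability
pi = np.linspace(0.001, 0.999, 999)       # candidate input biases Pr{X = 1}
H = -pi * np.log2(pi) - (1 - pi) * np.log2(1 - pi)
I = (1 - alpha) * H                       # mutual information for each bias
print(pi[np.argmax(I)], I.max())          # maximum ~0.7 = 1 - alpha, attained at pi = 0.5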


Symmetric channels I

Transmission matrix. Example for X = Y = {0, 1, 2}:

           [ 0.3  0.2  0.5 ]
p(y|x) =   [ 0.5  0.3  0.2 ]
           [ 0.2  0.5  0.3 ]

Pr{Y = 1|X = 0} = 0.2. Rows must add up to 1.

This is a symmetric channel: row 1 is a permutation of row 2. Other rows and columns are permutations too.
Let r be one row in p(y|x). Then

I(X; Y) = H(Y) − H(Y|X) = H(Y) − H(r) ≤ log|Y| − H(r).


Symmetric channels II
I(X; Y) maximized for uniform Y. Achieved by uniform X:

p(y) = Σ_{x∈X} p(y|x) p(x) = (1/|X|) Σ_{x∈X} p(y|x) = c/|X|,

with c the sum over one column.

Generalization: each row is a permutation of every other row, and all column sums are equal. Example:

           [ 1/3  1/6  1/2 ]
p(y|x) =   [ 1/3  1/2  1/6 ]

Channel capacity for weakly symmetric channels is

C = log|Y| − H(r).
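A short numerical check (not in the slides) of C = log|Y| − H(r) for the weakly symmetric example above; the ≈ 0.126-bit value is my own computation.

import numpy as np

P = np.array([[1/3, 1/6, 1/2],
              [1/3, 1/2, 1/6]])     # weakly symmetric example from the slide

r = P[0]                            # any row; rows are permutations of each other
H_r = -np.sum(r * np.log2(r))
C = np.log2(P.shape[1]) - H_r       # log|Y| - H(r)
print(C)                            # ~0.126 bits per channel use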


Properties of channel capacity

Properties:
• C ≥ 0, since I(X; Y) ≥ 0.
• C ≤ log|X| and C ≤ log|Y|.
• I(X; Y) is a continuous function of p(x).
• I(X; Y) is concave in p(x).
Consequences:
• maximum exists and is finite.
• convex optimization tools can be employed.
Preview of the channel coding theorem

[Figure: typical X^n sequences mapped into nearly disjoint sets of typical Y^n sequences]
Intuitive idea:
• for large block lengths every channel looks like the noisy typewriter.
• one (typical) input sequence gives ≈ 2^{nH(Y|X)} output sequences.
• the total number of (typical) output sequences ≈ 2^{nH(Y)} must be divided into sets of size 2^{nH(Y|X)}.
• total number of disjoint sets ≤ 2^{n(H(Y)−H(Y|X))} = 2^{nI(X;Y)}.
• can send at most 2^{nI(X;Y)} distinguishable sequences of length n.
• channel capacity as the log of the maximum number of distinguishable sequences (a numeric illustration follows below).
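To make the counting concrete (my own numbers, assuming a binary symmetric channel with p = 0.1 and block length n = 100): about 2^{nI} ≈ 2^{53} inputs can be kept distinguishable.

import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 100, 0.1                     # assumed block length and crossover probability
I = 1 - binary_entropy(p)           # I(X;Y) for a uniform input on the BSC
print(n * I)                        # exponent ~53.1, i.e. roughly 2**53 distinguishable inputs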
Definitions I
[Block diagram: Message W → Encoder → X^n → Channel p(y|x) → Y^n → Decoder → Ŵ (estimate of message)]

• discrete channel: (X, p(y|x), Y).
• nth extension of the discrete memoryless channel: (X^n, p(y^n|x^n), Y^n) with p(y_k|x^k, y^{k−1}) = p(y_k|x_k).
• no feedback: p(y^n|x^n) = Π_{i=1}^{n} p(y_i|x_i) (default case in the book).
• (M, n) code for (X, p(y|x), Y):
  1. index set {1, 2, . . . , M}.
  2. encoding function X^n: {1, 2, . . . , M} → X^n with codewords x^n(1), . . . , x^n(M). All codewords form the codebook.
  3. decoding function g: Y^n → {1, 2, . . . , M}.
Definitions II

• conditional prob. of error: λ_i = Pr{g(Y^n) ≠ i | X^n = x^n(i)}.
• maximal prob. of error: λ^(n) = max_{i∈{1,...,M}} λ_i.
• average prob. of error for an (M, n) code: P_e^(n) = (1/M) Σ_{i=1}^{M} λ_i.
• rate of an (M, n) code: R = log(M)/n bits per transmission.
• rate R is achievable if there exists a sequence of (⌈2^{nR}⌉, n) codes such that λ^(n) → 0 as n → ∞.
• capacity is the supremum of all achievable rates.
(A toy (M, n) code illustrating these definitions is sketched below.)
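A minimal sketch, of my own making, of the objects just defined: the (M, n) = (2, 3) repetition code over a binary symmetric channel, its rate, and a Monte Carlo estimate of the average probability of error. All names and the value p = 0.1 are illustrative assumptions.

import math, random

codebook = {1: (0, 0, 0), 2: (1, 1, 1)}   # encoding function x^n(i) for M = 2, n = 3
M, n = len(codebook), 3
R = math.log2(M) / n                      # rate: 1/3 bit per transmission

def channel(xn, p=0.1):
    """BSC: flip each bit independently with probability p (assumed value)."""
    return tuple(x ^ (random.random() < p) for x in xn)

def decode(yn):
    """Majority vote = the decoding function g(y^n) for this codebook."""
    return 2 if sum(yn) >= 2 else 1

trials, errors = 100_000, 0
for _ in range(trials):
    w = random.randint(1, 2)              # uniformly chosen message index
    if decode(channel(codebook[w])) != w:
        errors += 1
print(R, errors / trials)                 # rate 1/3; error prob ~0.028 = 3p^2 - 2p^3 for p = 0.1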
Jointly typical sequences I

Idea: decode Y^n as index i if X^n(i) is jointly typical with Y^n.

The set A_ε^(n) of jointly typical sequences {(x^n, y^n)} w.r.t. p(x, y) is given by

A_ε^(n) = { (x^n, y^n) ∈ X^n × Y^n :
    |−(1/n) log p(x^n) − H(X)| < ε,
    |−(1/n) log p(y^n) − H(Y)| < ε,
    |−(1/n) log p(x^n, y^n) − H(X, Y)| < ε },

where p(x^n, y^n) = Π_{i=1}^{n} p(x_i, y_i).
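A sketch (not from the slides) of the three membership tests above for finite alphabets; the joint pmf p_xy and all names are assumed inputs, and the code assumes strictly positive probabilities so the logarithms stay finite.

import numpy as np

def in_jointly_typical_set(xn, yn, p_xy, eps):
    """Check the three conditions defining A_eps^(n) for integer-valued sequences."""
    n = len(xn)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)      # marginals
    H_X = -np.sum(p_x * np.log2(p_x))
    H_Y = -np.sum(p_y * np.log2(p_y))
    H_XY = -np.sum(p_xy * np.log2(p_xy))
    lp_x = np.sum(np.log2(p_x[list(xn)]))              # log p(x^n)
    lp_y = np.sum(np.log2(p_y[list(yn)]))              # log p(y^n)
    lp_xy = np.sum(np.log2(p_xy[list(xn), list(yn)]))  # log p(x^n, y^n)
    return (abs(-lp_x / n - H_X) < eps and
            abs(-lp_y / n - H_Y) < eps and
            abs(-lp_xy / n - H_XY) < eps)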
Jointly typical sequences II
Joint AEP: Let (X^n, Y^n) be sequences of length n drawn i.i.d. from p(x^n, y^n). Then:

1. Pr{(X^n, Y^n) ∈ A_ε^(n)} → 1 as n → ∞.
2. |A_ε^(n)| ≤ 2^{n(H(X,Y)+ε)}.
3. Pr{(X̃^n, Ỹ^n) ∈ A_ε^(n)} ≤ 2^{−n(I(X;Y)−3ε)} for (X̃^n, Ỹ^n) ∼ p(x^n)p(y^n).

[Figure: grid of x^n (rows) versus y^n (columns) with the jointly typical pairs marked as dots]
• 2^{nH(X)} typical X sequences.
• 2^{nH(Y)} typical Y sequences.
• only 2^{nH(X,Y)} jointly typical sequences.
• one in 2^{nI(X;Y)} pairs is jointly typical (illustrated numerically below).
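As a final illustrative sketch of my own (reusing in_jointly_typical_set from above with an assumed joint pmf), jointly drawn pairs land in A_ε^(n) with probability close to 1, while independently drawn pairs essentially never do:

import numpy as np

rng = np.random.default_rng(0)
p_xy = np.array([[0.45, 0.05],            # assumed joint pmf: uniform input through a BSC(0.1)
                 [0.05, 0.45]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
n, eps, trials = 200, 0.2, 2000

joint_hits = indep_hits = 0
for _ in range(trials):
    idx = rng.choice(4, size=n, p=p_xy.flatten())      # draw (X^n, Y^n) jointly
    xj, yj = idx // 2, idx % 2
    joint_hits += in_jointly_typical_set(xj, yj, p_xy, eps)
    xi = rng.choice(2, size=n, p=p_x)                  # draw X~^n and Y~^n independently
    yi = rng.choice(2, size=n, p=p_y)
    indep_hits += in_jointly_typical_set(xi, yi, p_xy, eps)
print(joint_hits / trials, indep_hits / trials)        # close to 1 vs. close to 0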
