Session 3
Property
For any X_1, \ldots, X_n we have:

H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1)
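As a sanity check, the chain rule can be verified numerically; the small joint pmf below is an arbitrary illustrative choice:

```python
import math

# An arbitrary illustrative joint pmf for (X1, X2): rows index X1, columns X2.
p = [[0.125, 0.375],
     [0.375, 0.125]]

def H(dist):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

# Left-hand side: joint entropy H(X1, X2).
H_joint = H([p[i][j] for i in range(2) for j in range(2)])

# Right-hand side: H(X1) + H(X2 | X1), the chain rule term by term.
p1 = [sum(row) for row in p]  # marginal of X1
H_cond = sum(p1[i] * H([p[i][j] / p1[i] for j in range(2)]) for i in range(2))
total = H(p1) + H_cond

print(abs(H_joint - total) < 1e-12)
```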
Xi = f (Xi−1 , Ui ).
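A process of this form is easy to simulate; here is a minimal sketch, where the update rule f and the state space {0, 1, 2, 3} are hypothetical illustrative choices and the U_i are i.i.d. fresh noise independent of the past:

```python
import random

random.seed(0)

def f(x, u):
    """Hypothetical update rule: next state from current state and fresh noise."""
    return (x + u) % 4

# U1, U2, ... are i.i.d. and independent of the past, so (Xi) is a Markov chain.
x = 0
path = [x]
for _ in range(10):
    u = random.randint(0, 1)
    x = f(x, u)
    path.append(x)

print(len(path))  # X0 through X10
```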
Property
For any X_1, \ldots, X_n we have:

I(X_1, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \ldots, X_1)
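This chain rule can also be verified numerically; the random joint pmf over (X1, X2, Y) below is an arbitrary test distribution:

```python
import math
import random

random.seed(0)
# Random joint pmf over (x1, x2, y), each variable binary.
w = [[[random.random() for _ in range(2)] for _ in range(2)] for _ in range(2)]
s = sum(w[a][b][c] for a in range(2) for b in range(2) for c in range(2))
p = [[[w[a][b][c] / s for c in range(2)] for b in range(2)] for a in range(2)]

def mi(joint):
    """I(U; V) in bits from a 2D joint pmf given as nested lists."""
    pu = [sum(row) for row in joint]
    pv = [sum(joint[u][v] for u in range(len(joint))) for v in range(len(joint[0]))]
    return sum(joint[u][v] * math.log2(joint[u][v] / (pu[u] * pv[v]))
               for u in range(len(joint)) for v in range(len(joint[0]))
               if joint[u][v] > 0)

# I(X1, X2; Y): treat the pair (x1, x2) as a single variable with 4 values.
lhs = mi([[p[a][b][c] for c in range(2)] for a in range(2) for b in range(2)])

# I(X1; Y)
i1 = mi([[sum(p[a][b][c] for b in range(2)) for c in range(2)] for a in range(2)])

# I(X2; Y | X1) = sum over a of P(X1 = a) * I(X2; Y) under the pmf given X1 = a
p1 = [sum(p[a][b][c] for b in range(2) for c in range(2)) for a in range(2)]
i2_given_1 = sum(p1[a] * mi([[p[a][b][c] / p1[a] for c in range(2)]
                             for b in range(2)])
                 for a in range(2))

print(abs(lhs - (i1 + i2_given_1)) < 1e-12)
```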
Property
Consider X, Y discrete random variables with joint distribution pX,Y and marginal distributions pX, pY respectively. We have:

I(X; Y) \ge 0, with equality iff X and Y are independent.

This follows from the log-sum inequality: for nonnegative numbers a_1, \ldots, a_n and b_1, \ldots, b_n,

\sum_{i=1}^{n} a_i \log_2 \frac{a_i}{b_i} \ge \Big( \sum_{i=1}^{n} a_i \Big) \log_2 \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i},

with equality iff a_i / b_i = c for all i.
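The log-sum inequality is easy to check numerically on arbitrary random nonnegative numbers:

```python
import math
import random

random.seed(5)
a = [random.random() for _ in range(5)]  # arbitrary nonnegative numbers
b = [random.random() for _ in range(5)]

# Log-sum inequality: sum a_i log2(a_i/b_i) >= (sum a_i) log2(sum a_i / sum b_i)
lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * math.log2(sum(a) / sum(b))

print(lhs >= rhs - 1e-12)
```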
Definition
X → Y → Z is a Markov chain iff X and Z are independent given Y.
Equivalently we have (X, Y, Z) ∼ pX,Y,Z(x, y, z) with

pX,Y,Z(x, y, z) = pX(x) \, pY|X(y|x) \, pZ|Y(z|y).
Property
If X → Y → Z then I(X; Y) ≥ I(X; Z).
Proof We expand I(X; Y, Z) with the chain rule in both orders:

I(X; Y, Z) = I(X; Z) + I(X; Y|Z) = I(X; Y) + I(X; Z|Y).

Since X → Y → Z we have I(X; Z|Y) = 0, so I(X; Y) = I(X; Z) + I(X; Y|Z) ≥ I(X; Z), since I(X; Y|Z) ≥ 0.
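The data processing inequality can be checked numerically by building a Markov chain from random channels; the alphabet sizes and input distribution below are arbitrary illustrative choices:

```python
import math
import random

random.seed(1)

def row_stochastic(n, m):
    """A random n x m transition matrix (rows sum to one)."""
    rows = []
    for _ in range(n):
        w = [random.random() for _ in range(m)]
        t = sum(w)
        rows.append([x / t for x in w])
    return rows

def mi(joint):
    """I(U; V) in bits from a 2D joint pmf given as nested lists."""
    pu = [sum(row) for row in joint]
    pv = [sum(joint[u][v] for u in range(len(joint))) for v in range(len(joint[0]))]
    return sum(joint[u][v] * math.log2(joint[u][v] / (pu[u] * pv[v]))
               for u in range(len(joint)) for v in range(len(joint[0]))
               if joint[u][v] > 0)

px = [0.3, 0.7]            # distribution of X
K1 = row_stochastic(2, 3)  # channel X -> Y
K2 = row_stochastic(3, 2)  # channel Y -> Z

# Joint pmfs under the Markov factorization p(x) p(y|x) p(z|y).
pxy = [[px[x] * K1[x][y] for y in range(3)] for x in range(2)]
pxz = [[sum(px[x] * K1[x][y] * K2[y][z] for y in range(3)) for z in range(2)]
       for x in range(2)]

print(mi(pxy) >= mi(pxz))  # data processing inequality
```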
[Figure: binary symmetric channel — input 0 → output 0 and input 1 → output 1 each with probability 1 − α; each input crosses over to the other output with probability α]
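The channel in the figure is the binary symmetric channel; a quick simulation (α = 0.1 is an illustrative value) confirms that the empirical flip rate matches the crossover probability:

```python
import random

random.seed(2)
alpha = 0.1  # crossover probability (illustrative value)

def bsc(bit, alpha):
    """Binary symmetric channel: flip the input bit with probability alpha."""
    return bit ^ (random.random() < alpha)

n = 100_000
xs = [random.randint(0, 1) for _ in range(n)]
ys = [bsc(x, alpha) for x in xs]
flips = sum(x != y for x, y in zip(xs, ys)) / n

print(abs(flips - alpha) < 0.01)  # empirical flip rate close to alpha
```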
Proposition
If X → Y → X̂ then:

I(X; Y) ≥ I(X; X̂), i.e. H(X) − H(X|Y) ≥ H(X) − H(X|X̂),

so that

H(X|Y) ≤ H(X|X̂).

Define E = 1{X̂ ≠ X}; using the chain rule in both directions:

H(E, X | X̂) = H(X|X̂) + H(E|X, X̂) = H(E|X̂) + H(X|E, X̂).
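The chain-rule identity above leads to Fano's inequality, H(X|X̂) ≤ H(E) + P(E = 1) log_2(|X| − 1), which holds for any joint distribution of (X, X̂); a numeric check on a random joint pmf (alphabet size 3 is an arbitrary choice):

```python
import math
import random

random.seed(3)
# Random joint pmf over (X, Xhat), both taking values in {0, 1, 2}.
w = [[random.random() for _ in range(3)] for _ in range(3)]
s = sum(map(sum, w))
p = [[w[i][j] / s for j in range(3)] for i in range(3)]

def h2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -q * math.log2(q) - (1 - q) * math.log2(1 - q)

# H(X | Xhat)
ph = [sum(p[i][j] for i in range(3)) for j in range(3)]  # marginal of Xhat
H_cond = -sum(p[i][j] * math.log2(p[i][j] / ph[j])
              for i in range(3) for j in range(3) if p[i][j] > 0)

pe = sum(p[i][j] for i in range(3) for j in range(3) if i != j)  # P(Xhat != X)
fano_bound = h2(pe) + pe * math.log2(3 - 1)

print(H_cond <= fano_bound + 1e-12)  # Fano's inequality
```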
We have:

\frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{1}{p_{X|Y}(X_i | Y_i)} \to H(X|Y) in probability as n → ∞.

\frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{p_{X,Y}(X_i, Y_i)}{p_X(X_i) \, p_Y(Y_i)} \to I(X; Y) in probability as n → ∞.

Information Theory, Richard Combes, CentraleSupelec, 2024
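Both limits can be observed by simulation; here is a sketch for the first one, using an arbitrary small joint pmf for (X, Y):

```python
import math
import random

random.seed(4)
# Arbitrary illustrative joint pmf of (X, Y), X indexing rows.
p = [[0.4, 0.1],
     [0.2, 0.3]]
py = [p[0][0] + p[1][0], p[0][1] + p[1][1]]  # marginal of Y

# Exact conditional entropy H(X | Y).
H_cond = -sum(p[x][y] * math.log2(p[x][y] / py[y])
              for x in range(2) for y in range(2))

# Sample n i.i.d. pairs and average log2 1/p(x|y).
n = 200_000
pairs = random.choices([(x, y) for x in range(2) for y in range(2)],
                       weights=[p[x][y] for x in range(2) for y in range(2)], k=n)
avg = sum(math.log2(py[y] / p[x][y]) for x, y in pairs) / n

print(abs(avg - H_cond) < 0.01)  # empirical average close to H(X|Y)
```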
The typical set
Proposition
Consider X_1, \ldots, X_n i.i.d. with common distribution pX.
Given ϵ > 0 define the typical set:

A_ϵ^n = \Big\{ (x_1, \ldots, x_n) \in \mathcal{X}^n : \Big| \frac{1}{n} \sum_{i=1}^{n} \log_2 \frac{1}{p(x_i)} - H(X) \Big| \le ϵ \Big\}.

Then:
(i) |A_ϵ^n| ≤ 2^{n(H(X)+ϵ)} for all n
(ii) |A_ϵ^n| ≥ (1 − ϵ) 2^{n(H(X)−ϵ)} for n large enough
(iii) P((X_1, \ldots, X_n) ∈ A_ϵ^n) ≥ 1 − ϵ for n large enough
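For a Bernoulli source the typical set can be enumerated exhaustively for small n; the parameters q = 0.3, n = 16, ϵ = 0.1 below are illustrative choices, and bound (i) is checked directly:

```python
import itertools
import math

q = 0.3    # Bernoulli parameter (illustrative)
n = 16
eps = 0.1
Hx = -q * math.log2(q) - (1 - q) * math.log2(1 - q)  # source entropy in bits

def logp(x):
    """log2 probability of a binary sequence under i.i.d. Bernoulli(q)."""
    k = sum(x)
    return k * math.log2(q) + (n - k) * math.log2(1 - q)

# Enumerate all 2^n sequences and keep the typical ones.
typical = [x for x in itertools.product((0, 1), repeat=n)
           if abs(-logp(x) / n - Hx) <= eps]
size = len(typical)
prob = sum(2 ** logp(x) for x in typical)  # probability mass of the typical set

print(size <= 2 ** (n * (Hx + eps)))  # bound (i), valid for all n
print(prob)
```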
▶ Which we bound as
1 − ϵ ≤ P((X1 , . . . , Xn ) ∈ Anϵ ) ≤ 1.
This is to be compared with the full set \mathcal{X}^n, of size 2^{n \log_2 |\mathcal{X}|}.