Information Theory Solutions Detail — Page 2. (The number in parentheses after each problem is the page of its solution.)

Chapter 8 problems

1. Differential entropy. Evaluate the differential entropy h(X) = −∫ f ln f for the following: (a) the exponential density, f(x) = λ e^(−λx), x ≥ 0. (p. 203)
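
As a quick sanity check on part (a): the differential entropy of the exponential density works out to 1 − ln λ nats, and a short Monte Carlo estimate reproduces it. The rate lam = 2.0 below is an arbitrary illustrative choice, not part of the problem.

    import numpy as np

    # Monte Carlo check that h(X) = -E[ln f(X)] = 1 - ln(lambda) nats for f(x) = lambda*exp(-lambda*x).
    rng = np.random.default_rng(0)
    lam = 2.0                                   # hypothetical rate, chosen only for illustration
    x = rng.exponential(scale=1.0 / lam, size=1_000_000)
    h_mc = -np.mean(np.log(lam) - lam * x)      # ln f(x) = ln(lam) - lam*x
    print(h_mc, 1 - np.log(lam))                # the two values should agree closely
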
2. Concavity of determinants. Let K1 and K2 be two symmetric nonnegative definite n × n matrices. Prove the result of Ky Fan [5]. (p. 204)
3. Uniformly distributed noise. Let the input random variable X to a channel be uniformly distributed over the interval −1/2 ≤ x ≤ +1/2. Let the output of the channel be Y = X + Z, where the noise random variable Z is uniformly distributed over the interval −a/2 ≤ z ≤ +a/2. (p. 204)
4. Quantized random variables. Roughly how many bits are required on the average to describe to 3-digit accuracy the decay time (in years) of a radium atom if the half-life of radium is 80 years? Note that half-life is the median of the distribution. (p. 206)
5. Scaling. Let h(X) = −∫ f(x) log f(x) dx. Show h(AX) = log |det(A)| + h(X). (p. 206)
6. Variational inequality. Verify, for positive random variables X, that log E_P(X) = sup_Q [E_Q(log X) − D(Q||P)]. (p. 206)
7. Differential entropy bound on discrete entropy. Let X be a discrete random variable on the set X = {a1, a2, . . .} with Pr(X = ai) = pi. Show that ... (p. 207)
8. Channel with uniformly distributed noise. Consider an additive channel whose input alphabet is X = {0, ±1, ±2} and whose output is Y = X + Z, where Z is uniformly distributed over the interval [−1, 1]. Thus the input of the channel is a discrete random variable, while the output is continuous. (p. 209)
9. Gaussian mutual information. Suppose that (X, Y, Z) are jointly Gaussian and that X → Y → Z forms a Markov chain. Let X and Y have correlation coefficient ρ1 and let Y and Z have correlation coefficient ρ2. Find I(X; Z). (p. 210)
10. The Shape of the Typical Set. Let Xi be i.i.d. ∼ f(x), where f(x) = c e^(−x^4). (p. 211)
Chapter 9 problems
A channel with two independent looks at Y. Let Y1 and Y2 be conditionally independent and conditionally identically distributed given X. (p. 217)
The two-look Gaussian channel. Consider the ordinary Gaussian channel with two correlated looks at X, i.e., Y = (Y1, Y2), where ... (p. 218)
Output power constraint. Consider an additive white Gaussian noise channel with an expected output power constraint P. Thus Y = X + Z, Z ∼ N(0, σ²), Z is independent of X, and EY² ≤ P. Find the channel capacity. (p. 219)
Exponential noise channels. Consider an additive noise channel Yi = Xi + Zi, where Zi is i.i.d. exponentially distributed noise with mean µ. Assume that we have a mean constraint on the signal, i.e., EXi ≤ λ. Show that the capacity of such a channel is C = log(1 + λ/µ). (p. 220)
Fading channel. Consider an additive noise fading channel, where Z is additive noise, V is a random variable representing fading, and Z and V are independent of each other and of X. Argue that knowledge of the fading factor V improves capacity by showing ... (p. 221)
Parallel channels and waterfilling. Consider a pair of parallel Gaussian channels with noise variances σ1² and σ2², and a total power constraint E(X1² + X2²) ≤ 2P. Assume that σ1² > σ2². At what power does the channel stop behaving like a single channel with noise variance σ2² and begin behaving like a pair of channels? (p. 221)
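
A minimal numerical sketch of the water-filling allocation behind this question. The variances (4 and 1) and the power values below are assumed purely for illustration; the noisier channel first receives power once 2P exceeds σ1² − σ2².

    import numpy as np

    def waterfill(noise_vars, total_power):
        """Water-filling: P_i = max(nu - N_i, 0) with sum(P_i) = total_power."""
        n = np.asarray(noise_vars, dtype=float)
        order = np.sort(n)
        for k in range(len(n), 0, -1):              # try putting water over the k quietest channels
            nu = (total_power + order[:k].sum()) / k
            if nu >= order[k - 1]:                  # all k chosen channels get nonnegative power
                break
        return nu, np.maximum(nu - n, 0.0)          # powers reported in the original channel order

    # Assumed example: sigma1^2 = 4, sigma2^2 = 1, total power 2P.
    for P in (1.0, 1.5, 2.0):
        nu, powers = waterfill([4.0, 1.0], 2 * P)
        print(P, powers)   # channel 1 starts receiving power only for P > (4 - 1)/2 = 1.5
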
Multipath Gaussian channel. Consider a Gaussian noise channel with power constraint P, where the signal takes two different paths and the received noisy signals are added together at the antenna. (p. 222)
Parallel Gaussian channels. Consider the following parallel Gaussian channel, where Z1 ∼ N(0, N1) and Z2 ∼ N(0, N2) are independent Gaussian random variables and Yi = Xi + Zi. We wish to allocate power to the two parallel channels. Let β1 and β2 be fixed. Consider a total cost constraint β1 P1 + β2 P2 ≤ β, where ... (p. 223)
Vector Gaussian channel. Consider the vector Gaussian noise channel Y = X + Z, ... (p. 224)
The capacity of photographic film. Here is a problem with a nice answer that takes a little time. We're interested in the capacity of photographic film. The film consists of silver iodide crystals, Poisson distributed, with a density of λ particles per square inch. The film is illuminated without knowledge of the position of the silver iodide particles. It is then developed and the receiver sees only the silver iodide particles that have been illuminated. It is assumed that light incident on a cell exposes the grain if it is there and otherwise results in a blank response. (p. 225)
Gaussian mutual information. Suppose that (X, Y, Z) are jointly Gaussian and that X → Y → Z forms a Markov chain. Let X and Y have correlation coefficient ρ1 and let Y and Z have correlation coefficient ρ2. Find I(X; Z). (p. 226)
Time-varying channel. A train pulls out of the station at constant velocity. The received signal energy thus falls off with time as 1/i². The total received signal at time i is ... (p. 227)
Feedback capacity for n = 2. Let (Z1, Z2) ∼ N(0, K), with K = [[1, ρ], [ρ, 1]]. Find the maximum of (1/2) log(|K_{X+Z}| / |K_Z|) with and without feedback, given a trace (power) constraint tr(K_X) ≤ 2P. (p. 229)
Additive noise channel. Consider the channel Y = X + Z, where X is the transmitted signal with power constraint P, Z is independent additive noise, and Y is the received signal. Let ... , where Z* ∼ N(0, N). Thus Z has a mixture distribution which is the mixture of a Gaussian distribution and a degenerate distribution with mass 1 at 0. (p. 230)
Discrete input continuous output channel. Let Pr{X = 1} = p, Pr{X = 0} = 1 − p, and let Y = X + Z, where Z is uniform over the interval [0, a], a > 1, and Z is independent of X. (p. 230)
Gaussian mutual information. Suppose that (X, Y, Z) are jointly Gaussian and that X → Y → Z forms a Markov chain. Let X and Y have correlation coefficient ρ1 and let Y and Z have correlation coefficient ρ2. Find I(X; Z). (p. 231)
Impulse power. Consider the additive white Gaussian channel where Zi ∼ N(0, N) and the input signal has average power constraint P. (p. 232)
Gaussian channel with time-varying mean. Find the capacity of the following Gaussian channels. Let Z1, Z2, . . . be independent and let there be a power constraint P on x^n(W). Find the capacity when ... (p. 233)
A parametric form for channel capacity. Consider m parallel Gaussian channels, Yi = Xi + Zi, where Zi ∼ N(0, λi) and the noises Zi are independent r.v.'s. Thus C = ... , where λ is chosen to satisfy ... P. Show that this can be rewritten in the form ... (p. 234)
Robust decoding. Consider an additive noise channel whose output Y is given by Y = X + Z, where the channel input X is average power limited, EX² ≤ P, and the noise process {Zk}, k = −∞, . . . , ∞, is i.i.d. with marginal distribution pZ(z) (not necessarily Gaussian) of power N, EZ² = N. (p. 235)
A mutual information game. Consider the following channel. Throughout this problem we shall constrain the signal power EX = 0, EX² = P, and the noise power EZ = 0, EZ² = N. (p. 237)
Recovering the noise. Consider a standard Gaussian channel Y^n = X^n + Z^n, where Zi is i.i.d. ∼ N(0, N), i = 1, 2, . . . , n, and (1/n) ∑ Xi² ≤ P. For what R is this possible? (p. 240)
Chapter 10 problems
One-bit quantization of a single Gaussian random variable. Let X ∼ N(0, σ²) and let the distortion measure be squared error. Here we do not allow block descriptions. Show that the optimum reproduction points for 1-bit quantization are ± ... . Compare this with the distortion-rate bound D = σ² 2^(−2R) for R = 1. (p. 241)
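
For reference, the optimal 1-bit (sign) quantizer of a zero-mean Gaussian places its reproduction points at the conditional means ±σ√(2/π), giving distortion σ²(1 − 2/π) ≈ 0.363 σ². The short Monte Carlo check below (σ = 1 is an arbitrary choice) compares this with the distortion-rate bound σ²/4 at R = 1.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma = 1.0
    x = rng.normal(0.0, sigma, size=1_000_000)
    c = sigma * np.sqrt(2 / np.pi)                 # conditional mean E[X | X > 0]
    d_opt = np.mean((x - np.sign(x) * c) ** 2)     # distortion of the sign quantizer with levels +/- c
    print(d_opt, sigma**2 * (1 - 2 / np.pi))       # empirical vs. exact distortion of this quantizer
    print(sigma**2 * 2.0 ** (-2 * 1))              # distortion-rate bound at R = 1: sigma^2 / 4
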
Rate distortion function with infinite distortion. Find the rate distortion function R(D) = min I(X; X̂) for X ∼ Bernoulli(1/2) and distortion ... (p. 242)
Rate distortion for binary source with asymmetric distortion. Fix p(x̂|x) and evaluate I(X; X̂) and D for X ∼ Bern(1/2). (The rate distortion function cannot be expressed in closed form.) (p. 242)
Properties of R(D). Consider a discrete source X ∈ X = {1, 2, . . . , m} with distribution p1, p2, . . . , pm and a distortion measure d(i, j). Let R(D) be the rate distortion function for this source and distortion measure. Let d'(i, j) = d(i, j) − wi be a new distortion measure and let R'(D) be the corresponding rate distortion function. Show that R'(D) = R(D + w̄), where ... . This result is due to Pinkston [10]. (p. 243)
Rate distortion for uniform source with Hamming distortion. Consider a source X uniformly distributed on the set {1, 2, . . . , m}. Find the rate distortion function for this source with Hamming distortion, i.e., ... (p. 244)
Shannon lower bound for the rate distortion function. Consider a source X with a distortion measure d(x, x̂) that satisfies the following property: all columns of the distortion matrix are permutations of the set {d1, d2, . . . , dm}. Define the function ... . The Shannon lower bound on the rate distortion function [14] is proved by the following steps: ... (p. 245)
Erasure distortion. Consider X ∼ Bernoulli(1/2), and let the distortion measure be given by the matrix ... . Calculate the rate distortion function for this source. Can you suggest a simple scheme to achieve any value of the rate distortion function for this source? (p. 248)
Bounds on the rate distortion function for squared error distortion. For the case of a continuous random variable X with mean zero and variance σ² and squared error distortion, ... (p. 248)
Properties of optimal rate distortion code. A good (R, D) rate distortion code with R ≈ R(D) puts severe constraints on the relationship of the source X^n and the representations X̂^n. Examine the chain of inequalities (10.100–10.112), considering the conditions for equality, and interpret them as properties of a good code. For example, equality in (10.101) implies that X̂^n is a deterministic function of X^n. (p. 252)
Rate distortion. Find and verify the rate distortion function R(D) for X uniform on X = {1, 2, . . . , 2m} and ... , where X̂ is defined on X̂ = {1, 2, . . . , 2m}. (You may wish to use the Shannon lower bound in your argument.) (p. 253)
Lower bound. Let ... and ... . Define g(a) = max h(X) over all densities such that EX^4 ≤ a. Let R(D) be the rate distortion function for X with the above density and with distortion criterion d(x, x̂) = (x − x̂)^4. Show R(D) ≥ g(c) − g(D). (p. 254)
Adding a column to the distortion matrix. Let R(D) be the rate distortion function for an i.i.d. process with probability mass function p(x) and distortion function d(x, x̂), x ∈ X, x̂ ∈ X̂. Now suppose that we add a new reproduction symbol x̂0 to X̂ with associated distortion d(x, x̂0), x ∈ X. Does this increase or decrease R(D), and why? (p. 255)
Simplification. Suppose X = {1, 2, 3, 4}, X̂ = {1, 2, 3, 4}, p(i) = 1/4, i = 1, 2, 3, 4, and X1, X2, . . . are i.i.d. ∼ p(x). The distortion matrix d(x, x̂) is given by ... (p. 255)
Rate distortion for two independent sources. Can one simultaneously compress two independent sources better than by compressing the sources individually? The following problem addresses this question. Let {Xi} be i.i.d. ∼ p(x) with distortion d(x, x̂) and rate distortion function RX(D). Similarly, let {Yi} be i.i.d. ∼ p(y) with distortion d(y, ŷ) and rate distortion function RY(D). (p. 256)
Distortion-rate function. Let ... be the distortion rate function. (a) Is D(R) increasing or decreasing in R? (b) Is D(R) convex or concave in R? (c) ... (p. 257)
Probability of conditionally typical sequences. In Chapter 7, we calculated the probability that two independently drawn sequences X^n and Y^n are weakly jointly typical. To prove the rate distortion theorem, however, we need to calculate this probability when one of the sequences is fixed and the other is random. The techniques of weak typicality allow us only to calculate the average set size of the conditionally typical set. Using the ideas of strong typicality, on the other hand, provides us with stronger bounds which work for all typical x^n sequences. We will outline the proof that ... . Let (Xi, Yi) be drawn i.i.d. ∼ p(x, y). Let the marginals of X and Y be p(x) and p(y), respectively. (p. 259)
The source-channel separation theorem with distortion. Let V1, V2, . . . , Vn be a finite alphabet i.i.d. source which is encoded as a sequence of n input symbols X^n of a discrete memoryless channel. The output of the channel Y^n is mapped onto the reconstruction alphabet V̂^n = g(Y^n). Let D = Ed(V^n, V̂^n) = ... be the average distortion achieved by this combined source and channel coding scheme. (p. 268)
Rate distortion. Let d(x, x̂) be a distortion function. We have a source X ∼ p(x). Let R(D) be the associated rate distortion function. (a) Find R̃(D) in terms of R(D), where R̃(D) is the rate distortion function associated with the distortion d̃(x, x̂) = d(x, x̂) + a for some constant a > 0. (They are not equal.) (b) ... (c) ... (p. 270)
Rate distortion with two constraints. Let Xi be i.i.d. ∼ p(x). We are given two distortion functions d1(x, x̂) and d2(x, x̂). We wish to describe X^n at rate R and reconstruct it with distortions Ed1(X^n, X̂1^n) ≤ D1 and Ed2(X^n, X̂2^n) ≤ D2, as shown here: ... (p. 271)
Rate distortion. Consider the standard rate distortion problem, Xi i.i.d. ∼ p(x), X^n → i(X^n) → X̂^n, |i(·)| = 2^(nR). Consider two distortion criteria d1(x, x̂) and d2(x, x̂). Let R1(D) and R2(D) be the corresponding rate distortion functions. (p. 272)
Chapter 11 problems
1. Chernoff-Stein lemma. Consider the two-hypothesis test H1: f = f1 vs. H2: f = f2. Find D(f1 || f2) if (a) fi(x) = N(0, σi²), i = 1, 2; (b) fi(x) = λi e^(−λi x), x ≥ 0, i = 1, 2; (c) f1(x) is the uniform density over the interval [0, 1] and f2(x) is the uniform density over [a, a + 1], with 0 < a < 1; (d) f1 corresponds to a fair coin and f2 corresponds to a two-headed coin. (p. 273)
2. A relation between D(P || Q) and chi-square. Show that the χ² statistic χ² = ∑_x (P(x) − Q(x))² / Q(x) is (twice) the first term in the Taylor series expansion of D(P || Q) about Q. Thus D(P || Q) = (1/2) χ² + . . . . Suggestion: write P/Q = 1 + (P − Q)/Q and expand the log. (p. 274)
3. Error exponent for universal codes. A universal source code of rate R achieves a probability of error Pe^(n) = e^(−n D(P* || Q)), where Q is the true distribution and P* achieves min D(P || Q) over all P such that H(P) ≥ R. (a) Find P* in terms of Q and R. (b) Now let X be binary. Find the region of source probabilities Q(x), x ∈ {0, 1}, for which rate R is sufficient for the universal source code to achieve Pe^(n) → 0. (p. 275)
4. Sequential projection. We wish to show that projecting Q onto P1 and then projecting the projection Q̂ onto P1 ∩ P2 is the same as projecting Q directly onto P1 ∩ P2. Let P1 be the set of probability mass functions on X satisfying ∑_x p(x) = 1 and ∑_x p(x) hi(x) ≥ αi, i = 1, 2, . . . , r. Let P2 be the set of probability mass functions on X satisfying ∑_x p(x) = 1 and ∑_x p(x) gj(x) ≥ βj, j = 1, 2, . . . , s. Suppose Q ∉ P1 ∪ P2. Let P* minimize D(P || Q) over all P ∈ P1. Let R* minimize D(R || Q) over all R ∈ P1 ∩ P2. Argue that R* minimizes D(R || P*) over all R ∈ P1 ∩ P2. (p. 277)
5. Counting. Let X = {1, 2, . . . , m}. Show that the number of sequences x^n ∈ X^n satisfying (1/n) ∑_{i=1}^n g(xi) ≥ α is approximately equal to 2^(nH*), to first order in the exponent, for n sufficiently large, where H* = max{H(P) : ∑_{i=1}^m P(i) g(i) ≥ α}. (p. 278)
6. Biased estimates may be better. Consider the problem of estimating µ and σ² from n samples of data drawn i.i.d. from a N(µ, σ²) distribution. (p. 279)
7. Fisher information and relative entropy. Show for a parametric family {pθ(x)} that lim_{θ'→θ} (1/(θ − θ')²) D(pθ || pθ') = (1/ln 4) J(θ). (p. 282)
8. Examples of Fisher information. The Fisher information J(Θ) for the family fθ(x), θ ∈ R, is defined by ... . Find the Fisher information for the following families: ... (p. 283)
9. Two conditionally independent looks double the Fisher information. Let gθ(x1, x2) = fθ(x1) fθ(x2). Show Jg(θ) = 2 Jf(θ). (p. 285)
10. Joint distributions and product distributions. Consider a joint distribution Q(x, y) with marginals Q(x) and Q(y). Let E be the set of types that look jointly typical with respect to Q, i.e., ... (p. 285)
11. Cramer-Rao inequality with a bias term. Let X ∼ f(x; θ) and let T(X) be an estimator for θ. Let bT(θ) = EθT − θ be the bias of the estimator. Show that ... (p. 286)
12. Hypothesis testing. Let X1, X2, . . . , Xn be i.i.d. ∼ p(x). Consider the hypothesis test H1: p = p1 versus H2: p = p2. Find the error exponent for Pr{Decide H2 | H1 true} in the best hypothesis test of H1 vs. H2 subject to Pr{Decide H1 | H2 true} ≤ 1/2. (p. 287)
13. Sanov's theorem. Prove the simple version of Sanov's theorem for binary random variables, i.e., let X1, X2, . . . , Xn be a sequence of binary random variables drawn i.i.d. according to the distribution ... . Let the proportion of 1's in the sequence X1, X2, . . . , Xn be pX^n, i.e., ... . By the law of large numbers, we would expect pX^n to be close to q for large n. Sanov's theorem deals with the probability that pX^n is far away from q. In particular, for concreteness, if we take p > q > 1/2, Sanov's theorem states that ... (p. 287)
14. Sanov. Let Xi be i.i.d. ∼ N(0, σ²). (a) Find the exponent in the behavior of Pr{(1/n) ∑_{i=1}^n Xi² ≥ α²}. This can be done from first principles (since the normal distribution is nice) or by using Sanov's theorem. (b) What does the data look like if (1/n) ∑_{i=1}^n Xi² ≥ α? That is, what is the P* that minimizes D(P || Q)? (p. 289)
15. Counting states. Suppose an atom is equally likely to be in each of 6 states, X ∈ {s1, s2, s3, . . . , s6}. One observes n atoms X1, X2, . . . , Xn independently drawn according to this uniform distribution. It is observed that the frequency of occurrence of state s1 is twice the frequency of occurrence of state s2. (p. 290)
16. Hypothesis testing. Let {Xi} be i.i.d. ∼ p(x), x ∈ {1, 2, . . .}. Consider two hypotheses, H0: p(x) = p0(x) vs. H1: p(x) = p1(x), where p0(x) = (1/2)^x and p1(x) = q p^(x−1), x = 1, 2, 3, . . . . (a) Find D(p0 || p1). (b) Let Pr{H0} = 1/2. Find the minimal probability of error test for H0 vs. H1 given data X1, X2, . . . , Xn ∼ p(x). (p. 291)
17. Maximum likelihood estimation. Let {fθ(x)} denote a parametric family of densities with parameter θ ∈ R. Let X1, X2, . . . , Xn be i.i.d. ∼ fθ(x). The function ... is known as the log likelihood function. Let θ0 denote the true parameter value. (p. 292)
18. Large deviations. Let X1, X2, . . . be i.i.d. random variables drawn according to the geometric distribution Pr{X = k} = p^(k−1) (1 − p), k = 1, 2, . . . . Find good estimates (to first order in the exponent) of (a) Pr{(1/n) ∑_{i=1}^n Xi ≥ α}; (b) Pr{X1 = k | (1/n) ∑_{i=1}^n Xi ≥ α}; (c) evaluate (a) and (b) for p = 1/2, α = 4. (p. 293)
19. Another expression for Fisher information. Use integration by parts to show that J(θ) = −E[∂² ln fθ(x) / ∂θ²]. (p. 294)
20. Stirling's approximation. Derive a weak form of Stirling's approximation for factorials, i.e., show that ... using the approximation of integrals by sums. Justify the following steps: ... (p. 295)
21. Asymptotic value of (n choose k). Use the simple approximation of the previous problem to show that, if 0 ≤ p ≤ 1 and k = ⌊np⌋, i.e., k is the largest integer less than or equal to np, then ... . Now let pi, i = 1, . . . , m, be a probability distribution on m symbols, i.e., pi ≥ 0 and ∑_i pi = 1. What is the limiting value of ... (p. 297)
22. The running difference. Let X1, X2, . . . , Xn be i.i.d. ∼ Q1(x), and Y1, Y2, . . . , Yn be i.i.d. ∼ Q2(y). Let X^n and Y^n be independent. Find an expression for Pr{∑_{i=1}^n Xi − ∑_{i=1}^n Yi ≥ nt}, good to first order in the exponent. Again, this answer can be left in parametric form. (p. 298)
23. Large likelihoods. Let X1, X2, . . . be i.i.d. ∼ Q(x), x ∈ {1, 2, . . . , m}. Let P(x) be some other probability mass function. We form the log likelihood ratio of the sequence X^n and ask for the probability that it exceeds a certain threshold. Specifically, find (to first order in the exponent) ... (p. 299)
24. Fisher information for mixtures. Let f1(x) and f0(x) be two given probability densities. Let Z be Bernoulli(θ), where θ is unknown. Let X ∼ f1(x) if Z = 1 and X ∼ f0(x) if Z = 0. (p. 300)
25. Bent coins. Let {Xi} be i.i.d. ∼ Q, where Q(k) = Pr(Xi = k) = (m choose k) q^k (1 − q)^(m−k) for k = 0, 1, 2, . . . , m. Thus the Xi's are i.i.d. ∼ Binomial(m, q). Show that, as n → ∞, ... (p. 301)
26. Conditional limiting distribution. (a) Find the exact value of Pr{X1 = 1 | (1/n) ∑_{i=1}^n Xi = 1/4}, if X1, X2, . . . are Bernoulli(2/3) and n is a multiple of 4. (b) Now let Xi ∈ {−1, 0, 1} and let X1, X2, . . . be i.i.d. uniform over {−1, 0, +1}. Find the limit of Pr{X1 = +1 | (1/n) ∑_{i=1}^n Xi² = 1/2}. (p. 302)
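
A quick exact check for part (a) of the conditional limiting distribution problem just above: by exchangeability, Pr{X1 = 1 | ∑ Xi = k} = k/n regardless of the Bernoulli parameter, so the conditional probability is exactly 1/4. The sample sizes below are arbitrary multiples of 4 chosen for illustration.

    from math import comb

    p, q = 2/3, 1/3
    for n in (8, 16, 40):                       # assumed sample sizes, multiples of 4
        k = n // 4
        joint = p * comb(n - 1, k - 1) * p**(k - 1) * q**(n - k)   # Pr{X1 = 1, sum = k}
        total = comb(n, k) * p**k * q**(n - k)                     # Pr{sum = k}
        print(n, joint / total)                                    # equals k/n = 1/4
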
27. Variational inequality. Verify, for positive random variables X, that log E_P(X) = sup_Q [E_Q(log X) − D(Q||P)] (11.264), where E_P(X) = ∑_x x P(x) and D(Q||P) = ∑_x Q(x) log(Q(x)/P(x)), and the supremum is over all Q(x) ≥ 0, ∑ Q(x) = 1. It is enough to extremize J(Q) = E_Q ln X − D(Q||P) + λ(∑ Q(x) − 1). (p. 302)
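
A small numerical check of this identity. The support values and the random choice of P below are arbitrary, and the code also verifies that the supremum is attained at the tilted pmf Q*(x) ∝ x P(x) (a standard observation, not stated in the problem).

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.array([1.0, 2.0, 5.0, 9.0])          # values taken by the positive random variable X
    P = rng.dirichlet(np.ones(len(x)))          # an arbitrary pmf P on those values
    Q = x * P / np.sum(x * P)                   # tilted pmf Q*(x) proportional to x P(x)
    lhs = np.log(np.sum(P * x))                             # log E_P(X)
    rhs = np.sum(Q * np.log(x)) - np.sum(Q * np.log(Q / P)) # E_Q(log X) - D(Q||P) at Q = Q*
    print(lhs, rhs)                             # the two values coincide
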
28. Type constraints. (a) Find constraints on the type P_{X^n} such that the sample variance X̄²_n − (X̄_n)² ≤ α, where X̄²_n = (1/n) ∑_{i=1}^n Xi² and X̄_n = (1/n) ∑_{i=1}^n Xi. (b) Find the exponent in the probability Q^n(X̄²_n − (X̄_n)² ≤ α). You can leave the answer in parametric form. (p. 303)
29. While we cannot use the results of Section 11.5 directly, we can use a similar approach using the calculus of variations to find the type that is closest to ... subject to the constraint. We set up the functional ... (p. 304)
30. Uniform distribution on the simplex. Which of these methods will generate a sample from the uniform distribution on the simplex {x ∈ R^n : xi ≥ 0, ∑_{i=1}^n xi = 1}? (p. 305)
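
The candidate methods referred to in this problem are not reproduced in this listing; for comparison, one method that is known to produce the uniform (Dirichlet(1, . . . , 1)) distribution on the simplex is to normalize i.i.d. Exp(1) variables, as sketched below.

    import numpy as np

    rng = np.random.default_rng(3)
    n, samples = 3, 100_000
    e = rng.exponential(size=(samples, n))      # i.i.d. Exp(1) variables
    pts = e / e.sum(axis=1, keepdims=True)      # uniform points on {x : x_i >= 0, sum x_i = 1}
    print(pts.mean(axis=0))                     # each coordinate has mean 1/n
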
Chapter 12 problems
1. Maximum entropy. Find the maximum entropy density f, defined for x ≥ 0, satisfying EX = α1, E ln X = α2. That is, maximize −∫ f ln f subject to ∫ x f(x) dx = α1, ∫ (ln x) f(x) dx = α2, where the integral is over 0 ≤ x < ∞. What family of densities is this? (p. 307)
2. Min D(P || Q) under constraints on P. We wish to find the (parametric form of the) probability mass function P(x), x ∈ {1, 2, . . .}, that minimizes the relative entropy D(P || Q) over all P such that ∑ P(x) gi(x) = αi, i = 1, 2, . . . . (p. 307)
3. Maximum entropy processes. Find the maximum entropy rate stochastic process {Xi}, −∞ < i < ∞, subject to the constraints: (a) EXi² = 1, i = 1, 2, . . . ; (b) EXi² = 1, E Xi Xi+1 = 1/2, i = 1, 2, . . . . (c) Find the maximum entropy spectrum for the processes in parts (a) and (b). (p. 308)
4. Maximum entropy with marginals. What is the maximum entropy distribution p(x, y) that has the following marginals? ... Hint: you may wish to guess and verify a more general result. (p. 309)
5. Processes with fixed marginals. Consider the set of all densities with fixed pairwise marginals f_{X1,X2}(x1, x2), f_{X2,X3}(x2, x3), . . . , f_{Xn−1,Xn}(xn−1, xn). Show that the maximum entropy process with these marginals is the first-order (possibly time-varying) Markov process with these marginals. Identify the maximizing f*(x1, x2, . . . , xn). (p. 310)
6. Every density is a maximum entropy density. Let f0(x) be a given density. Given r(x), let gα(x) be the density maximizing h(X) over all f satisfying ∫ f(x) r(x) dx = α. Now let r(x) = ln f0(x). Show that gα(x) = f0(x) for an appropriate choice α = α0. Thus f0(x) is a maximum entropy density under the constraint ∫ f ln f0 = α0. (p. 310)
7. Mean squared error. Let {Xi}, i = 1, . . . , n, satisfy E Xi Xi+k = Rk, k = 0, 1, . . . , p. Consider linear predictors for Xn, i.e., X̂n = ∑_{i=1}^{n−1} bi X_{n−i}. Assume n > p. Find max over f(x^n) of min over b of E(Xn − X̂n)². (p. 311)
8. Maximum entropy characteristic functions. We ask for the maximum entropy density f(x), 0 ≤ x ≤ a, satisfying a constraint on the characteristic function Ψ(u) = ∫_0^a e^(iux) f(x) dx. The answers need be given only in parametric form. (a) Find the maximum entropy f satisfying ∫_0^a f(x) cos(u0 x) dx = α, at a specified point u0. (b) Find the maximum entropy f satisfying ∫_0^a f(x) sin(u0 x) dx = β. (c) Find the maximum entropy density f(x), 0 ≤ x ≤ a, having a given value of the characteristic function Ψ(u0) at a specified point u0. (d) What problem is encountered if a = ∞? (p. 312)
9. Maximum entropy processes. (a) Find the maximum entropy rate binary stochastic process {Xi}, −∞ < i < ∞, Xi ∈ {0, 1}, satisfying Pr{Xi = Xi+1} = 1/3 for all i. (b) What is the resulting entropy rate? (p. 313)
10. Maximum entropy of sums. Let Y = X1 + X2. Find the maximum entropy density for Y under the constraint EX1² = P1, EX2² = P2, (a) if X1 and X2 are independent; (b) if X1 and X2 are allowed to be dependent. (c) Prove part (a). (p. 314)
11. Maximum entropy Markov chain. Let {Xi} be a stationary Markov chain with Xi ∈ {1, 2, 3}. Let I(Xn; Xn+2) = 0 for all n. (a) What is the maximum entropy rate process satisfying this constraint? (b) What if I(Xn; Xn+2) = α for all n, for some given value of α, 0 ≤ α ≤ log 3? (p. 314)
12. An entropy bound on prediction error. Let {Xn} be an arbitrary real-valued stochastic process. Let X̂n+1 = E{Xn+1 | X^n}. Thus the conditional mean X̂n+1 is a random variable depending on the n-past X^n. Here X̂n+1 is the minimum mean squared error prediction of Xn+1 given the past. (p. 315)
13. Maximum entropy rate. What is the maximum entropy rate stochastic process {Xi} over the symbol set {0, 1} for which the probability that 00 occurs in a sequence is zero? (p. 316)
14. Maximum entropy. (a) What is the parametric-form maximum entropy density f(x) satisfying the two conditions EX^8 = a, EX^16 = b? (b) What is the maximum entropy density satisfying the condition E(X^8 + X^16) = a + b? (c) Which entropy is higher? (p. 316)
15. Maximum entropy. Find the parametric form of the maximum entropy density f satisfying the Laplace transform condition ∫ f(x) e^(−x) dx = α, and give the constraints on the parameter. (p. 317)
16. Maximum entropy processes. Consider the set of all stochastic processes {Xi}, Xi ∈ R, with R0 = EXi² = 1 and R1 = E Xi Xi+1 = 1/2. Find the maximum entropy rate. (p. 317)
17. Binary maximum entropy. Consider a binary process {Xi}, Xi ∈ {−1, +1}, with R0 = EXi² = 1 and R1 = E Xi Xi+1 = 1/2. (a) Find the maximum entropy process with these constraints. (b) What is the entropy rate? (c) Is there a Bernoulli process satisfying these constraints? (p. 318)
18. Maximum entropy. Maximize h(Z, Vx, Vy, Vz) subject to the energy constraint E((1/2) m ||V||² + mgZ) = E0. Show that the resulting distribution yields E[(1/2) m ||V||²] = (3/5) E0 and E[mgZ] = (2/5) E0. Thus 2/5 of the energy is stored in the potential field, regardless of its strength g. (p. 318)
19. Maximum entropy discrete processes. (a) Find the maximum entropy rate binary stochastic process {Xi}, −∞ < i < ∞, Xi ∈ {0, 1}, satisfying Pr{Xi = Xi+1} = 1/3 for all i. (b) What is the resulting entropy rate? (p. 319)
20. Maximum entropy of sums. Let Y = X1 + X2. Find the maximum entropy of Y under the constraint EX1² = P1, EX2² = P2, (a) if X1 and X2 are independent; (b) if X1 and X2 are allowed to be dependent. (p. 320)
21. Entropy rate. (a) Find the maximum entropy rate stochastic process {Xi} with EXi² = 1, E Xi Xi+2 = α, i = 1, 2, . . . . Be careful. (b) What is the maximum entropy rate? (c) What is E Xi Xi+1 for this process? (p. 321)
22. Minimum expected value. (a) Find the minimum value of EX over all probability density functions f(x) satisfying the following three constraints: (i) f(x) = 0 for x ≤ 0; (ii) ∫ f(x) dx = 1; and (iii) ... (p. 321)
Chapter 13 problems
1. Minimax regret data compression and channel capacity. First consider universal data compression with respect to two source distributions. Let the alphabet V = {1, e, 0} and let p1(v) put mass 1 − α on v = 1 and mass α on v = e. Let p2(v) put mass 1 − α on v = 0 and mass α on v = e. (p. 323)
2. Universal data compression. Consider three possible source distributions on X: Pa = (.7, .2, .1), Pb = (.1, .7, .2), and Pc = (.2, .1, .7). (a) Find the minimum incremental cost of compression R* = min_P max ... (p. 324)
3. Arithmetic coding. Let {Xi} be a stationary binary Markov chain with transition matrix [pij] = [[3/4, 1/4], [1/4, 3/4]] (13.2). Calculate the first 3 bits of F(X∞) = 0.F1F2 . . . when X∞ = 1010111 . . . . How many bits of X∞ does this specify? (p. 325)
4. Arithmetic coding. Let Xi be binary stationary Markov with transition matrix [[1/3, 2/3], [2/3, 1/3]]. (a) Find F(01110) = Pr{.X1X2X3X4X5 < .01110}. (b) How many bits .F1F2 . . . can be known for sure if it is not known how X = 01110 continues? (p. 325)
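
A brute-force sketch for part (a) of problem 4: it sums the probabilities of all 5-bit strings that are lexicographically smaller than 01110, assuming (as stationarity of this symmetric chain implies) that X1 is drawn from the stationary distribution (1/2, 1/2).

    from itertools import product

    # Transition probabilities: staying in the same symbol has probability 1/3, switching 2/3.
    T = {(0, 0): 1/3, (0, 1): 2/3, (1, 0): 2/3, (1, 1): 1/3}

    def prob(seq):
        p = 0.5                                  # stationary marginal of X1
        for a, b in zip(seq, seq[1:]):
            p *= T[(a, b)]
        return p

    target = int("01110", 2)
    F = sum(prob(s) for s in product((0, 1), repeat=5)
            if int("".join(map(str, s)), 2) < target)    # .x1x2x3x4x5 < .01110
    print(F)                                     # F(01110)
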
5. Lempel-Ziv. Give the LZ78 parsing and encoding of 00000011010100000110101. (p. 326)
6. Lempel-Ziv 78. We are given the constant sequence x^n = 11111 . . . . (a) Give the LZ78 parsing for this sequence. (b) Argue that the number of encoding bits per symbol for this sequence goes to zero as n → ∞. (p. 326)
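
A minimal LZ78 parser (a generic sketch, not the book's notation) that can be applied to the sequence in problem 5: each phrase is the longest previously seen phrase plus one new symbol, encoded as a (dictionary index, symbol) pair, and any leftover suffix at the end of the input is reported separately.

    def lz78_parse(s):
        """Return the LZ78 phrase list [(prefix index, new symbol), ...] and any leftover suffix."""
        dictionary = {"": 0}
        phrases = []
        w = ""
        for c in s:
            if w + c in dictionary:
                w += c
            else:
                phrases.append((dictionary[w], c))
                dictionary[w + c] = len(phrases)
                w = ""
        return phrases, w        # leftover w (if any) already matches an existing phrase

    phrases, leftover = lz78_parse("00000011010100000110101")
    print(phrases)
    print("leftover:", leftover)
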
7. Another idealized version of Lempel-Ziv coding. An idealized version of LZ was shown to be optimal: the encoder and decoder both have available to them the "infinite past" generated by the process, . . . , X−1, X0, and the encoder describes the string (X1, X2, . . . , Xn) by telling the decoder the position Rn in the past of the first recurrence of that string. This takes roughly log Rn + 2 log log Rn bits. (p. 326)
8. Length of pointers in LZ77. In the version of LZ77 due to Storer and Szymanski [16], described in Section 13.4.1, a short match can either be represented by (F, P, L) (flag, pointer, length) or by (F, C) (flag, character). Assume that the window length is W, and assume that the maximum match length is M. (p. 327)
9. Lempel-Ziv. (a) Continue the Lempel-Ziv parsing of the sequence 0,00,001,00000011010111. (b) Give a sequence for which the number of phrases in the LZ parsing grows as fast as possible. (c) Give a sequence for which the number of phrases in the LZ parsing grows as slowly as possible. (p. 328)
10. Two versions of fixed-database Lempel-Ziv. Consider a source (A, P). For simplicity assume that the alphabet is finite, |A| = A < ∞, and the symbols are i.i.d. ∼ P. A fixed database D is given, and is revealed to the decoder. The encoder parses the target sequence x_1^n into blocks of length l, and subsequently encodes them by giving the binary description of their last appearance in the database. If a match is ... (p. 328)
11. Tunstall coding. The normal setting for source coding maps a symbol (or a block of symbols) from a finite alphabet onto a variable-length string. An example of such a code is the Huffman code, which is the optimal (minimal expected length) mapping from a set of symbols to a prefix-free set of codewords. Now consider the dual problem of variable ... (p. 330)
Chapter 14 problems
Kolmogorov complexity of two sequences. Let x, y ∈ {0, 1}*. Argue that K(x, y) ≤ K(x) + K(y) + c. (p. 337)
Complexity of the sum. (a) Argue that K(n) ≤ log n + 2 log log n + c. (b) Argue that K(n1 + n2) ≤ K(n1) + K(n2) + c. (p. 337)
Images. Consider an n × n array x of 0's and 1's. Thus x has n² bits. (p. 338)
Do computers reduce entropy? Feed a random program P into a universal computer. (p. 339)
Monkeys on a computer. Suppose a random program is typed into a computer. Give a rough estimate of the probability that the computer prints the following sequence: ... (p. 340)
Kolmogorov complexity and ternary programs. Suppose that the input programs for a universal computer U are sequences in {0, 1, 2}* (ternary inputs). Also, suppose U prints ternary outputs. Let K(x | l(x)) = min_{U(p, l(x)) = x} l(p). Show that ... (p. 341)
A law of large numbers. Using ternary inputs and outputs as in Problem 6, ... (p. 341)
Image complexity. Consider two binary subsets A and B (of an n × n grid). (p. 342)
Random program. Suppose that a random program (symbols i.i.d. uniform over the symbol set) is fed into the nearest available computer. (p. 342)
The face-vase illusion. (a) What is an upper bound on the complexity of a pattern on an m × m grid that ... (p. 342)
Kolmogorov complexity. Assume n very large and known. Let all rectangles be parallel to the frame. (p. 343)
Encrypted text. Suppose English text x^n is encrypted into y^n by a substitution cipher: a one-to-one reassignment of each of the 27 letters of the alphabet. (p. 345)
Kolmogorov complexity. Consider the Kolmogorov complexity K(n) over the integers n. (p. 345)
Complexity of large numbers. Let A(n) be the set of positive integers x for which a terminating program p of length less than or equal to n bits exists that outputs x. Let B(n) be the complement of A(n). (p. 346)
Chapter 15 problems
1. The cooperative capacity of a multiple access channel (Figure 15.1). Suppose X1 and X2 have access to both indices W1 ∈ {1, . . . , 2^(nR1)}, W2 ∈ {1, . . . , 2^(nR2)}. Thus the codewords X1(W1, W2), X2(W1, W2) depend on both indices. (a) Find the capacity region. (b) Evaluate this region for the binary erasure multiple access channel Y = X1 + X2, Xi ∈ {0, 1}. Compare to the non-cooperative region. (p. 347)
Capacity of multiple access channels. Find the capacity region for each of the following multiple access channels: (a) Additive modulo-2 multiple access channel, X1 ∈ {0, 1}, X2 ∈ {0, 1}, Y = X1 ⊕ X2. (b) Multiplicative multiple access channel, X1 ∈ {−1, 1}, X2 ∈ {−1, 1}, Y = X1 · X2. (p. 349)
Cut-set interpretation of capacity region of multiple access channel. For the multiple access channel we know that (R1, R2) is achievable if R1 < I(X1; Y | X2) (15.3), R2 < I(X2; Y | X1) (15.4), R1 + R2 < I(X1, X2; Y) (15.5). (p. 349)
Gaussian multiple access channel capacity. For the AWGN multiple access channel, prove, using typical sequences, the achievability of any rate pair (R1, R2) satisfying ... . The proof extends the proof for the discrete multiple access channel in the same way as the proof for the single-user Gaussian channel extends the proof for the discrete single-user channel. (p. 350)
Converse for the Gaussian multiple access channel. Prove the converse for the Gaussian multiple access channel by extending the converse in the discrete case to take into account the power constraint on the codewords. (p. 353)
Unusual multiple access channel. Consider the following multiple access channel: X1 = X2 = Y = {0, 1}. If (X1, X2) = (0, 0), then Y = 0. If (X1, X2) = (0, 1), then Y = 1. If (X1, X2) = (1, 0), then Y = 1. If (X1, X2) = (1, 1), then Y = 0 with probability 1/2 and Y = 1 with probability 1/2. (p. 356)
Convexity of capacity region of broadcast channel. Let C ⊆ R² be the capacity region of all achievable rate pairs R = (R1, R2) for the broadcast channel. Show that C is a convex set by using a time-sharing argument. Specifically, show that if R(1) and R(2) are achievable, then λR(1) + (1 − λ)R(2) is achievable for 0 ≤ λ ≤ 1. (p. 357)
Slepian-Wolf for deterministically related sources. Find and sketch the Slepian-Wolf rate region for the simultaneous data compression of (X, Y), where y = f(x) is some deterministic function of x. Solution: Slepian-Wolf for Y = f(X). The quantities defining the Slepian-Wolf rate region are H(X, Y) = H(X), H(Y | X) = 0 and H(X | Y) ≥ 0. Hence the rate region is as shown in Figure 15.4. (p. 358)
Slepian-Wolf. Let Xi be i.i.d. ∼ Bernoulli(p). Let Zi be i.i.d. ∼ Bernoulli(r), and let Z be independent of X. Finally, let Y = X ⊕ Z (mod 2 addition). Let X be described at rate R1 and Y be described at rate R2. What region of rates allows recovery of X, Y with probability of error tending to zero? (p. 358)
Broadcast capacity depends only on the conditional marginals. Consider the general broadcast channel (X, Y1 × Y2, p(y1, y2 | x)). Show that the capacity region depends only on p(y1 | x) and p(y2 | x). To do this, for any given ((2^(nR1), 2^(nR2)), n) code, let ... (p. 359)
Converse for the degraded broadcast channel. The following chain of inequalities proves the converse for the degraded discrete memoryless broadcast channel. Provide reasons for each of the labeled inequalities. Setup for converse for degraded broadcast channel capacity: (W1, W2) indep. → X^n(W1, W2) → Y1^n → Y2^n. (p. 360)
Capacity points. (a) For the degraded broadcast channel X → Y1 → Y2, find the points a and b where the capacity region hits the R1 and R2 axes (Figure 15.6). (p. 363)
Degraded broadcast channel. Find the capacity region for the degraded broadcast channel in Figure 15.8. (p. 364)
Channels with unknown parameters. We are given a binary symmetric channel with parameter p. The capacity is C = 1 − H(p). Now we change the problem slightly. The receiver knows only that p ∈ {p1, p2}, i.e., p = p1 or p = p2, where p1 and p2 are given real numbers. The transmitter knows the actual value of p. Devise two codes for use by the transmitter, one to be used if p = p1, the other to be used if p = p2, such that transmission to the receiver can take place at rate ≈ C(p1) if p = p1 and at rate ≈ C(p2) if p = p2. (p. 365)
Two-way channel. Consider the two-way channel shown in Figure 15.6. The outputs Y1 and Y2 depend only on the current inputs X1 and X2. (a) By using independently generated codes for the two senders, show that the following rate region is achievable: ... (p. 366)
Multiple-access channel. Let the output Y of a multiple-access channel be given by Y = X1 + sgn(X2), where X1, X2 are both real and power limited, ... (p. 367)
Slepian-Wolf. Let (X, Y) have the joint pmf p(x, y), where β = 16 − α2. (Note: This is a joint, not a conditional, probability mass function.) (p. 368)
Square channel. What is the capacity of the following multiple access channel? X1 ∈ {−1, 0, 1}, X2 ∈ {−1, 0, 1}, Y = X1² + ... (p. 369)
Slepian-Wolf. Two senders know random variables U1 and U2 respectively. Let the random variables (U1, U2) have the following joint distribution: ... (p. 370)
Multiple access. (a) Find the capacity region for the multiple access channel ... (p. 371)
Broadcast channel. Consider the following degraded broadcast channel: ... (p. 372)
Stereo. The sum and the difference of the right and left ear signals are to be individually compressed for a common receiver. Let Z1 be Bernoulli(p1) and Z2 be Bernoulli(p2), and suppose Z1 and Z2 are independent. Let X = Z1 + Z2 and Y = Z1 − Z2. (a) What is the Slepian-Wolf rate region of achievable (RX, RY)? (p. 373)
The Slepian-Wolf region for (Z1, Z2) is R1 ≥ H(Z1 | Z2) = H(p1) (15.198), R2 ≥ H(Z2 | Z1) = H(p2) (15.199), R1 + R2 ≥ H(Z1, Z2) = H(p1) + H(p2) (15.200), which is a rectangular region. The minimum sum of rates is the same in both cases, since if we knew both X and Y, we could find Z1 and Z2 and vice versa. However, the region in part (a) is usually pentagonal in shape, and is larger than the region in (b). (p. 374)
Multiplicative multiple access channel. Find and sketch the capacity region of the multiplicative multiple access channel ... (p. 375)
Distributed data compression. Let Z1, Z2, Z3 be independent Bernoulli(p). Find the Slepian-Wolf rate region for the description of (X1, X2, X3), where X1 = Z1, X2 = Z1 + Z2, X3 = Z1 + Z2 + Z3. (p. 375)
Noiseless multiple access channel. Consider the following multiple access channel with two binary inputs X1, X2 ∈ {0, 1} and output Y = (X1, X2). (a) Find the capacity region. Note that each sender can send at capacity. (p. 377)
Infinite bandwidth multiple access channel. Find the capacity region for the Gaussian multiple access channel with infinite bandwidth. Argue that all senders can send at their individual capacities, i.e., infinite bandwidth eliminates interference. (p. 377)
A multiple access identity. Let C(x) = (1/2) log(1 + x) denote the channel capacity of a Gaussian channel with signal-to-noise ratio x. Show that ... (p. 378)
Frequency Division Multiple Access (FDMA). Maximize the throughput R1 + R2 = W1 log(1 + P1/(N W1)) + (W − W1) log(1 + P2/(N (W − W1))) over W1 to show that bandwidth should be proportional to transmitted power for FDMA. (p. 378)
Trilingual speaker broadcast channel. A speaker of Dutch, Spanish and French wishes to communicate simultaneously to three people: D, S, and F. D knows only Dutch, but can distinguish when a Spanish word is being spoken as distinguished from a French word; similarly for the other two, who know only Spanish and French respectively, but can distinguish when a foreign word is spoken and which language is being spoken. Suppose each language, Dutch, Spanish, and French, has M words: M words of Dutch, M words of French, and M words of Spanish. (a) What is the maximum rate at which the trilingual speaker can speak to D? (p. 379)
Parallel Gaussian channels from a mobile telephone. Assume that a sender X is sending to two fixed base stations. Assume that the sender sends a signal X that is constrained to have average power P. Assume that the two base stations receive signals Y1 and Y2, where Y1 = α1 X + Z1 and Y2 = α2 X + Z2. (p. 380)
Gaussian multiple access. A group of m users, each with power P, is using a Gaussian multiple access channel at capacity, so that ∑_{i=1}^m Ri = C(mP/N) (15.223), where C(x) = (1/2) log(1 + x) and N is the receiver noise power. A new user of power P0 wishes to join in. (a) At what rate can he send without disturbing the other users? (p. 381)
Converse for deterministic broadcast channel. A deterministic broadcast channel is defined by an input X and two outputs, Y1 and Y2, which are functions of the input X. Thus Y1 = f1(X) and Y2 = f2(X). Let R1 and R2 be the rates at which information can be sent to the two receivers. Prove that R1 ≤ H(Y1) (15.227), R2 ≤ H(Y2) (15.228), R1 + R2 ≤ H(Y1, Y2) (15.229). (p. 381)
Multiple access channel. Consider the multiple access channel Y = X1 + X2 (mod 4), where X1 ∈ {0, 1, 2, 3}, X2 ∈ {0, 1}. (a) Find the capacity region (R1, R2). (b) What is the maximum throughput R1 + R2? (p. 382)
Distributed source compression. Let Z1 = 1 with probability p and 0 with probability q, and let Z2 = 1 with probability p and 0 with probability q. (p. 382)
MAC capacity with costs. The cost of using symbol x is r(x). The cost of a codeword x^n is r(x^n) = (1/n) ∑_{i=1}^n r(xi). A (2^(nR), n) codebook satisfies cost constraint r if (1/n) ∑_{i=1}^n r(xi(w)) ≤ r for all w ∈ {1, . . . , 2^(nR)}. (a) Find an expression for the capacity C(r) of a discrete memoryless channel with cost constraint r. (b) Find an expression for the multiple access channel capacity region for (X1 × X2, p(y | x1, x2), Y) if sender X1 has cost constraint r1 and sender X2 has cost constraint r2. (p. 383)
Chapter 16 problems
Growth rate. Let X = (1, a) with probability 1/2 and X = (1, 1/a) with probability 1/2, where a > 1. This vector X represents a stock market vector of cash vs. a hot stock. Let W(b, F) = E log b^t X. (p. 393)
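
A small numerical sketch of the growth-rate computation for this cash-vs-hot-stock example. The value a = 2 is an arbitrary illustrative choice; the grid search simply locates the portfolio split maximizing W(b, F) = E log b^t X.

    import numpy as np

    a = 2.0                                     # assumed value of the parameter a > 1
    b = np.linspace(0.0, 1.0, 10_001)           # fraction of wealth placed in the hot stock
    W = 0.5 * np.log2(1 - b + b * a) + 0.5 * np.log2(1 - b + b / a)   # growth rate in bits
    print(b[np.argmax(W)], W.max())             # log-optimal split and the resulting growth rate
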
Side information. Suppose, in the previous problem, that Y = 1 if (X1, X2) ≥ (1, 1) and Y = 0 if (X1, X2) ≤ (1, 1). Let the portfolio b depend on Y. Find the new growth rate W** and verify that ∆W = W** − W* satisfies ∆W ≤ I(X; Y). (p. 394)
Stock dominance. Consider a stock market vector X = (X1, X2). Suppose X1 = 2 with probability 1. (a) Find necessary and sufficient conditions on the distribution of stock X2 such that the log-optimal portfolio b* invests all the wealth in stock X2, i.e., b* = (0, 1). (b) Argue for any distribution on X2 that the growth rate satisfies W* ≥ 1. (p. 395)
Including experts and mutual funds. Let X ∼ F(x), x ∈ R^m_+, be the vector of price relatives for a stock market. Suppose an "expert" suggests a portfolio b. This would result in a wealth factor b^t X. We add this to the stock alternatives to form X̃ = (X1, X2, . . . , Xm, b^t X). Show that the new growth rate W̃* = max over (b1, . . . , bm+1) of ∫ ln(b^t x̃) dF(x̃) ... (p. 396)
Growth rate for symmetric distribution. Consider a stock vector X ∼ F(x), X ∈ R^m, X ≥ 0, where the component stocks are exchangeable. Thus F(x1, x2, . . . , xm) = F(xσ(1), xσ(2), . . . , xσ(m)) for all permutations σ. (a) Find the portfolio b* optimizing the growth rate and establish its optimality. Now assume that X has been normalized so that (1/m) ∑_{i=1}^m Xi = 1, and F is symmetric as before. (b) Again assuming X to be normalized, show that all symmetric distributions F have the same growth rate against b*. (c) Find this growth rate. (p. 397)
Convexity. We are interested in the set of stock market densities that yield the same optimal portfolio. Let P_{b0} be the set of all probability densities on R^m_+ for which b0 is optimal. Thus P_{b0} = {p(x) : ∫ ln(b^t x) p(x) dx is maximized by b = b0}. Show that P_{b0} is a convex set. It may be helpful to use Theorem 16.2.2. (p. 398)
Short selling. Let X = (1, 2) with probability p and (1, 1/2) with probability 1 − p. Let B = {(b1, b2) : b1 + b2 = 1}. Thus this set of portfolios B does not include the constraint bi ≥ 0. (This allows short selling.) (a) Find the log-optimal portfolio b*(p). (p. 398)
Normalizing x. Suppose we define the log-optimal portfolio b* to be the portfolio maximizing the relative growth rate ∫ ln( b^t x / ((1/m) ∑_{i=1}^m xi) ) dF(x1, . . . , xm). The virtue of the normalization (1/m) ∑ Xi, which can be viewed as the wealth associated with a uniform portfolio, is that the relative growth rate is finite even when the growth rate ∫ ln(b^t x) dF(x) is not. This matters, for example, if X has a St. Petersburg-like distribution. Thus the log-optimal portfolio b* is defined for all distributions F, even those with infinite growth rates W*(F). (p. 400)
Universal portfolio. We examine the first n = 2 steps of the implementation of the universal portfolio for m = 2 stocks. Let the stock vectors for days 1 and 2 be x1 = (1, 1/2) and x2 = (1, 2). Let b = (b, 1 − b) denote a portfolio. (a) Graph S2(b) = ∏_{i=1}^2 b^t xi, 0 ≤ b ≤ 1. (b) Calculate S2* = max_b S2(b). (c) Argue that log S2(b) is concave in b. (d) Calculate the (universal) wealth Ŝ2 = ∫_0^1 S2(b) db. (e) Calculate the universal portfolio at times n = 1 and n = 2: ... (p. 402)
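
A numerical sketch for parts (b) and (d) of the universal portfolio problem above. The grid and the Riemann-average integration over [0, 1] are implementation choices; the exact values they approximate are S2* = 9/8 and Ŝ2 = 13/12.

    import numpy as np

    x1, x2 = np.array([1.0, 0.5]), np.array([1.0, 2.0])       # price-relative vectors for days 1 and 2
    b = np.linspace(0.0, 1.0, 100_001)                         # portfolio (b, 1 - b)
    S2 = (b * x1[0] + (1 - b) * x1[1]) * (b * x2[0] + (1 - b) * x2[1])   # S2(b)
    print(S2.max())       # max_b S2(b), approximately 9/8, attained near b = 1/2
    print(S2.mean())      # approximates integral_0^1 S2(b) db, about 13/12
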
Growth optimal. Let X1, X2 ≥ 0 be price relatives of two independent stocks. Suppose EX1 > EX2. Do you always want some of X1 in a growth rate optimal portfolio S(b) = b X1 + (1 − b) X2? Prove or provide a counterexample. (p. 403)
Cost of universality. In the discussion of finite-horizon universal portfolios, it was shown that the loss factor due to universality is 1/Vn = ∑_{k=0}^n ... (p. 404)
Convex families. This problem generalizes Theorem 16.2.2. We say that S is a convex family of random variables if S1, S2 ∈ S implies λS1 + (1 − λ)S2 ∈ S. Let S be a closed convex family of random variables. Show that there is a random variable S* ∈ S such that E ln(S/S*) ≤ ... (p. 404)
Chapter 17 problems
1. Sum of positive definite matrices. For any two positive definite matrices K1 and K2, show that |K1 + K2| ≥ |K1|. (p. 407)
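
A quick random sanity check of this determinant inequality (the dimension and number of trials below are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(4)
    for _ in range(5):
        A = rng.standard_normal((4, 4)); K1 = A @ A.T + 1e-3 * np.eye(4)   # random positive definite K1
        B = rng.standard_normal((4, 4)); K2 = B @ B.T + 1e-3 * np.eye(4)   # random positive definite K2
        print(np.linalg.det(K1 + K2) >= np.linalg.det(K1))                 # expect True every time
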
2. Fan's inequality [6] for ratios of determinants. For all 1 ≤ p ≤ n, for a positive definite K = K(1, 2, . . . , n), show that ... (p. 407)
3. Convexity of determinant ratios. For positive definite matrices K, K0, show that ln(|K + K0| / |K|) is convex in K. (p. 407)
4. Data processing inequality. Let random variables X1, X2, X3, and X4 form a Markov chain X1 → X2 → X3 → X4. Show that I(X1; X3) + I(X2; X4) ≤ I(X1; X4) + I(X2; X3). (p. 409)
5. Markov chains. Let random variables X, Y, Z, and W form a Markov chain so that X → Y → (Z, W), i.e., p(x, y, z, w) = p(x) p(y | x) p(z, w | y). Show that I(X; Z) + I(X; W) ≤ I(X; Y) + I(Z; W). (p. 409)
