Coding and Cryptography
Dr T.A. Fisher
Michaelmas 2005
These notes are based on a course of lectures given by Dr T.A. Fisher in Part II of the
Mathematical Tripos at the University of Cambridge in the academic year 2005–2006.
These notes have not been checked by Dr T.A. Fisher and should not be regarded as
official notes for the course. In particular, the responsibility for any errors is mine;
please email Sebastian Pancratz (sfp25) with any comments or corrections.
Contents
1 Noiseless Coding 3
2 Error-control Codes 11
3 Shannon's Theorems 19
4 Linear and Cyclic Codes
5 Cryptography 45
Introduction to Communication Channels
Source → Encoder ⇝ channel ⇝ Decoder → Receiver
                 (errors, noise)
Examples include telegraphs, mobile phones, fax machines, modems, compact discs, or
a space probe sending back a picture.
Basic Problem
Given a source and a channel (modelled probabilistically), we must design an encoder
and decoder to transmit messages economically (noiseless coding, data compression) and
reliably (noisy coding).
Example (Noiseless coding). In Morse code, common letters are given shorter code-
words, e.g. A = ·−, E = ·, Q = −−·−, and Z = −−··.
Example (Noisy coding). Every book has an ISBN a₁a₂ . . . a₁₀ where aᵢ ∈ {0, 1, . . . , 9}
for 1 ≤ i ≤ 9 and a₁₀ ∈ {0, 1, . . . , 9, X}, with Σ_{j=1}^{10} j·aⱼ ≡ 0 (mod 11). This allows
detection of errors such as a single incorrect digit, or the transposition of two adjacent digits.
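The ISBN-10 check is easy to carry out directly. The following sketch (the function name is my own, not part of the course) verifies the weighted sum modulo 11 and illustrates that a single-digit error or a transposition is detected.

```python
def isbn10_check(digits):
    """Return True if the ISBN-10 string is valid (last digit may be 'X' = 10)."""
    values = [10 if d == 'X' else int(d) for d in digits]
    # sum of j * a_j for j = 1..10 must be divisible by 11
    return sum(j * a for j, a in enumerate(values, start=1)) % 11 == 0
```

Since 11 is prime, changing one digit changes the weighted sum by a non-zero multiple of j ≢ 0 (mod 11), and swapping two digits changes it by a non-zero multiple of their position difference; both are detected.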
Overview
Definition. A communication channel accepts symbols from an alphabet Σ₁ = {a₁, . . . , aᵣ}
and it outputs symbols from an alphabet Σ₂ = {b₁, . . . , bₛ}. The channel is modelled by
the probabilities P(y₁y₂ . . . yₙ received | x₁x₂ . . . xₙ sent).

Definition. A discrete memoryless channel (DMC) is one for which pᵢⱼ = P(bⱼ received | aᵢ sent)
is the same for each channel use and independent of all past and future uses of the channel.
The channel matrix is P = (pᵢⱼ), an r × s stochastic matrix.
Definition. The binary erasure channel has Σ₁ = {0, 1} and Σ₂ = {0, 1, ?}. The
channel matrix is
( 1−p   0    p )
(  0   1−p   p ).
We model n uses of a channel by the nth extension with input alphabet Σ₁ⁿ and output
alphabet Σ₂ⁿ.
A code C of length n is a function M → Σ₁ⁿ, where M is the set of possible messages.
Implicitly, we also have a decoding rule Σ₂ⁿ → M.
We say the channel can transmit reliably at rate R if there exist codes C₁, C₂, . . . with Cₙ of length n such that
lim_{n→∞} ρ(Cₙ) = R,
lim_{n→∞} e(Cₙ) = 0,
where ρ(Cₙ) = (1/n) log|Cₙ| is the information rate and e(Cₙ) the error rate.
Denition. The capacity of a channel is the supremum over all reliable transmission
rates.
Chapter 1

Noiseless Coding
Example. Let Σ₁ = {1, 2, 3, 4}, Σ₂ = {0, 1} and f(1) = 0, f(2) = 1, f(3) = 00, f(4) =
01. Then f(114) = 0001 = f(312). Here f is injective but not decipherable.
Our aim is to construct decipherable codes with short word lengths. Assuming f is
injective, the following codes are always decipherable:
(i) block codes, where all codewords have the same length;
(ii) comma codes, which reserve one letter of Σ₂ to end each codeword;
(iii) prefix-free codes, where no codeword is a prefix of another.
Note 2. Note that (i) and (ii) are special cases of (iii). Prefix-free codes are sometimes
called instantaneous codes or self-punctuating codes.
Theorem 1.1 (Kraft's inequality). Let a = |Σ₂|. A prefix-free code f : Σ₁ → Σ₂* with word
lengths s₁, . . . , s_m exists if and only if
Σ_{i=1}^m a^{−sᵢ} ≤ 1. (†)
Proof. Rewrite (†) as
Σ_{l=1}^s n_l a^{−l} ≤ 1 (‡)
where n_l is the number of codewords of length l and s = max_{1≤i≤m} sᵢ.
If f : Σ₁ → Σ₂* is prefix-free then
n₁a^{s−1} + n₂a^{s−2} + ⋯ + n_{s−1}a + n_s ≤ aˢ
since the LHS is the number of strings of length s in Σ₂* with some codeword of f as a
prefix and the RHS is the number of strings of length s.
For the converse, given n₁, . . . , n_s satisfying (‡), we need to construct a prefix-free code
f with n_l codewords of length l, for all l ≤ s. We proceed by induction on s. The case
s = 1 is clear. (Here, (‡) gives n₁ ≤ a, so we can choose a code.) By the induction
hypothesis, there exists a prefix-free code g with n_l codewords of length l for all l ≤ s − 1.
(‡) implies
n₁a^{s−1} + n₂a^{s−2} + ⋯ + n_{s−1}a + n_s ≤ aˢ
where the first s − 1 terms on the LHS sum to the number of strings of length s with
some codeword of g as a prefix and the RHS is the number of strings of length s. Hence
we can add at least n_s new codewords of length s to g and maintain the prefix-free
property.
Remark. The proof is constructive, i.e. just choose codewords in order of increasing
length, ensuring that no previous codeword is a prex.
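The constructive proof can be mirrored in code. The sketch below (my own illustration, using the canonical assignment of codewords in order of increasing length) checks the Kraft sum and builds a binary prefix-free code with prescribed word lengths.

```python
from fractions import Fraction

def kraft_sum(lengths, a=2):
    # Kraft sum: a prefix-free code with these word lengths exists iff this is <= 1
    return sum(Fraction(1, a**s) for s in lengths)

def prefix_free_code(lengths):
    """Build a binary prefix-free code with the given word lengths, choosing
    codewords in order of increasing length (canonical construction)."""
    assert kraft_sum(lengths) <= 1
    code = {}
    c, prev = 0, 0
    for idx, s in sorted(enumerate(lengths), key=lambda t: t[1]):
        c <<= (s - prev)              # extend to the new length
        code[idx] = format(c, '0{}b'.format(s))
        c += 1                        # next codeword at this length
        prev = s
    return [code[i] for i in range(len(lengths))]
```

For the word lengths 1, 2, 3, 3 this yields the familiar code 0, 10, 110, 111.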
Theorem 1.2 (McMillan's inequality). Any decipherable code f : Σ₁ → Σ₂* with word
lengths s₁, . . . , s_m satisfies Σ_{i=1}^m a^{−sᵢ} ≤ 1.

Proof. For r ∈ N,
(Σ_{i=1}^m a^{−sᵢ})ʳ = Σ_{l=1}^{rs} b_l a^{−l}
where s = max_{1≤i≤m} sᵢ and b_l is the number of ways of writing a string of length l
as a concatenation of r codewords. Since f is decipherable, b_l ≤ aˡ, so
(Σ_{i=1}^m a^{−sᵢ})ʳ ≤ Σ_{l=1}^{rs} aˡ a^{−l} = rs,
hence
Σ_{i=1}^m a^{−sᵢ} ≤ (rs)^{1/r} → 1 as r → ∞.
Therefore, Σ_{i=1}^m a^{−sᵢ} ≤ 1.
Corollary 1.3. A decipherable code with prescribed word lengths exists if and only if
a prefix-free code with the same word lengths exists.
Example. [Figure: a binary tree identifying x₁, x₂, x₃, x₄, with probabilities
1/2, 1/4, 1/8, 1/8, at depths 1, 2, 3, 3.]
Hence H = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 7/4.
Definition. The entropy of X is H(X) = −Σ_{i=1}^n pᵢ log pᵢ = H(p₁, . . . , pₙ), where in
this course log = log₂.
[Figure: graph of H(p) = −p log p − (1 − p) log(1 − p) for 0 ≤ p ≤ 1.]
The entropy is greatest for p = 1/2, i.e. a fair coin.
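As a quick numerical check (my own sketch, not part of the lectures), the entropy of the distribution in the tree example above is exactly 7/4, and the fair coin attains the maximum of one bit.

```python
from math import log2

def entropy(ps):
    # H(p_1, ..., p_n) = -sum p_i log2 p_i; terms with p_i = 0 contribute 0
    return -sum(p * log2(p) for p in ps if p > 0)
```

For example, entropy([1/2, 1/4, 1/8, 1/8]) gives 1.75 and entropy([1/2, 1/2]) gives 1.0.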
Lemma 1.4 (Gibbs' inequality). Let (p₁, . . . , pₙ) and (q₁, . . . , qₙ) be probability distri-
butions. Then
−Σ_{i=1}^n pᵢ log pᵢ ≤ −Σ_{i=1}^n pᵢ log qᵢ,
with equality if and only if pᵢ = qᵢ for all i.

[Figure: the graphs of ln x and x − 1, touching at x = 1.]

Proof. Since log x = ln x / ln 2, we may work with natural logarithms. We have
ln x ≤ x − 1 for all x > 0, (∗)
with equality if and only if x = 1. Let I = {i : pᵢ ≠ 0}. Applying (∗) with x = qᵢ/pᵢ,
ln(qᵢ/pᵢ) ≤ qᵢ/pᵢ − 1 for all i ∈ I.
Therefore,
Σ_{i∈I} pᵢ ln(qᵢ/pᵢ) ≤ Σ_{i∈I} qᵢ − Σ_{i∈I} pᵢ = Σ_{i∈I} qᵢ − 1 ≤ 0,
so
−Σ_{i∈I} pᵢ ln pᵢ ≤ −Σ_{i∈I} pᵢ ln qᵢ
and hence
−Σ_{i=1}^n pᵢ ln pᵢ ≤ −Σ_{i=1}^n pᵢ ln qᵢ.
If equality holds then Σ_{i∈I} qᵢ = 1 and qᵢ/pᵢ = 1 for all i ∈ I. Therefore, pᵢ = qᵢ for all
1 ≤ i ≤ n.
Corollary 1.5. H(X) ≤ log n, with equality if and only if p₁ = ⋯ = pₙ = 1/n.

Proof. Take q₁ = ⋯ = qₙ = 1/n in Gibbs' inequality.
Definition. A code f : Σ₁ → Σ₂* is optimal if it is a decipherable code with the minimum
possible expected word length Σ_{i=1}^m pᵢsᵢ.
Theorem 1.6 (Noiseless Coding Theorem). The expected word length E(S) of an op-
timal code satisfies
H(X)/log a ≤ E(S) < H(X)/log a + 1.
Proof. We first prove the lower bound. Take f : Σ₁ → Σ₂* decipherable with word lengths
s₁, . . . , s_m. Set qᵢ = a^{−sᵢ}/c where c = Σ_{i=1}^m a^{−sᵢ}. Note Σ_{i=1}^m qᵢ = 1. By Gibbs' inequality,
H(X) ≤ −Σ_{i=1}^m pᵢ log qᵢ
     = Σ_{i=1}^m pᵢ (sᵢ log a + log c)
     = (Σ_{i=1}^m pᵢsᵢ) log a + log c.
By McMillan's inequality c ≤ 1, so log c ≤ 0 and H(X) ≤ E(S) log a, i.e. E(S) ≥ H(X)/log a.

For the upper bound, set sᵢ = ⌈−log_a pᵢ⌉. Then
−log_a pᵢ ≤ sᵢ,
pᵢ ≥ a^{−sᵢ}.
Now Σ_{i=1}^m a^{−sᵢ} ≤ Σ_{i=1}^m pᵢ = 1. By Theorem 1.1, there exists a prefix-free code f with
word lengths s₁, . . . , s_m. It has expected word length
E(S) = Σ_{i=1}^m pᵢsᵢ
     < Σ_{i=1}^m pᵢ(−log_a pᵢ + 1)
     = H(X)/log a + 1.
Shannon–Fano Coding
This follows the above proof. Given p₁, . . . , p_m, set sᵢ = ⌈−log_a pᵢ⌉. Construct a prefix-
free code with word lengths s₁, . . . , s_m by choosing codewords in order of increasing
length, ensuring that previous codewords are not prefixes.
Example. Let a = 2, m = 5.

i   pᵢ    ⌈−log₂ pᵢ⌉   codeword
1   0.4       2          00
2   0.2       3          010
3   0.2       3          011
4   0.1       4          1000
5   0.1       4          1001

We have E(S) = Σ pᵢsᵢ = 2.8. The entropy is H = 2.121928 . . . , so here
H/log a = 2.121928 . . . .
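The Shannon–Fano word lengths are immediate to compute; the sketch below (my own illustration) reproduces the table above.

```python
from math import ceil, log2

def shannon_fano_lengths(ps):
    """Word lengths s_i = ceil(-log2 p_i) used in Shannon-Fano coding (a = 2)."""
    return [ceil(-log2(p)) for p in ps]

def expected_length(ps, lengths):
    # E(S) = sum p_i s_i
    return sum(p * s for p, s in zip(ps, lengths))
```

For p = (0.4, 0.2, 0.2, 0.1, 0.1) this gives lengths (2, 3, 3, 4, 4) and E(S) = 2.8, as in the example.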
Huffman Coding
For simplicity, let a = 2. Without loss of generality, p₁ ≥ ⋯ ≥ p_m. The definition
is recursive. If m = 2, take codewords 0 and 1. If m > 2, first take a Huffman code
for messages μ₁, . . . , μ_{m−2}, ν with probabilities p₁, . . . , p_{m−2}, p_{m−1} + p_m. Then append
0 (resp. 1) to the codeword for ν to give a codeword for μ_{m−1} (resp. μ_m).
Example. For the source above (p = 0.4, 0.2, 0.2, 0.1, 0.1), a Huffman code has expected
word length E(S) = Σ pᵢsᵢ = 2.2, compared with 2.8 for Shannon–Fano coding.
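The recursive merging can be implemented with a priority queue. The following sketch (my own; the heap-based formulation is equivalent to the recursion above) tracks only the codeword lengths: every merge adds one bit to each symbol under the merged node.

```python
import heapq
from itertools import count

def huffman_lengths(ps):
    """Binary Huffman coding: repeatedly merge the two least probable symbols;
    returns the codeword length assigned to each symbol."""
    tiebreak = count()                     # avoids comparing lists on ties
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(ps)]
    heapq.heapify(heap)
    lengths = [0] * len(ps)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for i in syms1 + syms2:            # each symbol gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths
```

Tie-breaking means the individual lengths are not unique, but the expected word length is: for p = (0.4, 0.2, 0.2, 0.1, 0.1) it comes out as 2.2.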
Theorem 1.7. Huffman codes are optimal.
Proof. Induction on m; the case m = 2 is clear. Let f_m be a Huffman code for X_m and
let f′_m be an optimal code for X_m; without loss of generality f′_m is prefix-free. By
Lemma 1.8, without loss of generality the last two codewords of f′_m have maximal length
and differ only in the last digit. Say f′_m(μ_{m−1}) = y0, f′_m(μ_m) = y1 for some y ∈ {0, 1}*.
Let f′_{m−1} be the prefix-free code for X_{m−1} given by
f′_{m−1}(μᵢ) = f′_m(μᵢ) for 1 ≤ i ≤ m − 2,
f′_{m−1}(ν) = y.
By construction of the Huffman code, and likewise for f′,
E(S_m) = E(S_{m−1}) + p_{m−1} + p_m, (∗)
E(S′_m) = E(S′_{m−1}) + p_{m−1} + p_m.
By the induction hypothesis E(S_{m−1}) ≤ E(S′_{m−1}), so E(S_m) ≤ E(S′_m), i.e. f_m is optimal.
Lemma 1.8. Let f be an optimal prefix-free code with word lengths s₁, . . . , s_m for
probabilities p₁, . . . , p_m. Then (i) if pᵢ > pⱼ then sᵢ ≤ sⱼ; (ii) among the codewords of
maximal length, two differ only in the last digit.

Proof. If not, we modify f by (i) swapping the i-th and j-th codewords, or (ii) deleting
the last letter of each codeword of maximal length. The modified code is still prefix-free
but has shorter expected word length, contradicting the optimality of f.
Joint Entropy
Definition. Let X, Y be random variables taking values in Σ₁, Σ₂. The joint entropy is
H(X, Y) = −Σ_{x∈Σ₁} Σ_{y∈Σ₂} P(X = x, Y = y) log P(X = x, Y = y).

Lemma 1.9. H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

Proof. Write
pᵢⱼ = P(X = xᵢ, Y = yⱼ),
pᵢ = P(X = xᵢ),
qⱼ = P(Y = yⱼ).
Apply Gibbs' inequality with probability distributions {pᵢⱼ} and {pᵢqⱼ} to obtain
−Σ_{i,j} pᵢⱼ log pᵢⱼ ≤ −Σ_{i,j} pᵢⱼ log(pᵢqⱼ)
= −Σᵢ (Σⱼ pᵢⱼ) log pᵢ − Σⱼ (Σᵢ pᵢⱼ) log qⱼ
= H(X) + H(Y),
with equality if and only if pᵢⱼ = pᵢqⱼ for all i, j, i.e. if and only if X, Y are independent.
Chapter 2
Error-control Codes
Definition. A binary [n, m]-code is a subset C ⊆ {0, 1}ⁿ of size m = |C|; n is the
length of the code, and elements are called codewords. We use an [n, m]-code to send one of
m messages through a BSC making n uses of the channel.
[Figure: the binary symmetric channel (BSC); each transmitted bit arrives correctly
with probability 1 − p and flipped with probability p.]
(i) The ideal observer decoding rule decodes x ∈ {0, 1}ⁿ as c ∈ C maximising
P(c sent | x received).
(ii) The maximum likelihood decoding rule decodes x ∈ {0, 1}ⁿ as c ∈ C maximising
P(x received | c sent).
(iii) The minimum distance decoding rule decodes x ∈ {0, 1}ⁿ as c ∈ C minimising the
Hamming distance d(x, c).
Lemma 2.1. If all messages are equally likely then (i) and (ii) agree.
Remark. The hypothesis of Lemma 2.1 is reasonable if we first carry out noiseless coding.
Proof. By Bayes' rule,
P(c sent | x received) = P(c sent)/P(x received) · P(x received | c sent).
If all messages are equally likely, P(c sent) is the same for every c ∈ C, so maximising
the left-hand side over c ∈ C is the same as maximising P(x received | c sent).

Lemma 2.2. If p < 1/2 then (ii) and (iii) agree.

Proof. With r = d(x, c),
P(x received | c sent) = pʳ(1 − p)^{n−r} = (1 − p)ⁿ (p/(1 − p))ʳ.
Since p < 1/2, p/(1 − p) < 1. So maximising P(x received | c sent) is the same as minimising
d(x, c).
Example. Suppose C = {000, 111}, where 000 is sent with probability α and 111 with
probability 1 − α, and 110 is received. Then
P(000 sent | 110 received) = αp²(1 − p) / (αp²(1 − p) + (1 − α)p(1 − p)²)
                           = αp / (αp + (1 − α)(1 − p)),
which equals 3/4 for suitable values, e.g. α = 9/10 and p = 1/4; then
P(111 sent | 110 received) = 1/4.
Therefore, the ideal observer decodes as 000. Maximum likelihood and minimum
distance rules both decode as 111.
From now on, we will use the minimum distance decoding rule.
Remark. (i) Minimum distance decoding may be expensive in terms of time and
storage if |C| is large.
(ii) We should specify a convention in the case of a tie, e.g. make a random choice,
request to send again, etc.
Denition. A code C is
(i) d-error detecting if changing up to d digits in each codeword can never produce
another codeword.
(ii) e-error correcting if knowing that x ∈ {0, 1}ⁿ differs from a codeword in at most
e places, we can deduce the codeword.
Example. For the simple parity check code, also known as the paper tape code, we
identify {0, 1} with F₂ and take
C = {(c₁, . . . , cₙ) ∈ F₂ⁿ : Σᵢ cᵢ = 0}.
This is an [n, 2^{n−1}]-code; it is 1-error detecting, but cannot correct errors. Its information
rate is (n − 1)/n.
We can work out the codeword of 0, . . . , 7 by asking whether it is in {4, 5, 6, 7}, {2, 3, 6, 7},
{1, 3, 5, 7} and setting the last bit to be the parity checker.
0 0000 4 1001
1 0011 5 1010
2 0101 6 1100
3 0110 7 1111
Example (Hamming's original code). Let C be the set of c ∈ F₂⁷ satisfying
c₁ + c₃ + c₅ + c₇ = 0
c₂ + c₃ + c₆ + c₇ = 0
c₄ + c₅ + c₆ + c₇ = 0.
There is an arbitrary choice of c₃, c₅, c₆, c₇, but then c₁, c₂, c₄ are forced. Hence |C| = 2⁴
and the information rate is (1/n) log m = 4/7.
Suppose we receive x ∈ F₂⁷. We form the syndrome z = (z₁, z₂, z₄) where
z₁ = x₁ + x₃ + x₅ + x₇
z₂ = x₂ + x₃ + x₆ + x₇
z₄ = x₄ + x₅ + x₆ + x₇.
If x ∈ C then z = (0, 0, 0). If d(x, c) = 1 for some c ∈ C then xᵢ and cᵢ differ for
i = z₁ + 2z₂ + 4z₄. The code is 1-error correcting.
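Syndrome decoding for this code is a few lines. The sketch below (my own, with 1-indexed positions as in the text) computes the syndrome and flips the bit whose position it spells out in binary.

```python
def syndrome(x):
    """Syndrome (z1, z2, z4) of a received word x in F_2^7."""
    z1 = (x[0] + x[2] + x[4] + x[6]) % 2
    z2 = (x[1] + x[2] + x[5] + x[6]) % 2
    z4 = (x[3] + x[4] + x[5] + x[6]) % 2
    return z1, z2, z4

def correct(x):
    """Correct at most one error: the syndrome reads off the error position."""
    z1, z2, z4 = syndrome(x)
    i = z1 + 2 * z2 + 4 * z4
    y = list(x)
    if i:
        y[i - 1] ^= 1
    return y
```

For instance, flipping position 5 of a codeword produces syndrome (1, 0, 1), i.e. i = 5, and the error is repaired.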
Remark. The Hamming distance satisfies the triangle inequality d(x, z) ≤ d(x, y) + d(y, z),
since
{1 ≤ i ≤ n : xᵢ ≠ zᵢ} ⊆ {1 ≤ i ≤ n : xᵢ ≠ yᵢ} ∪ {1 ≤ i ≤ n : yᵢ ≠ zᵢ}.

Definition. The minimum distance of a code is the minimum value of d(c₁, c₂) for c₁, c₂
distinct codewords.

If x ∈ B(c₁, e) ∩ B(c₂, e) then d(c₁, c₂) ≤ 2e by the triangle inequality; hence a code of
minimum distance d is e-error correcting whenever d ≥ 2e + 1.
Bounds on Codes
Notation. Let V(n, r) = |B(x, r)| = Σ_{i=0}^r C(n, i), independently of x ∈ F₂ⁿ.

Lemma (Hamming's bound). An e-error correcting code C of length n satisfies
|C| ≤ 2ⁿ/V(n, e).
Indeed, the balls B(c, e) for c ∈ C are disjoint, so
Σ_{c∈C} |B(c, e)| ≤ |F₂ⁿ| = 2ⁿ,
|C| V(n, e) ≤ 2ⁿ,
|C| ≤ 2ⁿ/V(n, e).

Definition. An e-error correcting code C of length n is perfect if |C| = 2ⁿ/V(n, e).
Equivalently, for all x ∈ F₂ⁿ there exists a unique c ∈ C such that d(x, c) ≤ e. Also
equivalently, F₂ⁿ = ⋃_{c∈C} B(c, e), i.e. any e + 1 errors will make you decode wrongly.

Example. For Hamming's code above,
2ⁿ/V(n, e) = 2⁷/V(7, 1) = 2⁷/(1 + 7) = 2⁴ = |C|,
so the code is perfect.

Remark. If 2ⁿ/V(n, e) ∉ Z then there does not exist a perfect e-error correcting code of
length n. The converse is false (see Example Sheet 2 for the case n = 90, e = 2).
Here A(n, d) denotes the maximum size of a code of length n with minimum distance d.
In the last case, we have A(n, 2) ≥ 2^{n−1} by the simple parity check code. Suppose C
has length n and minimum distance 2. Let C̄ be obtained from C by switching the last
digit of every codeword. Then C ∩ C̄ = ∅, so 2|C| = |C ∪ C̄| ≤ |F₂ⁿ| = 2ⁿ, giving
A(n, 2) = 2^{n−1}.
Lemma. A(n, d + 1) ≤ A(n, d).

Proof. Let m = A(n, d + 1) and pick C with parameters [n, m, d + 1]. Let c₁, c₂ ∈ C
with d(c₁, c₂) = d + 1. Let c₁′ differ from c₁ in exactly one of the places where c₁ and c₂
differ; then d(c₁′, c₂) = d. If c ∈ C \ {c₁} then
d(c₁′, c) ≥ d(c₁, c) − d(c₁, c₁′) ≥ (d + 1) − 1 = d,
so (C \ {c₁}) ∪ {c₁′} is an [n, m, d]-code and A(n, d) ≥ m = A(n, d + 1).
Proposition 2.8.
2ⁿ/V(n, d − 1) ≤ A(n, d) ≤ 2ⁿ/V(n, ⌊(d − 1)/2⌋).
The lower bound is known as the Gilbert–Shannon–Varshamov (GSV) bound or sphere
covering bound. The upper bound is known as Hamming's bound or sphere packing
bound.
Proof of the GSV bound. Let m = A(n, d) and let C be an [n, m, d]-code. Then there does
not exist x ∈ F₂ⁿ with d(x, c) ≥ d for all c ∈ C, otherwise we could replace C by C ∪ {x} to
get an [n, m + 1, d]-code. Therefore,
F₂ⁿ = ⋃_{c∈C} B(c, d − 1),
2ⁿ ≤ Σ_{c∈C} |B(c, d − 1)| = m V(n, d − 1).
Example.
2¹⁰/V(10, 2) ≤ A(10, 3) ≤ 2¹⁰/V(10, 1),
2¹⁰/56 ≤ A(10, 3) ≤ 2¹⁰/11,
19 ≤ A(10, 3) ≤ 93.
It is known that 72 ≤ A(10, 3) ≤ 79, but the exact value is not known.
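The two bounds are a direct computation. The sketch below (my own) evaluates V(n, r), the GSV lower bound (rounded up) and the Hamming upper bound (rounded down), reproducing 19 ≤ A(10, 3) ≤ 93, and also recovers the perfect-code count 2⁴ = 16 for the [7, 2⁴] Hamming code.

```python
from math import comb

def V(n, r):
    # number of points in a Hamming ball of radius r in F_2^n
    return sum(comb(n, i) for i in range(r + 1))

def gsv_lower(n, d):
    # GSV bound: A(n, d) >= 2^n / V(n, d - 1)
    return -(-2**n // V(n, d - 1))       # ceiling division

def hamming_upper(n, d):
    # Hamming bound: A(n, d) <= 2^n / V(n, floor((d - 1) / 2))
    return 2**n // V(n, (d - 1) // 2)
```

Note gsv_lower(10, 3) = 19 and hamming_upper(10, 3) = 93.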
We study (1/n) log A(n, ⌊nδ⌋) as n → ∞ to see how large the information rate can be for a
given error rate.
Proposition 2.9. Let 0 < δ < 1/2. Then
(i) V(n, ⌊nδ⌋) ≤ 2^{nH(δ)};
(ii) (1/n) log A(n, ⌊nδ⌋) ≥ 1 − H(δ).

Proof. We first show that (i) implies (ii). By the GSV bound,
A(n, ⌊nδ⌋) ≥ 2ⁿ/V(n, ⌊nδ⌋),
so
(1/n) log A(n, ⌊nδ⌋) ≥ 1 − (1/n) log V(n, ⌊nδ⌋) ≥ 1 − H(δ).

Now we prove (i). Since H is increasing for δ ≤ 1/2, we may assume nδ ∈ Z. Then
1 = (δ + (1 − δ))ⁿ
  = Σ_{i=0}^n C(n, i) δⁱ(1 − δ)^{n−i}
  ≥ Σ_{i=0}^{nδ} C(n, i) δⁱ(1 − δ)^{n−i}
  = Σ_{i=0}^{nδ} C(n, i) (1 − δ)ⁿ (δ/(1 − δ))ⁱ
  ≥ Σ_{i=0}^{nδ} C(n, i) (1 − δ)ⁿ (δ/(1 − δ))^{nδ}
  = δ^{nδ}(1 − δ)^{n(1−δ)} V(n, nδ)
  = 2^{−nH(δ)} V(n, nδ),
where the second inequality uses δ/(1 − δ) < 1 and i ≤ nδ. Hence V(n, nδ) ≤ 2^{nH(δ)}.
Lemma 2.10.
lim_{n→∞} (1/n) log V(n, ⌊nδ⌋) = H(δ).
Proof. Without loss of generality assume 0 < δ ≤ 1/2. Let 0 ≤ r ≤ n/2 and recall V(n, r) =
Σ_{i=0}^r C(n, i). Therefore,
C(n, r) ≤ V(n, r) ≤ (r + 1) C(n, r). (∗)
Stirling's formula states
ln n! = n ln n − n + O(log n),
so
ln C(n, r) = (n ln n − n) − (r ln r − r) − ((n − r) ln(n − r) − (n − r)) + O(log n),
log C(n, r) = −r log(r/n) − (n − r) log((n − r)/n) + O(log n)
            = nH(r/n) + O(log n).
By (∗),
H(r/n) + O((log n)/n) ≤ (1/n) log V(n, r) ≤ H(r/n) + O((log n)/n),
so taking r = ⌊nδ⌋,
lim_{n→∞} (1/n) log V(n, ⌊nδ⌋) = H(δ).
If δ ≥ 1/2, we can use the symmetry of the binomial coefficients and the entropy to swap
δ and 1 − δ.
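The convergence in Lemma 2.10 can be observed numerically. The sketch below (my own check; the tolerance 0.01 reflects the O((log n)/n) error term at n = 2000) compares (1/n) log V(n, ⌊nδ⌋) with H(δ).

```python
from math import comb, log2, floor

def H(p):
    # binary entropy function
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def normalised_log_V(n, delta):
    r = floor(n * delta)
    V = sum(comb(n, i) for i in range(r + 1))
    return log2(V) / n
```

By Proposition 2.9 (i) the quantity never exceeds H(δ), and by Lemma 2.10 it approaches it from below.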
Remark. Adding an overall parity check bit to a code C of length n gives the code
C⁺ = {(c₁, . . . , cₙ, Σ_{i=1}^n cᵢ) : (c₁, . . . , cₙ) ∈ C}
of length n + 1.
Chapter 3

Shannon's Theorems
Shannon's First Coding Theorem computes the information rate of certain sources, in-
cluding Bernoulli sources.
Consider a probability space (Ω, F, P). Recall that a random variable X is a function
defined on Ω with some range, e.g. R, Rⁿ, or Σ. We have a probability mass function
p_X : Σ → [0, 1],  x ↦ P(X = x).
We consider the random variable
p(X) = p_X ∘ X : Ω → [0, 1],  ω ↦ P(X = X(ω)).
Lemma 3.1. The information rate of a Bernoulli source X₁, X₂, . . . is at most the ex-
pected word length of an optimal code f : Σ → {0, 1}* for X₁.
Proof. Let Sᵢ be the length of the codeword for Xᵢ under f, fix ε > 0, and let Aₙ be the set
of strings (x₁, . . . , xₙ) whose total encoded length is less than n(E S₁ + ε). Then
P((X₁, . . . , Xₙ) ∈ Aₙ) = P(Σ_{i=1}^n Sᵢ < n(E S₁ + ε))
                        ≥ P(|(1/n) Σ_{i=1}^n Sᵢ − E S₁| < ε)
                        → 1 as n → ∞
by the weak law of large numbers, since the Sᵢ are i.i.d. Hence the source is reliably
encodable at rate E S₁ + ε for every ε > 0.
Corollary 3.2. From Lemma 3.1 and the Noiseless Coding Theorem, a Bernoulli source
X1 , X 2 , . . . has information rate less than H(X1 ) + 1.
The bound of Corollary 3.2 can be sharpened by blocking. Group the source into blocks of
length N:
X₁, . . . , X_N | X_{N+1}, . . . , X_{2N} | ⋯
      Y₁                Y₂
Then Y₁, Y₂, . . . is again a Bernoulli source, and applying Corollary 3.2 to it gives, for the
information rate H of the original source,
N H ≤ H(Y₁) + 1
    = H(X₁, . . . , X_N) + 1
    = Σ_{i=1}^N H(Xᵢ) + 1
    = N H(X₁) + 1,
so
H < H(X₁) + 1/N.
But N ≥ 1 is arbitrary, so H ≤ H(X₁).
Definition. A source X₁, X₂, . . . satisfies the asymptotic equipartition property (AEP)
with constant H if
−(1/n) log p(X₁, . . . , Xₙ) → H in probability as n → ∞.
Example. We toss a biased coin, P(Heads) = 2/3, P(Tails) = 1/3, 300 times. Typically
we get about 200 heads and 100 tails. Each such sequence occurs with probability
approximately (2/3)²⁰⁰(1/3)¹⁰⁰.
Lemma 3.4. A source X₁, X₂, . . . satisfying the AEP with constant H has information
rate H. Here, for ε > 0, the typical sets are
Tₙ = {(x₁, . . . , xₙ) ∈ Σⁿ : 2^{−n(H+ε)} ≤ p(x₁, . . . , xₙ) ≤ 2^{−n(H−ε)}},
and the AEP says precisely that P((X₁, . . . , Xₙ) ∈ Tₙ) → 1 as n → ∞.

Proof. Let ε > 0 and let Tₙ ⊆ Σⁿ be typical sets. Then for all (x₁, . . . , xₙ) ∈ Tₙ,
p(x₁, . . . , xₙ) ≥ 2^{−n(H+ε)},
so 1 ≥ |Tₙ| 2^{−n(H+ε)} and hence
(1/n) log|Tₙ| ≤ H + ε.
Taking Aₙ = Tₙ shows that the source is reliably encodable at rate H + ε.

Conversely, if H = 0 we are done; otherwise pick 0 < ε < H/2. We suppose for a
contradiction that the source is reliably encodable at rate H − 2ε, say with sets Aₙ ⊆ Σⁿ.
Let Tₙ ⊆ Σⁿ be typical sets. Then for all (x₁, . . . , xₙ) ∈ Tₙ,
p(x₁, . . . , xₙ) ≤ 2^{−n(H−ε)},
so
P(Aₙ ∩ Tₙ) ≤ 2^{−n(H−ε)} |Aₙ|,
(1/n) log P(Aₙ ∩ Tₙ) ≤ −(H − ε) + (1/n) log|Aₙ| ≤ −(H − ε) + (H − 2ε) = −ε.
Hence
log P(Aₙ ∩ Tₙ) → −∞ as n → ∞,
P(Aₙ ∩ Tₙ) → 0 as n → ∞.
But P(Aₙ ∩ Tₙ) ≥ P(Aₙ) + P(Tₙ) − 1 → 1, a contradiction.
Entropy as an Expectation
Note 6. For the entropy H, we have H(X) = E[−log p(X)]. For example, if X, Y are independent,
p(X, Y) = p(X)p(Y),
−log p(X, Y) = −log p(X) − log p(Y),
H(X, Y) = H(X) + H(Y),
recovering Lemma 1.9.
Proof. We have
−(1/n) log p(X₁, . . . , Xₙ) = −(1/n) Σ_{i=1}^n log p(Xᵢ) → E[−log p(X₁)] = H(X₁)
in probability, by the WLLN, using that X₁, X₂, . . . are independent identically distributed
random variables and hence so are −log p(X₁), −log p(X₂), . . . .
Carefully writing out the definition of convergence in probability shows that the AEP
holds with constant H(X₁). (This is left as an exercise.) We conclude using Shannon's
First Coding Theorem.
Remark. Many sources, which are not necessarily Bernoulli, satisfy the AEP. Under
suitable hypotheses, the sequence (1/n) H(X₁, . . . , Xₙ) is decreasing and the AEP is satisfied
with constant
H = lim_{n→∞} (1/n) H(X₁, . . . , Xₙ).
For English text, empirical estimates give
H(X₁) ≈ 4.03,
(1/2) H(X₁, X₂) ≈ 3.32,
(1/3) H(X₁, X₂, X₃) ≈ 3.10.
It is generally believed that English has entropy H a bit bigger than 1, so about 75%
redundancy (as 1 − H/log|Σ| ≈ 1 − 1/4 = 3/4).
Definition. Consider a communication channel with input alphabet Σ₁ and output alpha-
bet Σ₂. A code of length n is a subset C ⊆ Σ₁ⁿ. The error rate is
e(C) = max_{c∈C} P(error | c sent),
and the information rate is ρ(C) = (1/n) log|C|.

Definition. The capacity of the channel is the supremum of all reliable transmission
rates.
Proposition. A BSC with error probability p < 1/4 has non-zero capacity.

Proof. The idea is to use the GSV bound. Pick δ with 2p < δ < 1/2. We will show reliable
transmission at rate R = 1 − H(δ) > 0. Let Cₙ be a code of length n and minimum
distance ⌊nδ⌋ of maximal size. Then by the GSV bound and Proposition 2.9,
|Cₙ| ≥ 2ⁿ/V(n, ⌊nδ⌋) ≥ 2^{n(1−H(δ))} = 2^{nR}.
Using minimum distance decoding,
e(Cₙ) ≤ P(BSC makes more than (⌊nδ⌋ − 1)/2 errors).
Pick ε > 0 with p + ε < δ/2. For n sufficiently large,
(⌊nδ⌋ − 1)/2 > n(p + ε),
so
e(Cₙ) ≤ P(BSC makes more than n(p + ε) errors)
      → 0 as n → ∞
by Lemma 3.8.
Lemma 3.8. Let ε > 0. A BSC with error probability p is used to transmit n digits.
Then
lim_{n→∞} P(BSC makes at least n(p + ε) errors) = 0.

Proof. Let
Uᵢ = 1 if the i-th digit is mistransmitted, 0 otherwise.
Then U₁, . . . , Uₙ are independent with
P(Uᵢ = 1) = p,
P(Uᵢ = 0) = 1 − p,
so
P(BSC makes at least n(p + ε) errors) ≤ P(|(1/n) Σ_{i=1}^n Uᵢ − p| ≥ ε) → 0
as n → ∞ by the WLLN.
Conditional Entropy
Let X, Y be random variables taking values in alphabets Σ₁, Σ₂.
Definition. We define
H(X | Y = y) = −Σ_{x∈Σ₁} P(X = x | Y = y) log P(X = x | Y = y),
H(X | Y) = Σ_{y∈Σ₂} P(Y = y) H(X | Y = y).

Lemma 3.9.
H(X, Y) = H(X | Y) + H(Y).

Proof.
H(X | Y) = −Σ_{x∈Σ₁} Σ_{y∈Σ₂} P(Y = y) P(X = x | Y = y) log P(X = x | Y = y)
= −Σ_{x∈Σ₁} Σ_{y∈Σ₂} P(X = x, Y = y) log (P(X = x, Y = y)/P(Y = y))
= −Σ_{(x,y)∈Σ₁×Σ₂} P(X = x, Y = y) log P(X = x, Y = y)
  + Σ_{y∈Σ₂} (Σ_{x∈Σ₁} P(X = x, Y = y)) log P(Y = y)
= H(X, Y) − H(Y),
since the inner sum in the second term is P(Y = y).
Corollary 3.10. H(X | Y) ≤ H(X), with equality if and only if X, Y are independent.
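The chain rule of Lemma 3.9 is easy to verify numerically on a small joint distribution. The sketch below (my own check, representing the joint distribution as a dictionary) computes H(X, Y), H(Y) and H(X | Y) directly from the definitions.

```python
from math import log2

def H_joint(p):
    # p maps pairs (x, y) to probabilities
    return -sum(q * log2(q) for q in p.values() if q > 0)

def H_marginal_Y(p):
    py = {}
    for (x, y), q in p.items():
        py[y] = py.get(y, 0) + q
    return -sum(q * log2(q) for q in py.values() if q > 0)

def H_cond_X_given_Y(p):
    # H(X | Y) computed from conditional probabilities P(X = x | Y = y)
    py = {}
    for (x, y), q in p.items():
        py[y] = py.get(y, 0) + q
    return -sum(q * log2(q / py[y]) for (x, y), q in p.items() if q > 0)
```

For the joint distribution {(0,0): 1/2, (0,1): 1/4, (1,1): 1/4} one finds H(X, Y) = 1.5, H(Y) = 1 and H(X | Y) = 0.5, confirming H(X, Y) = H(X | Y) + H(Y).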
Note 9. H(X, Y | Z) denotes the entropy of X and Y given Z , not the entropy of X
and Y | Z.
Lemma 3.11. Let X, Y, Z be random variables. Then
H(X | Y) ≤ H(X | Y, Z) + H(Z). (∗)

Lemma 3.12 (Fano's inequality). Let X, Y take values in an alphabet of size m and let
p = P(X ≠ Y). Then H(X | Y) ≤ H(p) + p log(m − 1).

Proof. Let
Z = 0 if X = Y,  Z = 1 if X ≠ Y.
Then P(Z = 0) = 1 − p, P(Z = 1) = p and so H(Z) = H(p). Now
H(X | Y = y, Z = 0) = 0,
H(X | Y = y, Z = 1) ≤ log(m − 1),
since given Z = 1 and Y = y, X takes one of at most m − 1 values. Therefore,
H(X | Y, Z) = Σ_{y,z} P(Y = y, Z = z) H(X | Y = y, Z = z)
            ≤ Σ_y P(Y = y, Z = 1) log(m − 1)
            = P(Z = 1) log(m − 1)
            = p log(m − 1).
Now by (∗),
H(X | Y) ≤ H(p) + p log(m − 1).
Definition. The mutual information of X and Y is
I(X; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X).
The information capacity of a channel is the maximum of I(X; Y) over all distributions
of the input X.

Theorem 3.13 (Shannon's Second Coding Theorem). For a DMC, the capacity equals
the information capacity.

[Figure: the information capacity 1 − H(p) of the BSC plotted as a function of p.]
Note 11. We can compute either H(Y) − H(Y | X) or H(X) − H(X | Y). Often one is easier.
Example. Consider a binary erasure channel with erasure probability p, input X and
output Y. Take P(X = 1) = α, P(X = 0) = 1 − α. Then
P(Y = 0) = (1 − α)(1 − p),  P(Y = ?) = p,  P(Y = 1) = α(1 − p).
Then
H(X | Y = 0) = 0,
H(X | Y = ?) = H(α),
H(X | Y = 1) = 0,
so H(X | Y) = pH(α) and I(X; Y) = H(X) − H(X | Y) = (1 − p)H(α). This is maximal
for α = 1/2, so the information capacity is 1 − p.

[Figure: the capacity 1 − p of the binary erasure channel plotted as a function of p.]
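The computation I(X; Y) = (1 − p)H(α) for the erasure channel can be checked numerically. The sketch below (my own) evaluates the mutual information and confirms the maximum 1 − p at α = 1/2.

```python
from math import log2

def H(p):
    # binary entropy function
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def bec_mutual_information(alpha, p):
    # I(X; Y) = H(X) - H(X | Y); X is only uncertain when Y = '?', which
    # happens with probability p, so H(X | Y) = p * H(alpha)
    return H(alpha) - p * H(alpha)
```

A grid search over α confirms the capacity: for p = 1/4 the maximum of I(X; Y) is 3/4, attained at α = 1/2.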
Lemma 3.14. The nth extension of a DMC with information capacity C has information
capacity nC.

Proof. Since the channel is memoryless,
H(Y₁, . . . , Yₙ | X₁, . . . , Xₙ) = Σ_{i=1}^n H(Yᵢ | X₁, . . . , Xₙ)
                               = Σ_{i=1}^n H(Yᵢ | Xᵢ).
Therefore,
I(X₁, . . . , Xₙ; Y₁, . . . , Yₙ) = H(Y₁, . . . , Yₙ) − H(Y₁, . . . , Yₙ | X₁, . . . , Xₙ)
= H(Y₁, . . . , Yₙ) − Σ_{i=1}^n H(Yᵢ | Xᵢ)
≤ Σ_{i=1}^n H(Yᵢ) − Σ_{i=1}^n H(Yᵢ | Xᵢ)
= Σ_{i=1}^n I(Xᵢ; Yᵢ)
≤ nC.
Taking X₁, . . . , Xₙ independent, each with the input distribution achieving C, gives equality.
Proposition 3.15. For a DMC, the capacity is at most the information capacity.

Proof. Let C be the information capacity. Suppose for a contradiction that we can
transmit reliably at some rate R > C: there are codes Cₙ of length n with
lim_{n→∞} ρ(Cₙ) = R,
lim_{n→∞} e(Cₙ) = 0.
Recall
e(Cₙ) = max_{c∈Cₙ} P(error | c sent).
Let X be uniformly distributed on Cₙ, send X through the nth extension of the channel,
receive Y, and decode to obtain Z. Then P(X ≠ Z) ≤ e(Cₙ) and, by Fano's inequality
applied to X and Z (with m = |Cₙ| = 2^{nρ(Cₙ)}),
H(X | Y) ≤ H(X | Z) ≤ 1 + e(Cₙ) log(|Cₙ| − 1) ≤ 1 + e(Cₙ) nρ(Cₙ),
using H(p) ≤ 1. Now by Lemma 3.14,
nC ≥ I(X; Y)
   = H(X) − H(X | Y)
   ≥ log|Cₙ| − (1 + e(Cₙ)nρ(Cₙ))
   = nρ(Cₙ) − e(Cₙ)nρ(Cₙ) − 1.
Rearranging,
e(Cₙ)nρ(Cₙ) ≥ n(ρ(Cₙ) − C) − 1,
e(Cₙ) ≥ (ρ(Cₙ) − C)/ρ(Cₙ) − 1/(nρ(Cₙ)) → (R − C)/R as n → ∞.
Since R > C, this contradicts e(Cₙ) → 0 as n → ∞. This shows that we cannot transmit
reliably at any rate R > C, hence the capacity is at most C.
To complete the proof of Shannon's Second Coding Theorem for a BSC with error
probability p, we must show that the capacity is at least 1 − H(p).
Proposition 3.16. Consider a BSC with error probability p. Let R < 1 − H(p). Then
there exist codes C₁, C₂, . . . with Cₙ of length n and such that
lim_{n→∞} ρ(Cₙ) = R,
lim_{n→∞} ē(Cₙ) = 0,
where ē denotes the average error probability over codewords.
Note 12. Note that Proposition 3.16 is concerned with the average error rate ē
rather than the maximum error rate e.
Proof. The idea of the proof is to pick codes at random. Without loss of generality,
assume p < 1/2. Take ε > 0 such that
p + ε < 1/2,
R < 1 − H(p + ε).
Note this is possible since H is continuous. Let m = ⌊2^{nR}⌋ and let Ω be the set of [n, m]-
codes, so |Ω| = C(2ⁿ, m). Let C be a random variable equidistributed in Ω. Say C =
{X₁, . . . , X_m} where the Xᵢ are random variables taking values in F₂ⁿ such that
P(Xᵢ = x | C = C) = 1/m if x ∈ C, 0 otherwise.
Note that
P(X₂ = x₂ | X₁ = x₁) = 1/(2ⁿ − 1) if x₁ ≠ x₂, and 0 if x₁ = x₂.
We send X = X₁ through the BSC, receive Y, and decode to obtain Z. Using minimum
distance decoding,
P(X ≠ Z) = (1/|Ω|) Σ_{C∈Ω} ē(C).
Put r = ⌊n(p + ε)⌋. If X ≠ Z then either the BSC made more than r errors, whose
probability tends to 0 by Lemma 3.8, or B(Y, r) contains a codeword other than X, and
P(B(Y, r) ∩ C ⊋ {X}) ≤ Σ_{i=2}^m P(Xᵢ ∈ B(Y, r) and X₁ ∈ B(Y, r))
≤ Σ_{i=2}^m P(Xᵢ ∈ B(Y, r) | X₁ ∈ B(Y, r))
= (m − 1)(V(n, r) − 1)/(2ⁿ − 1)
≤ m V(n, r)/2ⁿ
≤ 2^{nR} 2^{nH(p+ε)} 2^{−n}
= 2^{n[R − (1 − H(p+ε))]}
→ 0,
as n → ∞, since R < 1 − H(p + ε). We have used Proposition 2.9 to obtain the penultimate
inequality. Hence P(X ≠ Z) → 0; since this is the average of ē(C) over Ω, for each n we may
pick a code Cₙ with ē(Cₙ) at most this average. Then ρ(Cₙ) → R and ē(Cₙ) → 0 as n → ∞.
Proposition 3.17. Let R < 1 − H(p). Then there exist codes C₁, C₂, . . . with Cₙ of length n
such that lim ρ(Cₙ) = R and lim e(Cₙ) = 0.

Proof sketch. Apply Proposition 3.16 at a slightly larger rate to obtain codes C′ₙ with
ē(C′ₙ) → 0. Order the codewords of C′ₙ by P(error | c sent) and delete the worse half to
obtain Cₙ; then
e(Cₙ) ≤ 2ē(C′ₙ),
since otherwise the deleted codewords alone would force the average above ē(C′ₙ).
Then ρ(Cₙ) → R and e(Cₙ) → 0 as n → ∞.
Proposition 3.17 says that we can transmit reliably at any rate R < 1 H(p), so the
capacity is at least 1 H(p). But by Proposition 3.15, the capacity is at most 1 H(p),
hence a BSC with error probability p has capacity 1 H(p).
Remark. The proof shows that good codes exist, but does not tell us how to construct
them.
Chapter 4

Linear and Cyclic Codes

Definition. A code C ⊆ F₂ⁿ is linear if
(i) 0 ∈ C;
(ii) whenever x, y ∈ C then x + y ∈ C.
Equivalently, C is an F₂-vector subspace of F₂ⁿ.
Definition. The rank of C is its dimension as an F₂-vector subspace. A linear code of
length n and rank k is an (n, k)-code. If the minimum distance is d, it is an (n, k, d)-code.
A linear code of rank k has |C| = 2ᵏ, so an (n, k)-code is an [n, 2ᵏ]-code. The information
rate is ρ(C) = k/n.
Example. For P ⊆ F₂ⁿ, let
C = {x ∈ F₂ⁿ : p·x = 0 for all p ∈ P}.
This is a parity check code, so it is linear. Beware that we can have C ∩ C⊥ ≠ {0}.
One shows C = (C⊥)⊥, so taking the rows of a matrix H to span C⊥, every linear code is a
parity check code:
C = {x ∈ F₂ⁿ : Hx = 0}.
Syndrome Decoding
Let C be an (n, k)-linear code. Recall that
C = {Gᵀy : y ∈ F₂ᵏ} where G is the generator matrix;
C = {x ∈ F₂ⁿ : Hx = 0} where H is the parity check matrix.
Lemma 4.5. Every (n, k)-linear code is equivalent to one with generator matrix G =
(I_k | B) for some k × (n − k) matrix B.

Proof. Using Gaussian elimination, i.e. row operations, we can transform G into row
echelon form, i.e.
G_{ij} = 0 if j < l(i),  G_{ij} = 1 if j = l(i),
for some l(1) < l(2) < ⋯ < l(k). Permuting columns replaces the code by an equivalent
code, so without loss of generality we may assume l(i) = i for all 1 ≤ i ≤ k. Further row
operations clear the entries above each leading 1, leaving G = (I_k | B).
Proof of Lemma 4.3. Without loss of generality C has generator matrix G = (I_k | B). G
has k linearly independent columns, so the linear map F₂ⁿ → F₂ᵏ, x ↦ Gx, is surjective
with kernel C⊥, so by the rank-nullity theorem we obtain rank C⊥ = n − k.
Lemma 4.6. An (n, k)-linear code with generator matrix G = (I_k | B) has parity check
matrix H = (Bᵀ | I_{n−k}); so C⊥ has generator matrix H.
Hamming Codes
Definition. For d ≥ 1, let n = 2ᵈ − 1. Let H be the d × n matrix whose columns are
the non-zero elements of F₂ᵈ. The Hamming (n, n − d)-code is the linear code with parity
check matrix H. For example, with d = 3,
H = ( 1 0 1 0 1 0 1
      0 1 1 0 0 1 1
      0 0 0 1 1 1 1 ).
Lemma 4.7. The minimum distance of the (n, n − d) Hamming code C is d(C) = 3. It
is a perfect 1-error correcting code.

Proof. The codewords of C are dependence relations between the columns of H. Any
two columns of H are linearly independent, so there are no non-zero codewords of weight
at most 2. Hence d(C) ≥ 3. Since the sum of any two distinct columns of H is another
non-zero column, there is a codeword of weight 3, so d(C) = 3 and C is 1-error correcting.
Finally,
2ⁿ/V(n, 1) = 2ⁿ/(n + 1) = 2^{n−d} = |C|,
so C is perfect.
Reed–Muller Codes
Take a set X such that |X| = n, X = {P₁, . . . , Pₙ}. There is a correspondence between
P(X) and F₂ⁿ:
P(X) ↔ {f : X → F₂} ↔ F₂ⁿ,
A ↦ 1_A,  f ↦ (f(P₁), . . . , f(Pₙ)).
Now take X = F₂ᵈ, so n = 2ᵈ. Let v₀ = 1_X and, for 1 ≤ i ≤ d, let vᵢ = 1_{Hᵢ} where
Hᵢ = {p ∈ X : pᵢ = 0}. Products of such vectors are taken pointwise: (x ∧ y)ᵢ = xᵢyᵢ.
The Reed–Muller code RM(d, r) is spanned by the products vᵢ₁ ∧ ⋯ ∧ vᵢₛ with 0 ≤ s ≤ r
(the empty product being v₀).
Example. Let d = 3, with X ordered as {000, 001, 010, 011, 100, 101, 110, 111}.
v₀            1 1 1 1 1 1 1 1
v₁            1 1 1 1 0 0 0 0
v₂            1 1 0 0 1 1 0 0
v₃            1 0 1 0 1 0 1 0
v₁ ∧ v₂       1 1 0 0 0 0 0 0
v₂ ∧ v₃       1 0 0 0 1 0 0 0
v₁ ∧ v₃       1 0 1 0 0 0 0 0
v₁ ∧ v₂ ∧ v₃  1 0 0 0 0 0 0 0
Theorem 4.8. (i) The vectors vᵢ₁ ∧ ⋯ ∧ vᵢₛ for 1 ≤ i₁ < i₂ < ⋯ < iₛ ≤ d and
0 ≤ s ≤ d are a basis for F₂ⁿ.
(ii) rank RM(d, r) = Σ_{s=0}^r C(d, s).

Proof. (i) We have listed Σ_{s=0}^d C(d, s) = (1 + 1)ᵈ = 2ᵈ = n vectors, so it suffices to check
spanning, i.e. check RM(d, d) = F₂ⁿ. Let p ∈ X and
yᵢ = vᵢ if pᵢ = 0,  yᵢ = v₀ + vᵢ if pᵢ = 1.
Then 1_{p} = y₁ ∧ ⋯ ∧ y_d. Expand this using the distributive law to show 1_{p} ∈
RM(d, d). But the 1_{p} for p ∈ X span F₂ⁿ, so the given vectors form a basis.
(ii) RM(d, r) is spanned by the vectors vᵢ₁ ∧ ⋯ ∧ vᵢₛ for 1 ≤ i₁ < ⋯ < iₛ ≤ d with
0 ≤ s ≤ r. These vectors are linearly independent by (i), so form a basis. Therefore,
rank RM(d, r) = Σ_{s=0}^r C(d, s).
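The basis vectors and the rank formula can be checked by machine. The sketch below (my own illustration) builds the vᵢ for d = 3, forms the wedge products spanning RM(d, r), and computes ranks over F₂ by Gaussian elimination on bitmasks.

```python
from itertools import combinations, product

d = 3
X = list(product([0, 1], repeat=d))    # the 2^d points p = (p_1, ..., p_d)
n = len(X)
v0 = [1] * n
v = {i: [1 if p[i - 1] == 0 else 0 for p in X] for i in range(1, d + 1)}

def wedge(*vecs):
    # pointwise product of indicator vectors
    return [min(col) for col in zip(*vecs)]

def rm_spanning_set(d, r):
    vecs = [v0]
    for s in range(1, r + 1):
        for idxs in combinations(range(1, d + 1), s):
            vecs.append(wedge(*(v[i] for i in idxs)))
    return vecs

def rank_F2(vecs):
    # Gaussian elimination over F_2, rows packed into integers
    basis = {}                         # leading bit -> reduced row
    for vec in vecs:
        r = int(''.join(map(str, vec)), 2)
        while r:
            b = r.bit_length() - 1
            if b not in basis:
                basis[b] = r
                break
            r ^= basis[b]
    return len(basis)
```

This confirms rank RM(3, r) = 1, 4, 7, 8 for r = 0, 1, 2, 3, matching Σ C(3, s).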
For codes C₁, C₂ of the same length, the bar product is C₁ | C₂ = {(x | x + y) : x ∈ C₁, y ∈ C₂}.

Theorem. (i) RM(d, r) = RM(d − 1, r) | RM(d − 1, r − 1). (ii) RM(d, r) has minimum
distance 2^{d−r}.

Proof. (i) Order X = F₂ᵈ so that
v_d = (00 . . . 0 | 11 . . . 1).
A spanning element of RM(d, r) is either a product x not involving v_d, which has the form
(x′ | x′) with x′ ∈ RM(d − 1, r), or of the form y ∧ v_d with y = (y′ | y′), y′ ∈ RM(d − 1, r − 1).
Then
z = x + y ∧ v_d
  = (x′ | x′) + (y′ | y′) ∧ (00 . . . 0 | 11 . . . 1)
  = (x′ | x′ + y′).
So z ∈ RM(d − 1, r) | RM(d − 1, r − 1); comparing ranks gives equality.
(ii) If r = 0 then RM(d, 0) is a repetition code of length n = 2ᵈ. This has minimum
distance 2^{d−0}. If r = d then RM(d, d) = F₂ⁿ with minimum distance 1 = 2^{d−d}.
For 0 < r < d, induction on d using
RM(d, r) = RM(d − 1, r) | RM(d − 1, r − 1)
and the bar product inequality d(C₁ | C₂) = min(2d(C₁), d(C₂)) gives
d(RM(d, r)) = min(2 · 2^{(d−1)−r}, 2^{(d−1)−(r−1)}) = 2^{d−r}.
For a ring R,
R[X] = {Σ_{i=0}^n aᵢXⁱ : a₀, . . . , aₙ ∈ R, n ∈ N}.
Remark. By definition, Σ_{i=0}^n aᵢXⁱ = 0 if and only if aᵢ = 0 for all i. Thus f(X) =
X² + X ∈ F₂[X] is non-zero, yet f(a) = 0 for all a ∈ F₂.
Let F be any field. The rings Z and F[X] both have a division algorithm: if a, b ∈ Z,
b ≠ 0, then there exist q, r ∈ Z such that a = qb + r and 0 ≤ r < |b|. If f, g ∈ F[X],
g ≠ 0, then there exist q, r ∈ F[X] such that f = qg + r with deg(r) < deg(g).
Definition. An ideal I ⊆ R is a subgroup under addition such that r ∈ R, x ∈ I ⇒ rx ∈ I.
The principal ideal generated by x ∈ R is (x) = {rx : r ∈ R}.
Fact. Every non-zero element of Z or F[X] can be factored into irreducibles, uniquely
up to order and multiplication by units.
If I ⊆ R is an ideal then the set of cosets R/I = {x + I : x ∈ R} is a ring, called the
quotient ring, under the natural choice of + and ×. In practice, we identify Z/nZ with
{0, 1, . . . , n − 1} and agree to reduce modulo n after each + and ×. Similarly,
F[X]/(f(X)) = {Σ_{i=0}^{n−1} aᵢXⁱ : a₀, . . . , a_{n−1} ∈ F} ≅ Fⁿ
where n = deg f, reducing after each multiplication using the division algorithm.
Cyclic Codes
Definition. A linear code C ⊆ F₂ⁿ is cyclic if (c₀, c₁, . . . , c_{n−1}) ∈ C implies
(c_{n−1}, c₀, . . . , c_{n−2}) ∈ C. Identifying F₂ⁿ with F₂[X]/(Xⁿ − 1) via
(c₀, . . . , c_{n−1}) ↔ c₀ + c₁X + ⋯ + c_{n−1}X^{n−1}, this says
(i) 0 ∈ C,
(ii) f, g ∈ C ⇒ f + g ∈ C,
(iii) f ∈ F₂[X], g ∈ C ⇒ fg ∈ C.
Equivalently, C is an ideal in F₂[X]/(Xⁿ − 1). (Multiplication by X is the cyclic shift;
closure under multiplication by arbitrary f then follows by (ii).)
Basic Problem
Our basic problem is to find all cyclic codes of length n.

Theorem. Let C be a non-zero cyclic code of length n. Then there is a unique polynomial
g(X), the generator polynomial of C, such that
(i) a polynomial p(X) represents a codeword of C if and only if g(X) | p(X);
(ii) g(X) | Xⁿ − 1.

Proof. Let g(X) ∈ F₂[X] be of least degree representing a non-zero codeword. Note
deg g < n. Since C is an ideal, every multiple of g(X) represents a codeword; this gives ⇐ in (i).
Let p(X) ∈ F₂[X] represent a codeword. By the division algorithm, p(X) = q(X)g(X) +
r(X) for some q, r ∈ F₂[X] with deg r < deg g. So r(X) = p(X) − q(X)g(X) ∈ C,
contradicting the choice of g(X) unless r(X) is a multiple of Xⁿ − 1, hence r(X) = 0 as
deg r < deg g < n; i.e. g(X) | p(X). This shows ⇒ in (i).
Taking p(X) = Xⁿ − 1 gives (ii).
Uniqueness. Suppose g₁(X), g₂(X) both satisfy (i) and (ii). Then g₁(X) | g₂(X) and
g₂(X) | g₁(X), so g₁(X) = ug₂(X) for some unit u. But units in F₂[X] are F₂ \ {0} = {1},
so g₁(X) = g₂(X).
Lemma 4.13. Let C be a cyclic code of length n with generator polynomial g(X) =
a₀ + a₁X + ⋯ + a_kXᵏ, a_k ≠ 0. Then C has basis g(X), Xg(X), . . . , X^{n−k−1}g(X). In
particular, C has rank n − k.
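As a concrete illustration (my own, not from the lectures): g(X) = 1 + X + X³ divides X⁷ − 1 over F₂, and the cyclic code it generates has rank 7 − 3 = 4, so 16 codewords; its minimum weight turns out to be 3, matching the (7, 4) Hamming code.

```python
def polymul_mod2(f, g):
    # multiply polynomials over F_2; coefficient lists, lowest degree first
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] ^= a & b
    return out

n = 7
g = [1, 1, 0, 1]                       # g(X) = 1 + X + X^3, divides X^7 - 1
k = len(g) - 1                         # deg g = 3, so rank n - k = 4
codewords = set()
for m in range(2 ** (n - k)):          # all messages of degree < n - k
    msg = [(m >> i) & 1 for i in range(n - k)]
    c = polymul_mod2(msg, g)           # deg < 7, so no reduction needed
    codewords.add(tuple((c + [0] * n)[:n]))
```

The code is closed under the cyclic shift, as the theory predicts.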
Write h(X) = b₀ + b₁X + ⋯ + b_{n−k}X^{n−k} for the check polynomial, defined by
g(X)h(X) = Xⁿ − 1. Then C has parity check matrix
H = ( b_{n−k} ⋯ b₁ b₀  0  ⋯  0 )
    (  0  b_{n−k} ⋯ b₁ b₀ ⋯ 0 )
    (        ⋱            ⋱    )
    (  0  ⋯  0  b_{n−k} ⋯ b₁ b₀ ),
the k × n matrix whose rows are shifts of the reversed check polynomial.
Indeed, the dot product of the i-th row of G and the j-th row of H is the coefficient of
X^{(n−k−i)+j} in g(X)h(X). But 1 ≤ i ≤ n − k and 1 ≤ j ≤ k, so 0 < (n − k − i) + j < n.
These coefficients of g(X)h(X) = Xⁿ − 1 are zero, hence the rows of G and H are
orthogonal. Also rank H = k = rank C⊥, so H is a parity check matrix.
Remark. The generator polynomial of the dual code is the reverse of the check polynomial.
Lemma 4.15. If n is odd then Xⁿ − 1 = f₁(X) . . . f_t(X) with f₁(X), . . . , f_t(X) distinct
irreducibles in F₂[X]. (Note this is false for n even, e.g. X² − 1 = (X − 1)² in F₂[X].)
In particular, there are 2ᵗ cyclic codes of length n.
Proof. Suppose Xⁿ − 1 has a repeated factor. Then there exists a field extension K/F₂
such that Xⁿ − 1 = (X − a)²g(X) for some a ∈ K and some g(X) ∈ K[X]. Taking
formal derivatives, nX^{n−1} = 2(X − a)g(X) + (X − a)²g′(X), so na^{n−1} = 0. Since
n is odd, n ≠ 0 in F₂, so a^{n−1} = 0, i.e. a = 0; but then 0 = aⁿ = 1, a contradiction.
Finite Fields
Theorem A. Suppose p is prime and Fₚ = Z/pZ. Let f(X) ∈ Fₚ[X] be irreducible. Then
K = Fₚ[X]/(f(X)) is a field of order p^{deg f}, and every finite field arises in this way.
Theorem B. Let q = pʳ be a prime power. Then there exists a field F_q of order q and
it is unique up to isomorphism.
BCH Codes
Let n be an odd integer. Pick r ≥ 1 such that 2ʳ ≡ 1 (mod n). (This exists since
(2, n) = 1.) Let K = F_{2ʳ} and let μₙ(K) = {x ∈ K : xⁿ = 1} ≤ K*. Since n | (2ʳ − 1) =
|K*|, μₙ(K) is a cyclic group of order n. So μₙ(K) = {1, α, . . . , α^{n−1}} for some α ∈ K;
α is called a primitive nth root of unity.
The cyclic code with defining set A ⊆ μₙ(K) is
C = {f(X) ∈ F₂[X]/(Xⁿ − 1) : f(a) = 0 for all a ∈ A}.
Its generator polynomial is the non-zero polynomial g(X) of least degree such that
g(a) = 0 for all a ∈ A. Equivalently, g(X) is the least common multiple of the minimal
polynomials of the elements a ∈ A.
Definition. The cyclic code with defining set A = {α, α², . . . , α^{δ−1}} is called a BCH
(Bose, Ray-Chaudhuri, Hocquenghem) code with design distance δ.

Theorem. A BCH code C with design distance δ has minimum distance d(C) ≥ δ.

Proof. Consider the (δ − 1) × n matrix
H = ( 1   α        α²          ⋯  α^{n−1}
      1   α²       α⁴          ⋯  α^{2(n−1)}
      ⋮
      1   α^{δ−1}  α^{2(δ−1)}  ⋯  α^{(δ−1)(n−1)} ).
By Lemma 4.17, any δ − 1 columns of H are linearly independent. But any codeword of
C is a dependence relation between the columns of H. Hence every non-zero codeword
has weight at least δ. Therefore, d(C) ≥ δ.
Note 14. H is not a parity check matrix in the usual sense; its entries are not in F₂.
Decoding. Identify F₂ⁿ with F₂[X]/(Xⁿ − 1). Suppose the codeword c(X) is sent and
r(X) = c(X) + e(X) is received, where the error pattern e(X) = Σ_{i∈E} Xⁱ has error
positions E = {0 ≤ i ≤ n − 1 : eᵢ = 1}. Take t with δ = 2t + 1 and define the error locator
polynomial
σ(X) = Π_{i∈E} (1 − αⁱX).
Theorem 4.18. Assume deg σ = |E| ≤ t. Then σ(X) is the unique polynomial in K[X]
of least degree such that
(i) σ(0) = 1;
(ii) σ(X) Σ_{j=1}^{2t} r(αʲ)Xʲ ≡ ω(X) (mod X^{2t+1}) for some ω(X) with deg ω ≤ deg σ,
where
ω(X) = Σ_{i∈E} αⁱX Π_{j∈E, j≠i} (1 − αʲX).

Proof. Working with formal power series over K,
ω(X)/σ(X) = Σ_{i∈E} αⁱX/(1 − αⁱX)
= Σ_{i∈E} Σ_{j=1}^∞ (αⁱX)ʲ
= Σ_{j=1}^∞ (Σ_{i∈E} (αʲ)ⁱ) Xʲ
= Σ_{j=1}^∞ e(αʲ)Xʲ.
Therefore,
σ(X) Σ_{j=1}^∞ e(αʲ)Xʲ = ω(X).
Since c(αʲ) = 0 for 1 ≤ j ≤ 2t, we have e(αʲ) = r(αʲ) for these j, so
σ(X) Σ_{j=1}^{2t} r(αʲ)Xʲ ≡ ω(X) (mod X^{2t+1}).
We have checked (i) and (ii), with ω(X) = Xσ′(X) (as we are in characteristic 2), so
deg ω ≤ deg σ = |E| ≤ t.
For uniqueness, suppose σ̃(X), ω̃(X) ∈ K[X] also satisfy (i) and (ii) with deg σ̃ ≤ deg σ.
Note that if i ∈ E then
ω(α^{−i}) = Π_{j∈E, j≠i} (1 − α^{j−i}) ≠ 0,
so σ(X) and ω(X) have no common roots.
Decoding algorithm
Shift Registers
Definition. A (general) feedback shift register is a function f : F₂ᵈ → F₂ᵈ of the form
f(x₀, x₁, . . . , x_{d−1}) = (x₁, . . . , x_{d−1}, C(x₀, . . . , x_{d−1})) for some function C : F₂ᵈ → F₂. We
say the register has length d.

[Figure: a shift register with cells x₀, x₁, . . . , x_{d−1}, each cell feeding the function C,
whose output is fed back into the last cell.]

The register is linear (LFSR) if C is a linear map, say (x₀, . . . , x_{d−1}) ↦ Σ_{i=0}^{d−1} aᵢxᵢ.
The initial fill (y₀, y₁, . . . , y_{d−1}) produces an output sequence (yₙ)_{n≥0} given by
y_{n+d} = Σ_{i=0}^{d−1} aᵢ y_{n+i},
i.e. we have a sequence determined by a linear recurrence relation with auxiliary poly-
nomial P(X) = Xᵈ + a_{d−1}X^{d−1} + ⋯ + a₁X + a₀.
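An LFSR is a one-line loop in code. The sketch below (my own illustration) generates the output stream of the recurrence above; the example taps a = (1, 1) give y_{n+2} = yₙ + y_{n+1}, whose output is periodic with period 3.

```python
def lfsr(a, fill, num):
    """Output stream of the LFSR y_{n+d} = sum a_i y_{n+i} over F_2,
    given taps a = (a_0, ..., a_{d-1}) and initial fill (y_0, ..., y_{d-1})."""
    d = len(a)
    y = list(fill)
    while len(y) < num:
        y.append(sum(ai * yi for ai, yi in zip(a, y[-d:])) % 2)
    return y[:num]
```

For example, lfsr([1, 1], [0, 1], 9) produces 0, 1, 1, 0, 1, 1, 0, 1, 1.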
Lemma 4.19. The sequence (yₙ)_{n≥0} in F₂ is the output from a LFSR with auxiliary
polynomial P(X) if and only if
Σ_{i=0}^∞ yᵢXⁱ = A(X)/P̌(X)
for some A(X) ∈ F₂[X] with deg A < deg P, where P̌(X) = X^{deg P} P(X^{−1}) ∈ F₂[X].

Proof. Let P(X) = a_dXᵈ + ⋯ + a₁X + a₀ with a_d = 1. Then P̌(X) = a₀Xᵈ + ⋯ +
a_{d−1}X + a_d. The condition is that (Σ_{i=0}^∞ yᵢXⁱ) P̌(X) is a polynomial of degree less
than d. This holds if and only if
Σ_{i=0}^d aᵢ y_{n+i} = 0 for all n ≥ 0,
i.e.
y_{n+d} = Σ_{i=0}^{d−1} aᵢ y_{n+i} for all n ≥ 0,
using a_d = 1 and characteristic 2.
If we know that the register has length at least r, start with i = r. Compute det Aᵢ.
• If det Aᵢ ≠ 0, then d > i; replace i by i + 1 and repeat.
• If det Aᵢ = 0, solve (∗∗) for a₀, . . . , a_{d−1} by Gaussian elimination and test the
solution over as many terms of the sequence as we like. If it fails, then d > i;
replace i by i + 1 and repeat.
Chapter 5
Cryptography
There is some secret information shared by the sender and receiver, called the key, from K.
The unencrypted message is called the plaintext and is from M. The encrypted message
is called the ciphertext and it is from C. A cryptosystem consists of sets (K, M, C) with
functions
e : M × K → C
d : C × K → M
such that d(e(m, k), k) = m for all m ∈ M, k ∈ K.
Examples (i) and (ii) fail at level 2, at least for sufficiently random messages. They
even fail at level 1 if, e.g., the source is English text. For modern applications, level 3 is
desirable.
We model the key and the messages as independent random variables K and M taking
values in K and M. Put C = e(K, M ).
Denition. A cryptosystem has perfect secrecy if M and C are independent. Equiva-
lently, I(M ; C) = 0.
Lemma. A cryptosystem with perfect secrecy has |K| ≥ |M|.

Proof. Pick m₀ ∈ M and k₀ ∈ K with P(K = k₀) > 0. Let c₀ = e(m₀, k₀). For any
m ∈ M,
P(C = c₀ | M = m) = P(C = c₀) = P(C = c₀ | M = m₀) ≥ P(K = k₀) > 0.
So for each m ∈ M there exists k ∈ K such that e(m, k) = c₀. Since decryption must
recover m from c₀ and k, distinct m need distinct k. Therefore, |K| ≥ |M|.
Denition. The unicity distance is the least n for which H(K | C (n) ) = 0, i.e. the
smallest number of encrypted messages required to uniquely determine the key.
We assume that all keys are equally likely, that H(M^(n)) ≈ nH where H is the entropy per letter of the message source, and that H(C^(n)) ≈ n log|Σ|. Since the message is determined by the key and the ciphertext,

    H(K | C^(n)) = H(K) + H(M^(n)) − H(C^(n)) ≈ log|K| − n(log|Σ| − H).

So H(K | C^(n)) ≈ 0 if and only if

    n ≥ U := log|K| / (log|Σ| − H),

which is the unicity distance.
Recall that 0 ≤ H ≤ log|Σ|. To make the unicity distance large, we can make |K| large or use a message source with little redundancy.
Example. Suppose we can decrypt a substitution cipher after 40 letters. Here |Σ| = 26, |K| = 26! and U ≈ 40. Then for the entropy H_E of English text we get

    H_E ≈ log 26 − (log 26!)/40 ≈ 2.5.
Many cryptosystems are thought secure (and indeed used) beyond the unicity distance.
Stream Ciphers
We work with streams, i.e. sequences in F_2. For plaintext p_0, p_1, ... and key stream k_0, k_1, ... we set the ciphertext to be z_0, z_1, ... where z_n = p_n + k_n.

In the one-time pad, the key stream is a random sequence, known only to the sender and recipient. Let K_0, K_1, ... be i.i.d. random variables with P(K_j = 0) = P(K_j = 1) = 1/2. The ciphertext is Z_n = p_n + K_n, where the plaintext is fixed. Then Z_0, Z_1, ... are i.i.d. random variables with P(Z_j = 0) = P(Z_j = 1) = 1/2. Therefore, without knowledge of the key stream, deciphering is impossible. (Hence this has infinite unicity distance.)
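As a sketch, encryption and decryption with a one-time pad are the same XOR, z_n = p_n + k_n over F_2; here the `secrets` module stands in for a truly random key stream:

```python
import secrets

def xor_stream(bits, key):
    """z_n = p_n + k_n over F_2; applying the key stream twice recovers the input."""
    return [b ^ k for b, k in zip(bits, key)]

plain = [1, 0, 1, 1, 0, 0, 1, 0]
key = [secrets.randbits(1) for _ in plain]   # one random key bit per plaintext bit
cipher = xor_stream(plain, key)
assert xor_stream(cipher, key) == plain      # decryption is the same operation
```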
There are the following two problems with the use of one-time pads.
(i) We must generate a truly random key stream.
(ii) The key stream must be shared securely in advance, and it is as long as the message itself.
(i) is surprisingly tricky, but not a problem in practice. (ii) is the same problem we started with. In most applications the one-time pad is not practical. Instead, we generate k_0, k_1, ... using a feedback shift register, say of length d. Then we only need to share the initial fill k_0, k_1, ..., k_{d-1}.
Lemma. The output (x_n) of any feedback shift register of length d is eventually periodic, with period N ≤ 2^d.
Proof. Let the register be f : F_2^d → F_2^d and let v_i = (x_i, x_{i+1}, ..., x_{i+d-1}). Then v_{i+1} = f(v_i). Since |F_2^d| = 2^d, the vectors v_0, v_1, ..., v_{2^d} cannot all be distinct, so there exist 0 ≤ a < b ≤ 2^d such that v_a = v_b. Let M = a and N = b − a. Then v_M = v_{M+N}, and v_r = v_{r+N} for all r ≥ M (by induction, applying f), so x_r = x_{r+N} for all r ≥ M.
We can also generate new key streams from old ones as follows.
Lemma 5.4. Let (x_n) and (y_n) be the output from LFSRs of length M and N, respectively.
(i) The sequence (x_n + y_n) is the output from an LFSR of length M + N.
(ii) The sequence (x_n y_n) is the output from an LFSR of length M N.
Proof. We will assume that the auxiliary polynomials P(X), Q(X) each have distinct roots, say α_1, ..., α_M and β_1, ..., β_N, in some extension field K of F_2. Then x_n = ∑_{i=1}^{M} λ_i α_i^n and y_n = ∑_{j=1}^{N} μ_j β_j^n for some λ_i, μ_j ∈ K.
(i) x_n + y_n = ∑_{i=1}^{M} λ_i α_i^n + ∑_{j=1}^{N} μ_j β_j^n. This is produced by an LFSR with auxiliary polynomial P(X)Q(X).
(ii) x_n y_n = ∑_{i=1}^{M} ∑_{j=1}^{N} λ_i μ_j (α_i β_j)^n is the output of an LFSR with auxiliary polynomial ∏_{i=1}^{M} ∏_{j=1}^{N} (X − α_i β_j), which is in F_2[X] by the Symmetric Function Theorem.
(i) Adding the outputs of two LFSRs is no more economical than producing the same stream with a single LFSR.
(ii) Multiplying streams looks promising, until we realise that x_n y_n = 0 about 75% of the time.
Remark. Non-linear registers look appealing, but are difficult to analyse. In particular, the eavesdropper may understand them better than we do.
Given output streams (x_n), (y_n), (z_n) from three LFSRs, we can multiplex them:

    k_n = x_n if z_n = 0,  and  k_n = y_n if z_n = 1.

To apply Lemma 5.4, write k_n = x_n + z_n(x_n + y_n) to deduce that (k_n) is again the output from an LFSR.
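The identity k_n = x_n + z_n(x_n + y_n) over F_2 can be checked exhaustively:

```python
from itertools import product

# x + z(x + y) over F_2 selects x when z = 0 and y when z = 1
for x, y, z in product((0, 1), repeat=3):
    selected = x if z == 0 else y
    assert (x + z * (x + y)) % 2 == selected
```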
Stream ciphers are examples of symmetric cryptosystems, i.e. the decryption algorithm is the same as, or easily deduced from, the encryption algorithm.
This is an example of an asymmetric cryptosystem. We split the key into two parts.
Knowing the encryption and decryption algorithms and the public key, it should still be hard to find the private key or to decrypt messages. This aim implies security at level 3 (chosen plaintext). There is also no key exchange problem.
The idea is to base the system on mathematical problems that are believed to be hard.
We consider two such problems.
An algorithm runs in polynomial time if

    #(operations) ≤ c (input size)^d

for some constants c, d.
Note 15. An algorithm for factoring N has input size log N, i.e. the number of digits of N.
Polynomial time algorithms are not known for (i) and (ii).
Elementary methods
(i) Trial division properly organised takes time O( N ).
(ii) Baby-step Giant-step algorithm. Set m= p , write a = qm + r, 0 q, r < m.
Then
x g a g qm+r (mod p)
qm r
g g x (mod p)
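A sketch of the algorithm, with toy parameters (2 is a primitive root modulo 101):

```python
from math import isqrt

def bsgs(g, x, p):
    """Solve g^a ≡ x (mod p).  With a = qm + r and m about sqrt(p), we have
    x g^{-qm} ≡ g^r (mod p), so x g^{-qm} must equal some tabulated g^r."""
    m = isqrt(p - 1) + 1
    baby = {pow(g, r, p): r for r in range(m)}   # baby steps: g^r
    ginv_m = pow(g, -m, p)                       # giant step: multiply by g^{-m}
    t = x % p
    for q in range(m + 1):
        if t in baby:
            return q * m + baby[t]
        t = t * ginv_m % p
    return None                                  # no discrete log exists

assert all(bsgs(2, pow(2, a, 101), 101) == a for a in (0, 1, 57, 99))
```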
Factor base
The best known method for solving (i) and (ii) uses a factor base method called the number field sieve. It has running time

    O(exp(c (log N)^{1/3} (log log N)^{2/3}))

where c is a known constant. Note this is closer to polynomial time (in log N) than to exponential time (in log N) thanks to the exponents 1/3 and 2/3.
Recall that x^{φ(N)} ≡ 1 (mod N) whenever gcd(x, N) = 1 (Euler's theorem). A special case of this is Fermat's little theorem, stating that for prime p and p ∤ x,

    x^{p−1} ≡ 1 (mod p).
Lemma 5.5. Let p ≡ 3 (mod 4) be prime, say p = 4k − 1. If x^2 ≡ d (mod p) is soluble, then its solutions are x ≡ ±d^k (mod p).
Proof. Let x_0 be a solution. Without loss of generality, we may assume x_0 ≢ 0 (mod p). Then

    d^{2k−1} ≡ x_0^{2(2k−1)} = x_0^{p−1} ≡ 1 (mod p),

so

    (d^k)^2 = d · d^{2k−1} ≡ d (mod p).
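In other words, for p = 4k − 1 a square root of a residue d is d^k = d^{(p+1)/4} (mod p); a small illustration:

```python
def sqrt_mod(d, p):
    """Square root of d modulo a prime p ≡ 3 (mod 4), or None if d is a
    non-residue.  With p = 4k - 1 the candidate root is d^k = d^{(p+1)/4}."""
    x = pow(d, (p + 1) // 4, p)
    return x if x * x % p == d % p else None

assert sqrt_mod(4, 23) in (2, 21)   # 23 = 4*6 - 1, so the root is 4^6 mod 23
assert sqrt_mod(5, 23) is None      # 5 is not a quadratic residue mod 23
```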
In the Rabin cryptosystem, the private key consists of two large distinct primes p, q ≡ 3 (mod 4). The public key is N = pq. We have M = C = {0, 1, 2, ..., N − 1}. We encrypt a message m ∈ M as c = m^2 (mod N). The ciphertext is c. (We should avoid m < √N, since then c = m^2 exactly and m is recovered by taking an integer square root.)
Suppose we receive c. Use Lemma 5.5 to solve for x_1, x_2 such that x_1^2 ≡ c (mod p) and x_2^2 ≡ c (mod q). Then use the Chinese Remainder Theorem (CRT) to find x with x ≡ x_1 (mod p) and x ≡ x_2 (mod q); hence x^2 ≡ c (mod N). Indeed, running Euclid's algorithm on p and q gives integers r, s with rp + sq = 1. We take x = (sq)x_1 + (rp)x_2.
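A sketch of the decryption step with toy primes (p, q chosen only for illustration); the CRT combination x = (sq)x_1 + (rp)x_2 is computed via modular inverses rather than an explicit run of Euclid's algorithm:

```python
def rabin_roots(c, p, q):
    """The four square roots of c modulo N = pq, for primes p, q ≡ 3 (mod 4)."""
    N = p * q
    x1 = pow(c, (p + 1) // 4, p)     # square root of c mod p (Lemma 5.5)
    x2 = pow(c, (q + 1) // 4, q)     # square root of c mod q
    sq = q * pow(q, -1, p)           # ≡ 1 (mod p), ≡ 0 (mod q)
    rp = p * pow(p, -1, q)           # ≡ 0 (mod p), ≡ 1 (mod q)
    return sorted({(sq * u + rp * v) % N
                   for u in (x1, p - x1) for v in (x2, q - x2)})

# toy example: p = 7, q = 11, message m = 9, ciphertext c = 9^2 mod 77 = 4
roots = rabin_roots(4, 7, 11)
assert roots == [2, 9, 68, 75] and all(r * r % 77 == 4 for r in roots)
```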
Lemma 5.6. (i) Let p be an odd prime and gcd(d, p) = 1. Then x^2 ≡ d (mod p) has either no solutions or exactly two.
(ii) Let N = pq with p, q distinct odd primes and gcd(d, N) = 1. Then x^2 ≡ d (mod N) has either no solutions or exactly four.
Proof. (i) We have

    x^2 ≡ y^2 (mod p)
    ⟹ p | (x + y)(x − y)
    ⟹ p | (x + y) or p | (x − y)
    ⟹ x ≡ ±y (mod p).

(ii) If x_0 is some solution, then by the CRT there exist solutions x with x ≡ ±x_0 (mod p), x ≡ ±x_0 (mod q), for any of the four choices of signs. By (i), these are the only solutions.
Theorem 5.7. Decrypting messages sent using the Rabin cryptosystem is essentially as difficult as factoring N.
Proof. We have seen that factoring N allows us to decrypt messages. Conversely, suppose we have an algorithm for computing square roots modulo N. Pick x (mod N) at random. Use the algorithm to find y such that y^2 ≡ x^2 (mod N). With probability 1/2, x ≢ ±y (mod N); then gcd(N, x − y) is a non-trivial factor of N. If this fails, start again with another x. After r trials, the probability of failure is less than 1/2^r, which becomes arbitrarily small.
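This reduction can be simulated with a brute-force stand-in for the assumed square-root algorithm (everything here is a toy illustration):

```python
import random
from math import gcd

def factor_with_oracle(N, sqrt_oracle, max_tries=50):
    """Factor N given any algorithm returning some square root modulo N."""
    for _ in range(max_tries):
        x = random.randrange(2, N)
        g = gcd(x, N)
        if g > 1:
            return g                       # lucky: x already shares a factor
        y = sqrt_oracle(x * x % N)
        if y not in (x, N - x):            # happens with probability 1/2
            return gcd(abs(x - y), N)      # non-trivial since x^2 ≡ y^2 (mod N)
    return None

N = 7 * 11
def oracle(c):                             # brute-force stand-in for the oracle
    return next(z for z in range(N) if z * z % N == c)

random.seed(1)
f = factor_with_oracle(N, oracle)
assert f in (7, 11)
```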
Theorem 5.8. Let N = pq for distinct odd primes p, q, and let m be a multiple of φ(N). Write m = 2^a b with b odd, and put X = {x ∈ (Z/NZ)^× : o_p(x^b) ≠ o_q(x^b)}.
(i) If x ∈ X, then there exists 0 ≤ t < a such that gcd(x^{2^t b} − 1, N) is a non-trivial factor of N.
(ii) |X| ≥ (1/2)|(Z/NZ)^×| = φ(N)/2.
Proof. (i) Since x is a unit,

    x^{φ(N)} ≡ 1 (mod N) ⟹ x^m ≡ 1 (mod N).

But m = 2^a b, so putting y = x^b (mod N) we get y^{2^a} ≡ 1 (mod N). Therefore, o_p(y) and o_q(y) are powers of 2. We are given o_p(y) ≠ o_q(y), and without loss of generality we may assume o_p(y) < o_q(y). Say o_p(y) = 2^t, so 0 ≤ t < a. Then

    y^{2^t} ≡ 1 (mod p),
    y^{2^t} ≢ 1 (mod q).

So gcd(y^{2^t} − 1, N) = p.
(ii) See below.
Corollary 5.9. Finding the RSA private key (N, d) from the public key (N, e) is essentially as difficult as factoring N.
Proof. We have seen that factoring N allows us to find d. Conversely, if we know d and e, then de ≡ 1 (mod φ(N)), so φ(N) | (de − 1), and taking m = de − 1 in Theorem 5.8 lets us factor N.
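A sketch of this reduction with toy numbers (key values illustrative only): given both exponents, de − 1 is a multiple of φ(N), and the procedure of Theorem 5.8 turns it into a factorisation:

```python
import random
from math import gcd

def factor_from_keys(N, e, d, tries=50):
    """Factor N = pq given de ≡ 1 (mod phi(N)), following Theorem 5.8."""
    m = d * e - 1                    # multiple of phi(N); write m = 2^a * b
    a, b = 0, m
    while b % 2 == 0:
        a, b = a + 1, b // 2
    for _ in range(tries):
        x = random.randrange(2, N)
        if gcd(x, N) > 1:
            return gcd(x, N)
        y = pow(x, b, N)             # y = x^b; square repeatedly up to x^(2^a b)
        for _ in range(a):
            z = pow(y, 2, N)
            if z == 1 and y not in (1, N - 1):
                return gcd(y - 1, N) # y is a square root of 1 with y ≢ ±1
            y = z
    return None

random.seed(0)
N, e, d = 3233, 17, 2753             # toy key: N = 61 * 53, de - 1 = 15 * phi(N)
f = factor_from_keys(N, e, d)
assert f in (61, 53)
```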
Proof of Theorem 5.8 (ii). By the CRT we have the correspondence

    (Z/NZ)^× ↔ (Z/pZ)^× × (Z/qZ)^×,  x ↦ (x mod p, x mod q).

It suffices to show that if we partition (Z/pZ)^× according to the value of o_p(x^b), then each subset has size at most (1/2)|(Z/pZ)^×| = (p−1)/2; this already gives |X| ≥ φ(N)/2. Recall that (Z/pZ)^× = {1, g, g^2, ..., g^{p−2}} for a primitive root g. By Fermat's little theorem,

    g^{p−1} ≡ 1 (mod p) ⟹ g^{2^a b} ≡ 1 (mod p)

(since (p − 1) | φ(N) | m), and hence o_p(g^b) is a power of 2. So

    o_p((g^u)^b) = o_p(g^b) if u is odd,
    o_p((g^u)^b) < o_p(g^b) otherwise.

Since exactly half the exponents u are odd, the subset {x : o_p(x^b) = o_p(g^b)} has size (p−1)/2, and every other subset lies in its complement, so each subset has size at most (p−1)/2.
Remark. It is not known whether decrypting RSA messages without knowledge of the
private key is essentially as hard as factoring.
Let p be a large prime and g a primitive root modulo p. This data is fixed and known to everyone.
Alice and Bob wish to agree a secret key. A chooses α ∈ Z and sends g^α (mod p) to B. B chooses β ∈ Z and sends g^β (mod p) to A. They both compute k = (g^α)^β = (g^β)^α (mod p) and use this as their secret key. This is the Diffie–Hellman key exchange.
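A toy run of the exchange (small p for illustration; 2 is a primitive root modulo 101):

```python
import secrets

p, g = 101, 2                        # public parameters, known to everyone

alpha = secrets.randbelow(p - 1)     # Alice's secret exponent
beta = secrets.randbelow(p - 1)      # Bob's secret exponent

A = pow(g, alpha, p)                 # Alice sends g^alpha (mod p)
B = pow(g, beta, p)                  # Bob sends g^beta (mod p)

k_alice = pow(B, alpha, p)           # Alice computes (g^beta)^alpha
k_bob = pow(A, beta, p)              # Bob computes (g^alpha)^beta
assert k_alice == k_bob              # both hold g^(alpha * beta) mod p
```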
Secrecy. A and B can be sure that no third party can read the message.
Integrity. A and B can be sure that no third party can alter the message.
Authenticity. B can be sure that A sent the message.
Non-repudiation. B can prove to a third party that A sent the message.
A uses the private key (N, d) to encrypt messages. Anyone can decrypt messages using the public key (N, e). (Note that (x^d)^e = (x^e)^d ≡ x (mod N).) But they cannot forge messages sent by A.
Signatures
Signature schemes can be used to preserve integrity and non-repudiation. They also
prevent tampering of the following kind.
Example (Homomorphism attack). A bank sends messages of the form (M_1, M_2) where M_1 is the name of the client and M_2 is the amount transferred to his account. Messages are encoded using RSA, i.e. as (Z_1, Z_2) = (M_1^e mod N, M_2^e mod N). I transfer 100 to my account, observe the encrypted message (Z_1, Z_2) and then send (Z_1, Z_2^3). I become a millionaire without the need to break RSA.
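The attack relies only on textbook RSA being multiplicative; a toy key (parameters illustrative) makes it concrete:

```python
p, q, e = 1009, 1013, 17                 # toy RSA key; N > 10^6 so 100^3 fits
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # the bank's private exponent

Z2 = pow(100, e, N)                      # observed: encryption of the amount 100
forged = pow(Z2, 3, N)                   # computed without any secret knowledge
assert pow(forged, d, N) == 100 ** 3     # the bank decrypts 1000000
```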
Example (Copying) . I could just keep sending (Z1 , Z2 ). This is defeated by time
stamping.
A message m is signed as (m, s) where s is a function of m and the private key. The
signature (or trapdoor) function should be designed so no-one without knowledge of the
private key can sign messages, yet anyone can check the signature is valid.
Remark. We are interested in the signature of the message, not of the sender.
A has private key (N, d) and public key (N, e). She signs m as (m, s) where s ≡ m^d (mod N). The signature s is verified by checking s^e ≡ m (mod N).
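Signing and verification with a toy key (parameters illustrative only):

```python
p, q, e = 1009, 1013, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # Alice's private exponent

m = 4242
s = pow(m, d, N)                         # Alice signs with the private key
assert pow(s, e, N) == m                 # anyone verifies with the public key
```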
There are the following problems.
In the ElGamal signature scheme, Alice has private key u and public key y ≡ g^u (mod p). To sign a message m, she picks a random k with gcd(k, p − 1) = 1 and computes r, s with

    r ≡ g^k (mod p)   (1)
    m ≡ ur + ks (mod p − 1)   (2)

The signature is (r, s); it is verified by checking g^m ≡ y^r r^s (mod p). Solving (2) for s means solving a linear congruence

    ax ≡ b (mod m)   (∗)

It is important that Alice chooses a new value of k to sign each message. Otherwise, suppose messages m_1, m_2 have signatures (r, s_1) and (r, s_2). Then

    m_1 ≡ ur + ks_1 (mod p − 1)
    m_2 ≡ ur + ks_2 (mod p − 1)
    ⟹ m_1 − m_2 ≡ k(s_1 − s_2) (mod p − 1),

which determines k whenever s_1 − s_2 is invertible mod p − 1; then (2) reveals u.
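With toy parameters (u, k illustrative), reusing k lets anyone recover it from two signatures whenever s_1 − s_2 is invertible mod p − 1:

```python
p, g = 101, 2                  # toy public parameters
u, k = 23, 7                   # Alice's private key u and (reused) nonce k
r = pow(g, k, p)
kinv = pow(k, -1, p - 1)

def sign(m):                   # solves m ≡ u r + k s (mod p - 1) for s
    return r, kinv * (m - u * r) % (p - 1)

m1, m2 = 17, 4
(_, s1), (_, s2) = sign(m1), sign(m2)

# attacker: m1 - m2 ≡ k (s1 - s2) (mod p - 1), and here s1 - s2 is invertible
k_found = (m1 - m2) * pow(s1 - s2, -1, p - 1) % (p - 1)
assert k_found == k
```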
Remark. Several existential forgeries are known, i.e. we can find solutions m, r, s to g^m ≡ y^r r^s (mod p), but with no control over m. In practice, this is stopped by signing a hash value of the message instead of the message itself.
Bit Commitment
Alice would like to send a message to Bob in such a way that
(i) Bob cannot read the message until Alice sends further information;
(ii) Alice cannot change the message.
Applications include:
coin tossing;
selling stock market tips;
multiparty computation, e.g. voting, surveys, etc.
(i) Using any public key cryptosystem. Bob cannot read the message until Alice sends
her private key.
(ii) Using coding theory as follows.
[Figure: Alice and Bob are connected both by a noisy channel and by a clear channel.]
The noisy channel is modelled as a BSC with error probability p. Bob chooses a linear code C with appropriate parameters. Alice chooses a linear map θ : C → F_2. To send m ∈ {0, 1}, Alice chooses c ∈ C such that θ(c) = m and sends c to Bob via the noisy channel. Bob receives r = c + e with d(r, c) = w(e) ≈ np. (The variance of the BSC should be chosen small.) Later Alice sends c via the clear channel and Bob checks d(r, c) ≈ np.
Why can Bob not read the message? We arrange that C has minimum distance
much smaller than np.
Why can Alice not change her choice? Alice knows the codeword c sent, but not r. If she later sends c′, it will only be accepted if d(c′, r) ≈ np. Alice's only safe option is to choose c′ very close to c. But if the minimum distance of C is sufficiently large, this forces c′ = c.
Quantum Cryptography
The following are problems with public key systems.
They are based on the belief that some mathematical problem is hard, e.g. factorisation or computation of the discrete logarithm. This might not be true.
As computers get faster, yesterday's securely encrypted message is easily read
tomorrow.
The aim is to construct a key exchange scheme that is secure, conditional only on the
laws of physics.
A classical bit is an element of {0, 1}. A quantum bit, or qubit, is a linear combination |ψ⟩ = α|0⟩ + β|1⟩ with α, β ∈ C, |α|^2 + |β|^2 = 1. Measuring |ψ⟩ gives |0⟩ with probability |α|^2 and |1⟩ with probability |β|^2. After the measurement, the qubit collapses to the state observed, i.e. |0⟩ or |1⟩.
The basic idea is that Alice generates a sequence of qubits and sends them to Bob. By
comparing notes afterwards, they can detect the presence of an eavesdropper.
[Figure: a light source, a polarising filter at angle θ to the vertical, then a vertically polarised filter.]
Each photon passes through the second filter with probability cos^2 θ. We identify C^2 = {α|0⟩ + β|1⟩ : α, β ∈ C} with the inner product (α_1, β_1) · (α_2, β_2) = α_1 ᾱ_2 + β_1 β̄_2. We can measure a qubit with respect to any orthonormal basis, e.g.

    |+⟩ = (1/√2)|0⟩ + (1/√2)|1⟩
    |−⟩ = (1/√2)|0⟩ − (1/√2)|1⟩.

If |ψ⟩ = α|+⟩ + β|−⟩ then the observation gives |+⟩ with probability |α|^2 and |−⟩ with probability |β|^2.
Remark. An eavesdropper who could predict which basis Alice is using to send, or Bob
uses to measure, could remain undetected. Otherwise, the eavesdropper will change
about 25% of the 2n bits shared.
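A rough simulation of the intercept-resend attack (the model and numbers here are assumptions, not from the notes): Eve measures each qubit in a random basis and resends; on the positions where Bob's basis matches Alice's, about a quarter of the bits are disturbed:

```python
import random

def intercept_resend_error_rate(n, rng=None):
    """Error rate Eve induces on basis-matched bits by measure-and-resend.

    With probability 1/2 Eve guesses the wrong basis; Bob's measurement of
    her resent qubit is then uniformly random, and wrong half the time, so
    the expected error rate is 1/2 * 1/2 = 1/4.
    """
    rng = rng or random.Random(0)
    errors = 0
    for _ in range(n):
        bit = rng.randrange(2)            # Alice's bit, in Alice's basis
        if rng.randrange(2) == 0:         # Eve happens to pick Alice's basis
            got = bit                     # Bob recovers the bit intact
        else:                             # wrong basis: outcome randomised
            got = rng.randrange(2)
        errors += (got != bit)
    return errors / n

rate = intercept_resend_error_rate(20000)
assert 0.2 < rate < 0.3                   # close to the predicted 25%
```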
One problem is that noise has the same effect as an eavesdropper. Say A and B accept at most t errors in the n bits they compare, and assume there are at most t errors in the other n bits. Say A has x ∈ F_2^n and B has x + e ∈ F_2^n with w(e) ≤ t. We pick linear codes C_2 ⊆ C_1 ⊆ F_2^n of length n where C_1 and C_2 are t-error correcting. A chooses c ∈ C_1 at random and sends x + c to B using the clear channel. B computes (x + e) + (x + c) = c + e and recovers c using the decoding rule for C_1.
To decrease the mutual information shared with an eavesdropper, A and B use as their key the coset c + C_2 in C_1/C_2.
This version of BB84 is provably secure conditional only on the laws of physics. A
suitable choice of parameters can make both the probability that the scheme aborts and
the mutual information simultaneously arbitrarily small.