Coding and Cryptography
Dr T.A. Fisher
Michaelmas 2005
These notes are based on a course of lectures given by Dr T.A. Fisher in Part II of the
Mathematical Tripos at the University of Cambridge in the academic year 2005–2006.
These notes have not been checked by Dr T.A. Fisher and should not be regarded as
official notes for the course. In particular, the responsibility for any errors is mine;
please email Sebastian Pancratz (sfp25) with any comments or corrections.
Contents
1 Noiseless Coding 3
2 Error-control Codes 11
3 Shannon's Theorems 19
4 Linear and Cyclic Codes
5 Cryptography 45
Introduction to Communication Channels
Source → Encoder ⇝ channel ⇝ Decoder → Receiver
                 (errors, noise)
Examples include telegraphs, mobile phones, fax machines, modems, compact discs, or
a space probe sending back a picture.
Basic Problem
Given a source and a channel (modelled probabilistically), we must design an encoder
and decoder to transmit messages economically (noiseless coding, data compression) and
reliably (noisy coding).
Example (Noiseless coding). In Morse code, common letters are given shorter code-
words, e.g. A = ·−, E = ·, Q = −−·−, and Z = −−··.
Example (Noisy coding). Every book has an ISBN a₁a₂ . . . a₁₀ where aᵢ ∈ {0, 1, . . . , 9}
for 1 ≤ i ≤ 9 and a₁₀ ∈ {0, 1, . . . , 9, X}, with Σ_{j=1}^{10} j·aⱼ ≡ 0 (mod 11). This allows
detection of errors such as a single incorrect digit, or the transposition of two adjacent digits.
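The ISBN-10 check is easy to carry out directly. The following sketch (the function name is my own, not part of the course) verifies the weighted sum modulo 11 and illustrates that a single-digit error or a transposition is detected.

```python
def isbn10_check(digits):
    """Return True if the ISBN-10 string is valid (last digit may be 'X' = 10)."""
    values = [10 if d == 'X' else int(d) for d in digits]
    # sum of j * a_j for j = 1..10 must be divisible by 11
    return sum(j * a for j, a in enumerate(values, start=1)) % 11 == 0
```

Since 11 is prime, changing one digit changes the weighted sum by a non-zero multiple of j ≢ 0 (mod 11), and swapping two digits changes it by a non-zero multiple of their position difference; both are detected.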
Overview
Definition. A communication channel accepts symbols from an alphabet Σ₁ = {a₁, . . . , aᵣ}
and it outputs symbols from an alphabet Σ₂ = {b₁, . . . , bₛ}. The channel is modelled by
the probabilities P(y₁y₂ . . . yₙ received | x₁x₂ . . . xₙ sent).

Definition. A discrete memoryless channel (DMC) is one for which pᵢⱼ = P(bⱼ received | aᵢ sent)
is the same for each channel use and independent of all past and future uses of the channel.
The channel matrix is P = (pᵢⱼ), an r × s stochastic matrix.
Definition. The binary erasure channel has Σ₁ = {0, 1} and Σ₂ = {0, 1, ?}. The
channel matrix is
( 1−p   0    p )
(  0   1−p   p ).
We model n uses of a channel by the nth extension with input alphabet Σ₁ⁿ and output
alphabet Σ₂ⁿ.
A code C of length n is a function M → Σ₁ⁿ, where M is the set of possible messages.
Implicitly, we also have a decoding rule Σ₂ⁿ → M.
We say the channel can transmit reliably at rate R if there exist codes C₁, C₂, . . . with Cₙ of length n such that
lim_{n→∞} ρ(Cₙ) = R,
lim_{n→∞} e(Cₙ) = 0,
where ρ(Cₙ) = (1/n) log|Cₙ| is the information rate and e(Cₙ) the error rate.
Denition. The capacity of a channel is the supremum over all reliable transmission
rates.
Chapter 1

Noiseless Coding
Example. Let Σ₁ = {1, 2, 3, 4}, Σ₂ = {0, 1} and f(1) = 0, f(2) = 1, f(3) = 00, f(4) =
01. Then f(114) = 0001 = f(312). Here f is injective but not decipherable.
Our aim is to construct decipherable codes with short word lengths. Assuming f is
injective, the following codes are always decipherable:
(i) block codes, where all codewords have the same length;
(ii) comma codes, which reserve one letter of Σ₂ to end each codeword;
(iii) prefix-free codes, where no codeword is a prefix of another.
Note 2. Note that (i) and (ii) are special cases of (iii). Prefix-free codes are sometimes
called instantaneous codes or self-punctuating codes.
Theorem 1.1 (Kraft's inequality). Let a = |Σ₂|. A prefix-free code f : Σ₁ → Σ₂* with word
lengths s₁, . . . , s_m exists if and only if
Σ_{i=1}^m a^{−sᵢ} ≤ 1. (†)
Proof. Rewrite (†) as
Σ_{l=1}^s n_l a^{−l} ≤ 1 (‡)
where n_l is the number of codewords of length l and s = max_{1≤i≤m} sᵢ.
If f : Σ₁ → Σ₂* is prefix-free then
n₁a^{s−1} + n₂a^{s−2} + ⋯ + n_{s−1}a + n_s ≤ aˢ
since the LHS is the number of strings of length s in Σ₂* with some codeword of f as a
prefix and the RHS is the number of strings of length s.
For the converse, given n₁, . . . , n_s satisfying (‡), we need to construct a prefix-free code
f with n_l codewords of length l, for all l ≤ s. We proceed by induction on s. The case
s = 1 is clear. (Here, (‡) gives n₁ ≤ a, so we can choose a code.) By the induction
hypothesis, there exists a prefix-free code g with n_l codewords of length l for all l ≤ s − 1.
(‡) implies
n₁a^{s−1} + n₂a^{s−2} + ⋯ + n_{s−1}a + n_s ≤ aˢ
where the first s − 1 terms on the LHS sum to the number of strings of length s with
some codeword of g as a prefix and the RHS is the number of strings of length s. Hence
we can add at least n_s new codewords of length s to g and maintain the prefix-free
property.
Remark. The proof is constructive, i.e. just choose codewords in order of increasing
length, ensuring that no previous codeword is a prex.
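The constructive proof can be mirrored in code. The sketch below (my own illustration, using the canonical assignment of codewords in order of increasing length) checks the Kraft sum and builds a binary prefix-free code with prescribed word lengths.

```python
from fractions import Fraction

def kraft_sum(lengths, a=2):
    # Kraft sum: a prefix-free code with these word lengths exists iff this is <= 1
    return sum(Fraction(1, a**s) for s in lengths)

def prefix_free_code(lengths):
    """Build a binary prefix-free code with the given word lengths, choosing
    codewords in order of increasing length (canonical construction)."""
    assert kraft_sum(lengths) <= 1
    code = {}
    c, prev = 0, 0
    for idx, s in sorted(enumerate(lengths), key=lambda t: t[1]):
        c <<= (s - prev)              # extend to the new length
        code[idx] = format(c, '0{}b'.format(s))
        c += 1                        # next codeword at this length
        prev = s
    return [code[i] for i in range(len(lengths))]
```

For the word lengths 1, 2, 3, 3 this yields the familiar code 0, 10, 110, 111.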
Theorem 1.2 (McMillan's inequality). Any decipherable code f : Σ₁ → Σ₂* with word
lengths s₁, . . . , s_m satisfies Σ_{i=1}^m a^{−sᵢ} ≤ 1.

Proof. For r ∈ N,
(Σ_{i=1}^m a^{−sᵢ})ʳ = Σ_{l=1}^{rs} b_l a^{−l}
where s = max_{1≤i≤m} sᵢ and b_l is the number of ways of writing a string of length l
as a concatenation of r codewords. Since f is decipherable, b_l ≤ aˡ, so
(Σ_{i=1}^m a^{−sᵢ})ʳ ≤ Σ_{l=1}^{rs} aˡ a^{−l} = rs,
hence
Σ_{i=1}^m a^{−sᵢ} ≤ (rs)^{1/r} → 1 as r → ∞.
Therefore, Σ_{i=1}^m a^{−sᵢ} ≤ 1.
Corollary 1.3. A decipherable code with prescribed word lengths exists if and only if
a prefix-free code with the same word lengths exists.
Example. [Figure: a binary tree identifying x₁, x₂, x₃, x₄, with probabilities
1/2, 1/4, 1/8, 1/8, at depths 1, 2, 3, 3.]
Hence H = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 7/4.
Definition. The entropy of X is H(X) = −Σ_{i=1}^n pᵢ log pᵢ = H(p₁, . . . , pₙ), where in
this course log = log₂.
[Figure: graph of H(p) = −p log p − (1 − p) log(1 − p) for 0 ≤ p ≤ 1.]
The entropy is greatest for p = 1/2, i.e. a fair coin.
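As a quick numerical check (my own sketch, not part of the lectures), the entropy of the distribution in the tree example above is exactly 7/4, and the fair coin attains the maximum of one bit.

```python
from math import log2

def entropy(ps):
    # H(p_1, ..., p_n) = -sum p_i log2 p_i; terms with p_i = 0 contribute 0
    return -sum(p * log2(p) for p in ps if p > 0)
```

For example, entropy([1/2, 1/4, 1/8, 1/8]) gives 1.75 and entropy([1/2, 1/2]) gives 1.0.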
Lemma 1.4 (Gibbs' inequality). Let (p₁, . . . , pₙ) and (q₁, . . . , qₙ) be probability distri-
butions. Then
−Σ_{i=1}^n pᵢ log pᵢ ≤ −Σ_{i=1}^n pᵢ log qᵢ,
with equality if and only if pᵢ = qᵢ for all i.

[Figure: the graphs of ln x and x − 1, touching at x = 1.]

Proof. Since log x = ln x / ln 2, we may work with natural logarithms. We have
ln x ≤ x − 1 for all x > 0, (∗)
with equality if and only if x = 1. Let I = {i : pᵢ ≠ 0}. Applying (∗) with x = qᵢ/pᵢ,
ln(qᵢ/pᵢ) ≤ qᵢ/pᵢ − 1 for all i ∈ I.
Therefore,
Σ_{i∈I} pᵢ ln(qᵢ/pᵢ) ≤ Σ_{i∈I} qᵢ − Σ_{i∈I} pᵢ = Σ_{i∈I} qᵢ − 1 ≤ 0,
so
−Σ_{i∈I} pᵢ ln pᵢ ≤ −Σ_{i∈I} pᵢ ln qᵢ
and hence
−Σ_{i=1}^n pᵢ ln pᵢ ≤ −Σ_{i=1}^n pᵢ ln qᵢ.
If equality holds then Σ_{i∈I} qᵢ = 1 and qᵢ/pᵢ = 1 for all i ∈ I. Therefore, pᵢ = qᵢ for all
1 ≤ i ≤ n.
Corollary 1.5. H(X) ≤ log n, with equality if and only if p₁ = ⋯ = pₙ = 1/n.

Proof. Take q₁ = ⋯ = qₙ = 1/n in Gibbs' inequality.
Definition. A code f : Σ₁ → Σ₂* is optimal if it is a decipherable code with the minimum
possible expected word length Σ_{i=1}^m pᵢsᵢ.
Theorem 1.6 (Noiseless Coding Theorem). The expected word length E(S) of an op-
timal code satisfies
H(X)/log a ≤ E(S) < H(X)/log a + 1.
Proof. We first prove the lower bound. Take f : Σ₁ → Σ₂* decipherable with word lengths
s₁, . . . , s_m. Set qᵢ = a^{−sᵢ}/c where c = Σ_{i=1}^m a^{−sᵢ}. Note Σ_{i=1}^m qᵢ = 1. By Gibbs' inequality,
H(X) ≤ −Σ_{i=1}^m pᵢ log qᵢ
     = Σ_{i=1}^m pᵢ (sᵢ log a + log c)
     = (Σ_{i=1}^m pᵢsᵢ) log a + log c.
By McMillan's inequality c ≤ 1, so log c ≤ 0 and H(X) ≤ E(S) log a, i.e. E(S) ≥ H(X)/log a.

For the upper bound, set sᵢ = ⌈−log_a pᵢ⌉. Then
−log_a pᵢ ≤ sᵢ,
pᵢ ≥ a^{−sᵢ}.
Now Σ_{i=1}^m a^{−sᵢ} ≤ Σ_{i=1}^m pᵢ = 1. By Theorem 1.1, there exists a prefix-free code f with
word lengths s₁, . . . , s_m. It has expected word length
E(S) = Σ_{i=1}^m pᵢsᵢ
     < Σ_{i=1}^m pᵢ(−log_a pᵢ + 1)
     = H(X)/log a + 1.
Shannon–Fano Coding
This follows the above proof. Given p₁, . . . , p_m, set sᵢ = ⌈−log_a pᵢ⌉. Construct a prefix-
free code with word lengths s₁, . . . , s_m by choosing codewords in order of increasing
length, ensuring that previous codewords are not prefixes.
Example. Let a = 2, m = 5.

i   pᵢ    ⌈−log₂ pᵢ⌉   codeword
1   0.4       2          00
2   0.2       3          010
3   0.2       3          011
4   0.1       4          1000
5   0.1       4          1001

We have E(S) = Σ pᵢsᵢ = 2.8. The entropy is H = 2.121928 . . . , so here
H/log a = 2.121928 . . . .
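The Shannon–Fano word lengths are immediate to compute; the sketch below (my own illustration) reproduces the table above.

```python
from math import ceil, log2

def shannon_fano_lengths(ps):
    """Word lengths s_i = ceil(-log2 p_i) used in Shannon-Fano coding (a = 2)."""
    return [ceil(-log2(p)) for p in ps]

def expected_length(ps, lengths):
    # E(S) = sum p_i s_i
    return sum(p * s for p, s in zip(ps, lengths))
```

For p = (0.4, 0.2, 0.2, 0.1, 0.1) this gives lengths (2, 3, 3, 4, 4) and E(S) = 2.8, as in the example.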
Huffman Coding
For simplicity, let a = 2. Without loss of generality, p₁ ≥ ⋯ ≥ p_m. The definition
is recursive. If m = 2, take codewords 0 and 1. If m > 2, first take a Huffman code
for messages μ₁, . . . , μ_{m−2}, ν with probabilities p₁, . . . , p_{m−2}, p_{m−1} + p_m. Then append
0 (resp. 1) to the codeword for ν to give a codeword for μ_{m−1} (resp. μ_m).
Example. For the source above (p = 0.4, 0.2, 0.2, 0.1, 0.1), a Huffman code has expected
word length E(S) = Σ pᵢsᵢ = 2.2, compared with 2.8 for Shannon–Fano coding.
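The recursive merging can be implemented with a priority queue. The following sketch (my own; the heap-based formulation is equivalent to the recursion above) tracks only the codeword lengths: every merge adds one bit to each symbol under the merged node.

```python
import heapq
from itertools import count

def huffman_lengths(ps):
    """Binary Huffman coding: repeatedly merge the two least probable symbols;
    returns the codeword length assigned to each symbol."""
    tiebreak = count()                     # avoids comparing lists on ties
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(ps)]
    heapq.heapify(heap)
    lengths = [0] * len(ps)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for i in syms1 + syms2:            # each symbol gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), syms1 + syms2))
    return lengths
```

Tie-breaking means the individual lengths are not unique, but the expected word length is: for p = (0.4, 0.2, 0.2, 0.1, 0.1) it comes out as 2.2.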
Theorem 1.7. Huffman codes are optimal.
Proof. Induction on m; the case m = 2 is clear. Let f_m be a Huffman code for X_m and
let f′_m be an optimal code for X_m; without loss of generality f′_m is prefix-free. By
Lemma 1.8, without loss of generality the last two codewords of f′_m have maximal length
and differ only in the last digit. Say f′_m(μ_{m−1}) = y0, f′_m(μ_m) = y1 for some y ∈ {0, 1}*.
Let f′_{m−1} be the prefix-free code for X_{m−1} given by
f′_{m−1}(μᵢ) = f′_m(μᵢ) for 1 ≤ i ≤ m − 2,
f′_{m−1}(ν) = y.
By construction of the Huffman code, and likewise for f′,
E(S_m) = E(S_{m−1}) + p_{m−1} + p_m, (∗)
E(S′_m) = E(S′_{m−1}) + p_{m−1} + p_m.
By the induction hypothesis E(S_{m−1}) ≤ E(S′_{m−1}), so E(S_m) ≤ E(S′_m), i.e. f_m is optimal.
Lemma 1.8. Let f be an optimal prefix-free code with word lengths s₁, . . . , s_m for
probabilities p₁, . . . , p_m. Then (i) if pᵢ > pⱼ then sᵢ ≤ sⱼ; (ii) among the codewords of
maximal length, two differ only in the last digit.

Proof. If not, we modify f by (i) swapping the i-th and j-th codewords, or (ii) deleting
the last letter of each codeword of maximal length. The modified code is still prefix-free
but has shorter expected word length, contradicting the optimality of f.
Joint Entropy
Definition. Let X, Y be random variables taking values in Σ₁, Σ₂. The joint entropy is
H(X, Y) = −Σ_{x∈Σ₁} Σ_{y∈Σ₂} P(X = x, Y = y) log P(X = x, Y = y).

Lemma 1.9. H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent.

Proof. Write
pᵢⱼ = P(X = xᵢ, Y = yⱼ),
pᵢ = P(X = xᵢ),
qⱼ = P(Y = yⱼ).
Apply Gibbs' inequality with probability distributions {pᵢⱼ} and {pᵢqⱼ} to obtain
−Σ_{i,j} pᵢⱼ log pᵢⱼ ≤ −Σ_{i,j} pᵢⱼ log(pᵢqⱼ)
= −Σᵢ (Σⱼ pᵢⱼ) log pᵢ − Σⱼ (Σᵢ pᵢⱼ) log qⱼ
= H(X) + H(Y),
with equality if and only if pᵢⱼ = pᵢqⱼ for all i, j, i.e. if and only if X, Y are independent.
Chapter 2
Error-control Codes
Definition. A binary [n, m]-code is a subset C ⊆ {0, 1}ⁿ of size m = |C|; n is the
length of the code, and elements are called codewords. We use an [n, m]-code to send one of
m messages through a BSC making n uses of the channel.
[Figure: the binary symmetric channel (BSC); each transmitted bit arrives correctly
with probability 1 − p and flipped with probability p.]
(i) The ideal observer decoding rule decodes x ∈ {0, 1}ⁿ as c ∈ C maximising
P(c sent | x received).
(ii) The maximum likelihood decoding rule decodes x ∈ {0, 1}ⁿ as c ∈ C maximising
P(x received | c sent).
(iii) The minimum distance decoding rule decodes x ∈ {0, 1}ⁿ as c ∈ C minimising the
Hamming distance d(x, c).
Lemma 2.1. If all messages are equally likely then (i) and (ii) agree.
Remark. The hypothesis of Lemma 2.1 is reasonable if we first carry out noiseless coding.
Proof. By Bayes' rule,
P(c sent | x received) = P(c sent)/P(x received) · P(x received | c sent).
If all messages are equally likely, P(c sent) is the same for every c ∈ C, so maximising
the left-hand side over c ∈ C is the same as maximising P(x received | c sent).

Lemma 2.2. If p < 1/2 then (ii) and (iii) agree.

Proof. With r = d(x, c),
P(x received | c sent) = pʳ(1 − p)^{n−r} = (1 − p)ⁿ (p/(1 − p))ʳ.
Since p < 1/2, p/(1 − p) < 1. So maximising P(x received | c sent) is the same as minimising
d(x, c).
Example. Suppose C = {000, 111}, where 000 is sent with probability α and 111 with
probability 1 − α, and 110 is received. Then
P(000 sent | 110 received) = αp²(1 − p) / (αp²(1 − p) + (1 − α)p(1 − p)²)
                           = αp / (αp + (1 − α)(1 − p)),
which equals 3/4 for suitable values, e.g. α = 9/10 and p = 1/4; then
P(111 sent | 110 received) = 1/4.
Therefore, the ideal observer decodes as 000. Maximum likelihood and minimum
distance rules both decode as 111.
From now on, we will use the minimum distance decoding rule.
Remark. (i) Minimum distance decoding may be expensive in terms of time and
storage if |C| is large.
(ii) We should specify a convention in the case of a tie, e.g. make a random choice,
request to send again, etc.
Denition. A code C is
(i) d-error detecting if changing up to d digits in each codeword can never produce
another codeword.
(ii) e-error correcting if knowing that x ∈ {0, 1}ⁿ differs from a codeword in at most
e places, we can deduce the codeword.
Example. For the simple parity check code, also known as the paper tape code, we
identify {0, 1} with F₂ and take
C = {(c₁, . . . , cₙ) ∈ F₂ⁿ : Σᵢ cᵢ = 0}.
This is an [n, 2^{n−1}]-code; it is 1-error detecting, but cannot correct errors. Its information
rate is (n − 1)/n.
We can work out the codeword of 0, . . . , 7 by asking whether it is in {4, 5, 6, 7}, {2, 3, 6, 7},
{1, 3, 5, 7} and setting the last bit to be the parity checker.
0 0000 4 1001
1 0011 5 1010
2 0101 6 1100
3 0110 7 1111
Example (Hamming's original code). Let C be the set of c ∈ F₂⁷ satisfying
c₁ + c₃ + c₅ + c₇ = 0
c₂ + c₃ + c₆ + c₇ = 0
c₄ + c₅ + c₆ + c₇ = 0.
There is an arbitrary choice of c₃, c₅, c₆, c₇, but then c₁, c₂, c₄ are forced. Hence |C| = 2⁴
and the information rate is (1/n) log m = 4/7.
Suppose we receive x ∈ F₂⁷. We form the syndrome z = (z₁, z₂, z₄) where
z₁ = x₁ + x₃ + x₅ + x₇
z₂ = x₂ + x₃ + x₆ + x₇
z₄ = x₄ + x₅ + x₆ + x₇.
If x ∈ C then z = (0, 0, 0). If d(x, c) = 1 for some c ∈ C then xᵢ and cᵢ differ for
i = z₁ + 2z₂ + 4z₄. The code is 1-error correcting.
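Syndrome decoding for this code is a few lines. The sketch below (my own, with 1-indexed positions as in the text) computes the syndrome and flips the bit whose position it spells out in binary.

```python
def syndrome(x):
    """Syndrome (z1, z2, z4) of a received word x in F_2^7."""
    z1 = (x[0] + x[2] + x[4] + x[6]) % 2
    z2 = (x[1] + x[2] + x[5] + x[6]) % 2
    z4 = (x[3] + x[4] + x[5] + x[6]) % 2
    return z1, z2, z4

def correct(x):
    """Correct at most one error: the syndrome reads off the error position."""
    z1, z2, z4 = syndrome(x)
    i = z1 + 2 * z2 + 4 * z4
    y = list(x)
    if i:
        y[i - 1] ^= 1
    return y
```

For instance, flipping position 5 of a codeword produces syndrome (1, 0, 1), i.e. i = 5, and the error is repaired.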
Remark. The Hamming distance satisfies the triangle inequality d(x, z) ≤ d(x, y) + d(y, z),
since
{1 ≤ i ≤ n : xᵢ ≠ zᵢ} ⊆ {1 ≤ i ≤ n : xᵢ ≠ yᵢ} ∪ {1 ≤ i ≤ n : yᵢ ≠ zᵢ}.

Definition. The minimum distance of a code is the minimum value of d(c₁, c₂) for c₁, c₂
distinct codewords.

If x ∈ B(c₁, e) ∩ B(c₂, e) then d(c₁, c₂) ≤ 2e by the triangle inequality; hence a code of
minimum distance d is e-error correcting whenever d ≥ 2e + 1.
Bounds on Codes
Notation. Let V(n, r) = |B(x, r)| = Σ_{i=0}^r C(n, i), independently of x ∈ F₂ⁿ.

Lemma (Hamming's bound). An e-error correcting code C of length n satisfies
|C| ≤ 2ⁿ/V(n, e).
Indeed, the balls B(c, e) for c ∈ C are disjoint, so
Σ_{c∈C} |B(c, e)| ≤ |F₂ⁿ| = 2ⁿ,
|C| V(n, e) ≤ 2ⁿ,
|C| ≤ 2ⁿ/V(n, e).

Definition. An e-error correcting code C of length n is perfect if |C| = 2ⁿ/V(n, e).
Equivalently, for all x ∈ F₂ⁿ there exists a unique c ∈ C such that d(x, c) ≤ e. Also
equivalently, F₂ⁿ = ⋃_{c∈C} B(c, e), i.e. any e + 1 errors will make you decode wrongly.

Example. For Hamming's code above,
2ⁿ/V(n, e) = 2⁷/V(7, 1) = 2⁷/(1 + 7) = 2⁴ = |C|,
so the code is perfect.

Remark. If 2ⁿ/V(n, e) ∉ Z then there does not exist a perfect e-error correcting code of
length n. The converse is false (see Example Sheet 2 for the case n = 90, e = 2).
Here A(n, d) denotes the maximum size of a code of length n with minimum distance d.
In the last case, we have A(n, 2) ≥ 2^{n−1} by the simple parity check code. Suppose C
has length n and minimum distance 2. Let C̄ be obtained from C by switching the last
digit of every codeword. Then C ∩ C̄ = ∅, so 2|C| = |C ∪ C̄| ≤ |F₂ⁿ| = 2ⁿ, giving
A(n, 2) = 2^{n−1}.
Lemma. A(n, d + 1) ≤ A(n, d).

Proof. Let m = A(n, d + 1) and pick C with parameters [n, m, d + 1]. Let c₁, c₂ ∈ C
with d(c₁, c₂) = d + 1. Let c₁′ differ from c₁ in exactly one of the places where c₁ and c₂
differ; then d(c₁′, c₂) = d. If c ∈ C \ {c₁} then
d(c₁′, c) ≥ d(c₁, c) − d(c₁, c₁′) ≥ (d + 1) − 1 = d,
so (C \ {c₁}) ∪ {c₁′} is an [n, m, d]-code and A(n, d) ≥ m = A(n, d + 1).
Proposition 2.8.
2ⁿ/V(n, d − 1) ≤ A(n, d) ≤ 2ⁿ/V(n, ⌊(d − 1)/2⌋).
The lower bound is known as the Gilbert–Shannon–Varshamov (GSV) bound or sphere
covering bound. The upper bound is known as Hamming's bound or sphere packing
bound.
Proof of the GSV bound. Let m = A(n, d) and let C be an [n, m, d]-code. Then there does
not exist x ∈ F₂ⁿ with d(x, c) ≥ d for all c ∈ C, otherwise we could replace C by C ∪ {x} to
get an [n, m + 1, d]-code. Therefore,
F₂ⁿ = ⋃_{c∈C} B(c, d − 1),
2ⁿ ≤ Σ_{c∈C} |B(c, d − 1)| = m V(n, d − 1).
Example.
2¹⁰/V(10, 2) ≤ A(10, 3) ≤ 2¹⁰/V(10, 1),
2¹⁰/56 ≤ A(10, 3) ≤ 2¹⁰/11,
19 ≤ A(10, 3) ≤ 93.
It is known that 72 ≤ A(10, 3) ≤ 79, but the exact value is not known.
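The two bounds are a direct computation. The sketch below (my own) evaluates V(n, r), the GSV lower bound (rounded up) and the Hamming upper bound (rounded down), reproducing 19 ≤ A(10, 3) ≤ 93, and also recovers the perfect-code count 2⁴ = 16 for the [7, 2⁴] Hamming code.

```python
from math import comb

def V(n, r):
    # number of points in a Hamming ball of radius r in F_2^n
    return sum(comb(n, i) for i in range(r + 1))

def gsv_lower(n, d):
    # GSV bound: A(n, d) >= 2^n / V(n, d - 1)
    return -(-2**n // V(n, d - 1))       # ceiling division

def hamming_upper(n, d):
    # Hamming bound: A(n, d) <= 2^n / V(n, floor((d - 1) / 2))
    return 2**n // V(n, (d - 1) // 2)
```

Note gsv_lower(10, 3) = 19 and hamming_upper(10, 3) = 93.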
We study (1/n) log A(n, ⌊nδ⌋) as n → ∞ to see how large the information rate can be for a
given error rate.
Proposition 2.9. Let 0 < δ < 1/2. Then
(i) V(n, ⌊nδ⌋) ≤ 2^{nH(δ)};
(ii) (1/n) log A(n, ⌊nδ⌋) ≥ 1 − H(δ).

Proof. We first show that (i) implies (ii). By the GSV bound,
A(n, ⌊nδ⌋) ≥ 2ⁿ/V(n, ⌊nδ⌋),
so
(1/n) log A(n, ⌊nδ⌋) ≥ 1 − (1/n) log V(n, ⌊nδ⌋) ≥ 1 − H(δ).

Now we prove (i). Since H is increasing for δ ≤ 1/2, we may assume nδ ∈ Z. Then
1 = (δ + (1 − δ))ⁿ
  = Σ_{i=0}^n C(n, i) δⁱ(1 − δ)^{n−i}
  ≥ Σ_{i=0}^{nδ} C(n, i) δⁱ(1 − δ)^{n−i}
  = Σ_{i=0}^{nδ} C(n, i) (1 − δ)ⁿ (δ/(1 − δ))ⁱ
  ≥ Σ_{i=0}^{nδ} C(n, i) (1 − δ)ⁿ (δ/(1 − δ))^{nδ}
  = δ^{nδ}(1 − δ)^{n(1−δ)} V(n, nδ)
  = 2^{−nH(δ)} V(n, nδ),
where the second inequality uses δ/(1 − δ) < 1 and i ≤ nδ. Hence V(n, nδ) ≤ 2^{nH(δ)}.
Lemma 2.10.
lim_{n→∞} (1/n) log V(n, ⌊nδ⌋) = H(δ).
Proof. Without loss of generality assume 0 < δ ≤ 1/2. Let 0 ≤ r ≤ n/2 and recall V(n, r) =
Σ_{i=0}^r C(n, i). Therefore,
C(n, r) ≤ V(n, r) ≤ (r + 1) C(n, r). (∗)
Stirling's formula states
ln n! = n ln n − n + O(log n),
so
ln C(n, r) = (n ln n − n) − (r ln r − r) − ((n − r) ln(n − r) − (n − r)) + O(log n),
log C(n, r) = −r log(r/n) − (n − r) log((n − r)/n) + O(log n)
            = nH(r/n) + O(log n).
By (∗),
H(r/n) + O((log n)/n) ≤ (1/n) log V(n, r) ≤ H(r/n) + O((log n)/n),
so taking r = ⌊nδ⌋,
lim_{n→∞} (1/n) log V(n, ⌊nδ⌋) = H(δ).
If δ ≥ 1/2, we can use the symmetry of the binomial coefficients and the entropy to swap
δ and 1 − δ.
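The convergence in Lemma 2.10 can be observed numerically. The sketch below (my own check; the tolerance 0.01 reflects the O((log n)/n) error term at n = 2000) compares (1/n) log V(n, ⌊nδ⌋) with H(δ).

```python
from math import comb, log2, floor

def H(p):
    # binary entropy function
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def normalised_log_V(n, delta):
    r = floor(n * delta)
    V = sum(comb(n, i) for i in range(r + 1))
    return log2(V) / n
```

By Proposition 2.9 (i) the quantity never exceeds H(δ), and by Lemma 2.10 it approaches it from below.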
Remark. Adding an overall parity check bit to a code C of length n gives the code
C⁺ = {(c₁, . . . , cₙ, Σ_{i=1}^n cᵢ) : (c₁, . . . , cₙ) ∈ C}
of length n + 1.
Chapter 3

Shannon's Theorems
Shannon's First Coding Theorem computes the information rate of certain sources, in-
cluding Bernoulli sources.
Consider a probability space (Ω, F, P). Recall that a random variable X is a function
defined on Ω with some range, e.g. R, Rⁿ, or Σ. We have a probability mass function
p_X : Σ → [0, 1],  x ↦ P(X = x).
We consider the random variable
p(X) = p_X ∘ X : Ω → [0, 1],  ω ↦ P(X = X(ω)).
Lemma 3.1. The information rate of a Bernoulli source X₁, X₂, . . . is at most the ex-
pected word length of an optimal code f : Σ → {0, 1}* for X₁.
Proof. Let Sᵢ be the length of the codeword for Xᵢ under f, fix ε > 0, and let Aₙ be the set
of strings (x₁, . . . , xₙ) whose total encoded length is less than n(E S₁ + ε). Then
P((X₁, . . . , Xₙ) ∈ Aₙ) = P(Σ_{i=1}^n Sᵢ < n(E S₁ + ε))
                        ≥ P(|(1/n) Σ_{i=1}^n Sᵢ − E S₁| < ε)
                        → 1 as n → ∞
by the weak law of large numbers, since the Sᵢ are i.i.d. Hence the source is reliably
encodable at rate E S₁ + ε for every ε > 0.
Corollary 3.2. From Lemma 3.1 and the Noiseless Coding Theorem, a Bernoulli source
X1 , X 2 , . . . has information rate less than H(X1 ) + 1.
The bound of Corollary 3.2 can be sharpened by blocking. Group the source into blocks of
length N:
X₁, . . . , X_N | X_{N+1}, . . . , X_{2N} | ⋯
      Y₁                Y₂
Then Y₁, Y₂, . . . is again a Bernoulli source, and applying Corollary 3.2 to it gives, for the
information rate H of the original source,
N H ≤ H(Y₁) + 1
    = H(X₁, . . . , X_N) + 1
    = Σ_{i=1}^N H(Xᵢ) + 1
    = N H(X₁) + 1,
so
H < H(X₁) + 1/N.
But N ≥ 1 is arbitrary, so H ≤ H(X₁).
Definition. A source X₁, X₂, . . . satisfies the asymptotic equipartition property (AEP)
with constant H if
−(1/n) log p(X₁, . . . , Xₙ) → H in probability as n → ∞.
Example. We toss a biased coin, P(Heads) = 2/3, P(Tails) = 1/3, 300 times. Typically
we get about 200 heads and 100 tails. Each such sequence occurs with probability
approximately (2/3)²⁰⁰(1/3)¹⁰⁰.
Lemma 3.4. A source X₁, X₂, . . . satisfying the AEP with constant H has information
rate H. Here, for ε > 0, the typical sets are
Tₙ = {(x₁, . . . , xₙ) ∈ Σⁿ : 2^{−n(H+ε)} ≤ p(x₁, . . . , xₙ) ≤ 2^{−n(H−ε)}},
and the AEP says precisely that P((X₁, . . . , Xₙ) ∈ Tₙ) → 1 as n → ∞.

Proof. Let ε > 0 and let Tₙ ⊆ Σⁿ be typical sets. Then for all (x₁, . . . , xₙ) ∈ Tₙ,
p(x₁, . . . , xₙ) ≥ 2^{−n(H+ε)},
so 1 ≥ |Tₙ| 2^{−n(H+ε)} and hence
(1/n) log|Tₙ| ≤ H + ε.
Taking Aₙ = Tₙ shows that the source is reliably encodable at rate H + ε.

Conversely, if H = 0 we are done; otherwise pick 0 < ε < H/2. We suppose for a
contradiction that the source is reliably encodable at rate H − 2ε, say with sets Aₙ ⊆ Σⁿ.
Let Tₙ ⊆ Σⁿ be typical sets. Then for all (x₁, . . . , xₙ) ∈ Tₙ,
p(x₁, . . . , xₙ) ≤ 2^{−n(H−ε)},
so
P(Aₙ ∩ Tₙ) ≤ 2^{−n(H−ε)} |Aₙ|,
(1/n) log P(Aₙ ∩ Tₙ) ≤ −(H − ε) + (1/n) log|Aₙ| ≤ −(H − ε) + (H − 2ε) = −ε.
Hence
log P(Aₙ ∩ Tₙ) → −∞ as n → ∞,
P(Aₙ ∩ Tₙ) → 0 as n → ∞.
But P(Aₙ ∩ Tₙ) ≥ P(Aₙ) + P(Tₙ) − 1 → 1, a contradiction.
Entropy as an Expectation
Note 6. For the entropy H, we have H(X) = E[−log p(X)]. For example, if X, Y are independent,
p(X, Y) = p(X)p(Y),
−log p(X, Y) = −log p(X) − log p(Y),
H(X, Y) = H(X) + H(Y),
recovering Lemma 1.9.
Proof. We have
−(1/n) log p(X₁, . . . , Xₙ) = −(1/n) Σ_{i=1}^n log p(Xᵢ) → E[−log p(X₁)] = H(X₁)
in probability, by the WLLN, using that X₁, X₂, . . . are independent identically distributed
random variables and hence so are −log p(X₁), −log p(X₂), . . . .
Carefully writing out the definition of convergence in probability shows that the AEP
holds with constant H(X₁). (This is left as an exercise.) We conclude using Shannon's
First Coding Theorem.
Remark. Many sources, which are not necessarily Bernoulli, satisfy the AEP. Under
suitable hypotheses, the sequence (1/n) H(X₁, . . . , Xₙ) is decreasing and the AEP is satisfied
with constant
H = lim_{n→∞} (1/n) H(X₁, . . . , Xₙ).
For English text, empirical estimates give
H(X₁) ≈ 4.03,
(1/2) H(X₁, X₂) ≈ 3.32,
(1/3) H(X₁, X₂, X₃) ≈ 3.10.
It is generally believed that English has entropy H a bit bigger than 1, so about 75%
redundancy (as 1 − H/log|Σ| ≈ 1 − 1/4 = 3/4).
Definition. Consider a communication channel with input alphabet Σ₁ and output alpha-
bet Σ₂. A code of length n is a subset C ⊆ Σ₁ⁿ. The error rate is
e(C) = max_{c∈C} P(error | c sent),
and the information rate is ρ(C) = (1/n) log|C|.

Definition. The capacity of the channel is the supremum of all reliable transmission
rates.
Proposition. A BSC with error probability p < 1/4 has non-zero capacity.

Proof. The idea is to use the GSV bound. Pick δ with 2p < δ < 1/2. We will show reliable
transmission at rate R = 1 − H(δ) > 0. Let Cₙ be a code of length n and minimum
distance ⌊nδ⌋ of maximal size. Then by the GSV bound and Proposition 2.9,
|Cₙ| ≥ 2ⁿ/V(n, ⌊nδ⌋) ≥ 2^{n(1−H(δ))} = 2^{nR}.
Using minimum distance decoding,
e(Cₙ) ≤ P(BSC makes more than (⌊nδ⌋ − 1)/2 errors).
Pick ε > 0 with p + ε < δ/2. For n sufficiently large,
(⌊nδ⌋ − 1)/2 > n(p + ε),
so
e(Cₙ) ≤ P(BSC makes more than n(p + ε) errors)
      → 0 as n → ∞
by Lemma 3.8.
Lemma 3.8. Let ε > 0. A BSC with error probability p is used to transmit n digits.
Then
lim_{n→∞} P(BSC makes at least n(p + ε) errors) = 0.

Proof. Let
Uᵢ = 1 if the i-th digit is mistransmitted, 0 otherwise.
Then U₁, . . . , Uₙ are independent with
P(Uᵢ = 1) = p,
P(Uᵢ = 0) = 1 − p,
so
P(BSC makes at least n(p + ε) errors) ≤ P(|(1/n) Σ_{i=1}^n Uᵢ − p| ≥ ε) → 0
as n → ∞ by the WLLN.
Conditional Entropy
Let X, Y be random variables taking values in alphabets Σ₁, Σ₂.
Definition. We define
H(X | Y = y) = −Σ_{x∈Σ₁} P(X = x | Y = y) log P(X = x | Y = y),
H(X | Y) = Σ_{y∈Σ₂} P(Y = y) H(X | Y = y).

Lemma 3.9.
H(X, Y) = H(X | Y) + H(Y).

Proof.
H(X | Y) = −Σ_{x∈Σ₁} Σ_{y∈Σ₂} P(Y = y) P(X = x | Y = y) log P(X = x | Y = y)
= −Σ_{x∈Σ₁} Σ_{y∈Σ₂} P(X = x, Y = y) log (P(X = x, Y = y)/P(Y = y))
= −Σ_{(x,y)∈Σ₁×Σ₂} P(X = x, Y = y) log P(X = x, Y = y)
  + Σ_{y∈Σ₂} (Σ_{x∈Σ₁} P(X = x, Y = y)) log P(Y = y)
= H(X, Y) − H(Y),
since the inner sum in the second term is P(Y = y).
Corollary 3.10. H(X | Y) ≤ H(X), with equality if and only if X, Y are independent.
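The chain rule of Lemma 3.9 is easy to verify numerically on a small joint distribution. The sketch below (my own check, representing the joint distribution as a dictionary) computes H(X, Y), H(Y) and H(X | Y) directly from the definitions.

```python
from math import log2

def H_joint(p):
    # p maps pairs (x, y) to probabilities
    return -sum(q * log2(q) for q in p.values() if q > 0)

def H_marginal_Y(p):
    py = {}
    for (x, y), q in p.items():
        py[y] = py.get(y, 0) + q
    return -sum(q * log2(q) for q in py.values() if q > 0)

def H_cond_X_given_Y(p):
    # H(X | Y) computed from conditional probabilities P(X = x | Y = y)
    py = {}
    for (x, y), q in p.items():
        py[y] = py.get(y, 0) + q
    return -sum(q * log2(q / py[y]) for (x, y), q in p.items() if q > 0)
```

For the joint distribution {(0,0): 1/2, (0,1): 1/4, (1,1): 1/4} one finds H(X, Y) = 1.5, H(Y) = 1 and H(X | Y) = 0.5, confirming H(X, Y) = H(X | Y) + H(Y).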
Note 9. H(X, Y | Z) denotes the entropy of X and Y given Z , not the entropy of X
and Y | Z.
Lemma 3.11. Let X, Y, Z be random variables. Then
H(X | Y) ≤ H(X | Y, Z) + H(Z). (∗)

Lemma 3.12 (Fano's inequality). Let X, Y take values in an alphabet of size m and let
p = P(X ≠ Y). Then H(X | Y) ≤ H(p) + p log(m − 1).

Proof. Let
Z = 0 if X = Y,  Z = 1 if X ≠ Y.
Then P(Z = 0) = 1 − p, P(Z = 1) = p and so H(Z) = H(p). Now
H(X | Y = y, Z = 0) = 0,
H(X | Y = y, Z = 1) ≤ log(m − 1),
since given Z = 1 and Y = y, X takes one of at most m − 1 values. Therefore,
H(X | Y, Z) = Σ_{y,z} P(Y = y, Z = z) H(X | Y = y, Z = z)
            ≤ Σ_y P(Y = y, Z = 1) log(m − 1)
            = P(Z = 1) log(m − 1)
            = p log(m − 1).
Now by (∗),
H(X | Y) ≤ H(p) + p log(m − 1).
Definition. The mutual information of X and Y is
I(X; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X).
The information capacity of a channel is the maximum of I(X; Y) over all distributions
of the input X.

Theorem 3.13 (Shannon's Second Coding Theorem). For a DMC, the capacity equals
the information capacity.

[Figure: the information capacity 1 − H(p) of the BSC plotted as a function of p.]
Note 11. We can compute either H(Y) − H(Y | X) or H(X) − H(X | Y). Often one is easier.
Example. Consider a binary erasure channel with erasure probability p, input X and
output Y. Take P(X = 1) = α, P(X = 0) = 1 − α. Then
P(Y = 0) = (1 − α)(1 − p),  P(Y = ?) = p,  P(Y = 1) = α(1 − p).
Then
H(X | Y = 0) = 0,
H(X | Y = ?) = H(α),
H(X | Y = 1) = 0,
so H(X | Y) = pH(α) and I(X; Y) = H(X) − H(X | Y) = (1 − p)H(α). This is maximal
for α = 1/2, so the information capacity is 1 − p.

[Figure: the capacity 1 − p of the binary erasure channel plotted as a function of p.]
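The computation I(X; Y) = (1 − p)H(α) for the erasure channel can be checked numerically. The sketch below (my own) evaluates the mutual information and confirms the maximum 1 − p at α = 1/2.

```python
from math import log2

def H(p):
    # binary entropy function
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def bec_mutual_information(alpha, p):
    # I(X; Y) = H(X) - H(X | Y); X is only uncertain when Y = '?', which
    # happens with probability p, so H(X | Y) = p * H(alpha)
    return H(alpha) - p * H(alpha)
```

A grid search over α confirms the capacity: for p = 1/4 the maximum of I(X; Y) is 3/4, attained at α = 1/2.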
Lemma 3.14. The nth extension of a DMC with information capacity C has information
capacity nC.

Proof. Since the channel is memoryless,
H(Y₁, . . . , Yₙ | X₁, . . . , Xₙ) = Σ_{i=1}^n H(Yᵢ | X₁, . . . , Xₙ)
                               = Σ_{i=1}^n H(Yᵢ | Xᵢ).
Therefore,
I(X₁, . . . , Xₙ; Y₁, . . . , Yₙ) = H(Y₁, . . . , Yₙ) − H(Y₁, . . . , Yₙ | X₁, . . . , Xₙ)
= H(Y₁, . . . , Yₙ) − Σ_{i=1}^n H(Yᵢ | Xᵢ)
≤ Σ_{i=1}^n H(Yᵢ) − Σ_{i=1}^n H(Yᵢ | Xᵢ)
= Σ_{i=1}^n I(Xᵢ; Yᵢ)
≤ nC.
Taking X₁, . . . , Xₙ independent, each with the input distribution achieving C, gives equality.
Proposition 3.15. For a DMC, the capacity is at most the information capacity.

Proof. Let C be the information capacity. Suppose for a contradiction that we can
transmit reliably at some rate R > C: there are codes Cₙ of length n with
lim_{n→∞} ρ(Cₙ) = R,
lim_{n→∞} e(Cₙ) = 0.
Recall
e(Cₙ) = max_{c∈Cₙ} P(error | c sent).
Let X be uniformly distributed on Cₙ, send X through the nth extension of the channel,
receive Y, and decode to obtain Z. Then P(X ≠ Z) ≤ e(Cₙ) and, by Fano's inequality
applied to X and Z (with m = |Cₙ| = 2^{nρ(Cₙ)}),
H(X | Y) ≤ H(X | Z) ≤ 1 + e(Cₙ) log(|Cₙ| − 1) ≤ 1 + e(Cₙ) nρ(Cₙ),
using H(p) ≤ 1. Now by Lemma 3.14,
nC ≥ I(X; Y)
   = H(X) − H(X | Y)
   ≥ log|Cₙ| − (1 + e(Cₙ)nρ(Cₙ))
   = nρ(Cₙ) − e(Cₙ)nρ(Cₙ) − 1.
Rearranging,
e(Cₙ)nρ(Cₙ) ≥ n(ρ(Cₙ) − C) − 1,
e(Cₙ) ≥ (ρ(Cₙ) − C)/ρ(Cₙ) − 1/(nρ(Cₙ)) → (R − C)/R as n → ∞.
Since R > C, this contradicts e(Cₙ) → 0 as n → ∞. This shows that we cannot transmit
reliably at any rate R > C, hence the capacity is at most C.
To complete the proof of Shannon's Second Coding Theorem for a BSC with error
probability p, we must show that the capacity is at least 1 − H(p).
Proposition 3.16. Consider a BSC with error probability p. Let R < 1 − H(p). Then
there exist codes C₁, C₂, . . . with Cₙ of length n and such that
lim_{n→∞} ρ(Cₙ) = R,
lim_{n→∞} ē(Cₙ) = 0,
where ē denotes the average error probability over codewords.
Note 12. Note that Proposition 3.16 is concerned with the average error rate ē
rather than the maximum error rate e.
Proof. The idea of the proof is to pick codes at random. Without loss of generality,
assume p < 1/2. Take ε > 0 such that
p + ε < 1/2,
R < 1 − H(p + ε).
Note this is possible since H is continuous. Let m = ⌊2^{nR}⌋ and let Ω be the set of [n, m]-
codes, so |Ω| = C(2ⁿ, m). Let C be a random variable equidistributed in Ω. Say C =
{X₁, . . . , X_m} where the Xᵢ are random variables taking values in F₂ⁿ such that
P(Xᵢ = x | C = C) = 1/m if x ∈ C, 0 otherwise.
Note that
P(X₂ = x₂ | X₁ = x₁) = 1/(2ⁿ − 1) if x₁ ≠ x₂, and 0 if x₁ = x₂.
We send X = X₁ through the BSC, receive Y, and decode to obtain Z. Using minimum
distance decoding,
P(X ≠ Z) = (1/|Ω|) Σ_{C∈Ω} ē(C).
Put r = ⌊n(p + ε)⌋. If X ≠ Z then either the BSC made more than r errors, whose
probability tends to 0 by Lemma 3.8, or B(Y, r) contains a codeword other than X, and
P(B(Y, r) ∩ C ⊋ {X}) ≤ Σ_{i=2}^m P(Xᵢ ∈ B(Y, r) and X₁ ∈ B(Y, r))
≤ Σ_{i=2}^m P(Xᵢ ∈ B(Y, r) | X₁ ∈ B(Y, r))
= (m − 1)(V(n, r) − 1)/(2ⁿ − 1)
≤ m V(n, r)/2ⁿ
≤ 2^{nR} 2^{nH(p+ε)} 2^{−n}
= 2^{n[R − (1 − H(p+ε))]}
→ 0,
as n → ∞, since R < 1 − H(p + ε). We have used Proposition 2.9 to obtain the penultimate
inequality. Hence P(X ≠ Z) → 0; since this is the average of ē(C) over Ω, for each n we may
pick a code Cₙ with ē(Cₙ) at most this average. Then ρ(Cₙ) → R and ē(Cₙ) → 0 as n → ∞.
Proposition 3.17. Let R < 1 − H(p). Then there exist codes C₁, C₂, . . . with Cₙ of length n
such that lim ρ(Cₙ) = R and lim e(Cₙ) = 0.

Proof sketch. Apply Proposition 3.16 at a slightly larger rate to obtain codes C′ₙ with
ē(C′ₙ) → 0. Order the codewords of C′ₙ by P(error | c sent) and delete the worse half to
obtain Cₙ; then
e(Cₙ) ≤ 2ē(C′ₙ),
since otherwise the deleted codewords alone would force the average above ē(C′ₙ).
Then ρ(Cₙ) → R and e(Cₙ) → 0 as n → ∞.
Proposition 3.17 says that we can transmit reliably at any rate R < 1 H(p), so the
capacity is at least 1 H(p). But by Proposition 3.15, the capacity is at most 1 H(p),
hence a BSC with error probability p has capacity 1 H(p).
Remark. The proof shows that good codes exist, but does not tell us how to construct
them.
Chapter 4

Linear and Cyclic Codes

Definition. A code C ⊆ F₂ⁿ is linear if
(i) 0 ∈ C;
(ii) whenever x, y ∈ C then x + y ∈ C.
Equivalently, C is an F₂-vector subspace of F₂ⁿ.
Definition. The rank of C is its dimension as an F₂-vector subspace. A linear code of
length n and rank k is an (n, k)-code. If the minimum distance is d, it is an (n, k, d)-code.
A linear code of rank k has |C| = 2ᵏ, so an (n, k)-code is an [n, 2ᵏ]-code. The information
rate is ρ(C) = k/n.
Example. For P ⊆ F₂ⁿ, let
C = {x ∈ F₂ⁿ : p·x = 0 for all p ∈ P}.
This is a parity check code, so it is linear. Beware that we can have C ∩ C⊥ ≠ {0}.
One shows C = (C⊥)⊥, so taking the rows of a matrix H to span C⊥, every linear code is a
parity check code:
C = {x ∈ F₂ⁿ : Hx = 0}.
Syndrome Decoding
Let C be an (n, k)-linear code. Recall that
C = {Gᵀy : y ∈ F₂ᵏ} where G is the generator matrix;
C = {x ∈ F₂ⁿ : Hx = 0} where H is the parity check matrix.
Lemma 4.5. Every (n, k)-linear code is equivalent to one with generator matrix G =
(I_k | B) for some k × (n − k) matrix B.

Proof. Using Gaussian elimination, i.e. row operations, we can transform G into row
echelon form, i.e.
G_{ij} = 0 if j < l(i),  G_{ij} = 1 if j = l(i),
for some l(1) < l(2) < ⋯ < l(k). Permuting columns replaces the code by an equivalent
code, so without loss of generality we may assume l(i) = i for all 1 ≤ i ≤ k. Further row
operations clear the entries above each leading 1, leaving G = (I_k | B).
Proof of Lemma 4.3. Without loss of generality C has generator matrix G = (I_k | B). G
has k linearly independent columns, so the linear map F₂ⁿ → F₂ᵏ, x ↦ Gx, is surjective
with kernel C⊥, so by the rank-nullity theorem we obtain rank C⊥ = n − k.
Lemma 4.6. An (n, k)-linear code with generator matrix G = (I_k | B) has parity check
matrix H = (Bᵀ | I_{n−k}); so C⊥ has generator matrix H.
Hamming Codes
Definition. For d ≥ 1, let n = 2ᵈ − 1. Let H be the d × n matrix whose columns are
the non-zero elements of F₂ᵈ. The Hamming (n, n − d)-code is the linear code with parity
check matrix H. For example, with d = 3,
H = ( 1 0 1 0 1 0 1
      0 1 1 0 0 1 1
      0 0 0 1 1 1 1 ).
Lemma 4.7. The minimum distance of the (n, n − d) Hamming code C is d(C) = 3. It
is a perfect 1-error correcting code.

Proof. The codewords of C are dependence relations between the columns of H. Any
two columns of H are linearly independent, so there are no non-zero codewords of weight
at most 2. Hence d(C) ≥ 3. Since the sum of any two distinct columns of H is another
non-zero column, there is a codeword of weight 3, so d(C) = 3 and C is 1-error correcting.
Finally,
2ⁿ/V(n, 1) = 2ⁿ/(n + 1) = 2^{n−d} = |C|,
so C is perfect.
Reed–Muller Codes
Take a set X such that |X| = n, X = {P₁, . . . , Pₙ}. There is a correspondence between
P(X) and F₂ⁿ:
P(X) ↔ {f : X → F₂} ↔ F₂ⁿ,
A ↦ 1_A,  f ↦ (f(P₁), . . . , f(Pₙ)).
Now take X = F₂ᵈ, so n = 2ᵈ. Let v₀ = 1_X and, for 1 ≤ i ≤ d, let vᵢ = 1_{Hᵢ} where
Hᵢ = {p ∈ X : pᵢ = 0}. Products of such vectors are taken pointwise: (x ∧ y)ᵢ = xᵢyᵢ.
The Reed–Muller code RM(d, r) is spanned by the products vᵢ₁ ∧ ⋯ ∧ vᵢₛ with 0 ≤ s ≤ r
(the empty product being v₀).
Example. Let d = 3, with X ordered as {000, 001, 010, 011, 100, 101, 110, 111}.
v₀            1 1 1 1 1 1 1 1
v₁            1 1 1 1 0 0 0 0
v₂            1 1 0 0 1 1 0 0
v₃            1 0 1 0 1 0 1 0
v₁ ∧ v₂       1 1 0 0 0 0 0 0
v₂ ∧ v₃       1 0 0 0 1 0 0 0
v₁ ∧ v₃       1 0 1 0 0 0 0 0
v₁ ∧ v₂ ∧ v₃  1 0 0 0 0 0 0 0
Theorem 4.8. (i) The vectors vᵢ₁ ∧ ⋯ ∧ vᵢₛ for 1 ≤ i₁ < i₂ < ⋯ < iₛ ≤ d and
0 ≤ s ≤ d are a basis for F₂ⁿ.
(ii) rank RM(d, r) = Σ_{s=0}^r C(d, s).

Proof. (i) We have listed Σ_{s=0}^d C(d, s) = (1 + 1)ᵈ = 2ᵈ = n vectors, so it suffices to check
spanning, i.e. check RM(d, d) = F₂ⁿ. Let p ∈ X and
yᵢ = vᵢ if pᵢ = 0,  yᵢ = v₀ + vᵢ if pᵢ = 1.
Then 1_{p} = y₁ ∧ ⋯ ∧ y_d. Expand this using the distributive law to show 1_{p} ∈
RM(d, d). But the 1_{p} for p ∈ X span F₂ⁿ, so the given vectors form a basis.
(ii) RM(d, r) is spanned by the vectors vᵢ₁ ∧ ⋯ ∧ vᵢₛ for 1 ≤ i₁ < ⋯ < iₛ ≤ d with
0 ≤ s ≤ r. These vectors are linearly independent by (i), so form a basis. Therefore,
rank RM(d, r) = Σ_{s=0}^r C(d, s).
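The basis vectors and the rank formula can be checked by machine. The sketch below (my own illustration) builds the vᵢ for d = 3, forms the wedge products spanning RM(d, r), and computes ranks over F₂ by Gaussian elimination on bitmasks.

```python
from itertools import combinations, product

d = 3
X = list(product([0, 1], repeat=d))    # the 2^d points p = (p_1, ..., p_d)
n = len(X)
v0 = [1] * n
v = {i: [1 if p[i - 1] == 0 else 0 for p in X] for i in range(1, d + 1)}

def wedge(*vecs):
    # pointwise product of indicator vectors
    return [min(col) for col in zip(*vecs)]

def rm_spanning_set(d, r):
    vecs = [v0]
    for s in range(1, r + 1):
        for idxs in combinations(range(1, d + 1), s):
            vecs.append(wedge(*(v[i] for i in idxs)))
    return vecs

def rank_F2(vecs):
    # Gaussian elimination over F_2, rows packed into integers
    basis = {}                         # leading bit -> reduced row
    for vec in vecs:
        r = int(''.join(map(str, vec)), 2)
        while r:
            b = r.bit_length() - 1
            if b not in basis:
                basis[b] = r
                break
            r ^= basis[b]
    return len(basis)
```

This confirms rank RM(3, r) = 1, 4, 7, 8 for r = 0, 1, 2, 3, matching Σ C(3, s).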
For codes C₁, C₂ of the same length, the bar product is C₁ | C₂ = {(x | x + y) : x ∈ C₁, y ∈ C₂}.

Theorem. (i) RM(d, r) = RM(d − 1, r) | RM(d − 1, r − 1). (ii) RM(d, r) has minimum
distance 2^{d−r}.

Proof. (i) Order X = F₂ᵈ so that
v_d = (00 . . . 0 | 11 . . . 1).
A spanning element of RM(d, r) is either a product x not involving v_d, which has the form
(x′ | x′) with x′ ∈ RM(d − 1, r), or of the form y ∧ v_d with y = (y′ | y′), y′ ∈ RM(d − 1, r − 1).
Then
z = x + y ∧ v_d
  = (x′ | x′) + (y′ | y′) ∧ (00 . . . 0 | 11 . . . 1)
  = (x′ | x′ + y′).
So z ∈ RM(d − 1, r) | RM(d − 1, r − 1); comparing ranks gives equality.
(ii) If r = 0 then RM(d, 0) is a repetition code of length n = 2ᵈ. This has minimum
distance 2^{d−0}. If r = d then RM(d, d) = F₂ⁿ with minimum distance 1 = 2^{d−d}.
For 0 < r < d, induction on d using
RM(d, r) = RM(d − 1, r) | RM(d − 1, r − 1)
and the bar product inequality d(C₁ | C₂) = min(2d(C₁), d(C₂)) gives
d(RM(d, r)) = min(2 · 2^{(d−1)−r}, 2^{(d−1)−(r−1)}) = 2^{d−r}.
For a ring R,
R[X] = {Σ_{i=0}^n aᵢXⁱ : a₀, . . . , aₙ ∈ R, n ∈ N}.
Remark. By definition, Σ_{i=0}^n aᵢXⁱ = 0 if and only if aᵢ = 0 for all i. Thus f(X) =
X² + X ∈ F₂[X] is non-zero, yet f(a) = 0 for all a ∈ F₂.
Let F be any field. The rings Z and F[X] both have a division algorithm: if a, b ∈ Z,
b ≠ 0, then there exist q, r ∈ Z such that a = qb + r and 0 ≤ r < |b|. If f, g ∈ F[X],
g ≠ 0, then there exist q, r ∈ F[X] such that f = qg + r with deg(r) < deg(g).
Definition. An ideal I ⊆ R is a subgroup under addition such that r ∈ R, x ∈ I ⇒ rx ∈ I.
The principal ideal generated by x ∈ R is (x) = {rx : r ∈ R}.
Fact. Every non-zero element of Z or F[X] can be factored into irreducibles, uniquely
up to order and multiplication by units.
If I ⊆ R is an ideal then the set of cosets R/I = {x + I : x ∈ R} is a ring, called the
quotient ring, under the natural choice of + and ×. In practice, we identify Z/nZ with
{0, 1, . . . , n − 1} and agree to reduce modulo n after each + and ×. Similarly,
F[X]/(f(X)) = {Σ_{i=0}^{n−1} aᵢXⁱ : a₀, . . . , a_{n−1} ∈ F} ≅ Fⁿ
where n = deg f, reducing after each multiplication using the division algorithm.
Cyclic Codes
Definition. A linear code C ⊆ F₂ⁿ is cyclic if (c₀, c₁, . . . , c_{n−1}) ∈ C implies
(c_{n−1}, c₀, . . . , c_{n−2}) ∈ C. Identifying F₂ⁿ with F₂[X]/(Xⁿ − 1) via
(c₀, . . . , c_{n−1}) ↔ c₀ + c₁X + ⋯ + c_{n−1}X^{n−1}, this says
(i) 0 ∈ C,
(ii) f, g ∈ C ⇒ f + g ∈ C,
(iii) f ∈ F₂[X], g ∈ C ⇒ fg ∈ C.
Equivalently, C is an ideal in F₂[X]/(Xⁿ − 1). (Multiplication by X is the cyclic shift;
closure under multiplication by arbitrary f then follows by (ii).)
Basic Problem
Our basic problem is to find all cyclic codes of length n.

Theorem. Let C be a non-zero cyclic code of length n. Then there is a unique polynomial
g(X), the generator polynomial of C, such that
(i) a polynomial p(X) represents a codeword of C if and only if g(X) | p(X);
(ii) g(X) | Xⁿ − 1.

Proof. Let g(X) ∈ F₂[X] be of least degree representing a non-zero codeword. Note
deg g < n. Since C is an ideal, every multiple of g(X) represents a codeword; this gives ⇐ in (i).
Let p(X) ∈ F₂[X] represent a codeword. By the division algorithm, p(X) = q(X)g(X) +
r(X) for some q, r ∈ F₂[X] with deg r < deg g. So r(X) = p(X) − q(X)g(X) ∈ C,
contradicting the choice of g(X) unless r(X) is a multiple of Xⁿ − 1, hence r(X) = 0 as
deg r < deg g < n; i.e. g(X) | p(X). This shows ⇒ in (i).
Taking p(X) = Xⁿ − 1 gives (ii).
Uniqueness. Suppose g₁(X), g₂(X) both satisfy (i) and (ii). Then g₁(X) | g₂(X) and
g₂(X) | g₁(X), so g₁(X) = ug₂(X) for some unit u. But units in F₂[X] are F₂ \ {0} = {1},
so g₁(X) = g₂(X).
Lemma 4.13. Let C be a cyclic code of length n with generator polynomial g(X) =
a₀ + a₁X + ⋯ + a_kXᵏ, a_k ≠ 0. Then C has basis g(X), Xg(X), . . . , X^{n−k−1}g(X). In
particular, C has rank n − k.
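As a concrete illustration (my own, not from the lectures): g(X) = 1 + X + X³ divides X⁷ − 1 over F₂, and the cyclic code it generates has rank 7 − 3 = 4, so 16 codewords; its minimum weight turns out to be 3, matching the (7, 4) Hamming code.

```python
def polymul_mod2(f, g):
    # multiply polynomials over F_2; coefficient lists, lowest degree first
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] ^= a & b
    return out

n = 7
g = [1, 1, 0, 1]                       # g(X) = 1 + X + X^3, divides X^7 - 1
k = len(g) - 1                         # deg g = 3, so rank n - k = 4
codewords = set()
for m in range(2 ** (n - k)):          # all messages of degree < n - k
    msg = [(m >> i) & 1 for i in range(n - k)]
    c = polymul_mod2(msg, g)           # deg < 7, so no reduction needed
    codewords.add(tuple((c + [0] * n)[:n]))
```

The code is closed under the cyclic shift, as the theory predicts.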
Write h(X) = b₀ + b₁X + ⋯ + b_{n−k}X^{n−k} for the check polynomial, defined by
g(X)h(X) = Xⁿ − 1. Then C has parity check matrix
H = ( b_{n−k} ⋯ b₁ b₀  0  ⋯  0 )
    (  0  b_{n−k} ⋯ b₁ b₀ ⋯ 0 )
    (        ⋱            ⋱    )
    (  0  ⋯  0  b_{n−k} ⋯ b₁ b₀ ),
the k × n matrix whose rows are shifts of the reversed check polynomial.
Indeed, the dot product of the i-th row of G and the j-th row of H is the coefficient of
X^{(n−k−i)+j} in g(X)h(X). But 1 ≤ i ≤ n − k and 1 ≤ j ≤ k, so 0 < (n − k − i) + j < n.
These coefficients of g(X)h(X) = Xⁿ − 1 are zero, hence the rows of G and H are
orthogonal. Also rank H = k = rank C⊥, so H is a parity check matrix.
Remark. The generator polynomial of the dual code is the reverse of the check polynomial.
Lemma 4.15. If n is odd then Xⁿ − 1 = f₁(X) . . . f_t(X) with f₁(X), . . . , f_t(X) distinct
irreducibles in F₂[X]. (Note this is false for n even, e.g. X² − 1 = (X − 1)² in F₂[X].)
In particular, there are 2ᵗ cyclic codes of length n.
Proof. Suppose Xⁿ − 1 has a repeated factor. Then there exists a field extension K/F₂
such that Xⁿ − 1 = (X − a)²g(X) for some a ∈ K and some g(X) ∈ K[X]. Taking
formal derivatives, nX^{n−1} = 2(X − a)g(X) + (X − a)²g′(X), so na^{n−1} = 0. Since
n is odd, n ≠ 0 in F₂, so a^{n−1} = 0, i.e. a = 0; but then 0 = aⁿ = 1, a contradiction.
Finite Fields
Theorem A. Suppose p is prime and Fₚ = Z/pZ. Let f(X) ∈ Fₚ[X] be irreducible. Then
K = Fₚ[X]/(f(X)) is a field of order p^{deg f}, and every finite field arises in this way.
Theorem B. Let q = pʳ be a prime power. Then there exists a field F_q of order q and
it is unique up to isomorphism.
BCH Codes
Let n be an odd integer. Pick r ≥ 1 such that 2ʳ ≡ 1 (mod n). (This exists since
(2, n) = 1.) Let K = F_{2ʳ} and let μₙ(K) = {x ∈ K : xⁿ = 1} ≤ K*. Since n | (2ʳ − 1) =
|K*|, μₙ(K) is a cyclic group of order n. So μₙ(K) = {1, α, . . . , α^{n−1}} for some α ∈ K;
α is called a primitive nth root of unity.
The cyclic code with defining set A ⊆ μₙ(K) is
C = {f(X) ∈ F₂[X]/(Xⁿ − 1) : f(a) = 0 for all a ∈ A}.
Its generator polynomial is the non-zero polynomial g(X) of least degree such that
g(a) = 0 for all a ∈ A. Equivalently, g(X) is the least common multiple of the minimal
polynomials of the elements a ∈ A.
Definition. The cyclic code with defining set A = {α, α², . . . , α^{δ−1}} is called a BCH
(Bose, Ray-Chaudhuri, Hocquenghem) code with design distance δ.

Theorem. A BCH code C with design distance δ has minimum distance d(C) ≥ δ.

Proof. Consider the (δ − 1) × n matrix
H = ( 1   α        α²          ⋯  α^{n−1}
      1   α²       α⁴          ⋯  α^{2(n−1)}
      ⋮
      1   α^{δ−1}  α^{2(δ−1)}  ⋯  α^{(δ−1)(n−1)} ).
By Lemma 4.17, any δ − 1 columns of H are linearly independent. But any codeword of
C is a dependence relation between the columns of H. Hence every non-zero codeword
has weight at least δ. Therefore, d(C) ≥ δ.
Note 14. H is not a parity check matrix in the usual sense; its entries are not in F₂.
Decoding. Identify F₂ⁿ with F₂[X]/(Xⁿ − 1). Suppose the codeword c(X) is sent and
r(X) = c(X) + e(X) is received, where the error pattern e(X) = Σ_{i∈E} Xⁱ has error
positions E = {0 ≤ i ≤ n − 1 : eᵢ = 1}. Take t with δ = 2t + 1 and define the error locator
polynomial
σ(X) = Π_{i∈E} (1 − αⁱX).
Theorem 4.18. Assume deg σ = |E| ≤ t. Then σ(X) is the unique polynomial in K[X]
of least degree such that
(i) σ(0) = 1;
(ii) σ(X) Σ_{j=1}^{2t} r(αʲ)Xʲ ≡ ω(X) (mod X^{2t+1}) for some ω(X) with deg ω ≤ deg σ,
where
ω(X) = Σ_{i∈E} αⁱX Π_{j∈E, j≠i} (1 − αʲX).

Proof. Working with formal power series over K,
ω(X)/σ(X) = Σ_{i∈E} αⁱX/(1 − αⁱX)
= Σ_{i∈E} Σ_{j=1}^∞ (αⁱX)ʲ
= Σ_{j=1}^∞ (Σ_{i∈E} (αʲ)ⁱ) Xʲ
= Σ_{j=1}^∞ e(αʲ)Xʲ.
Therefore,
σ(X) Σ_{j=1}^∞ e(αʲ)Xʲ = ω(X).
Since c(αʲ) = 0 for 1 ≤ j ≤ 2t, we have e(αʲ) = r(αʲ) for these j, so
σ(X) Σ_{j=1}^{2t} r(αʲ)Xʲ ≡ ω(X) (mod X^{2t+1}).
We have checked (i) and (ii), with ω(X) = Xσ′(X) (as we are in characteristic 2), so
deg ω ≤ deg σ = |E| ≤ t.
For uniqueness, suppose σ̃(X), ω̃(X) ∈ K[X] also satisfy (i) and (ii) with deg σ̃ ≤ deg σ.
Note that if i ∈ E then
ω(α^{−i}) = Π_{j∈E, j≠i} (1 − α^{j−i}) ≠ 0,
so σ(X) and ω(X) have no common roots.
Decoding algorithm
Shift Registers
Definition. A (general) feedback shift register is a function f : F₂ᵈ → F₂ᵈ of the form
f(x₀, x₁, . . . , x_{d−1}) = (x₁, . . . , x_{d−1}, C(x₀, . . . , x_{d−1})) for some function C : F₂ᵈ → F₂. We
say the register has length d.

[Figure: a shift register with cells x₀, x₁, . . . , x_{d−1}, each cell feeding the function C,
whose output is fed back into the last cell.]

The register is linear (LFSR) if C is a linear map, say (x₀, . . . , x_{d−1}) ↦ Σ_{i=0}^{d−1} aᵢxᵢ.
The initial fill (y₀, y₁, . . . , y_{d−1}) produces an output sequence (yₙ)_{n≥0} given by
y_{n+d} = Σ_{i=0}^{d−1} aᵢ y_{n+i},
i.e. we have a sequence determined by a linear recurrence relation with auxiliary poly-
nomial P(X) = Xᵈ + a_{d−1}X^{d−1} + ⋯ + a₁X + a₀.
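An LFSR is a one-line loop in code. The sketch below (my own illustration) generates the output stream of the recurrence above; the example taps a = (1, 1) give y_{n+2} = yₙ + y_{n+1}, whose output is periodic with period 3.

```python
def lfsr(a, fill, num):
    """Output stream of the LFSR y_{n+d} = sum a_i y_{n+i} over F_2,
    given taps a = (a_0, ..., a_{d-1}) and initial fill (y_0, ..., y_{d-1})."""
    d = len(a)
    y = list(fill)
    while len(y) < num:
        y.append(sum(ai * yi for ai, yi in zip(a, y[-d:])) % 2)
    return y[:num]
```

For example, lfsr([1, 1], [0, 1], 9) produces 0, 1, 1, 0, 1, 1, 0, 1, 1.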
Lemma 4.19. The sequence (yₙ)_{n≥0} in F₂ is the output from a LFSR with auxiliary
polynomial P(X) if and only if
Σ_{i=0}^∞ yᵢXⁱ = A(X)/P̌(X)
for some A(X) ∈ F₂[X] with deg A < deg P, where P̌(X) = X^{deg P} P(X^{−1}) ∈ F₂[X].

Proof. Let P(X) = a_dXᵈ + ⋯ + a₁X + a₀ with a_d = 1. Then P̌(X) = a₀Xᵈ + ⋯ +
a_{d−1}X + a_d. The condition is that (Σ_{i=0}^∞ yᵢXⁱ) P̌(X) is a polynomial of degree less
than d. This holds if and only if
Σ_{i=0}^d aᵢ y_{n+i} = 0 for all n ≥ 0,
i.e.
y_{n+d} = Σ_{i=0}^{d−1} aᵢ y_{n+i} for all n ≥ 0,
using a_d = 1 and characteristic 2.
If we know that the register has length at least r, start with i = r. Compute det Aᵢ.
• If det Aᵢ ≠ 0, then d > i; replace i by i + 1 and repeat.
• If det Aᵢ = 0, solve (∗∗) for a₀, . . . , a_{d−1} by Gaussian elimination and test the
solution over as many terms of the sequence as we like. If it fails, then d > i;
replace i by i + 1 and repeat.
Chapter 5
Cryptography
There is some secret information shared by the sender and receiver, called the key, from K.
The unencrypted message is called the plaintext and is from M. The encrypted message
is called the ciphertext and it is from C. A cryptosystem consists of sets (K, M, C) with
functions
e : M × K → C
d : C × K → M
such that d(e(m, k), k) = m for all m ∈ M, k ∈ K.
Examples (i) and (ii) fail at level 2, at least for sufficiently random messages. They
even fail at level 1 if, e.g., the source is English text. For modern applications, level 3 is
desirable.
We model the key and the messages as independent random variables K and M taking
values in K and M. Put C = e(K, M ).
Denition. A cryptosystem has perfect secrecy if M and C are independent. Equiva-
lently, I(M ; C) = 0.
Lemma. A cryptosystem with perfect secrecy has |K| ≥ |M|.

Proof. Pick m₀ ∈ M and k₀ ∈ K with P(K = k₀) > 0. Let c₀ = e(m₀, k₀). For any
m ∈ M,
P(C = c₀ | M = m) = P(C = c₀) = P(C = c₀ | M = m₀) ≥ P(K = k₀) > 0.
So for each m ∈ M there exists k ∈ K such that e(m, k) = c₀. Since decryption must
recover m from c₀ and k, distinct m need distinct k. Therefore, |K| ≥ |M|.
Denition. The unicity distance is the least n for which H(K | C (n) ) = 0, i.e. the
smallest number of encrypted messages required to uniquely determine the key.
We assume that all keys are equally likely, that H(M^(n)) ≈ nH where H is the entropy per letter of the message source, and that H(C^(n)) ≈ n log|Σ|. Since the message is determined by the key and the ciphertext,

    H(K | C^(n)) = H(K) + H(M^(n)) − H(C^(n)) ≈ log|K| − n(log|Σ| − H).

So H(K | C^(n)) ≈ 0 if and only if

    n ≥ U := log|K| / (log|Σ| − H),

which is the unicity distance.
Recall that 0 ≤ H ≤ log|Σ|. To make the unicity distance large, we can make |K| large or use a message source with little redundancy.
Example. Suppose we can decrypt a substitution cipher after 40 letters. Here |Σ| = 26, |K| = 26! and U ≈ 40. Then for the entropy H_E of English text we get

    H_E ≈ log 26 − (log 26!)/40 ≈ 2.5.
Many cryptosystems are thought secure (and indeed used) beyond the unicity distance.
Stream Ciphers
We work with streams, i.e. sequences in F_2. For plaintext p_0, p_1, ... and key stream k_0, k_1, ... we set the ciphertext to be z_0, z_1, ... where z_n = p_n + k_n.

In the one-time pad, the key stream is a random sequence, known only to the sender and recipient. Let K_0, K_1, ... be i.i.d. random variables with P(K_j = 0) = P(K_j = 1) = 1/2. The ciphertext is Z_n = p_n + K_n, where the plaintext is fixed. Then Z_0, Z_1, ... are i.i.d. random variables with P(Z_j = 0) = P(Z_j = 1) = 1/2. Therefore, without knowledge of the key stream, deciphering is impossible. (Hence this has infinite unicity distance.)
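As a sketch, encryption and decryption with a one-time pad are the same XOR, z_n = p_n + k_n over F_2; here the `secrets` module stands in for a truly random key stream:

```python
import secrets

def xor_stream(bits, key):
    """z_n = p_n + k_n over F_2; applying the key stream twice recovers the input."""
    return [b ^ k for b, k in zip(bits, key)]

plain = [1, 0, 1, 1, 0, 0, 1, 0]
key = [secrets.randbits(1) for _ in plain]   # one random key bit per plaintext bit
cipher = xor_stream(plain, key)
assert xor_stream(cipher, key) == plain      # decryption is the same operation
```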
There are the following two problems with the use of one-time pads.
(i) We must generate a truly random key stream.
(ii) The key stream must be shared securely in advance, and it is as long as the message itself.
(i) is surprisingly tricky, but not a problem in practice. (ii) is the same problem we started with. In most applications the one-time pad is not practical. Instead, we generate k_0, k_1, ... using a feedback shift register, say of length d. Then we only need to share the initial fill k_0, k_1, ..., k_{d-1}.
Lemma. The output (x_n) of any feedback shift register of length d is eventually periodic, with period N ≤ 2^d.
Proof. Let the register be f : F_2^d → F_2^d and let v_i = (x_i, x_{i+1}, ..., x_{i+d-1}). Then v_{i+1} = f(v_i). Since |F_2^d| = 2^d, the vectors v_0, v_1, ..., v_{2^d} cannot all be distinct, so there exist 0 ≤ a < b ≤ 2^d such that v_a = v_b. Let M = a and N = b − a. Then v_M = v_{M+N}, and v_r = v_{r+N} for all r ≥ M (by induction, applying f), so x_r = x_{r+N} for all r ≥ M.
We can also generate new key streams from old ones as follows.
Lemma 5.4. Let (x_n) and (y_n) be the output from LFSRs of length M and N, respectively.
(i) The sequence (x_n + y_n) is the output from an LFSR of length M + N.
(ii) The sequence (x_n y_n) is the output from an LFSR of length M N.
Proof. We will assume that the auxiliary polynomials P(X), Q(X) each have distinct roots, say α_1, ..., α_M and β_1, ..., β_N, in some extension field K of F_2. Then x_n = ∑_{i=1}^{M} λ_i α_i^n and y_n = ∑_{j=1}^{N} μ_j β_j^n for some λ_i, μ_j ∈ K.
(i) x_n + y_n = ∑_{i=1}^{M} λ_i α_i^n + ∑_{j=1}^{N} μ_j β_j^n. This is produced by an LFSR with auxiliary polynomial P(X)Q(X).
(ii) x_n y_n = ∑_{i=1}^{M} ∑_{j=1}^{N} λ_i μ_j (α_i β_j)^n is the output of an LFSR with auxiliary polynomial ∏_{i=1}^{M} ∏_{j=1}^{N} (X − α_i β_j), which is in F_2[X] by the Symmetric Function Theorem.
(i) Adding the outputs of two LFSRs is no more economical than producing the same stream with a single LFSR.
(ii) Multiplying streams looks promising, until we realise that x_n y_n = 0 about 75% of the time.
Remark. Non-linear registers look appealing, but are difficult to analyse. In particular, the eavesdropper may understand them better than we do.
Given output streams (x_n), (y_n), (z_n) from three LFSRs, we can multiplex them:

    k_n = x_n if z_n = 0,  and  k_n = y_n if z_n = 1.

To apply Lemma 5.4, write k_n = x_n + z_n(x_n + y_n) to deduce that (k_n) is again the output from an LFSR.
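The identity k_n = x_n + z_n(x_n + y_n) over F_2 can be checked exhaustively:

```python
from itertools import product

# x + z(x + y) over F_2 selects x when z = 0 and y when z = 1
for x, y, z in product((0, 1), repeat=3):
    selected = x if z == 0 else y
    assert (x + z * (x + y)) % 2 == selected
```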
Stream ciphers are examples of symmetric cryptosystems, i.e. the decryption algorithm is the same as, or easily deduced from, the encryption algorithm.
This is an example of an asymmetric cryptosystem. We split the key into two parts.
Knowing the encryption and decryption algorithms and the public key, it should still be hard to find the private key or to decrypt messages. This aim implies security at level 3 (chosen plaintext). There is also no key exchange problem.
The idea is to base the system on mathematical problems that are believed to be hard.
We consider two such problems.
An algorithm runs in polynomial time if

    #(operations) ≤ c (input size)^d

for some constants c, d.
Note 15. An algorithm for factoring N has input size log N, i.e. the number of digits of N.
Polynomial time algorithms are not known for (i) and (ii).
Elementary methods
(i) Trial division properly organised takes time O( N ).
(ii) Baby-step Giant-step algorithm. Set m= p , write a = qm + r, 0 q, r < m.
Then
x g a g qm+r (mod p)
qm r
g g x (mod p)
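A sketch of the algorithm, with toy parameters (2 is a primitive root modulo 101):

```python
from math import isqrt

def bsgs(g, x, p):
    """Solve g^a ≡ x (mod p).  With a = qm + r and m about sqrt(p), we have
    x g^{-qm} ≡ g^r (mod p), so x g^{-qm} must equal some tabulated g^r."""
    m = isqrt(p - 1) + 1
    baby = {pow(g, r, p): r for r in range(m)}   # baby steps: g^r
    ginv_m = pow(g, -m, p)                       # giant step: multiply by g^{-m}
    t = x % p
    for q in range(m + 1):
        if t in baby:
            return q * m + baby[t]
        t = t * ginv_m % p
    return None                                  # no discrete log exists

assert all(bsgs(2, pow(2, a, 101), 101) == a for a in (0, 1, 57, 99))
```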
Factor base
The best known method for solving (i) and (ii) uses a factor base method called the number field sieve. It has running time

    O(exp(c (log N)^{1/3} (log log N)^{2/3}))

where c is a known constant. Note this is closer to polynomial time (in log N) than to exponential time (in log N) thanks to the exponents 1/3 and 2/3.
Recall that x^{φ(N)} ≡ 1 (mod N) whenever gcd(x, N) = 1 (Euler's theorem). A special case of this is Fermat's little theorem, stating that for prime p and p ∤ x,

    x^{p−1} ≡ 1 (mod p).
Lemma 5.5. Let p ≡ 3 (mod 4) be prime, say p = 4k − 1. If x^2 ≡ d (mod p) is soluble, then its solutions are x ≡ ±d^k (mod p).
Proof. Let x_0 be a solution. Without loss of generality, we may assume x_0 ≢ 0 (mod p). Then

    d^{2k−1} ≡ x_0^{2(2k−1)} = x_0^{p−1} ≡ 1 (mod p),

so

    (d^k)^2 = d · d^{2k−1} ≡ d (mod p).
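In other words, for p = 4k − 1 a square root of a residue d is d^k = d^{(p+1)/4} (mod p); a small illustration:

```python
def sqrt_mod(d, p):
    """Square root of d modulo a prime p ≡ 3 (mod 4), or None if d is a
    non-residue.  With p = 4k - 1 the candidate root is d^k = d^{(p+1)/4}."""
    x = pow(d, (p + 1) // 4, p)
    return x if x * x % p == d % p else None

assert sqrt_mod(4, 23) in (2, 21)   # 23 = 4*6 - 1, so the root is 4^6 mod 23
assert sqrt_mod(5, 23) is None      # 5 is not a quadratic residue mod 23
```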
In the Rabin cryptosystem, the private key consists of two large distinct primes p, q ≡ 3 (mod 4). The public key is N = pq. We have M = C = {0, 1, 2, ..., N − 1}. We encrypt a message m ∈ M as c = m^2 (mod N). The ciphertext is c. (We should avoid m < √N, since then c = m^2 exactly and m is recovered by taking an integer square root.)
Suppose we receive c. Use Lemma 5.5 to solve for x_1, x_2 such that x_1^2 ≡ c (mod p) and x_2^2 ≡ c (mod q). Then use the Chinese Remainder Theorem (CRT) to find x with x ≡ x_1 (mod p) and x ≡ x_2 (mod q); hence x^2 ≡ c (mod N). Indeed, running Euclid's algorithm on p and q gives integers r, s with rp + sq = 1. We take x = (sq)x_1 + (rp)x_2.
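A sketch of the decryption step with toy primes (p, q chosen only for illustration); the CRT combination x = (sq)x_1 + (rp)x_2 is computed via modular inverses rather than an explicit run of Euclid's algorithm:

```python
def rabin_roots(c, p, q):
    """The four square roots of c modulo N = pq, for primes p, q ≡ 3 (mod 4)."""
    N = p * q
    x1 = pow(c, (p + 1) // 4, p)     # square root of c mod p (Lemma 5.5)
    x2 = pow(c, (q + 1) // 4, q)     # square root of c mod q
    sq = q * pow(q, -1, p)           # ≡ 1 (mod p), ≡ 0 (mod q)
    rp = p * pow(p, -1, q)           # ≡ 0 (mod p), ≡ 1 (mod q)
    return sorted({(sq * u + rp * v) % N
                   for u in (x1, p - x1) for v in (x2, q - x2)})

# toy example: p = 7, q = 11, message m = 9, ciphertext c = 9^2 mod 77 = 4
roots = rabin_roots(4, 7, 11)
assert roots == [2, 9, 68, 75] and all(r * r % 77 == 4 for r in roots)
```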
Lemma 5.6. (i) Let p be an odd prime and gcd(d, p) = 1. Then x^2 ≡ d (mod p) has either no solutions or exactly two.
(ii) Let N = pq with p, q distinct odd primes and gcd(d, N) = 1. Then x^2 ≡ d (mod N) has either no solutions or exactly four.
Proof. (i) We have

    x^2 ≡ y^2 (mod p)
    ⟹ p | (x + y)(x − y)
    ⟹ p | (x + y) or p | (x − y)
    ⟹ x ≡ ±y (mod p).

(ii) If x_0 is some solution, then by the CRT there exist solutions x with x ≡ ±x_0 (mod p), x ≡ ±x_0 (mod q), for any of the four choices of signs. By (i), these are the only solutions.
Theorem 5.7. Decrypting messages sent using the Rabin cryptosystem is essentially as difficult as factoring N.
Proof. We have seen that factoring N allows us to decrypt messages. Conversely, suppose we have an algorithm for computing square roots modulo N. Pick x (mod N) at random. Use the algorithm to find y such that y^2 ≡ x^2 (mod N). With probability 1/2, x ≢ ±y (mod N); then gcd(N, x − y) is a non-trivial factor of N. If this fails, start again with another x. After r trials, the probability of failure is less than 1/2^r, which becomes arbitrarily small.
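This reduction can be simulated with a brute-force stand-in for the assumed square-root algorithm (everything here is a toy illustration):

```python
import random
from math import gcd

def factor_with_oracle(N, sqrt_oracle, max_tries=50):
    """Factor N given any algorithm returning some square root modulo N."""
    for _ in range(max_tries):
        x = random.randrange(2, N)
        g = gcd(x, N)
        if g > 1:
            return g                       # lucky: x already shares a factor
        y = sqrt_oracle(x * x % N)
        if y not in (x, N - x):            # happens with probability 1/2
            return gcd(abs(x - y), N)      # non-trivial since x^2 ≡ y^2 (mod N)
    return None

N = 7 * 11
def oracle(c):                             # brute-force stand-in for the oracle
    return next(z for z in range(N) if z * z % N == c)

random.seed(1)
f = factor_with_oracle(N, oracle)
assert f in (7, 11)
```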
Theorem 5.8. Let N = pq for distinct odd primes p, q, and let m be a multiple of φ(N). Write m = 2^a b with b odd, and put X = {x ∈ (Z/NZ)^× : o_p(x^b) ≠ o_q(x^b)}.
(i) If x ∈ X, then there exists 0 ≤ t < a such that gcd(x^{2^t b} − 1, N) is a non-trivial factor of N.
(ii) |X| ≥ (1/2)|(Z/NZ)^×| = φ(N)/2.
Proof. (i) Since x is a unit,

    x^{φ(N)} ≡ 1 (mod N) ⟹ x^m ≡ 1 (mod N).

But m = 2^a b, so putting y = x^b (mod N) we get y^{2^a} ≡ 1 (mod N). Therefore, o_p(y) and o_q(y) are powers of 2. We are given o_p(y) ≠ o_q(y), and without loss of generality we may assume o_p(y) < o_q(y). Say o_p(y) = 2^t, so 0 ≤ t < a. Then

    y^{2^t} ≡ 1 (mod p),
    y^{2^t} ≢ 1 (mod q).

So gcd(y^{2^t} − 1, N) = p.
(ii) See below.
Corollary 5.9. Finding the RSA private key (N, d) from the public key (N, e) is essentially as difficult as factoring N.
Proof. We have seen that factoring N allows us to find d. Conversely, if we know d and e, then de ≡ 1 (mod φ(N)), so φ(N) | (de − 1), and taking m = de − 1 in Theorem 5.8 lets us factor N.
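A sketch of this reduction with toy numbers (key values illustrative only): given both exponents, de − 1 is a multiple of φ(N), and the procedure of Theorem 5.8 turns it into a factorisation:

```python
import random
from math import gcd

def factor_from_keys(N, e, d, tries=50):
    """Factor N = pq given de ≡ 1 (mod phi(N)), following Theorem 5.8."""
    m = d * e - 1                    # multiple of phi(N); write m = 2^a * b
    a, b = 0, m
    while b % 2 == 0:
        a, b = a + 1, b // 2
    for _ in range(tries):
        x = random.randrange(2, N)
        if gcd(x, N) > 1:
            return gcd(x, N)
        y = pow(x, b, N)             # y = x^b; square repeatedly up to x^(2^a b)
        for _ in range(a):
            z = pow(y, 2, N)
            if z == 1 and y not in (1, N - 1):
                return gcd(y - 1, N) # y is a square root of 1 with y ≢ ±1
            y = z
    return None

random.seed(0)
N, e, d = 3233, 17, 2753             # toy key: N = 61 * 53, de - 1 = 15 * phi(N)
f = factor_from_keys(N, e, d)
assert f in (61, 53)
```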
Proof of Theorem 5.8 (ii). By the CRT we have the correspondence

    (Z/NZ)^× ↔ (Z/pZ)^× × (Z/qZ)^×,  x ↦ (x mod p, x mod q).

It suffices to show that if we partition (Z/pZ)^× according to the value of o_p(x^b), then each subset has size at most (1/2)|(Z/pZ)^×| = (p−1)/2; this already gives |X| ≥ φ(N)/2. Recall that (Z/pZ)^× = {1, g, g^2, ..., g^{p−2}} for a primitive root g. By Fermat's little theorem,

    g^{p−1} ≡ 1 (mod p) ⟹ g^{2^a b} ≡ 1 (mod p)

(since (p − 1) | φ(N) | m), and hence o_p(g^b) is a power of 2. So

    o_p((g^u)^b) = o_p(g^b) if u is odd,
    o_p((g^u)^b) < o_p(g^b) otherwise.

Since exactly half the exponents u are odd, the subset {x : o_p(x^b) = o_p(g^b)} has size (p−1)/2, and every other subset lies in its complement, so each subset has size at most (p−1)/2.
Remark. It is not known whether decrypting RSA messages without knowledge of the
private key is essentially as hard as factoring.
Let p be a large prime and g a primitive root modulo p. This data is fixed and known to everyone.
Alice and Bob wish to agree a secret key. A chooses α ∈ Z and sends g^α (mod p) to B. B chooses β ∈ Z and sends g^β (mod p) to A. They both compute k = (g^α)^β = (g^β)^α (mod p) and use this as their secret key. This is the Diffie–Hellman key exchange.
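A toy run of the exchange (small p for illustration; 2 is a primitive root modulo 101):

```python
import secrets

p, g = 101, 2                        # public parameters, known to everyone

alpha = secrets.randbelow(p - 1)     # Alice's secret exponent
beta = secrets.randbelow(p - 1)      # Bob's secret exponent

A = pow(g, alpha, p)                 # Alice sends g^alpha (mod p)
B = pow(g, beta, p)                  # Bob sends g^beta (mod p)

k_alice = pow(B, alpha, p)           # Alice computes (g^beta)^alpha
k_bob = pow(A, beta, p)              # Bob computes (g^alpha)^beta
assert k_alice == k_bob              # both hold g^(alpha * beta) mod p
```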
Secrecy. A and B can be sure that no third party can read the message.
Integrity. A and B can be sure that no third party can alter the message.
Authenticity. B can be sure that A sent the message.
Non-repudiation. B can prove to a third party that A sent the message.
A uses the private key (N, d) to encrypt messages. Anyone can decrypt messages using the public key (N, e). (Note that (x^d)^e = (x^e)^d ≡ x (mod N).) But they cannot forge messages sent by A.
Signatures
Signature schemes can be used to preserve integrity and non-repudiation. They also
prevent tampering of the following kind.
Example (Homomorphism attack). A bank sends messages of the form (M_1, M_2) where M_1 is the name of the client and M_2 is the amount transferred to his account. Messages are encoded using RSA, i.e. as (Z_1, Z_2) = (M_1^e mod N, M_2^e mod N). I transfer 100 to my account, observe the encrypted message (Z_1, Z_2) and then send (Z_1, Z_2^3). I become a millionaire without the need to break RSA.
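The attack relies only on textbook RSA being multiplicative; a toy key (parameters illustrative) makes it concrete:

```python
p, q, e = 1009, 1013, 17                 # toy RSA key; N > 10^6 so 100^3 fits
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # the bank's private exponent

Z2 = pow(100, e, N)                      # observed: encryption of the amount 100
forged = pow(Z2, 3, N)                   # computed without any secret knowledge
assert pow(forged, d, N) == 100 ** 3     # the bank decrypts 1000000
```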
Example (Copying) . I could just keep sending (Z1 , Z2 ). This is defeated by time
stamping.
A message m is signed as (m, s) where s is a function of m and the private key. The
signature (or trapdoor) function should be designed so no-one without knowledge of the
private key can sign messages, yet anyone can check the signature is valid.
Remark. We are interested in the signature of the message, not of the sender.
A has private key (N, d) and public key (N, e). She signs m as (m, s) where s ≡ m^d (mod N). The signature s is verified by checking s^e ≡ m (mod N).
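Signing and verification with a toy key (parameters illustrative only):

```python
p, q, e = 1009, 1013, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))        # Alice's private exponent

m = 4242
s = pow(m, d, N)                         # Alice signs with the private key
assert pow(s, e, N) == m                 # anyone verifies with the public key
```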
There are the following problems.
In the ElGamal signature scheme, Alice has private key u and public key y ≡ g^u (mod p). To sign a message m, she picks a random k with gcd(k, p − 1) = 1 and computes r, s with

    r ≡ g^k (mod p)   (1)
    m ≡ ur + ks (mod p − 1)   (2)

The signature is (r, s); it is verified by checking g^m ≡ y^r r^s (mod p). Solving (2) for s means solving a linear congruence

    ax ≡ b (mod m)   (∗)

It is important that Alice chooses a new value of k to sign each message. Otherwise, suppose messages m_1, m_2 have signatures (r, s_1) and (r, s_2). Then

    m_1 ≡ ur + ks_1 (mod p − 1)
    m_2 ≡ ur + ks_2 (mod p − 1)
    ⟹ m_1 − m_2 ≡ k(s_1 − s_2) (mod p − 1),

which determines k whenever s_1 − s_2 is invertible mod p − 1; then (2) reveals u.
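With toy parameters (u, k illustrative), reusing k lets anyone recover it from two signatures whenever s_1 − s_2 is invertible mod p − 1:

```python
p, g = 101, 2                  # toy public parameters
u, k = 23, 7                   # Alice's private key u and (reused) nonce k
r = pow(g, k, p)
kinv = pow(k, -1, p - 1)

def sign(m):                   # solves m ≡ u r + k s (mod p - 1) for s
    return r, kinv * (m - u * r) % (p - 1)

m1, m2 = 17, 4
(_, s1), (_, s2) = sign(m1), sign(m2)

# attacker: m1 - m2 ≡ k (s1 - s2) (mod p - 1), and here s1 - s2 is invertible
k_found = (m1 - m2) * pow(s1 - s2, -1, p - 1) % (p - 1)
assert k_found == k
```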
Remark. Several existential forgeries are known, i.e. we can find solutions m, r, s to g^m ≡ y^r r^s (mod p), but with no control over m. In practice, this is stopped by signing a hash value of the message instead of the message itself.
Bit Commitment
Alice would like to send a message to Bob in such a way that
(i) Bob cannot read the message until Alice sends further information;
(ii) Alice cannot change the message.
Applications include:
coin tossing;
selling stock market tips;
multiparty computation, e.g. voting, surveys, etc.
(i) Using any public key cryptosystem. Bob cannot read the message until Alice sends
her private key.
(ii) Using coding theory as follows.
[Figure: Alice and Bob are connected both by a noisy channel and by a clear channel.]
The noisy channel is modelled as a BSC with error probability p. Bob chooses a linear code C with appropriate parameters. Alice chooses a linear map θ : C → F_2. To send m ∈ {0, 1}, Alice chooses c ∈ C such that θ(c) = m and sends c to Bob via the noisy channel. Bob receives r = c + e with d(r, c) = w(e) ≈ np. (The variance of the BSC should be chosen small.) Later Alice sends c via the clear channel and Bob checks d(r, c) ≈ np.
Why can Bob not read the message? We arrange that C has minimum distance
much smaller than np.
Why can Alice not change her choice? Alice knows the codeword c sent, but not r. If she later sends c′, it will only be accepted if d(c′, r) ≈ np. Alice's only safe option is to choose c′ very close to c. But if the minimum distance of C is sufficiently large, this forces c′ = c.
Quantum Cryptography
The following are problems with public key systems.
They are based on the belief that some mathematical problem is hard, e.g. factorisation or computation of the discrete logarithm. This might not be true.
As computers get faster, yesterday's securely encrypted message is easily read
tomorrow.
The aim is to construct a key exchange scheme that is secure, conditional only on the
laws of physics.
A classical bit is an element of {0, 1}. A quantum bit, or qubit, is a linear combination |ψ⟩ = α|0⟩ + β|1⟩ with α, β ∈ C, |α|^2 + |β|^2 = 1. Measuring |ψ⟩ gives |0⟩ with probability |α|^2 and |1⟩ with probability |β|^2. After the measurement, the qubit collapses to the state observed, i.e. |0⟩ or |1⟩.
The basic idea is that Alice generates a sequence of qubits and sends them to Bob. By
comparing notes afterwards, they can detect the presence of an eavesdropper.
[Figure: a light source, a polarising filter at angle θ to the vertical, then a vertically polarised filter.]
Each photon passes through the second filter with probability cos^2 θ. We identify C^2 = {α|0⟩ + β|1⟩ : α, β ∈ C} with the inner product (α_1, β_1) · (α_2, β_2) = α_1 ᾱ_2 + β_1 β̄_2. We can measure a qubit with respect to any orthonormal basis, e.g.

    |+⟩ = (1/√2)|0⟩ + (1/√2)|1⟩
    |−⟩ = (1/√2)|0⟩ − (1/√2)|1⟩.

If |ψ⟩ = α|+⟩ + β|−⟩ then the observation gives |+⟩ with probability |α|^2 and |−⟩ with probability |β|^2.
Remark. An eavesdropper who could predict which basis Alice is using to send, or Bob
uses to measure, could remain undetected. Otherwise, the eavesdropper will change
about 25% of the 2n bits shared.
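A rough simulation of the intercept-resend attack (the model and numbers here are assumptions, not from the notes): Eve measures each qubit in a random basis and resends; on the positions where Bob's basis matches Alice's, about a quarter of the bits are disturbed:

```python
import random

def intercept_resend_error_rate(n, rng=None):
    """Error rate Eve induces on basis-matched bits by measure-and-resend.

    With probability 1/2 Eve guesses the wrong basis; Bob's measurement of
    her resent qubit is then uniformly random, and wrong half the time, so
    the expected error rate is 1/2 * 1/2 = 1/4.
    """
    rng = rng or random.Random(0)
    errors = 0
    for _ in range(n):
        bit = rng.randrange(2)            # Alice's bit, in Alice's basis
        if rng.randrange(2) == 0:         # Eve happens to pick Alice's basis
            got = bit                     # Bob recovers the bit intact
        else:                             # wrong basis: outcome randomised
            got = rng.randrange(2)
        errors += (got != bit)
    return errors / n

rate = intercept_resend_error_rate(20000)
assert 0.2 < rate < 0.3                   # close to the predicted 25%
```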
One problem is that noise has the same effect as an eavesdropper. Say A and B accept at most t errors in the n bits they compare, and assume there are at most t errors in the other n bits. Say A has x ∈ F_2^n and B has x + e ∈ F_2^n with w(e) ≤ t. We pick linear codes C_2 ⊆ C_1 ⊆ F_2^n of length n where C_1 and C_2 are t-error correcting. A chooses c ∈ C_1 at random and sends x + c to B using the clear channel. B computes (x + e) + (x + c) = c + e and recovers c using the decoding rule for C_1.
To decrease the mutual information shared with an eavesdropper, A and B use as their key the coset c + C_2 in C_1/C_2.
This version of BB84 is provably secure conditional only on the laws of physics. A
suitable choice of parameters can make both the probability that the scheme aborts and
the mutual information simultaneously arbitrarily small.