Supplementary Material For Coding Theory
Sarah A. Spence
Contents
1 Introduction
2 Basics
  2.1 Important code parameters
  2.2 Correcting and detecting errors
  2.3 Sphere-packing bound
  2.4 Problems
3 Linear codes
  3.1 Generator and parity check matrices
  3.2 Coset and syndrome decoding
  3.3 Hamming codes
  3.4 Problems
5 Acknowledgements
1 Introduction
Imagine that you are using an infrared link to beam messages consisting of 0s
and 1s from your laptop to your friend’s PalmPilot. Usually, when you send
a 0, your friend’s PalmPilot receives a 0. Occasionally, however, noise on the
channel causes your 0 to be received as a 1. Examples of possible causes of noise
include atmospheric disturbances. You would like to find a way to transmit your
messages in such a way that errors are detected and corrected. This is where
error-control codes come into play.
Error-control codes are used to detect and correct errors that occur when
data is transmitted across some noisy channel. Compact discs (CDs) use error-
control codes so that a CD player can read data from a CD even if it has been
corrupted by noise in the form of imperfections on the CD. When photographs
are transmitted to Earth from deep space, error-control codes are used to guard
against the noise caused by lightning and other atmospheric interruptions.
Error-control codes build redundancy into a message. For example, if your
message is x = 0, you might encode x as the codeword c = 00000. (We work
more with this example in Chapter 2.) In general, if a message has length k,
the encoded message, i.e. codeword, will have length n > k.
Algebraic coding theory is an area of discrete applied mathematics that is
concerned (in part) with developing error-control codes and encoding/decoding
procedures. Many areas of mathematics are used in coding theory, and we focus
on the interplay between algebra and coding theory. The topics in this packet
were chosen for their importance to developing the major concepts of coding
theory and also for their relevance to a course in abstract algebra. We aimed to
explain coding theory concepts in a way that builds on the algebra learned in
Math 336. We recommend looking at any of the books in the bibliography for
a more detailed treatment of coding theory.
As you read this packet, you will notice questions interspersed with the text.
These questions are meant to be straightforward checks of your understanding.
You should work out each of these problems as you read the packet. More
homework problems are found at the end of each chapter.
2 Basics
To get the ball rolling, we begin with some examples of codes. Our first example
is an error-detecting, as opposed to error-correcting, code.
Example 2.1. (The ISBN Code) The International Standard Book Number
(ISBN) Code is used throughout the world by publishers to identify properties
of each book. The first nine digits of each ISBN represent information about
the book including its language, publisher, and title. In order to guard against
errors, the nine-digit “message” is encoded as a ten-digit codeword. The ap-
pended tenth digit is a check digit chosen so that the whole ten-digit string
$x_1 x_2 \cdots x_{10}$ satisfies
$$\sum_{i=1}^{10} i x_i \equiv 0 \pmod{11}. \qquad (1)$$
If $x_{10}$ should be equal to 10, an ‘X’ is used.
The ISBN code can be used to detect any single error and any double-error
created by the transposition of two digits. This detection scheme uses properties
of $F_{11}$, the finite field with 11 elements, as follows. Suppose we receive a length ten vector $y_1 y_2 \cdots y_{10}$ (by scanning a book). We calculate its weighted check sum $Y = \sum_{i=1}^{10} i y_i$. If $Y \not\equiv 0 \pmod{11}$, then we know that one or more
errors has occurred. To prove that any single error can be detected, suppose
that $x = x_1 x_2 \cdots x_{10}$ is sent, but $y = y_1 y_2 \cdots y_{10}$ is received, where $y_i = x_i$ for all 1 ≤ i ≤ 10 except one index j, where $y_j = x_j + a$ for some nonzero a. Then $Y = \sum_{i=1}^{10} i y_i = \left(\sum_{i=1}^{10} i x_i\right) + ja \equiv ja \not\equiv 0 \pmod{11}$, since j and a are nonzero.
Since Y 6= 0, the single error is detected. To prove that any double-error created
by the transposition of two digits is detected, suppose the received vector y is
the same as the sent vector x except that digits xj and xk have been transposed.
Then $Y = \sum_{i=1}^{10} i y_i = \left(\sum_{i=1}^{10} i x_i\right) + (k - j)x_j + (j - k)x_k \equiv (k - j)(x_j - x_k) \not\equiv 0 \pmod{11}$, if $k \neq j$ and $x_j \neq x_k$. Notice that this last step relies on the fact that
the field F11 has no zero divisors.
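For concreteness, here is a small Python sketch of the check in equation (1); the function name and input format are our own choices, not part of the ISBN standard.

```python
# A sketch of ISBN-10 validation: the weighted sum of the ten digits must be
# congruent to 0 mod 11, with 'X' standing for the digit value 10.
def isbn10_valid(isbn: str) -> bool:
    digits = [10 if ch in "Xx" else int(ch) for ch in isbn if ch not in "- "]
    if len(digits) != 10:
        return False
    return sum(i * x for i, x in enumerate(digits, start=1)) % 11 == 0

print(isbn10_valid("0123456789"))  # True: 1*0 + 2*1 + ... + 10*9 = 330 = 11*30
```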
Question 2.2. Is either of the following strings a valid ISBN: 0-13165332-6, 0-1392-4101-4?
The Universal Product Code (UPC) found on groceries and other items is
another example of a code that employs a check digit. You can learn more about
the UPC code at
http://www.howstuffworks.com/upc1.htm.
In some states, drivers’ license numbers include check digits in order to detect
errors or fraud. In fact, many states generate license numbers through the use
of complicated formulas that involve both check digits and numbers based on
information such as the driver’s name, date of birth, and sex. Most states keep their formulas confidential; however, Professor Joseph Gallian at the University of Minnesota-Duluth figured out how several states generate their license numbers. The following websites summarize some of his results and techniques:
http://www.maa.org/mathland/mathtrek 10 19 98.html.
http://www.cs.queensu.ca/home/bradbury/checkdigit/index.html.
However, we have increased the probability of decoding the received string as
the correct message.
Question 2.4. How many different binary 5-tuples exist? Which of these would
be decoded as 00000?
The third condition above is called the triangle inequality. The proofs of the
above properties are left as problems at the end of this chapter.
The following notation is standard and will be used throughout this packet.
An (n, M, d) code has minimum distance d and consists of M codewords, all of
length n. One of the major goals of coding theory is to develop codes that strike
a balance between having small n (for fast transmission of messages), large M
(to enable transmission of a wide variety of messages), and large d (to detect
many errors).
Traditionally, the alphabets used in coding theory are finite fields, Fq . We
say that a code is q-ary if its codewords are defined over the q-ary alphabet Fq .
The most commonly used alphabets are binary extension fields, F2m . In the
common case where the alphabet is F2 , we say that the code is binary.
Example 2.11. Let C = {0000, 1100, 0011, 1111}. Then C is a (4, 4, 2) binary
code.
1. Each symbol transmitted has the same probability p (< 1/2) of being
received in error.
2. If a symbol is received in error, then each of the q − 1 possible errors is equally likely.
We know that the repetition code of length 5 corrects up to two errors, and it clearly can detect up to four errors. In general, a code that has minimum distance d can be used to either detect up to d − 1 errors or correct up to $\lfloor (d-1)/2 \rfloor$ errors. This is a consequence of the following theorem.
Theorem 2.13. 1. A code C can detect up to s errors in any codeword if
d(C) ≥ s + 1.
Question 2.15. How many errors can the (32, 64, 16) Reed-Muller code cor-
rect?
Definition 2.16. The rate of a code is the ratio of message bits to coded bits.
Since each codeword of the (32, 64, 16) Reed-Muller code in Example 2.14 represents a 6-bit message, the rate of the code is 6/32. Higher rate codes are faster and use less power; however, sometimes lower rate codes are preferred because of superior error-correcting capability or other practical considerations.
We’d like to compare the probability of error when using these two codes. First, we need some notation. The number $\binom{n}{m}$, read “n choose m,” counts the number of ways that we can choose m objects from a pool of n objects, and $\binom{n}{m} = \frac{n!}{(n-m)!\,m!}$. Note that n! is read “n factorial,” and $n! = n \cdot (n-1) \cdots 2 \cdot 1$. The numbers of the form $\binom{n}{m}$ are called binomial coefficients because of the binomial theorem, which states that for any positive integer n,
$$(1 + x)^n = 1 + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots + \binom{n}{n}x^n.$$
We now return to our code analysis. When using the Reed-Muller code, the
probability that a length 32 codeword is decoded incorrectly is
$$\sum_{i=8}^{32} \binom{32}{i} p^i (1 - p)^{32-i} \qquad (3)$$
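The sum in (3) is easy to evaluate numerically; the sketch below assumes nearest-neighbor decoding, so decoding fails only when 8 or more of the 32 symbols are in error.

```python
# A sketch of evaluating (3): probability that more than t of n transmitted
# bits are flipped, each independently with probability p.
from math import comb

def word_error_prob(n: int, t: int, p: float) -> float:
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

print(word_error_prob(32, 7, 0.01))  # the (32, 64, 16) Reed-Muller code, t = 7
```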
Theorem 2.21. (Sphere-packing bound) A t-error-correcting q-ary code of length n must satisfy
$$M \sum_{i=0}^{t} \binom{n}{i}(q-1)^i \le q^n \qquad (5)$$
where M is the total number of codewords.
In order to prove Theorem 2.21, we need the following lemma.
Lemma 2.22. A sphere of radius r, 0 ≤ r ≤ n, in $F_q^n$ contains exactly
$$\sum_{i=0}^{r} \binom{n}{i}(q-1)^i = \binom{n}{0} + \binom{n}{1}(q-1) + \binom{n}{2}(q-1)^2 + \cdots + \binom{n}{r}(q-1)^r$$
vectors.
Proof of Lemma 2.22. Let u be a fixed vector in $F_q^n$. Consider how many vectors v have distance exactly m from u, where m ≤ n. The m positions in which v is to differ from u can be chosen in $\binom{n}{m}$ ways, and then in each of these m positions the entry of v can be chosen in q − 1 ways to differ from the corresponding entry of u. Hence, the number of vectors at distance exactly m from u is $\binom{n}{m}(q-1)^m$. So, the total number of vectors in a ball of radius r, centered at u, must be $\binom{n}{0} + \binom{n}{1}(q-1) + \binom{n}{2}(q-1)^2 + \cdots + \binom{n}{r}(q-1)^r$.
Proof of Theorem 2.21. Suppose C is a t-error-correcting q-ary code of length n. As explained in the discussion above Theorem 2.21, in order that C can correct t errors, any two spheres of radius t centered on distinct codewords can have no vectors in common. Hence, the total number of vectors in the M spheres of radius t centered on the M codewords of C is given by $M \sum_{i=0}^{t} \binom{n}{i}(q-1)^i$, by Lemma 2.22. This number of vectors must be less than or equal to the total number of vectors in $F_q^n$, which is $q^n$. This proves the sphere-packing bound.
Definition 2.23. A code is perfect if it satisfies the sphere-packing bound of
Theorem 2.21 with equality.
When decoding using a perfect code, every possible word in $F_q^n$ is at distance
less than or equal to t from a unique codeword, so the nearest neighbor algorithm
yields an answer for every received word y, and it is the correct answer when
the number of errors is less than or equal to t. In the next chapter, we will
introduce a family of perfect codes called Hamming codes.
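As a quick numerical illustration of Definition 2.23, the sketch below checks the sphere-packing bound for the binary [7, 4, 3] Hamming code (n = 7, M = 2^4, t = 1), which we will meet in the next chapter.

```python
# A sketch of checking the sphere-packing bound (5) with equality.
from math import comb

def sphere_packing_lhs(n: int, M: int, t: int, q: int = 2) -> int:
    return M * sum(comb(n, i) * (q - 1)**i for i in range(t + 1))

print(sphere_packing_lhs(7, 16, 1), 2**7)  # both are 128, so the code is perfect
```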
2.4 Problems
1. Complete the ISBN that starts as 0-7803-1025-.
2. Let C = {00, 01, 10, 11}. Why can’t C correct any errors?
3. Prove that the Hamming distance satisfies the following conditions for any
x, y, z ∈ $F_q^n$:
(a) d(x, x) = 0
(b) d(x, y) = d(y, x)
(c) d(x, y) ≤ d(x, z) + d(z, y)
4. Let C be a code that contains 0, the codeword consisting of a string of all
zeros. Suppose that C contains a string u as a codeword. Now suppose
that the string u also occurs as an error. Prove that C will not always
detect when the error u occurs.
5. Let C be a code of even minimum distance d(C) = 2t. Show that C can
be used to simultaneously correct up to t − 1 errors and to detect t errors.
More precisely, give an algorithm that takes a received word y and either
produces a code word x or declares “t errors;” the algorithm should always
give the right answer if the number of errors is at most t. Explain why,
in spite of Theorem 2.13, we can’t hope to detect 2t − 1 errors if we are
simultaneously trying to correct errors.
6. Prove that for $n = 2^r - 1$, $\binom{n}{0} + \binom{n}{1} = 2^r$.
3 Linear codes
In this chapter, we study error-control codes that have additional structure.
Definition 3.1. A linear code of length n over $F_q$ is a subspace of the vector space $F_q^n$.
Hence, a subset C of $F_q^n$ is a linear code if and only if (1) u + v ∈ C for all u, v in C, and (2) au ∈ C for all u ∈ C and a ∈ $F_q$. Linear codes are widely used
in practice for a number of reasons. One reason is that they are easy to find.
Another reason is that encoding linear codes is very quick and easy. Decoding
is also often facilitated by the linearity of a code.
When we are viewing $F_q^n$ as a vector space, we will write V (n, q). If C is a k-
dimensional subspace of V (n, q), we say that C is an [n, k, d] or [n, k] linear code,
and we can talk about the dimension k of the code. If C is a q-ary [n, k] linear
code, then C has q k codewords. This means that C can be used to communicate
any of q k distinct messages. We identify these messages with the q k elements of
V (k, q), the vector space of k-tuples over Fq . The idea is to encode messages of
length k to get codewords of length n, where n > k.
Linear codes have several useful properties, two of which are described in
the following theorems.
Theorem 3.2. Let C be a linear code. Any linear combination of codewords in C is a codeword in C.
Proof. C is a subspace of V (n, q), and this property follows directly from the
definition of a vector space.
Question 3.3. Is {100, 001, 101} a linear code?
Question 3.4. Show that the codeword 0 consisting of all zeros is always
contained in a linear code.
Theorem 3.5. The minimum distance, d(C), of a linear code C is equal to
w∗ (C), the weight of the lowest-weight nonzero codeword.
Proof. There exist codewords x and y in C such that d(C) = d(x, y). By the
definition of Hamming distance, we can rewrite this as d(C) = w(x − y). Since
x − y is a codeword in C by Theorem 3.2 and by the definition of w∗ (C), we
have d(C) = w(x − y) ≥ w∗ (C).
On the other hand, there exists some codeword c ∈ C such that w∗ (C) =
w(c) = d(c, 0) ≥ d(C), since 0 ∈ C. Since we have now shown that both
d(C) ≥ w∗ (C) and d(C) ≤ w∗ (C), we conclude that d(C) = w∗ (C).
Theorem 3.5 greatly facilitates finding the minimum distance of a linear code.
Instead of looking at the distances between all possible pairs of codewords, we
need only look at the weight of each codeword.
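The following sketch applies Theorem 3.5 to the (4, 4, 2) binary code of Example 2.11: instead of comparing all pairs of codewords, we take the minimum weight over the nonzero codewords.

```python
# A sketch of computing minimum distance via minimum nonzero weight.
C = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]

def weight(v):
    return sum(1 for x in v if x != 0)

d = min(weight(c) for c in C if any(c))
print(d)  # 2, the minimum distance of the code
```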
Question 3.6. How many pairs of codewords would we need to consider if
trying to find the minimum distance of a nonlinear code with M codewords?
Sometimes we can start with a known binary code and add an overall par-
ity check digit to increase the minimum distance of a code. Suppose C is an
(n, M, d) code. Then we can construct a code C′, called the extended code of C, by adding a parity check digit to each codeword x ∈ C to obtain a codeword x′ ∈ C′ as follows. For each $x = x_0 x_1 \cdots x_{n-1} \in C$, let $x' = x_0 x_1 \cdots x_{n-1} 0$ if the Hamming weight of x is even, and let $x' = x_0 x_1 \cdots x_{n-1} 1$ if the Hamming weight of x is odd. This ensures that every codeword in the extended code has even Hamming weight.
Theorem 3.7. Let C be a binary linear code with minimum distance d = d(C). If d is odd, then the minimum distance of the extended code C′ is d + 1, and if d is even, then the minimum distance of C′ is d.
Proof. In the problems at the end of this chapter, you will prove that the extended code C′ of a binary linear code is also a linear code. Hence by Theorem 3.5, d(C′) is equal to the smallest weight of nonzero codewords of C′. Compare the weights of codewords x′ ∈ C′ with the weights of the corresponding codewords x ∈ C: w(x′) = w(x) if w(x) is even, and w(x′) = w(x) + 1 if w(x) is odd. By Theorem 3.5, we can conclude that d(C′) = d if d is even and d(C′) = d + 1 if d is odd.
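A small sketch of the extension construction, using the binary repetition code {000, 111} of minimum distance 3; extending it raises the distance to 4, as Theorem 3.7 predicts.

```python
# A sketch of adding an overall parity check digit over F_2.
def extend(codeword):
    parity = sum(codeword) % 2          # 0 if the weight is even, 1 if odd
    return codeword + (parity,)

C = [(0, 0, 0), (1, 1, 1)]
print([extend(c) for c in C])           # [(0,0,0,0), (1,1,1,1)]: d went from 3 to 4
```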
Theorem 3.13. Two k × n matrices generate equivalent [n, k] linear codes over
Fq if one matrix can be obtained from the other by a sequence of operations of
the following types:
1. Permutation of the rows
2. Multiplication of a row by a nonzero scalar
3. Addition of a scalar multiple of one row to another
4. Permutation of the columns
5. Multiplication of any column by a nonzero scalar.
Proof. The first three types of operations preserve the linear independence of the
rows of a generator matrix and simply replace one basis of the code by another
basis of the same code. The last two types of operations convert a generator
matrix for one code into a generator matrix for an equivalent code.
Recall the following definitions from linear algebra.
Definition 3.14. The inner product u · v of vectors $u = (u_0, \ldots, u_{n-1})$ and $v = (v_0, \ldots, v_{n-1})$ in V (n, q) is the scalar defined by $u \cdot v = u_0v_0 + u_1v_1 + \cdots + u_{n-1}v_{n-1}$.
Question 3.15. In V (4, 3), what is the inner product of (2, 0, 1, 1) and (1, 2,
1, 0)? Hint: Make sure that your answer is a scalar in the correct field.
Definition 3.16. Two vectors u and v are orthogonal if u · v = 0.
Definition 3.17. Given a subspace S of some vector space V (n, q), the space
of all vectors orthogonal to S is called the orthogonal complement of S, and is
denoted S ⊥ .
For example, you may recall that the nullspace of a matrix is the orthogonal
complement of the rowspace of that matrix.
The following definition shows how the above concepts from linear algebra
are used in coding theory.
Definition 3.18. Given a linear [n, k] code C, the dual code of C, denoted C ⊥ ,
is the set of vectors of V (n, q) which are orthogonal to every codeword in C, i.e.
C ⊥ = {v ∈ V (n, q) | v · u = 0, ∀ u ∈ C}
Hence, the concepts of dual codes and orthogonal complements in V (n, q)
are the same. However, the reader should be careful not to think of a dual code
as an orthogonal complement in the sense of vector spaces over the real numbers
R: In the case of finite fields, C and C ⊥ can have intersections larger than {0}.
In fact, codes where C = C ⊥ are called self-dual and are well-studied. If C is
an [n, k] linear code, then C⊥ is an [n, n − k] linear code. Furthermore, if C has generator matrix G, then C⊥ has an (n − k) × n generator matrix H that satisfies $GH^T = 0$. The generator matrix H for C⊥ is also called a parity check matrix for C, as explained by the following theorem.
Theorem 3.19. Let C be a linear code, and let H be a generator matrix for C⊥, the dual code of C. Then a vector c is a codeword in C if and only if $cH^T = 0$, or equivalently, if and only if $Hc^T = 0$.
Proof. Let c ∈ C. Then c · h = 0 for all h ∈ C⊥ by the definition of dual codes. It follows that $cH^T = 0$, since the rows of H form a basis for C⊥.
Conversely, let c be a vector such that $cH^T = 0$. Then c · h = 0 for all h in the dual code C⊥. So c ∈ (C⊥)⊥. You will prove at the end of this chapter that (C⊥)⊥ = C, hence c ∈ C.
Theorem 3.19 motivates the following definition:
Definition 3.20. Let C be an [n, k] linear code. A parity check matrix for C is an (n − k) × n matrix H such that c ∈ C if and only if $cH^T = 0$. Equivalently, a parity check matrix for C is a generator matrix for C⊥.
Theorem 3.19 shows that C is equal to the nullspace of the parity check
matrix H. We can use the parity check matrix of a code to determine the
minimum distance of the code:
Theorem 3.21. Let C have parity check matrix H. The minimum distance of C is equal to the smallest number of columns of H for which a nontrivial linear combination of the columns sums to zero.
Proof. Since H is a parity check matrix for C, c ∈ C if and only if $0 = cH^T$. Let the column vectors of H be $\{d_0, d_1, \ldots, d_{n-1}\}$. The matrix equation $0 = cH^T$ can be reexpressed as follows:
$$\begin{aligned} 0 &= cH^T && (6) \\ &= (c_0, c_1, \ldots, c_{n-1})[d_0\ d_1\ \cdots\ d_{n-1}]^T && (7) \\ &= c_0 d_0 + c_1 d_1 + \cdots + c_{n-1} d_{n-1} && (8) \end{aligned}$$
Proof. An [n, k] linear code has a parity check matrix H containing (n − k)
linearly independent rows. By linear algebra facts, any such H also has exactly
(n−k) linearly independent columns. It follows that any collection of (n−k +1)
columns of H must be linearly dependent. The result now follows from Theorem
3.21.
In Section 4.5, we will study some codes that achieve the Singleton bound with equality. Such codes are called maximum distance separable.
An (n − k) × n parity check matrix H for a code C is in standard form if $H = (A \mid I_{n-k})$. Then the generator matrix in standard form is $G = (I_k \mid -A^T)$. (Note: Authors vary in their definitions of the standard form of generator and parity check matrices.) Any parity check matrix or generator matrix can be put in standard form, up to rearranging of columns, by performing row operations. This changes neither the nullspace nor the rowspace of the matrix, which is important since C is the nullspace of H and the rowspace of G.
When a data block is encoded using a generator matrix in standard form,
the data block is embedded without modification in the first k coordinates of
the resulting codeword. This facilitates decoding. When data is encoded in this
fashion, it is called systematic encoding.
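The sketch below illustrates systematic encoding and the relation $GH^T = 0$ over $F_2$; the particular matrix A is an arbitrary choice made for the example.

```python
# A sketch of standard-form matrices over F_2: H = (A | I_{n-k}), G = (I_k | -A^T),
# where -A^T = A^T in characteristic 2. The matrix A below is an assumed example.
import numpy as np

k, n = 4, 7
A = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1]])                         # (n-k) x k
G = np.hstack([np.eye(k, dtype=int), A.T]) % 2
H = np.hstack([A, np.eye(n - k, dtype=int)]) % 2

u = np.array([1, 0, 1, 1])                           # 4-bit message
x = u @ G % 2                                        # systematic: first k bits are u
print(x)
print((G @ H.T) % 2)                                 # the zero matrix: GH^T = 0
```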
The coset leader of a coset is chosen to be one of the vectors of minimum
weight in the coset. In the example above, either 0100 or 0001 could be chosen
as the coset leader for the corresponding coset.
The coset decoding algorithm for an [n, k] linear code works as follows. We partition $F_q^n$ into cosets of C. There are $q^n/q^k = q^{n-k}$ cosets, each of which contains $q^k$ elements. For each coset, pick a coset leader. There may be an
arbitrary choice involved at this step. Then, if y is received, find the coset that
contains y. This coset is of the form e + C, and we guess that y − e was sent.
Note that coset leaders, defined to be of least weight, represent error vectors,
and this decoding algorithm assumes that lowest weight error vectors are the
most probable.
In order to carry out the above decoding algorithm, we make a standard
array by listing the cosets of C in rows, where the first entry in each row is the
coset leader. More specifically, the first row of the standard array is 0 + C, with
0, the coset leader, listed first. We next pick a lowest-weight vector of $F_q^n$ that
is not in C and put it as a coset leader for the second row. The coset in the
second row is obtained by adding its leader to each of the words in the first row.
This continues until all cosets of C are entered as rows.
The standard array for Example 3.26 is:
0000 1011 0101 1110
1000 0011 1101 0110
0100 1111 0001 1010
0010 1001 0111 1100
Question 3.27. Explain why the above code, with the above decoding scheme,
can not correct errors in the fourth position.
Syndrome decoding is a related decoding scheme for linear codes that uses
the parity check matrix H of a code. Suppose x is transmitted and y is received.
Compute $yH^T$, which we call the syndrome of y and write S(y). We know from the definition of a parity check matrix that $yH^T = 0$ if and only if y ∈ C. So, if $S(y) = yH^T = 0$, then we conclude that most likely no errors occurred. (It is possible that sufficiently many errors occurred to change the transmitted codeword into a different codeword, but we cannot detect or correct this.) If $yH^T \neq 0$, then we know that at least one error occurred, and so y = x + e, where x is a codeword and e is an error vector. Since $xH^T = 0$, we see that $yH^T = (x + e)H^T = xH^T + eH^T = eH^T$. So the syndrome of the received
word y is equal to the syndrome of the error vector e that occurred during
transmission. It is known that S(u) = S(v) if and only if u and v are in the
same coset of C. It follows that there is a one-to-one correspondence between
cosets and syndromes. This means that every word in a particular coset (i.e. in
a particular row of the standard array) has the same syndrome. Thus, we can
extend the standard array by listing the syndromes of each coset leader (and
hence of each element in the coset) in an extra column.
Example 3.28. When the standard array of Example 3.26 is expanded to allow
for syndrome decoding, it looks as follows:
0000 1011 0101 1110 00
1000 0011 1101 0110 11
0100 1111 0001 1010 01
0010 1001 0111 1100 10
$$\begin{aligned} S(r) &= rH^T && (9) \\ &= (c + e)H^T && (10) \\ &= cH^T + eH^T && (11) \\ &= 0 + eH^T && (12) \\ &= eH^T && (13) \end{aligned}$$
Notice that (12) follows from the fact that H is the parity check matrix for
C. Since eH T is a combination of the columns in H that correspond to the
positions where errors occurred, this proves the result.
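Putting the pieces together, here is a sketch of syndrome decoding for the code of Example 3.28. The parity check matrix below is one matrix consistent with the syndrome column shown above (an assumption made for the example; any H with the same nullspace works).

```python
# A sketch of syndrome decoding: compute S(y) = yH^T, look up the coset
# leader with that syndrome, and subtract it from y.
import numpy as np

H = np.array([[1, 0, 1, 0],
              [1, 1, 0, 1]])
leaders = {(0, 0): (0, 0, 0, 0), (1, 1): (1, 0, 0, 0),
           (0, 1): (0, 1, 0, 0), (1, 0): (0, 0, 1, 0)}   # syndrome -> coset leader

def decode(y):
    s = tuple(np.array(y) @ H.T % 2)     # syndrome of the received word
    e = np.array(leaders[s])             # most likely error pattern
    return (np.array(y) + e) % 2         # over F_2, subtracting e is adding e

print(decode((1, 1, 1, 1)))  # [1 0 1 1]; leader 0100 was chosen for this coset
```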
3.3 Hamming codes
The fact that the syndrome of a received vector is equal to the sum of the
columns of the parity check matrix H where errors occurred gives us some
insight about how to construct a parity check matrix for a binary code that
will undergo coset or syndrome decoding. First, the columns of H should all be
nonzero, since otherwise an error in the corresponding position would not affect
the syndrome and would not be detected by the syndrome decoder. Second, the
columns of H should be distinct, since if two columns were equal, then errors
in those two positions would be indistinguishable.
We use these observations to build a family of codes known as the binary Hamming codes, $H_r$, r ≥ 2. Any parity check matrix for the Hamming code $H_r$ has r rows, which implies that each column of the matrix has length r. There are precisely $2^r - 1$ nonzero binary vectors of length r, and in order to construct a parity check matrix of $H_r$, we use all of these $2^r - 1$ column vectors.
One parity check matrix for the binary [7, 4, 3] Hamming code H3 , where
columns are taken in the natural order of increasing binary numbers is as follows:
0 0 0 1 1 1 1
0 1 1 0 0 1 1
1 0 1 0 1 0 1
We now rearrange the columns to get a parity check matrix H in standard form:
0 1 1 1 1 0 0
1 0 1 1 0 1 0
1 1 0 1 0 0 1
Question 3.33. Encode the messages 0000 and 1010 using G. Check that the
resulting codewords are valid by using H.
Question 3.34. Write down a parity check matrix for the binary Hamming
code with r = 4.
It is easy to decode Hamming codes, which are used when we expect to
have zero or one error per codeword. If we receive the vector y, compute the
syndrome S(y). If S(y) = 0, then assume that y was the codeword sent. If
S(y) 6= 0, then, assuming a single error, S(y) is equal to the column of H that
corresponds to the coordinate of y where the error occurred. Find the column
of H that matches S(y), and then correct the corresponding coordinate of y.
This decoding scheme is further simplified if the columns of H are arranged in
order of increasing binary numbers.
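The following sketch implements this decoder for $H_3$ with the columns of H in increasing binary order, so the syndrome, read as a binary number, is the (1-based) position of the single error.

```python
# A sketch of single-error decoding for the binary Hamming code H_r.
import numpy as np

r = 3
n = 2**r - 1
H = np.array([[(j >> (r - 1 - i)) & 1 for j in range(1, n + 1)]
              for i in range(r)])            # column j is j written in binary

def decode(y):
    s = (H @ y) % 2                          # syndrome
    pos = int("".join(map(str, s)), 2)       # its binary value = error position
    if pos:
        y = y.copy()
        y[pos - 1] ^= 1                      # flip the erroneous bit
    return y

y = np.zeros(n, dtype=int)                   # the all-zero codeword...
y[4] = 1                                     # ...with a single error in position 5
print(decode(y))                             # recovers 0000000
```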
Example 3.35. In Section 2.2, we studied extended codes, which are built
from existing codes by adding an overall parity check bit. This example shows
a parity check matrix of the extended code of the binary Hamming code H3 :
0 1 1 1 1 0 0 0
1 0 1 1 0 1 0 0
1 1 0 1 0 0 1 0
1 1 1 1 1 1 1 1
Notice that the last row of the extended parity check matrix gives an overall
parity-check equation on the codewords: x0 + x1 + · · · + xn = 0. Compare this
parity check matrix with the parity check for the corresponding Hamming code.
3.4 Problems
1. Show that if C is a binary linear code, then the extended code obtained
by adding an overall parity check to C is also linear.
2. Prove that either all of the codewords in a binary linear code have even
weight or exactly half have even weight.
6. Construct a standard array for a binary code having the following generator matrix:
$$\begin{pmatrix} 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix}$$
Decode the received vectors 11111 and 01011. Give examples of (a) two errors occurring in a codeword and being corrected and (b) two errors occurring in a codeword and not being corrected.
7. Construct a syndrome look-up table for the perfect binary [7, 4, 3] code which has generator matrix
$$\begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}$$
Use the table to decode the following received vectors: 0000011, 1111111, 1100110, and 1010101.
8. Show that a binary code can correct all single errors if and only if any
parity check matrix for the code has distinct nonzero columns.
9. Prove that any Hamming code has minimum distance equal to 3.
10. Suppose that C is a binary code with parity check matrix H. Write down
the parity check matrix for the extended code of C. (Your new parity
check matrix will involve the matrix H.)
11. In this chapter, we defined extended codes of binary codes so that every
codeword in the extended code has even Hamming weight. Generalize this
idea to define extended codes of q-ary codes. What condition should the
weight of each codeword in the extended code satisfy?
12. Let C be an [n, k] binary linear code with generator matrix G. If G does not have a column of zeroes, show that the sum of the weights of all the codewords is $n \cdot 2^{k-1}$. [Hint: How many codewords are there with a 1 in coordinate i for fixed i?] Generalize this to linear codes over the field $F_q$.
4.1 Ideals
We first studied rings in Chapter 8 of the main text [1]. Recall that we assume
rings have identity, 1. We will now study ideals, which are a special type of
subset of a ring.
Definition 4.1. A nonempty subset A of a ring R is an ideal of R if a + b ∈ A
whenever a, b ∈ A and ra, ar ∈ A whenever a ∈ A and r ∈ R. When R is
commutative, ar = ra, hence we need only check that ra ∈ A.
We say that an ideal A “absorbs” elements from R.
Example 4.2. For any ring R, {0} and R are ideals of R.
Question 4.3. For any positive integer n, prove that the set nZ = {0, ±n, ±2n . . .}
is an ideal of the ring of integers Z.
Question 4.4. Let $\langle 3 \rangle = \{3r \mid r \in \mathbb{Z}_{36}\}$. Prove that $\langle 3 \rangle$ is an ideal of the ring $\mathbb{Z}_{36}$.
Definition 4.5. Let R be a commutative ring with unity and let g ∈ R. The set $\langle g \rangle = \{rg \mid r \in R\}$ is an ideal of R called the principal ideal generated by g. The element g is called the generator of the principal ideal.
So, A is a principal ideal if there exists g ∈ A such that every element a ∈ A
can be written as rg for some r ∈ R.
We studied rings of the form F[x]/m(x), where F is a field, in Chapter 28 of [1]. We continue that study now, and in particular, we focus on ideals of $F_q[x]/(x^n - 1)$.
Question 4.6. Show that $\langle [x + 1] \rangle$ is an ideal in $F_2[x]/(x^7 - 1)$.
Theorem 4.7. Let A be an ideal in $F_q[x]/(x^n - 1)$. The following statements are true:
1. There exists a unique monic polynomial g(x) of minimal degree such that [g(x)] ∈ A. (A monic polynomial has a coefficient of 1 on its highest power term.)
2. A is principal with generator [g(x)].
3. g(x) divides $x^n - 1$ in $F_q[x]$.
$F_q[x]/(x^n - 1)$, the definition of an ideal implies that [h(x)] ∈ A. It follows that $\langle [g(x)] \rangle \subseteq A$. Since we have both inclusions, we conclude that $A = \langle [g(x)] \rangle$, hence A is a principal ideal with generator [g(x)].
3. Note that $[x^n - 1] = 0 \in A$, so by the Claim, g(x) must divide $x^n - 1$.
Example 4.10. The binary code C = {000, 101, 011, 110} is a cyclic code.
Question 4.11. What is the corresponding code polynomial for the codeword
(1, 0, 1, 1)?
Until now, we have been viewing linear codes as subspaces of $F_q^n = V(n, q)$. When we use code polynomials to represent codewords, we begin to view cyclic codes as subspaces of the ring $F_q[x]/(x^n - 1)$ of polynomials modulo $x^n - 1$. By working modulo $x^n - 1$, we can achieve a right cyclic shift of a codeword by multiplying the associated code polynomial by x: Consider the code polynomial $c(x) = c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$. Multiplying c(x) by x modulo $x^n - 1$ gives $c'(x) = c_0x + c_1x^2 + \cdots + c_{n-1}x^n \equiv c_{n-1} + c_0x + c_1x^2 + \cdots + c_{n-2}x^{n-1} \pmod{x^n - 1}$. The codeword associated with c′(x) is $(c_{n-1}, c_0, \ldots, c_{n-2})$, which is clearly the right cyclic shift of the codeword associated with c(x). From now on, we will use the terms “codeword” and “code polynomial” interchangeably. This abuse reflects the fact that you should be thinking about codewords and code polynomials as representing the same thing.
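The identity is easy to check computationally; the sketch below represents a code polynomial by its coefficient tuple $(c_0, \ldots, c_{n-1})$.

```python
# A sketch of multiplication by x modulo x^n - 1 as a right cyclic shift.
def shift_by_x(c):
    """Coefficients of x*c(x) mod x^n - 1, for c = (c_0, ..., c_{n-1})."""
    return (c[-1],) + c[:-1]

print(shift_by_x((1, 0, 1, 1)))  # (1, 1, 0, 1), the right cyclic shift
```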
We now use the language of code polynomials to characterize cyclic codes:
Proof. Suppose C is a cyclic code of length n over $F_q$. Then C is linear, so Condition (1) holds. Now suppose that a(x) ∈ C and $r(x) = r_0 + r_1x + \cdots + r_{n-1}x^{n-1} \in F_q[x]/(x^n - 1)$. As discussed above, multiplication of a code polynomial by x corresponds to a cyclic shift of the corresponding codeword. Hence, xa(x) ∈ C. Similarly, $x(xa(x)) = x^2a(x) \in C$, and so on. It follows that $r(x)a(x) = r_0a(x) + r_1xa(x) + \cdots + r_{n-1}x^{n-1}a(x)$ is also in C since each summand is in C. Therefore, Condition (2) also holds.
On the other hand, suppose that Conditions (1) and (2) hold. If we take r(x) to be a scalar, the conditions imply that C is a linear code. Then, if we take r(x) = x, Condition (2) implies that C is a cyclic code.
Theorem 4.12 implies the following:
Corollary 4.13. Cyclic codes of length n over $F_q$ are precisely the ideals in the ring $F_q[x]/(x^n - 1)$.
Proof. Suppose C is a cyclic code of length n over $F_q$. Then, its set of code polynomials is defined in $F_q[x]/(x^n - 1)$. Since C is a linear code, it is closed under all linear combinations of the code polynomials. Furthermore, since C is a cyclic code, r(x)a(x) ∈ C for any polynomial $r(x) \in F_q[x]/(x^n - 1)$. Therefore, C satisfies all conditions of being an ideal in $F_q[x]/(x^n - 1)$.
On the other hand, suppose that A is an ideal in $F_q[x]/(x^n - 1)$. Then its elements are polynomials of degree less than or equal to n − 1, and the set of polynomials is closed under linear combinations. This shows that the polynomials represent codewords in a linear code. Furthermore, by the definition of an ideal, if a(x) ∈ A, then r(x)a(x) ∈ A for any polynomial $r(x) \in F_q[x]/(x^n - 1)$. This implies that A is a cyclic code.
We can actually say more:
Theorem 4.14. Let C be a q-ary [n, k] cyclic code.
1. In C, there is a unique monic code polynomial g(x) of minimal degree r < n, called the generator polynomial of C.
2. Every code polynomial c(x) ∈ C can be expressed uniquely as c(x) = m(x)g(x), where g(x) is the generator polynomial and m(x) is a polynomial of degree less than (n − r) in $F_q[x]$.
3. The generator polynomial g(x) divides $x^n - 1$ in $F_q[x]$.
Proof. This theorem is a translation of Theorem 4.7 into the language of coding theory. See the proof of Theorem 4.7 above.
Although each cyclic code contains a unique monic generating polynomial of
minimal degree, it may also contain other polynomials that generate the code.
These other polynomials are either not monic or not of minimal degree.
The above theorems say a great deal about cyclic codes. We can henceforth
think of ideals and cyclic codes as equivalent. More specifically, each cyclic code
C is a principal ideal. Since ideals of $F_q[x]/(x^n - 1)$ are generated by divisors of $x^n - 1$, there are precisely as many cyclic codes of length n as there are divisors of $x^n - 1$ in $F_q[x]$. More formally:
Theorem 4.15. There is a one-to-one correspondence between monic divisors of $x^n - 1$ in $F_q[x]$ and q-ary cyclic codes of length n.
Proof. By Corollary 4.13 and Theorem 4.14, any q-ary [n, k] cyclic code C is an ideal in $F_q[x]/(x^n - 1)$ with a unique monic generating polynomial g(x) that divides $x^n - 1$. Hence, to any q-ary [n, k] cyclic code, we can associate a unique monic divisor of $x^n - 1$.
On the other hand, suppose h(x) is a monic divisor of $x^n - 1$ in $F_q[x]$. Consider the code $C = \langle [h(x)] \rangle$. Clearly, the code polynomial h(x) generates C. Suppose however that h(x) is not the generating polynomial. Then C contains some other monic generating polynomial g(x) of minimal degree. Since [h(x)] ∈ C and [g(x)] is the generating polynomial for C, Claim 4.8 implies that g(x) divides h(x). On the other hand, since $[g(x)] \in C = \langle [h(x)] \rangle$, [g(x)] is of the form [g(x)] = [h(x)][m(x)] for some $[m(x)] \in F_q[x]/(x^n - 1)$. Hence, h(x) divides g(x) modulo $x^n - 1$. It now follows that g(x) = h(x). Hence, any monic divisor h(x) of $x^n - 1$ is the unique generator polynomial for the cyclic code $C = \langle [h(x)] \rangle$, and there is a one-to-one correspondence between monic divisors of $x^n - 1$ and cyclic codes of length n.
Theorem 4.15 facilitates listing all of the cyclic codes of a given length.
Example 4.16. In order to find all of the binary cyclic codes of length 3, we must factor $x^3 - 1 = (x + 1)(x^2 + x + 1)$ into irreducible polynomials over $F_2$. There are four possible generator polynomials: $g_0(x) = 1$ generates the code which is equal to all of $F_2[x]/(x^3 - 1)$; $g_1(x) = x + 1$ generates the code $\{0, 1 + x, x + x^2, 1 + x^2\}$; $g_2(x) = x^2 + x + 1$ generates the code $\{0, 1 + x + x^2\}$; $g_3(x) = x^3 + 1$ generates the code $\{0\}$.
Notice that the first row is the vector corresponding to the generator poly-
nomial g1 (x), and the second row is its cyclic shift. We next show that we can
always use the generator polynomial to define the generator matrix of a cyclic
code in this way.
Theorem 4.17. Suppose C is a cyclic code with generator polynomial $g(x) = g_0 + g_1x + \cdots + g_rx^r$ of degree r. Then the dimension of C is n − r, and a generator matrix for C is the following (n − r) × n matrix:
$$G = \begin{pmatrix}
g_0 & g_1 & \cdots & g_r & 0 & 0 & \cdots & 0 \\
0 & g_0 & g_1 & \cdots & g_r & 0 & \cdots & 0 \\
0 & 0 & g_0 & g_1 & \cdots & g_r & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\
0 & 0 & \cdots & 0 & g_0 & g_1 & \cdots & g_r
\end{pmatrix}$$
Proof. First, note that $g_0$ is nonzero: otherwise, $(0, g_1, \ldots, g_r, 0, \ldots, 0) \in C$, which implies that $(g_1, \ldots, g_r, 0, \ldots, 0) \in C$, which implies that $g_1 + g_2x + \cdots + g_rx^{r-1} \in C$, contradicting the minimality of the degree r of the generating polynomial. Now, we see that the n − r rows of the matrix G are linearly independent because of the echelon of nonzero $g_0$s with 0s below. These n − r rows represent the code polynomials $g(x), xg(x), x^2g(x), \ldots, x^{n-r-1}g(x)$. In order to show that G is a generator matrix for C we must show that every code polynomial in C can be expressed as a linear combination of $g(x), xg(x), x^2g(x), \ldots, x^{n-r-1}g(x)$. Part 2 of Theorem 4.14 shows that if c(x) is a code polynomial in C, then c(x) = m(x)g(x) for some polynomial m(x) of degree less than n − r in $F_q[x]$. Hence,
$$c(x) = m(x)g(x) = (m_0 + m_1x + \cdots + m_{n-r-1}x^{n-r-1})g(x) = m_0g(x) + m_1xg(x) + \cdots + m_{n-r-1}x^{n-r-1}g(x),$$
which shows that any code polynomial c(x) in C can be written as a linear combination of the code polynomials represented by the n − r independent rows of G. We conclude that G is a generator matrix for C and the dimension of C is n − r.
We now have an easy way to write down a generator matrix for a cyclic
code. As with linear codes, we can encode a cyclic code by performing a matrix
multiplication. However, Part (2) of Theorem 4.14 suggests a different encoding
scheme for cyclic codes. We can use the generating polynomial g(x) to encode a message $a = (a_0, \ldots, a_{k-1}) = a(x)$ by performing the polynomial multiplication
a(x)g(x). This simple polynomial multiplication can be used instead of storing
and using an entire generator matrix for the cyclic code.
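A sketch of this encoding scheme over $F_2$ follows; the generator polynomial chosen, $g(x) = 1 + x + x^3$, is one divisor of $x^7 - 1$, and the message is an arbitrary example.

```python
# A sketch of cyclic encoding by polynomial multiplication over F_2.
# Coefficient lists are in increasing-degree order.
def poly_mul_gf2(a, g):
    out = [0] * (len(a) + len(g) - 1)
    for i, ai in enumerate(a):
        for j, gj in enumerate(g):
            out[i + j] ^= ai & gj            # addition in F_2 is XOR
    return out

g = [1, 1, 0, 1]                             # g(x) = 1 + x + x^3 divides x^7 - 1
a = [0, 1, 0, 1]                             # message a(x) = x + x^3
print(poly_mul_gf2(a, g))                    # coefficients of the codeword a(x)g(x)
```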
Question 4.19. Let g(x) = 1 + x2 + x3 be the generator polynomial for a
binary cyclic code of length 7. Encode the message (1, 0, 0, 1) without using a
generating matrix.
You may have noticed that the above way of obtaining a generator matrix
of a cyclic code gives a matrix that is not in standard form. Therefore, we
can not automatically write down the parity check matrix, as we can when we
have a generator matrix in standard form. However, we next define parity check
polynomials that can be used to easily construct parity check matrices.
Definition 4.20. The parity check polynomial h(x) for an [n, k] cyclic code C is the polynomial such that $g(x)h(x) = x^n - 1$, where g(x) is the degree r generator polynomial for C. Furthermore, h(x) is monic of degree k = n − r.
Since c(x) is a code polynomial if and only if it is a multiple of g(x), it follows that c(x) is a code polynomial if and only if $c(x)h(x) \equiv 0 \pmod{x^n - 1}$.
Question 4.21. Prove the above statement.
Theorem 4.22. Suppose C is an [n, k] cyclic code with parity check polynomial $h(x) = h_0 + h_1x + \cdots + h_kx^k$. Then, a parity check matrix for C is the following (n − k) × n matrix:
$$H = \begin{pmatrix}
h_k & h_{k-1} & \cdots & h_0 & 0 & 0 & \cdots & 0 \\
0 & h_k & h_{k-1} & \cdots & h_0 & 0 & \cdots & 0 \\
0 & 0 & h_k & h_{k-1} & \cdots & h_0 & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\
0 & 0 & \cdots & 0 & h_k & h_{k-1} & \cdots & h_0
\end{pmatrix}$$
Proof. You proved above that if c(x) ∈ C, then $c(x)h(x) \equiv 0 \pmod{x^n - 1}$. The coefficient of $x^j$ in the product c(x)h(x) is $\sum_{i=0}^{n-1} c_i h_{j-i}$, where the subscripts are taken modulo n. So, if c(x) ∈ C, we have
$$\sum_{i=0}^{n-1} c_i h_{j-i} = 0 \quad \text{for } 0 \le j \le n-1. \qquad (14)$$
Example 4.23. The [7, 3] code C constructed in Example 4.18 has parity check polynomial $h(x) = (x^7 - 1)/g(x) = 1 + x^2 + x^3$. The following $(n - k) \times n = 4 \times 7$ matrix is a parity check matrix for C:
1 1 0 1 0 0 0
0 1 1 0 1 0 0
0 0 1 1 0 1 0
0 0 0 1 1 0 1
Question 4.24. What condition (equation) should the generator matrix from
Example 4.18 and parity check matrix from Example 4.23 together satisfy?
Check that these matrices do satisfy that equation.
We defined Hamming codes Hr in Section 3.3 by constructing their parity
check matrices. We now show that these codes are equivalent to cyclic codes.
Definition 4.26. A permutation of a set A is a function from A to itself that
is both one-to-one and onto.
We will focus on the case where A = {0, 1, . . . , n − 1}. Here, permutations
are rearrangements of the numbers.
Example 4.27. Let A = {0, 1, . . . , 5} and let π be the map from A to A that
sends the ordered numbers 0, 1, 2, 3, 4, 5 to the ordered numbers 4, 5, 3, 1, 0,
2. (That is, π(0) = 4, π(1) = 5, and so on.) Then π is a permutation of the set
A.
We often use cycle notation to describe a permutation. For the permutation
in Example 4.27, the cycle notation is π = (1 5 2 3)(0 4). The rightmost cycle
means that 0 is sent to 4, and 4 is sent to 0. The leftmost cycle means that 1
is sent to 5, 5 is sent to 2, 2 is sent to 3, and 3 is sent to 1. As this example
shows, within each cycle, we read left to right. However, since the juxtaposition
of cycles denotes composition, we read the rightmost cycle first, and then work
leftward. (Some books vary in the convention of which order to read cycles or
functions in a composition. We will always read right to left.) For this particular
permutation, the order of the cycles does not matter. However, the following
example shows that the order of the cycles can make a difference.
Example 4.28. Consider the permutation γ on {1, 2, 3} represented by cycle
notation (1 2 3)(3 1). To determine γ(2), we begin with the rightmost cycle and
notice that 2 is not involved in this cycle. Moving leftwards, the next cycle
shows that 2 is sent to 3. Since there are no more cycles, we conclude that
γ(2) = 3.
Now consider the permutation γ 0 represented by cycle notation (3 1)(1 2 3).
To determine γ 0 (2), we begin with the rightmost cycle and notice that 2 gets
sent to 3. Moving leftwards, we see that 3 gets sent to 1. Since there are no
more cycles, we conclude that γ 0 (2) = 1. This shows that in general, cycles do
not commute.
As you might guess, there are certain conditions under which cycles do com-
mute. This is formalized in the next theorem.
Theorem 4.29. Disjoint cycles commute: If the pair of cycles a = (a1 . . . am )
and b = (b1 . . . bn ) have no entries in common, then ab = ba.
Proof. See [2].
Because disjoint cycles commute, we like to express permutations in terms
of disjoint cycles. For example, we can rewrite the permutation γ = (1 2 3)(3 1)
from Example 4.28 as (1)(3 2). The cycle (1) means γ(1) = 1, or in other words,
1 is fixed by the permutation γ. It is customary to omit from the cycle notation
the elements that are fixed by the permutation. For the example of γ, we omit
the (1) and simply write γ = (3 2). Another example of a permutation that
fixes some elements is the permutation ρ on A = {0, 1, . . . , 5} that sends the
ordered numbers 0, 1, 2, 3, 4, 5 to the ordered numbers 0, 1, 2, 5, 3, 4. In cycle
notation, we write ρ = (3 5 4).
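The right-to-left convention is mechanical enough to code up; the sketch below applies a product of cycles to a point, rightmost cycle first.

```python
# A sketch of evaluating a product of cycles, reading right to left.
def apply_cycles(cycles, i):
    """cycles is a list of tuples, written left to right as in the text."""
    for cycle in reversed(cycles):           # the rightmost cycle acts first
        if i in cycle:
            i = cycle[(cycle.index(i) + 1) % len(cycle)]
    return i

gamma = [(1, 2, 3), (3, 1)]                  # the permutation of Example 4.28
print(apply_cycles(gamma, 2))                # 3, as computed in the text
print(apply_cycles(gamma, 3))                # 2, so gamma = (3 2) with 1 fixed
```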
Question 4.30. Express the following permutations as products of disjoint
cycles:
1. (1 2 3)(3 4)(5 0 2 1)
2. (3 2)(5 0 2)(2 3 5)
3. (0 5)(0 1 2 3 4 5)
where στ is the product of σ and τ in the group Sn ; recall that this product is
the composite σ ◦ τ , so that (στ )(i) = σ(τ (i)). [Hint: This problem is easy if
you use the definition as given in (15). But you are likely to get confused, and
even think that (18) is wrong, if you try to use (16).]
Define σn on An = {0, 1, . . . , n − 1} as the permutation written with cycle
notation as σn = (0 1 . . . n − 1). Hence σn (0) = 1, σn (1) = 2, . . ., σn (n − 1) = 0.
For convenience, we write σn = σ. Consider xσ, where x is any vector of length
n:
This shows that xσ is a cyclic shift of x. We can use this notation to restate
the definition of a cycle code:
Definition 4.34. A linear code C of length n is cyclic if and only if whenever
c ∈ C, so is cσn .
Let C be a binary linear [n, k] code. It follows from Definition 3.12 that
every permutation of the n coordinates of the codewords of C sends C onto an
equivalent [n, k] code or onto itself.
Definition 4.35. The set of all permutations that send a code C onto itself is
called the group of the code C. It is a group under function composition, and
it is denoted G(C).
In other words, G(C) is the group of permutations τ ∈ Sn such that for any
codeword c ∈ C, the vector cτ is also a codeword in C.
Question 4.36. Prove that the group of a code of length n is a subgroup of
Sn .
Example 4.37. If C is the whole space V (n, q), then G(C) = Sn .
Using the group of a code, we can offer another definition of a cyclic code:
Definition 4.38. A linear code C of length n is cyclic if G(C) contains the
cyclic group of order n generated by σn = (0 1 . . . n − 1).
For a cyclic code C, G(C) may be, and usually is, larger than the cyclic
group of order n generated by σn = (0 1 . . . n − 1).
Sometimes we need to find alternate generating matrices for a code, and
the group of the code can be used to do this: Any element in G(C) applied to
the coordinate positions of any generator matrix of C yields another generator
matrix of C. There are several other reasons that coding theorists study groups
of codes, for example the group of a code is useful in determining the structure
of the code, computing weight distributions, classifying codes, and devising
decoding algorithms. We say more about the latter application below.
What follows is a brief discussion on how the group of a code, if it is big
enough, can sometimes be used as an aid in decoding. This discussion was
written by Professor Ken Brown and is adapted from The Theory of Error-
Correcting Codes [4, p.513].
Consider an [n, k, d] linear code C with k ×n generator matrix G in standard
form, G = (I | A). For simplicity, let’s assume the code is binary. We then have
an (n − k) × n parity check matrix $H = (A^T \mid I)$. Given a k-bit message u, we
encode it as x by setting
x = uG = (u | uA).
Since the first k bits of x contain the message u, we call them the information
bits; the last n − k bits, given by uA, are called check bits.
Assume d ≥ 2t + 1, so that t errors can be corrected. Suppose a codeword
x is sent and an error e with weight w(e) ≤ t occurs. Then y = x + e is
received, and we know in principle that it is possible to decode y and recover the
codeword x. We also know that the syndrome $S(y) = yH^T$ can help us do this. The following theorem describes a particularly easy case. Write $e = (e' \mid e'')$, where e′ contains the first k bits of e.
Theorem 4.39. Suppose the syndrome S(y) = s has weight ≤ t. Then e′ = 0 (so the k information bits of y are correct) and e″ = s. We therefore have $x = y - (0 \mid s)$.
Proof. We will prove the equivalent statement that if e′ ≠ 0 then w(s) > t. Using $H = (A^T \mid I)$ and remembering that $xH^T = 0$, one computes
$$s = yH^T = eH^T = e'A + e''. \qquad (22)$$
Consequently,
$$w(s) \ge w(e'A) - w(e''). \qquad (23)$$
On the other hand, the vector $e'G = (e' \mid e'A)$ is a nonzero codeword, so it has weight ≥ 2t + 1. Thus $w(e') + w(e'A) \ge 2t + 1$, so
$$w(s) \ge (2t + 1 - w(e')) - w(e'') = 2t + 1 - w(e) \ge t + 1 > t,$$
as required.
The converse is also true (and easier): If e′ = 0, so that there is no error in the information bits, then the syndrome satisfies w(s) ≤ t. To see this, observe that s = e″ by (22), and $w(e'') \le w(e) \le t$.
To summarize the discussion so far, decoding is easy if we’re lucky enough
that the errors don’t affect the information bits; moreover, we can tell whether
we were lucky by looking at the weight of the syndrome. What if we’re unlucky?
This is where the group G(C) can help.
Suppose that for every vector e of weight ≤ t there is a permutation σ ∈
G(C) such that the vector eσ obtained by permuting the coordinates according
to σ has all 0’s in its first k bits. Then we have the following decoding algorithm,
always assuming that there are at most t errors: If y (which equals x + e) is
received, find σ ∈ G(C) such that yσ (which equals xσ + eσ) has syndrome
of weight ≤ t. Such a σ exists by our hypothesis and by the discussion above;
however, we may have to use trial and error to find it since we don’t know e.
Then we can decode yσ as explained in the theorem, giving us xσ, and then we
can recover x by “unpermuting” the coordinates, i.e., by applying σ −1 .
In practice, one tries to find a small subset P ⊆ G(C) such that the permu-
tation σ above can always be found in P ; this cuts down the search time. For
example, consider the [7, 4, 3] Hamming code, with t = 1. Let’s take a version of
the Hamming code that is cyclic, as we know is possible. Then G(C) contains
at least the cyclic group of order 7 generated by the cyclic shift. In this case
one can take P ⊆ G(C) to consist of 3 of the elements of the cyclic group. In
other words, one can specify 3 “rotations” that suffice to move the one nonzero
bit of any vector e of weight 1 out of the 4 information positions.
Example 4.43. The minimal polynomial of $i \in \mathbb{C}$ with respect to the subfield $\mathbb{R}$ is $x^2 + 1$.
Example 4.44. Let $\beta = a + bi \in \mathbb{C}$ where b ≠ 0. The minimal polynomial of β with respect to $\mathbb{R}$ is
$$(x - \beta)(x - \bar{\beta}) = x^2 - (\beta + \bar{\beta})x + \beta\bar{\beta} = x^2 - 2ax + (a^2 + b^2).$$
When working over finite fields, we would like to use some homomorphism in a way that is analogous to our use of conjugation in the case involving $\mathbb{C}$ and its subfield $\mathbb{R}$. Suppose that we are dealing with $\alpha \in F_{q^m}$ and the subfield $F_q$. The analog of complex conjugation is the homomorphism $\phi: F_{q^m} \to F_{q^m}$ defined by $\phi: \alpha \mapsto \alpha^q$. This map fixes precisely the subfield $F_q$. We might guess that the minimal polynomial of an element $\alpha \in F_{q^m}$ with respect to $F_q$ takes the form $(x - \alpha)(x - \phi(\alpha))$. However, this needs a slight modification. For complex conjugation, notice that $\varphi(\varphi(\beta)) = \beta$, so it would be redundant to include the factor $(x - \varphi^2(\beta))$ in the minimal polynomial for $\beta \in \mathbb{C}$. However, in the finite field case, $\phi^2(\alpha) = \alpha^{q^2}$, which is generally not equal to α. In general, the minimal polynomial of $\alpha \in F_{q^m}$ with respect to $F_q$ has the form
$$(x - \alpha)(x - \phi(\alpha))(x - \phi^2(\alpha)) \cdots = (x - \alpha)(x - \alpha^q)(x - \alpha^{q^2}) \cdots$$
The product terminates after the factor $(x - \alpha^{q^{v-1}})$, where v is the smallest integer such that $\phi^v(\alpha) = \alpha$. We make all of these ideas more formal below.
Definition 4.47. The conjugacy class of α ∈ Fqm with respect to the subfield
Fq is the set consisting of the distinct conjugates of α with respect to Fq . Note
that α is always contained in its conjugacy class.
Theorem 4.48. The conjugacy class of $\alpha \in F_{q^m}$ with respect to the subfield $F_q$ contains v elements, where v divides m and v is the smallest integer such that $\alpha^{q^v} = \alpha$.
Proof. See [8].
Example 4.49. Let α be an element of order 3 in $F_{16}$. The conjugates of α with respect to $F_2$ are α, $\alpha^2$, $\alpha^{2^2} = \alpha^{3+1} = \alpha$, etc. Hence, the conjugacy class of α with respect to $F_2$ is $\{\alpha, \alpha^2\}$.
Question 4.50. Convince yourself that the conjugacy class with respect to $F_4$ of α, an element of order 63 in $F_{64}$, is $\{\alpha, \alpha^4, \alpha^{16}\}$.
Theorem 4.51. Let α be an element in Fqm . Let p(x) be the minimal polyno-
mial of α with respect to Fq . The roots of p(x) are exactly the conjugates of α
with respect to Fq .
Proof. See [8].
Example 4.52. In Example 2 on page 419 of the main text [1], we constructed $F_8$ by identifying $F_2[x]/(x^3 + x + 1)$ with $F_2[\alpha]$, where $\alpha^3 + \alpha + 1 = 0$. Below, we arrange the eight elements of this field into conjugacy classes, using their exponential representations, and list their associated minimal polynomials. The minimal polynomial for the element $\alpha^i$ is denoted $p_i(x)$, and the minimal polynomial for the element 0 is denoted $p_*(x)$.

Conjugacy Class : Associated Minimal Polynomial
$\{0\}$ : $p_*(x) = x$
$\{\alpha^0 = 1\}$ : $p_0(x) = x + 1$
$\{\alpha, \alpha^2, \alpha^4\}$ : $p_1(x) = (x - \alpha)(x - \alpha^2)(x - \alpha^4) = x^3 + x + 1$
$\{\alpha^3, \alpha^6, \alpha^5\}$ : $p_3(x) = (x - \alpha^3)(x - \alpha^6)(x - \alpha^5) = x^3 + x^2 + 1$

Note that the simplification above is done using the relation $\alpha^3 = \alpha + 1$ and the fact that addition and subtraction are equivalent in fields of characteristic 2.
Following the method of the above example, it is now easy to find the minimal
polynomial of any field element α with respect to a certain subfield. We simply
find the appropriate conjugates of α and then immediately form the minimal
polynomial by multiplying together factors (x − β), where β runs through all
conjugates of α (including α itself).
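Computationally, a conjugacy class is determined by the exponents alone (a q-cyclotomic coset modulo the order of the group); the sketch below reproduces the classes of Example 4.52.

```python
# A sketch of computing the exponents in the conjugacy class of alpha^s with
# respect to F_q, where alpha has order n: repeatedly multiply by q mod n.
def conjugacy_class(s, q, n):
    exps, e = [], s % n
    while e not in exps:
        exps.append(e)
        e = (e * q) % n
    return exps

print(conjugacy_class(1, 2, 7))  # [1, 2, 4]: the class of alpha in F_8 over F_2
print(conjugacy_class(3, 2, 7))  # [3, 6, 5]
```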
Reed-Solomon codes were introduced by Irving Reed and Gustave Solomon in a five-page paper, “Polynomial Codes over Certain Finite Fields,” in 1960 [6], and they were given their present name by W. Wesley Peterson in 1961 [5].
BCH codes are cyclic codes whose generator polynomial satisfies a certain
condition that guarantees a certain minimum distance:
Definition 4.53. Fix an integer n. Let $F_{q^m}$ be the smallest extension field of $F_q$ that contains an element of order n. Let β be an element of $F_{q^m}$ of order n. A cyclic code of length n over $F_q$ is called a BCH code of designed distance δ if its generator polynomial $g(x) \in F_q[x]$ is the least common multiple of the minimal polynomials of $\beta^l, \beta^{l+1}, \ldots, \beta^{l+\delta-2}$, where l ≥ 0 and δ ≥ 1.
where the $\alpha_i$ are elements from any finite or infinite field. The transpose of this matrix is also called Vandermonde.
$$\begin{pmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\
1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & t_n & t_n^2 & \cdots & t_n^{n-1}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
Alternately, one can derive the expression for the determinant by showing that the determinant of V is equal to the determinant of a related matrix.

Every codeword c of the BCH code satisfies
$$Hc^T = \begin{pmatrix}
1 & \beta^{l} & \beta^{2l} & \cdots & \beta^{(n-1)l} \\
1 & \beta^{l+1} & \beta^{2(l+1)} & \cdots & \beta^{(n-1)(l+1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \end{pmatrix} = 0.$$
For a codeword c whose nonzero entries $c_{a_1}, \ldots, c_{a_w}$ sit in positions $a_1, \ldots, a_w$, keeping only the corresponding columns gives
$$\begin{pmatrix}
\beta^{a_1 l} & \cdots & \beta^{a_w l} \\
\beta^{a_1(l+1)} & \cdots & \beta^{a_w(l+1)} \\
\vdots & \ddots & \vdots \\
\beta^{a_1(l+\delta-2)} & \cdots & \beta^{a_w(l+\delta-2)}
\end{pmatrix}
\begin{pmatrix} c_{a_1} \\ c_{a_2} \\ \vdots \\ c_{a_w} \end{pmatrix} = 0,$$
and keeping only the first w of these rows gives
$$\begin{pmatrix}
\beta^{a_1 l} & \cdots & \beta^{a_w l} \\
\beta^{a_1(l+1)} & \cdots & \beta^{a_w(l+1)} \\
\vdots & \ddots & \vdots \\
\beta^{a_1(l+w-1)} & \cdots & \beta^{a_w(l+w-1)}
\end{pmatrix}
\begin{pmatrix} c_{a_1} \\ c_{a_2} \\ \vdots \\ c_{a_w} \end{pmatrix} = 0.$$
Since $c_{a_1}, \ldots, c_{a_w}$ are nonzero, the determinant of the matrix on the left is zero. This determinant is $\beta^{(a_1 + \cdots + a_w)l}$ times the determinant of the following matrix:
$$H' = \begin{pmatrix}
1 & \cdots & 1 \\
\beta^{a_1} & \cdots & \beta^{a_w} \\
\vdots & \ddots & \vdots \\
\beta^{a_1(w-1)} & \cdots & \beta^{a_w(w-1)}
\end{pmatrix}$$
Example 4.56. In this example, we build two binary BCH codes of length 31. Since $31 = 2^5 - 1$, we can find a primitive element of order 31 in $F_{32}$, and our BCH codes will be primitive. In particular, take β to be a root of the irreducible polynomial $x^5 + x^2 + 1$.
One can generate the following chart of conjugacy classes of elements in $F_{32}$ and their associated minimal polynomials.
Conjugacy Class : Associated Minimal Polynomial
$\{0\}$ : $p_*(x) = (x - 0) = x$
$\{\alpha^0 = 1\}$ : $p_0(x) = (x - 1) = x + 1$
$\{\alpha, \alpha^2, \alpha^4, \alpha^8, \alpha^{16}\}$ : $p_1(x) = (x - \alpha)(x - \alpha^2)(x - \alpha^4)(x - \alpha^8)(x - \alpha^{16}) = x^5 + x^2 + 1$
$\{\alpha^3, \alpha^6, \alpha^{12}, \alpha^{24}, \alpha^{17}\}$ : $p_3(x) = (x - \alpha^3)(x - \alpha^6)(x - \alpha^{12})(x - \alpha^{24})(x - \alpha^{17}) = x^5 + x^4 + x^3 + x^2 + 1$
$\{\alpha^5, \alpha^{10}, \alpha^{20}, \alpha^9, \alpha^{18}\}$ : $p_5(x) = (x - \alpha^5)(x - \alpha^{10})(x - \alpha^{20})(x - \alpha^9)(x - \alpha^{18}) = x^5 + x^4 + x^2 + x + 1$
$\{\alpha^7, \alpha^{14}, \alpha^{28}, \alpha^{25}, \alpha^{19}\}$ : $p_7(x) = (x - \alpha^7)(x - \alpha^{14})(x - \alpha^{28})(x - \alpha^{25})(x - \alpha^{19}) = x^5 + x^3 + x^2 + x + 1$
$\{\alpha^{11}, \alpha^{22}, \alpha^{13}, \alpha^{26}, \alpha^{21}\}$ : $p_{11}(x) = (x - \alpha^{11})(x - \alpha^{22})(x - \alpha^{13})(x - \alpha^{26})(x - \alpha^{21}) = x^5 + x^4 + x^3 + x + 1$
$\{\alpha^{15}, \alpha^{30}, \alpha^{29}, \alpha^{27}, \alpha^{23}\}$ : $p_{15}(x) = (x - \alpha^{15})(x - \alpha^{30})(x - \alpha^{29})(x - \alpha^{27})(x - \alpha^{23}) = x^5 + x^3 + 1$
We generally try to minimize the redundancy of a code in order to maximize the rate of the code. In constructing
BCH codes, this translates to trying to minimize the number of extraneous roots
in the minimal polynomial. We know from our study of minimal polynomials
and conjugacy classes that the extraneous zeros are the conjugates of the de-
sired zeros. We can often choose l carefully so as to avoid including too many
extraneous roots.
Reed-Solomon codes are among the most widely used codes today, due in large part to their use in compact disc digital audio systems. They are also well known for their role in providing pictures of Saturn and Neptune during space missions such as Voyager. These and several other applications of RS codes are described in [9].
One way to define RS codes is as a special case of nonbinary BCH codes:
Definition 4.57. A Reed-Solomon code is a q-ary BCH code of length q − 1 (q ≠ 2).
Consider the construction of a t-error-correcting Reed-Solomon code of length q − 1. The first step is to note that the required element β of order q − 1 can be found in $F_q$. Hence, the conjugacy class of β with respect to $F_q$ consists of the unique element β. It follows that all 2t consecutive powers of β are also in $F_q$, and their conjugacy classes with respect to $F_q$ are also singletons. It follows from Theorem 4.51 that the minimal polynomial for any power $\beta^s$ is of the form $(x - \beta^s)$. Hence, a t-error-correcting RS code has a generator polynomial of degree 2t with no extraneous roots. In particular, a RS code of length q − 1 and designed distance δ has generator polynomial $g(x) = (x - \beta^l)(x - \beta^{l+1}) \cdots (x - \beta^{l+\delta-2})$ for some l ≥ 0. If we need to construct a t-error-correcting RS code of length n = q − 1 over $F_q$, we define its generator polynomial as
$$g(x) = \prod_{j=0}^{2t-1} (x - \beta^{l+j}).$$
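As a concrete illustration, the sketch below computes such a generator polynomial over the prime field $F_7$, where ordinary integer arithmetic modulo 7 suffices (for fields like $F_{16}$ one would need full finite-field arithmetic). The choices l = 1, t = 1, and the primitive element 3 are assumptions made for the example.

```python
# A sketch of g(x) = (x - b^1)(x - b^2) for a 1-error-correcting RS code of
# length q - 1 = 6 over F_7, with b = 3 a primitive element of F_7.
q, b, t = 7, 3, 1

def poly_mul(f, g, q):
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % q
    return out

gpoly = [1]                                   # the constant polynomial 1
for j in range(1, 2 * t + 1):                 # roots b^1, ..., b^{2t}
    gpoly = poly_mul(gpoly, [(-pow(b, j, q)) % q, 1], q)
print(gpoly)                                  # [6, 2, 1], i.e., g(x) = 6 + 2x + x^2
```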
bound, we have d ≥ δ = n − k + 1. Now, we can combine both inequalities to
see that d = n − k + 1.
Theorem 4.59 shows that RS codes satisfy the Singleton bound of Theorem
3.23 with equality. This means that given their length and dimension, RS
codes are optimal in their distance properties. Codes that satisfy the Singleton
bound with equality are called maximum distance separable (MDS). One notable
property of MDS codes is that if a code is MDS, then so is its dual. (The proof
is left as a homework problem.)
4.6 Problems
1. Show that the subring of rational numbers is not an ideal in the ring of real numbers.
2. Let R be a commutative ring with unity and let a ∈ R. Prove that the set $\langle a \rangle = \{ra \mid r \in R\}$ is an ideal of R.
5. List all of the distinct ideals in $F_2[x]/(x^7 - 1)$ by listing their generators.
6. List by dimension all of the binary cyclic codes of length 31. Do the same
for length 63 and 19.
7. List the cosets of the linear code having the following generator matrix:
1 1 0 1 0 0 0
0 1 1 0 1 0 0
0 0 1 1 0 1 0
0 0 0 1 1 0 1
8. Let C be a q-ary linear code with parity check matrix H. Let $u, v \in F_q^n$. Prove directly that $uH^T = vH^T$ if and only if u and v lie in the same coset of C.
9. Find a basis for the smallest binary linear cyclic code of length 7 containing
the codeword 1101000.
10. Let g(x) be the generator polynomial of a binary cyclic code which contains some codewords of odd weight. Is the subset of even-weight codewords in $\langle g(x) \rangle$ a cyclic code? If so, what is the generator polynomial of this subcode?
11. Suppose that a generator matrix G of a linear code C has the property
that a cyclic shift of any row of G is also a codeword. Show that C is a
cyclic code.
12. If C is an [n, k]-cyclic code, show that every k successive positions are
information positions, if the first k are.
13. Fix b and prove the following: A BCH code of length n and design distance
δ1 contains as linear subspaces all length n BCH codes with design distance
δ2 ≥ δ1 .
14. Write down a generator polynomial g(x) for a 16-ary [15, 11] Reed–Solomon
code.
16. Prove the following: The minimum distance of the extended code of a RS
code, formed by adding a parity check digit to the end of each codeword
in the RS code, is increased by 1.
17. We studied the order of a group element in Chapter 11B of [1]. If a
permutation is written in cycle notation (using disjoint cycles), how can
you compute its order? Find the order of the following permutations:
(1 4), (1 4 7 6 2), and (1 2 3)(4 5 7).
18. Prove that Sn is non-abelian for all n ≥ 3.
19. Prove that the groups of dual codes satisfy the following equation: G(C) =
G(C ⊥ ).
5 Acknowledgements
The books in the following bibliography were of great help in writing this packet.
We borrowed some examples, problems, and proofs from these books, most often
[3] and [8].
The first version of this packet was written in the fall of 2001 at Cornell
University, while the author was being supported by an NSF VIGRE grant
held by the Mathematics Department. The current version was written during
June 2002, while the author was being supported by the Cornell Mathematics
Department. This version contains some additional theorems, proofs, exam-
ples, and content, and it has been revised to accommodate several suggestions
compiled by Ken Brown and Steph van Willigenburg while teaching Math 336
during the Spring 2002 semester. This version also incorporates some additional
problems and a handout on the group of a code developed by Ken Brown during
the Spring 2002 semester. The author thanks Ken and Steph for their helpful
comments and contributions.
References
[1] L. N. Childs, A concrete introduction to higher algebra, Springer-Verlag,
New York, second ed., 1995.
[2] J. Gallian, Contemporary Abstract Algebra, D.C. Heath and Company,
Lexington, MA, third ed., 1994.
[3] R. Hill, A first course in coding theory, Oxford University Press, New York,
first ed., 1986.
[6] I. S. Reed and G. Solomon, Polynomial codes over certain finite fields,
J. Soc. Indust. Appl. Math., 8 (1960), pp. 300–304.