Codes and Turbo Codes
Springer
Paris
Berlin
Heidelberg
New York
Hong Kong
London
Milan
Tokyo
Claude Berrou (Ed.)
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and
storage in data banks. Duplication of this publication or parts thereof is permitted only
under the provisions of the German Copyright Law of September 9, 1965, in its current
version, and permissions for use must always be obtained from Springer. Violations are
liable for prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
Product liability: The publishers cannot guarantee the accuracy of any information about
dosage and application contained in this book. In every individual case the user must
check such information by consulting the relevant literature.
with the invaluable assistance of Josette Jouas, Mohamed Koubàa and Nicolas
Puech.
Any comments on the contents of this book can be sent to this e-mail address:
turbocode@mlistes.telecom-bretagne.eu
"The oldest, shortest words — yes and no —
are those which require the most thought"
What is commonly called the information age began with a double big bang.
It was 1948 and the United States of America was continuing to invest heavily
in high-tech research, the first advantages of which had been reaped during the
Second World War. In the Bell Telephone Laboratories, located in New Jersey, to
the south of New York, several teams were set up around brilliant researchers,
many of whom had been trained at MIT (Massachusetts Institute of Technology).
That year two exceptional discoveries were made, one technological and the other
theoretical, which were to mark the 20th century. For, a few months apart, and
in the same institution John Bardeen, Walter Brattain and William Shockley
invented the transistor while Claude Elwood Shannon established information
and digital communications theory. This phenomenal coincidence saw the birth
of near-twins: the semi-conductor component which, according to its conduction
state (on or off), is able to materially represent binary information ("0" or "1")
and the Shannon or bit (short for binary unit ), a unit that measures information
capacity.
Today we can recognize the full importance of these two inventions that en-
abled the tremendous expansion of computing and telecommunications, to name
but these two. Since 1948, the meteoric progress of electronics, then of micro-
electronics, has provided engineers and researchers in the world of telecommuni-
cations with a support for their innovations, in order to continually increase the
performance of their systems. Who could have imagined, only a short while ago,
that a television programme could be transmitted via a pair of telephone wires?
In short, Shockley and his colleagues, following Gordon Moore’s law (which
states that the number of transistors on a silicon chip doubles every 18 months),
gradually provided the means to solve the challenge issued by Shannon, thanks
to algorithms that could only be more and more complex. A typical example
of this is the somewhat late invention of turbo codes and iterative processing
in receivers, which could only be imagined because the dozens or hundreds of
thousands of transistors required were available.
Experts in micro-electronics foresee the ultimate limits of CMOS technology
at around 10 billion transistors per square centimetre, in around 2015. This
is about the same as the number of neurons in the human brain (which will,
Contributors v
Foreword ix
1 Introduction 1
1.1 Digital messages 3
1.2 A first code 4
1.3 Hard input decoding and soft input decoding 7
1.4 Hard output decoding and soft output decoding 11
1.5 The performance measure 11
1.6 What is a good code? 15
1.7 Families of codes 17
2 Digital communications 19
2.1 Digital Modulations 19
2.1.1 Introduction 19
2.1.2 Linear Memoryless Modulations 22
2.1.3 Memoryless modulation with M states (M-FSK) 29
2.1.4 Modulations with memory by continuous phase frequency shift keying (CPFSK) 31
2.2 Structure and performance of the optimal receiver on a Gaussian channel 37
2.2.1 Structure of the coherent receiver 37
2.2.2 Performance of the coherent receiver 42
2.3 Transmission on a band-limited channel 59
2.3.1 Introduction 59
2.3.2 Intersymbol interference 60
2.3.3 Condition of absence of ISI: Nyquist criterion 63
2.3.4 Expression of the error probability in presence of Nyquist filtering 68
2.4 Transmission on fading channels 69
2.4.1 Characterization of a fading channel 69
2.4.2 Transmission on non-frequency-selective slow-fading channels 73
3 Theoretical limits 83
3.1 Information theory 83
3.1.1 Transmission channel 83
3.1.2 An example: the binary symmetric channel 84
3.1.3 Overview of the fundamental coding theorem 86
3.1.4 Geometrical interpretation 87
3.1.5 Random coding 88
3.2 Theoretical limits to performance 91
3.2.1 Binary input and real output channel 91
3.2.2 Capacity of a transmission channel 92
3.3 Practical limits to performance 96
3.3.1 Gaussian binary input channel 96
3.3.2 Gaussian continuous input channel 97
3.3.3 Some examples of limits 99
3.4 Minimum distances required 100
3.4.1 MHD required with 4-PSK modulation 100
3.4.2 MHD required with 8-PSK modulation 102
3.4.3 MHD required with 16-QAM modulation 104
Bibliography 107
Index 413
Chapter 1
Introduction
Redundancy, diversity and parsimony are the keywords of error correction cod-
ing. To these, on the decoding side, can be added efficiency, that is, making the
most of all the information available. To illustrate these concepts, consider a
simple situation in everyday life.
Two people are talking near a road where there is quite heavy traffic. The
noise of the engines more or less disrupts their conversation, with peaks of
perturbation noise corresponding to the vehicles going past. First assume that
one of the people regularly transmits one letter chosen randomly: "a", "b" ... or
any of the 26 letters of the alphabet, with the same probability (that is, 1/26).
The message does not contain any other information and there is no connection
between the sounds transmitted. If the listener doesn’t read the speaker’s lips,
he will certainly often be at a loss to recognize certain letters. So there will be
transmission errors.
Now, in another scenario, one of the two people speaks in full sentences, on a
very particular topic, for example, the weather. In spite of the noise, the listener
understands what the speaker says better than when he says individual letters,
because the message includes redundancy. The words are not independent and
the syllables themselves are not concatenated randomly. For example, we know
that after a subject we generally have a verb, and we guess that after "clou",
there will be "dy" even if we cannot hear properly, etc. This redundancy in the
construction of the message enables the listener to understand it better in spite
of the difficult transmission conditions.
Suppose that we want to improve the quality of the transmission even further,
in this conversation that is about to take an unexpected turn. To be sure of
being understood, the speaker repeats some of the words, for example "dark
dark". However, after the double transmission, the receiver has understood
"dark lark". There is obviously a mistake somewhere, but is it "dark" or "lark"
that the receiver is supposed to understand? No error correction is possible
using this repetition technique, except maybe to transmit the word more than
twice. "dark lark dark" can, without any great risk of error, be translated as
"dark".
where ⌊x⌋ denotes the whole part of x. Multimedia messages (voice, music,
fixed and moving images, text, etc.) transiting through communication systems
or stored in mass memories are exclusively binary. However, in this book
we shall sometimes have to consider alphabets with more than two elements.
This will be the case in Chapter 4, to introduce certain algebraic codes. In
Chapters 2 and 10, which deal with modulations, the alphabets, which we then
more concretely call constellations, contain a number of symbols that are a power
of 2, that is, we have precisely: m = log2 (M ).
Correction coding techniques are only implemented on digital messages.
However, there is nothing against constructing a redundant analogue message.
For example, an analogue signal, in its temporal dimension, accompanied or fol-
lowed by its frequency-domain representation obtained thanks to the Fourier transform,
performs a judiciously redundant coding. However, this technique is not very
simple and the decoder remains to be invented.
Furthermore, the digital messages that we shall be considering in what fol-
lows, before the coding operation has been performed, will be assumed to be
made up of binary elements that are mutually independent and taking the values
0 and 1 with the same probability, that is, 1/2. The signals that are produced
by a sensor like a microphone or a camera, and then digitized to become bi-
nary sequences, do not generally satisfy these properties of independence and
equiprobability. It is the same with text (for example, the recurrence of "e" in
an English text is on average 5 times higher than that of "f"). The effects of
dependency or disparity in the original message, whether they be of physical,
orthographical or semantic origin or whatever, cannot be exploited by the digital
communication system, which transmits 0s and 1s independently of their con-
text. To transform the original message into a message fulfilling the conditions of
independence and equiprobability, an operation called source coding, or digital
compression, can be performed. Today, compression standards like JPEG, MPEG,
ZIP, MUSICAM, etc. are well integrated into the world of telecommunications,
the Internet in particular. At the output of the source encoder, the statistical
properties of independence and equiprobability are generally respected and the
compressed message can then be processed by the channel encoder, which will
add redundancy mathematically exploitable by the receiver.
additions are performed and if there is no ambiguity in the notation, the term
modulo 2 may be omitted.
The encoder transforms the message containing four data bits: d =(d0 , d1 ,
d2 , d3 ) into a word of eight bits: c = (d0 , d1 , d2 , d3 , r0 , r1 , r2 , r3 ), called code-
word. The codeword is therefore separable into one part that is the information
coming from the source, called the systematic 1 part and a part added by the
encoder, called the redundant part. Any code producing codewords of this form
is called a systematic code. Most codes, in practice, are systematic but there is
one important exception in the family of convolutional codes (Chapter 5).
The law for the construction of the redundant part by the particular encoder
of Figure 1.1 can be simply written as:
$$r_j = d_j + \sum_{p=0}^{3} d_p \qquad (j = 0, \dots, 3) \qquad (1.3)$$
Table 1.1 shows the sixteen possible values of c, that is, the set {c} of codewords.
d0 d1 d2 d3 r0 r1 r2 r3 d0 d1 d2 d3 r0 r1 r2 r3
0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1
0 0 0 1 1 1 1 0 1 0 0 1 1 0 0 1
0 0 1 0 1 1 0 1 1 0 1 0 1 0 1 0
0 0 1 1 0 0 1 1 1 0 1 1 0 1 0 0
0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0
0 1 0 1 0 1 0 1 1 1 0 1 0 0 1 0
0 1 1 0 0 1 1 0 1 1 1 0 0 0 0 1
0 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1
Table 1.1 – The sixteen codewords that the encoder in Figure 1.1 can produce.
We can first note that the coding law is linear : the sum of two codewords is
also a codeword. It is the linearity of relation (1.3) that guarantees the linearity
of the encoding. All the codes that we shall consider in what follows are linear as
they are all based on two linear operations: addition and permutation (including
shifting). Since the code is linear and the transmission of a codeword might be
affected by a process that is also linear (the addition of a perturbation: noise,
interference, etc.), the choice of a codeword, to explain or justify the properties
of the code, is completely indifferent. It is the "all zero" codeword that will
play this "representative" or reference role for all the codewords vis-à-vis the
general properties of the encoder/decoder pair. At reception, the presence of 1
will therefore be representative of transmission errors.
1 We can also say information part because it is made up of bits of information coming from
the source.
The number of 1s contained in a codeword that is not "all zero" is called the
Hamming weight and is denoted wH . We can distinguish the weight relating
to the systematic part (wH,s ) and the weight relating to the redundant part
(wH,r ). We note, in Table 1.1, that wH is at least equal to 4. Because of
linearity, this also means that the number of bits that differ in two codewords
is also at least equal to 4. The number of bits that are different, when we
compare two binary words, is called the Hamming distance. The smallest of all
the distances between all the codewords, considered two by two, is called the
minimum Hamming distance (MHD) and denoted dmin . Linearity means that
it is also the smallest of the Hamming weights in the list of codewords excluding
the "all zero". dmin is an essential parameter for characterizing a particular
code, since the correction capability of the corresponding decoder is directly
linked to it.
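As a check on Table 1.1 and on the value dmin = 4, the construction rule (1.3) can be replayed in a few lines of code; the Python sketch below (function names are illustrative, not from the book) enumerates the sixteen codewords and takes the smallest weight of a non-zero codeword:

```python
from itertools import product

def encode(d):
    """Extended Hamming (8,4,4) encoder of Figure 1.1: r_j = d_j + sum(d) mod 2, as in (1.3)."""
    s = sum(d) % 2
    r = [(dj + s) % 2 for dj in d]
    return list(d) + r              # codeword c = (d0..d3, r0..r3)

codewords = [encode(d) for d in product([0, 1], repeat=4)]

# By linearity, dmin equals the smallest Hamming weight of a non-zero codeword
d_min = min(sum(c) for c in codewords if any(c))
print(d_min)                        # expected: 4
```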
We now write a decoding law for the code of Figure 1.1:
"After receiving the word c’ = (d0 ’, d1 ’, d2 ’, d3 ’, r0 ’, r1 ’, r2 ’, r3 ’) transmitted
by the encoder and possibly altered during transmission, the decoder chooses
the closest codeword ĉ in the sense of the Hamming distance".
The job of the decoder is therefore to run through Table 1.1 and, for each of the
sixteen possible codewords, to count the number of bits that differ from c’. The
ĉ codeword selected is the one that differs least from c’. When several solutions
are possible, the decoder selects one at random. Mathematically, this is written:
$$\hat{c} = c \in \{c\} \ \text{such that} \ \sum_{j=0}^{3} d_j \oplus d'_j + \sum_{j=0}^{3} r_j \oplus r'_j \ \text{is minimum} \qquad (1.4)$$
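A brute-force implementation of this decoding law is straightforward; the Python sketch below rebuilds the sixteen codewords of Table 1.1 and applies (1.4) to a received word containing a single error (names are illustrative):

```python
from itertools import product

def encode(d):
    # (8,4,4) extended Hamming encoder, construction rule (1.3)
    s = sum(d) % 2
    return list(d) + [(dj + s) % 2 for dj in d]

codewords = [encode(d) for d in product([0, 1], repeat=4)]

def decode_hard(c_rx):
    """Exhaustive hard input decoding (1.4): choose the codeword at minimum Hamming distance."""
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    return min(codewords, key=lambda c: dist(c, c_rx))

print(decode_hard([0, 0, 1, 1, 1, 1, 1, 0]))   # single error corrected -> [0, 0, 0, 1, 1, 1, 1, 0]
```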
non-correctable error detector. Finally, if c’ contains three 1s, the decoder will
find a single codeword at a distance of 1 and it will propose this codeword as
the most probable solution, but it will be erroneous.
The error correction capability of the extended Hamming code is therefore
t = 1 errors. More generally, the error correction capability of a code with a
minimum distance dmin is:
$$t = \left\lfloor \frac{d_{min} - 1}{2} \right\rfloor \qquad (1.5)$$
Note that the correction capability of the code given as the example in this
introduction is not decreased if we remove one (any) of the redundancy symbols.
The MHD passes from 4 to 3 but the correction capability is still of one error.
This shortened version is in fact the original Hamming code, the first correcting
code in the history of information theory (1948).
In a given family of codes, we describe a particular version of it by the
shortcut (n, k, dmin ) where n and k are the lengths of the codewords and
of the source messages, respectively. Up to now, we have thus just defined
two Hamming codes denoted (8, 4, 4) and (7, 4, 3). The second seems more
interesting as it offers the same error correction capability (t = 1) with a
redundancy rate $\tau = \frac{n-k}{k}$ of 0.75, instead of 1 for the first one. However, the code
(7, 4, 3) cannot play the role of an error detector: if the received word contains
two errors, the decoder will decide in favour of the single, erroneous codeword
that is to be found at a Hamming distance of 1.
Rather than redundancy rate, we usually prefer to use the notion of coding
rate, denoted R, and defined by:
$$R = \frac{k}{n} = \frac{1}{1+\tau} \qquad (1.6)$$
The product Rdmin will appear in the sequel as an essential figure of merit
vis-à-vis a perturbation caused by additive noise with a Gaussian distribution.
When the signal received by the decoder comes from a device capable of
producing estimations of an analogue nature on the binary data transmitted,
the error correction capability of the decoder can be greatly improved. To show
this using the example of the extended Hamming code, we must first change
alphabet and adopt an antipodal (or symmetric) binary alphabet. We will make
the transmitted values x = -1 and x = +1 correspond to the systematic binary
data d = 0 and d = 1, respectively. Similarly, we will make the transmitted
values y = -1 and y = +1 correspond to the redundant binary data r = 0 and
r = 1, respectively. We then have:
$$x = 2d - 1 = -(-1)^d, \qquad y = 2r - 1 = -(-1)^r \qquad (1.7)$$
Since the decoder has information about the degree of reliability of the values
received, called soft or weighted values in what follows, the decoding of the
extended Hamming code according to law (1.4) is no longer optimal. The law of
maximum likelihood decoding to implement in order to exploit these weighted
values depends on the type of noise. An important case in practice is additive
white Gaussian noise (AWGN).
u is a random Gaussian variable with mean μ and variance σ 2 when its
probability density p(u) can be expressed in the form:
$$p(u) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right) \qquad (1.8)$$
The AWGN is a perturbation which, after adapted filtering and periodic sam-
pling (see Chapter 2), produces independent samples whose amplitude follows
probability density law (1.8), with zero mean and variance:
$$\sigma^2 = \frac{N_0}{2} \qquad (1.9)$$
where N0 is the noise power spectral density.
A transmission channel on which the only alteration of the signal comes from
an AWGN is called a Gaussian channel. At the output of such a channel, the
ML decoding is based on the exhaustive search for the codeword that is at the
smallest Euclidean distance from the received word. Denoting X and Y the
received values corresponding to the transmitted symbols x and y respectively,
the soft input decoder of the extended Hamming code therefore chooses:
$$\hat{c} = c \in \{c\} \ \text{such that} \ \sum_{j=0}^{3} (x_j - X_j)^2 + \sum_{j=0}^{3} (y_j - Y_j)^2 \ \text{is minimum} \qquad (1.10)$$
Since the values transmitted are all such that $x_j^2 = 1$ or $y_j^2 = 1$ and all the
Euclidean distances contain $X_j^2$ and $Y_j^2$, the previous law can be simplified as:
$$\hat{c} = c \in \{c\} \ \text{such that} \ \sum_{j=0}^{3} (-2 x_j X_j) + \sum_{j=0}^{3} (-2 y_j Y_j) \ \text{is minimum}$$
or as:
$$\hat{c} = c \in \{c\} \ \text{such that} \ \sum_{j=0}^{3} x_j X_j + \sum_{j=0}^{3} y_j Y_j \ \text{is maximum} \qquad (1.11)$$
$$\hat{c} = c \in \{c\} \ \text{such that} \ \sum_{j=0}^{3} \left[(V_{max} + x_j X_j) + (V_{max} + y_j Y_j)\right] \ \text{is maximum} \qquad (1.12)$$
where [−Vmax , Vmax ] is the interval of the values that the input samples Xj and
Yj of the decoder can take after the clipping operation.
In Figure 1.4, the "all zero" codeword has been transmitted and received
with three alterations in the first three positions. These three alterations have
inverted the signs of the symbols but their amplitudes are at a fairly low level:
0.2, 0.4 and 0.1. Hard input decoding produces an erroneous result as the closest
codeword in terms of the Hamming distance is (1, 1, 1, 0, 0, 0, 0, 1). However,
soft input decoding according to (1.11) does produce the "all zero" word, whose
maximum scalar product is:
in comparison with:
Figure 1.4 – The "all zero" word (emission of symbols with the value -1) has been
altered during the transmission on the first three positions. The hard input decoding
is erroneous, but not the soft input decoding.
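The two decoding rules can be compared on a situation of the same kind as Figure 1.4. In the Python sketch below, the three inverted samples 0.2, 0.4 and 0.1 are taken from the text, while the five remaining samples are simply assumed to be -1, since the exact values of the figure are not reproduced here:

```python
from itertools import product

def encode(d):
    s = sum(d) % 2
    return list(d) + [(dj + s) % 2 for dj in d]

# Antipodal image of each codeword: bit 0 -> -1, bit 1 -> +1, as in (1.7)
antipodal = [[2 * b - 1 for b in encode(d)] for d in product([0, 1], repeat=4)]

def decode_soft(samples):
    """Soft input ML decoding (1.11): maximize the correlation with the received samples."""
    return max(antipodal, key=lambda c: sum(ci * si for ci, si in zip(c, samples)))

def decode_hard(samples):
    """Hard input decoding: threshold first, then minimize the Hamming distance."""
    bits = [1 if s > 0 else 0 for s in samples]
    return min(antipodal, key=lambda c: sum((ci > 0) != bi for ci, bi in zip(c, bits)))

# "All zero" word sent as eight -1 symbols; three low-amplitude sign inversions (assumed values)
received = [0.2, 0.4, 0.1, -1, -1, -1, -1, -1]
print(decode_hard(received))   # erroneous: the closest codeword in Hamming distance is not "all zero"
print(decode_soft(received))   # correct: the "all zero" word (eight -1 symbols) is recovered
```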
The ML decoding rules that we have just used in a specific example are easily
generalizable. However, we do realize that beyond a certain length of message,
such a decoding principle is unrealistic. Applying ML decoding to codewords
containing 240 bits in the systematic part, for example, would mean considering
as many codewords as atoms in the visible universe (1080 ). In spite of this, for
most of the codes known, non-exhaustive decoding methods have been imagined,
enabling us to get very close to the optimal result of the ML method.
introduced by the encoder and the decoder, the degree of flexibility of the code
(in particular its ability to conform to different lengths of message and/or to
different coding rates) are also to be considered more or less closely, depending
on the specific constraints of the communication system.
The residual errors that the decoder has not managed to correct are measured
by using two parameters. The binary error rate (BER) is the ratio between the
number of residual binary errors and the total number of bits of information
transmitted. The word, block or packet error rate (PER) is the number of
codewords badly decoded (at least one of the bits of information is wrong) out
of the total number of codewords transmitted. The ratio between BER and PER
is the average density of errors δe in the systematic part of a badly decoded word:
$$\delta_e = \frac{\bar{w}}{k} = \frac{\mathrm{BER}}{\mathrm{PER}} \qquad (1.13)$$
where w̄ = kδe is the average number of erroneous information bits in the
systematic part of a badly decoded block.
Figure 1.5 gives a typical example of the graphic representation of the per-
formance of error correction coding and decoding. The ordinate gives the BER
on a logarithmic scale and the abscissa carries the signal to noise ratio $E_b/N_0$, expressed
in decibels (dB). $N_0$ is defined by (1.9) and $E_b$ is the energy received
per bit of information. If $E_s$ is the energy received for each of the symbols of
the codeword, $E_s$ and $E_b$ are linked by:
$$E_s = R E_b \qquad (1.14)$$
Figure 1.5 – Error correction capability of (8, 4, 4) and (7, 4, 3) Hamming codes on a
Gaussian channel, with hard input and soft input decoding.
$$P_e \approx \frac{1}{2}\,\frac{\exp\left(-\dfrac{E_b}{N_0}\right)}{\sqrt{\pi \dfrac{E_b}{N_0}}} \qquad (1.15)$$
To evaluate the probability Pe,word that the soft-input decoder of a code with
rate R with minimum distance dmin produces an erroneous codeword, in the
previous equation we replace $E_b/N_0$ by $R\,d_{min}\,E_b/N_0$ and we introduce a multiplicative
coefficient denoted $N(d_{min})$:
$$P_{e,word} = \frac{1}{2} N(d_{min})\,\mathrm{erfc}\left(\sqrt{R\,d_{min}\frac{E_b}{N_0}}\right) \approx \frac{1}{2} N(d_{min})\,\frac{\exp\left(-R\,d_{min}\dfrac{E_b}{N_0}\right)}{\sqrt{\pi R\,d_{min}\dfrac{E_b}{N_0}}} \qquad (1.16)$$
The replacement of $E_b$ by $R E_b$ comes from (1.14), since the energy received per
symbol is $E_s = R E_b$. The multiplication by $d_{min}$ is explained by the ML decoding
rule (relation (1.11)), through which the decoder can discriminate the correct
codeword and its closest competitor codewords thanks to dmin distinct values.
Finally, the coefficient N (dmin ), called multiplicity, takes into account the num-
ber of competitor codewords that are the minimum distance away. For example,
in the case of the extended Hamming code, we have N (dmin = 4) = 14 (see Ta-
ble 1.1).
$$P_{e,bit} \approx \frac{1}{2}\,\delta_e\,N(d_{min})\,\frac{\exp\left(-R\,d_{min}\dfrac{E_b}{N_0}\right)}{\sqrt{\pi R\,d_{min}\dfrac{E_b}{N_0}}} \qquad (1.17)$$
Reading Table 1.1, we note that the average number of errors in the 14 competitor
words of weight 4, at the minimum distance from the "all zero" word, is 2.
Equation (1.17) applied to the extended Hamming code therefore gives:
$$P_{e,bit} \approx \frac{1}{2} \times \frac{2}{4} \times 14 \times \frac{\exp\left(-\frac{1}{2}\times 4 \times \dfrac{E_b}{N_0}\right)}{\sqrt{\pi \times \frac{1}{2}\times 4 \times \dfrac{E_b}{N_0}}} = 3.5\,\frac{\exp\left(-2\dfrac{E_b}{N_0}\right)}{\sqrt{2\pi \dfrac{E_b}{N_0}}}$$
This expression gives $P_{e,bit} = 2.8\times10^{-5}$, $1.8\times10^{-6}$ and $6.2\times10^{-8}$ for $E_b/N_0 = 7$, 8
and 9 dB respectively, which corresponds to the results of the simulation of
Figure 1.5. Such agreement between equations and experimentation cannot be
found so clearly for more complex codes. In particular, finding the competitor
codewords at distance dmin may not be sufficient and we then have to consider
words at distance dmin + 1, dmin +2 etc.
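As a quick numerical check of the values quoted above, the asymptotic expression (1.17) can be evaluated directly; the short Python sketch below assumes the parameters given in the text for the extended Hamming code (R = 1/2, dmin = 4, N(dmin) = 14, delta_e = 1/2):

```python
import math

R, d_min, N_dmin, delta_e = 0.5, 4, 14, 0.5    # extended Hamming code, values from the text

def pe_bit(ebn0_db):
    """Asymptotic bit error probability (1.17) for soft input decoding on a Gaussian channel."""
    x = R * d_min * 10 ** (ebn0_db / 10)        # R * dmin * Eb/N0 in linear scale
    return 0.5 * delta_e * N_dmin * math.exp(-x) / math.sqrt(math.pi * x)

for ebn0 in (7, 8, 9):
    print(ebn0, "dB ->", pe_bit(ebn0))          # about 2.8e-5, 1.8e-6 and 6.2e-8
```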
For a same value of $P_e$ and $P_{e,bit}$ provided by the relations (1.15) and (1.17)
respectively, the signal to noise ratios $\left(\frac{E_b}{N_0}\right)_{NC}$ and $\left(\frac{E_b}{N_0}\right)_{C}$ without coding (NC)
and with coding (C) are such that:
$$R\,d_{min}\left(\frac{E_b}{N_0}\right)_C - \left(\frac{E_b}{N_0}\right)_{NC} = \ln\left(\delta_e\,N(d_{min})\sqrt{\frac{\left(E_b/N_0\right)_{NC}}{R\,d_{min}\left(E_b/N_0\right)_C}}\right) \qquad (1.18)$$
If δe N (dmin ) is not too far from unity, this relation can be simplified as:
$$R\,d_{min}\left(\frac{E_b}{N_0}\right)_C - \left(\frac{E_b}{N_0}\right)_{NC} \approx 0$$
The asymptotic gain, expressed in dB, provides the gap between $\left(\frac{E_b}{N_0}\right)_{NC}$ and $\left(\frac{E_b}{N_0}\right)_{C}$:
$$G_a = 10\log\left(\frac{\left(E_b/N_0\right)_{NC}}{\left(E_b/N_0\right)_{C}}\right) \approx 10\log\left(R\,d_{min}\right) \qquad (1.19)$$
As mentioned above, Rdmin appears as a figure of merit which, in a link budget
with a low error rate, fixes the gain that a coding process can provide on a
Gaussian channel when the decoder is soft input. This is a major parameter
for communication system designers. For types of channel other than Gaussian
channels (Rayleigh, Rice, etc.), the asymptotic gain is always higher than what
is approximately given by (1.19).
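Applying the approximation (1.19) to the two codes discussed here takes one line per code; a minimal Python check (R and dmin taken from the text, everything else illustrative):

```python
import math

for name, R, d_min in (("(8,4,4)", 4 / 8, 4), ("(7,4,3)", 4 / 7, 3)):
    # Asymptotic gain (1.19): Ga ~ 10 log10(R * dmin), in dB
    print(name, round(10 * math.log10(R * d_min), 2), "dB")
# -> about 3.0 dB for (8,4,4) and 2.3 dB for (7,4,3)
```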
In Figure 1.5, the soft input decoding of the (8, 4, 4) Hamming code gives
the best result, with an observed asymptotic gain of the order of 2.4 dB, in
accordance with relation (1.18) that is more precise than (1.19). The (7, 4, 3)
code is slightly less efficient since the product Rdmin is 12/7 instead of 2 for
the (8, 4, 4) code. On the other hand, hard input decoding is unfavourable to
the extended code as it does not offer greater correction capability in spite of
a higher redundancy rate. This example is atypical: in the very large majority
of practical cases, the hierarchy of codes that can be established based on their
performance on a Gaussian channel, with soft input decoding, is respected for
other types of channels.
The search for the ideal encoder/decoder pair, since Shannon’s work, has
always had to face this dilemma: good convergence versus high MHD. Excellent
algebraic codes like BCH or Reed-Solomon codes were developed fairly early in
the history of correction coding (see Chapter 4). Their MHDs are high (and even
sometimes optimal) but it is not always easy to implement soft input decoding.
In addition, algebraic codes are generally "sized" for a specific length of codeword
and coding rate, which limits their fields of application. In spite of this, algebraic
codes are of great use in applications that require very low error rates, especially
mass memories and/or when soft information is not available.
It is only recently, with the introduction of iterative probabilistic decoding
(turbo decoding), that we have been able to obtain efficient error correction
close to the theoretical limit. And it is even more recently that we have been
able to obtain sufficient MHDs to avoid a change in slope that is penalizing for
the performance curve.
It is not easy to find a simple answer to the question posed at the beginning
of this section. Performance is, of course, the main criterion: for a given error
rate, counted either in BER or in PER and for a fixed coding rate, a good code is
first the one whose decoder offers a good error correction capability close to the
corresponding theoretical limit. One preliminary condition of this is obviously
the existence of a decoding algorithm (random codes do not have a decoder, for
example) and that the software and/or hardware of this algorithm should not be
too complex. Furthermore, using soft inputs could be an imperative that might
not be simple to satisfy.
Other criteria like decoding speed (that fixes the throughput of the infor-
mation decoded), latency (the delay introduced by the decoding process) or
flexibility (the ability of the code to be defined for various word lengths and
coding rates) are also to be taken into account in the context of the application
targeted.
Finally, non-technical factors may also be very important. Technological
maturity (do applications and standards already exist?), the cost of components,
possible intellectual property rights, strategic preferences or force of habit are
elements that carry weight when choosing a coding solution.
• the coding rates of algebraic codes are rather close to unity, whilst convo-
lutional codes have lower rates,
• block code decoding is rather of the hard input decoding type, and that
of convolutional codes is almost always soft input decoding.
Today, these distinctions are tending to blur. Convolutional codes can easily
be adapted to encode blocks and most decoders of algebraic codes accept soft
inputs. Via concatenation (Chapter 6), algebraic code rates can be lowered to
values comparable with those of convolutional codes. One difference remains,
however, between the two sub-families: the number of possible logical states of
algebraic encoders is generally very high, which prevents decoding by exhaustive
state methods. Decoding algorithms are based on techniques specific to each
code. Convolutional encoders have a limited number of states, 2 to 256 in
practice, and their decoding uses a complete representation of states, called
a trellis (Chapter 5). It is for this reason that the book is structured in a
traditional manner, which, for the time being, makes the distinction between
algebraic codes and convolutional codes.
Modern coding requires concatenated or composite structures, which use sev-
eral elementary encoders and whose decoding is performed by repeated passages
Chapter 2
Digital communications
where t denotes the time and f0 is constant. Modulation involves making one or
other of the parameters a (the amplitude) and ϕ (the phase) depend on the signal to
be transmitted. The modulated signal s(t) then has a narrow spectrum centred
on f0 , which is what we want.
The signal to be transmitted will in the sequel be called the modulating
signal. Modulation makes one of the parameters a and ϕ vary as a function of
the modulating signal if the latter is analogue. In the case of a digital signal, the
modulating signal is a series of elements of a finite set, or symbols, applied to the
modulator at discrete instants that are called significant instants. This series
is called the digital message and we assume that the symbols are binary data
applied periodically at the input of the modulator, every Tb seconds, therefore
with a binary rate D = 1/Tb bits per second. The binary data of this series are
assumed to be independent and identically distributed (iid). A given digital, for
example binary, message can be replaced by its "mth extension" obtained by
grouping the initial symbols into packets of m. Then the symbols are numbers
with m binary digits (or m-tuples), the total number of which is M = 2m ,
applied to the modulator at significant instants with period mTb. While the mth
extension is completely equivalent to the message (of which it is only a different
description), the signal modulated by the original message and the signal
modulated by its mth extension do not have the same properties, in particular
concerning their bandwidth, since the larger m is, the narrower the bandwidth.
The choice of integer m thus allows the characteristics of the modulated message
to be varied.
Consider the complex signal
of which s(t) is the real part, where j is a solution of the equation $x^2 + 1 = 0$.
We can represent σ(t) as the product
uous derivation, or increasing the size of the constellation to lower the central
lobe of the spectrum, mean an increase in the complexity of the modulator.
In the general case of amplitude-shift keying (ASK) with M states, the am-
plitude of the carrier is the modulated value aj = Aj h(t) for j = 1, 2, ..., M
where Aj takes a value among M = 2m values according to the group of data
presented at the input of the modulator and h(t) is a rectangular pulse with
unit amplitude and width T . The modulator thus provides signals of the form:
For an M-ASK signal, the different states of the modulated signal are situ-
ated on a straight line and its constellation is therefore one-dimensional in the
Fresnel plane. There are many ways to make the association between the value of
the amplitude of the modulated signal and the particular realization of a group
of $m = \log_2(M)$ binary data. In general, we associate with two adjacent values
taken by the amplitude, two groups of data that differ by only one binary value.
This particular association is called Gray coding. It enables the errors made by
the receiver to be minimized. Indeed, when the receiver selects an amplitude
adjacent to the emitted amplitude because of noise, which corresponds to the
most frequent situation, we make only one error for m = log2 (M ) data trans-
mitted. We show in Figure 2.2 two examples of signal constellations modulated
in amplitude by Gray coding.
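One systematic way to build such a mapping is the classical binary-to-Gray conversion, which guarantees that two adjacent amplitudes are labelled by groups of bits differing in a single position. The Python sketch below is only an illustration of the principle: the amplitude labelling (2j - 1 - M)A follows the text, but the exact bit assignment of Figure 2.2 is not necessarily the one produced here.

```python
def gray(n):
    """Classical binary-to-Gray conversion: adjacent integers give codes differing in one bit."""
    return n ^ (n >> 1)

def ask_gray_mapping(M, A=1.0):
    """Associate the amplitudes (2j - 1 - M)A, j = 1..M, with Gray-coded groups of log2(M) bits."""
    m = M.bit_length() - 1                       # m = log2(M)
    return {format(gray(j), f"0{m}b"): (2 * (j + 1) - 1 - M) * A for j in range(M)}

for bits, amp in sorted(ask_gray_mapping(8).items(), key=lambda kv: kv[1]):
    print(bits, amp)      # neighbouring amplitudes differ by exactly one bit
```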
Figure 2.2 – Example of 4-ASK and 8-ASK signal constellations with Gray coding
where the {ai } are a sequence of M -ary symbols, called modulation symbols,
which take the values (2j − 1 − M ), j = 1, 2, · · · , M . In the expression of the
modulated signal, i is the time index.
The signal S(t) can again be written in the form:
$$S(t) = \Re e\left\{s_e(t)\exp\left(j(2\pi f_0 t + \varphi_0)\right)\right\} \qquad (2.11)$$
Taking into account the fact that the data di provided by the source of infor-
mation are iid, the modulation symbols ai are independent, with zero mean and
variance equal to (M 2 − 1)/3.
It can be shown that the power spectral density (psd) of the signal S(t) is
equal to:
$$\gamma_S(f) = \frac{1}{4}\gamma_{s_e}(f - f_0) + \frac{1}{4}\gamma_{s_e}(f + f_0) \qquad (2.13)$$
with:
$$\gamma_{s_e}(f) = \frac{M^2 - 1}{3}\,A^2 T\left(\frac{\sin\pi f T}{\pi f T}\right)^2 \qquad (2.14)$$
The psd of se (t) expressed in dB is shown in Figure 2.3 as a function of the
normalized frequency f T , for M = 4 and A2 T = 1.
Figure 2.3 – Power spectral density (psd) of the complex envelope of a signal ASK-4,
with A2 T = 1.
The psd of S(t) is centred on the carrier frequency $f_0$ and its envelope
decreases as $f^{-2}$. It is made up of a main lobe of width 2/T and of sidelobes with
zero crossings at $f_0 \pm k/T$.
Note
The bandwidth is, strictly speaking, infinite. In practice, we can decide only
to transmit a percentage of the power of the signal S(t) and in this case, the
bandwidth is finite. If, for example, we decide to transmit 99% of the power of
the modulated signal, which results in only a low distortion of the signal S(t),
then the bandwidth is about 8/T where 1/T is the symbol rate. We shall see in
Section 2.3 that it is possible to greatly reduce this band without degrading the
performance of the modulation. This remark is valid for all linear modulations.
In this form, the M-PSK signal can be expressed as the sum of two quadrature
carriers, $\cos(2\pi f_0 t + \varphi_0)$ and $-\sin(2\pi f_0 t + \varphi_0)$, whose amplitudes are modulated by
$\cos\phi_j$ and $\sin\phi_j$, with $\cos^2\phi_j + \sin^2\phi_j = 1$. We can check that when M is
a multiple of 4, the possible values of the amplitude of the two carriers are
identical.
In Figure 2.4 we show two constellations of a phase modulated signal with
Gray coding. The constellations have two dimensions and the different states of
the modulated signal are on a circle of radius A. We say that the constellation
is circular.
Figure 2.4 – Examples of constellations of a phase modulated signal with Gray coding.
The energy Es for transmitting a phase state, that is, a group of log2 (M ) binary
data, is equal to:
$$E_s = \int_0^T A^2\cos^2(2\pi f_0 t + \varphi_0 + \phi_j)\,dt = \frac{A^2 T}{2} \quad \text{if } f_0 \gg 1/T \qquad (2.18)$$
Energy Es is always the same whatever the phase state transmitted. The energy
used to transmit a bit is Eb = Es / log2 (M ).
For the transmission of a continuous data stream, the modulated signal can
be written in the form:
$$S(t) = A\sum_i a_i h(t - iT)\cos(2\pi f_0 t + \varphi_0) - A\sum_i b_i h(t - iT)\sin(2\pi f_0 t + \varphi_0) \qquad (2.19)$$
where the modulation symbols ai and bi take their values in the following sets:
$$a_i \in \left\{\cos\left((2j+1)\frac{\pi}{M} + \theta_0\right)\right\}, \qquad b_i \in \left\{\sin\left((2j+1)\frac{\pi}{M} + \theta_0\right)\right\}, \qquad 0 \le j \le M-1 \qquad (2.20)$$
The signal S(t) can again be written in the form given by (2.11) with:
$$s_e(t) = A\sum_i c_i h(t - iT), \qquad c_i = a_i + j b_i \qquad (2.21)$$
Taking into account the fact that the data di provided by the source of infor-
mation are iid, the modulation symbols ci are independent, with zero mean and
unit variance.
The psd of the signal S(t) is again equal to:
$$\gamma_S(f) = \frac{1}{4}\gamma_{s_e}(f - f_0) + \frac{1}{4}\gamma_{s_e}(f + f_0)$$
with this time:
$$\gamma_{s_e}(f) = A^2 T\left(\frac{\sin\pi f T}{\pi f T}\right)^2 \qquad (2.22)$$
the psd looking like that of Figure 2.3.
where f0 is the frequency of the carrier, ϕ0 its phase and h(t) a rectangular
pulse of unit amplitude and width T .
Two situations can arise depending on whether the length m of the groups of
data at the input of the modulator is even or not. If m is even, then M = 2m is
a perfect square (4, 16, 64, 256, . . .); in the opposite case, M is simply a power
of two (8, 32, 128, . . .).
When m is even, the group of data can be separated into two sub-groups of
length m/2, each being associated respectively with amplitudes Acj and Asj that
take their values in the set $\{(2j - 1 - \sqrt{M})A\}$, $j = 1, 2, \dots, \sqrt{M}$. In Figure 2.5
are represented the constellations of the 16-QAM and 64-QAM modulations.
These constellations are said to be square.
with:
$$V_j = \sqrt{(A_j^c)^2 + (A_j^s)^2}, \qquad \phi_j = \tan^{-1}\left(\frac{A_j^s}{A_j^c}\right)$$
In this form, the M-QAM modulation can be considered as a modulation
combining phase and amplitude. Assuming that the phase takes M1 = 2m1
states and the amplitude M2 = 2m2 states, the modulated signal transmits
log2 (M1 M2 ) = m1 +m2 data every T seconds. Figure 2.7 shows the constellation
Figure 2.7 – Constellation of a modulation combining phase and amplitude for M = 16.
The average energy Es to transmit the pair (Acj , Asj ), that is, a group of
log2 (M ) binary data, is equal to:
$$E_s = E\left[\int_0^T V_j^2\cos^2(2\pi f_0 t + \varphi_0 + \phi_j)\,dt\right] \qquad (2.25)$$
For a group of data of even length m, $E\left[V_j^2\right] = 2A^2(M-1)/3$ and thus, for
$f_0 \gg 1/T$, the average energy $E_s$ is equal to:
$$E_s = A^2 T\,\frac{M-1}{3} \qquad (2.26)$$
The average energy used to transmit a bit is Eb = Es / log2 (M ).
For a continuous data stream, the signal can be written in the form:
$$S(t) = A\sum_i a_i h(t - iT)\cos(2\pi f_0 t + \varphi_0) - A\sum_i b_i h(t - iT)\sin(2\pi f_0 t + \varphi_0) \qquad (2.27)$$
where the modulation symbols $a_i$ and $b_i$ take the values $(2j - 1 - \sqrt{M})$, for
$j = 1, 2, \dots, \sqrt{M}$ and for $M = 2^m$ with even m. The signal S(t) can be
expressed by the relations (2.11) and (2.21):
$$S(t) = \Re e\left\{s_e(t)\exp\left(j(2\pi f_0 t + \varphi_0)\right)\right\} \quad \text{with} \quad s_e(t) = A\sum_i c_i h(t - iT), \quad c_i = a_i + j b_i$$
The binary data di provided by the information source being iid, the modulation
symbols ci are independent, with zero mean and variance equal to 2(M − 1)/3.
The psd of the signal S(t) is again given by (2.13) with:
$$\gamma_{s_e}(f) = \frac{2(M-1)}{3}\,A^2 T\left(\frac{\sin\pi f T}{\pi f T}\right)^2 \qquad (2.28)$$
The spectral width of a modulated M-QAM signal is therefore, to within an
amplitude, the same as that of M-ASK and M-PSK signals.
where δ(f ) is the Dirac distribution. The psd of a 2-FSK signal has a continuous
part and a discrete part. Limiting ourselves to the two main lobes of this power
spectral density, the band of frequencies occupied by a 2-FSK signal is 3/T , that
is, three times the symbol rate. Let us recall that at a same symbol rate, an
M-PSK or an M-QAM signal occupies a bandwidth of only 2/T . The discrete
part corresponds to two spectral lines situated at $f_1$ and $f_2$.
where h is called the modulation index and the M-ary symbols $a_i$ take their
values in the alphabet $\{\pm 1, \pm 3, \dots, \pm(2p+1), \dots, \pm(M-1)\}$; $M = 2^n$.
The function g(t) is causal and has a finite width:
$$g(t) \neq 0 \ \ t \in [0, LT[,\ L\ \text{integer}; \qquad g(t) = 0 \ \text{elsewhere} \qquad (2.38)$$
Putting:
$$q(t) = \int_0^t g(\tau)\,d\tau$$
$$\phi(t) = 2\pi h\sum_{n=i-L+1}^{i} a_n\,q(t - nT) + \pi h\sum_{n=-\infty}^{i-L} a_n \qquad (2.39)$$
$$q(t) = 0 \ \ t \le 0; \qquad q(t) = \frac{t}{2T} \ \ 0 \le t \le T; \qquad q(t) = \frac{1}{2} \ \ t \ge T \qquad (2.40)$$
MSK modulation is full response continuous phase frequency shift keying mod-
ulation (L = 1).
On the interval [iT, (i + 1)T [, the phase φ(t) of the MSK signal has the
expression:
$$\phi(t) = \frac{\pi}{2}\,a_i\,\frac{t - iT}{T} + \frac{\pi}{2}\sum_{n=-\infty}^{i-1} a_n \qquad (2.41)$$
The evolution of the phase φ(t) as a function of time is shown in Figure 2.10.
We can note that the phase φ(t) varies linearly over a time interval T and that
there is no discontinuity at instants iT .
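The piecewise linear trajectory of Figure 2.10 follows directly from (2.41); a small Python sketch, with an arbitrary (assumed) symbol sequence, reproduces this behaviour:

```python
import numpy as np

def msk_phase(symbols, samples_per_T=50):
    """Phase phi(t) of an MSK signal, equation (2.41): linear ramps of +/- pi/2 over each interval T."""
    theta = 0.0                                   # accumulated phase (pi/2) * sum of past symbols
    t_frac = np.linspace(0, 1, samples_per_T, endpoint=False)
    phase = []
    for a_i in symbols:
        phase.append(theta + (np.pi / 2) * a_i * t_frac)
        theta += (np.pi / 2) * a_i                # no discontinuity at the instants iT
    return np.concatenate(phase)

phi = msk_phase([+1, +1, -1, +1, -1, -1, +1, +1])   # assumed data sequence; phase in radians
print(phi[::50] / np.pi)   # phase at the instants iT, in multiples of pi: 0, 0.5, 1.0, 0.5, ...
```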
Figure 2.10 – Evolution of the phase φ(t) (in radians) of an MSK signal as a function
of time, over the interval [0, 10T].
Using the expressions (2.36) and (2.41), the MSK signal can be written in the
form:
$$S(t) = A\cos\left(2\pi\left(f_0 + \frac{a_i}{4T}\right)t - i\,\frac{\pi}{2}\,a_i + \theta_i + \varphi_0\right) \quad iT \le t < (i+1)T \qquad (2.42)$$
with:
$$\theta_i = \frac{\pi}{2}\sum_{n=-\infty}^{i-1} a_n \qquad (2.43)$$
The MSK signal uses two frequencies to transmit the binary symbols ai = ±1.
$$f_1 = f_0 + \frac{1}{4T} \ \text{if } a_i = +1, \qquad f_2 = f_0 - \frac{1}{4T} \ \text{if } a_i = -1 \qquad (2.44)$$
We can verify that the two signals at frequencies $f_1$ and $f_2$ are orthogonal and
that they present a minimum frequency deviation $\Delta f = f_1 - f_2 = 1/2T$. This
minimum deviation is at the origin of the name of MSK (minimum shift keying) modulation.
The modulated MSK signal can also be written in the form:
$$S(t) = A\sum_i c_{2i-1}\,h(t - 2iT)\cos\left(\frac{\pi t}{2T}\right)\cos(2\pi f_0 t + \varphi_0) - A\sum_i c_{2i}\,h(t - (2i+1)T)\sin\left(\frac{\pi t}{2T}\right)\sin(2\pi f_0 t + \varphi_0) \qquad (2.45)$$
where the symbols ci are deduced from the symbols ai by transition coding.
$$h(t) = 1 \ \text{if } t \in [-T, T[; \qquad h(t) = 0 \ \text{elsewhere} \qquad (2.47)$$
MSK modulation can be seen as an amplitude modulation of the terms
$$\cos\left(\frac{\pi t}{2T}\right)\cos(2\pi f_0 t + \varphi_0) \quad \text{and} \quad -\sin\left(\frac{\pi t}{2T}\right)\sin(2\pi f_0 t + \varphi_0)$$
by two bit streams $u_c(t) = \sum_i c_{2i-1}\,h(t - 2iT)$ and $u_s(t) = \sum_i c_{2i}\,h(t - (2i+1)T)$
whose transitions are shifted by T. Each bit stream enables a bit to be trans-
mitted every 2T seconds and thus, the binary rate of an MSK modulation is
D = 1/Tb with T = Tb .
MSK modulation is a particular case of continuous phase frequency shift keying
(CPFSK) modulation. Since it is linear, the psd is given by:
$$\gamma_S(f) = \frac{1}{4}\gamma(f - f_0) + \frac{1}{4}\gamma(-f - f_0) \qquad (2.48)$$
with:
$$\gamma(f) = \frac{16 A^2 T}{\pi^2}\left(\frac{\cos 2\pi f T}{1 - 16 f^2 T^2}\right)^2 \qquad (2.49)$$
Figure 2.11 shows the psd of the complex envelope of an MSK signal expressed
in dB as a function of the normalized frequency f Tb . We have also plotted the
psd of the complex envelope of a 4-PSK signal. In order for the comparison of
these two power spectral densities to make sense, we have assumed that the rate
transmitted was identical for these two modulations (that is, T = 2Tb for the
4-PSK modulation).
The width of the main lobe of the power spectral density of an MSK mod-
ulation is 3/2Tb whereas it is only 1/Tb for 4-PSK modulation. Thus, for a
same rate transmitted, the main lobe of MSK modulation occupies 50% more
bandwidth than that of 4-PSK modulation. However, the envelope of the psd
of an MSK signal decreases as $f^{-4}$ whereas it only decreases as $f^{-2}$ for 4-PSK
modulation. One of the consequences of this is that the bandwidth B that con-
tains 99% of the power of the modulated signal for MSK is 1.2/Tb whereas it is
around 8/Tb for 4-PSK.
Figure 2.11 – Power spectral density of the complex envelope of MSK and 4-PSK
signals.
The term BN allows the time spreading of function g(t) to be fixed. Thus for
BN = 0.2, this function is approximately of width 4T whereas its width is only
3T for BN = 0.3. When BN tends towards infinity, it becomes a rectangular
pulse with width T (the case of MSK modulation). GMSK modulation is
therefore a partial response continuous phase modulation (L > 1).
On the interval [iT, (i + 1)T [, the phase φ(t) of the GMSK signal is equal to:
$$\phi(t) = \pi\sum_{n=i-L+1}^{i} a_n\,q(t - nT) + \frac{\pi}{2}\sum_{n=-\infty}^{i-L} a_n \qquad (2.55)$$
chosen for the GSM (Groupe Spécial Mobile and later Global System for Mobile
communications) system. We note that there is no simple expression of the
power spectral density of a GMSK signal. For values of the normalized passband
of 0.3 or of 0.2, the power spectral density of the GMSK signal does not show
sidelobes and its decrease as a function of frequency is very rapid. Thus at -10
dB the band occupied by the GMSK signal is approximately 200 kHz, and at
-40 dB 400 kHz for a rate D = 271 kbit/s.
where sjp is a scalar equal to the projection of the signal sj (t) on the function
νp (t).
$$s_{jp} = \int_0^T s_j(t)\,\nu_p(t)\,dt$$
The noise can also be represented in the form of a series of normed and orthog-
onal functions but of infinite length (Karhunen Loeve expansion). When the
noise is white, we show that the normed and orthogonal functions can be chosen
arbitrarily. We are therefore going to take the same orthonormed functions as
those used to represent the sj (t) signals, but after extension to infinity of this
base of functions:
$$b(t) = \sum_{p=1}^{\infty} b_p\nu_p(t) = \sum_{p=1}^{N} b_p\nu_p(t) + b'(t), \qquad b_p = \int_0^T b(t)\,\nu_p(t)\,dt$$
The quantities bp are random non-correlated Gaussian variables, with zero mean
and variance σ2 = N0 /2.
$$E\{b_p b_n\} = \int_0^T\!\!\int_0^T E\{b(t)b(t')\}\,\nu_p(t)\,\nu_n(t')\,dt\,dt'$$
$$E\{b_p b_n\} = \frac{N_0}{2}\int_0^T \nu_p(t)\,\nu_n(t)\,dt = \frac{N_0}{2}\,\delta_{n,p} \qquad (2.56)$$
Using the representations of the sj (t) signals and of the b(t) noise by their
respective series, we can write:
$$r(t) = \sum_{p=1}^{N}(s_{jp} + b_p)\,\nu_p(t) + \sum_{p=N+1}^{\infty} b_p\nu_p(t) = \sum_{p=1}^{N} r_p\nu_p(t) + b'(t)$$
Conditionally to the emission of the signal $s_j(t)$, the quantities $r_p$ are random
Gaussian variables, with mean $s_{jp}$ and variance $N_0/2$. They are non-correlated with
the noise $b'(t)$. Indeed, we have:
$$E\{r_p b'(t)\} = E\left\{(s_{jp} + b_p)\sum_{n=N+1}^{\infty} b_n\nu_n(t)\right\} \quad \forall p = 1, 2, \dots, N$$
Taking into account the fact that the variables bn , whatever n is, are zero mean
and non-correlated, we obtain:
$$E\{r_p b'(t)\} = \sum_{n=N+1}^{\infty} E\{b_p b_n\}\,\nu_n(t) = 0 \quad \forall p = 1, 2, \dots, N \qquad (2.57)$$
The quantities $r_p$ and the noise $b'(t)$ are therefore independent since Gaussian.
In conclusion, the optimal receiver can base its decision only on the quantities
$r_p$, $p = 1, 2, \dots, N$, with:
$$r_p = \int_0^T r(t)\,\nu_p(t)\,dt \qquad (2.58)$$
Passing from the signal r(t) provided by the transmission channel to the N
quantities $r_p$ is called demodulation.
Example
Let us consider an M-PSK modulation for which the sj (t) signals are of the
form:
sj (t) = Ah(t) cos(2πf0 t + ϕ0 + φj )
The signals sj (t) define a space with N = 2 dimensions if M > 2. The normed
and orthogonal functions νp (t), p = 1, 2 can be expressed respectively as:
$$\nu_1(t) = \sqrt{\frac{2}{T}}\cos(2\pi f_0 t + \varphi_0), \qquad \nu_2(t) = -\sqrt{\frac{2}{T}}\sin(2\pi f_0 t + \varphi_0)$$
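In discrete time, the projection (2.58) amounts to correlating the received waveform with sampled versions of ν1(t) and ν2(t). The Python sketch below illustrates this for a noiseless 4-PSK symbol, under simplifying assumptions (rectangular pulse, f0 chosen as a whole number of cycles per symbol so that f0 >> 1/T holds and the cross terms vanish); all the numerical values are illustrative:

```python
import numpy as np

T, f0, A, phi0 = 1.0, 8.0, 1.0, 0.0          # assumed values, f0 = 8 cycles per symbol
n = 1000                                     # samples per symbol period
t = np.arange(n) * T / n

nu1 = np.sqrt(2 / T) * np.cos(2 * np.pi * f0 * t + phi0)
nu2 = -np.sqrt(2 / T) * np.sin(2 * np.pi * f0 * t + phi0)

def demodulate(r):
    """Projection (2.58) approximated by a discrete sum: r_p = integral of r(t) nu_p(t) dt."""
    dt = T / n
    return np.sum(r * nu1) * dt, np.sum(r * nu2) * dt

# Transmit a 4-PSK symbol with phase phi_j = 3*pi/4 and demodulate it (no noise)
phi_j = 3 * np.pi / 4
r1, r2 = demodulate(A * np.cos(2 * np.pi * f0 * t + phi0 + phi_j))
print(r1, r2)     # close to A*sqrt(T/2)*cos(phi_j) and A*sqrt(T/2)*sin(phi_j), as in (2.69)
```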
where $\hat{s}_j(t)$ is the estimate of the transmitted signal and $R = (r_1 \cdots r_p \cdots r_N)$ the
output of the demodulator. To simplify the notations, the time reference has
been omitted for the components of observation R. Pr {sj (t)/R} denotes the
probability of sj (t) conditionally to the knowledge of observation R.
Using Bayes’ rule, the MAP criterion can again be written:
where πj = Pr {sj (t)} represents the a priori probability of transmitting the sig-
nal sj (t) and p(R|sj (t)) is the probability density of observation R conditionally
to the emission of the signal sj (t) by the modulator.
Taking into account the fact that the components rp = sjp +bp of observation
R conditionally to the emission of the signal sj (t) are non-correlated Gaussian,
with mean sjp and variance N0 /2, we can write:
$$\hat{s}_j(t) \ \text{if} \ \pi_j\prod_{p=1}^{N} p(r_p|s_j(t)) > \pi_n\prod_{p=1}^{N} p(r_p|s_n(t)) \quad \forall n \neq j, \ n = 1, 2, \dots, M$$
After simplification:
$$\hat{s}_j(t) \ \text{if} \ \sum_{p=1}^{N} r_p s_{jp} + C_j > \sum_{p=1}^{N} r_p s_{np} + C_n \quad \forall n \neq j, \ n = 1, 2, \dots, M \qquad (2.59)$$
where $C_j = \frac{N_0}{2}\ln(\pi_j) - \frac{E_j}{2}$ with $E_j = \sum_{p=1}^{N}(s_{jp})^2$.
Noting that:
$$\int_0^T r(t)\,s_j(t)\,dt = \int_0^T \left(\sum_{p=1}^{N} r_p\nu_p(t)\right)\left(\sum_{m=1}^{N} s_{jm}\nu_m(t)\right)dt$$
and recalling that the functions $\nu_p(t)$ are normed and orthogonal, we obtain:
$$\int_0^T r(t)\,s_j(t)\,dt = \sum_{p=1}^{N} r_p s_{jp}$$
Similarly:
$$\int_0^T s_j^2(t)\,dt = \int_0^T \left(\sum_{p=1}^{N} s_{jp}\nu_p(t)\right)\left(\sum_{m=1}^{N} s_{jm}\nu_m(t)\right)dt$$
and finally:
$$\int_0^T s_j^2(t)\,dt = \sum_{p=1}^{N} s_{jp}^2$$
Taking into account the above, the MAP criterion can again be written in the
form:
$$\hat{s}_j(t) \ \text{if} \ \int_0^T r(t)\,s_j(t)\,dt + C_j > \int_0^T r(t)\,s_n(t)\,dt + C_n \quad \forall n \neq j, \ n = 1, 2, \dots, M \qquad (2.60)$$
where $C_j = \frac{N_0}{2}\ln(\pi_j) - \frac{E_j}{2}$ with $E_j = \int_0^T s_j^2(t)\,dt$.
If all the $s_j(t)$ signals are transmitted with the same probability ($\pi_j = 1/M$),
the term $C_j$ reduces to $-E_j/2$. If, in addition, all the $s_j(t)$ signals have
the same energy $E_j = E$ (the case of phase or frequency shift keying
modulation), then the MAP criterion is simplified and becomes:
$$\hat{s}_j(t) \ \text{if} \ \int_0^T r(t)\,s_j(t)\,dt > \int_0^T r(t)\,s_n(t)\,dt \quad \forall n \neq j, \ n = 1, 2, \dots, M \qquad (2.61)$$
$$r = \int_0^T r(t)\,\nu(t)\,dt$$
with $\nu(t) = \sqrt{\frac{2}{T}}\cos(2\pi f_0 t + \varphi_0)$.
For a continuous data stream, the estimation of the symbols ai is done by
integrating the product r(t)ν(t) on each time interval [iT, (i + 1)T [. If a matched
filter is used rather than an integrator, the sampling at the output of the filter
is realized at time (i + 1)T .
On the interval [0, T [, assuming iid information data di , all the amplitude
states have the same probability and decision rule (2.59) leads to:
$$\hat{A}_j \ \text{if} \ r s_j - \frac{1}{2}s_j^2 > r s_n - \frac{1}{2}s_n^2 \quad \forall n \neq j \qquad (2.62)$$
with:
$$s_j = \int_0^T A_j\cos(2\pi f_0 t + \varphi_0)\,\nu(t)\,dt = A_j\sqrt{\frac{T}{2}} \quad \text{if } f_0 \gg \frac{1}{T} \qquad (2.63)$$
The coherent receiver, shown in Figure 2.13, takes its decision by comparing
observation r to a set of (M − 1) thresholds of the form:
$$\left\{-(M-2)A\sqrt{\tfrac{T}{2}}, \dots, -2pA\sqrt{\tfrac{T}{2}}, \dots, -2A\sqrt{\tfrac{T}{2}},\ 0,\ 2A\sqrt{\tfrac{T}{2}}, \dots, 2pA\sqrt{\tfrac{T}{2}}, \dots, (M-2)A\sqrt{\tfrac{T}{2}}\right\} \qquad (2.64)$$
Example
Consider a 4-ASK modulation, the three thresholds being $-2A\sqrt{T/2}$, 0, $2A\sqrt{T/2}$.
The decisions are the following:
$$\hat{A}_j = -3A \ \text{if } r < -2A\sqrt{T/2}; \quad \hat{A}_j = -A \ \text{if } -2A\sqrt{T/2} < r < 0; \quad \hat{A}_j = A \ \text{if } 0 < r < 2A\sqrt{T/2}; \quad \hat{A}_j = 3A \ \text{if } r > 2A\sqrt{T/2}$$
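These comparisons translate directly into a few lines of code; a minimal Python sketch, with A and T normalized to 1 (illustrative values only):

```python
import math

A, T = 1.0, 1.0
s = A * math.sqrt(T / 2)            # elementary spacing A*sqrt(T/2) of the decision thresholds

def decide_4ask(r):
    """Decision of the coherent 4-ASK receiver with the three thresholds -2s, 0, +2s."""
    if r < -2 * s:
        return -3 * A
    if r < 0:
        return -A
    if r < 2 * s:
        return A
    return 3 * A

print([decide_4ask(r) for r in (-2.1 * s, -0.3 * s, 1.5 * s, 4.0 * s)])   # [-3.0, -1.0, 1.0, 3.0]
```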
The mean error probability on the symbols, denoted P es, is equal to:
$$Pes = \frac{1}{M}\sum_{j=1}^{M} Pe_{2j-1-M}$$
The conditional error probabilities can be classified into two types. The first
type corresponds to the probabilities that the observation is higher or is lower
than a certain threshold and the second type, to the probabilities that the
TYPE 2: Probabilities that the observation does not fall between two thresholds
$$Pe_{2j-1-M} = \Pr\left\{\hat{A}_j \neq (2j-1-M)A \mid A_j = (2j-1-M)A\right\}$$
$$Pe_{2j-1-M} = 1 - \Pr\left\{(2j-2-M)A\sqrt{T/2} < r < (2j-M)A\sqrt{T/2} \mid A_j = (2j-1-M)A\right\}$$
Observation r is Gaussian conditionally to a realization of the amplitude $A_j$,
with mean $A_j\sqrt{T/2}$ and variance $N_0/2$. The conditional probabilities have
the expressions:
$$Pe_{(M-1)} = Pe_{-(M-1)} = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{A^2 T}{2 N_0}}\right)$$
$$Pe_{(2j-1-M)} = \mathrm{erfc}\left(\sqrt{\frac{A^2 T}{2 N_0}}\right)$$
where the complementary error function is always defined by:
$$\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_x^{+\infty}\exp(-u^2)\,du$$
To calculate the mean error probability on the groups of data, we have two
conditional probabilities of type 1, and (M − 2) conditional probabilities of type 2.
$$Pes = \frac{M-1}{M}\,\mathrm{erfc}\left(\sqrt{\frac{A^2 T}{2 N_0}}\right)$$
Introducing the average energy $E_s = \frac{A^2 T}{2}\frac{(M^2-1)}{3}$ received per group of data, the
mean error probability is again equal to:
$$Pes = \frac{M-1}{M}\,\mathrm{erfc}\left(\sqrt{\frac{3}{M^2-1}\,\frac{E_s}{N_0}}\right)$$
or again as a function of the received mean power P and of the transmitted bit
rate $D = 1/T_b$:
$$Pes = \frac{M-1}{M}\,\mathrm{erfc}\left(\sqrt{\frac{3\log_2(M)}{M^2-1}\,\frac{P}{N_0 D}}\right) \qquad (2.65)$$
Figure 2.14 provides the mean error probability P es as a function of the signal
to noise ratio Eb /N0 for different values of the parameter M .
Figure 2.14 – Mean error probability P es as a function of the signal to noise ratio
Eb /N0 for different values of parameter M of an M-ASK modulation.
The bit error probability P eb can be deduced from the mean error proba-
bility P es in the case where Gray coding is used and under the hypothesis of
a sufficiently high signal to noise ratio. Indeed, in this case we generally have
an erroneous bit among the log2 (M ) data transmitted. (We assume that the
amplitude of the received symbol has a value immediately lower or higher than
the value of the transmitted amplitude).
$$Peb \cong \frac{Pes}{\log_2(M)} \quad \text{if } \frac{E_b}{N_0} \gg 1 \qquad (2.66)$$
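Expressed as a function of Eb/N0 (using Es = Eb log2(M)), relations (2.65) and (2.66) are easy to evaluate numerically; a short Python sketch, where the function names are illustrative:

```python
import math

def pes_ask(M, ebn0_db):
    """Symbol error probability of M-ASK on a Gaussian channel, from (2.65) with Es = Eb*log2(M)."""
    ebn0 = 10 ** (ebn0_db / 10)
    arg = 3 * math.log2(M) / (M ** 2 - 1) * ebn0
    return (M - 1) / M * math.erfc(math.sqrt(arg))

def peb_ask(M, ebn0_db):
    """Bit error probability with Gray coding at high SNR, approximation (2.66)."""
    return pes_ask(M, ebn0_db) / math.log2(M)

for M in (2, 4, 8):
    print(M, peb_ask(M, 10.0))    # the required Eb/N0 grows quickly with M
```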
with:
$$\phi_j = (2j+1)\frac{\pi}{M} + \theta_0 \qquad j = 0, 1, \dots, M-1$$
The sj (t) signals, for M > 2, define a two-dimensional space. Observation R at
the output of the demodulator is therefore made up of two components (r1 , r2 )
with:
$$r_1 = \int_0^T r(t)\,\nu_1(t)\,dt, \qquad r_2 = \int_0^T r(t)\,\nu_2(t)\,dt$$
where $\nu_1(t) = \sqrt{\frac{2}{T}}\cos(2\pi f_0 t + \varphi_0)$ and $\nu_2(t) = -\sqrt{\frac{2}{T}}\sin(2\pi f_0 t + \varphi_0)$.
Using decision rule (2.50) and assuming the information data iid, all the states
of phase have the same probability and the decision is the following:
$$\hat{\phi}_j \ \text{if} \ \sum_{p=1}^{2} r_p s_{jp} > \sum_{p=1}^{2} r_p s_{np} \quad \forall n \neq j \qquad (2.68)$$
with:
$$s_{j1} = A\sqrt{\frac{T}{2}}\cos\phi_j \quad \text{and} \quad s_{j2} = A\sqrt{\frac{T}{2}}\sin\phi_j \quad \text{if } f_0 \gg \frac{1}{T} \qquad (2.69)$$
Taking into account the expressions of sj1 and of sj2 , the decision rule can again
be written:
binary data, whatever the value of M , does not have an analytical expression.
However, at high signal to noise ratios, this probability is well approximated by
the following expression:
$$Pes \cong \mathrm{erfc}\left(\sqrt{\log_2(M)\,\frac{E_b}{N_0}}\,\sin\frac{\pi}{M}\right) \quad \text{if } \frac{E_b}{N_0} \gg 1 \qquad (2.71)$$
For this modulation, phase φj takes the values 0 or π. Each phase state is
therefore associated with a bit. Adopting the following coding:
φj = 0 → di = 1 φj = π → di = 0
Observation r2 is not used for decoding the data di since the space defined by
the signals modulated with two phase states has dimension N = 1.
For 2-PSK modulation, there is an exact expression of the bit error proba-
bility P eb. Assuming the binary data iid, this error probability is equal to:
$$Peb = \frac{1}{2}\Pr\{r_1 > 0 \mid \phi_j = \pi\} + \frac{1}{2}\Pr\{r_1 < 0 \mid \phi_j = 0\}$$
Output $r_1$ of the demodulator is:
$$r_1 = \pm\sqrt{E_b} + b_1$$
$$Peb = \frac{1}{2}\frac{1}{\sqrt{\pi N_0}}\int_0^{\infty}\exp\left(-\frac{1}{N_0}\left(r_1 + \sqrt{E_b}\right)^2\right)dr_1 + \frac{1}{2}\frac{1}{\sqrt{\pi N_0}}\int_{-\infty}^{0}\exp\left(-\frac{1}{N_0}\left(r_1 - \sqrt{E_b}\right)^2\right)dr_1$$
For this modulation, phase φj takes four values π/4, 3π/4, 5π/4, 7π/4.
With each state of the phase are associated two binary data. For equiprob-
able phase states, the MAP criterion leads to the following decision rules:
$$\hat{\phi}_j = \frac{\pi}{4} \ \text{if } r_1 > 0, r_2 > 0; \quad \hat{\phi}_j = \frac{3\pi}{4} \ \text{if } r_1 < 0, r_2 > 0; \quad \hat{\phi}_j = \frac{5\pi}{4} \ \text{if } r_1 < 0, r_2 < 0; \quad \hat{\phi}_j = \frac{7\pi}{4} \ \text{if } r_1 > 0, r_2 < 0$$
$$Pes = 1 - (1 - Peb)^2$$
For high signal to noise ratios, the error probability P eb is much lower than
unity and thus, for 4-PSK modulation:
$$Pes \cong 2\,Peb \quad \text{if } \frac{E_b}{N_0} \gg 1 \qquad (2.76)$$
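A quick Monte Carlo experiment confirms this factor of 2 for Gray-coded 4-PSK; the Python sketch below uses the usual equivalent baseband model (assumed here: unit energy per bit, independent Gaussian noise of variance N0/2 on each quadrature), which is not the notation of this chapter but gives the same error rates:

```python
import numpy as np

rng = np.random.default_rng(0)

def qpsk_monte_carlo(ebn0_db, n_sym=200_000):
    """Gray-coded 4-PSK over an AWGN channel: estimate bit and symbol error rates."""
    eb = 1.0                                       # energy per bit (assumed)
    n0 = eb / 10 ** (ebn0_db / 10)
    bits = rng.integers(0, 2, size=(n_sym, 2))
    # Gray mapping: each bit drives the sign of one quadrature (r1 or r2)
    x = (2 * bits - 1) * np.sqrt(eb)               # two antipodal streams, one per axis
    r = x + rng.normal(0.0, np.sqrt(n0 / 2), size=x.shape)
    bit_err = np.sign(r) != np.sign(x)
    return bit_err.mean(), bit_err.any(axis=1).mean()

ber, ser = qpsk_monte_carlo(6.0)
print(ber, ser, ser / ber)     # the ratio approaches 2, as stated in (2.76)
```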
Figure 2.17 shows the error probability Pes as a function of the ratio $E_b/N_0$
for different values of the parameter M.¹
¹ For M = 2, it is the exact relation (2.74) that is used since Pes = Peb; for M > 2, Pes is
Figure 2.17 – Error probability Pes as a function of the ratio $E_b/N_0$ for different
values of M of an M-PSK modulation.
Two situations can arise depending on whether the length m of the groups of data
at the input of the modulator is even or not. When $M = 2^m$ with even m,
the group of data can be separated into two sub-groups of length m/2, each
sub-group being associated respectively with amplitudes Ajc and Ajs , with:
$$A_j^c = (2j - 1 - \sqrt{M})A \qquad j = 1, 2, \dots, \sqrt{M}$$
$$A_j^s = (2j - 1 - \sqrt{M})A \qquad j = 1, 2, \dots, \sqrt{M} \qquad (2.78)$$
In this case, the M-QAM modulation is equivalent to two $\sqrt{M}$-ASK modulations
having quadrature carriers. The coherent receiver for an M-QAM modulation is
made up of two components called phase and quadrature, and each component,
similar to a receiver for a $\sqrt{M}$-ASK modulation, performs the estimation of a
group of m/2 binary data. The receiver is shown in Figure 2.18.
The error probability on a group of m/2 binary data is equal to the error
probability of a $\sqrt{M}$-ASK modulation, that is:
$$Pe_{m/2} = \frac{\sqrt{M}-1}{\sqrt{M}}\,\mathrm{erfc}\left(\sqrt{\frac{3\log_2(\sqrt{M})}{M-1}\,\frac{E_b}{N_0}}\right) \qquad (2.79)$$
Figure 2.18 – Coherent receiver for M-QAM modulation with M = 2m and even m.
The bit error probability P eb can be deduced from P es if Gray coding is used
and for a sufficiently high signal to noise ratio:
$$Peb = \frac{Pes}{\log_2(M)} \qquad (2.81)$$
where the frequencies fj are chosen in such a way that the M sj (t) signals are
orthogonal. The space defined by these signals therefore has dimension N = M
and the vectors νj (t) are of the form:
$$\nu_j(t) = \sqrt{\frac{2}{T}}\cos(2\pi f_j t + \varphi_j) \qquad j = 1, 2, \dots, M \qquad (2.84)$$
Assuming the information data di iid, the sj (t) signals have the same probability.
In addition, they have the same energy E and thus, the MAP criterion leads to
the following decision rule:
$$\hat{s}_j(t) \ \text{if} \ \sum_{p=1}^{M} r_p s_{jp} > \sum_{p=1}^{M} r_p s_{np} \quad \forall n \neq j \qquad (2.85)$$
$$s_{jp} = \int_0^T s_j(t)\,\nu_p(t)\,dt = A\sqrt{\frac{T}{2}}\,\delta_{jp} \qquad (2.86)$$
Taking into account the expression of $s_{jp}$, decision rule (2.85) is simplified and
becomes:
$$\hat{s}_j(t) \ \text{if} \ r_j > r_n \quad \forall n \neq j \qquad (2.87)$$
The optimal coherent receiver for an M-FSK modulation is shown in Figure 2.19.
Conditionally to the emission of the signal sj (t), the M outputs of the de-
modulator are of the form:
$$r_j = \sqrt{E_s} + b_j, \qquad r_p = b_p \quad \forall p \neq j$$
where bj and bp are AWGN, with zero mean and variance equal to N0 /2.
The probability of a correct decision on a group of binary data, conditionally on the emission of the signal sj(t), is equal to:
P_{cj} = \int_{-\infty}^{+\infty} \Pr\{b_1 < r_j, \cdots, b_p < r_j, \cdots, b_M < r_j\}\; p(r_j)\,dr_j
The noises being non-correlated and therefore independent, since they are Gaus-
sian, we have:
\Pr\{b_1 < r_j, \cdots, b_p < r_j, \cdots, b_M < r_j\} = \left(\int_{-\infty}^{r_j} \frac{1}{\sqrt{\pi N_0}}\exp\!\left(-\frac{b^2}{N_0}\right) db\right)^{M-1}
The probability of a correct decision is the same whatever the transmitted signal.
The signals sj (t) being equiprobable, the mean probability of a correct decision
P c is therefore equal to the conditional probability P cj . The symbol error
probability is then equal to:
P es = 1 − P c
The error probability can also be expressed as a function of the ratio Eb/N0, where Eb is the energy used to transmit a bit, with Eb = Es/log2(M).
We can also try to determine the bit error probability Peb. All the M − 1 groups of erroneous data appear with the same probability:
\frac{P_{es}}{M-1} \qquad (2.89)
In a group of erroneous data, we can have k erroneous data among m, and this can occur in \binom{m}{k} possible ways. Thus, the average number of erroneous data in a group is:
\sum_{k=1}^{m} k \binom{m}{k} \frac{P_{es}}{M-1} = m\,\frac{2^{m-1}}{2^m - 1}\,P_{es}
Dividing by m, the bit error probability is:
P_{eb} = \frac{2^{m-1}}{2^m - 1}\,P_{es} \qquad (2.90)
where m = log2 (M ).
The error probability for an M-FSK modulation does not have a simple expression and we have to resort to numerical computation to determine this probability as a function of the ratio Eb/N0. We can show that, for a given error probability Peb, the ratio Eb/N0 required decreases when M increases. We can also show that probability Pes tends towards an arbitrarily small value when M tends towards infinity, provided that Eb/N0 > ln 2, that is, −1.6 dB.
For a binary transmission (M = 2), there is an expression of the error probability
P eb.
Let us assume that the signal transmitted is s1(t); we then have:
r_1 = \sqrt{E_b} + b_1 \qquad r_2 = b_2
Assuming the two signals equiprobable, the error probability Peb has the expression:
P_{eb} = \frac{1}{2}\left(P_{eb_1} + P_{eb_2}\right)
The noises b1 and b2 are non-correlated Gaussian, with zero mean and same
variance equal to N0 /2. The variable z, conditionally to the emission of the
signal s1(t), is therefore Gaussian, with mean √Eb and variance N0. Thus probability Peb1 is equal to:
P_{eb_1} = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi N_0}}\exp\!\left(-\frac{(z - \sqrt{E_b})^2}{2N_0}\right) dz
with:
\theta_{i-L} = \pi h \sum_{n=-\infty}^{i-L} a_n
h is the modulation index and the symbols ai are M-ary in the general case. They take their values in the alphabet {±1, ±3, \cdots, ±(2p+1), \cdots, ±(M−1)}; M = 2^m.
If the modulation index h = m/p where m and p are relatively prime integers,
phase θi−L takes its values in the following sets:
\theta_{i-L} \in \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \cdots, \frac{(p-1)\pi m}{p}\right\} \quad \text{if } m \text{ even}
\theta_{i-L} \in \left\{0, \frac{\pi m}{p}, \frac{2\pi m}{p}, \cdots, \frac{(2p-1)\pi m}{p}\right\} \quad \text{if } m \text{ odd} \qquad (2.94)
The evolution of phase φ(t) can be represented by a trellis whose states are defined by (a_{i-L+1}, a_{i-L+2}, \cdots, a_{i-1}; \theta_{i-L}), as illustrated in Figure 2.20 for MSK modulation.
Figure 2.20 – Trellis associated with phase φ(t) for MSK modulation.
• for each branch l leaving a state of the trellis at instant iT calculate metric
zil as defined later, that is, for MSK and GMSK, 2L × 4 metrics have to
be calculated;
• for each path converging to instant (i + 1)T towards a state of the trel-
lis, calculate the cumulated metric, then select the path with the largest
cumulated metric, called the survivor path;
• among the survivor paths, trace back along s branches of the path having
the largest cumulated metric and decode symbol ai−s ;
• continue the algorithm on the following time interval (a schematic sketch of these steps is given below).
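The Python sketch below, written only to illustrate the steps above, runs the survivor selection and traceback on a generic trellis; the data structures (dictionaries of states, lists of branches) and the function names are illustrative assumptions, and the branch metrics z_i^l would in practice be computed from relation (2.96).

```python
import numpy as np

def viterbi_step(cum, branch_list):
    """One trellis transition: branch_list holds (state_from, symbol, state_to, metric z);
    for each arrival state keep the path with the largest cumulated metric (the survivor)."""
    new_cum = {s: -np.inf for s in cum}
    surv = {}
    for s_from, sym, s_to, z in branch_list:
        cand = cum[s_from] + z
        if cand > new_cum[s_to]:
            new_cum[s_to] = cand
            surv[s_to] = (s_from, sym)
    return new_cum, surv

def cpm_viterbi(metric_sequence, states, s_depth=20):
    """Schematic decoder: accumulate metrics, keep survivors, then trace back s_depth
    branches along the best path to release one decided symbol a_{i-s} per interval
    (a fully connected trellis is assumed so every state has a survivor)."""
    cum = {s: 0.0 for s in states}
    survivors, decided = [], []
    for branch_list in metric_sequence:       # one list per interval [iT, (i+1)T[
        cum, surv = viterbi_step(cum, branch_list)
        survivors.append(surv)
        if len(survivors) >= s_depth:
            state = max(cum, key=cum.get)     # path with the largest cumulated metric
            for surv_i in reversed(survivors[-s_depth:]):
                state, sym = surv_i[state]
            decided.append(sym)               # decoded symbol a_{i-s}
    return decided
```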
The branch metric is defined as:
z_i^l = \int_{iT}^{(i+1)T} r(t)\cos\!\left(2\pi f_0 t + \varphi_0 + \phi_i^l(t)\right) dt
where r(t) = s(t) + b(t) is the signal received by the receiver and b(t) is an AWGN, with zero mean and power spectral density equal to N0/2. Quantity
φli (t) represents a realization of phase φ(t) associated with branch l of the trellis
on time interval [iT, (i + 1)T [.
Taking into account the fact that the noise can be put in the form b(t) =
bc (t) cos(2πf0 t + ϕ0 ) − bs (t) sin(2πf0 t + ϕ0 ) and that f0 >> 1/T , the branch
metric can again be written:
z_i^l = \int_{iT}^{(i+1)T} r_c(t)\cos\phi_i^l(t)\,dt + \int_{iT}^{(i+1)T} r_s(t)\sin\phi_i^l(t)\,dt \qquad (2.96)
where the signals rc(t) and rs(t) are obtained after transposition into baseband of r(t) (multiplying r(t) by cos(2πf0t + ϕ0) and −sin(2πf0t + ϕ0) respectively, then lowpass filtering).
\cos\phi_i^l(t) = \cos\!\left(2\pi h \sum_{n=i-L+1}^{i} a_n^l\, q(t-nT) + \theta_{i-L}^l\right)
\sin\phi_i^l(t) = \sin\!\left(2\pi h \sum_{n=i-L+1}^{i} a_n^l\, q(t-nT) + \theta_{i-L}^l\right) \qquad (2.97)
Putting:
\psi_i^l(t) = 2\pi h \sum_{n=i-L+1}^{i} a_n^l\, q(t-nT)
A_i^l = \int_{iT}^{(i+1)T} \left(r_c(t)\cos\psi_i^l(t) + r_s(t)\sin\psi_i^l(t)\right) dt
B_i^l = \int_{iT}^{(i+1)T} \left(r_s(t)\cos\psi_i^l(t) - r_c(t)\sin\psi_i^l(t)\right) dt
For MSK modulation, it is possible to decode symbols ai by using a receiver
similar to that of 4-PSK modulation. Indeed, the MSK signal can be written in
the following form:
S(t) = A\left[\sum_i c_{2i-1}\, h(t-2iT)\cos\!\left(\frac{\pi t}{2T}\right)\cos(2\pi f_0 t + \varphi_0) - \sum_i c_{2i}\, h(t-(2i+1)T)\sin\!\left(\frac{\pi t}{2T}\right)\sin(2\pi f_0 t + \varphi_0)\right] \qquad (2.99)
It is easy to show that the error probabilities on binary symbols c2i−1 and
c2i are identical and equal to:
P_{ec_i} = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{\frac{E_b}{N_0}}\right) \qquad (2.101)
where Eb is the energy used to transmit a binary symbol ci .
To obtain binary data ai from the symbols ci , at the output of the coherent
receiver we have to use a differential decoder given by the following equations:
a_{2i} = c_{2i}\, c_{2i-1} \quad \text{and} \quad a_{2i-1} = c_{2i-1}\, c_{2i-2}
The bit error probability P eb on ai is:
P eb = 1 − (1 − P eci )2
thus for P eci << 1, a good approximation of the bit error probability P eb is:
P eb ≈ 2P eci (2.102)
As a first approximation, the performance of the MSK modulation is identical
to that of the 4-PSK modulation.
M-ASK: c_i = a_i, b_i = 0
Let g(t) be the impulse response of the emission passband filter centred on the carrier frequency. This waveform can be written:
g(t) = g_c(t)\cos(2\pi f_0 t + \theta_0) - g_s(t)\sin(2\pi f_0 t + \theta_0)
or equivalently:
g(t) = \Re e\left\{g_e(t)\exp\left[j(2\pi f_0 t + \theta_0)\right]\right\} \qquad (2.105)
where ge (t) = gc (t) + jgs (t) is the baseband-equivalent waveform of the emission
filter. The output e(t) of the emission filter is equal to:
e(t) = A \sum_i c_i\, z(t - iT) \qquad (2.106)
where z(t) = h(t) ⊗ ge (t) is, in the general case, a complex waveform while h(t)
is real.
where:
x(t) = z(t) \otimes g_r(t)
b'(t) = b(t) \otimes g_r(t)
Sampling signal y(t) at time t0 + nT, we obtain:
y(t_0 + nT) = A \sum_i c_i\, x(t_0 + (n-i)T) + b'(t_0 + nT) \qquad (2.109)
Considering that in the general case x(t) = p(t) + jq(t) is a complex waveform,
the sample y(t0 + nT ) can again be written in the form:
y(t_0 + nT) = A c_n\, p(t_0) + A \sum_{i \neq 0} c_{n-i}\, p(t_0 + iT) + jA \sum_i c_{n-i}\, q(t_0 + iT) + b'(t_0 + nT) \qquad (2.110)
The first term Acn p(t0 ) represents the desired information for the decoding of
the symbol cn , the following two terms being Intersymbol Interference (ISI)
terms. Let us examine the outputs of the two components of the receiver, called
the in-phase and quadrature components, corresponding to the real part and
to the imaginary part of y(t0 + nT ) respectively. We can notice that the in-
phase component (respectively the quadrature component) depends on symbols
Let us analyse, for example, output yc (t) of the reception filter of the in-phase
component on time interval [t1 + nT, t1 + (n + 1)T [ where t1 represents an ar-
bitrary time. Replacing t by t + t1 + nT , signal yc (t) can be written:
y_c(t + t_1 + nT) = A \sum_i a_{n-i}\, p(t + t_1 + iT) - A \sum_i b_{n-i}\, q(t + t_1 + iT) \quad \text{for } 0 \leq t \leq T \qquad (2.112)
In the absence of ISI, at the sampling time, all the plots of p(t) pass through a
single point. The more open the eye diagram at the sampling time, the greater
the immunity of the transmission to noise. In the same way, the greater the
horizontal aperture of the eye diagram, the less sensitive the transmission is to
Figure 2.22 – Eye diagram of a 2-PSK modulation (a) with ISI and (b) without ISI.
positioning errors of the sampling time. In the presence of ISI, the different
plots of p(t) no longer pass through a single point at the sampling time and the
ISI contributes to closing the eye diagram.
The output of the reception filter at time t0 + nT of the in-phase component
of the receiver is equal to:
y_c(t_0 + nT) = A a_n\, p(t_0) + A \sum_{i \neq 0} a_{n-i}\, p(t_0 + iT) - A \sum_i b_{n-i}\, q(t_0 + iT) \qquad (2.114)
For an M-QAM modulation, the useful signal is A(2j − 1 − √M)p(t0), for j = 1, \cdots, √M, and the decision is taken by comparing signal yc(t0 + nT) to a set of thresholds separated by 2p(t0). There will be errors in the absence of noise if the ISI, for certain configurations of the symbols ci, is such that the received signal is situated outside the correct decision zone. This occurs if the ISI is higher in absolute value than p(t0). This situation is translated by the following condition:
\max_{a_{n-i},\, b_{n-i}} \left(\sum_{i \neq 0} a_{n-i}\, p(t_0 + iT) - \sum_i b_{n-i}\, q(t_0 + iT)\right) > |p(t_0)|
Taking into account the fact that the largest value taken by symbols ai and bi is √M − 1, the previous condition becomes:
D_{max} = (\sqrt{M} - 1)\,\frac{\sum_{i \neq 0} |p(t_0 + iT)| + \sum_i |q(t_0 + iT)|}{|p(t_0)|} \geq 1 \qquad (2.115)
Quantity Dmax is called the maximum distortion. When the maximum distor-
tion is greater than unity, the eye diagram is closed at the sampling time and
errors are possible, even in the absence of noise.
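As a small worked example, the Python function below evaluates the maximum distortion of relation (2.115) from sampled values of p(t) and q(t); the sample values used in the call are arbitrary, made up for the illustration.

```python
import numpy as np

def max_distortion(p0, p_isi, q_all, M):
    """Maximum distortion Dmax of (2.115).
    p0    : useful sample p(t0)
    p_isi : samples p(t0 + iT) for i != 0 (pre- and post-cursors)
    q_all : samples q(t0 + iT) for all i
    M     : size of the square M-QAM constellation"""
    return (np.sqrt(M) - 1) * (np.sum(np.abs(p_isi)) + np.sum(np.abs(q_all))) / abs(p0)

# a channel with small residual ISI and 16-QAM: Dmax < 1, the eye stays open
print(max_distortion(1.0, [0.05, -0.03, 0.02], [0.01, -0.01], 16))
```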
p(t_0 + iT) = 0 \quad \forall i \neq 0 \qquad (2.116)
q(t_0 + iT) = 0 \quad \forall i \qquad (2.117)
which can again be written by using the complex signal x(t) = p(t) + jq(t)
which, after Fourier transform and taking into account the condition of absence
of ISI, becomes:
XE (f ) = p(t0 ) exp(−j2πf t0 ) (2.123)
The equality of the relations (2.121) and (2.123) gives:
\sum_i \exp\!\left(j2\pi\left(f - \frac{i}{T}\right)t_0\right) X\!\left(f - \frac{i}{T}\right) = T\, p(t_0)
Putting:
X^{t_0}(f) = \frac{X(f)}{p(t_0)}\exp(j2\pi f t_0)
the condition for the absence of ISI can be expressed from X t0 (f ) by the follow-
ing relation:
\sum_{i=-\infty}^{+\infty} X^{t_0}\!\left(f - \frac{i}{T}\right) = T \qquad (2.124)
X^{t_0}(f) = T \ \text{ for } |f| \leq W, \quad 0 \text{ elsewhere} \qquad (2.126)
or again:
X(f) = T\, p(t_0)\exp(-j2\pi f t_0) \ \text{ for } |f| \leq W, \quad 0 \text{ elsewhere} \qquad (2.127)
X^{t_0}(f) = T \quad \text{if } 0 \leq |f| \leq \frac{1-\alpha}{2T}
X^{t_0}(f) = \frac{T}{2}\left[1 + \sin\!\left(\frac{\pi T}{\alpha}\left(\frac{1}{2T} - |f|\right)\right)\right] \quad \text{if } \frac{1-\alpha}{2T} \leq |f| \leq \frac{1+\alpha}{2T} \qquad (2.129)
X^{t_0}(f) = 0 \quad \text{if } |f| > \frac{1+\alpha}{2T}
or again:
X(f) = p(t_0)\, X^{t_0}(f)\exp(-j2\pi f t_0) \qquad (2.130)
whose waveform is:
x(t) = p(t_0)\,\frac{\sin\!\left(\pi\frac{t-t_0}{T}\right)}{\pi\frac{t-t_0}{T}}\;\frac{\cos\!\left(\pi\alpha\frac{t-t_0}{T}\right)}{1 - \left(2\alpha\frac{t-t_0}{T}\right)^2}
Figure 2.23 shows the frequency domain X t0 (f ) and time domain x(t) char-
acteristics of a raised-cosine function for different values of α, called the roll-off
factor.
The bandwidth of the raised-cosine function is W = (1 + α)/2T ; 0 ≤ α ≤ 1.
Function x(t) is again non-causal in the strict sense, but the larger the roll-off factor, the faster this function decays. Thus, by choosing t0 large enough, implementing a raised cosine becomes possible. Figure 2.24 plots the
eye diagrams obtained with raised-cosine functions for different values of roll-off
factor.
All the plots of x(t) pass through a single point at the sampling time t0 + nT ,
whatever the value of the roll-off factor. Note that the larger the roll-off factor,
the greater the horizontal aperture of the eye diagram. For α = 1, the aperture
of the eye is maximum and equal to T ; the sensitivity to any imprecision about
the sampling time is thus minimum.
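The following Python sketch checks the zero-ISI property numerically on the raised-cosine time response reconstructed above; the limit value used at the two singular points and the chosen roll-off are assumptions of the example.

```python
import numpy as np

def raised_cosine(t, T=1.0, alpha=0.5):
    """Raised-cosine time response (normalized so that x(0) = 1); the two singular
    points t = +/- T/(2*alpha) are replaced by their limit value."""
    t = np.asarray(t, dtype=float)
    num = np.sinc(t / T) * np.cos(np.pi * alpha * t / T)
    den = 1.0 - (2.0 * alpha * t / T) ** 2
    sing = np.abs(den) < 1e-12
    out = np.empty_like(num)
    out[~sing] = num[~sing] / den[~sing]
    out[sing] = np.pi / 4 * np.sinc(1.0 / (2.0 * alpha))
    return out

# zero-ISI property: x(nT) = 0 for every n != 0, whatever the roll-off factor
n = np.arange(-5, 6)
print(np.round(raised_cosine(n * 1.0, T=1.0, alpha=0.35), 12))
```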
Figure 2.24 – Eye diagrams for modulations with (2-PSK or 4-PSK) binary symbols
for different values of roll-off factor α (0.2, 0.5, 0.8).
Having determined the global spectrum X(f ) that the transmission chain
must satisfy in order to guarantee the absence of ISI, we will now establish the
frequency characteristics of the emission and reception filters.
Replacing |Z(f)| by its value in relation (2.137), we obtain the magnitude spectrum of the emission filter:
|G_e(f)| = \frac{\sqrt{p(t_0)}\,\sqrt{CS_\alpha(f)}}{|H(f)|} \qquad (2.139)
We have obtained the magnitude spectrum of the emission and reception filters.
These filters are therefore defined to within one arbitrary phase. Distributing
time t0 between the emission and reception filters, we obtain:
G_r(f) = \sqrt{p(t_0)}\,\sqrt{CS_\alpha(f)}\,\exp(-j2\pi f t_1)
G_e(f) = \frac{\sqrt{p(t_0)}\,\sqrt{CS_\alpha(f)}}{H(f)}\,\exp(-j2\pi f (t_0 - t_1)) \qquad (2.140)
\gamma_e(f) = \frac{(M^2-1)}{3}\,\frac{A^2}{T}\, p(t_0)\, CS_\alpha(f) \qquad (2.142)
Let us introduce the power transmitted at the output of the emission filter:
P = \frac{1}{4}\int_{-\infty}^{+\infty}\gamma_e(f - f_0)\,df + \frac{1}{4}\int_{-\infty}^{+\infty}\gamma_e(f + f_0)\,df = \frac{1}{2}\int_{-\infty}^{+\infty}\gamma_e(f)\,df \qquad (2.148)
P = \frac{A^2}{T}\int_{-\infty}^{+\infty} p(t_0)\, CS_\alpha(f)\,df = \frac{A^2}{T}\, p(t_0) \qquad (2.149)
Using expressions (2.147) and (2.149), the error probability is equal to:
P_{e a_n} = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{\frac{PT}{2N_0}}\right) \qquad (2.150)
The energy Eb used to transmit an information bit dn is:
Eb = P Tb (2.151)
reflected on different obstacles and, thus, the receiver receives M copies of the
signal transmitted, each copy being affected by an attenuation ρn (t), with delay
τn (t) and a Doppler frequency shift fnd (t). The attenuations, delays and Doppler
frequencies are functions of time in order to take into account the time-varying
channel. To simplify the notations, in the following we will omit variable t for
the attenuations, the delays and the Doppler frequencies.
Let r(t) be the response of the transmission channel to the signal s(t), which
is generally written in the form:
r(t) = \sum_{n=1}^{M} \rho_n \exp\!\left[j2\pi(f_n^d + f)(t - \tau_n)\right] \qquad (2.153)
r(t) = \sum_{n=1}^{M} \rho_n \exp\!\left[j2\pi\left(f_n^d t - (f_n^d + f)\tau_n\right)\right] s(t) \qquad (2.154)
and thus the frequency response of the transmission channel is defined by:
c(f, t) = \sum_{n=1}^{M} \rho_n \exp\!\left[-j2\pi\left(f\tau_n - f_n^d t + f_n^d \tau_n\right)\right] \qquad (2.155)
The multipath channel is generally frequency selective, that is, it does not trans-
mit all the frequency components of the signal placed at its input in the same
way, certain components being more attenuated than others. The channel will
therefore create distortions of the transmitted signal. In addition, their evolution
over time can be more or less rapid.
To illustrate the frequency selectivity of a multipath channel, we have plotted
in Figure 2.25 the power spectrum of the frequency response of this channel for
M = 2, in the absence of a Doppler frequency shift (fnd = 0) and fixing τ1 to
zero.
|c(f)|^2 = \rho_1^2\left[(1 + \alpha\cos 2\pi f\tau_2)^2 + \alpha^2\sin^2 2\pi f\tau_2\right] \qquad (2.156)
with α = ρ2 /ρ1 .
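For illustration, the short Python sketch below evaluates relation (2.156) on a two-path channel; the attenuations and the delay τ2 are arbitrary values chosen for the example (deep fades appear every 1/τ2).

```python
import numpy as np

def two_path_psd(f, rho1=1.0, rho2=0.6, tau2=1e-6):
    """Squared magnitude (2.156) of the frequency response of a two-path channel
    (tau1 = 0, no Doppler shift)."""
    a = rho2 / rho1
    return rho1**2 * ((1 + a * np.cos(2 * np.pi * f * tau2))**2
                      + a**2 * np.sin(2 * np.pi * f * tau2)**2)

f = np.linspace(0, 2e6, 5)        # notches spaced 1/tau2 = 1 MHz apart
print(two_path_psd(f))
```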
Two parameters are now introduced: coherence bandwidth Bc and coherence
time tc that allow the transmission channel to be characterized in relation to
the frequency selectivity and its evolution speed.
Coherence bandwidth
There are several definitions of the coherence bandwidth but the most common
definition is:
B_c \approx \frac{1}{T_m} \qquad (2.157)
with:
\varphi_n(t) = f_n^d t - (f_0 + f_n^d)\tau_n
Putting:
a_c(t) = \sum_{n=1}^{M} \rho_n \cos\varphi_n(t) \quad \text{and} \quad a_s(t) = \sum_{n=1}^{M} \rho_n \sin\varphi_n(t)
and:
\cos\phi(t) = \frac{a_c(t)}{\sqrt{a_c^2(t) + a_s^2(t)}} \quad \text{and} \quad \sin\phi(t) = \frac{a_s(t)}{\sqrt{a_c^2(t) + a_s^2(t)}}
signal r(t) can again be written:
r(t) = A\alpha(t)\left[\sum_i a_i h(t - iT)\cos(2\pi f_0 t + \phi(t)) - \sum_i b_i h(t - iT)\sin(2\pi f_0 t + \phi(t))\right] \qquad (2.158)
with \alpha(t) = \sqrt{a_c^2(t) + a_s^2(t)}.
For a frequency non-selective multipath channel, the modulated M-QAM
signal only undergoes an attenuation α(t) and dephasing φ(t).
If the attenuations ρn, the delays τn, and the Doppler frequencies f_n^d are modelled by mutually independent random variables then, for large enough M and for a given t, a_c(t) and a_s(t) tend towards non-correlated Gaussian random variables (central limit theorem). The attenuation α(t), for a given t, follows a Rayleigh law and the phase φ(t) is uniformly distributed on [0, 2π[.
p(\alpha) = \frac{2\alpha}{\sigma_\alpha^2}\exp\!\left(-\frac{\alpha^2}{\sigma_\alpha^2}\right) \quad \alpha \geq 0 \qquad (2.159)
with \sigma_\alpha^2 = E\left[\alpha^2\right].
The attenuation α(t) can take values much lower than unity and, in this
case, the information signal received by the receiver is very attenuated. Its level
is then comparable to, if not lower than, that of the noise. We say that the
transmission channel shows deep Rayleigh fading.
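A minimal Python sketch of this mechanism, assuming σα² = 1 and using the Gaussian construction of a_c and a_s described above, is given below; the deep-fade threshold of 0.1 is an arbitrary value for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def rayleigh_fading(n, sigma_alpha2=1.0):
    """Draw n samples of alpha(t): a_c and a_s are i.i.d. Gaussian (central limit
    theorem over many paths), so alpha = sqrt(a_c^2 + a_s^2) is Rayleigh with
    E[alpha^2] = sigma_alpha2, as in (2.159)."""
    s = np.sqrt(sigma_alpha2 / 2)
    a_c = rng.normal(0, s, n)
    a_s = rng.normal(0, s, n)
    return np.hypot(a_c, a_s)

alpha = rayleigh_fading(100_000)
print(np.mean(alpha**2))          # close to sigma_alpha2 = 1
print(np.mean(alpha < 0.1))       # probability of a deep fade (alpha << 1)
```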
If band B occupied by the modulated signal is higher than the coherence
band, the channel is frequency selective. Its frequency response, on band B, is
no longer flat and some spectral components of the modulated signal can be very
attenuated. The channel introduces a distortion of the modulated signal which
results in the phenomenon of Intersymbol Interference (ISI). In the presence of
ISI, the signal at a sampling time is a function of the symbol of the modulated
signal at this time but also of the symbols prior to and after this time. ISI
appears as noise that is added to the additive white Gaussian noise and, of
course, degrades the performance of the transmission.
Coherence time
The coherence time tc of a fading channel is defined by:
t_c \approx \frac{1}{B_d} \qquad (2.160)
where B_d is the Doppler band of the channel, which is well approximated by f_{max}^d with:
f_{max}^d = \max_n f_n^d \qquad (2.161)
The coherence time of the channel is a measure of its evolution speed over time.
If tc is much higher than the symbol period T of the modulated signal, the
channel is said to be slow-fading. For a frequency non-selective slow-fading
channel, attenuation α(t) and phase φ(t) are practically constant over one or
more symbol periods T .
A channel is frequency non-selective and slow-fading if it satisfies the following condition:
T_m B_d < 1 \qquad (2.162)
For high \bar{E}_b/N_0, the error probabilities can be approximated by:
2\text{-PSK or 4-PSK:} \quad P_{eb} \approx \frac{1}{4\bar{E}_b/N_0} \qquad (2.169)
2\text{-FSK:} \quad P_{eb} \approx \frac{1}{2\bar{E}_b/N_0} \qquad (2.170)
On a Rayleigh fading channel, the performance of the different receivers is severely degraded compared to that obtained on a Gaussian channel (with identical Eb/N0 at the input). Indeed, on a Gaussian channel, the error probabilities Peb decrease exponentially as a function of the signal to noise ratio Eb/N0, whereas on a Rayleigh fading channel, the decrease in the probability Peb is proportional to the inverse of the average signal to noise ratio \bar{E}_b/N_0. To improve the performance on a Rayleigh fading channel, we use two techniques, which can be combined: diversity and, of course, channel coding (which is, in fact, a diversity of information).
and variance:
\sigma_Z^2 = \frac{N_0}{2}\sum_{n=1}^{L}\left(\alpha_i^n\right)^2 \qquad (2.174)
P_{eb}(\rho) = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{\rho}\right) \qquad (2.175)
with:
\rho = \frac{E_b}{N_0}\sum_{n=1}^{L}\left(\alpha_i^n\right)^2
with:
\eta = \sqrt{\frac{\bar{E}_b/LN_0}{1 + \bar{E}_b/LN_0}} \quad \text{and} \quad \binom{L-1+n}{n} = \frac{(L-1+n)!}{n!\,(L-1)!}
where Ēb is the average total energy used to transmit an information bit (Ēb =
LEb ).
For a high signal to noise ratio, an approximation of the bit error probability
P eb is given by:
P_{eb} \approx \binom{2L-1}{L}\left(\frac{1}{4\bar{E}_b/LN_0}\right)^L \quad \text{for } \frac{\bar{E}_b}{LN_0} \gg 1 \qquad (2.178)
In the presence of diversity, the bit error probability P eb decreases following the
inverse of the signal to noise ratio to the power of L.
For a 2-FSK modulation, calculating the bit error probability in the presence of coherent reception is similar to that of 2-PSK modulation. We obtain the following result:
P_{eb} = \left(\frac{1-\eta}{2}\right)^L \sum_{n=0}^{L-1}\binom{L-1+n}{n}\left(\frac{1+\eta}{2}\right)^n \qquad (2.179)
With a high signal to noise ratio, a good approximation of the error probability
P eb is given by:
P_{eb} \approx \binom{2L-1}{L}\left(\frac{1}{2\bar{E}_b/LN_0}\right)^L \quad \text{for } \frac{\bar{E}_b}{LN_0} \gg 1 \qquad (2.180)
Note that diversity of the type presented here is a form of coding that uses a repetition code and weighted decoding at reception. Figure 2.26 shows the performance of 2-PSK and 2-FSK modulations in the presence of diversity.
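As a numerical illustration, the following Python sketch evaluates the closed form of type (2.179) and the high-SNR approximations (2.178)/(2.180); the value of η used here is the 2-PSK expression given above, and the 15 dB operating point is an arbitrary choice for the example.

```python
import numpy as np
from math import comb

def peb_diversity(eta, L):
    """Closed form of type (2.179)."""
    return ((1 - eta) / 2) ** L * sum(comb(L - 1 + n, n) * ((1 + eta) / 2) ** n
                                      for n in range(L))

def peb_high_snr(gamma, L, factor):
    """High-SNR approximations (2.178)/(2.180): factor = 4 for 2-PSK, 2 for 2-FSK;
    gamma = Eb_bar/(L N0) is the average signal to noise ratio per branch."""
    return comb(2 * L - 1, L) * (1.0 / (factor * gamma)) ** L

L = 3
gamma = 10 ** (15 / 10) / L          # average total Eb/N0 = 15 dB shared over L branches
eta = np.sqrt(gamma / (1 + gamma))   # eta given above (2-PSK case)
print(peb_diversity(eta, L), peb_high_snr(gamma, L, 4))
```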
Bc of the channel. We can also transmit these data by using the same carrier but
at a time separated by a quantity at least equal to the coherence time tc of the
channel. This way of proceeding amounts to performing frequency interleaving
combined with time interleaving.
where cn,i = an,i + jbn,i is a complex modulation symbol, h(t) a unit ampli-
tude rectangular pulse shape of width T , and N is the number of carriers with
frequency fn .
Considering time interval [iT, (i + 1)T [, the signal s(t) is equal to:
s(t) = A \sum_{n=0}^{N-1} \Re e\left\{c_{n,i}\exp(j2\pi f_n t)\right\} \quad \forall t \in [iT, (i+1)T[ \qquad (2.182)
s_l = A \sum_{n=0}^{2N-1} c_{n,i}\exp\!\left(j2\pi\frac{nl}{2N}\right) \qquad (2.188)
Δ ≥ τMax (2.189)
We put:
T = ts + Δ
In the presence of a guard interval, the modulation symbols are still of duration T
but the discrete Fourier transform at reception is realized on the time intervals
[iT + Δ, (i + 1)T [. Proceeding thus, we can check that on this time interval
only the modulation symbol transmitted between iT and (i + 1)T is taken into
account for the decoding: there is therefore no intersymbol interference.
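A minimal Python sketch of this idea is given below; it realizes the guard interval as a cyclic prefix (a common implementation choice, assumed here) and verifies that, on an ideal channel, the DFT taken on the interval [iT + Δ, (i+1)T[ recovers the transmitted modulation symbols exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

def ofdm_symbol(c, n_guard):
    """One OFDM symbol: IDFT of the N modulation symbols c, then the last n_guard
    samples are copied in front (cyclic prefix acting as the guard interval)."""
    x = np.fft.ifft(c)                        # useful part, duration ts
    return np.concatenate([x[-n_guard:], x])  # total duration T = ts + Delta

def ofdm_demod(y, N, n_guard):
    """Discard the guard interval and take the DFT on the remaining ts samples."""
    return np.fft.fft(y[n_guard:n_guard + N])

N, n_guard = 64, 16
c = (2 * rng.integers(0, 2, N) - 1) + 1j * (2 * rng.integers(0, 2, N) - 1)  # 4-QAM
s = ofdm_symbol(c, n_guard)
print(np.allclose(ofdm_demod(s, N, n_guard), c))   # True on an ideal channel
```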
The introduction of a guard interval has two consequences. The first is that
only a part of the energy transmitted on emission is exploited on reception.
Indeed, we transmit each modulation symbol over a duration T and we recover
this same symbol from an observation of duration ts = T −Δ. The loss, expressed
The second consequence is that the orthogonality of the carriers must be ensured so as to be able to separate these carriers at reception, that is, they must be separated by a quantity 1/ts. The band of frequencies occupied by the OFDM signal with a guard interval is therefore:
B = \frac{N-1}{t_s} \qquad (2.191)
that is, a bandwidth expansion of 1 + \Delta/t_s compared to a system without a guard interval.
In the presence of a guard interval, we should therefore choose Δ so as
to minimize the degradations of the signal to noise ratio and of the spectral
efficiency, that is, choose the smallest possible Δ compared to duration ts .
Theoretical limits
The recent invention of turbo codes and the rediscovery of LDPC codes have
brought back into favour the theoretical limits of transmission which were re-
puted to be inaccessible until now. This chapter provides the conceptual bases
necessary to understand and compute these limits, in particular those that cor-
respond to real transmission situations with messages of finite length and binary
modulations.
of the emitted message and random, is added to the useful value (spurious
effects of attenuation can also be added, like on the Rayleigh channel). Thermal
noise is well represented by a Gaussian random process. When demodulation
is performed in an optimal way, it results in a random Gaussian variable whose
sign represents the best hypothesis concerning the binary symbol emitted. The
channel is then characterized by its signal to noise ratio, defined as the ratio
of the power of the useful signal to that of the perturbing noise. For a given
signal to noise ratio, the decisions taken on the binary symbols emitted are
assigned a constant error probability, which leads to the simple model of the
binary symmetric channel.
Figure 3.1 – Binary symmetric channel with error probability p. The transition prob-
abilities of an input symbol towards an output symbol are equal two by two.
Another description of the same channel can be given in the following way:
let E be a binary random variable taking value 1 with a probability p < 1/2 and
value 0 with the probability 1 − p. The hypothesis that p < 1/2 does not restrict
the generality of the model because changing the arbitrary signs 0 and 1 leads
to replacing an initial error probability p > 1/2 by 1 − p < 1/2. The behaviour
of the channel can be described by the algebraic expression Y = X ⊕ E, where
X and Y are the binary variables at the input and at the output of the channel
respectively, E a binary error variable, and ⊕ represents the modulo 2 addition.
y=x⊕e (3.1)
The first equality in (3.4) defined I(X; Y ) as the logarithmic increase of the
probability of X that results on average from the data Y , that is, the average
quantity of information that the knowledge of Y provides about that of X. The
second equality in (3.4), deduced from the first using (3.5), shows that this value
is symmetric in X and in Y . The quantity of information that Y provides about
X is therefore equal to what X provides about Y , which justifies the name of
mutual information.
Mutual information is not sufficient to characterize the channel because the
former also depends on the entropy of the source, that is, the quantity of in-
formation that it produces on average per emitted symbol. Entropy, that is,
in practice the average number of bits necessary to represent each symbol, is
defined by:
H(X) = -\sum_X \Pr(X)\log_2\left(\Pr(X)\right)
This capacity is maximum for p = 0 (then it equals 1 Sh, like the entropy of the
source: the channel is then "transparent") and null for p = 1/2, which is what
we could expect since then there is total incertitude.
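As a small numerical check of this behaviour, the Python sketch below evaluates the capacity of the binary symmetric channel using the standard expression C = 1 − H2(p) (a known result, stated here as the formula behind the two limit cases mentioned in the text).

```python
import numpy as np

def h2(p):
    """Binary entropy function in shannons."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel: C = 1 - H2(p)."""
    return 1.0 - h2(p)

for p in (0.0, 0.05, 0.11, 0.5):
    print(p, bsc_capacity(p))     # 1 Sh for p = 0, 0 Sh for p = 1/2
```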
The larger n is, the smaller the probability that this rule gives an erroneous result; this probability tends towards 0 (assuming that p is kept constant) when n tends towards infinity, provided that d > 2np. So d also has to tend towards infinity.
Still in geometrical terms, the construction of the best possible code can
therefore be interpreted as involving choosing M < 2n points belonging to Sn
in such a way that they are as far away as possible from each other (note that
the inequality M < 2n implies that the code is necessarily redundant). For a
given value of the error probability p of the channel (still assumed to be binary
symmetric) it is clear that there is a limit to the number M of points that can
be placed in Sn while maintaining the distance between these points higher than
2np. Let Mmax be this number. The value
C = \lim_{n \to \infty}\frac{\log_2(M_{max})}{n}
measures in shannons the greatest quantity of information per symbol that can
be communicated without any errors through the channel, and it happens to
coincide with the capacity of the channel defined in Section 3.1. No explicit
procedure making it possible to determine Mmax points in Sn while maintaining
the distance between these points higher than 2np is generally known, except in
a few simple, not very useful, cases.
The mathematical expectation, or mean, of this weight is n/2 and its variance
equals n/4. For very large n, a good approximation of the weight distribution
of the codewords obtained by random coding is a Gaussian distribution. If
of its distances, and that of a linear code on the distribution of its weights,
we can undertake to build a linear code having a weight distribution close to
that of random coding. This idea has not been much exploited directly, but we
can interpret turbo codes as being a first implementation. Before returning to
the design of coding procedures, we will make an interesting remark concerning
codes that imitate random coding.
The probability of obtaining a codeword of length n and weight w by ran-
domly drawing the bits 0 and 1 each with a probability of 1/2, independently
of each other, is given by (3.7). Drawing a codeword 2^k times, we obtain an average number of words of weight w equal to:
N_{w,k} = \binom{n}{w}\, 2^{-(n-k)}
Assuming that n, k and w are large, we can express \binom{n}{w} approximately, using the Stirling formula:
\binom{n}{w} \approx \frac{1}{\sqrt{2\pi}}\,\frac{n^{n+1/2}}{w^{w+1/2}\,(n-w)^{n-w+1/2}}
The minimal weight obtained on average, that is wmin , is the largest number
such that Nwmin ,k has value 1 for the best integer approximation. The number
Nwmin ,k is therefore small. It will be sufficient for us to put it equal to a constant
λ close to 1, which it will not be necessary to detail further because it will be
eliminated from the calculation. We must therefore have:
2^{-(n-k)}\,\frac{1}{\sqrt{2\pi}}\,\frac{n^{n+1/2}}{w_{min}^{w_{min}+1/2}\,(n-w_{min})^{n-w_{min}+1/2}} = \lambda
Taking the base 2 logarithms and ignoring the constant in relation to n, k and
wmin that tend towards infinity, we obtain:
1 - \frac{k}{n} \approx H_2\!\left(\frac{w_{min}}{n}\right)
where H_2(\cdot) is the binary entropy function: H_2(x) = -x\log_2(x) - (1-x)\log_2(1-x).
This is the asymptotic form of the Gilbert-Varshamov bound, which links the minimum distance d of the code having the greatest possible minimum distance to the parameters k and n. It is a lower bound but, in its asymptotic form, it is very close to equality. A code whose minimum distance verifies this bound with equality is considered to be good for the minimum distance criterion. This shows that a code built with a weight distribution close to that of random coding is also good for this criterion.
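The asymptotic bound can be evaluated numerically by solving 1 − k/n = H2(dmin/n) for the relative distance; the Python sketch below does this (the example parameters k = 4000, n = 8000 are those discussed later in Section 3.4, and the root-finding interval is an assumption of the sketch).

```python
import numpy as np
from scipy.optimize import brentq

def h2(x):
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def gv_relative_distance(R):
    """Asymptotic Gilbert-Varshamov bound: solve 1 - R = H2(delta) for delta <= 1/2."""
    return brentq(lambda d: h2(d) - (1 - R), 1e-12, 0.5)

k, n = 4000, 8000                 # rate-1/2 code of length n = 8000
delta = gv_relative_distance(k / n)
print(delta, delta * n)           # dmin/n about 0.11, i.e. dmin on the order of 10^3
```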
capacity thus obtained will not be given here. We merely note that this capacity
is higher than that of the binary symmetric channel that is deduced from it by
taking a hard decision, that is, restricted to the binary symbol Y = x̂, by a
factor that increases when the signal to noise ratio of the channel decreases. It
reaches π/2 when we make this ratio tend towards 0, if the noise is Gaussian.
For a given signal to noise ratio, the binary input continuous output channel is
therefore better than the binary symmetric channel that can be deduced from
it by taking hard decisions. This channel is also simpler than the hard decision
channel, since it does not have any means to take a binary decision according
to the received real value. Taking a hard decision means losing the information
carried by the individual variations of this value, which explains that the capacity
of the soft output channel is higher.
where I(X; Y) is the mutual information between X and Y. When the input
and the output of the channel are real values, and no longer discrete values, the
probabilities are replaced by probability densities and the sums in relation (3.4)
become integrals. For realizations x and y of the random variables X and Y,
we can write the mutual information as a function of the probabilities of x and
y:
I(\mathbf{X}; \mathbf{Y}) = \underbrace{\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}}_{2N \text{ times}} p(\mathbf{x})\,p(\mathbf{y}|\mathbf{x})\log_2\frac{p(\mathbf{y}|\mathbf{x})}{p(\mathbf{y})}\,d\mathbf{x}\,d\mathbf{y} \qquad (3.13)
To determine C, we therefore have to maximize (3.13) which is valid for all types
of inputs (continuous, discrete) of any dimension N . In addition, the maximum
is reached for equiprobable inputs (see Section 3.1), for which we have:
p(\mathbf{y}) = \frac{1}{M}\sum_{i=1}^{M} p(\mathbf{y}|\mathbf{x}_i)
p(\mathbf{x}) = \prod_{n=1}^{N} p(x_n)
where x = [x1 x2 . . . xN ] is the input vector and p(xn ) = N (0, σ2 ). The mutual
information is reached for equiprobable inputs, and denoting N0 /2 the variance
of the noise, (3.14) after development gives:
C = \frac{N}{2}\log_2\!\left(1 + \frac{2\sigma^2}{N_0}\right)
This relation is modified to make the mean energy Eb of each of the bits, and consequently the signal to noise ratio Eb/N0, appear. For N = 2, we have:
C_b = \log_2\!\left(1 + R\,\frac{E_b}{N_0}\right) \qquad (3.15)
the capacity being expressed in bit per second per hertz and per pair of dimensions. Taking R = 1, this leads to the ratio Eb/N0 being limited by the normalized Shannon limit, as shown in Figure 3.2.
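Setting Cb = R in (3.15) gives the minimum usable Eb/N0 for a given spectral efficiency; the Python sketch below evaluates this normalized Shannon limit (the chosen rates are arbitrary example values).

```python
import numpy as np

def shannon_limit_dB(R):
    """Minimum Eb/N0 (in dB) from (3.15): setting Cb = R gives Eb/N0 = (2^R - 1)/R."""
    return 10 * np.log10((2.0 ** R - 1.0) / R)

for R in (0.5, 1.0, 2.0, 4.0):
    print(R, round(shannon_limit_dB(R), 2))   # 0 dB for R = 1, -1.59 dB as R -> 0
```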
discrete values, M being the modulation order, and have dimension N , that is
x_i = [x_{i1}, x_{i2}, \cdots, x_{iN}]. The transition probability of the Gaussian channel is:
p(\mathbf{y}|\mathbf{x}_i) = \prod_{n=1}^{N} p(y_n|x_{in}) = \prod_{n=1}^{N}\frac{1}{\sqrt{\pi N_0}}\exp\!\left(-\frac{(y_n - x_{in})^2}{N_0}\right)
C = \log_2(M) - \frac{(\sqrt{\pi})^{-N}}{M}\sum_{i=1}^{M}\underbrace{\int_{-\infty}^{+\infty}\cdots\int_{-\infty}^{+\infty}}_{N \text{ times}}\exp\!\left(-|\mathbf{t}|^2\right)\log_2\!\left[\sum_{j=1}^{M}\exp\!\left(-2\,\mathbf{t}\cdot\mathbf{d}_{ij} - |\mathbf{d}_{ij}|^2\right)\right]d\mathbf{t} \qquad (3.16)
C being expressed in bit/symbol. We note that dij increases when the signal
to noise ratio increases (N0 decreases) and the capacity tends towards log2 (M ).
The different possible modulations only appear in the expression of dij . The
discrete sums from 1 to M represent the possible discrete inputs. For the final
calculation, we express dij as a function of Es /N0 according to the modulation,
Es being the energy per symbol, and the capacity of the channel can be deter-
mined using a computer. Figure 3.3 gives the result of the calculation for some
PSK and QAM modulations.
p(y|x_i) = \int_{0}^{+\infty}\frac{1}{\sqrt{2\pi\sigma^2}}\;2\alpha\exp\!\left(-\alpha^2\right)\exp\!\left(-\frac{|y - \alpha x_i|^2}{2\sigma^2}\right)d\alpha
One development of this expression means that we can explicitly write this
conditional probability density that turns out to be independent of α:
p(y|x_i) = \frac{\sqrt{2}\,\sigma\, e^{-\frac{|y|^2}{2\sigma^2}}}{\sqrt{\pi}\,\left(|x_i|^2 + 2\sigma^2\right)} + \frac{2\, x_i\, y\; e^{-\frac{|y|^2}{|x_i|^2 + 2\sigma^2}}}{\left(|x_i|^2 + 2\sigma^2\right)^{3/2}}\left[1 - \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{x_i\, y}{\sigma\sqrt{2\left(|x_i|^2 + 2\sigma^2\right)}}\right)\right]
C = \int_{0}^{+\infty} C_\alpha\, p(\alpha)\,d\alpha = E\left[C_\alpha\right]
In the case of a channel with equiprobable binary inputs, the probability of each of the inputs is 1/2 and the vectors x and y can be treated component by component, as scalar values x and y. Considering that (3.17) is valid for any ρ, in order to obtain the closest upper bound to the PER, we must minimize the right-hand side of (3.17) as a function of ρ. Introducing the rate R = k/n, it therefore means minimizing, for 0 ≤ ρ ≤ 1, the expression:
\left\{2^{\rho R}\int_{-\infty}^{+\infty}\left[\frac{1}{2}\,\frac{1}{\sigma\sqrt{2\pi}}\left(\exp\!\left(-\frac{(y-1)^2}{2\sigma^2(1+\rho)}\right) + \exp\!\left(-\frac{(y+1)^2}{2\sigma^2(1+\rho)}\right)\right)\right]^{1+\rho} dy\right\}^{k}
The explicit value of σ is known for binary inputs (2-PSK and 4-PSK modulations): \sigma = (2RE_b/N_0)^{-1/2}. An exploitable expression of Gallager's upper bound on the PER of a binary input channel is then:
e^{-k\frac{E_b}{N_0}}\min_{0\leq\rho\leq1}\left\{2^{\rho R + 1}\int_{0}^{+\infty}\frac{\exp(-y^2)}{\sqrt{\pi}}\left[\cosh\!\left(\frac{y\sqrt{4RE_b/N_0}}{1+\rho}\right)\right]^{1+\rho} dy\right\}^{k} \qquad (3.18)
This expression links the PER, the rate, the size k of the messages and the signal
to noise ratio Eb /N0 , for a Gaussian binary input channel. It gives an upper
bound of the PER and not an equality. This equation is not very well adapted
to all cases. In particular, simulations show that for a rate close to 1, the bound
is far too lax and does not give really useful results.
If we want to determine the penalty associated with a given packet size, we
can compare the result obtained by evaluating (3.18) with the result obtained
by computing the capacity that considers infinite size packets
lower bounded by the limit obtained by a binary input and upper bounded by
a continuous input.
The first results were given by Shannon [3.4] and by the so-called sphere-
packing bound method which provides a lower bound on the error probability
of random codes on a Gaussian channel. We again assume maximum likelihood
decoding. A codeword of length n is a sequence of n whole numbers. Geometri-
cally, this codeword can be assimilated to a point in an n-dimensional Euclidean
space and the noise can be seen as a displacement of this point towards a neigh-
bouring point following a Gaussian distribution (see Section 3.1.4). Denoting P
the power of the emitted signal, all the codewords are situated on the surface of a sphere of radius \sqrt{nP}.
Observing that we have a code with 2^k points (codewords), each at a distance \sqrt{nP} from the origin in n-dimensional space, any two points are equidistant from
the origin, and consequently, the bisector of these two points (a hyperplane of
dimension n − 1) passes through the origin. Considering the set of 2k points
making up the code, all the hyperplanes pass through the origin and form pyra-
mids with the origin as the summit. The error probability, after decoding, is
mids with the origin as the summit. The error probability, after decoding, is \Pr(e) = \frac{1}{2^k}\sum_{i=1}^{2^k}\Pr(e_i), where \Pr(e_i) is the probability that the point associated
with the codeword i is moved by the noise outside the corresponding pyramid.
The principle of Shannon’s sphere-packing bound involves this geometrical
vision of coding. However, it is very complex to keep the ’pyramid’ approach
and the solid angle pyramid Ωi , around the codeword i, is replaced by a cone
with the same summit and the same solid angle Ωi (Figure 3.4).
Figure 3.4 – Assimilation of a pyramid with one cone in Shannon’s so-called sphere-
packing approach.
It can be shown that the probability that the signal remains in the cone is
higher than the probability that it remains in the same solid angle pyramid.
Consequently, the error probability can be lower-bounded in the following way:
\Pr(e) \geq \frac{1}{2^k}\sum_{i=1}^{2^k} Q^*(\Omega_i) \qquad (3.19)
denoting Q∗ (Ωi ) the probability that the noise moves point i out of the solid
angle cone Ωi (therefore a decoding error is made on this point). We also observe
that, if we consider the set of codewords equally distributed on the surface of
the sphere of radius \sqrt{nP}, the decoding pyramids form a partition of this same
sphere, and therefore the solid angle of this sphere Ω0 is the sum of all the solid
angles of the Ωi pyramids. We can thus replace the solid angles Ωi by the mean
solid angle Ω0 /2k .
This progression, which leads to a lower bound on the error probability for an
optimal decoding of random codes on the Gaussian channel, is called the sphere-
packing bound because it involves restricting the coding to an n-dimensional
sphere and the effects of the noise to movements on this sphere.
Mathematical simplifications give an exploitable form of the lower bound on
the packet error rate (PER):
\ln(\text{PER}) \geq \frac{k}{R}\left[\ln\!\left(G(\theta_i, A)\sin\theta_i\right) - \frac{1}{2}\left(A^2 - A\,G(\theta_i, A)\cos\theta_i\right)\right]
\theta_i \approx \arcsin\!\left(2^{-R}\right)
G(\theta_i, A) \approx \left(A\cos\theta_i + \sqrt{A^2\cos^2\theta_i + 4}\right)/2 \qquad (3.20)
A = \sqrt{2RE_b/N_0}
These expressions link the size k of the messages, the signal to noise ratio Eb /N0
and the coding rate R. For high values of R and for block sizes k lower than a
few tens of bits, the lower bound is very far from the real PER.
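The Python sketch below evaluates this lower bound as reconstructed above; since the original relation was garbled in extraction, the exact constants should be treated with care, and the example parameters (k = 1000, R = 1/2, Eb/N0 = 1 dB) are arbitrary.

```python
import numpy as np

def sphere_packing_ln_per(k, R, EbN0_dB):
    """Sphere-packing lower bound (3.20) on ln(PER) for random codes on the
    Gaussian channel (reconstruction of the relation above)."""
    A = np.sqrt(2 * R * 10 ** (EbN0_dB / 10))
    theta = np.arcsin(2.0 ** (-R))
    G = (A * np.cos(theta) + np.sqrt(A**2 * np.cos(theta)**2 + 4)) / 2
    return (k / R) * (np.log(G * np.sin(theta)) - 0.5 * (A**2 - A * G * np.cos(theta)))

print(sphere_packing_ln_per(1000, 0.5, 1.0))   # ln(PER) >= this value
```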
Asymptotically, for block sizes tending towards infinity, the bound obtained
by (3.20) tends towards the Shannon limit for a continuous input and output
channel such as presented in Section 3.2. In the same way as for the binary
input channel, if we wish to quantify the loss caused by the transmission of finite
length packets, we must normalize the values obtained by evaluating (3.20) by
removing the Shannon limit (3.15) from them, the penalty having to be null
when the packet sizes tend towards infinity. The losses due to the transmission
of finite length packets in comparison with the transmission of a continuous flow
of data are less in the case of a continuous input channel than in the case of a
binary input channel.
Figure 3.5 – Penalties in Eb/N 0 for the transmission of finite length packets for the
continuous input channel and the binary input channel as a function of size k (infor-
mation bits), for a coding rate 5/6 and different PER.
where erfc(x) denotes the complementary error function defined by \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_{x}^{\infty}\exp(-t^2)\,dt. dmin is the minimum Hamming distance of the code associated with the modulation considered, 2-PSK or 4-PSK in the present case. N(d)
represents the multiplicities of the code (see Section 1.5). In certain cases, these
multiplicities can be determined precisely (like for example simple convolutional
codes, Reed-Solomon codes, BCH codes, etc. . . . ), and (3.21) can easily be eval-
uated. For other codes, in particular turbo codes, it is not possible to determine
these multiplicities easily and we have to consider some realistic hypotheses in
order to get round the problem. The hypotheses that we adopt for turbo codes
and for LDPC codes are the following [3.1]:
• Hypothesis 1: Uniformity. There exists at least one codeword of weight1
dmin having an information bit di equal to "1", for any place i of the
systematic part (1 ≤ i ≤ k).
• Hypothesis 2: Unicity. There is only one codeword of weight dmin such
that di ="1".
• Hypothesis 3: Non-overlapping. The k codewords of weight dmin associated
with the k bits of information are distinct.
Using these hypotheses and limiting ourselves to the first term of the sum in
(3.21), the upper bound becomes an asymptotic approximation (low PERs):
\text{PER} \approx \frac{k}{2}\,\mathrm{erfc}\!\left(\sqrt{d_{min}\, R\,\frac{E_b}{N_0}}\right) \qquad (3.22)
The three hypotheses, taken separately, are more or less realistic. Hypotheses 1
and 3 are somewhat pessimistic as to the quantity of codewords at the minimum
distance. As for hypothesis 2, it is slightly optimistic. The three hypotheses to-
gether are suitable for an acceptable approximation of the multiplicity, especially
since imprecision about the value of this multiplicity does not affect the quality
of the final result. Indeed, the targeted minimum distance that we wish to deter-
mine from (3.22) appears in an exponential argument, whereas the multiplicity
is a multiplying coefficient.
It is then possible to combine (3.22) with the results obtained in Section 3.3
which provide the signal to noise ratio limits. Giving Eb /N0 the limit value
beyond which using a code is not worthwhile, we can extract from (3.22) the
MHD sufficient to reach a PER at that limit value. Given, on the one hand,
that (3.22) assumes ideal (maximum likelihood) decoding and, on the other
hand, that the theoretical limit is not reached in practice, the targeted MHD
can be slightly lower than the result of this extraction.
Figure 3.6 presents some results obtained using this method.
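The extraction described above can be sketched in a few lines of Python by inverting (3.22); the Eb/N0 value of 1 dB used in the example is an assumption standing in for the finite-length limit of Section 3.3 (not a value taken from the book).

```python
import numpy as np
from scipy.special import erfcinv

def required_dmin(per_target, k, R, EbN0_dB):
    """Invert the asymptotic approximation (3.22) to get the MHD needed to reach
    per_target at the given Eb/N0 (ideal ML decoding assumed)."""
    EbN0 = 10 ** (EbN0_dB / 10)
    return erfcinv(2 * per_target / k) ** 2 / (R * EbN0)

# example of Section 3.4: k = 4000 bits, R = 1/2, PER = 1e-11
print(required_dmin(1e-11, 4000, 0.5, 1.0))   # close to the targeted MHD of about 50
```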
1 The codes being linear, distance and weight have the same meaning.
Figure 3.6 – Minimum distances required for 4-PSK modulation and a Gaussian
channel as a function of packet size, for some coding rates and P ER = 10−7 and
P ER = 10−11 .
where Es is the energy per symbol emitted and N0 the monolateral noise power
spectral density. It is however not possible to exploit (3.23) in the general case.
We require an additional hypothesis, which is then added to the three hypotheses
formulated in the previous section, and assume that NS is much lower than the
size of the interleaved codewords:
• Hypothesis 4: A symbol does not contain more than one opposite bit in
the correct codeword and in the wrong codeword.
This hypothesis allows the following probabilities to be expressed:
\Pr\{\varphi_i' - \varphi_i = \pi/4\} = 2/3; \quad \Pr\{\varphi_i' - \varphi_i = 3\pi/4\} = 1/3
which means that, two times out of three on average, the Euclidean distance between the concurrent symbols is 2\sqrt{E_s}\sin(\pi/8) and, one time out of three, it is raised to 2\sqrt{E_s}\sin(3\pi/8) (Figure 3.8).
Considering the asymptotic case, that is, putting Ns = dmin, yields:
\text{PER}_{8\text{-PSK},\,\Pi\ \text{random}} \approx k\left(\frac{2}{3}\right)^{d_{min}}\sum_{j=0}^{d_{min}}\binom{d_{min}}{j}\left(\frac{1}{2}\right)^{j+1}\mathrm{erfc}\!\left(\sqrt{\frac{E_s}{N_0}\left[j\sin^2\frac{3\pi}{8} + (d_{min} - j)\sin^2\frac{\pi}{8}\right]}\right) \qquad (3.24)
Figure 3.8 – 8-PSK constellation with Gray coding. Es and T are the energy and the
duration of a symbol, respectively.
Like for 4-PSK and 8-PSK modulations, this relation used jointly with signal to
noise ratio limits makes it possible to obtain targeted MHD values for 16-QAM
modulation (Figure 3.10).
Some observations can be made from the results obtained in Section 3.4.
For example, in the particular case of 4-PSK modulation, for a rate R = 1/2,
size k = 4000 bits and PER of 10−11 , Figure 3.6 provides a targeted MHD of
50. From the evaluation that we can make from the Gilbert-Varshamov bound
Figure 3.9 – Minimum distances required for 8-PSK modulation and a Gaussian
channel as a function of packet size, for some coding rates and P ER = 10−7 and
P ER = 10−11 .
(relation (3.9)), random codes have a minimum distance of about 1000. There
is therefore a great difference between what ideal (random) coding can offer and
what we really need.
A second aspect concerns the dependency of the required MHD upon the
modulation used, a dependency that turns out to be minimum. Thus, a code
Figure 3.10 – Minimum distances required for 16-QAM modulation and a Gaussian
channel as a function of the packet size, for some coding rates and P ER = 10−7 and
P ER = 10−11 .
Bibliography
[3.1] C. Berrou, E. Maury, and H. Gonzalez. Which minimum Hamming distance do we really need? In Proceedings of the 3rd International Symposium on Turbo Codes & Related Topics (ISTC 2003), pages 141–148, Brest, France, Sept. 2003.
[3.2] R.G. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, 1968.
Block codes
Block coding involves associating with a data block d of k symbols coming from
the information source, a block c, called the codeword, of n symbols with n ≥ k.
The difference (n − k) is the amount of redundancy introduced by the code. Knowledge of
the coding rule at reception enables errors to be detected and corrected, under
certain conditions. The ratio k/n is called the coding rate of the code.
The message symbols of the information d and of the codeword c take their
values in a finite field Fq with q elements, called a Galois field, whose main
properties are given in the appendix to this chapter. We shall see that for most
codes, the symbols are binary and take their value in the field F2 with two
elements (0 and 1). This field is the smallest Galois field.
The elementary addition and multiplication operations in field F2 are summarized in Table 4.1.
a b a + b ab
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
A block code of length n is an application g of the set Fkq towards the set
Fnq that associates a codeword c with any block of data d.
g : Fkq → Fnq
d → c = g(d)
A block code with parameters (n, k), that we denote C(n, k), is linear if the
codewords are a vector subspace of Fnq , that is, if g is a linear application. A
direct consequence of linearity is that the sum of two codewords is a codeword,
and that the null word made up of n symbols at zero is always a codeword.
We will now consider linear block codes with binary symbols. Linear block
codes with non binary symbols will be addressed later.
d = \sum_{j=0}^{k-1} d_j\, e_j \qquad (4.1)
Taking into account the fact that application g is linear, the word c associated
with d is equal to:
c = g(d) = \sum_{j=0}^{k-1} d_j\, g(e_j) \qquad (4.2)
Expressing the vector g(e_j) from a base (e'_0, \cdots, e'_l, \cdots, e'_{n-1}) of F_2^n, we obtain:
g(e_j) = \sum_{l=0}^{n-1} g_{jl}\, e'_l \qquad (4.3)
Matrix G with k rows and n columns, having its elements gjl ∈ F2 is called
a generator matrix of the code C(n, k). It associates the codeword c with the
block of data d by the matrix relation:
c = dG (4.5)
Example 4.1
Let us consider a linear block code called the parity check code denoted
C(n, k), with k = 2 and n = k + 1 = 3 (for a parity check code, the sum of the
symbols of a codeword is equal to zero). We have four codewords:
Dataword Codeword
00 000
01 011
10 101
11 110
To write a generator matrix of this code, let us consider, for example, the canon-
ical base of F22 :
e_0 = [1\ 0], \quad e_1 = [0\ 1]
and the canonical base of F_2^3:
e'_0 = [1\ 0\ 0], \quad e'_1 = [0\ 1\ 0], \quad e'_2 = [0\ 0\ 1]
We can write:
g(e_0) = [1\ 0\ 1] = 1\cdot e'_0 + 0\cdot e'_1 + 1\cdot e'_2
g(e_1) = [0\ 1\ 1] = 0\cdot e'_0 + 1\cdot e'_1 + 1\cdot e'_2
A generator matrix of the parity check code is therefore equal to:
G = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
By permuting the first two vectors of the canonical base of F_2^3, we obtain a new generator matrix of the same parity check code:
G' = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
In this example, we have just seen that the generator matrix of a block code is
not unique. By permuting the rows or the columns of a generator matrix or by
adding one or several other rows to a row, which means considering a new base
in Fn2 , it is always possible to write a generator matrix of a block code in the
following form:
G = \left[\mathbf{I}_k\ \mathbf{P}\right] = \begin{bmatrix} 1 & 0 & \cdots & 0 & p_{0,1} & \cdots & p_{0,l} & \cdots & p_{0,n-k} \\ 0 & 1 & \cdots & 0 & p_{1,1} & \cdots & p_{1,l} & \cdots & p_{1,n-k} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & p_{k-1,1} & \cdots & p_{k-1,l} & \cdots & p_{k-1,n-k} \end{bmatrix} \qquad (4.6)
ciq = dq , q = 0, 1, · · · , k − 1.
x \perp y \Leftrightarrow \langle x, y\rangle = \sum_{j=0}^{n-1} x_j\, y_j = 0
With each linear block code C(n, k), we can associate a dual linear block
code that verifies that any word of the dual code is orthogonal to any word of
the code C(n, k). The dual of code C(n, k) is therefore a vector subspace of F_2^n made up of 2^{n-k} codewords of n symbols. This vector subspace is the orthogonal of the vector subspace made up of 2^k words of the code C(n, k). It results that
any word c of code C(n, k) is orthogonal to the rows of the generator matrix H
of its dual code
cHT = 0 (4.8)
where T indicates the transposition.
A vector y belonging to Fn2 is therefore a codeword of code C(n, k) if, and
only if, it is orthogonal to the codewords of its dual code, that is, if:
yHT = 0
The decoder of a code C(n, k) can use this property to verify that the word
received is a codeword and thus to detect the presence of errors. That is why
matrix H is called the parity check matrix of code C(n, k).
It is easy to see that the matrices G and H are orthogonal (GHT = 0).
Hence, when the code is systematic and its generator matrix is of the form G = [I_k P], we have:
H = \left[\mathbf{P}^T\ \mathbf{I}_{n-k}\right]
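A minimal Python sketch of these relations is given below, using the parity check code C(3, 2) of Example 4.1; the function names are illustrative, and all arithmetic is carried out modulo 2.

```python
import numpy as np

def systematic_code(P):
    """From the parity sub-matrix P, build G = [I_k P] and H = [P^T I_{n-k}]."""
    P = np.asarray(P) % 2
    k, r = P.shape
    G = np.hstack([np.eye(k, dtype=int), P])
    H = np.hstack([P.T, np.eye(r, dtype=int)])
    return G, H

def encode(d, G):
    """Codeword c = dG, all arithmetic in F2."""
    return np.asarray(d).dot(G) % 2

# parity check code C(3, 2): one redundancy bit equal to the sum of the data bits
G, H = systematic_code([[1], [1]])
for d in ([0, 0], [0, 1], [1, 0], [1, 1]):
    c = encode(d, G)
    print(d, c, c.dot(H.T) % 2)   # the syndrome cH^T is always 0 for a codeword
```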
Example 4.2
Taking into account the fact that the distance between two codewords is equal
to the weight of their sum, the minimum distance of a block code is also equal
to the minimum weight of its non-null codewords.
When the number of codewords is very high, searching for the minimum
distance can be laborious. A first solution to get round this difficulty is to
determine the minimum distance from the parity check matrix.
We have seen that dmin is equal to the minimum Hamming weight of the
non-null codewords. Let us consider a codeword of weight dmin . The orthog-
onality property cHT = 0 implies that the sum of dmin columns of the parity
check matrix is null. Thus dmin corresponds to the minimum number of linearly
dependent columns of the parity check matrix.
A second solution to evaluate dmin is to use higher bounds of the minimum
distance. A first bound can be expressed as a function of the k and n parameters
of the code. For a linear block code whose generator matrix is written in the
systematic form G = [Ik P], the (n − k) columns of the matrix In−k of the
parity check matrix (H = PT In−k ) being linearly independent, any column
of PT can be expressed as at most a combination of these (n − k) columns. The
minimum distance is therefore upper bounded by:
dmin ≤ n − k + 1 (4.12)
Another bound of the minimum distance, called the Plotkin bound, can be
obtained by noting that the minimum distance is necessarily lower than the
average weight of the non-null codewords. If we consider the set of codewords,
it is easy to see that there are as many symbols at 0 as symbols at 1. Thus
the sum of the weights of all the codewords is equal to n\,2^{k-1}. The number of non-null codewords being 2^k − 1, the minimum distance can be upper bounded by:
d_{min} \leq \frac{n\,2^{k-1}}{2^k - 1} \qquad (4.13)
A systematic block code C(n, k) with minimum distance dmin can be short-
ened by setting s < k data symbols to zero. We thus obtain a systematic linear
code C(n − s, k − s). Of course the s symbols set to zero are not transmitted,
but they are retained in order to calculate the (n − k) redundancy symbols. The
minimum distance of a shortened code is always higher than or equal to the
distance of code C(n, k).
c = \left[d_0\ d_1\ \cdots\ d_{k-2}\ d_{k-1}\ c_{n-1}\right] \quad \text{with} \quad c_{n-1} = \sum_{j=0}^{k-1} d_j
Figure 4.1 – Product code resulting from the serial concatenation of two systematic
block codes.
where d = d0 d1 · · · dk−1 represents the dataword. The minimum
distance of this code is 2.
Example 4.3
Repetition code
For this code with parameters k = 1 and n = 2m + 1, each bit coming from the
information source is repeated an odd number of times. The minimum distance
of this code is 2m + 1. The repetition code C(2m + 1, 1) is the dual code of the
parity check code C(2m + 1, 2m).
Example 4.4
The generator matrix and the parity check matrix of this code, for k = 1,
n = 5, can be the following:
G = \left[1\ 1\ 1\ 1\ 1\right] = \left[\mathbf{I}_1\ \mathbf{P}\right]
H = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix} = \left[\mathbf{P}^T\ \mathbf{I}_4\right]
Hamming code
For a Hamming code, the columns of the parity check matrix are the binary
representations of the numbers from 1 to n. Each column being made up of
m = (n − k) binary symbols, the parameters of the Hamming code are therefore:
n = 2^m - 1 \qquad k = 2^m - m - 1
The columns of the parity check matrix being made up of all the possible com-
binations of (n − k) binary symbols except (00 · · · 0), the sum of two columns is
equal to one column. The minimum number of linearly dependent columns is
3. The minimum distance of a Hamming code is therefore equal to 3, whatever
the value of parameters n and k.
Example 4.5
Hadamard code
The codewords of a Hadamard code are made up of the rows of a Hadamard
matrix and of its complementary matrix. A Hadamard matrix has n rows and
n columns (n even) whose elements are 1s and 0s. Each row differs from the
other rows at n/2 positions. The first row of the matrix is made up only of 0,
the other rows having n/2 0 and n/2 1.
For n = 2, the Hadamard matrix is of the form:
M_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
Example 4.6
The rows of M4 and its complementary matrix M̄4 are the codewords of a Hadamard code with parameters n = 4, k = 3 and with minimum distance equal to 2. In this particular case, the Hadamard code is a parity check code.
More generally, the rows of matrices Mn and M̄n are the codewords of a Hadamard code with parameters n = 2^m, k = m + 1 and with minimum distance d_{min} = 2^{m-1}.
Reed-Muller codes
A Reed-Muller code (RM) of order r and with parameter m, denoted RMr,m ,
has codewords of length n = 2m and the datawords are made up of k symbols
with:
k = 1 + \binom{m}{1} + \cdots + \binom{m}{r}, \quad \text{with} \quad \binom{N}{q} = \frac{N!}{q!\,(N-q)!}
where r < m. The minimum distance of an RM code is dmin = 2m−r .
The generator matrix of an RM code of order r is built from the generator matrix
of an RM code of order r − 1 and if G(r,m) represents the generator matrix of
the Reed-Muller code of order r and with parameter m, it can be obtained from
G(r−1,m) by the relation:
G^{(r,m)} = \begin{bmatrix} G^{(r-1,m)} \\ Q_r \end{bmatrix}
where Q_r is a matrix with dimensions \binom{m}{r} \times n.
By construction, G(0,m) is a row vector of length n whose elements are equal
to 1. The matrix G(1,m) is obtained by writing on each column the binary
representation of the index of the columns (from 0 to n − 1). For example, for
m = 4, the matrix G(1,m) is given by:
G^{(1,4)} = \begin{bmatrix} 0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1 \\ 0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1 \end{bmatrix}
Matrix Qr is obtained simply by considering all the combinations of r rows of
G(1,m) and by obtaining the product of these vectors, component by component.
The result of this multiplication constitutes a row of Qr . For example, for
the combination having the rows of G^{(1,m)} with indices i_1, i_2, \ldots, i_r, the j-th coefficient of the row thus obtained is equal to G^{(1,m)}_{i_1,j}\, G^{(1,m)}_{i_2,j} \cdots G^{(1,m)}_{i_r,j}, the
multiplication being carried out in the field F2 . For example, for r = 2, we
obtain:
Q_2 = \begin{bmatrix} 0&0&0&0&0&0&0&0&0&0&0&0&1&1&1&1 \\ 0&0&0&0&0&0&0&0&0&0&1&1&0&0&1&1 \\ 0&0&0&0&0&0&0&0&0&1&0&1&0&1&0&1 \\ 0&0&0&0&0&0&1&1&0&0&0&0&0&0&1&1 \\ 0&0&0&0&0&1&0&1&0&0&0&0&0&1&0&1 \\ 0&0&0&1&0&0&0&1&0&0&0&1&0&0&0&1 \end{bmatrix}
We can show that the code RMm−r−1,m is the dual code of the code RMr,m ,
that is, the generator matrix of code RMm−r−1,m is the parity check matrix of
code RMr,m . For some values of r and m, the generator matrix of code RMr,m
is also its parity check matrix. We then say that code RMr,m is self dual. Code
RM1,3 , for example, is a self dual code.
A linear block code C(n, k) is cyclic if, for any codeword c = (c_0\ c_1\ \cdots\ c_{n-1}), the word c^{(1)} = (c_{n-1}\ c_0\ \cdots\ c_{n-2}), obtained by circular shift to the right of one symbol of c, is also a codeword. This definition of cyclic codes means that any circular shift to the right of j symbols of a codeword gives another codeword.
For cyclic codes, we use a polynomial representation of the codewords and of the
datawords. Thus, with codeword c we associate the polynomial c(x) of degree
n − 1.
c(x) = c0 + c1 x + · · · + cj xj + · · · + cn−1 xn−1
and with dataword d the polynomial d(x) of degree k − 1.
where cj (x) is also a codeword obtained by j circular shifts to the right of the
symbols of c(x).
xj g(x) j = k − 1, k − 2, . . . , 1, 0.
Let d(x) be the polynomial representation of any dataword. The k codewords
generated by the polynomials xj g(x) have the expression:
cj (x) = xj g(x)d(x) j = k − 1, k − 2, · · · , 1, 0
and the k rows of the matrix G have for their elements the binary coefficients
of the monomials of cj (x).
Example 4.7
c_3(x) = x^3 + x^5 + x^6
c_2(x) = x^2 + x^4 + x^5
c_1(x) = x + x^3 + x^4
c_0(x) = 1 + x^2 + x^3
A generator matrix of the code C(7, 4) is equal to:
G = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 \end{bmatrix}
Taking into account the fact that c(x) is a multiple of the generator poly-
nomial and that the addition and the subtraction can be merged in F2 , we can
then write:
xn−k d(x) = q(x)g(x) + v(x)
v(x) is therefore the remainder of the division of xn−k d(x) by the generator
polynomial g(x). The codeword associated with dataword d(x) is equal to
xn−k d(x) increased by the remainder of the division of xn−k d(x) by the gener-
ator polynomial.
Example 4.8
c(x) = 1 + x3 + x5 + x6
d(x) = 1, x, x2 , x3 .
We obtain:
d(x) c(x)
1 1 + x + x3
x x + x2 + x4
x2 1 + x + x2 + x5
x3 1 + x2 + x6
and thus the generator matrix in a systematic form:
G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}
Implementation of an encoder
We have just seen that the encoder must carry out the division of xn−k d(x)
by the generator polynomial g(x) then add the remainder v(x) of this division
to xn−k d(x). This operation can be done using only shift registers and adders
in field F2 . As the most difficult operation to carry out is the division of
xn−k d(x) by g(x), let us first examine the schematic diagram of a divisor by
g(x) shown in Figure 4.2. The circuit divisor is realized from a shift register
with (n − k) memories denoted Ri and the same number of adders. The shift
register is initialized to zero and the k coefficients of the polynomial xn−k d(x)
are introduced sequentially into the circuit divisor. After k clock pulses, we
can verify that the result of the division is available at the output of the cir-
cuit divisor, as well as the remainder v(x) which is in the shift register memories.
The schematic diagram of the encoder shown in Figure 4.3, uses the circuit
divisor of Figure 4.2. The multiplication of d(x) by xn−k , corresponding to a
simple shift, is realized by introducing polynomial d(x) at the output of the shift
register of the divisor.
The k data coming from the information source are introduced sequentially
into the encoder (switch I in position 1) that carries out the division of xn−k d(x)
by g(x). Simultaneously, the k data coming from the information source are also
transmitted. Once this operation is finished, the remainder v(x) of the division
is in the (n − k) shift register memories. Switch I then moves to position 2, and
the (n − k) redundancy symbols are sent to the output of the encoder.
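The divider circuit can be simulated in a few lines. The sketch below assumes the usual shift-register implementation in which the input is added at the high-degree end of the register (equivalent to the premultiplication by x^{n−k}), applied to the code of Example 4.8; it is an illustration, not the exact circuit of Figures 4.2 and 4.3.

    def lfsr_remainder(d, g):
        m = len(g) - 1                     # (n - k) memories; g = [g0, g1, ..., gm]
        R = [0] * m                        # shift register initialized to zero
        for b in reversed(d):              # switch I in position 1: feed d_{k-1}, ..., d_0
            fb = b ^ R[m - 1]              # feedback = input + content of the last memory
            for j in range(m - 1, 0, -1):
                R[j] = R[j - 1] ^ (g[j] & fb)
            R[0] = g[0] & fb
        return R                           # remainder v(x), lowest degree first

    g = [1, 1, 0, 1]                       # g(x) = 1 + x + x^3
    d = [1, 0, 1, 1]                       # d(x) = 1 + x^2 + x^3
    v = lfsr_remainder(d, g)
    print(v + d)                           # switch I in position 2: c = [1, 0, 0, 1, 0, 1, 1],
                                           # that is c(x) = 1 + x^3 + x^5 + x^6, as in Example 4.8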
BCH codes
Bose-Chaudhuri-Hocquenghem codes, called BCH codes, enable cyclic codes to
be built systematically correcting at least t errors in a block of n symbols, that
is, codes whose minimum distance dmin is at least equal to 2t + 1.
To build a BCH code, we set t or equivalently d, called the constructed
distance of the code and we determine its generator polynomial g(x). The code
obtained has a minimum distance dmin that is always higher than or equal to
the constructed distance.
The generator polynomial of a BCH code correcting t errors is given by:
g(x) = S.C.M. [ m_α(x), m_{α^2}(x), · · · , m_{α^{2t}}(x) ]
where m_{α^j}(x) is the minimal polynomial with coefficients in field F2 associated with α^j, and S.C.M. is the Smallest Common Multiple.
It is shown in the appendix that a polynomial with coefficients in F2 having
α^j as a root also has α^{2j} as a root. Thus, the minimal polynomials
m_{α^j}(x) and m_{α^{2j}}(x) have the same roots. This remark enables us to
simplify the writing of the generator polynomial g(x):
g(x) = S.C.M. [ m_α(x), m_{α^3}(x), · · · , m_{α^{2t−1}}(x) ]
The parameters of a primitive BCH code with binary symbols are then:
n = 2^m − 1;   k ≥ 2^m − 1 − mt;   dmin ≥ 2t + 1
Example 4.9
Let us construct the BCH code with binary symbols correcting t = 2 errors in a block of n = 15 symbols (m = 4), whose generator polynomial is g(x) = S.C.M.[m_α(x), m_{α^3}(x)]. The minimal polynomial m_α(x) has as roots α, α^2, α^4 and α^8:
m_α(x) = (x + α)(x + α^2)(x + α^4)(x + α^8)
Using the binary representations of the elements of field F16, we can show
that α^2 + α = α^5 and that α^4 + α^8 = α^5 (we recall that the binary additions
are done modulo 2 in the Galois field). We then continue the development
of m_α(x) and finally we have:
m_α(x) = x^4 + x + 1
For the computation of m_{α^3}(x), the roots to take into account are
α^3, α^6, α^12, α^24 = α^9 (α^15 = 1); the other powers of α^3 (α^48, α^96, · · · )
give the previous roots again. The minimal polynomial m_{α^3}(x) is therefore
equal to:
m_{α^3}(x) = (x + α^3)(x + α^6)(x + α^12)(x + α^9)
m_{α^3}(x) = x^4 + x^3 + x^2 + x + 1
The S.C.M. of polynomials m_α(x) and m_{α^3}(x) is obviously equal to the
product of these two polynomials since they are irreducible, and thus the
generator polynomial is equal to:
g(x) = x^8 + x^7 + x^6 + x^4 + 1
m = 4;   n = 15;   n − k = 8;   k = 7;   t = 2
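This construction can be checked with a short Python sketch that builds F16 from the primitive polynomial x^4 + x + 1, expands the minimal polynomials from their conjugate roots and forms their product; it is an illustration (names and layout are not taken from the book).

    def gf_mul(a, b, prim=0b10011):
        """Multiply two elements of F16 (carry-less product reduced modulo x^4 + x + 1)."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x10:
                a ^= prim
            b >>= 1
        return r

    ALPHA = [1]                                  # ALPHA[i] = alpha^i
    for _ in range(14):
        ALPHA.append(gf_mul(ALPHA[-1], 0b0010))

    def mul_by_x_plus_r(poly, r):
        """Multiply poly(x) (coefficients in F16, lowest degree first) by (x + r)."""
        times_x = [0] + poly
        times_r = [gf_mul(c, r) for c in poly] + [0]
        return [a ^ b for a, b in zip(times_x, times_r)]

    def minimal_poly(j):
        """Minimal polynomial of alpha^j: product of (x + alpha^(j*2^i)) over its conjugates."""
        roots, e = [], j % 15
        while ALPHA[e] not in roots:
            roots.append(ALPHA[e])
            e = (2 * e) % 15
        poly = [1]
        for r in roots:
            poly = mul_by_x_plus_r(poly, r)
        return poly

    def poly_mul(p, q):
        out = [0] * (len(p) + len(q) - 1)
        for i, a in enumerate(p):
            for k, b in enumerate(q):
                out[i + k] ^= gf_mul(a, b)
        return out

    m1 = minimal_poly(1)               # [1, 1, 0, 0, 1]      -> x^4 + x + 1
    m3 = minimal_poly(3)               # [1, 1, 1, 1, 1]      -> x^4 + x^3 + x^2 + x + 1
    print(m1, m3, poly_mul(m1, m3))    # g(x) = [1,0,0,0,1,0,1,1,1] -> 1 + x^4 + x^6 + x^7 + x^8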
The numerical values of the parameters (n, k, t) of the main BCH codes and
the associated generator polynomials have been put in table form and can be
found in [4.2]. As an example, we give in Table 4.2 the parameters and
the generator polynomials, expressed in octal, of some BCH codes with
error correction capability t = 1 (Hamming codes).
Note: g(x) = 13 in octal gives 1011 in binary, that is, g(x) = x^3 + x + 1.
Table 4.2 – Parameters and generator polynomials (in octal) of some BCH codes with t = 1 (Hamming codes).

    n       k       t    g(x)
    7       4       1    13
    15      11      1    23
    31      26      1    45
    63      57      1    103
    127     120     1    211
    255     247     1    435
    511     502     1    1021
    1023    1013    1    2011
    2047    2036    1    4005
    4095    4083    1    10123
n = 2^m − 1;   (n − k) = m + 1;   k = 2^m − m − 2
The most widely-used CRC codes have the parameters m = 12, 16, 32 and their generator polynomials are given, in octal, in Table 4.3.

Table 4.3 – Generator polynomials (in octal) of the most common CRC codes.

    Code          m     g(x)
    CRC-12        12    14017
    CRC-16        16    300005
    CRC-CCITT     16    210041
    CRC-32        32    40460216667

Note: g(x) = 14017 in octal corresponds to 1 100 000 001 111 in binary, that is:
g(x) = x^12 + x^11 + x^3 + x^2 + x + 1
Example 4.10
For a non-primitive BCH code of length n = 21 (the non-primitive element β is of order 21), the minimal polynomials involved are:
m_β(x): roots β, β^2, β^4, β^8, β^16, β^32 = β^11
m_{β^3}(x): roots β^3, β^6, β^12
and the generator polynomial is:
g(x) = x^9 + x^8 + x^7 + x^5 + x^4 + x + 1
n = 21;   (n − k) = 9;   k = 12
• Golay code
Among non-primitive BCH codes, the most well-known is certainly the
Golay code constructed over a Galois field Fq with m = 11, q = 2048.
Noting that 2m − 1 = 2047 = 23 × 89, the non-primitive element used
to build a Golay code is β = α89 . The computation of the generator
polynomial of this code constructed on field F2048 leads to the following
expression:
g(x) = x11 + x9 + x7 + x6 + x5 + x + 1
We can show that the minimum distance dmin of a Golay code is 7 and
thus its correction capability is 3 errors in a block of 23 binary symbols
(β^23 = α^2047 = 1). The parameters of a Golay code are therefore:
n = 23;   k = 12;   t = 3
Note that the reciprocal polynomial of g(x), equal to g̃(x) = x^11 g(x^{−1}),
also enables a Golay code to be produced.
Figure 4.4 – Schematic diagram of the encoder for the RS code (15,11).
Syndrome s is null if, and only if, r is a codeword. A non-null syndrome implies
the presence of errors. However, it should be noted that a null syndrome does
not necessarily mean absence of errors since r can belong to the set of codewords
even though it is different from c. For this to occur, it suffices for word e to be a
codeword. Indeed, for a linear block code, the sum of two codewords is another
codeword.
Finally, let us note that for any linear block code, there are configurations
of non-detectable errors.
Detection capability
Let cj be the transmitted codeword and cl its nearest neighbour. We have the
following inequality:
dH(cj, cl) ≥ dmin
Introducing the received word r, we can write:
dH(cj, r) + dH(r, cl) ≥ dH(cj, cl) ≥ dmin
and thus all the errors can be detected as long as the Hamming distance between r and
cl is higher than or equal to 1, that is, as long as r is not merged with another codeword cl; this is guaranteed whenever dH(cj, r) ≤ dmin − 1.
The detection capability of a C(n, k) code with minimum distance dmin is
therefore equal to dmin − 1.
The number of non-detectable error configurations is equal to the number of non-null codewords:
\sum_{j=dmin}^{n} A_j = 2^k − 1
where A_j is the number of codewords of weight j.
The detection of errors therefore remains efficient whatever the error proba-
bility on the transmission channel if the number of redundancy symbols (n − k)
is large enough. The detection of errors is therefore not very sensitive to error
statistics.
When erroneous symbols are detected, the receiver generally asks the source
to send them again. To transmit this retransmission request, it is necessary
to have a link from the receiver back to the source, called a return channel. The data rate on the
return channel being low (a priori, requests for retransmission are short and few
in number), we can always arrange it so that the error probability on this channel
is much lower than the error probability on the transmission channel. Thus, the
performance of a transmission system using error detection and repetition does
not greatly depend on the return channel.
In case of error detection, the emission of the source can be interrupted
to enable the retransmission of the corrupted information. The data rate is
therefore not constant, which can present problems in some cases.
Hard decoding
• Maximum a posteriori likelihood decoding
For hard decoding the received word r is of the form:
r=c+e
Again taking the example of a binary symmetric channel with error probability
p and denoting dH (r, ĉ) the Hamming distance between r and ĉ, the decision
rule is:
ĉ = ci  ⇔  p^{dH(ci, r)} (1 − p)^{n − dH(ci, r)} > p^{dH(cj, r)} (1 − p)^{n − dH(cj, r)},   ∀ cj ≠ ci
Taking the logarithm of the two parts of the above inequality and considering
p < 0.5, the decision rule of the maximum a posteriori likelihood can finally be
written:
ĉ = ci  ⇔  dH(r, ci) ≤ dH(r, cj),   ∀ cj ≠ ci ∈ C(n, k)
If two or several codewords are the same distance from r, the codeword ĉ is
chosen arbitrarily among the codewords equidistant from r.
This decoding procedure which is optimal, that is, which minimizes the
probability of erroneous decoding, becomes difficult to implement when the
number of codewords becomes large, which is often the case for the widely-used
block codes.
Example 4.12
Let us consider a Hamming code C(7, 4). This code has 16 codewords but only 2^{n−k} = 8 configurations for the syndrome, as indicated in Table 4.4.
Let us assume that the codeword transmitted is c = [0101101] and that the
received word r = [0111101] has an error in position 3. The syndrome is then
equal to s = [101] and, according to the table, e = [0010000]. The decoded
codeword is ĉ = r + e = [0101101] and the error is corrected.
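The same principle can be sketched in a few lines of Python. The parity-check matrix below is the one associated with the systematic generator matrix of Example 4.8 (H = [I3 | P^T]); it is not the matrix of Table 4.4, so the syndrome values differ from those quoted just above, but the decoding mechanism is identical.

    H = [[1, 0, 0, 1, 0, 1, 1],
         [0, 1, 0, 1, 1, 1, 0],
         [0, 0, 1, 0, 1, 1, 1]]

    def syndrome(word):
        return tuple(sum(h & v for h, v in zip(row, word)) % 2 for row in H)

    # Syndrome table: each single-error pattern is associated with its syndrome
    table = {syndrome([0] * 7): [0] * 7}
    for pos in range(7):
        e = [0] * 7
        e[pos] = 1
        table[syndrome(e)] = e

    def decode(r):
        e = table[syndrome(r)]                       # most likely error pattern
        return [(ri + ei) % 2 for ri, ei in zip(r, e)]

    c = [1, 1, 0, 1, 0, 0, 0]      # a codeword (first row of G in Example 4.8)
    r = c[:]; r[2] ^= 1            # one transmission error in position 2
    print(decode(r) == c)          # True: the error is corrected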
If the number of configurations of the syndrome is still too high to apply
this decoding procedure, we use decoding algorithms specific to certain classes
of codes but that, unfortunately, do not always exploit the whole correction
capability of the code. These algorithms will be presented below.
• Correction power
Let cj be the codeword transmitted and cl its nearest neighbour. We have
the following inequality:
dH(cj, cl) ≥ dmin
Introducing the received word r and assuming that the minimum distance
dmin is equal to 2t + 1 (t integer), we can write:
dH(cj, r) + dH(r, cl) ≥ dH(cj, cl) ≥ 2t + 1
We see that if the number of errors is lower than or equal to t, cj is the most
likely codeword since it is nearer to r than to cl and thus the t errors can be
corrected. If the minimum distance is now (2t + 2), using the same reasoning,
we arrive at the same error correction capability. In conclusion, the correction
capability of a linear block code with minimum distance dmin with hard decoding
is equal to:
t = ⌊ (dmin − 1) / 2 ⌋    (4.23)
where ⌊x⌋ is the whole part of x rounded down (for example, ⌊2.5⌋ = 2).
We can also determine the binary error probability Pe,bit on the information data
after decoding. In presence of erroneous decoding, the maximum a posteriori
likelihood decoder adds at most t errors by choosing the codeword with the
minimum distance from the received word. The error probability is therefore
bounded by:
Pe,bit < (1/n) \sum_{j=t+1}^{n} (j + t) \binom{n}{j} p^j (1 − p)^{n−j}    (4.25)
Soft decoding
Considering a channel with additive white Gaussian noise and binary phase mod-
ulation transmission (2-PSK or 4-PSK), the components rj , j = 0, 1, · · · , n− 1
of the received word r have the form:
rj = \sqrt{Es} c̃j + bj ,   with c̃j = 2cj − 1
where cj = 0, 1 is the symbol in position j of codeword c, c̃j is the binary
symbol associated with cj , Es is the energy received per transmitted symbol
and bj is white Gaussian noise, with zero mean and variance equal to σb2 .
Figure 4.5 – Performance of the algebraic decoding of the (15,7) BCH code. 4-PSK
transmission on a Gaussian channel.
Using the Bayes’ rule and assuming all the codewords equiprobable, the above
inequality can also be written:
ĉ = ci  if  p(r | ci) > p(r | cj),   ∀ cj ≠ ci ∈ C(n, k)    (4.26)
where p(r |c ) is the probability density function of observation r conditionally
to codeword c.
For a Gaussian channel, the probability density function p(r | c) is equal to:
p(r | c) = ( 1 / (\sqrt{2π} σb) )^n  exp( − (1 / (2σb^2)) \sum_{j=0}^{n−1} ( rj − \sqrt{Es} c̃j )^2 )
The decoded codeword is the one that maximizes the scalar product ⟨r, c̃⟩. We
could also show that the decoded codeword is the one that minimizes the square
of the Euclidean distance ‖ r − \sqrt{Es} c̃ ‖^2.
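As an illustration, here is a brute-force Python sketch of this soft decision rule (maximization of the scalar product over all 2^k codewords), using the systematic generator matrix of Example 4.8; for codes with many codewords this exhaustive search is of course impractical, which is why algorithms such as the Chase algorithm below are used.

    import itertools

    G = [[1, 1, 0, 1, 0, 0, 0],
         [0, 1, 1, 0, 1, 0, 0],
         [1, 1, 1, 0, 0, 1, 0],
         [1, 0, 1, 0, 0, 0, 1]]

    def encode(d):
        return [sum(d[i] & G[i][j] for i in range(4)) % 2 for j in range(7)]

    def ml_decode_soft(r):
        best, best_metric = None, float("-inf")
        for d in itertools.product([0, 1], repeat=4):
            c = encode(d)
            c_tilde = [2 * bit - 1 for bit in c]                  # antipodal mapping
            metric = sum(rj * cj for rj, cj in zip(r, c_tilde))   # scalar product <r, c~>
            if metric > best_metric:
                best, best_metric = c, metric
        return best

    # Noisy observation of the codeword [1, 1, 0, 1, 0, 0, 0] (one unreliable position):
    print(ml_decode_soft([0.9, 0.8, -1.1, 1.2, 0.1, -0.7, -0.9]))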
• Chase algorithm
The Chase algorithm is a sub-optimal decoding procedure that uses the max-
imum a posteriori likelihood criterion but considers a very reduced subset of
codewords. To determine this subset of codewords, the Chase algorithm works
in the following way.
sgn(x) = 1  if x ≥ 0
       = 0  if x < 0
zi = z0 + ei
Example 4.13
\sum_{j=0}^{n−1} rj c_{0,j}  <  \sum_{j=0}^{n−1} rj c_{l,j} ,   ∀ cl ≠ c0 ∈ C(n, k)
The code being linear, we can, without loss of generality, assume that the code-
word transmitted is the null word, that is, c0,j = 0 for j = 0, 1, · · · , n − 1.
The probability of erroneous decoding Pe,word of a codeword is then equal
to:
Pe,word = Pr( \sum_{j=0}^{n−1} rj c_{1,j} > 0  or · · · or  \sum_{j=0}^{n−1} rj c_{l,j} > 0  or · · · )
Using the union bound, this probability can be upper bounded by:
Pe,word ≤ (1/2) \sum_{j=2}^{2^k} erfc( \sqrt{ wj Es / N0 } )
where wj is the Hamming weight of codeword cj.
For the code (32,26) the missing Aw quantities are obtained from the relation
Aw = An−w for 0 ≤ w ≤ n/2, n/2 even.
The Aw quantities for non-extended Hamming codes can be deduced from
those of extended codes by resolving the following system of equations:
(n + 1) A_{w−1} = w A_w^{extended}
w A_w = (n + 1 − w) A_{w−1}
where n is the length of the words of the non-extended code.
For the Hamming code (7,4), for example, the Aw quantities are:
8 A3 = 4 A4^{extended}   →   A3 = 7
4 A4 = 4 A3              →   A4 = 7
8 A7 = 8 A8^{extended}   →   A7 = 1
For Golay and extended Golay codes, the Aw quantities are given in Table 4.6.
With a high signal to noise ratio, error probability Pe,word is well approximated by the first term of the series:
Pe,word ≈ (1/2) A_{dmin} erfc( \sqrt{ R dmin Eb / N0 } )    if Eb/N0 >> 1    (4.30)
The same goes for error probability Pe,bit on the information symbols:
Pe,bit ≈ (dmin / n) Pe,word    if Eb/N0 >> 1    (4.31)
In the absence of coding, the error probability on the binary symbols is equal
to:
p = (1/2) erfc( \sqrt{ Eb / N0 } )
As seen in Section 1.5, comparing the two expressions of the binary error proba-
bility with and without coding, we observe that the signal to noise ratio Eb /N0
is multiplied by Rdmin in the presence of coding. If this multiplying coefficient is
higher than 1, the coding acts as an amplifier of the signal to noise ratio whose
asymptotic gain is approximated by:
Ga = 10 log( R dmin )   (dB)
To illustrate these bounds, let us again take the example of the (15,7) BCH
code transmitted on a Gaussian channel with 4-PSK modulation. In Figure 4.6,
we show the evolution of the binary error probability and word error probability
obtained by simulation from the sub-optimal Chase algorithm (4 non-reliable
positions). We also show the first two terms of the sums appearing in the
bounds given by (4.28) and (4.29). As a reference, we have also plotted the
binary error probability curve of a 4-PSK modulation without coding.
Figure 4.6 – Performance of the soft input decoding of the (15,7) BCH code. 4-PSK
transmission on a Gaussian channel.
The generator polynomial of an RS or BCH code correcting t errors has α, α^2, · · · , α^{2t} among its roots, and the
codewords are multiples of the generator polynomial. Thus, for any codeword,
we can write:
c(α^i) = 0;   ∀ i = 1, 2, · · · , 2t
Decoding RS codes and binary BCH codes can be performed from a vector with
2t components S = [S1 · · · Sj · · · S2t ], called a syndrome.
Sj = r(α^j) = e(α^j),   j = 1, 2, · · · , 2t    (4.32)
When the components of vector S are all null, there are no errors or, at least, no
detectable errors. When some components of the vector S are non-null, errors
are present that, in certain conditions, can be corrected.
In the presence of t transmission errors, the error polynomial e(x) is of the
form:
e(x) = e_{n1} x^{n1} + e_{n2} x^{n2} + · · · + e_{nt} x^{nt}
where the e_{nl} are non-null coefficients taking their value in the field Fq.
The components Sj of syndrome S are equal to:
Sj = e_{n1} (α^j)^{n1} + · · · + e_{nl} (α^j)^{nl} + · · · + e_{nt} (α^j)^{nt}
Putting Zl = α^{nl} and, to simplify the notation, e_{nl} = el, the component Sj
of the syndrome is again equal to:
Sj = e1 Z1^j + · · · + el Zl^j + · · · + et Zt^j    (4.33)
To determine the position of the transmission errors it is therefore sufficient
to know the value of the quantities Zl; l = 1, 2, · · · , t, then, in order to correct the
errors, to evaluate the coefficients el; l = 1, 2, · · · , t.
The main difficulty in decoding RS codes or binary BCH codes is determining
the position of the errors. Two methods are mainly used to decode RS codes
or binary BCH codes: Peterson's direct method and the iterative method using
the Berlekamp-Massey algorithm or Euclid's algorithm.
The components of the syndrome satisfy:
Sj = \sum_{i=1}^{t} ei Zi^j ,   j = 1, 2, · · · , 2t
Let us introduce the error locator polynomial σd(x), of degree t, whose roots are the quantities Zl:
σd(Zl) = Zl^t + \sum_{j=1}^{t} σj Zl^{t−j} = 0,   l = 1, 2, · · · , t    (4.34)
Multiplying the two parts of this expression by the same term el Zl^q, we
obtain:
el Zl^{t+q} + \sum_{j=1}^{t} σj el Zl^{t+q−j} = 0,   l = 1, 2, · · · , t    (4.35)
Summing relations (4.35) for l from 1 to t and taking into account the definition of component Sj of syndrome S, we can write:
S_{t+q} + \sum_{j=1}^{t} σj S_{t+q−j} = 0    (4.36)
For an RS code correcting a single error (t = 1), relation (4.36) with q = 1 gives:
S2 + σ1 S1 = 0   →   σ1 = S2 / S1    (4.37)
For an RS code correcting two errors (t = 2), relation (4.36) with q = 1, 2 leads to the system:
σ1 S2 + σ2 S1 = S3
σ1 S3 + σ2 S2 = S4
whose determinant is Δ2 = S2^2 + S1 S3.
Finally, for an RS code correcting three errors (t = 3), the relation (4.36) with
t = 3 and q = 1, 2, 3 leads to the following system of three equations:
σ1 S3 + σ2 S2 + σ3 S1 = S4
σ1 S4 + σ2 S3 + σ3 S2 = S5
σ1 S5 + σ2 S4 + σ3 S3 = S6
• Case (b)
ei = (1/Δ) [ S1 (Zk^2 Zp^3 + Zk^3 Zp^2) + S2 (Zk^3 Zp + Zk Zp^3) + S3 (Zk^2 Zp + Zk Zp^2) ],
   k ≠ p ≠ i,   (i, k, p) ∈ {1, 2, 3}^3
with:
Δ = \sum Z1^{i1} Z2^{i2} Z3^{i3}, the sum being taken over 1 ≤ i1, i2, i3 ≤ 3 with i1 + i2 + i3 = 6 and i1, i2, i3 all different.
• Case (c)
ei = (S1 Zp + S2) / ( Zi (Z1 + Z2) ),   p ≠ i,   (i, p) ∈ {1, 2}^2
• Case (d)
e1 = S1^2 / S2
6. Correct the errors: ĉ(x) = r(x) + e(x)
Example 4.14
m = 4;   q = 16;   n = 15;   n − k = 6
Let us assume, for example, that the transmitted codeword is c(x) = 0 and that
the received word has two errors.
r(x) = α^7 x^3 + α^3 x^6
Δ3 = 0,   Δ2 = α^8
σd(x) = x^2 + α^2 x + α^9
The transmitted codeword is the null word; the two errors have therefore
been corrected.
For a BCH code with binary symbols correcting up to t = 3 errors, the coefficients of the error locator polynomial are:
σ1 = S1
σ2 = (S1^2 S3 + S5) / (S1^3 + S3)    (4.40)
σ3 = (S1^3 + S3) + S1 (S1^2 S3 + S5) / (S1^3 + S3)
For a BCH code with binary symbols correcting up to t = 2 errors, also taking
into account the previous remark and using the expressions of the two coefficients
σj of the error locator polynomial, we obtain:
σ1 = S1
σ2 = (S3 + S1^3) / S1    (4.41)
Example 4.15
g(x) = x8 + x7 + x6 + x4 + 1
Let us assume that the transmitted codeword is c(x) = 0 and that the received
word r(x) has two errors.
r(x) = x^8 + x^3
There are three steps to the decoding: calculate syndrome S, determine the
coefficients σl of the error locator polynomial and search for its roots in field
F16 .
3. Search for the roots of the error locator polynomial in field F16. By trying
all the elements of field F16, we can verify that the roots of the error
locator polynomial are α^3 and α^8.
The transmission errors therefore concern the terms x^8 and x^3 of received word r(x).
The transmitted codeword is thus c(x) = 0 and the two errors have
been corrected.
The reader can verify that in the presence of a single error, r(x) = x^j;
0 ≤ j ≤ (n − 1), the correction is still performed correctly since:
Chien algorithm
To search for the error locator polynomial roots in the case of codes with binary
symbols, we can avoid going through all the elements of field Fq by using Chien’s
iterative algorithm.
Dividing the polynomial σd(x) by x^t, we obtain:
σ̃d(x) = σd(x) / x^t = 1 + σ1 x^{−1} + · · · + σj x^{−j} + · · · + σt x^{−t}
The roots of polynomial σd(x), which are also the roots of σ̃d(x), have the form
α^{n−j} where j = 1, 2, . . . , n − 1 and n = q − 1.
Thus α^{n−j} is a root of σ̃d(x) if:
σ̃d(α^{n−j}) = 1 + \sum_{p=1}^{t} σp α^{−p(n−j)} = 0
Taking into account the fact that α^n = 1, the condition to satisfy in order for
α^{n−j} to be a root of the error locator polynomial is:
\sum_{p=1}^{t} σp α^{jp} = 1;   j = 1, 2, · · · , (n − 1)    (4.42)
Chien's algorithm simply tests whether condition (4.42) is satisfied, using the
circuit shown in Figure 4.7.
This circuit has a register with t memories initialized with the t coefficients
σj of the error locator polynomial and a register with n memories that stores
the symbols rj; j = 0, 1, · · · , (n − 1) of word r(x). At the first clock pulse, the
circuit performs the computation of the left-hand part of expression (4.42) for
j = 1. If the result of this computation is equal to 1, α^{n−1} is a root of the error
locator polynomial and the error affecting symbol rn−1 is then corrected. If
the result of this computation is equal to 0, no correction is performed. At
the end of this first phase, the σj coefficients contained in the t memories of
the register are replaced by σj α^j. At the second clock pulse, the circuit performs the same computation for j = 2, and so on up to j = n − 1.
Figure 4.7 – Schematic diagram of the circuit implementing the Chien algorithm.
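A minimal Python sketch of the Chien search over F16 (field built from x^4 + x + 1) is given below; it simply loops over j and tests condition (4.42), rather than reproducing the register circuit of Figure 4.7, and it is applied to the error locator polynomial of Example 4.14.

    def gf_mul(a, b, prim=0b10011):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x10:
                a ^= prim
            b >>= 1
        return r

    alpha = [1]                                      # alpha[i] = alpha^i
    for _ in range(14):
        alpha.append(gf_mul(alpha[-1], 0b0010))

    def chien_search(sigma, n=15):
        """sigma = [sigma_1, ..., sigma_t]; returns the exponents n-j of the roots found."""
        positions = []
        for j in range(1, n):
            s = 0
            for p, coeff in enumerate(sigma, start=1):
                s ^= gf_mul(coeff, alpha[(j * p) % n])   # sigma_p * alpha^(j p)
            if s == 1:                                   # condition (4.42): alpha^(n-j) is a root
                positions.append(n - j)
        return positions

    sigma = [alpha[2], alpha[9]]       # sigma_d(x) = x^2 + alpha^2 x + alpha^9 (Example 4.14)
    print(chien_search(sigma))         # [6, 3]: the errors affect positions 6 and 3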
The Berlekamp-Massey and Euclid algorithms work with two polynomials, the error locator polynomial Λ(x) and the error evaluator polynomial Γ(x), defined by:
Λ(x) = \prod_{j=1}^{t} (1 + Zj x)    (4.43)
Γ(x) = \sum_{i=1}^{t} ei Zi x  Λ(x) / (1 + Zi x)    (4.44)
The error locator polynomial, whose roots are the Zj^{−1}, enables the position of the
errors to be determined, and the error evaluator polynomial enables the value
of the error ej to be determined. Indeed, taking into account the fact that
Λ(Zj^{−1}) = 0, the polynomial Γ(x) taken in Zj^{−1} is equal to:
Γ(Zj^{−1}) = ej \prod_{p≠j} (1 + Zp Zj^{−1}) = ej Zj^{−1} Λ'(Zj^{−1})
where Λ'(x) = dΛ/dx (x).
ej = Zj Γ(Zj^{−1}) / Λ'(Zj^{−1})    (4.45)
S(x) = \sum_{j=1}^{2t} Sj x^j    (4.46)
Initial conditions:
L0 = 0
Λ^(0)(x) = 1     Θ^(0)(x) = 1
Γ^(0)(x) = 0     Ω^(0)(x) = 1
Recursion: 1 ≤ p ≤ 2t
Δp = \sum_j Λ_j^(p−1) S_{p−j}
δp = 1  if Δp ≠ 0 and 2 L_{p−1} ≤ p − 1
   = 0  otherwise
Lp = δp (p − L_{p−1}) + (1 − δp) L_{p−1}

    | Λ^(p)(x)  Γ^(p)(x) |   | 1            Δp x       |   | Λ^(p−1)(x)  Γ^(p−1)(x) |
    | Θ^(p)(x)  Ω^(p)(x) | = | δp Δp^{−1}   (1 − δp) x | . | Θ^(p−1)(x)  Ω^(p−1)(x) |

Termination:
Λ(x) = Λ^(2t)(x)
Γ(x) = Γ^(2t)(x)
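The sketch below implements the classical two-step formulation of the Berlekamp-Massey algorithm in Python: the error locator Λ(x) is computed first, then Γ(x) is obtained from the key equation Γ(x) = Λ(x) S(x) mod x^{2t+1}. It is not the joint matrix recursion given above, but it produces the same polynomials for Example 4.16; all names are illustrative.

    PRIM = 0b10011                                   # x^4 + x + 1, field F16
    EXP, LOG = [0] * 30, [0] * 16
    x = 1
    for i in range(15):
        EXP[i] = EXP[i + 15] = x
        LOG[x] = i
        x <<= 1
        if x & 0x10:
            x ^= PRIM

    def gmul(a, b):
        return 0 if (a == 0 or b == 0) else EXP[LOG[a] + LOG[b]]

    def gdiv(a, b):
        return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 15]

    def berlekamp_massey(S):
        """S = [S1, ..., S2t]; returns Lambda(x), coefficients lowest degree first."""
        Lam, B, L, m, b = [1], [1], 0, 1, 1
        for n in range(len(S)):
            d = S[n]
            for i in range(1, L + 1):                # discrepancy Delta_n
                d ^= gmul(Lam[i], S[n - i])
            if d == 0:
                m += 1
                continue
            coef = gdiv(d, b)
            update = [gmul(coef, c) for c in [0] * m + B]      # coef * x^m * B(x)
            newLam = [a ^ c for a, c in
                      zip(Lam + [0] * (len(update) - len(Lam)),
                          update + [0] * (len(Lam) - len(update)))]
            if 2 * L <= n:
                B, b, L, m = Lam, d, n + 1 - L, 1
            else:
                m += 1
            Lam = newLam
        return Lam

    def key_equation_gamma(Lam, S, t):
        """Gamma(x) = Lambda(x) S(x) mod x^(2t+1), with S(x) = S1 x + ... + S2t x^(2t)."""
        Sx = [0] + S
        G = [0] * (2 * t + 1)
        for i, li in enumerate(Lam):
            for j, sj in enumerate(Sx):
                if i + j <= 2 * t:
                    G[i + j] ^= gmul(li, sj)
        return G

    # Syndromes of Example 4.16 (r(x) = a^7 x^3 + a^3 x^6, t = 3): S_j = r(alpha^j)
    S = [EXP[13], EXP[6], EXP[11], EXP[6], EXP[4], EXP[13]]
    Lam = berlekamp_massey(S)
    print([LOG[c] for c in Lam])                     # [0, 2, 9]: Lambda = 1 + a^2 x + a^9 x^2
    print(key_equation_gamma(Lam, S, 3))             # non-null terms a^13 x + a^13 x^2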
Example 4.16
Let us assume, for example, that the transmitted codeword is c(x) = 0 and that
the received word has two errors.
r(x) = α^7 x^3 + α^3 x^6
The set of calculations performed to decode this RS code will be done in field
F16 whose elements are given in the appendix.
All the calculations are done in field F16 and take into
account the fact that α^15 = 1.
The error locator and error evaluator polynomials obtained are:
Λ(x) = 1 + α^2 x + α^9 x^2
Γ(x) = α^13 x + α^13 x^2
We can verify that the key equation for the decoding has been satisfied.
Indeed, we do have:
e(x) = e3 x^3 + e6 x^6
e(x) = α^7 x^3 + α^3 x^6
and the estimated codeword is ĉ(x) = r(x) + e(x) = 0. The two transmis-
sion errors are corrected.
Euclid’s algorithm
Euclid’s algorithm enables us to solve the key equation for decoding, that is, to
determine polynomials Λ(x) and Γ(x).
Initial conditions:
Example 4.17
Let us again take the RS code used to illustrate the Berlekamp-Massey al-
gorithm. Assuming that the received word is always r(x) = α7 x3 + α3 x6 when
the transmitted codeword is c(x) = 0, the decoding algorithm is the following:
2. Calculate polynomials Λ(x) and Γ(x) from Euclid’s algorithm (the calcula-
tions are performed in field F16 whose elements are given in the appendix).
j = 0:   R−1(x) = x^5;   R0(x) = S(x);   Q0(x) = α^9 x + α^14;   R1(x) = α^5 x^3 + α^13 x^2 + α^12 x;   U1(x) = α^9 x + α^14
j = 1:   R0(x) = S(x);   R1(x) = α^5 x^3 + α^13 x^2 + α^12 x;   Q1(x) = α x + α^5;   R2(x) = α^14 x^2 + α^14 x;   U2(x) = α^10 x^2 + α^3 x + α
We can verify that the key equation for the decoding is satisfied and that
the two polynomials obtained are identical, to within a multiplicative coefficient α, to
those determined using the Berlekamp-Massey algorithm.
The roots of the polynomial Λ(x) are therefore 1/α^3 and 1/α^6, and the error
polynomial e(x) is equal to:
e(x) = α^7 x^3 + α^3 x^6
S*(x) = Γ(x) (1 + x^n) / Λ(x) = \sum_{j=1}^{n} Sj x^j    (4.48)
Coefficient ej is null (no errors) if α^{−j} is not a root of the error locator polynomial
Λ(x). In this case, we have S*(α^{−j}) = 0 since α^{−jn} = 1 (recall that n = q − 1
and α^{q−1} = 1).
A contrario, if α^{−j} is a root of the locator polynomial, coefficient ej is non-null
(presence of an error) and S*(α^{−j}) is of the form 0/0. This indetermination can
be removed by calculating the derivative of the numerator and the denominator
of expression (4.48):
S*(α^{−j}) = Γ(α^{−j}) n α^{−j(n−1)} / Λ'(α^{−j})
Using Equation (4.45) and taking into account the fact that α^{−j(n−1)} = α^j and
that n a = a for n odd in a Galois field, coefficient ej is equal to:
ej = S*(α^{−j})    (4.49)
The extended syndrome can be computed from polynomials Λ(x) and Γ(x) using
the following relation, deduced from expression (4.48):
Λ(x) S*(x) = (1 + x^n) Γ(x)    (4.50)
Example 4.18
with:
S1 = α^13;   S2 = α^6;   S3 = α^11;   S4 = α^6
Equation (4.50) provides us with the following relation:
S1 x + (α^2 S1 + S2) x^2 + \sum_{k=3}^{15} (α^9 S_{k−2} + α^2 S_{k−1} + S_k) x^k + (α^2 S15 + α^9 S14) x^16 + α^9 S15 x^17 = α^13 (x + x^2 + x^16 + x^17)
= 0 otherwise
Termination:
Λ(x) = Λ^(2t−1)(x)
Example 4.19
Again taking the BCH code that was used to illustrate the computation of
the error locator polynomial with the direct method, let us assume that the
received word is r(x) = x^8 + x^3 when the transmitted codeword is c(x) = 0.
S1 = r(α) = α^8 + α^3 = α^13
S2 = S1^2 = α^26 = α^11
S3 = r(α^3) = α^24 + α^9 = 0
S4 = S2^2 = α^22 = α^7
Let us again take the decoding of the (15,7) BCH code. The received word
is r(x) = x^8 + x^3.
j = 0:   R−1(x) = x^5;   R0(x) = S(x);   Q0(x) = α^8 x;   R1(x) = α^4 x^3 + α^6 x^2;   U1(x) = α^8 x
j = 1:   R0(x) = S(x);   R1(x) = α^4 x^3 + α^6 x^2;   Q1(x) = α^3 x + α^5;   R2(x) = α^13 x;   U2(x) = α^11 x^2 + α^13 x + 1
For a binary BCH code it is not necessary to use the error evaluator polynomial
to determine the value of coefficients e3 and e8 . However, we can verify that:
e3 = α^3 Γ(α^{−3}) / Λ'(α^{−3}) = 1
e8 = α^8 Γ(α^{−8}) / Λ'(α^{−8}) = 1
The decoded word is therefore ĉ(x) = r(x) + e(x) = 0 and the two errors have
been corrected.
The probability of erroneous decoding of a codeword is upper bounded by the probability of having more than t erroneous symbols in a block of n symbols:
Pe,word ≤ \sum_{j=t+1}^{n} \binom{n}{j} ps^j (1 − ps)^{n−j}
where ps is the error probability per q-ary symbol on the transmission channel
and t is the code correction capability in number of q-ary symbols.
When a codeword is wrongly decoded, the corresponding error probability per symbol after decoding, Pe,symbol, is upper bounded by:
Pe,symbol ≤ (1/n) \sum_{j=t+1}^{n} (j + t) \binom{n}{j} ps^j (1 − ps)^{n−j}    (4.52)
The binary error probability after decoding is obtained from the error probability
per symbol, taking into account that a symbol is represented by m bits:
Pe,bit = 1 − (1 − Pe,symbol)^{1/m}
At high signal to noise ratio, we can approximate the binary error probability
after decoding:
Pe,bit ≈ (1/m) Pe,symbol    if Eb/N0 >> 1
Bibliography
[4.1] R.H. Morelos-Zaragoza. The Art of Error Correcting Coding. John Wiley & Sons, 2005.
• irreducible, that is, non factorizable in F2 (in other words, 0 and 1 are not
roots of ϕ(x)),
• of degree m,
The elements of a Galois field Fq are defined modulo ϕ(x) and thus, each element
of this field can be represented by a polynomial with degree at most equal to
(m − 1) and with coefficients in F2 .
Example 1
Consider an irreducible polynomial ϕ(x) in the field F2 of degree m = 2.
ϕ(x) = x2 + x + 1
This polynomial enables a Galois field to be built with 4 elements. The elements
of this field F4 are of the form:
aα + b where a, b ∈ F2
that is:
F4 : {0, 1, α, α + 1}
The binary couples correspond to the four values taken by coefficients a and b.
Note that in such a Galois field the "-" sign is equivalent to the "+" sign, that
is:
−α^j = α^j   ∀ j ∈ {0, 1, · · · , (q − 2)}
Observing that 2α^j = 0 modulo 2, we can always add the zero quantity 2α^j to
−α^j and we thus obtain the above equality.
For example, for field F4 let us give the rules that govern the addition and
multiplication operations. All the operations are done modulo 2 and modulo
α2 + α + 1.
    +       0       1               α               α^2
    0       0       1               α               α^2
    1       1       0               1 + α = α^2     1 + α^2 = α
    α       α       1 + α = α^2     0               α + α^2 = 1
    α^2     α^2     1 + α^2 = α     α + α^2 = 1     0

    ×       0       1       α           α^2
    0       0       0       0           0
    1       0       1       α           α^2
    α       0       α       α^2         α^3 = 1
    α^2     0       α^2     α^3 = 1     α^4 = α
So, if β is a root of polynomial f(x), then β^2, β^4, · · · are also roots of this polynomial. The minimal polynomial with coefficients in F2 having β as a root can
then be written in the form:
Example 2
Let us calculate the minimal polynomial associated with the primitive element
α of Galois field F4 .
F4 : {0, 1, α, α^2}
The minimal polynomial associated with element α therefore has α and α^2
(m = 2) as roots, and can be expressed:
mα(x) = (x + α)(x + α^2) = x^2 + (α + α^2) x + α^3
Taking into account the fact that α^3 = 1 and that α + α^2 = 1 in field F4, the
polynomial mα(x) is thus equal to:
mα(x) = x^2 + x + 1
mβ (x) = x + β
These results on minimal polynomials are used to determine the generator poly-
nomials of particular cyclic codes (BCH and Reed-Solomon).
Primitive polynomials
A polynomial with coefficients in F2 is primitive if it is the minimal polynomial
associated with a primitive element of a Galois field. A primitive polynomial
is thus irreducible in F2 and consequently can be used to build a Galois field.
When a primitive polynomial is used to build a Galois field, all the elements of
the field are obtained by raising the primitive element, the root of the primitive
polynomial, to successively increasing powers. As the main primitive polynomi-
als are listed in the literature, the construction of a Galois field with q = 2m
elements can then be done simply by using a primitive polynomial of degree m.
Table 4.9 gives some primitive polynomials.
To end this introduction to Galois fields and minimal polynomials, let us
give an example of a Galois field with q = 16 (m = 4) elements built from the
primitive polynomial x4 +x+1. This field is used to build generator polynomials
of BCH and Reed-Solomon codes and to decode them. The elements of this field
are:
F16 = {0, 1, α, α^2, α^3, · · · , α^14}
where α is a primitive element of F16 . With these 16 elements, we can also asso-
ciate a polynomial representation and a binary representation. The polynomial
representation of an element of this field is of the form:
a α^3 + b α^2 + c α + d
Table 4.9 (extract) – Primitive polynomials: m = 9: x^9 + x^4 + 1; m = 10: x^10 + x^3 + 1.
Example 3
Some calculations in field F16 are given in table 4.11 for addition, in table 4.12
for multiplication and in table 4.13 for division.
    +         α^2                            α^4
    α^8       0100 + 0101 = 0001 = 1         0011 + 0101 = 0110 = α^5
    α^10      0100 + 0111 = 0011 = α^4       0011 + 0111 = 0100 = α^2

    ×         α^2                            α^6
    α^8       α^10                           α^14
    α^14      α^16 = α   (α^15 = 1)          α^20 = α^5   (α^15 = 1)

    ÷         α^2                            α^12
    α^8       α^{−6} = α^9   (α^15 = 1)      α^4
    α^14      α^{−12} = α^3   (α^15 = 1)     α^{−2} = α^13   (α^15 = 1)
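These tables can be reproduced with a few lines of Python, building the field from the primitive polynomial x^4 + x + 1; the sketch below (illustrative names, elements handled as 4-bit integers) assumes the two operands of an addition are different.

    PRIM = 0b10011
    EXP, LOG = [0] * 30, [0] * 16
    x = 1
    for i in range(15):                 # successive powers of the primitive element alpha
        EXP[i] = EXP[i + 15] = x
        LOG[x] = i
        x <<= 1
        if x & 0x10:                    # reduce modulo x^4 + x + 1
            x ^= PRIM

    def add(i, j):                      # alpha^i + alpha^j (bitwise XOR of the binary forms)
        return LOG[EXP[i] ^ EXP[j]]

    def mul(i, j):                      # alpha^i * alpha^j = alpha^(i + j mod 15)
        return (i + j) % 15

    def div(i, j):                      # alpha^i / alpha^j = alpha^(i - j mod 15)
        return (i - j) % 15

    print(add(2, 8))     # 0  : alpha^2 + alpha^8 = 1            (Table 4.11)
    print(mul(6, 14))    # 5  : alpha^6 * alpha^14 = alpha^5     (Table 4.12)
    print(div(12, 14))   # 13 : alpha^12 / alpha^14 = alpha^13   (Table 4.13)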
5.1 History
It was in 1955 that Peter Elias introduced the notion of convolutional code
[5.5]. The example of an encoder described is illustrated in Figure 5.1. It
is a systematic encoder, that is, the coded message contains the message to
be transmitted, to which redundant information is added. The message is of
infinite length, which at first sight limits the field of application of this type of
code. It is however easy to adapt it for packet transmissions thanks to tail-biting
techniques.
Figure 5.1 – Systematic convolutional encoder built around a three-memory shift register (D, D, D); the multiplexer delivers (di, ri).
The encoder presented in Figure 5.1 is designed around a shift register with
three memory elements. The redundancy bit at instant i, denoted ri is con-
structed with the help of a modulo 2 sum of the information at instant i, di and
the data present at instants i − 1 and i − 3 (di−1 and di−3 ). A multiplexer plays
the role of a parallel to serial converter and provides the result of the encoding
at a rate twice that of the rate at the input. The coding rate of this encoder is
1/2 since, at each instant i, it receives data di and delivers two elements at the
output: di (systematic part) and ri (redundant part).
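As a sketch, this encoder can be written in a few lines of Python (illustrative names); the redundancy is the modulo-2 sum r_i = d_i + d_{i−1} + d_{i−3} described above.

    def encode_fig51(d):
        reg = [0, 0, 0]                       # shift register (d_{i-1}, d_{i-2}, d_{i-3})
        out = []
        for di in d:
            ri = di ^ reg[0] ^ reg[2]         # modulo-2 sum of d_i, d_{i-1}, d_{i-3}
            out += [di, ri]                   # parallel-to-serial conversion (Mux)
            reg = [di, reg[0], reg[1]]        # shift
        return out

    print(encode_fig51([1, 0, 1, 1, 0, 0, 0]))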
It was not until 1957 that the first algorithm capable of decoding such codes
appeared. Invented by Wozencraft [5.15], this algorithm, called sequential de-
coding, was then improved by Fano [5.6] in 1963. Four years later, Viterbi
introduced a new algorithm that was particularly interesting when the length of
the shift register of the encoder is not too large [5.14]. Indeed, the complexity
of the Viterbi algorithm increases exponentially with the size of this register
whereas the complexity of the Fano algorithm is almost independent of it.
In 1974, Bahl, Cocke, Jelinek and Raviv presented a new algorithm [5.1]
capable of associating a probability with the binary decision. This property is
very widely used in the decoding of concatenated codes and more particularly
turbocodes, which have brought this algorithm back into favour. It is now
referred to in the literature in one of these three ways: BCJR (initials of the
inventors), MAP (Maximum A Posteriori) or APP (A Posteriori Probability).
The MAP algorithm is rather complex to implement in its initial version,
and it exists in simplified versions, the most common ones being presented in
Chapter 7.
In parallel with these advances in decoding algorithms, a number of works
have treated the construction of convolutional encoders. The aim of these stud-
ies has not been to decrease the complexity of the encoder, since its implementation
is trivial. The challenge is to find codes with the highest possible error correction
capability. In 1970, Forney wrote a reference paper on the algebra of convolu-
tional codes [5.7]. It showed that a good convolutional code is not necessarily
systematic and suggested a construction different from that of Figure 5.1. For a
short time, that paper took systematic convolutional codes away from the field
of research on channel coding.
Figure 5.2 gives an example of a non-systematic convolutional encoder. Un-
like the encoder in Figure 5.1, the data are not present at the output of the
encoder and are replaced by a modulo 2 sum of the data at instant i, di , and
of the data present at instants i − 2 and i − 3 (di−2 and di−3 ). The rate of
the encoder remains unchanged at 1/2 since the encoder always provides two
(1) (2)
elements at the output: ri and ri , at instant i.
When Berrou et al. presented their work on turbocodes [5.4], they rehabil-
itated systematic convolutional codes by using them in a recursive form. The
interest of recursive codes is presented in Sections 5.2 and 5.3. Figure 5.3 gives
an example of an encoder for recursive systematic convolutional codes. The
original message being transmitted (di ), the code is therefore truly systematic.
A feedback loop appears, the structure of the encoder now being similar to that
of pseudo-random sequence generators.
This brief overview has allowed us to present the three most commonly used
families of convolutional codes: systematic, non-systematic, and recursive sys-
tematic codes. The next two sections tackle the representation and performance
Figure 5.2 – Example of a non-systematic convolutional encoder: a three-memory shift register delivers the two redundancies r_i^(1) and r_i^(2), multiplexed at the output.
Figure 5.3 – Example of a recursive systematic convolutional encoder: a feedback shift register with states s_i^(0), s_i^(1), s_i^(2), s_i^(3) delivers (di, ri).
Figure – General structure of a convolutional encoder with m inputs d_i^(1), . . . , d_i^(m), selection coefficients a_1^(l), a_2^(l), . . . , feedback coefficients b_1, b_2, . . . and register states s_i^(1), s_i^(2), . . .
Using the coefficients a_j^(l), each of the m components of the vector di is
selected or not as the term of an addition with the content of a previous flip-flop
(except in the case of the first flip-flop) to provide the value to be stored in the
following flip-flop. The new content of a flip-flop thus depends on the current
input and on the content of the previous flip-flop. The case of the first flip-flop
has to be considered differently. If all the bj coefficients are null, the input is the
result of the sum of the only components selected of di . In the opposite case,
the contents of the flip-flops selected by the non-null bj coefficients are added to
the sum of the components selected of di . The code thus generated is recursive.
Thus, the succession of states of the register depends on the departure state
and on the succession of data at the input. The components of redundancy ri
are finally produced by summing the content of the flip-flops selected by the
coefficients g.
Let us consider some examples.
— The encoder represented in Figure 5.1 is systematic binary, therefore m = 1
and c = 1. Moreover, all the a_j^(l) coefficients are null except a_1^(1) = 1.
This encoder is not recursive since all the coefficients bj are null. The
redundancy (or parity) bit is defined by g_0^(1) = 1, g_1^(1) = 1, g_2^(1) = 0 and
g_3^(1) = 1.
— In the case of the non-systematic non-recursive binary (here called "classical") encoder in Figure 5.2, m = 1, c = 0; among the a_j^(l), only a_1^(1) = 1
is non-null and bj = 0 ∀ j. Two parity bits come from the encoder and
are defined by g_0^(1) = 1, g_1^(1) = 1, g_2^(1) = 0, g_3^(1) = 1 and g_0^(2) = 1, g_1^(2) = 0,
g_2^(2) = 1, g_3^(2) = 1.
Figure – Example of an encoder with m = 2 inputs (d_i^(1), d_i^(2)), delivering (d_i^(1), d_i^(2), r_i).
Let us take the case of the encoder defined in Figure 5.2. The outputs r_i^(1) and
r_i^(2) are expressed as functions of the successive data d as follows:
r_i^(1) = d_i + d_{i−2} + d_{i−3}    (5.2)
In the transform domain, r^(1)(D) = G^(1)(D) d(D), with G^(1)(D) = 1 + D^2 + D^3, the first generator polynomial of the code, and
d(D) the transform in D of the message to be encoded. Likewise, the second
generator polynomial is G^(2)(D) = 1 + D + D^3.
These generator polynomials can also be summarized by the sequence of their coefficients, (1011) and (1101) respectively, generally denoted in octal representation,
(13)octal and (15)octal respectively. In the case of a non-recursive systematic
code, like the example in Figure 5.1, the generator polynomials are expressed
according to the same principle. In this example, the encoder has generator
polynomials G^(1)(D) = 1 and G^(2)(D) = 1 + D + D^3.
To define the generator polynomials of a recursive systematic code is not as
straightforward. Let us consider the example of Figure 5.3, where G^(1)(D) and G^(2)(D) are the generator polynomials of the code shown in
Figure 5.2; this leads to:
s(D) = d(D) / G^(2)(D)
r(D) = ( G^(1)(D) / G^(2)(D) ) d(D)    (5.8)
Figure 5.6 – Tree diagram of the code of polynomials [1, 1 + D + D3 ]. The binary pairs
indicate the outputs of the encoder and the values in brackets are the future states.
Figure – Trellis diagram of the systematic code of Figure 5.1 (states (000) to (111); branch labels di ri; dotted transitions: di = 0, solid transitions: di = 1).
The transition between two states is represented by an arc between the two associated nodes and labelled
with the outputs of the encoder. In the case of a binary code, the transition on an
input at 0 (resp. 1) is represented by a dotted (resp. solid) line. The succession
of si states up to instant t is represented by the different paths between the
initial state and the different possible states at instant t.
Let us show this with the example of the systematic encoder of Figure 5.1.
Hypothesizing that the initial state s0 is state (000) :
• If d1 = 0 then the following state, s1 , is also (000). The transition is
represented by a dotted line and labelled in this first case 00, the value of
the encoder outputs;
• If d1 = 1, then the following state, s1 , is (100). The transition is repre-
sented by a solid line and here labelled 11.
• We must next envisage the four possible transitions from s1: from s1 = (000) with d2 = 0
or d2 = 1, and from s1 = (100) with d2 = 0 or d2 = 1.
Figure 5.8 – Trellis sections of the codes with generator polynomials [1, 1 + D + D3 ]
(a), [1 + D2 + D3 , 1 + D + D3 ] (b) and [1, (1 + D2 + D3 )/(1 + D + D3 )] (c)
Such a representation shows the basic pattern of these trellises: the butterfly,
so called because of its shape. Each of the sections of Figures 5.8 is thus made up
of 4 butterflies (the transitions of states 0 and 1 towards states 0 and 4 make up
one). The butterfly structure of the three trellises illustrated is identical but the
sequences coded differ. It should be noted in particular that all the transitions
arriving at the same node of a trellis of a non-recursive code are due to the same
value at the input of the encoder. Thus, among the two non-recursive examples
treated (Figures 5.8(a) and 5.8(b)), a transition associated with a 0 at the input
necessarily arrives at one of the states between 0 and 3 and a transition with a
1 arrives at one of the states between 4 and 7. It is different in the case of a
recursive code (like the one presented in Figure 5.8(c)): each state allows one
incident transition associated with an input with a 0, and another one associated
with 1. We shall see the consequences of this in Section 5.3.
Figure 5.9 – State machine for a code with generator polynomials [1, 1 + D + D3 ].
Figure 5.11 – State machine for a code with generator polynomials [1, (1+D2 +D3 )/(1+
D + D3 )].
However, the recursive state machine allows another cycle on a null sequence at
the input: state 4 → state 6 → state 7 → state 3 → state 5 → state 2 → state
1 → state 4.
Moreover, this cycle is linked to the loop on state 0 by two transitions associated
with inputs at 1 (transitions 0 → 4 and 1 → 0). There therefore exists an infinite
number of input sequences with Hamming weight 2 equal to 2 producing a cycle
on state 0. This weight 2 is the minimum weight of any sequence that makes
the recursive encoder leave state 0 and return to zero. Because of the linearity
of the code (see Chapter 1), this value of 2 is also the smallest distance that can
separate two sequences with different inputs that make the encoder leave the
same state and return to the same state.
In the case of non-recursive codes, the Hamming weight of the input se-
quences allowing a cycle on state 0 can only be 1 (state 0 → state 4 → state 2
→ state 1 → state 0). This distinction is essential for understanding the interest
of recursive codes used alone (see Section 5.3) or in a turbocode structure (see
Chapter 7).
2 The Hamming weight of a binary sequence is equal to the number of bits equal to 1.
Figure 5.12 – RTZ sequence (in bold) defining the free distance of the code with
generator polynomials [1, 1 + D + D3 ].
Figure 5.13 – RTZ sequences (in bold) defining the free distance of the code with
generator polynomials [1 + D2 + D3 , 1 + D + D3 ].
Figure 5.14 – RTZ sequences (in bold) defining the free distance of the code with
generator polynomials [1, (1 + D2 + D3 )/(1 + D + D3 )].
In the case of the non-recursive systematic code, the only sequence of this type has a weight
equal to 1, which means that if the RTZ sequence is decided instead of the
transmitted "all zero" sequence, only one bit is erroneous. In the case of the
classical code, one sequence at the input has a weight of 1 and another a weight
of 3: one or three bits are therefore wrong if such an RTZ sequence is decoded.
In the case of the recursive systematic code, the RTZ sequences with minimum
weight have an input weight of 3.
Knowledge of the minimum Hamming distance and of the input weight as-
sociated with it is not sufficient to closely evaluate the error probability at the
output of the decoder of a simple convolutional code. It is necessary to com-
pute the distances, beyond the minimum Hamming distance, and their weight in
order to make this evaluation. This computation is called the distance spectrum.
Figure 5.15 – State machine of the code [1, 1 + D + D^3], modified for the computation
of the associated transfer function: state (000) is split into a departure node ae and an arrival node as, and the other states are labelled b = (001), c = (010), d = (011), e = (100), f = (101), g = (110), h = (111).
Each transition has a label O^i I^j, where i is the weight of the sequence coded
and j that of the sequence at the input of the encoder. In our example, j
can take the value 0 or 1 according to the level of the bit at the input of the
encoder at each transition and i varies between 0 and 2, since 4 coded symbols
are possible (00, 01, 10, 11), with weights between 0 and 2.
The transfer function of the code T (O, I) is then defined by:
T(O, I) = as / ae    (5.9)
To establish this function, we have to solve the system of equations coming from
the relations between the 9 states (ae, b, c, . . . , h and as):
b = c + Od
c = Oe + f
d = h + Og
e = O^2 I ae + OI b
f = O^2 I c + OI d    (5.10)
g = OI e + O^2 I f
h = O^2 I h + OI g
as = O b
T(O, I) = I O^4
   + (I^4 + 2I^3 + 3I^2) O^6
   + (4I^5 + 6I^4 + 6I^3) O^8
   + (I^8 + 5I^7 + 21I^6 + 24I^5 + 17I^4 + I^3) O^10    (5.11)
   + (7I^9 + 30I^8 + 77I^7 + 73I^6 + 42I^5 + 3I^4) O^12
   + · · ·
T(O, I) = (I^3 + I) O^6
   + (2I^6 + 5I^4 + 3I^2) O^8
   + (4I^9 + 16I^7 + 21I^5 + 8I^3) O^10    (5.12)
   + (8I^12 + 44I^10 + 90I^8 + 77I^6 + 22I^4) O^12
   + (16I^15 + 112I^13 + 312I^11 + 420I^9 + 265I^7 + 60I^5) O^14
   + · · ·
Likewise, the recursive systematic code already studied has as its transfer func-
tion:
T(O, I) = 2I^3 O^6
   + (I^6 + 8I^4 + I^2) O^8
   + (8I^7 + 33I^5 + 8I^3) O^10    (5.13)
   + (I^10 + 47I^8 + 145I^6 + 47I^4 + I^2) O^12
   + (14I^11 + 254I^9 + 649I^7 + 254I^5 + 14I^3) O^14
   + · · ·
Comparing the transfer functions from the point of view of the monomial with
the smallest degree allows us to appreciate the error correction capability at
very high signal to noise ratio (asymptotic behaviour). Thus, the non-recursive
systematic code is weaker than its rivals since it has a lower minimum distance.
A classical code and its equivalent recursive systematic code have the same free
distance, but their monomials of minimal degree differ. The first is in (I 3 + I)O6
and the second in 2I 3 O6 . This means that with the classical code an input
sequence with weight 3 and another with weight 1 produce an RTZ sequence
with weight 6 whereas with the recursive systematic code two sequences with
weight 3 produce an RTZ sequence with weight 6. Thus, if an RTZ sequence
with minimum weight is introduced by the noise, the classical code will introduce
one or three errors, whereas its recursive systematic code will introduce three
or three other errors. In conclusion, the probability of a binary error on such
a sequence is lower with a classical code than with a recursive systematic code,
which explains that the former will be slightly better at high signal to noise ratio.
Things are generally different when the codes are punctured (see Section 5.5) in
order to have higher rates [5.13].
To compare the performance of codes with low signal to noise ratio, we
must consider all the monomials. Let us take the example of the monomial in
O^12 for the non-recursive systematic code, the classical code and the recursive
systematic code, respectively:
(7I^9 + 30I^8 + 77I^7 + 73I^6 + 42I^5 + 3I^4) O^12,
(8I^12 + 44I^10 + 90I^8 + 77I^6 + 22I^4) O^12,
(I^10 + 47I^8 + 145I^6 + 47I^4 + I^2) O^12.
If 12 errors are introduced by the noise on the channel, 232 RTZ sequences
are "available" as errors for the first code, 241 for the second and 241 again
for the third. It is therefore (a little) less probable that an RTZ sequence will
appear if the code used is the non-recursive systematic code. Moreover, the
error expectancy per RTZ sequence of the three codes is 6.47, 7.49 and 6.00,
respectively: the recursive systematic code therefore introduces, on average,
fewer decoding errors than the classical code on RTZ sequences with 12 errors
on the frame coded. This is also true for higher degree monomials. Recursive
and non-recursive systematic codes are therefore more efficient at low signal to
noise ratio than the classical code.

Table 5.1 – First terms of the spectrum of the recursive systematic code with generator polynomials [1, (1 + D^2 + D^3)/(1 + D + D^3)].

    d       6    8     10     12      14      . . .
    ω(d)    6    40    245    1446    8295    . . .

Moreover, we find the monomials I^2 O^{8+4c},
where c is an integer, in the transfer function of the recursive code. The infinite
number of monomials of this type is due to the existence of the cycle on a
null input sequence different from the loop on state 0. Moreover, such a code
does not provide any monomials of the form I O^c, unlike non-recursive codes.
These conclusions concur with those drawn from the study of state machines in
Section 5.2.
This notion of transfer function is therefore efficient for studying the per-
formance of a convolutional code. A derived version is moreover essential for
the classification of codes according to their performance. This is the distance
spectrum ω(d) whose definition is as follows:
( ∂T(O, I) / ∂I )_{I=1} = \sum_{d=df}^{∞} ω(d) O^d    (5.14)
For example, the first terms of the spectrum of the recursive systematic code,
obtained from (5.13), are presented in Table 5.1. This spectrum is essential for
estimating the performance of codes in terms of calculating their error proba-
bility, as illustrated in the vast literature on this subject [5.9].
The codes used in the above examples have a rate of 1/2. By increasing the
number of redundancy bits n the rate becomes lower. In this case, the powers
of O associated with the branches of the state machines will be higher than or
equal to those in the figures above. This leads to transfer functions with higher
powers of O, that is, to RTZ sequences with a greater Hamming weight. Codes
with lower rates therefore have a higher error correction capability.
5.3.4 Performance
The performance of a code is defined by the decoding error probability after
transmission on a noisy channel. The previous section allows us to intuitively
compare non-recursive non-systematic, non-recursive systematic and recursive
systematic codes with the same constraint length. However, to estimate the
absolute performance of a code, we must be able to estimate the decoding error
probability as a function of the noise, or at least to limit it. The literature,
for example [5.9], thus defines many bounds that are not described here and we
will limit ourselves to comparing the three categories of convolutional codes. To
do this, a transmission on a Gaussian channel of blocks of 53 then 200 bytes
Figure 5.16 – Comparison of simulated performance (Binary Error Rate and Packet
Error Rate) of three categories of convolutional codes after transmission of packets of
53 bytes on a Gaussian channel (decoding using the MAP algorithm).
coded according to different schemes was simulated (Figures 5.16 and 5.17):
classical (non-recursive non-systematic), non-recursive systematic and recursive
systematic.
The blocks were constructed following the classical trellis termination tech-
nique for non-recursive codes whereas the recursive code is circular tail-biting
(see Section 5.5). The decoding algorithm used is the MAP algorithm.
The BER curves are in perfect agreement with the conclusions drawn during
the analysis of the free distance of codes and of their transfer function: the
systematic code is not as good as the others at high signal to noise ratio and
the classical code is then slightly better than the recursive code. At low signal
to noise ratios, the hierarchy is different: the recursive code and the systematic
code are equivalent and better than the classical code.
Comparing performance as a function of the size of the frame (53 and 200
bytes) shows that the performance hierarchy of the codes is not modified. More-
over, the bit error rates are almost identical. This was predictable as the sizes
of the frames are large enough for the transfer functions of the codes not to be
affected by edge effects. However, the packet error rate is affected by the length
of the blocks since although the bit error probability is constant, the packet
error probability increases with size.
The comparisons above only concern codes with 8 states. It is, however, easy
to see that the performance of a convolutional code is linked with its capacity
to provide information on the succession of data transmitted: the more the
code can integrate successive data into its output symbols, the more it improves
the quality of protection these data. In other words, the greater the number
of states (therefore the size of the register of the encoder), the more efficient
a convolutional code is (within its category). Let us compare three recursive
systematic codes:
• 4 states: [1, (1 + D^2)/(1 + D + D^2)],
probes was designed to process frames encoded with convolutional codes with
2 to 16384 states and rates much lower than 1/2 (16384 states and R = 1/6
for the Cassini probe to Saturn and the Mars Pathfinder probe). Why not
use such codes for terrestrial radio-mobile transmissions for the general public?
Because the complexity of the decoding would become unacceptable for current
terrestrial transmissions using a reasonably-sized terminal operating in real time,
and fitting into a pocket.
described in this book since, for turbo decoding, we prefer another family of
algorithms relying on the minimization of the error probability of each symbol
transmitted. Thus, the Maximum A Posteriori (MAP) algorithm enables the
calculation of the exact value of the a posteriori probability associated with each
symbol transmitted using the received sequence [5.1]. The MAP algorithm and
its variants are described in Chapter 7.
• Calculate for each branch a branch metric d(T(i, si−1, si)). For a binary
output channel, this metric is defined as the Hamming distance between
the symbol carried by the branch of the trellis and the received symbol:
d(T(i, si−1, si)) = dH(T(i, si−1, si))
For a Gaussian channel, the metric is equal to the square of the Euclidean
distance between the branch considered and the observation at the input
of the decoder (see also Section 1.3):
d(T(i, si−1, si)) = ‖ xi − Xi ‖^2 + ‖ yi − Yi ‖^2 = \sum_{j=1}^{m} ( x_i^(j) − X_i^(j) )^2 + \sum_{j=1}^{n} ( y_i^(j) − Y_i^(j) )^2
• Calculate the accumulated metric λ(T(i, si−1, si)) associated with each branch, defined by:
λ(T(i, si−1, si)) = μ(i − 1, si−1) + d(T(i, si−1, si))
where μ(i − 1, si−1) is the accumulated metric associated with node si−1.
• For each node si , select the branch of the trellis corresponding to the
minimum accumulated metric and memorize this branch in memory (in
practice, it is the value of di associated with the branch that is stored).
The path in the trellis made up of the branches successively memorized at
the instants between 0 and i is the survivor path arriving in si . If the two
paths that converge in si have identical accumulated metrics, the survivor
is then chosen arbitrarily between these two paths.
From the point of view of complexity, the Viterbi algorithm requires the
calculation of 2^{ν+1} accumulated metrics at each instant i, and its complexity
varies linearly with the length of sequence k or of the decoding window l.
Figure 5.20 – Structure of the recursive systematic convolutional code (7,5) and asso-
ciated trellis.
For each of the four nodes of the trellis, the value of di corresponding to the
transition of minimum accumulated metric λ is stored in memory.
Figure 5.21 – Survivor path traceback operation (in bold) in the trellis from instant i
and determining the binary decision at instant i − 15.
After selecting the node with minimum accumulated metric, denoted s (in
the example of Figure 5.21, s = 3), we trace back in the trellis along the survivor
path to a depth l = 15. At instant i − 15, the binary decision dˆi−15 is equal to
the value of di−15 stored in the memory associated with the survivor path.
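As an illustration, here is a minimal hard-input Viterbi decoder in Python for a 4-state RSC code with polynomials (7, 5). The register convention (feedback taps 7 = 1 + D + D^2, redundancy taps 5 = 1 + D^2) is an assumption of this sketch, and the whole frame is decoded at once with survivor paths stored as bit lists, rather than using the sliding window of depth l = 15 and the traceback memory described above.

    def rsc75_step(state, d):
        s1, s2 = state
        a = d ^ s1 ^ s2              # feedback (taps 1 + D + D^2)
        r = a ^ s2                   # redundancy (taps 1 + D^2)
        return (a, s1), (d, r)       # next state, branch label (d_i, r_i)

    def encode(bits):
        state, out = (0, 0), []
        for d in bits:
            state, label = rsc75_step(state, d)
            out.append(label)
        return out

    def viterbi_hard(received):
        states = [(a, b) for a in (0, 1) for b in (0, 1)]
        INF = 10**9
        metric = {s: (0 if s == (0, 0) else INF) for s in states}   # start in state (0,0)
        paths = {s: [] for s in states}
        for (x, y) in received:
            new_metric, new_paths = {}, {}
            for s in states:
                for d in (0, 1):                       # the two branches leaving s
                    ns, (X, Y) = rsc75_step(s, d)
                    m = metric[s] + (x ^ X) + (y ^ Y)  # Hamming branch metric
                    if ns not in new_metric or m < new_metric[ns]:
                        new_metric[ns], new_paths[ns] = m, paths[s] + [d]
            metric, paths = new_metric, new_paths
        best = min(states, key=lambda s: metric[s])    # node with minimum accumulated metric
        return paths[best]

    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    rx = encode(msg)
    rx[3] = (rx[3][0] ^ 1, rx[3][1])                   # introduce one transmission error
    print(viterbi_hard(rx) == msg)                     # True: the single error is corrected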
The aim of applying the Viterbi algorithm with weighted inputs is to search
for the codeword c at the shortest Euclidean distance from the received sequence.
Equivalently (see Chapter 1), this also means looking for the codeword that
maximizes the scalar product
⟨x, X⟩ + ⟨y, Y⟩ = \sum_{i=1}^{k} ( \sum_{l=1}^{m} x_i^(l) X_i^(l) + \sum_{l=1}^{n} y_i^(l) Y_i^(l) )
In this case, applying the Viterbi algorithm uses branch metrics of the form
d(T(i, si−1, si)) = \sum_{l=1}^{m} x_i^(l) X_i^(l) + \sum_{l=1}^{n} y_i^(l) Y_i^(l)
and the survivor path then corresponds to the path with maximum accumulated metric.
Figure 5.22 provides the performance of the two variants, with hard and
weighted inputs, of a decoder using the Viterbi algorithm for the code (7,5)
RSC for a transmission on a channel with additive white Gaussian noise. In
practice, we observe a gain of around 2 dB when we substitute weighted input
decoding for hard input decoding.
Figure 5.22 – Example of correction performance of the Viterbi algorithm with hard
inputs and with weighted inputs on a Gaussian channel. Recursive systematic convo-
lutional code (RSC) with generator polynomials 7 (recursivity) and 5 (redundancy).
Coding rate R = 1/2.
Most telecommunication systems use independent frame transmissions. Paragraph 5.4.2 showed the
importance of knowing the initial and final states of the encoder during the decoding of a frame. In order to know these states, the technique used is usually
called trellis termination. This generally involves forcing the initial and final
states to values known by the decoder (in general zero).
Figure – Recursive systematic encoder with classical trellis termination: a switch I at the register input selects the data di (position 1) or the feedback value (position 2); the multiplexer delivers (d'i, ri).
After initializing the register to zero, switch I is kept in position 1 and data
d1 to dk are coded. At the end of this encoding operation, instants k to k + ν,
switch I is placed in position 2 and di takes the value coming from the feedback
of the register, that is, a value that forces the register input to zero: s_i^(0) is then
the result of a modulo-2 sum of two identical members. As for the encoder,
it continues to produce the associated redundancies ri.
This classical termination has one main drawback: the protection of the data
is not independent of their position in the frame. In particular, this can lead to
edge effects in the construction of a turbocode (see Chapter 7).
Tail-biting
A technique was introduced in the 70s and 80s [5.12] to terminate the trellis of
convolutional codes without edge effects: tail-biting. This involves making the
decoding trellis circular, that is, ensuring that the departure and the final states
of the encoder are identical. This state is then called the circulation state. This
technique is trivial for non-recursive codes as the circulation state is merely the
last ν bits of the sequence to encode. As for RSC codes, tail-biting requires
operations that are described in the following. The trellis of such a code, called
circular recursive systematic codes (CRSC), is shown in Figure 5.24.
Figure 5.24 – Circular trellis of a CRSC code: the departure and final states are identical.
The behaviour of the encoder can be described by the recurrence:
s_{i+1} = A s_i + B d_i    (5.15)
where A is the state matrix and B the input matrix. In the case of the recursive systematic code of generator polynomials [1, (1 + D^2 + D^3)/(1 + D + D^3)]
mentioned above, these matrices are

        | 1 0 1 |            | 1 |
    A = | 1 0 0 |   and  B = | 0 |
        | 0 1 0 |            | 0 |
If the encoder is initialized to state 0 (s_0 = 0), the final state s_k^0 obtained at the
end of a frame of length k is:

s_k^0 = \sum_{j=1}^{k} A^{j-1} B\, d_{k-j}   (5.16)
When it is initialized in any state s_c, the final state s_k is expressed as follows:

s_k = A^k s_c + \sum_{j=1}^{k} A^{j-1} B\, d_{k-j}   (5.17)
For this state s_k to be equal to the departure state s_c and for the latter therefore
to become the circulation state, it is necessary and sufficient that:

\left[ I - A^k \right] s_c = \sum_{j=1}^{k} A^{j-1} B\, d_{k-j}   (5.18)

The circulation state therefore exists, and is unique, provided that I − A^k is invertible; it is then given by s_c = [I − A^k]^{-1} s_k^0.
  s_k^0 \ k mod 7 |  1   2   3   4   5   6
  ----------------+------------------------
         0        |  0   0   0   0   0   0
         1        |  6   3   5   4   2   7
         2        |  4   7   3   1   5   6
         3        |  2   4   6   5   7   1
         4        |  7   5   2   6   1   3
         5        |  1   6   7   2   3   4
         6        |  3   2   1   7   4   5
         7        |  5   1   4   3   6   2

Table 5.2 – Table of the CRSC code with generator polynomials [1, (1 + D^2 + D^3)/(1 + D + D^3)] providing the circulation state as a function of k mod 7 (k being the length of the frame at the input) and of the terminal state s_k^0 obtained after encoding initialized to state 0.
3. Calculate the circulation state sc from the tables already calculated and
stored;
4. Initialize the encoder to state sc ;
5. Code the frame and transmit the redundancies calculated.
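As a complement to this procedure, the small Python sketch below (our own code, using the A and B matrices given above rather than the stored table) finds the circulation state by running the state recursion from state 0 to obtain s_k^0 and then simply testing the eight possible initial states of the memory-3 code.

import numpy as np

A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])            # state matrix given in the text
B = np.array([1, 0, 0])              # input matrix given in the text

def final_state(bits, s0=np.zeros(3, dtype=int)):
    """Run the state recursion s_{i+1} = A s_i + B d_i over GF(2)."""
    s = s0.copy()
    for d in bits:
        s = (A @ s + B * d) % 2
    return s

def circulation_state(bits):
    """State s_c that encoding maps onto itself (the fixed point of (5.18)),
    found here by trying the eight possible states of the register."""
    for n in range(8):
        sc = np.array([(n >> 2) & 1, (n >> 1) & 1, n & 1])
        if np.array_equal(final_state(bits, sc), sc):
            return sc
    raise ValueError("no circulation state: k is a multiple of the period 7")

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, 20).tolist()    # a frame whose length is not a multiple of 7
sc = circulation_state(bits)
print("circulation state:", sc, "-> final state:", final_state(bits, sc))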
5.5.2 Puncturing
Some applications can only allocate a small space for the redundant part of the
codewords. But, by construction, the natural rate of a systematic convolutional
code is m/(m + n), where m is the number of input bits di of the encoder and n
is the number of output bits. It is therefore maximum when n = 1 and becomes
R = m/(m + 1). High rates can therefore only be obtained with high values
of m. Unfortunately, the number of transitions leaving any one node of the
trellis is 2^m. In other words, the complexity of the trellis, and therefore of the
decoding, increases exponentially with the number of input bits of the encoder.
Therefore, this solution is generally not satisfactory. It is often avoided in favour
of a technique with a slightly lower error correction capability, but easier to
implement: puncturing.
The puncturing technique is commonly used to obtain high rates. It involves
using an encoder with a low value of m (1 or 2 for example), to keep a reasonable
decoding complexity, but transmitting only part of the bits coded. An example
is proposed in Figure 5.25. In this example, a 1/2 rate encoder produces outputs
di and ri at each instant i. Only 3 bits out of 4 are transmitted, which leads to
a global rate of 2/3. The pattern in which the bits are punctured is called the
puncturing mask.
Figure 5.26 – Trellis diagram of the punctured recursive code for a rate 2/3 (branches labelled d_i r_i, drawn differently for d_i = 0 and d_i = 1).
The most widely used decoding technique involves taking the decoder of
the original code and inserting neutral values in the place of the punctured
elements. The neutral values are values representing information that is a priori
not known. In the usual case of a transmission using antipodal signalling (+1
for the logical ’1’, -1 for the logical ’0’), the null value (analogue 0) is taken as
the neutral value.
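To make the two operations concrete, here is a small sketch of the puncturing and of the re-insertion of neutral values at the decoder input; the particular mask used (all systematic bits d_i transmitted, one redundancy bit r_i out of two) is an assumption of ours that reproduces the 3-symbols-out-of-4, rate 2/3 example of the text.

MASK_D = [1, 1]      # systematic bits d_i: all transmitted
MASK_R = [1, 0]      # redundancy bits r_i: one out of two transmitted

def puncture(d, r):
    """Keep only the symbols selected by the (period-2) puncturing mask."""
    out = []
    for i in range(len(d)):
        if MASK_D[i % 2]:
            out.append(d[i])
        if MASK_R[i % 2]:
            out.append(r[i])
    return out

def depuncture(received, k):
    """Rebuild the (d, r) streams expected by the rate-1/2 decoder, inserting
    the neutral value 0.0 (analogue zero) where a symbol was punctured."""
    it = iter(received)
    d, r = [], []
    for i in range(k):
        d.append(next(it) if MASK_D[i % 2] else 0.0)
        r.append(next(it) if MASK_R[i % 2] else 0.0)
    return d, r

d_bits = [1, 0, 1, 1]                       # 4 information bits
r_bits = [0, 1, 1, 0]                       # corresponding redundancy (arbitrary here)
tx = puncture(d_bits, r_bits)               # 6 symbols instead of 8: rate 2/3
soft = [1.0 if b else -1.0 for b in tx]     # antipodal signalling (+1 / -1)
print(depuncture(soft, k=4))                # punctured positions come back as 0.0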
The introduction of puncturing increases the coding rate but, of course,
decreases its correction capability. Thus, in the example of Figure 5.26, the free
distance of the code is reduced from 6 to 4 (an associated RTZ sequence is shown
in the figure). Likewise Figure 5.27, in which we present the error rate curves
of code [1, (1 + D2 + D3 )/(1 + D + D3 )] for rates 1/2, 2/3, 3/4 and 6/7, shows
a decrease in error correction capability with the increase in coding rate.
The choice of puncturing mask obviously influences the performance of the
code. It is thus possible to favour one part of the frame, transporting sensitive
data, by slightly puncturing it to the detriment of another part that is more
highly punctured. A regular mask is, however, often chosen as it is simple to
implement.
Bibliography
[5.1] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of
linear codes for minimizing symbol error rate. IEEE Transactions on
Information Theory, IT-20:284–287, March 1974.
[5.2] G. Battail. Weighting of the symbols decoded by the Viterbi algorithm.
Annals of Telecommunications, 42(1-2):31–38, Jan.-Feb. 1987.
[5.5] P. Elias. Coding for noisy channels. IRE Convention Records, 3(4):37–46,
1955.
[5.6] R. M. Fano. A heuristic discussion of probabilistic decoding. IEEE Trans-
actions on Information Theory, IT-9:64–74, Apr. 1963.
[5.7] G. D. Forney. Convolutional codes I: Algebraic structure. IEEE Trans-
actions on Information Theory, IT-16:720–738, Nov. 1970.
[5.8] G. D. Forney. The Viterbi algorithm. Proceedings of the IEEE, 61(3):268–
278, March 1973.
[5.9] A. Glavieux. Codage de canal, des bases théoriques aux turbocodes.
Hermès-Science, 2005.
Chapter 6
Concatenated codes
The previous chapters presented elementary encoding schemes such as BCH,
Reed-Solomon or CRSC codes. Most of these elementary codes are asymptoti-
cally good, in the sense that their minimum Hamming distances (MHD) can be
made as large as we want, by sufficiently increasing the degree of the generator
polynomials. The complexity of the decoders is unfortunately unacceptable for
the degrees of polynomials that would guarantee the MHD required by practical
applications.
A simple means of having codes with a large MHD and nevertheless easily
decodable is to combine several reasonably-sized elementary codes, in such a
way that the resulting global code has a high error correction capability. The
decoding is performed in steps, each of them corresponding to one of the ele-
mentary encoding steps. The first composite encoding scheme, called concatenated
codes, was proposed by Forney during his thesis work in 1965 [6.4]. In
this scheme, a first encoder, called the outer encoder, provides a codeword that
is then re-encoded by a second encoder, called the inner encoder. If the two
codes are systematic, the concatenated code is itself systematic. In the rest of
this chapter, only systematic codes will be considered.
Figure 6.1(a) shows a concatenated code, as imagined by Forney, and the
corresponding step decoder. The most judicious choice of constituent code is an
algebraic code, typically a Reed-Solomon code, for the outer code, and a convo-
lutional code for the inner code. The inner decoder is then the Viterbi decoder,
which easily takes advantage of the soft values provided by the demodulator,
and the outer decoder, which works on symbols with several bits (for example,
8 bits), can handle errors in bursts at the output of the first decoder. A per-
mutation or interleaving function inserted between the two encoders, and its
inverse function placed between the two decoders, can greatly increase the ro-
bustness of the concatenated code (Figure 6.1(b)). Such an encoding scheme has
worked very successfully in applications as varied as deep space transmissions
and digital, satellite and terrestrial television broadcasting. In particular, it is
the encoding scheme adopted in many countries for digital terrestrial television
[6.1].
Figure 6.1 – Serial concatenated code, (a) without and (b) with permutation. In both
cases, the output of the outer encoder is entirely recoded by the inner encoder.
The global rate R_p of a parallel concatenated code built from two systematic codes of rates R_1 and R_2 sharing the same information bits is:

R_p = \frac{R_1 R_2}{R_1 + R_2 - R_1 R_2} = \frac{R_1 R_2}{1 - (1 - R_1)(1 - R_2)}   (6.1)
This rate is higher than the global rate Rs of a serial concatenated code (Rs =
R1 R2 ), for identical values of R1 and R2 , and the lower the encoding rates
the greater the difference. We can deduce from this that with the same error
correction capability of component codes, parallel concatenation offers a better
encoding rate, but this advantage diminishes when the rates considered tend
towards 1. When the dimension of the composite code increases, the gap between
Rp and Rs also increases. For example, three component codes of rate 1/2
form a concatenated code with global rate 1/4 for parallel, and 1/8 for serial
concatenation. That is the reason why it does not seem to be useful to increase
the dimension of a serial concatenated code beyond 2, except for rates very close
to unity.
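These rate comparisons are easy to reproduce. The short sketch below generalises (6.1) to any number of parallel component codes sharing the same information bits (the general formula is implied by the discussion above rather than written out in the text) and compares it with the serial case.

def parallel_rate(rates):
    """Global rate of a parallel concatenation of systematic codes sharing the
    same information bits: Rp = 1 / (1 + sum_i (1/R_i - 1)); for two codes this
    reduces to (6.1)."""
    return 1.0 / (1.0 + sum(1.0 / R - 1.0 for R in rates))

def serial_rate(rates):
    """Global rate of a serial concatenation: the product of the component rates."""
    out = 1.0
    for R in rates:
        out *= R
    return out

print(parallel_rate([1/2, 1/2]), serial_rate([1/2, 1/2]))              # 1/3 versus 1/4
print(parallel_rate([1/2, 1/2, 1/2]), serial_rate([1/2, 1/2, 1/2]))    # 1/4 versus 1/8, the example of the text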
However, with SC, the redundant part of a word processed by the outer
decoder has benefited from the correction of the decoder(s) that precede(s) it.
Therefore, at first sight, the correction capability of a serial concatenated code
seems to be greater than that of a parallel concatenated code, in which the values
representing the redundant part are never corrected. In other terms, the MHD
of a serial concatenated code must normally be higher than that of a parallel
concatenated code. We therefore find ourselves faced with the dilemma given
in Chapter 1: PC performs better in the convergence zone (near the theoretical
limit) since the encoding rate is more favourable, and the SC behaves better
at low error rates thanks to a larger MHD. Encoding solutions based on the
SC of convolutional codes have been studied [6.3], which can be an interesting
alternative to classical turbo codes, when low error rates are required. Serial
convolutional concatenated codes will not, however, be described in the rest of
this book.
When the redundant parts of the inner and outer codewords both undergo
supplementary encoding, the concatenation is said to be double serial concate-
nation. The most well-known example of this type of encoding structure is
the product code, which implements BCH codes (see Chapters 4 and 8). Mixed
structures, combining parallel and serial concatenations have also been proposed
[6.6]. Moreover, elementary concatenated codes can be of a different nature, for
example a convolutional code and a BCH code [6.2]. We then speak of hybrid
concatenated codes. From the moment elementary decoders accept and produce
weighted values, all sorts of mixed and/or hybrid schemes can be imagined.
Whilst SC can use systematic or non-systematic codes indifferently, parallel
concatenation uses systematic codes. If they are convolutional codes, at least
one of these codes must be recursive, for a fundamental reason to do with the
minimum input weight wmin , which is only 1 for non-recursive codes but is 2 for
recursive codes (see Chapter 5). To show this, see Figure 6.3 which presents two
non-recursive systematic codes, concatenated in parallel. The input sequence is
"all zero" (reference sequence) except in one position. This single "1" perturbs
the output of the encoder C1 for a short length of time, equal to the constraint
length 4 of the encoder. The redundant information Y1 is poor, in relation to
this particular sequence, as it contains only 3 values different from 0. After
permutation, of whatever type, the sequence is still "all zero", except in one
single position. Again, this "1" perturbs the output of the encoder C2 for a
length of time equal to the constraint length, and redundancy Y2 provided by the
second code is as poor in information as redundancy Y1 . In fact, the minimum
distance of this two-dimensional code is not higher than that of a single code,
with the same rate as that of the concatenated code. If we replace at least one
of the two non-recursive encoders by a recursive encoder, the "all zero" sequence
except in one position is no longer a "Return to Zero" (RTZ, see Section 5.3.2)
sequence for this recursive encoder, and the redundancy that it produces is thus
of much higher weight.
What we have explained above about the PC of non-recursive convolutional
codes suggests that the choice of elementary codes for the PC in general is
limited. As another example, let us build a parallel concatenated code from the
extended Hamming code defined by Figure 1.1 and the encoding Table 1.1. The
information message contains 16 bits, arranged in a 4x4 square (Figure 6.4(a)).
Each line and each column is encoded by the elementary Hamming code. The
horizontal and vertical parity bits are denoted r_{i,j} and r′_{i,j}, respectively. The
global coding rate is 1/3. Decoding this type of code can be performed using the
principles of turbo decoding (optimal local decoding according to the maximum
likelihood and continuous exchanges of extrinsic information).
The MHD of the code is given by the pattern of errors of input weight 1
(Figure 6.4(b)). Whatever the position of the 1 in the information message,
the weight is 7. The figure of merit Rdmin (see Section 1.5) is therefore equal
to 7x(1/3), compared with the figure of merit of the elementary code 4x(1/2).
Figure 6.3 – The parallel concatenation of non-recursive systematic codes is a poor code
concerning the information sequences of weight 1. In this example, the redundancy
symbols Y1 and Y2 each contain only three 1s.
Figure 6.4 – Parallel concatenation of extended Hamming codes (global rate: 1/3). On
the right: a pattern of errors of input weight 1 and total weight 7.
The asymptotic gain has therefore not been extraordinarily increased by means
of the concatenation (0.67 dB precisely), and a great reduction in the coding
rate has occurred. If we wish to keep the same global rate of 1/2, a part of the
redundancy must be punctured. We can choose, for example, not to transmit
the 16 symbols present in the last two columns and the last two lines of the
table of Figure 6.4(a). The MHD then drops to the value 3, that is, less than
the MHD of the elementary code. The PC is therefore of no interest in this case.
Again from the extended Hamming code, a double serial concatenation can
be elaborated in the form of a product code (Figure 6.5(a)). In this scheme,
the redundant parts of the horizontal and vertical codewords are themselves
re-encoded by elementary codes, which produce redundancy symbols denoted
wi,j . One useful algebraic property of this product code is the identity of the
redundancy symbols coming from the second level of encoding, in the horizontal
and vertical directions. The MHD of the code, which has a global rate 1/4, is
again given by the patterns of errors of input weight 1 and is equal to 16, that
is, the square of the MHD of the elementary code (Figure 6.5(b)). The figure
of merit Rdmin = 4 has therefore been greatly increased compared to parallel
concatenation. To attempt to increase the rate of this code by puncturing the
redundancy symbols while keeping a good MHD is bound to fail.
Figure 6.5 – Double serial concatenation (product code) of extended Hamming codes
(global rate: 1/4). On the right: a pattern of errors of input weight 1 and total weight
16.
Here it is not a concatenation in the sense that we defined above, since the
parity relations contain several redundancy variables and these variables appear
in several relations. We cannot therefore assimilate LDPC codes to standard
serial or parallel concatenation schemes. However, we can, like MacKay [6.5],
observe that a turbo code is an LDPC code. An RSC code with generator
polynomials GX (D) (recursivity) and GY (D) (redundancy), whose input is X
and redundant output Y, is characterized by the sliding parity relation:

G_Y(D)\, X(D) + G_X(D)\, Y(D) = 0

Using the tail-biting technique (see CRSC, Section 5.5.1), the parity check
matrix takes a very regular form, such as the one presented in Figure 6.6 for a
coding rate 1/2, choosing G_X(D) = 1 + D + D^3 and G_Y(D) = 1 + D^2 + D^3.
A CRSC code is therefore an LDPC code since the check matrix is sparse. This
is certainly not a good LDPC code, as the check matrix does not respect certain
properties about the positions of the 1s. In particular, the 1s on a same line
are very close to each other, which is not favourable to the belief propagation
decoding method.
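The structure of this check matrix can be sketched directly from the sliding parity relation: with tail-biting, each of the k parity equations involves the same few taps, shifted circularly. The construction below uses a column ordering of our own (systematic bits x in the first k columns, parity bits y in the last k), which may differ from the layout of Figure 6.6, but it shows why the matrix is sparse while the 1s of each row stay clustered.

import numpy as np

def tailbiting_parity_matrix(k):
    """Parity-check matrix of the rate-1/2 tail-biting (circular) RSC code with
    G_X(D) = 1 + D + D^3 and G_Y(D) = 1 + D^2 + D^3. Row i encodes
    x_i + x_{i-2} + x_{i-3} + y_i + y_{i-1} + y_{i-3} = 0 (indices modulo k)."""
    H = np.zeros((k, 2 * k), dtype=int)
    for i in range(k):
        for tap in (0, 2, 3):                 # taps of G_Y(D), applied to x
            H[i, (i - tap) % k] = 1
        for tap in (0, 1, 3):                 # taps of G_X(D), applied to y
            H[i, k + (i - tap) % k] = 1
    return H

print(tailbiting_parity_matrix(8))            # 6 ones per row, close to each other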
6.3 Permutations
The functions of permutation or interleaving, used between elementary encoders
in a concatenated scheme, have a twofold role. On the one hand, they ensure, at
the output of each component decoder, a time spreading of the errors that can
be produced by it in bursts. These packets of errors then become isolated errors
for the following decoder, with far lower correlation effects. This technique for
the spreading of errors is used in a wider context than that of channel coding.
We can use it profitably, for example, to reduce the effects of more or less long
attenuation in transmissions affected by fading, and more generally in situations
where perturbations can alter consecutive symbols. On the other hand, in close
liaison with the characteristics of constituent codes, the permutation is designed
so that the MHD of the concatenated code is as large as possible. This is a
problem of pure mathematics associating geometry, algebra and combinatorics
which, in most cases, has not yet found a definitive answer. Sections 7.3.2
and 9.1.6 develop the topic of permutation for turbo codes and graphs for LDPC
codes, respectively.
Figure 6.7 – Crossword grid with wrong answers but correct clues.
To correct (or decode) this grid, we must operate iteratively by line and by
column. The basic decoding rule is the following: "If there is a word in the
dictionary, a synonym or an equivalent to the definition given that differs from
the word in the grid by at most one letter, then this synonym is adopted".
6. Concatenated codes 209
The horizontal definitions allow us to begin correcting the lines in the grid
(Figure 6.8(a)):
After decoding this line, two words are correct or have been corrected, and
three are still to be found. Using the vertical definitions, we can now decode the
columns (Figure 6.8(b)):
After decoding the columns, there are still some unknown words and we have to
perform a second iteration of the line - column decoding process (Figure 6.9).
Line decoding leads to the following result (Figure 6.9(a)):
After this step, there is still one wrong line. It is possible to correct it by
decoding the columns again (Figure 6.9(b)).
3. TENON is correct.
4. AGENT is correct.
5. LASSO is correct.
After this final decoding step, the wrong word on line II is identified: it is
OMEGA, another Greek letter.
• To arrive at the right result, two iterations of the line and column decoding
were necessary. It would have been a pity to stop after just one iteration.
But that is what we do in the case of a classical concatenated code such
as that of Figure 6.1.
• A word that was correct at the beginning (THETA is indeed a Greek letter)
turned out to be wrong. Likewise, a correction made during the second
step (SHAPE) turned out to be wrong. So the intermediate results must
be considered with some caution and we must avoid any hasty decisions.
In modern iterative decoders, this caution is measured by a probability
that is never exactly 0 or 1.
Bibliography
[6.1] DVB-Terrestrial. ETSI EN 302 296 V1.1.1 (2004-04).
[6.2] P. Adde, R. Pyndiah, and C. Berrou. Performance of hybrid turbo codes.
Elect. Letters, 32(24):2209–2210, Nov. 1996.
[6.3] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara. Serial concate-
nation of interleaved codes: performance analysis, design, and iterative
decoding. IEEE Trans. Info. Theory, 44(3):909–926, May 1998.
[6.4] Jr. G. D. Forney. Performance of concatenated codes. In E. R. Berlekamp,
editor, Key papers in the development of coding theory, pages 90–94. IEEE
Press, 1974.
[6.5] D. J. C. MacKay. Good error-correcting codes based on very sparse ma-
trices. IEEE Transactions on Information Theory, 45(2):399–431, March
1999.
[6.6] K. R. Narayanan and G. L. Stüber. Selective serial concatenation of turbo
codes. IEEE Comm. Letters, 1(5):136–139, Sept. 1997.
Chapter 7
Convolutional turbo codes
The error correction capability of a convolutional code increases when the length
of the encoding register increases. This is shown in Figure 7.1, which provides
the performance of four RSC codes with respective memories ν = 2, 4, 6 and 8,
for rates 1/2, 2/3, 3/4 and 4/5, decoded according to the MAP algorithm. For
each of the rates, the error correction capability improves with the increase in ν,
above a certain signal to noise ratio that we can assimilate almost perfectly with
the theoretical limit calculated in Chapter 3 and identified here by an arrow.
To satisfy the most common applications of channel coding, a memory of the
order of 30 or 40 would be necessary (from a certain length of register and for a
coding rate 1/2, the minimum Hamming distance of a convolutional code with
memory ν is of the order of ν). If we knew how to easily decode a convolutional
code with over a billion states, we would no longer speak much about channel
coding and this book would not exist.
A turbo code is a coding trick, aiming to imitate a convolutional code with
a large memory ν. It is built on the principle of the saying divide and rule, that
is, by associating several small RSC codes whose particular decodings are of rea-
sonable complexity. A judicious exchange of information between the elemen-
tary decoders enables the composite decoder to approximate the performance
of maximum likelihood decoding.
Figure 7.1 – Performance of recursive systematic convolutional codes (RSC) for dif-
ferent rates and four values of the memory of code ν. Comparison with Shannon
limits.
[7.26], etc. had earlier imagined procedures for coding and decoding that were
the forerunners of turbo codes.
In a laboratory at École Nationale Supérieure des Télécommunications de
Bretagne (Telecom Bretagne), Claude Berrou and Patrick Adde were attempt-
ing to transcribe the Viterbi algorithm with weighted input (SOVA: Soft-Output
Viterbi Algorithm) proposed in [7.7], into MOS transistors, in the simplest possi-
ble way. A suitable solution [7.10] was found after two years which enabled these
researchers to form an opinion about probabilistic decoding. Claude Berrou,
then Alain Glavieux, pursued the study and observed, after Gerard Battail,
that a decoder with weighted input and output could be considered as a sig-
nal to noise ratio amplifier. This encouraged them to implement the concepts
commonly used in amplifiers, mainly feedback. Perfecting turbo codes involved
many very pragmatic stages and also the introduction of neologisms, like "paral-
lel concatenation" or "extrinsic information", nowadays common in information
theory jargon. The publication in 1993 of the first results [7.14], with a perfor-
mance 0.5 dB from the Shannon limit, shook the coding community. A gain of
almost 3 dB, compared to solutions existing at that time, had been found by
a small team that was not only unknown, but also French (France, a country
known for its mathematical rigour, versus turbo codes, an empirical invention
to say the least). There followed a very distinct evolution in habits, as under-
lined by A. R. Calderbank in [7.20] (p. 2573): "It is interesting to observe that
the search for theoretical understanding of turbo codes has transformed coding
theorists into experimental scientists".
[7.13] presents a chronology describing the successive ideas that appeared in
the search to perfect turbo codes. This new coding and decoding technique was
first baptized turbo-code, with a hyphen to show that it was a code decoded in
a turbo way (by analogy with the turbo engine that uses exhaust gas to increase
its power). As the hyphen is not used much in English, it became turbo code,
that is, a "turbo" code, which does not mean very much. In French today, turbo
code is written as a single word: turbocode.
Since the seminal work of Shannon, random codes have always been a ref-
erence for error correction coding (see Section 3.1.5). The systematic random
coding of a block of k information bits, leading to a codeword of length n, can,
as a first step and once and for all, involve drawing at random and memorizing k
binary markers containing n − k bits, whose memorization address is denoted
i (0 ≤ i ≤ k − 1). The redundancy associated with any block of information
is then formed by the modulo 2 sum of all the markers whose address i is such
that the i-th information bit equals 1. In other words, the k markers are the
bases of a vector space of dimension k. The codeword is finally made up of the
concatenation of the k information bits and of the n − k redundancy bits. The
rate R of the code is k/n. This very simple construction of the codeword relies
on the linearity property of the addition and leads to high minimum distances
for sufficiently large values of n − k. Because two codewords are different by at
least one information bit and the redundancy is drawn at random, the average
minimum distance is 1 + (n − k)/2. However, the minimum distance of this code
being a random variable, its different realizations can be lower than this value.
A simple realistic approximation of the effective minimum distance is (n − k)/4.
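A toy version of this random systematic coding (our own helper names, with small k and n purely for display) makes the construction explicit.

import numpy as np

def draw_markers(k, n, seed=0):
    """Draw, once and for all, k random binary markers of n - k bits each."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(k, n - k))

def encode(info_bits, markers):
    """Redundancy = modulo-2 sum of the markers whose address i carries a 1;
    the codeword is the information block followed by this redundancy."""
    redundancy = np.zeros(markers.shape[1], dtype=int)
    for i, bit in enumerate(info_bits):
        if bit:
            redundancy ^= markers[i]
    return np.concatenate([np.asarray(info_bits, dtype=int), redundancy])

markers = draw_markers(k=16, n=48)
print(encode([1] + [0] * 15, markers))   # weight-1 message: codeword weight = 1 + weight of one marker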
A way to build an almost random encoder is presented in Figure 7.2. It is
a multiple parallel concatenation of circular recursive systematic convolutional
codes (CRSC, see Chapter 5) [7.12]. The sequence of k binary data is coded N
times by N CRSC encoders, in a different order each time. The permutations Πj
are drawn at random, except the first one that can be the identity permutation.
Each elementary encoder produces k/N redundancy symbols (N being a divisor
of k), the global rate of the concatenated code being 1/2.
The proportion of input sequences of a recursive encoder built from a pseudo-
random generator with memory ν, initially positioned in state 0, which return
the register back to the same state at the end of the coding, is:
p_1 = 2^{-ν}   (7.1)

since there are 2^ν possible return states, with the same probability. These
sequences, called Return To Zero (RTZ) sequences (see Chapter 5), are linear
combinations of the minimum RTZ sequence, which is given by the recursivity
polynomial of the generator (1 + D + D^3 in the case of Figure 7.2).
The proportion of RTZ sequences for the multi-dimensional encoder is low-
ered to:
p_N = 2^{-Nν}   (7.2)
since the sequence must, after each permutation, remain RTZ for the N encoders.
The other sequences, with proportion 1 − p_N, produce codewords that have
a distance d satisfying:

d > \frac{k}{2N}   (7.3)
This worst case value assumes that a single permuted sequence is not RTZ
and that redundancy Y takes the value 1, every other time on average, on the
corresponding circle. If we take N = 8 and ν = 3 for example, we obtain
p_8 ≈ 10^{-7} and, for sequences to encode of length k = 1024, we have d_min = 64,
which is a sufficient minimum distance if we refer to the curves of Figure 3.6.
Random coding can thus be approximated by using small codes and random
permutations. The decoding can be performed following the turbo principle,
described in Section 7.4 for N = 2. The scheme of Figure 7.2 is, however, not
used in practice, for reasons linked to the performance and complexity of the
decoding. First, the convergence threshold of the turbo decoder, that is, the
signal to noise ratio from which the turbo decoder can begin to correct most of
the errors, degrades when the dimension of the concatenation increases. Indeed,
the very principle of turbo decoding means considering the elementary codes
one after the other, iteratively. As their redundancy rate decreases when the
dimension of the composite code increases, the first steps in the decoding are
penalized compared to a concatenated code with a simple dimension 2. Then,
the complexity and the latency of the decoder are proportional to the number
of elementary encoders.
Figure 7.3 – A binary turbo code with memory ν = 3 using identical elementary RSC
encoders (polynomials 15, 13). The natural coding rate of the turbo code, without
puncturing, is 1/3.
Figure 7.3 presents a turbo code in its most classical version [7.14]. The
binary input message, of length k, is encoded in its natural order and in a
permuted order by two RSC encoders called C1 and C2 , which can be terminated
or not. In this example, the two elementary encoders are identical (generator
polynomials 15 for the recursivity and 13 for the construction of the redundancy)
but this is not a necessity. The natural coding rate, without puncturing, is 1/3.
To obtain higher rates, redundancy symbols Y1 and Y2 are punctured. Another
way to have higher rates is to adopt m-binary codes (see 7.5.2).
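The scheme of Figure 7.3 can be sketched in a few lines of Python; the code below is our own illustration, with trellis termination and puncturing deliberately left out and a random permutation standing in for whatever Π is actually chosen.

import numpy as np

def rsc_encode(bits):
    """Memory-3 RSC encoder with polynomials 15 (recursivity) and 13 (redundancy)."""
    a1 = a2 = a3 = 0
    parity = []
    for d in bits:
        a = d ^ a1 ^ a3                   # recursion 1 + D + D^3    (octal 15)
        parity.append(a ^ a2 ^ a3)        # redundancy 1 + D^2 + D^3 (octal 13)
        a1, a2, a3 = a, a1, a2
    return parity

def turbo_encode(bits, perm):
    """Parallel concatenation: the message is encoded in its natural order (Y1)
    and in the permuted order (Y2); natural rate 1/3 without puncturing."""
    y1 = rsc_encode(bits)
    y2 = rsc_encode([bits[perm[j]] for j in range(len(bits))])
    return list(bits), y1, y2

k = 16
rng = np.random.default_rng(2)
perm = rng.permutation(k)                 # Pi: a random permutation for the example
msg = rng.integers(0, 2, k).tolist()
x, y1, y2 = turbo_encode(msg, perm)
print(len(x) + len(y1) + len(y2), "coded bits for", k, "information bits")   # rate 1/3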
As the permutation function (Π) concerns a message of finite size k, the turbo
code is by construction a block code. However, to distinguish it from concate-
nated algebraic codes decoded in a "turbo" way, like product codes which were
later called block turbo codes, this turbo coding scheme is called a convolutional
turbo code or, more technically, a Parallel Concatenated Convolutional Code (PCCC).
Arguments in favour of this coding scheme (some of which have already been
introduced in Chapter 6) are the following:
1. A decoder for convolutional codes is vulnerable to errors arriving in pack-
ets. Coding the message twice, following two different orders (before and
after permutation), makes fairly improbable the simultaneous appearance
of error packets at the input of the decoders of C1 and C2 . If there are
errors grouped at the input of the decoder of C1 , the permutation dis-
perses them over time and they become isolated errors that are easy for
the decoder of C2 to correct. This reasoning also holds for error packets at
the input of this second decoder, which correspond, before permutation,
to isolated errors. Thus two-dimensional coding, on at least one of the
two dimensions, greatly reduces the vulnerability of convolutional coding
concerning grouped perturbations. But which of the two decoders should
be relied on to take the final decision? No criterion allows us to be more
confident about one or the other. The answer is given by the "turbo" algo-
rithm that avoids having to make this choice. This algorithm implements
exchanges of probabilities between the two decoders and constrains them
to converge, during these exchanges, towards the same decisions.
2. As we saw in Section 6.1, parallel concatenation leads to a higher coding
rate than that of serial concatenation. Parallel concatenation is therefore
more favourable when signal to noise ratios close to the theoretical limits
are considered, with average error rates targeted. It can be different when
very low error rates are sought, since the MHD of a serial concatenated
code can be larger.
3. Parallel concatenation uses systematic codes and at least one of these codes
must be recursive, for reasons also described in Section 6.1.
4. Elementary codes are small codes: codes with 16, 8, or even 4 states. Even
if the decoding implements repeated probabilistic processing, it remains of
reasonable complexity.
Figure 7.4 presents turbo codes used in practice and Table 7.2 lists some
industrial applications. For a detailed overview of the applications of turbo and
LDPC codes see [7.29]. The parameters defining a particular turbo code are the
following:
a– m is the number of bits in the symbols applied to the turbo encoder. The
applications known to this day consider binary (m = 1) or double-binary
(m = 2) symbols.
data shared by the two decoders is more penalizing than puncturing data that
are only useful to one of the decoders.
What must be closely considered when building a turbo code and decoding
it, are the RTZ sequences, whose output weights limit the minimum distance of
the code and fix its asymptotic performance. In what follows it will be assumed
that the error patterns that are not RTZ do not contribute to the MHD of the
turbo code and will therefore not have to be considered.
• Unlike the other techniques, circular termination does not present any edge
effects: all the bits of the message are protected in the same way and are
all doubly encoded by the turbo code. Therefore, during the design of the
permutation, there is no need to give special importance to such and such
a bit, which leads to simpler permutation models.
• The sequences that are not RTZ have an influence on the whole circle: on
average, one parity symbol out of two is modified along the block. For
typical values of k (a few hundred or more), the corresponding output
weight is therefore very high and these error patterns do not contribute
to the MHD of the code, as already mentioned at the end of the previous
section. Without termination or with termination using tail bits, only the
part of the block after the beginning of the non-RTZ sequence has any
effect on the parity symbols.
To these two advantages we can, of course, add the interest of having to trans-
mit no additional information about termination and therefore losing nothing
in spectral efficiency.
The circular termination technique was chosen for the DVB-RCS and DVB-
RCT [7.2, 7.1] standards, for example.
There are two ways to specify a permutation, the first by equations linking
addresses before and after permutation, the second by a look-up table providing
the correspondence between addresses. The first is preferable from the point
of view of simplicity in the specification of the turbo code (standardization
committees are sensitive to this aspect) but the second can lead to better results
since the degree of freedom is generally larger when designing the permutation.
Regular permutation
The point of departure when designing interleaving is regular permutation,
which is described in Figure 7.5 in two different forms. The first assumes that the
block containing k bits can be organized as a table of M rows and N columns.
The interleaving then involves writing the data in an ad hoc memory, row by
row, and reading them column by column (Figure 7.5(a)). The second applies
without any hypothesis about the value of k. After writing the data in a linear
memory (address i, 0 ≤ i ≤ k − 1), the block is assimilated to a circle, the two
extremities (i = 0 and i = k − 1) then being adjacent (Figure 7.5(b)). The
binary data are then extracted in such a way that the j-th datum read has been
previously written in position i, with:

i = Π(j) = P j \bmod k

where P is an integer relatively prime to k.
Figure 7.5 – Regular permutation in rectangular (a) and circular (b) form.
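Both forms of the regular permutation are easy to write down; the small sketch below (our own function names) returns, for each read index j, the address i at which the datum was written.

import numpy as np

def rectangular_permutation(k, M, N):
    """Write row by row into an M x N table, read column by column (Figure 7.5(a))."""
    assert M * N == k
    table = np.arange(k).reshape(M, N)     # write addresses, row by row
    return table.T.reshape(k)              # read order: column by column

def circular_permutation(k, P):
    """Circular regular permutation (Figure 7.5(b)): i = P*j mod k, P prime with k."""
    assert np.gcd(P, k) == 1
    return [(P * j) % k for j in range(k)]

print(rectangular_permutation(12, M=3, N=4))
print(circular_permutation(12, P=5))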
is a convention that is to be adopted once and for all, and the one that we have chosen is
compatible with most standardized turbo codes.
where:
This upper bound is only reached in the case of a regular permutation and
with conditions:
P = P_0 = \sqrt{2k}   (7.9)

and:

k = \frac{P_0}{2} \bmod P_0   (7.10)
Let us now consider a sequence of any weight that, after permutation, can be
written:

d̃(D) = \sum_{j=0}^{k-1} a_j D^j   (7.11)

where a_j can take the binary value 0 (no error) or 1 (one error) and, before
permutation:

d(D) = \sum_{i=0}^{k-1} a_i D^i = \sum_{j=0}^{k-1} a_{Π(j)} D^{Π(j)}   (7.12)
We denote j_min and j_max the j indices corresponding to the first and last non-null
values a_j in d̃(D). Similarly, we define i_min and i_max for sequence d(D).
Then, the regular permutation satisfying (7.9) and (7.10) guarantees the property:

(j_max − j_min) + (i_max − i_min) > \sqrt{2k}   (7.13)

This is because d(D) and d̃(D), both considered between min and max indices,
contain at least 2 bits whose accumulated spatial distance, as defined by (7.5),
is maximum and equal to \sqrt{2k}. We must now consider two cases:
• sequences d(D) and d̃(D) are both of the simple RTZ type, that is, they
begin in state 0 of the encoder and return to it once, at the end. The
parity bits produced by these sequences are statistically 1s, every other
time. Taking into account (7.13), for common values of k (k > 100), the
redundancy weights are high and these RTZ sequences do not contribute
to the MHD of the turbo code.
• at least one of sequences d(D) and d̃(D) is of the multiple RTZ type, that
is, it corresponds to the encoder passing several times through state 0.
If these passes through state 0 are long, the parity associated with the
sequence may have reduced weight and the associated distance may be
low. Generally, in this type of situation, the sequences before and after
permutation are both multiple RTZ.
The performance of a turbo code, at low error rates, is closely linked with
the presence of multiple RTZ patterns and regular permutation is not a good
solution for eliminating these patterns.
Figure 7.6 – Possible error patterns of weight 2, 3, 6 or 9 with a turbo code whose
elementary encoders have a period 7 and with regular permutation.
Intra-symbol disorder
When the elementary codes are m-binary codes, we can introduce a certain
disorder into the permutation of a turbo code without however removing its
regular nature! To do this, in addition to intersymbol classical permutation, we
implement intra-symbol permutation, that is, a non-regular modification of the
content of the symbols of m bits, before coding by the second code [7.11]. We
briefly develop this idea with the example of double-binary turbo codes (m = 2).
Figure 7.7 – Possible error patterns with binary (a) and double-binary (b) turbo codes
and regular permutation.
Figure 7.8 – Periodicities of the double-binary encoder of Figure 7.7(b). The four
input couples (0, 0), (0, 1), (1, 0) and (1, 1) are denoted 0, 1, 2 and 3, respectively. This
diagram gives all the combinations of pairs of couples of the RTZ type.
Figure 7.7(b) gives two examples of rectangular, minimum size error patterns.
First note that the perimeter of these patterns is larger than half the perimeter
of the square of Figure 7.7(a). Now, for a same coding rate, the redundancy of
a double-binary code is twice as dense as that of a binary code. We thus deduce
that the distances of the double-binary error patterns will naturally be larger,
everything else being equal, than those of binary error patterns. Moreover, there
is a simple way to eliminate these elementary patterns.
Figure 7.9 – The couples of the grey boxes are inverted before the second (vertical)
encoding. 1 becomes 2, 2 becomes 1; 0 and 3 remain unchanged. The patterns of
Figure 7.7(b), redrawn in (a), are no longer possible error patterns. But those of (b)
are, with distances 24 and 26 for coding rate 1/2.
Assume, for example, that the couples are inverted (1 becomes 2 and vice-
versa), every other time, before being applied to the vertical encoder. Then the
error patterns presented in Figure 7.9(a) no longer exist; for example, although
30002 does represent an RTZ sequence for the encoder considered, 30001 no
longer does. Thus, many of the error patterns, in particular the smallest, disap-
pear thanks to the disorder introduced inside the symbols. Figure 7.9(b) gives
two examples of patterns that the periodic inversion does not modify. The cor-
responding distances are high enough (24 and 26 for a rate 1/2) not to pose
a problem for small or average block sizes. For long blocks (several thousand
bits), additional intersymbol disorder, of low intensity, can be added to the
intra-symbol non-uniformity, to obtain even higher minimum distances.
Figure 7.10 – Permutation of the DRP type. This is a circular regular permutation to
which local permutations before writing and after reading are added.
Irregular permutations
In this section, we will not describe all the irregular permutations that have
been imagined so far, and that have been the subject of numerous publications
and several book chapters (see [7.40, 7.34] for example). We prefer to present
what seems, for the moment, to be both the simplest and the most efficient type
of permutation. These are almost regular circular permutations, called almost
regular permutation (ARP) [7.17] or dithered relatively prime (DRP) [7.21]
permutations, depending on their authors. In all cases, the idea is not to stray
too far away from the regular permutation, which is well adapted to simple RTZ
error patterns and to instil some small, controlled disorder to counter multiple
RTZ error patterns.
Figure 7.10 gives an example, taken from [7.21], of what this small disorder
can be. Before the circular regular permutation is performed, the bits undergo
local permutation. This permutation is performed in groups of C_W bits, where C_W,
the writing disorder cycle, is a divisor of the message length k. Similarly, a local
permutation with reading cycle C_R is applied before the final reading.
If we choose
Q(j) = A(j)P + B(j) (7.15)
where the positive integers A(j) and B(j) are periodic, with cycle C (divisor
of k), then these values correspond to the positive shifts applied respectively
before and after regular permutation. That is the difference from the permutation
shown in Figure 7.10, in which the writing and reading perturbations
are performed inside small groups of data and not by shifts.
For the permutation to really be a bijection, parameters A(j) and B(j) are
not just any parameters. To ensure the existence of the permutation, there is
one sufficient condition: all the parameters have to be multiples of C. This
condition is not very restricting in relation to the efficiency of the permutation.
(7.15) can then be rewritten in the form:

Q(j) = C \left( α(j) P + β(j) \right)

where α(j) and β(j) are most often small integers, with values from 0 to 8.
In addition, since the properties of a circular permutation are not modified by
a simple rotation, one of the Q(j) values can systematically be 0.
Two typical sets of Q values, with cycle 4 and α = 0 or 1, are given below:
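As a sketch of how such a permutation can be generated and checked, the code below uses the form Π(j) = (P·j + Q(j)) mod k implied by the description above; the values of P and of the cycle-4 shifts Q are purely illustrative choices of ours, not the sets referred to here nor those of any standard.

import numpy as np

def arp_permutation(k, P, Q):
    """Almost regular permutation: i = Pi(j) = (P*j + Q[j mod C]) mod k,
    the shifts Q (cycle C = len(Q)) all being multiples of C."""
    C = len(Q)
    assert k % C == 0 and np.gcd(P, k) == 1
    assert all(q % C == 0 for q in Q), "shifts must be multiples of the cycle C"
    return [(P * j + Q[j % C]) % k for j in range(k)]

k, P = 64, 13                             # illustrative parameters only
Q = [0, 4, 24, 12]                        # cycle C = 4, all multiples of 4
pi = arp_permutation(k, P, Q)
assert len(set(pi)) == k                  # the permutation is indeed a bijection
print(pi[:8])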
Quadratic Permutation
Recently, Sun and Takeshita [7.41] proposed a new class of deterministic in-
terleavers based on permutation polynomials (PP) over integer rings. The use
of PP reduces the design of interleavers to simply a selection of polynomial
coefficients. Furthermore, PP-based turbo codes have been shown to have a)
good distance properties [7.38] which are desirable for lowering the error floor
and b) a maximum contention-free property [7.43] which is desirable for parallel
processing to allow high-speed hardware implementation of iterative turbo
decoders.
A. Permutation Polynomials
Before addressing the quadratic PP, we will define the general form of a
polynomial and discuss how to verify whether a polynomial is a PP over the
ring of integers modulo N, Z_N. Given an integer N ≥ 2, a polynomial

f(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_m x^m \mod N   (7.19)

where the coefficients a_0, a_1, a_2, \ldots, a_m and m are non-negative integers, is said
to be a permutation polynomial over Z_N when f(x) permutes {0, 1, 2, \ldots, N − 1}
[7.41]. Since we have a modulo N operation, it is sufficient for the coefficients
a_0, a_1, a_2, \ldots, a_m to be in Z_N. Let us recall that the formal derivative of the
polynomial f(x) is given by

f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \ldots + m a_m x^{m−1} \mod N   (7.20)
To verify whether a polynomial is a PP over ZN , let us discuss the following
three cases a) the case N = 2n , where n is an element of the positive integers
Z+ , b) the case N = pn where p is any prime number, and c) the case where N
is an arbitrary element of Z+ .
1. Case I (N = 2^n): a theorem in [7.36] states that f(x) is a PP over the
integer ring Z_{2^n} if and only if 1) a_1 is odd, 2) a_2 + a_4 + a_6 + \ldots is even,
and 3) a_3 + a_5 + a_7 + \ldots is even.
Under the condition gcd(q_1, N/2 = 45) = 1, the potential values for q_1 are:
{1, 2, 4, 7, 8, 11, 13, 14, 16, 17, 19, 22, 23, 26, 28, 29, 31, 32, 34, 37, 38, 41, 43, 44, 46,
47, 49, 52, 53, 56, 58, 59, 61, 62, 64, 67, 68, 71, 73, 74, 76, 77, 79, 82, 83, 86, 88, 89}
and we have 120 possible QPPs.
By computer search, Rosnes and Takeshita provided, for turbo codes that use 8 and
16-state constituent codes, a very useful list of the best (in terms of minimum
distance) QPPs for a wide range of N (32 ≤ N ≤ 512 and N = 1024) [7.38].
After discussing a necessary and sufficient condition for verifying whether a
quadratic polynomial is a PP modulo N, and providing some examples, let us
discuss some properties of QPPs. It is well known that a linear polynomial,
l(x) = l_0 + l_1 x (or simply l(x) = l_1 x), is guaranteed to be a PP modulo N if
l1 is chosen relatively prime to N (i.e., gcd(l1 , N ) = 1). Consequently, linear
permutation polynomials (LPP) always exist for any N , but unfortunately this
is not true for QPPs. For example, there are no QPPs for N = 11 and for
2 ≤ N ≤ 4096 there are only 1190 values of N that have QPPs (roughly 29%)
[7.44]. A theorem in [7.44] guarantees the existence of QPP for all N = 8i,
i ∈ Z+ (i.e., multiples of a typical computer byte size of 8). It is shown in [7.44]
that some QPP degenerate to LPP (i.e., there exists an LPP that generates the
same permutation over the ring ZN ). A QPP is called reducible if it degenerates
to an LPP; otherwise it is called irreducible. For instance, example 1 in Case I of
sub-section A could be simply reduced to f (x) = l + 3x modulo 8 to obtain the
same permutation. In [7.38], it is shown that some reducible QPPs can achieve
better minimum distances than irreducible QPPs for some short to medium
interleavers. However, for large interleavers, the class of irreducible QPPs is
better (in terms of minimum distance) than the class of LPPs; and if not, that
particular length will not have any good minimum distance [7.38].
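For moderate N, the PP property is easy to check by exhaustion; the sketch below (our own code, not the algebraic tests of [7.36]) verifies a candidate polynomial and builds the corresponding interleaver.

def is_permutation_polynomial(coeffs, N):
    """Check by exhaustion whether f(x) = sum_m coeffs[m] * x^m (mod N)
    permutes {0, 1, ..., N-1}."""
    values = {sum(c * pow(x, m, N) for m, c in enumerate(coeffs)) % N
              for x in range(N)}
    return len(values) == N

def qpp_interleaver(q1, q2, N):
    """Interleaver defined by the quadratic polynomial f(x) = q1*x + q2*x^2 mod N."""
    if not is_permutation_polynomial([0, q1, q2], N):
        raise ValueError("f(x) is not a permutation polynomial modulo N")
    return [(q1 * x + q2 * x * x) % N for x in range(N)]

print(is_permutation_polynomial([0, 7, 0], 11))   # linear, gcd(7, 11) = 1 -> True
print(qpp_interleaver(7, 16, 32)[:10])            # a QPP over Z_32 (a1 odd, a2 even)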
Figure 7.12 – 8-state turbo encoder and schematic structure of the corresponding
turbo decoder. The two elementary decoders exchange probabilistic information, called
extrinsic information (z).
where z(d) is the extrinsic information specific to d. The LLR is improved when
z is negative and d is a 0, or when z is positive and d is a 1.
If the iterative process converges towards a stable solution, z_1^p(d) − z_1^{p−1}(d) and
z_2^p(d) − z_2^{p−1}(d) tend towards zero when p tends towards infinity. Consequently,
the two LLRs relative to d become identical, thus satisfying the basic criterion of
common probability mentioned above. As for proof of the convergence, it is still
being studied further and on this topic we can, for example, consult [7.49, 7.24].
Apart from the permutation and inverse permutation functions, Figure 7.13
details the operations performed during turbo decoding:
2. The role of SISO decoding is to increase the equivalent signal to noise ratio
of the LLR, that is, to provide more reliable extrinsic information at output
zoutput than at input (zinput ). The convergence of the iterative process
(see Section 7.6) will depend on the transfer function SNR(zoutput ) =
G(SNR(zinput )) of each of the decoders.
When data is not available at the input of the SISO decoder, due to punc-
turing for example, a neutral value (analogue zero) is substituted for this
missing data.
3. When the elementary decoding algorithm is not the optimal MAP algo-
rithm but a sub-optimal simplified version, the extrinsic information has
to undergo some transformations before being used by a decoder:
Figure 7.14 – Performance in packet error rates (PER) of the UMTS standard turbo
code for k = 640 and R = 1/3 on a Gaussian channel with 4-PSK modulation. Decod-
ing using the Max-Log-MAP algorithm with 6 iterations.
Notations
A sequence of data d is defined by d ≡ d_0^{k−1} = (d_0 · · · d_i · · · d_{k−1}), where d_i
is the vector of m-binary data applied at the input of the encoder at instant i:
d_i = (d_{i,1} · · · d_{i,l} · · · d_{i,m}). The value of d_i can also be represented by the scalar
integer value j = \sum_{l=1}^{m} 2^{l−1} d_{i,l}, ranging between 0 and 2^m − 1, and we can then
write d_i ≡ j.
In the case of two or four-phase PSK modulation (2-PSK, 4-PSK), the encoded
modulated sequence u ≡ u_0^{k−1} = (u_0 · · · u_i · · · u_{k−1}) is made up of vectors u_i
of size m + m′: u_i = (u_{i,1} · · · u_{i,l} · · · u_{i,m+m′}), where u_{i,l} = ±1 for l =
1 · · · m + m′ and m′ is the number of redundancy bits added to the m bits of
information. The symbol u_{i,l} is therefore representative of a systematic bit for
l ≤ m and of a redundancy bit for l > m.
The sequence observed at the output of the demodulator is denoted v ≡
v_0^{k−1} = (v_0 · · · v_i · · · v_{k−1}), with v_i = (v_{i,1} · · · v_{i,l} · · · v_{i,m+m′}). The series of
the states of the encoder between instants 0 and k is denoted S = S_0^k =
(S_0 · · · S_i · · · S_k). The following is based on the results presented in the chapter
on convolutional codes.
\Pr(d_i ≡ j \mid v) = \frac{p(d_i ≡ j, v)}{p(v)} = \frac{p(d_i ≡ j, v)}{\sum_{l=0}^{2^m−1} p(d_i ≡ l, v)}   (7.22)
where (s′, s)/d_i(s′, s) ≡ j denotes the set of transitions from state s′ to state s
associated with the m-binary symbol j. This set is, of course, always the same in a trellis
that is invariant over time.
The value g_i(s′, s) is expressed as:

p(v_i \mid u_i) = \prod_{l=1}^{m+m′} \frac{1}{σ\sqrt{2π}} \exp\left( −\frac{(v_{i,l} − u_{i,l})^2}{2σ^2} \right)   (7.25)
and:

β_i(s) = \sum_{s′=0}^{2^ν−1} β_{i+1}(s′)\, g_i(s, s′) \quad \text{for } i = k−1 \cdots 0   (7.28)
\Pr_{ex}(d_i ≡ j \mid v) = \frac{\sum_{(s′,s)/d_i(s′,s)≡j} β_{i+1}(s)\, α_i(s′)\, g_i^*(s′, s)}{\sum_{(s′,s)} β_{i+1}(s)\, α_i(s′)\, g_i^*(s′, s)}   (7.30)
We define M_i^α(s) and M_i^β(s), the forward and backward metrics relative to node
s at instant i, and M_i(s′, s), the branch metric relative to the s′ → s transition
of the trellis at instant i, by:

B_i = −σ^2 \ln\left[ \sum_{(s′,s)} β_{i+1}(s)\, α_i(s′)\, g_i(s′, s) \right]   (7.35)
A_i(j) ≈ \min_{(s′,s)/d_i(s′,s)≡j} \left[ M_{i+1}^β(s) + M_i^α(s′) + M_i(s′, s) \right]   (7.38)

and for B_i:

B_i ≈ \min_{(s′,s)} \left[ M_{i+1}^β(s) + M_i^α(s′) + M_i(s′, s) \right] = \min_{l=0 \cdots 2^m−1} A_i(l)   (7.39)
The forward and backward metrics are then calculated from the following
recurrence relations:

M_i^α(s) = \min_{s′=0 \cdots 2^ν−1} \left( M_{i−1}^α(s′) − \sum_{l=1}^{m+m′} v_{i−1,l} \cdot u_{i−1,l} + 2 L^a_{i−1}(d(s′, s)) \right)   (7.44)
M_i^β(s) = \min_{s′=0 \cdots 2^ν−1} \left( M_{i+1}^β(s′) − \sum_{l=1}^{m+m′} v_{i,l} \cdot u_{i,l} + 2 L^a_i(d(s, s′)) \right)   (7.45)
same value. The same rule is applied for the final state. For circular codes, all
the metrics are initialized to the same value at the beginning of the prologue.
Finally, taking into account (7.38) and replacing M_i(s′, s) by its expression
(7.43), we obtain:

A_i(j) = \min_{(s′,s)/d_i(s′,s)≡j} \left( M_{i+1}^β(s) + M_i^α(s′) − \sum_{l=1}^{m+m′} v_{i,l} \cdot u_{i,l} \right) + 2 L^a_i(j)   (7.46)
L_i(j) = L_i^*(j) + \frac{1}{2} \sum_{l=1}^{m} v_{i,l} \cdot \left[ u_{i,l}|_{d_i≡j} − u_{i,l}|_{d_i≡j_0} \right] + L^a_i(j) − L^a_i(j_0)   (7.50)
This expression shows that the extrinsic information L_i^*(j) can, in practice, be
deduced from L_i(j) by simple subtraction. The factor 1/2 in definition (7.32) of L_i(j)
allows us to obtain a weighted decision and extrinsic information L_i^*(j) on the
same scale as the noisy samples v_{i,l}.
The first practical problem lies in the memory necessary to store metrics
Miβ (s). Processing the coded messages of k = 1000 bits, for example, with 8-
state decoders and quantization of the metrics on 6 bits, at first sight requires
a storage capacity of 48000 bits for each decoder. In sequential operation (al-
ternate processing of C1 and C2 ), this memory can, of course, be used by the
two decoders in turn. The technique used to greatly reduce this memory is
that of the sliding window. It involves (Figure 7.15) replacing the single backward
processing, from i = k − 1 to 0, by a succession of partial backward processings,
from i = i_F to 0, then from i = 2i_F to i_F, from i = 3i_F to 2i_F etc., where i_F
is an interval of some tens of trellis sections. Each partial backward processing
includes a "prologue" (dotted line), that is, a step without memorization whose
aim is to estimate as correctly as possible the accumulated backward metrics in
positions iF , 2iF , 3iF , etc. The parts shown by a solid line correspond to the
phases during which these metrics are memorized. The same memory can be
used for all the partial backward recursions. The forward recursion is performed
without any discontinuity.
The process greatly reduces the storage capacity necessary which, in addition,
becomes independent of the length of the messages. The drawback lies in the
necessity to perform the additional operations – the prologues – that can increase
the total calculation complexity by 10 to 20 %. However, these prologues can be
avoided after the first iteration if the estimates of the metrics at the boundary
indices are put into memory to be used as departure points for the calculations
of the following iteration.
Figure 7.15 – Operation of the forward and backward recursions when implementing
the MAP algorithm with a sliding window.
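The scheduling of Figure 7.15 can be summarised by a small helper; the sketch below uses our own naming and an arbitrary prologue length, and simply lists, for each window of i_F sections, the backward index ranges that are estimated without storage and those that are memorised.

def sliding_window_schedule(k, iF, prologue_len):
    """For each window of iF trellis sections, give the backward recursion ranges
    (from_index, to_index), processed in decreasing index order: a prologue whose
    metrics are not stored, then the stored part of the window."""
    schedule = []
    for start in range(0, k, iF):
        end = start + iF                                    # window [start, end)
        pro_from = min(end + prologue_len, k) - 1
        prologue = (pro_from, end) if pro_from >= end else None   # last window: no prologue
        schedule.append({"prologue": prologue, "stored": (end - 1, start)})
    return schedule

for step in sliding_window_schedule(k=120, iF=40, prologue_len=20):
    print(step)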
The second practical problem is that of the speed and latency of decoding.
The extent of the problem depends of course on the application and on the ra-
tio between the decoding circuit clock and the data rate. If the latter is very
high, the operations can be performed by a single machine, in the sequential
order presented above. In specialized processors of the DSP (digital signal pro-
cessor ) type, cabled co-processors may be available to accelerate the decoding.
In dedicated circuits of the ASIC (application-specific integrated circuit ) type,
acceleration of the decoding is obtained by using parallelism, that is, multiplying
the number of arithmetical operators, if possible without increasing the capacity
of the memories required to the same extent. Problems of access to these
memories then generally arise.
Note first that only knowledge of permutation i = Π(j) is necessary for
implementation of the iterative decoding and not that of inverse permutation
Π−1 , as could be wrongly assumed from the schematic diagrams of Figures 7.12
and 7.13. Consider, for example, two SISO decoders working in parallel to
decode the two elementary codes of the turbo code and based on two dual-
port memories for the extrinsic information (Figure 7.16). The DEC1 decoder
associated with the first code produces and receives the extrinsic information in
the natural order i. The DEC2 decoder associated with the second code works
according to index j but writes and recovers its data at addresses i = Π(j).
Knowledge of Π−1 , which could pose a problem depending on the permutation
model selected, is therefore not required.
Figure 7.16 – Implementing turbo decoding does not require explicit knowledge of Π^{−1}.
Figure 7.17 – In practice, the storage of the extrinsic information uses only a single
memory.
In Figure 7.17(b), during the odd cycles, the accesses to the reading-writing pages
are exchanged.
To further increase the degree of parallelism in the iterative decoder, the
forward and backward recursion operations can also be tackled inside each of the
two decoders (DEC1 and DEC2). This can be easily implemented by considering
the diagram of Figure 7.15.
Finally, depending on the permutation model used, the number of elemen-
tary decoders can be increased beyond two. Consider for example the circular
permutation defined by (7.14) and (7.16), with cycle C = 4 and k a multiple
of 4.
The congruences of j and Π(j) modulo 4, are periodic. Parallelism with
degree 4 is then possible following the principle described in Figure 7.18 [7.17].
For each forward or backward recursion (these also can be done in parallel),
four processors are used. At the same instant, these processors process data
whose addresses have different congruences modulo 4. In the example in the
figure, the forward recursion is considered and we assume that k/4 is also a
multiple of 4. The first processor then begins at address 0, the second at
address k/4 + 1, the third at address k/2 + 2 and finally the fourth at address
3k/4 + 3. At each instant, as the processors advance by one place each time, the
congruences modulo 4 of the addresses are always different. Addressing conflicts
are avoided via a router that directs the four processors towards four memory
pages corresponding to the four possible congruences. If k/4 is not a multiple
of 4, the departure addresses are no longer exactly 0, k/4 + 1, k/2 + 2, 3k/4 + 3
but the process is still applicable.
Whatever the value of cycle C, higher degrees of parallelism, of value pC, can
be implemented. Indeed, any multiple of C, the basic cycle in the permutation,
is also a cycle in the permutation, on condition that pC is a divisor of k. That
is, j modulo pC and Π(j) modulo pC are periodic on the circle of length k,
which can then be cut into pC fractions of equal length. For example, a degree
64 parallelism is possible for a value of k equal to 2048.
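The address scheduling described above is easy to verify. The sketch below is our own; in particular, the wrapping of each processor inside its own quarter of the frame is an assumption. It checks that the four processors never target the same congruence class, and therefore the same memory page, at the same instant.

def processor_addresses(k, step):
    """Addresses handled at instant 'step' by the four processors: processor p
    sweeps its own quarter of the frame, starting p places into that quarter
    (start addresses 0, k/4 + 1, k/2 + 2 and 3k/4 + 3, as in the text)."""
    q = k // 4
    return [p * q + (p + step) % q for p in range(4)]

k = 32                                    # k/4 = 8 is itself a multiple of 4
for step in range(k // 4):
    addrs = processor_addresses(k, step)
    # the four congruences modulo 4 are always distinct -> four different memory pages
    assert sorted(a % 4 for a in addrs) == [0, 1, 2, 3]
print("no memory-page conflict at any instant")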
However, whatever the degree of parallelism, a minimum latency is unavoid-
able: the time required for receiving a packet and putting it into the buffer
memory. While this packet is being put into memory, the decoder works on the
information contained in the previous packet. If this decoding is performed in
a time at least equal to the memorization time, then the total decoding latency
is at maximum twice this memorization time. The level of parallelism in the
decoder is adjusted according to this objective, which may be a constraint in
certain cases.
For further information about the implementation of turbo decoders, among
the many publications on this topic, [7.47] is a good resource.
Figure 7.19 – General structure of an m-binary RSC encoder with code memory ν.
The time index is not shown.
on condition that:

R^T G^{−1} C ≡ 0   (7.54)
Expression (7.52) ensures, first, that the Hamming weight of the vector
(d_{1,i}, d_{2,i}, · · ·, d_{m,i}, y_i) is at least equal to 2 when we leave the reference path
("all zero" path), in the trellis. Indeed, inverting any component of di modifies
the value of yi . Second, expression (7.53) indicates that the Hamming weight
of the same vector is also at least equal to 2 when we return to the reference
path. In conclusion, relations (7.52) and (7.53) together guarantee that the
free distance of the code, whose rate is R = m/(m + 1), is at least equal to 4,
whatever m.
• good average performance for this code whose decoding complexity re-
mains very reasonable (around 18,000 gates per iteration plus the mem-
ory);
• a certain coherence concerning the variation of performance with block
size (in agreement with the curves of Figures 3.6, 3.9, 3.10). The same
coherence could also be observed for the variation of performance with
coding rate;
• quasi-optimality of decoding with low error rates. The theoretical asymp-
totic curve for 188 bytes has been calculated from the sole knowledge of
the minimum distance of the code (that is, 13 with a relative multiplicity
of 0.5) and not from the total spectrum of the distances. In spite of this,
the difference between the asymptotic curve and the curve obtained by
simulation is only 0.2 dB for a PER of 10−7 .
For the rate 2/3 turbo code, again with blocks of 188 bytes, the minimum
distance obtained is equal to 18 (relative multiplicity of 0.75) instead of 13 for
the 8-state code. Figure 7.22(b) shows the gain obtained for low error rates:
around 1 dB for a PER of 10−7 and 1.4 dB asymptotically, considering the
respective minimum distances. We can note that the convergence threshold is
almost the same for 8-state and 16-state decoders, the curves being practically
identical for a PER greater than 10−4. The theoretical limit (TL), for R = 2/3
and for a blocksize of 188 bytes, is 1.7 dB. The performance of the decoder in
this case is: TL + 0.9 dB for a PER of 10−4 and TL + 1.3 dB for a PER of
10−7 . These intervals are typical of what we obtain in most rate and blocksize
configurations.
Replacing 4-PSK modulation by 8-PSK modulation, in the so-called prag-
matic approach, gives the results shown in Figure 7.22(b), for blocks of 188
and 376 bytes. Here again, good performance of the double-binary code can
be observed, with losses compared to the theoretical limits (that are around
3.5 and 3.3 dB, respectively) close to those obtained with 4-PSK modulation.
Associating turbo codes with different modulations is described in Chapter 10.
For a particular system, the choice between an 8-state or 16-state turbo code
depends, apart from the complexity desired for the decoder, on the target error rates.
Figure 7.22 – (a) PER performance of a double-binary turbo code with 8 states for blocks of 12, 14, 16, 53 and 188 bytes. 4-PSK, AWGN noise and rate 2/3. Max-Log-MAP decoding with input samples of 4 bits and 8 iterations. (b) PER performance of a double-binary turbo code with 16 states for blocks of 188 bytes (4-PSK and 8-PSK) and 376 bytes (8-PSK), AWGN noise and rate 2/3. Max-Log-MAP decoding with input samples of 4 bits (4-PSK) or 5 bits (8-PSK) and 8 iterations.
To simplify, let us say that an 8-state turbo code suffices for PERs greater
than 10−4 . This is generally the case for transmissions having the possibility
of repetitions (ARQ: Automatic Repeat reQuest). For lower PERs, typical of
broadcasting or of mass memory applications, the 16-state code is highly prefer-
able.
• the convergence threshold, defined as the signal to noise ratio from which
the coded system becomes more efficient than the non-coded transmission
system;
• the asymptotic gain, measuring the behaviour of the coded system at low
error rates. This is mainly dictated by the MHD of the code (see Sec-
tion 1.5). A low value of the MHD leads to a great change in the slope
(flattening) of the error rate curve. When the asymptotic gain is reached,
the BER(Eb/N0) curve with coding becomes parallel to the curve without
coding.
In the case of turbo codes and the iterative process of their decoding, it is
not always easy to estimate the performance either of the asymptotic gain or of
the convergence. Methods for estimating or determining the minimum distance
proposed by Berrou et al. [7.18], Garello et al. [7.27] and Crozier et al. [7.22]
are presented in the rest of this chapter. The EXIT diagram method proposed
by ten Brink [7.46] to estimate the convergence threshold is also introduced.
of a product code is equal to the product of the minimum distances of the con-
stituent codes). In the case of convolutional turbo codes, the minimum distance
is not obtained analytically; the only methods proposed are based on the total
or partial [7.28] enumeration of codewords whose input weight is lower than or
equal to the minimum distance. These methods are applicable in practice only
for small blocksizes and small minimum distances, which is why they will
not be described here.
Figure 7.24 – Measured and estimated PER (UB) of the DVB-RCS turbo code for
the transmission of MPEG (188 bytes) blocks with coding rates 2/3 and 4/5. 4-PSK
modulation and Gaussian channel.
cable to any linear code, for any blocksize and any coding rate, and it requires
only a few seconds to a few minutes calculation on an ordinary computer, the
calculation time being a linear function of the blocksize or of its period P .
When the decoding is not maximum likelihood, this method is no longer rig-
orous and produces only an estimation of the minimum distance. In addition,
the multiplicity of the codewords at distance dmin is not provided and Equa-
tion (7.57) cannot be applied without particular hypotheses about the prop-
erties of the code. In the case of turbo codes, two realistic hypotheses can be
formulated to estimate the multiplicity: a single codeword at distance A*_i has its i-th
information bit equal to 1 (unicity), and the A*_i values corresponding to all positions
i come from distinct codewords (non-overlapping).
An estimation of the PER is then given by:
$$\mathrm{PER} \approx \frac{1}{2}\sum_{i=0}^{k-1}\operatorname{erfc}\left(\sqrt{A_i^{*}\,R\,\frac{E_b}{N_0}}\right) \qquad (7.58)$$
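As an illustration, the estimate (7.58) can be evaluated directly once the A*_i values are known; the following sketch (not from the book) uses a purely hypothetical distance profile.

```python
import math

# Minimal sketch (not from the book): evaluating the estimate (7.58) for a hypothetical
# list of A*_i values, such as those produced by the error impulse method.
def per_estimate(a_star, rate, ebn0_db):
    ebn0 = 10 ** (ebn0_db / 10.0)  # Eb/N0 converted from dB to linear scale
    return 0.5 * sum(math.erfc(math.sqrt(a * rate * ebn0)) for a in a_star)

# Purely illustrative profile: every one of the k = 1504 information bits is assumed to
# see the same minimum distance of 13.
print(per_estimate(a_star=[13] * 1504, rate=2 / 3, ebn0_db=3.0))
```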
The first hypothesis (unicity) under-evaluates the value of the error rate, unlike
the second (non-overlapping) that over-evaluates it and, overall, the two effects
compensate each other. As an example, Figure 7.24 compares the measured per-
formance of the DVB-RCS turbo code, for two coding rates, with their estimate
deduced from (7.58). The parameters obtained by the error impulse method
are:
7.6.3 Convergence
A SISO decoder can be seen as a processor that transforms one of its input
values, the LLR of the extrinsic information used as a priori information, into
an output extrinsic LLR. In iterative decoding, the characteristics of the extrinsic
information provided by decoder 1 depend on the extrinsic information provided
by decoder 2 and vice-versa. The degree of dependency between the input and
output extrinsic information can be measured by the mutual information (MI).
The idea implemented by ten Brink [7.46] is to follow the exchange of ex-
trinsic information through the SISO decoders working in parallel on a diagram,
$$I(z, x) = \frac{1}{2}\sum_{x=-1,+1}\int_{-\infty}^{+\infty} f(z|x)\times\log_2\frac{2\,f(z|x)}{f(z|-1)+f(z|+1)}\,dz \qquad (7.59)$$
Figure 7.26 – Algorithm for determining the transfer function IE = T (IA, Eb/N0 )
Hypotheses:
• Hyp. 1: when the interleaving is large enough, the distribution of the input
extrinsic information can be approximated by a Gaussian distribution after
a few iterations.
• Hyp. 2: probability density f (z|x) satisfies the exponential symmetry
condition, that is, f (z|x) = f (−z|x)exp(−z).
The first hypothesis allows the a priori LLR Z_A of a SISO decoder to be modelled
as the transmitted information symbol x scaled by a mean value μ_z, to which an
independent Gaussian noise n_z of variance σ_z² is added, according to the expression
$$Z_A = \mu_z x + n_z$$
The second hypothesis imposes σ_z² = 2μ_z. The amplitude of the extrinsic information is therefore modelled by the following distribution:
$$f(\lambda|x) = \frac{1}{\sqrt{4\pi\mu_z}}\exp\left(-\frac{(\lambda-\mu_z x)^2}{4\mu_z}\right) \qquad (7.60)$$
From (7.59) and (7.60), observing that f(z|1) = f(−z|−1), we deduce the a priori mutual information:
$$I_A = \int_{-\infty}^{+\infty}\frac{1}{\sqrt{4\pi\mu_z}}\exp\left(-\frac{(\lambda-\mu_z)^2}{4\mu_z}\right)\times\log_2\frac{2}{1+\exp(-\lambda)}\,d\lambda$$
or again
$$I_A = 1 - \int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma_z}\exp\left(-\frac{\left(\lambda-\sigma_z^2/2\right)^2}{2\sigma_z^2}\right)\times\log_2\left[1+\exp(-\lambda)\right]d\lambda \qquad (7.61)$$
We can note that $\lim_{\sigma_z\to 0} I_A = 0$ (the extrinsic information does not provide any
information about datum x) and that $\lim_{\sigma_z\to+\infty} I_A = 1$ (the extrinsic information
perfectly determines datum x).
I_A is a monotonically increasing function of σ_z; it is therefore invertible. The function
σ_z = f(I_A) is shown in Figure 7.25.
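For readers who wish to reproduce this curve, the integral (7.61) can be evaluated numerically; the sketch below (not from the book) uses a simple Riemann sum, with our own function name.

```python
import numpy as np

# Minimal sketch (not from the book): numerical evaluation of (7.61), i.e. the a priori
# mutual information I_A as a function of sigma_z under the consistent Gaussian model
# (sigma_z^2 = 2 mu_z). The integral is approximated by a simple Riemann sum.
def mutual_information(sigma_z: float, num_points: int = 20001) -> float:
    if sigma_z < 1e-6:
        return 0.0
    center = sigma_z ** 2 / 2
    half_width = 10 * sigma_z + 10
    lam = np.linspace(center - half_width, center + half_width, num_points)
    gauss = np.exp(-(lam - center) ** 2 / (2 * sigma_z ** 2)) / (np.sqrt(2 * np.pi) * sigma_z)
    integrand = gauss * np.log2(1.0 + np.exp(-lam))
    return 1.0 - float(np.sum(integrand) * (lam[1] - lam[0]))

# I_A grows monotonically with sigma_z, from 0 towards 1:
print(mutual_information(0.5), mutual_information(3.0))
```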
Figure 7.27 – (a) Transfer characteristic of the extrinsic information for a 16-state
binary encoder, with rate 2/3 and MAP decoding with different Eb/N0.
(b) EXIT chart and trajectory for the corresponding turbo code, with pseudo-random
interleaving on 20,000 bits, for Eb /N0 = 2dB.
$$I_S = \frac{1}{2}\sum_{x=-1,+1}\int_{-\infty}^{+\infty} f_s(z|x)\times\log_2\frac{2\,f_s(z|x)}{f_s(z|-1)+f_s(z|+1)}\,dz \qquad (7.62)$$
Figure 7.26 shows the path taken to establish the transfer characteristic of
the extrinsic information of a SISO decoder.
e. An example
The simulations were performed on a 16-state binary turbo code with rate
2/3, with a pseudo-random interleaving of 20,000 bits. The decoding algorithm
is the MAP algorithm. Figure 7.27(a) shows the relation between IS and IA as
a function of the signal to noise ratio of the Gaussian channel.
Figure 7.28 – EXIT charts for different Eb /N0 in the case of binary turbo codes,
rate 2/3, pseudo-random interleaving of 20,000 bits, (a) 16-state and (b) 8-state MAP
decoding.
EXIT chart
The extrinsic information transfer characteristic is now known for a SISO de-
coder. In the case of iterative decoding, the output of decoder 1 becomes
the input of decoder 2 and vice versa. The curves IS1 = f(IA1 = IS2) and
IS2 = f(IA2 = IS1), identical to within a symmetry if the SISO decoders are the
same, are placed on the same graph as shown in Figure 7.27(b). In the case
of a high enough signal to noise ratio (here 2 dB), the two curves do not have
any intersection outside the point of coordinates (1,1) which materializes the
knowledge of the received message. Starting from null mutual information, it
is then possible to follow the exchange of extrinsic information along the iterations.
Figure 7.29 – Binary error rates of a 16-state (a) and 8-state (b) binary turbo code
with rate 2/3, with pseudo-random interleaving of 20000 bits. MAP decoding with 1,
3, 6, 10, 15 and 20 iterations and comparison with the convergence threshold estimated
by the EXIT method.
When the signal to noise ratio is too low, as in case Eb /N0 = 1.4 dB in
Figure 7.28(b), the curves have intersection points other than point (1, 1). The
iterative process starting from null MI at the input will therefore not be able to
lead to a perfectly determined message. The minimum signal to noise ratio for
which there is no intersection other than point (1,1) is the convergence threshold
of the turbo encoder. In the simulated example, this convergence can be esti-
mated at around 1.4 dB for 16-state (Figure 7.28(a)) and 8-state (Figure 7.28(b))
binary turbo codes.
Figure 7.29 shows the performance of 16-state and 8-state binary turbo codes
as a function of the number of iterations, and compares them with the conver-
gence threshold estimated by the EXIT chart method.
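The exchange of extrinsic information described above can be sketched in a few lines of code, assuming the two transfer functions have already been measured; the names T1 and T2 below are placeholders, not part of the book.

```python
# Minimal sketch (not from the book): following the decoding trajectory on an EXIT chart.
# T1 and T2 stand for the measured transfer functions IS = T(IA, Eb/N0) of the two SISO
# decoders (obtained, e.g., by the procedure of Figure 7.26); they are placeholders here.
def exit_trajectory(T1, T2, max_iterations=20, tol=1e-4):
    ia1 = 0.0                      # start from null a priori mutual information
    trajectory = []
    for _ in range(max_iterations):
        is1 = T1(ia1)              # decoder 1: a priori -> extrinsic
        is2 = T2(is1)              # decoder 2 uses decoder 1's extrinsic as a priori
        trajectory.append((is1, is2))
        if abs(is2 - ia1) < tol:   # the exchange no longer progresses
            break
        ia1 = is2
    return trajectory              # approaches (1, 1) only above the convergence threshold
```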
Bibliography
[7.1] Interaction channel for digital terrestrial television. DVB, ETSI EN 301
958, V1.1.1, pp. 28-30, Aug. 2001.
[7.2] Interaction channel for satellite distribution systems. DVB, ETSI EN 301
790, V1.2.2, pp. 21-24, Dec. 2000.
[7.3] Multiplexing and channel coding (FDD). 3GPP Technical Specification
Group, TS 25.212 v2.0.0, June 1999.
[7.4] Recommendations for space data systems. Telemetry channel coding. Con-
sultative Committee for Space Data Systems, BLUE BOOK, May 1998.
[7.5] IEEE standard for local and metropolitan area networks, IEEE Std 802.16a,
2003. Available at
http://standards.ieee.org/getieee802/download/802.16a-2003.pdf.
[7.6] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of
linear codes for minimizing symbol error rate. IEEE Transactions on
Information Theory, IT-20:284–287, March 1974.
[7.7] G. Battail. Weighting of the symbols decoded by the Viterbi algorithm.
Annals of Telecommunications, 42(1-2):31–38, Jan.-Feb. 1987.
[7.8] G. Battail. Coding for the Gaussian channel: the promise of weighted-
output decoding. International Journal of Satellite Communications,
7:183–192, 1989.
[7.9] C. Berrou. Some clinical aspects of turbo codes. In Proceedings of 3rd
International Symposium on turbo codes & Related Topics, pages 26–31,
Brest, France, Sept. 1997.
[7.10] C. Berrou, P. Adde, E. Angui, and S. Faudeuil. A low complexity soft-
output Viterbi decoder architecture. In Proceedings of IEEE International
Conference on Communications (ICC’93), pages 737–740, Geneva, May
1993.
[7.11] C. Berrou, C. Douillard, and M. Jézéquel. Designing turbo codes for
low error rates. In IEE colloquium : turbo codes in digital broadcasting –
Could it double capacity?, pages 1–7, London, Nov. 1999.
[7.16] C. Berrou and M. Jézéquel. Non binary convolutional codes for turbo
coding. Electronics Letters, 35(1):39–40, Jan. 1999.
[7.17] C. Berrou, Y. Saouter, C. Douillard, S. Kerouédan, and M. Jézéquel.
Designing good permutations for turbo codes: towards a single model.
In Proceedings of IEEE International Conference on Communications
(ICC’04), Paris, France, June 2004.
[7.24] L. Duan and B. Rimoldi. The iterative turbo decoding algorithm has fixed
points. IEEE Transactions on Information Theory, 47(7):2993–2995, Nov.
2001.
[7.25] P. Elias. Error-free coding. IEEE Transactions on Information Theory,
4(4):29–39, Sept. 1954.
[7.26] R. G. Gallager. Low-density parity-check codes. IRE Transactions on
Information Theory, IT-8:21–28, Jan. 1962.
[7.27] R. Garello and A. Vila Casado. The all-zero iterative decoding algo-
rithm for turbo code minimum distance computation. In Proceedings of
IEEE International Conference on Communications (ICC’2004), pages
361–364, Paris, France, 2004.
[7.28] R. Garello, P. Pierloni, and S. Benedetto. Computing the free distance of
turbo codes and serially concatenated codes with interleavers: Algorithms
and applications. IEEE Journal on Selected Areas in Communications,
May 2001.
[7.29] K. Gracie and M.-H. Hamon. Turbo and turbo-like codes: Principles
and applications in telecommunications. In Proceedings of the IEEE, vol-
ume 95, pages 1228–1254, June 2007.
[7.30] J. Hagenauer and P. Hoeher. Concatenated Viterbi decoding. In Proceed-
ings of International Workshop on Information Theory, pages 136–139,
Gotland, Sweden, Aug.-Sept. 1989.
8.1 History
Because of the Gilbert-Varshamov bound, it is necessary to have long codes in
order to obtain block codes with a large minimum Hamming distance (MHD) and
therefore high error correction capability. But, without a particular structure,
it is almost impossible to decode these codes.
The invention of product codes, due to Elias [8.4], can be seen in this con-
text: it means finding a simple way to obtain codes with high error correction
capability that are easily decodable from simple elementary codes. These prod-
uct codes can be seen as a particular realization of the concatenation principle
(Chapter 6).
The first decoding algorithm results directly from the construction of these
codes. This algorithm alternates the hard decision decoding of elementary codes
on the rows and columns. Unfortunately, this algorithm does not allow us to
reach the maximum error correction capability of these codes. The Reddy-
Robinson algorithm [8.15] does allow us to reach it. But no doubt due to its
complexity, it has never been implemented in practical applications.
The aim of this chapter is to give a fairly complete presentation of algorithms
for decoding product codes, whether they be algorithms for hard data or soft
data.
Definition
Let C1 (resp. C2 ) be a linear code of length n1 (resp. n2 ) and with dimension1
k1 (resp. k2 ). The product code C = C1 ⊗ C2 is the set of matrices M of size
n1 × n2 such that:
Example 8.1
Let H be the Hamming code of length 7 and P be the parity code of length 3.
The dimension of H is 4 and the dimension of P is 2. The code C = H ⊗ P
is therefore of length 21 = 7 × 3 and dimension 8 = 4 × 2. Let the following
information word be coded:
$$I = \begin{pmatrix}0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0\end{pmatrix}.$$
$$\begin{pmatrix}0 & 1 & 1 & 0\end{pmatrix}\cdot\begin{pmatrix}1&0&0&0&1&1&1\\0&1&0&0&1&1&0\\0&0&1&0&1&0&1\\0&0&0&1&0&1&1\end{pmatrix} = \begin{pmatrix}0&1&1&0&0&1&1\end{pmatrix}$$
$$\begin{pmatrix}1 & 0 & 1 & 0\end{pmatrix}\cdot\begin{pmatrix}1&0&0&0&1&1&1\\0&1&0&0&1&1&0\\0&0&1&0&1&0&1\\0&0&0&1&0&1&1\end{pmatrix} = \begin{pmatrix}1&0&1&0&0&1&0\end{pmatrix}$$
1 or length of message
Each column of the final codeword must now be a codeword with parity P .
The final codeword is therefore obtained by adding a third row made up of the
parity bits of each column. The complete codeword is:
$$\begin{pmatrix}0&1&1&0&0&1&1\\1&0&1&0&0&1&0\\1&1&0&0&0&0&1\end{pmatrix}.$$
For the codeword to be valid, it must then be verified that the third row of the
word is indeed a codeword of H. This row vector must therefore be multiplied
by the parity control matrix of H:
$$\begin{pmatrix}1&1&1&0&1&0&0\\1&1&0&1&0&1&0\\1&0&1&1&0&0&1\end{pmatrix}\cdot\begin{pmatrix}1\\1\\0\\0\\0\\0\\1\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}$$
In fact, it is not worthwhile doing this verification: it is ensured by con-
struction since codes H and P are linear. In addition, the encoding order is not
important: if we first code by columns then by rows, the codeword obtained is
the same.
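The encoding of Example 8.1 can be reproduced with a few lines of code; the sketch below (not from the book) uses the generator matrix of the Hamming (7,4) code given above.

```python
import numpy as np

# Minimal sketch (not from the book): encoding the product code of Example 8.1, where the
# row code is the Hamming (7,4) code with the generator matrix G1 below and the column
# code is the length-3 parity code. All arithmetic is modulo 2.
G1 = np.array([[1, 0, 0, 0, 1, 1, 1],
               [0, 1, 0, 0, 1, 1, 0],
               [0, 0, 1, 0, 1, 0, 1],
               [0, 0, 0, 1, 0, 1, 1]])

def encode_product(info_rows):
    rows = np.mod(np.array(info_rows) @ G1, 2)   # encode each information row with C1
    parity_row = np.mod(rows.sum(axis=0), 2)     # append the column parities (C2 encoding)
    return np.vstack([rows, parity_row])

# Reproduces the 3 x 7 codeword of Example 8.1:
print(encode_product([[0, 1, 1, 0],
                      [1, 0, 1, 0]]))
```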
Property
Row-column decoding is limited by a correction capability of (t1 + 1) · (t2 + 1)
errors. In other words, row-column decoding decodes any word having fewer than
(t1 + 1) · (t2 + 1) errors (even if it might decode certain patterns having more
errors) and there are words having exactly (t1 + 1) · (t2 + 1) errors that will not
be decoded.
Indeed, assume that we have a pattern with a number of errors strictly lower
than (t1 + 1) · (t2 + 1). Since the row decoder corrects up to t1 errors, any row
still containing errors after the first decoding step contains at least t1 + 1 errors.
There are therefore at most t2 rows with errors after row decoding. Each column
therefore contains at most t2 errors and column decoding then eliminates all the
errors.
There are undecodable patterns having exactly (t1 + 1) · (t2 + 1) errors. Take
a codeword of C1 ⊗ C2 for which we choose (t2 + 1) rows and (t1 + 1) columns at
random. At each intersection between a row and a column, we insert an error in
the initial codeword. By construction, for the word thus obtained, there exists
a codeword for the product code at a distance of (t1 + 1) · (t2 + 1) errors, but
row decoding and column decoding fail.
We can note that row-column decoding is less powerful than syndrome decoding
for a product code. Indeed, a product code is a linear code whose minimum
distance is d1 d2. Syndrome decoding therefore allows us to correct all the words
having at most t errors with t ≈ (d1 d2)/2. But row-column decoding allows
only the words having fewer than (t1 + 1) · (t2 + 1) ≈ (d1/2 + 1)(d2/2 + 1)
errors to be corrected. We therefore lose around a factor 2 in error correction
capability.
Example 8.2
We assume that we have a product code whose row code and column code both
have a minimum distance equal to 5. They are therefore both 2-correcting. The
row-column decoding of the product code, according to the above, can thus
correct any word having at most 8 errors. Figure 8.1 illustrates a word having
10 errors (shown as points) but that can be corrected by row-column decoding.
Figure 8.2 shows a pattern having the same number of errors but not correctable.
Example 8.3
Let us again take the above example with the word of Figure 8.2. During the
first step, those rows with 3 errors will not be able to be corrected by the row
code since the latter can correct only a maximum of 2 errors (MHD 5). They
will therefore each be assigned a weight equal to 2.5. The row with one error will have
a weight equal to 1, while all the remaining rows will have a weight equal to 0.
The configuration is then as shown in Figure 8.3.
At the second step, the correction becomes effective. Only three columns
have errors. Concerning the left-most column with errors, according to
the weights provided by step 1, three symbols in this column have a weight
equal to 2.5, one symbol has a weight equal to 1 and all the others have a
null weight. The second step of the algorithm for this column will therefore
involve three successive decodings: decoding without erasure, decoding with
three erasures (the symbols having a weight equal to 2.5) and decoding with
four erasures (the three previous symbols plus the symbol having a weight of 1).
The first decoding fails since the code is only 2-correcting. The second decoding
succeeds. Indeed, the column code has a minimum distance of 5. It can thus
correct t errors and e erasures when 2t + e < 5. Now, for this decoding, we have
e = 3 (since 3 erasures are placed) and t = 0 (since there are no additional errors
in the column). The weight associated with the second decoding is the sum of the
weights of the symbols erased, that is, 7.5. Likewise, the third decoding (with
4 erasures) succeeds and the weight associated with the word decoded is thus
8.5. The algorithm then chooses from among the decodings having succeeded
the one whose weight is the lowest, that is, the second decoding in this case.
Figure 8.3 – Calculation of the weights of the rows resulting from the first step of the
Reddy-Robinson algorithm.
The two other columns with errors are also decoded. However, for the
right-most column, the second decoding fails (since there is one error in the
non-erased symbols) and the word decoded for this column is therefore that of
the third decoding. Finally, all the errors are corrected.
In this algorithm the role of the rows and of the columns is not symmetric.
Thus, if the whole decoding fails in the initial order, it is possible to try again
by inverting the role of the rows and the columns.
The Reddy-Robinson decoding algorithm is not iterative in its initial version.
We can make it iterative, for example, by starting another decoding with the
final word of step 2, if one of the decodings of step 2 succeeded. There are also
more sophisticated iterative versions [8.16].
extrinsic information about the bits. We can thus decode the product codes in
a turbo manner.
Let therefore r = (r1 , ..., rn ) be the received word after encoding and trans-
mission on a Gaussian channel. The Chase-Pyndiah algorithm with t places is
decomposed as follows:
• Step 1: Select the t places Pk in the frame containing the least reliable
symbols in the frame (i.e. the t places j for which the rj values are the
smallest in absolute value).
• Step 2: Generate the vector of hard decisions h^0 = (h^0_1, ..., h^0_n) such
that h^0_j = 1 if r_j > 0 and 0 otherwise. Generate the vectors h^1, ..., h^{2^t−1}
such that h^i_j = h^0_j if j ∉ {P_k} and h^i_{P_k} = h^0_{P_k} ⊕ Num(i, k), where
Num(i, k) is the k-th bit in the binary representation of i.
• Step 3: Decode the words h^0, ..., h^{2^t−1} with the hard decoder of the linear
code. We thus obtain the concurrent words c^0, ..., c^{2^t−1}.
• Step 4: Calculate the metrics of the concurrent words:
$$M_i = \sum_{1\le j\le n} r_j\,(1 - 2c^i_j)$$
If there are no concurrent words for which the j-th bit is different from
c_{pp,j}, then the reliability F_j is set to a fixed value β.
• Step 7: Calculate the extrinsic value for each bit j,
Ej = (2 × cpp,j − 1) × Fj − rj
The extrinsic values are then exchanged between row decoders and column
decoders in an iterative process. The value β, as well as the values of the feedback
coefficients, are more sensitive here than in the case of decoding convolutional turbo
codes. Inadequate values can greatly degrade the error correction capability.
However, it is possible to determine them incrementally. First, for the first iter-
ation we search to see which values give the best performance (for example by
dichotomy). Then, these values being fixed, we perform a similar search for the
second iteration, and so forth.
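Steps 1 and 2 of the Chase-Pyndiah algorithm amount to flipping the least reliable positions in all possible ways; a minimal sketch (not from the book) is given below, with the algebraic decoding of Step 3 left to an external hard decoder.

```python
import numpy as np

# Minimal sketch (not from the book): Steps 1 and 2 of the Chase-Pyndiah algorithm,
# i.e. generation of the 2^t test patterns obtained by flipping the t least reliable
# positions of the hard-decision word. The algebraic decoding of Step 3 is not shown.
def chase_test_patterns(r, t):
    r = np.asarray(r, dtype=float)
    least_reliable = np.argsort(np.abs(r))[:t]   # Step 1: the t least reliable places
    h0 = (r > 0).astype(int)                     # Step 2: hard decisions
    patterns = []
    for i in range(2 ** t):
        h = h0.copy()
        for k, pos in enumerate(least_reliable):
            h[pos] ^= (i >> k) & 1               # flip according to the k-th bit of i
        patterns.append(h)
    return patterns

# With the received word used in Example 8.4 below and t = 3:
pats = chase_test_patterns([0.5, 0.7, -0.9, 0.2, -0.3, 0.1, 0.6], t=3)
print(pats[0])   # hard-decision word h0: [1 1 0 1 0 1 1]
```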
Example 8.4
Let r = (0.5; 0.7; −0.9; 0.2; −0.3; 0.1; 0.6) be a received sample of a Hamming
codeword (7, 4, 3) and the number of places of the Chase algorithm is equal to
t = 3. We choose β = 0.6. The above algorithm thus gives:
• Step 1: P1 = 6, P2 = 4, P3 = 5.
• Step 2:
I hi
0 (1;1;0;1;0;1;1)
1 (1;1;0;1;0;0;1)
2 (1;1;0;0;0;1;1)
3 (1;1;0;0;0;0;1)
4 (1;1;0;1;1;1;1)
5 (1;1;0;1;1;0;1)
6 (1;1;0;0;1;1;1)
7 (1;1;0;0;1;0;1)
The bits underlined correspond to the inversions performed by the Chase
algorithm.
• Step 3:
I hi ci
0 (1;1;0;1;0;1;1) (1;1;0;1;0;1;0*)
1 (1;1;0;1;0;0;1) (1;1;0;0*;0;0;1)
2 (1;1;0;0;0;1;1) (1;1;0;0;0;0*;1)
3 (1;1;0;0;0;0;1) (1;1;0;0;0;0;1)
4 (1;1;0;1;1;1;1) (1;1;1*;1;1;1;1)
5 (1;1;0;1;1;0;1) (0*;1;0;1;1;0;1)
6 (1;1;0;0;1;1;1) (1;0*;0;0;1;1;1)
7 (1;1;0;0;1;0;1) (1;1;0;0;0*;0;1)
The bits with a star in the column of concurrent words ci correspond to
the places corrected by the hard decoder in word hi .
• Step 4:
I ci Mi
0 (1;1;0;1;0;1;0*) -(0.5)-(0.7)+(-0.9)-(0.2)+(-0.3)-(0.1)+(0.6)=-2.1
1 (1;1;0;0*;0;0;1) -(0.5)-(0.7)+(-0.9)+(0.2)+(-0.3)+(0.1)-(0.6)=-2.7
2 (1;1;0;0;0;0*;1) -(0.5)-(0.7)+(-0.9)+(0.2)+(-0.3)+(0.1)-(0.6)=-2.7
3 (1;1;0;0;0;0;1) -(0.5)-(0.7)+(-0.9)+(0.2)+(-0.3)+(0.1)-(0.6)=-2.7
4 (1;1;1*;1;1;1;1) -(0.5)-(0.7)-(-0.9)-(0.2)-(-0.3)-(0.1)-(0.6)=-0.9
5 (0*;1;0;1;1;0;1) +(0.5)-(0.7)-(-0.9)-(0.2)-(-0.3)+(0.1)-(0.6)=0.3
6 (1;0*;0;0;1;1;1) -(0.5)+(0.7)+(-0.9)+(0.2)-(-0.3)-(0.1)-(0.6)=-0.9
7 (1;1;0;0;0*;0;1) -(0.5)-(0.7)+(-0.9)+(0.2)+(-0.3)+(0.1)-(0.6)=-2.7
• Steps 6 and 7:
j Fj Ej
1 ((0.3)-(-2.7))/4=0.75 0.75-0.5=0.25
2 ((-0.9)-(-2.7))/4=0.475 0.475-0.7=-0.225
3 ((-0.9)-(-2.7))/4=0.475 -0.475-(-0.9)=0.525
4 ((-0.9)-(-2.7))/4=0.475 -0.475-0.2=-0.675
5 ((-2.1)-(-2.7))/4=0.15 -0.15-(-0.3)=0.15
6 ((-0.9)-(-2.7))/4=0.475 -0.475-0.1=-0.575
7 ((-2.1)-(-2.7))/4=0.15 0.15-0.6=-0.4
Figure 8.5 – Evolution of the binary error rate during Chase-Pyndiah turbo decoding
for 1, 2, 3, 4 and 6 iterations (BCH code(64,51,6), 2-PSK on a Gaussian channel).
Example 8.5
We consider the same example as for the Chase algorithm.
Let r = (0.5; 0.7; −0.9; 0.2; −0.3; 0.1; 0.6) be a received sample. The parity
control matrix of the Hamming code of dimension 4 and length 7 is, as we have
already seen,
$$H = \begin{pmatrix}1&1&1&0&1&0&0\\1&1&0&1&0&1&0\\1&0&1&1&0&0&1\end{pmatrix}$$
• Step 1: We have r = (−0.9; 0.7; 0.6; 0.5; −0.3; 0.2; 0.1) and P =
[3, 2, 7, 1, 5, 4, 6].
• Step 2:
$$H^* = \begin{pmatrix}1&1&0&1&1&0&0\\0&1&0&1&0&1&1\\1&0&1&1&0&1&0\end{pmatrix}$$
si Ci M
(0,1,1,1) (0,1,1,1,0,0,0) -2.7
(0,1,1,0) (0,1,1,0,1,1,0) -2.7+1.0+0.2=-1.5
(0,1,0,1) (0,1,0,1,0,1,1) -2.7+1.2-0.6=-2.1
(0,0,1,1) (0,0,1,1,1,0,1) -2.7+1.4+0.2=-1.1
(1,1,1,1) (1,1,1,1,1,1,1) -2.7+1.8+0.0=-0.9
(0,1,0,0) (0,1,0,0,1,0,1) -2.7+2.2+0.2=-0.3
(0,0,1,0) (0,0,1,0,0,1,1) -2.7+2.4-0.6=-0.9
(0,0,0,1) (0,0,0,1,1,1,0) -2.7+2.6+0.2=0.1
(1,1,1,0) (1,1,1,0,0,0,1) -2.7+2.8-0.2=-0.1
(1,1,0,1) (1,1,0,1,1,0,0) -2.7+3.0+0.6=0.9
(1,0,1,1) (1,0,1,1,0,1,0) -2.7+3.2-0.2=0.3
(0,0,0,0) (0,0,0,0,0,0,0) -2.7+3.6+0.0=0.9
(1,1,0,0) (1,1,0,0,0,1,0) -2.7+4.0-0.2=1.1
(1,0,1,0) (1,0,1,0,1,0,0) -2.7+4.2+0.6=2.1
(1,0,0,1) (1,0,0,1,0,0,1) -2.7+4.4-0.2=1.5
(1,0,0,0) (1,0,0,0,1,1,1) -2.7+5.4+0.0=2.7
• Step 5:
j FP (j)
1 ((-0.9)-(-2.7))/4=0.475
2 ((-1.1)-(-2.7))/4=0.4
3 ((-2.1)-(-2.7))/4=0.15
4 ((-1.5)-(-2.7))/4=0.3
5 ((-1.5)-(-2.7))/4=0.3
6 ((-1.5)-(-2.7))/4=0.3
7 ((-2.1)-(-2.7))/4=0.15
J EJ
1 0.3-0.5=-0.2
2 0.4-0.7=-0.3
3 -0.475-(-0.9)=0.525
4 -0.3-0.2=-0.5
5 -0.3-(-0.3)=0.0
6 -0.15-0.1=-0.25
7 0.15-0.6=-0.4
ν_m is the m-th bit of the binary representation of integer ν ($\nu = \sum_{m=0}^{n-k-1}\nu_m 2^m$), and:
$$D_\nu = \prod_{l=0}^{n-1}\left(\rho_l\right)^{t_\nu(l)}$$
t_ν(l) being the l-th bit of the ν-th vector of the dual of the code, that is:
$$t_\nu(l) = \left(\sum_{m=0}^{n-k-1}\nu_m h_{ml}\right)\bmod 2 = \langle\nu, h_l\rangle \bmod 2$$
m=0
high, then n is much higher than n − k and the gain in terms of computation
complexity is high.
To do this, we re-write the general term of D_ν in the form:
$$\left(\rho_l\right)^{t_\nu(l)} = \exp\left\{t_\nu(l)\ln|\rho_l|\right\}\exp\left\{j\pi q_l t_\nu(l)\right\}$$
where q_l is such that ρ_l = (−1)^{q_l}|ρ_l|. We then have:
$$D_\nu = \exp\left\{\sum_{l=0}^{n-1}\Big(t_\nu(l)\ln|\rho_l| + j\pi q_l t_\nu(l)\Big)\right\} = \exp\left\{\sum_{l=0}^{n-1} t_\nu(l)\ln|\rho_l|\right\}\exp\left\{\sum_{l=0}^{n-1} j\pi q_l t_\nu(l)\right\}$$
Put:
$$F_\rho(w) = \sum_{l=0}^{n-1}\ln\left(|\rho_l|\right)\exp\left\{j\pi\sum_{m=0}^{n-k-1} w_m h_{ml}\right\} = \sum_{l=0}^{n-1}\ln\left(|\rho_l|\right)\exp\left\{j\pi t_w(l)\right\}$$
with, in particular, $F_\rho(0) = \sum_{l=0}^{n-1}\ln\left(|\rho_l|\right)$.
On the other hand, if t = 0 or 1, then $\frac{1-\exp\{j\pi t\}}{2} = t$ and
$$\frac{F_\rho(0)-F_\rho(\nu)}{2} = \sum_{l=0}^{n-1} t_\nu(l)\ln|\rho_l|\,.$$
Likewise, if we put $F_q(w) = \sum_{l=0}^{n-1} q_l\exp\left\{j\pi\sum_{m=0}^{n-k-1} w_m h_{ml}\right\}$, we have:
$$\frac{F_q(0)-F_q(\nu)}{2} = \sum_{l=0}^{n-1} q_l\, t_\nu(l)$$
and therefore:
$$D_\nu = \exp\left\{\frac{1}{2}\big(F_\rho(0)-F_\rho(\nu)\big)\right\}\exp\left\{j\pi\,\frac{1}{2}\big(F_q(0)-F_q(\nu)\big)\right\}$$
The two terms F_ρ(ν) and F_q(ν) have a common expression of the form:
$$F(w) = \sum_{l=0}^{n-1} f_l\exp\left\{j\pi\sum_{m=0}^{n-k-1} w_m h_{ml}\right\}$$
$$G(w) = \sum_{p=0}^{2^{n-k}-1} g(p)\,(-1)^{\langle p, w\rangle}$$
Now, function g is null except at the points $p_l = \sum_{m=0}^{n-k-1} h_{ml}\,2^m$ for $l \in [0, \cdots, n-1]$.
Thus, we have:
$$G(w) = \sum_{l=0}^{n-1} f_l\,(-1)^{\left\langle\sum_{m=0}^{n-k-1} h_{ml}2^m,\, w\right\rangle} = \sum_{l=0}^{n-1} f_l\exp\left\{j\pi\sum_{m=0}^{n-k-1} w_m h_{ml}\right\} = F(w)$$
The two terms F_ρ(ν) and F_q(ν) are therefore expressed as Hadamard transforms
and can be calculated by means of the fast Hadamard transform.
Let R = [R_0, R_1, ..., R_{2^n−1}] be a vector with real components. The vector
obtained from R by the Hadamard transform is the vector R̂ = [R̂_0, R̂_1, ..., R̂_{2^n−1}]
such that
$$\hat R_j = \sum_{i=0}^{2^n-1} R_i\,(-1)^{\langle i,j\rangle} \qquad (8.3)$$
The scalar product ⟨i, j⟩ is, as above, the bit-by-bit scalar product of the
binary expansions of i and j. In vector form we also write R̂ = R H_{2^n}, where
H_{2^n} is the Hadamard matrix of order 2^n whose coefficient is (H_{2^n})_{i,j} = (−1)^{⟨i,j⟩}.
Let A be a matrix size a1 ×a2 and B a matrix size b1 ×b2 with real coefficients.
Then the Kronecker product of A by B, denoted A⊗B, is a matrix size (a1 b1 )×
(a2 b2 ) such that (A ⊗ B)i,j = Aq1 q2 Br1 r2 where i = b1 q1 + r1 and j = b2 q2 + r2
with 0 ≤ r1 < b1 and 0 ≤ r2 < b2 .
If N = 2^n, we show that ([8.11]):
$$H_N = G_1\,G_2\cdots G_n \quad\text{with}\quad G_i = I_{2^{i-1}}\otimes H_2\otimes I_{2^{n-i}}$$
The fast Hadamard transform is in fact the use of this factorization to calculate R̂.
Example 8.6
Let us calculate the Hadamard transform of a vector of size N = 8 using this
factorization. The three factors of H_8 are:
$$G_1 = [1]\otimes\begin{pmatrix}1&1\\1&-1\end{pmatrix}\otimes\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix} = \begin{pmatrix}1&0&0&0&1&0&0&0\\0&1&0&0&0&1&0&0\\0&0&1&0&0&0&1&0\\0&0&0&1&0&0&0&1\\1&0&0&0&-1&0&0&0\\0&1&0&0&0&-1&0&0\\0&0&1&0&0&0&-1&0\\0&0&0&1&0&0&0&-1\end{pmatrix}$$
$$G_2 = \begin{pmatrix}1&0\\0&1\end{pmatrix}\otimes\begin{pmatrix}1&1\\1&-1\end{pmatrix}\otimes\begin{pmatrix}1&0\\0&1\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}\otimes\begin{pmatrix}1&0&1&0\\0&1&0&1\\1&0&-1&0\\0&1&0&-1\end{pmatrix} = \begin{pmatrix}1&0&1&0&0&0&0&0\\0&1&0&1&0&0&0&0\\1&0&-1&0&0&0&0&0\\0&1&0&-1&0&0&0&0\\0&0&0&0&1&0&1&0\\0&0&0&0&0&1&0&1\\0&0&0&0&1&0&-1&0\\0&0&0&0&0&1&0&-1\end{pmatrix}$$
$$G_3 = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\otimes\begin{pmatrix}1&1\\1&-1\end{pmatrix} = \begin{pmatrix}1&1&0&0&0&0&0&0\\1&-1&0&0&0&0&0&0\\0&0&1&1&0&0&0&0\\0&0&1&-1&0&0&0&0\\0&0&0&0&1&1&0&0\\0&0&0&0&1&-1&0&0\\0&0&0&0&0&0&1&1\\0&0&0&0&0&0&1&-1\end{pmatrix}$$
The matrices G_i are sparse matrices having only two non-null elements per
column. Moreover, the factorization of the Hadamard matrix H_N contains a
number of factors proportional to log(N). The total computation cost is therefore
in N log(N) with the fast transform, instead of N² with the direct method. Figure 8.6
presents the graph of the calculations for the fast Hadamard transform in the case N = 8.
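A minimal sketch of the corresponding butterfly computation is given below (not from the book); it can be checked against the direct definition (8.3).

```python
import numpy as np

# Minimal sketch (not from the book): in-place fast Hadamard transform of a length-2^n
# vector, i.e. the butterfly computation corresponding to the factorization H_N = G_1...G_n.
def fast_hadamard_transform(r):
    r = np.array(r, dtype=float)
    n = len(r)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = r[j], r[j + h]
                r[j], r[j + h] = a + b, a - b   # butterfly (1, 1; 1, -1)
        h *= 2
    return r

# Check against the direct definition (8.3) for a small vector:
x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0])
H8 = np.array([[(-1) ** bin(i & j).count("1") for j in range(8)] for i in range(8)])
print(np.allclose(fast_hadamard_transform(x), H8 @ x))  # True
```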
In terms of error correcting performance, article [8.6] shows that we obtain
the same performance as the Chase-Pyndiah algorithm with half as many iterations.
Figure 8.6 – Graph of the computation flow for the fast Hadamard transform N = 8.
optimization. This technique involves posing the problem of searching for the
most probable word as a problem of minimizing a global cost function, having as
its variables the information bits of the word. This optimization must be done
theoretically in integers, which makes the problem very difficult. The technique
used is to replace these integer variables by real variables. The problem is then
no longer strictly equivalent to the initial problem but it becomes possible to use
classical non-linear optimization techniques, like the gradient method, to search
for the minimum of the cost function.
Another approach introduced by Kschischang and Sorokine [8.17] involves
using the fact that most block codes used in turbo product codes are in fact trellis
codes. It is then possible to use classical algorithms (MAP, Max-Log-MAP, . . . )
for decoding trellis codes to obtain the soft decisions of the decoder. This type
of algorithm is an application, in the case of turbo product codes, of the
maximum likelihood decoding algorithms of Wolf [8.19].
Another more recently invented algorithm is Kötter and Vardy’s [8.10]. This
algorithm is only applicable to very specific codes, mainly including Reed-
Solomon codes and algebraic codes. It is based on the Sudan algorithm [8.18]
which is capable of determining the codewords in a neighbourhood close to the
received word. It is then possible to use conventional weighting techniques. Al-
though the initial version of this algorithm is relatively computation-costly, theoretical
improvements have been made and this algorithm is all the more promising
since Reed-Solomon codes are widely used in the domain of telecommunications.
There are also decoding algorithms based on sub-codes [8.12]. Although
slightly complex at implementation level, these algorithms provide excellent
performance. Finally, recent studies have shown that decoding algorithms based on
belief propagation can be applied to linear codes in general [8.9].
Each half-iteration processor takes at its input the channel data as well as
the extrinsic values produced by the previous half-iteration. At the output,
the processor transfers the channel data (to ensure a pipeline operation of the
decoding chain), the hard decisions of the most probable codeword and the ex-
trinsic values calculated by the Chase-Pyndiah algorithm. The architecture of
the processor is illustrated in Figure 8.8. The FIFOs (First-In/First-Out) are
used to synchronize the input and output data of the SISO decoder which has
a certain latency. Small sized FIFOs are generally implemented by rows of D
flip-flops. When size increases, for reasons of hardware space and consumption,
it becomes fairly quickly worthwhile using a RAM memory and two pointers
(one for writing and one for reading), incremented at each new datum, with the
addressing managed circularly. We can also make a remark about the multipli-
cation of the extrinsic values by α. In a conventional implementation, values
Wk are generally integers of reduced size (5 or 6 bits) but the hardware cost
of a real multiplier is prohibitive, and we generally prefer to substitute it by a
simple table.
The SISO decoder, described by Figure 8.9, performs the steps of the Chase-
Pyndiah algorithm. The decoder is made up of five parts:
– The module for sequential processing of the data calculates in parallel the
input codeword syndrome and the least reliable positions in the frame.
– The algebraic decoding module performs the algebraic decoding of the
words built from input Rk data and knowledge of the least reliable places.
– The selection module determines the most probable codeword as well as
the closest concurrent word or words.
– The module for calculating the weightings determines the reliability of the
decoded bits.
– The memory module stores the total input weightings that are used to
calculate the weightings.
The module for processing the data receives the sample bits, one after the
other. If the code is cyclic (BCH, for example) calculating the syndrome is
then very simply done by using the factorization of the generator polynomial
following the Hörner scheme. Determining the least reliable positions is often
done by sequentially managing the list of the least reliable positions in a small
local RAM. There are also other solutions that are more economical in size, like
Leiserson’s systolic array, but the gain obtained is small.
The algebraic decoding module uses the value of the syndrome to determine
the erroneous places in the concurrent vectors. In the case of BCH codes, it is
possible to use the Berlekamp-Massey algorithm or the extended Euclid algo-
rithm to make the correction. It should, however, be noted that this solution
is really only economical for block codes with high correcting power. For codes
with low error correction capability, it is less costly to store the bits to be cor-
rected in a local ROM, for each possible value of the syndrome.
The selection module must sort among the words generated by the algebraic
decoding module to determine the most probable ones. It must therefore calcu-
late the metric of each of these words (which it does sequentially by additions)
and determine, by computation of the minimum value, the most probable among
them (their number is generally limited, for the sake of space).
Finally, the module for calculating the weightings uses the list of concurrent
words chosen above to generate the weightings from the equation of step 6 of
the Chase-Pyndiah algorithm. This module has low complexity since the calcu-
lations to be done are relatively simple and, for each bit, it must keep only two
values sequentially (the smallest metric among the concurrent words having 0 as
the corresponding value for this bit in its binary development, and the smallest
metric for the candidate words having 1 as their binary value). This module also
contains the value β. In the case of an FPGA (Field Programmable Gate Array)
implementation, all the iterations are generally executed on the same hardware which
is re-used from half-iteration to half-iteration. We must therefore anticipate a
procedure for loading value β.
Bibliography
[8.1] P. Adde, R. Pyndiah, and O. Raoul. Performance and complexity of
block turbo decoder circuits. In Proceedings of Third International Con-
ference on Electronics, Circuits and Systems, (ICECS’96), pages 172–175,
Rhodes, Greece, 1996.
[8.13] L.E. Nazarov and V.M. Smolyaninov. Use of fast Walsh-Hadamard trans-
formation for optimal symbol-by-symbol binary block-code decoding.
Electronics Letters, 34:261–262, 1998.
LDPC codes
Low Density Parity Check (LDPC) codes make up a class of block codes that
are characterized by a sparse parity check matrix. They were first described
in Gallager’s thesis at the beginning of the 60s [9.21]. Apart from the hard
input decoding of LDPC codes, this thesis proposed iterative decoding based
on belief propagation (BP). This work was forgotten for 30 years. Only a few
rare studies referred to it during this dormant period, in particular, Tanner’s
which proposed a generalization of the Gallager codes and a bipartite graph
[9.53] representation.
After the invention of turbo codes, LDPC codes were rediscovered in the
middle of the 90s by MacKay et al. [9.39], Wiberg [9.64] and Sipser et al.
[9.52]. Since then, considerable progress concerning the rules for building good
LDPC codes, and coding and decoding techniques, has enabled LDPC codes
to be used, like turbo codes, in practical applications.
This chapter gives an overview of the encoding and decoding of LDPC codes,
and some considerations about hardware implementations.
The circles represent the binary data ci , also called variables. The rectangle con-
taining the exclusive or operator represents the parity equation (also called the
parity constraint, or parity). The links between the variables and the operator
indicate the variables involved in the parity equation.
There are two codewords in which bit c3 is equal to 0: codewords (0,0,0) and
(1,1,0). Similarly, there are two codewords in which bit c3 is equal to 1: code-
words (1,0,1) and (0,1,1). We deduce from this the following two equations in
the probability domain:
$$\begin{cases}\Pr(c_3=1) = \Pr(c_1=1)\times\Pr(c_2=0) + \Pr(c_1=0)\times\Pr(c_2=1)\\[1mm]\Pr(c_3=0) = \Pr(c_1=0)\times\Pr(c_2=0) + \Pr(c_1=1)\times\Pr(c_2=1)\end{cases} \qquad (9.2)$$
Using the expression of each probability according to the likelihood ratio func-
tion, deduced from Equation (9.1):
$$\begin{cases}\Pr(c_j=1) = \dfrac{\exp(L(c_j))}{1+\exp(L(c_j))}\\[2mm]\Pr(c_j=0) = 1-\Pr(c_j=1) = \dfrac{1}{1+\exp(L(c_j))}\end{cases}$$
we have:
$$L(c_3) = \ln\frac{1+\exp\big(L(c_1)+L(c_2)\big)}{\exp\big(L(c_1)\big)+\exp\big(L(c_2)\big)} \triangleq L(c_1)\oplus L(c_2) \qquad (9.3)$$
Equation (9.3) enables us to define the switching operator ⊕ between the two
LLRs of the variables c1 and c2 .
Applying the function tanh(x/2) = (exp(x) − 1)/(exp(x) + 1) to Equation (9.3), the latter becomes:
$$\tanh\left(\frac{L(c_3)}{2}\right) = \tanh\left(\frac{L(c_1)}{2}\right)\tanh\left(\frac{L(c_2)}{2}\right) \qquad (9.4)$$
It is practical (and frequent) to separate the processing of the sign and the
magnitude in Equation (9.4) which can then be replaced by the following two
equations:
$$\operatorname{sgn}\big(L(c_3)\big) = \prod_{j=1,2}\operatorname{sgn}\big(L(c_j)\big) \qquad (9.5)$$
$$\tanh\frac{|L(c_3)|}{2} = \prod_{j=1,2}\tanh\frac{|L(c_j)|}{2} \qquad (9.6)$$
where the sign function sgn(x) is such that sgn(x) = +1 if x ≥ 0 and sgn(x) = −1
otherwise.
Processing the magnitude given by Equation (9.6) can be simplified by taking
the logarithm of each of the terms of the equation, which gives:
$$|L(c_3)| = f^{-1}\!\left(\sum_{j=1,2} f\big(|L(c_j)|\big)\right) \qquad (9.7)$$
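As an illustration, the parity update of Equations (9.5) to (9.7) can be written as follows; this sketch (not from the book) assumes the usual choice f(x) = −ln(tanh(x/2)), which is its own inverse.

```python
import math

# Minimal sketch (not from the book): the parity (check node) update of (9.5)-(9.7),
# assuming f(x) = -ln(tanh(x/2)), a function equal to its own inverse.
def f(x: float) -> float:
    x = max(x, 1e-12)                      # avoid log(0) for very small magnitudes
    return -math.log(math.tanh(x / 2.0))

def check_node_llr(llrs):
    """LLR of the bit constrained by a parity equation, given the other bits' LLRs."""
    sign = 1.0
    mag_sum = 0.0
    for l in llrs:
        sign *= 1.0 if l >= 0 else -1.0    # sign rule (9.5)
        mag_sum += f(abs(l))               # magnitude rule (9.6)-(9.7)
    return sign * f(mag_sum)

# Example: L(c3) = L(c1) "box-plus" L(c2)
print(check_node_llr([1.2, -0.7]))
```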
Such a representation is called the bipartite graph of the code. In this graph,
branches link two different classes of nodes to each other:
• The first class of nodes called variable nodes, correspond to the bits of the
codewords (cj , j ∈ {1, · · · , n}), and therefore to the columns of H.
• The second class of nodes, called parity check nodes, correspond to the
parity check equations (ep , p ∈ {1, · · · , m}), and therefore to the rows of
H.
Thus, to each branch linking a variable node cj to a parity check node ep corre-
sponds the 1 that is situated at the intersection of the j-th column and the p-th
row of the parity check matrix.
By convention, we denote P (j) (respectively J(p)) all the indices of the
parity nodes (respectively variable nodes) connected to the variable with index
j (respectively to the parity with index p). We denote by P (j)\p (respectively
J(p)\j) all the P (j) not having index p (respectively, all the J(p) not having
index j). Thus, in the example of Figure 9.2, we have
in the cycle. The graph being bipartite, the size of the cycles is even. The
size of the shortest cycle in a graph is called the girth. The presence of cycles
in the graph may degrade the decoding performance by a phenomenon of self-
confirmation during the propagation of the messages. Figure 9.3 illustrates two
cycles of sizes 4 and 6.
Figure 9.4 – Parity check matrix of a regular (3,6) LDPC code of size n = 256 and
rate R = 0.5.
and the total number E of 1s in matrix H. For example, λ(x) = 0.2x⁴ + 0.8x³
indicates a code where 20% of the 1s are associated with variables of degree 5
and 80% with variables of degree 4. Note that, by definition, λ(1) = Σ_j λ_j = 1.
Moreover, the proportion of variables of degree j in the matrix is given by
$$\bar\lambda_j = \frac{\lambda_j/j}{\sum_k \lambda_k/k}$$
Symmetrically, the irregularity profile of the parities is represented by the
polynomial ρ(x) = Σ_p ρ_p x^{p−1}, the coefficient ρ_p being equal to the ratio between the
accumulated number of 1s in the rows (or parities) of degree p and the total
number of 1s denoted E. Similarly, we obtain ρ(1) = Σ_p ρ_p = 1. The proportion ρ̄_p
of rows of degree p in matrix H is given by
$$\bar\rho_p = \frac{\rho_p/p}{\sum_k \rho_k/k}$$
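As a small numerical illustration (not from the book), the node-wise proportions can be computed from the edge-wise coefficients as follows; the function name is ours.

```python
# Minimal sketch (not from the book): node-wise proportions of variables or parities
# obtained from the edge-wise degree distributions lambda(x) or rho(x) defined above.
def node_proportions(edge_fractions):
    """edge_fractions: {degree: fraction of 1s attached to nodes of that degree}."""
    weights = {d: frac / d for d, frac in edge_fractions.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# lambda(x) = 0.2 x^4 + 0.8 x^3: 20% of the 1s on degree-5 variables, 80% on degree-4.
print(node_proportions({5: 0.2, 4: 0.8}))   # node-wise proportions lambda_bar_j
```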
Irregular codes have more degrees of freedom than regular codes and it is
thus possible to optimize them more efficiently: their asymptotic performance
is better than that of regular codes.
Coding rate
Consider a parity equation of degree dc . It is possible to arbitrarily fix the
values of the dc − 1 first bits; only the last bit is constrained and corresponds
to the redundancy. Thus, in a parity matrix H of size (m,n), each of the m
rows corresponds to 1 redundancy bit. If the m rows of H are independent, the
code then has m redundancy bits. The total number of bits of the code being
n, the coding rate is thus R = (n − m)/n = 1 − m/n.
9.1.3 Encoding
Encoding an LDPC code can turn out to be relatively complex if matrix H does
not have a particular structure. There exist generic encoding solutions, including
an algorithm with complexity in O(n), requiring complex preprocessing on the
matrix H. Another solution involves directly building matrix H so as to obtain
a systematic code very simple to encode. It is this solution, in particular, that
was adopted for the standard DVB-S2 code for digital television transmission
by satellite.
Generic encoding
Encoding with a generator matrix
LDPC codes being linear block codes, the coding can be done via the gen-
erator matrix G of size k × n of the code, such as defined in Chapter 4. As
we have seen, LDPC codes are defined from their parity check matrix H, which
is generally not systematic. A transformation of H into a systematic matrix
Hsys is possible, for example with the Gaussian elimination algorithm. This
relatively simple technique, however, has a major drawback: the generator ma-
trix G_sys of the systematic code is generally not sparse. The coding complexity
increases rapidly, in O(n²), which makes this operation too complex for usual
length codes.
Coding with linear complexity
Richardson et al. [9.49] proposed a solution enabling quasi-linear complexity
encoding, as well as greedy algorithms making it possible to preprocess parity
check matrix H. The aim of the preprocessing is to put H as close as possible
to a lower triangular form, as illustrated in Figure 9.5, using only permutations
Figure 9.5 – Representation in the lower pseudo-triangular form of the parity check
matrix H.
$$c\,H^T = 0 \qquad (9.16)$$
Specific constructions
Coding with a sparse generator matrix
One idea proposed by Oenning et al. [9.45] involves directly building a sparse
systematic generator matrix, so the coding is performed by simple multiplication
and the parity check matrix remains sparse. These codes are called Low-Density
Generator-Matrix (LDGM) codes. Their performance is however poor [9.36],
even if it is possible to optimize their construction [9.22] and lower the error
floor.
Encoding by solving the system cH T = 0 obtained by substitution
Mackay et al. [9.40] propose to constrain the parity matrix so that it is
composed of the three sub-matrices A, B and C arranged as in Figure 9.6.
Figure 9.6 – Specific construction of parity check matrix H facilitating the encoding.
tively small. These classes therefore offer only a very limited number of possible
size – rate – irregularity profile combinations.
Summary
Table 9.1 summarizes the different possible types of coding encountered in the
literature. In practice, the conventional encoding of block codes by a generator
matrix is not used for LDPC codes due to the large length of the codewords.
The codes obtained by projective or finite geometry cannot be optimized (opti-
mal design of the irregularity profiles). There therefore remain only codes built
to facilitate the encoding by solving the equation cH T = 0 by substitution, such
as the one chosen for the DVB-S2 standard.
latter, very many constraint nodes, each node being made up of the simplest
linear code possible (parity code). Note that from this representation of the
bipartite graph, an infinite number of more or less exotic codes can be built.
It is interesting to note that encoding LDPC codes is tending to be performed
more and more like the encoding of turbo codes (in a serial concatenation).
The precursors were, without doubt, the Repeat Accumulate codes proposed in
[9.16] whose encoding is composed of a repetition code, an interleaver and an
accumulator. These codes are then decoded by an algorithm of the LDPC type
with an adapted schedule. In the literature, we can now find many variants
of this type of encoding, which involves combining elementary encoders and
interleavers.
The similarity between turbo codes and LDPC codes is even greater than can
be assumed from the representations in the form of bipartite graphs. Indeed,
it is shown in [9.36], [9.44] and in Section 6.2 that it is possible to represent
a turbo code in the form of an LDPC matrix. The resemblance stops there.
Parity check matrix H of a turbo code contains many rectangular patterns (four
1s making a rectangle in matrix H), that is, many cycles of length 4, which make
the algorithms for decoding LDPC codes, to be described below, inefficient.
method for quantifying the sequence received, r, determines the choice of de-
coding algorithm.
where c_j is the j-th bit of the codeword and r_j = (2c_j − 1) + b_j. In the case of
the additive white Gaussian noise channel, the noise samples b_j follow a centred
Gaussian law with variance σ², that is:
$$p(r_j|c_j) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\big(r_j-(2c_j-1)\big)^2}{2\sigma^2}\right) \qquad (9.20)$$
$$I_j \stackrel{\Delta}{=} L(r_j|c_j) = -\frac{2r_j}{\sigma^2} \qquad (9.21)$$
Each iteration of the BP algorithm is decomposed into two steps:
The iterations are repeated until the maximum number of iterations Nit is
reached. It is possible to stop the iterations before Nit when all the parity
equations are satisfied. This enables either a gain in mean throughput, or a
limit in consumption.
We call L_j the total information, or the LLR, of bit j. This is the sum of
the intrinsic information I_j and the total extrinsic information Z_j, which is by
definition the sum of the extrinsic information of the branches Z_{j,p}:
$$Z_j \stackrel{\Delta}{=} \sum_{p\in P(j)} Z_{j,p} \qquad (9.24)$$
The BP algorithm is optimal in the case where the graph of the code does not
contain any cycle: all the schedules1 give the same result. As LDPC codes
involve cycles, their decoding by the BP algorithm can lead to phenomena of
self-confirmation of the messages which degrade the convergence and make the
BP algorithm distinctly sub-optimal. However, these phenomena can be limited
if the cycles are large enough.
The first schedule proposed is called the "flooding schedule" [9.35]. It
involves successively processing all the parities then all the variables.
Initialization:
1- n_it = 0; Z_{j,p}^{(0)} = 0 ∀p, ∀j ∈ J(p); I_j = 2y_j/σ² ∀j
Repeat until n_it = N_it or until the system has converged towards a codeword:
2- n_it = n_it + 1
3- ∀ j ∈ {1, · · · , n} do: {computation of the variable-to-parity messages}
4- $Z_j^{(n_{it})} = \sum_{p\in P(j)} Z_{j,p}^{(n_{it}-1)}$ and $L_j^{(n_{it})} = I_j + Z_j^{(n_{it})}$
5- ∀ p ∈ P(j):
$$L_{j,p}^{(n_{it})} = I_j + \sum_{p'\in P(j)\backslash p} Z_{j,p'}^{(n_{it}-1)} = I_j + Z_j^{(n_{it})} - Z_{j,p}^{(n_{it}-1)}$$
1 By schedules, we mean the order in which each parity and each variable is processed.
The decoded bits are then estimated by $\operatorname{sgn}\!\left(-L_j^{(n_{it})}\right)$.
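A minimal sketch of one such iteration is given below (not from the book); the variable-to-parity messages follow steps 3 to 5 above, and the parity-to-variable update reuses the sign and magnitude rules of Equations (9.5) to (9.7).

```python
import numpy as np

# Minimal sketch (not from the book): one flooding iteration of BP decoding for a small
# binary parity check matrix H. Variable-to-parity messages follow steps 3 to 5 above;
# the parity-to-variable update applies the sign and magnitude rules of (9.5)-(9.7).
def f(x):
    return -np.log(np.tanh(np.clip(x, 1e-12, None) / 2.0))

def bp_iteration(H, intrinsic, Z):
    """H: (m, n) 0/1 matrix; intrinsic: length-n LLRs I_j; Z: (m, n) extrinsic messages."""
    m, n = H.shape
    L_msg = np.zeros((m, n))
    for j in range(n):                                    # variable-to-parity messages
        parities = np.nonzero(H[:, j])[0]
        total = intrinsic[j] + Z[parities, j].sum()
        L_msg[parities, j] = total - Z[parities, j]
    Z_new = np.zeros((m, n))
    for p in range(m):                                    # parity-to-variable messages
        variables = np.nonzero(H[p, :])[0]
        for j in variables:
            others = [v for v in variables if v != j]
            signs = np.where(L_msg[p, others] >= 0, 1.0, -1.0)
            Z_new[p, j] = np.prod(signs) * f(np.sum(f(np.abs(L_msg[p, others]))))
    return Z_new
```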
It is interesting to note that it is possible to modify the algorithm by "order-
ing" the flooding schedule depending on the parity nodes. The latter are then
processed serially, and the algorithm becomes:
3'- ∀ j ∈ {1, · · · , n}: $Z_j^{(n_{it}+1)} = 0$
4'- ∀ p ∈ {1, · · · , m} do:
$$L_{j,p}^{(0)} = -\frac{2r_j}{\sigma^2} \quad\text{with}\quad r_j \sim \mathcal N\!\left(-1, \sigma^2\right) \qquad (9.26)$$
therefore $L_{j,p}^{(0)} \sim \mathcal N\!\left(\dfrac{2}{\sigma^2}, \dfrac{4}{\sigma^2}\right)$.
We denote:
– $m_j^{(0)} = \frac{2}{\sigma^2}$, the mean of the consistent Gaussian probability density of
variable c_j of degree d_v sent to the parities e_p of degree d_c which are connected
to it,
– $\mu_p^{(n_{it})}$, the mean of the messages $Z_{j,p}^{(n_{it})}$.
To follow the evolution of the mean $m_j^{(n_{it})}$ during the iterations n_it, it then
suffices to take the mathematical expectation of Equations (9.22) and (9.23)
relative to the variable and parity processing, which gives:
$$\Psi\!\left(\mu_p^{(n_{it})}\right) = \left[\Psi\!\left(m_j^{(n_{it})}\right)\right]^{d_c-1} \quad\text{with}\quad \Psi(m) = E\left[\tanh(x/2)\right],\ x\sim\mathcal N(m, 2m) \qquad (9.27)$$
$$m_j^{(n_{it}+1)} = \frac{2}{\sigma^2} + (d_v-1)\,\mu_p^{(n_{it})} \qquad (9.28)$$
Thus for a regular (dv , dc ) LDPC code and for a given noise with variance
σ 2 , Equations (9.27) and (9.28) enable us, by an iterative computation, to know
if the mean of the messages tends towards infinity or not. If such is the case,
it is possible to decode without errors with a codeword of infinite size and an
infinite number of iterations. In the case of an irregular code, it suffices to make
the weighted mean on the different degrees of Equations (9.27) and (9.28).
The maximum value of σ for which the mean tends towards infinity, and
therefore for which the error probability tends towards 0, is the threshold of
the code. For example, the threshold of a regular code (3,6), obtained with the
density evolution algorithm, is σ_max = 0.8809 [9.13], which corresponds to a
minimum signal to noise ratio of (E_b/N_0)_min = 1.1 dB.
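A minimal sketch of this computation (not from the book) is given below; Ψ is evaluated by numerical integration and inverted by bisection, and the threshold can be located by trying several values of σ.

```python
import numpy as np

# Minimal sketch (not from the book): Gaussian-approximation density evolution for a
# regular (dv, dc) LDPC code, iterating (9.27) and (9.28). Psi(m) = E[tanh(x/2)], with
# x ~ N(m, 2m), is evaluated numerically and inverted by bisection.
def psi(m, num=2001):
    if m <= 0:
        return 0.0
    x = np.linspace(m - 12 * np.sqrt(2 * m), m + 12 * np.sqrt(2 * m), num)
    pdf = np.exp(-(x - m) ** 2 / (4 * m)) / np.sqrt(4 * np.pi * m)
    return float(np.sum(np.tanh(x / 2) * pdf) * (x[1] - x[0]))

def psi_inv(y, lo=1e-6, hi=1e3):
    for _ in range(60):                       # bisection: psi is increasing in m
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if psi(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

def converges(sigma, dv=3, dc=6, iterations=200, target=100.0):
    m = 2.0 / sigma ** 2                      # initial mean of the messages
    for _ in range(iterations):
        mu = psi_inv(psi(m) ** (dc - 1))      # parity update, Equation (9.27)
        m = 2.0 / sigma ** 2 + (dv - 1) * mu  # variable update, Equation (9.28)
        if m > target:                        # the mean diverges: error-free decoding
            return True
    return False

print(converges(0.84), converges(0.95))       # True, False (the threshold is near 0.88)
```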
Another technique derived from extrinsic information transfer (EXIT)
charts2 proposed by ten Brink [9.55, 9.56] enables the irregularity profiles to
be optimized. Whereas the density evolution algorithm is interested in the evo-
lution of the probability densities of the messages during the iterations, these
charts are interested in the transfer of mutual information between the input
and the output of the decoders of the constituent codes [9.56]. The principle of
these charts has also been used with parameters other than mutual information,
like the signal to noise ratio or error probability [9.3, 9.2]. It has also been
applied to other types of channels [9.15].
2 The principle of building EXIT charts is described in Section 7.6.3
Figure 9.8 shows the performance of an LDPC code for different sizes and differ-
ent rates in the case of a DVB-S2 decoder implemented on an Altera Stratix80
FPGA.
LDPC codes therefore have an excellent theoretical performance. This must
however be translated by simplicity in their hardware implementation to enable
these codes to be used in practice. That is why particular attention must be
paid to LDPC decoder architectures and implementations.
(Curves: FER versus Eb/N0 for rates R = 1/2, 3/4, 4/5, 5/6, 8/9 and 9/10.)
Figure 9.8 – Packet error rate (or Frame error rate, FER) obtained for codeword sizes
of 64 kbits and different rates of the DVB-S2 standard (50 iterations, fixed point).
With the permission of TurboConcept S.A.S, France.
where Ix is the identity matrix whose columns have been right-shifted x times,
and where a and b are two non-zero elements in Fp of order j and k respectively.
Array-based LDPC
Array codes are two-dimensional codes that have been proposed for detecting
and correcting burst errors [9.6]. When viewed as binary codes, the parity check
matrix of array codes exhibit sparseness, which can be exploited for decoding
them as LDPC codes using the BP algorithm [9.19]. Therefore, array codes
provide the framework for defining a family of LDPC codes that lend themselves
to deterministic constructions [9.18]. The parity check matrix of an array-based
LDPC code is:
$$H = \begin{pmatrix}I & I & I & \cdots & I\\ I & \alpha & \alpha^2 & \cdots & \alpha^{k-1}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ I & \alpha^{j-1} & \alpha^{2(j-1)} & \cdots & \alpha^{(j-1)(k-1)}\end{pmatrix}$$
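A minimal sketch of this construction (not from the book) is given below; it assumes that α denotes the p × p circulant matrix obtained by cyclically shifting the identity, consistently with the shifted identity matrices mentioned above.

```python
import numpy as np

# Minimal sketch (not from the book): building the binary parity check matrix of an
# array-based LDPC code, assuming alpha is the p x p circulant matrix obtained by
# cyclically shifting the identity (p prime), so that the (r, c) block is alpha^(r*c).
def circulant_shift(p, s):
    return np.roll(np.eye(p, dtype=int), s, axis=1)   # identity with columns shifted s times

def array_ldpc_parity_matrix(j, k, p):
    blocks = [[circulant_shift(p, (r * c) % p) for c in range(k)] for r in range(j)]
    return np.block(blocks)                           # (j*p) x (k*p) binary matrix

H = array_ldpc_parity_matrix(j=3, k=5, p=7)
print(H.shape, H.sum(axis=0)[:5])   # (21, 35); every column has weight j = 3
```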
$$vr = bk \qquad\qquad \lambda(v-1) = r(k-1)$$
A BIBD is therefore commonly simply written as (v, k, λ), since b and r are
given in terms of v, k, and λ by
$$b = \frac{v(v-1)\lambda}{k(k-1)} \qquad\qquad r = \frac{\lambda(v-1)}{k-1}$$
B = dv n = dc m (9.29)
$$s_j = \mathop{\otimes}\limits_{i\neq j} e_i \qquad (9.31)$$
Figure 9.10 – The different "compact mode" architectures for implementing the generic
operator ⊗.
is applied at all the inputs (total sum) then each output is calculated by
eliminating the contribution of the corresponding input, with the help of
the inverse operator.
It is possible to modify these architectures in order to introduce intermediate
pipeline registers enabling the critical path to be reduced. There are also archi-
tectures of the serial type (Figure 9.10(c)).
In what follows, the degree of parallelism of a GNP will be denoted αg . This
is the number of cycles necessary to process a node (without considering latency
due to the pipeline processing). Thus, for a parallel architecture capable of
processing one node at each clock cycle, αg = 1, whereas for a serial architecture,
αg = d.
Note that in all the GNP architectures presented, we implicitly made the
hypothesis that all the inputs were available and that all the outputs had to
be generated either simultaneously (parallel architecture), or grouped in time
(serial architecture). This kind of GNP control mode is called the "compact
mode".
• The GNP can therefore, at the request of the system, calculate the i-th
output $s_i^{(n_{it})} = E^{(n_{it})} \otimes \mathrm{inv}_\otimes\!\left(e_i^{(n_{it}-1)}\right)$.
• This output is sent, via the interleaver, to the opposite node which, once
the computation is over, returns $e_i^{(n_{it})}$.
• This new value then replaces $e_i^{(n_{it}-1)}$ in the memory and is also accu-
mulated to obtain the value of $E^{(n_{it}+1)}$ at the end of the iteration. Two
accumulation modes are possible:
$$E^{(n_{it})} = E^{(n_{it})} \otimes e_i^{(n_{it})} \otimes \mathrm{inv}_\otimes\!\left(e_i^{(n_{it}-1)}\right) \qquad (9.32)$$
At the end of the iteration, we thus have $E^{(n_{it}+1)} = E^{(n_{it})}$. This
solution offers two advantages in relation to delayed updating:
• one less memory word;
• an acceleration in the convergence of the algorithm as the new
values of the inputs are taken into account sooner.
Table 9.2 – Value of the generic operator associated with the variable processors
(VNPs) and parity processors (PNPs) as a function of the position of the intercon-
nection network.
makes it possible to randomly access the nodes associated with the same node
processor, by memory addressing, for example. The combination of these two
types of permutations enables a random interconnection such as that existing
between the variable nodes and the parity nodes of the LDPC code.
Example of an implementation
To help clarify ideas about the way to organize the computations and the propa-
gation of the messages in the decoder, and to truly understand the link between
the organization of the propagation of the messages and the structure of the
LDPC code, Figure 9.14 shows a simple example of decoding an LDPC code of
length n = 12 and rate R = 0.5 (therefore m = 6), with P = 2, dc = 3, α = 1
and β = dv . There are therefore P = 2 parity node processors and n/P = 6
variable node processors. One iteration is performed in m/P = 3 steps:
• At the first cycle, reading the information relative to the bits is done in
each of the VNPs, each of them containing two bits of the codeword
(in practice, the number of bits per VNP can be much higher). These bits are shaded in grey in
each VNP: this is space permutation.
• This information is then sent to the PNPs via the permutation network,
whose address was generated from the cycle number (read from a memory,
for example): this is time permutation.
• Combining the two describes the random interleaving between the vari-
ables and the first two parities of the bipartite graph shown on the right-
hand part of the figure.
In a single cycle, the first two parities will therefore be able to be processed.
The following two will be processed at the second cycle and so on and so forth,
until all the parities of the code have been processed. Note that this technique
where the PNP information arrives simultaneously prevents two bits contained
in the same VNP from being involved in the same parity. Thus, for example, bits
1 and 2 cannot be involved in the same parity otherwise that would lead to a
memory conflict. This solution therefore imposes constraints on matrix H, if we
want it to be decodable by this structure. One solution to relax the constraints
involves, for example, entering the data serially into the parities.
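The constraint just described can be checked directly on the parity check matrix. The following sketch is a hypothetical helper of our own (the column-to-VNP mapping is one arbitrary choice, not the book's) that verifies that no parity equation involves two bits stored in the same VNP:

```python
import numpy as np

def check_no_memory_conflict(H, n_vnp):
    """H: binary parity check matrix (m x n). Columns j and j' are assumed to
    be stored in the same VNP when j % n_vnp == j' % n_vnp. Returns True if
    every parity uses at most one bit per VNP (no memory conflict)."""
    for row in H:
        bits = np.flatnonzero(row)
        vnps = bits % n_vnp
        if len(set(vnps.tolist())) != len(bits):
            return False          # two bits of this parity share a VNP
    return True

# Toy example with n = 12, m = 6 and 6 VNPs, as in Figure 9.14
H = np.zeros((6, 12), dtype=int)
for p, cols in enumerate([(0, 4, 8), (1, 5, 9), (2, 6, 10),
                          (3, 7, 11), (0, 5, 10), (1, 6, 11)]):
    H[p, list(cols)] = 1
print(check_no_memory_conflict(H, n_vnp=6))
```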
Figure 9.14 – Example of a message propagation architecture: link between the de-
coder’s addressing and code structure.
only be computed when all the parities have been processed. This control mode
implements a flooding schedule according to the parities. Symmetrically, we
make a flooding schedule appear according to the variables, when the PNPs are
in distributed mode and the VNPs are in compact mode. These three types of
schedules converge towards the same values: they do not change the information
propagation operation.

Table 9.3 – Schedules associated with the different combinations of the node
processor controls.

                                      VNP compact              VNP distributed,     VNP distributed,
                                                               delayed update       immediate update
PNP compact                           Flooding (parity)        Flooding (parity)    Interleaving (horizontal)
PNP distributed, delayed update       Flooding (variable)      Branches             Branches
PNP distributed, immediate update     Interleaving (vertical)  Branches             Branches
When one of the two types of processor is controlled in compact mode and
the other in distributed mode with immediate update, we implement a schedule
of the horizontal or vertical interleaving (shuffle) type. The order in which
the processors are activated is similar to the flooding schedule according to the
variables or the parities. Only the update of information changes since it is
performed as soon as a new input has arrived, thus accelerating the convergence
of the code.
The case where the two VNP and PNP processors are controlled in dis-
tributed mode is not of great interest. It would in fact correspond to controlling
the decoding, branch by branch.
The memory required to implement these different combinations is given in
Table 9.4.
Table 9.4 – Memory required for the different combinations of node processor
controls.

                                      VNP compact    VNP distributed,    VNP distributed,
                                                     delayed update      immediate update
PNP compact                           B + n          3n + g(B, dc)       2n + g(B, dc)
PNP distributed, delayed update       B + n + 2m     3n + 2m + B         2n + 2m + B
PNP distributed, immediate update     B + n + m      3n + m + B          2n + m + B
is, n values. When the parity check mode is the compact one, the accumulation
of the messages of the n variables in each VNP must be memorized, that is,
n memories if we update them immediately, and 2n in the opposite case. The
reasoning is the same if the VNPs are in compact mode and the PNPs are in
distributed mode, but in this case, it is the accumulations of messages in the m
parities that must be memorized.
It is sometimes possible, as we shall see later, to memorize the Zj,p messages
in a compressed way. The number of messages to memorize then decreases from B
to g(B, dc ), with g representing a compression function (g(B, dc ) < B).
Parameters                                   Values
Message propagation architecture             αp = 1, αv = dv = 3, P = 3
Position of the interconnection network      1
VNP   Control                                Distributed, delayed update
      Data path                              Total sum, serial
PNP   Control                                Compact
      Data path                              Trellis, parallel
the previous iteration (Lold). At the end of each iteration, the role of these two
memories is exchanged. In this architecture, the extrinsic branch information
Zj,p can be saved either on the VNP side (solid line in the figure) or on the PNP
side (dotted line in the figure), like in the Chen et al. [9.12] and Guilloud et al.
[9.24] architectures.
The data paths of the VNPs and PNPs are both of the total sum type, with
a serial implementation. From time T1 , dc = 4 messages Lj,p enter serially into
the PNP. After a computation latency of T2 − T1 , the messages Zj,p calculated
are sent back, again serially, to the VNPs which are controlled in distributed
mode. But in this case, the update of the information is immediate. This is
reflected in the use of a single block of Lacc memory. Thus, the sum of the extrinsic
information of the j bits is updated as soon as a new input Zj,p arrives.
Parameters                                   Values
Message propagation architecture             αp = dc = 4, αv = dv = 3, P = 3
Position of the interconnection network      4
VNP   Control                                Compact
      Data path                              Total sum, serial
PNP   Control                                Distributed, immediate update
      Data path                              Total sum, serial
VNP with Δ = 1
In this technique, the VNP simply returns Lj to the parity constraints to
which it is connected. Thus, it is no longer necessary to memorize the Zj,p mes-
sages since the latter are no longer used by the VNP. This results in a significant
saving in memory. This algorithm, known as the APP algorithm, was first proposed by
Fossorier et al. in [9.20] and taken up again by Yeo et al. in [9.66].
Note that the hypothesis of independence between the messages leaving
and entering a parity node is absolutely not verified. That is why the iterative
Figure 9.16 – Example of architecture for a vertical interleaving schedule. The serial
implementation of the PNP is not detailed in this figure.
PNP with Δ = 1
This is the algorithm symmetric to the previous one: the PNP returns a
unique value. This technique, which is very efficient in terms of complexity,
enables the algorithm to reach its correction capacity in very few iterations,
typically 5. Although its ultimate correction capacity is much lower than that of the BP
algorithm, it is interesting to note that, after 5 iterations, such an algorithm is more
efficient than the BP algorithm stopped at this same number of iterations. Thus, such
procedures can be successfully applied for high data-rate applications where only
a reduced number of iterations can be performed.
These two variants of the Min-sum algorithm are called Offset BP-based and
Normalized BP-based [9.10] respectively. The optimization of the coefficients A
and B is what differentiates the decoders. They can be constant or variable
according to the signal to noise ratio, the degree of the parity constraint, the
iteration number being processed, etc.
λ-min algorithm (Δ = λ + 1)
This algorithm was presented initially by Hu et al. [9.27, 9.28], then re-
formulated independently by Guilloud et al. [9.24]. Function f, defined by
Equation (9.35), is such that f(x) is large for low x, and low when x is large.
Thus, the sum in (9.35) can be approximated by its λ highest values, that is to
say, by the λ lowest values of |Lj,p|. Once the set Jλ(p) of the λ minima is
obtained, the PNP will calculate Δ = λ + 1 distinct magnitudes:
$$\text{If } j \in J_\lambda(p): \quad |Z_{j,p}| = f\Big(\sum_{j' \in J_\lambda(p)\setminus\{j\}} f\big(|L_{j',p}|\big)\Big)$$

$$\text{If } j \notin J_\lambda(p): \quad |Z_{j,p}| = f\Big(\sum_{j' \in J_\lambda(p)} f\big(|L_{j',p}|\big)\Big)$$
Indeed, if j is the index of a bit having sent one of the values of the set of min-
ima, the magnitude is calculated on the λ − 1 other minima (λ computations,
each on λ − 1 values). However, for all the other bits, the same magnitude is
returned (a single computation on λ values). It must be noted that the performance
of the λ-min algorithm can be improved by adding correction factors A and B as
defined in Equation (9.34).
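As an illustration, a minimal Python sketch of a λ-min parity node update is given below; the kernel function f(x) = −ln tanh(x/2) and the helper names are assumptions of ours, since Equation (9.35) is not reproduced here:

```python
import math

def f(x):
    # f(x) = -ln(tanh(x/2)): large for small x, small for large x
    return -math.log(math.tanh(max(x, 1e-12) / 2.0))

def lambda_min_magnitudes(L, lam):
    """Return |Z_j| for every input |L_j| of one parity node, keeping only the
    lam smallest input magnitudes in the sums (lambda-min approximation)."""
    mags = [abs(v) for v in L]
    J_lambda = sorted(range(len(L)), key=lambda j: mags[j])[:lam]
    total = sum(f(mags[j]) for j in J_lambda)
    out = []
    for j in range(len(L)):
        if j in J_lambda:
            out.append(f(total - f(mags[j])))   # exclude the bit's own minimum
        else:
            out.append(f(total))                # common magnitude on the lam minima
    return out

print(lambda_min_magnitudes([0.3, -1.5, 2.2, -0.8], lam=3))
```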
A-min* algorithm (Δ = 2)
The last sub-optimal algorithm published so far is called the "A-min*"
algorithm and was proposed by Jones et al. [9.32]. Here also, the first step is
to find the index j0 of the bit having the message with the lowest magnitude:
$j_0 = \arg\min_{j \in J(p)} |L_{j,p}|$. Then, two distinct messages are calculated:

$$\text{If } j = j_0: \quad |Z_{j_0,p}| = f\Big(\sum_{j \in J(p)\setminus\{j_0\}} f\big(|L_{j,p}|\big)\Big) \qquad (9.36)$$

$$\text{If } j \neq j_0: \quad |Z_{j,p}| = f\Big(\sum_{j \in J(p)} f\big(|L_{j,p}|\big)\Big) \qquad (9.37)$$
[Figure: error probability versus Eb/N0 for LDPC decoding with 50 iterations; curves for BP, A-min*, λ-min with λ = 4, 3 and 2, and BP-based.]
Figure 9.17 – Comparison of performance between the 3-min and A-min* algorithms
in the case of decoding an irregular code C3 .
$$a_q = \mathrm{trunc}\!\left(\frac{a\, 2^{n_q-1}}{\delta} + 0.5\right), \qquad a \simeq a_q\, \frac{\delta}{2^{n_q-1}} \qquad (9.39)$$
where trunc designates the truncation operation.
These two parameters can influence the decoding performance. Too low a dy-
namic allows an error floor to appear on the error rate curves. This floor appears
much earlier than that associated with the minimum distance of the code.
However, it is important to note that increasing the dynamic without increas-
ing the number of quantization bits increases the value represented by the least
significant bit, and consequently decreases the precision of the computations done in
the PNP. Increasing the dynamic without increasing the number of bits therefore degrades
the decoding performance in the convergence zone.
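To make the trade-off concrete, here is a minimal sketch of a uniform fixed-point quantizer written under the assumption that values are coded on n_q bits over the dynamic [−δ, +δ]; the function and parameter names are ours and not a prescription of the book:

```python
def quantize(a, n_q, delta):
    """Uniform quantization of a real value a on n_q bits over [-delta, +delta].
    Returns the integer code and the reconstructed value; saturation models the
    clipping that produces an early error floor when delta is too small."""
    step = delta / (2 ** (n_q - 1))          # value of the least significant bit
    code = int(a / step + (0.5 if a >= 0 else -0.5))
    code = max(-(2 ** (n_q - 1)), min(2 ** (n_q - 1) - 1, code))   # saturation
    return code, code * step

# Same number of bits, larger dynamic -> coarser step, less precise computations
print(quantize(0.73, n_q=6, delta=8.0))
print(quantize(0.73, n_q=6, delta=16.0))
```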
• Architecture:
– Decoder: indication of the type of decoder (serial, parallel or mixed)
with parameters (P , αp , αv ) associated with the message propagation
architecture.
– Data path: indication, for each node processor (variable and con-
straint), of the type of architecture used (direct, trellis or total sum).
– Control: indication, for each node processor (variable and constraint),
of the type of control used, compact or distributed and, if applicable, of
the type of update.
– Position of the interconnection network (between 1 and 4).
• Characteristics of the LDPC code: size, rate and regularity of the LDPC
code.
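To fix ideas, such a characterization could be captured in a small configuration structure; the hypothetical Python sketch below uses field names of our own choosing:

```python
from dataclasses import dataclass

@dataclass
class LDPCDecoderArchitecture:
    # Message propagation architecture
    P: int                    # number of node processors working in parallel
    alpha_p: int              # cycles needed to process one parity node
    alpha_v: int              # cycles needed to process one variable node
    network_position: int     # position of the interconnection network (1 to 4)
    # Node processors: data path and control
    vnp_data_path: str        # "direct", "trellis" or "total sum"
    vnp_control: str          # "compact" or "distributed (delayed/immediate update)"
    pnp_data_path: str
    pnp_control: str
    # Characteristics of the LDPC code
    n: int                    # code length
    rate: float
    regular: bool

example = LDPCDecoderArchitecture(
    P=3, alpha_p=1, alpha_v=3, network_position=1,
    vnp_data_path="total sum", vnp_control="distributed, delayed update",
    pnp_data_path="trellis", pnp_control="compact",
    n=1008, rate=0.5, regular=True)
```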
Bibliography
[9.1] B. Ammar, B. Honary, Y. Kou, and S. Lin. Construction of low density
parity check codes: a combinatoric design approach. In Proceedings of
IEEE International Symposium on Information Theory (ISIT’02), July
2002.
[9.2] M. Ardakani, T.H. Chan, and F.R. Kschischang. Properties of the EXIT
chart for one-dimensional LDPC decoding schemes. In Proceedings of Cana-
dian Workshop on Information Theory, May 2003.
[9.3] M. Ardakani and F.R. Kschischang. Designing irregular LDPC codes using
EXIT charts based on message error rate. In Proceedings of International
Symposium on Information Theory (ISIT’02), July 2002.
[9.4] C. Berrou and S. Vaton. Computing the minimum distance of linear
codes by the error impulse method. In Proceedings of IEEE International
Symposium on Information Theory, July 2002.
[9.5] C. Berrou, S. Vaton, M. Jézéquel, and C. Douillard. Computing the min-
imum distance of linear codes by the error impulse method. In Proceed-
ings of IEEE Global Communication Conference (Globecom’2002), pages
1017–1020, Taipei, Taiwan, Nov. 2002.
[9.6] M. Blaum, P. Farrel, and H. Van Tilborg. Chapter 22: Array codes. In
Handbook of Coding Theory. Elsevier, 1998.
[9.9] J. Campello and D.S. Modha. Extended bit-filling and LDPC code design.
In Proceedings of IEEE Global Telecommunications Conference (GLOBE-
COM’01), pages 985–989, Nov. 2001.
[9.10] J. Chen and M. Fossorier. Near optimum universal belief propagation
based decoding of low-density parity check codes. IEEE Transactions on
Communications, 50:406–414, March 2002.
[9.11] J. Chen and M.P.C. Fossorier. Density evolution for two improved BP-
based decoding algorithms of LDPC codes. IEEE Communications Letters,
6:208–210, May 2002.
[9.12] Y. Chen and D. Hocevar. An FPGA and ASIC implementation of a rate-1/2,
8088-bit irregular low density parity check decoder. In Proceedings of IEEE
Global Telecommunications Conference (GLOBECOM’03), 1-5 Dec. 2003.
[9.57] T. Tian, C. Jones, J.D. Villasenor, and R.D. Wesel. Construction of irreg-
ular LDPC codes with low error floors. In Proceedings of IEEE International
Conference on Communications (ICC’03), 2003.
[9.58] B. Vasic. Combinatorial constructions of low-density parity check codes
for iterative decoding. In Proceedings of IEEE International Symposium
on Information Theory (ISIT’02), July 2002.
[9.59] B. Vasic. High-rate low-density parity check codes based on anti-Pasch
affine geometries. In Proceedings of IEEE International Conference on
Communications (ICC’02), volume 3, pages 1332–1336, 2002.
[9.60] B. Vasic, E.M. Kurtas, and A.V. Kuznetsov. Kirkman systems and their
application in perpendicular magnetic recording. IEEE Transactions on
Magnetics, 38:1705–1710, July 2002.
[9.61] B. Vasic, E.M. Kurtas, and A.V. Kuznetsov. LDPC codes based on mu-
tually orthogonal Latin rectangles and their application in perpendicu-
lar magnetic recording. IEEE Transactions on Magnetics, 38:2346–2348,
Sept. 2002.
[9.62] F. Verdier and D. Declercq. An LDPC parity check matrix construction for
parallel hardware decoding. In Proceedings of 3rd International Sympo-
sium on Turbo Codes & related topics, 1-5 Sept. 2003.
[9.63] E.W. Weisstein. Block design. From MathWorld,
http://mathworld.wolfram.com/BlockDesign.html.
[9.64] N. Wiberg. Codes and Decoding on General Graphs. PhD thesis,
Linköping University, 1996.
[9.67] H. Zhang and J.M.F. Moura. The design of structured regular LDPC codes
with large girth. In Proceedings of IEEE Global Telecommunications Con-
ference (GLOBECOM’03), Dec. 2003.
Chapter 10
distances, then assigning to each branch of the trellis a signal belonging to the
constellation, respecting a set of rules such as those described in [10.7].
The TTCM scheme presented by Robertson and Wörz is shown in Fig-
ure 10.1. Each TCM encoder is made up of a recursive systematic convolutional
encoder, or RSC encoder, with rate q/(q+1), and a modulation without memory
of order Q = 2q+1 . The binary symbols coming from the source are grouped
into symbols of q bits. These symbols are encoded by the first TCM in the order
in which they are produced by the source and by the second TCM after interleaving.
Figure 10.1 – Diagram of the principle of turbo trellis coded modulation (TTCM),
according to Robertson and Wörz [10.5, 10.6]. Spectral efficiency η = q bit/s/Hz.
Each q-tuple coming from the source being encoded twice, a selection
operator alternately transmits the output of one of the two TCM encoders,
in order to avoid the double transmission of information, which would lead to
a spectral efficiency of the system of q/2 bit/s/Hz. This in fact amounts to
puncturing half of the redundancy sequence for each convolutional code.
At reception, the TTCM decoder is similar to a turbo decoder, except that
the former directly processes the (q + 1)-ary symbols coming from the demod-
ulator. Thus, the calculation of the transition probabilities at each step of
the MAP algorithm (see Section 7.4) uses the Euclidean distance between the
received symbol and the symbol carried by each branch of the trellis. If the
decoding algorithm operates in the logarithmic domain (Log-MAP, Max-Log-
MAP), it is the branch metrics that are taken equal to the Euclidean distances.
Computing an estimate of the bits carried by each demodulated symbol, before
decoding, would indeed be a sub-optimal implementation of the receiver.
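As a rough illustration of such symbol-level metrics, the following sketch (with a hypothetical 8-PSK constellation of our own) computes squared Euclidean distances usable as branch metrics in the logarithmic domain:

```python
import numpy as np

def branch_metrics(received, branch_labels):
    """Squared Euclidean distance between the received complex symbol and the
    symbol labelling each trellis branch; used as the branch metric by
    Log-MAP / Max-Log-MAP decoding of a TTCM."""
    return np.abs(received - branch_labels) ** 2

# Hypothetical example: 8-PSK branch labels, one noisy received symbol
labels = np.exp(1j * 2 * np.pi * np.arange(8) / 8)
r = labels[3] + (0.1 + 0.05j)
print(branch_metrics(r, labels))
```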
Similarly, for efficient implementation of turbo decoding, the extrinsic infor-
mation exchanged by the elementary decoders must directly concern the q-tuples
of information transmitted and not the binary elements that they are made up
of. At each decoding instant, the elementary decoders thus exchange 2q values
of extrinsic information.
Figure 10.2 provides two examples of elementary RSC codes used in [10.5,
10.6] to build an 8-PSK TTCM with 8 states of spectral efficiency η = 2 bit/s/Hz
and a 16-QAM TTCM with spectral efficiency η = 3 bit/s/Hz.
Figure 10.2 – Examples of elementary RSC codes used in [10.5, 10.6] for the construc-
tion of a 8-PSK turbo trellis (a) and a 16-QAM turbo trellis (b) coded modulations.
Figures 10.3 and 10.4 show the performance of these two TTCMs in terms of
binary error rates (BER) as a function of the signal to noise ratio for transmission
over a Gaussian channel. At high and average error rates, these schemes show
correction performance close to capacity: a BER of 10^-4 is reached at around
0.65 dB from Shannon’s theoretical limit for the transmission of packets of 5,000
coded modulated symbols. On the other hand, as the interleaving function of
the TTCM was not the object of any particular optimization in [10.5, 10.6],
the error rate curves presented reveal early and very pronounced changes in
slope (BER ≈ 10^-5).
A variant of this technique, proposed by Benedetto et al. [10.1], made it pos-
sible to improve its asymptotic performance. An alternative method to build a
TTCM with spectral efficiency q bit/s/Hz involves using two RSC codes with
rate q/(q + 1) and for each of them to puncture q/2 information bits (q is as-
sumed to be even). For each elementary code we thus transmit only half the
information bits and all the redundancy bits. The bits at the output of each
encoder are associated with a modulation with 2^{(q/2)+1} points. The same oper-
ation is performed for the two RSC codes, taking care that each systematic bit
is transmitted once and only once, so that the resulting turbo code is systematic.
Figure 10.3 – Binary error rate (BER) as a function of the signal to noise ratio Eb /N0
of the 8-PSK TTCM with 8 states using the RSC code of Figure 10.2(a). Transmis-
sion over a Gaussian channel. Spectral efficiency η = 2 bit/s/Hz. Blocks of 10,000
information bits, 5,000 modulated symbols. MAP decoding algorithm. Curves taken
from [10.6].
On the other hand, this technique uses interleaving at bit level, and not at
symbol level like in the previous approach.
The criterion for optimizing the TTCM proposed in [10.1] is based on maxi-
mizing the effective Euclidean distance, defined as the minimum Euclidean dis-
tance between two encoded sequences whose information sequences have a Ham-
ming weight equal to 2. Figures 10.5 and 10.6 show two examples of TTCMs
built on this principle.
The correction performance of these two TTCMs over a Gaussian channel
is presented in Figures 10.7 and 10.8. At high and average error rates, it is
close to that given by the scheme of Robertson and Wörz; on the other hand,
using interleavers operating on the bits rather than on the symbols has made it
possible to significantly improve the behaviour at low error rates.
TTCMs lead to excellent correction performance over a Gaussian channel,
since they are an ad hoc approach to turbo coded modulation. However, they
have the main drawback of very limited flexibility: a new code must be defined
for each coding rate and each modulation considered. This drawback is cumber-
some in any practical system requiring a certain degree of adaptability. On the
other hand, although they are a quasi-optimal solution to the problem of coded
modulations for the Gaussian channel, their behaviour over fading channels like
Rayleigh channels leads to mediocre performance [10.9].
Figure 10.4 – BER as a function of the signal to noise ratio Eb /N0 of the 16-QAM
TTCM with 8 states using the RSC code of Figure 10.2(b). Transmission over a
Gaussian channel. Spectral efficiency η = 3 bit/s/Hz. Blocks of 15,000 information
bits, 5,000 modulated symbols. MAP decoding algorithm. Curves taken from [10.6].
can ideally reach the Hamming distance of the code. Consequently, transmission
schemes using the BICM principle in practice have better performance on fading
channels than TTCMs have.
The code and the modulation not being jointly optimized, unlike a TTCM
scheme, we choose the binary mapping of the constellation points that minimizes
the mean binary error rate at the input of the decoder. When it can be envis-
aged, Gray encoding satisfies this condition. For simplicity in implementing the
modulator and demodulator, in the case of square QAM (q even), the in-phase
and in-quadrature axes, I and Q, are mapped independently.
In Figure 10.9, the role of the "Multiplexing / symbol composition" block is
to distribute the encoded bits, after interleaving for fading channels, into modu-
lation symbols. This block, the meeting point between the code and the modu-
lation, enables a certain level of adjustment of the coded modulation according
to the performance targeted. This adjustment is possible since the code and the
modulation do not play the same role in relation to all the bits transmitted.
On the one hand, we can distinguish two distinct families of encoded bits
at the output of the encoder: systematic bits and redundancy bits. These two
families of bits play a different role in the decoding process: the systematic
bits, coming directly from the source, are used by the two elementary decoders
Figure 10.7 – BER as a function of the signal to noise ratio Eb /N0 of the 16-QAM
TTCM with 16 states using the RSC code of Figure 10.5. Transmission over a Gaussian
channel. Spectral efficiency η = 2 bit/s/Hz. 2×16,384 information bits. MAP decoding
algorithm. Curves taken from [10.1].
at reception whereas the redundancy bits, coming from the two elementary
encoders, are used by only one of the two decoders.
On the other hand, the binary elements contained in a modulated symbol
are not, in general, all protected identically by the modulation. For example,
in the case of PSK or QAM modulation with Gray encoding, only modulations
with two or four points offer the same level of protection to all the bits of a same
symbol. For higher order modulations, certain bits are better protected than
others.
As an illustration, consider a 16-QAM modulation, mapped independently
and in an analogue manner on the in-phase and in-quadrature axes by Gray
encoding. The projection of this modulation on each of the two axes is amplitude
shift keying (ASK) with 4 symbols (see Figure 10.11).
We can show that, for a transmission over a Gaussian channel, the error
probabilities on the binary positions s1 and s2 , given the transmitted symbol
(±3 or ±1), are expressed in the form:
Figure 10.8 – BER as a function of the signal to noise ratio Eb /N0 of the 8-PSK
TTCM with 16 states using the RSC code of Figure 10.6. Transmission over a Gaussian
channel. Spectral efficiency η = 2 bit/s/Hz. 4×4,096 information bits. MAP decoding
algorithm. Curves taken from [10.3].
$$P_{eb}(s_2 \mid \pm 3) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{3}{\sigma\sqrt{2}}\right) \qquad
P_{eb}(s_2 \mid \pm 1) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{\sigma\sqrt{2}}\right)$$

$$P_{eb}(s_1 \mid \pm 3) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{\sigma\sqrt{2}}\right) - \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{5}{\sigma\sqrt{2}}\right) \approx \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{\sigma\sqrt{2}}\right)$$

$$P_{eb}(s_1 \mid \pm 1) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{3}{\sigma\sqrt{2}}\right) + \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{\sigma\sqrt{2}}\right) \approx \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{\sigma\sqrt{2}}\right) \qquad (10.1)$$
where erfc represents the complementary error function and σ 2 designates the
noise variance on the channel. We observe that binary position s2 is on average
better protected by the modulation than position s1 .
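These expressions are easy to evaluate numerically. The short sketch below (our own, assuming 4-ASK levels ±1, ±3 and noise standard deviation σ) averages them over the two transmitted amplitudes and makes the difference in protection visible:

```python
import math

def p_s2(sigma):
    # average error probability of the better-protected position s2
    return 0.25 * (math.erfc(3 / (sigma * math.sqrt(2)))
                   + math.erfc(1 / (sigma * math.sqrt(2))))

def p_s1(sigma):
    # average error probability of position s1, per Equation (10.1)
    return 0.25 * ((math.erfc(1 / (sigma * math.sqrt(2))) - math.erfc(5 / (sigma * math.sqrt(2))))
                   + (math.erfc(3 / (sigma * math.sqrt(2))) + math.erfc(1 / (sigma * math.sqrt(2)))))

sigma = 0.5
print(p_s1(sigma), p_s2(sigma))   # s2 is on average better protected than s1
```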
Consequently, it is possible to define several strategies for building modula-
tion symbols by associating as a matter of priority the systematic bits or redundant
Figure 10.9 – Diagram of the principle of the transmitter in the case of the pragmatic
association of a turbo code and modulation with Q = 2q states.
Figure 10.10 – Diagram of the principle of the receiver for the pragmatic turbo coded
modulation scheme of Figure 10.9.
Figure 10.11 – Diagram of the signals of 4-ASK modulation with Gray encoding.
bits with the positions that are the best protected by the modulation. Two
extreme strategies can thus be defined in all cases:
Modulations of orders higher than 16-QAM offer more than two levels of
protection for the different binary positions. 64-QAM modulation, for example,
gives three different levels of protection, if the in-phase and in-quadrature axes
are mapped independently and in an analogue manner by using a Gray code.
where Si,1 and Si,0 represent the sets of points s of the constellation such that
the ith bit si is equal to 1 or 0, and dr,s is the Euclidean distance between the
received symbol r and the constellation point s under consideration.
In practice, the Max-Log approximation is commonly used to simplify the
calculation of the LLRs:
turbo code with 16 states presented in Section 7.5 for transmission conditions
similar to those leading to the curves obtained in Figures 10.3 and 10.4. We
observe that after 8 decoding iterations, the performance of the two turbo coded
modulation families is equivalent down to BERs of 10^-4 to 10^-5. The better
behaviour of the pragmatic solution at lower error rates is due, on the one hand,
to the use of 16-state elementary codes and, on the other hand, to the careful
design of the turbo code interleaver.
Figure 10.12 – BER as a function of the signal to noise ratio Eb /N0 of pragmatic
turbo-coded 8-PSK using a 16-state double-binary code. Transmission over a Gaussian
channel. Spectral efficiency η = 2 bit/s/Hz. Blocks of 10,000 information bits, 5,000
modulated symbols. MAP decoding algorithm. "Systematic" scheme.
The curves of Figure 10.14 show the influence of the strategy of construct-
ing symbols on the performance of turbo coded modulation. They show the
behaviour of the association of a 16-state double-binary turbo code and a 16-
QAM mapped independently on the in-phase and in-quadrature axes using the
Gray code. The two extreme strategies for building the symbols described above
were simulated. The size of the blocks, 54 bytes, and the simulated rates 1/2 and
3/4, are representative of concrete applications in the wireless technology sector
(IEEE 802.16 standard, Wireless Metropolitan Area Network ). Figure 10.14 also
shows the theoretical limits of the transmission studied. These limits take into
Figure 10.13 – BER as a function of the signal to noise ratio Eb /N0 of pragmatic turbo-
coded 16-QAM using a 16-state double-binary code. Transmission over a Gaussian
channel. Spectral efficiency η = 3 bit/s/Hz. Blocks of 15,000 information bits, 5,000
modulated symbols. MAP decoding algorithm. "systematic" scheme.
account the size of the blocks transmitted as well as the packet error rates (PER)
targeted. They are obtained from the value of the capacity of the channel, to
which we add a correcting term obtained with the help of the so-called "sphere
packing" bound (see Section 3.3).
We observe that at high or average error rates, the convergence of the itera-
tive decoding process is favoured by a better protection of the systematic bits.
This result can be explained by the fact that, in the decoding process, each
systematic data item is used at the input of the two decoders. Consequently, an error on
a systematic bit at the output of the channel causes an error at the input of the
two elementary decoders, whereas erroneous redundancy only affects the input
of one of the two elementary decoders. Consequently, reinforcing the protection
of the systematic bits benefits the two elementary decoders simultaneously.
The higher the proportion of redundancy bits transmitted, that is to say,
the lower the coding rate, the greater the gap in performance between the two strategies.
Figure 10.14 – Performance in binary error rate (BER) and packet error rate (PER) of
the pragmatic association of a 16-QAM and a 16-state double-binary turbo code, for
the transmission of blocks of 54 bytes over a Gaussian channel. Coding rates 1/2 and
3/4. Max-Log-MAP decoding, inputs of the decoder quantized on 6 bits, 8 decoding
iterations.
For low and very low error rates, the scheme favouring the protection of the
redundancy gives the best performance. This behaviour is difficult to prove by
simulation for the lowest rates, as the assumed crossing point of the curves is
situated at an error rate that is difficult to obtain by simulation (PER ≈ 10^-8
for R = 1/2). The interpretation of this result requires analysis of the erroneous
paths in trellises with a high signal to noise ratio. We have observed that, in
the majority of cases, the erroneous sequences contain a fairly low number of
erroneous systematic bits and a rather high number of erroneous redundancy
bits. In other words, the erroneous sequences generally have a low input weight.
In particular, the erroneous paths in question mainly correspond to rectangular
patterns of errors (see Section 7.3.2). The result, from the point of view of
the asymptotic behaviour of turbo coded modulation, is that it is preferable to
ensure better protection of the parity bits.
The curves shown in Figure 10.14 were obtained with the help of the simpli-
fied Max-Log-MAP decoding algorithm, using data quantized on 6 bits at the
Bibliography
[10.1] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara. Parallel concate-
nated trellis coded modulation. In Proceedings of the International Confer-
ence on Communications (ICC’96), pages 974–978, Dallas, USA, 1996.
[10.2] G. Caire, G. Taricco, and E. Biglieri. Bit-interleaved coded modulation.
IEEE Transactions on Information Theory, 44(3):927–946, May 1998.
[10.3] C. Fragouli and R. Wesel. Bit vs. symbol interleaving for parallel con-
catenated trellis coded modulation. In Proceedings of Global Telecommu-
nications Conference (Globecom’01), pages 931–935, San Antonio, USA,
Nov. 2001.
[10.4] S. Le Goff, A. Glavieux, and C. Berrou. Turbo codes and high spec-
tral efficiency modulation. In Proceedings of International Conference on
Communications (ICC’94), pages 645–649, New Orleans, USA, May 1994.
The invention of turbo codes at the beginning of the 90s totally revolutionized
the field of error correcting coding. Codes relatively simple to build and decode,
making it possible to approach Shannon’s theoretical limit very closely, were at
last available. However, the impact of this discovery was not limited to one sin-
gle coding domain. More generally, it gave birth to a new paradigm for designing
digital transmission systems, today commonly known as the "turbo principle".
To solve certain signal processing problems that are a priori very complex, we can
envisage dividing them into a cascade of elementary processing operations that are
simpler to implement. However, today we know that the one-directional succes-
sion of these processing operations leads to a loss of information. To overcome
this sub-optimality, the turbo principle advocates establishing an exchange of
probabilistic information, "in the two directions", between these different pro-
cessing operations. All of the information available is thus taken into account
in solving the global problem and a consensus can be found between all the
elementary processing operations in order to elaborate the final decision.
The application of the turbo principle to a certain number of classical prob-
lems in digital transmission has provided impressive gains in performance in
comparison to traditional systems. Therefore its use rapidly became popular
within the scientific community. This chapter presents the first two systems
having historically benefited from the application of the turbo principle to a
context other than error correction coding. The first system, called turbo equal-
ization, iterates between the equalization function and a decoding function to
improve the processing of the intersymbol interference for data transmission over
multipath channels. The second, commonly called turbo CDMA, exploits the
turbo principle to improve the discrimination between users in the case of a
where hk (i) represents the action of the channel (echo) at instant i on a symbol
transmitted at instant i − k. The impulse response of the channel at instant i
$$h(z) = \sum_{k=0}^{L-1} h_k(i)\, z^{-k} \qquad (11.2)$$
The impulse response of the channel is assumed to have finite duration (L coef-
ficients), which is a realistic hypothesis in practice in most scenarios.
Equation (11.1) shows that generally, received symbol yi is a function of the
symbols transmitted before, or after (if the channel introduces a propagation de-
lay) information symbol xi considered at instant i. In accordance with what was
introduced in Chapter 2, we then say that the received signal is spoiled by inter-
symbol interference (ISI). If we now assume that the transmission channel does
not vary (or varies very little) over the duration of a transmitted block of information,
model (11.1) can be simplified as follows:
$$y_i = \sum_{k=0}^{L-1} h_k\, x_{i-k} + w_i \qquad (11.3)$$
where we have suppressed the time dependency from the coefficients of the equiv-
alent discrete channel. The representation of the equivalent discrete channel in
the form of a digital filter with finite impulse response presented in Figure 11.1
comes directly from (11.3). The coefficients of the filter are precisely those of
the impulse response of the channel.
Figure 11.1 – Representation of the equivalent discrete channel in the form of a digital
filter.
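For concreteness, a minimal numpy sketch of this equivalent discrete channel is given below; the channel coefficients are purely illustrative and not taken from the book:

```python
import numpy as np

def discrete_channel(x, h, noise_std, rng=np.random.default_rng(0)):
    """Equivalent discrete channel of Figure 11.1: FIR filtering of the
    transmitted symbols x by the impulse response h, plus additive white
    Gaussian noise, as in Equation (11.3)."""
    y = np.convolve(x, h)[:len(x)]           # y_i = sum_k h_k x_{i-k}
    return y + noise_std * rng.standard_normal(len(x))

# Hypothetical 3-tap channel and a short BPSK sequence
h = np.array([0.8, 0.5, 0.3])
x = np.array([1, -1, -1, 1, 1, -1], dtype=float)
print(discrete_channel(x, h, noise_std=0.1))
```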
ISI can be a major obstacle for establishing a good quality digital trans-
mission, even in the presence of very low noise. As an illustration, we have
shown in Figure 11.2 the constellation of the symbols received at the output of
a channel highly perturbed by ISI, for a signal to noise ratio of 20 dB1 , given
that we have transmitted a sequence of discrete symbols with four phase states
(QPSK modulation). We thus observe that when the ISI is not processed by an
1 We recall that a signal to noise ratio of 20 dB corresponds to a power of the transmitted
signal 100 times higher than the power of the additive noise on the link.
adequate device, it can lead to great degradation in the error rate at reception,
and therefore in the general quality of the transmission.
Figure 11.2 – Illustration of the phenomenon of ISI in the case of a 5-path highly
frequency-selective channel, for a signal to noise ratio of 20 dB.
Each candidate sequence takes a single path in the trellis. Searching for the
sequence with the minimum Euclidean distance from the observation can then
be performed recursively, with a linear computation cost depending on the size
of the message, by applying the Viterbi algorithm on the trellis of the channel.
The MLSD equalizer offers very good performance. However, the complexity
of its implementation increases proportionally with the number of states in the
trellis, and therefore exponentially with duration L of the impulse response of
the channel and size M of the modulation alphabet. Its practical utilization is
consequently limited to transmissions using modulations with a small number of
states (2, 4, or 8) on channels with few echoes. On the other hand, it should be
noted that this equalizer requires prior estimation of the impulse response of the
channel in order to build the trellis. The MLSD solution has been adopted by
many manufacturers to perform the equalization operation in mobile telephones
for the worldwide GSM (Global System for Mobile communications) standard.
In the presence of modulations with a large number of states or on channels
whose impulse response length is large, the MLSD equalizer has an unaccept-
able computation time for real-time applications. An alternative strategy then
involves combating the ISI with the help of equalizers having less complexity,
implementing digital filters.
In this perspective, the simplest solution involves applying a linear transverse
filter at the output of the channel. This filter is optimized so as to compensate
("equalize") the irregularities of the frequency response of the channel, with
the aim of converting the frequency selective channel into an equivalent ide-
ally ISI-free (or frequency-flat) channel, perturbed only by additive noise. The
transmitted message is then estimated thanks to a simple operation of symbol
by symbol decision (threshold detector) at the output of the equalizer, optimal
Examining the diagram of the principle of the linear equalizer, we can note
that when we take a decision on symbol xi at instant i, we have an estimation
on the previous symbols x̂i−1 , x̂i−2 , . . . We can therefore envisage rebuilding
the (causal) interference caused by these data and therefore cancel it, in order to
improve the decision. The equalizer which results from this reasoning is called a
Decision-Feedback Equalizer or DFE. The diagram of the principle of the device
is illustrated in Figure 11.6. It is made up of a forward filter, in charge of
converting the impulse response of the channel into a purely causal response,
followed by a decision device and a feedback filter, in charge of estimating the
residual interference at the output of the forward filter in order to cancel it via
a feedback loop.
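A minimal decision-feedback sketch under these assumptions (BPSK hard decisions, illustrative forward and feedback coefficients of our own) is:

```python
import numpy as np

def dfe_bpsk(y, forward, feedback):
    """Toy decision-feedback equalizer for BPSK: the forward filter shapes the
    channel into a (mostly) causal response, then past hard decisions are fed
    back to cancel the residual causal ISI before each new decision."""
    v = np.convolve(y, forward)[:len(y)]      # forward filtering
    decisions = np.zeros(len(y))
    for i in range(len(y)):
        isi = sum(feedback[k] * decisions[i - 1 - k]
                  for k in range(len(feedback)) if i - 1 - k >= 0)
        decisions[i] = 1.0 if (v[i] - isi) >= 0 else -1.0
    return decisions

# Hypothetical 2-tap channel, no noise, to show perfect recovery
x = np.array([1, -1, 1, 1, -1, -1, 1], dtype=float)
h = np.array([1.0, 0.6])
y = np.convolve(x, h)[:len(x)]
print(dfe_bpsk(y, forward=np.array([1.0]), feedback=np.array([0.6])))
```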
As a general rule, the DFE provides performance higher than that of the lin-
ear equalizer, particularly on channels that are highly frequency selective. How-
ever, this equalizer is non-linear in nature, due to the presence of the decision
device in the feedback loop, which can give rise to an error propagation phe-
nomenon (particularly at low signal to noise ratio) when some of the estimated
data are incorrect. In practice, the filter coefficients are generally optimized
following the MMSE criterion, by assuming that the estimated data are equal to the transmitted data.
presence of the encoder can then be exploited to reduce this gap in performance,
by benefiting from the coding gain at reception.
In the following part of this section, we are going to examine a transmission
system shown in Figure 11.8. The modulation and transmission operations on
the channel are here represented in equivalent baseband, in order not to have to
consider a carrier frequency.
possible the probabilistic data exchanged between the equalizer and the
channel decoder. The turbo equalizer is then capable of totally compensat-
ing the degradation caused by the ISI, provided that the interleaver
is large enough and carefully constructed.
The main difference with the previous scheme thus lies in the implementation of
the SISO equalizer and SISO decoder. Indeed, the latter no longer exchange
probabilistic information at binary level but at symbol level, whether in LLR
form or directly in probability form. The interested reader can find further
details on this subject in [11.8], for example.
As a general rule, the channel code is a convolutional code and the chan-
nel decoder uses a soft-input soft-output decoding algorithm of the MAP type
(or its derivatives in the logarithmic domain: Log-MAP and Max-Log-MAP).
Again, we will not consider the hardware implementation of the decoder since
this subject is dealt with in Chapter 7. Note, however, that unlike classical
turbo decoding schemes, the channel decoder here does not provide extrinsic
information on the information bits, but instead on the coded bits.
On the other hand, we distinguish different optimization criteria to imple-
ment the SISO equalizer, leading to distinct families of turbo equalizers. The
first, sometimes called "turbo detection" and what we call MAP turbo equal-
ization here, uses an equalizer that is optimal in the Maximum A Posteriori
sense. The SISO equalizer is then typically implemented using the BCJR-
MAP algorithm. As we shall see in the following section, this approach leads
to excellent performance, but like the classical MLSD equalizer, it has a very
high computation cost which excludes any practical implementation in the case
of modulations with a large number of states and for transmissions on channels
having large time delays. We must then turn towards alternative solutions, with
less complexity but that will necessarily be sub-optimal in nature. One strategy
that can be envisaged in this context involves reducing the number of branches
to examine at each instant in the trellis. This approach is commonly called
"reduced complexity MAP turbo equalization". Several methods are known to
achieve this result; they will be briefly presented in the following section. An-
other solution is inspired by classical equalization methods and implements an
optimized SISO equalizer following the minimum mean square error (MMSE)
criterion. We thus obtain an MMSE (filtering-based) turbo equalizer, a scheme
described in Section 11.1.6 and that appears as a very promising solution today
for high data rate transmissions on highly frequency-selective channels.
The purpose of the MAP equalizer is to evaluate the a posteriori LLR L(xi,j )
on each coded interleaved bit xi,j , defined as follows:
$$L(x_{i,j}) = \ln \frac{\Pr(x_{i,j} = 1 \mid \mathbf{y})}{\Pr(x_{i,j} = 0 \mid \mathbf{y})} \qquad (11.5)$$
Using conventional results in detection theory, we can show that this equal-
izer is optimal in the sense of the minimization of the symbol error probability.
To calculate the a posteriori LLR L(xi,j ), we will use the trellis representa-
tion associated with transmission on the frequency selective channel. Applying
Bayes’ relation, the previous relation can also be written:
$$L(x_{i,j}) = \ln \frac{\Pr(x_{i,j} = 1,\, \mathbf{y})}{\Pr(x_{i,j} = 0,\, \mathbf{y})} \qquad (11.6)$$
at instant i, on all of the transitions between instants i − 1 and i for which the
j-th bit making up the symbol associated with these transitions equals 0 or 1.
Thus,
$$L(x_{i,j}) = \ln\!\left(\frac{\displaystyle\sum_{(s',s)/x_{i,j}=1} \Pr(s', s, \mathbf{y})}{\displaystyle\sum_{(s',s)/x_{i,j}=0} \Pr(s', s, \mathbf{y})}\right) \qquad (11.7)$$
Adopting a similar approach now to the one presented in the original article
by Bahl et al. [11.3], we can show that the joint probability Pr(s′, s, y) associated
with each transition considered can be decomposed into a product of 3 terms:
Forward and backward state probabilities αi−1(s′) and βi(s) can be calcu-
lated recursively for each state and at each instant in the trellis, by applying the
following update equations:
$$\alpha_i(s) = \sum_{(s',s)} \alpha_{i-1}(s')\, \gamma_{i-1}(s', s) \qquad (11.9)$$
$$\beta_i(s') = \sum_{(s',s)} \gamma_i(s', s)\, \beta_{i+1}(s) \qquad (11.10)$$
These two steps are called forward recursion and backward recursion, respec-
tively. Summations are performed over all the couples of states with indices (s′,
s) for which there is a valid transition between two consecutive instants in the
trellis. Forward recursion uses the following initial condition:
This condition translates the fact that the initial state in the trellis (with
index 0, by convention) is perfectly known. Concerning the backward recursion,
we usually assign the same weight to each state at the end of the trellis since
the arrival state is generally not known a priori:
$$\beta_N(s) = \frac{1}{M^{L-1}} \quad \forall s \qquad (11.12)$$
In practice, we see that the dynamic of values αi−1(s′) and βi(s) increases
during the progression in the trellis. Consequently, these values must be normal-
ized at regular intervals in order to avoid overflow problems in the computations.
One natural solution involves dividing these metrics at each instant by constants
Kα and Kβ chosen in such a way as to satisfy the following normalization con-
dition:
$$\frac{1}{K_\alpha}\sum_{s} \alpha_i(s) = 1 \quad \text{and} \quad \frac{1}{K_\beta}\sum_{s} \beta_i(s) = 1 \qquad (11.13)$$
the sums here concerning all possible states s of the trellis at instant i.
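A compact numpy sketch of these normalized forward and backward recursions over a generic trellis is given below; the data layout, with a dense array of branch metrics indexed by [instant, s′, s], is our own choice:

```python
import numpy as np

def forward_backward(gamma):
    """gamma[i, sp, s]: branch metric gamma_i(s', s) for i = 0..N-1.
    Returns normalized alpha[i, s] and beta[i, s], per Equations (11.9)-(11.13)."""
    N, S, _ = gamma.shape
    alpha = np.zeros((N + 1, S))
    beta = np.zeros((N + 1, S))
    alpha[0, 0] = 1.0                 # initial state (index 0) perfectly known
    beta[N, :] = 1.0 / S              # arrival state unknown: uniform weight
    for i in range(1, N + 1):         # forward recursion
        alpha[i] = alpha[i - 1] @ gamma[i - 1]
        alpha[i] /= alpha[i].sum()    # normalization to avoid overflow
    for i in range(N - 1, -1, -1):    # backward recursion
        beta[i] = gamma[i] @ beta[i + 1]
        beta[i] /= beta[i].sum()
    return alpha, beta

# Tiny 2-state example with random positive branch metrics
rng = np.random.default_rng(1)
a, b = forward_backward(rng.random((5, 2, 2)))
print(a[-1], b[0])
```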
To complete the description of the algorithm, it remains for us to develop
the expression of the term γi−1 (s , s). This term can be assimilated to a branch
metric. We can show that it is decomposed into a product with two terms:
where we have written Pa (Xl,j ) = Pr(xi,j = Xl,j ), binary element Xl,j taking the
value 0 or 1 according to the symbol Xl considered and the mapping rule. Within
the turbo equalization iterative process, the a priori probabilities Pa (Xl,j ) are
deduced from the a priori information available at the input of the equalizer.
From the initial definition (11.4) of the LLR, we can in particular show that
probability Pa (Xl,j ) and corresponding a priori LLR La(xi,j ) are linked by the
following expression:
As for the second term P(yi |s′, s), it quite simply represents the likelihood
P(yi |zi) of observation yi relative to branch label zi associated with the tran-
sition considered. The latter corresponds to the symbol that we would have
observed at the output of the channel in the absence of noise:
$$z_i = \sum_{k=0}^{L-1} h_k\, x_{i-k} \qquad (11.18)$$
In reality and in accordance with the turbo principle, it is not this a posteriori
information that is propagated to the SISO decoder, but rather the extrinsic
information. Here, the latter measures the equalizer’s own contribution in the
global decision process, excluding the information relating to the bit considered
coming from the decoder at the previous iteration, that is to say, the a priori
LLR La (xi,j ). If we develop the expression of branch metric γi−1(s′, s) in the
We can then factorize the a priori information term in relation to the bit xi,j
considered, both in numerator (Xl,j = 1) and in denominator (Xl,j = 0), which
gives:
Finally, we see that the extrinsic information is obtained quite simply by sub-
tracting the a priori information from the a posteriori LLR calculated by the
equalizer:
Le (xi,j ) = L(xi,j ) − La (xi,j ) (11.23)
This remark concludes the description of the MAP equalizer. As we have
presented it, this algorithm proves to be difficult to implement on a circuit due
to the presence of numerous multiplication operations. In order to simplify
the computations, we can then envisage transposing the whole algorithm into
the logarithmic domain (Log-MAP algorithm), the advantage being that the
multiplications are then converted into additions, which are simpler to do. If we
wish to further reduce the processing complexity, we can also use a simplified
(but sub-optimal) version, the Max-Log-MAP (or Sub-MAP) algorithm. These
two variants were presented in the context of turbo codes in Chapter 7. The
derivation is quite similar in the case of the MAP equalizer. Reference [11.5]
presents a comparison in performance between these different algorithms in a
MAP turbo equalization scenario. In particular, it turns out that the Max-Log-
MAP equalizer offers the best performance/complexity compromise when the
estimation of the channel is imperfect.
Example of performance
In order to illustrate the good performance offered by MAP turbo equalization,
we chose to simulate the following transmission scenario: a binary source gener-
ates messages of 16382 bits of information, which are then protected by a rate
Figure 11.13 – Performance of the MAP turbo equalizer for BPSK transmission on
the Proakis C channel, using a rate R = 1/2 4-state non-recursive non-systematic
convolutional code and a pseudo-random interleaver of size 32768 bits.
target bit error rate of 10^-5, the iterative process provides a gain of the order of
6.2 dB compared with the performance of the conventional receiver performing
the equalization and decoding disjointly, given by the curve at the 1st iteration.
This performance is very similar to that presented in reference [11.7].
These results give rise to a certain number of remarks, since the example
considered here presents the characteristic behaviour of turbo systems. In par-
ticular, we see that the gain provided by the iterative process only appears
beyond a certain signal to noise ratio (convergence threshold, equal to 3 dB
here). Beyond this threshold, we observe a rapid convergence of the turbo
equalizer towards the asymptotic performance of the system, given by the error
probability after decoding on a non-selective AWGN channel. To improve the
global performance of the system, we can envisage using a more powerful error
correcting code. Experience shows that we then come up against the necessity
of finding a compromise in choosing the code, between rapid convergence of the
iterative process and good asymptotic performance of the system (at high signal
to noise ratios). The greater the correction capacity of the code, the higher the
convergence threshold. On this topic, we point out that today there exist semi-
analytical tools such as EXIT (EXtrinsic Information Transfer ) charts [11.49],
enabling the value of the convergence threshold to be predicted precisely, as
well as the error rate after decoding for a given transmission scenario, under
the hypothesis of ideal interleaving (infinite size). A second solution involves
introducing a feedback effect in front of the equivalent discrete-time channel,
by inserting an adequate precoding scheme at transmission. Cascading the pre-
encoder with the channel produces a new channel model, recursive in nature,
leading to a performance gain that is greater, the larger the dimension of the
interleaver considered. This phenomenon is known as "interleaving gain" in the
literature dedicated to serial turbo codes. Subject to carefully choosing the pre-
encoder, we can then exceed the performance of classical non-recursive turbo
equalization schemes as has been shown in [11.35] and [11.39].
nels having ISI limited to a few symbol periods. Beyond that, we must turn to
less complex, but less efficient, solutions.
There are several ways to deal with this problem. If we limit ourselves to us-
ing equalizers derived from the MAP criterion, one idea is to reduce the number
of paths examined by the algorithm in the trellis. A first approach performs a
truncation of the channel impulse response in order to keep only the J < L first
coefficients. The number of states in the trellis will then be decreased. The ISI
terms ignored in the definition of the states are then taken into account when
calculating the branch metrics, using past decisions obtained from the knowl-
edge of the survivor path in each state. This strategy is called Delayed Decision
Feedback Sequence Estimation (DDFSE). It offers good performance provided
most of the channel’s energy is concentrated in its first coefficients which, in
practice, requires the implementation of a minimum-phase pre-filtering oper-
ation. Applying this technique to turbo equalization has, for example, been
studied in [11.2]. A refinement of this algorithm involves grouping some states
of the trellis together, in accordance with the set-partitioning rules defined by
Ungerboeck [11.52] for designing trellis coded modulations. This improvement,
called Reduced State Sequence Estimation (RSSE), includes DDFSE as a par-
ticular case [11.19]. In a similar way, we can also envisage retaining more than
one survivor path in each state to improve the robustness of the equalizer and if
necessary to omit the use of pre-filtering [11.42]. Rather than reduce the number
of states of the trellis by truncation, it can also be envisaged to examine only
a non-exhaustive list of the most likely paths at each instant. The resulting
algorithm is called the "M algorithm", and its extension to SISO equalization
was studied in [11.17]. Whatever the case, the search for efficient equalizers with
reduced complexity regularly continues to give rise to new contributions.
All the strategies that we have mentioned above enter into the category of
MAP turbo equalizers with reduced complexity. Generally, these solutions are
interesting when the number of states of the modulation is not too high. On
the other hand, in the case of high data rate transmissions on channels with
long delay spreads, it is preferable to envisage filter-based turbo equalizers of
the MMSE type.
1. The first operation, the SISO mapping, calculates a soft estimate for the
transmitted symbols, denoted x̄ = (x̄0 , . . . , x̄N −1 ), from the a priori in-
formation La (x) coming from the decoder at the previous iteration.
2. The linear equalizer then uses estimated data x̄i to rebuild and cancel the
ISI affecting the received signal. The resulting signal is filtered in order
to eliminate residual interference. The filter coefficients are optimized so
as to minimize the mean square error between the equalized data and the transmitted symbols.
• SISO mapping
This operation involves calculating the soft estimate x̄i , defined as the math-
ematical expectation of symbol xi transmitted at instant i:
$$\bar{x}_i = E_a\{x_i\} = \sum_{l=1}^{M} X_l \times P_a(X_l) \qquad (11.24)$$
The sum here concerns all of the discrete symbols in the constellation. The
term Pa (Xl ) denotes the a priori probability Pr(xi = Xl ) of symbol Xl being
transmitted at instant i. We have put index a at the level of the expectation
term to highlight the fact that these probabilities are deduced from the a priori
information at the input of the equalizer. Indeed, provided the m bits making
up symbol xi are statistically independent, it is possible to write:
$$P_a(X_l) = \prod_{j=1}^{m} P_a(X_{l,j}) \qquad (11.25)$$
where binary element Xl,j takes the value 0 or 1 according to the symbol Xl
considered and the mapping rule. On the other hand, starting from the general
definition (11.4) of the LLR, we can show that the a priori probability and the
a priori LLR are linked by the following relation:
1 La (xi,j )
Pa (Xl,j ) = 1 + (2Xl,j − 1) tanh with Xl,j ∈ {0, 1} (11.26)
2 2
the soft estimate x̄i is then strictly equal to the transmitted symbol xi (perfect
estimate). To summarize, the value of the soft estimate x̄i evolves as a function
of the reliability of the a priori information provided by the decoder. This
explains the name of "soft" (or probabilistic) estimate for x̄i . By construction,
the estimated data x̄i are random variables. In particular, we can show (see
[11.33] for example) that they satisfy the following statistical properties:
$$E\{\bar{x}_i\} = 0 \qquad (11.28)$$
$$E\{\bar{x}_i x_j^*\} = E\{\bar{x}_i \bar{x}_j^*\} = \sigma_{\bar{x}}^2\, \delta_{i-j} \qquad (11.29)$$
The parameter σx̄² here denotes the variance of the estimated data x̄i. In practice,
this quantity can be estimated using the sample variance estimator on a frame
of N symbols as follows:
$$\sigma_{\bar{x}}^2 = \frac{1}{N}\sum_{i=0}^{N-1} |\bar{x}_i|^2 \qquad (11.30)$$
We easily verify that under the hypothesis of equiprobable a priori symbols,
σx̄2 = 0. Conversely, we obtain σx̄2 = σx2 in the case of perfect a priori informa-
tion on the transmitted symbols. To summarize, the variance of the estimated
data offers a measure of the reliability of the estimated data. This parameter
plays a major role in the behaviour of the equalizer.
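As an illustration, the sketch below performs this SISO mapping for Gray-mapped QPSK; the constellation normalization, bit ordering and variance estimator follow our own conventions rather than a prescription of the book:

```python
import numpy as np

def soft_qpsk_estimates(La):
    """La: array of shape (N, 2) of a priori LLRs for the 2 bits of each QPSK
    symbol. Returns the soft symbol estimates x_bar (Equations (11.24)-(11.26))
    and the sample variance sigma_xbar^2 (Equation (11.30))."""
    soft_bits = np.tanh(La / 2.0)        # E{2b - 1} for each coded bit
    x_bar = (soft_bits[:, 0] + 1j * soft_bits[:, 1]) / np.sqrt(2.0)   # Gray-mapped QPSK
    sigma2 = np.mean(np.abs(x_bar) ** 2)
    return x_bar, sigma2

# No a priori information (LLR = 0) gives x_bar = 0 and sigma2 = 0;
# strong a priori information drives x_bar towards the constellation points.
print(soft_qpsk_estimates(np.zeros((4, 2))))
print(soft_qpsk_estimates(10.0 * np.ones((4, 2))))
```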
$$\mathbf{y}_i = \mathbf{H}\,\mathbf{x}_i + \mathbf{w}_i \qquad (11.31)$$
with yi = (yi, . . . , yi−F+1)^T, xi = (xi, . . . , xi−F−L+2)^T and wi = (wi, . . . , wi−F+1)^T.
Matrix H, of dimensions F × (F + L − 1), is a Toeplitz matrix⁵
⁵ The coefficients of the matrix are constant along each of the diagonals.
where the vector x̃i = (x̄i , . . . , x̄i−Δ+1 , 0, x̄i−Δ−1 , . . . , x̄i−F −L+2 )T is of dimen-
sion F + L−1. The component related to symbol xi−Δ is set to zero in order to
cancel only the ISI and not the signal of interest. At the output of the forward
filter, the expression of the equalized sample at instant i is given by:
Using the statistical properties of the estimated data x̄i, we note that:
In addition,
$$E\{\tilde{\mathbf{y}}_i \tilde{\mathbf{y}}_i^H\} = \mathbf{H}\, E\{(\mathbf{x}_i - \tilde{\mathbf{x}}_i)(\mathbf{x}_i - \tilde{\mathbf{x}}_i)^H\}\, \mathbf{H}^H + \sigma_w^2 \mathbf{I} = (\sigma_x^2 - \sigma_{\bar{x}}^2)\,\mathbf{H}\mathbf{H}^H + \sigma_{\bar{x}}^2\, \mathbf{h}_\Delta \mathbf{h}_\Delta^H + \sigma_w^2 \mathbf{I} \qquad (11.39)$$
To summarize, the optimal form of the equalizer coefficients can finally be written:
$$\mathbf{f}^* = \left[(\sigma_x^2 - \sigma_{\bar{x}}^2)\,\mathbf{H}\mathbf{H}^H + \sigma_{\bar{x}}^2\, \mathbf{h}_\Delta \mathbf{h}_\Delta^H + \sigma_w^2 \mathbf{I}\right]^{-1} \mathbf{h}_\Delta\, \sigma_x^2 \qquad (11.40)$$
By bringing into play a simplified form of the matrix inversion lemma⁸, the
previous solution can then be written:
$$\mathbf{f}^* = \frac{\sigma_x^2}{1 + \beta\, \sigma_{\bar{x}}^2}\, \tilde{\mathbf{f}}^* \qquad (11.41)$$
Here we can recognize the form of a classical linear MMSE equalizer with finite
length. Conversely, under the hypothesis of perfect a priori information on the
transmitted symbols, we have σx̄² = σx². The equalizer then takes the following
form:
$$\mathbf{f} = \frac{\sigma_x^2\, \mathbf{h}_\Delta^*}{\sigma_x^2 \|\mathbf{h}\|^2 + \sigma_w^2} \quad \text{with} \quad \|\mathbf{h}\|^2 = \mathbf{h}_\Delta^H \mathbf{h}_\Delta = \sum_{k=0}^{L-1} |h_k|^2 \qquad (11.44)$$
$$z_i = \frac{\sigma_x^2 \|\mathbf{h}\|^2}{\sigma_x^2 \|\mathbf{h}\|^2 + \sigma_w^2}\, x_{i-\Delta} + \frac{\sigma_x^2}{\sigma_x^2 \|\mathbf{h}\|^2 + \sigma_w^2}\, \mathbf{h}_\Delta^H \mathbf{w}_i \qquad (11.45)$$
Starting from these hypotheses, the demapping module calculates the a poste-
riori LLR on the coded interleaved bits, denoted L(xi,j ) and defined as follows:
    L(xi,j) = ln [ Pr(xi,j = 1 | zi) / Pr(xi,j = 0 | zi) ]               (11.48)
The values present in the numerator and denominator can be evaluated by sum-
ming the a posteriori probability Pr(xi = Xl |zi ) of having transmitted a par-
ticular symbol Xl of the constellation on all the symbols for which the j-th bit
making up this symbol takes the value 0 or 1 respectively. Thus, we can write:
9 This hypothesis rigorously holds only if the equalizer suppresses all the ISI, which
assumes perfect knowledge of the transmitted data. Nevertheless, it is a good
approximation in practice, particularly in a turbo equalization context where the
reliability of the decisions at the output of the decoder increases along the iterations,
which in turn improves the equalization operation.
"
Pr(xi =Xl |zi )
#
Xl /Xl,j =1
L(xi,j ) = ln
Pr(xi =Xl |zi )
Xl /Xl,j =0
(11.49)
"
P (zi |xi =Xl )Pa (Xl )
#
Xl /Xl,j =1
= ln
P (zn |xi =Xl )Pa (Xl )
Xl /Xl,j =0
The second equality results from applying Bayes’ relation. It shows the a priori
probability Pa (Xl ) = Pr(xi = Xl ) of having transmitted a given symbol Xl of
the modulation alphabet. This probability is calculated from the a priori infor-
mation available at the input of the equalizer (relations (11.25) and (11.26)). By
exploiting the above hypotheses, the likelihood of observation zi conditionally to
the hypothesis of having transmitted the symbol Xl at instant i can be written:
" #
2
1 |zi − gΔ Xl |
P (zi |xi = Xl ) = exp − (11.50)
πσν2 σν2
After simplification, the a posteriori LLR calculated by the demapping operation
becomes:
    L(xi,j) = ln [ Σ_{Xl/Xl,j=1} exp( −|zi − gΔ Xl|²/σν² + Σ_{k=1}^{m} Xl,k La(xi,k) )
                 / Σ_{Xl/Xl,j=0} exp( −|zi − gΔ Xl|²/σν² + Σ_{k=1}^{m} Xl,k La(xi,k) ) ]    (11.51)
As in the case of the BCJR-MAP equalizer, we can factor the a priori information
term relating to the bit considered out of the numerator and denominator, in order
to obtain the extrinsic information that is then provided to the decoder:
    L(xi,j) = La(xi,j) + ln [ Σ_{Xl/Xl,j=1} exp( −|zi − gΔ Xl|²/σν² + Σ_{k≠j} Xl,k La(xi,k) )
                            / Σ_{Xl/Xl,j=0} exp( −|zi − gΔ Xl|²/σν² + Σ_{k≠j} Xl,k La(xi,k) ) ]    (11.52)

where the logarithmic term is identified with the extrinsic information Le(xi,j).
Finally, the extrinsic information is obtained quite simply by subtracting the a
priori information from the a posteriori LLR calculated by the equalizer:
Le (xi,j ) = L(xi,j ) − La (xi,j ) (11.53)
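In practice, (11.51) and (11.53) are conveniently evaluated in the logarithmic domain to avoid numerical underflow. The following sketch computes the extrinsic LLRs of the m bits carried by one equalized sample; all identifiers are illustrative, and the constellation/mapping arrays are as in the earlier sketch.

import numpy as np

def demap_extrinsic(z, g_delta, sigma2_nu, La, constellation, mapping):
    """Extrinsic LLRs (11.51), (11.53) for one equalized sample z."""
    # per-symbol metric: -|z - g_delta*Xl|^2 / sigma_nu^2 + sum_k Xl,k * La_k
    metric = (-np.abs(z - g_delta * constellation) ** 2 / sigma2_nu
              + mapping @ La)
    Le = np.empty(len(La))
    for j in range(len(La)):
        num = np.logaddexp.reduce(metric[mapping[:, j] == 1])
        den = np.logaddexp.reduce(metric[mapping[:, j] == 0])
        Le[j] = (num - den) - La[j]   # a posteriori LLR minus a priori, (11.53)
    return Le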
In the particular case of BPSK modulation, the SISO demapping equations are
simplified to give the following expression of the extrinsic LLR:
    L(xi) = 4 Re{zi} / (1 − gΔ)                                          (11.54)
When Gray mapping rules are used, experience shows that we can reduce the
complexity of the demapping by ignoring the a priori information coming from
the decoder in the equations above10 , without really affecting the performance
of the device. On the other hand, this simplification no longer applies when
we consider other mapping rules, like the Set Partitioning rule used in coded
modulation schemes. This point has been particularly well highlighted in [11.14]
and [11.30].
This completes the description of the soft-input soft-output linear MMSE
equalizer. Finally, we can note that unlike the BCJR-MAP equalizer, the com-
plexity of the SISO mapping and demapping operations increases linearly (and
not exponentially) with the size M of the constellation and with the number
L of taps in the impulse response of the discrete-time equivalent channel model.
The component of x̃i corresponding to the soft estimate x̄i−Δ is set to zero in order
not to cancel the signal of interest. Vectors fi = (fi,F , . . . , fi,−F )^T and gi =
(gi,G , . . . , gi,−G )^T represent the coefficients of filters f and g, respectively. Both
vectors are functions of time since they are updated at each new received symbol.
The relations used to update the vectors of the coefficients can be obtained
from a least-mean square (LMS) gradient algorithm:
where σx̄² corresponds to the variance of the soft estimates x̄i obtained from
(11.24) and ηi denotes a zero-mean, circularly-symmetric complex additive white
Gaussian noise with unit variance.
In the tracking period and in order to enable the equalizer to follow the
variations of the channel, it is possible to replace the transmitted symbols xi in
relations (11.56) by the decisions x̂i at the output of the equalizer, or by the
decisions on the estimated symbols x̄i .
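By way of illustration, one possible stochastic-gradient (LMS) step for the two filters could look as follows, assuming the equalizer output is formed as f^H yi − g^H x̃i and the error is measured against the reference symbol (training symbol, or decision in the tracking period); the indexing convention, the step structure and the function name are ours and only sketch the kind of update involved.

import numpy as np

def lms_step(f, g, y_vec, xbar_vec, ref, mu, delta):
    """One LMS update of the forward filter f and the cancellation filter g."""
    x_tilde = xbar_vec.copy()
    x_tilde[delta] = 0.0                        # keep the signal of interest
    z = f.conj() @ y_vec - g.conj() @ x_tilde   # equalizer output
    e = ref - z                                 # error against the reference symbol
    f = f + mu * np.conj(e) * y_vec             # stochastic-gradient corrections
    g = g - mu * np.conj(e) * x_tilde
    g[delta] = 0.0                              # constraint g_{i,delta} = 0
    return f, g, z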
When the SISO MMSE equalizer is realized in adaptive form, we do not
explicitly have access to the channel impulse response, and the updating relation
of gi does not enable gΔ to be obtained since component gi,Δ is constrained to
be zero. To perform the SISO demapping operation, we must however estimate
both the bias gΔ on the data zi provided by the equalizer and the variance σν² of the
residual interference at the output of the equalizer. As we will see, these two
parameters can be estimated from the output of the equalizer. From relation
(11.46) again, we can show the following general result:

    E{|zi|²} = gΔ σx²                                                    (11.58)
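Combining (11.58) with the model zi = gΔ xi−Δ + νi, and assuming xi−Δ and νi are uncorrelated, also gives σν² = gΔ(1 − gΔ)σx². A minimal estimation sketch, with illustrative names, could then be:

import numpy as np

def demapping_parameters(z, sigma2_x=1.0):
    """Estimate g_delta via the sample version of (11.58) and deduce sigma_nu^2."""
    g_delta = np.mean(np.abs(z) ** 2) / sigma2_x
    sigma2_nu = g_delta * (1.0 - g_delta) * sigma2_x   # from z_i = g_delta*x_{i-delta} + nu_i
    return g_delta, sigma2_nu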
Examples of performance
For comparison purposes, the performance of the MMSE turbo equalizer has
been simulated by considering the same transmission scenario as for the turbo
MAP equalizer.
First, the parameters of the channel are assumed to be perfectly estimated.
The coefficients are calculated once per frame by matrix inversion, by considering
a digital filter with F = 15 coefficients and an equalization delay Δ = 9. The
simulation results, obtained after 10 iterations, are presented in Figure 11.16.
Figure 11.16 – Performance of the MMSE turbo equalizer for BPSK transmission on
the Proakis C channel, with a 4-state rate R = 1/2 non-recursive non-systematic
convolutional code and a 16384 bit pseudo-random interleaver.
We note in particular that the MMSE turbo equalizer requires more iterations
than the MAP turbo equalizer to reach comparable error rates.
However, the MMSE turbo equalizer here shows its capacity to suppress all the
ISI when the signal to noise ratio is high enough, even on a channel that is known
to be difficult to equalize. It is therefore a serious alternative solution to the
MAP turbo equalizer when the latter cannot be used for reasons of complexity.
Second, the hypothesis of perfect knowledge of the channel parameters has
been removed and the turbo equalizer is simulated in the adaptive form, keep-
ing the same transmission parameters. The communication begins with the
transmission of an initial training sequence of 16384 symbols assumed to be per-
fectly known by the receiver. Then, frames composed of 1000 training symbols
followed by 16384 information symbols are periodically sent into the channel.
Figure 11.17 – Performance of the adaptive MMSE turbo equalizer for BPSK trans-
mission on the Proakis C channel, with a 4-state rate R = 1/2 non-recursive non-
systematic convolutional code and a 16384 bit pseudo-random interleaver.
During the processing of the 16384 information symbols, the turbo equalizer
operates in a decision-directed manner. The equalizer filters each have 21 coef-
ficients (F = G = 10). The coefficients are updated using the LMS algorithm.
The step size is set to μ = 0.0005 during the training period, and then to
μ = 0.000005 during the tracking period. Simulation results are given in Fig-
ure 11.17, considering 10 iterations at reception. We observe a degradation of
the order of only 1 dB compared to the ideal situation where the channel is as-
sumed to be perfectly known. We note that when the channel is estimated and
then used for the direct computation of the coefficients of the MMSE equalizer,
losses in performance also appear, which reduces the degradation in comparison
with the ideal situation of Figure 11.16. Note also that, when plotting the
performance of Figure 11.17, we have not taken into account the loss in signal
to noise ratio caused by the use of training sequences.
In the light of these results, we note that the major difference between adap-
tive MMSE turbo equalization and that which uses direct computation of the
coefficients from the estimate of the channel lies in the way the filter coeffi-
cients are determined, since the structure and the optimization criterion of the
equalizers are identical.
To finish, we point out that, in the same way as for the turbo MAP equalizer,
we can use EXIT charts to predict the theoretical convergence threshold of the
MMSE turbo equalizer, under the hypothesis of ideal interleaving. The reader
will find further information on this subject in [11.8] or [11.50], for example.
r = SAb + n (11.61)
where S is the matrix whose columns are the spreading sequences sk of the K users,
A is the diagonal matrix of the users' amplitudes, b is the vector of the bits
transmitted by the K users, and n is a centred AWGN vector with variance σ².
The source data rates of the different users can be different. The size of the
spreading code is such that the chip data rate (after spreading) is the same for all
users. The received signal r is given by the contribution of all the K users plus
a centred AWGN with variance σ 2 . From observation r, we wish to recover the
information bits dk of each user. Figure 11.19 gives the diagram of the receiver
using a turbo CDMA type technique to jointly perform multi-user detection and
channel decoding.
Standard receiver
The simplest (conventional or standard) detector is the one which operates as
if each user were alone on the channel. The receiver is quite simply made up of
the filter matched to the signature of the user concerned (this operation is also
called despreading); see Figure 11.20.
At the output of the matched filter bank the signal can be written in the
form:

    y = S^T r = RAb + S^T n                                              (11.62)

We note that the vector of additive noise at the output of the matched filter
bank is made up of correlated components. Its covariance matrix depends
directly on the intercorrelation matrix of the spreading sequences used, R =
S^T S. We have S^T n ∼ N(0, σ²R).
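To fix ideas, the chip-level model (11.61) and the matched filter bank (11.62) can be simulated in a few lines; the sketch below uses random binary signatures and illustrative parameter values of our own choosing.

import numpy as np

rng = np.random.default_rng(0)
K, N = 6, 31                                            # users and spreading factor (illustrative)
S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)   # unit-norm binary signatures
A = np.diag(np.ones(K))                                 # equal received amplitudes
b = rng.choice([-1.0, 1.0], size=K)                     # one bit per user
sigma = 0.5
r = S @ A @ b + sigma * rng.standard_normal(N)          # received signal, (11.61)

R = S.T @ S                                             # intercorrelation matrix of the codes
y = S.T @ r                                             # matched filter bank output, (11.62)
b_hat = np.sign(y)                                      # standard (single-user) decisions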
We can show that the error probability (before channel decoding) for the k-th
user can be written in the form:
    Pe,k = Pr(b̂k ≠ bk) = (1/2^(K−1)) Σ_{b−k ∈ {−1,+1}^(K−1)} Q( Ak/σ + Σ_{j≠k} (Aj/σ) bj ρjk )    (11.63)
where ρjk = sTj sk measures the intercorrelation between the codes of users j
and k, with b−k = (b1 , b2 , · · · , bk−1 , bk+1 , · · · , bK ).
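Expression (11.63) can be evaluated by direct enumeration of the 2^(K−1) interfering bit patterns. The sketch below does so for a scenario similar to that of Figure 11.21 (constant intercorrelation 0.2, equal powers); the helper names and the value of σ are ours.

import numpy as np
from itertools import product
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def pe_standard(k, A, rho, sigma):
    """Error probability (11.63) of user k for the standard detector."""
    K = len(A)
    others = [j for j in range(K) if j != k]
    total = 0.0
    for b in product((-1.0, 1.0), repeat=K - 1):       # all interfering bit patterns
        arg = A[k] / sigma + sum(b[i] * A[j] * rho[j, k] / sigma
                                 for i, j in enumerate(others))
        total += Q(arg)
    return total / 2 ** (K - 1)

rho = np.full((6, 6), 0.2)
np.fill_diagonal(rho, 1.0)
pe_user1 = pe_standard(0, np.ones(6), rho, sigma=0.5)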
Assuming that the spreading codes used are such that the intercorrelation
coefficients are constant and equal to 0.2, Figure 11.21 gives the performance of
the standard receiver, in terms of error probability of the first user as a function
of the signal to noise ratio, for a number of users varying from 1 to 6. The
messages of all the users are assumed to be received with the same power. We
note of course that the higher the number of users, the worse the performance.
This error probability can even tend towards 1/2 as the signal to noise ratio
increases if the following condition (Near Far Effect) is not satisfied:

    Ak > Σ_{j≠k} Aj |ρjk|
Figure 11.21 – Error probability of the first user as a function of the signal to noise
ratio Eb/N0 for constant intercorrelation coefficients ρ = 0.2, for K = 6 users sharing
the resource.
The optimal (maximum likelihood) detection of vector b is characterized by the
equivalences:

    Maxb f(y | b)  ⇔  Minb ‖y − RAb‖²_{R⁻¹}  ⇔  Maxb ( 2 b^T A y − b^T A R A b )    (11.64)
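The right-hand criterion of (11.64) can be maximized by exhaustive search over the 2^K candidate vectors, which is only practical for small K; a minimal sketch, with illustrative names:

import numpy as np
from itertools import product

def ml_detect(y, R, A):
    """Exhaustive maximization of 2 b^T A y - b^T A R A b, cf. (11.64)."""
    best_b, best_metric = None, -np.inf
    for cand in product((-1.0, 1.0), repeat=len(y)):
        b = np.array(cand)
        metric = 2.0 * b @ A @ y - b @ A @ R @ A @ b
        if metric > best_metric:
            best_b, best_metric = b, metric
    return best_b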
Decorrelator detector
The decorrelator detector involves multiplying the observation y by the inverse of
the intercorrelation matrix of the codes: R⁻¹y = Ab + R⁻¹S^T n. This equation
shows that the decorrelator enables the multiple access interference to be cancelled
completely, which makes it robust with respect to the Near Far Effect. On the
other hand, the resulting additive Gaussian noise has a greater variance. Indeed,
we have R⁻¹S^T n ∼ N(0, σ²R⁻¹). The error probability of the k-th user
can then be written in the form:
" #
Ak
Pe,k = P b̂k
= bk = Q + (11.65)
σ (R−1 )kk
Iterative detector
Decorrelator receivers or MMSE receivers can be implemented with iterative
matrix inversion methods (Jacobi, Gauss-Seidel, or relaxation). The Jacobi
method leads to a parallel interference cancellation (PIC) structure, while the
Gauss-Seidel method leads to the successive interference cancellation (SIC)
considered here. For the k-th user at iteration m, the SIC detection statistic
can be written:
    bm,k = sk^T [ Π_{j=1}^{k−1} (I − sj sj^T) ] [ Σ_{p=0}^{m−1} ΦK^p ] r = gm,k^T r    (11.68)
with:
    ΦK = Π_{j=1}^{K} (I − sj sj^T)                                       (11.69)
We can show that the error probability for the k-th user at iteration m can be
written in the following form, where S is the matrix of the codes and A is the
diagonal matrix of the amplitudes:
    Pe(m, k) = (1/2^(K−1)) Σ_{b: bk=+1} Q( gm,k^T S A b / (σ √(gm,k^T gm,k)) )    (11.70)
Figure 11.25 gives an example of simulations of the SIC method with 5 users, a
spreading factor of 20 and an intercorrelation matrix given by:
        ⎛  1     0.3    0      0      0   ⎞
        ⎜ 0.3     1    0.3    0.3    0.1  ⎟
    R = ⎜  0     0.3    1      0    −0.2  ⎟
        ⎜  0     0.3    0      1      0   ⎟
        ⎝  0     0.1  −0.2     0      1   ⎠
We note that after 3 iterations, the SIC converges towards the result obtained
with the decorrelator (we can prove mathematically that the SIC converges
towards this result when the number of iterations M tends towards infinity).
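This convergence is easy to check numerically: on the matched filter outputs, the serial cancellation described above amounts to a Gauss-Seidel iteration for the system Rz = y. A small sketch with 5 users and a spreading factor of 20, echoing the example above (the random codes and noise level are ours):

import numpy as np

def sic(r, S, n_iter):
    """Serial cancellation on the residual signal; returns the statistics b_{m,k}."""
    e = np.array(r, dtype=float)                # residual signal
    z = np.zeros(S.shape[1])                    # accumulated per-user statistics
    for _ in range(n_iter):
        for k in range(S.shape[1]):
            corr = S[:, k] @ e                  # correlate the residual with code k
            z[k] += corr
            e -= S[:, k] * corr                 # cancel the contribution just estimated
    return z

rng = np.random.default_rng(1)
N, K = 20, 5
S = rng.standard_normal((N, K))
S /= np.linalg.norm(S, axis=0)                  # unit-norm spreading codes
r = S @ rng.choice([-1.0, 1.0], size=K) + 0.3 * rng.standard_normal(N)
# After a few passes the statistics approach the decorrelator output R^{-1} S^T r
z_sic = sic(r, S, 10)
z_dec = np.linalg.solve(S.T @ S, S.T @ r)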
• Reed and Alexander [11.46] have proposed to use a matched filter bank
  followed (in parallel) by different decoders before subtracting, for each
  user, the multiple access interference linked to the K − 1 other users.
• Wang and Poor [11.56] have proposed a multi-user detector that involves
implementing in parallel the MMSE filters associated with each user, fol-
lowed by the corresponding channel decoders. These two elements ex-
change their extrinsic information iteratively.
• Tarable et al. [11.48] have proposed a simplification of the method pre-
  sented in [11.56]. For the first iterations, an MMSE type multi-user detec-
  tor is used, followed by channel decoders placed in parallel. For the final
  iterations, the MMSE filter is replaced by a matched filter bank.
This ratio is then transformed into a weighted (soft) estimate of the binary
elements:

    b̃m,k = E[bk | ym,k] = tanh( LLR(bk | ym,k) / 2 )                     (11.72)
The soft estimation of user k at iteration m is given by bm,k = Ak b̃m,k . The
difference (bm,k − bm−1,k) is interleaved by πk before spreading by sk. The result
thus obtained, Δem,k, is subtracted from the residual signal em,k to obtain the
new residual signal em,k+1 of the following user (if k < K), or the new residual
signal em+1,1 for the first user at the following iteration (em,K+1 = em+1,1).
Here, ym,k is written in the form ym,k = Ak bk + νm,k, where νm,k (the residual
multiple access interference plus the additive noise) is approximated by a centred
Gaussian random variable.

Figure 11.26 – Interference cancellation unit for the turbo SIC decoder in CDMA for
the k-th user and at iteration m.
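A compact sketch of the soft update performed by one interference cancellation unit, following (11.72); the way the LLR is obtained (here assumed to come from the channel decoder of user k) and all identifiers are assumptions made for illustration.

import numpy as np

def icu_soft_update(llr_k, A_k, b_soft_prev):
    """Soft update of one interference cancellation unit, following (11.72).

    llr_k       : LLR(b_k | y_{m,k}) delivered for user k (assumed to come from
                  the corresponding channel decoder)
    A_k         : amplitude of user k
    b_soft_prev : soft contribution A_k * btilde_{m-1,k} kept from the previous pass
    """
    b_tilde = np.tanh(llr_k / 2.0)          # E[b_k | y_{m,k}], relation (11.72)
    b_soft = A_k * b_tilde                  # new soft estimate of the contribution
    delta = b_soft - b_soft_prev            # increment to interleave, respread and
    return b_soft, delta                    # subtract from the residual e_{m,k}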
Some simulations
To give an idea of the performance of the turbo SIC decoder, Gold sequences
of size 31 are generated. The channel turbo code (rate R = 1/3) standardized
for UMTS [11.1] is used. We consider frames of 640 bits per user. The external
interleavers of the different users are generated randomly. The BER and PER
are averaged over all the users. For channel turbo decoding, the Max-Log-MAP
algorithm is used, with 8 internal iterations in the turbo decoder. Figure 11.27(a)
gives the performance of the turbo SIC decoder for one, two and three iterations
with K = 31 users (that is, a 100% load) having the same power. The performance
of the single-user detector and of the conventional detector are also indicated.
Figure 11.27(b) shows the performance in terms of PER.
Figure 11.27 – Performance of the turbo SIC decoder: (a) mean Binary Error Rates
(BER) (b) mean Packet Error Rates (PER). K = 31 users, spreading factor of 31, with
frame size 640 bits.
In the presence of a multipath channel with transfer function ck(t), it suffices to
replace the despreading filter in the ICUm,k unit by a RAKE filter (a filter matched
to the spreading sequence convolved with ck(t)), and to replace the spreading
operation by spreading convolved with ck(t). This new structure is called a
turbo SIC/RAKE decoder.
The turbo SIC/RAKE decoder is used particularly in the context of the
uplink of the UMTS-FDD system.
11.3 Conclusions
In this chapter, we have presented the first two systems to have benefited from
applying the turbo principle to a context other than error correction coding. In
the first part, we have described the principle of turbo equalization, which relies
on an iterative exchange of probabilistic information between a SISO equalizer
and a SISO decoder. The SISO equalizer can take different forms according
to the chosen optimization criterion. We have presented two types of SISO
equalizers: the BCJR-MAP equalizer, operating on the trellis representation of
the ISI channel, and the MMSE equalizer, which uses linear filtering. The MAP
turbo equalizer leads to excellent performance compared to the conventional
receiver. However, this approach is often avoided in practice since it leads to a
very high computation cost. We have discussed several solutions for reducing
the complexity of the BCJR-MAP equalizer. As for the MMSE turbo equalizer,
it offers a good compromise between performance and complexity. For many
transmission configurations it leads to performance close to that offered by the
BCJR-MAP turbo equalizer, with reasonable complexity. In addition, unlike
the BCJR-MAP turbo equalizer, the MMSE turbo equalizer can be realized
in adaptive form, thereby jointly performing equalization and tracking of the
channel time variations.
In the second part, we have dealt with the application of the turbo princi-
ple to the domain of multi-user communications in code-division multiple access
systems. We have presented a survey of conventional multi-user detection tech-
niques. In particular, the PIC and SIC methods for cancellation of multi-user
interference have been described. Their particular structures lead to a relatively
simple exploitation of the turbo principle in a multi-user transmission context.
Like for turbo equalization, different detectors can be implemented based on
MMSE filters or matched-filter banks, for example.
In this chapter, we have deliberately limited ourselves to the presentation of
two particular systems exploiting the turbo principle. However, more generally,
any problem of detection or parameter estimation may benefit from the turbo
principle. Thus, the range of solutions dealing with interference caused by a
multi-antenna system at transmission and at reception (MIMO) has been en-
riched by iterative techniques such as the turbo BLAST (Bell Labs layered space
time) [11.25]. The challenge involves proposing SISO detectors of reasonable
complexity, without sacrificing data rates and/or the high performance of such
systems.
We can also mention the efforts dedicated to receiver synchronization. In-
deed, the gains in power provided by the turbo principle lead to moving the
systems’ operation point towards low signal to noise ratios. Now, conventional
synchronization devices were not initially intended to operate in such difficult
conditions [11.31]. One possible solution is to integrate the synchronization into
the turbo process. A state of the art of turbo methods for timing synchronization
was presented in [11.4]. More generally, when the choice of turbo processing at
reception is performed, it seems interesting, or even necessary, to add a system
to the receiver to iteratively estimate the transmission parameters, like channel
turbo estimation or turbo synchronization.
Among other applications, the uplink of future radio-mobile communications
systems will require higher and higher data rates, with an ever-increasing number
of users. This is one of the favourite applications of the turbo principle, the
generalization of which will be essential in order to respond to the never-ending
technological challenge posed by the evolution of telecommunications.
Understanding the turbo principle has led to the introduction of novel theo-
retical tools and concepts, like EXIT charts or factor graphs. While the former
enable accurate prediction of the convergence threshold of iterative decoding
schemes, the latter offer a graphical framework for representing complex detection
problems.
Bibliography
[11.1] ETSI. Digital cellular telecommunication system (Phase 2+). GSM 05
Series, Rel. 1999.
[11.2] B. S. Ünal, A. O. Berthet, and R. Visoz. Iterative decoding of convolu-
tionally encoded signals over multipath Rayleigh fading channels. IEEE
Journal on Selected Areas in Communications, 19(9):1729–1743, Sept.
2001.
[11.3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of
linear codes for minimizing symbol error rate. IEEE Transactions on
Information Theory, IT-20:284–287, March 1974.
[11.22] Q. Guo, L. Ping, and H.-A. Loeliger. Turbo equalization based on factor
graphs. In Proceedings of IEEE International Symposium on Information
Theory (ISIT’05), pages 2021–2025, Sept. 2005.
[11.23] J. Hagenauer. Soft-in / soft-out – the benefits of using soft decisions in all
stages of digital receivers. In Proceedings of 3rd International Symposium
on DSP Techniques applied to Space Communications, Noordwijk, The
Netherlands, Sept. 1992.
[11.24] J. Hagenauer, E. Offer, C. Measson, and M. Mörz. Decoding and
equalization with analog non-linear networks. European Transactions
on Telecommunications, pages 107–128, Oct. 1999.
[11.25] S. Haykin, M. Sellathurai, Y. de Jong, and T. Willink. Turbo-MIMO
for wireless communications. IEEE Communications Magazine, pages
48–53, Oct. 2004.
[11.26] M. Hélard, P.J. Bouvet, C. Langlais, Y.M. Morgan, and I. Siaud. On the
performance of a turbo equalizer including blind equalizer over time and
frequency selective channel. Comparison with an OFDM system. In Pro-
ceedings of International Symposium on Turbo Codes & Related Topics,
pages 419–422, Brest, France, Sept. 2003.
[11.27] G. D. Forney Jr. Maximum-likelihood sequence estimation of digital
sequences in the presence of intersymbol interference. IEEE Transactions
on Information Theory, IT-18(3):363–378, May 1972.
[11.28] W. Koch and A. Baier. Optimum and sub-optimum detection of coded
data disturbed by time-varying intersymbol interference. In Proceedings
of IEEE Global Telecommunications Conference (GLOBECOM’90),
volume 3, pages 1679–1684, San Diego, CA, 2-5 Dec. 1990.