Maximum Likelihood Decoding of Convolutional Codes
• Suppose a code sequence is transmitted.
• Let $\mathbf{r} = (r_0, r_1, \ldots, r_l, \ldots, r_{L+m-1})$ be the received sequence, where the $l$-th received block is $r_l = (r_l^{(1)}, r_l^{(2)}, \ldots, r_l^{(n)})$.
• MLD: find the path $\mathbf{c}$ through the trellis diagram for which the conditional probability $P(\mathbf{r}\,|\,\mathbf{c})$ is largest.
• For a binary-input, $Q$-ary output discrete memoryless channel (DMC), $\mathbf{c}$ is a binary sequence and $\mathbf{r}$ is a $Q$-ary sequence.
• Since the channel is memoryless, $P(\mathbf{r}\,|\,\mathbf{c})$ can be expressed as the product
$$P(\mathbf{r}\,|\,\mathbf{c}) = \prod_{l=0}^{L+m-1} P(r_l\,|\,c_l) \qquad (1)$$
where $P(r_l\,|\,c_l)$ is the branch conditional probability.
• The branch conditional probability is given by
$$P(r_l\,|\,c_l) = \prod_{i=1}^{n} P\big(r_l^{(i)}\,\big|\,c_l^{(i)}\big) \qquad (2)$$
where $P(r_l^{(i)}\,|\,c_l^{(i)})$ is the channel transition probability.
• Define the log-likelihood function of a path $\mathbf{c}$ as
$$M(\mathbf{r}\,|\,\mathbf{c}) \triangleq \log P(\mathbf{r}\,|\,\mathbf{c}) \qquad (3)$$
which is called the metric of the path $\mathbf{c}$.
• From (1) and (3), we have
$$M(\mathbf{r}\,|\,\mathbf{c}) = \sum_{l=0}^{L+m-1} \log P(r_l\,|\,c_l) \qquad (4)$$
$$M(\mathbf{r}\,|\,\mathbf{c}) = \sum_{l=0}^{L+m-1} M(r_l\,|\,c_l) \qquad (5)$$
where
$$M(r_l\,|\,c_l) \triangleq \log P(r_l\,|\,c_l)$$
is called the branch metric.
• From (2) and (5), we have the branch metric
$$M(r_l\,|\,c_l) = \sum_{i=1}^{n} \log P\big(r_l^{(i)}\,\big|\,c_l^{(i)}\big) \qquad (6)$$
$$M(r_l\,|\,c_l) = \sum_{i=1}^{n} M\big(r_l^{(i)}\,\big|\,c_l^{(i)}\big) \qquad (7)$$
where
$$M\big(r_l^{(i)}\,\big|\,c_l^{(i)}\big) \triangleq \log P\big(r_l^{(i)}\,\big|\,c_l^{(i)}\big)$$
is called the bit metric.
• MLD: find the path $\mathbf{c}$ in the trellis diagram such that $M(\mathbf{r}\,|\,\mathbf{c})$ is maximized. Then $\mathbf{c}$ is the estimate of the transmitted code sequence.
• For the first $j$ branches of a path $\mathbf{c}$ through the trellis, the partial path metric is
$$M\big([\mathbf{r}\,|\,\mathbf{c}]_j\big) = \sum_{l=0}^{j-1} M(r_l\,|\,c_l) \qquad (8)$$
• For a BSC with transition probability $p$, the path metric reduces to
$$\log P(\mathbf{r}\,|\,\mathbf{c}) = d(\mathbf{r},\mathbf{c})\,\log\frac{p}{1-p} + (L+m)\,n\,\log(1-p) \qquad (9)$$
where $d(\mathbf{r},\mathbf{c})$ is the Hamming distance between $\mathbf{r}$ and $\mathbf{c}$.
• Since $\log[p/(1-p)] < 0$ for $p < 1/2$ and $(L+m)\,n\,\log(1-p)$ is a constant for all code sequences $\mathbf{c}$, $\log P(\mathbf{r}\,|\,\mathbf{c})$ is maximized if and only if $d(\mathbf{r},\mathbf{c})$ is minimized.
• MLD for the BSC: the received sequence $\mathbf{r}$ is decoded into the code sequence $\mathbf{c}$ for which $d(\mathbf{r},\mathbf{c})$ is minimized.
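As an illustration of the metric hierarchy in (4)–(7), the following minimal Python sketch computes bit, branch, and path metrics for a binary-input, 4-ary-output DMC. The transition probabilities in P_trans are hypothetical values chosen only for illustration.

import math

# Hypothetical transition probabilities P(r | c) of a binary-input, 4-ary-output DMC
# (outer dict key: transmitted bit c; inner dict key: received symbol r).
P_trans = {
    0: {"0_strong": 0.60, "0_weak": 0.25, "1_weak": 0.10, "1_strong": 0.05},
    1: {"0_strong": 0.05, "0_weak": 0.10, "1_weak": 0.25, "1_strong": 0.60},
}

def bit_metric(r_bit, c_bit):
    """Bit metric M(r|c) = log P(r|c), as in (7)."""
    return math.log(P_trans[c_bit][r_bit])

def branch_metric(r_block, c_block):
    """Branch metric: sum of the n bit metrics of one branch, as in (6)."""
    return sum(bit_metric(r, c) for r, c in zip(r_block, c_block))

def path_metric(r_seq, c_seq):
    """Path metric: sum of the branch metrics over all branches, as in (4)-(5)."""
    return sum(branch_metric(rl, cl) for rl, cl in zip(r_seq, c_seq))

# Metric of a two-branch path of a rate-1/2 code for a two-block received sequence.
r = [("1_strong", "0_weak"), ("1_weak", "1_strong")]
c = [(1, 0), (1, 1)]
print(path_metric(r, c))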
Basic Concepts
• Generate the code trellis at the decoder.
• The decoder proceeds through the code trellis level by level, in search of the transmitted code sequence.
• At each level of the trellis, the decoder computes and compares the metrics of all the partial paths entering a node.
• The decoder stores the partial path with the largest metric and eliminates all the other partial paths. The stored partial path is called the survivor.
• For $m \le l \le L$, there are $2^{km}$ nodes at the $l$-th level of the code trellis; hence there are $2^{km}$ survivors.
• When the code trellis begins to terminate, the number of survivors decreases.
• At the end, the $(L+m)$-th level, there is only one node (the all-zero state) and hence only one survivor.
• This last survivor is the maximum-likelihood path (or code sequence).

The Viterbi Algorithm
Step 1. Starting at level $l = m$ in the trellis, compute the partial metric for the single path entering each $m$-th order node. Store the path (the survivor) and its metric for each node.
Step 2. Increase $l$ by 1. Compute the partial metric for all the paths entering an $(l+1)$-th order node by adding the branch metric entering that node to the metric of the connecting survivor at the preceding $l$-th order node. For each $(l+1)$-th order node, store the path with the largest metric (the survivor), together with its metric, and eliminate all the other paths.
Step 3. If $l < L + m$, repeat Step 2. Otherwise, stop.
Example 9.1: Consider the (2,1,2) convolutional code given in Example 8.2, whose trellis diagram is shown in Figure 9.1. Suppose the code is used on a BSC. In this case, we may use the Hamming distance as the path metric; the survivor at each node is then the path with the smallest Hamming distance from the received sequence.
• The message length is L = 5.
• There are 7 levels in the trellis.
• The decoding process is shown in Figures 9.2 to 9.8.
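A minimal Python sketch of the algorithm applied to the received sequence of this example is given below. The generator polynomials $g^{(1)} = (1\,1\,1)$ and $g^{(2)} = (1\,0\,1)$ are an assumption (a common choice for a (2,1,2) code); substitute the generators of Example 8.2 if they differ.

from itertools import product

G = ((1, 1, 1), (1, 0, 1))   # assumed generator polynomials, one tuple per output bit
M = 2                        # memory order m of the (2,1,2) code

def branch_output(state, b):
    """Encoder output block when input bit b enters a register holding `state`."""
    regs = (b,) + state
    return tuple(sum(x * g for x, g in zip(regs, gk)) % 2 for gk in G)

def viterbi(r, L):
    """Hard-decision Viterbi decoding of L message bits from L+M received blocks."""
    states = list(product((0, 1), repeat=M))
    inf = float("inf")
    metric = {s: (0 if s == (0,) * M else inf) for s in states}  # start in the all-zero state
    path = {s: [] for s in states}
    for l, rl in enumerate(r):
        inputs = (0, 1) if l < L else (0,)        # the trellis terminates with m zeros
        new_metric = {s: inf for s in states}
        new_path = {s: [] for s in states}
        for s in states:
            if metric[s] == inf:
                continue
            for b in inputs:
                out = branch_output(s, b)
                d = metric[s] + sum(x != y for x, y in zip(out, rl))  # Hamming metric
                ns = (b,) + s[:-1]                # shift the new input bit in
                if d < new_metric[ns]:            # keep only the survivor at each node
                    new_metric[ns] = d
                    new_path[ns] = path[s] + [b]
        metric, path = new_metric, new_path
    return path[(0,) * M][:L]                     # drop the m terminating zeros

# Received sequence of Example 9.1 (L = 5, hence L + m = 7 blocks).
r = [(0, 1), (1, 1), (1, 0), (1, 0), (0, 0), (1, 1), (1, 0)]
print(viterbi(r, L=5))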
[Figure 9.1. The trellis diagram of a (2,1,2) code with L = 5.]
[Figures 9.2–9.8. The decoding process, level by level, for the received sequence r = (01, 11, 10, 10, 00, 11, 10); each figure shows the survivors and their accumulated Hamming metrics.]
3. Error Performance
• Without loss of generality, we assume that the all-zero code sequence (the all-zero path in the code trellis) is transmitted.
• In the process of decoding, we say that a first-event error is committed if the all-zero path (the correct path) is eliminated for the first time at an arbitrary node in the trellis.
• A measure of the error performance of a convolutional code is the first-event error probability, denoted $P(E)$.
• Another measure of the error performance is the probability that a decoded message bit is in error. This probability is normally called the decoded bit-error rate (BER), denoted $P_b(E)$.
• Upper bounds on these two error probabilities can be derived from the generating functions of the code.
• For a BSC with transition probability $p$, only the resulting bounds are given here:
$$P(E) < T(X)\big|_{X = 2\sqrt{p(1-p)}} \qquad (10)$$
$$P_b(E) < \frac{1}{k}\,\frac{\partial T(X,Y)}{\partial Y}\bigg|_{Y=1,\;X = 2\sqrt{p(1-p)}} \qquad (11)$$
• For small $p$, these bounds are dominated by their first terms,
$$P(E) \approx A_{d_{free}}\big[2\sqrt{p(1-p)}\big]^{d_{free}}, \qquad P_b(E) \approx \frac{B_{d_{free}}}{k}\big[2\sqrt{p(1-p)}\big]^{d_{free}} \qquad (12)$$
where $A_{d_{free}}$ is the number of code sequences of weight $d_{free}$, and $B_{d_{free}}$ is the total number of nonzero message bits on those weight-$d_{free}$ code sequences.
• We find that $A_{d_{free}} = 1$ and $B_{d_{free}} = 1$.
• If the BSC is derived from an additive white Gaussian noise (AWGN) channel with BPSK modulation, optimum coherent detection, and binary output quantization, then
$$p = Q\!\left(\sqrt{\frac{2E}{N_0}}\right) \le \frac{1}{2}\,e^{-E/N_0} \qquad (13)$$
where $E$ is the energy per transmitted symbol and $N_0$ is the one-sided noise power spectral density.
• For a code of rate $R = k/n$, the energy per message bit is
$$E_b = \frac{E}{R} \qquad (14)$$
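Combining (13) and (14) gives the crossover probability seen by the hard-decision decoder in terms of the energy per message bit; this form is used in the approximations that follow:
$$p = Q\!\left(\sqrt{\frac{2RE_b}{N_0}}\right) \le \frac{1}{2}\,e^{-RE_b/N_0}$$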
For a Binary Input AWGN Channel With No Output Quantization (Soft Decision)
• $P(E)$ and $P_b(E)$ are upper bounded as follows:
$$P(E) < T(X)\big|_{X = e^{-RE_b/N_0}} \qquad (15)$$
$$P_b(E) < \frac{1}{k}\,\frac{\partial T(X,Y)}{\partial Y}\bigg|_{Y=1,\;X = e^{-RE_b/N_0}} \qquad (16)$$
• For large $E_b/N_0$, the bound (16) is dominated by its first term,
$$P_b(E) \approx \frac{B_{d_{free}}}{k}\, e^{-R\,d_{free}E_b/N_0} \qquad (17)$$
• Hard-decision decoding:
$$P_b(E) \approx \frac{B_{d_{free}}}{k}\; 2^{\,d_{free}/2}\, \exp\!\left[-\left(\frac{R\,d_{free}}{2}\right)\!\left(\frac{E_b}{N_0}\right)\right]$$
• Soft-decision decoding:
$$P_b(E) \approx \frac{B_{d_{free}}}{k}\, \exp\!\left[-R\,d_{free}\left(\frac{E_b}{N_0}\right)\right]$$
• For the same $P_b(E)$, soft-decision decoding requires a smaller $E_b/N_0$ than hard-decision decoding:
$$(E_b/N_0)_{soft} < (E_b/N_0)_{hard}$$
• Comparing the exponents of the two approximations above, asymptotically
$$(E_b/N_0)_{hard} \approx 2\,(E_b/N_0)_{soft},$$
a difference of about 3 dB.
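The two approximations above are easy to compare numerically. The short Python sketch below evaluates them as functions of $E_b/N_0$; the parameters $R = 1/2$, $d_{free} = 5$, $B_{d_{free}} = 1$, $k = 1$ are illustrative values only ($d_{free} = 5$ is an assumption, not taken from the text).

import math

def pb_hard(ebn0, R, dfree, Bdfree, k):
    """Hard-decision approximation: (Bdfree/k) * 2**(dfree/2) * exp(-(R*dfree/2)*Eb/N0)."""
    return (Bdfree / k) * 2 ** (dfree / 2) * math.exp(-(R * dfree / 2) * ebn0)

def pb_soft(ebn0, R, dfree, Bdfree, k):
    """Soft-decision approximation: (Bdfree/k) * exp(-R*dfree*Eb/N0)."""
    return (Bdfree / k) * math.exp(-R * dfree * ebn0)

# Compare the two approximations at a few Eb/N0 values (in dB).
for ebn0_db in (4, 6, 8):
    ebn0 = 10 ** (ebn0_db / 10)               # convert dB to a linear ratio
    print(ebn0_db, "dB  hard:", pb_hard(ebn0, 0.5, 5, 1, 1),
          " soft:", pb_soft(ebn0, 0.5, 5, 1, 1))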
• If the demodulator output is quantized to eight levels, the performance of soft-decision decoding is within about 1/4 dB of the optimum performance achievable with an unquantized demodulator output, while avoiding the need for an analog decoder.
• For small $E_b/N_0$, the energy gain of soft-decision decoding over hard-decision decoding is less than 3 dB, about 2 dB.
• Over the entire range of $E_b/N_0$ ratios, the gain of soft-decision decoding over hard-decision decoding runs between 2 and 3 dB.
4. Coding Gain
• Coding gain is defined as the reduction in the required $E_b/N_0$ (usually expressed in decibels) needed to achieve a specified error probability for a coded system, compared with an uncoded system using the same modulation and channel characteristics.
• For an uncoded coherent BPSK system on an AWGN channel, the bit-error rate is simply the transition probability,
$$P_b(E) = Q\!\left(\sqrt{\frac{2E_b}{N_0}}\right)$$
• For large $E_b/N_0$, this error rate (without coding) is approximated by
$$P_b(E) \approx 0.282\, e^{-E_b/N_0} \qquad (18)$$
Coding Gain with Hard-Decision Decoding
• Comparing the hard-decision approximation for $P_b(E)$ with (18), we see that for a fixed $E_b/N_0$ the exponent with coding is larger than the exponent without coding by a factor of
$$\frac{R\,d_{free}}{2}$$
• For large values of $E_b/N_0$, the exponential term dominates the error-probability expressions.
• Therefore, to achieve the same output bit-error rate $P_b(E)$, the coded system requires
$$G = 10\log_{10}\!\left(\frac{R\,d_{free}}{2}\right) \text{ dB} \qquad (19)$$
less power than the uncoded BPSK system.
• $G$ is called the asymptotic coding gain of the coded system over the uncoded BPSK system.
should be minimized.
• Analytic construction of good convolutional codes has not been successful.
• Most code construction has been done by computer search. As a result, only codes of relatively short constraint length (small memory order) that have maximal $d_{free}$ have been found.
• This code has $d_{free} = 10$.
• With hard-decision decoding, it provides a 3.98 dB coding gain over the uncoded BPSK modulation system.
• With soft-decision decoding, the coding gain is 6.98 dB.
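As a quick check (assuming the code referred to here has rate $R = 1/2$, which is consistent with the quoted 3.98 dB figure), these gains follow from (19) and its soft-decision counterpart $G = 10\log_{10}(R\,d_{free})$:
$$G_{hard} = 10\log_{10}\frac{(1/2)(10)}{2} = 10\log_{10}2.5 \approx 3.98\ \text{dB}, \qquad G_{soft} = 10\log_{10}\big[(1/2)(10)\big] = 10\log_{10}5 \approx 6.99\ \text{dB}.$$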
5. Implementation Considerations of the Viterbi Decoding Algorithm
• The complexity of a Viterbi decoder depends mostly on the number of states in the trellis (or state diagram) and on the decoding span (the length of the survivors that must be stored before a decoding decision is made).

Storage Consideration
• Consider an $(n,k,m)$ convolutional code.
• The encoder for this code consists of $k$ shift registers that store the message bits arriving at the $k$ input terminals.
• Let $m_i$ be the length of the $i$-th shift register. Then the memory order of the code is
$$m = \max_{1 \le i \le k} m_i$$
• Let $K = m_1 + m_2 + \cdots + m_k$.
• Then the encoder stores a total of $K$ message bits at a time; therefore, the encoder has $2^K$ possible states.
• This implies that, in the decoding process, there are $2^K$ survivors.
• The decoder must reserve $2^K$ words of storage (or buffer registers) for the survivors.
• Each word must be capable of storing a surviving path along with its metric.
• Since the storage size increases exponentially with $K$, in practice it is not feasible to use codes with large $K$.
• $K = 8$ is normally considered the practical limit for the Viterbi decoding algorithm.
• This constraint on $K$ limits the available minimum free distance of a code.
• As a result, the achievable error probability cannot be made arbitrarily small. Bit-error probabilities in the range $10^{-5}$ to $10^{-6}$ and (soft-decision) coding gains of around 7 dB are considered the practical limits for the Viterbi decoding algorithm in most cases.
Path Memory
• Suppose the message sequence is $kL$ bits (or $L$ blocks) long.
• To terminate the trellis for making a decoding decision, $m$ blocks of 0s must be inserted into the input stream after every $L$ blocks of message bits.
• By doing this, the effective rate of information transmission is reduced from $R$ to
$$R_{eff} = \frac{L}{L+m}\,R$$
• Since the energy per message bit is inversely proportional to the rate, a lower effective rate means a larger required $E_b/N_0$ to achieve a given performance.
• Hence, it is desirable to have $L$ as large as possible so that $R_{eff}$ is nearly $R$.
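For example, with the short message of Example 9.1 ($L = 5$, $m = 2$) the rate loss is severe, while for a long message (say $L = 1000$, an illustrative value) it is negligible:
$$R_{eff} = \frac{5}{5+2}\,R \approx 0.71\,R, \qquad R_{eff} = \frac{1000}{1000+2}\,R \approx 0.998\,R.$$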
• The difficulty with large $L$ is that each of the $2^K$ words of storage must be capable of storing a $kL$-bit (hypothesized) message sequence (corresponding to a surviving path) plus its metric.
• For very large $L$, this is practically impossible, and some trade-offs must be made.
• One approach to this problem is to truncate the path memory of the decoder by storing only the most recent $r$ blocks of message bits for each survivor, where $r \ll L$.
• After the first $r$ blocks of the received sequence have been processed by the decoder, the decoder memory is full.
• As soon as the next received block is processed, a decoding decision must be made on the first block of $k$ message bits, since it can no longer be stored in the decoder memory.
• The optimum strategy for making this decision is to select the survivor with the best metric; the first block of $k$ message bits of this survivor is chosen as the decoded message block and released to the user.
• After the first decoding decision is made, subsequent decoding decisions are made in the same manner for each new received block processed.
• Note that decoding decisions made in this way are no longer maximum likelihood, but they can be almost as good if $r$ is large enough.
• Experience and analysis have shown that if $r$ is on the order of 5 times the encoder memory $K$ or more, then with probability approaching 1 all $2^K$ survivors stem from the same branch $r$ levels back, as shown in Figure 9.9.
• Hence there is no ambiguity in making the decoding decision.
• The parameter $r$ is called the decoding span (or depth).
Figure 9.9
Decoder Organization
• A functional block diagram of a general Viterbi decoder is shown in Figure 9.10. It consists of:
(1) A synchronizer for determining the beginning of a branch in the received bit stream.
(2) A branch metric computer.
(3) A path-metric updating, comparison, and storage device.
(4) A device for updating and storing the survivors.
(5) An output decision device.
• A complete decoder can be built on a single chip.
[Figure 9.10. Functional block diagram of a general Viterbi decoder, from the input synchronizer to the output decision device.]
• One such scheme is being considered for the NASA TDRS system for error control. In this RS/Viterbi concatenated coding scheme, the outer code is the NASA standard (255,223) RS code over GF($2^8$), and the inner code is the (2,1,6) Odenwalter convolutional code with Viterbi decoding.