Hardware Implementation of a Viterbi Decoder Using the Minimal Trellis

Bruno Umbria Pedroni, Volnei Antonio Pedroni, and Richard Demo Souza†
Federal Technological University of Paraná (UTFPR) – Curitiba – PR – Brazil
[email protected], [email protected], [email protected]

Abstract—McEliece and Lin introduced the minimal trellis for convolutional codes, which can be considerably less complex than the conventional trellis typically used to construct Viterbi decoders. The authors state that this reduced trellis complexity can lead to less complex Viterbi decoders in practice. In this paper, we compare conventional and minimal Viterbi decoders for a rate 2/3 convolutional code considered by McEliece and Lin, which possesses a minimal trellis with half of the complexity of the conventional trellis. The decoder circuits were implemented in Altera programmable logic devices. We show that, though the complexity measure used by McEliece and Lin does not translate directly to practice, reductions in power consumption and hardware utilization, along with increased maximum operating frequency, can indeed be obtained for the decoders.
I. INTRODUCTION

In 1967, Viterbi presented his famous algorithm as a solution to the complex convolutional decoding problem [1]. Forty years later, the Viterbi algorithm is still the most widely used method for decoding convolutional codes [2]. Moreover, convolutional codes are present in many modern wireless communication standards, where the power consumed by the decoders can represent up to one-third of the overall power consumption of the receivers [3-4]. Therefore, reducing the decoder power consumption is of great interest, especially in wireless, battery-operated applications.

In [5], McEliece and Lin introduced a trellis complexity measure and showed that the minimal trellis can be much less complex than the conventional trellis, while achieving the same maximum likelihood performance when used with Viterbi decoders (VDs). Driven by this property of the minimal trellis, there has been a growing interest in the search for convolutional codes based on a trellis complexity criterion [6-9]. In order to verify the practical application of the theoretical complexity measure defined in [5], in this paper we construct VDs based on the conventional and minimal trellises for a rate 2/3 code considered by McEliece and Lin. The conventional VD and two variants of the minimal VD were implemented on Altera programmable logic devices (PLDs).

The main contribution of this paper is to show that, even though the complexity measure used by McEliece and Lin does not seem to translate directly to practical values, reductions in power consumption and hardware utilization, as well as a higher maximum operating frequency, can indeed be obtained by using the minimal trellis.

This paper is organized as follows. Section II introduces the code considered in our implementations, and its conventional and minimal trellises. Section III discusses the implementation of the three decoders (the conventional VD and the two minimal VDs). Sections IV and V present the VD circuits, with numerical results and performance analyses. Section VI concludes the paper.

† This work was partially supported by CNPq (Brazil) under grant 561443/2008-4.

II. TRELLIS STRUCTURES
Convolutional codes offer a simple encoding scheme [2] in which the information word (k bits) is fed into the encoder circuit and the codeword (n bits) is obtained. In the present work, the rate k/n = 2/3 code discussed in [5] was chosen for analysis. Fig. 1 shows the encoder for this code, which has m = 2 memory elements and minimum Hamming distance between codewords d_free = 3.

Figure 1. Rate 2/3 convolutional encoder (inputs u1, u2; delay elements D; outputs v1, v2, v3).
Every convolutional code can be represented by a regular trellis, called the conventional trellis [5], which is formed by 2^m states, with 2^k branches leaving and arriving at each state and each branch labeled by n bits. The trellis is periodic, with the fundamental period being the trellis module. The conventional trellis of the rate 2/3 code considered here is shown in Fig. 2(a); it comprises 4 states, with 4 branches per state and each branch labeled by 3 bits. Notice that the conventional trellis module is formed by a single phase; in other words, there are no intermediate nodes in the module and every branch represents a complete codeword (n bits/branch).
In [5], the authors proposed a trellis complexity measure defined as the number of trellis edge symbols per encoded bit, or simply symbols/bit, which they argue is proportional to the computational complexity of the Viterbi algorithm. This complexity measure is obtained by calculating the sum of edge (branch) symbols in the trellis module and dividing the total by k. The complexity of a conventional trellis is given by TC_CONV = (n/k)·2^(m+k) symbols/bit. Thus, the rate 2/3 code has a conventional trellis complexity TC_CONV = (3/2)·2^4 = 24 symbols/bit.

It is also stated in [5] that every convolutional code can be represented by a minimal trellis. The complexity of this trellis structure is always smaller than or equal to the conventional trellis complexity (when k = 1, the complexities are equal) and varies according to the characteristics of the code. The minimal trellis module of the rate 2/3 code is presented in Fig. 2(b). As can be observed, this type of structure is not very regular: it is always formed by n phases (its module possesses intermediate nodes), of which k contain decision nodes (nodes entered by more than one branch). The complexity of the minimal trellis can be calculated simply as the number of edges (branches) in the structure divided by k, since each branch represents a single symbol (one coded bit per branch). For the rate 2/3 code, the minimal trellis complexity is TC_MIN = 24/k = 12 symbols/bit. Notice that this is half of TC_CONV.
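The two complexity figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the total of 24 edge symbols for the minimal trellis module is taken directly from the text rather than derived from Fig. 2(b).

```python
def conventional_trellis_complexity(n: int, k: int, m: int) -> float:
    """TC_CONV = (n/k) * 2^(m+k), in symbols per encoded bit."""
    return (n / k) * 2 ** (m + k)


def complexity_from_edge_symbols(total_edge_symbols: int, k: int) -> float:
    """Generic definition: edge symbols in one trellis module divided by k."""
    return total_edge_symbols / k


n, k, m = 3, 2, 2  # rate 2/3 code with m = 2 memory elements

tc_conv = conventional_trellis_complexity(n, k, m)

# Cross-check with the generic definition: the conventional module has
# 2^m states, 2^k branches per state, and n symbols per branch.
conv_edge_symbols = (2 ** m) * (2 ** k) * n
assert complexity_from_edge_symbols(conv_edge_symbols, k) == tc_conv

# Minimal trellis: 24 single-symbol branches in the module (value from the text).
tc_min = complexity_from_edge_symbols(24, k)

print(tc_conv, tc_min, tc_min / tc_conv)
```

Both routes give 24 symbols/bit for the conventional trellis, and the ratio TC_MIN / TC_CONV evaluates to one half.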
Figure 2. The (3,2,2,3) code trellis modules: (a) conventional trellis module; (b) minimal trellis module.
With this in mind, McEliece and Lin suggest in [5] that one could expect a VD implementation based on the minimal trellis (minimal VD) to present reduced complexity when compared to the conventional VD. In the particular case of the above example, according to [5] one could expect the minimal VD to consume half the power of the conventional VD, while achieving the same performance in terms of bit error rate (BER). In this paper, besides power consumption, hardware consumption and maximum operating frequency are also analyzed for each implemented VD. Our objective is to verify in practice the theoretical estimation made in [5].

III. VITERBI DECODER AND IMPLEMENTATION STRATEGIES
The VD estimates the most likely transmitted sequence out of all possible sequences. The decoder analyzes the received sequence, comparing the data with the expected trellis edge symbols, and storing the survivor paths. After a predetermined number of codewords have been processed, a traceback converts the stored paths into a decoded message.

The VD is essentially composed of three blocks [10-11]: the Hamming distance (HD) calculator, responsible for computing the HDs (the decoders operate in hard decision mode) between the received word and the expected words; the Add-Compare-Store (ACS) unit, responsible for adding the current metrics to the HDs, selecting the smallest sums, and storing the survivor paths and their metrics; and the Traceback (TB) unit, responsible for obtaining the most likely transmitted information sequence based on the paths stored in the ACSs.

In the implemented VDs, only one HD block is used in each decoder. Typically, the ACS blocks represent the nodes in the trellis module, each containing a metric memory and a survivor path memory. Though this is true for the conventional VD, the particularities of the minimal trellis module offer two different implementation methods for the minimal VD that especially affect the ACSs, as will be seen next. Lastly, one TB block, containing a copy of the survivor path memories, is used in each decoder.

Although all VDs are composed of the HD, ACS, and TB blocks, their construction varies according to the trellis structure. The different strategies used in the construction of the decoders are presented below.

A. Intermediate paths

As shown in Fig. 2(b), the minimal trellis module presents n = 3 phases, of which k = 2 phases contain decision nodes. Every time a decision takes place, a winner path is stored. It is possible to store all intermediate winner paths throughout the minimal trellis, though the ACS and TB memories would double in size (k = 2), with the TB having to process twice as much information when obtaining the decoded sequence. In order to eliminate excessive processing and reduce memory usage, the intermediate path memory was created to momentarily store the intermediate winner path and pass this value on to the definitive ACS survivor path memory when the complete codeword has been processed. With this, only 2·2^m = 8 additional memory elements are necessary to store the survivor paths, independently of the length of the received message.

With the above strategy, the typical TB block used in the conventional VD can also be used in the minimal VDs.

B. Multi- and single-ACSs

Since the minimal trellis is multi-phased, the minimal VD can be constructed with different ACS blocks for each phase (multi-ACSs strategy), where each ACS represents a node in the trellis module. As a second option, which reduces hardware consumption, the same ACS blocks could be used for every phase (single-ACSs strategy). However, this second strategy implies complex signal selection in the ACS blocks, since the signals that enter the ACSs vary from phase to phase. Thus, it would be necessary to multiplex these signals accordingly, creating more complex routing paths inside the PLD. Both of these strategies are compared in Sections IV and V.

C. Updating the HD block

The HD blocks were constructed seeking power reduction. This was obtained by updating the signals that enter the HD block only when the distances are to be calculated (to be used in the adders of the ACSs). Therefore, the encoded bits are serially stored and passed on to the HD block only when the codeword (or coded bit, for the minimal VD with multi-ACSs) is completely received.
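The HD-block update strategy of subsection C can be illustrated with a small behavioral model. This is a software sketch under assumptions, not the authors' VHDL: the class name `CodewordBuffer` and its attributes are hypothetical, and the model only shows that the value presented to the HD block changes once per complete codeword rather than on every received bit.

```python
class CodewordBuffer:
    """Serially collects coded bits; updates the HD input once per codeword."""

    def __init__(self, n: int = 3):
        self.n = n            # coded bits per codeword
        self._shift = []      # bits received so far for the current codeword
        self.hd_input = None  # value currently presented to the HD block

    def push(self, bit: int) -> bool:
        """Store one received bit; return True if hd_input was updated."""
        self._shift.append(bit & 1)
        if len(self._shift) == self.n:
            # Complete codeword: only now do the HD block inputs toggle.
            self.hd_input = tuple(self._shift)
            self._shift = []
            return True
        return False


buf = CodewordBuffer(n=3)
updates = [buf.push(b) for b in [1, 0, 1, 0, 1, 1]]
print(updates)       # one update per three received bits
print(buf.hd_input)  # last complete codeword
```

Holding `hd_input` constant between updates is what limits signal activity in the downstream logic, which is the mechanism the text relies on for reducing dynamic power.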
IV. VITERBI DECODER CIRCUITS
The implemented decoder circuits are discussed next.

A. The conventional VD

The conventional VD circuit is presented in Fig. 3. The HD calculator computes the distance between the received codeword and the 8 possible words, generating values from 0 (matching words) to 3 (completely different words) for hd0 to hd7 (2 bits each), as shown in Fig. 3(a). For each of the 2^m = 4 nodes, an ACS unit was created to add the current metrics to the HDs and select the winner path and metric. Each ACS unit is formed by 4 lock-on-overflow (LOV) adders and one 4-input comparator, along with survivor path and metric memories. The path memory stores the winner state's "name" (A, B, C, or D). After the predetermined number of codewords L has been received, the path memory is passed on to the TB. The decoding process occurs at the same rate as the clock signal, with k output bits being generated every n clock cycles. Since the same TB circuit was used in all VDs, it will not be presented here.
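A behavioral sketch of the HD block and of one conventional ACS node is given below. Several points are assumptions rather than facts from the paper: the LOV adder is modeled as a saturating adder on an assumed 3-bit metric, and the branch labels and metrics in the example are hypothetical, since the actual labels belong to Fig. 2(a).

```python
def hamming_distances(received: int, n: int = 3) -> list:
    """hd[w] = Hamming distance between the received word and word w."""
    return [bin(received ^ word).count("1") for word in range(2 ** n)]


def lov_add(metric: int, hd: int, width: int = 3) -> int:
    """Lock-on-overflow adder, modeled here as saturation (assumption)."""
    return min(metric + hd, 2 ** width - 1)


def acs(metrics, hds, names):
    """Add the HDs to the metrics and select the smallest sum and its origin."""
    sums = [lov_add(m, h) for m, h in zip(metrics, hds)]
    best = min(range(len(sums)), key=sums.__getitem__)
    return names[best], sums[best]


hd = hamming_distances(0b101)  # hd0..hd7 for received word 101

# Hypothetical node: current metrics of states A..D and the codewords
# labeling the four branches that enter the node.
metrics = [1, 2, 1, 7]
branch_words = [0b000, 0b101, 0b110, 0b011]
winner = acs(metrics, [hd[w] for w in branch_words], ["A", "B", "C", "D"])
print(hd)
print(winner)
```

In this example the branch from state D would overflow the 3-bit metric (7 + 2), so its sum locks at 7 instead of wrapping around and spuriously winning the comparison.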
Figure 3. The (a) HD block and (b) node A of the conventional VD.

B. The minimal VD with multi-ACSs

The more complex structure of the minimal VD with multi-ACSs is shown in Fig. 4. Since the minimal trellis module presents only one symbol per branch, the HD calculator was constructed using just an inverter (for hd1). Despite the simplicity of the HD, the irregular structure of the trellis demands more ACS units. As was seen in Fig. 2(b), the first phase of the minimal trellis does not present decision nodes, so the ACSs here consist simply of a LOV adder and a metric memory. The ACSs of the second and third phases, on the other hand, were constructed using two LOV adders and one 2-input comparator per node (two branches enter each node in these phases). Moreover, an intermediate path memory was introduced for each of the four intermediate nodes in the second phase in order to store the originating states' "names" and pass them on to the definitive path memory in the third phase. The four bottom nodes (E, F, G, and H) of the first phase of the trellis module are processed only in this specific phase, by ACSs identical to the one depicted in Fig. 4 for A′.

Figure 4. Nodes A′, A′′, and A of the minimal VD with multi-ACSs.

C. The minimal VD with single-ACSs

The minimal VD with single-ACSs can be seen in Fig. 5, where the signal selection for the ACSs was obtained with multiplexers and the same four top nodes of the minimal trellis module were used in all phases. Note that, since no comparison is necessary in the first phase but the structure of the ACSs is fixed, the signals selected to be compared in this phase guarantee that the unique paths to the nodes in the second phase are the winners. As with the other minimal VD, each ACS consists of two LOV adders and one 2-input comparator. Four ACS units operating only in the first phase (for nodes E, F, G, and H) were also used, as in the VD with multi-ACSs.

Figure 5. Node A of the minimal VD with single-ACSs.
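The per-bit processing of the minimal VDs can be sketched behaviorally as follows. This is a rough model under stated assumptions: the LOV adder is again treated as a saturating 3-bit adder, and the node connectivity, expected bits, and metrics in the example are hypothetical, because the real ones are defined by Fig. 2(b). The sketch only shows the one-bit HD (a wire plus an inverter), the 2-input decision, and how the origin name held in the intermediate path memory is forwarded to the definitive path memory.

```python
def bit_hds(r: int):
    """One-bit HD block: hd0 is the received bit, hd1 is its inverse."""
    return r, r ^ 1


def lov_add(metric: int, hd: int, width: int = 3) -> int:
    """Lock-on-overflow adder, modeled here as saturation (assumption)."""
    return min(metric + hd, 2 ** width - 1)


def decision_node(cand_a, cand_b):
    """2-input ACS; each candidate is (metric, branch HD, origin name)."""
    sum_a = lov_add(cand_a[0], cand_a[1])
    sum_b = lov_add(cand_b[0], cand_b[1])
    return (sum_a, cand_a[2]) if sum_a <= sum_b else (sum_b, cand_b[2])


# Hypothetical metrics after the first phase (adder only, no decision),
# each still tagged with the name of the state it originated from.
after_phase1 = {"A": 1, "B": 3, "C": 0, "D": 2}

# Second phase: received bit 1; winners go to the intermediate path memory.
hd0, hd1 = bit_hds(1)
x = decision_node((after_phase1["A"], hd0, "A"), (after_phase1["C"], hd1, "C"))
y = decision_node((after_phase1["B"], hd0, "B"), (after_phase1["D"], hd1, "D"))

# Third phase: received bit 0; the winning intermediate node's stored
# origin name is copied to the definitive survivor path memory.
hd0, hd1 = bit_hds(0)
final_metric, final_origin = decision_node((x[0], hd1, x[1]), (y[0], hd0, y[1]))
print(x, y, (final_metric, final_origin))
```

Only one origin name per node reaches the definitive memory for each codeword, which is why the traceback block of the conventional VD can be reused unchanged.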
Analyzing the three VD circuits, we can conclude that the conventional VD differs from the minimal VDs in the number of distances to calculate in the HD; in the slightly more complex adders (3+2 bits, versus 3+1 bits in the minimal VDs); and in the considerably more hardware-demanding comparators (the 4-input comparators are composed of three 2-input comparators). The main difference between the two minimal VDs is that the multi-ACSs VD requires more hardware for addition, comparison, and storage, while the single-ACSs VD requires more hardware for signal selection.

V. CIRCUIT PERFORMANCE ANALYSIS
For a fair comparison of performance between the VDs of Figs. 3-5, the same construction architecture and style (same types of adder and comparator circuits) were used to create the decoder circuits. The decoder circuits were created using VHDL [12-13] and implemented on five Altera PLDs. The power consumption was analyzed using Altera's PowerPlay Power Analyzer [14], which calculates the static, dynamic, and I/O power based on a simulation file. Dynamic power is consumed due to signal activity (device toggling), and is highly dependent on the operating frequency and capacitive loading of the circuit nodes, reflecting the power effectively consumed by the implemented circuit. The same simulation files were used for the three decoders, using a global clock set at 10 MHz. Terminated code sequences were randomly generated for decoding.

Table I presents the dynamic power consumption, Table II presents the total hardware (logic elements) utilization, and Table III presents the maximum operating frequency for each VD circuit, according to the selected device. The PLDs in the three tables are: MAX II EPM570F100C4 (CPLD), Cyclone II EP2C5AF256A7 (FPGA1), Cyclone III EP3C5E144A7 (FPGA2), Stratix II EP2S15F484C3 (FPGA3), and Stratix III EP3SE50F484C2 (FPGA4). The hardware consumption for the Stratix II and Stratix III FPGAs is given in terms of percentage of utilization.

TABLE I. DYNAMIC POWER CONSUMPTION RESULTS

Dynamic Power (mW)
Viterbi decoder           CPLD    FPGA1   FPGA2   FPGA3   FPGA4
Conventional              12.83   1.04    0.66    1.83    1.01
Minimal w/ multi-ACSs     9.09    0.78    0.52    1.68    0.81
Minimal w/ single-ACSs    12.12   1.05    0.67    2.05    1.14

TABLE II. HARDWARE UTILIZATION RESULTS

Total Logic Elements
Viterbi decoder           CPLD    FPGA1   FPGA2   FPGA3   FPGA4
Conventional              403     405     405     3%