LETTERS International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009
High Performance and Low Power VLSI Architecture of Viterbi Decoder using Asynchronous QDI Techniques
T. Kalavathi Devi¹, Dr. C. Venkatesh²
¹ Kongu Engineering College / Department of EEE, Erode, India. Email: [email protected]
² Surya Engineering College / Department of ECE, Erode, India. Email: cvkongu@gmail.com
Abstract— This paper discusses a VLSI architecture for a Viterbi decoder that uses low-power design techniques at the circuit level: asynchronous self-timed control and Differential Cascode Voltage Switch Logic (DCVSL). The asynchronous design is based on Precharge Half Buffer (PCHB) templates. The asynchronous Viterbi decoder comprises a Branch Metric Unit (BMU), a Path Metric Unit (PMU) and a Survivor Memory Unit (SMU). Communication between the decoder blocks is controlled by a Request-Acknowledge handshake pair, which signals that data is ready for processing. The units of the Viterbi decoder are designed in T-SPICE using TSMC 0.25 µm technology. The simulation results show that the asynchronous design technique achieves a 90% power reduction compared with the synchronous design, and a 52% power reduction compared with a Viterbi decoder design based on CMOS-Pseudo logic.
Based on the modified T-algorithm [9], a VLSI architecture for Viterbi decoders with reduced computation in the add-compare-select unit was developed. The VLSI architecture proposed in [2] for the Add-Compare-Select (ACS) operation of the Viterbi decoder reduced computational complexity by means of pipelining. Examples in the second group include the SPL (Single-ended Pass-transistor Logic) implementation [1], which showed that dynamic power dissipation in the ACS part of the Viterbi decoder could be reduced. In [5], an area-efficient, low-power and robust ACS unit for the Viterbi decoder was presented in both synchronous and asynchronous architectures. That architecture uses a hybrid CMOS-Pseudo NMOS technology to improve area and throughput. However, it addresses only the ACS unit and not the full decoder, and its asynchronous QDI template is the Precharge Full Buffer (PCFB). The design in [8] was based on serial, unary arithmetic for the manipulation and storage of metrics. References [6,7] describe the asynchronous design methodology proposed by Caltech, which uses quasi delay-insensitive (QDI) pipeline templates. Instead of a clocking strategy, the proposed method builds the architecture from two modules: the Weak-Conditioned Half Buffer (WCHB) and the Precharge Half Buffer (PCHB) [11].
Index Terms— Viterbi Decoder, Asynchronous techniques, Low power, Dual rail logic, T-SPICE
I. INTRODUCTION
The Viterbi algorithm [3,4,9] is widely used in digital transmission for decoding convolutional forward error correction codes. Viterbi decoders are used for trellis code demodulation in telephone-line modems. There, throughput is in the range of tens of kb/s, and power dissipation and chip area/cost are tightly limited. At the opposite end, very high-speed Viterbi detectors with high throughputs are used in magnetic disk drive read channels. Even at these speeds, area and power remain limited. Asynchronous design is becoming an alternative to synchronous design: asynchronous circuits also assume binary signals, but they are not controlled by a global clock. This paper presents a low-power asynchronous VLSI architecture for the Viterbi decoder that reduces power dissipation while increasing speed, achieved by adopting Differential Cascode Voltage Switch Logic. The architecture is implemented with both synchronous and asynchronous techniques, and the asynchronous design is simpler and more efficient. The decoder is designed for code rate r = 1/2 and constraint length K = 3. The Viterbi algorithm (VA) [4], widely used in digital communication, is known to be an efficient method for realizing maximum-likelihood decoding of convolutional codes.
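The paper fixes r = 1/2 and K = 3 but does not state the generator polynomials. The sketch below assumes the common (7, 5) octal pair for illustration only; `GENERATORS` and `conv_encode` are hypothetical names. It shows how each input bit produces two code bits. It also shows how the K - 1 = 2 memory bits yield the 2^(K-1) = 4 trellis states that the decoder tracks.

```python
from typing import List, Sequence, Tuple

# Assumed generator polynomials (7, 5) in octal; the paper does not specify them.
GENERATORS: Tuple[int, ...] = (0b111, 0b101)
K = 3  # constraint length, so the encoder has K - 1 = 2 memory bits (4 states)


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def conv_encode(bits: Sequence[int], flush: bool = True) -> List[int]:
    """Rate-1/2 feedforward convolutional encoder starting in the all-zero state.

    The state holds the previous K-1 input bits, most recent bit in the MSB.
    With flush=True, K-1 zero tail bits drive the encoder back to state 0.
    """
    data = list(bits) + ([0] * (K - 1) if flush else [])
    state = 0
    out: List[int] = []
    for u in data:
        window = (u << (K - 1)) | state  # current bit followed by the memory bits
        out.extend(parity(window & g) for g in GENERATORS)  # one code bit per generator
        state = window >> 1  # shift register advances by one position
    return out


if __name__ == "__main__":
    print(conv_encode([1, 0, 1, 1]))
```

Under these assumed generators, the message 1011 followed by two tail zeros encodes to 11 10 00 01 01 11.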
II. PROPOSED DESIGN
A. Proposed Asynchronous Viterbi Decoder
Fig. 1 shows the block diagram of the asynchronous Viterbi decoder.
[Figure 1 diagram: received input → Branch Metric Unit (BMU) → Path Metric Unit (PMU) → Survivor Memory Unit (SMU) → decoded output, with a Req/Ack handshake pair on each link]
Figure 1. Asynchronous Viterbi decoder.
In the asynchronous Viterbi decoder, the input received from the encoder is given to the branch metric unit. The precharge half buffer (PCHB) cell is internally quasi delay-insensitive, and the weak-conditioned half buffer (WCHB) acts as a delay element in the design. A Muller C-element ensures handshaking and the completion of each operation. At each stage, request and acknowledge signals confirm that the operation has completed. The circuits are implemented using DCVSL logic. For constraint length K = 3 and code rate r = 1/2, the decoder has four states, which requires 4-bit adders, comparators and selectors.
© 2009 ACADEMY PUBLISHER
B. Branch Metric Unit (BMU): The branch metric is the distance from the received code word to each possible branch word. In this case, the possible branch words are 00, 01, 10 and 11. The distance from branch word 11 to the received sequence is called b30 & b31, and the others are named similarly. Based on the 2-bit input sequence with respect to the present state, a precharge half buffer (PCHB) half adder is realized for each state, together with a delay element. The input bits b0 and b1 carry the true logic values, and their complements are b0b and b1b. The diagrams are the internal design files from T-SPICE.
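The decoder relies on request/acknowledge handshakes and Muller C-elements, but the text gives no protocol detail. The sketch below is a behavioural Python model. It assumes the four-phase (return-to-zero) protocol that PCHB and WCHB templates normally use. `MullerC` and `four_phase_transfer` are hypothetical names. Joining two acknowledges (for example, those of the two adders in one ACS module) is an illustrative use, not the paper's circuit.

```python
from typing import List, Tuple


class MullerC:
    """Behavioural two-input Muller C-element.

    The output copies the inputs when they agree and holds its previous
    value when they disagree, so it acts as a rendezvous for two events.
    """

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a
        return self.out


def four_phase_transfer(items: List[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Simulate a four-phase (return-to-zero) Req/Ack channel.

    Returns the values seen by the receiver and the (req, ack) trace.
    """
    received: List[int] = []
    trace: List[Tuple[int, int]] = []
    req = ack = 0
    for value in items:
        req = 1  # 1. sender drives data and raises Req
        trace.append((req, ack))
        received.append(value)  # 2. receiver latches the data and raises Ack
        ack = 1
        trace.append((req, ack))
        req = 0  # 3. sender sees Ack and lowers Req
        trace.append((req, ack))
        ack = 0  # 4. receiver lowers Ack; the channel is idle again
        trace.append((req, ack))
    return received, trace


if __name__ == "__main__":
    # Join the acknowledges of two parallel stages with one C-element.
    join = MullerC()
    for ack_a, ack_b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(ack_a, ack_b, "->", join.update(ack_a, ack_b))
    print(four_phase_transfer([5, 7]))
```

The C-element holds its output while the two acknowledges disagree. The joined acknowledge therefore rises only after both stages have finished, and falls only after both have reset.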
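One way to read the BMU description is as a hard-decision Hamming distance. Each received bit is XORed with the corresponding branch-word bit, and a half adder sums the two mismatch bits into a 2-bit metric. The sketch below assumes that reading. `dual_rail`, `half_adder` and `branch_metrics` are hypothetical helpers, and the text does not establish how the 2-bit result maps onto signal names such as b30/b31. The dual-rail pair mirrors the true/complement inputs (b0, b0b) mentioned above.

```python
from typing import Dict, Tuple

BRANCH_WORDS = ("00", "01", "10", "11")


def dual_rail(bit: int) -> Tuple[int, int]:
    """Encode one bit as (true rail, complement rail), e.g. (b0, b0b).

    (0, 0) would be the neutral spacer between tokens; (1, 1) is illegal.
    """
    return (1, 0) if bit else (0, 1)


def half_adder(x: int, y: int) -> Tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return x ^ y, x & y


def branch_metrics(b0: int, b1: int) -> Dict[str, int]:
    """Hamming distance from the received symbol (b0, b1) to every branch word.

    Each bit mismatch is an XOR; a half adder sums the two mismatch bits
    into a 2-bit metric (carry is the MSB, sum is the LSB).
    """
    metrics: Dict[str, int] = {}
    for word in BRANCH_WORDS:
        s, c = half_adder(b0 ^ int(word[0]), b1 ^ int(word[1]))
        metrics[word] = 2 * c + s
    return metrics


if __name__ == "__main__":
    print(dual_rail(0), dual_rail(1))
    print(branch_metrics(0, 1))
```

For the received symbol 01 used in the experiments, this gives metrics 1, 0, 2 and 1 for branch words 00, 01, 10 and 11.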
Figure 2. Hardware realization of the branch metric computation block.
C. Add Compare and Select Unit (ACS): Four ACS modules are designed, one for each state, and each state requires two adders. The T-SPICE design of the adder used in the ACS units is shown in Fig. 3. The two outputs of the branch metric unit, b00 and b01, are each given to one input of the two adders. The other input of each adder is the path metric, whose initial value is taken as zero. The lower bits to the adder, a00 and a01, are the path metrics (the previous branch metrics). The 4-bit asynchronous ripple-carry PCHB adder is constructed by rippling four 1-bit asynchronous full adders, and its delay is balanced with weak-conditioned half buffers. The output of the adder is given as input to the comparator. The comparator compares the 4-bit input signals b00, b01, a00 and a01, that is, the resulting path metrics, and produces a single-bit output (f1, f0). The asynchronous selector at the output of the comparator is a multiplexer that outputs the value with the minimum path metric, so the lesser path metric becomes the output of the ACS unit. Registers (buffers) are placed at the outputs.
Figure 3. Asynchronous 4-bit full adder.
D. Survivor Memory Unit (SMU): The SMU uses the register exchange approach, in which a register is assigned to each state. In the architecture, the inputs i0, i1, i2 and i3 are the four outputs of the ACS unit, and f1 and f0 are the signals from the comparator output of the ACS. The registers are configured as serial-in serial-out.
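The register exchange approach can be sketched as a complete hard-decision Viterbi decoding loop. The decoder keeps one survivor register per state. Each ACS decision copies the winning predecessor's register and appends the new bit, so no traceback is needed. This sketch assumes the (7, 5) octal generators, hard-decision inputs and an all-zero start state; none of these is stated in the paper, and `viterbi_register_exchange` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

GENERATORS = (0b111, 0b101)  # assumed (7, 5) octal code, K = 3
N_STATES = 4  # 2 ** (K - 1)


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def viterbi_register_exchange(rx: Sequence[int]) -> Tuple[List[int], int]:
    """Hard-decision Viterbi decoding with register-exchange survivor memory.

    Each state owns a survivor register holding its decoded bits. After every
    ACS step the register of the winning predecessor is copied and the new
    decision bit is appended, so the answer is read out without traceback.
    """
    metrics = [0] + [math.inf] * (N_STATES - 1)  # encoder starts in state 0
    regs: List[List[int]] = [[] for _ in range(N_STATES)]
    for t in range(0, len(rx) - 1, 2):
        r0, r1 = rx[t], rx[t + 1]
        new_metrics = [math.inf] * N_STATES
        new_regs: List[List[int]] = [[] for _ in range(N_STATES)]
        for s in range(N_STATES):
            if metrics[s] == math.inf:
                continue  # state not reachable yet
            for u in (0, 1):
                window = (u << 2) | s
                c0, c1 = (parity(window & g) for g in GENERATORS)
                bm = (r0 ^ c0) + (r1 ^ c1)  # BMU: Hamming distance
                ns = window >> 1  # next state
                cand = metrics[s] + bm  # ACS: add
                if cand < new_metrics[ns]:  # ACS: compare and select
                    new_metrics[ns] = cand
                    new_regs[ns] = regs[s] + [u]  # SMU: register exchange
        metrics, regs = new_metrics, new_regs
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return regs[best], metrics[best]


if __name__ == "__main__":
    tx = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1]  # 1011 + two tail zeros, assumed code
    rx = list(tx)
    rx[3] ^= 1  # inject a single bit error
    print(viterbi_register_exchange(rx))
```

Decoding the example sequence with one flipped bit still recovers 1011 plus the two tail zeros, with a final path metric of 1. The decoded bits are already sitting in the best state's register, which is why register exchange avoids a traceback pass. The price is copying whole registers at every step.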
Figure 4. Asynchronous 4-bit SMU.
Since the register exchange (RE) method does not need traceback, it is faster. The asynchronous unit shown in Fig. 4 is constructed from WCHB cells and a DCVS-logic-based 2:1 multiplexer. The SMU stores the least value among its four inputs.
III. EXPERIMENTAL RESULTS AND DISCUSSION
Both the synchronous and the proposed asynchronous Viterbi decoders are designed, and the analysis is based on speed, area and power consumption. The functionality of the Viterbi decoder is simulated using T-SPICE in TSMC 0.25 µm CMOS technology. The input to the decoder is the output of the encoder, the sequence 011011. The input to the BMU is 01 (bit b0 is 0 and bit b1 is 1), which is processed by the half adder realized for each state. In the ACS unit, the upper adder has the value 00 and the lower adder has the value 10. After addition, the comparator compares the resulting path metrics, 0 and 2, and the input signal of the lower value, 0, is stored in the SMU. At each time instant, the history of the preceding states with the smallest path metric is maintained. The final decoded value, 011011, is shown in Fig. 5. The decoder shows 54% higher performance in detecting and correcting errors than the existing design based on CMOS-Pseudo logic [5].
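The ACS step in this walkthrough can be reproduced at bit level. In the sketch below, a 4-bit ripple-carry adder is built from four chained full adders, as in the ACS description, and is followed by a comparator and a 2:1 selector. The helper names are hypothetical, ties keep the upper candidate, and the carry-out is simply reported. A hardware ACS would also need some form of path-metric normalization to avoid overflow, which the paper does not describe.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add4(x: int, y: int) -> Tuple[int, int]:
    """4-bit ripple-carry adder built from four chained full adders.

    Returns the 4-bit sum and the final carry-out (overflow indicator).
    """
    carry = 0
    total = 0
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


def acs(pm_upper: int, bm_upper: int, pm_lower: int, bm_lower: int) -> Tuple[int, int]:
    """Add-compare-select for one state.

    Returns (decision, survivor): decision 0 keeps the upper candidate,
    1 keeps the lower one; ties keep the upper candidate.
    """
    upper, _ = ripple_add4(pm_upper, bm_upper)  # add
    lower, _ = ripple_add4(pm_lower, bm_lower)
    decision = 1 if lower < upper else 0  # compare
    survivor = lower if decision else upper  # select (2:1 multiplexer)
    return decision, survivor


if __name__ == "__main__":
    # Walkthrough values: initial path metrics 0, branch metrics 00 and 10.
    print(acs(0, 0b00, 0, 0b10))
```

With initial path metrics of zero and branch metrics 00 and 10, the candidates are 0 and 2. The selector keeps 0, matching the walkthrough.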
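The headline figures reported for Table I can be cross-checked with simple arithmetic: 20.4 mW versus 1.73 mW average power, 9215 versus 15702 transistors, and 425 Mbit/s. The energy-per-bit line is a derived quantity. It assumes the 1.73 mW average was measured while running at 425 Mbit/s, which the paper does not state.

```python
# Values transcribed from Table I and the results text.
SYNC_POWER_W = 20.4e-3
ASYNC_POWER_W = 1.73e-3
SYNC_TRANSISTORS = 9215
ASYNC_TRANSISTORS = 15702
ASYNC_THROUGHPUT_BPS = 425e6

power_reduction = 1 - ASYNC_POWER_W / SYNC_POWER_W
transistor_increase = ASYNC_TRANSISTORS / SYNC_TRANSISTORS - 1
# Derived, assuming the average power was measured at the quoted throughput.
energy_per_bit_j = ASYNC_POWER_W / ASYNC_THROUGHPUT_BPS

if __name__ == "__main__":
    print(f"power reduction:     {power_reduction:.1%}")
    print(f"transistor increase: {transistor_increase:.1%}")
    print(f"energy per bit:      {energy_per_bit_j * 1e12:.2f} pJ")
```

The roughly 91.5% reduction is consistent with the 90% quoted in the abstract and conclusions. The roughly 70% transistor increase matches the "0.7 times" figure.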
Figure 5. Output of the Viterbi decoder.
The internal blocks are designed and integrated to obtain the overall performance of the Viterbi decoder in TSMC 0.25 µm CMOS technology. Table I compares the synchronous and asynchronous designs of the Viterbi decoder. The results show that the asynchronous circuit has a speed of 425 Mbit/s and consumes 1.73 mW of average power. The synchronous design, by comparison, consumes an average power of 20.4 mW.

TABLE I. PERFORMANCE COMPARISON OF SYNCHRONOUS AND ASYNCHRONOUS VITERBI DECODER

Module Name                | Synchronous: Transistors | Synchronous: Total Power | Asynchronous: Transistors | Asynchronous: Total Power
Branch Metric Unit (BMU)   | 583                      | 31.181 mW                | 806                       | 24.56 mW
Path Metric Unit (PMU)     | 1834                     | 140 mW                   | 3810                      | 91 mW
Survivor Memory Unit (SMU) | 113                      | 13 nW                    | 116                       | 8 nW
Viterbi Decoder            | 9215                     | 20.4 mW                  | 15702                     | 1.73 mW

CONCLUSIONS
Viterbi decoders employed in digital mobile communications are complex to implement and dissipate a large amount of power. The proposed Viterbi decoder uses asynchronous design techniques to reduce power consumption. The asynchronous design is based on the quasi delay-insensitive (QDI) timing model, which can be used for robust, low-power applications. The asynchronous circuit design uses DCVSL logic. The reference is the synchronous Viterbi decoder with code rate 1/2 and constraint length K = 3 in TSMC 0.25 µm CMOS technology with a 2.5 V power supply. Relative to it, the simulation results show that the asynchronous design reduces power consumption by 90%, at the cost of a 0.7-times (about 70%) increase in transistor count.

REFERENCES
[1] I. Bogdan, M. Munteanu, P. A. Ivey, N. L. Seed, and N. Powell, "Power Reduction Techniques for a Viterbi Decoder Implementation," ESPLD 2000 (European Low Power Initiative for Electronic System Design) Third International Workshop, Rapallo, Italy, ISBN 90-5326036-6, pp. 28-48, July 2000.
[2] C.-Y. Tsui, R. S. Cheng, and K. Ling, "Low power ACS unit design for the Viterbi decoder," Proceedings of the IEEE Symposium on Circuits and Systems, pp. 137-140, 1999.
[3] G. Fettweis and H. Meyr, "High-speed parallel Viterbi decoding: Algorithm and VLSI-Architecture," IEEE Communications Magazine, pp. 46-55, 1991.
[4] G. Forney, "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[5] M. K. Akbari, A. Jahanian, M. Naderi, and B. Javadi, "Area Efficient, Low Power and Robust Design for Add-Compare-Select Units," Proceedings of the EUROMICRO Symposium on Digital System Design (DSD'04), IEEE, 0-7695-2203-3/04, 2004.
[6] J. Sparso, "Asynchronous Circuit Design: A Tutorial," Technical University of Denmark, 2006.
[7] R. O. Ozdag and P. A. Beerel, "A Channel Based Asynchronous Low Power High Performance Standard-Cell Based Sequential Decoder Implemented with QDI Templates," Proceedings of the 10th IEEE International Symposium on Asynchronous Circuits and Systems, 2004.
[8] P. A. Riocreux, E. M. Brackenbury, M. Cumpstey, and S. B. Furber, "A Low-Power Self-Timed Viterbi Decoder," Proceedings of the 7th International Symposium on Asynchronous Circuits and Systems, 2001.
[9] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. IT-13, no. 2, pp. 260-269, 1967.
[10] W.-S. Ju, M.-D. Shieh, and M.-H. Sheu, "A Low-Power VLSI Architecture for the Viterbi Decoder," IEEE Proceedings, 1997.
[11] Wuu and Sarma B. K. Vrudhula, "A design of a fast and area efficient multi-input Muller C-element," IEEE Transactions on VLSI Systems, pp. 215-219, 1993.