High Performance and Low Power VLSI Architecture ...

LETTERS International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009

High Performance and Low Power VLSI Architecture of Viterbi Decoder using Asynchronous QDI Techniques

T. Kalavathi Devi (1), Dr. C. Venkatesh (2)

(1) Kongu Engineering College / Department of EEE, Erode, India. Email: [email protected]
(2) Surya Engineering College / Department of ECE, Erode, India. Email: cvkongu@gmail.com

Abstract— This paper discusses a VLSI architecture for a Viterbi decoder that uses low-power design techniques at the circuit level, with asynchronous self-timed control and Differential Cascode Voltage Switch Logic (DCVSL). The asynchronous design is based on Pre-Charge Half Buffer (PCHB) templates. The asynchronous Viterbi decoder comprises a Branch Metric Unit (BMU), a Path Metric Unit (PMU) and a Survivor Memory Unit (SMU). Communication between the decoder blocks is controlled by a Request-Acknowledge handshake pair, which signals that data is ready for processing. The units of the Viterbi decoder are designed in T-SPICE using TSMC 0.25 µm technology. The simulation results show a 90% power reduction for the asynchronous design compared with the synchronous design, and a 52% power reduction compared with a Viterbi decoder design based on CMOS-Pseudo logic.

Based on the modified T-algorithm [9], a VLSI architecture for Viterbi decoders with reduced computation in the add-compare-select unit was developed. A VLSI architecture was proposed in [2] for the Add-Compare-Select (ACS) operation in the Viterbi decoder, which reduced the complexity of the computation by means of pipelining techniques. Examples in the second group include an SPL (Single-ended Pass-transistor Logic) implementation [1], which showed that dynamic power dissipation in the ACS parts of the Viterbi decoder was reduced. In [5], an area-efficient, low-power and robust ACS unit for the Viterbi decoder was presented in both synchronous and asynchronous architectures. That architecture uses a hybrid CMOS-Pseudo NMOS technology to improve area and throughput. The work concentrated only on the ACS unit and not on the full decoder, and the asynchronous QDI template used was the Pre-Charge Full Buffer (PCFB). The design in [8] was based on serial, unary arithmetic for the manipulation and storage of metrics. The asynchronous design methodology of [6,7], proposed by Caltech, uses quasi delay insensitive (QDI) pipeline templates. In the proposed method, the two templates used in place of clocking strategies are the Weak-Conditioned Half Buffer (WCHB) and the Pre-Charge Half Buffer (PCHB) [11].

Index Terms— Viterbi Decoder, Asynchronous techniques, Low power, Dual rail logic, T-SPICE

I. INTRODUCTION

The Viterbi algorithm (VA) [3,4,9] is widely used in digital transmission as a means of decoding convolutional forward error correction codes, and is known to be an efficient method for realizing maximum likelihood decoding of convolutional codes [4]. Viterbi decoders are used for trellis code demodulation in telephone line modems, where the throughput is in the range of tens of kb/s and there are restrictive limits on power dissipation and on the area/cost of the chip. At the opposite end, very high speed Viterbi detectors are used in magnetic disk drive read channels, which demand high throughput; even at these speeds, area and power remain constrained. Asynchronous design is becoming an alternative to synchronous design; like synchronous circuits, asynchronous circuits assume binary signals. This paper presents a low-power asynchronous VLSI architecture for a Viterbi decoder that reduces power dissipation while increasing speed, which is achieved by adopting Differential Cascode Voltage Switch Logic. The architecture is implemented using both synchronous and asynchronous techniques; the latter design is simpler and more efficient. The designed decoder has a code rate of r = 1/2 and a constraint length of K = 3.
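As a concrete illustration of these code parameters, the sketch below models a rate-1/2, K = 3 convolutional encoder in software. The generator polynomials (7 and 5 in octal) are an assumption, since the paper does not state which generators were used, and the names `G`, `step` and `encode` are hypothetical.

```python
K = 3                       # constraint length
NUM_STATES = 1 << (K - 1)   # 2^(K-1) = 4 trellis states
G = (0b111, 0b101)          # ASSUMED generators (octal 7, 5); not given in the paper


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def step(state, bit):
    """Shift one input bit into the encoder.

    state holds the previous K-1 input bits, newest bit in the MSB.
    Returns (next_state, (out0, out1)).
    """
    reg = (bit << (K - 1)) | state            # K-bit window: new bit plus state
    out = tuple(parity(reg & g) for g in G)   # two output bits per input bit
    return reg >> 1, out


def encode(bits):
    """Encode a bit sequence, starting from the all-zero state."""
    state, symbols = 0, []
    for b in bits:
        state, out = step(state, b)
        symbols.append(out)
    return symbols
```

Each input bit produces two output bits (rate 1/2), and the two stored bits give the four states that the decoder has to track.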

II. Proposed Design

A. Proposed Asynchronous Viterbi Decoder
Fig. 1 shows the block diagram of the asynchronous Viterbi decoder.

[Figure 1 block diagram: Received Input, Branch Metric Unit (BMU), Path Metric Unit (PMU), Survivor Memory Unit (SMU) and Decoded Output in sequence, with Req and Ack signals between the blocks]
Figure 1. Asynchronous Viterbi Decoder

In the asynchronous Viterbi decoder, the received input from the encoder is given to the branch metric unit. The Pre-Charge Half Buffer (PCHB) cell is internally quasi delay insensitive. The Weak-Conditioned Half Buffer (WCHB) acts as a delay element in the design. Handshaking and completion of each operation are ensured by Muller C-elements. At each stage, request and acknowledge signals are provided in order to ensure the completion of the operation. For a constraint length of K = 3 and a code rate of r = 1/2, the decoder has four states and requires a 4-bit adder, comparator and selector.

© 2009 ACADEMY PUBLISHER

B. Branch Metric Unit (BMU):
The branch metric is the distance from the received code word to each of the possible branch words. In this case, the possible branch words are 00, 01, 10 and 11. The distance from branch word 11 to the received sequence is called b30 & b31, and the others are named similarly. Based on the 2-bit input sequence with respect to the present state, a Pre-Charge Half Buffer (PCHB) half adder with a delay element is realized for each state. The input bits b0 and b1 carry the true logic values, and b0b and b1b carry their complements. The diagrams are taken from the internal T-SPICE design files.
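The branch metric computation can be sketched behaviourally as follows. This is a software model only, not the PCHB circuit: it assumes the distance is the Hamming distance, and the names `BRANCH_WORDS`, `half_adder` and `branch_metrics`, as well as the (carry, sum) bit ordering, are assumptions made for illustration.

```python
BRANCH_WORDS = ((0, 0), (0, 1), (1, 0), (1, 1))


def half_adder(x, y):
    """Return (carry, sum) for two single bits."""
    return x & y, x ^ y


def branch_metrics(b0, b1):
    """Hamming distance from the received bits (b0, b1) to each branch word.

    Each distance is returned as a 2-bit pair (carry, sum), in the order of
    BRANCH_WORDS. The pair ordering is an assumption, not taken from the paper.
    """
    metrics = []
    for w0, w1 in BRANCH_WORDS:
        d0 = b0 ^ w0   # 1 if the first bit differs from the branch word
        d1 = b1 ^ w1   # 1 if the second bit differs from the branch word
        metrics.append(half_adder(d0, d1))
    return metrics


# Received word 01: distances to 00, 01, 10, 11 are 1, 0, 2, 1.
example = branch_metrics(0, 1)
```

The half adder sums the two per-bit mismatches, so a 2-bit result is enough to represent the possible distances 0, 1 and 2.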

Figure 2. Hardware realization of Branch Metric Computation Block

C. Add Compare and Select Unit (ACS):
The hardware realization (T-SPICE design) of the adder used in the ACS units is shown in Fig. 3. Four ACS modules are designed, one for each state, and each state requires two adders. Two inputs from the branch metric unit, b00 and b01, are given to one of the two inputs of each adder. The other input of the adder is the path metric, whose initial value is taken as zero. The 4-bit asynchronous ripple carry PCHB adder is constructed by rippling four 1-bit asynchronous full adders, and delay is balanced with weak-conditioned half buffers. The lower bits to the adder, a00 and a01, are the path metrics (previous branch metrics). The output of the adder is given as input to the comparator, which compares the resulting path metrics formed from the 4-bit input signals b00, b01, a00 and a01 and produces a single-bit output, f1, f0. The lesser path metric is the output of the ACS unit. The asynchronous selector at the output of the comparator is a multiplexer, which outputs the value with the minimum path metric. Registers (buffers) are placed at the outputs. The circuits are implemented using DCVSL logic.

Figure 3. Asynchronous 4 bit Full Adder

D. Survivor Memory Unit (SMU):
In this design the Register Exchange (RE) approach is used, whereby a register is assigned to each state. In the architecture, the inputs i0, i1, i2 and i3 are the four outputs of the ACS unit, and f1 and f0 are the signals from the comparator output of the ACS; the registers are configured as Serial In Serial Out.
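The register exchange idea can be sketched in a few lines. This is a behavioural model rather than the paper's WCHB/DCVSL circuit; the function name `register_exchange_step`, its arguments `survivors` and `bits`, and the example values are all hypothetical.

```python
def register_exchange_step(registers, survivors, bits):
    """Perform one register-exchange update for all states.

    registers: list of bit lists, one register per state.
    survivors: survivors[s] is the predecessor state selected for state s.
    bits:      bits[s] is the decoded bit appended when entering state s.
    Returns a new list of registers; the input list is not modified.
    """
    return [registers[survivors[s]] + [bits[s]] for s in range(len(registers))]


# Illustrative values for a four-state decoder (assumed, not from the paper).
registers = [[0], [1], [1], [0]]
survivors = [1, 2, 0, 3]
bits = [0, 0, 1, 1]
updated = register_exchange_step(registers, survivors, bits)
```

Each register is overwritten with the contents of its surviving predecessor plus one new bit, so every register always holds the complete decoded history for its state.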

Figure 4. Asynchronous 4 bit SMU

Since the RE method does not need tracing back, it is faster. The asynchronous unit shown in Fig. 4 is constructed from WCHB cells and a DCVS logic based 2:1 multiplexer. The SMU stores the least value of the 4 bits.

III. Experimental Results and Discussion
Both the synchronous and the proposed asynchronous Viterbi decoders are designed. The analysis is based on speed, area and power consumption. The functionality of the Viterbi decoder is simulated using T-SPICE in TSMC 0.25 µm CMOS technology. The input to the decoder is the output of the encoder for the sequence 011011. The inputs to the BMU are 01: bit b0 is 0 and bit b1 is 1, and the output is that of the half adder realized for each state. In the ACS unit, the upper adder has the value 00 and the lower adder has 10. The values are added and the resulting path metrics, 0 and 2, are compared; the input signal of the lower value, 0, is stored in the SMU. A history of the preceding states with the smallest path metric is maintained at each time instant. The final decoded value, 011011, is shown in Fig. 5. The decoder has 54% increased performance for detecting and correcting errors compared with the existing design based on CMOS Pseudo logic [5].
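The ACS step in this example can be checked with a small behavioural sketch. It assumes a 4-bit adder that wraps on overflow and a tie-break in favour of the upper path; neither detail is stated in the paper, and the name `add_compare_select` and its parameters are hypothetical.

```python
def add_compare_select(pm_upper, bm_upper, pm_lower, bm_lower, width=4):
    """Add, compare and select for one state.

    Returns (selected_metric, decision), where decision is 0 if the upper
    path survives and 1 if the lower path survives. Ties go to the upper
    path (an arbitrary, assumed choice).
    """
    mask = (1 << width) - 1
    upper = (pm_upper + bm_upper) & mask   # 4-bit add, wraps on overflow (assumed)
    lower = (pm_lower + bm_lower) & mask
    if upper <= lower:
        return upper, 0
    return lower, 1


# Values from the example: path metrics start at zero, and the branch
# metrics are 00 (upper adder) and 10 (lower adder).
metric, decision = add_compare_select(0, 0b00, 0, 0b10)
```

With these inputs the candidate path metrics are 0 and 2, and the upper path with metric 0 is selected, which matches the values described in the text.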

Figure 5. Output of Viterbi decoder

The internal blocks are designed and integrated to obtain the overall performance of the Viterbi decoder in TSMC 0.25 µm CMOS technology. Table I shows the comparison of the synchronous and asynchronous designs of the Viterbi decoder. The results show that the asynchronous circuit has a speed of 425 Mbit/s and consumes 1.73 mW of average power, compared with the synchronous design, which consumes an average power of 20.4 mW.

TABLE I. Performance comparison of Synchronous and Asynchronous Viterbi Decoder

                              Synchronous Design           Asynchronous Design
Module Name                   Transistors   Total Power    Transistors   Total Power
Branch Metric Unit (BMU)      583           31.181 mW      806           24.56 mW
Path Metric Unit (PMU)        1834          140 mW         3810          91 mW
Survivor Memory Unit (SMU)    113           13 nW          116           8 nW
Viterbi Decoder               9215          20.4 mW        15702         1.73 mW

Conclusions
Viterbi decoders employed in digital mobile communications are complex to implement and dissipate considerable power. The proposed Viterbi decoder uses asynchronous design techniques to reduce power consumption. The asynchronous design is based on the Quasi Delay Insensitive (QDI) timing model, which can be used for robust and low-power applications. The asynchronous circuit design uses DCVSL logic. The simulation results show that the asynchronous design reduces power consumption by 90%, with an increase in transistor count of 0.7 times, relative to the synchronous Viterbi decoder with a code rate of 1/2 and a constraint length of K = 3 in TSMC 0.25 µm CMOS technology with a power supply of 2.5 V.

REFERENCES
[1] Bogdan I., Mumunteanu M., Ivey P.A., Seed N.L., and Powell N., "Power Reduction Techniques for a Viterbi Decoder Implementation", ESPLD 2000 (European Low Power Initiative for Electronic System Design) Third International Workshop, Rapallo, Italy, ISBN 90-5326036-6, pp. 28-48, July 2000.
[2] Chi-Ying Tsui, Cheng R.S., and K. Ling, "Low power ACS unit design for the Viterbi decoder", Proceedings of the IEEE Symposium on Circuits and Systems, pp. 137-140, 1999.
[3] Fettweis G. and Meyr H., "High-speed parallel Viterbi decoding: Algorithm and VLSI-Architecture", IEEE Communications Magazine, pp. 46-55, 1991.
[4] Forney G., "The Viterbi algorithm", Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.
[5] Mohammad K. Akbari, Ali Jahanian, Mohsen Naderi, and Bahman Javadi, "Area Efficient, Low Power and Robust Design for Add-Compare-Select Units", Proceedings of the EUROMICRO Systems on Digital System Design (DSD'04), 0-7695-2203-3/04, IEEE, 2004.
[6] Jens Sparso, "Asynchronous Circuit Design: A Tutorial", Technical University of Denmark, 2006.
[7] Recep O. Ozdag and Peter A. Beerel, "A Channel Based Asynchronous Low Power High Performance Standard-Cell Based Sequential Decoder Implemented with QDI Templates", IEEE Proceedings of the 10th International Symposium on Asynchronous Circuits and Systems, 2004.
[8] Riocreux P.A., Brackenbury E.M., Cumpstey M., and Furber S.B., "A Low-Power Self-Timed Viterbi Decoder", Proceedings of the 7th International Symposium on Asynchronous Circuits and Systems, 2001.
[9] Viterbi A., "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm", IEEE Transactions on Information Theory, vol. IT-13, no. 2, pp. 260-269, 1967.
[10] Wann-Shyang Ju, Ming-Der Shieh, and Ming-Hwa Sheu, "A Low-Power VLSI Architecture for the Viterbi Decoder", IEEE proceedings, 1997.
[11] Wuu and Sarma B. K. Vrudhula, "A design of a fast and area efficient multi-input Muller C-element", IEEE Transactions on VLSI Systems, pp. 215-219, 1993.
