To appear in Proceedings of the 1999 Military Communications Conference MILCOM99, Nov. 1999.
Performance Optimization of VLSI Transceivers for Low-Energy Communications Systems

Andrew P. Worthen, Sangjin Hong, Riten Gupta, and Wayne E. Stark
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor, MI 48109
Abstract | Design of low-energy communications systems requires attention to power consumption in the overall system design and the algorithm implementations. We consider design of a communications system incorporating digital transmission and receiver filters, and error-control coding. Various well-known channel codes including turbo codes, block codes, and convolutional codes are studied. For each of these decoders, its decoding performance and power consumption are evaluated from VLSI chip designs. We then demonstrate optimization of a simple design from a total system perspective.

Keywords | Low-energy system design, low-power decoders

This research was supported by the Department of Defense Research & Engineering (DDR&E) Multidisciplinary University Research Initiative (MURI) on "Low-Energy Electronics Design for Mobile Platforms" and managed by the Army Research Office (ARO) under grant DAAH04-96-1-0377. A. Worthen was supported by a National Science Foundation Graduate Research Fellowship.

I. Introduction

In wireless mobile radio communications, energy storage and size of the portable terminals are at a premium. Because of this, low power design is essential to achieving performance requirements. Portable terminals are expected to be used more frequently and for longer sessions in the future, and hence power consumption will become even more important than it is now.

In a typical wireless communication system, most of the power required for transmitting is dissipated by the transmitter power amplifier. Furthermore, this power is typically much larger than the power consumed by the receiver. If the required transmitter power could be reduced by even a small factor, substantial increases in battery life would be possible. Forward error-control coding (FEC), already widely used in digital wireless communications, can reduce the required transmitter power by achieving the desired performance at a lower received signal-to-noise ratio (SNR). A wide variety of error-control coding schemes are available to the system designer. The well-known Hamming, BCH, Hadamard, and Golay block codes [1, 2] offer reasonably simple codes with good performance for short block lengths. Convolutional codes [3, 4] are very widely used. More recently, parallel concatenated convolutional codes, or turbo codes, have been shown to have performance close to the Shannon limit [5, 6] with a realizable sub-optimum decoder.

Unfortunately, more powerful error-control codes typically require more complex decoders. (The encoder complexity is generally very small.) This leads to a power vs. performance trade-off where improvements in the required SNR, and thus the transmitter power, come at the cost of increased power consumption at the receiver. However, the interaction between code performance, decoder design, and power consumption is not well understood. This work compares decoder performance vs. power consumption trade-offs for several common codes and shows how we can optimize an example communication system with respect to energy consumption. Section II describes the system model under consideration, Section III outlines some principles and techniques for low-energy design, Section IV presents our simulation results, and Section V concludes.

II. System Model

In this paper, a simplified communication system model is adopted as shown in Fig. 1. At the transmitter, data bits generated by the source are passed through the channel encoder, block interleaved, modulated using root-raised-cosine filtered binary phase shift keying (RC-BPSK), passed through a model of the non-linear RF power amplifier (PA), and transmitted across the channel. At the receiver, the signal is filtered with a digitally-implemented matched filter, block deinterleaved, and passed through the channel decoder. The estimated data bits are passed to an error counter which determines the bit and packet error rates for the system. As indicated, only the power consumed by the power amplifier $P_{amp}$, receiver matched filter $P_{mf}$, and FEC decoder $P_{dec}$ is considered as this makes up most of the power consumed by the system.

[Fig. 1. Evaluation System Model. Transmit chain: source, encode, interleave, RC-BPSK, PA ($P_{amp}$), channel. Receive chain: matched filter ($P_{mf}$), deinterleave, decode ($P_{dec}$), error counter.]

In order to model the connection between transmitter and receiver consumed energy, we consider a two-way packet communication system where each node has both transmit and receive capability. Let $\alpha$ denote the fraction of the total packets handled by a particular node that are transmitted, so $(1 - \alpha)$ is the fraction of packets which are
received. Thus, the average energy consumed per packet (either transmitted or received) is

$$E_{average} = \alpha E_{amp} + (1 - \alpha)(E_{mf} + E_{dec}) \qquad (1)$$

where $E_{amp}$, $E_{mf}$, and $E_{dec}$ are processing energies consumed by the transmitter amplifier, the receiver filter, and the decoder respectively. Thus, $\alpha = 0.5$ corresponds to a scenario where nodes send and receive the same volume of data, while $\alpha = 0.125$ corresponds to a scenario where for every seven packets received only one is sent.

A. Transmitter Amplifier Model

Power amplifiers are most efficient when they are operated in their non-linear region. For non-constant envelope signals such as RC-BPSK, this causes signal distortion. We assume that the amplifier design is fixed and the input signal amplitude may be varied. Therefore, transmitting a small signal level yields low amplifier efficiency, but also very little distortion. We model the amplifier distortion by a baseband memoryless non-linearity determined by detailed performance simulations of a real power amplifier design as described in [7]. The average power consumption is calculated from simulations incorporating a detailed model of the amplifier circuit and the characteristics of the RC-BPSK signal.

B. Channel Model

We evaluate the performance of codes and decoders using a simple additive white Gaussian noise (AWGN) channel model. The channel model assumes that the transmitted signal is corrupted by addition of a white Gaussian noise process with two-sided power spectral density $N_0/2$.

For total system performance evaluation and optimization, we use an AWGN channel model with propagation loss. Thus, the received signal is an attenuated version of the transmitted signal plus additive white Gaussian noise. The noise power spectral density ($N_0$) was taken to be 4.96e-16 W/Hz corresponding to a noise temperature of 290 Kelvin and a 3 dB noise figure. The received signal power $P_r$ was computed as

$$P_r = P_t \frac{h_t^2 h_r^2}{d^4} \qquad (2)$$

where $P_t$ is the transmitted power, $h_t$ is the transmitter antenna height, $h_r$ is the receiver antenna height, and $d$ is the distance between the transmitter and receiver. $h_t$ and $h_r$ were taken to be 1 meter and the antenna gains and efficiencies were assumed to be one.

C. Receiver Filter Model

The receiver filter accurately models the effects of using a finite-length, 4-times oversampled, root-raised-cosine receiver filter implemented as a tapped delay line with fixed point arithmetic. The number of bits used to represent coefficients and data was an adjustable parameter. Power consumption for this component was modeled using counts of additions and multiplications required per packet and estimates of the energy required per operation as a function of the word width as described below.

D. Decoder Model

The decoder models include the effects of quantization in the decoder implementation to give realistic behavior. The decoder power consumption was determined as a function of the word width by designing VLSI implementations of the decoders and employing power consumption estimation tools.
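As a rough illustration of how equations (1) and (2) of Section II combine, the following Python sketch evaluates the received power and the weighted per-packet energy. The function names and all numeric inputs in the usage lines are hypothetical choices made for this example; they are not values or code from the paper.

```python
def received_power(p_t, d, h_t=1.0, h_r=1.0):
    """Equation (2): P_r = P_t * h_t^2 * h_r^2 / d^4, with unit antenna gains.

    Heights default to 1 meter, matching the values stated in Section II-B.
    """
    if d <= 0:
        raise ValueError("distance d must be positive")
    return p_t * (h_t ** 2) * (h_r ** 2) / d ** 4


def average_packet_energy(alpha, e_amp, e_mf, e_dec):
    """Equation (1): E_average = alpha*E_amp + (1 - alpha)*(E_mf + E_dec)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is a fraction of packets and must lie in [0, 1]")
    return alpha * e_amp + (1.0 - alpha) * (e_mf + e_dec)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formulas.
    p_r = received_power(p_t=0.5, d=200.0)
    e_avg = average_packet_energy(alpha=0.125, e_amp=8e-4, e_mf=5e-5, e_dec=2e-4)
    print(f"P_r = {p_r:.3e} W, E_average = {e_avg:.3e} J")
```

Because of the $d^4$ term in (2), doubling the distance in this model reduces the received power by a factor of 16, which is why small changes in $d$ matter so much for the transmitter energy budget.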
III. Design of Low-Energy VLSI Systems
Power efficient system design requires attention to implementation of algorithms and functions as well as proper selection of system level components such as error-control coding and modulation schemes. For low-energy system designs, we need to know what performance vs. power trade-offs are available for different error-control codes and other components. Subsequently, we must consider what combinations of components give optimum performance. For error-control codes, we concentrate on the decoders because these consume much more power than the encoder. For a given error-control code, tremendous savings in power consumption can be attained both through algorithm reformulation and architectural innovations specifically targeted for energy conservation.

A. Decoder Design Optimization

At the architectural and circuit levels, the main contribution to power consumption in complementary metal oxide semiconductor (CMOS) circuits is attributed to the charging and discharging of parasitic capacitors that occurs during logical transitions. The average switching energy $E_{op}$ of a CMOS gate (or the power-delay product) is given by

$$E_{op} = C_{average} V_{supply}^2 \qquad (3)$$

where $C_{average}$ is the average capacitance being switched per clock, and $V_{supply}$ is the supply voltage. The quadratic dependence of energy on voltage makes it clear that operating at the lowest possible voltage is desirable for minimizing the energy consumption. Unfortunately, reducing the supply voltage comes at the cost of a reduction in computational throughput. One way to compensate for these increased delays is to use architectures that reduce the speed requirements of operations while keeping throughput constant.

One architectural approach for maintaining throughput with slower circuitry is to use parallelism through hardware duplication. By using identical units in parallel, the speed requirements for each unit are reduced, allowing for a reduction in voltage. This approach is particularly useful for iterative decoding algorithms where the final decisions are taken after multiple iterations of the process. However, duplicating units is limited by the available silicon area. Because interconnections between chips have high capacitance and thus consume substantial power, duplication factors requiring multiple chips are ineffective at reducing power consumption.

B. System Design Optimization

At the system level, proper selection of modules and their implementation is essential for realizing optimum low-energy performance. Code selection is a widely-treated part of any communications system design. However, for low-energy design, code selection must consider, in addition to other factors, the power vs. performance tradeoffs of available decoders. Theoretical performance is irrelevant if the required decoder consumes too much power. The effects of code rate and block size on bandwidth requirements and communication delay are well understood. However, they also have implications for decoder power consumption. Low code rates or high data rates require greater computational throughput. This may require larger supply voltages and thus consume more power. Large block sizes require large memories in which to store the data as it is being accumulated and processed. These memories can consume considerable power and chip area. Furthermore, if the memory is too large to fit on a single chip, interconnections between the chips lead to significant power consumption.

In a complete communications system, power has to be allocated among the components. Joint design of the modules is therefore needed in order to achieve the best results. For example, it may be better to use a higher complexity, high-performance demodulator with a low-complexity suboptimal decoder, than to use a simple demodulator with a complex decoder. Unfortunately, especially when suboptimal decoders and quantization are involved, total system performance becomes infeasible to analyze. Thus, we resort to Monte Carlo simulations to determine the performance as a function of the design parameters. Numerous approaches to optimizing such "black box" functions are known. Additionally, some qualitative properties of the function may be known which can drastically speed up our search.
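One simple way to optimize such a "black box" is exhaustive (brute-force) search over a small discrete parameter grid. The Python sketch below illustrates the idea for two quantizer word widths under a receiver energy budget. It is only a sketch under stated assumptions: `toy_per` and `toy_energy` are hypothetical stand-ins for the Monte Carlo packet-error-rate estimate and the VLSI-derived energy model, and their formulas and constants are arbitrary, not results from this work.

```python
from itertools import product


def brute_force_word_widths(per_estimate, receiver_energy, energy_budget,
                            mf_bits_range, dec_bits_range):
    """Return (per, mf_bits, dec_bits) with the lowest estimated packet error
    rate among all word-width pairs whose receiver energy fits the budget.

    Returns None when no pair satisfies the energy constraint.
    """
    best = None
    for mf_bits, dec_bits in product(mf_bits_range, dec_bits_range):
        if receiver_energy(mf_bits, dec_bits) > energy_budget:
            continue  # violates the receiver energy constraint
        per = per_estimate(mf_bits, dec_bits)
        if best is None or per < best[0]:
            best = (per, mf_bits, dec_bits)
    return best


# Hypothetical stand-ins (assumptions for illustration only).
def toy_energy(mf_bits, dec_bits):
    # Arbitrary linear cost in which decoder bits are more expensive.
    return 1.0 * mf_bits + 5.0 * dec_bits


def toy_per(mf_bits, dec_bits):
    # Arbitrary monotone model: more bits, fewer errors.
    return 2.0 ** -mf_bits + 2.0 ** -dec_bits


if __name__ == "__main__":
    result = brute_force_word_widths(toy_per, toy_energy, energy_budget=30.0,
                                     mf_bits_range=range(4, 13),
                                     dec_bits_range=range(1, 7))
    print(result)
```

In practice each call to the error-rate estimate is an expensive simulation, so known qualitative properties (for example, that performance does not degrade as bits are added) can be used to prune the grid rather than visiting every point.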
In order to demonstrate these concepts, we describe the optimization of a simple system where the code type and rate, decoder type, and modulation are fixed. The parameters to be optimized are the amplifier drive level (or equivalently the total power consumed by the amplifier), and the numbers of bits of quantization to be used in the receiver filter and decoder. We begin by separating the effects of amplifier distortion from the overall signal gain. To do this, we fix the received SNR independent of the transmitter power. Next we temporarily separate the transmitter and receiver power consumption by imposing separate energy constraints on each. This leaves the receiver filter quantizer and the decoder quantizer word widths to be jointly optimized with respect to the receiver energy constraint. We do this for several different amplifier drives and then reintroduce the effect of amplifier gain by calculating the received SNR for fixed $d$ and $N_0$ according to

$$SNR = \frac{P_r T_c}{N_0} \qquad (4)$$

where $T_c$ is the channel symbol duration, $N_0$ is the noise power spectral density, and $P_r$ is the received power computed according to (2). Finally, we optimize the performance with respect to the average energy constraint (1) using brute force and interpolation.

IV. Results
A. Decoder Performance

Several low-complexity decoders were synthesized using the Epoch CAD design environment based on 3.3 volt, 0.6 µm CMOS standard cell technology. The data rate was taken to be 1 megabit per second (Mbps). For a fair comparison, similar architectural optimizations were made to the decoders. We assume uniform quantization in the soft decision decoders. The following codes were considered: Hamming (7,4), BCH (31,16), BCH (31,21), Golay (23,12), Hadamard (16,4), Hadamard (64,6), a rate 1/2, constraint-length 7 convolutional code, and turbo codes with different block sizes. For the Hamming, BCH, and Golay codes, we implemented a hard decision decoder. For the Hadamard, convolutional, and turbo codes, both hard and soft decision decoders were considered. The turbo code used recursive systematic convolutional (RSC) codes with generator (35,23) as its constituent codes. The turbo decoder considered was a low-power design described in [8].

Fig. 2 summarizes the decoding performance vs. power consumption for the various decoders. For the Hadamard codes, the soft decision decoders gained about 2 dB over their hard decision counterparts. The soft decision Viterbi decoder for the convolutional code gained about 2 dB over its corresponding hard decision decoder at a bit error rate (BER) of $10^{-3}$. Typically, the performance did not increase significantly beyond three bits of quantization. As expected for the turbo code, there were noticeable performance improvements for larger block lengths. Turbo code performance did not improve substantially beyond four decoding iterations or three to four bits of quantization. At a BER of $10^{-3}$, turbo codes gain about 0.3-1 dB over the constraint-length 7 convolutional code. Fig. 3 shows the receiver SNR required to attain a BER of $10^{-5}$. Here turbo codes gain about 1-2 dB over the convolutional code.

Fig. 4 shows the chip area required by the various decoders. This is dominated by the memory requirements which are largest for the convolutional and turbo codes and are generally related to the block length of the code. In general, turbo codes have the best performance but at the expense of high power consumption and large chip area.
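To make the notion of "bits of quantization" in a soft decision decoder concrete, the Python sketch below shows one possible uniform quantizer for a real-valued matched-filter output. The paper states only that uniform quantization was assumed; the mid-rise level placement, the clipping range `clip`, and the function name are assumptions made for this illustration.

```python
def uniform_quantize(x, bits, clip=1.0):
    """Quantize x to one of 2**bits uniformly spaced levels on [-clip, clip].

    Assumed mid-rise design: levels sit at the centers of equal-width cells,
    so bits=1 reduces to a hard (sign) decision.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits
    step = 2.0 * clip / levels
    # Saturate inputs that fall outside the quantizer range.
    x = max(-clip, min(clip, x))
    # Cell index, with the top edge folded into the last cell.
    index = min(int((x + clip) / step), levels - 1)
    return -clip + (index + 0.5) * step


if __name__ == "__main__":
    for bits in (1, 3, 4):
        print(bits, [uniform_quantize(v, bits) for v in (-1.7, -0.3, 0.3, 0.9)])
```

With one bit the output carries only the sign of the input, while each additional bit halves the cell width; this is the parameter that trades decoder performance against word width, and hence power, in Figs. 2 and 3.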
B. System Performance

Below, we present results of system optimization for a constraint-length 7, rate 1/2 convolutional code using the Viterbi decoder mentioned above. The performance objective was the packet error rate for fixed length packets of 224 information bits. With trellis termination, this gives a coded packet length of 460 bits. The channel interleaver was a depth-20 block interleaver. The transmitter filter was a 19-tap, 4-times oversampled, root-raised-cosine filter, implemented with double precision floats. The receiver filter was as described above (Section II-C) with 19 taps.

[Fig. 2. Decoding performance vs. power consumption for Gaussian channel at BER = $10^{-3}$. Panel title: Decoder Performance vs Power. Axes: power dissipation (mW) vs. SNR ($E_b/N_0$) at BER = $10^{-3}$.]

[Fig. 3. Decoding performance vs. power consumption for Gaussian channel at BER = $10^{-5}$. Panel title: Decoder Performance vs Power. Axes: power dissipation (mW) vs. SNR ($E_b/N_0$) at BER = $10^{-5}$.]

[Fig. 4. Decoder chip size of various decoders. Panel title: Circuit Area for Decoders. Vertical axis: decoder area (µm$^2$).]

[Fig. 5. Packet error rate as a function of the receiver processing energy constraint $E_{cr}$ and the transmitter amplifier energy constraint $E_{ct}$. The surfaces are, from top down, SNR = 1 dB, 2 dB, 3 dB, 4 dB, and 5 dB. The grids are interpolated and the SNR = 5 dB data is extrapolated from the simulation results. Panel title: Performance vs. Energies. Axes: $E_{ct}$ (J), $E_{cr}$ (J), packet error rate.]

Fig. 5 shows the results of the optimization for fixed receiver SNR. Note that the amplifier drive, and thus the amplifier distortion, has little effect on the performance. The steps occur as successive bits of quantization become available to the decoder. The matched filter consumes much less power than the decoder so it is typically operated with more quantization levels. At least eight or nine bits seem necessary to avoid catastrophic errors.

Fig. 6 shows the results of varying the SNR with the transmitter power according to (4). The dramatic performance improvement as the transmitter energy increases in this plot is because increasing the transmitter energy raises the SNR.

Fig. 7 gives the results of trading off transmitter and receiver energy according to (1). For small $\alpha$, transmitter energy is less important so increases in energy improve performance by allowing better receivers. Thus, this curve has steps corresponding to adding bits. For large $\alpha$, transmitter energy is emphasized and receiver energy is deemphasized so performance improvements are mostly due to increasing transmitter energy (and thus the received SNR).

V. Conclusions
We present a VLSI decoder analysis in terms of circuit complexity and performance for low-power decoders. Various low-complexity decoder architectures are implemented and compared. The power consumption vs. performance curve obtained from this study can be used as a guide for low-energy communications system design. Furthermore, we demonstrate a methodology for investigating system level tradeoffs for energy and performance optimization.

[Fig. 6. Packet error rate as a function of the receiver processing energy constraint $E_{cr}$ and the transmitter amplifier energy constraint $E_{ct}$. The surfaces correspond to fixed channel conditions with distances $d$, from the top down, of 250, 225, 210, and 200 meters. Panel title: Performance vs. Energies. Axes: $E_{ct}$ (J), $E_{cr}$ (J), packet error rate.]

[Fig. 7. Packet error rate vs. average energy constraint from (1) for various $\alpha$. Panel title: Performance vs. Weighted Total Energy. Curves: $\alpha$ = 0.125, 0.5, and 0.8. Axes: average packet energy (J) vs. packet error rate.]

References

[1] J. G. Proakis, Digital Communications, McGraw-Hill, Inc., New York, 3rd edition, 1995.
[2] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication, Addison-Wesley, 1995.
[3] A. J. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Transactions on Information Theory, vol. 13, pp. 260-269, April 1967.
[4] J. K. Omura, "On the Viterbi decoding algorithm," IEEE Transactions on Information Theory, vol. 15, pp. 177-179, January 1969.
[5] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding," in Proceedings of the International Communications Conference, May 1993, pp. 1064-1070.
[6] P. Robertson, "Illuminating the structure of code and decoder of parallel concatenated recursive systematic (turbo) codes," in IEEE Global Telecommunications Conference, 1994, vol. 3.
[7] C.-P. Liang, J. Jong, W. E. Stark, and J. R. East, "Non-linear amplifier effects in communications systems," IEEE Transactions on Microwave Theory and Techniques, vol. 47, no. 8, August 1999.
[8] S. Hong and W. E. Stark, "Design and implementation of low-complexity adaptive turbo-code encoder and decoder for wireless mobile communication applications," in Proc. SiPS, 1998.