AN ENVIRONMENT FOR DESIGN AND IMPLEMENTATION OF ENERGY EFFICIENT DIGITAL FILTERS
Henrik Ohlsson, Oscar Gustafsson, Weidong Li, and Lars Wanhammar
Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden
E-mail: {henriko, oscarg, larsw}@isy.liu.se
ABSTRACT
This paper presents an overview of a design environment for digital filters. We discuss ongoing work as well as previously developed parts of the design flow. The purpose of this design environment is to improve design efficiency and thereby increase the knowledge in the area of design and implementation of energy efficient digital filters. The design environment includes all design steps required, from the filter design down to a physical implementation. The main bottlenecks in the design flow are the filter design, the development of a bit-level design, and the final mapping to a physical layout. By developing efficient optimization methods for the filter design, software generators for VHDL descriptions of a filter implementation, and layout generators for the physical layout, the time required for going from the filter specification to a filter implementation can be reduced significantly compared to a conventional design flow.
1. INTRODUCTION
In wireless communication systems, single-rate and multirate, fixed-function, frequency selective digital filters are of great importance. To obtain a long uptime between battery recharges for cellular devices, low power consumption is required. Hence, methods for design and implementation of energy efficient digital filters are needed. To obtain an energy efficient digital filter implementation, the entire digital filter design flow must be considered. At each level of the design, decisions are made that affect the power consumption of the final implementation. These design decisions could, for example, be the selection of a suitable filter structure or the selection of arithmetic style.
1.1. FIR Filters
FIR filters constitute a class of digital filters having a finite-length impulse response. An FIR filter can be realized using nonrecursive as well as recursive algorithms. However, the latter are not recommended due to potential stability problems, whereas nonrecursive FIR filters are always stable. Hence, nonrecursive FIR filter algorithms are preferable for implementation. In Figure 1 a signal-flow graph for an Nth-order direct form FIR filter structure is shown. As can be seen from the figure, the fundamental operations of the FIR filter are multiplication and addition.
Figure 1. An Nth-order FIR filter
1.2. Lattice Wave Digital Filters
Lattice wave digital filters (LWDFs) constitute a class of recursive digital filters that have some advantages compared to FIR filters [1]. The filter orders are often lower than for the corresponding FIR filter, especially when the filter requirements are stringent, i.e., a narrow transition band. Also, LWDFs can be implemented with guaranteed stability. Finally, these structures have a very low coefficient sensitivity in the passband, which makes short coefficient word lengths possible. However, the stopband sensitivity is high, so the coefficient word length is often determined by the stopband requirements. In Figure 2 the signal-flow graph of a ninth-order LWDF is shown. The signal-flow graphs for the first- and second-order allpass sections are shown in Figure 3 and Figure 4, respectively. As for the FIR filter, the fundamental arithmetic operations required are multiplication and addition.
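Both filter classes reduce to multiplications and additions, which the following minimal Python sketch tries to make concrete. It is only an illustration, not the environment's own code: the function names are hypothetical, floating-point numbers stand in for fixed-point data, and the allpass section uses one common form of the symmetric two-port adaptor (sign conventions differ between texts).

```python
def fir_direct_form(coeffs, samples):
    """Nonrecursive direct-form FIR filter: y(n) = sum_k c_k * x(n - k)."""
    delay_line = [0.0] * len(coeffs)  # holds x(n), x(n-1), ..., x(n-N)
    output = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the new sample in
        output.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return output


def first_order_allpass(alpha, samples):
    """First-order allpass section: one two-port adaptor plus one delay.

    Assumed adaptor equations (one common convention):
        b1 = a2 + alpha * (a2 - a1)   -> section output
        b2 = a1 + alpha * (a2 - a1)   -> value stored in the delay
    """
    state = 0.0  # content of the delay element
    output = []
    for x in samples:
        diff = alpha * (state - x)   # the single multiplication per sample
        output.append(state + diff)  # b1
        state = x + diff             # b2 becomes a2 at the next sample
    return output


impulse = [1.0] + [0.0] * 7
fir_response = fir_direct_form([0.25, 0.5, 0.25], impulse)
allpass_response = first_order_allpass(0.5, impulse)
```

The FIR impulse response is simply the coefficient sequence, whereas the recursive allpass section needs only one multiplication per sample and keeps the energy of its impulse response equal to one for |alpha| < 1.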
Figure 2. A ninth-order LWDF
1.3. Iteration Period Bound for Recursive Filters
For recursive filter structures, such as the LWDF, there is an algorithmic bound on the sample period. This bound is given by

$$T_{\min} = \max_i \left\{ \frac{T_{\mathrm{op},i}}{N_i} \right\} \qquad (1)$$

where $T_{\min}$ is the iteration period bound, $T_{\mathrm{op},i}$ is the total latency for the operations in loop $i$, and $N_i$ is the number of delay elements in loop $i$ [2]. For the first- and second-order allpass sections of the LWDFs, as illustrated in Figure 3 and Figure 4 with the critical loops marked, this corresponds to iteration period bounds of $T_{\mathrm{mult}} + 2T_{\mathrm{add}}$ and $2T_{\mathrm{mult}} + 4T_{\mathrm{add}}$, respectively, assuming that all additions have the same latency $T_{\mathrm{add}}$ and all multiplications have the same latency $T_{\mathrm{mult}}$ [3].
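As a small worked illustration of Eq. (1), the bound can be evaluated by listing each recursive loop as a pair of total operation latency and number of delay elements. The sketch below is an assumption-laden example: the function name and the latency values are hypothetical and are not taken from the paper.

```python
def iteration_period_bound(loops):
    """Return T_min = max_i(T_op_i / N_i) for an iterable of (T_op_i, N_i) pairs."""
    bounds = []
    for total_latency, delay_count in loops:
        if delay_count <= 0:
            raise ValueError("every recursive loop must contain at least one delay element")
        bounds.append(total_latency / delay_count)
    return max(bounds)


# Hypothetical latencies in arbitrary time units (not values from the paper).
T_MULT = 4.0
T_ADD = 1.0

loops = [
    (T_MULT + 2 * T_ADD, 1),  # one multiplication, two additions, one delay element
    (T_MULT + 4 * T_ADD, 2),  # a made-up loop with two delay elements
]
t_min = iteration_period_bound(loops)  # the first loop dominates: 6.0 time units
```

The loop with the largest latency per delay element is the critical loop; adding delay elements to a loop relaxes its contribution to the bound.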
Figure 3. Critical loop in a first-order allpass section
Figure 4. Critical loop in a second-order allpass section
1.4. Maximally Fast Digital Filters
In order to reach the iteration period bound, it is often required that the operations are scheduled over several sample periods. This is called cyclic scheduling [3] [16] [19] [20]. It is useful when, for example, there is an operation in the critical loop with a latency larger than $T_{\min}$ and the loop contains more than one delay element. We have developed a software tool, implemented in Java, for manual scheduling of digital signal processing algorithms [21].
1.5. Novel Filter Structures
We have proposed several novel filter structures [4] [5] [6] [7] [8] [9]. Some of these have been studied with respect to power consumption and other implementation properties and implemented down to silicon [10]. This is, however, a very time consuming task if all the work has to be done manually. Hence, an efficient design flow with a high degree of automation is required to be able to evaluate the implementation properties of a large number of filter structures.
1.6. Dynamic Power Consumption
The dynamic power consumption of a CMOS circuit is the dominating part of the total power consumption. This part can be approximated by the well known formula shown in Eq. 2

$$P_{\mathrm{dyn}} = \alpha f_{\mathrm{clk}} C_L V_{DD}^2 \qquad (2)$$

where $\alpha$ is the switching activity of the circuit, $f_{\mathrm{clk}}$ is the clock frequency of the circuit, $C_L$ is the equivalent load capacitance of the circuit, and $V_{DD}$ is the power supply voltage. The contribution to the power consumption from these factors can be affected by design decisions made at all levels of the design. An efficient method for reducing the power consumption of CMOS circuits is power supply voltage scaling [11]. This means that any excess speed in a design can be traded for reduced power consumption by reducing the power supply voltage.
2. DESIGN ENVIRONMENT
A view of the digital filter design environment is shown in Figure 5. The design process starts from the filter specification. From the specification a filter algorithm is selected and the filter coefficients are determined. The filter design also includes computation of the required internal data word length and proper scaling constants to avoid overflows. The filter design stage is performed in MATLAB™. The filter design is used as input for the next stage, the mapping to hardware. The first step is a structural VHDL description. This VHDL design can then be mapped to a physical layout through logic synthesis using commercial tools. It is also possible to map the VHDL description, or parts of it, to a full custom design to increase the design efficiency. This is, however, a very time consuming task if it is performed manually. Instead we use full custom made, parameterized module generators. This means that once the module generator has been implemented, it can be used to generate specialized, unconstrained layouts for the critical blocks in the filter implementation.
3. FILTER DESIGN
The first step of the design flow, the filter design, can be divided into two different parts, FIR filters and LWDFs.
3.1. FIR Filters
Hardware efficient FIR filters can be designed using optimization methods. Depending on the constraints that apply to the filter design, different optimization methods can be used. The constraints for the design of an FIR filter are typically the passband and stopband limits and the passband and stopband attenuations given by the filter specification. They can also be factors that affect the filter implementation, or more specifically, the power consumption of the implementation. Examples of such constraints are the number of nonzero bits in the set of coefficients or the coefficient word lengths. An FIR filter design method based on mixed integer linear programming (MILP) has been developed [12] [13]. This method allows a constraint to be applied on the number of nonzero bits required in the fixed coefficients, for a given filter specification. By finding a set of coefficients with a low number of nonzero bits, the implementation cost can be minimized.
Figure 5. Digital filter design flow at electronics system (blocks: Filter Design, VHDL, Logic Synthesis, Full Custom Layout, Physical Layout)
3.2. Lattice Wave Digital Filters
The lattice wave digital filter is derived from an analog lattice structure. From this reference structure, real coefficients for the digital filter can be derived [14] [15]. These coefficients must then be quantized to a fixed-point representation. Finding an optimal set of fixed coefficients for an LWDF is an optimization problem with many degrees of freedom. Such problems can be solved using, for example, simulated annealing. We have developed a tool based on simulated annealing that optimizes fixed coefficient values for LWDFs [16].
4. ARITHMETIC
There is a range of arithmetic styles that can be used for implementation of a digital filter. The selection of arithmetic depends on the requirements on throughput, chip area, and power consumption [17]. Currently, it is not clear which arithmetic is the best choice from a power consumption perspective, and a better understanding of power consumption issues is needed.
4.1. Bit-Serial Arithmetic
In bit-serial arithmetic only one bit of the data is processed during each clock cycle. A major advantage of bit-serial arithmetic is that it is area efficient, since the processing elements are small. Bit-serial arithmetic also yields a low routing complexity, since a bit-serial data stream only requires a single wire. A drawback of bit-serial arithmetic is that many D flip flops are required.
4.2. Bit-Parallel Arithmetic
In bit-parallel arithmetic all data bits are processed concurrently. This means that fewer D flip flops are required, at the expense of larger chip area, compared to bit-serial arithmetic. By using redundant arithmetic, such as carry-save arithmetic, the throughput can be increased further, at the expense of chip area [18].
4.3. Digit-Serial Arithmetic
Digit-serial arithmetic is a compromise between bit-serial and bit-parallel arithmetic. Here each data word is divided into blocks, i.e., digits, that are processed sequentially. Also, the number of D flip flops required is a compromise between bit-serial and bit-parallel arithmetic. The digits are typically two or four bits wide. In fact, bit-serial and bit-parallel arithmetic are special cases of digit-serial arithmetic with digit sizes equal to one and the data word length, respectively.
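The relation between the three arithmetic styles can be sketched with a behavioral model of a digit-serial adder. This is an illustrative assumption rather than a description of the generators in the environment: the function name is hypothetical, the words are unsigned, and the result wraps modulo 2^word_length.

```python
def digit_serial_add(a, b, word_length, digit_size):
    """Add two unsigned words one digit per clock cycle, least significant digit first.

    Returns (sum modulo 2**word_length, number of clock cycles used).
    """
    if digit_size <= 0 or word_length % digit_size != 0:
        raise ValueError("word_length must be a positive multiple of digit_size")
    mask = (1 << digit_size) - 1
    cycles = word_length // digit_size
    carry = 0   # carry saved between digits (a flip flop in hardware)
    result = 0
    for cycle in range(cycles):
        shift = cycle * digit_size
        digit_sum = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (digit_sum & mask) << shift
        carry = digit_sum >> digit_size
    return result, cycles


# Same 8-bit addition with three digit sizes.
bit_serial = digit_serial_add(200, 100, word_length=8, digit_size=1)    # 8 cycles
digit_serial = digit_serial_add(200, 100, word_length=8, digit_size=2)  # 4 cycles
bit_parallel = digit_serial_add(200, 100, word_length=8, digit_size=8)  # 1 cycle
```

All three calls produce the same sum; only the number of cycles, and in hardware the width of the adder, changes with the digit size.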
5. MAPPING TO HARDWARE
The next step in the design flow is to map the filter design to hardware.
5.1. VHDL Code Generators
We have developed, and are developing, several VHDL code generators for digital filters. Some examples of the generators currently available or under development are:
• Bit-parallel FIR filters – This generator is based on bit-parallel, carry-save arithmetic. It supports several FIR filter structures such as the direct form, polyphase structures, and differential FIR filters.
• Bit-serial LWDFs – This generator provides maximally fast bit-serial first- and second-order Richards’ allpass sections.
• Bit-parallel LWDFs – This generator is based on bit-parallel, carry-save arithmetic. The tool generates VHDL code for first- and second-order Richards’ allpass sections that are isomorphically mapped to carry-save arithmetic. The multipliers within the adaptors are mapped to Wallace adder tree structures.
• Bit-parallel adders – From this generator several different bit-parallel adder structures can be generated. One such adder structure is the basic ripple-carry adder, with or without pipelining.
Both the bit-parallel FIR filter and the bit-parallel LWDF generators have successfully been used for implementation of several filters [22]. The VHDL code generated by these tools is structural and well suited for logic synthesis. Hence, by using commercial logic synthesis tools and standard cell libraries, a physical layout is obtained. To improve the energy efficiency of the implementation further, unconstrained layouts are used for critical blocks. To increase the design efficiency of such blocks we use module generators.
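To give a flavor of what a VHDL code generator of the kind listed above does, the following Python sketch emits VHDL for an unpipelined ripple-carry adder of a chosen word length. It is a hypothetical illustration only: the function name, port names, and coding style are assumptions, each bit is described with concurrent Boolean equations rather than with instantiated cells, and the actual generators in the environment may be organized quite differently.

```python
def ripple_carry_adder_vhdl(name, width):
    """Return VHDL source text for a width-bit ripple-carry adder entity."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = [
        "library ieee;",
        "use ieee.std_logic_1164.all;",
        "",
        f"entity {name} is",
        "  port (",
        f"    a, b : in  std_logic_vector({width - 1} downto 0);",
        "    cin  : in  std_logic;",
        f"    s    : out std_logic_vector({width - 1} downto 0);",
        "    cout : out std_logic",
        "  );",
        f"end entity {name};",
        "",
        f"architecture rtl of {name} is",
        f"  signal c : std_logic_vector({width} downto 0);  -- carry chain",
        "begin",
        "  c(0) <= cin;",
    ]
    for i in range(width):
        # One full adder per bit: sum and carry-out equations.
        lines.append(f"  s({i}) <= a({i}) xor b({i}) xor c({i});")
        lines.append(
            f"  c({i + 1}) <= (a({i}) and b({i})) or (c({i}) and (a({i}) xor b({i})));"
        )
    lines.append(f"  cout <= c({width});")
    lines.append("end architecture rtl;")
    return "\n".join(lines)


vhdl_source = ripple_carry_adder_vhdl("rca4", 4)
```

Because the word length is a parameter of the generator rather than of the VHDL code, the emitted description is fully unrolled, which is one way to keep the output simple for logic synthesis.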
5.2. Module Generators
A module generator is a parameterized building block which is based on unconstrained layout cells. A typical module generator could be a fixed multiplier, where the input parameters are the coefficient and the data word length. Since digital filters are implemented using a small set of operations, for example additions and multiplications, module generators are well suited to this task. Examples of module generators that have been developed, or are under development, at the division are:
• Bit-parallel adder tree – Based on the overturned stair adder tree. Suitable for implementation of high speed, bit-parallel multipliers.
• RAM generator – A low power RAM generator with parameterized size and data word length.
• Digit-serial two-port adaptor – A symmetric two-port adaptor implemented using digit-serial arithmetic. The parameters are the data word length, the coefficient, and the digit size.
These generators are developed for a 0.18 µm CMOS process from STM. Also, several module generators have been developed in older CMOS processes. The main difference between generators developed for different technologies is the building blocks used at the lowest level. When a new technology is introduced, the basic cells can be redesigned while the rest of the generator can be reused. Hence, this methodology simplifies future technology changes.
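The fixed multiplier mentioned above, parameterized by the coefficient and the word length, can be modeled behaviorally as a sum of shifted copies of the input, one per nonzero bit of the coefficient. The sketch below is an assumption made for illustration (plain binary coefficients, hypothetical function names, integer samples); it is meant to show why a coefficient with few nonzero bits is cheap to implement, not how the layout generators work.

```python
def quantize(coefficient, frac_bits):
    """Round a real coefficient to an integer multiple of 2**-frac_bits."""
    return round(coefficient * (1 << frac_bits))


def nonzero_bits(q):
    """Number of nonzero bits in the magnitude of a quantized coefficient."""
    return bin(abs(q)).count("1")


def shift_add_multiply(x, q, frac_bits):
    """Multiply integer sample x by q * 2**-frac_bits using only shifts and adds."""
    acc = 0
    magnitude = abs(q)
    bit = 0
    while magnitude:
        if magnitude & 1:
            acc += x << bit  # one shifted copy of x per nonzero bit
        magnitude >>= 1
        bit += 1
    if q < 0:
        acc = -acc
    return acc >> frac_bits  # drop the fractional bits (rounds toward minus infinity)


q = quantize(0.625, frac_bits=3)                   # 0.625 = 0.101 in binary, so q = 5
cost = nonzero_bits(q)                             # two shifted terms, i.e. one addition
product = shift_add_multiply(16, q, frac_bits=3)   # 16 * 0.625 = 10
```

Each nonzero bit beyond the first costs one adder, which is why constraining the number of nonzero bits during filter design reduces the implementation cost.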
5.3. Low Power Cell Library
We have developed a tool for generating scalable low power cell libraries. The generated library may include simple cells such as inverters, NAND gates, and NOR gates as well as more complex cells such as full adders and flip flops. The cells of the library are scalable with respect to transistor sizes and cell heights. Hence, the driving capacity of a gate can be adapted to the requirements of the circuit. By instantiating and characterizing a suitable set of standard cells, a cell library that can be used in commercial logic synthesis tools is obtained. Currently the tool supports a 0.18 µm CMOS process. However, it has been designed to be technology independent. By a simple change of technology parameters, cell libraries can be generated in other technologies.
6. CONCLUSIONS
In this paper the current status of the automated design flow for design and implementation of digital filters is discussed. The aim is to increase the efficiency of design and implementation of digital filters with low power consumption.
7. REFERENCES
[1] A. Fettweis, “Wave digital filters: Theory and practice,” Proc. IEEE, vol. 74, no. 2, pp. 270–327, Feb. 1986.
[2] M. Renfors and Y. Neuvo, “The maximum sampling rate of digital filters under hardware speed constraints,” IEEE Trans. Circuits Systems, vol. 28, pp. 196–202, March 1981.
[3] L. Wanhammar, DSP Integrated Circuits, Academic Press, 1999.
[4] L. Svensson, P. Löwenborg, and H. Johansson, “Modulated M-channel FIR filter banks utilizing the frequency response masking approach,” in Proc. IEEE Nordic Signal Processing Symp., Hurtigruten, Norway, Oct. 4–7, 2002.
[5] H. Johansson and L. Wanhammar, “High-speed recursive filter structures composed of identical allpass subfilters for interpolation, decimation, and QMF banks with perfect magnitude reconstruction,” IEEE Trans. Circuits Systems II, vol. 46, no. 1, pp. 16–28, Jan. 1999.
[6] H. Johansson and L. Wanhammar, “Wave digital filter structures for high-speed narrow-band and wide-band filtering,” IEEE Trans. Circuits Systems II, vol. 46, no. 6, pp. 726–741, June 1999.
[7] H. Johansson and L. Wanhammar, “Filter structures composed of allpass and FIR filters for interpolation and decimation by a factor of two,” IEEE Trans. Circuits Systems II, vol. 46, no. 7, pp. 896–905, July 1999.
[8] H. Johansson and L. Wanhammar, “High-speed recursive digital filters based on the frequency-response masking approach,” IEEE Trans. Circuits Systems II, vol. 47, no. 1, pp. 48–61, Jan. 2000.
[9] H. Johansson and P. Löwenborg, “Reconstruction of nonuniformly sampled bandlimited signals by means of digital fractional delay filters,” IEEE Trans. Signal Proc., vol. 50, no. 11, pp. 2757–2767, Nov. 2002.
[10] H. Ohlsson, H. Johansson, and L. Wanhammar, “Implementation of a combined interpolator and decimator for an OFDM system demonstrator,” in Proc. NORCHIP Conf., Turku, Finland, Nov. 6–7, 2000, pp. 47–52.
[11] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484, April 1992.
[12] O. Gustafsson, H. Johansson, and L. Wanhammar, “An MILP approach for the design of linear-phase FIR filters with minimum number of signed-power-of-two terms,” in Proc. European Conf. Circuit Theory Design, Espoo, Finland, Aug. 28–31, 2001.
[13] O. Gustafsson and L. Wanhammar, “Design of linear-phase FIR filters combining subexpression sharing with MILP,” in Proc. IEEE Midwest Symp. Circuits Systems, Tulsa, OK, Aug. 4–7, 2002.
[14] L. Wanhammar and H. Johansson, Digital Filters, Linköping University, 2002.
[15] L. Gazsi, “Explicit formulas for lattice wave digital filters,” IEEE Trans. Circuits Systems, vol. 32, no. 1, pp. 68–88, Jan. 1985.
[16] M. Vesterbacka, On Implementation of Maximally Fast Wave Digital Filters, Diss. no. 487, Linköping University, Sweden, 1997.
[17] O. Gustafsson and L. Wanhammar, “Some issues in low power arithmetic for fixed-function DSP,” in Proc. Radiovetenskap och Kommunikation, Stockholm, Sweden, June 10–13, 2002, pp. 473–477.
[18] T. G. Noll, “Carry-save architectures for high-speed digital signal processing,” J. VLSI Signal Proc., vol. 3, pp. 121–140, 1991.
[19] O. Gustafsson, On Mapping of Digital Filter Algorithms to Hardware, Thesis no. 838, Linköping University, 2000.
[20] K. Palmkvist, Studies on the Design and Implementation of Digital Filters, Diss. no. 583, Linköping University, Sweden, 1999.
[21] C. Larsson, O. Gustafsson, M. Vesterbacka, and L. Wanhammar, “A tool for manual scheduling of DSP algorithms implemented in Java,” in Proc. National Conf. Radio Science (RVK), Karlskrona, Sweden, June 14–17, 1999, pp. 367–369.
[22] H. Ohlsson and L. Wanhammar, “A digital down converter for a wideband radar receiver,” in Proc. Radiovetenskap och Kommunikation, Stockholm, Sweden, June 10–13, 2002, pp. 478–481.