Implementation of 1-D Daubechies Wavelet Transform on FPGA
V. Herrero, J. Cerdà, R. Gadea, M. Martínez, A. Sebastià
[email protected] Group of Design of Digital Systems, Departamento de Ingeniería Electrónica. Universidad Politécnica de Valencia. Camino de Vera s/n, 46022 Valencia, SPAIN.
Abstract
The Wavelet Transform has proven to be a useful tool in 1-D and 2-D signal compression systems. As the technique has grown in importance, many working groups have needed a flexible development environment in which the performance of a specific architecture can be measured in practice rather than only in theory. FPGAs are programmable logic devices that provide a convenient workbench on which almost any architecture can be synthesized and tested efficiently, using hardware description languages such as VHDL to speed up the design process. This work opens the path to future comparative studies between different architectures, or to the reuse of this design as a test bench for different filtering structures.
1 The Daubechies Wavelet Transform
The field of the Discrete Wavelet Transform is a recent one. The complete theory and set of tools were developed in the 1990s and are now producing interesting results in signal compression systems. First, a short introduction to the continuous wavelet transform (CWT) is given. Just as the Fourier Transform uses the basis functions $e^{j\Omega t}$, the CWT is based on another family of basis functions with an interesting property: all of them are generated by dilation and shift of a single function $\phi(t)$ (called the mother wavelet). Thus the CWT represents the input signal as a linear combination of the form:

$$x_a(t) = \sum_{k} \sum_{l} X_{CWT}(k, l) \, 2^{-k/2} \, \phi(2^{-k} t - l)$$
This kind of transform allows a non-uniform frequency resolution, just like the one obtained by the filter bank shown below.

[Fig. 1: two-stage analysis filter bank (Stage 1, Stage 2). The signal passes through the filters H and G, each followed by a decimator by 2; the input is x(n) and the outputs are y0(n), y1(n) and y2(n).]
Each yn is the result of the analysis of x in the n-th octave.
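As an illustration only, the tree of Fig. 1 can be sketched in a few lines of Python. The sketch assumes the 4-tap Daubechies coefficients, a periodic extension of the signal at the borders, and the function names below; none of these are taken from the design described in this paper.

```python
import math

S3, R2 = math.sqrt(3.0), math.sqrt(2.0)
# Low-pass filter G: 4-tap Daubechies coefficients (standard closed form).
G = [(1 + S3) / (4 * R2), (3 + S3) / (4 * R2),
     (3 - S3) / (4 * R2), (1 - S3) / (4 * R2)]
# High-pass filter H obtained from G by the alternating-flip rule.
H = [G[3], -G[2], G[1], -G[0]]

def analyze_stage(x):
    """One stage: filter with G and H, keep one sample out of two.
    Assumption: the signal is extended periodically at the borders."""
    n = len(x)
    low = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    high = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return low, high

def synthesize_stage(low, high):
    """Inverse of analyze_stage (transpose of the orthogonal analysis matrix)."""
    n = 2 * len(low)
    x = [0.0] * n
    for i in range(len(low)):
        for k in range(4):
            x[(2 * i + k) % n] += G[k] * low[i] + H[k] * high[i]
    return x

def analyze(x, octaves):
    """Apply the stage recursively to the low-pass branch."""
    details, low = [], list(x)
    for _ in range(octaves):
        low, high = analyze_stage(low)
        details.append(high)
    return details, low

def synthesize(details, low):
    """Rebuild the signal from the octave outputs, last octave first."""
    for high in reversed(details):
        low = synthesize_stage(low, high)
    return low

signal = [math.sin(0.3 * n) + 0.1 * n for n in range(32)]
details, approx = analyze(signal, 2)
rebuilt = synthesize(details, approx)
```

Each octave output is half as long as the previous one, which is the non-uniform frequency resolution mentioned above, and under these assumptions the synthesis recovers the input up to floating-point rounding.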
The second step is to discretize the CWT expressions. This can be done by manipulating the equivalent filter expressions shown in section 11.3.1 of [1], which leads to a convolution-like expression for both filters H (high pass) and G (low pass) and opens the path to a digital implementation. The last step was taken by Daubechies in the 1990s, who developed a systematic technique for generating finite-length orthonormal wavelet functions; this was a decisive advance towards building wavelet transformers with finite impulse response (FIR) filters. The most important characteristic of the FIR filters obtained from Daubechies functions is that they are power symmetric, which allows us to implement them using a lattice structure as described in section 6.4 of [1]. The lattice structure has several advantages, such as a better response to coefficient quantization and a reduction by a factor of two in the number of stages needed for a given filter order.
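To make the lattice idea concrete, the following Python sketch implements a 4-tap Daubechies filter pair as two rotation stages separated by one delay, and compares it with a direct FIR filter followed by decimation. The rotation angles (pi/3 and -pi/12) were derived for this illustration from the standard 4-tap coefficients; the normalization, the zero initial state and all names are assumptions, not the coefficients used in the actual design.

```python
import math

# Assumed rotation angles that reproduce the 4-tap Daubechies low-pass filter.
THETA = [math.pi / 3, -math.pi / 12]

def lattice_analysis(x):
    """Two-stage lattice working on even and odd samples (zero initial state).
    Returns the decimated low-pass and high-pass outputs."""
    c0, s0 = math.cos(THETA[0]), math.sin(THETA[0])
    c1, s1 = math.cos(THETA[1]), math.sin(THETA[1])
    low, high = [], []
    prev_odd = 0.0    # holds x[2i-1]
    b_delayed = 0.0   # lower branch of stage 0, delayed by one sample
    for i in range(len(x) // 2):
        u0, u1 = x[2 * i], prev_odd
        a = c0 * u0 + s0 * u1            # stage 0 rotation
        b = -s0 * u0 + c0 * u1
        low.append(c1 * a + s1 * b_delayed)     # stage 1 rotation
        high.append(-s1 * a + c1 * b_delayed)
        b_delayed = b
        prev_odd = x[2 * i + 1]
    return low, high

def direct_analysis(x, taps):
    """Reference: causal FIR convolution followed by decimation by 2."""
    return [sum(taps[k] * x[2 * i - k] for k in range(len(taps)) if 2 * i - k >= 0)
            for i in range(len(x) // 2)]

S3, R2 = math.sqrt(3.0), math.sqrt(2.0)
G_TAPS = [(1 + S3) / (4 * R2), (3 + S3) / (4 * R2),
          (3 - S3) / (4 * R2), (1 - S3) / (4 * R2)]
H_TAPS = [-G_TAPS[3], G_TAPS[2], -G_TAPS[1], G_TAPS[0]]

x = [math.cos(0.7 * n) + 0.05 * n for n in range(20)]
lattice_low, lattice_high = lattice_analysis(x)
```

A fourth-order filter pair needs only two lattice stages here, which agrees with the factor-of-two reduction in stages mentioned above, and both outputs are produced by the same structure.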
2 Architecture
The architecture used for the Daubechies DWT is based on the one presented by Knowles in [5]. It uses a single filter, which is the most hardware-expensive part, in a recursive way instead of replicating it n-1 times, where n is the number of octaves over which the signal is analyzed. This is called a folded structure. It takes advantage of the low number of stages needed in the lattice filtering unit, and uses a reasonable amount of memory to temporarily save the filter state of every octave while another one is being evaluated. A key point in the architecture of the DWT is the algorithm used to obtain the optimal sequence of evaluation of the octaves. This is done by means of the Recursive Pyramid Algorithm (RPA) [3], a reformulation of the well-known Pyramid Algorithm introduced by Mallat. RPA allows us to obtain the outputs of each octave at the right time, with no added delay, just as if a filter bank (unfolded structure) were being used. This is possible because of the decimation that the signal undergoes in every branch of the filter tree.
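One common way to picture the evaluation order is as a sequence of time slots in which the single filter serves one octave per slot. The Python sketch below is a hypothetical illustration of that interleaving, not the control logic of this design: the slot numbering and the function name are assumptions. It relies on the fact that octave j needs the filter only once every 2^j slots, because of the decimation.

```python
def rpa_schedule(num_octaves, num_slots):
    """Return the octave served in each slot, or None for an idle slot.
    Slot t (counted from 1) serves octave 1 + (number of trailing zero bits of t)."""
    schedule = []
    for t in range(1, num_slots + 1):
        trailing_zeros = (t & -t).bit_length() - 1
        octave = trailing_zeros + 1
        schedule.append(octave if octave <= num_octaves else None)
    return schedule

slots = rpa_schedule(3, 8)   # [1, 2, 1, 3, 1, 2, 1, None]
```

Octave 1 takes every second slot, octave 2 every fourth, and so on, so all octaves fit into the slots of a single filter.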
Fig. 2. Design Architecture (inputs: even and odd A/D converter samples)
3 Implementation
First of all, a complete set of functions was developed in MATLAB. It included coefficient calculation for a given filter order, the discrete wavelet transform for any number of octaves and any filter order, noise evaluation, and an inverse discrete wavelet transform to verify the complete design. Since fixed-point arithmetic (chapter 7.5.1 of [2]) was going to be used in the final design, an approximate wordlength had to be determined. This was done by evaluating the noise due to quantization effects in a complete analysis-synthesis system [4]. The required S/N ratio was 80 dB, and the wordlengths obtained for different filter orders and numbers of octaves were:

Filter Order   Number of Octaves   Wordlength (bits)
4              4                   15
6              5                   15
10             7                   16
12             10                  16

(Filter Stages = Order / 2)
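As a rough illustration of how a wordlength relates to an S/N figure, the Python sketch below quantizes a test signal to a signed fixed-point format and measures the resulting signal-to-noise ratio. It is a simplification under stated assumptions: it only models the quantization of the samples themselves, not the noise accumulated through the filter stages and octaves as evaluated in [4], and the test signal and function names are hypothetical.

```python
import math

def quantize(x, wordlength):
    """Round x (expected in [-1, 1)) to signed fixed point with `wordlength` bits."""
    scale = 1 << (wordlength - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))   # saturate instead of wrapping around
    return q / scale

def snr_db(signal, wordlength):
    """Signal-to-quantization-noise ratio in dB for the given wordlength."""
    noise = [s - quantize(s, wordlength) for s in signal]
    p_signal = sum(s * s for s in signal)
    p_noise = sum(e * e for e in noise)
    return 10 * math.log10(p_signal / p_noise)

# Hypothetical test signal: a sine wave slightly below full scale.
signal = [0.9 * math.sin(2 * math.pi * 0.01234 * n) for n in range(4096)]

for bits in (8, 12, 16):
    print(bits, "bits:", round(snr_db(signal, bits), 1), "dB")
```

Each extra bit adds roughly 6 dB, so the margin that a given wordlength leaves above a target such as 80 dB shrinks as more filter stages and octaves add their own rounding noise.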
Therefore a 16-bit wordlength was used in the final structure, although high filter orders and high numbers of octaves are seldom used in real applications. Once the theoretical part of the design was covered, work on the hardware level began. We first decided to minimize the control path in order to make the design as fast as possible, so a multi-phase clock was used. In this way, every part of the architecture is activated only on the rising edge of one clock. Only two clocks (a basic clock and a clock delayed by 45º) were needed to control the whole architecture in a simple and fast way.
Fig. 3. Multiphase Control
All the components of the architecture were described in RTL-level VHDL using generic parameters, which allow the number of filter stages, the wordlength and the number of octaves to be changed as desired. This step was carried out with the Xilinx Foundation 2.1i software, following the design flow proposed in [6]. The ModelSim XE HDL simulator by Model Technology was used to write and run the test benches that simulate each component separately. The next step was to put all the parts together and build the whole design. Another test bench was written for the complete 1-D Wavelet Transformer, using the STD_LOGIC_TEXTIO package to process signals that had been created with MATLAB. The outputs of this last simulation were analyzed with the previously created MATLAB inverse wavelet transform function to check that the design worked correctly. The following stage was to pass the RTL-level code through the FPGA EXPRESS logic synthesizer, included in the Foundation software.
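The exchange of signals between a numerical tool and a text-based test bench can be sketched as follows. This Python fragment is only a hypothetical illustration of the idea: it assumes one sample per line, written as a two's complement binary string of the chosen wordlength, which is a format a text-reading test bench could parse. The actual file format used in this work is not described, and the function names are invented for the example.

```python
def to_twos_complement(value, width=16):
    """Format a signed integer as a two's complement bit string of `width` bits."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError("value does not fit in the given wordlength")
    return format(value & ((1 << width) - 1), "0{}b".format(width))

def from_twos_complement(bits):
    """Parse a two's complement bit string back into a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)
    return value

def stimulus_lines(samples, width=16):
    """One line of text per sample, ready to be written to a stimulus file."""
    return [to_twos_complement(s, width) for s in samples]

lines = stimulus_lines([0, 1, -1, 32767, -32768])
```

The same two helpers can be used in the opposite direction to read back the simulated outputs before passing them to the inverse transform.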
Afterwards we chose the most appropriate FPGA device on which to place and route the result of the synthesis. This was a VIRTEX V100, chosen for its enhanced characteristics, such as built-in PLL-controlled clock generators and a very high density of logic and routing resources. Finally, the placed and routed design was extracted as structural VHDL plus an SDF (Standard Delay Format) file, which was taken to ModelSim and simulated with a test bench similar to the one used in the RTL-level simulation. In this way we verified that the synthesis, placement and routing had been done properly.
4 Results
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Figure 5 shows the result of analyzing and synthesizing the signal in Figure 4 using the MATLAB functions described previously. This time an 8th-order filter and 5 octaves were used. The last part of the reconstructed signal is not recoverable because of the convolution-like behavior of the system, which always leaves the last (n-1)*(2m-1) samples unfiltered [n: filter order / m+1: number of octaves]. Figure 6 shows the result of the same filtering operation, but using the 1-D Daubechies Wavelet Transformer fully placed and routed in a VIRTEX V100 at the analysis stage. The results differ only by a small error due to quantization effects. Figure 7 shows the difference between the output of our analysis system and the output of the corresponding MATLAB function (see footnote 2). These are some hardware results obtained from different implementations:

Filter Order   Number of Octaves   Max. clock frequency   Number of slices used (see footnote 3)
4              5                   17.88 MHz              479
4              7                   18.0 MHz               488
8              5                   14.45 MHz              827
8              7                   14.8 MHz               837
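The concatenated output layout described in footnote 2 (256, 128, 64, 32 and 32 samples for octaves one to five) can be unpacked with a few lines of Python before comparing individual octaves. The function name and the default lengths are assumptions made for this example; the lengths are simply the ones quoted in the footnote.

```python
def split_octaves(vector, lengths=(256, 128, 64, 32, 32)):
    """Split a concatenated output vector into one list per octave."""
    if len(vector) != sum(lengths):
        raise ValueError("vector length does not match the octave lengths")
    parts, start = [], 0
    for n in lengths:
        parts.append(vector[start:start + n])
        start += n
    return parts

def max_abs_error(reference, measured):
    """Largest absolute difference between two equally long octave outputs."""
    return max(abs(r - m) for r, m in zip(reference, measured))

octaves = split_octaves(list(range(512)))
```

Splitting both the hardware output and the reference output in this way allows the quantization error to be inspected octave by octave.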
References
[1] Vaidyanathan P.P. (1993). Multirate Systems and Filter Banks. Prentice Hall.
[2] Proakis J.G., Manolakis D.G. (1998). Tratamiento Digital de Señales. 3rd edition. Prentice Hall.
[3] Vishwanath M., Owens R.M., Irwin M.J. (1995). VLSI Architectures for Discrete Wavelet Transform. IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 42, no. 5, May 1995.
[4] Choi H., Burleson W.P., Phatak D.S. (1993). Optimal Wordlength Assignment for the Discrete Wavelet Transform in VLSI. Tech. Rep. TR-92-CSE-13, ECE Dept., Univ. of Massachusetts, Amherst.
[5] Knowles G. (1990). VLSI Architecture for the Discrete Wavelet Transform. Electronics Letters, vol. 26, pp. 1184-1185.
[6] Model Technology / Xilinx (1999). Using Modelsim XE with Xilinx Foundation Series 2.1i in VHDL Design Flow. Applications Note – 121, ModelSim XE User’s Manual.
[7] Xilinx (1999). Virtex Datasheet. http://www.xilinx.com
2 The different octave outputs are concatenated in a single vector, so that the first 256 samples belong to the first octave, the following 128 to the second octave, the following 64 to the third octave, the following 32 to the fourth and the last 32 to the fifth.
3 A slice is the minimum configurable logic unit in a VIRTEX. It includes [7] two four-input LUTs and two D flip-flops.