Design of Multiplierless and High-Performance DWT and IDWT Architectures Using 4-tap Daubechies Filters Tze-Yun Sung (宋志雲) Department of Microelectronics Engineering, Chung Hua University No.707, Sec. 2, Wufu Rd., Hsinchu, Taiwan, 30012, R.O.C. Tel: 03-518-6387, Fax: 03-518-6891 Email:
[email protected] Abstract This paper proposes two architectures of 2-D discrete wavelet transform (DWT) and inverse DWT (IDWT). The first high-efficiency architecture comprises a transform module, an address sequencer, and a RAM module. The transform module has uniform and regular structure, simple control flow, and local communication. The significant advantages of the single transform module are full hardware-utilization and low-power. The second architecture features parallel and pipelined computation and high throughput. Both architectures are very suitable for VLSI implementation of new-generation image coding/decoding systems, such as JPEG-2000 and motion-JPEG. In the realization of 2-D DWT/IDWT, we focus on a VLSI implementation using 4-tap Daubechies filters, which saves power and reduces chip area. Keywords: DWT/IDWT, low-power, image coding/decoding system, JPEG-2000, VLSI, 4-tap Daubechies filters, multiplierless.
1. Introduction
using 4-tap Daubechies coefficients for lossy analysis
In the field of digital image processing, the
[1]. The symmetry of 4-tap Daubechies filters and the
wavelet
fact that they are almost orthogonal [2]–[4] make
transform for image compression [1]; hence, the
them good candidates for image analysis application.
two-dimensional (2-D) discrete wavelet transform
The coefficients of the filter are quantized before
(DWT)/inverse DWT (IDWT) has recently been used
hardware implementation; hence, the multiplier can
as a powerful tool for image coding/decoding
be replaced by limited quantity of shift registers and
systems. Two-dimensional DWT/IDWT demands
adders. Thus, the system hardware is saved, and the
massive computations, hence, it requires a parallel
system throughput is improved significantly.
JPEG-2000
standard
uses
the
scalar
and pipelined architecture to perform real-time or
In this paper, we proposed two high-efficiency
on-line video and image coding and decoding, and to
architectures for the even and odd parts of 1-D
implement
application-specific
decimated convolution. The advantages of the
integrated circuits (ASIC) or field programmable gate
proposed architectures are 100% hardware-utilization,
array (FPGA). At the heart of the analysis stage of
multiplierless, regular structure, simple control flow
the system is the DWT. In the synthesis stage, the
and high scalability.
high-efficiency
inverse DWT recovers the original image from the coefficients of DWT. Cohen, Daubechies and Feauveau proposed
The remainder of the paper is organized as follows. Section 2 presents the 2-D discrete wavelet transform algorithm, and derives new mathematical
formulas.
In
Section
3,
the
K −1
high-efficiency
LLj (m, n) = ∑ h(i ) ⋅ LLmj ,i (n)
architecture for the 2-D DWT is proposed. In Section
K −1
4, the high-efficiency and low-power architecture for
LH j (m, n) = ∑ h(i) ⋅ LH mj ,i (n)
the 2-D IDWT is proposed. Section 5 applies the two
quantization
scheme
for
K −1
HLj (m, n) = ∑ g (i ) ⋅ LLmj ,i (n)
VLSI
(7)
i =0
K −1
implementation, and analyzes their performance.
HH j (m, n) = ∑ g (i) ⋅ LH mj ,i (n)
Finally, comparison of performance between the proposed architectures and previous works is made
(6)
i =0
proposed 2-D DWT/IDWT architectures to the coefficient
(5)
i =0
(8)
i =0
where K −1
with conclusions given in Section 6.
LLmj ,i (n) = ∑ h(k ) ⋅ LLj −1 (2m − i,2n − k )
(9)
k =0
2. 2-D Discrete Wavelet Transform Algorithm
K −1
LHmj ,i (n) = ∑ g (k ) ⋅ LLj −1 (2m − i,2n − k )
(10)
k =0
The 2-D DWT is a multilevel decomposition technique. The mathematical formulas of 2-D DWT are defined as follows [5]-[6]:
By dividing each of equations (9) and (10) into two
parts:
even-numbered j m ,i
odd-numbered samples,
LL (n)
samples and
LH
and j m ,i
( n)
K −1 K −1
LLj (m, n) = ∑∑ h(i ) ⋅ h(k ) ⋅ LLj −1 (2m − i, 2n − k ) (1)
can be rewritten as follows.
i =0 k =0
K −1 K −1
LH j (m, n) = ∑∑ h(i) ⋅ g (k ) ⋅ LLj −1 (2m − i, 2n − k )
(2)
LLmj ,i (n) =
i =0 k = 0
K −1 K −1
HL (m, n) = ∑∑ g (i ) ⋅ h(i ) ⋅ LL (2m − i, 2n − k ) (3)
+
K − ⎢⎡K / 2⎥⎤ −1
j −1
j
∑
⎢⎡ K / 2⎥⎤ −1
∑
HH j (m, n) = ∑∑ g (i) ⋅ g (k ) ⋅ LLj −1 (2m − i, 2n − k ) (4)
h(2k + 1) ⋅ LL (2m − i,2n − 2k −1)
k =0
LHmj ,i (n) =
i =0 k =0
where
h(⋅)
and
g (⋅)
are low-pass filter and
high-pass filter, respectively, K is the filter length, j
LL (m, n)
,
j
j
LH (m, n) ,
,
HL (m, n)
and
j
HH (m, n) are coefficients at position (m, n) in
the low-low, low-high, high-low and high-high j
j
j
j
subbands (denoted by LL , LH , HL , and HH , respectively)
of
the
j
th
level
(11) j −1
i =0 k =0
K −1 K −1
h(2k ) ⋅ LLj −1 (2m − i,2n − 2k )
k =0
+
K − ⎢⎡K / 2⎥⎤ −1
∑
⎢⎡K / 2⎥⎤ −1
∑
g(2k ) ⋅ LLj −1 (2m − i,2n − 2k )
k =0
(12) j −1
g (2k +1) ⋅ LL (2m − i,2n − 2k −1)
k =0
The above equations imply that LLmj ,i (n) and LH mj ,i (n) can be computed as sum of two 1-D
convolutions, which are independently obtained from the even part LLj −1 (2m − i, 2n − 2k ) and the odd part LLj −1 (2m − i, 2n − 2k − 1) , respectively.
decomposition,
0 ≤ n, m < N j , and LL (m, n) is the original image 0
of size N 0 × N 0 .
3. The Proposed 2-D DWT Architecture For real-time applications, a parallel and
Note that equations (1) ~ (4) are separable 2-D
pipelined architecture is proposed to implement 2-D
convolutions followed by decimation by 2 in both
DWT. At each decomposition level, there are two
horizontal and vertical directions. In specific,
1-D filtering operations involved: row filtering
j
j
j
coefficients LL (m, n) , LH (m, n) , HL (m, n) , j
and HH (m, n) can be computed as follows.
followed by column filtering or vice versa. After 1-level DWT, the input image of size is decomposed into four subbands (LL, LH, HL and HH) of size. The
LL subband can be further decomposed into LLLL,
and high-pass reconstruction filter coefficients,
LLLH,
LLHL
decomposition
and
LLHH
at
the
second
respectively. The proposed 2-D IDWT architecture
level;
LLLL
can
be
further
comprises an inverse transform module, a RAM
decomposed into LLLLLL, LLLLLH, LLLLHL and
module
LLLLHH at the third decomposition level, and so on.
for 3-level IDWT; 2 clock cycles are allocated for the
Figure 1 shows 1-level 2-D DWT.
first reconstruction level, 8 clock cycles for the
Take the 4-tap Daubechies orthogonal wavelet as an example, Figures 2 and 3 show the horizontal
and a multiplexer. It takes 42 clock cycles
second reconstruction level, and 32 clock cycles for the third reconstruction level.
and vertical filters used for 2-D DWT, respectively. Based on equations (11) and (12), horizontal and vertical filtering can be obtained as sum of two convolutions
performed
on
their
respective
even-numbered samples and odd-numbered samples. Data flows of the vertical and horizontal filters are shown in Table 1 and Table 2, respectively. Figure 4 shows the transform module composed of vertical and horizontal filters of 2-D DWT. Figure 5 depicts the proposed architecture for 2-D DWT, which comprises a memory module, a transform module, a multiplex and an address generator.
Figures 6(a)
and 6(b) show the horizontal and vertical directions of data sampling of the proposed 2-D DWT architecture, respectively. It takes 42 clock cycles for 3-level DWT; 32 clock cycles are allocated for the first decomposition level, 8 clock cycles for the second decomposition level, and 2 clock cycles for the third decomposition level. Note that three transform modules can be cascaded to obtain high-throughput architecture with reconfigurable system.
5. Hardware Implementation and Performance Analyses of the Proposed 2-D DWT/IDWT Architectures The low-pass and high-pass filter coefficients of DWT/IDWT
In the proposed 2-D IDWT architecture, column
first
quantized
and
then
implemented in high-speed computation hardware [7]-[9].
The
quantized
coefficients
of
4-tap
Daubechies wavelet are shown in Table 3. The multiplier is replaced by a carry-save-adder (CSA), an adder and three hardwire shifters in processing element (PE). The shift and adder circuit replacing the multiplier is shown in Figure 10. In the proposed architectures, multiplications are obtained by shifts and additions after approximating the coefficients in booth
binary
recoded
format
(BBRF).
Thus,
multiplier is replaced by a carry-save-adder (CSA), an adder and three hardwire shifters in processing element (PE). The replaced multiplier can reduce power dissipation by m in comparison to the conventional architectures in m-bit operands. The advantages of the proposed 2-D DWT/IDWT architectures
4. The Proposed 2-D IDWT Architecture
are
are
regular
structure,
local
communication and simple control flow. As a result, they are suitable for VLSI implementations and
filtering followed by row filtering is performed at
reconfigurable
each reconstruction level. The proposed vertical and
hardware utilization is 100% and therefore consumes
horizontal filters used for 2-D IDWT with 4-tap
ultra-low power. The total data processing times of
Daubechies wavelet are shown in Figures 7 and 8,
2-D DWT and IDWT are , where .
respectively. Figure 9 shows the inverse transform module, where h (n) and g (n) are the low-pass
filter
lengths.
In
addition,
the
The platform for architecture development and verification had been designed and implemented in
consideration of development cost. The architecture
In this paper, two high-speed and ultra
had been implemented on Xilinx XC2V4000 FPGA
low-power architectures for 2-D DWT and IDWT
emulation board [10]; the chip had been integrated
with single transform modules have been proposed.
with the 8051 microcontroller and interface circuits
Both architectures perform analysis in
(USB 2.0) to form the architecture development and
time and synthesis in . They are significantly faster
verification platform. Figure 11 depicts block
than other commonly used architectures proposed by
diagram
Chakrabati,
of
the
platform,
where
the
8051
Visshwanath,
Wu
and
processing
Chakrabati
microcontroller reads the original image from PC via
[13]-[16]. Tables 1 and 2 depict the comparisons of
DMA channel and writes the reconstructed image
the proposed architecture to other commonly used
back to PC via USB 2.0 bus; the Xilinx XC2V4000
architectures for 2-D DWT and IDWT [13]-[16]. As
FPGA chip is used to implement high-performance
can be seen, the system performances of the two
DWT and IDWT processors. Figure 12 shows the
proposed architectures are significantly better than
architecture development and verification board.
that of previous works. Filter
In this paper, we handle borders by the periodic
coefficients
are
quantized
before
extension method [11]. Hence, the quality of
implementation using 4-tap Daubechies filters. Both
reconstructed images can be improved. Figure 13
hardwares are cost-effective and the systems have
shows the periodic extension at image borders. The
high-speed.
proposed 2-D DWT/IDWT architectures with fixed
dissipation by m compared with conventional
point operations had been applied to the Lena image.
architectures
Figure 14 shows the 3-level decomposition of Lena
utilization).
The
architectures
in
m-bit
reduce
operand
power
(low-power
image. The original and reconstructed Lena images
The proposed architectures have been verified
are shown in Figure 15. The peak-signal-to-noise
by Verilog-HDL [17] and implemented on VLSI [12].
ratio (PSNR) of the reconstructed Lena image is
The advantages of the proposed architectures are
255.5dB. The log-log plot of 2-D DWT/IDWT
100% hardware utilization and ultra low-power. The
computation against various image sizes is shown in
architectures have regular structure, simple control
Figure
flow, high throughput and high scalability. Thus, they
16.
The
proposed
2-D
DWT/IDWT
architectures are suitable for the JPEG 2000 and
are
motion JPEG applications.
coding/decoding systems, such as JPEG-2000 and
The proposed 2-D DWT and IDWT processors
very
suitable
for
new-generation
image
motion-JPEG.
have been synthesized by TSMC 0.18 1P6M CMOS
References
cell libraries [12]. The reported core sizes of 2-D DWT and IDWT processors are 588 × 588μ m 2 and
[1] ITU-T
Recommendation
T.800.
JPEG2000
410 × 406μ m 2 , respectively. The power dissipations
image coding system – Part 1, ITU Std., July
of 2-D DWT and IDWT processor are 28.53 mW and
2002. http://www.itu.int/ITU-T/.
27.10 mW at 1.8V with clock rate of 50MHz. Figures
[2] A. S. Lewis and G. Knowles, VLSI architecture
17 and 18 show the layout view of the implemented
for 2-D Daubechics wavelet transform without
2-D DWT and IDWT core processors.
multipliers, Electron. Lett., Vol. 27, Jan. 1991, pp.171–173.
6. Conclusion
[3] I. Daubechies, Orthonormal Based of Compactly
Supported Wavelets, Communications on Pure
Wavelet Transform for Perfect Reconstruction,
and Applied Mathematics, Vol. 41, 1988, pp.
IEEE Trans. Signal Processing, Vol. 48, No. 5,
909-996.
May 2000, pp. 1365-1378.
[4] I. Daubechies, Ten Lectures on Wavelets, 1st edition, Philadelphia PA: SIAM, 1992.
Technical Data, v.3.2, Taiwan Semiconductor
[5] F. Marino, Two Fast Architectures for the Direct 2-D
Discrete
Wavelet
[12] TSMC 0.18 μm CMOS Design Libraries and
Transform,
Manufacturing Company, Hsinchu, Taiwan.
IEEE
[13] C. Chakrabarti, M. Vishwanath, Efficient
Transactions on Signal Processing, Vol. 49, No.
Realizations of the Discrete and Continuous
6, June 2001, pp. 1248-1259.
Wavelet
[6] M. Antonini, M. Barlaud, P. Mathieu, I.
Transforms:
From
Single
Chip
Implementations to Mappings on SIMD Array
Daubechies, Image Coding Using Wavelet
Computers,
Transform,
Processing, Vol. 43, No. 5, 1995, pp. 759-771.
IEEE
Transactions
on
Image
Processing, Vol.1, No.2, April 1992, pp. 205-220.
IEEE
Transactions
on
Signal
[14] M. Vishwanath, R. M. Owens, M. J. Irwin, VLSI Architectures for the Discrete Wavelet
[7] K. A. Kotteri, A. E. Bell, J. E. Carletta, Design
Transform, IEEE Transactions on Circuits and
of Multiplierless, High-Performance, Wavelet
Systems-II,
Filter
pp.305-316.
Banks
with
Image
Compression
Applications, IEEE Transactions on Circuits and
Vol.
42,
No.
[15] P. –C. Wu, C. –T. Liu,
5,
May
1995,
L. –G. Chen, An
Systems-I, Vol. 51, No. 3, March 2004,
Efficient Architecture for Two-Dimensional
pp.483-494.
Inverse Discrete Wavelet Transform, IEEE
[8] T.-Y. Sung, Y.-S. Shieh, C.-L Chiu, C.-W. Yu, VLSI Architectures for 2-D Forward and Inverse Discrete
Wavelet
Daubechies
2006
Using
on
Circuits
and
Systems, Vol. 2, May 2002, pp. II-312-II-315. [16] C.
Chakrabarti,
C.
Mumford,
Efficient
Realizations of Analysis and Synthesis Filters
Application
Based on the 2-D Discrete Wavelet Transform,
(2006-CECA), Kaoshiung, Taiwan, July 2006,
International Conference on Acoustic, Speech
BO-019.
and
Communication
Conference
4-tap
Symposium
on
Electronic
Filters,
Transform
International
and
[9] T.-Y. Sung, C.-L. Chiu, Y.-S. Shieh, C.-W. Yu,
Signal
Processing,
May
1996,
pp.
3256-3259.
Low-Power Multiplierless 2-D DWT and IDWT
[17] D. E. Thomas, P. H. Moorby, The Verilog
Architectures Using 4-tap Daubechies Filters,
Hardware Description Language, Fifth Edition,
The 7th International Conference on Parallel
Kluwer Academic Pub. 2002.
and Distributed Computing, Applications and Technologies, Taipei, Taiwan, R.O.C., December 2006, pp.185-190. [10] Xilinx FPGA products, http://www.xilinx.com/ products. [11] M. Ferretti, D. Rizzo, Handling Borders in Systolic Architectures for the 1-D Discrete
Table 1 Data flow of the vertical filter of the proposed 2-D DWT architecture Clock 0 1 2 3 …
Part Even Odd Even Odd Even Odd Even Odd … …
Operation h(1)x(0,0),h(3)x(0,0) h(0)x(0,1),h(2)x(0,1) h(1)x(0,2),h(3)x(0,2) h(0)x(0,3),h(2)x(0,3) h(1)x(0,4),h(3)x(0,4) h(0)x(0,5),h(2)x(0,5) h(1)x(0,6),h(3)x(0,6) h(0)x(0,7),h(2)x(0,7) … …
Latch
Output
0
h(0)x(0,1)+h(1)x(0,0)
h(2)x(0,1)+h(3)x(0,0)
h(0)x(0,3)+h(1)x(0,2)+h(2)x(0,1)+h(3)x(0,0)
h(2)x(0,3)+h(3)x(0,2)
h(0)x(0,5)+h(1)x(0,4)+h(2)x(0,3)+h(3)x(0,2)
h(2)x(0,5)+h(3)x(0,4)
h(0)x(0,7)+h(1)x(0,6)+h(2)x(0,5)+h(3)x(0,4)
…
…
Table 2 Data flow of the horizontal filter of the proposed 2-D DWT architecture Clock 0 … N/2 … N … 3N/2 …
Input L(0,0) … L(1,0) … L(2,0) … L(3,0) …
PE1 h(3)L(0,0) … h(2)L(1,0) … h(3)L(2,0) … h(2)L(3,0) …
LD2 -… h(2)L(1,0)+h(3)L(0,0) … -… h(2)L(3,0)+h(3)L(2,0) …
PE0 h(1)L(0,0) … h(0)L(1,0) … h(1)L(2,0) … h(0)L(3,0) …
Output -… h(0)L(1,0)+h(1)L(0,0) … -… h(0)L(3,0)+h(1)L(2,0)+h(2)L(1,0)+h(3)L(0,0) …
Table 3 Quantized coefficients of the 4-tap Daubechies filter (BBRF:Booth binary recoded form) l(n)
h(n)
n
Decimal
0
-0.12939453125
1
0.22412109375
00.00100001001 00.01001001011
2
0.83642578125
3
0.48291015625
BBRF
Decimal
BBRF
-0.48291015625
00.100001001010
0.83642578125
01.001010100010
01.001010100010
-0.22412109375
00.01001001011
00.100001001010
-0.12939453125
00.00100001001
Table 4 Comparison between the proposed architecture and other architectures for 2-D DWT
N2 + N(K + 2) 4 N2 + KN + K 4
2 ≈ 2N 3 2N 2 ≈ 3
Table 5 Comparison between the proposed architecture and other architectures for 2-D IDWT
N2 + N ( K + 2) 4
≈ 2N
2
3
2
N + ( K − 2) N + K 4
≈
2N 2 3
h(3)
H( z ) H(z)
2
2
Inputeven
L
G( z )
x
2
H( z ) G(z)
2
h(1)
LL
2
Inputeven
LH
L
HL
Inputodd
H
G( z ) Horizontal
2
HH
Inputodd h(0)
h(2)
L : Latch
Vertical
Fig. 1. 1-level 2-D DWT
PE1
Output
Fig. 2. The horizontal low-pass filter for 1-D DWT
PE0
In put 0 1 0
0
h (3 ) h (2 )
L
1 0
h (1 ) h (0 )
L
1
1
LD1
LD0
L
Level 3 Level 2
L
Input O utput
LD2
L : L atch
LD
: L in e d elay
Fig. 3(a). The vertical low-pass Filter for 1-D DWT
Output
Level 1 N/8
N/4 N/2
Fig. 3(b). Line delay (LD) for Vertical Filter in 1-D DWT
0 1 L
0
0
h(3) h(2)
1 L
0 1
h(3)
x even
1
LD 1
h(1)
h(1) h(0)
LD0
x even L
L
L
x odd
L
LL
LD 2
x odd h(0)
h(2)
0 1 L
0
0
g(3) g(2)
1 L
0 1
1
LD 1
LD0
L
L
LH
LD 2
0 1
0
h(3) h(2)
L
0
1
x even
1
LD 1
g(1)
h(1) h(0)
L
0 1
g(3)
g(1) g(0)
LD 0
x even H
L x odd
L
L
g(2)
HL
LD 2
x odd g(0) 0
1 L
0
0
g(3) g(2)
1
g(1) g(0)
Address generator
L
0 1
Memory module N N × 2 2
1
LD 1
LD 0
(LL)j-1LL
Transform
(LL)j-1LH
module
(LL)j-1HL
Input L
L
L : Latch
LD
(LL)j-1HH
HH
LD 2
: Line delay
Fig. 4. The proposed architecture of the transform module for 2-D DWT
Fig. 5. The proposed architecture of the
single transform module for 2-D DWT
L(0,0)
L(0,1)
L(0,2)
L(0,3)
H(0,0) H(0,1) H(0,2) H(0,3)
L(1,0)
L(1,1)
L(1,2)
L(1,3)
H(1,0) H(1,1) H(1,2) H(1,3)
L(2,0)
L(2,1)
L(2,2)
L(2,3)
H(2,0) H(2,1) H(2,2) H(2,3)
L(3,0)
L(3,1)
L(3,2)
L(3,3)
H(3,0) H(3,1) H(3,2) H(3,3)
L(4,0)
L(4,1)
L(4,2)
L(4,3)
H(4,0) H(4,1) H(4,2) H(4,3)
L(5,0)
L(5,1)
L(5,2)
L(5,3)
H(5,0) H(5,1) H(5,2) H(5,3)
L(6,0)
L(6,1)
L(6,2)
L(6,3)
H(6,0) H(6,1) H(6,2) H(6,3)
L(7,0)
L(7,1)
L(7,2)
L(7,3)
H(7,0) H(7,1) H(7,2) H(7,3)
x(0,0) x(0,1) x(0,2) x(0,3) x(0,4) x(0,5) x(0,6) x(0,7) x(1,0) x(1,1) x(1,2) x(1,3) x(1,4) x(1,5) x(1,6) x(1,7) x(2,0) x(2,1) x(2,2) x(2,3) x(2,4) x(2,5) x(2,6) x(2,7) x(3,0) x(3,1) x(3,2) x(3,3) x(3,4) x(3,5) x(3,6) x(3,7) x(4,0) x(4,1) x(4,2) x(4,3) x(4,4) x(4,5) x(4,6) x(4,7) x(5,0) x(5,1) x(5,2) x(5,3) x(5,4) x(5,5) x(5,6) x(5,7) x(6,0) x(6,1) x(6,2) x(6,3) x(6,4) x(6,5) x(6,6) x(6,7) x(7,0) x(7,1) x(7,2) x(7,3) x(7,4) x(7,5) x(7,6) x(7,7)
Fig. 6. (a) The horizontal direction and architecture
(b) vertical direction of the proposed 2-D DWT
Even part
Even output
L1
LD ~ 0 h (0) ~ 1 h (1)
~ h ( 2)
~ h ( 0)
Input
f
Input
L2
~ 0 h (2) ~
1h (3) 2f
~ h (1)
Output
~ h (3) LD
: Line delay
Odd output Odd part
: Latch
L
Fig. 8. The vertical low-pass filter for 1-D IDWT
Fig. 7. The horizontal low-pass filter for 1-D IDWT LL
f
LD ~ 0 h (0) ~
1 h (1)
~ 0 h ( 2)
Even part
~
1 h (3) 2f
LH
f
~ h (0)
LD ~ (0) 0 g 1
g~ (1)
xeven
L
~ 0 g ( 2) ~ g (3)
L
f
LD ~ 0 h (0) ~ h 1 (1)
L
~ h (1)
1
2f
HL
~ h ( 2)
~ h (3)
Odd part ~ 0 h ( 2)
Even part
~
1 h (3) 2f
g~ (0)
HH
f
LD ~ (0) 0 g
~ 0 g ( 2)
g~ (1)
g~ (3)
1
1
H
L
g~ ( 2)
L
g~ (1)
2f
g~ (3)
xodd
Odd part LD
: Line delay
L
: Latch
Fig. 9. The proposed architecture of the transform module for 2-D IDWT
x(n)
Shifter
Shifter
Shifter
(3,2) CSA
Output Fig. 10. The shift and adder circuits replacing the multiplier
The architecture development and verification board Finish Initiate Ready
PC
U S B
U S B
8051 Micro-contr oller
R/W R/W
Xilinx XC2V6000 FPGA
SRAM
PCI Bus Address Bus
Data Bus
Download architecture configuration
Fig. 11. The block diagram of the architecture development and verification platform for forward and inverse DWT
Xilinx XC2V6000 FPGA
PCI Bus SRAM Extension Socket
USB 2.0
8051 Microcontroller
Fig. 12. The architecture development and verification board
L(0,0)
L(0,3) h(3) h(2) h(1) h(0)
h(3) h(2) h(1) h(0)
x(0,0) x(0,1) x(0,2) x(0,3) x(0,4) x(0,5) x(0,6) x(0,7)
g(3) g(2) g(1) g(0)
x(0,0) x(0,1)
g(3) g(2) g(1) g(0)
H(0,0)
H(0,3)
Fig. 13. The proposed periodic extension method
(a)
Leve1 3
Leve1 2
Leve1 1
(b) Fig. 15. (a) Original and (b) reconstructed Lena image
Fig. 14. The 3-level decomposition of Lena image
Fig. 17. Layout view of the implemented 2-D DWT core processor
Fig. 16. Log-log plot of the 2-D DWT/IDWT computations Fig. 18. Layout view of the implemented 2-D IDWT core processor against different image width for each algorithm