Design of Multiplierless and High-Performance

0 downloads 0 Views 547KB Size Report
Even h(1)x(0,6),h(3)x(0,6). 3. Odd h(0)x(0,7),h(2)x(0,7) h(2)x(0,5)+h(3)x(0,4) h(0)x(0,7)+h(1)x(0,6)+h(2)x(0,5)+h(3)x(0,4) … … … … … … … Table 2 Data flow of ...
Design of Multiplierless and High-Performance DWT and IDWT Architectures Using 4-tap Daubechies Filters Tze-Yun Sung (宋志雲) Department of Microelectronics Engineering, Chung Hua University No.707, Sec. 2, Wufu Rd., Hsinchu, Taiwan, 30012, R.O.C. Tel: 03-518-6387, Fax: 03-518-6891 Email: [email protected] Abstract This paper proposes two architectures of 2-D discrete wavelet transform (DWT) and inverse DWT (IDWT). The first high-efficiency architecture comprises a transform module, an address sequencer, and a RAM module. The transform module has uniform and regular structure, simple control flow, and local communication. The significant advantages of the single transform module are full hardware-utilization and low-power. The second architecture features parallel and pipelined computation and high throughput. Both architectures are very suitable for VLSI implementation of new-generation image coding/decoding systems, such as JPEG-2000 and motion-JPEG. In the realization of 2-D DWT/IDWT, we focus on a VLSI implementation using 4-tap Daubechies filters, which saves power and reduces chip area. Keywords: DWT/IDWT, low-power, image coding/decoding system, JPEG-2000, VLSI, 4-tap Daubechies filters, multiplierless.

1. Introduction

using 4-tap Daubechies coefficients for lossy analysis

In the field of digital image processing, the

[1]. The symmetry of 4-tap Daubechies filters and the

wavelet

fact that they are almost orthogonal [2]–[4] make

transform for image compression [1]; hence, the

them good candidates for image analysis application.

two-dimensional (2-D) discrete wavelet transform

The coefficients of the filter are quantized before

(DWT)/inverse DWT (IDWT) has recently been used

hardware implementation; hence, the multiplier can

as a powerful tool for image coding/decoding

be replaced by limited quantity of shift registers and

systems. Two-dimensional DWT/IDWT demands

adders. Thus, the system hardware is saved, and the

massive computations, hence, it requires a parallel

system throughput is improved significantly.

JPEG-2000

standard

uses

the

scalar

and pipelined architecture to perform real-time or

In this paper, we proposed two high-efficiency

on-line video and image coding and decoding, and to

architectures for the even and odd parts of 1-D

implement

application-specific

decimated convolution. The advantages of the

integrated circuits (ASIC) or field programmable gate

proposed architectures are 100% hardware-utilization,

array (FPGA). At the heart of the analysis stage of

multiplierless, regular structure, simple control flow

the system is the DWT. In the synthesis stage, the

and high scalability.

high-efficiency

inverse DWT recovers the original image from the coefficients of DWT. Cohen, Daubechies and Feauveau proposed

The remainder of the paper is organized as follows. Section 2 presents the 2-D discrete wavelet transform algorithm, and derives new mathematical

formulas.

In

Section

3,

the

K −1

high-efficiency

LLj (m, n) = ∑ h(i ) ⋅ LLmj ,i (n)

architecture for the 2-D DWT is proposed. In Section

K −1

4, the high-efficiency and low-power architecture for

LH j (m, n) = ∑ h(i) ⋅ LH mj ,i (n)

the 2-D IDWT is proposed. Section 5 applies the two

quantization

scheme

for

K −1

HLj (m, n) = ∑ g (i ) ⋅ LLmj ,i (n)

VLSI

(7)

i =0

K −1

implementation, and analyzes their performance.

HH j (m, n) = ∑ g (i) ⋅ LH mj ,i (n)

Finally, comparison of performance between the proposed architectures and previous works is made

(6)

i =0

proposed 2-D DWT/IDWT architectures to the coefficient

(5)

i =0

(8)

i =0

where K −1

with conclusions given in Section 6.

LLmj ,i (n) = ∑ h(k ) ⋅ LLj −1 (2m − i,2n − k )

(9)

k =0

2. 2-D Discrete Wavelet Transform Algorithm

K −1

LHmj ,i (n) = ∑ g (k ) ⋅ LLj −1 (2m − i,2n − k )

(10)

k =0

The 2-D DWT is a multilevel decomposition technique. The mathematical formulas of 2-D DWT are defined as follows [5]-[6]:

By dividing each of equations (9) and (10) into two

parts:

even-numbered j m ,i

odd-numbered samples,

LL (n)

samples and

LH

and j m ,i

( n)

K −1 K −1

LLj (m, n) = ∑∑ h(i ) ⋅ h(k ) ⋅ LLj −1 (2m − i, 2n − k ) (1)

can be rewritten as follows.

i =0 k =0

K −1 K −1

LH j (m, n) = ∑∑ h(i) ⋅ g (k ) ⋅ LLj −1 (2m − i, 2n − k )

(2)

LLmj ,i (n) =

i =0 k = 0

K −1 K −1

HL (m, n) = ∑∑ g (i ) ⋅ h(i ) ⋅ LL (2m − i, 2n − k ) (3)

+

K − ⎢⎡K / 2⎥⎤ −1

j −1

j



⎢⎡ K / 2⎥⎤ −1



HH j (m, n) = ∑∑ g (i) ⋅ g (k ) ⋅ LLj −1 (2m − i, 2n − k ) (4)

h(2k + 1) ⋅ LL (2m − i,2n − 2k −1)

k =0

LHmj ,i (n) =

i =0 k =0

where

h(⋅)

and

g (⋅)

are low-pass filter and

high-pass filter, respectively, K is the filter length, j

LL (m, n)

,

j

j

LH (m, n) ,

,

HL (m, n)

and

j

HH (m, n) are coefficients at position (m, n) in

the low-low, low-high, high-low and high-high j

j

j

j

subbands (denoted by LL , LH , HL , and HH , respectively)

of

the

j

th

level

(11) j −1

i =0 k =0

K −1 K −1

h(2k ) ⋅ LLj −1 (2m − i,2n − 2k )

k =0

+

K − ⎢⎡K / 2⎥⎤ −1



⎢⎡K / 2⎥⎤ −1



g(2k ) ⋅ LLj −1 (2m − i,2n − 2k )

k =0

(12) j −1

g (2k +1) ⋅ LL (2m − i,2n − 2k −1)

k =0

The above equations imply that LLmj ,i (n) and LH mj ,i (n) can be computed as sum of two 1-D

convolutions, which are independently obtained from the even part LLj −1 (2m − i, 2n − 2k ) and the odd part LLj −1 (2m − i, 2n − 2k − 1) , respectively.

decomposition,

0 ≤ n, m < N j , and LL (m, n) is the original image 0

of size N 0 × N 0 .

3. The Proposed 2-D DWT Architecture For real-time applications, a parallel and

Note that equations (1) ~ (4) are separable 2-D

pipelined architecture is proposed to implement 2-D

convolutions followed by decimation by 2 in both

DWT. At each decomposition level, there are two

horizontal and vertical directions. In specific,

1-D filtering operations involved: row filtering

j

j

j

coefficients LL (m, n) , LH (m, n) , HL (m, n) , j

and HH (m, n) can be computed as follows.

followed by column filtering or vice versa. After 1-level DWT, the input image of size is decomposed into four subbands (LL, LH, HL and HH) of size. The

LL subband can be further decomposed into LLLL,

and high-pass reconstruction filter coefficients,

LLLH,

LLHL

decomposition

and

LLHH

at

the

second

respectively. The proposed 2-D IDWT architecture

level;

LLLL

can

be

further

comprises an inverse transform module, a RAM

decomposed into LLLLLL, LLLLLH, LLLLHL and

module

LLLLHH at the third decomposition level, and so on.

for 3-level IDWT; 2 clock cycles are allocated for the

Figure 1 shows 1-level 2-D DWT.

first reconstruction level, 8 clock cycles for the

Take the 4-tap Daubechies orthogonal wavelet as an example, Figures 2 and 3 show the horizontal

and a multiplexer. It takes 42 clock cycles

second reconstruction level, and 32 clock cycles for the third reconstruction level.

and vertical filters used for 2-D DWT, respectively. Based on equations (11) and (12), horizontal and vertical filtering can be obtained as sum of two convolutions

performed

on

their

respective

even-numbered samples and odd-numbered samples. Data flows of the vertical and horizontal filters are shown in Table 1 and Table 2, respectively. Figure 4 shows the transform module composed of vertical and horizontal filters of 2-D DWT. Figure 5 depicts the proposed architecture for 2-D DWT, which comprises a memory module, a transform module, a multiplex and an address generator.

Figures 6(a)

and 6(b) show the horizontal and vertical directions of data sampling of the proposed 2-D DWT architecture, respectively. It takes 42 clock cycles for 3-level DWT; 32 clock cycles are allocated for the first decomposition level, 8 clock cycles for the second decomposition level, and 2 clock cycles for the third decomposition level. Note that three transform modules can be cascaded to obtain high-throughput architecture with reconfigurable system.

5. Hardware Implementation and Performance Analyses of the Proposed 2-D DWT/IDWT Architectures The low-pass and high-pass filter coefficients of DWT/IDWT

In the proposed 2-D IDWT architecture, column

first

quantized

and

then

implemented in high-speed computation hardware [7]-[9].

The

quantized

coefficients

of

4-tap

Daubechies wavelet are shown in Table 3. The multiplier is replaced by a carry-save-adder (CSA), an adder and three hardwire shifters in processing element (PE). The shift and adder circuit replacing the multiplier is shown in Figure 10. In the proposed architectures, multiplications are obtained by shifts and additions after approximating the coefficients in booth

binary

recoded

format

(BBRF).

Thus,

multiplier is replaced by a carry-save-adder (CSA), an adder and three hardwire shifters in processing element (PE). The replaced multiplier can reduce power dissipation by m in comparison to the conventional architectures in m-bit operands. The advantages of the proposed 2-D DWT/IDWT architectures

4. The Proposed 2-D IDWT Architecture

are

are

regular

structure,

local

communication and simple control flow. As a result, they are suitable for VLSI implementations and

filtering followed by row filtering is performed at

reconfigurable

each reconstruction level. The proposed vertical and

hardware utilization is 100% and therefore consumes

horizontal filters used for 2-D IDWT with 4-tap

ultra-low power. The total data processing times of

Daubechies wavelet are shown in Figures 7 and 8,

2-D DWT and IDWT are , where .

respectively. Figure 9 shows the inverse transform module, where h (n) and g (n) are the low-pass

filter

lengths.

In

addition,

the

The platform for architecture development and verification had been designed and implemented in

consideration of development cost. The architecture

In this paper, two high-speed and ultra

had been implemented on Xilinx XC2V4000 FPGA

low-power architectures for 2-D DWT and IDWT

emulation board [10]; the chip had been integrated

with single transform modules have been proposed.

with the 8051 microcontroller and interface circuits

Both architectures perform analysis in

(USB 2.0) to form the architecture development and

time and synthesis in . They are significantly faster

verification platform. Figure 11 depicts block

than other commonly used architectures proposed by

diagram

Chakrabati,

of

the

platform,

where

the

8051

Visshwanath,

Wu

and

processing

Chakrabati

microcontroller reads the original image from PC via

[13]-[16]. Tables 1 and 2 depict the comparisons of

DMA channel and writes the reconstructed image

the proposed architecture to other commonly used

back to PC via USB 2.0 bus; the Xilinx XC2V4000

architectures for 2-D DWT and IDWT [13]-[16]. As

FPGA chip is used to implement high-performance

can be seen, the system performances of the two

DWT and IDWT processors. Figure 12 shows the

proposed architectures are significantly better than

architecture development and verification board.

that of previous works. Filter

In this paper, we handle borders by the periodic

coefficients

are

quantized

before

extension method [11]. Hence, the quality of

implementation using 4-tap Daubechies filters. Both

reconstructed images can be improved. Figure 13

hardwares are cost-effective and the systems have

shows the periodic extension at image borders. The

high-speed.

proposed 2-D DWT/IDWT architectures with fixed

dissipation by m compared with conventional

point operations had been applied to the Lena image.

architectures

Figure 14 shows the 3-level decomposition of Lena

utilization).

The

architectures

in

m-bit

reduce

operand

power

(low-power

image. The original and reconstructed Lena images

The proposed architectures have been verified

are shown in Figure 15. The peak-signal-to-noise

by Verilog-HDL [17] and implemented on VLSI [12].

ratio (PSNR) of the reconstructed Lena image is

The advantages of the proposed architectures are

255.5dB. The log-log plot of 2-D DWT/IDWT

100% hardware utilization and ultra low-power. The

computation against various image sizes is shown in

architectures have regular structure, simple control

Figure

flow, high throughput and high scalability. Thus, they

16.

The

proposed

2-D

DWT/IDWT

architectures are suitable for the JPEG 2000 and

are

motion JPEG applications.

coding/decoding systems, such as JPEG-2000 and

The proposed 2-D DWT and IDWT processors

very

suitable

for

new-generation

image

motion-JPEG.

have been synthesized by TSMC 0.18 1P6M CMOS

References

cell libraries [12]. The reported core sizes of 2-D DWT and IDWT processors are 588 × 588μ m 2 and

[1] ITU-T

Recommendation

T.800.

JPEG2000

410 × 406μ m 2 , respectively. The power dissipations

image coding system – Part 1, ITU Std., July

of 2-D DWT and IDWT processor are 28.53 mW and

2002. http://www.itu.int/ITU-T/.

27.10 mW at 1.8V with clock rate of 50MHz. Figures

[2] A. S. Lewis and G. Knowles, VLSI architecture

17 and 18 show the layout view of the implemented

for 2-D Daubechics wavelet transform without

2-D DWT and IDWT core processors.

multipliers, Electron. Lett., Vol. 27, Jan. 1991, pp.171–173.

6. Conclusion

[3] I. Daubechies, Orthonormal Based of Compactly

Supported Wavelets, Communications on Pure

Wavelet Transform for Perfect Reconstruction,

and Applied Mathematics, Vol. 41, 1988, pp.

IEEE Trans. Signal Processing, Vol. 48, No. 5,

909-996.

May 2000, pp. 1365-1378.

[4] I. Daubechies, Ten Lectures on Wavelets, 1st edition, Philadelphia PA: SIAM, 1992.

Technical Data, v.3.2, Taiwan Semiconductor

[5] F. Marino, Two Fast Architectures for the Direct 2-D

Discrete

Wavelet

[12] TSMC 0.18 μm CMOS Design Libraries and

Transform,

Manufacturing Company, Hsinchu, Taiwan.

IEEE

[13] C. Chakrabarti, M. Vishwanath, Efficient

Transactions on Signal Processing, Vol. 49, No.

Realizations of the Discrete and Continuous

6, June 2001, pp. 1248-1259.

Wavelet

[6] M. Antonini, M. Barlaud, P. Mathieu, I.

Transforms:

From

Single

Chip

Implementations to Mappings on SIMD Array

Daubechies, Image Coding Using Wavelet

Computers,

Transform,

Processing, Vol. 43, No. 5, 1995, pp. 759-771.

IEEE

Transactions

on

Image

Processing, Vol.1, No.2, April 1992, pp. 205-220.

IEEE

Transactions

on

Signal

[14] M. Vishwanath, R. M. Owens, M. J. Irwin, VLSI Architectures for the Discrete Wavelet

[7] K. A. Kotteri, A. E. Bell, J. E. Carletta, Design

Transform, IEEE Transactions on Circuits and

of Multiplierless, High-Performance, Wavelet

Systems-II,

Filter

pp.305-316.

Banks

with

Image

Compression

Applications, IEEE Transactions on Circuits and

Vol.

42,

No.

[15] P. –C. Wu, C. –T. Liu,

5,

May

1995,

L. –G. Chen, An

Systems-I, Vol. 51, No. 3, March 2004,

Efficient Architecture for Two-Dimensional

pp.483-494.

Inverse Discrete Wavelet Transform, IEEE

[8] T.-Y. Sung, Y.-S. Shieh, C.-L Chiu, C.-W. Yu, VLSI Architectures for 2-D Forward and Inverse Discrete

Wavelet

Daubechies

2006

Using

on

Circuits

and

Systems, Vol. 2, May 2002, pp. II-312-II-315. [16] C.

Chakrabarti,

C.

Mumford,

Efficient

Realizations of Analysis and Synthesis Filters

Application

Based on the 2-D Discrete Wavelet Transform,

(2006-CECA), Kaoshiung, Taiwan, July 2006,

International Conference on Acoustic, Speech

BO-019.

and

Communication

Conference

4-tap

Symposium

on

Electronic

Filters,

Transform

International

and

[9] T.-Y. Sung, C.-L. Chiu, Y.-S. Shieh, C.-W. Yu,

Signal

Processing,

May

1996,

pp.

3256-3259.

Low-Power Multiplierless 2-D DWT and IDWT

[17] D. E. Thomas, P. H. Moorby, The Verilog

Architectures Using 4-tap Daubechies Filters,

Hardware Description Language, Fifth Edition,

The 7th International Conference on Parallel

Kluwer Academic Pub. 2002.

and Distributed Computing, Applications and Technologies, Taipei, Taiwan, R.O.C., December 2006, pp.185-190. [10] Xilinx FPGA products, http://www.xilinx.com/ products. [11] M. Ferretti, D. Rizzo, Handling Borders in Systolic Architectures for the 1-D Discrete

Table 1 Data flow of the vertical filter of the proposed 2-D DWT architecture Clock 0 1 2 3 …

Part Even Odd Even Odd Even Odd Even Odd … …

Operation h(1)x(0,0),h(3)x(0,0) h(0)x(0,1),h(2)x(0,1) h(1)x(0,2),h(3)x(0,2) h(0)x(0,3),h(2)x(0,3) h(1)x(0,4),h(3)x(0,4) h(0)x(0,5),h(2)x(0,5) h(1)x(0,6),h(3)x(0,6) h(0)x(0,7),h(2)x(0,7) … …

Latch

Output

0

h(0)x(0,1)+h(1)x(0,0)

h(2)x(0,1)+h(3)x(0,0)

h(0)x(0,3)+h(1)x(0,2)+h(2)x(0,1)+h(3)x(0,0)

h(2)x(0,3)+h(3)x(0,2)

h(0)x(0,5)+h(1)x(0,4)+h(2)x(0,3)+h(3)x(0,2)

h(2)x(0,5)+h(3)x(0,4)

h(0)x(0,7)+h(1)x(0,6)+h(2)x(0,5)+h(3)x(0,4)





Table 2 Data flow of the horizontal filter of the proposed 2-D DWT architecture Clock 0 … N/2 … N … 3N/2 …

Input L(0,0) … L(1,0) … L(2,0) … L(3,0) …

PE1 h(3)L(0,0) … h(2)L(1,0) … h(3)L(2,0) … h(2)L(3,0) …

LD2 -… h(2)L(1,0)+h(3)L(0,0) … -… h(2)L(3,0)+h(3)L(2,0) …

PE0 h(1)L(0,0) … h(0)L(1,0) … h(1)L(2,0) … h(0)L(3,0) …

Output -… h(0)L(1,0)+h(1)L(0,0) … -… h(0)L(3,0)+h(1)L(2,0)+h(2)L(1,0)+h(3)L(0,0) …

Table 3 Quantized coefficients of the 4-tap Daubechies filter (BBRF:Booth binary recoded form) l(n)

h(n)

n

Decimal

0

-0.12939453125

1

0.22412109375

   00.00100001001  00.01001001011

2

0.83642578125

3

0.48291015625

BBRF

Decimal

BBRF

-0.48291015625

  00.100001001010

0.83642578125

   01.001010100010

   01.001010100010

-0.22412109375

   00.01001001011

  00.100001001010

-0.12939453125

   00.00100001001

Table 4 Comparison between the proposed architecture and other architectures for 2-D DWT

N2 + N(K + 2) 4 N2 + KN + K 4

2 ≈ 2N 3 2N 2 ≈ 3

Table 5 Comparison between the proposed architecture and other architectures for 2-D IDWT

N2 + N ( K + 2) 4

≈ 2N

2

3

2

N + ( K − 2) N + K 4



2N 2 3

h(3)

H( z ) H(z)

2

2

Inputeven

L

G( z )

x

2

H( z ) G(z)

2

h(1)

LL

2

Inputeven

LH

L

HL

Inputodd

H

G( z ) Horizontal

2

HH

Inputodd h(0)

h(2)

L : Latch

Vertical

Fig. 1. 1-level 2-D DWT

PE1

Output

Fig. 2. The horizontal low-pass filter for 1-D DWT

PE0

In put 0 1 0

0

h (3 ) h (2 )

L

1 0

h (1 ) h (0 )

L

1

1

LD1

LD0

L

Level 3 Level 2

L

Input O utput

LD2

L : L atch

LD

: L in e d elay

Fig. 3(a). The vertical low-pass Filter for 1-D DWT

Output

Level 1 N/8

N/4 N/2

Fig. 3(b). Line delay (LD) for Vertical Filter in 1-D DWT

0 1 L

0

0

h(3) h(2)

1 L

0 1

h(3)

x even

1

LD 1

h(1)

h(1) h(0)

LD0

x even L

L

L

x odd

L

LL

LD 2

x odd h(0)

h(2)

0 1 L

0

0

g(3) g(2)

1 L

0 1

1

LD 1

LD0

L

L

LH

LD 2

0 1

0

h(3) h(2)

L

0

1

x even

1

LD 1

g(1)

h(1) h(0)

L

0 1

g(3)

g(1) g(0)

LD 0

x even H

L x odd

L

L

g(2)

HL

LD 2

x odd g(0) 0

1 L

0

0

g(3) g(2)

1

g(1) g(0)

Address generator

L

0 1

Memory module N N × 2 2

1

LD 1

LD 0

(LL)j-1LL

Transform

(LL)j-1LH

module

(LL)j-1HL

Input L

L

L : Latch

LD

(LL)j-1HH

HH

LD 2

: Line delay

Fig. 4. The proposed architecture of the transform module for 2-D DWT

Fig. 5. The proposed architecture of the

single transform module for 2-D DWT

L(0,0)

L(0,1)

L(0,2)

L(0,3)

H(0,0) H(0,1) H(0,2) H(0,3)

L(1,0)

L(1,1)

L(1,2)

L(1,3)

H(1,0) H(1,1) H(1,2) H(1,3)

L(2,0)

L(2,1)

L(2,2)

L(2,3)

H(2,0) H(2,1) H(2,2) H(2,3)

L(3,0)

L(3,1)

L(3,2)

L(3,3)

H(3,0) H(3,1) H(3,2) H(3,3)

L(4,0)

L(4,1)

L(4,2)

L(4,3)

H(4,0) H(4,1) H(4,2) H(4,3)

L(5,0)

L(5,1)

L(5,2)

L(5,3)

H(5,0) H(5,1) H(5,2) H(5,3)

L(6,0)

L(6,1)

L(6,2)

L(6,3)

H(6,0) H(6,1) H(6,2) H(6,3)

L(7,0)

L(7,1)

L(7,2)

L(7,3)

H(7,0) H(7,1) H(7,2) H(7,3)

x(0,0) x(0,1) x(0,2) x(0,3) x(0,4) x(0,5) x(0,6) x(0,7) x(1,0) x(1,1) x(1,2) x(1,3) x(1,4) x(1,5) x(1,6) x(1,7) x(2,0) x(2,1) x(2,2) x(2,3) x(2,4) x(2,5) x(2,6) x(2,7) x(3,0) x(3,1) x(3,2) x(3,3) x(3,4) x(3,5) x(3,6) x(3,7) x(4,0) x(4,1) x(4,2) x(4,3) x(4,4) x(4,5) x(4,6) x(4,7) x(5,0) x(5,1) x(5,2) x(5,3) x(5,4) x(5,5) x(5,6) x(5,7) x(6,0) x(6,1) x(6,2) x(6,3) x(6,4) x(6,5) x(6,6) x(6,7) x(7,0) x(7,1) x(7,2) x(7,3) x(7,4) x(7,5) x(7,6) x(7,7)

Fig. 6. (a) The horizontal direction and architecture

(b) vertical direction of the proposed 2-D DWT

Even part

Even output

L1

LD ~ 0 h (0) ~ 1 h (1)

~ h ( 2)

~ h ( 0)

Input

f

Input

L2

~ 0 h (2) ~

1h (3) 2f

~ h (1)

Output

~ h (3) LD

: Line delay

Odd output Odd part

: Latch

L

Fig. 8. The vertical low-pass filter for 1-D IDWT

Fig. 7. The horizontal low-pass filter for 1-D IDWT LL

f

LD ~ 0 h (0) ~

1 h (1)

~ 0 h ( 2)

Even part

~

1 h (3) 2f

LH

f

~ h (0)

LD ~ (0) 0 g 1

g~ (1)

xeven

L

~ 0 g ( 2) ~ g (3)

L

f

LD ~ 0 h (0) ~ h 1 (1)

L

~ h (1)

1

2f

HL

~ h ( 2)

~ h (3)

Odd part ~ 0 h ( 2)

Even part

~

1 h (3) 2f

g~ (0)

HH

f

LD ~ (0) 0 g

~ 0 g ( 2)

g~ (1)

g~ (3)

1

1

H

L

g~ ( 2)

L

g~ (1)

2f

g~ (3)

xodd

Odd part LD

: Line delay

L

: Latch

Fig. 9. The proposed architecture of the transform module for 2-D IDWT

x(n)

Shifter

Shifter

Shifter

(3,2) CSA

Output Fig. 10. The shift and adder circuits replacing the multiplier

The architecture development and verification board Finish Initiate Ready

PC

U S B

U S B

8051 Micro-contr oller

R/W R/W

Xilinx XC2V6000 FPGA

SRAM

PCI Bus Address Bus

Data Bus

Download architecture configuration

Fig. 11. The block diagram of the architecture development and verification platform for forward and inverse DWT

Xilinx XC2V6000 FPGA

PCI Bus SRAM Extension Socket

USB 2.0

8051 Microcontroller

Fig. 12. The architecture development and verification board

L(0,0)

L(0,3) h(3) h(2) h(1) h(0)

h(3) h(2) h(1) h(0)

x(0,0) x(0,1) x(0,2) x(0,3) x(0,4) x(0,5) x(0,6) x(0,7)

g(3) g(2) g(1) g(0)

x(0,0) x(0,1)

g(3) g(2) g(1) g(0)

H(0,0)

H(0,3)

Fig. 13. The proposed periodic extension method

(a)

Leve1 3

Leve1 2

Leve1 1

(b) Fig. 15. (a) Original and (b) reconstructed Lena image

Fig. 14. The 3-level decomposition of Lena image

Fig. 17. Layout view of the implemented 2-D DWT core processor

Fig. 16. Log-log plot of the 2-D DWT/IDWT computations Fig. 18. Layout view of the implemented 2-D IDWT core processor against different image width for each algorithm

Suggest Documents