Improved Reversible Integer Transform

Soo-Chang Pei and Jian-Jiun Ding
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan, 10617, R.O.C.

Abstract-Integer transforms are discrete transforms whose entries are summations of $2^{-k}$. If the input of an integer transform can be perfectly recovered from its output, we call it a reversible integer transform. In 2001, Hao and Shi developed an algorithm that can convert any reversible non-integer transform into a reversible integer transform. In this paper, we improve their work. First, we simplify the derivation. Then, we analyze the approximation error and introduce a way to reduce it. We also discuss the problem of bit constraint and how to reduce the number of time cycles in implementation.

0-7803-9390-2/06/$20.00 ©2006 IEEE

I. INTRODUCTION

For a discrete linear transform, if both the forward and the inverse transform matrices (denoted by $B$ and $\tilde{B}$) are binary-valued matrices:

$$B[m,n] = \pm\sum_{k} b_{m,n,k}\,2^{-k}, \qquad \tilde{B}[m,n] = \pm\sum_{k} \tilde{b}_{m,n,k}\,2^{-k}, \tag{1}$$

where the $k$'s are integers and $b_{m,n,k}, \tilde{b}_{m,n,k} = 0, 1, j$, or $1 \pm j$, we call it an integer transform. By contrast, if the entries of $B$ and $B^{-1}$ cannot be expressed as in (1), the transform is a non-integer transform. For example, the Walsh transform is an integer transform. The Karhunen-Loeve transform (KLT), the DCT, and the DFT are non-integer transforms.

The advantage of integer transforms is that they can be implemented by fixed-point processors, and no floating-point processor is required. Since the computation time of a fixed-point processor is much shorter, integer transforms are usually very efficient.

Unfortunately, most discrete linear transforms are non-integer ones. If we want to use fixed-point processors to implement them, we should round them into integer transforms, i.e.,

$$B = 2^{-q}\,\mathrm{round}(2^{q}A), \qquad \tilde{B} = 2^{-q}\,\mathrm{round}(2^{q}A^{-1}), \tag{2}$$

where $A$ and $A^{-1}$ are the forward and inverse transform matrices of the original non-integer transform. However, after rounding, the reversibility property is generally lost (i.e., $\tilde{B}B \neq I$).

In recent years, several algorithms have been developed for converting a non-integer transform into an integer transform while preserving the reversibility property. In [1], Cham derived the integer DCT that approximates the original non-integer DCT. His method is based on generating a prototype matrix and using the orthogonality and symmetry constraints to solve for the unknowns of the prototype matrix. Then, the lifting scheme was developed. It is more flexible and easier to design. In [2][3][4], the $2^k$-point integer discrete wavelet, cosine, and Fourier transforms were derived successfully by the lifting scheme. In 2001, Hao and Shi generalized the lifting scheme [5][6]. With their algorithm, we can successfully convert any reversible non-integer transform into a reversible integer transform, and the number of points of the original transform is no longer constrained to be a power of two. It is a great milestone in the development of the integer transform.

In this paper, we further improve Hao and Shi's algorithm. We want the integer transform to satisfy the following five goals at the same time:

(Goal I, binary entries): Both the forward transform $B$ and the inverse transform $\tilde{B}$ are binary-valued matrices.
(Goal II, reversibility): The integer transform is perfectly recoverable, i.e., $\tilde{B}B = I$ ($\tilde{B} = B^{-1}$).
(Goal III, easy to design).
(Goal IV, high accuracy): If $y$ and $z$ are the results of the original non-integer transform and the integer transform, respectively, then $z \approx \sigma y$, where $\sigma$ is some constant.
(Goal V, no increase in implementation complexity): The complexity includes the hardware cost and the number of time cycles.

Goals I and II were achieved by Hao and Shi. In this paper, we try to achieve Goals III, IV, and V, which are related to the quality and the practicality of integer transforms. First, in Section 2, we introduce a simpler way to design the integer transform, which avoids the recursive process used in [5][6] and achieves Goal III. Then, in Section 3, we show that, for a non-integer transform, there are many integer transforms that can approximate it. To select the optimal one among them, in Section 4 we introduce a systematic way to estimate the accuracy. The optimal integer transform should have the minimal NRMSE in (35), which achieves Goal IV. In Section 5, we show that, although we decompose the original transform into three triangular matrices, with the proposed implementation algorithm the required number of time cycles is $N+3+\log_2 N$, which is only a little more than in the original case. This achieves Goal V.
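The loss of reversibility caused by the direct rounding in (2) can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes a 2-point rotation by $\pi/8$ as the non-integer transform $A$ and $q = 3$, and the helper names `round_to_bits` and `matmul` are hypothetical. Exact fractions are used so that the comparison with the identity matrix is not affected by floating-point error.

```python
from fractions import Fraction
import math

def round_to_bits(matrix, q):
    """Entrywise 2^-q * round(2^q * a), as in (2), stored exactly as Fractions."""
    return [[Fraction(round(a * 2**q), 2**q) for a in row] for row in matrix]

def matmul(X, Y):
    """Plain matrix product for small list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Assumed example transform: rotation by pi/8 (orthogonal, so A^-1 = A^T).
theta = math.pi / 8
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
A_inv = [[A[0][0], A[1][0]],
         [A[0][1], A[1][1]]]

q = 3                                   # assumed number of fractional bits
B = round_to_bits(A, q)                 # rounded forward transform
B_tilde = round_to_bits(A_inv, q)       # rounded inverse transform

product = matmul(B_tilde, B)
identity = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
print("B_tilde * B =", product)
print("reversible:", product == identity)
```

In this example the product is a scaled identity rather than the identity itself, so an input cannot be recovered exactly from the rounded forward and inverse matrices.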

II. SIMPLER ALGORITHM TO DESIGN INTEGER TRANSFORMS

Instead of using the iterative process in Section 4 of Ref. [5], in this paper we use a simpler and faster way to derive the integer transform. Suppose that $A$ is a discrete non-integer transform:

$$y = Ax, \qquad A:\ \text{an } N \times N \text{ matrix}, \qquad \det(A) \neq 0. \tag{3}$$

(A) First, scale $A$ by a constant $\sigma$ such that $\det(G) = \pm 1$:

$$G = \sigma A, \qquad \text{where } \sigma = |\det(A)|^{-1/N}. \tag{4}$$

Note that, for most applications (such as filter design, spectrum analysis, etc.), scaling does not affect the performance. Even for data compression, scaling just "shifts" the dynamic range and does not increase or decrease its width.

(B) Do permutation and sign-changing operations for $G$:

$$R = D_1 P G Q D_2, \tag{5}$$

where $P$ and $Q$ are any permutation matrices and $D_1$, $D_2$ are diagonal matrices with $D_1[n,n] = \pm 1$ and $D_2[n,n] = \pm 1$.
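Steps (A) and (B) can be illustrated with a small numerical check. This is only a sketch under stated assumptions: the $2 \times 2$ matrix `A` is an arbitrary real example, the helpers `det` and `scale_to_unit_det` are hypothetical names, and the particular choice of $P$ (a row swap) and $D_1 = \mathrm{diag}(1, -1)$ with $Q = D_2 = I$ is made only for illustration.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (fine for small N)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def scale_to_unit_det(A):
    """Return (sigma, G) with G = sigma * A and |det(G)| = 1, following (4)."""
    N = len(A)
    sigma = abs(det(A)) ** (-1.0 / N)
    return sigma, [[sigma * a for a in row] for row in A]

A = [[1.0, 2.0],
     [3.0, 4.0]]                        # assumed example, det(A) = -2
sigma, G = scale_to_unit_det(A)

# Step (B) with P = row swap, Q = I, D1 = diag(1, -1), D2 = I: R = D1 P G Q D2.
PG = [G[1], G[0]]                       # P swaps the two rows of G
R = [PG[0], [-v for v in PG[1]]]        # D1 flips the sign of the second row

print("sigma  =", sigma)
print("det(G) =", det(G))
print("det(R) =", det(R))
```

Permutations and sign changes only alter the sign of the determinant, so $|\det(R)| = |\det(G)| = 1$ is preserved for every choice of $P$, $Q$, $D_1$, and $D_2$.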

ISCAS 2006

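(C) Then we do the triangular matrix decomposition. First, we find $L_1$ such that $H = RL_1$ has the following form:

$$H = RL_1 = \begin{bmatrix} h_{1,1} & h_{1,2} & h_{1,3} & \cdots & h_{1,N-1} & h_{1,N} \\ h_{2,1} & 1 & 0 & \cdots & 0 & 0 \\ h_{3,1} & h_{3,2} & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ h_{N-1,1} & h_{N-1,2} & h_{N-1,3} & \cdots & 1 & 0 \\ h_{N,1} & h_{N,2} & h_{N,3} & \cdots & h_{N,N-1} & 1 \end{bmatrix}, \tag{6}$$

where $L_1$ is an upper-triangular matrix with unit diagonal,

$$L_1 = \begin{bmatrix} 1 & \tau_{1,2} & \tau_{1,3} & \cdots & \tau_{1,N} \\ 0 & 1 & \tau_{2,3} & \cdots & \tau_{2,N} \\ \vdots & \vdots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & \tau_{N-1,N} \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}, \tag{7}$$

whose $n$th column ($n = 2, \ldots, N$) is obtained from

$$[\tau_{1,n}, \tau_{2,n}, \ldots, \tau_{n-1,n}]^T = S_n^{-1}(z_n - r_n), \tag{8}$$

$$S_n = \begin{bmatrix} r_{2,1} & r_{2,2} & \cdots & r_{2,n-1} \\ r_{3,1} & r_{3,2} & \cdots & r_{3,n-1} \\ \vdots & \vdots & & \vdots \\ r_{n,1} & r_{n,2} & \cdots & r_{n,n-1} \end{bmatrix}, \qquad z_n = [0, \ldots, 0, 1]^T, \qquad r_n = [r_{2,n}, r_{3,n}, \ldots, r_{n,n}]^T, \tag{9}$$

and $r_{m,n} = R[m,n]$.

(D) $T_1 = L_1^{-1}$. Note that $T_1$ is also an upper-triangular matrix:

$$T_1 = L_1^{-1} = \begin{bmatrix} 1 & \eta_{1,2} & \eta_{1,3} & \cdots & \eta_{1,N-1} & \eta_{1,N} \\ 0 & 1 & \eta_{2,3} & \cdots & \eta_{2,N-1} & \eta_{2,N} \\ 0 & 0 & 1 & \cdots & \eta_{3,N-1} & \eta_{3,N} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & \eta_{N-1,N} \\ 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix}. \tag{10}$$

(E) Decompose $H$ into $T_2$ and $T_3$:

$$H = T_3 T_2, \tag{11}$$

$$T_2 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ h_{2,1} & 1 & 0 & \cdots & 0 \\ h_{3,1} & h_{3,2} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ h_{N,1} & h_{N,2} & h_{N,3} & \cdots & 1 \end{bmatrix}, \qquad T_3 = \begin{bmatrix} \rho_1 & \rho_2 & \rho_3 & \cdots & \rho_N \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}, \tag{12}$$

where

$$[\rho_1, \rho_2, \ldots, \rho_N] = [h_{1,1}, h_{1,2}, \ldots, h_{1,N}]\,T_2^{-1}, \qquad \rho_1 = \det(R) = \pm 1. \tag{13}$$

From (5), (6), and (10), we have decomposed $G$ into three triangular matrices $T_1$, $T_2$, and $T_3$:

$$G = P^T D_1 T_3 T_2 T_1 D_2 Q^T. \tag{14}$$

(F) Approximate $T_1$, $T_2$, and $T_3$ by binary-valued matrices:

$$J_1 = \begin{bmatrix} 1 & t_{1,2} & t_{1,3} & \cdots & t_{1,N} \\ 0 & 1 & t_{2,3} & \cdots & t_{2,N} \\ \vdots & \vdots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & t_{N-1,N} \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}, \quad J_2 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ c_{2,1} & 1 & 0 & \cdots & 0 \\ c_{3,1} & c_{3,2} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{N,1} & c_{N,2} & c_{N,3} & \cdots & 1 \end{bmatrix}, \quad J_3 = \begin{bmatrix} \pm 1 & s_2 & s_3 & \cdots & s_N \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}, \tag{15}$$

where $J_3[1,1] = \rho_1$,

$$t_{m,n} = Q_b(\eta_{m,n}), \qquad c_{m,n} = Q_b(h_{m,n}), \qquad s_n = Q_b(\rho_n), \qquad Q_b(a) = 2^{-b}\,\mathrm{round}(2^{b}a). \tag{16}$$

(G) If we constrain the least bit of the output to be $2^{-r}$, then we convert $J_1$, $J_2$, and $J_3$ into ladder-truncation operations, where $Q_r\{a\} = 2^{-r}\,\mathrm{round}(2^{r}a)$, i.e., (16) with $b$ replaced by $r$. Then the process of the forward integer transform is:

Step 1: $$x_1 = D_2 Q^T x. \tag{17}$$

Step 2: $$x_2[n] = x_1[n] + Q_r\Big\{\sum_{m=n+1}^{N} t_{n,m}\,x_1[m]\Big\} \ \text{ for } n = 1 \sim N-1, \qquad x_2[N] = x_1[N]. \tag{18}$$

Step 3: $$x_3[n] = x_2[n] + Q_r\Big\{\sum_{m=1}^{n-1} c_{n,m}\,x_2[m]\Big\} \ \text{ for } n = 2 \sim N, \qquad x_3[1] = x_2[1]. \tag{19}$$

Step 4: $$x_4[1] = \rho_1 x_3[1] + Q_r\Big\{\sum_{k=2}^{N} s_k\,x_3[k]\Big\}, \qquad x_4[n] = x_3[n] \ \text{ when } n \neq 1. \tag{20}$$

Step 5: $$z = P^T D_1 x_4. \tag{21}$$

The process of the inverse integer transform is:

Step 1: $$x_4 = D_1 P z. \tag{22}$$

Step 2: $$x_3[1] = \rho_1\Big(x_4[1] - Q_r\Big\{\sum_{k=2}^{N} s_k\,x_4[k]\Big\}\Big), \qquad x_3[n] = x_4[n] \ \text{ when } n \neq 1. \tag{23}$$

Step 3: $$x_2[1] = x_3[1], \qquad h[n] = 0 \ (n = 2 \text{ to } N), \tag{24}$$

for $m = 2$ to $N$ Do {
  $h[n] = h[n] + c_{n,m-1}\,x_2[m-1]$, for $n = m$ to $N$,
  $x_2[m] = x_3[m] - Q_r\{h[m]\}$ }. $\tag{25}$

Step 4: $x_1[N] = x_2[N]$, $h[n] = 0$ ($n = 1$ to $N-1$),

for $m = 2$ to $N$ Do {
  $h[n] = h[n] + t_{n,N+2-m}\,x_1[N+2-m]$, for $n = 1$ to $N+1-m$,
  $x_1[N+1-m] = x_2[N+1-m] - Q_r\{h[N+1-m]\}$ }. $\tag{26}$

Step 5: $$x = Q D_2 x_1. \tag{27}$$

[Theorem 1] We can perfectly recover $x$ from $z$ by the process in (22)-(27) if the least bit of $x$ is not smaller than $2^{-r}$.

[Theorem 2] If the least bit of $x$ is not smaller than $2^{-r}$, then, in (15), no matter how large a value of $b$ we use for approximating the entries of $T_1$, $T_2$, and $T_3$, the least bit of the output sequence $z$ is no smaller than $2^{-r}$ (independent of $b$). It can be proved directly from (17)-(21). Since $b$ does not affect the bit-length of the output, we can choose $b$ as a large value to reduce the approximation error.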
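The ladder-truncation steps and the perfect-recovery property of Theorem 1 can be sketched in a few lines of code. This is an illustrative sketch rather than the authors' implementation: the function names `Q`, `forward`, and `inverse` are hypothetical, the coefficient values in `t`, `c`, `s`, and `rho1` are arbitrary 4-bit dyadic numbers chosen for the example (not derived from any particular transform), indices are 0-based, and the permutation and sign-change steps (Steps 1 and 5) are omitted because they are trivially invertible. Exact fractions are used so that the forward and inverse passes evaluate identical truncated sums.

```python
from fractions import Fraction
from math import floor

def Q(a, r):
    """Q_r(a) = 2^-r * round(2^r * a), computed exactly (ties round upward)."""
    return Fraction(floor(a * 2**r + Fraction(1, 2)), 2**r)

def forward(x1, t, c, s, rho1, r):
    """Steps 2-4 of the forward transform: upper, lower, then one-row ladder."""
    N = len(x1)
    x2 = [x1[n] + Q(sum(t[n][m] * x1[m] for m in range(n + 1, N)), r) for n in range(N)]
    x3 = [x2[n] + Q(sum(c[n][m] * x2[m] for m in range(n)), r) for n in range(N)]
    x4 = list(x3)
    x4[0] = rho1 * x3[0] + Q(sum(s[k] * x3[k] for k in range(1, N)), r)
    return x4

def inverse(x4, t, c, s, rho1, r):
    """Undo the three ladders in reverse order, subtracting the same Q_r terms."""
    N = len(x4)
    x3 = list(x4)
    x3[0] = rho1 * (x4[0] - Q(sum(s[k] * x4[k] for k in range(1, N)), r))
    x2 = [None] * N
    for n in range(N):                  # ascending: x2[m] for m < n is already known
        x2[n] = x3[n] - Q(sum(c[n][m] * x2[m] for m in range(n)), r)
    x1 = [None] * N
    for n in reversed(range(N)):        # descending: x1[m] for m > n is already known
        x1[n] = x2[n] - Q(sum(t[n][m] * x1[m] for m in range(n + 1, N)), r)
    return x1

F = Fraction
# Hypothetical b = 4 coefficients (multiples of 2^-4), for illustration only.
t = [[0, F(3, 16), F(-5, 16)], [0, 0, F(7, 16)], [0, 0, 0]]      # strict upper part of J1
c = [[0, 0, 0], [F(9, 16), 0, 0], [F(-1, 16), F(11, 16), 0]]     # strict lower part of J2
s = [0, F(5, 16), F(-3, 16)]                                     # first row of J3 (s[0] unused)
rho1 = -1                                                        # J3[1,1] = +1 or -1

x = [7, -3, 12]                          # integer input, least bit 2^0 (r = 0)
z = forward(x, t, c, s, rho1, r=0)
x_rec = inverse(z, t, c, s, rho1, r=0)
print("z     =", [int(v) for v in z])
print("x_rec =", [int(v) for v in x_rec])
```

The inverse works because each ladder step changes one sample by a truncated function of samples that are still available unchanged, so the same truncated value can be recomputed and subtracted. The output stays on the $2^{-r}$ grid regardless of how many bits $b$ the coefficients use, which is the content of Theorem 2.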

From (4) to (27) we can derive the reversible integer transform that approximates $G$ successfully. Compared with the algorithm in [5], which uses an iterative process to derive the triangular matrices, our algorithm is simpler. Moreover, in Theorem 3, we find that the entries of $H$ in (6) dominate the accuracy, and we can use the permutation matrices $P$ and $Q$ in (5) to control the magnitude of $h_{m,n}$. Thus the proposed algorithm also provides a convenient way to improve the accuracy of the integer transform.

III. GENERALIZATION OF THE INTEGER TRANSFORM

The proposed process for deriving integer transforms is very flexible. We can use the following ways to generalize it:

(A) In (5), there are $N!$ possible choices for each of $P$ and $Q$.
(B) In (5), there are $2^N$ possible choices for each of $D_1$ and $D_2$.
(C) We may decompose $A^T$, $A^{-1}$, or $(A^T)^{-1}$ instead of $A$.
(D) The order of upper and lower triangular matrices can be changed. For example, $T_1$, $T_2$, and $T_3$ can be lower, upper, and one-row lower triangular matrices, respectively (in this case, the process in (6)-(14) should be modified properly).
(E) The rounding operation $Q_b$ defined in (16) may be replaced by other operations.
(F) In (5), since we only require that $|\det(D_1)| = |\det(D_2)| = 1$, we may choose some of the diagonal entries of $D_1$ and $D_2$ as $\pm 2^{k}$ and choose some of their diagonal entries as $\pm 2^{-k}$.

Thus, there are infinitely many ways to generalize the algorithm, and we can obtain an infinite number of integer transforms. Even if we only apply the ways of (A), (B), and (C), there are

$$4 \cdot 2^{2N} (N!)^2 \tag{28}$$

possible integer transforms. We can use accuracy analysis to determine which one is the best.

IV. ACCURACY ANALYSIS

After the integer transform is derived, we are concerned with how to improve its quality, especially the accuracy. Note that, in Steps 2, 3, and 4, the truncation operation $Q_r$ is equivalent to adding a small number $\Delta$:

$$Q_r\{a\} = a + \Delta, \qquad \text{where } -2^{-r-1} < \Delta < 2^{-r-1}. \tag{29}$$

Since $a$ is unknown, $\Delta$ can be treated as a random variable that is uniformly distributed in $(-2^{-r-1}, 2^{-r-1})$ and

$$E[\Delta] = 0, \qquad E[\Delta^2] = 4^{-r}/12, \qquad E: \text{expected value}. \tag{30}$$

Thus the process in (17)-(21) can be rewritten as:

$$z = P^T D_1 \{ J_3 [ J_2 ( J_1 D_2 Q^T x + \Delta_1 ) + \Delta_2 ] + \Delta_3 \}, \tag{31}$$

where $\Delta_1$, $\Delta_2$, and $\Delta_3$ are $N \times 1$ vectors and $\Delta_j[n]$ has the same distribution as in (30) ($j = 1, 2, 3$, $n = 1$ to $N$), except that

$$\Delta_1[N] = 0, \qquad \Delta_2[1] = 0, \qquad \Delta_3[n] = 0 \ \text{ when } n \neq 1. \tag{32}$$

Thus, if $y = Gx = \sigma A x$, the difference between $y$ and $z$ is:

$$z - y = P^T D_1 T_3 T_2 \Delta_1 + P^T D_1 T_3 \Delta_2 + P^T D_1 \Delta_3 + P^T D_1 T_3 T_2 V_1 D_2 Q^T x + P^T D_1 T_3 V_2 T_1 D_2 Q^T x + P^T D_1 V_3 T_2 T_1 D_2 Q^T x, \tag{33}$$

where $V_i = J_i - T_i$, $i = 1, 2, 3$, and we suppose that $V_i$ is small. Thus the error comes from the following factors:

* The truncations in Steps 2, 3, and 4 cause $P^T D_1 T_3 T_2 \Delta_1$, $P^T D_1 T_3 \Delta_2$, and $P^T D_1 \Delta_3$, respectively.
* The quantizations from $T_1$ into $J_1$, from $T_2$ into $J_2$, and from $T_3$ into $J_3$ cause $P^T D_1 T_3 T_2 V_1 D_2 Q^T x$, $P^T D_1 T_3 V_2 T_1 D_2 Q^T x$, and $P^T D_1 V_3 T_2 T_1 D_2 Q^T x$, respectively.

Several interesting things can be noticed.

(a) From Theorem 2, we can choose $b$ as a large value and make the quantization errors from $T_i$ to $J_i$ very small. In this case, (33) can be simplified as:

$$z - y \approx P^T D_1 T_3 T_2 \Delta_1 + P^T D_1 T_3 \Delta_2 + P^T D_1 \Delta_3. \tag{34}$$

(b) After the integer transform is designed, we can use (33) (or (34)) to estimate the normalized root mean square error (NRMSE) and use it to measure the accuracy:

$$\mathrm{NRMSE} = \sqrt{\frac{E[(z-y)^H (z-y)]}{E[y^H y]}}. \tag{35}$$

(c) From (34), it is obvious that the error is dominated by the triangular matrices $T_2$ and $T_3$. Therefore,

[Theorem 3] To make the integer transform more accurate, we should make the entries of $T_2$ and $T_3$ (or the nearly-triangular matrix $H$ in (6)) as small as possible.

Eqs. (33), (34), (35), and Theorem 3 are helpful for finding the integer transform with very high accuracy. In Section 3, we illustrated that there are many ways to generalize the integer transforms. If there is enough time, we can try each way, use (33) and (35) to measure the accuracy of the obtained integer transforms, and determine which one is optimal. If we do not have enough time, from Theorem 3, we can use the following way to make the entries of $H$ (defined in (6)) very small and obtain the "nearly-optimal" integer transform.

(I) Set $P = Q = D_2 = I$, $H = G$, $D_1[1,1] = 1$, and $t = 1$.
(II) Find $w_1$ such that $\sum_{m} |G[m, w_1]|^2$ (upper summation limit $N$) is minimal. Then we

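exchange the 1st and the $w_1$th columns of $H$ and $Q$.
(III) Vary $v_t$ over $1, t+1, t+2, t+3, \ldots, N$, vary $w_t$ from $t+1$ to $N$, and set $s = 1$ or $-1$. Find $(v_t, w_t, s)$ such that the following summation is minimal:

$$\sum_{m} |h_t[m]|^2 \quad (\text{upper summation limit } N), \tag{36}$$

where $h_t[m]$ is formed from $H[m, w_t]$ and a combination $\sum_{\tau} c_\tau H[m, \tau]$ of the first $t$ columns of $H$. The coefficients $c_1, c_2, \ldots, c_t$ are determined by the $t \times t$ matrix whose rows are rows $2, 3, \ldots, t$ and $v_t$ of the first $t$ columns of $H$ (entries $H[2,1], \ldots, H[2,t]$; $H[3,1], \ldots, H[3,t]$; $\ldots$; $H[t,1], \ldots, H[t,t]$; $H[v_t,1], \ldots, H[v_t,t]$), together with the entries $H[2, w_t], H[3, w_t], \ldots, H[t, w_t]$, $H[v_t, w_t]$, and the sign $s$.
(IV) Set $D_1[t+1, t+1] = s$. Exchange the $(t+1)$th and the $v_t$th rows of $H$ and $P$, and exchange the $(t+1)$th and the $w_t$th columns of $H$ and $Q$.
(V) Set $t = t+1$ and return to (III), until $t = N$.
(VI) From (I)-(V), we have obtained $P$, $Q$, $D_1$, $D_2$, $H$, and $R = D_1 P G Q D_2$. Then we follow the process in (6)-(27) to derive the integer transform.

V. REDUCING THE NUMBER OF TIME CYCLES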

We can prove that the number of multiplications of the integer transform is $N^2 - 1$ and that of the direct implementation is $N^2$. Thus, from the viewpoint of multiplications, the integer transform has almost the same complexity as the original case. However, the main problem of the integer transform in implementation is the number of time cycles. Since we decompose $G$ into three triangular matrices, if we implement the integer transform directly, each of the triangular matrices requires $N-1$ time cycles, the pre- and post-permuting diagonal matrices require 2 time cycles, and in sum there are

$$3N - 3 + 2 = 3N - 1 \ \text{time cycles}, \tag{37}$$

which is about three times that of the direct implementation (which requires $N-1$ time cycles). In fact, using the following technique, the required number of time cycles can be reduced to

$$N + 3 + \log_2 N \ \text{time cycles}, \tag{38}$$

which is only a little more than the original case.

(1) Notice that Steps 1 and 5 are permutation and sign-changing operations. Both of them require only one time cycle.
(2) Steps 2 and 3 can be implemented together. Suppose that $2N+1$ memory units can be used (note that direct implementation also requires $2N$ memory units). We first set $x_2[n] = x_1[n]$ for initialization. Then, For $k = 1$ to $N+1$ (1) If $1 < k$