International Workshop on Intelligent Data Acquisition and Advanced Computing System: Technology and Applications, 1-4 July 2001, Foros, Ukraine
A New Parallel Fast Cosine Transform Algorithm

Anatoly Melnyk (1), Yury Ermetov (2)
Department of Computer Engineering, Lviv Polytechnic National University, St. Bandery 12 Str., Lviv 79646, Ukraine
(1) aomelnyk@polynet.lviv.ua, (2) [email protected]
Abstract: A new algorithm for computing the fast cosine transform (FCT) is proposed. The usual FCT algorithm contains chains of consecutive addition operations which, when the FCT is computed in parallel, either cause a large timing latency or require significant additional hardware. In the new FCT algorithm, these consecutive operations are replaced by the same number of operations that can be executed in parallel. This allows parallel FCT computation to be implemented simply, without extra timing or hardware expense.
Keywords: digital signal processing, data compression, fast cosine transform, parallel algorithm
1. INTRODUCTION
The discrete cosine transform (DCT) is widely used in digital signal processing for tasks such as data compression, pseudocepstral data processing, filtering, and the computation of other orthogonal trigonometric transforms [1]. The DCT is the basis of widespread coding standards such as JPEG, MPEG, and H.261 [2,3,4]. Applications in which computation latency is a critical parameter require parallel computation.

The fast cosine transform (FCT) algorithm consists of two parts: the transforming matrix (TM) and the adding matrix (AM) [5]. The structure of the TM is similar to that of the fast Fourier transform algorithm: it contains $\log_2 N$ computational stages, each of $N/2$ independent "butterfly" operations. Parallel computation of the TM is therefore easily accomplished by implementing the corresponding number of hardware butterflies. The AM consists of $(\log_2 N - 1)$ computational stages with recursive basic operations (BOs), in which each BO requires the result of the previous one. The consecutive nature of these operations suits one-channel iterative and pipelined implementations, but in multichannel implementations it either creates a large timing latency of $(N/2 - 1)$ computational stages or requires significant additional computation.

In this paper, to reduce these computational and timing penalties, a new parallel algorithm for AM computation is developed. It consists of $(\log_2 N - 1)$ computational stages with independent BOs. The total number of BOs in the AM of the new algorithm equals the number of BOs in the AM of the usual FCT algorithm. The timing latency is determined by the number of computational stages and equals $(\log_2 N - 1)$, which is roughly $N/(2\log_2 N)$ times less than for the usual algorithm. Section 2 reviews the FCT and inverse FCT (IFCT) algorithms with a detailed analysis of the AM structure. Section 3 develops the new parallel FCT algorithm and compares it with the existing one. The results obtained are discussed in the Conclusion.

0-7803-7164-X/01/$10.00 © 2001 IEEE
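The latency figures quoted above can be checked with a little arithmetic. The sketch below is a minimal Python illustration, assuming that each AM basic operation occupies one computational stage and that $N$ is a power of two; the helper name `am_latency_stages` is hypothetical and not taken from the paper.

```python
import math


def am_latency_stages(n: int) -> tuple[int, int]:
    """Return (usual, new) AM latency in stages for an N-point FCT (assumed model)."""
    if n < 4 or n & (n - 1):
        raise ValueError("N must be a power of two, N >= 4")
    usual = n // 2 - 1               # chain of N/2 - 1 dependent basic operations
    new = int(math.log2(n)) - 1      # log2 N - 1 stages of independent basic operations
    return usual, new


for n in (8, 16, 64, 256, 1024):
    usual, new = am_latency_stages(n)
    print(f"N={n:5d}: usual={usual:4d}, new={new:2d}, "
          f"ratio={usual / new:6.2f}, N/(2 log2 N)={n / (2 * math.log2(n)):6.2f}")
```

The exact ratio $(N/2 - 1)/(\log_2 N - 1)$ approaches the estimate $N/(2\log_2 N)$ as $N$ grows, which is why the estimate is only approximate for small transforms.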
2. STANDARD FCT ALGORITHM

The DCT and inverse DCT (IDCT) are described by the following equations:

$$L_k = \sum_{n=0}^{N-1} x_n C_N^{nk}, \qquad x_n = \sum_{k=0}^{N-1} \beta_k L_k C_N^{nk}, \qquad (1)$$

where $C_N^{nk} = \cos\frac{\pi(2n+1)k}{2N}$, $n,k = 0,1,\dots,N-1$, $\beta_0 = 1/N$, and $\beta_k = 2/N$ for $k \neq 0$.

A fast algorithm for computing the DCT was proposed in [6]. It has poor accuracy because of its cosecant coefficients. This drawback was eliminated in the FCT algorithm with cosine coefficients, which requires the same number of operations [5]. The FCT algorithm with cosine coefficients consists of the following steps:

1) find the subsequences $a_n$ and $b_n$, where
$$a_n = x_n + x_{N-1-n}, \qquad b_n = (x_n - x_{N-1-n})\, C_N^{n1}, \qquad n = 0,1,\dots,N/2-1; \qquad (2)$$
2) compute the $N/2$-point DCTs of the sequences $a_n$ and $b_n$, obtaining the sequences $L_{2k}$ and $B_k$, $k = 0,1,\dots,N/2-1$;
3) compute the odd elements $L_{2k+1}$ using the formula
$$L_1 = B_0, \qquad L_{2k+1} = 2B_k - L_{2k-1}, \qquad k = 1,\dots,N/2-1; \qquad (3)$$
4) for $N = 2^m$, $m = 1,2,\dots$, apply the previous steps recursively to the $N/2$-point DCTs until 2-point DCTs are obtained.
The flow graph of the FCT algorithm is presented in Fig. 1.

The IFCT algorithm consists of the following steps:

1) define the subsequences $A_k$ and $B_k$, where
$$A_k = L_{2k}/2, \quad k = 0,1,\dots,N/2-1; \qquad B_{N/2-1} = L_{N-1}, \quad B_k = L_{2k+1} - B_{k+1}, \quad k = N/2-2, N/2-3, \dots, 0; \qquad (4)$$
2) compute the $N/2$-point IDCTs of the sequences $A_k$ and $B_k$, obtaining $a_n$ and $b_n$ respectively, $n = 0,1,\dots,N/2-1$;
3) compute the IDCT:
$$x_n = a_n + b_n C_N^{n1}, \qquad x_{N-1-n} = a_n - b_n C_N^{n1}, \qquad n = 0,1,\dots,N/2-1;$$
4) for $N = 2^m$, $m = 1,2,\dots$, apply the previous steps recursively until 2-point IDCTs are obtained.

The flow graph of the IFCT algorithm is easily obtained by traversing the flow graph of the FCT algorithm in reverse order, from right to left.
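A matching Python sketch of the IFCT steps follows, again assuming $N$ is a power of two; `idct_direct` and `ifct` are hypothetical names, and the direct IDCT of equation (1), with $\beta_0 = 1/N$ and $\beta_k = 2/N$, serves as the reference. Step 1 now runs its recursion backwards, from $B_{N/2-1}$ down to $B_0$, mirroring the reversed flow graph.

```python
import math


def idct_direct(big_l):
    """Reference O(N^2) IDCT from equation (1): x_n = sum_k beta_k L_k cos(pi(2n+1)k / 2N)."""
    n_pts = len(big_l)
    betas = [1.0 / n_pts] + [2.0 / n_pts] * (n_pts - 1)
    return [
        sum(betas[k] * big_l[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
            for k in range(n_pts))
        for n in range(n_pts)
    ]


def ifct(big_l):
    """Recursive IFCT following steps 1-4 (len(big_l) a power of two)."""
    n_pts = len(big_l)
    if n_pts == 1:
        return [big_l[0]]  # 1-point IDCT: beta_0 = 1
    half = n_pts // 2
    # Step 1, equation (4): A_k = L_{2k}/2; B_k built backwards from B_{N/2-1} = L_{N-1}
    big_a = [big_l[2 * k] / 2 for k in range(half)]
    big_b = [0.0] * half
    big_b[half - 1] = big_l[n_pts - 1]
    for k in range(half - 2, -1, -1):
        big_b[k] = big_l[2 * k + 1] - big_b[k + 1]
    # Steps 2 and 4: N/2-point IDCTs, computed recursively
    a = ifct(big_a)
    b = ifct(big_b)
    # Step 3: butterflies with the cosine coefficients C_N^{n1}
    x = [0.0] * n_pts
    for n in range(half):
        c = math.cos(math.pi * (2 * n + 1) / (2 * n_pts))
        x[n] = a[n] + b[n] * c
        x[n_pts - 1 - n] = a[n] - b[n] * c
    return x
```

Applying `ifct` to the DCT coefficients of a sequence should recover that sequence up to floating-point rounding.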
[Fig. 1. Flow graph of the FCT algorithm]