A new parallel fast cosine transform algorithm

International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 1-4 July 2001, Foros, Ukraine

Anatoly Melnyk 1), Yury Ermetov 2)
Department of Computer Engineering, Lviv Polytechnic National University, St. Bandery 12 str., Lviv 79646, Ukraine
1) aomelnyk@polynet.lviv.ua 2) [email protected]


Abstract: A new algorithm for computing the fast cosine transform (FCT) is proposed. The usual FCT algorithm contains consecutive addition operations which, in a parallel FCT implementation, either cause large latency or require significant additional hardware. The new FCT algorithm uses the same number of operations as the usual one, but they can be executed in parallel rather than consecutively. This allows a simple implementation of parallel FCT computation without extra time or hardware cost.

Keywords: digital signal processing, data compression, fast cosine transform, parallel algorithm

1. INTRODUCTION


The discrete cosine transform (DCT) is widely used in digital signal processing for tasks such as data compression, pseudocepstral data processing, filtering, and computation of other orthogonal trigonometric transforms [1]. The DCT is the basis of such widespread coding standards as JPEG, MPEG and H.261 [2,3,4]. Tasks for which computation latency is a crucial parameter require parallel computation.

The fast cosine transform (FCT) algorithm consists of two parts: the transforming matrix (TM) and the adding matrix (AM) [5]. The structure of the TM is similar to that of the fast Fourier transform algorithm: it contains $\log_2 N$ computational stages of $N/2$ independent "butterfly" operations. Parallel computation of the TM is therefore accomplished simply by implementing the corresponding number of hardware "butterflies". The AM consists of $(\log_2 N - 1)$ computational stages with recursive basic operations (BOs), where each BO requires the result of the previous BO. The consecutive nature of these operations is convenient for one-channel iterative and pipeline implementations, but in multichannel computations it either creates a large latency of $(N/2 - 1)$ computational stages or requires significant additional computations.

In this paper, in order to reduce these computation and timing penalties, a new parallel algorithm for AM computation is developed. It consists of $(\log_2 N - 1)$ computational stages with independent BOs. The total number of BOs in the AM of the new algorithm equals the number of BOs in the AM of the usual FCT algorithm. The latency is determined by the number of computational stages and equals $(\log_2 N - 1)$, which is about $N/(2\log_2 N)$ times less than that of the usual algorithm.

In the second section the FCT and inverse FCT (IFCT) algorithms are considered, with a detailed analysis of the AM structure. In the third section the new parallel FCT algorithm is developed and compared with the existing one. The results obtained are discussed in the conclusion.

0-7803-7164-X/01/$10.00 ©2001 IEEE
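The latency figures quoted in the introduction can be checked with a few lines of arithmetic. The sketch below only tabulates the two stage counts stated in the text, $(N/2 - 1)$ for the consecutive AM and $(\log_2 N - 1)$ for the parallel AM. The function name `am_latency` is hypothetical and is not taken from the paper.

```python
def am_latency(N):
    """Return (consecutive_stages, parallel_stages) for the adding matrix.

    Uses the stage counts stated in the introduction:
    N/2 - 1 for the usual (consecutive) AM and log2(N) - 1 for the new one.
    N is assumed to be a power of two, N >= 4.
    """
    if N < 4 or N & (N - 1) != 0:
        raise ValueError("N must be a power of two, N >= 4")
    log2_n = N.bit_length() - 1      # exact log2 for a power of two
    return N // 2 - 1, log2_n - 1


if __name__ == "__main__":
    for N in (8, 64, 1024):
        consecutive, parallel = am_latency(N)
        exact_gain = consecutive / parallel
        approx_gain = N / (2 * (N.bit_length() - 1))   # N / (2 log2 N)
        print(N, consecutive, parallel, round(exact_gain, 2), round(approx_gain, 2))
```

The exact ratio $(N/2 - 1)/(\log_2 N - 1)$ and the quoted estimate $N/(2\log_2 N)$ differ for small $N$ and approach each other only in relative terms as $N$ grows, so the estimate is best read as an order-of-magnitude figure.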

2. STANDARD FCT ALGORITHM

The DCT and inverse DCT (IDCT) algorithms are described by the following equations:

$$L_k = \sum_{n=0}^{N-1} x_n C_{2N}^{(2n+1)k}, \qquad x_n = \sum_{k=0}^{N-1} P_k L_k C_{2N}^{(2n+1)k}, \qquad (1)$$

where $C_{2N}^{(2n+1)k} = \cos\frac{\pi(2n+1)k}{2N}$, $n, k = 0, 1, \ldots, N-1$, $P_0 = 1/N$, $P_k = 2/N$ for $k \neq 0$.

The fast algorithm for the discrete cosine transform computation was proposed in [6]. It has poor accuracy because of its cosecant coefficients. This drawback was eliminated in the FCT algorithm with cosine coefficients, which has the same number of computation operations [5]. The FCT algorithm with cosine coefficients requires execution of the following steps:

1) to find the subsequences $a_n$ and $b_n$, where

$$a_n = x_n + x_{N-1-n}, \qquad b_n = (x_n - x_{N-1-n})\, C_{2N}^{2n+1}, \qquad (2)$$

$n = 0, 1, \ldots, N/2-1$;

2) to compute the $N/2$-point DCTs of the sequences $a_n$ and $b_n$, obtaining the sequences $L_{2k}$ and $B_k$, $k = 0, 1, \ldots, N/2-1$;

3) to compute the odd-indexed elements $L_{2k+1}$ using the formula

$$L_1 = B_0, \qquad L_{2k+1} = 2(-1)^k B_k - L_{2k-1}, \qquad (3)$$

$k = 0, 1, \ldots, N/2-1$;

4) to repeat the previous steps on the shorter sequences until 2-point DCTs are obtained.
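Steps 1) to 4) above can be sketched in Python and checked against a direct evaluation of the forward transform in (1). This is a minimal sketch, not the authors' implementation. The names `dct_direct` and `fct` are hypothetical, and the coefficient in step 1) is assumed to be $\cos\frac{\pi(2n+1)}{2N}$ with $b_n$ kept in natural order. Under that assumption the product-to-sum identity gives the odd-output recurrence without a sign factor, $L_{2k+1} = 2B_k - L_{2k-1}$. The $(-1)^k$ factor in formula (3) presumably reflects an ordering or sign convention of the flow graph that cannot be recovered from the text, so the sketch omits it.

```python
import math


def dct_direct(x):
    """Direct O(N^2) evaluation of L_k = sum_n x_n cos(pi (2n+1) k / (2N))."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def fct(x):
    """Recursive FCT sketch with cosine coefficients; len(x) = 2^m, m >= 1."""
    N = len(x)
    if N < 2 or N & (N - 1) != 0:
        raise ValueError("length must be a power of two, at least 2")
    if N == 2:
        # 2-point DCT: the recursion stops here, as in step 4)
        return [x[0] + x[1], (x[0] - x[1]) * math.cos(math.pi / 4)]
    half = N // 2
    # Step 1: butterflies (assumed coefficient cos(pi (2n+1) / (2N)))
    a = [x[n] + x[N - 1 - n] for n in range(half)]
    b = [
        (x[n] - x[N - 1 - n]) * math.cos(math.pi * (2 * n + 1) / (2 * N))
        for n in range(half)
    ]
    # Step 2: N/2-point DCTs give the even outputs L_{2k} and the sequence B_k
    even = fct(a)
    B = fct(b)
    # Step 3: consecutive additions; each output needs the previous one
    odd = [0.0] * half
    odd[0] = B[0]
    for k in range(1, half):
        odd[k] = 2 * B[k] - odd[k - 1]  # sign-free form, see the note above
    # Interleave even- and odd-indexed outputs
    L = [0.0] * N
    L[0::2] = even
    L[1::2] = odd
    return L


if __name__ == "__main__":
    x = [1.0, -2.0, 3.5, 0.25, 4.0, -1.5, 2.0, 7.0]
    err = max(abs(p - q) for p, q in zip(fct(x), dct_direct(x)))
    print("max abs difference:", err)
```

The loop in step 3 is the consecutive part of the computation: `odd[k]` cannot be formed before `odd[k - 1]` is known, which is the dependency chain that the adding matrix inherits.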

The flow graph of the FCT algorithm is presented in Fig. 1.

The IFCT algorithm requires execution of the following steps:

1) to define the subsequences $A_k$ and $B_k$, where

$$A_k = L_{2k}, \quad k = 0, 1, \ldots, N/2-1; \qquad B_{N/2-1} = 2L_{N-1}, \quad B_k = 2L_{2k+1} - B_{k+1}, \quad B_{k+1} = (-1)^k B_{k+1}, \quad k = N/2-2, N/2-3, \ldots, 0, \quad B_0 = B_0/2; \qquad (4)$$

2) to compute the $N/2$-point IDCTs of the sequences $A_k$ and $B_k$, which yield $a_n$ and $b_n$ respectively, $n = 0, 1, \ldots, N/2-1$;

3) to compute the IDCT: $x_n = a_n + b_n C_{2N}^{2n+1}$, $x_{N-1-n} = a_n - b_n C_{2N}^{2n+1}$, $n = 0, 1, \ldots, N/2-1$;

4) for $N = 2^m$, $m = 1, 2, \ldots$, to repeat the previous steps until 2-point IDCTs are obtained.

The flow graph of the IFCT algorithm can be obtained simply by traversing the flow graph of the FCT algorithm in reverse order, from right to left.
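The IFCT steps can be sketched in the same way and checked by a round trip through the forward transform. Again this is a sketch under stated assumptions rather than the authors' implementation, and the names `dct_direct`, `idct_core` and `ifct` are hypothetical. The weights $P_k$ are assumed to be applied once, before the recursion, so that the recursive core works on an unweighted cosine sum. The sequence $b_n$ is assumed to be in natural order, in which case the $(-1)^k$ sign step of (4) is not needed and is omitted.

```python
import math


def dct_direct(x):
    """Forward transform L_k = sum_n x_n cos(pi (2n+1) k / (2N)), used as a reference."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def idct_core(Y):
    """Unweighted inverse: y_n = sum_k Y_k cos(pi (2n+1) k / (2N)); len(Y) = 2^m."""
    N = len(Y)
    if N == 2:
        c = math.cos(math.pi / 4)
        return [Y[0] + Y[1] * c, Y[0] - Y[1] * c]   # 2-point inverse
    half = N // 2
    # Step 1: even-indexed inputs, and the backward recursion on odd-indexed inputs
    A = Y[0::2]
    B = [0.0] * half
    B[half - 1] = 2 * Y[N - 1]
    for k in range(half - 2, -1, -1):
        B[k] = 2 * Y[2 * k + 1] - B[k + 1]
    B[0] /= 2
    # Step 2: N/2-point inverses
    a = idct_core(A)
    b = idct_core(B)
    # Step 3: output butterflies (assumed coefficient cos(pi (2n+1) / (2N)))
    x = [0.0] * N
    for n in range(half):
        g = b[n] * math.cos(math.pi * (2 * n + 1) / (2 * N))
        x[n] = a[n] + g
        x[N - 1 - n] = a[n] - g
    return x


def ifct(L):
    """IFCT sketch: apply P_0 = 1/N, P_k = 2/N, then run the recursive core."""
    N = len(L)
    if N < 2 or N & (N - 1) != 0:
        raise ValueError("length must be a power of two, at least 2")
    weighted = [(L[k] / N if k == 0 else 2 * L[k] / N) for k in range(N)]
    return idct_core(weighted)


if __name__ == "__main__":
    x = [1.0, -2.0, 3.5, 0.25, 4.0, -1.5, 2.0, 7.0]
    x_back = ifct(dct_direct(x))
    print("max abs round-trip error:", max(abs(p - q) for p, q in zip(x, x_back)))
```

The backward loop over `k` mirrors the forward recurrence: it runs from $k = N/2-2$ down to $0$, and each $B_k$ depends on $B_{k+1}$, so the inverse has the same chain of consecutive additions as the forward algorithm.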

Fig. 1. Flow graph of the FCT algorithm.
