IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997
Correspondence

Fast Algorithm for Computing Discrete Cosine Transform

C. W. Kok

Abstract—An efficient method for computing the discrete cosine transform (DCT) is proposed. Based on a direct decomposition of the DCT, a recursive property of the DCT for even-length input sequences is derived, which generalizes the radix-2 DCT algorithm. Based on this recursive property, a new DCT algorithm for even-length sequences is obtained. The proposed algorithm is highly regular and requires fewer computations than existing algorithms. Its regular structure is suitable for fast parallel algorithms and VLSI implementation.
Manuscript received April 26, 1994; revised August 9, 1996. The associate editor coordinating the review of this paper and approving it for publication was Prof. Ali N. Akansu. The author is with the Electrical and Computer Engineering Department, University of Wisconsin–Madison, Madison, WI 53705 USA. Publisher Item Identifier S 1053-587X(97)01865-5.

I. INTRODUCTION

The discrete cosine transform (DCT) is a very popular unitary transform in modern digital signal processing. It finds applications in data compression, multirate filter systems, etc. Much research has been conducted to find algorithms that compute the DCT with lower computational complexity. There are now radix-2 DCT algorithms that can compute a sequence with length $N = 2^n$ using $n2^{n-1}$ real multiplications and $3n2^{n-1} - 2^n + 1$ real additions [4]–[7], [9], [14]–[16]. However, there are few results on the computation of the DCT with other lengths. The purpose of this correspondence is to introduce a method to compute the DCT of an even-length sequence, which finds applications in multirate filter banks, transmultiplexers, telecommunication signal processing, etc.

In general, fast DCT algorithms can be classified into two categories: indirect computations and direct computations. The proposed algorithm belongs to the second type. Indirect computation takes advantage of existing fast algorithms, such as the FFT, FHT, etc., to compute the DCT. However, additional operations are often required to map the DCT sequence to the sequence of the other transform. Direct computation reduces the computational complexity by means of matrix factorization and recursive decomposition.

In this correspondence, a recursive DCT algorithm for an even-length input sequence is presented. Following the classical method of Narasimha and Peterson [28], Hou [3] developed a recursive algorithm for computing the radix-2 DCT. The algorithm is similar to a decimation-in-frequency Cooley–Tukey FFT and computes the DCT from two identical lower order DCT's. Hou [3] proved this recursive property of the DCT by writing the DCT kernel as the angle of the DFT kernel plus a variable phase angle. Similarly, Lee [4] proved the recursive property of the DCT kernel by matrix factorization. However, in Lee's algorithm, there are inversions and divisions of the cosine coefficients, and thus, the algorithm is subject to stability problems [22]. Closely resembling the DCT kernel decomposition technique of Hou's work, the recursive property of the even-length DCT coefficient matrix is derived in this correspondence. It is considered to be a generalization of the radix-2 DCT case. The recursive property of the DCT coefficient matrix was derived directly
from the definition of the DCT kernel. This results in a fast algorithm that is simple, easy to understand, and easy to implement.

It should be noted that there are other important issues in designing a good DCT algorithm besides the arithmetic complexity. As indicated by Yun [22], considerations such as regularity, modularity, and data access scheme are also very important for a good algorithm. These design criteria affect the effectiveness of the algorithm when implementation is a concern. Furthermore, modularity eases the generalization of the algorithm to solve higher order problems, provided that we have solved the lower order problems. This property is especially attractive when hardware implementation is a concern because it allows us to use regular components to synthesize hardware for the computation of DCT's with different lengths and different dimensions.

In this correspondence, it is shown that the even-length DCT can be decomposed into two balanced lower order subproblems. This results in a recursive algorithm that requires fewer arithmetic operations than other well-known algorithms for computing the even-length DCT. For the special case where the input sequence length equals $2^n$, the proposed algorithm has the same arithmetic complexity as the radix-2 DCT algorithms in the literature. The proposed algorithm is recursive in nature, which makes it very regular and modular, and it is suitable for VLSI implementation. The balanced structure allows the design of fast parallel algorithms and simpler hardware implementations. An efficient method for mapping the type-IV DCT to the type-II DCT is presented, which enables direct application of the proposed algorithm in a modulated filter bank or the lapped orthogonal transform. By making use of the results in [27] and [24], the proposed algorithm is readily generalized to a recursive algorithm for higher dimensional DCT's using the vector-radix method.

II. FAST RECURSIVE ALGORITHM FOR TYPE-II DCT

The $k$th DCT coefficient for a sequence $x(n)$ with length $N$ is defined as

$$X_c(k) = \sum_{n=0}^{N-1} x(n) \cos\left[\frac{2\pi}{4N}(2n+1)k\right], \qquad k \in [0, N-1] \tag{1}$$
where the normalization factor of the DCT is neglected for simplicity. Consider an even-length sequence with length $N = N_1 N_2$, where $N_1$ is an even number (a power of two, $N_1 = 2^m$, as in Section III-B), and $N_2$ is an odd number. We can decompose the problem in (1) into two balanced subproblems: the even-indexed outputs and the odd-indexed outputs of the DCT,

$$C(i) = X_c(2i), \qquad D(i) = X_c(2i+1), \qquad i \in \left[0, \frac{N}{2} - 1\right]. \tag{2}$$
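As a minimal sketch of definition (1) and the split in (2), the unnormalized DCT can be evaluated directly in $O(N^2)$ operations and its outputs separated into the even-indexed and odd-indexed halves. The function name `dct2_direct` and the sample input are illustrative assumptions, not part of the original correspondence.

```python
import math


def dct2_direct(x):
    """Unnormalized type-II DCT evaluated straight from definition (1)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N))
            for n in range(N))
        for k in range(N)
    ]


# Illustrative even-length input (N = 6 = 2 * 3); the values are arbitrary.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X = dct2_direct(x)

C = X[0::2]  # C(i) = X_c(2i),     i = 0 .. N/2 - 1
D = X[1::2]  # D(i) = X_c(2i + 1), i = 0 .. N/2 - 1
```

Since the normalization factor is neglected, the $k = 0$ coefficient is simply the sum of the input samples.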
A. Evaluation of $C(i)$; Even Indexed Output of DCT

$C(i)$ is defined as

$$
\begin{aligned}
C(i) &= \sum_{n=0}^{N-1} x(n) \cos\left[\frac{2\pi}{4N}(2n+1)2i\right] \\
&= \sum_{n=0}^{N/2-1} x(n) \cos\left[\frac{2\pi}{4N}(2n+1)2i\right] + \sum_{n=0}^{N/2-1} x(N-1-n) \cos\left[\frac{2\pi}{4N}\bigl(2(N-1-n)+1\bigr)2i\right] \\
&= \sum_{n=0}^{N/2-1} \bigl(x(n) + x(N-1-n)\bigr) \cos\left[\frac{2\pi}{4N}(2n+1)2i\right].
\end{aligned} \tag{3}
$$

Define a new sequence

$$p(n) = x(n) + x(N-1-n), \qquad n \in \left[0, \frac{N}{2} - 1\right].$$

$C(i)$ can be rewritten as

$$C(i) = \sum_{n=0}^{N/2-1} p(n) \cos\left[\frac{2\pi}{4N}(2n+1)2i\right] \tag{4}$$

which is a DCT with length $N/2$.

B. Evaluation of $D(i)$; Odd Indexed Output of DCT

To evaluate $D(i)$, consider

$$D'(i) = D(i) + D(i-1)$$

$$
\begin{aligned}
D'(i) &= \sum_{n=0}^{N-1} x(n) \left\{ \cos\left[\frac{2\pi}{4N}(2n+1)(2i+1)\right] + \cos\left[\frac{2\pi}{4N}(2n+1)(2i-1)\right] \right\} \\
&= \sum_{n=0}^{N-1} 2x(n) \cos\left[\frac{2\pi}{4N}(2n+1)\right] \cos\left[\frac{2\pi}{4N}(2n+1)2i\right].
\end{aligned} \tag{5}
$$

Define a new sequence $q(n)$

$$
\begin{aligned}
q(n) &= 2x(n) \cos\left[\frac{2\pi}{4N}(2n+1)\right] + 2x(N-1-n) \cos\left[\frac{2\pi}{4N}\bigl(2(N-1-n)+1\bigr)\right] \\
&= \bigl(x(n) - x(N-1-n)\bigr)\, 2\cos\left[\frac{2\pi}{4N}(2n+1)\right], \qquad n \in \left[0, \frac{N}{2} - 1\right].
\end{aligned} \tag{6}
$$

Substituting (6) into (5), we get

$$D'(i) = \sum_{n=0}^{N/2-1} q(n) \cos\left[\frac{2\pi}{4N}(2n+1)2i\right] \tag{7}$$

which is a type-II DCT with length $N/2$. Noting that $D(i)$ has the symmetric property

$$D(0) = D(-1) \tag{8}$$

and the recursive property

$$D(i) = D'(i) - D(i-1) \tag{9}$$

we obtain

$$D'(0) = 2D(0). \tag{10}$$

Using the recursion formula in (9), $D(i)$ can be evaluated for every $i$ with the DCT defined in (7).

III. ARITHMETIC COMPLEXITY

The proposed algorithm decomposes the DCT with length $N$ into two DCT's with length $N/2$. Additional $N/2$ real multiplications and $3N/2 - 1$ real additions are required for this decomposition. The arithmetic complexities are therefore given by

$$RM(N) = \frac{N}{2} + 2\,RM\!\left(\frac{N}{2}\right)$$

$$RA(N) = \frac{3N}{2} - 1 + 2\,RA\!\left(\frac{N}{2}\right).$$

Since the decomposition is true for any even-length input sequence, the algorithm can be applied $\log_2 N_1$ times. The total arithmetic complexities are given by

$$RM(N) = N_1\, RM(N_2) + \frac{N}{2} \log_2 N_1 \tag{11}$$

$$RA(N) = N_1\, RA(N_2) + \frac{3N}{2} \log_2 N_1 - N_1 + 1 \tag{12}$$

where $RA$ and $RM$ stand for real additions and real multiplications, respectively.

A. Radix 2 DCT Computation

If the input sequence is radix 2 in length, i.e., $N = 2^m$, a direct application of the proposed recursive algorithm will result in arithmetic complexity

$$RM(2^m) = m2^{m-1}, \qquad RA(2^m) = 3m2^{m-1} - 2^m + 1$$

which coincide with the arithmetic complexity of the well-known radix-2 DCT algorithms in the literature. Table I provides a list of the arithmetic operations required in various radix-2 DCT algorithms.

TABLE I. ARITHMETIC OPERATIONS REQUIRED IN VARIOUS DCT ALGORITHMS FOR RADIX-2 INPUT SEQUENCES (REAL MULTIPLICATIONS AND REAL ADDITIONS)

B. $N = 2^m N_2$; Even-Length DCT Computation
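The decomposition of Section II and the operation counts of Section III can be sketched as follows. This is a minimal illustration rather than the author's implementation: the names `dct2_direct` and `dct2_recursive` are hypothetical, odd-length subproblems fall back to the direct sum of (1) instead of an FFT-based method and their operations are not counted, the factor $2\cos(\cdot)$ in (6) is treated as a precomputed constant, and the halving implied by (10) is assumed not to count as a multiplication.

```python
import math


def dct2_direct(x):
    """Unnormalized type-II DCT from definition (1); O(N^2)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N))
            for n in range(N))
        for k in range(N)
    ]


def dct2_recursive(x):
    """Return (X, mults, adds) for the unnormalized type-II DCT of x.

    Even lengths are split as in (4) and (7). Odd lengths use the direct
    sum as a stand-in, and their operations are NOT counted (assumption).
    """
    N = len(x)
    if N % 2 == 1:
        return dct2_direct(x), 0, 0
    half = N // 2

    # p(n) = x(n) + x(N-1-n): N/2 additions.
    p = [x[n] + x[N - 1 - n] for n in range(half)]

    # q(n) = (x(n) - x(N-1-n)) * 2cos(.), eq. (6): N/2 additions and
    # N/2 multiplications (2cos(.) is regarded as one stored constant).
    twiddle = [2 * math.cos(2 * math.pi * (2 * n + 1) / (4 * N))
               for n in range(half)]
    q = [(x[n] - x[N - 1 - n]) * twiddle[n] for n in range(half)]

    C, mc, ac = dct2_recursive(p)   # even-indexed outputs, eq. (4)
    Dp, md, ad = dct2_recursive(q)  # D'(i), eq. (7)

    D = [Dp[0] / 2]                 # eq. (10); halving not counted
    for i in range(1, half):
        D.append(Dp[i] - D[i - 1])  # eq. (9): N/2 - 1 additions

    X = [0.0] * N
    X[0::2] = C
    X[1::2] = D
    return X, mc + md + half, ac + ad + 3 * half - 1
```

Under these assumptions, a length-8 input yields 12 multiplications and 29 additions, which agrees with $m2^{m-1}$ and $3m2^{m-1} - 2^m + 1$ for $m = 3$.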
For any even-length input sequence, the proposed algorithm computes the DCT by recursively decomposing the problem into two lower order problems of length $2^{m-1} N_2$. This recursive process is repeated $m$ times until eventually, only odd-length DCT's of length $N_2$ remain. As shown by Chan [1] and Heideman [2], the odd-length DCT can be computed by the real-valued DFT of the same length by means of index mapping with permutations and sign changes only. Since the real-valued DFT can be computed by efficient real-valued FFT algorithms, such as the prime-factor algorithm, radix-$p$ algorithm, and others [26], the proposed algorithm can make use of an existing FFT algorithm to compute the odd-length DCT's of the remaining length-$N_2$ sequences. The proposed algorithm thus acts like a prime factor DCT algorithm with one of the prime factors being $2^m$. However, the arithmetic complexity of the proposed algorithm is much lower than that of the existing prime-factor algorithms, and the recursive structure is much more regular and modular than that of the prime factor DCT algorithm.

As an example, consider the arithmetic complexity of various DCT algorithms for computing an input sequence with length $6^m = 2^m 3^m$. A radix-3 FFT algorithm would require arithmetic complexity equal to

$$RM(3^m) = 2m\,3^m, \qquad RA(3^m) = 8m\,3^{m-1} - 1$$

to compute an input sequence with length $3^m$. Using the input mapping as proposed by Chan [1], the odd-length DCT can be computed with the same arithmetic complexity as the FFT. Thus, by (11) and (12), the overall computational complexity of the proposed algorithm for the radix-6 input sequence equals

$$RM(6^m) = 2m\,6^m + \frac{1}{2}\, m\,6^m$$

and

$$RA(6^m) = \frac{8}{3}\, m\,6^m + \frac{3}{2}\, m\,6^m - 2^{m+1} + 1.$$

A comparison of the arithmetic complexity of various algorithms used to compute the radix-6 sequence is listed in Table II, including the specially designed radix-6 algorithm by Chan [23] and the prime factor algorithm of Lee [13]. Notice that the radix-3 algorithm derived by Chan [23] is incorporated in Lee's algorithm for optimal performance (the interested reader can refer to Chan [23] for a detailed comparison). The number of arithmetic operations for an $N = 6^3 = 216$-point DCT is also listed in Table II. Table II shows that the proposed algorithm requires a smaller number of arithmetic operations. Besides, the simple recursive structure of the proposed algorithm provides a very regular and simple solution to the problem when compared with others.

TABLE II. ARITHMETIC OPERATIONS REQUIRED IN VARIOUS DCT ALGORITHMS FOR RADIX-6 INPUT SEQUENCES (REAL MULTIPLICATIONS, REAL ADDITIONS, COMPLEX MULTIPLICATIONS, AND COMPLEX ADDITIONS)

IV. MAPPING TYPE-IV DCT TO TYPE-II DCT

The type-IV DCT for a sequence $x(n)$ with length $N$ is defined as

$$X_c^{IV}(k) = \sum_{n=0}^{N-1} x(n) \cos\left[\frac{2\pi}{8N}(2n+1)(2k+1)\right], \qquad k \in [0, N-1]. \tag{13}$$

Noting that $X_c^{IV}(k)$ has the symmetric property

$$X_c^{IV}(0) = X_c^{IV}(-1) \tag{14}$$

we consider $X_c^{IV}(k) + X_c^{IV}(k-1)$:

$$
\begin{aligned}
X_c^{IV}(k) + X_c^{IV}(k-1) &= \sum_{n=0}^{N-1} x(n) \left\{ \cos\left[\frac{2\pi}{8N}(2n+1)(2k+1)\right] + \cos\left[\frac{2\pi}{8N}(2n+1)(2k-1)\right] \right\} \\
&= \sum_{n=0}^{N-1} 2x(n) \cos\left[\frac{2\pi}{8N}(2n+1)\right] \cos\left[\frac{2\pi}{8N}(2n+1)2k\right]
\end{aligned} \tag{15}
$$

since $\cos(A+B) + \cos(A-B) = 2\cos A \cos B$ with

$$A = \frac{2\pi}{8N}(2n+1), \qquad B = \frac{2\pi}{8N}(2n+1)2k.$$

Define a new sequence $r(n)$

$$r(n) = 2x(n) \cos\left[\frac{2\pi}{8N}(2n+1)\right] \tag{16}$$

and substituting (16) into (15), we get

$$X_c^{IV}(k) + X_c^{IV}(k-1) = \sum_{n=0}^{N-1} r(n) \cos\left[\frac{2\pi}{4N}(2n+1)k\right] = X_{c,r(n)}(k)$$

$$X_c^{IV}(k) = X_{c,r(n)}(k) - X_c^{IV}(k-1). \tag{17}$$

Using the recursion formula in (17) and the initialization condition in (14), the type-IV DCT defined in (13) can be computed for every $k$. The recursive formula in (17) thus computes the type-IV DCT from the type-II DCT of the same length. Additional $N$ real multiplications and $N - 1$ additions are required. The arithmetic complexities are given by

$$RM(N) = RM(\mathrm{DCT}(N)) + N, \qquad RA(N) = RA(\mathrm{DCT}(N)) + N - 1$$

where $RM(\mathrm{DCT}(N))$ and $RA(\mathrm{DCT}(N))$ stand for the real multiplications and real additions required in (11) and (12), respectively, that is, for the computation of the type-II DCT of a length-$N$ sequence.
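The mapping in (14)–(17) can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, both transforms are unnormalized as in (1) and (13), and a direct $O(N^2)$ type-II DCT stands in for whatever fast type-II algorithm would be used in practice.

```python
import math


def dct2_direct(x):
    """Unnormalized type-II DCT from definition (1)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N))
            for n in range(N))
        for k in range(N)
    ]


def dct4_direct(x):
    """Unnormalized type-IV DCT from definition (13), used as a reference."""
    N = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * (2 * k + 1) / (8 * N))
            for n in range(N))
        for k in range(N)
    ]


def dct4_via_dct2(x, dct2=dct2_direct):
    """Type-IV DCT computed from one type-II DCT of the same length."""
    N = len(x)
    # r(n) = 2 x(n) cos[(2*pi/8N)(2n+1)], eq. (16): N multiplications.
    r = [2 * x[n] * math.cos(2 * math.pi * (2 * n + 1) / (8 * N))
         for n in range(N)]
    R = dct2(r)  # X_{c,r(n)}(k)
    # Eq. (14) with (17) at k = 0 gives 2 X(0) = R(0).
    X = [R[0] / 2]
    for k in range(1, N):
        X.append(R[k] - X[k - 1])  # eq. (17): N - 1 additions
    return X
```

Passing a fast type-II routine as the `dct2` argument leaves the mapping unchanged; only the $N$ multiplications for $r(n)$ and the $N - 1$ subtractions of the recursion are added on top of it.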
V. CONCLUSIONS

An efficient algorithm for computing DCT's was presented. Coupled with a direct decomposition of the DCT kernel, a regular and efficient method for computing the even-length DCT was obtained. The proposed algorithm is considered to be a generalization of the recursive radix-2 algorithms in the literature. The algorithm is recursive in nature and is very regular and modular. Thus, it is very suitable for hardware implementations. The balanced structure allows the design of fast parallel algorithms. The computational complexity of the proposed algorithm is presented and is compared with other algorithms in the literature. The results show that the proposed algorithm requires a smaller number of arithmetic operations than the others. In the special case where the input sequence is a radix-2 sequence, the arithmetic complexity of the proposed algorithm coincides with that of other well-known radix-2 algorithms. A mapping for computing the type-IV DCT by means of the type-II DCT of the same length is presented. This enables direct applications of the algorithm to modulated filter banks. By applying the Kronecker matrix product representation of the DCT kernel, the proposed algorithm is readily generalized to the vector-radix algorithm for higher dimension DCT computation (interested readers should refer to the Appendix for higher dimension DCT computation).

APPENDIX
HIGHER DIMENSION DCT COMPUTATIONS

Due to the separability of the DCT kernel, the 2-D DCT can be computed by using the row-column method. The algorithm computes a 2-D DCT by taking a 1-D DCT row-wise and column-wise. However, the vector-radix approach may be preferred due to its lower arithmetic complexity. Chan [27] and Wu [24] independently developed a vector radix algorithm to generalize Hou's [3] algorithm to 2-D. The derivation of the algorithm is based on the application of the Kronecker matrix product as a construction tool and the sequential splitting method for proving the correctness of the algorithm.
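The row-column method mentioned at the start of this Appendix can be sketched as follows; it illustrates separability only and is not the vector-radix algorithm. The function names are hypothetical, the transform is unnormalized as in (1), and a direct 1-D DCT is assumed in place of a fast one.

```python
import math


def dct2_1d(x):
    """Unnormalized 1-D type-II DCT from definition (1)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(2 * math.pi * (2 * n + 1) * k / (4 * N))
            for n in range(N))
        for k in range(N)
    ]


def dct2_2d_row_column(a):
    """2-D DCT: 1-D DCT of every row, then 1-D DCT of every column."""
    row_pass = [dct2_1d(row) for row in a]
    col_pass = [dct2_1d(list(col)) for col in zip(*row_pass)]
    # col_pass is indexed [k2][k1]; transpose back to [k1][k2].
    return [list(r) for r in zip(*col_pass)]


def dct2_2d_direct(a):
    """Reference 2-D DCT evaluated as a double sum over the separable kernel."""
    N1, N2 = len(a), len(a[0])
    return [
        [
            sum(
                a[n1][n2]
                * math.cos(2 * math.pi * (2 * n1 + 1) * k1 / (4 * N1))
                * math.cos(2 * math.pi * (2 * n2 + 1) * k2 / (4 * N2))
                for n1 in range(N1)
                for n2 in range(N2)
            )
            for k2 in range(N2)
        ]
        for k1 in range(N1)
    ]
```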
As pointed out by Wu [24], any DCT algorithm based on matrix decomposition can be generalized to compute 2-D DCT by the tensor product property [29], i.e., Kronecker matrix product formulation. Since the proposed algorithm is based on DCT coefficient matrix decomposition, following the formulation as introduced by Wu [24] and Chan [27], a vector radix algorithm for higher dimensional DCT computation is readily obtained. A detailed mathematical derivation and proof of the multidimensional DCT formulation can be found in [27] and [29].

REFERENCES

[1] S. C. Chan and K. L. Ho, “Fast algorithm for computing the discrete cosine transform,” IEEE Trans. Circuits Syst. II, vol. 44, pp. 185–190, Mar. 1993.
[2] M. T. Heideman, “Computation of an odd-length DCT from a real-valued DFT of the same length,” IEEE Trans. Signal Processing, vol. 40, pp. 54–61, Jan. 1992.
[3] H. S. Hou, “A fast recursive algorithm for computing the discrete cosine transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1445–1461, Oct. 1987.
[4] B. G. Lee, “A new algorithm for computing the discrete cosine transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243–1245, Dec. 1984.
[5] P. Z. Lee and F. Y. Hung, “A new method to design recursive algorithms for computing the 1-D and 2-D DCT’s,” in Proc. 1992 Digital Signal Processing Workshop, IL, Sept. 1992, pp. 3.9.1–3.9.2.
[6] K. R. Rao and P. Yip, Discrete Cosine Transform, Algorithms, Advantage, Applications. New York: Academic, 1990.
[7] S. C. Chan, “Fast transform algorithms and their applications,” Ph.D. dissertation, Univ. Hong Kong, 1991.
[8] S. C. Chan and K. L. Ho, “A new 2-D fast cosine transform algorithm,” IEEE Trans. Signal Processing, vol. 38, pp. 481–485, Feb. 1991.
[9] P. Z. Lee and F. Y. Huang, “Restructured recursive DCT and DST algorithms,” IEEE Trans. Signal Processing, vol. 42, pp. 1600–1609, July 1994.
[10] P. Z. Lee and F. Y. Huang, “An efficient prime-factor algorithm for the discrete cosine transform and its hardware implementation,” IEEE Trans. Signal Processing, vol. 42, pp. 1996–2005, Aug. 1994.
[11] S. C. Chan and K. L. Ho, “Prime factor algorithms for computing the discrete cosine transform,” IEEE Trans. Circuits Syst. II, vol. 43, pp. 185–190, Mar. 1992.
[12] P. P. Yang, N. J. Narasimha, and B. G. Lee, “Prime factor decomposition of the discrete cosine transform and its hardware realization,” in Proc. IEEE ICASSP-85, 1985, pp. 772–775.
[13] B. G. Lee, “Input and output index mapping for a prime-factor decomposed computation of discrete cosine transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 237–244, Feb. 1989.
[14] Z. Cvetkovic and M. V. Popovic, “New fast recursive algorithms for the computation of discrete cosine and sine transforms,” IEEE Trans. Signal Processing, vol. 40, pp. 2083–2086, Aug. 1992.
[15] O. Ersoy and N. C. Hu, “A unified approach to the fast computation of all discrete trigonometric transforms,” in Proc. IEEE ICASSP-87, 1987, pp. 1843–1846.
[16] P. Yip and K. R. Rao, “Fast decimation-in-time algorithms for a family of discrete sine and cosine transforms,” Circuits Syst., Signal Processing, vol. 3, pp. 387–408, 1984.
[17] M. Vetterli and H. J. Nussbaumer, “Simple FFT and DCT algorithm with reduced number of operations,” Signal Processing, vol. 6, no. 4, pp. 267–278, 1984.
[18] F. A. Kamangar and K. R. Rao, “Fast algorithms for the 2-D discrete cosine transform,” IEEE Trans. Comput., pp. 899–906, 1982.
[19] Z. Wang, “On computing the discrete Fourier and cosine transforms,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1341–1344, 1985.
[20] H. S. Malvar, “Fast computation of discrete cosine transform through fast Hartley transform,” Electron. Lett., vol. 22, no. 7, pp. 352–353, Mar. 1986.
[21] H. S. Malvar, “Corrections to ‘Fast computation of the discrete cosine transform and the discrete Hartley transform,’ ” IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 610–612, Apr. 1988.
[22] H. D. Yun and S. U. Lee, “On the fixed-point-error analysis of several fast DCT algorithms,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 27–41, Feb. 1993.
[23] Y. H. Chan and W. C. Siu, “Mixed-radix discrete cosine transform,” IEEE Trans. Signal Processing, vol. 41, pp. 3157–3160, Nov. 1993.
[24] H. R. Wu and F. J. Paoloni, “A two-dimensional fast cosine transform algorithm based on Hou’s approach,” IEEE Trans. Signal Processing, vol. 39, Feb. 1991.
[25] M. Vetterli, P. Duhamel, and C. Guillemot, “Tradeoffs in the computation of mono and multidimensional DCT’s,” in Proc. ICASSP-89, 1989, pp. 999–1002.
[26] H. V. Sorensen, M. T. Heideman, and C. S. Burrus, “A split-radix real-valued fast Fourier transform,” in Proc. ICASSP-84, 1984, pp. 28A.7.1–28A.7.4.
[27] S. C. Chan and K. L. Ho, “Prime factor real-valued Fourier, cosine and Hartley transforms,” in Proc. Signal Processing VI, 1992, pp. 1045–1048.
[28] N. J. Narasimha and A. M. Peterson, “On the computation of the discrete cosine transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1415–1424, Sept. 1989.
[29] H. R. Wu and F. J. Paoloni, “The structure of vector radix fast Fourier transforms,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1415–1424, Sept. 1989.
[30] P. Duhamel, “Implementation of ‘Split-radix’ FFT algorithms for complex, real and real-symmetric data,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, pp. 285–295, Apr. 1986.
[31] N. I. Cho and S. U. Lee, “DCT algorithms for VLSI parallel implementations,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 121–127, Jan. 1990.