910
IEEE SIGNAL PROCESSING LETTERS, VOL. 19, NO. 12, DECEMBER 2012
Fast Conversion Algorithm for the Dolby Digital (Plus) AC-3 Audio Coding Standards
Vladimir Britanak, Member, IEEE
Abstract—The Dolby Digital (AC-3) and the Dolby Digital Plus or Enhanced AC-3 (E-AC-3) systems are currently the key enabling technologies for high-quality compression of digital audio signals. For the time-to-frequency transformation of an audio data block and vice versa, both systems have adopted the modified discrete cosine transform (MDCT) as the long transform. AC-3 additionally defines two variants of cosine-modulated filter banks, called the first and second short transforms. A fast conversion algorithm is presented that converts the frequency coefficients of the long (MDCT) transform to those of the two short transforms and vice versa, directly in the frequency domain. It is based on a block sparse matrix factorization of a conversion matrix. The fast conversion algorithm is efficient in terms of structural simplicity, arithmetic complexity, and memory requirements compared to the straightforward conversion methods. Moreover, the existing AC-3 fast computational modules may simply be re-used in the conversion procedures. Consequently, E-AC-3 to AC-3 bit stream conversion and AC-3 to E-AC-3 bit stream transcoding can be realized in a simplified and efficient way, minimizing the amount of partial decoding/encoding and the memory requirements during the conversion and transcoding processes.

Index Terms—AC-3 analysis/synthesis filter banks, AC-3 to E-AC-3 transcoder, block sparse matrix factorization, Dolby Digital (AC-3), Dolby Digital Plus or Enhanced AC-3 (E-AC-3), E-AC-3 to AC-3 conversion, fast conversion algorithm.
Manuscript received August 20, 2012; revised October 16, 2012; accepted October 17, 2012. Date of publication October 22, 2012; date of current version October 29, 2012. This work was supported in part by the Slovak Scientific Agency VEGA under Project 2/0129/10. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Muhammad Zubair Ikram. The author is with the Institute of Informatics, Slovak Academy of Sciences, 845 07 Bratislava, Slovak Republic (e-mail: [email protected]). Digital Object Identifier 10.1109/LSP.2012.2226028

I. INTRODUCTION

The Dolby Digital (AC-3) [1], [2] and the Dolby Digital Plus or Enhanced AC-3 (E-AC-3) [3]–[5] systems are currently the key enabling technologies for high-quality compression of digital audio signals. AC-3 is currently used in a number of standard applications in consumer electronics, including the North American HDTV standard, the DVD-Video standard, and the Digital Video Broadcasting (DVB) standard. E-AC-3 is essentially an advanced version of AC-3 that provides increased coding efficiency and flexibility, a wider range of supported bit rates, and expanded channel formats and reproduction circumstances, while preserving a high level of compatibility and interoperability with the AC-3 system [4], [5]. For the time-to-frequency transformation of an audio data block and vice versa, both AC-3 and E-AC-3 have adopted the modified discrete cosine transform (MDCT) as the long transform. AC-3 additionally defines two variants of cosine-modulated filter banks, called the first and second short transforms. For the efficient implementation of the long (MDCT) and the two short transforms, AC-3 and E-AC-3 employ a reconfigurable fast algorithm based on the type-IV discrete cosine transform (DCT-IV) of half size, which is subsequently mapped into the complex DFT of quarter size [6].

Although E-AC-3 bit streams are similar in nature to AC-3 ones (they use the same MDCT filter bank, bit-allocation process, and framing structure), they are not backward compatible, i.e., they cannot be decoded by AC-3 decoders [3]. Therefore, Dolby Laboratories developed an efficient method to convert an E-AC-3 bit stream to an AC-3 one, the so-called E-AC-3 to AC-3 conversion, to ensure compatibility with the large installed base of AC-3 decoders. The conversion procedure is designed to minimize loss in audio quality while keeping the complexity at a level suitable for low-cost consumer devices [4]. Conversely, the so-called AC-3 to E-AC-3 transcoding is used to distribute 5.1-channel audio content that has already been encoded in AC-3: the E-AC-3 bit stream is created by transcoding an AC-3 bit stream to an E-AC-3 bit stream at a lower bit rate [4].

Recently, based on the matrix representation of the AC-3 filter banks and the relations among the transform matrices, a relation between the frequency coefficients of the long (MDCT) transform and those of the two short transforms has been derived [8]. Given the frequency coefficients of the long (MDCT) transform, the frequency coefficients of the two AC-3 short transforms can be obtained from them simply via a conversion matrix.
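The mapping mentioned above, a DCT-IV of half the block size computed through a complex DFT of a quarter of the block size, can be illustrated with a short sketch. It assumes the unnormalized DCT-IV kernel $\cos[\pi(n+1/2)(k+1/2)/M]$ and uses one standard even/odd folding with pre- and post-twiddles; the function names `dct4_direct` and `dct4_fft` are hypothetical, and the twiddle conventions in actual AC-3 implementations may differ. For a 512-sample long block (the AC-3 block length), $M = 256$ and the complex FFT has 128 points.

```python
import numpy as np


def dct4_direct(x):
    """Reference O(M^2) DCT-IV: X[k] = sum_n x[n] cos(pi/M (n + 1/2)(k + 1/2))."""
    x = np.asarray(x, dtype=float)
    M = len(x)
    n = np.arange(M)
    kernel = np.cos(np.pi / M * np.outer(n + 0.5, n + 0.5))
    return kernel @ x


def dct4_fft(x):
    """M-point DCT-IV (same scaling as dct4_direct) via one M/2-point complex FFT.

    M must be even. Sketch only: production codecs arrange the twiddles differently.
    """
    x = np.asarray(x, dtype=float)
    M = len(x)
    if M % 2:
        raise ValueError("DCT-IV length must be even")
    m = np.arange(M // 2)
    # Fold M real inputs into M/2 complex samples:
    # real part x[2m], imaginary part x[M-1-2m].
    v = x[0::2] + 1j * x[M - 1::-2]
    # Pre-twiddle, M/2-point complex FFT, post-twiddle.
    u = v * np.exp(-1j * np.pi * (4 * m + 1) / (4 * M))
    w = np.fft.fft(u) * np.exp(-1j * np.pi * m / M)
    # Unfold: X[2k] from the real part, X[M-1-2k] from the negated imaginary part.
    X = np.empty(M)
    X[0::2] = w.real
    X[M - 1::-2] = -w.imag
    return X


# Example: a 256-point DCT-IV computed with a 128-point complex FFT.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
print(np.allclose(dct4_fft(x), dct4_direct(x)))
```

The folding step is what halves the FFT length: each complex FFT input carries one even-indexed and one odd-indexed sample, and each complex output yields one even-indexed and one odd-indexed DCT-IV coefficient.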
1070-9908/$31.00 © 2012 IEEE
BRITANAK: FAST CONVERSION ALGORITHM

Since the conversion matrix, after proper scaling, is an orthonormal matrix with a very regular general block structure, the frequency coefficients of the short transforms can be converted to those of the long (MDCT) transform via the transposed conversion matrix. However, the conversion procedures defined in matrix-vector form are still computationally intensive. An open problem was therefore posed in [8]: the existence of a generalized block sparse matrix factorization of the conversion matrix that would define a fast conversion algorithm.

In this letter, a fast conversion algorithm for the Dolby Digital (Plus) AC-3 audio coding standards is presented that converts the frequency coefficients of the long (MDCT) transform to those of the two short transforms and vice versa, directly in the frequency domain. The fast conversion algorithm is based on a block sparse matrix factorization of the conversion matrix. It is efficient in terms of structural simplicity, arithmetic complexity, and memory requirements compared to the straightforward conversion methods. Moreover, the existing AC-3 fast computational modules may simply be re-used in the conversion procedures. Consequently, E-AC-3 to AC-3 bit stream conversion and AC-3 to E-AC-3 bit stream transcoding can be realized in a simplified and efficient way, minimizing the amount of partial decoding/encoding and the memory requirements during the conversion and transcoding processes.
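The introduction notes that conversion via the conversion matrix and its transpose works but is computationally intensive. The sketch below makes that concrete by materializing a dense stand-in for the scaled conversion matrix and applying it in both directions. Because the element formula of the conversion matrix is not reproduced in this text, the sketch assumes the DCT-IV-based block factorization described in Section III, stacks the first short-transform block above the second, and ignores the AC-3 normalization factor; `dct4_matrix` and `conversion_matrix` are hypothetical helpers, not part of any AC-3 API. What it illustrates is generic: for an orthonormal matrix the backward conversion is a product with the transpose, and each dense product with an $(N/2)\times(N/2)$ matrix costs $(N/2)^2$ multiplications.

```python
import numpy as np


def dct4_matrix(M):
    """Orthonormal DCT-IV matrix of order M (symmetric and its own inverse)."""
    n = np.arange(M) + 0.5
    return np.sqrt(2.0 / M) * np.cos(np.pi / M * np.outer(n, n))


def conversion_matrix(N):
    """Hypothetical stand-in for the scaled conversion matrix of order N/2.

    Assumes the factorization described in Section III: an N/2-point DCT-IV,
    an exchange of the two halves, then an N/4-point DCT-IV on each half.
    """
    if N % 4:
        raise ValueError("N must be divisible by 4")
    Q = N // 4
    Z, I = np.zeros((Q, Q)), np.eye(Q)
    C_q = dct4_matrix(Q)
    block_diag = np.block([[C_q, Z], [Z, C_q]])
    exchange = np.block([[Z, I], [I, Z]])
    return block_diag @ exchange @ dct4_matrix(2 * Q)


N = 64                                  # long-transform size (divisible by 4)
T = conversion_matrix(N)                # dense (N/2) x (N/2) matrix
rng = np.random.default_rng(1)
X_long = rng.standard_normal(N // 2)    # long (MDCT) coefficients
Y = T @ X_long                          # forward conversion: (N/2)^2 multiplications
Y1, Y2 = Y[: N // 4], Y[N // 4 :]       # first / second short-transform blocks
X_back = T.T @ Y                        # backward conversion via the transpose
print(np.allclose(X_back, X_long), (N // 2) ** 2)
```

Storing `T` densely also costs $(N/2)^2$ memory locations (half of that if only the unique upper half is kept), which is the overhead the fast factorization removes.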
II. BASIC FACTS: DEFINITIONS AND NOTATIONS

Definitions and basic symmetry properties of the AC-3 analysis and synthesis filter banks are presented in [8]. Each input audio block is windowed by a customized symmetric Kaiser-Bessel-derived (KBD) window function before its transformation. With respect to the definitions of the AC-3 transforms, it is assumed that $N$ denotes the size of the long (MDCT) transform, where $N$ is an integer divisible by 4, and $N/2$ denotes the size of the two short transforms. For the efficient computation of the $N$-point forward and backward long (MDCT) transforms, the AC-3 and E-AC-3 codecs have adopted an $N/2$-point DCT-IV-based fast algorithm, which is mapped into the identical $N/4$-point forward complex FFT module. Similarly, for the efficient computation of the $N/2$-point forward and backward short transforms, the AC-3 codec has adopted an $N/4$-point DCT-IV fast algorithm, which is mapped into the identical $N/8$-point forward complex FFT module [1]. The lowest achievable arithmetic complexity of a DCT-IV computed via the complex DFT of half its size, expressed in real multiplications and real additions, is well known [7].

A. Conversion Matrix and Its Properties

Based on the matrix representation of the AC-3 filter banks and the relations among the transform matrices, the conversion matrix has been derived in [8]. Its elements are given by (1), which gives one expression for the elements in the upper half of the matrix and another for the elements in the lower half, both involving the KBD windowing function. Since the KBD windowing function is symmetric and satisfies the so-called perfect reconstruction conditions [8], the window-dependent expression under the sum in (1) can be eliminated. This means that the elements of the conversion matrix do not depend on the KBD windowing function. Equation (1) also implies that the elements in the lower half of the conversion matrix are the same in magnitude as those in the upper half, apart from their reverse ordering and appropriate sign changes. Hence, only the elements in the upper half are unique, and only that half needs to be pre-computed and stored, i.e., $N^2/8$ elements.

The conversion matrix possesses the regular general block structure given in (2) [8], where the blocks are square sub-matrices, both of order $N/4$, the superscript $T$ denotes transposition, and the sign-changing matrix of order $N/4$, with alternating $\pm 1$ elements on the opposite main diagonal, is defined in (3). If the data block lengths are powers of 2, the sign-changing factor in (2) can be removed. In general, the product of the conversion matrix and its transpose satisfies relation (4) [8], in which the identity matrix is of order $N/2$. From (4) it follows that the conversion matrices are orthogonal. If they are properly scaled, they become orthonormal, with unit determinant. The scaling factor can simply be absorbed into (1).

B. Conversion Procedures

Denote the frequency coefficients of the long (MDCT) transform by an $N/2$-point vector, and those of the first and second short transforms by two $N/4$-point vectors. Given the frequency coefficients of the long (MDCT) transform, they can be converted to the frequency coefficients of the two short transforms via the conversion matrix according to (5) [8]. To obtain the true frequency coefficients of the two short transforms, the converted values have to be rescaled by the appropriate factor after the conversion. Using the orthonormality property of the scaled conversion matrix, from (5) we directly obtain (6); hence, the frequency coefficients of the two short transforms can be converted to those of the long (MDCT) transform via the
transposed conversion matrix. The matrix-vector products (5) and (6) require numbers of multiplications and additions that grow quadratically with the transform size, as well as memory to store the unique elements of the conversion matrix.

Fig. 1. Block diagram of the fast in-place conversion algorithm.

III. FAST CONVERSION ALGORITHM

Now, let us investigate in detail the explicit forms of the conversion matrix defined by (1) and properly scaled, for small sizes (including 4 and 8). For the smallest size, the explicit scaled matrix is given in [8], while for the next size the conversion matrix is factorized into a product of block matrices [8].
The key to the derivation of a fast conversion algorithm is the observation that, when compared with the explicit orthonormal forms of the DCT-IV matrices presented in [9], the properly scaled factored (block) matrices on the right-hand sides of these factorizations are in fact orthonormal DCT-IV matrices, but with their lower and upper halves exchanged (see the rightmost matrices on the right-hand sides). Hence, both conversion matrices may be represented by the following block sparse matrix factorizations:
where the identity matrix is of order 2. Investigating a larger conversion matrix by the same procedure, we find that it may be represented by a similar block sparse matrix factorization.
As a result, for general $N$, scaling the conversion matrix produces its generalized block sparse matrix factorization

$$\widetilde{\mathbf{T}} = \begin{bmatrix} \mathbf{C}^{IV}_{N/4} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}^{IV}_{N/4} \end{bmatrix} \begin{bmatrix} \mathbf{0} & \mathbf{I}_{N/4} \\ \mathbf{I}_{N/4} & \mathbf{0} \end{bmatrix} \mathbf{C}^{IV}_{N/2}, \qquad (7)$$

where $\widetilde{\mathbf{T}}$ denotes the scaled conversion matrix, $\mathbf{C}^{IV}_{N/2}$ and $\mathbf{C}^{IV}_{N/4}$ are orthonormal DCT-IV matrices of order $N/2$ and $N/4$, respectively, $\mathbf{I}_{N/4}$ is the identity matrix of order $N/4$, and the $\mathbf{0}$ are zero matrices. Equations (5) and (7) define the fast conversion algorithm. The corresponding block diagram of the fast in-place conversion algorithm is shown in Fig. 1. Given the frequency coefficients of the long (MDCT) transform, they are first transformed by the $N/2$-point DCT-IV; then the two halves of the output vector are exchanged; finally, each half is transformed separately by the $N/4$-point DCT-IV. The resulting frequency coefficients of the two short transforms are then normalized by the appropriate scaling factor. Note that the memory otherwise needed to store half of the conversion matrix is completely eliminated.

Since the orthonormal DCT-IV matrices are symmetric and hence self-inverse [9], transposing (7) yields the generalized block sparse matrix factorization of the transposed conversion matrix:

$$\widetilde{\mathbf{T}}^{T} = \mathbf{C}^{IV}_{N/2} \begin{bmatrix} \mathbf{0} & \mathbf{I}_{N/4} \\ \mathbf{I}_{N/4} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{C}^{IV}_{N/4} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}^{IV}_{N/4} \end{bmatrix}. \qquad (8)$$

Equations (6) and (8) define the fast conversion algorithm that converts the frequency coefficients of the two short transforms to those of the long (MDCT) transform; that is, the block diagram in Fig. 1 is traversed in the reverse direction. The resulting frequency coefficients of the long (MDCT) transform are normalized by the appropriate scaling factor.

In general, the arithmetic complexity of the fast conversion algorithm is that of one $N/2$-point DCT-IV plus that of two $N/4$-point DCTs-IV. Using a fast DCT-IV algorithm having the lowest achievable arithmetic complexity [7], the fast conversion algorithm requires $O(N\log_2 N)$ real multiplications and real additions. Compared with the matrix-vector products (5) and (6), whose numbers of multiplications and additions grow quadratically with $N$ and which need memory for half of the conversion matrix [8], the fast conversion algorithm is superior in terms of both arithmetic complexity and memory requirements.

IV. DISCUSSION
We recall that, for the time-to-frequency transformation of an audio data block and vice versa, both AC-3 and E-AC-3 use the identical long (MDCT) transform, while AC-3 additionally uses the two short transforms when a transient signal is detected. The use of different filter banks has an impact on E-AC-3 to/from AC-3 bit stream conversion/transcoding. E-AC-3 always employs long data blocks. Therefore, in E-AC-3 to AC-3 bit stream conversion, when a long data block contains a transient signal, the frequency coefficients of the long (MDCT) transform have to be converted to those of the two short transforms for AC-3 in order to cancel pre-echo effects. A straightforward conversion method involves three backward long transforms (of the previous, current, and next blocks), three windowing procedures, two overlap/add procedures, and two forward short transforms; its cost in real multiplications, real additions, and memory locations is the sum of the costs of all these stages. Conversely, in AC-3 to E-AC-3 bit stream transcoding, when the frequency coefficients of the two AC-3 short transforms are available, they have to be converted to those of the long (MDCT) transform for E-AC-3. The straightforward method involves two backward short transforms plus two backward long transforms in the worst case (when the previous and next blocks are long), or two backward short transforms in the best case (when the previous and next blocks are short), three windowing procedures, two overlap/add procedures, and finally one forward long (MDCT) transform. In the worst case this requires a number of real multiplications and real additions similar to that of the previous method, fewer in the best case, and the same number of memory locations in both cases. The fast in-place conversion algorithm may be applied to E-AC-3 to/from AC-3 bit stream conversion/transcoding directly in the frequency domain without partial decoding/encoding, thus saving more than 50% of the total arithmetic operations and completely eliminating the memory requirements of the straightforward conversion methods. Moreover, the relationship between the long (MDCT) transform and the two short transforms [8] guarantees that no errors are introduced during E-AC-3 to/from AC-3 bit stream conversion/transcoding.

In summary, the fast conversion algorithm has the following advantages:
• It does not depend on the KBD windowing function.
• It simplifies the implementation of AC-3 filter banks [8].
• It is efficient in terms of structural simplicity, arithmetic complexity, and memory requirements.
• Conversion procedures can be realized directly in the frequency domain without partial decoding and encoding.
• Although many fast algorithms for the DCT-IV computation are available [9], [10], the existing AC-3 fast computational modules, i.e., the forward complex FFT modules of the long and short transforms, may simply be re-used in the conversion procedures.
• It minimizes the amount of partial decoding/encoding and the memory requirements during the conversion and transcoding processes.
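The re-use of the existing FFT modules can be illustrated with a sketch of the block diagram in Fig. 1, under stated assumptions: each DCT-IV stage is orthonormal and computed through a half-size complex FFT, so a long block of $N/2 = 256$ coefficients uses one 128-point FFT and two 64-point FFTs. The AC-3 normalization factor and the exact coefficient ordering of the bit stream are not modeled, and the names `dct4_ortho`, `exchange_halves`, `long_to_short`, and `short_to_long` are hypothetical. NumPy returns new arrays here; a true in-place implementation would exchange the halves within the same buffer.

```python
import numpy as np


def dct4_ortho(x):
    """Orthonormal M-point DCT-IV via one M/2-point complex FFT (M even)."""
    x = np.asarray(x, dtype=float)
    M = len(x)
    m = np.arange(M // 2)
    v = x[0::2] + 1j * x[M - 1::-2]                  # fold into M/2 complex samples
    u = v * np.exp(-1j * np.pi * (4 * m + 1) / (4 * M))
    w = np.fft.fft(u) * np.exp(-1j * np.pi * m / M)  # M/2-point FFT + post-twiddle
    X = np.empty(M)
    X[0::2], X[M - 1::-2] = w.real, -w.imag          # unfold even/odd outputs
    return np.sqrt(2.0 / M) * X                      # orthonormal scaling


def exchange_halves(z):
    """Swap the upper and lower halves of an even-length vector."""
    h = len(z) // 2
    return np.concatenate((z[h:], z[:h]))


def long_to_short(X):
    """Fig. 1 forward: N/2-point DCT-IV, exchange halves, N/4-point DCT-IV per half."""
    z = exchange_halves(dct4_ortho(X))
    h = len(z) // 2
    return dct4_ortho(z[:h]), dct4_ortho(z[h:])


def short_to_long(Y1, Y2):
    """Fig. 1 in reverse, following (8): every stage is its own inverse."""
    z = np.concatenate((dct4_ortho(Y1), dct4_ortho(Y2)))
    return dct4_ortho(exchange_halves(z))


rng = np.random.default_rng(2)
X = rng.standard_normal(256)      # N = 512: 256 long-transform coefficients
Y1, Y2 = long_to_short(X)         # two blocks of 128 short-transform coefficients
X_rec = short_to_long(Y1, Y2)     # backward conversion restores X
```

Because every stage is orthonormal, the backward routine simply applies the same stages in reverse order, which mirrors how (8) follows from (7) by transposition.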
V. CONCLUSIONS

A fast conversion algorithm for the Dolby Digital (Plus) AC-3 audio coding standards has been presented that converts the frequency coefficients of the long (MDCT) transform to those of the two short transforms and vice versa, directly in the frequency domain. The fast conversion algorithm is based on a block sparse matrix factorization of the conversion matrix. It is efficient in terms of structural simplicity, arithmetic complexity, and memory requirements compared to the straightforward conversion methods. The existing AC-3 fast computational modules may simply be re-used in the conversion procedures. Consequently, E-AC-3 to AC-3 bit stream conversion and AC-3 to E-AC-3 bit stream transcoding can be realized in a simplified and efficient way, minimizing the amount of partial decoding/encoding and the memory requirements during the conversion and transcoding processes.

REFERENCES

[1] Digital Audio Compression (AC-3) ATSC Standard, Document A/52/10 of the Advanced Television Systems Committee (ATSC), Audio Specialist Group T3/S7. Washington, DC, Dec. 1995.
[2] M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and Standards. New York: Springer, 2003, ch. 14, pp. 371–400.
[3] Digital Audio Compression Standard (AC-3, E-AC-3), Revision B, Document A/52B of the Advanced Television Systems Committee (ATSC). Washington, DC, Jun. 2005.
[4] L. D. Fielder, R. L. Andersen, B. G. Crockett, G. A. Davidson, M. F. Davis, S. C. Turner, M. S. Vinton, and P. A. Williams, “Introduction to Dolby Digital Plus, an enhancement to the Dolby Digital coding system,” in 117th AES Conv., San Francisco, CA, Oct. 2004, preprint 6196.
[5] G. A. Davidson, M. A. Isnardi, L. D. Fielder, M. S. Goldman, and C. C. Todd, “ATSC video and audio coding,” Proc. IEEE, vol. 94, no. 1, pp. 60–76, Jan. 2006.
[6] R. Gluth, “Regular FFT-related transform kernels for DCT/DST-based polyphase filter banks,” in Proc. IEEE ICASSP’91, Toronto, ON, Canada, May 1991, pp. 2205–2208.
[7] H. S. Malvar, Signal Processing With Lapped Transforms. Norwood, MA: Artech House, 1992, ch. 2, pp. 71–75.
[8] V. Britanak, “On properties, relations, simplified implementation of filter banks in the Dolby Digital (Plus) AC-3 audio coding standards,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1231–1241, Jul. 2011.
[9] V. Britanak, P. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations. Amsterdam, The Netherlands: Academic/Elsevier, 2007, ch. 4, pp. 73–140.
[10] V. Britanak, “New universal rotation-based fast computational structures for an efficient implementation of the DCT-IV/DST-IV and analysis/synthesis MDCT/MDST filter banks,” Signal Process., vol. 89, no. 11, pp. 2213–2232, Nov. 2009.