REAL-TIME IMPLEMENTATION OF A HIGH FIDELITY MDCT-BASED CODEC

Pek-Yew Tan, Sua-Hong Neo, Ah-Peng Tan

Asia Matsushita Electric (S) Pte Ltd AV/Information Research Center Block 1022, Hougang Avenue 1, #04-3526, Tai Seng Industrial Estate, Singapore 1953.

Abstract - This paper discusses the real-time implementation of a high-quality audio codec at a bit rate below 150 kbit/s per monophonic channel on a 24-bit fixed-point DSP (Motorola DSP56002) based hardware. The algorithm is an adaptive Modified Discrete Cosine Transform coding technique. Known human hearing characteristics are exploited in the adaptive bit allocation scheme. Both the hardware and software configurations are described, along with measured execution times and program and data memory usages. A high fidelity quality suitable for consumer applications has been achieved.

INTRODUCTION

Recent research efforts in wideband audio coding [1],[2],[3] have provided many innovative and useful algorithms for high-quality audio coding at low bit rates. Most of these algorithms seek a large reduction in bit rate while keeping the quality of the reconstructed audio signals close to that of the compact disc. Applications of such codecs include Digital Audio Broadcasting (DAB), High-Definition Television (HDTV), multimedia and digital audio recording. In this paper, the real-time implementation of an MDCT-based codec on a 24-bit fixed-point DSP is reported. The presentation starts with a review of the coding algorithm used. This is followed by a description of the hardware implementation. Finally, the performance of the codec is discussed.

MDCT-BASED CODING ALGORITHM

The MDCT-based coding algorithm belongs to the class of adaptive transform coders [1], with the Modified Discrete Cosine Transform (MDCT) at its core. It is an asymmetric codec in which the encoder is more complex than the decoder. Figure 1 shows the audio encoder. Audio signals with a bandwidth of 20 kHz, sampled at 44.1 kHz, are processed in blocks of 512 samples. This block size has been found to provide a good compromise between frequency resolution and codec delay. The overall codec delay is theoretically 1024 samples. Adaptive block-size selection, which improves the coding quality by reducing pre-echoes, determines whether these 512 samples are processed as one long block or as a few equal-size short blocks. For block-size determination, each processed block is divided into 8 sub-segments and the energy within each sub-segment is computed. The block size selected depends on whether sharp time-domain transients exist in the audio signal.

Figure 1. Audio Encoder

1108 SINGAPORE ICCS '94

0-7803-2046-8/94/$4.00 © 1994 IEEE

The samples are then transformed into frequency-domain coefficients using the MDCT, which has a time-domain aliasing cancellation (TDAC) feature [4]. The transform coefficients are grouped in a manner based on, but not closely akin to, the auditory system. They are normalised using appropriate scale factors and subjected to linear quantization according to a dynamic bit allocation scheme which considers auditory masking. As the bit allocation is a highly iterative process, it is kept simple to reduce implementation complexity. Information on the transform block size, the scale factors, the bit allocation data and the quantized transform coefficient data is transmitted to the decoder.

Figure 2 shows the audio decoder. At the decoder, this information is decoded and the transform coefficients are dequantized. The output audio signal is then obtained through an inverse MDCT (IMDCT).
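The MDCT/IMDCT pair and its TDAC property can be illustrated with a minimal sketch. It is not the fixed-point assembly implementation described in this paper: the sine window, the direct O(N^2) evaluation, the tiny block size of 8 coefficients and all function names are assumptions chosen for readability. The sketch shows that each inverse-transformed block contains time-domain aliasing on its own, and that overlap-adding adjacent blocks with 50% overlap cancels it.

```python
import math


def sine_window(length):
    """Sine window (an assumption); satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Windowed MDCT: 2N time samples in, N coefficients out."""
    n_coeffs = len(block) // 2
    phase = 0.5 + n_coeffs / 2.0
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / n_coeffs * (n + phase) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs, window):
    """Windowed IMDCT: N coefficients in, 2N time-aliased samples out."""
    n_coeffs = len(coeffs)
    phase = 0.5 + n_coeffs / 2.0
    return [
        2.0 / n_coeffs * window[n]
        * sum(coeffs[k]
              * math.cos(math.pi / n_coeffs * (n + phase) * (k + 0.5))
              for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def analyse_and_resynthesise(signal, n_coeffs):
    """MDCT -> IMDCT on 50%-overlapped blocks, followed by overlap-add.

    len(signal) must be a multiple of n_coeffs.
    """
    window = sine_window(2 * n_coeffs)
    # Pad so that every input sample is covered by exactly two blocks.
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * n_coeffs
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = padded[start:start + 2 * n_coeffs]
        aliased = imdct(mdct(block, window), window)
        for n, value in enumerate(aliased):
            output[start + n] += value  # overlap-add cancels the aliasing
    return output[n_coeffs:n_coeffs + len(signal)]


# Illustrative test signal (hypothetical, not from the paper).
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
rebuilt = analyse_and_resynthesise(signal, n_coeffs=8)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
print(max_error)  # tiny: limited only by floating-point rounding
```

Without quantization the reconstruction is exact up to rounding. In a codec such as the one described here, the coefficients are quantized between the two transforms, so the reconstruction error is set by the bit allocation instead.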

Figure 2. Audio Decoder

Figure 3. System Block Diagram of Codec

[Labels recovered from Figures 2 and 3: bit-stream demultiplex; quantized coefficients; overlap samples; encoded audio in, below 2 x 150 kbit/s; PCM audio out, 2 x 705.6 kbit/s; fs = 44.1 kHz]
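The block-size determination step of the encoder can be sketched as follows. The paper states only that each 512-sample block is divided into 8 sub-segments whose energies are computed, and that the decision depends on the presence of sharp time-domain transients. The decision rule used here (a jump in energy between consecutive sub-segments above a fixed ratio), the threshold value and the function names are assumptions made purely for illustration.

```python
import math


def subsegment_energies(block, num_segments=8):
    """Energy (sum of squares) of each of the equal-length sub-segments."""
    seg_len = len(block) // num_segments
    return [
        sum(s * s for s in block[i * seg_len:(i + 1) * seg_len])
        for i in range(num_segments)
    ]


def use_short_blocks(block, ratio_threshold=16.0, energy_floor=1e-9):
    """Return True if a sharp rise in energy (a transient) is detected.

    ratio_threshold and energy_floor are hypothetical tuning values,
    not figures taken from the paper.
    """
    energies = subsegment_energies(block)
    for previous, current in zip(energies, energies[1:]):
        if current > ratio_threshold * max(previous, energy_floor):
            return True
    return False


# A steady 1 kHz tone sampled at 44.1 kHz: no transient expected.
steady = [math.sin(2 * math.pi * 1000 * n / 44100) for n in range(512)]

# The same tone, nearly silent for the first half and then at full level.
attack = [(0.001 if n < 256 else 1.0) * s for n, s in enumerate(steady)]

print(use_short_blocks(steady))  # False
print(use_short_blocks(attack))  # True
```

Switching to short blocks around such an attack confines the quantization noise to a shorter time span, which is how adaptive block-size selection reduces pre-echoes.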

HARDWARE IMPLEMENTATION

A modular approach was adopted for the codec prototype implementation. The codec was built around three basic cards: the Signal Processing Card (SPC), the Audio I/O Card (AIC) and the Channel I/O Card (CIC). These cards, which follow the standard 3U Eurocard size, were designed so that they could be used interchangeably at the encoder or decoder. Due to the complexity of the encoder algorithm, two SPCs were needed to encode the audio signals in real time. To exploit the parallel nature of the stereo channels, it is best to run these two SPCs in parallel, each processing one audio channel, so as to reduce the system delay. The hardware of this parallel architecture is more difficult to implement than that of a serial architecture. Nevertheless, the additional advantage of almost identical software for both SPCs makes the parallel architecture the more suitable choice. In contrast to the encoder, the decoder algorithm is less complex, so it is possible to accommodate the whole software within one SPC for processing both stereo channels. The system block diagram of the codec hardware is shown in Figure 3.

The SPC uses a 24-bit fixed-point DSP (Motorola DSP56002) as the main processing element. Each SPC has 32K words (24 bits/word) of high-speed SRAM (15 ns) to support zero-wait-state access for both program and data memory when the DSP is operating at 40 MHz. The SPC also incorporates 16K words of EPROM to support a stand-alone implementation. The program and coefficient tables are downloaded from the EPROM to the SRAM at start-up via a boot program in the EPROM, and the DSP then runs entirely from the SRAM. The memory configuration of the DSP after downloading is as follows:

SRAM :
    P (program)            16K words
    X (data and tables)     8K words
    Y (data and tables)     8K words

On-chip RAM :
    X (data)               256 words
    Y (data)               256 words
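As a quick consistency check, the partition above can be tallied against the 32K-word SRAM fitted to each SPC. This is a small sketch only; it assumes that "K" denotes 1024 words, and the variable names are illustrative.

```python
WORD_BITS = 24  # DSP56002 word width, as stated in the text
K = 1024        # assumption: "K" means 1024 words

# External SRAM partition after the boot download (words).
external_sram = {"P": 16 * K, "X": 8 * K, "Y": 8 * K}

# On-chip data RAM used alongside the external SRAM (words).
on_chip_ram = {"X": 256, "Y": 256}

sram_total_words = sum(external_sram.values())
sram_total_bytes = sram_total_words * WORD_BITS // 8
data_words_per_space = {
    space: external_sram[space] + on_chip_ram[space] for space in ("X", "Y")
}

print(sram_total_words)      # 32768 words, i.e. the full 32K-word SRAM
print(sram_total_bytes)      # 98304 bytes of 24-bit storage
print(data_words_per_space)  # X and Y data words, external plus on-chip
```

Under this assumption the three SRAM regions account for the whole 32K-word device, with half of it reserved for program memory.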

During program downloading, the DSP runs with fifteen wait states because of the slow access time of the EPROM. A memory controller implemented in a programmable array logic (PAL) device facilitates program downloading at boot time and subsequent program execution at run time. A serial I/O sub-block has also been incorporated in the SPC to select external serial links and to provide the interface to other cards within the same system platform.

The AIC uses an S/PDIF audio data transceiver to interface to a music source or sink, e.g. a compact disc (CD) or digital audio tape (DAT) player.

The CIC has been designed using a field programmable gate array (FPGA) for greater flexibility during implementation. This card transmits or receives the compressed audio bit-stream between the encoder and the decoder through a channel link. It also serves as the main controller, providing the necessary synchronization signals and co-ordinating all activities between the cards in the system.

The encoder and decoder software has been coded directly in assembly language to achieve an optimized real-time implementation. During the software development stages, the OnCE emulator (from Motorola) was used extensively to monitor intermediate results, access DSP memory and control run-time parameters as an aid to debugging.

The codec was brought up systematically through a series of back-to-back tests (encoder to decoder) at the end of each stage of software development whenever possible. A simple data-passing program was first used to verify the functionality of the hardware. This was followed by testing of the window, overlap and MDCT software on the hardware at the uncompressed channel bit rate. Lastly, the entire encoder and decoder software was tested on the final hardware. The program modules developed for the encoder and decoder, together with their memory requirements, are given below.

ENCODER :                        Memory Size (words)

    Block-Size Determination             108
    Window, Overlap and MDCT             270
    Scale Factor Computation              57
    Bit Allocation                       187
    Quantization                         194
    Bit-Stream Packing                   263
    Overhead                            1198
    Tables and Buffers                  4526

DECODER :                        Memory Size (words)

    Bit-Stream Unpacking                 208
    Dequantization                        29
    IMDCT                                373
    Window and Overlap                   217
    Overhead                             259
    Tables and Buffers                 10263

PERFORMANCE

The performance of the codec was measured in terms of the execution times of all the program modules and the subjective evaluation of the audio output quality.

Since execution times are not constant for certain program modules, only worst-case values are reported. The execution times can be summarised as shown below.

ENCODER :                        Execution Time (ms)

    Block-Size Determination           0.092
    Window, Overlap and MDCT           1.614
    Scale Factor Computation           0.177
    Bit Allocation                     3.223
    Quantization                       0.465
    Bit-Stream Packing                 1.350
    Overhead                           0.310
    Total:                             7.231

DECODER :                        Execution Time (ms)

    Bit-Stream Unpacking               2.777
    Dequantization                     0.485
    IMDCT                              3.998
    Window and Overlap                 0.934
    Overhead                           0.082
    Total:                             8.276

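The time allocated for processing each 512-sample block is given by

    Time Allocated = 512 x (1 / 44.1 kHz) = 11.61 ms

Both worst-case totals reported above (7.231 ms for the encoder and 8.276 ms for the decoder) fall within this limit.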
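The real-time margin implied by these figures can be worked out with a short calculation. The sketch below uses only the worst-case execution times reported in this paper; the helper name and the notion of "load" (total time divided by block time) are illustrative choices, not terms used by the authors.

```python
SAMPLE_RATE_KHZ = 44.1
BLOCK_SAMPLES = 512

# Worst-case execution times in ms, as reported in the tables.
encoder_ms = {
    "Block-Size Determination": 0.092,
    "Window, Overlap and MDCT": 1.614,
    "Scale Factor Computation": 0.177,
    "Bit Allocation": 3.223,
    "Quantization": 0.465,
    "Bit-Stream Packing": 1.350,
    "Overhead": 0.310,
}
decoder_ms = {
    "Bit-Stream Unpacking": 2.777,
    "Dequantization": 0.485,
    "IMDCT": 3.998,
    "Window and Overlap": 0.934,
    "Overhead": 0.082,
}


def budget_report(module_times_ms, block_ms):
    """Total time, remaining headroom and processor load for one block."""
    total = sum(module_times_ms.values())
    return {
        "total_ms": total,
        "headroom_ms": block_ms - total,
        "load": total / block_ms,
    }


block_ms = BLOCK_SAMPLES / SAMPLE_RATE_KHZ  # time available per block
encoder_report = budget_report(encoder_ms, block_ms)
decoder_report = budget_report(decoder_ms, block_ms)

for name, report in (("encoder", encoder_report), ("decoder", decoder_report)):
    print(f"{name}: {report['total_ms']:.3f} ms of {block_ms:.2f} ms "
          f"({report['load']:.1%} load, {report['headroom_ms']:.3f} ms spare)")
```

The module times sum to the stated totals, which correspond to a worst-case load of roughly 62% for the encoder and 71% for the decoder.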

Various music sequences were used for the informal subjective evaluation of the codec. The subjective quality is close to that of the compact disc even for critical sequences such as Suzanne Vega, Glockenspiel and Castanets. There was no noticeable impairment in the sound quality for most of the sequences tested.

CONCLUSIONS

A real-time implementation of a high fidelity MDCT-based codec has been reported. The codec uses a Motorola 24-bit fixed-point DSP (DSP56002) as the main processing element. The encoder hardware consists of two DSPs arranged in parallel, whereas the decoder uses only one DSP. All DSP program code is downloaded from EPROM at start-up to achieve a stand-alone system. The software is optimized through direct coding in assembly language. Execution times and program and data memory sizes are presented. The coding algorithm has pre-echo and block boundary noise suppression features. Informal subjective tests have shown that the developed codec achieves high audio coding quality. This codec can be used for high-quality audio recording and transmission over low bit-rate channels.

REFERENCES

[1] Masahiro Iwadare, Akihiko Sugiyama, Fumie Hazu, Akihiro Hirano, and Takao Nishitani, "A 128 kb/s Hi-Fi Audio CODEC Based on Adaptive Transform Coding with Adaptive Block Size MDCT", IEEE Jour. on Selected Areas in Com., vol. 10, no. 1, pp. 138-144, Jan. 1992.

[2] Grant Davidson, Louis Fielder, and Mike Antill, "High Quality Audio Transform Coding at 128 kbits/s", ICASSP, pp. 1117-1120, 1990.

[3] K. Brandenburg, J. Herre, J.D. Johnston, Y. Mahieux, and E.F. Schroeder, "ASPEC: Adaptive Spectral Entropy Coding of High Quality Music Signals", 90th Audio Engineering Society Convention, Preprint 3011, Feb. 1991.

[4] J. Princen and A.B. Bradley, "Analysis/Synthesis Filter Bank Design Based on Time-Domain Aliasing Cancellation", IEEE Trans. on Acoustics, Speech, and Sig. Proc., vol. ASSP-34, no. 5, pp. 1153-1161, Oct. 1986.
