LOW BIT RATE CODING WITH BINARY GOLAY CODES

Adriana Vasilache (1), Ioan Tăbuş (2) and Septimia Sârbu (2)

(1) Nokia Research Center, Visiokatu 1, 33720 Tampere, FINLAND
(2) Tampere University of Technology, P.O. Box 553, FIN-33101 Tampere, FINLAND

[email protected], [email protected], [email protected]

1. INTRODUCTION
Even though computational power and storage resources are constantly increasing, so are the type and quantity of data stored and exchanged over communication channels. Multimedia communication has become an everyday aspect of our lives, and with it our expectations regarding the quality of both the communication and the content are rising. From the content coding point of view, these aspects are reflected in a continuous need for low bit-rate compression algorithms with good compression performance and low complexity.

Among lossy compression methods, lattice based quantization tools have gained momentum, being present in several state-of-the-art speech and audio codecs such as AMR-WB+ [1], G.718 [2], and G.729.1 [3]. However, due to the wide range of terminals, there is a general tendency towards bit-rate embedded scalability of the codecs. In the speech and audio coding context, where most of the bits are allocated to transform coefficients, or to differential transform coefficients, the bit rate of interest for encoding this data is around 0.5 bits per sample (bps).

Recently [4], binary codes have been shown to perform well as tools for quantization at low bit rates. In particular, the perfect Golay code G23 has been used within a quantization method for outlier contaminated sources and tested within a speech and audio coding scenario. The method has been demonstrated to be superior, in terms of compression efficiency, to a state-of-the-art method based on lattices, but its complexity requirements were still relatively high. Further refinements of the method [5] achieved an important complexity reduction while preserving the same compression performance.

This presentation aims at giving a unified view of Golay code based coding as described in [4], [5] and [6]. We will thus briefly describe the original method, followed by the complexity reduction techniques and the overall results.

2. GOLAY CODE BASED QUANTIZATION

The data for which the Golay code based quantization method was originally intended consists of long vectors of differential transform coefficients issued from one frame of a layer of an embedded speech and audio codec. The method is based on the observation that such data can
be well described by an outlier contaminated Gaussian model. Part of the vector components are encoded as outliers, specifically those components whose magnitude is larger than a threshold times the standard deviation estimate. The remaining components are grouped into sub-vectors and either encoded using Golay codevectors or quantized to zero. The signs of the significant components (those encoded as outliers and those encoded by ones in the Golay codewords) are encoded separately. The method uses a gain-shape approach, and an overall gain is also estimated and encoded.

One differentiating feature, with respect to prior work on Golay based quantization, is that the Golay code shells are considered individually when performing the nearest neighbor search, as opposed to considering the entire codebook. A Golay code shell is the set of Golay codevectors having the same Hamming weight; the shells can thus be viewed as sub-codebooks. This aspect greatly enhances the optimization result in the bit-rate allocation, since the number of sign bits for each shell varies considerably.

The encoding algorithm thus consists of three main parts:
1. Bit rate allocation among outliers and Golay encoded data;
2. Bit rate allocation among the Golay encoded sub-vectors;
3. Nearest neighbor search on the shells of the Golay code.

The resulting Golay code based method compared favourably in terms of compression performance [4] with the lattice based method, but its overall complexity was higher. The two most time consuming operations were the bit rate allocation among the Golay encoded sub-vectors and the nearest neighbor search on the shells of the Golay code.

3. COMPLEXITY REDUCTION TECHNIQUES

For the complexity reduction, the procedure of bit rate allocation between the sub-vectors was first considered.
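To make the shell structure concrete, the following Python sketch (an illustration under stated assumptions, not the authors' implementation) builds the full [23,12] binary Golay codebook from one of its standard generator polynomials, groups the 4096 codewords into shells by Hamming weight, and performs a brute-force nearest neighbor search within a single shell. The `split_outliers` helper and its default threshold of 2.0 are hypothetical names and values chosen here to illustrate the outlier test described above.

```python
from collections import defaultdict

# Generator polynomial of the [23,12] binary Golay code:
# g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1  (bit i = coefficient of x^i)
GOLAY_GEN = 0b110001110101


def poly_mul_gf2(a, b):
    """Carry-less multiplication of two GF(2) polynomials given as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def golay_shells():
    """Enumerate all 2^12 = 4096 Golay codewords m(x)*g(x) and group them
    into shells (sub-codebooks) by Hamming weight."""
    shells = defaultdict(list)
    for msg in range(1 << 12):
        cw = poly_mul_gf2(GOLAY_GEN, msg)      # 23-bit codeword
        shells[bin(cw).count("1")].append(cw)
    return dict(shells)


def nearest_in_shell(x_bits, shell):
    """Brute-force nearest neighbor (in Hamming distance) within one shell."""
    return min(shell, key=lambda cw: bin(cw ^ x_bits).count("1"))


def split_outliers(x, sigma, thr=2.0):
    """Hypothetical outlier test: flag components whose magnitude exceeds
    thr * sigma; flagged components would be encoded separately as outliers."""
    return [abs(v) > thr * sigma for v in x]


if __name__ == "__main__":
    shells = golay_shells()
    # The shells of G23 have Hamming weights 0, 7, 8, 11, 12, 15, 16 and 23.
    print(sorted(shells))
```

Searching shell by shell instead of over the whole 4096-word codebook is what lets the rate allocation account for the varying number of sign bits per shell, which is the differentiating feature described above.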
Prior to the optimization algorithm, the energy of the sub-vectors was taken into account to decide whether each sub-vector was eligible to be coded by a Golay codevector or not. In addition, out of the remaining sub-vectors, the shells for which the nearest neighbor search was more
computationally expensive were considered only for the sub-vectors having high energy. By introducing the representation of the binary cyclic codes as a union of cyclic leader classes and performing the nearest neighbor search only for a selection of these classes, the complexity has been further reduced. Overall, the complexity has been reduced five-fold after implementing the two approaches combined, without loss of compression efficiency.

Figure 1. The segmental SNR of the Golay code based method (mean SNR 2.623 dB) and the lattice based method RE8 [1] (mean SNR 2.1278 dB) for the stereo extension layer data, for 40 mixed speech and audio test files.

The Golay code based method has been tested for the encoding of two super wideband extension layers and two stereo extension layers on top of the G.718 and G.729.1 codecs. The available bit rates were 0.53 bps for both super wideband layers, encoding vectors of 280 differential transform coefficients, and 0.59 bps and 0.24 bps for the two stereo layers, encoding vectors of 88 and 280 differential transform coefficients, respectively. Superior compression performance over the lattice based method has been obtained in both test cases: up to 0.3 dB signal to noise ratio (SNR) improvement (12%) in the super wideband scenario and 0.5 dB SNR improvement (23%) in the stereo scenario [6]. The SNR results for several speech and music test files in the stereo case are presented in Figure 1. More results and a detailed description of several variants of the complexity reduction methods can be found in [5] and [6].

4. REFERENCES

[1] S. Ragot, B. Bessette, and R. Lefebvre, “Low-complexity multi-rate lattice vector quantization with application to wideband TCX speech coding at 32 kbit/s,” in Proceedings of ICASSP 2004, Montreal, Canada, May 17-21, 2004, vol. 1.

[2] T. Vaillancourt, M. Jelínek, A. E. Ertan, J. Stachurski, A. Rämö, L. Laaksonen, J. Gibbs, U. Mittal, S. Bruhn, V. Grancharov, M. Oshikiri, H. Ehara, D. Zhan, F. Ma,
D. Virette, and S. Ragot, “ITU-T EV-VBR: A robust 8-32 kbit/s scalable coder for error prone telecommunications channels,” in Proceedings of EUSIPCO 2008, Lausanne, Switzerland, August 25-29, 2008.

[3] S. Ragot, B. Kövesi, R. Trilling, D. Virette, N. Duc, D. Massaloux, S. Proust, B. Geiser, M. Gartner, S. Schandl, H. Taddei, Y. Gao, E. Shlomot, H. Ehara, K. Yoshida, T. Vaillancourt, R. Salami, M. S. Lee, and D. Y. Kim, “ITU-T G.729.1: An 8-32 kbit/s scalable coder interoperable with G.729 for wideband telephony and voice over IP,” in Proceedings of ICASSP 2007, Honolulu, Hawaii, USA, April 15-20, 2007.

[4] I. Tabus and A. Vasilache, “Low bit rate vector quantization of outlier contaminated data based on shells of Golay codes,” in Proceedings of DCC 2009, Snowbird, Utah, USA, March 15-18, 2009.

[5] A. Vasilache, S. Sarbu, and I. Tabus, “Reducing the search complexity for low bit rate vector quantization based on shells of Golay codes,” in Proceedings of EUSIPCO 2009, Glasgow, UK, August 24-28, 2009, to be published.

[6] S. Sarbu, “Low bit rate vector quantization based on shells of Golay codes for audio and speech coding,” M.S. thesis, Tampere University of Technology, Tampere, Finland, 2009.