Robust Vector Quantizer Design Using Competitive Learning Neural Networks

Seyed Bahram ZAHIR AZAMI
École Nationale Supérieure des Télécommunications, 46 rue Barrault, 75634 Paris cedex 13, France. Tel: +33 1 45 81 76 77
[email protected]
Gang FENG
Institut de la Communication Parlée, INPG / Université Stendhal, 38040 Grenoble cedex 9, France. Tel: +33 4 76 82 43 38
[email protected]
Abstract

In this paper we propose a new method to design vector quantizers for noisy channels. Competitive learning neural networks are known for their effectiveness in voice and image data compression; we use the competitive learning algorithm to create a topological similarity between the input space and the index space. This similarity reduces the effect of channel noise, because any single bit error in a transmitted index is translated to a close codevector in the input space, which yields relatively small distortion. For an 8-bit vector quantizer, the proposed system resulted in about 4.6 dB of spectral distortion for a highly noisy channel, while a simple LBG, LBG with splitting and a Kohonen map resulted in 5.9, 5.4 and 5.0 dB of distortion, respectively.

Keywords: robust vector quantization, competitive learning, index assignment, Hamming distance, vocoder.

1 Introduction

In this paper a robust vector quantizer design is proposed. More specifically, a vocoder is considered: the spectrum part of its output is to be vector quantized and transmitted over a binary symmetric channel (BSC), as in figure 1, where p is the channel transition probability.

Figure 1: General block diagram. The spectrum envelope X produced by the vocoder is encoded by the vector quantizer (VQ), the index is transmitted over the BSC with transition probability p, and the inverse quantizer delivers the reconstruction Y.

We aim at minimizing the overall distortion, taking the channel noise into account, as in the following equation:

$$D = E[d(X, Y)] = \sum_{i=1}^{2^b} \sum_{j=1}^{2^b} \int_{X \in C_i} d(X, Y_j)\, p(j|i)\, p(i)\, dx \qquad (1)$$

The factor $p(j|i) = p^{d_H(i,j)} (1-p)^{b - d_H(i,j)}$, where $b$ is the total number of bits and $d_H$ denotes the Hamming distance. Because of this factor, the distortion depends strongly on the index assignment.

One possibility to design a robust vector quantizer is to divide the design into two stages: first obtaining the $L = 2^b$ codevectors (e.g., using LBG [7]), and then assigning an index to each of the $L$ codevectors. In this approach there are $L!$ ways to order the $L$ codewords. However, it should be noted that many of these possibilities yield identical results. We can permute the bits to obtain another codebook which has exactly the same distortion: for example, with $b = 3$ bits, the codebooks $x_1 x_2 x_3$, $x_1 x_3 x_2$, $x_2 x_1 x_3$, $x_2 x_3 x_1$, $x_3 x_1 x_2$ and $x_3 x_2 x_1$ are equivalent. We may also negate any combination of bits in a codebook to obtain an equivalent codebook such as $\bar{x}_1 x_2 x_3$, $x_1 \bar{x}_2 x_3$, $x_1 x_2 \bar{x}_3$, $\bar{x}_1 \bar{x}_2 x_3$, $\bar{x}_1 x_2 \bar{x}_3$, $x_1 \bar{x}_2 \bar{x}_3$ or $\bar{x}_1 \bar{x}_2 \bar{x}_3$, all equivalent to $x_1 x_2 x_3$. In fact there are exactly

$$\frac{(2^b)!}{2^b \, b!} = \frac{(2^b - 1)!}{b!}$$

distinct ways to assign the codevectors to the $L$ codewords; the $2^b$ and $b!$ factors in the denominator eliminate the bit-negation (symmetric) cases and the bit-permutation cases, respectively. For instance, for $b = 8$ bits, this results in about $8.3 \times 10^{499}$ distinct possible combinations. The index assignment problem is NP-complete [5].

Several methods for reordering the indices have been proposed in the literature. Farvardin [2] suggests simulated annealing (SA), while Zeger and Gersho [11] propose the binary switching algorithm (BSA). Both methods consist of pairwise swaps of codewords to improve a given index assignment.

A robust vector quantizer can also be obtained with competitive learning, since this method can preserve a topological similarity from the input space to the index space. In this article we use such a method. Recently a somewhat similar work has been reported which makes use of deterministic annealing [1]. Although there are some similarities in the approaches, the two methods are quite different, since in our method the principal idea is to consider the indices in a hyper-cube structure and to adapt the classical learning algorithm to this new structure.
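The dependence of D on the index assignment is easy to check numerically. The following Python sketch is not from the paper; the function names and the toy codebook are ours. It evaluates only the channel term of equation (1) for a given codebook and index assignment, assuming equiprobable indices and a squared-error distortion measure, and shows that two orderings of the same codevectors give different expected distortions.

```python
import itertools
import numpy as np

def hamming(i, j):
    """Number of bit positions in which two integer indices differ."""
    return bin(i ^ j).count("1")

def expected_channel_distortion(codebook, p, b):
    """Expected squared-error distortion caused by the BSC alone.

    codebook : (2**b, dim) array; row i is the codevector assigned to index i.
    p        : channel transition (bit error) probability.
    Assumes equiprobable indices, p(i) = 1 / 2**b, and keeps only the channel
    term of equation (1), i.e. quantization itself is taken as error-free.
    """
    L = 2 ** b
    D = 0.0
    for i, j in itertools.product(range(L), repeat=2):
        d_h = hamming(i, j)
        p_ji = (p ** d_h) * ((1.0 - p) ** (b - d_h))   # p(j|i) as in the text
        D += p_ji * np.sum((codebook[i] - codebook[j]) ** 2)
    return D / L

# Toy example: the same 8 scalar codevectors under two index assignments.
values = np.arange(8.0).reshape(-1, 1)        # natural order: index i -> value i
shuffled = values[[5, 0, 7, 2, 1, 6, 3, 4]]   # an arbitrary reordering
for name, cb in (("natural", values), ("shuffled", shuffled)):
    print(name, expected_channel_distortion(cb, p=0.05, b=3))
```

The natural ordering, where indices at small Hamming distance hold nearby values, gives a noticeably lower expected distortion than the shuffled one.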
2 Quasi-linear mapping

Knagenhjelm [4] defines a linearity index from the index space (hyper cube) to the input space. This linearity index acts as an indicator of how good an index assignment is. Using this indicator, Knagenhjelm and Agrell [5] have proposed a Hadamard-matrix transform which aims at making this mapping as linear as possible. They observed that, for a maximum-entropy encoder, a very good index assignment is the one which yields the most linear transform of the index-space hypercube. According to the simulations reported in [5], their algorithm outperforms both SA and BSA in most cases, while having considerably lower execution complexity.

A useful mapping which keeps some topological similarity between the input space and the output space is provided by the Kohonen map neural network [6]. Skinnemoen [8] has exploited this property to map directly from the input space to the modulation space. This mapping, while efficient, has a disadvantage: it reduces system design flexibility, since it merges the source coding, channel coding and modulation blocks into a single block. If one needs to change the source encoder, for instance, it becomes necessary to modify the two other blocks as well.

2.1 Self-organizing hyper cube (SOHC)
In our proposition, only the source and channel coding blocks are included in the Kohonen mapping, and the modulation block is left intact. One problem remains: in a Kohonen map we usually have just two dimensions in the output space, while we need b dimensions, each of which must be a binary value. To overcome this problem, the SOHC has been proposed [10]. The SOHC is a generalization of the Kohonen map
from 2 dimensions to b dimensions, in which the Hamming distance is used as the distance measure between the codewords instead of the Euclidean distance. The codevector weights are updated according to

$$W_{i_1 \ldots i_b, k}(t+1) = W_{i_1 \ldots i_b, k}(t) + \alpha \, \beta \, \big[ x_k(t) - W_{i_1 \ldots i_b, k}(t) \big]$$

where
- $W_{i_1 \ldots i_b, k}$ are the weights of the $(i_1 \ldots i_b)$-th codevector,
- $(i'_1 \ldots i'_b)$ are the indices of the winning codevector,
- $\alpha$ is the adaptation gain, a function of $t$,
- $\beta$ is the neighborhood gain, a function of both $d_H[(i_1 \ldots i_b), (i'_1 \ldots i'_b)]$ and $t$.
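As an illustration, the Python sketch below shows one learning step of this kind. It is our own reading of the update rule, not the authors' code: the exponential form chosen for the neighborhood gain, the constant adaptation gain per call, and all function names are assumptions. The essential point it demonstrates is that every codevector is pulled toward the input with a strength that decays with the Hamming distance of its index from the winning index.

```python
import numpy as np

def hamming(i, j):
    """Hamming distance between two integer indices."""
    return bin(i ^ j).count("1")

def sohc_update(W, x, alpha, sigma):
    """One SOHC competitive-learning step (illustrative sketch).

    W     : (2**b, dim) weight matrix; row i holds the codevector whose
            b-bit binary index is i.
    x     : one training vector (length dim).
    alpha : adaptation gain (a decreasing function of time in the full algorithm).
    sigma : neighborhood radius expressed in Hamming distance (also reduced
            over time, as described later in the training schedule).
    """
    winner = int(np.argmin(np.sum((W - x) ** 2, axis=1)))   # Euclidean winner
    for i in range(W.shape[0]):
        d_h = hamming(i, winner)
        beta = np.exp(-(d_h ** 2) / (2.0 * sigma ** 2))      # neighborhood gain (assumed Gaussian)
        W[i] += alpha * beta * (x - W[i])                     # pull codevector i toward x
    return winner
```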
Figure 2: The mapping from the input space (N = 3 dimensions) to the index space (b = 4 dimensions) keeps the topological structure: close codevectors in the input space (Euclidean distance criterion) are mapped to close indices in the hyper cube (Hamming distance criterion).

As shown in figure 2, in this example an input space with three dimensions is mapped to an index space with four dimensions. Since neighbors in each space correspond to neighbors in the other space, the two spaces are topologically similar, and so a single bit error in an index does not cause a large distortion.

This topological similarity is due to the neighborhood concept introduced in the learning algorithm. For instance, in the learning phase, when the neuron 0000 wins for a given input vector, we encourage this neuron and its neighbors to become more sensitive to this input vector. Thus, during the learning, a topological structure is created which later allows us to obtain a more robust quantizer.
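One simple way to check this property on a trained codebook (our own diagnostic, not from the paper) is to compare the average Euclidean distance between codevectors whose indices differ in exactly one bit with the average distance over all codevector pairs; a topologically well-ordered codebook gives a much smaller value for the former.

```python
import numpy as np

def one_bit_neighbor_spread(W):
    """Average Euclidean distance between codevectors whose indices differ in a
    single bit, and the average over all distinct codevector pairs.

    A small ratio between the two indicates that a single channel bit error
    lands on a nearby codevector, i.e. the index assignment is well ordered.
    """
    L = W.shape[0]
    b = int(np.log2(L))
    neigh = [np.linalg.norm(W[i] - W[i ^ (1 << k)])
             for i in range(L) for k in range(b)]
    allp = [np.linalg.norm(W[i] - W[j])
            for i in range(L) for j in range(i + 1, L)]
    return float(np.mean(neigh)), float(np.mean(allp))
```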
2.2 Splitting

The splitting technique [7] has been shown to create a natural ordering that can somewhat protect the signal in the presence of channel errors [2]. This is due to the fact that, in the splitting mechanism, sister codevectors behave more or less similarly. We observed that the proposed system performs better if splitting is used in its training period. Of course, this natural order is not completely perfect when a classical algorithm like LBG is used, because it is more effective for the least significant bits (those that are added later) and less effective for the most significant bits.

Figure 3: Splitting in three steps: (a) before splitting we have $2^i$ codevectors; (b) we split them into $2^{i+1}$ codevectors by adding small random disturbances to the initial values; (c) we apply a learning algorithm to obtain optimal new values for the split codevectors.
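A minimal sketch of the splitting step of figure 3, under our own index convention (the new most significant bit distinguishes each original codevector from its perturbed copy); the parameter names and the perturbation scale are illustrative, not taken from the paper.

```python
import numpy as np

def split_codebook(W, epsilon=0.01, rng=None):
    """Split 2**i codevectors into 2**(i+1), as in step (b) of figure 3.

    Each codevector is duplicated and the copy receives a small random
    disturbance, so the two "sister" codevectors start out close together.
    Index convention (ours): the new most significant bit is 0 for the original
    codevector and 1 for its perturbed copy.
    """
    if rng is None:
        rng = np.random.default_rng()
    disturbance = epsilon * rng.standard_normal(W.shape)
    return np.vstack([W, W + disturbance])
```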
However, in SOHC there is a mechanism that forces the codevectors toward good positions and compensates the effect of the random disturbance. Simulations have shown that a SOHC trained with splitting performs better than one trained without splitting.
In the SOHC simulation, we begin the quantizer with just one bit, then split it to two bits and apply the learning algorithm, then split to three bits, and so on up to eight bits. In each step, we begin the learning with a neighborhood radius equal to the current number of bits and gradually reduce this radius to 1 bit before going to the next step. Beginning with large neighborhood values gives the SOHC more freedom to adapt its structure to that of the input space.
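The progressive schedule just described can be sketched as follows. This is again our own code, reusing sohc_update() and split_codebook() from the earlier sketches; the gains, epoch counts and the initial perturbation are illustrative guesses, not the values used in the paper.

```python
import numpy as np

def train_sohc(data, b_max=8, epochs_per_stage=10, alpha=0.05, seed=0):
    """Progressive SOHC training with splitting (schedule sketch).

    data : (n_vectors, dim) array of training vectors (e.g. LAR frames).
    Starts from a 1-bit codebook, alternates splitting and competitive
    learning, and in each stage shrinks the neighborhood radius from the
    current number of bits down to 1 before the next split.
    """
    rng = np.random.default_rng(seed)
    mean = data.mean(axis=0)
    W = np.vstack([mean - 0.01, mean + 0.01])                # initial 1-bit codebook
    for b in range(1, b_max + 1):
        if b > 1:
            W = split_codebook(W, rng=rng)                   # 2**(b-1) -> 2**b codevectors
        for sigma in np.linspace(b, 1.0, epochs_per_stage):  # shrinking neighborhood radius
            for x in rng.permutation(data):                  # one epoch over shuffled data
                sohc_update(W, x, alpha, sigma)
    return W
```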
3 Experimental results

A channel simulation was performed and the numerical results are presented in figure 4. The vocoder parameters are the first ten log area ratio (LAR) parameters [9] of the analyzed speech frames, which are quantized to b = 8 bits, so there are L = 2^b = 256 codewords. The distortion measure is the spectral distortion (SD) [3]:

$$SD = \sqrt{ \frac{1}{F_s} \int_0^{F_s} \big( S - \tilde{S} \big)^2 \, df }$$

where $F_s$ is the sampling frequency and $S$ and $\tilde{S}$ are defined on the signal before and after transmission, respectively:

$$S = 10 \log_{10} \left| A_t\!\left( \exp(-j 2\pi f / F_s) \right) \right|^2$$

where $A_t$ is the signal amplitude. This distortion measure is an objective measure (easy to calculate) which is highly correlated with subjective measures, since our auditory perception mechanism is mostly sensitive to the spectrum envelope of speech.
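For reference, the SD measure is straightforward to compute once the two spectra are available. The sketch below is ours, not the paper's code: it takes the amplitude spectra of the original and of the transmitted model sampled on the same frequency grid covering 0..Fs, and approximates the integral by an average over that grid.

```python
import numpy as np

def spectral_distortion(amp, amp_q):
    """Spectral distortion in dB between two amplitude spectra.

    amp, amp_q : positive arrays, the amplitude spectra |A_t(exp(-j*2*pi*f/Fs))|
                 of the signal before and after transmission, sampled on the
                 same frequency grid covering 0..Fs.
    The mean over the grid approximates (1/Fs) * integral over f.
    """
    S = 10.0 * np.log10(amp ** 2)       # log power spectrum of the original
    S_q = 10.0 * np.log10(amp_q ** 2)   # log power spectrum after transmission
    return float(np.sqrt(np.mean((S - S_q) ** 2)))
```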
The curves SOHCS and SOHC in figure 4 correspond respectively to the system with and without splitting. It can be seen that SOHCS performs better than SOHC. For comparison, the performances of some other methods are also shown: LBG, LBG with splitting (LBGS), binary LBG (LBGB) and the Kohonen map (KM).

Figure 4: Channel simulation results: spectral distortion SD (dB) versus channel transition probability p, for LBG, LBGS, LBGB, KM, SOHC and SOHCS.

The simulations show that SOHCS is the most appropriate system among those tested in this work. LBGB is not well optimized for small values of the channel transition probability p, but it performs better than LBG, and even better than LBGS, for highly noisy channels. The difference between all methods except LBGB is negligible for transition probabilities below $p \approx 10^{-3}$; beyond this threshold, SOHC performs best for all values of p in our simulations. A possible application of the proposed system is as a substitute block in CELP encoders.
4 Conclusion

Neural network techniques have been used to improve the performance of vector quantizers on noisy channels. The method minimizes simultaneously the distortion due to the channel and the distortion due to quantization. The same idea could be applied to the vector quantization of other kinds of sources, such as wavelet parameters of images.
Acknowledgments

We would like to thank our colleagues at ICP-Grenoble, especially Carlo Murgia, who provided the data for the simulations in this paper. The first author also wishes to thank all the colleagues he worked with during different periods at Sharif University of Technology, Tehran, and at ENST, Paris.
References

1. J.M. Buhmann and T. Hofmann, "Robust vector quantizer design using competitive learning", Proceedings of the International Conference on Acoustics, Speech and Signal Processing - ICASSP (Munich, Germany), April 1997, pp. I.139-I.142.

2. N. Farvardin, "A study of vector quantization for noisy channels", IEEE Transactions on Information Theory 36 (1990), no. 4, 799-809.
3. A.H. Gray and J.D. Markel, "Quantization and bit allocation in speech processing", IEEE Transactions on Acoustics, Speech and Signal Processing 24 (1976), no. 6, 459-473.

4. P. Knagenhjelm, "How good is your index assignment?", Proceedings of the International Conference on Acoustics, Speech and Signal Processing - ICASSP (Minneapolis, Minnesota), Apr. 1993, pp. II.423-II.426.

5. P. Knagenhjelm and E. Agrell, "The Hadamard transform - a tool for index assignment", IEEE Transactions on Information Theory 42 (1996), no. 4, 1139-1151.

6. T. Kohonen, "The self-organizing map", Proceedings of the IEEE 78 (1990), no. 4, 1468-1480.

7. Y. Linde, A. Buzo, and R.M. Gray, "An algorithm for vector quantizer design", IEEE Transactions on Communications 28 (1980), no. 1, 84-95.

8. P.H. Skinnemoen, Robust communication with modulation organized vector quantization, Ph.D. thesis, Norwegian Institute of Technology, 1994.

9. R. Viswanathan and J. Makhoul, "Quantization properties of transmission parameters in linear predictive systems", IEEE Transactions on Acoustics, Speech and Signal Processing 23 (1975), no. 3, 309-321.

10. S.B. ZahirAzami, B. AzimiSadjadi, V. Tabatabaee, and H. Borhani, "Self organizing hyper cube and some of its applications", Proceedings of the First Iranian Conference on Electrical Engineering (Tehran, Iran), May 1993, pp. 464-472.

11. K. Zeger and A. Gersho, "Pseudo-Gray coding", IEEE Transactions on Communications 38 (1990), no. 12, 2147-2156.