
IEEE COMMUNICATIONS LETTERS, VOL. 7, NO. 8, AUGUST 2003

Hybrid Symbol- and Bit-Based Source-Controlled Channel Decoding for MELP Speech Parameters

Xiaobei Liu and Soo Ngee Koh

Abstract—A joint source channel coding (JSCC) scheme which efficiently exploits bit-level as well as symbol-level correlation in a source-controlled channel decoding (SCCD) process is proposed and applied to the mixed-excitation linear prediction (MELP) parameters of speech. A modified BCJR algorithm is also proposed for use in the SCCD algorithm. Simulation results show that our proposed scheme performs better than other redundancy-based JSCC schemes such as bit-based SCCD, soft-bit speech decoding (SBSD) and iterative source-channel decoding.

Index Terms—Joint source channel coding, residual redundancy, speech decoding.

I. INTRODUCTION

Due to the high compression rate of the MELP (2.4 kb/s) coder [1], the encoded bit streams of the speech parameters are extremely vulnerable to errors, and the quality of the synthesized speech may suffer intolerable degradation, especially under poor channel conditions. Because the speech coder is imperfect, some residual redundancy remains in the encoded bit streams. A number of redundancy-based JSCC techniques [2], such as SCCD [3] and SBSD [4], were proposed to exploit this residual redundancy to improve the quality of the decoded speech when errors occur. Similar ideas were also applied to improve the robustness of MELP coders [5], [6]. Recently, a combination of SCCD and SBSD in an iterative source-channel decoding approach was proposed in [7], which achieves additional quality improvement over SCCD or SBSD alone. However, the size of the interleaver affects the performance of the iterative decoder: although a larger interleaver can lead to better performance, it introduces extra delay, which may not be acceptable for speech transmission.

In this letter, a JSCC scheme which employs a hybrid bit- and symbol-based SCCD is proposed, and a modified BCJR algorithm is developed to exploit the inter- and intra-frame correlation of the transmitted MELP parameters. This scheme is compared with other redundancy-based JSCC schemes. Results show that the proposed scheme achieves significant improvement over other JSCC schemes such as SCCD and SBSD, and is even better than the iterative JSCC decoder.

II. CONVOLUTIONAL ENCODER AND DECODER FOR HYBRID SCCD

The commonly used SCCD operates at the bit level, whereas symbol-based SCCD is rarely used because the complexity of the convolutional decoder grows exponentially with the symbol length, which makes the decoder far too complex for most applications [7]. In this letter, we propose a more efficient scheme that decodes some of the MELP parameters in a frame as nonbinary symbols according to their symbol length and the rest at the bit level.

Suppose that at time $k$, an $m$-bit vector, $d_k$, is fed into a convolutional encoder as shown in Fig. 1. The state $S_k$ of the register, which consists of $\nu$ shift registers, is determined by the most recent $\nu$ inputs. Unlike a conventional shift register that shifts one bit at a time, the proposed convolutional encoder shifts $m$ bits at a time and produces an output sequence $x_k = (x_k^1, \ldots, x_k^n)$, where $n = m/R$ and $R$ is the code rate. It should be noted that the condition $m \le \nu$ should always be satisfied. At the decoder side, a modified BCJR algorithm is proposed to exploit the inter- or intra-frame correlation of the encoded speech parameters and to obtain the a posteriori probability (APP) of each transmitted bit or symbol according to $m$.

Fig. 1. Convolutional encoder for hybrid encoding.

Manuscript received February 10, 2003. The associate editor coordinating the review of this letter and approving it for publication was Dr. I. Fair. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]). Digital Object Identifier 10.1109/LCOMM.2003.815625

1089-7798/03$17.00 © 2003 IEEE

III. MODIFIED BCJR ALGORITHM

Suppose $N$ symbols are transmitted through an AWGN channel. At the receiver side, the channel decoder will find the most probable transmitted symbol $d_k$ given the received sequence $y_1^N = (y_1, \ldots, y_N)$, in which

$$y_k = x_k + w_k \tag{1}$$

where $w_k$ is white Gaussian noise with variance $\sigma^2$.

A simplified BCJR algorithm for binary input is described in [8]. We modify the BCJR algorithm so that it can exploit either inter- or intra-frame correlations of the source symbols. The APP of a decoded symbol is given by (2), where (3). By using Bayes' rule, we have (4), in which $\alpha$ and $\beta$ are given by (5) and (6), and $\gamma$ in (5) is given by (7).

For memoryless channels, the first term in (7) can be written as

$$C \exp\!\left(-\frac{\lVert y_k - x_k \rVert^2}{2\sigma^2}\right) \tag{8}$$

where $C$ is a constant and $x_k$ is the symbol output given the state and the input.

If no residual redundancy is considered, which means that the transmitted symbols are assumed to be independent and identically distributed (i.i.d.), the second term in (7) is

$$\frac{1}{2^m} \tag{9}$$

To exploit the redundancy, as stated in the previous section, inter- or intra-frame correlations can be considered. When intra-frame symbol correlations are considered, the second term in (7) is reduced to (10). When inter-frame symbol correlations are considered, the second term in (7) is reduced to (11), which is conditioned on the same symbol in the previous decoded frame.

The third term in (7) is (12): it equals 1 if the destination state is the next state of the originating state given the input, and 0 otherwise. The derivation of $\beta$ in (6) is similar to that of $\alpha$ in (5) as described above.

TABLE I ERROR SENSITIVITY OF THE MELP BITS IN TERMS OF SUBJECTIVE EVALUATION

IV. APPLICATION TO MELP SPEECH PARAMETERS

A MELP frame interval is 22.5 ms in duration and contains 54 bits, which are divided into 11 parameters. The subjective evaluation of the error sensitivity of the bits in the MELP coder is given in Table I. It is apparent that the parameters which include bits belonging to Group3 and Group4 are more sensitive to errors and hence need more protection. Thus, we propose to decode them at the symbol level to exploit their residual redundancy more efficiently. By considering the amount of redundancy and the decoding complexity, we identify six indices, listed in Table II, which are decoded at the symbol level. For example, we split the 7 bits of Pitch into two symbols, namely Pitch(H) and Pitch(L), which consist of the most significant 5 bits and the least significant 2 bits of Pitch, respectively.

TABLE II INDICES DECODED AS NONBINARY SYMBOLS AND THEIR REDUNDANCY

According to the characteristics of speech signals and the properties of the MELP encoder, most of the residual redundancy exists in the form of inter-frame correlations. However, it is found that the intra-frame symbol correlation of Pitch(L) with Pitch(H) is stronger than the inter-frame correlation of Pitch(L). The intra-frame correlation of LSF(M) with LSF(H) is comparable to the inter-frame correlation of LSF(M). By comparing (10) and (11), we can find that it is simpler to use intra-frame correlation. Therefore, for Pitch(L) and LSF(M), intra-frame correlation is used.

A convolutional code with a constraint length of 7 and rate 1/2, with the octal generators (133, 171), is chosen as the channel code for the encoded MELP parameters, which are assumed to be transmitted through an AWGN channel. The symbol transition probabilities are obtained from a training sequence consisting of about 50 000 speech frames from the TI-MIT speech database. To make the modified BCJR algorithm implementable, the transmitted bits are reordered such that the bits belonging to the same symbol are grouped together. The APPs of the relevant symbols and bits are obtained by the hybrid convolutional decoder and passed to the MELP decoder, which uses them to make an MMSE estimation (SBSD with no a priori information) for the parameters Pitch, Gain1, Gain2, and LSF1.

V. RESULT AND CONCLUSION

To evaluate the performance of the proposed hybrid scheme, the SNR of Pitch decoded by the proposed JSCC scheme is compared with that of some other redundancy-based JSCC schemes outlined below, and the results are shown in Fig. 2. (The results for Gain1, Gain2, and LSF1 are very similar to those of Pitch.)

Fig. 2. SNR of decoded MELP parameter Pitch.
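As a rough illustration of the two quantities involved here, the MMSE estimate formed from the APPs and the SNR of a decoded parameter, the following Python sketch may help. It assumes that the MMSE estimate is the APP-weighted mean of the reconstruction values and that the parameter SNR is the ratio of signal energy to squared-error energy in dB; the letter does not spell out either formula. The function names, the four-level codebook and all sample values are hypothetical.

```python
import math

def mmse_estimate(app, codebook):
    """APP-weighted mean of the reconstruction values (assumed MMSE rule)."""
    total = sum(app)  # normalise in case the APPs do not sum exactly to 1
    return sum(p * v for p, v in zip(app, codebook)) / total

def parameter_snr_db(original, decoded):
    """10*log10(signal energy / error energy) over the frames (assumed definition)."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)  # undefined when decoding is error-free

# Hypothetical 2-bit index with made-up reconstruction values and APPs.
codebook = [1.0, 2.0, 3.0, 4.0]
app = [0.1, 0.6, 0.2, 0.1]
estimate = mmse_estimate(app, codebook)

# Made-up original and decoded parameter tracks over three frames.
snr = parameter_snr_db([1.0, 2.0, 3.0], [1.0, 2.0, 2.0])
```

With these made-up numbers the estimate falls between the two most probable reconstruction values rather than on either of them, which is the behaviour that distinguishes a soft MMSE estimate from a hard decision.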

I. Bit-based SCCD: The channel decoder uses SCCD to decode the transmitted parameters bit by bit by exploiting the inter-frame correlation of each bit.
II. Bit-based SCCD + SBSD(NA): After bit-based SCCD, the source decoder uses the APPs to perform SBSD with no a priori information for the parameters Pitch, Gain1, Gain2, and LSF1.
III. SBSD(Inter): The inter-frame correlations of Pitch, Gain1, Gain2, and LSF1 are exploited by the source decoder using SBSD rather than by the channel decoder using SCCD.
IV. Iterative bit-based SCCD + SBSD(Inter): A combination of schemes I and III in an iterative process; the number of iterations is 2 [7].

From Fig. 2, it can be observed that our proposed hybrid SCCD scheme performs much better than bit-based SCCD. This is because the residual redundancy is more thoroughly exploited by the hybrid channel decoder. It also performs better than scheme III, in which SBSD is used to exploit the residual redundancy at the same symbol level. This is because, when the redundancy is exploited by the channel decoder, it helps the channel decoder to achieve better error correction, and hence the source decoder is given more accurate APPs for each decoded symbol. The hybrid decoder even slightly outperforms the iterative decoding scheme IV.

Based on the above results, it is clear that the residual redundancy of nonbinary symbols is better exploited at the symbol level. Also, exploiting residual redundancy in the channel decoder achieves more performance improvement than exploiting it in the source decoder when both operate at the same symbol level. However, symbol-based SCCD is more complex than SBSD, especially when the symbol length is long. Therefore, hybrid SCCD should be used instead of symbol-based SCCD because it can efficiently and effectively exploit the residual redundancy of the encoded parameters while maintaining a reasonable complexity.
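To make the complexity remark concrete, the sketch below encodes several bits per trellis step with the (133, 171) code and counts the trellis branches per step. It is a minimal Python sketch, not the authors' implementation: the bit ordering of the generators (newest bit in the most significant position), the helper names and the example input are assumptions.

```python
GENERATORS = (0o133, 0o171)  # octal generators of the constraint-length-7 code
MEMORY = 6                   # memory cells = constraint length - 1

def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1

def encode_bit(state, bit):
    """One conventional encoder step: returns (next_state, [out0, out1])."""
    reg = (bit << MEMORY) | state      # assumed ordering: newest bit is the MSB
    out = [parity(reg & g) for g in GENERATORS]
    return reg >> 1, out               # the oldest bit drops out of the register

def encode_symbol(state, bits):
    """Shift a whole multi-bit symbol through the register as one trellis step."""
    out = []
    for bit in bits:
        state, pair = encode_bit(state, bit)
        out.extend(pair)
    return state, out

def branches_per_step(m):
    """Each of the 2**MEMORY states has 2**m outgoing branches for an m-bit input."""
    return (1 << MEMORY) * (1 << m)

# Hypothetical 2-bit symbol fed into an all-zero register.
next_state, coded = encode_symbol(0, [1, 0])

for m in (1, 2, 5):
    print(m, branches_per_step(m), branches_per_step(m) / m)
```

Under these assumptions a 5-bit symbol such as Pitch(H) needs 2048 branches per trellis step, or about 410 per decoded bit, against 128 for bit-level decoding. This growth is why only some of the indices are treated as nonbinary symbols.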

REFERENCES

[1] A. V. McCree and T. P. Barnwell, III, “A mixed excitation LPC vocoder model for low bit rate speech coding,” IEEE Trans. Speech Audio Processing, vol. 3, pp. 242–250, July 1995.
[2] K. Sayood and J. C. Borkenhagen, “Use of residual redundancy in the design of joint source/channel coders,” IEEE Trans. Commun., vol. 39, pp. 838–846, June 1991.
[3] J. Hagenauer, “Source-controlled channel decoding,” IEEE Trans. Commun., vol. 43, pp. 2449–2457, Sept. 1995.
[4] T. Fingscheidt and P. Vary, “Softbit speech decoding: A new approach to error concealment,” IEEE Trans. Speech Audio Processing, vol. 9, no. 3, March 2001.
[5] D. J. Rahikka, T. E. Fuja, and T. Fazel, “U. S. Federal standard MELP vocoder tactical performance enhancement via MAP error correction,” in Proc. IEEE Military Communications Conf., vol. 2, 1999, pp. 1458–1462.
[6] X. B. Liu, S. N. Koh, and S. Yoshida, “Error concealment using residual redundancy for MELP parameters,” IEICE Trans. Inform. Syst., vol. E85-D, no. 5, p. 906, May 2002.
[7] N. Gortz, “On the iterative approximation of optimal joint source-channel decoding,” IEEE J. Select. Areas Commun., vol. 19, pp. 1662–1670, Sept. 2001.
[8] S. S. Pietrobon and S. A. Barbulescu, “A simplification of the modified Bahl decoding algorithm for systematic convolutional codes,” in Int. Symp. on Information Theory and its Applications, Nov. 1994, pp. 1073–1077. Revised Jan. 1996.