Speech Decoding With Error Concealment Using ... - Semantic Scholar

1 downloads 0 Views 231KB Size Report
Viterbi algorithm (SOVA) in combination with interleaving as proposed in [6], the instantaneous bit error rate for the channel decoded bit i(m) is given by.
SPEECH DECODING WITH ERROR CONCEALMENT USING RESIDUAL SOURCE REDUNDANCY

T. ivingscheidt and P. Vary Institute of Communication Systems and Data Processing Aachen University of Technology, Templergraben 55, D-52056 Aachen, Germany [email protected], http://www.ind.rwth-aachen.de/-tim ABSTRACT In digital mobile communication systems there is a need for error concealment techniques to reduce the subjective effects of residual bit errors which have not been eliminated by channel decoding. This contribution presents a new approach for optimum estimation of speech codec parameters. It can be applied to any speech codec standard if a reliability information about the channel decoded bits is available (e.g. soft-output Viterbi algorithm - SOVA [6]). The proposed method exploits the residual source redundancy and includes an inherent muting mechanism leading to a graceful degradation of speech quality in case of adverse transmission (conditions. In the case of an error free channel, bit exactness as required by the standards can be preserved. This approach is applied here to the GSM full rate codec.

Figure 1: Conception of the soft input speech decoder Viterbi algorithm (SOVA) in combination with interleaving as proposed in [6], t h e instantaneous bit error rate for the channel decoded bit i(m)is given by

being the soft-output value whose sign a(m) = sign[l] equals the decoded hard-bit, ~ ( ~ ) ( denoting m) the CORI:Sponding transmitted bit, and being the received sequen.ce of symbols that is input to the channel decoder. Using the instantaneous bit error ratepe(m) of the received bit ? ( m ) , a bit transition probability can be formulated as

1. THE CONCEPTION OF SOFT INPUT

2

SPEECH DECODING In digital communication systems residual redundancy of the transmitted parameters can be observed even if source encoding techniques are used. This is due to the speech coding strategy, the limited processor resources, and the signal delay constraints. As already mentioned by Shannon [l]this source coding sub-optimality can be used at the receiver side to enhance the decoding process. Former publications on error concealment exploiting this residual source redundancy had been restricted to the MAP error criterion [2, 31, or they assumed that previousiy received parameters had to be known exactly, i.e. without errors [ 4 ] ~ This proposal focuses on the enhancement of the speech quality rather than on the minimization of a bit or symbol error rate [5]. Furthermore, the proposed error concealment technique is able to include parameter individual estimators without taking into consideration idealizing assumptions about previously received parameters. Let us consider a specific codec parameter 6 E R which is coded by M bits. T h e quantized parameter Q[6] = U is represented by the bit combination g = (z(O), ...,z(M-1)) with z(m)E (-1, +I}. Any bit combination g is assigned to a quantization table index i, such that we can write 21 = z(') as well as v = U(') with index i E (0, 1,..., 2 M - 1) to denote the quantized parameter. In Fig. 1 the soft input speech decoder is depicted. Assuming a channel decoding scheme such as the joft-output

Because of the integrated interleaving scheme providing independent bit errors a t least within any bit combination f, a set of 2 M transition probabilities from every transmitted parameter to t h e received one 2 can b e computed by M-1

P(2 I

=

IT

P(j.(m) I z(i)(m))

.

(3)

m=O

In the following, ZAndenotes the bit combination n time instants' before the present bit combination & . For the estimation of speech codec parameters at the receiver, a posteriori probability terms providing inforntation about any transmitted parameter index i are required. The complete receiver knowletge comprising all previously with X-,= ~ ~ , g ... - ~ , received bit combinations &,X-, shall be exploited. To compute the a posteriori terms it is necessary to find a statistical model of the sequence of quantized parameters. In a first approach, the sequence of quantized parameters is modelled as a Markov process of 1st order, i.e. P(% I ~ - ~ , g...) - ~ = ,P ( 5 I Solutions For *

A

lThe term "time instant" denotes any moment when the regarded parameteris received. In an ADPCM codec e.g. it equals a sample instant, in CELP coders it may be a frame or a subframe instant.

This work was supported by the Deutsche Forschungsge meinschaft (DFG) within the program "Mobilkommunikation"

.

91

10r

higher order models can be derived. After some intermedia t e steps the solution can be given in terms of a recursion as P(%(i)

I &,g-l) = c . P(& 1 %

I

(9),

j=0

&,g-,)

2M-1

T h e constant C guarantees P(G(’) I = 1. T h e terms P(%(‘) I contain the a priori knowledge (redundancy information) which has to be measured once over a large data base. These measurements can be stored at the receiver in a ROM table of size 2 M x 2‘. Furthermore it should be noted that the term P ( Z - ~ ( ~I &) l , & - 2 ) is nothing els_e but the resulting a posteriori probability from the previous time instant. P(%(3) I

* MAP Estimation

&,x-l)

Conventional Decoding

-2

AND SIMULATION RESULTS

&>x-l).

n/=argma~P(%(~ I) t

4

quality can be enhanced further i n comparison to the MAP estimation. Hard annoying effects disappear and are turned to a slightly modulated and noisy speech quality including the inherent muting effect. This is subjectively judged to be much more pleasant than the typical click noise and the synthetic frame repetition effects of digital transmission systems. It should be noted that this approach can be applied to different source coding schemes such as ADPCM and CELP using individual parameter estimators and Markov model orders. Simulated speech files c a n be found in the web at location http://www.ind.rwth-aachen.de/-tim and will b e presented at t h e workshop.

T h e maximum a posteriori probability (MAP) estimator follows the simple criterion v M A p= v ( ~ ) with

2

6

8 10 12 CiI [dB] Figure 2: Soft input GSM full rate speech decoding

2. INDIVIDUAL PARAMETER ESTIMATION

(5)

The optimum decoded parameter in a M A P sense v M A palways equals one of the codebook/ quantization table entries minimizing the decoding error probability. In Fig. 2 the results of a complete GSM simulation using the COSSAP GSM library [7] with speech and channel coding, interleaving, modulation, a channel model, demodulation and equalization are depicted for different carrier-tointerferer ratios C/I and user speeds. The channel model represents a typical case for an urban area (TU) taking into account 6 characteristic propagation paths. Although the SNR surely is not t h e optimum measure for speech quality, informal listening tests as well as SNR measurements show a significant superiority of MAP estimation in comparison t o the conventional GSM decoder with a frame repetition algorithm [SI as error concealment method. For a wide area of speech codec parameters such as spectral coefficients, gain factors, etc., the minimum mean square error (MS) estimation criterion performs even better than the M A P one. The optimum decoded parameter v M s in a mean square sense equals

ACKNOWLEDGEMENTS The authors would like to thank C.G. Gerlach, and S. Heinen for a lot of helpful discussions. 3. REFERENCES C.E. Shannon, “A Mathematical Theory of Communication”, Bell Systems Technical Journal, vol. 27, pp. 379-423, July 1948. K. Sayood and J.C. Borkenhagen, “Use of Residual Redundancy in the Design of Joint Source/ Channel Coders”, IEEE Transactions on Communications, vol. 39, no. 6, pp. 838846, June 1991. J. Hagenauer, “Source- Controlled Channel Decoding”, IEEE Transactions on Communications, vol. 43, no. 9, pp. 2449-2457, September 1995. C.G. Gerlach, “A Probabilistic Framework for Optimum Speech Extrapolation in Digital Mobile Radio”, Proc. of ICASSP’9.9, pp. 11-419-11-422, April 1993. T. Fingscheidt and P. Vary ‘‘Error Concealment by Softbit Speech Decoding”, Proc. of ITG-Fachtagung Sprachkommunikation, pp. 7-10, September 1996. J. Hagenauerand P. Hoeher, “A Viterbi Algorithm with SoftDecision Outputs and its Applications”, Proc. of GLOBE-

2M-1

i=O

According to the well known orthogonality principle of the linear mean square estimation the variance of the estimation error e M S = u M s - v is g2 = U: -,,;U with MS uzMs 2 0. Therefore, we can state t h a t u2VMs 5 U;. In the case of a worst case channel with p e = 0.5, the MS estimated parameter according to eq. (6) is completely attenuated t o zero if has a zero mean. This is e.g. the case for gain factors in CELP coders. Thus the MS estimation of the gain factors results in an inherent muting mechanism providing a graceful degradation of speech. By MS estimation of all non-integer parameters of the GSM full rate speech codec (curves marked by ”0” in Fig. a), the speech

COM, pp. 1680-1686,1989.

Synopsys Inc., Mountain View, CA, USA, “COSSAP Model Libraries Volume 3”, 1995. “Recommendation GSM 06.11, Substitution and Muting of Lost F‘rames for Full Rate Speech Traffic Channels”, ETSI/TC SMG, February 1992.

92