[Downloaded from www.aece.ro on Sunday, April 15, 2018 at 18:12:10 (UTC) by 191.96.241.223. Redistribution subject to AECE license or copyright.]
Advances in Electrical and Computer Engineering
Volume 11, Number 3, 2011
ADPCM Using a Second-order Switched Predictor and Adaptive Quantizer
Vladimir DESPOTOVIC1, Zoran PERIC2
1 University of Belgrade, Technical Faculty of Bor, VJ 12, 19210 Bor, Serbia
2 University of Nis, Faculty of Electronic Engineering, A. Medvedeva 14, 18000 Nis, Serbia
[email protected] 1
Abstract — Adaptive differential pulse code modulation (ADPCM) with a forward gain-adaptive quantizer and a second-order switched predictor based on correlation is presented in this article. The predictor consists of a bank of predetermined predictors, one of which is selected for each block of speech samples, avoiding the need to solve for or quantize predictor coefficients during the coding process. The adaptation consists of switching to one of these predictors based on the values of the first- and second-order correlation coefficients. The theoretical model is a generalization of DPCM with the first-order switched predictor to an arbitrary prediction order. Experimental results for ADPCM with the second-order four/eight-state switched prediction based on correlation are provided.
Index Terms — adaptive coding, correlation, predictive coding, speech processing, signal to noise ratio
I. INTRODUCTION
This paper presents ADPCM with a gain-adaptive quantizer and a switched second-order predictor based on correlation. Gain-adaptive quantization was first introduced by Chen and Gersho [1-2]. The quantizer is adapted in response to a short-term estimate of the input signal standard deviation. A gain-adaptive quantizer based on the optimal companding model was successfully implemented in DPCM with a simple first-order predictor in [3]. Speech is first divided into frames; each frame is classified as low or high correlated based on the value of the correlation coefficient. Low correlated frames are encoded with a higher bit-rate (i.e. 7 bits/sample), while high correlated frames are encoded with a lower bit-rate (i.e. 6 bits/sample) without objectionable loss in quality of the reconstructed speech signal. A switched first-order fixed predictor is used in the DPCM scheme, with two possible values of the predictor coefficient: one for low correlated frames (close to zero), and one for high correlated frames (close to one).
This paper presents an improvement of the DPCM coding scheme with forward gain-adaptive quantizer and simple first-order predictor presented in [3]. While the quantizer is not changed, special attention is given to improving the predictor performance. A switched prediction scheme based on correlation is used, where both transmitter and receiver have a bank of Lp possible predictors, and adaptation consists of switching to one of these predictors based on the value of the correlation coefficient. The idea was initiated by the second-order four-state switched predictor introduced by Evci, Xydeas and Steele [4-5], which performs very well, especially in speech frames involving voiced-to-unvoiced transitions. However, this predictor uses only the first normalized acf (autocorrelation function) value ρ1 to
assign a unique set of predictor coefficients to each of Lp=4 states (zones). We propose a switched predictor where the division into Lp zones takes into account both the first and the second normalized acf values (correlation coefficients ρ1 and ρ2). Note that the correlation coefficient is a number between -1 and 1 that indicates the degree of linear dependence between the samples of speech. The correlation coefficient is +1 in the case of a perfect positive (increasing) linear relationship, −1 in the case of a perfect negative (decreasing) linear relationship [6], and some value between −1 and 1 in all other cases. As it approaches zero, the samples become less correlated.
The advantages of the proposed prediction scheme are: (i) all Lp predictors are predetermined, i.e. there is no need to transmit the coefficients, as these are stored at the receiver; (ii) the side information transmitted is simply the index of the predictor; and (iii) the amount of side information, log2 Lp bits/frame, is not significantly increased compared to the first-order switched predictor. Cases of second-order switched predictors with four and eight predictor zones are analyzed.
The remainder of the article is organized as follows. Section II describes the theoretical basics of quantizer design. Section III presents a special case of a second-order four-state switched predictor. Section IV gives a concrete realization of ADPCM with gain-adaptive quantizer and switched predictor based on correlation. Section V discusses the theoretical model and experimental results. Finally, Section VI gives a conclusion.
II. THEORETICAL BACKGROUND AND QUANTIZER DESIGN
Consider a case of linear predictive coding (LPC) where the current speech sample can be approximated by a linear combination of past samples:
$$x(n) = \sum_{k=1}^{p} a_k x(n-k) + e(n) \tag{1}$$
where x(n) is the current speech sample, p is the prediction order, ak are the prediction coefficients and e(n) is the prediction error. Then the squared error accumulated over a frame is given by:
$$E = \sum_{n=0}^{N-1} e^2(n) = \sum_{n=0}^{N-1} \left[ x(n) - \sum_{k=1}^{p} a_k x(n-k) \right]^2 \tag{2}$$
where N is the number of samples in the frame. After some manipulation (2) can be rewritten as:
Digital Object Identifier 10.4316/AECE.2011.03010
61 1582-7445 © 2011 AECE
$$E = R_{xx}(0) - \sum_{k=1}^{p} a_k R_{xx}(k) \tag{3}$$
where $R_{xx}(k)$ is the autocorrelation function of the signal x at lag k [7]. Knowing that $\sigma_x^2 = \frac{1}{N}\sum_{n=0}^{N-1} x^2(n) = \frac{R_{xx}(0)}{N}$ and $\sigma_d^2 = \frac{1}{N}\sum_{n=0}^{N-1} e^2(n)$ are the variances of the speech and error signal respectively, the following expression is obtained:
$$\sigma_d^2 = \sigma_x^2 - \frac{1}{N}\sum_{k=1}^{p} a_k R_{xx}(k) \tag{4}$$
The correlation coefficient at lag k is:
$$\rho_k = \frac{\sum_{i=1}^{N-k} x_i x_{i+k}}{\sum_{i=1}^{N} x_i^2} = \frac{R_{xx}(k)}{R_{xx}(0)} = \frac{R_{xx}(k)}{N\sigma_x^2} \tag{5}$$
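The correlation coefficient in (5) can be computed directly from the samples of one frame. The following is a minimal sketch, assuming NumPy is available; the function name `correlation_coefficient` and the synthetic cosine frame are illustrative choices and do not come from the paper.

```python
import numpy as np

def correlation_coefficient(x, k):
    """Normalized autocorrelation rho_k of one frame, following (5)."""
    x = np.asarray(x, dtype=float)
    numerator = np.dot(x[:len(x) - k], x[k:])  # sum of x_i * x_{i+k}, i = 1..N-k
    denominator = np.dot(x, x)                 # sum of x_i^2 = R_xx(0)
    if denominator == 0.0:
        return 0.0                             # all-zero frame: treat as uncorrelated
    return float(numerator / denominator)

# Synthetic, slowly varying frame (a stand-in for voiced speech; assumption)
frame = np.cos(2 * np.pi * 0.02 * np.arange(160))
rho1 = correlation_coefficient(frame, 1)
rho2 = correlation_coefficient(frame, 2)
print(f"rho1 = {rho1:.3f}, rho2 = {rho2:.3f}")
```

For a slowly varying frame such as this one, both coefficients come out close to one, with ρ2 somewhat smaller than ρ1.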
Substituting (5) in (4), we obtain:
$$\sigma_d^2 = \sigma_x^2 \left( 1 - \sum_{k=1}^{p} a_k \rho_k \right) \tag{6}$$
Let us now consider an ADPCM scheme with a predictor of prediction order p. An expression for the distortion of an N-point scalar quantizer designed optimally in the minimum mean square sense for the case of the first-order predictor is derived in [3]. Using (6) and having in mind that in ADPCM the prediction error signal is actually quantized, we can now generalize this result to an arbitrary prediction order:
$$D = \frac{9\hat{\sigma}^2}{2N^2 C} \left( 1 - \sum_{k=1}^{p} a_k \rho_k \right) \tag{7}$$
where $C = 3 - 2\sigma/\hat{\sigma}$ and $\hat{\sigma}^2$ is the variance of the quantized signal. The signal to quantization noise ratio can be determined as:
$$\mathrm{SQNR} = 6.02R + 10\log_{10} \frac{C(3-C)^2}{18\left( 1 - \sum_{k=1}^{p} a_k \rho_k \right)} \tag{8}$$
where $N = 2^R$ and R is the bit rate.
III. SPECIAL CASE OF A SECOND-ORDER SWITCHED PREDICTOR
Let us introduce a second-order, four-state switched predictor, where the predictor switches to one of four possible states based on the correlation coefficient ρ1, as given in Table I. Values for ρ1 are chosen as the midpoints of each range. It is assumed that the second-order correlation coefficients ρ2 are somewhat smaller.

TABLE I. SECOND ORDER, FOUR-STATE SWITCHED PREDICTOR
State number   Range of ρ1    ρ1      ρ2
1              0.6 ÷ 1.0      0.8     0.6
2              0.3 ÷ 0.6      0.45    0.3
3              0.0 ÷ 0.3      0.15    0.05
4              -1.0 ÷ 0.0     -0.5    -0.25

Due to the concentration of low-frequency energy in voiced sounds, adjacent samples of voiced speech are highly correlated, with correlation coefficient close to one [8-9]. On the other hand, the correlation is close to zero for unvoiced speech.
It will be assumed that voiced speech mostly has the correlation coefficient in the first range (0.6, 1.0]. It will be encoded with the lower bit-rate. Let us assume that unvoiced speech (including the periods of silence) is evenly distributed over the other three ranges and encoded with the higher bit-rate. The weighted SQNR can be expressed according to this classification as:
$$\mathrm{SQNR} = w\,\mathrm{SQNR}(1) + \frac{1-w}{3}\left[ \mathrm{SQNR}(2) + \mathrm{SQNR}(3) + \mathrm{SQNR}(4) \right] \tag{9}$$
where w is a weight that denotes the share of voiced frames in overall speech and
$$\mathrm{SQNR}(i) = 6.02R_i + 10\log_{10} \frac{C(3-C)^2}{18\left( 1 - \sum_{k=1}^{2} a_k \rho_k \right)}, \quad i = 1 \ldots 4 \tag{10}$$
where Ri are the bit-rates associated with the corresponding range. It is well known that for the second-order predictor there is a direct relation between predictor and correlation coefficients [5]:
$$a_1 = \frac{\rho_1 (1 - \rho_2)}{1 - \rho_1^2} \tag{11}$$
$$a_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} \tag{12}$$
Corresponding predictor coefficients a1 and a2 are determined using (11) and (12) for each range. Careful choice of ranges can lead to substantial improvement in the quality of encoded speech.
IV. IMPLEMENTATION OF SWITCHED PREDICTION BASED ON CORRELATION IN ADPCM
An ADPCM scheme with a gain-adaptive quantizer and a second-order switched predictor based on correlation is used in this article. The proposed scheme is an improvement of the coder given in [3]. Adaptation of the quantizer is performed using a forward gain-adaptive quantizer based on the optimal companding model [10]. The quantizer is adapted in response to a short-term estimate σ̂n of the input signal standard deviation. This may be achieved by scaling all the samples by a gain factor ĝ determined by σ̂n; however, it is preferable from a complexity standpoint to divide the input to the quantizer by the estimated gain. Since the quantizer is described in detail in [3], we will concentrate in this paper on the implementation of the second-order switched predictor in the ADPCM coding scheme.
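As a worked illustration of the predictor design in Section III, the sketch below evaluates the second-order relations a1 = ρ1(1 − ρ2)/(1 − ρ1²) and a2 = (ρ2 − ρ1²)/(1 − ρ1²) for the four (ρ1, ρ2) pairs of Table I, and reports the factor 1 − Σ ak ρk by which the prediction reduces the variance in (6). The names `second_order_coefficients` and `TABLE_I` are illustrative and not taken from the paper.

```python
def second_order_coefficients(rho1, rho2):
    """Predictor coefficients of a second-order predictor from rho1 and rho2."""
    denominator = 1.0 - rho1 ** 2
    a1 = rho1 * (1.0 - rho2) / denominator
    a2 = (rho2 - rho1 ** 2) / denominator
    return a1, a2

# (rho1, rho2) pairs per state, copied from Table I
TABLE_I = {1: (0.8, 0.6), 2: (0.45, 0.3), 3: (0.15, 0.05), 4: (-0.5, -0.25)}

for state, (rho1, rho2) in TABLE_I.items():
    a1, a2 = second_order_coefficients(rho1, rho2)
    # Ratio sigma_d^2 / sigma_x^2 of error variance to signal variance
    residual = 1.0 - (a1 * rho1 + a2 * rho2)
    print(f"state {state}: a1 = {a1:+.4f}, a2 = {a2:+.4f}, "
          f"error/signal variance = {residual:.4f}")
```

Because each state has a fixed coefficient pair, both encoder and decoder can hold this table in memory, and only the state index has to be signalled per frame.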
A block scheme of the encoder/decoder is given in Fig. 1. The input speech signal x[n] is first divided into blocks (frames) with a typical length of 80-240 samples (10-30 ms) [11-13]. The correlation coefficient at lag k is determined using (5) for each frame. Note that two correlation coefficients are necessary for the second-order predictor (k = 1, 2).
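The per-frame processing described above can be sketched as follows: split the signal into frames, compute ρ1 and ρ2 for each frame, and pick a predictor state. The sketch assumes NumPy, a frame length of 160 samples (inside the stated 80-240 range), and the four-state rule of Table I, which looks at ρ1 only; treating each range as open at the lower bound is also an assumption. All function names are hypothetical.

```python
import numpy as np

FRAME_LEN = 160               # assumed frame length, within the 80-240 range
THRESHOLDS = (0.6, 0.3, 0.0)  # range boundaries read off Table I

def rho(frame, k):
    """Correlation coefficient of one frame at lag k."""
    frame = np.asarray(frame, dtype=float)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return 0.0
    return float(np.dot(frame[:len(frame) - k], frame[k:]) / energy)

def select_state(rho1):
    """Map rho1 to a state number 1..4 (state 1 = most correlated)."""
    for state, threshold in enumerate(THRESHOLDS, start=1):
        if rho1 > threshold:
            return state
    return len(THRESHOLDS) + 1

def classify_frames(x, frame_len=FRAME_LEN):
    """Return (state, rho1, rho2) for every complete frame of x."""
    x = np.asarray(x, dtype=float)
    results = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        rho1, rho2 = rho(frame, 1), rho(frame, 2)
        results.append((select_state(rho1), rho1, rho2))
    return results

# Two synthetic frames: a slow cosine, then a sign-alternating sequence
n = np.arange(FRAME_LEN)
signal = np.concatenate([np.cos(2 * np.pi * 0.02 * n), (-1.0) ** n])
states = [state for state, _, _ in classify_frames(signal)]
print(states)
```

A scheme that divides the zones using both ρ1 and ρ2 would replace `select_state` with a two-argument rule; that rule is not reproduced here.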
four-state switched predictor presented in Section III. Let us assume that the share of voiced speech is 60% [14], i.e. w = 0.6 in (9). That means that 60% of frames are considered highly correlated and classified as state 1 in Table I. These are encoded at a bit-rate of 6 bits/sample. The frames classified in the other three ranges are encoded at the higher bit-rate of 7 bits/sample. The dependence of SQNR on the signal variance (a dynamic range from -20 dB to 20 dB is assumed) for 16-level gain quantization is given in Fig. 2, i.e. we have a case of a switched quantizer designed optimally for 16 different values of variance (gain) in order to cover the whole dynamic range of the input speech signal. As Fig. 2 shows, the proposed model outperforms the quantizer with the first-order switched predictor given in [3]. The ITU-T G.712 standard is also satisfied over the whole dynamic range. G.712 recommends a lower bound on speech quality, measured in SQNR, that needs to be satisfied to reproduce a high-quality speech signal [15].
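The weighted SQNR for this setting can be evaluated numerically. The sketch below assumes that (10) reads SQNR(i) = 6.02 Ri + 10 log10[C(3 − C)² / (18(1 − Σ ak ρk))] and that the gain estimate is perfectly matched, so that C = 1; the matched-gain case is an assumption made here for illustration and is a single point, not the full curve of Fig. 2. The values w = 0.6, 6 bits/sample for state 1 and 7 bits/sample for the other states are those stated above.

```python
import math

def second_order_coefficients(rho1, rho2):
    """Second-order predictor coefficients, as in (11) and (12)."""
    denominator = 1.0 - rho1 ** 2
    return rho1 * (1.0 - rho2) / denominator, (rho2 - rho1 ** 2) / denominator

def sqnr_state(bit_rate, rho1, rho2, c=1.0):
    """SQNR in dB of one state, following the assumed form of (10)."""
    a1, a2 = second_order_coefficients(rho1, rho2)
    residual = 1.0 - (a1 * rho1 + a2 * rho2)
    return 6.02 * bit_rate + 10.0 * math.log10(c * (3.0 - c) ** 2 / (18.0 * residual))

# state: (bit rate, rho1, rho2); correlation values from Table I
STATES = {1: (6, 0.8, 0.6), 2: (7, 0.45, 0.3), 3: (7, 0.15, 0.05), 4: (7, -0.5, -0.25)}
W = 0.6  # share of voiced frames

per_state = {i: sqnr_state(*params) for i, params in STATES.items()}
# Weighted average as in (9): state 1 gets weight w, the rest share 1 - w equally
weighted = W * per_state[1] + (1.0 - W) / 3.0 * sum(per_state[i] for i in (2, 3, 4))

for i, value in per_state.items():
    print(f"SQNR({i}) = {value:.2f} dB")
print(f"weighted SQNR = {weighted:.2f} dB")
```

Sweeping `c` over values corresponding to different ratios of actual to estimated standard deviation would trace how the SQNR changes across the dynamic range.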
Figure 1. (a) Encoder; (b) Decoder
For high-quality speech coding at mid bit-rates, rather simple switched predictor procedures may be adequate. An illustrative example is the switched second-order, four-state predictor given in Table I. Based on the value of the first-order correlation coefficient of the particular frame, the frame is classified in one of four possible ranges, or states, (T1,1), (T2,T1), (T3,T2) and (-1,T3), where Ti, i=1,..,3 are boundary values of the correlation coefficient (numbers between -1 and 1). For example, the frame is classified in state 2 if the T2