Proceedings of the 6th Nordic Signal Processing Symposium - NORSIG 2004 June 9 - 11, 2004 Espoo, Finland

PITCH SYNCHRONOUS ADDITION AND EXTENSION FOR LINEAR PREDICTIVE ANALYSIS OF NOISY SPEECH

Tetsuya Shimamura
Department of Information and Computer Sciences, Saitama University
225 Shimo-Okubo, Saitama 338-8570, Japan

ABSTRACT

This paper proposes an approach for pitch synchronous linear predictive coding (LPC) of speech in noisy environments. A noise reduction method is derived which produces an enhanced speech signal with one pitch period. For the proposed LPC method, the enhanced one pitch speech signal is used in a form of pitch extension so that the autocorrelation function is obtained accurately. Simulation experiments show that the proposed LPC method provides a superior performance in white noise.

1. INTRODUCTION

The main aim of LPC is to obtain the predictive coefficients of speech. For clean speech, the predictive coefficients can be estimated accurately by means of the autocorrelation method [1]. Furthermore, for the autocorrelation method it is possible to guarantee the stability of the resulting LPC filter, the transfer function of which is given by









$$H(z) = \frac{G}{1 + \sum_{i=1}^{p} a_i z^{-i}} \tag{1}$$

where $a_i$ $(i = 1, \ldots, p)$ are the predictive coefficients, $p$ is the prediction order, and $G$ is the gain parameter. If the LPC filter is stable, all the roots of the denominator are inside the unit circle.

However, when noise is added to the speech signal, estimation of the predictive coefficients becomes a very difficult task [2]. To overcome this problem, numerous methods have been proposed [3]-[6]. Unfortunately, a satisfactory solution that both preserves the stability of the LPC filter and provides accurate estimation has not yet been obtained. For example, a useful method to improve the performance of LPC analysis in such a noisy environment is to utilize the technique of noise compensation. Because the predictive coefficients can be computed from the autocorrelation function (ACF) of speech by using the autocorrelation method (Levinson-Durbin algorithm), an improvement of LPC is expected if the noise components are excluded from the ACF. In the case where the additive noise is white, the ACF of speech can be expressed by





















$$R_s(k) = \begin{cases} R_x(k) - \sigma_w^2, & k = 0 \\ R_x(k), & k \neq 0 \end{cases} \tag{2}$$

where $\sigma_w^2$ is the power of the white noise, and $R_s(k)$ and $R_x(k)$ are the ACFs of speech and noisy speech, respectively. By subtracting $\sigma_w^2$, which accounts for the noisy part, from the ACF of noisy speech at lag $k = 0$ as in (2), the predictive coefficients may be estimated more accurately than with the original LPC [4]. However, a serious problem appears here because instability of the LPC filter arises when the noise power to be subtracted is too large. This problem is not easily avoided unless an accurate a priori estimate of the noise variance is given, but this is difficult in practice.

In this paper, we propose a noise reduction method based on pitch synchronous addition for noisy speech analysis, which always leads to a stable filter in the LPC analysis that follows. We theoretically investigate the proposed noise reduction method and demonstrate its effectiveness in a noisy environment by computer simulations. Furthermore, by studying how the achieved improvement in signal to noise ratio (SNR) depends on the pitch period of speech, we show that the proposed noise reduction method is especially effective for high-pitched speech.

In this paper, the proposed noise reduction method is applied to the pitch synchronous LPC analysis method addressed by Paliwal et al. [8]. Pitch synchronous LPC analysis is suitable for vocal tract and speech source analyses [9]. It also accomplishes efficient transmission in speech communication systems [7]. The pitch synchronous LPC analysis method by Paliwal et al. [8] is based on the principle of extending the pitch period of speech, resulting in a method that provides an accurate estimate in noiseless environments. However, this method is also severely affected in noisy environments. The proposed pitch synchronous LPC method, a combination of the proposed noise reduction method with Paliwal's pitch synchronous LPC analysis method, is very effective in noisy environments. This is shown by computer simulations in which a performance comparison is made.

2. NOISE REDUCTION USING PITCH SYNCHRONOUS ADDITION

Figure 1 shows the waveform of a speech signal.
It should be noted here that the speech signal has a clear periodicity: the waveform of the benchmark signal (the one-period signal) keeps nearly the same amplitude from period to period. This period is known as the pitch period of speech. This property may hold for about 20-25 msec of an utterance, because speech is assumed stationary over such a duration [9]. Therefore, by averaging in a fashion synchronized to the pitch period, we can decrease the noise power when the speech signal is corrupted by additive noise. Using the autocorrelation method after such an operation, we can expect both to obtain the predictive coefficients accurately and to guarantee the stability of the resulting LPC filter.
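As a minimal sketch of the averaging idea described above, and not the implementation used for the experiments in this paper, the following Python fragment averages a noisy periodic signal period by period and compares the measured SNR gain with $10\log_{10} M$, where $M$ is the number of averaged periods. The helper names `pitch_synchronous_average` and `snr_db`, the two-sinusoid stand-in for one period of voiced speech, the noise level, and the values of 80 samples per period and 25 periods are all assumptions made for illustration; the pitch period is taken as known and constant.

```python
import math
import random


def pitch_synchronous_average(x, period):
    """Average the whole pitch-period blocks of x, sample by sample."""
    n_blocks = len(x) // period  # number of averaged periods (M)
    avg = [sum(x[i * period + n] for i in range(n_blocks)) / n_blocks
           for n in range(period)]
    return avg, n_blocks


def snr_db(signal, noise):
    """SNR in dB computed from separate signal and noise sequences."""
    p_signal = sum(v * v for v in signal)
    p_noise = sum(v * v for v in noise)
    return 10.0 * math.log10(p_signal / p_noise)


rng = random.Random(0)
period, n_periods = 80, 25  # assumed: 8 msec at 10 kHz, 25 periods per frame

# Hypothetical one-period waveform standing in for voiced speech.
one_period = [math.sin(2 * math.pi * n / period)
              + 0.5 * math.sin(6 * math.pi * n / period)
              for n in range(period)]
clean = one_period * n_periods                 # perfectly periodic signal
noise = [rng.gauss(0.0, 0.5) for _ in clean]   # additive white Gaussian noise
noisy = [s + w for s, w in zip(clean, noise)]

averaged, m = pitch_synchronous_average(noisy, period)
residual = [a - s for a, s in zip(averaged, one_period)]  # noise left after averaging
gain_db = snr_db(one_period, residual) - snr_db(clean, noise)

print(f"M = {m}")
print(f"measured SNR gain  = {gain_db:.2f} dB")
print(f"10*log10(M)        = {10 * math.log10(m):.2f} dB")
```

The measured gain fluctuates around the predicted value because the residual noise power is estimated from a single period of samples; with real speech, deviations from exact periodicity would reduce the gain further.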

Fig. 1. Waveform of a speech signal. (Horizontal axis: time in samples, 0 to 2000; vertical axis: amplitude, −4000 to 4000.)

2.1. Pitch Synchronous Addition

Let us assume that a noisy speech signal is given by

$$x(n) = s(n) + w(n) \tag{3}$$

where $s(n)$ is the original speech signal and $w(n)$ is the additive noise. We divide the noisy speech signal into $M$ blocks such as

$$x_i(n) = x(n + (i-1)L), \quad n = 0, 1, \ldots, L-1, \quad i = 1, 2, \ldots, M \tag{4}$$

where $L$ is the number of samples in each pitch period and $M$ is the number of pitch periods included in the analysis frame that is assumed stationary. The synchronous addition is implemented for the $M$ blocks, which is described by

$$\bar{x}(n) = \frac{1}{M} \sum_{i=1}^{M} x_i(n) \tag{5}$$

Equation (5), which includes the division by $M$, is equivalent to the operation of pitch synchronous averaging.

2.2. Improvement in SNR

The SNR of the noisy speech signal $x(n)$ is obviously

$$\mathrm{SNR} = \frac{\sum_{n=0}^{ML-1} s^2(n)}{\sum_{n=0}^{ML-1} w^2(n)} \tag{6}$$

On the other hand, because the blocked speech signal $x_i(n)$ is

$$x_i(n) = s_i(n) + w_i(n) \tag{7}$$

where $s_i(n)$ and $w_i(n)$ correspond to the blocked speech and additive noise, respectively, the SNR of $x_i(n)$ is defined by

$$\mathrm{SNR}_i = \frac{\sum_{n=0}^{L-1} s_i^2(n)}{\sum_{n=0}^{L-1} w_i^2(n)} \tag{8}$$

which also equals (6). Inserting (7) into (5), the averaged signal is rewritten as

$$\bar{x}(n) = \frac{1}{M} \sum_{i=1}^{M} s_i(n) + \frac{1}{M} \sum_{i=1}^{M} w_i(n) \tag{9}$$

In (9), $s_i(n)$ is constant for each $i$, while $w_i(n)$ is random for each $i$, because the characteristics of speech are not changed in a stationary state. Therefore, if $M$ approaches infinity, then the result of (9) becomes

$$\bar{x}(n) = s_i(n) \tag{10}$$

In a practical situation, it is impossible to add the blocked speech samples infinitely many times. However, while the amplitude of $s_i(n)$ stays constant under synchronous addition (averaging), that of $w_i(n)$ is reduced to $1/\sqrt{M}$ of its original value. This is obvious by considering the distribution of random signals. Thus, the SNR becomes

$$\overline{\mathrm{SNR}} = \frac{\sum_{n=0}^{L-1} s_i^2(n)}{\frac{1}{M} \sum_{n=0}^{L-1} w_i^2(n)} = M \cdot \mathrm{SNR}_i \tag{11}$$

Therefore, the improvement in SNR is given by

$$\frac{\overline{\mathrm{SNR}}}{\mathrm{SNR}} = M \tag{12}$$

That is, by synchronously adding the blocked speech signals and dividing the sum by $M$, the SNR increases by a factor of $M$.

3. PITCH EXTENSION

Let us assume that for a voiced speech signal $s(n)$,

$$s(n) = s(n + jL) \tag{13}$$

is satisfied, where $L$ is the sample length corresponding to the pitch period and $j$ is an integer. In [8], based on this property of voiced speech, a pitch extended method for LPC was derived. When $s(n)$ is obtained from one pitch period corresponding to the sample length $L$, the autocorrelation function of $s(n)$ is calculated as

$$R_s(k) = \sum_{n=0}^{L-1-k} s(n)\, s(n+k) \tag{14}$$

In this case, it is assumed that only zero samples exist outside the windowed segment. In [8], it was instead assumed that periodic samples corresponding to the pitch period exist outside the windowed segment. As a result, the autocorrelation function of $s(n)$ was calculated as

$$\tilde{R}_s(k) = \sum_{n=0}^{L-1} s(n)\, s(n+k) \tag{15}$$

The increase in the number of data samples used to calculate the autocorrelation function led to an improvement of the performance of LPC analysis.

Paliwal assumed a noiseless situation in [8], but in this paper a noisy situation is considered. Thus, the proposed method is as follows. The noise reduction method in Section 2 is applied first. Then, according to Paliwal's approach, instead of

$$R_{\bar{x}}(k) = \sum_{n=0}^{L-1-k} \bar{x}(n)\, \bar{x}(n+k) \tag{16}$$

the following autocorrelation function calculation

$$\tilde{R}_{\bar{x}}(k) = \sum_{n=0}^{L-1} \bar{x}(n)\, \bar{x}(n+k) \tag{17}$$

is used for pitch synchronous LPC analysis in noisy environments, where $\bar{x}(n)$ is extended periodically with period $L$ outside the segment. This preserves the property that the correlation matrix is Toeplitz. Therefore, the Levinson-Durbin algorithm can be used efficiently to obtain the predictive coefficients, and the stability of the estimated LPC filter is guaranteed. In the above derivation of the proposed LPC method, the combination of the noise reduction method with (16) can also be regarded as a new method; this is the proposed method without pitch extension.

4. ANALYSIS

Noisy speech lacks perfect pitch periodicity, that is,

$$x(n) \neq x(n + jL) \tag{18}$$

Hence, the application of (15) to a speech signal having the property of (13) is different from that of (17) to a speech signal having the property of (18). In this sense, the property of (17) should be investigated. By inserting (3) into (17), with $s(n)$ and $w(n)$ understood as the speech and noise components of the one-period signal, and expanding, we have

$$\tilde{R}_{\bar{x}}(k) = \sum_{n=0}^{L-1} s(n)s(n+k) + \sum_{n=0}^{L-1} s(n)w(n+k) + \sum_{n=0}^{L-1} w(n)s(n+k) + \sum_{n=0}^{L-1} w(n)w(n+k) \tag{19}$$

By taking the expectation on both sides, this equation reduces to

$$E[\tilde{R}_{\bar{x}}(k)] = \sum_{n=0}^{L-1} s(n)s(n+k) + E\left[\sum_{n=0}^{L-1} w(n)w(n+k)\right] \tag{20}$$

where $E[\cdot]$ denotes expectation, because the speech signal $s(n)$ is uncorrelated with the noise component $w(n)$. On the other hand, by inserting (3) into (16), expanding it, and taking the expectation, we have

$$E[R_{\bar{x}}(k)] = \sum_{n=0}^{L-1-k} s(n)s(n+k) + E\left[\sum_{n=0}^{L-1-k} w(n)w(n+k)\right] \tag{21}$$

In (20), the noisy term is expressed by

$$E\left[\sum_{n=0}^{L-1} w(n)w(n+k)\right] = L\,\sigma_w^2\,\delta(k) \tag{22}$$

where $\sigma_w^2$ is the variance of the white noise and $\delta(k)$ equals 1 for $k = 0$ and 0 otherwise. In (21), the noisy term is expressed by

$$E\left[\sum_{n=0}^{L-1-k} w(n)w(n+k)\right] = (L-k)\,\sigma_w^2\,\delta(k) \tag{23}$$

From these, we see that $\tilde{R}_{\bar{x}}(k)$ and $R_{\bar{x}}(k)$ are affected by noise to almost the same degree. As shown in [8], however, $\tilde{R}_{\bar{x}}(k)$ is essentially more accurate than $R_{\bar{x}}(k)$. Therefore, it is expected that (17) provides a good performance of LPC in noisy environments.

5. SIMULATIONS

To verify the performance of the proposed analysis method, we carried out computer simulations. A synthetic vowel /o/ was used, which was generated based on

[Equations (24) and (25): not recoverable from the source text]

with the following parameters: [parameter values missing]; sampling frequency = 10 kHz; pitch period = 8 msec (80 samples). This synthetic vowel was used for speech analysis in [8]. We added white noise to the generated speech data to prepare noisy speech data.

First, we examined the relation between the pitch period and the improvement in SNR accomplished by the proposed noise reduction method. To do this, the pitch period was changed as

$$L = 20, 30, 40, 60, 80, 120 \quad \text{(number of samples)},$$

and speech signals were generated in the manner described above. The result for a frame length of [value missing] is shown in Figure 2. We see that the pitch period and the improvement in SNR are inversely related. Because the number of blocks $M$ is inversely proportional to $L$ for a fixed frame length, this result validates (12).

Next, we used noisy speech data with SNR = 7 dB. Figure 3 shows the power spectra estimated for 10 individual trials on the speech data by the covariance method, the proposed method without pitch extension, and the proposed method with pitch extension, respectively. In all cases, the number of data samples used is [value missing] and the prediction order is [value missing]. The pitch period is set to [value missing]. Figure 3 shows that the proposed method with pitch extension produces the spectral shape closest to the true one. For each estimated spectrum, we evaluated the following spectral estimation error

[Equation (26): not recoverable from the source text]

where $P(\omega)$ is the true power spectrum and $\hat{P}(\omega)$ is its estimate. For the proposed method without pitch extension, the spectral estimation error was [value missing] dB, and for the proposed method with pitch extension it was [value missing] dB. The standard autocorrelation method gave [value missing] dB. These results suggest that pitch synchronous addition alone yields an improvement of 1.21 dB, and that the pitch extension operation yields a further improvement of 3.87 dB.
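As a hedged illustration of the two autocorrelation calculations compared in this paper, the zero-padded form (16) and the pitch-extended form (17), the following Python sketch computes both from a one-period segment and solves for the predictive coefficients with the Levinson-Durbin recursion. It is not the code used for the reported experiments. The function names, the second-order test filter with coefficients `[1.0, -1.2, 0.8]`, the noiseless impulse-train excitation, and the sign convention $A(z) = 1 + \sum_i a_i z^{-i}$ are all assumptions chosen for the example.

```python
def acf_zero_padded(x, max_lag):
    """Autocorrelation at lags 0..max_lag, assuming zeros outside the segment."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def acf_pitch_extended(x, max_lag):
    """Autocorrelation at lags 0..max_lag, assuming the segment repeats periodically."""
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for A(z) = 1 + sum a[i] z^-i.

    Returns the coefficients a (with a[0] = 1), the reflection coefficients,
    and the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, reflection, err


def synth_one_period(a, period, n_periods=20):
    """Last period of the all-pole filter 1/A(z) driven by a unit impulse train."""
    order = len(a) - 1
    total = period * n_periods
    y = [0.0] * total
    for n in range(total):
        u = 1.0 if n % period == 0 else 0.0
        y[n] = u - sum(a[i] * y[n - i] for i in range(1, order + 1) if n - i >= 0)
    return y[-period:]


true_a = [1.0, -1.2, 0.8]   # hypothetical stable second-order model
order = 2
segment = synth_one_period(true_a, period=80)

a_ext, k_ext, _ = levinson_durbin(acf_pitch_extended(segment, order), order)
a_zp, k_zp, _ = levinson_durbin(acf_zero_padded(segment, order), order)

print("true coefficients:          ", true_a)
print("pitch-extended ACF estimate:", [round(v, 4) for v in a_ext])
print("zero-padded ACF estimate:   ", [round(v, 4) for v in a_zp])
print("all |reflection| < 1:       ", all(abs(k) < 1.0 for k in k_ext + k_zp))
```

Both autocorrelation sequences lead to a Toeplitz system, so the same recursion applies to each; a reflection coefficient magnitude below one at every stage is the usual check that the resulting all-pole filter is stable.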



















Fig. 2. Relation between the pitch period and the improvement in SNR for synthetic vowel /o/. (Horizontal axis: pitch period in number of samples, 20 to 120; vertical axis: signal to noise ratio improvement, 0 to 14.)

Fig. 3. Comparison of LPC spectra for synthetic vowel /o/. (Curves: true, proposed (pitch extension), proposed, standard. Horizontal axis: frequency [kHz], 0 to 5; vertical axis: spectral density [dB], −40 to 20.)

6. CONCLUDING REMARKS

For the purpose of pitch synchronous LPC analysis in noisy environments, we have presented a noise reduction method using the technique of pitch synchronous addition. The method enables the subsequent LPC analysis both to obtain more accurate predictive coefficients and to guarantee the stability of the LPC filter. The noise robustness of the LPC analysis that combines the noise reduction method with the pitch extension technique has been confirmed by computer simulations.

7. REFERENCES

[1] J. Markel, "Digital inverse filtering - A new tool for formant trajectory estimation", IEEE Trans. Audio and Electroacoust., Vol. AU-20, No. 2, pp. 129-137, June 1972.

[2] M.R. Sambur and N.S. Jayant, "LPC analysis / synthesis from speech inputs containing quantizing noise or additive white noise", IEEE Trans. Acoust., Speech, and Signal Process., Vol. ASSP-24, No. 6, pp. 488-494, 1976.

[3] J. Tierney, "A study of LPC analysis of speech in additive noise", IEEE Trans. Acoust., Speech, and Signal Process., Vol. ASSP-28, No. 4, pp. 389-397, Aug. 1980.

[4] S.M. Kay, "Noise compensation for autoregressive spectral estimates", IEEE Trans. Acoust., Speech, and Signal Process., Vol. ASSP-28, No. 3, pp. 292-303, 1980.

[5] H. Hu, "Noise compensation for linear prediction via orthogonal transformation", Electronics Letters, Vol. 32, No. 16, pp. 1444-1445, 1996.

[6] T. Shimamura, N. Kunieda and J. Suzuki, "A robust linear prediction method for noisy speech", Proc. IEEE ISCAS, pp. IV257-IV260, 1998.

[7] S. Chandra and W.C. Lin, "Linear prediction with a variable analysis frame size", IEEE Trans. Acoust., Speech, and Signal Process., Vol. ASSP-25, No. 4, pp. 322-330, 1977.

[8] K.K. Paliwal and P.V.S. Rao, "A modified autocorrelation method of linear prediction for pitch-synchronous analysis of voiced speech", Signal Processing, Vol. 3, No. 2, pp. 181-185, 1981.

[9] B.S. Atal and S.L. Hanauer, "Speech analysis and synthesis by linear prediction of the speech wave", J. Acoust. Soc. Am., Vol. 50, No. 2, pp. 637-655, Aug. 1971.
