Wiener Filtering Application in the Bionic Wavelet Domain for Speech Enhancement
1 Mourad Talbi, 2 Lotfi Salhi, 3 Mouhamed Bennasr, 4 Adnane Cherif
1 University of Kairouan, [email protected]
*2 University of Kairouan, [email protected]
3, 4 Faculty of Sciences of Tunis, [email protected], [email protected], [email protected]
Abstract
In this paper, a new speech enhancement method is proposed, based on the application of Wiener filtering in the Bionic Wavelet Transform (BWT) domain. The BWT provides time-frequency selectivity and a better energy concentration property. The proposed method is compared to a number of speech enhancement methods, including a BWT-based speech enhancement method, Wiener filtering, and MMSE-STSA. Results are evaluated objectively using the signal-to-noise ratio (SNR), segmental signal-to-noise ratio (segSNR), Itakura-Saito distance (ISd) and perceptual evaluation of speech quality (PESQ), with TIMIT sentences corrupted by various noise types at several SNRs. The obtained results show that the proposed method compares well with the other techniques.
Keywords: Bionic Wavelet Transform, Minimum Mean Square Error-Short Time Spectral Amplitude Estimator, Speech Enhancement, Wiener Filtering.
1. Introduction
Speech enhancement in the presence of uncorrelated additive noise is an important problem that has received much attention in the last two decades. This is the result of the rising use of speech processing systems in diverse real environments, where the presence of noise degrades their performance. Those systems include speech recognition, mobile phones, hearing aids, and voice coders. The aim of speech enhancement is to improve the intelligibility and perceptual quality of speech by minimizing the effect of noise. Existing techniques for this task include Wiener filtering [1], spectral subtraction [2, 3], the wavelet transform (WT) [4, 5], etc. An emerging trend in speech enhancement consists of employing a filter bank based on a specific psychoacoustic model of the human auditory system (critical bands). The principle is that embedding a psychoacoustic model of the human auditory system in the filter bank can improve the intelligibility and perceptual quality of speech. Furthermore, it is well known that the human auditory system can be approximately described as a nonuniform bandpass filter bank, and that humans are able to detect the desired speech in noisy environments without prior knowledge of the noise [6]. Different frequency scales have been proposed to account for this perceptual aspect of hearing (ERB, Bark, Mel and so on). It is worth mentioning that the majority of perceptual speech enhancement techniques are based on the wavelet packet transform [7]. Moreover, the wavelet packet transform has been successfully combined with other denoising approaches in order to improve the performance of speech enhancement techniques, including Wiener filtering [8], adaptive filtering [9], spectral subtraction [10] and the coherence function [11].
In this paper, we propose a new speech enhancement technique which consists of applying Wiener filtering to each of the noisy bionic wavelet coefficients. The enhanced speech signal is then obtained by applying the inverse BWT to the filtered bionic wavelet coefficients. The rest of the paper is organized as follows: Section 2 gives an overview of the proposed speech enhancement technique. Section 3 deals with the bionic wavelet transform (BWT) and Section 4 with Wiener filtering. Section 5 presents the objective quality measurement techniques. Experimental results are presented and discussed in Section 6. Finally, the conclusion is given in Section 7.
International Journal of Advancements in Computing Technology (IJACT), Volume 4, Number 2, February 2012, doi: 10.4156/ijact.vol4.issue2.19
2. The new proposed approach
In this paper, we propose a new speech enhancement method combining Wiener filtering and the Bionic Wavelet Transform (BWT). As shown in Fig. 1, Wiener filtering is used to reduce the noise in the bionic wavelet coefficients obtained by applying the BWT to the noisy speech signal. The BWT was initially proposed by Yao and Zhang [12, 13] for speech processing. It was implemented by incorporating the active cochlear mechanism into the wavelet transform (WT), resulting in an adaptive time-frequency analysis based on a biological model [14]. Furthermore, it offers time-frequency selectivity and a better energy concentration property, which can lead to a better separation of the signal and noise components within the coefficients [14]. Those characteristics, and the success of Wiener filtering in the wavelet domain [11], have motivated us to apply Wiener filtering in the BWT domain in order to improve the intelligibility of the enhanced speech. Fig. 1 summarizes the proposed speech enhancement technique.
Figure 1. Block diagram of the proposed speech enhancement technique.
… are the 21 bionic wavelet coefficients of the noisy speech signal and … are the 21 filtered bionic wavelet coefficients obtained after applying Wiener filtering to each subband. The Wiener filtering algorithm is based on a priori SNR (signal-to-noise ratio) estimation [6].
3. Bionic wavelet transform
The bionic wavelet transform (BWT) was initially introduced as an adaptive wavelet transform conceived especially to model the human auditory system [14, 15]. The adaptive nature of the BWT is ensured by replacing the constant quality factor of the wavelet transform with a variable quality factor. The mother wavelet $\varphi(t)$ can be expressed as follows:

$\varphi(t) = \tilde{\varphi}(t)\, e^{j 2\pi f_0 t}$ (1)

where $f_0$ and $\tilde{\varphi}(t)$ are respectively the center frequency and the envelope function of $\varphi(t)$. The mother wavelet is chosen to be the Morlet wavelet, which is represented in Figure 2. In this case, the envelope function is expressed as follows:

$\tilde{\varphi}(t) = e^{-(t/T_0)^2}$ (2)

where $T_0$ is the initial support of the unscaled mother wavelet.
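Figure 2. Real and imaginary parts of the Morlet mother wavelet.

Using a time-varying function $T$, the mother function of the BWT is expressed as follows [15]:

$\varphi_T(t) = \frac{1}{T}\, \tilde{\varphi}\!\left(\frac{t}{T}\right) e^{j 2\pi f_0 t}$ (3)

The BWT of a given signal $x(t)$ is defined as follows [12-16]:

$BWT_x(a,\tau) = \frac{1}{\sqrt{|a|}} \int x(t)\, \varphi_T^{*}\!\left(\frac{t-\tau}{a}\right) dt = \frac{1}{T\sqrt{|a|}} \int x(t)\, \tilde{\varphi}^{*}\!\left(\frac{t-\tau}{aT}\right) e^{-j 2\pi f_0 \frac{t-\tau}{a}}\, dt$ (4)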
Hence, the adaptive nature of the BWT is captured by the time-varying factor $T$, which represents the scaling of the cochlear filter bank quality factor at each scale over time [12-16]. For the human auditory system, Yao and Zhang [12, 13] have taken $f_0 = 15165.4$ Hz. The discretization of the scale variable $a$ is accomplished using a pre-defined logarithmic spacing across the desired frequency range, so that the center frequency at each scale is expressed as follows [14-16]:

$f_m = \frac{f_0}{(1.1623)^m}, \quad m = 0, 1, 2, \ldots$ (5)
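The logarithmic spacing of Eq. (5) can be checked numerically. The sketch below is only an illustration: the step ratio 1.1623 and the index range m = 10 to 30 are assumptions taken from the BWT literature, chosen because they reproduce 21 center frequencies between roughly 166 Hz and 3370 Hz, and the names `F0_HZ`, `RATIO` and `center_frequencies` are hypothetical.

```python
import numpy as np

F0_HZ = 15165.4   # base frequency quoted in the text
RATIO = 1.1623    # assumed logarithmic step between adjacent scales

def center_frequencies(m_first=10, m_last=30):
    """Return f_m = F0_HZ / RATIO**m for m = m_first..m_last (inclusive)."""
    m = np.arange(m_first, m_last + 1)
    return F0_HZ / RATIO ** m

freqs = center_frequencies()
# Number of scales, highest and lowest center frequency in Hz.
print(len(freqs), round(freqs[0], 1), round(freqs[-1], 1))
```

The frequencies decrease as the scale index grows, so the first entry is the highest band and the last one the lowest.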
For this project, coefficients at 21 scales, $m = 10, 11, \ldots, 30$, are computed using numerical integration of the continuous wavelet transform. The 21 scales correspond to center frequencies logarithmically spaced from 166.4 Hz to 3369.7 Hz. For each time and scale, the adapting function $T(a,\tau)$ is calculated using the following equation [14, 15, 16]:

$T(a, \tau + \Delta\tau) = \left[\left(1 - G_1 \frac{C_s}{C_s + |BWT_x(a,\tau)|}\right)\left(1 + G_2 \left|\frac{\partial}{\partial t} BWT_x(a,\tau)\right|\right)\right]^{-1}$ (6)
where $G_1$ designates the active gain factor representing the outer hair cell resistance function, $G_2$ is the active gain factor representing the time-varying compliance of the basilar membrane, $C_s$ is a constant representing the nonlinear saturation effects in the cochlear model, $BWT_x(a,\tau)$ is the BWT at scale $a$ and time $\tau$, and $\Delta\tau$ is the computation time step [13]. By adjusting $G_1$ and $G_2$, the resolutions in the time domain and frequency domain can be increased, respectively [13]. In implementation, BWT coefficients can be easily computed from the corresponding coefficients of the Continuous Wavelet Transform (CWT), $WT_x(a,\tau)$, by:

$BWT_x(a,\tau) = K(a,\tau) \cdot WT_x(a,\tau)$ (7)
where $K$ is a factor that depends on $T$ [13]. For the Morlet wavelet $\tilde{\varphi}(t) = e^{-(t/T_0)^2}$, which is also employed as the mother function in our experiments, $K$ is expressed by an integral form that is roughly equal to:

$K(a,\tau) \approx \frac{1.7725}{\sqrt{(T(a,\tau)/T_0)^2 + 1}}$ (8)
In this paper, we employ the same values as in references [13-16]: $G_1 = 0.87$, $G_2 = 45$, $C_s = 0.8$ and $T_0 = 0.0005$. Finally, the computation step $\Delta\tau$ is chosen to be equal to $1/f_s$, where $f_s$ represents the sampling frequency.
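As an illustration of the adaptation rule in Eq. (6), the following minimal sketch computes one update of the time-varying factor. It assumes the usual form of the rule in the BWT literature (a saturation term driven by the coefficient magnitude and a compliance term driven by its time derivative); the function name `update_T` and the symbol names are hypothetical, and only the numerical values 0.87, 45 and 0.8 come from the text.

```python
import numpy as np

# Parameter values quoted in the text; the symbol names are assumptions.
G1, G2, CS = 0.87, 45.0, 0.8

def update_T(bwt_prev, bwt_prev_derivative, g1=G1, g2=G2, cs=CS):
    """One step of the adaptation rule: returns T(a, tau + dtau)."""
    # Saturation term: close to (1 - g1) for weak coefficients, close to 1 for strong ones.
    saturation = 1.0 - g1 * cs / (cs + np.abs(bwt_prev))
    # Compliance term: grows with the rate of change of the coefficient.
    compliance = 1.0 + g2 * np.abs(bwt_prev_derivative)
    return 1.0 / (saturation * compliance)

T_silence = update_T(0.0, 0.0)   # weak, stationary coefficient
T_loud = update_T(1e6, 0.0)      # very strong, stationary coefficient
```

Under these assumptions the factor is largest for weak, slowly varying coefficients and approaches 1 for strong ones, which is the adaptive behavior the section describes.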
4. Wiener filtering
Let $y(n)$, $x(n)$ and $d(n)$ be respectively the noisy speech, the clean speech and the additive noise signals. The signal $y(n)$ is then expressed as:

$y(n) = x(n) + d(n)$ (9)

Consider the statistical filtering problem given in Fig. 3. The input signal $y(n)$ goes through a linear and time-invariant system to produce an output signal $\hat{x}(n)$. The system must be designed in such a way that the output signal $\hat{x}(n)$ is as close as possible to the desired signal $x(n)$ [17]. This can be done by computing the estimation error $e(n)$ and making it as small as possible. The optimal filter that minimizes the estimation error is called the Wiener filter, named after Norbert Wiener [17], who first formulated and solved this filtering problem in the continuous domain. It should be noted that one of the constraints placed on the filter is that it is linear, which makes the analysis easy to handle. In principle, the filter could have a finite impulse response (FIR) or an infinite impulse response (IIR), but FIR filters are often used for the following reasons:
· They are inherently stable.
· The resulting solution is linear and computationally easy to evaluate.
Assuming an FIR system, we have:

$\hat{x}(n) = \sum_{k=0}^{M-1} h_k\, y(n-k), \quad n = 0, 1, 2, \ldots$ (10)

where $\{h_k\}$ are the FIR filter coefficients and $M$ is the number of coefficients. We then need to compute the filter coefficients $\{h_k\}$ so that the estimation error $e(n) = x(n) - \hat{x}(n)$ is minimized. The mean square of the estimation error is commonly employed as the criterion for minimization, and the optimal filter coefficients can be derived in the time or frequency domain [17].
Figure 3. Block diagram of the statistical filtering problem.
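The time-domain design described above can be sketched as a least-squares problem: the FIR coefficients are chosen to minimize the mean square error between the filtered noisy signal and the desired signal. This is only a toy illustration, not the implementation used in the paper. It assumes the clean signal is available for the design, which is not the case in practice, and the function name `fir_wiener` and the test signal are hypothetical.

```python
import numpy as np

def fir_wiener(noisy, desired, num_taps):
    """Least-squares estimate of the FIR coefficients h_k minimising the
    mean square error between sum_k h_k * noisy[n-k] and desired[n]."""
    n = len(noisy)
    # Column k holds the noisy signal delayed by k samples (zeros before n = 0).
    delayed = np.column_stack(
        [np.concatenate([np.zeros(k), noisy[: n - k]]) for k in range(num_taps)]
    )
    coeffs, *_ = np.linalg.lstsq(delayed, desired, rcond=None)
    return coeffs, delayed @ coeffs

rng = np.random.default_rng(0)
t = np.arange(2000)
clean = np.sin(2 * np.pi * 0.01 * t)                 # toy stand-in for speech
noisy = clean + 0.5 * rng.standard_normal(t.size)    # additive white noise
h, estimate = fir_wiener(noisy, clean, num_taps=16)

mse_noisy = np.mean((clean - noisy) ** 2)
mse_filtered = np.mean((clean - estimate) ** 2)
```

Because the identity filter is one of the candidate solutions, the filtered mean square error cannot exceed that of the unprocessed noisy signal on the design data.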
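In the frequency domain, the Wiener filter is given by [17]:

$H(\omega_k) = \frac{P_{xx}(\omega_k)}{P_{xx}(\omega_k) + P_{dd}(\omega_k)}$ (11)

where $P_{xx}(\omega_k)$ and $P_{dd}(\omega_k)$ are respectively the power spectra of the clean speech and noise signals. They are expressed as follows:

$P_{xx}(\omega_k) = |X(\omega_k)|^2$ (12)
$P_{dd}(\omega_k) = |D(\omega_k)|^2$ (13)

The corresponding impulse response $h_k$ is not causal [17]; therefore, the Wiener filter is not realizable in this form. The a priori SNR at frequency $\omega_k$ is defined as follows [17]:

$\xi_k \triangleq \frac{P_{xx}(\omega_k)}{P_{dd}(\omega_k)}$ (14)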
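We can also express the Wiener filter as:

$H(\omega_k) = \frac{\xi_k}{\xi_k + 1}$ (15)

Note that $0 \le H(\omega_k) \le 1$, with $H(\omega_k) \approx 0$ when $\xi_k \to 0$ (i.e., at extremely low-SNR regions) and $H(\omega_k) \approx 1$ when $\xi_k \to \infty$ (i.e., at extremely high-SNR regions).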
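The gain expression in terms of the a priori SNR is easy to evaluate numerically. The short sketch below computes the gain in dB over a range of a priori SNR values; the function name `wiener_gain` and the chosen SNR grid are illustrative assumptions.

```python
import numpy as np

def wiener_gain(xi):
    """Wiener gain H = xi / (xi + 1) for a linear-scale a priori SNR xi."""
    xi = np.asarray(xi, dtype=float)
    return xi / (xi + 1.0)

snr_db = np.arange(-20, 21, 5)                        # a priori SNR axis in dB
gain_db = 20 * np.log10(wiener_gain(10 ** (snr_db / 10)))
```

The gain is never positive in dB and increases monotonically with the a priori SNR, so low-SNR components are attenuated strongly while high-SNR components pass almost unchanged.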
Figure 4. Attenuation curve of the Wiener filter (gain in dB) as a function of the a priori SNR (dB).

Therefore, the Wiener filter emphasizes portions of the spectrum where the SNR is high and attenuates portions of the spectrum where the SNR is low [17]. This is illustrated in Figure 4, which plots $H(\omega_k)$ as a function of $\xi_k$ in dB. Note that for $\xi_k > 10$ dB, almost no attenuation is performed since $H(\omega_k) \approx 1$. The Wiener filter thus attenuates each frequency component according to the estimated SNR ($\xi_k$) at that frequency [17]. In this paper, we have chosen to apply the Wiener filtering in the frequency domain. The implementation of the Wiener filtering algorithm is based on a priori SNR estimation [18]. Table 1 gives the parameter values used in the Wiener filtering algorithm implementation.
Table 1. Parameter values used for Wiener filtering

Parameter                                     Value
Window type                                   Hamming
Frame length                                  256
Frame overlap                                 50%
DFT length                                    256
Smoothing factor in noise spectrum update     0.98
Smoothing factor in a priori SNR update       0.98
VAD threshold                                 0.15

where DFT designates the discrete Fourier transform and VAD the voice activity detection [19].
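To show how a smoothing factor such as 0.98 can enter the a priori SNR update, here is a minimal per-frame sketch. It assumes the widely used decision-directed rule; the paper only states that a priori SNR estimation [18] is used, so this should not be read as the authors' exact algorithm. The function name `wiener_frame` and the example spectral powers are hypothetical.

```python
import numpy as np

ALPHA = 0.98  # smoothing factor for the a priori SNR update (value from the table)

def wiener_frame(noisy_power, noise_power, prev_clean_power, alpha=ALPHA):
    """Process one frame of spectral powers; returns (gain, clean_power).

    Decision-directed rule (assumed here):
      xi = alpha * prev_clean_power / noise_power
           + (1 - alpha) * max(noisy_power / noise_power - 1, 0)
    """
    gamma = noisy_power / noise_power                       # a posteriori SNR
    xi = (alpha * prev_clean_power / noise_power
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))   # a priori SNR
    gain = xi / (xi + 1.0)                                  # Wiener gain
    return gain, (gain ** 2) * noisy_power

noise = np.full(4, 0.1)                    # hypothetical noise power per bin
frame = np.array([0.1, 0.2, 1.0, 10.0])    # hypothetical |Y|^2 values per bin
gain, clean_power = wiener_frame(frame, noise, prev_clean_power=np.zeros(4))
```

In a full system the returned clean power estimate would be fed back as `prev_clean_power` for the next frame, and the noise power would be updated during frames that the VAD labels as non-speech.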
5. Performance evaluation
This section presents the objective measures most commonly used to evaluate speech enhancement techniques.
5.1. Signal-to-noise ratio
The signal-to-noise ratio (SNR) of the enhanced speech signal is defined by:

$SNR = 10 \cdot \log_{10} \frac{\sum_{n=0}^{N-1} x^2[n]}{\sum_{n=0}^{N-1} (x[n] - \hat{x}[n])^2}$ (18)

where $x[n]$ and $\hat{x}[n]$ represent respectively the original and enhanced speech signals, and $N$ is the number of samples per signal.
5.2. Segmental signal-to-noise ratio
The segmental signal-to-noise ratio (segSNR) is calculated by averaging the frame-based SNRs over the signal:

$segSNR = \frac{1}{M} \sum_{m=0}^{M-1} 10 \cdot \log_{10} \frac{\sum_{n=N_m}^{N_m+N-1} x^2[n]}{\sum_{n=N_m}^{N_m+N-1} (x[n] - \hat{x}[n])^2}$ (19)

where $M$ is the number of frames, $N$ is the frame size, and $N_m$ is the beginning of the m-th frame. As the frame SNR can become strongly negative during silence periods, the segSNR values are limited to the range [-10 dB, 35 dB].
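The two measures above can be sketched in a few lines. The code below is an illustration under stated assumptions: frames are non-overlapping, the frame length of 256 samples is borrowed from the Wiener filtering settings rather than stated for the metric, and a tiny constant is added to avoid dividing by zero in silent frames. The function names are hypothetical.

```python
import numpy as np

def snr_db(clean, enhanced):
    """Global SNR in dB between a clean signal and its enhanced version."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - enhanced) ** 2))

def seg_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Mean of per-frame SNRs, each clipped to the range [lo, hi] dB."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    values = []
    for m in range(len(clean) // frame_len):
        seg = slice(m * frame_len, (m + 1) * frame_len)
        num = np.sum(clean[seg] ** 2)
        den = np.sum((clean[seg] - enhanced[seg]) ** 2)
        # eps keeps the ratio finite when a frame is silent or error-free.
        frame_snr = 10 * np.log10((num + eps) / (den + eps))
        values.append(np.clip(frame_snr, lo, hi))
    return float(np.mean(values))

clean = np.sin(2 * np.pi * 0.01 * np.arange(1024))
scaled = 0.9 * clean   # toy "enhanced" signal with a 10 percent amplitude error
```

With this toy pair, the error is one tenth of the signal amplitude in every frame, so both measures evaluate to about 20 dB.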
5.3. Itakura-Saito distance
The Itakura-Saito distance (ISd) measures spectral changes and can be computed from the linear prediction coefficients (LPC) according to the following equation:

$ISd(a, b) = \frac{(a - b)^T R\, (a - b)}{a^T R\, a}$ (20)

where $a$ represents the LPC vector of the original speech signal $x[n]$, $R$ is the autocorrelation matrix, and $b$ is the LPC vector of the enhanced speech signal $\hat{x}[n]$. In this paper, a 10th-order LPC-based measure is employed.
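A possible way to evaluate an LPC-based Itakura-Saito distance of this form is sketched below. Several choices are assumptions, not statements from the paper: the LPC vectors are obtained with the autocorrelation method, the autocorrelation matrix is taken from the original signal, and the whole signal is treated as one frame. All function names are hypothetical.

```python
import numpy as np

def autocorr(signal, order):
    """Biased autocorrelation r[0..order] of a 1-D signal."""
    n = len(signal)
    return np.array(
        [np.dot(signal[: n - k], signal[k:]) for k in range(order + 1)]
    ) / n

def toeplitz_from(r, size):
    """Symmetric Toeplitz matrix built from the first `size` lags of r."""
    return np.array([[r[abs(i - j)] for j in range(size)] for i in range(size)])

def lpc_vector(signal, order=10):
    """LPC vector [1, -a_1, ..., -a_p] from the autocorrelation normal equations."""
    r = autocorr(signal, order)
    predictor = np.linalg.solve(toeplitz_from(r, order), r[1:])
    return np.concatenate([[1.0], -predictor])

def itakura_saito(clean, enhanced, order=10):
    """(a - b)^T R (a - b) / (a^T R a), with R taken from the clean signal."""
    a = lpc_vector(clean, order)
    b = lpc_vector(enhanced, order)
    big_r = toeplitz_from(autocorr(clean, order), order + 1)
    diff = a - b
    return float(diff @ big_r @ diff) / float(a @ big_r @ a)

rng = np.random.default_rng(1)
clean = np.convolve(rng.standard_normal(4000), [1.0, 0.8, 0.4], mode="same")
noisy = clean + 0.3 * rng.standard_normal(4000)
d_same = itakura_saito(clean, clean)
d_noisy = itakura_saito(clean, noisy)
```

The distance is zero when both signals share the same LPC vector and grows as their spectral envelopes diverge; a frame-by-frame average would be a natural extension for real speech.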
5.4. Perceptual evaluation of speech quality
The perceptual evaluation of speech quality (PESQ) algorithm is an objective quality measure standardized as ITU-T Recommendation P.862. It is an objective measurement tool conceived to predict the results of a subjective Mean Opinion Score (MOS) test. It has been shown [19] that PESQ is more reliable and better correlated with MOS than the traditional objective speech measures.
6. Experimental results
Five English sentences are used as the original speech signals. They were taken from the TIMIT database and down-sampled to 8 kHz. Noisy data were created by adding various types of noise (pink, tank, car, F16 and white) at different SNR values (-10, -5, 0, 5 and 10 dB) to the original clean sentences. The performance of the proposed technique (BWT/Wiener) is evaluated using objective measures (SNR, segSNR, ISd and PESQ) and compared to that obtained by Wiener filtering, MMSE-STSA [6], the spectral subtraction method [2, 3] and the method of Johnson [7], which is based on thresholding in the bionic wavelet domain. Tables 2, 3, 4 and 5 report the objective measures obtained for the noisy and enhanced speech signals. The English sentence “She had your dark suit in greasy wash water all year”, produced by a female speaker, was used as the original speech signal.
In terms of SNR, the results are as follows. In the case of white noise, the SNR values obtained by the proposed technique are generally better than those obtained by the other techniques. In the case of Tank noise, the SNR values obtained by the proposed technique are the best compared to those obtained by the other techniques. In the case of F16 noise, the proposed technique gives the best results, especially for high values of the input SNR; for low values of the input SNR, the best SNR values are those obtained by MMSE-STSA. In the case of Pink noise, the proposed technique gives better results than the three techniques of Johnson, Wiener and spectral subtraction, while the best results are those obtained by MMSE-STSA. The obtained SNR values also show that the proposed technique (BWT/Wiener) outperforms all the reference techniques in the case of Volvo noise.
In terms of segmental SNR (SSNR), the results obtained in the case of white noise show that the values obtained by the proposed technique are better than those obtained by Wiener filtering and spectral subtraction; they are better than those obtained by the technique of Johnson when the input SNR is high, and worse when it is low. The best results are those obtained by MMSE-STSA. In the case of Tank noise, the SSNR values obtained by the proposed technique are better than those obtained by the three techniques of Johnson, Wiener and spectral subtraction; compared to MMSE-STSA, the proposed technique is better when the input SNR is high and worse when it is low. In the case of F16 noise, the SSNR values obtained by the proposed technique are close to those obtained by MMSE-STSA, especially for high values of the input SNR, although MMSE-STSA gives the best results. In the case of Pink noise, the proposed technique gives better SSNR results than the three techniques of Johnson, Wiener and spectral subtraction, and the best results are again those obtained by MMSE-STSA. The SSNR results also show that the proposed technique (BWT/Wiener) outperforms all the reference techniques in the case of Volvo noise.
In terms of ISd, the results obtained by the proposed technique are better than those obtained by the three techniques of Johnson, spectral subtraction and Wiener, and the best results are those obtained by MMSE-STSA. In the case of Tank noise, the ISd values obtained by the proposed technique are close to those obtained by the Wiener technique.
In the case of F16 noise, the proposed technique gives better ISd results than the technique of Johnson. Compared to the Wiener technique, the proposed technique gives better results for high values of the input SNR and worse results for low values. The best results are those obtained by MMSE-STSA. In the case of Pink noise, the proposed technique gives better ISd results than the two techniques of Johnson and Wiener, and the best results are those obtained by MMSE-STSA.
In terms of PESQ, in the case of White noise, the results obtained by the proposed technique are better than those obtained by Wiener filtering and spectral subtraction, and are close to those obtained by the techniques of Johnson and MMSE-STSA. In the case of Tank noise, the PESQ values obtained by the proposed technique are close to those obtained by MMSE-STSA, which gives the best results. In the case of F16 noise, the proposed technique gives better PESQ results than the technique of Johnson, and its results are close to those of the Wiener technique. In the case of Pink noise, the proposed technique gives better PESQ results than the three techniques of Johnson, Wiener and spectral subtraction, and the best results are those obtained by MMSE-STSA.

Table 2. SNR measures obtained for noisy and enhanced speech signal
Noise type   Enhancement technique    SNR (dB)
             Noisy                    -10       -5        0         5         10

Volvo        Spectral subtraction     2.17      0.624     4.6503    10.001    14.598
             Wiener                   0.379     4.905     9.8627    14.832    19.744
             MMSE-STSA                7.059     11.54     15.95     20.14     24.3411
             BWT/Wiener               14.37     17.55     21.23     24.938    28.413
             Method of Johnson        8.394     13.43     18.322    22.868    26.677

Pink         Spectral subtraction     0.87      3.1117    5.4045    9.2123    13.7172
             Wiener                   -0.036    3.400     7.4344    11.292    15.3356
             MMSE-STSA                3.737     6.7824    9.9102    13.202    16.9524
             BWT/Wiener               1.943     5.62      9.0552    12.121    15.956
             Method of Johnson        -4.7      -0.527    3.8906    9.071     13.914

F16          Spectral subtraction     1.308     3.803     6.0501    8.4944    14.3946
             Wiener                   -1.081    2.935     7.1158    11.067    14.945
             MMSE-STSA                2.486     4.641     7.648     11.558    14.9102
             BWT/Wiener               0.831     4.424     8.2378    11.756    15.312
             Method of Johnson        -4.155    -0.727    3.5079    7.9227    12.6802

Tank         Spectral subtraction     -0.76     3.2531    6.6771    9.4089    15.0289
             Wiener                   -0.035    3.40      7.4344    11.292    15.336
             MMSE-STSA                3.737     6.783     9.9102    13.202    16.952
             BWT/Wiener               3.242     7.029     10.32     13.674    17.27
             Method of Johnson        -3.31     0.308     5.1971    10.249    15.1886

White        Spectral subtraction     1.1741    3.607     5.8027    8.6656    14.7456
             Wiener                   -1.204    2.309     6.9521    11.195    14.9349
             MMSE-STSA                2.228     4.835     7.6286    11.727    14.806
             BWT/Wiener               1.233     4.360     8.1902    11.892    15.09
             Method of Johnson        0.942     4.539     7.2743    10.302    13.932
Table 3. SSNR measures obtained for noisy and enhanced speech signal

Noise type   Enhancement technique    SSNR (dB), one column per input SNR level (-10, -5, 0, 5, 10 dB)

Volvo        Noisy                    -8.64     -6.357    -3.24     0.482     4.649
             Spectral subtraction     -2.926    -1.1911   1.553     6.7884    10.944
             Wiener                   -1.638    1.7745    5.8323    10.236    14.801
             MMSE-STSA                3.14      6.8111    10.852    15.0076   19.0138
             BWT/Wiener               8.76      11.94     15.37     18.863    22.171
             Method of Johnson        2.667     7.05      11.511    15.786    19.561

Pink         Noisy                    -9.13     -7.007    -3.919    -0.208    4.0156
             Spectral subtraction     -2.074    -0.6279   1.2267    4.355     7.9262
             Wiener                   -3.668    -1.1857   2.0394    5.2878    8.8802
             MMSE-STSA                -0.862    1.7724    4.5174    7.5585    11.3667
             BWT/Wiener               -2.346    0.68      3.4661    6.154     9.666
             Method of Johnson        -6.98     -4.4281   -1.266    3.0032    7.2705

F16          Noisy                    -9.12     -7.106    -4.029    -0.32     3.9142
             Spectral subtraction     -2.604    -0.598    1.542     4.0592    8.5331
             Wiener                   -4.622    -1.755    1.6184    4.9857    8.4033
             MMSE-STSA                -1.701    0.1909    2.7442    5.9628    9.0526
             BWT/Wiener               -3.273    -0.486    2.6598    5.7545    8.975
             Method of Johnson        -6.714    -4.564    0.6305    2.0456    6.2069

Tank         Noisy                    -8.874    -6.834    -3.731    0.0609    4.2483
             Spectral subtraction     -2.074    -0.6279   1.2267    4.355     7.9262
             Wiener                   -3.668    -1.185    2.0394    5.2878    8.8802
             MMSE-STSA                -0.862    1.7724    4.5174    7.5585    11.3667
             BWT/Wiener               -1.468    1.587     4.6864    8.0901    11.763
             Method of Johnson        -6.169    -3.7984   -0.1397   4.1158    8.5201

White        Noisy                    -9.115    -7.154    -4.0772   -0.3634   3.8453
             Spectral subtraction     -3.052    -0.8997   1.1656    4.0274    8.4773
             Wiener                   -4.914    -2.393    1.3221    4.9926    8.3717
             MMSE-STSA                -2.195    0.1662    2.6718    6.0314    8.9511
             BWT/Wiener               -3.04     -0.485    2.7248    5.9176    8.845
             Method of Johnson        -2.973    -0.079    2.3039    4.8648    7.97
Table 4. ISd measures obtained for noisy and enhanced speech signal Noise type Volvo
Pink
F16
Tank
White
Enhancement technique
ISd
Noisy
4.649
0.114
0.101
0.085
0.057
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
10.944 14.801 19.0138 22.171 19.561
0.0941 0.085 0.039 0.026 0.049
0.11 0.054 0.009 0.019 0.024
0.096 0.0212 0.03 0.015 0.0150
0.0255 0.0044 0.002 0.013 0.0132
0.466
Noisy
4.0156
1.066
0.901
0.6849
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
7.9262
0.6611
0.359
0.1589
0.0423
8.8802 7.5585 6.154 7.2705
0.737 11.3667 9.666 2.687
0.489 0.343 0.7 1.0361
0.3045 0.1711 0.466 0.3886
0.1392 0.0726 0.2578 0.1201
Noisy
3.9142
1.637
1.451
1.1214
0.7667
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
8.5331 8.4033 9.0526 8.975 .2069
0.9081 1.316 0.870 1.375 6.899
0.546 0.877 0.609 0.974 2.316
0.3426 0.5614 0.3318 0.599 1.8102
0.1592 0.3213 0.0834 0.2633 0.2909
Noisy
0.636
0.432
0.2579
0.1107
0.0288
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
9.738
0.349
0.138
0.0252
0.0035
0.7366 0.343 11.763 0.715
0.489 0.172 0.927 0.217
0.3045 0.0726 0.59 0.0702
0.1392 0.0157 0.2648 0.023
0.0244 0.0027 0.0937 0.0141
Noisy
3.8453
5.117
3.581
2.3538
1.4782
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
8.4773
2.585
1.42
0.7650
0.4093
8.3717 8.9511 8.845 7.97
2.558 1.8717 2.282 8.144
1.523 0.997 1.165 5.296
0.876 0.5258 0.6899 1.1026
0.5219 0.2763 0.3598 0.3897
Table 5. PESQ measures obtained for noisy and enhanced speech signal Noise type Volvo
Pink
F16
Tank
White
Enhancement technique
PESQ
Noisy
. 2.408
2.781
3.140
3.563
3.9127
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
2.049 2.791
2.3985 3.1901
3.237
3.5343
3.174 2.458
3.517 2.815
2.631 3.5316 3.8273 3.768 3.1706
3.1871 3.8058 4.0552 4.023 3.5031
3.6435 4.0974 4.2091 4.1942 3.7976
Noisy
0.943
1.17
1.4828
1.857
2.2763
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
1.209 1.244 1.843 1.273 1.081
1.5519 1.562 2.269 1.696 1.416
1.9177 2.0552 2.6982 2.124 1.8
2.45 2.4909 3.0575 2.529 2.1873
2.9806 2.9118 3.4124 2.9122 2.5668
Noisy
0.985
1.687
1.514
1.8699
2.2635
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
0.985
1.687
1.514
1.8699
2.2635
1.1271 1.3989 1.284 1.186
1.5625 1.739 1.216 1.518
1.9211 2.236 2.046 1.8102
2.3711 2.691 2.3850 2.1273
2.9402 3.1062 2.7757 2.4995
Noisy
8.874
1.549
1.938
2.3271
2.7040
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
-8.874
1.549
1.938
2.3271
2.7040
-0.76 1.843 1.749 1.2629
1.595 2.2688 2.105 1.632
1.9119 2.6982 2.5119 1.9927
2.2865 3.0575 2.9453 2.3592
2.5648 3.4124 3.3823 2.7149
Noisy
0.999
1.1391
1.3238
1.6003
1.9741
Spectral subtraction Wiener MMSE-STSA BWT/Wiener Method of Johnson
1.0317 1.132 1.1842 1.037 0.963
1.2988 1.361 1.4024 1.4693 1.566
1.6577 1.7777 1.9351 1.8887 2.048
2.1848 2.2006 2.5481 2.2798 2.3022
2.7852 2.6485 2.9590 2.7107 2.6596
Figures 5 and 6 show some examples of speech enhancement using the proposed technique. These figures show clearly that the proposed technique efficiently reduces the noise while introducing little distortion in the speech signal. Figures 8-12 present the spectrograms of examples of clean speech signals, noisy signals and enhanced signals for five types of noise (White, Pink, F16, Tank and Volvo).
Figure 5. An example of speech enhancement by the proposed method: speech signal corrupted by car noise with SNR = -10 dB.
Figure 6. Two examples of speech enhancement by the proposed method: (a) speech signal corrupted by Tank noise with SNR = 5 dB, (b) speech signal corrupted by Pink noise with SNR = 5 dB.

Figure 7 presents the spectrograms of a clean speech signal, the corresponding noisy signal and the enhanced signals for the case of car noise.
Figure 7. Spectrograms (time in seconds, frequency in kHz) of (a) clean speech signal, (b) speech corrupted with car noise at -10 dB SNR, and speech enhanced by employing (c) the Wiener filtering method, (d) the method of Johnson, (e) the MMSE-STSA estimator, and (f) our proposed technique (BWT/Wiener filtering).
7. Conclusion
In this paper, we have proposed a new speech enhancement technique combining the bionic wavelet transform and Wiener filtering. The obtained results show that the proposed technique outperforms most of the popular techniques considered. The noise is efficiently removed while introducing little distortion and preserving the information in the enhanced speech signal, especially at input SNRs of 5 and 10 dB.
8. References
[1] J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech”, Proceedings of the IEEE, pp.1586-1604, 1979.
[2] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise”, In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.208-211, 1979.
[3] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol.27, no.2, pp.113-120, 1979.
[4] M. Bahoura and J. Rouat, “Wavelet speech enhancement based on time-scale adaptation”, Speech Communication, vol.48, no.12, pp.1620-1637, 2006.
[5] Sattar B. Sadkhan and Nidaa A. Abbas, “Proposed Simulation of Modulation Identification Based On Wavelet Transform”, International Journal of Advancements in Computing Technology, vol.1, no.1, 2009.
[6] H. Taşmaz and E. Erçelebi, “Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments”, Digital Signal Processing, vol.18, no.5, pp.797-812, 2008.
[7] M. T. Johnson, X. Yuan, and Y. Ren, “Speech signal enhancement through adaptive wavelet thresholding”, Speech Communication, vol.49, no.2, pp.123-133, 2007.
[8] D. Mahmoudi, “A microphone array for speech enhancement using multiresolution wavelet transform”, In Proc. of Eurospeech'97, pp.339-342, 1997.
[9] C. H. Yang, J. C. Wang, J. F. Wang, H. P. Lee, C. H. Wu, and K. H. Chang, “Multiband subspace tracking speech enhancement for in-car human computer speech interaction”, Journal of Information Science and Engineering, vol.22, no.5, pp.1093-1107, 2006.
[10] Y. Shao and C. H. Chang, “A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol.37, no.4, pp.877-889, 2007.
[11] J. Sika and V. Davidek, “Multi-channel noise reduction using wavelet filterbank”, In Proc. of Eurospeech'97, pp.2595-2598, 1997.
[12] J. Yao and Y. T. Zhang, “Bionic wavelet transform: A new time-frequency method based on an auditory model”, IEEE Transactions on Biomedical Engineering, vol.48, no.8, pp.856-863, 2001.
[13] J. Yao and Y. T. Zhang, “The application of bionic wavelet transform to speech signal processing in cochlear implants using neural network simulations”, IEEE Transactions on Biomedical Engineering, vol.49, no.11, pp.1299-1309, 2002.
[14] X. Yuan, “Auditory Model-Based Bionic Wavelet Transform for Speech Enhancement”, Master's thesis, Marquette University, Milwaukee, WI, USA, 2003.
[15] O. Sayadi and M. B. Shamsollahi, “Multiadaptive Bionic Wavelet Transform: Application to ECG Denoising and Baseline Wandering Reduction”, EURASIP Journal of Applied Signal Processing, pp.11, 2007.
[16] M. Talbi, L. Salhi, S. Abid, and A. Cherif, “Recurrent Neural Network and Bionic Wavelet Transform for speech enhancement”, Int. J. Signal and Imaging Systems Engineering, vol.3, no.2, pp.93-101, 2010.
[17] Philipos C. Loizou, “Speech Enhancement Theory and Practice”, Taylor & Francis, USA, 2007.
[18] P. Scalart and J. Filho, “Speech enhancement based on a priori signal to noise estimation”, In Proc. IEEE Int. Conf. Acoust. Speech, Signal Processing, pp.629-632, 1996.
[19] Urmila Shrawanka, “Voice Activity Detector and Noise Trackers for Speech Recognition System in Noisy Environment”, International Journal of Advancements in Computing Technology, vol.2, no.4, 2010.
[20] E. Zavarehei, S. Vaseghi, and Q. Yan, “Inter-frame modeling of DFT trajectories of speech and noise for speech enhancement using Kalman filters”, Speech Communication, vol.48, no.11, pp.1545-1555, 2006.