
Wiener Filtering Application in the Bionic Wavelet Domain for Speech Enhancement

1Mourad Talbi, 2Lotfi Salhi, 3Mouhamed Bennasr, 4Adnane Cherif
1, University of Kairouan, [email protected]
*2, University of Kairouan, [email protected]
3,4 Faculty of Sciences of Tunis, [email protected], [email protected], [email protected]

Abstract

In this paper, a new speech enhancement method is proposed, based on the application of Wiener filtering in the Bionic Wavelet Transform (BWT) domain. The BWT provides time-frequency selectivity and a better energy concentration property. The proposed method is compared with a number of speech enhancement methods, including a BWT-based speech enhancement method, Wiener filtering, and MMSE-STSA. Results are evaluated objectively using signal-to-noise ratio (SNR), segmental signal-to-noise ratio (segSNR), Itakura-Saito distance (ISd) and perceptual evaluation of speech quality (PESQ), with TIMIT sentences corrupted by various noise types at several SNRs. The obtained results show that the proposed method compares well with the other techniques.

Keywords: Bionic Wavelet Transform, Minimum Mean Square Error-Short Time Spectral Amplitude Estimator, Speech Enhancement, Wiener Filtering.

1. Introduction

Enhancing speech corrupted by uncorrelated additive noise is an important problem that has received much attention in the last two decades. This is the result of the growing use of speech processing systems in diverse real environments, where the presence of noise degrades their performance. Such systems include speech recognizers, mobile phones, hearing aids and voice coders. The aim of speech enhancement is to improve the intelligibility and perceptual quality of speech by minimizing the effect of noise. Existing techniques for this task include Wiener filtering [1], spectral subtraction [2, 3] and the wavelet transform (WT) [4, 5]. An emerging tendency in speech enhancement consists of employing a filter bank based on a psychoacoustic model of the human auditory system (critical bands). The principle is that embedding a psychoacoustic model of the human auditory system in the filter bank can improve the intelligibility and perceptual quality of speech. Furthermore, it is well known that the human auditory system can approximately be described as a nonuniform bandpass filter bank, and that humans are able to detect the desired speech in noisy environments without prior knowledge of the noise [6]. Different frequency scales have been proposed to account for the perceptual aspects of hearing (ERB, Bark, Mel and so on). It is worth mentioning that the majority of perceptual speech enhancement techniques are based on the wavelet packet transform [7]. Moreover, the wavelet packet transform has been successfully combined with other denoising approaches in order to improve the performance of speech enhancement techniques, including Wiener filtering [8], adaptive filtering [9], spectral subtraction [10] and the coherence function [11].

In this paper, we propose a new speech enhancement technique which consists in applying Wiener filtering to the noisy bionic wavelet coefficients of each subband. The enhanced speech signal is then obtained by applying the inverse BWT to the filtered bionic wavelet coefficients. The rest of the paper is organized as follows: Section 2 describes the proposed speech enhancement technique. Section 3 deals with the bionic wavelet transform (BWT) and Section 4 with Wiener filtering. Section 5 presents the objective quality measures. Experimental results are presented and discussed in Section 6. Finally, the conclusion is given in Section 7.

International Journal of Advancements in Computing Technology(IJACT) Volume4, Number2, February 2012 doi: 10.4156/ijact.vol4.issue2.19



2. The new proposed approach

In this paper, we propose a new speech enhancement method combining Wiener filtering and the Bionic Wavelet Transform (BWT). As shown in Fig. 1, Wiener filtering is used to reduce the noise in the bionic wavelet coefficients, which are obtained by applying the BWT to the noisy speech signal. The BWT was initially proposed by Yao and Zhang [12, 13] for speech processing. It was implemented by incorporating the active cochlear mechanism into the wavelet transform (WT), resulting in an adaptive time-frequency analysis and a biologically based model [14]. Furthermore, it offers time-frequency selectivity and a better energy concentration property, which can lead to a better separation of signal and noise components within the coefficients [14]. Those characteristics, together with the success of Wiener filtering in the wavelet domain [11], have motivated us to apply Wiener filtering in the BWT domain in order to improve the intelligibility of the enhanced speech. Fig. 1 summarizes the proposed speech enhancement technique.

Figure 1. Block diagram of the proposed speech enhancement technique.

Applying the BWT to the noisy speech signal yields 21 sets of bionic wavelet coefficients, one per subband. Applying Wiener filtering to each subband then yields the 21 sets of filtered bionic wavelet coefficients. The Wiener filtering algorithm is based on a priori SNR (signal-to-noise ratio) estimation [6].

3. Bionic wavelet transform

The bionic wavelet transform (BWT) was initially introduced as an adaptive wavelet transform designed specifically to model the human auditory system [14, 15]. The adaptive nature of the BWT is ensured by replacing the constant quality factor of the wavelet transform with a variable quality factor. The mother wavelet $\varphi(t)$ can be expressed as follows:

$$\varphi(t) = \tilde{\varphi}(t)\, e^{j 2\pi f_0 t} \qquad (1)$$

where $f_0$ and $\tilde{\varphi}(t)$ are respectively the center frequency and the envelope function of $\varphi(t)$. Here $\varphi(t)$ is chosen to be the Morlet wavelet, which is represented in Figure 2. In this case the envelope function $\tilde{\varphi}(t)$ is expressed as follows:

$$\tilde{\varphi}(t) = e^{-(t/T_0)^2} \qquad (2)$$

where $T_0$ is the initial support of the unscaled mother wavelet.

Figure 2. Real and imaginary parts of the Morlet mother wavelet.

Using a time-varying function $T$, the mother function of the BWT is expressed as follows [15]:

$$\varphi_T(t) = \frac{1}{T\sqrt{|a|}}\, \tilde{\varphi}\!\left(\frac{t}{T}\right) e^{j 2\pi f_0 t} \qquad (3)$$

where $a$ is the scale.

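The BWT of a given signal $x(t)$ is defined as follows [12-16]:

$$BWT_x(a, \tau) = \frac{1}{T\sqrt{|a|}} \int x(t)\, \tilde{\varphi}^{*}\!\left(\frac{t - \tau}{aT}\right) e^{-j 2\pi f_0 \frac{t - \tau}{a}}\, dt \qquad (4)$$

where $a$ is the scale, $\tau$ is the time shift and $^{*}$ denotes complex conjugation.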

Hence, the adaptive nature of the BWT is captured by the time-varying factor $T$, which represents the scaling of the quality factor of the cochlear filter bank at each scale over time [12-16]. For the human auditory system, Yao and Zhang [12, 13] have taken $f_0 = 15165.4$ Hz. The discretization of the scale variable $a$ is accomplished using a pre-defined logarithmic spacing across the desired frequency range, so that the center frequency at each scale is expressed as follows [14-16]:

$$f_m = \frac{f_0}{(1.1623)^m}, \quad m = 0, 1, 2, \ldots \qquad (5)$$
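As a quick numerical check of Eq. (5), the short sketch below lists the center frequencies for a range of scale indices. The base frequency and the spacing constant come from the text; the index range 10 to 30 is an assumption chosen because it yields 21 scales whose extreme frequencies are close to the 166.4 Hz and 3369.7 Hz quoted for the experiments. The function and variable names are illustrative only.

```python
# Center frequencies of the BWT scales, f_m = f0 / 1.1623**m (Eq. 5).
F0_HZ = 15165.4        # base frequency of the unscaled mother wavelet (from the text)
SCALE_RATIO = 1.1623   # logarithmic spacing constant (from the text)


def center_frequency(m: int) -> float:
    """Return the center frequency in Hz of scale index m."""
    return F0_HZ / SCALE_RATIO ** m


# Assumed index range: m = 10 ... 30, i.e. 21 scales.
scales = list(range(10, 31))
freqs = [center_frequency(m) for m in scales]

print(len(freqs))            # number of scales
print(round(freqs[0], 1))    # highest center frequency, roughly 3370 Hz
print(round(freqs[-1], 1))   # lowest center frequency, roughly 166 Hz
```

The small differences from the quoted limits (well under 1 Hz) are presumably due to rounding of the spacing constant.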

For this project, coefficients at 21 scales, $m = 10, 11, \ldots, 30$, are computed using numerical integration of the continuous wavelet transform. The 21 scales correspond to center frequencies logarithmically spaced from 166.4 Hz to 3369.7 Hz. For each time and scale, the adapting function $T(a, \tau)$ is calculated using the following equation [14, 15, 16]:

$$T(a, \tau + \Delta\tau) = \left[\left(1 - G_1 \frac{C_s}{C_s + |BWT_x(a, \tau)|}\right)\left(1 + G_2 \left|\frac{\partial}{\partial t} BWT_x(a, \tau)\right|\right)\right]^{-1} \qquad (6)$$

where $G_1$ designates the active gain factor representing the outer hair cell active resistance function, $G_2$ is the active gain factor representing the time-varying compliance of the basilar membrane, $C_s$ is a constant representing the nonlinear saturation effects in the cochlear model, $BWT_x(a, \tau)$ is the BWT at scale $a$ and time $\tau$, and $\Delta\tau$ is the computation time step [13]. With this adaptation, the resolution in the time domain and in the frequency domain can both be increased [13]. In implementation, BWT coefficients can be easily computed from the corresponding coefficients of the continuous wavelet transform (CWT) by:

$$BWT_x(a, \tau) = K(a, \tau) \cdot WT_x(a, \tau) \qquad (7)$$

where $K$ is a factor that depends on $T$ [13]. For the Morlet envelope $\tilde{\varphi}(t) = e^{-(t/T_0)^2}$ employed as the mother function in our experiments, $K(a, \tau)$ has a closed form [13] that combines the Gaussian integral $\int_{-\infty}^{\infty} e^{-t^2}\, dt = \sqrt{\pi} \approx 1.7725$ with a normalization term depending on $T(a, \tau)$ and $T_0$.

In this paper, we employ the same values as in references [13-16]: $G_1 = 0.87$, $G_2 = 45$, $C_s = 0.8$ and $T_0 = 0.0005$. Finally, the computation step $\Delta\tau$ is chosen to be equal to $1/f_s$, where $f_s$ represents the sampling frequency.
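The adaptation rule of Eq. (6) can be sketched as a one-step update, shown below. This is a minimal illustration that assumes the Yao and Zhang form of the rule, with the two gain factors (0.87 and 45) and the saturation constant (0.8) as default arguments. The function name, the sample coefficient values and the finite-difference estimate of the derivative are hypothetical choices made for the example; the 8 kHz sampling rate is the one used for the test data.

```python
def adapt_T(bwt_abs, bwt_slope_abs, g1=0.87, g2=45.0, c_s=0.8):
    """Return T(a, tau + dtau) from |BWT(a, tau)| and |dBWT/dt (a, tau)|."""
    saturation_term = 1.0 - g1 * c_s / (c_s + bwt_abs)   # first factor of Eq. (6)
    compliance_term = 1.0 + g2 * bwt_slope_abs           # second factor of Eq. (6)
    return 1.0 / (saturation_term * compliance_term)


fs = 8000.0        # sampling frequency in Hz (the test data are sampled at 8 kHz)
dtau = 1.0 / fs    # computation step

# Hypothetical coefficient magnitudes at one scale for two consecutive instants.
bwt_prev, bwt_now = 0.10, 0.11
slope = abs(bwt_now - bwt_prev) / dtau   # finite-difference derivative (an assumption)
print(adapt_T(abs(bwt_now), slope))
```

With a vanishing coefficient and zero slope the update reduces to 1 / (1 - 0.87), while for very large coefficients with zero slope it tends to 1, so under these assumptions weak, slowly varying components receive the largest adaptation.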

4. Wiener filtering

Let $y(n)$, $x(n)$ and $v(n)$ be respectively the noisy speech, the clean speech and the additive noise signals. The noisy signal is therefore expressed as:

$$y(n) = x(n) + v(n) \qquad (9)$$

Consider the statistical filtering problem given in Figure 3. The input signal $y(n)$ goes through a linear time-invariant system to produce an output signal $\hat{d}(n)$. The system is to be designed in such a way that the output signal $\hat{d}(n)$ is as close as possible to the desired signal $d(n)$ [17]. This can be done by computing the estimation error $e(n)$ and making it as small as possible. The optimal filter that minimizes the estimation error is called the Wiener filter, named after Norbert Wiener, who first formulated and solved this filtering problem in the continuous domain [17]. One of the constraints placed on the filter is that it is linear, which makes the analysis easy to handle. In principle, the filter could be finite impulse response (FIR) or infinite impulse response (IIR), but FIR filters are often used for the following reasons:

· They are inherently stable.
· The resulting solution is linear and computationally easy to evaluate.

Assuming an FIR system, we have:

$$\hat{d}(n) = \sum_{k=0}^{M-1} h_k\, y(n - k), \quad n = 0, 1, 2, \ldots \qquad (10)$$

where $\{h_k\}$ are the FIR filter coefficients and $M$ is the number of coefficients. We then need to compute the filter coefficients $\{h_k\}$ so that the estimation error $e(n) = d(n) - \hat{d}(n)$ is minimized. The mean square of the estimation error is commonly employed as the minimization criterion, and the optimal filter coefficients can be derived in the time or frequency domain [17].
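To make the time-domain derivation concrete, the sketch below estimates FIR Wiener coefficients by solving the normal (Wiener-Hopf) equations built from sample correlations. It is an illustration of the general technique under simple assumptions (biased correlation estimates, a small dense solver), not the implementation used in this paper; all function names are hypothetical.

```python
import random


def correlate(a, b, lag):
    """Biased sample estimate of E[a(n) * b(n - lag)] for lag >= 0."""
    n = len(a)
    return sum(a[i] * b[i - lag] for i in range(lag, n)) / n


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fir_wiener(y, d, num_taps):
    """Coefficients h_k minimizing the mean square of d(n) - sum_k h_k y(n - k)."""
    r = [correlate(y, y, k) for k in range(num_taps)]   # autocorrelation of the input
    p = [correlate(d, y, k) for k in range(num_taps)]   # cross-correlation desired/input
    R = [[r[abs(i - j)] for j in range(num_taps)] for i in range(num_taps)]
    return solve(R, p)


# Toy check: the desired signal is a known 2-tap filtering of white noise.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [0.5 * y[n] - 0.3 * (y[n - 1] if n > 0 else 0.0) for n in range(len(y))]
h = fir_wiener(y, d, num_taps=3)
print([round(c, 3) for c in h])   # close to [0.5, -0.3, 0.0]
```

The solver assumes a non-singular correlation matrix; for a degenerate input (for example an all-zero signal) it would fail with a division by zero.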

Figure 3. Block diagram of the statistical filtering problem.



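In the frequency domain, the Wiener filter is given by [17]:

$$H(\omega_k) = \frac{P_{xx}(\omega_k)}{P_{xx}(\omega_k) + P_{vv}(\omega_k)} \qquad (11)$$

where $P_{xx}(\omega_k)$ and $P_{vv}(\omega_k)$ are respectively the power spectra of the clean speech and noise signals at frequency $\omega_k$. They are expressed as follows:

$$P_{xx}(\omega_k) = |X(\omega_k)|^2 \qquad (12)$$

$$P_{vv}(\omega_k) = |V(\omega_k)|^2 \qquad (13)$$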

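It can be shown that the corresponding impulse response $h_k$ is not causal [17]; therefore, this Wiener filter is not realizable. By defining the a priori SNR at frequency $\omega_k$ as follows [17]:

$$\xi_k \triangleq \frac{P_{xx}(\omega_k)}{P_{vv}(\omega_k)} \qquad (14)$$

we can also express the Wiener filter as:

$$H(\omega_k) = \frac{\xi_k}{\xi_k + 1} \qquad (15)$$

Note that $0 \le H(\omega_k) \le 1$, with $H(\omega_k) \approx 0$ when $\xi_k \to 0$ (i.e., at extremely low-SNR regions) and $H(\omega_k) \approx 1$ when $\xi_k \to \infty$ (i.e., at extremely high-SNR regions).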

Figure 4. Attenuation curve of the Wiener filter as a function of the a priori SNR (horizontal axis: SNR in dB; vertical axis: filter gain in dB).

Therefore, the Wiener filter emphasizes portions of the spectrum where the SNR is high and attenuates portions of the spectrum where the SNR is low [17]. This is illustrated in Figure 4, which plots $H(\omega_k)$ as a function of $\xi_k$ in dB. Note that for $\xi_k > 10$ dB almost no attenuation is performed, since $H(\omega_k) \approx 1$. The Wiener filter thus attenuates each frequency component in proportion to the estimated SNR ($\xi_k$) at that frequency [17]. In this paper, we have chosen to apply Wiener filtering in the frequency domain. The implementation of the Wiener filtering algorithm is based on a priori SNR estimation [18]. Table 1 gives the parameter values used in the implementation of the Wiener filtering algorithm.
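The behaviour of the gain in Eq. (15) can be checked numerically with the short sketch below. It assumes that the attenuation curve is expressed as 20·log10 of the amplitude gain, which is a common convention but is not stated explicitly in the text; the function names are illustrative.

```python
import math


def wiener_gain(xi: float) -> float:
    """Wiener gain H = xi / (xi + 1) for a linear-scale a priori SNR xi (Eq. 15)."""
    return xi / (xi + 1.0)


def gain_db(xi_db: float) -> float:
    """Gain in dB (20*log10 of the amplitude gain) for an a priori SNR given in dB."""
    xi = 10.0 ** (xi_db / 10.0)
    return 20.0 * math.log10(wiener_gain(xi))


for xi_db in (-20, -10, 0, 10, 20):
    print(f"a priori SNR {xi_db:+4d} dB -> gain {gain_db(xi_db):7.2f} dB")
```

At 0 dB the gain is exactly one half (about -6 dB), and above 10 dB the attenuation is already below 1 dB, which is the sense in which high-SNR components pass almost unchanged.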



Table 1. Parameter values used for Wiener filtering

Parameter                                      Value
Window type                                    Hamming
Frame length                                   256
Frame overlap                                  50%
DFT length                                     256
Smoothing factor in noise spectrum update      0.98
Smoothing factor in a priori SNR update        0.98
VAD threshold                                  0.15

where DFT designates the discrete Fourier transform and VAD the voice activity detection [19].
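The text states only that the gain relies on a priori SNR estimation with a smoothing factor of 0.98. One widely used estimator of this kind is the decision-directed rule, and the sketch below applies it to a single frequency bin. This is an assumption made for illustration, not a confirmed description of the authors' implementation; the function name, the constant noise power and the floor `xi_min` are hypothetical.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame Wiener gains for one frequency bin.

    noisy_power: |Y(m)|^2 for frames m = 0, 1, ...
    noise_power: estimated noise power of the bin (assumed constant here)
    alpha:       smoothing factor of the a priori SNR update
    xi_min:      hypothetical floor that keeps the estimate positive
    """
    gains = []
    prev_clean_power = 0.0
    for p in noisy_power:
        gamma = p / noise_power                            # a posteriori SNR
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))     # decision-directed estimate
        xi = max(xi, xi_min)
        g = xi / (xi + 1.0)                                # Wiener gain for this frame
        gains.append(g)
        prev_clean_power = (g ** 2) * p                    # power of the enhanced bin
    return gains


# A bin dominated by speech versus a bin that contains noise only.
print(decision_directed_gains([100.0] * 5, 1.0))
print(decision_directed_gains([1.0] * 5, 1.0))
```

In the first case the gain climbs toward 1 over a few frames; in the second it stays near the floor, so the bin is strongly attenuated.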

5. Performance evaluation

This section presents the objective measures most commonly used to evaluate speech enhancement techniques.

5.1. Signal-to-noise ratio

The signal-to-noise ratio (SNR) of the enhanced speech signal is defined by:

$$SNR = 10 \cdot \log_{10} \frac{\sum_{n=0}^{N-1} x^2[n]}{\sum_{n=0}^{N-1} (x[n] - \hat{x}[n])^2} \qquad (18)$$

where $x[n]$ and $\hat{x}[n]$ represent respectively the original and enhanced speech signals, and $N$ is the number of samples per signal.

5.2. Segmental signal-to-noise ratio

The segmental signal-to-noise ratio (segSNR) is calculated by averaging the frame-based SNRs over the signal:

$$segSNR = \frac{1}{M} \sum_{m=0}^{M-1} 10 \cdot \log_{10} \frac{\sum_{n=N_m}^{N_m+N-1} x^2[n]}{\sum_{n=N_m}^{N_m+N-1} (x[n] - \hat{x}[n])^2} \qquad (19)$$

where $M$ is the number of frames, $N$ is the frame size, and $N_m$ is the beginning of the $m$-th frame. As the frame SNR can become strongly negative during silence periods, the segSNR values are limited to the range [-10 dB, 35 dB].
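The two measures of Eqs. (18) and (19) translate directly into code. The sketch below is a minimal version under stated assumptions: non-overlapping frames of 256 samples (the frame layout used for evaluation is not specified in the text) and per-frame clipping to [-10, 35] dB. Function names are illustrative.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB between a clean signal and its enhanced version (Eq. 18)."""
    num = sum(x * x for x in clean)
    den = sum((x - y) ** 2 for x, y in zip(clean, enhanced))
    return 10.0 * math.log10(num / den)


def seg_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: mean of the clipped frame SNRs (Eq. 19)."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        x = clean[start:start + frame_len]
        y = enhanced[start:start + frame_len]
        num = sum(v * v for v in x)
        den = sum((v - w) ** 2 for v, w in zip(x, y))
        if num == 0.0:
            value = lo            # silent frame: use the lower limit
        elif den == 0.0:
            value = hi            # perfect reconstruction: use the upper limit
        else:
            value = 10.0 * math.log10(num / den)
        values.append(min(max(value, lo), hi))
    return sum(values) / len(values)


# Toy example: the "enhanced" signal is the clean one scaled by 0.9,
# so the error is 10% of the signal and both measures equal 20 dB.
clean = [math.sin(0.1 * n) for n in range(1024)]
enhanced = [0.9 * v for v in clean]
print(round(snr_db(clean, enhanced), 2), round(seg_snr_db(clean, enhanced), 2))
```

Note that `snr_db` is undefined when the two signals are identical (zero error energy), whereas `seg_snr_db` returns the upper limit in that case.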

5.3. Itakura-Saito distance

The Itakura-Saito distance (ISd) measures spectral changes and can be computed from the linear prediction coefficients (LPC) according to the following equation:

$$ISd(a, b) = \frac{(a - b)^T R\, (a - b)}{a^T R\, a} \qquad (20)$$

where $a$ represents the LPC vector of the original speech signal $x[n]$, $R$ is the autocorrelation matrix, and $b$ is the LPC coefficient vector of the enhanced speech signal $\hat{x}[n]$. In this paper, a 10th-order LPC-based measure is employed.
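A compact way to evaluate Eq. (20) is sketched below. It makes several assumptions that the text does not settle: the LPC vectors are obtained with the Levinson-Durbin recursion and include the leading coefficient 1, the autocorrelation matrix is taken from the original (clean) signal, and the whole signal is treated as a single frame rather than averaging over frames. All names are illustrative.

```python
import random


def autocorr(x, max_lag):
    """Unnormalized autocorrelation values r[0..max_lag]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def lpc(r, order):
    """Levinson-Durbin recursion; returns [1, a_1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # updated prediction error
    return a


def is_distance(clean, enhanced, order=10):
    """Distance of Eq. (20) with R built from the clean signal (an assumption)."""
    r = autocorr(clean, order)
    a = lpc(r, order)
    b = lpc(autocorr(enhanced, order), order)
    diff = [ai - bi for ai, bi in zip(a, b)]

    def quad(v):
        return sum(v[i] * r[abs(i - j)] * v[j]
                   for i in range(order + 1) for j in range(order + 1))

    return quad(diff) / quad(a)


# Toy signals: a first-order autoregressive process and a noisy copy of it.
random.seed(1)
x = [0.0]
for _ in range(1999):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))
noisy = [v + random.gauss(0.0, 1.0) for v in x]
print(is_distance(x, x), is_distance(x, noisy))
```

Because the LPC vector does not depend on the signal level, a purely rescaled copy of the clean signal gives a distance of zero; only changes in spectral shape are penalized.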

5.4. Perceptual evaluation of speech quality

The perceptual evaluation of speech quality (PESQ) algorithm is an objective quality measure standardized as ITU-T Recommendation P.862. It is an objective measurement tool designed to predict the results of a subjective Mean Opinion Score (MOS) test. It has been shown [19] that PESQ is more reliable and correlates better with MOS than the traditional objective speech measures.

6. Experimental results

Five English sentences are used as the original speech signals. They were taken from the TIMIT database and down-sampled to 8 kHz. Noisy data were created by adding various types of noise (pink, tank, car (Volvo), F16 and white) to the original clean sentences at different SNR values (-10, -5, 0, 5 and 10 dB). The performance of the proposed technique (BWT/Wiener) is evaluated using objective measures (SNR, segSNR, ISd and PESQ) and compared with that of Wiener filtering, MMSE-STSA [6], the spectral subtraction method [2, 3] and the method of Johnson [7], which is based on thresholding in the bionic wavelet domain. Tables 2, 3, 4 and 5 report the objective measures obtained for the noisy and enhanced speech signals. The English sentence “She had your dark suit in greasy wash water all year”, produced by a female speaker, was used as the original speech signal.

In terms of SNR, for White noise the values obtained by the proposed technique are generally better than those obtained by the other techniques. For Tank noise, the proposed technique gives the best SNR values of all the techniques. For F16 noise, the proposed technique gives the best results at high input SNR, whereas at low input SNR the best values are those obtained by MMSE-STSA. For Pink noise, the proposed technique gives better results than the three techniques of Johnson, Wiener and spectral subtraction, but the best results are those obtained by MMSE-STSA. For Volvo noise, the proposed technique (BWT/Wiener) outperforms all the reference techniques.

In terms of segSNR, for White noise the values obtained by the proposed technique are better than those of Wiener filtering and spectral subtraction, and better than those of the method of Johnson at higher SNR, while the opposite holds at lower SNR; the best results are those obtained by MMSE-STSA. For Tank noise, the proposed technique gives better values than the three techniques of Johnson, Wiener and spectral subtraction; compared with MMSE-STSA, it is better at higher SNR and worse at lower SNR. For F16 noise, the results of the proposed technique are close to those of MMSE-STSA, especially at high input SNR, although MMSE-STSA remains the best. For Pink noise, the proposed technique gives better results than the three techniques of Johnson, Wiener and spectral subtraction, and the best results are again those obtained by MMSE-STSA. For Volvo noise, the proposed technique outperforms all the reference techniques.

In terms of ISd, the results obtained by the proposed technique are better than those obtained by the three techniques of Johnson, spectral subtraction and Wiener, and the best results are those obtained by MMSE-STSA. For Tank noise, the ISd values obtained by the proposed technique are close to those obtained by Wiener filtering. For F16 noise, the proposed technique gives better results than the method of Johnson; compared with Wiener filtering, it gives better results at high input SNR and worse results at low input SNR, and the best results are those obtained by MMSE-STSA. For Pink noise, the proposed technique gives better results than the two techniques of Johnson and Wiener, and the best results are those obtained by MMSE-STSA.

In terms of PESQ, for White noise the results obtained by the proposed technique are better than those of Wiener filtering and spectral subtraction, and close to those of the method of Johnson and MMSE-STSA. For Tank noise, the PESQ values obtained by the proposed technique are close to those of MMSE-STSA, which gives the best results. For F16 noise, the proposed technique gives better results than the method of Johnson and results close to those of Wiener filtering. For Pink noise, the proposed technique gives better results than the three techniques of Johnson, Wiener and spectral subtraction, and the best results are those obtained by MMSE-STSA.

Table 2. SNR measures (dB) obtained for the noisy and enhanced speech signals

Noise type   Enhancement technique    SNR (dB)

Volvo        Noisy                    -10       -5        0         5         10
             Spectral subtraction     2.17      0.624     4.6503    10.001    14.598
             Wiener                   0.379     4.905     9.8627    14.832    19.744
             MMSE-STSA                7.059     11.54     15.95     20.14     24.3411
             BWT/Wiener               14.37     17.55     21.23     24.938    28.413
             Method of Johnson        8.394     13.43     18.322    22.868    26.677

Pink         Noisy                    -10       -5        0         5         10
             Spectral subtraction     0.87      3.1117    5.4045    9.2123    13.7172
             Wiener                   -0.036    3.400     7.4344    11.292    15.3356
             MMSE-STSA                3.737     6.7824    9.9102    13.202    16.9524
             BWT/Wiener               1.943     5.62      9.0552    12.121    15.956
             Method of Johnson        -4.7      -0.527    3.8906    9.071     13.914

F16          Noisy                    -10       -5        0         5         10
             Spectral subtraction     1.308     3.803     6.0501    8.4944    14.3946
             Wiener                   -1.081    2.935     7.1158    11.067    14.945
             MMSE-STSA                2.486     4.641     7.648     11.558    14.9102
             BWT/Wiener               0.831     4.424     8.2378    11.756    15.312
             Method of Johnson        -4.155    -0.727    3.5079    7.9227    12.6802

Tank         Noisy                    -10       -5        0         5         10
             Spectral subtraction     -0.76     3.2531    6.6771    9.4089    15.0289
             Wiener                   -0.035    3.40      7.4344    11.292    15.336
             MMSE-STSA                3.737     6.783     9.9102    13.202    16.952
             BWT/Wiener               3.242     7.029     10.32     13.674    17.27
             Method of Johnson        -3.31     0.308     5.1971    10.249    15.1886

White        Noisy                    -10       -5        0         5         10
             Spectral subtraction     1.1741    3.607     5.8027    8.6656    14.7456
             Wiener                   -1.204    2.309     6.9521    11.195    14.9349
             MMSE-STSA                2.228     4.835     7.6286    11.727    14.806
             BWT/Wiener               1.233     4.360     8.1902    11.892    15.09
             Method of Johnson        0.942     4.539     7.2743    10.302    13.932



Table 3. Segmental SNR (SSNR, dB) measures obtained for the noisy and enhanced speech signals

Noise type   Enhancement technique    Input SNR (dB)
                                      -10       -5        0         5         10

Volvo        Noisy                    -8.64     -6.357    -3.24     0.482     4.649
             Spectral subtraction     -2.926    -1.1911   1.553     6.7884    10.944
             Wiener                   -1.638    1.7745    5.8323    10.236    14.801
             MMSE-STSA                3.14      6.8111    10.852    15.0076   19.0138
             BWT/Wiener               8.76      11.94     15.37     18.863    22.171
             Method of Johnson        2.667     7.05      11.511    15.786    19.561

Pink         Noisy                    -9.13     -7.007    -3.919    -0.208    4.0156
             Spectral subtraction     -2.074    -0.6279   1.2267    4.355     7.9262
             Wiener                   -3.668    -1.1857   2.0394    5.2878    8.8802
             MMSE-STSA                -0.862    1.7724    4.5174    7.5585    11.3667
             BWT/Wiener               -2.346    0.68      3.4661    6.154     9.666
             Method of Johnson        -6.98     -4.4281   -1.266    3.0032    7.2705

F16          Noisy                    -9.12     -7.106    -4.029    -0.32     3.9142
             Spectral subtraction     -2.604    -0.598    1.542     4.0592    8.5331
             Wiener                   -4.622    -1.755    1.6184    4.9857    8.4033
             MMSE-STSA                -1.701    0.1909    2.7442    5.9628    9.0526
             BWT/Wiener               -3.273    -0.486    2.6598    5.7545    8.975
             Method of Johnson        -6.714    -4.564    0.6305    2.0456    6.2069

Tank         Noisy                    -8.874    -6.834    -3.731    0.0609    4.2483
             Spectral subtraction     -2.074    -0.6279   1.2267    4.355     7.9262
             Wiener                   -3.668    -1.185    2.0394    5.2878    8.8802
             MMSE-STSA                -0.862    1.7724    4.5174    7.5585    11.3667
             BWT/Wiener               -1.468    1.587     4.6864    8.0901    11.763
             Method of Johnson        -6.169    -3.7984   -0.1397   4.1158    8.5201

White        Noisy                    -9.115    -7.154    -4.0772   -0.3634   3.8453
             Spectral subtraction     -3.052    -0.8997   1.1656    4.0274    8.4773
             Wiener                   -4.914    -2.393    1.3221    4.9926    8.3717
             MMSE-STSA                -2.195    0.1662    2.6718    6.0314    8.9511
             BWT/Wiener               -3.04     -0.485    2.7248    5.9176    8.845
             Method of Johnson        -2.973    -0.079    2.3039    4.8648    7.97



Table 4. ISd measures obtained for the noisy and enhanced speech signals

Noise type   Enhancement technique    ISd (one column per input SNR level)

Volvo        Noisy                    4.649     0.114     0.101     0.085     0.057
             Spectral subtraction     10.944    0.0941    0.11      0.096     0.0255
             Wiener                   14.801    0.085     0.054     0.0212    0.0044
             MMSE-STSA                19.0138   0.039     0.009     0.03      0.002
             BWT/Wiener               22.171    0.026     0.019     0.015     0.013
             Method of Johnson        19.561    0.049     0.024     0.0150    0.0132

Pink         Noisy                    4.0156    1.066     0.901     0.6849    0.466
             Spectral subtraction     7.9262    0.6611    0.359     0.1589    0.0423
             Wiener                   8.8802    0.737     0.489     0.3045    0.1392
             MMSE-STSA                7.5585    11.3667   0.343     0.1711    0.0726
             BWT/Wiener               6.154     9.666     0.7       0.466     0.2578
             Method of Johnson        7.2705    2.687     1.0361    0.3886    0.1201

F16          Noisy                    3.9142    1.637     1.451     1.1214    0.7667
             Spectral subtraction     8.5331    0.9081    0.546     0.3426    0.1592
             Wiener                   8.4033    1.316     0.877     0.5614    0.3213
             MMSE-STSA                9.0526    0.870     0.609     0.3318    0.0834
             BWT/Wiener               8.975     1.375     0.974     0.599     0.2633
             Method of Johnson        .2069     6.899     2.316     1.8102    0.2909

Tank         Noisy                    0.636     0.432     0.2579    0.1107    0.0288
             Spectral subtraction     9.738     0.349     0.138     0.0252    0.0035
             Wiener                   0.7366    0.489     0.3045    0.1392    0.0244
             MMSE-STSA                0.343     0.172     0.0726    0.0157    0.0027
             BWT/Wiener               11.763    0.927     0.59      0.2648    0.0937
             Method of Johnson        0.715     0.217     0.0702    0.023     0.0141

White        Noisy                    3.8453    5.117     3.581     2.3538    1.4782
             Spectral subtraction     8.4773    2.585     1.42      0.7650    0.4093
             Wiener                   8.3717    2.558     1.523     0.876     0.5219
             MMSE-STSA                8.9511    1.8717    0.997     0.5258    0.2763
             BWT/Wiener               8.845     2.282     1.165     0.6899    0.3598
             Method of Johnson        7.97      8.144     5.296     1.1026    0.3897



Table 5. PESQ measures obtained for the noisy and enhanced speech signals

Noise type   Enhancement technique    PESQ (one column per input SNR level)

Volvo        Noisy                    2.408     2.781     3.140     3.563     3.9127
             Spectral subtraction     2.049     2.3985    2.631     3.1871    3.6435
             Wiener                   2.791     3.1901    3.5316    3.8058    4.0974
             MMSE-STSA                3.237     3.5343    3.8273    4.0552    4.2091
             BWT/Wiener               3.174     3.517     3.768     4.023     4.1942
             Method of Johnson        2.458     2.815     3.1706    3.5031    3.7976

Pink         Noisy                    0.943     1.17      1.4828    1.857     2.2763
             Spectral subtraction     1.209     1.5519    1.9177    2.45      2.9806
             Wiener                   1.244     1.562     2.0552    2.4909    2.9118
             MMSE-STSA                1.843     2.269     2.6982    3.0575    3.4124
             BWT/Wiener               1.273     1.696     2.124     2.529     2.9122
             Method of Johnson        1.081     1.416     1.8       2.1873    2.5668

F16          Noisy                    0.985     1.687     1.514     1.8699    2.2635
             Spectral subtraction     0.985     1.687     1.514     1.8699    2.2635
             Wiener                   1.1271    1.5625    1.9211    2.3711    2.9402
             MMSE-STSA                1.3989    1.739     2.236     2.691     3.1062
             BWT/Wiener               1.284     1.216     2.046     2.3850    2.7757
             Method of Johnson        1.186     1.518     1.8102    2.1273    2.4995

Tank         Noisy                    8.874     1.549     1.938     2.3271    2.7040
             Spectral subtraction     -8.874    1.549     1.938     2.3271    2.7040
             Wiener                   -0.76     1.595     1.9119    2.2865    2.5648
             MMSE-STSA                1.843     2.2688    2.6982    3.0575    3.4124
             BWT/Wiener               1.749     2.105     2.5119    2.9453    3.3823
             Method of Johnson        1.2629    1.632     1.9927    2.3592    2.7149

White        Noisy                    0.999     1.1391    1.3238    1.6003    1.9741
             Spectral subtraction     1.0317    1.2988    1.6577    2.1848    2.7852
             Wiener                   1.132     1.361     1.7777    2.2006    2.6485
             MMSE-STSA                1.1842    1.4024    1.9351    2.5481    2.9590
             BWT/Wiener               1.037     1.4693    1.8887    2.2798    2.7107
             Method of Johnson        0.963     1.566     2.048     2.3022    2.6596

Figures 5 and 6 show some examples of speech enhancement using the proposed technique. Those figures show clearly that the proposed technique reduces the noise efficiently while introducing little distortion in the speech signal.

Figure 5. An example of speech enhancement by the proposed method: speech signal corrupted by car noise with SNR = -10 dB.

Figure 6. Two examples of speech enhancement by the proposed method: (a) speech signal corrupted by Tank noise with SNR = 5 dB, (b) speech signal corrupted by Pink noise with SNR = 5 dB.

Figure 7 presents the spectrograms of the clean speech signal, the noisy signal and the enhanced signals for the case of car noise.

Figure 7. Spectrograms (frequency in kHz versus time in seconds) of (a) the clean speech signal, (b) speech corrupted with car noise at -10 dB SNR, and speech enhanced by (c) the Wiener filtering method, (d) the method of Johnson, (e) the MMSE-STSA estimator and (f) our proposed technique (BWT/Wiener filtering).



7. Conclusion

In this paper, we propose a new speech enhancement technique combining the bionic wavelet transform and Wiener filtering. The obtained results show that the proposed technique outperforms several popular techniques in many of the tested conditions. The noise is efficiently removed with little distortion, and the information in the enhanced speech signal is preserved, especially at input SNRs of 5 and 10 dB.

8. References

[1] J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech”, In Proceedings of the IEEE, pp.1586-1604, 1979.
[2] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise”, In ICASSP IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, pp.208-211, 1979.
[3] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Trans. Signal Processing, 27(2), pp.113-120, 1979.
[4] M. Bahoura and J. Rouat, “Wavelet speech enhancement based on time-scale adaptation”, Speech Communication, vol.48, no.12, pp.1620-1637, 2006.
[5] Sattar B. Sadkhan, Nidaa A. Abbas, “Proposed Simulation of Modulation Identification Based On Wavelet Transform”, International Journal of Advancements in Computing Technology, vol.1, no.1, 2009.
[6] H. Taşmaz and E. Erçelebi, “Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments”, Digital Signal Processing, vol.18, no.5, pp.797-812, 2008.
[7] M. T. Johnson, X. Yuan, and Y. Ren, “Speech signal enhancement through adaptive wavelet thresholding”, Speech Communication, vol.49, no.2, pp.123-133, 2007.
[8] D. Mahmoudi, “A microphone array for speech enhancement using multiresolution wavelet transform”, In Proc. of Eurospeech'97, pp.339-342, 1997.
[9] C. H. Yang, J. C. Wang, J. F. Wang, H. P. Lee, C. H. Wu, and K. H. Chang, “Multiband subspace tracking speech enhancement for in-car human computer speech interaction”, Journal of Information Science and Engineering, vol.22, no.5, pp.1093-1107, 2006.
[10] Y. Shao and C. H. Chang, “A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system”, IEEE Transactions on Systems, Man, and Cybernetics Part B: Cybernetics, vol.37, no.4, pp.877-889, 2007.
[11] J. Sika and V. Davidek, “Multi-channel noise reduction using wavelet filterbank”, In EuroSpeech'97, pp.2595-2598, 1997.
[12] J. Yao and Y. T. Zhang, “Bionic wavelet transform: A new time-frequency method based on an auditory model”, IEEE Transactions on Biomedical Engineering, vol.48, no.8, pp.856-863, 2001.
[13] J. Yao and Y. T. Zhang, “The application of bionic wavelet transform to speech signal processing in cochlear implants using neural network simulations”, IEEE Transactions on Biomedical Engineering, vol.49, no.11, pp.1299-1309, 2002.
[14] X. Yuan, “Auditory Model-Based Bionic Wavelet Transform for Speech Enhancement”, Master's thesis, Marquette University, Milwaukee, WI, USA, 2003.
[15] O. Sayadi and M. B. Shamsollahi, “Multiadaptive Bionic Wavelet Transform: Application to ECG Denoising and Baseline Wandering Reduction”, EURASIP Journal of Applied Signal Processing, pp.11, 2007.
[16] Talbi Mourad, Salhi Lotfi, Abid Sabeur, Cherif Adnane, “Recurrent Neural Network and Bionic Wavelet Transform for speech enhancement”, Int. J. Signal and Imaging Systems Engineering, vol.3, no.2, pp.93-101, 2010.
[17] Philipos C. Loizou, “Speech Enhancement Theory and Practice”, Taylor & Francis, USA, 2007.
[18] P. Scalart and J. Filho, “Speech enhancement based on a priori signal to noise estimation”, In Proc. IEEE Int. Conf. Acoust. Speech, Signal Processing, pp.629-632, 1996.
[19] Urmila Shrawanka, “Voice Activity Detector and Noise Trackers for Speech Recognition System in Noisy Environment”, International Journal of Advancements in Computing Technology, vol.2, no.4, 2010.
[20] E. Zavarehei, S. Vaseghi, and Q. Yan, “Inter-frame modeling of DFT trajectories of speech and noise for speech enhancement using Kalman filters”, Speech Communication, vol.48, no.11, pp.1545-1555, 2006.
