Speech Enhancement in Short-Wave Channel Based on ICA in Empirical Mode Decomposition Domain

Li-Ran Shen1, Xue-Yao Li1, Qing-Bo Yin1,2, and Hui-Qiang Wang1

1 College of Computer Science and Technology, Harbin Engineering University, No.145 Nantong Street, Nangang District, Harbin, China
2 Div. of Electronic, Computer and Telecommunication Engineering, Pukyong National University, Busan, Korea

Abstract. It is well known that non-stationary noise is the most difficult to remove in speech enhancement. This paper proposes a novel speech enhancement algorithm that applies empirical mode decomposition (EMD) followed by independent component analysis (ICA) to suppress non-stationary noise. The noisy speech is decomposed into components in the EMD and ICA-based vector space, and the components are processed and reconstructed separately for voiced and unvoiced speech. The method requires neither noise whitening nor prior calculation of the SNR. Experiments show that the proposed method performs well in suppressing non-stationary noise in the short-wave channel.

J. Rosca et al. (Eds.): ICA 2006, LNCS 3889, pp. 708–713, 2006. © Springer-Verlag Berlin Heidelberg 2006

1 Introduction

Short-wave channel communication is subject to a great deal of interfering noise from the surrounding environment, the transmission medium, electronic communication devices, and other speakers, as is common in most practical situations. In general, additive noise reduces intelligibility and degrades the performance of digital voice processors used for applications such as speech compression and recognition. The problem of removing the uncorrelated noise component from a noisy speech signal, i.e., speech enhancement, has therefore received considerable attention. In speech communication over the short-wave channel, the aim is to improve the objective quality of the speech signal and the intelligibility of noisy speech in order to reduce listener fatigue.

There have been numerous studies on the enhancement of noisy speech, and many different types of speech enhancement algorithms have been proposed and tested [1–4, 6]. Spectral subtraction is a traditional method of speech enhancement [6]; its major drawback is the residual musical noise. A further drawback of speech enhancement methods in general is distortion of the useful signal, so any solution is a compromise between signal distortion and residual noise. Although this problem is well known, study results indicate that the two cannot be minimized simultaneously. Minimum mean square error (MMSE) [3] estimators of the speech spectrum have also been proposed. Ephraim and Van Trees proposed a signal-subspace-based spectral-domain algorithm that keeps the energy of the residual noise below a threshold while minimizing the signal distortion, so that the probability of perceiving the noise is minimized. The drawback of this method is that it deals only with white noise.

EMD is a recently developed time–frequency analysis technique that is of particular interest for non-stationary signals such as water-wave, sonar, and seismic signals [7–10]. In this paper we use EMD to produce an observed signal matrix from a single short-wave channel signal. ICA then decomposes the observed signal matrix into a signal subspace and a noise subspace, and the speech signal is reconstructed from the signal subspace to achieve speech enhancement.

2 Empirical Mode Decomposition

The empirical mode decomposition (EMD) was first introduced by Huang et al. [8]. The principle of this technique is to adaptively decompose a given signal $x(t)$ into oscillating components. These components are called intrinsic mode functions (IMFs) and are obtained from the signal by means of an algorithm called sifting. It is a fully data-driven method.

The algorithm to create IMFs is elegant and simple. First, the local extrema of the time series $X(t)$ are identified, and all the local maxima are connected by a cubic spline $U_X(t)$, known as the upper envelope of the data set. The procedure is then repeated for the local minima to produce the lower envelope $L_X(t)$. Their mean $m_1(t)$ is given by

$$m_1(t) = \frac{L_X(t) + U_X(t)}{2} \tag{1}$$

This is a running mean. Note that, by construction, the two envelopes should enclose all the data between them. Subtracting the running mean $m_1(t)$ from the original data $X(t)$ gives the first component $h_1(t)$:

$$h_1(t) = X(t) - m_1(t) \tag{2}$$
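One sifting step, Eqs. (1) and (2), can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, NumPy is assumed, and the envelopes are built with linear interpolation (`np.interp`) as a stand-in for the cubic splines described above. Both envelopes are also pinned to the end samples, which is one simple, assumed way of handling the boundaries.

```python
import numpy as np

def local_extrema(x):
    """Return index arrays of the interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def envelope_mean(x):
    """Eq. (1): mean of the upper and lower envelopes.

    Assumption: linear interpolation replaces the cubic spline of the text.
    """
    t = np.arange(len(x))
    maxima, minima = local_extrema(x)
    # Pin both envelopes to the end samples so they span the whole record.
    up_idx = np.concatenate(([0], maxima, [len(x) - 1]))
    lo_idx = np.concatenate(([0], minima, [len(x) - 1]))
    upper = np.interp(t, up_idx, x[up_idx])
    lower = np.interp(t, lo_idx, x[lo_idx])
    return 0.5 * (lower + upper)

def sift_once(x):
    """Eq. (2): subtract the running mean from the data."""
    return x - envelope_mean(x)
```

Applied to a sinusoid riding on a constant offset, a single step removes most of the offset, since the envelope mean tracks it.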

To check whether $h_1(t)$ is an IMF, we require the following conditions:
(i) $h_1(t)$ should be free of riding waves, i.e., the first component should not display undershoots or overshoots riding on the data and producing local extrema without zero crossings.
(ii) The upper and lower envelopes should be symmetric with respect to zero.
(iii) Consequently, the number of zero crossings and the number of extrema should be the same.
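Condition (iii) is straightforward to check numerically. The sketch below counts zero crossings and local extrema and accepts a candidate when the two counts differ by at most `slack`; the default of one follows the usual IMF definition in [8], and the function names are illustrative rather than taken from the paper.

```python
import numpy as np

def count_zero_crossings(h):
    """Number of sign changes in h, ignoring samples that are exactly zero."""
    s = np.sign(h)
    s = s[s != 0]
    return int(np.sum(s[:-1] != s[1:]))

def count_extrema(h):
    """Number of local maxima plus minima, ignoring flat segments."""
    d = np.sign(np.diff(h))
    d = d[d != 0]
    return int(np.sum(d[:-1] != d[1:]))

def counts_match(h, slack=1):
    """Condition (iii): zero-crossing and extremum counts agree within slack."""
    return abs(count_zero_crossings(h) - count_extrema(h)) <= slack
```

For a pure sinusoid the two counts agree to within one, whereas a constant offset that lifts the wave above zero removes the zero crossings and the check fails.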

The sifting process has to be repeated as many times as required to reduce the extracted signal to an IMF. In the subsequent sifting steps, $h_1(t)$ is treated as the data:

$$h_{11}(t) = h_1(t) - m_{11}(t) \tag{3}$$

If the function $h_{11}(t)$ does not satisfy criteria (i)–(iii), the sifting process is repeated up to $k$ times, giving $h_{1k}(t)$, until some acceptable tolerance is reached:

$$h_{1k}(t) = h_{1(k-1)}(t) - m_{1k}(t) \tag{4}$$

The resulting time series is the first IMF, designated $C_1(t) = h_{1k}(t)$. The first IMF component contains the highest oscillation frequencies found in the original data $X(t)$. Subtracting the first IMF from the original data gives the residue $r_1(t)$:

$$r_1(t) = X(t) - C_1(t) \tag{5}$$

The residue $r_1(t)$ is then treated as the original data, and the sifting process is applied to it again. The process of finding further intrinsic modes $C_j(t)$ continues until the last mode is found. The final residue is either a constant or a monotonic function; in the latter case it represents the general trend of the data. The original data can then be written as

$$X(t) = \sum_{j=1}^{n} C_j(t) + r_n(t) \tag{6}$$

Thus the data are decomposed into $n$ empirical IMF modes plus a residue $r_n(t)$, which can be either the mean trend or a constant. Note that this method does not require a mean or zero reference; it needs only the locations of the local extrema.
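The complete procedure, Eqs. (3)-(6), can be put together as below. The sketch rests on several assumptions that the text leaves open: the stopping rule is a normalized squared difference between successive sifting results with threshold `sd_tol` (a common choice in the EMD literature), the envelopes use linear interpolation instead of cubic splines, and `max_imfs` and `max_sift` are hypothetical safety limits.

```python
import numpy as np

def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if x has too few extrema.

    Assumption: linear interpolation stands in for the cubic spline.
    """
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) + len(minima) < 3:
        return None  # (nearly) monotonic: treat as the final residue
    t = np.arange(len(x))
    ends = [0, len(x) - 1]
    up = np.sort(np.concatenate((ends, maxima)))
    lo = np.sort(np.concatenate((ends, minima)))
    return 0.5 * (np.interp(t, up, x[up]) + np.interp(t, lo, x[lo]))

def emd(x, max_imfs=10, sd_tol=0.25, max_sift=50):
    """Return (imfs, residue) such that x = imfs.sum(axis=0) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:
            break
        h = residue.copy()
        for _ in range(max_sift):
            m = envelope_mean(h)
            if m is None:
                break
            h_new = h - m  # Eqs. (2)-(4)
            # Assumed stopping rule: relative change between iterations.
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:
                break
        imfs.append(h)         # C_j(t)
        residue = residue - h  # Eq. (5), applied to the current residue
    return np.array(imfs), residue
```

By construction, `imfs.sum(axis=0) + residue` reproduces the input up to rounding, which is Eq. (6).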

3 Speech Enhancement Based on ICA

3.1 Infomax Algorithm

The Infomax algorithm maximizes the network entropy, which is defined in terms of the mutual information between the input $X$ and the output $Y$:

$$I(Y, X) = H(Y) - H(Y \mid X) \tag{7}$$



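Because the mapping from $X$ to $Y$ is deterministic, $H(Y \mid X)$ does not depend on the weight matrix $W$, and therefore

$$\frac{\partial I(Y, X)}{\partial W} = \frac{\partial H(Y)}{\partial W} \tag{8}$$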

If the relationship between the input and the output is $Y = \tanh(WX)$, then

$$\Delta W \propto [W^T]^{-1} - 2\tanh(WX)\,X^T \tag{9}$$
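As a worked step (a standard derivation that is not spelled out in the text, assuming $W$ is square and invertible and the nonlinearity acts element-wise), Eq. (9) follows from writing the output entropy in terms of the Jacobian $J$ of the mapping from $X$ to $Y$:

$$H(Y) = H(X) + E\left[\ln\left|\det J\right|\right], \qquad \det J = \det W \prod_i \left(1 - y_i^2\right),$$

since $dy_i/du_i = 1 - y_i^2$ for $y_i = \tanh(u_i)$ with $u = WX$. Only the second term of $H(Y)$ depends on $W$. For a single sample (dropping the expectation),

$$\frac{\partial}{\partial W}\ln\left|\det W\right| = [W^T]^{-1}, \qquad \frac{\partial}{\partial W}\sum_i \ln\left(1 - y_i^2\right) = -2\tanh(WX)\,X^T,$$

and the sum of these two gradients is the right-hand side of Eq. (9).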

To speed up convergence and simplify the computation, the right-hand side of Eq. (9) is multiplied on the right by $W^T W$:

$$\Delta W \propto \left[I - 2\tanh(WX)(WX)^T\right] W \tag{10}$$

Equation (10) is the natural gradient form of the update, and it can be used to solve the optimization problem in Eq. (9).
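A batch version of the update in Eq. (10) might look like the following sketch. The learning rate `lr`, the iteration count `n_iter`, the identity initialization, the mean removal, and the replacement of the expectation by a sample average are all assumptions made for illustration; the paper does not state these details.

```python
import numpy as np

def infomax_ica(X, lr=0.05, n_iter=1000):
    """Natural-gradient Infomax with a tanh nonlinearity, Eq. (10).

    X has shape (n_channels, n_samples). Returns the unmixing matrix W.
    """
    X = X - X.mean(axis=1, keepdims=True)  # assumed preprocessing
    n, n_samples = X.shape
    W = np.eye(n)                          # assumed initialization
    for _ in range(n_iter):
        U = W @ X
        Y = np.tanh(U)
        # Sample average of tanh(WX)(WX)^T replaces the expectation.
        delta = (np.eye(n) - 2.0 * (Y @ U.T) / n_samples) @ W
        W = W + lr * delta
    return W
```

With the tanh nonlinearity this rule suits super-Gaussian sources such as speech. On a mixture of two Laplacian sources, the product of the learned `W` and the mixing matrix should approach a scaled permutation matrix.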

3.2 De-noising Method

First, the EMD method is used to form the observed data $C_i(t)$. ICA then decomposes the observed space $C_i(t)$ into a signal subspace and a noise subspace, and the speech signal is reconstructed from the signal subspace.
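The reconstruction step can be sketched as a back-projection of the retained independent components. The text does not say how the components belonging to the signal subspace are chosen, so `signal_idx` below is a hypothetical input supplied by the caller. The function name, the use of the inverse of `W` as the mixing matrix, and the summation over the reconstructed IMF rows (motivated by Eq. (6), with the residue ignored) are likewise assumptions.

```python
import numpy as np

def reconstruct_from_subspace(C, W, signal_idx):
    """Rebuild a signal from selected independent components.

    C: (n_imfs, n_samples) matrix whose rows are the IMFs C_i(t).
    W: (n_imfs, n_imfs) unmixing matrix estimated by ICA.
    signal_idx: indices of the components assigned to the signal subspace
                (hypothetical; the selection rule is not given in the text).
    """
    Y = W @ C                      # independent components
    A = np.linalg.inv(W)           # estimated mixing matrix
    keep = np.zeros(W.shape[0], dtype=bool)
    keep[list(signal_idx)] = True
    C_hat = A[:, keep] @ Y[keep, :]  # IMFs rebuilt from the signal subspace
    return C_hat.sum(axis=0)         # sum of IMFs, as in Eq. (6)
```

Keeping every component returns the plain sum of the IMFs, so any noise reduction comes entirely from which indices are left out.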

4 Experiments and Analysis

4.1 The Source of Experiment Data

The experiment data come from two sources. The first is a standard noise database and speech database, used to test the proposed method with different noise types and different SNRs. The second consists of real short-wave speech signals recorded on the spot.

4.2 The Results of Experiments and Analysis

Using the noise database NOISEX92, different types of noise were added to the same clean speech signal. The SNR is 5 dB. The enhanced SNR is shown in Table 1. Because the noise types differ, the enhancement effect differs as well. Table 1 shows that the proposed method can effectively reduce pink and white noise.

The second database comes from real on-the-spot recordings, which include many languages such as Chinese, English, Japanese, and Russian. The recordings cover 8 frequency bands; each signal is 5 minutes long, and the sample rate is 11025 Hz. The estimated SNR is 0 dB. Two methods are tested: the proposed method and classical spectral subtraction (SS).
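For the first experiment, mixing a noise recording into clean speech at a prescribed SNR, and measuring the SNR of a result against the clean reference, can be sketched as follows. The function names are illustrative, and the power-ratio definition of SNR is assumed, since the paper does not give its formula.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale noise so that clean + noise has the requested SNR in dB."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def snr_db(clean, estimate):
    """SNR in dB of an estimate, treating (estimate - clean) as the noise."""
    err = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))
```

The same `snr_db` function applies to an enhanced signal, provided it is time-aligned with the clean reference.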


Table 1. Enhanced SNR (dB) for different noise types at different input SNRs

Input SNR (dB)   Pink   Factory   Airforce   Babble   White
     –5           3.1     1.7       1.3       –1.2      5.2
      0           9.2     5.4       4.5        2.5     12.7
      5          15.2    11.9      11.2        8.2     18.0

Table 2. Test results for real short-wave speech signals

Frequency band   Proposed method   SS
      1                5.2         3.1
      2                8.6         6.3
      3                7.2         8.2
      4               10           9.1
      5                9.7         6.2
      6                3.2         5.3
      7               11.4        10.1
      8                5.9         6.2

Table 2 shows that the proposed method can efficiently remove the noise. Because the noise differs from one frequency band to another, the enhancement effect also differs. In some bands the proposed method outperforms the traditional SS method, and in the other bands its enhancement is almost equal to that of SS. It is worth pointing out that, unlike SS, the proposed method did not produce musical noise.

5 Conclusions

In this paper, a novel method for speech enhancement was proposed. EMD is used to form an observed data matrix, which ICA decomposes into a signal subspace and a noise subspace. Preliminary experiments show that the proposed method can efficiently remove non-stationary noise. This is important for short-wave communication, because it improves the objective quality of the speech signal and the intelligibility of noisy speech, thereby reducing listener fatigue.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under project No. 60475016.



References

1. Martin, R.: Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics. IEEE Trans. on Speech and Audio Processing, Vol. 9 (2001) 504–512
2. Ephraim, Y., Malah, D.: Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator. IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 32 (1984) 1109–1121
3. Zheng, W.T., Cao, Z.H.: Speech Enhancement Based on MMSE-STSA Estimation and Residual Noise Reduction. 1991 IEEE Region 10 International Conference on EC3-Energy, Computer, Communication and Control Systems, Vol. 3 (1991) 265–268
4. Liu, Z., Xu, N.: Speech Enhancement Based on Minimum Mean-Square Error Short-Time Spectral Estimation and Its Realization. IEEE International Conference on Intelligent Processing System, Vol. 28 (1997) 1794–1797
5. Lim, J.S., Oppenheim, A.V.: Enhancement and Bandwidth Compression of Noisy Speech. Proc. of the IEEE, Vol. 67 (1979) 1586–1604
6. Goh, Z., Tan, K., Tan, T.: Postprocessing Method for Suppressing Musical Noise Generated by Spectral Subtraction. IEEE Trans. Speech Audio Proc., Vol. 6 (1998) 287–292
7. He, C., Zweig, Z.: Adaptive Two-Band Spectral Subtraction with Multi-Window Spectral Estimation. ICASSP, Vol. 2 (1999) 793–796
8. Huang, N.E.: The Empirical Mode Decomposition and the Hilbert Spectrum for Nonlinear and Non-stationary Time Series Analysis. Proc. R. Soc. Lond. A, Vol. 454 (1998) 903–995
9. Huang, W., Shen, Z., Huang, N.E., Fung, Y.C.: Engineering Analysis of Biological Variables: An Example of Blood Pressure over 1 Day. Proc. Natl. Acad. Sci. USA, Vol. 95 (1998) 4816–4821
10. Huang, W., Shen, Z., Huang, N.E., Fung, Y.C.: Nonlinear Indicial Response of Complex Nonstationary Oscillations as Pulmonary Hypertension Responding to Step Hypoxia. Proc. Natl. Acad. Sci. USA, Vol. 96 (1999) 1833–1839