International Conference on Computer Applications in Electrical Engineering Recent Advances, IIT Roorkee, CERA-2017

Phase Based Single-Channel Speech Enhancement Using Phase Ratio

Sachin Singh and A M Mutawa

Monika Gupta, Manoj Tripathy, and R S Anand

Department of Electrical & Electronics Engineering, NIT Delhi, India; Department of Computer Engineering, Kuwait University

Researcher, UTU Dehradun; Department of Electrical Engineering, IIT Roorkee, India

Abstract— In this paper, a novel method based on the phase of the noisy speech and noise signals is proposed. Two gain functions (G1 and G2) based on the phase ratio are developed and used to correct the noisy phase, suppressing noise coming from angles 0 to ±π/2 and π/2 to ±π, respectively. For reconstruction, the gains G1 and G2 are multiplied and lower values of the phases are neglected, which enhances the speech. Objective speech intelligibility measures, informal subjective listening tests, and spectrogram analysis are used to assess the effectiveness of the proposed method.

Keywords— Single-channel speech enhancement, phase ratio estimation, noise reduction, signal reconstruction.

I. INTRODUCTION

The aim of a single-channel speech enhancement algorithm is to improve the intelligibility and quality of a noise-corrupted signal. Most speech enhancement algorithms modify only the amplitude, while the phase remains unchanged during enhancement. These algorithms can be classified as spectral subtractive [1], MMSE estimation [2], subspace-based [3], and Wiener-type algorithms [4]. Directly calculating the clean spectral phase from noisy speech is considered a very difficult task [5]. Nevertheless, some researchers have developed algorithms that improve the phase of the noisy speech signal. For instance, Wang and Lim reported the unimportance of phase in speech enhancement [6], while the study of Paliwal et al. [7] and later work [13, 14] show that phase enhancement matters for improving the quality and intelligibility of single-channel speech. Consequently, research on estimating the clean phase from noisy speech is increasing. In this direction, Griffin and Lim proposed an algorithm for clean phase estimation that iteratively analyzes and synthesizes the signal using the spectral amplitude [8]. In [5], Gerkmann and Krawczyk used the STFT phase for MMSE-optimal spectral amplitude estimation in speech enhancement. A drawback of these algorithms is that they require some additional information. Stark et al. [9] and Wojcicki et al. [10] proposed very effective algorithms for clean phase estimation. In this paper, a novel approach is proposed for estimating the clean phase in single-channel noisy speech, where the noisy magnitude spectrum is combined with a corrected phase spectrum to obtain an enhanced speech spectrum. In this procedure, the magnitude spectrum is left unchanged and only the phase spectrum is altered for enhancement. The phase ratio is calculated in each frame for every sample. Two gain functions, G1 and G2, are used to correct the noisy phase and suppress noise coming from angles 0 to ±π/2 and π/2 to ±π, respectively.

The rest of this letter is organized as follows. The signal model and notation are described in Section II. Section III presents the details of the proposed approach. The experimental evaluation and results are given in Section IV. Conclusions are drawn in Section V.

II. SIGNAL MODEL AND NOTATION

Let us consider that noisy speech is an additive superposition of clean speech and noise, as expressed in eqn. (1).

$$y(n) = s(n) + d(n) \qquad (1)$$

where $y(n)$, $s(n)$, and $d(n)$ denote the noisy, clean, and noise signals in the discrete-time domain, respectively. The discrete short-time Fourier transform (STFT) of the corrupted speech signal $y(n)$ is given by eqn. (2).

$$Y(n,k) = \sum_{p=-\infty}^{\infty} y(p)\, w(n-p)\, e^{-j 2\pi k p / N} \qquad (2)$$

where $k$ denotes the $k$th of $N$ uniformly spaced discrete frequencies and $w(n)$ is the analysis window function. A Hanning window with a frame length of 240 samples (30 ms at the 8 kHz sampling frequency) and 50% overlap is used. In the STFT domain, eqn. (1) can be represented as:

$$Y(n,k) = S(n,k) + D(n,k) \qquad (3)$$

where $Y(n,k)$, $S(n,k)$, and $D(n,k)$ are the STFTs of the noisy, clean, and noise signals, respectively. Expressing the noisy STFT of eqn. (3) in terms of its magnitude and phase spectra gives eqn. (4).

$$Y(n,k) = |Y(n,k)|\, e^{j \angle Y(n,k)} \qquad (4)$$

where $|Y(n,k)|$ and $\angle Y(n,k)$ are the magnitude and phase spectra, respectively.
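The analysis of eqns. (2) to (4) with the settings stated above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code. It assumes that the FFT length N equals the 240-sample frame length, keeps only the one-sided spectrum (bins k = 0 to 120) of the real signal, drops trailing samples that do not fill a frame, and stores Y as an array indexed [frame, bin]. The helpers `stft`, `istft`, and `mag_phase` are hypothetical names. `istft` performs the inverse-STFT and overlap-add resynthesis used at the end of Section III, dividing by the summed analysis windows so that an unmodified spectrum reconstructs the input.

```python
import numpy as np

FS = 8000                 # sampling frequency (Hz), as stated in the text
FRAME_LEN = 240           # 240-sample Hanning frame (30 ms at 8 kHz)
HOP = FRAME_LEN // 2      # 50% overlap -> hop of 120 samples


def stft(x, frame_len=FRAME_LEN, hop=HOP):
    """One-sided STFT of eqn. (2); returns Y with shape (frames, frame_len // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop   # trailing partial frame is dropped
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)           # FFT length N = frame_len (assumption)


def istft(Y, frame_len=FRAME_LEN, hop=HOP):
    """Inverse FFT per frame, then overlap-add, normalised by the summed windows."""
    win = np.hanning(frame_len)
    frames = np.fft.irfft(Y, n=frame_len, axis=1)
    length = hop * (len(frames) - 1) + frame_len
    out = np.zeros(length)
    wsum = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
        wsum[i * hop:i * hop + frame_len] += win
    # np.hanning is zero at both ends, so the very first and last samples are not recoverable.
    return out / np.maximum(wsum, 1e-8)


def mag_phase(Y):
    """Split Y into |Y(n,k)| and angle Y(n,k), as in eqn. (4)."""
    return np.abs(Y), np.angle(Y)
```

For a 1 s signal at 8 kHz, `stft` returns 65 frames of 121 bins, and `mag * np.exp(1j * phase)` rebuilds Y exactly. Normalising by the window sum, rather than relying on the window's overlap-add property, keeps the sketch valid for the symmetric `np.hanning` window.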


Generally, single-channel speech enhancement algorithms [1-4] modify only the magnitude spectrum and leave the phase spectrum unchanged. In this letter, only the phase spectrum is altered in order to examine its impact on the quality and intelligibility of speech. The procedure of the proposed approach is given in the next section.

III. PROPOSED APPROACH

Fig. 1 shows the block diagram of the proposed single-channel speech enhancement method, which has three main steps: 1) calculation of the phase ratio using eqns. (5) to (8); 2) use of this phase ratio to find the gain functions G1 and G2, which correct the phase to suppress noise coming from angles 0 to ±π/2 and π/2 to ±π, respectively; and 3) extraction of the corrected phase using G = G1 G2. Finally, the inverse STFT is used to convert the enhanced spectrum of eqn. (13) to the time domain, and the overlap-add method is employed to recover the denoised time-domain speech signal.

To calculate the phase ratio, the angles of the STFT noise and noisy spectra are first computed as given in eqns. (5) and (6).

$$P_D(n,k) = \angle D(n,k) \qquad (5)$$

$$P_Y(n,k) = \angle Y(n,k) \qquad (6)$$

$$P_{YD}(n,k) = \lambda\, P_Y(n,k) + (1-\lambda)\, P_D(n,k) \qquad (7)$$

$$\mathrm{PhaseRatio}(n,k) = \frac{P_{YD}(n,k)}{P_D(n,k)\, P_Y(n,k) + \xi} \qquad (8)$$

where $P_D(n,k)$ and $P_Y(n,k)$ are the angles of the noise and noisy spectra, respectively. The combined angle $P_{YD}(n,k)$ of the noise and noisy signals is computed as in eqn. (7), and the phase ratio as in eqn. (8). The constants λ and ξ are held at fixed values of 0.65 and 10^-12, respectively [11, 15]; ξ guards the denominator of eqn. (8) against division by zero.
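Eqns. (5) to (8) translate directly into element-wise array operations. The sketch below is illustrative, not the authors' implementation. The function name `phase_ratio` is hypothetical, and the sketch assumes the noise STFT D(n,k) is available on the same time-frequency grid as Y(n,k), for example from the separately stored noise of a test corpus. The text does not say how D(n,k) is obtained in practice. `np.angle` returns angles in [-π, π].

```python
import numpy as np


def phase_ratio(Y, D, lam=0.65, xi=1e-12):
    """Phase ratio of eqns. (5)-(8) for complex STFT arrays Y (noisy) and D (noise)."""
    P_D = np.angle(D)                       # eqn. (5): noise phase
    P_Y = np.angle(Y)                       # eqn. (6): noisy phase
    P_YD = lam * P_Y + (1.0 - lam) * P_D    # eqn. (7): weighted combination, lambda = 0.65
    return P_YD / (P_D * P_Y + xi)          # eqn. (8): xi = 1e-12 avoids division by zero
```

For example, with Y = D = j (both phases π/2), the ratio is (π/2)/(π²/4) = 2/π ≈ 0.637. Because ξ is tiny, eqn. (8) becomes very large in bins where either phase is close to zero and $P_{YD}$ is not. The sketch reproduces that behaviour as written rather than clipping it.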

Fig. 1 Block diagram of the proposed single-channel speech enhancement method.

The gain functions G1 and G2, which correct the phase to suppress noise coming from angles 0 to ±π/2 and π/2 to ±π, respectively, are calculated as in eqns. (9) and (10).

$$G_1(n,k) = 1 - \mathrm{PhaseRatio}(n,k) \qquad (9)$$

$$G_2(n,k) = \begin{cases} 0.05, & \text{if } G_1(n,k) < \mu \\ 1, & \text{otherwise} \end{cases} \qquad (10)$$

where μ is a constant with value -0.3, used to suppress noise coming from angles beyond ±π/2. The final filter G is calculated from eqns. (9) and (10):

$$G(n,k) = G_1(n,k)\, G_2(n,k) \qquad (11)$$

The corrected phase spectrum is obtained from eqns. (11) and (4):

$$Y_G(n,k) = Y(n,k)\, G(n,k) \qquad (12)$$

The enhanced single-channel speech spectrum $\hat{S}_G(n,k)$ is then calculated from eqns. (12) and (4), as given in eqn. (13):

$$\hat{S}_G(n,k) = |Y(n,k)|\, e^{j \angle Y_G(n,k)} \qquad (13)$$

IV. RESULTS AND EVALUATION

A. Speech and Noise Corpus

The clean speech patterns are taken from the NOIZEUS database, which is composed of 30 balanced sentences recorded from six speakers (three male and three female) [12]. The database comes with various additive non-stationary noises at different SNR levels (i.e., -10, -5, 0, 5, 10, and 15 dB).

TABLE I. OUTPUT SNR SCORES

IN PRESENCE OF WHITE NOISE

Input SNR (dB)   Proposed method   Wiener    SS        MMSE
-10              0.415             -2.664    -4.421    -1.620
-5               1.200             -1.387    -2.471    -0.949
0                3.025             -0.766     0.887    -1.052
5                6.219             -0.312     2.921    -1.595
10               10.446            -0.243     5.800    -2.760
15               15.169            -0.140     9.339    -4.630

The noises used for evaluation are white, train, and babble. All patterns of the corpus are sampled at 8 kHz.
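Table I reports output SNR, but the text does not give the formula used. A common global definition is SNR_out = 10 log10( Σ s²(n) / Σ (s(n) - ŝ(n))² ). The sketch below assumes that definition and a time-aligned enhanced signal. The function name `output_snr_db` is hypothetical. Segmental SNR, computed per frame and then averaged, is another frequent choice and would give different numbers.

```python
import numpy as np


def output_snr_db(clean, enhanced):
    """Global output SNR (dB) of an enhanced signal against the clean reference (assumed definition)."""
    s = np.asarray(clean, dtype=float)
    s_hat = np.asarray(enhanced, dtype=float)
    n = min(len(s), len(s_hat))          # STFT framing may shorten the enhanced signal
    s, s_hat = s[:n], s_hat[:n]
    residual = np.sum((s - s_hat) ** 2)  # energy of the remaining error
    return 10.0 * np.log10(np.sum(s ** 2) / residual)
```

For instance, `output_snr_db([1, 1, 1, 1], [1.1, 0.9, 1.1, 0.9])` gives about 20 dB (signal energy 4, error energy 0.04). A perfect reconstruction makes the residual zero and the result infinite.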


B. Phase Enhancement Procedure

For speech phase enhancement, the phase ratio based algorithm described in Section III is proposed. The procedure is given step by step in Fig. 1. The phase ratio is calculated from the noise and noisy speech spectra as given in eqn. (8). The values of all constants used in our evaluation were chosen to maximize objective speech intelligibility. The two gain functions G1 and G2 were calculated to suppress noise coming from angles 0 to ±π/2 and π/2 to ±π, respectively. Both gain functions are based on the previously calculated phase values.

C. Performance Evaluation Parameters

Performance is compared in terms of objective speech quality and intelligibility measures, namely the perceptual evaluation of speech quality (PESQ), mean opinion score (MOS), short-time objective intelligibility (STOI), and articulation index (AI). These objective measures are evaluated in the presence of a wide range of noise types. PESQ ranges from -0.5 to 4.5, with higher values indicating better speech. The other parameters, MOS, STOI, and AI, are measured in the range 0 to 1, where 1 indicates the maximum improvement and lower values indicate less enhancement.
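The gain and reconstruction steps of eqns. (9) to (13) can also be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code. `enhance_spectrum` is a hypothetical name. It takes the phase ratio of eqn. (8) as an input array with the same shape as Y, and it uses μ = -0.3 and the 0.05 value from eqn. (10). Because G(n,k) is real-valued, the phase of Y_G = Y·G in eqn. (12) equals ∠Y where G > 0 and is shifted by π where G < 0. Where G = 0, `np.angle` returns 0. The magnitude |Y| is never modified.

```python
import numpy as np


def enhance_spectrum(Y, phase_ratio, mu=-0.3, floor=0.05):
    """Apply eqns. (9)-(13): gains from the phase ratio, then keep |Y| with the phase of Y*G."""
    G1 = 1.0 - phase_ratio                            # eqn. (9)
    G2 = np.where(G1 < mu, floor, 1.0)                # eqn. (10): 0.05 below mu, else 1
    G = G1 * G2                                       # eqn. (11)
    Y_G = Y * G                                       # eqn. (12): real gain on complex spectrum
    S_hat = np.abs(Y) * np.exp(1j * np.angle(Y_G))    # eqn. (13): noisy magnitude, new phase
    return S_hat, G
```

With a phase ratio of 0.5, the gain is 0.5 and the bin passes through unchanged. With a phase ratio of 1.5, G1 = -0.5 falls below μ, so G = -0.025 and the bin's phase is flipped by π. The returned spectrum would then go through the inverse STFT and overlap-add described in Section III.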

[Fig. 2 panels (a) to (e): spectrograms, Freq. (kHz) versus Time (s)]

Fig. 2 Spectrograms of the sp10.wav utterance "The sky that morning was clear and bright blue," spoken by a male speaker: (a) clean speech; (b) and (c) speech degraded by white and babble noise, respectively, at 10 dB; (d) and (e) the corresponding enhanced speech.

Fig. 3 Improvement in (a) MOS, (b) PESQ, (c) STOI, and (d) AI scores for babble, f16, white, and pink noises.

D. Parameters Discussion

The output SNR values for white noise given in Table I show that the proposed approach performs best in comparison with the Wiener [4], SS [1], and MMSE [2] methods. Spectrogram analysis is shown in Fig. 2. In the case of white noise, less noise is suppressed, as seen in the spectrogram, so the enhanced spectrogram is not very clear. In the case of babble noise, a sufficient amount of noise is suppressed, so the spectrogram is clear. A more detailed comparison with other algorithms reported in the literature will be presented in future work. Fig. 3 gives the MOS, PESQ, STOI, and AI scores at the corresponding input SNR levels for babble, f16, white, and pink noises. All objective intelligibility parameters show the maximum improvement for babble noise.

V. CONCLUSION

In this paper, a phase ratio of the noise and noisy speech spectra is proposed for phase enhancement of single-channel speech. The unchanged noisy magnitude spectrum is combined with a modified phase spectrum to obtain the enhanced speech spectrum. Two gain functions based on the phase ratio are used to suppress noise coming from angles 0 to ±π/2 and π/2 to ±π, respectively. Objective intelligibility measures, spectrogram analysis, and informal subjective listening tests show that the proposed method improves objective speech intelligibility.

ACKNOWLEDGMENT

I am very thankful to KFAS (PR1718SM05) for supporting this research.

REFERENCES

[1] S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.

[2] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.

[3] Y. Ephraim and H. L. Van Trees, "A signal subspace approach for speech enhancement," IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 251-266, Jul. 1995.

[4] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications. New York: Wiley, 1949.

[5] T. Gerkmann and M. Krawczyk, "MMSE-optimal spectral amplitude estimation given the STFT-phase," IEEE Signal Process. Lett., vol. 20, no. 2, pp. 1-4, Feb. 2013.

[6] D. L. Wang and J. S. Lim, "The unimportance of phase in speech enhancement," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1109-1121, Dec. 1984.

[7] K. Paliwal, K. Wojcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Commun. (Elsevier), vol. 53, no. 4, pp. 465-494, Apr. 2011.

[8] D. Griffin and J. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 2, pp. 236-243, Apr. 1984.

[9] A. P. Stark, K. Wojcicki, J. G. Lyons, and K. Paliwal, "Noise driven short-time phase spectrum compensation procedure for speech enhancement," in Proc. Interspeech, Brisbane, Australia, Sept. 22-26, 2008.

[10] K. Wojcicki, M. Milacic, A. P. Stark, J. G. Lyons, and K. Paliwal, "Exploiting conjugate symmetry of the short-time Fourier spectrum for speech enhancement," IEEE Signal Process. Lett., vol. 15, pp. 461-464, 2008.

[11] R. L. Bouquin and G. Faucon, "Using the coherence function for noise reduction," IEE Proc.-I Commun., Speech, Vis., vol. 139, no. 3, pp. 276-280, Jan. 1992.

[12] Y. Hu and P. Loizou, "Subjective comparison of speech enhancement algorithms," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'06), Toulouse, France, pp. 153-156, 2006.

[13] P. Mowlaee and J. Kulmer, "Phase estimation in single-channel speech enhancement: Limits-potential," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 8, pp. 1283-1294, 2015.

[14] R. Maia and Y. Stylianou, "Iterative estimation of phase using complex cepstrum representation," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2016.

[15] S. Singh, M. Tripathy, and R. S. Anand, "A wavelet packet based approach for speech enhancement using modulation channel selection," Wireless Personal Communications, Springer, 2017, DOI: 10.1007/s11277-017-4094-6.