Bayesian estimation for speech enhancement given a priori ...

Int J Speech Technol (2015) 18:593–607 DOI 10.1007/s10772-015-9306-4

Bayesian estimation for speech enhancement given a priori knowledge of clean speech phase

V. Sunnydayal · T. Kishore Kumar

Received: 11 May 2015 / Accepted: 30 August 2015 / Published online: 10 September 2015 © Springer Science+Business Media New York 2015

Abstract In this paper, STFT-based speech enhancement algorithms based on the estimation of short-time spectral amplitudes are proposed. These algorithms use maximum likelihood, maximum a posteriori and minimum mean square error (MMSE) estimators, which use Laplace, Gamma and exponential probability density functions as noise spectral amplitude priors and the Nakagami distribution as the speech spectral amplitude prior. The phase of noisy speech carries significant information that can be retrieved and utilized. However, the undesired artifacts that result from this process create many challenges. In this paper, the reconstructed phase is treated as uncertain prior knowledge when deriving a joint MMSE estimate of the (C)omplex speech coefficients given (U)ncertain (P)hase information. The proposed phase reconstruction algorithm assists in generating a clean speech phase. The proposed estimator reduces undesired artifacts and yields a phase estimate that lies between the noisy phase and the prior phase estimate, and hence yields superior performance in instrumental measures, informal listening and speech quality.

Keywords MMSE estimator · Laplace density function · Exponential density function · Gamma density function · von Mises distribution · Nakagami distribution · PESQ

V. Sunnydayal, T. Kishore Kumar
Electronics and Communication Engineering Department, National Institute of Technology Warangal, Warangal, Telangana 506 004, India

1 Introduction

In mobile communications, speech enhancement plays a very important role. The main goal of speech enhancement is to improve the quality and intelligibility of speech degraded by noise. Speech enhancement is a difficult task, especially when only a single microphone is available. In this paper, statistically optimal estimators, namely maximum likelihood (ML), maximum a posteriori (MAP) and minimum mean squared error (MMSE) estimators, are derived for the clean speech spectral coefficients and the clean speech phase in the single-channel case, in order to enhance the corrupted speech in the frequency domain. In most noise reduction techniques (Hendriks et al. 2013; Erkelens et al. 2007; Lotter and Vary 2005), the noisy speech magnitudes are modified without considering the phase. An estimate of the clean speech can be obtained by synthesizing and reanalyzing the clean speech magnitudes alone (Griffin and Lim 1984), which can degrade the speech quality. The role of phase in single channel enhancement techniques has been discussed in (Wang and Lim 1982; Vary and Eurasip 1985; Gerkmann and Krawczyk 2013). Recently, the use of the clean speech phase in speech enhancement algorithms (Paliwal et al. 2011; Gerkmann and Krawczyk 2013) has been shown to improve their performance. In some speech enhancement algorithms (Wang and Lim 1982), the noisy phase is replaced by the clean speech phase, which can result in artifacts (Sturmel and Daudet 2011). This can be rectified by estimating the clean speech phase (Gerkmann and Krawczyk 2013; Krawczyk et al. 2013) and incorporating it in the Bayesian amplitude estimation. In (Gerkmann and Krawczyk 2013; Krawczyk et al. 2013), the phase estimate was used to improve the amplitude estimation, while the noisy phase observation was used when converting from the frequency domain to the time domain.


The objective of this paper is to obtain a phase estimate and to use it both in the amplitude estimation and in the conversion from the frequency domain to the time domain, without introducing artifacts. When deriving the joint MMSE estimate of the clean speech phase and amplitude, we treat the estimate of the clean speech phase as uncertain a priori knowledge, which can be obtained using the sinusoidal-model-based phase reconstruction algorithm (Krawczyk and Gerkmann 2012). Taking the uncertainty of the prior phase estimate into account allows us to improve the perceptual quality of speech. This paper is organized as follows: Sect. 2 gives the signal model. Section 3 presents phase estimation for unknown speech and noise amplitudes, where ML and MAP estimators are derived for different cases (assuming a Nakagami distribution for speech and Laplace, Gamma and exponential densities for noise). In Sect. 4 the proposed estimators of the (C)omplex speech coefficients given (U)ncertain (P)hase information (CUP) are derived. Section 5 presents the implementation of the proposed estimators. Experimental results are discussed in Sect. 6 and concluding remarks are given in Sect. 7.

2 Signal modeling

The noisy speech signal $Y(k,i)$ is the sum of the clean speech $S(k,i)$ and the noise signal $W(k,i)$. In the STFT domain it can be represented by

$$ Y(k,i) = S(k,i) + W(k,i) \tag{1} $$

where $k$ is the frequency index and $i$ is the time index. In polar form, the above signals can be represented by

$$ Y = R e^{j\Phi_Y}, \qquad S = A e^{j\Phi_S}, \qquad W = N e^{j\Phi_N} \tag{2} $$

Random variables are denoted by upper-case letters ($S$, $A$, $\Phi_S$) and their realizations by lower-case letters ($s$, $a$, $\phi_S$). The clean speech phase can also be estimated by iteratively synthesizing and reanalyzing the clean speech amplitudes (Griffin and Lim 1984). This algorithm requires the clean speech amplitudes to be known in advance; if only estimates of them are available, the enhanced speech may be degraded. Recent advances in iterative phase estimation are reported in (Le Roux and Vincent 2013; Mowlaee and Saeidi 2013; Sturmel and Daudet 2011, 2012). Krawczyk and Gerkmann (2012) proposed that the clean speech phase can be estimated by estimating the fundamental frequency of voiced speech. Phase estimation can also be incorporated to improve the estimate of the clean speech amplitudes (Gerkmann and Krawczyk 2013). In signal reconstruction, artifacts may arise if the noisy speech phase is replaced by an estimate of the clean speech phase (Krawczyk and Gerkmann 2012; Sturmel and Daudet 2011, 2012), because of errors in the phase estimate. The next section discusses how an estimate of the clean speech phase is obtained with the proposed method.
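The additive model (1) and the polar decomposition (2) can be illustrated with a minimal NumPy sketch. The numerical values of the clean speech and noise coefficients below are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex STFT coefficients for a single time-frequency bin (k, i).
S = 0.8 * np.exp(1j * 0.3)  # clean speech with A = 0.8 and Phi_S = 0.3 rad
W = 0.2 * (rng.standard_normal() + 1j * rng.standard_normal())  # noise coefficient

Y = S + W  # additive model, Eq. (1)

# Polar decomposition, Eq. (2): Y = R e^{j Phi_Y}, S = A e^{j Phi_S}, W = N e^{j Phi_N}
R, phi_Y = np.abs(Y), np.angle(Y)
A, phi_S = np.abs(S), np.angle(S)
N, phi_N = np.abs(W), np.angle(W)

# Recombining amplitude and phase reproduces the complex coefficient.
Y_rebuilt = R * np.exp(1j * phi_Y)
```

In a full system the same operations would be applied element-wise to the whole STFT matrix rather than to one bin.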

3 Phase estimation for unknown speech and noise amplitudes

In this section, the amplitudes of speech and noise are treated as unknown random variables. First, the maximum likelihood (ML) estimate of the clean speech phase is derived. The result is then combined with uncertain prior knowledge of the clean speech phase to obtain a maximum a posteriori (MAP) estimate.

3.1 Maximum likelihood phase estimation

The phase likelihood $p_{Y|\Phi_S}(y|\phi_S)$ is derived using Bayes' theorem,

$$ p_{Y|\Phi_S}(y|\phi_S) = \frac{p_{Y,\Phi_S}(y,\phi_S)}{p_{\Phi_S}(\phi_S)} = \frac{\int_0^\infty p_{Y|S}(y|\phi_S,a)\, p_{\Phi_S,A}(\phi_S,a)\, da}{p_{\Phi_S}(\phi_S)} \tag{3} $$

Assuming that amplitudes and phases are independent, i.e., $p_{\Phi_S,A}(\phi_S,a) = p_{\Phi_S}(\phi_S)\, p_A(a)$, (3) simplifies to

$$ p_{Y|\Phi_S}(y|\phi_S) = \int_0^\infty p_{Y|S}(y|\phi_S,a)\, p_A(a)\, da \tag{4} $$

Case (i) The PDF of the noise spectral coefficients is assumed to be Laplacian,

$$ p_{Y|S}(y|s) = \frac{1}{\sigma_N} \exp\left(-\frac{2|y-s|}{\sigma_N}\right) \tag{5} $$

and the PDF of the speech spectral amplitudes is assumed to be a Nakagami distribution,

$$ p_A(a) = \frac{2}{\Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2\right) \tag{6} $$

with the gamma function $\Gamma(\cdot)$ (Ryzhik and Gradshteyn 2007, Eq. (8.31)) and the shape parameter $\mu$. Choosing $\mu < 1$ yields a super-Gaussian speech prior, while $\mu = 1$ yields a Rayleigh prior. Substituting (5) and (6) into (4), we get

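$$ p_{Y|\Phi_S}(r,\phi_Y|\phi_S) = \int_0^\infty \frac{1}{\sigma_N} \frac{2}{\Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2 - \frac{2r}{\sigma_N} + \frac{2a}{\sigma_N}\right) da \tag{7} $$

where $r = |y|$ and $a = |S|$. The integral (7) can be solved using (Ryzhik and Gradshteyn 2007, Eq. (3.462.1)) to obtain the likelihood

$$ p_{Y|\Phi_S}(r,\phi_Y|\phi_S) = \frac{2^{1-\mu}}{\sigma_N} \frac{\Gamma(2\mu)}{\Gamma(\mu)} \exp\left(-\frac{2r}{\sigma_N}\right) \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu) \tag{8} $$

where

$$ \nu = \sqrt{\frac{\xi}{\mu}} \cos(\phi_S - \phi_Y) \tag{9} $$

$D_{\cdot}(\cdot)$ is the parabolic cylinder function (Ryzhik and Gradshteyn 2007, Eq. (9.241.2)) and $\xi = \sigma_S^2/\sigma_N^2 = E[|S|^2]/E[|W|^2]$. Neglecting the terms in (8) that are independent of $\phi_S$, the ML estimator of the clean speech phase is given by

$$ \hat{\phi}_S^{ML} = \arg\max_{\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu) \tag{10} $$

When $\mu > 0.1025$, the parabolic cylinder function is a positive, monotonically decreasing function of $\nu$, and $\exp(\nu^2/2)$ is positive and increases exponentially with $\nu^2$. The ML solution is therefore obtained for the most negative value of $\nu$, and it can be concluded that the ML-optimal estimate is the noisy signal phase $\phi_Y$:

$$ \hat{\phi}_S^{ML} = \phi_Y, \qquad \text{for } \mu > 0.1025 \tag{11} $$

Case (ii) The PDF of the noise spectral coefficients is assumed to be exponential,

$$ p_{Y|S}(y|s) = \frac{1}{\sigma_N} \exp\left(-\frac{y-s}{\sigma_N}\right) \tag{12} $$

and the PDF of the speech spectral coefficients is assumed to be Nakagami. We then obtain

$$ p_{Y|\Phi_S}(r,\phi_Y|\phi_S) = \frac{2^{1-\mu}}{\sigma_N} \frac{\Gamma(2\mu)}{\Gamma(\mu)} \exp\left(-\frac{r}{\sigma_N}\right) \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu) \tag{13} $$

$$ \nu = \sqrt{\frac{\xi}{4\mu}} \cos(\phi_S - \phi_Y), \qquad \xi = \frac{\sigma_S^2}{\sigma_N^2} \tag{14} $$

The ML estimator of the clean speech phase is given by

$$ \hat{\phi}_S^{ML} = \arg\max_{\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu) \tag{15} $$

Case (iii) The PDF of the noise spectral coefficients is assumed to be Gamma,

$$ p_{Y|A}(y|a) = \frac{a^{k-1} e^{-(y-a)/\theta}}{\theta^{k}\, \Gamma(k)} \tag{16} $$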

Assuming that the PDF of the speech spectral coefficients is Nakagami, we get

$$ p_{Y|\Phi_S}(r,\phi_Y|\phi_S) = 2 \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} \left(\frac{2\mu}{\sigma_S^2}\right)^{-\frac{2\mu+k-1}{2}} \frac{\Gamma(2\mu+k-1)}{\theta^{k}\, \Gamma(\mu)\, \Gamma(k)} \exp\left(-\frac{r}{\theta}\right) \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu-k+1}(\nu) \tag{17} $$

$$ \nu = \frac{1}{2} \sqrt{\frac{\xi}{\mu}} \cos(\phi_S - \phi_Y), \qquad \theta = \sigma_N \tag{18} $$

The ML estimator of the clean speech phase is given by

$$ \hat{\phi}_S^{ML} = \arg\max_{\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-(2\mu+k-1)}(\nu) \tag{19} $$

3.2 Maximum a posteriori phase estimation

In this section, we formulate the posterior distribution, which includes a priori knowledge of the speech spectral phase, using the likelihood functions (8), (13) and (17). The MAP estimate of the clean speech spectral phase is obtained from this posterior and gives a trade-off between the noisy phase $\phi_Y$ and the mean direction $\tilde{\phi}_S$ of the phase prior distribution. The mean direction can be calculated using blind phase reconstruction algorithms (Krawczyk and Gerkmann 2012). Given the prior information $\tilde{\phi}_S$ on the phase $\phi_S$, the posterior distribution is obtained using Bayes' theorem:

$$ p_{\Phi_S|\tilde{\phi}_S,y}(\phi_S|\tilde{\phi}_S,y) = \frac{p_{\Phi_S,\tilde{\Phi}_S,Y}(\phi_S,\tilde{\phi}_S,y)}{\int_0^{2\pi} p_{\Phi_S,\tilde{\Phi}_S,Y}(\phi_S,\tilde{\phi}_S,y)\, d\phi_S} = \frac{p_{Y|\Phi_S}(y|\phi_S)\, p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S)}{\int_0^{2\pi} p_{Y|\Phi_S}(y|\phi_S)\, p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S)\, d\phi_S} \tag{20} $$

After integration, the denominator of (20) no longer depends on $\phi_S$. Hence, to obtain the MAP estimate it is sufficient to maximize the numerator in (20). To maximize (20), $p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S)$ is modeled by a von Mises distribution (Mardia and Jupp 2000, Section 3.5.4).

Case (i) Assuming that the PDF of the noise spectral coefficients is Laplacian and that of the speech spectral coefficients is Nakagami,

$$ \hat{\phi}_S^{MAP} = \arg\max_{\phi_S} p_{\Phi_S|\tilde{\phi}_S,y}(\phi_S|\tilde{\phi}_S, r, \phi_Y) = \arg\max_{\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu) \exp\left(\kappa \cos(\phi_S - \tilde{\phi}_S)\right) \tag{21} $$

where $\nu = \sqrt{\xi/\mu}\, \cos(\phi_S - \phi_Y)$.

Case (ii) Assuming that the PDF of the noise spectral coefficients is exponential and that of the speech spectral coefficients is Nakagami,

$$ \hat{\phi}_S^{MAP} = \arg\max_{\phi_S} p_{\Phi_S|\tilde{\phi}_S,y}(\phi_S|\tilde{\phi}_S, r, \phi_Y) = \arg\max_{\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu) \exp\left(\kappa \cos(\phi_S - \tilde{\phi}_S)\right) \tag{22} $$

where $\nu = \sqrt{\xi/(4\mu)}\, \cos(\phi_S - \phi_Y)$.

Case (iii) Assuming that the PDF of the noise spectral coefficients is Gamma and that of the speech spectral coefficients is Nakagami,

$$ \hat{\phi}_S^{MAP} = \arg\max_{\phi_S} p_{\Phi_S|\tilde{\phi}_S,y}(\phi_S|\tilde{\phi}_S, r, \phi_Y) = \arg\max_{\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-(2\mu+k-1)}(\nu) \exp\left(\kappa \cos(\phi_S - \tilde{\phi}_S)\right) \tag{23} $$

where $\nu = \frac{1}{2}\sqrt{\xi/\mu}\, \cos(\phi_S - \phi_Y)$.

$\kappa$ is the concentration parameter, which expresses the uncertainty of the prior phase information $\tilde{\phi}_S$. $\kappa \to \infty$ indicates high certainty about the prior phase information, so the MAP estimator gives $\hat{\phi}_S^{MAP} = \tilde{\phi}_S$. On the other hand, $\kappa = 0$ indicates high uncertainty about the prior phase information, so the MAP estimator gives the noisy phase, $\hat{\phi}_S^{MAP} = \hat{\phi}_S^{ML} = \phi_Y$. When $0 < \kappa < \infty$, the MAP estimate lies between the noisy phase $\phi_Y = \hat{\phi}_S^{ML}$ and the prior phase information $\tilde{\phi}_S$.
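The role of the concentration parameter $\kappa$ in the von Mises phase prior can be checked numerically. The sketch below is illustrative only: it evaluates the density $\exp(\kappa\cos(\phi_S-\tilde{\phi}_S))/(2\pi I_0(\kappa))$ on a grid, using NumPy's `np.i0` for the modified Bessel function $I_0$. The prior phase value and the list of $\kappa$ values are assumptions made for the example, not values from the paper.

```python
import numpy as np

def von_mises_pdf(phi, mean_dir, kappa):
    """von Mises density on [0, 2*pi) with mean direction mean_dir and concentration kappa."""
    return np.exp(kappa * np.cos(phi - mean_dir)) / (2.0 * np.pi * np.i0(kappa))

phi = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)  # phase grid
dphi = phi[1] - phi[0]
prior_phase = np.pi / 4.0  # hypothetical prior phase estimate

summary = {}
for kappa in (0.0, 1.0, 4.0, 50.0):
    p = von_mises_pdf(phi, prior_phase, kappa)
    summary[kappa] = {
        "mass": float(np.sum(p) * dphi),       # should be close to 1
        "mode": float(phi[np.argmax(p)]),      # location of the maximum
        "peak": float(p.max()),                # grows as kappa increases
    }
```

For $\kappa = 0$ the density is flat at $1/(2\pi)$, so the prior carries no phase information; as $\kappa$ grows the density concentrates around the prior phase estimate.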

4 Derivation of the proposed CUP estimator

This section derives the MMSE-optimal estimator of the (C)omplex speech coefficients given (U)ncertain (P)hase information (CUP) for different models of speech and noise. To obtain the MMSE-optimal estimator, we have to solve $E(S \mid y, \tilde{\phi}_S)$, where $\tilde{\phi}_S$ denotes a priori knowledge of the phase of the clean speech. The prior phase information is obtained using the phase reconstruction algorithm (Krawczyk and Gerkmann 2012). You et al. (2005) employed a compression parameter $\beta$ with which they generalized the logarithmic amplitude compression (Ephraim and Malah 1985). In the proposed work, the compression parameter $\beta$ is incorporated into the estimator as in (Gerkmann and Krawczyk 2013; Breithaupt et al. 2008b). The solution for $\hat{S}^{(\beta)}$ is obtained using the phase prior knowledge:

$$ \hat{S}^{(\beta)} = E\left(A^{\beta} e^{j\Phi_S} \mid y, \tilde{\phi}_S\right) = \int_0^\infty \int_0^{2\pi} a^{\beta} e^{j\phi_S}\, p_{A,\Phi_S|y,\tilde{\phi}_S}(a,\phi_S|y,\tilde{\phi}_S)\, d\phi_S\, da \tag{24} $$

To solve (24), the posterior $p_{A,\Phi_S|y,\tilde{\phi}_S}$ has to be modeled. With Bayes' rule, we can write

$$ p_{A,\Phi_S|y,\tilde{\phi}_S}(a,\phi_S|y,\tilde{\phi}_S) = \frac{p_{Y,A,\Phi_S,\tilde{\Phi}_S}(y,a,\phi_S,\tilde{\phi}_S)}{p_{Y,\tilde{\Phi}_S}(y,\tilde{\phi}_S)} = \frac{p_{Y|S,\tilde{\phi}_S}(y|a,\phi_S,\tilde{\phi}_S)\, p_{A,\Phi_S,\tilde{\Phi}_S}(a,\phi_S,\tilde{\phi}_S)}{\iint p_{Y|S,\tilde{\phi}_S}(y|a,\phi_S,\tilde{\phi}_S)\, p_{A,\Phi_S,\tilde{\Phi}_S}(a,\phi_S,\tilde{\phi}_S)\, da\, d\phi_S} \tag{25} $$

Now $p_{Y|S,\tilde{\phi}_S}$ and $p_{A,\Phi_S,\tilde{\Phi}_S}$ have to be modeled. If the clean speech realization $s = a e^{j\phi_S}$ is known, $\tilde{\phi}_S$ gives no further information on $Y$, i.e.,

$$ p_{Y|S,\tilde{\phi}_S} = p_{Y|S} \tag{26} $$

Case (i) Assuming that the probability density function (PDF) $p_{Y|S}$ of the noise spectral coefficients is Laplacian, with (26) we have

$$ p_{Y|S,\tilde{\phi}_S}(y|a,\phi_S) = \frac{1}{\sigma_N} \exp\left(-\frac{2|y-s|}{\sigma_N}\right) \tag{27} $$

To solve (25), the joint PDF $p_{A,\Phi_S,\tilde{\Phi}_S}$ of the clean speech amplitude, the clean speech phase and the phase estimate has to be modeled. Assuming that amplitudes and phases are mutually independent, the joint PDF can be expressed as

$$ p_{A,\Phi_S,\tilde{\Phi}_S}(a,\phi_S,\tilde{\phi}_S) = p_A(a)\, p_{\Phi_S,\tilde{\Phi}_S}(\phi_S,\tilde{\phi}_S) = p_A(a)\, p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S)\, p_{\tilde{\Phi}_S}(\tilde{\phi}_S) \tag{28} $$

Using (26) and (28) in (25), we get the posterior as follows

$$ p_{A,\Phi_S|y,\tilde{\phi}_S}(a,\phi_S|y,\tilde{\phi}_S) = \frac{p_{Y|S}(y|a,\phi_S)\, p_A(a)\, p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S)}{\iint p_{Y|S}(y|a,\phi_S)\, p_A(a)\, p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S)\, da\, d\phi_S} \tag{29} $$

The PDF of the speech spectral amplitudes is assumed to be a Nakagami distribution,

$$ p_A(a) = \frac{2}{\Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2\right) \tag{30} $$

with the shape parameter $\mu$ and the gamma function (Ryzhik and Gradshteyn 2007, Eq. (8.31)). By varying $\mu$, speech amplitudes can be modeled with different PDFs. If $\mu < 1$ (Breithaupt et al. 2008b; Andrianakis and White 2006), the speech is modeled by a super-Gaussian (heavy-tailed) distribution. In this paper, the von Mises distribution with concentration parameter $\kappa$ is used to model the error between the true phase $\phi_S$ and the prior phase estimate $\tilde{\phi}_S$ in (29):

$$ p_{\Phi_S|\tilde{\phi}_S}(\phi_S|\tilde{\phi}_S) = \frac{\exp\left(\kappa \cos(\phi_S - \tilde{\phi}_S)\right)}{2\pi I_0(\kappa)} \tag{31} $$

The concentration parameter is $\kappa = 2ar/\sigma_N^2$ and $\mathrm{var}(\Phi_S|a,y) = 1 - I_1(\kappa)/I_0(\kappa)$ (Evans et al. 2000). As the value of $\kappa$ increases, the variance decreases. Large values of $\kappa$ imply high certainty about $\tilde{\phi}_S$, whereas low values of $\kappa$ imply a large degree of uncertainty. After substituting (27), (30) and (31) into (29) and using (Gradshteyn and Ryzhik 2007, Eq. 3.462.1), the MMSE estimator can be formulated from the posterior. This estimator contains compressed speech coefficients with uncertain prior knowledge of the clean speech phase. Analogous to (Gerkmann 2014), the CUP estimator is obtained as

$$ \hat{S}^{(\beta)} = E\left(A^{\beta} e^{j\Phi_S} \mid y, \tilde{\phi}_S\right) = \left(\sqrt{\frac{\xi \sigma_N^2}{2\mu}}\right)^{\beta} \frac{\Gamma(2\mu+\beta)}{\Gamma(2\mu)}\; \frac{\int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu-\beta}(\nu)\, p_{\Phi_S|\tilde{\phi}_S}\, d\phi_S}{\int_0^{2\pi} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu)\, p_{\Phi_S|\tilde{\phi}_S}\, d\phi_S} \tag{32} $$

$$ \nu = \sqrt{\frac{\xi}{\mu}} \cos(\phi_S - \phi_Y) \tag{33} $$

$D_{\cdot}(\cdot)$ is the parabolic cylinder function (Ryzhik and Gradshteyn 2007, Eq. (9.24)). The a priori SNR is given by

$$ \xi = \frac{\sigma_S^2}{\sigma_N^2} = \frac{E(|S|^2)}{E(|N|^2)} \tag{34} $$

The speech estimate is obtained by

$$ \hat{S} = \left|\hat{S}^{(\beta)}\right|^{1/\beta} \frac{\hat{S}^{(\beta)}}{\left|\hat{S}^{(\beta)}\right|} = \hat{A} e^{j\hat{\Phi}_S} \tag{35} $$
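The decompression step in (35) keeps the phase of the compressed estimate and raises its magnitude to the power $1/\beta$. A minimal sketch, assuming the compressed estimate $\hat{S}^{(\beta)}$ is already available as a complex number or array; the function name `decompress` and the numerical values are hypothetical:

```python
import numpy as np

def decompress(S_beta, beta):
    """Map a compressed estimate E[A^beta e^{j Phi_S}] to a complex coefficient, as in Eq. (35)."""
    amplitude = np.abs(S_beta) ** (1.0 / beta)  # undo the amplitude compression
    phase = np.angle(S_beta)                    # phase is taken from the compressed estimate
    return amplitude * np.exp(1j * phase)

# Hypothetical compressed estimate for one bin, with beta = 0.5.
S_beta = 0.25 * np.exp(1j * 1.0)
S_hat = decompress(S_beta, beta=0.5)
```

With $\beta = 1$ the mapping is the identity, which is the setting used in the experiments of this paper.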

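Case (ii) The PDF of the noise spectral coefficients is assumed to be exponential and the PDF of the speech spectral coefficients Nakagami (see the derivation in the Appendix). After substituting (12), (30) and (31) into (29) and using (Gradshteyn and Ryzhik 2007, Eq. 3.462.1), the MMSE estimator can be formulated from the posterior:

$$ \hat{S}^{(\beta)} = E\left(A^{\beta} e^{j\Phi_S} \mid y, \tilde{\phi}_S\right) = \left(\sqrt{\frac{\xi \sigma_N^2}{2\mu}}\right)^{\beta} \frac{\Gamma(2\mu+\beta)}{\Gamma(2\mu)}\; \frac{\int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu-\beta}(\nu)\, p_{\Phi_S|\tilde{\phi}_S}\, d\phi_S}{\int_0^{2\pi} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu}(\nu)\, p_{\Phi_S|\tilde{\phi}_S}\, d\phi_S} \tag{36} $$

$$ \nu = \sqrt{\frac{\xi}{4\mu}} \cos(\phi_S - \phi_Y) \tag{37} $$

$D_{\cdot}(\cdot)$ is the parabolic cylinder function (Ryzhik and Gradshteyn 2007, Eq. (9.24)). The a priori SNR is given by $\xi = \sigma_S^2/\sigma_N^2 = E(|S|^2)/E(|N|^2)$. The speech estimate is obtained by

$$ \hat{S} = \left|\hat{S}^{(\beta)}\right|^{1/\beta} \frac{\hat{S}^{(\beta)}}{\left|\hat{S}^{(\beta)}\right|} = \hat{A} e^{j\hat{\Phi}_S} \tag{38} $$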

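Case (iii) The PDF of the noise spectral coefficients is assumed to be a Gamma distribution and the PDF of the speech spectral coefficients Nakagami. After substituting (16), (30) and (31) into (29) and using (Gradshteyn and Ryzhik 2007, Eq. 3.462.1), the MMSE estimator can be formulated from the posterior:

$$ \hat{S}^{(\beta)} = E\left(A^{\beta} e^{j\Phi_S} \mid y, \tilde{\phi}_S\right) = \left(\sqrt{\frac{\sigma_N^2 \xi}{2\mu}}\right)^{\beta} \frac{\Gamma(2\mu+\beta+k-1)}{\Gamma(2\mu+k-1)}\; \frac{\int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu-\beta-k+1}(\nu)\, p_{\Phi_S|\tilde{\phi}_S}\, d\phi_S}{\int_0^{2\pi} \exp\left(\frac{\nu^2}{2}\right) D_{-2\mu-k+1}(\nu)\, p_{\Phi_S|\tilde{\phi}_S}\, d\phi_S} \tag{39} $$

$$ \nu = \frac{1}{2} \sqrt{\frac{\xi}{\mu}} \cos(\phi_S - \phi_Y) \tag{40} $$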

Fig. 1 Amplitude response and phase response of the proposed CUP estimator (32) for $\mu = \beta = 1$ and $\xi = 0.3$ for different values of the concentration parameter $\kappa$ in (31). For $\kappa = 0$ the amplitude estimate approaches the behavior of a Wiener filter (left) and the phase estimate results in $\hat{\phi}_S \to \phi_Y$ (right). For $\kappa \to \infty$ the amplitude estimate approaches the result in [9] (left) and the phase estimate results in $\hat{\phi}_S \to \tilde{\phi}_S$ (right). Amplitude and phase responses for $\phi_Y = 0$ and a $\tilde{\phi}_S = 0$, b $\tilde{\phi}_S = \pi/4$, c $\tilde{\phi}_S = \pi/2$, d $\tilde{\phi}_S = \frac{3}{4}\pi$

Fig. 2 PESQ improvement versus global SNR (0 to 15 dB) for the proposed CUP estimator, CUP estimator, phase insensitive, phase sensitive and phase sensitive × $\exp(j\tilde{\Phi}_S)$ methods (speech spectral coefficients as Nakagami distribution and noise spectral coefficients as exponential pdf). a Modulated pink Gaussian noise, b Pink noise, c Babble noise, d Non-stationary factory noise

Fig. 3 PESQ improvement versus global SNR (0 to 15 dB) for the proposed CUP estimator, CUP estimator, phase insensitive, phase sensitive and phase sensitive × $\exp(j\tilde{\Phi}_S)$ methods (speech spectral coefficients as Nakagami distribution and noise spectral coefficients as Laplace pdf). a Modulated pink Gaussian noise, b Pink noise, c Babble noise, d Non-stationary factory noise

5 Implementation of the proposed estimator

In Eqs. (32), (36) and (39), the parabolic cylinder function $D_{\cdot}(\nu)$ has to be integrated with respect to the speech spectral phase $\phi_S$ over the range $0 \le \phi_S < 2\pi$, which is difficult to do analytically. Equations (32), (36) and (39) can instead be solved numerically, with the gain functions precomputed and tabulated. For a given shape parameter $\mu$ and compression parameter $\beta$, this table has four dimensions: the a priori SNR $\xi$, the a posteriori SNR $r^2/\sigma_N^2$, the concentration parameter $\kappa$ and the phase difference $\phi_S - \phi_Y$.

5.1 Analysis of the proposed estimator

The input-output characteristics of the proposed estimator as a function of the noisy input $y$ for different phase priors, with $\phi_Y = 0$ and $\mu = \beta = 1$, are shown in Fig. 1.
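The numerical evaluation mentioned in Sect. 5 needs values of the parabolic cylinder function with negative order. The sketch below is one possible way to obtain them, not the authors' implementation: it uses the standard integral representation $D_{-p}(z) = \frac{e^{-z^2/4}}{\Gamma(p)} \int_0^\infty e^{-zx - x^2/2} x^{p-1}\, dx$ with a trapezoidal rule, and precomputes a small lookup table. The helper name `parabolic_cylinder_neg`, the restriction to $p \ge 1$, the grid sizes and the range of $\nu$ are all assumptions made for this example.

```python
import math
import numpy as np

def parabolic_cylinder_neg(p, z, n=20001):
    """Approximate D_{-p}(z) for p >= 1 with the integral representation and a trapezoidal rule."""
    x_max = max(0.0, -z) + 12.0  # the integrand is negligible beyond this point
    x = np.linspace(0.0, x_max, n)
    # exp(-z^2/4) * exp(-z*x - x^2/2) rewritten as exp(-(x + z)^2 / 2 + z^2 / 4)
    integrand = np.exp(-0.5 * (x + z) ** 2 + 0.25 * z * z) * x ** (p - 1.0)
    dx = x[1] - x[0]
    integral = dx * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
    return integral / math.gamma(p)

# Precompute a lookup table of D_{-2*mu}(nu) for mu = 1 on a hypothetical grid of nu values.
mu = 1.0
nu_grid = np.linspace(-3.0, 3.0, 61)
table = np.array([parabolic_cylinder_neg(2.0 * mu, v) for v in nu_grid])

# At run time, values between grid points can be read by linear interpolation.
d_at_half = float(np.interp(0.5, nu_grid, table))
```

For $p = 1$ the result can be compared with the closed form $D_{-1}(z) = \sqrt{\pi/2}\, e^{z^2/4}\, \mathrm{erfc}(z/\sqrt{2})$, which is a convenient sanity check before building larger gain tables.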

Three cases are analyzed: $\kappa = 0$, $\kappa \to \infty$ and $0 < \kappa < \infty$.

When $\kappa = 0$, the phase prior is uniformly distributed. Because of the large uncertainty of the prior phase information $\tilde{\phi}_S$, the phase prior has no influence on the estimation of the clean speech phase and amplitude. When $\mu = \beta = 1$, the proposed estimator behaves like a Wiener filter.

When $\kappa \to \infty$, the von Mises distribution becomes a delta function. In this case the estimated amplitudes are analogous to those of (Gerkmann and Krawczyk 2013), whereas the phase estimate is equal to the mean direction of the von Mises distribution, $\hat{\phi}_S = \tilde{\phi}_S$.

When $0 < \kappa < \infty$, the proposed method lies between the deterministic phase prior ($\kappa \to \infty$) and the uniformly distributed phase prior ($\kappa = 0$). It allows us to employ both the prior information on the speech spectral phase $\tilde{\phi}_S$ and the uncertainty about this prior phase information (which is controlled by $\kappa$). When $\kappa \to 0$, the estimator shows large uncertainty about the prior phase information $\tilde{\phi}_S$.

Fig. 4 PESQ improvement versus global SNR (0 to 15 dB) for the proposed CUP estimator, CUP estimator, phase insensitive, phase sensitive and phase sensitive × $\exp(j\tilde{\Phi}_S)$ methods (speech spectral coefficients as Nakagami distribution and noise spectral coefficients as Gamma pdf). a Modulated pink Gaussian noise, b Pink noise, c Babble noise, d Non-stationary factory noise

6 Experimental results and discussion

Speech samples from the TIMIT database, sampled at 16 kHz and uttered by 5 male and 4 female speakers, are corrupted by babble noise and non-stationary factory noise at various SNRs (0, 5, 10, 15 dB). Phase reconstruction (Krawczyk and Gerkmann 2012) is used to obtain $\tilde{\phi}_S$ in (32), (36) and (39). The fundamental frequency is estimated by PEFAC (Gonzalez and Brookes 2014), which also gives the voiced-segment probability $P_{HV}(i)$. The a posteriori speech presence probability is used to estimate the noise power spectral density $\sigma_N^2$ (Gerkmann and Hendriks 2012). The a priori SNR is estimated using the decision-directed approach (Ephraim and Malah 1984) with smoothing constant $\alpha_{DD} = 0.98$. The parameter $\kappa$ is adjusted such that it reflects the certainty of $\tilde{\phi}_S$. In the proposed algorithm, lower values of $\kappa$ are used at high frequencies:

$$ \kappa(k,i) = \begin{cases} 4 P_{HV}(i), & k f_s/N < 4000\ \text{Hz} \\ 2 P_{HV}(i), & k f_s/N \ge 4000\ \text{Hz} \end{cases} \tag{41} $$

where $N$ is the STFT length and $f_s$ is the sampling frequency. The proposed CUP estimators (32), (36) and (39) are compared with four estimators denoted "CUP estimator (Gerkmann 2014)", "phase insensitive (Breithaupt et al. 2008b)", "phase sensitive (Krawczyk et al. 2013)" and "phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$". In all of these algorithms, $\mu = \beta = 1$. In the "phase insensitive (Breithaupt et al. 2008b)" estimator, noise and speech coefficients are assumed to be Gaussian and chi distributed, respectively. In contrast to the proposed CUP estimator, the speech phase is assumed to be uniformly distributed. In the phase insensitive method, the amplitude is modified in the frequency domain but the phase is not.

In the phase sensitive (Krawczyk et al. 2013) method, a phase reconstruction algorithm is employed to obtain a phase estimate, as in the proposed CUP estimator. In contrast to the proposed CUP estimator, the prior phase information $\tilde{\phi}_S$ is treated as deterministic. In the phase sensitive method, the noisy phase is not modified and remains unchanged.

In the phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$ method, the prior clean speech phase information is used for signal reconstruction whenever PEFAC signals voiced speech, $\hat{S} = \hat{A} \exp(j\tilde{\Phi}_S)$. This method is similar to the proposed method for $\kappa \to \infty$.

Fig. 5 STOI improvement versus global SNR (0 to 15 dB) for the proposed CUP estimator, CUP estimator, phase insensitive, phase sensitive and phase sensitive × $\exp(j\tilde{\Phi}_S)$ methods (speech spectral coefficients as Nakagami distribution and noise spectral coefficients as exponential pdf). a Pink noise, b Babble noise, c Non-stationary factory noise, d Modulated pink Gaussian noise
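The frequency-dependent choice of the concentration parameter in (41) can be sketched as follows. The function name `concentration`, the one-sided bin layout and the example values are assumptions for illustration; the voicing probabilities would come from a pitch estimator such as PEFAC, which is not reproduced here.

```python
import numpy as np

def concentration(p_voiced, n_fft, fs):
    """kappa(k, i) following Eq. (41): 4*P_HV(i) below 4 kHz and 2*P_HV(i) at or above 4 kHz."""
    p_voiced = np.asarray(p_voiced, dtype=float)      # voicing probability per frame, shape (frames,)
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft     # frequency of bin k in Hz (one-sided spectrum)
    scale = np.where(freqs < 4000.0, 4.0, 2.0)         # per-bin scaling, shape (bins,)
    return scale[:, None] * p_voiced[None, :]          # shape (bins, frames)

# Hypothetical voicing probabilities for three frames, 512-point STFT at 16 kHz.
kappa = concentration([0.0, 0.5, 1.0], n_fft=512, fs=16000)
```

Unvoiced frames ($P_{HV}(i) = 0$) give $\kappa = 0$, so the estimator falls back to the uniform phase prior in those frames.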

In the CUP estimator (Gerkmann 2014) method, the speech coefficients are assumed to be chi distributed and the noise coefficients Gaussian. For the Gamma noise prior, $k = 2$ is used. In the "phase insensitive (Breithaupt et al. 2008b)" estimator, the clean speech amplitude is modified and the phase is not considered in the frequency domain. Likewise, in the "phase sensitive (Krawczyk et al. 2013)" method the noisy phase is not modified and remains unchanged. In contrast to (Breithaupt et al. 2008b), the proposed method employs the prior phase estimate $\tilde{\phi}_S$ to estimate the speech amplitudes using the phase sensitive amplitude estimators (Gerkmann and Krawczyk 2013; Krawczyk et al. 2013). The "phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$" estimator is the same as the phase sensitive amplitude estimator (Krawczyk et al. 2013), except that the prior phase estimate $\tilde{\phi}_S$ is used instead of the noisy phase.

Fig. 6 STOI improvement versus global SNR (0 to 15 dB) for the proposed CUP estimator, CUP estimator, phase insensitive, phase sensitive and phase sensitive × $\exp(j\tilde{\Phi}_S)$ methods (speech spectral coefficients as Nakagami distribution and noise spectral coefficients as Laplace pdf). a Pink noise, b Babble noise, c Non-stationary factory noise, d Modulated pink Gaussian noise

The performance of the five algorithms, namely the proposed CUP estimators (32), (36) and (39), the CUP estimator (Gerkmann 2014), "phase sensitive (Krawczyk et al. 2013)", "phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$" and "phase insensitive (Breithaupt et al. 2008b)", is compared using the Perceptual Evaluation of Speech Quality (PESQ) measure (Loizou 2007) and the short-time objective intelligibility (STOI) measure (Taal et al. 2011). PESQ is computed over the entire speech signals, without distinguishing between voiced and unvoiced speech. From Fig. 1, it can be seen that the "phase sensitive (Krawczyk et al. 2013)" approach performs better than the "phase insensitive (Breithaupt et al. 2008b)" estimator. For the "phase sensitive (Krawczyk et al. 2013)" estimator, PESQ is large in the case of babble noise. The benefit of the phase sensitive estimator over the phase insensitive estimator is more pronounced in voiced speech, since voiced speech has most of its energy at low frequencies. The "phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$" approach improves on the phase sensitive (Krawczyk et al. 2013) approach at low signal-to-noise ratios (SNRs) by replacing the noisy phase with $\tilde{\phi}_S$. At high SNRs, however, the performance decreases because of errors in the phase estimate. Figures 2, 3 and 4 present the PESQ improvements of the MMSE estimators with the Nakagami speech prior for various input SNRs and noise types. For the MMSE estimators with the Nakagami speech prior and the exponential noise prior, the maximum PESQ improvements ranged from 0.43 to 0.79 (0 dB input SNR), 0.46–0.70 (5 dB input SNR), 0.45–0.64 (10 dB input SNR) and 0.45–0.57 (15 dB input SNR) across the babble, pink, modulated pink Gaussian and non-stationary factory noises.

Fig. 7 STOI improvement versus global SNR (0 to 15 dB) for the proposed CUP estimator, CUP estimator, phase insensitive, phase sensitive and phase sensitive × $\exp(j\tilde{\Phi}_S)$ methods (speech spectral coefficients as Nakagami distribution and noise spectral coefficients as Gamma pdf). a Modulated pink Gaussian noise, b Babble noise, c Non-stationary factory noise, d Pink noise

For the MMSE estimators with the Nakagami speech prior and the Laplace noise prior, the maximum PESQ improvements ranged from 0.45 to 0.83 (0 dB input SNR), 0.47–0.73 (5 dB input SNR), 0.48–0.66 (10 dB input SNR) and 0.40–0.60 (15 dB input SNR) across the babble, pink, modulated pink Gaussian and non-stationary factory noises. For the MMSE estimators with the Nakagami speech prior and the Gamma noise prior, the maximum PESQ improvements ranged from 0.46 to 0.84 (0 dB input SNR), 0.47–0.75 (5 dB input SNR), 0.49–0.68 (10 dB input SNR) and 0.43–0.59 (15 dB input SNR) across the babble, pink, modulated pink Gaussian and non-stationary factory noises. Comparing all three cases, the Nakagami speech prior with the Gamma noise pdf yields slightly better PESQ improvement than the other two cases (exponential and Laplace noise pdfs).

The Nakagami prior preserves speech spectral components, at the expense of a larger number of spurious spectral peaks. The Gamma prior suppresses weaker spectral components. In the noise-dominated regions of the spectrogram, the Nakagami prior results in smoother spectral peaks and hence the residual noise of the enhanced sentence is more uniform. For the proposed CUP estimators (32), (36) and (39), the STOI intelligibility improvements are shown in Figs. 5, 6 and 7. In the "phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$" approach, substituting $\tilde{\phi}_S$ for the noisy phase may result in unnatural artifacts. In the case of babble noise, to avoid negative STOI scores the speech PSD needs to be estimated using temporal cepstrum smoothing (Breithaupt et al. 2008a; Gerkmann and Martin 2009). For the "phase sensitive (Krawczyk et al. 2013) × $\exp(j\tilde{\Phi}_S)$" approach, STOI shows decreased intelligibility. Informal listening confirms that, if the uncertainty of the prior phase estimate is taken into account, undesired artifacts can be reduced by using the proposed CUP estimator.

7 Conclusion

Most frequency-domain single-channel speech enhancement algorithms do not modify the phase of the noisy signal. Recently, some speech enhancement approaches have also taken the phase into account (Paliwal et al. 2011). A phase estimate can be incorporated to improve the spectral amplitude estimate (Gerkmann and Krawczyk 2013), but substituting the phase estimate for the noisy phase may cause undesired artifacts in the enhanced signal (Krawczyk and Gerkmann 2012; Sturmel and Daudet 2011). In this paper, we proposed algorithms that employ the Gamma, Laplace, exponential and Nakagami priors. These priors were combined with the MMSE and MAP estimators. The shape parameter of the priors allows the listener to closely match the performance of algorithms that use the same estimator but different priors. When deriving a joint MMSE estimate of the clean speech amplitude and phase, the estimate of the clean speech phase is treated as uncertain prior knowledge. The phase estimated by the proposed method yields satisfactory values between the prior phase estimate and the phase of the noisy signal. The estimated amplitude lies between the outputs of the estimator with uniformly distributed phase and the phase-sensitive amplitude estimator (Gerkmann and Krawczyk 2013). The proposed joint MMSE estimator of clean speech amplitude and phase has shown an improvement in speech quality and informal listening as well as reduced artifacts in the enhanced signal.

Appendix: MMSE CUP estimator (speech as Nakagami pdf and noise as exponential pdf)

$$
\hat{S}^{(\beta)} = E\left(A^{\beta} e^{j\Phi_S} \,\middle|\, y, \tilde{\phi}_S\right)
= \int_0^{\infty}\!\int_0^{2\pi} a^{\beta} e^{j\phi_S}\, p_{A,\Phi_S \mid y,\tilde{\phi}_S}\left(a, \phi_S \,\middle|\, y, \tilde{\phi}_S\right) d\phi_S\, da
\tag{42}
$$
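The expectation in (42) can be approximated numerically as a ratio of two weighted sums over an amplitude and phase grid. The sketch below is only illustrative: it assumes a Nakagami amplitude prior and a von Mises phase prior as adopted in this appendix, but it swaps in a complex Gaussian likelihood (not the exponential noise model of the appendix) so that the weights are real and nonnegative. The function names, grid sizes and default parameter values are hypothetical.

```python
import cmath
import math


def nakagami_pdf(a, mu, sigma_s2):
    """Nakagami amplitude prior with shape mu and spread sigma_s2."""
    coeff = 2.0 / math.gamma(mu) * (mu / sigma_s2) ** mu
    return coeff * a ** (2.0 * mu - 1.0) * math.exp(-mu * a * a / sigma_s2)


def von_mises_kernel(phi, phi_prior, kappa):
    """Unnormalised von Mises weight; its constant cancels in the ratio."""
    return math.exp(kappa * math.cos(phi - phi_prior))


def gaussian_likelihood(y, a, phi, sigma_n2):
    """ASSUMPTION: complex Gaussian noise, a stand-in for the paper's model."""
    return math.exp(-abs(y - a * cmath.exp(1j * phi)) ** 2 / sigma_n2)


def cup_estimate(y, phi_prior, beta=1.0, mu=1.0, sigma_s2=1.0,
                 sigma_n2=1.0, kappa=2.0, n_a=200, n_phi=180, a_max=6.0):
    """Midpoint-rule approximation of E[A^beta exp(j Phi_S) | y, phi_prior]."""
    da = a_max / n_a
    dphi = 2.0 * math.pi / n_phi
    num = 0.0 + 0.0j
    den = 0.0
    for i in range(n_a):
        a = (i + 0.5) * da
        p_a = nakagami_pdf(a, mu, sigma_s2)
        for k in range(n_phi):
            phi = (k + 0.5) * dphi
            w = (p_a * von_mises_kernel(phi, phi_prior, kappa)
                 * gaussian_likelihood(y, a, phi, sigma_n2))
            num += (a ** beta) * cmath.exp(1j * phi) * w
            den += w
    # The cell area da * dphi is common to both sums and cancels.
    return num / den


if __name__ == "__main__":
    est = cup_estimate(y=1.0 + 0.0j, phi_prior=1.0)
    print("estimated phase (rad):", cmath.phase(est))
```

With a noisy phase of 0 rad and a prior phase of 1 rad, the phase of the estimate falls between the two, which mirrors the compromise behaviour described for the CUP estimator. Setting `kappa=0.0` removes the prior information and the estimated phase returns to the noisy phase.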

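$$
p_{A,\Phi_S \mid y,\tilde{\phi}_S}\left(a, \phi_S \,\middle|\, y, \tilde{\phi}_S\right)
= \frac{p_{Y,A,\Phi_S,\tilde{\Phi}_S}\left(y, a, \phi_S, \tilde{\phi}_S\right)}{p_{Y,\tilde{\Phi}_S}\left(y, \tilde{\phi}_S\right)}
= \frac{p_{Y \mid s,\tilde{\phi}_S}\left(y \mid a, \phi_S, \tilde{\phi}_S\right) p_A(a)\, p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right)}{\iint p_{Y \mid s,\tilde{\phi}_S}\left(y \mid a, \phi_S, \tilde{\phi}_S\right) p_A(a)\, p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S}
\tag{43}
$$

Assuming that the speech coefficients follow a Nakagami distribution,

$$
p_A(a) = \frac{2}{\Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2\right)
\tag{44}
$$

Assuming that the noise coefficients follow an exponential distribution,

$$
p_{Y \mid S}(y \mid s) = \frac{1}{\sigma_N} \exp\left(\frac{y - s}{\sigma_N}\right)
\tag{45}
$$

The phase prior is a von Mises distribution with concentration $\kappa$,

$$
p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) = \frac{\exp\left(\kappa \cos\left(\phi_S - \tilde{\phi}_S\right)\right)}{2\pi I_0(\kappa)}
\tag{46}
$$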
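As a quick sanity check on the two priors, the following sketch evaluates a Nakagami amplitude density and a von Mises phase density and integrates them numerically. It uses only the Python standard library; the helper names (`bessel_i0`, `midpoint`) and the parameter values are illustrative assumptions, and the Bessel function is computed from its power series rather than a library routine.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I0 from its power series sum_k (k^2/4)^k/(k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def nakagami_pdf(a, mu, sigma_s2):
    """Nakagami density with shape mu and second moment sigma_s2."""
    coeff = 2.0 / math.gamma(mu) * (mu / sigma_s2) ** mu
    return coeff * a ** (2.0 * mu - 1.0) * math.exp(-mu * a * a / sigma_s2)


def von_mises_pdf(phi, phi_prior, kappa):
    """von Mises density centred on phi_prior with concentration kappa."""
    return math.exp(kappa * math.cos(phi - phi_prior)) / (
        2.0 * math.pi * bessel_i0(kappa))


def midpoint(f, lo, hi, n):
    """Composite midpoint rule on [lo, hi] with n cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    mu, sigma_s2 = 1.5, 2.0  # hypothetical values, mu >= 0.5 avoids a singularity at 0
    print(midpoint(lambda a: nakagami_pdf(a, mu, sigma_s2), 0.0, 10.0, 20000))
    print(midpoint(lambda a: a * a * nakagami_pdf(a, mu, sigma_s2), 0.0, 10.0, 20000))
    print(midpoint(lambda p: von_mises_pdf(p, 0.7, 3.0), 0.0, 2.0 * math.pi, 2000))
```

Both densities should integrate to approximately one, and the second moment of the Nakagami density should come out close to `sigma_s2`. A larger `kappa` concentrates the phase prior around `phi_prior`, while `kappa = 0` gives a uniform phase.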

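To determine the posterior, substitute (44), (45) and (46) in (43). With $s = a e^{j\phi_S}$,

$$
\begin{aligned}
\hat{S}^{(\beta)} &= E\left(A^{\beta} e^{j\Phi_S} \,\middle|\, y, \tilde{\phi}_S\right) \\
&= \frac{\int_0^{2\pi}\!\int_0^{\infty} a^{\beta} e^{j\phi_S}\, \frac{2}{\Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2\right) \frac{1}{\sigma_N} \exp\left(\frac{y-s}{\sigma_N}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S}{\int_0^{2\pi}\!\int_0^{\infty} \frac{2}{\Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2\right) \frac{1}{\sigma_N} \exp\left(\frac{y-s}{\sigma_N}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S} \\
&= \frac{\frac{2}{\sigma_N \Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} \exp\left(\frac{y}{\sigma_N}\right) \int_0^{2\pi}\!\int_0^{\infty} e^{j\phi_S} a^{\beta+2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2 - \frac{e^{j\phi_S}}{\sigma_N} a\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S}{\frac{2}{\sigma_N \Gamma(\mu)} \left(\frac{\mu}{\sigma_S^2}\right)^{\mu} \exp\left(\frac{y}{\sigma_N}\right) \int_0^{2\pi}\!\int_0^{\infty} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2 - \frac{e^{j\phi_S}}{\sigma_N} a\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S} \\
&= \frac{\int_0^{2\pi}\!\int_0^{\infty} e^{j\phi_S} a^{\beta+2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2 - \frac{e^{j\phi_S}}{\sigma_N} a\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S}{\int_0^{2\pi}\!\int_0^{\infty} a^{2\mu-1} \exp\left(-\frac{\mu}{\sigma_S^2} a^2 - \frac{e^{j\phi_S}}{\sigma_N} a\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) da\, d\phi_S}
\end{aligned}
\tag{47}
$$

From Gradshteyn and Ryzhik (2007), Eq. 3.462.1,

$$
\int_0^{\infty} x^{\nu-1} e^{-\beta x^2 - \gamma x}\, dx = (2\beta)^{-\nu/2}\, \Gamma(\nu) \exp\left(\frac{\gamma^2}{8\beta}\right) D_{-\nu}\left(\frac{\gamma}{2\sqrt{\beta}}\right)
\tag{48}
$$

where $D_{\nu}(\cdot)$ is the parabolic cylinder function. Comparing (47) with (48), the coefficient of $x^2$ in (48) is $\mu/\sigma_S^2$ and $\gamma = e^{j\phi_S}/\sigma_N$ in both the numerator and the denominator, with $\nu = \beta + 2\mu$ in the numerator and $\nu = 2\mu$ in the denominator. Hence

$$
\hat{S}^{(\beta)} = \frac{\left(\frac{2\mu}{\sigma_S^2}\right)^{-\frac{\beta+2\mu}{2}} \Gamma(\beta+2\mu) \int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{1/\sigma_N^2}{8\mu/\sigma_S^2}\right) D_{-(\beta+2\mu)}\left(\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}{\left(\frac{2\mu}{\sigma_S^2}\right)^{-\frac{2\mu}{2}} \Gamma(2\mu) \int_0^{2\pi} \exp\left(\frac{1/\sigma_N^2}{8\mu/\sigma_S^2}\right) D_{-2\mu}\left(\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}
$$

Multiplying and dividing by $\sigma_N^2$ inside the leading factors,

$$
\hat{S}^{(\beta)} = \frac{\left(\frac{2\mu\, \sigma_N^2}{\sigma_S^2\, \sigma_N^2}\right)^{-\frac{\beta+2\mu}{2}} \Gamma(\beta+2\mu) \int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{1/\sigma_N^2}{8\mu/\sigma_S^2}\right) D_{-(\beta+2\mu)}\left(\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}{\left(\frac{2\mu\, \sigma_N^2}{\sigma_S^2\, \sigma_N^2}\right)^{-\frac{2\mu}{2}} \Gamma(2\mu) \int_0^{2\pi} \exp\left(\frac{1/\sigma_N^2}{8\mu/\sigma_S^2}\right) D_{-2\mu}\left(\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}
$$

$$
= \left(\sqrt{\frac{\xi \sigma_N^2}{2\mu}}\right)^{\beta} \frac{\Gamma(\beta+2\mu) \int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{1/\sigma_N^2}{8\mu/\sigma_S^2}\right) D_{-(\beta+2\mu)}\left(\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}{\Gamma(2\mu) \int_0^{2\pi} \exp\left(\frac{1/\sigma_N^2}{8\mu/\sigma_S^2}\right) D_{-2\mu}\left(\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}}\right) p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}
$$

where $\xi = \sigma_S^2/\sigma_N^2$ and $t = \sqrt{\xi/(4\mu)}\, \cos(\phi_S - \phi_Y)$, the scale factor following from

$$
\frac{1/\sigma_N}{2\sqrt{\mu/\sigma_S^2}} = \frac{\sqrt{\sigma_S^2/\sigma_N^2}}{\sqrt{4\mu}} = \sqrt{\frac{\xi}{4\mu}}
$$

The estimator is therefore

$$
\hat{S}^{(\beta)} = \left(\sqrt{\frac{\xi \sigma_N^2}{2\mu}}\right)^{\beta} \frac{\Gamma(\beta+2\mu)}{\Gamma(2\mu)} \cdot \frac{\int_0^{2\pi} e^{j\phi_S} \exp\left(\frac{t^2}{2}\right) D_{-(\beta+2\mu)}(t)\, p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}{\int_0^{2\pi} \exp\left(\frac{t^2}{2}\right) D_{-2\mu}(t)\, p_{\Phi_S \mid \tilde{\phi}_S}\left(\phi_S \mid \tilde{\phi}_S\right) d\phi_S}
$$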
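The algebra that collapses the leading factors of the derivation into a single gain term can be checked numerically. The sketch below compares the ratio of the two factors (2μ/σ_S²)^(−(β+2μ)/2) Γ(β+2μ) and (2μ/σ_S²)^(−μ) Γ(2μ) with the simplified form (√(ξσ_N²/(2μ)))^β Γ(β+2μ)/Γ(2μ), where ξ = σ_S²/σ_N², and evaluates t = √(ξ/(4μ)) cos(φ_S − φ_Y). Function names and test values are hypothetical, and the parabolic cylinder integrals themselves are not evaluated here.

```python
import math


def prefactor_before_simplification(beta, mu, sigma_s2, sigma_n2):
    """Ratio of the numerator and denominator leading factors.

    sigma_n2 is accepted only for a uniform signature; it cancels here.
    """
    base = 2.0 * mu / sigma_s2
    num = base ** (-(beta + 2.0 * mu) / 2.0) * math.gamma(beta + 2.0 * mu)
    den = base ** (-(2.0 * mu) / 2.0) * math.gamma(2.0 * mu)
    return num / den


def cup_gain_prefactor(beta, mu, xi, sigma_n2):
    """Simplified form sqrt(xi*sigma_n2/(2*mu))**beta * Gamma ratio."""
    scale = math.sqrt(xi * sigma_n2 / (2.0 * mu)) ** beta
    return scale * math.gamma(beta + 2.0 * mu) / math.gamma(2.0 * mu)


def phase_argument(xi, mu, phi_s, phi_y):
    """t = sqrt(xi / (4 mu)) * cos(phi_s - phi_y)."""
    return math.sqrt(xi / (4.0 * mu)) * math.cos(phi_s - phi_y)


if __name__ == "__main__":
    beta, mu, sigma_s2, sigma_n2 = 1.0, 1.5, 2.0, 0.5  # hypothetical values
    xi = sigma_s2 / sigma_n2
    print(prefactor_before_simplification(beta, mu, sigma_s2, sigma_n2))
    print(cup_gain_prefactor(beta, mu, xi, sigma_n2))
    print(phase_argument(xi, mu, 0.3, 0.3))
```

The two printed prefactors should agree to floating-point precision, since σ_S² = ξσ_N² makes them the same quantity written in two ways.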


References

Andrianakis, I., & White, P. R. (2006). MMSE speech spectral amplitude estimators with Chi and Gamma speech priors. In Proceedings of international conference on acoustic, speech and signal processing (ICASSP 2006), Toulouse, France, pp. 1068–1071. doi:10.1109/ICASSP.2006.1660842

Breithaupt, C., Gerkmann, T., & Martin, R. (2008a). A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing. In Proceedings of international conference on acoustic, speech and signal processing (ICASSP 2008), Las Vegas, NV, USA, pp. 4897–4900. doi:10.1109/ICASSP.2008.4518755

Breithaupt, C., Krawczyk, M., & Martin, R. (2008b). Parameterized MMSE spectral magnitude estimation for the enhancement of noisy speech. In Proceedings of international conference on acoustic, speech and signal processing (ICASSP 2008), Las Vegas, NV, USA, pp. 4037–4040. doi:10.1109/ICASSP.2008.4518540

Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Transaction on Acoustic, Speech, Signal Processing, 32(6), 1109–1121. doi:10.1109/TASSP.1984.1164453.

Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transaction on Acoustic, Speech, Signal Processing, 33(2), 443–445. doi:10.1109/TASSP.1985.1164550.

Erkelens, J. S., Hendriks, R. C., Heusdens, R., & Jensen, J. (2007). Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors. IEEE Transaction on Audio, Speech and Language Processing, 15(6), 1741–1752. doi:10.1109/TASL.2007.899233.

Evans, M., Hastings, N., & Peacock, B. (2000). von Mises distribution. In Statistical distributions (ch. 45, pp. 191–192), 4th ed. New York: Wiley.

Gerkmann, T. (2014). MMSE-optimal enhancement of complex speech coefficients with uncertain prior knowledge of the clean speech phase. In Proceedings of international conference on acoustic, speech and signal processing (ICASSP 2014), Florence, Italy, pp. 4478–4482. doi:10.1109/ICASSP.2014.6854449

Gerkmann, T., & Hendriks, R. C. (2012). Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. IEEE Transaction on Audio, Speech, Language Processing, 20(4), 1383–1393. doi:10.1109/TASL.2011.2180896.

Gerkmann, T., & Krawczyk, M. (2013). MMSE-optimal spectral amplitude estimation given the STFT-phase. IEEE Signal Processing Letters, 20(2), 129–132. doi:10.1109/LSP.2012.2233470.

Gerkmann, T., & Martin, R. (2009). On the statistics of spectral amplitudes after variance reduction by temporal cepstrum smoothing and cepstral nulling. IEEE Transaction on Signal Processing, 57(11), 4165–4174. doi:10.1109/TSP.2009.2025795.

Gonzalez, S., & Brookes, M. (2014). PEFAC - a pitch estimation algorithm robust to high levels of noise. IEEE Transaction on Audio, Speech, Language Processing, 22(2), 518–530. doi:10.1109/TASLP.2013.2295918.

Gradshteyn, I. S., & Ryzhik, I. M. (2007). Table of integrals series and products (7th ed.). San Diego, CA: Academic.

Griffin, D., & Lim, J. S. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transaction on Acoustics, Speech, and Signal Processing, 32(2), 236–243. doi:10.1109/TASSP.1984.1164317.

Hendriks, R. C., Gerkmann, T., & Jensen, J. (2013). DFT-domain based single-microphone noise reduction for speech enhancement: A survey of the state of the art. Synthesis Lectures on Speech and Audio Processing, 9(1), 1–80. doi:10.2200/S00473ED1V01Y201301SAP011.

Krawczyk, M., & Gerkmann, T. (2012). STFT phase improvement for single channel speech enhancement. In Acoustic signal enhancement; proceedings of IWAENC 2012; international workshop O. VDE, Aachen, Germany, pp. 1–4. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6309424.

Krawczyk, M., & Gerkmann, T. (2014). STFT phase reconstruction in voiced speech for an improved single channel speech enhancement. IEEE Transaction on Audio, Speech and Language Processing, 22(12), 1931–1940. doi:10.1109/TASLP.2014.2354236.

Krawczyk, M., Rehr, R., & Gerkmann, T. (2013). Phase-sensitive real-time capable speech enhancement under voiced-unvoiced uncertainty. In Proceeding of Eur. signal processing conference (EUSIPCO 2013), Morocco, pp. 1–5. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6811648&url=http%3A%2F%2Fieeexplore.ieee.org%2Fstamp%2Fstamp.jsp%3Ftp%3D%26arnumber%3D6811648.

Le Roux, J., & Vincent, E. (2013). Consistent Wiener filtering for audio source separation. IEEE Signal Processing Letter, 20(3), 217–220. doi:10.1109/LSP.2012.2225617.

Loizou, P. C. (2007). Speech enhancement-theory and practice. Boca Raton, FL: CRC Press, Taylor & Francis Group.

Lotter, T., & Vary, P. (2005). Speech enhancement by MAP spectral amplitude estimation using a super-gaussian speech model. Eurasip Journal on Applied Signal Processing, 7, 1110–1126. http://www.ind.rwth-aachen.de/fileadmin/publications/lotter05a.pdf

Mardia, K. V., & Jupp, P. E. (2000). Directional statistics. Chichester: Wiley.

Mowlaee, P., & Saeidi, R. (2013). Iterative closed-loop phase aware single-channel speech enhancement. IEEE Signal Processing Letter, 20(12), 1235–1239. doi:10.1109/LSP.2013.2286748.

Paliwal, K., Wójcicki, K., & Shannon, B. (2011). The importance of phase in speech enhancement. Speech Communication, 53(4), 465–494. doi:10.1016/j.specom.2010.12.003.

Ryzhik, I., & Gradshteyn, I. S. (2007). Table of integrals series and products (7th ed.). CA: Academic Press.

Sturmel, N., & Daudet, L. (2011). Signal reconstruction from STFT magnitude: A state of the art. In International conference on digital audio effects (DAFx), Paris, France, pp. 375–386. http://recherche.ircam.fr/pub/dafx11/Papers/27_e.pdf

Sturmel, N., & Daudet, L. (2012). Iterative phase reconstruction of Wiener filtered signals. In Proceedings of international conference on acoustic, speech and signal processing (ICASSP 2012), Kyoto, Japan, pp. 101–104. doi:10.1109/ICASSP.2012.6287827

Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2011). An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Transaction on Audio, Speech, Language Processing, 19(7), 2125–2136. doi:10.1109/TASL.2011.2114881.

Vary, P., & Eurasip, M. (1985). Noise suppression by spectral magnitude estimation - mechanism and theoretical limits. Signal Processing, 8(4), 387–400. doi:10.1016/0165-1684(85)90002-7.

Wang, D., & Lim, J. (1982). The unimportance of phase in speech enhancement. IEEE Transaction on Acoustics, Speech and Signal Processing, 30(4), 679–681. doi:10.1109/TASSP.1982.1163920.

You, C. H., Koh, S. N., & Rahardja, S. (2005). β-order MMSE spectral amplitude estimation for speech enhancement. IEEE Transaction on Speech Audio Processing, 13(4), 475–486. doi:10.1109/TSA.2005.848883.
