A robust Iterative Inverse Filtering approach for Speech Dereverberation in presence of Disturbances

Rudy Rotili, Simone Cifani, Emanuele Principi, Stefano Squartini, Francesco Piazza

3MediaLabs - DEIT - Università Politecnica delle Marche
Via Brecce Bianche 31, 60131 Ancona, Italy
Email: {r.rotili, s.cifani, e.principi, s.squartini, f.piazza}@univpm.it

Abstract: This work addresses the inverse filtering problem for speech dereverberation in stationary conditions. In particular, we consider the presence of multiple observables, which has a beneficial impact on the invertibility of room transfer functions (RTFs). In actual acoustic environments the assumed knowledge of the RTFs is usually altered by disturbances in the form of additive noise or RTF fluctuations, inevitably resulting in reduced inverse filtering performance. Several approaches, mainly based on regularization theory, have appeared in the literature to face this problem. Among them, a recent study has shown that the dereverberation capability depends on some design parameters that are closely related to the filter energy. In this paper that work is taken as reference, and its optimum inverse filtering approach is replaced with an iterative technique, which is typically much more computationally efficient. As shown by the results of several computer simulations, the resulting algorithm proves more robust than the reference counterpart to variations of the regularization parameter.

Fig. 1: Multichannel dereverberation problem.

I. INTRODUCTION

In speech communication applications, the quality of the original source can be degraded in many ways. One of the most relevant causes of deterioration is acoustic reverberation. This effect is typically modeled as a linear convolution between the speech source and the finite impulse responses (FIRs) related to the room transfer functions (RTFs). Assuming that the RTFs are known (or already accurately estimated) and time-invariant, reverberation can be removed by calculating the inverses of these impulse responses and suitably applying them to the observed signals, as shown in Fig. 1. Several inverse filtering methods can be found in the literature, and high performance is achievable under the assumptions made above. In particular, the multiple-input/output inverse theorem (MINT) proves that multiple observations make it possible to obtain exact RTF inverses even in the presence of real nonminimum-phase responses [1].

However, acoustic responses are not actually known exactly, due to the presence of disturbances, and the inverse filtering performance in terms of final speech dereverberation can be severely affected by them. Such disturbances are typically of two kinds: the additive noise present at the microphones during acquisition, and the RTF fluctuations caused by factors such as changes in source position or temperature. For instance, if the RTF fluctuates, the inverse filtering process may work incorrectly and the output signal will suffer from distortion. This sensitivity has been addressed in many scientific contributions, and different solutions [2], [3], [4] have appeared in the literature for this purpose. Most of them are based on regularization theory [5], due to the ill-posedness of the problem to be solved.

In particular, we refer here to a recently investigated inverse filter design method, based on MINT, that is less sensitive to noise and RTF fluctuations [6]. This method performs a direct pseudo-inverse operation to obtain the optimal solution in the least-squares (LS) sense and proposes a strategy to minimize the inverse filter energy as a function of the filter design parameters (regularization, modeling delay and filter length), with the aim of reducing the sensitivity of dereverberation to disturbances. The adopted algorithm is computationally demanding and does not allow the massive simulations usually required in this kind of application to be performed rapidly. This led the authors to propose an iterative algorithm as a substitute for the LS optimum inverse filtering approach used in [6]. To this end, the technique considered in [7] for the multichannel sound reproduction problem (in the absence of disturbances) has been taken up, extended to our case of interest, and applied to the same experimental scenarios as in [6]. Computer simulations have shown the effectiveness of the idea, and also that the new approach is much more robust than the conventional one to variations of the regularization parameter.

The outline of the paper is as follows: Section II presents the inverse filtering problem in the presence of disturbances for the acoustic system under study, whereas Section III discusses the proposed iterative algorithm; Section IV describes the simulation scenarios, with RTF fluctuations or additive noise, and analyzes the results obtained; Section V concludes the paper.

This work was supported by the European Commission as sponsor of the hArtes Project, Number 035143

II. INVERSE FILTERING WITH DISTURBANCES

In this section the acoustic system model and the regularized LS optimum filter in the presence of disturbances are described, using the same notation adopted in [6].

A. The acoustic system model

The acoustic scenario under study is made of a single speech source and multiple microphones, where the signals $x_1, \ldots, x_P$ are the reverberant signals and $P$ is the number of microphones. The overall system can be mathematically modeled as

$$x_i(n) = h_i(n) * s(n) + w_i(n) = \sum_{k=0}^{J} h_i(k)\, s(n-k) + w_i(n), \qquad i = 1, \ldots, P, \tag{1}$$

where $h_i(n)$ represents the $i$th impulse response (IR) between the source and the $i$th microphone, $J$ is the order of the IR (which therefore has $J+1$ taps) and $w_i(n)$ is the $i$th channel noise. We also assume that the RTFs are coprime, that is, they do not have any common zero in the z-plane [1]. Equation (1) can be rewritten in matrix form as

$$\mathbf{x}(n) = \mathbf{H}^T \mathbf{s}(n) + \mathbf{w}(n) \tag{2}$$

with

$$\mathbf{x}(n) = \left[\mathbf{x}_1^T(n), \mathbf{x}_2^T(n), \ldots, \mathbf{x}_P^T(n)\right]^T,$$

$$\mathbf{w}(n) = \left[\mathbf{w}_1^T(n), \mathbf{w}_2^T(n), \ldots, \mathbf{w}_P^T(n)\right]^T,$$

$$\mathbf{H} = \left[\mathbf{H}_1, \ldots, \mathbf{H}_P\right],$$

$$\mathbf{s}(n) = \left[s(n), s(n-1), \ldots, s(n-J-M+1)\right]^T.$$

Each component of the vectors above is

$$\mathbf{x}_i(n) = \left[x_i(n), x_i(n-1), \ldots, x_i(n-M+1)\right]^T,$$

$$\mathbf{w}_i(n) = \left[w_i(n), w_i(n-1), \ldots, w_i(n-M+1)\right]^T,$$

and $\mathbf{H}_i$ is the $(J+M) \times M$ convolution matrix

$$\mathbf{H}_i = \begin{bmatrix}
h_i(0) & 0 & \cdots & 0 \\
h_i(1) & h_i(0) & \ddots & \vdots \\
\vdots & h_i(1) & \ddots & 0 \\
h_i(J) & \vdots & \ddots & h_i(0) \\
0 & h_i(J) & & h_i(1) \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_i(J)
\end{bmatrix},$$

where $i = 1, \ldots, P$ and $M$ is the inverse filter length for each channel.

B. Regularized LS Optimum Inverse Filtering

The inverse filter calculation is usually performed through LS minimization. Since we consider the presence of disturbances, the classical LS cost function is modified as follows:

$$C = \|\mathbf{H}\mathbf{g} - \mathbf{v}\|^2 + \delta \|\mathbf{g}\|^2, \tag{3}$$

where

$$\mathbf{g} = \left[g_1(1), \ldots, g_1(M), \ldots, g_P(1), \ldots, g_P(M)\right]^T, \tag{4}$$

$$\mathbf{v} = \left[0, \ldots, 0, 1, 0, \ldots, 0\right]^T, \tag{5}$$

of size $PM \times 1$ and $(J+M) \times 1$, are the inverse filter vector and the target vector, respectively. In our case $\mathbf{v}$ is the Kronecker delta shifted by an appropriate modeling delay $d$ ($0 \le d \le PM$). The parameter $\delta$ ($\ge 0$), called the regularization parameter, is a scalar coefficient representing the weight assigned to the disturbance term. It should be noticed that (3) has the same form as Tikhonov regularization for ill-posed problems [8].

An LS optimal solution for the problem presented in the previous section has been derived in [6]. For the fluctuation case the RTF is denoted as $\bar{\mathbf{H}} + \tilde{\mathbf{H}}$: in other words, the RTF is given by the sum of two terms, the mean RTF and the fluctuation from the mean RTF, respectively. Thus, the cost function in (3) must be modified as follows:

$$\begin{aligned}
C &= E\left\langle \left\|(\bar{\mathbf{H}} + \tilde{\mathbf{H}})\mathbf{g} - \mathbf{v}\right\|^2 \right\rangle \\
&= (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})^T(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v}) + E\left\langle (\bar{\mathbf{H}}\mathbf{g} - \mathbf{v})^T \tilde{\mathbf{H}}\mathbf{g} + (\tilde{\mathbf{H}}\mathbf{g})^T(\bar{\mathbf{H}}\mathbf{g} - \mathbf{v}) + \mathbf{g}^T\tilde{\mathbf{H}}^T\tilde{\mathbf{H}}\mathbf{g} \right\rangle.
\end{aligned}$$

If we assume that the expectation value of $\tilde{\mathbf{H}}$ is zero, the cost function becomes

$$C = \mathbf{g}^T\bar{\mathbf{H}}^T\bar{\mathbf{H}}\mathbf{g} - \mathbf{g}^T\bar{\mathbf{H}}^T\mathbf{v} - \mathbf{v}^T\bar{\mathbf{H}}\mathbf{g} + \mathbf{v}^T\mathbf{v} + \mathbf{g}^T E\left\langle \tilde{\mathbf{H}}^T\tilde{\mathbf{H}} \right\rangle \mathbf{g}. \tag{6}$$

For the noise case we consider an additive white interfering process with small variance $\delta$ and correlation matrix $\delta\mathbf{I}$. Thus, (3) takes the following expression:

$$C = \mathbf{g}^T\mathbf{H}^T\mathbf{H}\mathbf{g} - \mathbf{g}^T\mathbf{H}^T\mathbf{v} - \mathbf{v}^T\mathbf{H}\mathbf{g} + \mathbf{v}^T\mathbf{v} + \delta\,\mathbf{g}^T\mathbf{g}. \tag{7}$$

Now, assuming the fluctuation can be modeled as a white process, we have $E\langle \tilde{\mathbf{H}}^T\tilde{\mathbf{H}} \rangle = \delta\mathbf{I}$, and a general cost function can be derived that embeds both the noise and the fluctuation case:

$$C = \mathbf{g}^T\mathbf{H}^T\mathbf{H}\mathbf{g} - \mathbf{g}^T\mathbf{H}^T\mathbf{v} - \mathbf{v}^T\mathbf{H}\mathbf{g} + \mathbf{v}^T\mathbf{v} + \delta\,\mathbf{g}^T\mathbf{g}, \tag{8}$$

where the matrix $\mathbf{H}$ appearing in (8) stands for

$$\mathbf{H} = \begin{cases} \mathbf{H} & \text{(noise case)} \\ \bar{\mathbf{H}} & \text{(fluctuation case).} \end{cases} \tag{9}$$

The filter that minimizes the cost function in (8) is obtained by taking derivatives with respect to $\mathbf{g}$ and setting them equal to zero. The required solution is

$$\mathbf{g}_{opt} = \left(\mathbf{H}^T\mathbf{H} + \delta\mathbf{I}\right)^{-1}\mathbf{H}^T\mathbf{v}. \tag{10}$$

III. THE PROPOSED ITERATIVE INVERSE FILTERING ALGORITHM

As mentioned above, we present here an alternative to the direct pseudo-inversion analyzed in the previous section. An iterative algorithm presented in [7] serves as the starting point. In particular we consider the steepest-descent (SD) algorithm, whose recursive estimator has the form

$$\mathbf{g}(m+1) = \mathbf{g}(m) - \frac{\mu(m)}{2}\nabla C. \tag{11}$$

Our objective is to extend the SD formulation proposed in [7] to the regularized case, in order to apply the iterative approach to our inverse filtering problem with disturbances. Starting from (8), simple algebraic calculations give

$$\nabla C = -2\left[\mathbf{H}^T(\mathbf{v} - \mathbf{H}\mathbf{g}(m)) - \delta\,\mathbf{g}(m)\right]. \tag{12}$$

Substituting (12) into (11) we have

$$\mathbf{g}(m+1) = \mathbf{g}(m) + \mu(m)\left[\mathbf{H}^T(\mathbf{v} - \mathbf{H}\mathbf{g}(m)) - \delta\,\mathbf{g}(m)\right], \tag{13}$$

where $\mu(m)$ is the stepsize. Notice that, in contrast to what happens in the non-adaptive case, no matrix inversion occurs in (13), resulting in a considerable reduction of the required computational load. The convergence of the algorithm to the optimal LS solution is guaranteed if the usual conditions on the stepsize, expressed in terms of the eigenvalues of the autocorrelation matrix $\mathbf{H}^T\mathbf{H}$, hold. However, reaching the optimum can be slow if a fixed stepsize value is chosen. The convergence speed of the algorithm can be increased following the approach in [7], where the stepsize is chosen so as to minimize the cost function at the next iteration. The analytical expression obtained for the stepsize is the following [9]:

$$\mu(m) = \frac{\mathbf{e}^T(m)\,\mathbf{e}(m)}{\mathbf{e}^T(m)\left(\mathbf{H}^T\mathbf{H} + \delta\mathbf{I}\right)\mathbf{e}(m)}. \tag{14}$$
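The direct solution (10) and the iterative update (13) with the stepsize (14) can be compared on a toy problem. The following is a minimal time-domain NumPy sketch under stated assumptions: the random, exponentially decaying impulse responses, the small sizes (J = 8, P = 3, M = 4), the values of δ and d, and the helper names `conv_matrix` and `steepest_descent` are all illustrative and not taken from the paper, whose implementation is in Matlab and works in the frequency domain with FFTs. In the code, `e` is the bracketed term of (13), that is, the negative half-gradient of the cost function.

```python
import numpy as np

def conv_matrix(h, M):
    """Build the (J+M) x M convolution matrix H_i of an IR h with J+1 taps."""
    h = np.asarray(h, dtype=float)
    Hi = np.zeros((len(h) + M - 1, M))
    for col in range(M):
        Hi[col:col + len(h), col] = h  # each column is h shifted down by `col`
    return Hi

def steepest_descent(H, v, delta, tol=1e-10, max_iter=100_000):
    """Iterate (13) with the stepsize (14) until ||e(m)|| < tol."""
    g = np.zeros(H.shape[1])
    for m in range(max_iter):
        e = H.T @ (v - H @ g) - delta * g      # bracketed term of (13)
        ee = e @ e
        if ee < tol ** 2:                      # stop before dividing by ~0
            break
        He = H @ e
        mu = ee / (He @ He + delta * ee)       # (14): e'e / e'(H'H + delta I)e
        g = g + mu * e
    return g, m

# Illustrative setup (assumed values, not the paper's room setup).
rng = np.random.default_rng(0)
J, P, M = 8, 3, 4            # J + M == P * M, so H is square (M = J / (P - 1))
delta, d = 0.1, 3            # regularization parameter and modeling delay
h = rng.standard_normal((P, J + 1)) * np.exp(-0.5 * np.arange(J + 1))

H = np.hstack([conv_matrix(h[i], M) for i in range(P)])  # (J+M) x PM
v = np.zeros(J + M)
v[d] = 1.0                   # Kronecker delta shifted by d samples

g_opt = np.linalg.solve(H.T @ H + delta * np.eye(P * M), H.T @ v)  # (10)
g_iter, n_iter = steepest_descent(H, v, delta)

print("iterations used:", n_iter)
print("max |g_iter - g_opt|:", np.max(np.abs(g_iter - g_opt)))
```

The denominator of the stepsize is evaluated as ‖He‖² + δ‖e‖², so the matrix HᵀH is never formed inside the loop; only matrix-vector products are needed, which is what makes the iterative route cheaper than the inversion in (10) for long filters.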

Fig. 2: Room setup.

Fig. 3: Scheme of the source position changes.

Here, $\mathbf{e}(m) = \mathbf{H}^T[\mathbf{v} - \mathbf{H}\mathbf{g}(m)] - \delta\,\mathbf{g}(m)$. It is important to note that, for $\delta = 0$, (14) reduces to the solution obtained in [7], which confirms the correctness of the generalization. Finally, as in [7], the complexity of the algorithm has been reduced by computing the required operations in the frequency domain using FFTs.

IV. COMPUTER SIMULATIONS

In this section we evaluate the dereverberation performance of the proposed algorithm as a function of the design parameters that most significantly influence the filter energy. As shown in [6], these are the regularization parameter $\delta$, the modeling delay $d$ and the filter length $M$: choosing them suitably minimizes the filter energy and thereby improves the robustness of inverse filtering to disturbances. We consider the same room setup as in [6], shown in Fig. 2, for both the fluctuation and the noise case. The IRs have been obtained with the Roomsim toolbox [10], which implements the image method [11]. The sampling frequency is set to 8 kHz and the IRs are truncated to 3334 samples ($J = 3333$). The inverse filter set with minimum length is obtained by setting $M$ so that the matrix $\mathbf{H}$ is square, that is, $J + M = PM$, which leads to $M = M_{min} = J/(P-1)$. These parameters have been chosen to make our simulations easily reproducible on a common end-user PC (Intel Core2 Duo 2.2 GHz, 2 GB RAM). Notice that the bottleneck is clearly set by the non-adaptive solution. All simulations have been performed in Matlab. The signal-to-distortion ratio (SDR), defined as

$$SDR = 10\log_{10}\left(\frac{\sum_{n=0}^{N} s^2(n)}{\sum_{n=0}^{N}\left(s(n) - \hat{s}(n)\right)^2}\right), \tag{15}$$

has been chosen as the criterion for performance evaluation, where $s(n)$ is the original source signal and $\hat{s}(n) = \mathbf{x}(n)^T\mathbf{g}$ is the output signal of the inverse filter. To avoid dependence of the results on the type of source, white noise with a duration of 3 seconds is used as the source signal in all experiments.

A. Fluctuation Case

In this case we assume that the source position can change over 8 equally spaced positions on a circle of radius $r$, as shown in Fig. 3. We assume that the center of the circle is the reference position and that the source is equally likely to be found in each position. To evaluate the performance we proceed as follows: first, the reference RTF is estimated by averaging the RTFs over the 8 positions and is then used to calculate the inverse filter set according to (10). Second, the reverberant signals from each position are equalized by applying this inverse filter set, as shown in Fig. 1. Lastly, SDR values are calculated for all of the dereverberated signals and averaged over the 8 positions to obtain the overall performance measure.

In the first simulation, the influence of the delay for different values of the regularization parameter was investigated. The radius of the circle was kept constant at $r = 1$ cm and the filter length was set to $M = 1111$ (minimum case). Results are reported in Fig. 4(a), which shows that the SDR increases as the delay gets longer. The second simulation analyzes the influence of the fluctuation amplitude for various values of the regularization parameter. The circle radius was set to four different lengths ($r = 1, 2, 3$ and $4$ cm), while the delay and the filter length were fixed at $d = 500$ and $M = 1111$, respectively. The performance, shown in Fig. 4(b), confirms, as expected, that the SDR decreases with increasing radius. Finally, Fig. 4(c) shows the influence of the filter length when the modeling delay and the radius are fixed at $d = 500$ and $r = 1$ cm, respectively. It can be seen that improvements are most noticeable for smaller values of $\delta$ ($\le 10^{-4}$), although they remain below 2 dB. A final remark concerns the behaviour of the iterative algorithm with respect to its optimum non-adaptive counterpart: in all the aforementioned experiments the trend of the former is flatter than that of the latter as the independent variable $\delta$ varies. However, the variations with respect to the other parameters are preserved.

B. Noise Case

In this case we considered additive white noise with the same duration as the source signal. The input signal-to-noise ratio (SNR) for the $i$th microphone is defined as

$$SNR = 10\log_{10}\left(\frac{\sum_{n=0}^{N} y_i^2(n)}{\sum_{n=0}^{N} w_i^2(n)}\right), \tag{16}$$

where $y_i$ is the reverberant signal and $w_i$ is the noise.
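The two measures (15) and (16) are simple energy ratios and can be sketched in a few lines of NumPy. This is a sketch under assumptions: the function names, the three-tap toy impulse response and the noise-scaling helper are hypothetical and only illustrate one way of generating noise at a prescribed input SNR; the 3-second white-noise source at 8 kHz follows the setup described in the text.

```python
import numpy as np

def sdr_db(s, s_hat):
    """Signal-to-distortion ratio (15) in dB."""
    s, s_hat = np.asarray(s, float), np.asarray(s_hat, float)
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

def snr_db(y, w):
    """Input signal-to-noise ratio (16) in dB for one microphone."""
    y, w = np.asarray(y, float), np.asarray(w, float)
    return 10.0 * np.log10(np.sum(y ** 2) / np.sum(w ** 2))

def scale_noise(y, w, target_snr_db):
    """Rescale noise w so that snr_db(y, scaled w) equals target_snr_db."""
    gain = np.sqrt(np.sum(y ** 2) / (np.sum(w ** 2) * 10.0 ** (target_snr_db / 10.0)))
    return gain * w

rng = np.random.default_rng(0)
fs = 8000                                   # sampling frequency used in the paper
s = rng.standard_normal(3 * fs)             # 3 s of white noise as source signal
h = np.array([1.0, 0.5, 0.25])              # hypothetical toy IR, not a real RTF
y = np.convolve(s, h)[:len(s)]              # reverberant signal at one microphone
w = scale_noise(y, rng.standard_normal(len(y)), 20.0)  # noise at 20 dB input SNR

print("input SNR [dB]:", snr_db(y, w))
print("SDR of a 10% amplitude error [dB]:", sdr_db(s, 0.9 * s))
```

Because both measures depend only on energy ratios, rescaling the noise by a single gain is enough to hit a target SNR exactly, whatever the level of the reverberant signal.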
The first experiment, reported in Fig. 4(d), investigates the influence of the modeling delay $d$ for different values of the regularization parameter $\delta$, with the SNR and the filter length fixed at 20 dB and $M = 1111$, respectively. As in the fluctuation case, the performance improves as the delay increases. In the second experiment, the behaviour of the algorithm was evaluated for different values of SNR, with the delay fixed at $d = 500$ and the filter length at $M = 1111$. The results, shown in Fig. 4(e), reveal that increasing the SNR by 10 dB yields an SDR improvement of about 8 dB. In the third simulation, the SNR and the modeling delay were fixed at 20 dB and $d = 500$, respectively, and the influence of the filter length was investigated for different values of the regularization parameter $\delta$. From the results, reported in Fig. 4(f), it can be observed that the trend is very close to that of the fluctuation case. Lastly, it is worth noting that the proposed iterative solution has shown robustness to variations of the regularization parameter $\delta$.

(a) SDR as a function of regularization parameter and delay. Radius is fixed at r = 1 cm and filter length at M = 1111.
(b) SDR as a function of regularization parameter and radius. Delay is fixed at d = 500 and filter length at M = 1111.
(c) SDR as a function of regularization parameter and filter length. Delay is fixed at d = 500 and radius at r = 1 cm.
(d) SDR as a function of regularization parameter and delay. SNR is fixed at 20 dB and filter length at M = 1111.
(e) SDR as a function of regularization parameter and SNR. Delay is fixed at d = 500 and filter length at M = 1111.
(f) SDR as a function of regularization parameter and filter length. Delay is fixed at d = 500 and SNR at 20 dB.

Fig. 4: Simulation results for the adaptive (solid) and optimum (dotted) algorithms for the fluctuation (a, b, c) and noise (d, e, f) cases.

V. CONCLUSION

In this paper, an iterative inverse filtering technique for dereverberation in the presence of disturbances has been investigated. It represents a "regularized" extension of an existing solution already applied in the multichannel sound reproduction context [7]. The approach has been successfully applied to some simulated scenarios in which additive noise or RTF fluctuations negatively affect the inverse filtering performance. Computer-based experiments confirm that the employed algorithm achieves considerably improved robustness to variations of the regularization parameter with respect to the LS optimum inverse filtering method [1] taken here as reference, with a reduced computational load. This makes it suitable not only for massive simulations on common end-user PCs but also for online implementation of speech dereverberation schemes with long inverse filters at high sampling frequencies (> 8 kHz). Future work can apply the same approach to regularized versions of other efficient iterative inverse filtering algorithms, such as the one based on the Gauss-Newton paradigm proposed in [7]. Ongoing efforts are directed at studying more detailed disturbance models, possibly specific to certain acoustic environments, and at deriving suitable adaptive inverse filtering solutions that take them into account.

REFERENCES

[1] M. Miyoshi and Y. Kaneda, "Inverse filtering of room acoustics," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 2, pp. 145–152, Feb. 1988.
[2] O. Kirkeby, P. A. Nelson, H. Hamada, and F. Orduna-Bustamante, "Fast deconvolution of multichannel systems using regularization," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 189–194, Mar. 1998.
[3] J. Mourjopoulos, "On the variation and invertibility of room impulse response functions," Journal of Sound and Vibration, vol. 102, no. 2, pp. 217–228, 1985.
[4] H. Tokuno, O. Kirkeby, P. A. Nelson, and H. Hamada, "Inverse filter of sound reproduction systems using regularization," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E80-A, no. 5, pp. 809–820, 1997.
[5] A. N. Tikhonov and V. A. Arsenin, Solution of Ill-posed Problems, V. H. Winston, 1977.
[6] T. Hikichi, M. Delcroix, and M. Miyoshi, "Inverse filtering for speech dereverberation less sensitive to noise and room transfer function fluctuations," EURASIP J. Appl. Signal Process., vol. 2007, no. 1, pp. 62–62, 2007.
[7] M. Guillaume, Y. Grenier, and G. Richard, "Iterative algorithms for multichannel equalization in sound reproduction systems," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), vol. 3, pp. iii/269–iii/272, 18-23 March 2005.
[8] H. Egger and H. W. Engl, "Tikhonov regularization applied to the inverse problem of option pricing: convergence analysis and rates," Inverse Problems, vol. 21, no. 3, pp. 1027–1045, 2005.
[9] G.-O. Glentis, K. Berberidis, and S. Theodoridis, "Efficient least squares adaptive algorithms for FIR transversal filtering," IEEE Signal Processing Magazine, vol. 16, no. 4, pp. 13–41, Jul. 1999.
[10] D. R. Campbell, "Roomsim, a MATLAB simulation of shoebox room acoustics for use in teaching and research," [Online]: http://media.paisley.ac.uk/~campbell/Roomsim/.
[11] J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," The Journal of the Acoustical Society of America, vol. 65, no. 4, pp. 943–950, 1979.
