2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC)
NEAR-FIELD SOURCE EXTRACTION USING SPEECH PRESENCE PROBABILITIES FOR AD HOC MICROPHONE ARRAYS

Maja Taseska¹, Shmulik Markovich-Golan², Emanuël A. P. Habets¹, Sharon Gannot²

¹ International Audio Laboratories Erlangen∗, Am Wolfsmantel 33, 91058 Erlangen, Germany
² Bar-Ilan University, Faculty of Engineering, Ramat-Gan, 52900, Israel

∗ A joint institution of the Friedrich-Alexander-University Erlangen-Nürnberg (FAU) and Fraunhofer IIS, Germany.
ABSTRACT

Ad hoc wireless acoustic sensor networks (WASNs) hold great potential for improved performance in speech processing applications, thanks to better coverage and higher diversity of the received signals. We consider a multiple-speaker scenario where each of the WASN nodes, an autonomous system comprising sensing, processing and communication capabilities, is positioned in the near-field of one of the speakers. Each node aims at extracting its nearest speaker while suppressing the other speakers and the noise. The ad hoc network is characterized by an arbitrary number of speakers/nodes and an uncontrolled microphone constellation. In this paper we propose a distributed algorithm which shares information between nodes. The algorithm requires each node to transmit a single audio channel in addition to a soft time-frequency (TF) activity mask for its nearest speaker. The TF activity masks are computed by combining estimates of a model-based speech presence probability (SPP), the direct-to-reverberant ratio (DRR) and the direction of arrival (DOA) per TF bin. The proposed algorithm, although sub-optimal compared to the centralized solution, is superior to the single-node solution.
1. INTRODUCTION

Wireless acoustic sensor networks (WASNs) have drawn increasing interest from the research community as well as from industry in recent years. A network consists of multiple nodes, each comprising sensing, communication and processing capabilities. By collaborating, nodes gain access to more information, thus obtaining better coverage of the environment and benefiting from an improved direct-to-reverberant ratio (DRR), an improved signal-to-noise ratio (SNR) and a higher diversity of the received signals. The reader is referred to [1], [2] for a survey on signal processing in wireless sensor networks, and to [3] for a survey on speech processing in WASNs. Unlike classical microphone array processing applications, where a single array is configured to fit the problem at hand, in ad hoc scenarios the number of nodes, their positions and the number of microphones per node cannot be controlled or pre-designed. Consider as an example a meeting scenario with multiple arbitrarily positioned participants, each carrying a smartphone, laptop or tablet equipped with one or more microphones.

Considerable efforts have been dedicated to developing optimal distributed beamformers with respect to various criteria, e.g., the unconstrained minimum mean squared error (MMSE) criterion [4], as well as constrained solutions [5], [6], [7]. Although the latter reduce the required communication bandwidth compared to the centralized algorithm, which requires conveying all microphone signals to a fusion center, the required bandwidth might still be too high for some applications. The distributed adaptive node-specific signal estimation (DANSE) [4] and linearly constrained distributed adaptive node-specific signal estimation (LC-DANSE) [5] algorithms require the transmission of N² audio channels, where N is the number of nodes/speakers. Suboptimal approaches (e.g., [8]) aim at reducing the communication bandwidth at the cost of reduced performance.

Implementing the above-mentioned distributed beamformers requires the second-order statistics (SOS) of the signals from the different speakers received at the microphones, and the relative transfer functions (RTFs) of the speakers, which are usually estimated during single-talk time-segments identified by some oracle mechanism. Apart from the problem of obtaining such oracle information, this approach suffers from long convergence time, since single-talk time-segments might not be frequent enough. An alternative approach, based on the sparseness of speech signals, estimates the required SOS and RTFs at each frequency bin independently, allowing simultaneous updates for multiple concurrent speakers. Estimating the SOS and RTFs requires associating each time-frequency (TF) bin with its dominant speaker. To determine the latter, spatial cues such as the direction of arrival (DOA), observation vectors, and position estimates extracted using multiple microphones can be used [9, 10, 11]. Souden et al. [12] present an approach for clustering simultaneous speech signals in a distributed WASN using source location cues.

In this contribution, we consider a special case of speaker extraction in a WASN, where each node is positioned in the near-field of a different speaker. We propose a distributed beamforming algorithm which aims at extracting the nearest speaker at each node, while suppressing the remaining speakers and the background noise. In addition, we provide a method to estimate the SOS and RTFs required for the local and the global beamforming stages of the proposed distributed beamformer. Each node transmits the output of its local beamformer and a local soft TF mask related to the activity of the source nearest to the node. The local soft activity masks are obtained using a local SPP, and bin-wise DOA and DRR estimates. At each node, all local TF masks are used to compute global TF masks that determine the dominant source at each TF bin and control the update of the corresponding SOS and RTFs accordingly. Although sub-optimal with respect to a centralized solution, the proposed algorithm requires a lower communication bandwidth and is more energy efficient for practical applications. Only two signals, namely the output of the local beamformer and the local soft activity mask, need to be transmitted per TF bin for each node.
2. PROBLEM FORMULATION

Let $s_n(\ell,k)$ for $n = 1, 2, \ldots, N$ be the speech signals of $N$ speakers located in a reverberant enclosure. The signals are given in the short-time Fourier transform (STFT) domain, where $\ell$ is the time-frame index and $0 \le k \le K-1$ is the frequency-bin index. We assume that the $n$-th speaker is positioned in the near-field of the $n$-th node, for all $n$. Node $n$ comprises $M_n$ microphones, and the total number of microphones is $M = \sum_{n=1}^{N} M_n$. The $N$ nodes constitute a WASN, which is assumed to be fully connected. The vector of the received signals at the $n$-th node is denoted
\[
\mathbf{x}_n(\ell,k) \triangleq \mathbf{q}_n(\ell,k) + \mathbf{v}_n(\ell,k) \tag{1}
\]
where $\mathbf{q}_n(\ell,k)$ is the sum of all speech signals at the $n$-th node and $\mathbf{v}_n(\ell,k)$ is the background noise signal at the $n$-th node. Note that the background noise might originate from directional noise sources (e.g., air conditioning); however, in this case we assume that the noise sources are not located near any node. Furthermore, we assume that the approximate DOA of the $n$-th speaker is known to the $n$-th node. Denote by $\mathbf{q}_{nn'}$ the vector containing the signals of the $n'$-th speaker received at the microphones of the $n$-th node:
\[
\mathbf{q}_{nn'}(\ell,k) \triangleq \mathbf{h}_{nn'}(\ell,k)\,\tilde{s}_{n'}(\ell,k) \tag{2}
\]
where $\tilde{s}_{n'}(\ell,k)$ is the reference signal of the $n'$-th speaker, and $\mathbf{h}_{nn'}(\ell,k)$ is the vector of RTFs relating $\tilde{s}_{n'}(\ell,k)$ to $\mathbf{q}_{nn'}(\ell,k)$. Without loss of generality, the reference signal $\tilde{s}_{n'}(\ell,k)$ is set to be the signal of the $n'$-th speaker at the first microphone of the $n'$-th node, for all $n'$. Arranging the $M_n \times 1$ RTF vectors of the $N$ speakers at the $n$-th node in a matrix yields the $M_n \times N$ RTF matrix
\[
\mathbf{H}_n(\ell,k) \triangleq \begin{bmatrix} \mathbf{h}_{n1}(\ell,k) & \cdots & \mathbf{h}_{nN}(\ell,k) \end{bmatrix}. \tag{3}
\]
Using the latter definitions, the speech signal vector at the $n$-th node is
\[
\mathbf{q}_n(\ell,k) = \mathbf{H}_n(\ell,k)\,\tilde{\mathbf{s}}(\ell,k) \tag{4}
\]
where $\tilde{\mathbf{s}}(\ell,k) \triangleq \begin{bmatrix} \tilde{s}_1(\ell,k) & \cdots & \tilde{s}_N(\ell,k) \end{bmatrix}^T$ is an $N \times 1$ vector comprising the $N$ speech signals received at their respective reference microphones. Hereafter, we omit the index $k$ and, unless stated otherwise, all derivations refer to a single frequency bin. We also omit, for brevity, the index $\ell$ from the SOS and the RTFs.

In this paper, the $n$-th node aims at extracting the $n$-th speaker, while suppressing all other speakers and the noise, for $n = 1, 2, \ldots, N$. We propose a distributed algorithm, where each node performs a local and a global beamforming stage. In the local stage, each node broadcasts the output signal of its local beamformer, and a soft TF mask that indicates the activity of the nearest source. In the global stage, the local microphone signals at the $n$-th node and the information received from the other nodes are used to extract the $n$-th speaker. In Section 3, we derive the local and global beamformers, and in Section 4 we provide a method for estimating the required SOS and RTFs, using local and global activity masks.

3. DISTRIBUTED BEAMFORMER

In the following, we derive the local and global beamformers at the $n$-th node, assuming that the required SOS and the RTFs of the different sources are available. The derivation applies equally to all other nodes. The estimation of the SOS and RTFs using local and global activity masks is described in Section 4.

3.1. Local beamformer

The local beamformer is constructed as a minimum variance distortionless response (MVDR) beamformer, designed to enhance the $n$-th speaker while reducing the background noise. It is given by
\[
\mathbf{f}_{l,n} \triangleq \frac{\mathbf{\Phi}_{v_n}^{-1}\mathbf{h}_{nn}}{\mathbf{h}_{nn}^H \mathbf{\Phi}_{v_n}^{-1}\mathbf{h}_{nn}} \tag{5}
\]
where $\mathbf{h}_{nn}$ is the RTF vector relating the $n$-th node microphones and the reference signal of the $n$-th speaker, $\mathbf{\Phi}_{v_n}(\ell) \triangleq E\left[\mathbf{v}_n(\ell)\mathbf{v}_n^H(\ell)\right]$ is the covariance matrix of the background noise at the $n$-th node, and $E[\bullet]$ denotes the expectation operator. The output of the local beamformer (BF) is denoted $y_{l,n}(\ell)$ and is given by
\[
y_{l,n}(\ell) \triangleq \mathbf{f}_{l,n}^H \mathbf{x}_n(\ell). \tag{6}
\]
The outputs of the local beamformers of all nodes, i.e., $y_{l,n}(\ell)$ for all $n$, are broadcast in the WASN and are received by all nodes.
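As an illustration (ours, not part of the original paper), the local stage (5)-(6) for a single TF bin reduces to a few lines of NumPy; the noise covariance `Phi_vn`, RTF vector `h_nn` and microphone snapshot `x_n` are assumed to be given as complex arrays:

```python
import numpy as np

def mvdr_weights(Phi, h):
    """MVDR weights f = Phi^{-1} h / (h^H Phi^{-1} h), cf. (5)."""
    tmp = np.linalg.solve(Phi, h)      # Phi^{-1} h, avoids explicit inversion
    return tmp / (h.conj() @ tmp)      # distortionless response towards h

# Local stage, cf. (5)-(6), with Phi_vn, h_nn and x_n assumed given:
# f_ln = mvdr_weights(Phi_vn, h_nn)
# y_ln = f_ln.conj() @ x_n            # y_{l,n} = f_{l,n}^H x_n
```

Solving the linear system instead of explicitly inverting $\mathbf{\Phi}_{v_n}$ is numerically preferable; in practice a small diagonal loading term is often added for robustness.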
3.2. Global beamformer

We define the extended microphone signal vector of the $n$-th node, denoted $\bar{\mathbf{x}}_n(\ell)$, as the concatenation of the $n$-th node microphone signals and the local beamformer outputs of the other $N-1$ nodes:
\[
\bar{\mathbf{x}}_n(\ell) \triangleq \begin{bmatrix} \mathbf{x}_n^T(\ell) & y_{l,1}(\ell) & \cdots & y_{l,n-1}(\ell) & y_{l,n+1}(\ell) & \cdots & y_{l,N}(\ell) \end{bmatrix}^T. \tag{7}
\]
Using (6) and similarly to (1), (3) and (4), the extended microphone signal vector of the $n$-th node can be written as
\[
\bar{\mathbf{x}}_n(\ell) = \bar{\mathbf{q}}_n(\ell) + \bar{\mathbf{v}}_n(\ell) \tag{8a}
\]
\[
\bar{\mathbf{v}}_n(\ell) \triangleq \begin{bmatrix} \mathbf{v}_n^T(\ell) & \mathbf{f}_{l,1}^H \mathbf{v}_1(\ell) & \cdots & \mathbf{f}_{l,n-1}^H \mathbf{v}_{n-1}(\ell) & \mathbf{f}_{l,n+1}^H \mathbf{v}_{n+1}(\ell) & \cdots & \mathbf{f}_{l,N}^H \mathbf{v}_N(\ell) \end{bmatrix}^T \tag{8b}
\]
\[
\bar{\mathbf{H}}_n \triangleq \begin{bmatrix} \mathbf{H}_n^T & \left(\mathbf{f}_{l,1}^H \mathbf{H}_1\right)^T & \cdots & \left(\mathbf{f}_{l,n-1}^H \mathbf{H}_{n-1}\right)^T & \left(\mathbf{f}_{l,n+1}^H \mathbf{H}_{n+1}\right)^T & \cdots & \left(\mathbf{f}_{l,N}^H \mathbf{H}_N\right)^T \end{bmatrix}^T \tag{8c}
\]
\[
\bar{\mathbf{q}}_n \triangleq \bar{\mathbf{H}}_n \tilde{\mathbf{s}}(\ell). \tag{8d}
\]
The global beamformer of the $n$-th node processes the associated extended microphone signal vector and yields the global output
\[
y_{g,n}(\ell) \triangleq \mathbf{f}_{g,n}^H \bar{\mathbf{x}}_n(\ell). \tag{9}
\]
The global beamformer is constructed as an MVDR beamformer designed to extract the $n$-th speaker, while reducing all other speakers and the background noise:
\[
\mathbf{f}_{g,n} \triangleq \frac{\mathbf{\Phi}_{\bar{u}_n}^{-1}\bar{\mathbf{h}}_{nn}}{\bar{\mathbf{h}}_{nn}^H \mathbf{\Phi}_{\bar{u}_n}^{-1}\bar{\mathbf{h}}_{nn}} \tag{10}
\]
where $\bar{\mathbf{u}}_n(\ell) = \sum_{n' \neq n} \bar{\mathbf{q}}_{nn'}(\ell) + \bar{\mathbf{v}}_n(\ell)$ is the sum of all interfering speakers and background noise in the extended signal vector of the $n$-th node, and $\mathbf{\Phi}_{\bar{u}_n} \triangleq \mathbf{\Phi}_{\bar{v}_n} + \sum_{n' \neq n} \mathbf{\Phi}_{\bar{q}_{nn'}}$ is its corresponding covariance matrix, comprised of $\mathbf{\Phi}_{\bar{v}_n}$ and $\mathbf{\Phi}_{\bar{q}_{nn'}}$ for $n' \neq n$, defined as the covariance matrices of $\bar{\mathbf{v}}_n(\ell)$ and $\bar{\mathbf{q}}_{nn'}(\ell)$, respectively. The RTF vector $\bar{\mathbf{h}}_{nn}$ relates the $n$-th speaker component in the extended microphone signals to its corresponding reference signal $\tilde{s}_n(\ell)$. Note that due to the non-stationarity of the speech signals, their corresponding covariance matrices are averaged over multiple quasi-stationary time-segments. A block diagram of the processing carried out at the $n$-th node is depicted in Fig. 1.
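To make the global stage concrete, the following sketch (again ours, not the authors' implementation) assembles the extended vector of (7) and reuses the hypothetical `mvdr_weights` helper from the Section 3.1 sketch for (9)-(10); `Phi_un_bar` and `h_nn_bar` stand for $\mathbf{\Phi}_{\bar{u}_n}$ and $\bar{\mathbf{h}}_{nn}$, which are assumed given here:

```python
import numpy as np

def extended_vector(x_n, y_locals, n):
    """Assemble the extended vector of (7): the node's own microphone
    snapshot followed by the local BF outputs of the other N-1 nodes.

    x_n      : (Mn,) complex microphone snapshot of node n
    y_locals : sequence of the N scalar local BF outputs y_{l,1},...,y_{l,N}
    n        : index of this node (0-based here)
    """
    others = [y for i, y in enumerate(y_locals) if i != n]
    return np.concatenate([x_n, np.asarray(others)])

# Global stage, cf. (9)-(10):
# x_bar = extended_vector(x_n, y_locals, n)
# f_gn  = mvdr_weights(Phi_un_bar, h_nn_bar)
# y_gn  = f_gn.conj() @ x_bar          # y_{g,n} = f_{g,n}^H of the extended vector
```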
[Figure 1: per-node block diagram. The $M_n$ microphone signals $\mathbf{x}_n(\ell,k)$ feed the local BF $\mathbf{f}_{l,n}(k)$ and a "calculate local source activity mask" block; the local output $y_{l,n}(\ell,k)$ and mask $\alpha_n(\ell,k)$ are exchanged with the other nodes, whose outputs $y_{l,n'}(\ell,k)$ and masks $\alpha_{n'}(\ell,k)$, $n' \in \{1,\ldots,N\} \setminus n$, feed a "calculate global source and noise activity masks" block (producing $\beta_{n'}(\ell,k)$ and $\beta_n^v(\ell,k)$) and the global BF $\mathbf{f}_{g,n}(k)$, which yields $y_{g,n}(\ell,k)$.]

Fig. 1. Block diagram of the proposed BF at the $n$-th node.

4. PARAMETER ESTIMATION

To estimate the SOS and the RTFs for the local and the global beamformers, local and global SPPs, $\alpha_n^v$ and $\beta_n^v$, and local and global source activity masks, $\alpha_n$ and $\beta_n$, are required for each node $n$. In Section 4.1, we introduce the estimation of the covariance matrices and the RTFs. In Sections 4.2 and 4.3 we describe the computation of the SPPs and the soft activity masks.

4.1. Estimation of the covariance matrices and RTFs

For calculating the local and global beamformers in (5) and (10) at the $n$-th node, the covariance matrices of the extended noise signal, i.e., $\mathbf{\Phi}_{\bar{v}_n}(\ell) = E\left[\bar{\mathbf{v}}_n(\ell)\bar{\mathbf{v}}_n^H(\ell)\right]$, and of the individual speaker signals, i.e., $\mathbf{\Phi}_{\bar{q}_{nn'}}(\ell) = E\left[\bar{\mathbf{q}}_{nn'}(\ell)\bar{\mathbf{q}}_{nn'}^H(\ell)\right]$ for all $n'$, are required. To estimate $\mathbf{\Phi}_{\bar{q}_{nn'}}(\ell)$, we first define the matrix
\[
\mathbf{\Phi}_{\bar{q}_{nn'}+\bar{v}_n}(\ell) = \mathbf{\Phi}_{\bar{q}_{nn'}}(\ell) + \mathbf{\Phi}_{\bar{v}_n}(\ell) \tag{11}
\]
which can be recursively estimated using the global activity mask $\beta_{n'}$ and a forgetting factor $\lambda_s$ as follows:
\[
\hat{\mathbf{\Phi}}_{\bar{q}_{nn'}+\bar{v}_n}(\ell) = \left[1 - \beta_{n'}(\ell)(1-\lambda_s)\right] \hat{\mathbf{\Phi}}_{\bar{q}_{nn'}+\bar{v}_n}(\ell-1) + \beta_{n'}(\ell)(1-\lambda_s)\, \bar{\mathbf{x}}_n(\ell)\bar{\mathbf{x}}_n^H(\ell). \tag{12}
\]
Given the matrix $\mathbf{\Phi}_{\bar{v}_n}(\ell)$, an estimate of $\mathbf{\Phi}_{\bar{q}_{nn'}}(\ell)$ is obtained by
\[
\hat{\mathbf{\Phi}}_{\bar{q}_{nn'}}(\ell) = \hat{\mathbf{\Phi}}_{\bar{q}_{nn'}+\bar{v}_n}(\ell) - \mathbf{\Phi}_{\bar{v}_n}(\ell). \tag{13}
\]
Note that if the noise covariance matrix $\mathbf{\Phi}_{\bar{v}_n}(\ell)$ is unknown, an estimate thereof can be computed using (12), where the speech absence probability $1-\beta_n^v$ and a forgetting factor $\lambda_v$ are used instead of the activity mask $\beta_{n'}$ and the forgetting factor $\lambda_s$. In this work we assume that $\mathbf{\Phi}_{\bar{v}_n}(\ell)$ is known and concentrate on the estimation of the speakers' SOS and RTFs. Local covariance matrices, using only the local microphones, are extracted from their corresponding extended covariance matrices by
\[
\mathbf{\Phi}_{v_n} \triangleq \bar{\mathbf{I}}_n^H \mathbf{\Phi}_{\bar{v}_n} \bar{\mathbf{I}}_n, \qquad \hat{\mathbf{\Phi}}_{q_{nn'}} \triangleq \bar{\mathbf{I}}_n^H \hat{\mathbf{\Phi}}_{\bar{q}_{nn'}} \bar{\mathbf{I}}_n \tag{14}
\]
where $\bar{\mathbf{I}}_n \triangleq \begin{bmatrix} \mathbf{I}_n & \mathbf{0}_{M_n \times (N-1)} \end{bmatrix}^T$ is a selection matrix which extracts the first $M_n$ entries of a vector and $\mathbf{I}_n$ is an $M_n \times M_n$ identity matrix. The multiplicative transfer function model in (2) and (8d) is manifested in the rank-1 property of $\hat{\mathbf{\Phi}}_{q_{nn'}}$ and $\hat{\mathbf{\Phi}}_{\bar{q}_{nn'}}$ for all $n, n'$, thus allowing to estimate the required RTFs by
\[
\hat{\bar{\mathbf{h}}}_{nn} \triangleq \frac{\hat{\mathbf{\Phi}}_{\bar{q}_{nn}}\, \bar{\mathbf{i}}_1}{\bar{\mathbf{i}}_1^H \hat{\mathbf{\Phi}}_{\bar{q}_{nn}}\, \bar{\mathbf{i}}_1}, \qquad \hat{\mathbf{h}}_{nn} \triangleq \bar{\mathbf{I}}_n^H \hat{\bar{\mathbf{h}}}_{nn} \tag{15}
\]
where $\bar{\mathbf{i}}_1 \triangleq \begin{bmatrix} 1 & \mathbf{0}_{1 \times (\bar{M}_n - 1)} \end{bmatrix}^T$ is a selection vector which extracts the first element of a vector.
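The recursive update (12), the subtraction (13) and the RTF extraction (14)-(15) map directly to a few lines of code. The sketch below is one possible realization for a single frequency bin (our illustration; all inputs are assumed given):

```python
import numpy as np

def update_speaker_cov(Phi_prev, x_bar, beta, lambda_s):
    """Mask-gated recursive estimate of the speaker-plus-noise
    covariance, cf. (12); beta is the global mask of speaker n'."""
    w = beta * (1.0 - lambda_s)                     # effective update weight
    return (1.0 - w) * Phi_prev + w * np.outer(x_bar, x_bar.conj())

def estimate_rtfs(Phi_qv_hat, Phi_v_bar, Mn):
    """Speaker covariance (13) and RTF estimates (14)-(15)."""
    Phi_q_bar = Phi_qv_hat - Phi_v_bar              # (13)
    h_bar = Phi_q_bar[:, 0] / Phi_q_bar[0, 0]       # (15): first column, normalized
    h_loc = h_bar[:Mn]                              # (15): selection of local entries
    Phi_q_loc = Phi_q_bar[:Mn, :Mn]                 # (14): local covariance block
    return Phi_q_bar, Phi_q_loc, h_bar, h_loc
```

The first column of the rank-1 matrix in (15) is exactly $\hat{\mathbf{\Phi}}\,\bar{\mathbf{i}}_1$, and its first entry is $\bar{\mathbf{i}}_1^H \hat{\mathbf{\Phi}}\,\bar{\mathbf{i}}_1$, so no explicit selection vectors are needed in code.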
4.2. Local SPP and soft speaker activity mask

Local speech presence probability: Assuming that the STFT coefficients of the speech and the noise signals can be modeled as complex Gaussian random vectors, the multichannel SPP at node $n$ is given by [13]
\[
\alpha_n^v = \left[1 + \frac{r}{1-r}\,(1+\xi_{l,n})\, e^{-\frac{\sigma_{l,n}}{1+\xi_{l,n}}}\right]^{-1} \tag{16}
\]
where $r$ denotes the a priori speech absence probability, and
\[
\xi_{l,n} = \mathrm{tr}\{\mathbf{\Phi}_{v_n}^{-1}\mathbf{\Phi}_{q_n}\}, \qquad \sigma_{l,n} = \mathbf{x}_n^H \mathbf{\Phi}_{v_n}^{-1}\mathbf{\Phi}_{q_n}\mathbf{\Phi}_{v_n}^{-1}\mathbf{x}_n \tag{17}
\]
where $\xi_{l,n}$ is the multichannel a priori speech-to-noise ratio at the microphone signals of the $n$-th node, $\mathrm{tr}\{\cdot\}$ denotes the trace operator, and $\mathbf{\Phi}_{q_n} = \sum_{n'} \mathbf{\Phi}_{q_{nn'}}$ is the covariance matrix of all speech signals at node $n$. The value of $r$ can be either fixed or signal-dependent; in this work, we set $r$ to 0.5. In practice, given an estimate of the noise covariance matrix $\hat{\mathbf{\Phi}}_{v_n}$, the speech covariance matrix is estimated as $\hat{\mathbf{\Phi}}_{q_n} = \hat{\mathbf{\Phi}}_{x_n} - \hat{\mathbf{\Phi}}_{v_n}$, where $\hat{\mathbf{\Phi}}_{x_n}$ is an estimate of the covariance matrix of the microphone signals at node $n$, which can be computed by recursive temporal averaging of the signal vector $\mathbf{x}_n(\ell)$.

Local speaker activity mask: Local speaker activity masks for each node are obtained by combining the SPP computed in (16) and two parameter-based masks. Parametric information is extracted from the microphones at each node, consisting of TF bin-wise DOA and DRR estimates. The mask at the $n$-th node indicates the activity of the $n$-th speaker (i.e., the speaker nearest to the $n$-th node).

DOA-based mask: We assume that the $n$-th speaker is located at an approximately known direction $\theta_n$ with respect to the microphone array at the $n$-th node. Given a bin-wise DOA estimate $\hat{\theta}(\ell,k)$, a binary DOA mask is computed as follows:
\[
M_{\theta_n}(\ell,k) = \mathbb{1}_{\{|\hat{\theta}(\ell,k)-\theta_n| \le \theta_{\mathrm{thr}}\}}\big(\hat{\theta}(\ell,k)\big) \tag{18}
\]
where $\mathbb{1}(\bullet)$ denotes the indicator function and $\theta_{\mathrm{thr}}$ is a threshold defining the maximum difference between the true DOA of the speaker and the estimated DOA for which the DOA-based mask is equal to one. The DOA estimates $\hat{\theta}(\ell,k)$ were obtained using the normalized observation vectors, as reported in [14].

DRR-based mask: Since the $n$-th speaker is in the near-field of the $n$-th node, the DRR at the microphones of the $n$-th node, denoted by $\Gamma_n$, is high when the $n$-th speaker is active. Therefore, we derive a DRR-based soft mask at the $n$-th node as follows:
\[
M_{\Gamma_n}(\ell,k) = \frac{10^{c\rho/10}}{10^{c\rho/10} + \Gamma_n(\ell,k)^{\rho}} \tag{19}
\]
where $c$ (in dB) controls the offset along the $\Gamma_n$ axis, and $\rho$ defines the steepness of the sigmoid transition. The parameters are chosen such that a low DRR corresponds to $M_{\Gamma_n} \approx 0$, while a high DRR corresponds to $M_{\Gamma_n} \approx 1$. The DRR for each node, $\Gamma_n$, was estimated using the complex coherence-based estimator proposed in [15]. Finally, the local soft speaker activity masks $\alpha_n$ are computed using the local SPP and the parameter-based soft masks according to
\[
\alpha_n(\ell,k) = \alpha_n^v(\ell,k) \cdot M_{\Gamma_n}(\ell,k) \cdot M_{\theta_n}(\ell,k). \tag{20}
\]
These soft masks are broadcast in the network and used to compute
the global soft masks $\beta_n$, as described next.
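Before turning to the global masks, the local SPP (16)-(17) and the mask fusion (18)-(20) can be sketched as follows (an illustrative NumPy rendering under the stated Gaussian model; the default parameter values are those reported in Section 5):

```python
import numpy as np

def local_spp(x_n, Phi_v, Phi_q, r=0.5):
    """Multichannel SPP of (16)-(17) for one TF bin, cf. [13]."""
    A = np.linalg.solve(Phi_v, Phi_q)               # Phi_v^{-1} Phi_q
    xi = np.real(np.trace(A))                       # a priori SNR, cf. (17)
    sigma = np.real(x_n.conj() @ A @ np.linalg.solve(Phi_v, x_n))
    return 1.0 / (1.0 + r / (1.0 - r) * (1.0 + xi) * np.exp(-sigma / (1.0 + xi)))

def local_mask(spp, theta_hat, theta_n, gamma_n,
               theta_thr=20.0, c=5.0, rho=-1.2):
    """Local soft mask (20) = SPP x DRR sigmoid (19) x binary DOA gate (18).
    Angles in degrees; gamma_n is the linear-scale DRR; c in dB, rho < 0."""
    m_theta = float(abs(theta_hat - theta_n) <= theta_thr)        # (18)
    g = 10.0 ** (c * rho / 10.0)
    m_drr = g / (g + gamma_n ** rho)                              # (19)
    return spp * m_drr * m_theta                                  # (20)
```

Note that with $\rho < 0$ the sigmoid in (19) is increasing in $\Gamma_n$, so a high DRR indeed drives the mask towards one.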
4.3. Global SPP and soft speaker activity masks

Global speech presence probability: The global SPP $\beta_n^v$ is computed similarly to (16) by
\[
\beta_n^v = \left[1 + \frac{r}{1-r}\,(1+\xi_{g,n})\, e^{-\frac{\sigma_{g,n}}{1+\xi_{g,n}}}\right]^{-1} \tag{21}
\]
\[
\xi_{g,n} = \mathrm{tr}\{\mathbf{\Phi}_{\bar{v}_n}^{-1}\mathbf{\Phi}_{\bar{q}_n}\}, \qquad \sigma_{g,n} = \bar{\mathbf{x}}_n^H \mathbf{\Phi}_{\bar{v}_n}^{-1}\mathbf{\Phi}_{\bar{q}_n}\mathbf{\Phi}_{\bar{v}_n}^{-1}\bar{\mathbf{x}}_n \tag{22}
\]
where $\xi_{g,n}$ is the multichannel a priori speech-to-noise ratio at the extended microphone signals of the $n$-th node and $\mathbf{\Phi}_{\bar{q}_n} = \sum_{n'} \mathbf{\Phi}_{\bar{q}_{nn'}}$ is the covariance matrix of all speech signals in the extended microphone signals of the $n$-th node. Similarly to the local stage, the extended speech signal covariance matrix is estimated as $\hat{\mathbf{\Phi}}_{\bar{q}_n} = \hat{\mathbf{\Phi}}_{\bar{x}_n} - \hat{\mathbf{\Phi}}_{\bar{v}_n}$, where $\hat{\mathbf{\Phi}}_{\bar{x}_n}$ is an estimate of the covariance matrix of the extended microphone signals at node $n$, obtained by recursive temporal averaging of the signal vector $\bar{\mathbf{x}}_n(\ell)$.

Global speaker activity masks: To calculate the global speaker activity masks, the following indicator function is defined:
\[
I_n(\ell,k) = \mathbb{1}_{\left\{\sum_{n' \neq n} \alpha_{n'} \le \epsilon\, \alpha_n\right\}}\big(\alpha_n(\ell,k)\big). \tag{23}
\]
Computing $I_n(\ell,k)$ requires the local speaker activity masks $\alpha_{n'}$ received from all nodes. Setting $\epsilon < 1$ ensures that $I_n(\ell,k)$ at a given TF bin equals 1 for at most one source. In this manner, the indicator function serves to identify the TF bins where only one of the sources is dominant. Updating the SOS and the RTFs during TF bins where one source is dominant results in accurate estimates and good performance of the beamformers. Finally, the soft global speaker activity mask for the $n$-th source is computed as
\[
\beta_n(\ell,k) = \alpha_n(\ell,k) \cdot I_n(\ell,k). \tag{24}
\]
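Given the local masks received from all nodes, the fusion (23)-(24) is essentially a one-liner per node. The sketch below follows the reconstruction of (23) used above, with $\epsilon = 0.9$ as an assumed example value (the paper does not report the value used):

```python
import numpy as np

def global_masks(alpha, eps=0.9):
    """Global masks beta_n = alpha_n * I_n, cf. (23)-(24), for one TF bin.

    alpha : (N,) local soft masks alpha_1,...,alpha_N from all nodes
    eps   : threshold < 1, so at most one source is declared dominant
    """
    alpha = np.asarray(alpha, dtype=float)
    competing = alpha.sum() - alpha          # sum over n' != n for every n
    indicator = competing <= eps * alpha     # I_n, cf. (23)
    return alpha * indicator                 # (24)

# Example: global_masks([0.9, 0.05, 0.02]) keeps only the first source active.
```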
5. EXPERIMENTAL STUDY

To validate the proposed algorithm, we evaluated its performance in various scenarios. A reverberant enclosure with dimensions 4 m × 4 m × 3 m, a reverberation time of $T_{60} = 350$ ms, and $N = 3$ nodes, each comprising $M_n = 3$ microphones, was simulated using the image method [16] at a sampling rate of 16 kHz. Three speakers were positioned at a distance of 0.5 m from their corresponding nodes, at known directions. Spatially white Gaussian noise, with approximately equal power at all nodes, was added to the microphone signals. The powers of the speaker signals were set such that the resulting SNR at the reference microphone of each node was 33 dB. The SNR at a given node was computed as the power ratio between the closest speaker's signal and the noise signal. The signal-to-interference ratios (SIRs) at the reference microphones of each node were approximately 2 dB for all tested scenarios.

We compared the performance of three algorithms: 1) a centralized MVDR beamformer utilizing all microphone signals; 2) a single-node MVDR beamformer utilizing the microphones of the nearest node and using ideal masks calculated from the clean speaker signals; 3) the proposed distributed beamformer. For the centralized and the distributed beamformers, the SOS and RTFs were computed both with the proposed source activity masks that combine local SPPs and parameter-based masks, and with oracle source activity masks. To compare the performance in the two cases and evaluate the applicability of the proposed framework for SOS and RTF estimation, we assumed in these experiments that the noise covariance matrices of the local and the extended noise signals are known. In this manner, errors in the noise covariance matrix do not affect the filtering performance. The performance was evaluated in terms of the signal-to-distortion ratio (SDR) and the signal-to-interference-and-noise ratio (SINR) improvement (averaged over the two interfering signals). A frame length of $K = 2048$ with 75% overlap was used for the STFT. Forgetting factors of $\lambda_v = 0.9934$ and $\lambda_s = 0.9633$ were used for the recursive SOS estimates. A threshold of $\theta_{\mathrm{thr}} = 20°$ was used for the DOA-based masks, and the sigmoid parameters were set to $c = 5$ dB and $\rho = -1.2$ for the DRR-based masks.

The results of the experimental study are presented in Table 1. Each speaker was extracted using a reference microphone from the corresponding nearest node. The performance measures were averaged over the three speakers and over 5 Monte Carlo experiments, where for each experiment a fixed constellation of three nodes and three sources was randomly positioned in the room. Although, as expected, the centralized beamformer outperforms the proposed distributed beamformer, the proposed beamformer achieves a good tradeoff between communication bandwidth (two signals per node) and SINR improvement. Moreover, from these results it is evident that the performance of the distributed beamformer is superior to that of the single-node beamformer, even when oracle information about the source activity is not available and instead the SOS and RTFs are estimated using SPP, DOA and DRR estimates.

Table 1. Performance comparison.

Algorithm             Speaker masks   SDR [dB]   SINR imp. [dB]
Centralized           ideal           9.0        15.2
Centralized           estimated       9.5        11.2
Single node           ideal           7.9        4.0
Proposed distributed  ideal           9.7        9.8
Proposed distributed  estimated       9.9        7.9
6. CONCLUSION

We considered an ad hoc WASN comprising N nodes, where the goal at each node was to extract the nearest speaker while reducing all other speakers and the noise. A distributed beamformer comprising two filtering stages at each node was proposed. At each node, the proposed beamformer requires the transmission of a single audio channel and a local soft TF mask related to the activity of the speaker nearest to the node. The local soft TF masks, computed using multichannel SPPs and bin-wise DRR and DOA estimates, are fused into global soft TF masks. The latter are used to estimate the SOS and the RTFs required for the two filtering stages. The first filtering stage utilizes local microphone data, whereas the second filtering stage fuses the outputs of the first stage from all nodes into a global BF (at each node). The proposed algorithm, although sub-optimal compared to the centralized BF, reduces the communication bandwidth and thus the energy consumption, and yet outperforms the single-node solution, as verified in the experimental study. Future work includes an extensive evaluation of the system in real-world scenarios with unknown background noise statistics.
7. REFERENCES
[1] D. Estrin, G. Pottie, and M. Srivastava, "Instrumenting the world with wireless sensor networks," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), May 2001, pp. 2033–2036.

[2] D. Culler, D. Estrin, and M. Srivastava, "Overview of sensor networks," Computer, vol. 37, no. 8, pp. 41–49, Aug. 2004.

[3] A. Bertrand, "Applications and trends in wireless acoustic sensor networks: A signal processing perspective," in Proc. IEEE Symposium on Communications and Vehicular Technology (SCVT), Ghent, Belgium, Nov. 2011, pp. 1–6.

[4] A. Bertrand and M. Moonen, "Distributed adaptive node-specific signal estimation in fully connected sensor networks – part I: Sequential node updating," IEEE Trans. Signal Processing, vol. 58, no. 10, pp. 5277–5291, Oct. 2010.

[5] A. Bertrand and M. Moonen, "Distributed node-specific LCMV beamforming in wireless sensor networks," IEEE Trans. Signal Processing, vol. 60, pp. 233–246, Jan. 2012.

[6] S. Markovich-Golan, S. Gannot, and I. Cohen, "Distributed multiple constraints generalized sidelobe canceler for fully connected wireless acoustic sensor networks," IEEE Trans. Audio, Speech and Language Processing, vol. 21, no. 2, pp. 343–356, 2013.

[7] S. Markovich-Golan, A. Bertrand, M. Moonen, and S. Gannot, "Optimal distributed minimum-variance beamforming approaches for speech enhancement in wireless acoustic sensor network," Signal Processing, 2014, accepted for publication.

[8] A. Bertrand and M. Moonen, "Efficient sensor subset selection and link failure response for linear MMSE signal estimation in wireless sensor networks," in Proc. European Signal Processing Conf. (EUSIPCO), Aalborg, Denmark, Aug. 2010, pp. 1092–1096.

[9] M. Souden, S. Araki, K. Kinoshita, T. Nakatani, and H. Sawada, "A multichannel MMSE-based framework for speech source separation and noise reduction," IEEE Trans. Audio, Speech and Language Processing, vol. 21, no. 9, pp. 1913–1928, Sept. 2013.

[10] S. Araki, H. Sawada, and S. Makino, "Blind speech separation in a meeting situation with maximum SNR beamformers," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2007, pp. 41–44.

[11] M. Taseska and E. Habets, "MMSE-based source extraction using position-based posterior probabilities," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 664–668.

[12] M. Souden, K. Kinoshita, and T. Nakatani, "An integration of source location cues for speech clustering in distributed microphone arrays," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), May 2013, pp. 111–115.

[13] M. Souden, J. Chen, J. Benesty, and S. Affes, "Gaussian model-based multichannel speech presence probability," IEEE Trans. Audio, Speech and Language Processing, vol. 18, no. 5, pp. 1072–1077, Jul. 2010.

[14] S. Araki, H. Sawada, R. Mukai, and S. Makino, "A novel blind source separation method with observation vector clustering," in Proc. Int. Workshop on Acoustic Echo and Noise Control (IWAENC), 2005.

[15] O. Thiergart, G. Del Galdo, and E. A. P. Habets, "Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Mar. 2012, pp. 309–312.

[16] E. Habets, "Room impulse response (RIR) generator," http://home.tiscali.nl/ehabets/rir generator.html, Jul. 2006.