Audio Engineering Society
Convention Paper 9299 Presented at the 138th Convention 2015 May 7–10 Warsaw, Poland
This paper was peer-reviewed as a complete manuscript for presentation at this Convention. This paper is available in the AES E-Library, http://www.aes.org/e-lib. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.
Direction of Arrival Estimation of Multiple Sound Sources Based on Frequency-Domain Minimum Variance Distortionless Response Beamforming

Seung Woo Yu¹, Kwang Myung Jeon¹, Dong Yun Lee¹, and Hong Kook Kim¹,²

¹ School of Information and Communications, Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea
{yuseungwoo, kmjeon, ldy, hongkook}@gist.ac.kr

² Dept. of Electrical and Computer Engineering, City University of New York, NY 10031, USA
ABSTRACT

In this paper, a method for estimating the directions-of-arrival (DOAs) of multiple non-stationary sound sources is proposed on the basis of a frequency-domain minimum variance distortionless response (FD-MVDR) beamformer. First, an FD-MVDR beamformer is applied to multiple sound sources, where the beamformer weights are updated according to the surrounding environment to reduce the sidelobe effect of the beamformer. Then, multi-stage DOA estimation is performed to reduce the computational complexity of the beam search. Finally, a median filter is applied to improve the DOA estimation accuracy. It is demonstrated that the average DOA estimation error of the proposed method is smaller than those of methods based on conventional GCC-PHAT, MVDR-PHAT, and FD-MVDR, with lower computational complexity than that of the conventional FD-MVDR-based DOA estimation method.
1. INTRODUCTION
There are many acoustic array applications, such as sound localization, source separation, and direction estimation. Among them, direction-of-arrival (DOA) estimation has been applied to many areas, such as radar, communication, sonar, and aeronautics [1]. To this end, the generalized cross-correlation phase transform (GCC-PHAT) and steered response power phase transform (SRP-PHAT) have been proposed [2], and they are known to provide robust performance in reverberant environments [3]. However, these methods focus on tracking the single sound source that has the greatest phase power according to the correlation between the microphone channels [5].
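For context, the GCC-PHAT time-delay estimate for one microphone pair can be sketched as follows. This is a generic textbook illustration in NumPy, not the implementation evaluated in this paper; the function name `gcc_phat` and the small stabilizing constant are illustrative choices.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time delay (s) of x relative to y via GCC-PHAT."""
    n = 2 * max(len(x), len(y))          # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center lag 0
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# A copy delayed by 5 samples should yield a lag of 5 / fs seconds.
fs = 48000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delayed = np.roll(s, 5)
tau_hat = gcc_phat(delayed, s, fs)
print(tau_hat)
```

The PHAT weighting whitens the cross-spectrum so that only phase information drives the correlation peak, which is what gives the method its robustness to reverberation.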
In order to estimate the directions of multiple sound sources that are recorded simultaneously, beamforming-based DOA estimation methods have been proposed. Among them, the minimum variance distortionless response phase transform (MVDR-PHAT) [5, 6] minimizes the total acoustic power while the directional gain toward the desired sound source is set through the phase transform with fixed MVDR-PHAT weights [7]. However, the MVDR-PHAT beamformer is not suitable for detecting the directions of non-stationary sound sources, even when all possible directions are exhaustively searched [7]. To remedy this problem, a frequency-domain minimum variance distortionless response (FD-MVDR)-based DOA estimation method was proposed [8, 9], in which the covariance matrix for the beamformer weights is updated by taking environmental information into account in order to estimate the directions of non-stationary sound sources. However, its computational complexity is higher due to the weight update and the beam search.

In this paper, a computationally efficient DOA estimation method is proposed on the basis of FD-MVDR. In other words, the proposed method applies FD-MVDR to multiple non-stationary sound sources, and a multi-stage DOA estimation approach is performed to reduce the computational complexity of the beam search. After that, a median filter [10] is applied to improve the DOA estimation accuracy.

Following this introduction, Section 2 describes the proposed DOA estimation method, including the weight updating procedure of the FD-MVDR, the multi-stage DOA estimation, and the application of a median filter. Section 3 evaluates the performance of the proposed DOA estimation method applied to multiple non-stationary sound sources and compares it with those of conventional GCC-PHAT, MVDR-PHAT, and FD-MVDR. Section 4 concludes the paper.

2. PROPOSED DOA ESTIMATION METHOD
[Fig. 1 shows the processing chain: microphone signals z_0(n), …, z_{M−1}(n) → STFT → covariance matrix adaptation Φ_{Z,i}(ω) → FD-MVDR weight adaptation Ŵ_MVDR(ω), with steering input d(ω, θ) → multi-stage DOA estimation θ̂_{i,j} → median filtering → θ̄_{i,j}.]

Fig. 1. Block diagram of the proposed FD-MVDR-based DOA estimation method.
2.1. Overview

Fig. 1 shows a block diagram of the proposed FD-MVDR-based DOA estimation. As shown in the figure, the proposed method first converts the time-domain signal of the m-th microphone, z_m(n), into the frequency-domain signal, Z_m(ω), using a short-term Fourier transform (STFT), in which the number of frequency bins for the STFT is determined based on spatial sampling theory. A covariance matrix, Φ_Z(ω), is then calculated to update the FD-MVDR weight matrix, W(ω), once every frame. Next, a multi-stage approach is applied for the DOA estimation. That is, the region in which a sound source is located is first identified, and then a detailed search is performed within that region. As a result, the DOA at the i-th frame for the j-th sound source, θ̂_{i,j}, is estimated. Finally, a median filter is applied to θ̂_{i,j} to reduce the estimation error.

2.2. FD-MVDR beamformer

For a given j-th sound source, S_j(ω), the propagated signal recorded by an M-channel linear microphone array can be expressed in the frequency domain as [8, 9]

    Z_i(ω) = d(ω, θ) S_j(ω) + N(ω)                                      (1)
where N(ω) is the background noise and d(ω, θ) is the steering vector. In addition, Z_i(ω) is the (M×1)-dimensional column vector obtained by concatenating all the microphone signals at the i-th frame, such as

    Z_i(ω) = [Z_0(ω), …, Z_m(ω), …, Z_{M−1}(ω)]^T                       (2)

where T is the transpose operator and Z_m(ω) is the STFT of the input signal from the m-th microphone. In Eq. (1), the steering vector is represented as [8, 9]

    d(ω, θ) = [e^{jωτ_0(θ)}, …, e^{jωτ_m(θ)}, …, e^{jωτ_{M−1}(θ)}]^T    (3)
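The construction of the steering vector in Eq. (3), with the far-field delay model τ_m(θ) = f_s c⁻¹ l_{0,m} sin(θ) defined in Section 2.2, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the function name `steering_vector` and the default sampling rate and speed of sound are assumptions.

```python
import numpy as np

def steering_vector(omega, theta_deg, M, l, fs=48000.0, c=343.0):
    """Far-field ULA steering vector d(omega, theta) as in Eq. (3).

    omega: angular frequency normalized to the sampling rate (rad/sample),
    theta_deg: candidate DOA in degrees, M: number of microphones,
    l: spacing between adjacent microphones in meters.
    """
    theta = np.deg2rad(theta_deg)
    m = np.arange(M)
    # tau_m(theta) = fs * l_{0,m} * sin(theta) / c, with l_{0,m} = m * l
    tau = fs * (m * l) * np.sin(theta) / c       # delay in samples
    return np.exp(1j * omega * tau)

# Six microphones at 4.2 cm spacing, one frequency bin, DOA of 30 degrees.
d = steering_vector(omega=0.3 * np.pi, theta_deg=30.0, M=6, l=0.042)
print(d.shape)
```

Each entry has unit magnitude; only the phase varies across microphones, encoding the per-channel propagation delay for the hypothesized direction θ.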
where τ_m(θ) = f_s c⁻¹ l_{0,m} sin(θ) is the time delay that arises at the m-th microphone relative to the first one. In addition, f_s, c, and l_{u,v} are the sampling rate, the speed of sound in air, and the microphone spacing between the u-th and v-th microphones, respectively.

The FD-MVDR gives a weight to the input signals according to

    Ŵ_MVDR(ω) = argmin_W W^H(ω) Φ_Z(ω) W(ω)                             (4)

where Ŵ_MVDR(ω) minimizes Eq. (4) under the constraint W^H(ω) d(ω, θ) = 1 [8]. By applying a Lagrange multiplier to Eq. (4), the weights for the FD-MVDR beamformer are obtained as

    Ŵ_MVDR(ω) = Φ_Z^{−1}(ω) d(ω, θ) / (d^H(ω, θ) Φ_Z^{−1}(ω) d(ω, θ))   (5)

Therefore, the proper estimation of the covariance matrix, Φ_Z(ω), in Eq. (4) or (5) plays a crucial role in the performance of the FD-MVDR. Since the aim of this paper is to estimate the DOAs of target sound sources in non-stationary environments, Φ_Z(ω) is updated using the microphone signals up to the i-th frame, such as

    Φ_{Z,i}(ω) = (1/F) [ Z_{1,i}(ω)Z_{1,i}^H(ω) + PF   ⋯   Z_{1,i}(ω)Z_{M,i}^H(ω)
                                    ⋮                  ⋱              ⋮
                         Z_{M,i}(ω)Z_{1,i}^H(ω)        ⋯   Z_{M,i}(ω)Z_{M,i}^H(ω) + PF ]   (6)

where H denotes the conjugate transpose. In Eq. (6), Z_{m,i}(ω) = [Z_{m,i−F+1}(ω), …, Z_{m,i−1}(ω), Z_{m,i}(ω)] is a block matrix built from the (F−1) previous frames and the current i-th frame, and P is a regularization factor that prevents the covariance matrix from being singular.

In order to realize the FD-MVDR with reduced complexity, we need to select a minimum frequency for the DOA search. For a given minimum wavelength of the sources that can be separated by the linear microphone array, λ_min, we set the minimum frequency as ω_spatial = 2π f_min / f_s, where f_min = c / λ_min. In this paper, ω_spatial = 0.2761 for f_min = 2109 Hz and N_fft = 2048.

2.3. DOA estimation

In order to accelerate the DOA search, we first identify the region in which a sound source is assumed to be located. That is, we divide the whole space, I (θ ∈ [−90°, 90°]), into R different regions, such as

    I_r = { θ | Δ(r−1) − 90° ≤ θ < Δr − 90° },  r = 1, …, R             (7)

where Δ = 180°/R is the interval of each region. In this case, the central angle, θ_{C_r}, of I_r is defined as θ_{C_r} = Δ(r − 1/2) − 90°, r = 1, …, R. We then select the region index r^o that provides the maximum average of P^i_MVDR(θ_{C_r}), such as

    r^o = argmax_{1 ≤ r ≤ R} P^i_MVDR(θ_{C_r})                          (8)

where P^i_MVDR(θ) = 1 / ( d^H(ω, θ) Φ_{Z,i}^{−1}(ω) d(ω, θ) ). In fact, we use a smaller number of microphones than M in this coarse stage, because coarse spatial resolution improves region identification. Then, in the r^o-th region selected from Eq. (8), FD-MVDR beamforming with all M microphones is conducted, because the narrower beam pattern is advantageous for estimating the DOA, i.e.,

    θ̂_{i,j} = argmax_{θ ∈ I_{r^o}} P^i_MVDR(θ)                          (9)
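The two-stage search of Eqs. (7)–(9), built on the MVDR pseudo-power of Eq. (8) with the regularized covariance of Eq. (6), can be sketched as follows. This is a single-frequency-bin illustration, not the authors' implementation: the function names, the parameter choices (R = 6 regions, three coarse microphones, P = 10⁻³), and the synthetic test signal are all assumptions made for the sketch.

```python
import numpy as np

FS, C = 48000.0, 343.0   # sampling rate (Hz) and speed of sound (m/s)

def steering(omega, theta_deg, M, l):
    """Steering vector of Eq. (3) for a uniform linear array."""
    tau = FS * np.arange(M) * l * np.sin(np.deg2rad(theta_deg)) / C
    return np.exp(1j * omega * tau)

def mvdr_power(Z, omega, theta_deg, l, P=1e-3):
    """Pseudo-power P_MVDR(theta) = 1 / (d^H Phi^-1 d) at one frequency bin.
    Z holds the F most recent STFT frames of M microphones, shape (M, F)."""
    M, F = Z.shape
    Phi = Z @ Z.conj().T / F + P * np.eye(M)   # regularized covariance, Eq. (6)
    d = steering(omega, theta_deg, M, l)
    return 1.0 / np.real(d.conj() @ np.linalg.solve(Phi, d))

def two_stage_doa(Z, omega, l, R=6, M_coarse=3, step=1.0):
    """Coarse region selection (Eq. 8) followed by a fine search (Eq. 9)."""
    delta = 180.0 / R
    centers = [delta * (r + 0.5) - 90.0 for r in range(R)]
    # Stage 1: scan only the R region centers with a reduced sub-array.
    r_o = int(np.argmax([mvdr_power(Z[:M_coarse], omega, th, l)
                         for th in centers]))
    lo = delta * r_o - 90.0
    # Stage 2: dense scan inside the winning region with all M microphones.
    grid = np.arange(lo, lo + delta, step)
    return grid[int(np.argmax([mvdr_power(Z, omega, th, l) for th in grid]))]

# Synthetic check: one narrowband source at +40 degrees in light noise.
M, l, omega = 6, 0.042, 0.1 * np.pi      # 6 mics, 4.2 cm spacing, ~2.4 kHz bin
rng = np.random.default_rng(1)
S = rng.standard_normal(32) + 1j * rng.standard_normal(32)   # 32 source frames
Z = np.outer(steering(omega, 40.0, M, l), S)
Z = Z + 0.01 * (rng.standard_normal(Z.shape) + 1j * rng.standard_normal(Z.shape))
theta_hat = two_stage_doa(Z, omega, l)
print(theta_hat)
```

The coarse stage evaluates only R candidate angles instead of the full grid, which is where the complexity saving over an exhaustive FD-MVDR beam search comes from; the fine stage then restricts the dense scan to one region of width 180°/R.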
2.4. Median filtering

In order to reduce the DOA estimation errors, a smoothing technique using a median filter is applied. In particular, the median filtering is applied once every frame, shifting by one frame with a window size of 11. As a boundary condition, the values of the first and last frames are replicated. Consequently, a median-filtered version of θ̂_{i,j}, denoted θ̄_{i,j}, is obtained.

3. PERFORMANCE EVALUATION

The performance of the proposed method was compared with those of the conventional GCC-PHAT [2], MVDR-PHAT [5], and FD-MVDR [9] methods. The evaluation database was recorded in an anechoic room of approximately 12 m² with 30 different environmental sound sources, including TV news, air conditioning, and cooking sounds. The test database was composed of two different scenarios, one source and two sources, referred to as Test1 and Test2, respectively. The data was recorded at a 48-kHz sampling rate with 16-bit resolution. All the methods, including the proposed one, were designed to handle 2048 samples per frame, overlapping with half of the previous frame. The uniform linear microphone array was composed of six electret condenser microphones with an inter-microphone distance of l = 4.2 cm; thus, the total aperture was l_total = 21.0 cm. The performance of the DOA estimation was measured using the mean absolute error (MAE), ε, and the real-time factor (RTF). In other words, the MAE was defined as [11]

    ε = (1 / (I·J)) Σ_{i=1}^{I} Σ_{j=1}^{J} | θ^0_{i,j} − θ̄_{i,j} |      (10)

where I and J represent the numbers of frames and sound sources, respectively. In Eq. (10), θ^0_{i,j} and θ̄_{i,j} are the reference and estimated DOAs, respectively, at the i-th frame for the j-th source. Additionally, the RTF was defined as

    RTF = D_O / D_I                                                      (11)

where D_I is the length of the input audio signal in seconds, and D_O is the time elapsed for estimating the DOAs. The RTF was measured on an Intel(R) Core i7-4790K CPU clocked at 4 GHz with 32 GB of RAM, running the Windows 7 64-bit operating system.

[Fig. 2 residue: panels (a)–(f) plot amplitude and DOA (−90° to 90°) against the frame index (50–450) for baby-crying and cat sounds.]

Fig. 2. Comparison of DOA estimation performance for (a) a given waveform, (b) the reference DOAs, and the DOAs estimated by (c) GCC-PHAT, (d) MVDR-PHAT, (e) FD-MVDR, and (f) the proposed method.

Fig. 2 shows the DOAs estimated by the different methods when applied to the two sound sources (Test2 scenario) shown in Fig. 2(a). Fig. 2(b) shows the reference DOA for each sound source, and Figs. 2(c)–(f) show the DOAs estimated by GCC-PHAT, MVDR-PHAT, FD-MVDR, and the proposed method, respectively. As shown in the figure, MVDR-PHAT provided performance superior to GCC-PHAT, but MVDR-PHAT failed
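The median smoothing of Section 2.4 and the MAE of Eq. (10) can be sketched together for a single source track as follows. This is an illustrative sketch, not the evaluation code: the helper names `median_smooth` and `mae` and the synthetic outlier track are assumptions.

```python
import numpy as np

def median_smooth(doa, win=11):
    """Median-filter a per-frame DOA track (Section 2.4, window size 11).
    Edge frames are replicated as the boundary condition."""
    half = win // 2
    padded = np.concatenate((np.full(half, doa[0]), doa, np.full(half, doa[-1])))
    return np.array([np.median(padded[i:i + win]) for i in range(len(doa))])

def mae(ref, est):
    """Mean absolute error of Eq. (10) for a single source track."""
    return float(np.mean(np.abs(np.asarray(ref) - np.asarray(est))))

# A constant 30-degree reference track with a few spurious outlier frames.
ref = np.full(200, 30.0)
est = ref.copy()
est[[20, 90, 150]] = -60.0           # isolated estimation errors
print(mae(ref, est))                 # error before smoothing
print(mae(ref, median_smooth(est)))  # isolated outliers removed by the filter
```

Because the window covers 11 frames, any error burst shorter than 6 frames is voted out by the surrounding correct estimates, which is exactly the behavior that lowers the frame-wise MAE.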
to estimate the DOAs of non-stationary sound sources, such as air-conditioning noise. On the other hand, FD-MVDR outperformed MVDR-PHAT, and the proposed FD-MVDR-based estimation method produced DOAs similar to the reference DOAs.

Tables I and II compare the MAE of the different DOA estimation methods in the Test1 and Test2 scenarios, respectively. The tables show that GCC-PHAT achieved a smaller MAE in the Test1 scenario than in the Test2 scenario. Moreover, the proposed method achieved performance similar to FD-MVDR and a smaller MAE than GCC-PHAT and MVDR-PHAT. Table III compares the average RTF of GCC-PHAT, MVDR-PHAT, FD-MVDR, and the proposed method. The table shows that the proposed method was faster than the FD-MVDR method. Thus, it can be concluded that the proposed method obtained DOAs of multiple sources with accuracy comparable to FD-MVDR, while its complexity was lower than that of FD-MVDR.
4. CONCLUSION

In this paper, an FD-MVDR-based multi-stage DOA estimation method was proposed to reduce the computational complexity of DOA estimation compared to the conventional FD-MVDR-based method. The performance of the proposed method was evaluated in terms of the mean absolute error of the estimated DOAs and the processing time. The evaluation showed that the proposed method was faster than FD-MVDR while maintaining comparable estimation accuracy.
5. ACKNOWLEDGEMENTS

This work was supported in part by a National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (MSIP) (No. 2012-010636), by the ICT R&D program of MSIP/IITP [2014-044-055-002, Loudness Based Broadcasting Loudness and Stress Assessment of Indoor Environment Noises], and by the MSIP under the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-1019) supervised by the NIPA (National IT Industry Promotion Agency).
TABLE I. Comparison of MAE (°) between different DOA estimation methods in the Test1 scenario.

Degree (°)   GCC-PHAT   MVDR-PHAT   FD-MVDR   Proposed
-60          18.77      19.99       20.27     16.59
-30           6.92       8.10        7.22      7.12
  0           2.35       1.58        4.93      6.41
 30           6.03       7.72        5.52      6.28
 60           8.72      12.45        8.99      8.18
Avg.          8.55       9.96        9.38      8.91
TABLE II. Comparison of MAE (°) between different DOA estimation methods in the Test2 scenario.

Degrees (°)   GCC-PHAT   MVDR-PHAT   FD-MVDR   Proposed
-60, 45       43.44      44.38       47.33     44.98
-45, 30       29.77      31.92       33.66     35.12
-30, 15       18.43      19.88       19.02     19.52
-20, 0        13.24      13.66        9.72     11.37
-5, 5         17.25      26.09        5.97      9.23
Avg.          24.42      27.18       23.14     24.04
TABLE III. Comparison of RTF between different DOA estimation methods in each scenario.

Scenario   GCC-PHAT   MVDR-PHAT   FD-MVDR   Proposed
Test1      4.07       23.96       29.86     13.66
Test2      4.06       21.98       27.21     16.29
Avg.       4.06       22.97       28.53     14.97

6. REFERENCES
[1] Z. Xiaofei, et al., "A novel DOA estimation algorithm based on eigen space," in Proc. of IEEE Int. Symp. on Microwave, Antenna, Propagation, and EMC Technologies for Wireless Communications, pp. 551-554 (2007).

[2] M. F. Font, Multi-microphone Signal Processing for Automatic Speech Recognition in Meeting Rooms, MS Thesis, Universitat Politecnica de Catalunya, Spain (2005).

[3] K. C. Kwak and S. S. Kim, "Sound source localization with the aid of excitation source information in home robot environments," IEEE Transactions on Consumer Electronics, vol. 54, no. 2, pp. 852-856 (2008).

[4] C. J. Chun and H. K. Kim, "Sound source separation using interaural intensity difference in real environments," in Proc. of 136th Audio Engineering Society Convention, New York, NY, preprint 8976 (2013).

[5] H. Do and H. F. Silverman, "Robust cross-correlation-based techniques for detecting and locating simultaneous, multiple sound sources," in Proc. of ICASSP, pp. 201-204 (2012).

[6] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications, Springer-Verlag: Berlin, Germany (2001).

[7] J. J. M. Van de Sande, "Real-time beamforming and sound classification parameter generation in public environments," TNO Report TNO-DV 2012 S007 (2012).

[8] M. E. Lockwood, et al., "Effect of multiple nonstationary sources on MVDR beamformers," in Proc. of 37th Asilomar Conf. on Signals, Systems and Computers, pp. 730-734 (2003).

[9] M. E. Lockwood, et al., "Performance of time- and frequency-domain binaural beamformers based on recorded signals from real rooms," Journal of the Acoustical Society of America, vol. 115, no. 1, pp. 379-391 (2004).

[10] W. K. Pratt, Digital Image Processing, 4th Ed., John Wiley & Sons: Hoboken, NJ (2007).

[11] R. J. Hyndman and A. B. Koehler, "Another look at measures of forecast accuracy," International Journal of Forecasting, vol. 22, no. 4, pp. 679-688 (2006).