2014 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA)
NEAR-FIELD SOURCE LOCALIZATION USING SPHERICAL MICROPHONE ARRAY

Lalan Kumar, Kushagra Singhal, and Rajesh M Hegde
Indian Institute of Technology Kanpur
{rhegde,lalank}@iitk.ac.in

This work was funded by the DST project EE/SERB/20130277. The author L. Kumar was supported by the TCS Research Scholarship Program TCS/CS/20110191.
ABSTRACT

Source localization using spherical microphone arrays has received attention due to the ease of array processing in the spherical harmonics (SH) domain with no spatial ambiguity. In this paper, we address the issue of near-field source localization using a spherical microphone array. In particular, three methods that jointly estimate the range and bearing of multiple sources in the spherical array framework are proposed. Two subspace-based methods, the Spherical Harmonics MUltiple SIgnal Classification (SH-MUSIC) and the Spherical Harmonics MUSIC-Group Delay (SH-MGD), are first presented for near-field source localization. Additionally, a method for near-field source localization using the Spherical Harmonics MVDR (SH-MVDR) spectrum is formulated. Experiments on near-field source localization are conducted using a spherical microphone array at various SNRs. The SH-MGD is able to resolve closely spaced sources better than the other methods.

Index Terms— MUSIC, spherical harmonics, near-field, group delay

1. INTRODUCTION

Spherical microphone array processing has been a growing area of research in the last decade [1, 2]. This is primarily because of the relative ease with which array processing can be performed in the spherical harmonics (SH) domain without any spatial ambiguity [3]. Various algorithms have been proposed for far-field source localization using spherical microphone arrays. The Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) algorithm [4] is extended to spherical arrays in [5]. MUltiple SIgnal Classification (MUSIC) [6] is implemented in terms of spherical harmonics in [7]. In [8], a room acoustics analysis based on frequency-domain SH-MUSIC is presented using a spherical array. All of these source localization methods deal with the planar wavefronts of far-field sources. However, in applications such as close-talk microphones (CTM) and video conferencing, the planar wavefront assumption is no longer valid. In [9], the design of a low-order spherical microphone array is proposed to acquire sound from near-field sources. A near-field criterion for spherical arrays is discussed in [10]. However, spherical arrays have not been utilized for near-field source localization. In [11], a 2-Dimensional (2D) MUSIC spectrum is presented for multiple near-field sources using a Uniform Linear Array (ULA). In this work, we propose a 3D SH-MUSIC spectrum for range and bearing (elevation, azimuth) estimation of multiple near-field sources. The MVDR [12] and MUSIC-Group Delay (MGD) [13-16] spectra have also been studied for near-field source localization using a spherical microphone array. The primary contribution of this work is the proposal of novel methods for near-field source localization in the spherical harmonics domain.

The rest of the paper is organized as follows. In Section 2, the signal model in the spherical harmonics domain is presented. The near-field criterion is discussed, followed by the development of the SH-MUSIC, SH-MGD and SH-MVDR methods. The proposed methods are evaluated in Section 3. Section 4 concludes the paper.
2. NEAR-FIELD SOURCE LOCALIZATION USING SPHERICAL MICROPHONE ARRAY
In this section, a mathematical derivation of the 3-Dimensional MUSIC spectrum for near-field sources is presented using spherical harmonics. The SH-MUSIC utilizes the magnitude spectrum. However, the magnitude spectrum degrades under severe environmental conditions such as low SNR, reverberation and closely spaced sources. In [16], high-resolution source localization based on the MUSIC-Group delay spectrum over a ULA has been proposed. The method is non-trivially extended to planar arrays in [14, 15] and to spherical arrays in [13]. In all these works, far-field sources were considered. In this work, a group delay spectrum in the spherical harmonics domain is developed for range and bearing estimation. A beamforming-based SH-MVDR is also formulated for near-field source localization.

2.1. Signal processing in Spherical Harmonics domain
A spherical microphone array of order N with radius r and I sensors is considered. A sound field of spherical waves with wavenumber k from L near-field sources is incident on the array. The lth source location is denoted by
r_l = (r_l, Ψ_l), where Ψ_l = (θ_l, φ_l). The elevation angle θ is measured down from the positive z axis, while the azimuth angle φ is measured counterclockwise from the positive x axis. Similarly, the ith sensor location is given by r_i = (r, Φ_i), where Φ_i = (θ_i, φ_i). In the spatial domain, the sound pressure at the I microphones, p(k) = [p_1(k), p_2(k), ..., p_I(k)]^T, is written as

p(k) = V(k) s(k) + n(k)    (1)
where V(k) is the I × L steering matrix, s(k) is the L × 1 vector of signal amplitudes, n(k) is the I × 1 vector of zero-mean, uncorrelated sensor noise, and (.)^T denotes the transpose. The steering matrix V(k) is expressed as

V(k) = [v_1(k), v_2(k), ..., v_L(k)]    (2)

where

v_l(k) = [ e^{−jk|r_1 − r_l|} / |r_1 − r_l| , ... , e^{−jk|r_I − r_l|} / |r_I − r_l| ]^T    (3)
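The element-space model of Equations 2 and 3 can be written down directly from the microphone and source coordinates. The following Python sketch (NumPy only; the function names and the Cartesian input format are illustrative assumptions, not from the paper) builds one near-field steering vector and stacks the columns of V(k).

```python
import numpy as np

def nearfield_steering_vector(k, mic_xyz, src_xyz):
    """Element-space near-field steering vector of Eq. (3).

    k       : wavenumber (rad/m)
    mic_xyz : (I, 3) Cartesian microphone positions on the sphere
    src_xyz : (3,) Cartesian position of one near-field source
    Returns the I-dimensional vector with entries e^{-jk|r_i - r_l|} / |r_i - r_l|.
    """
    d = np.linalg.norm(mic_xyz - src_xyz, axis=1)   # |r_i - r_l|
    return np.exp(-1j * k * d) / d

def nearfield_steering_matrix(k, mic_xyz, src_xyz_list):
    """Steering matrix V(k) of Eq. (2): one column per source."""
    return np.stack([nearfield_steering_vector(k, mic_xyz, s)
                     for s in src_xyz_list], axis=1)
```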
Denoting the acoustic pressure on the surface of the sphere by p(k, r, θ, φ), the Spherical Fourier Transform (SFT) and its inverse are defined by [17]

p_nm(k, r) = ∫_0^{2π} ∫_0^{π} p(k, r, θ, φ) [Y_n^m(θ, φ)]* sin(θ) dθ dφ    (4)

p(k, r, θ, φ) = Σ_{n=0}^{∞} Σ_{m=−n}^{n} p_nm(k, r) Y_n^m(θ, φ)    (5)

where Y_n^m(θ, φ) is the spherical harmonic of order n and degree m defined in Equation 6, and (.)* denotes the complex conjugate.

Y_n^m(θ, φ) = sqrt( (2n + 1)(n − m)! / (4π (n + m)!) ) P_n^m(cos θ) e^{jmφ}    (6)

It is to be noted that the Y_n^m are solutions to the Helmholtz equation [18] and the P_n^m are associated Legendre functions. The acoustic pressure is sampled by the microphones on the surface of the sphere. Hence, the SFT in Equation 4 can be approximated by the following summation

p_nm(k, r) ≈ Σ_{i=1}^{I} a_i p(k, r, Φ_i) [Y_n^m(Φ_i)]*,   ∀ 0 ≤ n ≤ N, −n ≤ m ≤ n    (7)

where the a_i are the sampling weights [19]. For an order-limited pressure function with order N, Equation 5 can be written as

p(k, r, Φ) ≈ Σ_{n=0}^{N} Σ_{m=−n}^{n} p_nm(k, r) Y_n^m(Φ)    (8)
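Equation 7 is a weighted sum over the microphone positions, so a minimal sketch of the discrete SFT is straightforward. The snippet below assumes that SciPy's sph_harm (azimuth first, then polar angle) matches the convention of Equation 6; the helper name, argument layout and coefficient ordering are illustrative choices.

```python
import numpy as np
from scipy.special import sph_harm

def sft_coefficients(p, theta, phi, weights, N):
    """Discrete Spherical Fourier Transform of Eq. (7).

    p       : (I,) complex pressure samples at the microphones
    theta   : (I,) microphone elevations (from the +z axis, rad)
    phi     : (I,) microphone azimuths (rad)
    weights : (I,) quadrature (sampling) weights a_i
    N       : array order
    Returns p_nm stacked as [(0,0), (1,-1), (1,0), (1,1), ..., (N,N)],
    i.e. the ordering used later in Eqs. (18) and (21).
    """
    pnm = []
    for n in range(N + 1):
        for m in range(-n, n + 1):
            # scipy's sph_harm(m, n, azimuth, polar) is assumed to follow Eq. (6)
            Ynm = sph_harm(m, n, phi, theta)
            pnm.append(np.sum(weights * p * np.conj(Ynm)))
    return np.array(pnm)
```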
The pressure at the ith microphone due to the lth source is p(k, r, Φ_i) = e^{−jk|r_i − r_l|} / |r_i − r_l|, and it is given by [20]

e^{−jk|r_i − r_l|} / |r_i − r_l| = Σ_{n=0}^{N} Σ_{m=−n}^{n} b_n(k, r, r_l) [Y_n^m(Ψ_l)]* Y_n^m(Φ_i)    (9)

where b_n(k, r, r_l) is the near-field mode strength. It is related to the far-field mode strength b_n(kr) as [21]

b_n(k, r, r_l) = j^{−(n−1)} k b_n(kr) h_n(kr_l)    (10)

where

b_n(kr) = 4π j^n j_n(kr),  open sphere    (11)
        = 4π j^n ( j_n(kr) − (j'_n(kr) / h'_n(kr)) h_n(kr) ),  rigid sphere    (12)

Here j_n is the spherical Bessel function, h_n is the spherical Hankel function, j is the unit imaginary number, and (.)' denotes the first derivative. The extra term in the far-field mode strength for the rigid sphere accounts for the pressure scattered from the sphere. The range of the source is captured in the Hankel function h_n(kr_l).
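A hedged sketch of the mode-strength computation of Equations 10-12 is given below. The paper does not state which kind of spherical Hankel function is used; the second kind is assumed here so as to match the e^{−jk|r|} convention of Equation 3, and the helper names are illustrative.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def sph_hankel2(n, x, derivative=False):
    """Spherical Hankel function of the second kind, h_n(x) = j_n(x) - i y_n(x).

    Assumption: the second kind matches the outgoing e^{-jkr}/r convention
    of Eqs. (2)-(3) and (9); the paper does not state the kind explicitly.
    """
    return spherical_jn(n, x, derivative) - 1j * spherical_yn(n, x, derivative)

def farfield_mode_strength(n, kr, rigid=True):
    """Far-field mode strength b_n(kr) of Eqs. (11)-(12)."""
    if rigid:
        correction = (spherical_jn(n, kr, derivative=True)
                      / sph_hankel2(n, kr, derivative=True)) * sph_hankel2(n, kr)
        return 4 * np.pi * (1j ** n) * (spherical_jn(n, kr) - correction)
    return 4 * np.pi * (1j ** n) * spherical_jn(n, kr)

def nearfield_mode_strength(n, k, r, r_l, rigid=True):
    """Near-field mode strength b_n(k, r, r_l) of Eq. (10)."""
    return (1j ** (-(n - 1))) * k * farfield_mode_strength(n, k * r, rigid) \
           * sph_hankel2(n, k * r_l)
```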
2.2. Near-field criterion in spherical harmonics domain
In general, the boundary between the near field and the far field is decided by the Fraunhofer distance [22]. However, this parameter does not indicate the extent of the near field in the spherical harmonics domain. For a spherical array, a near-field criterion is presented in [10] based on the similarity of the near-field mode strength |b_n(k, r, r_l)| and the far-field mode strength |b_n(kr)|. The two functions start behaving in a similar way at kr_l ≈ N, for an array of order N. This is illustrated in Figure 1 for the rigid-sphere Eigenmike system [23] with r_l = 1 m and order varying from n = 0 to n = 4. Hence, the near-field condition for the spherical array becomes

r_NF ≈ N / k    (13)

But r_NF ≥ r, r being the radius of the sphere. So the highest wavenumber possible is

k_max = N / r    (14)

From Equations 13 and 14,

r_NF = r k_max / k    (15)

Hence, for a source to be in the near field, the range of the source should satisfy

r ≤ r_l ≤ r k_max / k    (16)

Fig. 1. Plot showing the nature of the far-field and near-field mode strength for a rigid sphere. The near-field source is at r_l = 1 m and the order is varied from n = 0 (top) to n = 4 (bottom).
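Equations 13-16 reduce to a simple interval check on the candidate range. The sketch below assumes a speed of sound of 343 m/s; the numerical example in the comment uses the Eigenmike-like geometry of Section 3 purely as an illustration.

```python
import numpy as np

def nearfield_range_bounds(freq, r, N, c=343.0):
    """Range interval of Eq. (16) within which a source is treated as near-field.

    freq : operating frequency in Hz
    r    : array radius in m
    N    : array order
    c    : speed of sound in m/s (assumed 343 m/s)
    Returns (r_min, r_max) = (r, r * k_max / k), with k = 2*pi*freq/c
    and k_max = N / r from Eq. (14).
    """
    k = 2 * np.pi * freq / c
    k_max = N / r
    return r, r * k_max / k

# Illustration for an Eigenmike-like array (r = 4.2 cm, N = 4):
# nearfield_range_bounds(1000.0, 0.042, 4) gives roughly (0.042, 0.22) m at 1 kHz.
```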
Fig. 2. Illustration of azimuth and elevation estimation by (a) SH-MUSIC and (b) SH-MGD, and of range and azimuth estimation using (c) SH-MUSIC and (d) SH-MGD. The sources are at (0.4 m, 60°, 30°) and (0.5 m, 55°, 35°) at an SNR of 10 dB.
2.3. The Spherical Harmonics MUSIC (SH-MUSIC) spectrum for near-field source localization
This section presents the formulation of the proposed SH-MUSIC spectrum for near-field source localization. Substituting the expression for the pressure from Equation 9 into Equation 3, the steering matrix in Equation 2 can be written as

V(k) = Y(Φ) [B(r_1) y^H(Ψ_1), ..., B(r_L) y^H(Ψ_L)]    (17)

where Y(Φ) is an I × (N + 1)^2 matrix whose ith row vector can be written as

y(Φ_i) = [Y_0^0(Φ_i), Y_1^{−1}(Φ_i), Y_1^0(Φ_i), Y_1^1(Φ_i), ..., Y_N^N(Φ_i)]    (18)

and y(Ψ_l) is a 1 × (N + 1)^2 vector with the same structure as in Equation 18, evaluated at the angle Ψ_l, l = 1, 2, ..., L. The (N + 1)^2 × (N + 1)^2 matrix B(r_l) is given by

B(r_l) = diag( b_0(k, r, r_l), b_1(k, r, r_l), b_1(k, r, r_l), b_1(k, r, r_l), ..., b_N(k, r, r_l) )    (19)

The dependency of B(r_l) on k and r is dropped for notational simplicity. Substituting (17) in (1), multiplying both sides by Y^H(Φ)Γ and utilizing Equation 7, the data model becomes

p_nm(k, r) = Y^H(Φ) Γ Y(Φ) [B(r_1) y^H(Ψ_1), ..., B(r_L) y^H(Ψ_L)] s(k) + n_nm(k)    (20)

where Γ = diag(a_1, a_2, ..., a_I) consists of the sampling weights used in Equation 7 and

p_nm = [p_00, p_1(−1), p_10, p_11, ..., p_NN]^T    (21)

The orthogonality of the spherical harmonics under spatial sampling suggests [19]

Y^H(Φ) Γ Y(Φ) ≈ I    (22)

Hence, the data model finally becomes

p_nm(k, r) = [B(r_1) y^H(Ψ_1), ..., B(r_L) y^H(Ψ_L)] s(k) + n_nm(k)    (23)

where B(r_s) y^H(Ψ_s) is taken to be the look-up steering vector. The 3-Dimensional MUSIC spectrum in the spherical harmonics domain can now be written as

P_MUSIC(r_s, Ψ_s) = 1 / ( y(Ψ_s) B^H S^{NS}_{pnm} [S^{NS}_{pnm}]^H B y^H(Ψ_s) )    (24)

The search is performed over r_s as in Equation 16 and over Ψ_s with 0 ≤ θ_s ≤ π, 0 ≤ φ_s ≤ 2π. S^{NS}_{pnm} is the noise subspace obtained from the eigenvalue decomposition of the autocorrelation matrix S_pnm, defined as

S_pnm = E[ p_nm(k, r) p_nm(k, r)^H ]    (25)

The denominator of the MUSIC spectrum tends to zero when (r_s, Ψ_s) corresponds to a source location, owing to the orthogonality between the noise eigenvectors and the steering vector. Hence, a peak is obtained in the MUSIC spectrum at the source location.
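The spectrum of Equations 23-25 can be evaluated on a grid of candidate (r_s, θ_s, φ_s) points as sketched below. The sketch reuses the nearfield_mode_strength helper from the earlier mode-strength sketch, assumes the SH-domain snapshots are available as an ((N+1)^2 × T) matrix, and the function names and grid format are illustrative.

```python
import numpy as np
from scipy.special import sph_harm

def sh_steering_vector(N, k, r, r_s, theta_s, phi_s, rigid=True):
    """Look-up steering vector B(r_s) y^H(Psi_s) of Eq. (23).

    Relies on nearfield_mode_strength() from the earlier sketch.
    """
    v = []
    for n in range(N + 1):
        b_n = nearfield_mode_strength(n, k, r, r_s, rigid)
        for m in range(-n, n + 1):
            v.append(b_n * np.conj(sph_harm(m, n, phi_s, theta_s)))
    return np.array(v)                      # shape ((N+1)^2,)

def sh_music_spectrum(pnm_snapshots, L, N, k, r, grid, rigid=True):
    """SH-MUSIC spectrum of Eq. (24) on a list of (r_s, theta_s, phi_s) points."""
    # Eq. (25): sample autocorrelation of the SH-domain snapshots ((N+1)^2 x T)
    S = pnm_snapshots @ pnm_snapshots.conj().T / pnm_snapshots.shape[1]
    eigval, eigvec = np.linalg.eigh(S)      # ascending eigenvalues
    Qn = eigvec[:, :(N + 1) ** 2 - L]       # noise subspace S^NS (smallest eigenvalues)
    P = []
    for (r_s, theta_s, phi_s) in grid:
        a = sh_steering_vector(N, k, r, r_s, theta_s, phi_s, rigid)
        proj = Qn.conj().T @ a              # projection onto the noise subspace
        P.append(1.0 / np.real(proj.conj() @ proj))
    return np.array(P)
```

In a complete implementation, the grid would be built from the range interval of Equation 16 and an angular sampling of (θ_s, φ_s).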
2.4. Near-field source localization using Spherical Harmonic MUSIC-Group Delay (SH-MGD) spectrum

The SH-MUSIC utilizes the magnitude of y(Ψ_s) B^H S^{NS}_{pnm}, as is clear from Equation 24. The phase spectrum of MUSIC is utilized in [13-16] for robust source localization. A sharp change in the unwrapped phase is seen at the Direction of Arrival (DOA) [14, 16]. Hence, the negative differentiation of the unwrapped phase spectrum (the group delay) results in peaks at the DOAs. In practice, abrupt changes can occur in the phase due to small variations in the signal caused by microphone calibration errors. Hence, the group delay spectrum may sometimes have spurious peaks. The product of the MUSIC and group delay spectra, called the MUSIC-Group delay, removes such spurious peaks and gives a high-resolution estimate. The Spherical Harmonics MUSIC-Group delay (SH-MGD) spectrum is computed as

P_MGD(r_s, Ψ_s) = ( Σ_{u=1}^{U} | ∇ arg( y(Ψ_s) B^H q_u ) |^2 ) · P_MUSIC    (26)

where U = (N + 1)^2 − L, ∇ is the gradient operator, arg(.) indicates the unwrapped phase, and q_u represents the uth eigenvector of the noise subspace S^{NS}_{pnm}. The first term within the parentheses is the group delay spectrum. The gradient is taken with respect to (r_s, θ_s, φ_s).

Figure 2 illustrates the performance of SH-MUSIC and SH-MGD for range and bearing estimation using a spherical microphone array. The simulation was done considering an open sphere with two closely spaced sources at (0.4 m, 60°, 30°) and (0.5 m, 55°, 35°) at an SNR of 10 dB. Figures 2(a) and 2(b) show the plots corresponding to elevation and azimuth estimation. It is clear that SH-MGD exhibits higher resolving power. The plots in Figures 2(c) and 2(d) show the range and azimuth of the sources. The high resolution of MGD is due to the additive property of the group delay spectrum. The additive property is proved mathematically in our earlier work for the ULA [16] and the UCA [15]. While this is valid for the spherical array also, the mathematical proof is being developed.
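A minimal sketch of Equation 26 along a one-dimensional scan (for example an azimuth grid with the range and elevation fixed) is shown below; the full gradient over (r_s, θ_s, φ_s) in the paper would apply the same phase unwrapping and differentiation along each search dimension. The numerical gradient and the input layout are implementation assumptions, not from the paper.

```python
import numpy as np

def sh_mgd_spectrum(steering_vectors, Qn, p_music, delta):
    """SH-MGD spectrum of Eq. (26) along a 1-D scan (e.g. an azimuth grid).

    steering_vectors : (G, (N+1)^2) look-up steering vectors B y^H(Psi_s)
                       evaluated on G consecutive grid points
    Qn               : ((N+1)^2, U) noise-subspace eigenvectors q_u
    p_music          : (G,) SH-MUSIC spectrum of Eq. (24) on the same grid
    delta            : grid spacing used for the numerical gradient
    """
    # Inner products y(Psi_s) B^H q_u for every grid point and noise eigenvector
    inner = steering_vectors.conj() @ Qn                 # (G, U)
    phase = np.unwrap(np.angle(inner), axis=0)           # unwrapped phase along the scan
    group_delay = -np.gradient(phase, delta, axis=0)     # negative phase derivative
    # Sum of squared group delays over the noise eigenvectors, times P_MUSIC
    return np.sum(np.abs(group_delay) ** 2, axis=1) * p_music
```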
2.5. The Spherical Harmonics MVDR (SH-MVDR) spectrum for range and bearing estimation
The conventional MVDR minimizes the contribution of interference impinging on the array from directions other than Ψ_s, while it maintains a fixed gain in the look direction Ψ_s. Along similar lines, the SH-MVDR spectrum for near-field source localization can be written as

P_MVDR(r_s, Ψ_s) = 1 / ( y(Ψ_s) B^H S_pnm^{−1} B y^H(Ψ_s) )    (27)
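Equation 27 replaces the noise-subspace projection of Equation 24 with the inverse of the SH-domain autocorrelation matrix. The sketch below adds a small diagonal loading before inversion, which is an implementation choice and not part of the paper; the input layout matches the earlier sketches.

```python
import numpy as np

def sh_mvdr_spectrum(steering_vectors, S_pnm, diag_load=1e-6):
    """SH-MVDR spectrum of Eq. (27) on a set of candidate (r_s, Psi_s) points.

    steering_vectors : (G, (N+1)^2) look-up steering vectors B y^H(Psi_s)
    S_pnm            : ((N+1)^2, (N+1)^2) SH-domain autocorrelation of Eq. (25)
    diag_load        : small diagonal loading added before inversion
                       (an implementation choice, not from the paper)
    """
    S_inv = np.linalg.inv(S_pnm + diag_load * np.eye(S_pnm.shape[0]))
    # Denominator y(Psi_s) B^H S^{-1} B y^H(Psi_s) = a^H S^{-1} a per grid point
    quad = np.einsum('gi,ij,gj->g', steering_vectors.conj(), S_inv, steering_vectors)
    return 1.0 / np.real(quad)
```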
3. PERFORMANCE EVALUATION

The proposed methods, SH-MUSIC, SH-MGD and SH-MVDR, are evaluated by conducting experiments on source localization. The estimated range and bearing are tabulated at various SNRs. The proposed algorithms were tested in a room with dimensions 7.3 m × 6.2 m × 3.4 m. An Eigenmike microphone array [23] was used for the simulation. It consists of 32 microphones embedded in a rigid sphere of radius 4.2 cm. The order of the array was taken to be N = 4.

3.1. Experiments on source localization

Two sets of experiments were conducted. For the first experiment, two closely spaced narrowband sources were placed in the near-field region at (0.4 m, 60°, 30°) and (0.4 m, 65°, 35°). The range of the sources was kept fixed at 0.4 m. The experiments were conducted at SNRs of 0 dB and 8 dB. The additive noise is assumed to be zero-mean Gaussian. The mean estimates of azimuth and elevation are presented in the first part of Table 1. In the second experiment, the sources were positioned at (0.4 m, 60°, 30°) and (0.5 m, 65°, 35°). The range and the azimuth were estimated at SNRs of 5 dB and 10 dB, with the elevation fixed. The results shown in Table 1 are obtained from 300 independent Monte Carlo trials. It is clear that SH-MGD performs reasonably better than SH-MUSIC. Both of these methods outperform MVDR.

Table 1. Localization experiments. Set 1: SNR 0 dB and 8 dB for fixed range, estimates given as (elevation, azimuth) in degrees. Set 2: SNR 5 dB and 10 dB for fixed elevation, estimates given as (range in m, azimuth in degrees).

SNR     Source   SH-MGD           SH-MUSIC         MVDR
0 dB    S1       (60.46, 29.82)   (60.04, 30.02)   (58.35, 29.22)
        S2       (65.01, 34.94)   (65.00, 35.00)   (63.67, 34.19)
8 dB    S1       (60.00, 29.96)   (60.00, 29.99)   (61.15, 29.33)
        S2       (65.00, 35.00)   (65.00, 35.00)   (63.65, 34.43)
5 dB    S1       (0.416, 29.91)   (0.429, 30.11)   (0.367, 29.26)
        S2       (0.548, 34.91)   (0.560, 34.49)   (0.541, 33.28)
10 dB   S1       (0.409, 30.00)   (0.410, 30.00)   (0.406, 30.06)
        S2       (0.510, 35.00)   (0.514, 35.00)   (0.548, 33.40)

4. CONCLUSION

In this work, 3-Dimensional SH-MUSIC, SH-MGD and SH-MVDR spectra are proposed for near-field source localization. Since the phase spectrum of MUSIC is more robust to noise, the SH-MGD provides higher resolution. The proof of the additive property of the group delay in the spherical harmonics domain is currently being developed. The detailed relative performance of SH-MUSIC and SH-MGD for closely spaced sources under reverberation will be addressed in future work. The Cramér-Rao bound in the spherical harmonics domain is also being developed for the performance analysis of the proposed methods.
References
[1] Jens Meyer and Gary Elko, "A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield," in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. IEEE, 2002, vol. 2, pp. II-1781.

[2] John McDonough, Kenichi Kumatani, Takayuki Arakawa, Kazumasa Yamamoto, and Bhiksha Raj, "Speaker tracking with spherical microphone arrays," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.

[3] Israel Cohen and Jacob Benesty, Speech Processing in Modern Communication: Challenges and Perspectives, vol. 3, Springer, 2010.

[4] R. Roy, A. Paulraj, and T. Kailath, "Estimation of signal parameters via rotational invariance techniques - ESPRIT," in 30th Annual Technical Symposium. International Society for Optics and Photonics, 1986, pp. 94-101.

[5] Roald Goossens and Hendrik Rogier, "Closed-form 2D angle estimation with a spherical array via spherical phase mode excitation and ESPRIT," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on. IEEE, 2008, pp. 2321-2324.

[6] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. AP-34, pp. 276-280, 1986.

[7] Xuan Li, Shefeng Yan, Xiaochuan Ma, and Chaohuan Hou, "Spherical harmonics MUSIC versus conventional MUSIC," Applied Acoustics, vol. 72, no. 9, pp. 646-652, 2011.

[8] Dima Khaykin and Boaz Rafaely, "Acoustic analysis by spherical microphone array processing of room impulse responses," The Journal of the Acoustical Society of America, vol. 132, pp. 261, 2012.

[9] Jens Meyer and Gary W. Elko, "Position independent close-talking microphone," Signal Processing, vol. 86, no. 6, pp. 1254-1259, 2006.

[10] E. Fisher and B. Rafaely, "The nearfield spherical microphone array," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, 2008, pp. 5272-5275.

[11] Y.-D. Huang and Mourad Barkat, "Near-field multiple source localization by passive sensor array," Antennas and Propagation, IEEE Transactions on, vol. 39, no. 7, pp. 968-975, 1991.

[12] Jack Capon, "High-resolution frequency-wavenumber spectrum analysis," Proceedings of the IEEE, vol. 57, no. 8, pp. 1408-1418, 1969.

[13] Lalan Kumar, Kushagra Singhal, and Rajesh M. Hegde, "Robust source localization and tracking using MUSIC-group delay spectrum over spherical arrays," in Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2013 IEEE 5th International Workshop on, St. Martin, France. IEEE, 2013, pp. 304-307.

[14] Lalan Kumar, Ardhendu Tripathy, and Rajesh M. Hegde, "Robust multi-source localization over planar arrays using MUSIC-group delay spectrum," IEEE Transactions on Signal Processing, under review, 2014.

[15] Ardhendu Tripathy, L. Kumar, and Rajesh M. Hegde, "Group delay based methods for speech source localization over circular arrays," in Hands-free Speech Communication and Microphone Arrays (HSCMA), 2011 Joint Workshop on. IEEE, 2011, pp. 64-69.

[16] Mrityunjaya Shukla and Rajesh M. Hegde, "Significance of the MUSIC-group delay spectrum in speech acquisition from distant microphones," in Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010, pp. 2738-2741.

[17] James R. Driscoll and Dennis M. Healy, "Computing Fourier transforms and convolutions on the 2-sphere," Advances in Applied Mathematics, vol. 15, no. 2, pp. 202-250, 1994.

[18] Earl G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography, Elsevier, 1999.

[19] Boaz Rafaely, "Analysis and design of spherical microphone arrays," Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 1, pp. 135-143, 2005.

[20] Boaz Rafaely, "Plane-wave decomposition of the sound field on a sphere by spherical convolution," The Journal of the Acoustical Society of America, vol. 116, pp. 2149, 2004.

[21] Etan Fisher and Boaz Rafaely, "Near-field spherical microphone array processing with radial filtering," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, no. 2, pp. 256-265, 2011.

[22] Constantine A. Balanis, Antenna Theory: Analysis and Design, John Wiley & Sons, 2012.

[23] The Eigenmike Microphone Array, http://www.mhacoustics.com/.