Fast Steered Response Power Computation in 3D Spatial Regions Amir Said, Bowon Lee, Ton Kalker HP Laboratories HPL-2013-40 Keyword(s): volumetric integral; branch-and-bound;
External Posting Date: June 21, 2013 [Fulltext] Internal Posting Date: June 21, 2013 [Fulltext]
Approved for External Publication
Copyright 2013 Hewlett-Packard Development Company, L.P.
Fast Steered Response Power Computation in 3D Spatial Regions Amir Said, Bowon Lee, Ton Kalker May 13, 2013
1 Problem statement and notation
We consider the set of continuous-time¹ audio signals $\{s_1(t), s_2(t), \ldots, s_M(t)\}$ corresponding to $M$ different microphones, and the problem of finding the location of a sound source in three-dimensional space, identified by a coordinate vector $\mathbf{x} = (x, y, z)$. In the general case we can consider that the signals $s_i(t)$ have been prefiltered to improve performance [1, 2]. Using $\tau_i(\mathbf{x})$ to represent the time delay added to microphone $i$ for position $\mathbf{x}$, steered-beamformer (steered response power, SRP) source localization is based on maximizing²

$$
\begin{aligned}
W(\mathbf{x}) &= \int_{-\infty}^{\infty} \left[ \sum_{i=1}^{M} s_i(t - \tau_i(\mathbf{x})) \right]^2 dt, && (1) \\
&= \int_{-\infty}^{\infty} \left[ \sum_{i=1}^{M} \sum_{j=1}^{M} s_i(t - \tau_i(\mathbf{x}))\, s_j(t - \tau_j(\mathbf{x})) \right] dt, && (2) \\
&= \sum_{i=1}^{M} \sum_{j=1}^{M} \int_{-\infty}^{\infty} s_i(t - \tau_i(\mathbf{x}))\, s_j(t - \tau_j(\mathbf{x}))\, dt. && (3)
\end{aligned}
$$
Defining

$$
\phi_{i,j}(\tau) = \int_{-\infty}^{\infty} s_i(t + \tau)\, s_j(t)\, dt, \tag{4}
$$

and

$$
\xi_{i,j}(\mathbf{x}) = \tau_i(\mathbf{x}) - \tau_j(\mathbf{x}), \tag{5}
$$

we have

$$
W(\mathbf{x}) = \sum_{i=1}^{M} \sum_{j=1}^{M} \phi_{i,j}(\xi_{i,j}(\mathbf{x})). \tag{6}
$$

¹ The main ideas here are also valid for the more common case of discrete-time signals, but the continuous-time case has simpler notation and a more intuitive interpretation.
² Note that the main results are also valid for variations like SRP-PHAT, etc.
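Since $\phi_{i,j}(\tau) = \phi_{j,i}(-\tau)$, $\xi_{i,j}(\mathbf{x}) = -\xi_{j,i}(\mathbf{x})$, and $\xi_{i,i}(\mathbf{x}) \equiv 0$, we could consider for maximization only the sum over each of the $M(M-1)/2$ microphone pairs. However, we later need the exact non-negative value of $W(\mathbf{x})$, so we include the terms defining the power in each microphone, and have a sum over $P = M(M+1)/2$ values. Defining³ $p = i(i-1)/2 + j$ (for $1 \le j \le i \le M$) and replacing subscripts $i, j$ with $p$, we can, with the proper definition of the functions $\phi_p(\xi_p(\mathbf{x}))$ (that is, $\phi_p = \phi_{i,i}$ when $j = i$ and $\phi_p = 2\phi_{i,j}$ when $j < i$), write the sum as

$$
W(\mathbf{x}) = \sum_{i=1}^{M} \left[ \phi_{i,i}(0) + 2 \sum_{j=1}^{i-1} \phi_{i,j}(\xi_{i,j}(\mathbf{x})) \right] = \sum_{p=1}^{P} \phi_p(\xi_p(\mathbf{x})). \tag{7}
$$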
With this formulation, if we want to compute $W(\mathbf{x})$ in a set of $N$ spatial points $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$, we can pre-compute the functions⁴ $\phi_p(\tau)$ and the delay differences $\xi_p(\mathbf{x}_n)$ for each $\mathbf{x}_n$, and then efficiently compute $W(\mathbf{x}_n)$ with simple table look-up and (if desired) interpolation.
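The look-up scheme can be sketched in Python with NumPy as follows. This is only an illustration under assumptions that are not in the text: the signals are discrete-time and treated as periodic (circular correlation), the delays are integer numbers of samples, and all function names are hypothetical. The sketch tabulates $\phi_{i,j}$ for the $P = M(M+1)/2$ pairs with $j \le i$ and checks that the pair sum (7) agrees with the full double sum (6).

```python
import numpy as np

def cross_correlation(si, sj):
    """phi_ij[tau] = sum_t si[t + tau] * sj[t]: Eq. (4) with circular indexing."""
    n = len(si)
    return np.array([np.dot(np.roll(si, -tau), sj) for tau in range(n)])

def precompute_pair_tables(signals):
    """Tabulate phi_ij for the P = M(M+1)/2 pairs with j <= i (0-based indices)."""
    return {(i, j): cross_correlation(signals[i], signals[j])
            for i in range(len(signals)) for j in range(i + 1)}

def srp_pair_sum(tables, delays):
    """Eq. (7): sum_i [ phi_ii[0] + 2 * sum_{j<i} phi_ij[tau_i - tau_j] ]."""
    total = 0.0
    for i in range(len(delays)):
        total += tables[(i, i)][0]
        for j in range(i):
            table = tables[(i, j)]
            total += 2.0 * table[(delays[i] - delays[j]) % len(table)]
    return total

def srp_double_sum(signals, delays):
    """Eq. (6): full double sum over all M*M index pairs, used only as a reference."""
    n = len(signals[0])
    total = 0.0
    for i, si in enumerate(signals):
        for j, sj in enumerate(signals):
            total += cross_correlation(si, sj)[(delays[i] - delays[j]) % n]
    return total

rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 64))     # M = 4 microphones, 64 samples each
delays = [3, 0, 7, 2]                      # hypothetical integer delays for one position x
tables = precompute_pair_tables(signals)   # done once, reused for every candidate position
print(srp_pair_sum(tables, delays), srp_double_sum(signals, delays))
```

Once the tables exist, each candidate position costs only $P$ table look-ups, which is the source of the $N \times P$ complexity discussed next.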
³ This is simply one of the many possible ways to count ordered pairs.
⁴ Function values at points of interest can be stored in arrays, possibly together with coefficients for an interpolation method, like cubic splines.

2 Volumetric energy measures
With the table look-up approach, the complexity of the sound source search is proportional to the number of candidate positions $N$ times the number of microphone pairs $P$. If we want to improve beamforming performance by increasing the number of microphones, the number of pairs grows quadratically. In addition, better steering needs more spatial accuracy, which in turn can greatly increase the number of points where the steered sound power needs to be evaluated. Complexity can be reduced by computing the sound power in 3-D space regions, and doing a hierarchical search, subdividing only the regions where acoustic activity is detected. Note that the same approach can be used for SRP implementations that search on 2-D surfaces, simply reducing the search space from volumes to areas, and the volumetric integrals to surface integrals. To simplify notation and presentation we describe next only the volumetric case.
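Let $\mathcal{V}$ represent one region of space. We want to efficiently compute

$$
W(\mathcal{V}) = \int_{\mathbf{x} \in \mathcal{V}} W(\mathbf{x})\, d\mathbf{x} = \sum_{p=1}^{P} \int_{\mathbf{x} \in \mathcal{V}} \phi_p(\xi_p(\mathbf{x}))\, d\mathbf{x}. \tag{8}
$$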
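We can expand $\phi_p(\xi_p(\mathbf{x}))$ as

$$
\begin{aligned}
\phi_p(\xi_p(\mathbf{x})) &= \int_{-\infty}^{\infty} \phi_p(\zeta)\, \delta(\zeta - \xi_p(\mathbf{x}))\, d\zeta, && (9) \\
&= \int_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} \phi_p(\zeta)\, \delta(\zeta - \xi_p(\mathbf{x}))\, d\zeta, && (10)
\end{aligned}
$$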
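where $\delta(\zeta)$ is Dirac's delta function⁵, and compute

$$
\begin{aligned}
W(\mathcal{V}) &= \sum_{p=1}^{P} \int_{\mathbf{x} \in \mathcal{V}} \int_{-\infty}^{\infty} \phi_p(\zeta)\, \delta(\zeta - \xi_p(\mathbf{x}))\, d\zeta\, d\mathbf{x}, && (11) \\
&= \sum_{p=1}^{P} \int_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} \phi_p(\zeta) \int_{\mathbf{x} \in \mathcal{V}} \delta(\zeta - \xi_p(\mathbf{x}))\, d\mathbf{x}\, d\zeta. && (12)
\end{aligned}
$$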
Defining

$$
\chi_p(\zeta, \mathcal{V}) = \int_{\mathbf{x} \in \mathcal{V}} \delta(\zeta - \xi_p(\mathbf{x}))\, d\mathbf{x}, \tag{13}
$$
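we can rewrite (8) as

$$
W(\mathcal{V}) = \sum_{p=1}^{P} \int_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} \chi_p(\zeta, \mathcal{V})\, \phi_p(\zeta)\, d\zeta. \tag{14}
$$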
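One simple way to approximate $\chi_p(\zeta, \mathcal{V})$ of (13) is to sample points uniformly in the region and histogram their delay differences, since $\chi_p(\zeta, \mathcal{V})\, d\zeta$ is the volume of the part of the region where $\xi_p(\mathbf{x})$ falls in $[\zeta, \zeta + d\zeta]$. The sketch below is not from the report: it assumes a box-shaped region, a hypothetical microphone pair, free-field propagation with a speed of sound of 343 m/s, and Monte Carlo sampling, and all names are illustrative.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def delay_difference(points, mic_a, mic_b):
    """xi(x) = (||x - mic_a|| - ||x - mic_b||) / c for each row of points."""
    return (np.linalg.norm(points - mic_a, axis=1)
            - np.linalg.norm(points - mic_b, axis=1)) / C

def estimate_chi(box_lo, box_hi, mic_a, mic_b, bins=64, samples=200_000, seed=0):
    """Monte Carlo estimate of chi(zeta, V) on a uniform grid of zeta bins."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(box_lo, box_hi, size=(samples, 3))
    xi = delay_difference(points, mic_a, mic_b)
    s = np.linalg.norm(mic_a - mic_b) / C        # |xi| can never exceed s
    counts, edges = np.histogram(xi, bins=bins, range=(-s, s))
    volume = np.prod(box_hi - box_lo)
    width = edges[1] - edges[0]
    chi = counts / samples * volume / width      # volume per unit of zeta
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, chi, width

def region_power(phi, centers, chi, width):
    """Approximate one term of Eq. (14): integral of chi(zeta) * phi(zeta) d zeta."""
    return np.sum(chi * phi(centers)) * width

box_lo, box_hi = np.array([1.0, 0.5, 0.0]), np.array([2.0, 1.5, 1.0])
mic_a, mic_b = np.array([-0.1, 0.0, 0.0]), np.array([0.1, 0.0, 0.0])
centers, chi, width = estimate_chi(box_lo, box_hi, mic_a, mic_b)
print(np.sum(chi) * width)   # integral of chi over zeta: the volume of the box
print(region_power(lambda z: np.ones_like(z), centers, chi, width))
```

The table `chi` is computed once per region and microphone pair; at run time only the one-dimensional sum in `region_power` depends on the measured signals.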
Since the function $\chi_p(\zeta, \mathcal{V})$ depends only on the integration region $\mathcal{V}$ and the spatial arrangement of microphones, it can be pre-computed before sound localization. Appendices A and B contain some examples of how this can be done for discrete and continuous functions, respectively. The complexity of computing the integrals in (14) can be much smaller than that of computing (7) at a large number of points. First, most of the complexity and accuracy problems related to volumetric integrals are eliminated with the pre-computation of (13). Second, the integral can be computed even more efficiently if $\chi_p(\zeta, \mathcal{V})$ can be well approximated as a piecewise linear function. We can expand the integrals in (14) as

$$
\int_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} \chi_p(\zeta, \mathcal{V})\, \phi_p(\zeta)\, d\zeta = \Big[ \chi_p(\zeta, \mathcal{V})\, \Phi_p(\zeta) \Big]_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} - \int_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} \Phi_p(\zeta)\, \chi'_p(\zeta, \mathcal{V})\, d\zeta, \tag{15}
$$

where

$$
\Phi_p(\zeta) = \int_{-\infty}^{\zeta} \phi_p(z)\, dz, \tag{16}
$$

can be efficiently pre-computed after $\phi_p(\zeta)$ is computed, and

$$
\chi'_p(\zeta, \mathcal{V}) = \left. \frac{d\chi_p(z, \mathcal{V})}{dz} \right|_{z=\zeta}, \tag{17}
$$

depends only on $\mathcal{V}$ and microphone geometry.

⁵ This function allows describing the ideas very succinctly; a more careful analysis is presented in Appendix C.
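If $\chi_p(\zeta, \mathcal{V})$ is piecewise linear with breakpoints $\{\zeta_1 = \zeta^{\min}_{p,\mathcal{V}}, \zeta_2, \ldots, \zeta_K = \zeta^{\max}_{p,\mathcal{V}}\}$, then $\chi'_p(\zeta, \mathcal{V})$ assumes only $K-1$ constant values, one per linear piece, and we can compute

$$
\int_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} \chi_p(\zeta, \mathcal{V})\, \phi_p(\zeta)\, d\zeta = \Big[ \chi_p(\zeta, \mathcal{V})\, \Phi_p(\zeta) \Big]_{\zeta^{\min}_{p,\mathcal{V}}}^{\zeta^{\max}_{p,\mathcal{V}}} - \sum_{k=1}^{K-1} \chi'_p(\zeta_k, \mathcal{V}) \int_{\zeta_k}^{\zeta_{k+1}} \Phi_p(\zeta)\, d\zeta, \tag{18}
$$

where $\chi'_p(\zeta_k, \mathcal{V})$ denotes the constant slope on the interval $[\zeta_k, \zeta_{k+1}]$.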
Note that the above equations represent one possible acceleration for the computation of the integral (14). Other techniques based on different forms of representing χp (ζ, V) are also possible.
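The integration-by-parts acceleration can be illustrated numerically. In the sketch below, the breakpoints and values of $\chi$ are hypothetical, and $\phi = \cos$ is only a stand-in chosen because its antiderivatives are known in closed form ($\Phi = \sin$, and $\Psi = -\cos$ as an antiderivative of $\Phi$). With a piecewise-constant $\chi'$, the remaining integral of $\Phi$ over each linear piece reduces to a difference of $\Psi$ values, which is our reading of how the sum over pieces would be evaluated in practice.

```python
import numpy as np

# Hypothetical piecewise-linear chi: breakpoints zeta_k and values chi(zeta_k).
zeta_k = np.array([-1.0, -0.4, 0.3, 1.1, 2.0])
chi_k = np.array([0.0, 1.5, 2.0, 0.7, 0.0])

# Stand-in correlation function with known antiderivatives (an assumption for the demo).
phi = np.cos                    # phi(zeta)
Phi = np.sin                    # an antiderivative of phi
Psi = lambda z: -np.cos(z)      # an antiderivative of Phi

def integral_by_parts(zeta_k, chi_k):
    """Integral of chi * phi using Eq. (15) with a piecewise-constant chi'."""
    slopes = np.diff(chi_k) / np.diff(zeta_k)               # chi' on each linear piece
    boundary = chi_k[-1] * Phi(zeta_k[-1]) - chi_k[0] * Phi(zeta_k[0])
    pieces = slopes * (Psi(zeta_k[1:]) - Psi(zeta_k[:-1]))  # chi' times integral of Phi
    return boundary - np.sum(pieces)

def integral_brute_force(zeta_k, chi_k, n=200_001):
    """Reference value from a dense trapezoidal rule."""
    z = np.linspace(zeta_k[0], zeta_k[-1], n)
    f = np.interp(z, zeta_k, chi_k) * phi(z)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))

fast = integral_by_parts(zeta_k, chi_k)
slow = integral_brute_force(zeta_k, chi_k)
print(fast, slow)
```

The fast version needs only $K$ evaluations of the pre-computed tables, independent of how finely $\phi$ itself is sampled.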
3 Branch-and-bound Search
The SRP and SRP-PHAT [1] sound source localization methods work by computing (7) for a large number of spatial points $\mathbf{x}_n$, which is computationally very expensive. Since the power magnitude distribution can have many local maxima, it is possible to miss the actual sound source position if the number of search points is not large enough. This also makes multi-stage search strategies less efficient [3]. With the volumetric computation it is possible to use the branch-and-bound search strategy [4, 5, 6], which was developed for discrete and combinatorial optimization problems. It can guarantee finding the maximum, and normally it can do so with much smaller computational complexity than full search. It is possible to use this technique because the sound power function that is integrated in a volume is never negative. Thus, the total integral in a volume $\mathcal{V}$ is always an upper bound on the integral of any smaller volume contained within $\mathcal{V}$. More formally, we have

$$
\mathcal{V} = \bigcup_{i=1}^{D} \mathcal{V}_i, \quad \mathcal{V}_i \cap \mathcal{V}_j = \emptyset,\ i \neq j \quad \Longrightarrow \quad \bar{W}(\mathcal{V}) \ge \max_i \bar{W}(\mathcal{V}_i). \tag{19}
$$
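Assuming point sound sources and that the integration regions can be subdivided until they reach a pre-defined minimum volume $V_{\min}$, we can make proper power comparisons. The efficiency of the branch-and-bound process is due to the fact that if the power integral in a space region $\mathcal{V}$ is smaller than the power of a known region with minimum volume, then $\mathcal{V}$ does not need to be subdivided because it cannot possibly contain the optimal (maximum power) solution. Using $\mathcal{F}$ to denote the full search space, and $\mathcal{L}$ to represent a list of pairs $[\mathcal{V}, W(\mathcal{V})]$ of subregions $\mathcal{V}$ to be evaluated, together with the measure $W(\mathcal{V})$ of sound power inside that subregion, the proposed application of the branch-and-bound algorithm to sound source localization is as follows.

Branch-and-bound Algorithm
1: Let $\mathcal{L} \leftarrow \{[\mathcal{F}, W(\mathcal{F})]\}$, $\bar{W}_{\max} \leftarrow -1$, and $\mathcal{V}^*$ be undefined. (Initialization)
2: if $\mathcal{L} = \emptyset$ then
3:    Stop: the search is complete and $\mathcal{V}^*$ is the subregion of volume not larger than $V_{\min}$ with largest sound power measure
4: end if
5: Choose one element $[\mathcal{V}, W(\mathcal{V})]$ from list $\mathcal{L}$
6: Let $\mathcal{L} \leftarrow \mathcal{L} - \{[\mathcal{V}, W(\mathcal{V})]\}$
7: if $W(\mathcal{V}) \le \bar{W}_{\max}$ then
8:    Go to Step 2. (Bound)
9: end if
10: if the volume of $\mathcal{V}$ is not larger than $V_{\min}$ then
11:    Let $\bar{W}_{\max} \leftarrow W(\mathcal{V})$ and $\mathcal{V}^* \leftarrow \mathcal{V}$, and go to Step 2. (Update)
12: end if
13: Subdivide $\mathcal{V}$ into $D$ disjoint subregions $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_D$ (Branch)
14: Compute $W(\mathcal{V}_1), W(\mathcal{V}_2), \ldots, W(\mathcal{V}_D)$
15: Let $\mathcal{L} \leftarrow \mathcal{L} \cup \{[\mathcal{V}_1, W(\mathcal{V}_1)], [\mathcal{V}_2, W(\mathcal{V}_2)], \ldots, [\mathcal{V}_D, W(\mathcal{V}_D)]\}$
16: Go to Step 2.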
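A minimal Python sketch of this search is given below. Several choices are assumptions made only for illustration and are not prescribed by the report: regions are axis-aligned boxes, each branch halves the longest side ($D = 2$), the next element is always the one with the largest measure (best-first, via a heap), and the incumbent is updated when a selected region has volume not larger than $V_{\min}$, which is how we read the description above. The power measure `toy_measure` is a stand-in for $W(\mathcal{V})$ built from hypothetical point sources.

```python
import heapq
import itertools
import numpy as np

def branch_and_bound(lo, hi, measure, v_min):
    """Return (box, power) for the box of volume <= v_min with the largest measure."""
    tie = itertools.count()                      # tie-breaker so arrays are never compared
    heap = [(-measure(lo, hi), next(tie), lo, hi)]
    w_max, best = -1.0, None                     # initialization
    while heap:                                  # stop when the list is empty
        neg_w, _, lo, hi = heapq.heappop(heap)   # choose and remove one element
        w = -neg_w
        if w <= w_max:
            continue                             # bound: cannot contain the maximum
        if np.prod(hi - lo) <= v_min:
            w_max, best = w, (lo, hi)            # new best region of minimum volume
            continue
        axis = int(np.argmax(hi - lo))           # branch: halve the longest side
        mid = 0.5 * (lo[axis] + hi[axis])
        left_hi, right_lo = hi.copy(), lo.copy()
        left_hi[axis], right_lo[axis] = mid, mid
        for sub_lo, sub_hi in ((lo, left_hi), (right_lo, hi)):
            heapq.heappush(heap, (-measure(sub_lo, sub_hi), next(tie), sub_lo, sub_hi))
    return best, w_max

# Toy stand-in for W(V): total weight of point sources inside the half-open box [lo, hi).
sources = np.array([[0.2, 0.3, 0.1], [0.8, 0.7, 0.9], [0.5, 0.1, 0.6]])
weights = np.array([1.0, 3.0, 2.0])

def toy_measure(lo, hi):
    inside = np.all((sources >= lo) & (sources < hi), axis=1)
    return float(np.sum(weights[inside]))

box, power = branch_and_bound(np.zeros(3), np.ones(3), toy_measure, v_min=1e-3)
print(box, power)
```

Because the measure is non-negative and additive over disjoint boxes, every box whose measure does not exceed the current best can be discarded without subdividing it, which is where the savings over exhaustive search come from.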
Details on strategies for increasing the efficiency of the branch and bound method are in the references [4, 5, 6].
4 Conclusions
In this report, we presented methods for evaluating volumetric integrals for the SRP computation in 3D spatial regions. We also showed that we can evaluate data-independent volumetric integrals before the execution of the SRP algorithm for faster computation. We also presented closed-form expressions for the volumetric integral evaluation of hyperboloid functions determined by the time difference of arrival between two signals, which serve as the basis functions for evaluating volumetric power for the SRP algorithm. In conjunction with iterative search algorithms such as branch-and-bound search, SRP sound source localization can be computed at significantly lower computational cost than with traditional exhaustive search methods, without compromising its accuracy. This makes the SRP algorithm a viable solution for real-time sound source localization applications where computational resources are limited.
A Application example I: discrete functions
It may be difficult to have an intuitive understanding of how the computation (8) can be done with (14) using precomputed functions (13), since it is done with the introduction of Dirac's delta function and its use in a volumetric integral. In this section we first show a simple case that exemplifies how computational savings can be achieved. It uses only discrete data and functions of integers (represented with square brackets instead of parentheses). Let us consider the problem of computing the double sum

$$
w = \sum_{i=1}^{6} \sum_{j=1}^{6} \phi[\xi[i, j]], \tag{20}
$$
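where $\phi[n]$ is some possibly very complicated function, but $\xi[i, j]$ is defined by a fixed matrix

$$
\xi = \begin{bmatrix}
0 & 1 & 1 & 2 & 2 & 1 \\
1 & 1 & 2 & 2 & 2 & 0 \\
1 & 2 & 3 & 3 & 1 & 0 \\
2 & 2 & 3 & 4 & 2 & 1 \\
1 & 2 & 3 & 3 & 2 & 1 \\
0 & 1 & 2 & 2 & 2 & 2
\end{bmatrix}. \tag{21}
$$

Similarly to (9) we can have

$$
\phi[\xi[i, j]] = \sum_{k=-\infty}^{\infty} \phi[k]\, \delta[k - \xi[i, j]], \tag{22}
$$

with the discrete version of the delta function being

$$
\delta[k] = \begin{cases} 1, & k = 0, \\ 0, & k \neq 0. \end{cases} \tag{23}
$$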
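The computation of $w$ can be done as

$$
w = \sum_{i=1}^{6} \sum_{j=1}^{6} \sum_{k=-\infty}^{\infty} \phi[k]\, \delta[k - \xi[i, j]] = \sum_{k=-\infty}^{\infty} \phi[k] \sum_{i=1}^{6} \sum_{j=1}^{6} \delta[k - \xi[i, j]], \tag{24}
$$

so if we first compute

$$
\chi[k] = \sum_{i=1}^{6} \sum_{j=1}^{6} \delta[k - \xi[i, j]] \tag{25}
$$

then we can use

$$
w = \sum_{k=0}^{4} \phi[k]\, \chi[k]. \tag{26}
$$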
In this discrete data example it is much easier to see how $\chi[k]$ can be pre-computed, since it can be done by simply counting the number of occurrences of a given value $k$ in $\xi[i, j]$. For instance, we have

$$
\ldots,\ \chi[-2] = 0,\ \chi[-1] = 0,\ \chi[0] = 4,\ \chi[1] = 11,\ \chi[2] = 15,\ \ldots \tag{27}
$$
In terms of computational complexity, (20) requires 36 computations of function φ[n] while (26) requires only 5.
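The counting argument is easy to reproduce. The sketch below uses the matrix of (21) as read row by row, counts occurrences with `numpy.bincount`, and compares the direct double sum (20) with the weighted sum (26). The function `phi` is a hypothetical placeholder for the "possibly very complicated" function.

```python
import numpy as np

# Matrix xi[i, j] from Eq. (21), read row by row.
xi = np.array([[0, 1, 1, 2, 2, 1],
               [1, 1, 2, 2, 2, 0],
               [1, 2, 3, 3, 1, 0],
               [2, 2, 3, 4, 2, 1],
               [1, 2, 3, 3, 2, 1],
               [0, 1, 2, 2, 2, 2]])

def phi(n):
    """Placeholder for a costly function of an integer argument (hypothetical choice)."""
    return float(n) ** 3 + 1.0

# Eq. (25): chi[k] is the number of entries of xi equal to k.
chi = np.bincount(xi.ravel())

# Eq. (20): 36 evaluations of phi.
w_direct = sum(phi(v) for v in xi.ravel())

# Eq. (26): one evaluation of phi per distinct value k = 0, ..., 4.
w_fast = sum(phi(k) * chi[k] for k in range(len(chi)))

print(chi, w_direct, w_fast)
```

The table `chi` depends only on the matrix, so it can be reused unchanged whenever `phi` is replaced by another function.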
B Application example II: continuous functions
In this section we show another example, using continuous functions and integrals that involve delta functions. Here the computational savings are shown not directly in terms of the number of arithmetic operations, but instead by transforming a multi-dimensional integral into a one-dimensional integral through pre-computation of the proper function. Let us consider that we want to compute many area integrals in the form

$$
S = \int_{-1}^{1} \int_{-1}^{1} \phi(\xi(x, y))\, dx\, dy, \tag{28}
$$
where function $\phi(\xi)$ changes with each integral computation, but the region of integration ($x, y \in [-1, 1]$) and function $\xi(x, y)$ remain constant. A more specific example can be

$$
\xi(x, y) = x^2 - y,\quad \phi(\xi) = \xi^3 \quad \Longrightarrow \quad S = \int_{-1}^{1} \int_{-1}^{1} (x^2 - y)^3\, dx\, dy = \frac{40}{21}. \tag{29}
$$
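Following the approach used in Section 2, if we write $\phi(\xi)$ as

$$
\phi(\xi) = \int_{-\infty}^{\infty} \phi(\zeta)\, \delta(\zeta - \xi)\, d\zeta, \tag{30}
$$

we obtain

$$
\begin{aligned}
S &= \int_{-1}^{1} \int_{-1}^{1} \int_{-\infty}^{\infty} \phi(\zeta)\, \delta(\zeta - \xi(x, y))\, d\zeta\, dx\, dy, && (31) \\
&= \int_{\zeta_{\min}}^{\zeta_{\max}} \chi(\zeta)\, \phi(\zeta)\, d\zeta, && (32)
\end{aligned}
$$

where

$$
\chi(\zeta) = \int_{-1}^{1} \int_{-1}^{1} \delta(\zeta - \xi(x, y))\, dx\, dy \tag{33}
$$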
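is a function that can be pre-computed (since it does not depend on $\phi(\xi)$), and $\zeta_{\min}$ and $\zeta_{\max}$ define the support of that function. In the case

$$
\xi(x, y) = x^2 - y, \tag{34}
$$

we have

$$
\chi(\zeta) = \int_{-1}^{1} \int_{-1}^{1} \delta(\zeta - x^2 + y)\, dx\, dy. \tag{35}
$$

We can consider Eq. (35) as

$$
\chi(\zeta) = \lim_{\sigma \to 0^+} \int_{-1}^{1} \int_{-1}^{1} Q_\sigma(\zeta - x^2 + y)\, dx\, dy, \tag{36}
$$

where $Q_\sigma(\cdot)$ are functions like

$$
Q_\sigma(\zeta) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{\zeta^2}{2\sigma^2} \right), \tag{37}
$$

or

$$
Q_\sigma(\zeta) = \begin{cases} 1/\sigma, & |\zeta| < \sigma/2, \\ 0, & |\zeta| \ge \sigma/2. \end{cases} \tag{38}
$$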
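Using any of these functions in a program for numerical quadrature we find that the function $\chi(\zeta)$ defined by the area integral of Eq. (35) is as plotted in Figure 1.

Figure 1: Values of function χ(ζ) defined by area integral of eq. (35).

We can also find a closed-form expression for this integral using, for example, this change of variables

$$
u = x,\quad v = x^2 - y \quad \Longrightarrow \quad x = u,\quad y = u^2 - v, \tag{39}
$$

which leads to

$$
\chi(\zeta) = \int_{-1}^{1} \int_{u^2 - 1}^{u^2 + 1} |A|\, \delta(\zeta - v)\, dv\, du, \tag{40}
$$

where

$$
A = \det \begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{bmatrix} = \det \begin{bmatrix} 1 & 0 \\ 2u & -1 \end{bmatrix} = -1, \quad \text{so that } |A| = 1. \tag{41}
$$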
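Defining the indicator function

$$
I(u, \zeta) = \begin{cases} 1, & \zeta \in [u^2 - 1, u^2 + 1], \\ 0, & \text{otherwise}, \end{cases} \tag{42}
$$

we obtain

$$
\chi(\zeta) = \int_{-1}^{1} I(u, \zeta)\, du, \tag{43}
$$

which can be shown to be equal to

$$
\chi(\zeta) = \begin{cases} 2\sqrt{\zeta + 1}, & -1 \le \zeta < 0, \\ 2, & 0 \le \zeta < 1, \\ 2\left(1 - \sqrt{\zeta - 1}\right), & 1 \le \zeta < 2, \\ 0, & \text{otherwise}, \end{cases} \tag{44}
$$

and thus $\zeta_{\min} = -1$ and $\zeta_{\max} = 2$.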
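The closed form (44) can be checked numerically. The sketch below compares it with a direct midpoint-rule evaluation of (43), and then evaluates the one-dimensional integral (32) for $\phi(\zeta) = \zeta^3$, which should reproduce the value $40/21$ of (29). Grid sizes and function names are illustrative choices, not part of the report.

```python
import numpy as np

def chi_closed_form(zeta):
    """Closed-form chi(zeta) of Eq. (44)."""
    zeta = np.asarray(zeta, dtype=float)
    rising = 2.0 * np.sqrt(np.clip(zeta + 1.0, 0.0, None))
    falling = 2.0 * (1.0 - np.sqrt(np.clip(zeta - 1.0, 0.0, None)))
    out = np.zeros_like(zeta)
    out = np.where((zeta >= -1.0) & (zeta < 0.0), rising, out)
    out = np.where((zeta >= 0.0) & (zeta < 1.0), 2.0, out)
    out = np.where((zeta >= 1.0) & (zeta < 2.0), falling, out)
    return out

def chi_quadrature(zeta, n=20_000):
    """Eq. (43): length of the set of u in [-1, 1] with u^2 - 1 <= zeta <= u^2 + 1."""
    u = -1.0 + (np.arange(n) + 0.5) * (2.0 / n)      # midpoints of n cells
    inside = (u ** 2 - 1.0 <= zeta) & (zeta <= u ** 2 + 1.0)
    return 2.0 * np.mean(inside)

def area_integral_1d(phi, n=300_001):
    """Eq. (32): S as a one-dimensional integral of chi * phi over [-1, 2]."""
    z = np.linspace(-1.0, 2.0, n)
    f = chi_closed_form(z) * phi(z)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))   # trapezoidal rule

S = area_integral_1d(lambda z: z ** 3)
print(S, 40.0 / 21.0)
print(chi_quadrature(-0.5), float(chi_closed_form(-0.5)))
print(chi_quadrature(1.5), float(chi_closed_form(1.5)))
```

Only `area_integral_1d` has to be repeated when $\phi$ changes; the function $\chi$ is fixed by the region and by $\xi(x, y)$.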
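Considering again the special case $\phi(\zeta) = \zeta^3$, we obtain

$$
\begin{aligned}
S &= \int_{-1}^{2} \zeta^3 \chi(\zeta)\, d\zeta, \\
&= \int_{-1}^{0} 2\zeta^3 \sqrt{\zeta + 1}\, d\zeta + \int_{0}^{2} 2\zeta^3\, d\zeta - \int_{1}^{2} 2\zeta^3 \sqrt{\zeta - 1}\, d\zeta, \\
&= \frac{40}{21}.
\end{aligned}
$$

C Surfaces of constant time difference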
In the example of Appendix A we use the sum (25), which measures the "size" of the sets where $\xi[i, j]$ is constant, to enable fast computation. In the continuous variable case the integral (13) can be similarly computed by identifying sets where $\xi_p(\mathbf{x}) = \zeta$. Details about all the integration techniques that are needed in a practical implementation are outside the scope of this document, but here we present an example of how we can exploit some characteristics of the problem. In our case, we want to study the special properties of the signals measured by microphone pairs, and we can do it considering each pair separately. To simplify the notation we drop the subscripts $p$ that identify the microphone pairs, and we adopt a coordinate system chosen so that the microphone pair is on the $x$-axis, and the microphones have coordinates

$$
\mathbf{m}_0 = (-S/2, 0, 0), \quad \mathbf{m}_1 = (S/2, 0, 0), \tag{45}
$$
as shown in Fig. 2. This is equivalent to a simple change of variables that involves only geometric rotations and translations, and thus does not substantially affect the integration process. The change of integration volume can also be easily computed, so we do not include the details here. Using $c$ to represent the speed of sound, the difference between the propagation times of sound originating at position $\mathbf{x} = (x, y, z)$ to the two microphones is

$$
\begin{aligned}
\xi(\mathbf{x}) &= \left( \|\mathbf{x} - \mathbf{m}_0\| - \|\mathbf{x} - \mathbf{m}_1\| \right) / c \\
&= \frac{1}{c} \left( \sqrt{(x + S/2)^2 + y^2 + z^2} - \sqrt{(x - S/2)^2 + y^2 + z^2} \right).
\end{aligned} \tag{46}
$$

Since the time difference cannot be longer than the time for sound to travel between the two microphones, we know that

$$
|\xi(\mathbf{x})| \le \frac{S}{c} = s, \quad \text{for all } \mathbf{x} \in \mathbb{R}^3. \tag{47}
$$
Figure 2: Diagram with position of microphone pair (m0, m1), for studying the surfaces of constant time differences.

It can be shown that the equation $|\xi(\mathbf{x})| = \zeta$ is equivalent to

$$
\frac{x^2}{\zeta^2} - \frac{y^2 + z^2}{s^2 - \zeta^2} = \frac{c^2}{4}, \tag{48}
$$
which defines in the plane $z = 0$ a hyperbola as shown in Fig. 2, and in the $(x, y, z)$ space a hyperboloid of revolution, as shown in Fig. 3. We can use this property to compute integrals in the form

$$
\bar{w}(\mathcal{V}) = \int_{\mathbf{x} \in \mathcal{V}} \phi(\xi(\mathbf{x}))\, d\mathbf{x}, \tag{49}
$$

which are used in (8), and are equivalent to

$$
\bar{w}(\mathcal{V}) = \iiint_{(x, y, z) \in \mathcal{V}} \phi(\xi(x, y, z))\, dx\, dy\, dz. \tag{50}
$$
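For readers who want the missing step behind (48), one possible derivation is sketched here. It is not taken from the report; the shorthand $d_0 = \|\mathbf{x} - \mathbf{m}_0\|$, $d_1 = \|\mathbf{x} - \mathbf{m}_1\|$ and $\rho^2 = y^2 + z^2$ is introduced only for this sketch, and $0 < \zeta < s$ is assumed.

```latex
\begin{align*}
d_0 - d_1 &= \pm c\zeta, \qquad d_0^2 - d_1^2 = (x + S/2)^2 - (x - S/2)^2 = 2Sx
  && \text{(from (45)--(46))} \\
d_0 + d_1 &= \frac{d_0^2 - d_1^2}{d_0 - d_1} = \pm \frac{2Sx}{c\zeta}
  \quad\Longrightarrow\quad
  d_0 = \pm\left( \frac{c\zeta}{2} + \frac{Sx}{c\zeta} \right) \\
(x + S/2)^2 + \rho^2 &= \frac{c^2\zeta^2}{4} + Sx + \frac{S^2 x^2}{c^2\zeta^2}
  && \text{(squaring } d_0 \text{)} \\
x^2\,\frac{s^2 - \zeta^2}{\zeta^2} - \rho^2 &= \frac{c^2 (s^2 - \zeta^2)}{4}
  && \text{(using } S = cs \text{)} \\
\frac{x^2}{\zeta^2} - \frac{y^2 + z^2}{s^2 - \zeta^2} &= \frac{c^2}{4}
  && \text{(dividing by } s^2 - \zeta^2 \text{)}
\end{align*}
```

The last line is (48); in standard form it is a hyperboloid with semi-axis $c\zeta/2$ along $x$ and foci at the two microphones.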
We want to consider a change of variables that facilitates the computation of the integral, and also fits the strategy of pre-computing terms that depend only on the placement of microphones, and not on the measured signals (function $\phi(\cdot)$). We need a vector function that is bijective and differentiable in the volume of integration (and not necessarily everywhere). We can exploit the fact that, as indicated by the examples in Fig. 4, revolution hyperboloids parameterized by the variable $\zeta$ never intersect, and their surfaces fully cover the $\mathbb{R}^3$ space, except the $x$ axis.

Figure 3: The equation ξ(x) = ζ defines a hyperboloid of revolution.

Figure 4: Examples of hyperbolas defined by different values of ζ.

Before presenting the new sets of integration variables we need to define the following function for computing angles of polar coordinates

$$
\tan^{-1}(y, z) = \begin{cases}
\tan^{-1}([z - y]/[y + z]) + \pi/4, & y > 0,\ z \ge 0, \\
\tan^{-1}([z + y]/[y - z]) + 3\pi/4, & y \le 0,\ z > 0, \\
\tan^{-1}([z - y]/[y + z]) - 3\pi/4, & y < 0,\ z \le 0, \\
\tan^{-1}([z + y]/[y - z]) - \pi/4, & y \ge 0,\ z < 0, \\
0, & y = z = 0.
\end{cases} \tag{51}
$$
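Let us now analyze one choice for change of variables, defined by the function $f : \mathbb{R}^3 \to \mathcal{H}$, such that $f(x, y, z) = (\zeta, \theta, \mu)$ when

$$
\begin{aligned}
\zeta &= \frac{1}{c} \left( \sqrt{(x + cs/2)^2 + y^2 + z^2} - \sqrt{(x - cs/2)^2 + y^2 + z^2} \right), \\
\theta &= \tan^{-1}(y, z), \\
\mu &= \frac{1}{c} \sqrt{y^2 + z^2}.
\end{aligned} \tag{52}
$$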
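Note that the range of $f$ is

$$
\mathcal{H} = \{(\zeta, \theta, \mu) : \zeta \in [-s, s],\ \theta \in [-\pi, \pi),\ \mu \in [0, \infty)\}, \tag{53}
$$

and we have a special case along the $x$-axis:

$$
f(x, 0, 0) = \begin{cases} (s, 0, 0), & x \ge S/2, \\ (2x/c, 0, 0), & |x| < S/2, \\ (-s, 0, 0), & x \le -S/2. \end{cases} \tag{54}
$$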
We can define the inverse function $g : \mathcal{H} \to \mathbb{R}^3$ such that $g(\zeta, \theta, \mu) = (x, y, z)$ when

$$
\begin{aligned}
x &= c\,\zeta \sqrt{\frac{\mu^2}{s^2 - \zeta^2} + \frac{1}{4}}, \\
y &= c\,\mu \cos(\theta), \\
z &= c\,\mu \sin(\theta),
\end{aligned} \tag{55}
$$

with the observation that $g(f(x, y, z)) = (x, y, z)$ for all $(x, y, z) \in \mathbb{R}^3 - \{(x, y, z) : |x| > S/2,\ y = z = 0\}$, i.e., it is the inverse everywhere, except on the parts of the $x$ axis that are outside the interval between the two microphones. In practice this is not a problem since the integrals do not need to be computed for time differences beyond the physical limits. Since the transformation is bijective and continuously differentiable in the volumes of integration, we can use the fact that

$$
\xi(g(\zeta, \theta, \mu)) = \zeta, \quad \text{for all } (\zeta, \theta, \mu) \in \mathcal{H}, \tag{56}
$$
and compute (50) using

$$
\bar{w}(\mathcal{V}) = \iiint_{(\zeta, \theta, \mu) \in f(\mathcal{V})} \phi(\zeta)\, |\det(\mathbf{J}g)(\zeta, \theta, \mu)|\, d\zeta\, d\theta\, d\mu \tag{57}
$$
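where $\mathbf{J}g$ is the Jacobian matrix

$$
\mathbf{J}g = \begin{bmatrix}
\dfrac{\partial x}{\partial \zeta} & \dfrac{\partial x}{\partial \theta} & \dfrac{\partial x}{\partial \mu} \\[2mm]
\dfrac{\partial y}{\partial \zeta} & \dfrac{\partial y}{\partial \theta} & \dfrac{\partial y}{\partial \mu} \\[2mm]
\dfrac{\partial z}{\partial \zeta} & \dfrac{\partial z}{\partial \theta} & \dfrac{\partial z}{\partial \mu}
\end{bmatrix}. \tag{58}
$$

Before computing the matrix elements we can use the fact that

$$
\frac{\partial x}{\partial \theta} = \frac{\partial y}{\partial \zeta} = \frac{\partial z}{\partial \zeta} \equiv 0, \tag{59}
$$

to simplify the computation of the determinant

$$
\det \begin{bmatrix}
\dfrac{\partial x}{\partial \zeta} & 0 & \dfrac{\partial x}{\partial \mu} \\[2mm]
0 & \dfrac{\partial y}{\partial \theta} & \dfrac{\partial y}{\partial \mu} \\[2mm]
0 & \dfrac{\partial z}{\partial \theta} & \dfrac{\partial z}{\partial \mu}
\end{bmatrix} = \frac{\partial x}{\partial \zeta} \left( \frac{\partial y}{\partial \theta} \frac{\partial z}{\partial \mu} - \frac{\partial y}{\partial \mu} \frac{\partial z}{\partial \theta} \right). \tag{60}
$$

Using the values

$$
\frac{\partial x}{\partial \zeta} = \frac{c \left[ (s^2 - \zeta^2)^2 + 4 s^2 \mu^2 \right]}{4 (s^2 - \zeta^2)^2 \sqrt{\dfrac{\mu^2}{s^2 - \zeta^2} + \dfrac{1}{4}}}, \tag{61}
$$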
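and

$$
\frac{\partial y}{\partial \theta} = -c\mu \sin(\theta), \quad \frac{\partial z}{\partial \theta} = c\mu \cos(\theta), \quad \frac{\partial y}{\partial \mu} = c \cos(\theta), \quad \frac{\partial z}{\partial \mu} = c \sin(\theta), \tag{62}
$$

we conclude that

$$
|\det(\mathbf{J}g)(\zeta, \theta, \mu)| = \frac{c^3 \mu \left[ (s^2 - \zeta^2)^2 + 4 s^2 \mu^2 \right]}{4 (s^2 - \zeta^2)^2 \sqrt{\dfrac{\mu^2}{s^2 - \zeta^2} + \dfrac{1}{4}}}. \tag{63}
$$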
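The change of variables and its Jacobian determinant can be sanity-checked numerically. The sketch below implements $f$ of (52) and $g$ of (55), verifies the round trip $g(f(\mathbf{x})) = \mathbf{x}$ at one point, and compares (63) with a central finite-difference Jacobian. The speed of sound, microphone spacing, test point and step sizes are assumptions, and `numpy.arctan2(z, y)` stands in for the four-quadrant function (51) (it differs only in returning $\pi$ rather than $-\pi$ on the negative $y$ axis).

```python
import numpy as np

C = 343.0      # assumed speed of sound (m/s)
S = 0.2        # assumed microphone spacing (m)
s = S / C      # maximum delay difference, Eq. (47)

def f(x, y, z):
    """Eq. (52): Cartesian coordinates -> (zeta, theta, mu)."""
    zeta = (np.sqrt((x + S / 2) ** 2 + y ** 2 + z ** 2)
            - np.sqrt((x - S / 2) ** 2 + y ** 2 + z ** 2)) / C
    theta = np.arctan2(z, y)          # four-quadrant angle of the point (y, z)
    mu = np.sqrt(y ** 2 + z ** 2) / C
    return zeta, theta, mu

def g(zeta, theta, mu):
    """Eq. (55): (zeta, theta, mu) -> Cartesian coordinates."""
    x = C * zeta * np.sqrt(mu ** 2 / (s ** 2 - zeta ** 2) + 0.25)
    return x, C * mu * np.cos(theta), C * mu * np.sin(theta)

def det_formula(zeta, mu):
    """|det(J g)| from Eq. (63)."""
    a = s ** 2 - zeta ** 2
    return (C ** 3 * mu * (a ** 2 + 4 * s ** 2 * mu ** 2)
            / (4 * a ** 2 * np.sqrt(mu ** 2 / a + 0.25)))

def det_numeric(zeta, theta, mu):
    """|det(J g)| from central finite differences of g."""
    q = np.array([zeta, theta, mu])
    steps = np.array([1e-6 * s, 1e-6, 1e-6 * mu])
    jac = np.empty((3, 3))
    for k in range(3):
        d = np.zeros(3)
        d[k] = steps[k]
        jac[:, k] = (np.array(g(*(q + d))) - np.array(g(*(q - d)))) / (2 * steps[k])
    return abs(np.linalg.det(jac))

p = (0.3, 0.4, -0.2)                  # arbitrary test point (metres)
zeta, theta, mu = f(*p)
print(g(zeta, theta, mu))             # round trip, should reproduce p
print(det_formula(zeta, mu), det_numeric(zeta, theta, mu))
```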
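Defining the set

$$
\mathcal{U}(\zeta, \mathcal{V}) = \{(\theta, \mu) : g(\zeta, \theta, \mu) \in \mathcal{V}\}, \tag{64}
$$

we can now compute (57) as

$$
\bar{w}(\mathcal{V}) = \int_{\zeta^{\min}_{\mathcal{V}}}^{\zeta^{\max}_{\mathcal{V}}} \phi(\zeta)\, \chi(\zeta, \mathcal{V})\, d\zeta, \tag{65}
$$

where

$$
\begin{aligned}
\chi(\zeta, \mathcal{V}) &= \iint_{(\theta, \mu) \in \mathcal{U}(\zeta, \mathcal{V})} |\det(\mathbf{J}g)(\zeta, \theta, \mu)|\, d\theta\, d\mu \\
&= \iint_{(\theta, \mu) \in \mathcal{U}(\zeta, \mathcal{V})} \frac{c^3 \mu \left[ (s^2 - \zeta^2)^2 + 4 s^2 \mu^2 \right]}{4 (s^2 - \zeta^2)^2 \sqrt{\dfrac{\mu^2}{s^2 - \zeta^2} + \dfrac{1}{4}}}\, d\theta\, d\mu.
\end{aligned} \tag{66}
$$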
Note that the computation of this function can be further simplified using the fact that the function being integrated does not depend on the integration variable $\theta$, and we can use the indefinite integral

$$
\int \frac{c^3 \mu \left[ (s^2 - \zeta^2)^2 + 4 s^2 \mu^2 \right]}{4 (s^2 - \zeta^2)^2 \sqrt{\dfrac{\mu^2}{s^2 - \zeta^2} + \dfrac{1}{4}}}\, d\mu = \frac{c^3}{12} \left[ 3(s^2 - \zeta^2) + s^2 \left( \frac{4\mu^2}{s^2 - \zeta^2} - 2 \right) \right] \sqrt{\frac{\mu^2}{s^2 - \zeta^2} + \frac{1}{4}}. \tag{67}
$$

This result can also be used directly by two-dimensional SRP, when the search space corresponds to $\theta = 0$, and we only need to integrate on variable $\mu$.
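As a quick check of (67), the sketch below evaluates the closed-form antiderivative between two values of $\mu$ and compares the result with a trapezoidal integration of the integrand (63). The physical constants, the chosen $\zeta$ and the integration limits are assumptions made only for this test.

```python
import numpy as np

C = 343.0          # assumed speed of sound (m/s)
s = 0.2 / C        # assumed maximum delay difference S / c, with S = 0.2 m

def integrand(zeta, mu):
    """Integrand of Eq. (67), i.e. |det(J g)| from Eq. (63)."""
    a = s ** 2 - zeta ** 2
    return (C ** 3 * mu * (a ** 2 + 4 * s ** 2 * mu ** 2)
            / (4 * a ** 2 * np.sqrt(mu ** 2 / a + 0.25)))

def antiderivative(zeta, mu):
    """Right-hand side of Eq. (67)."""
    a = s ** 2 - zeta ** 2
    return (C ** 3 / 12.0
            * (3 * a + s ** 2 * (4 * mu ** 2 / a - 2))
            * np.sqrt(mu ** 2 / a + 0.25))

def radial_integral(zeta, mu_lo, mu_hi):
    """Definite integral over mu obtained from the closed form."""
    return antiderivative(zeta, mu_hi) - antiderivative(zeta, mu_lo)

zeta = 0.3 * s                           # one delay difference inside (-s, s)
mu_lo, mu_hi = 0.5 / C, 1.0 / C          # radial distances of 0.5 m and 1.0 m
mu = np.linspace(mu_lo, mu_hi, 100_001)
vals = integrand(zeta, mu)
numeric = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(mu))   # trapezoidal rule
closed = radial_integral(zeta, mu_lo, mu_hi)
print(closed, numeric)
```

With the closed form, evaluating $\chi(\zeta, \mathcal{V})$ only requires the $\mu$ limits of the set (64) for each $\theta$, leaving a single numerical integration over $\theta$.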
References
[1] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, “Robust localization in reverberant rooms,” in M. S. Brandstein and D. Ward, eds., Microphone Arrays: Signal Processing Techniques and Applications, pp. 157–180, 2001.
[2] J. P. Dmochowski, J. Benesty, and S. Affes, “A Generalized Steered Response Power Method for Computationally Viable Source Localization,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15(8), pp. 2510–2526, Nov. 2007.
[3] H. Do, H. F. Silverman, and Y. Yu, “A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array,” IEEE Int. Conf. Acoustics, Speech and Signal Processing, Honolulu, USA, vol. 1, pp. 121–124, April 2007.
[4] T. Ibaraki, “Enumerative approaches to combinatorial optimization,” Annals of Operations Research, vol. 10, 1987.
[5] B. Gendron and T. G. Crainic, “Parallel branch-and-bound algorithms: survey and synthesis,” Operations Research, vol. 42(6), pp. 1042–1066, 1994.
[6] J. Clausen, Branch and Bound Algorithms – Principles and Examples, http://www.diku.dk/OLD/undervisning/2003e/datV-optimer/JensClausenNoter.pdf