Determining Sound Source Orientation from Source Directivity and Multi-microphone Recordings

Francesco Guarato and John C.T. Hallam

Mærsk Mc-Kinney Møller Institute, University of Southern Denmark

Abstract. This paper presents an analytic method for determining the orientation of a directional sound source in three-dimensional space using the source position, directivity and multi-microphone recordings. The acoustic signal emitted by the source is assumed to be broadband, such as a down-swept frequency modulated chirp of the kind many bats use while echolocating. The method has been tested in simulation on a PC using the directivity of a piston transducer and the more complex and more realistic head-related transfer function of the Phyllostomus discolor bat. The ultimate purpose of the work is to determine the orientation and actual emitted call of a flying bat from a remote array recording.

1 Introduction

Researchers have studied sound source localization in 2D by analysis of time differences (TD) and intensity differences (ID) of the acoustic signal received by microphones placed around the sound source [1]. They typically take account of the directivity of the receiver system but not the directional properties of the source, which is considered to be an omnidirectional point source. Likewise, 3D object localization using a sound emitter and analyzing the signal reflected by targets and collected by receivers has been extensively studied [2], [3], [4], and many experiments have been performed to investigate sound source localization by human beings, to measure their head-related transfer function (HRTF) and to find the features they use to localize a sound source in space, such as in [5]. Finally, acoustic simulation techniques have been used to relate the shape of sound emitters and receivers to their acoustic properties [6].

However, the reconstruction of an acoustic signal emitted by a directional source from a collection of remote omnidirectional receivers appears not to have received attention in the literature. This problem arises if one wishes to determine the precise vocalisation of a flying bat: the recording of the call is remote, since the bat typically cannot carry telemetry equipment to record the call locally (but see [7]); and each remote microphone hears the call as filtered by the bat's emission directivity. In order to reconstruct a bat call, two features are needed: the call frequency range and amplitude. We concentrate on the reconstruction of the call amplitude, whose solution requires knowledge of the bat head orientation in space when it emits the call.

In this paper we show a general method to determine the orientation of a directional sound source in three-dimensional space given its position, the position of microphones in a recording array, recordings of the source by the array of microphones, and the source directivity. In Sect. 2 we present the problem statement and a mathematical formulation of the method, while in Sect. 3 we describe the simulations performed to test its reliability. The final results are discussed in Sect. 4, where we also indicate future aims based on the present work.

J. Mira et al. (Eds.): IWINAC 2009, Part II, LNCS 5602, pp. 429–438, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Problem Analysis

In this section we first describe the structure of the problem and the tools we are provided with and, second, the mathematical formulation of the method for determining the sound source orientation.

2.1 Problem Setting

We are given a sound source and a set of omnidirectional microphones placed in front of the source. We suppose the source position and the microphone positions are known as well as the directivity of the source. The source ideally needs to emit a broadband signal having significant amplitude at a number of relevant frequencies, which is recorded by the microphones. Without loss of generality we place the microphones at unit distance from the source, as the distance between source and microphone can be computed from their positions and can be compensated in the signal recorded by each microphone using a factor multiplying its amplitude to correct for sound attenuation due to distance. We can therefore assume that the source is at the centre of a unit sphere while the microphones are positioned on its surface. Their positions are represented by pairs of azimuth and elevation angles, as shown in Fig. 1. The orientation of the source can also be expressed in terms of the azimuth and elevation angles of the point where the source's reference direction intersects the sphere.

2.2 Mathematical Solution

Let $D(f, \theta, \phi)$ represent the source directivity, which depends on the frequency and on the azimuth and elevation angles of the receiver. Assume that $(\theta_s, \phi_s)$ is the orientation of the source. Hence, the predicted amplitude of the signal received by microphone $m$ at frequency $f$ is

$$\hat{g}_{mf} = e_f \, D(f, R_s(\theta_m, \phi_m)) = e_f \, D(f, \tilde{\theta}_m, \tilde{\phi}_m) \quad \forall f \in \{1, \dots, F\}, \tag{1}$$

where $e_f$ is the amplitude emitted by the source at frequency $f$, and $R_s$ indicates the rotation of the sphere by $(\theta_s, \phi_s)$, needed to align the directivity reference axis with the sphere reference direction, applied to microphone $m \in \{1, \dots, M\}$, so that $(\tilde{\theta}_m, \tilde{\phi}_m) = R_s(\theta_m, \phi_m)$. Here $M$ is the number of microphones and $F$ is the number of frequencies. Effectively, we rotate the microphone array to compensate for the orientation of the source so that the relationship between the microphone directions (originally in the world reference frame) and the source directivity (expressed in a source-relative reference frame) is determined; $R_s$ is the true rotation that does this.

Fig. 1. Reference frame (axes X, Y, Z). Source (bat) orientation is defined by the azimuth and elevation angle pair $(\theta_S, \phi_S)$ and microphone $m$ by $(\theta_m, \phi_m)$.

To estimate the true orientation $(\theta_s, \phi_s)$ of the source, we look for the unknown rotation $R$ of the source directional pattern across the sphere such that it best fits the amplitudes $\hat{g}_{mf}$, $\forall m, f$. For each orientation we calculate the amplitudes of the signals collected by all the microphones using (1) and compare them with the ones the microphones have measured. Such a comparison is expressed, for microphone $m$, as

$$\hat{g}_{mf} - e_f \cdot D(f, R(\theta_m, \phi_m)) \quad \forall f, \tag{2}$$

that is, when the source is rotated by $R$, the signal received by microphone $m$ is proportional to $D(f, R(\theta_m, \phi_m))$ and the proportionality factor $e_f$ depends only on frequency. ($e_f$ estimates the unknown amplitude spectrum of the emitted signal.) From (2) we build up the error function by squaring and summing over all microphones and all frequencies, thus:

$$E(R) = \sum_{f=1}^{F} \sum_{m=1}^{M} \left[\hat{g}_{mf} - e_f \cdot D(f, R(\theta_m, \phi_m))\right]^{2} = \sum_{f=1}^{F} E_f. \tag{3}$$
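The rotation appearing in (1)–(3) can be made concrete with unit vectors. The following is a minimal sketch rather than the authors' implementation. It assumes that the X axis of Fig. 1 is the reference direction (azimuth and elevation both zero), that azimuth is measured in the XY plane and elevation towards Z, and that roll is zero. The function names `direction`, `angles` and `to_source_frame` are hypothetical.

```python
import math

def direction(az_deg, el_deg):
    """Unit vector for an (azimuth, elevation) pair; X is the assumed reference axis."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angles(v):
    """Inverse of direction(): (azimuth, elevation) in degrees of a unit vector."""
    x, y, z = v
    return (math.degrees(math.atan2(y, x)),
            math.degrees(math.asin(max(-1.0, min(1.0, z)))))

def to_source_frame(mic_az, mic_el, src_az, src_el):
    """Microphone direction as seen from a source oriented at (src_az, src_el).

    Applies the inverse of Rz(src_az) @ Ry(-src_el), the rotation that carries
    the X axis onto the source's reference direction (roll assumed zero).
    """
    x, y, z = direction(mic_az, mic_el)
    a, e = math.radians(src_az), math.radians(src_el)
    # Undo the azimuth rotation about Z.
    x1 = math.cos(a) * x + math.sin(a) * y
    y1 = -math.sin(a) * x + math.cos(a) * y
    # Undo the elevation rotation about Y.
    x2 = math.cos(e) * x1 + math.sin(e) * z
    z2 = -math.sin(e) * x1 + math.cos(e) * z
    return angles((x2, y1, z2))
```

Under these assumptions, a microphone lying exactly along the source's reference direction maps to azimuth and elevation of zero (up to rounding), which is where the directivity has its reference axis.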


This error function is a non-negative valued function whose domain is the set of all possible orientations the sound source can assume. By minimizing (3) we compute an estimated orientation of the source, which we hope is close to the true one. To do the minimization, we need $E(R)$ as a function of the only unknown term $R$. The term $E_f$ can be written as

$$E_f = \sum_{m=1}^{M} \hat{g}_{mf}^{2} - \frac{\left[\sum_{m=1}^{M} \hat{g}_{mf}\, D(f, R(\theta_m, \phi_m))\right]^{2}}{\sum_{m=1}^{M} \left(D(f, R(\theta_m, \phi_m))\right)^{2}} + \sum_{m=1}^{M} \left(D(f, R(\theta_m, \phi_m))\right)^{2} \cdot Z(R), \tag{4}$$

where

$$Z(R) = \left[e_f - \frac{\sum_{m=1}^{M} \hat{g}_{mf}\, D(f, R(\theta_m, \phi_m))}{\sum_{m=1}^{M} \left(D(f, R(\theta_m, \phi_m))\right)^{2}}\right]^{2}. \tag{5}$$

Eq. (4) is quadratic in $e_f$ and is clearly minimized when

$$e_f = \frac{\sum_{m=1}^{M} \hat{g}_{mf}\, D(f, R(\theta_m, \phi_m))}{\sum_{m=1}^{M} \left(D(f, R(\theta_m, \phi_m))\right)^{2}}. \tag{6}$$
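As a numerical check on (4)–(6), the sketch below computes the least-squares amplitude $e_f$ for one frequency and compares the resulting residual with the first two terms of (4), which is what remains once $Z(R) = 0$. The helper names and the sample numbers are illustrative assumptions, not values from the paper.

```python
def best_amplitude(g, d):
    """Eq. (6): least-squares e_f for one frequency.

    g[m] are measured amplitudes, d[m] = D(f, R(theta_m, phi_m)).
    """
    return sum(gm * dm for gm, dm in zip(g, d)) / sum(dm * dm for dm in d)

def frequency_error(g, d, e=None):
    """E_f of Eq. (3); uses the optimal e_f of Eq. (6) when e is not given."""
    if e is None:
        e = best_amplitude(g, d)
    return sum((gm - e * dm) ** 2 for gm, dm in zip(g, d))

def total_error(G, Dvals):
    """E(R) of Eq. (3) with e_f eliminated; G[f][m] and Dvals[f][m] are nested lists."""
    return sum(frequency_error(g, d) for g, d in zip(G, Dvals))

# Made-up numbers for illustration: three microphones, one frequency.
d = [1.0, 0.8, 0.5]
g = [2.1, 1.5, 1.1]
e_opt = best_amplitude(g, d)

# First two terms of Eq. (4), i.e. the value of E_f at its minimum.
closed_form = (sum(x * x for x in g)
               - sum(x * y for x, y in zip(g, d)) ** 2 / sum(y * y for y in d))
```

Here `frequency_error(g, d)` and `closed_form` agree to rounding error, and any other choice of `e` gives a larger residual. The sketch assumes the directivity values are not all zero, otherwise (6) is undefined.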

By substituting the expression (6) for $e_f$ into (3), we express the error function in terms of the unknown rotation alone. Its minimum should correspond to the true orientation of the sound source, which we estimate as

$$\hat{R}_s = \arg\min_{R} E(R). \tag{7}$$

2.3 Source Directivities for Simulations

We use two functions as the source directivity in (3) to test the method described above. The first function is the directivity of a Polaroid ultrasonic transducer [8], modelled as a piston in an infinite baffle, whose analytical expression is

$$D_T(f, \theta, \phi) = 2 \cdot \frac{|J_1(ka \sin\psi)|}{|ka \sin\psi|}, \tag{8}$$

where $J_1$ is the first-order Bessel function of the first kind, $k = 2\pi f / c$ is the wave number of the emitted signal, with $c$ the velocity of sound in air, $a$ is the diameter of the transducer and $\psi$ is the angle between the vector pointing to the receiver, in the direction defined by azimuth angle $\theta$ and elevation angle $\phi$, and the normal to the surface of the transducer. It is given by

$$\psi = \arccos(\cos\phi \cos\theta). \tag{9}$$
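Equations (8) and (9) are straightforward to evaluate numerically. The sketch below is one possible way to do so using only the Python standard library. It computes $J_1$ from its integral representation $J_1(x) = \frac{1}{\pi}\int_0^{\pi} \cos(\tau - x \sin\tau)\, d\tau$ with the trapezoid rule. The default values of `a` and `c` are placeholder assumptions, not parameters reported in the paper, and the function names are hypothetical.

```python
import math

def bessel_j1(x, steps=200):
    """J1(x) via (1/pi) * integral_0^pi cos(t - x*sin(t)) dt, trapezoid rule."""
    h = math.pi / steps
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for i in range(1, steps):
        t = i * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def piston_directivity(f_hz, az_deg, el_deg, a=0.019, c=343.0):
    """Eq. (8) with psi from Eq. (9).

    a (metres) and c (m/s) are placeholder values chosen for illustration.
    """
    cos_psi = math.cos(math.radians(el_deg)) * math.cos(math.radians(az_deg))
    psi = math.acos(max(-1.0, min(1.0, cos_psi)))        # Eq. (9)
    x = 2.0 * math.pi * f_hz / c * a * math.sin(psi)     # k * a * sin(psi)
    if abs(x) < 1e-9:
        return 1.0                                       # limit of 2*J1(x)/x as x -> 0
    return 2.0 * abs(bessel_j1(x)) / abs(x)
```

Because the expression depends on direction only through $\psi$, swapping or negating the azimuth and elevation arguments leaves the value unchanged, which is the rotational symmetry of the piston model.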

The second function we adopt as the source directivity is the head-related transfer function of the left ear of an individual Phyllostomus discolor (Lesser Spearnosed bat). The values have been computed by acoustic simulation, at a finite set of orientations and frequencies, of a shape model built from a scanned head. The data are sampled with a 2.5° step in orientation and a 500 Hz step in frequency over the range [25 kHz, 95 kHz], which is the range typically used by Phyllostomus discolor.
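For a sampled directivity of this kind, evaluating $D$ amounts to a table lookup. A minimal sketch follows, assuming the samples are held in a dictionary keyed by (frequency index, azimuth, elevation) on the 2.5° grid. The storage layout and all names are hypothetical; only the frequency range and the two step sizes come from the text.

```python
FREQ_MIN_HZ, FREQ_MAX_HZ, FREQ_STEP_HZ = 25_000, 95_000, 500
ANGLE_STEP_DEG = 2.5

# Frequency axis of the sampled data: 25 kHz to 95 kHz in 500 Hz steps.
frequencies = list(range(FREQ_MIN_HZ, FREQ_MAX_HZ + 1, FREQ_STEP_HZ))

def snap_angle(angle_deg):
    """Nearest grid angle, i.e. the nearest multiple of 2.5 degrees."""
    return round(angle_deg / ANGLE_STEP_DEG) * ANGLE_STEP_DEG

def lookup(table, f_hz, az_deg, el_deg):
    """Nearest-sample directivity value from a dict keyed by (f_index, az, el)."""
    f_index = round((f_hz - FREQ_MIN_HZ) / FREQ_STEP_HZ)
    return table[(f_index, snap_angle(az_deg), snap_angle(el_deg))]
```

With these step sizes the frequency axis has 141 entries. Nearest-sample lookup restricts candidate orientations and microphone directions to grid points; directions between grid points need some form of interpolation.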

3 Experimental Testing

The aim of this section is to give as precise a statistical description as possible of the performance of the method presented above. Performance is expressed as the error, in degrees, between the vector pointing to the estimated source orientation and the one pointing to the true orientation. We first describe the arrangement of microphones with respect to the source and the situations we considered interesting to examine, and then present the simulation results in terms of their errors. The experimental setting was kept the same for all simulations, which were performed entirely on a PC.

3.1 Experiment Setting

We choose the sound source to be the origin of the reference frame with respect to which the microphone positions are set. 16 microphones are placed in a rectangular array configuration in front of the source to collect the acoustic signal. Given that each microphone has unit distance from the source, its position is completely described by azimuth and elevation angles with respect to a world reference frame (see Fig. 1). Microphone positions are written in Table 1.

Table 1. Microphone positions described by azimuth and elevation angles

M1 = (−40°, 30°)     M2 = (−10°, 30°)     M3 = (10°, 30°)     M4 = (40°, 30°)
M5 = (−40°, 10°)     M6 = (−10°, 10°)     M7 = (10°, 10°)     M8 = (40°, 10°)
M9 = (−40°, −10°)    M10 = (−10°, −10°)   M11 = (10°, −10°)   M12 = (40°, −10°)
M13 = (−40°, −30°)   M14 = (−10°, −30°)   M15 = (10°, −30°)   M16 = (40°, −30°)

The source orientation is chosen from the five different orientations shown in Table 2.

Table 2. Source orientations described by azimuth and elevation angles

Ω1 = (−20°, 0°)   Ω2 = (−20°, −20°)   Ω3 = (0°, −20°)   Ω4 = (20°, −20°)   Ω5 = (20°, 0°)
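The two tables and the error measure defined at the start of this section translate directly into data and a short function. The sketch below is illustrative only. It assumes the azimuth/elevation convention implied by Eq. (9), with the X axis as reference, and the names `MICROPHONES`, `ORIENTATIONS`, `unit_vector` and `angular_error_deg` are not from the paper.

```python
import math

# Table 1: 4 x 4 microphone grid as (azimuth, elevation) pairs in degrees, M1..M16.
MICROPHONES = [(az, el) for el in (30, 10, -10, -30) for az in (-40, -10, 10, 40)]

# Table 2: the five source orientations, Omega1..Omega5.
ORIENTATIONS = [(-20, 0), (-20, -20), (0, -20), (20, -20), (20, 0)]

def unit_vector(az_deg, el_deg):
    """Unit vector pointing in the direction (azimuth, elevation)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angular_error_deg(estimated, true):
    """Angle in degrees between two orientations given as (azimuth, elevation)."""
    u, v = unit_vector(*estimated), unit_vector(*true)
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))
```

For example, mistaking Ω5 for Ω1 would count as an error of 40°, since the two orientations differ only by 40° of azimuth at zero elevation.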

These five orientations were chosen by considering those a trawling bat uses while looking for prey near the water surface with the microphone array in front of it. The call emitted by the source is assumed to be broadband. Given an assumed source orientation, the amplitude detected by each microphone is computed for a range of frequencies using (1). Noise is added to the predicted amplitude to investigate the robustness of the method. The noise is modelled as white with a normal distribution and enters (3) as an additive term to the amplitude $\hat{g}_{mf}$ received by each microphone. The resulting set of microphone amplitudes is presented as input to the algorithm outlined above, which computes an estimated orientation for the source. The error between this estimate and the originally chosen orientation constitutes the performance of the method.

In the following experiments, the source directivity for the Polaroid transducer has an analytical expression and its value can be calculated for all orientations. We use 0.01 and 0.1 as noise variance values. Given that the amplitude of the call emitted by the source is 1, this gives an SNR of 20 dB and 10 dB respectively. The set of frequencies in this case is [25 kHz, 35 kHz] with a 1 kHz step, so that the number of frequencies is 11. On the other hand, the bat HRTF value is known only at orientations corresponding to points on a grid whose step is 2.5°, and the number of frequencies available is 141, that is, one frequency every 500 Hz in the range [25 kHz, 95 kHz].

3.2 Results

Experiments were performed as follows: for each of the two kinds of source directivity, for each source orientation of Table 2 and for each SNR value, 10 runs of the method were considered. Results are expressed in terms of the error between the estimated orientation and the initially chosen one.

Consider first the transducer as the sound source. Fig. 2 shows a histogram of error values for 4 orientations from Table 2, all for experiments with SNR = 10 dB. Although the noise is large, the mean error is small and similar for all the orientations.

Fig. 3 shows the corresponding results for the Lesser Spearnosed bat HRTF source directivity using the set of 10 frequencies corresponding to the 10 largest values of the HRTF, while Fig. 4 depicts the errors obtained for SNR = 10 dB using every 14th frequency in the whole set of 141 frequencies available in the HRTF simulation data (10 frequencies in total). Both figures consider 4 different source orientations taken from Table 2. Fig. 5 shows the error distributions for 4 orientations with SNR = 0 dB. The source orientations giving the errors in panels (a) and (c) of Figs. 4 and 5 share the same azimuth angle (−20°), while the errors in panels (b) and (d) correspond to the opposite one (+20°). The asymmetry of these results is discussed in Sect. 3.3.

3.3 Discussion

When the Polaroid directivity is used for the sound source, the search for the source orientation shows negligible (≈ 0°) error for all source orientations at SNR = 20 dB, while larger errors are seen at SNR = 10 dB (Fig. 2). Nevertheless, the errors remain small and the method is robust to noise. The mean error values in all panels of Fig. 2 are very similar because of the symmetry of the transducer directivity.

Fig. 2. Error distribution for Polaroid Transducer as sound source, SNR = 10 dB (histograms of number of instances against error in degrees). (a) source orientation (−20°, 0°), error mean = 3.16°. (b) source orientation (20°, 0°), error mean = 2°. (c) source orientation (−20°, −20°), error mean = 2.5°. (d) source orientation (20°, −20°), error mean = 3.3°.

On the other hand, when the more realistic source directivity given by the Lesser Spearnosed bat HRTF is used, experiments performed with SNR = 20 dB and SNR = 10 dB and the full set of 141 frequencies returned 0° as the error value. This is partly due to the shape of the Lesser Spearnosed bat's HRTF but mostly to the broad range of frequencies for which the HRTF is defined. In (3), the larger the number of frequencies, the less significant is the effect of even a large noise variance on the error function, so that the method is more precise.

Testing the method with greater noise and a smaller number of frequencies (Figs. 3–5) reveals, for example in Fig. 5, evidence of the asymmetry of the bat's HRTF. It has a wide lobe at positive values of the azimuth angle but not at negative ones. For this reason, when the source is oriented towards negative azimuth, most microphones receive a fairly high-amplitude signal, while positive azimuth orientations leave many microphones with a weaker signal, so that noise more easily leads the method into error. In fact, at both elevations the error is much smaller at a source azimuth of −20° than at the opposite azimuth of +20° (Fig. 5). It has to be pointed out that such large errors occur only when the SNR is unrealistically low: in the case of a real bat, the SNR is much higher than those considered in this paper, where we focus on presenting the method and its robustness to noise.

Note that we used a bat ear directivity as source directivity even though it relates to the reception of sound signals: the ear directivity is usually, but not always, more complex than the emission directivity; we feel it represents a reasonable complement to the highly symmetrical analytic model of the Polaroid transducer for testing the performance of the present method.

The directivity of a source such as a bat can only be non-destructively determined through acoustic simulation. For this reason, a real directivity can only be known at a discrete set of equally spaced orientations, not at every direction in which the acoustic signal propagates. This problem can be overcome using linear or quadratic interpolation between the known values of the directivity function to generalise the sampled directivity to cover all directions.
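The linear case can be sketched as bilinear interpolation over the four grid samples surrounding a direction. This is an illustrative sketch under assumed conventions, not the authors' implementation: `sample(az, el)` is a hypothetical callable returning the directivity at grid points, and the grid step defaults to 2.5°.

```python
import math

def bilinear(sample, az_deg, el_deg, step=2.5):
    """Bilinearly interpolate a directivity known only on a regular angular grid.

    sample(az, el) must return the value at grid points (multiples of step).
    """
    az0 = math.floor(az_deg / step) * step   # lower-left corner of the grid cell
    el0 = math.floor(el_deg / step) * step
    tx = (az_deg - az0) / step               # fractional position in the cell, 0..1
    ty = (el_deg - el0) / step
    v00 = sample(az0, el0)
    v10 = sample(az0 + step, el0)
    v01 = sample(az0, el0 + step)
    v11 = sample(az0 + step, el0 + step)
    return ((1 - tx) * (1 - ty) * v00 + tx * (1 - ty) * v10
            + (1 - tx) * ty * v01 + tx * ty * v11)
```

The sketch treats azimuth and elevation as a flat grid, which is reasonable away from the poles. It also assumes that the four neighbouring samples exist, so queries on the outer edge of the sampled region would need separate handling.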

Fig. 3. Error distributions with Lesser Spearnosed Bat HRTF as sound source, SNR = 10 dB, 10 frequencies within the 141 frequency range corresponding to the 10 biggest values of the HRTF (histograms of number of instances against error in degrees). (a) source orientation (−20°, 0°), error mean = 3.23°. (b) source orientation (20°, 0°), error mean = 6°. (c) source orientation (−20°, −20°), error mean = 4°. (d) source orientation (20°, −20°), error mean = 6.8°.

Fig. 4. Error distributions with Lesser Spearnosed Bat HRTF as sound source, SNR = 10 dB, 10 frequencies equally spaced within the 141 frequency range (histograms of number of instances against error in degrees). (a) source orientation (−20°, 0°), error mean = 4.6°. (b) source orientation (20°, 0°), error mean = 22.8°. (c) source orientation (−20°, −20°), error mean = 5°. (d) source orientation (20°, −20°), error mean = 10.8°.

Finally, two angles were used here to represent the orientation of the source. While this is sufficient for a rotationally symmetric directivity such as that of the Polaroid transducer, orienting the directivity of a bat requires an additional roll angle for completeness. We have neglected that angle on the assumption that the natural reference frame for the bat's HRTF does not roll much with respect to the world reference frame when it is calling. This is true for the simulations reported in this paper, but the assumption will be tested in future work.
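The first claim can be checked numerically: a roll about the source's reference axis leaves the off-axis angle $\psi$ of Eq. (9) unchanged, so a directivity that depends only on $\psi$ carries no information about roll. The sketch below assumes the X axis is the reference axis; the function names are illustrative and the 25° roll is an arbitrary test value.

```python
import math

def off_axis_angle(v):
    """Angle psi in degrees between unit vector v and the X (reference) axis."""
    return math.degrees(math.acos(max(-1.0, min(1.0, v[0]))))

def roll(v, roll_deg):
    """Rotate vector v about the X axis by roll_deg degrees."""
    r = math.radians(roll_deg)
    x, y, z = v
    return (x, math.cos(r) * y - math.sin(r) * z, math.sin(r) * y + math.cos(r) * z)

# Direction of microphone M3 of Table 1: azimuth 10 degrees, elevation 30 degrees.
az, el = math.radians(10.0), math.radians(30.0)
mic = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
rolled = roll(mic, 25.0)
```

The rolled vector differs from the original in its Y and Z components, yet `off_axis_angle` returns the same value for both. An asymmetric directivity such as the bat HRTF depends on azimuth and elevation separately, so the same roll would change its predicted amplitudes.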

Fig. 5. Error distributions with Lesser Spearnosed Bat HRTF as sound source, SNR = 0 dB (histograms of number of instances against error in degrees). (a) source orientation (−20°, 0°), error mean = 7°. (b) source orientation (20°, 0°), error mean = 40.5°. (c) source orientation (−20°, −20°), error mean = 16°. (d) source orientation (20°, −20°), error mean = 41.4°.
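The experimental procedure of this section can be outlined end to end in a few lines. The sketch below is a rough illustration under stated assumptions and does not reproduce the reported numbers. In place of the piston model or the bat HRTF it uses a made-up, rotationally symmetric lobe (`toy_directivity`), so that the directivity depends only on the angle between the source axis and each microphone. The minimization of (7) is done by exhaustive search over a 2.5° grid of candidate orientations, which is an assumed search strategy. All names are hypothetical.

```python
import math
import random

def unit(az_deg, el_deg):
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def toy_directivity(f_khz, psi_deg):
    """Stand-in lobe that narrows with frequency; NOT the piston or the bat HRTF."""
    width = 1200.0 / f_khz                    # lobe width in degrees, arbitrary choice
    return math.exp(-(psi_deg / width) ** 2)

MICS = [(az, el) for el in (30, 10, -10, -30) for az in (-40, -10, 10, 40)]  # Table 1
FREQS_KHZ = list(range(25, 36))               # 25..35 kHz in 1 kHz steps, 11 values

def pattern(orientation):
    """Directivity seen by every microphone at every frequency for one orientation."""
    axis = unit(*orientation)
    psi = [angle_between(axis, unit(*m)) for m in MICS]
    return [[toy_directivity(f, p) for p in psi] for f in FREQS_KHZ]

def error(measured, orientation):
    """E(R) of Eq. (3) with e_f replaced by its optimum from Eq. (6)."""
    total = 0.0
    for g, d in zip(measured, pattern(orientation)):
        e = sum(a * b for a, b in zip(g, d)) / sum(b * b for b in d)
        total += sum((a - e * b) ** 2 for a, b in zip(g, d))
    return total

def estimate(measured, step=2.5, limit=40.0):
    """Eq. (7) by exhaustive search over a grid of candidate orientations."""
    n = int(round(2 * limit / step))
    grid = [-limit + i * step for i in range(n + 1)]
    return min(((az, el) for az in grid for el in grid),
               key=lambda o: error(measured, o))

def simulate(true_orientation, snr_db, rng):
    """One run: noisy amplitudes with e_f = 1, then the angular error in degrees."""
    sigma = math.sqrt(10 ** (-snr_db / 10))   # noise std for unit emitted amplitude
    measured = [[d + rng.gauss(0.0, sigma) for d in row]
                for row in pattern(true_orientation)]
    est = estimate(measured)
    return angle_between(unit(*est), unit(*true_orientation))
```

With no noise, the search recovers an orientation that lies on the candidate grid exactly, because the error is zero there. A call such as `simulate((20, 0), 10, random.Random(0))` gives the error of a single noisy run; averaging several runs per orientation mirrors the 10-run protocol described above, although the values depend on the stand-in directivity and will differ from those in Figs. 2–5.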

4 Conclusions and Future Work

In this paper we have presented a method for determining the orientation of a directional sound source, provided that we are given its position, its directivity and a set of broadband recordings from a suitably located microphone array. Such a method can in principle be applied to any source whose directivity is known. In particular, our intention is to use the method to determine the head orientation of a flying bat while hunting, with the ultimate goal of reconstructing its call. The method has been tested using the analytic model of a Polaroid transducer directivity and a directivity derived from acoustic simulation of the shape model of an individual bat's head, and is shown to be accurate and robust over a range of additive noise intensities.

In future work, the method presented here will be tested on an interpolated bat HRTF. Different step sizes between two consecutive orientations will be examined in order to see which one gives the best performance of the method. Thanks to interpolation, microphones can be placed at orientations other than those at which the source directivity is known. Other improvements of the method include correcting for reflection of the call from a hypothetical water surface under the source, as in the case of a bat trawling over water.

The definitive test for this method will be the recording of a real bat call through a sixteen-microphone array and the processing of these real data using the method to find the bat's orientation when calls are emitted. Once position and orientation are known, we should be able to reconstruct the bat call and compare it with that recorded with a Telemike-like [7] recording system carried by the bat.

References

1. Tamai, Y., Kagami, S., Mizoguchi, H., Amemiya, Y., Nagashima, K., Takano, T.: Real-time 2 dimensional sound source localization by 128-channel huge microphone array. In: Proceedings of the 2004 IEEE International Workshop on Robot and Human Interactive Communication, pp. 65–70 (2004)
2. Peremans, H., Walker, A., Hallam, J.C.T.: 3D object localization with a binaural sonarhead, inspirations from biology. In: Proceedings of the 1998 IEEE International Conference on Robotics and Automation, May 1998, pp. 2795–2800 (1998)
3. Reijniers, J., Peremans, H.: Biomimetic sonar system performing spectrum-based localization. IEEE Transactions on Robotics 12(6), 1151–1159 (2007)
4. Kuc, R.: Three dimensional tracking using qualitative sonar. Robotics and Autonomous Systems 11, 213–219 (1993)
5. Bronkhorst, A.W.: Localization of real and virtual sound sources. J. Acoust. Soc. Am. 98(5), 2542–2553 (1995)
6. De Mey, F., Reijniers, J., Peremans, H., Otani, M., Firzlaff, U.: Simulated head related transfer function of the phyllostomid bat Phyllostomus discolor. J. Acoust. Soc. Am. 124, 2123 (2008)
7. Riquimaroux, H.: Measurement of biosonar signals of echolocating bat during flight by a telemetry system (A). J. Acoust. Soc. Am. 117(4), 2526 (2005)
8. Tucker, D.G., Gazey, B.K.: Applied underwater acoustics. Pergamon Press, Oxford (1977)
