Probabilistic Approach for Building Auditory Maps with a Mobile Microphone Array

Nagasrikanth Kallakuri, Jani Even, Yoichi Morales, Carlos Ishi, Norihiro Hagita
Intelligent Robotics and Communication Laboratory, Advanced Telecommunications Research Institute International
Abstract— This paper presents a multi-modal sensor approach for mapping sound sources using an omni-directional microphone array on an autonomous mobile robot. A fusion of audio data (from the microphone array), odometry information, and laser range scan data (from the robot) is used to precisely localize and map the audio sources in an environment. An audio map is created while the robot autonomously navigates through the environment, continuously generating audio scans with a steered response power (SRP) algorithm. Using the poses of the robot, rays are cast in the map in all the directions given by the SRP. Each occupied cell in the geometric map hit by a ray is then assigned a likelihood of containing a sound source, derived from the SRP at that particular instant. Since the localization of the robot is probabilistic, the uncertainty in the pose of the robot in the geometric map is propagated to the occupied cells hit during the ray casting. This process is repeated while the robot is in motion, and the map is updated after every audio scan. The generated sound maps were reused, and the robot updated changes in the audio environment as it identified them.
Fig. 1. Example scenario where prior environmental knowledge enhances the robot's hearing capabilities.
I. INTRODUCTION

In spite of several significant breakthroughs in the field of robot audition over the last decades, it is still a challenging task to precisely localize and map sound sources because of the complexity and variability of the environments. At any given point in space, the perceived sound is the superposition of the contributions from the many sound sources present in the environment. Thus it is necessary to retrieve, from this complicated mixture, the sound that contains the information to be processed and to discard the rest. Consequently, gathering prior knowledge of the audio environment in a given place is beneficial for any sound-related task in that environment. Audio sources are not always constant in location and characteristics. However, some sound sources may be active for relatively long periods of time and be quite localized in space, like a television set, an air conditioner or a loudspeaker in a shopping mall. Thus it is useful to create a map containing the positions and characteristics (power, directionality, frequency and harmonics) of these sound sources, referred to as an audio map in the remainder. The information in an audio map can complement visual and range sensory data and increase a robot's localization, interaction and surveillance capabilities. This research was funded by the Ministry of Internal Affairs and Communication, Japan. All the authors are with the Advanced Telecommunications Research Institute International, Kyoto, Japan.
For example, considering the illustration in Fig. 1, the robot perceives two dominant sounds: the television set on its left and the user's voice on its right. Supposing it is an autonomous robot localized in the map, it is aware of its current pose. If an audio map is available, the robot has prior knowledge about the fixed sound sources in the environment and knows that the data coming from its left originates from a fixed, less important source. Then, based on the characteristics of the new sound in the environment (in this case speech), the robot can perform relevant operations such as positioning itself to listen to the user and interact with him/her more effectively. Like the geometric map, the audio map of an environment is not constant and keeps changing. It would be a tedious task to make a new audio map whenever a change occurs in the environment. For that reason, it is more convenient to develop a framework that updates these changes in the map as the robot observes them while performing its usual tasks. In this paper, we present a framework for creating such environmental sound maps using an autonomous mobile robot equipped with a microphone array. The localization of sound sources from a mobile platform has been studied for some years in the robotics community [1], [2], [3], [4]. The novelty of the present work is in obtaining the audio information
of an environment as an actual map that can be reused and updated by the robot. Most autonomous mobile robots possess a good description of the geometry of the environment in the form of a map, obtained by fusing LIDAR (Light Detection and Ranging) sensor data with the odometry data from the encoders on the wheels of the robot. The robots use the range scanning sensors and odometry to localize themselves in the geometric map. This ability to precisely estimate the range of the objects around the robot is exploited in this paper to compensate for the poor range estimation of audio localization techniques. Consequently, our approach assumes that the geometric coordinates of sound sources are detectable by the LIDAR on the robot. While this assumption is a bit restrictive when a two-dimensional horizontal plane is scanned, it becomes reasonable when extended to a three-dimensional scan. Another particularity is that the available information about the localization of the robot in the environment is propagated during the creation process and embedded in the final audio map. Moreover, the fusion of the audio data with the distance scans obtained from the laser range finders on the robot and the available geometric information of the environment makes the sound source localization more precise and computationally less expensive. The proposed framework for creating and updating an audio map consists of four stages: audio scanning, ray casting, likelihood accumulation, and clustering of the cells with high likelihood. Using this approach, reliable audio maps were created and updated even in environments with people talking or walking around.

II. PROBLEM STATEMENT AND RELATED WORKS

Sound source localization has been a topic of interest in the audio processing community for a long time (see [5]). The most effective techniques that emerged are either based on the estimation of the time delay of arrival at microphone pairs, on the estimation of a steered response power (SRP), or on spectral decomposition techniques like the MUSIC algorithm. All these approaches rely on the use of microphone arrays. Using a robot, it is possible to explore the space and effectively extend the operational range of sound localization. Thus a natural framework for sound source localization from a robot is to use a conventional sound localization algorithm at different locations and combine the results from all these locations. The steered response power with phase transform (SRP-PHAT) [6] is an efficient sound localization algorithm that is suitable for mobile robot applications [7]. In order to present this framework in greater detail, let us first briefly describe the SRP approach to sound source localization. The goal of sound source localization is to estimate the position of sound sources in a search space using the audio observation. At the sampled time k, the observed signals from the Q microphones of the array are $v_1(k), \cdots, v_Q(k)$. Because the geometry of the microphone array is precisely known, it is possible to focus the array
using spatial beamforming to estimate the sound from a spatial location. The beamforming output is denoted by

$$s(k, [x, y, z]) = F(v_1(k), \cdots, v_Q(k), [x, y, z]), \quad (1)$$

where $[x, y, z]$ are the coordinates of the focus point in the array referential. The SRP is obtained by computing the power of this output over $T$ samples

$$J(k, [x_n, y_n, z_n]) = \frac{1}{T} \sum_{\tau=0}^{T-1} s^2(k - \tau, [x_n, y_n, z_n]) \quad (2)$$
for a set of N locations $[x_n, y_n, z_n]_{n \in [1, N]}$ in the search space. The locations corresponding to the peaks of the SRP give the sound sources' positions. There are several ways to obtain the beamforming output, compute the power and select the set of locations. In the remainder, the SRP obtained at time k that contains the power from the N locations is referred to as the kth audio scan. Consequently, performing sound localization using the mobility of the robot means combining audio scans obtained from different locations at different times. In particular, note that these scans are computed in search spaces in the array's frame of reference (as the position of the focus point has to be known in the array's frame of reference). Thus it is necessary to transform the poses of the robot at these locations to a global coordinate system in order to combine the scans into an audio map. Since these localization results are obtained at different times, it is important to distinguish between fixed sound sources and moving sound sources. In this paper, we are interested in mapping the environmental noises that are fixed sources (in fact, having an audio map of the fixed sound sources is a good prior for detecting the moving ones).

Sound localization is usually performed from a stationary microphone array and rarely from a moving one, and the robot's mobility has seldom been used to improve the localization. For example, the authors in [4] rely on triangulation to estimate the positions of the sound sources using scans taken by an autonomous mobile robot. They obtained an average localization error of 28.2 cm for 11 sound sources. One of the most interesting approaches in this area is the use of evidence grids in [2], [8]. The space to be explored is partitioned into grid cells of fixed size. Then the probabilities of having a sound source in each of the cells are estimated during the exploration. To achieve this, at a given location, an SRP is estimated for a grid centered on the robot, and these estimated powers are used to update the evidence grid. In this method, the robot was tele-operated to gather sound data in the vicinity of the sound sources, and the approach succeeded in finding up to two sound sources with an error of around 24 cm [8].

III. PROPOSED APPROACH

A block diagram describing the proposed approach is shown in Fig. 3. The processing blocks will be described in the following sections.
Fig. 2. Experimental setup.
A. Audio scan acquisition

In this paper, the SRP-PHAT processing is done in the frequency domain after applying a short-time Fourier transform (STFT) to the observed signals sampled at 48 kHz (the analysis window is 25 ms long and the shift of the window is 10 ms). The SRP is then computed for the frequency band [1000, 6000] Hz using 5 STFT frames for averaging the power. Thus a new audio scan is available every 50 ms. A particularity of sound source localization algorithms is that the estimation of a source's range is imprecise whereas its bearing estimate is accurate. Thus spherical coordinates $[\rho, \theta, \phi]$ are often used to describe the search space. Contrary to [2], a bearing-only scan is performed under the far-field assumption ($\rho$ large compared to the array aperture). Namely, the kth scan is a set of N angles $\theta_n(k) \in [0, 2\pi]$ with their associated powers $J_n(k)$ (note that these angles are obtained in the robot referential frame). In our approach the distances are obtained using the geometric map, as explained in Sect. III-D.
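As a concrete illustration of this scan acquisition, the following is a minimal Python sketch of a frequency-domain, bearing-only SRP-PHAT scan under the far-field assumption. It is not the authors' implementation: the function name, the angular resolution `n_angles`, and the layout of the `frames` and `mic_xy` inputs are assumptions of this sketch.

```python
import numpy as np

def srp_phat_bearing_scan(frames, mic_xy, fs=48000, n_angles=72,
                          f_band=(1000.0, 6000.0), c=343.0):
    """Bearing-only SRP-PHAT scan under the far-field assumption.

    frames : (n_frames, frame_len, Q) array of windowed input frames.
    mic_xy : (Q, 2) microphone positions in the array frame [m].
    Returns (angles, power): the N candidate bearings theta_n and the
    associated scan powers J_n of Eq. (2).
    """
    n_frames, frame_len, Q = frames.shape
    spec = np.fft.rfft(frames, axis=1)                  # (n_frames, bins, Q)
    spec = spec / (np.abs(spec) + 1e-12)                # PHAT: keep phase only
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)        # bin frequencies [Hz]
    band = (freqs >= f_band[0]) & (freqs <= f_band[1])  # restrict to 1-6 kHz

    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    power = np.zeros(n_angles)
    for n, theta in enumerate(angles):
        u = np.array([np.cos(theta), np.sin(theta)])    # unit vector, bearing n
        tau = mic_xy @ u / c                            # far-field delays [s]
        # steering: phase-align each channel toward bearing theta
        steer = np.exp(2j * np.pi * freqs[band][:, None] * tau[None, :])
        beam = (spec[:, band, :] * steer[None, :, :]).sum(axis=2)
        power[n] = np.mean(np.abs(beam) ** 2)           # average frames/bins
    return angles, power
```

With the settings of the paper (48 kHz sampling, 25 ms analysis windows shifted by 10 ms, 5 frames per scan), one such 360-degree power profile would be produced every 50 ms.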
Fig. 3. Block diagram of the system.
B. Geometric Map Building

Simultaneous Localization And Mapping (SLAM) based mapping techniques are well studied, mature and widely implemented; in this work we use them for building grid maps. To build the map, we drove the robot with a joystick through the environment, gathering odometry and laser sensor information. We then used iterative closest point based SLAM to correct the trajectory of the robot and to align the laser sensor scans [9] using the 3DToolkit library framework [10], [11]. With the resulting aligned scans, an occupancy grid map was created [12], [13]. The map obtained for one of the test environments is shown in Fig. 4 (a).

C. Localization Approach
We used a particle filter approach to estimate the robot pose with a weighted set of M particles. Each particle has a pose given by the state vector $\{x_m(k), y_m(k), \theta_m(k), w_m(k)\}$ containing a candidate position and orientation of the robot and the associated weight. While the robot moves, each particle also moves according to the odometry readings and the probabilistic motion model, which describes the uncertainty in the robot motion. In the correction step, the particle filter estimates the posterior density by considering the measurement likelihood, computed from a sensor scanning the environment. We used the ray casting likelihood model from [14] to compute the particle weights. The map update depends on the state of the particle dispersion and the matching of the laser scans. The particles that are more likely to be correct after ray casting have a higher likelihood score and therefore more weight. Particle re-sampling is then performed, and the robot pose is given by the weighted average of the particles.

D. Likelihood accumulation and Sound Source mapping

During the autonomous navigation, the audio scans $\{\theta_n(k), J_n(k)\}$ are in the robot coordinate frame, and the goal of the fusion procedure is to combine them with the knowledge about the robot pose in the global referential in order to estimate the position of the sound sources in the geometric map. The main idea is to cast rays from the robot pose in these directions and find the occupied cells of the geometric map that are hit by these rays (this is the reason why we assume that the sound sources are visible in the geometric map). The knowledge about the robot pose is contained in the distribution of the particles. To propagate this knowledge in the audio direction $\theta_n(k)$, a ray is cast from each of the M particles. For the mth particle, the ray is cast from the point $[x_m(k), y_m(k)]$ in the direction $\theta_m(k) + \theta_n(k)$. The ray casting stops when it hits an occupied cell of the geometric map, when it enters a non-explored region of the map (the gray area in Fig. 4 (a)), or when it reaches the maximum range d. For the cell $\{i, j\}$, the hit probability for the kth scan is denoted by $c_{ij}(k)$. The hit count is initialized to zero; then, if the cell $\{i, j\}$ is hit by the ray cast from the mth particle of the kth scan, it is incremented by

$$c_{ij}(k) = c_{ij}(k) + w_m. \quad (3)$$
After casting a ray from each of the M particles, $c_{ij}(k)$ represents the probability that the cell $\{i, j\}$ was hit. Thus, the audio direction $\theta_n(k)$ may be associated with several cells, and a probability is estimated for each of these cells. Fig. 4 (b) shows the cells hit by a scan and their associated hit probabilities.
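To make the ray casting and Eq. (3) concrete, the following Python sketch casts one audio bearing from every particle and accumulates the per-cell hit probabilities. The grid encoding (0 free, 1 occupied, -1 unexplored), the half-cell marching step, and the function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

def accumulate_hits(grid, particles, theta_n, d_max=3.0, cell=0.05):
    """Cast audio bearing theta_n from every particle and return the
    per-cell hit probabilities c_ij(k) of Eq. (3).

    grid      : 2-D int array, 0 = free, 1 = occupied, -1 = unexplored.
    particles : iterable of (x_m, y_m, theta_m, w_m), weights summing to 1.
    theta_n   : audio bearing in the robot referential frame [rad].
    """
    c = {}                                   # sparse map {(i, j): c_ij(k)}
    step = cell / 2.0                        # march half a cell at a time
    for x, y, theta, w in particles:
        heading = theta + theta_n            # bearing in the global frame
        dx, dy = np.cos(heading), np.sin(heading)
        r = 0.0
        while r < d_max:                     # stop at the maximum range d
            i = int((y + r * dy) / cell)     # row index of the current point
            j = int((x + r * dx) / cell)     # column index
            if not (0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]):
                break
            if grid[i, j] == -1:             # entered unexplored space
                break
            if grid[i, j] == 1:              # hit an occupied cell
                c[(i, j)] = c.get((i, j), 0.0) + w   # Eq. (3): c_ij += w_m
                break
            r += step
    return c
```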
TABLE I
PARAMETERS USED DURING ALL RUNS.

ε_max = 20   ε_range = 20   ε_norm = 40   α = 0.5   L_min = 0.01
L_max = 0.990   ε_proba = 0.025   log-odds threshold ±L_max = 400   d = 3 m
Fig. 4. (a) Geometric map of the corridor with sound sources and robot trajectory. (b) Enlarged portion showing the hit probability $c_{ij}(k)$ obtained after casting a ray (the black crosses represent the occupied cells).
During the ray casting, the cell $\{i, j\}$ is also assigned a likelihood that is a function of the audio scan power. This likelihood is only created for the scans that are informative. A scan is considered informative when it has a powerful maximum and the range of the power values is large. Thus, scans that associate similar power to all directions are discarded. The kth scan is selected if

$$\max_n J_n(k) > \varepsilon_{max} \quad \text{and} \quad \max_n J_n(k) - \min_n J_n(k) > \varepsilon_{range},$$
where $\varepsilon_{max}$ and $\varepsilon_{range}$ are thresholds. For the selected scans, the likelihood function is obtained by taking

$$L_n(k) = (L_{max} - L_{min}) \left( \frac{J_n(k) - J_{min}(k)}{J_{max}(k) - J_{min}(k)} \right)^{\alpha} + L_{min}, \quad (4)$$

where $\alpha$ is used to control the sharpness of the likelihood peaks and

$$J_{min}(k) = \min_n J_n(k), \qquad J_{max}(k) = \max\left( \max_n J_n(k), \varepsilon_{norm} \right).$$

The parameter $\varepsilon_{norm}$ controls the range of the likelihood for scans with low power. Note that the likelihood is scaled to a range $[L_{min}, L_{max} \max_n J_n(k) / \varepsilon_{norm}]$. This scaling and peak sharpness control is similar to the use of the localization function for creating a likelihood in audio source tracking [15]. In [2], [8], a scaled version of the scan is used, but no selection process is applied and the sharpness is not modified. The cell $\{i, j\}$ is assigned a log-odds value as follows:

$$L_{ij}(k) = L_{ij}(k-1) \quad (5)$$

if the kth scan is not selected or $c_{ij}(k) < \varepsilon_{proba}$, and

$$L_{ij}(k) = L_{ij}(k-1) + w_m M \log \frac{L_n(k)}{1 - L_n(k)} \quad (6)$$

otherwise. The initial log-odds value is

$$L_{ij}(0) = \log \frac{p_{ij,init}}{1 - p_{ij,init}}; \quad (7)$$

if there is no prior information on the source localization, we take $p_{ij,init} = 0.5$ for all occupied cells $\{i, j\}$ (flat initialization). During accumulation, the log-odds are thresholded to $\pm L_{max}$.
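Combining the scan selection test with Eqs. (4)-(7), one possible accumulation step is sketched below in Python. The parameter names mirror Table I, `hits_per_bearing` is assumed to come from a ray casting step like the one sketched above, and, as a simplification of this sketch, the increment weights the log-odds by the accumulated hit probability $c_{ij}(k)$ rather than the per-particle $w_m M$ factor of Eq. (6).

```python
import numpy as np

def update_log_odds(L, J, hits_per_bearing, eps_max=20.0, eps_range=20.0,
                    eps_norm=40.0, alpha=0.5, L_min=0.01, L_max=0.990,
                    eps_proba=0.025, L_cap=400.0):
    """Accumulate one audio scan into the log-odds map L (Eqs. 4-7).

    L                : dict {(i, j): log-odds}; a missing cell defaults to
                       log(0.5 / 0.5) = 0, i.e. flat initialization (Eq. 7).
    J                : (N,) array of scan powers J_n(k).
    hits_per_bearing : list of N dicts {(i, j): c_ij(k)} from ray casting.
    """
    # scan selection: keep only informative scans (strong peak, wide range)
    if J.max() <= eps_max or (J.max() - J.min()) <= eps_range:
        return L                                    # Eq. (5): map unchanged
    J_lo = J.min()
    J_hi = max(J.max(), eps_norm)           # eps_norm bounds low-power scans
    for n, hits in enumerate(hits_per_bearing):
        # Eq. (4): shape the normalized power into a likelihood L_n(k)
        Ln = (L_max - L_min) * ((J[n] - J_lo) / (J_hi - J_lo)) ** alpha + L_min
        odds = np.log(Ln / (1.0 - Ln))
        for cell, c_ij in hits.items():
            if c_ij < eps_proba:                    # Eq. (5): unlikely hit
                continue
            # Eq. (6), simplified: weighted increment, clipped to +/- L_cap
            L[cell] = float(np.clip(L.get(cell, 0.0) + c_ij * odds,
                                    -L_cap, L_cap))
    return L
```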
The number of scans for which the cell $\{i, j\}$ is seen is denoted by $K_{ij}(k)$. This count is used to remove cells that have been seen very few times. The sound source localization in the geometric map is performed by finding the cells that have high log-odds, clustering these cells, and computing the weighted average position within each cluster (the weights are the log-odds $L_{ij}(k)$ of the selected cells).

IV. EXPERIMENTS

For experimental validation we used a Pioneer robot. This robot has a differential drive configuration and was equipped with two motor encoders and two 30 m range laser range finders (UTM-30LX from Hokuyo). The experimental platform can be seen in Fig. 2. The microphone array is composed of 16 Sony ECMC10 microphones mounted on a circular frame (diameter 31 cm). The audio capture interface is a Tokyo Electron Device Limited TDBD16AD-USB that samples the signals at 48 kHz. The experimental evaluation of the approach was conducted in two different locations: a closed room and a corridor. Different sound sources of known intensities were set up in these environments. To illustrate the approach, the corridor data is used. Fig. 4 (a) depicts the geometric map of the environment. The dimension of the cells in this map is 5 cm x 5 cm. Having a geometric map of the environment, the robot is able to navigate autonomously in the corridor. The robot was given a set of way points that defined a loop covering all parts of the corridor. In the remainder, a run corresponds to the robot performing one loop in the corridor. The audio map is generated during these runs. Several (up to four) sound sources were placed in the environment (their locations are in the scan plane of the UTMs). These sound sources are loudspeakers playing recorded sounds: an air conditioning unit (S1, with a sound pressure of 78.5 dBA measured at 5 cm), a desktop computer fan (S2, 77.5 dBA), a server rack (S3, 77 dBA) and a pop song (S4, 73 dBA). The sound pressure in the quiet corridor was around 42 dBA. The activation pattern of the sound sources for the runs can be seen in Table II. During Run 4, two persons were walking and talking in the environment. The parameters for the audio scan selection, the shaping of the audio scan into a likelihood function, the likelihood accumulation and the maximum distance for ray casting are given in Table I.

V. RESULTS

During the autonomous navigation, when an audio scan is available, the log-odds of the occupied cells are updated.
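Before turning to the results, the source extraction described at the end of Sect. III-D can be sketched as follows. The 8-connected flood fill is an assumption (the paper does not specify the clustering algorithm); the centroid weights are the log-odds, as in Sect. III-D.

```python
def extract_sources(L, cell=0.05):
    """Cluster cells with positive log-odds and return the weighted
    centroid and the log-odds sum of each cluster (cf. Table II).

    L : dict {(i, j): log-odds}. Returns a list of (x, y, log_odds_sum).
    """
    positive = {c for c, v in L.items() if v > 0.0}
    sources, seen = [], set()
    for seed in positive:
        if seed in seen:
            continue
        stack, cluster = [seed], []
        while stack:                          # flood fill over 8-neighbors
            node = stack.pop()
            if node in seen or node not in positive:
                continue
            seen.add(node)
            cluster.append(node)
            i, j = node
            stack += [(i + di, j + dj) for di in (-1, 0, 1)
                      for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
        total = sum(L[c] for c in cluster)    # log-odds sum of the cluster
        x = cell * sum(j * L[(i, j)] for i, j in cluster) / total
        y = cell * sum(i * L[(i, j)] for i, j in cluster) / total
        sources.append((x, y, total))
    return sources
```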
Fig. 5. Raw audio maps (log-odds) for the runs with flat initialization; (o) is the ground truth and (x) is the detected source. (a) Run 1 with no sources active, (b) Run 3 with sources S1, S2 and S3 active, (c) Run 4 with S1, S2 and S3 active and people walking and talking around, (d) Run 5 with S1 and S3 active, (e) Run 6 with S1, S2, S3 and S4 active, (f) run in the room scenario with three sources active.
The raw audio map is obtained by taking the log-odds of the cells for which the number of visits $K_{ij}(k)$ is greater than 5% of the maximal number of visits. Thus an updated raw audio map is available after each audio scan. Fig. 5 (a)-(f) shows the raw audio maps obtained at the end of the runs with a flat initialization. For these raw audio maps, the cells with positive likelihood are clustered. The locations of the cluster centroids appear as black crosses. The ground truth, i.e. the real positions of the sound sources, is given as black circles. For each of the clusters, the number of cells and the sum of the log-odds are given in Table II. In the table, S* represents a false detection where there is no source. To demonstrate the ability to update an existing audio map, for each of the runs, the raw audio map of the previous run was used as initialization (only run 1 had a flat initialization). In order to detect new sound sources and delete the sources that no longer exist, the final raw audio map of a run is scaled down by a factor of 20. The clusters, the number of cells and the sum of the log-odds obtained after each of the consecutive runs are given in Table III.

VI. DISCUSSION

The proposed approach successfully estimated the positions of the sound sources in two different environments. For the flat initialization, the average localization error was 12.88 cm, and for the consecutive runs, 15.53 cm. Considering that the cell dimension is 5 cm x 5 cm and that the loudspeakers are not point sources but may span several cells, an error in the range of a few cells indicates a precise localization. The highest log-odds sum obtained for a wrongly detected sound source is 687 (Fig. 5 (e)), which is much smaller than the smallest log-odds sum obtained for an active source, 2116 (Fig. 5 (c)). Note that this lower value corresponds to
TABLE II
SOUND SOURCE LOCALIZATION RESULTS FOR FLAT INITIALIZATION.

Run | Active Sources | Detected | Error (cm) | Log-odds Sum | Cell Count
 1  | none           | S*       |     -      |    408       |    6
 2  | S1, S2, S3     | S1       |   14.81    |   3012       |   26
    |                | S2       |   24.58    |  10421       |   36
    |                | S3       |    1.36    |  11905       |   41
 3  | S1, S2, S3     | S1       |   12.27    |   4390       |   26
    |                | S2       |   18.10    |  10842       |   40
    |                | S3       |    1.67    |  11373       |   41
 4  | S1, S2, S3     | S1       |   16.11    |   2116       |   22
    |                | S2       |   27.87    |  10702       |   41
    |                | S3       |    2.18    |   9048       |   35
 5  | S1, S3         | S1       |   11.63    |   7195       |   32
    |                | S3       |    6.11    |  12982       |   44
 6  | S1, S2, S3, S4 | S1       |   15.78    |   2822       |   18
    |                | S2       |   25.52    |  11051       |   42
    |                | S3       |    9.70    |  11146       |   41
    |                | S4       |    5.54    |   6233       |   22
    |                | S*       |     -      |    687       |   12
run 4, when two persons were talking and moving in the environment during mapping. Moreover, the estimated raw audio maps proved to be a good basis for an autonomous update procedure, as illustrated in Table III. The selection of the informative audio scans and the shaping of the likelihood from the scan are an important part of the success of the audio mapping. In particular, they determine how well the algorithm discovers new sources or deletes sources that are no longer active. However, a trade-off has to be made between these two abilities. In our case, detecting new sources is considered more important than the complete deletion of the sources that are no longer active. But as can be seen in Table III, the log-odds of partially removed sources are considerably reduced (for example, the log-odds sum of source S4 goes from 5831 to 36 between run
TABLE III
SOUND SOURCE LOCALIZATION RESULTS FOR THE SUCCESSIVE RUNS.

Run | Active Sources | Detected | Error (cm) | Log-odds Sum | Cell Count
 1  | none           | S*       |     -      |    437       |    6
 2  | S1, S2, S3     | S1       |   14.25    |   3599       |   26
    |                | S2       |   25.51    |  10683       |   37
    |                | S3       |    4.45    |  11599       |   40
 3  | S1, S2, S3, S4 | S1       |   15.66    |   2826       |   19
    |                | S2       |   29.65    |  11216       |   41
    |                | S3       |    4.45    |  11418       |   42
    |                | S4       |    6.78    |   5831       |   21
    |                | S*       |     -      |    574       |   10
 4  | S1, S2, S3     | S1       |   12.54    |   4329       |   26
    |                | S2       |   20.35    |  10709       |   38
    |                | S3       |    3.13    |  11354       |   41
    |                | S4       |   20.64    |     36       |    4
 5  | S1, S3         | S1       |    8.86    |   7143       |   32
    |                | S2       |   22.04    |     84       |   19
    |                | S3       |    5.10    |  13185       |   44
 6  | none           | S1       |   19.50    |     11       |    2
    |                | S2       |   39.51    |      3       |    9
    |                | S3       |   11.64    |     12       |    2
    |                | S*       |     -      |    564       |   10
3 and run 4 of the successive runs). Selecting parameters that favor source discovery over source deletion results in keeping track of sources that were active for several runs but are missing in the current run. In Fig. 5 (b)-(f), the spatial density of the log-odds around the sound sources reflects the uncertainty about the robot's position that is propagated during the ray casting. But for sound sources spanning a small island of occupied cells in the geometric map, like the one on the right of Fig. 5 (f), the propagation of this uncertainty created a cluster of high log-odds behind the true sound source position. This cluster was not suppressed because the robot could not pass behind that source. By comparison, the other island sound source on the left of Fig. 5 (f) was successfully located, as the robot could move around it. Thus, one of the limitations of this approach is the localization of such isolated small sound sources when the robot cannot navigate around them. Different resolutions of the grid map (10 cm x 10 cm and 20 cm x 20 cm) were considered, but no substantial differences were observed. The maximal distance d used for the ray casting was set to 3 meters because beyond 3 meters the sound emitted by the sources is below the noise level in the room. Moreover, a longer ray casting distance would result in higher dispersion and an increased computational load.

VII. CONCLUSIONS

This paper presented a framework for creating audio maps of environmental sound sources using an autonomous mobile robot equipped with encoders, laser sensors and a 16-channel microphone array. The audio maps obtained in the experiments had an average distance error of 15.53 cm, showing that the proposed framework is capable of producing precise maps. Up to 4 sources within an 8 m x 8 m space were precisely localized on the map. The method is also robust to false positive detections and to noise effects produced by echoes, and the performance did
not vary much with people moving around and talking in the environment. The maps created are reusable and can be updated as soon as a change is observed in the environment. The novelty of the approach is in the combination of the audio scans with the geometric map, which makes it computationally less expensive and more accurate. This is because, in our approach, only the occupied cells of the geometric map are considered and the knowledge about the robot pose is propagated to these cells during the mapping. Such audio maps will help a robot attain a better knowledge of environmental noise. The generated audio maps can be used for better speech recognition (by suppressing the known environmental noise), effective human-robot interaction, and surveillance of environments. With the available framework, it is possible to extend the work to three-dimensional sound source localization, which would give more informative audio maps.

REFERENCES

[1] J.-M. Valin, J. Rouat, and F. Michaud, "Enhanced robot audition based on microphone array source separation with post-filter," in Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), vol. 3, Sept.-Oct. 2004, pp. 2123-2128.
[2] E. Martinson and A. C. Schultz, "Auditory evidence grids," in IROS. IEEE, 2006, pp. 1139-1144.
[3] K. Nakadai, H. Okuno, H. Nakajima, Y. Hasegawa, and H. Tsujino, "An open source software system for robot audition HARK and its evaluation," in IEEE-RAS International Conference on Humanoid Robots, 2008, pp. 561-566.
[4] Y. Sasaki, S. Thompson, M. Kaneyoshi, and S. Kagami, "Map-generation and identification of multiple sound sources from robot in motion," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), 2010, pp. 437-443.
[5] J. DiBiase, H. Silverman, and M. Brandstein, Microphone Arrays: Signal Processing Techniques and Applications. Springer-Verlag, 2007.
[6] M. Brandstein and H. Silverman, "A robust method for speech signal time-delay estimation in reverberant rooms," in IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP 1997), 1997, pp. 375-378.
[7] A. Badali, J.-M. Valin, F. Michaud, and P. Aarabi, "Evaluating real-time audio localization algorithms for artificial audition on mobile robots," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), 2009, pp. 2033-2038.
[8] E. Martinson and A. C. Schultz, "Robotic discovery of the auditory scene," in ICRA, 2007, pp. 435-440.
[9] P. Besl and H. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, Feb. 1992.
[10] D. Borrmann, J. Elseberg, K. Lingemann, A. Nüchter, and J. Hertzberg, "The efficient extension of globally consistent scan matching to 6 DoF," in Proceedings of the 4th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT '08), Atlanta, USA, June 2008, pp. 29-36.
[11] slam6d, "Slam6d - simultaneous localization and mapping with 6 DoF," retrieved May 20, 2011 from http://www.openslam.org/slam6d.html, 2011.
[12] H. Moravec and A. Elfes, "High resolution maps from wide angle sonar," in Proceedings of the 1985 IEEE International Conference on Robotics and Automation, March 1985, pp. 116-121.
[13] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, no. 6, pp. 46-57, June 1989.
[14] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). The MIT Press, 2005.
[15] D. B. Ward, E. A. Lehmann, and R. C. Williamson, "Particle filtering algorithms for tracking an acoustic source in a reverberant environment," IEEE Transactions on Speech and Audio Processing, vol. 11, pp. 826-836, Nov. 2003.