Bayesian Sensor Model for Egocentric Stereovision

João Filipe Ferreira, Cátia Pinho and Jorge Dias
ISR (Institute of Systems and Robotics), FCT-University of Coimbra, Coimbra, Portugal
[email protected]

Abstract
In this text we will briefly present motivations, theory and results for a Bayesian approach that models stereovision as a probabilistic process in a log-spherical egocentric frame¹. This spatial configuration is in general agreement with what is believed regarding how the brain represents space for fast active perception purposes. Consequently, it provides several advantages over its Euclidean counterparts.
1. Introduction

Perception has been regarded as a computational process of unconscious, probabilistic inference. Aided by developments in statistics and artificial intelligence, researchers have begun to apply the concepts of probability theory rigorously to problems in biological perception and action. One striking observation from this work is the myriad ways in which human observers behave as near-optimal Bayesian observers, which has fundamental implications for neuroscience, particularly in how we conceive of neural computations and the nature of neural representations of perceptual variables [4].

Consider the following scenario: an observer is presented with a 3D scene. How does this observer perceive its 3D structure, given the ambiguities and conflicts inherent to the perceptual process? We advocate the Bayesian framework as a promising approach to solving this perception problem, and it is within this perspective that the research presented in this text is framed.

To support our research work, an artificial multimodal perception system, IMPEP (Integrated Multimodal Perception Experimental Platform), has been constructed at the ISR/FCT-UC, consisting of a stereovision, binaural and inertial measuring unit setup mounted on a motorised head, with gaze control capabilities for image stabilisation and perceptual attention purposes (see Fig. 1).

¹ This work has been supported by EC contract number FP6-IST-027140, Action line: Cognitive Systems. The contents of this text reflect only the author's views. The European Community is not liable for any use that may be made of the information contained herein.
Figure 1. View of the Integrated Multimodal Perception Experimental Platform (IMPEP) and respective reference frames.
2. Background and definitions

The perceptual brain contains a set of functional areas, the dorsal pathway, which are dedicated to fast processing of sensory information with the sole purpose of yielding spatial representations, with no further concern in analysing or classifying the scene itself beyond this goal [3]. Given the perceptual problem exposed earlier, this is of particular interest to our work. Spatial representations in this pathway are believed to be metric and egocentric in lower-level areas, so as to promote fast and accurate interaction with the surrounding environment [3].

We have therefore decided to use the occupancy grid, a discretised random field in which a probability of occupancy is kept for each cell, with the occupancy values of all cells considered mutually independent [1]. The absence of an object-based representation makes it easy to fuse low-level descriptive sensory information onto the grids without requiring data association.

Sensor space is defined as a log-spherical volumetric occupancy grid $\mathcal{Y}$, with each cell indexed by its far corner $C \equiv (\log_b \rho_{max}, \theta_{max}, \phi_{max}) \in \mathcal{C} \subset \mathcal{Y}$. This configuration has the advantage of providing a natural setting for the integration of stereoscopic depth cues, since the latter are directly a function of egocentric spherical coordinates. Moreover, the logarithmic partitioning of distance accounts for the increasing just-noticeable differences (JND) [3] of stereoscopic disparity cues corresponding to surfaces at increasing distances, caused by photoreceptor discretisation, thus promoting an efficient use of memory resources. Finally, our model assumes that egomotion (rotational only) is estimated through time (please refer to [2]).
Figure 2. Bayes network corresponding to the Bayesian Program for the vision sensor model.

Figure 3. Results (centre and right) of 22 steps of probabilistic inference on the occupancy of a 3D scene (left), projected onto Euclidean space. The occupancy grid top view shown on the right shows the model's usefulness for time-to-collision computations.
To compute the probability distributions for the current state of each cell, the Bayesian Program (BP) formalism, as first defined by Lebeltel [5], will be used throughout this text.
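Before moving on, the following Python sketch makes the log-spherical sensor space concrete. It is our own illustration: the class name, grid bounds, resolutions and the log base $b$ are assumptions, not values from the IMPEP implementation. Note how equally-sized cells along $\log_b \rho$ yield coarser depth resolution at larger distances, mirroring the JND argument above.

```python
import numpy as np

class LogSphericalGrid:
    """Hypothetical log-spherical occupancy grid (illustrative only)."""

    def __init__(self, rho_min=0.5, rho_max=10.0, b=1.1, n_theta=360, n_phi=180):
        self.b = b
        self.log_rho_min = np.log(rho_min) / np.log(b)     # log_b(rho_min)
        log_rho_max = np.log(rho_max) / np.log(b)          # log_b(rho_max)
        self.n_rho = int(np.ceil(log_rho_max - self.log_rho_min))
        self.n_theta, self.n_phi = n_theta, n_phi
        # One occupancy probability per cell, initialised to an uninformative prior.
        self.P_occ = np.full((self.n_rho, n_theta, n_phi), 0.5)

    def cell_index(self, x, y, z):
        """Map an egocentric Euclidean point to its log-spherical cell index."""
        rho = np.sqrt(x * x + y * y + z * z)
        theta = np.arctan2(y, x)            # azimuth in [-pi, pi]
        phi = np.arccos(z / rho)            # polar angle in [0, pi]
        # Cells are equally sized in log_b(rho), so depth resolution degrades
        # with distance, as the disparity JNDs do.
        i = int(np.log(rho) / np.log(self.b) - self.log_rho_min)
        j = int((theta + np.pi) / (2.0 * np.pi) * self.n_theta)
        k = int(phi / np.pi * self.n_phi)
        return i, min(j, self.n_theta - 1), min(k, self.n_phi - 1)
```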
3. Bayesian sensor model for stereovision

For visual perception of occupancy, the stereovision sensor can be decomposed into simpler linear (1D) depth sensors measuring $\rho(k, i)$ per projection line/pixel $(k, i)$, each oriented in space with spherical angles $(\theta(k, i), \phi(k, i))$ extending from the egocentric reference frame $\{E\}$ (i.e., cyclopean geometry). We have decided to model these sensors in terms of their contribution to the estimation of the probability of cell occupancy ($O \equiv \{empty, occupied\}$), in a fashion similar to the solution proposed by Yguel et al. [6] for range sensors (see the Bayesian Program presented in Fig. 2).

The goal is to estimate $P(O_C \mid Z\, C)$, where $Z$ and $O_C \in O$ are random variables representing, respectively, a set of $m$ depth measurement vectors $[\hat{\delta}_m, \lambda_m]$ ($\hat{\delta}_m$ being a disparity reading and $\lambda_m \in [0, 1]$ the corresponding confidence value) and the occupancy state corresponding to each cell $C$. This is achieved through Bayesian inference on the decomposition equation shown in Fig. 2, with $P(O_C \mid C)$ representing the prior on occupancy (i.e., the previous state of the occupancy grid) and $P(C)$ a uniform distribution representing an estimation process assumed to be unbiased by each cell's location in space.
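As a minimal sketch of this inference step (again our own illustration, not the authors' implementation; the decomposition equation of Fig. 2 is not reproduced in this text, but the stated terms suggest the usual Bayesian-Program form $P(Z\, O_C\, C) = P(C)\, P(O_C \mid C)\, P(Z \mid O_C\, C)$), the posterior for a single cell reduces to a normalised product of prior and measurement likelihood, with the uniform $P(C)$ cancelling out:

```python
def update_cell_occupancy(prior_occ, lik_occ, lik_empty):
    """One Bayesian inference step on the occupancy of a single cell C.

    prior_occ : P([O_C = occupied] | C), previous state of the grid cell
    lik_occ   : P(Z | [O_C = occupied] C), likelihood of the measurements
                given an occupied cell
    lik_empty : P(Z | [O_C = empty] C), likelihood given an empty cell
    Returns P([O_C = occupied] | Z C).
    """
    post_occ = lik_occ * prior_occ
    post_empty = lik_empty * (1.0 - prior_occ)
    return post_occ / (post_occ + post_empty)   # normalisation over O
```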
For a specific projection line, given the first occupied cell $[C = k]$ on the line-of-sight, we define

$$P_k(Z) = L_k(Z, \mu_\rho(k), \sigma_\rho(k)), \quad \begin{cases} \mu_\rho(k) = \hat{\rho}(\hat{\delta}) \\ \sigma_\rho(k) = \frac{1}{\lambda}\, \sigma_{min} \end{cases} \qquad (1)$$

with $\sigma_{min}$ and $\hat{\rho}(\hat{\delta})$ taken from camera calibration. This likelihood function constitutes, in fact, the Gaussian elementary sensor model as defined by Yguel et al., adapted so as to perform the transformation to log-distance space.
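The following sketch evaluates this elementary model. The helper name `rho_hat` and the exact handling of the log-distance transformation are our assumptions; the text does not spell out, for instance, whether $\sigma$ is rescaled when moving to log space.

```python
import numpy as np

def elementary_likelihood(z_log_rho, delta_hat, lam, sigma_min, rho_hat):
    """Gaussian elementary sensor model of Eq. (1), in log-distance space.

    z_log_rho : log-distance coordinate at which the likelihood is evaluated
    delta_hat : disparity reading for this projection line
    lam       : confidence value lambda in [0, 1]
    sigma_min : minimum standard deviation, from camera calibration
    rho_hat   : calibrated disparity-to-depth mapping
    """
    # Assumption: the transformation to log-distance space is realised by
    # evaluating the Gaussian on the log-distance axis.
    mu = np.log(rho_hat(delta_hat))   # mean depth, taken to log-distance space
    sigma = sigma_min / lam           # low confidence widens the distribution
    return np.exp(-0.5 * ((z_log_rho - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
```

For a rectified stereo pair, `rho_hat` could for instance be $\hat{\rho}(\hat{\delta}) = Bf/\hat{\delta}$, with baseline $B$ and focal length $f$ obtained from calibration.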
4. Results

A representative result is shown in Fig. 3. Further details on this model and related work [2] can be found at http://paloma.isr.uc.pt/~jfilipe/BayesianMultimodalPerception.
References

[1] A. Elfes. Using occupancy grids for mobile robot perception and navigation. IEEE Computer, 22(6):46–57, 1989.
[2] J. F. Ferreira, P. Bessière, K. Mekhnacha, J. Lobo, J. Dias, and C. Laugier. Bayesian Models for Multimodal Perception of 3D Structure and Motion. In International Conference on Cognitive Systems (CogSys 2008), pages 103–108, University of Karlsruhe, Karlsruhe, Germany, April 2008.
[3] J. F. Ferreira and M. Castelo-Branco. 3D Structure and Motion Multimodal Perception. State-of-the-Art Report, ISR and IBILI, University of Coimbra, BACS, September 2006. Published April 23, 2007.
[4] D. C. Knill and A. Pouget. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 27(12):712–719, December 2004.
[5] O. Lebeltel. Programmation Bayésienne des Robots. PhD thesis, Institut National Polytechnique de Grenoble, Grenoble, France, September 1999.
[6] M. Yguel, O. Aycard, and C. Laugier. Efficient GPU-based Construction of Occupancy Grids Using several Laser Range-finders. International Journal of Autonomous Vehicles, 2007.