In: Proc. 15th International Conference on Pattern Recognition, ICPR2000, Barcelona, Spain, September 3-8, 2000, pp. 695-699

Learning Temporal Context in Active Object Recognition Using Bayesian Analysis 

Lucas Paletta Institute of Digital Image Processing Joanneum Research, Graz, Austria

Manfred Prantl Alicona GmbH Berchtesgaden, Germany

Axel Pinz Institute of Electrical Measurement and Measurement Signal Processing Graz University of Technology, Graz, Austria

Abstract

Active object recognition is a successful strategy for reducing the uncertainty of single-view recognition: sequences of views are planned, actively obtained, and the multiple recognition results are integrated. Understanding recognition as a sequential decision problem challenges the visual agent to select discriminative information sources. The presented system emphasizes the importance of temporal context in disambiguating initial object hypotheses, provides the corresponding theory for Bayesian fusion processes, and demonstrates performance superior to alternative view planning schemes. Instance-based learning, proposed to estimate the control function, then enables real-time processing with improved performance characteristics.

1. Introduction

While research on object identification by analysis of a single 2-D pattern undergoes permanent improvement, the results will always depend on the ambiguity of a particular view, the imprecision of the object model, and the uncertainty in the imaging process. Instead of generating arbitrarily complex classifiers to solve an a priori ill-posed problem, dynamic information is often available to exploit the redundancy in different object views. Active recognition of 3-D objects involves the visual agent in a search for discriminative evidence to disambiguate initial object hypotheses. Temporal context in its observations, associated with the spatial context of its 3-D object models, then provides efficient dynamic cues to achieve object identification with arbitrary reliability.

Previous research on dynamic object recognition has focused on partial aspects of the problem. The temporal order of evidences in passively perceived image sequences has been treated extensively, e.g., for recognizing human gestures using hidden Markov models [17], applying adapted Kalman filters to minimize prediction errors [14], or with respect to learning by neural architectures [16, 2]. In contrast, active recognition provides the framework for the selective collection of evidence to confirm or reject a current object hypothesis. The choice of action to reach a next target view is then either based on a set of precomputed most discriminative views [15], recursively determined in a Bayesian fusion



This work is funded by the European Commission’s TMR project VIRGO under grant number ERBFMRXCT960049. This support and cooperation is gratefully acknowledged. Email: [email protected] (corresponding author). This work was carried out while the author was working at GMD – National Research Center for Information Technology, Institute for Autonomous Intelligent Systems, D-53754 Sankt Augustin, Germany.

[Figure 1. Architecture of the active recognition system: a perception-action loop in which classifier, Bayesian fusion, and view planner modules operate on an appearance representation; control actions select new views of the world. Original caption not recoverable from the source.]

process [3] in reaction to evidence, or derived from an action utility index learned from the visual agent’s experience [11]. The contribution of the presented work is to introduce temporal context into the Bayesian analysis of object recognition processes (Fig. 1). Using a parametrized appearance based model [9], the fusion of successive probabilistic interpretations integrates information about the spatial structure of the object model. The view planner then favours object hypotheses which are consistent with learned observation trajectories in feature space. To enable real-time control, an instance-based classifier is outlined that derives a decision policy directly from the stream of action sequences induced by the Bayesian planner. Due to efficient memory organization, the access time of action selection is further reduced.
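The two components just described can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the paper's implementation: views are fused as conditionally independent evidence (naive Bayes), and a nearest-neighbour memory of (belief, action) pairs stands in for the instance-based planner, whereas the paper's planner additionally conditions on pose hypotheses and learned feature-space trajectories.

```python
import numpy as np

def fuse_beliefs(prior, view_likelihoods):
    """Recursive Bayesian fusion of per-view likelihoods into an object
    posterior, assuming conditional independence of successive views."""
    belief = np.asarray(prior, dtype=float)
    for lik in view_likelihoods:
        belief = belief * np.asarray(lik, dtype=float)
        belief /= belief.sum()          # renormalize after each view
    return belief

class InstanceBasedPlanner:
    """Toy instance-based policy: store (belief, action) pairs produced by
    an expensive planner, answer queries by nearest neighbour in belief space."""
    def __init__(self):
        self.beliefs, self.actions = [], []

    def store(self, belief, action):
        self.beliefs.append(np.asarray(belief, dtype=float))
        self.actions.append(action)

    def select_action(self, belief):
        b = np.asarray(belief, dtype=float)
        dists = [np.linalg.norm(b - s) for s in self.beliefs]
        return self.actions[int(np.argmin(dists))]

# two ambiguous objects; two views that both favour object 0
belief = fuse_beliefs([0.5, 0.5], [[0.8, 0.2], [0.9, 0.1]])

planner = InstanceBasedPlanner()
planner.store([1.0, 0.0], "rotate+30")   # hypothetical actions for illustration
planner.store([0.0, 1.0], "rotate-30")
action = planner.select_action(belief)
```

With efficient memory organization (e.g., a spatial index over the stored beliefs instead of the linear scan above), lookup cost drops further, which is the point of the real-time claim.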

2. Probabilistic object representation

A robust object representation accounts for variations in the visual appearance encountered in the particular task environment. In contrast to deriving an abstraction of this variance from an a priori defined geometric object model, a statistical analysis of the sensor measurements is efficiently integrated in an appearance based object model. Furthermore, appearance representations can be learned automatically from experience, while geometric models suffer from matching complexity and fail to work for complex shapes [5].

Appearance based object model. The parametric representation proposed in [9] encodes effects of shape and reflectance (e.g., by variation of pose or illumination) of the object by vector locations in a subspace of the raw sensor footprints (Fig. 2) $x \in \mathbb{R}^N$. The $k$-dimensional subspace $\mathcal{G}$, $g = (g_1, \dots, g_k)^T \in \mathcal{G}$, constructed by Karhunen-Loève expansion is called eigenspace; it is spanned by basis vectors that are ranked according to their contribution to representing the major variances in the original data. In the model learning phase, images are collected under discrete variation of a visual parameter $\varphi_j$ (e.g., pose). The object model then consists of a set of vectors $g(o_i, \varphi_j)$, $o_i \in \mathcal{O}$ and $\varphi_j \in \Phi$, where $\mathcal{O}$ denotes the object set and $\Phi$ the pose set, respectively.

Bayesian interpretation. In uncertain and noisy environments, classification on the basis of a crisp mapping from observations $g$ to symbols $o_i$ becomes unreliable, and soft computing methods are required to quantify the ambiguity by means of beliefs for multiple object hypotheses. [8] introduced a probabilistic interpretation of single 2-D views represented in eigenspace. [3] extended this method to apply Bayesian reasoning on object and pose parameters, outlined as follows. Given the measurement about object $o_i$ under visual parameter $\varphi_j$, the likelihood of obtaining feature vector $g$ is denoted by $p(g \mid o_i, \varphi_j)$. The likelihood is estimated from a set of sample images with fixed $o_i$ and $\varphi_j$, capturing the inaccuracies in the parameter $\varphi_j$ such as moderate light variations or segmentation errors. From the learned likelihoods one then obtains, via Bayesian inversion,

$$P(o_i, \varphi_j \mid g) = p(g \mid o_i, \varphi_j)\, P(\varphi_j \mid o_i)\, P(o_i) \,/\, p(g), \qquad (1)$$

and a posterior estimate with respect to the object hypotheses $o_i$ is given by $P(o_i \mid g) = \sum_j P(o_i, \varphi_j \mid g)$. Note that a corresponding estimate for the pose $\varphi_j$ is determined by $P(\varphi_j \mid o_i, g) = P(o_i, \varphi_j \mid g) \,/\, P(o_i \mid g)$ [3]. In practice, the likelihoods are estimated only at selected values of $\varphi$, e.g. at most informative or periodic settings $\varphi_j = j\,\Delta\varphi$ [3, 11]. The posterior estimates for intermediate values of $\varphi$ are determined using a Gaussian mixture model over the sampled posteriors $P(o_i, \varphi_j \mid g)$ [3].
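The eigenspace construction and the Bayesian inversion of Eq. (1) can be exercised end to end on synthetic data. The sketch below is illustrative only: the isotropic Gaussian likelihood, the 16-pixel "images", and all constants are assumptions for the demo, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_eigenspace(X, k):
    """X: (n_images, n_pixels) matrix of raw sensor footprints x in R^N.
    Returns the sample mean and the top-k Karhunen-Loeve basis (k, n_pixels)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def project(x, mu, B):
    """Project an image into the k-dimensional eigenspace: g = B (x - mu)."""
    return B @ (x - mu)

def joint_posterior(g, means, sigma, prior_o, prior_phi_o):
    """P(o_i, phi_j | g) via Eq. (1), with an (assumed) isotropic Gaussian
    likelihood p(g | o_i, phi_j) around each stored model vector."""
    sq = np.sum((means - g) ** 2, axis=-1)          # (n_obj, n_pose)
    lik = np.exp(-sq / (2.0 * sigma ** 2))
    post = lik * prior_phi_o * prior_o[:, None]     # numerator of Eq. (1)
    return post / post.sum()                        # normalise by evidence p(g)

# toy setup: 2 objects x 2 poses, 16-pixel "images" (hypothetical data)
templates = 2.0 * rng.normal(size=(2, 2, 16))       # ideal view of (o_i, phi_j)
train = np.concatenate([templates.reshape(4, 16),
                        templates.reshape(4, 16)
                        + 0.01 * rng.normal(size=(4, 16))])
mu, B = build_eigenspace(train, k=3)
means = np.stack([[project(templates[i, j], mu, B) for j in range(2)]
                  for i in range(2)])               # model vectors g(o_i, phi_j)

x = templates[0, 1] + 0.01 * rng.normal(size=16)    # noisy view: object 0, pose 1
post = joint_posterior(project(x, mu, B), means, sigma=1.0,
                       prior_o=np.full(2, 0.5),
                       prior_phi_o=np.full((2, 2), 0.5))
object_belief = post.sum(axis=1)        # P(o_i | g) = sum_j P(o_i, phi_j | g)
```

Marginalizing the joint posterior over poses recovers the object belief, and dividing the joint by that marginal recovers the pose estimate, exactly as in the text above.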
