In: Proc. 15th International Conference on Pattern Recognition, ICPR2000, Barcelona, Spain, September 3-8, 2000, pp. 695-699

Learning Temporal Context in Active Object Recognition Using Bayesian Analysis 

Lucas Paletta Institute of Digital Image Processing Joanneum Research, Graz, Austria

Manfred Prantl Alicona GmbH Berchtesgaden, Germany

Axel Pinz Institute of Electrical Measurement and Measurement Signal Processing Graz University of Technology, Graz, Austria

Abstract

Active object recognition is a successful strategy for reducing the uncertainty of single-view recognition: sequences of views are planned, actively obtained, and the multiple recognition results are integrated. Understanding recognition as a sequential decision problem challenges the visual agent to select discriminative information sources. The presented system emphasizes the importance of temporal context in disambiguating initial object hypotheses, provides the corresponding theory for Bayesian fusion processes, and demonstrates performance superior to alternative view planning schemes. Instance-based learning, proposed to estimate the control function, then enables real-time processing with improved performance characteristics.

1. Introduction

While research on object identification by analysis of a single 2-D pattern undergoes permanent improvement, the results will always depend on the ambiguity of a particular view, the imprecision of the object model, and the uncertainty in the imaging process. Instead of generating arbitrarily complex classifiers to solve an a priori ill-posed problem, dynamic information is often available to exploit the redundancy in different object views. Active recognition of 3-D objects involves the visual agent in a search for discriminative evidence to disambiguate initial object hypotheses. Temporal context in its observations, associated with the spatial context of its 3-D object models, then provides efficient dynamic cues to achieve object identification with arbitrary reliability.

Previous research on dynamic object recognition has focused on partial aspects of the problem. The temporal order of evidences in passively perceived image sequences has been treated extensively, e.g., for recognizing human gestures using hidden Markov models [17], applying adapted Kalman filters to minimize prediction errors [14], or with respect to learning by neural architectures [16, 2]. In contrast, active recognition provides the framework for the selective collection of evidence to confirm or reject a current object hypothesis. The choice of action to reach a next target view is then either based on a set of precomputed most discriminative views [15], recursively determined in a Bayesian fusion



This work is funded by the European Commission’s TMR project VIRGO under grant number ERBFMRXCT960049. This support and cooperation is gratefully acknowledged. Email: [email protected] (corresponding author). This work was carried out while the author was working at GMD – National Research Center for Information Technology, Institute for Autonomous Intelligent Systems, D-53754 Sankt Augustin, Germany.

[Figure 1. Architecture of the active recognition system: a perception-action loop in which classifier, Bayesian fusion, and view planner modules operate on an appearance representation; control actions select new views of the world. Original caption not recoverable from the source.]

process [3] in reaction to evidence, or derived from an action utility index learned from the visual agent’s experience [11]. The contribution of the presented work is to introduce temporal context into the Bayesian analysis of object recognition processes (Fig. 1). Using a parametrized appearance based model [9], the fusion of successive probabilistic interpretations integrates information about the spatial structure of the object model. The view planner then favours object hypotheses which are consistent with learned observation trajectories in feature space. To enable real-time control, an instance-based classifier is outlined that derives a decision policy directly from the stream of action sequences induced by the Bayesian planner. Due to efficient memory organization, the access time of action selection is further reduced.
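The two components just described can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the paper's implementation: views are fused as conditionally independent evidence (naive Bayes), and a nearest-neighbour memory of (belief, action) pairs stands in for the instance-based planner, whereas the paper's planner additionally conditions on pose hypotheses and learned feature-space trajectories.

```python
import numpy as np

def fuse_beliefs(prior, view_likelihoods):
    """Recursive Bayesian fusion of per-view likelihoods into an object
    posterior, assuming conditional independence of successive views."""
    belief = np.asarray(prior, dtype=float)
    for lik in view_likelihoods:
        belief = belief * np.asarray(lik, dtype=float)
        belief /= belief.sum()          # renormalize after each view
    return belief

class InstanceBasedPlanner:
    """Toy instance-based policy: store (belief, action) pairs produced by
    an expensive planner, answer queries by nearest neighbour in belief space."""
    def __init__(self):
        self.beliefs, self.actions = [], []

    def store(self, belief, action):
        self.beliefs.append(np.asarray(belief, dtype=float))
        self.actions.append(action)

    def select_action(self, belief):
        b = np.asarray(belief, dtype=float)
        dists = [np.linalg.norm(b - s) for s in self.beliefs]
        return self.actions[int(np.argmin(dists))]

# two ambiguous objects; two views that both favour object 0
belief = fuse_beliefs([0.5, 0.5], [[0.8, 0.2], [0.9, 0.1]])

planner = InstanceBasedPlanner()
planner.store([1.0, 0.0], "rotate+30")   # hypothetical actions for illustration
planner.store([0.0, 1.0], "rotate-30")
action = planner.select_action(belief)
```

With efficient memory organization (e.g., a spatial index over the stored beliefs instead of the linear scan above), lookup cost drops further, which is the point of the real-time claim.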

2. Probabilistic object representation

A robust object representation accounts for variations in the visual appearance encountered in the particular task environment. In contrast to deriving an abstraction of this variance from an a priori defined geometric object model, a statistical analysis of the sensor measurements is efficiently integrated in an appearance based object model. Furthermore, appearance representations can be learned automatically from experience, while geometric models suffer from matching complexity and fail to work for complex shapes [5].

Appearance based object model. The parametric representation proposed in [9] encodes effects of shape and reflectance (e.g., by variation of pose or illumination) of the object by vector locations in a subspace of the raw sensor footprints (Fig. 2) $x \in \mathbb{R}^N$. The $k$-dimensional subspace $\mathcal{G}$, $g = (g_1, \dots, g_k)^T \in \mathcal{G}$, constructed by Karhunen-Loève expansion is called eigenspace; it is spanned by basis vectors that are ranked according to their contribution to representing the major variances in the original data. In the model learning phase, images are collected under discrete variation of a visual parameter $\varphi_j$ (e.g., pose). The object model then consists of a set of vectors $g(o_i, \varphi_j)$, $o_i \in \mathcal{O}$ and $\varphi_j \in \Phi$, where $\mathcal{O}$ denotes the object set and $\Phi$ the pose set, respectively.

Bayesian interpretation. In uncertain and noisy environments, classification on the basis of a crisp mapping from observations $g$ to symbols $o_i$ becomes unreliable, and soft computing methods are required to quantify the ambiguity by means of beliefs for multiple object hypotheses. [8] introduced a probabilistic interpretation of single 2-D views represented in eigenspace. [3] extended this method to apply Bayesian reasoning on object and pose parameters, outlined as follows. Given the measurement about object $o_i$ under visual parameter $\varphi_j$, the likelihood of obtaining feature vector $g$ is denoted by $p(g \mid o_i, \varphi_j)$. The likelihood is estimated from a set of sample images with fixed $o_i$ and $\varphi_j$, capturing the inaccuracies in the parameter $\varphi_j$ such as moderate light variations or segmentation errors. From the learned likelihoods one then obtains, via Bayesian inversion,

$$P(o_i, \varphi_j \mid g) = p(g \mid o_i, \varphi_j)\, P(\varphi_j \mid o_i)\, P(o_i) \,/\, p(g), \qquad (1)$$

and a posterior estimate with respect to the object hypotheses $o_i$ is given by $P(o_i \mid g) = \sum_j P(o_i, \varphi_j \mid g)$. Note that a corresponding estimate for the pose $\varphi_j$ is determined by $P(\varphi_j \mid o_i, g) = P(o_i, \varphi_j \mid g) \,/\, P(o_i \mid g)$ [3]. In practice, the likelihoods are estimated only at selected values of $\varphi$, e.g. at most informative or periodic settings $\varphi_j = j\,\Delta\varphi$ [3, 11]. The posterior estimates for intermediate values of $\varphi$ are determined using a Gaussian mixture model over the sampled posteriors $P(o_i, \varphi_j \mid g)$ [3].
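The eigenspace construction and the Bayesian inversion of Eq. (1) can be exercised end to end on synthetic data. The sketch below is illustrative only: the isotropic Gaussian likelihood, the 16-pixel "images", and all constants are assumptions for the demo, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_eigenspace(X, k):
    """X: (n_images, n_pixels) matrix of raw sensor footprints x in R^N.
    Returns the sample mean and the top-k Karhunen-Loeve basis (k, n_pixels)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def project(x, mu, B):
    """Project an image into the k-dimensional eigenspace: g = B (x - mu)."""
    return B @ (x - mu)

def joint_posterior(g, means, sigma, prior_o, prior_phi_o):
    """P(o_i, phi_j | g) via Eq. (1), with an (assumed) isotropic Gaussian
    likelihood p(g | o_i, phi_j) around each stored model vector."""
    sq = np.sum((means - g) ** 2, axis=-1)          # (n_obj, n_pose)
    lik = np.exp(-sq / (2.0 * sigma ** 2))
    post = lik * prior_phi_o * prior_o[:, None]     # numerator of Eq. (1)
    return post / post.sum()                        # normalise by evidence p(g)

# toy setup: 2 objects x 2 poses, 16-pixel "images" (hypothetical data)
templates = 2.0 * rng.normal(size=(2, 2, 16))       # ideal view of (o_i, phi_j)
train = np.concatenate([templates.reshape(4, 16),
                        templates.reshape(4, 16)
                        + 0.01 * rng.normal(size=(4, 16))])
mu, B = build_eigenspace(train, k=3)
means = np.stack([[project(templates[i, j], mu, B) for j in range(2)]
                  for i in range(2)])               # model vectors g(o_i, phi_j)

x = templates[0, 1] + 0.01 * rng.normal(size=16)    # noisy view: object 0, pose 1
post = joint_posterior(project(x, mu, B), means, sigma=1.0,
                       prior_o=np.full(2, 0.5),
                       prior_phi_o=np.full((2, 2), 0.5))
object_belief = post.sum(axis=1)        # P(o_i | g) = sum_j P(o_i, phi_j | g)
```

Marginalizing the joint posterior over poses recovers the object belief, and dividing the joint by that marginal recovers the pose estimate, exactly as in the text above.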
