Submission to: SIRS 2000, 8th International Symposium on Intelligent Robotic Systems, The University of Reading, England, July 18-20, 2000

Reinforcement Learning of Object Detection Strategies

Lucas Paletta, Erich Rome
GMD – National Research Center for Information Technology
Institute for Autonomous Intelligent Systems
D-53754 Sankt Augustin, Germany
{paletta,rome}@gmd.de

Abstract. Mobile agents performing dynamic sensing without control over information acquisition must rely on arbitrarily distributed information. The integration of irrelevant, ambiguous, or misleading evidence may result in poor classification performance. In contrast, active fusion schemes seek to acquire the evidence that is most appropriate to the current task, e.g., to disambiguate the current state of belief. This paper treats the fusion process in visual object detection as a sequential decision problem. Reinforcement learning makes it possible to develop an efficient strategy for fusing decisive information, expressed as a sensorimotor mapping. The presented system learns object models from visual appearance and uses a connectionist architecture for a probabilistic interpretation of the 2-D views. The expected gain in global classification accuracy provides a utility measure that reinforces actions leading to discriminative viewpoints. The system is verified in experiments with a sewer robot on the task of visually detecting house inlets in sewage pipes for navigation purposes. Crucial improvements in performance are gained using the learned fusion strategy in contrast to arbitrary action selections.

1 Introduction

The detection and localization of objects of interest in sensor data is an important step in the analysis of a robotic system's environment. In particular, autonomous visual navigation relies on robust classification under uncertain environmental conditions. While research on object identification from a single 2-D pattern is improving steadily, the results always depend on the ambiguity of a particular view, the imprecision of the object model, and the uncertainty in the image formation process. Instead of generating arbitrarily complex classifiers to solve an a priori ill-posed problem, dynamic information is often available to take advantage of the redundancy in different object measurements. Active detection of 3-D objects involves the visual agent in a search for discriminative evidence to disambiguate the initial object hypotheses. Learning the fusion strategy then aims at integrating only information that has previously been experienced to be decisive w.r.t. the characteristics of a particular task and its environment.

Related research has focused on combined tracking and recognition on a continual video image stream. Crowley and Berard [5] describe an efficient system to detect and track faces based on a control scheme that selects the sequencing of multi-modal perceptual processes. Bregler [3] describes the learning of a probabilistic decomposition of visual dynamics at multiple abstraction levels, integrated into a gesture representation by Hidden Markov Models. These systems do not treat decision making on fusion operations w.r.t. a specific task goal. An early active object recognition system was proposed by Hutchinson and Kak [10], based on Dempster-Shafer belief accumulation; various sensors are controlled with respect to disambiguation, and the experiments are performed in a simple blocks world. Paletta and Pinz [13] extend this concept to appearance-based object representations and a learning framework that enables a utility-sensitive adjustment of recognition strategies.

The contribution of the presented work is to introduce reinforcement learning methods in the context of visual object detection tasks. A decision making agent learns the fusion behavior directly from the interaction with its stochastic environment, in a sensorimotor feedback loop (Fig. 1). The proposed visual detection system automatically acquires an appearance-based object model (feature extraction), develops a mapping from 2-D views to object hypotheses (neural posterior network, based on previous work described in [14], Section 2), integrates incoming information into a global belief estimate (decision fusion, Section 3), and learns to selectively fuse the view-based information (reinforcement controller, Section 4).

Fig. 1: Concept of the reinforcement-driven detection system (modules: sensor, feature extraction, RBF posterior network, decision fusion, reinforcement agent, object tracker). A decision making agent adjusts its fusion behavior, performed by actuators a for sensor positioning (solid) and parameter tuning (dashed), from the results of visual feedback, i.e., via states s and rewards R.

Object-relevant information is tracked through succeeding images via feature-based and confidence-based correspondence, respectively. The fusion strategy is stored in a parametric data structure, which enables the agent, in contrast to exhaustive planning methods, to apply decisions reactively, i.e., in response to perceptive inputs.

2 Object Detection Posterior Maps

Visual Learning of Objects. Appearance-based object representations consist of a collection of raw sensor footprints (e.g., Fig. 3), combining the effects of shape and reflectance [12]. In contrast, geometric models suffer from matching complexity and fail to work for complex shapes [7]. Instead of storing high-dimensional pixel patterns, the sensor vector can be transformed by principal component analysis (PCA, [8]) into a low-dimensional representation in feature space, called eigenspace. Its basis is constructed by taking the d most prominent directions e_i of maximum variation in the presented data set,

\[ g = (e_1, \ldots, e_d)^T x, \tag{1} \]

where g = (g_1, ..., g_d)^T is the projection of image x into the eigenspace of dimension d. Distances in eigenspace are a measure of image correlation [12]. Recognition is supported by the property that close points in the subspace correspond to similar object appearances.

Probabilistic Interpretation by RBF Networks. Object representations that implicitly model the uncertainty in eigenspace must consider estimates of the data density. The present system extends the Bayesian framework proposed by Moghaddam and Pentland [11] to combine eigenspace features and a radial basis function (RBF) network according to Ranganath and Arun [18], with the definition of a rejection class to enable a closed-world interpretation [14]. RBF networks [4] consist of an input layer, a hidden layer of M basis functions φ_j, and an output layer of linear activation units, y_k(g) = Σ_{j=1}^{M} w_{kj} φ_j(g), with weights [W]_{kj}. The basis units, typically modelled as Gaussian functions φ_j(g) = exp(−‖g − µ_j‖²/(2σ_j²)), act as receptive fields with center µ_j and spread σ_j on the eigenspace. Training of the free parameters of the RBF network [4] is preferably separated into two phases. First, the centers µ_j are identified with the centers of representative data clusters, e.g., using the EM (expectation-maximization) algorithm [6]. Second, the output weights are determined by minimizing some error function. For a probabilistic interpretation of the input data [19], the N training vectors (g_1^n, ..., g_d^n, O_κ^n) must be associated with binary target outputs t_k^n = [T]_{nk} = δ_{kκ}, k = 1, ..., Ω, where Ω is the number of objects O_k, in order to optimize the error of the posterior estimate. To evaluate test data, the feature vector g is fed to the RBF network and mapped to output values y_k for a posterior estimate P̂(O_k|g) = α y_k(g), where α is a normalizing constant.

Posterior Map Generation. Detection of the object of interest is performed by a scan over the image data in search of locations that excite a high response of the RBF classifier. A priori knowledge can be exploited to determine an initial region of interest in terms of a support map and to focus further processing on this image region [14].
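To make the mapping concrete, the following is a minimal sketch (not the authors' code) of the eigenspace projection (1) and the RBF posterior estimate; the numpy implementation and variable names (basis, centers, widths, W) are our own assumptions:

```python
import numpy as np

def project_to_eigenspace(x, basis):
    """Eq. (1): project a normalized image vector x onto the d most
    prominent eigenvectors, stored as the rows of `basis` (d x n)."""
    return basis @ x  # g = (e_1, ..., e_d)^T x

def rbf_posterior(g, centers, widths, W):
    """Estimate P(O_k | g) with an RBF network: Gaussian basis
    activations phi_j, a linear output layer y = W phi, and a final
    normalization playing the role of the constant alpha."""
    sq_dist = np.sum((centers - g) ** 2, axis=1)   # ||g - mu_j||^2, shape (M,)
    phi = np.exp(-sq_dist / (2.0 * widths ** 2))   # Gaussian receptive fields
    y = W @ phi                                    # y_k(g) = sum_j w_kj phi_j(g)
    y = np.clip(y, 1e-12, None)                    # keep estimates positive
    return y / y.sum()                             # alpha-normalized posterior
```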

Fig. 2: (a) Schema of posterior map generation: receptive fields within the support region are projected into eigenspace (PCA) and interpreted by the RBF network, yielding posterior values such as P(O|g) = 0.9. (b) Sewer robot platform KURT II with stereo cameras and inlet object.

Fig. 2a depicts the schema of posterior map generation. Locations inside the attention-controlled support region (dashed) identify the centers of receptive fields (RF, solid), which are transferred to the feature extraction (PCA) and probabilistic interpretation (RBF) modules. The corresponding entries in the posterior map then reflect the confidence in a particular object hypothesis.
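A posterior map can then be sketched as a scan of receptive fields over the support region, reusing the two functions above; the window geometry and normalization details below are our assumptions rather than the paper's exact procedure:

```python
import numpy as np

def posterior_map(image, support_centers, rf_size, basis, centers, widths, W, k=0):
    """Scan receptive fields over the attention-controlled support
    region and store the posterior of hypothesis O_k at each center."""
    pmap = np.zeros(image.shape)
    half = rf_size // 2
    for (r, c) in support_centers:
        patch = image[r - half:r + half, c - half:c + half].astype(float)
        x = patch.ravel()
        x = x - x.mean()                      # energy normalization (assumed)
        x = x / (np.linalg.norm(x) + 1e-12)
        g = project_to_eigenspace(x, basis)
        pmap[r, c] = rbf_posterior(g, centers, widths, W)[k]
    return pmap
```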

3 Temporal Decision Fusion

Dynamic object detection relies on a process that keeps track of the object-relevant region, i.e., the RF, through succeeding images. The integration of evidence from multiple measurements of an object is then expected to improve the performance of a decision maker [16, 2, 13]. The presented tracking algorithm extends the correlation-based tracker of Crowley and Berard [5] by establishing correspondence between succeeding images I_1, I_2 using both feature-related and confidence-related information. It initializes a search region in I_2 in the neighborhood N_{I2} of the pixel location c_MAP(I_1) in I_1, with g_{I1,MAP} = arg max_{g_{I1,i}} P(O|g_{I1,i}). As long as this neighborhood provides approximate correspondence of appearance features by g_{I2,MAP} = arg min_{g_{I2,j} ∈ N_{I2}} ‖g_{I2,j} − g_{I1,MAP}‖ within an error bound ‖g_{I2,j} − g_{I1,MAP}‖ < ε, the tracker pursues the object-relevant region of g_{I2,MAP}. Otherwise, the focus of attention is shifted to a new center c_MAP(I_2).

In a sequence of T observations g_t, evidence about the object O_k in question is collected from the corresponding posterior map, i.e., the posterior estimates P̂(O_k|g_t). The agent updates the confidence in its classification by fusing the current estimate P̂(O_k|g_T) with the integrated beliefs P̂(O_k|g_1, ..., g_{T−1}). Statistical inference is accomplished by recursive Bayesian updating of the posterior probabilities [15], assuming conditional independence of the observations w.r.t. O_k,

\[ P(O_k \mid g_1, \ldots, g_T) = \beta \, P(O_k) \prod_{t=1}^{T} p(g_t \mid O_k), \tag{2} \]

where β is a normalizing constant. The conditional probabilities p(g_t|O_k) are recovered via inversion from the posteriors P(O_k|g_t) estimated by the RBF architecture (Section 2). A comparison of different fusion schemes w.r.t. active object recognition [2] stresses the advantages of a probabilistic framework. Due to its conjunctive behavior [1], the worst classification dominates the fusion result, making it sensitive to inconsistent evidence, which is a desirable effect for detection processes. The fusion process is terminated when the confidence in a particular object hypothesis exceeds a predefined threshold.
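A recursive form of Eq. (2) might be sketched as follows; the uniform prior and the 0.95 termination threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def bayes_fuse(belief, posterior, prior):
    """One recursive update of Eq. (2): likelihoods p(g_t | O_k) are
    recovered by inversion, proportional to P(O_k | g_t) / P(O_k),
    and the product is renormalized (the constant beta)."""
    belief = belief * (posterior / prior)
    return belief / belief.sum()

# Usage sketch: fuse tracked observations until one hypothesis dominates.
prior = belief = np.full(2, 0.5)          # e.g., {inlet, background}, uniform prior
for g in tracked_features:                # features delivered by the tracker
    belief = bayes_fuse(belief, rbf_posterior(g, centers, widths, W), prior)
    if belief.max() > 0.95:               # assumed confidence threshold
        break
```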

4 Reinforcement Learning of Fusion Strategies

In each state of the detection process, a decision making agent is asked to select an action that drives its classifier towards a reliable decision. E.g., a robot capable of view planning chooses among a set of motor commands that determine a specific sequence of fusion steps. Learning to recognize objects then means exploring different viewpoints, quantifying the consequences in terms of a utility measure, and adjusting the control strategy accordingly. The Markov decision process (MDP, [17]) provides the general framework for casting active object detection as a multistep decision task with respect to the discrimination dynamics. A decision making agent must act to maximize the utility Q(s, a), i.e., the expected cumulative reward

\[ Q^\pi(s_t, a_t) = E_\pi\left[\sum_{k=1}^{\infty} \gamma^k R_{t+k}\right], \]

where γ ∈ [0, 1] is a discount factor, R_{t+1} is the reward, and a policy π determines the probability π(s, a) of taking action a in state s. Reinforcement learning [20] is concerned with obtaining an optimal policy by exploring the utilities Q(s, a) through experience. In Q-learning [21], the estimates Q̂(s_t, a_t) of the maximum expected utility Q*, Q*(s, a) = max_π Q^π(s, a), received in subsequent steps are recursively updated to reduce the residual of a consistency condition,

\[ \Delta \hat{Q}(s_t, a_t) = \eta \left[ R_{t+1} + \gamma \max_{a'} \hat{Q}(s_{t+1}, a') - \hat{Q}(s_t, a_t) \right], \tag{3} \]

where η is the learning rate. Applied to object detection, reinforcement learning amounts to determining the current state s_t of the fusion process and registering a reward signal R_t under the execution of appropriate actuators, such as motor commands, image processing routines, expectation operators, etc. (Fig. 1). The choice of the reward function R_t in detection tasks may concern entropy drops, classification accuracy, or energy consumption terms [13].
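In tabular form, the Q-learning update (3) with ε-greedy exploration reduces to a few lines; the parameter values below are common defaults, not those used in the paper:

```python
import random
from collections import defaultdict

Q = defaultdict(float)                  # lookup table Q(s, a)
eta, gamma, epsilon = 0.1, 0.9, 0.1     # learning rate, discount, exploration

def select_action(s, actions):
    """Epsilon-greedy selection over the current utility estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(s, a, r, s_next, actions):
    """One step of Eq. (3), contracting the temporal-difference residual."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += eta * (target - Q[(s, a)])
```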

5 Experiments

Experiments were conducted in the context of a navigation task, i.e., visually detecting house inlets in sewage pipes (Fig. 2), extending the results described in [14]. Sewer robotics is an emerging application of service robots that is being investigated at GMD [9]. The long-term goal is to construct an autonomous robot that is capable of performing inspection tasks in sewers that are too small to be accessed by human workers. Autonomous navigation in the sewage network is a major problem since it must rely on self-localization using existing landmarks. The experiments describe the robust detection of inlet objects (Fig. 2, 3) in order to refine the spatial localization, providing real-time classification capabilities.

Probabilistic interpretation of appearance. Image acquisition was performed in a dry sewer test net at the GMD site [9], using a CCD camera mounted on the robot platform KURT II (Fig. 2b). From a representative set of grayscale video frames, RFs were manually cropped at inlet objects, whereas background data were generated by random RF selection from regions containing no inlets (Fig. 3). Object patterns were tilted and represented at different image resolutions to support rotation and scale invariance [14]. Each RF image was then normalized w.r.t. size and energy [12, 14]. PCA was applied to determine the orthogonal basis of the eigenspace (Section 2). The posterior maps were estimated by an RBF neural architecture given the set of N labeled training vectors (g_1^n, ..., g_d^n, O^n)^T. The choice of parametrization (low eigenspace dimension d = 10, basis layer size M = 20) is motivated by combining sufficient classification accuracy [14] with a low architectural complexity that reduces processing time.

Evaluation of fusion strategies. The fusion controller was trained using images captured while autonomously navigating towards different inlet objects. While the tracking mechanism (Section 3) was continually focusing on an object location, Bayesian fusion updates were executed only at time cycles determined by the fusion controller. As a result of a learning session over 25000 detection trials, the learned fusion strategy ('Q-learner', Section 4) significantly outperforms a random policy ('Random strategy') in the accuracy of the final classification (Fig. 4a). In addition, the 'Q-learner' is capable of detecting objects much earlier, requiring image sequences of approximately 23% shorter length (Fig. 4b). The curves are obtained using 6-fold cross-validation (5 sequences training / 1 sequence test), computing each point as the mean of 1000 trial results. The parameters Q(s, a) of the controller were stored in a lookup table of |S| × |A| entries. States are determined both by the distance of the tracking center to the vanishing point [14] and by the integrated belief value w.r.t. the inlet object identification. Actions represent motor commands causing characteristic forward robot movements. The reward signal was determined by the final classification accuracy, with R_T = 1 if arg max_{O_k} P(O_k|g_1, ..., g_T) = O_τ, and R_T = 0 otherwise, where O_τ denotes the true object label.
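For illustration, the state discretization and terminal reward described above could be sketched as follows; the bin counts and the MAX_DIST normalizer are hypothetical, as the paper does not specify them:

```python
import numpy as np

N_DIST_BINS, N_BELIEF_BINS = 8, 8   # assumed resolutions of the lookup table
MAX_DIST = 200.0                    # hypothetical maximum distance (pixels)

def fusion_state(dist_to_vanishing_point, inlet_belief):
    """State = (binned distance of the tracking center to the vanishing
    point, binned integrated belief in the inlet hypothesis)."""
    d = min(int(dist_to_vanishing_point / MAX_DIST * N_DIST_BINS), N_DIST_BINS - 1)
    b = min(int(inlet_belief * N_BELIEF_BINS), N_BELIEF_BINS - 1)
    return (d, b)

def terminal_reward(belief, true_label):
    """R_T = 1 iff the MAP hypothesis equals the true object label O_tau."""
    return 1.0 if int(np.argmax(belief)) == true_label else 0.0
```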

Fig. 3: A subset of the training set, displaying characteristic images: RF patterns of the (a) inlet and (b) background object. (c) Sample images (Image 270, Image 280) with corresponding posterior maps (Map 270, Map 280).

Fig. 4: Learning curves depicting (a) the increase in classification accuracy (from about 0.8 towards 1.0) and (b) the reduction of detection sequence lengths (from about 105 to below 80), both plotted over up to 25000 training trials, for the reinforcement controller ('Q-learner') versus the 'Random strategy'.

6 Conclusions and Future Work

The presented work introduced the concept of reinforcement learning for developing efficient fusion strategies in visual object detection tasks. The new methodology was verified to significantly increase classification accuracy and to critically reduce the time required for detection. Navigation experiments with real-world data in the context of sewer robotics confirm the applicability of the detection system. Learning schemes for the adaptation of the feature space, the probabilistic interpretation, and the action selection policy result in an environment-specific implementation appropriate to the given task. The system should scale up easily by considering a pool of specialized modules, each selected for operation after an initial likelihood test. Directions of future work are seen in the exploitation of the temporal context in the sequence of object views, the active fusion of different sensor readings (visual, laser, etc.), and a thorough comparison of the applicability of different object tracking methods.

References

1. I. Bloch. Information combination operators for data fusion: A comparative review with classification. IEEE Transactions on Systems, Man and Cybernetics, 26(1):52-67, January 1996.
2. H. Borotschnig, L. Paletta, M. Prantl, and A. Pinz. A comparison of probabilistic, possibilistic and evidence theoretic fusion schemes for active object recognition. Computing, 62:293-319, 1999.
3. C. Bregler. Learning and recognizing human dynamics in video sequences. In Proc. Conference on Computer Vision and Pattern Recognition, pages 568-574, 1997.
4. D. S. Broomhead and D. Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, 2:321-355, 1988.
5. J. L. Crowley and F. Berard. Multi-modal tracking of faces for video communications. In Proc. Conference on Computer Vision and Pattern Recognition, 1997.
6. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1-38, 1977.
7. S. Edelman. Computational theories of object recognition. Trends in Cognitive Sciences, 1:296-304, 1997.
8. K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, 1990.
9. J. Hertzberg, Th. Christaller, F. Kirchner, U. Licht, and E. Rome. Sewer robotics. In Proc. 5th Intl. Conf. on Simulation of Adaptive Behavior, pages 427-436, 1998.
10. S. A. Hutchinson and A. C. Kak. Planning sensing strategies in a robot work cell with multisensor capabilities. IEEE Transactions on Robotics and Automation, 5(6):765-783, 1989.
11. B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, 1997.
12. H. Murase and S. K. Nayar. Visual learning and recognition of 3-D objects from appearance. Intl. J. of Computer Vision, 14(1):5-24, 1995.
13. L. Paletta and A. Pinz. Active object recognition by view integration and reinforcement learning. Robotics and Autonomous Systems, 31(1-2), 2000.
14. L. Paletta, E. Rome, and A. Pinz. Visual object detection for autonomous sewer robots. In Proc. International Conference on Intelligent Robots and Systems, pages 1087-1093, 1999.
15. J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, CA, 1988.
16. A. Pinz, M. Prantl, H. Ganster, and H. Borotschnig. Active fusion - a new method applied to remote sensing image interpretation. Pattern Recognition Letters, 17(13):1349-1359, 1996.
17. M. L. Puterman. Markov Decision Processes. John Wiley and Sons, New York, NY, 1994.
18. S. Ranganath and K. Arun. Face recognition using transform features and neural networks. Pattern Recognition, 30(10):1615-1622, 1997.
19. M. D. Richard and R. P. Lippmann. Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3(4):461-483, 1991.
20. R. S. Sutton and A. G. Barto. Reinforcement Learning. The MIT Press, Cambridge, MA, 1998.
21. C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3):279-292, 1992.

This article was processed using the TeX macro package with SIRS2000 style.
