A Bayesian Approach for Time-Constrained 3D Outdoor Object Recognition [Extended Abstract]

Timothy Patten, Abdallah Kassir, Wolfram Martens, Bertrand Douillard, Robert Fitch and Salah Sukkarieh

Abstract— Classifying objects in complex unknown environments is a challenging problem in robotics and is fundamental to many applications. Modern sensors and sophisticated perception algorithms can extract rich 3D textured information, but are limited to the data collected from a given location or path. We are interested in closing the loop around perception and planning, and focus on the problem of planning scanning sequences to improve object recognition from 3D laser data. Whereas previous work treated the segmentation of a scene into its constituent objects as a preprocessing step to classification, we also address the problem of joint segmentation and classification using a novel probabilistic representation.

I. INTRODUCTION

Information gathering is an important family of tasks for outdoor robots and is central to a wide variety of applications including agriculture, environmental monitoring, mining, defence, surveillance, and disaster management. Many information gathering tasks, such as searching [1] and tracking [2], naturally involve a planning component that seeks to choose future observation locations in order to maximise an information-theoretic objective. Object segmentation and classification, however, is traditionally cast as a passive perception problem where data are generated as part of a disconnected navigation process and fed into a perception processing pipeline. This paper advocates an active approach to classifying objects outdoors and attempts to close the loop around planning and perception in this context.

In the case of 3D laser sensors, choosing the proper viewpoint of the robot is key to perception quality because data from these sensors are highly viewpoint dependent; variation in the position and orientation of the sensor relative to the environment can result in large changes in the distribution of 3D points in the scene. Achieving high-quality perception is therefore strongly coupled to planning robot motion, and there is exciting potential to dramatically improve the quality of outdoor object classification through planning.

Our research agenda focuses on planning scan sequences to improve estimates of object class labels from 3D laser scans outdoors (Fig. 1). Here, we explore two directions: (1) a new problem formulation, based on the idea of a time-budgeted goal, that enables a fair comparison with passive perception, and (2) a novel probabilistic approach to joint segmentation and classification of unknown static scenes.

This work is supported in part by the Australian Research Council (DP140104203), the Australian Centre for Field Robotics and the NSW State Government. T. Patten, A. Kassir, W. Martens, R. Fitch and S. Sukkarieh are with the Australian Centre for Field Robotics (ACFR), The University of Sydney. {t.patten,a.kassir,w.martens,rfitch,salah}@acfr.usyd.edu.au. B. Douillard is with Zoox. [email protected]

Fig. 1. A mobile robot using a 3D laser to classify outdoor objects such as trees, furniture, and vehicles.

II. RELATED WORK

Classifying 3D objects has traditionally been performed using data collected offline or from preselected locations [3], often assuming a known set of classes [4]. Recent interest focuses on long-term and long-range operation [5], motivating in part the need for active online viewpoint selection. Scene segmentation [6] typically considers structured indoor cases, and recent work [7] emphasises the need to study active segmentation probabilistically.

Information gathering typically seeks to maximise the rate of information gain, and has been applied to exploration, search [1] and tracking [2]. In these tasks, robots choose actions by predicting informative observations. Informative path planning [8] exploits submodularity [9] to provide performance guarantees, but is not applicable in our case: although maximising mutual information is submodular, 3D outdoor object recognition does not satisfy a number of other requirements, such as knowing all viewpoints a priori.

Active vision [10] is related to, but distinct from, our case. The typical objective is to classify a small number of objects (often household items) while minimising the number of views, constrained to orbit the scene [4]. We consider large environments with potentially many objects and observations.

III. ACTIVE BAYESIAN OBJECT RECOGNITION

In many applications, such as agriculture and surveillance, a natural roadmap exists from which the robot may deviate in order to improve perception. We propose a problem in which a robot with a 3D range finder must navigate to a given location within a specified time, while making use of the time budget to improve classification performance by resolving occlusions and choosing discriminative views. It is then possible to compare against passive strategies given the same time budget.
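To make the time-budgeted setting concrete, the following greedy loop is a minimal sketch of the problem structure, not the planner used in the paper: the robot keeps visiting informative candidate viewpoints only while it can still reach the goal within the budget. The helper names plan_with_budget, travel_time and utility are hypothetical callables introduced for illustration.

```python
def plan_with_budget(start, goal, candidates, budget, travel_time, utility):
    """Greedily visit high-utility viewpoints while the remaining time
    budget still allows reaching the goal (a sketch under assumptions,
    not the paper's planner)."""
    path, pose, remaining = [start], start, budget
    candidates = list(candidates)
    while True:
        # Only viewpoints from which the goal is still reachable in time.
        feasible = [c for c in candidates
                    if travel_time(pose, c) + travel_time(c, goal) <= remaining]
        if not feasible:
            break
        best = max(feasible, key=utility)  # in practice, re-scored after each scan
        remaining -= travel_time(pose, best)
        pose = best
        path.append(best)
        candidates.remove(best)
    path.append(goal)  # spend any leftover budget heading to the goal
    return path
```

A passive baseline corresponds to the same routine with an empty candidate set, which makes the comparison under an identical time budget straightforward.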

Fig. 2. (a) Environment with a number of objects and the robot's path, with observations indicated by asterisks; (b) classification accuracy and (c) entropy of two objects.

Let X be a random variable denoting the class of a single object. According to Bayes' rule, the probability distribution of X after observation Z_k is given by p(x | Z_{0:k}) = η p(Z_k | x) p(x | Z_{0:k−1}), assuming observations are conditionally independent given the class. Here, η is a normalisation constant and p(Z_k | x) denotes the likelihood of observation Z_k. Note that the prior and posterior distributions, p(x | Z_{0:k−1}) and p(x | Z_{0:k}) respectively, are categorical distributions over the known set of possible object classes, characterised by a probability mass vector P. We calculate the likelihoods by determining the similarity (using ICP) between the shape of the observed point cloud and each object instance x as learned in an offline stage.

We measure the usefulness of an observation at a candidate location y' by the mutual information I(X^n; Z_{y'}) = H(X^n) − H(X^n | Z_{y'}). Here, H(X^n) = −Σ_{x∈X} p(x) ln p(x) denotes the entropy of a single object before observation, and H(X^n | Z_{y'}) = Σ_{z∈Z_{y'}} p(z) H(X^n | z) is the conditional entropy, where Z_{y'} denotes the set of possible observations at location y'.

To compute mutual information for future viewpoints, we predict expected observations, taking into account occlusions according to the current scene estimate. Calculating the conditional entropy exactly is intractable since it sums over all possible observations, including all possible combinations of poses of each object. Instead, we approximate the set of observations using the most likely observation for each class.

To support various planning strategies, the global utility function U(y'_k, P_k) = Σ_{n∈N} I(X^n; Z_{y'_k}) is maximised over y'_k, given the mutual information of candidate locations for each of the objects present in the scene; P_k is the union of the prior probability mass vectors for all objects. By adding terms to represent distance and time, this utility function also forces the robot to reach a goal location within a specified time budget. Fig. 2 shows an example path together with the resulting classification accuracy and entropy for two objects.
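As an illustration of the update and viewpoint-scoring machinery above, here is a minimal numerical sketch. The helper names (bayes_update, entropy, mutual_information, global_utility) are our own, and the simple weighted time penalty in global_utility is an assumed stand-in for the paper's unspecified distance and time terms.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Categorical Bayes update: p(x|Z_{0:k}) ∝ p(Z_k|x) p(x|Z_{0:k-1})."""
    posterior = likelihood * prior
    return posterior / posterior.sum()

def entropy(p, eps=1e-12):
    """H(X) = -sum_x p(x) ln p(x) for a probability mass vector p."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def mutual_information(prior, obs_lik):
    """I(X; Z) = H(X) - sum_z p(z) H(X|z) at a candidate viewpoint.

    obs_lik[z, x] = p(z|x), with one predicted (most likely) observation
    per class, matching the approximation described in the text.
    """
    p_z = obs_lik @ prior  # p(z) = sum_x p(z|x) p(x)
    cond = sum(p * entropy(bayes_update(prior, obs_lik[z]))
               for z, p in enumerate(p_z) if p > 0)
    return entropy(prior) - cond

def global_utility(priors, obs_liks, time_cost, weight=1.0):
    """U = sum_n I(X^n; Z_{y'}) minus a (hypothetical) time penalty."""
    info = sum(mutual_information(p, lik) for p, lik in zip(priors, obs_liks))
    return info - weight * time_cost
```

For example, with a uniform two-class prior and near-deterministic predicted observations, mutual_information(np.array([0.5, 0.5]), np.eye(2)) approaches ln 2 ≈ 0.69 nats, i.e. such an observation would fully disambiguate the class.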


IV. JOINT SEGMENTATION AND CLASSIFICATION

In cluttered scenes, segmenting occupied space into distinct objects is not trivial. Segmentation and object classification are mutually dependent and must therefore be addressed simultaneously. Scene segmentation is then reformulated as the problem of assigning the observed parts of a scene to the different objects present. As the number of objects is not known a priori, we use a Dirichlet process as the prior over part associations. The analytical representation of this infinite-dimensional system is intractable, but samples can be generated using Markov chain Monte Carlo methods. Observations are used to update the joint probability distribution P(X) = P(X, A, V, Z), where X, A, V and Z denote the states of the different objects (class, location and orientation), the association of each observed part, the locations of the observed parts, and a set of observations, respectively. We have implemented a Gibbs sampler to generate samples from the conditional distribution P(X, A | V, Z) in simulation. ICP alignment is used for classification, in combination with a Gaussian model for the part locations V. Fig. 3 shows a 3D scene together with the simulated point clouds.
Fig. 3. Urban scene (left) and simulated point clouds (right). The colouring of the point clouds illustrates the association with the 6 different objects present in the scene, according to a single draw from the Gibbs sampler.
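To make the association step concrete, the following is a minimal sketch of one collapsed Gibbs sweep over the part-to-object associations A, using the Chinese-restaurant-process view of the Dirichlet-process prior. The names gibbs_sweep and part_loglik are hypothetical; part_loglik stands in for a likelihood combining the ICP shape match and the Gaussian model for part locations.

```python
import numpy as np

def gibbs_sweep(assignments, parts, alpha, part_loglik, rng):
    """One collapsed Gibbs sweep over part-to-object associations A.

    assignments: list of object ids (ints), one per observed part.
    alpha: Dirichlet-process concentration parameter.
    part_loglik(part, members): log-likelihood of `part` joining the
    object formed by `members` (empty list = a brand-new object); a
    hypothetical stand-in for the ICP + Gaussian location model.
    """
    for i, part in enumerate(parts):
        assignments[i] = -1  # remove part i from its current object
        objects = sorted({a for a in assignments if a >= 0})
        log_w = []
        for k in objects:  # existing object: weight n_k * likelihood
            members = [parts[j] for j, a in enumerate(assignments) if a == k]
            log_w.append(np.log(len(members)) + part_loglik(part, members))
        # new object: weight alpha * likelihood under the prior
        log_w.append(np.log(alpha) + part_loglik(part, []))
        log_w = np.array(log_w)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        choice = rng.choice(len(w), p=w)
        if choice < len(objects):
            assignments[i] = objects[choice]
        else:
            assignments[i] = max(objects, default=-1) + 1
    return assignments
```

Repeated sweeps (interleaved with resampling the object states X) yield samples of the segmentation, such as the single draw shown in Fig. 3.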

V. DISCUSSION AND OUTLOOK

Preliminary results show the benefit of our active approach and support our intuition of diminishing returns over time. Next steps are to incorporate joint segmentation into active classification and to consider multi-robot variants.

REFERENCES

[1] S. Gan, R. Fitch, and S. Sukkarieh, "Online decentralized information gathering with spatial-temporal constraints," Auton. Robots, vol. 37, no. 1, pp. 1–25, 2014.
[2] Z. Xu, R. Fitch, J. Underwood, and S. Sukkarieh, "Decentralized coordinated tracking with mixed discrete-continuous decisions," J. Field Robot., vol. 30, no. 5, pp. 717–740, 2013.
[3] B. Douillard, J. Underwood, V. Vlaskine, A. Quadros, and S. Singh, "A pipeline for the segmentation and classification of 3D point clouds," in Experimental Robotics, O. Khatib, V. Kumar, and G. Sukhatme, Eds. Springer, 2014, pp. 585–600.
[4] M. Huber, T. Dencker, M. Roschani, and J. Beyerer, "Bayesian active object recognition via Gaussian process regression," in Proc. of FUSION, 2012, pp. 1718–1725.
[5] A. Collet, B. Xiong, C. Gurau, M. Hebert, and S. Srinivasa, "HerbDisc: Towards lifelong robotic object discovery," Int. J. Rob. Res., vol. 34, no. 1, pp. 3–25, 2015.
[6] C. Cadena and J. Kosecka, "Semantic parsing for priming object detection in indoors RGB-D scenes," Int. J. Rob. Res., online 2014.
[7] H. van Hoof, O. Kroemer, and J. Peters, "Probabilistic segmentation and targeted exploration of objects in cluttered environments," IEEE Trans. Robot., vol. 30, no. 5, pp. 1198–1209, 2014.
[8] J. Binney, A. Krause, and G. Sukhatme, "Optimizing waypoints for monitoring spatiotemporal phenomena," Int. J. Rob. Res., vol. 32, no. 8, pp. 873–888, 2013.
[9] G. Nemhauser, L. Wolsey, and M. Fisher, "An analysis of approximations for maximizing submodular set functions–I," Math. Program., vol. 14, no. 1, pp. 265–294, 1978.
[10] S. Chen, Y. Li, and N. Kwok, "Active vision in robotic systems: A survey of recent developments," Int. J. Rob. Res., vol. 30, no. 11, pp. 1343–1377, 2011.