Probabilistic Scheme for Laser Based Motion Detection Roman Katz, Juan Nieto and Eduardo Nebot ARC Centre of Excellence for Autonomous Systems Australian Centre for Field Robotics The University of Sydney Sydney, NSW 2006, Australia {r.katz}@cas.edu.au
Abstract— This paper presents a motion detection scheme using laser scanners mounted on a mobile vehicle. We propose a stable, yet simple motion detection scheme that can be used and improved with tracking and classification procedures. The salient contribution of the developed architecture is twofold. It proposes a spatio-temporal correspondence procedure based on a scan registration algorithm. The detection is cast as a probability decision problem that accounts for sensor noise and achieves robust classification. Probabilistic occlusion checking is finally performed to improve robustness. Experimental results show the performance of the proposed architecture under different settings in urban environments.
I. I NTRODUCTION Advanced perception modules that allow reliable moving obstacle detection from moving vehicles would provide autonomous systems with the appropriate environmental information to plan actions or avoid collisions. Such techniques are at the core of navigation research, not only in the deployment of completely autonomous vehicles, but also of driver assistance systems for commercial cars. In this context, the information provided to the driver should be able to identify risky agents as other vehicles, pedestrians and bikes. Most of the existing dynamic obstacle tracking and classification architectures heavily rely on modules performing motion detection [1], [2] or assume the initial tracks are given [3], [4]. Within the approaches based on laser, the strategy presented in [5] is very related to our work. Motion detection is based on detecting change in temporal maps. Assuming that an estimation of the vehicle motion is available, a timestamp based representation is derived from the occupancy grid. It is then used to perform classification using different thresholds for identifying static and dynamic classes. Spatio-temporal data association in range data is also studied in [6], with focus on obtaining accurate aligned maps through non-rigid alignment and change detection. The work presented in [7] integrates detection of moving objects in the context of SLAM. The approach introduced in [8] proposes a Markov localization scheme to register scans, prior to perform detection and tracking of pedestrians. A model-based approach is chosen in [9], some models are proposed for foreground and background returns to perform initial detection from a static observer. Kalman filtering is used to filter the detected tracks assuming constant velocity models. In this paper we present an architecture to perform motion detection based only on range information provided by
laser scanners. The approach proposes a spatio-temporal correspondence procedure based on performing scan registration using the Iterative Closest Point (ICP) algorithm [10]. Using the obtained correspondences, a robust classification is performed by casting the detection as a probability decision problem that considers sensor noise and then computes probabilistic occlusion checking. The proposed algorithm does not require dead-reckoning sensors and does not build nor maintain any explicit map different from the scan registering capabilities. Neither explicit tracking nor temporal filtering is performed to achieve additional robustness. The system is feature independent, since no prior knowledge is assumed about the model of the objects, underlying classes, shape, etc. We aim at obtaining a stable, yet simple motion detection strategy that can be used and improved with tracking and classification procedures. This paper is organized as follows: the system is briefly introduced in Section II, presenting a general overview of the architecture and some of the basic concepts it relies on. Section III describes our proposed approach to perform motion detection using an ICP based spatio-temporal correspondence procedure, together with the probabilistic representation for the clusters and gating, multiple scan integration, and occlusion checking. To illustrate the performance of the system, results of experiments undertaken in a large urban environment are provided and detailed in Section IV. On this basis, conclusions and future work are discussed in Section V. II. S YSTEM OVERVIEW Changes in the environment sensed by a static observer can be almost immediately correlated to underlying dynamic behavior. However, when the observer is instead dynamic, correspondence between the objects needs to be determined first. This correspondence is defined as the association of scans that belong to the same underlying object, and a measure of its location displacement. 
Once the correspondence is established, the difference between subsequent snapshots in temporal sequence of spatial observations indicates potential dynamic behavior. Nevertheless, this discrepancy cannot be directly used since it can be produced by other phenomena rather than dynamic objects. As presented in [5], it can be caused either by new information obtained about unknown, stationary objects due to the motion of the observer, by the
Fig. 1. (a) shows a scan segmented into clusters. The different clusters are indicated using different colors. Dashed ellipses and boxes show the Gaussian representation (3σ contour) for some of the segmented clusters. (b) shows aligned scans after ICP registration. Different colors and markers indicate different scans (magenta ‘•’ and yellow ‘◦’). Laser returns from (a) projected (yellow ‘◦’) onto the camera image are shown in (c).
motion and artifacts produced by already discovered dynamic objects (occlusions), or by a combination of these effects. We address these problems by first establishing correspondence between successive scans using a spatio-temporal procedure based on a scan registration algorithm. Even though this scheme produces quite reliable hypotheses, it cannot identify dynamic behavior unambiguously. Hence, probabilistic criterion is used to perform a more robust classification. The decision criterion considers a probabilistic representation for the clusters in the scans that accounts for underlying sensor uncertainty, and then evaluates a validation gate between the corresponding associates clusters. Finally, an occlusion checking is evaluated using also the probabilistic representation for the clusters. We next present some of the concepts that are salient in our approach. A. Scan Segmentation The goal of scan segmentation is to obtain simple entities for the objects to simplify the representation, while capturing their salient characteristics. For laser scans, clustering can be efficiently done by comparing consecutive range measurements which arrive in order of sweep angle. For two successive returns, if the distance in either range or angle is greater than a threshold, the latter return forms the beginning of a new cluster. However, we want to use a generic clustering procedure that can also handle disordered sequences (i.e., merged scans taken at different times). Therefore, the clustering is performed in an Euclidean space where a distance threshold, defined by the desired behavior for dynamic obstacles detection, is used as the criterion for forming the clustering. An example of a laser scan segmented into clusters is shown in Figure 1(a). Figure 1(c) shows an image of part of the environment. The laser returns from (a) are projected (yellow ‘◦’) onto the image using the cameralaser registration procedure presented in [11]. B. 
Scan Registration Unprocessed data correlation, also called scan alignment or range-image registration, is the process of aligning a
measurement point set of (2D or 3D) points with a reference point set. There are several techniques to perform rangeimage registration, being ICP [10], [12] arguably the most commonly used, with its popularity mainly due to its simplicity and efficiency. The standard ICP algorithm works as follows. Let P = {bi } be a measurement data point set containing m points, and a reference point set X = {ai } considered fixed and containing n points. The goal of the registration is to find a transformation vector to align the measurement data points of P to the reference points of X. This transformation is composed by a translation and rotation. ICP is an iterative algorithm that consists in two steps. The first step involves finding the correspondence between points in the measurement and reference scan. This is done using a nearest neighbor algorithm. The second step is to obtain a rotation and translation between the two scans, minimizing the mean square error. The algorithm is initialized with an initial pose guess and, until the estimated pose satisfies some convergence criterion, it is iteratively refined by a process of point-to-point data association and least-squares minimization. Each point b i ∈ P is first transformed to the reference coordinate frame using the current pose estimate, and then associated to its nearest neighbor in X. The original point b i and its associate ai are added to an association set E. Finally, the pairs in E are used to calculate the relative pose that minimizes the mean-squared error between the associated points. For the experiments in this paper we use a quaternion based method as presented in [13]. An example of ICP alignment is shown in Figure 1(b). C. Probabilistic Cluster Representation In order to be able to perform a probabilistic gating for the classification of the associated segments between scans, we need to represent clusters in a Bayesian way. 
This representation should consider the underlying cluster uncertainty given by the laser returns that form each of the clusters.
For a set of range-bearing measurements, we first convert the laser measurements and their uncertainties to a sensor-centric Cartesian space. Each of the range-bearing returns $z_i = (r_i, \theta_i)$ with Gaussian uncertainty $R_i$ is converted to Cartesian coordinates:

$$x_i = f(z_i) = \begin{bmatrix} r_i \cos\theta_i \\ r_i \sin\theta_i \end{bmatrix}, \qquad P_i = \nabla f_{z_i} R_i \nabla f_{z_i}^T, \tag{1}$$

where the Jacobian $\nabla f_{z_i} = \partial f / \partial z_i$. We can now obtain a mean and a covariance for clusters composed of points with uncertainty as in (1). Considering the N Gaussian points obtained from the clusters' returns expressed in Cartesian coordinates, we compute the first and second moment of the mixture for each of the clusters as in [14]:

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad P = \frac{1}{N} \sum_{i=1}^{N} \left[ P_i + (x_i - \bar{x})(x_i - \bar{x})^T \right]. \tag{2}$$
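As an illustration of (1) and (2), the following minimal Python sketch converts range-bearing returns to Cartesian points with propagated covariances and then computes the cluster mean and covariance. The function names and the noise values used in the usage note are hypothetical and chosen only for this example; the sketch assumes the same sensor covariance for every return and equal weights in the mixture.

```python
import numpy as np

def polar_to_cartesian(r, theta, R):
    """Convert one range-bearing return to a Cartesian point and covariance, as in (1)."""
    x = np.array([r * np.cos(theta), r * np.sin(theta)])
    # Jacobian of f with respect to (r, theta)
    J = np.array([[np.cos(theta), -r * np.sin(theta)],
                  [np.sin(theta),  r * np.cos(theta)]])
    P = J @ R @ J.T
    return x, P

def cluster_moments(ranges, bearings, R):
    """Mean and covariance of an equally weighted Gaussian mixture, as in (2)."""
    converted = [polar_to_cartesian(r, t, R) for r, t in zip(ranges, bearings)]
    pts = np.array([x for x, _ in converted])
    covs = np.array([P for _, P in converted])
    mean = pts.mean(axis=0)
    diffs = pts - mean
    # Outer product (x_i - mean)(x_i - mean)^T for every point
    spread = np.einsum('ni,nj->nij', diffs, diffs)
    cov = (covs + spread).mean(axis=0)
    return mean, cov
```

For instance, with an assumed sensor covariance `R = np.diag([0.01, 0.0001])` (range variance in m², bearing variance in rad²), calling `cluster_moments(ranges, bearings, R)` on the returns of one cluster yields the pair used later for gating. The second term of the covariance is what captures the spatial spread of the returns around the cluster mean.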
This representation for the clusters allows us to consider the sensor noise while capturing the “dispersion” of their composing returns with respect to their mean. It also facilitates the process of gating clusters in order to perform motion detection, as explained in the next section. The Gaussian representation for some of the segmented clusters is shown in Figure 1(a).

III. MOTION DETECTION
In this section we discuss the main components of our framework for motion detection. We present a simple spatio-temporal procedure based on ICP scan registration aimed at determining correspondences in laser scans. To solve the correspondence, we first establish the association of scans that belong to the same underlying objects (either static or dynamic) together with their corresponding displacement. The obtained correspondences are then used to classify the associated clusters as dynamic or static using probabilistic cluster representation and gating. A stage of probabilistic occlusion checking is finally applied to reduce the number of false positives.

In order to properly formulate an algorithmic approach, we initially identify the possible situations that might arise when associating objects in laser scans. Scans can be initially correlated through a proximity criterion, since nearest neighbor clusters in aligned scans usually correspond to the same objects. Unfortunately, an incorrect correspondence can occur for several reasons. It can be produced by “unstable” scans, i.e. returns that vanish or lack sufficient structure and then define either incorrect neighboring associations or displacement measures. Different portions of the same objects might be represented in subsequent scans, with correct nearest neighboring association but inconsistent displacement computation. It can also be generated by the occlusion produced by moving targets. In this case, clusters might actually be well associated in terms of neighboring criterion and displacement, but the measured displacement might not correspond to a real discrepancy. The range to the object might also degrade the quality of the association, since distant clusters normally become less dense and unreliable. Moreover, there might be no correspondence between clusters, due to limitations in the visibility range or new moving targets appearing in the sensor field of view.

The images in Figure 2 show an example where incorrect discrepancies are encountered. Figure 2(a) and (b) show two subsequent scans projected onto the corresponding images. Due to the occlusion produced by a moving target (car), incorrect correspondences between clusters are established in the aligned scans shown in Figure 2(c). These incorrect associations are indicated by the labeled boxes “A” and “B” in each of the images.

A. Spatio-temporal Correspondence
Our approach to perform correspondence is built upon a scan registration procedure. We assume that a large majority of range information is static. Most of the laser returns usually correspond to background and non-moving objects, and only a small portion to dynamic ones. This is further strengthened by the relatively wide field of view of the employed sensor. This assumption has been validated by empirical evaluation using datasets taken in real-world, outdoor settings under normal situations.¹

Let us consider two subsequent laser scans that have already been segmented into clusters. We assume that egomotion information is not available. Therefore, the ICP algorithm is an efficient and natural means to initially align the scans in the same reference frame. Once these two scans have been aligned, the association strategy can be defined. A nearest neighbor procedure could establish an initial set of associations. However, the associations of such a straightforward rule are not reliable since in many cases there might be multiple hypotheses, as can be seen in Figure 2(d).
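To make the ambiguity of plain nearest-neighbor association concrete, the minimal Python sketch below associates cluster means from two already aligned scans. It is only an illustration under stated assumptions: the function names and the example coordinates are hypothetical, clusters are reduced to their means, and the mutual nearest-neighbor check is shown as one possible way to keep only unambiguous pairs.

```python
import numpy as np

def nearest_neighbors(src, dst):
    """For every point in src, return the index of the closest point in dst."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    return d.argmin(axis=1)

def mutual_associations(prev_means, curr_means):
    """Keep only pairs that are each other's nearest neighbor."""
    fwd = nearest_neighbors(prev_means, curr_means)
    bwd = nearest_neighbors(curr_means, prev_means)
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]

# Hypothetical cluster means (in metres) from two aligned scans
prev_means = np.array([[0.0, 0.0], [0.4, 0.0], [5.0, 5.0]])
curr_means = np.array([[0.1, 0.0], [5.1, 5.0]])

# Plain nearest neighbor: two previous clusters claim current cluster 0
print(nearest_neighbors(prev_means, curr_means))   # [0 0 1]
# Mutual check: only one-to-one pairs survive
print(mutual_associations(prev_means, curr_means))  # [(0, 0), (2, 1)]
```

In this toy case the first two previous clusters both select the same current cluster, which is the multiple-hypotheses situation described above; the mutual check discards the weaker of the two claims.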
A one-to-one association scheme is therefore chosen to avoid multiple hypotheses situations. Since it is likely that same objects are the closest to each other between successive scans, then mutual nearest neighboring is also enforced to further improve the quality of the association. Moreover, all those associations that link clusters that do not have sufficient structure are eliminated. Associations are maintained if both clusters contain more than a minimum number of points. It is important to note that this filtering is done only after the ICP alignment and the initial associations have been determined and not before. The reason for this is that even when some clusters do not have enough structure, they might help to fill in gaps and reduce the chances of establishing incorrect correspondences that might occur if eliminated. B. Probabilistic Gating Once objects correspondence has been established using the scheme presented before, classification of the associated segments can be performed. Considering the mean and covariance (2) for the mixture of Gaussians that represents 1 A more thoroughly evaluation needs to be performed for extremely populated environments.
Fig. 2. Example situation with incorrect discrepancies generated by occlusion. (a) and (b) show two subsequent scans projected onto the corresponding images, and (c) depicts the incorrect correspondences in the aligned scans. The incorrect associations are indicated by the labeled boxes “A” and “B” in each of the images.
now each of the clusters, the classification of the associated clusters can be based on standard gate validation techniques. This is done by setting up a gate for each associated pair of clusters j, and classifying the association using the normalized innovation squared (NIS) [15]:

$$\nu_j^T S_j^{-1} \nu_j < M_d, \tag{3}$$

with innovation $\nu = \bar{x}_{k-p} - \bar{x}_k$ and innovation covariance $S = P_{k-p} + P_k$, for a defined separation p in the sequence of scans, at time k. Since the NIS follows a chi-square PDF, the threshold $M_d$, for an innovation of dimension d, can be chosen for a given level of certainty in the gate.

C. Multiple Scan Integration
Range requirements in the context of large, outdoor environments can be very demanding, often resulting in low spatial resolution and sparse returns. We hereby expand the proposed architecture to be able to integrate several consecutive scans. The increase in the density of points can actually produce more “stable” scans and clusters, where many gaps are filled in so that the robustness of the overall procedure (mainly due to having obtained consistent correspondences) tends to increase notably. If we consider an arbitrary buffer of size m, we can iteratively perform segmentation (clustering) and ICP registration for the frames (k − p, k − p + m), and then compute correspondences as before for two scans only.

D. Occlusion Checking
Due to the dynamic nature of the observer and obstacles in the scene, some false positives can still be detected by the classifier. A subsequent occlusion checking analysis can therefore reduce some of these undesired false positives. We address this issue by borrowing concepts from the approach presented in [16], where occlusion checking is performed in order to choose a camera configuration for optimal coverage. The aim here is to evaluate occlusion in a probabilistic way,
considering that both the targets' and occluders' locations are represented by their likely locations. Given a robot position R, the probability of occlusion for the target T and the occluder Q can be evaluated in a sample-based version as:

$$P_{oc}(T, Q \mid R) \approx \frac{1}{n} \sum_{p \in \delta_T} P_{oc}(p, Q \mid R), \tag{4}$$

where p represents a point drawn from the distribution $\delta_T$ that represents the target T location, and n is the number of samples. $P_{oc}(p, Q \mid R)$ is the probability of occlusion of the point p by the occluder Q. This can also be evaluated using samples as:

$$P_{oc}(p, Q \mid R) \approx \frac{1}{m} \sum_{o \in \delta_Q} P_{oc}(p, o \mid R), \tag{5}$$

with m the number of samples drawn from the occluder Q, and $\delta_Q$ the distribution representing its location. Finally, merging (4) and (5), the probability of occlusion for the target T and any occluder Q can be computed as:

$$P_{oc}(T, Q \mid R) \approx \frac{1}{n \cdot m} \sum_{p \in \delta_T} \sum_{o \in \delta_Q} P_{oc}(p, o \mid R), \tag{6}$$
where both $\delta_T$ and $\delta_Q$ are captured for the laser clusters using (2), and $P_{oc}(p, o \mid R)$ is a simple visibility test.

IV. EXPERIMENTAL EVALUATION
In this section we present experimental results showing the performance of the proposed architecture. The datasets were obtained in an outdoor, urban environment at the University of Sydney campus, using the system provided by [17]. They consist of a sequence of images temporally correlated with laser scans provided by a Sick LMS200 laser scanner operating at high speed (500kBps), with vehicle velocities varying between 0 and 40 km/h. The algorithmic procedure presented in [11] was used to compute the camera-laser calibration.

Using these datasets, we first computed the performance of the probabilistic motion detection scheme, comparing results with a simplified scheme that applies deterministic thresholding. We chose the multiple scan version of the scheme, using a buffer size m = 4. The obtained improvement is shown in Figure 3 by means of Receiver Operating Characteristic (ROC) curves. ROC curves provide an indication of the quality of the classifier, as they plot true detection rate vs. false detection rate as the discrimination threshold varies. The thresholds varied to construct the ROC curves were the certainty level used for gating in the probabilistic approach, and a distance threshold used to classify the correspondence in the deterministic one. The total number of true dynamic obstacles for this evaluation was 1763, manually labeled for 10 min of logged data. As can be seen, the better classification of the probabilistic scheme over the deterministic counterpart corresponds to a larger Area Under the Curve (AUC), with AUC=0.855 and AUC=0.825, respectively. These results were obtained without using the occlusion checking approach presented in Section III-D, to compare the motion detection schemes with no additional decision stages on top.

Fig. 3. ROC curves. Probabilistic versus deterministic motion detection.

Additional evaluation was performed including the occlusion checking stage. Reasonable working conditions were chosen using the ROC curves presented in Figure 3, for both the probabilistic and the deterministic approach. For the case of probabilistic gating, a threshold of $M_d = 0.75$ was chosen, whereas a threshold of 0.3 m was used for the deterministic evaluation (this value being the same one used in the segmentation stages). Table I shows the results for both cases.

TABLE I
CLASSIFICATION ACCURACY (IN %).

                        True Positive   False Positive   False Negative
Probabilistic gating        90.02           26.81             9.98
Deterministic gating        85.2            28.73            14.80

The true positive and false negative rates (%) are very good for both the probabilistic gating and its deterministic implementation. Robust detection can indeed be performed, provided that the correspondence procedure has established solid cluster associations and identified discrepancies. The probabilistic approach for motion detection was extremely accurate, with true positive and false positive rates of 90.02% and 26.81%, respectively, out of a total of 1763 manually labeled dynamic obstacles.

Figure 4 shows images of the system performing motion detection in different situations. Three different examples are shown, one in each column, illustrating the detection of cars, bicycles and pedestrians. The top row (Figure 4(a)-(c)) presents the laser scans and the associated clusters, whereas the bottom row (Figure 4(d)-(f)) shows the corresponding scans projected onto images. The detection results are indicated using solid boxes for each detected object in both the top and the bottom images.

V. CONCLUSIONS AND FURTHER WORK
The architecture presented in this paper introduces a motion detection procedure based on laser scans. Motion detection is performed by first establishing spatio-temporal correspondence between clusters through an algorithm based on a scan registration scheme. The detection is then achieved by obtaining a probabilistic representation for each of the clusters, computing a validation gate and performing probabilistic occlusion checking. The proposed architecture is feature independent, since no prior knowledge is assumed about the model of the objects, underlying classes, shape, etc. The system performs motion detection without dead-reckoning sensors, explicit maps or tracking. We have presented experimental evaluation under different settings in outdoor, urban environments showing the classification performance of the proposed architecture. The system performs robust detection of different moving objects with low false positive and false negative rates.

We are currently considering further evaluation and improvements of our architecture. The robustness of ICP as a registration procedure under high speed (both in the observer and targets) still needs to be evaluated. The limits of ICP correspondence reliability for a given observer or target speed will clearly depend on the laser scanning frequency. We are currently in the process of evaluating its limitations in this regard. Since the system is dependent on segmentation, we intend to further improve it by including multiple hypothesis handling in the clustering procedure [18], and by fusing visual information as in [19].

VI. ACKNOWLEDGMENTS
This work is supported by the CRCMining, the Australian Research Council (ARC) Centre of Excellence program and the New South Wales Government. Also special thanks to QSSL for donating copies of the QNX Momentics Professional development system used to implement the real-time data acquisition system.

Fig. 4. System performing motion detection in different scenarios. The top row ((a)-(c)) shows the segmentation results and the associated clusters between subsequent scans. The bottom row ((d)-(f)) presents the corresponding laser scans projected onto the camera image for each of the situations in the upper row. The obtained results are indicated using solid boxes with same colors for the detected objects in both top and bottom images.

REFERENCES
[1] Z. Sun, G. Bebis, and R. Miller, “On-Road Vehicle Detection: A Review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 694–711, 2006.
[2] K. M. Krishna and P. K. Kalra, “Detection, Tracking and Avoidance of Multiple Dynamic Objects,” Journal of Intelligent and Robotic Systems, vol. 33, pp. 371–408, 2002.
[3] G. Monteiro, C. Premebida, P. Peixoto, and U. Nunes, “Tracking and Classification of Dynamic Obstacles Using Laser Range Finder and Vision,” in IEEE/RSJ International Workshop on Intelligent Robots and Systems. Beijing, China: IEEE, 2006.
[4] O. Frank, J. Nieto, J. Guivant, and S. Scheding, “Multiple Target Tracking using Sequential Monte Carlo Methods and Statistical Data Association,” in IEEE/RSJ International Workshop on Intelligent Robots and Systems. Las Vegas, USA: IEEE, 2003.
[5] E. Prassler, J. Scholz, and A. Elfes, “Tracking Multiple Moving Objects for Real-Time Robot Navigation,” Autonomous Robots, vol. 8, no. 2, pp. 105–116, 2000.
[6] R. Kaestner, S. Thrun, M. Montemerlo, and M. Whalley, “A Nonrigid Approach to Scan Alignment and Change Detection Using Range Sensor Data,” Springer Tracts in Advanced Robotics, vol. 25, pp. 179–194, 2006.
[7] B. Wang, “Simultaneous Localization, Mapping and Moving Object Tracking,” PhD thesis, Robotics Institute, Carnegie Mellon University, USA, 2004.
[8] D. Schultz, W. Burgard, D. Fox, and A. Cremers, “People Tracking with a Mobile Robot Using Sample-based Joint Probabilistic Data Association Filters,” The International Journal of Robotics Research, vol. 22, no. 2, pp. 99–116, 2003.
[9] A. Fod, A. Howard, and M. Mataric, “Laser-based People Tracking,” in International Conference on Robotics and Automation. Washington DC: IEEE, 2002.
[10] P. Besl and N. McKay, “A Method for Registration of 3-D Shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, 1992.
[11] Q. Zhang and R. Pless, “Extrinsic Calibration for a Camera and Laser Ranger Finder (improves camera intrinsic calibration),” in IEEE/RSJ International Workshop on Intelligent Robots and Systems. Japan: IEEE, 2004.
[12] T. Bailey, “Mobile Robot Localisation and Mapping in Extensive Outdoor Environments,” PhD thesis, University of Sydney, Australia, 2002.
[13] B. Horn, “Closed-form Solution of Absolute Orientation using Unit Quaternions,” Journal of the Optical Society of America A, vol. 4, no. 4, pp. 629–642, 1987.
[14] J. I. Nieto, T. Bailey, and E. Nebot, “Recursive Scan-Matching SLAM,” Journal of Robotics and Autonomous Systems, vol. 55, no. 1, pp. 39–49, 2007.
[15] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. New York: Artech House Radar Library, 1999.
[16] X. Chen and J. Davis, “An Occlusion Metric for Selecting Robust Camera Configurations,” Machine Vision and Applications, accepted May 2007.
[17] ACFR, The University of Sydney and LCR, Universidad Nacional del Sur, “PAATV/UTE Projects,” Technical Report ACFR, Sydney, Australia, 2006.
[18] D. Streller and K. Dietmayer, “Object Tracking and Classification using a Multiple Hypothesis Approach,” in Intelligent Vehicles ’04 Symposium, 2004.
[19] N. Kaempchen, M. Zocholl, and K. Dietmayer, “Spatio-temporal Segmentation using Laserscanner and Video Sequences,” Lecture Notes in Computer Science, vol. 3175, pp. 367–374, 2004.