Proceedings of 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems September 28 - October 2, 2004, Sendai, Japan
Situation-based Multi-target Detection and Tracking with Laserscanner in Outdoor Semi-structured Environment
Abel Mendes and Urbano Nunes
Institute for Systems and Robotics, University of Coimbra-Polo II, 3030-290 Coimbra, Portugal
Email: abfm, [email protected]
Abstract— This paper addresses the development of an anti-collision system (ACS) based on a laserscanner, for low-speed vehicles running in Cybercars scenarios. The ACS core is a multi-target detection and tracking system (MTDATS), which is able to classify several kinds of objects and can be easily extended to detect new ones. The MTDATS is composed of five modules: 1) scan segmentation; 2) situation-based information integration; 3) object classification using a suitable voting scheme over several object properties; 4) object tracking using a Kalman filter that takes the object type into account to increase the tracking performance; and 5) a database with the objects being tracked at each interval of data processing. For each database object, the time to collision with the vehicle is computed. The worst-case time-to-collision and the corresponding predicted impact point on the vehicle are sent to the path-following controller, which uses this information to provide collision avoidance behaviour.
I. INTRODUCTION
In this paper, we describe a multi-target detection and tracking system (MTDATS), based on a SICK LMS200 laserscanner. Its purpose is to provide timely information about the vehicle's surrounding environment, enabling anti-collision behaviour of low-speed vehicles running in Cybercars scenarios. Cybercars [1] is a European project whose goal is to accelerate the development and implementation of novel urban transportation systems, based on fully automated road vehicles, for the transportation of passengers and goods. A major task of the project is to improve cybernetic technologies for the operation of autonomous vehicles. Cybercars have to satisfy challenging requirements such as following a planned path with high accuracy, assuring complete safety under driverless control, and performing functions new to road vehicles, like platooning.
The automotive industry has been concentrating efforts on the development of safety systems in order to decrease the danger caused by drivers' faults or distractions. Active cruise control (ACC) is one of those systems, developed mainly for highway driving [2] in order to relieve the driver of the rather monotonous job of constantly fine-tuning the vehicle's speed. The main difference between this and Cybercars scenarios arises from the fact that the highway system works as a velocity tracker with respect to the vehicle ahead, whereas a system running in
Cybercars scenarios works more as a collision detection system and must handle a large number of object types, such as pedestrians, bicycles and cars, moving in distinct environment situations.
The anti-collision system (ACS) is being developed to be integrated in a set of automated electric vehicles from Yamaha Motor Europe (YME) for the evaluation and demonstration of a cybernetic transport system, as in a park-and-ride scenario [3], in the framework of the Cybercars and CyberMove [4] projects. The experimentation site is an outdoor semi-structured environment, since the vehicles will move in a defined loop. It is characterized by distinct situations from the point of view of the MTDATS; namely, the following ones can be defined: 1) approaching a ramp; 2) crossing of two vehicles moving in opposite directions in a straight path; 3) approaching a pedestrian crossing zone; 4) sharp curve of a narrow path limited by walls (in this case the vehicle sees a wall approaching before turning); 5) U-bend; 6) approaching a car-stop zone; etc. Some of these situations can be even more complex, as is the case of situation 4) with vehicles approaching the sharp curve from opposite directions. The MTDATS described in this paper, besides having the usual modules of a detection and tracking system (scan segmentation, object classification and object tracking), has an additional module to cope with the distinct situations faced by the vehicles.
The prediction of object behaviour is an essential issue for intelligent vehicles in order to prevent accidents. To achieve good object tracking and behaviour prediction, an object classification module is essential in the system. Common classification methods are based on multiple hypotheses, with objects being reclassified over time as new data is processed. Vision systems can provide suitable information for pedestrian detection [5], vehicle detection [6] and object classification [7]; nevertheless, object classification is also being performed with laserscanners [8], [9]. In order to increase classification performance, multilayer laserscanners are emerging to mitigate the lack of information inherent to 2D laserscanners [10].
Fig. 1. Situation-based anti-collision system (dataflow: Laser Range Finder, Scan Segmentation, Object Classification, Object Tracking, Impact-Time Computation, supported by the Situation-based information processing module, fed by situations s1 ... sn, and by an Objects repository).
Fig. 2. Schematic of clustering method (consecutive range measures rk and rk+1, with d the distance between the corresponding points).
A. Overview
Figure 1 shows the dataflow between the modules that constitute our situation-based ACS. The feedback loop means that information from past scans is incorporated into the classification and object tracking processes over time to improve their performance. In each sample period (after a scan), the predictions of the position and of the classification parameters of the tracked objects are updated. The ACS computes and sends to the vehicle path-following controller [11] the worst-case time-to-collision and the associated prediction of the impact point on the vehicle.

II. SCAN SEGMENTATION
The purpose of the scan segmentation is to look for segments defined by several lines, fitting the points that represent each object. To do that, the process starts by grouping all the measures of a scan into several clusters, according to the distances between consecutive measures, followed by line fitting of the points of each cluster.

A. Clustering
The readings are subdivided into sets of neighbouring points (clusters), taking into account the proximity between each two consecutive points of the scan. A cluster is hence a set of measures (points of the scan) close enough to each other that, due to their proximity, they probably belong to the same object. The segmentation criterion is based on the one proposed in [8]: two consecutive points at distances r_k and r_{k+1} from the laserscanner belong to the same segment as long as the distance between them fulfils the following expression:

    r_{k,k+1} ≤ C_0 + r_min · √(2·(1 − cos φ)) / (cos(φ/2) − sin(φ/2)/tan β)    (1)

where r_min = min{r_k, r_{k+1}}, r_{k,k+1} = |r_k − r_{k+1}| is the distance between the two measured points, and φ is the angular resolution of the laserscanner. The parameter β was introduced to reduce the dependence of the segmentation on the distance between the laserscanner and the object, and C_0 handles the longitudinal error of the sensor. If C_0 = 0, then β represents the maximum absolute inclination that an object's face can have in order to be detected as a single segment (Fig. 2). The distance d represents the maximum difference allowed between two consecutive measures for them to be considered as belonging to the same object.
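To make the criterion concrete, a minimal Python sketch of the clustering loop is given below; the values of C_0 and β are illustrative assumptions, and r_{k,k+1} is computed as the Euclidean distance between the two scan points.

    import math

    def adaptive_clusters(ranges, phi, c0=0.10, beta=math.radians(60.0)):
        """Group consecutive range readings into clusters using the adaptive
        distance criterion of equation (1). ranges: list of ranges [m] at
        constant angular resolution phi [rad]; c0 and beta are assumed values."""
        # Range-proportional threshold factor from equation (1)
        factor = math.sqrt(2.0 * (1.0 - math.cos(phi))) / (
            math.cos(phi / 2.0) - math.sin(phi / 2.0) / math.tan(beta))
        clusters, current = [], [0]
        for k in range(len(ranges) - 1):
            rk, rk1 = ranges[k], ranges[k + 1]
            # Euclidean distance between the two consecutive scan points
            d = math.sqrt(rk**2 + rk1**2 - 2.0 * rk * rk1 * math.cos(phi))
            if d <= c0 + min(rk, rk1) * factor:
                current.append(k + 1)       # same object: extend cluster
            else:
                clusters.append(current)    # break point: start a new cluster
                current = [k + 1]
        clusters.append(current)
        return clusters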
B. Line fitting
Assuming that the surrounding environment can be approximated by polygonal shapes, line fitting is a suitable choice for approximating object faces. Thus, good object characterization and data reduction can be achieved, resulting in a set of line segments per cluster, defined as:

    L = {l_i = [P_ini, P_end, m, b]^T : 0 ≤ i < n}    (2)

where the Cartesian points P_ini and P_end are respectively the initial and end points of the line, and m and b are the straight-line parameters. For each cluster, our process starts with a recursive line fitting [12] in order to detect the break points in the set of points of the cluster. The process starts by connecting the first and last points of the cluster by a straight line (Ax + By + C = 0), where A = y_l − y_f, B = x_f − x_l, C = −(B·y_f + A·x_f) (Fig. 3.a), and for all points in between, the perpendicular distance to the line is calculated:

    d_{⊥,k} = (x_k·A + y_k·B + C) / √(A² + B²)    (3)

If the maximum of these distances, d_{⊥,max}, exceeds a given threshold, the line is split at the point of d_{⊥,max} (Fig. 3.b) and two new sets of points are created. Afterwards, new splitting tests are made recursively on each set. The process ends when no more splits take place (Fig. 3.c), which means that every set can be approximated by a line with an error smaller than the threshold. This procedure is used simply to detect break points, since the line fitting results can be improved with additional data processing (see Fig. 3.c, where the approximation line between points b and n is not the best one to fit all the points in between). Thus, after splitting all the points of a cluster into subsets, an orthogonal regression [13] is applied to determine the straight-line equation that minimizes the quadratic error (Fig. 3.d).
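A compact sketch of this split-and-fit procedure follows (Python/NumPy); the split threshold is an assumed value, and the orthogonal regression is implemented through the SVD of the centred points, one standard way to realize [13].

    import numpy as np

    def split_cluster(pts, threshold=0.05):
        """Recursively find break points (Fig. 3.a-c). pts: (N, 2) array of
        cluster points; threshold [m] is an illustrative value."""
        (xf, yf), (xl, yl) = pts[0], pts[-1]
        # Line through first and last points: Ax + By + C = 0 (Section II-B)
        A, B = yl - yf, xf - xl
        C = -(B * yf + A * xf)
        d = np.abs(A * pts[:, 0] + B * pts[:, 1] + C) / np.hypot(A, B)  # eq. (3)
        k = int(np.argmax(d))
        if len(pts) < 3 or d[k] <= threshold:
            return [pts]                    # one line fits this set well enough
        # Split at the farthest point and recurse on both halves (Fig. 3.b)
        return split_cluster(pts[:k + 1], threshold) + split_cluster(pts[k:], threshold)

    def orthogonal_fit(pts):
        """Orthogonal (total least squares) regression: the line direction is
        the principal singular vector of the centred points (Fig. 3.d)."""
        mean = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - mean)
        direction = vt[0]                   # unit vector along the fitted line
        return mean, direction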
C. Joining broken objects
During the clustering and line fitting processes, no effort is made to adjust the lines to the most common objects that can be found. For that reason, the segmentation can be very faithful to the measured points while not corresponding to the underlying objects. This problem arises especially for targets like pedestrians, which often result in two segments, and for big objects like walls with smaller objects between them and the laserscanner, resulting in the division of the big object into two or more objects.
Fig. 3. Recursive line fitting example: a) initial line between first and last points, with d_{⊥,max}; b) split at the point of d_{⊥,max}; c) final break-point detection; d) orthogonal regression of each subset.
Fig. 4. Results of the scan segmentation module for the most common shapes captured from a YME vehicle, with the laserscanner positioned at point (0, 0): a) front view; b) front-lateral view; c) part of the shape of the vehicle fitted as an ellipse (front-lateral view). The x and y axes are in mm.
In order to avoid those problems, the following two algorithms are proposed.
Legs detection: Observing a sequence of laserscanner data, we can easily notice that the shape of a walking pedestrian alternates between one and two small objects, representing the legs. Hence, when two small segments separated by a distance of less than 50 cm are found, our algorithm joins both into one segment.
Broken lines: The detection of walls or other big obstacles partially occluded by smaller objects in front of them is an iterative process that goes through the current segment set searching for lines bigger than a given threshold, checking for each one whether there are other lines that might fit on the same line. Afterwards, for each pair of small lines that can fit the big one, another test must be done to make sure that there is nothing behind.
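A minimal sketch of the legs-joining rule is shown below (Python); the segment representation and the maximum leg size are assumptions, while the 50 cm gap comes from the text.

    import math

    def join_pedestrian_legs(segments, max_size=0.30, max_gap=0.50):
        """Merge pairs of small neighbouring segments that likely belong to
        the two legs of one pedestrian (Section II-C). Each segment is a
        (centre, size) tuple, centre = (x, y) in metres; max_size is assumed."""
        out, skip = [], set()
        for i, (ca, sa) in enumerate(segments):
            if i in skip:
                continue
            if sa <= max_size:
                for j in range(i + 1, len(segments)):
                    cb, sb = segments[j]
                    if j not in skip and sb <= max_size and math.dist(ca, cb) < max_gap:
                        centre = ((ca[0] + cb[0]) / 2.0, (ca[1] + cb[1]) / 2.0)
                        out.append((centre, math.dist(ca, cb)))  # joined segment
                        skip.add(j)
                        break
                else:
                    out.append((ca, sa))    # small but isolated: keep as-is
            else:
                out.append((ca, sa))        # big segment: left untouched
        return out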
III. SITUATION-BASED INFORMATION PROCESSING
The purpose of this module is to optimize the detection and tracking of objects by taking advantage of the knowledge of the vehicle's particular situation, which is easily perceived since we are considering an outdoor semi-structured environment. Based on the knowledge of the specific situation, this module performs a refinement of the segmentation and, in some cases, sends state information to the impact-time computation module.
The automated vehicles in use, YME vehicles, have a particular shape that leads to multi-segment detection by the scan segmentation module, as shown in Fig. 4. It is crucial to identify the YME vehicle from a set of patterns like those shown in Fig. 4. For that, the algorithm searches for patterns that can be well fitted by an ellipse, as depicted in Fig. 4.c. For example, for the situation "crossing of two vehicles moving in opposite direction in a straight path", the process is based on the following tests: 1) the search area is restricted to a front-left area; 2) the size of the ellipse must fulfil the YME vehicle dimensions; 3) the estimated ellipse and the measures must have a high correlation. In this case, the module joins the set of segments provided by the scan segmentation that are associated with the same YME vehicle.
Another example is "approaching a ramp". The ramp would be detected as a static obstacle, like a wall, resulting in a collision detection, if the vehicle's knowledge of this specific situation were not taken into account. In this case, the impact-time computation module is informed of the actual particular situation.

IV. OBJECT CLASSIFICATION
A new entity named object is introduced in this section. It is similar to a segment, but classified into one of a set of predefined types, tracked from previous scans and owning special features such as type, velocity and size. Segments become objects after correspondence with objects detected in previous scans; in case of lack of correspondence, new objects are created. The object classification module is made up of the following three submodules.

A. Segment-Object correspondence
The segment-object correspondence can be subdivided into three types: the first one, where the correspondence is immediately assumed due to the small distance between the centre of geometry of the current segments and the predicted position of previously tracked objects; the second one, when the object tracking is not properly adapted to the real movements, originating undesirable estimations of the object movements and consequently bigger distances between corresponding objects in consecutive scans; and the third type, representing segments without correspondence to older objects. The process uses a table of segment-object distances lower than δ_0, see Table I. The process runs through the set of current segments and calculates, for each one, the distance to every previously detected object, and it fills out
the table with the results given by

    λ_{i,j} = |o_i − s_j|  if |o_i − s_j| ≤ δ_0;   λ_{i,j} = 0  if |o_i − s_j| > δ_0    (4)

with i = 1..m and j = 1..n, where m is the number of objects and n is the number of segments.

TABLE I
TABLE OF DISTANCES BETWEEN SEGMENTS AND OBJECTS

            s_1      s_2     ...    s_n
    o_1     λ_11     λ_12    ...    λ_1n     Σλ_o1
    o_2     λ_21     λ_22    ...    λ_2n     Σλ_o2
    ...     ...      ...     ...    ...      ...
    o_m     λ_m1     λ_m2    ...    λ_mn     Σλ_om
            Σλ_s1    Σλ_s2   ...    Σλ_sn

The auxiliary parameters Σλ_sj and Σλ_oi are respectively the number of elements in column j greater than 0 and the number of elements in row i greater than 0. In Table I, the unambiguous correspondences are the (i, j) elements where Σλ_sj = 1 and Σλ_oi = 1, which means that, for segment j, there is only one object i sufficiently close to be considered its correspondent. The problem becomes harder for bigger values of either Σλ_sj or Σλ_oi, which represent more than one hypothesis of correspondence for segment j or object i. In order to resolve that ambiguity, other characteristics are taken into account, namely distance, dimension, orientation, occlusion time and life time, through the following weight function:

    P(o_i, s_j) = [ρ_1, ρ_2, ρ_3, ρ_4, ρ_5]^T · [e_dist(o_i, s_j), e_dim(o_i, s_j), e_orient(o_i, s_j), T_occlusion(o_i), T_life(o_i)]    (5)

with i = 1..m and j = 1..n, where each weight ρ_u (u = 1..5) is obtained empirically in the range 0..1 and represents how much each feature can help in the matching process. The values of the right-hand vector are normalized by their maximum acceptable values. The occlusion time (T_occlusion) represents the time since the last successful segment-object correspondence; when this time exceeds 2 seconds, the object is deleted from the list of tracked objects. The weight P(o_i, s_j) is calculated for each segment-object pair (i, j) not yet assigned. The process continues by making the correspondence of the segment-object pairs that own the maximum weight, which must be bigger than a given threshold. When the correspondence process is finished for both tables, the segments left without correspondence originate new objects.
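The following Python fragment sketches equations (4) and (5); the weight values ρ_u and the feature normalization are assumptions, since the paper states they are obtained empirically.

    import numpy as np

    def distance_table(objects, segments, delta0):
        """Equation (4): table of object-segment distances, zeroed beyond
        delta0. objects, segments: (m, 2) and (n, 2) arrays of centres."""
        d = np.linalg.norm(objects[:, None, :] - segments[None, :, :], axis=2)
        return np.where(d <= delta0, d, 0.0)

    def match_weight(features, rho=(0.4, 0.2, 0.2, 0.1, 0.1)):
        """Equation (5): weighted sum of the normalized features
        (distance, dimension, orientation, occlusion time, life time).
        The rho values here are assumed, not taken from the paper."""
        return float(np.dot(rho, features))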
B. Geometrical approximation
The geometrical approximation provides a reduction of the amount of data and of the computational time, without loss of the main characteristics of the object's shape. In this work, we decided to approximate objects by circles when the size of the object is below a threshold, and by rectangles when the size is above that threshold. This kind of approximation is very effective for most object types and ensures worst-case behaviour in the impact-time computation module.
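As an illustrative sketch (Python), with an assumed size threshold and axis-aligned shapes for simplicity:

    import math

    def approximate_shape(points, size_threshold=0.8):
        """Approximate an object's points by a circle (small objects) or a
        bounding rectangle (large objects). size_threshold [m] is assumed."""
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        width, height = max(xs) - min(xs), max(ys) - min(ys)
        size = math.hypot(width, height)
        centre = ((max(xs) + min(xs)) / 2.0, (max(ys) + min(ys)) / 2.0)
        if size < size_threshold:
            return ('circle', centre, size / 2.0)          # centre, radius
        return ('rectangle', centre, width, height)        # axis-aligned box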
Fig. 5. Graphical representation of the voting function V(ν).

C. Classification
Ideally, the object classification should assign a trustworthy classification to every object at each scan. This is not trivial using laser range data, since the shape of the same object, for instance a car, can look like a small line, two perpendicular lines, or even another set of geometric features. The way to overcome this problem is to take into account as many features as possible from previous object detections, in particular dimensions and dynamics. Since we cannot classify an object with high confidence immediately at the first scan in which it appears, our classification method is based on a voting scheme that keeps every hypothesis over time, until a high classification confidence is reached [14]. Each feature that characterizes the object represents a voting actor, where the weight of the vote depends on the influence of the related feature in characterizing the object, and on the value ν of that feature (equation (6), shown graphically in Fig. 5):

    V(ν) = { V_0           if ν ≤ L_0
             m_1·ν + b_1   if L_0 < ν < L_1
             V_1           if L_1 ≤ ν ≤ L_2
             m_2·ν + b_2   if L_2 < ν < L_3
             V_2           if ν ≥ L_3 }        with ν ∈ R+    (6)
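A direct transcription of equation (6) in Python is given below; the slopes m_1, b_1, m_2, b_2 are derived here from the plateau values and breakpoints, assuming the ramps join the plateaus as in Fig. 5.

    def vote(v, V0, V1, V2, L0, L1, L2, L3):
        """Piecewise-linear voting function of equation (6). The breakpoints
        L0..L3 and plateau values V0..V2 are per-feature parameters."""
        if v <= L0:
            return V0
        if v < L1:                          # linear ramp from V0 to V1
            m1 = (V1 - V0) / (L1 - L0)
            return m1 * (v - L0) + V0
        if v <= L2:
            return V1
        if v < L3:                          # linear ramp from V1 to V2
            m2 = (V2 - V1) / (L3 - L2)
            return m2 * (v - L2) + V1
        return V2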
The confidence level is achieved by adding the votes of every actor; when one of the hypotheses reaches a sufficient value, we assume that the object is classified as the type of that hypothesis.

V. OBJECT TRACKING AND TIME-TO-COLLISION
In this work, object tracking is performed by a Kalman filter, assuming an object model with constant velocity and white-noise acceleration [15], considering a different maximum acceleration for each object type.
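As a sketch, the constant-velocity model with white-noise acceleration could be set up as follows (Python/NumPy); the measurement-noise value is an assumption, and the process noise is scaled by the per-type maximum acceleration as described above.

    import numpy as np

    def cv_kalman_matrices(dt, max_accel, meas_std=0.05):
        """Constant-velocity model with white-noise acceleration [15].
        State x = [px, py, vx, vy]; measurement z = [px, py].
        max_accel [m/s^2] depends on the object type (pedestrian, car, ...);
        meas_std is an assumed laserscanner noise value."""
        F = np.array([[1, 0, dt, 0],
                      [0, 1, 0, dt],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=float)
        H = np.array([[1, 0, 0, 0],
                      [0, 1, 0, 0]], dtype=float)
        # Acceleration enters through G; covariance scaled per object type
        G = np.array([[0.5 * dt**2, 0], [0, 0.5 * dt**2], [dt, 0], [0, dt]])
        Q = (max_accel ** 2) * G @ G.T
        R = (meas_std ** 2) * np.eye(2)
        return F, H, Q, R

    def kalman_step(x, P, z, F, H, Q, R):
        """One predict-update cycle for a tracked object."""
        x, P = F @ x, F @ P @ F.T + Q                  # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
        x = x + K @ (z - H @ x)                        # update with measurement
        P = (np.eye(4) - K @ H) @ P
        return x, P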
Fig. 6. Geometrical method of collision computation (vector V represents the obstacle velocity relative to the vehicle; edge e will be the impact point).
Fig. 7. Snapshot captured from the experiment of two YME vehicles moving in opposite directions on a regular road, with several pedestrians and stopped cars.
The time-to-collision computation module uses the results of the object tracking system to estimate the time-to-collision and the impact point on the vehicle, for each tracked object. The method consists of projecting all possible impact points in the direction of the object velocity, assuming at each instant a constant object velocity relative to the vehicle. As can be seen in Fig. 6, these points are the edges of the car (d, e, f) and of the object (a, b, c). From the projection lines starting on the object, defined by the starting points and the velocity vector, we select the shortest line that intersects a line segment of the boundary of the vehicle. Applying the same method to the projection lines that start from the vehicle, we finish the process and obtain the shortest collision distance. With the knowledge of the object velocity and the shortest distance, we can easily determine the time-to-collision.
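A sketch of this edge-projection test follows (Python); the ray-segment intersection helper is a standard construction and not code from the paper, and edges are assumed to be given as lists of (x, y) tuples.

    import math

    def ray_segment_distance(p, v, a, b):
        """Distance along ray p + t*v (t >= 0) to segment ab, or None."""
        rx, ry = v
        ex, ey = b[0] - a[0], b[1] - a[1]
        denom = rx * ey - ry * ex
        if abs(denom) < 1e-12:                  # ray parallel to the segment
            return None
        t = ((a[0] - p[0]) * ey - (a[1] - p[1]) * ex) / denom
        u = ((a[0] - p[0]) * ry - (a[1] - p[1]) * rx) / denom
        if t >= 0.0 and 0.0 <= u <= 1.0:
            return t * math.hypot(rx, ry)       # metric distance to impact
        return None

    def time_to_collision(obj_edges, veh_edges, v_rel):
        """Project object edges along v_rel onto the vehicle boundary and
        vice versa (Fig. 6); return the worst-case time-to-collision."""
        speed = math.hypot(*v_rel)
        if speed < 1e-9:
            return math.inf
        v_back = (-v_rel[0], -v_rel[1])         # vehicle edges projected back
        dists = []
        for p in obj_edges:
            for a, b in zip(veh_edges, veh_edges[1:] + veh_edges[:1]):
                d = ray_segment_distance(p, v_rel, a, b)
                if d is not None:
                    dists.append(d)
        for p in veh_edges:
            for a, b in zip(obj_edges, obj_edges[1:] + obj_edges[:1]):
                d = ray_segment_distance(p, v_back, a, b)
                if d is not None:
                    dists.append(d)
        return min(dists) / speed if dists else math.inf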
-2000 -8000
-6000
-4000
-2000
0
2000
4000
6000
8000
Fig. 8. Results of the ACS: a) without taking into account the specific situation, a false collision is reported, identified in the figure by the star symbol; b) taking into account the specific situation, the YME vehicle is detected and tracked, as illustrated by the bounding box containing the set of segments detected from the laserscanner raw data.
VI. RESULTS
Two experiments are analyzed that illustrate the effectiveness of the algorithms: the first addresses the situation of two vehicles crossing while moving in opposite directions; the second illustrates the approach to a ramp.
Fig. 9. Snapshot captured from the experiment of the YME vehicle approaching a ramp.
A. Crossing of two vehicles moving in opposite directions
Figure 7 was captured from the experiment of two YME vehicles moving in opposite directions on a regular road, with several pedestrians and stopped cars in the surrounding area. Fig. 8.a) shows the result of the ACS algorithm without using situation-based information. Because of the particular shape of this vehicle, the YME vehicle is not identified as a single moving object, but as a set of unstable objects with erratic movements. Therefore, false collisions are often reported by the ACS, as in the case illustrated in Fig. 8.a). The same data was processed by the situation-based ACS. The results are shown in Fig. 8.b), in which the YME vehicle is detected properly and represented by a rectangular bounding box. The robustness of the Kalman-based object tracking was verified, even when many failures of YME vehicle observation from the laserscanner data occur, as shown in Fig. 11.a).
B. Approaching a ramp
Figure 9 was captured from the experiment of the vehicle approaching a ramp. Fig. 10.a) shows the result of the ACS algorithm without using situation-based information. The ramp is not identified as a single obstacle, but as a set of unstable obstacles with varying positions, preventing successful tracking. The same data was processed by the situation-based ACS. The results are shown in Fig. 10.b), in which the ramp is detected properly and represented in the figure by a thin rectangular bounding box. In this case, the estimated relative velocity between the obstacle and the vehicle causes a collision detection. The predicted impact does not affect the system, since the impact-time computation module is informed of the actual specific situation. The robustness of the Kalman-based object tracking was verified. As shown in Fig. 11.b), the result of the object classification (blue points) is filtered
Fig. 11. a) Crossing-of-two-vehicles experiment: red points represent the YME vehicle position estimated by the object tracking module; blue points represent the YME vehicle position provided by the object classification module. This result, with many failures of YME vehicle observation from the laserscanner data, is obtained considering that the situation-based module does not make use of information from the objects repository database. b) Approaching-a-ramp experiment: red points represent the ramp position estimated by the object tracking module; blue points represent the ramp position (intersection line between the ramp and the laserscanner plane) provided by the object classification module.
Fig. 10. Results of the ACS: a) without taking into account the specific situation, the ramp is seen as a set of different objects; b) taking into account the specific situation, the ramp is detected.
by the Kalman filter, resulting in the estimated position of the line defined by the intersection of the ramp and the laserscanner plane, represented by the red points. As we can observe, while the vehicle approaches the ramp, it is perceived as a static obstacle. Afterwards, as expected, no obstacle is detected any more as soon as the vehicle begins to climb the ramp.

VII. CONCLUSION
A situation-based ACS integrating time-to-collision computation was described in this paper. The system proved to be effective at tracking multiple objects over time, yielding good velocity estimates in the different environments that show up in Cybercars scenarios. Using the velocity estimates, reliable time-to-collision computations are achieved. The tracking process can be improved by using other process models in the Kalman algorithm, e.g. kinematic models for the vehicles, rather than always using the same process model (the constant-velocity model) for all objects.
ACKNOWLEDGMENT
This work was partially supported by the EC project CyberCars (www.cybercars.org) and by FCT (Portuguese Science and Technology Foundation) project POSI/SRI/41618/2001.
REFERENCES
[1] CyberCars Project, www.cybercars.org.
[2] BMW, "ACC - Active Cruise Control," BMW AG, Munich, Germany, Document de travail pour séminaire, 2000.
[3] A. Valejo, T. Meisner, J. Dias, and U. Nunes, "Cybernetic transport systems in Coimbra: Evaluation and demonstration for CyberMove project," 2004 European Ele-Drive Transportation Conference and Exhibition on Urban Sustainable Mobility is Possible Now!, Portugal, March 2004.
[4] CyberMove Project, www.cybermove.org.
[5] D. Gavrila, J. Giebel, and S. Munder, "Vision-based pedestrian detection: The PROTECTOR system," IEEE Intelligent Vehicles Symposium, Parma, Italy, June 2004.
[6] C. Hoffmann, T. Dang, and C. Stiller, "Vehicle detection fusing 2D visual features," IEEE Intelligent Vehicles Symposium, Parma, Italy, June 2004.
[7] A. J. Lipton, H. Fugiyoshi, and R. S. Patil, "Moving target classification and tracking from real-time video," IEEE Image Understanding Workshop, pp. 129-136, 1998.
[8] K. C. J. Dietmayer, J. Sparbert, and D. Streller, "Model based object classification and object tracking in traffic scenes," IEEE Intelligent Vehicles Symposium, Tokyo, Japan, pp. 25-30, 2001.
[9] D. Streller and K. Dietmayer, "Object tracking and classification using a multiple hypothesis approach," IEEE Intelligent Vehicles Symposium, Parma, Italy, June 2004.
[10] K. C. Fuerstenberg and K. Dietmayer, "Object tracking and classification for multiple active safety and comfort applications using a multilayer laserscanner," IEEE Intelligent Vehicles Symposium, Parma, Italy, June 2004.
[11] L. C. Bento, U. Nunes, A. Mendes, and M. Parent, "Path-tracking of a bi-steerable cybernetic car using fuzzy logic," Int. Conf. on Advanced Robotics, Coimbra, vol. 3, pp. 1556-1561, July 2003.
[12] T. Einsele, "Localization in indoor environments using a panoramic laser range finder," Ph.D. dissertation, Technical University of München, September 2001.
[13] MathPages, "Perpendicular regression of a line," http://mathpages.com/home/kmath110.htm.
[14] T. Deselaers, D. Keysers, R. Paredes, E. Vidal, and H. Ney, "Local representations for multi-object recognition," DAGM 2003, Pattern Recognition, 25th DAGM Symposium, pp. 305-312, September 2003.
[15] M. Kohler, "Using the Kalman filter to track human interactive motion - modelling and initialization of the Kalman filter for translational motion," Informatik VII, University of Dortmund, TR 629, 1997.