12th International Conference on Information Fusion Seattle, WA, USA, July 6-9, 2009
Non-parametric Laser and Video Data Fusion: Application to Pedestrian Detection in Urban Environment S. Gidel, C. Blanc, T. Chateau, P. Checchin and L. Trassoudaine LASMEA - UMR 6602 CNRS Blaise Pascal University Aubière, France
[email protected]

Abstract – In urban environments, pedestrian detection is a challenging task for automotive research, where algorithms suffer from a lack of reliability due to many false detections. This paper presents a multisensor fusion method based on a stochastic recursive Bayesian framework, also called a particle filter, which fuses information from laser and video sensors to improve the performance of a pedestrian detection system. The main contributions of this paper are, first, the use of a non-parametric data association method in order to better approximate the discrete distribution and, second, the modeling of the likelihood function with a mixture of Gaussian and uniform distributions in order to take into account all the available information. Simulation results as well as results of experiments conducted on real data demonstrate the effectiveness of the proposed approach.

Keywords: Particle filters, kernel density estimation, laserscanner, video camera, sensor fusion, likelihood computation.
1 Introduction

Currently, more than 8,000 vulnerable road users, pedestrians and cyclists, are killed every year in the European Union. Accident statistics indicate that, despite recent advances due to the introduction of passive safety systems and tighter pedestrian legislation, pedestrian accidents still represent the second largest source of traffic-related injuries and fatalities [1]. Pedestrian detection thus becomes an essential functionality for future vehicles. This issue arises in the context of the LOVe Project (Software for vulnerables observation), which aims at improving road safety, mainly focusing on pedestrian security [2]. For a broad review of the various sensors used for pedestrian detection, one can consult [3] and [4], where piezoelectric, radar, ultrasound and laser range scanner sensors, as well as cameras operating in the visible or in the infrared, are described.

Using video sensors to solve detection and identification problems seems natural at first, given the capacity of this type of sensor to detect and analyze the size, shape and texture of a pedestrian. Many methods to detect human beings have been developed in computer vision based on monocular or stereoscopic images [6]. However, the strong sensitivity to atmospheric conditions, the wide variability of human appearance, the limited aperture of this sensor and the impossibility of obtaining direct and accurate depth information have given rise to an interest in detection methods based on an active sensor such as a radar or a laser. The ability of laser-based pedestrian detection systems to count and track has been proved, even in the case of a very high-density crowd [7, 8]. However, the obvious limitations of this sensor (no information about the shape, contour, texture or color of objects), its sensitivity to atmospheric conditions such as rain and fog, and the frequent occlusions between objects call for a laser/camera fusion method to improve a pedestrian collision avoidance system. A study of sensor-based pedestrian detection, presented in [5], indicates that the cooperation of the laser scanner with cameras appears to be a good solution to develop. So, the problem is how to combine the diverse and sometimes conflicting pieces of information in the best manner, to outperform the best results expected from a single sensor technology.

The main difficulty of data fusion lies in the association of the new observations coming from different sensors. Thus, two distinct problems have to be jointly solved: data association and estimation. The conventional approaches are based on the linear Kalman filter and lead to data association methods such as the JPDAF (Joint Probabilistic Data Association Filter) or the MHT (Multiple Hypothesis Tracker), which differ in their association techniques but share the same Gaussian assumptions [9]. Unfortunately, pedestrian tracking does not fit linear motion and Gaussian noise models. Under such conditions (stochastic state equation and non-Gaussian noises), particle filters are particularly appropriate [10].

978-0-9824438-0-4 ©2009 ISIF
In this paper, a novel laser/camera based system is presented, which aims at reliable, real-time monitoring and tracking of multiple people in an urban environment. This article is organized as follows. In Section 2, the approach in the LOVe project framework is explained; then the system and the sensors used by the manufacturer Renault are described. Section 3 presents a non-parametric approach in a multisensor fusion framework. In Section 4, simulation and experimental results are presented to demonstrate the effectiveness of the proposed approach. Finally, the conclusion is presented in Section 5.
2 Overview

2.1 Our approach

The multisensor outdoor pedestrian detection system has been designed to fit the technical specifications defined within the LOVe Project [2]. The purpose is to develop safe and reliable software for the observation of "vulnerables". However, this software must allow fast industrial exploitation (validated, standardized and portable software modules). In this context of standardization, the LOVe project specifications provide a list of common inputs and outputs for all the software blocks. In this output list, the detected pedestrian set is defined by Z_k = (z_k^1, ..., z_k^{N_z}), with N_z the number of observations at instant k, and is associated with "Detection Overall Rates" (DOR) defined by Γ_k = (γ_k^1, ..., γ_k^{N_z}), with γ_k^j ∈ ]0, 1] assessing the reliability of each detection. In order to track pedestrians in a multisensor framework, a Sequential Importance Resampling Particle Filter (SIR PF) is proposed. It is based on a non-parametric approach using the particle set and all the DORs to compute probabilities for each data association. Moreover, these DORs can be included in the likelihood function together with the traditional uncertainty associated with each detection; hence a mixture of Gaussian and uniform distributions is proposed. The problem is how to combine, in the best manner, the diverse and sometimes conflicting information provided by the two sensors in order to obtain better results than with only one sensor.
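As an illustration of the interface just described, the detection set Z_k with its DORs can be sketched as a small data structure. The names and fields below are hypothetical; the paper does not publish the actual LOVe types.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Detection:
    """One detection z_k^j with its Detection Overall Rate gamma_k^j."""
    x: float    # lateral position [m]
    y: float    # longitudinal position [m]
    dor: float  # Detection Overall Rate, must lie in ]0, 1]

    def __post_init__(self):
        if not (0.0 < self.dor <= 1.0):
            raise ValueError("DOR must lie in ]0, 1]")

@dataclass
class DetectionSet:
    """The set Z_k = (z_k^1, ..., z_k^{N_z}) at time index k."""
    k: int
    detections: List[Detection] = field(default_factory=list)

z_k = DetectionSet(k=0, detections=[Detection(0.4, 12.0, 0.95),
                                    Detection(-1.1, 8.5, 0.55)])
```

Keeping the DOR attached to each detection is what later allows the likelihood to weight confident and hesitant detections differently.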
2.2 Vehicle description An IBEO laser sensor and two camera sensors equip the Renault test vehicle. The IBEO ALASCA XT is mounted in the center of the frontal area of the vehicle and two SMAL video cameras are on top of the car for simultaneous recording of the scene (see Figure 1).
Figure 1: Location of sensors in the Renault test vehicle.

3 Sensor Data Fusion

The task of sensor data fusion in automotive applications uses multiple sensors to constitute an all-around detection system and to overcome the deficiencies of any individual sensing device. Much work has been carried out to combine redundant and diverse measurement data from several sensors. In the particular case of pedestrian classification, several approaches have been proposed. Four main fusion architectures are identified in the literature:

• serial fusion: the laserscanner segments the scene and provides Regions Of Interest (ROIs), which are confirmed to match pedestrians by means of a vision-based classifier [11];

• centralized fusion: the measurements from the various sensors are merged (associated and tracked) in a single central block [12];

• decentralized fusion: each sensor system detects, classifies, identifies and tracks the potential pedestrians before the results are merged in a track-to-track fusion block [13];

• hybrid fusion: the available information includes both unprocessed data from one sensor and processed data from the other one [14].

In this paper, a centralized fusion architecture is chosen.

3.1 System Architecture

The pedestrian detection system architecture is shown in Figure 2. Pedestrian detection is performed in the laserscanner [8] and in the video image frames [15]. A centralized fusion module is developed whose main contributions are:

• a non-parametric data association,
• a mixture of Gaussian and uniform distributions for likelihood computation,
• the computation of a fusion confidence factor.

3.2 SIR PF

In this section, the theory of sequential Monte Carlo methods in the framework of multiple object tracking is briefly reviewed. For more details, the reader can refer to Doucet's work [10]. Let us consider a discrete dynamic system:
X_k = f(X_{k-1}) + W_k    (1)

Z_k = h(X_k) + V_k    (2)
where X_k represents the state vector at instant k. No assumption is made about the two functions f and h, whereas V_k and W_k are supposed to be two independent white noise processes. Particle filters provide an approximate Bayesian solution to discrete-time recursive problems by updating a description of the posterior filtering density p(x_k | z_{1:k}). This a posteriori belief represents the state in which the objects are. The prior distribution of the recursive Bayesian filter, p(x_k | z_{1:k-1}), is approximated by a set of N samples:

p(x_k | z_{1:k-1}) = (1/N) Σ_{i=1}^{N} δ(x_k − x_k^i)    (3)

where δ is the Dirac delta function. The posterior distribution p(x_k | z_{1:k}) can then be estimated by:

p(x_k | z_{1:k}) = p(z_k | x_k) Σ_{i=1}^{N} p(x_k | x_{k-1}^i)    (4)

This approach can be implemented by a bootstrap filter or a Sampling Importance Resampling (SIR) filter.

Figure 2: Multi-module architecture using lidar and vision information for pedestrian detection and classification. In red, the main contributions proposed in this paper. (The diagram shows the video source and the laserscanner each feeding a pedestrian detection and a pedestrian tracking module; their outputs, formatted as input vectors according to the LOVe specifications, pass through the association, filter update (likelihood computation) and data fusion stages, producing the tracks and the confidence factor in data fusion.)

3.3 Non-parametric data association

3.3.1 Introduction

Non-parametric methods make it possible to take into account the samples and their distribution in the parameter space. Let N_x be the number of targets to track; this number is unknown at instant k. In this paper, multi-target tracking consists in estimating the state vector obtained by concatenating the vectors of all N_x targets. The vector X_k = {x_k^{1,l}, ..., x_k^{N_x,l}}_{l=1}^{N} is given by the state equation (1), decomposed into N_x equations:

X_k = F_k(X_{k-1}, W_k)    (5)

where N is the number of particles and the noises W_k are supposed to be spatially and temporally white. The observation vector collected at time k is denoted by Z_k = (z_k^1, ..., z_k^{N_z}), with N_z the number of observations, deduced from the process:

Z_k = H_k(X_k, V_k)    (6)

Once again, the noises V_k are supposed to be independent white noises. The association matrix A_k is introduced to describe the association between the measurements Z_k and the targets X_k. A non-parametric framework is chosen to estimate A_k. Two techniques make it possible to generate a succession of areas which satisfy good estimation conditions:

1. fixing the volume of the area as a function of the n samples, for example V_n = 1/√n: this is the "Parzen window" method;

2. adapting the size of the areas with the number of samples k_n fixed according to n, for example k_n = √n: this is the K nearest neighbors method.

In this paper, the "Parzen window" method is chosen in order to exploit Kernel Density Estimation (KDE), which allows the data to be extrapolated to the entire population. Finally, a Bernoulli variable w_h ∈ {w_1, w_2} is defined, with w_h = w_1 if the associated event is classified as fused data and w_h = w_2 in all other cases.

3.3.2 Parzen association for particle filter

An approach is proposed to build a non-parametric model based on kernel functions, allowing a smart selection of the most pertinent data fusion from a likelihood analysis. This method is unsupervised, so no prior knowledge is required to process the data fusion. The likelihood p(z_k^j, x_k^i | w_h) represents the probability that a 2D particle belongs to the fused data. It is modeled on a Parzen window which evaluates the distance between an observation z_k^j located in the image and all its neighbors x_k^{i,l}:
p(z_k^j, x_k^i | w_h) = (1/N) Σ_{l=1}^{N} φ(z_k^j, x_k^{i,l})    (7)

where φ(z_k^j, x_k^{i,l}) is the kernel function, which modifies the influence zone of an observation with respect to its neighbors. A mixture of Gaussian and uniform distributions is used in order to merge into a single distribution all the information available from the mono-sensor algorithm outputs. φ(z_k^j, x_k^{i,l}) is given by:

φ(z_k^j, x_k^{i,l}) = (1 − γ_k^j) · U(−1/γ_k^j, 1/γ_k^j) + γ_k^j · exp[−λ_c · d_c(z_k^j, x_k^{i,l})]    (8)

The parameter λ_c adjusts the weights. The generalized squared interpoint distance d_c is defined by:

d_c(z_k^j, x_k^{i,l}) = (z_k^j − x_k^{i,l})^T Σ_φ^{-1} (z_k^j − x_k^{i,l})    (9)

with the covariance matrix Σ_φ the sum of the covariance matrix Σ_SP2 given by the tracking algorithm (representing the variance on the pedestrian position) and the measurement noise covariance matrix R:

Σ_φ = Σ_SP2 + R    (10)
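The kernel of equations (8)–(9) can be sketched numerically. The sketch below treats the uniform part as the constant floor (1 − γ)·γ/2 (the density of U(−1/γ, 1/γ)), and the values of λ_c and Σ_φ are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

def d_c(z, x, sigma_inv):
    """Generalized squared interpoint (Mahalanobis) distance, eq. (9)."""
    diff = np.asarray(z, dtype=float) - np.asarray(x, dtype=float)
    return float(diff @ sigma_inv @ diff)

def phi(z, x, gamma, sigma_inv, lambda_c=1.0):
    """Mixture kernel of eq. (8): uniform floor + confidence-weighted
    exponential of the distance. gamma is the detection's DOR."""
    # Uniform part: density gamma/2 on ]-1/gamma, 1/gamma[, weighted by
    # (1 - gamma) -- the lower the confidence, the heavier the flat tail.
    uniform_part = (1.0 - gamma) * (gamma / 2.0)
    # Gaussian-type part, weighted by the confidence gamma.
    gauss_part = gamma * np.exp(-lambda_c * d_c(z, x, sigma_inv))
    return uniform_part + gauss_part

# Assumed covariance Sigma_phi = Sigma_SP2 + R, here isotropic with
# sigma = 0.15 m as in the Figure 3 example.
sigma_inv = np.linalg.inv(np.eye(2) * 0.15 ** 2)
near = phi([0.0, 0.0], [0.05, 0.0], gamma=0.95, sigma_inv=sigma_inv)
far  = phi([0.0, 0.0], [1.00, 0.0], gamma=0.95, sigma_inv=sigma_inv)
print(near > far)  # closer neighbors score higher
```

Note how a distant neighbor never scores below the uniform floor: this is what keeps unlikely hypotheses alive when the classifier is hesitant.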
The 2D particle x_k^{i,l} ∈ w_1 having the highest probability is chosen by the maximum likelihood estimator:

A_k^{i,j} = arg max_{(i,j)} p(z_k^j | x_k^i, w_1)    (11)

3.3.3 Likelihood computation

The SIR PF approximates the filtering distribution p(x_k | z_{1:k}) by a weighted set of N particles. The data association above being known, the next step consists in computing the weights of all the particles belonging to the associated gravity center given by (11). The weight list L_k^{i,j} is calculated from a mixture of Gaussian and uniform distributions (see Figure 3) in order to keep all the information used during the data association step. L_k^{i,j} is defined as follows:

L_k^{i,j} = {φ(z_k^j, x_k^{i,l})}    (12)

Finally, the weights are normalized before the resampling stage. The algorithm is summarized in Algorithm 1.

Figure 3: An example of a mixture of Gaussian and uniform distributions, with the uniform distribution in blue, the Gaussian distribution in green and the mixture in red. On the left, σ = 0.15 m and DOR = 0.95; on the right, σ = 0.15 m and DOR = 0.55.

3.4 Computation of the confidence factor in data fusion

All currently tracked objects are tested to determine whether or not they result from the fusion of laser and video data. For this purpose, each target is evaluated by computing its Confidence Factor in Data Fusion (CFDF). Three criteria constitute the CFDF: the Confidence in the Age of Track (CAT), the Detection Overall Rate (DOR) and the Sensor Fusion Rate (SFR). The CAT evaluates whether the pedestrian target has been tracked for a long time or not. The DOR, provided by the mono-sensor algorithms, is the confidence that the detected object is actually a human. The SFR evaluates whether or not the track results from the fusion of laser and video pedestrian measurements. The CAT and the SFR are computed from a Gaussian distribution:

CAT(t) = (1/(σ_o √(2π))) · exp[−(1/2)((t − µ_o)/σ_o)^2] for 0 < t ≤ µ_o, and CAT(t) = 1 for t > µ_o    (13)

where µ_o represents the minimum lifetime of a track without observation, σ_o controls how quickly the CAT decreases and t is the age of the track.

SFR(x) = (1/(σ_f √(2π))) · exp[−(1/2)((x − µ_f)/σ_f)^2]    (14)

where µ_f represents the theoretical ratio between the number of laser data and the number of video data, σ_f controls how quickly the sensor fusion rate decreases and x is the observed ratio between the number of laser data and the number of video data. Finally, the result is given by:

CFDF = (CAT + SFR + DOR) / 3    (15)
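The three criteria and their average, equations (13)–(15), can be sketched directly. The paper leaves µ_o, σ_o, µ_f and σ_f free; the sketch assumes σ = 1/√(2π) so that each Gaussian term peaks at exactly 1, together with illustrative values of µ_o and µ_f.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def cat(t, mu_o=1.0, sigma_o=1.0 / SQRT_2PI):
    """Confidence in the Age of Track, eq. (13): Gaussian ramp for
    0 < t <= mu_o, then constant 1 for older tracks."""
    if t > mu_o:
        return 1.0
    return (1.0 / (sigma_o * SQRT_2PI)) * \
        math.exp(-0.5 * ((t - mu_o) / sigma_o) ** 2)

def sfr(x, mu_f=1.0, sigma_f=1.0 / SQRT_2PI):
    """Sensor Fusion Rate, eq. (14): penalizes a laser/video data ratio x
    far from the theoretical ratio mu_f."""
    return (1.0 / (sigma_f * SQRT_2PI)) * \
        math.exp(-0.5 * ((x - mu_f) / sigma_f) ** 2)

def cfdf(t, x, dor):
    """Confidence Factor in Data Fusion, eq. (15)."""
    return (cat(t) + sfr(x) + dor) / 3.0

# An old, well-fused track seen by a confident detector outranks a
# fresh, barely-fused track seen by a hesitant one:
old_fused = cfdf(t=2.0, x=1.0, dor=0.95)
fresh_unfused = cfdf(t=0.2, x=0.1, dor=0.55)
print(old_fused > fresh_unfused)
```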
4 Experiments

This section presents simulations and experiments that validate the laser and video data fusion algorithm.
4.1 Simulations

The goal of these simulations is to show how the data fusion algorithm improves the pedestrian tracking system. First, a study of the data association algorithm is proposed. In Figure 4, a scenario with two pedestrians detected from laser and video data was generated. The relevance of a data association based on kernel density estimation is demonstrated here when several measurements from different sensors can be associated with the same track. Second, a study of the likelihood computation is proposed. In Figure 5, a cloud of random particles (red points) was generated. The particles (blue stars) representing the cloud centers were selected as measurements. According to the likelihood (see Figure 3), the weights are computed (green points). With the same position uncertainty, the results differ according to the Detection Overall Rate. This last point is important because, when the pedestrian classification is not reliable, more hypotheses (particles) should be kept in order to correct a possible error in the state estimator.

Algorithm 1: Non-parametric Data Fusion

1. Set k = 0; generate N samples from each measurement j = 1, ..., N_z, i.e. {x_0^{j,l}}_{l=1}^{N} = {x_0^{1,l}, ..., x_0^{N_z,l}}_{l=1}^{N}, where x_0^{j,l} ~ p(X_0).

2. Compute the matrix A_k for all measurements and targets (N_z, N_x):
if A_k ≤ α, with α the gate threshold, then A_k^{i,j} = p(z_k^j, x_k^i | w_h), where p(z_k^j, x_k^i | w_h) is the association probability for hypothesis i using N particles according to equation (7);
else A_k^{i,j} = 0, set {x_k^{i,l}} = {x_{k-1}^{i,l}} and go to step 5.

3. Compute the weights w_k^{i,l} = L_k^{i,j} (12) and normalize: w_k^{i,l} = w_k^{i,l} / Σ_{l=1}^{N} w_k^l.

4. Generate a new set {x_k^{i,l*}}_{l=1}^{N} by resampling with replacement N times from {x_k^{i,l}}_{l=1}^{N}, where Pr(x_k^{i,l*} = x_k^{i,l}) = w_k^l.

5. Predict (simulate) new particles, i.e. x_{k+1}^{i,l} = f(x_k^{i,l*}, v_k), l = 1, ..., N, using different noise realizations for the particles. Compute for each estimate its CFDF (15).

6. Increase k and iterate from step 2.

Figure 4: Data association from several sensors (O1: position uncertainty ±5 cm, pedestrian confidence rate 0.55; O2: ±70 cm, 0.95; O3: ±55 cm, 0.89). Here, according to the nearest neighbor criterion, O1 would be associated to object 1 and O3 to object 2, whereas the correct association is given by the Parzen algorithm, i.e. O2 with object 1 and O3 with object 2, while O1 is in reality a pole.

Figure 5: Example of likelihood computation on the cloud of particles presented in Figure 3. On the left, σ = 0.15 m and DOR = 0.95; on the right, σ = 0.15 m and DOR = 0.55.
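The association behaviour simulated in Figure 4 can be sketched end-to-end: a Parzen likelihood in the style of equation (7) is averaged over each track's particle cloud, and each measurement is assigned to the maximizing track as in equation (11). The particle clouds, kernel bandwidth and gate threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def parzen_likelihood(z, particles, bandwidth=0.3):
    """Average of a Gaussian kernel over a track's particles (eq. (7) style)."""
    d2 = np.sum((particles - z) ** 2, axis=1) / bandwidth ** 2
    return float(np.mean(np.exp(-0.5 * d2)))

# Two tracks, each represented by a particle cloud around its state.
tracks = [rng.normal([0.0, 10.0], 0.2, (200, 2)),
          rng.normal([3.0, 12.0], 0.2, (200, 2))]
# Two measurements, one near each track.
measurements = [np.array([0.1, 10.1]), np.array([2.9, 11.8])]

gate = 1e-4  # assumed gate threshold: discard hopeless associations
assignment = {}
for j, z in enumerate(measurements):
    scores = [parzen_likelihood(z, p) for p in tracks]
    best = int(np.argmax(scores))
    if scores[best] > gate:
        assignment[j] = best
print(assignment)
```

Because the score averages over the whole particle cloud, a spread-out (uncertain) track can still win an association that a nearest-neighbor rule on the cloud center would miss, which is the point made with O1/O2 in Figure 4.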
4.2 Experiments on real data

Various results of real laser and video data fusion are presented. Lidar and camera data are not given in the same reference frame; the frame related to the lidar was chosen for fusion. The SIR PF with Parzen window association was tested in many different situations on real data provided by Renault, the French vehicle manufacturer. The presented scenarios (see Figures 8, 9 and 10) include several pedestrians (> 5) who appear and disappear in the sensor area, and cover different situations such as an urban scene, a semi-urban scene and a car park. The pedestrians move in all directions. The vehicle moves at a speed ranging from 0 to 50 km/h, which allows the robustness of the method to be tested. Usually, in a pedestrian classification framework, lidar measurements can generate false tracks [17]; in most cases these result from fixed objects typical of an urban environment, such as security barriers, poles or trees. For each iteration, the number of false detections is obtained by calculating the ratio:

rate_of_false_detection = (N_T − N_P) / N_T    (16)

with N_T the total number of detections and N_P the number of detected pedestrians. The pedestrian detection rate is given by the ratio:

rate_of_pedestrian_detection = N_P / N_P_VT    (17)

with N_P_VT the number of pedestrians actually present in the area observed by the sensors. Table 1 shows the advantage of data fusion in significantly decreasing the number of false detections compared with a single lidar or a single camera. It can also be noticed that the pedestrian detection rate is higher after data fusion. To conclude, a study illustrating the results of the CFDF module over time is proposed, where each track is associated with a different color (see Figures 6 and 7). This study is conducted on the multi-pedestrian scenario presented in Figure 9.
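The two evaluation ratios of equations (16) and (17) translate directly; the counts used in the example call are illustrative, not the paper's measurements.

```python
def rate_of_false_detection(n_total, n_pedestrians):
    """Eq. (16): share of detections that are not actual pedestrians."""
    return (n_total - n_pedestrians) / n_total

def rate_of_pedestrian_detection(n_detected, n_ground_truth):
    """Eq. (17): share of ground-truth pedestrians that were detected."""
    return n_detected / n_ground_truth

# Illustrative counts: 50 detections, 40 true pedestrians among them,
# 45 pedestrians actually present in the sensed area.
print(rate_of_false_detection(50, 40))       # 10 of 50 detections are false
print(rate_of_pedestrian_detection(40, 45))  # 40 of 45 pedestrians found
```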
Table 1: Rates of false and correct detections for the scenarios presented in the article, when only the lidar or only the camera is used, and after data fusion.

                            Lidar only   Camera only   After fusion
False detection rate        0.536        0.274         0.108
Pedestrian detection rate   0.916        0.702         0.928
Figure 6: Result of multi-pedestrian tracking on the X and Y positions over the iterations. Measurements (video and lidar) are represented by gray circles. Each track is represented by a different color.

Figure 7: Result of multi-pedestrian tracking with the CFDF over the iterations. Each CFDF is represented by a star of a different color, while each DOR is represented by a dot of a different color.

Figure 8: Detection example at an intersection after centralized fusion of lidar and video image data. The red dots represent the lidar detections and the blue rectangles the camera detections. The yellow rectangles are the results provided by the data fusion module.

Figure 9: Detection example in a car park after centralized fusion of lidar and video image data. The red dots represent the lidar detections and the blue rectangles the camera detections. The yellow rectangles are the results provided by the data fusion module. A correct pedestrian detection at a distance of up to 25 meters can also be noticed.
5 Conclusions

This paper presented a multisensor pedestrian detection system whose centralized fusion algorithm is applied in a Bayesian framework. In order to track pedestrians' random movements, which can include abrupt trajectory changes, a SIR PF was chosen. First, in order to take into account the unspecified character of the distribution of the particles predicted by the SIR filter, together with all the DORs given by the mono-sensor algorithms, a data association based on kernel density estimation was used. Second, a likelihood based on a mixture of Gaussian and uniform distributions made it possible to take into account more precisely all the available information related to the uncertainties of the laser and video measurements (uncertainty concerning both the pedestrians' position and classification). Experimental results on simulated data as well as on real data demonstrated the effectiveness of this approach. The next step is to test this system on more data sequences in order to characterize it in terms of false positive and correct detection rates.

Figure 10: Detection example at an intersection after centralized fusion of lidar and video image data. The red dots represent the lidar detections and the blue rectangles the camera detections. The yellow rectangles are the results provided by the data fusion module. Pedestrians in various orientations are detected.

References

[1] http://www.euractiv.com/en/health/road-safetypedestrians/article-117530

[2] http://love.univ-bpclermont.fr/

[3] T. Gandhi and M.M. Trivedi, "Pedestrian Collision Avoidance Systems: a Survey of Computer Vision based Recent Studies", in Proc. IEEE Intelligent Transportation Systems Conference, Sept. 2006, pp. 976-981.

[4] F. Bu and C-Y. Chan, "Pedestrian Detection in Transit Bus Application: Sensing Technologies and Safety Solutions", in Proc. IEEE Intelligent Vehicles Symposium (IV), Las Vegas, USA, June 2005.

[5] D.M. Gavrila, "Sensor-based Pedestrian Protection", IEEE Intelligent Systems, vol. 16, no. 6, pp. 77-81, 2001.

[6] F. Arnell, "Vision-Based Pedestrian Detection System for use in Smart Cars", Master's Thesis, Stockholm, Sweden, 2005.

[7] X. Shao, H. Zhao, K. Nakamura, K. Katabira, R. Shibasaki and Y. Nakagawa, "Detection and Tracking of Multiple Pedestrians by Using Laser Range Scanner", in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), San Diego, USA, 2007.

[8] S. Gidel, P. Checchin, C. Blanc, T. Chateau and L. Trassoudaine, "Pedestrian Detection Method using a Multilayer Laserscanner: Application in Urban Environment", in Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2008.

[9] Y. Bar-Shalom and X.R. Li, "Multitarget-Multisensor Tracking: Principles and Techniques", ISBN 0-9648312-0-0, 1995.

[10] A. Doucet, S. Godsill and C. Andrieu, "On Sequential Monte Carlo Sampling Methods for Bayesian Filtering", Statistics and Computing, vol. 10, no. 3, pp. 197-208, 2000.

[11] M. Szarvas, U. Sakai and J. Ogata, "Real-time Pedestrian Detection using Lidar and Convolutional Neural Network", in Proc. IEEE Intelligent Vehicles Symposium (IV), Tokyo, Japan, 2006.

[12] D.T. Linzmeier, M. Skutek, M. Mekhaiel and K.C.J. Dietmayer, "A Pedestrian Detection System based on Thermopile and Radar Sensor Data Fusion", in Proc. 8th International Conference on Information Fusion, Philadelphia, USA, 2005.

[13] C. Blanc, L. Trassoudaine and J. Gallice, "EKF and Particle Filter Track-to-Track Fusion: a Quantitative Comparison from Radar/Lidar Obstacle Tracks", in Proc. 8th International Conference on Information Fusion, Philadelphia, USA, 2005.

[14] G. Monteiro, C. Premebida, P. Peixoto and U. Nunes, "Tracking and Classification of Dynamic Obstacles Using Laser Range Finder and Vision", in IROS 2006 Workshop on Safe Navigation in Open and Dynamic Environments - Autonomous Systems versus Driving Assistance Systems, Beijing, China, 2006.

[15] L. Leyrit, T. Chateau, C. Tournayre and J.T. Lapreste, "Association of AdaBoost and Kernel Based Machine Learning Methods for Visual Pedestrian Recognition", in Proc. IEEE Intelligent Vehicles Symposium (IV), Eindhoven, The Netherlands, 2008.

[16] L. Trailovic and L.Y. Pao, "Position Error Modeling using Gaussian Mixture Distribution with Application to Comparison of Tracking Algorithms", in Proc. American Control Conference, Denver, USA, 2003.

[17] S. Gidel, P. Checchin, T. Chateau, C. Blanc and L. Trassoudaine, "Parzen Method for Fusion of Laserscanner Data: Application to Pedestrian Detection", in Proc. IEEE Intelligent Vehicles Symposium (IV), Eindhoven, The Netherlands, 2008.