Proposition of Generic Validation Criteria using Stereo-Vision for On-Road Obstacle Detection

Mathias Perrollaz∗, Raphael Labayrade†, Alain Lambert‡, Dominique Gruyer‡, Didier Aubert§

Keywords: Stereo-vision, obstacle detection, sensor fusion, intelligent vehicles

Abstract

Real-time obstacle detection is an essential function for future Advanced Driver Assistance Systems (ADAS), but its application to driving safety requires very high reliability: the detection rate must be high, while the false detection rate must remain extremely low. Such features seem antinomic for obstacle detection systems, especially when using a single sensor. Multi-sensor fusion is often considered as a means to overcome this limitation. In this paper, we propose to use stereo-vision as a post-process to improve the reliability of any obstacle detection system, by reducing the number of false positives. Our algorithm, which is both generic and real-time, confirms detections by locally using the stereoscopic data. We evaluated and validated our approach with an initial detection based on a vision system and a laser scanner. The evaluation dataset consists of real on-road data and contains more than 20000 images.

∗ Inria Grenoble Rhône-Alpes, France ([email protected])
† National School of Civil Engineering (ENTPE), France
‡ Laboratoire sur les Interactions Véhicule-Infrastructure-Conducteur (LIVIC), IFSTTAR, France
§ Laboratoire Exploitation, Perception, Simulateurs et Simulations (LEPSIS), IFSTTAR, France
1 Introduction
Obstacle detection is a major topic for the development of Advanced Driver Assistance Systems (ADAS), particularly for the implementation of driving functions such as collision mitigation, collision avoidance, pre-crash or Automatic Cruise Control. Some systems dedicated to specific situations are already deployed in commercial vehicles. For instance, the City Safety system proposed by Volvo [1] is specialized in pre-crash at low speed (< 30 km/h). Other car manufacturers also work on ADAS with specific applications [2]. Increasing the number of ADAS applications of a given detection system requires increasing its reliability (in terms of false positives), its perception range and its genericity. The perception range can be addressed by using extended and cooperative perception [3]. With such systems, the vehicle has the capability to anticipate and assess a risky situation hundreds of meters in advance [4]. With this gain of time, the embedded system can propose a safe alternative and a set of maneuvers in order to decrease the risk level. Such work has been done in several projects such as Have-it [5], CVIS, Safespot [6] [7] and CooPerCom [8] [9]. In any case, for both local and extended perception, it is necessary to perform reliable and robust obstacle detection. Several approaches have been proposed, depending on the sensor involved: range finders like radar [10] or laser scanner [11] [12], or vision systems. In this specific field, monocular vision generally supposes the recognition of specific objects, like vehicles or pedestrians. More generic approaches (e.g. based on optical flow) have also been proposed, but stereo-vision seems more suitable for generic obstacle detection [13] [14] [15] [16], because it provides a three-dimensional representation of the scene. Real-time performance can be a critical issue, however, and even if it is now possible to compute dense disparity maps in real time [17] [18], a trade-off with precision is still necessary. Due to the high complexity of road scenes, no system can currently reach an ideal 100% detection rate with no false positives, whatever the sensor used. Multi-sensor approaches
can lead to a significant enhancement of performance. Among the existing methods for sensor fusion, cooperative fusion is very useful for heterogeneous data: the idea is to use one sensor to confirm the existence of detections provided by another. Such approaches are generally designed for a specific kind of obstacle, and rely on specific models or features of the class of objects to detect. They can also be seen as a classification task. The particular case of pedestrian recognition has been widely explored, and many methods have been proposed, using features like Histograms of Oriented Gradients (HOG) [19] or Haar wavelet-based cascades [20]. Boosting algorithms may be used in conjunction to build strong classifiers from sets of weak classifiers. The reader interested in pedestrian recognition can find a survey in [21]. Generally, these detection algorithms are based on a sliding window approach to scan the image, but several authors have proposed to use a range finder or an image processing algorithm to extract ROIs and apply a classifier only in these regions [22] [23]. Similar classification algorithms have also been proposed for vehicle detection, using learning methods [24] or based on more intuitive criteria, like vertical symmetry [25]. Only a few methods, based on stereo-vision, intend to perform actual generic validation of the detections. In [26], the authors classify pixels of a region of interest into foreground/background, and compare the 3D position of foreground pixels with the existing track. In [27] and [28], the authors propose criteria based on the u-v-disparity representations [14], which provide a more convenient way to use stereoscopic data. This paper builds on previous work [29] and proposes an approach using stereo-vision for rejecting false detections from a generic obstacle detection system. The objective is a strong decrease of the false positive rate, while keeping the detection rate high. The originality of our approach is the proposal, use and validation of generic criteria which do not rely on specific models of the objects to detect, but on very simple hypotheses. The resulting fusion methodology is both very efficient and generic with respect to the type of observed objects. It is also designed to be used in real time even with limited computational capacity, thanks to a local zoom capability.
[Figure 1: flowchart — targets from the initial obstacle detection sensor are converted into volumes of interest; a numerical zoom is optionally applied to the left and right ROIs; a local disparity map is computed and obstacle pixels are counted, then the prevailing alignment (local v-disparity) and bottom height criteria are checked in turn, each failed test rejecting the target; targets passing all three tests are validated.]
Figure 1: Detailed architecture of our approach, using stereo-vision as confirmation sensor.

An experimental evaluation of the approach is proposed, showing promising performance. The paper is organized as follows. Section 2 is dedicated to the description of the methodology. The confirmation step is further described in Section 3. Section 4 deals with computation time issues. Section 5 presents two implementations of our approach, considering two different obstacle detection systems: stereo-vision and laser scanner. Section 6 gives an experimental evaluation of the method. Section 7 concludes.
2 Overview
In this paper, we propose a collaborative fusion approach. Suppose an obstacle detection system provides hypotheses of detection, denoted "targets". A post-process, based on stereo-vision, is used to confirm the existence of the targets and reject errors. This way,
the obstacle detection system can be tuned to perform overabundant detection and avoid missing important detections, the second step ensuring the reliability of the system. Fig. 1 illustrates the processing steps of the approach. For each target, a volume of interest (VOI) is built in the stereoscopic images. Then, a set of sparse measurement points is computed for each VOI and used for the evaluation of three confirmation criteria. These criteria are designed to be generic with respect to the kind of objects, and to process moving and static objects identically.
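As an illustration, the overall confirmation cascade can be summarized by the following minimal Python sketch. The helper names (build_voi, compute_obstacle_pixels and the three criterion checks) are hypothetical stand-ins for the processing stages detailed in the next sections, not functions from the paper.

```python
# Hypothetical sketch of the confirmation cascade of Fig. 1; helper names
# are illustrative stand-ins for the stages described in Sections 3 and 4.
def confirm_targets(targets, left_image, right_image):
    """Return the subset of targets validated by the stereo post-process."""
    validated = []
    for target in targets:
        voi = build_voi(target)                                         # Sec. 3.2
        pixels = compute_obstacle_pixels(left_image, right_image, voi)  # Sec. 3.3
        if (check_observed_surface(pixels, voi)                         # criterion 1
                and check_prevailing_alignment(pixels)                  # criterion 2
                and check_bottom_height(pixels)):                       # criterion 3
            validated.append(target)
    return validated
```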
3 Stereo-vision-based Confirmation

3.1 Geometrical developments
All the sensors are rigidly linked to the vehicle and refer to a common coordinate system Ra. The configuration of the stereoscopic sensor is presented in Fig. 2. The cameras are described by a pinhole model and characterized by (u0, v0), the position of the optical center in the image plane, and α, the focal length expressed in pixels. The epipolar geometry is rectified, and the extrinsic parameters of the stereoscopic sensor are (0, Ys0, Zs0), the position of the center of the baseline, θs, the pitch of the cameras, and bs, the length of the stereo baseline.
Figure 2: Geometrical configuration of the stereoscopic sensor and reference to the common coordinate system Ra.

Given a point P(Xa, Ya, Za) in Ra, its projections (ur, v) and (ul, v) in the stereoscopic images, and its disparity ∆s, are:

$$u_r = u_0 + \alpha\,\frac{X_a - b_s/2}{(Y_a - Y_s^0)\sin\theta_s + (Z_a - Z_s^0)\cos\theta_s}$$
$$u_l = u_0 + \alpha\,\frac{X_a + b_s/2}{(Y_a - Y_s^0)\sin\theta_s + (Z_a - Z_s^0)\cos\theta_s}$$
$$v = v_0 + \alpha\,\frac{(Y_a - Y_s^0)\cos\theta_s - (Z_a - Z_s^0)\sin\theta_s}{(Y_a - Y_s^0)\sin\theta_s + (Z_a - Z_s^0)\cos\theta_s}$$
$$\Delta_s = u_l - u_r = \alpha\,\frac{b_s}{(Y_a - Y_s^0)\sin\theta_s + (Z_a - Z_s^0)\cos\theta_s} \quad (1)$$
where ∆s is the disparity value. The coordinates in Ra can be reconstructed as:

$$X_a = \frac{b_s}{2} + \frac{b_s (u_r - u_0)}{\Delta_s}, \qquad Y_a = Y_s^0 + \frac{b_s \left((v - v_0)\cos\theta_s + \alpha\sin\theta_s\right)}{\Delta_s}, \qquad Z_a = Z_s^0 + \frac{b_s \left(\alpha\cos\theta_s - (v - v_0)\sin\theta_s\right)}{\Delta_s} \quad (2)$$
$(\vec{u_r}, \vec{v}, \vec{\Delta_s})$ defines a coordinate system $R_\Delta$ associated with the disparity space.
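As a worked example, equations (1) and (2) translate directly into code. The following is a minimal sketch; the function names and parameter packing are ours, not the paper's:

```python
import numpy as np

def project_to_disparity_space(Xa, Ya, Za, u0, v0, alpha, bs, theta_s, Ys0, Zs0):
    """Project a point of Ra to (ur, v, disparity) in R_Delta (equation 1)."""
    denom = (Ya - Ys0) * np.sin(theta_s) + (Za - Zs0) * np.cos(theta_s)
    ur = u0 + alpha * (Xa - bs / 2) / denom
    v = v0 + alpha * ((Ya - Ys0) * np.cos(theta_s)
                      - (Za - Zs0) * np.sin(theta_s)) / denom
    disparity = alpha * bs / denom
    return ur, v, disparity

def reconstruct_from_disparity(ur, v, disparity, u0, v0, alpha, bs, theta_s, Ys0, Zs0):
    """Recover (Xa, Ya, Za) from a stereo measurement (equation 2)."""
    Xa = bs / 2 + bs * (ur - u0) / disparity
    Ya = Ys0 + bs * ((v - v0) * np.cos(theta_s) + alpha * np.sin(theta_s)) / disparity
    Za = Zs0 + bs * (alpha * np.cos(theta_s) - (v - v0) * np.sin(theta_s)) / disparity
    return Xa, Ya, Za
```

Round-tripping a point through both functions returns the original coordinates, which is a convenient sanity check of the calibration parameters.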
3.2 Building volumes of interest
The first processing step of the algorithm is the conversion of targets into VOI, on which the system focuses its subsequent processing stages. A VOI is defined as a rectangular parallelepiped in R∆, frontal to the image planes (Fig. 3). This is equivalent to a region of interest (ROI) in the right image, associated with a disparity interval. Compared to a ROI, this definition is useful for distinguishing objects that are connected in the images but located at different distances: they are not in the same VOI, even if there is an overlap between the two corresponding ROIs. Similarly, it also makes it possible to separate objects from the background.
Figure 3: Definition of the volumes of interest (VOI).
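In code, a VOI therefore reduces to six bounds. A minimal sketch (the field names are ours):

```python
from dataclasses import dataclass

@dataclass
class VolumeOfInterest:
    """A ROI in the right image plus a disparity interval, as in Fig. 3."""
    u_min: int
    u_max: int
    v_min: int
    v_max: int
    d_min: float  # Delta_min: far bound of the volume
    d_max: float  # Delta_max: near bound of the volume

    def contains(self, u, v, d):
        """True if a measurement (u, v, disparity) falls inside the volume."""
        return (self.u_min <= u <= self.u_max
                and self.v_min <= v <= self.v_max
                and self.d_min <= d <= self.d_max)
```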
3.3 Computation of obstacle pixels
In each VOI, measurement points are computed through local disparity map computation. The local disparity map for each VOI is computed using a classical block-matching-based WTA approach [30]. Using sparse disparity maps both keeps computation time low and restricts reasoning to well-textured areas: only pixels with large gradient values are considered in the process. The local disparity data may contain unwanted pixels, such as matching errors or pixels belonging to the road surface or to objects located at greater distances. Several filtering operations are considered:
• the cross-validation step efficiently rejects errors located in half-occluded areas [31];
• the double correlation method, using both rectangular and sheared correlation windows, provides instant classification of the pixels corresponding to obstacles or to the road surface [32]. Only obstacle pixels are kept;
• the disparity interval of the VOI is used to reject pixels located outside the processed volume;
• a median filter rejects impulse noise created by isolated matching errors.
The remaining measurement points in the VOI are called obstacle pixels. Fig. 4 shows the local obstacle disparity map (c) associated with a detection (a). This map can be directly projected onto the v-disparity plane (d) and onto the u-disparity plane (b). Using these partial representations is convenient for fast processing and robustness toward remaining errors [14].
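As an illustration of this stage, the sketch below computes a local disparity map on a VOI, with OpenCV's block matcher standing in for the WTA matcher of [30], then applies the gradient, disparity-interval and median filters. The cross-validation and double-correlation steps of [31] and [32] are omitted for brevity; names and threshold values are our assumptions.

```python
import cv2
import numpy as np

def compute_obstacle_pixels(roi_left, roi_right, d_min, d_max,
                            grad_thresh=20.0, block_size=9):
    """Sparse local disparity map keeping only likely obstacle pixels.
    roi_left and roi_right are 8-bit grayscale image crops of the VOI."""
    matcher = cv2.StereoBM_create(numDisparities=96, blockSize=block_size)
    disparity = matcher.compute(roi_left, roi_right).astype(np.float32) / 16.0

    # keep only well-textured pixels (large horizontal gradient)
    gradient = np.abs(cv2.Sobel(roi_left, cv2.CV_32F, 1, 0, ksize=3))
    disparity[gradient < grad_thresh] = 0.0

    # reject pixels outside the disparity interval of the VOI
    disparity[(disparity < d_min) | (disparity > d_max)] = 0.0

    # median filter against impulse noise from isolated matching errors
    return cv2.medianBlur(disparity, 3)
```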
Figure 4: A pedestrian on the road, considered as a target (a), obstacle measurement points computed using stereo-vision (c), u-disparity (b) and v-disparity (d) representations of these points.
3.4 Validation criteria
Let us define the main features of what we call an "obstacle". The idea is to consider features which are as unrestrictive as possible, so that the validation process remains generic with respect to the type of obstacles. We propose the following three requirements for a target to be an actual obstacle:
1. the size shall be significant;
2. the orientation shall be almost vertical;
3. the bottom shall be close to the road surface.
These conditions allow us to define three validation criteria.
3.4.1 Observed surface
Validation of a target according to condition (1) consists in checking that the VOI associated with the target actually contains obstacle pixels: the number of obstacle pixels is compared to a threshold. Two approaches can be considered for choosing the threshold value. From an image processing perspective, one can consider that all pixels carry equivalent information: a fixed value Tim is used to discriminate detections from errors. Considering the metric reconstruction, however, all pixels are not equivalent. A pixel represents the projection of a given surface onto the camera matrix, the dimensions of this surface being directly related to the distance separating the object from the sensor: more distant objects appear smaller. To establish a threshold, we propose to measure the observed surface formed by the obstacle pixels. Considering the surface as fronto-parallel (i.e. ∆s is constant), variations of Xa, Ya and Za for a small displacement along the image of this surface can be expressed from equation (2) as:

$$dX_a = \frac{b_s}{\Delta_s}\,du_r, \qquad dY_a = \frac{b_s \cos\theta_s}{\Delta_s}\,dv, \qquad dZ_a = -\frac{b_s \sin\theta_s}{\Delta_s}\,dv \quad (3)$$
With the approximation that fronto-parallel obstacles are also vertical (valid for small values of θs), dZa can be neglected, and the equivalent surface of a pixel $p_i(u_r^i, v^i, \Delta_s^i)$ is:

$$Surf(p_i) = \left(\frac{b_s}{\Delta_s^i}\right)^2 \cos\theta_s \quad (4)$$
Then, the validation of a target C is possible if:

$$\sum_{p_i \in C} \left(\frac{b_s}{\Delta_s^i}\right)^2 \cos\theta_s > T_m \quad (5)$$
The threshold Tm physically corresponds to the minimal surface required to validate a hypothesis. By fixing a metric value for Tm, we obtain a threshold expressed in number of pixels, which depends on the distance to the target:

$$T_m^{im}(\Delta_s^{max}) = \frac{T_m}{\left(\frac{b_s}{\Delta_s^{max}}\right)^2 \cos\theta_s} \quad (6)$$
At long range, this threshold allows confirming targets which appear small in the image but correspond to large objects. On the other hand, its limit appears at short range: the local disparity data being sparse, the visible surface is significantly smaller than the actual surface of the object. Therefore, a combined threshold is defined as:

$$T_p(\Delta_s^{max}) = \min\left(T_m^{im}(\Delta_s^{max}),\; T_{im}\right) \quad (7)$$
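A possible implementation of the combined test, under the assumption that the obstacle pixels of the VOI lie near the disparity $\Delta_s^{max}$ (function and parameter names are ours):

```python
import math

def observed_surface_ok(n_obstacle_pixels, d_max, bs, theta_s,
                        T_m=0.025, T_im=80):
    """Observed-surface criterion: compare the pixel count to T_p (eq. 7)."""
    # equation (6): metric threshold converted to a pixel count at d_max
    T_m_im = T_m / ((bs / d_max) ** 2 * math.cos(theta_s))
    # equation (7): metric threshold at long range, capped by T_im nearby
    T_p = min(T_m_im, T_im)
    return n_obstacle_pixels > T_p
```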
3.4.2 Prevailing alignment
Hypothesis (2) considers obstacles as almost vertical, while the road is mostly horizontal. We propose to measure the direction in which the obstacle pixels of the target are aligned. For this purpose, the set of obstacle pixels is projected onto the v-disparity plane and a linear regression is computed to find its global orientation. Let us consider a planar object, parallel to the stereo baseline. The plane's equation is:

$$A_2 (Z_a - Z_s^0) + A_1 (Y_a - Y_s^0) + A_3 = 0 \quad (8)$$
According to equation (2), its representation in the disparity space is a plane:

$$\Delta_s = \frac{b_s A_2}{A_3}\left((v - v_0)(a\cos\theta_s + \sin\theta_s) + \alpha(a\sin\theta_s - \cos\theta_s)\right) \quad (9)$$
with $a = \frac{dZ_a}{dY_a}$, the slope of the object plane in Ra, being the value to estimate. The plane in R∆ is projected as a straight line on the v-disparity plane. The slope of this line is:

$$p_0 = \frac{d\Delta_s}{dv} = \Delta_0\,\frac{a\cos\theta_s + \sin\theta_s}{\alpha(a\sin\theta_s - \cos\theta_s)} \quad (10)$$

with $\Delta_0 = \Delta_s(v = v_0)$. This equation allows expressing a according to the parameters (p0, ∆0) of the line:

$$a = \frac{dZ_a}{dY_a} = \frac{\alpha p_0 \cos\theta_s + \Delta_0 \sin\theta_s}{\alpha p_0 \sin\theta_s - \Delta_0 \cos\theta_s} \quad (11)$$
An obstacle will then be confirmed if:

$$a < T_a = \text{maximal slope} \quad (12)$$
The prevailing alignment criterion relies on a global approach within the VOI and is sensitive to the coherence between pixels. As a consequence, it is more robust against uncorrelated matching errors.
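A sketch of the test, fitting the regression line by least squares; expressing the angular threshold Ta as a slope and taking the absolute value of a are our reading of equation (12):

```python
import numpy as np

def prevailing_alignment_ok(v_coords, disparities, alpha, theta_s, v0,
                            T_a_deg=45.0):
    """Fit the v-disparity alignment and test the object slope (eqs. 10-12)."""
    # least-squares line  disparity = p0 * v + c  in the v-disparity plane
    p0, c = np.polyfit(v_coords, disparities, deg=1)
    d0 = p0 * v0 + c  # Delta_0 = Delta_s(v = v0)
    # equation (11): slope a = dZa/dYa of the object plane in Ra
    a = (alpha * p0 * np.cos(theta_s) + d0 * np.sin(theta_s)) / \
        (alpha * p0 * np.sin(theta_s) - d0 * np.cos(theta_s))
    # equation (12), with T_a expressed as the slope of the maximal angle
    return abs(a) < np.tan(np.radians(T_a_deg))
```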
3.4.3 Bottom height
A specific kind of false positive appears with stereo-vision in scenes containing many repetitive structures. Highly correlated false matches then appear as an object close to the vehicle. Such false matches can be a problem for the previous criteria, which assume that matching errors are mainly uncorrelated. Among these errors, the most critical ones occur when the disparity values are over-estimated (in the case of an under-estimation, the target is located further away than the actual object, which is rather a case of false negative). Let us investigate the influence of such a mis-estimation of disparity on the positioning of points along the Ya axis. Call (ud, v, ∆1) and (X1, Y1, Z1) the coordinates of a pixel represented respectively in R∆ and Ra, and (ud, v, ∆2) and (X2, Y2, Z2) its coordinates after an erroneous estimation of its disparity. Equation (2) gives the error along the Ya axis:

$$Y_2 - Y_1 = \frac{\Delta_1 - \Delta_2}{\Delta_1 \Delta_2}\,b_s\left((v - v_0)\cos\theta_s + \alpha\sin\theta_s\right) = \frac{\Delta_1 - \Delta_2}{\Delta_2}\,(Y_1 - Y_s^0) \quad (13)$$

Assuming Y1 = 0, to consider a pixel located at the base of an obstacle, we get:

$$Y_2 = Y_s^0\left(1 - \frac{\Delta_1}{\Delta_2}\right) \quad (14)$$
This value corresponds to the minimal possible height of the bottom of a target after an error in the disparity estimation. It can be large when the disparity is significantly over-estimated, and may give the impression that the target floats without ground support. The validation test consists in measuring the altitude of the lowest obstacle pixel in the VOI and checking that it is low enough:

$$\max_{p_i \in C}(Y_a) < T_h = \text{minimal height} \quad (15)$$
This criterion is useful when using a local matching method based on a small correlation window, which may fail on large, seemingly repetitive objects.
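The corresponding test is a one-liner once the Ya coordinates of the obstacle pixels have been reconstructed with equation (2). This sketch keeps the paper's formulation of equation (15), where the pixel at the base of the obstacle carries the extremal Ya of the VOI:

```python
def bottom_height_ok(Ya_values, T_h=0.7):
    """Bottom-height criterion (eq. 15): the lowest obstacle pixel of the
    VOI must lie within T_h meters of the road surface (Ya = 0)."""
    return max(Ya_values) < T_h  # extremal Ya over the VOI, per eq. (15)
```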
4 Dealing with real-time
An issue with stereo-vision is computation time. A trade-off is generally necessary: baseline, focal length, sensor matrix size and resolution are adapted to the expected application and to the processing ability of the embedded system. The use of VOI offers possibilities to relax this trade-off. Despite the recent increase in the computational capability of PCs, it is still relevant to reduce computation time, for two reasons: (1) the increase in camera resolution improves accuracy while dramatically increasing computation time; (2) embedded hardware dedicated to automotive applications is far from being as powerful as a desktop PC.
4.1 Working with volumes of interest (VOI)
VOI help save computation time, particularly at high resolutions, by avoiding processing complete images. However, when there are many targets, potentially overlapping, the available computation time can be insufficient. An effective strategy is to sort targets by level of importance, so that the most hazardous obstacles are processed within a fixed amount of time and the computation power is used in the "safest" way.
4.2 Local rescaling

4.2.1 Basic idea
The use of VOI can offer a finer management of computation time, by locally rescaling the stereo images. Generally, the resolution of the images is high enough for near objects, whereas it is insufficient for distant objects. Increasing the resolution of the images, or using sub-pixel approaches, is appropriate for distant objects but represents a waste of computation time when dealing with near objects. It would be more efficient to adapt the resolution to the image content, according to some knowledge about the observed scene. Here, every VOI in the images is associated with a distance measurement, allowing the resolution to be adapted locally. A zoom factor is computed locally, so that an object's size in the images is independent of its position in the scene. This provides two major advantages: the resolution is increased for distant objects, improving the detection, while it is reduced for close objects, saving computation time.
4.2.2 Computation of the zoom factor
To compute the resize coefficient for a given target, consider an imaginary object of width L, parallel to the image planes, whose disparity value is ∆s. Differentiation of equation (2), with ∆s and v fixed, gives:

$$dX_a = \frac{b_s}{\Delta_s}\,du_r \quad (16)$$

which ensures that the width l of the object in the images, along the $\vec{u_r}$ direction, is:

$$l = \frac{L \Delta_s}{b_s} \quad (17)$$

By using a zoom factor Fz, the width in the images becomes lzoom = Fz · l. Defining a scale factor Ks as the ratio between the expected size in the image and the actual size in meters, the zoom coefficient becomes:

$$F_z = K_s\,\frac{b_s}{\Delta_s} \quad (18)$$

Note that in the disparity space, this is equivalent to always positioning the front of a target at an ideal disparity $\Delta_i = b_s K_s$.
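A sketch of the local rescaling; the helper names are ours, and cv2.resize stands in for whatever resampling an embedded platform provides:

```python
import cv2

def zoom_factor(disparity, bs, Ks):
    """Equation (18): scale so the apparent width no longer depends on range."""
    return Ks * bs / disparity

def rescale_roi(roi, disparity, bs, Ks):
    """Magnify distant (small-disparity) ROIs, shrink close ones."""
    Fz = zoom_factor(disparity, bs, Ks)
    h, w = roi.shape[:2]
    return cv2.resize(roi, (int(w * Fz), int(h * Fz)),
                      interpolation=cv2.INTER_LINEAR)
```

In practice the factor would be clamped; the experiments of Section 6.4 limit the zoom to 2×.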
5 Implementation
To assess the efficiency of the proposed framework, we propose two different implementations, considering two complementary sensors, stereo-vision and laser scanner, which represent two different approaches to obstacle detection.
[Figure 5: flowchart — the obstacle disparity map is projected onto the u-disparity plane; a 1D histogram of disparities is thresholded with a fixed threshold to obtain (∆min-∆max) slices; within each slice, a local histogram along the image columns is thresholded to obtain (Umin-Umax) bounds; finally, a local v-disparity projection and a histogram along the image lines yield (Vmin-Vmax), producing the target VOI, whose bottom is set to the road.]
Figure 5: Algorithm used for detection using stereo-vision. The disparity space is segmented longitudinally, then laterally, and finally vertically. At each step, classes are built by thresholding 1D histograms.
5.1 Stereo-vision

5.1.1 Building volumes of interest
Stereo-vision is intrinsically included in the proposed framework and can also be used for target generation. A sparse obstacle disparity map is computed, using the double correlation algorithm [32]. Then, the disparity space is segmented into VOI, which can be used directly in the validation framework. The segmentation is performed sequentially along the ∆s, ur and v axes, as illustrated in Fig. 5 and sketched below. This is similar to the sliding volumes method proposed in [32], but more efficient in computation time, because it is only composed of 1D accumulations and basic thresholding.
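The following sketch illustrates this sequential segmentation; bin sizes and threshold values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def threshold_intervals(hist, thresh):
    """Index intervals [start, end) where hist exceeds thresh."""
    intervals, start = [], None
    for i, val in enumerate(hist):
        if val > thresh and start is None:
            start = i
        elif val <= thresh and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(hist)))
    return intervals

def segment_disparity_space(disp_map, d_thresh=50, uv_thresh=10):
    """disp_map: obstacle disparity map, 0 where there is no obstacle pixel.
    Returns candidate VOIs as (u_min, u_max, v_min, v_max, d_min, d_max)."""
    vois = []
    # 1) longitudinal slices: 1-pixel-wide disparity bins over the whole map
    d_hist, _ = np.histogram(disp_map[disp_map > 0], bins=np.arange(0, 97))
    for d_lo, d_hi in threshold_intervals(d_hist, d_thresh):       # along Delta
        in_slice = (disp_map >= d_lo) & (disp_map < d_hi)
        # 2) lateral slices: accumulate the slice along image columns
        u_hist = in_slice.sum(axis=0)
        for u_lo, u_hi in threshold_intervals(u_hist, uv_thresh):  # along u
            # 3) vertical slices: accumulate along image lines
            v_hist = in_slice[:, u_lo:u_hi].sum(axis=1)
            for v_lo, v_hi in threshold_intervals(v_hist, uv_thresh):
                vois.append((u_lo, u_hi, v_lo, v_hi, d_lo, d_hi))
    return vois
```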
5.1.2 Multi-resolution approach
It may seem paradoxical to compute a complete disparity map for detection, whereas one of the major interests of the proposed framework is the ability to work with VOI. However, it is fully possible to take advantage of this feature through a multi-resolution approach: as the computation of obstacle pixels is performed at the highest resolution on the VOI, there is no need for high precision at the detection stage.
Figure 6: Examples of detection using stereo-vision.

Results obtained with the segmentation method are illustrated in Fig. 6. Most objects are detected, and a few false detections may occur, this method being sensitive to the threshold values.
5.2 Lidar
The case of a single-layer laser scanner is relevant for the proposed framework, since it can produce fast and accurate detections along with a large number of false positives: in case of strong vehicle pitch or non-planar road geometry, the intersection of the scanning plane with the road surface produces false detections.
5.2.1 Detection method
The lidar scans a horizontal plane of the scene and provides a set of impacts. These data are filtered to remove the impacts situated outside a warning area, which corresponds to the possible future positions of the ego-vehicle. The set of potential obstacles is then created by clustering the remaining laser points. This step relies on a recursive, non-supervised automatic classification algorithm. The resulting detections are represented by ellipses quantifying the uncertainty on their size and position, which are then tracked over successive frames. The processing of the laser data is described in [11].
5.2.2 Building volumes of interest
To build a VOI, a bounding box Vo is first constructed in Ra from the laser tracks, as described in Fig. 7-a. Znear, Xleft and Xright are computed from the ellipse parameters. Zfar and Yhigh are then constructed from prior assumptions on the size of the obstacles. Given Vo, the VOI (Fig. 7-b) is defined by computing the coordinates of the vertices of Vo in R∆, using equation (1), so as to obtain the smallest circumscribed VOI.
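A sketch of this conversion, reusing the project_to_disparity_space helper sketched in Section 3.1; taking the ground at Ya = 0 and packing the calibration parameters into a cam dictionary are our conventions:

```python
import itertools
import numpy as np

def box_to_voi(x_left, x_right, z_near, z_far, y_high, cam):
    """Project the 8 vertices of the bounding box Vo into R_Delta and keep
    the smallest circumscribed VOI (Fig. 7-b)."""
    corners = itertools.product((x_left, x_right), (0.0, y_high),
                                (z_near, z_far))
    projected = [project_to_disparity_space(X, Y, Z, **cam)
                 for X, Y, Z in corners]
    ur, v, d = (np.array(c) for c in zip(*projected))
    return ur.min(), ur.max(), v.min(), v.max(), d.min(), d.max()
```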
Figure 7: Conversion of a laser track into a bounding box (a) and projection into R∆ to obtain a VOI (b).

An example of detection from the laser scanner is presented in Fig. 8. The arbitrary height of the bounding box is visible on the signalization cone on the right. This behavior could be corrected by further using obstacle pixels to refine the measurement.
Figure 8: Right image from the stereo pair with projected laser impacts (crosses) and resulting targets (rectangles).
6 Experimental Setup and Evaluation

6.1 Sensors used
The stereoscopic sensor is composed of two SMaL CMOS cameras with 6 mm focal length. VGA 10-bit gray-scale images are grabbed every 30 ms. Three stereo baselines have been tested: 30 cm, 65 cm and 100 cm. The height is 1.4 m and the pitch angle is θs = 5°. The telemetric sensor is a Sick LMS200 laser scanner which measures 201 points every 26 ms, with a scanning angular field of view of 100°. It is positioned horizontally, 40 cm above the road surface. The thresholds are parameterized as follows: Tim = 80 pixels, Tm = 0.025 m², Ta = 45° and Th = 0.7 m. The algorithm runs in real time (less than 40 ms) on a rather old desktop computer equipped with a 3.4 GHz Pentium 4-E processor.
6.2 Detection rate
As the main objective of the proposed framework is a high rejection rate of false positives without decreasing the detection rate, we have evaluated these features on a large dataset of real road data. The stereo-vision implementation is very relevant for such an evaluation, as it provides a high detection rate for any kind of object present on the road: cars, motorcycles, buses, pedestrians, walls, barriers, poles. . . We have recorded 3 stereo-vision sequences (Fig. 9) in different road contexts (10662 frames on highway, 4080 frames in town and 5524 frames on countryside) and evaluated the detection rate before and after the validation stage.

Figure 9: Images from the sequences used for evaluation. The first three use the stereo-vision based implementation on highway (a), downtown (b) and in a mixed context (c). The fourth (d) uses the laser scanner implementation on a bumpy road.

Assuming $O_i$ and $D_i$ are respectively the numbers of detectable and detected objects in frame $i$, the detection rate for a sequence of $N_f$ frames is defined as $\frac{1}{N_f}\sum_{i=1}^{N_f}\frac{D_i}{O_i}$. $O_i$ and $D_i$ were counted manually. Fig. 10 shows the results.
[Figure 10 bar charts: detection rates without/with validation — 92/91.9% on Highway, 96.5/95.1% in Town, 85.5/85.1% in the Mixt context; false detection rates drop from 3.6-11.1% without validation to 0.04-0.3% with validation across the Highway, Town, Mixt and Lidar sequences.]
Figure 10: Detection rate (left) and false detection rate (right), expressed in percentages and measured for the stereo-vision sequences (and the lidar sequence for false detections), without and with the use of the validation step.

Promising performance was obtained: the validation step had a very weak influence on the detection rate, for any kind of object. This means that the proposed method does not significantly decrease the performance of the underlying detection system.
6.3 Rejection of false positives
To measure the influence of the validation step on false positives, we have measured the false detection rate as $\frac{1}{N_f}\sum_{i=1}^{N_f} F_i$, $F_i$ being the number of false detections occurring in frame $i$.
The three previous stereo-vision sequences are used for this evaluation. A sequence using the laser scanner implementation is also used, to assess the performance of the validation criteria in the presence of many false detections. It was recorded on a bumpy parking area, in order to maximize the number of false positives due to strong pitch. Fig. 10 shows the results. Most errors are correctly rejected, so the stereo-vision stage improves the performance of the embedded detection systems. Fig. 11 shows examples of common errors that are correctly rejected thanks to the proposed architecture. Note that the combination of the three criteria is very efficient, since they work on different and complementary cases: prevailing alignment removes errors on the road surface, bottom height is necessary when there are repetitive structures (especially the buildings downtown), and observed surface copes well with localized matching errors. The few remaining false positives correspond to very ambiguous cases, for example when the volume of interest contains part of another object.
6.4 Influence of the local rescaling on detection range
To quantify the advantages of the zoom approach, we have used the laser scanner implementation with low-resolution stereoscopic images (1/4 PAL). Our experimental vehicle is placed at a fixed position on an 80-meter straight lane, while another vehicle moves away. As seen in Fig. 12, the apparent image size of the car remains the same whatever the distance. The distance at which the perception system stops validating the target is measured and reported in Table 1.
Figure 11: Common sources of detection errors: (a) a repetitive pattern creates highly correlated matching errors, (b) poor image quality, (c) the laser scanning plane intersects the road surface, (d) a non-planar road is seen as an obstacle, (e) laser tracking failure.
Figure 12: Numerical zoom (right) locally applied to a laser target (left). The image of an observed object is magnified when the object is far (top), and reduced when the object is close (bottom).

Without applying the zoom, the validation framework clearly limits the detection range of the system: although the laser scanner detects the car up to 74 meters, the target is confirmed at 58 meters in the best case, using the prevailing alignment criterion (bottom height is not relevant for this test, because it always confirms). The observed surface criterion gives poor results beyond 31 meters.
Table 1: Influence of numerical zoom on the confirmation range.

               Lidar only   Observed surface   Prevailing alignment   Bottom height
without zoom   74 m         31 m               58 m                   74 m
with zoom      74 m         56 m               74 m                   74 m
The numerical zoom (limited to 2×) allows the range to be increased, so that with the prevailing alignment criterion it is no longer limited by the validation step. The observed surface criterion cannot confirm obstacles up to 74 meters, but its range is increased by 80.6%.
7 Conclusion
For the application of obstacle detection in the automotive domain, reliability is a major consideration. In this paper, we proposed an approach based on stereo-vision to improve the performance of various obstacle detection systems by rejecting false positives. For this purpose, we proposed three validation criteria, relying on three weakly constraining hypotheses made on detections: the surface shall be large enough, the orientation shall be almost vertical, and the bottom shall be close to the ground. Thus, the approach is generic regarding the type of obstacles. We obtained real-time operation by using volumes of interest and a local numerical zoom approach. To assess how the proposed method fits with an initial detection based on different kinds of sensors, we proposed two implementations using a stereo camera and a lidar. The approach has proven to be efficient in various road contexts (downtown, highway and mixed context) on a dataset of more than 20000 images containing many kinds of objects. Most false positives are correctly rejected, while true positives are in most cases not discarded. To further evaluate the generality of the approach, it would be relevant to implement it with other detection sensors, especially radar.
References

[1] M. Distner, M. Bengtsson, T. Broberg, and L. Jakobsson, "City safety: A system addressing rear-end collisions at low speeds," in Enhanced Safety Vehicles Conf., 2009.
[2] M. Muntzinger, M. Aeberhard, S. Zuther, M. Mählisch, M. Schmid, J. Dickmann, and K. Dietmayer, "Reliable automotive pre-crash system with out-of-sequence measurement processing," in Intelligent Vehicles Symp., 2010.
[3] S. Demmel, D. Gruyer, and A. Rakotonirainy, "V2V/V2I augmented maps: state-of-the-art and contribution to real-time crash risk assessment," in Canadian Multidisciplinary Road Safety Conf., (Niagara Falls), 2010.
[4] S. Ammoun, F. Nashashibi, and C. Laurgeau, "Real-time crash avoidance system on crossroads based on 802.11 devices and GPS receivers," in IEEE Intelligent Transportation Systems Conf., 2006.
[5] B. Vanholme, D. Gruyer, B. Lusetti, S. Glaser, and S. Mammar, "A legal safety concept for highly automated driving on highways," IEEE Trans. Intelligent Transportation Systems, 2012.
[6] C. Zott, S. Cosenza, P. Lytrivis, F. Codeca, and A. Belhoula, "The SAFESPOT vehicular platform: environmental perception from sensors and wireless LAN messages," in IEEE ITS World Congress, (Stockholm, Sweden), 2009.
[7] G. Vivo, P. Dalmasso, E. Nordin, M. Dozza, P. Cravini, F. Codeca, V. Manzoni, and J. Ibanez-Guzman, "V2V applications in the SAFESPOT European project: The OEMs' experience," in IEEE ITS World Congress, (Stockholm, Sweden), 2009.
[8] A. Busson, A. Lambert, D. Gruyer, and D. Gingras, "Analysis of inter-vehicle communication to reduce road crashes," IEEE Trans. Vehicular Technology, 2011.
[9] A. Lambert, D. Gruyer, A. Busson, and H. M. Ali, "Usefulness of collision warning inter-vehicular systems," Int. Journal of Vehicle Safety, 2010.
[10] M. Skutek, M. Mekhaiel, and G. Wanielik, "Precrash system based on radar for automotive applications," in IEEE Intelligent Vehicles Symp., (Columbus, USA), 2003.
[11] R. Labayrade, C. Royère, D. Gruyer, and D. Aubert, "Cooperative fusion for multi-obstacles detection with the use of stereovision and laser scanner," Autonomous Robots, vol. 19, no. 2, pp. 117-140, 2005.
[12] A. Mendes, L. C. Bento, and U. Nunes, "Multi-target detection and tracking with a laserscanner," in IEEE Intelligent Vehicles Symp., (Parma, Italy), 2004.
[13] M. Bertozzi and A. Broggi, "GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection," IEEE Trans. Image Processing, vol. 7, pp. 62-81, January 1998.
[14] R. Labayrade, D. Aubert, and J. Tarel, "Real time obstacle detection on non flat road geometry through 'v-disparity' representation," in IEEE Intelligent Vehicles Symp., 2002.
[15] S. Nedevschi, R. Danescu, D. Frentiu, T. Marita, F. Oniga, C. Pocol, T. Graf, and R. Schmidt, "High accuracy stereovision approach for obstacle detection on non planar roads," in IEEE Intelligent Engineering Systems Conf., 2004.
[16] D. Pfeiffer and U. Franke, "Efficient representation of traffic scenes by means of dynamic stixels," in Intelligent Vehicles Symp., (San Diego, USA), 2010.
[17] W. van der Mark and D. M. Gavrila, "Real-time dense stereo for intelligent vehicles," IEEE Trans. Intelligent Transportation Systems, vol. 7, no. 1, pp. 38-50, 2006.
[18] H. Hirschmuller, "Stereo processing by semiglobal matching and mutual information," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 2, 2008.
[19] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Int. Conf. on Computer Vision and Pattern Recognition, 2005.
[20] P. Viola, M. Jones, and D. Snow, "Detecting pedestrians using patterns of motion and appearance," Int. Journal of Computer Vision, vol. 63, no. 2, 2005.
[21] M. Enzweiler and D. M. Gavrila, "Monocular pedestrian detection: Survey and experiments," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 12, 2009.
[22] A. Broggi, P. Cerri, S. Ghidoni, P. Grisleri, and H. G. Jung, "A new approach to urban pedestrian detection for automatic braking," IEEE Trans. Intelligent Transportation Systems, vol. 10, no. 4, 2009.
[23] D. Gavrila and S. Munder, "Multi-cue pedestrian detection and tracking from a moving vehicle," Int. Journal of Computer Vision, vol. 73, no. 1, pp. 41-59, 2007.
[24] A. Haselhoff, A. Kummert, and G. Schneider, "Radar-vision fusion with an application to car-following using an improved AdaBoost detection algorithm," in IEEE Intelligent Transportation Systems Conf., 2007.
[25] G. Toulminet, M. Bertozzi, S. Mousset, A. Bensrhair, and A. Broggi, "Vehicle detection by means of stereo vision-based obstacles features extraction and monocular pattern analysis," IEEE Trans. Image Processing, vol. 15, no. 8, 2006.
[26] S. Rodriguez, V. Fremont, P. Bonnifait, and V. Cherfaoui, "Visual confirmation of mobile objects tracked by a multi-layer lidar," in IEEE Int. Conf. on Intelligent Transportation Systems, (Madeira, Portugal), 2010.
[27] M. Perrollaz, R. Labayrade, C. Royère, N. Hautière, and D. Aubert, "Long range obstacle detection using laser scanner and stereovision," in IEEE Intelligent Vehicles Symp., (Tokyo, Japan), 2006.
[28] M. Bai, Y. Zhuang, and W. Wang, "Stereovision based obstacle detection approach for mobile robot navigation," in Int. Conf. on Intelligent Control and Information Processing, 2010.
[29] R. Labayrade, M. Perrollaz, D. Gruyer, and D. Aubert, "Sensor data fusion for road obstacle detection: A validation framework," in Sensor Fusion and its Applications (C. Thomas, ed.), InTech, 2010.
[30] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. Journal of Computer Vision, vol. 47, no. 1-3, 2002.
[31] G. Egnal and R. Wildes, "Detecting binocular half-occlusions: Empirical comparisons of five approaches," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1127-1133, 2002.
[32] M. Perrollaz, R. Labayrade, R. Gallen, and D. Aubert, "A three resolution framework for reliable road obstacle detection using stereovision," in IAPR Int. Conf. on Machine Vision and Applications, (Tokyo, Japan), 2007.
8 Biographies
Mathias Perrollaz received the M.S. degree from the National Polytechnic Institute of Grenoble (INPG) in 2003. He started working on ITS at the Images and Signals Laboratory in Grenoble (CNRS), and joined the perception team of the LIVIC department of IFSTTAR (the French institute for science and technology in transportation) in 2004. He received the Ph.D. degree from Paris 6 University (UPMC) in 2008. Since April 2009, he has been working at Inria (the French institute for computer science) on probabilistic methods for ITS. In 2011 and 2013, he worked on perception for robotic manipulators at Ohio Northern University, USA. Mathias Perrollaz has also taught at Paris 10 and Grenoble Universities.

Raphaël Labayrade received the M.S. degree in 2000 from the University of Saint-Etienne, and graduated from the ENTPE engineering school in 2000. He received the Ph.D. and Habilitation degrees in 2004 and 2012 respectively, from Paris 6 University. From 2000 to 2007, he worked in the perception team of the LIVIC, as a Ph.D. student and then as a researcher. His work dealt with computer vision for road applications. He has taught at Paris 6 and Versailles Universities and in private companies. Since February 2007, he has been a researcher-lecturer at ENTPE (LGCB). His research topics include multi-objective optimization in uncertain environments, and the development of simulators for the assessment of visual preferences, with an application to lighting design in buildings.
Dominique Gruyer received the M.S. and Ph.D. degrees in 1995 and 1999 respectively, from the Compiègne University of Technology (UTC). Since 2001, he has been a researcher at IFSTTAR, in the perception team of the LIVIC. He works on multi-sensor/source association, combination and fusion. His work applies to multi-obstacle detection and tracking, extended perception, and accurate ego-localization. He is involved in several European and French projects dealing with intelligent vehicles. He is also the main inventor of the SiVIC platform (Simulation for Vehicle, Infrastructure and sensors). Since 2010, he has been the leader of the LIVIC perception team. For two years, he has been a networking researcher in AUTO21 (Canada). He is a co-founder and was technical director of the CIVITEC company.

Alain Lambert received the M.Sc. degree in Control Sciences and the Ph.D. degree in Robotics from the UTC, in 1994 and 1998 respectively. From 1999 to 2002, he was an assistant professor at ESIEE-Amiens, France. Since 2002, he has been an associate professor at Paris 10 University, and a member of the ACCIS research group at IEF (Institute of Fundamental Electronics). Since 2011, he has been on leave as a senior researcher at the LIVIC department of IFSTTAR. His research interests include perception, localization and path planning.

Didier Aubert received the M.S. and Ph.D. degrees in 1985 and 1989 respectively, from INPG. From 1989 to 1990, he worked as a research scientist at Carnegie Mellon University. From 1990 to 1994, he worked in the research department of a private company (ITMI), as the leader of several projects dealing with computer vision, mobile robotics and manipulator robotics. He also worked as a consultant on computer vision. In 1995 he joined INRETS, which became IFSTTAR in 2011. From 2002 to 2009 he was the manager of the LIVIC perception team. He is currently a senior researcher and the director of LEPSIS. He works on many topics related to inboard and outboard perception for ITS. He is an image processing expert for several companies. He has taught and teaches in several universities (Paris 6, 11, 12, Evry, Versailles, ENPC, ENST, ENSMP).