Vision-based Automated Vehicle Guidance: the experience of the ARGO vehicle

Massimo Bertozzi, Alberto Broggi, Gianni Conte, Alessandra Fascioli
Dipartimento di Ingegneria dell'Informazione, Università di Parma, I-43100 Parma, Italy
E-Mail: {bertozzi,broggi,conte,fascioli}@ce.unipr.it

Abstract

This paper presents and discusses the results obtained by the GOLD (Generic Obstacle and Lane Detection) system acting as the automatic driver of ARGO. ARGO is a Lancia Thema passenger car equipped with a computer vision system that extracts road and environmental information from the acquired scene; it has been demonstrated driving autonomously under a number of different road and environmental conditions.

1 The ARGO Autonomous Vehicle

ARGO is the experimental autonomous vehicle developed at the Dipartimento di Ingegneria dell'Informazione of the University of Parma, Italy. It integrates the main results of the research conducted over the last few years on algorithms and architectures for vision-based automatic road-vehicle guidance. ARGO, a Lancia Thema 2000 passenger car (figure 1), is equipped with a vision system that extracts road and environmental information from the acquired scene and allows the vehicle to drive autonomously under different road conditions.

1.1 Input devices

Only passive, and therefore non-invasive, sensors (such as cameras) are used on ARGO to sense the surrounding environment:

- a stereoscopic vision system consisting of two synchronized cameras able to acquire pairs of grey-level images simultaneously. The installed devices are low-cost cameras featuring a 6.0 mm focal length and a 360-line resolution. The cameras lie inside the car at the top corners of the windscreen, in order to maximize the distance between the two cameras;
- a panel with user-selectable functions, installed near the driver to change the functionality of the system;
- an odometer, used to acquire the vehicle speed.

1.2 Output devices

Four output devices are installed on ARGO (see figure 1):

- a LED-based control panel, which indicates both the functionality selected by the user and some data about the drive;
- a pair of stereo speakers, which give acoustic warnings to the driver;
- a monitor, which displays the results of the processing, for debugging purposes;
- an electric engine installed on the steering column, which allows the vehicle to steer autonomously.

This work was partially supported by CNR under the frame of the Progetto Finalizzato Trasporti II.

Figure 1: the ARGO experimental vehicle: exterior and interior

1.3 The processing system

Two different architectural solutions have been considered and evaluated: a special-purpose [1] and a standard processing system. Currently the system installed on ARGO is based on a standard Pentium processor with MMX technology, which boosts software performance by exploiting SIMD techniques. The new instructions supported by MMX technology accelerate applications based on computation-intensive algorithms that perform localized, recurring operations on small native data.

2 The Inverse Perspective Mapping (IPM)

The perspective effect must be taken into account when processing images, since it associates a different information content to each pixel of the image. A geometrical transform (Inverse Perspective Mapping [2], IPM) has been introduced which removes the perspective effect from the acquired image, remapping it into a new 2-dimensional domain (the remapped domain) in which the information content is homogeneously distributed among all pixels, thus allowing the efficient implementation of the following processing steps with a SIMD paradigm. The application of the IPM transform requires knowledge of the specific acquisition conditions (camera position, orientation, optics, ...) and some assumptions on the scene represented in the image (here defined as a-priori knowledge). Assuming the road in front of the vision system is planar, IPM yields a bird's-eye view of the scene.
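As a rough illustration of the transform, the following sketch builds a bird's-eye view by projecting each cell of a regular road-plane grid into the image and sampling the corresponding pixel. All parameters (focal length, principal point, camera height, pitch, grid ranges) are made-up values for a generic pinhole camera, not ARGO's actual calibration:

```python
import numpy as np

# Hypothetical camera parameters (illustrative only, not ARGO's calibration):
F = 500.0              # focal length in pixels
CX, CY = 160.0, 120.0  # principal point
H = 1.2                # camera height above the road plane (m)
THETA = 0.2            # downward pitch of the optical axis (rad)

def ground_to_pixel(x, y):
    """Project a road-plane point (x lateral, y forward, z=0) into the image,
    assuming a flat road and a pinhole camera pitched down by THETA."""
    x_cam = x
    y_cam = H * np.cos(THETA) - y * np.sin(THETA)   # along the image "down" axis
    z_cam = y * np.cos(THETA) + H * np.sin(THETA)   # depth along the optical axis
    u = CX + F * x_cam / z_cam
    v = CY + F * y_cam / z_cam
    return u, v

def inverse_perspective_map(img, x_range=(-5, 5), y_range=(4, 40), out_size=(200, 200)):
    """Build a bird's-eye view: for every cell of a regular road-plane grid,
    sample the image pixel the cell projects to (nearest neighbour)."""
    rows, cols = out_size
    remapped = np.zeros(out_size, dtype=img.dtype)
    xs = np.linspace(*x_range, cols)
    ys = np.linspace(*y_range, rows)
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            u, v = ground_to_pixel(x, y)
            ui, vi = int(round(u)), int(round(v))
            if 0 <= vi < img.shape[0] and 0 <= ui < img.shape[1]:
                remapped[rows - 1 - i, j] = img[vi, ui]   # far points at the top
    return remapped
```

Note how the geometry behaves as expected: a point straight ahead of the camera projects onto the central image column, and farther ground points project higher in the image (toward the horizon).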

2.1 Extension of IPM to stereo vision

As a consequence of the depth loss caused by the acquisition process, a single two-dimensional image does not allow a three-dimensional reconstruction of the world without the use of some a-priori knowledge. In addition, when the target is the reconstruction of the 3D space, the solution becomes more complex due to the large amount of computation required by well-known approaches, such as the processing of stereo images. The traditional approach to stereo vision [3] can be divided into four steps: (i) calibration of the vision system; (ii) localization of a feature in one image; (iii) identification and localization of the same feature in the other image; (iv) 3D reconstruction of the scene. The problem of three-dimensional reconstruction can be solved by triangulation between points that correspond to the same feature (homologous points). Unfortunately, the determination of homologous points is a difficult task; however, the introduction of some domain-specific constraints (such as the assumption of a flat road in front of the cameras) can simplify it. In particular, when a complete 3D reconstruction is not required and the verification of the match with a given surface model suffices, the application of IPM to stereo images plays a strategic role. More precisely, since IPM can be used to recover the texture of a specific surface (the road plane), applying it to both stereo images provides two instances of the given surface, namely two partially overlapping patches. These two patches, thanks to the knowledge of the vision-system setup, can be brought into correspondence, so that homologous points share the same coordinates in the two remapped images [4].
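Since homologous road-plane points end up at the same coordinates in the two remapped images, verifying the surface model reduces to a pixel-wise comparison. A minimal sketch (the threshold value is an arbitrary placeholder, and perfect alignment of the two remapped views is assumed):

```python
import numpy as np

def flat_road_mismatch(left_remap, right_remap, threshold=20):
    """Compare two IPM-remapped views pixel-wise. Under the flat-road
    assumption, homologous points share the same remapped coordinates, so
    the road texture cancels out; large residuals flag pixels that violate
    the road-plane model (i.e. generic obstacles)."""
    diff = np.abs(left_remap.astype(np.int16) - right_remap.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```

On a synthetic flat-road texture the mask is empty; raising a patch of pixels in one view makes it reappear as a cluster in the mask.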

2.2 Extension of IPM to handle non-flat roads

The formulas that define the IPM rely heavily on knowledge of the geometry of the road surface, which is not always completely known. It can vary mainly for two different reasons:

- vehicle movements (pitch and roll), which change the reference system of the acquisition device with respect to the road;
- changes in the slope of the road.

While it has been shown that small vehicle movements and small changes in the road slope can be neglected for sufficiently short fields of view (up to 50 m), significant deviations from the flat-road assumption may lead to unacceptable deformations in the resulting image. For this reason an extension of the IPM technique to handle non-flat roads as well is currently under evaluation: thanks to the information obtained from pairs of stereo images, it is possible to derive the height of homologous points in the image using simple triangulations. The algorithm selects features of the image that belong to the road plane (in this implementation it selects road markings) and determines their height with respect to a flat-road model. In this way it is possible to measure the road slope and recalibrate the IPM procedure according to the new road model.
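A minimal sketch of the triangulation involved, assuming an idealized rectified stereo rig with zero pitch (the focal length, baseline, and camera height below are made-up values, and the geometry is a simplification of the real setup):

```python
# Hypothetical rectified stereo rig (illustrative values, not ARGO's geometry):
F = 500.0     # focal length (pixels)
CY = 120.0    # principal-point row
B = 1.0       # stereo baseline (m)
H = 1.2       # camera height above the assumed flat road (m)

def point_height(v, disparity):
    """Triangulate the height of a feature above the flat-road model.
    `v` is the feature's image row, `disparity` its left/right column
    difference in a rectified pair. A feature lying on the road plane
    returns ~0; a positive result indicates the road rises (or an
    obstacle) at that point."""
    z = F * B / disparity          # depth from stereo triangulation
    return H - (v - CY) * z / F    # height above the flat-road plane
```

For a road-plane feature the image row predicted by the flat-road model and the stereo depth agree, so the estimated height vanishes; any residual height measures the deviation from the flat-road model, which is what the recalibration step needs.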

3 GOLD, Generic Obstacle and Lane Detection on the ARGO Vehicle

3.1 Lane detection

The lane detection functionality (LD) relies on the presence of painted lane markings. The advantage offered by the use of the IPM is that in the remapped image road markings are represented by quasi-vertical, constant-width lines, brighter than their surrounding region. This simplifies the following detection steps and allows their implementation with a traditional pattern-matching technique on a SIMD system. The first step of road-marking detection is a low-level processing step aimed at detecting the pixels that have a higher brightness value than their horizontal neighbors at a given distance, thus obtaining a new image that encodes the horizontal brightness transitions and the presence of lane markings. Then, taking advantage of the vertical correlation of lane markings, this image is enhanced through a few iterations of a geodesic morphological dilation. Different illumination conditions and the non-uniformity of painted road signs require the use of an adaptive threshold for the binarization of the image. The binary image is scanned row by row in order to build chains of non-zero pixels (fig. 2.a). Each chain is then approximated with a polyline made of one or more segments, by means of an iterative process that brings the approximation error under a specific threshold (fig. 2.b). To get rid of possible occlusions or errors caused by noise, two or more polylines are joined into longer ones if they satisfy criteria such as a small distance between their nearest extrema or a similar orientation of their ending segments (fig. 2.c). When joining the polylines admits more than one solution, initially all of them are considered; then a filter is applied to remove all the polylines that feature a too high or too variable curvature (fig. 2.d). A road model is used to select the polyline which most likely matches the road center line. Each computed polyline is matched against this model using several parameters such as distance, parallelism, orientation, and length; the polyline that best fits these parameters is selected. Finally, a new road model is computed from the selected polyline, thus enabling the system to track the road centerline in image sequences.
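The low-level marking filter and the adaptive binarization might be sketched as follows. The neighbor distance and the per-row threshold rule are illustrative stand-ins for whatever values and adaptive rule GOLD actually uses:

```python
import numpy as np

def marking_response(remapped, half_width=3):
    """Low-level lane-marking filter on the remapped (bird's-eye) image:
    a pixel responds when it is brighter than both of its horizontal
    neighbors at a fixed distance, which is how quasi-vertical,
    constant-width markings appear after IPM. `half_width` is a guess."""
    img = remapped.astype(np.int16)
    left = np.roll(img, half_width, axis=1)    # pixel half_width to the left
    right = np.roll(img, -half_width, axis=1)  # pixel half_width to the right
    resp = np.minimum(img - left, img - right)  # brighter than BOTH neighbors
    resp[:, :half_width] = 0                    # discard wrap-around columns
    resp[:, -half_width:] = 0
    return np.clip(resp, 0, None).astype(np.uint8)

def adaptive_binarize(resp, fraction=0.5):
    """Adaptive threshold: binarize each row relative to its own maximum
    response, to cope with non-uniform paint and illumination (a simple
    stand-in for GOLD's actual adaptive rule)."""
    thresh = resp.max(axis=1, keepdims=True) * fraction
    return (resp > np.maximum(thresh, 1)).astype(np.uint8)
```

On a synthetic bird's-eye image with a bright vertical stripe, only the stripe columns survive the two steps, which is exactly the input the chain-building stage expects.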

Figure 2: The different steps of Lane Detection: (a) concatenation of pixels; (b) segmentation and construction of polylines; (c) joined polylines; (d) filtered polylines; (e) superimposition of the detected center line onto a brighter version of the original image, for displaying purposes only

Since the model assumed for the external environment (flat road) makes it possible to determine the spatial relationship between image pixels and the 3D world [2], from the previous result it is possible to derive both the road geometry and the vehicle position within the lane (fig. 2.e).

3.2 Obstacle detection

Obstacle Detection (OD) shares the same underlying approach (IPM). This is of basic importance, since the IPM transform can be performed only once and its result can be shared by the two processes. The flat-road model is checked through a pixel-wise difference between the two remapped images: in correspondence to anything rising up from the road surface in front of the vehicle (namely a generic obstacle), the difference image features sufficiently large clusters of non-zero pixels with a specific shape [5]. Due to the different angles of view of the stereo cameras, an ideal homogeneous square obstacle produces two clusters of pixels with a triangular shape in the difference image, in correspondence to its vertical edges. Obviously the triangles found in real cases are not so clearly defined and often not clearly disjoint, because of the texture, irregular shape, and non-homogeneous color of real obstacles, but they are anyway recognizable in the difference image (see figure 3.e). The obstacle detection process is thus based on the localization of these triangles.

A polar histogram is used for the detection of the triangles: it is obtained by scanning the difference image with respect to a focus, considering every straight line originating from the focus itself and counting the number of over-threshold pixels lying on that line (figure 3.f). The values of the polar histogram are then normalized, and a low-pass filter is applied in order to decrease the influence of noise (figure 3.g). The polar histogram presents an appreciable peak corresponding to each triangle. The position of a peak within the histogram determines the angle of view under which the obstacle edge is seen. Peaks generated by the same obstacle, for example by its left and right edges, must be joined in order to consider the whole area between them as occluded.

Starting from the analysis of a large number of different situations, a criterion for grouping the peaks has been devised that takes into account several characteristics, such as the peaks' amplitude and width, the area they subtend, and the interval between them. After the peak-joining phase, the angle of view under which the whole obstacle is seen is computed from the peaks' position, amplitude, and width. In addition, the obstacle distance can be estimated by a further analysis of the difference image along the directions pointed out by the maxima of the polar histogram, in order to detect the corners of the triangles: they represent the contact points between obstacles and the road plane and thus hold the information about the obstacle distance. For each peak of the polar histogram, a radial histogram is computed by scanning a specific sector of the difference image, whose width is determined as a function of the peak width [6]. The number of over-threshold pixels lying in the sector is computed for every distance from the focus, and the result is normalized. A simple threshold applied to the radial histogram detects the position of the triangle corners and thus the obstacle distance. The result is displayed with black markers superimposed on a brighter version of the left image; the markers' position and size encode both the distance and the width of the obstacles (see figure 3.h).
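As a rough illustration, the polar-histogram scan over the binary difference image might look like this sketch (the focus position, angular resolution, and ray-sampling step are illustrative choices, not GOLD's actual parameters, and the low-pass filtering step is omitted):

```python
import numpy as np

def polar_histogram(diff_mask, focus, n_angles=180):
    """Scan a binary difference image along rays originating from `focus`
    (row, col), counting over-threshold pixels on each ray; obstacle edges,
    which appear as roughly triangular clusters, produce peaks at the
    angle of view under which they are seen. Values are normalized."""
    rows, cols = diff_mask.shape
    hist = np.zeros(n_angles)
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    fr, fc = focus
    max_len = int(np.hypot(rows, cols))
    for k, a in enumerate(angles):
        for t in range(1, max_len):            # sample points along the ray
            r = int(round(fr - t * np.sin(a)))  # rays go upward from the focus
            c = int(round(fc + t * np.cos(a)))
            if 0 <= r < rows and 0 <= c < cols:
                hist[k] += diff_mask[r, c]
    if hist.max() > 0:
        hist /= hist.max()                      # normalize
    return hist
```

With a synthetic vertical cluster straight ahead of the focus, the histogram peaks at 90 degrees, i.e. the angle of view of that (simulated) obstacle edge.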

Figure 3: Obstacle detection: (a) left and (b) right stereo images, (c) and (d) the remapped images, (e) the difference image, (f) the angles of view overlapped with the difference image, (g) the polar histogram, and (h) the result of obstacle detection using a black marker superimposed on the acquired left image; the thin black line highlights the road region visible from both cameras

3.3 System calibration

Since the processing is based on stereo vision, camera calibration plays a basic role in the success of the approach. It is divided into two steps.

Supervised calibration: the first part of the calibration process is an interactive step. A grid of known size has been painted onto the ground, and two stereo images are captured and used for the calibration. Thanks to an X-Window-based graphical interface, a user selects the intersections of the grid lines with a mouse; these intersections represent a small set of homologous points whose world coordinates are known to the system, and this mapping is used to compute the calibration parameters. This first step is intended to be performed only once, when the orientation of the cameras or the vehicle trim has changed.

Automatic parameter tuning: after the supervised phase, the computed calibration parameters have to be refined. Moreover, small changes in the vision-system setup or in the vehicle trim due to vehicle movements require a periodic tuning of the calibration. The parameter tuning consists of an iterative procedure based on the application of the IPM transform to stereo images (see section 2.1).
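Since the flat-road mapping between ground coordinates and image pixels is a planar homography, one plausible way to turn the user-selected grid intersections into calibration parameters is the standard least-squares DLT fit sketched below. This is a generic stand-in, not ARGO's actual calibration procedure:

```python
import numpy as np

def fit_homography(ground_pts, image_pts):
    """Estimate, via the standard DLT least-squares method, the 3x3
    homography mapping road-plane coordinates (x, y) to image pixels
    (u, v) from a set of known correspondences, such as the grid
    intersections picked during supervised calibration."""
    A = []
    for (x, y), (u, v) in zip(ground_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)       # null vector of A, reshaped
    return H / H[2, 2]             # fix the overall scale ambiguity

def apply_homography(H, pt):
    """Map a road-plane point through H (homogeneous division)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

With at least four non-degenerate correspondences the fit is exact on noiseless data; with more points (a full grid) the SVD gives the least-squares solution, which is what makes redundant grid intersections useful.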

3.4 A cooperative approach

A new cooperative approach, which performs a deeper fusion of the results of the two modules (lane detection and obstacle detection) and also takes into account the high temporal correlation of image sequences, is currently under evaluation. Once obstacles have been detected and localized, the knowledge of their position can be exploited by the lane detection module, since obstacles generally obstruct the visibility of road markings: the regions occluded by obstacles are not considered in the search for road markings, thus reducing the noisy features that may disturb the retrieval of the road geometry. Similarly, assuming the road geometry (namely the position of road markings within the image) and the precise position of obstacles are known, a pair of stereo images can be analyzed to validate the assumption of planarity of the road surface: the stereo pair can be used to recalibrate the stereo system and adapt it to new road surfaces, such as hills, bridges, or non-planar highway ramps; the new model then replaces the old one and is used by both the OD and LD modules.

Figure 4: Block diagram of the cooperative approach: the lower part depicts the data stream, while the upper part shows the control flow of the algorithm

According to this cooperative approach, the OD and LD modules are triggered alternately; each module exploits the results produced by the previous module and feeds the following one with its own results, thus keeping an updated data structure describing the three dynamic parameter sets: (1) obstacle positions, (2) road geometry, and (3) road slope. This iterative process is initialized by hand when no obstacles are visible and the road in front of the vehicle is flat; the initialization phase is used to determine the calibration of the acquisition system, i.e. the static parameters. This integrated approach is depicted in figure 4.
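The alternating control flow can be sketched as a simple loop over a shared state; the three module callbacks below are stubs standing in for the GOLD routines described above, and the state layout is an assumption made for illustration:

```python
def run_cooperative_loop(frames, detect_obstacles, detect_lane,
                         update_road_model, initial_state):
    """Alternate the OD and LD modules: each module reads the state left
    by the previous one and refreshes its own slice of the three dynamic
    parameter sets (obstacle positions, road geometry, road slope)."""
    state = dict(initial_state)  # initialized by hand on a flat, empty road
    for frame in frames:
        # OD runs first and masks regions that would pollute marking search ...
        state["obstacles"] = detect_obstacles(frame, state)
        # ... LD then searches markings outside the occluded regions ...
        state["geometry"] = detect_lane(frame, state)
        # ... and the stereo pair revalidates/updates the road model.
        state["slope"] = update_road_model(frame, state)
    return state
```

With trivial stub modules the loop just threads the state through; the point of the structure is that each callback receives the state the previous module left behind, matching the data-stream/control-flow split of figure 4.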

4 Discussion

In this paper the ARGO autonomous vehicle has been presented. It was demonstrated and tested on a number of different highways, freeways, and country roads in Italy. The main features of the automatic driving system are that it is based on the processing of information acquired by passive sensors only (cameras), and that the hardware is built from low-cost off-the-shelf components only (such as video-phone cameras and a Pentium MMX processor). The whole processing takes less than 20 ms and, since the acquisition of a single field takes 20 ms, the system reaches real-time performance; in this case, the bottleneck of the system is the acquisition time. During the tests, the system proved to be robust and reliable: obstacles were always detected, and only in a few cases (i.e. on paved or, more generally, rough roads) did vehicle movements become so considerable that the processing of noisy remapped images led to the erroneous detection of small false obstacles. On the other hand, thanks to the remapping process, lane markings were located even in the presence of shadows or other artifacts on the road surface. Up-to-date information, as well as images and video clips, can be found on ARGO's official web site: http://www.ce.unipr.it/ARGO.

References

[1] Alberto Broggi, Gianni Conte, Francesco Gregoretti, Claudio Sansoè, Roberto Passerone, and Leonardo M. Reyneri, "Design and Implementation of the PAPRICA Parallel Architecture", The Journal of VLSI Signal Processing, 1997, in press.
[2] H. A. Mallot, H. H. Bülthoff, J. J. Little, and S. Bohrer, "Inverse perspective mapping simplifies optical flow computation and obstacle detection", Biological Cybernetics, vol. 64, pp. 177–185, 1991.
[3] Olivier Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, The MIT Press, 1993.
[4] Massimo Bertozzi, Alberto Broggi, and Alessandra Fascioli, "Stereo Inverse Perspective Mapping: Theory and Applications", Image and Vision Computing Journal, 1998, in press.
[5] Alessandra Fascioli, "Localizzazione di ostacoli mediante elaborazione di immagini stereoscopiche" (Obstacle localization through the processing of stereo images), Master's thesis, Università degli Studi di Parma - Facoltà di Ingegneria, 1995.
[6] Massimo Bertozzi and Alberto Broggi, "GOLD: a Parallel Real-Time Stereo Vision System for Generic Obstacle and Lane Detection", IEEE Transactions on Image Processing, vol. 7, no. 1, pp. 62–81, Jan. 1998.
