Towards All Around Automatic Visual Obstacle Sensing for Cars

Michael E. Brauckmann (1), Christian Goerick (1), Jürgen Groß (1,2), and Thomas Zielke (2)

(1) Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany
Phone: +49-234-700.5567, FAX: +49-234-709.4209, e-mail: [email protected]
(2) C-VIS Computer Vision und Automation GmbH, Universitätsstr. 142, 44799 Bochum, Germany
Abstract

We currently work on the implementation of an integrated prototype system that provides "all around" automatic visual obstacle sensing for a Daimler-Benz test car. Most of the machine vision techniques being used have been developed within the European PROMETHEUS program and a number of other research projects carried out by the authors and other affiliates of their institutions. These include robust symmetry measuring, neural net-based adaptive object detection and tracking, inverse-perspective stereo image matching, and robust scale estimation in time.
1 Introduction

Within the PROMETHEUS program, computer vision research on efficient and reliable obstacle detection has made considerable progress. In this paper, we describe an integrated obstacle detection system that receives visual input from up to sixteen cameras. The primary visual tasks of the system are:

1. Detection and tracking of road vehicles from behind (CarTrack), used for intelligent cruise control.
2. Short range frontal detection of elevated objects (VisionBumper), used for warning functions in city traffic or during stop & go on highways.
3. Lateral monitoring of neighbour lanes (SideView), as part of automatic blind spot surveillance.

Some of the most ambitious activities within PROMETHEUS are centered around the Daimler-Benz test vehicles VITA I and II [6, 7]. This paper gives an overview of research and development carried out at the Ruhr-University and C-VIS, contributing to the realization of the video-based obstacle sensing system of VITA II.
2 Sensor and Environment Model

A video camera is not the most obvious choice for an obstacle detection sensor. A passive visual sensor cannot directly measure the distance to an object, but the performance of human driving, which is mainly controlled by vision, proves that a visual sensor is in principle the most versatile one. An automatically guided vehicle may have several different sensors for the perception of its environment. However, the processes that control the physical actions of the vehicle, e.g. braking and steering, only need to be fed with a simplified model of the vehicle's environment.

Figure 1 is a graphical illustration of the environment model which our obstacle detection systems can produce. We distinguish three zones around the car, and there is a dedicated obstacle detection process for each of them. For a close range in front of the car this is a module dubbed VisionBumper; for larger distances along the road the CarTrack module watches out for other vehicles; and alongside the car a distance of approximately one lane width is covered by the SideView module. For simplicity, we do not consider the road behind the car here. The zones marked in Fig. 1 can simply be reflected about the car to get "all around" obstacle detection.

According to the specific computer vision methods applied, obstacles are modelled differently in each of the environment zones. CarTrack is specialized in recognizing the rear and frontal views of road vehicles. It can deal with up to three objects concurrently. This conforms to the requirement of the demonstration scenario, which is autonomous driving on a motorway with three lanes on either side. The vehicle objects detected and tracked by CarTrack are modelled as boxes in a bird's-eye view representation as depicted in Fig. 1. The boxes are "open" at the invisible side of the objects, reflecting the fact that we cannot measure the extent of the objects in the "x" direction.
Figure 1: Abstract model of the autonomous car's environment and its obstacle perception.

Short range detection of any kind of obstacle object is accomplished by the VisionBumper system. It is based on stereo image processing, and by this means it measures the distance to any visible elevated object within the range of operation. VisionBumper distinguishes between two separate objects at most. This again is sufficient, following the argument that a higher number of obstacle objects within this range is very unlikely on the one hand and would not make any difference for the desired vehicle reaction on the other hand. The system for lateral obstacle detection, SideView, is based on both stereo and motion image processing. The main task of the system is the detection of overtaking and overtaken vehicles. Since the cameras typically see only parts of the vehicles driving alongside, the obstacle model in the bird's-eye view is a box with two indefinite sides (see Fig. 1). This however is sufficient to capture the crucial information, which is lateral distance and relative speed. A sketch of a data structure for such partially open boxes is given below.
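For illustration only, the partially open boxes of this environment model could be represented by a small record type. The field names and the use of None for an indefinite side are our assumptions, not part of the original system:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ObstacleBox:
        """Bird's-eye obstacle box in vehicle coordinates (x ahead, y lateral).

        None marks an indefinite side: CarTrack leaves the invisible x-extent
        open, SideView leaves two sides open (hypothetical encoding).
        """
        y_left: float                      # lateral box limits [m]
        y_right: float
        x_near: float                      # nearest measurable x-extent [m]
        x_far: Optional[float] = None      # open side: extent not measurable
        rel_speed: Optional[float] = None  # relative speed [m/s], if estimated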
3 CarTrack

CarTrack is a specialized monocular visual sensor system for the car-following scenario [8]. The system can reliably detect, track, and measure rear or frontal views of automobiles in a dynamic image taken from a following car or a leading car, respectively (the latter case is not dealt with explicitly in the following). There are four major visual tasks the system has to cope with:

1. Detecting leading cars or other objects on the road. This means repeated visual scanning of the road in front of the car, using appropriate criteria for object definition.
2. Visual tracking of the objects while their image position and size may vary greatly.
3. Accurate measurement of the dynamic image size of the individual cars (objects) being tracked.
4. Classification of objects that are being tracked.

The class of objects detected by CarTrack includes normal cars of all sizes as well as lorries and conventional trailers. Figure 2 shows a still from the system's display used for visualizing the image processing results during realtime operation. The capabilities and the robustness of CarTrack are still being enhanced and further development work is under way. A previous version of the system has been described in [8].
3.1 Exploiting Symmetry
One of the methods used by CarTrack for object detection exploits the symmetry typical of the rear of most vehicles on normal roads. Mirror symmetry with respect to a vertical axis is one of the most striking generic shape features available for object recognition in a car-following situation. Initially, we use an intensity-based symmetry finder to detect image regions that are candidates for a leading car. The vertical axis of symmetry obtained from this step is also an excellent feature for measuring the leading car's relative lateral displacement in consecutive images, because it is invariant under (vertical) nodding movements of the camera and under changes of object size. To exactly measure the image size of the car in front, a novel edge detector has been developed which enhances pairs of edge points if the local orientations at these points are mutually symmetric with respect to a known symmetry axis. Edge points that do not have a mirror-symmetric counterpart are suppressed.
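A minimal sketch of such an intensity-based symmetry search is given below, assuming a grey-level image region and a fixed search half-width; the scoring by mean absolute difference and all parameter values are our assumptions, not the actual CarTrack implementation:

    import numpy as np

    def symmetry_score(img, half_width=40):
        """Score each column as a candidate vertical symmetry axis (sketch)."""
        img = img.astype(np.float32)
        h, w = img.shape
        scores = np.full(w, -np.inf)
        for c in range(half_width, w - half_width):
            left = img[:, c - half_width:c]            # columns left of axis
            right = img[:, c + 1:c + half_width + 1]   # columns right of axis
            mirrored = right[:, ::-1]                  # reflect about the axis
            # Negative mean absolute difference: 0 for perfect mirror symmetry
            scores[c] = -np.mean(np.abs(left - mirrored))
        return scores

    # The best axis candidate in a region of interest:
    # axis = int(np.argmax(symmetry_score(region)))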
3.2 Flexibility by Using Neural Nets and Local Orientation Coding
The other main method used by CarTrack is a neural network-based approach recently developed by Goerick and Brauckmann [4]. It greatly enhances the robustness of the system with respect to variations of object shapes in the image. Neural nets are not yet widely used in realtime computer vision systems, despite their attractive properties. The main reason is that the capabilities of neural nets for learning, generalization, approximation, and classification can only be made use of in combination with an appropriate feature extraction process. We use a particularly efficient kind of local orientation coding for this purpose. The image features obtained from the preprocessing stage are bit strings, each representing a binary code for the directional grey-level variation in a pixel neighborhood. More formally, the operator is defined as

    b'(n, m) = \sum_{(i,j)} k(i, j) \, u\bigl( b(n, m) - b(n+i, m+j) - t(i, j) \bigr), \quad (i, j) \in \text{neighborhood},    (1)

where b(n, m) denotes the (grey scale) input image, b'(n, m) the output 'image', k(i, j) a coefficient matrix, t(i, j) a threshold matrix, and u(z) the unit step function defined by

    u(z) = \begin{cases} 0, & z < 0 \\ 1, & z \geq 0. \end{cases}
Note that the matrices may have negative index values. The output 'image' consists of labels, where each label corresponds to a specific orientation of the neighborhood. For an N4 and an N8 neighborhood on regular square grids, suitable choices for the coefficient matrices are

    k_{N_4} = \begin{pmatrix} 0 & 1 & 0 \\ 2 & R & 4 \\ 0 & 8 & 0 \end{pmatrix}, \qquad
    k_{N_8} = \begin{pmatrix} 1 & 2 & 4 \\ 8 & R & 16 \\ 32 & 64 & 128 \end{pmatrix},
where R is the reference position. This choice for N4 leads to a set of labels b'(n, m) ∈ {0, ..., 15}, corresponding to certain local structures. The choice of the coefficients and the formulation of the operator give rise to some useful properties: due to the unique separability of the sum into its components, the information about the local orientation is preserved; the approach is invariant to absolute intensity values; and the search for certain structures in the image reduces to working with different sets of labels. For horizontal structures, mainly the labels 1, 8, and 9 have to be considered. An adaptation mechanism for the parameters t(i, j) of the coding algorithm yields a high level of flexibility with respect to lighting conditions [3]. The local orientation coding is combined with a lateral histogram technique. From the orientation code histograms we obtain guesses for image regions likely to show a vehicle on the road. A neural net classifier is then used to decide on the presence of a car in a given image region. Depending on the image samples used for training the neural net, the classification can be specifically tuned to a certain kind of vehicle, e.g. van or truck, or it can be kept "fuzzy" in order to accept most kinds of vehicles. A minimal sketch of the coding operator and the lateral histogram cue is given below.
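The sketch implements Eq. (1) for the N4 case; the fixed threshold t stands in for the adaptive mechanism of [3], and the wrap-around border handling is a simplification:

    import numpy as np

    # Coefficients k(i,j) for the N4 neighborhood, cf. Eq. (1):
    #     0 1 0
    #     2 R 4
    #     0 8 0
    N4 = {(-1, 0): 1, (0, -1): 2, (0, 1): 4, (1, 0): 8}

    def orientation_code(img, t=8):
        """Local orientation labels b'(n,m) in {0,...,15} for the N4 case."""
        img = img.astype(np.int32)
        code = np.zeros(img.shape, dtype=np.int32)
        for (i, j), k in N4.items():
            neighbour = np.roll(np.roll(img, -i, axis=0), -j, axis=1)
            # u(b(n,m) - b(n+i,m+j) - t): contribute k where the centre
            # pixel exceeds the neighbour by at least t
            code += k * (img - neighbour >= t)
        return code

    # Lateral histogram cue for horizontal structures (labels 1, 8, 9):
    # code = orientation_code(img)
    # row_hist = np.isin(code, [1, 8, 9]).sum(axis=1)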
3.3 Estimating Distance and Relative Speed
Figure 2: The CarTrack system in operation. The leading car in the center is being tracked and its size is measured. The left car has just been detected and tracking is being initialized.
The visual information that CarTrack tries to extract from the camera images is the dynamic size (in image coordinates) of a leading vehicle. In a natural environment the essential problem is that of reliable image segmentation. Because of the ever-changing local contrast between objects and background in the image, a model-based approach is required for the segmentation step. After having detected a leading car, the vision system continuously tracks its location in the video image while taking measurements of the image width of the car. The model relied on in this case is the assumption that the rear view of most vehicles is approximately symmetric. From the apparent width of the leading car an estimate of the distance can be calculated if the real size of the leading car is known. The width of road vehicles varies significantly only between different types, e.g. passenger cars vs. trucks. Therefore we classify the image objects accordingly and use a fixed average width for each type of vehicle. In addition to the distance, the relative speed of the detected vehicles is to be estimated. In principle, this could be calculated by a discrete time derivative of the distance. However, the width of a vehicle measured in the image cannot be rendered a noiseless quantity in practice. Therefore the goal has been to arrive at an estimate of the relative speed without using derivatives of any parameter measured from the image. This has been achieved by means of an image processing method which directly determines the relative scale factor between two images of the same object. While tracking a leading vehicle, this scale factor is a dynamic variable s(t) which can be measured for each pair of video frames shot Δt apart in time. If the motion is directed approximately along the line of sight, the relative speed v(t) can be obtained from
    v(t) = \frac{f \, B \, \bigl( 1 - \sqrt{s(t)} \bigr)}{b(t) \, \Delta t},    (2)
where f is the focal length of the camera, B the assumed real width of the vehicle, and b(t) its image width. It is interesting to note that the relative scale factor s(t) directly yields the so-called time-to-collision τ(t), which is regarded as the most important optical variable for the collision avoidance capabilities found in biological vision:
    \tau(t) = \frac{\Delta t}{1 - \sqrt{s(t)}}.    (3)
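For illustration, the helpers below evaluate Eqs. (2) and (3) as reconstructed above; the function names are ours, and consistent units are assumed (f and b(t) in pixels, B in metres, Δt in seconds):

    def distance(f, B, b_t):
        """Pinhole estimate of range: d(t) = f * B / b(t)."""
        return f * B / b_t

    def relative_speed(f, B, b_t, s_t, dt):
        """Eq. (2): v(t) = f * B * (1 - sqrt(s(t))) / (b(t) * dt)."""
        return f * B * (1.0 - s_t ** 0.5) / (b_t * dt)

    def time_to_collision(s_t, dt):
        """Eq. (3): tau(t) = dt / (1 - sqrt(s(t))); diverges as s(t) -> 1."""
        return dt / (1.0 - s_t ** 0.5)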
The method developed for measuring the relative scale factor s(t) does not rely on the assumption of symmetric objects. It is based on a logarithmic polar mapping and the cepstrum analysis. In a first stage, the signal of the segmented area is transformed into a shift-invariant representation by calculating the absolute value of its Fourier transform, which preserves the scale information. A logarithmic polar mapping then converts the scale information into a shift information. The mapped representations of two successive images are combined into a sampled function f[nT, mT]. The relative shift between these representations is then measured by means of a cepstrum analysis, where the cepstrum f̂[nT, mT] is defined as

    \hat{f}[nT, mT] = \bigl\| \mathcal{F}^{-1} \bigl\{ \ln \bigl( \| \mathcal{F}\{ f[nT, mT] \} \|^2 \bigr) \bigr\} \bigr\|^2,    (4)

where \mathcal{F} denotes the discrete Fourier transform. The cepstrum produces a peak at the position that corresponds to the relative shift [1]. This is similar to common correlation techniques, but it has proven to be more robust. Once the shift is measured, the dynamic scale factor s(t) is obtained by evaluating the inverse of the logarithmic map function, the exponential function, at the position of the shift. Due to the properties of the mapping mentioned above, the rotation within the image plane can also be determined, but this is of minor interest in this application. After both images of the object have been rescaled to a common object size, we measure the relative spatial shift of the object being tracked. An additional verification step consists of calculating the correlation coefficient. The resulting representation of the object is of constant position and constant size, while the background moves due to egomotion. This can be used to distinguish between object and background by spatio-temporal filtering, a temporal low-pass and a spatial edge-sensitive filter. To improve the segmentation, the described method is combined with the process of the symmetry module described in Section 3.1. A schematic sketch of the scale-measurement pipeline is given below.
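The following is only a schematic reconstruction of this pipeline in Python (NumPy/SciPy as stand-ins for the original realtime implementation). Grid resolutions, the stacking of the two log-polar maps to turn the scale shift into a cepstral echo, the peak search window, and the sign convention are all our assumptions:

    import numpy as np
    from scipy.ndimage import map_coordinates

    def log_polar_fft_mag(img, n_r=64, n_th=64):
        """Shift-invariant representation: |FFT| resampled on a log-polar grid."""
        mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
        cy, cx = (np.asarray(mag.shape) - 1) / 2.0
        log_r = np.linspace(0.0, np.log(min(cy, cx)), n_r)
        th = np.linspace(0.0, np.pi, n_th, endpoint=False)
        rr = np.exp(log_r)[:, None]
        rows = cy + rr * np.sin(th)[None, :]
        cols = cx + rr * np.cos(th)[None, :]
        # Scale in the image becomes a shift along the log-radius axis
        return map_coordinates(mag, [rows, cols], order=1), log_r[1] - log_r[0]

    def relative_scale(img_prev, img_curr):
        """Estimate the scale factor s(t) between two views of the same object."""
        lp0, dlr = log_polar_fft_mag(img_prev)
        lp1, _ = log_polar_fft_mag(img_curr)
        n_r = lp0.shape[0]
        # Stack both maps along the log-radius axis; the scale change now
        # appears as an echo at lag (n_r + d) in that direction.
        f = np.vstack([lp0, lp1])
        # Power cepstrum, cf. Eq. (4)
        ceps = np.abs(np.fft.ifft2(np.log(np.abs(np.fft.fft2(f)) ** 2 + 1e-12))) ** 2
        profile = ceps.sum(axis=1)
        # Strongest echo near lag n_r gives the log-radius shift d
        window = profile[n_r - n_r // 2 : n_r + n_r // 2]
        d = int(np.argmax(window)) - n_r // 2
        # Spatial scaling by s scales the spectrum by 1/s, hence the minus sign
        return float(np.exp(-d * dlr))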
4 VisionBumper
The concept of a vision bumper is somewhat different from how we conceptually think of an obstacle detection module like CarTrack, for example. A bumper "detects" any object that a car is about to collide with seriously. However, it is not one of the normal means of obstacle detection for a driver. Similarly, the output of the VisionBumper system is not meant to be used for the planning of driving manoeuvres or any kind of high-level representation of the car's environment. The vision bumper just notices any object intruding into an invisible safety zone in front of the car. For VisionBumper, an obstacle is any kind of object having some significant elevation with respect to the ground plane, i.e. the road surface the car is moving on. Conversely, a sheet of paper lying on the road, for example, may give rise to a conspicuous object in the camera image but should not be regarded as an obstacle by the vision system.

VisionBumper is a stereo vision system. If the positions of a stereo camera pair are sufficiently far apart, the distance to any surface point "seen" by both cameras can, in principle, be obtained by means of optical triangulation. However, there is no simple and reliable way of finding two corresponding image locations of the same surface point in 3D (the correspondence problem). We therefore have built a specialized stereo vision system which exploits knowledge about the perspective projection of a planar ground area seen simultaneously from a pair of cameras. The essential image processing operation is a geometrical transformation, a so-called inverse-perspective mapping, which effectively compensates the stereo disparities of the ground plane. After the mapping operation, the two transformed images are compared and local mismatches are interpreted as possible obstacle locations; a sketch of this test is given below. VisionBumper is descended from a similar obstacle detection system which was originally conceived for automatically guided vehicles in a factory environment. This system was used for VISOCAR, which at that time was the world's first commercially available driverless transport system guided primarily by video cameras [5].
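The sketch below warps a stereo pair onto a common ground-plane grid and flags local mismatches. The use of OpenCV, the precomputed homographies H_left/H_right, and the simple absolute-difference threshold are our assumptions; the original VisionBumper implementation certainly differed:

    import cv2  # modern stand-in; the original system predates OpenCV

    def obstacle_candidates(left, right, H_left, H_right, grid_size, thresh=30):
        """Inverse-perspective obstacle test on one stereo pair (sketch).

        H_left and H_right are 3x3 homographies, calibrated offline, that map
        each camera image onto a common bird's-eye grid of the ground plane.
        Ground-plane points then coincide in both remapped views; anything
        elevated is distorted differently and shows up as a mismatch.
        """
        ipm_l = cv2.warpPerspective(left, H_left, grid_size)
        ipm_r = cv2.warpPerspective(right, H_right, grid_size)
        diff = cv2.absdiff(ipm_l, ipm_r)
        return diff > thresh  # boolean mask of possible obstacle locations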
5 SideView
The main purpose of lateral vision is blind spot surveillance for the prevention of hazardous lane changing. Stabilization of automatic road following by local lane marker recognition and/or the detection of bounding upright structures are additional goals. A strong argument in favour of using cameras as lateral sensors is in fact the feasibility of accomplishing all of the above goals with the same sensor. However, the imaging conditions when looking, as it were, out of a car's side window are not favourable because of the very high image velocities that occur. We built a special miniature fan camera which integrates three separate CCD area sensors with wide angle lenses into a single case. The fan camera is illustrated in Figure 4. The lateral angle of vision provided by the fan camera is almost 180 degrees. The arrangement of three CCD sensors making up the fan camera has advantages over a fish-eye lens. For one thing, the shape of the field of vision is much better adapted to the particular situation, i.e. it is much wider than high. Splitting a large field of view into several video images has the additional advantage of a better, i.e. a more local, brightness adaptation by the automatic exposure control.

Figure 3: Inverse-perspective mapping and obstacle detection demonstrated for the SideView system. The small overlaid image shows the result of the mapping operation. Obstacles are detected through their different distortions in mapped stereo images.

The SideView system uses the same general approach to obstacle detection as VisionBumper. Again, from the cameras looking at the road, we try to get the information whether there is anything in the field of view which cannot be part of the expected road surface. Both sides of the test car have a pair of fan cameras attached to the bodywork, so that there are large binocular fields of vision which can be used for inverse-perspective stereo image matching as described in Section 4. In addition to stereo image processing, a monocular obstacle detection method has been employed for SideView; a sketch follows below. It is based on optical flow analysis and is therefore limited to situations where obstacle objects are in motion with respect to the camera. The method exploits the fact that the segmentation of the optical flow field into ground plane motion and object motion is greatly facilitated by the inverse-perspective mapping. Technical details are described in [2], where additional references can be found too. Since not all cameras mounted in the test vehicle can be connected to the computer vision system at the same time, an external video multiplexer is used for "scanning" the cameras under computer control.
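As a hedged illustration of the monocular cue: after inverse-perspective mapping, all true ground-plane points move with a common, predictable image translation, so flow vectors deviating from it indicate elevated or independently moving objects. The dense-flow routine (OpenCV's Farnebäck method), the ego-motion shift parameter, and the threshold are our assumptions, not the method of [2]:

    import numpy as np
    import cv2  # modern stand-in; the original system predates OpenCV

    def moving_obstacle_mask(ipm_prev, ipm_curr, ego_shift, thresh=2.0):
        """Monocular obstacle cue in the inverse-perspective image (sketch).

        ego_shift is the (dx, dy) pixel translation of the ground plane
        between the two remapped frames, derivable from odometry.
        """
        flow = cv2.calcOpticalFlowFarneback(
            ipm_prev, ipm_curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        # Residual flow after subtracting the predicted ground-plane motion
        residual = flow - np.asarray(ego_shift, dtype=np.float32)
        return np.linalg.norm(residual, axis=2) > thresh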
Figure 4: The fan camera consisting of three CCD sensor modules and wide angle lenses. (The drawing is annotated with the dimensions approx. 50 mm, 15 mm, 30 mm, and 20 mm, and the fields of view 60°, 73°, and 60°.)
6 Conclusions

We have presented a multifunctional obstacle detection system for normal road vehicles. It is based on an almost "all around" video view provided by sixteen miniature low-cost CCD sensors. A parallel processor system consisting of eight high-performance signal processors (TMS320C40) is the embedded hardware for the realtime implementation used in the Daimler-Benz test car. Three specialized TIM-40 processor nodes incorporating digitizer chips and frame buffers perform the image acquisition. Stereo image acquisition is done by RGB color digitizers providing up to three-channel simultaneous image capture. The obstacle detection system has been integrated in the test vehicle to cooperatively provide intelligent environment sensing along with other visual sensor functions like lane following and traffic sign recognition [7].
7 Acknowledgements

The work described in this paper has been partially funded by the Daimler-Benz AG, by the German Federal Ministry of Research and Technology (project PROMETHEUS), and by the European Union as part of the Esprit project CLEOPATRA. We particularly thank Volker Freiburg for his contributions to the development of CarTrack and his skillful mastering of the multi-'C40 system. The art work done by Frank Joublin has been highly appreciated.
References

[1] B. P. Bogert, M. J. Healy, and J. W. Tukey. The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking. In M. Rosenblatt, editor, Time Series Analysis. Wiley, New York, 1963.

[2] S. Bohrer, M. Brauckmann, and W. von Seelen. Visual obstacle detection by a geometrically simplified optical flow approach. In Proc. 10th European Conference on AI, ECAI 92, Vienna, pages 811-815, 1992.

[3] Ch. Goerick. Local Orientation Coding and Adaptive Thresholding for Real Time Early Vision. Technical Report IR-INI 94-05, Institut für Neuroinformatik, Ruhr-Universität Bochum, 1994.

[4] Ch. Goerick and M. Brauckmann. Local Orientation Coding and Neural Network Classifiers with an Application to Real-Time Car Detection. In Proc. Symposium DAGM/ÖAGM, Wien, 1994.

[5] K. Storjohann, Th. Zielke, H. A. Mallot, and W. von Seelen. Visual Obstacle Detection for Automatically Guided Vehicles. In Proc. IEEE Conf. Robotics and Automation, Cincinnati, pages 761-766, 1990.

[6] B. Ulmer. VITA: An Autonomous Road Vehicle (ARV) for Collision Avoidance in Traffic. In Proc. Intelligent Vehicles '92, Detroit, pages 36-41, 1992.

[7] B. Ulmer. VITA II: Active Collision Avoidance in Real Traffic. In Proc. Intelligent Vehicles '94, Paris, France (this volume), 1994.

[8] Th. Zielke, M. Brauckmann, and W. von Seelen. CARTRACK: Computer Vision-Based Car-Following. In Proc. IEEE Workshop on Applications of Computer Vision, Palm Springs, pages 156-163, 1992.