Real-Time Embedded System for Rear-View Mirror Overtaking Car Monitoring

Javier Díaz¹, Eduardo Ros¹, Sonia Mota², and Rodrigo Agis¹

¹ Dep. Arquitectura y Tecnología de Computadores, Universidad de Granada, Spain
² Dep. Informática y Análisis Numérico, Universidad de Córdoba, Spain
{jdiaz, eros, ragis}@atc.ugr.es, [email protected]
Abstract. The main goal of an overtaking monitoring system is the segmentation and tracking of the overtaking vehicle. This application can be addressed through an optical-flow-driven scheme. We can focus on the rear-mirror visual field by placing a camera on top of the mirror. When driving, the ego-motion optical flow pattern is largely unidirectional: all the static objects and landmarks move backwards, while the overtaking cars move forward towards our vehicle. This well-structured motion scenario facilitates the segmentation of the regular motion patterns that correspond to the overtaking vehicle. Our approach is based on two main processing stages: first, the computation of optical flow using a novel superpipelined and fully parallelized architecture capable of extracting the motion information at frame rates of up to 148 frames per second at VGA resolution (640x480 pixels); second, a tracking stage based on motion pattern analysis that provides an estimated position of the overtaking car. We analyze the system performance and resource consumption and show promising results on a bank of overtaking car sequences.
1 Introduction

The blind spot of the rear-view mirror and driver distractions are the source of many accidents. A camera placed on the car allows us to detect the overtaking vehicle using optical flow algorithms, and this information can be used to generate alert signals for the driver. The optical-flow-driven scheme has several properties that are very useful for car segmentation. Basically, focusing on the optical flow field, we should find the static objects and landmarks moving backwards (due to our ego-motion) and the overtaking cars moving forward towards our vehicle. We must also take the perspective deformation into account: the optical flow of a moving object is not homogeneous, since the parts of the object that are far from the camera seem to move more slowly than the ones that are closer, so a set of continuously changing velocities can be found along the same object. On-board cameras have been used for lane tracking [1] and also in front/back vision for obstacle avoidance [2], but the application we address here focuses on a different field of view, the rear-view mirror, and it is important to emphasize that we have to deal with the perspective deformation. This scenario forces the use of sophisticated clustering techniques, such as neural networks, when only sparse
features are used [3]. In the approach presented here, however, we rely on a dense optical flow map to devise a more robust system based on a simple centroid computation for car tracking; an accurate system must still overcome the perspective deformation problem. We also require the proposed algorithm to be robust enough to detect movement using a non-static camera: the movement of the host vehicle is a very important source of artefacts, and for the application addressed here it is critical that the algorithm can "clean" these noisy patterns. The scheme that we have developed is composed of two stages. In the first step, using a high-performance motion estimation circuit, we compute the optical flow. In the second, using very simple filtering operations and optical flow templates, we obtain a saliency map that can be used to estimate the car position in the image. In Section 4 we show results of the proposed system on several overtaking car sequences. Some companies, such as Mobileye N.V. [4], Volvo [5], and Ficosa [6], have apparently developed aids to lane-change decision making, but no reports on their technical details, processor type or performance are available. This suggests that the addressed application has a high potential impact and that the existing solutions are still under development. In our approach the whole system has been implemented on an embedded device to suit the automobile market. In this environment, FPGAs seem to be a good option due to the intensive computation required, their interfacing capabilities with automobile buses, and the packaging possibilities inside the car. Furthermore, since complex vision processing systems for automobile applications are still being developed, the capability of the FPGA to be reconfigured for new processing schemes is a very valuable feature. This also encourages its use in these applications rather than approaches based on ASICs/ASIPs, which better fit standardized mass-market products.
2 Algorithmic Description

The proposed application requires the generation of alert signals to the driver to prevent traffic accidents. We estimate the car position together with a confidence level; once these are available, the alert signal generation is straightforward. The problem has been solved using two main processing stages. First, motion is estimated using the gradient-based optical flow sensor described in [7]. In previous work, the Lucas & Kanade (L&K) gradient-based method [8], [9] was highlighted as a good candidate for hardware implementation, with affordable resource consumption [10], [11], [12] and good accuracy. The optical flow allows us to easily filter out the overtaking car, as shown in Fig. 1. This scenario requires fulfilling two main aspects in the motion estimation stage. First, in order to detect the car as soon as possible, high image resolution is desirable. Second, since the relative inter-vehicle speed can be quite high, high frame-rate processing is required to achieve reliable tracking, which motivates a special-purpose computing architecture. This hard constraint requires a specific design strategy, making unviable the use of devices such as the one described in [10], which implemented a coarse-grain pipeline of only 6 stages and was able to process just 3.5 Mpixels per second. We utilize a novel superpipelined and intensively parallelized architecture for optical flow processing, with more than 70 pipelined stages, that achieves a throughput of one pixel per clock cycle. This customized DSP architecture is capable of processing up to 45 Mpixels/s, arranged for example as 148 frames per second at VGA resolution (640x480 pixels). This is of extreme interest for using high frame-rate cameras, which allow the estimation of high-confidence motion information [13], improving the tracking stage. This new system outperforms the previous approach [10] thanks to the fine-grain pipeline, an improved image differentiation technique, and a novel memory management unit which enables the use of FIR temporal filters.
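For readers unfamiliar with the L&K scheme, the following minimal Python/NumPy sketch illustrates the computation that the hardware pipeline parallelizes. It is our own illustrative software reference, not the implementation of [7]: the smoothing, window size and reliability threshold are assumed values.

```python
import numpy as np
from scipy.ndimage import convolve, gaussian_filter

def lucas_kanade(prev, curr, win=5, tau=0.01):
    """Dense Lucas & Kanade optical flow (illustrative software reference).
    prev, curr: consecutive grey-level frames (2-D float arrays).
    Returns per-pixel (vx, vy); unreliable estimates are set to zero."""
    prev = gaussian_filter(prev, 1.0)   # smoothing stabilises the derivatives
    curr = gaussian_filter(curr, 1.0)
    Ix = convolve(curr, np.array([[-0.5, 0.0, 0.5]]))    # spatial derivatives
    Iy = convolve(curr, np.array([[-0.5], [0.0], [0.5]]))
    It = curr - prev                                     # temporal derivative

    # Accumulate the 2x2 structure tensor and right-hand side over a window.
    k = np.ones((win, win))
    Sxx, Sxy = convolve(Ix * Ix, k), convolve(Ix * Iy, k)
    Syy = convolve(Iy * Iy, k)
    Sxt, Syt = convolve(Ix * It, k), convolve(Iy * It, k)

    # Solve [Sxx Sxy; Sxy Syy] v = -[Sxt; Syt] in closed form per pixel.
    det = Sxx * Syy - Sxy ** 2
    reliable = det > tau * (Sxx + Syy) ** 2    # reject ill-conditioned pixels
    safe = np.where(reliable, det, 1.0)        # avoid division by zero
    vx = np.where(reliable, (-Syy * Sxt + Sxy * Syt) / safe, 0.0)
    vy = np.where(reliable, (Sxy * Sxt - Sxx * Syt) / safe, 0.0)
    return vx, vy
```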
Fig. 1. Car segmentation using optical flow. Dark greys represent rightward movements (the car) and light greys leftward motion (the landscape). The proposed model gives a very uniform object segmentation, so car tracking can be done easily.
In the second stage we calculate the overtaking car position and the reliability of that measurement. Relying on the advanced sensor used for motion computation, a simple tracking system based on motion filtering templates has been developed, achieving very promising results. The method is described next.

a. Car Tracking: Post-processing optical flow steps

The main operations to be implemented can be summarized as follows (a software sketch of the whole chain is given after this list):

1. Pattern selection. We consider only rightward movements: during the overtaking manoeuvre, the overtaking car moves towards the right side of the image, so we do not need to consider leftward velocities. If we denote by $v_x$ the x component of the velocity, by $v_y$ the y component, and by $k$ the minimum reliable velocity magnitude, the velocities that we keep should verify:
$$v_x > k \quad \text{and} \quad |v_y| \le v_x \qquad (1)$$
2. Saliency map generation. This step uses the previous information and isolates the main motion features, which form the input saliency map for the next stage. It is realized using optical flow filtering templates. The proposed system counts the number of motion pixels in spatial neighbourhoods of 15x15 pixels: each template counts the number of motion pixels present in its neighbourhood. A feature passes to the next stage if it has enough active points, where the threshold depends on the estimated car position; due to the rear-view mirror perspective, the threshold grows rightward according to the apparent vehicle size. The final saliency map clears spurious patterns and corrects for the image perspective, giving reliable data to the next computation stage.
3. Centroid computation. A simple centroid computation over the saliency map provides the car position estimate, but it is correct only for a continuous overtaking manoeuvre by a single vehicle. Some more complex and realistic situations need to be handled, as described below:

• Multiple car overtaking. For the addressed application a multi-target tracking system is not necessary: we only want to know whether there is at least one car in a dangerous situation. We therefore use an iterative car position computation with several stages. In the first one we use all the saliency map points of the whole image to obtain a car position estimate, which is the correct position if there is only one car. When there are several targets, the main goal is to detect the position of the car that is closest to us. For this purpose we focus on the right area of the image, using the computed centroid position as the left image boundary, and recompute the centroid of the restricted image if it contains significant features; otherwise we keep the previously calculated value. This computation can be repeated until the estimation converges, or for a fixed number of iterations. In our system only three iterations were needed to obtain adequate results.
Fig. 2. Processing stages implemented for the rear-view mirror overtaking monitor
• Static overtaking. An overtaking car seems to stop (and it vanishes from the optical flow field) when it moves at our own velocity. In this situation we need to maintain the estimated car position for a certain time. We implement a simple memory system based on the Kalman filter, which has proved very useful in many problems involving the prediction of the position of moving targets [14], [15] and even in complex motion prediction [16]. It predicts the car position based on the estimated centroid velocity and the previous position. The process model assumes that the velocity is constant and that the noise can be seen as an acceleration of the object.

The processing stages are schematically described in Fig. 2. The final system could be improved by incorporating the vehicle's turn-indicator and steering signals into the alarm generation; this is planned as future work.
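The sketch announced above chains the three steps together with the Kalman memory. It is a minimal Python/NumPy reformulation of the algorithm for clarity, not the hardware implementation: the position-dependent threshold law, the feature-count limits and the noise parameters are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def select_pattern(vx, vy, k=0.5):
    """Step 1: keep rightward, horizontally dominated motion (Eq. 1)."""
    return (vx > k) & (np.abs(vy) <= vx)

def saliency(mask, min_count=20):
    """Step 2: count motion pixels in 15x15 neighbourhoods and threshold.
    A position-dependent threshold (growing rightward, here linearly)
    mimics the perspective correction described in the text."""
    counts = uniform_filter(mask.astype(np.float32), size=15) * 15 * 15
    cols = np.linspace(1.0, 2.0, mask.shape[1])   # assumed threshold law
    return counts > min_count * cols[np.newaxis, :]

def iterative_centroid(sal, iters=3, min_points=30):
    """Step 3: refine the centroid towards the right-most (closest) car."""
    left = 0
    cx = cy = None
    for _ in range(iters):
        ys, xs = np.nonzero(sal[:, left:])
        if xs.size < min_points:      # not enough significant features
            break
        cx, cy = xs.mean() + left, ys.mean()
        left = int(cx)                # keep only the area right of the centroid
    return cx, cy

class ConstantVelocityKalman:
    """1-D constant-velocity Kalman filter (one instance per axis)."""
    def __init__(self, q=1.0, r=4.0):
        self.x = np.zeros(2)                          # [position, velocity]
        self.P = np.eye(2) * 100.0
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity model
        self.Q = np.eye(2) * q                        # acceleration-like noise
        self.H = np.array([[1.0, 0.0]])
        self.R = np.array([[r]])

    def step(self, z=None):
        self.x = self.F @ self.x                      # predict
        self.P = self.F @ self.P @ self.F.T + self.Q
        if z is not None:                             # update when measured
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)
            self.x = self.x + K @ (np.array([z]) - self.H @ self.x)
            self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]

# Per-frame driver (illustrative): estimate, then let the Kalman filters
# coast when no reliable measurement is available.
kx, ky = ConstantVelocityKalman(), ConstantVelocityKalman()
def track_frame(vx, vy):
    sal = saliency(select_pattern(vx, vy))
    cx, cy = iterative_centroid(sal)
    return kx.step(cx), ky.step(cy)
```

During a "static overtaking" phase, `step()` is simply called without a measurement, so the filter coasts on the last estimated velocity, which is exactly the memory behaviour described above.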
3 System Architecture and FPGA Resource Consumption

The global system architecture is represented in Fig. 3. We have implemented a very regular datapath (with no need for specific interrupt handling) with a very deep pipeline structure (more than 70 stages) in order to achieve high performance. The synchronization between the different processing units (frame-grabber, motion processing core and tracking unit) is done through specific memory data buffers, which solve the problems associated with the different clock frequencies. The computing platform uses ZBT SSRAM memories, whose capabilities have been exploited by a specifically designed Memory Management Unit (MMU), described in [7], that minimizes data delays and latencies. It is especially useful for the temporal filtering stage of the motion processing unit, because it enables the use of FIR temporal filters, which provide more stable estimations.
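As a rough illustration of why the MMU matters, FIR temporal filtering requires simultaneous access to several past frames. The following software analogue (our own sketch; the tap values are a generic smoothing/derivative pair, not the coefficients used in [7]) keeps a ring buffer of frames and convolves across time:

```python
import numpy as np
from collections import deque

class TemporalFIR:
    """FIR filtering across the last N frames (software analogue of the
    frame-buffer access pattern enabled by the MMU)."""
    def __init__(self, taps):
        self.taps = np.asarray(taps, dtype=np.float32)
        self.frames = deque(maxlen=len(taps))   # ring buffer of frames

    def push(self, frame):
        self.frames.append(frame.astype(np.float32))
        if len(self.frames) < len(self.taps):
            return None                          # pipeline not yet full
        # Weighted sum over the temporal window (oldest frame first).
        return sum(t * f for t, f in zip(self.taps, self.frames))

# Generic examples (illustrative coefficients, not those of the paper):
smooth = TemporalFIR([1/16, 4/16, 6/16, 4/16, 1/16])   # temporal smoothing
deriv  = TemporalFIR([-0.5, 0.0, 0.5])                 # temporal derivative
```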
Fig. 3. Overtaking monitor system architecture. All the processing stages and interfaces have been implemented using the FPGA as the control and processing element. The whole system requires two external memory banks, a camera, and vehicle interfaces for alarm generation and for external inputs encoding vehicle information such as speed, steering or turn indicators.
The memory interchange strategy uses the delays between processing units as the synchronization technique. This makes possible the design of a very deep pipeline without the branch prediction that would degrade performance. The high system throughput relies on this deep pipeline and on the parallel scalar units of the different stages, dimensioned according to the complexity of the Lucas & Kanade algorithm; well-balanced units achieve a final throughput of one estimation per clock cycle. The performance of the optical flow unit makes it possible to take advantage of high frame-rate cameras, reducing the velocity range to be processed (more temporal resolution) and leading to accurate tracking. Each stage has been designed with customized bitwidths, from 8 bits (first stage) to 19 bits (last stage), using fixed-point or floating-point representations depending on the required precision. More details about this architecture are given in [7]. In the tracking unit, the template computation has been implemented using convolution kernels which collect the information of each pixel's neighbourhood. The iterative process only requires some image-boundary control to choose the area in which the centroid is computed. Finally, the Kalman filtering uses simple arithmetic operations which are computed once per frame.
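The bitwidth choices can be reasoned about with a simple fixed-point model. The helper below is an illustrative sketch (the actual per-stage formats of [7] are not reproduced here) that quantizes a value to a given word length and fractional precision, showing how wider intermediate words bound the accumulated rounding error:

```python
def quantize(x, total_bits, frac_bits, signed=True):
    """Round x to a fixed-point grid with `total_bits` word length and
    `frac_bits` fractional bits, saturating on overflow (illustrative)."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1)) if signed else 0
    hi = (1 << (total_bits - 1)) - 1 if signed else (1 << total_bits) - 1
    q = max(lo, min(hi, round(x * scale)))   # round, then saturate
    return q / scale

# An 8-bit word with 4 fractional bits keeps only coarse precision ...
print(quantize(0.7321, 8, 4))    # -> 0.75 (step 1/16)
# ... while a 19-bit word with 12 fractional bits preserves far more detail.
print(quantize(0.7321, 19, 12))  # -> 0.732177734375 (step 1/4096)
```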
Table 1. Gate-level resource consumption of the basic stages (results taken from the DK synthesizer [17])

| Pipelined stages | NAND gates | FFs | Memory bits | Max clock frequency (MHz) |
|---|---|---|---|---|
| Interfaces + hardware controllers | 65881 | 2363 | 18208 | 45 |
| Motion processing core | 1145554 | 6529 | 516096 | 45.5 |
| Tracking core | 12087 | 751 | 0 | 71 |
Table 2. System resources required on a Virtex-II XC2V6000-4 for the whole overtaking car monitor system (Mpps: mega-pixels per second, the maximum system pixel processing rate; EMBs: embedded memory blocks)

| Slices / (%) | EMBs / (%) | Embedded multipliers / (%) | Mpps | Image resolution | Fps |
|---|---|---|---|---|---|
| 8250 (24%) | 29 (20%) | 12 (8%) | 45.49 | 640x480 | 148 |
| 10073 (29%) | 29 (20%) | 12 (8%) | 45.5 | 640x480 | 148 |
The estimated gate consumption of the different subcircuits is given in Table 1. Note that the tracking unit, since it is implemented using iterative computation, allows efficient resource sharing (and thus represents a relatively inexpensive stage). On the other hand, the motion processing unit requires intensive exploitation of the parallelism capabilities of the FPGA device, representing the most expensive module in terms of chip area. The interfaces and hardware controllers also require a considerable number of resources. The global system resources after synthesis are shown in Table 2; the design fits in a Virtex-II FPGA of fewer than two million gates. The tracking stage is processed sequentially and requires only 5% of the FPGA slices, which represents 17% of the global hardware resources consumed by the complete system.
4 Illustrative System Results

Evaluating the accuracy and efficiency of the system on real image sequences is not an easy task. Visual inspection of the results can give some "quality estimation" of the performance, but it is not a definitive evaluation procedure. For our tests we consider that the tracking is done correctly if the estimated car position given by the algorithm falls on the car's pixels. We have tested the algorithm on different overtaking sequences provided by Hella [18], with different vehicles and weather conditions. At the beginning of the overtaking manoeuvre, when the vehicle image is very small, our confidence measure is not reached: position estimations are already produced, but they are unreliable. This is marked with black squares in the figures. When the car looms larger, confidence measures begin to be reached, though without temporal consistency, and finally the system is able to track the vehicle accurately until the end of the overtaking sequence. Reliable positions are drawn in the figures as a white cross. For all the evaluated sequences this state is reached while the overtaking car is still far away, so the system performs well at safe distances. An important problem occurs when the overtaking car's velocity equals our own, so that the relative velocity is around zero. In this situation the Kalman filtering allows us to keep the car position, but the confidence value is not reached, as seen in Fig. 4. The system memory keeps the car position while below the confidence threshold (see the black square in the third frame). The alert system can use the estimated position and the memory consistency to decide whether we are in a dangerous situation.
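The per-frame evaluation criterion can be stated compactly. The snippet below is a minimal sketch of the test we apply, assuming ground-truth car bounding boxes are available for each frame (the variable names and box format are our own):

```python
def tracking_hit(estimate, car_box):
    """True if the estimated position (x, y) falls on the car's pixels,
    here approximated by a ground-truth bounding box (x0, y0, x1, y1)."""
    x, y = estimate
    x0, y0, x1, y1 = car_box
    return x0 <= x <= x1 and y0 <= y <= y1

def hit_rate(estimates, boxes):
    """Fraction of frames with a reliable estimate that lies on the car;
    frames without a reliable estimate (None) are skipped."""
    hits = [tracking_hit(e, b) for e, b in zip(estimates, boxes) if e is not None]
    return sum(hits) / len(hits) if hits else 0.0
```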
Fig. 4. Overtaking with a relatively static phase, with a black car on a sunny day. Sequence recorded using a conventional CCD camera.
Fig. 5. Car on a foggy and rainy day. Sequence recorded using a high dynamic range camera.
Fig. 6. Car on a cloudy day. The car moves with its lights switched off. Sequence recorded using a high dynamic range camera.
Under different weather and light conditions the kind of camera sensor is crucial, which strongly motivates the use of high dynamic range cameras. The sequences of Figs. 5 and 6 test our system's capability on very low contrast scenes. The weather conditions in the sequence of Fig. 5 are really bad; in these situations the car lights become a very important source of information, and the system needs closer cars to reach the confidence value and begin tracking reliably. In Fig. 6 we test the robustness of the system in low contrast scenarios: this sequence has more contrast, but the car has its lights switched off. As can be seen, the results are correct. The sequence of Fig. 7 shows a complex scene: several cars overtaking on a highway. Each car is numbered in the figure, which shows different frames of the sequence together with the estimated position of the dangerous car. As explained in Section 2, the system only marks the closest car (the most dangerous in the scene). One important problem occurs when there are multiple lanes: motion information from monocular viewing cannot provide the car distance, so it is difficult to know in which lane the approaching car is detected. The road lane markings could be used for this, but the important issue is to discriminate whether the situation is dangerous or not; our system is useful if it prevents us from changing lane when another vehicle is in a dangerous position. This problem will be addressed in the future. In this figure we can also see the inertia of the car estimation: it should be noted that when the system locks onto a new car, the estimation is over the confidence threshold but in a wrong position. This occurs because the saliency map obtained from the optical flow contains reliable information about the car position, but the Kalman filter needs two or three frames to update its parameters.
Fig. 7. Multiple cars overtaking on a highway on a cloudy day. The numbers above the cars facilitate interpreting the scene. Sequence recorded using a high dynamic range camera.
We could use a more complex model for car tracking but, bearing in mind the hardware implementation of an embedded system, it would represent an unnecessary computational overhead: for a real-time system that computes 25 frames/s, a two-to-three-frame lag amounts to roughly 80-120 ms, so this delay of the alert signal is not significant.
5 Conclusions

We have presented a monitoring system that tracks overtaking cars from the rear-view mirror perspective. Basically, we use two steps: first, we compute the optical flow using a highly parallel, superpipelined system that provides a robust method for estimating motion cues; second, we generate a saliency map that reliably represents car points, which are used to compute the overtaking car position. The results shown are very promising: the system is robust and stable even for very difficult image sequences with bad visibility conditions. FPGA technology fits the needs of the automotive field well, thanks to the reconfigurability and scalability of these devices. Future work will address the integration of the vehicle signals into the alarm generation decision unit, as well as the question of how to alert the driver. We also plan to address the scalability of the system to enable its implementation on smaller devices.
Acknowledgements

This work was supported by the Spanish National Project DEPROVI (DPI2004-07032) and by the EU grant DRIVSCO (IST-016276-2).
References

1. Apostoloff, N., Zelinsky, A.: Vision In and Out of Vehicles: Integrated Driver and Road Scene Monitoring. Int. J. of Robotics Research, 23:4-5 (2004), pp. 513-538.
2. Dagan, E., Mano, O., Stein, G.P., Shashua, A.: Forward collision warning with a single camera. IEEE Intelligent Vehicles Symposium, 14-17 (2004), pp. 37-42.
3. Mota, S., Ros, E., Díaz, J., Tan, S., Dale, J., Johnston, A.: Detection and tracking of overtaking cars for driving assistance. Early Cognitive Vision Workshop, Isle of Skye, Scotland, UK, 28 May - 1 June (2004). (http://www.cn.stir.ac.uk/ecovision-ws/schedule.php)
4. Mobileye N.V.: Blind Spot Detection and Lane Change Assist (BSD/LCA). Web link: http://www.mobileye.com/general.shtml
5. Volvo BLIS system. Web link: http://www.mynrma.com.au/blis.asp
6. Ficosa Digital blind spot detector. Web link: http://www.ficosa.com/eng/home_noticiaseventos.htm
7. Díaz, J., Ros, E., Mota, S., Rodríguez-Gómez, R.: Highly parallelized architecture for image motion estimation. LNCS, Int. Workshop on Applied Reconfigurable Computing, ARC2006, Delft, The Netherlands, March 1-3, 2006 (accepted for publication).
8. Lucas, B., Kanade, T.: An Iterative Image Registration Technique with Applications to Stereo Vision. In Proc. DARPA Image Understanding Workshop (1981), pp. 121-130.
9. Barron, J., Fleet, D.J., Beauchemin, S.S.: Performance of Optical Flow Techniques. IJCV 12:1 (1994), pp. 43-77.
10. Díaz, J., Ros, E., Ortigosa, E.M., Mota, S.: FPGA based real-time optical-flow system. IEEE Trans. on Circuits and Systems for Video Technology, 16:2 (2006), pp. 274-279.
11. McCane, B., Novins, K., Crannitch, D., Galvin, B.: On Benchmarking Optical Flow. Computer Vision and Image Understanding, 84 (2001), pp. 126-143.
12. Liu, H.C., Hong, T.S., Herman, M., Camus, T., Chellappa, R.: Accuracy vs. Efficiency Trade-offs in Optical Flow Algorithms. CVIU, 72:3 (1998), pp. 271-286.
13. Lim, S., Apostolopoulos, J.G., Gamal, A.E.: Optical flow estimation using temporally oversampled video. IEEE Trans. on Image Processing, 14:8 (2005), pp. 1074-1087.
14. Dellaert, F., Thorpe, C.: Robust car tracking using Kalman filtering and Bayesian templates. In Proceedings of SPIE: Intelligent Transportation Systems, vol. 3207 (1997).
15. Gao, J., Kosaka, A., Kak, A.C.: A multi-Kalman filtering approach for video tracking of human-delineated objects in cluttered environments. Computer Vision and Image Understanding, 99:1 (2005), pp. 1-57.
16. Jung, S.-K., Wohn, K.-Y.: 3-D tracking and motion estimation using hierarchical Kalman filter. IEE Proc.-Vis. Image Signal Process., 144:5 (1997), pp. 293-298.
17. Celoxica company. Web site and product information available at: www.celoxica.com
18. Dept. of predevelopment EE-11, Hella KG Hueck & Co., Germany. www.hella.de