Vehicle Tracking in Low-angle and Front-View Images based on Spatio-Temporal Markov Random Field Model Shunsuke KAMIJO Katsushi IKEUCHI Masao SAKAUCHI Institute of Industrial Science, University of Tokyo E-mail :
[email protected]
Abstract

One of the most important research topics in ITS is the development of systems that automatically analyze or monitor traffic activities. For that purpose, it is necessary to achieve reliable vehicle tracking in traffic images. However, occlusion among vehicles has long impeded such reliable tracking. In order to solve this problem, we developed a dedicated tracking algorithm, referred to as the Spatio-Temporal Markov Random Field (ST-MRF) model, in 2000. This algorithm models the tracking problem by determining the state of each pixel in an image and how such states transit along both the x-y image axes and the time axis. The algorithm was shown to achieve a 95% tracking success rate on middle-angle images. However, most images actually captured by cameras on infrastructure are low-angle images, and many of them are front-view images. Since vehicles severely occlude each other in such images, segmentation of vehicle regions through spatio-temporal images tends to be unsuccessful. In order to resolve this problem, we have improved the ST-MRF model to re-optimize segmentation boundaries through accumulated spatio-temporal images. As a result, the improved algorithm was able to track vehicles with a 91.2% success rate against severe occlusions in low-angle and front-view images at a highway junction. This result should lead to precise analyses of severely complicated traffic.
1 Introduction
Today, one of the most important research efforts in ITS has been the development of systems that automatically monitor the flow of traffic at intersections. Such systems would be useful both in reducing the workload of human operators and in warning drivers of dangerous situations. Not only would the systems automatically monitor current situations at intersections but, if they could reliably assess those situations and predict whether they might lead to accidents, they might be able to warn drivers and thus reduce the number of accidents. In addition, if the systems could adjust signal timing according to information about traffic flow, they might be able to reduce traffic jams and economic losses. Rather than the current practice of performing a global flow analysis, automated monitoring systems should be based on local analysis of the behavior of each vehicle at intersections. The systems should be able to identify each vehicle and track its behavior, and to recognize dangerous situations or events that might result from a chain of such behaviors. Tracking at intersections is often impeded by the occlusion that occurs among vehicles in crowded situations. Tracking algorithms have a long history in computer vision research. In particular, in the ITS area, vehicle tracking, one of the specialized tracking paradigms, has been extensively investigated. Peterfreund[1] employs the 'Snakes'[2] method to extract contours of vehicles for tracking purposes. Smith[3] and Grimson[4] employ optical-flow analysis; in particular, Grimson applies clustering and vector quantization to estimated flows. Leuck[5] and Gardner[6] assume 3D models of vehicle shapes, and estimated vehicle images are projected onto a 2D image plane according to the appearance angle. Leuck[5] and Gardner[6]'s methods require that many
3D models of vehicles be applied to general traffic images. While these methods are effective in less crowded situations, most of them cannot track vehicles reliably in situations that are complicated by occlusion and clutter. In order to resolve these problems, we developed a dedicated tracking algorithm utilizing the Spatio-Temporal Markov Random Field model[12] in 2000, and the algorithm has been successfully applied to traffic event analyses[11]. However, in the case of low-angle and front-view images, severe occlusions persist for long periods, so the reliability of the tracking results of the ST-MRF of our previous work decreases seriously. Thus, in this paper, we improve our Spatio-Temporal MRF to be effective even in such severe situations.
2 Primary ST-MRF Model as Our Previous Work
2.1 Basic Ideas of ST-MRF

In our previous work[12], we proposed the idea that the object tracking problem over a sequence of images is equivalent to a segmentation problem in spatio-temporal images. Many successful research efforts in the field of Computer Vision have applied the Markov Random Field model to image segmentation. The most fundamental work was done by Geman and Geman[7], which became the basis of all subsequent MRF research, not only for image restoration. Methods of merging small segments into large segments can be seen in the work of Panjwani and Healey[8]; these are a kind of unsupervised segmentation. Further successful unsupervised segmentation work was done by Manjunath and Chellappa[9], Andrey and Tarroux[10], and others. Although all of these studies were successful, they applied the 2D-MRF model to spatial images such as static images. We therefore extend the 2D-MRF model to a Spatio-Temporal MRF model, which is able to optimize not only the spatial distribution but also the distribution along the temporal axis. An image sequence has correlations at each pixel between consecutive images along the time axis, and our model also considers this time-axis correlation. We named the extended MRF the Spatio-Temporal Markov Random Field model (ST-MRF model). Strictly speaking, the ST-MRF should optimize segmentations of spatio-temporal images pixel by pixel; cubic pixels would then be evaluated by neighborhood and texture conditions in the ST-MRF, just as square pixels are evaluated in a 2D-MRF. However, since actual moving images cannot be captured at such high frame rates, pixels belonging to vehicles typically move by several tens of pixels between frames. Therefore, spatially neighboring pixels in consecutive images have hardly any correlation.
Consequently, we defined our ST-MRF to divide an image into blocks, to optimize the labeling of those blocks as segmentations, and then to connect blocks between consecutive images by referring to their motion vectors. In this section, the primary idea of the ST-MRF from our previous work is briefly explained.

2.2 Initialization of Object-Map

In preparation for this algorithm, an image consisting of 640x480 pixels is divided into 80x60 blocks, because a pixel is too small to be considered as one site in the ST-MRF and we therefore need a larger group of pixels. Each block consists of 8x8 pixels, and one block is considered to be a site in the ST-MRF. The algorithm classifies each block into vehicles or, equivalently, assigns one vehicle label to each block. Since the ST-MRF converges rapidly to a stable condition when it has a good initial estimation, we use a deductive algorithm to determine an initial label distribution. These labels are then refined through the ST-MRF. In this refining process, the algorithm considers the correlation of blocks between consecutive images as well as neighboring blocks, and then assigns labels to them through the Markov Random Field
model. A distribution of classified labels on blocks is referred to as an Object-Map. Since this algorithm is applicable to gray-scale images, all experiments in this paper were performed using only gray-scale images.

Figure 1: Object Generation. Figure 2: Object Moving and Re-mapping.

[Deductive Process for Tracking Algorithm]

1. Initialization (Figure 1): A background image of the intersection is constructed by accumulating and averaging an image sequence over a certain interval of time, in our case twenty minutes. The algorithm also sets up a slit at each entrance to the intersection for examining incoming vehicles. Here, a slit is one line of an image segment used to determine intensity variance; for example, four slits perpendicular to the four major incoming roads to the intersection are set up in Figure 1.

2. Generate new vehicle IDs (Figure 1): An incoming vehicle is detected when an intensity difference occurs at a block along a slit. At each slit block, the difference between the current and background intensities is examined, and if the difference is greater than a certain threshold value, the algorithm decides that it has detected a vehicle, generates a new vehicle ID, and assigns it to the block. As the vehicle moves, the slit continuously detects the vehicle and continues to issue the same vehicle ID. Blocks having the same vehicle ID grow along the time sequence, and blocks sharing the same object ID are grouped into a vehicle region.

3. Estimate Motion Vectors of Vehicles: Once a vehicle region leaves the slit, the algorithm updates its shape along the time sequence. For this updating, the algorithm estimates motion vectors among blocks in a vehicle region. At each block, a block matching method is employed to estimate its motion vector between times t and t+1.
Similarity between a block at (x(t), y(t)) at time t and the block at (x(t+1), y(t+1)) = (x(t)+u(t), y(t)+v(t)) at time t+1 is evaluated by equation (1); here, I(x, y; t) is the gray-scale intensity of pixel (x, y) at time t, and (v_i, v_j) = (u(t), v(t)) is the candidate displacement. The motion vector of a vehicle is then approximated by the most frequent motion vector among the blocks of the same vehicle region.

D = \sum_{0 \le d_i < 8} \sum_{0 \le d_j < 8} \left| I(i + d_i + v_i, j + d_j + v_j; t+1) - I(i + d_i, j + d_j; t) \right|   (1)
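The matching criterion above amounts to a sum of absolute differences (SAD) over an 8x8 block, followed by a majority vote across the blocks of one vehicle region. The following minimal Python sketch illustrates this; the function names, the exhaustive search over a small displacement range, and the synthetic test frames are our own assumptions for illustration, not the paper's implementation.

```python
from collections import Counter

import numpy as np

BLOCK = 8  # each ST-MRF site is an 8x8 block of pixels


def sad(frame_t, frame_t1, i, j, vi, vj):
    """D of equation (1): sum of absolute intensity differences between
    the 8x8 block at (i, j) in frame t and the block displaced by the
    candidate motion vector (vi, vj) in frame t+1."""
    a = frame_t[i:i + BLOCK, j:j + BLOCK].astype(np.int32)
    b = frame_t1[i + vi:i + vi + BLOCK, j + vj:j + vj + BLOCK].astype(np.int32)
    return int(np.abs(b - a).sum())


def block_motion(frame_t, frame_t1, i, j, search=8):
    """Exhaustively search a +/-search window for the displacement
    minimizing D (an assumed search strategy for illustration)."""
    best, best_d = (0, 0), sad(frame_t, frame_t1, i, j, 0, 0)
    for vi in range(-search, search + 1):
        for vj in range(-search, search + 1):
            # keep the displaced block inside the image
            if (0 <= i + vi and i + vi + BLOCK <= frame_t1.shape[0]
                    and 0 <= j + vj and j + vj + BLOCK <= frame_t1.shape[1]):
                d = sad(frame_t, frame_t1, i, j, vi, vj)
                if d < best_d:
                    best_d, best = d, (vi, vj)
    return best


def vehicle_motion(frame_t, frame_t1, block_origins):
    """Approximate the vehicle's motion vector by the most frequent
    block motion among all blocks of the same vehicle region."""
    votes = Counter(block_motion(frame_t, frame_t1, i, j)
                    for i, j in block_origins)
    return votes.most_common(1)[0][0]
```

For instance, if a textured 16x16 region is shifted by (3, 5) pixels between two frames, `block_motion` recovers (3, 5) for each of its four 8x8 blocks, and `vehicle_motion` returns (3, 5) as the region's motion vector.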