IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 60, NO. 1, JANUARY 2011
Compensating Background for Noise due to Camera Vibration in Uncalibrated-Camera-Based Vehicle Speed Measurement System

Thuy Tuong Nguyen, Xuan Dai Pham, Ji Ho Song, Seunghun Jin, Dongkyun Kim, and Jae Wook Jeon, Member, IEEE
Abstract—Vision-based vehicle speed measurement (VSM) is one of the most convenient methods available in intelligent transportation systems. Existing methods use an uncalibrated camera to measure vehicle speed, but they do not consider the possibility of camera vibration, which leads to poor measurement results. This paper considers the case in which the camera is tilted downward and mounted at a fixed location on a bridge crossing the target street; the camera may vibrate due to wind or bridge movement. A vision-based speed measurement system is described in this paper, along with a vertical-and-horizontal-histogram-based method used to compensate the background of an incoming image. This novel method eliminates the noise arising from the displacement between an incoming image and a background image that is caused by camera vibration over time. Moreover, a method is presented to automatically detect the vanishing point based on the Hough transform and a quadtree. Experimental comparisons of the system against the vehicle's own speedometer show that the proposed approach yields a satisfactory estimate of vehicle speed.

Index Terms—Background compensation, camera vibration, Hough transform, vanishing point, vehicle speed measurement (VSM).
I. INTRODUCTION

Manuscript received July 2, 2010; revised November 20, 2010; accepted November 24, 2010. Date of publication December 6, 2010; date of current version January 20, 2011. This research was performed as part of the Intelligent Robotics Development Program, which is one of the 21st Century Frontier Research and Development Programs funded by the Ministry of Commerce, Industry, and Energy of Korea. The review of this paper was coordinated by Prof. S. Ci.
T. T. Nguyen, J. H. Song, D. Kim, and J. W. Jeon are with the School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Gyeonggi 440-746, Korea (e-mail: [email protected]; kaiz.corwell@ece.skku.ac.kr; [email protected]; [email protected]).
X. D. Pham is with the Information Technology Faculty, Saigon Institute of Technology, Ho Chi Minh City, Vietnam (e-mail: [email protected]).
S. Jin is with the Samsung Advanced Institute of Technology, Samsung Electronics, Yongin 446-712, Korea (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVT.2010.2096832

INTELLIGENT transportation systems are becoming more important due to their advantages of saving lives, money, and time. Acquiring traffic information, such as lane width [1], traffic volume (the number of traveling vehicles per time period through a position in a lane), traffic density (the number of total vehicles in a given area at a given time) [2], and vehicle speed [3]–[15], is a key part of intelligent transportation systems, and such information is used to manage and control traffic. In this
paper, we focus on vehicle speed, since reducing speed can help reduce accidents.

Two main techniques are presently used to acquire vehicle speed: hardware- and software-based methods. Ground induction-coil loop detectors [10] and sensors [11] are the most widely used hardware devices for measuring vehicle speed. Software-based methods instead measure the time it takes a vehicle to pass through a known distance. Software-based monitoring systems using cameras have a number of advantages. First, a much larger set of traffic parameters can be estimated in addition to the vehicles' count and speed; these parameters include vehicle classifications, lane changes, rapid accelerations or decelerations, and queue lengths at urban intersections. Second, cameras are less disruptive and less costly to install than other hardware devices. Many vision-based methods for traffic applications have been presented.

The two objectives of this paper are given as follows. First, we provide a brief review of vision-based tools and their operation in traffic speed measurement applications. In this review, we focus on the common models, methods, features, and requirements of vehicle speed measurement (VSM) systems; this review also serves as a detailed related-works section. In addition to surveying existing systems, we present an overall uncalibrated-camera-based VSM system with the proposed noise-elimination technique. Second, some existing methods [3], [6], [8], [9], [14] use an uncalibrated camera to measure vehicle speed, but they do not consider the possibility of camera vibration, which leads to poor measurement results. Our system addresses this problem, which occurs when the camera is tilted downward and mounted at a fixed location on a bridge crossing the target street. No knowledge of the camera's parameters is assumed. The camera may vibrate over time due to wind or bridge movement.
This camera vibration leads to a displacement between an incoming image and the background image; thus, noise is produced. The proposed noise-elimination method compensates the background of an incoming image to eliminate the resulting noise. Moreover, we present a method that automatically detects the vanishing point.

This paper is organized as follows. In Section II, we present a brief review of vision-based VSM systems, classified with respect to their major characteristics: camera setup, feature extraction, preprocessing, result evaluation, and requirements. Section III presents the uncalibrated-camera-based VSM system with the proposed noise-elimination method. Section IV
Fig. 1. Diagram representing the combination of characteristics of vision-based VSM systems.
Fig. 2. Camera view with the focal length, installation height, and angle parameters.
presents the experimental results of our system. Section V concludes this paper.

II. BRIEF REVIEW OF VISION-BASED VEHICLE SPEED MEASUREMENT SYSTEMS

Studies in this field started around the year 2000 and are still actively conducted to the present day (including our system), driven by the development of vision devices and efficient image-processing techniques. Stationary cameras are the vision devices used in the majority of the ten surveyed systems. Some of the systems share common models, methods, and features, and some originate from very diverse approaches. The vision-based VSM systems are described with respect to their main characteristics: camera setup, feature extraction, preprocessing, result evaluation, and requirements. Fig. 1 shows a combination of characteristics of vision-based VSM systems. The preprocessing step in this figure contains algorithms for different speed measurement models. Based on the preprocessing information, vehicles are detected and tracked according to the feature extraction and filtering techniques.

In general, the tasks necessary to estimate vehicle speed in a VSM system are as follows: 1) obtain successive images; 2) identify the moving vehicles in the successive images; 3) track each vehicle between images; and 4) estimate speed based on both the distance traveled and the interframe delay. Four major characteristics of vision-based VSM systems are presented in this section: 1) VSM using a calibrated camera versus an uncalibrated camera; 2) common assumptions and requirements; 3) vehicle detection and tracking techniques; and 4) other comparative analysis covering preprocessing techniques and result evaluation.

A. VSM: Using a Calibrated Camera and Using an Uncalibrated Camera

We classify vision-based VSM systems into two main models: 1) the calibrated camera model [4], [5], [7], [12], [13] and 2) the uncalibrated camera model [3], [6], [8], [9], [14], [our system].
In the calibrated camera model, geometric operations are utilized to estimate the positions of vehicles. This model requires predefined intrinsic camera parameters; therefore, the camera is first calibrated to obtain the parameters needed for speed estimation. In addition to using a calibrated monocular camera, Garibotto et al. [4] built a geometric model using a binocular camera; the speed measurement problem in [4] is formulated as a 3-D displacement problem that is solved using stereo vision and motion. The works in [3], [7], [8], and [13], supported by the Washington State Department of Transportation (WSDOT), presented comprehensive designs and results. The WSDOT has a network of several hundred traffic cameras deployed on the freeways and arterials around Seattle for monitoring congestion; most of these cameras are not calibrated and can be panned, tilted, and zoomed. Fig. 2 shows the geometry of the problem.

Conversely, no intrinsic camera parameters are required for the uncalibrated camera model; instead, a scale factor and a vanishing point are the only two parameters necessary. The scale factor represents the mapping, from the ground plane to the image plane, of a vehicle size or another specific size that remains unchanged during the measurement process. The vanishing point is used for rectification of road images. This rectification can be considered a form of camera calibration; however, instead of utilizing geometric operations to estimate the vehicles' positions, we measure the vehicle speed along a single direction on the rectified image plane. Methods for determining the scale factor and the vanishing point will be introduced in Section II-D. Furthermore, unlike the calibrated camera models, which allow setting a camera on roadsides, the uncalibrated models require tilting the camera downward and mounting it at a fixed location on a bridge crossing the target street. The model that rectifies the road image needs fewer parameters (only the scale factor and the vanishing point) than the calibrated camera model; thus, it has an advantage in terms of camera setup.

B. Common Assumptions and Requirements

When implementing an algorithm for estimating velocity, common assumptions and requirements are established for VSM systems to simplify the measurement problem.

1) The speed of vehicles is finite, i.e., greater than or equal to zero.
2) The ground in front of the camera is planar.
3) The lane markings are parallel to one another, which can be used to automatically compute the coordinates of the vanishing point.
4) The camera's apparent angle of the roadway is constrained to specific degrees.
5) Vehicles move toward or away from the camera.
6) Vehicle motion is constrained to one lane.
7) Vehicle motion is smooth; there are no sudden changes of direction in the time intervals between successive frames.
8) The transportation means considered are cars, trucks, buses, and vans.
9) The traffic conditions considered include light traffic and varying speeds in different lanes.
10) The lighting conditions considered include day, night, sunny, overcast, and rainy conditions.
11) The system operates in real time.

The major requirement of the calibrated camera model is collecting camera parameters, such as the focal length, the size of the charge-coupled device sensor, the installation height of the camera above the ground plane, and the installation angle of the camera with respect to the vertical direction. Moreover, this model also requires defining the ground plane (or the image plane) that is used [7], and it requires information about two lanes on the ground plane [13]. Most uncalibrated camera models require the length of vehicle projection in the image plane, the image region used, and the apparent angle of the roadway. The image region used is defined to ensure that there is enough information in situations where vehicles are quite far from, or quite close to, the camera. When a vehicle is far from the camera, there is a lack of the information necessary for detecting or recognizing it; such information might be interpolated by the algorithms applied in the VSM system. Moreover, when a vehicle is quite close to the camera, it usually exceeds the view range of the camera, and thus, we are unable to determine the vehicle speed based on its position.

C. Vehicle Detection and Tracking

There are many techniques to extract vehicle features for detecting and tracking vehicles.
Most VSM systems attempt to eliminate road backgrounds to improve the accuracy of feature extraction. The road background image is the image obtained when there is no movement on the road. The road background cannot be directly obtained because the road varies according to time, weather, and traffic conditions; a background image must instead be generated from a series of images containing vehicles under various conditions. Many systems utilize simple background elimination techniques to extract the vehicles' images, such as directly differencing successive images [12], computing the average of image sequences [7], [8], [13], or iteratively excluding outlying color values [5], [9], [our system]. The background image is manually defined in [6]. In [3], Sobel edge detection is used to obtain the edge images of moving vehicles. In addition, a license plate recognition technique is applied in [4]; therefore, no background subtraction is required in that work. After obtaining the foreground image that contains only vehicle movement, the next step is estimating the speed of an individual vehicle by calculating its displacement in the
foreground image. Image correlation is used in most systems because of its simplicity and effectiveness; this technique measures the similarities between detected vehicles to track them across consecutive frames. Moreover, some advanced techniques are used to enhance the accuracy and computational efficiency of vehicle tracking. The method of adaptive-windowing prediction block matching is used in [5] to improve the computational efficiency of real-time systems; a lookup table between pixel displacement and the corresponding geometrical position is provided in advance to reduce the computational complexity. Similarly, the vehicle mask overlay and block matching technique is applied in [12]. In [12], no background reference image is required; one of the two successive images is regarded as the background reference. Consequently, vehicles traveling at high speed are estimated more accurately than those at low speed. In [4], the license plate is used as a high-level feature to uniquely identify an individual vehicle. The license plate recognition technique significantly improves the accuracy of vehicle detection and tracking; however, it requires high computational effort, and the precision of character segmentation determines the precision of vehicle identification. A Kalman filter is used to predict, smooth the speed estimates, and reject outliers in [13] and [14].

D. Other Comparative Analysis

1) Preprocessing: Most VSM systems require processing at the initialization step. In both calibrated and uncalibrated camera models, we define three main preprocessing techniques: 1) camera calibration; 2) scale factor determination; and 3) vanishing point determination. Preprocessing techniques can be manually or automatically executed. Manual camera calibration is introduced in [4], [5], [7], [12], and [13]. More conveniently, Schoepflin and Dailey [7] proposed a method to calibrate the camera by estimating the vanishing point and the vehicle lane boundaries.
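One common way to estimate a vanishing point from detected lane boundaries is to intersect the boundary lines in a least-squares sense. The following is a minimal numpy sketch with illustrative values only; it is a generic technique, not the Hough-transform/quadtree method proposed later in this paper.

```python
import numpy as np

def vanishing_point(lines):
    """Least-squares intersection of lines given as (a, b, c) with a*x + b*y + c = 0.

    Each normal (a, b) is assumed unit-length; the returned point minimizes
    the sum of squared perpendicular distances to all lines.
    """
    A = np.array([(a, b) for a, b, _ in lines], dtype=float)
    rhs = np.array([-c for _, _, c in lines], dtype=float)
    # Solve A @ [x, y] ~= rhs in the least-squares sense.
    point, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return point

# Two hypothetical lane markings converging toward (100, 50):
# x + y - 150 = 0 and x - y - 50 = 0, normalized by sqrt(2).
s = 1 / np.sqrt(2.0)
lanes = [(s, s, -150 * s), (s, -s, -50 * s)]
vp = vanishing_point(lanes)  # close to (100, 50)
```

With more than two detected lane lines, the same call averages out detection noise, which is the usual reason for preferring a least-squares fit over a single pairwise intersection.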
The scale factor, which is expressed in meters or feet per pixel, is manually defined in [6], [9], and our system, and it is automatically determined in [3], [8], and [14]. Similarly, the vanishing point can be either manually or automatically determined. In [6], a method of road rectification using one vanishing point is proposed, instead of rectification from two vanishing points; however, the vanishing point used in the practical evaluation of [6] is manually defined. In [8], [14], and our system, the vanishing point is automatically estimated from the list of intersection points resulting from the detected lines. Contrary to [6], which manually extracts the vanishing point of the road direction, [9] extends the approach presented in [16]. In that approach, which automatically detects the vanishing point based on the dominant direction of a scene, edges are grouped so that each group shares a common vanishing point; the dominant groups of edges are searched for, and the vanishing point in the road direction is estimated using a least-squares method.

2) Result Evaluation: It is not easy to evaluate the measurement results of a vision-based VSM system for the following reasons. First, vehicles travel in many lanes at different speeds. Second, the variety of transportation means makes the vehicle detection step more difficult. Finally, changing lighting conditions might cause noise in the detected results. Therefore,
TABLE I
COMPARISON OF VARIOUS VISION-BASED VEHICLE SPEED MEASUREMENT SYSTEMS
the evaluation might be based on the common assumptions mentioned in Section II-B. VSM systems create specific test cases that can be used for evaluating the results. Ground truth (or actual) velocities are utilized by most systems [4], [5], [7], [8], [our system] because of their simplicity and effectiveness for comparison. In addition, the accuracy of the algorithm in [3] is evaluated against both ground truth estimates and inductance loop measurements: the speed results are compared with the ground truth estimates, and the time-averaged results are compared with the equivalent inductance loop speed measurements. Moreover, the inductance loop estimates are also utilized in [7] and [13]; these inductance loop speeds come from the Traffic Data Acquisition and Distribution archive, which contains data acquired from the Seattle metropolitan region. The systems in [6], [9], and [14] compare their speed results with GPS measurements, and the system in [12] is compared with a Doppler speed radar system. Furthermore, there are some specifics worth noting in the result evaluation. The system in [3] suggests that using the algorithm with
a large number of vehicles yields an accurate estimate of the mean traffic speed. In [12], the estimation result for vehicles traveling at high speeds outperforms that for vehicles at low speeds because the algorithm regards one of two successive frames as a background reference.

We have examined various existing vision-based VSM systems, previous works related to all modules of the VSM system, the importance of these modules, and the assumptions made about them with respect to the primary objective of the system. It is important to look at the systems as a whole and to compare them in terms of performance based on their setup, features, evaluations, and requirements. Table I summarizes and compares various vision-based VSM systems in terms of their corresponding characteristics. Comparing and analyzing such systems enables us to highlight their advantages and disadvantages. Based on the aforementioned categorization, we have attempted to give a brief review of the existing vision-based systems for measuring vehicle speed. Although we cannot cover all existing systems, we concentrate on the systems that highlight the major characteristics in this field.
Fig. 3. Block diagram representation of the VSM system.
III. VEHICLE SPEED MEASUREMENT SYSTEM WITH BACKGROUND COMPENSATION AND AUTOMATIC VANISHING POINT DETECTION

In this paper, a VSM system using an uncalibrated camera is presented along with the proposed background compensation method used to eliminate noise. The system consists of four separate stages. First, a background image is generated from a few frames using an iterative exclusion algorithm for outlying color values. Second, the novel background compensation algorithm is performed on the incoming frames and the generated background images. Note that the background is repeatedly regenerated over a specific time period; during this period, if an incoming image is subtracted from the background, even a very small displacement between them can produce noise in the resulting image. This step eliminates the noise arising from the displacement between the two images, which is caused by camera vibration over time. Third, the foreground image obtained from the previous step is affinely rectified using a vanishing point; this point in the road direction is automatically detected using our proposed algorithm, which is based on the Hough transform and a quadtree. Finally, using the rectified image, vehicles are detected, and hence, their velocities are measured. Fig. 3 shows the diagram of the proposed system. The underlying assumptions and requirements of our VSM system are those presented in Section II-B.

According to the diagram in Fig. 3, this section covers four separate processing steps: 1) background creation and update; 2) background compensation; 3) image rectification and vanishing point detection; and 4) vehicle detection and tracking.
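Once a vehicle's per-frame displacement on the rectified image plane is known, the speed estimate in the final stage reduces to displacement times the scale factor divided by the interframe delay. The following is a hedged sketch of that arithmetic; the function name and the numeric values are illustrative, not taken from the paper.

```python
def speed_kmh(displacement_px, meters_per_px, fps):
    """Vehicle speed from pixel displacement on the rectified image plane.

    displacement_px: vehicle displacement between two consecutive frames (pixels)
    meters_per_px:   scale factor mapping rectified pixels to meters
    fps:             camera frame rate; the interframe delay is 1/fps seconds
    """
    meters_per_frame = displacement_px * meters_per_px
    meters_per_second = meters_per_frame * fps
    return meters_per_second * 3.6  # m/s -> km/h

# e.g. 10 px/frame at 0.05 m/px and 30 frames/s -> 15 m/s -> 54 km/h
```

In practice the per-frame estimate is noisy, which is why several surveyed systems smooth it, e.g., with a Kalman filter as in [13] and [14].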
A. Background Creation and Update

The background subtraction technique is popular in motion detection systems. In vehicle detection, the key step of this technique is subtracting a background image from the current image to eliminate the road background. The road background image is the image obtained when there is no movement on the road. Because the road varies according to time, weather, and traffic conditions, the road background cannot be directly obtained; a background image must be generated from a series of images containing vehicles under various conditions. Therefore, this part follows a section of [9] in presenting a simple but effective algorithm that creates the road background from a few frames using an iterative exclusion of outlying color values.

The background updating model has been proposed in many recent studies [9], [17], [18]. In [17], motion tracking is the
focus, in which motion segmentation is based on an adaptive background subtraction method; this method models each point in the image as a mixture of Gaussian distributions and uses an online approximation to update the model. The work of [18] also utilizes a mixture of Gaussian distributions to model each pixel of the background image. These studies attempted to deal with brightness changes, and they have shown that the algorithms used are effective in video surveillance and monitoring applications.

Following [9], the background images are repeatedly generated over a specified time period. This time period can be set to minutes or hours, depending on traffic and weather conditions; for future deployment of our system in Korea, we specify a time period of 10 min for background generation. This paper utilizes the algorithm in [9] due to its simplicity and effectiveness. The approach considers the mean value $\mu$ and the standard deviation $\sigma$ for the three channels of each pixel over a few captured frames. The description in [9] can be formulated as

$$I_B(x, y)_i = \frac{1}{N} \sum_{n=1}^{N} I_n(x, y)_i \qquad (1)$$
such that $\mu - \sigma \leq I_n(x, y)_i \leq \mu + \sigma$, where $I_B(x, y)_i$ is the color value at channel $i$ of pixel $(x, y)$ in the background image $I_B$; $I_n(x, y)_i$ is the color value at channel $i$ of pixel $(x, y)$ in the incoming image $I_n$; $N$ is the number of images satisfying the given condition for all channels; and $\mu$ and $\sigma$ are calculated from the $N$ color values of channel $i$ at $(x, y)$. Fig. 4 shows real traffic images and the corresponding generated background images under various traffic conditions. This algorithm performs very well in light traffic, quite well in normal traffic, and may yield poor results in congested traffic. It can be seen that the total error over all pixels of the generated background image, compared with the real background, becomes increasingly small as the number of input images increases.

B. Background Compensation

Image differencing is a widely used method for the detection of motion changes [19]. The interframe difference (IFD) is calculated by performing a pixel-by-pixel subtraction between two images. In practice, the camera usually vibrates even when it is mounted on a tripod or solidly fixed in place; this vibration is caused by wind or by movement of the bridge on which the camera system is mounted. In particular, after the background is created, displacements can be seen between the incoming frames and the generated background, which causes noise in the foreground images. To eliminate this noise, there
Fig. 4. (a)–(d), (f)–(i), (k)–(n) Real traffic images and (e), (j), and (o) their corresponding generated background images (using four input images). Images in the first, second, and third rows illustrate light, normal (and dusk lighting), and heavy traffic conditions, respectively.
must be a means of compensating for it. Therefore, we first estimate the camera vibration, i.e., the displacement parameter; then, we eliminate the noise by compensating the background with respect to the estimated displacement. There has been related research considering camera vibration, e.g., video reconstruction for a video editing system [20], restoration of images degraded by camera motion blur [21], and analysis of camera vibration errors in tracking systems [22]. Recently, background compensation algorithms have been applied to tracking moving objects using an active camera [23], [24] and to motion estimation [25], [26].

The vertical and horizontal projection histograms of the previous and current color images are created and matched, and the translational displacement is determined by locating the positions of the best match. If $I_k(x, y)$ is the image at time index $k$ and $d = [dx, dy]^T$ is the translational displacement between $I_k(x, y)$ and $I_{k-1}(x, y)$, then the relationship between $I_k(x, y)$ and $I_{k-1}(x, y)$ is defined as follows:

$$I_k(x, y) = I_{k-1}(x + dx, y + dy). \qquad (2)$$
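The displacement model in (2) is easy to simulate directly. Below is a minimal numpy sketch, assuming positive dx and dy and simply copying the border rows and columns (for which no new scene content exists); it is an illustration of the model, not the paper's compensation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)  # I_{k-1}, H x W

dx, dy = 3, 2  # translational displacement caused by camera vibration
# Simulate the vibrated frame: I_k(x, y) = I_{k-1}(x + dx, y + dy).
# Arrays are indexed [row, col] = [y, x], so shift rows by dy and columns by dx.
curr = np.empty_like(prev)
curr[:-dy, :-dx] = prev[dy:, dx:]  # valid overlap region
curr[-dy:, :] = prev[-dy:, :]      # bottom border: copy edge rows
curr[:, -dx:] = prev[:, -dx:]      # right border: copy edge columns

# In the overlap region the displacement model holds exactly:
assert np.array_equal(curr[:-dy, :-dx], prev[dy:, dx:])
```

Subtracting `prev` from `curr` without compensation leaves large residuals near every intensity edge even though the scene is static, which is exactly the noise the compensation step removes.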
The vertical projection histogram of an image is constructed by projecting along vertical lines (columns). One histogram is constructed per vertical line (column); hence, the number of histograms in the vertical projection histogram is equal to the number of columns in the image. Let $P^V(I)$ be the vertical projection histogram of the image $I$, giving

$$P^V(I) = \left\{ h^V_j(I) : j = 0, 1, \ldots, W - 1 \right\} \qquad (3)$$
where $W$ is the number of columns, and $h^V_j(I)$ is the histogram of column $j$ of image $I$. Thus, $h^V_j(I)$ is expressed as

$$h^V_j(I) = \left\{ h^V_{jl}(I) : l = 0, 1, \ldots, L - 1 \right\} \qquad (4)$$
where $L$ is the number of bins, and $L/3$ corresponds to the number of bins of one channel among the three channels of a color image. For instance, the histogram is typically calculated in the RGB space using $8 \times 8 \times 8$ bins. $h^V_{jl}(I)$ is the number of pixels with intensities in bin $l$.

The matching value of two histograms (the histogram intersection matching value) is defined in [27]. Let $d^V(h^V_i(I_{k-1}), h^V_j(I_k))$ be the matching value of the two histograms $h^V_i(I_{k-1})$ and $h^V_j(I_k)$. This is expressed as

$$d^V\!\left(h^V_i(I_{k-1}), h^V_j(I_k)\right) = 1 - \frac{\sum_{l=0}^{L-1}\left|h^V_{il}(I_{k-1}) - h^V_{jl}(I_k)\right|}{2H}. \qquad (5)$$

Let $D^V(P^V(I_{k-1}), P^V(I_k), dx)$ be the matching value of the two vertical projection histograms of the images $I_{k-1}$ and $I_k$, where $I_k$ is translated along the $x$-axis by the displacement $dx$. $D^V(P^V(I_{k-1}), P^V(I_k), dx)$ is defined as follows. If $dx \geq 0$

$$D^V\!\left(P^V(I_{k-1}), P^V(I_k), dx\right) = \frac{\sum_{i=0}^{W-1-dx} d^V\!\left(h^V_{i+dx}(I_{k-1}), h^V_i(I_k)\right)}{W - 1 - dx} \qquad (6)$$

and if $dx < 0$

$$D^V\!\left(P^V(I_{k-1}), P^V(I_k), dx\right) = \frac{\sum_{i=0}^{W-1+dx} d^V\!\left(h^V_i(I_{k-1}), h^V_{i-dx}(I_k)\right)}{W - 1 + dx}. \qquad (7)$$

The denominators in (6) and (7) normalize the matched values to lie between 0 and 1. The displacement between the two images $I_{k-1}$ and $I_k$ in the $x$-axis direction is
Fig. 5. Result of the background compensation. (a) Generated background image. (b) Incoming image. (c) IFD image. (d) IFD image with background compensation.
determined by searching for the value $dx$ that maximizes the matching value of the vertical projection histograms of the two images $I_{k-1}$ and $I_k$. Thus, $dx$ is expressed as

$$dx = \arg\max_{-W < dx < W} D^V\!\left(P^V(I_{k-1}), P^V(I_k), dx\right). \qquad (8)$$
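The pipeline of (3)–(8) can be sketched end to end in numpy. This is a hedged illustration, not the paper's implementation: the normalizer $H$ (undefined in this excerpt) is taken as the total count per histogram so the match lies in [0, 1], the denominators of (6) and (7) are replaced by the overlap column count, and per-channel histograms are concatenated so that $L/3$ bins describe each channel.

```python
import numpy as np

def column_histogram(img, j, bins_per_channel=8):
    """h^V_j(I): concatenated per-channel histogram of column j (L = 3 * bins)."""
    q = (img[:, j, :].astype(np.int64) * bins_per_channel) // 256  # quantize 0..255
    return np.concatenate(
        [np.bincount(q[:, c], minlength=bins_per_channel) for c in range(3)]
    )

def vertical_projection_histogram(img, bins_per_channel=8):
    """P^V(I), eqs. (3)-(4): one histogram per column, as a W x L array."""
    W = img.shape[1]
    return np.stack([column_histogram(img, j, bins_per_channel) for j in range(W)])

def hist_match(h1, h2):
    """Eq. (5): histogram intersection matching value; 1 for identical histograms."""
    return 1.0 - np.abs(h1 - h2).sum() / (2.0 * h1.sum())

def best_dx(P_prev, P_curr, max_disp):
    """Eqs. (6)-(8): horizontal displacement maximizing the mean column match."""
    W = P_prev.shape[0]
    best, best_score = 0, -1.0
    for dx in range(-max_disp + 1, max_disp):
        if dx >= 0:  # eq. (6): compare h_{i+dx}(prev) with h_i(curr)
            pairs, n = zip(P_prev[dx:], P_curr[:W - dx]), W - dx
        else:        # eq. (7): compare h_i(prev) with h_{i-dx}(curr)
            pairs, n = zip(P_prev[:W + dx], P_curr[-dx:]), W + dx
        score = sum(hist_match(a, b) for a, b in pairs) / n
        if score > best_score:
            best, best_score = dx, score
    return best

# Simulate a 3-pixel horizontal camera shift: I_k(x, y) = I_{k-1}(x + 3, y).
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(60, 80, 3), dtype=np.uint8)
curr = np.empty_like(prev)
curr[:, :-3] = prev[:, 3:]
curr[:, -3:] = prev[:, -3:]  # border columns: copy (no new scene data)
dx_est = best_dx(vertical_projection_histogram(prev),
                 vertical_projection_histogram(curr), max_disp=10)
```

The same construction applied to row histograms recovers the vertical component $dy$, after which the background can be shifted by $(dx, dy)$ before the IFD is computed.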