Vision-based Vehicle Detection and Classification

Georgios M. Hadjidemetriou1, Christina Kepola2, and Symeon E. Christodoulou3
1) Ph.D. Candidate, Department of Civil and Environmental Engineering, University of Cyprus, Nicosia, Cyprus. Email: [email protected]
2) MSc Student, Department of Civil and Environmental Engineering, University of Cyprus, Nicosia, Cyprus. Email: [email protected]
3) Ph.D., Professor, Department of Civil and Environmental Engineering, University of Cyprus, Nicosia, Cyprus. Email: [email protected]

Abstract: Increasing travel demand over the last decades has led to a rise in traffic accidents, traffic congestion and infrastructure strain, generating a need for enhanced traffic surveillance and management. Large amounts of data are collected for monitoring vehicle activity and predicting traffic, and their manual review is impractical. Presented herein is an automated vision-based method which detects, tracks and classifies vehicles into three main categories. The proposed system utilizes road video frames acquired by a stationary camera, Gaussian Mixture Models, a foreground detector and blob analysis. The promising performance of the system is reflected in an overall F1 score of 92.97%.

Keywords: Computer vision, video processing, tracking, traffic counting, road networks.

1. INTRODUCTION

The competitiveness, economic strength and productivity of a country depend heavily on the performance of its transportation systems (Sussman, 2008), since these are an indispensable part of human activities. Road networks are a basic element of transportation systems, with an average of 40% of the population spending at least one hour on the road each day (Zhang et al., 2011). Unfortunately, though, increased vehicle traffic causes a plethora of problems, such as traffic accidents, traffic congestion, air pollution and pavement deterioration. Constructing new infrastructure, or enhancing existing infrastructure by widening roads, is most often neither affordable nor feasible due to limited land resources.

These limitations have led to the development of novel strategies for better utilization of existing road networks. To begin with, intelligent transportation system modules adopted by vehicles and traffic management centers can improve several aspects of transportation systems. Further, the processing of traffic data can lead to an improved allocation of traffic and of related transport resources, eventually reducing congestion and pavement deterioration. Additionally, the classification of traffic (by vehicle type) is vital information for traffic management systems. Heavy vehicles (buses, trucks) cause pavement deterioration at a much higher rate than light vehicles, and they also contribute to congestion due to their low speeds. By extension, information on the number of heavy vehicles using each road segment might lead to management actions such as the designation of specific lanes for use by heavy vehicles only. Furthermore, vehicle classification can indicate the need for public transport or for the development of cycling infrastructure.
For the purpose of this study, and although vehicle types can generally be divided into several categories such as motorbikes, cars, vans, heavy trucks and buses, three classes were considered, based on the geometric characteristics of the tracked region: two-wheeled vehicles (bicycles, motorcycles), light vehicles (cars and vans), and heavy vehicles (trucks and buses) (Unzueta et al., 2012). Automated vehicle monitoring can be accomplished utilizing pre-existing cameras at roadways or newly installed low-resolution cameras. Pre-existing cameras have mainly been used for vehicle speed control, while low-resolution cameras are low-cost, allowing transportation departments to install them to cover central spots of roadway networks. The current paper aims to develop a methodology for automatic vehicle detection, tracking and classification, utilizing Gaussian Mixture Models (GMMs) and low-cost technologies. The presented methodology is complemented with a case study application, in which a smartphone camera was used for video acquisition on the roadway network of Nicosia, Cyprus. The next sections discuss the current state of research in automated traffic monitoring and methods related to the scope of this paper, the proposed methodology, the experimental implementation, and deduced conclusions.

2. BACKGROUND

Overcoming the barriers of current practices in traffic monitoring has, in recent years, attracted the interest of the scientific community. Different data fusion techniques have been developed to assess traffic flow. Several studies

have focused on the utilization of inductive loop detectors and Global Positioning System (GPS)-based receivers, whilst a plethora of other research studies have been based on computer-vision techniques. Traffic data is mainly collected by road sensors embedded in the pavement, most of them inductive loop detectors. This type of data collection suffers from limited reliability and, consequently, the provided information may not be sufficient for traffic operations (El Faouzi et al., 2011). Another limitation worth mentioning is the high installation and maintenance cost of attaining significant coverage of the roadway network with this method (Zhang et al., 2011). The probe vehicle data collection technique, also known as Floating Car Data (FCD), and its extended version, termed 'xFCD', has been one of the trends in traffic data collection. Under this technique, vehicles shift from a passive attitude to an active one and act as moving sensors, continuously feeding information about traffic conditions to a traffic management center (El Faouzi et al., 2011). Nevertheless, laser scanners increase system cost and are sensitive to weather conditions, while GPS is characterized by calibration complexity. A wide range of vision-based applications have been developed to solve transportation-related problems, such as lane-marking evaluation (Sun et al., 2006), traffic sign detection (Mogelmose et al., 2012), and pavement condition assessment. The latter includes the automated detection of different types of pavement defects, such as cracks (Zou et al., 2012), patches (Hadjidemetriou et al., 2016; Hadjidemetriou et al., 2018) and potholes (Koch et al., 2012). These research studies indicate that vision-based systems are able to perform feature detection, extraction and matching; object detection and tracking; and motion estimation.
They can be utilized to obtain richer information in terms of visual features of the vehicles (color, lights), and to achieve more accurate classifications than intrusive technologies such as radar, inductive loop detectors and lasers. These advantages, combined with the increasing computational power of processors, have made vision-based systems an area of great interest for roadway operators (Unzueta et al., 2012). Regarding vision-based traffic monitoring, Autoscope was amongst the first widespread video-based vehicle detection systems, including incident detection and an implementation at a signalized intersection (Michalopoulos, 1991). Chen et al. (2001) captured the spatial and temporal behavior of objects, after addressing issues related to unsupervised image segmentation and object modelling with multimedia inputs. Gupte et al. (2002) developed an algorithm for vision-based detection and classification of vehicles in monocular image sequences of traffic scenes recorded by a stationary camera; image processing is conducted at three levels (raw images, region level, and vehicle level), and vehicles are modelled as rectangular patterns with certain dynamic behavior. In addition, Cheung and Kamath (2004) compared the performance of a large set of background models on urban traffic video, experimenting with sequences filmed in weather conditions such as snow and fog, for which a robust background model is required; they concluded that the Mixture of Gaussians algorithm had the best performance, followed by adaptive median filtering. Toufiq et al. (2006) described background subtraction as the most widely used paradigm for detecting moving objects in videos taken from a static camera. The main idea behind this concept is to automatically generate and maintain a representation of the background, which can later be used to classify any new observation as either background or foreground.
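The background-subtraction idea just described can be illustrated with a minimal sketch in Python/NumPy (a simple running-average background model rather than the GMM used later in this paper; the frame size, learning rate and threshold are illustrative assumptions):

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running background model (alpha is an assumed learning rate)."""
    return (1.0 - alpha) * background + alpha * frame.astype(float)

def foreground_mask(background, frame, threshold=30.0):
    """Label a pixel as foreground when it deviates from the background model by more than the threshold."""
    return np.abs(frame.astype(float) - background) > threshold

# Synthetic example: a static grey background with a bright "vehicle" patch entering the scene.
background = np.full((360, 640), 100.0)   # learned background scene
frame = background.copy()
frame[100:140, 200:260] = 220.0           # moving object (40 x 60 pixels)
mask = foreground_mask(background, frame)
print(int(mask.sum()))                    # 2400 foreground pixels (40 * 60)
```

The essential point is that the background representation is maintained continuously, so slow scene changes (lighting, parked vehicles) are absorbed into the model rather than flagged as moving objects.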
Further, Chintalacheruvu and Muthukumar (2012) used the Harris-Stephens corner detector algorithm to create a standalone vehicle identification and tracking method that determines both vehicle counts and speeds at roadways.

3. METHODOLOGY

The current study proposes and evaluates a method for vehicle detection, tracking and classification using Gaussian Mixture Models and blob analysis. The key steps of the proposed system are illustrated in Figure 1.

Figure 1. Main stages of the proposed method

The input consists of frames extracted from the collected videos. The initial frames are used for training (learning algorithm), estimating the background scene against which each subsequent frame is compared to determine the foreground dynamic objects (vehicles). The vehicles are detected using blob analysis, a tool which detects connected regions. Post-processing is performed on the foreground dynamic vehicles to output more reliable segmentation results and reduce noise interference. The next step is image segmentation, which consists of object recognition and of

results extraction. Each image pixel is categorized as part of a moving object (foreground) or of the background scene. Morphological opening is used to remove undesirable noise and to fill gaps in the detected objects; it removes small objects from the foreground of a frame, placing them in the background. A binary image displays the detected vehicles in white and the background in black, while a generated colored rectangle surrounds each detected vehicle in the initial frames (Figure 2; Figure 3; Figure 4). These figures show that vehicles parked at the examined road sections are correctly not detected by the algorithm, since the background model is continuously adapted to changes based on a sliding window. Once detected, the vehicles are classified into one of three classes based on the minimum and maximum number of connected pixels for each class: 4,000 and 4,500 for two-wheeled vehicles; 7,000 and 23,000 for light vehicles; and 23,000 and 30,000 for heavy vehicles, respectively. These thresholds were arrived at through a search over the range of 2,000 to 40,000 connected pixels, in increments of 100, to identify the best-performing connectivity counts based on the F1 score on the testing set. The tracked binary image forms the input image for counting. Each frame is tested for the presence of each category of vehicle. The output video displays the bounding boxes around the vehicles and the number of each category of vehicles in the upper left corner of the video. Consequently, vehicles are detected, tracked, categorized and counted in each frame separately.
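The morphological-opening, blob-analysis and area-based classification steps described above can be sketched as follows (a Python/SciPy illustration standing in for the Matlab tools actually used; the pixel-count class boundaries are those reported in the text, while the synthetic frame and the 3 x 3 structuring element are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

# Class boundaries from the text: (min, max) connected-pixel counts per class.
CLASS_RANGES = {
    "two-wheeled": (4000, 4500),
    "light": (7000, 23000),
    "heavy": (23000, 30000),
}

def classify_blobs(binary_mask):
    """Suppress noise with morphological opening, label connected regions,
    and assign each blob to a vehicle class by its pixel count."""
    opened = ndimage.binary_opening(binary_mask, structure=np.ones((3, 3)))
    labels, n = ndimage.label(opened)
    areas = ndimage.sum(opened, labels, index=range(1, n + 1))
    detections = []
    for area in areas:
        for cls, (lo, hi) in CLASS_RANGES.items():
            if lo <= area <= hi:
                detections.append(cls)
                break  # blobs outside every range stay unclassified
    return detections

# Synthetic binary frame: one car-sized blob (100 x 150 = 15,000 px) and one speck of noise.
mask = np.zeros((360, 640), dtype=bool)
mask[50:150, 100:250] = True   # light-vehicle-sized region
mask[0, 0] = True              # single-pixel noise, removed by the opening
print(classify_blobs(mask))    # ['light']
```

Note that blobs falling between the class ranges (e.g. 4,500 to 7,000 pixels) are left unclassified, consistent with the paper's later remark about excluding unclassified moving objects from the traffic statistics.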

Figure 2. Detection of moving two-wheeled vehicles

Figure 3. Detection of moving light vehicles

Figure 4. Detection of moving heavy vehicles

4. EXPERIMENTAL IMPLEMENTATION

A case study was performed on the road network of Nicosia, Cyprus, with a stationary smartphone camera positioned on the 1st floor of a building, at a height of 5 m, with the image plane parallel to the roadway below. The videos were recorded at different times and under different weather conditions. A 13 MP smartphone camera (LG X cam) collected video frames at a 1920 x 1080 resolution. The videos were automatically downsized to 640 x 360 pixels to decrease data volume and image-processing time. The processing was performed in the Matlab programming language and environment on a laptop PC with the following specifications: Dell Inspiron N5050, Intel Core i5-2430M, 2.40 GHz, 4 GB RAM.

The performance evaluation of the presented method involved the measures of precision (Equation 1), recall (Equation 2) and the F1 score (Equation 3). The term Positives (P) represents the vehicles moving on the roadway, which, after detection, can be separated into True Positives (TP) and False Negatives (FN). The performance measures of precision and recall were calculated based on TP, the number of vehicles correctly detected; False Positives (FP), the number of objects wrongly identified as vehicles; and FN, the vehicles that crossed the examined road section without being detected. High precision indicates that the identified vehicles correspond to actual vehicles, while high recall shows that the majority of vehicles were detected by the algorithm. The F1 score is the harmonic mean of precision and recall.

Precision = TP / (TP + FP)    (1)

Recall = TP / (TP + FN)    (2)

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)    (3)
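As a check, applying Equations (1)-(3) to the overall detection counts reported in Table 1 (TP = 258, FP = 27, FN = 12) reproduces the overall scores reported in Table 2:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Overall detection counts from Table 1.
tp, fp, fn = 258, 27, 12
print(round(100 * precision(tp, fp), 2))     # 90.53
print(round(100 * recall(tp, fn), 2))        # 95.56
print(round(100 * f1_score(tp, fp, fn), 2))  # 92.97
```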

The performance metrics are calculated for the overall number of vehicles and for each class separately, based on the detection results shown in Table 1. Table 2 depicts the performance evaluation results (precision, recall and F1 score) for the overall number of vehicles and for each class of vehicles. For the overall number of vehicles, the algorithm achieved a recall of 95.56% (which depends on TP and FN) and a lower precision of 90.53% (which depends on TP and FP). Comparing the three vehicle classes, the highest performance is observed for light vehicles, with precision, recall and F1 score equal to 92.56%, 93.87% and 93.21%, respectively; followed by two-wheeled vehicles, with 83.33%, 96.77% and 89.55%; and heavy vehicles, with 83.33%, 92.59% and 87.72%. In all cases, precision scores lower than recall, due to the appearance of FP; this can be explained by the fact that the foreground segmentation process often includes undesirable noise. It was observed during results extraction that, as the minimum blob area decreases, the number of FP increases, which explains the low precision for two-wheeled vehicles. Regarding the low precision for heavy vehicles, in some cases two light vehicles are merged together and treated as a single entity (truck).

Table 1. Vehicle detection results
Results   Overall   Two-Wheeled Vehicles   Light Vehicles   Heavy Vehicles
P         270       31                     212              27
TP        258       30                     199              25
FN        12        1                      13               2
FP        27        6                      16               5

Table 2. The performance of the proposed method
Performance Metric   Overall   Two-Wheeled Vehicles   Light Vehicles   Heavy Vehicles
Precision (%)        90.53     83.33                  92.56            83.33
Recall (%)           95.56     96.77                  93.87            92.59
F1 Score (%)         92.97     89.55                  93.21            87.72

5. CONCLUSIONS

Vehicle detection, counting and classification, as conducted by transportation agencies, faces a range of major challenges, motivating the need for automation of the procedure and for low-cost solutions. An accurate system for the automated detection and classification of vehicles, using a foreground detector and vision-based blob analysis, is proposed. The proposed method shows promise in providing transportation authorities with a cheaper and more widely deployable solution compared to currently available techniques. The high recall is an indication of the efficiency of the algorithm, and the performance of the proposed algorithm on low-resolution images indicates that low-cost cameras, which do not record high-resolution video, can be used for collecting data. The presented approach, though, has room for improvement, and work is currently under way to further enhance the accuracy of the method. This can be achieved by increasing the training data, in order to obtain an enhanced background extraction. Moreover, shadow elimination (or separation of shadows from the vehicle) and the exclusion of unclassified moving objects from the traffic statistics would likely reduce the number of FP and improve the precision, recall and F1 score measurements. It should also be mentioned that the presented algorithm counts the vehicles per frame, without calculating the total number of vehicles per video. Thus, apart from vehicle detection and classification, the proposed system can be further expanded to the counting of vehicles of each class under real-time conditions.

REFERENCES

Chen, S.C., Shyu, M.L., and Zhang, C. (2001). An intelligent framework for spatio-temporal vehicle tracking. Proceedings of the IEEE Intelligent Transportation Systems Conference, pp. 213-218.
Chen, S.C., Shyu, M.L., and Zhang, C. (2001). An unsupervised segmentation framework for texture image queries. Proceedings of the 25th Annual International Computer Software and Applications Conference (COMPSAC), pp. 569-573.
Cheung, S.-C.S. and Kamath, C. (2004). Robust techniques for background subtraction in urban traffic video. Electronic Imaging, pp. 881-892.
Chintalacheruvu, N. and Muthukumar, V. (2012). Video based vehicle detection and its application in intelligent transportation systems. Journal of Transportation Technologies, 2(4), p. 305.
El Faouzi, N.E., Leung, H., and Kurian, A. (2011). Data fusion in intelligent transportation systems: Progress and challenges - A survey. Information Fusion, 12(1), pp. 4-10.
Gupte, S., Masoud, O., Martin, R.F., and Papanikolopoulos, N.P. (2002). Detection and classification of vehicles. IEEE Transactions on Intelligent Transportation Systems, 3(1), pp. 37-47.
Hadjidemetriou, G.M., Christodoulou, S.E., and Vela, P.A. (2016). Automated detection of pavement patches utilizing support vector machine classification. Proceedings of the 18th IEEE Mediterranean Electrotechnical Conference (MELECON), pp. 1-5.
Hadjidemetriou, G.M., Vela, P.A., and Christodoulou, S.E. (2018). Automated pavement patch detection and quantification using support vector machines. Journal of Computing in Civil Engineering, 32(1), 04017073.
Koch, C., Jog, G.M., and Brilakis, I. (2012). Automated pothole distress assessment using asphalt pavement video data. Journal of Computing in Civil Engineering, 27(4), pp. 370-378.
Michalopoulos, P.G. (1991). Vehicle detection video through image processing: the Autoscope system. IEEE Transactions on Vehicular Technology, 40(1), pp. 21-29.
Mogelmose, A., Trivedi, M.M., and Moeslund, T.B. (2012). Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey. IEEE Transactions on Intelligent Transportation Systems, 13(4), pp. 1484-1497.
Sun, T.Y., Tsai, S.J., and Chan, V. (2006). HSI color model based lane-marking detection. Proceedings of the IEEE Intelligent Transportation Systems Conference, pp. 1168-1172.
Sussman, J.S. (2008). Perspectives on Intelligent Transportation Systems (ITS). Springer Science & Business Media.
Toufiq, P., Elgammal, A., and Mittal, A. (2006). A framework for feature selection for background subtraction. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
Unzueta, L., Nieto, M., Cortés, A., Barandiaran, J., Otaegui, O., and Sánchez, P. (2012). Adaptive multicue background subtraction for robust vehicle counting and classification. IEEE Transactions on Intelligent Transportation Systems, 13(2), pp. 527-540.
Zhang, J., Wang, F.Y., Wang, K., Lin, W.H., Xu, X., and Chen, C. (2011). Data-driven intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems, 12(4), pp. 1624-1639.
Zou, Q., Cao, Y., Li, Q., Mao, Q., and Wang, S. (2012). CrackTree: Automatic crack detection from pavement images. Pattern Recognition Letters, 33(3), pp. 227-238.
