
Multi-class Traffic Sign Detection and Classification Using Google Street View Images

Vahid Balali1
PhD Candidate, Department of Civil and Environmental Engineering
University of Illinois at Urbana-Champaign
205 N Mathews Ave., Urbana, IL 61801
Tel: (540) 235-6474
Email: [email protected]

Elizabeth Depwe
Graduate Student, Department of Civil and Environmental Engineering
University of Illinois at Urbana-Champaign
205 N Mathews Ave., Urbana, IL 61801
Tel: (512) 965-9776
Email: [email protected]

Mani Golparvar-Fard
Assistant Professor and NCSA Faculty Fellow
Department of Civil and Environmental Engineering
Department of Computer Science
University of Illinois at Urbana-Champaign
205 N Mathews Ave., Urbana, IL 61801
Tel: (217) 300-5226
Email: [email protected]

Submitted to Information Systems and Technology (ABJ50) for Presentation at the 94th Transportation Research Board Annual Meeting, January 2015, Washington, D.C.
July 31, 2014

Word Counts:
Abstract and Manuscript Text: 5184
Number of Tables and Figures: 10 (= 2500 words)
Total: 7684

1 Corresponding Author


ABSTRACT
Maintaining an up-to-date record of the location and condition of high-quantity, low-cost roadway assets, such as traffic signs, is critical to the safety of transportation systems. Despite their importance, today's video-based data collection and analysis practices are still costly, prone to error, and performed intermittently. While databases such as Google Street View (GSV) contain street-level panoramic images of all traffic signs and are updated regularly, their potential for creating a comprehensive inventory has not been fully explored. The key benefit of these databases is that once roadway assets are detected, accurate geographic coordinates of the detected assets can be automatically determined and visualized within the same platform. Nevertheless, detecting and classifying roadway assets from GSV imagery is challenging due to intra-class variability and, in particular, changes in illumination, occlusion, and orientation. This paper evaluates the application of a computer vision method for multi-class traffic sign detection and classification from GSV images. The method extracts images using the Google Street View API and leverages a sliding window mechanism to detect potential candidates for traffic signs. For each candidate, a Histogram of Oriented Gradients (HOG) is formed and concatenated with a Color Histogram. The HOG+Color descriptors are then fed into multiple one-vs.-all Support Vector Machine classifiers to detect traffic signs and classify them into their specific categories. The experimental results, with an average accuracy of 95.5%, demonstrate the potential of leveraging GSV images as a viable solution for creating up-to-date inventories of traffic signs.
Keywords: Traffic Sign, Detection, Classification, Google Street View, Condition Assessment


INTRODUCTION
Managing and maintaining transportation infrastructure systems is not a new problem. Nonetheless, the significant expansion in size and complexity of these networks in recent years has posed several new engineering and management problems relating to the time-dependent assessment, prioritization, and maintenance of existing infrastructure (1). The fast pace of deterioration and the limited funding available for rehabilitation have motivated the Departments of Transportation (DOTs) to consider prioritizing existing infrastructure systems based on their condition. This requires the DOTs to maintain an updated record of all assets and their location and condition. As a result, the DOTs track the condition of many types of high-quantity, low-cost roadway assets such as light poles, guardrails, pavement markings, and traffic signs. Despite their significance, creating a comprehensive and accurate inventory of the condition of all assets is challenging due to the high volume of data that must be collected, as well as the subjectivity of the current inspection processes, which negatively impacts quality. Also, data collection techniques have not been standardized in the United States (2), and the manual processes involved in state-of-the-art practices still create potential safety hazards for the inspectors. Hence, there is a need for data collection and analysis methods for condition assessment of roadway assets that do not place a significant cost burden on DOTs and can achieve automation, accuracy, and safety (3; 4).

Given the significance of the problem, the Federal Highway Administration (FHWA) has recently requested identification of the gaps between currently available data collection technologies and the need for collecting comprehensive information about the nation's roadway infrastructure (5). Thus, over the past few years, several US DOTs have proactively looked into Information Technologies (IT) that enable both raw and formatted asset data to be processed, stored, and utilized in an integrated asset management system. For roadway asset management, today's most common IT capabilities focus on collecting inventory data (asset location, quality, age), typically together with photographic documentation, and also on tracking public comments on current conditions (6; 7). For example, the Tennessee Department of Transportation receives continuous updates from the Tennessee Road Information Management System (TRIMS) and Maintenance Management System (MMS) in a central database of roadway assets including traffic signs, guardrails, and pavement markings. The New Mexico State Highway and Transportation Department collects data on most types of visible highway assets except for light posts and road detectors (8). The Virginia Department of Transportation has also recently developed a web-based asset management system using Google Maps and Google Earth (9).

While there is documented evidence that these methods address problems in data collection, the process of identifying assets and inspecting their availability, exact locations, and conditions remains predominantly manual and still needs to be systematically addressed. Today, the most common technique involves videotaping road assets on a massive scale using inspection vehicles equipped with three to five frontal high-resolution cameras (10). Typically, sitting in front of two or more screens, practitioners visually detect and assess the condition of these assets based on their own experience and a condition assessment handbook. Due to the high cost of inspection vehicles, their number is still very limited. This results in a survey cycle of one year (only for critical roadways) and many years of neglect for all other local and regional roads. Video-based data collection and analysis has to be done for millions of miles of roads, and the practice needs to be repeated periodically. Otherwise, many critical decisions will be made based on inaccurate or incomplete information, which can ultimately affect the asset maintenance and rehabilitation process.


Recently, several non-DOT entities have begun collecting street-level panoramic photographs on a country-wide scale. Examples include Google Street View, Microsoft Streetside, Mapjack, EveryScape, and Cyclomedia Globspotter (11). The availability of these large-scale databases, which are also frequently updated, offers the possibility to replace or perhaps augment the current DOT practices of roadway asset data collection and to minimize costs. In addition, because these datasets contain photographic documentation of ALL assets, applying computer vision object detection algorithms to them can streamline both asset data collection and analysis processes. In the following, we briefly review Google Street View as one example of these large-scale visual datasets that can facilitate the process.

How Can Google Street View Facilitate the Asset Data Collection and Analysis Process?
Google Street View (GSV) is a technology featured in Google Maps and Google Earth that provides panoramic views from positions along many streets in the world. The service was launched in 2007 and started with only a few cities in the United States. Today, a monumental amount of image data is available for 48 countries all over the world, as shown in Figure 1 (12). There are nine directional cameras (as shown in Figure 2) for 360° views at a height of 2.5-3 meters, multiple GPS units for positioning, and three laser range scanners that measure up to 50 meters over a 180° field in front of the vehicle (13). There are also 3G/GSM/Wi-Fi antennas for scanning 3G/GSM and Wi-Fi hotspots. The retrieval of GSV images is a powerful alternative to current asset data collection practices, especially if conducted in earlier stages.


FIGURE 1 Availability of Google Street View images (12).


FIGURE 2 Google Street View data collection vehicle.


The GSV data collection involves driving around and photographing existing locations. Weather conditions and population density are the most important factors in determining the time and location of data collection. Signals from sensors on the car that measure GPS position, speed, and direction are combined to match each image to its geographic location on the map. This helps reconstruct the car's exact route, and even tilt and realign images as needed. The photos are then stitched together into a single 360-degree image, and image processing algorithms are applied to blend the seams between the images and create smooth transitions for visualization purposes. Such capabilities can certainly augment current DOT asset data collection practices, if not fully replace them. The GSV dataset typically contains multiple geo-registered images for each one of the high-quantity, low-cost assets, and thus creates an excellent opportunity for developing computer vision algorithms that leverage these multiple views per asset for detecting, classifying, and localizing them. Nevertheless, the general task of automated inference of traffic signs from visual data is challenging due to the intra-class variability of traffic signs. More specific challenges in GSV imagery are the expected changes in illumination, occlusion, sign position, and orientation. For practicality, any method of analysis should be fast, allowing this large quantity of high-resolution imagery to be processed in a timely fashion.

As a proof of concept, this paper explores leveraging Google Street View (GSV) data as a means to collect the necessary comprehensive data on traffic signs, and presents and validates a computer vision method for identification and classification of multi-class US traffic signs from such imagery. In the following, the current computer vision methods for identification and classification of traffic signs are briefly reviewed.

RESEARCH BACKGROUND
The first step in developing computer vision algorithms for detecting and classifying traffic signs is collecting a comprehensive dataset. Collecting appropriate data is often more time-consuming and more expensive than algorithm development (14). While several recent efforts in Europe have focused on collecting comprehensive datasets of traffic lights (15) and traffic signs (16-20), in the United States only a few recent works (e.g., 22) have focused on collecting and sharing a publicly accessible dataset of U.S. traffic signs from streets and highways. In addition, these signs are often not as precisely standardized as one would expect (this also depends on the country). A vast majority of the literature has focused on detecting and classifying primarily one type of traffic sign (e.g., stop signs) from such datasets.

Any detection and classification method typically involves 1) feature detection/description, and 2) machine learning methods for training models and inference. Traffic signs have distinct shapes and colors, and thus many of these methods have focused on leveraging edge features (21-23), or other features such as Histograms of Oriented Gradients (HOG) and Haar wavelets that can provide more accurate alternatives for shape recognition. Several others, such as (24; 25), have combined edge and Haar-like features. Color features have also been used extensively in the literature (26; 27). While earlier methods focused on thresholding these features, today the application of machine learning methods is more common. For example, a color-based model that does not rely on thresholding was put forward by (28). This method also uses Haar-like features trained with an AdaBoost cascade classifier, which is a common fast method for leveraging Haar-like features (25; 28). HOG features have also been used in several studies to detect traffic signs (29-31). The selection of a classification method is constrained by the features extracted during the detection step. Support Vector Machines (SVM)


(32-34), neural networks, and cascaded classifiers trained with some type of boosting (31; 35; 36) have been used with HOG features. While many promising methods have been introduced, the joint application of shape and color features together with machine learning methods for detecting traffic signs, particularly on existing databases such as GSV, has still not been fully explored. In the following, a new publicly available dataset from GSV together with its ground truth is introduced, and as a proof of concept, the application of a computer vision method for detecting shape and color features in a machine learning scheme is explored in a principled way. The performance of the method is also compared to the state-of-the-art.

METHOD
Different from the state-of-the-art methods, the proposed method does not make any prior assumption about the 2D location of traffic signs in images (37). Rather, a template slides across the entirety of each GSV image collected using the Google Street View API and extracts candidates for traffic signs. For each candidate, as shown in Figure 3, the gradient orientations and color information are locally histogrammed and concatenated as HOG+Color descriptors. These descriptors are then fed into multi-class one-vs.-all Support Vector Machine (SVM) classifiers, which are trained in an offline process, to catalogue the detected assets. Finally, non-maxima suppression is used to remove false positives and keep only the assets with the highest scores for accurate localization. Each of these steps, in addition to the process of training the classifiers, is explained in the following subsections.
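For concreteness, the template-sliding mechanism named in this overview can be sketched as a simple Python generator. This is a minimal illustration rather than the paper's code: the scale set and stride formula are assumptions, while the 64×64 base window size and 67% overlap are taken from the experimental setup reported later.

    # Sketch of a multi-scale sliding-window candidate generator.
    # Assumptions: scale set and stride formula; 64x64 base size and
    # 67% overlap follow the paper's experimental setup.
    def sliding_window_candidates(image, base=64, scales=(1.0, 1.5, 2.0),
                                  overlap=0.67):
        """Yield (x, y, w) for every candidate window in an HxWx3 image."""
        for scale in scales:
            w = int(base * scale)                  # window side at this scale
            stride = max(1, int(w * (1 - overlap)))
            for y in range(0, image.shape[0] - w + 1, stride):
                for x in range(0, image.shape[1] - w + 1, stride):
                    yield x, y, w                  # crop: image[y:y+w, x:x+w]

Each yielded window is then described with HOG+Color features and scored by the classifiers, as described next.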


FIGURE 3 Overview of the proposed system.

Google Street View API and Sliding Window for Candidate Extraction
In this work, the database of Google Street View images is accessed via the Google Maps static Application Programming Interface (API) through an HTTP request. The API has a daily limit of 25,000 image requests and 2,500 direction requests (one query may generate 100 directions, so 250 queries per day). However, more is possible with "Maps for Business," starting from $10,000 per year. GSV images can be downloaded in any size up to 2048×2048 pixels. The API request is defined with URL parameters, as listed in Table 1, sent through a standardized HTTP request which links to an embedded static (non-interactive) image within the Google database. A Python 2.7 script calls on a package titled "urllib" (38; 39). While looping through the parameters of interest, the script generates a string matching the HTTP request format of the Google Street View API. After the unique string is created, the URL-retrieve function is used to download the desired image from Google Street View. In the proposed approach, a starting location coordinate needs to be set manually. Since the exact coordinates of the street view images for the traffic signs of interest are unknown, the starting coordinates are incremented in a grid pattern to ensure that all possible areas are examined. At each location on the map, three images are saved, corresponding to headings of -20˚, 0˚, and +20˚, respectively. The script retrieves approximately 23 images per second.
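As a rough illustration of this retrieval loop, the following Python sketch builds Street View Static API request URLs over a coordinate grid and downloads three headings per location. It is a reconstruction rather than the authors' script: the API key, starting coordinate, grid step, and 640×640 size are placeholder assumptions (the paper used Python 2.7 and sizes up to 2048×2048, and current versions of the API require a key).

    # Minimal sketch of the GSV image-retrieval loop described above.
    # Assumptions: placeholder API key, starting coordinate, grid step, and
    # image size; written for Python 3 (the paper used Python 2.7's urllib).
    import urllib.parse
    import urllib.request

    API_KEY = "YOUR_KEY"             # placeholder; current API requires a key
    BASE = "https://maps.googleapis.com/maps/api/streetview"
    LAT0, LON0 = 40.1105, -88.2284   # assumed starting coordinate
    STEP = 0.0005                    # assumed grid increment in degrees

    for i in range(10):              # increment location in a grid pattern
        for j in range(10):
            lat, lon = LAT0 + i * STEP, LON0 + j * STEP
            for heading in (-20, 0, 20):   # three images per location
                params = urllib.parse.urlencode({
                    "location": f"{lat},{lon}",
                    "size": "640x640",     # assumed; paper used up to 2048x2048
                    "heading": heading % 360,
                    "fov": 90,
                    "pitch": 0,
                    "key": API_KEY,
                })
                url = f"{BASE}?{params}"
                urllib.request.urlretrieve(url, f"gsv_{i}_{j}_{heading}.jpg")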


TABLE 1 Required Parameters for the Google Street View Image API

Parameter   Description                                                        Dimension
Location    Either a text string or a lat/long value                           lat/long value
Size        Output size of the image in pixels                                 2048×2048
Heading     Compass heading of the camera                                      0-360 (North)
FOV         Horizontal field of view of the image                              90 degrees
Pitch       Up/down angle of the camera relative to the Street View vehicle    0

The Street View Image API snaps to the panorama photographed closest to the requested location. Because Street View imagery is periodically refreshed and photographs may be taken from slightly different positions each time, it is possible that a location may snap to a different panorama when imagery is updated.

For each GSV image, a template window at multiple spatial scales slides over all possible locations for traffic signs. For each candidate, histograms representing shape and color are formed and fed into the machine learning models to classify the detected signs. In cases where several template window candidates have detected a sign, the window with the maximum classification score is kept to accurately localize the detected sign in the 2D GSV image.

Histogram of Oriented Gradients (HOG)
The basic idea is that the local shape and appearance of traffic signs in a given detection window, extracted with a template window that slides over GSV images, can be characterized by the distribution of local intensity gradients. These properties can be captured via HOG descriptors (40). The HOG features are computed over all the pixels in the candidate template extracted from the GSV image by capturing the local intensity changes. The resulting features are concatenated into one large feature vector as a HOG descriptor for each candidate window. To do so, the magnitude g(x, y) and orientation θ(x, y) of the intensity gradient are calculated for each pixel within the candidate window. Then the vector of all calculated orientations and their magnitudes is quantized and summarized into the HOG. More precisely, the window is divided into dx × dy local spatial regions (cells), where each cell contains n × n pixels. Each pixel casts a weighted vote for an edge orientation histogram bin, based on the orientation of the image gradient at that pixel. These votes are accumulated into evenly-spaced orientation bins over the cells. Each histogram is normalized with respect to the neighboring histograms, and can be normalized multiple times with respect to different neighbors.

Color Descriptors
To learn and classify the distinct choices of color on traffic signs, the proposed method also forms a histogram of local color distribution, similar to HOG. For each template window which contains a candidate traffic sign, the image patch is divided into dx × dy non-overlapping pixel regions, i.e., cells. A similar procedure is followed to compute the color feature for each cell, resulting in a local histogram representation of the color distribution. To minimize the effect of brightness changes in the images, the Hue-Saturation-Value (HSV) color space is chosen and the Value channel is ignored. The hue and saturation components of color are thus histogrammed and vector-quantized. These color histogram descriptors are then concatenated with HOG to form the HOG+Color descriptors.
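To make the descriptor construction concrete, the following minimal Python sketch forms a HOG+C vector for a single candidate window. It uses scikit-image's hog implementation as a stand-in for the authors' Matlab code; the 8×8 cell size and bin counts follow Table 3, while the block size and the exact color-histogram binning are assumptions.

    # Sketch of forming a HOG+Color (HOG+C) descriptor for one 64x64 candidate
    # window; parameters mirror Table 3, remaining details are assumptions.
    import numpy as np
    from skimage.feature import hog
    from skimage.color import rgb2gray, rgb2hsv

    def hog_plus_color(window_rgb):
        """window_rgb: HxWx3 float array in [0, 1], e.g. a 64x64 candidate."""
        # Shape: gradient orientations histogrammed over 8x8-pixel cells.
        hog_vec = hog(rgb2gray(window_rgb),
                      orientations=8,            # 8 bins over 0-180 degrees
                      pixels_per_cell=(8, 8),
                      cells_per_block=(2, 2),    # assumed block size
                      block_norm='L2')
        # Color: hue/saturation histogrammed per cell; the Value channel is
        # ignored to reduce sensitivity to brightness changes.
        hsv = rgb2hsv(window_rgb)
        color_bins = []
        n = 8                                    # cell size in pixels
        for y in range(0, hsv.shape[0], n):
            for x in range(0, hsv.shape[1], n):
                cell = hsv[y:y + n, x:x + n]
                h_hist, _ = np.histogram(cell[..., 0], bins=6, range=(0, 1))
                s_hist, _ = np.histogram(cell[..., 1], bins=6, range=(0, 1))
                color_bins.append(h_hist)
                color_bins.append(s_hist)
        color_vec = np.concatenate(color_bins).astype(float)
        color_vec /= (np.linalg.norm(color_vec) + 1e-6)  # L2 normalization
        # Concatenate shape and color into the final HOG+C descriptor.
        return np.concatenate([hog_vec, color_vec])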



Support Vector Machine (SVM)
Learning and inference of traffic signs per sliding window template is performed using an SVM formulation. The SVM machine learning classifier (41) is used to identify whether or not a candidate window within the 2D GSV image contains a given traffic sign. Multiple independent one-against-all SVM classifiers are developed, in which each SVM is a margin-based classifier designed to recognize and detect a single type of traffic sign in 2D candidate windows. As with any supervised learning model, each support vector machine is first trained offline and then cross-validated. The trained model is ultimately used to classify (predict) new instances. To obtain satisfactory predictive accuracy, various kernel functions are used and validated to tune the parameters of the kernel functions.

During the training process, given n labeled training data points (x_i, y_i), wherein x_i (i = 1, 2, ..., n; x_i ∈ R^d) is the set of d-dimensional HOG+C descriptors calculated from each image i, and y_i ∈ {0, 1} is the binary traffic sign class label (e.g., stop sign or non-stop sign), each SVM classifier identifies an optimal hyperplane w^T x + b = 0 between the positive and negative samples of that type of traffic sign. Hence the optimal hyperplane is the one which maximizes the geometric margin γ, as shown in Equation (1):

    γ = 2 / ||w||                                                    (1)

For each binary SVM traffic sign classifier, the dataset contains a considerable number of image entries. Several preliminary experiments with different kernels and parameters were conducted to investigate whether the training data vectors can be linearly separated. As expected, given the significant size of the database, the results were promising and, as a result, the linear kernel was considered sufficient for the formulation of the classification problem. However, noise and occlusions are typical in roadway data and can produce outliers in the SVM classifiers. Hence, a slack variable ξ_i is introduced and consequently the SVM optimization problem can be written as:

    min_{w,b}  (1/2) ||w||^2 + C Σ_{i=1}^{N} ξ_i                     (2)
    subject to:  y_i (w · x_i + b) ≥ 1 − ξ_i    for i = 1, 2, ..., N
                 ξ_i ≥ 0                        for i = 1, 2, ..., N

where C represents a penalty constant which is determined by a cross-validation technique and N is the number of images. The inputs to the training algorithm are the samples for the different types of traffic signs, which are prepared in an offline process, and the outputs are the trained models, each ready for detecting a single type of traffic sign. In the proposed method, all SVM classifiers are applied to each template window. The classifier with the highest classification score is kept as the most accurate traffic sign category for the observation. Because each traffic sign can return positive classification scores at several spatially overlapping template windows, non-maxima suppression is also used to keep only the sliding window that returns the highest score for accurate localization within the 2D GSV images. This is also important because several scales of the sliding windows can result in false positives as well.
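To illustrate the one-vs.-all scheme, the sketch below trains one binary linear SVM per sign type and keeps the highest-scoring classifier per window. It uses scikit-learn's LinearSVC as a stand-in for the paper's implementation; the sign-type names follow Table 2, and the default C = 1.0 is a placeholder for the cross-validated value.

    # Sketch of the one-vs.-all linear SVM scheme: one binary classifier per
    # sign type, highest decision score wins. Library choice and C value are
    # assumptions; the paper tuned C by cross-validation.
    import numpy as np
    from sklearn.svm import LinearSVC

    SIGN_TYPES = ["diamond", "rectangle", "stop", "triangle"]  # per Table 2

    def train_one_vs_all(X, labels, C=1.0):
        """X: (N, d) HOG+C descriptors; labels: list of sign-type strings."""
        classifiers = {}
        for sign in SIGN_TYPES:
            y = np.array([1 if lbl == sign else 0 for lbl in labels])
            classifiers[sign] = LinearSVC(C=C).fit(X, y)
        return classifiers

    def classify_window(classifiers, x):
        """Return (best sign type, score) for one descriptor, or None."""
        scores = {s: clf.decision_function(x.reshape(1, -1))[0]
                  for s, clf in classifiers.items()}
        best = max(scores, key=scores.get)
        return (best, scores[best]) if scores[best] > 0 else None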


DATA COLLECTION AND SETUP
For evaluating the performance of the proposed method, we leveraged the multi-class model from our recent work (28), which was trained using images collected from a highway and many secondary roads in the US. This dataset contains different shapes and colors of traffic signs, and the ground truth was annotated manually. Annotations were cropped so that each contains only one single traffic sign. The dataset includes various viewpoints, scales, illumination conditions, and intra-class variability. Table 2 summarizes the most important information about this dataset.

TABLE 2 Summary Information for the Training Dataset of US Traffic Signs

Type                 Color                # Positives   Sign Message
Diamond              Yellow               1523          Warning
Rectangle            White, Blue, Green   5924          Regulatory, Direction (including mile markers)
Stop Sign            Red                  1640          Always means Stop
Triangle             Red                  809           Yield, Slow Down, Prepare to Stop
Generic Backgrounds  -                    10000         Generic Negatives


In this paper, the new data collected from the Google Street View API is used as the testing dataset. Figure 4 shows a snapshot of the API with the needed information, such as latitude, longitude, heading, pitch, field of view, and the associated URL for downloading the image. Table 3 shows the properties of the HOG+C descriptors. Because of the large size of the training datasets, linear kernels are chosen for classification in the multiple one-vs.-all SVM classifiers. The base spatial resolution of the sliding windows is 64×64 pixels with 67% spatial overlap, which serves as the basis for the non-maxima suppression localizations. More details on the best sliding window size and the impact of the multi-scale searching mechanism can be found in (28).
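The 67% overlap directly drives the non-maxima suppression step. The following is a minimal sketch of the standard greedy NMS procedure (the textbook algorithm, not code from the paper); the 0.67 overlap threshold is taken from the text above.

    # Greedy non-maxima suppression over scored detection windows.
    # Standard textbook procedure; 0.67 overlap threshold follows the text.
    import numpy as np

    def non_maxima_suppression(boxes, scores, overlap_thresh=0.67):
        """boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array."""
        order = np.argsort(scores)[::-1]           # highest score first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # Intersection-over-union of the top box with the remaining boxes.
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                     (boxes[order[1:], 3] - boxes[order[1:], 1]))
            iou = inter / (area_i + areas - inter)
            order = order[1:][iou < overlap_thresh]  # drop overlapping boxes
        return keep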


FIGURE 4 Google Street View API.


TABLE 3 Parameters of HOG+C Detectors

HOG Parameters        HOG Values                 Color Parameters      Color Values
Linear Gradient       [-1; 0; 1]                 Color Channel         Hue and Saturation
Voting Orientation    8 orientations in 0-180˚   Number of Bins        6 for each
Normalization Method  L2-normalized blocks       Normalization Method  L2-normalized blocks
Number of Cells       4                          Number of Cells       4
Number of Pixels      8×8                        Number of Pixels      8×8


The developed method for the API is implemented in Python 2.7, and the detection and classification methods are implemented in Matlab on 64-bit Windows. The performance of our implementation was benchmarked on an Intel(R) Core(TM) i7-3820 CPU @ 3.60 GHz with 64.0 GB RAM and an NVIDIA GeForce GTX 400 graphics card.

EXPERIMENTAL RESULTS AND DISCUSSION
For validation on the GSV imagery, ground truth was generated manually. In the first phase of validation, experiments were conducted to test the performance of the multiple classifiers while accounting for the impact of the sliding window overlap and spatial resolution parameters. Figure 5 shows different cases of classification performance. The first row shows True Positives (TPs), which are correctly identified signs. The second row shows multiple detections caused by overlapping sliding windows. The third row shows incorrectly classified traffic signs, which are False Positives (FPs). The last row shows incorrectly rejected signs, which are False Negatives (FNs).


FIGURE 5 Examples of true positives, false positives, and false negatives.



Figure 6 shows several examples of multi-class traffic sign detection and classification using the sliding window, HOG+C, and non-maxima suppression modules on Google Street View images. As observed, different types of traffic signs at different scales, orientations/poses, and under different background settings are detected and classified correctly in these images.

FIGURE 6 Multi-class traffic sign detection and classification in Google Street View images.

To quantify the performance of the proposed detection and classification method on GSV imagery, a confusion matrix is put together. In the field of machine learning, a confusion matrix is a specific table layout that visualizes the performance of algorithms. Each column of the matrix shows the types of traffic signs in a predicted class, while each row represents the types in an actual class. Figure 7 shows the confusion matrices for multi-class traffic sign detection and classification. Here, the benefit of adding color to the HOG descriptors is also verified: average accuracies of 90.75% for HOG and 95.5% for HOG+C are obtained. In the case of triangle-shaped traffic signs, HOG+C with multiple SVM classifiers works at 100% accuracy. As observed, adding color histograms to HOG also increases the accuracy of detecting and classifying stop signs by 18%.


FIGURE 7 Confusion matrices for detection and classification of multiple traffic signs (HOG descriptors vs. HOG+Color descriptors).
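For readers reproducing this evaluation, the per-class and average accuracies of the kind reported in Figure 7 can be tabulated from ground-truth and predicted labels as in the following sketch; the example label vectors here are illustrative only.

    # Sketch of tabulating the confusion matrix and per-class accuracy from
    # ground-truth and predicted sign labels (example data, not the paper's).
    from sklearn.metrics import confusion_matrix

    labels = ["diamond", "rectangle", "stop", "triangle"]
    y_true = ["stop", "stop", "diamond", "triangle"]     # ground truth (example)
    y_pred = ["stop", "diamond", "diamond", "triangle"]  # classifier output

    cm = confusion_matrix(y_true, y_pred, labels=labels)
    per_class = cm.diagonal() / cm.sum(axis=1)           # rows = actual classes
    for name, acc in zip(labels, per_class):
        print(f"{name}: {100 * acc:.1f}%")
    print(f"average accuracy: {100 * per_class.mean():.1f}%")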


SUMMARY AND CONCLUSION
This paper presented and validated a multi-class traffic sign detection and classification method using Google Street View (GSV) images. The results on the performance of HOG+C descriptors together with multiple binary SVM classifiers on the existing and up-to-date GSV imagery show the potential to provide quick and inexpensive access to inventory information about traffic signs. The results particularly indicate that computer vision techniques leveraging shape and color information together with supervised machine learning can enable automated detection and classification of traffic signs from existing GSV imagery at a reasonable accuracy. Due to its tremendous size and effortless access, Google Street View can reasonably improve the acquisition of training and testing data for video-based traffic sign detection and classification.

Given the reliability in performance, and because collecting information from GSV imagery is cost-effective, the proposed method has the potential to deliver inventory information on high-quantity, low-cost assets in a timely fashion and in a user-friendly format that can tie into existing DOT inventory management systems. As such, it can minimize the burden of data collection and analysis, and allow more time for making informed decisions based on existing conditions and for deciding on the best timing and strategies for maintenance.

The application of Street View as a data pool for training and testing image-based machine learning algorithms includes several caveats that should be resolved to avoid biased results. Here, one possibility for bypassing biased training is the application of semi-manual labeling instead of manual annotation in the training phase, and thus the use of Street View data as a supplement for forming a more solid database. Future work will include investigating such opportunities for the automatic creation of even more comprehensive traffic sign inventory databases. Once an asset such as a traffic sign is detected in GSV imagery, the geo-information can also be used to localize and visualize the asset on the map. Investigating this can add further value to the underlying benefits of automated detection and classification. Future work will also include extracting the location information of each Google Street View image and automatically retrieving accurate geographic coordinates for the detected signs.

REFERENCES
[1] Golparvar-Fard, M., V. Balali, and J. M. de la Garza. Segmentation and recognition of highway assets using image-based 3D point clouds and semantic Texton forests. Journal of Computing in Civil Engineering, 2012, p. 04014023.
[2] de la Garza, J., C. Howerton, and D. Sideris. A study of implementation of IP-S2 mobile mapping technology for highway asset condition assessment. In Computing in Civil Engineering, ASCE, 2011, pp. 1-8.
[3] Hassanain, M., T. Froese, and D. Vanier. Framework Model for Asset Maintenance Management. Journal of Performance of Constructed Facilities, Vol. 17, No. 1, 2003, pp. 51-64.
[4] Rasdorf, W., J. Hummer, E. Harris, and W. Sitzabee. IT Issues for the Management of High-Quantity, Low-Cost Assets. Journal of Computing in Civil Engineering, Vol. 23, No. 2, 2009, pp. 91-99.
[5] FHWA. Highway Safety and Asset Management. FHWA-HRT-05-077, 2010.
[6] Flintsch, G., and J. Bryant. Asset management data collection for supporting decision processes. Federal Highway Administration (FHWA), 2006.
[7] Oskouie, P., B. Becerik-Gerber, and L. Soibelman. Automated Cleaning of Point Clouds for Highway Retaining Wall Condition Assessment. In 2014 International Conference on Computing in Civil and Building Engineering, 2014.


[8] Haas, K., and D. Hensing. Why your agency should consider asset management systems for roadway safety. 2005.
[9] de la Garza, J., I. Roca, and J. Sparrow. Visualization of failed highway assets through geocoded pictures in Google Earth and Google Maps. In Proc., CIB W078 27th International Conference on Applications of IT in the AEC Industry, 2010.
[10] Balali, V., M. Golparvar-Fard, and J. M. de la Garza. Video-Based Highway Asset Recognition and 3D Localization. In ASCE International Workshop on Computing in Civil Engineering, Los Angeles, CA, USA, 2013, pp. 67-76.
[11] Creusen, I., and L. Hazelhoff. A semi-automatic traffic sign detection, classification, and positioning system. In IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, 2012, pp. 83050Y-83050Y-6.
[12] Google Maps. Behind the Scenes of Street View. https://www.google.com/maps/about/behind-the-scenes/streetview/.
[13] Gong, J., H. Zhou, C. Gordon, and M. Jalayer. Mobile terrestrial laser scanning for highway inventory data collection. In Proceedings of the International Conference on Computing in Civil Engineering, Clearwater Beach, FL, USA, 2012, pp. 17-20.
[14] Jalayer, M., J. Gong, H. Zhou, and M. Grinter. Evaluation of Remote-Sensing Technologies for Collecting Roadside Feature Data to Support Highway Safety Manual Implementation. In Transportation Research Board 92nd Annual Meeting, 2013.
[15] de Charette, R., and F. Nashashibi. Real time visual traffic lights recognition based on spot light detection and adaptive traffic lights templates. In Intelligent Vehicles Symposium, 2009 IEEE, IEEE, 2009, pp. 358-363.
[16] Larsson, F., and M. Felsberg. Using Fourier descriptors and spatial models for traffic sign recognition. In Image Analysis, Springer, 2011, pp. 238-249.
[17] Mogelmose, A., M. M. Trivedi, and T. B. Moeslund. Learning to detect traffic signs: Comparative evaluation of synthetic and real-world datasets. In Pattern Recognition (ICPR), 2012 21st International Conference on, IEEE, 2012, pp. 3452-3455.
[18] Stallkamp, J., M. Schlipsing, J. Salmen, and C. Igel. The German traffic sign recognition benchmark: A multi-class classification competition. In Neural Networks (IJCNN), The 2011 International Joint Conference on, IEEE, 2011, pp. 1453-1460.
[19] Stallkamp, J., M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, Vol. 32, 2012, pp. 323-332.
[20] Timofte, R., K. Zimmermann, and L. Van Gool. Multi-view traffic sign detection, recognition, and 3D localisation. Machine Vision and Applications, Vol. 25, No. 3, 2014, pp. 633-647.
[21] Hoferlin, B., and K. Zimmermann. Towards reliable traffic sign recognition. In Intelligent Vehicles Symposium, 2009 IEEE, IEEE, 2009, pp. 324-329.
[22] Ruta, A., Y. Li, and X. Liu. Towards Real-Time Traffic Sign Recognition by Class-Specific Discriminative Features. In BMVC, 2007, pp. 1-10.
[23] Wu, J., and Y. Tsai. Enhanced roadway geometry data collection using an effective video log image-processing algorithm. Transportation Research Record: Journal of the Transportation Research Board, Vol. 1972, No. 1, 2006, pp. 133-140.
[24] Hu, Z., and Y. Tsai. Generalized image recognition algorithm for sign inventory. Journal of Computing in Civil Engineering, Vol. 25, No. 2, 2011, pp. 149-158.
[25] Prisacariu, V. A., R. Timofte, K. Zimmermann, I. Reid, and L. Van Gool. Integrating object detection with 3D tracking towards a better driver assistance system. In Pattern Recognition (ICPR), 2010 20th International Conference on, IEEE, 2010, pp. 3344-3347.


[26] Lopez, L. D., and O. Fuentes. Color-based road sign detection and tracking. In Image Analysis and Recognition, Springer, 2007, pp. 1138-1147.
[27] Maldonado-Bascon, S., S. Lafuente-Arroyo, P. Gil-Jimenez, H. Gomez-Moreno, and F. López-Ferreras. Road-sign detection and recognition based on support vector machines. Intelligent Transportation Systems, IEEE Transactions on, Vol. 8, No. 2, 2007, pp. 264-278.
[28] Balali, V., and M. Golparvar-Fard. Video-Based Detection and Classification of US Traffic Signs and Mile Markers using Color Candidate Extraction and Feature-Based Recognition. In Computing in Civil and Building Engineering, 2014, pp. 858-866.
[29] Houben, S., J. Stallkamp, J. Salmen, M. Schlipsing, and C. Igel. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In Neural Networks (IJCNN), The 2013 International Joint Conference on, IEEE, 2013, pp. 1-8.
[30] Mathias, M., R. Timofte, R. Benenson, and L. Van Gool. Traffic sign recognition: How far are we from the solution? In Neural Networks (IJCNN), The 2013 International Joint Conference on, IEEE, 2013, pp. 1-8.
[31] Overett, G., L. Tychsen-Smith, L. Petersson, N. Pettersson, and L. Andersson. Creating robust high-throughput traffic sign detectors using centre-surround HOG statistics. Machine Vision and Applications, 2014, pp. 1-14.
[32] Creusen, I. M., R. G. Wijnhoven, E. Herbschleb, and P. De With. Color exploitation in HOG-based traffic sign detection. In Image Processing (ICIP), 2010 17th IEEE International Conference on, IEEE, 2010, pp. 2669-2672.
[33] Jahangiri, A., and H. Rakha. Developing a Support Vector Machine (SVM) Classifier for Transportation Mode Identification by Using Mobile Phone Sensor Data. In Transportation Research Board 93rd Annual Meeting, 2014.
[34] Xie, Y., L.-f. Liu, C.-h. Li, and Y.-y. Qu. Unifying visual saliency with HOG feature learning for traffic sign detection. In Intelligent Vehicles Symposium, 2009 IEEE, IEEE, 2009, pp. 24-29.
[35] Balali, V., and M. Golparvar-Fard. Scalable Nonparametric Parsing for Segmentation and Recognition of High-quantity, Low-cost Highway Assets from Car-mounted Video Streams. In Construction Research Congress, ASCE, Atlanta, GA, USA, 2014, pp. 120-129.
[36] Pettersson, N., L. Petersson, and L. Andersson. The histogram feature: A resource-efficient weak classifier. In Intelligent Vehicles Symposium, 2008 IEEE, IEEE, 2008, pp. 678-683.
[37] Balali, V., and M. Golparvar-Fard. Segmentation and recognition of roadway assets from car-mounted camera video streams using a scalable non-parametric image parsing method. Automation in Construction, Vol. 49, Part A, 2015, pp. 27-39.
[38] Ashouri Rad, A., and H. Rahmandad. Reconstructing Online Behaviors by Effort Minimization. In Social Computing, Behavioral-Cultural Modeling and Prediction, No. 7812, Springer Berlin Heidelberg, 2013, pp. 75-82.
[39] Foord, M. HOWTO Fetch Internet Resources Using the urllib Package. https://docs.python.org/3/howto/urllib2.html.
[40] Dalal, N., and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, No. 1, IEEE, 2005, pp. 886-893.
[41] Burges, C. J. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, Vol. 2, No. 2, 1998, pp. 121-167.
