
2010 IEEE Intelligent Vehicles Symposium, University of California, San Diego, CA, USA, June 21-24, 2010


Local Contour Patterns for Fast Traffic Sign Detection

Francisco Parada-Loira and José L. Alba-Castro, Member, IEEE

Abstract—Advanced driver assistance systems have strong restrictions on real-time performance. Vision algorithms embedded in these systems need to balance accuracy and computational simplicity, and there is a continuous challenge to improve both. In this paper we define a new operator, coined Local Contour Patterns, and use it in fast Hough-Transform-based circle and line detectors. We show an efficient implementation for traffic sign detection, relying only on shape information, that analyzes a 752x480 grayscale image in 40 ms on an Intel 8400 CPU, with very good performance in real driving conditions.

I. INTRODUCTION

Advanced driving assistance systems (ADAS) share the common goal of reducing the number of traffic-related injuries and fatalities by alerting the driver or directly taking control of some vehicle subsystems before potentially dangerous situations. Real-time constraints in most vision-based ADAS pose an extra handicap for the implementation of computationally expensive image processing algorithms. Therefore, novel efficient approaches focused on reducing CPU burden without losing performance are always welcome.

Traditionally, algorithmic approaches for traffic sign detection and recognition have been based on color images [1,2,3,4,5,6,7]. European traffic signs use red to enclose important driving messages because this color is not very common in nature and, hence, the human visual system makes an early discrimination. This idea was rapidly adopted for computer vision solutions and remains valid for quite a few systems nowadays. Nevertheless, the use of color information has its own drawbacks: i) Bayer filters, or any other filter configuration for coding color from RGB-filtered CMOS cells, reduce the effective resolution of the sensor for processing sub-modules working in grayscale; ii) more bandwidth is needed to transfer video to the processing unit; and iii) different color temperatures of the illuminating sources (height of the sun, clouds, city lights, car lights) and changes in the white-balance camera settings make the detection parameters in color space trickier. There is another important reason to avoid the use of color: ADAS integration. TSR is almost the only ADA system for which color information can be advantageous, while pedestrian detection, automatic cruise control, lane departure warning, etc., which should be fed by the same video stream, only need grayscale images.

Independently of the decision for color or grayscale video cameras, a TSR system is composed of three phases: detection, classification and tracking, the last one being used mainly to accelerate detection. In this paper we focus on the detection problem, both for circular and triangular signs. Grayscale approaches for traffic sign detection can be coarsely divided into texture-based and shape-based. A good example of a pure texture approach can be found in [8], where an evolutionary discrete AdaBoost is used to handle a huge number of base texture features called dissociated dipoles and to produce a cascade of detectors for each type of traffic sign (speed, circular and triangular). The outputs of the detection module contain the first classification information for the subsequent classification module, which also uses shape information. Shape-based traffic sign detectors have mainly focused on the sign contour information, either by coding distances to the scanning window [9,10] and feeding the code vector into a classifier, or by detecting the symmetric nature of the signs through Hough-like transforms [11,12,13]. The method in [11], called Fast Radial, is particularly outstanding because it reaches good performance with relatively low complexity. Later on we will discuss some drawbacks of this method compared to ours. Other shape and texture approaches are based on distinctive features like SURF [13], SIFT [13,14], HOG [6,15] or EOH [16], which have also been successfully applied to many other applications.

The approach we propose in this paper for traffic sign detection is purely based on shape information. We define a new local pattern, coined Local Contour Pattern (LCP), very similar to the well-known Local Binary Pattern (LBP) [17] that has been successfully used in hundreds of applications to code local texture information, but, in our case, binary contour images are used to compute the local codes.

The rest of the paper is organized as follows: Section II introduces the new local contour pattern, LCP. The distinction between circular and triangular candidate signs is also introduced there. Section III is devoted to the processing of regions of interest highlighted by LCP. Two novel fast Hough-like approaches to precisely locate both kinds of sign shapes are presented in that section. Section IV shows results for real scenes captured with a camera mounted in the rear mirror of a prototype vehicle from the Galician Automotive Technology Center (CTAG) and Section V draws some conclusions.

This work was supported in part by the Galician Automotive Technology Center (CTAG). F. Parada-Loira and J. L. Alba-Castro are with the University of Vigo (corresponding author phone: +34-986-812680; fax: +34-986-812116; e-mail: [email protected]).



II. LOCAL CONTOUR PATTERNS


One of the main goals of our work is to miss as few traffic signs as possible, independently of the weather or daylight conditions, while processing at least 15 frames/sec from a 752x480 grayscale vehicle-mounted camera. Under these real-life conditions we decided to design a fast shape-based traffic sign detector to gain robustness against low-contrast or poorly illuminated scenes. We are not so worried about detecting a few false positives per frame, because the classification step can easily handle and reject about 5-10 incorrectly detected signs per frame while the whole system still runs in real time above 10 frames/sec. First of all, the image is converted into a contour image by means of the Canny operator [18], which performs very well on traffic sign scenes because signs are designed, as we said, to enhance the contrast for the human visual system in color space. Grayscale conversion weights the green channel more heavily, so red contours usually appear much darker than the surrounding areas and, especially, much darker than the enclosed white area. Moreover, the Canny operator has the advantage of producing a thin contour, which allows us to speed up the extraction of patterns, as we will explain in the next subsection. The Canny thresholds are selected using a dataset of videos, but it is important to highlight that reducing the upper threshold to obtain the maximum number of contours only increases the computation time by 15% (a minimal sketch of this step is given below).
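For illustration only, the following sketch shows what this contour-extraction step could look like using OpenCV's Canny detector; the threshold values are placeholders, since the paper selects them from a dataset of videos rather than fixing constants.

import cv2

def contour_image(gray, low_threshold=40, high_threshold=120):
    # gray: 8-bit grayscale frame. Thresholds are illustrative placeholders.
    edges = cv2.Canny(gray, low_threshold, high_threshold)
    # Return a binary (0/1) contour map for the LCP stage.
    return (edges > 0).astype("uint8")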


Detecting simple linear structures can be enough for many applications such as the one we describe here. As an example, Figure 1 shows some simple structures coded by LCP8,3 (formally defined in subsection A) that we are interested in finding in traffic scenes in order to detect traffic signs. Black pixels represent the center of the 3x3 search region and colored pixels represent surrounding pixels with value 1.

Fig. 1. Some examples of simple structures coded by LCP8,3. Top left shows linear structures close to 45º, top right close to 135º, bottom left close to 0º and bottom right close to 90º.

Traffic signs in the EU can be circular, triangular or rectangular; therefore, their contours can be simplified to a constrained combination of simple linear structures, even for circular contours (an octagon approximates a circular sign well). Circular and triangular traffic signs are the most important ones for traffic safety, with rectangular signs carrying additional information only. In this paper we do not deal with this kind of sign, but the analysis would be entirely analogous and would therefore also be useful for USA speed limit signs.

A. Definition of LCP

The Local Contour Pattern (LCP) operator is defined as a measure over binary images to find local geometrical structures. It can be defined starting from the well-known grayscale Local Binary Pattern [17] formulation:

$\mathrm{LBP}_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p$

The function s(x) is a hard limiter of the gray-level difference between the central pixel of the local area, $g_c$, and the P pixels, $g_p$, at distance R from the center. The equation that governs the LCP code can be defined similarly:

$\mathrm{LCP}_{B,N} = \sum_{b=0}^{B-1} P_b\, 2^b$

We have substituted the hard limiter by the binary value of the pixel $P_b$ in the local neighborhood N. We have also changed the notation to highlight some differences with LBP coding: N stands for the odd side length of the square local region around the central contour pixel, and B stands for the code length, which is a function of N. For example, N=3 produces B=8-bit codes, N=5 produces B=24-bit codes and N=7 produces B=48-bit codes. Large N values provide more flexibility to define complex structures, but they also introduce more noise in the process and complicate the clustering of codes. A minimal sketch of this code computation is given below.
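The following Python sketch (not the authors' implementation) illustrates how LCP codes with N=3 (B=8) could be computed from a binary contour image; the clockwise neighbour ordering and the data types are assumptions, since any fixed ordering defines a valid code.

import numpy as np

# Clockwise ordering of the 8 neighbours in a 3x3 window (an assumption).
NEIGHBOURS_3x3 = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]

def lcp_codes(contours: np.ndarray) -> np.ndarray:
    # contours: 2-D binary array (Canny output).
    # Returns the 8-bit LCP_{8,3} code at each active contour pixel, 0 elsewhere.
    h, w = contours.shape
    codes = np.zeros((h, w), dtype=np.uint16)
    for b, (di, dj) in enumerate(NEIGHBOURS_3x3):
        shifted = np.zeros_like(contours)
        # Shift the image so that neighbour (di, dj) aligns with the centre pixel.
        shifted[max(0, -di):h - max(0, di), max(0, -dj):w - max(0, dj)] = \
            contours[max(0, di):h - max(0, -di), max(0, dj):w - max(0, -dj)]
        codes |= (shifted.astype(np.uint16) << b)
    # Keep codes only where the centre pixel itself is an active contour pixel.
    return codes * contours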

B. Detection with LCP histograms

Starting from a binary image of contours, every active pixel encodes a B-length binary word. Many of the code-words can be discarded early for a given application. For traffic signs, only code-words representing linear structures at selected angles are kept. Depending on the resolution of the image and on the range of scales useful for detection (there is no interest in detecting signs with unreadable content), the number of code-words representing the target linear structures can change. Around 30 LCP8,3 code-words out of 256 are used in our application to represent 4 sets of directions (-15º/15º, 30º/60º, 120º/150º, 75º/105º). Similarly to what is done with LBPs to define textured regions, code-words are then mapped into 4 condensing codes representing partial linear structures, which are used to build histograms from scanning windows. LBPs usually map code-words corresponding to local rotations into a single code to make detection robust to object rotation. We are not interested in this type of mapping because the contour angle is an important source of information. Nevertheless, in order to make the traffic sign detection more robust to pixel noise or low contrast, code-words are grouped around the target angles. For example, the LCP8,3 code-words shown in Figure 1 are examples of the 4 target angles and hence contribute to 4 different histogram bins.

We can also see this construction of partial-linear-structure histograms as a simplification of a 4-bin histogram of oriented gradients (HoG) [15], where we only process a binary value in the dominant contour direction for each pixel above a threshold (the Canny output). A set of scanning windows is defined depending on the resolution of the input image and on the range of detections useful for the subsequent recognition process. Normalized class-conditional histograms are statistically modeled for three classes: triangular traffic sign, circular traffic sign and no sign at all. We therefore define a likelihood test for each window location among circular-sign, triangular-sign and no-sign, sketched below. Unfortunately, with this detection scheme we cannot disambiguate between yield and danger signs, because their distributions of partial linear structures are identical.
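The paper does not specify the statistical model used for the class-conditional histograms, so the sketch below assumes a diagonal Gaussian per class purely for illustration; the class names, means, variances and priors are placeholders to be estimated offline.

import numpy as np

CLASSES = ("circular", "triangular", "no-sign")

def classify_window(hist4, means, variances, priors):
    # hist4: normalized 4-bin histogram of condensing codes for one window.
    # means, variances: dict class -> length-4 arrays; priors: dict class -> float.
    h = np.asarray(hist4, dtype=float)
    scores = {}
    for c in CLASSES:
        m, v = means[c], variances[c]
        # Log-likelihood of the histogram under an assumed diagonal Gaussian model.
        loglik = -0.5 * np.sum((h - m) ** 2 / v + np.log(2 * np.pi * v))
        scores[c] = loglik + np.log(priors[c])
    return max(scores, key=scores.get)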

Fig. 2. Square scanning windows with fast-computation boxes of LCP condensing codes: red boxes compute partial linear structures around 0º, green boxes around 90º, blue boxes around 135º and yellow boxes around 45º. The scale range for the square scanning windows is from 20 to 50 pixels for this application.

Circular traffic signs take a little more computation because 8 boxes are needed to detect an ROI. Also in this case, diagonal LCP code-words are checked for a minimum count before checking the horizontal and vertical code-words. For all types of traffic signs, a certain amount of variance is allowed between box counts to take partial occlusions of the traffic sign into account. Figure 3 shows a real traffic scene processed with Canny. Figure 4 shows the outputs of the LCP-filtered images. Figure 5 shows the result of candidate ROI detection, with the colored partial linear structures overlaid on the original grayscale image.

C. Fast LCP processing for traffic signs

In applications with clearly structured and simple contours, LCP processing for shape detection can be greatly accelerated by using integral images [19] of the LCP-processed input image. The key idea consists of creating one integral image per condensing code; in our case we have 4 condensing codes. Every pixel (i,j) of an integral image I counts the number of instances of LCP code-words mapped into the same condensing code that appear in the sub-image whose upper-left corner is (0,0) and lower-right corner is (i,j). For a given rectangular scanning window limited by the corners (i-M, j-N) and (i,j), the number of target LCP code-words can be rapidly calculated with the simple operation I(i,j)-I(i-M,j)-I(i,j-N)+I(i-M,j-N). Therefore, the 4-bin histogram of our application can be readily calculated with just 4 additions plus 8 subtractions for any window of size MxN (see the sketch below). The flexibility of this approach allows subdividing the scanning window into subparts to recover the topological order of the local linear structures that define the three types of traffic signs under study: circular, triangular danger and triangular yield. As we mentioned before, the adaptation to also include square or rectangular traffic signs is straightforward. Figure 2 shows the boxes into which the scanning window is divided. To speed up the scanning even more, a procedure for fast window rejection is defined: for each scale, only windows containing boxes whose counts of target LCP code-words are above a threshold are kept as ROIs for subsequent processing. The boxes of triangular candidates are searched in order: if the diagonal counts (blue and yellow in Figure 2) are under a threshold, no count is made for horizontal LCP code-words (which are much more likely in traffic scenes).
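A minimal sketch of the integral-image trick, under the assumption of one binary map per condensing code; function names and the zero-padding convention are our own, not the authors'.

import numpy as np

def integral_image(binary_map: np.ndarray) -> np.ndarray:
    # Cumulative 2-D sum of a binary map (1 where a pixel carries a given
    # condensing code), padded with a leading row/column of zeros.
    ii = np.zeros((binary_map.shape[0] + 1, binary_map.shape[1] + 1), dtype=np.int32)
    ii[1:, 1:] = binary_map.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_count(ii: np.ndarray, i: int, j: int, m: int, n: int) -> int:
    # Number of hits in the m x n box whose lower-right corner is (i, j),
    # computed as I(i,j) - I(i-m,j) - I(i,j-n) + I(i-m,j-n).
    assert i >= m - 1 and j >= n - 1
    i, j = i + 1, j + 1  # account for the zero padding
    return int(ii[i, j] - ii[i - m, j] - ii[i, j - n] + ii[i - m, j - n])

With one integral image per condensing code, the 4-bin histogram of any window costs 4 additions and 8 subtractions, and a window can be rejected early whenever a required box count falls below a tuned threshold (the threshold values themselves are not given in the paper).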

Fig. 3. Canny image of a traffic scene where circular and triangular traffic signs are at the right shoulder.

III. POST-PROCESSING OF ROIS

Once most of the scanned windows have been rejected and a number of ROIs have been detected through LCP processing, two different post-processing steps check whether the contours inside each ROI really form a triangular or a circular shape.



Fig. 4. LCP filtered images. a) 45º condensing code (top-left), b) 135º condensing code (top-right), c) 0º condensing code (bottom left), d) 90º condensing code (bottom-right).

Fig. 5. LCP-processed image with the detected ROIs and the pixels represented by the 4 LCP condensing codes colored in red (horizontal), green (vertical), yellow and blue (diagonal).

Figure 6 shows the block diagram of the traffic sign detection process. In this section we explain its second part, where two branches specialize in circular and triangular shapes. Both cases are processed with fast implementations of Hough-like transforms. A novel approach was developed in this work to locate the center and radius of circular shapes using simple binary kernels composed of arcs.

Figure 6: Block diagram of the traffic sign detection process. Rejection of candidate pixels or ROIs can occur at many steps (green arrows). Stage labels in the diagram: video flow; Canny; LCP processing; scanning for ROIs; per-scale decision among triangle, circle or nothing; fast Hough transform for circles; fast Hough transform for lines; triangle check; fast corner location; circular, danger and yield traffic sign outputs.

A. Detecting circular shapes with arc kernels

In [11], a circular shape detection algorithm is based on computing contour pixels, each of which votes for another pixel at a specified radial distance in the direction of the gradient. A Gaussian filter then smoothes the local influence of noisy contours to obtain fewer candidate circle centers. We have found that, in real traffic scenes, circular signs yield contours that are not so circular, due to low contrast with the background, partially occluding objects or simply image noise. These factors make the normal directions from the contour pixels of a supposedly circular sign much less convergent than expected, and even the use of the Gaussian filter cannot provide a clear center. Moreover, both the computation of the normal directions and the convolution of the Gaussian filter over the image for each target radius take too much time for our speed constraints.

We have developed a faster and more robust approach that shares some ideas with the one proposed in [20]. We define an accumulation matrix, AC[r,x,y], for each target radius, which accounts for the likelihood of every candidate center in the image (Hough-like processing). A set of binary kernels is defined for each target radius and each LCP condensing code. Each kernel contains two arcs at a distance R (the target radius) from the center of the kernel, in the direction normal to the linear structure represented by the condensing code (0º, 45º, 90º or 135º in our case). The arc length depends on the radius and on the precision of the normal direction from the contour; we have chosen arcs of 20% of the length of the whole circle. This idea is illustrated in Figure 7. The left graph shows 8 kernels located at contour pixels represented by the 4 different LCP condensing codes, defined for radius R and the full circle; in that case the accumulation matrix AC[r,x,y] accumulates counts at many points besides the real center and more false positives can appear. This approach is similar to [20], but here we can exploit the 4-bin histogram information to reduce the number of false positives. The right graph represents the same group of 8 kernels for the same radius, but with only 2 arcs of 20% length. It is easy to see that, by reducing the arc length, the number of intersections outside the real center is also reduced and it is more likely that the local maximum of AC yields the real circle center. Reducing the arcs to a single pixel can be seen as a simplification of the Fast Radial method in [11] but, as we said, the chance of missing the real center then becomes high.


Fig. 7: Left: circular binary R-sized kernels accumulating at the center of the circular shape (solid line). Right: arc binary R-sized kernels accumulating at the center of the circular shape. Green circles/arcs represent kernels for 90º LCP condensing codes, red for 0º, blue for 135º and yellow for 45º.

After AC[r,x,y] is filled, we post-process the matrix to find suitable maxima (a sketch is given below). Several constraints can be imposed at this level: tracking information from previously recognized signs, overlapping maxima for contiguous radii, more likely regions of the road scene, or even outputs from other ADAS, such as detected road lanes. It is not the purpose of this paper to go into the details of the complete TSR system, but only to explain the novel methods used to speed up shape processing.
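The following sketch illustrates the arc-kernel voting and the maxima search described above; it is not the authors' implementation, and the angle table, the interpretation of the 20% arc span (each arc covering 20% of the circle), the sampling step and the vote threshold are all assumptions.

import numpy as np

# Assumed mapping from contour angle (deg) to the normal direction (deg).
CODE_NORMALS = {0: 90.0, 45: 135.0, 90: 0.0, 135: 45.0}

def accumulate_arcs(code_map, radii, arc_fraction=0.2, step_deg=5.0):
    # code_map: 2-D array with the dominant contour angle (0/45/90/135) at
    # contour pixels and -1 elsewhere. Returns one accumulator AC[r] per radius.
    h, w = code_map.shape
    ac = {r: np.zeros((h, w), dtype=np.int32) for r in radii}
    half_span = 360.0 * arc_fraction / 2.0
    ys, xs = np.nonzero(code_map >= 0)
    for y, x in zip(ys, xs):
        normal = CODE_NORMALS[int(code_map[y, x])]
        for r in radii:
            for base in (normal, normal + 180.0):  # arcs on both sides of the contour
                for a in np.arange(base - half_span, base + half_span + 1e-6, step_deg):
                    cy = int(round(y + r * np.sin(np.radians(a))))
                    cx = int(round(x + r * np.cos(np.radians(a))))
                    if 0 <= cy < h and 0 <= cx < w:
                        ac[r][cy, cx] += 1
    return ac

def circle_candidates(ac, min_votes):
    # Pick 3x3 local maxima of each AC[r] above an assumed, tuned vote threshold.
    candidates = []
    for r, acc in ac.items():
        ys, xs = np.nonzero(acc >= min_votes)
        for y, x in zip(ys, xs):
            if acc[y, x] == acc[max(0, y - 1):y + 2, max(0, x - 1):x + 2].max():
                candidates.append((r, x, y, int(acc[y, x])))
    return candidates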

Two difficult scenes with partial and final results. LCP primitives are shown in red, blue and yellow (vertical LCPs not shown). Candidate ROIs are in purple (circular signs), cyan (danger) and green (yield). Detected signs, after the Hough transform and the triangle checks, are in strong red.

B. Detecting triangular shapes

Post-processing ROIs previously labeled as triangular candidates is also based on the Hough Transform (HT) [21], but in this case it is just a fast implementation for line detection: given that 3 boxes in the ROI have passed the threshold for a minimum number of LCP condensing codes for horizontal and diagonal local linear structures, an HT for a narrow range of angles around 0º, 60º and 120º is applied to each box, and only in the corresponding LCP image, so the computational complexity of this part is negligible. At this post-processing step, danger and yield triangular signs are already discriminated thanks to the relative location of the boxes in the ROI. The intersection points of the three detected HT lines are then computed and the geometrical proportions of the triangle are checked (see the sketch below). A tolerance for perspective effects and slightly rotated traffic signs can be introduced at this step. Finally, if the triangle passes the geometrical constraints, the corners of the traffic sign are detected around the intersection points with a simplified version of the method in [22], which also works for small traffic signs.
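A possible sketch of the line-intersection and proportion check, under our own assumptions: lines are given in (rho, theta) normal form and the side-length tolerance is a placeholder, not a value from the paper.

import numpy as np

def intersect(rho1, theta1, rho2, theta2):
    # Solve x*cos(t) + y*sin(t) = rho for the two lines; returns (x, y).
    a = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    b = np.array([rho1, rho2])
    return np.linalg.solve(a, b)

def triangle_from_lines(lines, side_tolerance=0.25):
    # lines: three (rho, theta) pairs with theta (radians) near 0º, 60º and 120º.
    # Returns the three corners if the side lengths are roughly equal, else None.
    (r1, t1), (r2, t2), (r3, t3) = lines
    corners = [intersect(r1, t1, r2, t2),
               intersect(r2, t2, r3, t3),
               intersect(r3, t3, r1, t1)]
    sides = [np.linalg.norm(corners[i] - corners[(i + 1) % 3]) for i in range(3)]
    if max(sides) - min(sides) <= side_tolerance * max(sides):
        return corners
    return None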
IV. DETECTION RESULTS

We have tested our LCP-based traffic sign detection approach on a database of videos supplied by the Galician Automotive Technology Center (CTAG), from which we have extracted 367 yield signs, 644 danger signs and 555 circular signs. The scenes cover different lighting conditions (sunny and cloudy days, including some scenes taken at night) and different weather conditions (including rain and fog). Given that the method does not use a proper training approach, we have simply set some thresholds using 10% of the signs and used the rest to report recall and precision. These figures are defined as:
Recall = TruePositives/(TruePositives+FalseNegatives) Precision = TruePositives/(TruePositives+FalsePositives)
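These definitions can be checked directly against the counts in Table 3; the small differences from the tabulated percentages come from rounding in the original.

def recall(tp: int, fn: int) -> float:
    # Recall = TP / (TP + FN).
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    # Precision = TP / (TP + FP).
    return tp / (tp + fp)

# Example with the yield-sign counts from Table 3 (TP=307, FN=24, FP=1):
# recall = 307/331 ≈ 0.927 and precision = 307/308 ≈ 0.997.
print(recall(307, 24), precision(307, 1))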

Table 3 shows results for each type of traffic sign.


Sign       N     TP    FN    FP    Recall (%)   Precision (%)
Yield      331   307   24    1     92.7         99.6
Danger     580   536   44    6     92.4         98.8
Circular   514   466   48    7     90.6         98.5

Table 3: Recall and precision results for traffic sign detection per frame.

Results for traffic sign recognition can also be reported over multiple contiguous frames, because the recognizer does not need the detector to fire on the whole sequence of an approaching traffic sign. In Table 4 we report figures for traffic sign detection in scenes with at least three detections of the same ROI in consecutive frames.

Sign       Total   Detected   Not Detected   Rate (%)
Yield      28      28         0              100
Danger     50      49         1              98
Circular   63      61         2              96.8

Table 4: Results for sign detection per scene (three consecutive detections).


We achieve real-time performance through an appropriate choice of the thresholds and scales for ROI detection. Tests have been run on 752x480 images, scanning the whole image. The mean computation time of the process is 41 ms on an Intel Core Duo 8400 with a core speed of 3 GHz. Working with quarter-size images, it is possible to obtain a mean computation time under 22 ms per frame. It is important to highlight that partial scanning of the scene is usually applied when the system operates on board a vehicle, so in that case our approach runs even faster.


V. CONCLUSIONS


In this paper we have presented a complete system for circular and triangular traffic sign detection whose main property is its low computational burden. In order to gain speed while preserving competitive recall and precision figures, we have developed a Local Contour Pattern operator to efficiently encode and process contour images, and a kernel-based acceleration of a Hough-like transform for circle detection that takes advantage of the LCP coding. Results on real traffic scenes in several scenarios and under different lighting and weather conditions have shown the reliability of the proposed method.


REFERENCES

[1] A. De la Escalera, J. M. Armingol and M. Mata, "Traffic sign recognition and analysis for intelligent vehicles", Image and Vision Computing, 21:247-258, 2003.
[2] J. Torresen, J. W. Bakke and L. Sekanina, "Efficient recognition of speed limit signs", Proc. IEEE Conf. on Intelligent Transportation Systems (ITS), Washington DC, 2004.
[3] C. Bahlmann, Y. Zhu, V. Ramesh, M. Pellkofer and T. Koehler, "A System for Traffic Sign Detection, Tracking, and Recognition Using Color, Shape, and Motion Information", IEEE Intelligent Vehicles Symposium (IV 2005), Las Vegas, June 2005.
[4] S. Maldonado, S. Lafuente, P. Gil, H. Gómez and F. López, "Road-sign detection and recognition based on support vector machines", IEEE Trans. Intelligent Transportation Systems, V. 8, N. 2, pp. 264-278.
[5] C. G. Kiran, L. V. Prabhu, V. Abdu Rahiman and K. Rajeev, "Traffic Sign Detection and Pattern Recognition Using Support Vector Machine", ICAPR 2009, pp. 87-90.
[6] Y. Xie, L.-F. Liu, C.-H. Li and Y.-Y. Qu, "Unifying Visual Saliency with HOG Feature Learning for Traffic Sign Detection", IEEE Intelligent Vehicles Symposium, Shaanxi, China, June 2009.
[7] M. A. Garcia-Garrido et al., "Fast traffic sign detection and recognition under changing lighting conditions", IEEE Conf. on Intelligent Transportation Systems, Sept. 2006, pp. 811-816.
[8] X. Baró et al., "Traffic sign recognition using evolutionary adaboost detection and forest-ECOC classification", IEEE Transactions on Intelligent Transportation Systems, V. 10, N. 1, pp. 113-126, March 2009.
[9] D. M. Gavrila, "Traffic Sign Recognition Revisited", Image Understanding Systems, DaimlerChrysler Research, 89081 Ulm, Germany.
[10] Lafuente Arroyo, P. Gil Jimenez, R. Maldonado Bascon, F. Lopez Ferreras and S. Maldonado Bascon, "Traffic Sign Shape Classification Evaluation I: SVM using Distance to Borders", Proc. IEEE Intelligent Vehicles Symposium, pp. 557-562, Las Vegas, June 2005.
[11] N. Barnes and A. Zelinsky, "Real-time radial symmetry for speed sign detection", IVS 2004.
[12] G. Loy and N. Barnes, "Fast shape-based road sign detection for a driver assistance system", IROS 2004.
[13] B. Hoferlin and K. Zimmermann, "Towards reliable traffic sign recognition", IEEE Intelligent Vehicles Symposium, 2009, pp. 324-329.
[14] D. G. Lowe, "Distinctive image features from scale-invariant keypoints", Int. J. Comput. Vision, vol. 60, no. 2, pp. 91-110, 2004.
[15] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886-893, June 2005.
[16] B. Alefs, G. Eschemann, H. Ramoser and C. Beleznai, "Road Sign Detection from Edge Orientation Histograms", IEEE Intelligent Vehicles Symposium, 2007.
[17] T. Ojala, M. Pietikäinen and T. Mäenpää, "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:971-987, 2002.
[18] J. Canny, "A Computational Approach To Edge Detection", IEEE Trans. Pattern Analysis and Machine Intelligence, 8:679-714, 1986.
[19] F. Crow, "Summed-area tables for texture mapping", SIGGRAPH 84, pp. 207-212, 1984.
[20] G. Gerig and F. Klein, "Fast contour identification through efficient Hough Transform and simplified interpretation strategy", 8th IJCPR, Paris, pp. 498-500, 1986.
[21] D. H. Ballard, "Generalizing the Hough Transform to detect arbitrary shapes", Pattern Recognition, V. 13, N. 2, pp. 111-122, 1981.
[22] E. Rosten, R. Porter and T. Drummond, "Faster and better: a machine learning approach to corner detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, V. 32, N. 1, pp. 105-119, 2009.
